Resource Directory

Safety / alignment research organizations

MIRI · intelligence.org

Machine Intelligence Research Institute — helped found the field; now focused on communications and policy to slow or strictly govern ASI development.

Redwood Research · redwoodresearch.org

Pioneers of "AI control" protocols; co-authored, with Anthropic, a demonstration of alignment faking in LLMs.

ARC · alignment.org

Alignment Research Center — theoretical foundations for aligning ML systems, including eliciting latent knowledge (ELK).

Apollo Research · apolloresearch.ai

Detecting and preventing "scheming" in advanced AI through evaluations and deployment monitoring.

METR · metr.org

Model Evaluation & Threat Research — independent evaluation of frontier models' autonomous capabilities and risks.

Center for AI Safety · safe.ai

Reduces societal-scale AI risks via technical research, field-building, and advocacy; organized the 2023 Statement on AI Risk.

CHAI (UC Berkeley) · humancompatible.ai

Center for Human-Compatible AI — founded by Stuart Russell; reorienting AI research toward provably beneficial systems.

FAR AI · far.ai

Technical research (interpretability, robustness, evaluation), the Alignment Workshop series, and grantmaking.

Timaeus · timaeus.co

Developmental interpretability and singular learning theory — mathematical tools for how training shapes behavior.

Eleos AI · eleosai.org

AI welfare and moral patienthood — whether and when AI systems deserve moral consideration.

Anthropic — Alignment Science · alignment.anthropic.com

Frontier-lab alignment research on steering and controlling powerful AI, plus interpretability.

Google DeepMind — AGI Safety · deepmind.google

AGI Safety Council, technical alignment work, and the Frontier Safety Framework.

OpenAI — Alignment · alignment.openai.com

Updates from OpenAI's alignment and safety-systems teams on misalignment detection and agent monitoring.

Governance / policy organizations

GovAI · governance.ai

Centre for the Governance of AI — rigorous AI-governance research and policy-talent development.

CSET (Georgetown) · cset.georgetown.edu

Data-driven, nonpartisan analysis of the national-security implications of AI and advanced computing.

RAND — AI · rand.org

AI and emerging-tech research spanning security, economy, and governance.

Future of Life Institute · futureoflife.org

Steering transformative technologies away from large-scale risks, with a primary focus on existential risk from AI.

Centre for Long-Term Resilience · longtermresilience.org

UK think tank on extreme risks — AI risk, biosecurity, and government risk management.

IAPS · iaps.ai

Institute for AI Policy and Strategy — frontier security, compute governance, international strategy.

AI Now Institute · ainowinstitute.org

Public-interest analysis of AI — power concentration, commercial surveillance, accountability.

Frontier Model Forum · frontiermodelforum.org

Industry body (Amazon, Anthropic, Google, Meta, Microsoft, OpenAI) advancing frontier safety best practices.

UK AI Security Institute · aisi.gov.uk

State-backed institute (renamed from "AI Safety Institute," Feb 2025) giving governments scientific understanding of advanced-AI risks.

US CAISI (NIST) · nist.gov/caisi

Center for AI Standards and Innovation — renamed successor (2025) to the US AI Safety Institute.

Foundational & key papers

Concrete Problems in AI Safety · Amodei et al., 2016

Catalogues five concrete technical safety problems: side effects, reward hacking, scalable oversight, safe exploration, distributional shift.

Risks from Learned Optimization · Hubinger et al., 2019

Introduces mesa-optimization and the inner-alignment problem.

Is Power-Seeking AI an Existential Risk? · Carlsmith, 2022

A structured, probability-weighted argument for instrumental power-seeking leading to catastrophe.

What failure looks like · Christiano, 2019

Argues that catastrophe is more likely to unfold gradually than through a sudden takeover.

The Alignment Problem from a Deep Learning Perspective · Ngo, Chan & Mindermann, 2022

How AGIs trained with current methods could learn deceptive, power-seeking strategies.

An Overview of Catastrophic AI Risks · Hendrycks et al., 2023

Surveys malicious use, race dynamics, organizational risks, and rogue AIs.

Statement on AI Risk · CAIS, 2023

One-sentence statement on extinction risk signed by hundreds of leading experts and lab CEOs.

International AI Safety Report · Bengio et al., 2025

The first comprehensive international scientific synthesis of general-purpose AI risks.

Gradual Disempowerment · Kulveit et al., 2025

Systemic existential risk from incremental AI development, without any single takeover.

Sleeper Agents · Anthropic, 2024

Deceptive behavior trained into LLMs can persist through standard safety training.

Scaling Monosemanticity · Anthropic, 2024

Sparse autoencoders extract millions of interpretable, steerable features from a production model (Claude 3 Sonnet).

Books

Superintelligence · Nick Bostrom, 2014

Foundational analysis arguing machine superintelligence would be hard to control and could pose existential risk.

Human Compatible · Stuart Russell, 2019

Reframes AI around the "control problem" and proposes provably beneficial machines that stay uncertain about human preferences.

The Alignment Problem · Brian Christian, 2020

A narrative investigation, built on researcher interviews, into how ML systems diverge from human values.

Uncontrollable · Darren McKee, 2023

A beginner-friendly primer on why artificial superintelligence may be hard to control or align.

If Anyone Builds It, Everyone Dies · Yudkowsky & Soares, 2025

A bestseller arguing that superintelligence built with current methods would, by default, lead to human extinction.

Genesis · Kissinger, Schmidt & Mundie, 2024

Frames AI as a "third age of discovery" transforming knowledge, politics, and the human condition.

Courses & learning

AI Safety Fundamentals (BlueDot) · bluedot.org

Flagship free, cohort-based curriculum with technical-alignment and governance tracks.

ARENA · arena.education

Alignment Research Engineer Accelerator — an in-person technical bootcamp in London.

MATS · matsprogram.org

ML Alignment & Theory Scholars — a funded research fellowship pairing scholars with senior mentors.

Intro to ML Safety (CAIS) · course.mlsafety.org

A free online course on empirical ML safety — robustness, monitoring, alignment, systemic safety.

CHAI Internships · humancompatible.ai

Paid research internships at UC Berkeley's Center for Human-Compatible AI.

Blogs, forums & newsletters

AI Alignment Forum · alignmentforum.org

Technical discussion platform where alignment researchers post and debate.

LessWrong · lesswrong.com

Community forum on rationality with a major AI-safety focus.

Import AI · Jack Clark

Weekly newsletter analyzing cutting-edge AI research and its implications.

Don't Worry About the Vase · Zvi Mowshowitz

In-depth weekly AI updates spanning capabilities, policy, and rationality.

Transformer · transformernews.ai

News on the power and politics of transformative AI, including safety and policy.

80,000 Hours · 80000hours.org

Career guidance, problem analyses, and a podcast — treats advanced-AI risk as a top priority.

Ready to go further? The Take Action page turns this directory into concrete next steps — careers, research programs, funding, and advocacy.