
Resource directory

A curated directory of the organizations, papers, books, courses, and publications that make up the AI safety and governance field. All links were verified during research.

Reviewed June 2026 · Names reflect recent rebrands (e.g. UK AI Security Institute; US CAISI; Open Philanthropy → Coefficient Giving)

Safety / alignment research organizations

MIRI · intelligence.org
Machine Intelligence Research Institute — helped found the field; now focused on communications and policy to slow or strictly govern ASI development.
Redwood Research · redwoodresearch.org
Pioneers of "AI control" protocols; co-demonstrated alignment-faking in LLMs.
ARC · alignment.org
Alignment Research Center — theoretical foundations for aligning ML systems, including eliciting latent knowledge (ELK).
Apollo Research · apolloresearch.ai
Detecting and preventing "scheming" in advanced AI through evaluations and deployment monitoring.
METR · metr.org
Model Evaluation & Threat Research — independent evaluation of frontier models' autonomous capabilities and risks.
Center for AI Safety (CAIS)
Reduces societal-scale AI risks via technical research, field-building, and advocacy; organized the 2023 Statement on AI Risk.
CHAI (UC Berkeley) · humancompatible.ai
Center for Human-Compatible AI — reorienting AI toward provably beneficial systems (Stuart Russell).
FAR AI · far.ai
Technical research (interpretability, robustness, evaluation), the Alignment Workshop series, and grantmaking.
Timaeus · timaeus.co
Developmental interpretability and singular learning theory — mathematical tools for how training shapes behavior.
Eleos AI · eleosai.org
AI welfare and moral patienthood — whether and when AI systems deserve moral consideration.
Anthropic: Alignment Science · alignment.anthropic.com
Frontier-lab alignment research on steering and controlling powerful AI, plus interpretability.
Google DeepMind
AGI Safety Council, technical alignment work, and the Frontier Safety Framework.
OpenAI: Alignment · alignment.openai.com
Updates from OpenAI's alignment and safety-systems teams on misalignment detection and agent monitoring.

Governance / policy organizations

GovAI · governance.ai
Centre for the Governance of AI — rigorous AI-governance research and policy-talent development.
CSET (Georgetown) · cset.georgetown.edu
Data-driven, nonpartisan analysis of the national-security implications of AI and advanced computing.
RAND: AI · rand.org
AI and emerging-tech research spanning security, economy, and governance.
Future of Life Institute · futureoflife.org
Steering transformative technologies away from large-scale risks; a primary focus on AI existential risk.
Centre for Long-Term Resilience · longtermresilience.org
UK think tank on extreme risks — AI risk, biosecurity, and government risk management.
IAPS · iaps.ai
Institute for AI Policy and Strategy — frontier security, compute governance, international strategy.
AI Now Institute · ainowinstitute.org
Public-interest analysis of AI — power concentration, commercial surveillance, accountability.
Frontier Model Forum · frontiermodelforum.org
Industry body (Amazon, Anthropic, Google, Meta, Microsoft, OpenAI) advancing frontier safety best practices.
UK AI Security Institute
State-backed institute (renamed from "AI Safety Institute," Feb 2025) giving governments scientific understanding of advanced-AI risks.
US CAISI (NIST) · nist.gov/caisi
Center for AI Standards and Innovation — renamed successor (2025) to the US AI Safety Institute.

Foundational & key papers

Concrete Problems in AI Safety · Amodei et al., 2016
Catalogues five concrete technical safety problems: side effects, reward hacking, scalable oversight, safe exploration, distributional shift.
Risks from Learned Optimization · Hubinger et al., 2019
Introduces mesa-optimization and the inner-alignment problem.
Is Power-Seeking AI an Existential Risk? · Carlsmith
A structured, probability-weighted argument for instrumental power-seeking leading to catastrophe.
What failure looks like · Christiano, 2019
Argues that catastrophe is more likely to arrive gradually than via a sudden takeover.
The Alignment Problem from a Deep Learning Perspective · Ngo et al., 2022
How AGIs trained with current methods could learn deceptive, power-seeking strategies.
An Overview of Catastrophic AI Risks · Hendrycks et al., 2023
Surveys malicious use, race dynamics, organizational risks, and rogue AIs.
Statement on AI Risk · Center for AI Safety, 2023
One-sentence statement on extinction risk signed by hundreds of leading experts and lab CEOs.
International AI Safety Report · Bengio et al., 2025
The first comprehensive international scientific synthesis of general-purpose AI risks.
Gradual Disempowerment · Kulveit et al., 2025
Systemic existential risk from incremental AI development, without any single takeover.
Sleeper Agents · Anthropic, 2024
Deceptive behavior trained into LLMs can persist through standard safety training.
Scaling Monosemanticity · Anthropic, 2024
Sparse autoencoders extract millions of interpretable, steerable features from a production model.

Books

Superintelligence · Nick Bostrom, 2014
Foundational analysis arguing machine superintelligence would be hard to control and could pose existential risk.
Human Compatible · Stuart Russell, 2019
Reframes AI around the "control problem" and proposes provably beneficial machines that stay uncertain about human preferences.
The Alignment Problem · Brian Christian, 2020
A narrative investigation, built on researcher interviews, into how ML systems diverge from human values.
Uncontrollable · Darren McKee, 2023
A beginner-friendly primer on why artificial superintelligence may be hard to control or align.
If Anyone Builds It, Everyone Dies · Yudkowsky & Soares, 2025
Bestseller arguing that superintelligence built with current methods leads by default to extinction.
Genesis · Kissinger, Schmidt & Mundie, 2024
Frames AI as a "third age of discovery" transforming knowledge, politics, and the human condition.

Courses & learning

BlueDot Impact
Flagship free, cohort-based curriculum with technical-alignment and governance tracks.
ARENA · arena.education
Alignment Research Engineer Accelerator — an in-person technical bootcamp in London.
MATS · matsprogram.org
ML Alignment & Theory Scholars — a funded research fellowship pairing scholars with senior mentors.
Intro to ML Safety (CAIS) · course.mlsafety.org
A free online course on empirical ML safety — robustness, monitoring, alignment, systemic safety.
CHAI Internships · humancompatible.ai
Paid research internships at UC Berkeley's Center for Human-Compatible AI.

Blogs, forums & newsletters

AI Alignment Forum · alignmentforum.org
Technical discussion platform where alignment researchers post and debate.
LessWrong · lesswrong.com
Community forum on rationality with a major AI-safety focus.
Import AI · Jack Clark
Weekly newsletter analyzing cutting-edge AI research and its implications.
Don't Worry About the Vase · Zvi Mowshowitz
In-depth weekly AI updates spanning capabilities, policy, and rationality.
Transformer · transformernews.ai
News on the power and politics of transformative AI, including safety and policy.
80,000 Hours · 80000hours.org
Career guidance, problem analyses, and a podcast — treats advanced-AI risk as a top priority.
Ready to go further? The Take Action page turns this directory into concrete next steps — careers, research programs, funding, and advocacy.