ASAT
- "AGI Safety & Alignment"
- amplified oversight
- interpretability
- ASAT eng (automated alignment research)
- Causal Incentives Working Group
Frontier Safety
- Risk Assessment (evals, threat models, the framework)
- Mitigations (e.g. banning accounts, refusal training, jailbreak robustness)
- Loss of Control (control, alignment evals)
Gemini Safety
Voices of All in Alignment
AGI Safety Council
Responsibility and Safety Council