Chloe Li
Hello
I spend my time thinking about how to make powerful AI systems honest and aligned with human values. I work on areas like alignment and character training, honesty training, dangerous capability evaluations, and control.
I’m currently an Anthropic Fellow working on alignment research with Sam Marks, Jon Kutasov, and Sara Price. Previously, I was the Program Lead and a TA at ARENA, an ML engineering program for upskilling people in technical AI safety work. Before that, I was the director of the Cambridge AI Safety Hub, where I founded the research program MARS and led ML upskilling programs such as CaMLAB. I have an MSc in machine learning from UCL and a BA (Hons) in psychology & neuroscience from the University of Cambridge.
Publications & other work
- Chloe Li, Mary Phuong, Daniel Tan (2025). Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives. In Proceedings of ICLR 2026.
- Chloe Li, Mary Phuong, Noah Y. Siegel (2025). LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring. In Proceedings of IJCNLP-AACL 2025. (Oral Presentation)
- ARENA LLM Evaluations Curriculum