Guest dossier

Amanda Askell

philosopher · research scientist

1 appearance · 5 ideas explored · Wikipedia · ✓ verified

AI safety · mechanistic interpretability · constitutional AI · AI capabilities · scaling hypothesis

Amanda Askell is a Scottish philosopher and AI researcher. She has served as the head of the personality alignment team at Anthropic since 2021. She has played a large role in the development of Claude's personality and constitution. In 2024, she appeared on the Time 100 AI list. She previously worked at OpenAI, but left over concerns that the company was not prioritizing AI safety enough. She has published over 60 papers and has received over 190,000 citations.

In one conversation, Amanda Askell covers AI safety, mechanistic interpretability, and constitutional AI. In that episode, Dario Amodei predicts that AI will reach PhD-level capabilities by 2026-2027, driven by scaling laws. AI models like Sonnet 3.5 have improved rapidly, achieving a 50% success rate on SWE-bench.

Synthesized by TLexDR from 1 conversation. AI-generated.

For the specialist

Constitutional AI allows models to rank responses based on principles like harmlessness, enhancing safety.

#452 Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

Mechanistic interpretability aims to understand complex abstractions and deception features in neural networks.

#452 Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

Sparse autoencoders and dictionary learning reveal interpretable features in neural networks, supporting superposition.

#452 Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

The appearance

Every conversation, in order

Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

Reading list

What they pointed you toward

papers

Word2Vec

by Tomas Mikolov et al.

articles

Machines of Loving Grace

by Dario Amodei

others

AlphaGo Zero

by DeepMind

Every idea, by region

The full territory

consciousness

AI safety · AI capabilities

scaling hypothesis

mechanistic interpretability · constitutional AI · scaling hypothesis

Adjacent minds

Others exploring the same ideas

Chris Olah shares constitutional AI, mechanistic interpretability

Dario Amodei shares constitutional AI, mechanistic interpretability

Sam Altman shares AI capabilities, AI safety