The First Mechanistic Interpretability Frontier Lab — Myra Deng & Mark Bissell of Goodfire AI

68 min

•Feb 6, 20265 months ago

Summary

Goodfire AI, a mechanistic interpretability research lab, discusses their $150M Series A funding and unicorn valuation. The company focuses on using interpretability techniques to understand, control, and design AI models for production use cases, including real-time steering of trillion-parameter models and applications in healthcare, scientific discovery, and enterprise AI safety.

Insights

Mechanistic interpretability is transitioning from academic research to production applications with real business value
Real-time model steering at trillion-parameter scale is now feasible, enabling dynamic model behavior modification
Interpretability techniques can solve practical AI problems like bias removal, hallucination detection, and surgical model edits
The field offers low barriers to entry with modest compute requirements, making it accessible for researchers and startups
Healthcare and scientific discovery represent major commercial opportunities for interpretability applications

Trends

Mechanistic interpretability moving from research to production deploymentReal-time model steering and control becoming commercially viableAI interpretability as a solution for post-training model issuesGrowing demand for AI transparency in high-stakes industries like healthcareSparse autoencoders (SAEs) limitations driving next-generation interpretability methodsIntegration of interpretability techniques into model training pipelinesInterpretability applications expanding beyond language models to vision and scientific modelsIncreased investment and commercial interest in AI safety and control technologies

Topics

Mechanistic Interpretability Model Steering and Control Sparse Autoencoders (SAEs)AI Safety and Alignment Healthcare AI Applications Scientific Discovery AI Post-training Model Optimization Real-time Model Modification AI Bias Detection and Removal Hallucination Detection Enterprise AI Guardrails Foundation Model Interpretability AI Human Interface Design Scalable AI Oversight Production AI Deployment

Companies

Goodfire AI

Featured company that raised $150M Series A at $1.25B valuation for AI interpretability research

Anthropic

Referenced for pioneering interpretability research and SAE work on language models

Palantir

Mark's previous employer where he worked as forward deployed engineer on healthcare team

Two Sigma

Myra's previous employer where she worked as software engineer before joining Goodfire

Rakuten

Production customer using Goodfire's PII detection system for e-commerce platform guardrails

Mayo Clinic

Healthcare partner using Goodfire's interpretability for medical AI model analysis

OpenAI

Referenced for GPT-4 'laziness' controversy and weak-to-strong generalization research

DeepMind

Mentioned for open-sourcing SAE work on Gemma models and interpretability research

People

Myra Deng

Head of Product at Goodfire AI, former Two Sigma software engineer

Mark Bissell

Member of Technical Staff at Goodfire AI, former Palantir healthcare engineer

Tom

Chief Scientist at Goodfire AI who works with customers on real-world problems

Ekdeep

Goodfire researcher specializing in steering techniques and equivalence with prompting

Lee Sharkey

Author of 'Open Problems in Interpretability' paper, works at Goodfire's London office

Ted Chiang

Science fiction author whose work inspires AI interpretability thinking, wrote 'Exhalation'

Quotes

"Goodfire is an AI research lab that focuses on using interpretability to understand, learn from and design AI models. We really believe that interpretability will unlock the new generation next frontier of safe and powerful AI models."

Myra Deng

"Nobody knows what's going on. Subliminal learning is just an insane concept when you think about it. Train a model on not even the logits, literally the output text of a bunch of random numbers. And now your model loves owls."

Unknown

"We don't want to be in the world where steering is only useful for like stylistic things. The types of interventions that you need to do to get to things like legal reasoning are much more sophisticated and require breakthroughs in learning algorithms."

Myra Deng

"If you believe in scaling, then you need to know where to scale. But if you believe in double descent, then you don't believe in anything."

Unknown

Full Transcript

4 Speakers

Speaker A

So welcome to the Latent Space five. We're back in the studio with our special Mechinterp co host, Vibu. Welcome.