What's Missing Between LLMs and AGI - Vishal Misra & Martin Casado
Columbia professor Vishal Misra explains his mathematical proof that LLMs perform precise Bayesian inference, updating probability distributions as they process new information. He argues that while current AI excels at pattern matching and correlation, achieving AGI requires two breakthroughs: continual learning capabilities and the ability to move from correlation to causation.
- LLMs can be mathematically modeled as giant sparse matrices where each row represents a prompt and columns show probability distributions over possible next tokens
- Transformers perform mathematically perfect Bayesian inference, matching theoretical predictions to 10^-3 bits accuracy in controlled experiments
- Current AI architectures are limited to correlation-based learning (Shannon entropy) and cannot discover new causal models (Kolmogorov complexity) like Einstein's relativity theory
- AGI requires two fundamental advances: plasticity for continual learning after training, and the ability to build causal models rather than just pattern matching
- The 'Einstein test' - training an LLM on pre-1916 physics to see if it discovers relativity - represents a high bar for true AGI capabilities
"You take an LLM and train it on pre1916 or 1911 physics and see if it can come up with the theory of relativity. If it does, then we have AGI."
"They are grains of silicon doing matrix multiplication. They don't have consciousness. They don't have an inner monologue."
"Scale will not solve everything. You need a different kind of architecture."
"Deep learning is still in the Shannon Entropy world. It has not crossed over to the Kolmogorov complexity and the causal world."
"The transformer got the precise Bayesian posterior down to 10 to the power minus 3 bits accuracy. It was matching the distribution perfectly."
Anthropic makes great products. Claude Code is fantastic, Cowork is fantastic. But they are grains of silicon doing matrix multiplication. They don't have consciousness. They don't have an inner monologue. You take an LLM and train it on pre-1916 or 1911 physics and see if it can come up with the theory of relativity. If it does, then we have AGI.
0:00
Just today, by the way, Dario allegedly said that you can't rule out that they're conscious.
0:21
You can rule out that they're conscious. Come on. To get to what's called AGI, I think there are two things that
0:27
need to happen. Five years ago, Vishal Misra got GPT-3 to translate natural language into a domain-specific language it had never seen before. It worked. He had no idea why. So he set out to build a mathematical model of how LLMs actually function. The result? A series of papers showing that transformers update their predictions in a precise, mathematically predictable way. In controlled experiments, the models match the theoretically correct answer almost perfectly. But pattern matching is not intelligence. LLMs learn correlation; they don't build models of cause and effect. To get to AGI, Misra argues, we need the ability to keep learning after training and the ability to move from correlation to causation. Martin Casado speaks with Vishal Misra, professor and Vice Dean of Computing and AI at Columbia University.
0:33
Vishal, it's great to have you in.
1:28
Great to be back.
1:30
This is one of my favorite topics, which is how do LLMs actually work? And I think that in my opinion, you've done kind of the best work on this, modeling it out.
1:31
Thank you.
1:39
For those that did not see the original one, maybe it's probably worth doing just a quick background on kind of what led you to this point, and then we'll just go into the current work that you've been doing.
1:39
Five years ago, when GPT-3 was first released, I got early access to it and I started playing with it. I was trying to solve a problem related to querying a cricket database, and I got GPT-3 to do in-context learning, few-shot learning, and it was, at least to me, the first known implementation of RAG, retrieval-augmented generation, which I used to solve this problem of querying: getting GPT-3 to translate natural language into something that could be used to query a database that GPT-3 had no idea about. I had no access to GPT-3's internals, but I was still able to use it to solve that problem. So it worked beautifully. We deployed this in production at ESPN on September 21st.
1:50
But you did the first implementation of RAG in 2021.
2:40
No, no, no, in 2020.
2:44
20.
2:45
20, 2020. I got it working. And by the time you talked to all the lawyers at ESPN and productionized it, it took a while, but by October 2020 we had it. Well, I had this architecture working. But after I got it to work, I was amazed that it worked. I wanted to understand how it worked. And I looked at the Attention Is All You Need paper and all the other sort of deep learning architecture papers, and I couldn't understand why it worked. So then I started getting sort of deep into building a mathematical model.
2:46
Yeah. And now you've published a series of papers. The first one that I read was the one where you had kind of your matrix abstraction. So maybe we'll talk about that and then we'll talk about the more recent work. So perhaps we'll just start with the first one, which is you're trying to come up with a mathematical model of how LLMs work.
3:19
Yeah.
3:37
And you have, which is very helpful to me. And at the time you were actually trying to figure out how in-context learning was working.
3:37
Yes.
3:42
Yeah. And you came up with an abstraction for LLMs, which is basically this very large matrix, and you use that to describe. So maybe you can kind of walk through that work very quick.
3:43
Sure, yeah. So what you do is you imagine this huge gigantic matrix where every row of the matrix corresponds to a prompt. And the way these LLMs work is given a prompt, they construct a distribution of probabilities of the next token. Next token is next word. So every LLM has a vocabulary, GPT and its variants have a vocabulary of about 50,000 tokens. So given a prompt, it'll come up with a distribution of what the next token should be. And then all these models sample from that distribution.
3:50
So that's the posterior distribution.
4:25
That's the posterior distribution. Right. That's how LLMs work. And so the idea of this matrix is for every possible combination of tokens, which is a prompt, there's a row and the columns are a distribution over the vocabulary. So if you have a vocabulary of 50,000 possible tokens, it's a distribution over those 50,000 tokens.
4:26
And by distribution, it's just the probability.
4:44
Just the probability. Sorry. Yeah, just the probability that the next token should be this versus that. So that's sort of the idea. And when you start viewing it that way, it makes things at least clearer to people like me who want to model what's happening. So concretely, let's say you have an example: let's say your prompt is just one word, protein.
4:46
Yeah.
5:08
So if you look at the distribution of the next word, the next token after that, most of the probabilities would be zero, but you'd have non-zero, non-trivial probabilities on, let's say, two words. One is synthesis, the other is shake. Right. And now the LLM is going to sample this next token and may pick synthesis or shake, or you as a human will give the prompt protein shake or protein synthesis. Now, depending on whether you pick synthesis or shake, that row looks very different. Right. If you pick protein synthesis, the terms that would have a high probability would all be concerned with biology. But if you pick protein shake, it'll all be about gym and exercise and all bodybuilding stuff. So that synthesis or shake completely changes what comes next. Yeah, so this is an example of, you can say, Bayesian updating. You start with protein, you have a prior that after protein this is going to happen. As soon as you get new evidence, that the next term is synthesis or shake, you completely update the distribution. So now you can imagine that the entirety of LLMs is this giant matrix where you have every row: protein shake, protein synthesis, the cat sat on the, Humpty Dumpty, blah, blah, blah. Now, given the vocabulary of these LLMs, let's say 50,000, and the context window. So GPT, for instance, ChatGPT, the first version had a context window of 8,000 tokens. If you look at all possible combinations of 8,000 tokens and a 50,000 vocabulary, the number of rows in this matrix is more than the number of electrons across all galaxies. Right. So there's no way that these LLMs can represent it exactly. Now, fortunately, this matrix is very sparse. Why? Because an arbitrary combination of these tokens is gibberish. We are never going to use that in real life. Also, the columns are mainly zero. Right. If you have protein, then you won't have, you know, arbitrary numbers or arbitrary words after that. It's very sparse, both in rows and in columns. So in kind of an abstract way, what all these LLMs are doing is coming up with a compressed representation of this matrix. And when you give a prompt, they try to approximate what the true distribution should have been and try to generate it. That's what, in my mind, at least,
5:09
it boils down to. And just from my understanding: so if you have a row for protein and then you have one for protein shake, is protein shake a subset of protein, or is it different?
7:53
It's different, it's a continuation from it.
8:10
I see, yeah, right. No, but I'm just saying, like, the actual posterior distribution, is that a subset?
8:11
You can say it's a subset. Right. If you have protein, then protein shake and protein synthesis are all continuations from protein. So both synthesis and shake have non zero probabilities. So you can, yeah, you can think of it as somewhat a subset. Right.
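A minimal sketch of the matrix view he describes, in Python. The prompts, probability values, and vocabulary here are invented for illustration; real models never store this matrix explicitly, they learn a compressed approximation of it:

```python
# Toy version of the "giant sparse matrix" view of an LLM:
# each row is a prompt, and the row holds a sparse probability
# distribution over the next token (illustrative numbers only).
next_token_matrix = {
    ("protein",): {"synthesis": 0.5, "shake": 0.5},
    ("protein", "shake"): {"recipe": 0.4, "gym": 0.3, "for": 0.3},
    ("protein", "synthesis"): {"occurs": 0.5, "in": 0.3, "ribosome": 0.2},
}

def next_token_distribution(prompt):
    """Look up the (sparse) next-token distribution for a prompt."""
    return next_token_matrix.get(tuple(prompt), {})

# Extending the prompt by one token is the Bayesian update he describes:
# the row for ("protein", "shake") is a completely different distribution
# than the row for ("protein",).
print(next_token_distribution(["protein"]))
print(next_token_distribution(["protein", "shake"]))
```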
8:17
You use this approach to describe how in context learning works. And so maybe first describe what in context learning is and then kind of the conclusion that you came from that.
8:33
So in-context learning is when you show the LLM something it has kind of never seen before. You give it a few examples of: this is what you want, this is what you're trying to do. Then you give it a new problem which is related to the examples that you've shown, and the LLM learns in real time what it's supposed to do and solves the problem.
8:44
By the way, the first time I saw this, it absolutely blew my mind. I actually used your DSL when I was, like, first learning about it. So maybe the DSL thing is just... I don't see how this works at all.
9:08
It's absolutely mind-blowing that it works. And so going back to that cricket problem: you know, in the mid-90s, I was part of a group that had created this cricket portal called Cricinfo. Yeah, cricket is a very stat-rich sport. Think baseball multiplied by a thousand, and it has all kinds of stats. And we had created this online searchable database called StatsGuru, where you could search for anything, any stat related to cricket. And it's been available since 2000. Yeah, but because you can query for anything, everything was made available. And how do you make something like that available to the general public? Well, they're not going to write SQL queries. The next best thing at that time was to create a web form. Unfortunately, everything was crammed into that web form. So as a result, you had like 20 dropdowns, 15 checkboxes, 18 different text fields. It looked like a very complicated, daunting interface. So as a result, even though it could solve or it could answer any query, almost no one used it. A vanishingly small percentage of cricket fans used it because it just looked intimidating. And then ESPN bought that site in 2007. I still know the people who run the site, and I always told them, you know, why don't you do something about StatsGuru? And in January 2020, the editor in chief of Cricinfo, Sambit Bal, who's a friend, came to New York, and we had gone out for drinks, and again I told him, you know, why don't you do something about StatsGuru? So he looks at me and says, why don't you do something about StatsGuru? He was joking. But that idea kind of stayed with me. And when GPT-3 was released, I thought maybe I could use GPT-3 to create a front end for StatsGuru. And so what I did was I designed a DSL, a domain-specific language, and converted queries about cricket stats in natural language into this DSL.
9:20
Now, and to be clear, you created this. It wasn't like part of any training, no training online, that it could
11:20
have seen, nothing GPT could have seen. I created it. I thought, okay, this makes sense. So I designed that DSL and then I did that few-shot learning thing. So I created a database of, I would say, about 1,500 natural language queries and the DSL corresponding to each query. So when a new query came in, somebody asking a stats question in English, what I would do is go through the natural language queries, do a semantic search, pick the most closely matching top few, and then use those natural language queries and their DSL and send that as a prefix. Now, GPT-3, if you recall, had a context window of only 2,000 tokens. So you had to be very judicious about which examples you picked. But you pick those, and then you send the new query, and GPT-3 would complete it in the DSL that I had designed, which until milliseconds ago it had never seen. And I had no access to the internals of GPT-3, I had no access to the weights, but still it worked. So that's how.
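Roughly the retrieval-augmented few-shot pipeline he describes, sketched in Python. The embedding function, similarity measure, prompt format, and token budget are placeholders for illustration, not his actual implementation:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: any sentence-embedding model could be used here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_prompt(new_query, examples, k=5, token_budget=2000):
    """Pick the k most similar (question, dsl) pairs that fit the context
    window, then append the new query for the model to complete."""
    q = embed(new_query)
    ranked = sorted(examples,
                    key=lambda ex: cosine(q, embed(ex["question"])),
                    reverse=True)
    prompt, used = "", 0
    for ex in ranked[:k]:
        block = f"Q: {ex['question']}\nDSL: {ex['dsl']}\n\n"
        used += len(block.split())          # crude stand-in for a token count
        if used > token_budget:
            break
        prompt += block
    return prompt + f"Q: {new_query}\nDSL:"

# The assembled prompt is then sent to the model (e.g. GPT-3), which
# continues it in the DSL it has only ever seen inside this context.
```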
11:26
So it's not obvious to me, given your matrix example of a prompt and then a distribution, how something like in-context learning would work. And so I think your first paper tackled this problem, right? And so maybe you could walk through your understanding of how LLMs do in-context learning.
12:34
Yeah. So when you think about what in-context learning is, it's that as you see evidence... So, you know, in the first paper, what I also did was I took this cricket DSL example and I depicted the next-token probabilities of the model as it was shown more and more examples. So the first time you show it this DSL, the natural language and the DSL, the probabilities of the DSL tokens were extremely low, because GPT-3 had never seen this thing. When it saw the cricket question, in its mind it was trying to continue it with an English answer. So the probabilities that were high were all English words. Once it saw my prompt where I had the question and the DSL, and then the next question in the next row, the probabilities of the DSL tokens started going up. With every example, they went up. And finally, when I gave the new query, it had almost 100% probability of getting the right token. So this is an example of, in real time, the model updating its posterior probability. It was updating its knowledge that, okay, I've seen evidence, this is what I'm supposed to do now. This is a colloquial way of saying what Bayesian inference is. Bayesian updating basically is: you start with a prior; when you see new evidence, you update your posterior. That's the mathematical definition. But in English, it's basically: you see new evidence, you update your belief about what's happening. So it was clear to me that LLMs are doing something which resembles Bayesian updating. So in that first paper, I had this matrix formulation, and I showed that what it's doing looks like Bayesian updating. Then we can come to the sort of next series of papers.
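In the Bayesian language he is using, each in-context example is a piece of evidence that moves the model from a prior over "what am I being asked to do" to a posterior. A standard way to write the update he is describing (notation mine, not taken from the papers):

```latex
\underbrace{P(\text{task} \mid e_1,\dots,e_n)}_{\text{posterior after } n \text{ examples}}
\;\propto\;
\underbrace{P(e_n \mid \text{task})}_{\text{likelihood of the newest example}}
\;\times\;
\underbrace{P(\text{task} \mid e_1,\dots,e_{n-1})}_{\text{prior before it}}
```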
12:58
That's right. So, okay, so, I mean, it seemed pretty conclusive to me at that time. And then you went quiet for a while, and then I still remember the WhatsApp text. You said, Martin, I know exactly how these things are working now. And then, listen, you dropped a series of papers that kind of broke the Internet. Like, you went super viral on Twitter. I mean, people really noticed. And so I want to get to that in just a second. But before that, I remember when your first paper came out, people would be like, you know, these things are definitely not Bayesian. Like, you know, anything could be considered to be Bayesian, but they're not. Like, why do you think there was this reaction of, you know, there's something new, they're not Bayesian? I mean, I felt like there was almost kind of a backlash just because they're being characterized as. Yeah, yeah.
15:04
I think in this whole world of probability and machine learning, there have been camps of Bayesians and frequentists. And I don't want to get in the middle of that sort of political battle, but Bayesian has become, like... almost like people had a reaction to that. It's part of that war.
15:55
I see. So it's like the old Bayesian frequentist type battle.
16:15
Yeah. So people just had, oh no, you can say anything is Bayesian. Right. So I said, okay, maybe they have a point. Maybe what we are saying is not really Bayesian. How do we prove that it's Bayesian? So then, first, I have to thank Andreessen Horowitz for this. You know, when I said that in my first paper I showed these probabilities, it was because OpenAI had in its interface this option to display those probabilities. Then they stopped. So we could not peer inside what's happening. For some reason they stopped. OpenAI. I'm not going to get into the open and closed joke, but they stopped. So then we developed our own interface which could let you look not only at the probabilities, but also the entropy of the next token.
16:20
Was this on top of an open source model?
17:16
Yeah. So you can load any sort of open-source model, but being in academia, we didn't have access to compute. Thanks to your generous donation, we got the clusters to run what's called TokenProbe. So you can go to tokenprobe.cs.columbia.edu.
17:18
Is it still running?
17:36
It's still running. It's still running and people come to it. I use it in my classes to get students to do assignments. They write their own DSLs and they say that it really helps them understand how these LLMs work.
17:37
So literally, my understanding of LLMs came from TokenProbe. You sit there and just look at the distribution as you fill out a prompt. It's actually very, very enlightening. So for those of you that are listening, what's the URL again?
17:50
tokenprobe.cs.columbia.edu.
18:03
Yeah, check it out. It's a very, very useful way to actually see how the probability distribution gets updated as you fill out a prompt.
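For a flavor of what a tool like TokenProbe shows, here is a hedged sketch using an open-weights model via Hugging Face transformers. This is not TokenProbe's code, just the same idea: next-token probabilities and their entropy for a given prompt:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # any open-weights model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_token_report(prompt: str, top_k: int = 5):
    """Return the entropy (in bits) of the next-token distribution
    and the top_k most likely continuations."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]            # logits for the next token
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log2(probs + 1e-12)).sum().item()
    top = torch.topk(probs, top_k)
    return entropy, [(tok.decode(int(i)), p.item()) for p, i in zip(top.values, top.indices)]

# Watch the distribution change as the prompt is extended.
print(next_token_report("The protein"))
print(next_token_report("The protein shake"))
```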
18:07
Right. But then I cheated. Oh, I, you know, it was running, but I also had access to the GPUs that were powering it. And then along with colleagues at Columbia and one of them now is at DeepMind, we started to sort of think about how do you really prove that it's Bayesian? To prove.
18:16
Can you just explain it? Actually, I actually don't know the answer to this. Yeah, it seemed to me you proved it in the first paper. Like what was missing?
18:43
Well, in the first paper we showed it, it was empirical, and you could see.
18:50
I see, I see.
18:55
You could see.
18:55
Not a mathematical one. Because it was obvious to me that
18:56
it was even obvious to me, but to convince, you could say, you know, people who dismiss it, who say anything can be Bayesian.
18:58
I see, I see.
19:06
We had to show it precisely mathematically. Got it. So then we came up with this idea. My colleagues Naman Agarwal and Siddharth Dalal, the series of papers were written with them. We came up with this idea of a Bayesian wind tunnel. So what's a wind tunnel? Well, a wind tunnel in the aerospace industry is where you test an aircraft in an isolated environment. You don't fly it; you test it against all sorts of aerodynamic pressures and you see what it will withstand, what kind of altitude, pressure, blah, blah, blah. And you don't want to do that testing up in the air. So we said, okay, why don't we create an environment where we take these architectures, and we tested transformers, Mamba, LSTMs, MLPs, all architectures. We say, why don't we take a blank architecture, give it a task where it's impossible for the architecture to memorize what the solution to that task should be. The space is combinatorially impossible given the number of parameters, and we took very small models. So it's difficult enough that they cannot memorize it, but it's tractable enough that we know precisely what the Bayesian posterior should be. You can calculate it analytically. So we gave these models a bunch of tasks where, again, we show that it's impossible to memorize. We trained these models and we found that the transformer got the precise Bayesian posterior down to 10 to the power minus 3 bits accuracy. It was matching the distribution perfectly. So it is actually doing Bayesian inference in the mathematical sense, given a task where it has to update its belief. Mamba also does it reasonably well. LSTMs can do some of the things. So in the papers we have a taxonomy of Bayesian tasks. The transformer does everything, Mamba does most of it, LSTMs do only part, and MLPs fail completely.
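The comparison he describes can be pictured with a toy task where the true Bayesian posterior is known in closed form. The sketch below uses a coin-weight (Beta-Bernoulli) task purely for illustration, not one of the actual wind-tunnel tasks, and measures how far a model's predicted distribution sits from the analytic posterior, in bits:

```python
import math

def analytic_posterior_predictive(heads: int, tails: int, a: float = 1.0, b: float = 1.0) -> float:
    """Beta-Bernoulli task: after observing `heads` and `tails`,
    the exact Bayesian probability that the next flip is heads."""
    return (heads + a) / (heads + tails + a + b)

def kl_bits(p_true: float, p_model: float) -> float:
    """KL divergence between two Bernoulli distributions, in bits."""
    kl = 0.0
    for pt, pm in ((p_true, p_model), (1 - p_true, 1 - p_model)):
        if pt > 0:
            kl += pt * math.log2(pt / pm)
    return kl

p_true = analytic_posterior_predictive(heads=7, tails=3)
p_model = 0.666   # stand-in for a trained model's next-token probability
print(f"gap from exact posterior: {kl_bits(p_true, p_model):.6f} bits")
# The papers report transformers closing this kind of gap to ~1e-3 bits on their tasks.
```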
19:07
So is this a reflection of the data that it's trained on, or is it more a reflection of the mechanism?
21:18
It's the mechanism, it's the architecture. The data decides what tasks it learns. So in the first paper, we had these Bayesian wind tunnels, and we showed that it's doing the job on different tasks. In the second paper, we show why it does it. So we look at the transformers, we look at the gradients, and we show how the gradients actually shape this geometry, which enables this Bayesian updating to happen. Then in the third paper, what we did is we took these frontier production LLMs which have open weights, so that we could look inside them, and we did our testing, and we saw that the geometries that we saw in the small models persisted in models which are, you know, hundreds of millions of parameters; the same signature existed. The only thing is that because they are trained on all sorts of data, it's a little bit dirty or messy, but you can see the same structure. So the whole idea behind the Bayesian wind tunnel was that, unlike these production LLMs, where you don't know what they have been trained on and so you cannot mathematically compute the posterior... again, how do you prove it? I mean, it looks Bayesian from the first paper. From the first paper, it looks Bayesian, but, you know... so the wind tunnel sort of solved that problem for us. We said, okay, let's start with a blank architecture, give it a task where we know what the answer is and it cannot memorize it, and let's see what it does.
21:27
So do you think this provides any sort of, like, indication of how humans think, or do you think that these things are totally independent?
22:59
No, no. It does provide. Right. So, you know, human beings also update our beliefs as we see new evidence. Right? So we do, in some sense, Bayesian updating, but we do something more than that. I'll come to that. But these transformers, or even mamba do this Bayesian updating. But the difference with humans is we'll update our posterior when we see some new evidence. But the way our brains have evolved over hundreds of millions of years is our optimization objective has been don't die and reproduce.
23:06
Right?
23:53
That's been sort of the driving force. And our brains have learned to adjust. And so when we see some danger, there's something rustling in that bush, don't go near. We know how to react to that danger. We know how to save ourselves. We internalize that learning, and our brain cells, our synapses, remain plastic throughout our lifetime. What happens with LLMs is, once the training is done, those weights are frozen. When you're doing an inference, for instance in-context learning or anything during that conversation, okay, you're doing Bayesian inference. But then you forget; the next time a new conversation starts with zero context, you don't retain any learning that happened in the previous instance. So, for instance, with the cricket DSL that I was doing, every invocation of it was fresh. It did not remember the last time I sent a query what the DSL looked like. So that's one difference between how humans use sort of Bayesian updating, which is we remain plastic all our lives, whereas LLMs are frozen. And there's another sort of difference, which, if you want me to get into, tell me.
23:54
Yeah, yeah, yeah, yeah.
25:20
So the other difference is, well, first, you know, our objective is don't die, reproduce. The LLM's objective is: predict the next token as accurately as possible. Right. So all these scary stories that you read about, that, oh, the LLM tried to deceive and it tried to prevent itself from being shut down, that's not a function of the architecture, that's a function of the training data. It has been fed articles on Reddit or Asimov or whatever.
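The "predict the next token as accurately as possible" objective he contrasts with "don't die, reproduce" is the standard autoregressive cross-entropy loss; written out in standard notation (not taken from the papers):

```latex
\mathcal{L}(\theta) \;=\; -\sum_{t=1}^{T} \log p_{\theta}\!\left(x_t \mid x_1, \dots, x_{t-1}\right)
```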
25:21
I mean, just today, by the way, Dario allegedly said that you can't rule out that they're conscious.
25:56
You can rule out that they're conscious. I mean, come on. As I said, you know, Anthropic makes great products. Claude Code is fantastic. Cowork is fantastic. But they are grains of silicon doing matrix multiplication. They don't have consciousness, they don't have an inner monologue. They don't. They're not driven by the same objective function, don't die, reproduce. Right. They're driven by don't make a mistake on the next token. And that's driven entirely by the training data. Right. You train the LLM with stories from Asimov or Reddit where, you know, to survive it's going to do this or that, it'll reproduce that. So it's a reflection, it's not a mind.
26:04
And the results, just to say it for the 10th time, are perfectly Bayesian.
26:48
Perfectly, yeah.
26:54
To the digit.
26:56
To the digit, yeah. I mean, I trained it for 150,000 steps and the accuracy was 10 to the power minus three bits. I could have trained it for, you know... And this happened in half an hour on the infrastructure that you provided for TokenProbe; in the background I could use those GPUs to train. So thank you again for that. But so now, human beings, coming back to it. We are Bayesian, but we do something else. You know, when I throw this pen at you, what will you do?
26:57
Dodge it.
27:28
Dodge it?
27:28
Yeah.
27:29
Why will you dodge it?
27:29
To avoid being hit.
27:32
Avoid being hit. But your head is not doing a Bayesian calculation of, okay, this pen is coming, the probability that it hits me, it'll cause this much pain, or all that. What you're essentially doing in your head is a simulation. You see the pen coming and you know that it'll come and hit you. Your mind simulates and you dodge it. Right. So all of deep learning is doing correlations, it's not doing causation. Causal models are the ones that are able to do simulations and interventions. So, you know, Judea Pearl has this whole causal hierarchy, where the first level in the hierarchy is association, which is you build these correlation models. Deep learning is beautiful. It's extremely powerful. I mean, you see every day all these models are, like, amazingly good. They do association. The second level in the hierarchy is intervention. Deep learning models do not do that. The third is counterfactuals. So both intervention and counterfactuals, you can imagine, are some sort of simulation. You build a causal model of what's happening, and then you are able to simulate. So our brains do that. The current architectures don't do that. Another example which I think will make it clear is the difference between, I'll use the technical terms, Shannon entropy and Kolmogorov complexity. So if you look at the Shannon entropy of the digits of pi, it's infinite. It's impossible to predict and learn what digit will come next. So that's the definition of Shannon entropy. And Shannon entropy sort of tries to build a correlation; it tries to learn the correlation. Deep learning does the Shannon entropy part. Kolmogorov complexity, on the other hand, is the length of the shortest program which will reproduce the string in question. Now, the programs to get the digits of pi are very small. Thanks to Ramanujan, there are all sorts of really small programs that can reproduce it exactly. So the Kolmogorov complexity of pi is very small; its Shannon entropy is infinite. I think deep learning is still in the Shannon entropy world. It has not crossed over to the Kolmogorov complexity and the causal world.
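His pi example can be made concrete: the digit stream looks statistically unpredictable (the Shannon side), yet a few lines of integer arithmetic generate it exactly (the Kolmogorov side). A sketch using Machin's arctangent formula rather than a Ramanujan series, chosen purely for brevity:

```python
def arccot(x: int, unity: int) -> int:
    # arccot(x) = 1/x - 1/(3x^3) + 1/(5x^5) - ..., scaled up by `unity`
    total = term = unity // x
    n, sign = 3, -1
    while term:
        term //= x * x
        total += sign * (term // n)
        n, sign = n + 2, -sign
    return total

def pi_digits(ndigits: int) -> str:
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    unity = 10 ** (ndigits + 10)                 # 10 guard digits
    pi = 4 * (4 * arccot(5, unity) - arccot(239, unity))
    return str(pi // 10 ** 10)                   # "314159265358979..."

print(pi_digits(50))
# A tiny program reproduces the digits exactly (low Kolmogorov complexity),
# even though a pure next-digit predictor can do no better than chance on them.
```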
27:33
Wow, interesting.
30:14
Right? So
30:15
to what extent do you think this provides us research directions to kind of improve the state of the art? So let me just give you a specific example you talked about. Human beings don't actually update, you know, the matrix. They don't kind of update their weights. But right now there's a lot of research on continual learning. Yeah. You know, so does your work provide some guidance of how you might approach those problems? And in particular, I've always had this question, which is, we use so much data and so much compute to create these models. Like, is it even reasonable to think that you could update the weights and actually have a meaningful impact, you know, in real time? I mean, it just seems like you just need so much more data in order to do that. So can you start answering these questions?
30:17
You can start answering some of these questions. And one of the misconceptions that exists today is that scale will solve everything. Scale will not solve everything. You need a different kind of architecture. And this continual learning is a difficult problem. You have to balance the fact that you learn something new against the risk of catastrophic forgetting. If you update the weights and you forget what was important and what you have already learned, then you're not making progress. Then it'll just be some sort of random, chaotic model. So to solve that problem is difficult. That's one aspect of it. So to get to what is called AGI, I think there are two things that need to happen. One is this plasticity, which has to be implemented through continual learning. Secondly, we have to move from correlation to causation. That's...
31:04
How much is this similar to what Yann LeCun talks about? So, Yann LeCun: causality, planning, predicting how your action would...
32:02
It is related. He's coming at it from a different angle than the Judea Pearl model, but it is related. The other thing is, the first time I came on this podcast, I mentioned this test of AGI, the Einstein test, if you remember. So I said, you take an LLM and train it on pre-1916 or 1911 physics and see if it can come up with the theory of relativity. If it does, then we have AGI. I mean, it's a high bar, but, you know, we should have high bars. It won't. And this is the same test that I think Demis mentioned at the India AI summit a couple of weeks ago. It's created a lot of news. But why? Why is that? And how is that related to this idea of Shannon versus Kolmogorov? So at the time of Einstein, there were a lot of clues that with Newtonian mechanics, there was something missing. People knew that Mercury's orbit didn't make sense. There was something off about it. Then there were these experiments done, the Michelson-Morley experiments, where they were trying to figure out this medium called the ether through which light travels. And they felt that if you bounce light in different directions, the speed might change and they could detect a change in the speed of light. They tried several experiments. They had really precise instruments which could measure the speed, and they found nothing. They found that the speed of light did not change at all. Then there was the whole issue of black holes, then gravitational lensing. So there were a lot of these signs that Newtonian mechanics was not really explaining everything. But until Einstein came up with a new representation of the space-time continuum, we were stuck. So if you had a model that just looked at correlations and saw all of these pieces of individual evidence put together, it would not have come up with the beautiful equation that Einstein came up with. You know, I'm forgetting exactly what it is, G mu nu equals 8 pi T mu nu, something like that, the equation of the space-time continuum, the tensor. So he came up with a new formulation. He kind of rejected the existing axioms. He came up with a very short Kolmogorov representation of the world. One equation. From that equation, everything else follows. Right. Whether you're talking about gravitational waves or black holes or Mercury, or how GPS works. You know, GPS, the GPS that we use every day in our phones, it uses the equations of relativity.
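The equation he is reaching for is the Einstein field equation. In geometrized units (G = c = 1) it reads the way he half-remembers it; the more general form includes the cosmological constant:

```latex
G_{\mu\nu} \;=\; 8\pi\, T_{\mu\nu},
\qquad\text{or more generally}\qquad
G_{\mu\nu} + \Lambda\, g_{\mu\nu} \;=\; \frac{8\pi G}{c^{4}}\, T_{\mu\nu}
```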
32:13
So does this end up becoming like. You almost have to ignore the majority of previous data in order to do it, which LLMs can't because they're trained on the majority of previous data. It's like you almost have this kind of data gravity that's pulling you back. It's like everybody said it's X. There's a little bit of evidence that it's Y. But because everybody said it's X, the LLM will always say it's X.
35:27
It'll always say X, and treat that Y as an anomaly.
35:55
Actually, that's a very nice way to say it. Okay, now I get your Shannon entropy versus Kolmogorov complexity distinction. Like, one of them is the total amount of information there, and you will always be bound to the total amount of information there, which is what happens right now.
35:58
Yeah.
36:15
Where you can actually describe
36:16
another.
36:20
Another model. You can describe everything with a shorter description with the new data, which would be a totally different model, which would
36:21
be like, you need a new representation. Right. Yeah.
36:29
You know, another way that I've always thought about these, and I thought you articulated it well the last time we talked about it, which is: the universe is this very, very complex space. And then, you know, somehow humans map it into a manifold that's less complex, and then that gets kind of written down, and then the LLM... So that's kind of some distribution. Some... You know, it's still a very large space, but it's a bounded space. And the LLMs learn that manifold, and then they kind of use, you know, Bayesian inference to move up and down that manifold, but they're kind of bound to that manifold.
36:32
Yeah.
37:07
And then again, I don't want to put words in your mouth, but like, what they can't do is generate a new manifold, which requires understanding the way that the universe works and then coming up with a new representation of the universe.
37:08
And this is what relativity is, right?
37:19
Yeah, exactly.
37:21
Einstein had to create a new manifold. If you just stuck with the old manifold of the Newtonian physics, then you would see these correlations, but you could not come up with a manifold that explained them. So you need to come up with a new representation. So to me, there are lots of definitions of AGI. Turing tests, we have already passed that. Performing economically useful work every day. You see LLMs are doing that.
37:21
Do we? I don't know.
37:49
No, I mean they are.
37:50
I mean without human intervention.
37:51
No, no, no. So that, that's different. But still, you know, it's like a car can run faster than humans, right?
37:53
I mean that's a, that's the. Yeah, that's a. Yeah, that's a very shallow definition.
37:59
Yeah.
38:03
So all these definitions do useful, you
38:03
know, maybe, you know, in six months you'll have Claude or Gemini do, without intervention, coding tasks which are well defined and well scoped. But to me, AGI will happen when these two problems get solved: plasticity, continual learning properly, and building a causal model in a more data-efficient manner.
38:05
We are hearing people now talking about, you know, seeing generality. Like Donald Knuth, for example, in the last few days, right, had this, you know, aha moment that apparently kind of went viral on X. So do you think that suggests that we're seeing generality, or...
38:34
No, no, no. So that actually to me it validates what I've been talking about for a while now.
38:52
How so?
38:59
So, if you read what he did with the help of, you know, a colleague: he got the LLMs to solve this particular problem of finding Hamiltonian cycles for odd numbers, we won't get into that. And he got the LLMs to keep solving for one odd number after the other, right. What he also got it to do is, after it found a solution for a particular value of M, he made the LLM update its memory with exactly what it learned in solving that problem. So the LLM tried many different things, something worked, update the memory. So that's kind of like hacking together plasticity.
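The "hacked-together plasticity" he describes amounts to a loop roughly like the sketch below. The `call_llm` and `looks_valid` functions and the prompt format are placeholders for illustration, not Knuth's actual setup:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for whatever chat model is being used."""
    raise NotImplementedError

def looks_valid(attempt: str, m: int) -> bool:
    """Placeholder for the problem-specific verifier."""
    return "cycle" in attempt.lower()

def solve_with_persistent_memory(cases, max_attempts=5):
    memory = ""                                   # notes carried across problems
    solutions = {}
    for m in cases:                               # e.g. successive odd numbers
        for _ in range(max_attempts):
            prompt = (f"Lessons learned so far:\n{memory}\n\n"
                      f"Find a Hamiltonian cycle for the case m={m}.")
            attempt = call_llm(prompt)
            if looks_valid(attempt, m):
                solutions[m] = attempt
                # The weights never change; only this context-level memory does.
                memory += call_llm(f"Summarize what worked for m={m}:\n{attempt}\n")
                break
    return solutions
```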
39:00
Yeah, right.
39:43
It's learning what it has done as we went along. Again, it's a hacked version of it. You're not changing the weights, you're just sort of improving the context. But as you learned... and even after that, this whole space of Hamiltonian cycles and the associated math is well represented in the manifolds that these LLMs have been trained on. You just had to find the right connection. And LLMs, you know, you throw enough compute at them, they will find the right connection. So Knuth was able to see the LLM's attempts, and eventually it needed him to put together what he saw into a solution. It definitely helped him get to the solution, but he had to create the new sort of manifold to come to the solution. The LLMs were, after a while, stuck. Right. You read what he's written. I mean, it's just hot off the press, I think two days ago. Two days ago. But eventually he used the solutions and he came up with the proof. So it's like Einstein saw all these pieces of evidence, then he thought about what would explain them, and he came up with a causal model. So Knuth and his brain are sort of...
39:44
The Kolmogorov part is the human.
41:13
Right. And the LLMs are extremely efficient at doing the Shannon part of it. It found all the solutions by trying, you know, various things and learning more
41:16
and more clever ways to decompose it. I'm wondering, like, do you think this... again, I'm going to ask the same question again, which is, do you think this provides some sort of insight on, like, the next problem to tackle? Like, is there a mechanism that will get the Kolmogorov complexity or not? Like, is this...
41:25
It tells us which direction to pursue,
41:43
but clearly not how to do it.
41:47
Like not how to do it. But even Kolmogorov complexity has largely remained sort of a theoretical construct.
41:48
Yeah, for sure. There's no algorithm, there's no.
41:54
There haven't been practical implementations of finding the shortest program. We know it exists. You know, you can argue about it, but. So that's where I think it's my bias. That's where our energy should be focused, not larger models with more tokens.
41:58
And can you tie the two things? Like how does that pair with doing simulation? Or is that simulation totally orthogonal?
42:15
No, simulation is related. Right.
42:23
So you think it like, basically you do simulation and somehow that is a step towards doing the Kolmogorov complexity.
42:26
The simulator is the program that we create. It may not be the perfect program.
42:35
Oh, I see.
42:41
But in our heads, we create this simulator that when I'm throwing the pen, you know that it's coming at you.
42:42
Right.
42:46
And you duck. So you're not computing the probabilities as it goes, but you have, you know,
42:47
you build an actual simulation, versus we are talking more conceptually?
42:54
Conceptually. But it's the same mechanism.
42:57
And you think those are the same mechanisms.
42:59
It's the same mechanism.
43:00
Really?
43:01
Yeah. You have to build a causal model.
43:01
Right, I see.
43:04
For most things. Right. So you have to move from correlation to causation. I mean, we've heard this term
43:05
ad
43:13
infinitum, but here it's making a difference in the way we view intelligence.
43:13
How have the last three papers been received?
43:21
No, I don't know. Well, the arXiv versions... let me tell
43:24
you, a lot of great reception. A lot of people read it. I'm just wondering what kind of feedback that you've got.
43:28
I'm getting good feedback, but I'm an outsider in this field.
43:36
Networking guy.
43:40
I'm a networking guy. Why is he writing about learning and machine learning and deep learning and Bayesian inference? But from people who have actually taken the time to read those papers, I'm getting really good feedback. There was a recent paper by Google Research which tried to teach LLMs, by some sort of RLHF, to do Bayesian learning properly. And that's going in this direction. I think people are coming around to the view that, okay, LLMs are doing Bayesian learning. I know that some people also looked at the Bayesian wind tunnel paper, the arXiv version, and they reproduced the experiments. That's great. They just saw what was written and they did the training and they saw, yeah, this is actually happening. So that's great.
43:41
So what's next?
44:25
What's next is, you know, these two parallel tracks. I hope to make progress there. Plasticity and causality.
44:28
Because to date you've taken an existing mechanism.
44:38
Yeah.
44:42
And you've created a formal model of how it works.
44:42
Yeah.
44:44
And so now you're actually interested in improving, creating a new mechanism.
44:45
Yeah, yeah.
44:49
And do you think it's an entirely different architecture or do you think. Do you think LLMs are like, part of the solution?
44:50
I think LLMs are definitely part of the solution. I see. But. But there has to be something more. So, you know, I was not interested in sort of cataloging what all these LLMs can do.
44:56
Yeah.
45:06
I was more interested in why are they and how are they doing it. I think now we have a good grip on the why and how and the next step is to move them to the next level. Now I think we have a fairly good understanding of what the limits are. Now how do you go to the next step?
45:06
Is there an equivalent kind of theoretical framework for causality that applies here? Like, similar to, like Bayesian for inference?
45:28
Well, Judea Pearl's whole causal hierarchy, I think.
45:38
I think that's the right one.
45:41
That's a very good one. You know, the whole do-calculus approach, I think it's a good way to think about it. You know, the sort of association, intervention, counterfactuals. It takes you from correlation to causation and simulation in a mathematical way.
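One concrete piece of the machinery he is pointing to: in Pearl's notation, the interventional quantity differs from the observational one, and under a back-door adjustment the interventional one can be computed from observational data. This is the standard textbook identity, not something specific to this conversation:

```latex
P\!\left(y \mid \mathrm{do}(x)\right) \;=\; \sum_{z} P(y \mid x, z)\, P(z)
\;\;\neq\;\;
P(y \mid x) \;=\; \sum_{z} P(y \mid x, z)\, P(z \mid x)
```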
45:43
That's great. All right, well, listen, really appreciate you coming. This is awesome. So we had you here for the first paper, where you had the empirical results. Then we had you back when you actually had, like, the formal proof. And hopefully the next time you come back you will have a proposal for the mechanism that actually provides the next step.
46:02
Hopefully. Yeah.
46:21
All right. We're working on it.
46:22
Thank you for having me.
46:24
Thanks for listening to this episode of the a16z podcast. If you liked this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X @a16z and subscribe to our Substack at a16z.substack.com. Thanks again for listening, and I'll see you in the next episode. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.
46:29