Machine Learning Street Talk (MLST)

"Vibe Coding is a Slot Machine" - Jeremy Howard

87 min
Mar 3, 2026
Summary

Jeremy Howard, deep learning pioneer and Kaggle grandmaster, discusses the limitations of AI-powered coding, arguing it creates an illusion of productivity while potentially eroding developer competence. He advocates for interactive, notebook-based development environments that keep humans engaged in the learning process rather than delegating cognitive tasks to AI.

Insights
  • AI coding tools create a 'slot machine' effect with illusion of control but minimal actual productivity gains
  • Organizations risk losing institutional knowledge when they delegate cognitive tasks to AI without maintaining human competence
  • Interactive notebook environments enable better human-AI collaboration than traditional coding interfaces
  • Transfer learning principles from ULMFit laid groundwork for modern language model fine-tuning approaches
  • Software engineering skills become more critical as AI handles coding, requiring focus on architecture and design
Trends
  • Shift from AI replacing developers to AI augmenting experienced developers while potentially harming mid-level programmers
  • Growing recognition that AI coding productivity claims are overstated, with minimal real-world shipping improvements
  • Movement toward interactive development environments that maintain human engagement and learning
  • Increasing importance of software engineering principles over pure coding ability
  • Risk of 'understanding debt' accumulating in organizations that over-rely on AI-generated code
  • Centralization concerns around AI capabilities being controlled by a few powerful entities
  • Evolution from autonomous AI fears to more practical concerns about human skill atrophy
  • Growing emphasis on constraint-based AI systems rather than unconstrained generation
  • Renewed interest in notebook-based development methodologies for production software
  • Recognition that AI works best in well-defined domains with clear evaluation functions
Companies
Anthropic
Discussed for Claude coding capabilities and internal productivity studies showing mixed results
OpenAI
Referenced for GPT models and recent mathematical reasoning research with evaluation functions
Cursor
Mentioned as example of AI coding tools and for CTO's upcoming talk on agentic IDE development
Microsoft
Company where the host worked and presented ULMFit research; also discussed for Windows quality decline
Meta
Referenced for Facebook's data control and power imbalance with users
Google
Mentioned for data collection practices and comparison to other tech giants' privacy issues
Nvidia
Discussed for GTC conference, new chip architectures, and DGX Spark personal supercomputer
Kaggle
Platform where Howard achieved grandmaster status, establishing his machine learning credentials
Instagram
Example of small team (10 staff) dominating sector against larger competitors like Google/Microsoft
WhatsApp
Another example of small team achieving market dominance through better software engineering
People
Jeremy Howard
Main guest, deep learning pioneer, Kaggle grandmaster, and founder of fast.ai and Answer.AI
Rachel Thomas
Howard's collaborator who researched gambling-like aspects of AI coding and productivity studies
Dario Amodei
Anthropic CEO whose essay on AI productivity claims Howard critiques as overstated
Chris Lattner
Compiler expert Howard consulted about Claude's C compiler implementation and LLVM copying
Stephen Merity
Researcher who created AWD-LSTM architecture that Howard used as foundation for ULMFit
Yann LeCun
AI pioneer who worked on self-supervised learning and reviewed Howard's semi-supervised learning post
Fred Brooks
Computer scientist whose 'No Silver Bullet' essay Howard cites regarding software engineering productivity
Bret Victor
Interface designer whose work on direct manipulation and visual programming Howard champions
Elon Musk
Referenced for recent claims about LLMs generating machine code directly without programming languages
Joel Grus
Data scientist who gave famous talk criticizing Jupyter notebooks that Howard later rebutted
Quotes
"It literally disgusts me. Like, I literally think it's inhumane. My mission remains the same as it has been for like 20 years, which is to stop people working like this."
Jeremy Howard
"The thing about AI based coding is that it's like a slot machine in that you have an illusion of control."
Jeremy Howard
"No one's actually creating 50 times more high quality software than they were before. So we've actually just done a study of this and there's a tiny uptick, tiny uptick in what people are actually shipping."
Jeremy Howard
"LLMs cosplay understanding things. They pretend to understand things."
Jeremy Howard
"The vast majority of work in software engineering isn't typing in the code."
Jeremy Howard
Full Transcript
4 Speakers
Speaker A

This episode is brought to you by Indeed. Stop waiting around for the perfect candidate. Instead, use Indeed Sponsored Jobs to find the right people with the right skills fast. It's a simple way to make sure your listing is the first thing candidates see. According to Indeed data, sponsored jobs have four times more applicants than non-sponsored jobs. So go build your dream team today with Indeed. Get a $75 sponsored job credit at Indeed.com/podcast. Terms and conditions apply.

0:01

Speaker B

It literally disgusts me. Like, I literally think it's inhumane. My mission remains the same as it has been for like 20 years, which is to stop people working like this.

0:28

Speaker C

Jeremy Howard, a deep learning pioneer, a Kaggle grandmaster. He is a huge advocate for actually understanding what we are building through an interactive loop. A notebook, a repl. The act of poking at a problem until it pushes back. He argues, this is where the real insight happens.

0:40

Speaker B

And the funny thing is, they're both right. LLMs cosplay understanding things. They pretend to understand things. No one's actually creating 50 times more high quality software than they were before. So we've actually just done a study of this and there's a tiny uptick, tiny uptick in what people are actually shipping. The thing about AI based coding is that it's like a slot machine in that you have an illusion of control. You know, you can get to craft your prompt and your list of MCPs and your skills and whatever, and then. But in the end, you pull the lever, right? Here's a piece of code that no one understands. And am I going to bet my company's product on it? And the answer is, I don't know because, like, I don't, I don't, like, I don't know what to do now because no one's, like, been in this situation. They're really bad at software engineering. And then I think that's possibly always going to be true. The idea that a human can do a lot more with a computer, when the human can, like, manipulate the objects inside that computer in real time and study them and move them around and combine them together. Whoever you listen to, you know, whether it be Feynman or whatever, like, you always hear from the great scientists how they build deeper intuition by building mental models which they get over time by interacting with the things that they're learning about. A machine could kind of build an effective hierarchy of abstractions about what the world is and how it works entirely through looking at the statistical correlations of a huge corpus of text using a deep learning model. That was my premise.

1:03

Speaker C

This video is brought to you by Nvidia GTC. It's running March 16th until the 19th in San Jose and streaming free online. The key topics this year are agentic AI and reasoning, high-performance inference and training, open models, and physical AI and robotics. I'm so excited about the DGX Spark. I've been on the waiting list for over a year now. It's a personal supercomputer that is about the size of a Mac Mini, and the perfect adornment to a MacBook Pro, by the way. And you can fine-tune a 70 billion parameter language model with one of these things, and I'm giving one away for free. All you have to do is sign up to the conference and attend one of the sessions using the link in the description. As for the sessions, I'm interested in attending Aman Sanger's talk. He's the CTO of Cursor, and his session is 'Code with Context: Build an Agentic IDE That Truly Understands Your Code Base'. Now obviously Jensen's keynote is on March 16th. He said he's going to unveil a new chip that will surprise the world. Their next-generation architecture, Vera Rubin, is already in full production, and there's speculation we might even get an early glimpse of their new Feynman architecture. So don't forget folks, the link is in the description. If you're attending virtually, it's completely free. Don't miss it. Jeremy Howard, welcome to MLST.

3:08

Speaker B

I mean welcome to my home. Thanks for coming.

4:27

Speaker C

Yeah, well, where are we now?

4:31

Speaker B

We are in beautiful Moreton Bay in southeast Queensland. We are by the sea in my backyard.

4:33

Speaker C

The weather didn't disappoint.

4:40

Speaker B

It certainly didn't. It doesn't often, but if you were here yesterday it would have been very different.

4:42

Speaker C

Well, I don't know where to start. So I've been a huge fan probably since about 2017, '18. Of course you had the famous ULMFiT paper, and when I was at Microsoft I remember doing a presentation about that, because, I mean, now we take it for granted that we fine-tune language models on a corpus of text and then kind of continue to train them and specialize them. But apparently this was not received wisdom.

4:46

Speaker B

Oh, this was the first time it happened? Yeah, kind of the first or second. Quoc Le and Andrew Dai had done something a few years earlier, but they had missed the key point, which is that the thing you pre-train on has to be a general-purpose corpus. No one quite realized this key thing. And maybe I had a bit of fortune here, in that my background was in philosophy and cognitive science, and so I'd spent some decades thinking about this.

5:10

Speaker C

The technical architecture of ULMFiT. Just sketch that out.

5:34

Speaker B

I'm a huge fan of regularization. I'm a huge fan of taking a model that's incredibly flexible and then making it more constrained, not by decreasing the size of the architecture, but by adding regularization. So even that at the time was extremely controversial, but that was by no means a unique insight of ours. What Stephen Merity had done is he had taken the extreme flexibility of an LSTM, the classic stateful recurrent neural net, towards which things are gradually heading back nowadays, and added five different types of regularization. He added every type of regularization you can imagine. And that was my starting point, to say, okay, I now have a massively flexible deep learning model that can be as powerful as I want it to be, and it can also be as constrained as I need it to be. And then I needed a really big corpus of text. Funnily enough, this is also Stephen: he had been at Common Crawl, and I think he helped make, or made, the Wikipedia dataset. And then I realized the Wikipedia dataset made lots of assumptions. It had all these 'unk' tokens for unknown words, because it all assumed classic NLP approaches. So I redid the whole thing, created a new Wikipedia dataset, and that was my general corpus. And then I took AWD-LSTM and trained it. It was actually overnight, so for eight hours on a gaming GPU, because I was at the University of San Francisco and we didn't have heaps of resources; probably a 2080 Ti or something, I suspect. And then the next morning when I woke up, I had it. It's the same three-stage process that we do today, you know: pre-training, mid-training, post-training. So then I figured, okay, now that I've trained something to predict the next word of Wikipedia, it must know a lot about the world. I then figured if I fine-tuned it on a domain-specific corpus, 
what we would now call a supervised fine-tuning dataset, which in this case was a dataset of movie reviews, it would become especially good at predicting the next word of those. So it would learn a lot about movies. Did that for like an hour, and then a few minutes of fine-tuning the downstream classifier, which was a classic academic dataset, considered kind of the hardest one, which was to take like 5,000-word movie reviews and say, was this a positive or negative sentiment? Today that's considered easy, but at that time the only things that did it quite well were highly specialized models that people wrote their whole PhDs on. And I beat all of their results five minutes later when fine-tuning that model. Amazing.
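The three-stage recipe described here can be made concrete with a deliberately tiny stand-in: bigram counts instead of an AWD-LSTM, and per-class fine-tuned language models instead of a trained classifier head. Everything below, corpora, vocabulary, staging, is invented for illustration and only loosely analogous to ULMFiT.

```python
from collections import Counter, defaultdict
import math

def update_lm(lm, tokens):
    """'Train' a bigram LM by accumulating next-word counts.
    Calling it again on new text continues training (fine-tuning)."""
    for prev, nxt in zip(tokens, tokens[1:]):
        lm[prev][nxt] += 1
    return lm

def log_likelihood(lm, tokens, vocab):
    """Add-one-smoothed log-likelihood of a token sequence under lm."""
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        nexts = lm[prev]
        total += math.log((nexts[nxt] + 1) / (sum(nexts.values()) + len(vocab)))
    return total

def clone(lm):
    return defaultdict(Counter, {k: Counter(v) for k, v in lm.items()})

# Stage 1: pre-train on a general-purpose corpus.
general = "the movie was long the book was long the day was fine".split()
base_lm = update_lm(defaultdict(Counter), general)

# Stage 2: fine-tune the *same* counts on domain text (movie reviews).
pos_reviews = "the movie was great great acting loved it".split()
neg_reviews = "the movie was awful awful acting hated it".split()
pos_lm = update_lm(clone(base_lm), pos_reviews)
neg_lm = update_lm(clone(base_lm), neg_reviews)

# Stage 3: a minimal stand-in for the classifier head: pick the class
# whose fine-tuned LM assigns the review the higher likelihood.
vocab = set(general + pos_reviews + neg_reviews)

def classify(review):
    toks = review.split()
    pos = log_likelihood(pos_lm, toks, vocab)
    neg = log_likelihood(neg_lm, toks, vocab)
    return "pos" if pos >= neg else "neg"

print(classify("great acting loved it"))  # → pos
```

The point of the sketch is the staging, not the model: the domain fine-tuning step reuses the pre-trained counts rather than starting from zero, which is the key idea the conversation describes.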

5:39

Speaker C

And the other interesting thing is this kind of methodology around how you do the fine tuning.

8:32

Speaker B

Yeah, so how we do the fine-tuning was something we had developed at fastai. So this is kind of year one of fastai; this is still our very early days. And one of the extremely controversial things we did was we felt that we should focus on fine-tuning existing models, because we thought fine-tuning was important. Some other folks were doing work contemporaneously with that. So Jason Yosinski did some really great research, I think it was during his PhD, on how to fine-tune models and how good they can be. And some other folks in the computer vision world. We were amongst the first; there's a bunch of us kind of really investing in fine-tuning. And so, yeah, we felt that using a single learning rate to fine-tune the whole thing all at once made no sense, because the different layers have different behaviors. And this is one of the things Jason Yosinski's research also showed. We developed this idea of, like, well, it's also way faster if you just train the last layer, right? Because it only has to backprop the last layer, and then once that's pretty good, backprop the last two, and then the last three. And then we used something called discriminative learning rates. So different layers we would give different learning rates to. And then another critical insight that no one realized for years, even though we had told everybody, was that you actually have to fine-tune every batch norm. So all the normalization layers you do actually have to fine-tune, because that's moving the whole thing up and down or changing its scale. So yeah, when you do that, you can often just fine-tune the last layer or two. And we found that actually with ULMFiT, although we did end up unfreezing all the layers, only the last two were really needed to get close to the state-of-the-art result. So it took like seconds.
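The schedule described here, unfreeze from the top down and give deeper layers smaller learning rates, can be sketched in a few lines. This is an illustrative toy, not fastai's implementation; the 2.6 shrink factor is the one the ULMFiT paper used, while the layer count and base rate below are arbitrary.

```python
def discriminative_lrs(base_lr, n_layers, factor=2.6):
    """Top layer gets base_lr; each deeper layer gets base_lr / factor**depth."""
    return [base_lr / factor ** (n_layers - 1 - i) for i in range(n_layers)]

def unfreeze_schedule(n_layers):
    """Stage k trains only the top k layers: [n-1], then [n-2, n-1], and so on."""
    return [list(range(n_layers - k, n_layers)) for k in range(1, n_layers + 1)]

lrs = discriminative_lrs(0.01, 4)
print([round(lr, 6) for lr in lrs])  # deepest layer gets the smallest rate
print(unfreeze_schedule(3))          # → [[2], [1, 2], [0, 1, 2]]
```

Each stage of the unfreeze schedule would be trained with the per-layer rates above, so the deepest (most general) layers are perturbed least.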

8:39

Speaker C

Yeah, because the discriminative learning rate thing is interesting because I think the received wisdom at the time was when you fine tune a model, if the learning rate is too high, you kind of blow out the representations. So I guess the wisdom was if you don't have a really low learning rate, you'll just destroy the representations.

10:29

Speaker B

I mean, there was no received wisdom, because nobody talked about it; no one cared. Nearly no one cared. Transfer learning was just not something anybody thought about. And Rachel and I felt like it matters more than anything, because only one person has to train a really big model once, and then the rest of us can all fine-tune it. So we thought we should just learn how to do that really well. So we spent a lot of time just trying lots of things. But in the end, the intuition was pretty straightforward, and what intuitively seemed like it ought to work basically always did work. Which is another big difference from how people still tend to do ML research today: I think it's all about ablations, and you can't make any assumptions or guesses, and that's not at all true. Nearly everything that I expect to work almost always works first time, because I spent a lot of time building up those intuitions, that kind of understanding of how gradients behave.

10:45

Speaker C

I think there's a dichotomy though between continual learning, which is when we want to keep training the thing but maintain generality versus fine tuning a thing to do something specific. So there's always been this idea that, yes, you can make a model specific, you can bend it to your will, but you lose generality and you kind of degrade the representation. So tell me about that.

11:56

Speaker B

Yeah, there's some truth in that, although not as much as you might think, on the whole. The big problem is that people don't actually look at their activations and don't actually look at their gradients. So something we do in our fastai software is we have built into it this ability to see at a glance what your entire network looks like. And once you've done it a few times, and it just takes a couple of hours to learn, you can immediately see: oh, I see, this is overtrained, or undertrained, or at this layer something went wrong. It's not a mystery. So basically what happens is, for example, you end up with dead neurons that get to a point where they've got zero gradient regardless of what you do with them. That often happens if they head off towards infinity. You can always fix that. So, yeah, it's not as bad as people think, and by the same token, something that trains well for continual learning, when done properly, can also be trained well for a particular task.
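The kind of check described here can be sketched without any framework: given a batch of post-ReLU activations, flag units that never fire on any input, since those are the dead neurons with zero gradient. The activation values below are invented purely for illustration.

```python
def dead_units(activations):
    """activations: list of rows, one per input, one column per unit.
    Returns indices of units that are exactly zero on every input
    (for a ReLU, such units also receive zero gradient)."""
    n_units = len(activations[0])
    return [j for j in range(n_units)
            if all(row[j] == 0.0 for row in activations)]

# Post-ReLU activations for 3 inputs across 4 units; unit 2 never fires.
acts = [
    [0.7, 0.0, 0.0, 1.2],
    [0.1, 0.4, 0.0, 0.0],
    [0.0, 0.9, 0.0, 2.1],
]
print(dead_units(acts))  # → [2]
```

In practice you would run this over a representative batch per layer; a unit that is zero on a large batch is, for all practical purposes, dead.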

12:16

Speaker C

If you're careful, in a sense, you do want the neurons to die. And I'll explain what I mean by this. We want to bend the behaviour of models to introduce implicit constraints, because without constraints there is no creativity. There is no reasoning and so on and so forth. So in a sense you actually want it to say don't do that, you want it to do something else.

13:21

Speaker B

I don't think of it that way. To me it's more like, I find thinking about humans extremely helpful when it comes to thinking about AI. I find they behave more similarly than differently, and my intuition about each tends to work quite well. With a human, when you learn something new, it's not about unlearning something else. And so something I always found is, when I got models to try to learn to do two somewhat similar tasks, they almost always got better at both of them than a model that only learned one of them.

13:44

Speaker C

I was reminded a little bit of, you know, the DINO paper from LeCun. So this whole kind of regime of self-supervised learning. I mean, that was a vision model, but the idea was, okay, we're doing pre-training and we want to maintain as much diversity and fidelity as possible, so that when we do the downstream task we've got more things that we can latch onto.

14:20

Speaker B

Yeah, yeah. And you know, semi-supervised and self-supervised learning was such an unappreciated area. And yeah, Yann LeCun was absolutely one of the guys who was also working on it. I actually did a post, because I was so annoyed at how few people cared about semi-supervised learning. I did a whole post about it years ago. Yann LeCun looked at it for me as well and suggested a few other pieces of work that I had missed. But I was kind of surprised at how incredibly useful it is to basically come up with a pretext task. So we did this in vision before ULMFiT. So it was like, in medical imaging, take a histology slide and mask out a few squares and predict what used to be there. I had some of my students at USF doing stuff with that. It was basically entirely taking stuff that we and others had already done in vision. So like this idea of masking out squares, we didn't invent it. Masking out words was the obvious thing, you know. And this idea of gradually unfreezing layers, we had done that before in computer vision. The whole idea of starting with a pre-trained model that was general purpose had been in computer vision. There was a really classic paper in computer vision, it might have been around 2015, that was entirely an empirical paper saying: look what happens when we take a pre-trained ImageNet model and fine-tune it to predict what sculptor created this sculpture, or what architectural style this is. And on every task it got the state-of-the-art result. And it really surprised me that people didn't look at that and think, I bet that ought to work in every other area as well, whether it be genome sequences or language or whatever. But people have a bit of a lack of imagination, I find. They tend to assume things only work in one particular field. That's really true.
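The masking pretext task described here, for squares in images or words in text, comes down to hiding some tokens and keeping the originals as prediction targets. A minimal text version, where the mask rate, mask token, and seed are arbitrary choices rather than anything from the episode:

```python
import random

def make_masked_example(tokens, mask_rate=0.3, mask_token="[MASK]", seed=0):
    """Returns (masked_input, targets), where targets maps each masked
    position back to its original token, i.e. the label the model
    would be trained to reconstruct."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok  # the model must predict this from context
        else:
            masked.append(tok)
    return masked, targets

tokens = "the stain on this histology slide marks the tumour".split()
masked, targets = make_masked_example(tokens)
print(masked)
# Every position in `targets` holds a [MASK] in the input; the original
# token is the training label, so no human annotation is needed.
```

That label-free property is exactly why it works as a pretext task: the corpus itself supplies the supervision.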

14:42

Speaker C

Yeah. I mean, I guess there's two things there. First of all, we were kind of hinting at this notion of Goodhart's law, or the shortcut rule: you get exactly what you optimize for at the cost of everything else. But that doesn't seem to be the case, because we can optimise for perplexity in the case of language models. And as you say, what seems to happen, and we're getting into the distributional hypothesis here a little bit, you know the word by the company it keeps, is that when we have an incredible amount of associative data, whether it's masked prediction or any of these things like that, the model seems to build something that we might call an understanding.

16:44

Speaker B

Like, well, I've always thought of it as a hierarchy of abstractions. You know, if it's going to predict, if the document is 'here was the opening that Bobby Fischer used' and it has chess notation, then to predict the next thing it needs to know something about chess notation, or at least openings. If it's like 'and this was vetoed by the 1956 US President, comma', you don't just need to know who the president was, but the idea that there are presidents, and therefore the idea that there are leaders, and therefore the idea that there are groups of people who have hierarchies, and therefore that there are people, and therefore that there are objects. You can't predict the next word of a sentence well without knowing all of these things. So that was my hypothesis for why I created ULMFiT: to compress that as well as possible, to get that knowledge, it would have to create these abstractions, these hierarchies of abstractions, somewhere deep inside its model. Otherwise how could it possibly do a good job of predicting the next word? And because deep learning models are universal learning machines, and we had a universal way to train them, I figured if we get the data right, and if the hardware's good enough, then in theory we ought to be able to build that next-word-predicting machine, which ought to implicitly build a hierarchical structural understanding of the things that are being described by the text that it is learning to predict.

17:16

Speaker C

I think that they know in quite a superficial way. So there's a myriad of surface statistical relationships, and yet they generalize extraordinarily well. It's miraculous.

19:04

Speaker B

It is.

19:16

Speaker C

But the thing is, I want to contrast this with other comments you've made about creativity. So I think knowledge is about constraints, and I think creativity is the evolution of knowledge respecting those constraints; therefore, AI is not creative. And you've said the same thing, you've said AI isn't creative. So on the one hand, how can you say that they know, and yet not think that they can be creative?

19:17

Speaker B

I mean, I don't think I've used that exact expression. I know, I've actually, I remember chatting with Peter Norvig on camera, and both of us said, well, actually they kind of are creative. Like, we've just got to be a bit careful about our choice of words, I guess. So you know Piotr Wozniak, who is a guy I really, really respect, who kind of rediscovered spaced repetition learning, built the SuperMemo system, and is the modern-day guru of memory. The entire reason he's based his life around remembering things is because he believes that creativity comes from having a lot of stuff remembered. Which is to say, putting together stuff you've remembered in interesting ways is a great way to be creative. LLMs are actually quite good at that. But there's a kind of creativity they're not at all good at, which is moving outside the distribution, which I think is where you're heading with your question. But I'm just kind of framing it this way to say you have to be so nuanced about this stuff, because if you say, like, they're not creative, it can give you the wrong idea, because they can do very creative-seeming things. But if it's like, well, can they really extrapolate outside the training distribution? The answer is no, they can't. But the training distribution is so big, and the number of ways to interpolate between its points is so vast, we don't really know yet what the limitations of that are. But I see it every day, because my work is R&D. I'm constantly on the edge of and outside the training data. I'm doing things that haven't been done before. And there's this weird thing, I don't know if you've ever seen it before. I see it multiple times every day, where the LLM goes from being incredibly clever to worse than stupid, not understanding the most basic fundamental premises about how the world works. And it's like: oh, whoops, I fell outside the training data distribution, it's gone dumb.
And then there's no point having that discussion any further because, yes, you know, you've lost it at that point.

19:38

Speaker C

Yes. I mean, I love, you know, Margaret Boden; she had this kind of hierarchy of creativity. So there's combinatorial, exploratory and transformational creativity, and the models can certainly do combinatorial creativity. But for me, it's all about constraints. I mean, this is what Boden said, and even Leonardo da Vinci said that creativity is all about constraints. And you've spoken about, well, we'll talk about this dialogue engineering, but what happens is, when we talk with language models, it's a specification acquisition problem. So we go back and forth, and actually, when we think, the process of intelligence is about building this imaginary Lego block in our mind and respecting various constraints; and when you respect those constraints and you just continue to evolve, then those things are said to be creative. So language models, when you add constraints to them, so this could be via supervision, via critics, via verifiers, then they are creative, and with AlphaEvolve we've seen many examples of this. But the illusion is, on their own, sans constraints, obviously they have this behavioral shaping stuff that we're talking about, they don't have hard constraints, and that's why they can't go outside their distribution.

21:54

Speaker B

I mean, I think they can't go outside their distribution because it's just something that that type of mathematical model can't do. I mean, it can do it, but it won't do it well. When you look at the kind of 2D case of fitting a curve to data, once you go outside the area that the data covers, the curves disappear off into space in wild directions. And that's all we're doing, but we're doing it in multiple dimensions. I think Boden might be pretty shocked at how far compositional creativity can go when you can compose the entirety of the human knowledge corpus. And I think this is where people often get confused. So, for example, I was talking to Chris Lattner yesterday about how Anthropic had got Claude to write a C compiler. And they were like, oh, this is a clean-room C compiler; you can tell it's clean room because it was created in Rust. And so Chris created Clang, which is probably the most widely used C compiler nowadays, on top of LLVM, which is the most widely used foundation for compilers. They were like, oh, well, Chris didn't use Rust, and we didn't give it access to any compiler source code, so it's a clean-room implementation. But that misunderstands how LLMs work, right? All of Chris's work was in the training data, many, many times. LLVM is used widely, and lots and lots of things are built on it, including lots of C and C++ compilers. Converting it to Rust is an interpolation between parts of the training data. It's a style transfer problem. So it's definitely compositional creativity at most, if you can call it creative at all. And you actually see it when you look at the repo that it created: it's copied parts of the LLVM code, parts which today Chris says, like, oh, I made a mistake, I shouldn't have done it that way, nobody else does it that way. And, oh wow, look, they're the only other one that did it that way. That doesn't happen accidentally. 
That happens because you're not actually being creative; you're actually just finding the kind of non-linear average point in your training data between, like, Rust things and building-compiler things.
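The 2D picture invoked here can be made concrete: fit a polynomial exactly through a handful of points from a bounded, bell-shaped function, then evaluate it far outside the range the data covers. The points and the method (Lagrange interpolation) are chosen purely for illustration.

```python
def lagrange_fit(xs, ys):
    """Exact polynomial interpolation through (xs, ys); returns a callable."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

# 'Training data': 5 points from a function bounded in [0, 1].
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0.1, 0.7, 1.0, 0.7, 0.1]  # bell-ish shape
model = lagrange_fit(xs, ys)

print(model(0.0))   # → 1.0  (inside the data: exact)
print(model(10.0))  # far outside the data: large, nothing like the [0, 1] range
```

Inside [-2, 2] the fit is perfect; a few units outside it the curve has already shot off to values the data never came close to, which is the low-dimensional version of "it's gone dumb".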

22:57

Speaker C

All of that is true. I mean, first of all, I think we shouldn't underestimate how big this combinatorial creativity is. So all of that is true, the code is on the Internet, but also they had a whole bunch of tests which were scaffolded, which meant that every single time some code was committed they could run the tests; they basically had a critic, and they could then do this autonomous feedback loop. So in a sense it's very similar to the recent research by OpenAI and Gemini, where you're trying to solve a problem in math and you already have an evaluation function. The same on the ARC Prize, right? You have an evaluation function, and what people discount is that even knowledge of what the evaluation function is, is partial knowledge of the problem. So you can then brute-force search, you can use the statistical pattern matching, use the verifier as a constraint, and you can actually.
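The role of the evaluation function here can be shown in miniature: given a verifier (a tiny invented test suite below), even blind enumeration over a toy expression space finds a passing candidate, because the verifier carries most of the knowledge of the problem. The candidate grammar and test cases are all made up for illustration.

```python
from itertools import product

def verifier(f):
    """The evaluation function: does the candidate pass every test case?"""
    cases = [((2, 3), 7), ((0, 5), 5), ((4, 1), 9)]
    try:
        return all(f(a, b) == out for (a, b), out in cases)
    except Exception:
        return False

def search():
    """Enumerate tiny candidate 'programs' of the form c1*a + c2*b + c3
    and return the coefficients of the first one the verifier accepts."""
    for c1, c2, c3 in product(range(-3, 4), repeat=3):
        # Bind coefficients via default args so each lambda is independent.
        f = lambda a, b, c1=c1, c2=c2, c3=c3: c1 * a + c2 * b + c3
        if verifier(f):
            return (c1, c2, c3)
    return None

print(search())  # → (2, 1, 0), i.e. f(a, b) = 2a + b
```

The search itself knows nothing about arithmetic; all the constraint comes from the verifier, which is the point being made about tests, ARC-style evaluation functions, and math benchmarks.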

25:38

Speaker B

And they don't even need to do that, right? You literally already know how to pass those tests, because there's lots of software that already does it. So it just uses that and translates it to Rust. That's all it did. Which is impressive. And I'm much less familiar with math than I am with computer science, but from talking to mathematicians, they tell me that that's also what's happening with the Erdős problems and stuff. Some of them are newly solved, but they are not sparks of insight; they're solving ones that you can solve by mashing together very closely related things that humans have already figured out.

26:26

Speaker C

So on the subject of Claude Code. Now, I know you've spoken extensively about vibe coding. Actually, Rachel had some interesting work out. I mean, she quoted the METR study, which showed that productivity actually went down when people were vibe coding.

27:15

Speaker B

And they thought that it went up, which is the most interesting part.

27:28

Speaker C

And then also there was the Anthropic study. I mean, you know, maybe we should rewind a little bit. Dario had this essay out the other day, I think it was called 'The Adolescence of Technology' or something like that. And he was basically saying, look, you know, we have all of these amazing software engineers at Anthropic and they are just so productive. And he was extrapolating to the average software engineer: so there's going to be mass unemployment, because soon we're going to be able to automate all of this with AI.

27:31

Speaker B

I mean, it doesn't make any sense. Elon Musk said something a bit similar a few days ago, saying like, oh, LLMs will just spit out the machine code directly. We won't need libraries, programming languages. Yeah, yeah. Look, the thing is, none of these guys have been software engineers recently. I'm not sure Dario's ever been a software engineer at all. Software engineering is an unusual discipline and a lot of people mistake it for being the same as typing code into an ide. Coding is another one of these style transfer problems. You take a specification of the problem to solve and you can use your compositional creativity to find the parts of the training data which interpolated between them. Solve that problem and interpolate that with syntax of the target language and you get code. There's a very famous essay by Fred Brooks written many decades ago, no Silver Bullet, in which it almost sounded like he was talking about today. He was specifically saying something was responding to something very similar, which is in those days it was all like, oh, what about all these new fourth generation languages and stuff like that? We're not going to need any coders anymore, any software engineers anymore, because software is now so easy to write, anybody can write it. And he said, well, he guessed that you could get at maximum a 30% improvement. He specifically said a 30% improvement in the next decade. But I don't think he needed to limit it that much because the vast majority of work in software engineering isn't typing in the code. So in some sense, parts of what Dario said were right. Just like for quite a few people now, most of their code is being typed by a language model. That's true for me, say like maybe 90%, but it hasn't made me that much more productive because that was never the slow bit. It's also helped me with kind of the research a lot and figuring out, you know, which files are going to be touched. 
But any time I've made any attempt at getting an LLM to design a solution to something that hasn't been designed lots of times before, it's horrible, because what it gives me every time is the design of something that looks a bit similar on its surface. And often that's going to be an absolute disaster, because I'm literally trying to create something new to get away from the similar thing. It's very misleading.

27:54

Speaker C

First of all, I'm exasperated by what I see as the tech bro predilection to misunderstand cognitive science and philosophy, because we've spoken to so many really interesting people on MLST. For example, Cesar Hidalgo, who wrote this book, The Laws of Knowledge. And Mazviita Chirimuuta, a philosopher of neuroscience, was talking all about how knowledge is protean. I think that knowledge is perspectival. I don't think that knowledge can be this abstract, perspective-free thing that can exist on Wikipedia. I also think that knowledge is embodied and alive; it's something that exists in us. And the purpose of an organization is to preserve and evolve knowledge. So when you start delegating cognitive tasks to language models, you actually get this weird paradoxical effect that you erode the knowledge inside the organization.

30:51

Speaker B

Well, that's true, and that's terrifying. There are often these arguments online between people who say LLMs don't understand anything, they're just pretending to understand, and other people who say, don't be ridiculous, look what this LLM just did for me. And the funny thing is, they're both right. LLMs cosplay understanding things. They pretend to understand things. And this was the interesting thing about the early work in cognitive science, with people like Daniel Dennett and John Searle. That's basically what the Chinese Room experiment is, right? You've got a guy in a room who can't speak Chinese at all, but he sure looks like he does, because you can feed in questions and he gives you back answers; but all he's actually doing is looking things up in a huge array of books or machines or whatever. The difference between pretending to be intelligent and actually being intelligent is entirely unimportant as long as you're in the region in which the pretense is actually effective. So it's actually fine for a great many tasks that LLMs only pretend to be intelligent, because for all intents and purposes it just doesn't matter, until you get to the point where it can't pretend anymore. And then you realize, oh my God, this thing's so stupid.

31:44

Speaker C

I'm a fan of Searle, by the way. He said that understanding is causally reducible but ontologically irreducible, and he was saying there was a phenomenal component to understanding. You don't even need to go there. The interesting thing about knowledge being protean is that it's basically this Kantian idea: the world is a complex place, none of us understand it, it's like the blind men and the elephant. We all have different perspectives on a very complex thing, and so we all do this kind of modeling. But the interesting thing is that language models sometimes seem to understand. And they understand because the supervisor places them in a frame. So inside that frame, when you have that one perspective of the elephant, they're actually surprisingly coherent. But we discount the supervisor placing the models in that frame.

33:09

Speaker B

Yeah, yeah. Searle vs. Dennett was what everybody was talking about back when I was doing my undergrad in philosophy. Consciousness Explained came out about then, and the Chinese Room probably a little bit before. It's interesting because the discussions were the same discussions we're having now, but they've gone from being abstract discussions to being real discussions. It's helpful if people go back to the abstract discussions, because it's very distracting at the moment to look at something that's cosplaying intelligence so well; going back helps you get to the fundamental question. So anyway, it's this interesting situation we're now in, where it's very easy to really get the wrong idea about what AI can do, particularly when you don't understand the difference between coding and software engineering. Which then takes me to your question about the implications of that for organizations. A lot of organizations are basically betting their futures on a speculative premise, which is that AI is going to be able to do everything better than humans, or at least everything in coding better than humans. I worry about this a lot, both for the organizations and for the humans. For the humans: when you're not actively using your design and engineering and coding muscles, you don't grow. You might even wither, but at the least you don't grow. And speaking as the CEO of an R&D startup, if my staff aren't growing, we're going to fail. We can't let that happen. And getting better at the particular prompting skills, or the details of the current generation of AI CLI frameworks, isn't growing. That's about as helpful as learning the details of some AWS API when you don't actually understand how the Internet works. It's not reusable knowledge, it's ephemeral knowledge.
So if you want to, you can actually use it as a learning superpower, but it can also do the opposite. You know, the natural thing it's going to do is erode your competence over time.

33:53

Speaker C

I agree that that's the natural thing. And this is especially pertinent for you, because your career has been built around educating people towards technology and AI literacy. The default behavior is very similar to a self-driving car: there's this tipping point where at some point you're not engaged anymore, you're not paying attention, and you get this delegation of competence and you accumulate understanding debt. That's the default. So this study from Anthropic a couple of weeks ago contradicted Dario completely, because it found that, yes, there were a few people in the study who were asking conceptual questions, who were actually keeping on top of things, and they had a gradient of learning, but most people didn't. And my hypothesis about that is that the ideal situation for gen AI coding is people like us: we've been writing software for decades, we already have this abstract understanding, we're using it in domains that we know well, we can specify and remove loads of ambiguity, we can go back and forth, and we can stay in touch with the process. But the default attractor is for people to just go into this autopilot mode, where they've got no idea what's happening, and it's actually making them dumber.

36:44

Speaker B

I created the first deep learning for medicine company, called Enlitic, back in, what was that, 2014? Our initial focus was on radiology, and a lot of people were worried that this would cause radiologists to become less effective at radiology. And I strongly felt the opposite. I did quite a bit of research into this, into what happens when there's fly-by-wire in airplanes or anti-lock brakes in cars or whatever. If you can successfully automate parts of a task that really are automatable, you can allow the expert to focus on the things that they need to focus on. And we saw this happen. In radiology we found that if we could automate identifying the possible nodules in a lung CT scan, and we were actually good at that, which we were, then the radiologist can focus on looking at the nodules and trying to decide if they're malignant, or what to do about them. So again, it's one of these subtle things. If there are things which you can fully automate effectively, in a way that removes that cognitive burden from a human so that they can focus on what they need to focus on, that can be good. I don't know where we sit in software development, because I've been coding for 40-ish years, so I've written a lot of code, and I can glance at a screen of code and, unless it's something quite weird or sophisticated, immediately tell you what it does and whether it works. I can intuitively see things that could be improved, possible things to be careful of. I'm not sure I could have got to that point if I hadn't written a lot of code. So the people I'm finding who can really benefit from AI right now are either really junior people who can't code at all, who can now write some apps that they have in their head, and as long as those work reasonably quickly with the current AI capabilities, they're happy.
And then really experienced people like me or like Chris Lattner, because we can basically have it do some of our typing for us, and some of our research for us. The people in the middle, which is most people most of the time, really worry me, because how do you get from point A to point B without typing code? It might be possible, but we have no experience of that. Is it possible? How would you do it? Is it like going back to school, where at primary school we don't let kids use calculators so that they develop their number muscle? Do we need to do that for, say, the first five years as a developer: you have to write all the code yourself? I don't know. But if I were a developer with between 2 and 20 years of experience, I would be asking that question of myself a lot. Because otherwise you might be in the process of making yourself obsolete.

37:47

Speaker C

Yeah, well, this is another thing about knowledge that Cesar Hidalgo said. He said that knowledge is non-fungible, which means it can't be exchanged. What he means by that is that the process of learning is, in some important sense, not reducible. Right? You have to have the experience, and the experience has to have friction. When we build models of the world, we actually learn. There's this phrase, reality pushes back. We make lots of mistakes and we update our models; we're placing these coherence constraints on our model, and that's how we come to learn. But you use Claude Code and there's so little friction in the process. That's exactly what this study from Anthropic said: there was so little friction, they didn't learn anything.

41:16

Speaker B

Right, yeah, exactly. Desirable difficulty is the concept that comes up in education. But even going back to the work of Ebbinghaus, the original spaced repetition learning guy in the 19th century, and then Piotr Wozniak more recently, we find the same thing. We know that memories don't get formed unless it is hard work to form them. That's where you get this somewhat surprising result that says revising too often is a bad idea, because it comes to mind too quickly. So with spaced repetition learning, with tools like Anki and SuperMemo, the algorithm tries to schedule the flashcards just before the moment you're about to forget. So then it's hard work. I studied Chinese for 10 years, in order to try to learn about learning myself, and I really noticed this. I used Anki, and because it was always scheduling my cards just before I was about to forget them, it was always incredibly hard work to do reviews, because almost all the cards were ones I was on the verge of forgetting. It was absolutely exhausting. But my God, it worked. Here I am, I haven't really done any study for 15-plus years, and I still remember my Chinese.
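The scheduling behavior Howard describes, reviews landing just before the point of forgetting, can be sketched as a simplified SM-2-style interval update. This is illustrative only: Anki and SuperMemo use more elaborate rules, and the constants here are invented, not either tool's actual parameters.

```python
# Simplified SM-2-style spaced repetition scheduler (illustrative only;
# the real Anki/SuperMemo algorithms use more elaborate update rules).

def next_interval(interval_days: float, ease: float, recalled: bool):
    """Return (new_interval, new_ease) after one flashcard review.

    Successful recalls grow the interval multiplicatively, so each
    review lands just before the expected moment of forgetting.
    """
    if recalled:
        return interval_days * ease, ease
    # A lapse resets the interval and slows future growth a little.
    return 1.0, max(1.3, ease - 0.2)

interval, ease = 1.0, 2.5
for _ in range(5):  # five successful reviews in a row
    interval, ease = next_interval(interval, ease, recalled=True)
print(round(interval, 1))  # -> 97.7: intervals stretch 1, 2.5, 6.2, 15.6, 39.1, 97.7 days
```

The point of the multiplicative growth is exactly the "desirable difficulty" effect: each gap is longer than the last, so every review happens near the edge of forgetting.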

41:57

Speaker C

Well, I know. Coming back to your radiology example, one example people give is call centers. We have this notion that in an organization there are high-intelligence roles and low-intelligence roles. For me, intelligence is just the adaptive acquisition and synthesis of knowledge. So we assume that the low-intelligence roles doing the call center stuff don't adapt, that there are certain things an organization does that do not change, so we can automate them and we don't need to update our knowledge. And I think that discounts, maybe as with the radiology example, having this holistic knowledge. In a call center, there are so many weird edge cases that come in, so many weird things that happen, and that filters up through the organization, and we adapt over time. So when you start to automate things, you actually lose the competence to create the process which created the thing in the first place, and you lose the evolvability of that knowledge in the organization. You're actually kind of cutting your legs off.

43:28

Speaker B

Yeah, absolutely. All I know is that in my company, I tell our staff all the time: almost the only thing I care about is how much your personal human capabilities are growing. I don't actually care how many PRs you're doing, how many features you're doing. John Ousterhout, the Tcl guy, recently released some of his Stanford Friday takeaway lectures, and he has this nice one called A Little Bit of Slope Makes Up for a Lot of Intercept, which is basically the idea that in your life, if you can focus on doing things that cause you to grow faster, it's way better than focusing on the things you're already good at, the things with the high intercept. So the only thing I really care about, and I think the only thing that matters for my company, is my team's slope. If you focus on just driving out results at the limit of whatever AI can do right now, you're only caring about the intercept. I think that's basically a path to obsolescence, both for a company and for the people in it. So I'm really surprised how many executives of big companies are pushing this now, because if they're wrong, which they probably are, and they have no way to tell, because this is an area they're not at all familiar with and never learned in their MBAs, they're basically setting up their companies to be destroyed. I'm really surprised that shareholders would let them take such an incredibly speculative action. But here we are. It feels like a lot of companies are going to fail as a result of the amassed tech debt that leaves them unable to maintain or build their products anymore.
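Ousterhout's slope-versus-intercept line can be made concrete with a toy linear model. The numbers below are invented for illustration; they are not from the lecture.

```python
# "A little bit of slope makes up for a lot of intercept": model two
# hypothetical developers as ability(t) = intercept + slope * t and ask
# when the faster-growing one overtakes the one who started ahead.

def crossover_week(intercept_a: float, slope_a: float,
                   intercept_b: float, slope_b: float) -> float:
    """Week t at which B's ability catches up with A's.

    Solves intercept_a + slope_a * t == intercept_b + slope_b * t for t.
    """
    assert slope_b > slope_a, "B must be growing faster than A"
    return (intercept_a - intercept_b) / (slope_b - slope_a)

# A starts far ahead (intercept 100) but barely grows; B starts at 10
# and improves a little every week.
print(crossover_week(100, 0.5, 10, 2.0))  # -> 60.0 (weeks)
```

With these made-up numbers, a 90-point head start is erased in just over a year by a modestly steeper learning rate, which is the whole argument for optimizing a team's slope rather than its current output.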

44:22

Speaker C

There are loads of folks out there, like Francois Chollet, who really get it. He's always said that it's about this kind of memetic sharing of cognitive models of the domain, and how we refine them together. On the sharing thing, this is another big scaling problem with gen AI coding. Right? The ideal case, and I've done this: I know a domain really well, I can specify it in exquisite detail, and I tell Claude Code, go and do this thing, and the model stays in my mind, so it doesn't matter. But then you go into an organization, and now I need to share my knowledge with all of the other people.

46:27

Speaker B

Right.

46:59

Speaker C

And I'm sure you have this in your company as well. This knowledge acquisition bottleneck is a really serious problem in organizations. When it's just me, I think I'm probably about 50 times more productive using Claude Code. It's absolutely magic, and I can see why people are so excited about it, but people don't seem to understand the bottleneck, and how that doesn't really translate to many real-world organizations.

47:00

Speaker B

No one's actually creating 50 times more high-quality software than they were before. We've actually just done a study of this, and there's a tiny uptick, a tiny uptick, in what people are actually shipping. Those are the facts. Obviously I'm an enthusiast of AI and what it can do, but my wife Rachel recently pointed out in an article that all of the pieces that make gambling addictive are present in it.

47:23

Speaker C

Oh yeah, Dark Flow. I was going to bring that up.

47:58

Speaker B

Tell us about that. It's this really awkward situation where almost everybody I know who got very enthusiastic about AI-powered coding in recent months has totally changed their mind about it, when they finally went back and looked at how much of the stuff they built during those days of great enthusiasm they're actually using today. Are my customers using it today? Am I making money from it today? Almost all the money is being made by influencers, or by the companies that produce the tokens. The thing about AI-based coding is that it's like a slot machine in that you have an illusion of control. You get to craft your prompt and your list of MCPs and your skills and whatever, but in the end you pull the lever, right? You put in the prompt and something comes back, and it's like cherry, cherry. And it's like, oh, next time I'll change my prompt a bit, I'll add a bit more context. Pull the lever again. Pull the lever again. It's stochastic, and you get the occasional win. That's like, oh, I won, I got a feature. So it's got all these hallmarks: losses disguised as wins, a somewhat stochastic feeling of control, all the stuff that gaming companies try to engineer into their gaming rooms. Now, none of that means that AI is not useful, but gosh, it's hard to tell.

48:00

Speaker C

I know. And Rachel, just to be clear, also said that one of the hallmarks of gambling is that you delude yourself that you have some awareness of what's going on, but actually you don't. But let's do the bull case a little bit, because I do think that in restricted cases it is very useful, and those are cases where we understand the domain and we can place constraints and specification. Though even in those cases you could argue that we're not going to be unemployed anytime soon, because you just end up doing more work. On the addiction thing, I've noticed that myself. I've had 14-hour Claude Code marathon sessions, and I actually feel addicted to it. It's like a slot machine.

49:40

Speaker B

It really is in there too. Absolutely.

50:17

Speaker C

Yeah, I know. And I've never felt more drained. I actually need to take a rest afterwards, like a few days' rest, because it completely kills me.

50:19

Speaker B

Yeah, definitely. I've had some successes, right? In fact, we've spent the last couple of years building a whole product based around where we know the successes are going to be, which is when you're working on reasonably small pieces that you can fully understand, that you can design, and where you can build up your own layers of abstraction to create things that are bigger than the parts you're building them out of. I had a very interesting situation recently, which was basically an experiment. We rely very heavily on something called ipykernel, which is the thing that powers Jupyter notebooks. There'd been a major version release of ipykernel, from 6 to 7, and it stopped working. And it stopped working in both of the products we were trying to use it with: one was NbClassic, the original Jupyter notebook, and the other was our own product, Solveit, which would just randomly crash. ipykernel is over 5,000 lines of code. It's very complex code: multiple threads, events, blocking, interfaces with IPython, with ZMQ, with debugpy, all kinds of different pieces. I couldn't get my head around it, and I couldn't see why it was crashing; the tests were all passing. So I wondered if AI could solve this. I'm always interested in the question of how big a chunk AI can handle on its own right now. The answer turned out to be yes, I think it can, just. So I spent a couple of weeks on it. I didn't develop a lot of understanding about how ipykernel really worked in the process, but I did spend quite a bit of time pulling out separate pieces. Codex, I think it was 5.2 at the time, or maybe 5.3 had just come out, couldn't do it. But if I got the $200-a-month GPT 5.3 Pro to fix the problems, it could. And so by going back and forth between those two pieces of software and those two models, I could get things working over a couple of weeks.
And like you say, it wasn't at all fun. It was very tiring, and it felt stressful, because I wasn't really in control. But the interesting thing is that I'm now in a situation where I have the only implementation of a Python Jupyter kernel that actually works correctly, as far as I can tell, with these new version 7 protocol improvements. And now I'm like, well, this is fascinating, because we don't have a software engineering theory of what to do now. Here's a piece of code that no one understands. Am I going to bet my company's product on it? And the answer is, I don't know, because no one's been in this situation before. Does it have memory leaks? Will it still work in a year's time if there's some minor change to the protocol? Is there some weird edge case that's going to destroy everything? No one knows, because no one understands this code. It's a really curious situation.

50:28

Speaker C

I mean, first of all, we should acknowledge the pernicious erosion of control. At the very beginning, you have 10% AI-generated code, and you can just see how it creeps up and up. Then at some point, six months down the line, a PR comes in and now 60% of the code is AI-generated. And do you see what happens? You slowly become disconnected. But the bull case for this is that in AI there's this idea called functionalism: we don't care what the intelligent thing is made out of; as long as it does all of the right things, we would say it's AI. And it's the same thing with software. So the bull case is: I understand the domain, and I don't need to know how to write the quicksort algorithm, I just need to understand it. Right? I just need to have all of these tests, and it needs to go into deployment, and these things need to happen. And at that point, you know what, I don't actually care.

54:05

Speaker B

To be clear, I quite like that framing. But you know what that actually does? It says, wow, software engineering sure is important then, because software engineering is all about finding what those pieces are, how they should behave, how you can put them together to create a bigger piece, and then put those together to create a bigger piece still. And if we do that well, then in 10 years' time we could have software that is far more capable than anything we could even imagine today. But you're only going to get that with really great software engineering. And you want to be careful. In the end, I'm finding that ipykernel, for example, is just too big a piece. Because in the end, the team that made the original ipykernel were not able to create a set of tests that correctly exercised it, and therefore real-world downstream projects, including the original NbClassic, which is what ipykernel was extracted from, didn't work anymore. So this is where our focus is now on the development side at Answer.AI: finding the right-sized pieces and making sure they're the right pieces. Knowing how to recognize what those pieces are, how to design them, and how to put them together is something that normally requires some decades of experience before you're really good at it. Certainly that's true for me; I reckon I got pretty good at it after maybe 20 years of experience. So the big question is: how do you build these software engineering chops, which are now even more important than they've ever been before? They're the difference between somebody who's good at building computer software and somebody who's not. That feels like a challenging question.

55:02

Speaker C

I know. And there's also this notion that there are so many different ways to abstract and represent something. The world is a very complex place, and maybe the way we've been abstracting and representing software is mostly a reflection of our own cognitive limitations. Right? Even in the sciences: in physics you tend to have a lot of quite reductive methods of modeling the world, and then you've got complexity science, which just embraces the constructive, dissipative, gnarly nature of things. And I think a lot of software today we don't understand. For example, there are many globally distributed software applications that use the actor pattern, and that's basically a complex system; the only way we can understand it is by doing simulations and tests, because no one actually knows how all of these things fit together. So you could argue, I guess, as a bull case, that at the top of software engineering we are already doing this, and that it's what we want to do eventually anyway.

57:00

Speaker B

Yeah, I'd say probably not. You see companies like Instagram and WhatsApp dominate their sectors whilst having 10 staff, beating companies like Google and Microsoft in the process. I would argue that this way of building software in very large companies is actually failing, and I think we're seeing a lot of these very large companies becoming increasingly desperate. For example, the quality of Microsoft Windows and macOS has very obviously deteriorated greatly in the last five to ten years. Back when Dave Cutler was looking at every line of the NT kernel and making sure it was beautiful, it was an elegant and marvelous piece of software, and I don't think there's anybody in the world who's going to say that Windows 11 is an elegant and marvelous piece of software. So I actually think we do need to find these smaller components that we fully understand, and build them up. And here's the problem: AI is no good at that. I say that empirically; they're really bad at software engineering. And I think that's possibly always going to be true, because we're asking them to move outside of their training data. If we're trying to build something that literally hasn't been built before, and to do it in a better way than it has been done before, we're saying: don't just copy what was in the training data. And again, this is a confusing point for a lot of people, because they see AI being very good at coding and they think, oh, that must be software engineering, so it must be good at software engineering. But they're different tasks; there's not a huge amount of overlap between them. And there's no current empirical data to suggest that LLMs are gaining any competency at software engineering.
Every time you look at a piece of software engineering they've done, like the browser which Cursor created, or the C compiler which Anthropic created: I've read the source code of those things quite a bit, and Chris Lattner is much more familiar with the compiler example than me, but they're very, very obvious copies of things that already exist. So that's the challenge: if you want to build something that's not just a copy, then you can't outsource it to an LLM. There's no theoretical reason to believe that you'll ever be able to, and there's no empirical data to suggest that you'll ever be able to.

57:58

Speaker C

Yes. I think the punchline of this conversation, and I'm sure you would agree with this, is that we need the combination of AI and humans working together, right? The humans provide the understanding, and all of the stuff we were saying about knowledge, but we can still use AIs as a tool. We need to design operating models, ways of working, such that we don't diminish our competence and understanding. It's a very fine

1:01:03

Speaker B

line. That's been our focus, both for teaching and for our own internal development. The stuff I've been working on for 20 years has turned out to be the thing that makes this all work. Stephen Wolfram should get credit for this: he was the guy that created the notebook interface, although lots of the ideas go back to Smalltalk and Lisp and APL. Basically, the idea is that a human can do a lot more with a computer when the human can manipulate the objects inside that computer in real time, study them, move them around, and combine them together. That's what Smalltalk was all about with objects, and APL was the same with arrays. Mathematica is basically a superpowered Lisp, which then also added this very elegant notebook interface that allowed you to construct a kind of living document out of all this. So I built this thing called nbdev a few years ago, which is a way of creating production software inside these notebook interfaces, inside these rich dynamic environments. And I found it made me dramatically more productive as a programmer. Even though I've never been a full-time programmer as my job, when you look at my GitHub repo output, I think GitHub produced some statistics about it, and I was just about the most productive programmer in Australia. It's working, and a lot of the stuff I build has lots and lots of people using it, because it's such a rich, powerful way to build things. And it turns out, we've now discovered, that if you put the AI in the same environment with the human, in a rich interactive environment, the AI is much better as well. Which perhaps isn't shocking to hear. But if you use Claude Code, which I know you do, and it's a very good piece of software, the environment we give Claude Code is very similar to the environment that people had 40 years ago. It's a line-based terminal interface; it can use MCP or whatever.
Most of the time it nowadays just uses Bash tools, which, again, are very powerful. I love Bash tools; I use CLI tools all the time. But it's still just using text files as its interface to the world. It's really meager. So we put the human and the AI inside a Python interpreter, and now suddenly you've got the full power of a very elegant, expressive programming language: the human can use it to talk to the AI, the AI can talk to the computer, the human can talk to the computer, the computer can talk to the AI. You have this really rich thing. And then we let the human and the AI build tools in real time that each other can use. That's what it's about to me, right? It's about creating an environment where humans can grow and engage and share. For me, when I use Solveit, it's the opposite of that experience you described with Claude Code. After a couple of hours I feel energized and happy and fulfilled.
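One minimal sketch of that shared-interpreter idea, assuming nothing about Solveit's actual internals: a plain Python registry that both the human and the AI could inspect and extend in the same live session. The `tool` decorator and `word_count` function are made-up names for illustration; this is the pattern, not the product's real API.

```python
# Sketch of a shared live environment: human and AI sit in one Python
# process and register tools for each other to call. Illustration of
# the pattern only; not Solveit's actual implementation.

TOOLS: dict = {}

def tool(fn):
    """Register a function so either party can discover and call it."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def word_count(text: str) -> int:
    "Count whitespace-separated words in a string."
    return len(text.split())

# Either side can list what exists and invoke it on live objects:
print(sorted(TOOLS))                                         # -> ['word_count']
print(TOOLS["word_count"]("the notebook is the interface"))  # -> 5
```

The contrast with a line-based CLI agent is that both parties here share live objects and a growing vocabulary of callable tools, rather than exchanging text files.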

1:01:31

Speaker C

I'll give you my take. I think the thing you're pointing to here is that there's something magic about having an interactive, stateful environment that gives you feedback. That's because our brains can only do a certain unit of work at a time, so we actually think through refining and testing against reality. During my PhD I used Mathematica and MATLAB, and I agree: you've got this REPL environment, and it's like, here's the matrix, do an image plot, make a change, this is what it looks like now. It's actually a wonderful way to refine my mental model of something. But Claude Code does a lot of this stuff. I think it's mostly a skill issue; I think the people that use Claude Code effectively do this. I've written a content management...

1:05:07

Speaker B

It's possible. It is possible.

1:05:55

Speaker C

It's possible, yeah. So I've written a content management system called rescript, and when I'm putting together a documentary video, it can pull transcripts and then I can verify the claims. Part of AI literacy is just understanding the asymmetry of language models. When you give them a discriminative task, they're actually quite good. So if I tell a sub-agent to go and verify every individual claim, it's much more accurate than if I was in generation mode, generating a bunch of claims. And then there's the stateful feedback thing again: I can have some kind of schematized XML dump, and an application here on the side which is visualizing it, so it's a feedback loop. And for me this is an AI literacy thing. The people who are good at AI are already doing this.
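The verify-each-claim pattern described here can be sketched without any particular model API: split a draft into atomic claims, then build one narrow, discriminative prompt per claim for a verifier sub-agent, rather than trusting a single generative pass. This is an illustrative sketch only; the claim splitter and the `verify_prompt` wording are made up for this example, not taken from any real tool.

```python
def split_claims(draft: str) -> list[str]:
    """Naive claim splitter: one claim per sentence.
    A real pipeline would use a model or a parser; this is illustrative."""
    return [s.strip() for s in draft.split(".") if s.strip()]

def verify_prompt(claim: str, transcript: str) -> str:
    """Build a discriminative yes/no prompt for a verifier sub-agent.

    Checking a single stated claim against a source is a much easier task
    than generating correct claims from scratch, which is the asymmetry
    being exploited here.
    """
    return (
        "Answer SUPPORTED or UNSUPPORTED only.\n"
        f"Transcript:\n{transcript}\n"
        f"Claim: {claim}"
    )

draft = "Howard created nbdev. Notebooks cannot be diffed"
transcript = "Jeremy Howard built nbdev and a cell-level merge driver."

# One narrow verifier call per claim, instead of one broad generation.
prompts = [verify_prompt(c, transcript) for c in split_claims(draft)]
print(len(prompts))
```

Each prompt would then be dispatched to a sub-agent; only the prompt construction is shown, since the asymmetry lives in how the task is framed, not in any particular model call.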

1:05:57

Speaker B

Yeah. So I don't fully agree with you. I agree you can do it in Claude Code, and I agree it is an AI literacy thing as to whether you can. But Claude Code was not designed to do this. It's not very good at it, and it doesn't make it the natural way of working. I don't want to say it's an AI literacy problem, because that's like saying, oh, it's a "you" problem. To me, if a tool is not making it natural for a human to become more knowledgeable, more happy, more connected, with a deeper understanding of and connection to what they're working on, that's a tool problem. That should be how tools are designed to work. So many models and tools are expressly being evaluated on: can I give it a complete piece of work and have it go away and do the whole thing? Which feels like a huge mistake to me. Versus: have you evaluated whether a human comes out the other end with a deep understanding of a topic, so that they can really easily build things in the future?

1:06:40

Speaker C

I agree with all of that. But then there's the other interesting angle, which is that there was a famous talk by Joel Grus where he said that notebooks are terrible, really bad from a software engineering point of view. And at the time, and maybe still now to a certain extent, I agreed with him, because I've done MLOps, I've worked in large organizations trying to figure out how we bridge data science and software engineering. And Claude Code is already more towards the software engineering side, and what that means is it creates idempotent, stateless, repeatable artifacts. So as you say, from a pedagogical point of view it's really good having this stateful feedback, because I can understand what's going on, but then I need to translate that into something which is deployable. Can you tell us the story? You responded to Joel Grus, didn't you? And it was a bit of a fiasco, wasn't it? Just tell us about that.

1:07:52

Speaker B

He did a really good video called "I Don't Like Notebooks." It was hilarious, really well done. And yeah, it was totally wrong. All the things he said notebooks can't do, they can, and all the things he said you can't do with notebooks, I do with notebooks all the time. So it was a very good, very amusing, incorrect talk. So then I did a kind of parody of it called "I Like Notebooks," in which I basically copied, with credit, most of his slides and showed how every one of them was totally incorrect. But I actually think your comment does come down to the heart of it, which is this difference between how software engineering is normally done versus how scientific research and similar things are normally done. And I agree there is a dichotomy there, and I think that dichotomy is a real shame, because I think software development is being done wrong. It's being done in this way which is all about reproducibility, and these dead pieces: it's all dead code, dead files. I will never be able to express this one millionth as clearly as Bret Victor has in his work, so I'd encourage people who haven't watched Bret Victor to watch him. He shows again and again how a direct, visceral connection with the thing you're doing is all that matters. That's his mission, to make sure people have that connection, and that's basically my mission as well. So for me, traditional software engineering is as far from that as it is possible to get. I think it's gross, I find it disgusting, and I find it sad that people are being forced to work like that. I think it's inhumane. And I just don't think it works very well; empirically, it doesn't work very well. And it's much less good for AI, as well as being much less good for humans. It hasn't always been that way: with Alan Kay and Smalltalk, Iverson and APL, Lisp, Wolfram with Mathematica.
To me, those were the golden days, when people were focused on the question of how we get the human into the computer, working as closely with it as possible. That's where the mouse came from, for example: to be able to click and drag and visualize entities in your computer as things you can move around. So I feel like we've lost that, and I think it's really sad. With Claude Code and things like it, the default way of working is to go super deep into it. There's a whole folder full of files you never even look at; your entire interaction with it is through a prompt. It literally disgusts me. I literally think it's inhumane. And my mission remains the same as it has been for like 20 years, which is to stop people working like this.

1:08:47

Speaker C

I know, but casting my mind back, I used to work with data scientists who were using Jupyter notebooks, and what I found was that, back then, if you checked them into Git it wouldn't look very good. Most of these data scientists didn't know how to use Git. They would run the cells out of order, which means it wouldn't be reproducible. There were all sorts of things like that. The thing is, I agree with you that you can use them in this workflow. But it comes back to what I was saying before, when we were talking about the call center being a low-intelligence job. The reason the data scientists are doing intelligent work is that they are actually creating something that doesn't exist. They are figuring out the contours of a problem; they're working in a domain that is poorly understood. But you could argue now that the bull case is: once the data scientists can succinctly describe the contours of the problem, maybe we could go to Claude Code and implement it properly. But how do we bridge between those two worlds?

1:12:29

Speaker B

I think that'd be a terrible, terrible idea. You don't want to remove people from their exploratory environment. Research and science are developed by people building insight. Whoever you listen to, whether it be Feynman or whoever, you always hear from the great scientists how they build deeper intuition by building mental models, which they get over time by interacting with the things they're learning about. In Feynman's case, because it was theoretical physics, he couldn't actually pick up a spinning quark, but he did literally study spinning plates. You've got to find ways to deeply interact with what you're working with. So many times I've seen this with data science teams, because you're right, data science teams aren't very familiar with Git, and aren't very familiar with things that they do need to understand. So often a software engineer will become their manager, and their fix will be to tell them to stop using Jupyter notebooks, and now they have to use all these reproducible blah blah, virtualenv blah blah. They destroy these teams. Over and over again I've seen this keep happening, because the solution is not to create more discipline and bureaucracy; it's to solve the actual problem. So for example, we built an nbdev merge driver. A lot of people don't realize this, but notebooks are actually extremely Git-friendly. It's just that Git doesn't ship with a merge driver for them. Git only ships with a merge driver for line-based text files, but it's fully pluggable, so you can easily plug in one for JSON files instead. And so we wrote one. Now when you get a Git diff with our merge driver, you see cell-level diffs. If you get a merge conflict, you get cell-level merge conflicts, and the notebook is always openable in Jupyter. nbdime did the same thing, so there are two independent implementations of this.
So yeah, there were problems to solve, but the solution was not to throw away Bret Victor's ideas and move people further away from their exploratory tools; it was to fix the exploratory tools. And I think all software developers should be using exploratory programming to deepen their understanding of what they're working with, so that they end up with a really strong mental model of the system they're building. Then they can come up with better solutions, more incrementally, better tested. I basically never have to use a debugger, because I basically never have bugs. And it's not because I'm a particularly good programmer; it's because I build things up in small steps, and each step works, and I can see it working and interact with it, so there's no room for bugs, you know.
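The cell-level merging Howard describes relies on notebooks being JSON: a three-way merge can compare whole cells instead of lines, and a true conflict can be recorded as an extra marker cell so the file stays openable. The toy merge below illustrates only that idea; it is not the actual nbdev or nbdime implementation, and all names here are invented for the sketch.

```python
import json

def cell_src(cell):
    """Join a notebook cell's source lines into one string."""
    return "".join(cell.get("source", []))

def merge_notebooks(base, ours, theirs):
    """Naive three-way merge at cell granularity.

    For each cell position, keep whichever side changed relative to the
    base; if both sides changed the same cell differently, emit a marker
    cell plus both versions, so the notebook remains valid JSON.
    """
    merged_cells = []
    n = max(len(base["cells"]), len(ours["cells"]), len(theirs["cells"]))
    for i in range(n):
        b = base["cells"][i] if i < len(base["cells"]) else None
        o = ours["cells"][i] if i < len(ours["cells"]) else None
        t = theirs["cells"][i] if i < len(theirs["cells"]) else None
        if o == t:                 # both sides agree
            if o is not None:
                merged_cells.append(o)
        elif o == b:               # only "theirs" changed this cell
            if t is not None:
                merged_cells.append(t)
        elif t == b:               # only "ours" changed this cell
            if o is not None:
                merged_cells.append(o)
        else:                      # real conflict: keep both, marked
            marker = {"cell_type": "markdown", "metadata": {},
                      "source": ["**MERGE CONFLICT in cell %d**" % i]}
            merged_cells.extend(c for c in (marker, o, t) if c)
    out = dict(ours)
    out["cells"] = merged_cells
    return out

def make_nb(*sources):
    """Build a minimal notebook-shaped dict with one code cell per source."""
    return {"nbformat": 4, "nbformat_minor": 5, "metadata": {},
            "cells": [{"cell_type": "code", "metadata": {},
                       "execution_count": None, "outputs": [],
                       "source": [s]} for s in sources]}

base = make_nb("x = 1", "print(x)")
ours = make_nb("x = 2", "print(x)")        # we edited cell 0
theirs = make_nb("x = 1", "print(x + 1)")  # they edited cell 1

merged = merge_notebooks(base, ours, theirs)
# Non-overlapping cell edits merge cleanly, and the result round-trips
# through JSON, so the merged notebook would still open in Jupyter.
assert json.loads(json.dumps(merged))["cells"]
print([cell_src(c) for c in merged["cells"]])
```

Real merge drivers are wired in via `.gitattributes` plus a `merge.<name>.driver` entry in Git config, so Git calls them instead of its line-based merger for matching files.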

1:13:22

Speaker C

You know, I'm so torn on this, because I agree with you. And I'm also skeptical of organizations that converge onto ways of doing things and decide they no longer need to evolve, no longer need to adapt. Innovation is adaptivity, right? And we should increase the surface area of adaptivity as much as we possibly can. So we need people who are constantly testing new ideas and finding these constraints. But by the same token, we need to use the cloud, we need to use CI/CD, we need to get this stuff into production.

1:16:48

Speaker B

Yeah, so do both. And there's absolutely no conflict: nbdev ships with out-of-the-box CI integration, and the tests are literally there, because the source is a notebook. The entire exploration of how the API works, what it looks like when you call it, the implementation of the functions, the examples, the documentation and the tests are all in one place. So it's much easier to be a good software engineer in this environment. So, yeah, do both.
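The "source is a notebook" idea can be sketched concretely. nbdev's actual machinery is much richer, but the core move is: walk the notebook's JSON, export code cells carrying an export directive into a module, and treat every other code cell as an example that doubles as a test (running the notebook top to bottom is the test suite CI executes). The directive string and helper below are simplified for illustration.

```python
EXPORT_DIRECTIVE = "#| export"  # nbdev-style directive, simplified here

def split_notebook(nb):
    """Split a notebook's code cells into exported source and test cells.

    Cells whose first line is the export directive become module source;
    every other code cell is an example/test, so documentation, examples,
    implementation and tests all live in the same document.
    """
    exported, test_cells = [], []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue  # markdown cells are the documentation
        lines = "".join(cell["source"]).splitlines()
        if lines and lines[0].strip() == EXPORT_DIRECTIVE:
            exported.append("\n".join(lines[1:]))
        else:
            test_cells.append("\n".join(lines))
    return exported, test_cells

# A minimal notebook-shaped dict: prose, an exported function, a test.
nb = {"cells": [
    {"cell_type": "markdown",
     "source": ["# add\n", "Adds two numbers."]},
    {"cell_type": "code",
     "source": ["#| export\n", "def add(a, b):\n", "    return a + b"]},
    {"cell_type": "code",
     "source": ["assert add(2, 3) == 5"]},
]}

exported, test_cells = split_notebook(nb)
print(exported)
print(test_cells)
```

In a real setup, the exported cells would be written to a `.py` module for packaging, while CI simply re-executes the whole notebook, so the examples serve as the regression tests.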

1:17:19

Speaker C

So do you remember there was that statement saying existential risk from AI should be an urgent priority, signed by folks like Hinton and Demis Hassabis, and you responded with a rebuttal, together with Arvind Narayanan, the AI Snake Oil guy? Tell me about that. Do you think we should be worried about AI existential risk?

1:18:01

Speaker B

I mean, that was a certain time, wasn't it? And I feel like things have changed a bit, thank God. I feel like, not just me and Arvind but, broadly speaking, the community of which we're a part probably won that one. Now we have other problems to worry about. But basically, at that point the prevailing narrative was: AI is about to become autonomous, it could become autonomous at any moment, and it could destroy the world. It very much comes from Eliezer Yudkowsky's work, which I think has clearly been shown to be wrong at many levels at this point.

1:18:20

Speaker C

They would dispute that, obviously.

1:19:06

Speaker B

Of course they would.

1:19:07

Speaker C

Yeah.

1:19:08

Speaker B

It's one of those things that they can always explain away, just like any doomsday cult, unless you give it a date and the date passes.

1:19:09

Speaker C

Well, even I've updated a little bit, in the sense that I would now say these models can be said to be intelligent in restricted domains. The ARC challenge showed that. If you place constraints on the problem, you can go faster towards a known goal. Even agency: you can put a planner on top, and if you know where you're going, you can get there faster. But that doesn't help you otherwise. You can have all the intelligence and agency in the world, but if you don't have the knowledge and the constraints, then you're just going in the wrong direction faster. And I think they don't seem to appreciate that these models don't actually know the world.

1:19:16

Speaker B

None of that was even relevant to Arvind's and my point, which was, and is, that it's misunderstanding where the actual danger is. When you have a dramatically more powerful technology entering the world, one that can make some people dramatically more powerful, people who are in love with power will seek to monopolize that technology. And the more powerful it is, the stronger that urge from those power-hungry people will be. So here's the problem. If you say, I don't care about any of that, all I care about is autonomous AI taking off, singularity, paperclips, nano goo, whatever, then the obvious solution is: let's centralize power. Which is what we kept seeing, particularly at that time: let's give either very rich technology companies, or the government, or both, all of this power and make sure nobody else has it. In my threat model, that's the worst possible thing you can do, because you've centralized the ability to control in one place, and therefore these people who are desperate for power just have to take over that one thing.

1:19:50

Speaker C

Could we distinguish though what you mean by power? Because we've just spent some of this conversation talking about how it's not actually as powerful as people think it is.

1:21:18

Speaker B

But mine is an "even if" argument, right? I'm just saying, even if it turns out to be incredibly powerful. I don't even want to argue about whether it's going to be powerful, because that's speculative. Even if it's going to be incredibly powerful, you still shouldn't centralize all of that power in the hands of one company or the government. Because if you do, all of that power is going to be monopolized by power-hungry people and used to destroy civilization. Basically, you'll end up with all of that wealth and power centralized with the kinds of people who want it centralized. Society has faced this again and again for hundreds of years. Writing used to be something that only the most exclusive people had access to, and the same arguments were made: if you let everybody write, they're going to write things that we don't want them to write, and it's going to be really bad. Ditto with printing, ditto with the vote. Again and again, society has to fight against this natural predilection of the people who hold the status quo power to say, no, this is a threat. So when we ask: if AI turned out to be incredibly powerful, would it be better for society for that to be kept in the hands of a few, or spread out across society? My argument was the latter. Now, there's also an argument which says, don't worry about it, it's not going to be that powerful anyway. I just didn't want to go there, because it's not an argument that's easy to win; you can't really say what's going to happen, we're all just guessing. But I can very clearly say: well, if it happens, would it be a really good idea to only let Elon Musk have it, or to only let Donald Trump have it?

1:21:26

Speaker C

Dan Hendrycks spoke about this offense-defense asymmetry: it's actually very important for us to have countervailing defenses. But let's just take that as a given for a minute, because obviously, when we look at something like Meta and Facebook, it's quite clear what the power imbalance is. They control all of our data; they know what we're doing. With something like OpenAI and Claude, it's not as good as we thought it was, because actually, humans still need to be involved. But, for example, they have all of our data, right? You might be working on some new innovative technology, and you're using Claude and sending all of your information up there, and they can now copy you. I mean, what kind of risks are you talking about, to be more concrete?

1:23:57

Speaker B

Yeah, no, I mean, so I was not talking about any of those things. Right. So at the time I was talking about this speculative question of what if AI gets incredibly powerful?

1:24:37

Speaker C

I mean, like now, for example, they say that this is the new means of production, and that seems completely hyperbolic to me. But, like, in your best estimation now, if there are risks, what are they?

1:24:45

Speaker B

If there are risks with the current state of the technology? I think some of them are the ones we've discussed, which is people enfeebling themselves, basically losing their ability to become more competent over time. That's the risk I worry about the most. The privacy risk is there, but I'm not sure it's much more there than it was for Google and Microsoft before. You used to work at Microsoft; you know how much data they have about the average Outlook or Office user. Ditto for Google with the average Google Workspace or Gmail user. Those privacy issues are real, although I think there are bigger privacy issues around the companies the government can outsource data collection to. Back in the day it was companies like ChoicePoint and Acxiom; nowadays it's probably more companies like Palantir. The US government is actually prohibited from building large databases about US citizens, for example, but companies are not prohibited from doing so, and the government's not prohibited from contracting that out to those companies. So that's a huge worry, but I don't think it's one that AI is uniquely creating. You're in the UK; as you know, surveillance has been universal there for quite a while now. AI certainly makes it easier to use that surveillance, but a sufficiently well-resourced organization could just throw a thousand bodies at the problem. So I'm not sure these are new privacy problems. They're maybe more common than they used to be.

1:24:58

Speaker C

Yeah. Jeremy, I've just noticed the time. I need to get to the airport.

1:26:56

Speaker B

All right.

1:26:59

Speaker C

This has been amazing.

1:27:00

Speaker B

Thank you, sir. Thank you for coming.

1:27:01

Speaker C

Yeah.

1:27:03

Speaker B

Hope you had a nice trip.

1:27:03

Speaker C

Thank you so much.
