The a16z Show

How Foundation Models Evolved: A PhD Journey Through AI's Breakthrough Era

57 min
Jan 16, 2026
Summary

MIT professor Omar Khattab discusses his framework DSPy and argues that the path to AI progress isn't through AGI but through 'artificial programmable intelligence' - building structured systems around language models rather than relying on raw model scaling. He advocates for formal abstractions that allow developers to declare intent without drowning in implementation details.

Insights
  • The industry has moved away from believing that scaling model parameters and pre-training data alone will solve AI problems, now focusing on post-training pipelines, retrieval, and tool use
  • Natural language prompting is too ambiguous for building reliable AI systems, while traditional programming is too rigid - a new abstraction layer is needed that combines both
  • AI systems should be built as programmable, modular systems rather than monolithic models, similar to how software engineering evolved from assembly to higher-level languages
  • The real challenge isn't model capabilities but specification - helping humans articulate what they actually want from AI systems in a structured way
  • DSPy represents a paradigm shift from imperative to declarative programming for LLMs, allowing developers to specify intent while the system handles optimization
Trends
  • Shift from pure model scaling to systems-based approaches in AI development
  • Growing emphasis on post-training optimization and human feedback integration
  • Movement toward declarative programming paradigms for AI systems
  • Increased focus on AI system composability and modularity
  • Evolution from prompt engineering to formal AI programming frameworks
  • Integration of reinforcement learning techniques with language model optimization
  • Development of context-scaling techniques for handling longer inputs
  • Emergence of 'artificial programmable intelligence' as an alternative to the AGI pursuit
Quotes
"Nobody wants intelligence, period. I want something else, right? And that something else is always specific, or at least more specific."
Omar Khattab
"It's not a problem of capabilities. It's a problem of: actually, we don't necessarily just need models, we want systems."
Omar Khattab
"I'm interested in API or artificial programmable intelligence. And the reason I say this is why are we building AI? I think fundamentally it's in my opinion a way of improving and expanding the set of software systems we can build."
Omar Khattab
"That idea that scaling model parameters and scaling just pre-training data is all you need exists nowhere anymore. Nobody thinks that; actually, people deny they ever thought that at this point."
Omar Khattab
"The question sounds to me like: don't you have chairs at home? Don't you wish that they all looked like tables? I need both."
Omar Khattab
Full Transcript
3 Speakers
Speaker A

Nobody wants intelligence, period. I want something else, right? And that something else is always specific, or at least more specific. There is this kind of observed phenomenon where if you over-engineer intelligence, you regret it, because somebody figures out a more general and maybe potentially simpler method that scales better, and a lot of the hard-coded decisions you made are things you end up regretting. So I think it's fair to assume that models will get better and algorithms will get better and a lot of that stuff will improve. Then the question we really ask is: intelligence is great, but what problems are you actually trying to solve? That idea that scaling model parameters and scaling just pre-training data is all you need exists nowhere anymore. Nobody thinks that; actually, people deny they ever thought that at this point. Now you see these massively human-designed and very carefully constructed pipelines for post-training, where we really encode a lot of the things we want to do. You see massive emphasis on retrieval and web search and tool use and agent training. There is clearly a sense in which the labs have already recognized that the old playbook doesn't work. The question is, is that actually sufficient for making the best use, and the most use, of these language models? It's not a problem of capabilities. It's a problem of: actually, we don't necessarily just need models, we want systems.

0:00

Speaker B

The conventional wisdom says we're racing toward AGI by making language models bigger and bigger. But what if the entire framing is wrong? On today's episode, you'll hear from a16z general partner Martin Casado and guest Omar Khattab, assistant professor at MIT and creator of DSPy. Omar doesn't think we need artificial general intelligence. He thinks we need artificial programmable intelligence. And the difference matters more than you think. Here's the paradox. Khattab has built one of the most widely used frameworks for working with LLMs, DSPy. But he's skeptical that raw model capabilities will solve our problems. While others obsess over scaling laws and parameter counts, he's asking a more fundamental question: even if models become infinitely scalable, infinitely capable, how do humans actually specify what they want? Natural language is too ambiguous. Code is too rigid. We need something in between, a new abstraction layer that lets us declare intent without drowning in implementation details. Think of it as the jump from assembly to C, but for AI systems. The stakes are higher than prompt engineering. This is about whether AI becomes a programmable tool we can reason about and compose, or just an inscrutable oracle we prompt and pray to. We get into the three irreducible pieces of an AI system, why the model god is a dead end, and what it actually means to build software when intelligence is cheap but specification is hard.

1:14

Speaker C

Well, listen, Omar, it's great to have you and congratulations on everything. Just so for everybody that's listening. Omar is doing some, in my opinion, of kind of the more interesting technical work in building frameworks around LLMs and models. And, you know, a lot of this has consequences on things like, you know, AGI and capabilities and everything else. And a lot of your comments on social media to me have been kind of some of the most insightful. So I've been really looking forward to having you on the podcast.

2:37

Speaker A

Thank you for hosting me, Martin. Great to meet you guys and chat as well.

3:04

Speaker C

Awesome. So listen, maybe let's just start with your background, you know, since we have some shared roots and then we'll go from there to a general conversation.

3:06

Speaker A

So, I mean, I'm now an assistant professor at MIT. I started a few months ago in electrical engineering and computer science, and I'm part of CSAIL. I did my PhD at Stanford, where I think the timing was really interesting. I started in 2019 and I graduated about a year ago. That timing was really great because foundation models as a concept didn't even necessarily have that name; we hadn't coined it at Stanford yet. The idea was just starting to take shape. BERT had been around for about a year at the time, but people sort of hadn't really figured out how to make these models work. But, I would say as importantly, how to make use of them to build different types of systems and applications, which is basically what I did throughout my whole PhD.

3:16

Speaker C

So, I mean, you're, I presume, the primary person behind DSPy, is that correct?

3:55

Speaker A

You could say that, yeah.

4:00

Speaker C

Yeah, yeah. So for those of you who don't know, DSPy is widely used. We're going to be talking about it. It's one of the most widely used, I would say, open source projects around prompt optimization for LLMs. So maybe let's just go ahead and start. You have tweeted about whether LLMs will get to AGI or not. I know it's a kind of very fluffy, high-level place to start, but I would love your thoughts: are we headed towards AGI in the near term? Is this an apt goal? Where do you land? And it's particularly timely right now given the conversation that Andrej Karpathy just had on the Dwarkesh podcast, where he was like, well, maybe 10 years, if you're optimistic. Where do you weigh in on this debate?

4:01

Speaker A

So, I mean, I think honestly it's a surprising position, because I feel like, I'm not sure, but I'm less, sort of say, bearish than Karpathy, necessarily, you know.

4:44

Speaker C

You are less bearish than Karpathy on AGI.

4:52

Speaker A

Right. Which is very strange to me. But let me tell you what I think. So back when I started my PhD, basically you could look at a lot of the work that we've done, with my advisors and collaborators and others over the past six years or so, as pushing back on this perspective that scaling model size, and maybe doing a little bit more pre-training (and especially at the time it really was about model size, just doing more uniform scaling of that nature), is just going to solve all of your problems. And the pushback has two sides. One side is that this is an incredibly inefficient way to build capabilities that you care about, if you know what you want. Waiting for everything to emerge is just incredibly inefficient, and the diminishing returns speak for themselves. The other problem is really a problem of specification, or of abstractions. Scaling language models makes this, I think, unrealistic bet that anything people want to build with these models is just a few words away, and that people know how to actually think of what those words should be. I think it's an incredibly limiting abstraction. But the reason I'm less bearish than maybe Karpathy sounded, although again I'm not really sure, is that I think we're seeing very rapid improvement in the perspective that we see out of the frontier labs. That idea that scaling model parameters and scaling just pre-training data is all you need exists nowhere anymore. Nobody thinks that; actually, people deny they ever thought that at this point. And now you see these massively human-designed and very carefully constructed pipelines for post-training, where we really encode a lot of the things we want to do. You see massive emphasis on retrieval and web search and tool use and agent training.
And you see all of this emphasis on, you know, OpenAI at their latest event was building this agent builder, and they have products like Codex and others. So there is clearly a sense in which the labs have already recognized that the old playbook doesn't work, or at least that it's not complete. And so if by AGI we just mean this thing where, for a very large set of problems, you can ask it those problems and, as long as you give it enough context, it's able to handle them, well, the models are increasingly powerful and reliable. The question is, is that actually sufficient for making the best use, and the most use, of these language models? And I think that's where my fundamental pushback doesn't go anywhere. Because it's not a problem of capabilities, it's a problem of: actually, we don't necessarily just need models, we want systems. And I can speak a lot more about that.

4:58

Speaker C

Yeah. So I just wanted to dig into that a little bit. So there is a view of the world that some variant of the transformer architecture is going to get us there. And then the end-to-end argument kind of suggests that you put all the data into one model, and you have one model that will just become so good, because scaling laws hold, that it solves all of reasoning. Right. That's kind of this absolutist end-to-end argument.

7:38

Speaker A

I think nobody believes that anymore anyway.

8:07

Speaker C

I think people do in video. Maybe not in LLMs, but in video, I think a lot of people are like, listen, there will be one video model that you put everything in. It does everything: it does 3D, it does physics, it does whatever. So maybe in LLMs people don't believe that anymore, because they've been at it for long enough to suggest it's not true. There's another view, which is that LLMs are totally a dead end. What did Karpathy call them? Ghosts, which I thought was so beautiful, which is, you know, they can kind of do some sort of linear interpolation of stuff that they've heard in the past, but they can't do planning. And so you need an entirely new architecture. And you're saying that you're not in that camp of needing an entirely new architecture?

8:10

Speaker A

It depends, because I've been arguing for a different architecture for years. But that different architecture is built around having these models.

8:47

Speaker C

No, no, a hundred percent. Yeah, that was the third one I was going to say. So the first one is, like, one model rules them all. The second one is, this is the wrong path and there is no kind of system you could build with these models; you've got to do something totally different. Right. I would say, like, Yann LeCun would say that with JEPA or whatever. Like, you need to do something fundamentally different. And then you are in this third spot, which is: you can build some sort of system with these models, and you can get to, I mean, AGI is such a loose word, but you can actually get to what we're trying to achieve, which is pretty generalized intelligence to tackle any sort of problems. Is that a fair characterization?

8:54

Speaker A

I think so. I mean, I think AGI is fairly irrelevant. Like, it's not the thing I'm interested in. I joke sometimes that I'm interested in API, or artificial programmable intelligence. And the reason I say this is: why are we building AI? Why are we seeking to build AGI? You can take a step back and ask, well, maybe it's a scientific question, or maybe it's just a dream people have, but I think fundamentally it's, in my opinion, a way of improving and expanding the set of software systems we can build, or just systems we can build in the world. And if you think about why people build systems, software systems as an example, but really any engineering endeavor, it's not really that we lack general intelligences. There are billions of general intelligences out there: there are 8 billion people. We build the systems because we want them to be processes that are reliable, interpretable, easy to iterate on, modular, that we can study, that are scalable and efficient. There is a reason we care about systems, and it's not that we lack intelligence. So the question that I think is most important is: how do we build programmable intelligences? And I think the alignment folks get some of this right. You could have a very powerful model that doesn't listen to what you say, and a lot of pre-trained models could be perceived that way. They have a lot of latent capabilities, presumably. And the question is, could you make it do what you want?
But I think what alignment fails to do, at least as a general way of thinking, is that it omits to think about, well, what is actually the shape of the intent that people want to communicate to these models? How can I get people to actually express what it is that they want to happen? And with that bottleneck being as narrow and tight as it is, it's not a question of whether the models are capable enough or not. So that's why I'm saying I might be even less bearish than Karpathy about whether the models will get so good that, given all the right context and the right instructions and the right tools, they become powerful. Yeah, maybe. I think this is very aligned.

9:27

Speaker C

Again, not to refer to another discussion that's not on here, but just in general: you take issue with the definition of AGI as being the same thing as an animal or a human, which is not actually particularly interesting given there's a bunch of animals and humans. But we actually want smarter software systems, and you think a systems-based approach to models is the right way to get there. Is that fair enough? So it's not going to be one model, it's going to be a set. Then can you maybe roughly sketch out what you think is the right way to build a system to do this? What are the components that are meaningful in this?

11:33

Speaker A

So I would say the first inspiring concept here, or the starting point for this conversation, is: look, to be honest, I have no idea what the core capabilities of the models will be today, tomorrow, in a year, in 10 years. I just don't really know. And I'm invested in getting a sense of how that will happen and progress. And it's easy to model different paths based on how you think the progress that's been happening has been happening. But in any case, there is kind of a bitter lesson to keep in mind. And I don't necessarily mean Rich Sutton's own interpretation of his great essay. I just mean it is true that there is this kind of observed phenomenon where, if you over-engineer intelligence in AI, you regret it, because somebody figures out a more general and maybe potentially simpler method that scales better, and a lot of the hard-coded decisions you made are things you end up regretting.

12:14

Speaker C

Yeah.

13:08

Speaker A

So I think it's fair to assume that models will get better and algorithms will get better and a lot of that stuff will improve. And then the question we really ask is, well, intelligence is great, but what problems are you actually trying to solve? What is the application that you want to improve? Are you trying to, I don't know, help doctors do medicine? Are you trying to improve certain types of research, cure cancer maybe? Are you trying to build the next Codex or Cursor, or one of these types of coding applications? So the question is, what are you actually trying to solve? And I would argue that intelligence is this really amazing, powerful concept precisely because it's a foundation for a lot of applications. And the analogy I like to draw here is improvements in chip manufacturing and increasing numbers of transistors in CPUs. Nobody thinks that more powerful general-purpose computers make software obsolete or make us forget about systems. The way to think about it is that they make software possible, but you still need to have a stack. So back to your question: what should the stack look like? I think the first thing we need to agree on is, what is the language, so to speak? What is the medium of encoding of intent and of structure with which we can specify our systems to that computational substrate?

13:09

Speaker C

So, yeah, could we approach exactly this question? I have a line of inquiry on exactly this question, which is: what is the right language to specify with? So I'd love you to tell me why this is the wrong approach. Let's assume I'm an advocate for the God model, right? Models just keep getting better. There's one model that keeps getting better. Let's say that my task is software, and I want to build the game Core War.

14:29

Speaker A

What is it called?

14:55

Speaker C

Core War. Okay, this is a very old hacker game from, like, the 1970s, where you would write programs that would try to kill each other. So let's say I want to build an online multiplayer version of Core War. What is wrong with the following approach? I have a prompt that says: I want to build a multiplayer version of Core War that's online. And that's my prompt. And then I just sit and wait for models to get better. Why is that not the right approach?

14:56

Speaker A

So actually, something about what you said is great. You just said it: you expressed the thing you want, and you were so lucky that the thing you wanted was easy enough to express. You're assuming that the speaker in this abstract hypothetical scenario is being honest; what they want is to build that particular software you mentioned, they fully specified it in a single sentence, and they're not doing anything else. They're waiting for the models to get better. The only issue I have with this, by the way, is that, well, I don't know how long you're going to wait, but if you're comfortable with it, that's actually endorsed by me. The problem is, as you're probably trying to hint at: well, for most things people want, especially most things that don't exist yet, there is no five-word statement that even the best intelligence in the world is going to do for you. And there is really a nontrivial amount of alignment-ish... like, that's such a loaded...

15:26

Speaker C

That is such an important statement for this discussion, which is: there is no, say, simple way to describe what you want. There are multiple ways to interpret why that is. One of them is: I don't know what I want. Another is: my wants are complex, so I want to use a lot of words. And another one is: there are actually fundamental trade-offs, so my wants would be ambiguous. Are you talking about all three of those?

16:18

Speaker A

Yeah, I'm talking about all three. When it comes to actually getting people to express what they want from a system, I mean, the premise I start from is: people want systems. Nobody wants intelligence, period.

16:44

Speaker C

This is such a great point. I actually really like how you said that. I hadn't thought about it that way.

16:56

Speaker A

But I don't want better GPUs, right? I want maybe a neural network. I want something else, right? And that something else is always specific, or at least more specific. And so the question is, what is the number of things people can want? So if the vision of AGI... and by the way, the reason I said I'm not necessarily super pessimistic in practice is that the frontier AI labs kind of tackle these things one at a time, but they've had enough of a track record for me to reasonably expect that when they reach a bottleneck, they go to the next thing and unblock themselves, right? So that's great. But at some point, there's a view of AGI which is GPT-3, the original GPT-3, but scaled 1 million times or 1 billion times, and you get a GPT-10. And it's that GPT-10 that you go to in order to not build a system, right? In order to just treat it as the end-user-facing system. Every time you go, you juggle your context and you juggle your prompting, which, maybe because the model is so good, might not be that hard. And you ask it from scratch every time, or some ridiculous thing like that. And I think in the grand scheme of things, people are slowly realizing that's obviously not what you want. And so this is the argument for systems: all of this decision-making that happens in making a concrete application or product is a thing that encodes taste and knowledge about the world, and also knowledge about human preferences, or some substrate of a complete story that you want. And it systematizes it, it encodes it, it makes it maintainable and portable and modular, because that's all the stuff we like to have in building systems. And the moment you start thinking that way, you don't want that to be a blurb, like a string blurb.

17:00

Speaker C

Yeah. So, I mean, I don't want to get too philosophical, but for me this always begs this very interesting question. Let's just take what you're saying at face value, which is: I have a lot of complex wants, and those shift over time, and so a string will never encapsulate them. And so I'll want to say a whole bunch of stuff and maybe pull some context in. But it could be the case that these models are so powerful that I just start to abdicate want. Do you ever think about that? I'll just want less. I'll be like, I want whatever the model gives me. Do you think that there's any direction in the future where we just are less picky about our actual wants and we converge to these high-level things? Or are you really convicted that...

18:42

Speaker A

No, this is totally possible. I mean, recommendation algorithms versus like search algorithms. Recommendation algorithms are like, give me what I want.

19:24

Speaker C

Yeah. So literally, my universal prompt, the one I could take to the beach and, every time there's a new model, just go use on that new model rather than building a complex system, would be: give me what I want right now. Right, right.

19:30

Speaker A

And over time that model can train you, like a recommendation feed, right? Like, you just open the For You tab and accept exactly what it gives you. But I mean, I hope it doesn't, you know. But that requires such a fundamental... that's a choice we can make. And a different choice we can make is: well, actually, no, we do care about building systems and encoding knowledge into them. One thing that's been growing on me for a while, to make this slightly less philosophical, although maybe not much, is the idea in machine learning that there is no free lunch. It's kind of a fundamental and old and known idea, and there are a lot of interpretations of the same theory. The theory is true, it's a mathematical statement, and it basically just says: if you assume nothing about the world, all learning algorithms you can build, and all learners you build, are equally bad, pretty much. And once you understand the mathematical version of that, it's almost a really simple statement. And I think it's something that comes up time and time again: something fundamental about intelligence, as we call it, is actually about knowing our world and knowing, because we're humans, what humans are like and what humans are interested in. And you can't kind of scale your way into that. Now, if humans themselves change their preferences to be simpler, yeah, that's a future that's possible.

19:43
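[Editor's note: the no-free-lunch statement above can be checked directly on a toy example. The sketch below, with invented names, enumerates every possible boolean "world" on a 2-bit input domain and shows that two opposite learners achieve identical average accuracy on unseen inputs, exactly as the theorem predicts when no assumption about the world is made.]

```python
from itertools import product

def average_accuracy(learner):
    """Average off-training-set accuracy of `learner`, taken over every
    possible boolean target function on a 2-bit input domain."""
    domain = [(0, 0), (0, 1), (1, 0), (1, 1)]
    train_x = (0, 0)          # the single point the learner gets to observe
    test_points = domain[1:]  # the three unseen points
    total, count = 0, 0
    # Enumerate all 2^4 = 16 target functions (one label per domain point).
    for labels in product([0, 1], repeat=4):
        target = dict(zip(domain, labels))
        for x in test_points:
            if learner(train_x, target[train_x], x) == target[x]:
                total += 1
            count += 1
    return total / count

# Two maximally different learners:
memorize = lambda tx, ty, x: ty        # predict the training label everywhere
contrarian = lambda tx, ty, x: 1 - ty  # predict its opposite everywhere

# Averaged over ALL possible worlds, both score exactly 0.5 on unseen points.
```

The moment you restrict attention to structured worlds (say, smooth or compressible targets), the tie breaks, which is the speaker's point: useful intelligence comes from encoding knowledge about which worlds actually occur.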

Speaker C

I actually agree. I think sometimes we want real solutions to problems; there are fundamental trade-offs, and we have to articulate those trade-offs. Right? There is no simplified version of the answer, given what you want to accomplish. So let's assume we're in that world. I can't go to the beach with my one prompt; instead I have to actually describe things. So you've done work on DSPy, which I think is, in my opinion, just the most systematic approach to making the prompt more powerful. So maybe you can describe DSPy, how it works, and how it addresses this problem. Yeah, sure. Very specifically, the problem is: we've decided that my one prompt, and just waiting for the model to get better, is not going to be sufficient for whatever reason. So now I need a better way to think about prompting. Yeah.

21:02

Speaker A

So actually, think back to your example. Suppose that what you wanted to build was a bit more complex, so there was more specification involved. But suppose also that you were in more of a rush, because, again, applications don't want to wait. I'm building a system; I want to use the best intelligence, so to speak, that exists now, but I do need to proceed. And so the question is, what are you going to do? One of the hardest things that makes communicating the DSPy stuff difficult is that we've been doing some version or another of this for something like six years, and DSPy itself is three years old. A lot of this was codified before a lot of the changes in the field, which makes some of the conversations slightly trickier. But what people did for the longest time, in 2022 when people were tinkering with early models, and in 2023, and with only some slight change starting in 2024, and fundamentally to this day, is that the biggest hurdle in using a model is prompt engineering. Which is, at least in my understanding, and really I think the most canonical understanding of it, changing the way in which you express what you want such that it evokes the model's capabilities in the right ways. And so this is less about things that are much more timeless and important; it's really about the belief that there is a slightly different wording of what you ask that could get the model to behave a lot better. And the problem is that this is actually true. This is true for the latest models. This is why OpenAI and Anthropic and others release prompting guides, even for the latest models. And they say, well, you're not holding it right. And they're correct.
But for the most part, the argument that early DSPy was making is that...

21:46

Speaker C

How do you pronounce it? DSPy?

23:24

Speaker A

Yeah, DSPy, like NumPy.

23:26

Speaker C

Oh, I love that. Oh, I almost called it dspy.

23:27

Speaker A

The argument that we were making was: the models keep getting better, but in any case, they keep changing, and the thing you want to build changes a lot more slowly. I'm not saying it doesn't change, but there is actually a conceptual separation between what you are trying to build and LLMs and vision language models; that space is basically separate. So what if we could try to capture your intent in some kind of purer form? And that intent has to go through language. The reason you're trying to use AI is that there's some inherent underspecification and fuzziness; you're trying to defer some decision-making. "I don't know how this function should exactly behave in every edge case, but please be reasonable" is what you're trying to communicate, right, with these types of programs you're building. So DSPy says, basically, there is a number of ideas that you need, and you need them together, which is the thing that I think is a little trickier for a lot of people. There are five bets that DSPy makes, and you need them together, and they need to be seamlessly composable. And actually, in order to get all five, you don't need five concepts; you fundamentally need one concept. So the idea is: we have Python, we have programming languages. These programming languages encode a lot of things that are highly nontrivial. First of all, they have control flow. And control flow means that I can get modular pieces really easily, because I can define separate functions and modules. The nice thing about them is that they really give you a bunch of stuff. They create a notion of separation of concerns, where the contracts of different functions can be described without you knowing or caring about everything inside the function. If you trust that a function was built properly, you can just invoke it and it does its job.
And then you can reason about how you compose these things, but you can also compose algorithms over functions. I can have a more general processor or function that takes these functions and applies things on top of them that are higher-level concerns. I can refer to variables and objects and mutate them or pass them around. When I say if this, then that, I really mean it. I don't have to go back to the model to reassure it, to promise I'll tip it a thousand dollars so that it actually listens to that if statement. Now, one objection is that this is a really limiting paradigm. Conventional programming is a really limiting paradigm; why would we want to go back to it? And I think the answer is all of the things I just mentioned, all these symbolic benefits from a specification standpoint. This is not about capabilities. These benefits are really hard to encode in natural language. You can reinvent them: you can tell the model, if you see this, then do that. And the model might reasonably say, well, he didn't actually mean it 100% of the time; I think the reasonable thing this time is an exception. Right?

23:31

Speaker C

Well, you actually can't do that with natural languages without implicitly creating a formal language. I mean, the most obvious version of this is ambiguity. Right? So: the dog brought me the ball and I kicked it. That's fundamentally ambiguous. You don't know if I kicked the dog or if I kicked the ball. And both are totally reasonable, depending on the person. Right. And so at some level, English doesn't do the job.

26:08

Speaker A

Right, I agree. But programming languages are also really fundamentally limited in that you have to over-specify what you want. You kind of have to go above and beyond what you actually want, because no ambiguity is allowed. And that forces you to think through things you maybe don't even know how to do. Like, how do you write a function that generates good search queries, or that plays a game well? It's very difficult to do.

26:34

Speaker C

Yeah, I don't want to get too wonky, because I know where you're going. And I just have to say this because it helps frame this conversation. By the way, what you said on X, which we're getting to, really kind of changed my brain. So for imperative programming, that's absolutely the case. Right? You need to know everything that possibly happens.

27:01

Speaker A

Or if you don't know, the language is going to make a very fixed assumption for you.

27:18

Speaker C

Yeah, it's going to make some basic assumption. Right. So it's almost like if you're managing a state machine, you've got to know every state machine transition. That's an imperative language. Declarative languages are quite different. Right. In declarative languages, you actually specify what you want formally, and then the system figures out how to get to that end state. Right. But the problem is you have to be able to specify every aspect of that end state perfectly, which again, for some problems, is very complex. So that's also limited. And you just have to know the end state.
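The contrast being described can be sketched in a few lines of plain Python (an invented toy example, not from the conversation): the imperative version spells out every transition, while the declarative version states only the end state and pays for it with unbounded search cost.

```python
from itertools import permutations

# Imperative: spell out every state transition yourself.
def imperative_sort(xs):
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

# Declarative: state only WHAT the end state is; a generic engine finds it.
def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

def declarative_sort(xs):
    # Deliberately naive solver: search until the specification holds. The
    # cost is unbounded up front, which is the trade-off mentioned above.
    return next(list(p) for p in permutations(xs) if is_sorted(p))

print(imperative_sort([3, 1, 2]))
print(declarative_sort([3, 1, 2]))
```

Both return the same sorted list; only the division of labor between programmer and system differs.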

27:24

Speaker A

Yeah.

27:55

Speaker C

So now you're working on DSPy, and I would love to hear, you know, you talked about how using LLMs with a bit more formalism pushes it to yet another level.

27:56

Speaker A

Right. So the only new abstraction in DSPy, and it's incredibly simple, is this notion of signatures; the word is just borrowed from function signatures. Our most fundamental idea, which is just so basic and simple, is that interactions with language models, in order to build AI software, should isolate ambiguity into functions. And how do you declaratively specify a function? I think the most fundamental thing is that it takes a bunch of objects. They had better be typed, and they had better have interesting and meaningful names. It does a transformation to them, and you get back some structured object, potentially carrying multiple pieces. And when you do this, it's your job, and this is not easy, but it is your job, to describe exactly what you want without thinking particularly about the specific model or compositions you have in mind. And this is actually a lot harder than it sounds to most people. So, for example, there is a class of problems for which some people actually write prompts that are almost signatures. These are cases where you only have one input, and your output is just a response. You basically take a chatbot, because the APIs, or the models, are structured such that this is a very natural use case. And these people try to prompt minimally, right? They don't say, you know, "think step by step" or "you're an agent that's supposed to do this"; they just say what they want. So there's a class of people that almost implicitly write signatures, but there's something wrong with the fundamental shape of the API that usually exists.
And so signatures are just saying: here is a better shape. And we made every decision here slightly more carefully. Now, once you have signatures, every other part of DSPy, from an abstraction standpoint, falls out of it. There's really nothing else. Once you have signatures, note that all you have is a function declaration. It's just declaring a function; it doesn't do anything. One of the hardest things about people wrapping their heads around DSPy signatures is that a signature does absolutely nothing. And it's entirely their job to build it. Right? We actually can't help them at all build the signatures. A lot of the time people are like, well, couldn't you generate the signature from this or that? The signature is encoding your intent. I know nothing about your intent up front. That's the whole point.
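A minimal sketch of the idea in plain Python (a toy stand-in with invented names, not the real DSPy API): a signature is a typed, named declaration of inputs and outputs plus a fuzzy English description, and by itself it performs no computation.

```python
from dataclasses import dataclass

# Toy stand-in for a signature (NOT the real DSPy API).
@dataclass(frozen=True)
class ToySignature:
    instructions: str   # the English, intentionally fuzzy part of the spec
    inputs: tuple       # (name, type) pairs; names carry semantic meaning
    outputs: tuple      # (name, type) pairs

# Declaring a signature only encodes intent; it does absolutely nothing
# on its own, exactly as described in the conversation.
summarize = ToySignature(
    instructions="Given a list of documents, produce a faithful one-paragraph summary.",
    inputs=(("documents", list),),
    outputs=(("summary", str),),
)

print(summarize.outputs[0][0])
```

A runtime (a module, in DSPy's terms) would later take such a declaration and decide how to actually invoke a model against it.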

28:07

Speaker C

So, to be very clear, what are the signatures written in?

30:18

Speaker A

I mean, fundamentally, it could be a drag and drop thing. It could be whatever. But the point is, it is a Python class. Usually this is Python.

30:21

Speaker C

It's formal. It's formal. It's not English, right?

30:30

Speaker A

Well, it's a formal structure in which almost every piece is a fuzzy, English-based description. So you could say something like: I want a signature that takes a list of documents, and the list of documents is the typed object. You could actually say list of document, and you have to define what the typed document means. And the fact that this type is document, maybe the name, matters. A list of documents is not necessarily the same as a list of images, right? They're different things, and they're semantically and fuzzily different. And basically it says, in English: given these inputs, and you have several of them, I want to get these outputs, and you have several of them, and maybe the order matters. So I argue it's really just what a prompt is supposed to be, or what a prompt wants to be when it grows up. It really is just a cleaner prompt. Now, if you grant me that, which I argue is a really small, very simple contribution, there is really not a lot of richness to this. But that's the point. You get everything else that makes programming great while being able to build really powerful AI systems, because you can now isolate your ambiguity at the right joints. You have a notion of where you want the joints to be, and the rest of your programs can be very modular. You can have multiple signatures, so now you get what people call multi-agent systems. Multi-agent systems are just AI programs in which you have multiple functions. It's really not a complicated idea. Once you take this, you get things like inference-time strategies: people say chain of thought, you have to write your prompt in this way, or we have to train the model in a certain way, or ReAct agents, or program of thought. We recently released this thing called recursive language models.
The thing is, when you're solving a task, none of these inference strategies should be your concern unless you want them to be. This is just a thing that should be compositional, and signatures have the right shape that we can use programmatic constructs to compose over them.
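The composability being described can be sketched as follows (invented names, a toy illustration rather than DSPy internals): an inference strategy like chain of thought is just a wrapper composed over a task function, so the task declaration itself never mentions the strategy.

```python
# Stand-in for a language model call (invented; no real API assumed).
def predict(prompt: str) -> str:
    return f"answer({prompt})"

# A strategy wraps ANY predictor: elicit reasoning first, then answer.
def chain_of_thought(fn):
    def wrapped(question: str) -> str:
        reasoning = fn(f"Think step by step: {question}")
        return fn(f"Using '{reasoning}', answer: {question}")
    return wrapped

plain = predict                       # the task, with no strategy attached
with_cot = chain_of_thought(predict)  # same task, strategy composed on top

print(plain("2+2"))
print(with_cot("2+2"))
```

Swapping strategies (ReAct, program of thought, and so on) would mean swapping the wrapper, never editing the task declaration.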

30:32

Speaker C

When you think of DSPy, when you were originally creating it and now, do you think about it as something that will fundamentally only be consumed by humans, or built for humans?

32:16

Speaker A

Not at all, no. I can definitely imagine cases where you bridge that gap.

32:27

Speaker C

And the reason I ask is there's this obvious question: if the interface to LLMs is going to be all automated anyway, do we need to enforce these restrictions, which are primarily there to keep natural language speakers within certain boundaries? If it's an agent calling it, we may not need to do that.

32:34

Speaker A

So I think the argument in DSPy is that intent should be expressed in its most natural form. That's the declarative part. And the second part is that, unfortunately or fortunately, in the general case that cannot be reduced below three forms. Some things are really best expressed as code, and no amount of automation can remove that. There's no amount of automation that can remove the fact that I actually want to think about three pieces because they're separate to me and I want to maintain them separately. No amount of automation is going to remove the natural language piece; nobody wants to write Python to describe a really complicated AI system from scratch. And no amount of automation is going to remove the fact that for some classes of problems you really need a more RL-like standpoint, where you have a distribution of initial states or inputs and you have a way of judging them, or metadata about what correctness looks like. Because that really captures the wonky, exceptional long tail of problems that actually vary by implementation or by model.
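A toy sketch of how the three forms might sit together in one program (an invented example, not real DSPy code): English instructions for the fuzzy part, plain code for control flow, and data plus a metric for the long tail.

```python
# 1. Natural language: the fuzzy part of the specification.
INSTRUCTIONS = "Classify the sentiment of the text as 'pos' or 'neg'."

# 2. Code: control flow you literally mean. The classifier is a stand-in
#    for a language model call that would be guided by INSTRUCTIONS.
def classify(text: str) -> str:
    return "pos" if "good" in text else "neg"

def program(texts):
    return [classify(t) for t in texts]   # composition is plain Python

# 3. Data + metric: a distribution of inputs and a way of judging outputs,
#    capturing the long tail that prose and code do not fully pin down.
examples = [("good movie", "pos"), ("bad movie", "neg")]

def score(fn):
    return sum(fn(text) == label for text, label in examples) / len(examples)

print(score(classify))
```

An optimizer would be free to change how the model is prompted or trained, but each of the three pieces remains the authoritative record of its slice of intent.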

32:53

Speaker C

Yeah, but you may also just want diversity: give me something that may solve this problem. Right? It may just be that there is no formal specification. Yeah, totally.

33:52

Speaker A

Right. So people associate DSPy most with the piece that is most different from what they usually see, which is optimization. A lot of new users, and a lot of people that look at the paradigm and try to critique it conceptually, miss the fact that you have to have these three pieces, or at least that in the general case you can't get away without any of them. Now, by the way, there are a lot of applications where you do not need all three. If you're building yet another RAG app and the model has been post-trained to death to take a context and answer a question about it, you don't really need a lot of that to express your intent, because it's close to what the model is good at anyway. A lot of people associate DSPy with the third one, which is data-based optimization. And a lot of well-intentioned users would write overly simplistic and general programs and try to distill their intent through data, or through this process of trial and error. And that's really a misuse of the power of the models and the power of the paradigm. Because if you know what you want, nothing can express it better than you just saying what you want. Data-based optimization is there to smooth the rough edges; it's there so you don't have to maintain laundry lists of exceptions. I'll wrap this up quickly, Martin. The other part of DSPy is the reason we built all of these abstractions and haven't been changing them. These abstractions are basically three years old, and they've basically not been changing. What we spend a lot of our research time on is building algorithms. And the thing about those algorithms is that I'm not wedded to any of them.
We usually get excited about one for a month or something, but I rarely go out and get particularly excited about getting anyone to pick one of them over the other. We recently released an amazing genetic optimizer for prompts called GEPA. Before that we had another one called SIMBA, which was this reflective method. And we had MIPRO before that. We have a lot of these algorithms, and they're really clever and cool. But what I'm interested in is that we build these algorithms to expire. As models get better, we can come up with better algorithms for turning the abstractions into higher-quality systems. And what we want to happen over time is that our algorithms expire and we build better ones, but the abstractions that we promised, and the systems that people express in those abstractions, remain as unchanged as possible. So that's something that's kind of unusual to a lot of folks in the space.

34:01

Speaker C

It may also help to pencil out where this sits in the software development life cycle, right? There are two places you could put it. You could just be like, I am writing my software, I want to know what's the best prompt to use, and then you could use it there. Or you could be like, actually, the best prompt is determined at runtime, and so maybe you could invoke it then. So is there a standard use? Do you do this basically before the software is deployed, or are there actual runtime uses where you're trying to find the right prompt?

36:19

Speaker A

So there are two concepts that exist in DSPy for this, and I don't know how technical we want this to be. We have the notion of modules, which is borrowed directly from neural network layers, PyTorch modules. A module is just saying: once I have the shape of the input and the shape of the output, which is a signature, I can actually build a learnable piece that has some inherent structure, what a machine learning person would call an inductive bias. I want it to take that shape and implement it for me, but carry some parameters internally about what it could learn. So that's a module. And a module is entirely an inference-time object, in the sense that it modifies behavior when it's being invoked. So things like agents, different types of agents, code-based or tool-based agents, or chain-of-thought reasoning, all of these are inference-time strategies that are modules. And the key aspect in DSPy here is that these must be decoupled from your signature. Your signature should know nothing about the inference-time techniques that you're using. The other concept in DSPy is optimizers, which are again just functions, like modules are just functions, but they're functions that take your whole program, an actual complete piece of software that has potentially many pieces. And they think holistically: how do I use language models to get this thing to perform its intended goal, which might be maximizing a score on a test set, but in principle could just be doing what the model understands from the instructions it should be doing. And people do this at inference time sometimes, in the sense that it happens while the user of a system waits, so to speak. But it's a fundamentally different contract, because an optimizer sees extra information. I see the whole system when I'm an optimizer. I don't just see an isolated module; I can see all of the pieces.
I can see a data distribution, I can see a notion of reward. And so I have a much richer space, because there's strictly more information, which no inference technique, no LLM, is able to capture, just from an information flow standpoint.
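The optimizer contract being described can be sketched roughly like this (a toy with invented names, not one of DSPy's actual optimizers): an optimizer is a function over the whole program plus data and a metric, returning a better-configured program, while an individual module only ever sees a single invocation.

```python
# A family of candidate programs, parameterized by a configuration value.
def make_program(threshold: float):
    def program(x: float) -> str:
        return "high" if x >= threshold else "low"
    return program

# The optimizer sees the WHOLE picture: program family, data, and metric.
def optimize(make, candidates, data, metric):
    scored = [(metric(make(c), data), c) for c in candidates]
    _, best = max(scored)   # holistic search against the metric
    return make(best)

data = [(0.2, "low"), (0.9, "high"), (0.6, "high")]

def accuracy(program, data):
    return sum(program(x) == y for x, y in data) / len(data)

tuned = optimize(make_program, [0.3, 0.5, 0.8], data, accuracy)
print(tuned(0.6))
```

Here the "configuration" is a single number; in the real setting it could be prompts, demonstrations, or model weights, but the contract, whole program in, improved program out, is the same.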

36:52

Speaker C

You know, it's interesting, because a lot of people think of DSPy as basically a prompt optimizer: here's my prompt template, tell me what prompt would be the best. But to hear you describe it, it's almost like this kind of declarative runtime type of thing. Do you know what the standard use is? I don't even know if you have visibility into this stuff, because it's an open source project. But is it the naive use case, which is largely prompt optimization, or are people actually using this in more sophisticated ways?

38:50

Speaker A

Okay, so I'm very loud about the abstractions. I talk about them all the time. I give talks about them.

39:22

Speaker C

You scolded me on X about this. Sorry. No, it was fantastic. It was great. You really corrected me. Listen, I was one of the few people that really thought about it as a prompt optimizer. I really thought: I'm going to write my prompt, I'm going to do some templating magic, I'm going to give it to DSPy, and then it's going to give me the best thing for what I want to accomplish, and then I'm just going to stick that in my program. That's the way that I thought about it, until you made the point that it's actually more of a set of abstractions that will evolve with your program.

39:30

Speaker A

So I tried to learn from what happened historically in computer science. You had these machines, you got general-purpose chips, and people were programming those directly in whatever language they spoke, right? Machine code. And maybe you could abstract it slightly with assembly. But then there was this amazing time where a lot of languages, culminating maybe most popularly in C, but various others before it, got this idea that there's actually a general-purpose programming model. You could build a model of a computer without thinking about any specific computer. And actually that's a bit of an illusion, because every specific computer is a lot more complicated. But you could create this illusion that is much more portable and much closer to how humans think. And I know it's funny to describe C as close to how humans think, but it really was a fundamental jump. Once you have C, it's important to ask: why do people use C instead of writing assembly? And it would be really weird to me if someone said they use C because it's faster than assembly, like the code runs faster. So to me, when someone says they use DSPy because the quality of the system is higher, which by the way is very often the case, it's not really the right answer, because you're jumping, in my opinion, to a higher-level abstraction, such that I would actually be willing to give up some speed in order to have the portability, the maintainability, the closeness to how I think about the system and how I want to manage its pieces. There is a trade-off I'm willing to accept. Now, the reason people have actually universalized C and don't regret it is this amazing compiler ecosystem, where people build all of these optimization algorithms and passes and all sorts of infrastructure. You inline functions, so you break the modularity. Right? People are writing modular code, but you're actually breaking a lot of that modularity when it's being turned into an executable artifact.
You eliminate dead code. You have all these heuristics, sometimes different heuristics for different machines. And so my vision here is: if AI software is a thing, and AI engineering is a thing that needs to exist irrespective of model capability, because we want to have this diverse space of systems, what is the abstraction to capture it? And if natural language is too ambiguous to be the only complete specification of these systems, and it's too mushy, and we kind of want more structure, well, what would that structure look like? And if we know what that structure looks like, well, if we do it naively, you would actually lose a lot. If we build DSPy poorly, you might have a really elegant program that sucks. Right? Like, when you run it with an LM, it sucks. So the reason I build optimizers, or we build optimizers as a team, is not so much that I think people can't write prompts and I want to write better prompts for them. What a boring reason. I don't care about that. People can write prompts, people can iterate on prompts; that's not an issue. The thing I'm trying to say is: I want them to express their intent in a way that is less model-specific, and not worry that they're leaving a lot on the table.

39:56

Speaker C

And honestly, this is where you changed the way that I think about this whole thing. And so I'm going to try something I alluded to previously in our talk, but I want to try it again here, because this is kind of how it changed my thinking. You can tell me where I'm right or wrong. So you said assembly and C, but we've actually had a lot of paradigm shifts since then. C, let's just say, is kind of an imperative language, where for every event that happens, you have to know how to handle it. Right? And so traditionally in distributed systems, imperative approaches have not worked well, because you could have some event show up at a node and you don't know what state the node is in, and the state space is so huge. And so you actually had a big abstraction shift with declarative languages, where declarative languages would be like: okay, listen, we're going to tell you what the end state of the system is, and then the system will figure out all of the state transitions to get there, right? This was a higher level of abstraction, so people didn't have to worry about every event that comes in. And you can actually declare, in something like Datalog: here are all of the conditions that exist, and I just want to make sure that the system is always in that state. And the thing you give up in that case is that you can't bound the amount of computation needed. You don't know how long it's going to take to get there, but it will always get you to that state. So it's easier for a programmer; you can actually build programs more easily for certain classes of programs. Now, when I look at DSPy, I feel like it's the same type of leap between imperative and declarative, but for LLMs.
Like, you can't write a declarative program that's going to solve the same problems that an LLM can, because there's no fuzzy this-and-that, and you can't really integrate them. Right? And so you want the same type of shift, because you've got a new problem domain. And DSPy kind of gives you that with LLMs. You can formally specify it in a way that's natural but also safe, and then it decouples you from the actual implementation below it. So is that a fair way to think about it, or is that just a Martin-ism?

42:38

Speaker A

No, I think that's a fair way to think about it. And one funny thing, and I think you would probably agree with this: I don't know that declarative is better or imperative is better per se. It's more about the problem domain.

44:42

Speaker C

Literally, declarative is better for cases where you've got a very complex system with a lot of asynchronous events, because you don't need to maintain a state machine.

44:53

Speaker A

Yeah.

45:01

Speaker C

You know, all of these things have trade-offs. All of them do. Right? Like, you wouldn't use an LLM to add two numbers. Right?

45:01

Speaker A

And so I think a really good shape for this is that you want an imperative shell. DSPy here contrasts with a lot of folks that create graph abstractions or whatever, things that are fairly declarative.

45:07

Speaker C

I'm going to go on the record saying graph abstractions are generally a bad idea, in my opinion, for basically anything in computer science, but go ahead.

45:20

Speaker A

Right, exactly. And I think it's because humans, when we think top down, actually think imperatively. And DSPy is just Python, which, I mean, is a complicated language, but it's fairly imperative: you do this, you do this, you do this. But at the leaves, where you were going to have a potentially fuzzy task, what were you going to do? I think you were going to write a prompt. And I think the issue with prompts is fundamentally that they're so declarative, they are too declarative, that you're forced to break the contract of declarativeness, because you're like: well, if I just say what I want, the model is never going to fit in my bigger program. One reason, by the way, and people forget this, is that if you work with a chatbot that is tuned for human responses, you're doing most of the work that DSPy has to do in a program. In a program, if I have a function and I want to give it inputs and get back outputs, those have to actually go into variables. The output has to parse in a certain way, so to speak, and I have to funnel things through. If you're a human who's just asking the model questions, no matter what form it gives you, you're smart enough to bridge the fuzziness in the shape of the model's output.

45:27

Speaker C

It's almost like the imperative is: I know every step to the solution, so do every step. Right? And declarative is: I actually don't know all of the steps, but I know the solution, so give me the solution. And DSPy is almost: I kind of know how to frame the solution, do the rest of the work. Right? It's this kind of fuzzy middle.

46:39

Speaker A

Right, right.

46:55

Speaker C

And listen, there are trade-offs to each, right? I mean, with DSPy, you have the overhead of a SOTA model, which took a billion dollars to train and is expensive at inference. So these are just different points in the declarative space, the performance space, the cost space, et cetera. So I'm totally bought in. I actually think this is such a nice way, even independent of DSPy: just the core of this work is how you should think about interacting with LLMs formally. I really think that you've nailed that abstraction. So let's just take that as a given. What are the hard problems now, or the next set of problems, to make that more pragmatic, like the optimizations under the covers, or whatever else you need to do?

46:56

Speaker A

So, everything we talked about today, I do almost nothing on anymore, because this is work we did three years ago, and I'm just out there telling people about it. We're not changing these abstractions. What we actually do is ask the following set of questions. All right, someone wrote a program, and we assume they did a reasonable job describing what they want. Maybe that means they wrote the control flow, they have the signatures, and they have some data; these are the three pieces you might want to have, or they have some, not all, of these. How do we actually do a good job at optimizing this? And it's actually, I think, an interesting progression to see how we went from the very early optimizers in 2022 to the latest ones. The very early ones had to work with models that basically didn't work, right, that had essentially no instruction-following capability and were hit and miss for their tasks. So we did what the reinforcement learning people do on LLMs: you take the program and you bootstrap examples, which is just another way of saying you sample, you run the program a lot of times, maybe with high temperature, you see which things actually work, and you keep traces of all of these over time. Then those traces, which are generated by the model, can become few-shot examples. And if you just do that, sometimes it improves a lot, sometimes it becomes a lot worse. So you do some kind of discrete search on top to find which ones actually improve on average. That was when models were really bad. As models have been getting better, we've moved basically all the way to reflective prompt optimization methods, where you actually go to the model and you're like: here is my program. Here are all the pieces. Here's what this language means.
Here are the initial instructions I came up with, from just the declarative signatures, by the way. Here are some rollouts generated from this program, and here is how well they perform. Let's debug this; let's iterate on the system. And obviously there's a lot of scaffolding to make sure that search is actually a formal algorithm that is going to lead to improvement. But increasingly, more and more of it is carried out by the models. One thing we also do a lot of is ask: all right, conventional policy gradient reinforcement learning methods like GRPO, there's nothing about them that can't be applied to a DSPy program, because the DSPy program says nothing about how the optimization should happen. So actually, for a very long time, from February of 2023, you could run offline RL, and since May of 2025, you can run online RL, or GRPO, on any DSPy program that you write. People think that it's limited to prompt optimization, but I think the only notion of prompt that is fundamental in DSPy is that natural language is an irreducible part of your program, and that prompt is human-facing. It's how you say what you want. How it gets turned into an artifact may well use reinforcement learning with gradients, or natural-language learning. So we spend a lot of time on optimization. We also spend a lot of time on inference techniques. Say you just declared that you want your signature, which processes lists of books. Well, guess what? No model has long enough context to work with lists of books. So last week, my PhD student Alex and I released this idea called recursive language models, which takes any model that is good enough and figures out a structure in which it can scale to essentially unbounded lengths of context. And we were able to push it to 10 million tokens and see essentially no degradation.
And the reason we build these types of algorithms is we really want to back your signatures by whatever it takes to bridge the gap between whatever the current capability limit of the model is and the intent you specified. The last thing we think a lot about is this: we've made the argument conceptually, and tried to demonstrate it empirically, that you need these three irreducible pieces, signatures in natural language, structured control flow, and data, to fully specify your intent, at least ergonomically enough. The question, though, is that this is a very large space of programming, where you need to figure out: okay, I have a concrete problem, how do I map it into these pieces, knowing that maybe I need all of them? And so we spend a lot of time, and this is why it's a big open source project, seeing what people actually build and learning from that: what are the AI software engineering practices that we should encourage and support? These are the types of questions we think about. And one reason this has to have the structure of a large open source project, since it's this large fuzzy space, is that I don't want to be the only group, or a small number of teams, working on any of these pieces. It's a space where the more academics and researchers and people work on optimizers, all programs benefit. The more people work on modules, all programs benefit. The more people build better models, especially programmable models, whatever that might mean in the future, models that understand that they're going to get used in this structure, everyone benefits. And it reminds me of the way in which deep learning really took off: some people iterated on the architectures, some people iterated on the optimizers, you got things like Adam and other methods.
And I think that is what we're trying to really push a community towards.
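The three pieces just described, a natural-language signature, structured control flow, and data, can be illustrated with a toy sketch. This is not DSPy's actual implementation, just a minimal illustration of what a declarative signature might compile down to; all names here are hypothetical:

```python
from dataclasses import dataclass


# A "signature" declares intent: what goes in, what comes out, and a
# natural-language instruction. It says nothing about prompting strategy
# or about how optimization should happen.
@dataclass
class Signature:
    instruction: str   # natural language: the irreducible piece
    inputs: list[str]  # field names that structured control flow binds
    outputs: list[str]


def compile_prompt(sig: Signature, values: dict[str, str]) -> str:
    """One possible 'artifact' a signature can be turned into: a prompt.
    An optimizer is free to rewrite this text, or to skip prompts entirely
    and tune weights with RL, because the signature doesn't constrain it."""
    lines = [sig.instruction]
    lines += [f"{name}: {values[name]}" for name in sig.inputs]
    lines += [f"{name}:" for name in sig.outputs]
    return "\n".join(lines)


sig = Signature(
    instruction="Summarize the key themes of the given books.",
    inputs=["books"],
    outputs=["themes"],
)
prompt = compile_prompt(sig, {"books": "Book A; Book B"})
print(prompt)
```

The point of the separation is that the declaration (`Signature`) is stable human-facing intent, while the artifact (`compile_prompt`, or an RL-trained policy) is an implementation detail the optimizer can swap out.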

47:38

Speaker C

All right, so one last thing just to finish off. This is getting a little more philosophical. What you're addressing is, again, the ability to declare intent for these models in a way that hits the right abstraction. If you could guess prophetically: in the future, are these models going to have independent agency, like agents, or is it going to be humans guiding them? Do you have an opinion on the direction this goes? I asked this question a bit earlier, but I want to ask it more directly. Do you think the need for a human to declare things formally is going to go away, and over time we treat these like grad students and this all just becomes the inner workings of an agent? Or do you think that these are formal software systems, that this is a language like any other language, and we should expect DSPy, or something like it, to be the interface that's exposed to humans for the foreseeable future?

52:37

Speaker A

I think you need some amount of grounding in the world when you build these systems. People in AI talk a lot about AGI, this kind of ethereal intelligence that is just so smart. But the problem is that the intelligence we care about, as far as I can tell, is really about the things you might want to ask, or the way things are in the world. It's very world-oriented; it's not this very abstract thing. So as models get smarter and smarter, I imagine that a lot of the problems people write programs for today could get a lot simpler, because that use case gets absorbed. It's kind of like RISC versus CISC architectures in CPUs: if you believe in complex instruction sets, it's possible that you had to jump through all these hoops to do a fast square root before, but then somebody just gives you an instruction for that. Models can keep absorbing more use cases, as keywords in their language. But philosophically, as you say, the human condition is that we will just want more complex things. And once you want these complex things in a repeatable way, you've got to build a system. And if you want to build a system, I don't really see that not having structure. I don't see it having the structure of LLM APIs today; I see it maybe as some nicer facade on top of DSPy.

53:43

Speaker C

Maybe I can ask you a pointed question. So you have grad students, right?

55:12

Speaker A

Yeah.

55:14

Speaker C

Do you ever wish that you had a DSPy interface to them, in the limit? It's a very structured way to make asks, right? And if not, wouldn't that argue that you wouldn't want that for an LLM in the limit either? It sounds like a glib question, but I actually mean it seriously. Is it that humans just aren't capable of doing this stuff, and that's the reason we don't have formalism when talking to them? Or is it just totally different?

55:15

Speaker A

I promise this is an actual answer to your question about grad students. To me, the question sounds like: don't you have chairs at home? Don't you wish they all looked like tables? I need both. I really want to have...

55:40

Speaker C

There's a software system, there's a grad student. They're totally different.

55:54

Speaker A

And there's nothing saying that AI operates only as a chatbot, or as an agent, or as an employee. I think we need all of it.

55:58

Speaker C

I love it. That's wonderful. So sometimes you want to specify something to a machine.

56:07

Speaker A

Yes.

56:12

Speaker C

That has an LLM. And sometimes you just want to talk to something. Those are two different solutions to two different problems. I love that. This is a great answer, and a great way to end this. Thank you so much for your time. This has been fantastic, sir.

56:12

Speaker A

Thank you, Martin.

56:23

Speaker B

Thanks for listening to the a16z podcast. If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com/a16z. We've got more great conversations coming your way. See you next time. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.

56:26