The a16z Show

From Code Search to AI Agents: Inside Sourcegraph's Transformation with CTO Beyang Liu

47 min
Jan 20, 2026
Summary

Sourcegraph CTO Beyang Liu discusses the company's evolution from code search to AI agents, highlighting their coding agent Amp, which recently topped benchmarks for merged pull requests. The conversation explores the shift in software development toward AI agents, the challenges of non-deterministic programming, and concerns about US dependency on Chinese open-source AI models due to policy constraints.

Insights
  • AI agents represent a fundamental shift from deterministic programming to stochastic subroutines where developers abdicate correctness and logic to AI systems
  • The future of software development involves developers becoming orchestrators of multiple AI agents rather than writing code line-by-line
  • US policy around AI safety may be inadvertently creating competitive disadvantages in open-source AI models, with Chinese models dominating the agentic AI space
  • Different AI models excel at different points on the Pareto frontier of intelligence vs speed, requiring specialized agents for different tasks
  • Code review interfaces need fundamental redesign as AI agents generate code at unprecedented rates, making traditional file-by-file review obsolete
Trends
  • Shift from monolithic AI models to specialized agent architectures with different models for different tasks
  • Growing dominance of Chinese open-source models in agentic AI applications
  • Post-training on open-source models becoming standard practice for product companies
  • Advertisement-supported AI coding tools emerging as viable business models
  • Developer workflow transformation from code writing to AI orchestration and code review
  • Flattening of model capabilities leading to increased competition at the model layer
  • State-by-state AI regulation creating compliance complexity for startups
  • Open-source model development being constrained by copyright and liability concerns
  • AI coding tools reaching 90%+ code generation rates for experienced developers
  • Emergence of hybrid pricing models combining usage-based and ad-supported tiers
Quotes
"This is the first time in computer science I can think of where we've actually abdicated, like, correctness and logic to us. Like, in the past it was a resource, right? So maybe the performance is different, maybe the availability is different, but, like, whatever I put in, I'm going to get back out. But now we're like, figure out this problem for me."
Martin Casado
"You talk to some devs and they're like, you know, I've never been more productive. But coding isn't fun anymore. That's one of the things that we're trying to solve for."
Beyang Liu
"The United States invented the AI revolution. We built the chips, trained the frontier models and created the entire ecosystem. For right now, if you're a startup building AI products, you're probably writing your code on Chinese models."
Narrator
"By sheer lines of code volume, probably more than 90% of the code that I write these days is through Amp. And I think it's only going to get higher and higher level over time."
Beyang Liu
"The best thing we can do is to take a step back and let the free market function. And so to that end, ensuring there's kind of like a standard nationwide set of regulations that's clear and well specified to be going after specific applications and application areas rather than you know, like general, you know, existential risk at the model layer."
Beyang Liu
Full Transcript
4 Speakers
Speaker A

This is the first time in computer science I can think of where we've actually abdicated, like, correctness and logic to us. Like, in the past it was a resource, right? So maybe the performance is different, maybe the availability is different, but, like, whatever I put in, I'm going to get back out. But now we're like, figure out this problem for me.

0:00

Speaker B

You talk to some devs and they're like, you know, I've never been more productive. But coding isn't fun anymore. That's one of the things that we're trying to solve for. It's like amazing new technology. It feels like magic. Never experienced anything like this before in my life. Then the narrative that was fun was like, this thing will just run our lives for us or it's gonna kill us all.

0:16

Speaker A

Total annihilation.

0:33

Speaker B

Like Terminator. And there's just like absolutely no danger that this thing's gonna acquire a mind of its own and try to reach out of the computer and kill you.

0:34

Speaker C

If you use this every day.

0:41

Speaker A

Right?

0:43

Speaker C

This idea that this thing could take over the world.

0:43

Speaker B

Yeah, exactly. That narrative, I think, has largely been dispelled within our circles, but I think it's sort of taken on a life of its own in other circles, and it's made its way to some of the halls of policymaking in the US. There's the old adage of, you know, do you attribute it to ignorance or malice? I honestly don't know. But it is clearly, like, nonsensical, and I think very much against the national interest to still be telling this story.

0:45

Speaker D

The United States invented the AI revolution. We built the chips, trained the frontier models and created the entire ecosystem. For right now, if you're a startup building AI products, you're probably writing your code on Chinese models. Today's guest is Beyang Liu, one of the co-founders of Sourcegraph. Beyang is joined by a16z's Martin Casado and Guido Appenzeller to talk about the shift he's seeing on the front lines of software development today. Sourcegraph's coding agent, which has hit number one on the benchmark for merged pull requests, runs on open source models. Many of them are Chinese, not because of ideology, but because they work better for what the company needs. Here's the tension. Beyang studied machine learning under Daphne Koller at Stanford. He spent a decade building developer tools. He knows the technology cold, and his view is that we're sleepwalking into a dependency problem, not because Chinese models are dangerous, but because American policy has made it nearly impossible to compete in open source AI. We dig into why the Terminator narrative around AI safety might be our biggest strategic mistake, whether it's already too late to catch up, and what happens when the atomic unit of software isn't a function anymore but a stochastic subroutine you can't fully control.

1:15

Speaker A

Beyang, thanks for coming and joining. So the topic today is AI in coding, but I mean, I would say you're one of the world's experts on this, and so we would love to kind of do a deep dive into how you view the problem, how you view the solution. Of course, you're co-founder and CTO of Sourcegraph.

2:28

Speaker B

Yeah.

2:43

Speaker A

So we'll talk a bit about that as well. Of course we've got Guido. Thanks for being here. And so maybe just to start we can do a bit of a background on you and then we'll just kind of dig into details.

2:44

Speaker B

Yeah, so, background: I've been working on dev tools for more than the past decade of my life. I started Sourcegraph about 10-plus years ago, brought the world's first kind of production, legit code search engine to market, and pushed that to, I think, a good portion of the Fortune 500. Prior to that I was a developer at Palantir, in what I guess now seems like the early days, right? That's where I met my co-founder Quinn. We were working on data analysis software in a lot of large enterprise code bases that we were kind of dropped into, and realized that there was a big need for better tooling for understanding massive code bases. And then before that, and I guess it's relevant now, I actually did machine learning as a concentration when I was doing my studies, and did some computer vision research.

2:54

Speaker A

I didn't know that.

3:38

Speaker B

Yeah, under Daphne Koller. Yeah, yeah.

3:39

Speaker A

Oh, I'd have guessed Dawson Engler, like systems stuff. I thought you were a systems guy. You talk like a systems guy. Yeah, yeah.

3:43

Speaker B

For me like this whole phenomenon of like LLMs and coding, it's almost like a homecoming of sorts because I really.

3:51

Speaker A

I didn't know that.

3:56

Speaker B

That's awesome. Yeah.

3:57

Speaker C

Definitely taught me AI as well.

3:58

Speaker A

So she's a great teacher.

3:59

Speaker C

Yes.

4:02

Speaker A

I gotta say, that was like the one class I was so happy I comped out of, because I didn't know what I would do if I didn't pass the comp. I thought it would defeat me.

4:03

Speaker C

I think I failed the comp too many times. I had to.

4:11

Speaker A

There were those two.

4:16

Speaker B

I was the TA for that class actually. 121, 228, 228.

4:17

Speaker A

Yeah, that's right.

4:21

Speaker B

That's what it was.

4:22

Speaker A

Yeah. Yeah. So cool. Great. So Sourcegraph started in code search and navigation, but now you've been making waves with Amp, which is, like, you know, an agent, would we call it? So maybe talk through a little bit of what you've been working on, maybe pre-AI and now, just to level set.

4:22

Speaker B

Yeah, so kind of like the history of the company is we were really built to make coding a lot more efficient inside large organizations, and to make the practice of actually building software way more accessible, primarily to professional software engineers. But I think our eventual vision was always to expand the franchise. And we started by tackling the key problem, which is enabling humans to understand code. Because if you've ever worked inside a large code base, you know that that probably takes anywhere from 80 to 99% of the time, and then the remainder is when you actually understand the problem well enough to actually write the code. So that's where we kind of built up our domain expertise. And then when LLMs sort of matured (it was something that we were always kind of monitoring in the back of our minds), we originally looked at LLMs and embeddings as a way to enhance the ranking signals that we were incorporating into our search engine. And then when things really hit their stride with ChatGPT and all that, it was fairly obvious to us that there was a big opportunity to combine LLMs, which are this amazing technology, with a lot of the stuff that we'd built up to that point. And then, I guess to round that out, our latest product is this coding agent called Amp.

4:38

Speaker A

What's interesting about Amp... I don't mean to cut you off. What's interesting about Amp is it's kind of viewed as a very sophisticated, kind of opinionated take on agents. Do you share that, or is that just the outside view?

5:44

Speaker B

I would say there's certain things that we're doing that I think are quite unique.

5:56

Speaker A

It was the top recently on one of these benchmarks, right?

6:00

Speaker B

Yeah. I think there's some startup out there that compares pull request merge rates or something. We managed to claim the top spot.

6:02

Speaker A

It's awesome.

6:08

Speaker B

Yeah, excellent. Yeah, it was very gratifying to see. But again, I would say that I think we're opinionated on some parts of our philosophy of building agents. And my own take is that a lot of these opinions will soon become widespread. But there's other elements of what we're doing where, like, people like to read a lot into AI these days, and sometimes it's just like, look, we actually did something very simple here that yielded good results, and we shipped it and it works very well.

6:09

Speaker C

So I think your focus is really on large code bases. How is that structurally different from, you know, me coding my little homebrew 201?

6:35

Speaker B

Yeah, so it's funny you mention that. Historically the company's focus has really been on large code bases, but with Amp, we decided to build it almost completely separate from the existing code. And the reason for that was, one, Amp is really at this point like seven, eight months old. So we started Amp in around February, March of this year. And that was right at the wave of this new type of LLM hitting the world: the agentic tool-use LLM finally worked. Yeah, finally worked. Like, after so many demo videos, finally there was a model that could actually do robust tool calling and compose that with reasoning. And our original tack was like, okay, let's build this into some of the existing things that we've created. But the more we started playing around with the technology, the more we came to the conclusion that this was actually truly disruptive, and we should start from first principles: build the agent from the ground up and see what tools we really need. So what we've arrived at is the coding agent, which works, I think, very well in large code bases, because again, we push this to a lot of our customers. But it's also great for, like, hobby coding. Like, I spun my dad up recently on it, and he's been using it to create these iPad games for our kid, because, you know, typical Asian dad trying to teach him math, right? I want to teach him arithmetic and whatnot. And so my dad, who's never written a single line of code in his life, is able to just say, hey, make a simple game that has him count the numbers, and then if he gets it right, the little rocket ship blasts off. So it's a really interesting time to be building, because even if you're building for professional developers, as we are, a lot of the technology ends up being just kind of widely accessible.

6:42

Speaker C

This is the new parenting. Your job as a parent is now to have the agent write the games for your kids that teach the curriculum they're supposed to learn. Yeah, I love it.

8:24

Speaker A

And another thing that's made kind of a splash is you've recently decided to go to an advertisement-based model. So, like, I've got this dissonance internally, which is: on one hand I'm like, this is the boutique, sophisticated one. On the other hand I'm like, and it's also for everybody, with ads. And so, like, how do you reconcile that?

8:32

Speaker B

It's really funny, because I think we had this sort of reputation for being, like, the primo agent, yeah, totally, the super intelligent one, but we never had a flat-rate pricing model. We did pure usage-based pricing. And that also meant that there was never any incentive for our users to switch to a cheaper model. So our tack was, like, the most intelligence, and you just pay for the inference cost. But as we built more and more, we kind of realized there's this efficient frontier: you can draw a grid where one axis is intelligence and the other axis is latency, and there are multiple interesting points along this trade-off curve. It's not just that having the smartest model makes your experience the best. The smartest model often tends to be a significant amount slower than other models on the market. And so we felt there was an opportunity for us to create a faster top-level agent that couldn't do as complex coding tasks, but could do these targeted edits. And when we started to play around with these small, fast models, we realized that, hey, actually, the inference costs are significantly lower. And that got us thinking. Going back to folks like my dad, right? He's just doing this stuff on the side. He doesn't want to spend hundreds of dollars per month to create these kind of simple games. We're like, hmm, maybe there's a model here. I think it started as a joke. Someone was like, we should just do ads and see how that works. And everyone was like, nah, that'll never work. But then it just kept coming back up. And at one point we were like, all right, let's just try it and see how it works. And we launched it, and it's been growing very quickly since then.

8:49
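The intelligence-versus-latency trade-off described here can be made concrete. Below is a minimal sketch of computing which models sit on the Pareto frontier, i.e. are not beaten on both axes by any other model; the `Model` type, the names, and all the numbers are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    intelligence: float  # higher is better (hypothetical benchmark score)
    latency_s: float     # lower is better (hypothetical seconds per task)

def pareto_frontier(models):
    """Keep models no other model dominates: dominated means another model
    is at least as smart and strictly faster, or smarter and at least as fast."""
    frontier = []
    for m in models:
        dominated = any(
            (o.intelligence >= m.intelligence and o.latency_s < m.latency_s)
            or (o.intelligence > m.intelligence and o.latency_s <= m.latency_s)
            for o in models
        )
        if not dominated:
            frontier.append(m)
    return sorted(frontier, key=lambda m: m.latency_s)

# Hypothetical numbers, purely illustrative.
models = [
    Model("smart", 0.95, 30.0),
    Model("mid", 0.85, 20.0),
    Model("fast", 0.80, 5.0),
    Model("slowpoke", 0.70, 25.0),  # dominated: slower and less smart than "mid"
]
print([m.name for m in pareto_frontier(models)])  # ['fast', 'mid', 'smart']
```

The point of the sketch is that several models can be "interesting" at once: anything on the frontier wins at some trade-off, which is why a fast agent and a smart agent can coexist as products.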

Speaker A

Can I dig philosophically into this just a little bit? So I had a conversation with somebody who works on Claude Code, which is a very successful CLI tool. And this person's like, you know, what we've done over time is we've literally just removed stuff between the user and the model. That's it. The way that we improve things is we just do less and let the model do more. And so, I guess, that sounds kind of intellectually or intuitively interesting. Yeah, it kind of makes sense. But on the other hand, it seems expensive. Yes. Because you're like, here is this state-of-the-art model that costs a billion dollars to train, and now it's just the user and the model. And so it's almost like that statement is contrary to an advertisement-based model, or what you're talking about with a fast model or smaller models. So are we seeing two parallel paths in the industry?

10:17

Speaker B

So there's definitely different working styles, right? Depending on the task, or maybe depending on the person you talk to. There are people using coding agents who are like, I just want to write a paragraph-long prompt and then have the agent go figure it out. I want to come back to something that's mostly working. And then there's other people who say, actually, I don't want to do that, because half the time I myself don't have a clear idea of what I want yet. The creative process is one where you kind of figure out what the software looks like as you go along. And sometimes it's the same person saying both things, right? Like, there's some features, like "implement billing," where I'm like, okay, I know exactly what protocols we need to support, we have the Stripe integration, I know what feedback loops we need to hit. Then it's like, okay, big prompt, agent, go at it. But then there's other types of development where it's like, I want to build a brand-new feature. We just shipped this code review panel in our editor extension, and that was a situation where I was like, I don't actually know what this review experience should look like, because it's not me reviewing other people's code, it's me reviewing the agent's code, which is a new workflow. And for that, I did want a more interactive back-and-forth between me and the agent. So I don't think these two things have to be completely separate products, but they are distinct working modalities.

11:14

Speaker A

Interesting. That's a great way to put it. How do you think about the difference between using somebody else's model, like one of the SOTA labs', versus building your own model, versus using an open source model? How does that fit into your philosophy?

12:45

Speaker B

Yeah, so I would say our philosophy is not model-centric, it's more agent-centric. So we view the model as an implementation detail.

13:04

Speaker A

Yeah, I don't know what that means.

13:12

Speaker B

Okay, so let me explain. So like, when you're interacting with an agent, at the end of the day you care about how that agent is going to respond to your inputs. You know, what tools it's going to use, what sort of trajectories it's going to take, what sort of thinking it does. A lot of that goes back to the model, but it's not solely dependent on the model. There's a lot of other things that can influence how an agent behaves. There's the system prompt, there's a set of tools that you give it. There's a tooling environment, there's the tool descriptions, there's the sort of instructions that you give it for connecting to feedback loops. And with the same model with wildly different, like, tool descriptions and system prompts, you actually get like, you know, completely different behaviors out of that model.

13:13

Speaker C

Is that true in both directions? Like with the same prompts and two completely different models would get different behaviors?

13:52

Speaker B

Oh, for sure, for sure. If you have an agent harness, like a set of tool descriptions, and you swap out the model, then there's no guarantee that that thing is going to work well with the model that you swapped in. And so what we view as the kind of atomic, composable unit is not the model. It's this thing called the agent, which is essentially this contract: the user puts text in and gets certain behaviors out. And that agent is really a product of both the model plus all these other things that I just listed. And so when it comes to figuring out what models we want to use, it's not so much, hey, we want to use the latest quote-unquote frontier model from XYZ Lab. It's really about, hey, what behavior do we want the agent, or in some cases the sub-agent, to take? And how do we find the right model that enables that agent to do its job?

13:57
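The "agent as the atomic unit" idea can be sketched in a few lines. This is an illustrative data structure, not Amp's actual API; the field names and model identifier are invented. The point is that two agents built on the same model but with different system prompts and tool descriptions are different composable units:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentSpec:
    """An 'agent' as the unit of composition: the model is one ingredient
    among several that jointly determine behavior."""
    model: str
    system_prompt: str
    tools: tuple  # (name, description) pairs; descriptions steer tool choice

search_agent = AgentSpec(
    model="some-open-weight-model",  # placeholder identifier
    system_prompt="Locate relevant files; never edit code.",
    tools=(("grep", "Search file contents for a pattern"),
           ("read_file", "Read a file by path")),
)

edit_agent = AgentSpec(
    model="some-open-weight-model",  # the very same model...
    system_prompt="Apply the requested change with a minimal diff.",
    tools=(("read_file", "Read a file by path"),
           ("write_file", "Overwrite a file with new contents")),
)

# Same model, different contracts: the harness, not the model, defines the agent.
print(search_agent.model == edit_agent.model)  # True
print(search_agent == edit_agent)              # False
```

Under this framing, swapping the model inside one of these specs produces a new agent whose behavior has to be re-validated, which is exactly the "no guarantee" point above.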

Speaker A

It sounds so hard to me. This is the first time in computer science I can think of where we've actually abdicated correctness and logic to us. In the past it was a resource, right? Whatever. It's not logic. It's like, okay, so maybe the performance is different, maybe the availability is different, but whatever I put in, I'm going to get back out whether it's a database or compute or whatever.

14:52

Speaker B

These are like.

15:12

Speaker A

But now we're like, figure out this problem for me. Right? So you're kind of abdicating like core logic and correctness.

15:13

Speaker C

Your unit test comes back with: works in 45% of the cases.

15:20

Speaker B

Yeah, the non-determinism is something that people struggle with a lot. But for me, I actually do think, you know, historically, pre-AI, when you think about computer systems, the basic unit of composability is the function call in programming, right? So when you think about your system, it's like: this function calls out to these other functions, and those other functions delegate to these other functions. I do think there's still an analog to that in the agent world. The agent is really the analog of the function, just updated, or generalized, to AI.

15:23

Speaker A

Can I just push on this? Because I mean, listen, call me a traditionalist.

15:54

Speaker B

Yeah.

15:58

Speaker A

But for me, computer infrastructure is compute, network, and storage, right? And databases. These are resources that are abstracted. It's like, give me storage, give me network. Yep. But the semantics, what actually happens, that's what I write. That's my code. Here, we're like, figure it out for me. It's like we're abdicating actual logic and correctness. It just feels, in a way, a little bit, you know... Like in your case, for example, let's say you're using model v2.1 and then you go to model v2.2: you'll have wildly different answers, right? It's almost like a new instruction set or something.

15:59

Speaker B

Yeah, you might have different answers, but I think if you construct the agent right, they're not going to be wildly different. So, for instance, we have a sub-agent that's designed to search for things, to uncover relevant context. And, you know, it is a bit of a dice roll every time, right? It takes a slightly different trajectory, it might search for different things. But it's to the point now where, if I want to find something in the code base, I have, like, 99% confidence that this thing will eventually be able to stochastically iterate to the right answer. And so, in that way of thinking, yeah, how it gets there might vary, but if I want it to do a specific thing, it's reliable enough that I can invoke it.

16:38
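The "stochastically iterate to the right answer" pattern is, at its core, a verify-and-retry loop around a non-deterministic call. Here's a toy sketch; `flaky_search`, its 60% per-try success rate, and the file name are all stand-ins invented for illustration, not a real sub-agent.

```python
import random

def flaky_search(query, rng):
    """Stand-in for a sub-agent call: any single trajectory succeeds
    only some of the time (here, 60%)."""
    return "relevant_file.py" if rng.random() < 0.6 else None

def reliable_search(query, attempts=10, rng=None):
    """Wrap the stochastic call in a verify-and-retry loop so the
    composite behaves like a dependable subroutine."""
    rng = rng or random.Random()
    for _ in range(attempts):
        result = flaky_search(query, rng)
        if result is not None:  # verification step: did we find something?
            return result
    raise RuntimeError("search failed after retries")

# With a 60% per-try success rate, 10 tries fail only ~0.4**10, about 0.01%
# of the time: individually dicey, reliably invokable in aggregate.
print(reliable_search("where is billing handled?", rng=random.Random(0)))
```

The design point is that reliability comes from the outer loop plus a verification signal, not from any single trajectory being deterministic.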

Speaker A

It feels like there's kind of a backlash right now in the industry to evals. Do you view this as an eval problem, or as a runtime systems problem?

17:22

Speaker B

Yeah. So, you know, my take on evals is that evals are definitely effective as a sort of unit test or smoke test. Because if you push a change to your agent and it breaks something, you want to know, right? Like, if there's an important workflow where you're like, hey, this should work reliably well, because if this doesn't work, then probably a lot of other things break, then that's a great instance where you want an eval that will alert you when it goes from green to red. I think where it gets hairier is treating evals as a kind of optimization target. Because with any eval set, what are you trying to capture? If you're building an end-user product, at the end of the day what you care about is the product experience. And so you construct the eval set to kind of proxy the vibes of the user using the product. And by definition, that means your eval set is always lagging a little bit behind the frontier, because it takes time to distill what a good product experience is into a set of evals. And we've had multiple times in our past where we've picked a number. Just to take an example: back in the code completion days of, you know, 2023 or whatnot, we had a coding tool that would do autocomplete, and the banner, top-line metric there was completion acceptance rate. Given that I suggest this change to the user, what is the likelihood they're going to accept it? Seems bulletproof, right? But actually, I think in building that, we ended up over-optimizing it to a certain extent, because, like any metric you choose, there's going to be a way to game it.

17:30
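The evals-as-smoke-tests idea, as opposed to evals-as-optimization-target, can be sketched as a fixed case set that flags green-to-red regressions. Everything below (the toy agents, the cases) is hypothetical:

```python
# Minimal sketch: an eval set used like a unit test. It exists to catch a
# change flipping a known-good workflow from green to red, not as a score
# to climb.
def run_evals(agent, cases):
    """Return the (prompt, expected) pairs the agent now gets wrong."""
    return [(prompt, expected) for prompt, expected in cases
            if agent(prompt) != expected]

def agent_v1(prompt):
    # Hypothetical agent: handles a couple of canned workflows.
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt)

cases = [("2+2", "4"), ("capital of France", "Paris")]
assert run_evals(agent_v1, cases) == []  # green: ship it

def agent_v2(prompt):
    # A "change" that silently breaks one workflow.
    return {"2+2": "4"}.get(prompt)

# Red: the eval alerts on exactly the broken workflow.
assert run_evals(agent_v2, cases) == [("capital of France", "Paris")]
```

The failure mode described above appears as soon as you start maximizing the pass count itself: the case set lags the product, so a gamed metric and a good experience come apart.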

Speaker A

I mean, even in this one, like, okay, so like, the developer accepts it, but do they end up committing it?

19:07

Speaker B

Yeah.

19:11

Speaker A

Oh, they committed it. But, you know, did it...

19:12

Speaker C

Like, did it pass code review?

19:14

Speaker B

Yeah.

19:15

Speaker A

Did the PR get accepted? Or was there, like, a...

19:16

Speaker B

Subtle bug introduced or whatnot? Yeah.

19:18

Speaker A

You know, did it get merged into main. Like, I mean, it just feels like, you know.

19:21

Speaker B

Yeah, yeah.

19:24

Speaker A

You know, this is kind of an adjacent topic, but something that Guido and I discuss a lot is to what extent the market is efficient along the Pareto frontier. Like, if you can trade off, let's say, performance for cost, or intelligence for cost, will the market adopt that uniformly, or does it just optimize only for speed or only for correctness? Being on the front lines, we would love your sense on this.

19:25

Speaker C

Here's a simple question.

19:57

Speaker B

Yeah, we ask this question a lot.

19:58

Speaker A

And nobody seems to know.

19:59

Speaker B

Like, it is the question here: what matters more, speed or...

20:00

Speaker A

Intelligence? It's whether the whole Pareto frontier matters, or whether there are just certain points on the Pareto frontier that matter. Right?

20:04

Speaker B

So you can imagine. Okay, yeah.

20:09

Speaker A

So, so, so traditional pricing psychology is you're the expensive one or you're the cheap one.

20:11

Speaker B

Yeah.

20:15

Speaker A

Right. And everything in the middle is called the value gap, which people don't use. Right? Yep. And so originally we were like, oh, that's what happens here. So either you buy the most expensive one or you buy the cheapest one. But actually, as we look at the market, it actually feels like most of the frontier is pretty full. Developers are pretty sophisticated. You know, there are different cost sensitivities, different price sensitivities.

20:16

Speaker B

So yeah, so you know, it's funny that you mentioned this. Like, you know, the cheap option versus like the premium option. It just so happens that AMP has two top level agents. There's a smart agent and there's a fast agent.

20:37

Speaker A

Oh, that's interesting.

20:49

Speaker B

And the fast agent is the one that's ad-supported, that we can offer for free, and the smart agent is the one where we're like, okay, we will only ever do usage-based pricing for that, because we want to keep that at the frontier of smartness. But that being said, I don't know, maybe there's a third point in there that could make sense. It really just comes down to the vibes at the end of the day, as we use this more heavily and see the usage patterns emerge.

20:50

Speaker C

The mid agent.

21:13

Speaker B

Yeah, the mid agent. I honestly... yeah, well, if you put it that way: the galaxy-brain idea is you either want, you know, smart or fast.

21:15

Speaker A

Cool. So I would love, I mean if you're open to, I'd love to dig into a bit on kind of your view on open source models. Do you use them?

21:29

Speaker B

Yes.

21:38

Speaker A

You know, do you think that they are an important part of the ecosystem?

21:38

Speaker B

Yeah, so we do use a variety of open source models. We use both closed source and open source models quite heavily. But the open source ones, I think, are becoming a bigger theme now for a couple of reasons. One is that with an open source, or open weight, model, you can post-train them. Which means if you have a domain-specific task... Amp has a growing number of sub-agents that are specialized for a specific task, like context retrieval or extra reasoning or library fetching. Those are more constrained tasks where you don't necessarily need frontier general intelligence. If anything, you want faster, right? And so the benefit of having open weight models is you can look at the thing you're trying to optimize for, what that sub-agent needs, and post-train the model to accomplish that more effectively. And the other element of open weight models that's very appealing is just the pricing aspect of it. There are now more and more effective open weight models emerging on the scene that are actually quite robust at agentic tool use. The landscape has changed immensely since June of this year. We've gone from there being really only one really good agentic tool-use model to now there's, like... Could you name them?

21:43

Speaker A

I mean, it'd be great to actually go through them. I mean, the open...

23:12

Speaker B

Yeah. I mean, so, you know, originally there was Claude, right? Like Sonnet or Opus. That was the first agentic tool-use model, and that sort of ushered in the current agent wave. But now, you know, there's GPT-5, there's Kimi K2, there's Qwen Coder, there's GLM.

23:14

Speaker A

Are these open source models like on par or pretty close?

23:34

Speaker B

It depends on the workload. So I would say, in our evaluations for the top-level smart coding agent driver, we still tend to prefer Sonnet or GPT-5. But for quick targeted edits or specific sub-agents, more and more we're preferring smaller models, because they have better latency characteristics and because the complexity of the task isn't as high. You reach a ceiling: once you reach a certain level of quality, there's diminishing returns, and then you start optimizing for latency, because that gets you more, you know, interactivity.

23:40
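The selection rule described here, clear the quality bar first and only then optimize latency, might look like this in code. The model names and eval numbers below are made up for illustration:

```python
def pick_model(models, quality_bar):
    """Among models clearing the quality bar, take the fastest one; past
    the bar, extra quality is diminishing returns, latency is not."""
    qualified = [m for m in models if m["quality"] >= quality_bar]
    if not qualified:
        raise ValueError("no model meets the bar; raise capability, not speed")
    return min(qualified, key=lambda m: m["latency_s"])

models = [  # hypothetical eval numbers
    {"name": "frontier", "quality": 0.95, "latency_s": 30},
    {"name": "open-weight-large", "quality": 0.88, "latency_s": 12},
    {"name": "open-weight-small", "quality": 0.82, "latency_s": 3},
]

# A demanding top-level driver needs a high bar; a targeted-edit
# sub-agent has a lower bar and so gets the fast model.
print(pick_model(models, quality_bar=0.90)["name"])
print(pick_model(models, quality_bar=0.80)["name"])
```

Each sub-agent effectively carries its own `quality_bar`, which is why one product can mix a frontier model and single-digit-billion-parameter models.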

Speaker C

What's the smallest models you can use for an effective agent?

24:28

Speaker B

I mean, for a top-level agent right now, it's probably still fairly large; we're talking probably hundreds of billions of parameters. But for search agents you can go smaller than that. And then we also have a model that does edit suggestions. So, you know, for those times where you still have to go into the code and manually edit stuff, this thing suggests the next edit that you'll make. And for that we use a very small model, like, you know, single-digit billions of parameters.

24:33

Speaker A

So do you train your own models?

25:07

Speaker B

Yeah, we do.

25:09

Speaker A

Oh wow.

25:10

Speaker B

But I would say we don't train them from scratch.

25:10

Speaker A

No pre-training.

25:14

Speaker B

It's mostly no pre-training.

25:15

Speaker A

That'd be dumb.

25:17

Speaker B

Yeah. At this point it would just be fiscally irresponsible. Pointless, probably.

25:17

Speaker A

Pointless.

25:23

Speaker C

Yeah.

25:24

Speaker A

Yeah. Are these for special use cases? Like a lot of the products that we work with, let's say just outside of coding, you know... I mean, there's this general view: pre-training is done.

25:24

Speaker B

Yeah. Right.

25:37

Speaker A

Paying people to create data, we've hit economic equilibrium, right? It's like, yeah, you can keep paying people, but we're hitting diminishing returns there, because you need more expensive people and you need ten times more data. And so at some point you hit equilibrium.

25:38

Speaker B

Yep.

25:52

Speaker A

But, you know, there's a lot of product data out there and there's a lot of users out there, and the solution domain is enormous. And so you can start building smaller models. So, A, is that correct? And B, the models that you train, do they kind of fit in that general pattern of specific smaller models?

25:52

Speaker B

I think that's spot on, actually. The very large generalist models were great, and they still are great for experimentation, because you train this thing on all sorts of data and it's almost like a discovery process, where the training team themselves don't quite know what behaviors might emerge. But once you map those to specific workloads, specific agents that you want to build, then you have a much clearer target. And it's widely known that a lot of the model labs do this now: they might expose an API that's one model, but behind the scenes they're routing to smaller models. And you can also do that at the application layer. If you have an agent architecture like we do, there's all sorts of specialized tasks. We've broken down the process of software creation into various tasks, like context fetching or debugging or things like that. And once you have a specialized agent for each, then you take a look at what the agent needs to succeed, and you try to get the model as small as possible while still maintaining the requisite quality bar.
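The selection rule at the end of this answer, get the model as small as possible while still clearing the quality bar, can be sketched as a tiny function. The candidate names, parameter counts, and eval scores below are all invented for illustration.

```python
# Toy version of "smallest model that still clears the agent's quality bar".
# Candidates are (name, params_in_billions, eval_score); all figures made up.

def pick_model(candidates, quality_bar):
    """Return the name of the smallest model whose eval score meets the bar."""
    passing = [c for c in candidates if c[2] >= quality_bar]
    if not passing:
        raise ValueError("no candidate clears the quality bar")
    return min(passing, key=lambda c: c[1])[0]

candidates = [
    ("frontier-400b", 400, 0.92),
    ("mid-70b",        70, 0.88),
    ("small-8b",        8, 0.71),
]

print(pick_model(candidates, quality_bar=0.85))  # prints mid-70b
```

Each specialized agent would run this kind of selection against its own eval, so different agents land on different models.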

26:12

Speaker C

So essentially it's not just a Pareto frontier of quality versus cost, but...

27:20

Speaker A

There's also multiple graphs, right?

27:25

Speaker B

Yeah, it's basically per agent. Every agent maps to a workflow. It's emulating some workflow.

27:29

Speaker C

Yeah.

27:38

Speaker B

That, you know, maybe approximately maps to something that a human used to do, maybe it doesn't, but it's a subroutine. This is why I go back to the function analogy.

27:39

Speaker A

Yeah, yeah.

27:49

Speaker B

And so for any given agent, a...

27:51

Speaker A

A subroutine where you abdicate the logic.

27:53

Speaker B

It's a stochastic subroutine. Even weirder.

27:58

Speaker C

I mean now we have parameters like how much reasoning do you want? So it's a tunable subroutine. How powerful do you want to make this? What's your budget?
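The "stochastic, tunable subroutine" framing can be made concrete with a toy function: unlike a normal subroutine, the same inputs can yield different outputs, and a reasoning-budget knob trades cost for quality. The function, its effort levels, and the "summarization" behavior are hypothetical stand-ins for a real model API, simulated here with randomness.

```python
# A sketch of a stochastic, tunable subroutine. `summarize` stands in for
# an LLM call: output varies run to run, and `reasoning_effort` is the
# budget parameter. The effort-to-quality mapping is a made-up toy.

import random

def summarize(text: str, reasoning_effort: str = "low", seed=None) -> str:
    rng = random.Random(seed)   # seedable so tests can be deterministic
    words = text.split()
    # higher effort => keep more of the signal (toy stand-in for quality)
    keep = {"low": 2, "medium": 4, "high": 6}[reasoning_effort]
    return " ".join(rng.sample(words, min(keep, len(words))))
```

The weird part, as the conversation notes, is that the caller abdicates correctness: you choose a budget and accept a distribution over outputs rather than a guaranteed result.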

28:01

Speaker B

But there's like a mini Pareto frontier for each of these tasks.

28:08

Speaker A

Right.

28:10

Speaker B

And then the optimal point along that frontier is different for each task.
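A rough sketch of that idea: drop dominated models, then let each task's own quality-versus-latency weighting pick its point on the frontier. Model names and all numbers below are invented.

```python
# Per-task Pareto frontier selection. Models are (name, quality, latency_s);
# a model is dominated if another is at least as good AND at least as fast,
# and strictly better on one axis. Figures are illustrative only.

def pareto_frontier(models):
    frontier = []
    for m in models:
        dominated = any(
            o[1] >= m[1] and o[2] <= m[2] and (o[1] > m[1] or o[2] < m[2])
            for o in models
        )
        if not dominated:
            frontier.append(m)
    return frontier

def best_for_task(models, quality_weight):
    """Higher quality_weight favors smart-but-slow models; lower favors fast ones."""
    frontier = pareto_frontier(models)
    max_latency = max(m[2] for m in frontier)  # normalize latency to [0, 1]
    return max(frontier,
               key=lambda m: quality_weight * m[1]
                             - (1 - quality_weight) * m[2] / max_latency)[0]

models = [("smart-slow", 0.95, 8.0), ("balanced", 0.90, 3.0), ("small-fast", 0.70, 0.5)]
# a top-level coding driver weighs quality heavily; an edit-suggestion agent weighs latency
```

With these toy numbers, a quality weight of 0.95 picks the smart-slow model, while 0.3 picks the small fast one, which is the "different optimal point per task" claim in miniature.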

28:10

Speaker A

So I actually want to dig into, you know, like, like the open source models, the implications. I mean, I know that you've got opinions on that. We've got opinions. It's an interesting topic. But before we do that. So in 10 years, are we using an IDE? Are we using agents on a CLI? What happens to software engineering in 10 years?

28:14

Speaker B

Simple question. Okay, so I mean, listen like you're.

28:33

Speaker A

Like, you're one of the people that's been on this... no, no, no, for quite a while.

28:37

Speaker B

I do have a take on this.

28:40

Speaker A

You're literally like the world expert on this question, I'm saying.

28:41

Speaker B

Yeah, so here's my take. It's not going to be an IDE that looks like any IDE that exists today, and it's not going to be a terminal that looks like any terminal that exists today. My view, and I don't think this is a particularly unique view, is that the effect of AI on every single knowledge domain, including coding, is that it's just going to enable the human to level up. Right? Like, my job has changed so much in the past year. I think about all the toilsome line-by-line editing that I did a year ago, and today it seems completely foreign. I honestly don't think I could go back at this point. Now when I'm doing stuff, it's more at the level of telling the agent to make specific edits or execute a specific plan, and I'm really playing the role more of an orchestrator. And then you still have to pop in and make some manual edits when it gets stuck. But increasingly, I would say by sheer lines-of-code volume, probably more than 90% of the code that I write these days is through AMP. And I think it's only going to get higher and higher level over time. And so when we think about the interface that a human will interact with primarily, I think the future looks like something that allows you to orchestrate the job of multiple agents, and crucially, something that allows you as the human to understand the essentials of what these agents are outputting. And I actually think that's probably the limiting bottleneck today.
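The orchestrator role described here, fanning work out to multiple agents and reviewing what comes back, can be sketched as a small concurrent loop. `run_agent` below is a placeholder for a real coding-agent invocation, not any actual API.

```python
# Minimal sketch of orchestrating multiple agents: fan tasks out to agents
# running concurrently, collect the results for human review. The agent
# call is simulated; in practice it would be a real coding-agent session.

import asyncio

async def run_agent(task: str) -> str:
    await asyncio.sleep(0.01)          # stand-in for an agent working on the task
    return f"patch for: {task}"

async def orchestrate(tasks):
    # fan out: each task becomes an independent agent run
    results = await asyncio.gather(*(run_agent(t) for t in tasks))
    # fan in: the human reviews the collected outputs
    return dict(zip(tasks, results))

patches = asyncio.run(orchestrate(["fix flaky test", "add retry logic"]))
```

The interesting interface problem is the fan-in step: presenting those collected outputs so the human can actually comprehend them, which is the bottleneck the conversation turns to next.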

28:45

Speaker C

Of course.

30:21

Speaker A

Comprehension. It's like, the human comprehension is just...

30:22

Speaker B

Yeah.

30:24

Speaker A

Does it map to my understanding of exactly what the problem needs, even at like a business level? Because there are fundamental trade-offs in systems.

30:24

Speaker B

Yes, right. Like, yes, but...

30:32

Speaker A

I think you can't wish those away.

30:33

Speaker B

You can't wish them away. And the human is the bottleneck. But I think the human is still essential and will still remain essential 10 years from now in software engineering because it's fundamentally a creative process.

30:35

Speaker A

No, no, that's what I mean. I mean, sorry, I just want to make sure we're talking about the same thing. Oh yeah, Like a human has in their head of what they want to accomplish.

30:43

Speaker B

Yes.

30:49

Speaker A

And only the human has that in their head.

30:49

Speaker B

Yeah, yeah.

30:51

Speaker A

And so like often that's going to require choosing a point between two trade offs.

30:51

Speaker B

Right.

30:56

Speaker A

Like whatever that is. And so there has to be some way that this articulation happens.

30:57

Speaker B

Yes. And when you talk to practitioners today, a lot of them are... it's bittersweet, because on the one hand it's like, oh my God, agents, they're writing all this code and they're actually pretty good at it. On the other hand, it's like, oh, I'm spending 90% of my time essentially doing code review now. And it's maybe the one in a hundred devs that you talk to that says, I really love code review. The rest of us are like, oh man, it's such a drag...

31:02

Speaker C

While becoming middle managers of coding.

31:31

Speaker B

Yeah, yeah, exactly. I mean you talk to some devs and they're like, you know, I've never been more productive, but coding isn't fun anymore. And so, you know, that's one of the things that we're trying to solve for actually.

31:33

Speaker C

The beauty, the elegance is gone. It's now all looking at implementations, requirements.

31:42

Speaker B

Yeah, it's that. But also, the task of reviewing code, I think, is a slog. And classical code review interfaces are just not that good. I think they were never that good, but it wasn't blindingly obvious, because the rate at which lines of code were shipping was unremarkable.

31:46

Speaker C

Here's a super simple example. Today, if I review code from pretty much any coding agent out there, typically it's just file by file by file by file.

32:05

Speaker B

Yeah, yeah.

32:13

Speaker C

There's no grouping by task or something like that, or explaining it with a couple of arrows and little bubbles.

32:14

Speaker A

Exactly.

32:19

Speaker B

Yeah, there's so much low-hanging fruit here. So we launched a review panel in our editor extension last week. It doesn't get all the way there, but I think it's the first step, and it's already way better than an existing code-host review tool. It's mind-boggling to me that we live in an age where you can literally have a robot one-shot a very large change, and then you pop over to a GitHub PR and you're clicking expand hunk, expand hunk, expand hunk. No code intelligence, no diagrams. Yeah, yeah. It just feels like we have a Ferrari engine, but then part of our workflow still requires strapping it to this horse-and-buggy style thing.

32:20

Speaker A

So. Yes, it's like I create a microchip and then I give you an oscilloscope.

33:05

Speaker B

Yeah, yeah, exactly, exactly.

33:10

Speaker A

All right, so listen, we're moving along on time here. So I actually want to get more to the policy side, because I do think a lot of the way this goes is the way the model layer goes. Yep. The open source ecosystem, we see it all over the place, not even talking about Sourcegraph. But I would say if a company walks in now that's a product company that's decided that they need to post-train their own models, it's going to be on an open source model.

33:14

Speaker B

Yep.

33:38

Speaker A

And more and more of these are Chinese models. And so you mentioned that you do use open source models, including Chinese models. So how do you think about that, as far as, A, maybe just the implications of dependency, and B, what does this mean maybe more holistically for the United States and the ecosystem?

33:38

Speaker B

Yeah. So first off, in terms of our production setup, every model that we hit is hosted on American servers. So from an information security point of view, I think this is best practice across the industry: you don't hit models that are hosted in China. So from that part, it's fine. I would say, though, if you take a step back, it is fairly concerning, because my view is that as the model landscape evolves, you're going to start to see a flattening in terms of model capabilities. There's going to be healthy competition at the model layer, and there's going to be a number of options for choosing a model at a given point on the Pareto frontier. And with that flattening, there's a strong incentive for application builders, at a given capability level, to use the one that's open, for the reasons stated before. And because the most capable open weight models right now are of Chinese origin, it essentially means that application builders around the world are choosing to post-train on top of these models. And so if the US open weight ecosystem doesn't catch up, we're in danger of migrating to a world where most systems are heavily dependent on models of Chinese origin.

33:57

Speaker A

Do we have competitive US open source models right now? I mean, any competitive non-Chinese...

35:24

Speaker B

If you look at Europe... you know, we've sampled a good portion of the model landscape, because again we have all these sub-agents and agents, and we want to find the best ones for the job. And frankly, the ones that we find most effective at agentic workloads, they're almost all... I would say they are all of Chinese origin right now. And that's not to say that there haven't been good efforts by American companies. It's just that when you plop those into an agentic application, the tool use isn't quite robust enough. It's not quite there yet.

35:33

Speaker A

Do you think this is a result of policy or funding or like.

36:09

Speaker B

I think probably all of the above.

36:18

Speaker A

The easy answer is, yes, it's a regulatory thing, this and that. I just don't know how true that is. I mean, it just turns out it's very complicated.

36:21

Speaker B

So it is interesting. The AI revolution was basically born and created in the West, right? And down the street. Yeah, down the street. And the US still holds a lead in basically every part of the stack, whether it's chips or frontier intelligence, basically every place except open weight models and robotics, and I guess that's the manufacturing aspect of it. But from where I stand, if you go back to the early days of the AI revolution, back to 2022 or so, I feel like the dominant narrative that was told at that point was this one of AGI. It was this amazing new technology, it feels like magic, right? Never experienced anything like this before in my life. And then the narrative that was spun was, hey, AGI is nigh. What does AGI mean? Well, either, one, it's utopia, all our problems are solved, this thing will just run our lives for us, or it's going to kill us all.

36:29

Speaker A

Total annihilation.

37:37

Speaker B

Like a Terminator-style outcome. Skynet.

37:39

Speaker A

Yeah, I love the Balaji view of this. He's like, there's this very Abrahamic view of it. It's either like God or the devil, right? And then he's like, I'm Hindu, we've got a bunch of gods. Some are capricious, some are nice. I've chosen the Hindu view of this.

37:42

Speaker B

Yeah. Arguably that view of the model landscape was the right one, in retrospect. And I think at the time, people using these models directly kind of realized this, right? You use the models, they can emulate intelligence of a certain kind, but it's mostly pattern matching. And there's just absolutely no danger that this thing's going to acquire a mind of its own and try to reach out of the computer and kill you.

38:03

Speaker C

If you use this every day, right? This idea that this thing could take over the world...

38:31

Speaker B

Yeah, exactly. So now if you talk to practitioners, anyone who's building with it, and increasingly anyone who's using it, right, because now ChatGPT has been out for some three years and everyone and their mom has used it, people kind of understand what the limitations are. So that narrative, I think, has largely been dispelled within our circles, but it's sort of taken on a life of its own in other circles, and it's made its way to some of the halls of policymaking in the US.

38:34

Speaker C

Is part of the problem here that not every policymaker is using LLMs day to day, to put it carefully?

39:06

Speaker B

Yeah, I honestly don't know. You know, there's the old adage of, do you blame it on ignorance or malice? I honestly don't know. It's a black box. But it is clearly nonsensical, and I think very much against the national interest, to still be telling this story. Because one, it leads to an overemphasis on the model as the end-all-be-all of AI, where in reality it's pushing the models into all these different application areas where the rubber meets the road and things become useful. But then also, when you think about making laws and regulations for this sort of stuff, if you've been sold on this sort of Terminator-style narrative, that's going to put you in a very different mindset with respect to how much risk tolerance you're willing to take on, how much innovation you're going to allow in the ecosystem, and your tolerance for open sourcing model weights.

39:13

Speaker A

So, you know, you use a bunch of open source models, and there's a question that we actually debate quite a bit, which is: assume the policy environment exists as it is. Even with infinite funding and infinite talent, could you still actually build competitive models? Or are we now at a place where we're just at a disadvantage? Is it too late to assume that we can do it without actually changing policy?

40:13

Speaker B

Like, build adequate open weight models?

40:38

Speaker A

Well, let me just give an example. I don't know why OpenAI released the open source models the way they did, but it seems like they were very, very sensitive to what data was in them. And I presume this is kind of a concern around copyright. I don't know the answer to this. I just assume that.

40:42

Speaker B

Interesting.

40:58

Speaker A

We haven't seen something come out of Meta in quite a while. Are there even any open source models? It's just very unusual for the United States not to do this, and the efforts that have done it seem to be handicapped in one way or another. And so there's one view of the world that this isn't a tech problem, it isn't a money problem: we're already in the overhang of policy. That's one view. And so I guess my specific question is, do you think that is the case, or do you think we just haven't gotten to it yet and we're going to come up with open source models?

40:59

Speaker B

You know, I honestly don't know. I don't have inside knowledge of what goes on inside a lot of these research organizations. But it's super remarkable that here...

41:28

Speaker A

Were the U.S. you know, we were the first with open source models. We had Llama Three. And like now we like, like listen, you're, you're using Chinese models. Models. Yeah, like where are the US models? And then if you know and why aren't they there? And I guess, my best guess, I mean, again, you know, you both can gut check me on this, is like, actually there's like, like all of the rhetoric around like developer liability, even though it didn't happen, but there was rhetoric around it. All of the policy stuff, all the copyright stuff, all the lawsuits. My guess is is that, you know, a lot of these folks are gun shy.

41:38

Speaker B

Yeah, I think that could very well be the case. And I think that the way that the regulatory landscape is evolving doesn't help at all either, because there was an effort earlier this year to have a federal set of standards for AI model-layer regulation, but that I think fell apart. And so now we're slow walking, in some cases fast walking, towards this patchwork quilt of state-by-state regulations, which is not going to be good.

42:09

Speaker C

Some of that state regulation is written such that it applies to anybody making a model available in the particular state. So in theory, one state, every state, tries to drive policy for all of the United States.

42:40

Speaker B

It's very vaguely worded, and it leaves a lot of room for interpretation, which is never good. And I think a lot of that...

42:53

Speaker C

Hasn't been litigated either.

43:01

Speaker B

Right.

43:02

Speaker C

So it massively increases complexity. I think for a small startup to build an open weight model at this point is extremely hard. Who wants to take that risk?

43:02

Speaker B

It reminds me of back in the day when we were looking at GDPR compliance, when that was the first thing. And I was talking with our legal team and external counsel, trying to read the text of that regulation and figure out, oh, is this thing technically in violation? It seems kind of high level. And the answer that I got was, look, honestly, these are underspecified, and it's really going to come down to some decision maker within that bureaucracy. They're going to make a judgment call, and hopefully they lean towards going after the bigger fish in the pond before they come after you.

43:12

Speaker A

So, you know, paradoxically, this is the greatest gift you could have ever given to the large social networking giants. Those are the only ones that actually have the legal teams and the policy teams to navigate this stuff. And we saw this up close as investors: as soon as these things came up, it basically entrenched the incumbents.

43:52

Speaker B

Yeah.

44:06

Speaker A

Who could kind of play. Last quick topic: if you did have some recommendation on how we should think about policy going forward, to aid open source efforts for the United States...

44:06

Speaker B

Yeah. What I would do is, as much as possible, ensure a dynamic and competitive AI ecosystem within the US. I mean, the best thing that we can do, we're America, the best thing we can do is to take a step back and let the free market function. And so to that end, one, ensuring there's a standard nationwide set of regulations that's clear and well specified, going after specific applications and application areas rather than, you know, general existential risk at the model layer, that would be good. And then two, just ensuring that there's competition at the model layer, avoiding any sort of anti-competitive behavior.

44:18

Speaker A

Regulatory lock in any of that.

45:04

Speaker B

Yeah, regulatory lock-in, that sort of thing. Essentially, don't let the Internet Explorer versus Netscape thing play out the way it did in Internet 1.0, you know, with the AI ecosystem.

45:06

Speaker A

Yeah. We were very, very lucky that actually academia and the broad industry ended up erring on the side of openness. Let's hope this happens this time too.

45:18

Speaker B

Yeah.

45:26

Speaker D

Thanks for listening to this episode of the a16z podcast. If you liked this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X @a16z and subscribe to our Substack at a16z.substack.com. Thanks again for listening, and I'll see you in the next episode. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.

45:29