Latent Space: The AI Engineer Podcast

AIE Europe Debrief + Agent Labs Thesis: Unsupervised Learning x Latent Space Crossover Special (2026)

55 min
Apr 23, 2026
Summary

swyx and Jacob discuss the state of AI in 2026, covering the explosive growth of coding agents, foundation model competition, infrastructure consolidation, and emerging frontiers like world models and memory systems. They explore how 2025 was defined by coding agents while 2026 will see these agents 'break containment' into other domains, and debate whether startups can survive foundation model competition.

Insights
  • Coding has become the dominant AI use case with $2-2.5B ARR for major players (Anthropic, OpenAI, Cursor), representing the first market to reach massive scale and serving as a preview for how other verticals will evolve
  • Foundation models are moving toward restricted, bundled product strategies (Anthropic) versus open access models (OpenAI), with compute availability and marketing driving market structure more than technical superiority
  • The shift from capability exploration to efficiency optimization is beginning, but companies still get rewarded for token maximization and 'slop' production, creating tension between quality and quantity
  • Open source models are gaining market share contrary to earlier predictions, particularly in the top 20% of AI companies focused on cost and latency optimization, driven by fine-tuning services and specialized workloads
  • Infrastructure stability around agent harnesses (skills, tools, retrieval) has finally emerged after years of constant reinvention, but memory, personalization, and world models remain unsolved frontiers
Trends
  • Coding agents breaking containment into other domains as the primary 2026 narrative, with agents becoming the primary users of infrastructure products
  • Winner-take-most consolidation in application layers (Cursor, Cognition dominating coding) while infrastructure remains fragmented with a longer tail of specialized players
  • Shift from UI-first to API-first product design driven by agent adoption, with 60% of Vercel traffic now from bots rather than humans
  • Custom silicon (Cerebras, Talos) enabling 10x+ inference speedups, unlocking new application patterns despite skepticism about incremental speed improvements
  • Post-training and domain-specific fine-tuning becoming viable strategies for 3-6 month competitive windows before general models catch up
  • Enterprise AI adoption favoring dedicated application partners (Legora, Sierra) over direct foundation model access, despite model labs' technical capabilities
  • Open source models gaining adoption in cost-sensitive and latency-critical workloads, reversing earlier predictions of declining market share
  • Memory and personalization emerging as the next frontier to differentiate products, with current SEO/AEO tactics becoming commoditized
  • World models and spatial intelligence identified as critical gaps in current LLMs, with implications beyond robotics for fundamental AI reasoning
  • Dark factories and zero-human-review code deployment becoming the scaling frontier for AI coding, requiring SDLC transformation and increased testing
Topics
  • AI Coding Wars and Market Consolidation
  • Foundation Model Competition and Pricing Strategy
  • Agent Engineering and Harness Standardization
  • Infrastructure Stability vs. Continuous Reinvention
  • Open Source vs. Closed Model Market Share
  • Custom Silicon and Inference Optimization
  • Post-Training and Domain-Specific Fine-Tuning
  • Enterprise AI Adoption Patterns
  • Memory and Personalization Systems
  • World Models and Spatial Intelligence
  • Dark Factories and Zero-Human-Review Deployment
  • SaaS Disruption by AI-Native Alternatives
  • Agent Experience vs. Developer Experience
  • Model Access Restriction and Safety Trade-offs
  • Startup Survival in Foundation Model Era
Companies
Anthropic
Claude/Claude Code generating $2.5B ARR in one year; pursuing restricted, bundled product strategy with Opus model
OpenAI
Competing in coding with ~$2B ARR; pursuing open access strategy and exploring super app and consumer agent experiences
Cursor
AI coding IDE rumored at $2B ARR; competing directly with Claude Code with strong product execution
Cognition
AI coding agent company where swyx works; using Cerebras custom silicon and pursuing the agent lab playbook
LangChain
Infrastructure company that reinvented itself multiple times; CEO Harrison Chase noted stability finally emerging around agent harnesses
Vercel
60% of admin traffic now from bots/agents; example of infrastructure company adapting to agent-first product design
Netlify
Competing with Vercel on agent experience; Matt Biilmann coining the term 'agent experience' as an evolution of developer experience
Legora
Vertical AI application company acting as outsourced AI team; example of a robust business model insulated from infrastructure churn
Sierra
Vertical AI application company building customer-facing AI; willing to throw out implementations every 3 months as models improve
Supabase
Postgres database incumbent facing disruption from AI-native alternatives; example of SaaS under pressure
Cerebras
Custom silicon company enabling 10x+ inference speedups; adopted by Cognition and OpenAI for coding workloads
Talos
Custom silicon alternative to NVIDIA enabling significant inference speedups and new application patterns
Fireworks
Open source model inference platform gaining market share in top 20% of AI companies
Together AI
Open source model platform crushing in fine-tuning and specialized workloads
Convex
Mentioned as potential Firebase-like system for AI apps; alternative to traditional databases for rapid iteration
ClickHouse
Database consolidation target; Langfuse moving to ClickHouse as LLM infra consolidates
Langfuse
LLM observability/infra company consolidating; example of consolidation in the LLM infra category
Google
Announced Gemini in three sizes (Flash, Pro, Ultra) but only released Pro/Flash; theory of distilling from a larger unreleased model
xAI
Elon's AI lab IPO'd at $1.2T valuation; attempting to compete in coding but hasn't broken through yet
Redpoint Ventures
Jacob Efron's employer; investor perspective on AI startup landscape
People
swyx
Host of Latent Space podcast; works at Cognition on AI coding; runs AI Engineer conferences roughly quarterly
Jacob Efron
Host of Unsupervised Learning podcast; investor tracking AI startup landscape and foundation model competition
Harrison Chase
Quoted on infrastructure stability finally emerging around agent harnesses after years of reinvention
Bret Taylor
Example of application company willing to throw out implementations every 3 months as models improve
Max Junestrand
Example of vertical AI application company with sticky enterprise customers despite model improvements
Malte Ubl
Shared stat that 60% of Vercel admin traffic is now from bots, showing agent adoption at infrastructure layer
Matt Biilmann
Coining term 'agent experience' as evolution of developer experience for infrastructure products
Mike Krieger
Discussed biosafety concerns and Opus model release strategy at dinner with swyx
Alex Wang
At breakfast with swyx; indicated strong focus on the coding use case but hasn't launched yet
Ryan LaPopolo
Spends 1 billion tokens per day (~$10K/day at market rates); example of living at the edge of capability exploration
Ankur Goyal
Claimed open source market share at 5% and declining; swyx changed his mind on this prediction
Fei-Fei Li
Essay on spatial intelligence and why LLMs lack it; identified as key problem statement for world models
Simon Willison
Coined term 'dark factories' for zero-human-review code deployment frontier
Quotes
"The same way that 2025 was a year of coding agents, 2026 is coding agents breaking containment to do everything else."
swyx, mid-episode
"It finally feels like we have stability around the infrastructure for AI."
Harrison Chase (LangChain CEO), early discussion
"60% of traffic to Vercel's admin app, the architecture for configuring Vercel applications, is bots. It's not human."
Malte Ubl (Vercel CTO), infrastructure discussion
"If it doesn't exist as an API that agents can use, it doesn't exist, right?"
swyx, agent infrastructure section
"That's exactly the difference between a very intelligent LLM who knows everything, but hasn't experienced anything."
Jacob Efron, world models discussion
Full Transcript
Isn't that crazy? That number is just mind-boggling. What is the state of the AI coding wars today? We're in a phase of sort of like capability exploration. The general thesis that I have been pursuing now is that the same way that 2025 was a year of coding agents, 2026 is coding agents breaking containment to do everything else. Do you worry about the foundation models just eating into a bunch of these startup categories? Midsize startups, yes. What do you think the end state of this market is? For the market structure to significantly change, there would be... Today on Unsupervised Learning, we had a fun episode in what's really become an annual tradition, a crossover episode with our friends at Latent Space. swyx and I sat down and we talked about everything happening in the AI ecosystem today, what we thought of the various changes at the model layer, what's happening in the infra world, the coding wars, and a bunch of other things. It's a ton of fun to do this with someone I really respect and another great podcaster in the game. Without further ado, here's our episode. Well, swyx, this is super fun to be back with another Unsupervised Learning x Latent Space crossover episode. Yeah, I feel like there are a lot of places we could start, but one thing I always find fascinating about the way you spend your time is that you obviously are at the epicenter of this engineering movement and community, and you run these events and conferences and put on these awesome talks, and I think you just have a great pulse on the zeitgeist of what's going on. Maybe to start: what are the biggest topics people are thinking about right now? Yeah, so I just came back from London, where we did AIE Europe, and we're doing roughly one per quarter now. Yeah, you've really upped the pace. We're trying to match AI speed. Yeah, exactly. The topics will be completely different, I imagine. Yeah, yeah. I definitely curate the tracks. 
You can see what I think when you see the track lists and the speakers that I invite. Obviously, OpenClaw is the story of the last four or five months. And then just below that, I would consider harness engineering and context engineering to be two related topics in agents and RAG. And then there's a long tail of evergreen stuff like evals, observability, GPUs, LLM infra in general. We also have other updates on multimodality and generative media, let's call it. But definitely the first three that I mentioned are top of mind for people. I think harnesses in particular are so interesting. There was this tweet from Harrison Chase, the LangChain CEO, that caught my eye recently, where he said it finally feels like we have stability around the infrastructure for AI. And I think what he was basically implying is: look, over the past two, three years, as a company at the epicenter of AI infrastructure, it was a bit like playing whack-a-mole, right? You were constantly moving around with however the building patterns were evolving. For Harrison, for sure, right? He's basically had to reinvent the company every year since he started LangChain. It was LangChain, LangGraph, and now deep agents. And I think he's one of the most nimble, adept, sharp people about this. But yeah, now is finally the time for stability. Do you buy that? What do you make of that take? I think that it's very expensive to say this time is different sometimes, but when you're just writing code, it's actually okay to just try to make a call. And I think it may not even matter if this call is right or not. Like, I just don't even care that much, because you can be right on the thesis, but if you don't figure out how to monetize the thesis, then who cares if you said something first? That said, it does feel like, for example, we went through a lot of different ways of packaging integrations up with agents. 
And it feels like we've landed at skills, which is like the minimal viable format, which is just a markdown file with some scripts attached to it. And I don't see how it can be more simple than that. And so there is some justification for the stability around harnesses. I feel like there may be more adaptation with regards to maybe the real-time elements or sub-agents or memory or any of those agent disciplines, let's call it, in agent engineering. But if the thesis is that agents are LLMs with tools in the loop, with a file system where they can do retrieval, with skills and all this standard tooling that now seems to be relatively consensus, then probably that makes sense. I just think there's no point trying to stake your reputation on the thesis that we're there, because if it changes again, just change with it. It's fine. Yeah. I've always been struck by how that is much more challenging for infrastructure companies than application companies. Obviously, I think on the application side, you've seen Bret Taylor from Sierra, Max Junestrand from Legora. They're like, look, we build what's ahead of the models, and we're willing to throw everything out every three months as the models get better and better. But the thing you at least have there is an end customer that's decently sticky. They'll give you a shot at least of building these things. What I've always found more challenging about the reinvent-yourself-every-three-months position at the infrastructure layer is that developers are definitely a pickier audience, maybe, than an accounting firm or a bank. And so it's definitely a more challenging position to be in, to have to constantly reinvent yourself. Yeah, yeah. And when they churn, it's very complete. They'll leave for the hot new thing because there's no defensibility, I guess. Like even if you are a database, people can migrate workloads off databases. It's a known thing. 
So I think basically what we're talking about is the vertical versus horizontal debate in AI startups. And the way I think about it also is just that when you're Legora, when you're Abridge, you are the outsourced AI team, right? Your job is to apply whatever state-of-the-art AI methods. Yeah, like this translation layer between model capabilities and your customers. To the end customers. And like, well, if they didn't have you, they would have to hire in-house, and they're not going to hire in-house. So they have you. And I think that's a reasonable position, very robust to whatever trends and discoveries people make in the engineering layer. I do think there are useful horizontal companies being built, but they're all very much reinventions of classic cloud in the AI era. And the primary one being sandboxes, which, it's another form of compute, guys. Let's not get too excited about it. But I mean, the workloads are enormous. Right. It's interesting. And I feel like as part of this, you know, with the questions that folks are asking around infrastructure, there's a lot around the extent to which companies should have their own AI teams and what they should be doing in-house. And, you know, I think there are questions around: should people be training their own model? Should people be doing RL in-house based on the data they have? I feel like one has to evolve their takes on this every three months at this pace. But where are you at on this today? I think actually all models have gone up. And obviously I'm involved in Cognition, and also Cursor is doing a lot of its own model training. And I think that that is some part of what I've been calling the agent lab playbook, where you start off with the state-of-the-art models from the big labs and you specialize for your domain. 
But once you have enough workload and enough high-quality data from your users, then you can obviously train your own models and save a lot on cost and latency and all that good stuff. You also get a marketing bonus of calling it some fancy name and putting out some research. From my seat, I can't tell how much of it is actual value that's provided to the end user and how much of it is that marketing bonus, right? It seems some combination of the two. I think it's both. Yeah. No, no. There actually is real value. And you know that for a number of reasons. One, even when it's not subsidized, people do choose it as one of the top four or five. This is both Composer 2 and SWE-1.6. I want the top five models. Like, you know, in a fair market, in a free market. Yeah. In a model switcher, people do choose it. And it's not subsidized, so that's as good as it gets. But beyond that, domain-specific models, for example for search, which both companies have, absolutely make a ton of sense. Everyone says, yeah, you should always do this. And honestly, I think the infrastructure for that is becoming easier, with Thinking Machines' Tinker as well as Prime Intellect's lab stuff. Yeah, I mean, this is one of those reversals of the bitter lesson, where you first bootstrap on the large models and the general-purpose models to get big. And as you get very well-defined workloads that are just high quantity, but not high variance, then you just distill down to a smaller model and run that on your own, which totally makes sense. What I'm less clear on is the kind of DIY RL use case, which I think is really mostly around improved quality for different things. Obviously, there are probably more efficient ways to get a smaller model that's faster and cheaper. 
And it'll be interesting to see whether a similar story plays out in the RL space. Obviously, two, three years ago you had this whole case of companies that were pre-training and claiming better outcomes in their domains, then getting kind of cooked as each model iteration improved. Yeah. That's for the focus on pure outcomes and quality, not the cost side, where clearly training your own models for cost at scale makes a ton of sense. I think they are two sides of the same coin. You basically always want to hold quality constant, or trade off a little bit of quality for a drastic decrease in cost. And that's true for everyone. One element I wanted to bring out, which is very much in favor of open models, is custom chips. So this would be Cerebras, but also Talos. And then there's a huge range of stuff in between. This has been a huge story this past year: just everything non-NVIDIA is getting bid up, including freaking MatX, which is very rewarding for me. But I think it's one of those things where, suddenly, because the number of alternative hardware options is increasing, the inference speed that you can get is insanely high. We're talking thousands of tokens per second instead of less than 100. So the trade-off for quality doesn't hold as much anymore, because the speed is so high. Have you seen a lot of companies go all in on the alternative chips? So Cognition has on Cerebras, and so has OpenAI. And beyond that, no, I don't think so. I used to be kind of a skeptic, in terms of: okay, so what if I get my inference at 100 tokens per second sped up to 200 tokens per second? It's only 2x faster. It's not that big a deal. But I think every 10x does unlock a different usage pattern. And we have proof in Talos and some of the others that you can actually drastically improve inference speed. And what happens from there, I don't even really know. 
Like, it's so hard to predict when entire applications just appear at once. And it also isn't that expensive. Right. So this is one of those things where I think the investment cycle is going to be multi-year, and I would caution people not to dismiss it too quickly. I mean, one other infra question I was curious to get your thoughts on: obviously, it seems increasingly like a lot of the cutting-edge infra companies are building for agents as the buyers of their product, or users of their product, right? Ooh. That's another huge theme, yeah. And I'm trying to figure out, what do you have to do differently about selling to agents? Are they just the ultimate rational developers, or is there, you know? No, absolutely not. I think they are easily prompt-injected and very tuned towards basically compounding existing winners. Yeah. So congrats if you won the lottery of getting into the training data before 2023, because now you're installed in there for the foreseeable future. But yeah, one stat that Vercel CTO Malte Ubl dropped at my conference was that 60% of traffic to Vercel's admin app, the architecture for configuring Vercel applications, is now bots. It's not human. So your primary customer is agents now. And it's mostly coding agents, mostly people using the CLI, or MCP, whatever. But yeah, I mean, step one: if it doesn't exist as an API that agents can use, it doesn't exist, right? Which I think is a good hygiene thing anyway, to make everything API-available, but now there's an extra push on product people to not only work on the UI. You should probably work on the CLI stuff. Beyond that, honestly, I come from the sensibility that everything you are trying to do for agent experience now, which is the term that Matt Biilmann at Netlify is trying to coin, is the same thing that you should have been doing for developer experience. That you should have had good docs. 
You should have had a consistent API that is mostly stateless. You should have had, I guess, discoverability, or progressive disclosure, or search, or whatever. And so now that people have energy to find these customers and do that, that's great. Do I believe in extending beyond that into something like AEO for gaming the chatbots? Not necessarily, but obviously there are going to be huge advantages for people who figure out the short-term wins, and short-term wins can compound. Do you think these compounding advantages to the pre-data-cutoff companies persist? Obviously, over some period of time, I imagine they don't. And so as you think about, I don't know, three, four years from now, what the selection criteria end up being, do you think it still mirrors exactly what you were saying before? Like, it's exactly what you should have been doing all along to sell a good product to developers? It could be, except that I think in three, four years, we'll probably have much better memory and personalization. So then general AEO or GEO doesn't really matter as much. So I think whatever memory or personalization system we end up with will probably determine what you end up choosing, much more than what is currently the case, which is just frequency of mentions, let's call it. So you just spam quantity. And I think that's, I mean, that's something I'm looking forward to. I do think, you know, the fundamental exercise to work through for yourself is: if you start a new sort of disruptor company now, there's a big incumbent that everyone knows, like Supabase. Supabase is kind of like the Postgres database incumbent. If you want to start a new Supabase, how would you compete with them? And I don't necessarily have the answer. But I do think Resend, which is relatively new, I think they started in 2023, and there was a recent survey where people checked what Claude recommends by default. 
If you just don't prompt it with anything, just say, give me an email provider, it says Resend in like 70% of cases. The fact that you can get in there with such a relatively short existence, I think, is encouraging. I do think you want to do whatever it takes to get in that very short mentions list, because it's not going to be 20 of them. It's going to be like three. No, definitely. It feels like, you know, probably more consolidation than ever, or kind of a winner-take-most market, more than maybe the physics of go-to-market in the past might have enabled. The other thing also is that semantic association is going to be very important, in the sense that you want to do the combo articles where you're like, use my thing with Vercel, with blah, blah, blah. And that all gets picked up in a corpus. And so that's probably one thing that you want to do well. I don't know what else. It's one of those things where I feel I'm behind. I don't know how you feel about this. I think AI is just everyone constantly feeling like they're behind. I want to meet the person that doesn't feel behind. With AI, sorry. My stance was exactly what I said before. Everything that you should do for agents is something that you should have done for humans anyway. And so to the extent that you're just getting more energy to do things for agents, great. But it's hard to articulate what new thing, apart from just more spam, you should be doing anyway. That will be my take right now. I do think there will be more turns at this. I think the personalization turn that is coming will be big. And I don't know what that looks like, because basically we feel kind of tapped out on the memory side of things. Yeah. I guess since we last chatted, you took this role over at Cognition, and you obviously have a front-row seat to the AI coding space today. 
I feel like coding in many ways, people view it as, I mean, besides being the mother of all markets and this massive opportunity, I think it's kind of a preview of what's to come for many other spaces. Both because I feel like agents are most advanced in coding, and because I feel like the competition between foundation models and application companies mirrors what we may see in other spaces. And so maybe for our listeners, can you just lay out: what is the state of the AI coding wars today? It is massive, right? And I don't think, the last time we talked about this, we necessarily appreciated the size of what it is. No, I wish we did. The state of the AI coding wars today: both OpenAI and Anthropic have made it their P0 to compete in coding. And Anthropic is at like 2.5 billion in ARR just from Claude Code (the way they recognize ARR is up for debate). For OpenAI, I don't think a public number is known, but let's call it 2 billion as well. And then Cursor is rumored to be at 2 billion. And those are the public numbers that are known. So, huge markets that have just been created in the past one year. Anthropic's Claude Code just recently celebrated its one-year anniversary, which is insane. So I think the other thing that I see is there are some people who are like, oh, here's the relative penetration of Claude use cases, right? And it's coding at 50%, and then legal, health, whatever, are the remaining ones. And there was a very popular tweet that was like, okay, look at the empty space in all these other use cases. If you are a new founder today, you should be betting on the other stuff, on a sort of catch-up theory. And my pushback is the same pushback that I had on Apple versus Google, which is: well, why does this have to mean-revert? If it went from, let's say, 10% to 50% in the past year, why can't it keep going? And getting that wrong is actually a very painful one, because you could have just done the momentum bet instead of the mean-reversion bet. 
So I think that is the state of things now: people are very much into psychosis. They are getting rewarded for spending more rather than spending less. And I think we're not in the phase of efficiency. We're in a phase of capability exploration. So I think people who are more crazy, who are more creative, get rewarded comparatively. Well, it's interesting. I mean, it feels like behind these token-maxing leaderboards and whatnot, the first phase of this transition from a workforce perspective is you just have to show your employer, like, hey, I use these tools, here's the number of tokens I cost. And that's it. They don't care about the quality right now. It is maybe distasteful to someone who cares about the craft and all that. But directionally, everyone just wants you to go up regardless. And so it's not very discerning. And it's probably very sloppy. But I think it's net fine, because we're still probably underusing AI just generally. And so I think that's very interesting. Like, we had on the podcast Ryan LaPopolo from OPI, who spends a billion tokens a day. And for those counting at home, that's something like $10,000 worth of API tokens a day if they paid market rates. And most of us can't afford that. And probably a lot of what he does is slop, right? But if there were a new capability, he would discover it first, before you, because he was trying and you were not trying. And you only do things that work. Well, good for you, but the people who are going to discover the next hot thing are living at the edge. Right. And increasingly, living at the edge is just having the compute budget to run these experiments. I mean, kind of similar to what living at the edge on the research side has always been. It was constrained in many ways by the amount of compute you had to run these experiments. It feels similar now on the builder side, on actualizing these things. Yeah. 
The other thing that's, I mean, very obvious: Anthropic is kind of the high-price premium player, where, you know, restricting limits, or even restricting model releases, is the name of the game. Whereas Codex is like, come on in, guys, use our SDK, use our login, we don't care, we're going to reset limits, whatever. You do want to try to exploit the subsidies where you can get them. And definitely Codex is super subsidized right now. Gemini is also very subsidized. And comparatively, I think you should make hay, I guess, while that's going on. It's not that bad to be a capabilities explorer on just the $200-a-month plan from Claude Code or from OpenAI. And my sense is that people aren't even there yet. How do you think this market ultimately plays out? I mean, it's obviously such a big market that any slice of it is interesting for anyone going after it. But I think what makes the coding market particularly interesting is that it feels like this foreshadowing of what will happen in any other application market that the foundation models eventually turn to, aim all their models at, and gather data around. And so how do you think, like, does there end up being room for lots of different kinds of players, or what do you think the end state of this market is? And do you think that's applicable to other markets? I feel like, I mean, the status quo is probably the most likely outcome, which is that there are two big players and a small range of longer-tail people that fit other use cases the two big players don't. That feels right to me. I think that for the market structure to significantly change, there needs to be significant change in the economics, or the brand building, or the value propositions of the companies involved. And I haven't seen anything in the last six months that has really changed the stories materially. 
So I feel like they would just keep going until something else happens. Something else happens meaning, like, Microsoft wakes up and goes, guys, we have GitHub, we'll do something much bigger here than just Copilot. And that would be a big change. MSL has put out a model now, and I was at a breakfast with Alex Wang where they were like, yeah, we really, really want to go after the coding use case. They haven't done anything yet, but don't underestimate them, right? And similarly for the Chinese labs, I think they're trying to go after it. Z.ai is doing stuff, GLM, same thing. And so everyone's trying to get a piece of that pie. I feel like the status quo has been pretty stable for the past almost a year, I will say. Yeah. And is there room for the application companies more on the enterprise side, or what surface area do the model companies leave for application companies? Yeah, that's a good one. It's very much evolving. I will say, because OpenAI did not have this level of attention on coding a year ago, we just don't have that much history, right? And it seems like, for example, the big push at OpenAI now is the super app. Is that a consumer thing? Is that a product-portfolio rationalization thing? How much is that going to take away attention from coding, at the time when they actually do want to put more into coding? I think it's very unclear. So I do think at both big labs, sorry, at both OpenAI and Anthropic (DeepMind and xAI are separate cases), they are trying to find the other TAM expansion areas. So Claude Code for Finance, Claude Cowork, all those things. Whereas I think Cursor and Cognition are comparatively just focused on coding. And so I do think they leave space. And I do think for the other verticals, that also means the same thing, right? They're not going to be that intensely focused on that domain. 
Except for, I think, I will mark out finance and healthcare as the next ones that they're clearly going after. I would say comparatively, healthcare seems more thorny. There have been some announcements about it, but I would respect the finance work a lot more just because the path to money is a lot clearer. Yeah. Yeah. No, I mean, obviously, I think, maybe similar to the space that's being left in these other domains, there's obviously a lot that's required to actually implement these tools in enterprises versus maybe just giving model access to folks out of the box. Yeah. Yeah, yeah. So the agent lab thing is like, we'll do the last mile for you, whereas I think the model labs tend to just trust the model and be minimalist about it. Both of them work. I don't necessarily think one beats the other for every use case. All I do know is that it does seem like the large enterprises do want a dedicated partner that isn't just the model labs, which is kind of interesting. We've been in this phase of pure capability exploration. And so I think nothing has been better for the large labs. I mean, they are always going to be at the frontier of capability exploration, and so I think they have a very good relationship with a lot of these enterprises. But ultimately, over time, the incentive structure of these labs is always going to be maximal token consumption for the end customers they work with. And there's just, I think, so few companies that have actually gotten to massive scale. Maybe coding, again, is the most interesting. So it's the first space that really has just completely gone parabolic. Yeah, you must live it every day. Like, absolutely insane. And I think when you get to... Okay, I mean, I think we say good things about Cursor and Cognition, but the sheer liftoff of both Anthropic and OpenAI, because they have independent valuations. I mean, let's throw xAI in there, now IPO-ing at 1.2 trillion. That number is just mind-boggling.
Like, I feel like in normal investing or normal startups, there's kind of a ceiling market cap or valuation that you reach, and you go, all right, it's going to be chiller from now on. And these guys are not slowing down. No. Well, I also think the dynamic that's fascinating about some of these later-stage companies is, in the past, in venture world, if you got to a certain level of scale, the question around you was really more a valuation question. And this is why there were different types of venture people who did it. And the late-stage growth people were just incredible at a little bit of what's the ultimate market opportunity of this company, but also what's the right way to value it. Like, we know it's in some band of outcomes, and sure, there's some variance to it, but it's relatively understood what that band is, and then maybe over time you get surprised to the upside. Whereas any kind of, even the labs themselves, any later-stage company, the bands of what that company might be worth right now, or even in a year or two years, are so massive because of how fast the ecosystem changes that even for later-stage companies, every three months could be an existential-level event, to the upside or to the downside. And I think you're obviously seeing it in the positive with code, which, if you think about a company like Anthropic, for a while it was unclear if they were going to have access to enough capital to really stay in the race. And then coding hit at the exact right time. They had the perfect model for it. They executed brilliantly. And now they're one of the most valuable companies in the world. At the same time, I have zero sympathy for OpenAI because they're crushing it. And they're all rich. You know, this is a high-class champagne problem to have, to be number two at coding or whatever. Like, who cares? You're doing great. It's funny, though.
I mean, you would be closer to this, you know, even though you're in the AI coding space. But it's like, a lot of people I talk to think Codex is just as good, if not better, than Claude Code, right? I think one thing that I've been really surprised by, and maybe Claude Code is a better product in some ways, I'm curious your thoughts, is just in consumer AI with ChatGPT, you saw this big first-mover advantage, right? Where admittedly today, like, I don't know, Claude, Gemini, great products, it's not abundantly clear ChatGPT is any better, but people stick with ChatGPT. It's the first thing to introduce them. They stay, but they're not growing anymore. I don't know if you've seen. Right. But that to me is more of a product problem than anything else. It's not like they've lost share to someone else. My understanding is the overall problem with consumer AI today is much more, how do you take this tool, which for folks like us, knowledge workers, is this incredible magic tool, but is not necessarily a daily-active-use tool for a lot of people around the world today, and what are the products? It's kind of a category-wide problem. Like, in coding, for example, the entire space has gone parabolic. There may be some relative growth in other consumer AI players, but it's not like consumer AI as a category is going parabolic and they're not capturing most of that. I think the larger problem is actually much more, hey, the category has hit a bit of a plateau in terms of people figuring out how to bring tons more users on board or increase the frequency of those users. And so it seems more of a category-wide problem than a massive market share change. I was going to draw the comparison to the coding space, where Claude Code was the first product, obviously, to introduce people to this magical experience. By all accounts, Codex is pretty damn close to as good, if not better.
But still, that first product, you would have thought that would not be a super sticky product surface area. It actually has been, it turns out. It feels like the first lab to introduce you to an experience really does keep a lot of the focus. I think maybe it's still early days. You know, ChatGPT is three-plus years old and Claude Code is only one, just turned one this year. So give it time. Yeah. I mean, definitely a lot of people have switched to Codex. Maybe that will keep going. It's really hard to tell. Yeah, I do think that because we are in this high-volatility, high-temperature phase, the loyalty and stickiness to first movers and category creators, I don't think, is as high as it might be in some other areas we've looked at in our careers. Yeah. Though, I mean, I've been surprised by the Claude Code thing. I would have thought that, like, in many ways, I always worried about the enterprise. You think it would have been gone by now? Not gone, but I always worried that the consumer business of these companies would be quite sticky, and that the enterprise API business was actually, in some ways, your least loyal buyers. They would move to... But they worked out that it wasn't the enterprise API, it was enterprise product. Totally. And maybe that was the secret. But the amount of lock-in, or just default behavior, that has happened in that space is more than I might have imagined with two products that by all accounts are pretty damn similar. Yeah. No fight there. I will say, I do think that Codex is still playing catch-up in terms of personal experience. The only thing I like out of Codex is Spark. I feel like the skills integration is a little bit better. I feel like the speed is a bit better, maybe because it's written in Rust or whatever. Very minor things that you're almost telling yourself rather than objectively assessing between the two of them. I do think, vibes-wise, that's what's going on.
I feel like the missing question in this whole debate is, why is it so concentrated in only two names, right? Like, where is the Gemini presence? Where is the xAI presence? And they are trying. It's just they haven't made that much progress yet. But what the Claude Code moment does show, it actually in some ways makes you a little more bullish on the potential for someone else to catch up, because it does feel like if you're the first to introduce some magical net-new product experience, that might actually be stickier than one might have imagined. Right, right, right. Okay, yeah. And so everyone can believe they have a shot at that. What do you think that new product experience might be? And this is a failure of imagination on my part. I always wonder, people always say this, well, the thing that will save us is being first to the next new thing. What is it? Yeah. I don't know. Something around consumer agent, computer use, hybrid. I think we're scratching the surface on the consumer side. So my current theory is OpenClaw is a vision of things to come. Totally. And it's good that OpenAI has the association with OpenClaw, but by no means do they have the rights to win it. The general thesis that I have been pursuing now is that the same way 2025 was the year of coding agents, 2026 is coding agents breaking containment to do everything else. And so coding agents continue to win, but because they generate software and software eats the world. So it's kind of like the transitive property: software eats the world, coding agents eat software, therefore coding agents eat the world, which is an interesting... And breaking containment is always an easier phrase in the consumer context than the enterprise one. You've seen people run these really cool experiments in their own personal lives. I think figuring out how you... Obviously, everyone's focused on the enterprise side now around how you create these experiences.
I feel like the vibes... People love to have these narratives that everything has completely shifted. It's like, actually, OpenAI, organizational volatility aside, has great products, great team, great models. Everyone else in the world is incentivized for there to be two, three, more; everyone would love more great model companies. And so I feel like the natural forces of the world revolt when any one company is too much the star of the show. There are so many people in the ecosystem who are incentivized for that not to happen. So I'd be shocked if we don't have a reversion of vibes, maybe not completely the other way, but at least a little bit more equal at some point over the next six to twelve months. I think there are just kind of different stages. When you talk about the world wanting more model companies, I think about the Neolabs. Yeah. And I mean, I don't know. Is it fair to say none of them have really broken through in the past year? I think that's totally fair. Which is rough. And, well, how are we going to grow that diversity in choice? Like, this is it. Yeah. It'll be really interesting to see what ends up happening with that. And you've seen folks like NVIDIA very incentivized to make sure there's a broader platform of other model providers. I think, I don't know. People say this, but I don't think they try that hard. NVIDIA tries harder to build Neoclouds than Neolabs. Well, they try pretty damn hard to build Neoclouds. Correct. But, like, you know, let's call it the CoreWeaves of the world: a much happier place than any Neolab built on top of them. Yeah. Though one might argue it's easier to enable a Neocloud to be successful than a Neolab; you can't will a Neolab into existence the same way you can a Neocloud. So I bet he has more direct control over it, for sure. What else is catching your eye today on the startup side? And are you worried? There's obviously this whole narrative of the foundation models.
They announce a product and every stock goes down 15%. Yeah. Do you worry about the foundation models just kind of eating into a bunch of these startup categories? Not really. Okay, there's the point of view of being an investor in startups, and there's the point of view of, do you want to start something? And honestly, the downside for all of these is so minimal, in the sense that the worst you do is you just get hired into one of these labs anyway. So for the market of people who just do things and try things and try to execute in a competent way, even if it doesn't work out commercially, even if it just wasn't that great, that's your job interview to go into one of these labs anyway. So I don't feel that worry from a very, very small startup's perspective. Midsize startups, yes. I would say there's been a lot of LLM infra consolidation, like the Langfuses of the world getting absorbed into ClickHouse. I think people have maybe worked out the domain-specific playbook, and I think that's okay. So I'm not that worried about that. I would say I'd be more worried about traditional SaaS, like low-NPS SaaS. This is the whole AI-versus-SaaS debate that has been going on, and literally I'm going through that exact thing in my company, so I'm thinking through this on a very visceral level. On one hand, you have the people who say, you vibe coders don't appreciate the amount of work that goes into a CRM. And like, yeah, you think you can rip out Salesforce. So did the 30 entrepreneurs before you, right? You classically underestimate the things that you don't deeply know, and the target audience is not you. At the same time, we have never been able to build software so easily and customize software so easily. And like, yeah, you're not going to use 90% of the things in Salesforce.
So like, what's the typical... So what have you done internally? So we have the main SaaS that we use for event management and sponsor management. And we pay 200K a year for that. Not huge, but chunky for my scale. And like, yeah, I could probably spend 2,000 and build a custom version of that. The trick has been dealing with the rest of my team and getting them on board, because I'm the most technical person on my team. But I can't make that decision myself, right? And I think it's the same thing I've been talking about with other CEOs and team leaders as well. It's like, well, you can be super Claude-pilled, you can be super LLM-psychosis, and you think that's okay, but you have to bring your team with you. And I think the widening disparity in LLM psychosis within companies is causing real rifts, because on one hand, the people who are less AI-native are not getting with the picture. They're actually behind. They're actually not waking up to the fact that everything you think is necessary is not actually that necessary. And in fact, it would be better for you if you just held your nose, went in, and came out the other side only talking to agents in natural language, and your life would actually be better, and you're just close-minded. There's that perspective. The other perspective is, oh, you vibe coder, you did this in a weekend and you got the 80% solution, and now the rest of your employees have to pick up the rest of your shit that you thought you were so hot and amazing at, but actually you didn't figure it out, and actually LLMs are still useless at this, and blah, blah, blah. So I think there's this huge debate going on in every company right now. And I have a small microcosm of it. But yeah, it's making me hesitate to pull the trigger. But I will at some point. Maybe I put it off for one year, but not five. So SaaS is definitely getting squeezed.
It does make me wonder. I do think there's an opportunity for a more AI-native system-of-record thing that is not just Postgres or MongoDB, although both are very good. Maybe it's something like Convex; people bring up Convex a lot. I don't know. I just feel like the quote-unquote Firebase of AI apps isn't really a thing yet beyond what we have, which is fine. It's just we could probably start in a more rapid iteration cycle first before scaling up to a Postgres or MongoDB, which are older tech. I was at a dinner with Mike Krieger, the CPO of Anthropic, and we were just going around the room asking, what are people most worried about? And for me, instead of security, I brought up biosafety. Yeah, classic. Actually, like I said, it was cliche and classic. And the rest of the table were like, what do you mean, someone sitting at home can manufacture a virus that wipes out half of humanity? It's almost like the OG Geoffrey Hinton, this-is-why-you-should-be-scared stuff. I'm like, yeah, read the risk reports. This is the thing. And Mike was just sitting there, knowing he was sitting on Mythos, going, actually, it's security. And I think part of it is very good marketing, like, too good. I would actually advise Anthropic to tone down the marketing, because it's also just a very good model, and you don't have to make so many marketing claims around it. At the same time, it is not really a private model if you give it to 40 companies, each of whom have like 10 employees or whatever. Right? It's not private. It's like, there are bad actors in there. Yeah. Hopefully not as bad as releasing it widely. But no, I mean, it's an interesting case study for model releases. This might be the first model release that looks like the rest of them from now on.
There's an overall product strategy for Anthropic of bundle, restrict access, bundle product with model, maybe, whereas OpenAI has definitely been a lot more philosophically aligned with, we will just enable access everywhere and we don't know what will come out of it. Right. I mean, in this current moment, obviously the cynical take is that it also just ties to the amount of compute that both companies have. Yeah, right, right. Yeah, I think that's true. I do think this scale, the dawn of larger-than-10-trillion-parameter models, is very interesting. I think it's a temporary phenomenon, because we have much larger compute clusters coming online for everyone over the next three to five years, and this is already written in the cards. Yeah. So to the extent that, you know, will we have rationing of models above 10 trillion parameters in two years? I don't think so. But I think everyone will have rationing of the next phase. Right, right. But that's as it should be, almost. My classic example, and this is just me theorizing, not anything confirmed by Google: when Google announced Gemini, they actually announced three sizes, Flash, Pro, and Ultra. They never released Ultra. They only have Pro and Flash. So my theory is they have Ultra sitting in a basement and they just kept distilling from it for Flash and Pro. And like, yeah, I actually think that's as it should be for any lab, that they do that. Yeah, just because those are the models that people actually want to end up using, and it's just cost per unit. It is more, yeah, it's cost. It's not the want, it's just the cost. I do think it is interesting that for a while I was considering the theory that models capped out at 2 trillion parameters. And I think that's proving to be wrong. And well, if I'm wrong, how wrong am I? Do we do 200 trillion? Do we do two quadrillion? Whatever.
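The distill-from-a-hidden-teacher idea described above is standard knowledge distillation: a small model is trained to match the temperature-softened output distribution of a big one. A minimal NumPy sketch of the soft-target loss, with illustrative toy logits and a temperature choice that are our assumptions, not anything confirmed about Gemini:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over the vocabulary axis.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # as in the classic distillation formulation (Hinton et al.).
    p = softmax(teacher_logits, T)   # soft targets from the big "Ultra"-style model
    q = softmax(student_logits, T)   # predictions from the small "Flash"-style model
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()

teacher = np.array([[2.0, 1.0, 0.1]])
# A student that matches the teacher incurs (near-)zero loss;
# a mismatched one is penalized.
print(distillation_loss(teacher.copy(), teacher))          # ~0.0
print(distillation_loss(np.array([[0.1, 1.0, 2.0]]), teacher))
```

The soft targets carry much more signal per token than one-hot labels, which is why a lab can keep refreshing its small serving models from one large internal teacher.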
And I don't think we have a straight answer to that, but it's interesting that we are continuing to scale the number of params when everyone can see that we're not going to get the next thousand-X or million-X from this paradigm. So the others, the Ilyas of the world, are working on other model architecture improvements. We need a different scaling law, I guess, because I feel like people already feel we're tapped out on this one. The end state of this is we turn most of the world into data centers, and I don't know if we want that. Yeah. I mean, if the returns on intelligence are there, maybe it's not so bad. I think there's just a sheer amount of unscalability that is wrangling people's sensibilities right now, especially in terms of context lengths. My classic quote is that context length is the slowest-scaling factor in LLMs. We took maybe three years to go from a 4,000-token context length to a million, and that's about it. Gemini has had a million-token context length for two years now and no one's using it. So, yeah, memory is probably going to be the biggest limiting constraint on all these things. Yeah, certainly seems that way. I guess I'm curious, over the last year since we recorded last, what's one thing you've changed your mind on? I feel like I was kind of bearish on open models last year, in the sense that I had just done the podcast with Ankur Goyal of Braintrust, where, I mean, you know, he has a good cross-section of all the top AI companies, and he said market share of open source is 5% and going down. I think that's changed. I think it's going up. And even if... Even though the capability gap does seem to be increasing. Depending on the timing. It's hard to tell. It's really hard to tell. Because, for listeners, the capability gap increasing is on public benchmarks. And let's say you're comparing Mythos versus, I don't know, GPT-OSS or GLM 5.1.
And it's really hard to tell, because even if they were closing, you would also not believe that they were closing that much, because it's very easy to game the benchmarks. So you just don't really know. All you know is there are somewhat objective OpenRouter stats on what people choose in the free market, and people do choose some of these open models in significant volume, except that a lot of them are heavily discounted, so you need to price-adjust these things. So even if that were true, which I'm not sure, I feel like the number is up now instead of down. I think the separation between what the top-tier agent labs are doing versus the average startup in AI, or the average GPT wrapper, is significant enough that you should not worry about the mean industry number; you should cohort things into, here's the median, here's the bottom 80%, and here's the top 20%. And the top 20% acts very differently than the bottom 80%. And the top 20%, which is all I care about, is definitely going towards more open models. The Fireworks and the Togethers are crushing. And so will all the fine-tuners, right? So I think maybe last time we even said things like fine-tuning-as-a-service doesn't work. Well, now it's going to work. It's a derivative of the open models market. Well, and also the workloads are scaling to the point where people care about cost and speed more and more, moving from pure use-case discovery, like what can these models do, to, okay, we know what they can do at scale, now let's do it cheaper and faster. Yeah, yeah. So that change, I think, is probably the most significant in my mind. And I always like to do the mental math of, it's like scheduling a learning rate: when you've been wrong once, what else were you wrong on? And I'm kind of working through it. To me, the other thing was the coding one, which obviously I have now come full 360 on.
But I think people are not appreciating dark factories enough, which I don't know if you've discussed on the pod yet. And this is kind of a strong DM slash Simon Willison term. The general idea is, okay, there are different levels of AI coding psychosis you can have. The very first level, which, by the way, I first encountered at Cognition five months ago, was zero human-written code. Yeah. Right. Which seems like a reasonable thing now; it was less reasonable five months ago. The next frontier, which sounds as crazy today as zero coding did in the past, is zero human review. Yeah. Like, you just check it in without even reviewing it. And very few people are doing that. But OpenAI is exploring this. And I feel like it's definitely the only scalable way to do this, which just means you have to kind of flip the SDLC, or change large amounts of what you normally do, which is probably things you should have done anyway: more testing, more automated verification, or whatever. But that is a frontier where, when you unlock it in your company, you are just going to produce much more quantity of software than you've ever had. And it's going to be so disposable, so cheap, that you can probably innovate on quality a lot as well. That quantity helps you get to quality, which I think people are very uncomfortable with, because people associate more quantity with slop. Right. No, it's back to exactly the discussion we were having on the reaction to these token-maxing scoreboards, and the idea that today maybe that's not the best sign of productivity and efficiency going forward. Yeah, but you still get rewarded for it. So you're like, fuck it, whatever.
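Zero human review only works if the merge decision is delegated entirely to automated gates: tests, type checks, lint, whatever verification you trust. A minimal sketch of that decision step; the gate names and commands are hypothetical, not any lab's actual pipeline:

```python
import subprocess
from dataclasses import dataclass

@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str = ""

def run_gate(name, cmd):
    # Each gate is an ordinary shell command; exit code 0 means pass.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return GateResult(name, proc.returncode == 0, proc.stderr[-500:])

def should_auto_merge(gates):
    # Ship only if every automated check passes; otherwise escalate to
    # a human instead of merging blind.
    return all(g.passed for g in gates)

# In a real pipeline these would be live commands, e.g.
#   run_gate("unit-tests", ["pytest", "-q"])
#   run_gate("types", ["mypy", "src/"])
# Faked here so the sketch runs anywhere:
gates = [
    GateResult("unit-tests", True),
    GateResult("types", True),
    GateResult("lint", False, "unused import"),
]
if should_auto_merge(gates):
    print("auto-merge")
else:
    print("escalate:", [g.name for g in gates if not g.passed])  # escalate: ['lint']
```

The "flip the SDLC" point is visible here: all the human effort moves from reviewing diffs to making the gates themselves trustworthy.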
But I think the people who do best in 2026 are not the cynics who go, oh, that's just slop, I'm not going to participate in that. They're like, okay, this is happening with or without me, let's bend this the right way. Yeah, I love that. I think for me, a related thing on the open source model side is, for so long I really didn't think it made any sense to do any sort of RL, post-training, pre-training, anything you could do to improve overall quality. For latency and cost it always made sense to me, but for overall quality, god, you just get that for free in the models three to six months later. I think what I'm starting to change my tune on a little bit is hearing all these app companies talk about, we build stuff and then we throw it out three months later as the models improve. You're like, okay, well, then what you're doing for capability improvement is just another version of that. I still don't think that your RL or post-train is going to make you have a better model for years and years to come. But I think you still have to be pretty rigorous on, is that the single best thing you can do to solve a customer problem? Oftentimes it's literally just, no, add more data and feed more data, even via connectors, to these models, or do some clever engineering on the back end, or whatever it is. But if the single best thing you can do for that three-month time period to improve your customers' outcomes is post-training in some way that really improves the output of a model, even if you throw it out three months later because the general models catch up, it still might have been worth doing. And so I think I'm more open to... You throw out the results, but you don't throw out the raw data. Totally. Right. Then you just run it again. And so basically there's some... Obviously, at a cost level of, like, $10 million, maybe that's too much, but there's some level of cost where... No, it's not even $10 million, right?
No, of course it's not. You know, there's obviously some level of investment at which it's the equivalent of just staffing four engineers to go build something for three months. Yeah. So the other thing, and for listeners, I'm just going to leave some droplets of info: look into the long-trajectory, synthetic-rubrics work that people are doing. It's very important, including something called Dr. GRPO. I'll just leave those key search terms there. I think what it means is that RL is going much more multi-turn than people think. And that means you can customize the models along way more specific dimensions than the traditional, let's call it SFT, or the sort of shallow RL that was done a year ago. So like hundreds of turns. Yeah. And I think that leads you down a path of complete domain specificity. What are the other unanswered questions in AI today that you're looking at in the next year, that you're paying close attention to? I have a few theses for what the next frontier is. One is memory and personalization, which we talked about. The other is really world models, which we've done a small series on, from Fei-Fei Li to even Moon Lake and General Intuition. And there's a lot of debate as to the relative importance of this. I think a lot of it manifests as 3D static worlds that you kind of inhabit for a little bit and walk around. And people are like, cool, but how does this help me with my B2B SaaS? It's like, all the hype now is robotics, right? Yeah. And there's obviously a correlation between world models and embodied vision and experiences, which leads to robotics. But I think world models are very interesting just for improving intelligence itself, beyond the next-token-prediction paradigm. And so I think people are kind of testing their edges around that. One of our top articles this year so far has been on adversarial world models.
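For listeners chasing the GRPO search terms dropped above: GRPO scores each sampled response against the mean of its group rather than against a learned value function, and Dr. GRPO's fix is to drop the extra normalizations that bias updates. A minimal illustration with made-up rewards, not any lab's training code:

```python
import numpy as np

def grpo_advantages(rewards, dr_grpo=True):
    # GRPO samples a group of G responses per prompt and computes each
    # response's advantage relative to the group mean, with no critic.
    r = np.asarray(rewards, dtype=float)
    adv = r - r.mean()
    if not dr_grpo:
        # Vanilla GRPO also divides by the group std; Dr. GRPO ("GRPO Done
        # Right") removes this std scaling (and the per-token length
        # normalization), arguing both bias the policy update.
        adv = adv / (r.std() + 1e-8)
    return adv

# Four sampled rollouts for one prompt, scored 0/1 by a rubric or verifier.
rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))  # centered: [0.5, -0.5, -0.5, 0.5]
```

The multi-turn point then follows naturally: nothing in this advantage computation cares whether a "response" is one completion or a hundred-turn trajectory, as long as the rubric can score the whole thing.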
I do think, if you don't do anything else, just read Fei-Fei Li's essay on spatial intelligence, on why LLMs don't have it. And she may not have the solution yet, but she has the right problem statement. And everyone else is trying to solve that problem statement in their own way. Let's see who wins. But I don't think it does you any favors to equate world models with robotics, or world models with gaming, or whatever the current manifestations are, because what is at stake is a much more important conception of intelligence than just answering questions. It is: does the AI understand what a table is, what matter is, what physics is? It's almost like, for those who are movie fans, it's like Good Will Hunting, where Matt Damon's character knows everything because he read it in a book, but he's never... Great scene with Robin Williams. Robin Williams. And I look at that scene and I go, that's exactly the difference: a very intelligent LLM that knows everything but hasn't experienced anything. Wow. That's an awesome note to end on. Have you used that anecdote? That was great. Yeah. So one thing I've done with Latent Space is I've moved to adding daily write-ups, and during one of those daily write-ups, I wrote that. That's a great one. I love that. Also, it's been a ton of fun. Thanks so much for coming on. I'm Jacob Efron, and this has been Unsupervised Learning, a podcast where I get to talk to the smartest people in AI and ask them tons of questions about what's happening with models and what it means for businesses and the world. As I hope is clear, I have a ton of fun doing this. It's a nights-and-weekends project in addition to my day job as an investor at Redpoint. But our ability to get these incredible guests on really comes from folks like you subscribing to the podcast and sharing it with friends. It's really what ultimately makes this whole thing work. And so please consider doing that.
And thank you so much for your support and listening. We'll see you next episode.