OpenAI's Codex: This Model Is So Fast It Changes How You Code
47 min • Feb 18, 2026

Summary
OpenAI's Codex team discusses the rapid evolution of their AI coding assistant, from a terminal-based tool for professional developers to a GUI-based app with broader appeal. The episode covers the strategic shift toward agent-based workflows, the release of an extremely fast new model, and how speed fundamentally changes what's possible in AI-assisted development.
Insights
- Speed is becoming a primary competitive advantage and enabler for new interaction patterns; the new Codex model is so fast it requires UI/UX adjustments and opens possibilities for real-time voice-based coding
- The bottleneck in AI-assisted development is shifting from code generation to code verification and review; teams now struggle with reviewing the volume of code being produced
- GUI-based interfaces are proving superior to TUI/terminal approaches for managing multi-agent systems and complex workflows, contrary to developer expectations
- Integration of AI agents across multiple services (GitHub, Slack, Linear, Vercel) requires a 'command center' paradigm rather than IDE-centric workflows
- Internal adoption and dogfooding by research teams building the models themselves creates a powerful feedback loop for rapid iteration and improvement
Trends
- Shift from code generation as primary value to agent delegation and automation as the core use case
- Speed optimization becoming table-stakes; latency reduction enabling new interaction modalities (voice, real-time steering)
- Multi-agent systems replacing single-tool workflows; agents acting across code, infrastructure, communication, and marketing tasks
- Code review and verification becoming the critical bottleneck rather than code generation
- Mid-turn steering and real-time model interruption enabling more natural, conversational coding workflows
- Automation of non-coding tasks (PR management, bug fixes, documentation) as primary productivity gains
- Personality/tone customization in AI models emerging as user preference differentiator
- Hardware infrastructure optimization (WebSocket-based persistent connections) becoming as important as model improvements
- End-to-end testing and screenshot evidence replacing traditional code review as verification mechanism
- Integration with external services (GitHub, Linear, Slack, Vercel) becoming core product feature rather than add-on
Topics
- AI Code Generation and Synthesis
- Agent-Based Automation Systems
- GUI vs TUI Design for AI Interfaces
- Model Speed Optimization and Latency
- Code Review and Verification Bottlenecks
- Multi-Agent Workflow Management
- Real-Time Voice-Based Coding
- Mid-Turn Steering and Interruption
- Continuous Integration and Deployment Automation
- AI Model Personality and Tone Customization
- Long-Context and Long-Horizon Task Reliability
- Distributed Systems Infrastructure for AI
- End-to-End Testing Automation
- Developer Productivity Metrics
- Competitive Positioning in AI Coding Tools
Companies
OpenAI
Develops Codex, the AI coding assistant being discussed; released a new fast model and the Codex app, with a Super Bowl commercial
Anthropic
Mentioned as competitor with Claude Code; discussed as losing momentum in the coding assistant space relative to OpenAI
Cerebras
Partnership mentioned for serving the new fast Codex model through their infrastructure
GitHub
Integrated with Codex app via GitHub skill for PR management, merge conflict resolution, and code automation
Slack
Integrated with Codex app for agent-based automation and communication workflows
Linear
Integrated with Codex app for bug tracking and issue management automation
Vercel
Integrated with Codex app for deployment automation capabilities
VS Code
Mentioned as IDE alternative that team considered forking but ultimately rejected in favor of custom GUI
People
Tebow
Head of Codex at OpenAI; discusses strategic direction, model capabilities, and product philosophy
Andrew
Member of technical staff on Codex app at OpenAI; shares detailed usage patterns and automation examples
Dan Shipper
Host of AI & I podcast; conducts interview and provides context about Codex's market positioning
Greg
Referenced as power user who lives in Emacs; mentioned in context of TUI vs GUI preferences
Quotes
"The first time I showed it to someone, they were like, no way, this is like a fake demo. Like, this cannot be this fast. This will change everything, especially because it's not yet the fastest that we can actually get it to be."
Tebow•Early in episode
"GUIs are great. IDEs are just a problem. There's something that's a GUI for programming that's not an IDE."
Andrew•Mid-episode
"It's called a Codex app."
Tebow•In response to Andrew's observation
"The bottleneck that is very apparent is like, you know, how fast can you verify that things are correct?"
Tebow•Later in episode
"You cannot delegate understanding right. It's like you're trying to understand something and so speed there is a real advantage."
Tebow•Discussion of code review
Full Transcript
The first time I showed it to someone, they were like, no way, this is like a fake demo. Like, this cannot be this fast. This will change everything, especially because it's not yet the fastest that we can actually get it to be. My experience was trying the app. I didn't really want to go back to a terminal. What I realized is actually GUIs are great, IDEs are just a problem. There's something that's a GUI for programming that's not an IDE. And it seems like you're figuring that out, but I don't even know what that's called. It's called a Codex app. Dan here, and I want to take a second away from the episode to tell you about Granola. Granola is an AI note taker for your meetings, and I use it pretty much every day. That may sound a little bit weird or a little bit creepy, like, transcribe all your meetings? Well, for me, it's actually kind of indispensable as a leader. Every is about 20 people now, and it's really important to me that I understand how decisions get made, how I'm showing up in meetings, and how I can help my team the best way I can. Granola acts a little bit like a leadership log for me, so I can see how I've done in meetings, what situations came up in a particular week, and how I can do better next time. If you're trying to improve as a leader and scale your company, try Granola as your AI-powered notepad for meetings. Head to granola.ai slash every, code every, to get three months free. And now, back to the episode. Tebow, Andrew, welcome to the show. Hey, thanks for having us. Thanks for having us. Great to get to chat with you. So for people who don't know, Tebow, you are the head of Codex at OpenAI. And Andrew, you are a member of the technical staff on the Codex app at OpenAI. And you are the people of the moment. OpenAI just ran a Super Bowl commercial about Codex. How are you feeling? Yeah, that Super Bowl ad was quite surprising, wasn't it? It really was.
I think the core thing, and the place I want to start this conversation, is that it feels like a strategic shift. You would expect OpenAI to have run a ChatGPT commercial during the Super Bowl. And maybe not, especially if you looked at Codex's positioning three or four months ago for professional engineers, have run an ad targeted at a much broader audience. It felt like for a long time there was this divide where Codex was for professional engineers, and if you wanted to do vibe coding you did that in the ChatGPT app. And it seems like that has shifted a lot over the last month or two. Can you tell me about that? Yeah, I think, well, we can talk about last week, right? Last week on Monday we released the Codex app, and immediately we saw a ton of downloads, more than a million downloads in the first week. And then we knew that we were releasing an extremely strong model, 5.3 Codex, on Thursday. That just made it very visible that, you know, we're here to put incredible experiences out there. We're very committed to Codex. And also, agents are really starting to work and be able to create these things, even if you're a little bit less technical. I think the app really showed that. It's much more inviting for people to just try it and run multiple agents, with our models being very, very good at allowing for multitasking and being reliable for long-running sessions, which allows you to create a lot more. So it just felt like maybe we can inspire more people to build, and then show that agents are here, right? It's not that it's coming, it's going to be mainstream: why don't you try and create something new? Inspiring people felt like the right thing that we wanted to reinforce. Yeah.
While we were designing and developing the app, one of our internal mandates to ourselves the whole time was that we had to make something that we loved to use and that we used for all of our work. And if we couldn't do that, then we weren't going to put this out. And this was back when we started. And I think that we surprised ourselves a lot with how fun it was, especially as we started to build this app before we started to build agent skills. And then once we paired them together, it became this really rich interactive experience where you could open the browser or connect to these various services. And so all of a sudden we started to feel this really connected, interactive experience and wanted to share. I kind of see the ad as a love letter to builders, right? I have never seen a Linux CD in a Super Bowl ad. And so that was really cool to watch. What was the impact of the ad? We've still to measure that. We'll see how it plays out over the long term. But we saw a giant surge of traffic, remarkably, very, very quickly after 4 p.m. PST, when it aired, and our systems were under heavy load. So it felt kind of weird to me that people are watching the Super Bowl and going and installing the app and trying it out right there and then, but it happened. And a lot of people reached out saying they were really inspired by it and just wanted to build afterwards, which is what we're aiming for as well. Tebow, I still want to talk a little bit about the strategic shift. So, Codex in general moving from something that is really for professional developers to something that has a broader audience, and maybe moving some of the vibe coding from ChatGPT into the Codex app. Tell me about that.
I don't think we're trying to move vibe coding from ChatGPT into the Codex app. Very much two things are happening. One, we're pushing the frontier on professional software development: 5.3 Codex beats every single other model on the top benchmarks for coding. So it is a very, very capable model. And at its speed and cost, it is a top performer. The second thing is that the app does make things more accessible, and so it does appeal to a wider audience. But internally we're also seeing the app very much used within research, within our own team; the entire Codex team uses the app. It makes people more productive. So it's very much leaning in to how we think agents are best used, the patterns that we were seeing making people very productive here at the company and outside, and then going all in on that. It happens that at the same time, it's like, hey, delegation is finally here. It works, and it's much more accessible. And we're going to try and see how we can package that and ship it to a much, much wider audience. But that might not be the Codex app. Do you use it for all of that? Like, you just build in there? 99% of the code that I write is using the Codex app. Same. I mean, I live in there now. Yeah. Okay. Well, that's actually really interesting. I definitely want to talk about the app in particular, but I want to go back to the thing you just said, which is, maybe if I'm reading you right: you're pushing the frontier, you're seeing lots of people who are maybe broader than just senior engineers using this. However, the overall idea of who is doing what in which app, maybe you haven't totally figured that out yet.
And it's not as clean a line as, no more vibe coding in ChatGPT, vibe coding only in Codex. You can do it in both, but you haven't figured out exactly which thing you're going to do where. Yeah, I think Codex is the most powerful experience out there. So you should be fairly technical, so that you understand, hey, code is actually getting written and it's going to get executed on your machine. By default, it's executed in the sandbox. But you should probably be able to read code in order to use Codex to its fullest. We will bring a similar experience to ChatGPT at some point, which will have different properties in terms of the sandbox and how concepts are represented. Maybe we won't be showing, hey, this scary terminal command is running and you should probably approve it; of course you shouldn't do that to someone who is not technical. And Codex is really there to appeal to all coders, builders, people who are either technical themselves or technical-adjacent, like data science, these kinds of things. Yeah. And if you use the Codex app for any amount of time, you can see the inspirations from ChatGPT. The layout's very similar. We auto-name your conversations. You've got contextual actions, but it's pretty clean, right? The composer looks very similar. And you'll see some of that inspiration back in ChatGPT for other types of things. But we still believe that when we set out to make something that was for the professional software developer and for us, it deserved a dedicated experience that could really showcase the power of the models and the way that the models could change the development lifecycle. And so we made something very tailored to that. And we've had a lot of success internally with research teams, with product teams.
And so we'll look beyond. But I think we're really happy with where we've ended up on the tailored approach to this. Can you tell me about the decision to invest in a GUI over a TUI? I feel like TUIs are so hot right now, and obviously you have one for Codex already. You could have said, okay, we're going to double down and just make the terminal experience even better than it is now and really invest in that. Versus, okay, we're going to go... you know, I think making a GUI is a little bit of a counterintuitive, or counter-narrative, thing to do. So tell me about that decision process. I think it wasn't counterintuitive. It's more that maybe it's not mainstream. And so we experiment with a lot of different approaches. I very much consider that we're still in the experimentation phase. And we're responsible primarily for two things. One is building the most powerful entity out there that's capable of coding. And increasingly, this will become a multi-agent system, and it will become more and more capable, and you will have to figure out how to steer and supervise its outcome and its behavior. That's one thing that we're building. And then we're also building how you even interact with this. What is the optimal way to have visibility into what this very capable entity, or system of entities, is doing? How do you steer them? How do you supervise them? And so we're very much still experimenting with what that is. Sure, you can do it in the TUI, but at some point it starts to feel very limiting, especially on multimodal: the models can draw little diagrams and generate images, or you can talk over it using voice. Maybe you have many of them going in parallel, and you start to lose track. So we felt like we needed to start experimenting with something else.
And it was only when we started to become super, super popular internally that we were like, we have to ship this externally. This has come to a point where it's too good to just keep it to ourselves. I mean, that was the journey that you went through; you were not building in the app. Although, when did you start building in the app? That was actually fairly quickly, when the app was building itself. That was, yeah, pretty quickly. And yeah, because I was starting with the TUI and with the IDE extension. And I think my goal personally was: how can I get to fully building the app on the app as fast as possible? It's really easy when building this stuff to slip into the mode of, oh, this will be good for somebody. Somebody will love this. A certain type of user will love this, right? So we really wanted to get quickly to: I want to be able to build the app on the app. I want it to be able to run itself with skills. I want it to click around on the app that it spawned. And I want this to be part of my workflow as soon as possible. And there, I still use the TUI sometimes when I want to fire something quick. But I think there is something about the flexibility of controlling the UI and being able to have some panes be persistent and others be ephemeral. And we shipped voice with the app, so you can prompt with voice. We have Mermaid diagrams in the app. We have full image rendering. So all of those things, I think, are the tip of the iceberg of what we want to do with a dedicated UI. And it's pretty simple, and it's simple intentionally, but I think we're going to do a lot with dynamic stuff there. I mean, yeah, the ceiling is just much higher. Yeah, it's interesting. My experience was trying the app. I didn't really want to go back to a terminal.
And I had been coding mostly in Claude Code and some Codex in the terminal for several months before that. And I think what I realized is actually GUIs are great; IDEs are just a problem. And there's something that's a GUI for programming that's not an IDE. And it seems like you're kind of figuring that out. But I don't even know what that's called. It's called a Codex app. You know, there was a moment during the development of this where everybody and their mother was forking the same IDE. And we kind of looked at each other and were like, hey, should we have done a fork of VS Code as well? Like, very seriously. I remember exactly which day it was. And I think, I don't know if I would say that IDEs are the problem, but I go back to the truck analogy sometimes with them, which is that I will open an IDE here and there. I opened one today. It was something very specific that I wanted to do, and I don't even remember what it was. But then I closed it, and I went back to using the Codex app. And I think there is something there with the Codex app being a great daily driver; occasionally you need an IDE, or occasionally you need a really complex terminal setup, but this should be your home base. It should be your command center for the agents that are running, and a place that you can come back to and track all this stuff. And, you know, there are a lot of design decisions around, like, do we allow freeform panels like an IDE? And we came to the conclusion that a lot of what these models are great at is knowing what is needed in the moment for what type of task. And so we wanted to have fuller control over what was able to show at what point. And you can see that in plan mode, where you're not necessarily getting a composer. You're getting a really quick way to answer questions. And you've got your plan, and you can edit your plan. And I think we only want to do more with that as we go.
It seems like you were surprised that you didn't want to go back to the TUI. I was, yeah. Were you like... Greg did an interview, and Greg was like, I am a power user, I thought I would never leave the terminal. Yeah, Greg lives in Emacs. I was a TUI power user for like six months, starting with when Claude Code first got really good, and I was like, holy shit, this is so much better than being in Cursor or Windsurf or whatever. And now I feel like I speed-ran my TUI era and I'm back in GUIs. I'm kind of flipping back and forth right now, but I sort of see the light: especially if you have a bunch of them going at once, the affordances of a GUI just make it much nicer. Yeah, and there's a lot more to come there. And it was a very intentional thing for us. We see that agents will act, and are already acting, on much more than code, right? And so they need to be a companion to every single app and every single thing that you can do on your computer. We integrate with Linear, Slack, and of course you also need to be able to read the code and produce code, but maybe it can do a deploy through Vercel as well. Are you going to do all these things from your IDE? That would feel very odd. And so it's like this command center for your agent.
We optimize the entire experience around that, around the idea that you have a very capable, intelligent entity that you're controlling, steering, and supervising, and you never need to go in there and do the things yourself. The thing is very capable of being delegated to. I think when you accept that that is what we're headed towards, and with 5.2 Codex it feels like we're getting almost there, then you realize, well, it's the same with you, right? When I talk to you about a feature idea or something, you go and get inspired and you go and do it. I don't suddenly jump into your IDE and go and implement it myself. You could. Yeah, I mean, I think you would find it disturbing, right? So that's the way that everyone will work with agents. You just talk to them. How has your workflow changed with 5.3 Codex versus 5.2? I was surprised at how much faster it was, and I had to adjust. I had been optimizing a lot more for long-running, multitasking work, and I had an expectation of, okay, this type of task will take 10, 15 minutes, I'm going to kick off four different things and then come back. So now I'm able to do a little bit less multitasking and be more in the flow. That felt really good. And then it just feels very satisfying as well to kick off automations with it, using skills. It's a more generally capable model, less super-focused on code. Right.
And so I find it much more reliable at things like going through Twitter replies and summarizing the important themes, or filing bugs in Linear, and then coming back to that and using automations so that things get implemented daily. It feels much more robust for these things. I mean, but you're really the superpower user here. The kind of stuff he does... I have very vanilla usage of Codex compared to Andrew. No, I mean, well said. I had a series that I had intentions to run for a while, and I only ran it for three days on X/Twitter, which was that I was setting up a prompt to basically add a feature to the Codex app, like some random, non-shippable feature. I had this long prompt about the quality bar that we had to hit, and once I switched it to 5.3 Codex, the results got actually much more interesting. Like, we did a Subway Surfers panel on the right; that was one of them. A little Minecraft UI for the sub-agents was another one. I don't know, maybe we'll ship it. I was like, get back to work. Why do we have Minecraft in the Codex app now? Yeah, but I'm going to explore. No, I mean, 5.3 Codex, it's neat. It's fast. It's capable. It's multimodal.
So, Tebow says you have a lot of cool use cases. What are the more interesting ways that you're using the Codex app that maybe people should try but haven't thought of yet? Andrew came up with automations, and I think that shifts the way you're thinking about these things, when they can just hop into the background on a specific trigger, at a specific time, and you can sort of program it yourself. Yeah, you're using that a lot. There are a lot of things that I use the app for that are a little bit outside of just coding features. I use it to keep my PRs mergeable with automations. It'll resolve merge conflicts, it'll keep them updated, it will fix build issues, so that basically as soon as they're ready to go, they're ready to go. There's no, oh hey, somebody merged a big thing and there's a conflict now. So I do that. So you said... at what point does the automation trigger? Because I thought automations trigger on a time schedule, but it sounds like there are other triggers I didn't know about. Yeah, we're looking at a lot of things. I have it right now just on a time schedule, and I use our GitHub skill and some internal skills for our CI, and it runs hourly or every two hours and kind of just cleans everything up.
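The keep-my-PRs-mergeable automation described above could be approximated outside the Codex app with a small scheduled script. This is a hedged sketch, not Andrew's actual skill: it assumes the GitHub CLI (`gh`) is installed and authenticated, and it only handles the "branch is behind main" case, not conflict resolution or build fixes.

```python
# Sketch of a scheduled "keep my PRs fresh" job, assuming the GitHub CLI
# (`gh`) is installed and authenticated. Hypothetical, not the real skill.
import json
import subprocess

def stale_prs(prs):
    """Return numbers of PRs whose branches are behind the base or dirty.

    `prs` is the parsed JSON from `gh pr list --json number,mergeStateStatus`;
    GitHub reports BEHIND (needs update) and DIRTY (has conflicts) there.
    """
    return [p["number"] for p in prs
            if p.get("mergeStateStatus") in ("BEHIND", "DIRTY")]

def refresh_my_prs():
    out = subprocess.run(
        ["gh", "pr", "list", "--author", "@me",
         "--json", "number,mergeStateStatus"],
        capture_output=True, text=True, check=True)
    for number in stale_prs(json.loads(out.stdout)):
        # `gh pr update-branch` merges the base branch into the PR branch.
        # (A real fixer would also retry CI and resolve conflicts.)
        subprocess.run(["gh", "pr", "update-branch", str(number)], check=True)

# Run refresh_my_prs() from cron or any scheduler, e.g. hourly.
```

Hooking this to a cron entry every hour mirrors the hourly trigger mentioned in the episode; the interesting part the agent adds is everything this script can't do, like actually resolving the conflicts it finds.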
I see, so if there are any changes on main, it looks through all your PRs and makes sure they're all up to date, so that whenever you're ready to go... That's actually good, I like that. Yeah, it's really helpful. Surprisingly helpful. I have one that every day at like 9 a.m. sends me all of the contributions that have merged to the Codex app over the last day, and it'll do a nice report of who merged what, and I have it group them by theme, so I can be like, all right, three people worked on this part of the composer, two people worked on automations, here's what happened. So that I can at least be knowledgeable about what's happening, because things get chaotic right before launch. And yeah, one automation I have, I run it multiple times a day, and it's: pick a random file, and find and fix a subtle bug. And it's kind of funny because it actually does pick a random file; it will run Python's random, find a random file, and start from there. So every time it explores a new one. Has it caught anything? Oh, yeah, yeah. It's often latent bugs that are not actually triggering on the critical path. But they're actually bugs, and they're trivial to fix and merge; it takes very little time. And it's found things that I would have never found myself. It found an issue in constraint sampling just the other day. That's really cool. Do you have other automations that are worth sharing? Let's see. I feel like I have 60 that are running at all times. Some for testing and some for real.
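The random-file bug hunt is simple enough to sketch. Only the random-file selection below is concrete; the prompt text and the idea of handing it to a scheduled agent are illustrative, and the suffix list is an assumption.

```python
# Sketch of the "pick a random file, hunt for a subtle bug" automation.
# The file picking is real; handing the prompt to an agent is left abstract.
import random
from pathlib import Path

def pick_random_source_file(repo_root, suffixes=(".py", ".ts", ".rs")):
    """Uniformly pick one source file under repo_root, skipping .git."""
    files = [p for p in Path(repo_root).rglob("*")
             if p.suffix in suffixes and ".git" not in p.parts]
    return random.choice(files) if files else None

def build_prompt(path):
    # Starting from a random file means repeated runs cover the whole repo
    # over time, instead of the agent gravitating to familiar hot spots.
    return (f"Read {path} carefully. Look for one subtle latent bug "
            "(off-by-one, unhandled edge case, race) even off the critical "
            "path. If you find one, fix it, add a regression test, and "
            "open a small PR.")

# A scheduled automation would feed build_prompt(pick_random_source_file(...))
# to the agent several times a day.
```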
Some of the members on the team really like this one that looks at the PRs you've done in the past day or so and quietly cleans up any bugs you shipped. It looks at a few of the observability platforms and tries to basically ship a fix before anyone's noticed that you shipped a bug. That's cool. One I want to mention is not coding related, which is marketing research. It runs daily, and it's prompted with a specific skill to do deep marketing research, which I've tuned over time. It goes and searches the web for anything new that came up in terms of how users are perceiving and talking about Codex, and then I just receive that little report. And it always makes for an interesting read. Yeah. We could just go on; these are just examples that we do rely on, and they run. Yeah. Do you have any particular skills that you guys like, beyond the normal, I-have-a-GitHub-skill kind of stuff? I love Andrew's yeet skill. It takes the change and then does the commit, does the PR, writes the draft, puts it in draft, and publishes a PR with a title and body. Yeah, it's very satisfying. It just does everything. That one definitely makes people productive. What are the top-used ones for you? Image Gen is a cool one. Yeah. For both silly automation purposes, like, hey, make me an image that characterizes my last day of work. Not my last day of work, my previous day of work. Yes, yes, Andrew. You know, the Image Gen skill was actually really cool. For instance, I used the Codex app to make a book for my daughters. I put together this prompt teaching it about a script that I wanted written, like 24 pages. Here are my daughters' ages.
Here's where we've lived in the past: we were in Boston and moved to New York and then moved over here. I agreed on the script, and then we went through, and I said, all right, now it's time to use the Image Gen skill. And for every page in the book, based on the script, it prompted for the image. Then it put them all together and used the PDF skill to produce the book's PDF. And then I printed it. And so we've got a super custom book that I read to my kids, and it's really cool. It's just this awesome thing when you can combine the intelligence of the agent with the way it works programmatically, by using skills, and then combine them in novel ways. And yeah, I think the PDF and Image Gen pairing is a common combo that we see. It feels like the Codex model has obviously gotten faster, which makes it much more usable. And it also feels like it has a little more emotional intelligence, but it still has a little bit of that does-exactly-what-you-say thing, in a way that can be annoying. How are you guys thinking about how you shape the way the model feels, and which way you're pushing it? It's something that we obsess over. So we definitely want the model to excel at coding and be really good at instruction following. At the same time, when we optimize a little bit too much in that direction, it can over-index on specific words, or misunderstand the intent in ways that humans wouldn't.
Sometimes I'll have a typo in my prompt, and the typo actually finds its way into the file, when obviously I meant the name of some class. That's something we're definitely continuing to push on. But the thing we're pushing on most right now is efficiency and speed, and also what we now call personalities: how supportive the model is. We understand that not everybody has the same preferences there. The previous default was a super blunt, pragmatic personality; now we've also introduced a more supportive, friendly personality, and you can pick between them. For things that don't have a universally accepted answer, we're probably going to introduce some way for you to make it your own. You should feel like you have your own personal Codex that works exactly the way you want it to.

Do you use the friendly or the pragmatic one?

Pragmatic. Pragmatic, yeah. I also use pragmatic.

Interesting. You recently put out a model that is so fast that when I was testing it before it came out, I couldn't keep up with it. I'm curious how that changes how you think about what is now possible with a model like this, and also the affordances you need in order to manage models this quick effectively.

The first time we used this model in the app, we had the same thing happen: all of a sudden there was this wall of text, we were at the bottom of the scroll, and we were immediately like, all right, we need to smooth this thing out.
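Smoothing a too-fast token stream is essentially a pacing buffer: hold incoming tokens and release them at a capped, steady rate so the UI renders a readable stream instead of a wall of text. A minimal sketch of the idea (the class name and credit-based policy are assumptions about the general technique, not Codex's implementation):

```python
import time
from collections import deque

class TokenPacer:
    """Buffer incoming tokens and release them at a capped rate.

    If tokens arrive faster than `max_per_sec`, the surplus is held back so a
    render loop that calls poll() sees a smooth stream rather than a burst.
    """

    def __init__(self, max_per_sec: float, clock=time.monotonic):
        self.rate = max_per_sec
        self.clock = clock
        self.buffer: deque = deque()
        self.last = clock()
        self.credit = 0.0  # fractional tokens we are allowed to emit

    def feed(self, token: str) -> None:
        """Called as tokens arrive from the model."""
        self.buffer.append(token)

    def poll(self) -> list:
        """Return the tokens allowed out right now (call from the render loop)."""
        now = self.clock()
        # Accrue credit for elapsed time, capped at one second's worth
        # so a stall doesn't produce a giant burst afterwards.
        self.credit = min(self.credit + (now - self.last) * self.rate, self.rate)
        self.last = now
        out = []
        while self.buffer and self.credit >= 1.0:
            out.append(self.buffer.popleft())
            self.credit -= 1.0
        return out
```

Injecting the clock keeps the pacer deterministic under test; in the app, the render loop would call `poll()` every frame.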
So we actually do slow it down ever so slightly, just so the words come in a little more smoothly. It's a really funny problem. But this thing has been super fun, and what I'm most excited about is what capabilities we can start to add to the app that are really dynamic, that we couldn't with a model that wasn't this fast. So yes, this model lets you iterate really quickly, but it also opens up a lot of new opportunities for how you code and how you interact with the Codex app.

The model is powered by Cerebras; we've talked about the partnership there, and we're very excited to put out the first model that we're serving through it. It's still very early, literally the first time we've hooked it all up, and we were so excited that we wanted to share it. The first time I showed it to someone, they said, "No way, this is a fake demo. This cannot be this fast." Then they tried a few prompts and went, "I literally cannot keep up. This is insane." I think this will change everything, especially because it's not yet the fastest we can actually get it to be. With the preview, we're putting it out quite early; we're going to layer a number of optimizations on top of it that should make it maybe 2 to 3x faster than what you've experienced. That's going to change things. We're also thinking about this from the point of view of delegation. We think this model has a huge role to play as part of a system of multi-agent systems,
and as a way to speed up the slower, more intelligent agent as well. So we're going to be experimenting in that direction.

Do you expect the same hardware speedups on the more intelligent agents to come out soon?

A lot of the things we worked on were distributed-systems and infrastructure problems that we uncovered because we were able to sample from the model at unprecedented speeds. If you're getting tokens back this fast, you have to go optimize the entire set of bottlenecks you uncover on the critical path of serving. All of those benefit GPT-5.3 Codex and all future models. There's one other thing we've been doing, which I'm sure we'll describe in a more detailed blog post at some point: we rewrote the entire serving stack to be based on WebSockets and a persistent connection, and to do things a lot more incrementally and statefully. That decreases overall latency across all models. We haven't shipped it by default yet, but we are making it the default for this new super-fast model, and then we'll enable it on the other models too. It decreases overall turn latency by something like 30 to 40 percent; we can look into the exact numbers.

What are the most surprising things you've seen using the model internally, in terms of what a speedup like this enables?

It just allows you to be super, super in the flow,
and you're almost sculpting the experience, or the code, in real time. It's just a very different feel. It's very unsettling at first, and then once you get into it, it's very hard to go back to any other model. That's the feedback we've seen, and it's what I've felt myself. It takes about five minutes to adapt, and then you go, okay, this is how I'm going to use this thing. I also don't think we've poked at the full extent of what we could do with it; it's very early, and we haven't had it for very long. Someone on the team, Channing, was showing that it's so fast it can actually play Pong. Not very well, but the model is able to react to things almost in real time.

You also start to see how it might replace some deterministic steps. We have a set of Git actions in the Codex app, and as everybody knows with Git, certain configurations or states you can be in make it really hard to run those without a ton of error handling, error messages, and guidance. It's really hard to create a good Git experience, which is why nobody ever has. But if you have a model that's almost as fast as running those scripts, you can imagine a world where these things turn into skills or something like that, and your operations run a little differently, with some intelligence, without the latency you have today when you ask the model to track something down in the codebase. You can vaguely gesture, like, "hey, send this up," and have that be fast enough to sit behind a button. What I'm very excited about is when it all comes together.
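The Git-actions idea above can be sketched as: try the plain command, and on failure hand the error to a fast model instead of enumerating every failure mode by hand. A minimal sketch under stated assumptions (`ask_model` is a hypothetical hook, and the injectable `run` parameter exists only to keep the example self-contained):

```python
import subprocess

def git_action(args: list, ask_model, run=None) -> str:
    """Run a git command; on failure, delegate recovery to a fast model.

    `ask_model` is a hypothetical callable: given the failure context, it
    returns advice or a recovery plan. With a fast enough model, this
    fallback is cheap enough to sit behind an ordinary UI button.
    """
    if run is None:
        run = lambda a: subprocess.run(a, capture_output=True, text=True)
    result = run(["git", *args])
    if result.returncode == 0:
        return result.stdout
    # Instead of hand-writing handlers for every detached-HEAD, dirty-tree,
    # or conflict state, let the model interpret the error and propose a fix.
    return ask_model(f"`git {' '.join(args)}` failed:\n{result.stderr}")

class _Result:
    """Stubbed process result so the demo below runs without a repo."""
    def __init__(self, code, out, err):
        self.returncode, self.stdout, self.stderr = code, out, err

ok = git_action(["status"], ask_model=lambda p: "recover",
                run=lambda a: _Result(0, "clean tree", ""))
failed = git_action(["rebase", "--continue"], ask_model=lambda p: "recover",
                    run=lambda a: _Result(1, "", "conflict"))
```

The key design point is that the deterministic fast path is untouched; the model only enters on the error path, where latency used to make this impractical.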
One thing we shipped with 5.3 Codex as well is what we call mid-turn steering: you start with your prompt, the model gets to work, and then you send another prompt while it's still working. It adapts in real time; it receives the message, acknowledges it, and continues its work. If you start to think about what that would look like with voice, with a model as fast as the one we just shipped, that's a whole other experience we'd be very excited to bring, hopefully very quickly. Because you can easily interrupt as you're talking. If you're just talking in natural language, doing mid-turn steers, and the implementation happens almost instantly because of the speed, it becomes a very pleasant thing to use. Right now you can sort of emulate it with voice dictation, then mid-turn steering, then watching the model implement, and it's very cool. I think we'll have a step change in that experience once we really polish it.

If speed as a bottleneck is close to being solved, what do you think is the next bottleneck? What's the next limit on making the thing you want?

The bottleneck that is very apparent is how fast you can verify that things are correct.
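Mid-turn steering, as described above, amounts to the agent draining a message queue between work steps and folding new instructions into its plan without restarting the turn. A toy sketch of that loop (the step/acknowledge structure is an assumption about the general pattern, not Codex's actual harness):

```python
from collections import deque

class SteerableAgent:
    """Toy agent loop: work through steps, absorbing steering mid-turn."""

    def __init__(self, steps: list):
        self.steps = deque(steps)
        self.inbox: deque = deque()
        self.log: list = []

    def steer(self, message: str) -> None:
        """Called from the UI while the turn is still running."""
        self.inbox.append(message)

    def run_turn(self) -> list:
        while self.steps:
            # Between steps, drain any steering messages: acknowledge them
            # and fold them into the remaining plan instead of restarting.
            while self.inbox:
                msg = self.inbox.popleft()
                self.log.append(f"ack: {msg}")
                self.steps.append(f"apply: {msg}")
            self.log.append(f"do: {self.steps.popleft()}")
        return self.log

agent = SteerableAgent(["write function", "add tests"])
agent.steer("use TypeScript instead")  # arrives while the turn is queued
agent.run_turn()
```

In a real harness the steering message would be appended to the model's context mid-generation; the queue-draining shape is the same.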
We can generate code faster than ever before; we can implement entire features. I saw someone take just a description of the Codex app, synthesized into a plan from screenshots, and the models were very much capable of reproducing 95 percent of the features and rebuilding the app from scratch. Now, is it going to be bug-free? Is everything implemented to perfection, the way the actual app is? That still takes a lot of time for a human: to go click through and verify, to make sure the designs are consistent, that there are no bugs here or there, that when you click that button in the settings panel it actually does what you expect. Verification definitely becomes a bottleneck. We have people on the team complain that there's too much code to review. That's what we're trying to solve for. I complain about that; you complain about that. There's so much code to review now, both on your own machine and from your peers. You're already reviewing the code the first time, because the agent is presenting it to you, and then you have to review the code produced by your peers, so there are these two rounds of review. This is something we're working on. A lot of us still do have to review code, and we're taking a look at what that experience should look like with the model involved. We've got a review mode in the Codex app that works really nicely and annotates your diffs on the side with findings and stylistic notes. And beyond delegation, there's another thing I'm excited about with faster models.
This one we just put out is mind-blowingly fast, and you can imagine using it to understand code, understand features, and help you with code review, helping you understand the code that shows up. That's much more pleasant, because understanding is something you want to do yourself, in the flow; it has to be synchronous. It's not something you delegate. You cannot delegate understanding. You're trying to understand something, so speed there is a real advantage. It also helps offset the fact that models are producing more and more code: speed helps you understand that code faster as well.

I've definitely found this already with the new model. Speed especially helps for end-to-end testing, manual integration testing. Often there's a toast that pops up for a second, and if the model's not fast, it's not going to catch it. This model seems better at that, because the cycle times are much, much shorter. And I definitely find this too: I can produce so much code, but when I see a PR come in, or when I make a PR, my first question is, is there evidence that you've actually tested this and it actually works? Not just unit tests, but that you've gone through it end to end. How do you handle this?

I've seen a lot of PRs that I have the same question about. It's so easy to code things now, right?
Yeah. Through some skills we have, we've gotten the Codex app pretty good at running itself, clicking around, screenshotting itself for evidence, and uploading the screenshots to the PR. There's a lot that's pretty interesting there, especially when we make this more async, or when the models get really fast at this stuff. I don't know exactly what it looks like yet, but there's a lot there: "here's a bug fix, this is exactly what it looked like when it was happening, and here's exactly what it looks like now, with the same exact click path." Maybe that's the turning point where code review becomes less important, because you can verify that part directly and do less through the code as a proxy. But there's definitely more to explore there.

Last couple of questions. I'm curious what you learned from Anthropic and Claude Code, and how you think about your positioning in the market versus them. How do you think about the differences?

They were first to put something out there, and that was interesting to us, because we had been working on similar ideas for a bit. But at the time our models were a little bit not ready. They were not reliable on long-horizon tasks; they were not able to do reliable tool calls and stay on topic. As soon as we started to really invest in that, especially with GPT-5, we were like, okay, the models are there, and we know how to make them even better. With 5.2 we got broadly even better long-context, long-horizon reliability and context understanding. And what we were seeing is that Anthropic was, to us, losing a little bit of steam when it came to the model.
And we were in this fortunate position where the way we run Codex, we've got product, we've got engineering, but we've also got research, and we all work together, sit together, and solve problems together. It's a highly creative space: at times we decide to solve problems in the product, in the harness, but at times we also ask, hey, how can we actually improve the model? Let's talk about it and ideate together. And then research will come and say, hey, we've got this breakthrough we're sitting on; would this be something we could ship? I get excited about that. One example was compaction. We had a lot of complaints about it: whenever you'd hit compaction, people would complain that it was losing too much context. So we solved that end to end. We decided to do end-to-end RL training and introduce compaction within research, making the model itself very familiar with the concept of compaction, producing an optimal summary, effectively delegating to itself across time. Once we had solved it at the model level, the harness problem became so much easier: just let the model do it, and it's going to be very reliable. Through that collaboration, the momentum has been very strong, and we're able to improve models and ship a model on roughly a weekly or monthly cadence.
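For contrast with the RL-trained approach they landed on, the harness-side compaction baseline they moved away from typically looks like: when the transcript exceeds a token budget, replace the oldest messages with a summary. A minimal sketch of that baseline (whitespace token counting and the trivial summarizer are stand-ins; a real harness would use the model's tokenizer and a model-written summary):

```python
def compact(messages: list, budget: int, keep_recent: int = 2) -> list:
    """Replace the oldest messages with a one-line summary when over budget.

    This is the naive harness-level strategy, and it shows exactly why users
    complained: the stand-in summary throws away most of the old context.
    """
    def tokens(msgs):
        return sum(len(m.split()) for m in msgs)  # crude whitespace count

    if tokens(messages) <= budget or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Trivial stand-in summary: first few words of each compacted message.
    summary = "[summary] " + "; ".join(" ".join(m.split()[:3]) for m in old)
    return [summary] + recent

history = ["user: build the settings panel please",
           "agent: created SettingsPanel with three tabs",
           "user: now wire up the save button",
           "agent: done, save persists to disk"]
compacted = compact(history, budget=12)
```

Training compaction into the model, as described above, replaces the lossy summary step with one the model itself controls, so it can decide what is worth carrying forward.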
And then we took a bit of a different bet, a different approach, with the Codex app, which turned out to be an awesome thing to try, instead of forcing ourselves to cram everything into the TUI. It was a great challenge: "let's build an app; where do I even get started?" And then you just get obsessed by it. It's hard not to.

How was it to build something that was quite contrarian, I suppose?

I remember you and I talking early on about whether we'd even ship this. We said, we'll try it out and see if we can get to something that we love. I remember saying, let's get some PMF internally; let's get everybody at OpenAI to want to use this thing without being forced to. Let's see if we can do it. We did, and it was adopted very quickly. The minute it was barely usable, the research folks put dev boxes on it, which was this crazy hack at the time. But now they use it for everything, including in training 5.3 Codex. I feel really good about having hit the point where almost everyone technical at the company uses Codex, but the people who use it the most are the ones actually building Codex and building the models. So we're able to improve things at crazy, crazy speeds, and there are no signs of it slowing down.

Amazing. Well, I'm excited for what you ship next. Thank you guys for your time. I really appreciate it.

Thank you. Thank you for having us. Thanks.
Oh my gosh, folks, you absolutely, positively have to smash that like button and subscribe to AI and I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard, but instead of gold, it's filled with pure, unadulterated knowledge bombs about ChatGPT. Every episode is a roller coaster of emotions, insights, and laughter that will leave you on the edge of your seat, craving more. It's not just a show; it's a journey into the future, with Dan Shipper as the captain of the spaceship. So do yourself a favor: hit like, smash subscribe, and strap in for the ride of your life. And now, without any further ado, let me just say: Dan, I'm absolutely, hopelessly in love with you.