Inside OpenAI Enterprise: Forward Deployed Engineering, GPT-5, and More | BG2 Guest Interview
69 min
Sep 11, 2025

Summary
OpenAI's head of engineering and head of product discuss the company's enterprise platform, sharing detailed case studies of deployments at T-Mobile, Amgen, and Los Alamos National Labs. They cover GPT-5 development, multimodal capabilities, model customization through reinforcement fine-tuning, and their predictions for AI's future impact across industries.
Trends
- Physical autonomy advancing faster than digital autonomy
- Enterprise AI requiring extensive scaffolding and integration work
- Healthcare emerging as top AI beneficiary industry
- Shift from memorization-based education to critical thinking
- AI-native workforce gaining competitive advantage
- Real-time voice AI replacing traditional speech-to-text pipelines
- Reinforcement fine-tuning becoming standard for specialized AI applications
- Forward deployed engineers becoming critical for enterprise AI success
- Bottom-up AI adoption requiring subject matter expertise
- AI tools enabling non-engineers to build software
Full Transcript
We literally had to bring the weights of the model physically into their supercomputer. In San Francisco, you could take a car from one part of SF to the other fully autonomously, as opposed to the digital world: I can't book a ticket online right now. Physical autonomy is ahead of digital autonomy in 2025. AI agents are like really in day one here. Like, ChatGPT only came out in 2022, and the slope I think is incredibly steep. I actually do think self-driving cars have a good amount of scaffolding in the world. You have roads, roads exist, they're pretty standardized, you have some stoplights. AI agents are just kind of dropped in the middle of nowhere. We'll start with the long-short game. I'm short on the entire category of tooling and evals products. Healthcare is probably the industry that will benefit the most from AI. I think I'm AGI-pilled. You're definitely AGI-pilled. The first one was the realization in 2023 that I would never need to code manually, like, ever again. Hey folks, I'm Apoorv Agrawal, and today at the OpenAI office we had a wide-ranging conversation about OpenAI's work in enterprise. I have with me the head of engineering and head of product of the OpenAI platform, Sherwin Wu and Olivier Godement. OpenAI is well known as the creator of ChatGPT, a product that billions across the world have come to love and enjoy. But today we dive into the other side of the business, which is OpenAI's work in enterprise. We go deep into their work with specific customers and how OpenAI is transforming large and important industries like healthcare, telecommunications and national security research. We also talk about Sherwin and Olivier's outlook on what's next in AI, what's next in technology, and their picks both on the long and short side. This was a lot of fun to do. I hope you really enjoy it. Well, two world-class builders, two people who make building look easy. 
Sherwin, my Palantir 2013 classmate, tennis buddy, with two stops at Quora and Opendoor, through the IPO, before joining OpenAI, before ChatGPT. You've now been here for three years and lead engineering for all of OpenAI platform. Olivier, former entrepreneur, winner of the Golden Llama at Stripe, where you were for just under a decade, and now lead all of the product at OpenAI platform. That's right. Thanks for doing it. Thank you. Thanks for having us. As a shareholder, as a thought partner, kicking ideas back and forth, I always learn a lot from you guys. And so it's a treat, it's a real treat to do this for everybody. You know, I'll open with: people know OpenAI as the firm that built ChatGPT, the product that they have in their pocket, that comes with them every day to work, to personal lives. But the focus for today is OpenAI for Enterprise. You guys lead OpenAI platform. Tell us about it. What's underneath the OpenAI platform for B2B, for enterprise? Yeah, so this is actually a really interesting question too, because when I joined OpenAI around three years ago to work on the API, it was actually the only product that we had. So I think a lot of people actually forget this, where the original product from OpenAI actually was not ChatGPT, it was a B2B product. It was the API. We were catering towards developers. So I've actually seen the launch of ChatGPT and everything downstream from that. But at its core, I actually think the reason why we have a platform, and why we started with an API, kind of comes back to the OpenAI mission. So our mission, obviously, is to build AGI, which is pretty hard in and of itself, but also to distribute the benefits of it to everyone in the world, to all of humanity. And it's pretty clear right now to see ChatGPT doing that, because my mom, maybe even your parents, are using ChatGPT. 
But we actually view our platform, and especially our API and how we work with our customers, our enterprise customers, as our way of getting the benefits of AGI, of AI, to as many people as possible, to everyone in every corner of the world. ChatGPT obviously is really, really, really big now. It's, I think, like the fifth largest website in the world. But by working through developers using our API, we're actually able to reach even more people in every corner of the world and every different use case that you might have. And especially with some of our enterprise customers, we're able to reach even use cases within businesses, and reach end users of those businesses as well. And so we actually view the platform as kind of our way of fully expressing our mission of getting the benefits of AGI to everyone. And so concretely, what the platform actually includes today: the biggest product that we have is obviously our developer platform, which is our API. Many developers, the majority of the startup ecosystem, build on top of this, as well as a lot of digital natives and Fortune 500 enterprises at this point. We also have a product that we sell to governments in the public sector, so that's all part of this as well. And an emerging product line for us in the platform is our enterprise products, what we might sell directly to enterprises beyond just the core API offering. Fascinating. And maybe to double down: I think B2B is actually quite core to the OpenAI mission. What we mean by distributing AGI benefits is, I want to live in a world where there are 10x more medicines going out every year. I want to live in a world where education, public service, civil service are increasingly optimized for everyone. There is a large category of use cases that only go through B2B. 
Frankly, unless you enable the enterprises (we talked about Palantir, and I think that's probably the same thesis at Palantir), it's like, hey, those are the businesses who are actually making stuff happen in the real world. And so if you do enable them, if you do accelerate them, that's how you essentially distribute the benefits of AGI. Yeah, well, maybe we can double-click into that, Olivier. The reach for chat is obviously wide, billions of users, but for enterprise, maybe tell us about it. Maybe we go deep into a customer example or two, and what is an organization that we have helped transform, and at what layers. So if I were to step back, we started our B2B efforts with the API a few years ago. Initially the customers were startups, developers, indie hackers, extremely technically sophisticated people who were building cool new stuff and taking massive market risk. We still have a bunch of customers in that category, and we love them and we keep building with them. On top of that, over the past couple of years, we've been working more and more with traditional enterprises and also digital natives. I think basically everyone woke up with ChatGPT: those models are working, there is a ton of value, and they could see many use cases in the enterprise. A couple of examples which I like the most, one which is both very fresh and quite cool: we've been working a lot with T-Mobile. T-Mobile, the leading US telco operator, has a massive customer support load. People asking, hey, I was charged that amount of money, what's going on? Or, my cell phone isn't working anymore. A massive share of that load is voice calls. People want to talk to someone. And so for them, to be able to automate more and more, and to help people self-serve and, in a way, debug their subscription, was pretty big. 
And so we've been working with T-Mobile pretty closely for the past year at this point to basically automate not only text support but also voice support. And so today there are features in the T-Mobile app where, if you call, it's actually handled by OpenAI models behind the scenes, and it does sound super natural, human-sounding, latency- and quality-wise. So that one was really fun. A second one which is really... Just on that, can I ask you a follow-up question? So we've got text models, we've got voice models, maybe even video models someday that are deployed at T-Mobile. But what above the models, or adjacent to the models, might we have helped T-Mobile with, for example? Yeah, there is a ton we are doing. The first one is, you have to put yourself in the shoes of an enterprise buyer. Their goal is to automate or reduce, optimize customer support. And going from a model, like tokens in, tokens out, to that use case, it's hard. And so first there's a lot of design, system design. We do have actually now forward deployed engineers who are helping us quite a bit. Forward deployed engineers, yeah, we borrowed the term from Palantir. Yeah, it's a great term. Were you an FDE at Palantir? I was not an FDE, I was on, I think they called it, the dev side. Right, it's like software engineering. I was also only an intern at Palantir, but yeah, it's a great term. I think it accurately describes what we're asking folks to do, which is embed very deeply with customers and honestly build things specific to their systems; they're deployed onto these customers. But yeah, we are obviously growing and hiring that team quite a bit because they've been very effective. I was an FDE, four years of my life. Yeah, forward deployed. But go ahead. So forward deployed engineering, forward deployed engineers, and the sort of systems and integrations they're doing: first you have to orchestrate. Those models on their own know nothing about the CRM and what's going on. 
And so you have to plug the model into many, many different tools. Many of those tools in the enterprise do not even have APIs or clean interfaces. Right. It's the first time they'll be exposed to a third party. And so there is a lot of standing up API gateways, connecting tools. Then you have to essentially define what good looks like. Again, it's a pretty new exercise for everyone. Defining a golden set of evals is harder than it sounds. And so we have been spending a bunch of time with them. Evals are important. Evals are super important, especially audio evals. I know audio evals are extra hard to grade and get right, but the bulk of the use case here is actually audio, and we have, I don't know, a five-minute call transcript. How do you actually know that the right thing happened? It's a pretty tough problem. Yeah, it's pretty tough. And then actually nailing down the quality of the customer experience until it feels natural. And here latency and interruptions play a really important part. We shipped the Realtime API in GA, I think it was last week, which is a beautiful work of engineering. There was a really cracked team behind the scenes, and it basically allows us to get the most natural-sounding voice experience without having these weird interruptions or lag where you can feel that the thing is off. So yeah, coupling all that together, and you get a really good experience. Yeah, that's a lot more than just models. Yeah, I was going to say, one actually really great thing that I think we've gotten from the T-Mobile experience is working with them to improve our models themselves. So for example, with the Realtime GA last week, we obviously released a new snapshot, the GA snapshot, and a lot of the improvements that we actually got into the model came out of the learnings that we had from T-Mobile. It brings in a lot of other changes from other customers too. 
But because we were so deeply embedded into T-Mobile and we were able to understand what good looks like for them, we were able to bring that to some of our models. That makes sense. So this is a large customer with tens of millions of users, if not hundreds of millions. And the before and after is on the support side, both tech support internally and then their customer support. Yeah, makes sense. Yeah. Is there another one that you guys can share? I like Amgen a lot. Amgen, the healthcare business. Amgen, yeah. So we are working quite a bit with healthcare companies. Amgen is one of the leading healthcare companies, specialized in drugs for cancer or inflammatory diseases. They're based out of LA, and we've been working with Amgen to essentially speed up the drug development process. So the North Star is pretty bold, and it's really interesting. Similarly, we embedded pretty deeply with Amgen to understand their needs. And it's really interesting: when I look at those healthcare companies, I feel like there are two big buckets of needs. One is pure R&D. You're seeing a massive amount of data, and you have super smart scientists who are trying to combine and test out things. So that's one bucket. A second bucket is much more common across other industries. It's pure admin, document authoring, document reviewing work. By the time your R&D team has essentially locked the recipe of a medication, getting that medication to market is a ton of work. Like, you have to submit to various regulatory bodies, get a ton of reviews. And, you know, when we looked at those problems, and what we knew models were capable of, we saw a ton of benefits, a ton of opportunities to automate and augment the work of those teams. And so, yeah, Amgen has been a top customer of GPT-5, for instance. Wow. 
I mean, this could be hundreds of millions of lives if a new drug is developed faster. Yeah, exactly. Huge impact. So that's, I think, one good example of the kind of impact where you need to enable enterprises to do it. And so I think we're going to do more and more of those. And yeah, frankly, on a personal level, it's a delight. If I can play a tiny role in essentially doubling the kind of medication that people get in the real world, that feels like a pretty good achievement. Huge. Huge. I know you had one. Yeah. So one of my favorite deployments that we've done more recently actually is with Los Alamos National Laboratory. So this is the national research lab that the US government is running in Los Alamos, New Mexico. It's also where the Manhattan Project happened back in the 40s and 50s, back when it was a secret project. So after that they ended up formalizing it as a city and a program, and now it's a pretty sizable national laboratory. This one is very interesting because, one, the depth of impact here is unimaginable for me. It's on the scale of Amgen and some of these other larger companies. But obviously they're doing a lot of actual new research there, so a lot of new science. They're doing a lot of stuff with our Defense Department, and defense use cases as well. So very intense, very intense stuff. But the other thing that's actually very interesting about this one was that it's also a story of a very bespoke and new type of deployment that we've done. So because they're a government lab, they're so restrictive and high-security and high-clearance with a lot of their things, we couldn't just do a normal deployment with them. You can't have people doing national security research just hitting our APIs. And so we actually did a custom on-prem deployment with them onto one of their supercomputers, called Venado. 
And so this actually involved a bunch of very bespoke work with some FDEs, and also with a lot of our developer team, to actually bring one of our reasoning models, o3, into their laboratory, into an air-gapped supercomputer, Venado, and actually deploy it and get it installed to work on their hardware, on their networking stack, and actually run it in this particular environment. And so it was actually very interesting because we literally had to bring the weights of the model physically into their supercomputer, in an environment, by the way, where you're not allowed to have... it's very locked down, for good reason. You're not allowed to have cell phones or any other electronics with you as well. And so I think that was a very unique challenge. And then the other interesting thing about this deployment is just how it's being used, right? So the interesting thing is, because it's so locked down and on-prem, we actually do not have much visibility into exactly what they're doing with it. But we do have... They give us feedback. Yeah, yeah, they actually do have some telemetry, but it's within their own systems. But we do know that it's being used for a bunch of different things. It's being used for aiding them in terms of speeding up their experiments. They have a lot of data analysis use cases, a lot of notebooks that they're running with reams of data that they're trying to process. They're actually using it as a thought partner, which is something that's pretty interesting to me. o3 is pretty smart as a model, and a lot of these people are tackling really tough, novel research problems. And a lot of times they're kind of using o3 and going back and forth with it on their experiment design, on what they actually should be using it for, which is something that we couldn't really say about our older models. So yeah, it's just being used for a lot of different use cases for the national lab. 
And the other cool thing is it's actually being shared between Los Alamos and some of the other labs, Lawrence Livermore and Sandia as well, because it's a supercomputer setup where they can all kind of connect with it remotely. Fascinating. I mean, we've just gone through three pretty large-scale enterprise deployments, right, which might touch tens, if not hundreds of millions of people. But on the other side of this is the MIT report that came out a couple of weeks ago: 95% of AI deployments don't work. A bunch of scary headlines that even shook the markets for a couple of days. Put this in perspective. For every deployment that works, there's presumably a bunch that don't work. So maybe we can talk about that. What does it take to build a successful enterprise deployment, a successful customer deployment, and the counterfactual, based on all your experience serving all these large enterprises? At this point I may have worked with a couple hundred enterprises, I think. So, okay, I'm going to pattern-match on what I've seen being a clear leading indicator of success. Number one is the interesting combination of top-down buy-in and enabling a very clear tiger team within the enterprise, which is sometimes a mix of OpenAI and enterprise employees. So typically, you take T-Mobile: the top leadership was extremely bought in. It's a priority. But then letting the team organize and be like, okay, if you want to start small, start small, and then we can scale it up. So that would be part number one: top-down buy-in and a bottoms-up, call it, tiger team. A tiger team of people, a mix of technical skills and people who just have the organizational knowledge, the institutional knowledge. It's really funny in the enterprise; customer support is a good example. What we found is that the vast majority of the knowledge is in people's heads, which is probably a thing we see with FDEs in general. 
But you take customer support: you would think that everything is perfectly documented, Jira, et cetera. The reality is the standard operating procedures, the SOPs, are largely in people's heads. And so unless you have that tiger team mix of technical and subject matter experts, it's really hard to get something off the ground. That would be one. Two would be evals-first. Whenever we define good evals, that gives a clear common goal for people to hit. Whenever the customer fails to come up with good evals, it's a moving target; you never know if you made it or not. And, you know, evals are much harder to get done than it looks. And evals also oftentimes need to come bottom-up, right? Because all of these things are kind of in people's heads, in the actual operators' heads. It's actually very hard to have a top-down mandate of, like, this is how the evals should look. A lot of it needs the bottoms-up adoption, right? Yeah, yeah. And so we've been building quite a bit of tooling on evals. We have an evals product, and we're working on more to essentially solve that problem or make it as easy as we can. The last thing is you want to hill-climb. Essentially, you have your evals. The goal is to get to 99%. You start at 46. How do you get there? And here, frankly, I think it's oftentimes a mix of, I will say, almost wisdom from people who've done it before. A lot of that is art, sometimes more than science: knowing the quirks of the model, the behavior. Sometimes we even need to fine-tune the models ourselves when there are some clear limitations. And being patient, getting your way up there. And then ship. Can we go under the hood a little bit? One of the things that we think about a lot is autonomy. More broadly, what is the makeup of autonomy? On one side, in San Francisco, you could take a car from one part of SF to the other fully autonomously. No humans involved. You just press a button. 
A lot of the Waymos... they've done billions of rides. I think it was, what, three and a half billion miles on Tesla FSD? I think Waymo's done like tens of millions of rides. That's a lot of autonomy in the physical world, as opposed to the digital world: I can't book a ticket online right now. There are all sorts of problems that happen if I have Operator try to book a ticket. And it's very counterintuitive, because the bar for physical safety is so much higher. The bar for physical safety is higher than the human's capability, because lives are at stake. Yeah. The bar for digital safety is not that high, because all you're going to lose is money; nobody's life is at stake. But yet physical autonomy is ahead of digital autonomy in 2025. It seems counterintuitive. Why is that the case at a technical level? Why is it that what should sound easier is actually a lot harder? Yeah. So I think there are kind of two things at play here. And I really like the analogy with self-driving cars, because they've actually been one of the best applications of AI, I think, that I've used recently. But I think there are two things in play. One of them is honestly just the timelines. We've been working on self-driving cars for so long. That's right. I remember back in 2014, it was kind of like the advent of this, and everyone was like, oh, it's happening in five years. Turns out it took, I don't know, 10, 15 years or so for this to happen. So there's been a long time for this technology to really mature, and I think there were probably dark ages back in 2015 or 2018 or something, where it felt like it wasn't going to happen. Trough of disillusionment. Yes, yes. And then now we're finally seeing it get deployed, which is really exciting. But it has been, I don't know, 10 years, maybe even 20 years from the very beginning of the research. Whereas I think AI agents are really in day one here. ChatGPT only came out in 2022, so less than three years ago. 
But I actually think what we think about with AI agents and all that really started with the reasoning paradigm, when we released the o1-preview model back late last year, I think. And so I actually think this whole reasoning paradigm with AI agents, and the robustness that those bring, has only really unfolded for like a year, less than a year really. And so I know you had a chart in your blog post, which I really like, where the slope is very meaningfully different now. Self-driving started very, very early and its slope seems to be a little bit slower, but now it's reaching the promised land. But man, we started super recently with AI agents, and the slope I think is incredibly steep, and we'll probably see it cross over at some point. But we really have only had like a year to explore these. Do you think we haven't crossed over already, when you look at the coding work in particular? Yeah, it's a good point. Your chart actually shows AI agents as below self-driving. But what is the Y axis? By some measures, I would not be surprised if AI products, or AI agent products, are making more revenue than Waymo at this point. Waymo is making a lot, but just look at all the startups coming up. Look at ChatGPT and how many subscriptions are happening there, and all of that. And so maybe we have actually crossed, and a couple of years from now it's going to look very, very different. Yeah, the Y axis is tangible, felt autonomy. Perfectly objective: how do I feel? Vibes more than revenue. But revenue is a good one. We should probably redo that with revenue. There's a second thing I wanted to mention on this as well, which is the scaffolding and the environment in which these things operate. So I actually remember, in the early days of self-driving, a lot of the researchers around self-driving were saying that the roads themselves would have to change to accommodate self-driving. 
There might be sensors everywhere so that the self-driving cars can interact with them, which I think in retrospect is overkill. But I actually do think self-driving cars have a good amount of scaffolding in the world for them to operate in. It's not completely unlimited. You have roads, roads exist, they're pretty standardized. You have stoplights. People generally operate in pretty normal ways, and there are all these traffic laws that you can learn. Whereas AI agents are just kind of dropped in the middle of nowhere, and they kind of have to feel around for themselves. And I actually think, going off of what Olivier just said too, my hunch is some of the enterprise deployments that don't actually work out likely don't have the scaffolding or infrastructure for these agents to interact with. A lot of the really successful deployments that we've made, a lot of what our FDEs end up doing with some of these customers, is to create almost like a platform or some type of scaffolding, connectors, organizing the data, so that the models have something that they can interact with in a more standardized way. And so my sense is self-driving cars actually have had this to some degree with roads over the course of their deployment. But I actually think it's still very early with AI agents, and I would not be surprised if a lot of enterprises, a lot of companies, just don't really have the scaffolding ready. So if you drop an AI agent in there, it kind of doesn't really know what to do, and its impact will be limited. And so I think once this scaffolding gets built out across some of these companies, the deployments will also speed up. But again, to our point earlier, there's no slowdown; things are still moving very fast. That's great. Well, you know, I've thought about autonomy as a three-part structure. You've got perception, you've got the reasoning, the brain, and then you've got the, call it, scaffolding. 
The last mile of making things work. Maybe we can dive into the second part, which is the reasoning, which is the juice that you guys are building, with GPT-5 most recently. Huge endeavor. Congrats. The first time you guys have launched a full system, not a model or a set of models, but a full system. Talk about that. I mean, the full arc of that development. What was your focus? I mean, honestly, the benchmarks all seem so saturated. Clearly it was more than just benchmarks that you were focused on. And so what was the North Star? Tell us about GPT-5, soup to nuts. It's been a labor of love of many people for a long time. And to your point, I think GPT-5 is amazingly intelligent. You look at the benchmarks, like SWE-bench and the likes of it, going pretty high. But I think, to me, equally important and impactful was, I would say, the craft: the style, the tone, the behavior of the model. So, you know, capabilities, intelligence, and behavior of the model. On the behavior of the model, I think it's the first large model release for which we have worked so closely with a bunch of customers, for months and months, to better understand what are the concrete unlocks, what are the concrete blockers of the model. And often it's not about having a model which is way more intelligent; it's a model which is faster, a model that better follows instructions, a model which is more likely to say no when it doesn't know about something. And so that super close customer feedback loop on GPT-5 was pretty impressive to see. And I think, with all the love that GPT-5 has been getting in the past couple of weeks, the builders are starting to feel that. And once you see it, it's really hard to come back to a model which is extremely intelligent, but in an exclusively academic way. Are there trade-offs that you made as you were going through it? What are the hardest trade-offs you made as you were building GPT-5? 
I actually think a very clear trade-off, which I honestly think we are still iterating on, is the trade-off between the reasoning tokens, how long it thinks, versus performance. And honestly, this is something that I think we've been working on with our customers since the launch of the reasoning models, which is: these models are so, so smart, especially if you give them all this thinking time. I think the feedback I've been seeing around GPT-5 Pro has been pretty crazy too. Andrej had a great tweet last night. Yeah, I saw that. Sam retweeted it. But these unsolved problems that none of the other models could handle, you throw them at GPT-5 Pro and it just one-shots them. It is pretty crazy, but the trade-off here is you're waiting for 10 minutes. It's quite a long time. And so these things just get so smart with more inference time. But on the product builder side, on the API side, for some of these business use cases, I think it's pretty tough to manage that trade-off. And for us it's been difficult to figure out where we want to fall on that spectrum. So we've had to make some trade-offs on how much the model should think versus how intelligent it should get, because as a product builder, there's a real latency trade-off that you have to deal with, where your user might not be happy waiting 10 minutes for the best answer in the world. They might be more okay with a substandard answer and no wait at all. Yeah, I mean, even between GPT-5 and GPT-5 Thinking, I have to toggle it now, because sometimes I'm so impatient. I just want it ASAP. Yeah, I think there's the ability to skip, right? And I'm impatient. I just want the simpler answer. That's right. That's right. Well, four weeks into GPT-5, how's the feedback? Yeah, I think feedback has been very positive, especially on the platform side, which has been really great to see. I think a lot of the things that Olivier mentioned have come up in feedback from customers. 
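The latency-versus-intelligence trade-off described above is, on the API side, essentially a per-request routing decision: how much thinking to ask for on each call. A minimal sketch of that decision; the complexity heuristic, the thresholds, and the effort labels here are all hypothetical illustrations, not anything OpenAI ships:

```python
# Sketch of routing requests by how much "thinking" they likely need. Easy
# requests get minimal effort (low latency, few reasoning tokens); hard ones
# accept longer thinking for a better answer. The scoring rule is made up
# purely to illustrate the trade-off discussed in the conversation.

def pick_reasoning_effort(prompt: str) -> str:
    """Pick a reasoning-effort level from a crude complexity score."""
    signals = ("prove", "debug", "optimize", "multi-step", "analyze")
    # Longer prompts and "hard task" keywords push the score up.
    score = len(prompt) / 500 + sum(word in prompt.lower() for word in signals)
    if score < 1:
        return "minimal"   # quick answers: user is waiting
    elif score < 2:
        return "medium"
    return "high"          # hard problems: accept minutes of thinking

print(pick_reasoning_effort("What's our refund policy?"))  # minimal
print(pick_reasoning_effort("Debug and optimize this multi-step plan: ..."))
```

In a real integration, the chosen label would map to whatever reasoning-effort control the model exposes; the point is that thinking time can be chosen per request rather than fixed globally.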
The model is extremely good at coding, extremely good at reasoning through different tasks, but especially for coding use cases, especially, you know, when it thinks for a while, it'll usually solve problems that no other models can solve. So I think that's been a big positive point of feedback. The kind of robustness and the reduction in hallucinations has been a really big positive feedback. Yeah, yeah, yeah. I think there was an eval that showed that hallucinations basically went to zero for a lot of this. It's not perfect. There's still a lot of work to be done. But that's a big one, I think because of the reasoning in there too. It just makes the model more likely to say no, less likely to hallucinate an answer. So that's been something that people have really liked as well. The other bit of feedback has been around instruction following. So it's really good at instruction following. This almost bleeds into the constructive feedback that we're working on, where it's so good at instruction following that people need to tweak their prompts, or it's almost too literal. That one is an interesting trade-off actually, because when you ask people, developers, what do you want? You want the model to follow instructions, of course, but once you have a model that is extremely literal, it essentially forces you to express extremely clearly what you want, otherwise the model may go sideways. And so that one was interesting feedback. It's almost like the monkey's paw, where developers and platform customers asked for better instruction following. So yes, we'll give you really good instruction following, but it follows it almost to a T. And so it's obviously something that the team is actually working through. I think a good example of this, by the way, is some customers would have these prompts. I remember when we were testing GPT-5, one piece of negative feedback that we got was that the model was too concise. We were like, what's going on?
Why is the model so concise? Interesting. And then we realized it was because they were reusing their old prompts from other models. And with the other models, you have to really beg the model to be concise. So there are ten lines of: be concise. Really be concise. Also keep your answer short. And it turns out when you give that to GPT-5, it's like, oh my gosh, this person really wants it to be concise. And so the response would be like one sentence, which is too terse. And so just by removing the extra prompts around being concise, the model behaved in a much better way and much closer to what they actually ended up wanting. Yeah. Turns out writing the right prompt is still important. Yes, yes. Yeah, prompt engineering is still very, very important. On constructive feedback for GPT-5, there's actually been a good amount as well, which we're all working through. One of them that I'm really excited for the next snapshot to come out to fix is code quality and small code paradigms or idioms that it might use. I think there was feedback around the types of code and the patterns it was using, which I think we're working through as well. And then the other bit of feedback, which I think we've already made good progress on internally, is around the trade-off of the reasoning tokens and thinking and latency versus intelligence. I think, especially for these simpler problems, you don't usually need a lot of thinking. The thinking should ideally be a little bit more dynamic. And of course, we're always trying to squeeze as much reasoning and performance into as few reasoning tokens as possible. So I'd imagine that curve kind of going down as well. Yeah, well, huge congrats. I mean, it's been. I know it's a work in motion for a bunch of our companies. They've had incredible outcomes with GPT-5. One of them is Expo, a cybersecurity business. Just like a. Yeah, I saw the chart from that. It was pretty crazy.
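For API builders, the thinking-time trade-off discussed above shows up as a request-level knob. A minimal sketch, assuming a `reasoning.effort`-style field like the one in OpenAI's Responses API; the `build_request` helper is hypothetical and only assembles the payload, it does not call the API:

```python
# Sketch: trading thinking time for latency with a reasoning-effort knob.
# Hypothetical helper around a Responses-API-style request shape; the exact
# parameter names in the real SDK may differ.

def build_request(prompt: str, effort: str) -> dict:
    """Assemble a request payload; lower effort favors latency, higher favors depth."""
    assert effort in {"minimal", "low", "medium", "high"}
    return {
        "model": "gpt-5",
        "reasoning": {"effort": effort},
        "input": prompt,
    }

# A latency-sensitive support bot might default to low effort...
fast = build_request("Summarize this ticket in one line.", "low")
# ...while an offline research task can afford long thinking.
deep = build_request("Find the flaw in this proof.", "high")

print(fast["reasoning"]["effort"], deep["reasoning"]["effort"])
```

The point of the knob is that the same model serves both ends of the spectrum: the product builder, not the model, decides whether a ten-minute "Pro-style" answer is worth the wait.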
Huge, huge upgrade from whatever they were using prior to that. And I think they're going to need a new eval soon. That's right, they're going to need a new eval. It's all about evals. On the multimodality side of it, obviously you guys announced the Realtime API last week. I saw T-Mobile was one of the featured customers on there. Talk about that, how obviously the text models are leading the pack, but then we've got audio and we've got video. Talk about the progress on the multimodal models. When should we expect the next big unlock, and what would that look like? It's a good question. The teams have been making amazing progress on multimodality: on voice, image, video. Frankly, the last generation models have been unlocking quite a few cool use cases. One piece of feedback that we've received is, because text was so far ahead on intelligence, people felt, in particular on voice, that the model was somewhat, a little less intelligent. And until you actually see it, it does feel weird to have a better answer on text versus voice. And so that's pretty much a focus that we have at the moment. I think we've filled part of that gap, but not the full gap for sure. So catching up with text would be one. A second one, which is absolutely fascinating, is the model is excellent at the moment at easy casual conversation. Talk to your coach, your therapist. And we basically had to teach the model to speak better in actual work, economically valuable setups. Give an example. The model has to be able to understand what an SSN is and what it means to spell an SSN. And if one digit is actually fuzzy, it should ask you to repeat versus guess. There are lots of intuitions like that that anyone of course has about voice that we are currently teaching the model, and that's an ongoing work actually with our customers.
Until we actually confront the model with actual customer support calls, actual sales calls, it's really hard to get a feel for those gaps. So that's a top priority as well. This is completely off script, but an interesting question that comes up in voice models, particularly the Realtime API: previously, people would take a speech input, convert that to text, then have a layer of intelligence, then you would have a text-to-speech model that would sort of play it back. Yeah, and it would be a stitch of these three parts. But with the Realtime API you guys have integrated all of that? Yes. And how does it happen? Because a lot of the logic is written in text. A lot of the Boolean logic or any function calling is written in text. How does it work with the Realtime API? That's an excellent question. So the reason why we shipped the Realtime API is that we saw issues with the stitched model. The stitched model? Yeah, like a stitch-together: speech-to-text, thinking, text-to-speech. So we saw a couple of issues. One, slowness, like, you know, hops, essentially. Two, loss of signal across each model. Like the speech-to-text model is less intelligent. Yeah, you'd lose emotion, you'd lose the accent. Exactly. The pauses. And when you are doing actual voice phone calls, those signals are so important for this end-to-end system. One of the challenges that we have is what you mentioned, which is it means a slightly different architecture for text versus voice. And so that's something that we are actively working on. But I think it was the right call to start essentially with: let's make the voice experience natural-sounding to a point where you're feeling comfortable putting it in production, and then work backward to unify the orchestration logic across modalities. To be clear, a lot of customers still stitch these together. It's kind of what worked in the last generation.
But what we're increasingly seeing is more and more customers moving towards the Realtime approach because of how natural it sounds, how much lower latency it is, especially as we up-level the intelligence of the model. But also, even taking a step back, I will say it's pretty mind-blowing to me that it works. The fact that, I think it's mind-blowing that these LLMs work at all, where you just train it on a bunch of text and it's just autoregressively coming up with the next token and it sounds super intelligent. That's mind-blowing in and of itself. But I think it's actually even more mind-blowing that this speech-to-speech setup actually works correctly, because you're literally taking the audio bits from someone speaking, putting them into the model, and then it's generating audio bits back. And so to me, it's actually crazy that this works at all. It's pretty crazy, let alone the fact that it can understand accents and tone and pauses and things like that and then also be intelligent enough to handle a support call or something like that. I mean, if you've gone from text in, text out to voice in, voice out, that's pretty crazy. We have a bunch of companies in our portfolio that are using these models. Parloa on the customer support side, LiveKit on the infra side, and there's a bunch of use cases we are starting to see that a speech-to-speech model could address. Obviously a lot of the harder ones are still running on what you're calling the stitched model. But I hope the day is not far when it's all on the Realtime API. It's going to happen at some point. Right. And actually maybe that's a good segue into talking about model customization, because I suspect that you have such a wide variety of enterprise customers. I think you mentioned what, hundreds of customers or maybe more. Each of them has a different use case, a different problem set, a different, call it, envelope of parameters that they're working in: maybe latency, maybe power, maybe others.
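The stitched-pipeline versus speech-to-speech contrast the speakers describe can be sketched structurally. Everything below is an illustrative stub, not a real SDK: in production, each stage would call a transcription model, an LLM, and a TTS model respectively, which is where the three latency hops and the signal loss come from.

```python
# Sketch of the "stitched" voice pipeline vs. a direct speech-to-speech model.
# All function bodies are stand-in stubs.

def speech_to_text(audio: bytes) -> str:
    # Hop 1: transcription. Tone, accent, and pauses are lost here,
    # because only the words survive into the text channel.
    return "what is my account balance"

def think(text: str) -> str:
    # Hop 2: the text LLM, where function calling / business logic lives.
    return f"answer({text})"

def text_to_speech(text: str) -> bytes:
    # Hop 3: synthesis, which has to re-invent prosody from plain text.
    return text.encode()

def stitched_pipeline(audio: bytes) -> bytes:
    # Three model calls in series: three latency hops, lossy text in the middle.
    return text_to_speech(think(speech_to_text(audio)))

def speech_to_speech(audio: bytes) -> bytes:
    # One model consumes audio tokens and emits audio tokens directly,
    # preserving emotion/accent/pauses. Modeled here as a single hop.
    return b"audio-out"
```

The open question raised in the conversation, where the Boolean logic and function calling live when there is no text stage in the middle, is exactly why the orchestration logic still has to be unified across the two architectures.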
How do you handle that? Talk about what OpenAI offers enterprises who need a customized version of a great model to make it great for them. Yeah, so model customization has actually been something that we've invested very deeply in on the API platform since the very beginning. So even pre-ChatGPT days, we actually had a supervised fine-tuning API available and people were actually using it to great effect. Model customization obviously resonates quite well with customers, because they want to be able to bring in their own custom data and create their own custom version of o3 or o4-mini or even GPT-5, suited to their own needs. It's very attractive. But the most recent development, which I think is very exciting, has been the introduction of reinforcement fine-tuning, something we announced late last year, I think in the 12 Days of Christmas. We've GA'd it since and we're continuing to iterate on it. What is it? Break it down for us. Yeah, so it's called. It's actually funny, I think we made up the term reinforcement fine-tuning. It was not a real thing until we announced it. It's stuck now. It's on Twitter all the time. I remember we were discussing it and I was like, I don't know about that one. Yeah, so reinforcement fine-tuning, it's introducing reinforcement learning into the fine-tuning process. So the original fine-tuning API does something called supervised fine-tuning, call it SFT. It is not using reinforcement learning, it's using supervised learning. Supervised learning, yeah. So what that usually means is you need a bunch of data, a bunch of prompt-completion pairs. You need to really supervise and tell the model exactly how it should be acting. And then when you train it on our fine-tuning API, it moves it closer in that direction. Reinforcement fine-tuning introduces RL, or reinforcement learning, to this loop.
Way more complex, way more finicky, but an order of magnitude more powerful. And so that's actually what's really resonated with a lot of our customers. If you use RFT, the discussion is less about creating a custom model that's specific to your own use case. You can actually use your own data and crank the RL. Yeah, turn the crank on RL to actually create a best-in-class model for your own particular use case. That's the main difference here. With RFT, the data set looks a little bit different. Instead of prompt-completion pairs, you really need a set of tasks that are very gradable. You need a grader that is very objective that you can use here as well. That's actually been something that we've invested a lot in over the last year. We've actually seen a good number of customers get really good results on this. We've talked about a couple of them across different verticals. So Rogo, which is a startup in the financial services space, they have a very sophisticated AI team. I think they hired some folks from DeepMind to run their AI program, and they've been using RFT to get best-in-class results on parsing through financial documents, answering questions around them and doing tasks around that as well. There's another startup called Accordance that's doing this in the tax space. I think they've been targeting an eval called TaxBench, which looks at CPA-style tasks as well. Because they're able to turn it into a very gradable setup, they're actually able to turn the RFT crank and also get, I think, SOTA results on TaxBench just using our RFT product as well. And so it has kind of shifted the discussion away from just customizing something for your own use case to really leveraging your own data to create a best-in-class, maybe best-in-the-world model for something that you care about for your business.
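The data-shape difference between SFT and RFT described above can be sketched like this. The record fields and the `string_check_grader` helper are illustrative only, not the actual API schema; the point is that SFT supplies the answer, while RFT supplies a task plus an objective way to score whatever the model produces.

```python
# Sketch: SFT data (prompt/completion pairs) vs. RFT data (gradable tasks
# plus an objective grader). Record shapes are illustrative assumptions.

# SFT: you show the model exactly what to output for a given input.
sft_example = {
    "messages": [
        {"role": "user", "content": "Classify: 'please refund my order'"},
        {"role": "assistant", "content": "billing"},  # supervised target
    ]
}

# RFT: you define a task and a reference the grader can score against.
rft_task = {
    "prompt": "What tax form reports partnership income to partners?",
    "reference": "Schedule K-1",  # seen by the grader, not by the model
}

def string_check_grader(model_output: str, task: dict) -> float:
    """Objective 0..1 reward: did the model hit the reference answer?"""
    return 1.0 if task["reference"].lower() in model_output.lower() else 0.0

print(string_check_grader("That would be Schedule K-1.", rft_task))  # 1.0
```

In real RFT runs the grader is the crux: because RL optimizes whatever signal the grader emits, it has to be objective and hard to game, which is why crafting these tasks usually takes the subject-matter experts the speakers mention.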
Yeah, I feel like the base models are getting so good at instruction following that for behavior steering, you don't need to fine-tune at that point. You can describe what you want and the model is pretty good at it. But for pushing the frontier on actual capabilities, my hunch is that RFT will pretty much become the norm. If you are actually pushing intelligence in your field to a pretty high point, at some point you need to RL, essentially, with custom environments. Fascinating. And even going back to the point earlier around top-down versus bottoms-up for some of these enterprises: a lot of the data that you end up needing for RFT requires very intricate knowledge about the exact task that you're doing and understanding how to grade it. A lot of that actually comes from bottoms-up. I know a lot of these startups will work with experts in their field to try and get the right tasks and get the right feedback to craft some of these data sets. Without further ado, we're going to jump into my favorite section, which is the rapid-fire questions. We had a lot of great friends of ours send in some questions for you guys. But we'll start with Altimeter's favorite game, which is the long-short game. Pick a business, an idea, a startup that you're long, and the same short, that you would bet against, that there's more hype than there's reality. Whoever's ready to go first. Long, short. My long is actually not in the AI space, so this is going to be slightly different. There we go. My short is, though, in the AI space. So I'm actually extremely long esports. And so what I mean by esports is the entire professional gaming industry that's emerging around video games, very near and dear to my heart. I play a lot of video games and so I watch a lot of this. So obviously I'm pretty in the weeds on this, but I actually think there's incredible untapped potential in esports and incredible growth to be had in this area.
So concretely, what I mean is, a really big one is League of Legends, all of the games that Riot Games puts out. They actually have their own professional leagues. They actually have professional tournaments, believe it or not. They rent out stadiums actually now. But I just think, if you look at what the youth and what younger kids are looking at and where their time is going, it's predominantly going towards these things. They spend a lot of time on video games. They watch more esports than soccer, basketball, etc. Yeah, a growing number of these too. I've actually been to some of these events and it's very interesting. He's very committed to. Yeah, yeah. I'm extremely long esports. And so they're booking out stadiums for people to go watch electronic sports. Yeah, yeah, yeah. I literally went to Oracle Arena, the old Warriors arena, to watch one of these, I think, before COVID. Wow. And then. So it's just pre-COVID. Wow. That's five years ago. That was a while ago. So I actually, I've been following this for a while, and I actually think it had a really big moment in COVID. Like everyone was playing video games, and I think it's kind of come back down. So I think it's undervalued, you know. I think no one's really appreciating it now, but it has all the elements to really, really take off. And so the youth are doing it. The other thing I'd say is it is huge in Asia, like absolutely massive in Asia. It is absolutely big in Korea and China as well. Like, you know, we rented out Oracle Arena, I think, or the event I went to was at Oracle Arena. My sense is in Asia they rent out the entire stadiums, like the soccer stadiums, and the players are already celebrities. So anyways, as, you know, I know Korean culture is really making its way into the US as well, I think that's another tailwind for this whole thing.
But anyways, esports I think is something you should keep an eye on because there's a lot of room for growth. Very unexpected. Yeah, good to hear. Short. My short. My short's a little spicy, which is I'm short on the entire category of tooling around AI products. And so this encapsulates a lot of different things. It's kind of cheating because some of these I think are starting to play out already. But two years ago it was maybe evals products or frameworks or vector stores. I'm pretty short those. I think nowadays there's a lot of additional excitement around other tooling around AI models. So RL environments I think are really big right now as well. Unfortunately, I'm very short on those. I don't really see a lot of potential there. I see a lot of potential in reinforcement learning and applying it. But I think the startup space around RL environments is really tough. Main thing is, one, it's just a very competitive space. There are just a lot of people operating in it. And then two, if the last two years have shown us anything, the space is evolving so quickly and it's so difficult to try and adapt and understand what the exact stack is that will really carry through to the next generation of models. I think that just makes it very difficult when you're in the tooling space, because today's really hot framework or really hot tool might just not get used in the next generation of models. So I've been noticing the same pattern, which is the teams that build breakout startups in AI are extremely pragmatic. They're not super intellectual about the perfect world. And it's funny, because I feel like our generation has basically started in tech in a very stable moment, where technology had been building up for years and years with SaaS and cloud. And so we were in a way raised in that very stable moment, where it makes sense at that point to design very good abstractions and tooling because you have a sense of where it's going.
But it's so different today. No one quite knows what's going to happen in the next year or two. So it's almost impossible to define the perfect tooling platform. Right, right, right, right. Well, there's a lot of that going around right now. Yes. Spicy. A lot of homework there, Olivier. Over to you, sir. Long, short. I've been thinking a lot about education for the past month in the context of kids. I'm pretty short on any education which basically emphasizes human memorization at this point. And I say that having mostly been through that education myself. I learned so much on historical facts, legal things. Some of it does shape your way of thinking. A lot of it, frankly, is just knowledge tokens, essentially. And those knowledge tokens, it turns out LLMs are pretty good at them. So I'm quite short on that. That's right. You won't need memory when ChatGPT is bionic. You can just get it straight into your head. Exactly, exactly. What am I long on? Frankly, I think healthcare is probably the industry that will benefit the most from AI in the next year or two. I would say more. I think all the ingredients are here for a perfect storm. A huge amount of structured and unstructured data; it's basically the heart of the pharma companies. The models are excellent at digesting and processing that kind of data. A huge amount of admin-heavy, documents-heavy culture. But at the same time, companies which are very technical, very R&D-friendly companies whose technology in a way is at the heart of what they do. And so, yeah, I'm pretty bullish on healthcare. This is like life sciences. So you mean life sciences research organizations that are producing drugs? Gotcha. Exactly, yeah, gotcha. Yeah. It's almost like over the last 20, 30 years, these pharma or biotech companies have basically, if you look at the work that they're doing, only a small amount of it is actual research. And so much of it ends up being admin and documents and things like that.
And that area is just so ripe for something to happen with AI. And I think that's what we're seeing with Amgen and some of these other customers. Exactly. And it's also not what they want to do. I think it's good that we have some regulations there, obviously, but it just means that they have reams and reams of things to go through. And so when you have a technology that's able to really help bring down the cost of something like that, I think it'll just tear right through it. And I think at some point governments and institutions are going to realize that. If you step back, it is probably one of the biggest bottlenecks to human progress. Right. You step back: in the past decade, how many true breakthrough drugs have there been? Not that many. Like, you know how different life would be if you doubled that rate, essentially. So once you realize what's at stake. Yeah. My hunch is that we're going to see quite a bit of momentum in that space. Wow. All right, lots of homework there as well. Yeah. Next one. Favorite underrated AI tool other than ChatGPT, maybe. I love Granola. Oh, man, two votes for Granola. There is something like, yeah, hey, what about ChatGPT Record? I like ChatGPT Record as well. But there are some features of Granola which I think are really done well. Like the whole integration with your Google Calendar is excellent, and just the quality of the transcription and the summary is pretty good. Do you just have it on? Because I know your calendar is back to back. You just have Granola on. So the funny thing is that I don't use Granola internally. I use Granola for my personal life, mostly. I see. Yeah, I see, on dates. I'm joking. I'll say, yeah, Granola was actually going to be mine. So two votes for Granola. I was going to say, the easy answer for me is Codex. As a software engineer, it's just like. So it's gotten so good recently. Codex CLI, especially with GPT-5. Especially for me, I tend to be less time sensitive about the iteration loop with coding.
And so leaning into GPT-5 on Codex I think has been really, really. What about Codex has changed? Because Codex has also been through a journey. Codex has been around for a bit. I remember it's been launched for more than, over a year ago. What's changed about Codex? Actually, I feel like it's been less than a year for Codex. A few months, I would say. The time dilation is so crazy in this field. It feels like it's been around since GPT-4o. Like, you know, that demo. That feels like ages ago. o1 hadn't even come out yet. 12 Days of Christmas hadn't happened yet. The voice demo. It's a naming thing. Okay. But anyway, yeah. Oh, there was a Codex model. That's what I'm thinking about. There was a Codex model. We are. Yeah, we are. You're not to blame for that confusion. Also, I think the GitHub thing was called Codex. That's right. Yes, yes, that's right. But I'm talking about our coding product within ChatGPT, which is the Codex cloud offering and then also Codex CLI. So actually maybe if I were to narrow my answer a little bit more to Codex CLI, which I've really, really liked. I like the local environment setup. The thing that's actually made it really useful in the last, I'd say, month or so is, one, I think the team has done a really good job of just getting rid of all the paper cuts, the small product polish and paper cut things. It kind of feels like a joy to use now. It feels more reactive. And then the second thing, honestly, is GPT-5. I just think GPT-5 really allows the product to shine. At the end of the day, this is a product that really is dependent on the underlying model.
And when you have to iterate and go back and forth with the model four or five times to get it right, to get it to do the change that you want, versus having it think a little bit longer and it just one-shots it and does exactly what you want, you get this weird bionic feeling where you're like, I feel so mind-melded with the model right now, and it perfectly understands what I'm doing. So getting that dopamine hit and feedback loop constantly with Codex has made it an indispensable thing that I really, really like. Nice. The other thing I'd say Codex is just really good at for me is I use it for personal projects. I also use it to help me understand codebases. As an engineering manager now, I'm not as in the weeds on the actual code, and so you're actually able to use Codex to really understand what's happening with the codebase, ask it questions and have it answer things and really catch you up to speed. So even the non-coding use cases are really useful with Codex CLI. Fascinating. Sam had this tweet about Codex usage ripping, I think yesterday, so I wonder what's going on there. But you're not alone. Yeah, I think I'm not alone. Just judging from the Twitter feedback, I think people are really realizing how great of a combination Codex CLI and GPT-5 are. Yeah, I know that team is undergoing a lot of scaling challenges, but I mean, the system hasn't gone down for me, so props to them. But we are in a GPU crunch, so we'll see how long that goes. Awesome. Awesome. All right, the next one. Will there be more software engineers in 10 years or fewer? There are about 40, 50 million professional software engineers. That's what you mean, like full time, like actual jobs? Yeah, yeah. Because it's a hard one. Because I think without a doubt there's going to be a lot more software engineering going on. Yes, of course. There's actually a really great post that was shared, I think in our internal Slack.
It was like a Reddit post recently, I actually think, that highlights this. It was a really touching story. It was a Reddit post about someone who has a brother who's nonverbal. I actually don't know if you saw this. It was just posted. A person on Reddit posted, they have a nonverbal brother who they have to take care of. They tried all these types of things to help the brother interact with the world, use computers. But vision tracking didn't work because I think his vision wasn't good. All the tools didn't work. And then this brother ended up using ChatGPT. I don't think he used Codex, but he used ChatGPT and basically taught himself how to create a set of tools that were tailor-made to his nonverbal brother. Basically a custom software application just for them. And because of that, he now has a custom setup that was written by his brother and allows him to browse the Internet. I think the video was him watching the Simpsons or something like that, which is really touching. But I think that's actually what we'll see a lot more of. This guy's not a professional software engineer. His title is not software engineer, but he did a lot of software engineering. Probably pretty good. Good enough, definitely, for his brother to use. So the amount of code, the amount of building that will happen, I think, is just going to go through an incredible transformation. I'm not sure what that means for software engineers like myself. Maybe there's, you know, the equivalent, or maybe there's of course more Sherwins. Yeah, more of me. More of me, specifically. Way more of you, but definitely a lot more software engineering and a lot of coding. I buy that completely. I completely buy the thesis that there is a massive software shortage in the world. We've been sort of accepting it, you know, for the past 20 years. But the goal of software was never to be that super rigid, super hard to build, you know, artifact. It was to be customized, malleable.
And so I expect that we'll see way more of a reconfiguration of people's jobs and skillsets, where way more people code. I expect that product managers are going to code more and more. For instance, you made your PMs code recently, if you remember. Oh yeah, we did that. That was really fun. We started essentially not doing PRDs, like product requirements documents, the classic PM thing. You write five pages, like, my product does that, et cetera. And PMs have been basically vibe coding prototypes. And one, it's pretty fast with GPT-5 and, like, Codex, just a couple hours, I think. Freaking fast. And second, it conveys so much more information than a document. Like, you get a feel essentially for the feature, like, does it feel right or not? So yeah, I expect that sort of behavior we're going to see more and more. Yeah, instead of writing English, you can actually now write the actual thing you want. Yeah, that's amazing. Advice for high school students who are just starting out their careers? My advice is, I don't know, maybe it's evergreen: prioritize critical thinking above anything else. If you go in a field which requires extremely high critical thinking skills, I don't know, math, physics, or maybe philosophy is in that bucket, you will be fine regardless. If you go in a field that tones down that thing and again gets back to memorization, pattern matching, I think you will probably be less future-proof as a. You know. What's a good way to sharpen critical thinking? Use ChatGPT and have it test you. That's actually, having, like, you know, a world-class tutor who essentially knows how to put the bar, like, 20% above what you can do, all the time, you know, is actually probably a really good way to do it. Yeah. Nice. Anything from you, sir? Mine is, I think it's just, I think we're actually in such an interesting, unique time period where the younger.
So maybe this is more general advice for not just high school students, but just the younger generation, even college students. I think the advice would be: don't underestimate how much of an advantage you have relative to the rest of the world right now because of how AI-native you might be. Interesting. Or how in the weeds of the tools you are. My hunch is high schoolers, college students, when they come into the workplace, they're going to have actually a huge leg up on how to use AI tools, how to actually transform the workplace. And my push for some of the younger, I guess, high school students is: one, just really immerse yourself in this thing, and then two, just really take advantage of the fact that you're in a unique time where no one else in the workforce really understands these tools as deeply, probably, as you do. A good example of this is actually we had our first intern class recently at OpenAI. A lot of software interns, and some of them were just the most incredible Cursor power users I've ever seen. They were so productive. I was shocked. In a good way. I was like, yeah, I knew we could get good interns, but I didn't know they'd be this good. And I think part of it is they've grown up using these tools, for better or worse, in college. But I think the meta-level point is they're so AI-native. And even, I don't know, me and Olivier, we're kind of AI-native. We work at OpenAI, but we haven't been steeped in this and kind of grown up in this. And so the advice here would just be, yeah, leverage that. Don't be afraid to go in and spread this knowledge and take advantage of that in the workplace, because it is a pretty big advantage for them. Yeah. I can't remember who said this to us at Palantir, but every intern class was just getting faster, smarter, like laptops, smarter every generation. You sure it didn't peak in 2013? You know, when I was an intern. That's right, that's right. It's a weird spike. Summer 2013. Yeah.
Two guys like you. That's right, that's right, that's right. Yeah. Well, lots happened, you know, lots happened since you guys joined OpenAI. Right. With three years and almost three years into your OpenAI journey, what has been the rose moment, your favorite moment; the bud moment, where you're most excited about something but there's still opportunity ahead; and the thorn, the toughest moment of your three-year journey? The thorn is easy for me: what we call the blip, which is, you know, the coup of the board. That was a really tough moment. It's funny because, you know, after the fact, it actually reunited the company quite a bit. There was a feeling, OpenAI had a pretty strong culture before, but, you know, there was a feeling of camaraderie, essentially, that was even stronger. But, you know, sure, it was tough on the day of. It's very rare to see that antifragility. Most orgs after something like that break, break apart. But I feel like OpenAI got stronger. OpenAI came back. It's a good point. I feel it made OpenAI stronger for real. Now, essentially, when I look at other setbacks, when I look at, you know, other news, like departures or, you know, whatever bad news essentially, I feel the company has built, you know, a thicker skin and, you know, an ability to recover way quicker. I think that's definitely right. Part of it too, I think, is also just the culture. I also think this is why it was such a low point for a lot of people. So many people at OpenAI just care so deeply about what we're doing, which is why they work so hard. You just care a lot about the work. It almost feels like your life's work. It's a very audacious mission and thing that you're doing. Which is why I think the blip was so tough on a lot of people, but also is what I think helped bring people back together, and why we were able to hold together and get that thick skin as well.
I have a separate worst moment, which was the big outage that we had in December of last year, if you remember. Yeah, you remember. I do. It was a multi-hour outage. It really highlighted to us how essential, almost like a utility, the API was. So the background is, I think we had a three-, four-hour outage sometime in November or December last year. Really brutal. Pure sev0. No one could hit ChatGPT. No one could hit the APIs. It was really rough. That was just really tough from a customer trust perspective. I remember we talked to a lot of our customers to postmortem with them on what happened and our plan moving forward. Thankfully we haven't had anything close to that since then, and I've been actually really happy with all the investments we've made in reliability over the last six months. But in that moment, I think it was really tough. On the happy side, on the roses, I think I have two of them. The first one would be GPT-5 was really good. The sprint up to GPT-5 I think really showed the best of OpenAI: having cutting-edge science research, extreme customer focus, extreme infrastructure and inference talent. And the fact that we were able to ship such a big model and scale it to many, many, many, many tokens per minute almost immediately I think speaks to it. So that one, I really. With no outages. With no outages, yeah. Really good reliability. I remember when we shipped GPT-4 Turbo a year ago, a year and a half ago, we were terrified by the traffic, and I feel we've really gotten much better at shipping those massive updates. The second rose, the happy moment for me, would be the first Dev Day. It was really fun. It felt like a coming of age for OpenAI. We were embracing that we have a huge community of developers; we are going to ship models, new products. And I remember basically seeing all my favorite people, OpenAI or not, essentially nerding out on what are you building, what's coming up next.
It felt like a really special moment in time. That was actually going to be mine as well, so I'll just piggyback off of that, which is the very first Dev Day, November 2023. I remember it. Obviously a lot of good things have happened since then. There's just a very, I don't know why, for me it was a very memorable moment. One, it was actually quite a rush up to Dev Day. We shipped a lot, so our team was just really, really sprinting. So it was this high-stress environment going up to it. And to add to that, of course, because we're OpenAI, we did a live demo in Sam's keynote of all the stuff that we shipped. And I just remember being in the back of the audience, sitting with the team and waiting for the demo to happen. Once it finished, we all just let out a huge sigh of relief. We were like, oh my God, thank you. I think there was just a lot of buildup to it. For me, the most memorable thing was, I remember right after Dev Day, all the demos worked well, all the talks worked well, we had the after party, and then I was just in a Waymo driving home at night with the music playing. It was just such a great end to the Dev Day. That was what I remember. That was my rose. Love it. That's awesome. For the last question, I assume you guys are, but please tell me if you're AGI-pilled, yes or no, and if so, what was the moment that got you there? What was your aha moment? When did you feel the AGI? I think I'm AGI-pilled. I think I'm AGI-pilled. You're definitely AGI-pilled. I am. Okay. I've had a couple of them. The first one was the realization in 2023 that I would never need to code manually ever, ever again. I'm not the best coder, frankly. I chose my job for a reason. But realizing that what I thought was a given, that we humans would have to write basically machine language forever, is actually not a given, and that the pace of progress is huge. Feeling the AGI. The second feel-the-AGI moment for me was maybe the progress on voice and multimodality.
With text, at some point you get used to it: okay, the machine can write pretty good text. Yeah. Voice makes it real. But once you start actually talking to something that actually understands your tone, understood my accent in French, it felt like sort of a true moment. Like, okay, machines are going beyond cold, mechanical, deterministic logic to something much more emotional and tangible. Yeah, that's a great one. Yeah, mine are. So I do think I am AGI-pilled. I probably gradually became AGI-pilled over the last couple years. I think there are two. And for me, yeah, I think I actually get more shocked by the text models. I know the multimodal ones are really great as well. For me, I think they actually line up with two general breakthroughs. So the first one was right when I joined the company in September 2022. It was pre-ChatGPT. Yeah, two months before. But at the time GPT-4 already existed internally, and I think we were trying to figure out how to deploy it. I think Nick Turley's talked about this a lot, the early days of ChatGPT, but it was the first time I talked to GPT-4, and going from nothing to GPT-4 was just the most mind-blowing experience for me. I think for the rest of the world, maybe going from nothing to GPT-3.5 in chat was the big one, and then going from 3.5 to 4. But for me, and I think for a lot of other people who joined around that time, going from nothing, or not nothing, but what was publicly available at the time, to GPT-4 was just incredible. I just remember asking, throwing so many things at it. I was like, there's no way this thing is going to be able to give an intelligible answer. And it just knocks it out of the park. It was absolutely incredible. GPT-4 was insane. I remember GPT-4 came out when I was interviewing with OpenAI, and I was still on the fence, should I actually join? And so that thing, I was like, okay.
I mean, I mean, guys, there is no way I can work on anything else at that point. Yep, yep, yep. Yeah. So GPT-4, yeah, GPT-4 was just crazy. And then the other one, the other breakthrough, is the reasoning paradigm. I actually think the purest representation of that for me was deep research: asking it to really look up things that I didn't think it would be able to know, and seeing it think through all of it, be really persistent with the search, get really detailed with the write-up, and all of that. That was pretty, pretty crazy. I don't remember the exact query that I threw at it, but I just remember, I feel like the feel-the-AGI moments for me are, I'll throw something at the model that I was like, there's no way this thing will be able to get, and then it just knocks it out of the park. That is kind of the feel-the-AGI moment. I definitely had that with deep research with some of the things I was asking. Yeah. Well, this has been great. Thank you so much, folks. You guys are building the future. You guys are inspiring us every day, and appreciate the conversation. Thank you so much. As a reminder to everybody, just our opinions, not investment advice.