Episode 10 - How AI Is Accelerating Scientific Discovery Today and What's Ahead
48 min
Nov 20, 2025
Summary
OpenAI's Kevin Weil and physicist Alex Lupsasca discuss how GPT-5 is accelerating scientific discovery across mathematics, physics, and biology. They present research showing AI's capability to solve frontier problems, explore literature across fields, and dramatically reduce time spent on calculations and literature searches.
Insights
- AI models are beginning to solve novel scientific problems at the frontier of human knowledge, not just replicate existing work, marking a qualitative shift in capability
- The most valuable applications today are in literature search and cross-disciplinary knowledge synthesis, where AI can connect disparate fields that humans struggle to hold together
- Scientists must adopt iterative collaboration with AI models rather than expecting one-shot solutions; low pass-rate problems (5-10%) are often the most scientifically valuable
- The gap between what current models can do and what the scientific community actually uses them for is substantial; awareness and adoption lag significantly behind capability
- Extended thinking time (18+ minutes) dramatically improves model performance on frontier problems, suggesting compute allocation to researchers could unlock major discoveries
Trends
- AI-assisted scientific discovery shifting from narrow task automation to frontier problem-solving and novel proof generation
- Cross-disciplinary research acceleration through AI's ability to synthesize knowledge across disparate fields and languages
- Extended reasoning/thinking time becoming critical differentiator for frontier science problems versus standard benchmarks
- Shift from static benchmarks (GPQA at 90%) to frontier science evaluations as models saturate traditional academic metrics
- Adoption lag in scientific community creating near-term opportunity for early adopters to gain competitive advantage
- Fusion energy and drug discovery emerging as high-impact application areas for AI-accelerated research
- Model capability doubling cycles creating perception gap where scientists test outdated versions and conclude AI is insufficient
- Literature search and knowledge synthesis becoming bottleneck-breaking applications before novel discovery contributions
- Personalized medicine and scalable fusion positioned as near-term AGI impact vectors through scientific acceleration
- Collaborative human-AI research model replacing traditional solo researcher paradigm in theoretical sciences
Topics
- AI-Accelerated Scientific Discovery
- GPT-5 Frontier Problem Solving
- Cross-Disciplinary Knowledge Synthesis
- Extended Reasoning and Thinking Time
- Literature Search and Citation Discovery
- Mathematical Proof Generation
- Black Hole Physics and Quantum Gravity
- Fusion Energy Research
- Drug Discovery and Life Sciences
- Scientific Benchmarking and Evaluation
- AI Adoption in Academia
- Theoretical Physics Applications
- Computational Biology
- Material Science Prediction
- Research Methodology and Human-AI Collaboration
Companies
OpenAI
Host organization; Kevin Weil leads OpenAI for Science initiative; developing GPT-5 models for scientific research a...
Google
Mentioned for Google DeepMind's AlphaFold work and competing AI research in the scientific discovery space
Lawrence Livermore National Laboratory
Physicist Brian Spears demonstrated fusion research acceleration using GPT-5 on fusion energy problems
Vanderbilt University
Alex Lupsasca is a professor of physics; the institution where the guest researcher is based
People
Kevin Weil
Head of OpenAI for Science; leading initiative to accelerate scientific discovery through AI model deployment
Alex Lupsasca
OpenAI research scientist and Vanderbilt physics professor; demonstrated GPT-5 solving black hole symmetry problems
Andrew Mayne
Host of OpenAI Podcast; moderating discussion on AI's impact on scientific research
Brian Spears
Physicist at Lawrence Livermore; demonstrated GPT-5 solving fusion energy problems across difficulty levels
Mark Chen
Chief Research Officer at OpenAI; challenged Alex Lupsasca to test GPT-5 on quantum gravity problems
Demis Hassabis
Google DeepMind leader; credited for AlphaFold and competing AI research in scientific discovery
Quotes
"Maybe the most profound way that people are going to feel AGI in their lives is through science."
Kevin Weil•Early in episode
"Can we help scientists do the next, say, 25 years of scientific research and scientific discovery in five years instead?"
Kevin Weil•Mission statement
"You go very quickly from the model can't do something to the model can just barely do something. And it's not great at it yet, but you see these early examples."
Kevin Weil•On capability progression
"The acceleration that is going to come from these tools is going to change science."
Brian Spears•Fusion research context
"Human and AI together are much more powerful than human alone or AI alone."
Andrew Mayne•Collaboration theme
Full Transcript
Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. Today, my guests are Kevin Weil, head of OpenAI for Science, and Alex Lupsasca, who is an OpenAI research scientist and professor of physics at Vanderbilt University. We're going to be discussing how AI is impacting science, an upcoming research paper, and where science might be headed in the next five years. Maybe the most profound way that people are going to feel AGI in their lives is through science. With ChatGPT, I can just launch it in that direction, in that direction, that direction. The acceleration that is going to come from these tools is going to change science. So you're running the OpenAI for Science initiative. Could you explain what that's about? Yeah, the mission of OpenAI for Science is to accelerate science. So the question is, can we help scientists do the next, say, 25 years of scientific research and scientific discovery in five years instead? Science underpins so much of what we do and how we live. And if we can make progress go faster by putting our most advanced models into the hands of the best scientists in the world, we should do that. That's what we're trying to do. You could ask, like, why now? Why didn't we do this a year ago? Why aren't we doing this a year from now? One of the big reasons is we're just starting to see our frontier AI models being able to do novel science. So we're starting to see examples where GPT-5 can actually prove new things. Maybe not yet things that humans could not do, but things that humans have not done. So these little existence proofs of GPT-5 being able to break out past the frontier of human knowledge and into the unknown. And if there's one thing that I've learned from now, you know, a year and a half or so at OpenAI, it's that you go very quickly from the model can't do something to the model can just barely do something. And it's not great at it yet, but you see these early examples.
And then, you know, six months later, 12 months later, all of a sudden you couldn't imagine doing this thing without AI. And I think science is in that initial phase where we're seeing real acceleration for scientists that are using AI. Sometimes novel, you know, not yet maybe large breakthroughs, call them small breakthroughs. And that just says that there's so much potential in this space. We've seen examples of, let's say, AI helping with mathematical proofs. Could you give me an example of how it might do things in some other areas like physics, or whatever kind of things we might see in the short term? Yeah, I mean, we're seeing examples every day. And they're across the range of sort of the scientific frontier. You see examples in mathematics, in physics, astronomy, life sciences, like biology. Alex, I mean, you've worked on some of these. Maybe it's a good time to talk about some of the physics stuff that you've seen. Yeah, I think coming back to Kevin's point about how this is a special time, that's very much how I feel as well. Because I started the year 2025 thinking, yeah, ChatGPT is cool. Like everybody, I used it when it came out. I thought it's a great chatbot. But I was sure it would take a very long time before it would become really relevant for my own work. So I started the year, I would say, as an AI skeptic, because I like to see evidence before I'm convinced of something. And I saw people using it to help in the writing. And I started to use it for that as well. It's very useful for proofreading. But I thought, oh, it's going to be a while before it gets to do the special stuff that I'm really a specialist at. Yours being, like, black holes? Like black hole physics, exactly. And I had this experience early this year where I was trying to find this magnetic field solution that describes what happens around a pulsar, which is a rotating star with very powerful magnetic fields. And I was going for this very particular solution.
I had to solve a partial differential equation. I was able to identify that solution as an infinite sum over products of special functions called Legendre polynomials. And if you go to physics grad school, this is the kind of thing that you spend a lot of time getting familiar with. And I also like these puzzles, and I was playing around with the sum, and I felt like there should be a simple formula that it evaluates to. And I thought, okay, I have this friend who is ChatGPT o3-pro, which I didn't have access to at the time, and I thought, okay, I'm just gonna send it to him and see what comes out of it. And he sends me back this output. It thought for 11 minutes, which at the time I'd never seen it do, because I was using the free version, which doesn't think for as long. And it gave this beautiful answer where it was able to understand what the sum was and break it down into pieces that it could tackle. And then it had to go and find this special identity that was published in one paper from the 1950s in the Norwegian Journal of Mathematics. And so it understood what the problem was, and it knew about this random identity that was just the thing for the job, and it used them, and it gave this beautiful output. And at the end, the answer was wrong, because it made a silly typo. It added an extra factor in front. It was almost kind of like a human making a silly typo at the end, but it was very easy to check the derivation. And I went through it and I realized, okay, there's this extra factor, but aside from that, it did the work. And that really sent me reeling, because I thought, okay, I would say that's a uniquely human ability. I thought that's something that makes theoretical physicists special. You know, now in 2025, clearly they're capable of doing things that I would consider amazing. Yeah. One of the cool things. So you've got examples like Alex's, where it was probably something that he could have done himself eventually. But GPT was able to do it faster.
That's acceleration on its own. And there's something qualitative about that as well, because instead of exploring two paths over the course of a week, if you can explore 10 paths in parallel in, you know, an hour, all of a sudden there's a lot more ideas that you can try. And that's also acceleration. We also see examples in, like, literature search, which you don't think of as maybe, like, deep scientific innovation, but it's really important to be able to understand, you know, has somebody worked on this problem before? And if so, is there something I can learn to speed up my own work? And we've seen interesting examples. There was one, I might get the details of this wrong, but we were talking to this researcher, and he was saying he was exploring this particular idea in, like, high-dimensional optimization. And he was like, man, you know, this thing I'm working on, it's interesting, but somebody must have worked on this before. I can't be the first person to have had this idea. I just can't, but I can't find any examples. And then he had given a description of what he was working on to GPT-5, and GPT-5 found an example from, I think it was, like, economics or something, a completely different field that used completely different terminology, so no keyword lookup would have ever worked. GPT-5 did sort of a conceptual-level literature search. Yeah. Found somebody's PhD thesis in German, so also a completely different language. You know, it was, like, basically lost to time, but this person had done really interesting sort of related work that helped him in his research. And so, you know, that's another area. So you can talk about the acceleration that comes from, just, like, novel proofs and GPT-5 being able to do something on its own or guided by an expert. But there's also these examples of acceleration in calculations and literature search, and all of them contribute to accelerating science.
Yeah, and the exact same thing happened to me. I was trying to derive this property of black holes, and I got this equation that described this phenomenon I was after, and it had a three-derivative term, which is pretty unusual. And I looked at it and I recognized it's something called the Schwarzian derivative, which is a special thing that appears in math. And I thought, hmm, wow, this is really strange that this would show up. And I just copy-pasted the equation into ChatGPT and said, have you seen this before? And it said, oh yes, this is the conformal bridge equation. I had no idea what a conformal bridge was at the time. And it said, oh, just look up this paper. And that was amazing, because it turns out that this equation that showed up in my work had already been studied in some other works. And I've heard from a lot of colleagues doing research in physics that there's a lot of that going on. And at the forefront of knowledge, everything becomes so niche that it's very hard to know the latest details in neighboring fields. And GPT is an amazing help with that. Yeah, that's another thing that we've heard from professors and researchers that we've talked to: there's so much, you have to be so specialized today. And so sometimes it gets hard to explore an area outside of your main area. There's one particular mathematician we were talking to who said, you know, in one of my last papers, I knew there was an area that I wanted to go follow off in this direction, but it wasn't my specialty, and it would have taken me a long time. And I just kind of ended up feeling like, you know, maybe that's not the most efficient place for me to spend my time. Now with GPT-5, I'm going to go back and explore that, because I've got a coworker, effectively, a collaborator, who has read just about every scientific paper that's out there and is a pretty meaningful expert on just about any topic you want.
And I think I'm going to be able to go explore these adjacencies in a far better way with ChatGPT than I could have on my own. And so that's also a fascinating new take, right? It can help you go deeper, like you were saying, and it can also help you go more broad. Literature search is pretty interesting, because one of my weird hobbies is I like to go back and look at when some early scientific discovery was made that didn't get utilized until much later on. You know, a famous one was carbon filaments. When Thomas Edison spent all that effort to try to find it, you know, it had been published like 20 years before. Of course, you know, the Dewey Decimal System was invented that year, so you can't blame him. Other things like silicon as a semiconductor. You know, if somebody had read the literature, we might've had that five to 10 years earlier. The ability to replicate DNA had been published like 10 or 12 years earlier before somebody figured that out. And then the shotgun technique we use for DNA, you know, figuring out, like, the DNA sequencing, that was first published like 1982. But at that time, there weren't supercomputers that could run it. Right. And that's exciting just to think of having a really good tool that can search through all of this stuff and pull up these answers for you. Yeah. I think especially some of the most interesting research now happens at the intersections of two fields. And again, it's hard for one person to be an expert in two fields, let alone three or four or five. And sometimes it's tough for humans to collaborate. You don't necessarily find the right person. The person doesn't have infinite patience. And here with GPT, you now have the option to have a collaborator that will work 24-7, has infinite patience, you know, has read substantially every scientific paper written in the last however many years.
And so it's just a new kind of collaboration that is its own form of acceleration. You think about how Claude Shannon's wife was a mathematician and how much that helped what he was able to do. And I think we forget how much collaboration really is a factor in that. But I would say some people hearing this might go, yeah, but it couldn't spell strawberry last year. Yeah. It couldn't do math. So why are we going to have it do science? Yeah. So actually, I don't even know if I've told you this, my own sort of origin story with appreciating what GPT-5 could do. Or in this case, I think, oh, this was almost a year ago, so it was o1-preview, maybe. But I was meeting with this guy named Brian Spears, who's a physicist at Lawrence Livermore. That was in DC, and we'd never met before, so I didn't know sort of what to expect. I thought maybe I was going to go in and be talking to him about what was new and what he could do with o1-preview and why he should give it a try. Little did I know, I sat down and he immediately took control of the conversation and said, let me tell you what I can do with your models. And, like, these are the most amazing things for science, and this is going to change the world. And he was like, okay, let me take you through this. And he opened up his laptop, and, you know, he works on fusion, right? Lawrence Livermore was the first to do large-scale fusion with net positive energy, like, super exciting. So he's like, all right, we're going to take a fusion example.
And first, I'm going to start with the undergrad version of this problem. And so he shows me this conversation, and he's like, all right, so you've got, you know, a copper rod, and we're going to bombard it with super high-pressure waves. What happens? And o1-preview gives a good answer. It's like, okay, cool, so it got the undergrad problem right. And then, now let's ask the graduate version of this: now what happens inside the rod itself as you're doing this? And, you know, what needs to be true in order for it to generate these certain kinds of shockwaves? And he goes through and he's like, okay, so got that right. All right, now let's ask the postdoc-level question. All right, now let's ask the, and at this point I'm like, you know, despite having a physics background, I'm just following along for the ride, because he's beyond anything I can do. Like, all right, now let's ask the you-just-joined-Lawrence-Livermore, you've-gone-through-your-postdoc, you're-a-nuclear-physicist kind of question. And he keeps going, and o1-preview keeps getting the answer right. And then he's like, all right, now let me ask you the you've-worked-at-Lawrence-Livermore-for-20-years question. And it goes and it gets it right. And then not only that, but it, like, suggests that the only way to go forward is to use this set of simulation tools that are, like, partially classified, or that only Lawrence Livermore has. It's like, you know, I don't have access to these, but if you did, you would want to use these tools. And he's like, look, nothing that I just showed you was something that I couldn't do, but it would have taken me days. And certainly not everybody at the lab can do this. Like, the acceleration that is going to come from these tools is going to change science.
And so I went from sitting down with this guy who I thought maybe I was going to be sort of talking to about the value of AI, to him just completely blowing my mind about the potential of AI. And this is a year ago. This is o1-preview. You know, we've come leaps and bounds since then. And the thing that I always try and, like, remind everybody: the AI models that we're using today, as good as GPT-5.1 Pro is, these are the worst AI models that we will ever use for the rest of our lives. And when you think about that, the fact that we're here just implies that the future is very bright. How have your colleagues been using these tools? Yeah, there's a lot of different usages, I think. Literature search: here's what I'm working on, does it connect to any other thing? And this is something that we spend a lot of time on as scientists, just understanding, when something new shows up in our work, how it connects to other things. And, okay, my own experience that made me become AI-pilled, I think. Is this the reason you came to OpenAI? Yeah. When GPT-5 Pro came out, I met Mark Chen, who works here at OpenAI. He's chief research officer. And he gave me a challenge. He was very proud. He said, you know, why don't you just give it a hard problem? And I thought, huh, you want a hard problem? Okay. And so I gave it this question. Quantum gravity. Right. So I had just found these new symmetries of black holes, which is something that doesn't happen that often. And I'd written up a paper that came out in June on the arXiv. And I was happy about that. And I thought, okay, well, let's see how GPT-5 Pro handles this new question. And so I gave it the equation, and I didn't say that it has some symmetries. I didn't give it a leading question. I just said, what are the symmetries? And it thought for five minutes and said, no symmetries. And I go, it's not there yet. I'm still better than the AI. And Mark was visibly crestfallen.
He goes, okay, well, just give it an easier question then. And so I think, okay, I'm going to give it the warm-up baby version of the problem, which is: find the symmetries of this equation not in the full black hole spacetime, which is complicated, but in the flat-space limit, where the spacetime is empty. And hit enter. It thinks for, you know, nine minutes. And it comes back with this beautiful answer: oh, this equation has conformal symmetry, which is the correct thing, and here are the three generators. And it was very beautiful. And, you know, this version of the equation probably has been studied, I'm sure has been studied, many times over the decades. So I don't know what it did exactly, but it came up with the answer. I thought, okay, this is very good. You know, this is a great outcome. And then Mark said, okay, well, but now that it's been primed on the warm-up example, try again, in this instance of the chat, the harder problem. And I thought, okay, let's go. And so we give it the hard problem again, hit enter, and it thinks, and it thinks. And that was the first time I saw it think for so long. I think it took 18 minutes. And it comes out with this beautiful answer that was completely correct. And that blew my mind, because I had been working on this for a very long time. And I would say that that calculation is at the edge of my abilities. I think it's something that, you know, very few people could have done the way I did it. And so I was really shocked, because, you know, you spend years of your life training to be best in class at something, and finding symmetries of black holes and these kinds of equations, that's my jam. And I thought, okay, so I guess that just happened, and it really sent my mind reeling. And I was a little bit shell-shocked for a few days, and I just couldn't stop thinking about it.
And after that, I realized, okay, I have to become involved in this, because to see this capability emerge into the world right now and not be involved with it just seemed crazy to me. I actually think you made a really important point in the middle of that, around the fact that you gave it the hard question, it didn't get it right, you gave it an easier question, it got that right, and then you were able to give it the harder question. There is still, you know, as excited as we clearly are about the future here, there's also a very real caveat. When you're giving GPT-5 or any of these AI models a problem that's on the frontier, that's at the limit of their capabilities, they tend to still be wrong a lot. Kind of like any human would be when operating at the frontier of their capabilities. And it isn't just automatic yet. Hopefully in the future it will be, you know, enter in any hard question and the model answers it. But today there's a lot of back and forth, and the researchers that are best at getting the most out of the models have a sort of patience to go back and forth with them. I think that's natural. It's probably the way that you would work with any two people operating at about the limit of their capabilities. But I think it's important, especially for folks listening to this who are doing research with the models, to know that it isn't just one shot and it always works.
There really is a back and forth, and sort of a patience that it takes. And one of the interesting research problems that we're spending a lot of time thinking about is how we help people with that, how we sort of help reduce that cognitive load. Because when you're working on a problem, say the model has a five percent pass rate on some problem. So technically the model can get it right once out of 20 times, but it's really at the frontier, so it's not going to get it right nearly, you know, even close to every time. If you're sitting inside ChatGPT and just entering in this question, you're going to have to enter it in, you know, what, 10 times before you have the odds that it's going to get the right answer. And most people aren't going to do that. And so there's a whole host of problems that the model can solve that people probably try, and they're like, oh, after three tries it didn't get it right, so I'll move on, the model's not good enough yet. And actually it is, but it's just very hard to tell apart low-pass-rate problems from problems that are too hard. And I think that's actually a really important thing for us to help researchers and mathematicians get past, because the most interesting problems right now are going to be the ones where the model has a very low but non-zero pass rate. Those are going to be the hardest problems that the model can solve, the best ways that it can help accelerate science. And so that's a really interesting research problem that we're taking on, to try and make that a little more automatic, a little less grunt work. But for now, like, putting in the time and really going back and forth with the model does yield results. Well, it feels like we're at a moment kind of like when we went from GPT-3.5 to ChatGPT. GPT-3.5 was an extremely capable model, but it was still effectively a base model. And I was a prompt engineer at the time, and knowing how to prompt it, I could get great results from it.
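The pass-rate arithmetic in that exchange can be made concrete. A minimal sketch (illustrative only, not from the episode; the function name is ours), treating each attempt as independent:

```python
import math

def attempts_for_confidence(pass_rate: float, confidence: float) -> int:
    """Smallest number of independent attempts needed so that the
    probability of at least one success reaches `confidence`."""
    # P(at least one success in n tries) = 1 - (1 - pass_rate)**n
    # Solve 1 - (1 - pass_rate)**n >= confidence for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - pass_rate))

# A 5% pass-rate problem needs 14 tries just for even odds of one success,
# and 59 tries to reach 95% confidence.
print(attempts_for_confidence(0.05, 0.50))  # 14
print(attempts_for_confidence(0.05, 0.95))  # 59
```

This is why three tries says almost nothing about whether a frontier problem is a low-pass-rate problem or a too-hard problem: at a 5% pass rate, three failures in a row happen about 86% of the time even when the model can solve it.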
But it took all those little tricks to sort of understand the context. Then we went to ChatGPT, and we understood, okay, we know the kind of problems people are trying to solve, let's make it a little bit easier for them to get there without having to do that. It feels like that's kind of where we're heading with science, that now that you have people like Alex explaining the problems you're trying to solve and what you're doing, we may see a big acceleration with this. I think it's probably just a characteristic of any question that's on the frontier of, or sort of at the limit of, what the models can do. And back with GPT-3.5 and early versions of 4, the questions that were at the limit of what the model could do were much more basic. Now they're questions of, you know, scientific research. But still, when you're operating at the frontier, the pass rate will be low, and so there's value in sticking with it and trying a few different things, and taking the parts that it gets right and refining them while telling the model where it got other things wrong. In this example I mentioned, it needed a warm-up, but the warm-up was the obvious warm-up that you would do as a human, because actually, when I was attacking this problem, I wasn't thinking about the black hole case first. This flat-space limit was the obvious place to start, and that is where I began. And so I think the models are actually really good, but we could get better at making them think of the warm-up problem themselves, so they can go there directly. But more generally, I think there's this thing we have to bear in mind, which is that as scientists, our role is to push the edge of knowledge. There are things that are just beyond the edge, and our goal is to bring them within the edge of knowledge by understanding them. But this edge is very jagged. So there are very basic questions about the universe. Like, why are there three dimensions of space?
Or, you know, what happened at the Big Bang? These are things that everybody wants to know the answer to. And yet, even though they're simple questions, there's really nothing intelligent to say about them. We just don't know. They're very hard problems, actually. And then meanwhile, there are these very hard questions that you would think we wouldn't be able to answer at all, to which we have extremely detailed answers. We can predict the electron dipole moment to, I don't know, 12 decimal places, something crazy. So the edge of human knowledge itself is very jagged. And it takes many years of graduate school to learn where the edge is. And I think what we're finding with these AI models is that the edge of their knowledge is also very jagged. So you mentioned, you know, there's some basic questions that the models can't answer. That's true. At the same time, there's some very hard questions that they're very well suited for already today. And I think what's exciting is that their edge of knowledge is very jagged in a way that's different from ours. So obviously, as time goes on, I think the edge of ability for these models is going to keep expanding. But as long as it expands in a way that is slightly different from our edge, that's also really interesting, because at the intersection, where it can go farther than us, or we can get ahead of it, that's where a lot of interesting things are going to happen, I think. Yeah, human and AI together are much more powerful than human alone or AI alone. I want to explore that a little bit more. But first, tell me about the research paper. Yeah. So we've talked a bunch about these anecdotal examples that Alex has gotten from the time that he spent with his colleagues, that we see coming in across Twitter on a semi-daily basis at this point. And we wanted to sort of bring them together and publish something that lays out the current state of GPT-5 with respect to science.
And so what we've got is a handful of collaborators from inside OpenAI and I think eight or nine academics from beyond our walls, across a bunch of different fields: math, physics, astronomy, computer science, biology, material science. And the paper is something on the order of 12 sections, each one highlighting a different way that GPT-5 is accelerating their work. The goal was not to be, you know, hypey and say everything is solved. Hoverboards for everybody. Yeah. It's really to say: this is what works, this is what doesn't work, here's what I tried. In many cases we're sharing the ChatGPT, you know, the full share links, the conversation, so you can see the back and forth that the scientist has with the model. And it's meant to be kind of a moment in time, to say this is where we are today. And I think we'll look back in six months, 12 months, and, you know, we'll probably be much further. And that'll be exciting. But even where we are today, we've got a section in the paper with a bunch of different examples around literature search, a section of the paper with a bunch of different examples around acceleration, whether it's calculations and other things like that, and then a section where we actually contribute four or five new, nontrivial results in mathematics. And a couple of these are small, a couple of them probably could have been papers on their own. And so you go from kind of the mundane but very pragmatic and real bits of acceleration to the more sort of profound, GPT-5 actually pushing past the current frontier of human knowledge. And so we're super excited about this paper. I think there'll be a lot more to come. We're not the only lab doing great work, by the way. Google has been doing this for a while, and I have a ton of respect for what Demis and the team have done with AlphaFold and more. I just think we're at a really exciting time.
Ideas in science often have their moment when you have multiple people arriving at the same idea, whether it's quantum mechanics, like Alex was talking about, or the light bulb. Right now, it's very clear that AI is just beginning to change science. And it's going to be an exciting few years.

What advice do you have for students and grad students in the sciences? Because I hear people talk about, like, oh, we're not going to need scientists anymore, which sounds absolutely crazy. It's not like the telescope got rid of the astronomer. It actually created the astronomer. How do you feel about that? And what advice do you have?

Okay. I think, first of all, it's important to acknowledge there's a lot of anxiety in academia right now that is unrelated to AI. It has to do with lots of changes in the way that science is organized in this country, and we're still going through these changes. So, talking to young people, there's a lot of anxiety surrounding this. I actually think AI is a really exciting new tool that's becoming available, that is going to help a lot, because it's going to make everybody so much more efficient. As Kevin was mentioning earlier, when you work on a research project, oftentimes you don't know which way exactly to go. You're here, you want to get there, but there are different possible paths, different lines of attack. And the whole point of research is that from the get-go, you don't know which way to go. And one of the things that's really fun with GPT is that you can just say, hey, I'm trying to solve this, here are some ideas I have. You can upload some notes you have, or just describe it in a few sentences, and it's very good at getting what you're trying to do. And then you can just say, what if I approached it this way? Or what if I were to do it that way? And it can immediately go off and chart a path through the unknown, just signposting different potential avenues.
And that actually saves so much time because, you know, okay, I'm a human. I have a limited amount of time and energy. When I'm going to put in the effort to do a calculation, I spend a lot of time trying to prototype it and think ahead about where it's going to take me. And with ChatGPT, I can just launch it in this direction, in that direction, that direction. It doesn't completely get everything right, but just having these signposts along the way is so helpful, because then when you do go down the path yourself, it feels like you have somebody helping you along. And I think that's just going to make everybody faster, more productive. And, you know, already the young people that I meet are spending a lot of time experimenting with ChatGPT and figuring out its capabilities. I think it's going to be a boon for everyone.

You mentioned part of the idea of the paper was to say, OK, this is where we are now. Let's go look in six months. Let's talk. We're five years on from GPT-3. Five years from now, we're sitting down here. What are we going to see?

Oh, man. The five-year question is so hard. I mean, it's a great question.

Here's a crystal ball.

Yeah. No, I think the exciting thing about this field in general is, you look back 12 months and you're completely embarrassed by where you were 12 months ago. When GPT-3 launched, it was unbelievable, right? I mean, I'll speak for myself. It blew my mind, the idea that AI could do any of these things. And then somewhere around GPT-3.5 and 4, we just went whooshing by the Turing test, which we had held up for, what, 75 years as the pinnacle of artificial intelligence research. Like, oh man, the world will be different when an AI can pass the Turing test. And now we just don't talk about the Turing test anymore. And even if you look back to the beginning of this year, of 2025, most people were still writing code themselves.
Most engineers were writing all of their own code. It's gross, writing it yourself. And now, fast forward, and the idea that you would do much of anything without leveraging Codex, Claude Code, GitHub Copilot, any of these tools, all of which are incredible, is crazy, right? You're so much more productive with them. So just in those 12 months, software engineering has fundamentally changed. I think over the next 12 months, we're going to see profound changes in the way that science is done, both in the stuff that we can do in silico, in theoretical physics and mathematics and computer science, and, I think, beginning in the life sciences and the physical sciences. That's over the next 12 months. I mean, five years...

So, yeah, that's a question I think about a lot, because when it comes to a mathematical proof, I can kind of go into a computer and test that, verify that, or at least test it to some extent, and the same with some sort of equation in physics. But when you get into the life sciences or materials sciences and such, are we going to have a bottleneck of way more predictions than ways to test them?

Well, I think there are so many areas where models can help with the life sciences. If you take biology, drug discovery, for example: there, you have a huge search space. And the more the models can learn how to prune that search space, the better. Even if you're going to end up with a bunch of physical, real-world experiments to run at the end of the day, if you can intelligently prune the search space, then you can more rapidly converge on the drugs that are likely to work in particular scenarios. And then you can think about the impact. For that to have real-world impact, you need to make it all the way through the regulatory process, and that is its own process that AI can help speed up.
Because you end up needing to write these huge papers that bring together tons of different findings and so on. So you can take each step of the process, and AI can help up front, as you prune the search space and try to find candidates that are more likely to meet your needs and the goals that you have. And then, as you go through the process of getting this thing out to consumers and making a real-world impact, AI can contribute there too. And we have pilots with a number of companies in the space doing that. So it really is fairly broad-based.

You started off with an interest in particle physics. You were studying that, and then you found other things. And now you find yourself back in the sciences. Do you think other people are going to follow that pattern?

I mean, it is an absolute privilege for me to get to come back and work on science. And I am nowhere near the scientist that folks like Alex and other people here at OpenAI are. But we talk a lot about AGI at OpenAI, artificial general intelligence, and I think maybe the most profound way that people are going to feel AGI in their lives is through science. ChatGPT is an incredible tool. I use it tons of times every single day, and AGI inside ChatGPT will be able to do lots of things. But when I can have personalized medicine, when AI models can contribute to science by, say, finding a way to do scalable fusion more quickly, those kinds of things will change all of our lives. And I think these are very real possibilities at the pace that we're going. That's why this is the most exciting thing in the world to me to get to work on. I don't know what AGI will look like, but sometimes the experience of giving ChatGPT a really hard equation you're working on and having it just spit out the answer, to me, that certainly feels like something approaching it.
And I also don't have a crystal ball, and I clearly have a bad track record of predicting where AI is going, given that at the start of the year I didn't think I'd be here. But there are two things that are simultaneously clear to me. One is that the models are definitely going to keep getting better. Sometimes my colleagues ask me, oh, are we reaching a plateau? And that is actually something I was wondering about too. And then I joined OpenAI and got to play with some of the internal models that we have. They're even stronger, and I'm like, okay, this is definitely going to keep getting really, really good. And the second thing is, I think already with GPT-5 Pro, or 5.1 Pro today, our best model that's available on the outside, there's a big gap between what the models can do and what the science community uses them for. And one of our goals here at OpenAI for Science is to start bridging that gap, because the models move so fast that unless you're really paying attention, you may not realize how much has changed in just the last few months. And so I think these two facts are true and are going to, over the next year, really lead to big changes in science. The models just keep getting better, and people are starting to catch on. That's why we're seeing all this chatter on Twitter and social media. And that's only going to accelerate. So where that takes us, I don't know, but I'm excited to find out.

I think you've both made a very good point, which is that these models improve at such a rapid pace that sometimes people have a very firm idea of what they are because they tried something six months ago. I've encountered people who I really respect, scientists, who are like, oh, I tried it. And I'm like, you tried it 18 months ago. They're not used to a tool evolving that quickly.

Yeah. Or they're using the free version, because, you know, of course, that's how everyone starts.
And the free version doesn't think for as long, and so it can't solve problems that are as challenging.

Yeah, I think that's really real. It's one of the reasons that I think the best advice is to just keep trying the problems you're working on. If you try them on GPT-5 and it isn't super helpful, I wouldn't give up. I would keep trying every few months, and I think at some point it's going to start being valuable, if it isn't already today.

We talked about thinking time.

Yeah, that's another area that we're really excited about. With GPT-5 Pro, I've seen the model think for, what, maybe 40 minutes on some of the hardest problems. But it has a certain compute allowance, because we have to serve it to many, many, many people. 40 minutes is certainly not a limit on thinking. The models can think for two hours, six hours, 12 hours, 24 hours. And one thing we continue to see is that the pass rate on hard problems continues to improve as you give the models more time to think. It's surprising, actually, how often there's a totally reasonable, intuitive human analogy to these things. There are a lot of problems that I can't solve in 20 minutes but that I might be able to solve if you gave me two hours.

System one and system two thinking.

Yeah. And some that I can't solve in two hours, but if I had a day to really think about it and try different things, I might get there. And the models are the same way. So, you know, there aren't as many scientists in the world as there are users of ChatGPT. If we could find ways to give the scientists who really know how to use the models well a huge amount of compute, I think that is yet another way that we can accelerate science.

Yeah, it's a very good point, because you'll hear people talk about how we hit a wall, or whatever.
And one of the things that was really an amazing discovery, which we found out about a year ago, was the whole reasoning paradigm: the fact that you can just take the model of today and let it think longer. People go, what would we do with all this compute we're building, all this hyperscaling? It's like, even using today's models and letting them think for a long time, we could probably have some amazing discoveries.

Yeah, 100%. I think if model progress stopped today, just the process of driving awareness within the scientific community and giving people more of the best that the models can deliver would produce a large amount of scientific acceleration. But of course, progress is not going to stop, as Alex was saying. And so when you think about the models being able to think for a longer time, being able to train them to do harder and harder scientific tasks, and also just getting out into the scientific community and helping people see what the frontier really is and how they can use the models better to do the work that they're doing... I'm excited to see where this goes over the course of the next six months, 12 months, 24 months.

Yeah, I think this is a really unique time in history. It feels like a special moment. And to be clear, we're not telling people to drop whatever they're doing and come do AI. That's not the message. What we want to say is: keep doing what you're doing. But also, there's this great new collaborator, this new tool you get to use, that's going to make it even more fun. And it's going to bring new life into a lot of different fields.

One of the challenges right now with benchmarks is that models seem to have saturated them, when we talk about terms like saturation. Also, a lot of them just don't seem that impressive anymore. Now it looks like we're moving to the scientific frontier. What do scientific benchmarks look like?

Yeah.
Like with many things, there's an intuitive way to understand this: the models get smarter, and benchmarks are just a way of testing the model, in some sense. As the models get smarter, you need to give them harder and harder tests, because they learn how to ace the earlier ones. So take GPQA, which stands for Google-Proof Q&A. It's a scientific benchmark that asks basically PhD-level questions across a range of scientific fields. We thought for a long time that it was a very hard benchmark to beat. I think it came out in 2023, and GPT-4 originally was at about 39% on it. Humans, by the way, are at about 70%. But now you fast forward two years, and our latest models are nearly at 90%.

Wow.

So they're surpassing the capability of most humans in their field of scientific study, across every field at once, which is kind of amazing when you think about it. But those aren't the hardest questions in the world. And that's one of the reasons that we're focused on new evaluations that ask frontier science and mathematics questions. We also released something called GDPval recently, which is an eval that tests the model's ability to do economically valuable tasks. So the smarter the models get, the harder the tests we want to keep giving them. Because every gap that we see, every place where the model can't answer a certain question, is feedback for us and gives us a way to improve the model further.

Curing disease. Great. What area, though, beyond that would you really like to see? And it could be crazy or weird or odd. Where would you like to see scientific acceleration? You want to go first?

I'm very selfish. So I have my own interests. I really like black holes. That's my passion.

You want to build a black hole.

I think there's a lot of potential for how AI can accelerate black hole research. And of course, I want to see it help with cancer and drug discovery and all these good things.
But my first priority is, yeah, I want to see more AI helping with black holes. There are a lot of ideas on the table and so much potential. One thing is, there are a lot of theoretical questions that are very thorny. And I think if you could just sit down and understand everything that is known, and integrate that knowledge, a lot of things would fall out of that. That's one of the things that we're exploring. Dark matter, for instance, is something that we've been talking about, because there's a lot of data on dark matter from various experiments, but we still have no real idea what it is. There's a bunch of theories out there. A really interesting question is: could it be that by feeding ChatGPT all the experimental data that is known about dark matter, and all the theories, it could rule some of them out already, by combining bits of knowledge that are so disparate that it's hard for our human minds to hold them together? I think that's kind of an exciting frontier. And then, since we were talking about the far future, experimental work is totally not out of the question. Right now we're focused on the more theoretical fields, because they can be done in silico, but you could totally imagine using AI to design better experiments, and maybe run very hard, complicated experiments, including maybe for black hole physics and other fields. I think there's a lot of ground to explore here and very exciting possibilities.

Yeah, I'll say fusion, just because we have small... I mean, large-scale, but small existence proofs of it. So clearly it can work, and the challenge now is to do it at bigger scale, more reliably. Clearly it's possible. We will figure this out. But if we can accelerate it, then the world with fusion is a significantly better place than the world without. We solve a lot of problems if we solve fusion.
And, you know, I'm excited to see if maybe we can contribute in some way.

I think people easily overlook how dependent we are on energy, and what it would unlock if we had the same orders-of-magnitude improvement in energy production that we've had over the last 200 years. You think about things that are energy-intensive, like desalination, or construction, and other things. When you have really, really, really unbound energy...

Yeah. It's incredible. I mean, some groups might be looking to build lots of infrastructure for lots of GPUs, for example.

Yeah. Who might want to do that? But even beyond that, I think we're probably going to see, from that infrastructure build-out, a lot more energy devoted to energy. Much like mobile phones and laptops made electric cars a lot more efficient, because of all the money being thrown into battery technology, I think we'll probably see that kind of offshoot.

Yeah. And I think anytime you change something by an order of magnitude, the world changes. Look at what we've seen over the past year with the way that software engineering has changed: you now don't need to be trained as a software engineer to write meaningful amounts of code. There are, what, 30 million software engineers in the world. I think now 300 million, maybe 3 billion, people can write software, and that's going to fundamentally change things. If we can make energy 10 times more prevalent, 10 times cheaper, it will change the world. And I think it's a really high-potential place for us to apply the intelligence of our models.

If I can add something: we have ideas that we're excited about in terms of the potential of AI to change science, but this is very much not supposed to be a top-down effort where we dictate what AI is going to do in the world. We're actually very excited about building the best general-purpose AI.
And if we release that into the world, then everybody will take it and use it for their own purposes. For me, I'm a black hole physicist; I want to use AI to further black hole science. But for a scientist in another field, it's natural to use it for that field. And the nature of research is such that it's very hard to know where the next breakthrough is going to come from, really. So I think our vision is to push this out into the world. I think we could see a lot more adoption than we have today, and once that happens, who knows where the next big discovery will come from, but that's how we give ourselves the best chance to accelerate scientific discovery.

Yeah, it's such an important point. The frontier, the surface area, of science is massive. And this is not about what we can do within OpenAI individually to accelerate specific scientific projects. It's about giving scientists all around the world AI so that they can accelerate their work. That's how we move science forward faster. So there are pieces that we will try to do ourselves, because it'll help us learn. But for the vast majority, what we really want is to see 100 scientists win Nobel Prizes using AI.

Yeah, it feels like it's not the end of science. It's really the start.

Exactly. Exactly. There's certainly a sort of science 2.0 moment happening, I think.