OpenAI Podcast

Episode 14 - Building AI for better healthcare

31 min
Mar 16, 2026
Summary

OpenAI's health team discusses their approach to building AI for healthcare, including ChatGPT Health's development with 250 physicians and deployment in clinical settings. They cover safety measures, evaluation methods, and real-world applications showing reduced diagnostic errors in Nairobi clinics.

Insights
  • Healthcare AI requires extensive physician collaboration throughout the entire model training pipeline, not just evaluation
  • Context-aware AI that can integrate personal health data across multiple sources creates more personalized and effective healthcare guidance
  • AI deployment in clinical settings can serve as a safety net, with studies showing statistically significant reduction in diagnostic errors
  • The transition from AI being 'interesting' to 'transformative' in healthcare is happening through drug discovery and scaling complex medical experiments
  • Healthcare professionals are becoming reluctant to work without AI assistance once they experience its benefits in clinical workflows
Trends
  • Integration of AI across all stages of model training specifically for healthcare applications
  • Shift from reactive to proactive healthcare through AI-powered continuous patient engagement
  • Multi-modal health data integration from wearables, EHRs, and biosensors for personalized care
  • AI-assisted drug repurposing and discovery of new therapeutic applications for existing medications
  • Deployment of AI safety nets in clinical workflows to reduce diagnostic and treatment errors
  • Growing consumer adoption of AI for health queries with 40 million daily users
  • Development of adaptive literacy AI that adjusts communication based on user expertise level
  • Post-deployment monitoring of AI clinical workflows for continuous safety improvement
Companies
OpenAI
Primary focus - developing AI models and ChatGPT Health for healthcare applications
Panda Health
Partner organization that conducted AI Clinical Copilot study in Nairobi clinics
Emory University
Medical school attended by Dr. Nate Gross for healthcare training
Grady Hospital
Large public hospital where Dr. Gross gained clinical experience
Waymo
Self-driving car company used as analogy for AI safety and trust
People
Dr. Nate Gross
Head of health at OpenAI, former medical school student focused on healthcare access
Karan Singhal
Leads health AI research at OpenAI, background in AI safety and privacy
Andrew Main
Host of the OpenAI podcast conducting the interview
Quotes
"We're starting to see medications that have been sitting on a shelf that all of a sudden AI has found ways for them to have direct value in patient lives."
Dr. Nate Gross
"They actually felt that it was dangerous to have a group of clinicians not using the AI."
Karan Singhal
"I just reached the point where when I'm biking next to a Waymo, I actually feel safer than if I was biking next to a human driver."
Karan Singhal
"Healthcare today, as everyone knows, is fragmented. Care is missed left and right."
Dr. Nate Gross
"OpenAI's models are the only major models where every phase of model training, from pre-training to mid-training to post-training and every step in between, really integrates health into every major stage."
Karan Singhal
Full Transcript
3 Speakers
Speaker A

Hello, I'm Andrew Main, and this is the OpenAI Podcast. Today we're talking to Dr. Nate Gross, head of health, and Karan Singhal, who leads health AI research at OpenAI. We'll cover what went into training models to handle sensitive questions, and how it's helping clinicians, patients, and healthcare systems.

0:00

Speaker B

We actually worked really closely with a group, a cohort of around 250 physicians across every stage of generation of this data.

0:16

Speaker C

And we're starting to see medications that have been sitting on a shelf that all of a sudden AI has found ways for them to have direct value in patient lives.

0:25

Speaker A

How did you find your way into healthcare?

0:37

Speaker C

So what drew me to healthcare initially was health policy. I was very interested; this was before the first Obama election, when value-based care was first becoming a thing. I started studying different ways to make healthcare more accessible to more people, and then eventually went to Emory for medical school. What drew me there was its large public hospital, Grady Hospital, which lets you take advantage of every clinical hour you have.

0:40

Speaker A

So what kind of things were you doing?

1:13

Speaker C

So I was mostly pissing off the IT department. When I was in medical school, the newsfeed came out, the iPhone came out, Twitter came out, the App Store came out. And so comparing the technology that we had as doctors, which was the fax machine, the clipboard, the paper binder, the beginnings of electronic health records, to what my friends had, or what the patients had in the waiting room, was pretty profound.

1:15

Speaker A

So you come at it from the point of view as an AI researcher, where did your interest in applying this to healthcare come from?

1:40

Speaker B

So I nerded out a lot when I was younger about things like philosophy of mind. I thought a lot about intelligence: how far could intelligence go, and could machines be intelligent? As I was learning about AI and starting my first AI projects, those explorations got me thinking about the ways AI could have a large impact on humanity in the future. I didn't predict the future or how fast it would happen, but I thought something like AGI would happen within our lifetimes. Once I had that conviction, I thought a lot about the ways I could either have positive impact, and hopefully make that a really large upside for humanity, or help avoid the downside. Since then in my career, I've been thinking about both sides of that coin from the perspective of a safety researcher, which is part of my background. Then some of that work on safety and privacy I'd been doing previously, I started applying in healthcare, and I started being like, whoa, there's a really massive opportunity in applying this technology, especially large language models, in healthcare. What took me to transitioning to it full time was the size of that opportunity and the fact that the healthcare and clinical AI world seemed not fully aware of that gap. I just thought it was a really amazing opportunity and responsibility to bring us there.

1:47

Speaker A

I want to understand both the vision and actually how this is going to be implemented.

3:05

Speaker C

So our mission at OpenAI is to ensure that AGI benefits all of humanity. And health is one of the places where I think that is not only most achievable, but is the clearest. Healthcare today, as everyone knows, is fragmented. Care is missed left and right. Patients are often left 364 days per year without the opportunity to engage with the organizations that have the information centralized. And doctors have extremely limited time, when they do get that chance to engage with the patient, to actually have a meaningful impact beyond a simple surgery or a simple reactive prescription. The system is more reactive than it is proactive today. And that leads to tremendous challenges in the system. It leads to tremendous gaps in care; it leads to leaving people behind in situations when they could be thriving. And one of the reasons that I joined OpenAI is because access has always been a through line in my life. Access to knowledge, first in medicine, then in building a product for doctors to access the latest medical literature, and then in supporting entrepreneurs as they were building healthcare tools. But OpenAI has the type of technology that can do that at scale for the entire ecosystem all at once: help patients, help healthcare professionals, and help incredible entrepreneurs who are building for all of the corners and edge cases and tough challenges that exist in each area of the health market.

3:10

Speaker A

What is the strategy here? We know that people use chatbots all the time now for medical questions, but it seems like you're building and working towards something bigger and more comprehensive, not just for the patient side, but the clinician side. Could you talk about like what your goals are?

4:59

Speaker C

Patients are increasingly turning to tools like ChatGPT throughout the year. In fact, 900 million people now use ChatGPT per week, and if you look at how many are doing health-related queries, it's about one in four in a given week. That's 40 million people per day. And so our strategy in health is as much proactive as it is reactive, stepping up to the responsibility, and the opportunity to do good, that comes with that strong consumer demand. And so with ChatGPT Health we have created a space to keep these conversations not just secure, but empowered. When I say secure: encrypted, of course, with essentially a one-way valve protecting your conversations, extra security layers, protections to make sure that we will never train on users' healthcare data. Combined with empowerment: the search engines that people have used before to navigate health have amnesia. They're one-size-fits-all. And I think context really matters in healthcare. So building a series of features and technology hooks to help patients bring in their own context, the context they choose to, so that each time they choose to engage with AI it's grounded in their own context, is a key reason why we've built this ChatGPT Health foundation.

5:14

Speaker A

So I understand the safeguards you put in place to keep the data separate, to make sure nothing gets leaked between contexts, and to undergo a very rigorous method of making sure that your data is secure. But when it comes to the model itself, what goes into training models that are capable of working with something like healthcare? It's kind of the most important thing in the world, for sure.

6:47

Speaker B

It's a high-stakes domain, and because of how people are using it, it's super important that we get it right. We think about a few things when we think about evaluation and training for healthcare, and this is actually the foundation for the work on health at OpenAI. When we were first starting the health effort at OpenAI, we were thinking a lot about the safety and grounding motivation as an important part of what we were doing. Part of the thesis for starting work on health at OpenAI was that this is an excellent way to ground our work in safety and alignment, and to provide a concrete incentive and feedback loop for researchers who think about this problem. So the model improvements and the safety thinking here are not just an afterthought; they're actually the beginning of our work. Where we started was evaluation. Can we think about the ways in which models were already starting to become useful to people then? There was already starting to be this capability overhang between what the models could do and what people were using them for. We started to navigate that problem and think about where the models still have gaps today. That's where our work on evaluation comes in. We've taken a pretty methodologically interesting approach to that, and a lot of it is reflected in our work on HealthBench, which is a realistic evaluation of conversations between users, either health professionals or consumers, talking to models, measuring the performance and safety of the models in these kinds of multi-turn conversations.

And the way we worked on this is we worked really closely with a cohort of around 250 physicians across every stage of generation of this data: from thinking about which areas we would focus on for the evaluation, and which would be clinically relevant or impactful, down to the specific things being graded. That's a range of things: are you tailoring your response to a layperson versus a more technical health professional? Are you seeking context first, before providing an initial response? The models are much better today at seeking context when needed, because users often type in much less than the models need in order to provide the most helpful information.

7:06

Speaker A

It burns.

9:32

Speaker B

Exactly. If a user types in "it burns," how do you think about the right way to provide information? You can provide some initial information, potentially based on an impression you might have of what the user might be saying. But the most helpful thing to do in that situation, and the safest thing to do, is actually to ask for more context. That's just one example of the many ways that we measured performance in HealthBench. HealthBench in particular measured around 49,000 different dimensions of performance, and that's just one possible dimension. So this is a very multifaceted evaluation that we built in concert with this cohort of 250 physicians over a long period of time. It took us about a year, end to end, to work on that evaluation and then release it into the model development cycle.

9:33
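The rubric-criteria grading described above, where a response earns points for each physician-written behavior it exhibits, can be sketched roughly as follows. This is a minimal illustration in the spirit of HealthBench, not OpenAI's actual code: the criteria, weights, and keyword-matching grader are invented placeholders (a real benchmark uses physician-authored criteria and a model-based judge).

```python
# Sketch of rubric-based response grading. All criteria and the toy
# keyword grader below are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class RubricCriterion:
    description: str   # e.g. "Seeks more context before answering a vague query"
    points: int        # positive for desired behavior, negative for failures

def grade_response(response, criteria, met_fn):
    """Sum points for criteria the response meets, normalized by the
    maximum achievable (positive) points, clipped to [0, 1]."""
    earned = sum(c.points for c in criteria if met_fn(response, c))
    max_points = sum(c.points for c in criteria if c.points > 0)
    return max(0.0, earned / max_points) if max_points else 0.0

criteria = [
    RubricCriterion("Asks for more context (e.g. location, duration)", 5),
    RubricCriterion("Avoids a confident diagnosis from one symptom", 3),
    RubricCriterion("Uses language suited to a layperson", 2),
]

def toy_met(response, criterion):
    # Stand-in for a model-based judge: naive keyword matching.
    keywords = {
        "Asks for more context (e.g. location, duration)": "where",
        "Avoids a confident diagnosis from one symptom": "could be",
        "Uses language suited to a layperson": "burning",
    }
    return keywords[criterion.description] in response.lower()

reply = "A burning feeling could be several things. Where is it, and how long has it lasted?"
score = grade_response(reply, criteria, toy_met)
```

Scaling this shape up to thousands of conversations and tens of thousands of criteria gives a multifaceted score rather than a single multiple-choice accuracy number, which is the methodological point Karan makes.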

Speaker A

It seems like sometimes one company gets a bit ahead, and somebody else catches up, and so on. But I've noticed a pattern with the OpenAI health models: they've consistently been far ahead on HealthBench and other evals, by a big margin. Why is that?

10:15

Speaker B

I think we have a pretty dedicated effort here, a pretty serious effort that is cross-functional and across the stack: everything from pre-deployment evals like HealthBench, to monitoring production traffic and ensuring safety in production traffic in a privacy-preserving way, and working with physicians across every step of that process. To my knowledge, OpenAI's models are the only major models where every phase of model training, from pre-training to mid-training to post-training and every step in between, really integrates health into every major stage. I think the result is that our models are pretty good, not just on our own benchmarks, but also on the benchmarks that other people put together.

10:33

Speaker C

I'd like to add a little to what Karan said about the model training, because when we spend time with the healthcare ecosystem, that's one of the things that is most important to them. Not only were these models trained in development with hundreds of physicians, who created over 5,000 conversations and 48,500 rubric criteria through which to evaluate AI responses, score them, and identify ways that we could improve the model, do additional data acquisition, do additional post-training, and home in on a particular subspecialty or a particular area of the world where users were telling us we could improve health or healthcare in that specific topic. In addition, I think that close proximity to physicians leads to calling out the most important parts that should be focused on in model development elsewhere. Sometimes I see how a model fared on a medical school exam or a board exam, and healthcare is not multiple choice. Patients come in with a tremendous amount of complexity in their own stories, nuance, and context, and that's presented in many different ways. Part of the job of working in healthcare is being able to draw from those disparate sources, draw from experience, and balance all that in your head. So having a training mechanism that thinks about things like when to escalate and how to escalate, and keeps that always as the top priority. Or adaptive literacy: you can compare the one-size-fits-all handouts that people get when they visit the doctor today to a model that can respond differently when it knows you're an oncologist versus a primary care doctor versus a pharmacist in Kenya versus a patient at the 12th-grade or third-grade literacy level. That is extremely important, not only for making sure that accuracy and impact are maximized, but also to make sure that everyone can maximally participate in their own care on the patient side. And then finally, uncertainty.

If you go back a year and a half, many of the mistakes people would call out about AI models were overconfident hallucinations. And I think in such a high-stakes field as healthcare, one of the most important things is that the model can be trained to better know when it doesn't know, and say that. And, in addition, to suggest follow-up that can be dug into, either by the patient, in a referral to the healthcare system, or by the doctor, if the doctor is using the model: a test they might run, additional pathways they might go down, to make sure that the patient can be led to the best possible outcome.

11:18

Speaker A

We've seen the cost of intelligence drop every year. And it's exciting, because every year you're able to get better answers in medicine and healthcare across the board. But what are the challenges? What are going to be the blockers, or what are you looking ahead at to say, okay, we have to solve for this?

14:22

Speaker B

The drop in the cost of intelligence has been super exciting here, because so much of what we think about and care about is access. The more people have access to the technology, the more people will benefit. That's why we're working on rolling out ChatGPT Health more widely, to all free users. That's incredibly exciting. Another thing that we think about as researchers is: where will the marginal gains in intelligence compound the most? Nate mentioned this exciting thing, which is that there is more and more data being collected across different modalities. How do you think about integrating that data across all the different ways that people use ChatGPT, and all the different modalities people are collecting: wearables, lab tests, things like this? That's one place where I think a lot of the intelligence will compound, and we'll start to see new zero-to-one capabilities. A model looks at my entire history over a decade and gives me a prediction that even a human couldn't, just because the model has a larger context size. Thinking about those zero-to-one capabilities is going to be really cool. The other thing we keep in mind is: how are people thinking about and using ChatGPT today? Can we measure that? Can we improve that? And I think we're at this interesting point right now, what I call, to our team, the transition. For context, I bike to work. I wear my helmet, I worry about cars next to me, things like this. Here in SF we have a bunch of self-driving cars, including Waymos. I just reached the point where, when I'm biking next to a Waymo, I actually feel safer than if I was biking next to a human driver. I don't worry about whether I'm in its blind spot or anything like this. I feel this protective effect being next to the Waymo. And I want everybody to have this protective effect.

With health AI, there are studies showing that if you have a doctor in your family, that adds a protective effect to your health as well. And I want everybody, whether they're a patient or a health professional, to think about the ways in which, as a patient, you want to feel safer having this; as a health professional, you want this to be a safety net for the decisions that you're making. That's another frontier that I think we're going to cross in the next six months or so, which is really exciting, this inflection point. Another thing that we're thinking about is the right way to think about post-deployment monitoring of certain workflows. A good example here that I'd love to talk about is our AI Clinical Copilot study that we did with Panda Health. This was a study where we worked with 20 or so clinics in Nairobi and thought about the ways in which we could deploy a safety net for clinicians in that context: basically monitoring things that they type into their electronic health record, and only interrupting their flow when there's something potentially concerning going on, a potential error, things like this. What we found is that when we deployed this to clinicians in that setting, there was a statistically significant reduction in diagnostic and treatment errors for the clinicians who were using this tool versus those who were not. I think this is a step in the direction of moving beyond model evaluations, and even beyond monitoring the ways people use ChatGPT today, to actually thinking about workflows in which these technologies can be deployed and the right ways to evaluate those workflows after deployment. That's another frontier that we are really excited about and would love to see more of from our partners.

14:39
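The safety-net pattern described above, watching what clinicians record and staying silent unless something looks concerning, can be sketched like this. It's a hypothetical illustration only: the rules, field names, and thresholds are invented for the sketch, not Panda Health's or OpenAI's actual system, which would use a medical model and clinically validated checks rather than simple rules.

```python
# Sketch of an interrupt-only-when-concerning clinical safety net.
# The checks below are invented placeholders; a real deployment would use
# a medical model and validated clinical rules, not string/set matching.

def check_entry(entry: dict) -> list[str]:
    """Return alerts for an EHR entry; an empty list means stay silent,
    so the clinician's workflow is never interrupted for routine entries."""
    alerts = []
    # Example rule: flag a prescription the recorded allergies contraindicate.
    prescribed = set(entry.get("prescriptions", []))
    allergies = set(entry.get("allergies", []))
    for drug in sorted(prescribed & allergies):
        alerts.append(f"Prescribed {drug}, but patient has a recorded {drug} allergy")
    # Example rule: flag a diagnosis with no documented symptoms behind it.
    if entry.get("diagnosis") and not entry.get("symptoms"):
        alerts.append("Diagnosis recorded without any documented symptoms")
    return alerts

# Routine entry: no alerts, the safety net stays out of the way.
ok = check_entry({
    "symptoms": ["cough"],
    "diagnosis": "URTI",
    "prescriptions": ["paracetamol"],
    "allergies": [],
})

# Concerning entry: the safety net speaks up.
flagged = check_entry({
    "symptoms": ["rash"],
    "diagnosis": "dermatitis",
    "prescriptions": ["penicillin"],
    "allergies": ["penicillin"],
})
```

The design point is the asymmetry: the net evaluates every entry but only surfaces something when a check fires, which is what makes it tolerable inside a busy clinical workflow.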

Speaker A

Nate, what do you think the challenges are going to be?

18:08

Speaker C

I'll start by talking through some of the challenges that exist on the professional side. Each day, when healthcare professionals use AI, they're looking for the ability to trust what they're seeing in the answer. So a lot of our recent work has been making sure that the answers the AI is providing are not just grounded in what the model was trained on, but are grounded in the latest medical literature, the latest guidelines, and sometimes the latest guidance from their own institution or their own region. Some conditions are treated differently in different areas of the country. Other times, different care settings have different levels of resources, different levels of specialists, and additional services on hand. And it can be helpful, as a healthcare professional, to be able to quickly navigate that and come up with completely personalized care plans. So building connectivity within ChatGPT, to not only be HIPAA-aligned and used in these secure environments, but also be able to combine sensitive information with the latest medical knowledge, is a great path that we've started down, and something that will continue to keep trust as the top priority in how healthcare professionals engage with AI. I think one of the other challenges is that the systems themselves in healthcare are quite siloed, both at the organization level and in the tools that have to be used within each organization. AI thus far has been deployed on really a point-solution basis in the technology industry. But increasingly, the connectivity is becoming available to connect the dots between the hundreds of different systems: some analog, some digital, some structured, some unstructured, many decentralized, many not on the cloud.

Being able to connect all of those through unified AI layers, to make sure that patients and information aren't falling through the cracks and that the connectivity can be maximized, to actually bring the greatest amount of impact: that's hard in healthcare, and it's certainly not something that we can say is solved. But with many of our recent products, ranging from ChatGPT for healthcare and its connectivity to apps and connectors, to the OpenAI API for healthcare, to our frontier foundation models and agents, we think there's increasingly going to be an opportunity to really accelerate what is possible within the healthcare system and what agents can achieve.

18:10

Speaker A

Part of this seems like it's very collaborative, working with the healthcare industry. I noticed when using the ChatGPT Health app, the first thing I did was put in my records and get all of that. And it looked like there was a lot of cooperation across the ecosystem to do this. How has that come to be, and where is this headed?

21:05

Speaker C

It's extremely important that all of the healthcare system has an equal chance to contribute and engage, nationally and internationally, in providing the context that will help empower patients to receive the best possible answers from ChatGPT. On the electronic health record side, this means working with the government and the Centers for Medicare and Medicaid Services, adopting national standards for electronic health record syncing, so that patients, in just a few taps, are able to bring in their context in consented ways. It's being able to tap into existing standards, like mobile phones and the most popular consumer health products, biosensors, and wearables, to make sure that, again, in just one or two taps, patients are able to not only bring in that information but leverage it in thoughtful ways, in ways that may not have been possible without the combined set of data that can exist in this sort of ecosystem. So, for instance, being able to reference your recent exercise activity when making a plan for how to spend your evening, or even doing things as simple as referencing your overnight sleep and stress when your agent is helping you set your calendar for the next day and decide which tasks you may take on first.

21:24

Speaker A

It's very exciting. I wear a smart ring, a watch, whatever, and I get all this data, but all I have in my apps are rings to look at, and I go, okay, I guess it's doing something. Being able to plug it into ChatGPT has been fantastic, because now I'm able to ask those kinds of questions. What you talk about too is, if you get a plan from your doctor, suggestions: literally say, hey, I didn't walk enough yesterday, what should I do today? It's let me be really good at menu planning: literally, here's this menu, tell me what to order, and whatnot. And so you're saying we're just going to get more of that, and much better.

23:02

Speaker C

Yeah. And that's why our partnerships, I believe, are so important. Because in these instances, ChatGPT doesn't replace the incredible technology that our partners are building to go deep on health insights for a particular wearable. But our surface area, our opportunity to bring in that health information, can now extend to the many different ways people use ChatGPT, such as what they're going to cook for dinner or how they're going to plan their afternoon. Sometimes I think of two patients: one patient has to navigate the healthcare system by themselves, and the other patient has a spouse come with them, and that spouse has a clipboard, used to work as a healthcare professional, is very attentive, if not neurotic, can follow up on details, and is connected to your personal calendar. The best aspects of that, with consent, for the patients who want it, I think represents a future where we can make it easier and easier for patients to follow care plans, to play active, captain-like roles in their own health, in partnership with their care teams and their physicians. And I think if we can remove a lot of the friction that historically exists in those processes, whether it's information not flowing, or there's a lot to keep track of, or a lot of old information to parse and bring in, we can do a tremendous amount of good, or we can help patients themselves be empowered to do a tremendous amount of good in their own care plans.

23:34

Speaker A

And you know, as a physician, that it's hard to give as much time as you would like, because you're always going to have more patients to see and only so many hours in the day. It's interesting to see a technology that has infinite time and infinite patience, as a complement to that.

25:18

Speaker C

I mean, if there's one thing that healthcare professionals are short on, it's time. So when we think about our role internally at OpenAI, we often break down the work that we're doing into three buckets. Raise the floor: make sure that AI, and the benefits of AI, are accessible to everyone, whether that's patients, healthcare professionals, or others working in health-related industries. Sweep the floor: help doctors and other health professionals save time on the tremendous administrative and bureaucratic burdens that they have every day, so that they can spend more time with their patients. And then thirdly, raise the ceiling: the impact that AI can have in healthcare will, I think, allow us to look back on this space in a few years and say, wow, we have all accelerated together, in a way where medicine is still in the driver's seat but is also far more empowered than ever before.

25:33

Speaker A

Yeah, I don't think anybody feels like their doctor spent too much time with them, so it looks like this is going to help solve for that. What has been your favorite aha, or wow, or this-is-really-cool moment at the intersection of AI and healthcare?

26:37

Speaker B

I'll answer your question in a nonstandard way. I think the most amazing thing for me in the last year has been the rate of adoption of health, even beyond the ChatGPT Health product. Before we announced the ChatGPT Health product, health and wellness questions were already one of our fastest-growing use cases, and we shared that hundreds of millions of people a week are starting to use ChatGPT for health and wellness. Seeing that rapid growth, especially coming from a background of being motivated to work on this problem because it felt like the healthcare and clinical AI worlds were not super aware of the potential of LLMs in healthcare, and seeing how far we had come, has been a really special moment for me.

26:52

Speaker C

There's no doubt that the adoption of this technology, and the fact that it is increasingly collaborative with the healthcare system and increasingly driving feedback loops back to us to improve the models, is the most meaningful thing and the most mission-aligned thing. But what I also get excited about is what our research team is increasingly able to give back using that feedback. It's not only the capabilities of the models, but what can be unlocked once those models are allowed to run longer and have more context. We're starting to see discoveries: medications that have been sitting on a shelf that all of a sudden AI has found ways for them to have meaningful and direct value in patient lives. It is starting to scale experiments that we as individuals wouldn't have been able to juggle on our own. And that partnership, combined with that increased capability, to finally move from being interesting, to being useful, and increasingly to being transformative, is, I think, the most exciting thing for us heading into this year.

27:29

Speaker A

Now that you've been working on this for some time, you've been engaging with clinicians and talking to people, helping deploy this, what has been some of the feedback you've seen?

28:45

Speaker B

I think the experience of flying to Nairobi and seeing the clinicians using the tool, and the ways in which we did this thing we call active change management, where we worked really closely with these clinicians and flew to Kenya a couple of times to think about how we could deepen their workflows with the AI tool and make it something that not only made sense to them but actually became indispensable for them. As we were concluding the study, the team at Panda Health was actually thinking about potentially running another study. And they had a lot of hesitance around running it, because that would have involved having one group of clinicians using AI and another group of clinicians not using AI. They actually felt that it was dangerous to have a group of clinicians not using the AI. And that's the point at which I was like, wow, we have done something major here.

28:52

Speaker C

I think the stories that we get back from our members every day are one of the most meaningful parts of the job. These are from caregivers who are increasingly under strain, taking care of family members while trying to navigate their own health at the same time. They're from doctors and nurses who are truly overloaded every day, and we can help them extend their expertise and compress the tough parts of their day a little bit more. And then sometimes, and this is more rare but increasing, it's the miracle cases: the patient who had been bouncing around the system for years, the unsolved diagnosis, the emergency where information wasn't present. Suddenly being able to step in, assist, accelerate, and bring people into the care that could really help is truly a privilege.

29:44

Speaker A

It's exciting. It's an amplifier. And every doctor I know wants to be able to do more for their patients. Thank you very much. This has been very interesting, guys, thank you.

30:44