AlphaFold: Grand Challenge to Nobel Prize with John Jumper
47 min
Nov 28, 2025
Summary
John Jumper, 2024 Nobel Prize winner for AlphaFold, discusses how the AI system revolutionized protein structure prediction and is now expanding to model all biomolecules. The conversation covers AlphaFold's evolution from version 2 to 3, its unexpected applications across biology, and how it's accelerating drug discovery and protein design while remaining a tool that requires experimental validation.
Insights
- AlphaFold's greatest impact isn't solving protein folding—it's becoming embedded in scientific workflows, with 35,000+ papers citing it and enabling scientists to test hypotheses 1000x faster than traditional methods
- The shift from AlphaFold 2 to 3 involved removing evolutionary data and adopting diffusion architecture, paradoxically making the model simpler, more accurate, and capable of handling DNA, RNA, and small molecules simultaneously
- AI tools in biology should be evaluated on utility and reliability (via confidence metrics) rather than interpretability—the field is moving toward Roman engineering pragmatism over perfect understanding
- Protein design remains unsolved despite structure prediction success; designing functional enzymes requires solving multiple downstream problems (speed, specificity, manufacturability) beyond just getting the 3D shape right
- The future of computational biology likely involves fusion of narrow AI systems (AlphaFold, AlphaGenome, AlphaProteo) with large language models to integrate structural, genomic, and literature data into unified biological understanding
Trends
- AI-accelerated drug discovery shifting from target identification to multi-property optimization (toxicity, metabolism, formulation)
- Protein design moving from academic curiosity to commercial application (enzymes in detergents, therapeutic proteins)
- Computational tools narrowing hypothesis space in biology, improving experimental priors through Bayesian integration
- Cross-disciplinary AI applications emerging unexpectedly (bumblebee conservation, sperm-egg fertilization mechanisms)
- Confidence metrics and uncertainty quantification becoming standard practice in AI-assisted scientific workflows
- Enzyme engineering for climate applications (carbon capture, microplastic degradation) gaining computational feasibility
- Multi-modal AI integration in biology combining structure prediction, sequence analysis, and natural language understanding
- Shift from interpretability-first to utility-first paradigm in scientific AI tool adoption
- Protein-protein interaction prediction becoming foundational for synthetic biology and therapeutic design
- Open-source biology databases (AlphaFold Database with 200M+ structures) democratizing structural insights globally
Topics
- Protein Structure Prediction
- AlphaFold Architecture and Evolution
- Drug Discovery and Development
- Protein Design and Synthetic Biology
- Diffusion Models in Biology
- Confidence Metrics and Uncertainty Quantification
- Multi-protein Complex Prediction
- Enzyme Engineering
- AI Interpretability vs. Utility
- Computational Biology Workflows
- Biomolecule Interaction Modeling
- Cryo-electron Microscopy Integration
- Climate-focused Protein Design
- Large Language Models in Biology
- Scientific Tool Adoption and Validation
Companies
Google DeepMind
Developer of AlphaFold and related AI systems; Jumper is a researcher there; won 2024 Nobel Prize in Chemistry
Isomorphic Labs
Using AlphaFold for drug design and small molecule discovery; applying structure prediction to therapeutic development
University of Washington
David Baker's lab pioneered protein design work leveraging AlphaFold predictions for synthetic biology applications
People
John Jumper
2024 Nobel Prize winner in Chemistry; AlphaFold co-creator; discusses his career path and the tool's impact on biology
Demis Hassabis
Google DeepMind co-founder; 2024 Nobel Prize winner in Chemistry alongside Jumper for AlphaFold development
Hannah Fry
Podcast host and science communicator; conducts interview and provides context on AlphaFold's significance
David Baker
University of Washington researcher; pioneered protein design methods using AlphaFold; 2024 Nobel Prize winner
Max Jaderberg
Isomorphic Labs researcher; discussed AlphaFold applications in drug design in previous podcast episode
Rebecca Paul
Isomorphic Labs researcher; discussed AlphaFold applications in drug design in previous podcast episode
Quotes
"I have to be the first person to get into AI because of a lack of computational capability rather than an abundance."
John Jumper•Early career discussion
"The most useful thing that AI has ever done."
Hannah Fry•Episode introduction
"We've made it 10% faster overall across the whole thing. We've amplified this enormous effort and societal expense."
John Jumper•AlphaFold impact discussion
"A protein structure costs about $100,000 and a drug costs about a billion, right? So they can tell you that it can't all be protein structure determination."
John Jumper•Drug design limitations
"If you train a model to be really, really good at a task, it has to learn a lot of deep facts."
John Jumper•Model generalization discussion
Full Transcript
I think we'll get this ability to poke the cell in exciting ways, to interrogate it, and every time we develop that, we'll develop a more interventional understanding of the cell that we will bring forward to medicine and synthetic biology. Welcome to Google DeepMind, the podcast. I'm Professor Hannah Fry. Now today we are talking about AlphaFold, one of the most extraordinary technological breakthroughs in modern science, a tool that has been described as the most useful thing that AI has ever done. And in truth, that might be an understatement. This is a Google DeepMind AI system that solved one of biology's grandest challenges, predicting the 3D structures of proteins, the fundamental building blocks of life. Its latest version, AlphaFold 3, can now model the structure and interactions of all of life's molecules with unprecedented accuracy. And the impact has been seismic. AlphaFold has mapped hundreds of millions of protein structures and more than 3 million researchers across 190 countries now use its database. It is transforming drug discovery. And in 2024, the Nobel Prize in Chemistry was awarded to Google DeepMind's Demis Hassabis and John Jumper, who is our guest on today's podcast. Now this is a story that we have been following on this podcast since season one, which was nearly eight years ago, long before it hit the headlines. So if you are coming to AlphaFold for the first time and wondering what all the fuss is about, you can find our previous Explain It episodes linked in the description. Welcome to the podcast, John. It's exciting. I don't think I've interviewed you since you won your Nobel Prize. Where were you when you found out about it? I stayed home because I was nervous enough. I thought there was a chance, like a one in 10 chance. And so I figured I would be disappointed at home. And I was kind of just sitting in the bed. My original plan was I'll sleep through it and a phone call wakes me up, then I've got the Nobel, but I couldn't sleep. 
Because you knew the day that the phone call might happen, right? You know the day. I knew, in fact, the kind of time that it was scheduled to be announced at 11. And I knew that winners were called about an hour beforehand. So by about 10.30, I said, oh, well, I guess not this year. And I told my wife. And she goes, no, no, wait. And as she's telling me to wait, my phone lights up with a phone call from Sweden. And thankfully, it was not the world's meanest prank call. And yeah, it was just kind of this extraordinary thing. And you answer and they say, is Dr. John Jumper available? Yes, I have some wonderful news. Great. Can you please hold? Right? So they get, I think, Hans Ellegren. Well, they make you hold. Well, I think they were trying to read. Part of the problem was they didn't have either Demis or my phone number initially. So anyway, they ended up calling us very late. But then they were finally arranging and they pulled the person on. He says, you know, I have some life-changing news. And they don't say the word Nobel for like 60, 90 seconds, which was the longest minute of my life as there is no other explanation for this call. And I remember the very first thing I did is run to get a shower because I knew I was going to get no time for the rest of the day. But after that, you know, it was announced. You come in, you see the team, you have this amazing kind of celebration. We bought the local Waitrose out of sparkling wine. Only the best will do. It was, I'm not a connoisseur. And we were celebrating with friends. And there was just this incredible kind of party just across the floors of our building. It was amazing. The thing is, it's an extraordinary story of you as an individual, right? Because your first PhD, your physics PhD, you dropped out, right? Yeah, yeah. 
And so going from that, which I think must have been quite a hard experience to live through, to like being a Nobel Prize winner and having your tool being used in tens of thousands of academic papers. I mean, I will say dropping out was a very lucky thing for me. I was doing the wrong thing. I didn't really want to, and so I just left. And because I left, I actually fell into this computational biology group that was doing amazing work on custom computer chips to simulate proteins. And then I go back and I do my PhD now in chemistry by another set of accidents. And I didn't have those great computers. So why not get into AI? Why not try and use sophisticated algorithms to make up for, you know, a lack of compute? I have to be the first person to get into AI because of a lack of computational capability rather than an abundance. And then I got lucky enough to kind of find a job that had something to do with everything I'd ever tried to do in my past. And it worked out and I get a Nobel. Do people react differently to you now then? Oh, I mean, well, I think there's all sorts of people. You know, they're the people that I did my chemistry PhD with who knew me as a, you know, pretty good physicist and a lousy chemist. There was, you know, the people that I work with every day. And I'm still, I think, just John, but now John with the Nobel, so he's busy. But then there are all the people I meet. I mean, I would, you know, I'll get on phone calls. And a surprising number of my phone calls start with, it's such an honor to speak with you. And I sometimes think, and also with you. There's a certain type of deference or at least excitement, and it's a symbol of this giant AI world and what it can mean in terms of applying AI to solve real-world problems. And then you're a Nobel Prize winner, so you're allowed to have an opinion on anything, and it's supposed to be a bit valid even if it's not. 
So people want you to show up to things just so that you can symbolize that a Nobel Prize was won, and then you're done, which is not a very satisfying thing to do as a scientist. But you have this platform that maybe you can use to affect how the public thinks about science, how it funds science. So all of these things kind of roll together in this wild combination. I'm, I would say, roughly at the midpoint of my career in terms of kind of time since undergrad and time since rough retirement. And so I've got to figure out what to do in the second half. And that's a, you know, that's a fair amount to live up to. Yeah, absolutely. A lot of pressure going forward. The thing is that we're, I mean, we're still only five years on from that CASP breakthrough, really, when AlphaFold 2 smashed the prediction challenge. Did you realise at the time the potential significance of the work that you were doing? We were sure of two things and totally unsure of some others. I think the two things we were pretty sure of, we were very sure it worked even before we entered CASP. We had measured very well, we knew about how we would do in CASP. That we understood and we were careful. We knew that we had solved this grand challenge, but the normal thought in a grand challenge in science is that you'll solve it and it'll be a great celebration and then you will go build effective, useful systems that use the ideas that enabled you to solve the grand challenge and that this was kind of the beginning of an era. I think the real shock to me is those weights that we train, and that system, that piece of computer software, has been so incredibly practically important to scientists working in this field to this day that the actual bit of software is used that makes this difference in all these different application areas, all this different type of science published on top of this as a black box computer program. 
And the extent to which that has entered into scientific practice has been really, I think, beyond my imagination. Yeah. I mean, it's really difficult to overstate the genuine significance that this has had. I saw one thing where AlphaFold was described as the most useful thing that AI has ever done. That hasn't sort of landed with the public yet, has it? I think it's hard for people to appreciate, and you work in science communication, how very hard science is, how very hard curing disease is. We have to work extremely hard to get smaller bits of knowledge about how, say, the cell works, how the body works. For protein structure prediction, or protein structure, what AlphaFold does, I think it's hard for people to appreciate that this process takes a year in the lab. I've seen PhD theses that are progress toward determining the structure of X. And that doesn't mean they finished it, just they feel like they're a little bit closer and they need to graduate. I think- Of one protein. Of one protein. And the notion that we'll turn that work into a machine that gives you a really good answer in five minutes. And then that enables so much more work downstream of it. I think there's something like, I haven't looked at the recent number, but 30, 35,000 different scientific papers that cite AlphaFold. 35,000 different contributions to our understanding of biology that build on top of this advance. And I think the right kind of way to think of AlphaFold is not certainly that we've solved all problems in biology. We very much haven't. I think for this slice of biology that cares about what structures in the cell look like, structural biology, maybe we've made it 10% faster overall across the whole thing. We've amplified this enormous effort and societal expense. And then ultimately we will have transformative science. And there are certain narrow areas, say protein design, that are just being transformed by this understanding. 
I think one of the ways for me that really demonstrates just how important this was as a breakthrough was the way that biologists reacted when you published the 200 million protein structures. Just tell me a little bit about that because you just put it out there, right? Oh, yeah, yeah. So the original release was a bit smaller, but was still I think it was like 400,000. What I remember was there was maybe a week in between when we had put out our code and the real experts were playing with it. And they were like, this really works on hard problems. But all the other biologists were, no, no, no. These can't be real hard problems like I work on. And then, of course, we put out this huge database, AlphaFold database. And people are like, well, let me just see how dumb the AI engine was and click on their protein of interest, I think, expecting to make fun of it. And then they sat there and they were amazed. I saw this comment from someone on Twitter saying, how did they get a copy of my structure? How did DeepMind get this thing that I had done and not yet published? Like they couldn't believe that this was like literally a machine doing years of painstaking work like this all at once kind of a flash. And what was I think also amazing about it is how rapidly this turned into a community understanding of what AlphaFold does, what it doesn't do, for example, not sensitive to single amino acid changes and how to build this into their workflow and do work on top of it. I thought it would take kind of years for people to really figure out what's the right way to build it in. How do I make sure that I look at the confidence measures where AlphaFold is saying this looks like a reliable answer, this doesn't. This happened within a matter of months, that science is a community, that people developed this incredibly rapid, not totally perfect, but actually really good understanding very, very fast. And so people were doing excellent science on top of AlphaFold. 
You know, we released this in, I think, July or something like that. And people were almost immediately doing really excellent science on top of it, that really cool work was coming out by the end of that year. And I think that's just a testament to how much scientists are looking for really effective tools that help them push knowledge forward. And then when they find it, they use it well. How do you stay on top of the work that people are doing with AlphaFold now? I mean, because it's so embedded in the way that people do biology now? I mean, presumably they're not sending you an email every time. Thank goodness. I actually still pretty often just put the word AlphaFold into, you know, search on whatever, X, and just see the random work. What I love is the random things that pop up that say, oh, we used this to do this weird thing. I think the other way, one of the nice things about being at a company is if really cool things happen, someone will notice and they'll post it to this chat room that we have to collect various, you know, things that people find cool. And so it's so valuable to collect that experience. And it's so much fun, right? You feel this kind of vicarious ownership of a little bit of that work. Well, OK, tell me then, what is the most random, unusual use of AlphaFold that you've come across? One that I really love is this protein in bumblebees. And they're trying to understand bumblebee populations, you know, reproduction, but their biology to try and, you know, enable pollination and understand things like colony collapse. And so there were some important proteins involved in the honeybee life cycle that they were studying with AlphaFold. And you can see ultimately how this kind of leads on to bee conservation. And I think it's so interesting to see the structural biology of this echoing into all these things that we care about, from food to industrial production to everything else. 
They're all connected because it's all the same biology, right? You know, plants, animals, they have basically the same proteins. We were definitely thinking most about human health. We weren't thinking, how am I going to help honeybee populations? But here we are. There was another really nice story that people were trying to understand human fertilization, when an egg and a sperm meet and come together and eventually fuse. Right. So they want to find the exact proteins that were involved in sperm sticking to eggs. In sperm sticking to eggs. And they, I think, had the full picture of all the egg proteins, but not of all the sperm proteins. And in fact, there were two independent groups that did this. And they said, well, there are only 2,000 proteins that we know that are on the outside of sperm. Why don't we just try all of them and see which ones stick to the proteins that we know are on egg? And if you think about doing this experimentally, that's like, well, I'll spend the next two millennia, you know, a year at a time, $100,000 each in the next millennium, and I'll get a nice paper in Nature. That's not a feasible approach. But AlphaFold is pretty fast, and they had some computers available, so they tried all of them. And then they both came out with this one protein, T-MIM something, I can't remember the number. But this was the one that they didn't know what it did before, and now they find out. AlphaFold says it sticks to the egg and that this is how kind of the first step of fertilization. And so, of course, they don't just trust AlphaFold, right? It's a computational system. So they went and they said, well, what happens if I remove that protein or if I change that protein? I find if they change that protein or remove that protein, then sperm and egg will get close, but they won't fertilize. So you go from this broad hypothesis. There's some protein on the surface of sperm that does this. AlphaFold says, I think it's this one. 
And then you go do your detailed experiments to confirm. And now you can think about questions like infertility. Now if you see mutations in that protein, maybe that's a cause of infertility. Maybe we can think about treating that. And we go from kind of rough hypothesis, AlphaFold in the middle, confirm with experiment, and now maybe we can think ultimately about something like drug design on top of this. But we have to get this biological understanding first that we bring meaning to all those pieces in the cell. And that's what AlphaFold, I think, really helps with in the early stages, is bringing meaning to the parts of the cell. And then later, companies like Isomorphic use it in order to build small molecules that have targeted effects. We should talk about the difference between AlphaFold 2 and AlphaFold 3, though, right? Because, I mean, AlphaFold 2 was like predicting the structure of proteins from these strings of amino acids. But of course, biology isn't only about proteins, right? You've got all of these other biomolecules. You've got DNA, you've got RNA, you've got, like, small drug molecules, for example. You've got ions, you know, charged particles, et cetera. Like, all of these are interacting with each other. So how early in the process did you know that you needed to change the fundamental model of AlphaFold 2 in order to incorporate those additional molecules? So even before the world knew about AlphaFold 2, we were sitting there and dreaming. And we were dreaming for two reasons. One is that we had a lot of, for example, proteins that exist naturally in what are called complexes, multiple proteins stuck together. And sometimes there's no real way to predict their structure without predicting them all together. So we're already thinking about this multi-protein problem. But we also almost immediately said, well, a lot of proteins, for example, as you say, bind drugs, small molecules, maybe 20 atoms. You can think of aspirin, right? It sticks to a protein. 
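The fertilization study described a moment ago is essentially an exhaustive in-silico screen: fold every candidate sperm-surface protein against the known egg-side proteins and rank the pairs by predicted interaction confidence. A minimal sketch of that workflow follows; the scoring function is a deterministic placeholder standing in for a real structure-prediction call, and the protein names are illustrative, not taken from the episode.

```python
import random

def predict_complex_confidence(egg_protein: str, candidate: str) -> float:
    """Placeholder scorer. A real pipeline would fold the pair with a
    structure predictor and return an interface confidence in [0, 1].
    Here we just derive a deterministic pseudo-random score from the names."""
    rng = random.Random(egg_protein + "::" + candidate)
    return rng.random()

def screen_candidates(egg_protein, candidates, top_k=5, threshold=0.8):
    """Score every candidate against the egg protein, keep the top_k
    hits whose confidence clears the threshold."""
    scored = [(c, predict_complex_confidence(egg_protein, c)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Everything below threshold stays an untested hypothesis.
    return [(c, s) for c, s in scored[:top_k] if s >= threshold]

# ~2,000 known sperm-surface proteins, as in the story above.
candidates = [f"sperm_protein_{i}" for i in range(2000)]
hits = screen_candidates("egg_receptor_JUNO", candidates)
for name, score in hits:
    print(f"{name}: predicted confidence {score:.3f}")
```

In the real studies, only the handful of top-ranked pairs, not all 2,000 candidates, went on to knockout and mutation experiments in the lab.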
And we knew that this was really important. We said, but later. And we started to talk about this kind of dream about this goal we would call whole PDB, right? So the protein data bank, the PDB, is the data source we use. But we take it in and we threw away a lot of the things. oh, this has RNA or DNA attached to the protein. Well, let's throw away the RNA or DNA and just have AlphaFold predict the protein. Because you can't handle those extra molecules. We couldn't handle that complexity, that we were very driven by, we have 20 amino acids that produces 20 types of structures, and then we will predict, and all our code was kind of based around that. And we're like, eh, it's a challenge for later, but eventually we'll start doing it. And one of the things we almost immediately realized is a lot of the decisions we made in AlphaFold were very good and very helpful and very annoying to extend to more complicated things. The other bit of work is that we were trying to figure out how to simplify AlphaFold. And we thought, OK, well, AlphaFold is complicated, but maybe there are some things we can remove. How has the architecture shifted from AlphaFold 2 to AlphaFold 3 then? So there's a lot of changes, but I would say there's two big themes of changes. When we were trying to handle much more of the kind of the DNA, small molecules, etc., we adopted this thing called a diffusion architecture, a different way in which we handle our uncertainty. And I can tell you more about that. But then I think the other one was really thinking a lot about the role of evolution and evolutionary data. Well, let me ask you about that then, because this is one of the things that I remember being quite key to Alpha Fold 2, this idea that actually proteins have evolved in lots of different creatures numerous times. And actually, there is like some clues about the evolutionary history that will indicate where amino acids are likely to end up in the final folded shape. 
So that even if you're starting with a string of amino acids, you're not going in totally blind because, I mean, this sort of stuff's happened before. Right. And that ended up being quite a key part of the model. But it was also, I think, potentially one of the parts that made the model quite, I don't know, like inflexible to other molecules. Is that fair? Yeah. So AlphaFold 2 used evolutionary information in this exuberant way. At kind of every part of almost every block, it was saying, and here's the evolutionary information in case you need it. But a lot of what we studied in AlphaFold 3 that we knew we were moving toward didn't have evolutionary information. So we were shouting at it with nothing. And we were kind of worried that this was both slowing down the network, but also possibly leading to some bad dynamics in how it works. And so we decided to just take that out of most of the network and otherwise emphasize the geometric information, the thing that really is always there. And that turned out to work exceedingly well, actually better than we expected. I wonder if there's an analogy that we can use here, okay, for the difference in this architecture, right? Let's imagine that you're planning a wedding and you've got to do the seating plan for the entire wedding, right? And you have all of these guests, those are your amino acids. And you have to sort of work out where each one's got to sit. And there's a couple of different ways you can think about this. So you could think about pairwise interactions. So like this person sitting next to this person, is that a good interaction? But then what does that mean for this person over here or that table over there? But you could also potentially think about the history of what you know about those people. How am I doing so far as an analogy? I can go for this. I think, you know, you know, some people went to school together. Some people used to date and had a terrible breakup, right? Those might be... 
Don't sit them together. You probably don't want to sit them right next to each other unless you're really looking for sparks. But yeah, I think before we just talked about where the wedding guests sit. Now we think about where the flower arrangements are. That's nice. You know, that we think about all these other things that come together to become the reception dinner. In this analogy then, Alpha Fold 2 was very focused on the history of the guests, right? It was sort of like continually checking, looking at where they might best fit based on their past. And that's great for proteins, but it's difficult once you try and include other elements of the reception, like the flowers, like the food. And once you start bringing in other biomolecules, you don't want to focus so much on history. I think that's all true. I think one thing, though, I would say is that we kind of always made this history available, this evolutionary history and the analogy kind of what we know about their past. And what we would find is that AlphaFold, we think, was not relying on it much other than at the very beginning, that it was like saying, oh, well, these people should probably be together. These people should probably be apart. I know a couple of things, but then it kind of trained itself to ignore it. And so by inspecting and seeing that we're probably not using that information, maybe we should stop attaching it constantly into the processing. But then as a result, you managed to massively simplify the model. Well, I wouldn't say we massively simplified, but we made it massively more accurate. and suddenly we were doing new problems. And in fact, we made light adjustments and then we made a much better model. And then it turned out that even that protein-protein problem, something that has nothing to do with ligands or nucleic acids or anything else, even that protein-protein problem got massively better from this kind of science and improvement. 
Also in AlphaFold 3, you have this diffusion element. And I mean, I know that that comes up a lot in video generation, for instance. How does that fit in here? So diffusion is this different idea in how you train a neural network. The AlphaFold 2 system was really heavily based around the kind of shape of a protein backbone. In AlphaFold 3, we went to diffusion, where you basically say, here's a blurry image of the protein. I kind of took all of the protein and added some noise, some error, like you looked at it in the wrong prescription glasses, and then guessed the right answer, and you have it constantly refined. And so what this gave us was a really great understanding of local geometry, of how to make things extremely precise, because that's what it does at small scales, and this way of tackling big systems. And that gave us a kind of new approach that we didn't have to get so involved in the details of exactly how proteins look, because they're different than DNA. They're different than RNA and small molecules. And the upside is that it made it really, really easy to kind of handle this wide universe of things that we study. The downside is that it led to a higher rate of hallucination, of weird stuff appearing, and so then we needed to handle that in different ways. Well, this is one of the big differences between AlphaFold 2 and AlphaFold 3, right? That you have this introduction of the stochasticity, the potential for hallucinations. How much should people be concerned about that when they're using it? Is there a danger that they think, because AlphaFold 2 was so on the money, that they think of AlphaFold 3 as though it's some kind of oracle? I think one of the wonderful things about biologists is that, you know, as scientists, they're deeply skeptical of their tools. AlphaFold 2 did have an advantage that wrong answers often look stupid. No one looked at that and said, that's definitely a protein. 
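The noise-then-refine loop John describes can be illustrated with a toy numerical example: start from random coordinates, repeatedly ask a "denoiser" for its best guess of the clean structure, and move a fraction of the way toward that guess. This is only a caricature of diffusion sampling; the denoiser below simply knows the answer, whereas in a real model it is a learned network and the noise schedule is far more careful.

```python
import math
import random

# Pretend "true" 2D backbone coordinates for a four-residue chain.
TRUE_COORDS = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.5), (4.5, 1.5)]

def denoise(noisy):
    """Stand-in for a trained network: given a noisy structure, return the
    best guess of the clean one. Here it simply knows the answer."""
    return TRUE_COORDS

def sample(steps=50, seed=0):
    """Start from pure noise and iteratively refine toward the denoiser's guess."""
    rng = random.Random(seed)
    coords = [(rng.gauss(0, 5), rng.gauss(0, 5)) for _ in TRUE_COORDS]  # pure noise
    alpha = 0.2  # fraction of the way to move toward the guess each step
    for _ in range(steps):
        guess = denoise(coords)
        coords = [(x + alpha * (gx - x), y + alpha * (gy - y))
                  for (x, y), (gx, gy) in zip(coords, guess)]
    return coords

final = sample()
# Root-mean-square deviation from the true coordinates after refinement.
rmsd = math.sqrt(sum((x - tx) ** 2 + (y - ty) ** 2
                     for (x, y), (tx, ty) in zip(final, TRUE_COORDS)) / len(TRUE_COORDS))
```

After 50 refinement steps the residual error shrinks by a factor of roughly 0.8^50, so the sampled coordinates land essentially on the target, which is the "constantly refined" behavior described above.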
Whereas wrong answers in AlphaFold 3 are sometimes more plausible. But I think that people have gotten really good, not uniformly perfect, but really good, at saying, well, AlphaFold 2 is also telling me how accurate it thinks it is in the confidence measure. I should also use that. And so it's this kind of social knowledge. There's no experiment or tool that scientists use that is without limitation, right? Even experimental structure determination has all these known faults. And so scientists, I think, use it relatively well. I haven't seen any uptick in really bad conclusions from people using AlphaFold 3. I think because it's now such a part of the education and community of scientists that when you use computational methods, here are the confidence measures you look at. We color our proteins by confidence. And ultimately, we also think of it as a tool where we'll induce hypotheses and we'll test those hypotheses experimentally. And AlphaFold, too, is not perfect by any means. It's just very, very useful. How important is interpretability in all of this? I mean, this idea that humans want to understand why AlphaFold is folding a protein in a particular way. There's a lot of interest in it. And you will hear people sometimes make very confident pronouncements that, you know, we can only use AI systems if we perfectly understand what they do. And they almost mean, if I can write down an algorithm that I could use instead of that AI system. And I think it's this desire of, that's an annoying black box. What if it just wasn't a black box? And I kind of feel like that narrow demand for we must understand it perfectly is honestly kind of a weird demand. And I think about cases in which we've been perfectly happy not having that in science. One, for example, is just experimental science in general. 
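The "color our proteins by confidence" habit is easy to reproduce because AlphaFold writes its per-residue confidence score (pLDDT, on a 0-100 scale) into the B-factor column of the PDB files it outputs. Here is a short sketch of flagging low-confidence regions; the four-residue PDB fragment is made up for illustration.

```python
# Minimal PDB fragment (invented values): the last numeric column on each
# ATOM line is the B-factor field, which AlphaFold repurposes for pLDDT.
EXAMPLE_PDB = """\
ATOM      1  CA  MET A   1      11.104   6.134  -6.504  1.00 95.20           C
ATOM      2  CA  LYS A   2      12.004   7.000  -5.000  1.00 88.10           C
ATOM      3  CA  GLY A   3      13.500   8.250  -4.100  1.00 62.40           C
ATOM      4  CA  SER A   4      15.000   9.000  -3.000  1.00 41.70           C
"""

def plddt_band(score):
    """Standard AlphaFold confidence bands."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"

def residue_confidence(pdb_text):
    """Map residue number -> (pLDDT, band), reading CA atoms only."""
    bands = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resseq = int(line[22:26])          # residue sequence number
            plddt = float(line[60:66])          # B-factor column holds pLDDT
            bands[resseq] = (plddt, plddt_band(plddt))
    return bands

bands = residue_confidence(EXAMPLE_PDB)
for resseq, (plddt, band) in bands.items():
    print(f"residue {resseq}: pLDDT {plddt:.1f} ({band})")
```

The bands mirror the AlphaFold Database convention: above 90 very high, 70-90 confident, 50-70 low, below 50 very low, which is exactly what scientists look at before trusting a region of a predicted structure.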
If you look at, say, how someone crystallizes a protein: early in crystallography, it wasn't clear if those structures were going to look just like a free protein floating in liquid. And more experiments kind of said, most of the time, it's about right. So in science we have always worked in this kind of partial-interpretability way. I think there are really good applications of interpretability when we think about, okay, I want to understand the network so that I can change it and make a better version of, say, AlphaFold. And I described some stories earlier about how we do that kind of work. Some people will say, well, I want interpretability so I can trust it. And I think more important than that is, if you really want to know whether you trust an answer, well, we have pretty good characterization that our confidence metrics are a reliable guide, and people use them in practice to decide when an answer is probably true. I'd love to see a lot more interpretability work go on for AlphaFold. What exactly leads it to generalize so widely? I think there could be more done, but that won't necessarily give people what they think it will give them. I think sometimes, like, okay, the Romans, for example, were building bridges and aqueducts without having a full understanding of gravity, right? They didn't have Newton's equations. Is there a way in which AlphaFold here is like Roman engineering, but for biology? Like, we are able to build stuff now with the tools that you and your team are creating, even though we don't necessarily have full insight into why they're working. I mean, you don't have to go all the way back to the Romans for this. Think about a modern jet airplane or a modern car. We understand the Navier-Stokes equations of fluid dynamics, et cetera, a bit about turbulence. Even with that low-level understanding, we both build wind tunnels to measure flow, and we build simulations that, for this precise wing geometry, show you how the air goes over it.
I think the Roman bridge building is maybe a better analogy of how we do AI development, where we have some intuitions, just like the Romans had intuitions, and they built some beautiful bridges. They didn't have all the equations and full understanding, and yet they built things they needed, and they were able to drive carts across the bridges they built. So we are, in that sense, operating partially on intuition in AI. But downstream, for the people using tools like AlphaFold, I think it's more like having a great computational package: maybe your expertise is not exactly in how this airflow results in turbulence, but you figure out how to change it and adapt, and you work with this tool to do your larger-scale science. And I think that's really the slightly less Roman version of what AI users are doing. Hey, look, no shade on the Romans. What have the Romans given us? Yes. Exactly right. Okay. Well, let's talk about some of those downstream applications, because earlier this year I got to speak to Max Jaderberg and Rebecca Paul from Isomorphic Labs about how they're using AlphaFold in drug design. What has that been like, to see this thing that you built actually being implemented in drug design in that way? I think it's just really extraordinary to see it carried so far and to be a part of it. One of the things about drug design is that it's not just protein structure prediction. I like to remind people that a protein structure costs about $100,000 and a drug costs about a billion, right? So that tells you that it can't all be protein structure determination. I think it's really exceptional to see people trying to build on and take these ideas further, and really find a way to integrate it into application. We see this across the pharma industry.
Like, how are we going to build processes around this that enable us to ultimately end up with molecules that are dosed in patients, that pass all these different tests? Some of it will help with how a molecule sticks, or what the biology is, whether this protein is a target at all. And some of it we have very little to do with: you know, will this drug be metabolized in the liver? Maybe there's a protein-small molecule interaction that you can use to help with that, but for the most part AlphaFold is probably not the tool. And I think it's really important that we have both the work in how we understand biology and then the work in specifically how we make molecules to drug targets. It's an exciting combination. I think one of the things that I hadn't quite appreciated until having that conversation with them is that finding a molecule to bind to a particular protein target is such a small subset of curing disease, right? I mean, Alzheimer's is an example where we know that proteins are involved, but there isn't even necessarily a place to target yet. Well, we don't even know. We still don't know if amyloid beta accumulation is in the causal chain. Is it a symptom? Right. That's a protein. But breaking that up, I mean, we're starting to see a bit of effect. I think it's an example of one of the most important things to say. Think about finding a drug that sticks to a protein, finding a drug that's non-toxic, at least for the most part in animal models, doing all these hard stages of early drug design that we rightly say are very, very difficult, that take people years. But the bigger problem to me is that even when you do all that, 90% of drugs fail in clinical trials. So even though you do all those things right, they still don't work or they're still not safe, right? We determine this experimentally. And a lot of this is our grand ignorance of biology, right?
That we don't know the causes of Alzheimer's, of autism. Even when we have ideas of cause, for example Huntington's, with very clear genetic correlates, still making a molecule that actually makes those patients' lives better is so very difficult. There are so many giant problems left. We're only starting this world of computational biology, or, well, continuing, let's say, but still there's so much left. You know, one case study I like from AlphaFold, one that's somewhat recent: people were trying to understand how cholesterol is moved around the body. There's a protein that is involved in the transport of fatty molecules from one location to another; I believe it is also found in some of the plaques that build up that are correlated with heart disease. And even as we start to understand that biology, we have this nice piece that AlphaFold contributed: the detailed structure of this molecule, of which the experimentalists could only take an extraordinarily fuzzy picture with a method called cryo-electron microscopy, which is not an uncommon outcome for that technique. But then that fuzzy picture actually matched really well with the AlphaFold structure, and now you can say, okay, well, this is the thing that is moving cholesterol; maybe I can interfere with or change how it moves cholesterol. Maybe I can add a small molecule. But of course, the first thing you might say is, whoa, that's the protein, why don't you just add a drug that blocks it? And I think you would immediately find out that would be really bad. Your body didn't have this protein by accident. The purpose of this protein is not to cause heart disease, right? The purpose of it is to move fatty molecules where they need to be in the cell. You sort of need that, right? You're going to sort of need that. So what you actually need to figure out is how this gives you some new ideas to change how this protein behaves in the cell, without killing the patient, and making their life better.
And I think AlphaFold is a part of that story. It's not the end. It feels like there's a natural next step to all of this, though. If you are predicting the shape of proteins and then using those models to interpret the function of proteins in the human body, does it then go on to designing new proteins? Oh, yeah. People have wanted that; they've looked at these beautiful proteins and said, I wish humans could do that, right? And so there's been all this exceptional work, and in fact, a lot of it done at David Baker's lab, who with Demis and I won the Nobel. And AlphaFold has actually been shockingly transformative at this, at saying, well, now that we've built these computational systems that understand it, how are we going to design our own proteins? And in fact, a large portion of new approved drugs are proteins. Normally antibodies, discovered initially in very interesting ways: injecting mice or llamas with something that you want to build a protein against and using their natural immune system to find it. But we are starting to talk very seriously about how we are going to design proteins to have the effect we want. And it turns out that the most important part of that is that you can design many things you think might work. It takes a long time, it's difficult, and it's expensive to test in the lab. So what's been so important there is using AlphaFold as a proxy for nature: trying to say, how do we integrate AlphaFold's understanding of how proteins stick together, when they do? How do we use that to get the maximum signal for protein design? And people have become extraordinarily successful. They've gotten really, really good at getting proteins to stick just where they want. Okay, but hang on, because that wasn't the original intention of AlphaFold, to see how proteins stick together. No, it wasn't the intention to see how they stick together.
In fact, that was an early surprise from Twitter, where two different people said, you know, if you want to know if two proteins stick together... We were busy making a proper multi-protein system. They said, well, just take those two proteins and put some random amino acids in the middle, and see if they stick together that way. And that was the best system in the world for seeing if proteins stick together. Wow. We didn't think we were making a system that could help people design proteins in a really deep way. We thought we would take this fundamental breakthrough and go on and do it. And then people said, actually, it already works. And I think it was this grand story that does show up again and again in AI, so maybe we should have expected it: if you train a model to be really, really good at a task, it has to learn a lot of deep facts. If you want to be really good at structure prediction, you learn some deep facts about how proteins interact. And if you do just the right experiments, you can access that knowledge. But there was this whole field of what I think people started to call alpha, where people would find out which things worked. They would just treat it as this really cool black box that they could start experimenting with and try their own ideas on. I think there was a lot of really great science that has been, and continues to be, done in that vein. You know, we're still figuring out, and there's starting to be work on, how do we make enzymes, proteins that do chemistry? How do we do really complicated, sophisticated stuff? Nature would still laugh at our ability to design proteins. But we are starting to develop these really interesting tools that are maybe therapeutics, that are also maybe ways to interrogate the cell: you can bring two proteins together and see how the cell changes because you do that.
I think we'll get this interplay in not only the tools we use for therapeutics, but now our ability to poke the cell in exciting ways, to interrogate it. And every time we develop that, we'll develop a more interventional understanding of the cell that we will bring forward to medicine and synthetic biology. Is this AlphaProteo that you're describing here? So AlphaProteo is Google DeepMind's internal effort to do protein design, thinking about problems in binding and enzymes and really trying to figure out, especially for these super-hard problems, how do we get reliable systems? And I think what we're seeing in the design space is still a lot of success. But when you're actually designing proteins, you have to go to the lab and test them. There's no other way to find out. And then finding out the right ways to predict if they're going to work. And the AlphaProteo work has shown that we can get further and further in doing this. Give me an example, then, of some of the types of proteins that it would be nice to be able to design. You know what? I think if you ask any protein designer, they will have a favorite, and their favorite is really: can we make proteins do things like carbon capture? Can we actually build enzymes that meaningfully contribute to addressing climate change? Others that you really see are, for example, degrading microplastics or environmental plastics. One of the things I'll say as a caution, though, is that this holds for all of these when you talk about doing a real application. Just like people's conception of drug design is, get the molecule to stick, drug design done, and that's not the case, right? There are so many more properties you need. You need it to be tolerable in all these ways. You need it to be formulatable as a pill. You need all these other things. Similarly, with enzymes, you might think, oh, well, you just need to make this reaction happen. An enzyme, right, is a protein that catalyzes a chemical reaction.
But no, actually, you need it to be able to do this many times, right, enough that you're not constantly having to make new proteins for each reaction. You need it to be fast enough. You need it not to do certain other reactions. You have all these other properties. And I think there's a lot more to be done as we think about going from, oh, maybe this is kind of interesting, to this really, really works. Although in fairness, interestingly, with synthetically evolved enzymes, people are already using them. There's a lot of washing powder that has designed proteins in it, which I find fascinating. One of the few applications of designed proteins that people would recognize. Yeah, absolutely. How much harder is it, though, to engineer biology than it is to just predict? I'm very empirical. You should ask me in three years and we'll know. It's easier and it's harder. One analogy I like: if you were trying to figure out what an object is, you might say, is this a bicycle? And I would see two wheels, a chain, some handlebars, and I would say, yeah, that's a bicycle. But having two wheels, a handlebar, and a chain doesn't make something a working bicycle. So when you're designing something, you have to get all the details right enough that it actually works. And I think we're still figuring this out in proteins. And right now, protein structure prediction is, let's say, solved-with-an-asterisk, right? It's a very, very useful system. It's not perfect. Design is not yet solved. But I think it is advancing rapidly, and I don't think we'll still be talking about protein design as incredibly difficult in 15 years. Well, okay, let's zoom out a little bit on AI and biology more generally, because this whole conversation has reminded me of something that you said when we last interviewed you a few years ago. And I've got a little clip that I can play you of what you said.
I think it's really important to remember that these are really powerful techniques that we've developed that are still far short of a real artificial intelligence that you can talk about thinking and making decisions and everything else. I think that's so interesting, because that was 2022, right? I wonder how you reflect on that now. Do you think that machines are beginning to understand biology in an intelligent way? Have you changed your mind? I think that whether or not they can think, they're extraordinarily useful for solving problems. How far they are from AI or AGI, I think, is almost beside the point. The really, really interesting question is: where we can characterize these systems as reliable enough, do we find useful things for them to do? I think we need to be much more utilitarian about it. And certainly to machines like AlphaFold, I wouldn't necessarily apply the word think. And I don't know if we're in the situation, right, that we used to say, okay, intelligence was playing chess, and we should work on chess because once we have machines that play chess, we've basically got intelligence. And of course, we got machines that played chess really well, at a superhuman level, in, what, 1997 with the Kasparov match. And that wasn't the path that led us to machines that can read and write.
And so I think we always reach for these problems and say, well, this is the problem. Or, you know, people rather optimistically name something Humanity's Last Exam: problems so hard that if you solve them, there's no point in posing problems to machines anymore. And I'm very interested in how we find those problems that turn out to be so easy, in a certain sense, that we can do incredibly well on them and build very useful systems before we build AGI. Those are the kinds of science problems where, of course, you want to use techniques related to those of the people trying to build AGI. They're powerful techniques, but we don't have to get tied up in the philosophy. We can just build useful systems. In fact, I think the whole industry is thinking a lot about how we build useful systems that matter for people doing software development, that matter for people doing writing, that expand the nature of the problems we solve. And then we'll see if we end up with AGI, but we will certainly end up with useful systems. So how about the most useful system of all in biology? I mean, Google DeepMind has all of these different systems for lots of aspects of human biology: AlphaFold, AlphaGenome, AlphaProteo and so on. Can you bring those together in a single system? Is there a goal here to build, like, a simulated cell? You know, I used to work in simulation, and simulation is: I will write down the rules for how all the little pieces do their little thing locally, and then I'll mash it all together and turn a big crank, and then I will get the answer. But we don't even have a parts list for the cell. We have all these effects that, I think, are not going to give us a classical simulated cell.
I think what we're going to do is build really useful systems that draw information from AlphaFold, that draw information from the literature, that draw information from the genome, and use that to say really useful things about biology that matter. And I think quite possibly one of the core technologies of that will be finding the right fusion of what we understand in narrow AI systems and what we're understanding about broad machine learning in terms of large language models. Well, so how do you bring those systems together? Are there ideas from large language models that can be applied? It's very easy to say, oh, well, we'll just have your large language model call alphafold.exe as a tool. But I think there are all these other problems, like, okay, well, if AlphaFold produces a structure, can these large language models actually understand structure really well? To what extent can they understand these 3D coordinates as well as a human, or better than a human? How do they bring in information from, say, DNA sequencing, from all these other sources? I think it's far from trivial. How do we get these deep integrations so that a model can understand as much about proteins and protein structure as AlphaFold, but also understand the entirety of, say, the biology literature? I'm kind of hopeful we'll get there, but we have to build it. Do you think that there are aspects of biology that are going to resist computational prediction? I think there will certainly be aspects, if you ask deep questions about evolution or the origin of life: what data are you learning from? What experiments? You're going to have to draw data from very far away to answer that question. You might know something about chemistry. You might be able to do these experiments a bit faster. But you're certainly not directly learning from data.
Or we talk about evolution and we draw phylogenetic trees, but ultimately, we just have the DNA of the species that exist right now and a little bit into the past. These kinds of things, I think, will be very hard. The other thing that will happen, though, is that as we build these AI tools, the space of reasonable hypotheses will narrow. They will say: probably not that, for this reason; probably not that. And our experiments will be better, in a certain kind of Bayesian sense. Our prior over reasonable biological answers will narrow because of our computational tools, and experiments will help resolve them. And I think this interplay will get tighter. And as we do more experiments, or as we use AI to do things like protein design that give us more tools to poke the cell, then we will learn more and we will do more. But I think we'll just see that some things will be harder and some things will be easier. And the easier things will happen first. The easier things will happen first. John, thank you so much for joining us. I think it's really easy to have a very romantic idea of science, right? That it's about uncovering the hidden truths of the universe, that your aim as a researcher is to build this picture, piece by piece, that can help us understand the mechanisms of life. And that, I think, is what makes John's ideas about interpretability completely fascinating, because they turn things completely on their head. You know, AlphaFold is unashamedly not about the why here. Instead, this is a tool that can just reliably be used to accelerate the work that scientists can do. And then, when you remember that John Jumper is only halfway through his professional career as a scientist and he's already got one Nobel Prize, you realise he isn't necessarily defending an old paradigm here. He is literally building the next one.
And if John's focus is completely on utility rather than understanding, when the person who built the most useful thing that AI has ever done tells you that that is what really matters, well, you have to wonder if he's just showing all of us where science is headed next. You have been listening to Google DeepMind: The Podcast, with me, Professor Hannah Fry. If you have enjoyed this episode, then please do leave us a comment or a review. And I should tell you that coming up, we have interviews with two of Google DeepMind's co-founders, Demis Hassabis and Shane Legg. Trust me, you will not want to miss them. So why not take the opportunity to subscribe to our YouTube channel? See you soon.