Latent Space: The AI Engineer Podcast

🔬 Automating Science: World Models, Scientific Taste, Agent Loops — Andrew White

74 min
Jan 28, 2026
Summary

Andrew White, co-founder of Future House and Edison Scientific, discusses his journey from academia to automating scientific discovery using AI agents. The conversation covers his work on Cosmos, a world model for scientific research, and explores how AI can automate the cognitive processes of hypothesis generation, experiment design, and data analysis in scientific workflows.

Insights
  • Scientific automation is more about automating cognitive processes (hypothesis generation, experiment design, analysis) than about modeling specific systems
  • The bottleneck in AI-driven science is often mundane logistics (reagent availability, lead times) rather than model intelligence
  • Scientific taste - knowing what constitutes interesting vs boring results - remains a key frontier for AI systems
  • Enumeration and filtering strategies can be more effective than trying to be smarter, allowing AI to try more ideas faster than humans
  • The transition from first-principles simulation (MD, DFT) to machine learning on experimental data represents a fundamental shift in computational science
Trends
  • Shift from foundation models for specific domains to general scientific reasoning agents
  • Integration of literature search, data analysis, and experimental design in unified AI workflows
  • Movement from academic research to venture-backed startups for AI science applications
  • Increasing automation of wet lab experiments through cloud labs and CROs
  • Focus on verifiable rewards and human feedback for scientific AI training
  • Growing importance of provenance and citation tracking in AI-generated scientific content
  • Emergence of world models as memory systems for scientific discovery
  • Transition from simulation-heavy approaches to experimental data-driven methods
Quotes
"When AlphaFold came out and it's like you can do it in Google Colab, you know, or on a GPU or desktop, it was mind blowing. I forget that protein folding was solved. I always thought that was inevitable. But the fact that it was solved and you can do it on your desktop just completely floored me. It changed everything."
Andrew White
"We're trying to automate the cognitive process of scientific discovery. Making hypotheses, choosing experiments to do, analyzing the results from experiments and using it to update your hypotheses or your confidence in those hypotheses, and then leading to a world model of, like, okay, this is how I understand this process to be."
Andrew White
"I think molecular dynamics is overrated. Coming from someone in the thumbnail, you know. And DFT is overrated. In fact, DFT may be even more overrated than molecular dynamics."
Andrew White
"I think the easy way to succeed in AI over humans is you can try more ideas faster."
Andrew White
"I think there's an unlimited amount of scientific discoveries to be made. So there's no scarcity such that basically we will displace them all."
Andrew White
Full Transcript
3 Speakers
Speaker A

MD was supposed to be the protein folding solution. There is a great counterexample. The counterfactual is basically a group called D. E. Shaw Research. They had, you know, similar funding to DeepMind, probably more actually. They tested the hypothesis to death that MD could fold proteins. They built their own silicon, they built their own clusters, they had them taped out all themselves. They burned the algorithms to run MD into the silicon. They ran MD at huge speeds, huge scales. I remember David Shaw came to a conference once on MD, and he flew in by helicopter. He's this pretty famous guy, kind of rich, and he gave an amazing presentation about the special computers in a special room outside of Times Square and what they can do with it. It's beautiful, amazing. And I always thought that protein folding would be solved by them, but it would require a special machine. Maybe the government would buy like five of these things and we could fold, you know, maybe one protein a day or two proteins a day. And when AlphaFold came out and it's like you can do it in Google Colab, you know, or on a GPU or desktop, it was mind blowing. I forget that protein folding was solved. I always thought that was inevitable. But the fact that it was solved and you can do it on your desktop just completely floored me. It changed everything.

0:00

Speaker B

This is the first episode of the new AI for Science podcast on the Latent Space Network. I'm Brandon. I work on RNA therapeutics using machine learning at Atomic AI.

1:12

Speaker C

My name is RJ Haneke. I'm the co-founder of Miro Omics, where we build spatial transcriptomics AI models.

1:23

Speaker B

The point of this podcast is to bring together AI engineers and scientists, to bring together two communities which have developed independently for quite some time. There's been some attempt to combine them, and only now, after many years, are we starting to see some of the big developments play out in the real world and start to solve key scientific problems. There's no one-size-fits-all solution. You need domain expertise. You need people on both sides of the aisle who can really talk to each other, really work together, and understand both the modeling and all of the real subtleties of the system you're actually trying to work on. We hope that we can connect these communities and provide a starting point for this new era of AI and science to move forward.

1:30

Speaker C

So without further ado, let's get started on the first podcast. We're really happy to have in the studio today Andrew White, co-founder of Future House and the newly formed startup Edison Scientific. Rather than introduce him, I'll let him introduce himself.

2:17

Speaker A

Hi, I'm Andrew from San Francisco, former professor, now running two startups, one that's a nonprofit research lab and one that's a for profit venture backed company. And we're trying to automate science.

2:35

Speaker C

We're going to get into all those points.

2:47

Speaker A

Yeah, really happy to be here. Thanks for having me on.

2:50

Speaker C

I want to know personally about jump from academia to industry and or quasi industry. So I would love to hear that story.

2:52

Speaker A

Yes, I guess that's the whole story, right? So I did my PhD at the University of Washington, and I worked in a group with I think 19 people doing experiments and like two people doing simulations. And I was working on a topic called molecular dynamics, which I think is actually suddenly becoming interesting again as everyone's looking for ways to generate data from first-principles simulation. Molecular dynamics covers basically everything that's molecules moving around in dynamic systems, like biology, things like that. Of course the complement in materials science is things like density functional theory, where you can model chemical reactions in these solid systems. So I was working on that. We worked on biomaterials, and so the goal of my PhD was trying to find what are called non-fouling materials. In biological systems, whenever you put a foreign object into the body, it will trigger a response. And that response, called the foreign body response, basically encapsulates it in this layer of collagen. This is actually exploited for some implants: if you get a heart, sorry, pacemaker installed, it coats it with this collagen so that if you go to change the battery, you can almost change the battery out without even bleeding, because the body has completely encased it. And this is great for pacemakers, but for a glucose sensor or a brain-computer interface, BCI is what they call it now, there it's not so great. And so that's why some of those things have a limited lifetime, because eventually your body treats it like a wound and heals and rejects it. Some rejection is immune-based.

3:02

Speaker C

Okay.

4:38

Speaker A

And so that's where, like, if the body can see anything on it, if it can see some ligand that can bind with antibodies, then you get this inflammation, which is like the rejection response you see in organ transplants. But with materials, the body's just like, oh, there's just a wound or there's something here, and it just covers it up. Yeah, I think, you know, the research in that field has gone on a long time since I left my PhD, and there were a lot of theories about it: it's related to the mechanical properties of the material, like if it's spongy, or if it's trabecular, like it has a bunch of little pores in it. We worked on the theory that it had to do with how hydrophilic the material was. But anyway, I was the only one working on computers in this group. I couldn't figure out how to connect what's on the computer with what's done in the lab, because you can make a simulation of whatever, 10,000 particles, 10,000 atoms, and it's like, well, this is not going to model the human body. A lot more atoms involved. So I had a good time. We did some cool stuff, some bioinformatics stuff. I learned a lot. But then when I did my postdoc, I was like, okay, we're going to try to merge experiments and simulations. So I worked on this theory called maximum entropy. It's about how you take complex simulations and match them to limited observations. It's like the inverse of machine learning: machine learning is simple models fit to a lot of data, whereas I had complicated models and was trying to fit to very little data. It was fine, it was great. We wrote some papers, it was useful. And then I started my research group at the University of Rochester on applying these methods to model peptides. Yeah, and I'm always too early for things. We studied peptides for, I don't know, four or five years. And it was a cool niche field, not that popular. Now peptides are like the hottest thing ever.
I think there's even like a peptide rave I heard about a couple weeks ago. But when I was an assistant professor, nobody cared about peptides. So we worked a lot on different ways to combine them. We looked at different experimental methods that we could match to these molecular dynamics simulations of peptides. And then in 2019, I was out on a sabbatical at UCLA. They have a place called the Institute for Pure and Applied Mathematics there, which is this institute where people can go and do a sabbatical and learn new methods. And they happened to be doing machine learning for physics. The name of it was some symmetric thing, like machine learning for physics and physics of machine learning.

4:38
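[Editor's note] The maximum-entropy matching described above, fitting a complex simulation to limited observations, can be sketched as minimally reweighting simulation frames so an ensemble average matches an experimental value. This is a toy illustration, not code from his papers; the observable, the target value, and the frame data are all invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "simulation": 1,000 frames, each with one scalar observable
# (say, a peptide end-to-end distance). All values here are invented.
obs = rng.normal(loc=2.0, scale=0.5, size=1000)
target = 2.3  # hypothetical experimental average we must match

def reweighted_mean(lam: float) -> float:
    # Maximum-entropy weights w_i ∝ exp(lam * f(x_i)): the minimal
    # perturbation of the uniform ensemble that shifts the average.
    w = np.exp(lam * (obs - obs.mean()))  # centered for numerical stability
    w /= w.sum()
    return float((w * obs).sum())

# Solve for the Lagrange multiplier by bisection: the reweighted mean
# is monotone in lam, so we bracket the target and narrow in.
lo, hi = -50.0, 50.0
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if reweighted_mean(mid) < target:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)

print(f"raw mean        = {obs.mean():.3f}")
print(f"reweighted mean = {reweighted_mean(lam):.3f}")  # matches target
```

The point of the exponential form is that it is the distribution closest (in relative entropy) to the original ensemble that satisfies the experimental constraint, which is why it perturbs the simulation as little as possible.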

Speaker C

Okay.

6:57

Speaker A

It's kind of a cool concept. But Yann LeCun was there, and Frank Noe was there, who's a big guy in Europe in this. I don't know, Terence Tao came by. It was a really great group and everyone's kind of jamming. It was 2019, so there had not really been the big hit yet, especially in non-computer-science fields. And then I came back from that and I was like, well, I've got to teach a class on this. I'm writing a book about how you can apply these methods in chemistry. It was a very niche field, because every machine learning class that my PhD students could take at the time, this is when I was a professor at the University of Rochester, would always end in, okay, this is an RNN and this is what you need to know, or this is how you do image classification. But in chemistry it's all about graphs, right? It's all about how you represent these graph structures. It's all about symmetry and geometry. And that was not a thing that was very popular. But you had Max Welling on before, the godfather of geometric deep learning. So I wrote this textbook about these methods, and there was a bunch of interesting mathematics to it. I had a good time and stuff. And then I was following the news in the space, and the original Codex came out, and I had been looking at transformers for a while. I just tinkered with them, and we started trying them on some chemistry tasks. We were really impressed, actually. And we wrote a benchmark, and this is like 2019, a benchmark of verifiable rewards in 2019. Maybe it was 2020 by then, but like.

6:58

Speaker C

Ahead of the curve.

8:25

Speaker A

Here's a task, sorry: there's a task which is like, I have the body of a function for a Markov chain Monte Carlo simulation. It's missing some pieces. Complete it. And then we had a verifier that would check: is it a valid MCMC simulation? We wrote this paper, and it ended up coming out, I think, 2021, 2022, because it took a long time to bank enough questions. But I wrote an opinion piece about how transformers could change how we think about chemistry and how we teach it. And then some people at OpenAI, Lama was there, saw this paper and reached out like, hey, we're building this new model and we think it'd be great to red team what could happen with these models if they're applied to chemistry or biology. And so I was a red teamer for GPT-4, and I was using it like nine months or something before release. It was like August. So GPT-4 came out in March and I was using it in August.

8:28

Speaker C

Yeah.

9:20

Speaker A

And then the ReAct and MRKL papers came out. I think Shunyu Yao wrote the ReAct paper, and I plugged it in with GPT-4 in the fall, you know, and I was like, wow, there's so much stuff coming out with ReAct. Yeah. And it was really exciting. And then when GPT-4 came out, I released this paper called ChemCrow. I worked with Philippe Schwaller in Switzerland on this, and IBM.

9:20

Speaker C

So that was like ReAct applied to chemistry.

9:41

Speaker A

Yeah. And what we had is, there was a cloud lab that IBM built in Switzerland. So we had GPT-4 operating the cloud lab. And then I had written a literature research agent that did agentic RAG. Again, nobody knew what agentic RAG was at the time. I think actually Harrison Chase had written a blog post about some ideas there. And so I stole some of the ideas, really smart guy, and basically applied that. And we saw some really cool stuff. It was really exciting. And then I wrote the paper. It set off this crazy storm where everyone had a lot of anxiety about AI progress.

9:43

Speaker C

Yeah.

10:20

Speaker A

And I ended up visiting the White House. I guess my paper was the only time a preprint or peer-reviewed paper was presented to the President, on their schedule for a certain block of time.

10:20

Speaker C

Wow.

10:31

Speaker A

And the National Security Advisor at the time, Jake. God, I was confused. Yeah, no, sorry. One of them is a talk show host and one of them was like the National Security advisor. I forget which is which.

10:32

Speaker C

That guy.

10:44

Speaker A

That guy, yeah. He had a presentation about our paper, and they presented it because there was a big tech CEO summit at this time, where they had Sam Altman and some other CEOs out there.

10:45

Speaker C

Is this the "future of chemistry as a language" paper, or a different one?

10:55

Speaker A

This is the ChemCrow paper. The ChemCrow paper, yeah. Sorry, I probably should name these things. And so it was crazy. And they had me go out there, and then I met a lot of three-letter agencies I didn't really want to meet. And, you know, somebody from one three-letter agency was like, how does this change explosives? And another three-letter agency was like, how does it change breakout time for nuclear weapons research?

10:57

Speaker C

Yeah, yeah.

11:19

Speaker A

I was like, guys, I'm not really sure. But it turns out that there were not that many people who were world experts on AI and science.

11:20

Speaker C

Right. So what's the answer?

11:28

Speaker A

Yeah, great. Good question. We'll come back to that.

11:30

Speaker C

Okay, yeah, let's come back to that.

11:34

Speaker A

In the end, I had a lot of energy, a lot of excitement about this area, so I took a sabbatical from the University of Rochester. I met with Sam Rodriques, and Sam had been talking to Eric Schmidt and Tom Kalil, who was also on the National Security Council in the Obama administration, about how to scale up these ideas. And so Sam had this concept of focused research organizations: how do you do science not in academia, and not in one of these kind of near-monopoly tech companies, these big labs? And he wanted to try this idea out. And I was like, hey, we should do this around agents for science, or AI for science. I love Sam. He pushes me to come up, you know, with really lofty ambitions. So we decided to automate science as the goal, instead of just seeing what fun stuff we could do with agents in science. But I think that was maybe the real mission. And of course, automating science is the long-term mission.

11:35

Speaker C

Yes.

12:27

Speaker A

And so that was what led to Future House. And that was a very long-winded answer.

12:27

Speaker C

Yeah, no, that's great. So you chose to leave a tenure-track position.

12:31

Speaker A

So I was on sabbatical, which is a beautiful concept. But then I did resign my tenured position when we co-founded Edison.

12:36

Speaker C

Yeah. Okay.

12:44

Speaker A

And, you know, I had been on sabbatical for a very long period of time, and so at a certain point I just had to resign my tenure. So I resigned tenure in June.

12:45

Speaker C

Okay. Oh, so that's only recently.

12:53

Speaker A

Yeah, only recently.

12:55

Speaker C

And you just felt like this is the direction of my career.

12:56

Speaker A

Yeah, yeah. I mean, I got tenure and I had these early career awards, like the NSF CAREER Award. It was great. And I think academia is really exciting, but I just thought that right now this kind of area, AI for science, is (a) difficult to do in academia and (b) so exciting that I think you can take bigger bets. And I think having a tenured position and writing research grants is maybe not the biggest bet you can take on a field. Now we have a venture-backed startup called Edison, which we spun out of Future House, and we took a lot of the ideas and we're trying to do this at an even bigger scale right now.

13:00

Speaker B

Edison was always kind of the plan, going back to Sam's idea of an FRO, a focused research organization. He always had this goal of: let's do fundamental research in this tightly scoped nonprofit, which can kind of explore, and then you have that as a natural arm for spinning off venture-backed companies.

13:40

Speaker A

Yeah, I think that's right. I think something that makes that not as clean these days is how expensive AI research is and how expensive GPUs are. So I don't think we can repeat it many times from Future House. It might be an N-of-one thing right now; maybe not, I don't know. If venture capital keeps growing, then maybe we can. But yeah, we took a lot of ideas from Future House. Another thing: I think we expected it to be harder to automate science. And actually it's really hard. I feel like I'm always miscalibrated in this domain, but it's always hard to predict progress. I think that I overestimate the speed of things on the month scale and underestimate things on the year scale. So the two years from 2023 to 2025 saw an enormous amount of progress. It always felt like things were not going as fast as I thought, but when you look back on it, wow, there's a lot of progress. And so, and Sam actually regrets us writing this, but in the original marketing, the announcement, it's our 10-year mission to automate science. And now, two years later, we had Cosmos and things are going so much faster. There's also this thing you notice in San Francisco: it's actually kind of hard to find problems which are so hard that they are a challenge for language models, but not so hard that they're impossible. We're in this gray zone. And I feel like that's where we are right now: we can actually automate so much of the scientific method. Because it turns out, especially in a field like biology, which is very empirical, the top 1% guesser of what will happen in an experiment and the top quintile or quartile are about equal. And so even if we wait 10 years and get even smarter models, I don't think it's going to really change the fact that we're ready to automate a lot of science with existing LLMs.

14:02

Speaker B

What do you mean by automate science? That's a pretty loaded statement. There are many ways of thinking about that.

15:54

Speaker A

So we try to draw a line between us and what I would call groups that are trying to model something, like the cell, or how proteins fold, or how antibodies can be designed, or maybe virtual cells as an example. They're trying to use machine learning or AI to model some very specific system. We're trying to automate the cognitive process of scientific discovery: making hypotheses, choosing experiments to do, analyzing the results from experiments and using it to update your hypotheses or your confidence in those hypotheses, and then leading to a world model of, like, okay, this is how I understand this process to be. And then that begets new hypotheses or new experiments. We want to automate that sort of loop. We thought that we would have to build up a whole new organization from the ground up for agents. So that means automated labs, it means putting all the papers in one spot, getting APIs wrapped around everything. But over time the models have gotten better and better, and we had to stop and rethink. Okay, we don't actually have to hold their hands so much anymore. They don't necessarily need an automated lab. They can write an email to a CRO or something, or they can tell you what experiment to do, and you can take a video of yourself doing it and then show it to the model, and it can be like, okay, well, this is what happened. So it's been a really interesting experience: sometimes we over-engineer things. Actually, basically we mostly over-engineer.

16:01
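[Editor's note] The loop described above, make hypotheses, choose experiments, analyze results, update confidence, fold it into a world model, can be written down as a plain control loop. This is a hypothetical sketch: every name below (the dataclasses, the four stand-in callables) is invented for illustration, with the LLM agents and the lab replaced by toy functions.

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    claim: str
    confidence: float  # subjective probability the claim is true

@dataclass
class WorldModel:
    # Accumulated understanding: claim -> current confidence.
    beliefs: dict = field(default_factory=dict)

    def update(self, h: Hypothesis, supports: bool) -> None:
        # Crude fixed-size nudge; a real system would weigh evidence quality.
        h.confidence = min(1.0, max(0.0, h.confidence + (0.2 if supports else -0.2)))
        self.beliefs[h.claim] = h.confidence

def discovery_loop(propose, choose_experiment, run, analyze, rounds=3):
    # propose / choose_experiment / run / analyze stand in for LLM agents,
    # a cloud lab or CRO, and an analysis agent, respectively.
    world = WorldModel()
    for _ in range(rounds):
        for h in propose(world):              # make hypotheses
            exp = choose_experiment(h)        # design the experiment
            result = run(exp)                 # lab, CRO, or simulation
            supports = analyze(h, result)     # does the data support it?
            world.update(h, supports)         # fold into the world model
    return world

# Toy run with hard-coded stand-ins for every step:
world = discovery_loop(
    propose=lambda w: [Hypothesis("compound X inhibits target Y", 0.5)],
    choose_experiment=lambda h: {"assay": "dose-response"},
    run=lambda e: {"ic50_nM": 40},
    analyze=lambda h, r: r["ic50_nM"] < 100,
)
print(world.beliefs)
```

The design choice worth noticing is that the lab step is just another callable, which is exactly why, as he says later, it can be a robot, a CRO email, or a human with a camera.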

Speaker C

So I always think about systems, and the scientific process is a system. I always think of it in terms of constraints.

17:20

Speaker B

Right.

17:28

Speaker C

And what is the bottleneck in the system? So what is your hypothesis about this?

17:28

Speaker A

Right.

17:34

Speaker C

In my mind, not knowing a ton, the constraint in the scientific process is the work you do in the lab. And that's sort of notably missing, well, not entirely, you mentioned automating the lab and whatever. So how are you thinking about this?

17:34

Speaker A

Yeah, I think you're right. Basically the best model, whatever, Opus 7 or GPT-10, really can only propose the first experiment, maybe slightly more cleverly. At a certain point you just need information, right? There are some little calculations you can do, like: there are more atoms in the brain than you could ever simulate, even if you had all the energy from the sun. I think you could simulate maybe a thousand brains in real time with all the energy in the sun. There's just too much information.

17:50

Speaker C

Yeah.

18:18

Speaker A

So science really hits these, like, bottlenecks where you just actually have to go measure things.

18:19

Speaker C

Yep.

18:22

Speaker A

We definitely think about lab-in-the-loop sort of situations. In one of our papers, which was called Robin, we had one of our agents propose an experiment, we did the experiment, and then we had our agent analyze the experiment and propose the next experiment. And that kind of loop, I think, is where you want to get to. So what is the bottleneck in that? I don't think it's the intelligence of the first experiment. Right now I think the bottleneck is something silly, like knowing what's the lead time on all the reagents that you need and what is available in the lab.

18:22

Speaker C

Yeah, yeah, yeah.

18:54

Speaker A

I think whether GPT-5.2, Codex Max, or Opus 4.5 is going to do better probably doesn't matter. It's just a matter of which one has all the information about what's in the lab, how much it will cost, how long it will take. And also, I guess, the kind of frontier that I think about for these models is taste. Of course we want to accelerate technology, we want to improve the economy, we want to improve people's life expectancies, we want everyone to be happier. But a lot of what is done in science is based around human preferences. Why do people study, I don't know, a particular worm? Well, there is a theory that studying the worm has led to good medicines or to discovering new genes. But also people studied it in the past; people's careers depend on that worm, and people want to write papers about that worm. And so there's a human element to some of this. And I think the models don't capture that so well, knowing what is an exciting result and what is a boring result. So scientific taste is a broad category of all these things.

18:55

Speaker B

How do you define, like, do you try to quantify taste in any way? I mean, I know I have some fun anecdotes about this, but maybe, yeah, I'd just like to hear what you think.

19:55

Speaker A

Yeah, actually we sat on this idea. We sat on it, but we argued about it for a long time. Usually every Monday morning at 8 o'clock, Sam and I meet, and we're both, you know, caffeinated and ready, and we argue about stuff like this. And we had a lot of Mondays where we talked about scientific taste. And in the end we were like, okay, let's just do the dumbest thing, which is to have our agents make hypotheses and put them in front of humans and have them be like, I like this one or I like that one. Right? So we just did, whatever, RLHF on hypotheses. And we learned a lot about how bad RLHF is with people. People really pay attention to the tone, to the details, to how many specific facts or figures are in the hypothesis, to actionability, whether the experiment is feasible. But what people didn't really pay attention to is, I don't know how to describe this, but: if this hypothesis is true, how does it change the world? If the hypothesis is false, how does it change the world? How much information do you gain? It's not really information but impact or something. And that really didn't come through. So then we were like, okay, this is maybe one strategy, let's go back and think about it more. We took a pause from that research, and then we made Cosmos, and Cosmos has taste baked into it. At the end of the day there will be some report, and we're working on generalizing this, but basically at the end of the day it's like, okay, I made these discoveries, and a person will be like, great, I'm going to download that one, or I like that one, I don't like this one. And that rolls up to some hypothesis that came earlier in the process. And so we think we can get to end-to-end on this, as opposed to human preferences.

20:06
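[Editor's note] One minimal way to turn "put hypotheses in front of humans and collect picks" into scores is a Bradley-Terry fit over pairwise preferences, sketched below. This is not their pipeline, just an illustration of ranking from pairwise judgments; the comparison data is invented.

```python
import numpy as np

# Hypothetical pairwise judgments (winner_index, loser_index) from
# reviewers shown two hypotheses and asked which they prefer.
comparisons = [(0, 1), (0, 2), (1, 2), (0, 1), (2, 1)]
n_hypotheses = 3

def bradley_terry(comparisons, n, iters=50):
    # MM iteration for the Bradley-Terry model: strengths s such that
    # P(i beats j) = s[i] / (s[i] + s[j]).
    s = np.ones(n)
    for _ in range(iters):
        wins = np.zeros(n)
        denom = np.zeros(n)
        for w, l in comparisons:
            wins[w] += 1.0
            denom[w] += 1.0 / (s[w] + s[l])
            denom[l] += 1.0 / (s[w] + s[l])
        s = wins / denom
        s /= s.sum()  # fix the scale; strengths are only relative
    return s

scores = bradley_terry(comparisons, n_hypotheses)
# Hypothesis 0 won every comparison it appeared in, so it ranks first.
print(scores.argsort()[::-1])
```

The weakness he describes shows up directly in a setup like this: the fitted scores reflect whatever reviewers actually attend to (tone, detail, feasibility), not the downstream impact of a hypothesis being true or false.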

Speaker C

So you mean the feedback loop is the click?

21:38

Speaker A

It could be the click. It could also be, you know, sometimes in Cosmos you could ask for an experiment and go see if the experiment is a success or failure or something. But I guess we've brought it out of this hard-to-quantify "is this a good hypothesis or a bad hypothesis" and into something where you can see some downstream consequences of the hypothesis.

21:42

Speaker B

So, yeah, humans have, I think, a very well-calibrated nose for science. Maybe you could argue there are sociological effects across the community, but ultimately, oftentimes really good scientists know right off the bat whether something is likely to be useful or not. How many attempts did it take before you started to see results that seemed useful to you? Even working on this for, I guess, two years now.

22:02

Speaker A

I think when the Co-Scientist paper came out from Google, it was a really interesting idea to do this tournament-style, or just pairwise, ranking of hypotheses. So I think Co-Scientist is a very interesting counterexample to what we built. What we built is something with either lab in the loop, or data analysis in the loop, or literature research in the loop, where you're iterating on an idea. And Co-Scientist took this very different approach of: let's list all the ideas and then try to come up with a filtration process to get the best hypotheses. So Co-Scientist will produce very long reports, like, oh, we really tested this idea with lots of dialogue, and it's very interesting stuff. I was really impressed with the paper that came out. And then we had this Robin paper, and one of the things that came out of the Robin paper is that the hypothesis that people thought was best was not the one that led to success in that paper.

22:35

Speaker C

Interesting.

23:25

Speaker A

It was in age-related macular degeneration. Basically it's part of the eye. You're going blind because you have this accumulation of debris in the eye and can't clear it out. That's the major cause of blindness in people over 60. Ali, who works on this.

23:25

Speaker C

Yeah.

23:42

Speaker A

He'll cringe when he hears me say that. But something like that. Sorry, Ali. In that one, we went to optometrists or ophthalmologists, I actually get confused on that as well, sorry, Ali, and essentially asked them: which hypotheses do you think are good hypotheses, which do you think would lead to a good mechanism for treating dry AMD? And yeah, they agreed on the top 10. But beyond that, it was kind of noise.

23:43

Speaker C

Yeah.

24:09

Speaker A

And then what we found was that ripasudil was a very good medicine, and it had a mechanism that I think is novel. Although there was lots of debate on X, because I think in 2012 there was a master's thesis which proposed this mechanism on like page 38. I actually think it was a typo. I think they meant wet AMD. But anyway, I won't belabor the point. I will concede that maybe there is one reported example of it in the past. That was a really eye-opening experience for me, because that was the first really serious test where we really went to the lab, and we spent like four weeks on a battery of experiments to see which hypothesis led to a good mechanism and a good repurposed drug. And it was not as correlated with human opinions as I expected. And so since then, I have a lot more faith in these verifier-in-the-loop kinds of scenarios, where you have either data analysis, literature search, a unit test, or whatever, you're going and running the experiment. Anything like that, I think, is going to give you a higher signal than the vagaries of "this one is rated higher" or "we like this one better."

24:10

Speaker C

Yeah. Max Welling called it nature's computer.

25:19

Speaker A

Yeah.

25:22

Speaker C

You have this computational cycle you're running, and nature is part of that.

25:24

Speaker A

Yeah.

25:29

Speaker B

I am curious. So you said that there is a paper which maybe proposed where this molecule came from. But do you have some way of interpreting or understanding where that hypothesis originated? In the absence of that, is there a trace of the train of thought?

25:29

Speaker A

Yeah, yeah, yeah. Actually this is something we pay really close attention to at Future House and at Edison: provenance of information. So our first sort of agent was PaperQA. Sorry about the name; PaperQA sounds like an Ethel set, but it was an.

25:47

Speaker C

Agent, it really does.

26:00

Speaker A

PaperQA, like, every sentence that it outputs has a citation to a page. Right? So there's a lot of provenance, and then we basically built along that philosophy for everything. So Robin, which is the name of this workflow, or whatever you want to call it, that led to this result of ripasudil being a good therapeutic for dry AMD: it has data analysis that shows you which lines of Python code led to the result. And then, okay, it goes to this other model which says, well, based on this literature finding and this result from the data analysis, I believe this is the right thing. But where does the original idea come from, like going after these ROCK inhibitors, which is the mechanism, or the target? Basically enumeration. And so this is like: if you can't be smarter, you can, I don't know, try more times. And I think that was the theory of the Robin paper, that we can put out a whole bunch of hypotheses and then we can filter them. It's like how Co-Scientist did it: you go through a filtration process. But the difference is that in Co-Scientist their filtration process was other LLMs ranking hypotheses with rubrics or personas, and our filtration process was literature search and data analysis. Like, here's some data: is it consistent with the data? Go see if anyone's discovered it in the literature, or if they've disproven it. And I think that's the easy way for AI to succeed over humans: you can try more ideas faster.
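The enumerate-then-filter idea described here can be sketched in a few lines. This is purely illustrative (the class, the field names, and the example hypotheses are all hypothetical, not Future House's actual code); the point is that hypotheses are cheap to generate, and the verifiers (literature search, data analysis) do the real filtering work:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    mechanism: str
    literature_support: bool    # did a literature search corroborate it (vs. refute it)?
    consistent_with_data: bool  # did the data-analysis agent's result agree?

def filter_hypotheses(candidates):
    """Keep only hypotheses that survive both verifier checks."""
    return [h for h in candidates
            if h.literature_support and h.consistent_with_data]

# Enumerate cheaply, then let the verifiers do the work.
candidates = [
    Hypothesis("ROCK inhibition improves debris clearance", True, True),
    Hypothesis("Mechanism refuted in prior literature", False, True),
    Hypothesis("Mechanism contradicts the assay data", True, False),
]
survivors = filter_hypotheses(candidates)
print([h.mechanism for h in survivors])
```

In the real system each boolean would be replaced by an agent call (a literature agent, a data-analysis agent), but the shape of the loop, many cheap candidates in, few verified survivors out, is the same.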

26:02

Speaker B

Something I've heard people say, and maybe I've experienced this in my own life: sometimes hypotheses are kind of cheap, especially in biology. In many ways it's actually easy to come up with what you think could be happening. And it seems to me that verifying is often a big bottleneck, maybe the biggest bottleneck. If you have lots of hypotheses and it costs 1/100th of your runway to test each one of them, you don't have many shots on goal. So how do you make sure that you are actually enriching for good hypotheses?

27:25

Speaker A

Literature and data analysis. Right. There was a time when we used something called tiling trees. A tiling tree is like a literal brute-force method invented by Ed Boyden, Sam's PhD advisor. And basically the idea is: okay, I want to accomplish X; okay, I could try these methods. And then once you pick "I'm going to try this method," you split into two different paths: I'm going to use this method, or not use this method. If I'm using this method, I need to have, I don't know, some kind of substrate. I'm going to try this substrate, or this substrate, or this substrate. And you can basically try to really tile the space of all the options. We tried some early experiments there, and you're right, you run into this thing where some of the hypotheses that come out just don't make any sense, and you are going to waste a ton of effort if you actually test them all. Nowadays, I actually would argue that if you go to an LLM and you ask it to evaluate hypotheses, including some garbage ones, it will probably do as good a job as an expert in the field at filtering them out. That's not always the case.
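The branching structure of a tiling tree, pick a method, then pick a substrate, and so on, amounts to enumerating every leaf of a decision tree. A minimal sketch (the layer names and options here are made up for illustration):

```python
from itertools import product

# Hypothetical decision layers: at each level of the tree you branch on
# one choice, so the leaves tile the full space of experimental plans.
methods    = ["method_A", "method_B"]
substrates = ["substrate_1", "substrate_2", "substrate_3"]

def tile(*layers):
    """Enumerate every leaf of the decision tree (one pick per layer)."""
    return [list(leaf) for leaf in product(*layers)]

plans = tile(methods, substrates)
print(len(plans))  # 2 methods x 3 substrates = 6 plans
```

This also makes the failure mode concrete: the leaf count is the product of the branch counts, so the space explodes quickly, which is why filtering the leaves (by an LLM, by literature, by existing data) matters more than enumerating them.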

28:00

Speaker B

I've actually seen that myself.

28:57

Speaker A

Yeah. But there's a lot of gotchas, and I think people can miss those. But I think they're actually pretty good. And so I'm not as worried about hypotheses that can be failed fast by an expert looking at them. I think now the filtration process really happens in literature, and in looking at biobank data, or what we know from GWAS, or other sources of existing data, as much as you can draw upon.

28:58

Speaker B

So with regard to existing data, maybe another contrarian take is that oftentimes the hardest part is just understanding the context of the data: where it comes from and how you interpret it. I can also think, from my own life, of multiple cases where the data in some sense was there, and you had two people, both experts and very smart, who looked at it and drew very different interpretations. In fact, when we were interviewing Heather Kulik, she had some fun stories about using LLMs. She would find that there would be raw data in a paper which wouldn't agree with the conclusions of the actual paper. And it's straight from the paper. It's not even cross-paper talk or something.

29:24

Speaker A

I'm going to be a really boring interviewee and be like, yes, you're right. This is a hard question. Yeah, to give you something concrete, we have a bioinformatics benchmark. We call it BixBench. We put it out, we've updated it a few times. Some frontier LLMs, when they release their system card, will mention BixBench as one of the things they test on. And we're getting to 60%, 70% correctness on BixBench. And we found that actually we're at the point where humans disagree at this level: humans only agree on 70% of the analyses. And so it's true that when it comes to analyzing data, humans do not agree 100% of the time. There is a certain amount of choice that goes into it. And we try to... so Edison is a for-profit company; we try to sell some of this stuff to companies, and we'll go to some companies who are like, oh, well, we never impute data, imputing data is bad, or whatever. And okay, well, we'll have to change our agents so we don't impute data for them. But then some other companies are like, oh yeah, we impute data, it makes everything easier.

30:09

Speaker B

Right.

31:15

Speaker A

And you know, you want to know what the real modern dark arts are? The AI-resistant area of the world is medicinal chemistry. That is the spot where there's so much superstition.

31:17

Speaker B

Oh yeah, yeah. Everyone is like pseudo-religious.

31:26

Speaker A

Yeah, exactly, exactly. You have to be a survivor, I feel. Otherwise you get burnt out.

31:30

Speaker B

But the religions never agree. Two medicinal chemists will have completely different viewpoints about a functional group.

31:33

Speaker A

Yes, exactly. And I remember this: I talked to somebody who worked with a CRO, and they're like, oh, whenever company X orders anything, we never put boron on any of the compounds, because they hate boron, because there was one program that was killed because there was a boron somewhere in the core and it led to some toxic side effects. So: no boron for this company. This other company, they love things to be fluorinated, because they think it's great for the ADMET properties.

31:39

Speaker B

Right.

32:02

Speaker A

And so there's all this stuff where you reach the point where you're at, I don't know, human bias level, or human disagreement level. And I think we're getting to that point in data analysis. And so of course you will see that if I take the raw data from a paper and I analyze it myself, I will get a different conclusion. One of the cool tricks you can do, back to this brute-force thing, is that I can go to our agent and run it 100 times and take the consensus analysis. Or I can say: even if you make these three different choices in your data analysis, you get the same conclusion, or, this conclusion is somehow sensitive to those choices. There are even words for this: epistemic versus aleatoric uncertainty. Right? Aleatoric means I think it's noise from the data; epistemic uncertainty means I think there are some choices being made, some model differences, that lead to the disagreement. There's a Donald Rumsfeld formulation of this as well, known unknowns and unknown unknowns; same aleatoric-versus-epistemic debate there.
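The run-it-many-times trick, take a consensus, then check whether the conclusion is sensitive to analysis choices, can be sketched with a toy analysis. Everything here is invented for illustration (the data, the impute-vs-drop choice, the 0.5 threshold); the point is separating disagreement that comes from analyst choices (epistemic) from noise in the data (aleatoric):

```python
from collections import Counter

def analyze(data, impute_zero):
    # Toy analysis whose conclusion hinges on an analyst choice:
    # treat missing values as zero, or drop them entirely.
    if impute_zero:
        values = [0.0 if x is None else x for x in data]
    else:
        values = [x for x in data if x is not None]
    return "effect" if sum(values) / len(values) > 0.5 else "no effect"

data = [0.9, 0.8, 0.7, None, None]

# Run the analysis many times under a mix of choices, then vote.
choices = [False] * 70 + [True] * 30   # suppose most runs choose to drop
runs = [analyze(data, c) for c in choices]
consensus = Counter(runs).most_common(1)[0][0]

# If the conclusion flips when only the choice changes, the disagreement
# is epistemic (modeling decisions), not aleatoric (noise in the data).
epistemic = analyze(data, True) != analyze(data, False)
print(consensus, epistemic)  # effect True
```

Here the same data yields "effect" or "no effect" depending only on the imputation choice, which is exactly the kind of sensitivity the consensus trick is designed to surface.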

32:02

Speaker C

Interesting. This kind of leads into digging into Cosmos.

33:01

Speaker A

Yeah.

33:04

Speaker C

So I glanced at the paper, and one of the things that jumps out is that there was a certain class of problems for which it was only 50-some percent accurate.

33:05

Speaker A

Oh yeah, yeah.

33:16

Speaker C

And can you talk a little bit about that? Okay, so if I'm just raw getting 50% accurate answers, and then I'm going into the wet lab and being like, okay, try this, and then it's like, the stupid thing told me to do it. Well, how do you.

33:17

Speaker A

I would say, first of all, that 50% is actually pretty good, because it's rare that experiments in the lab are actually coin tosses. Right? There are usually a lot more outcomes than binary.

33:31

Speaker C

Yeah, yeah, sure. Okay. Yeah.

33:41

Speaker A

But that particular number was human agreement on the interpretation of the results. Okay. And so we asked people to evaluate different aspects of Cosmos. We had them evaluate the data analysis decisions; people were asked to evaluate the literature work, like: do you agree with its findings from the literature? That number that was 50% came from Cosmos' interpretation of some of the analyses. Yeah. So it might go into the literature and find this result and say, wow, this is super exciting, this is amazing. Or it might do data analysis and say, this is a novel discovery, we're really excited about it.

33:42

Speaker C

Yeah.

34:15

Speaker A

And then people would disagree: that's actually not interesting. Or, I don't agree with the interpretation of it.

34:16

Speaker C

So it's like picking bad problems maybe.

34:20

Speaker A

Yeah.

34:22

Speaker C

In the negative class.

34:22

Speaker A

And so I think that 52 or 55, whatever it is, that's interpretation. And so I agree. I think that's where, like I was saying, the frontier right now is scientific taste. Yeah. And so that's what we're working on right now: how do you get that interpretation to match?

34:24

Speaker B

Could you step back and just introduce Cosmos from a high level? Yeah, yeah.

34:39

Speaker C

I would actually be even curious to hear it starting from, like, ChemCrow. And you know, you have PaperQA, Aviary, Ether0.

34:44

Speaker A

Yeah, yeah.

34:53

Speaker C

I'd like to hear a little bit of the lineage and how those different decisions were made. What were the key learnings and how did you get to where you are now?

34:54

Speaker A

Yeah. So I could retcon and tell a really great story about how we arrived at Cosmos. But I will say that, like, to a large extent, we just try a lot of stuff and sometimes it works and sometimes it doesn't.

35:03

Speaker C

Okay.

35:12

Speaker A

You know, I'll say that I'm a builder. I like to build things piece by piece. There's probably some fancy word for it, but I'm like a Lego guy or something. My vision was that we would make an agent that does this part of the scientific process, an agent that does that part of the scientific process, whatever. And so we had ChemCrow, which was going to help us with setting up our medicinal chemistry work. We had ProteinCrow, which we haven't released and I don't know if we ever will, but ProteinCrow is for designing proteins we might need for some part of our workflows. Or we had a data analysis agent, so an LLM plus tools.

35:13

Speaker C

Okay.

35:47

Speaker A

Or we had Ether0. It was like, okay, we noticed that the frontier models can't work with molecules very well, so let's make a model with intuition for medicinal chemistry. And that was what led to Ether0. But then Sam actually really pushed us: let's just see if we can do the whole thing. Let's just try to build an AI scientist. And that was what led to Robin. And Robin was like, let's just take these agents we already have and put them in a workflow, basically. You could express it in a concise Python file: try a whole bunch of ideas, then go see if they all filter through the literature or if they've been disproven, then come up with experiments that you could do in a wet lab (and this is our inventory list), then go analyze all the data, then go back and repeat the process. Right? So that's what Robin was. And we came to Cosmos while trying to understand what is the process that Robin is automating. It came from this idea of a world model. When we first started Edison, we were thinking: what do we want to change about this? What is new here? And so we spent some time thinking about the scientific process, about what is actually going on in my brain, which is that I have some understanding of the world, or the phenomena I'm studying, and that's my world model. And a lot of the actions I take are about trying to update that world model. It's something that changes over time. But it's also something that is practical: I can use it to make predictions, like, I know from this experiment this will happen. That's why it's a model and not just memory or a bunch of papers or something like that. That's how it's supposed to operate in Cosmos.
We tried this idea out, and actually Ludo, who's the first author on the paper, tried a whole bunch of ideas around world models with us, and we kind of thought they weren't really working. We tried a lot of different ways to do this, method A, method B, method C, and they were just okay. And so we all had to take a break. Ludo's project on this world model stuff didn't work, but he's like, I'm going to keep trying it. Ludo's a very stubborn person. So he tried it for, I don't know, a week or two. And then he kind of quietly was like, hey, can you guys come take a look at this? And we're like, wow, this is actually really cool. And then we started building on it and jamming, really. And I think what Ludo figured out is that you have to get this experimental loop going. And the data analysis agent is what got us the loop: if you put that in the loop, it can really update this world model. Because we were trying to build it around literature before, and when you build it around literature, there just aren't really experiments you can do and then see the results; literature was our surrogate, and it just wasn't working. Data analysis actually really lets you explore ideas. And so that was what led to Cosmos. In Cosmos, we basically had all the pieces sitting around: we were working on world models, we were working on a data analysis agent, a literature agent. And we had built a platform for scientific agents, so we had things that can write a LaTeX report, things that can make nice plots. Then we put that all together, and a world model was sort of the glue that allowed it to fit together. An analogy is coding agents: GitHub is sort of the glue. There's some shared repo and everyone works on the repo.
And software engineers have spent, whatever, lots of brain cycles thinking about the way to coordinate and organize working on code together for a long time.
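The loop described here, enumerate ideas, filter them against the literature, run and analyze experiments, fold the results back into a shared world model, can be sketched roughly as below. All the function names and stand-ins are hypothetical; this is not Robin's or Cosmos's actual code, just the shape of the loop as described:

```python
def robin_loop(world_model, propose, literature_ok, run_and_analyze, rounds=3):
    """Sketch: enumerate ideas, filter against the literature, run the
    surviving experiments, and fold the analyzed results back in."""
    for _ in range(rounds):
        ideas = [h for h in propose(world_model) if literature_ok(h)]
        for h in ideas:
            # Verifier-in-the-loop: each result updates the shared state,
            # playing the role the world model (or a git repo) plays.
            world_model[h] = run_and_analyze(h)
    return world_model

# Toy stand-ins for the real agents (all hypothetical):
wm = robin_loop(
    world_model={},
    propose=lambda wm: ["h1", "h2", "h3"],
    literature_ok=lambda h: h != "h2",        # "h2" already disproven
    run_and_analyze=lambda h: f"result({h})",
    rounds=1,
)
print(wm)  # {'h1': 'result(h1)', 'h3': 'result(h3)'}
```

The `world_model` dict here is the "glue" in the analogy: every round reads from it and commits distilled results back to it, the way coding agents coordinate through a shared repo.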

35:47

Speaker C

So the world model is actually like a memory system kind of.

38:56

Speaker A

Yeah, you can think of it as a memory system. We think about it as a model. Like, you can actually put in input and it will output predictions. We think about calibration.

38:59

Speaker C

Yeah, yeah.

39:07

Speaker A

But really it is a big bundle of information that we accumulate over time. It's distilled in some way, and that is what allows us to do this. And I think you can think of a GitHub repo as a distillation. Right? Really there's a long graph of commits that lead up to the current file system in that GitHub repo. Or, I keep saying GitHub, I'm such a corporate shill here. Git. Your git repo is a distillation of all of the work that people put in, in the PRs and the commits. And so I think there's a nice analogy between a git repo and what a world model is. And I think that's just sort of what allows us to automate scientific discovery.

39:08

Speaker B

Well, can you talk about like kind of how you implement a world model or is that sort of like secret sauce?

39:48

Speaker A

That's our secret sauce right now.

39:53

Speaker B

That's fine.

39:55

Speaker A

No, it's fine. People have asked about it.

39:56

Speaker C

So one thing that's notably missing is the, like, simulation, right? Molecular dynamics, or, like, Boltz, or.

39:58

Speaker A

Yeah. I'll help you guys pump up your views here. So I think molecular dynamics is overrated.

40:09

Speaker B

In fact, coming from someone.

40:15

Speaker C

In the.

40:19

Speaker A

Thumbnail, you know. And DFT is overrated. In fact, DFT may be even more overrated than, like, molecular dynamics. I think these methods for materials or.

40:20

Speaker B

For biology or for both.

40:29

Speaker A

For materials.

40:30

Speaker B

Okay.

40:31

Speaker A

And I can explain more about that. Basically, MD and DFT have consumed an enormous number of PhDs and scientific careers at the altar of, you know, the beauty of the simulation.

40:32

Speaker B

Also, random interjection: once I did an estimate, I think pre-ChatGPT, that something like 20% of the world's computing power just went to simulating water.

40:43

Speaker A

Oh, my fucking God. Water.

40:52

Speaker B

Yeah.

40:53

Speaker A

Yeah. I had to deal with so many water simulations. I did DFT simulations of water, and they are so annoying. I used these big computers from the Department of Defense, and I spent like, I don't know, five months. And by the way, this is pre-LLM-training days; five months of compute is actually a really long time. I simulated water with quantum effects, with the Grotthuss mechanism for how a proton hops through water. And it's on YouTube. It's my number one YouTube video, even until now. And it represents, I don't know, a million CPU hours of compute. It was one of the biggest computes I've done, probably the biggest one in my life so far. Maybe Ether0 is bigger, but that took a lot more work. Anyway, and what's the point, you know?

40:54

Speaker B

What'd you learn? What'd you learn?

41:39

Speaker A

All I learned was what set of hyperparameters reproduce some physical effects of water, but none of it was de novo. Right? And this is the issue with molecular dynamics and DFT: they don't model the world correctly. And so we have to invent little stories we tell ourselves about how we're making good inductive biases so it models the world more correctly. Like in DFT, you simulate water at 330 Kelvin when you want room-temperature water. Is room temperature 330 Kelvin? No, it's not. That's a little too hot. Right? And so the issue is that people just make up these things, like, I don't know, GGA or BLYP or B3LYP, all these different methods. They're clearly empirical, and then people bolt them onto DFT and say, look, it's a first-principles method. But actually you made a whole bunch of choices and, whatever, overfit to the validation data to get this to work. And I think MD and DFT are like that, because if you go look at catalysts, which catalysts changed the world? None of them are single-crystal materials that are really well suited for DFT. They always have grain boundaries, they have dopants, they're complicated. Right? And you never capture that with DFT. So I think this is one of the fundamental, I don't know, dichotomies of the world: simulations simulate really boring things really well. They don't simulate interesting things very well. And so that's why I don't do DFT and MD anymore.

41:40

Speaker C

What about the machine learning stuff, like AlphaFold? AlphaFold was trained.

43:07

Speaker A

On X-ray crystallography data. And I think this is the story of MD: MD was supposed to be the protein folding solution. There is a great counterexample, a counterfactual, basically: a group called D. E. Shaw Research, DESRES. They had similar funding to DeepMind, probably more, actually. They tested to death the hypothesis that MD could fold proteins. They built their own silicon, they built their own clusters, they taped them out all themselves. They burned into the silicon the algorithms to run MD. They ran MD at huge speeds, huge scales. I remember David Shaw came to a conference on MD once, and he flew in by helicopter, pretty famous guy, kind of rich. And he gave an amazing presentation about the special computers in a special room just outside of Times Square and what they can do with them. Beautiful, amazing. And I always thought that protein folding would be solved by them, but that it would require a special machine. Maybe the government would buy like five of these things and we could fold maybe one protein a day, or two proteins a day. And when AlphaFold came out, and you can do it in Google Colab, you know, or on a GPU on your desktop, it was so mind-blowing. I always thought that protein folding being solved was inevitable. But the fact that it was solved, and that you can do it on your desktop, just completely floored me. It changed everything.

43:13

Speaker C

This is like the bitter lesson on steroids.

44:35

Speaker A

Yeah, I don't even know what it is, but imagine ChatGPT came out, but instead it was like, oh, you can just run it on your phone or locally on your own desktop. That's the level of shock. And it gets down to this thing: humans are really bad at estimating problems that aren't human-made problems. Protein folding, we all thought, would require a huge amount of compute. Very challenging problem. The hardest problem in the world. Right? And it turns out that, I think the numbers are now like 10,000 GPU hours, you can train a good protein folding model. It actually turned out to be barely an inconvenience.

44:38

Speaker C

Therefore. Why not?

45:07

Speaker A

Oh, therefore: protein folding was solved highly efficiently based on experimental data. They took X-ray crystallography; that's what DeepMind did, they took X-ray crystallography data. DESRES tried the first-principles method, and it's a nice head-to-head comparison: two very well-resourced groups, they tried different ideas, and machine learning on experimental data beat out first-principles simulation by a very large margin.

45:08

Speaker C

And so why isn't, like, Boltz or whatever inside of Cosmos? Like, why isn't there a tool that can run it?

45:32

Speaker A

Oh, we have Boltz inside of it. We have Boltz, BoltzGen. Yeah, we have that inside of Cosmos.

45:38

Speaker C

Okay.

45:43

Speaker A

I mean, I think in the version that we have for people to just sign up and use, it's not in there. Yeah. But you know, you can imagine: Modal or Lambda or Tamarind or 310, there are all these companies that basically wrap a lot of these deep learning protein design tools or chemistry design tools. They wrap them in an API, and you can just give that to Claude Code if you want, or you can give it to Cosmos, and you can be like, hey, if you want to design a protein for X, use.

45:43

Speaker C

These tools. Your mechanism, it sounds like, or one of the primary mechanisms that has been successful, is: enumerate a whole bunch of possibilities and filter. So how do you think about serendipity and out-of-distribution thinking, and how far have you gotten with that?

46:09

Speaker A

That's a great question. I guess the short answer is... so this is the domain of CBRN, chemical, biological, radiological, nuclear weapons, or, I don't know, safety.

46:26

Speaker C

Yeah.

46:38

Speaker A

This domain has been explored a lot in history by a lot of organizations, and I would say that a big question mark for us a few years ago was: how much of this stuff is intellectually bottlenecked?

46:38

Speaker C

Yeah.

46:52

Speaker A

Like, how often are people like, oh wow, I want to cause harm, but I need to know some facts, and could LLMs make that easier or faster or anything like that? I think the first set of answers, in 2023, was basically no. You can go find the synthesis route for many dangerous compounds on Wikipedia; people know what the targets in the human body are that are hit by most biological weapons. It's not really that much of a mystery. So I don't think there was a lot of new ground when LLMs first came about. Then there was a lot of concern about laboratory protocols: could agents or LLMs reveal some tacit knowledge that maybe people couldn't find on Wikipedia? Or maybe for making something there's some technique that is required when you scale it up in size, or maybe there's some way to get around tracking lists by ordering different compounds. And that, I think, was really well tested by a few different labs, not me, but there were some groups that spun up that started making tests for this, and labs pay attention to it. I think it's really been put into process, where LLMs will shut down or be filtered in those scenarios. But I think that is actually an area where there is some risk. And so this is something that people pay attention to for open-source models, and there's still some discussion there. But to a large extent it's not really been greatly accelerating in practice, or at least I haven't seen much evidence of it. And again, I think it comes down to the fact that the information is not really gated: if you look hard enough, you can find most of what you would need to get up to no good in the public domain already. But I think now the next frontier is: can it somehow help you with real-time protocol troubleshooting, more in the loop, and especially on the computational side of things?
There are some scenarios now coming into focus that could be more dangerous, or more intellectually bottlenecked. And so I think people are trying to pay attention to that to some extent. There was a first wave where we thought this could unlock a lot of stuff, and I don't think it came to pass. I think there's now an emerging second wave: there are some actually new scenarios that were just too far-fetched to consider two years ago that I think are now realistic. Some smart people are paying attention to it, but I don't think it's solved yet. I don't know, it's very vague.

46:53

Speaker B

So I guess one kind of differentiator: there's a lot of talk about AI safety in the broader LLM and ASI space, and there it's jokes about paperclip-maximizing robots or something. But the core threat here is more a malicious actor using this as a tool to accelerate something dangerous. And kind of the first-order hypothesis is that you basically already have to be an expert to effectively create a bioweapon or a chemical weapon, and experts already know how to do this.

49:18

Speaker A

Yeah, I think, you know, each of the categories in CBRN is a little different. But to a large extent it's a lot of pushing material around. The classical example, nuclear: it's a lot of centrifugation, a lot of ultracentrifugation, a lot of high pressures or high RPMs.

49:52

Speaker B

Yeah.

50:09

Speaker A

And so maybe you can get smarter about how to set up the economy of scale to do that with an LLM. But to a large extent, I think you can call your friend in country X and they can tell you what the steps are. I don't think it's that much of a secret. It's just a lot of moving material around, and I don't think it's meaningfully accelerated. Now, that said, there are all kinds of dumb dual-use things, like maybe you want to call a company that makes centrifuges, and you want to make sure that they sell them to you, and they go through some KYC steps, and maybe an LLM can get you through the KYC faster. And that's a dumb thing: okay, yes, email makes it so that you can order centrifuges off the Internet more easily; is email a dual-use technology? Yeah, to some extent it is. So I think there are a lot of weird second-order things that we don't pay attention to in AI safety. Does it make KYC easier? Does it make it easier for people to know where to order this from, or what the expected price is, or what you should order first? All those simple logistical things, I think, are accelerated by AI just as a consequence of AI being an accelerating technology. But certainly, shit, guys, there's some scary stuff. I try not to think about it too much. I don't want to get too political, but I do think that right now the United States government is maybe taking a slower, less intensive look at safety. But there are definitely people, I think, in other spaces than the US government thinking about it hard.

50:10

Speaker B

Do you think it's a thing people need to spend more time on?

51:45

Speaker A

I do get waves of angst about AI. I'm sure many people living in San Francisco get waves of it. And sometimes I think that there isn't enough work being done on it, and then sometimes I think, wow, I need to mellow out; we have lots of time to think about it. So what is my opinion? I don't know. I think my opinion is not fully formed.

51:48

Speaker C

Yeah. You and Sam have done a lot of thinking about funding science and the future of science. You've been vocal about the reproducibility crisis and other things. First question: why this focused research organization, or FRO, structure?

52:13

Speaker A

Yeah, focused research organization.

52:30

Speaker C

Yeah. What does that get you that you don't get from academia or a big lab or whatever?

52:31

Speaker A

A nice network of people. I think Edison is a real... of course, I think Edison's going to do great, but it's a mystery what's going to happen. So I don't think we've had as much friction there as you might expect. But yeah, this is all stuff that Sam and I think about all the time: how do you balance stuff like this? How do you balance the economics? You know, there are some venture-backed companies that are paying cash salaries over a million dollars, and it's insane to me that you would use all of the cash from your equity financing on these.

52:38

Speaker B

Insane salaries. Although in terms of total spend, compared to GPUs it can still be a small fraction of your burn. So sometimes it kind of makes sense.

53:12

Speaker A

Yeah, that's one way to think about it.

53:20

Speaker C

So this is a good lead-in to: you are automating science in some capacity.

53:24

Speaker B

Yeah.

53:33

Speaker C

So where does that leave scientists?

53:33

Speaker A

So I think this is Jevons paradox — we can try it here. Let me start with the contrast: if we automate taxi cab drivers, there's not going to be an increase in people needing to go places. Maybe there'll be somewhat of an increase, but there is a finite amount of time people will spend in cars, so there's an upper limit. When you automate that, it's a scarcity thing — you're displacing jobs when you automate driving. In science, I don't think there is a finite appetite or a finite capacity. Science is not a scarcity thing where there are 100 more discoveries left to be made and then we'll be done, and so we're displacing jobs. Instead, I think if we can make science go much, much faster, there will be no decrease in demand — there will actually be an increase in demand that matches whatever amount of automation we have. And so my vision for what a scientist would be in the future is that they will be, I don't know, agent wranglers or Cosmos wranglers: they're exploring 100 ideas simultaneously, or they're working with systems like ours to make 10x the discoveries, 100x the discoveries. Because I think there's an unlimited amount of scientific discoveries to be made, there's no scarcity set where basically we will displace them all. Now, that's what I would tell a first-year PhD student: everything's going to be just fine. But when it gets into the nuts and bolts, I do agree that this is going to be a really hard thing. If I am CEO of a company that does science — a pharma company or a materials science company, or the R&D arm at IBM — I might think, well, I could spend a million more dollars on compute for the AI scientist, or I could hire 10 more people.
I might just choose to go with the AI scientist, because to a large extent hiring people is hard, right? And hiring an AI scientist is probably a little bit easier. So I think there could be some friction. But another thing is, science is in some ways closer to art, in the sense that there is a large number of people who just appreciate good science. If you get published in Nature, it's not just because it's really going to be world-changing — of course that's part of it — it's also because people are like, wow, this is really interesting science. So the enjoyers of science are also scientists. And so it's kind of hard to imagine a scenario where there aren't scientists as the consumers of science. And if they're going to be consumers of science, they're also going to be some of the producers who are involved in the process itself. I don't know if that makes any sense.

53:35

Speaker C

Yeah, you've touched on this. The question in my mind is just: what does a scientist do then?

56:11

Speaker A

There's a great short story by Ted Chiang, I think from around 2003, about this. At first, scientists were displaced and became the interpreters of what the AI scientists were doing — the scientists read the AI scientists' papers and translated them for popular science or whatever. And then after that, they couldn't read the papers anymore, so they were left behind and had nothing to do; they just sat around. But the thing is, you have to translate science for it to make any impact. Science cannot exist by itself. Engineering can exist by itself: if you give some kind of system the goal of making a material I can build a space elevator out of, you could not participate at the beginning of the process or in the middle, and just come by at the end and say, okay, follow this recipe. But science — what's the origin of life? Has water run on other planets? Why is one catalyst better than another? — that has to hit human eyes and human brains at some point. So I think a human has to be involved in the process.

56:15

Speaker C

I don't want to be contrarian — but yeah, be contrarian. Why does a human have to be involved?

57:20

Speaker A

Why does a human have to be involved? Well, the human has to be involved at some point to say: yes, this is good science, or this is bad science.

57:24

Speaker C

Okay, so it goes back to taste.

57:29

Speaker A

Yeah. But I don't know, maybe you're right. Maybe there is no point for humans. Maybe we'll be like — what is it — Sora, the AI slop app. But even in Sora there are still humans at the end clicking on the videos or something.

57:32

Speaker C

Yeah.

57:42

Speaker B

So the Sora analogy brings up an interesting point. Is it possible that, due to the biases of AI science — if we really go all-in on it — there's still a market for boutique human science, the way there are still people who want to paint things the old-fashioned way? But more to the point: does it become even more important to have a human who is actively doing their own exploration? Because there will be large blind spots and biases, baked in by the models' training data, that you'll never be able to overcome — and without a human, you'll always get stuck on a blind spot that will never go away.

57:44

Speaker A

Bio, which is a company in Oakland or in Emeryville — they do really cool stuff with automation. I think they're going to be testing this theory: if that's the bottleneck, we can see evidence of it, because they're going to start doing really well. It could be true.

58:28

Speaker B

I still want to say that all of those, in my mind, are scoped in terms of R&D for pharma or bio; none of them are attempting to answer big fundamental questions. And maybe there are different levels when I think about that — it seems like the focus of Future House and Edison is much more towards the R&D end of science. But I have some background in fundamental physics, so: is there any thought about how you take on, say, dark matter candidates? I just think the data to really give us a complete story is not there yet.

58:42

Speaker A

You know, I'm sure everybody at every company is the biggest critic of their own product. We think Cosmos is great, but there's a very large area for improvement.

59:26

Speaker C

So with Cosmos, there's an open, accessible-to-everybody version. Do you provide other labs access to a version that is less open?

59:39

Speaker A

We have a version of Cosmos that has bigger resources: it can run for longer, and when it does data analysis, it'll have a GPU. We use that for things like machine learning experiments — if you want to know whether it's better to pre-train first on noisy data or not. We have pre-release models that are coming out, and we try those. So I guess yes, we do, and we have research partnerships with companies where we build something specific for them, and that is something we think about. But broadly I would say the Cosmos that's on the website is pretty close to the best we have internally.

59:52

Speaker B

I have a question. You previously stated that you think natural language is the language of chemistry — that the future of chemistry is language.

1:00:35

Speaker A

Yeah. Yeah.

1:00:47

Speaker B

Okay. So I wonder, do you still believe that?

1:00:48

Speaker A

Good question. I would say yes, I still believe that. In that opinion article — at the time I wrote it, maybe three years ago now, maybe 2023 — my point was that we have models for predicting solubility of compounds, we have data about very large populations, we have papers, and we have code, and the only way to bridge all that information is natural language. The argument was that humans, whenever we can't bridge information — if I can't talk about my code or some idea with you — will invent words until we can get the point across. Humans are always innovating on language to make it represent all known observations; people innovate on language to represent whatever code pattern they have. This is the only shared activity we've been doing for this long: coming up with words to represent everything we know. So I think, for that reason, natural language is the only possible way to connect all the different pieces of data we need in biology, medicine, or any domain for that matter. There are some caveats to this. If Yann LeCun were here, he would make an argument about world models, or vision, or embodiedness. There are arguments against natural language: maybe there's something more, and it's not the complete story. Or maybe natural language imposes limitations you cannot exceed, because you are stuck in this abstract space that was invented by humans and you can't escape it to, like, touch something.

1:00:52

Speaker B

Yeah, it is an abstraction, right? And scientists basically work exclusively in abstractions, to some degree. I found that interesting, because it seems like most scientists, when they explain things, do explain them through language. But many conversations — maybe most — at some point result in people drawing diagrams or something. Chemistry — biochemistry largely, or medicinal chemistry — is oftentimes a language of graphs, right? I mean, bonds are abstractions, yes, but they're pretty good abstractions for many cases. Or geometry — thinking about a protein as the geometry of a protein. I think that's how a lot of scientists like to think about things. So I find it interesting that you are focusing primarily on language. Have you thought about essentially a multimodal version of this, where when it comes across a SMILES string, it doesn't just say, oh, this is a SMILES string, but: this is a graph, this is a representation of some higher abstract object?

1:02:27

Speaker A

You're absolutely right. And the problem with this — Jacob's ladder or whatever you want to call it — is: yes, you can call a molecule by its name, you can show the graph. Then you go to a molecule like ferrocene — well, it doesn't really have bonds in the usual sense, so you're like, well, we need to draw it differently. Then you go to a molecule like, I don't know, glycine betaine, with this dihedral angle — it's not actually the one thing I drew, it's actually an ensemble between this conformer and that conformer. Then you go to benzene: not only is it an ensemble of different conformers, it has electron density, and you can't really ignore the electron density in benzene — you need to treat it correctly. And then, well, you can't actually represent the electron density that way; you have to look at the correlation of the electrons individually, because you can't really model benzene with, like, DFT — a density functional — you have to look at the electron correlation. And then, well, you can model the electron correlation, but these things in solution have relativistic effects, because there's a whole bunch of stuff around them, so you really have to have relativity in there. And now you've got the relativity and the electron correlation, you have the bonds and the conformers — but you really need to think about the cosmic radiation background, because it does actually impact everything, and there is some energy there. And before you know it, you've run out of compute, or whatever resource you're using to model this. So you have to draw the line somewhere. Natural language, like I said, is where humans have worked for a long time to make it — what's the word? —
sit right at that boundary: still abstract enough that you don't need to know all these details, but still granular enough, concretized enough, that you can actually make use of it. There may be some other representation — multimodal might turn out to be it, or video, or, I don't know, some other fusion you can make. I like natural language because we all worked really hard to put it right at that boundary. And I do agree, sometimes ideas slip and they can't be in language — you have to get out the whiteboard, or you have to wave your hands around; maybe then you need that degree of freedom to communicate.
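The "SMILES as a graph" idea from the question above can be made concrete with a toy parser. This is a hypothetical sketch for illustration only — it handles the organic subset, branches, and ring closures, but ignores bond orders, charges, isotopes, stereochemistry, and implicit hydrogens, all of which a real toolkit like RDKit handles:

```python
import re

def smiles_to_graph(smiles):
    """Toy SMILES -> (atoms, edges) parser for illustration only.
    Handles the organic subset, branches (), and ring-closure digits;
    ignores bond orders, charges, isotopes, and stereochemistry."""
    token_re = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]|[()]|\d")
    atoms, edges = [], []
    branch_stack, rings = [], {}
    prev = None  # index of the atom the next bond attaches to
    for tok in token_re.findall(smiles):
        if tok == "(":
            branch_stack.append(prev)        # remember the branch point
        elif tok == ")":
            prev = branch_stack.pop()        # return to the branch point
        elif tok.isdigit():
            # Ring closure: bond the two atoms carrying the same digit.
            if tok in rings:
                edges.append((rings.pop(tok), prev))
            else:
                rings[tok] = prev
        else:
            # Atom token: pull the element symbol, e.g. '[nH]' -> 'N'.
            symbol = re.match(r"[A-Za-z][a-z]?", tok.strip("[]")).group(0)
            idx = len(atoms)
            atoms.append(symbol.capitalize())
            if prev is not None:
                edges.append((prev, idx))
            prev = idx
    return atoms, edges
```

The point is just that the same string supports two readings: a token sequence for a language model, and an explicit graph for anything that wants topology.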

1:03:35

Speaker C

Just digging in on this a little bit: famously, quantum mechanics is, like, indescribable, right? There's an argument that you cannot understand quantum mechanics with words, or with our preconceived understanding of the physical world, because it doesn't behave like the macroscopic world — so the only way to understand it is through mathematics. And I largely see language as the joining language of science as well. But I wonder if that's not true for many domains, and quantum mechanics is just the one that hits you in the face.

1:05:42

Speaker A

I mean, I don't know, actually. I think there are, like, seven principles of quantum mechanics — or five, or something like this — that you can actually express pretty concisely in language. I agree that you need to look at the consequences of them; you need some mathematics. I don't know — this is like a challenge. I think you could actually describe a lot of quantum mechanics in language. But I see your point. And I guess I'm a realist: when I talk to my kids, maybe I will say, okay, let me draw this for you. I don't insist that in our house everything is described with natural language. So I agree with you there. I think maybe we can be a little flexible with natural language and include equations and SMILES strings in it, and we can get a little bit farther. So maybe that's okay. But some people like optionality — oh, it could be this, or it could be that. I'm somebody who likes to take strong opinions and see how much farther they can get me. And I think in my career it's actually been better for me to take strong opinions which, in my deepest of hearts, I know are maybe not correct or not fully correct — but once you take these strong opinions, you can move many steps down the road. For example, at Future House, we took the opinion that scientific agents are the future, and that allowed us to skip a lot of steps, because a lot of other people were saying, we need to build a foundation model for X, and we just skipped all that. And if you were unopinionated and kept your optionality — I can think of a famous example of a different company that liked the optionality, and they wasted a lot of time on foundation models or something — then I think you get stuck.
So that's one of my strong opinions: natural language is a way to join all these different domains. It may not be a correct opinion — it may be more subtle or more complicated — but it's allowed me to get very far. I'll drop it someday and maybe find a new one.

1:06:17

Speaker C

Yeah, not yet, though.

1:08:11

Speaker A

That's my meta opinion on the matter.

1:08:13

Speaker B

The ether0 story on your blog I find hilarious and kind of awesome. When I was a kid, I loved the genie, monkey's-paw concept of be careful what you wish for, because you just might get it.

1:08:15

Speaker A

Yes.

1:08:29

Speaker B

Maybe just a quick story — can you talk about that? It was really fun.

1:08:30

Speaker A

ether0 was a hell of a project, because conceptually it was a very short project: hey, people have made a lot of progress on verifiable rewards in math and code — let's see if we can do it in chemistry. Chemistry is not a verifiable field, right? Of course you can go test something in the lab, but we had to think about all these ways we could make chemistry verifiable. One of the ones we settled on was: make a molecule that has three nitrogens, two oxygens, ten hydrogens, or something. We thought that was a pretty verifiable question. But every time we would train a model, it would find some new, insanely weird trick to generate these molecules. I'll just tell you one of the examples. It would make these molecules, and we would do some checks to make sure they had the right bonds, the right number of electrons, the right number of atoms, and stuff like that. But it would solve the problem in any way possible — it would just put all the nitrogens over here, put all the oxygens over here, which gives you things that don't look good. So we started coming up with these rules: let's check that it followed these good practices, and these good practices. We found ourselves in — it's like the opposite of the bitter lesson, the boutique lesson, where you try to make everything custom. But one of the things it kept doing is putting these nitrogens in a row — one nitrogen, two nitrogens, three nitrogens, all in a chain. And if you have three nitrogens, it's explosive; two nitrogens, it's bad; four nitrogens, you can't make. And I kept telling everyone, when it made these six-nitrogen compounds, that they're just literally impossible.

Many of the people on this team were computer scientists, and one of them one day sent me this: on the cover of Nature — that day on Nature's website — somebody had made a six-nitrogen compound. This was somebody's whole career, delivering this compound, because it's the most unstable, insane compound you can make. It's some ridiculous setup, and the spectroscopy to prove it was very difficult — I don't know how they did it; it was an amazing accomplishment. "Look, Andrew, it's not actually impossible." And it was so funny to me that our model was sitting here spitting out these six-nitrogen compounds in 2024 or 2025, and the paper just happened to come out that year that mankind had finally made a six-nitrogen compound.
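The kind of check described here — verifying a proposed molecule's elemental composition — can be sketched as a toy verifier. This is a hypothetical illustration, not the actual ether0 code: it counts only heavy atoms with a crude regex over a SMILES string (implicit hydrogens are ignored, and a real verifier would use a proper cheminformatics library such as RDKit):

```python
import re
from collections import Counter

def heavy_atom_counts(smiles):
    """Crude heavy-atom count from a SMILES string (toy sketch).
    Implicit hydrogens are NOT counted, so H targets can't be checked."""
    counts = Counter()
    # Bracket atoms like [NH4+] or [nH]: grab the element symbol only.
    for m in re.finditer(r"\[([A-Za-z][a-z]?)[^\]]*\]", smiles):
        counts[m.group(1).capitalize()] += 1
    # Strip bracket atoms, then tokenize the bare organic-subset atoms.
    bare = re.sub(r"\[[^\]]*\]", "", smiles)
    for m in re.finditer(r"Cl|Br|[BCNOPSFI]|[bcnops]", bare):
        counts[m.group(0).capitalize()] += 1
    return counts

def composition_reward(smiles, target):
    """Binary reward: 1 only if every targeted element count matches.
    Only the elements named in `target` are constrained."""
    got = heavy_atom_counts(smiles)
    return all(got.get(el, 0) == n for el, n in target.items())
```

A constraint like "three nitrogens" then becomes `composition_reward(smiles, {"N": 3})` — and, as the episode describes, a model will happily satisfy it with chemically absurd structures unless you pile on further checks.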

1:08:36

Speaker C

Do you think that those were actually synthesizable, even under these extreme circumstances?

1:10:44

Speaker A

Our model? It was just reward hacking.

1:10:49

Speaker C

Okay.

1:10:52

Speaker A

The model was just so creative in ways to reward hack. Another one we did: when it would propose a reaction — make this compound, tell me how to make this compound — we would try to make sure that all the reagents were purchasable. Like, you could actually purchase them; they were not made up.

1:10:52

Speaker C

Yeah.

1:11:10

Speaker A

And the reason we came up with that is that originally it would just take the end compound, remove one atom, and say: buy this, then put the atom on. And it's like, okay, well, I wish it worked like that. So they have to be purchasable. And then we thought it might be too hard if they all had to be purchasable, because sometimes you actually order things custom — so we just required that one be purchasable. And the first thing it starts doing is putting nitrogen in there, because nitrogen is purchasable — and it has no participation in the reaction.

1:11:10

Speaker B

Right.

1:11:37

Speaker A

Like, oh my God. Okay, so then it's: it has to be purchasable, and it has to participate in the reaction. It starts putting in acid-base chemistry — put an acid here; acids are purchasable, and it'll move one atom. Okay, fine — it can't be that; everything has to be purchasable. Then I find myself sitting there one day building this ridiculous catalog of purchasable compounds, and a Bloom filter so it can go fast enough in our training loop, and I'm like, why am I doing this? How did I get here? It was really funny, because training transformers on just data — supervised training, where you have the inputs and the outputs directly — is very nice, relaxing; things are robust, things go pretty smoothly. When we do these verifiable rewards, where you have to write a bulletproof verifier, it is really difficult. We had so many models trained only to find out they were hacking some other random thing in our setup. It's really hard, and I don't envy the frontier labs that have to do this at massive scale, because we had a lot of adventures in ether0. You guys should read the blog post.
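A Bloom filter like the one mentioned here gives constant-time, memory-cheap membership checks against a huge catalog, at the cost of a small false-positive rate — it can occasionally say "purchasable" for something that isn't, but it never misses an item that was actually added. A minimal sketch, with made-up sizes and catalog entries:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: k hash positions per item over a bit
    array. False positives are possible; false negatives are not."""

    def __init__(self, size_bits=1 << 20, num_hashes=7):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from overlapping windows of one SHA-256.
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.k):
            window = digest[2 * i : 2 * i + 4]
            yield int.from_bytes(window, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

In a training loop you'd load the whole purchasability catalog into the filter once, then test each proposed reagent in effectively constant time — which is the point of reaching for this structure instead of a database lookup inside the reward function.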

1:11:37

Speaker B

Definitely read the blog post.

1:12:40

Speaker A

It's very fun.

1:12:42

Speaker B

Great read.

1:12:42

Speaker C

GRPO?

1:12:43

Speaker A

We did make some modifications to GRPO, yeah. I used to know all the names of these modifications — I think DAPO is one, and the clipping we did was special. We explored a lot of that stuff.

1:12:44

Speaker C

Yeah.

1:13:00

Speaker A

And it was also one of those things where you think the hyperparameters are wrong, the algorithm is wrong, and then you find out it's just because you had somehow sorted the reagents alphabetically when you made your training data, but you didn't sort them in the test data — and the model was just barfing, because its whole strategy was to exploit something in the way you sorted things. So yeah, we explored a lot of different methods, and I learned a lot about chemistry, a lot about nomenclature, and actually a lot about medicinal chemistry as well — more than I ever wanted to.
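The sorting bug described here is a general data-hygiene lesson: any normalization has to go through one shared function applied identically to every split, or the model can key on the incidental difference. A hypothetical sketch of that pattern (the field names are made up):

```python
def canonicalize_reagents(reagents):
    # ONE shared normalization used for every split: sort reagent strings
    # so the model cannot exploit incidental ordering differences
    # between training and test data.
    return sorted(reagents)

def make_example(product, reagents):
    # Build a train/eval record from the canonical form only, so the
    # same raw data always serializes to the same example.
    return {"product": product, "reagents": canonicalize_reagents(reagents)}
```

With this in place, two records that differ only in reagent order serialize identically, so ordering can never become an exploitable train/test artifact.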

1:13:01

Speaker C

Awesome.

1:13:32

Speaker B

If you want to do some engineering, check out Edison Scientific — I think they're hiring for lots of interesting things, everything from scientists to infrastructure engineers.

1:13:33

Speaker A

Yeah.

1:13:44

Speaker B

Thanks again, Andrew.

1:13:45

Speaker C

Yeah, thank you very much for joining us.

1:13:46