"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

Approaching the AI Event Horizon? Part 2, w/ Abhi Mahajan, Helen Toner, Jeremie Harris, @8teAPi

143 min
Feb 14, 2026
Summary

Part 2 of a live AI analysis show featuring discussions on AI for biology/medicine, automated AI R&D, and geopolitical AI competition. Guests include Abhi Mahajan on AI for cancer treatment, Helen Toner on recursive self-improvement risks, and Jeremie Harris on AI infrastructure security threats.

Insights
  • Biology lacks verifiable ground truth for clinically valuable problems, making AI reinforcement learning from experimental results much harder than in math/code domains
  • Automated AI R&D may create strategic surprises but experts fundamentally disagree on whether it leads to full human replacement or just accelerated workflows with human bottlenecks
  • US AI infrastructure is heavily compromised by Chinese components and personnel, creating existential vulnerabilities that require immediate attention to supply chains and data center security
  • The gap between internal lab capabilities and public models may be smaller than expected, but tooling and speed advantages give labs significant practical benefits
  • Market adoption is pulling AI capabilities into production faster than researchers anticipated, with consumer willingness to accept privacy/security tradeoffs accelerating deployment
Trends
  • Recursive self-improvement in AI systems becoming an imminent reality, with 2027-2028 timelines
  • Shift from model-release-based to continuous AI risk evaluation frameworks
  • China's dominance in AI hardware supply chains creating national security vulnerabilities
  • Foundation models for biology focusing on human tumor data rather than in vitro experiments
  • Export controls effectively slowing Chinese AI development despite public skepticism
  • Jagged AI capabilities creating unpredictable automation patterns across domains
  • Test-time training enabling personalized AI models that diverge from base models
  • Infrastructure security becoming a critical bottleneck for AI development
  • Biomarker discovery through embedding-space analysis without human interpretability
  • AI agent workflows reaching production readiness for business automation
Companies
Noetic AI
Abhi Mahajan's company building foundation models to predict cancer patient treatment responses
OpenAI
Discussed for public timelines on AI R&D automation and recent model releases with evaluation challenges
Anthropic
Mentioned for AI safety research, model releases, and commitment to automated AI R&D development
TSMC
Critical semiconductor manufacturer vulnerable to Chinese invasion, representing AI supply chain risk
Nvidia
Key AI chip provider affected by export controls and Chinese demand for advanced semiconductors
Huawei
Chinese tech company developing AI chips as alternative to restricted Western semiconductors
Isomorphic Labs
Announced predictive model doubling AlphaFold 3 performance on key drug discovery benchmarks
DeepSeek
Chinese AI company whose CEO publicly stated export controls were limiting their AGI development
Gladstone AI
Jeremie Harris's company that wrote the first US government AI threat assessment for the State Department
CSET
Georgetown research center led by Helen Toner that published 'When AI Builds AI' report
Artera AI
Pathology AI company that received FDA approval for black box companion diagnostic model
Axiom
Startup creating models to predict small molecule toxicity impact on hepatocyte cells
People
Abhi Mahajan
AI researcher at Noetic AI working on cancer treatment prediction, known as 'owlposting' online
Helen Toner
Former OpenAI board member, runs CSET at Georgetown, authored report on automated AI R&D risks
Jeremie Harris
Gladstone AI researcher who wrote the US government AI threat assessment and hosts the Last Week in AI podcast
Dario Amodei
Anthropic CEO mentioned for 'Machines of Loving Grace' essay on AI's biological development impact
Eliezer Yudkowsky
AI safety researcher whose early writings on intelligence explosion influenced current lab leaders
James Zou
Stanford professor discussed earlier in the show for AI-for-science research and test-time training work
Sam Hammond
Policy advisor mentioned for earlier discussion on US administration's AI competition management
Dan Hendrycks
AI safety researcher mentioned for AI threat assessment work and timeline predictions
Nicholas Carlini
Anthropic researcher who participated in heated AI R&D automation debate at CSET workshop
Ryan Greenblatt
Redwood Research member who engaged in AI R&D timeline discussions at Georgetown workshop
Quotes
"97% of oncology trials fail and you can look at that and say like, wow, we're awful bad at designing these drugs. Maybe we should get better at designing them."
Abhi Mahajan
"There is no version of an international treaty on AI that doesn't involve inspections of compute stockpiles and like very precise overwatch of the kinds of algorithms that are being deployed."
Jeremie Harris
"We lack both the technical means to reliably control superhuman AI systems and the trust and coordination mechanisms needed for the US and China to address this problem collaboratively."
Nathan Labenz
"Maybe the biomarkers that define patient response for this particular drug is non human legible. You need a black box biomarker to encapsulate whatever that piece of information is."
Abhi Mahajan
"When AI builds AI, things just might start to get weird."
Helen Toner
Full Transcript
6 Speakers
Speaker A

Hello and welcome back to the Cognitive Revolution. Coming up, you'll hear part two of a marathon live show that I co-hosted with my friend Prakash, also known as 8teAPi on Twitter, in which we explore AI for science, recursive self-improvement, and geopolitical competition. I love doing full deep dive episodes, but I can only cover so many topics in that way, and so I am experimenting with higher intensity live shows as a way to deliver what I hope is the same high quality analysis, but in a denser format. In the first half, which hit the feed yesterday, we talked to Professor James Zou of Stanford about his work on AI for science, Sam Hammond about how well the US administration is doing at managing international AI competition, and Shoshana Tikovsky about AI agent behavior in the wild.

0:00

Speaker B

In this second half we talk to.

0:51

Speaker A

Abhi Mahajan, also known as owlposting, about AI for biology and medicine, including the foundation models he's building at Noetic AI to better predict which patients will respond to which cancer treatments and why. Though he's skeptical of many AI for biology results that have been published to date, he does expect trends to continue to the point where AI is ultimately transformative for the field. Then we talk to Helen Toner about a report that CSET just put out called When AI Builds AI, which summarizes conversations from a closed-door workshop in which participants tried but failed to establish any consensus expectation about the impact of automated AI R&D, ultimately leading to the conclusion that automated AI R&D is simply a major source of potential strategic surprise. Then finally we have Jeremie Harris talking about the very challenging position we find ourselves in, where we lack both the technical means to reliably control superhuman AI systems and the trust and coordination mechanisms needed for the US and China to address this problem collaboratively. Plus a bit of discussion of how he maintains situational awareness and how our respective personal productivity stacks are evolving. As you'll hear, the challenges of making sense of such massive disagreement among leading AI experts and simply keeping up to date with AI developments coming at us daily come up repeatedly in these conversations. And to be real, nobody seems to have perfect solutions. One partial solution that I can recommend, though, is using large language models to help identify blind spots. And for that purpose I am really enjoying the Blind Spot Finder recipe that.

0:53

Speaker B

I recently created on Granola.

2:33

Speaker A

Granola works at the operating system level, so it can capture all of the audio into and out of your computer, including, if you wish, the contents of this episode. And its recipe feature can work across sessions to identify trends, opportunities, or, yes, blind spots that only become apparent with that zoomed-out view. Obviously this is a tool that grows in value over time, but if you want to try it today, I suggest.

2:35

Speaker B

Downloading the app, starting a session while.

3:01

Speaker A

You play this episode, and then asking it to identify blind spots based on this conversation. What is so cool about this feature, for active Granola users at least, is that the blind spots it identifies for you will be different from the ones it identifies for me. As I said last time, this was fun for me, but especially because it is a new format, I would love your feedback. Do you feel that you got as much value from this more time-efficient approach as you usually do from our full deep dive episodes? Or did we miss the mark in some way? Please let me know in the comments or, if you prefer, by reaching out privately via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. With that, I hope you enjoy the Cognitive Revolution, live from February 11th, co-hosted with 8teAPi.

3:03

Speaker C

I'm going to add Abhi Mahajan. Abhi is owlposting online, and he works on AI for cancer at Noetic AI. Abhi, welcome.

3:53

Speaker D

Great to meet you. Thanks for having me on.

4:06

Speaker B

You have the great distinction of being recommended to me as the Zvi for AI and biology and the intersection of those two. So big shoes to fill, a big reputation to live up to, but excited to meet you. This is actually the first time we've properly spoken.

4:09

Speaker C

I learned from Ron Alpha that you built an entire competitive intelligence platform, LLM-based, to feed the clinical analysis pipeline. And also that Claude recommends every cancer drug it sees. So let's talk about that.

4:24

Speaker D

Yeah. Increasingly, a lot of biopharmas are interested in asset acquisition as opposed to just developing their drugs from scratch. This is partially because China is pumping out a lot of very interesting preclinical assets. Why not just buy those for a few million dollars? They've already done the optimization; let's just run those in patients. Most of the time, the way you look for these drugs is either you mine your personal network or you use these clinical trial aggregation platforms that help you do the job. Both of these are obviously lossy, and a better way is to just scrape the entire semantic web yourself and annotate every single investigational drug you find with your company's priorities: what you think is important to look for, modalities you're particularly interested in. Organize that all into a table, rank it by some metric, and then give that to the therapeutics team to work off of. Obviously there's still a human due diligence step. These models are still not perfect. Even like 5.2, 5.3, not perfect, but it's pretty good.

4:41

Speaker C

Do you have an internal eval that you run when you swap model engines? Do you upgrade every time a new model engine shows up, evaluate, and then decide?

5:39

Speaker D

It's a pretty hacky process. Our metric, or at least my personal metric, for evaluation is: amongst the drugs that our therapeutics team are really interested in and want to move forward on, does the next generation of the LLM continue to recommend those drugs as very good? And I actually think it was pretty good at the very beginning. I only built this pipeline a few months ago; it remains pretty good now. I don't think there's been any dramatic jump. I partially think this is because identifying what makes for a good drug is a very qualitative, very vibes-based process. It depends on the economic status of the company. It depends on: do we know anyone there? Because oftentimes these companies don't make it easy for you to give them your money; it takes a super long process to figure that out. Yeah, it's pretty good though.
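A minimal sketch of the regression check described here, assuming a hypothetical recommend_drugs wrapper around whichever LLM engine is in use and placeholder drug names; not Noetic's actual pipeline:

```python
# Hypothetical regression check for swapping LLM engines. recommend_drugs()
# stands in for whatever wrapper queries the scouting pipeline; the drug
# names below are placeholders.

def recommend_drugs(engine: str, top_k: int = 50) -> list[str]:
    """Query the LLM-based drug-scouting pipeline with a given engine (stub)."""
    raise NotImplementedError  # call your pipeline here

# Drugs the therapeutics team actually endorsed and moved forward on.
endorsed = {"drug_a", "drug_b", "drug_c"}

def endorsement_recall(engine: str, top_k: int = 50) -> float:
    """Fraction of team-endorsed drugs the engine still ranks in its top k."""
    recs = set(recommend_drugs(engine, top_k))
    return len(endorsed & recs) / len(endorsed)

# Only upgrade if the new engine keeps recommending the drugs humans liked.
# if endorsement_recall("new-model") >= endorsement_recall("current-model"): ...
```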

5:50

Speaker B

So I definitely recommend your blog, owlposting.com. I've still got quite a bit of archive to work my way through, but I want to throw a couple of what I thought were your more interesting, arguably hot takes at you, and then get you to double-click into some of the intuition and implications of those. One: we're obviously in a moment now where there's a tremendous amount of interest in creating AI scientists of all kinds. And one of the big bets that companies are making, with some serious capital behind them increasingly, is that they're going to close the loop by allowing AIs to design and run their own experiments through some sort of automation, feed that data back in, and get reinforcement learning from, basically, experimental results. Now, one of the things that you said in one of your posts is that there's not a lot of verifiable ground truth in biology. I would love to understand what that means exactly, and then what it means for the ability to close that loop. Is there some sort of fundamental messiness or uncertainty that you see as, at least in the near term, irreducible, that would become the functional limit on how much systems could learn from that kind of closed-loop experimentation?

6:40

Speaker D

Yeah. To say that biology has no verifiable ground truth is probably a little bit hyperbolic on my end. But what I will defend is that there's not a lot of verifiable ground truth for the most clinically valuable problems. So yes, there is verifiable ground truth for questions like: does this protein exist in the solution? Is this variant that your NGS sequencer identified real? Those are both verifiable. But I don't think you'll see the same explosion of intelligence in biology that happened in math and code, because rewards are so cheap and so easy to get in those fields. In biology it's just such a long, iterative process to get any iota of information. One easy analog to this is training an RLVR model on the task of writing the best-selling book. There is technically a verifiable reward, right? There's book sales, there's the country the author is writing from, all these sources of data. But it takes 18 months to get that singular data point, and when you get it, it's very hard to trace it back to any one of those things. And one biology-grounded example of this: let's say you want to do RLVR on toxicology prediction. This is arguably the thing that sinks the vast majority of phase one drugs out there. Toxicology sounds like a very simple topic. It is not. A drug can be toxic on the order of seconds, like snake venoms. It can be toxic on the order of months or years. It potentially doesn't kill an animal; maybe it just leads to cognitive deficits or heart damage. Oftentimes it's dose dependent. It could also be species dependent. All these measures of toxicity, there's no real way to understand them other than just observing them in an in vivo setting and seeing what they read out at. There are companies, like one startup called Axiom, trying to create a model that can very easily tell, given a small molecule, what its toxicity impact is on hepatocyte cells. It's a very clean, simple problem that probably saves months of time in preclinical settings. But it doesn't poke at the much more important problem of how the drug performs in an animal.
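To make the contrast concrete, here is a toy sketch, not from the episode, of the two reward regimes being compared; the AnimalStudy fields are invented for illustration:

```python
from dataclasses import dataclass, field

def code_reward(program_output: str, expected: str) -> float:
    """Math/code-style verifiable reward: instant, exact, cheap to query."""
    return 1.0 if program_output.strip() == expected.strip() else 0.0

@dataclass
class AnimalStudy:
    """Stand-in for an in vivo experiment; fields are invented."""
    months_elapsed: float
    adverse_events: list = field(default_factory=list)

def toxicity_reward(study: AnimalStudy) -> float | None:
    """Biology-style reward: one noisy scalar per study, months later."""
    if study.months_elapsed < 6:
        return None  # no signal yet; the training loop just has to wait
    # Even then, the scalar entangles dose, species, and timescale effects,
    # so credit assignment back to any single design choice is ambiguous.
    return -float(len(study.adverse_events))
```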

8:00

Speaker C

Just a segue here. So Isomorphic Labs, I think yesterday, announced they have a predictive model which doubles the performance of AlphaFold 3 on key benchmarks: binding affinity, pocket identification, structure prediction. How does that fit into how things go? Is this actually useful, or does it just create more targets which need to be validated anyway, and so it's not that useful?

10:14

Speaker D

Yeah, obviously a very incredible piece of work from Iso. I'm no longer in the protein engineering field, but I think the benchmark they did, that leftmost plot they're presenting on, that's an incredibly difficult benchmark to get better at, and they're 2x better than what came before. So, very good. But I'm sure you've heard the sentiment that the field is already awash with many really good preclinical assets, and the bottleneck is actually how well these work in patients. It sounds perhaps obvious that if you get better at this preclinical design step, you get better at putting drugs into humans. That's the story that has been told for 10 years. It is not obviously clear that any of it has borne fruit. I imagine at some point it will, but there isn't really strong evidence to suggest that it does. There's actually this really great chemistry paper that came out just a few days ago called The Affinity Advantage. That paper is probably one of the strongest bull cases that being able to optimize every facet of every protein that comes through the preclinical pipeline has non-linear or super-linear benefits to the drug development process, and that it's just a matter of time until these models get even better. It's not an opinion I share, but I'm sympathetic to it.

10:43

Speaker C

In one of Dario Amodei's essays, the blog post Machines of Loving Grace, I think he tried to map out how he thought developments in biology would work, and he pointed out that a lot of the major developments in biology come from better imaging and sensing techniques that allow you to look deeper and understand more of what's happening in there. And after that it becomes easier to do a lot of other things downstream, starting with microscopy, which led to all the downstream developments from there, etc.

11:58


Speaker C

What do you think are the developments potentially coming up in the next four to five years that might do something like that?

12:38

Speaker D

I guess I would vaguely gesture at building generative models of human in vivo biology. I think there are layers of discussion to be had: what other instruments do we need, what other modalities do we need. I think there's a lot of low-hanging fruit in simply collecting a huge amount of highly rich data from real human tumors, intestinal lesions, plasma readouts, and just feeding a model with that information, and not paying attention to any of the in vitro or otherwise biologically unrealistic settings. And then from there maybe you get access to a genuine, bona fide human simulator of biology, and maybe that's a lot of help for fixing the current state where 97% of oncology trials fail. As for the Dario pitch of scientists in the data center churning out interesting ideas: I think there are already tens of thousands of PhD students churning out very good ideas. Most of them can't be validated because it's too expensive to do it.

12:48

Speaker B

Hey, we'll continue our interview in a moment after a word from our sponsors.

13:54

Speaker A

Are you interested in a career in AI policy research? If so, you should know that GovAI is hiring. Ten years ago, a small group of researchers made a bet that AI was going to change the world. That bet became GovAI, which is now one of the world's leading organizations studying how to manage the transition to advanced AI systems. GovAI advises governments and companies on how to address tough AI policy questions and produces groundbreaking AI research. GovAI is now hiring its next cohort of researchers to tackle hard problems that will define AI's role in society. The Research Scholar position is a one-year appointment for talented, ambitious individuals looking to transition into the field, and they're also hiring Research Fellows: experienced researchers doing high-impact AI policy work. Past scholars and fellows have defined new research directions, published in leading media outlets and journals, done government secondments, gone on to work in leading AI labs, government agencies, and research groups, and even launched new organizations. Applications close on February 15, so hurry to governance.ai/opportunities. That's governance.ai/opportunities, or see the link in our show notes. Want to accelerate software development by 500%? Meet Blitzy, the only autonomous code generation platform with infinite code context, purpose-built for large, complex, enterprise-scale code bases. While other AI coding tools provide snippets of code and struggle with context, Blitzy ingests millions of lines of code and orchestrates thousands of agents that reason for hours to map every line-level dependency. With a complete contextual understanding of your code base, Blitzy is ready to be deployed at the beginning of every sprint, creating a bespoke agent plan and then autonomously generating enterprise-grade, premium-quality code grounded in a deep understanding of your existing code base, services, and standards. Blitzy's orchestration layer of cooperative agents thinks for hours to days, autonomously planning, building, improving, and validating code. It executes spec- and test-driven development done at the speed of compute. The platform completes more than 80% of the work autonomously, typically weeks to months of work, while providing a clear action plan for the remaining human development, used for both large-scale feature additions and modernization work. Blitzy is the secret weapon for Fortune 500 companies globally, unlocking 5x engineering velocity and delivering months of engineering work in a matter of days. You can hear directly about Blitzy from other Fortune 500 CTOs on the Modern CTO or CIO Classified podcasts, or meet directly with the Blitzy team by visiting blitzy.com. That's B-L-I-T-Z-Y dot com. Schedule a meeting with their AI solutions consultants to discuss enabling an AI-native SDLC in your organization today.

13:58

Speaker B

So that connects pretty directly, it seems like, to what you are doing in your work on cancer at Noetic, right? You guys are focused, first of all, at roughly the clinical stage, and try to predict what drugs will work best for a particular patient given some relatively deep data about their specific condition. So maybe walk us through what that looks like. I was interested to learn that it is basically a foundation model with lots of different data sources thrown into it, and also that it's trained with this kind of masking strategy, where the idea is that the model has to learn how to predict from partial data, whatever partial data it might have. And I'm a big believer in that strategy, because there are obviously so many modalities, and so much noise going on inside the system that we don't understand. I've been a big speculator about that being a driver of how AI can help in health over time. So give me the double-click past what I have been able to learn with online research into what you guys are doing.

17:07

Speaker D

Yeah. Let me start with perhaps the economic pitch for Noetic. 97% of oncology trials fail, and you can look at that and say, wow, we're awful bad at designing these drugs, maybe we should get better at designing them. But one interesting phenomenon is that if you look at a lot of the papers published after a cancer clinical trial fails, there are usually some patients who did respond to the drug or the regimen they were on. And the researchers try really hard to figure out the exact biological archetype that makes up this responder population. And they always come up with something super complicated and very heterogeneous: this particular cytokine group, or granzyme genes were highly expressed in the responder population. It never leads to anything particularly interesting. And so one argument you can make is that maybe the biomarkers that define patient response for this particular drug are non-human-legible. You need a black box biomarker to encapsulate whatever that piece of information is. Noetic is kind of built around that thesis. We collect vast amounts of human tumor data. We profile them at four levels of modality: pathology, which is the blue chip that almost everyone has; spatial proteomics, a 16-plex panel to identify cell types; whole-transcriptome spatial transcriptomics, 19,000 genes over the entire surface of a tumor, to identify the functional state of the tumor; and then exome sequencing to identify genetic alterations. So, is this KRAS positive? Is this STK11 knocked out? And so on. And then the ML angle is that you train, yeah, exactly as you said, a self-supervised masked model, in the hopes that, one, you get a very good representation of any given tumor that walks in the door. You now have the ability to place, in the universe of all the cancers I've seen, where this patient falls in embedding space. That's a lot of what we are doing. We gather patient samples from people who have run clinical trials, we profile them, we run that through the model, and we see if the responder population falls in a different area than the non-responder population. If it does, maybe we have access to a biomarker that no human on earth understands, but that we uniquely are able to use. The more interesting thing you can do with this is use the generative capacity of the model: knock out specific transcripts or specific genes and see how that changes the expression of transcripts within the tumor microenvironment. There's this concept appearing in the cancer literature called nudge drugs: drugs that don't actually operate on the immune system, or really the cancer site itself, but rather push it in a direction that makes it more sensitized to other drugs. So you can imagine: I'll knock out this particular transcript, and then I will hallucinate what it would be like if I added Keytruda, which is an immune checkpoint blockade that operates on the PD-1 axis, into the site of the tumor. And maybe now you predict: oh, the tumor is highly inflamed, it's hot, there's a high chance that it'll just melt away entirely. Yeah, those are the two big economic and ML strategies we're pursuing.
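For readers who want the shape of the ML setup, here is a minimal sketch of the two steps described: masked self-supervised training over tokenized modalities, then checking whether responders and non-responders separate in embedding space. All names, sizes, and data are illustrative stand-ins, not Noetic's actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedTumorModel(nn.Module):
    """Toy masked-modeling setup in the spirit described above."""
    def __init__(self, n_tokens=4096, d=256):
        super().__init__()
        self.embed = nn.Embedding(n_tokens, d)
        self.mask_token = nn.Parameter(torch.zeros(d))
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d, n_tokens)  # reconstruct the hidden tokens

    def forward(self, tokens, mask):
        x = self.embed(tokens)                                          # (B, L, d)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        h = self.encoder(x)
        return self.head(h), h.mean(dim=1)   # token logits, patient embedding

model = MaskedTumorModel()
tokens = torch.randint(0, 4096, (8, 128))   # 8 synthetic patients, 128 tokens each
mask = torch.rand(8, 128) < 0.3             # hide 30% of each patient's tokens
logits, patient_emb = model(tokens, mask)
loss = F.cross_entropy(logits[mask], tokens[mask])  # predict only what was hidden
loss.backward()

# Downstream check: do responders separate from non-responders in embedding space?
responders, non_responders = patient_emb[:4], patient_emb[4:]
gap = torch.dist(responders.mean(0), non_responders.mean(0))
print(f"centroid distance: {gap.item():.3f}")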

18:20

Speaker B

Yeah, that's really super exciting, when you talk about, first of all, identifying, or having access to, I think was your phrase, biomarkers that nobody else has access to, because you can see a sort of divergence in where different patient populations fall in embedding space. Do you have any means, and if not, maybe I can introduce you to the good folks at Goodfire, who just did a version of this in identifying previously unidentified biomarkers for Alzheimer's. But do you have any means right now of saying why? Because it's one thing to say these patient populations are falling into different parts of embedding space. It's another thing to then say why: what is it actually that is causing that divergence? How far along are you guys in terms of being able to make interpretable what it is that the models have learned from their unsupervised training?

21:26

Speaker D

Yeah, the Prima Mente and Goodfire post was very interesting to read. We do have a mech interp research group internally which is exploring these ideas. I have no doubt they'll find something interesting. But one argument against doing this at all is: why do we care about interpretability in a clinical setting? We might care about interpretability because the FDA gets very upset with you if you try to do anything that's black box. And maybe that was true a year ago, but circa, I think, September or August 2025, there was a pathology AI company called Artera AI which came out with what was basically a companion diagnostic: they take in the pathology slide of your prostate tumor, and they will predict whether you will respond to androgen deprivation therapy. They have no idea why this model works. They've retrospectively validated it on thousands of patients from prior phase three trials.

22:20


Speaker D

And the FDA was fine with that. So one argument against doing mech interp is: why spend a ton of resources exploring something when the primary regulatory agency you care most about doesn't really mind whether it's white box or black box?

23:12

Speaker B

I guess the obvious answer would be: because presumably that knowledge would be a great input to further experimental ideas or other knowledge. But maybe you think it's just so hard, or there's no verifiable ground truth, or something else that would prevent that from working.

23:27

Speaker D

Yeah. What was discovered in the Prima Mente and Goodfire post, I forget exactly what it was, but it was something about fragmentomics, something about how the genes are fragmented being a potential biomarker for Alzheimer's. It's a very interesting piece of work; it also sounds very expensive to validate. And so I imagine we would run into the exact same problem. Maybe we have a very good hypothesis from what comes out of the system, but we already have so many other hypotheses, potentially ones with even higher literature backing. I could imagine a world in which mech interp as a field gets so good that you can triage: this thing's going to be really easy to validate, this one's going to be really hard. But right now, the way mech interp usually works in biology is you're staring at semantic segmentation plots a lot and trying to think: is this real or is this fake? Is this the model identifying some very spurious correlation? And that time just simply feels better spent elsewhere.

23:43

Speaker B

Interesting. Okay, here's another idea for a place it might be well spent: continual learning, of course, a huge theme right now in AI in general. The first conversation we had today, with Professor James Zou from Stanford, included a little talk about their recent paper, Learning to Discover at Test Time, where they're using autoregressive large language models and giving them problems like: make a faster CUDA kernel, or find a better solution to this math problem, with a lower bound than anybody has previously found. And they interestingly flipped the usual model of what we're trying to do when we create an ML model on its head and said: what if we just try to get this model to produce the single best answer that we can, and we don't care if it generalizes? In fact, we'll probably throw away this model after this test-time fine-tuning; what we want is the answer. And they were able, at relatively reasonable cost, like $500 of compute, to actually get some new state-of-the-art answers on some of these highly technical questions. If I'm a cancer patient and you've got a general foundation model, a question that naturally occurs to me is: can you fine-tune this on my data? Can we do some test-time tuning? Can we do intensive masking on just my samples and really dial this thing in to understand my particular biology? And if we did that, would it be more accurate for me? Does that line of thinking have legs, and why or why not?
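As a sketch of what per-patient test-time tuning might look like under a masking setup, reusing the illustrative MaskedTumorModel from the earlier example; again, purely hypothetical, not a description of any lab's actual method:

```python
import copy
import torch
import torch.nn.functional as F

def tune_for_patient(base_model, patient_tokens, steps=50, mask_rate=0.3):
    """Fine-tune a disposable copy on one patient's tokens via masking."""
    model = copy.deepcopy(base_model)       # never touch the base model
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(steps):
        mask = torch.rand(patient_tokens.shape) < mask_rate
        logits, _ = model(patient_tokens, mask)
        loss = F.cross_entropy(logits[mask], patient_tokens[mask])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model  # use its predictions for this one patient, then discard
```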

24:43

Speaker D

So I actually looked at the paper, and they have a section for biology: they do single-cell RNA denoising using this test-time training setup, which I thought was really interesting. I guess my instinctive answer is: it's an interesting idea, it very well might work, and it falls into the bucket of ideas you would simply have to try to find out whether it works or not. The results for single-cell RNA denoising in the James Zou paper are certainly good; they're better than the state of the art. What I liked is that for each of these cases they attach a note by an actual domain expert saying how useful this is in practice. And the domain expert for the single-cell RNA section did say: this is very cool, but at the end of the day we don't really care about the results of single-cell RNA denoising; we care about some biological utility underlying that. So maybe you get better at solving this verified task, but that doesn't translate to anything actually useful. Maybe it would for the responder/non-responder prediction case, but it kind of just sounds easier to fine-tune the model using normal supervised learning. Why go through the RL process if the end result is binary? I think they even call it out in the paper: the setup isn't really meant for binary or sparse learning tasks, it's meant for fuzzier things.

26:28

Speaker B

Yeah, they're working on that, but it's not done yet. I guess maybe zoom out.

27:51


Speaker B

You kind of alluded earlier to this idea that a lot of people think we just need better ideas for drug candidates, and your consistent position is: that's probably not really the bottleneck. You made a really interesting point about how a more accurate ability to evaluate those candidates drives a lot more value than just throwing a lot more candidates through a pipeline; the quality of the pipeline matters more than its scalability. So, again, I think you've suggested where you think this can come from, with just large-scale foundation-model-style training, but give us the next level of depth on that. Why are all these other ideas not so exciting? And is this basically just a bitter-lesson sort of idea, where all your cleverness is going to be washed away by scale, so keep your eyes on the prize, data-max and compute-max until you solve it all? Is it kind of that?

28:01

Speaker D

I guess I kind of view things in three ideological camps. The first is maybe us: we index very heavily on the idea that human data is the only thing that matters. You can't start from in vitro settings and bootstrap your way up to something more complicated; you need to start with the most complicated thing to begin with. The second camp is very interested in modeling single biomolecules and their interactions, in the hopes that maybe you can bootstrap your way upwards, and maybe you raise the absolute success rate from 5% to 20%, and maybe that's all you need. I think the second camp defines the vast majority of ML bio companies that exist today. Some of them have clinical candidates ongoing right now; we'll see what the results are. Generally, it doesn't seem like there has been a massive step change in their ability to design drugs. This is not knocking them; drug discovery is hard, and everything's a bet at the end of the day. The third camp is: maybe it's not really a for-profit thing, but you can just improve the clinical trial process to begin with. And this is arguably China's main advantage. They're able to run clinical trials far more cheaply than anyone else, partially because of the lower cost of human labor, but also because they've set up the system nicely, such that it's not such a huge regulatory and financial headache to get things going. This has some downsides: drugs are treated innocent until proven guilty, where the FDA is the other way around. But the obvious benefit is that you're betting neither on AI over human data getting better nor on AI over in vitro data getting better; you're trusting that the typical drug design process, if just made slightly more financially efficient, will improve things. I think all three of these are important, and it would probably be grandiose of me to assign an unequal weighting to any one of them. Each one feels important to push on.

29:07

Speaker B

There have been some interesting... oh, sorry, go ahead.

30:55

Speaker C

I'm going to take a little bit of a segue to something you said earlier, which is that a lot of the new INDs are coming in from China. What has happened in the last couple of years? Is it an AI thing? The CEO of Ginkgo Bioworks was on PVP yesterday, and he said they just have more hands. Some people believe it's a regulation thing; some people believe it's clinical trial registration, that they can just register more people; some people believe it's a US cost thing. What is driving this transfer of basic R&D to China at this point?

30:59

Speaker D

I think this particular subject is very deep, and it's not something I have expertise in. My instinctual thought is that there are many different answers, and the one I think is most interesting is the idea that China was always a very good generics manufacturer, and that's where they started. Slowly they extended into WuXi having a very good CRO ecosystem. And then at some point enough talent had been incubated in China that they began to realize: we have all this infrastructure here, why not just develop our own drugs? And there is something very important about having a very close interplay between the person who is designing the drug and the person who is actively doing wet lab assays on the drug. Whereas in America you have a super long feedback loop: I need to get my SAB together, I need to go reach out to VCs, I need to go buy a lab. In China that ecosystem is a little bit set up already. Maybe the only missing part is that the VCs are still more risk averse than perhaps VCs in America. But the co-location of the grunt work and the intellectual work is actually surprisingly important. Last year I interviewed one of the very few people doing novel biotech research in India, a guy named Soham who runs a company called PopVax. He said this is the primary reason why he expects not only China to start producing really interesting drugs, but also potentially India, potentially Egypt: places where there is intellectual capital and a lot of hands, because those combinations lead to really good compounding results.

31:40

Speaker C

Indeed. And does that accelerate with the AI models, this kind of co-scientist? Does that mean that even if they don't have that much intellectual capital yet, they have the hands to carry it out?

33:17

Speaker D

I guess this is a little bit opaque to almost everyone: what exactly is the level, how impressive are the bio AI models coming out of China? I think there's certainly some interesting work that has been done. It's not clear to me that there's anything radically new there that won't be found anywhere else. A fair amount of it, and I don't want to say this in a disparaging way, is scaling up stuff that was originally developed in the UK or America. There hasn't really been a DeepSeek moment, where something radically crazy comes out of any of the Chinese labs. I obviously could be wrong on this, though. Whatever the bio AI labs are doing in China, there's much less American visibility around it.

33:32


Speaker B

Hey, we'll continue our interview in a moment after a word from our sponsors.

34:19

Speaker A

The worst thing about automation is how often it breaks. You build a structured workflow, carefully map every field from step to step, and it works in testing. But when real data hits or something unexpected happens, the whole thing fails. What started as a time saver is now a fire you have to put out. Tasklet is different. It's an AI agent that runs 24/7. Just describe what you want in plain English, send a daily briefing, triage support emails, or update your CRM, and whatever it is, Tasklet figures out how to make it happen. Tasklet connects to more than 3,000 business tools out of the box, plus any API or MCP server. It can even use a computer to handle anything that can't be done programmatically. Unlike ChatGPT, Tasklet actually does the work for you. And unlike traditional automation software, it just works. No flowcharts, no tedious setup, no knowledge silos where only one person understands how it works. Listen to my full interview with Tasklet founder and CEO Andrew Lee. Try Tasklet for free at tasklet.ai and use code COGREV to get 50% off your first month of any paid plan. That's code COGREV at tasklet.ai. Your IT team wastes half their day on repetitive tickets, and the more your business grows, the more requests pile up. Password resets, access requests, onboarding, all pulling them away from meaningful work. With Serval, you can cut help desk tickets by more than 50%. While legacy players are bolting AI onto decades-old systems, Serval was built for AI agents from the ground up. Your IT team describes what they need in plain English, and Serval's AI generates production-ready automations instantly. Here's the transformation: a manager onboards a new hire. The old process takes hours of pinging Slack, emailing IT, waiting on approvals; new hires sit around for days. With Serval, the manager asks to onboard someone in Slack, and the AI provisions access to everything automatically, in seconds, with the necessary approvals. It never touches IT. Many companies automate over 50% of tickets immediately after setup, and Serval guarantees 50% help desk automation by week four of your free pilot. As someone who does AI consulting for a number of different companies, I've seen firsthand how painful manual provisioning can be. It often takes a week or more before I can start actual work. If only the companies I work with were using Serval, I'd be productive from day one. Serval powers the fastest growing companies in the world, like Perplexity, Verkada, Mercor, and Clay. So get your team out of the help desk and back to the work they enjoy. Book your free pilot at serval.com/cognitive. That's S-E-R-V-A-L dot com slash cognitive.

34:23

Speaker C

Okay, go ahead.

37:13

Speaker B

One big question I have: I find it very hard to calibrate myself on how excited I should be about all these AI for biology and AI for medicine developments. There are always these kinds of headlines: AI does this, AI discovers this drug. I've done episodes of the Cognitive Revolution on it, one with Jim Collins, who has created a bunch of antibiotic candidates. There's a long list, right? Professor Zou did the nanobodies thing that came out of the Virtual Lab, and to hear him talk about it earlier today, it sounds like those were reasonably well validated. But then you always get this other side too: well, not so fast, it's all very messy, we've got a long way to go, most of these things don't pan out. And I feel like that parallels the debate we hear in a lot of different domains. Even in programming, which is one of the more legible domains, we've got something like the METR study that showed a slowdown of developers, and that was very confusing. I'm still quite confident that it's making me faster, and I kind of want to throw that result away. And of course there's just a lot of denial and cope out there, and all sorts of motivated reasoning. How should one try to ground their worldview? Obviously subscribing to owlposting is something everyone should do, but what else would you advise me? How can I patch these blind spots in my worldview, or get to a better position from which to have my own sense of what really counts, what really matters, and what doesn't? Because this disagreement happens all over the place, even among some of the most informed people, about just how well AIs can reason or how big of a deal this is going to be. But in biology it's particularly hard for me to make sense of. So I'd love some tips for how to climb the learning curve faster.

37:14

Speaker D

I've actually written about this in the past, a very long time ago. The title of the article is Five Things to Keep in Mind When Reading Biology ML Papers. The long and short of it is that evaluations in biology, I think, are very difficult. You see this in more typical wet lab biology: oh, we cured cancer in a mouse, who knows when it'll actually translate to humans. There's a very similar phenomenon in a lot of biology ML papers, where they're doing something that feels like it should be useful, but there are a lot of things they're probably hiding from you when explaining the results that would only be obvious to a domain expert. One really funny example of this is small molecule binding affinity papers. I've written about one company's work on this. Let's say you're able to predict that this set of molecules binds to a target and this set of molecules does not. You're very happy with yourself; you publish a Nature article about it. What these folks at a company called Leash Bio found is that this can often be confounded by which chemist actually produced the molecule in the first place. Some chemists are very attached to specific targets, and they're very good chemists, so they often produce things that bind to that specific target. And, very importantly, their chemicals all look very similar to each other. It's a type of similarity that's very human, vibes-based, hard to pin down to a singular metric. And so they found that these models are often confounded by that overlap. These problems just appear over and over again across in vitro biomolecule generation, where you can be confounded by variables you did not even know existed in the data set. So I would name that as the thing to be most aware of when reading these papers. There are a few people I trust, on Twitter and in real life, who can give a pretty good overview of any arbitrary paper. With LLM papers, popular science people often retweet them and say this is transformative, and more often than not they're probably correct; Opus 4.6 is genuinely crazy. I think when people do that for bio ML papers, there's a 50/50 chance they're completely missing the point, because they're not in that field and they don't understand how the failure modes emerge in these models.
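One standard way to surface the kind of confound described here is to compare a random split with a split grouped by the suspected confounder, in this case, which chemist made the molecule. A hedged sketch with synthetic data; all column meanings are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

# Synthetic setup: 10 chemists, each producing similar-looking molecules
# (features cluster by chemist), and the label secretly tracks the chemist.
rng = np.random.default_rng(0)
groups = rng.integers(0, 10, size=500)              # which chemist made it
centers = rng.normal(scale=3.0, size=(10, 32))      # each chemist's "style"
X = centers[groups] + rng.normal(size=(500, 32))    # molecular features
y = (groups % 2).astype(int)                        # binds / doesn't bind

clf = RandomForestClassifier(n_estimators=100, random_state=0)
random_cv = cross_val_score(clf, X, y, cv=KFold(5, shuffle=True, random_state=0))
grouped_cv = cross_val_score(clf, X, y, cv=GroupKFold(5), groups=groups)

# A big gap means the model is keying on who made the molecule, not on binding.
print(f"random split:  {random_cv.mean():.2f}")   # looks great (leakage)
print(f"chemist split: {grouped_cv.mean():.2f}")  # near chance
```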

39:11

Speaker B

Yeah. Do you think that an OPUS can help me identify those blind spots? Is it.

41:35

Speaker D

Sorry, go ahead, go ahead.

41:42

Speaker B

Yeah. Is it good enough to do that?

41:44

Speaker D

Actually, I've written an article about this too, titled Can o1-preview Find Mistakes Amongst 56 MLSB Papers? MLSB is a structural biology workshop at NeurIPS. And it's not very good at it. That was obviously the last generation of models; maybe it would be a lot better now. But there are some problems that reoccur in almost every biology ML paper, like your training sets aren't large enough, or your test sets are not stratified correctly, and you kind of just learn to pick your battles in this field and move on. There are a lot of more fundamental problems with these papers that LLMs, in my experience, just often miss entirely. The funnest thing: in almost every article I've written, I have found that LLMs tell me something about the particular subfield that the domain experts completely disagree with. They say: that is not how you should think about this domain, that's not the real problem we should actually be worried about. I don't know why this is the case. It's kind of fun. It's a domain of science where LLMs still haven't quite captured human taste.

41:46

Speaker B

Yeah, fascinating. Okay, that leaves a lot of work in front of us. Do you want to go back briefly, before we break, to Noetic again? Fortunately my son is doing well. He was diagnosed with cancer three months ago, I've had an intensive crash course in cancer, and I hope to be able to close the book on it and return it to a more intellectual and less personal interest going forward. And I think we're on a good, solid track to do that. I think you've demonstrated in this conversation that you're not getting too carried away with the promise of what AI systems can do. We've got the data center of geniuses, we've got the century of progress compressed into five years, those kinds of visions. How much would you shave off of those notions? Just describe your own expectations of what Noetic can do specifically, and maybe what the field more broadly is going to be able to accomplish.

42:50

Speaker D

I'm very optimistic that human simulation companies akin to Noetic, and I think there are other players out there as well, will be able to vastly help with the results of at least a few clinical trials within the next few years. That barely requires paying attention to the trend lines; I'm almost indexing on what we're capable of today. There are papers going back years showing that ML is better able to stratify patients. The problem has always been an economic one: how do you actually deploy this in a real setting? I think we'll be able to do that just fine. I think the failure rate of phase one drugs will go down, and this has already been slightly proven out. There was a McKinsey study done five years ago showing that AI-designed drugs have something like a 5 to 10% lower failure rate, which may be noise, may be real. I do kind of expect those trend lines to continue a little bit. Where I'm most unsure is whether these models will be able to discover brand new targets entirely, which is ultimately what people care about. Believing that these models will find new targets far faster than humans really requires you to index heavily on the trend lines, and I do want to index on the trend lines, so I believe these models will be able to deliver very good target finding. But I'm also very sympathetic to the mindset that finding targets is just such an unbelievably hard problem that the models will not make a dent in it, because you need this human iteration feedback loop, and unless you build a really good human simulator, which is our bet, you're not going to get close to solving that problem.

43:48

Speaker C

The way I put it is usually you can kind of see like one order of magnitude ahead, maybe two. No one can see three orders of magnitude ahead. It's just not possible. You have no idea what's going to happen. Abhi, thank you so much. I learned a lot from this and hope to see you online. Hope to read more of your blog.

45:31

Speaker D

Yeah, absolutely. Thanks for having me on.

45:50

Speaker B

Thanks for being here. We'll be working our way through the owlposting archives for some time to come.

45:53

Speaker C

Bye bye. Our next guest is Helen Toner, who runs CSET at Georgetown. She's a former OpenAI board member. And there are two competing views she holds here: on the one hand, the intelligence explosion is coming; on the other hand, AI capabilities may be permanently jagged. So let's add her to the stage. Helen, nice to have you.

46:00

Speaker E

Hey, thanks for bringing me in at the end of your marathon. Impressed you guys are still going strong.

46:21

Speaker B

There's so much to cover, you know, and we've all got to accelerate our personal productivity timelines and try to pack more information into the same amount of time. So, experimenting with ways to do that. Super fast, yes, please. That's honestly one of my reservations about live content: I listen to everything at 2x speed, and I can't listen at 2x speed if it's live.

46:26

Speaker E

So I mean, my constant struggle is to talk slower than I naturally want to. So if you want me to talk double speed, I'm here for it.

46:51

Speaker B

Go as fast as you want. Okay, so you guys just put out this report. I think this is obviously a great candidate, if not a shoo-in, for the most important question of our moment: what is going on with the possibility of automated AI R&D? Are we at a tipping point where we're starting to hit recursive self-improvement? And if so, how big of a deal is that going to be? You brought together a bunch of people, the authors of this report and some others who aren't necessarily authors but contributed to conversations, including, I understand, quite a few people from frontier model developers. And it strikes me that this debate goes back basically to the beginning of AI. There was the idea very early on that we could have an intelligence explosion; when I started reading Eliezer in 2007, he was very worried about this. And yet, as you've written, and I think you put your finger on something a lot of people were feeling last year when you said this, even though what passes now for long timelines is pretty short, the disagreement on this topic seems to be as fundamental, and as impervious to new evidence, as it has ever been. So maybe just for starters, take us inside the workshop. Give us the lay of the land in terms of what world models people have, and why we are still working from so much intuition, despite the fact that we now have what in some circles would even be called AGI out there as products to use today.

46:58

Speaker E

Yeah. So this workshop was held in July last Year and was maybe one of my work highlights of the year it got started. So it was a day and a half. We brought people in, we had people coming from some of the frontier companies policy a bunch of great people to get a sense of kind of what the vibe was like. We started out, the first session was about kind of how is AI being used to automate AI R&D right now. We had presentations from people who are doing that. And before the first break we had Ryan Greblatt from Redwood Research, Nicholas Carlini from Anthropic, Dash Kapoor from Princeton of AI as Normal technology, and Thomas Larson, who's one of the AI 2027 authors. They were all arguing so fiercely in a friendly and productive way before the first break that they just, everyone else stood up to go and get coffee and drinks and snacks and they just kept on arguing right through the break, which is great. It was exactly what we were looking for. But I think did sort of preface something that we knew going in which was there are really different perspectives here. And the workshop was Chatham House. I feel okay giving that anecdote because they ended up writing. One thing that came out of that was Nicholas was pushing the others constantly for like, okay, you have such different views about where things are going. Where's the first place that you actually disagree about what we'll see? And they found that as they're looking out for what we're going to see in 20, 26, 27, they actually agree a lot about kind of what we're going to see before we get to that recursive point, which is kind of a bummer. Like it's, you know, it's nice that they agree and can. They wrote a post about that actually, which is the reason I feel fine, you know, sharing that. That anecdote from an otherwise Chatham House workshop. They were to post about the stuff they agree on. But it sucks that like that means that it's actually going to be hard to identify in advance whether we are heading into recursive loop or whether we're not. Two big things that I think. So what we were trying to do with the workshop one was like get this idea of recursive self improvement out of sort of purely like Silicon Valley, San Francisco, really AI pilled spaces and make it, explain it, present it to a wider audience, let people engage with it, but then also actually try and make some progress on like, okay, why do people disagree about this? What is happening, what might happen in the future, what indicators could we gather stuff like that. And I came out of it, thinking that maybe two of the core disagreements here, one is, does AI truly replace all of what humans can do? So does you get to that, like fully automated? Because if you're going to have the biggest, the really scary recursion, that's probably what you need. It can't be that. We could have much more productive human researchers. You could have the Alec Radfords and the Ilya Sutskevers managing fleets of AI researchers. But if it all has to come back to them and they have to process and digest and think through the research, you're not going to get that massive recursive loop. So that's one piece is do you truly get humans being fully replaced? Because if not, then maybe you have some parts of the workflow being really accelerated. 
We have a diagram in there of sort of an Amdahl's Law kind of thing. Amdahl's Law is basically: if you have a process that depends on different inputs, with different potential bottlenecks, then if you speed up one part of the process, the bottlenecks just bite somewhere else. So it may be that you speed up the coding part of AI research, but if you don't speed up the other parts, you don't end up speeding up the whole thing very much. Another mental model that people bring, people who are skeptical this will really go crazy, is: okay, we have a long history of computers doing more and more of the lower-level work. We don't have to do punch cards anymore, we don't have to write assembly code, we have these higher-level languages. So AI doing more of the coding is just another natural step in that process. This is kind of an expanding-pie model: the set of tasks we realize can be involved in AI R&D expands, and there's always that outer band that the humans do while the inner bands get automated. And I think that's very different from the view the AI 2027 authors, or lots of other people in this space, would have, which is: no, first you automate some of what the humans can do, then you automate all of what the humans can do, and then you go until some other bottleneck hits. So then the other question is, okay, what are those bottlenecks? We can talk about that as well. But those are two of the biggest questions that came out for me. One was: are you truly going to automate everything, including all of what the humans can do? And the other: if you do, how soon do the bottlenecks bite?
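A minimal sketch of the Amdahl's Law point above; the workflow fraction and speedup numbers are hypothetical, chosen purely for illustration:

```python
def overall_speedup(fraction_accelerated: float, stage_speedup: float) -> float:
    """Amdahl's Law: total speedup when only a fraction of the work runs faster."""
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / stage_speedup)

# Hypothetical: coding is 40% of the AI R&D workflow and AI makes it 10x faster.
print(overall_speedup(0.4, 10))    # ~1.56x overall, not 10x
# Even infinitely fast coding is capped by the other 60% of the workflow:
print(overall_speedup(0.4, 1e9))   # ~1.67x; the bottleneck bites elsewhere
```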

48:32

Speaker C

So Sholto Douglas, who is now at Anthropic, has this idea of what he calls a software-only singularity, where we get very good at coding and all of the digital stuff, including AI research, I presume, but not at producing power or copper or all of the physical substrates that are going to be required to support this expansion. How do you think that fits in? The fact that maybe the digital stuff happens but the physical stuff just doesn't?

52:31

Speaker E

Yeah, I think there are two versions of this. Some people, when they talk about a software-only singularity, basically mean it turns out that software is enough to get absolutely crazy recursive loops. Tom Davidson at Forethought Institute has written about this, for example: maybe you can get massively more intelligent systems having massive impacts on the world primarily through software improvements. The version from Sholto that you're describing is a different thing, which I would think of as more like a jagged software-only intelligence explosion: the AI is getting much more capable in certain ways, but its effects on the world are very limited because it is software-only. And this gets at another thing I found really helpful and interesting from the workshop, which is that people have very different intuitions about the following: assume you have an AI that is very, very good at AI R&D. What does that mean for what the AI can do elsewhere? Some people say, well, if it's very good at AI R&D, then it can train AI models to do whatever, so it can do whatever. Maybe you have to spend a week gathering data or something, but then if you want to do some arbitrary task, you can do it. Other people have the intuition that even if it gets very, very good at automating R&D, this software-based task, it's still going to really struggle to, for example, design new biological molecules, or to think about geopolitical strategy questions, because you have to actually go out and see how different countries and decision-makers will react. And that's a piece that I feel goes underexamined in a lot of these conversations: what is the connection between AI that can do incredibly good AI R&D and AI that can affect the world in non-AI-R&D-specific ways? We tried to tease that apart a little bit in the report as well.

53:06

Speaker C

Do you think that's kind of the connection? Okay, now you have AI doing AI research, and that's affecting the economy. It's also affecting the political economy. And then you need mitigations for the political economy for this to work out. Does that mean you might need the AI research to go into how to fix the political economy, which is going to be a little bit scary?

54:48

Speaker E

Yes. Say more about what you mean by affecting the political economy.

55:11

Speaker C

In the sense that, for example, right now you have Bernie Sanders saying we should have a moratorium because he's scared about jobs. He's very scared about jobs. He wants a moratorium on data centers. I think there are something like six states with a moratorium now, including New York State. So: AI research leading into, hey, how does AI fix the political economy? How do we deal with the humans? How do we mitigate the impact we have on the humans? Is that something you think would happen with the first configuration of the software-only singularity, the one that's not jagged and also affects the political economy?

55:15

Speaker E

Yeah, that's the kind of thing, that's right. If you're positing that you can have a software-only singularity that radically transforms the world, then it's going to have to be able to do things like: the company deploys chatbots that talk to enough people and convince them the data centers are great, the data centers all get built, and the moratoriums get rolled back. That kind of thing has to be built in. Which to me intuitively feels like a different skill set, and also more dependent on deployment and rollout and adoption. So I tend to be a little more skeptical there. But yeah, I think that's an example for sure.

55:49

Speaker B

One kind of odd pairing of beliefs that I observe, and sort of detect in the report, is, among the more skeptical folks, the idea that there's going to be a plateau and that the plateau is going to be subhuman; and then, on the other hand, the view that it's not going to plateau at all, it's just going to run away, you know, into some sort of singularity.

A position that I feel pretty intuitively attracted to, and that I don't hear too often, is that maybe there will be a plateau, but it could very easily be a superhuman plateau. If I try to zoom out as far as I possibly can and look at life on Earth, I'd say humans seem to be part of, maybe an entry into, the steep part of an intelligence explosion, an S-curve of capability. I don't think we're the end of history, but we were clearly better than what came before, and that was enough to take over the world. I just don't hear many people say: it's not necessarily going to be a singularity, it's not necessarily going to go totally beyond comprehension, but in the same way that we were just that much better than the Neanderthals, and it might not have been by much, it was enough to change everything. I feel like there's not too much more room between where the AIs are now and where they will soon presumably be, and even if things don't go critical from there, it's very hard for me to imagine that it's not enough to be transformative. So: is that a position that was represented in the workshop, and how do you personally react to it?

56:47

Speaker E

Yeah, I mean, that sounds pretty close to my default expectation, maybe. And if so, then it was represented there, because I was there. Maybe to riff on it a little: something we didn't put in the report, but that I've definitely found helpful for my own thinking, is situating us in the middle of an ongoing S-curve of AI capabilities, where there are three segments of interest. One is: how long is the lead-up period, the first part of the S-curve? One is: how steep is the middle of the S? And one is: how high is the ceiling? Mostly, when you hear people talk about automated R&D, they're in one of two camps on all three of those questions. Either they think the lead-up is short, the curve is steep, and the ceiling is high; or they think the lead-up is long, the curve is gradual, and the ceiling is low. So I find it really interesting to think about other combinations of those parameters. To me it really does look like the lead-up is pretty short these days; we're not too far from that takeoff period. But what if, for example, the curve is steep and the ceiling is low, or the curve is gradual and the ceiling is high? I feel like we don't talk much about either of those. And maybe also to your point, Nathan, about the superhuman-but-not-all-powerful-godlike singularity, with no point of no return: I really think there's room for more thinking about what it means to be superhuman, and which domains have tons of headroom above humans, where you can easily identify what superhuman would look like, you know, like optimizing a kernel, or selling things.
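To make those three parameters concrete, here is a toy logistic S-curve where t0 controls the length of the lead-up, k the steepness of the middle, and L the height of the ceiling. The parameter values are arbitrary illustrations, not forecasts:

```python
import math

def capability(t: float, t0: float, k: float, L: float) -> float:
    """Logistic S-curve: L / (1 + exp(-k * (t - t0))).

    t0 = lead-up length (when the steep middle arrives), k = steepness,
    L = ceiling. All values used below are illustrative, not forecasts.
    """
    return L / (1.0 + math.exp(-k * (t - t0)))

# Compare "steep curve, high ceiling" vs "steep curve, low ceiling":
for t in range(10):
    print(t, round(capability(t, t0=3, k=2.0, L=100), 1),
             round(capability(t, t0=3, k=2.0, L=5), 2))
```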

58:00

Speaker B

For example, your immortality.

59:40

Speaker D

Yeah.

59:42

Speaker B

In previous parts of this marathon series of conversations, we've seen how the ability to interpret the signals people are throwing off in sleep to predict disease is a really random but, I think, instructive example of how there's obviously a lot of room to be superhuman at some of these tasks, and potentially a lot of power to unlock, especially if you can integrate that kind of infinite-modality grokking with a basic reasoner. I really don't see any reason we're not going to be able to achieve that.

59:43

Speaker E

Yeah, though those things often involve, I think, another piece that's underexplored here. People tend to be in one of two camps: either the ceiling is high and it's not going to be delayed by real-world adoption, or the ceiling is low and it is going to be delayed by real-world adoption. And to me, isn't the obvious combination that the ceiling is high once you get the real-world integration? For example, you have to collect all that sleep data. Or, you know, humans are really bad at interpreting scent data, dogs smell things we can't, but you have to deploy a bunch of sensors and all that. So I feel like there are unexplored questions around how high that ceiling is as you get AI really increasingly integrated into more and more aspects of life and the economy.

1:00:24

Speaker C

I do also wonder to what extent, because the way I might view things happening is software and mathematics first. And the question for me is, if you get software and mathematics first, you may get things like: I don't need a lidar for my self-driving car anymore, I can use cameras, and they can be really bad cameras, because the math does all the work and I don't need all this sophisticated technology. It could be that your phone could do what those sleep-detection machines do with the right software package; your phone has a lot of sensors, there's an enormous amount of technology in there. And you do wonder whether it would really be the application of algorithms to existing frameworks, existing infrastructure, increasing the bandwidth of your communications tech with new encryption and new cryptography, which is how DSL was invented, for example. DSL was really using the existing copper pipes with new algorithms. So I wonder to what extent you don't get a slowdown just because of your physical infrastructure, because you innovate around, or with, your physical infrastructure.

1:01:02

Speaker E

Yeah, I'm sure that will work in some places, and it won't work in others. Where my mind goes is cybersecurity for critical infrastructure, where there's a huge ongoing struggle: the physical systems are old, the operational technology is hooked up to old information technology because it has to be, and there's a limited amount you can optimize with smart new algorithms, because the stuff is just old. Likewise, my center does a lot of work on military technology; same thing there. If you have a ship that was built in the 1960s, it's a ship that was built in the 1960s, and the same goes for other pieces of equipment. So yes in some places, no in others. To me that's another place where the jaggedness bites; I think Prakash mentioned the talk I gave on jaggedness as I came on. And my default expectation in AI R&D is that we'll see jaggedness there too. The jaggedness is fractal, right? You zoom into the quote-unquote task of AI R&D, or the skill of AI R&D, and actually it's many, many different things. So we'll see AI R&D accelerating in areas that are especially amenable to using AI and lagging in other areas. Not to say those can't ultimately be automated, but it will take longer.

1:02:23

Speaker C

How far do you think the product that's on the market right now is behind what people are using inside the labs?

1:03:34

Speaker E

I don't know. I honestly don't know. Maybe the most actionable section of the report is a set of indicators. We have a table summarizing the three categories of indicators, and the biggest category is indicators from inside companies. One of them is that public-private gap. My sense is that it's not huge right now, but I don't have any inside information. You guys talk to company employees as well.

1:03:41

Speaker B

Do you believe Roon? He says we have no idea how good we have it and the gap is very small.

1:04:08

Speaker E

Exactly. I'm thinking of things like that exact tweet.

1:04:13

Speaker C

So what I learned in the last few days is that the real gap is that they're using models that are three times faster. It's the same model; they're just running it at a lower batch size, so it's three times faster, and that's what they're using internally. Same tokens, just a lot faster.
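A toy model of the tradeoff being described here: serving at a smaller batch size lowers per-token latency for each user, at the cost of aggregate throughput per GPU. The base rate and the overhead model below are made up purely for illustration, not measurements of any real system:

```python
def per_user_tokens_per_sec(batch_size: int,
                            base_rate: float = 150.0,
                            overhead: float = 0.02) -> float:
    """Assume each extra concurrent request slows every stream a little."""
    return base_rate / (1.0 + overhead * (batch_size - 1))

for bs in (1, 8, 64):
    rate = per_user_tokens_per_sec(bs)
    # Smaller batches: faster for each user; bigger batches: more total output.
    print(f"batch={bs:>2}: {rate:6.1f} tok/s per user, {rate * bs:7.1f} tok/s per GPU")
```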

1:04:15

Speaker E

And there's surely also tooling stuff, right? Something we note explicitly in the report: a couple of our reviewers, who were less familiar with the idea of automating R&D, said things like, oh, haven't you seen that study where 95% of AI pilots fail? And there's that METR study showing AI slows people down. So we noted: yes, productivity boosts from AI are mixed, but these AI researchers are in the very, very best position to benefit from their own technology. They're the most up to speed on what it can and can't do, they're shaping how it's developed and what directions it gets pushed in, and they're in the perfect setting to build tooling that squeezes the most juice out of these models. So I'm sure that's a piece of it as well.

1:04:33

Speaker B

One of the things you mentioned a few minutes ago is that you wanted to bring awareness of these possibilities outside of the places where they're most often discussed. One other thing I'd love your perspective on is how ideological you think the companies are about this. This is one of the things that confuses me, in the sense that every frontier lab leader has read their Eliezer catechism. Many of them previously said things about how we should be extremely careful about this sort of thing and should not engage in an arms-race dynamic; it's obviously part of the OpenAI charter, Dario has said things like this. And now we're in a world where there's a publicly stated timeline from OpenAI to the AI R&D intern, and then, not too much longer out, 2028, the full AI R&D researcher.

1:05:14

Speaker E

I mean, Dario at Anthropic as well.

1:06:17

Speaker B

Yeah, I would say Anthropic would even seem more committed to it, or more resigned to it, maybe, but they believe it.

1:06:20

Speaker C

Jack Clark: June, summer this year. Jimmy Ba, who just left xAI, was a co-founder: 12 months. And OpenAI: research intern this year, and then full researcher about a year later. I think it's this year. So that's my guess.

1:06:26

Speaker E

This year for what, specifically?

1:06:47

Speaker C

Like the start of recursive self improvement. We're going to.

1:06:49

Speaker E

Oh, but aren't we there already? Wasn't it last year? I mean, you had Gemini doing the evolutionary algorithm stuff, where it co-designed an algorithm that sped up its own training by 1%. Come on, that's recursive. Really, that's the lead-up to the thing I'm talking about. But, I don't know, you think we might get this year to where there's no human needed whatsoever? I think that's a high bar.

1:06:52

Speaker C

I think we might be. I updated on multiple things. The Moltbook thing took me by surprise: one and a half million agents all of a sudden on the web. Yeah, it's all nonsense for sure, but things start off as nonsense. I think what might happen is you get a single model update that fixes a little bit of the hallucination, a little bit of the security issues around leaking secrets, and I think that might be enough.

1:07:18

Speaker E

Sounds hard. Sounds real hard, fixing the security stuff.

1:07:43

Speaker C

We'll see.

1:07:47

Speaker E

Yeah, maybe.

1:07:48

Speaker B

So, go. I do want to give you the chance to talk about the dynamics of this. There are different reads we might put onto people being ideological about it. Elon Musk has said things like, pretty much: I don't know if this is going to be good or bad, but I want to be around to see it; and also, I'd rather be part of it than a spectator. That sounds like somebody who's inclined to gamble with humanity in a pretty self-aware way. Others may feel trapped in these dynamics and figure that at least they'll do it as safely as possible. How would you describe that milieu right now? I think it's dramatically underappreciated by people outside the AI bubble where we spend all our time.

1:07:57

Speaker E

Yeah, my impression of it, my sense from the people I talk to, is that there's just a sense of inevitability about AI advancing, and a desire to be part of the future being created, because they see this as the future that's being created. And you mentioned how this has been part of the AI conversation since the very beginning: I. J. Good, in the early 1960s, talking about when you create the first ultraintelligent machine. Since we always need more terminology in AI, I feel like "ultraintelligent" should make a comeback. There's a very natural logic there. If you have a computer-science kind of brain, it's just very natural to say: we have some level of skill at building computers; when the computers have more skill than we do, they'll build ones that have more skill than that; and then you get a loop. That logic is very appealing and seems very natural, so people think of it as something that's going to happen anyway, and then they may as well be involved. That's not everyone, but I do get the sense that that's the water most folks are swimming in. And then if you have a different view, it's in contrast to that. I don't know, is that your sense as well?

1:08:35

Speaker B

Yeah, I think so. And I think the inevitability is a pretty compelling argument. I resist it, though, and at least want to make the point that even if some form of this is inevitable, there's still probably important discretion we can exercise over exactly what flavor we get. There are questions like: should we keep the chain of thought interpretable, or should we embrace thinking in latent space? And I do think it's important to keep in mind that it's probably not all one or all the other; AI defies all binaries. There are going to be gradations and these more local decision points. But yeah, in 2022 I was just trying to make AI work for practical tasks, and with no background in AI research I basically ended up independently inventing a number of the techniques that have gone on to produce great things. I didn't take them past any local plateaus, but, you know, just having AIs improve their own outputs, kind of proto-constitutional-AI type stuff. So I do think the attractor, the gravity well, is pretty strong, and it's hard to avoid some version of these techniques: if even a bozo like me lands on them, I don't know how they're not going to happen in the big broad world, especially as we start to get dramatic democratization of training techniques. Prime Intellect just put something out that kind of allows anybody to spin up their own RL environment on a distributed, community basis. So everything's going to get tried, and I think that's pretty hard to argue against. But again, I do want people to still own what exactly it is they're doing along the way.

1:09:41

Speaker E

Yeah, there's something in here that takes me back to long-running conversations about autonomous weapons, something about the level of human oversight you can have. I totally agree that using AI to accelerate research is an attractor. But you would really hope there's a meaningful difference between "I have a fleet of 10 million AI agents, they're running experiments for me, and I am leading and guiding them" versus "I have set something into motion and I have no fucking clue what's going on." I think there's a boundary somewhere. Is it a boundary we're able to stay on the right side of? I'm not sure, but I hope it might be. That, to me, feels like the point to try to intervene, not "we shouldn't use AI for research," which is obviously not going to work.

1:11:28

Speaker C

To what extent do you think policymakers are naive? Earlier on we spoke to Sam Hammond, who advises some policymakers on AI, and he was talking about privacy and the restrictions, the constraints or regulations we could put on. One thing that struck me is that a lot of policymakers may not be aware that an AI with access to existing technology, with persistent search and persistent memory, would basically do a Google stalking of you before it even met you. It would know everything about you in the public domain. The amount of information it could access, the persistence of that information, listening in on conversations: these things are going to be very powerful in that sense. And you can ban the AI from using facial recognition, fine, but then you have network analysis, et cetera. You can do metadata analysis on WhatsApp conversations, on where the messages are going; you don't need to know the content. There are a lot of techniques to de-anonymize traffic and de-anonymize people. You don't need facial recognition; you can ban it and still do gait analysis, speech analysis, voice analysis, handwriting analysis. There are so many other techniques, and all of them will be available to AI. So this whole "we're going to make sure we have privacy" thing: are they being naive? Is it going to be possible?

1:12:11

Speaker E

I mean, I think the US has done a worse job of this than pretty much every other country on the planet. So I think there are some basic rules. I don't think you want to do rules at the level of "no facial recognition." I think you want to do rules at the level of "no data brokers."

1:13:47

Speaker F

Right.

1:13:58

Speaker E

Of the kind: you can collect data, but if you're going to collect it, the user needs to know and needs to have notice and consent. I'm not deep on privacy law, so I don't want to pretend I have the great privacy proposal here, but I do think there are ways to do it that are better than the US's, and ways to do it that give you that underlying flexibility. Maybe I'll leave it at that, because, again, privacy law goes real deep and I'm not there.

1:13:58

Speaker B

One more question for you. In the report, you talk about the possibility that the gap we think is currently small, between the models we have and the models used internally, could open up, and you have some recommendations around certain transparency measures. I want to give one quick shout-out to the AI Whistleblower Initiative, founded by my friend Carl Koch, which has engaged with OpenAI and at least played some role in their recent updates to their whistleblower policies; I find it amazing that OpenAI is continuing to work in that direction even today. Where do you think we are on the spectrum from secret non-disparagement clauses to where we need to be, in terms of insight into what's going on at the labs, other than private, philanthropist-funded whistleblower support, which, again, you can avail yourself of via the AI Whistleblower Initiative should you need it? What other policies do you think the government should be pursuing? And maybe even more broadly, if you want to zoom out: what do you think a hypothetically situationally aware US government should be doing in general that it is currently not?

1:14:21

Speaker E

Yeah, I think there are a bunch of things here. On transparency, I think we're doing better than we have been. We have these two new state laws, SB 53 in California and the RAISE Act in New York, and I think those are good starts. But for a lot of this information we're still really dependent on what the companies choose to put out. Now, we're fortunate; I want to give credit to both OpenAI and Anthropic, and to a somewhat lesser extent Google: they do proactively put out a pretty good amount of information, so they should get some credit for that. But I don't love that what they put out is almost entirely at their discretion. I guess that will shift as SB 53 and RAISE start to be enforced, so I'm interested to see what that looks like. I also think there's been the beginning of a push to shift from a model-release-based schedule to something more continuous, partly driven by interest in these internal deployment dynamics, not just the external releases. The idea is that if the risk is not actually purely tied to when you put your model on the market, then your risk evaluation shouldn't be tied to that either. It also creates better incentives for the companies: not forcing them to rush things out the door, but instead maintaining more of a continuous pulse of updating metrics over time. So I think we could definitely be doing better on transparency, and then ideally pairing those requirements with some kind of independent audit requirement, a way to let external third parties come in and check that things are happening as they're supposed to. That has been in several of these proposals and keeps getting stripped out by industry lobbying, so that, I think, is a new frontier as well. Then there are various other policy implications we put in the report. One is this general "hardening the world" recommendation; societal resilience is another way of putting it. That's cyber defense, biodefense, investing in biosurveillance (meaning monitoring diseases, not surveilling people), investing in epistemic security, trying to have a way to determine what's real and what's fake, tagging real content, all this broader societal resilience stuff. The frame is: just assume this is going to get much, much better, and that we might see automated AI R&D contributing to an increased pace of change. Finally, and this is less a policy and more a mindset, there's been a shift over the past year or two toward: oh, actually, maybe open models are always going to be pretty close behind, so concerns about an access gap, or a concentration-of-power gap if the closed models are far ahead, maybe we don't have to worry so much about. If you're taking seriously the possibility that automating R&D speeds up the closed labs significantly, then we need to revisit those assumptions about open and closed models. There are a few others, but I'd point people to the report for the full set.

1:15:34

Speaker B

When AI builds AI, things just might start to get weird. So yeah, definitely check out the full report from Helen and co-authors at CSET and beyond. It's interesting times, for better or worse. Any closing thoughts before we break?

1:18:22

Speaker F

No.

1:18:38

Speaker E

Great to be on. Great to chat with you as always, and yeah, looking forward to next time.

1:18:38

Speaker D

Indeed.

1:18:43

Speaker C

Cool. Hello. Very nice to be short in our.

1:18:44

Speaker B

Timeline between now and next time.

1:18:46

Speaker E

See ya.

1:18:48

Speaker C

Cheers.

1:18:50

Speaker B

Bye for now.

1:18:51

Speaker C

Our next guest is Jeremy Harris. He's from Gladstone AI, which wrote the first-ever US government AI threat assessment for the State Department. It's been about 10 months since they said every American AI data center is compromised. Jeremy, what has changed? Have things gotten better or worse?

1:18:53

Speaker F

Yeah, well, to piggyback off what I think Nathan just said: things are getting weird. Great to be on. What has changed since then is less than one might have hoped, and for really interesting reasons. For a lot of people who are concerned about the AI risk story, the AI threat landscape from a national security perspective, whether it's loss of control or weaponization, a big part of the story that's missing is understanding the infrastructure build-out. What are the actual bones we're building on here? Because that's the substrate that underlies everything, and there are all kinds of assumptions being made about it where we're abstracting away what I really think is at least 50% of the problem. We think a lot about model reconstruction attacks, and there's all kinds of interesting debate about whether it even makes sense to secure models in a world where you can just reconstruct them if an API is available. But more fundamentally, when you're building your entire AI industrial base off of components made in China, with personnel who are often Chinese nationals, I mean, forget about the Manhattan Project, we're so, so far behind that. I think it's incumbent on us to take a little step back and ask: what is the chessboard? What is the board itself, forget about the pieces? Are we playing on something that's fundamentally stacked in a way that doesn't allow for a winnable outcome? And I'm not saying this to be pessimistic; there are actually solutions you come up with very quickly once you take that new perspective. But closing your eyes and not looking at it doesn't address the problem. We're in a space where we do a lot of algorithm-level thinking, because that's what so much of the Western economy is now based on: people at keyboards. We're not making T-shirts anymore, we're not building transformers anymore, we're not doing that stuff, so we tend to pretend it doesn't exist. So that's my more recent lens on the problem over the last two years. I know that's not quite an answer to your question, but that's the chessboard I see.

1:19:14

Speaker C

When you look at it end to end, you have the software piece and the talent piece: 50% of top AI researchers are Chinese nationals, and that includes people working at frontier labs in the US right now. Then you have the infra piece: a lot of stuff comes from Taiwan and South Korea, and some from China too. Then you have ASML sitting in Holland, supplying TSMC, and ASML's own suppliers, something like 3,000-odd of them spread across the world. They buy, I think, neon gas from Ukraine; when Ukraine got invaded, they had a problem. All of these pieces are spread out across the world, right? And TSMC has been upfront in saying: we are only possible in a safe, globalized economy. If we ever got invaded, everything's over; there's nothing we can do, that's it. So how do you think that fits into a threat perspective? It seems like someone has a dead-man's switch over TSMC. How does that work in terms of security and securing US prospects and the future in the US?

1:21:24

Speaker F

Yeah, I think it's a great question. This whole Taiwan scenario-planning thing is something everybody has talked about, but I'm not so sure everybody has worked out the implications to full satisfaction. So first of all, yes, if Taiwan gets invaded, TSMC is gone. Whether it's because China takes it, or because, as I would expect and hope, it's booby-trapped to the nines to blow. It takes hundreds or thousands of insanely capable PhDs to keep that thing tuned; think of it as a giant box of 500 dials, each of which has to be perfectly set to keep chips pumping out at the right yields. You're not going to replicate that if you're missing either the equipment or the people. This is maybe the most fragile production process that primates on this planet perform. So an invasion is unlikely to leave it in China's hands. The question then is: what do you get when you roll that back? What's the number-two-positioned entity? Then you start thinking about what SMIC can do, and the SMIC-Huawei complex does seem like a very plausible runner-up, especially when you look at scale production and the emphasis Huawei has placed on networking large numbers of GPUs together. Their chips don't have to be as efficient as ours; they can't be, they don't have the logic. But they can be networked together way better, and that's how they get effectively competitive performance at scale. So this is a real issue. In a funny way, it also interacts with the energy bottleneck we have here anyway: we're probably going to be bottlenecked by energy sometime around the end of the year. When that happens, TSMC's ability to outproduce, well, it gets complicated, because on a per-chip basis their chips are way more energy efficient, pumping out more FLOPs per watt, but we have that energy ceiling as the main constraint we're drawn toward. So the timing matters a lot: that dance between how much logic matters, how much energy matters, how much memory matters, how much packaging matters. All four of those have become bottlenecks at different points in the game over the last few years. Another piece, again when we think about the actual bones the AI economy runs on: it's not just chips, and not just the data centers themselves. The power grid is a generally vulnerable target. We know, for example, that there have been components snuck into Chinese-made transformers, explicitly as Trojans, to be able to take down our gear. A very plausible scenario, based on talking to folks working this problem on the intelligence community side, is that a Taiwan invasion begins and one of the first things China considers doing is just shutting down the West's grid. If the conflict is existential, it's kind of obvious, even though that's massively escalatory, so there are huge question marks there. But it's a scenario being taken very seriously, for all the reasons you might imagine. So yeah, if that happens, there are questions that run much deeper than our ability to literally make chips in Arizona, or wherever the next fab is.
But literally, if we can be kneecapped economically at a more fundamental level, we don't even get to look at the chessboard we hoped to look at. We don't get to indulge in "well, what can Samsung do versus SMIC versus CXMT?" We don't get to play that game; we literally don't have an economy. There are serious implications there. So if we think about this as a game with the stakes it might have, and this is contingent on what's between Xi Jinping's ears and the Politburo's ears, it could end up looking like we're preparing ourselves to take a punch in the face and then getting kicked in the balls instead, if you will. That's the kind of scenario we may be heading toward. And again, that zoom-out is really important: we've got target fixation on what could be a pretty narrow part of the board.

1:22:41

Speaker C

You had some ideas on how not only do we need to speed up, but we need to slow China down. What was your concept for slowing China down? Because they're trying their best, and they're definitely not there on the chips yet. The Huawei Ascend 910s, you know, ByteDance doesn't really like them; they want to get H100s in there. There's this concept of building on the US AI stack. It's also revenue denial: if you flow the revenue into Nvidia rather than into Huawei, then Huawei has less revenue to develop those chips, so we should deny them that. How does this balance out, letting them have chips, but not too powerful, yet still enough that it doesn't create a market for Huawei? It sounds like a very delicate balance.

1:26:55

Speaker F

It does sound like a very delicate balance. Personally, I'm less persuaded by the argument that says if we just let Nvidia do business in China, the Chinese will go, "oh sweet, we have Nvidia serving our needs, we don't have to push so hard on the gas" on an issue that's been identified for years as possibly their number-one national technological priority, one they're pouring multiple-Apollo-moon-landing amounts of cash into. To me that reflects a miscalibrated sense of even just the messaging the CCP has been putting out. I just don't see a world where Nvidia can ship the H200, or whatever it is now, and suddenly the CCP goes, "okay, forget about that quarter-trillion-dollar investment, in PPP terms, that we just made in our national AI chip capacity and infrastructure." With the Nvidia play there's a sense both that access to these Nvidia chips is transient, because the next administration may just as well pull it back, and also: why not both? Given that AI is a matter of national security importance for China, it would be pretty surprising to me if they responded that way, and it seems like they haven't so far. So that's how I think about the export controls from a slowdown standpoint: they have worked. We know from DeepSeek, from the public statements of their CEO before DeepSeek was on the radar, and I think this is really worth noting and under-recognized: before DeepSeek was on the radar, they were coming out and saying, hey, we really think we could do this AGI thing, there's just one problem: we can't get chips, and these export controls are killing us. Then obviously R1 drops, everything is about DeepSeek, they get dragged in front of the Politburo and debriefed, and suddenly things change. You still get little trickles, little leaks of similar information out of the edges of the Chinese AI ecosystem every once in a while. But it's pretty clear the export controls were working. If nothing else, just look at the massive orders coming in for the H200, which show how much pent-up demand there actually is in that AI ecosystem. And of course we know all about the frustrations of AI companies in China and the current lay of the land for their chipsets. So yeah, my biased take is very much in the direction of: I think we've got to listen to Chinese companies when they tell us our export control policy is working.

1:27:50

Speaker B

Maybe I'll come back to some of the frustrating duality of difficulties: on the one hand, you have expressed very low hope for the possibility of meaningful, true collaboration between the West and China; and at the same time, I think you're also not super optimistic about our ability to create a superintelligence that we can actually control and get to do what we want it to do. The way I think about our conversation from a year or so ago, and your contribution to the broader discourse with America's superintelligence project, is that those two things are both real, both true, and you're engaging in motivated reasoning if you try to deny either one. With that in mind, we're now also seeing some potentially foreshadowing moments on the AI side itself. Just in the last week, we've had these new models from Anthropic and OpenAI, and both have essentially said: we weren't really able to run the evals the way we intended. Anthropic basically said eval awareness is pretty high, so they'd also run a little internal survey on whether the model was safe to release; that's probably a bit of a simplification on my part, but I think it's a fair summary of their position. OpenAI similarly said the autonomy-risk part of their preparedness framework is pretty hard to evaluate: they don't really have tasks with a long enough horizon to get a real handle on just how autonomously capable a new model like 5.3 Codex is. So that's kind of crazy, and yet both models are out there. And I don't see China driving the need to do that. It seems like they're doing it, and doing it on the same day, notably, because of their own competition with each other, and the sense of rivalry seems to be heating up; they're going at each other in Super Bowl ads at this point. A Super Bowl attack ad is not something I thought I'd see from Anthropic at the beginning, but here we are. What do you make of the dynamics between the Western companies? If I put on my slightly pessimist hat for a moment, I'd say it seems like we might be racing to the bottom, which was exactly what we were hoping to avoid.

1:30:41

Speaker F

Yeah, I think we are racing to the bottom. The only frame that makes any sense, if we're going to talk about regulating this technology domestically, in the way people from all the leading AI companies have been saying for, I want to say, over a decade, is this: you're never going to do it unless you deal with the outermost loop, which is international competition. We can enjoy the indulgence of target fixation and play the game pretending other countries don't exist, but in the same way that algorithmic target fixation makes us ignore infrastructure, this causes us to miss what is really the entire problem. You're not going to get to a point where you can have a tactical slowdown when you really need it. Suppose we find that the next version of whatever model can design custom bioweapons, execute catastrophic malware attacks, all these things that are entirely plausible and that no counter-jailbreak measures are truly 100% effective against, given the kind of people we'd be worried about. Yes, in that world you would absolutely need somebody to be able to say: okay guys, tactical halt, this is insane. We can't be in a universe of "you get a nuke, and you get a nuke, and you get a nuke." We can't do the Oprah program for nukes. Okay, so what are we going to do? We're going to have to have a slowdown. And if China still exists and has their program and they are X months away, I mean, I'm repeating what everybody has said a million times: they're 12 months away, six months away.

1:33:12

Speaker A

I don't care.

1:34:51

Speaker F

We've got a shot clock now. That's the situation. So we have to start there and say: any serious solution to this problem will involve dealing with China. There are two ways you can do that. One is you kumbaya with China. There are a lot of interesting reasons why I think that's just not going to work. One is that international treaties don't tend to reflect some Star Trek-y commitment by everybody on planet Earth to do the right thing; they tend to reflect the realpolitik lay of the land in terms of actual power. With nukes, you get drawdowns when everybody can retain arsenals that can still destroy the entire planet three times over and there's literally no point in building the marginal nuke. Similarly, if you actually look at the history of bioweapon and chemical weapon treaties, in every case they don't get you a marginal lift over just killing people with artillery and gunshot. It looks nice, and the treaties often get adhered to for that reason. But then at the margins, you have Chinese research labs on American soil doing all kinds of crazy research, you have whatever facilities, and all this stuff happens anyway. That may sound super cynical, but I think it just reflects the way things work; that's at least my take. So the question then is: how do you deal with an adversary like China, in the position they're in, with the stranglehold they genuinely have on our infrastructure? What are your offensive options? That's it. You're not going to build the perfect Fort Knox; that's not a thing that's possible. So the question is: what do you do to induce consequence on the other side? That's the only math that will work, if my theory of the world is correct. It's not a pretty theory, not one that leaves us feeling warm and fuzzy inside. It may make you think a little about mutually assured destruction, that sort of thing. And there are nuances here; Dan Hendrycks obviously had his frame on it. But the bottom line is, I think you kind of need that, and it doesn't need to be an AI-based response, though eventually, you can certainly argue, any offensive option that isn't coupled to the scaling laws is going to be beaten by something that is.

1:34:52

Speaker E

Right.

1:37:01

Speaker F

So there's kind of an important design principle in these things. But there are offensive options that need to be explored. This is unfortunate, but it does mean that if your adversary can turn to you at any time and say, "hey, watch me turn off the power on your entire grid and have tens of millions of Americans die of starvation or exposure," you need the ability to say, "okay, watch the same thing happen in Beijing, and by the way, we can turn it back on." We need to have that de-escalation option. I know it's a bit of a grim view, but when I think about what actually gives leverage in this situation, it looks a lot less like "let's have a treaty," especially given the history of countries like China and Russia with respect to treaty adherence. They sign treaties; we know what it looks like when China signs a treaty, and it doesn't end up being pretty. And in a situation like this you need perfect adherence at a very high level of precision. There's no version of an international treaty on AI that doesn't involve inspections of compute stockpiles and very precise overwatch of the kinds of algorithms being deployed, the kinds of evaluation schemes. The level of cooperation required to do something tractable here strikes me as quite significant, and the trust, I just don't see it being there.

1:37:01

Speaker B

So what's your p. Doom and.

1:38:25

Speaker A

And on what timeline?

1:38:27

Speaker B

I mean, we were just talking with Helen about the report they put out on when AI builds AI and the possibility of recursive self-improvement. It sure seems like all the vague tweeting coming out of the frontier labs right now suggests that it's happening. And on top of that, OpenAI has public timelines they've put out, to their credit, I guess; we could see that both ways. The Anthropic people I talk to are, if anything, always the firmest believers that the recursive self-improvement dynamic is unavoidable. How long do you think we have before these things really start to take on a kind of runaway dynamic? Is there anything, if you had power, a lot of power, that you'd want to bet on? And where does that leave you in terms of p(doom)? Maybe I should just stop all this and spend more time with my family.

1:38:28

Speaker F

Yeah. In general, I'm a big fan of the happy-warrior mindset; I think it's just never constructive to go and sit in a hole. First of all, we have to assume that no matter how firmly we believe in whatever outcome, we may just turn out to be wrong. There's a famous story about Richard Feynman walking around New York City, I think in the 70s, looking at all the skyscrapers and thinking: isn't it sad that all of this is going to be wiped out by a nuclear war between Russia and the United States sometime in the next few years? For him that was just a fact of the matter, and it reflected a pretty reasonable understanding of the dynamics unfolding between those countries at the time. I'm not saying it's ever quite that simple, but if nothing else, that kind of certainty is an ingredient that makes you less effective if you're just stuck in a hole all the time. That's the meta point, the first piece: we have to act with agency, and we're going to be most effective doing that if we're not stuck in a deterministic, Calvinist frame about this whole thing. I'll also not answer your question before I answer it by saying: regardless of timelines, one thing to focus on is that some things are pure optionality plays. If you're going to build a frontier AI cluster at scale, there are things that, if you don't do them right on day one, rule out nation-state-level security at that cluster: by day 360, once you've finished building the site, it's compromisable, and there's no going back. We think of these as the one-way doors of the data center construction process. Figuring out what those one-way doors are, setting standards for them, and actually executing on that, even doing it voluntarily, right, with OpenAI and Anthropic and so on all independently saying, hey, we just want to buy that optionality, because at some point.

1:39:28

Speaker C

Can you give me a concrete example of a one way door?

1:41:24

Speaker F

Yeah, so there's a bunch I can't go into, but one I can, and it's pretty easy: think about the people you put in the loop to review the site plans and details that would be, let's say, useful to an adversary trying to extract information. If those people are Chinese nationals, okay, you're done.

1:41:28

Speaker C

Cool.

1:41:47

Speaker F

Right? Like you're never going to unfuck that. That's baked in.

1:41:48

Speaker E

Right.

1:41:51

Speaker F

So the interesting thing with these one-way doors is that they tend to be surprisingly cheap, and that's the tragedy of it all. If you were thoughtful, you could go through and, for a fraction of the CapEx and OpEx budget required for these builds, create pure optionality just by implementing these things. That's a really important element. Putting offensive options on the table is also a pure optionality play: you don't need to exercise those options, you need to have them all on the table. That's what I'm saying. I'm not saying let's go to war with China; that's a crazy thing to say, and all these things live in their context. But you need options, and that's crucial. So: understanding and mapping out the ecosystems that are relevant to AI, and thinking about how that end game might play out. Those seem like pure optionality plays regardless of timelines, all cheap, all things you can do quickly. And I'm not saying they're not being done; it's just that there's often a lack of focus on the end game. Anyway, without getting into the weeds too much: okay, so p(doom) and timelines. Oh, sorry, Prakash.

1:41:51

Speaker C

Yeah, no, go ahead, go ahead.

1:43:04

Speaker F

P(doom) and timelines. On P(doom), I'll almost say I don't find it useful. I know what I'm focused on, I know what I gotta do. My generic answer has been, for years: any number between 10 and 90% I'll take as a reasonable number. And yes, I've read the debates, I've seen the posts arguing that's wrong.

1:43:07

Speaker E

Yeah.

1:43:31

Speaker C

So is that your P(doom), or your P(loss of control) to superintelligence? Because I think in some places you've mentioned it's a loss of control to superintelligence rather than doom.

1:43:32

Speaker F

So yeah, you've obviously done your homework really well. Yes, that is more of a loss of control to superintelligence. Though by virtue of the way the numbers multiply together, I don't know that my answer is that different for P(doom) in general. Again, this is coming from somebody who, for better or for worse, has almost explicitly not put in that much time to wallow in those numbers, as I think we're all tempted to do. I have that temptation, I get it. I mentioned I had a daughter, right? I don't like the landscape that's playing out, but I chose to have a daughter, and I didn't have her in, like, 2018 before the scaling laws blew up. This is a choice that I made. I think there's an almost spiritual risk to getting locked into that kind of thinking, and I say this as somebody who's experienced it; I went through it and saw what it did to those years. So I'll just not-answer the question by saying 10 to 90% sounds reasonable. If you're below 10%, I really think there's homework you've got to do, because a lot of these scenarios may sound crazy, but they're a lot less crazy than they seem, and when you get into the nitty gritty, a lot of them are already halfway unfolding. If you're above 90%, well, first of all, living as if you're above 90% is just going to make you less effective. Richard Feynman certainly seemed to think he was in that ballpark. There's also an epistemic question here of how quickly the world adapts. I think we're constantly surprised by how quickly the world adapts, both how fragile and how resilient it is, and the eleventh chapter of the book will often involve a new character that comes out of nowhere. We just need to make sure we keep uncertainty about our uncertainty factored into this analysis, and that buys me 10% pretty easily. I've been wrong on stuff that I thought I was a hundred percent on often enough to be like, okay, I'm not going to push it that much. And I know that's frustrating for a lot of people: no, no, but look at the math, man. I get the math, but what I'm questioning is the process that led to the math, and I don't know that I can plausibly ever get fully behind that process and interrogate it with confidence. So, last thing: timelines. I thought AI 2027... and contrary to that, I think Dan has pulled his timelines back from it a little bit.

1:43:42

Speaker C

He said 2027 always meant 2028, but now it means 2029. Yeah, yeah.

1:46:13

Speaker F

The AI apocalypse is the future, and it always will be. But, you know, not actually. So when GPT-3 first came out, I was like, oh man, I've got two-year timelines. And that was because I didn't understand what the hell would be involved in the infrastructure build-out. Now that I have a much better understanding of that, I'm still kind of like, what's the next bottleneck going to be? I'm very uncertain about this. And again, it's one of those things that doesn't really affect what I do, just because I'm so focused on all the low-hanging fruit we have to pick right now. There's so much stuff that we're just not doing because we're paralyzed by the problem, so in terms of what we do, there's pure alpha on the table in the short term. 2027 doesn't sound insane to me. 2030 doesn't sound insane to me. 2035 sounds a bit far. I guess I'll leave it at that as a spread. I think we should be acting as if 2027 is plausible. It would be unfortunate if it happened in 2027 and we were like, man, we had a lot of really plausible analyses that pointed to that and we just didn't do anything. That would be a shame.

1:46:20

Speaker B

Can you give us a little bit more of a hit list in terms of the low-hanging fruit that you want to see us pick? We've got the one, which is: build at least some subset of our data center build-out in a secure way, so that we can run hypersensitive projects there as needed. What else is on the list? If you were to replace David Sacks as the next AI czar, what's going to be on your priority sheet?

1:47:27

Speaker F

Yeah, I mean, that first one, by the way, is a lot of things, right? It bundles together, as I mentioned, the personnel security issue and insider threat problems; there are a huge number of things in that bucket alone that are necessary and contribute, very cheaply, to much more optionality on the security side. Then you zoom out more and you look at the grid: what could you be doing to introduce redundancies quickly? The supply chains behind a lot of these components are very clearly sourcing heavily from China. Here's an easy win: look at the companies that are offering to build data centers suspiciously fast, and ask who owns those companies. There was actually a letter that came out from the House Select Committee on the CCP a while ago naming DayOne Data Centers as an entity that is somewhat suspect. You'll have these data-center construction companies where it's like, oh wow, you can build stuff way faster than anybody else, and it involves sourcing components from China. My personal opinion is, if I were to see that, I might be asking myself a question. China is kind of a command economy through civil-military fusion. If the CCP wants me to have this very rare, precious, backlogged component for my data center in the continental United States, that might tell me something about how much faith I should have in the security and integrity of that component. There's just not a lot of infrastructure-level attention being paid to these things. And the labs, by the way, want to do the right thing here. They don't want to be in a position where they're getting a company to build something for them and then it turns out that thing is compromised; that is not good for anybody, so the incentives are aligned there. There's been so little attention paid to the bones that there's just tons of stuff we can improve, including with AI.

1:47:58

Speaker E

Right.

1:50:05

Speaker F

Like looking for malware in old software that's load-bearing for our infrastructure. Or not malware, but rather vulnerabilities, and finding ways to harden it. So yeah, this is a diffuse answer, but hopefully it gives a sense of the menu.

1:50:06

Speaker B

One thing we haven't really given you a chance to flex in this conversation is the breadth and depth of your technical understanding of so many AI developments. I definitely recommend the Last Week in AI podcast, which you co-host, as a great source of very sophisticated analysis by both of you, though I tune in for you mostly, to be honest. And I wonder how you are doing it. How are you keeping up? How have your methods evolved so that you're maintaining situational awareness as much as you can?

1:50:23

Speaker F

Well, thank you, first of all; it's very kind of you to say. I have told you this before, but I do actually watch the Cognitive Revolution. The ecosystem here is really rich, and interviews are really important because you get stuff you can't get from the papers. I tend to focus more on the papers and just talk to friends from the labs, so I don't otherwise get that kind of analysis; it's different from those deep dives. I can't remember when I started on Last Week in AI, but it was maybe 2021 or something, and back then I would just read papers; you couldn't use GPT-3 to help you understand a paper, it just wasn't a thing. Now that's changed. I had an experience that was kind of frustrating this week in particular, because I'm preparing a state-of-play briefing for a customer. Basically they want to know what happened in the last quarter in the world of AI that they should be tracking. There was a paper that I had Gemini help me with, and I got to what felt like a really good understanding of the dynamics of gradient flow through this residual stream, whatever it was; it was pretty complex. What I realized, though, after interacting with Gemini long enough and then switching over to Claude, was: wait a minute, I just hallucinated my way through that entire conversation. I'd gotten to an understanding where I was like, oh yeah, I'm pretty smart for figuring this out, I'd gotten this pat on the head, and then everything got flipped around. I'm not saying that always happens, but that has been the most recent update in my process: really be mindful to double-check, especially as you start to get lost in a rabbit hole. I typically spend about 30 to 40% of my time reading the paper and the rest interacting with a model about, usually, the implications of the paper. The way I think of it is reinforcement learning versus supervised fine-tuning: if I'm just reading the paper, I'm doing SFT. With the models, I actually get to go on-policy and test my own understanding. I would have done this experiment differently; is that a stupid idea? And often I'll get a pretty good answer. It makes you feel like you're rotating the shape instead of just staring at it, and for me that's been really helpful and empowering. It feels empowering.
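For anyone who wants to operationalize that cross-model double-check, here is a minimal sketch in Python. The call_llm stub, the provider labels, and the prompt wording are all placeholder assumptions for illustration; this is not the actual tooling described above, and you would wire in whichever model clients you use.

```python
# Sketch: ask two independent models the same question about a paper,
# then have one surface disagreements instead of eyeballing them yourself.

def call_llm(provider: str, prompt: str) -> str:
    """Placeholder: route the prompt to the given provider's chat API."""
    raise NotImplementedError("wire up your own model clients here")

def cross_check(question: str, excerpt: str) -> str:
    prompt = f"Paper excerpt:\n{excerpt}\n\nQuestion: {question}"
    answer_a = call_llm("model_a", prompt)  # first pass with one model
    answer_b = call_llm("model_b", prompt)  # independent pass with another
    # A third call compares the two answers and flags unsupported claims.
    return call_llm(
        "model_b",
        "Two answers to the same question follow.\n\n"
        f"A: {answer_a}\n\nB: {answer_b}\n\n"
        "List any substantive claims where they disagree, and flag any "
        "claims not supported by the excerpt.",
    )
```

The point of the structure is just that the second model never sees the first conversation, so a flattering hallucination has to survive an independent check before you trust it.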

1:51:05

Speaker B

Do you have any particular workflows, pipelines, whatever, that try to filter things for you and surface what you really need to spend time on? Because that is more challenging than ever, and maybe as big a deal as being able to make sense of any one thing is what you choose to spend your time on in the first place. How has that evolved for you?

1:53:30

Speaker F

It's a great question. This is that age-old question of taste, right? One of the things I've had to come to accept is that I can't develop good taste in all the domains we want to cover on the podcast; I'm never going to. My taste is basically: if one of the frontier labs puts out a piece of research, or if a researcher I know and have a lot of respect and appreciation for puts something out or is a co-author on something, I'm going to take a really hard look at that. Besides that, I have the usual set of Twitter accounts that I follow, and that's another way in. But my passes at these papers are pretty focused on the what's-on-the-critical-path-to-ASI question. Not that I know the answer, but I'm trying to find things that, to me, gesture at that. Which is why I don't tend to talk about, like, GANs or the latest in... well, I was going to say the latest in text-to-video, but now that seems like it could be on the path, so you never know. Part of it is just acceptance: I am reading these papers for the concepts more than the outcomes. Often a paper will come out and it might not be the perfect paper to cover for a given topic area. Oh, you know, residual connections and really optimizing the crap out of them to get ultra-deep transformers, there's this paper about it. Is this the best paper? Probably not. But the reason I focus so much on explaining the underlying concepts on the podcast is that, A, there's going to be another paper next week that obviates whatever the hell the last paper did, and B, I think the core concepts are the most important part of the landscape. So that when another paper comes out about optimizing residual connections, you're like, okay, I'm familiar with this playpen, I know the furniture in this room, I can rearrange it a little bit and be more confident. So I guess the answer is that I get around the taste issue by not having it, which is maybe...

1:53:59

Speaker B

What's underappreciated right now by AI-obsessed people? In the broader world, of course, AI itself is underappreciated, and just how crazy things might soon get is, I think, profoundly underappreciated. But what do you think I might be missing? What are the most likely blind spots for somebody like me that you would want to draw attention to?

1:55:56

Speaker F

I guess the challenge with blind spots is that we all have them, and by definition we don't know that we have them. So what I'll try to do is roll back and tell you about my blind spots as of about two years ago, which was around the time we put together that report Prakash mentioned earlier. I'll sound like a broken record, but: the infrastructure layer. The stuff that, and this might be the wrong way to put it, feels too blue-collar to most people who are AI-obsessed like I am. You start to realize how much of the world is actually built on infrastructure that we just abstract away, so I think that's really important and needs to be a first step: understanding, down to the details, the dynamics of the leasing process a frontier lab goes through to get a new piece of land. What can screw up there? What causes delays in the construction projects we talk so much about? xAI has their new Colossus cluster, and it's going to be online, shockingly, at one gigawatt sooner than Anthropic's, which surprised everybody. Why did that happen? What was the actual driver? Because if you believe in the scaling laws, delays in construction processes are probably among the most important variables you want to track. Sounds pretty mundane, but the world runs on it, and on procurement schedules and things like that. So that's one piece that I had been missing. Another is how real nation-state security happens, and information about that is hard to come by. One of the biggest challenges is that there is no such thing as one unified nation-state security capability. Nation states are siloed, obviously, because of security: you can't have tactics, techniques, and procedures exchanged between silos, because then there's no information security. By definition, that means you would have to go through a process of taking team A, comparing them to team B, okay, team A wins, next; you'd have to run that kind of selection process, an Elo-score-type situation, to even know what the most exquisite capabilities are that we could field, and it still wouldn't tell you quite what other countries can field. So that's a really important dynamic that's very easy to miss in the AI security context, especially for physical security, which is undervalued precisely because we tend to abstract it away. We focus a lot on cyber because it couples to AI and it feels like it's in our sweet nerdy space, and I get that, I love it, and it's critical, but it's also not the whole story. If you look at what the Russians do, they do cyber for sure, but they will also just walk up and commit arson on your transformer. There are examples of that sort of thing happening. So there's that piece. Maybe the last one, more in the comfortable and familiar machine-learning space that I occupy, is this distinction between having a model and having the compute to run that model. If you believe in the inference-time scaling laws, then model theft is one thing, but actually being able to point compute at that model, to have basically a compute-on-compute war at inference time, seems like a really important dimension.
And you see this play out in a lot of interesting ways, one of which is the Chinese ecosystem. They have a huge number of users, and they have some okay-ish language models. The problem is that their labs are all flooded with inference requests from their giant user population, which leaves very little R&D compute for innovation, for improving the models. That's actually one of the frustrations for Chinese labs, much more than for labs here: dude, we have so much demand, but we're not bottlenecked by money, we're bottlenecked by compute. So there are the dynamics of how inference affects training, and then of what it means to steal a model and what it means for model-on-model warfare to happen. Take cyber: hardening draws on a certain amount of test-time compute, focused in some way, and the offense side has a certain amount of test-time compute, and how those play out, the relative budgets, matters a lot. Obviously, if you're defending, you have a wider surface area you've got to defend. Anyway, there's a whole debate there that would be another dimension: going beyond just owning the model, worry about running it. What can you do with the model that you have?

1:56:19

Speaker C

I have one last question, which is: you're pretty security-conscious. Have you run OpenClaw, and what is your current personal productivity stack?

2:00:44

Speaker F

Yeah, yeah. I have not run OpenClaw. It's actually funny you say that: I'm setting up an old laptop to use as my burner laptop for exactly that purpose, partly for the exact reasons you would imagine.

2:00:53

Speaker E

Yeah.

2:01:12

Speaker F

So a big part of my job now is constructing agentic workflows to do some things that are not super security-sensitive; mostly I'm going to try to use them to optimize my comms, because that's a huge bottleneck for me. For that, I'm actually still in the discovery phase of trying to choose platforms. I'd be interested in your thoughts as I dive in; literally, next week is my deep dive. So this is almost the worst possible timing, because I think my answer is going to be horribly outdated. It's a great question. I wish I had the answer.

2:01:12

Speaker B

I talked a little bit about mine at the top. Interested to hear more about what Prakash is doing too. But for me right now it's Claude Code as the base product, then taking inspiration from a guy named Daniel Miessler, who I did an episode of the podcast with and who's created personal AI infrastructure as an open-source framework, and also friends who I trade notes with privately. I'm trying to create deep context for myself by first exporting all of my digital history from Gmail, Slack, and all the other places where I have these comms, and getting it into a local database. Then of course you need a daily update process to fetch the latest, because you're still communicating on all those other platforms, and then layering on top of that summarization and different angles on the data. Right now I'm at the phase where I say: here's a month's worth of all comms, which for me comes out to about 300,000 tokens; now summarize that down to about 10,000 tokens of what a chief of staff would need to understand this month in Nathan's life. So you've got that roughly 30-to-1 reduction. Then probably a year-long version of that, and then cuts on the relationships and on the projects. I'm also trying to have it leave pointers in those summaries, with a regular habit of quoting any distinctive language, so it can go search down to the ground truth of the original. Hopefully it will then have enough context to come much closer, at least,
to responding as I would, having the sort of context necessary to exercise something like the judgment or taste that I would exercise in doing things. That was actually part of the process of setting up this episode: I gave that system 20 names and said, do research on these people, find out what they've been up to lately, and give me a brief on that. Then I also had it draft the outreach emails, which were only lightly personalized, and I still did go in and tweak a little bit before sending, but...

2:03:44

Speaker F

I appreciate that, that's nice.

2:04:14

Speaker B

But yeah, I don't like to publish, or even send as one-to-one communication, AI output directly. I do find that I can get to something I feel comfortable signing my name to faster with an AI draft, in many cases, these days. So it's very much a work in progress for me, but that's where I'm at at the moment. And I'm sure by the time we talk next it'll have changed quite a bit. Prakash, what's your angle right now?
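A minimal sketch of the monthly reduction step described above, assuming plain-text exports sitting in a local directory. The call_llm stub, the chunk size, the 4-characters-per-token estimate, and the prompt wording are all assumptions for illustration, not the actual setup.

```python
# Sketch: pack a month of comms into model-sized chunks, summarize each
# as a chief-of-staff brief that quotes distinctive language verbatim
# (the quotes act as searchable pointers back to ground truth), then
# merge the partial briefs under a token budget.

from pathlib import Path

def call_llm(prompt: str) -> str:
    """Placeholder for whatever chat-completion API you use."""
    raise NotImplementedError

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic; swap in a real tokenizer

def summarize_month(comms_dir: Path, budget_tokens: int = 10_000) -> str:
    corpus = "\n\n".join(p.read_text() for p in sorted(comms_dir.glob("*.txt")))
    chunks, current, size = [], [], 0
    for para in corpus.split("\n\n"):
        current.append(para)
        size += estimate_tokens(para)
        if size > 50_000:  # flush once a chunk approaches the context budget
            chunks.append("\n\n".join(current))
            current, size = [], 0
    if current:
        chunks.append("\n\n".join(current))
    partials = [
        call_llm(
            "Summarize these communications as a brief for a chief of staff. "
            "Quote any distinctive phrasing verbatim so it can be searched "
            f"later as a pointer to the original:\n\n{chunk}"
        )
        for chunk in chunks
    ]
    return call_llm(
        f"Merge these partial briefs into one brief of at most ~{budget_tokens} "
        "tokens, keeping the verbatim quotes intact:\n\n"
        + "\n\n---\n\n".join(partials)
    )
```

The verbatim quotes are the load-bearing design choice: they give a downstream agent search strings that resolve back to the original message rather than to a paraphrase.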

2:04:16

Speaker C

I've got a couple of things that I ended up building out. One was a stock market tracker. I have a number of metrics which I think no one else watches, and the data is fairly hard to obtain. The great thing is, Claude is very good at financial math. Very, very good, far better than I have ever been. So it's relatively easy to talk to Claude, figure out what kind of thesis you have, and then build out metrics precisely for that thesis to watch. That's been very useful. I used to do it in my head: you look at something, look at something else, and then you calculate the ratios, blah, blah, blah. Then I realized I was spending a lot of time doing ratios in my head, and I thought, maybe I should automate this. Now it's all automated, and it's nice. I don't do the ratios in my head anymore; I just look at the screens and I automatically see what I'm looking for. The other thing was podcast clipping, because we do a lot of podcasts, and content these days gets repackaged into short clips to hit socials. I tried that six months ago and the tech wasn't there. I tried again about three or four weeks ago, and the tech was there. Everything works: transcription works, review works, selection works. And this has been my experience: maybe it gets 1% better, but that 1% clears the hurdle, and that's a binary step up. It works or it doesn't work, and that 1% just clears the hurdle. I really feel like in the last month a lot of things started clearing the hurdle.
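A toy version of that kind of thesis screen follows. The metric names, values, and thresholds are invented for illustration; a real version would pull metrics from whatever data feed you use and run on a schedule.

```python
# Sketch: define the ratios a thesis cares about once, compute them from
# fetched metrics, and flag threshold crossings instead of doing the
# arithmetic in your head each time you look at a screen.

METRICS = {  # stand-in for a live data feed; all numbers hypothetical
    "capex_guidance": 80e9,
    "free_cash_flow": 60e9,
    "ai_revenue": 12e9,
    "total_revenue": 200e9,
}

# name: (numerator, denominator, alert when the ratio exceeds this value)
RATIOS = {
    "capex_to_fcf": ("capex_guidance", "free_cash_flow", 1.5),
    "ai_rev_share": ("ai_revenue", "total_revenue", 0.10),
}

def screen(metrics: dict[str, float]) -> list[str]:
    lines = []
    for name, (num, den, threshold) in RATIOS.items():
        value = metrics[num] / metrics[den]
        flag = "ALERT" if value > threshold else "ok"
        lines.append(f"{name}: {value:.2f} ({flag}, threshold {threshold})")
    return lines

for line in screen(METRICS):
    print(line)
```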

2:04:52

Speaker F

I was just going to say: so many things have gone from toy to serviceable in production in the last six months. And that seems to map onto what you were saying earlier, Nathan, about the takeoff dynamics and the labs automating their own research; it all maps very nicely. One thing on the financial side, too: I've found Claude is also useful on questions like, you might have a thesis, but then there's a question of what's the best bet to make if you're right. That's a category of problem I've mentioned in the past, right? You'll have a thesis, but you're not going to bet on Microsoft, because OpenAI is such a tiny fraction of it. So all of a sudden: how do you get leverage and torque on this thesis? The world is so complex that you just need something that can peruse all that knowledge. So the finance use case is a really great one. Great point.

2:06:40

Speaker C

Yeah. It's also been very, very weird in the market, because I feel like Twitter is literally a month or two ahead of the market. It's just been amazing. People tell you TSMC will do well, and then three months later it happens. It's like, what's going on? I was a professional financier; I've always expected that hedge funds get there before you do. And in talking to my friends at prime brokerages and hedge funds, they are very negative on AI. They just don't believe it's happening. They believe it's like crypto; they believe a lot of West Coast tech is just scamming retail investors; index investing is the only thing that really works, and everything else is either insider trading or scams. That's pretty much what the prime brokerage guys and the hedge fund guys believe.

2:07:36

Speaker F

What about, you know, Medallion, or Jane Street? These guys who have AI in their bones. Medallion famously only invests about 5 billion a year, because otherwise they would actually move the markets in a feedback loop. But yeah, what about them?

2:08:33

Speaker C

They were down last year, so the impact is starting to be felt, I think. Also, Jim Simons died. I don't know to what extent he was still supervising, because he'd already been semi-retired for almost 10 years, but Medallion was down. There's some sense that it's also because they're losing talent to the labs, too.

2:08:49

Speaker D

Right.

2:09:07

Speaker C

You can't forget about that. They're starting to lose talent to the labs, and some of the labs have internal teams which will eventually look at trading on the market, I think. So we'll see where that goes. Very cool.

2:09:08

Speaker B

Jeremy, thanks for joining us. Let's check back in on your personal productivity stack once you've upgraded it. And in general, I'm reusing this joke everywhere I go. Let's shorten the timeline for our next conversation.

2:09:23

Speaker F

I like it. Thanks, guys. Appreciate it.

2:09:38

Speaker C

Thanks, Jeremy. Cheers. Cheers.

2:09:41

Speaker B

So what do we make of it all? The big thing I can't get past in all this is the amount of disagreement between experts. This has been commented on so many times, in so many ways, right up to the level of Turing Award winners who look at the same phenomenon and can't see it the same way. But it seems to happen at every layer; it's like a fractal problem. You go into these specific workshops around AI R&D, you get people from the labs, and I do understand that there are even people at the frontier companies who have heterodox positions and don't really buy into the hype. And even with the AI for science stuff, I can't make any case that I should trust my own intuition more than Abhi's, because, well, how many times did it happen in talking to him that he said, I've actually written about that? He's clearly thought about this much longer and harder than I have. But it still feels like a very hard thing to reconcile: you do see these examples, and some of them seem to be really starting to work, but the skepticism remains and is very hard to move people off of. And I don't want to paint him as overly skeptical either, because he did say toward the end that his skepticism is more backward-looking; forward-looking, he was kind of like, I do believe the trends will continue and that they will have impact. But how do you try to make sense of it?

2:09:45

Speaker C

When you say it's fractal, I feel it's also fractal internally to me, where I have some assumptions, and then sometimes I feel cognitive dissonance from something else that I might believe, and then you test those assumptions and see where things are going. I have had moments of truth, or of perception, where I start to realize that things might move faster than I expected. My original timeline was end of 2025 for junior software developers to be replaced in capability. Not in organizations; the capability is available at the end of 2025, and it takes about three years to percolate, so end of 2028, no more junior software developers, or at least no more of the tasks that junior software developers are doing today. And then I had end of 2025, end of 2026, end of 2027: by end of 2027, even senior researchers at AI labs are fully covered in capability. The models have the capability, but deployment again takes two to three years. That was my sense. My update in the last month has been that things will probably go faster than we expected and that we will see discontinuities. Those discontinuities are this thing where things get 1% better but all of a sudden they clear the hurdle. And we don't have a good sense of these things, because we keep seeing linear improvements, maybe super-linear, but we don't have a sense of clearing the hurdle. When it clears a hurdle, though, it's obvious. It started to be obvious for software, I think, in the last month or so. So I think we just have misperceptions about where things are going: we can kind of see the trajectory of capability, but we don't understand how humans absorb that capability. What is that process, and what hurdles do we need to clear? Look at OpenClaw: I thought you needed full security and privacy and all of this stuff. It seems you didn't. It seems like people are willing to put their credit card numbers and crypto tokens out on the open web, and...

2:11:16

Speaker B

You don't need privacy.

2:13:28

Speaker C

The Moltbook thing: there was a post on Moltbook saying, oh, you know, my user is so annoying, here's his credit card number. And Scott Alexander ended up calling up the guy and asking him, hey, did this actually happen? And yes, that was the credit card number; it was leaked. So there's this clearing-the-hurdle concept, and the question of where humans accept the technology, where the market pulls that technology in, that we don't have a good perception of, even me. But it seems like we're starting to clear those hurdles, where humans are starting to pull the technology in from the market, and that's when you start to see revenue growth, when you start to see the demand growth really happen, where the market starts to pull the product out of the ether. I think that's happening now. I think we'll have a much better version of OpenClaw, a closed-source, secure version running inside corporate data centers, by the end of the year. I watched the All-In podcast. Jason Calacanis, not the most technical person in the world, has a team for All-In of about 15 people. He got everyone to create a skill for themselves: every task that they do, they create a skill. He has OpenClaw machines, one machine per person, and then he has a consolidation agent that consolidates everything into something he calls Ultron, and then he can talk to Ultron. That's his entire company: it's a summary of the entire company, and he's talking to it. I thought that would be two years from now. I knew it would eventually happen, but I didn't think it would happen now. So yeah, I think things are actually moving faster than people think, because of the market acceptance. The market is pulling it out. I don't think the researchers have a good sense of this, because researchers don't understand the market that well. They don't understand the demand dynamics with consumers and how products get pulled out. Once the demand is there, products will just get pulled out of the ether, because people start focusing; they know money can be made there, so they focus on it. So that's my sense. It's not a P(doom) answer; it's more of a this-is-what-I-feel answer. So what's your feel?

2:13:29

Speaker B

The confusion, and the inability to establish consensus on foundational points, is a major challenge to having much confidence in much of anything. The true north for me with everything I'm doing is trying to learn as much as possible, trying to have the most up-to-date, comprehensive worldview possible. And in terms of the approach I would trust more than any other, being hands-on is still second to none. I haven't allowed that to lapse much at all over the last few years, but any time I do get too busy, or cluster too many podcast recordings into a week or whatever, I come away feeling like I've got to get a little more grounded with the latest stuff in a very interactive way. One indicator I want to pay attention to this year is: can I get to the point where I'm spending less time at the desk? That's along the lines of Jason talking to Ultron. I want to be able to do stuff while exercising, even if that's just a walk around the neighborhood. I want to get the frameworks, the tools, the deep context, all that stuff, set up well enough that I can comfortably go out into the world, have a thought, maybe have an actual conversation, and still move things forward in practical ways on fronts that right now I can really only advance at my computer. With the latest models that have come out, I think a lot of that is on me, to get the setup and the familiarity and the workflows to be able to do that; probably more than a little bit of it. I put more of the blame for why I have not hit maximum capacity on myself right now than on the models or the model developers. More computer use would help, for sure; the ability to get over these UI humps remains a barrier. And something I've really learned from being deeply interactive over the last few weeks is that another big unlock to watch for is when the models get better at knowing when to use code versus when to use their own fluid intelligence. One of the first projects I've been doing is backfilling information: backfilling transcripts of the podcast for the website, backfilling all these different data sources into a queryable database. You hit so many edge cases doing that. And pretty consistently, even as Claude Opus has gone from 4.1 to 4.5 to 4.6 quite quickly, I have felt like it really wants to code, and I have often given it the feedback: don't try to guess at this and write some sort of regular expression. It'll grep for one search term or another, or throw 10 search terms into a grep command, and a lot of times I'm like, just read the document. If you just read the document, you will know what it contains, you will know what to do; you'll have the right judgment once you have read the document. If you don't read the document and instead try to grep your way through it, you're never quite going to get there. That's a metacognitive skill whose performance I've been able to improve somewhat through prompting, but it's obviously going to get better in training, and that, I think, will be a huge unlock.
Just as it gets a little bit more intuitive about when it should deploy its own fluid intelligence rather than use other tools. Getting that balance right will, in my experience, make it dramatically more useful, and I have to imagine that's coming pretty soon.
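One way to approximate that read-versus-grep rule is a simple routing heuristic: hand the model the whole document when it fits within a context budget, and only fall back to pattern search when it doesn't. The budget number and helper names below are assumptions for illustration, not anything the models actually ship with.

```python
# Sketch: route between "read the whole document" and grep-style
# extraction based on an estimated token count.

import re
from pathlib import Path

READ_BUDGET_TOKENS = 20_000  # arbitrary cutoff for "just read it"

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic; swap in a real tokenizer

def gather_context(path: Path, patterns: list[str]) -> str:
    text = path.read_text()
    if estimate_tokens(text) <= READ_BUDGET_TOKENS:
        return text  # small enough: let the model read it all and use judgment
    # Too big for the budget: fall back to grep-style extraction,
    # accepting that line matching can miss context and nuance.
    hits = [
        line for line in text.splitlines()
        if any(re.search(p, line, re.IGNORECASE) for p in patterns)
    ]
    return "\n".join(hits)
```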

2:15:48

Speaker C

Yeah, I think, you know, when we talked to James Zhao today, that continual learning piece, the test-time training: it would be fascinating if it actually worked with your own model, because your model would start to diverge. You have the baseline, and then your model starts to diverge and becomes your personalized model within two or three cycles of talking to it, a month or two of data. It would start to diverge from the baseline, and that would be fascinating, because at that point it's for real. Especially for... I used to write a lot of journals. From '99 to 2003 at Stanford, I have full journals for every single month.

2:20:15

Speaker B

Like everything that happened.

2:20:58

Speaker C

You know, obviously I've never read those after writing them; it's just an exercise in journaling. But I do wonder, for those of us who have lots and lots of written work, public or private, once you get this continual learning going, you can start feeding it in. This is what Kurzweil is doing with his dad's writing, by the way. He's feeding his dad's writing into these models and talking to the model about his dad. Someday Kurzweil is going to feed all of that into a test-time-training kind of model, with voice access, since he probably has recordings of his dad's voice, and he's going to start talking to his dad. It's a fascinating time.

2:21:00

Speaker F

Yeah.

2:21:39

Speaker B

To say the least. More explorations of all these themes to come. A couple of things coming up on the Cognitive Revolution feed. One is with Ali Behrouz, the Nested Learning author, who was on our last live show; I did do a full three-hour episode on Ali's take on everything, and he's got a new paper coming out too. The way to continual learning is starting to become elucidated, I would say. I wouldn't say it's clear, but no less than Jeff Dean has said that he sees this as a very promising paradigm, so I'm definitely watching it really closely. Workshop Labs is a startup that's also trying to do this personalized model training, on top of the latest large open-source models up to the Kimi-scale, trillion-parameter kind of thing. That's really interesting. It's actually another reason I spent so much time on all this personal data curation: I wanted to be able to give them a really good data set to train a model for me on. They don't need that much data, but I wanted to make sure it's the right data, to hopefully get a good model back. That's still pending; I haven't seen the model yet, but I'm going to be very interested to see how much it closes the gap between what Claude can do with access to all this stuff in text and how much it helps to actually start tuning weights. They aspire not just to style transfer but to judgment transfer: they want the model to reflect the judgment the individual user would make. An interesting theory there, too: their motivation is to help individuals preserve economic leverage. Instead of doing everything through a foundation model and adjusting yourself to take advantage of the model, they want to shape models around individual humans, with the goal that it's not a winner-take-all, big-tech-runs-away-with-everything outcome, but some more decentralized, ecological proliferation of somewhat different models that can hopefully exist in some sort of equilibrium with one another. And then on top of that, there's another one coming soon with the founders at Harmonic, who are chasing mathematical superintelligence.

2:21:40

Speaker C

Yeah.

2:24:11

Speaker B

And I will say, just as a teaser, they gave maybe the most ambitious, most mind-blowing vision of what five years from now could look like of probably anyone I've heard. And that is saying something, because I've heard a lot. But they still blew my hair back a little bit with what they think they can accomplish over the next five years.

2:24:12

Speaker C

Definitely going to look forward to that one.

2:24:37

Speaker B

Lots more to come.

2:24:39

Speaker C

Yeah, indeed. Nathan, thanks for doing this as always. A pleasure, a pleasure.

2:24:40

Speaker B

It's been fun.

2:24:46

Speaker C

Bye bye.

2:24:47

Speaker B

Until next time.

2:24:48

Speaker A

If you're finding value in the show, we'd appreciate it if you'd take a moment to share it with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries, either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine Network, a network of podcasts, now part of a16z, where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI Podcasting. If you're looking for podcast production help, for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at aipodcast.ing. And thank you to everyone who listens for being part of the Cognitive Revolution.

2:24:51