It's Crunch Time: Ajeya Cotra on RSI & AI-Powered AI Safety Work, from the 80,000 Hours Podcast
190 min
Apr 11, 2026
Summary
Ajeya Cotra discusses AI timelines, the concept of 'crunch time' when AI rapidly accelerates R&D, and strategies for using advanced AI systems to solve safety and societal challenges rather than just pursuing further capability gains. The episode explores how transparency requirements and early warning systems could help society prepare for potential intelligence explosions.
Insights
- The disagreement about AI's economic impact spans 1000x+ in growth rate predictions, driven by fundamentally different priors about technological acceleration rather than object-level disagreements about capabilities
- A critical 'crunch time' window may exist where AI is powerful enough to dramatically accelerate R&D but not yet uncontrollably superintelligent—the key strategic question is whether to redirect that labor toward safety and defense rather than further capability acceleration
- Transparency requirements around internal AI benchmarks, adoption rates, and safety incidents could provide early warning signals of intelligence explosion, but require careful design to avoid perverse incentives while maintaining competitive sensitivity
- Personal fit with management, organizational structure, and local working environment often matters more for career impact and satisfaction than abstract cause prioritization, suggesting EA should reconsider its emphasis on cause selection over context
- Effective altruism's comparative advantage may lie in incubating speculative but rigorous research on unconventional cause areas (digital sentience, value lock-in, space governance) that other communities won't pursue due to uncertainty and lack of prestige
Trends
- Frontier AI companies converging on strategy of using each generation of AI to align and control successors, creating recursive self-improvement safety pipeline
- Growing emphasis on measuring real-world AI productivity gains (RCTs, manufacturing metrics) rather than relying solely on benchmark saturation as capability indicator
- Shift from transparency-as-default to selective information sharing in AI safety philanthropy due to adversarial environment and competitive pressures
- Increasing recognition that AI safety research requires both technical depth and organizational/policy coordination, creating new roles for non-technical strategists
- Robotics and physical automation progressing faster than expected, potentially enabling AI systems to close manufacturing feedback loops sooner than anticipated
- Regulatory focus narrowing to transparency and whistleblower protections as more tractable than capability restrictions, with state-level legislation (NY, CA) leading federal efforts
- Talent migration from grant-making to technical research organizations as AI capabilities accelerate and research depth becomes more valued than portfolio breadth
- Growing concern about 'jaggedness' in AI capability development: some domains (coding, math, AI R&D) advancing much faster than others (biodefense, policy, epistemics)
- Emergence of longitudinal expert forecasting panels (LEAP) as alternative to benchmarks for tracking AI progress and testing worldview predictions
- Increasing recognition that AI adoption by safety-focused organizations and governments is itself a key early warning signal and competitive necessity
Topics
- AI Timelines and Takeoff Speeds
- Recursive Self-Improvement and Intelligence Explosion
- AI Safety Technical Research Priorities
- Transparency Requirements for Frontier AI Labs
- Early Warning Systems for AGI Capabilities
- AI-Assisted AI Safety Research
- Crunch Time Strategy and Resource Redirection
- Biodefense and Pandemic Preparedness via AI
- AI Adoption in Government and Regulated Sectors
- Effective Altruism Movement Evolution
- Grant-Making Strategy and Inside Views
- Organizational Management and Career Fit
- Digital Sentience and AI Moral Patienthood
- Space Governance and Resource Allocation
- Value Lock-In and Long-Term Societal Futures
Companies
Open Philanthropy
Ajeya's employer where she led technical AI safety grant-making and developed inside-view grant strategy
Anthropic
Frontier AI lab mentioned for releasing Mythos model with major benchmark gains and zero-day exploits
OpenAI
Frontier AI developer discussed for safety strategy of using each generation to align successors
Google DeepMind
Frontier AI lab with stated safety plan incorporating AI-assisted alignment research
METR
Organization where Ajeya now works on risk assessment and AI capability evaluation
Redwood Research
Technical AI safety org pioneering control agenda that Ajeya is considering joining
Coefficient Giving
Organization mentioned in context of Ajeya's previous role in AI safety grant-making
Forethought Institute
Research organization studying intelligence explosions and AI acceleration dynamics
80,000 Hours
Career advice organization offering free one-on-one advising for AI safety roles
Future of Life Institute
Organization behind October 2025 petition calling for superintelligence ban
Forecasting Research Institute
Running LEAP panel for longitudinal expert forecasting on AI capabilities and impacts
GiveWell
Charity evaluator where Ajeya worked early in career, pioneering transparency in grant recommendations
FTX Foundation
Collapsed foundation whose grant recipients Ajeya helped via emergency grant-making in 2022
NVIDIA
GPU manufacturer whose stock price and compute availability critical to AI capability scaling
Apollo Research
AI safety research organization mentioned as example of grantee work
People
Ajeya Cotra
Guest discussing AI timelines, crunch time strategy, and career journey in AI safety
Rob Wiblin
Host conducting interview with Ajeya about AI capabilities and safety strategy
Holden Karnofsky
Previous manager of Ajeya who left in 2023, enabling her to lead technical AI safety grants
Emily Olson
New leadership that improved Ajeya's work experience and organizational integration
Ryan Greenblatt
Coined term 'top human expert dominating AI' for capability milestone Ajeya expects in early 2030s
Tom Davidson
Author of 'Three Types of Intelligence Explosion' analyzing feedback loops in AI scaling
Will MacAskill
Co-author of grand challenges framework for AI-assisted safety and societal defense
Andrew Critch
Colleague working on biodefense scaling and pandemic preparedness via AI
Joe Carlsmith
Colleague whose blog on existential reflection resonated with Ajeya's spiritual needs
Arvind Narayanan
AI skeptic whose potential conversion to AI risk concerns would signal credibility shift
Quotes
"I think that probably in the early 2030s, we are going to see what Ryan Greenblatt calls top human expert dominating AI, which is an AI system that can do tasks that you can do remotely from a computer better than any human expert."
Ajeya Cotra•~45min
"The world of 2050 could look as different from our perspective today as our world would look to hunter-gatherers of 10,000 years ago."
Ajeya Cotra•~5min
"All frontier developers are gradually converging on a strategy of using each generation of AIs to attempt to align, understand, and control their own successors."
Host summarizing Ajeya's research•~10min
"I really want to be like an advisor and a helper to the kind of central organization... if I don't have that I will still sort of gravitate towards trying to like meddle in everything else."
Ajeya Cotra•~120min
"Having done the homework really qualitatively changes the details of the decisions you make in ways that I think can be really high impact."
Ajeya Cotra•~100min
Full Transcript
Hello, and welcome back to The Cognitive Revolution. Today's episode is a cross-post from the 80,000 Hours Podcast, hosted by Rob Wiblin and featuring a conversation with Ajeya Cotra, who previously led technical AI safety grant-making at Open Philanthropy, now Coefficient Giving, and is now working on risk assessment at METR. For years, AI insiders have recognized Ajeya as one of the most rigorous thinkers about the AI future, and she recently validated that judgment by coming in number three out of more than 400 participants in the AI Digest 2025 AI Forecasting Survey. For comparison, I was proud to land in the top 5% at number 23. In this conversation, Ajeya takes Rob through her expectations for the next few years as AI crosses critical thresholds, recursive self-improvement intensifies, and we enter what she describes as crunch time, a potentially short window in which AI is powerful enough to dramatically accelerate AI R&D, but not yet totally beyond human control. As a preview, I'll warn you that even the accelerationists may suffer some future shock from this conversation, because Ajeya thinks it's actually quite plausible that we find no insurmountable bottlenecks to widespread and compounding automation, and that if so, the world of 2050 could look as different from our perspective today as our world would look to hunter-gatherers of 10,000 years ago. So, what's the plan to make sure such a mind-boggling transformation goes well for humans? Ajeya advocates for transparency measures and early warning systems designed to make sure that superintelligence doesn't happen in secret. But aside from that, she reports that all frontier developers are gradually converging on a strategy of using each generation of AIs to attempt to align, understand, and control their own successors.
As regular listeners know, I signed the Future of Life Institute's October 2025 petition calling for a ban on superintelligence, not because I think this approach is forever destined to fail, but simply because I worry that we don't yet understand AIs well enough to bet on a good outcome from such a recursive self-improvement-powered intelligence explosion. And yet, at the same time, I do agree with Ajeya's advice. Almost regardless of the kind of work you're doing, you should be adopting AI as aggressively as possible, both to maintain an accurate understanding of the situation, and increasingly because you won't be able to keep up without it. It is a mad, mad world that we'll soon be living in, but I would go as far as to say that even Pause AI campaigners ought to be using AI extensively. If all that weren't enough for you to process, you should also know that the situation has recently accelerated yet again. On March 5, just about two weeks after this episode was originally published, Ajeya posted an article on her Substack, Planned Obsolescence, called I Underestimated AI Capabilities Again, in which she reports that the predictions that she made in January 2026, which were the backdrop for this conversation, were already starting to be met in just the first couple months of this year. And more recently, we've of course learned of Anthropic's new Mythos model, which, despite the fact that Anthropic has never emphasized benchmark scores as much as other model developers, shows major gains on many benchmarks, and has reportedly found zero-day exploits in every major operating system and every major web browser, among many other major software projects. The bottom line is that crunch time is arguably here now, so if you've been watching and waiting for AI to get serious before deciding what to do about it, I would suggest getting off the sidelines sooner rather than later.
If you need help figuring out what to do, you might consider applying for free one-on-one career advising from 80,000 Hours. As always, I want to thank Rob and the 80,000 Hours team for allowing me to cross-post this episode. They have been delivering incredible alpha for years, and the nearer the singularity becomes, the more prescient they look. With that, I hope you enjoy this essential conversation about AI timelines and crunch time strategy with Ajeya Cotra and host Rob Wiblin from the 80,000 Hours podcast. Today, I'm speaking with Ajeya Cotra. Ajeya is a senior advisor at Open Philanthropy, where in 2024, she led their technical AI safety grant making. More generally, she's been doing AI-related research and strategy since 2018, and has become very influential in AI circles for her work on timelines, capability evaluations, and threat modeling. Thanks so much for coming back on the show, Ajeya. Thank you so much for having me. So, doing this interview gave me a chance to go back and listen to the interview that we did, that we recorded, I guess, two and a half years ago. And I have to say, you were very on the ball. There were a lot of issues that came up in that conversation that you were bringing to people's attention that, I think, in the subsequent two and a half years seem like a much, much bigger deal now. You talked about METR evaluating autonomous capabilities, a line of research that's gone on to become super influential, very widely read, I think, in policy circles. You talked about using probes to monitor and shut down dangerous conversations, something that's a pretty standard practice, and maybe one of the potentially most useful outputs from mechanistic interpretability. You talked about the importance of using chain of thought and scratch pads to monitor what AIs are doing and why. It's still probably the dominant technique.
You talked about the growing situational awareness of AI models and the resulting possibility of deceptive alignment, something that's now a completely mainstream topic. You talked about how when you train models to not engage in bad behavior, they don't necessarily just learn to become honest. They also learn to just hide their misbehavior better. Something that is, I guess, research has kind of borne out, does really happen, and is a big concern. You talked about how you expected models to get schemier as they get smarter, especially once we inserted reinforcement learning back into the mix, something that's definitely happened. And you talked a bunch about sycophancy, how you thought models might end up just flattering people rather than giving accurate information because that's kind of something that we enjoy. So I feel like, I mean, you don't just come up with all of these ideas by chance or anything like that; I think you were ahead of the curve, and maybe we'll get some ahead of the curve ideas in this interview as well. Hopefully. Thank you. So you think that a key driver of disagreements about everything to do with AI is people's different views on how likely AGI is to speed up science and technology and, I guess, physical infrastructure and manufacturing. Why is that? Yeah. So something I've been noticing as the concept of AGI has become more and more mainstream is that it's also become more and more watered down. So last year, I was on a panel about the future of AI at Dealbook in New York, and it was me and one or two other folks who kind of think about things from a safety perspective and then a number of venture capitalists and like technologists. And the moderator asked at the very beginning of the panel, how likely, like whether we thought it was more likely than not that by 2030, we would get AGI defined as AIs that can do everything humans can do. And like seven or eight hands went up, not including mine, because my timelines are somewhat longer than that.
But then he asked a follow-up question a couple of questions later about whether you thought, whether we thought that AI would create more jobs or destroy more jobs over the following 10 years. So 2030 was five years away, and seven out of 10 people thought that we would have AGI by 2030. But then it turned out that eight out of 10 people, not including me, thought that AI would create more jobs than it destroyed over the next 10 years. And I was a little confused. I was like, why is it that you think we will have AI that can do absolutely everything that the best human experts can do in five years, but will actually end up creating more jobs than it destroys in the following 10 years? And when I poked some people later in the panel about that seeming tension, I think they really quickly backed off and they said, oh, what does AGI really mean? The moderator had defined it as this very extreme thing. But they were like, we kind of already have AGI, people keep moving the goalposts, we keep making cool new products, and people aren't accepting that it's AGI and they aspire to something higher. And I thought that was funny because the old school singularitarian, futurist definition of AGI is this very extreme thing. But I think VCs have an instinct to call something AGI that is like, GPT-5 is AGI or something just much milder. And so I think this creates a situation where people feel like they've gotten a lot of evidence that AGI isn't a very big deal and doesn't change much because we already have AGI or we're going to have it next year or we got it two years ago and look around us, nothing much is changing. And so I think there's this expectation where whether or not we get AGI in the next few years, a lot of people are starting to not really care about that question. They still expect the next 25 years and the next 50 years to play out kind of like the last 25 years or the last 50 years, where there's a lot of technological change between 2000 and 2025.
But it's like a moderate amount of change. And they kind of expect that in 2050, there will be a similar amount of change as there was between 2000 and 2025, even if they think that we're going to get AGI in 2030. They think AGI is just like what's going to drive that sort of continued mild improvement. Whereas I think that there's a pretty good chance that by 2050, the world will look as different from today as today does from like the hunter-gatherer era. It's like 10,000 years of progress rather than 25 years of progress driven by AI automating all intellectual activity. Yeah, I guess you've hinted at the fact that there is an enormously wide range of views on this, but can you give us a sense of just how large the spectrum is and what the picture looks like on either end? Yeah, so I would say on the standard mainstream view, if you ask a normal person on the street like what 2050 will look like, or if you ask a standard mainstream economist, I think they would think, well, the population is a little bit bigger. We have somewhat better technologies. Maybe they have a few pet technologies that they're most interested in. And maybe we have this one or that one, slightly better medicine, people live slightly longer. And yeah, it's an amount of change that's like extremely manageable. I think on the far extreme from there, on the other side, is like the view described in If Anyone Builds It, Everyone Dies. Where in that worldview, at some point, probably pretty unpredictably, we sort of crack the code to extreme superintelligence. We invent a technology that rather suddenly goes from being like, you know, GPT-5 and GPT-6 and so on, to being so much smarter than us that we're like, cats or mice or ants compared to this thing's intelligence. And then that thing can like really immediately have like really extreme impacts on the physical world. The classical sort of canonical example here being inventing nanotechnology.
So like the ability to like precisely manufacture things that are like really, really tiny and can replicate themselves really, really quickly and can do like all sorts of things, you know, and can like move, you know, inventing like space probes close to the speed of light and things like that. I think there's a whole spectrum in between where like people think that we are going to get to a world where we have technologies approaching their physical limits. We have like spaceships approaching the speed of light and we have sort of self-replicating entities that replicate as quickly as bacteria, while like also doing useful things for us. But we're going to have to go through like intermediate stages before getting there. But I think like something that unites all of the people who are sort of AI futurists and like concerned about AI X-risk is that they think in the coming decades we're likely to get this level of like extreme technological progress driven by AI. How strong is the correlation between how much someone expects AI or AGI to speed up science research in particular, and I guess like physical industry as well, and how likely they think it is to go poorly or how nervous they are about the whole prospect? I think it's a very strong correlation. Like I've found often that people who like reasonable people who are AI accelerationists tend to think that the default course of how AI is developed and deployed in the world is very, very, very slow and gradual. And they think that we should like cut some red tape to make it go at a little bit more of a reasonable pace. And people who are worried about X-risk think that the default course of AI is this extremely explosive thing where it like overturns society on all dimensions at once in maybe a year or maybe five years or maybe six months or maybe a week. And they're saying, oh, we should slow it down to take 10 years maybe. 
And meanwhile, the sort of accelerationists think that by default, diffusing and capturing the benefits of AI will take like 50 years or 100 years and they want to speed it up to take 35 years. It's quite interesting. I guess, yeah, people who radically differ in their policy prescriptions might actually be aiming for the same level of speed. Maybe they both want this to take 10 years or 20 years. That's what both of them want. But their baselines are so different that they're pushing in completely opposite directions. What's your kind of modal expectation? What do you think is the most likely impact for it to have? I think that probably in the early 2030s, we are going to see what Ryan Greenblatt calls top human expert dominating AI, which is an AI system that can do tasks that you can do remotely from a computer better than any human expert. So it's better at remote virology tasks than the best virologists, better at remote, you know, software engineering tasks than the best software engineers, and so on for all the different domains. And by that time, I feel like probably the world has already accelerated and changed and sort of narrower and weaker AI systems have already like penetrated in a bunch of places, and like we're looking at a pretty different world. But at that point, I think things can go much, much faster, because I think top human expert dominating AIs in the cognitive domain could probably use human physical labor to build like robotic physical actuators for themselves. That would be like one of the things that, whether the AIs have sort of already taken over and are acting on their own or whether like humans are still in control of the AIs, I think that would be a goal they would have of like automating the physical world as well. And I think I have pretty wide uncertainty on like exactly how hard that'll be.
But whenever I check in on the field of robotics, I actually feel like robotics is like progressing pretty quickly. And it's taking off for the same reasons that sort of cognitive AI is taking off. It's like large models, lots of data, imitation, large scale is helping robotics a lot. So I imagine that like you can pretty quickly, maybe within a year, maybe within a couple years, get to the point where these like superhuman AIs are controlling a bunch of physical actuators that allow them to sort of close the loop of like making more of themselves, like doing all the work required to like run the factories that like print out the chips that then run the AIs, and like doing all the repair work on that, and like gathering the raw materials for that. So you're expecting in the 2030s, it won't just be that these AI models are capable of automating, you know, computer based R&D, but they'll also be able to lead on the project of building fabricators that produce the chips that they run on. And so that's like another kind of positive feedback loop. Yeah. So I really recommend the post Three Types of Intelligence Explosion by Tom Davidson on Forethought, where he makes the point that like, you know, we talk a lot about the sort of promise and the danger of AIs automating AI R&D and like, you know, automating the process of making better AIs. But that's only one feedback loop that is required to fully close the loop of making more AIs, because we're talking about software that makes the like, you know, transformer architecture slightly more efficient, or like gathers better data to train the AIs on. But the AIs are also running on chips, which are printed in like these, you know, chip factories at NVIDIA, and those factories have machines that are built by other machines that are built by other machines and, you know, ultimately go down to raw materials.
And I think that something we don't talk about very much because it'll happen afterward is how hard it would be for the AIs to automate that entire stack, the full stack and not just the software stack. Hey, we'll continue our interview in a moment after a word from our sponsors. AI is rapidly moving from assistance to agents, and it's causing a sea change. AI isn't just helping anymore, it's taking action. And here's the reality, you don't get outcomes from agentic AI unless you trust it to operate at scale. That's why AvePoint is building a control layer for AI. This foundational layer helps you govern what agents can access, secure how they operate, make activity auditable, and recover when something goes wrong, all as one connected system. See every agent, app and workflow and what they touch, govern with policy and guardrails that work at machine speed, and recover quickly so a mistake doesn't become an outage. That control layer creates trust, and trust is what unlocks the right outcomes, letting you automate more work, move faster and deploy agents with confidence instead of hesitation. If you're scaling agents and want those outcomes by design, learn more about AvePoint at avpt.co.tcr. That's avpt.co.tcr. Support for the show comes from VCX, the public ticker for private tech. For generations, American companies have moved the world forward through their ingenuity and determination. And for generations, everyday Americans could be a part of that journey through perhaps the greatest innovation of all, the US stock market. It didn't matter whether you were a factory worker in Detroit or a farmer in Omaha, anyone could own a piece of the great American companies. But now that's changed. Today, our most innovative companies are staying private rather than going public. The result is that everyday Americans are excluded from investing and getting left further behind while a select few reap all of the benefits. Until now.
Introducing VCX, the public ticker for private tech. VCX by Fundrise gives everyone the opportunity to invest in the next generation of innovation, including the companies leading the AI revolution, space exploration, defense tech, and more. Visit getvcx.com for more info. That's getvcx.com. Carefully consider the investment material before investing, including objectives, risks, charges, and expenses. This and other information can be found in the fund's prospectus at getvcx.com. This is a paid sponsorship. So, I guess the range of expectations that exists among sensible, thoughtful people who've engaged with this on how much, like at peak, how much is AGI going to speed up economic growth? It ranges from people who say it will speed up economic growth by 0.3 percentage points. It'll be a 15% increase or something on current rates of economic growth. I'd be very happy if it was that good. Other people would say that at peak, the economy will be growing at 1,000% a year, or higher than that, thousands of percent a year. So, it's like a 100 or 1,000 or 10,000 fold disagreement basically on the likely impact that this is going to have. It's an almost unfathomable degree of disagreement among people, and it's not as if they've thought about this independently and haven't had a chance to talk. They've spoken about this, they've shared their reasons, and they don't change their minds, and they disagree by a thousandfold on the impact. You've made it part of your mission in life over the last couple of years to have really sincere, intellectually engaged, curious conversations with people across the full spectrum. Why do you think it is that this disagreement is able to be maintained? Yeah, I feel like at the end of the day, the different parties tend to lean on two different, pretty simple priors or outside views.
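To make the scale of that disagreement concrete, here is a rough compounding sketch using the two endpoint figures quoted above (roughly +0.3 percentage points on a ~2% baseline versus 1,000% per year at peak). The numbers are taken from the conversation and are purely illustrative, not a forecast:

```python
# Compounding the two endpoints of the growth debate over a decade.
# Rates are the conversation's quoted endpoints, not forecasts.
years = 10
low_rate = 0.023    # ~2% baseline plus 0.3 percentage points
high_rate = 10.0    # 1,000% per year at peak

low_size = (1 + low_rate) ** years    # economy ~1.26x today's size
high_size = (1 + high_rate) ** years  # economy ~2.6e10x today's size

print(f"low end:  {low_size:.2f}x today's economy")
print(f"high end: {high_size:.2e}x today's economy")
```

Compounded over ten years, the low end barely registers while the high end implies a multi-billion-fold expansion, which is part of why the two camps' policy instincts diverge so sharply.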
I would say that the party, the group that expects things to be a lot slower, tends to lean on: well, for the last 100 or 150 years in frontier economies, we've seen 2% growth. Think of the technological change that has occurred over the last 100 or 150 years. We went from having very little, when electricity was just an idea, to everywhere being electrified. We had the washing machine and the television, the radio, all these things happened, computers happened in this period of time. None of these show up as an uptick in economic growth. There's this stylized fact that mainstream economists really like to cite, which is that new technology is the engine that sustains 2% growth. In the absence of that new technology, growth would have slowed. This is how new technologies always are. People think that they're going to lead to a productivity boom, but you never see them in the statistics. You didn't see the radio, you didn't see the television, you didn't see the computer, you didn't see the internet. You're not going to see AI. AI might be really cool. It might be the next thing that lets us keep chugging along. That's one perspective. It's an outside view they keep returning to. There's also maybe a somewhat more generalized version, which is that things are just always hard and slow, just way harder and slower than you think. It's like, what was it, Murphy's Law? Anything that can go wrong will go wrong. I think this is our experience in our personal lives, that it's awfully hard to achieve things at work. To other people it might seem straightforward and they're like, why haven't you finished this yet? And you're like, well, I could give you a very long list. Or Hofstadter's Law: it always takes longer than you think, even when you take Hofstadter's Law into account. The programmer's credo, this is my favorite one, is we do these things not because they are easy, but because we thought they would be easy.
This whole cloud of, like, it's naivete to think that things can go crazy fast. If you write down a story that seems perfect and unassailable for how things will be super easy and fast, there are all sorts of bottlenecks and all sorts of drag factors you inevitably failed to account for in that story. That's kind of that perspective. Then I think the alternative perspective leans a lot on much longer term economic history. If you try to assign reasonable GDP measures to the last 10,000 years of human history, you see acceleration. The growth rate was not always 2% per year at the frontier. 2% per year is actually blisteringly fast compared to what it was in 3000 BC, which was maybe 0.1% per year. The growth rate has already multiplied many fold, maybe an order of magnitude, maybe two. I think that people in the slower camp tend to feel like the exercise of doing long run historical data is just too fraught to rely upon. People in both camps do agree that the industrial revolution happened, and the industrial revolution accelerated growth rates a lot. We went from having growth rates that were well below 1% to having 2% a year growth rates. I think that people in the faster camp tend to lean on the long run and on models that say that the reason that we had accelerating growth in the long run was a feedback loop where more people can try out more ideas and discover more innovations, which then leads to food production being more efficient, which then leads to a larger supportable population. Then you can rinse and repeat, and you get super exponential population growth. That perspective says that AIs, if you can slot in AIs to replace not just the cognitive, but the cognitive and the physical, the entire package, and close the full loop of AIs doing everything needed to make more AIs, or AIs and robots doing everything needed to make more AIs and robots, then there's no reason to think that 2% is some physical law of the universe.
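The "more people, more ideas, more growth" feedback loop just described can be sketched as a toy model in which the growth rate itself scales with population size, in the spirit of long-run growth models; the `innovation_coeff` parameter and all numbers here are illustrative assumptions, not fitted to historical data:

```python
# Toy super-exponential growth: the growth *rate* rises with population,
# capturing "more people -> more ideas -> larger supportable population".
# innovation_coeff is an illustrative parameter, not an empirical estimate.
def simulate(pop, innovation_coeff, steps):
    rates = []
    for _ in range(steps):
        rate = innovation_coeff * pop  # larger population, faster innovation
        pop *= 1 + rate
        rates.append(rate)
    return rates

rates = simulate(pop=1.0, innovation_coeff=0.02, steps=25)

# Unlike a fixed 2%/year exponential, the growth rate accelerates each step.
assert all(later > earlier for earlier, later in zip(rates, rates[1:]))
```

Replacing `innovation_coeff * pop` with a constant recovers ordinary exponential growth, which is exactly the crux of the disagreement: whether AIs closing the full loop moves the economy out of the constant-rate regime and back into the accelerating one.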
They can grow as fast as their physical constraints allow them to grow, which are not necessarily the same as the constraints that keep human-driven growth at 2%. That's the justification they provide for their perspective, in broad strokes. Why is it that even after communicating this at great length to one another, they don't converge on uncertainty, or on saying it'll be something in the middle because there are competing factors, but just continue to be reasonably confident about quite different narratives about how things will go? Yeah. I'm honestly not sure. I guess I'm partial to the things-will-be-crazier side, so I'm not sure I'll be able to give a perfectly balanced account. But one thing I've noticed about people who think it'll be slower is that their worldview has a built-in error theory of people who think things will go faster. So the worldview is not just that things will keep ticking along, but that everyone always thinks there will be some big new revolution that speeds things up, people have always expected things to speed up, and they've always been wrong. So there's that dynamic, which, from their point of view, I think is totally reasonable. Even if there isn't some super knockdown argument in the terms of your interlocutor, where you can point to a mistake that they'll accept, or even if you look at the story and think it's plausible, you still have this strong prior that someone could have made the same argument in the past. Someone could have made the same argument about television, someone could have made the same argument about computers, and none of these played out. So I think that's a big factor. I also think these are complicated ideas, and there hasn't been that much dialogue.
And I think there could be more, and in particular more dialogue that tries to ground things in near-term observations. But yeah, I think that's a big part of it. They have an error theory built in, so the object-level conversation about how the AI could make the robots, and how the robots could bootstrap into more robots and so on, doesn't feel very legitimate or interesting to them. Or they have a story where that type of thinking always leads to a bias towards expecting things to go faster than they actually will, because it's hard for that kind of thinking to account for all the drag factors and all the bottlenecks. Whereas on the other side, people who think things will go faster feel like everyone is always blanket-assuming there are going to be bottlenecks. And when specific bottlenecks are brought up and you look into them, they might slow things down from some absolute peak of a thousand percent growth, but they're not reasons to think that 2% is where the ceiling is, or even that 10% is where the ceiling is. So they have their own error theory of the bottleneck objection. So it's incredibly decision-relevant to figure out who is right here. I think almost all of the parties to this conversation, if they completely changed their view, would probably change what they're working on: the people who thought it was going to be a thousand percent, if they decided it was going to be 0.3%, would see that as a decisive consideration against everything they were doing previously, and vice versa. If people came to think there would be a thousand percent speedup, they'd probably be a whole lot more nervous and interested in different kinds of projects. So how can we potentially get more of a heads up ahead of time about which way things are going to go?
I guess it seems like sharing theoretical arguments hasn't been persuasive to people. Is there any kind of empirics we could collect as early as possible? One thing that I think will not address all of this, but is a step in the right direction, is really characterizing how and why and whether AI is speeding up software and AI R&D. METR came out with an uplift RCT, which I think was the first of its kind, or at least the largest and highest quality, where they split software developers into two groups: one group was allowed to use AI, the other was disallowed from using AI, and they studied how quickly those developers solved issues, tasks on their to-do lists. It actually turned out that in this case AI slowed down their performance, which I thought was interesting. I don't expect that to remain true, but I'm glad we're starting to collect this data now, and I'm glad we're starting to cross-check between benchmark-style evaluations, where AIs are given a bunch of tasks and scored in an automated way, and evidence we can get about actual in-context, real-world speedups. So I really want to get a lot more evidence about that of all kinds, like big uplift RCTs. It would be great if companies internally conducted RCTs on their own rollouts of internal products, to see whether teams that get the latest AI product earlier are more productive than teams that don't. Even self-report, which I think has a lot of limitations, is still something we should be gathering. So I guess my high-level formula would be: look at the places where adoption has penetrated the most, and start to measure speedup in actual output variables. I think it would be really cool if there was a solar panel manufacturing plant that had really adopted AI, and we started to see how much more quickly they could manufacture solar panels, or how much better they could make them.
Yeah, is it possible to do this at the chip manufacturing level? I guess maybe that's the most difficult manufacturing there is, more or less. So we might think you'd get more of an early heads up if you monitored something more straightforward like solar panels, but we'd really like to be monitoring across all kinds of different manufacturing. Yeah. How much difference is any of this making? I think the most important thing, or the thing I ultimately care about, is the AI stack: chip design, chip manufacturing, manufacturing the equipment that manufactures chips, and then of course the software piece too. The software piece is the earliest piece, but I think we should be monitoring degree of AI adoption, self-reported AI acceleration, RCTs, anything we can get our hands on for the entire stack, because the moment when the AI futurists think things are likely to be going much, much faster coincides with when AI has fully automated the process of making more AI. So that's really something to watch out for. And then, on a separate track, you also want to be looking at the earliest power users no matter where they are, just because you can get insight that transfers to these domains. Is there anything else we can do? I don't know. I'm really curious about this. Am I right that last year you put out a request for proposals? You were at Open Phil, looking to fund people who had ideas for how we'd resolve this question. Yeah. So I put out a pair of requests for proposals in late 2023. One of them was on building difficult, realistic benchmarks for AI agents. At the time very few people were working with AI agents, and only a couple of agentic benchmarks had come out, including METR's benchmark that I discussed on the show last time. And so I was really excited about it.
It felt like it was a moment to move on from giving LLMs multiple-choice tests to giving them real tasks, like book me a flight, or make this piece of software work: write tests, run the tests, iterate until the thing actually works. That was a very new idea at the time, but the time was right for it, and there were a lot of academic researchers who were excited about moving into the space. So we got a lot of applications for that arm of our request for proposals, and we funded a bunch of cool benchmarks, including Cybench, which is a cyber offense benchmark that's used in a lot of standard evaluations now. But then we also had this other arm, which was basically types of evidence other than benchmarks: surveys, RCTs, all the things we talked about. We got much less interest in that, and I think it just reflects that it's harder to think of good ways to measure things outside of benchmarks, even though everyone agrees benchmarks have major weaknesses and consistently overestimate real-world performance, because benchmarks are clean and contained and the real world is messy and open-ended. But one thing that I'm excited about that came out of the second RFP is that the Forecasting Research Institute is running this panel called LEAP, the Longitudinal Experts on AI Panel, where they take 100 or 200 AI experts, economists, and superforecasters and have them answer a bunch of granular questions about where AI is going to be in the next six months, the next year, the next five years. Benchmark scores, but also things like: will companies report that they're slowing down hiring because of AI, or will an AI be able to plan an event in the real world, these kinds of things.
So I'm very excited about that, and I honestly think that having people make subjective predictions, explain how those predictions are connected to their longer-run worldviews, and then check over time who's right, might be the most flexible tool we have. So I'm very excited to see where LEAP goes. But it is challenging to get indicators that are clearly early warnings, so that we can actually do something about it if the people who are more concerned are right, but that are also clearly valid and not easy to dismiss on the other side as just not realistic enough to matter.
So as part of this, you've been thinking about, I guess, one way this could really go wrong: the companies developing cutting-edge AI begin to see internally how much it's helping them, and that perhaps it's speeding them up enormously, but they may decide not to share that information with the rest of the world, and they may decide not to release those products. If there's one company well ahead of the others... In AI 2027, it was depicted that the company ahead in the AI race was so far ahead of its competitors that it could afford to just keep its best stuff internal and only release less good products to the rest of the world. It could afford it in the sense that it didn't need to make money by selling the product? Its competitors were far enough behind that they couldn't undercut it or compete with it by releasing a better product. In the story, the company in the lead, OpenBrain, is basically just releasing products that are slightly better than the state of the art of its competitors. Because of their software lead, they can choose to always have their product be somewhat better: they can release whatever level of their own internal models is best for them to show the external world.
But I guess it would be unfortunate if there are people who do know this, but the broader world doesn't get a heads up, so we could have known six months or a year earlier which direction things were going, but it was kept secret. I mean, the leading AI company might prefer to keep it secret, but the rest of us would probably prefer that the government has some idea what's going on. So you've been thinking about what sort of transparency requirements could be put in place that would require the companies to release information that would give the rest of us clues as to where things are going. What sort of transparency requirements could those be? Yeah. So I think there's a whole spectrum of evidence about AI capabilities, where on the one hand the easiest to test, but the least informative, is benchmark results. And companies do release benchmark results when they release models right now. So when Claude Opus 4 was released, it had a model card that says it has this score on this hacking benchmark, this score on the software engineering benchmark, and so on, as part of a report about whether it's dangerous; GPT-5 had the same thing. I think it's great that they do that. But in my ideal world, they would release their highest internal benchmark scores at some calendar cadence. So every three months, they would say: we've achieved this score on this hacking benchmark, this score on the software engineering benchmark, this score on an autonomy benchmark. And that's because, as you said, danger could manifest from purely internal deployment. If they have an AI agent that's sufficiently good at AI R&D, they could use it to go much faster internally, and then other capabilities, and therefore other risks, might come online much faster than people were previously expecting.
So it's not ideal to have the report card for the model come out only when you release it to the public, unless there's some sort of guarantee that you're not sitting on a product that's substantially more powerful than the public product. So maybe it's fine to release your model card and system card along with the product, if you also separately have a guarantee that there won't be too much of a gap between the internal and the external. That's on the end of things that are currently discussed; it's kind of how I would tweak information that's currently reported to be somewhat more helpful for this concern. But then there's a bunch of other stuff that's not currently reported that I think would be really great to know. Stuff like: how much, and how, are they using AI systems internally? So companies will sometimes report, kind of to brag, the percentage of lines of code written by their AI systems. Various CEOs have said things like: internally, 90% of our lines of code are written by AIs. I think it would be great to have systematic reporting of those kinds of metrics. But those aren't the ideal metric I'd be interested in. One thing I'm interested in is: what fraction of pull requests to your internal codebase were mostly written by AI and mostly reviewed by AI, so that humans are, for the most part, not involved on either side of the equation? I'd be very interested in watching that number climb, because I think it's an indication both of AI capabilities and of how much deference they're giving to AIs. And eventually, if things are going to go crazy fast, the AIs have to be doing most things, including most management and approval and review, because if humans have to do that stuff then things don't go so fast.
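The pull-request metric described above can be made concrete with a short sketch. This is hypothetical: the field names and the 50% threshold are my own assumptions for illustration, not anything an AI company actually reports.

```python
from dataclasses import dataclass

# A hypothetical sketch of the metric just described: the fraction of pull
# requests where AI did most of both the writing and the reviewing. Field
# names and the 50% threshold are illustrative assumptions, not a standard.

@dataclass
class PullRequest:
    ai_authored_fraction: float  # share of the diff written by AI
    ai_reviewed: bool            # was the approving review done by an AI agent?

def end_to_end_ai_fraction(prs: list[PullRequest], threshold: float = 0.5) -> float:
    """Fraction of PRs mostly written by AI *and* reviewed by AI."""
    if not prs:
        return 0.0
    hits = sum(
        1 for pr in prs if pr.ai_authored_fraction > threshold and pr.ai_reviewed
    )
    return hits / len(prs)

prs = [
    PullRequest(0.9, True),   # AI wrote it, AI approved it
    PullRequest(0.9, False),  # AI wrote it, but a human reviewed it
    PullRequest(0.2, True),   # a human wrote it, AI reviewed it
    PullRequest(0.7, True),   # AI on both sides
]
assert end_to_end_ai_fraction(prs) == 0.5
```

The point of requiring both conditions is that either one alone (lines of code written, or reviews performed) still leaves a human in the loop, whereas this joint number only climbs as humans step out of both roles.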
So I really want to track how much higher-level decision-making authority is being given to the AIs in practice inside the companies. I think there are probably a bunch of other things we could ask, basically as a survey: how much do you use AIs for this type of thing, for that type of thing; how much speedup do you subjectively think you get; and if you're running any internal RCTs, I would of course love to know the results. What about just requirements that, inasmuch as they're training future generations of AI models, they have to reveal to at least some people in the government how those models are performing on normal capability evals? So they can see the line going up even if the models aren't being released as products, for whatever reason. And if the benchmarks start curving upwards, far above previous expectations, that could lead them to sound the alarm. Yeah, I think that is a good thing to do. But I sort of don't think benchmarks alone will actually lead anyone to sound the alarm, because the thing with benchmarks is that they saturate. Yeah, they always have that S-curve shape. They always have the S-curve shape. And the benchmarks we have right now are harder than the previous generation of benchmarks. But it's still definitely far from the case that I feel confident that if your AI gets a 100% score on all these benchmarks, then it's a threat to the world and could take over the world. I still think the benchmarks we have right now are well below that. So what's probably going to happen is that these benchmarks are going to get saturated, then there's going to be a next generation of benchmarks people make, and then those benchmarks are going to tick up and then get saturated. So I think we need some kind of real-world measure before we can start sounding the alarm.
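The saturation dynamic just described can be illustrated with a toy model of my own construction: treat each benchmark as a logistic S-curve over some underlying capability level. Capability keeps climbing steadily, but each benchmark's score flattens near 100% and stops carrying signal, so you need the next, harder benchmark to see continued movement.

```python
import math

# Toy illustration (my construction) of benchmark saturation: each benchmark
# is modeled as a logistic (S-curve) function of an underlying capability
# level, with "difficulty" setting where the curve's midpoint sits.

def benchmark_score(capability: float, difficulty: float) -> float:
    return 1 / (1 + math.exp(-(capability - difficulty)))

capabilities = range(13)
gen1 = [benchmark_score(c, difficulty=3) for c in capabilities]  # easier benchmark
gen2 = [benchmark_score(c, difficulty=9) for c in capabilities]  # harder successor

assert gen1[8] > 0.99             # first benchmark essentially solved early
assert gen1[12] - gen1[8] < 0.01  # ...and now nearly flat: no signal left
assert gen2[12] - gen2[8] > 0.2   # the harder successor still shows movement
```

This is why a saturated score tells you little about whether the underlying capability is merely high or dangerously high: once the curve flattens, very different capability levels produce nearly identical scores.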
And then the ultimate real-world measure is actually just observed productivity, right? If they're seeing internally that they're discovering insights faster than they were before, then that's a very late but also very clear signal. And that's the point at which they should definitely sound the alarm, and we should know what's happening. Yeah, how has this idea been received by the companies? I mean, on the one hand, it seems like transparency requirements are the regulatory instrument the companies have objected to the least; it's the one they've been most willing to tolerate. On the other hand, the whole message of this is: we don't trust you to share information with the rest of the world, and we think you might screw us over, basically, by rushing ahead and deliberately concealing that. I could imagine that being a little bit offensive to them, or at least, if that is their plan, then they probably want to find some excuse for not having this kind of oversight. Yeah, I think the response tends to differ based on the actual information that's being asked for. Benchmark scores they already release. Like I said, they release them at the point of releasing a product, which I think is fine for now, but I would like to move to a regime where they release benchmark scores at some fixed cadence, even if they don't have a product release. Benchmark scores are not considered sensitive information. But this other stuff that I think is a lot more informative on the margin is much more fraught, right? They don't necessarily want to share with the world the rate at which they're gaining algorithmic insights, because you want to maintain some mystery about that for competitive reasons.
It's risky for you if it's a little too fast, because then competitors will start paying more attention to you, trying to copy you, and trying to find out what's going on. It's also risky for you if it's too slow, because then that's kind of embarrassing. Investors lose heart. Yeah, investors lose heart. And another thing I didn't mention earlier is that I would really like them to be reporting their most concerning misalignment-related safety incidents. Has it ever been the case that in real-life use within the company, the model lied about something important and covered up the logs? I really want to know that. But of course, reporting that is clearly very embarrassing to companies. So one thing that might help here is that there are a number of companies now, so perhaps they could each report their individual data to some sort of third-party aggregator that then reports out an anonymized, overall industry aggregate score. But I don't think that solves all the issues, because there are few enough of them that people would be able to guess. So I think there are a lot of competitive challenges and IP sensitivity challenges and just PR challenges to overcome here with some of the more penetrating internal information. But I think it's important enough to the public interest that we should try to find a way to navigate that. Yeah. So it's not unusual for government agencies to be able to basically demand commercially sensitive information from companies for regulatory or governance purposes. I actually worked at one when I was in the Australian government: the Productivity Commission, which had extraordinary subpoena powers to demand almost any documents from any company in the country. A rarely used power, but it wasn't the only agency that had that capability. And what kinds of things would you ask them?
I mean, I never actually saw this power being used; I guess people were proud of the fact that we had that authority. Yeah. But I think you would usually use it for competition reasons: trying to tell whether companies are potentially colluding, or whether there's an insufficient degree of market competition and there would be a reason to intervene. And I would imagine almost certainly there are government agencies in the US that have a similar remit. Yeah. And so if they actually could keep that kind of information secret, then maybe the companies would be more willing to share it with people who specialized in reading and comprehending this data and figuring out what to do with it. Yeah. I think that could be a solution, but I'm a little skeptical. I think that releasing this information publicly is probably a lot better than releasing it just to a government body, basically because we're building the plane of AI safety research as we're flying it. It's not like there's a box-checking exercise that some government agency, often understaffed, especially with technical staff, could do. It's more that we want this information out there in the open, and then we want people to do some involved analyses of it. Our sense of what information we even want is probably going to be shifting over time, and it'll probably go better if there's a robust external scientific conversation about what indicators we want to see, what they would mean, and when we should trigger alarm. If that's all being routed through governments, with 10 people, or even 50 people, who have to deal with it, I think it would be very hard for them to interpret the evidence quickly enough and well enough, and be confident enough to sound the alarm, and then have people actually listen to them.
If I imagine sounding the alarm on something like the intelligence explosion, I picture it having to be a society-wide conversation, kind of like sounding the alarm about COVID. Or something I have in mind is when Joe Biden had that disastrous debate performance that led to weeks of conversation that ultimately led to him being removed from the ticket. It would have been very hard, I think, for a small, narrow group of people entrusted with the authority to make the same thing happen. Because I guess you want common knowledge, and you want lots of attention focused on the issue, not just some technocrats being aware. As well as the opportunity for a bunch of technical experts who may not be paying much attention now, because maybe they think this stuff is all science fiction, to jump in at that moment and offer their takes. I think it would be very powerful if someone like Arvind Narayanan, who's known for being very skeptical of these stories, actually looked at the data and changed his mind and said: oh yeah, this is happening now and it's dangerous. And it's very hard to get those kinds of common-knowledge dynamics if everything is just sent to governments. That said, of course, sending things to governments is better than not sending them anywhere, so I also think that's good. So inasmuch as plan A would be that we want them to share this information such that anyone in the public can find out, I guess they'll probably resist any legislation imposing this to some extent, and partially for legitimate reasons, since it probably is going to be frustrating for them. Inasmuch as people are trying to set priorities for what sort of asks to make and which sort of fights to pick, would this be very high on the list for you?
I laid out a whole spectrum of ideal information-sharing practices, and I don't think going all-or-nothing on that whole package is a top-priority fight to pick. But the algorithm of thinking really hard about what pieces of information we would want in order to know for ourselves whether the intelligence explosion was happening, and then getting the highest-value items on that list, or the biggest bang-for-buck items, to me feels very high priority. And I think that's the strategy that people working on AI safety related legislation have landed on. The RAISE Act in New York and SB 53 in California are both quite transparency-oriented, and both oriented around, for example, whistleblower protections, which are an important policy plank underlying transparency. Do you think that information about an emerging intelligence explosion might just leak out to the public anyway, because staff at the companies would feel uncomfortable with that proceeding in secret? I think that's very plausible. I still think that information that leaks in the form of rumors at San Francisco tech bro parties doesn't have the ability to impact policy and decision making all the way in D.C. or London or Brussels in the same way as information that is clearly unrefuted and very salient and official. So I think the AI safety scene in the Bay Area has benefited from having close social ties to people who work at AI companies, getting a sense of what might be coming around the corner. But that's not something you can use to really pull an alarm or advocate for very costly actions. So I think it isn't really enough. We need more. So let's imagine that, by whatever mechanism, society does get a heads up that we're starting to see the early stages of an intelligence explosion.
What would we do with that heads up? Yeah. So I think one extremely important factor is how good, at that point in time, AI systems are at everything besides AI R&D. So the alarm has sounded, and we've learned that AI has fully or almost fully automated R&D at the leading AI lab, perhaps all the AI labs. This is causing those labs to go way faster than they were going with mostly human-driven progress in the previous era. So at that point, whatever AI progress you thought was going to be made by default in the next 10 or 20 or 30 years might be made in a year or two, or even six months, depending on how much AI is speeding everything up. At this stage, AIs might not be that dangerous, but we might be about to move very quickly from the point in time where they're not so dangerous to the point in time where they have sort of godlike abilities. And I think that what we want to do as a society, if we gain confidence that we're at the starting point of this intelligence explosion, is to redirect as much of that AI labor as we can from further AI R&D to things that could help protect us from future generations of AIs, both in terms of AI takeover risk and in terms of a wide range of other problems that might be created for society by increasingly powerful AI. At that point, it's still not in the narrow, selfish interests of whichever company is in the lead to do that, because if they were to slow down unilaterally, then someone behind them could catch up. But hopefully, if the alarm has sounded and we have a clear picture that we have six or 12 or 18 months until radical superintelligence, this might be a window of opportunity to coordinate: to use AIs for protective activities instead of further AI capability acceleration.
So the challenge is that AI is becoming much smarter very quickly, and we feel very nervous about that. But the opportunity that creates is that we have a lot more labor, and much smarter potential researchers, than we did before. So why not turn that new resource towards solving this problem that, at the moment, we don't really know how to fix? I guess some people who are not too worried about AI look at society as a whole, or at history, and say: technology has enabled us to do all kinds of more destructive things, but we don't particularly feel that we're in a more precarious situation now, or at much greater personal risk, than in 1900 or 1800, because advances in destructive technology have been offset by advances in safety-increasing technology, and on balance things have probably gotten safer. And they'd say the same here: it's going to be a vertiginous time, but perhaps we could pull off the same trick in this crunch time period.
Yeah. A lot of people who are more concerned about AI risk are very dismissive of this plan. It sounds like a crazy plan, really flying by the seat of your pants, expecting the thing that's creating the problem to solve the problem. But in a sense, humanity has repeatedly used general-purpose technologies that created problems to solve those same problems. Take something as mundane as automobiles: cars created the opportunity for carjackings and drive-by shootings, and empowered bad actors in various ways. But of course the police and law enforcement have cars as well. When you imagine a future with some crazy new advanced technology and all the problems it creates, it can be hard to imagine, with the same level of detail and fidelity, all the responses to those problems that are also enabled by that technology. You could imagine someone worrying about the rise of fast vehicles and neglecting to think about all the ways the bad things they cause could be kept in check by people using vehicles for law enforcement. And similarly with computers: you can hack things with computers, but computers also enable a lot of automated monitoring for that kind of hack, automated vulnerability discovery, different kinds of law enforcement. You couldn't imagine a police force not using computers. So I do think the basic principle is sound: if you're worried about problems created by a technology, one of the first things on your mind should be how you can use that new technology to solve those problems. But you
know, I think this is an especially narrow window to get right. You're not imagining cars creating broad-based, rapid acceleration of all sorts of new technologies, with potentially just a 12-month or two-year or six-year window before everything goes totally crazy. So I do think it's important not to blow through that window: to monitor as we're approaching it, and to monitor how long we have. But fundamentally I'm fairly optimistic about trying to use early transformative AI systems, early systems that automate a lot of things, to automate the process of controlling and aligning and managing risks from the next generation of systems, which then automate the process of managing risks from the generation after, and so on.

It's interesting that you say this approach has often been dismissed, because it feels very in vogue now. I hear about this proposal every couple of days; someone presents it, or I read something about it in one guise or another. I guess one reason it might have felt unpopular in years past is that people were mostly focused on the issue of misaligned AI. They were concerned about an AI that has it in for you and would like to take over if it had the opportunity, and that's maybe the worst application of this out of all of them, because there you're asking the AI to align itself, and you don't know whether it's assisting you or trying to undermine you. You could try to make that work; people have suggested proposals for getting useful, honest work out of an AI that doesn't want to help you. But it's a lot easier to see how you can potentially solve problems other than alignment. If you assume the alignment part is something we feel we've got a good handle on, there's still a huge list of other problems that are being created during the
intelligence explosion, like the fact that if people get access to AI, it could invent other kinds of destructive technologies that we don't yet have good countermeasures for. In that case it's clear how the AI could just help you figure out what the countermeasures ought to be.

I don't think I agree with this. I do think misalignment, the prospect that these early transformative AIs are misaligned, is a huge obstacle to this plan that needs to be shored up and specifically addressed. But I don't think it necessarily bites harder for getting the AIs to do alignment research than for getting the AIs to do anything else helpful, because if they have it out for you, they don't necessarily want to help you shore up your civilization's defenses. So if you're imagining trying to get a misaligned AI to help you with biodefense, and it's misaligned such that it wants, for example, the option of threatening you with a bioweapon in the future, it would have a similar incentive to do a bad job at that as it would to do a bad job at alignment research. In general, there's one big concern: will the AIs we're trying to use at that point in time have motivations that give them incentives to undermine the work we're trying to get them to do? They would certainly have incentives to undermine alignment research if they were misaligned, but they would also have incentives to undermine efforts to make ourselves more rational and thoughtful, like AI for epistemics, because if we're more rational and thoughtful, maybe we'll realize they're probably misaligned, and that would be bad for them. They would also have an incentive to undermine our d/acc-style defensive efforts, because those would make it harder for them to take over.

That makes sense. I think the distinction I was drawing is: for people
who thought that the alignment problem was extremely hard to solve, and that we were way off track to solving it, the idea of getting the AI to solve the problem seemed at least somewhat self-contradictory: I wouldn't trust the AI at all, and anything it proposed, I would assume was sabotaging us. Whereas if you're on the side of thinking the alignment problem is actually the easier part, a relatively straightforward technical problem that we're on track to solve, but there's a laundry list of 10 other issues, then it's very obvious: we'll have this brilliant AGI, so why don't we just use it to solve all the other things? And also, I'm inclined to trust it and believe it.

Yeah. I do think that if you're not worried about alignment at this early stage, everything becomes easier; it becomes an even more attractive strategy and path. But the canonical using-AI-for-AI-safety or using-AI-for-defense plan does imagine that at the beginning we're not sure they're aligned. We may not be highly confident that they're extremely misaligned, fully power-seeking, and looking to take over at every opportunity, but we're not imagining that we know with confidence we can trust them. So figuring out how to create a setup where we use control techniques, alignment techniques, interpretability, and whatever other tools are at our disposal to get to the point where we feel good about relying on their outputs is a crucial step. Because either it bottlenecks our progress, since we're checking everything all the time and slowing things down, or it doesn't bottleneck our progress, but we hand the AIs the power to take over.

So which specific problems arising from the intelligence explosion are you envisaging wanting the AGI to help us out with?
Yeah. One obvious one is just AI alignment: how can we ensure that these AIs we're using to help us right now, the future generations of AIs they help us create, and the generations those AIs help create in turn, how can we ensure that that whole chain is motivated to help humans, is honest, is basically doing what we say, and is steerable? That's the foundation of everything else. But then there are also things that aren't really about AIs at all, that are about broad societal defenses. If we think the advent of extremely powerful AI will create a flood of newly discovered cyber vulnerabilities in a bunch of critical systems, like weapons systems and the power grid, can we preemptively use those same AIs that are good at finding those vulnerabilities to find and patch them before bad actors can use the AIs to find them? Another is biodefense: you had my colleague Andrew on your podcast recently, who talked about his ambitious plan to rapidly scale up detection of novel pathogens, rapidly scale up medical countermeasures when they're detected, and rapidly scale up the manufacturing of PPE, clean rooms, and things like that.
If we have AI systems that are good at that kind of research problem, and maybe by that point we also have robots, so a lot of that manufacturing can itself be automated and go much faster than if humans had to do it, that would be a big boon to biodefense. Then there are somewhat more speculative things, which you could think of as a kind of defense too, maybe a psychological defense: can we use AIs to make our collective decision making a lot smarter, wiser, and better? Can we make it so we're better at finding truth together, and better at coming to compromise policy solutions that leave lots of people happy?

How do you ensure that advances in AI don't lead to a war between the West and China, that kind of thing?

That too, but also more mundane things: over the last 10 to 15 years, social media has led to a degradation of political discourse. Could AI tools help you find the policy, from among the vast space of possible policies, that a large number of people actually like and can credibly put trust in?

I interviewed Will MacAskill and Tom Davidson from Forethought earlier in the year, and the organization has a long list of what they call grand challenges, all of which they suspect are probably amenable to this kind of AGI labor during crunch time. Other ones are things like ensuring that society doesn't end up prematurely locked into particular values in a way that cuts off our ability for further reflection and changing our minds; and the potential for AI or AGI, insofar as it's very steerable and follows instructions, to be used in power grabs by the people operating it.
I guess there's space governance: if we actually do start to be able to use resources in space, how would we share them, how would we divide them, in particular so that there isn't conflict ahead of time because people anticipate that once you start grabbing resources in space, you're on track to become overwhelmingly dominant. There's epistemic disruption, which you mentioned. There are new competitive pressures, concerns that you can end up in a sort of Malthusian situation if you have competition between many different AIs. And possibly some others missing here. We don't know which of these will loom large at the time; some might feel like they've already been addressed, or perhaps we were hallucinating issues that aren't so severe. But there are many different ways we could potentially apply it.

Yeah, I agree. All of those problems that Tom and Will highlighted seem like real problems to me. Maybe my approach, from our current vantage point, would be to lump a lot of that under AI for helping us think better and find solutions we're mutually happy with: AI for coordination, compromise, negotiation, truth seeking, that cluster of things. Take the question of space governance: how do we divide up the resources of space when there are existing factions with an existing distribution of power? No one really wants the destruction that comes from everybody racing as hard as possible to get there first, but there's a complicated space of negotiated options beyond that, and I think AIs could potentially help a lot with that sort of thing.

You said in your notes that you think this approach is basically what all of the frontier AI companies say their safety plan is. Is that right?

Yeah, I would think so.
If you look at public communications from at least OpenAI, Anthropic, and Google DeepMind, this jumps out to different degrees in the different cases. But in all of their stated safety plans, you see this element of: as AIs get better and better, they're going to incorporate the AIs themselves into their safety plans more and more. Some are more explicit than others about expecting a specific crunch time when AI is rapidly accelerating AI R&D, but everybody is picturing AIs playing a heavy role in the safety of future AIs.

What assumptions are necessary for this approach to make sense? Or what kinds of setups could just make it a bad plan?

Fundamentally, you need it to be the case that there exists a window of opportunity, before AIs are uncontrollably powerful or have created unacceptable levels of risk, where they're really capable and really change the game for AI safety research; and that it's a meaningful window of time, one you can notice as you're approaching it, and one that even by default, without a crazy slowdown, lasts at least six months or a year. If you think instead that once your AI hits some generality threshold, it becomes crazily superintelligent within a matter of days or weeks, this plan doesn't work, because you probably wouldn't even notice before it's too late. And then there can also be unlucky orderings of capabilities where this plan wouldn't work. You could have AIs that are really specifically good at AI R&D and really not good at anything else, not even AI safety research that's very similar to AI R&D. Maybe the only thing they're good at is making it so that future generations of AIs have better sample efficiency and can learn new things more efficiently.
Then you could have a period of six months or a year where you know this is happening and you have these AIs, but you're still hurtling towards a highly general superintelligence without being able to use these AIs for anything else, because they're just not good at anything else.

There's something a bit self-contradictory about that, because an AI that's extremely smart, but all it can do is improve the sample efficiency of the next model, is in a sense not very troubling in itself: because it doesn't have general capabilities, the model isn't going to be able to take over or invent other technologies. It's only at the point that it has the broader capabilities, the broader agency, that it's actually able to cause problems. I guess you're saying you can have a long lead-up where that's all it can do, and then at the last stage...

Yeah, at the last stage it might go back to the first scenario I talked about, where the narrow AIs that are savants at AI R&D hit upon an algorithm in an almost blind search. Think of AlphaFold: it's brilliant at figuring out how proteins fold, but it isn't broadly aware. You could imagine such AIs, or an algorithmic search process, hitting upon an architecture or a training strategy that can then go FOOM really quickly. And so in this lead-up you're saying: yep, AI is accelerating AI R&D, it's crunch time, we have six months left, we have three months left, but these AIs are not AIs you can use for anything useful.

Yeah. Many of the problems we'd like it to help with are social issues, political issues, in some cases philosophical issues. What do you think are the chances that... I mean, the companies are working harder to make the AIs good at coding and good at AI research than at any other particular thing?
And I guess those are more concrete, measurable problems than solving philosophical questions. So it seems like a really live risk that, unfortunately, the balance of capabilities will end up being pretty disadvantageous for this plan.

Yeah, I think the further afield you go from work that looks like ML research and software engineering, the greater the penalty will probably be. The AIs currently are much better at helping my friends who do ML research all day than at helping me, where I do weird thinking, go on these kinds of podcasts, write emails to people, make grant decisions, and so on. They're much worse at that stuff. You can see already that they have a very specialized skill profile. Fortunately, there's a big chunk of AI safety research that does look very similar to ML research, and my friends who are getting big speed-ups from AI include safety researchers doing the kinds of work, control, alignment, and so on, that I think will be some of the most important things you want these AIs helping with at the very beginning. But stuff like AI for epistemics, AI for moral philosophy, AI for negotiation, AI for policy design, all of that may just not be that good by default. And that's a big concern with the plan.

Another worry would be that the AI models end up being able to cause trouble before they're capable enough to figure out solutions. A classic case: imagine we put a lot of effort, and I guess it would be a bit stupid to do this, into training an AI model that's extremely good at developing new viruses or new bacteria, basically changing diseases to make them worse.
I mean, there are people using AI to develop new viruses; they're using it to develop medical treatments, but that sort of work can then be repurposed for other things. If that sort of highly specialized model arrives first, before a model with a sufficient understanding of society and biology and medicine to figure out good countermeasures, then we need a different approach than this one.

Yeah. In general, I think of AIs doing defensive labor as a prediction about the world that you want to be thinking about as you make your plans, not a guarantee. And in many cases, the answer will be to specialize now in the kinds of things that might be hardest for the AIs to do then. Stuff like building physical infrastructure to stockpile PPE and vaccines and things like that is a prime candidate for something that inherently takes a long lead time, and that the AIs might not be that advantaged at by the point that they're good at doing the scary things it's meant to protect against.

Yeah, that was going to be another concern of mine: insofar as the AIs are very helpful, you might imagine they're very helpful at the idea generation or strategizing stage, but still quite bad at actually running a business or figuring out how to do all the manufacturing. So they come up with a great strategy for countering new bioweapons and say, "Here's the widget you should use, go and make 10 billion of them." And when you ask, "Can you help us with that?", it's, "No, I'm not very good at that. Good luck."

Yeah. In general, you should expect AIs to be much better at things with tighter feedback loops, where you can recognize success after a short period of time.
That's one of the reasons they're really, really good at coding: you can train them on this very hard-to-fake signal of whether the code ran after you did whatever you did with it. And idea generation versus actually executing on a one-year plan has some of this same element. You can read a white paper and think, "Oh yeah, that's pretty good," and push the thumbs-up button, and so generate an AI that's pretty good at producing white papers you find neat and that would probably work. But it's much harder to train the AI to run the team of thousands of humans and robots actually executing on the plan.

Why is the crunch time aspect, the intelligence explosion taking off, actually even relevant to when we would want to start doing this? You might think that if AI can help us do research or work on any of these problems, then as soon as it's able to do that, we want it to, whether or not an intelligence explosion is kicking off.

To some extent, that's right. The reason I focus so much on the intelligence explosion is twofold. One is that at that point we might have a pretty short clock to figure out a bunch of stuff; the default trajectory might look like 12 months to extremely powerful, uncontrollable superintelligence that could easily take over the world. That changes the calculus: you might want to focus on very short-term things rather than things with long lead times, at least at crunch time, if not before. The other is that crunch time can help alleviate some of the challenges we've been talking about, with AIs not being good at the full spectrum of things we want them to be good at.
Because almost by definition, at that point AIs are really good at further AI R&D. And one of the things we could do with AIs that are good at AI R&D, at least in most cases, is to direct that AI R&D towards filling out the skill profile of AIs: getting them to be good at some of the things we want them to be good at that they aren't so good at right now. So at that point you might have much more capability at your disposal, and it might be much more worth putting in the effort to fine-tune and scaffold and do everything else needed to make your AI that's good at moral philosophy, or your AI that's good at biodefense.

So you're thinking about this strategy not just as a description of what other organizations should potentially work on, or of what the AI companies are already planning to do, but also because you think it should influence what Open Philanthropy plans to do over the coming years. Potentially Open Philanthropy's best play might be to have billions of dollars waiting at the relevant crunch time, and then to disburse them incredibly quickly, buying a whole lot of compute to get AIs to solve these problems.

Yeah. Just as right now 80%-plus of our grant money goes to salaries, paying humans to think about stuff and do research and policy analysis and advocacy and all these other things, so too in a few years it might be the case that AIs are better than most of our human grantees, and our money should mostly go to buying API credits or renting GPU time to get the AIs to do a similar distribution of activities.
An alternative approach would be that at the point we get a heads-up that an intelligence explosion is beginning to take place, we do everything we can to pause at that stage, to slow down, basically to arrest that process, so that rather than rushing to get the AIs to fix all these issues in three or six months, we buy ourselves a bunch more time. Why not adopt that as the primary approach instead?

So I think the plan I described is compatible with pausing right at the brink of an intelligence explosion. In fact, I would hope we do that, because I think by default, 12 months to get everything in order is just not enough time. But I think of the plan as doing two things. One is making the pause less binary. If you think of the default path as almost 100% of the world's AI labor going into further rounds of making AIs better, making more AIs, making more chips, and so on, and you think of a pause or a stop as 0% of the world's AI labor going towards those activities, there's a whole spectrum between 0% and 100%. The other is that it answers the question of what you do during the pause: you do all this protective stuff, and you have these AIs around to do it with. And once you have that frame, making the pause less binary and thinking really hard about what you do during a pause, you might often end up thinking: oh, it's worth going a little bit further with AI capabilities, because especially if we tilt the capabilities in a certain direction, we might end up with AIs that are much better than they are right now at biodefense, while still not being uncontrollable, still not being that scary. And you can imagine a bunch of little pauses and little redirections during that whole period.
And I would hope that at some point in that period, we do things like policy and coordination that let us spend longer in this sweet spot of AIs that are powerful enough to help with a lot of stuff, but not so powerful that we've already lost the game.

We should probably clarify that, although you think this is among our best bets, in an ideal world you think we would go substantially slower through all of this, because as good a plan as this might be, we'll really be white-knuckling it, without confidence that it's necessarily going to work.

Yeah. If a really clear early warning sign triggers that we're about to enter this intelligence explosion, fast-takeoff phase, where we go in the space of 12 months from AI R&D automation to vastly superhuman AI, then I would vote at that time for stretching that trajectory to be 10 times longer or more, making that transition as a society in 10 or 20 years instead of one. I still wouldn't, and this is maybe a bit of a quibble, advocate for pausing, hanging out for 10 years, and then unpausing, because I actually think slowly inching our way up is better than pausing, then unpausing, and then having a jump. But going back to what we said about how your default expectations of trajectories influence what you think should happen: I think the default is going through this in about one year, and I would certainly rather it be 10 or 15 or 20 years. But the frame of using AIs to solve our problems applies regardless of whether you're white-knuckling it in one year, or maybe eking out an extra two months, or you've managed to get the consensus and common knowledge that allow the world to step through it in 10 years.
Yeah, insofar as we're slowing down to do something, this is a big part of the thing we're slowing down to do.

Yeah.

So this is a big part of the companies' plan for technical alignment. If it doesn't work out, why do you think it's most likely to have failed for them?

If it fails, it's probably because they just didn't actually do a big redirection from using AIs for further AI capabilities to putting a lot of energy towards using them for AI safety. They say this is their plan, but they don't make any quantitative claims about what fraction of their AI labor, or their human labor for that matter, will go towards safety versus further acceleration at that stage. And they'll be facing tremendous pressure from their competitors to stay ahead. So my guess is that unless they have much more robust commitments than they have right now, they probably won't be directing that much of their AI labor. If they have 100,000 really smart human-equivalents, maybe only a hundred of them are working on AI safety, which is maybe still more than they had before in human labor, but not much relative to how quickly things are going.

Unless they have really strong commitments. But I guess other mechanisms would be that it's legally required: at that point, the government basically insists that most of the compute go towards this, or at least that most of it not go towards recursive self-improvement. Or the companies could reach some sort of agreement where they say, "We would all like to spend more of our compute on this kind of thing, so we'll have some contract where we each spend 50% of our compute on it," so that no one loses relative position.

I think that particular contract is probably going to run into big antitrust issues.
Maybe a little illegal, but perhaps an exception could be made for this one. A different mechanism, insofar as the government is taking a massive interest: it could try to coordinate this one way or another.

Yeah, I think that's a possibility. I do think it's a bit tough. This is not the kind of thing it's super easy to make laws about, because it's really not a box-checking exercise. When you actually write the legislation saying half the compute must be spent on safety rather than capabilities, what do you count as safety research? And how are you enforcing it? Do you have auditors in there asking all the team leads in the companies what they're working on, and checking off that it's 50% safety? I can imagine stuff like that, but I think it would require extremely technically deep regulators that we just don't really have right now.

I thought you might say that the most likely reason for this to fail was that alignment just turns out to be incredibly hard: you get egregious misalignment even at relatively low levels of intelligence, and we don't figure out how to fix that early enough to get useful work out of the AIs.

I think that's a possibility, but on my view, not the most likely failure mode. The most likely way it fails is that they don't go super hard on it. But it's also plausible that they're trying to get the AIs to help with alignment, and the AIs are just misaligned, and the control procedures and other measures are ineffective. So the AIs deliberately only help with further AI R&D, and don't help with alignment and safety and biodefense and all these other things you'd want them to help with.
I would hope that at that stage the transparency regime is strong enough that that fact is broadcast really widely, and then that could inspire a change in policy that causes us to slow down. But then in that world it's a bad world even if we do slow down a lot, because it's just tough: we're on our own. We have to do this stuff without the AIs' help because we can't get them to help us. But I'm actually reasonably bullish about control techniques getting early AIs, ones that are not super galaxy-brained superintelligences, to be helpful for a range of stuff that they're good at. I guess another way that they could end up just not making much of an effort is if the window is relatively brief and it just takes a long time to get projects off the ground and they haven't really planned this ahead. So they end up debating it back and forth, and by the time they've figured out that they actually do want to do this... I mean, I suppose it's nominally in these various papers, but I wonder whether they actually are thinking ahead about how this would feel, and whether they'll have the decision-making capability to decide to redirect enormous resources towards this other effort. Yeah, I do think anything that requires a large corporation to be super discontinuous in something it's doing is facing big headwinds as a plan. So I would hope that they're smoothly increasing the amount of internal inference compute that is going towards safety as the AIs get better and better, so that the jump doesn't have to be huge at that final stage. And that is something that, if we could elicit honest reports without creating perverse incentives, I'd want to know about: how much human labor is going to safety versus capabilities, how much internal AI inference is going to safety versus capabilities, how much fine-tuning effort is going to safety versus capabilities.
And I think they have a much better shot if they're stepping it up over time on some kind of schedule. Okay, so that's the AI companies, who I guess we're imagining would mostly be focused on this strategy for AI technical alignment. But you've been thinking about this more in the context of Open Philanthropy and what niche it could fill. What would Open Philanthropy need to do if dumping billions of dollars onto this plan became its mainline strategy? Yeah, I think that for now the biggest thing we need to do is very similar to the biggest thing I think society needs to do to prepare for the intelligence explosion, which is really trying to track where we're at right now in terms of how useful AIs are for the work that we do and the work our grantees do. I think pushing ourselves to automate ourselves, and pushing our grantees to automate themselves, and tracking, you know, how good is AI at the stuff Forethought does, how good is AI at the stuff that Redwood Research or Apollo does, how good is AI at the stuff that our policy grantees do. And I think that is just one thing: just socializing within ourselves that, hey, it's probably a big deal when the AIs start to get really good at any given thing we're funding. And once we start to see signs of life there, we should be prepared to potentially go really big on that. And like you said earlier, I do think crunch time isn't 100% a special thing; we absolutely shouldn't be waiting until crunch time to do anything at all. It's just the prediction that crunch time is the point when a lot of things that were hard to automate before become easier to automate.
So if it turns out, for example, that AI is really good at math research, which I think is plausible, then maybe we should be trying to deliberately shift our technical grantmaking towards more mathy kinds of technical grantmaking, because that is an area where you can churn through a lot more; it's just so much more tractable. So I think just having a function that is looking out for these things, and is maybe poking Open Phil and Open Phil's grantees to consider shifting their work towards more easily automatable things, to consider repeatedly testing whether their work can be automated, is a big thing. And then down the line I could imagine something like even just having separate accounting for the rest of our grantmaking versus grantmaking that is going towards paying for AIs for our grantees. You know, we already pay for ChatGPT Pro subscriptions and ChatGPT API credits for tons and tons of grantees. I think just making it a bit more salient in our minds: what fraction of our giving is going towards that? Do we endorse its size? Is there any place where we should be going bigger? Are we on track? Is the percentage climbing the way we think it should be? Does that seem in line with the way AI capabilities are climbing? If we think crunch time is going to start in six years, are we on track to have inference compute be a large fraction of our spending at that time? If I think about this kind of psychologically, I could imagine, you know, if I was leading Open Philanthropy, or I guess if I was one of the donors being advised, and we did have these transparency requirements, and we did start getting a sense that an intelligence explosion might be kicking off.
I could imagine dithering for a long time rather than deciding to commit billions of dollars towards this, because there's only a particular amount of money, there's only a particular size of endowment. And I think I would be very scared that we would be going too early, or this is a bad idea, or we're going to have egg on our face afterwards, because it will turn out there were some early signs of an intelligence explosion, but it's not really going to work out. And then we've spent $10 billion and we have nothing left to show for it. You'd feel really bad if you made that mistake. Does that sound like a plausible way for things to go? Oh, totally. I mean, I think this is just a very natural institutional dynamic. I think even beyond just being scared of making a mistake on this front, it's that organizations have particular ways they do things, and there are processes. And right now, Open Phil's process for grantmaking usually looks like: someone fairly junior sees an opportunity come across their desk, either through one of our open calls or through some contact they have. And that junior person pulls together some materials to convince their manager it's a good fit. And then that manager convinces someone higher up that it's a good fit. And you can have two layers or three layers or sometimes four layers of information cascading up the decision-making process that we have in place as an org, and then it's approved. And if the right thing to do is to spend a billion dollars on some particular strain of work that's super automatable, you wouldn't trust some random junior person to make that call; you might need to have just a different process for that. And I don't know what that process would look like, but I think that would be one thing to figure out.
I guess for this sort of incredible scaling of funding and effort to take place, it would have to be that you're going to be incredibly bottlenecked on people, or rather, there won't be that many more people involved. So it would have to be that AI is not just doing the object-level work, but also deciding what problems to work on. Yeah, and making the decisions: managing the project and overseeing other AIs, basically just taking up the entire org hierarchy. So that's the picture that you're envisaging? Yeah, so I think there's two possibilities here. One possibility is that by the time it's the right move to dump a bunch of money on crunch-time AI labor, Open Phil itself has already been largely automated. And that's actually an easy world, because in that world we just have a visceral sense that AIs are really helpful. Because maybe we've slowed down our junior hiring and all our program associates are AIs right now, and we are totally transformed as an organization. So the evidence, the conviction to pull the trigger, might be easier to achieve. And then actually we have a bunch of labor. So maybe we have a thousand people on the AI team instead of the 45 that we have now, and they can figure out all this stuff much more quickly. But I think the concerning possibility is that actually there's jaggedness, where maybe AI is extremely good at math, and maybe AI is extremely good at technical AI safety, and certain specific kinds of manufacturing that could be really useful for, like, a PPE play. But we haven't automated ourselves: it's not that good at doing our jobs, because there wasn't much of that stuff in the training data; we're just not well set up to absorb AI labor. Or it makes horrible mistakes in a way that, in software or manufacturing, you can put it in a setup where you catch those mistakes, but it's harder on the Open Phil side; you need humans to do that. So we're not very automated.
We don't have a visceral sense of, you know, it's time now, this is the moment, the AIs are really, really good, we've got to go big. But it's still the right thing to do to pour a bunch of money into AI labor on these few verticals that are heavily automated. We've maybe actually been burying the lede a little bit here on what the biggest challenge is for an external group like Open Phil to implement this plan, which is: will you even be given access to the very best models that are being trained? And I guess at this crunch time, when there's a crunch on demand for compute, will you actually have enough computer chips? Will anyone be willing to sell to you if you do this kind of work? Can you go into that? Yeah, so I think there's two challenges here to getting access to enough labor as an external group. One is whether they will even sell to you. Like I said earlier, in AI 2027 and a lot of stories of the intelligence explosion, you get to a point where one company has pulled far enough ahead of its competitors that it keeps its internal best systems to itself and only releases systems that are considerably worse than its internal frontier, that are just good enough to be ahead of its competitors' released products. And there can be a growing gap between how intelligent the best internal systems are and how intelligent the best externally accessible systems are. And the AI company may deliberately choose not to sell to willing customers because they want to keep their secrets to themselves. Another possibility is they might be willing to sell to you, but the price just might be way too steep, because the opportunity cost of using that compute to sell to you is training further, more powerful AIs, and they might be willing to pay quite a lot for that. So I think both are challenges.
The second one is in some sense more straightforward to address: you try to hedge against this possibility by having some portion of your portfolio really exposed to compute prices. In the extreme case, maybe that looks like having GPUs yourself that in peacetime you just rent out to other people doing commercial activity, but then during crunch time you redirect to doing AI labor. Although in that case you'll furthermore have to figure out how to get the latest AI models onto those chips that you own, so you might have to cut deals to make that happen. But in less extreme cases you might just purchase a bunch of NVIDIA, or purchase a bunch of liquid public stocks that are exposed to AI, to make it more likely that you can afford AI capabilities at the time. So there could be a huge run-up in the price of GPUs or compute at this time, but you can partly hedge against that possibility by having most of your investments be in NVIDIA or other companies that sell GPUs, so that if their price goes up, you benefit on the investment side and that helps to offset the increasing price. And then on the software side there's a question of whether you have access to the very best models that are being trained. I guess on the one hand there's this story you could imagine where the companies are very close together, the models are roughly the same, margins are very low, and they're very keen to put out models as soon as possible in order to remain competitive. On the other hand, you could have one leader that's starting to keep things all secret. Do you have a particular take on which of these scenarios you think is more likely to come about? Yeah, I think that at least at the beginning part of crunch time, when the AIs are just starting to automate a lot of AI R&D, my bet is that things will at that point be relatively commercial, relatively open.
The leading few companies are within a month of each other in their capability frontier, or maybe it's hard to say who's in the lead because one company specializes in one aspect, their model is a little spiky on pre-training, and another company's model is a little spiky on software engineering, or something like that. And the reason I think that is basically just because it's kind of what a naive Econ 101 model would predict would happen; it seems like these companies don't have big moats, and it also seems like what we've seen happen over the last few years. It kind of describes the present day, more or less. It describes the present day, and that's a change from a few years ago, when I do think OpenAI had way more of a lead and it seemed more plausible that there would be a monopoly or a duopoly. But there are reasons to push in the other direction, which is basically that if you have a super-exponential feedback loop, you have a bunch of actors that are growing at an increasingly rapid rate: first at 2%, then at 4%, then at 8%. And if they don't interact with one another, you do get a winner-take-all dynamic, where if they're growing on the same growth curve but one gets to a particular milestone first, then the leader gets more and more and more powerful and wealthy relative to the laggards. This is in contrast to exponential growth, where if everyone is growing at 2% forever, then the ratios between more and less wealthy nations or companies stay fixed. So there is a reason to think that specifically around the time of the intelligence explosion, gaps will begin to grow again. But probably around the start, I think it will most likely be the case that you can buy AI labor if you can afford it. You can buy API credits. You can go on chatgpt.com. And then I have a lot of uncertainty about how it evolves from there.
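The growth arithmetic in this exchange can be made concrete with a toy simulation: a sketch using the 2%-then-4%-then-8% schedule mentioned above, plus an invented three-period head start for the leader (both the doubling schedule and the head start are illustrative assumptions, not figures anyone committed to).

```python
def accelerating_value(steps, r0=0.02):
    """Value after `steps` periods when the growth rate doubles each period
    (2% -> 4% -> 8% -> ...), a toy stand-in for a super-exponential feedback loop."""
    x, r = 1.0, r0
    for _ in range(steps):
        x *= 1 + r
        r *= 2
    return x

def constant_value(steps, r=0.02):
    """Value after `steps` periods of plain exponential growth at a fixed rate."""
    return (1 + r) ** steps

HEAD_START = 3  # assumption: the leader hit the curve three periods earlier

# Plain exponential: the leader/laggard ratio is frozen at (1.02)**3 forever.
early_exp = constant_value(5 + HEAD_START) / constant_value(5)
late_exp = constant_value(10 + HEAD_START) / constant_value(10)

# Super-exponential: the same head start compounds into a widening gap.
early_acc = accelerating_value(5 + HEAD_START) / accelerating_value(5)
late_acc = accelerating_value(10 + HEAD_START) / accelerating_value(10)
```

Under constant growth, `early_exp` and `late_exp` come out identical, which is the "ratios stay fixed" point; under the accelerating schedule, the leader's ratio keeps growing with time, which is the winner-take-all dynamic.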
Yeah, what do you think is the chance that the leading company will try to keep the level that they're reaching secret? I think it depends a lot on the competitive landscape they face. Basically, if the other companies are really far behind, then I think there's a pretty strong incentive and reason to keep your capabilities secret, because you give up quarterly profits, but maybe you don't care about that because you're running on investment money anyway. And if you can get your AI to help you make better AI, to help you make better AI, and so on, you could emerge with a superintelligence that might give you power that rivals nation-states, or the ability to just decisively control how the future goes. And that might be very attractive to a sort of power-seeking company. I do think it involves forgoing short-term profits, though, which means that if competitors are close at your heels and your investors are breathing down your neck to deliver quarterly earnings, it'd be hard. You can't go and tell all of your investors, don't worry, we have a superintelligence, because I think then word will get out. Well, and then also, your plan is to screw over the investors in this case. Your plan is to create a superintelligence, not to pay them back. So create a superintelligence and take over the world, maybe. They won't like that. There's a mismatch in incentives between the investors and the CEO, and the CEO is sort of being a bad agent to their principal. So basically, the more things look like an efficient competitive market with very little slack, the more the leading company will be sort of forced to provide access to the rest of us. To what extent do you imagine the companies would be enthusiastically bought in on assisting with this plan? So this strategy is their predominant approach to AI technical safety, and I think even the optimists agree that there are other issues that society is going to have to deal with.
In fact, the leaders of the companies say this all the time: we're going to need a new social contract, it's going to upend everything, it's going to be a big deal. I imagine that inasmuch as they're nervous about the effects that the technology is going to have, they'd be very happy if someone came to them with a pre-prepared plan for how we're going to deploy all of this compute in order to solve all these other problems. Yeah, I think it's unclear. Certainly they have some incentive to be into this, but there are two alternative uses of AI labor that might be more attractive to them. One is power-seeking for themselves: just building up an enormous AI lead over everyone else and then sort of bursting onto the scene with an incredible amount of power and the ability to challenge the U.S. government or nation-states. That might be attractive to some people. I think that would be a very evil strategy to pursue, but it's definitely in the water. The other thing is more mundane: it's just using these AIs to make normal goods and services, to make the products and the media content and the other services that people most want to pay money for in a short-term sense. It's very similar to how right now we don't spend a huge fraction of society's GDP on biodefense and cyber defense and these other things, like moral philosophy. That's just not what people want to pay for, and AI is just a thing that accelerates the creation of products and services people want to pay for, and this isn't very high on the list. I guess most people are not looking to become dictator of the world or to take on huge amounts of power, but I guess the kinds of people who end up leading very risky technology projects are not typical people. They're somewhat more ambitious than the typical person. So I suppose we can't rule that out as a possibility. Yeah.
So a possible challenge would be that even if you have an enormous amount of compute, there might just be only so fast that you can go, because you require some sort of sequential step: some step that is just bottlenecked in time. I guess people talk about things where you have to do an experiment that just actually takes a certain amount of time to play out. But more generally, at least with LLMs for example, they produce one token after another, and having twice as much compute doesn't necessarily allow you to complete an answer twice as fast without limit. How much is that an issue here, inasmuch as we're trying to solve problems in a very short calendar time? Yeah, I think that is likely to come up, especially for physical defenses, like manufacturing PPE or scaling up the ability to rapidly create medical countermeasures, and then also for social and policy things. So I can imagine that AIs could be very helpful in figuring out what kind of agreement between the U.S. and China would be mutually beneficial and how we could enforce it. But the way human decision-making works still probably requires humans from the U.S. and China to come together and talk about it, have a conference or convening, and come to a decision that they ratify and feel good about, and that could be a bottleneck. Yeah, any other examples of similar bottlenecks? I guess in terms of solving theoretical problems, I suppose you can speed things up enormously by having many, many different instances of the same model brainstorm different solutions and then have them evaluate one another, and that allows you to have many different efforts in parallel. But I do think that for deep theoretical problems, there are limits to how far parallelism gets you.
But the right solution that's out there somewhere involves multiple leaps, where it's hard to think of the next insight without having the foundation of the earlier insight. So really, even if you have a hundred AIs working in parallel, what will happen is that one of them comes up with the first step of the insight, and then everyone works in parallel on finding the next insight. But you still need to go three or four steps in. So what sort of stuff do we need to be doing in advance? For example, setting up planning meetings ahead of time for diplomats between the US and China: you would absolutely need to do that at a very early stage, in anticipation that eventually we might have a deal that they might want to ratify. I guess that sounds a bit crazy. But are there other examples of things that you need to do before this all kicks off? Yeah, I think that in general you want to be thinking about what the AIs at the time would be most comparatively disadvantaged in. They'll have all these advantages over us: they'll understand the situation much better at that point in time than we do now; they'll be able to think faster, move faster, and so on. But I think what we can contribute now would be things that just inherently take a long lead time to set up. That might include physical infrastructure, like the bio infrastructure that my colleague Andrew is working on building out. It might also include just social consensus. I think it takes some amount of time for an idea to be socialized in society, to have it as an accessible concept that maybe we should try to create some sort of treaty between the US and China to allow AI to progress somewhat slower than it might naturally, and use a bunch of AI compute to solve all these problems.
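The sequential-insight bottleneck described a moment ago, where even a hundred AIs working in parallel still have to climb the chain one insight at a time, can be sketched as a toy model (the 10% per-worker chance per period and the four-step chain are invented parameters, chosen only to illustrate the shape of the dynamic):

```python
import random

def periods_to_solve(depth, workers, per_worker_chance=0.10, seed=0):
    """Periods needed to complete a `depth`-step chain of insights.

    Each period, every worker independently has some chance of landing the
    *current* missing insight; insight k+1 can't be attempted until insight k
    exists. More workers shorten each step, but the total can never drop
    below `depth` periods, no matter how many workers you add.
    """
    rng = random.Random(seed)
    periods, found = 0, 0
    while found < depth:
        periods += 1
        if any(rng.random() < per_worker_chance for _ in range(workers)):
            found += 1  # someone found the next insight; everyone builds on it
    return periods

# A lone worker grinds through each step slowly; a huge team nearly saturates
# the hard floor of one period per insight, so returns to parallelism flatten.
solo = periods_to_solve(depth=4, workers=1)
swarm = periods_to_solve(depth=4, workers=100_000)
```

With enough workers the whole run collapses to roughly `depth` periods and no further, which is the sense in which calendar time stays bottlenecked on the three or four serial steps rather than on total labor.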
I think that kind of thing takes years to become something that's in people's toolkit, in the water, such that they actually think to have the AIs go down that path and figure out the details. So what should people be doing if they think that this kind of makes sense, or it's something that they'd want to contribute to? Are there other organizations that should similarly be planning ahead and thinking about how this might look for them? Or could individuals be thinking about how they could contribute by adopting this approach for their own particular projects? Yeah. So in terms of other organizations, I think it would be especially great for government entities to be thinking about adopting AI. I know that there are a number of random little types of red tape that make it harder for governments to adopt AIs than for anyone in industry to adopt AIs. And I think we might end up in a situation where the regulatees, the industry people, have fast cars and the regulators have horses and buggies because of this differential adoption gap. And more broadly, if your company is not already going maximally hard on adopting AI for your particular use case, and you work on defenses, AI safety, moral philosophy, all these good things, it's probably worth having a team that's just on the lookout for how you could adopt AI as soon as it becomes actually useful for you. Let's talk a bit about the career journey that you've been on since we last did an interview two and a half years ago. I guess back then you were doing general AI research and strategy for Open Philanthropy. This is in 2023. And then in 2024, you started leading the AI technical grantmaking. And then I guess towards the end of that year, you decided to take four months off, a sabbatical. Yeah, tell us about all of that. Yeah.
So, I had been at Open Phil for more than six years before I made my first grant. I was involved in some grantmaking conversations earlier, but the first grant I actually led on was somewhere in mid or late 2023, and I had joined Open Phil in 2016. So it was kind of interesting. My work at Open Phil, in some sense, if you just took the outside view and said, this is a philanthropy that's giving away money, was very strange, because it was kind of thinking about these heady topics and then writing these long reports about them that I published on LessWrong. And I always felt a little like, oh, maybe I should dip into grantmaking, because that is our core product in some sense; that's what we do. But I had always been sort of drawn away by deeper intellectual projects. So even though I always vaguely had the thought that I should do grantmaking, it never really happened for me until, actually, I think the thing that pushed me head first into grantmaking was the FTX collapse. So actually, sorry, my first grant must have been in 2022 instead of 2023. Because at that point, there were hundreds and hundreds of people who had been promised grants by the FTX Foundation, where their grant wasn't going to go through, or they were worried it was going to be clawed back, or it was partially not going through. And Open Phil put out this emergency call for proposals for people who had been affected by the crash. And I had some thoughts and takes on technical research, and also the organization just needed surge capacity for this emergency influx of grantmaking. So in a matter of maybe six weeks or so, I made like 50 different grants after not having made any grants at all. And that was a really interesting experience, and I discovered there were elements of it I really liked.
But there was just something about the way you made grants where you really couldn't dig into any particular thing very much, especially in the context of something like the FTX emergency; you just had to be making these decisions really quickly. But I felt like I had thoughts about how grantmaking, at least in the technical AI safety space, could be done with more inside-view justification for the research directions we were funding than we had previously. And so in early-to-mid 2023, I sort of tried to go down that path. Sorry, so in 2022, you did this huge burst of grantmaking, I guess trying to help a bunch of refugees from the FTX Foundation, basically. But then, I guess, you would have noticed that there was probably no overarching strategy behind all the grants that you were making. You were like, we need to have a bigger-picture idea of where we're actually trying to push. Why? Yeah. So I was focused on grants to technical researchers. These are often academics, sometimes AI safety nonprofits, and they would often be working on interpretability or some kind of adversarial robustness. And they seemed like reasonable research bets. But I felt kind of unsatisfied. And I think this is going to be a theme of me and my career: I felt kind of unsatisfied about how the theory of change hadn't really been ground out and spelled out, as to how this type of interpretability research would lead to this type of technique or ability, and then how that could fit into a plan to prevent AI takeover in this way. Or similarly for any of the other research streams we were funding. And this had actually been the big thing that deterred me from getting involved in Open Phil's technical AI safety grantmaking for a long time, even though I was one of the few people on staff who thought about technical AI safety outside of that team.
It was because, in the end, it seemed like most grant decisions in this 2015-to-2022 period turned on heuristics like: this person's a cool researcher and they care about AI safety. Which is totally reasonable. But I wanted to have more of a story for: this line of research is addressing this critical problem, and this is why we think it's plausibly likely to succeed, and this is what it would mean if it succeeded. And we never really had that kind of very built-out strategy, because it's very hard; it's a lot to invest in building out a strategy like that. But having been thrown head first into grantmaking with the FTX crisis, I was like, maybe I do want to try to take on the AI safety grantmaking portfolio, which at the time didn't have a leader, because all the people who had worked on that portfolio had left by that point, some to go to the FTX Foundation, actually. And so it was this portfolio that had been somewhat orphaned within the organization. And it was clearly a very important thing. And I was like, oh, maybe we could approach it in this way that was novel for us in this area: to really try to form our own inside views about the priorities of different technical research directions, and really connect how they would address the problems we most cared about. It sounds like you find it unpleasant, or anxiety-inducing, to make grants where you don't have a deep understanding of, I guess, not so much what the money is being spent on, but where you don't have a personal opinion about whether it's likely to bear fruit. Is that right? Yeah. Or, like, I think it's a bit nebulous what the standard is that I hold myself to.
But I think for my research projects, when I think about timelines, or I think about how AI could lead to takeover, or how quickly the world could change if we had AGI, I can often, with months of effort, get to the point where I can anticipate, and have a reasonable response to and a reasonable back-and-forth with, a very wide range of intelligent criticisms for why my conclusion might be totally wrong and totally off base. I feel like I know what the skeptics that are more doomy than me will say, and I know what the skeptics that are less doomy than me will say, and I could have an intelligent conversation that goes on for a long while with either side. And that is a standard I aspired to get to with why we supported certain grants. And I could do that with some of our grants. But I wanted the program to get to the point where, if somebody came to me and said, you know, hasn't interpretability actually just not seen much success over the last four years? What do you make of that? I wanted to be at reflective equilibrium on my answers to questions like that, and wanted to be able to say something that went a bit beyond, yes, but on the outside view we should support a range of things. And that is something that is emotionally unsatisfying to me if it's a big element of my work. Yeah, it's maybe worth explaining why it is that Open Phil doesn't aspire to get to that level of confidence with most of its grants. Why is that? I think it just takes a long time. I think there's two things. It just takes a lot of effort. And then the other thing is that even if you put in that effort, you don't want to fully back your own inside view, and I wouldn't endorse that either.
So it's this one-two punch where developing your views about exactly how interpretability or adversarial robustness or control or corrigibility fits into everything is a ton of work. You have to talk to a ton of people, you have to write up a bunch of stuff, and in the meantime you're not getting money out the door while you're doing all this, right? And then, having done all this, where are you going to end up? You're going to end up in a place where there are reasonable views on both sides, it's a complicated issue, and we probably want to hedge our bets and defer to different people with different amounts of the pot, and so on. So people have a reaction that's very reasonably like: OK, we're going to end up in a place where we've thought it through, it was a lot of work, it's still very uncertain, and we still want to spread our bets. So if it doesn't even affect the decision, why not just short-circuit all that, spread our bets, and lean on advisors? And I have sympathy for that; hopefully I represented that perspective reasonably well. But I just feel that in my experience, having done the homework really qualitatively changes the details of the decisions you make, in ways that I think can be really high impact. One thing I'm able to do, having gone through the whole rigmarole of forming views, is work with researchers to find the most awesome version of their idea by the lights of my goals, pitch them on that, and sort of co-create grant opportunities. And I think there are other, more nebulous benefits beyond that, which I maybe won't be great at defending. I just really like operating that way.
So in late 2023 you actually took on responsibility for this whole portfolio. But I guess your personal philosophy of how to operate is somewhat in tension with how Open Phil as a whole tends to operate, and in tension, in the short term, with making a large volume of grants.

Right, I think that's right.

So what did you end up doing in the role?

I ended up pursuing a compromise. One thing that just comes with the territory of this role is that there were grantees we had made grants to in the past that were up for renewal, and part of the responsibility of being the person in charge of this program area is that you investigate those renewals and make decisions about whether we should keep the grantees on or not. With those grants, I tried to follow what a canonical Open Phil decision-making process would be. So I pursued a kind of barbell strategy for a while. On the one hand, for renewals, or for people who knew us and reached out to ask us to consider grants, I wouldn't hold myself to the standard of really understanding and defending the proposal on the technical merits, but would lean more on heuristics, like "this person seems aligned with the goal of reducing AI takeover risk," "this person has a broadly good research track record," and so on, and try to make those grants relatively quickly. But then I would also be trying to develop a different funding program, or some grants that I really wanted to bet on, where I would try to hold myself to that standard and really write down why I thought this was a good thing to pursue.
And it turned out that the second thing basically turned into making a bet, from late 2023 to mid 2024, on AI agent capability benchmarks and other ways of gaining evidence about AI's impact on the world.

So it's the stuff we were talking about earlier, where you're trying to get an early heads up about whether the AIs are going to be really effective agents. I guess in 2023 we were really unsure how that was going to go. It seems like agents in general have been a bit disappointing, or haven't progressed as much as I expected, or probably as you expected. But at that point it seemed like maybe by this point they would just be operating computers completely as well as humans, and you really wanted to know if that was the future we were heading for.

Yeah. So I launched this request for proposals. Open Phil has done technical safety RFPs before, but this was by far the narrowest and most deeply justified technical RFP that we had put out at that time. I said: we are looking for benchmarks that test agents, not just models that are chatbots; these are the properties we think a really great benchmark would have; and these are examples of benchmarks we think are good and not so good. We had a whole application form that was, in some sense, guiding people, or trying to elicit the information about their benchmark that we thought would be most important for determining whether or not it was really informative. Mostly this was: be way more realistic, have way harder tasks than existing benchmarks; even if you think your tasks are hard enough, they're probably not hard enough. There was a lot of push in that direction. So it was a very opinionated, very detailed, very narrow RFP, and we ended up making $25 million of grants through it, and then another $2 to $3 million from the
companion RFP, which was just broader: all kinds of information, from RCTs to surveys, about AI's impact on the world. And I'm pretty happy with how that turned out. But as you would expect, a lot of effort was poured into one direction, and if you were skeptical of this high-effort approach to grant making, you could argue that I could have put in way less effort and funded twice the volume of grants across ten different areas, picking up the low-hanging fruit in all those areas.

So I guess halfway through 2024 you started feeling pretty burnt out, or like you wanted to take a bit of a break. Why was that?

Yeah. Right around when I switched from doing mostly research to doing grant making, and especially when I was trying to ramp up this program area that had this more inside-view, more understanding-oriented approach to AI safety research, Holden, who had been running the AI team up to that point, decided to step away and left the organization. He was my manager, and I had a working relationship with Holden that involved a lot of arguing and discussing the substance of what I was working on. When he left, leadership was stretched thinner, because someone in leadership was gone, and I think the people who remained on the leadership team didn't have as much context and fluency with all this AI stuff as Holden did. So when I wrote up this big memo saying we should do AI safety grant making in a more understanding-oriented way, we should develop inside views, and here's why I think that would be good, what I wanted was for my manager, or leadership, to argue with me about the object level, and for there to be some sort of shared view within the organization about how much
this was a good idea, what the pros and cons of it were, and how much we wanted to bet on it. But I think that was just kind of unrealistic, given the other priorities on their plates and their level of context in this area. So I ended up having to approach it in a more transactional way with the organization. Rather than "let's talk about whether this is a good idea," it was more like: I want to do it this way, and they were like, "Yeah, we don't know if that's the best way to do things, and we have some skepticism, but you can do that if you want." So I felt kind of lonely. And this is something I learned about myself over the course of trying to run this program and then going on sabbatical and reflecting on it: I really like to be plugged into the central brain of the organization I'm part of, and I didn't feel like I had a path to do that. What I had a path to do instead was to stand up this thing, which I tried to do, but it just felt a bit tough going.

It sounds like you were a bit on your own.

Yeah, I felt a bit on my own. And I'm not a very entrepreneurial person, I think. I'm ambitious in some ways, but I just really have a high need for constantly talking to other people. I tried to achieve that sense of team by hiring people under me to help me with this vision, but I think I was not very good at hiring and management. Partly it was because the vision was pretty nebulous, and I probably needed to spend more cycles working out the kinks in it by myself, really solidifying what it is and what the realistic version of an understanding-oriented technical AI safety program would be. So it was very hard to hire, because you kind of had to hire for
someone who really resonated with that off the bat, even though it wasn't a very well-defined thing. So that took a lot of energy. And then with the people I was managing, I have always struggled, and in this case still struggled, with perfectionism in management. I have this long history of trying to get people to serve as writers who write up my ideas, and it never works for me, because they don't do it just the way I want it, and I'm myself a pretty fast writer, so working with a writer as their editor, and getting their writing output to be something I'm satisfied with, often ends up taking more time than doing it myself. And I found the same happened, to some extent, with grant makers: at one point we had a number of people spending part of their time working on the benchmarks RFP, and I think it's possible I would have just moved through the grants faster if it were just me working on it, which is a bit tough. I think this is a weakness or challenge a lot of new managers go through, and I was going through that at the same time as feeling like the feedback and engagement I got from above me was much less than it was before, and that I had to prove this new way of doing things. I thought, and still think, that there was a lot to the arguments I was making, but it was also not a wild success when I took a swing at it by myself.

So in September last year you just had to step away and take some time away from work, I guess after eight years of working very hard full time. What did you end up doing with that time?

It was a mix of things. I just did a lot of life stuff, and invested more in... I found a new group house to move into, or
started a new group house, so that was cool. I did more of just trying to take care of myself. I started an exercise habit; I'm off that exercise habit now, again, so we'll see. And then I did a lot of reflecting on why this work situation ended up being so hard for me, and on my journey through my career as a whole: what are the patterns, and when were things hard for me? I also just jumped in and helped with some random projects going on. The Curve conference, which is a conference that brings together AI skeptics and AI safety people, people on all sides of the issue of AI's impact on society, was having its first iteration while I was on sabbatical, so I was able to get involved with that and try to be helpful, more than I could have if I'd had a full-time job, which was really cool. I did some writing; most of that writing hasn't been published, but it was still good for me to do. But yeah, it honestly went by really fast. There was a lot of stuff to think about and a lot to do.

What sorts of reflections did you have on, I guess, your career so far, your motivation, and what had been difficult in 2023 and 2024?

In terms of 2023 and 2024 specifically, I really do feel like I want to be an advisor and a helper to the kind of central organization, and I had been that in many ways over the previous six years. So the transition to being more entrepreneurial, to having a little startup making grants in my area where the organization is investing money in me but not necessarily a lot of attention, and where I didn't necessarily have a path to make arguments that influenced things in a cross-cutting way, that was hard. So it was interesting to learn about myself that if I don't have that, I will still sort of gravitate towards
trying to meddle in everything else that's going on, and if I don't have a productive path to meddle, I'll feel sad. That was one big thing. I think another big thing is just how much depth I want. I have a drive to really get to the bottom of something; I'm always thinking about the counterargument, and the counterargument to the counterargument. Even when I was very young, I really liked math tutoring, and I really liked math in general, because you could just dig and dig and get to an answer. And that's inherently an uneasy fit with grant making, which is basically investing: the foxy, spread-your-bets stuff that Open Phil is engaged in.

Yeah.

So that was also interesting to reflect on. And like I said, somewhat strangely, for my first six or seven years at Open Phil I actually did do rather deep research, even though we were a grant making organization; I just wasn't doing grant making.

Yeah, because Holden really wanted this deep research. He wanted to more deeply understand the ideas personally, and he thought it was healthy for the organization.

Yeah, I think that's right. I think he had a lot of drive and demand for really figuring out timelines, really figuring out takeoff speeds, and exactly what our threat models are for whether AI could take over the world, and building all that up. I think he has a lot of the same instinct I have: it's just really good to do your homework, to have the response to the top ten counterarguments and the responses to those responses, and to really know your stuff. So he was the driver of a lot of the work that I did, and I think if you re-rolled the dice and Open Phil had been run by different leadership, it's probably
pretty unlikely we would have gone as deep as we did into doing our own AI strategy thinking, because the thought would have been: well, we should fund a place like FHI, or now Forethought, to do that stuff instead of us.

In your notes you said that you spent a fair bit of time in this period reflecting on what it was that you liked about effective altruism, as an ecosystem and as a mentality, and what things you didn't like so much about it. Tell us about that.

Yeah. I guess it's been a long time since we've talked about effective altruism on the show, so I'll open with what it even is. It's this movement, or idea, that you should think explicitly and seriously and quantitatively about how you can do the most good with your career, or with the money you're donating, and that different career paths and different charities you could donate to could differ by orders of magnitude in how much good they do. So if you're working on reducing climate change, it could be orders of magnitude more helpful to research green technologies than to work on getting people to turn off their lights more, to conserve electricity in their personal use. And there's this ethos that if you're really taking this seriously, and you really care about helping the world, you stop and think and you do the math, in the same way that if you had cancer, or your spouse had cancer, you would do the research, figure out which treatments had which side effects and which success rates, and ask a lot of questions of the doctor. There's this ethos that that's what it looks like when you take something seriously. A lot of people, when they're doing good in the world, do what makes them feel instinctively good, and there's a whole other approach where you sort of
respect the intellectual depth of that problem. And I was really drawn to this. I fell headfirst into the EA rabbit hole when I was 13, so it's been more than half my entire life that I've been extremely involved in this community and this way of thinking. I think there were maybe three big things that I really, really liked about this approach. One is that EAs sort of challenge themselves to care about people and beings that are very different from them, very far away from them in time and space. Even in the most, quote unquote, vanilla EA cause area of global poverty, the vast majority of money given by individuals in rich countries to alleviate poverty goes to helping other individuals in rich countries, even though money could go much, much further overseas, in countries where people have a much lower standard of living. And the reason people donate locally is that they feel more affinity for people closer to them and more similar to them. EA also has a lot of strains that challenge people to extend care to animals, to future generations that may live thousands or millions of years in the future, and also to artificial intelligence, if it can be something that has consciousness and can feel pain. That was really appealing to me. But then there was also a way of going about doing things that was very appealing to me. They were very nerdy, very intellectual, really thinking stuff through, and almost innovating methodologically on how we can figure out which charities are better than which other charities; there were lots of interesting arguments thrown around about this. And they were very transparent: there was a culture of open debate and of admitting your
mistakes. GiveWell, an early pillar of the EA movement, had a mistakes page on its website, where it just discussed mistakes it had made. They were very honest and high integrity in an interesting way that doesn't obviously follow from caring about other beings more. For example, GiveWell refused to do donation matching, because donation matching is usually a scam: the big donor would have given that much anyway, even if you hadn't made your donation. So that whole package was really attractive to me. I think it hit a lot of psychological buttons for me at once, and it really felt like my people, and the way I wanted to live my life.

So there's being more compassionate to a wider range of beings, which I guess is still the case, and probably still something you like about the effective altruist approach. But there was also going into enormous intellectual depth and really debating things out. And then there was also the very high integrity about honesty: not allowing any chicanery whatsoever, an extremely fastidious and exacting level of integrity that other movements, even other pretty high-integrity movements, weren't aspiring to. Even beyond what people are asking of you, potentially: you just proactively say, "By the way, did you know donation matching is a scam? That's why we're not doing it, even though we would get more donations to help poor people." It's interesting that that was such a natural part of the early EA movement, even though you're sort of giving up on impact.

Yeah, it's not necessarily implied; I mean, I guess it's a practical question whether it is or not. So I guess as things evolved, you found that the second one, the intellectual depth, was now
lacking from your job. Were there other things that were changing that made you less enthusiastic?

Yeah. So the intellectual depth was very much there in other parts of the ecosystem, especially AI safety, thinking through exactly how you would control early transformative AI systems and things like that. And like I said, my heart was always pulled towards those kinds of questions, even though I worked at a grant making organization.

It feels like, on some level, you really were a more natural grant recipient than a grant maker; you should have gotten a grant to really go in deep on some question.

Yeah. I mean, I think if I had graduated college in 2022 instead of 2016... In 2016 I graduated college and went to GiveWell, and a big part of why I went to GiveWell at the time was that they had the most intellectual depth on the question of what the best charities are. If I had graduated college in 2022, I probably would have done MATS, which is this program to upskill in ML AI safety research, and then tried to join an AI safety group. So I think I'm naturally drawn to actually doing the research, in some sense. So in that sense it was a sort of mundane issue: in my job, especially after Holden left, the demand for that kind of research evaporated a little bit at the leadership level. If I were to start over again, I probably wouldn't have applied to join Open Phil; I probably would have applied to join an AI safety group. But then there's the third thing, this extremely, almost comically high level of integrity that I really, really liked, which was also eroding over the years. When I think about why, I think that when a lot of the focus of the EA movement was convincing really smart people to donate differently, being
extremely, unusually high integrity was actually a really valuable and powerful asset. Obviously, people like me, and the very wealthy people who were early GiveWell donors, really liked that GiveWell had a mistakes page, and really liked that whole ethos; that whole package helped them trust that the recommendations were actually real recommendations, that they weren't being spun something or sold something, like in all the rest of the charity recommendation ecosystem. But when you move away from that being your primary method of change, when instead you've actually attracted quite a lot of funders and now you're trying to use that money, and the talent you've attracted, to achieve things in the world, maybe things that involve a lot of politics, then being extremely transparent can be very challenging. Especially because donors want privacy, or because, if you're running a political campaign, you don't want your opponents to know exactly your strategy, and the ways you think you might have made mistakes. This is just not how most of the real world works, you know?

Yeah, it's not the case that the world's most impactful organizations are consistently incredibly transparent, or even incredibly high integrity.

Yeah. And so there was this tension between the goals, which I felt like I should only care about. What EA told me, and it kind of made sense to me, was that the point here is to help others as much as possible; the point is not to conform to an aesthetic, or to do things in a way that feels cleanest or prettiest. But at the same time, I think I was to some extent kidding myself about how much of my own motivation, and my own attraction to the concept, came from just the goals, just pillar one and altruism, versus pillars two and three,
the intellectual depth and intellectual creativity, and this crazy high level of openness and transparency, having absolutely nothing to hide, letting all comers come. I think, as a fact about my psychology, the latter two things were actually really important for my motivation, and over time they became smaller and smaller features of what it was like to pursue EA goals in my career.

Yeah. I guess we should say, for people who don't know, that over this period the environment Open Phil was operating in became a lot more challenging and a lot more hostile. It had been funding all kinds of AI-related stuff for years, but as AI became a much bigger industry, it became apparent what sorts of concerns different people had. Its work in some ways started to clash with very large commercial interests, and also with alternative ideologies that had different ideas about how things ought to be regulated or how things ought to go. So you were now in a world where there were people who would sit down and think: how can I fuck with Open Phil? What can I do to give these guys a terrible day? What did they publish that we could spread that would be embarrassing for them? And in that kind of environment, where people literally want to cause trouble for you, it's a lot less attractive to be maximally forthcoming about all of your internal deliberations and why you made all your decisions. All of us would probably be a bit more conservative in that kind of environment.

Yeah. I mean, even before the latest round of AI policy heating up, which started in 2023, Open Phil compromised a lot on its initial wild ambitions for transparency. At the beginning there was this idea that we would publish the grants we decided not to make, and explain why we decided
not to make them, when people came to us for grants. There's a reason most organizations don't do that. For our earliest two program officer hires, we have a whole blog post about their strengths and weaknesses as candidates, the alternatives we considered, and how confident we were that it would work out. We stopped doing that. So there was a level of transparency that, in my heart, I still want, but it's absolutely insane. And then I think the adversarial pressure you mentioned means that Open Phil, as an organization that funds a lot of this ecosystem, has a lot to lose. If we go down, a large number of helpful projects have a much harder time getting funding. We have to be a lot more risk averse than many of our grantees, even though those grantees are also facing an adversarial environment. The way many of them navigate it is to fight back, explain their perspective, and define themselves in the public sphere, and my instinct is to do more of that too, to just say more and respond. But it's harder to do that from Open Phil's position, for a number of reasons.

Yeah. Over the years a lot of people, usually critics, have said that effective altruism has some things in common with religious movements. To what extent have you found that to be the case, and to what extent not?

I think EA aspires to be, and very much succeeds at being, a lot more truth-seeking than the world's religions, and a lot more truth-seeking than a lot of other communities and movements in the world. So in that sense there's a disanalogy that's extremely important. But I do think it's not a bad analogy in some ways, because I think for
people who are really deeply involved in the EA community, it provides a map of the good life. It's a vision of what it means to be good and have a good life. It's unlike a political movement in that it doesn't just have a set of policy prescriptions for the world, but like many religious movements it intersects with politics: there are people who approach political questions, like whether you should ban gestation crates for pigs, through the lens of their commitment to EA. And it's not just a community, not just a social club. People get solace and friendship from their local community of EAs, like people do from their local church community, but it is more than that. It is trying to say something about the sweep of the world and your place in it, and about what it means to live a good and meaningful life, and it intersects with politics and community and a bunch of other things while not being exactly the same as them.

Yeah. I would think a key way that it's not like a religion is that in many respects it feels more to me like a business, or a startup, or an organization that has quite a functional goal. Or I guess that's a different aspect of it. Some people like the ideas, they like the blog posts, they don't engage with the community whatsoever, and I suppose for them it's going to be a different experience. And there are many people who participate in the community, who would say "I'm involved in effective altruism," but actually are not that interested in the projects, or necessarily even in the effort of helping people. So people sample the aspects that they like. But for many of the people who work in organizations staffed by
other people who would say "I'm really into effective altruism," it's much more pragmatic, I would say.

Yeah, I think that is how it ends up manifesting for a lot of people, but I don't think that's really what EA is. I think it's a mistake to collapse EA into a set of three or four goals in the world: reducing the suffering of animals in factory farms, plus improving quality of life for poor people in developing countries, plus AI safety. In some ways, I think people think of EA as a weird umbrella for those three things, and then those three things are basically professional communities pursuing well-defined goals. But I think EA is more like a way of looking at the world, a way of thinking about the good. And I think you can take an EA approach to cause areas that are in some sense more parochial than the big three EA cause areas. You can absolutely take an EA approach to US policy, from the perspective of the welfare of US citizens, doing rigorous cost-effectiveness analysis of which policies actually help and which don't, and a lot of people do. And then there is EA as a generator of new cause areas that could get added to the canon. I think right now there's a bunch of fertile ground around whether EA could be a force that helps society prepare for radical change from advanced AI, where AI safety is one big important thing, but there might be a range of other issues, and you might want to prioritize some of those based on your values and your sense of how things will play out.

You wrote in your notes that, at least from your personal point of view, EA wasn't enough like a religion, or wasn't as much like a religion as you might personally have liked. Explain that.

I think I'm someone who just really benefits from structure and from sort of
emotional motivation and reinforcement. And I also just very much tend to socially conform a little bit; I tend to try to achieve the ideal of the community I'm in. And the ideal of my corner of the EA community, like you said, is just to have a really impactful job, do a really good job at it, and work a lot of hours at it. So that's the message you get from the community, and that's what I was trying to do. But I personally would have liked a bit more of a spiritual angle to the community. If you read my colleague Joe Carlsmith's blog, I get some of that: existential reflection about our morality and our values and this crazy thing that so many EAs believe, that in a matter of a decade or two we might be in an utterly transformed world that might be, relative to this vantage point, utopic or dystopic, and just grappling with that. I think if there had been an EA church where every Sunday someone who's really good and thoughtful about these issues spoke about them and led a discussion about them, that would have been very enriching for my life, and probably ultimately made me higher impact. But that's just not how the EA community is structured. And it's sort of deliberately not structured that way, because for the professional community aspect of EA you really want to not care whether people believe the deepest teachings and philosophical orientation; you really want to just be like, you're doing great safety research, and I don't care about the rest. So the incentives of a professional community pull against what I might personally want here. Yeah. Do you think, it sounds like you think that while it 
might have been more appealing to you, it's not actually necessarily better for things to go in that direction? I mean, I guess to me personally I kind of like the more professional, limited aspect of the community, because you kind of just want to be able to go home and not have to think about this stuff all the time. Whereas I want to go home and think about it in a different way. I mean, I already go home and think about my work all day; I frequently have insomnia where I think about my work, and instead of thinking about the next Google Doc I need to write or the next email I need to send, I would like to be thinking about the work in its most spiritual dimension. Yeah, exactly. I guess people have a range of views, but it's clear why many people have not embraced that, or have been keen for a stronger division between this sort of thing, which can be very stressful, and the rest of life. It can feel very dangerous and culty, and there are a lot of reasons to worry about it. But I do think there is just a large contingent of EAs that are like me in wanting some sort of spiritual grounding. Joe Carlsmith's blog is extremely popular with hardcore EAs; it's not a generically popular blog, or it's reasonably popular, but there are a number of people who are like, wow, this is really nourishing something in me that I didn't realize I needed. I'm wondering, there's probably an age thing here a little bit as well. I feel like when I was younger I noticed that the older people were less interested in this aspect of it, and now I'm in the older class, and I'm like, well, I have my family to provide nourishment, and 
that's absorbing a lot of time and energy that I kind of don't have for attending church or whatever else it might be. That's kind of interesting. Do you feel like you had some sort of spiritual hole that was filled specifically by having a child, or were you always just not that interested in this? Yeah, I think of myself as a deeply unspiritual person, so that wasn't really an itch that I needed scratched. I guess earlier on I was maybe more interested in the social scene, to make good friends and meet people. Having made more friends who I think of as like-minded, with a lot of common interests, that's not as interesting anymore either. I've already got my friends, and now I'm just going to ride it out. I actually think that, well, I thought of myself as an extremely unspiritual person. I had a lot of disdain for spirituality when I was 20, and so for me the age thing has gone the other way: I want more and more of a religion-shaped thing in my life as I age. When I think about why, I think it's because when I was 20 I had unrealistic aspirations for my worldly projects. By that point I had already been an EA for six or seven years, but I was just starting off trying to do EA things in the world, and I had this sense that this is obviously correct, this is obviously great, everyone who's good and reasonable will get on board with it, and we'll just solve poverty and solve factory farming. I wouldn't have exactly said this stuff, but I had that inner vibe, and I would go around being like, have you heard the good word about EA? And as I've just done things in the real world, I'm like, everything is very hard and slow, and the feeling of doing my job, which involves 
writing these Google Docs and sending these emails, is just not automatically connected to my higher aspirations. There is a long grind and there's a lot of failure, and so I have an increasing demand for some separate thing that is specifically trying to reorient me mentally towards the bigger picture. Yeah. For me the bottom line is that working on this stuff can be quite stressful and quite tiring, and I want to completely check out and stop thinking about it, and just be with people and talk about other issues. So I guess those are the different strategies for dealing with it. I think I probably want some of both. I now live in a group house with a couple of little kids, which is really great, and it's good for that, but I find unfortunately that it takes a lot to pull my mind away; I watch TV and I'm thinking about other stuff in the background. So I think during your sabbatical you considered going independent, I guess becoming a writer or researcher, just doing your own thing, but in the end you decided to come back to Open Philanthropy, at least for a while. Why was that? Yeah, so towards the end of the sabbatical I was planning on taking some time to just start a Substack and write about a bunch of stuff, including a lot of the stuff about EA that we were discussing, a lot of stuff about AI, and sort of see where it went. At that time I honestly didn't have a super strong impact case for this. I didn't think it was crazy that it would be the highest impact thing to do, but the reason I was doing it was just because I wanted it, and not because I could really defend that it was the highest impact thing. But at that moment, after having gone through this whole journey, I was like, maybe I have more room in my 
life for making a career decision on the basis of not just impact. The reason I decided to stay was that while I was out, Open Philanthropy was conducting a search for a new director to lead our GCR work, so all our AI work and our biorisk work. This was the position Holden was in when he left in 2023. Both of the top two candidates seemed really good to me, and I felt like someone new coming in could probably really use help from someone who's not running any given program area, doesn't have a big team to worry about, and can just help that person develop context and figure out their strategy. And it could be an opportunity for me to get the feeling of plugging in again that I had been missing for a while. And then how did it go? I think it went really well. Our director of GCR is Emily Oehlsen, who's also the president of Open Philanthropy, and I've been spending most of this year, 2025, just helping her in various ways: trying to understand, well, what have we funded, what's come of that, what's the AI worldview, what do we think is going to happen with AI, how is that informing our strategy, what are the strategies of the various sub-teams. And I work really, really well with her. I actually had been lonely at Open Phil almost the entire time I'd been there, though it got worse in 2023, because while Holden was really great at giving me a lot of bandwidth, which I'm really grateful for, and talking about object-level stuff with me, Holden never ran a ship where he'd say, I'm doing this bigger project, can you help me with this piece of it, and here's how it fits in. Holden was always more like a research PI, 
where I was doing my own research project and he would talk to me about it a bunch and was interested in the results, but it was not integrated into a whole. And Emily really does operate in more of an integrated way, where I'm doing stuff and I know she needs the answer and is going to do something with it, which is very cool and very novel for me as a way to work, and something I always thought I would want. And indeed it's really, really great, and I think she's an extremely caring and thoughtful manager for me who's really good at eliciting work out of me. I notice that I work more than I did right before I went on sabbatical, and it feels less hard, so that's a sign that things are working. So you're trying to decide what to do next: whether to stay at Open Phil or go into, I guess, something less meta, or maybe something that will allow you to go into even more depth. How are you using the stuff that you've learned about yourself over the last few years to inform that decision? Yeah, so besides Open Phil, which is still a top candidate, I'm talking to two technical research orgs about potentially finding a fit there. One is Redwood Research, the other is METR. Redwood Research works on basically futurism-inspired technical AI safety research; they're best known for pioneering the AI control agenda. And METR I think of as trying to be the world's early warning system for intelligence explosion: they're measuring all the different things we want to be tracking to see if we're on the cusp of AIs rapidly accelerating AI R&D or acquiring other capabilities that let them take over. Both of these missions are very close to my heart. They're both narrower than Open Phil, where I could, if I wanted, 
just dip my toes into absolutely everything that might help with making AI go well, but in exchange they would let me go deep in a way that I think would probably be more satisfying for me, all else equal. In terms of how I'm using what I've learned: this is so cliche, and it's something that if a 20-year-old version of me were watching this, she'd roll her eyes, but your extremely local environment, the literal person you're reporting to, matters a huge amount. The two or three people you're going to be talking to most in your job, or just features like how much you're talking to people versus working on your own, can make a transformative difference. And I found it interesting to reflect on this: I said all that stuff earlier about how EA has become a lot less transparent and a lot less prioritizing of maximal integrity at all costs, and that does still bother me. And actually, the moral foundations of EA, sort of utilitarian thinking, you can go down a long rabbit hole where it is very suspect in many ways, and we talked about this in some previous episodes. But both of those things bother me a lot more when I'm also in a working environment that's locally frustrating. It's not that those issues aren't issues, but the salience of those kinds of heady big-picture things, versus extremely micro things like what it feels like when you have a one-on-one with your manager: I think I had been underrating the mundane and the micro in how I had been thinking about my career up to now. And I'm trying to do trials. I'm actually in the middle of a work trial with METR as we're filming this episode, and that's what I'm paying attention to: how does the 
rhythm of the work feel, how do the people feel. Yeah. I guess other generalizable observations are that Open Philanthropy's environment changed over the years. You were there for eight years? I guess nine years now. Nine years, right. Yeah. The kind of constraints that Open Phil was laboring under in 2023 were very different than in 2016, and so unsurprisingly, even if it was a good fit for you to start with, that doesn't necessarily mean it will be a good fit forever. And also there was a leadership change at Open Phil: the person you were reporting to changed, and very often when that occurs you see some other people leave as well, because they were in their roles primarily because of their very good working relationship with that person, or because they had strategic alignment with that person. Yeah, absolutely. And I suppose potentially the CEO changing could have been a trigger for you to think, well, maybe this isn't so great anymore, and I should proactively start looking for something else. Yeah, I think that's possible. It sort of was true for me in both directions: Holden very much was a huge part of why I wanted to work at GiveWell rather than work in a number of other potential places, or do earning to give, like I thought I was going to do at first. And when he left, that coincided with a difficult period for me. And now, with Emily in the position that he was in before, it's again pretty dramatically changed what my work is and how it feels. So it does seem like a big transformative thing, and if you're in an organization where there's a leadership change, I think it should probably be a trigger to think about, even if you don't leave, what might be different about your role and your place and what you're doing, based on the different style or the different constraints and strengths and weaknesses of new 
leadership. It sounds like taking four months off was also a good call. I guess you were reasonably unhappy, and it could have gotten worse if you hadn't done that, and it gave you breathing room to make good decisions. Yeah, I think that's right. I'm very glad that I took the sabbatical. I'm also glad that I didn't leave: a salient alternative for me, at the time that I decided to take four months off, was to just leave and figure out what I wanted to do next. And I think it was good both for my impact and for my personal growth and satisfaction that I came back. I helped Emily, and now I'm doing a proper job search, whereas at the time that I left for my sabbatical it was more about healing and reflecting, not searching for a role in a focused way. Coming back to effective altruism for a bit: as I said, we basically almost don't talk about effective altruism on the show anymore. It was a much bigger feature in the earlier years. The biggest reason for that, I suppose, is that now we're more AI focused, and AI is an issue that so many people are concerned about regardless of their broader moral values or broader moral commitments, so it just doesn't feel as relevant. You don't have to be concerned about shrimp, or about beings very far away in time, to think it would be really good to do AI technical safety research, or to think about what governance challenges are going to be created by it. And of course EA is a controversial idea; I think at its core it's actually quite a controversial idea. Many people, even fully understanding it, would simply not agree with its prescriptions of how resources ought to be allocated. So why bring along all that baggage when it's not actually decision 
relevant for most people? Do you think we should talk about it more, or is that kind of just a sensible evolution? Yeah, I think it kind of depends on the show's goals. My take is that it's correct and good that you don't need to buy into the whole EA package, with all of its baggage, to worry about misaligned AI taking over the world and to do technical AI safety research to prevent that; to worry about AI-driven misuse and do research and policy to prevent that; and to just generally worry about AI disruption and think about that. So I think there should be, and there is, a healthy, thriving "AI is going to be a big deal" ecosystem that does not take EA as a premise. But at the same time, I think EA thinking and EA values probably do still have a lot to add in the age of AI disruption. I think it's going to be EAs, for the most part, who are thinking seriously about whether AIs themselves are moral patients, whether they should have protections and rights, and how to navigate that thoughtfully against trade-offs with safety and other goals. It's going to be EAs, by and large, who take most seriously the possibility that AI disruption could be so disruptive that we end up locked in to a certain set of societal values, that we gain the technological ability to shape the future for millions or billions of years, and who are thinking about how that should go. There are a lot of degrees of extremity to the AI worldview: even if you accept that AI is going to disrupt everything in the next 10 or 20 years, the people who are thinking hardest about the most intense disruptions are going to be disproportionately EAs, because EA thinking challenges you to try to engage in that kind of very far-seeing, rigorous speculation, even though there are a lot of challenges with that and 
it's very hard to know the future. I think EAs are the ones who try hardest to peek ahead anyway. Yeah, I guess digital sentience, worrying about AIs themselves suffering, is a good example. I would definitely make the prediction that effective altruism will loom large in the group of people working on that. For someone who's not altruistic, who isn't motivated by social impact, it's a bit unclear why you would go into that area. It's not particularly lucrative, it's not, at least yet, particularly respected, it's not super easy to make progress, and it's sufficiently unconventional. Most people, most of the time, want a career that's acceptable and that their parents will be proud of, and it's a lot less clear that digital sentience is going to provide the kind of esteem or prestige, or safety and comfort, that many people want in a career. So it's maybe natural that people who are altruistically motivated, and also intellectually a bit eclectic, willing to be avant-garde, tolerant of quite a lot of philosophical reasoning and speculation, are going to go into it. In a sense, I think this might be what a healthy community is: an engine that incubates cause areas at a stage when they're not very respected, they're extremely speculative, the methodology isn't firm yet, and you kind of just have to be extremely altruistic and extremely willing to do unconventional things, and then matures those cause areas to the point where they can stand on their own while also being a thing that many EAs work on. And I think digital sentience, and maybe the other things on Will and Tom's list, like space governance and thinking about value lock-in and stuff like that, are 
other candidates for EA to incubate the way it incubated worrying about AI takeover. Yeah. I feel a lot less strongly in the case of the value lock-in thing, because many of the mechanisms there would just be ways that AI produces a power grab by people, or a power grab by AIs, or somehow undermines democracy or deliberation in a way that makes it hard for society to adapt over time. I think people are worried about that regardless: both people involved in effective altruism and people who would be very skeptical of it. I think there are some versions of the value lock-in concern that go through something else overtly scary and bad happening, like one person getting all of the power, and that's how that person's values get locked in. But there's a whole spectrum of things that are almost like social media plus plus: in this distributed way, this technology has made us meaner to each other and worse at thinking, and has allowed individuals to live in information bubbles of their own creation. You can imagine AI getting way better at creating a curated information bubble for each individual person that allows them to continue believing whatever it is they started believing, with superintelligent help preventing them from changing their mind. And this might be something you think of as an important social problem for the long-run future, even if it doesn't happen via one person getting all the power: power is still relatively distributed, but large fractions of society are impervious to changing their minds. So it's interesting that in thinking about what is the niche that EA can fill that others won't fill, the thing you're pointing to was not primarily actually altruism, though I guess that is a factor in terms of 
going into digital sentience. Perhaps it's actually a research methodology, or a research instinct: being willing to sit in that very uncomfortable space between just making stuff up and having firm conclusions you can stand by because you've taken particular measurements. It feels like, for some reason, that is one of the most distinctive aspects of people who are passionate about effective altruism: they try really hard to make informed speculation about how things will go, and neither just have it be a good story, nor be so conservative that they're not willing to actually make hard predictions. Yeah, absolutely. And I think even the tamest of EA cause areas, like global health and development, has a huge dose of this. If you look at GiveWell's cost-effectiveness analyses, they have to grapple with things like: how does the value of doubling one's income, if you make a very low amount of money, compare to a certain risk of death, or to the value of avoiding a certain painful disease you could have? They have to try to get their answers based on surveys and weird studies people have done; it's just not very rigorous in the end, and they have to form their judgments and spell those judgments out. I think the willingness to tackle questions like this and just say, well, here's our answer, and there's a lot to argue with, is very emblematic of EA organizations, including all the best AI safety EA organizations, like Redwood Research. Yeah, I guess more standard ways to approach those questions would be to just pick one slightly arbitrarily and then be really committed to it, or to be kind of irritated at being asked the question and say there's absolutely no way of knowing, or there's no fact of the matter here whatsoever. And I guess trying to be somewhere, I don't know whether it's somewhere in the 
middle, but within effective altruism there's kind of a spectrum in terms of where in the middle you want to land, where everyone's looking at the person more speculative than them and thinking they're just building castles on sand and this is not the way to do things, and looking at people less speculative than them and thinking, that's just the streetlight effect, they're ignoring the most important considerations and not working in the most important area. Yeah. So I guess for people who do have that mindset, an important message would be to take advantage of the fact that they have this reasonably rare mentality, and go into roles that other people probably won't fill, because those roles feel too uncomfortable, or because other people could reasonably think they're misguided; either way, other people aren't necessarily going to do this stuff. Yeah, I think that's right. And I think it's interesting to think about: if you imagine EA as one piece of the world's response to crazy changes from AI, there's actually a case that EA should be heavily indexed on research. The community has gone back and forth in how it thinks about this. At first people were just naturally attracted to research stuff, so there was a huge glut of people who wanted to be researchers, and then there was a big push, including from 80K and others, to say no, consider operations roles and policy roles and other things that aren't just research. I think that was a good move at the time, but if we think about what EA's comparative advantage is relative to the world, maybe that suggests that some of the people who are doing operations and doing policy, but maybe in their 
hearts just want to be a weird truth-teller, thinking speculative thoughts, should consider going back and doing that again. My guest today has been Ajeya Cotra. Thanks so much for coming on the 80,000 Hours Podcast again, Ajeya. Thanks so much for having me.

[Outro song] 10,000 years compressed into 25. The benchmarks saturate, the excursion climb and climb. Seven hands went up, they said they saw it coming. One world is ending while the other one stays flying. Every alarm she sounds, filed under "won't happen to us." It's crunch time, crunch time, 12 months wide. The thing that makes the danger keeps us all alive. Crunch time, crunch time, faster now. Redirect the labor, oh, we don't get it out. Cognitive revolution, cognitive revolution. Regulators ride on horses, companies at the speed of light. Show us what your models find, stories good for the quarterly price. But request no human hand, secrets start, one company ahead, power that rivals nation-states. Faster, faster, faster, faster. We did these things not because they're hard, because we thought they would be easy. They wave away, the feedback loop runs free. Crunch time, crunch time, 12 months wide. The thing that makes the danger keeps us all alive. Crunch time, crunch time, faster now. Redirect the labor, oh, we don't get it out. Cognitive revolution. Oh, around the world the alarm sounds, around the world nobody turns. 100,000 minds still on the wheel, six months to something none of us can name. Harder, better, faster, stronger, the intelligence keeps climbing. Harder, better, faster, stronger, will we know when it gets real? It's crunch time, crunch time, 12 months wide. The thing that makes the danger keeps us all alive. Crunch time, crunch time, faster now. Redirect the labor, oh, we don't get it out. Cognitive revolution, cognitive revolution, cognitive revolution, cognitive revolution. 10,000 years, 12 months to decide. 10,000 years, 12 months to decide.

If you're finding value in the show, we'd appreciate it if you'd take a moment to share it with friends, post online, 
write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries, either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine network, a network of podcasts which is now part of a16z, where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI Podcasting. If you're looking for podcast production help, for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at aipodcast.ing. And thank you to everyone who listens for being part of the cognitive revolution.