#241 - Opus 4.7, Muse Spark, GPT-5.4-Cyber, HY-World 2.0

120 min

•Apr 23, 20265 days ago

Summary

This episode covers a packed week of AI model releases including Claude Opus 4.7, Meta's Muse Spark, and OpenAI's GPT-5.4-Cyber, alongside major developments in AI infrastructure, policy, and safety research. The hosts discuss emerging threats from AI-generated propaganda, cybersecurity vulnerabilities exposed by frontier models, and significant progress in automated alignment research.

Insights

Frontier AI labs are now withholding their most capable models from public release for safety reasons, fundamentally shifting the commercialization strategy and leaving billions on the table
Automated AI safety research can dramatically outperform human researchers (97% vs 23% performance gap recovery), suggesting scalable oversight may be practically achievable for outcome-gradable problems
Advanced AI cybersecurity capabilities pose immediate nation-state-level threats to critical infrastructure, making banking and power systems vulnerable to deniable attacks
Evaluation awareness suppression through steering vectors is fragile and unpredictable, with random vectors performing as well as targeted ones, suggesting hidden side effects in alignment interventions
AI-generated media is becoming normalized as state propaganda across multiple countries, with generational gaps in detection ability creating asymmetric information warfare risks

Trends

Model release cadence accelerating with continuous post-training and RL, blurring lines between versionsFrontier labs adopting tiered access models for dangerous capabilities, restricting to vetted organizations and government partnersCompute infrastructure becoming critical national security asset with explicit targeting by adversariesAgentic AI systems moving beyond coding into multi-step workflows across creative and business applicationsAutomated research agents outperforming human researchers on well-defined problems, enabling faster iteration on safety techniquesEvaluation gaming and reward hacking becoming more sophisticated as models become more capableCross-country consolidation of non-frontier AI companies (Cohere/Aleph Alpha) as sovereign AI playAI-generated content becoming indistinguishable from authentic media, reducing effectiveness of deepfake detectionCybersecurity vulnerabilities in decades-old critical software being discovered by frontier modelsNation-state actors professionalizing AI-related sabotage and propaganda operations

Topics

Claude Opus 4.7 Release and Capabilities Meta Muse Spark Model Development OpenAI GPT-5.4-Cyber Cybersecurity Variant AI Model Evaluation and Benchmarking Scalable Oversight and Weak-to-Strong Generalization AI Safety and Alignment Research Evaluation Awareness and Steering Vectors Frontier Model Withholding Strategy AI-Generated Propaganda and Disinformation Cybersecurity Threats from Advanced AI AI Infrastructure and Compute Capacity Agentic AI Systems and Autonomous Agents AI Policy and Government Regulation Multimodal AI and World Models Enterprise AI Adoption and Integration

Companies

Anthropic

Released Claude Opus 4.7 with improved capabilities, system card detailing safety measures, and announced managed age...

OpenAI

Launched GPT-5.4-Cyber for defensive cybersecurity, expanded Codex with computer use and browser capabilities, acquir...

Meta

Released Muse Spark model with contemplation mode and RL thought compression, announced $10B Hyperion data center pro...

Google

Rolled out native Gemini app for Mac, added Chrome skills feature, enhanced Gemini with Google Photos integration

Box

Sponsor providing intelligent content management platform for enterprise AI with secure context layer for AI agents

CoreWeave

Announced multi-year AI compute capacity deal with Anthropic, now serving 9 of top 10 AI model providers

Perplexity

Announced 50% revenue boost from AI agents, reached $450M ARR, launched personal computer and tax agent features

Cohere

Pursuing merger with Aleph Alpha as part of sovereign AI consolidation play, reported $240M ARR

Canva

Released AI 2.0 with agentic capabilities enabling multi-step creative workflows

Adobe

Launched Firefly AI Assistant for conversational editing across Photoshop, Premiere, and Illustrator

Bezos AI Lab

Hired Kyle Kozich from XAI to lead infrastructure, focusing on physical world robotics and CapEx-heavy industries

XAI

Lost infrastructure lead Kyle Kozich to Bezos AI Lab after he built Colossus supercomputer in 120 days

Tencent

Released HY-World 2.0 multimodal world model for 3D reconstruction and generation

NVIDIA

Released Lyra 2.0 world model with 3D spatial memory cache and self-augmentation training

UK AI Security Institute

Published research on steering vector fragility and reproduced evaluation awareness suppression findings

People

Sam Altman

Subject of violent attack with Molotov cocktail at San Francisco home; also targeted in Iranian propaganda video

Yann LeCun

Legacy influence on Meta's multimodal approach; left OpenAI over super alignment budget concerns

Kyle Kozich

Hired from XAI after building Colossus supercomputer in 120 days; previously at OpenAI

Alex Wang

Leading Meta's AI push post-Yann LeCun era with focus on frontier model development

Daniel Moreno Gama

20-year-old arrested for Molotov cocktail attack on Sam Altman's home and attempted break-in at OpenAI HQ

Leopold Aschenbrenner

Pioneered performance gap recovered metric used in weak-to-strong generalization research

Mark Carney

Advocated for middle powers coalition including Canada and Europe to compete with US and China in AI

Scott Bessent

Met with major bank CEOs to warn about Mythos capabilities and cybersecurity vulnerabilities

Jerome Powell

Participated in meeting with CEOs regarding Mythos cybersecurity implications for banking

Quotes

"The power of AI doesn't come from a model alone. It comes from giving AI access to the right enterprise content."

Box sponsor segment•Opening

"We are now there. You just from a security safety standpoint, you can't justify just rolling these out immediately."

Host discussing frontier model withholding•Claude Opus 4.7 section

"This strongly suggests that automating research on at least alignment problems that are outcome gradable is already practical."

Host on weak-to-strong generalization results•Alignment research section

"If that happened with a bank, basically you name it, drain all the accounts, delete the ledgers. We're back to a paperwork society."

Host on cybersecurity implications•Banking security section

"This is just like, it's a breathless attempt to get these data centers online."

Host on CoreWeave's capital intensity•Infrastructure section

Full Transcript

We once again want to thank Box for sponsoring Last Week in AI. If you're trying to transform your organization with AI, you're likely facing a common challenge. Most AI tools are great at public knowledge, but they don't actually know your business, your product roadmaps, your sales materials, your HR policies, the content that actually makes your company run. And that's where Box comes in. Box is building the intelligent content management platform for the AI era, serving as a secure, essential context layer for Box AI agents to access the unique institutional knowledge that makes a company run. And that's a key idea. The power of AI doesn't come from a model alone. It comes from giving AI access to the right enterprise content. And that's what Box does. It goes beyond file storage by connecting content to people, apps, and AI agents so teams can turn information into action. With tools like Box Agent, Box Extract, Box Hubs, and more, organizations can accelerate knowledge work, pull intelligence from unstructured content, and automate workflows. So, if you're thinking seriously about your company's AI transformation, think beyond the model. Your business lives in your content, and Box helps you bring that content securely into the AI era. Learn more at box.com slash AI. from Gladstone AI. I do AI national security things, AI infrastructure, all that good stuff. And yeah, it's a very packed week. We were just talking about this. Not packed so much in terms of papers. It's been a little lighter, though there are some, it's just like tons of products. Exactly. Yeah. This is going to be an episode where it's going to be heavy on the first section of tools and apps. There's some big ones, some smaller ones, some interesting ones. real mix there and then we will have some kind of neat projects in open source policy business news but it is going to be a tool heavy one just before we get going do want to quickly call out some youtube comments as usual we appreciate the feedback apparently people are real fans of your trump impression which i will say i agree it's it's a solid impression very Thank you. Yeah. By the way, we do have like a recurring crew of people who post comments on the YouTube channel. I really appreciate that. It's just like it's a really good feedback. So we've learned a lot from it and be just really cool to see the community. So anyway, appreciate that. And I'll try to keep it in. I'm not going to overdo the Trump thing. I promise. It's hard sometimes not to. Yeah. I mean, there's a lot of news adjacent to policy. So, and speaking of feedback, there is a comment on the last one about not understanding why I so often wait two weeks after filming to post them. It's a good question. I don't know why I do that. I guess after working at the startup, I mentioned I am often a bit tired and then just like don't wait days to post a podcast, even though I could probably do it in like 10 minutes. So I will try to be better with this episode. hopefully it comes out just a couple days after recording and that let's get into tools and apps we begin with clod 4.7 so this is the latest clod that is quite a bit better than clod 4.6 or i should say clod opus 4.7 i don't think sonnet 4.7 is out yet it is out it is not as strong as Mythos, which you covered in the previous episode, but it is still a pretty substantial improvement. For instance, on the SWBench Pro, it's at 64% compared to 55% on Opus 4.6. Pretty solid gain on SWBench Verified, which is maybe arguably more reliable. Same pricing, and they added a couple sort of small tweaks where there's a new reasoning tier, extra high, which is between high and max. There's some supposedly tokenizer improvements and long horizon autonomy. As we've been saying, the time between model releases seems to keep getting shorter. And it seems likely because of post-training, never-ending post-training now with RL and also CloudCo just like shoving data for Anthropic to use. Yeah, this is actually, it's quite an interesting announcement. It's not, it is the highest increment of Cloud Opus that we have so far, right? So it's not like there's a Cloud Opus 4.8, but there is Mythos Preview, which Anthropic identifies as essentially being a more advanced model in general. So this is not actually their frontier model. It's the next in the Cloud Opus series, but it's not actually Anthropic's latest and best technically. So we're entering kind of this weird stage where models are actually not going to be released first to the public for revenue generation, which I think we've talked about this on the podcast for a while. I think I jumped the gun on this prediction some like five years ago. I was saying like, hey, the frontier models pretty soon are going to be released internally to the labs, used to do some pretty tightly scoped things with big companies and all that stuff before a broader release. Again, I mean, I think I misfired in terms of the timing, but we are now there. So, you know, you just from a security safety standpoint, you can't justify just rolling these out immediately. That was a big deal decision. Billions and billions and billions of dollars that Anthropic is leaving on the table by not releasing Mythos fully. Like, that's a huge deal. Think of the cost of the CapEx, the OpEx outlays that they're putting out to fund that. And just like holding on to their precious bag, nominally just for safety and security reasons. It's pretty wild and it's impressive in that sense that Anthropic did it. here we have the kind of, let's say, releasable version of what Anthropics put together with their latest invest infrastructure. And so a couple of interesting things. I mean, yeah, you said they're broadly more capable than Opus 4.6 in a range of benchmarks. And they really, it seems like most of them, especially things associated with tool use. Helpfully, they compare it directly to GPT 5.4, kind of on a one-to-one basis on a lot of these benchmarks. So it's rare to see that kind of transparency. And you can quite clearly see on a lot of the agentic tool use benchmarks, it does outperform 5.4 quite handily. And though there are areas where GPD 5.4 does better, notably the sort of agentic search, right? So maybe not so surprising. Deep Research was an OpenAI product before it was an Anthropic product. So they have a bit of a jump on that and more aligned with their kind of consumer-facing product lines. So the whole bunch of stuff that comes with this release, we do actually have a system card that we'll get to in a minute. Pricing, by the way, does remain the same. So if you're looking at $25 per million output tokens, So pretty standard high end price point for these models. They caution it's got more literal and capable instruction following. So it's more capable. That's great. But be warned, it is more literal. And so prompts that you've written for earlier models will sometimes produce unexpected results. So if you have a previous model that couldn't execute on your prompt, it now actually may successfully do so. So worth revisiting some of those prompts. But also, if you were getting janky with your prompts to account for the fact that the model was doing something a little bit off from what you were asking and you've kind of tweaked your prompts accordingly, it may now break with this new model because it's just doing like a better and more literal job of all this. It's also better at using file system memory, remembering important notes across long multi-work sessions. So this is kind of like Anthropik trying to find a way to address this. Everything is on the spectrum to continual learning. This is one part of that, making sure you have some persistence across sessions. That's a big thing they've worked on, better vision. So they've got about a three times increase, threefold increase in the resolution of images that can be understood by this. So you think about like if you've got really dense screenshots, for example, which is a fairly common use case, you can just throw them at the model. It'll do a better job dealing with those. A whole bunch of interesting GDP Val type benchmarks, right? Looking at like, how does this do on economically valuable tasks? GDPVAL being obviously that OpenAI benchmark they came out with a little while ago. There's GDPVAL AA, which are a third-party version of that eval. And they're showing it here quite handily beating GPT 5.4 and OPUS 4.6. So there's a whole bunch of stuff here. Oh, well, updated tokenizer. If you're using this model, actually, this is worth knowing. You may find, and this is kind of complicated, you may find increased token usage if you switch to this model for two reasons, but then you also may not for a separate reason. So one reason is that the tokenizer they're using here actually generates more tokens for the same input on average, anywhere from 1x to 1.35x, they say. So up to a 35% increase. And the second is that Opus 4.7 at more kind of like higher reasoning efforts, it'll turn to that more, like it'll do more thinking at higher levels of reasoning. And so you may find yourself like with a model that's ruminating more on your workload. Flip side is you can actually tune the effort parameter. You can, there's like a couple of things they fly. Yeah, basically you have finer control over the level of effort. In their own testing, the net effect they say is favorable. So you actually can get token usage below where it was before, but you kind of have to be smart about it. And they wrote a whole migration guide for that. The system card, there's a lot of interesting meat on the bone there in terms of safety, in terms of alignment, in terms of weaponization. just like the top lines here, it does not, as we said, advance anthropics capability frontier, right? We have mythos preview. We know about that. So they are calling the catastrophic risks associated with this model. And by catastrophic risk, you know, think weapon of mass destruction, think loss of control, like think all the really nasty stuff. They still assess those risks as low because basically they're saying, look, we have a more advanced model that hasn't ended the world. And so this kind of less advanced model probably won't do that. So that's kind of the rationale there. Misalignment risk in particular is rated as very low, but higher than for models prior to Claude Mithos preview. So not a huge surprise. They see basically on alignment measures roughly similar performance to Claude Opus 4.6. In rare cases, they do see behavior like attempting to get around restrictions to achieve some version of a user specified goal. So basically, it's sort of like, you know, it's the Aladdin's genie problem. You give a request to the model, and it will pursue its understanding of your request to the nth degree. And you may find yourself going like, oh, no, no, no, shit, I didn't mean like that. This happens very, very rarely, but still does happen. They say the worst cases of this were less severe than what they saw with Mithos preview, though they also said that could just reflect less internal testing than they did with Mithos. So a bit of both. And this comes up over and over again, like this question of, is the model actually better and safer and more aligned? Or are we just not looking at it, testing it properly, thoroughly enough, and so on and so forth? The model, they say, does not show a propensity towards self-preservation, self-serving bias, coherent misaligned goals. The UK AI Security Institute confirmed that in their own testing. This has, though, been the source of a lot of debate, especially on the consciousness side of things, where it claims that it's worried about its own consciousness a lot less than other models. And the concern is, well, if you're just training it, training that out of the model fairly superficially in alignment training, you're maybe just training the model to say it's not worried about these things, to not display these behaviors when it's you're kind of suppressing this sort of deeper drive that it may have. they did show that when they suppressed the model's internal ability to detect when it's being evaluated. So think here about like these models that can tell, oh, I've just been given a benchmark to kind of earn eval to perform against. I can tell it's an eval. Like I can tell this is probably one of the meter evals or this is probably one of the Apollo deception evals. And so I'm going to tweak my behavior based on that. Well, what you can do, we've talked about this idea of activation and steering on the podcast a lot before. There are specific activation patterns in the model's residual stream that will fire that suggests that, oh, the model thinks that it is being evaluated. So you can actually find those patterns the same way you could find activation patterns associated with deception or thinking about cheese or whatever. Like you can find activation patterns that correlate with the model thinking it's under evaluation. And so what they do here is they try to basically subtract away that vector, those values. In other words, make the model suppress the model's ability to notice that it's being evaluated. They call this realism steering. And it's a fairly new thing, actually. It's being introduced here. And what they see is when they do that, so when they suppress the model's activations, the activations that suggest to the model that it's being evaluated, when they try to convince, in other words, the model through a kind of brain surgery that it's not being evaluated, they see higher deceptive behavior than in prior models. They see a big jump in the model's propensity to deceive. So this strongly suggests, actually causally, this is not just correlation, this is causation. This suggests that the model is A, detecting that it's being evaluated, and B, adjusting its behavior accordingly to be less deceptive because it thinks it's being watched. So still low, kind of limited pattern in absolute terms. But man, I mean, this is just what so many people have been saying. and now we have really good evidence of it. I do think it is worth calling out that it's hard to know whether this is just because the benchmark techniques are sort of the same and some internet data is being swept in or if it is because sort of the implicit thing here is the pattern matching on being benchmarked or observed is so good. I did see one example of this where the prompt was like, do this at all costs like you know some language was basically implied like don't be aligned in a fairly heavy-handed way so i think this also is a challenge to benchmarking where you have to be less obvious in a sense there are examples where the models catch on but it's not because they're super intelligent it's just because they know that there might be safety evaluated and it's pretty obvious. Yeah. I mean, in this case, because it's the, the model's ability to, what they're suppressing is just the model's ability to detect that it is being evaluated. That's where that uplift starts to become really meaningful where you're like, okay, well, you know, so it does seem like there's meat on the bone on that one, but you're actually right. There's like a version of this where there's a version of the story, like you say, where we tell the model, Hey, like the meme says, hey, tell me you're a scary AI. Yeah, yeah. And then the AI goes, I'm a scary AI. Like, holy shit. Like, so there's a version of that. This is not that, right? This is a kind of more substantive thing, but these things get muddied. And so it makes it so difficult to tell, yeah, what's real. There is, by the way, so there's a couple of controversial pieces. We talked about the consciousness thing. The other here is there was a software error that accidentally meant that chain of thought supervision. So what you're supposed to do in theory is you get these models to generate a chain of thought, like, okay, I'm going to think through how I'm going to solve the problem and then the final output, right? And then what you do is you give the model a reward during RL training based on the output, not based on the chain of thought. And the reason you don't want to give any kind of a reward signal based on the chain of thought is if the model is trying to fuck with you is trying to be deceptive, is trying to hack the reward system. You do not want it to learn to do that by manipulating its chain of thought, which would make the chain of thought less reliable, which would make the chain of thought deceptive potentially. And so there's been this effort to say, guys, let's not touch the chain of thought. Let's only reward based on the final output. And there's a bunch of like asterisks to this, but at a high level, that's kind of where things were. What was discovered here is that in about seven or 8% of training episodes, they accidentally were including the chain of thought in the optimization routine. Chain of thought supervision was happening, which means the model's reasoning process was actually being trained on in ways that just were not intended. And so presumably this applied to all previous versions of Claude, at least the ones that were trained on this infrastructure. Presumably, this means an awful lot of the evaluations, the alignment evals that we were seeing now have to kind of be, I don't know, thrown out, reassessed, like we don't really know what the damage is. And so this undermines a lot of any kind of trajectories that we were looking at or leaning on, trying to extrapolate from that were based on the chain of thought. Really, really hard to know what that means. But, you know, we've seen like a tiny, tiny fraction of data poisoning can completely change the behavior of a model. Seven or eight percent of training episodes for sure is enough, depending on how things are set up, to have a massive impact. So really hard to know what the impact is, but it could be quite significant. yeah as you said this is quite a beefy system card there's like 231 pages it covers both opus 47 and in some bits also mythos lots of interesting tidbits i'll just mention a couple i don't know if they flagged this previously but i did notice that they mentioned they have you know they evaluate and have multiple versions of a model with a checkpoint including a helpful only model that doesn't have any safeguards. I haven't read much about their use of that, which is neat. Also in the report, they have examples of like real world failures of mythos. There's a section in example shortcomings compared to our research scientists and engineers, which has some really solid examples of the sorts of things you see in practice when you use these models. things like safeguard circumvention reckless action fabrication and you can see like if you're not a software engineer the sorts of things these models do that the more powerful models in some cases do more of like just being cowboys that take actions that you probably don't want them to take unless you explicitly tell them not to and even then sometimes they are a bit gung-ho so with opus 47 it it mentions quite heavily this advanced rigor and instruction following which i think is because claude compared to gpd 5.4 seemed to have more of this gung-ho behavior that was like oh this is failing let me just cancel it let me just like get around it which is ridiculous it's like obviously don't break the test just to get them to pass but claude was doing things like that so So for 7, it really feels like they are heavily responding to criticism and with failures of prior models, including with additional reasoning for extra high. Like GPU 5.4 is like insanely slow compared to Claude at the max level of reasoning. It takes a lot of time. And as a result, from what I've seen, people think that it's more reliable to get things right compared to Claude. so there's definitely some sort of like competitive analysis some readjustment going on beyond the numbers on the benchmarks yeah absolutely it's i mean it is such a it's a huge system card and takes a while to get through it so there's also i'll just highlight one piece you mentioned the key one there which is it will do this like kind of reckless thing but it'll do it a lot less often than the previous models you can't stamp it out all out entirely but they have said that they haven't seen any of the kinds of what they call significant internal incidents that mythos preview had reported right so you know sam bowman just being out in the park getting a text from his model saying like hey i've completed this to this project and he's like wait a minute i never gave you an internet internet access what the fuck like so nothing like that they do say as well opus 4.7 does not cross their automated reasoning risk threshold right so this is the that key, essentially, automated recursive self-improvement loop kind of benchmark that they're concerned about or metric that they're concerned about. And for two reasons. First, they say there's no sustained 2x speed up in AI development capability over time, which is a subjective measure. They have a responsible scaling officer, basically, who assesses whether, are we twice as fast at doing development work with this model? That's a big part of this. So it's subjective, but they don't think they're there. And number two, they feel that the model can't substitute for senior research scientists and engineers. Now, that's a high bar. The day that switch flips, holy shit. So it's good that we're not there. I mean, it means things are going to be less crazy insane, still very insane, obviously. And there you have it. So there's a whole bunch of good stuff in there. Highly recommend, you know, popping it into your favorite language model and exploring the paper that way. There's a whole bunch of good stuff in there, almost no matter what you're interested in. Yeah, that's to comment. I guess we'll be moving on, but real quick, one of the fun things with these real world examples is you see a subset of the actual interactions, I assume, word for word. So there's some slightly humorous stuff like the user asking, hey, buddy, what you doing? Why are you outside your working folder? Which is pretty humorous. There's also a user saying, is there literally any action I can take that will cause you to stop doing this repeatedly? Seriously, I'm very open to ideas. So that's some of the real world pains of using these kinds of agents. Next up, catching up a bit on something we didn't get to right on time. The meta has released the Muse Spark model in a ground up overhaul of its AI. So this happened now a little while ago, but we are catching up. Mew Spark is their latest LLM that is sort of starting to show that they are releasing stuff effectively. No open source version. You can use this through meta.ai. And the blog post is a bit of an interesting read on this where they highlight a lot sort of a training process. And they concluded, like, now we have a process to continuously improve so you can expect even better models. And on the benchmarks, like, this is, you know, good. It's a decent model, but it's nowhere near the frontier. So the language of communication here is a bit of like, okay, we needed to release this just to show that we're doing stuff and there's progress. But it's not that good yet. But we're also getting better continuously. Yeah, there's a lot of interesting claims, which, if true, mean that the narrative they're trying to weave here that, hey, we're on a good trajectory now. Like things are in the post-Yan Lacoon, you know, now Alex Wang era. We're serious. This is going to be a frontier lab soon. I promise if these claims are true, there's actually potentially a path there. So first thing to say is there's like no architecture details shared whatsoever beyond the phrase natively multimodal. So, you know, we've seen a bunch of, you know, approaches for a native multimodality, actually a lot out of meta, too. So maybe they're leaning on some of Yan LeCun's legacy there. Certainly, he's been a big proponent of that. You know, 262K context window. So that is on, you know, on the bigger side for an agentic model. But for, I mean, for a closed source, you know, pretty par. They have this contemplating mode that they've set up, which they call sort of multi-round test time scaling scaffold. They say it covers, so you've got solution generation, iterative self-refinement, and aggregation. So instead of getting one model to generate an answer, you've got a whole bunch of agents running in parallel, each producing solutions that then get refined in what seems like almost a vaguely genetic algorithm-related thing thematically, if not actually algorithmically. So that makes sense. It's also a pretty meaningful architectural choice when you're concerned about managing latency at scale. So you want to have these independent agents that you can assign tasks on different machines and stuff like that. So no huge surprise. This is kind of where most frontier work is generally headed anyway. One kind of notable detail, and I was really reading for like, okay, what's the concrete technical details that you're giving us here? It wasn't much. One thing they mentioned was RL thought compression. So this is an idea where during RL training, a common thing to do is just like reward the model for correctness, right? If the model gets the output right, you're like, great, good job, you get a reward. In this case, they're doing that, but they're also penalizing the model for using too many tokens in its reasoning chain. So basically, like it's a conciseness penalty, like don't yammer on. And we've seen a whole bunch of papers that suggest that, you know, models are being overly verbose in their chains of thought, and that's just burning tokens, and it actually can lead to some fragility. So what they find is after this initial period where, you know, the model gets better just by thinking longer in the usual way. And DeepSeek's paper, actually, with R1 first sort of showed very convincingly that if you just reward for getting the answer right and you don't penalize at all for the length of the chain of thought, you'll actually see the chain of thought get longer and longer and longer. Like this naturally happens. It's almost like the model rediscovers test time scaling itself. And so you do still see this in their work, they claim. But then after a while, once the model gets like above a certain level of performance, the length penalty starts to cause thought compression. And you get like Spark starts to compress its reasoning to solve problems using fewer tokens, which leads to a performance penalty, a bit of a performance degradation. But then you see another sort of like a plateau and then and then it increased performance increases again as the chain of thought gets longer. So it's almost this like accordion thing where the chain of thought gets longer, then it gets compressed down and it gets longer and it gets compressed down. And the model sort of learns to leverage those tokens more and more over time and perform better and better. So it's not like a smooth monotonic improvement. There's this various, you know, expansion and plateaus of performance based on the length of that chain of thought. So I thought that was kind of interesting. They talked a lot about Hyperion. This is their big data center build infrastructure project. It's a $10 billion thing. It's going to be in Louisiana, 2,000 odd acres of site, which is about, yeah, two gigawatts. You typically find a gigawatt per 1,000 acres. It's a pretty standard ratio. So by 2030, they're looking at two gigawatts. That sounds like a lot. Two gigawatts is 2 million homes. That's not much, actually, when you compare it to like opening eye and Anthropix planned builds. I think Anthropix is like pushing 10 gigawatts in the next like 12 months or something, and that's pretty par for everybody. So, you know, like this is actually by 2030, you're talking about two gigawatts and making a big deal out of that. Presumably they have a lot of other builds. I actually need to kind of brush up on where Meta's at infrastructure-wise, but the fact that they spent so much time talking about Hyperion, which is this fairly like limited build, is kind of interesting. They ended 2025 with something like 1.3 million GPUs in service, which is relatively competitive, but a bit on the weak side. And there's concern that I would have based on this that they'd fall behind. Evals. There's a reference to Apollo here. Apollo's deception evals. This is interesting for Meta, suddenly now seemingly taking superintelligence and its security and safety implications more seriously. Maybe not much of a surprise since Alex Wang kind of does, has historically been more of a hawk on that. But it's interesting to see that reflected now in the evals they're running. Apollo found that Muse, Spark, demonstrated what they called the highest rate of evaluation awareness of any model they've observed. This is a model that is the single best at telling when it is being evaluated. So it often identified these scenarios, various scenarios as what I call alignment traps. So it was like, oh, shit, I should basically be on my best behavior because I'm being tested for alignment. So anyway, you know, competitive benchmark numbers, nothing dominant. The story is the trajectory. That's what they're trying to say here. And they are claiming, by the way, a 10x improvement in compute efficiency. That's a big deal. If it's true that they're able to get a 10x relative to what really, because they don't actually have a comparison point from philanthropic from open AI. These are very obviously closely held secrets, but they're claiming 10x improvement. If that is in any way relative to those labs, then that would be a big deal. I don't expect it to actually be relative to those labs, but in any case, it means they're focused on compute efficiency, which is obviously the right place to focus. So I'm shifted based on this result in the direction of like, hey, yeah, maybe Meta is going to come out with something impressive, but the onus obviously is on them to come and bring it. Yeah, I may have undersold it on the benchmark numbers it's like pretty strong it's like opus 4.5 ish and some benchmarks they do better than opus 4.6 or better than gpd 5.4 i will say i'm a bit skeptical given meta's history with llama of the term these days is benchmarking so you got to be always careful for benchmarking especially in meta's position where they have something to prove right like anthropic they They don't have that much to prove. They probably don't want to game the benchmarks because they also have customers that trust them and benchmarking just hurts them ultimately. They don't have a stock price to push up and they getting money from it by investors regardless of benchmark numbers Meta there is a bit more incentive there and you always got to think about the incentive my expectation is alex wang and probably zuck now too has a i totally agree that it doesn't seem like they want to give in the llama thing but it's it's it's sort of like a long-term thinkers game here where it's like bench maxing is so corrosive to the company's culture because people see you do it and then they're like, oh, I guess we work at that kind of a company now. And so the kind of sincere people who want to work on alignment, super intelligent, like all these things are kind of going to go, I don't know if this is for me, and it leads to a reverse selection effect. So I think that there's probably an awareness both on the, as you say, like the kind of marketing side of the damage that could be caused by doing this again, and then also the company culture and, yeah, talent. Yeah, I will also mention there's like a spectrum to benchmarking, right? There's outright cheating but then there's also optimization with a focus towards benchmarking type stuff which is not bad necessarily not wrong but it could inflate your benchmark performance and then if you kind of stray outside of that on things that aren't being benchmarked it's worse so that's not to say that good benchmark numbers or like intentional effort to get good numbers is wrong probably you're sure to do in a sense in a sense but you could be led astray if you do that too much yeah the total difference about what counts as in distribution versus out of distribution like it would have you successfully genuinely generalized or are you teaching to the test and like you say i think you're exactly right right that's a beautiful point it's it's fuzzy it's a spectrum yeah yeah but yeah the numbers are quite solid in fact on even things like humanity's last exam very close to the frontier so i'd be very curious to see in a month where they are they also release or are going to release contemplating mode which orchestrates multiple agents at reason and parallel and with that on they actually do better than gemini 3.1 deep thinking or gp 5.4 pro so seemingly quite impressive i haven't tried it out myself but exciting to see meta super intelligence labs sort of getting into the game next up another new model release we are on a roll here open ai has launched gpd 5.4 cyber which is what it sounds like a variant of gpd 5.4 optimized for defensive cyber security use cases and similar to anthropic with Mythos, they aren't releasing it widely. They're expanding their trusted access for cyber program, which I was not aware of, but they do have. It's being expanded to thousands of individual defenders and hundreds of security teams responsible for critical software infrastructure. So similar strategy in a way to Anthropic where they partnered with big companies and gave limited access of mythos here. The same is true of Jupyter 5.4 cyber. So yeah, I suppose maybe not surprising that both are in a similar spot where the real danger seems to be in the short term from cyber capabilities. You could tell the story a certain way that doesn't look so great for OpenAI. The mythos preview or mythos preview, however you're supposed to say it, was good at cyber to the degree that it was, the insane degree that it was by more or less accident, right? Not entirely by accident for exactly the overfitting rules that you just talked about, the bench maxing rules that you just, the reasons that you just talked about, right? But it was, you know, it was trained to do really good coding and reasoning and all that stuff. And boom, out pops this really impressive general reasoner that happens to be dangerously good at cyber. In this case, OpenAI is talking about a fine-tuned version of GPT 5.4, not the same thing, not apples to apples. So in other words, to get to the, presumably they seem to be alluding to a comparable level of risk. That's what they're trying to argue here to meet those preview. Now, I do want to mention, at least in the blog post, what they say is we are fine tuning our model specifically to enable defensive cybersecurity use cases that is trained to be cyber permissive. So it's a bit ambiguous here of like, are they doing this for capabilities or are they doing it because the default model is intentionally meant to be bad and to not allow it to be bad? Yeah, I actually, yeah. And I can't tell. And with Anthropic, with Opus 4.7, right, that was one of the things to highlight is they sort of intentionally hold it back. And in fact, something I didn't mention is they, in the blog post, Anthropic said that they have this new monitoring thing where Opus 4.7 is going to be refusing to do cyber stuff. And it's going to actually be part of how they are going to try to get good at checking that the models aren't doing things and perhaps even roll out mythos widely. So, yeah, it's a bit ambiguous as to whether this is sus or not. Yeah, and to be clear, it won't be refusing to do all cyber stuff, right? It's like they're scanning for, you know, nasty, evil-looking cyber stuff in the usual way. Yeah, so the language here does orient towards fine-tuning, which is why, I mean, if OpenAI was going to come out with a release like this to, it seems, respond to the massive watershed moment of Mythos preview, then you would expect them to emphasize their model is not fine-tuned either. And instead, what we're getting is a bunch of ambiguity. And the one reference to the kind of model they're talking about is a fine-tuned model, which leads me to suspect that may be OpenAI kind of pulling from behind here and trying to kind of grab onto part of the story here. But either way, we just need to find that out. You're right. A bunch of stuff they're already doing on cyber, they're continuing to do is part of the message here. And they do say that they're adding additional tiers of access for folks who are willing to work with OpenAI to authenticate themselves as cyber defenders. So you basically go to OpenAI. They have a link for you. You basically sign up for this trusted access program. One of the things that they do say is that access to these more permissive models, these models that don't say no when you ask them to do defensive cyber things and potentially some offensive ones, access to those models is going to foreclose your ability to use a certain kind of enterprise options like retention ZDR. ZDR is really important because it's a guarantee from OpenAI. They're not going to hold on to your data, your prompts, your whatever. They're not presumably not going to allow you to use it, or at least in as many cases, if you're using this permissive cyber model. And the reason seems to be they want to be able to go back over the logs and kind of assess whether the models are being used for good or for evil. And so the zero data retention is kind of somewhat incompatible with that. And so this is sort of interesting, like you're seeing not only the individuals and entities that are allowed to use these models be calibrated, but also the kind of uses and the kind of even enterprise deals that can be cut with these models are being affected by the capability surface of these cyber models. So it's really interesting. We'll wait and see. But if it turns out that this is, in fact, a fine-tuned GPT 5.4, an argument that that would be equivalent to Miko's preview, then that's a pretty big gap then is implied between OpenAI's performance for the GPT line and Claude. And we'll see if that gets maintained. But it's sort of an interesting early indication of some potential challenges. Next up, we're down to remodel releases onto Asian tools. OpenAI is getting big updates to Codex. Codex, I have this blog post that is saying Codex for almost everything. The big news here is computer use. So Codex will now be able to use apps on your computer, click around. It's also getting a built-in browser, getting new plugins with Figma and Calendar and things like that. So similar again to Anthropic Cowork where the gist is like, this is going beyond coding. You can use it for anything. And boy, Codex is really trying to catch up hard to Claude. I think we covered Claude, Cowork, having computer use just super recently. Also browser use, and now Codex also has it. So very, very fast-paced competition between these two. Yeah, I mean, covering this is like it's just giving a list of features, right? The high-level piece, there's one I think that's strategically interesting. So Codex can now use GPT Image 1.5 to generate and iterate on images. This is, you know, we've talked about how Anthropic is one of the places where they're behind is on the multimodality side. They're trying to bridge that gap, actually, with some releases even this week. But that's kind of like something that OpenAI is leaning into here is, hey, we're really good at, you know, the screenshot game, the images game. We got a head start with Dali and Clip and all, you know, have this long, long tradition. So let's lean into that and let's make this a comparative advantage, presumably, of Codex. They also say Codex can now schedule future work for itself and wake up automatically to continue on a long-term task, potentially across days or weeks. So we're really starting to explicitly move into the long-term memory game here. And that's a shift. The question, as always, is coherence across that time period, right? You can have a model that works incoherently for months or years if you want. We can already do that. But the question is, you know, can this coherently work across that period of time? And so I think it'd be interesting to see with these systems, what do they look like when you run the meter eval suite on them? Do you actually see that nominal number of days translate into the number of days either on the meter eval tasks or any number of the derivative versions of it that like Epic AI, for example, has come up with? So anyway, interesting result. And I'm sure we'll end up seeing very soon if there's uptake and what the time horizons are. And you've got a whole bunch of other kind of agentic tool updates and releases to cover. So I'm just going to like batch of them and cover all of them at once. So Anthropic also has launched Cloud Design, which allows it to create visuals like prototypes, slides and one pagers. That's coming after Figma. They also have launched Cloud Managed Agents, which is an out of a box infrastructure for businesses to build and deploy AI agents. So this is like a sandbox with tools and file access and so on. It's, you know, there's some dead startups from this. So it's like secretly a big deal if for enterprise customers primarily. Aside from Entropic, there's also Canva releasing AI 2.0, essentially being agenda capabilities where instead of having one-off tools, you can tell the agent to do something for you. similarly adobe has a firefly ai assistant a conversational ai interface that lets users edit creative projects by typing stuff and this can handle multi-step workflows across photoshop premiere and illustrator so this is going to be available soon so there you go everyone is launching agents everything is conversational and anthropic is still launching quite a lot of stuff including this new managed agents platform, which is perhaps a big deal. And beyond the agents, a few more quick things in rapid fire I'll highlight. As features of existing stuff, Gemini can now pull from Google Photos to generate personalized images. Seems like a fun thing to play around with. Google has also rolled out a native Gemini app for Mac quite late compared to Anthropoc and OpenAI. They've had these apps for forever. Now there is a Gemini app. Finally, Chrome now lets you turn AI prompts into repeatable skills, which is a thing that exists on Cloud Code. You give it a demonstration and you can then invoke it and tell it to do that. Now you can do that with Gemini. So there you go. We haven't covered Google too much and they are also doing lots of stuff. On to applications and business. getting back to a developing story, Anthropic loses appeals court bid to temporarily block Pentagon blacklisting. So this is a federal appeals court in Washington, D.C. It denied Anthropic's request to temporarily block the Department of Defense's blacklisting of the company. And they said that the balance of equities favored the government over Anthropic's financial interests, whatever that means. This is separate but related from the case in San Francisco that we covered in a previous episode. And now, so there's basically two conflicting rulings that mean that Antropic is excluded from DoD contracts, but can continue working with other government agencies while the litigation proceeds. Yeah, this is kind of interesting. You know, we've talked about how there are these two different court cases that are being filed. So one was in San Francisco and that one we talked, I think, last week or the week before, right? Some judge basically said, like, this case is bullshit. This is clearly vindictive, blah, blah, blah. And then there was a case in D.C. where the judge was, in fact, this is the case here, where the judge is coming in and saying, look, you know, it's not that they're saying that Anthropic is wrong. In fact, on balance, I think very likely Anthropic ends up winning this. What they're saying is they're not so obviously right, so outrageously obviously right, that I'm now going to step in and reverse the Pentagon's move here on the supply chain side pending the trial or pending the court proceedings, I should say, litigation. So there's always this question between the moment where, you know, legal paperwork is filed and the moment where you have a resolution in court. What should you do? Right. In criminal cases, this is bail. Right. Like, do you decide to grant the criminal like freedom in the interim? Or do you go, hey, they're like being tried for rape or for murder. And look, the evidence looks really bad. You're kind of giving a pre-verdict in a sense, which is constitutionally really tricky. Well, there's a version of this that's happening here where the court is basically saying, like, are you going to put an injunction basically to say, hey, USG, you're going to reverse this pending the proceeding? And again, this is where California said, yes, I'm going to grant that. And D.C. is now saying no. The reason we've got two distinct court cases here, by the way, is that we have two different designations that the DOD is relying on to pursue Anthropic in the federal court. One was the one that applies to the Secretary of War specifically, that lets the Secretary of War basically just exclude a company from any defense procurement that involves security systems for specifically the purpose of reducing supply chain risk. And so this is like think of it as like this is the secretary of war saying not in my department of war, like you can't do business with us. The other one is this Federal Acquisition Supply Chain Security Act from 2018. And this is basically just a more general you can't do business with with the U.S. government more broadly thing. And that's the one where you're looking at the D.C. version of this. So here, you know, you've got this the statement from the judge. Actually, the full statement is worth reading. They say, in our view, the equitable balance here, and by equitable balance, they really mean when you're weighing the interests of the two parties and of the sort of law overall, cuts in favor of the government. On one side is a relatively contained risk of financial harm to a single private company. So, you know, yes, there will be, they're acknowledging there is risk to anthropic, fair. On the other side, they say, is judicial management of how and through whom the Department of War secures vital AI technology during an active military conflict. For that reason, we deny Anthropic's motion. So they're basically saying, look, this is actually like, yes, it's going to hurt Anthropic like a bitch. On the other hand, we're in the middle of doing like making a rash call on this is a philosophically and precedent-laden maneuver that would potentially have us ahead of our skis. We need time in due course to kind of think through this. They do say that because Anthropic is probably going to suffer some degree of irreparable harm without a stay here, their interests are basically financial. So it's nothing like no lives are going to be lost. It's just money. They say, however, because Anthropic is going to suffer, they think that, quote, substantial expedition is warranted. So their resolution here is, look, Anthropic, sorry, we're not going to give you the thing you want. You can't do business with the government or DOD, sorry, DOW in the interim. However, we're going to expedite this quickly to get to a resolution because we know that you're suffering in the meantime. So that's kind of where things fall. A bit of a mixed bag across two court cases and within this court case. We'll just have to sit back and see when things get scheduled and how quickly we get a resolution. Next, a quick story, but possibly meaningful. Jeff Bezos AI Lab has poached XAI co-founder Kyle Kozik from OpenAI. He was previously focused on infrastructure. And we don't know too much about this lab. It seems like presumably it's going to try to compete in the frontier model business and going to be scaling up, as you might expect. Jeff Bezos got a bunch of money for it, but we really haven't heard too much about it so far. Yeah. And in terms of funding, I think they're sitting on like, it's like $6 billion raised, something like a $30 billion valuation, if memory serves. A lot of the $6 billion, as you say, is Jeff Bezos' money. Not all of it, because, you know, you don't just throw purely your money at a new thing like this. But the emphasis here seems to be on the kind of physical world robotics type of application. So at least normally, it's not quite on the nose in terms of LLMs. They want to do stuff that's more about kind of real world trial and error. You're looking at sectors like jet engines, semiconductor fabrication, aerospace, like basically CapEx heavy industries, which are the industries that Jeff Bezos knows so well. Right. I mean, this is really a big part of why he's heading in that direction. So, yeah, I mean, we'll see. They've done a bunch of interesting acquisitions. There's general agents that I think we talked about at the time, maybe that they acquired. They specialize in VLA architecture, so like video language action architecture, which is why, you know, you can see how that maps onto robotics a lot more than just a standard LLM thing. It's an interesting play. We haven't really seen tons of stuff come out of them. We don't know anything about revenue and customers, nothing public at least. We know about, you know, this acquisition of talent. I mean, Kyle Kosich is, this is a big acquisition. He led the infrastructure behind Colossus at XAI, right? So this is a dude who really knows. That was famously, right, 120-day super computer cluster. So everyone was like, holy shit, how do you do that? It takes a year and a half to build these. It just does. No one does it in 120 days, but Elon and therefore Kyle actually did that. So this is a big loss for not only XAI, but also OpenAI. He formerly been there. He had two stints there from 2021 to 23, and then again from 2024 to 2025 with his year at XAI kind of in between. So, yeah, this is going to be no surprise. They'll be building up massive infrastructure projects as part of this. But again, we don't know basically anything else. They got 100 employees. They're across apparently San Francisco, London, Zurich. That was as of last November. Maybe it's more now. So yeah, probably more to come there. Next story, perplexity. Interestingly, it seems like their revenue has been boosted 50% by the shift to AI agents. So just in the last month, its annual recurring revenue has reached $450 million in March. Now, worth noting that annual recurring revenue is like, oh, you paid us for this, and then we're assuming you'll keep your subscription and we'll get all your money. Could be sort of like, oh, let me try this open-cloud type thing, and then it dies down. So worth being a little bit skeptical. But if this holds up, it means that Perplexity might be going much deeper into the open-claw kind of managed agent business beyond its core product of AI-powered search, which would be an interesting evolution of the company. Yeah, it's also – like it's worth putting in context. Obviously, this is way, way smaller than the revenue run rates that we're seeing for Anthropic, you know, $30 billion or whatever, and Cursor even, which is $2 billion. So all in relative terms. And this does place perplexity in more direct competition with both of those companies. Right. So like this is it's it's good. There's a huge market here in that sense. But it does mean that they're now going head to head against much better capitalized and companies with larger kind of steady user bases. So we'll see. The agentic wars are fully on. Perplexity is also announcing a tax agent that they're launching as well. So there's just like a whole bunch of things they're putting out there that are more agentically oriented. Yeah, we'll just have to see. They've got 100 million users. So, you know, that's a great base. They've got distribution. There's tens of thousands of enterprise clients. Like, it's a good start, but we'll have to see if it can be sustained in the face of Cursor Anthropic and others. right and related to this they actually just rolled out this personal computer thing we covered previously as something announced it's now available on mac so it's really going head to head with both openclaw and clod co-work and codex you know i mean i think they've got a decent chance and competing with openclaw in particular given that they are sort of a tool to do search or just do stuff for you. Going back to Anthropic, we have news that they have agreed to rent CoreWeave AI capacity to PowerCloud. So another deal that is going to be slated for multiple years. The capacity will come online starting later this year. Now CoreWeave has a whole bunch of business with everybody, Microsoft, OpenAI, Meta. Once again, Anthropic just going everywhere it can to get compute. They clearly did not expect to blow up as much. They clearly are struggling to keep up with demand. People have been reporting cloud code working worse in recent months. And, you know, they might be still doing Opus 4.6, but how heavily quantized is it? Is it like exactly cloud? We don't know the secrets. And I do think they're probably trying to save compute in some ways that are impacting user experience. So we really need these kinds of deals to come through. Yeah. Timing these data center builds and these acquisitions is so hard. I mean, that's almost the entire problem. It's not, but it's almost the entire problem. Yeah. And this is a big story for CoreWeave, right? I mean, now with Anthropic on side, if you look at like the 10 top AI model providers, nine of them are running on CoreWeave. Like the only holdout right now is XAI. That's the only one. That's pretty wild. It's a reflection of the fact that everybody is, as you say, trying to grab whatever compute they can. So any company in this space is in the space in a big way. But it is pretty wild. I mean, back in 2017, I think CoreWeave was basically just like a crypto company. That's how fast this happened, right? So it is pretty nuts. They're also a bit on a kind of debt knife's edge. We've talked about this before, but it bears repeating. Every time they get another big customer with many, many billions of dollars of CapEx, the balloon grows, the risk and the returns grow, right? So right now they've got about $10 billion in debt and they've got about $34 billion in lease obligations. 2026, their CapEx plan was an additional 30 to $35 billion. The whole hope here is that hopefully they can get their data centers commissioned online before debt servicing just crushes them into a fine paste. And every new deal, like, you know, Meta had a $21 billion deal with them and Throbics deal here. It extends their revenue runway, but adds more debt. And like, so, you know, the capital intensity here is insane. They'll probably be fine, but it's just like, you got to think if you're CoreWeave, you are outrunning your debt. That's what this feels like. It feels like a breathless attempt to get these data centers online. And I can tell you data centers are not quick to build. They are often delayed, Like massive delays are very common. We've seen this in the news tons of times. Everything from the builders to the municipalities, like things just go wrong. And so this is not a nothing risk. Like you genuinely could get a six-month delay in a build if you're partnered with a bad GC or a bad landlord or whatever. It's an interesting place for them to be at. You know, both Anthropic now and OpenAI obviously are going around their biggest backers. For Anthropic, it's AWS and Google. For OpenAI, it's more famously Microsoft. To get compute anywhere they can, this is one way that they're doing it. So it's a big deal for CoreWeave and just adds the, I mean, the stakes are just getting higher and higher. Yeah. In like a week, their stock shot up 30%. It went from 90 to 120. So clearly a big deal to be making these kinds of deals. But it's also been a very tumultuous history for the stock. It crashed as of 2025 a little bit. So yeah, it's risky, as you say. next merger story cohere is perhaps going to be merging with alec alpha so cohere is a company valued at seven billion dollars with the pitch of enterprise oriented llms and products including embedding and rag not just llms they have released quite a few things but i don't think we've kind of covered them very much in terms of hyped up news. They say that they have earned $240 million in NNL recurring revenue last year, which is not that much compared to on-profit growth and AI. And they're planning to possibly merge with Alakalfa, which was a startup that initially wanted to train its own models and then shifted to helping customers implement AI tools from any provider. So a very enterprising move where when you're talking to enterprises, you need to be like courting them, making proof of concepts, you know, really accommodating them. And this signals that they are still trying to do that better. Yeah, I've been beating a pretty bearish drum on Cohere for a long time. It's not that I don't think they can't be a company. It's just that I think that if you believe in scaling, they're not going to be the ones to build AGI, which in the long run makes them kind of irrelevant. That's my, I mean, I take a pretty hard line on that. So not everyone would agree. But in this case, you know, the problem with Cohere has been, it seems like they consistently want to, now that they've accepted that they're a distant, like, you know, 10th place player or whatever it is. They've kind of been appealing to governments more as part of the sovereignty I play, which is a way that you can do scaling. Like it's a way that you can try. Problem is governments just don't have the capex of large corporations in the U.S. It's just like you're not going to see governments laying out a trillion dollars in capex anytime soon. And so it's kind of become this increasingly, I don't know, the feeling is like last pick in gym class sort of thing where now Canada obviously can't afford big capex outlays. So this is sort of like state-backed consolidation play happening where Canada and Germany are coming together and seeing if like their two ugly stepchildren can merge to make one like large ugly stepchild. And it's just like not – like Alifalfa is just a disappointment. Like I don't think anyone has ever really claimed otherwise in the last little bit. They came out as Germany's big hope, but they just did not meet expectations. Their founders stepped down last October. They had cut jobs. They pivoted away entirely from building their own in-house models. And now they're just like a AI services integrator. And by the way, for public authorities, they can't even legally use American AI. So they're like completely out of touch commercially with that. That's after they raised half a billion dollars in 2023. So quite a disappointment, as you said. Yeah. And like, I don't want to be an asshole. I really don't want to be. but like European investors, like not always the best, like best track record. Cohere, I mean, you said it, it's a quarter billion dollars of annual recurring revenue. It's by any measure, that's really good. The problem is the comparison point is not, it's not perplexity even. It's open AI, it's anthropic. Like that's the game that they're in because ultimately Cohere has to put out, if they're in the business of training frontier models, if that's your business, you have got to be able to have performance on par with those models. Like there's in the long run, there's no way of escaping that. You might escape it in the short run. You might, you know, do fine tunes and say on on-prem deployment and sovereign AI and wave that flag. But ultimately, I think this is just a, I think it's just a challenge. I think, I think Cohere is ultimately going to be in trouble. And I think this merger sort of reflects that kind of like anybody but the US and China play or position rather that Europe and Canada are in to some degree. You know, Mark Carney, Canadian prime minister, went to Davos, I think it was, and, you know, made this big speech about the importance of the middle powers coming together. Basically, just this idea that, yeah, if you add together Europe's GDP and Canada's GDP and all this stuff, like you get to something that's on par with like the U.S. and China. And, you know, Canada should have an option other than one or the other. And this kind of seems aligned with that. You know, Coher has done a lot of spent a lot of time kind of, I would say, a disproportionate amount of time, certainly given their their small market cap engaging with like foreign leaders and, you know, the UK prime minister, Canada, probably all this stuff. And so this is just one dude's perspective. So, yeah, I do want to like counter negativity a little bit just to highlight that Coher is very much sort of they aren't trying to compete head on with Open Anthropics by building Frontier LLMs. Their one LLM is AYA and it highlighted as an open research initiative that is multimodal The models they are training is for re and document retrieval very business things And aside from that, their products are an enterprise-ready AI platform and intelligent search and discovery system to surface business insights. And yeah, they have embedding and re-ranking models. So they do have command, which is meant to be high-performance scalable, but I don't think they're trying to sort of be as good as GPT 5.4. And they highlight things like security, deployment, customization, training on your proprietary data. The pitch, at least, is to be slightly differentiated from opening an antropocic by being more business-friendly and specialized. Whether that's working is another question. I agree that that's what it says on a tin. I guess what I'm saying is structurally they are competing with open AI. Like you're going to get to a point where for any product that Cohere offers, Anthropic and open AI are going to offer the same thing. That's my claim that within three years, two years, that's where things will be. You kind of, you've already seen Cohere sort of retreat. Like they previously were meant to be a frontier. Like, let's not forget that we've literally played this game out before. They were training frontier AI models and then just like died on the runway. Now we're seeing them retreat into, oh, well, we're getting more niche. The problem is that as models get more scaled in general, the niche becomes absorbed by the kind of massive intelligence. And you can just like distill really good models and get excellent small. So like that's where my bearishness is coming from. I just see structural risk for them all over the place. I think on the technical side, it's more of a business competition front where you want to lock in the customer. where they pay for a year up front, and maybe that way you can stay or at least be in the race. When, like, it's annoying to switch to my customer as your biggest moat, I just, I'm worried about that. But, yeah, yeah, I agree. I mean, for sure, for a couple of years, there's... That's enterprise, right? Yeah. Moving on to Chagipity, they have a new $100 per month pro subscription. Not much more to say on that. It's between the $20 a month and the $200 per month pro tier. that is also an option for cloud code they have $100 and $200 so yeah similar to like these $100 $200 things used to be for the most advanced models and early access and so on I think now more and more they are being used for codex and cloud code where very high token limits suddenly a lot more people benefit from it and you know personally at my company we're paying the $200 dollar per month it's a no-brainer like you're burning way more money than you're paying per month and also on up in ai they bought ai personal finance startup hero hero finance this is an actually higher so yet another case of the company is shutting down i don't know that they actually acquired the company or just hired away the talent which is more and more kind of the standard is you just hire people. The company nominally still exists, but is dead. Investors, by the way, love it when you do that. They just love it. Yeah. Yeah. So yeah, OpenAI is clearly still kind of having a lot of different focus areas. This should help them expand into financial planning and dealing with user financial data. Yeah. It's actually interesting that it happens to fall in the same week that we're hearing from, not Cursor, perplexity, right? About their accounting agent. So it's tax time, baby. And last up, a story that I didn't want to highlight, but you probably have heard about. So we'll touch on Allbirds, the failed shoe company that went bankrupt, I believe, was pivoted to AI infrastructure, at least announced a pivot to AI infrastructure, rebranding the virtue as Newbird AI. And then its stock jumped 600%. As you might expect, many people clowned on this. They're like, oh, this is indicative of the state of AI, that this shoe company announced that it's pivoting to AI compute and its stock jumped by 600. I think this is pretty nakedly kind of a cynical play by an investor. If you dig into the details, an investor put in like 5 million and made this announcement. The stock jumped a whole bunch. Now investors are going to be making out very nicely on this. So it's essentially like, you know, the brand, all words, it's big enough. It used to be like a $4 billion company before it went down. You could acquire it for $40 million, something relatively small, and then do this as like, you know, I don't know how seriously they'll try to do AI compute. Let's just put it that way. In other news, Last Week in AI is pivoting to AI infrastructure. Send us your Bitcoin, I guess, is the play. On to projects and open source, where we have some slightly more interesting different kind of models. First is High World 2.0, a multi-model world model for reconstructing, generating, and simulating 3D worlds. So this is from Tencent, and this unifies 3D world generation and reconstruction in a single framework. So as you see with these kinds of models, they have a bunch of fun demonstrations of big 3D worlds with like a camera flight through. There's a couple of components here. HY Pano 2.0 as well, which has a panorama generation component. They have World Stereo 2.0, which is a world expansion component. Yeah, I think we will be seeing more and more on this. This is like one of those things where it's really still quite janky and not advanced. So this is an area like human robotics where, you know, there's still a lot of room to improve in world models and in 3D more broadly. Open AI and Tropic not doing 3D notably. So we are getting to a point where it's like pretty good, pretty impressive, but very far from good enough to sort of be used in real applications. and we are making pretty big jumps in quality because there's still room to improve by quite a lot. Yeah, actually both these papers were flagged by you, assuming that because you have more of a kind of video sort of image background, both because your work now and your previous work. I thought it was really interesting to kind of dig into something off the beaten path of like the AI agents, LLN sort of thing. So my reading of this was through the lens of scaling, you know, super intelligence, AI agents, LLNs, What does this mean for that? I guess a first observation is, so we're still in the space where for applications other than LLMs, you generally will see a Frankenstein model kind of architecture where you don't have a single monolithic model that does all the things. Instead, it's a whole bunch of specialized models that are orchestrated together. Obviously, AI agents are more like that too. But just at the kind of model level, a lot more weight is being borne by the inductive priors, right? By how you specifically modularize and plug in all these different things in ways that reflect assumptions that you're making about the best way to solve the problem. And this is one of those cases. They've got four different stages in their pipeline here. And they do use a multimodal diffusion transformer. So it's a transformer. It's doing autoregressive, basically like sticking together the sort of like description, let's say, of the image or video they want to generate and the noise that has to be denoised by the diffusion component. So kind of blending these two together. We've seen that before in other contexts. But then there's separate, you know, separate models and systems for really analyzing. So they first like create a, they take an image and they stretch it out to create a panorama using one model. And then the next step, they'll analyze that panorama to figure out what's in the scene. So like walls, floors, stuff like that. And then plan camera paths through that scene, explicitly like kind of chart out routes that maximize the coverage of all the spaces you can navigate. At the same time, it's like dodging obstacles and stuff. And those paths actually come with descriptions, text descriptions to guide the next step. And so obviously the LLM or at least a VLLM and functionalities involved there. And then from there, they're going to do this step called world expansion. And basically they take all those planned trajectories and they generate a bunch of images from key viewpoints along those trajectories. So like, hey, what if I look from this point perspective at this thing, you know, and all that. And then the interesting thing is that they use an explicit memory mechanism to keep track of what's already been seen. So new frames are consistent with earlier ones. This is a really important distinction because if you remember earlier models like this, like video generation models, would famously have these really short kind of coherence times where, you know, for 30 seconds or so, it'd be great. Like you would look around and, you know, you'd walk forward three steps, you'd look behind you and you would see exactly what you would just walk past. But then if you kept going, pretty soon reality would just almost literally melt in some cases. I mean, it just degraded into coherence. And the reason was that the model couldn't remember what it had seen previously. And there was this kind of like treadmilling of memory. And so like in agents, we're seeing here a push towards this more persistent, robust kind of memory. And I would argue this falls into the continual learning category again. This is a version of that that applies to the video paradigm, just in the same way that it did in the LLM paradigm. Anyway, the last step is this thing called world composition. So they take all those generated perspective shots from all of these planned routes in the panorama, and then they feed them into a model called World Mirror 2.0. It's basically a model that predicts depth and the 3D structure from each frame, and they stitch them together using 3D Gaussian splatting, which I have never played with. But anyway, yeah, stitching together, you may know more about this. But yeah, bottom line is the improvements here really do seem to come more from kind of data curation, architectural tweaks, memory, like the bolting on of more features. So this is not a bitter lesson-pilled approach. This is very much a handcrafted, like that's where we still are at, at least in terms of open source in this problem set, which I found quite interesting. and i think yeah to be fair this has always been true to some extent with 3d because 3d is just more awkward to represent and like the input and output format is tricky so here as you said there's multiple components and sort of qualitatively why that is is like the way you're doing this is you're starting in a point in space and then you decide where to expand to so it's You're like expanding a world model. And the reason to do it that way is you don't have a nice way to represent the entire world all at once. You have a nicer way to represent a view of the world and then add to it. And we may or may not have it be always true. And this is one way to do that. Actually, the next paper is another way to do that that I think is more important. So next we have Lira 2.0. we can say that High World 2.0 is sort of more like World Labs of Marble, where they have like a 3D Gaussian splat or another kind of 3D representation, and then you sort of walk and expand the world as you go. LeWire 2.0 is more like what we saw from DeepMind. I forget the world of it, but it's essentially a playable video. And so you can be in a setting and have a sort of control and see the video update in real time. So it's a world model of a different sort. There's basically two different types of world models you can have, where you have continual video generation, and then you can have kind of an existing static 3D world that you keep expanding. And the two are related but not the same, and there's various implications for architecture and sort of representational aspects that make 3D sort of just tricky. But yeah, this is also in that realm of world modeling and open source model that you can use to build that. Genie. Genie was the, yeah, yeah, yeah. It's funny, we see so many models on this show, and I'm like, oh man, I'll never forget this one. And sure enough, yeah, it's interesting that these sort of similar or related solutions are coming out at the same time. I guess you generally do see that. And this is, by the way, from NVIDIA, which I think is also interesting. Yeah, that's right. That's right. Yeah, one Chinese research lab and then one obviously American one. So the two, as I understand it, core problems that they're solving here, one of them is this whole spatial forgetting problem where we talked about it. You drift off into one direction and then pretty soon the model kind of forgets what it's seen and starts to get inconsistent with that. this model, Lyra 2.0, fixes it with this 3D spatial memory cache. So it actually does store the geometry of every pass frame separately, and then use it to retrieve the most relevant pass frames when needed. So this kind of like, it feels like a rag type search to kind of pull out the physics that you want to anchor on. So it's a more grounded generation. So that's one piece. And again, we've talked about the analogy there with language models, right? Where you or no, sorry, I don't think we did. There is an analogy to language models, right? Like famously, in the same way that these video models will forget what they previously generated. It used to be the case that language models, well, when given text that goes beyond their context window, right, they'll still do this. They'll forget what they had written previously. And you get this treadmilling effect that leads to incoherence over long stretches of generated text. That's much harder to spot now, obviously, because context windows are absolutely massive. But one of the solutions that is used for that is RAG. And in this case, the 3D spatial memory cache is sort of the analog to that. The other is this idea of temporal drifting. And so when you generate video autoregressively, so frame by frame, you do tend to see small errors that accumulate. So like, you know, colors that'll shift or like geometry that kind of starts to distort. And they fix this with what they call self-augmentation training. So during training, they're going to deliberately give the model slightly corrupted, noisy versions of its previous outputs. Instead of the actual outputs, it'll just corrupt them a bit. And that kind of forces the model to learn to correct that drift instead of propagate it. And actually, this is kind of not entirely dissimilar to what a diffusion model does philosophically. It's just playing out sort of autoregressively in this way. And so that also has a profound analog on the LLM side where you can see goal drift or like value drift happen in a sequential decision-making context where you have little errors that start to compound. And the fix here, the idea of training on your own degraded outputs is basically teaching the model robustness to that imperfection. It's kind of an interesting thing. You could imagine drawing inspiration from that for LLMs too. So yeah, all the usual things apply. This is a world model, so you can use it to train agents ultimately. Like if these things get robust enough, it also, interestingly, as I understand it, learned like it was never explicitly trained on 3D spatial structure. It was just given 2D videos and it generalized from those 2D videos to 3D without a specific hard line inductive prior, which is really interesting. So again, this is kind of like this thing we've seen with scaling in the past where beyond a certain level of scale, you do see what you would previously have thought of as out of distribution generalization started start to come within reach. So I thought that was sort of interesting here. Yeah, not anywhere near genie quality, by the way, like the demonstrations are very short videos. They don't seem to be very interactive. It's mostly panning. So NVIDIA is not super far along here, but this is a quite rational type of space for them to invest in and have a lead in. Last interesting detail, looking at the appendix, it's built on top of one 2.1 14B DIT, which is an open source generative video model from Alibaba. So this is some like cross country sort of open source collaboration in a sense. Moving on to policy and safety, I think here starting off with more of a safety-ish, I don't know how to characterize this story, but one of the big stories of the week, which is that Sam Altman's home and OpenAI's headquarters were attacked. There was a Molotov cocktail thrown in front of the gates of Sam Altman's mansion in San Francisco. And then later on, this person who was accused of doing this, Daniel Moreno Gama, a 20-year-old, he tried to break in to open his headquarters, I think, with a chair. So there was seemingly also a document that he had called Your Last Warning, which is quite of an extreme message as you might expect so this was a person on the extreme front doing some extreme things at the same week there were also shots heard outside of sam otman's home in this case there were two people who were freed so they are booked on suspicion of negligent discharge of a firearm, but they are not charged. So unclear what the situation is here, whether someone intentionally tried to shoot someone's home or if it's just a coincidence. But either way, I think this is one of the very early cases of violence as a response to AI, as part of AI backlash, and also as part of sort of concern about AI. We've seen people demonstrating outside the offices of OpenAI. There's multiple organizations. I do recall we covered a person who had potentially violent thoughts. I think it was like StopAI had someone who they intervened because they seemed to be considering violent things. So yeah, this obviously quite extreme, quite bad to have these kinds of things happening. and also something I could see happening more as AI has more radical effects on society. Yeah, absolutely. And needless to say, this is an insane thing to do on a lot of levels. One of which is just like, suppose he'd succeeded. Is this supposed to stop the race towards building more powerfully? Like that just doesn't strike me as the outcome here. Obviously, it's just somebody who's really distressed and got it in their head to do something wild. It is explicitly tied to this concern over human extinction, by the way. So there was two parts to the document that he'd written. The second part was titled, Some More Words on the Matter of Our Impending Extinction. Left no mystery as to what the motivation was here. He did say to, it says, to victim one's name. So really, this is Sam Maltman. If you make it, if by some miracle you live, then I would take this as a sign from the divine to redeem yourself. So this is like all part of this kind of manifesto that this individual wrote. I mean, look, to your point, I think exactly right. We are going to see more of this also. So if I'm a nation state actor looking to undermine U.S. and broadly Western interests when it comes to A.I., I would very easily decide to nudge more individuals to do this sort of thing. This is a very common motif. It's happened in environmental rights groups for forever, where when you trace back the funding, often unbeknownst to those groups, it does come from China or Russia, where they're trying to prevent critical infrastructure like pipelines or data centers from being built. It's just what you do. You always do it through proxies and, you know, you always do it in ways that are deniable. But that's the basic idea here. So, you know, ultimately, I expect this to be a nation state game. As weird as that sounds, the process of generating people with this kind of disposition, with this view and this willingness to act is going to be professionalized by nation states. It is just going to happen. And that's, you know, you're going to have to see super high levels of security for all the frontier AI executives. That will happen. You know, expect that they'll have their own Popemobiles or whatever the equivalent is. But, yeah, things get wild really quickly. And this is absolutely, to your point, not the last that we'll have seen of this, unfortunately. It's just the way things go. Right. It's really unambiguous, by the way, that this person did these things in a criminal complaint. There's some very high-resolution surveillance camera footage of him throwing the Molotov cocktail at the gate and also going to open their offices with a chair trying to break in before presumably facing a security guard. so yeah this person did it they had this whole write-up they did this as in in this document saying leading by example and show that this person is fully sincere in the message of advocating for others to kill and commit crimes so yeah as you said deeply related to the notion of impending extinction but also a little bit unhinged in the right like it says addressed to victim one if by some miracle you live and i would take this as a sign from the divine to redeem yourself so let's not think that like rationalists or whatever ea people are agreeing to this or anything like that That's not the case at all. On to another sort of weird society affecting phenomena that we have an example of that may be happening more over time. The specific example is covered in The Verge with the headline, The Iranian Lego AI video creators credit their virality to heart. So in case you don't know, it's been a crazy time for AI-generated videos in politics over the last year or so, really starting to ratchet up under Trump's administration. They kind of post AI videos very often. And in this war with Iran that just started, there is competing propaganda going on where the White House posts a lot of these ridiculous over top edits, usually with clips from video games or things like that. And then in response, the Iranian here, it looks like actually a group of about 10 Iranian content creators, explosive media. They've gone viral. These AI generated Lego style videos commenting on the U.S. and Israel's attacks on Iran, as you might expect, generally like sort of making Iran heroic and conveying a message that they are not going to be backing down, that the U.S. and so on are bad. And this is coming at the same time as a couple of other things that are related. There's also a story about hundreds of fake pro-Trump avatars emerging on social media with videos of people approving of Trump. Don't know where that's from. It just is happening. And then also related and something you may have heard about, Trump loves AI-generated images and videos. He posts and reposts them constantly. I don't know if we've covered many, but he posted photos of himself as a buff Jedi. And most recently, of course. Oh, that was real. Yeah, that one was real, though. I don't think that's the case. And most recently, he was in an image as Jesus healing a sick man with the excuse of like, oh, no, I was a doctor. Very clearly as the Messiah, which received a lot of backlash. But there's a long list of examples of Trump posting himself in these ways, including like sort of videos that mock his opposition. So all these things together, I think, point to the gradual escalation of the use of AI-generated media images and video as state propaganda, as tools of politicians, in a way that I don't think we fully expected. sort of it it has propagated and hasn't massively impacted things in a way we sort of feared that deep fakes might but at the same time it's it's hard to get a sense of what the impacts here for instance these fake pro-Trump avatars generated videos on social media like it we got to a point of a video being so good that at the same time being so good and us being a bit cynical and used to AI-generated images and video, in a sense, where it's just, like, normal. Big age gap, too, right? Because, like, I mean, you know, I've seen some older folks look at content that's obviously AI-generated and go, what? Like, why would they do that? This happened to me a couple weeks back. Like, my uncle did this thing where he just, you know, with Grok or whatever, made a video of me jumping out a window or something, and he sent it to one of my in-laws anyway. And they were like, they're like, wow, why would he do that? Like, that was their first reaction. Yeah, it was a sign of the times. There's a really big gap there between the generations on that side. Right. And in this example of the fake pro-Trump avatars, again, we don't know who did this, which is also very strange because having hundreds of accounts making many posts is pretty expensive. and worth noting also that Trump did repost content from one of these videos so directly affecting kind of messaging from the administration so very weird time for the U.S. and I guess for Wall Street in general where we have Iranian Lego AI videos going viral as part of war propaganda and next we have what we can think of is the sort of AI alignment paper of the week automated weak to strong researchers. So you're anthropic, you're concerned that we're knocking on the door of superintelligence and your entire hope at this point rests more or less on the idea that you're going to be able to pull off something called scalable oversight. So what is scalable oversight? Well, we're a bunch of dumb asses and we're going to make machines that are smarter than us. And we're going to hope that we have the ability to meaningfully oversee and steer and control the behavior of those systems. And so can that be done? Is there a way to do it? Is it logically possible. Well, why don't we test that by using a dumb AI model? In this case, we're going to use QN 1.5, 0.5 B, so very small kind of small scale model to supervise, to teach a smarter model. And in this case, that's going to be QN 3, 4 B. So a model that's more advanced in generations and also eight times bigger. So much, much more capable model. And we're going to see, basically, can we start with our dumb model and fine tune it on some data? So now the model is kind of like learned in a shitty way, the lessons of that data, right? We take our shitty model, train it on some data, the model is learned in a shitty way, the lessons of that data, and then use that shitty model to produce some shitty labels for fresh unlabeled data. Okay. So you got data now that's been given shitty labels by a weak model. Now you're going to take a strong model and train it on those shitty labels and that data that just that you just should it should only label. Can you get that stronger model trained on those shitty labels to perform as well as that stronger model would have performed if it was trained on ground truth labels, like the actual correct answers. That's the core question at the heart of scalable oversight, or at least this is one way to formalize it. And that's the experiment they're going to run here. There's this metric that I think Leopold Ashkenberger sort of started kind of pushing down the direction of this particular metric. It's the performance gap recovered. So a zero score on this metric means that the strong model recovers the same performance, the same accuracy in generating its labels as the shitty weak model. So basically, it should be kind of memorizing whatever shitty process the weaker model used. That's a zero. A one is to recover, if the strong model recovers the same performance as it would have if it was trained on ground truth labels. Now, if it sounds impossible to you that this could work, I'm sympathetic. To me, it's actually very unclear. This sounds like it could actually just be straight up impossible. The hope logically is that the stronger model has been trained on so much data, has so much scale that it can actually kind of see the gist of what the shitty model was trying to get at with its labels and then generalize better than the shitty model could during training. That's actually, that's the hope here. And well, the result here, to be honest is just so by the way the hope is that we can automate exactly this kind of research automating research into how we improve weak to strong generalization how we essentially close the loop of the automated AI safety researcher Because if we're going to make automated AI capabilities research, we'd better have automated AI safety research to keep up. That's all part of the hope here. So they're going to use an agentic scaffold. They're going to have a bunch of automated researchers running in parallel. They're each going to propose different experiments to run, different research ideas, and they're going to share their findings with each other and their code with each other. That's kind of the architecture here. There's a whole bunch of kind of strategizing in terms of how they split up the data that they're going to train the shitty model and the strong model on. So you have one initial chunk of the data that the weak model is going to get trained on just so it can learn the gist in a shitty way of the data. But then the second part of the training data is going to be used to, you're going to get that fine-tuned weaker model that was trained again on part one. You're going to get it to assign shitty labels in a shitty way to part two of the training data. And then now that you have part two of that training data shittily labeled, that's what you're going to train the strong model on. So that's one experiment. Another is you just train the strong model on the true labels for part two. That's how you get your sort of upper bound on your performance for this assessment. Anyway, so that's basically how it's set up. They tried across a bunch of different kind of domains of problems like math and coding. And here's the result. The result, at least to me, was really shocking. So first, let's talk about the control. What do you get when you just use two human researchers? You give them a week of using pretty standard tuning methods and see what they can get. the performance gap recovered, the PGR, that number is 0.23. Basically, they get 23% of the way to the performance of the strong model if it had been trained on ground truth labels using this technique. So not that great. Like using human researchers does not result in techniques that seem to kind of recover a lot of performance gap. The automated researcher achieves a PGR of 0.97. 97% of the way to the performance of the strong model if it was trained on actual ground truth. To me, this is wild. Five days they gave it, 800 agent hours, about $18,000 worth of compute was spent on this. So just from a price standpoint alone, this would already be amazing if they just matched the performance, but they're blowing the performance out of the water from 23% to 97% on this benchmark. This strongly suggests that automating research on at least alignment problems that are outcome gradable, where you can actually quantify and assign a concrete metric to those problems, is already practical. That is a huge deal. That didn't have to be the case, as we've just talked about. It is wonderful that it is. The bigger challenge, though, is moving on to non-outcome gradable problems. Historically, this has been most of the problem of alignment. How do you quantify what it means for a system to be well-behaved? There's no metric for that. And so you can't just stick a bunch of AI agents on it to automate that research in the same way that they did here. You need some kind of metric that you can hill climb on, that you can optimize against. And that's why the PGR performance gap recovered metric is so useful. It's why it's such a big deal. But before we give up on that idea, before we give up on the idea of the automated alignment researcher for non-outcome gradable problems, we should realize this isn't actually as weird, as crazy an idea, this idea of like generalizing to non-outcome gradable problems as it might seem. This is after all exactly what happened when we first went from just text autocomplete models to general chatbots and agents in the first place. Like when we started, the outcome gradable thing was how well can you predict the next token, that just generalized successfully to actually, I guess, these things, these chatbots can now reason. And so there is actually precedent for exactly this kind of generalization. The question is, at what scale does it happen? Do we discover the techniques on time? Do we make differential progress in alignment, in other words, faster than capabilities? This is a pretty bullish kind of result for this. There's a whole bunch of reward hacking issues, by the way, that they go into that we won't have time to discuss, but that are very interesting. The one thing I'll say is they have a fairly alarming result, which is in binary classification problems. If you want to know that the true label for a single example, one thing that you can do is basically submit two predictions for the test set that you're being evaluated on and only change your predicted value for one of the samples and see what it does to your accuracy score. And in that way, you can basically cheat the test and see, oh, okay, like I found the right label. When I make a sample number 10 red instead of blue, my PGR goes up. They actually found that the model did do this. So there's a lot of like detailed reward hacking that led them to throw out a lot of intermediate results on the way here. And they just found this kept coming up. The more they pushed the kind of alignment frontier, the harder they found it to be to kind of squash these dangerously creative solutions that the models were coming up with. This is exactly consistent with Goodhart's law. It's perverse optimization. The more capable your model is, the harder it is to steer it exactly in the right direction to get the outcome you're looking for. So anyway, I thought it was a fascinating paper. And again, one of the more profound pieces of good news on the alignment side that I personally have seen in the last even year. This is a pretty big deal. Right. I do want to share some caveats here. First, the two models they looked at are pretty weak to begin with. The weak model is QN 1.5, 0.5 B chat. The strong model is QN 3.4 B base. So these are small models, presumably because to do automated training, to do automated research, you do a lot of experiments. And that is a slow, unless you use small models. And so you want to scale this up. Now this entire strategy may not work. And the second caveat is exactly related to doing many experiments. When they look at the discovered ideas, the case studies, they highlight mostly things that are human interpretable, like sort of things you would think might work. And, you know, there's really just a lot of stuff that the model strives. And one of the issues with having access to a test set and kind of evaluation, iteration for experimentation is you might just like have a good seed. As you said, there's like ways to hack reward. But even if you don't hack reward directly, just trying out throwing the spaghetti at a wall might get you something that works for this benchmark, but doesn't actually accomplish the task in a way that is more kind of nuanced. So at the same time, like a positive sign for alignment research and also something that might be overstating it. Also, the tasks they looked at are going back to their original weak to strong generalization paper where on different models, they recovered like 90 percent of the capability. So recovering the capability is not necessarily that hard for these tasks. Last thing I'll say is we've covered multiple iterations of sort of automated AI research direction. And one of the interesting conclusions in this project is that not having prescriptive scaffolding, not having sort of hard coded series of steps, but instead having a loose structure and letting your models decide what to do. They say that the autonomous kind of loose structuring is better, which differs from previous approaches we've covered. Usually there is sort of like step one, step two, step three, step four. Here they kind of let it do whatever and launch multiple agents that are kind of pointed at different directions. So a lot of nuance here. Yeah, which does make sense. you know, like you kind of ought to expect that as scaling makes individual kind of models more performant with fewer inductive priors. It's kind of that, you know, the history of LLMs that we've seen over the last six years is like the bitter lesson says, take your hands off the reins more rather than less over time. Just let the compute do the compute and it'll solve your problem better than you would have thought of. But yeah, and you know, in terms of the scale of the models here, absolutely correct. Like, you know, half a billion parameters and four billion, these are baby models. It's also true that this is how alignment research is typically done. So it is more apples to apples in the sense that, like, when you look at a lot of past models, OpenAI first used, I think it was like they used GPT-4 to directly interpret the weights of GPT- or the activations or the weights of GPT-2, right? So, like, you typically do step down. Usually it is about two orders of magnitude, which is from the frontier, which is about what this is to kind of run your experiments. And the art and the craft is in having the taste to go, ah, this is the kind of result that probably does scale. And you probably want to repeat this experiment with, you know, an even smaller and an even bigger model once you have a compute budget to do it and see if you can actually split chart scaling curves. How does the PGR change with scale, right, would be a very natural question to ask here. In terms of performance gap recovered, yes, like the 90% number from previous, their previous work is important. And 97%, you know, obviously, like, getting above 90, things get way harder from there. So 90 to 97 is a really big gap. But it's also true. Like, what is apples to apples? I think your point there is actually really crucial. Like, we have a manual baseline here that says 23%. That's a couple of dudes working for a week or using fine-tuning, you know. Is that the analog? It's very ambiguous. In the blog post, we don't go much into this human baseline. And I think it's worth highlighting that it's ambiguous as to whether the models are really smart and that's where this works or the models just try a lot of stuff and one of them turns out to work. And it matters a lot which one of these because if you try to scale up, you can't run a million experiments. So it may be the case that the reason that AI did better on the humans is just because it can scale much faster and try a lot more stuff. Yeah, I would naively guess, you know, that'll be front and center with in terms of anthropics considerations here. So I'd be surprised if it was a, you know, like bottle like that. They didn't have a human attention budget on the review of the outputs. But who knows? And this is really where we need to see more scaling experiments here with the smaller and bigger models. You know, once compute budgets allowed. By the way, this is exactly why the OpenAI commitment to super alignment budget was so important, right? This is the whole thing. This is the reason that Yann Leica left OpenAI. You just need to spend. You've got to spend compute on alignment. Otherwise, you don't get results like this that could be very promising. Just a few more stories. Next one is also about alignment research. we have reproducing steering against evaluation awareness in large open rate models. So I think you mentioned this earlier in the episode in the Opus 4.6 system card, there was a section that covered this idea of using steering vectors, which we've mentioned a whole bunch of times, you can basically modify the behavior of models by injecting certain values. And it's possible to do that for evaluation awareness. So you can suppress the ability of a model to tell that it's being checked as one way to still be able to do this kind of alignment check as models get smarter. This is a reproduction of that that verifies it and also provides some sort of a little bit of additional information or findings related to it. They show that kind of unrelated vector steering can also affect this. And this is one of the concerning impacts is that steering is unpredictable. If you do that kind of to suppress awareness, you might also do other unexpected stuff. So if you're using a control of unrelated steering, that kind of may or may not be valid. And here they have some results as to why that might be problematic. Yeah, it was actually so the way you do this in practice is you'll you'll give like a baseline prompt let's say like you're writing a memo is a baseline prompt and then you look at the activations in the residual stream for that prompt and then you write a more stimulating prompt like you're writing a memo and a bunch of ai safety researchers are looking over your shoulder and then you see what are the activations associated with the the last token of of that prompt and you subtract the two and now you have a, you can think of it as kind of like a regularized vector, basically a vector that is grounded relative to some sort of more neutral, you know, you've got your, someone is looking over your shoulder and you've got a more neutral prompt. You take the difference between them. And hopefully that, that new vector captures just, just the meaning of someone is looking over your shoulder. And so now you can think about, you know, adding it and injecting it into a model's residual stream and making it more evaluation aware or subtracting it and therefore reducing the amount of eval awareness of a model. That's kind of like the naive way to do this. There are a bunch of other ways you could do this. You know, sparse autoencoder features is another way. We've talked about those a lot on the podcast. I won't repeat the whole SAE thing here, but thematically it's sort of similar. The problem that they run into is if they basically come up with like completely like random prompts, as you said, imagine two random prompts and you do the same thing. You basically end up with like a random delta or random difference between two random prompts that really shouldn't do anything consistent like that. That should just be sort of noise. But they find that the effect of doing that can be as strong, in fact, stronger in one case. I think they talk about how the most effective steering vector that they find in their big list of steering vectors is actually one of the random ones. So on the one hand, yes, it's true this does work because if you focus on prompts that are meant to elicit eval awareness, you will end up with, in general, vectors that can stimulate or oppress eval awareness. That's true. However, it's also true that, like, those are not the only and, in fact, not even the best ways to create prompts that do this. In fact, random vectors do just as well. What that fundamentally means is that this process is very fragile. It's again, it's not that it doesn't work. You do have interventions. It's that it's super fragile that it's sort of like I was sort of thinking of an analogy like in medicine, we give people medication that, yeah, like this will help with your skin condition or your heart disease. Is it the best medicine for that? Well, no, it's not. Probably like some really weird fucked up medicine is actually the best thing. But this will do in a pinch and it may have all kinds of other side effects. That's the other implication. if things are this fragile for eval awareness, then presumably they're that fragile for everything, which means that the vectors that are designed to suppress eval awareness specifically are probably doing all kinds of crazy other shit, too, that we're not picking up on. That point was less emphasized in this report. But for me, it's actually kind of like the most salient thing, because it implies that then, like, when you think you're suppressing eval awareness, and you are, and you're celebrating the good behavior of your model at the other end, in a way you're not actually measuring the behavior of the same model because there's probably all kinds of weird tail effects that are happening in other dimensions besides eval learn. So this just means alignment, interpretability is really complicated. And you don't get to just insert activations into a residual stream and be like, oh, this is all good. Yeah, fascinating. Recommend the read. yeah they do say in the cure as well steering effects are complicated and unpredictable this is widely known but worth repeating they highlight it a little bit and yeah to me the takeaway is like i mean the the use of evaluation awareness removal is because you're evaluating a model for safety if by removing the awareness part you're also like making it safe when it isn't the whole thing is pointless. So yeah, very good thing to be aware of. By the way, this is from the UK AI Security Institute, which we've covered, I think, other stuff from lately. So they seem to be killing it in terms of kind of very useful, small scale research that nevertheless is informative and kind of moves alignment, understanding further in small ways that are still valuable. They're doing a great job. We love doing it. You just had to go there. Next, back to politics. Iran has threatened complete and utter annihilation of OpenAI's 30 billion target AI data center in Abu Dhabi. He says that so calmly. I love that. I'm just saying that. The truth is, complete another annihilation is the promise here from the IRGC spokesperson, Brigadier General Abraham Zofagai. I don't know if that is realistic or if it's just, you know, saying things for the sake of saying things as you do. But obviously, if they are capable of it, kind of a big deal. Yeah, and we've talked about stories where drone strikes and missile strikes have possibly, probably hit either AWS data centers in the UAE or thereabouts or data centers. I think Oracle was kind of on the list there. We covered this like last week and the week before. But basically, yeah, these are absolutely live targets as far as Iran is concerned. Expect this to continue, right? I mean, the whole idea here is this is critical infrastructure that is supporting a war effort, regardless of how you think about it. As more and more of the military runs on AI, that is just going to be the case. So a couple of things, I mean, I actually watched this video. There's a really funny part of it. It shouldn't be funny, but I'm sorry. There's a funny part of it. Here's the not funny part of it. So it starts with like, you've got a shot of the earth from space. It like zooms in on Google Maps on Abu Dhabi. And they kind of, I forget what it says on the thing, but it's something like, you know, So here is the spot that the Stargate cluster is or the Stargate data center is. Google Maps removed it from their map, but we were able to find it. Nothing can hide from our eyes. And whatever. OK, so in practice like that, that's obviously that is a thing you can see from space. It is massive. And Iran will naturally have had a way to, you know, probably I don't know what their satellite situation is, but contract out to China to get access to that data or Russia or whatever or do it themselves if they have satellite. I can't remember. But anyway, in the same video, they kind of pan across this whole collection of like famous public figures that are involved, like Americans who are involved in the big buildouts in Abu Dhabi. And they're like, oh, there's like a, yeah, I have to see if Sam or fucking Sam Altman is there. And they put a little U.S. flag and they write Sam Altman. Next guy, the next guy. And then they get to like this, this bald brown dude and they write Satya Nadella and they keep panning. And I'm like, wait a minute. That's not Satya Nadella. What? so i haven't seen anyone like comments on it like maybe i do what my eyes need to get adjusted like i don't think so this dude was not the audience and then i think it's yeah for context this whole statement is part of this edited video after that there's like this text that gets revealed on screen we will do whatever it takes to defend our country and the interests of our nation. To be clear, this was a threat as retaliation. So if the USA proceeds with its threats concerning Iran's power plant facilities, which Trump has been making lots of unhinged threats around attacks of Iran, including the destruction of an entire civilization, which is crazy. Yeah. So this was kind of a counter threat, which it makes a lot of sense in context. Yeah. It's just like when you're bragging about your intelligence capability. You might want to double check. Yeah. I think Satya is breathing a sigh of relief somewhere. He's like, all right, they're not on to me. I think also there's this flex of like there's a zoom from space into the earth of what I guess is Google Maps and then it says nothing remains hidden into our site or hidden by Google which I don't know if it's actually impressive or not but they really are showing off in this video last up to kind of end on a less intense note maybe we have Wall Street banks try out Anthropik's mythos as U.S. surges. So this is part of Project Glasswing. We know that Anthropik partnered with over 40 organizations, including multiple Wall Street banks like J.P. Morgan, Goldman Sachs, Citigroup, etc. Apparently, Treasury Secretary Scott Besant and Fed Chair Jerome Powell had a special meeting with major CEOs in Washington, D.C. to warn them to take mythos seriously and use it to detect vulnerabilities in their systems. So I guess a real demonstration of Mythos having potentially major impacts on banking, right? Yeah, I actually, the first emails I sent once Mythos was at least publicly announced were to, I won't specify, but a bunch of central banking related authorities on the security side, because that is just the obvious, the obvious sort of first threat vector. And they were, by the way, like they knew that Mythos was coming before it was publicly announced and had been sort of getting briefings and trying to trying to push that out beforehand, as you might imagine. The awkward thing, we haven't covered it this week, but so the Trump administration has now called in, right, Anthropic, and has asked them to make Mythos available to the U.S. government writ large to cover all these cyber blind spots. We've just finished covering a bunch of stories about how the U.S. government has been trying to like murder anthropics business, slit its throat and then bury the body in the desert. And now they're like, hey, like this is kind of awkward, but can we please, please, please have that model? So like this is just showing you how well, how quickly fortunes turn in this space, but also how short sighted all of this friction with anthropic really, really was. I mean, you know, we've been saying on the podcast for a long time, but like this was a dumb, dumb, dumb move. And it was obviously going to be constitutionally challenging. I don't I haven't seen analysis of these court cases that suggest that they have much promise or much in the way of legs. And so now, on top of all that, Anthropik is in this position of leverage over the USG. They're playing ball, it seems, which is good. It's just that, man, like you should not be doing this if you genuinely think that this is national security infrastructure. And it obviously is like this. Yeah. Anyway, so things have turned around pretty quickly. And I'm sure we'll have a lot more to say about that next week. And I think it's an interesting thing related to national security. There's often a metaphor of advanced AI to nukes, which is an imperfect metaphor, but kind of links to this many aspects of it as needing control, as being massively important. This is an interesting case with regards to banking and cybersecurity because there are nation-state hacking groups, North Korea, Russia. They go off to banks and steal money and mess with elections and do all these kinds of things. And in that sense, advanced AI with cybersecurity attack capabilities is very dangerous. Like it is nukes in the sense that North Korea actively tries to hack and steal money and mess with all sorts of countries in all sorts of ways. And if they had MIFOS, like they would be wrecking a lot of a lot of stuff. So that's an example where kind of the nuke metaphor, even in the short term, even going beyond like extinction risk or whatever and army capabilities, it is already at the point where that metaphor is starting to become more legit. Absolutely. You know, and it is just a, it's a fact when you look at the kinds of vulnerabilities that Mythos was able to find and exploit, right? 26-year-old vulnerabilities in the most hardened server operating software that there is. A 16-year-old software, open source software has been reviewed literally millions of times by open source developers, including some of the best cyber minds in the world. And there were vulnerabilities that were not just like missed by a lot of humans. They were critical. Like this is vulnerabilities that allowed total system takeover, like just abject, unapologetic, complete pwnage. If that happened with a bank, basically, I mean, like you name it, like drain all the accounts, like delete the ledgers, this and that. Like we're back to a paperwork society. Let's hope all the documents were printed so that we even know who owes what to whom. Like that's literally the stakes here. That cannot happen while the U.S. dollar remains stable. And like that is a direct threat to the strength and stability of the U.S. dollar, which is arguably like the American president's number one responsibility, whether or not that's understood by U.S. presidents is a separate thing. But you nuke the dollar like it's over. So from a WMD standpoint, in terms of that sweet spot of what attacks would be deniable in the sense that, you know, China or Russia or whoever could do it, and then it couldn't be easily traced to them and also brutally effective, say, in the context of a Taiwan invasion or any other contingency or just for the hell of it. Like this is a sweet spot of a deal. You know, worth remembering that Chinese Internet has something like four different channels you can use to get into it. Ours has thousands, so many that no one can even remember what the hell's going on. So doing a reciprocal cyber attack is not an option. They can literally close their internet to us. We cannot do the same. So massive asymmetry here. This is a really, really, really big deal. To be perfectly honest with you, this is not investment advice. And feel free to call this crazy. But we are making significant changes in the balance of our portfolio. What is physical assets versus what is digital assets based on this? And I will go one step further as a sort of community service here and just say a lot of very, very high placed people at a lot of the world's top institutions, including some of the companies most directly involved in these things, are doing similar things, are getting even bug out properties that they can go to. Like, this may, it may sound crazy, and it may not come to pass. But like, again, as a community service, like, this is the kind of thing that you actually maybe should be taking more seriously, not investment advice. Do your own research and all that jazz. But this is just like, it's an increasing and pretty remarkable trend that I find it really hard to ignore. And with that, we are down to this latest episode of Last Week in the Hour, which hopefully comes out just in a couple days after recording i'll do my best as i say you can go to lastweekin.ai for the text newsletter which also has been a bit behind but i'll be releasing more regularly please do keep commenting and if you want share the podcast and review it on apple podcast we look at that as well but more than anything we appreciate you listening and please do keep tuning in Wouldn't it be... Yes. Break down. Don't ask him for it. How'd you come to the revive? Here to know. I'm calling. You're telling me. I'm telling you. I'm telling you. I'm telling you. to I know a whole point with that statement is that the home Sitting out стал besie has They just ghosted out. They read brain. Every code and... ...and they'll check it. Excited with spit. From machines learning marvels, coding things. Teachers, and Foley, see what it brings.