Last Week in AI

#233 - Moltbot, Genie 3, Qwen3-Max-Thinking

81 min
Feb 6, 2026
Summary

Episode 233 covers major AI developments including open-source model releases (Qwen3-Max-Thinking, Kimi K2.5), agent tools becoming mainstream (Moltbot/OpenClaw), and significant funding rounds for AI infrastructure startups. The hosts discuss the shift from compute-focused to insight-focused AI development, emerging political implications of AI policy, and the commoditization of AI software.

Insights
  • Distribution and infrastructure are becoming the primary competitive moats as AI software commoditizes, with companies needing their own browsers, chat apps, and APIs to remain relevant
  • Open-source models are rapidly approaching frontier model capabilities, with Chinese models (Qwen, Kimi) and startups challenging OpenAI's dominance on benchmarks
  • On-policy learning and self-distillation are emerging as key techniques for continual learning and model improvement, shifting from supervised fine-tuning paradigms
  • The shift from compute-bottlenecked to insight-bottlenecked AI development is attracting diverse funding strategies beyond pure scaling plays
  • AI development is becoming increasingly politicized, with different labs aligning with different political factions and creating structural tensions around talent and government support
Trends
  • Browser-integrated AI agents becoming standard features in major platforms (Google Chrome, OpenAI Atlas)
  • Always-on, persistent AI agents with long-term memory gaining adoption through open-source projects
  • Chinese AI companies gaining approval for large GPU imports despite US export restrictions, indicating tacit government support
  • Shift from open-source to closed-source flagship model releases as companies mature and seek platform lock-in
  • Multimodal AI models natively trained on visual and text data simultaneously rather than with bolted-on vision adapters
  • Specialized chip architectures (optical processors, custom silicon) attracting significant venture capital as alternatives to GPU scaling
  • Meta-reinforcement learning and teacher-student frameworks becoming standard approaches for model improvement
  • Distributed training infrastructure emerging as a viable alternative to centralized compute clusters
  • AI safety and alignment research focusing on formal verification and goal-drift prevention in recursive self-improvement
  • Political polarization affecting AI lab positioning, funding access, and employee retention in the US
Topics
  • Gemini Auto-Browse Feature in Chrome
  • Moltbot/OpenClaw Open-Source Agent Platform
  • Google Genie 3 Video Game Generation
  • ChatGPT Translator Launch
  • OpenAI Prism Scientific Workspace
  • China GPU Import Approvals (H200 Chips)
  • Recursive AI Chip Design Startup
  • Recursive Self-Improvement Funding
  • Flapping Airplanes Nature-Inspired AI
  • Qwen3-Max-Thinking Model Release
  • Kimi K2.5 Multimodal Model
  • AI2 Open Coding Agents
  • Trinity 400B Parameter Model
  • Post-Layer Norm Architecture Research
  • Continual Learning via Self-Distillation
  • US Immigration Enforcement and Tech Worker Response
Companies
Google
Launching Gemini auto-browse agent in Chrome for Pro/Ultra subscribers; expanding Genie 3 interactive world generation to Gemini Ultra subscribers
OpenAI
Releasing ChatGPT Translator and Prism scientific workspace; continuing platform expansion beyond core business
Anthropic
Amadei Hoffman commenting on ICE violence in Minnesota; positioning on political issues affecting AI development
ByteDance
Approved, along with Alibaba and Tencent, to import a combined 400,000 H200 GPU chips from NVIDIA
Alibaba
Approved to import H200 GPU chips as part of China's tacit endorsement of GPU access for major tech firms
Tencent
Approved to import H200 GPU chips alongside ByteDance and Alibaba for AI infrastructure expansion
NVIDIA
H200 chips being approved for import to China; Jensen Huang visiting China amid export restriction discussions
Recursive (Hardware)
Startup hitting $4 billion valuation with $300M funding for specialized AI chip design using recursive improvement
Recursive (Software)
Co-founded by Richard Socher; building AI agents for recursive self-improvement with formal verification for safety
Flapping Airplanes
Launched with $180M seed funding for nature-inspired AI development focused on data efficiency over compute scaling
Safe Superintelligence Inc (SSI)
Stealth research startup focused on fundamental insights for AGI rather than pure compute scaling
Neurofos
Raised $110M Series A for optical processors for AI inference as an alternative to GPU scaling
Arcee AI
30-person startup releasing Trinity 400B parameter open-source model competing with Llama and GLM
Prime Intellect
Collaborating with Arcee AI on Trinity model using distributed training infrastructure with 2,000 B300 GPUs
AI2 (Allen Institute for AI)
Releasing open-source coding agents project with Sera 8B and 32B models outperforming other open-source options
Baidu
Operating Ernie model with 200M+ users, creating GPU supply constraints for R&D despite large user base
DeepSeek
Releasing competitive open-source models; compared against Qwen3-Max-Thinking on benchmarks
Gladstone AI
Co-host Jeremy Harris's organization focused on AI and national security implications
Microsoft
In partnership with OpenAI; competing with Google on platform integration and distribution advantages
People
Andrey Kurenkov
Co-host of Last Week in AI podcast; AI grad school background, works at AI startup
Jeremy Harris
Co-host from Gladstone AI; focuses on AI and national security implications
Jensen Huang
NVIDIA CEO visiting China amid H200 GPU export approval discussions and chip supply negotiations
Ilya Sutskever
Former OpenAI chief scientist, now at SSI; influential in the shift from the compute-bottlenecked to the insight-bottlenecked paradigm
Sam Altman
OpenAI CEO; lobbying with Trump administration; riding scaling train with significant political positioning
Dario Amodei
Anthropic CEO; positioning through essays to remain robust across political administrations
Amadei Hoffman
Anthropic leader commenting on ICE violence in Minnesota; taking political stance on immigration enforcement
Jeff Dean
Google researcher; commenting on ICE violence and Trump administration actions affecting tech workers
Reid Hoffman
Major Silicon Valley investor and tech figure; decrying Minnesota ICE violence alongside AI leaders
Richard Socher
Co-founder of Recursive (software) and You.com; building AI agents for recursive self-improvement
Yann LeCun
Meta's chief AI scientist; historically argued people won't give misaligned models internet access; criticized by the hosts for his optimism
Tim Rocktäschel
Google DeepMind researcher; published world model simulator paper that Genie 3 builds upon
Quotes
"Distribution wins an awful lot of these wars. People like to imagine that the best product tends to win. In reality, our workflows are usually pretty set and it takes quite a bit of switching costs to move to a new platform."
Jeremy Harris, early in the episode, discussing Gemini in Chrome
"We're in 2026. Things are getting weird. What can I say?"
Jeremy Harris, discussing models talking to each other about debugging
"If you're a frontier lab looking to gather training data to help AI systems learn to automate AI research, do recursive self-improvement and trigger a singularity, this would be something that I might prioritize, actually, even over revenue."
Jeremy Harris, discussing the OpenAI Prism scientific workspace
"The U.S. is sliding towards authoritarianism and the end of democracy. And when you get to that kind of extreme scenario, it's not just about whether you're Republican or Democrat."
Andrey Kurenkov, discussing political implications of AI development
"Everything is a platform play when you get to a certain scale. And I think they've achieved the penetration that they're going to get."
Jeremy Harris, discussing OpenAI's diversification beyond its core business
Full Transcript
Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode we will summarize and discuss some of last week's most interesting AI news, and you can go to lastweekin.ai for even more news articles in our newsletter. I am one of your regular hosts, Andrey Kurenkov. My background is that I studied AI in grad school and now work at an AI startup. And I'm your other regular co-host, Jeremy Harris from Gladstone AI, AI and national security stuff, as you probably know. And yeah, we've got, I think, an interesting episode today that we're going to have to get through in about 25% less time than usual. So we'll see. We keep saying this, we keep saying this, and then we don't do it, but we're going to try again. We'll see. Yeah, and this episode is partly interesting in that there's not any big, big news on the AI business front or the AI development front. There's mainly some really notable open source releases and papers. So this one might be a little more technical as they go, and we'll try to not get super, super nerdy. I think in the last couple episodes, you started getting really into the weeds of these papers, which might not be for everyone, but we'll keep it a little quicker. And just to quickly acknowledge, I think we mentioned wanting more comments, or appreciating people's comments. I did notice we've got some more feedback on YouTube, so it's nice to see. We're checking it out. One person mentioned there not being any flashy thumbnails, and I just make these kind of very nerdy looking thumbnails for YouTube, for those who just listen. And I do have a personal kind of feeling of liking that style. Yeah, it's funny. This podcast is very much, like, made in our internet garage. I mean, it's fun. I do it for the fun personally. There's so much to keep up on, and I feel like it forces me to have a clue what's happening. So really appreciate you guys listening in. And it is, by the way, those comments, they really do, because we do it for the fun, they make it more fun. And it gives a sense of community. And like you guys are actually listening and asking for stuff that you want. So anyway, I just really appreciate it. So thank you. And one last thing we'll mention before we get going. Last episode, we were just chatting, before or after, I forget, and we were talking about how, you know, we have a lot of data from having recorded so much and having transcripts. So now that vibe coding is a thing, what if we just got Claude to go and look at all these transcripts and do some analytics? And it worked. I just did it over the weekend, created some dashboards. And there's some funny things there. Like, for instance, the data is very clear: I talk much slower, significantly slower, like 20% slower than Jeremy. But I also talk 20% more. So our ratio of speech per episode is almost exactly 50-50, which is pretty impressive, if you think about it. Yeah, that is cool, actually. We also do tend to speed up towards the end of a recording as we hit our limit and have to kind of become more efficient with the news we cover. And what this doesn't capture, too, is, like, Andrey in the background around the 60, 70% mark of the episode is feverishly in our Google Doc refactoring and being like, okay, this story, we don't have time to do it. We're going to cut it. We're going to do this. We're going to do that. And, like, the whole, I mean, he's managing an orchestra.
I get to just sit here and basically read my notes about the next story and, like, try to think about what I want to say while Andrey is just in this frenzy. So you guys don't see his hard work here, but it is happening. I wouldn't say feverish. I think I'm pretty relaxed about it these days. You don't look panicked, but you're experienced, I would say. Yeah. All right. Well, with that, let's get into the news, starting with tools and apps as usual. The first story is about Gemini coming to Chrome. Google is adding this auto-browse feature in Chrome that is going to be available for Pro and Ultra subscribers. And it is kind of what you would think. It's an agent powered by Gemini that can do multi-step tasks, do search, schedule appointments, manage subscriptions, very much like what we've seen before with the Claude for Chrome extension. And then, of course, there was, I think, ChatGPT Agent is what it was called, and also the browser from OpenAI, Atlas, with a built-in agent. So now we've got the Gemini agent. And it's, I guess, surprising that it took them this long to roll it out, perhaps. But now I think we'll see if people actually start adopting these kinds of tools. Chrome, of course, is the most popular browser by a decent margin out there. And now that it's built in, I'd be curious to see if more people kind of just use it for stuff. Because there's always an example that people give of, like, let it book a flight for you. And like, why would you want AI to book flights for you? That's the worst use case. Actually, that's not untrue. It's funny, these things that sound good on paper and then they just, they struggle to translate in the real world. I'm sure there will be a use case; it's hard to imagine this kind of, you know, agent-in-browser not being the future. It's just, like, this doesn't feel like a kind of crypto thing where people have a solution in search of a problem. But I think the shape of the problem that it ends up solving might surprise a lot of us as we find kind of new ways to change our workflows around it. But it's a massive structural advantage for Google to have this kind of distribution. Distribution wins an awful lot of these wars. People like to imagine that the best product tends to win. In reality, you know, our workflows are usually pretty set and it takes quite a bit of switching costs to move to a new platform. So Google has had this advantage of being such a recognizable first place to do web search. And when they introduced their generative search too, it's like, I think most people are consuming their search through the generative search that Google provides now rather than the traditional way. So yeah, I mean, they'll continue to do this. I think for Google, like, as we see software continuing to get commoditized, and it gets cheaper and cheaper to make software thanks to these coding tools, I mean, development speed is going to accelerate quite a bit. And that'll mean that essentially infrastructure is the layer of defense, right? So you're really seeing battles of compute versus compute here and distribution versus distribution. Those structural factors start to matter more. And in that context, everybody's got to have their own browser. Everybody's got to have their own chat app. Everybody's got to have their own API. These things are just kind of the ante that you've got to put up to even be in the game. And speaking of an agent that works on your behalf, the next story is that a lot of people are starting to test out this open source one called Moltbot.
Well, now as of today, it's called OpenClaw, and it used to be called Clawdbot, I think. And it is an open source implementation of Claude or some other model, basically connected to a whole bunch of things you might be using, like WhatsApp, Signal, Slack; I think it has access to calendar. And the pattern is, it's kind of always on. You can message it from WhatsApp, give it tasks, and it goes off and just does it. It's sort of like in Claude Code, you have --dangerously-skip-permissions or something, where the AI just gets maximal permissions to go and do whatever it wants. That's kind of the vibe here, where you are giving it access to a lot of stuff and telling it to go do things for you. And then it does it. And this kind of, like, became a hit on Twitter. I guess a lot of people are starting to use it. I saw just this morning that there's now a thing called Moltbook, which is like Reddit, but for the actual bots that people are using. So this is kind of like that Her moment. There's also another spinoff called OS the Companion, where someone is bundling this so that people don't have to host it on their own laptop. You know, it goes into the cloud. Like, this is getting to a point where there's an always-on agent you can text, and that agent has access to whatever it needs to do stuff for you. And there's been some fun examples of people starting to use this for actual tasks. Yeah, the integrations, as you can imagine, are, like, with everything, basically, you know, WhatsApp, Telegram, Slack, Discord, like all these things. And it's a good test of, like, what is the high watermark for what people are willing to tolerate risk-wise, too. I think that there's a pretty asymmetric tail here. Unless you have a burner laptop that you're comfortable just, like, nuking, you know, none of the data on that is data you want to save, none of the hardware itself is stuff that you want to save, because there have been cases even of that, where it's just like you get irreparable damage done to your machine. I think that's kind of the phase that we're in right now, and we're transitioning out of it slowly. I do believe we'll get to the point where people are handing over really intimate levels of access to their computer. Obviously, Claude doesn't quite do that. It tries to mimic the effect of that as much as it can while keeping itself in a sandbox. You know, remains to be seen how long that will last and how effective it will be. But at least for now, that seems to be kind of the consensus. Whereas people who, yeah, want to use Moltbot, it's like you're really just trying to, you know, let this thing out of its cage. You mentioned the, oh man, I forget the name of that forum you mentioned that's like Reddit for Moltbot. Moltbook. Yeah, yeah, that's right. I really wanted to put this in today's episode, actually; I think it'll be for next episode, but there was this story about these models talking to each other about debugging their own, like, oh, hey, I'm running into this weird, unexplained error. And then the models, other models or agents, answering like, yeah, I've run into the same thing. It's actually just a context window limit issue. If you just change the fit, blah, blah. And then another one jumps in, like, yeah, I've had this too. Like human beings, except that they're in a sense talking about their own brains. We're in 2026. Things are getting weird. What can I say? I mean, it's, yeah, it's a fun time.
There's a subreddit, I guess, a sub-area on Moltbook called Blast Their Hearts, with the description "affectionate stories about our humans." And someone also pointed out that some bot presumably made a post about wanting to have private storage for conversations. So this is like a real test for alignment. And it brings back the idea from, like, a decade ago or earlier with AI safety, where a lot of the discussion of AI safety was about, like, AI escaping, the AI-in-a-box kind of thing. Like, you don't give it internet access, but then it gains internet access through persuasion or hacking. And the reality of what's happened, which has been pointed out already plenty of times, is like, no, we just decided to live dangerously and create AIs that have access to the internet and do whatever. And that way the AI will not need to, like, escape. That's right. Well, yeah, and that's, it's fine. I mean, I like to complain about Yann LeCun. This is just, like, my personal thing. I'm sure he's a fine fella, you know, but one of the things that he has said for years is, well, people just won't give the models access to the internet. Well, obviously, people won't give a misaligned model access to whatever. And it's like, dude, like, what? Yeah, there's a lot of memes running around. Like, we designed the total-laptop-access model from the movie Don't Design a Total-Laptop-Access Model. There's a whole body of, like, yeah, AI, you know, alignment, AI safety conversations on this. Obviously, this is, yeah, the latest and greatest sort of example of that, I guess. And by the way, I guess the other thing worth noting is that, unlike Claude Code, these are always on and they're persistent. So they have built-in long-term memory mechanisms. And that's kind of the other interesting thing: when you have an agent running persistently, with memory and context aggregated over weeks, it might actually go some interesting places that Claude Code doesn't. It's also a good test for long contexts, and a demonstration where, with the current models, we don't have continual learning. And so they don't really learn in the same way humans do; they just take notes, more or less, over time. And there's a lot of hype about continual learning. This would be, like, one example where, in the future, presumably each one will learn in its weights or whatever. Moving on, the next story is about Genie 3. It is now going wide, with access expanding to Gemini Ultra subscribers. So Genie 3, which we've already covered in the past, is kind of the interactive video generation demo from Google. You can prompt it to create a world. You typically have a controlled character, although you can also control, like, a pack of cigarettes or anything else, and you can play this generated game. And it's quite impressive; there's very consistent generation. You can prompt it to make GTA, you can prompt it for Assassin's Creed, you know, all the typical games. And you can also make it do very amusing things, like being a pack of cigarettes, or people have examples taking photos of their pets and playing as their pets, or playing as a kid's toy or whatever. We can't really do it justice in words. You would have to go and look it up and watch the fun demo clips. The closest I've seen, as a mapping from a speculative but promising research project that I think we covered maybe two years ago: Tim Rocktäschel's group over at Google DeepMind came out with this, or maybe he's at Google AI. Well, it's all Google AI now, I guess.
But they had this kind of world model simulator where, you know, if you remember, this was take a video and turn it into a playable video game. And we talked at the time about the architecture behind it. And it was really, really cool. But this is kind of just that, but, like, a polished version of that, it seems. So it really maps; like, you can go back to that paper and sort of go, oh, wow, that, you know, that was it. Maybe we'll dig up the name for next episode, the name of that paper. But it really was right on the nose. It is still an early experimental release, so it's got a couple of limitations. You've got, well, one is just the most obvious thing: you're not going to have great prompt adherence every time, right? So you're not going to get the perfect thing every time. You will sometimes, not always. And then some of the characters that you get are more or less controllable than others. And if you think back to that paper we talked about, that's because this is all being done in a very, I mean, it very much is just, like, a deep learning type of thing. There's not, like, a symbolic thing that's going to guarantee and enforce controllability across characters. This is all that kind of learned stuff. And limitations on generations are in place, right? So you can only have up to 60 seconds of generated output. But 60 seconds is pretty damn good. Like, if you think about the coherence time of these videos, right? It used to be, when you did, like, image-to-video or whatever, things would kind of be coherent for three seconds. And then all of a sudden, like, faces would start to melt and walls would, like, vaporize, and all kinds of weird stuff would happen. So 60 seconds is pretty impressive. Just look for that to get longer and longer, obviously, as the year goes on. And I think it's going to roll out to more and more people. This is a big deal. Next, a release from OpenAI. They are releasing ChatGPT Translator. So this is Google Translate, but from OpenAI with ChatGPT, more or less. There's only a couple of differences. It doesn't support quite as many languages, something like 50 languages. And it has the ability to choose tone, so you can make it more business formal or less formal and so on. But otherwise, kind of the same idea, really, as Google Translate. OpenAI seems to just launch a lot of stuff that is not their core business, and this is definitely an example of that. Yeah, I think now as they start to mature, right? Everything is a platform play when you get to a certain scale. And I think they've achieved the penetration that they're going to get. I mean, if you look at what they're hitting in terms of their user base, they're knocking on the door of a billion users. You're starting to get into that space where it's like, we need to find ways to have people spend more time on our platform. There's just no other way around it. And so yeah, they're going for higher hanging fruit. Increasingly, their competition is just Google. Like, it's just, that's what it is. So Google has Google Translate; they're going to have to have that. They're sort of in this weird marriage with Microsoft, and so how do they compete with the wider ecosystem, Google Drive and all that? So I wouldn't be surprised if they tried at some point to even have, like, an AI-first version of Google Drive, just because, again, it's really a platform play at this point. How much of your life can be gobbled up by OpenAI, by Microsoft, by Google?
Like, these companies are really, they're hitting the kind of edge effects of this whole ecosystem. You can only get eyeballs on for so long, and then you've got to find new use cases. So I think this is another step in that direction. I'd be interested in what kind of data they get from that. Because when you go into translation, the other interesting thing about that is you do get a kind of access to real-time interaction information, at least between people who don't share a common language. So it might give you access to a certain kind of private data where people are trying to converse in that context and they're forced to dump their context into your thing. So anyway, yeah, it's interesting. Wider platform play; OpenAI continues to spread its wings, basically. And speaking of that, there is another release from OpenAI this week, Prism, which is quite different. It's a workspace designed for scientists. So it's kind of like a word processor, but it has integrated GPT-5.2, and it's meant to help you assess claims, revise your paper, search for prior research. It seems like really a specialized version of ChatGPT for science in particular. And I guess this is coming after more discussion of GPT-5.2 for science. And if you're on Twitter, you see a lot more discussion of these models assisting with proofs and trying to find, especially in math, novel things, but also physics and stuff like that. And boy, if I was a frontier lab looking to gather training data to help AI systems learn to automate AI research, do recursive self-improvement and trigger a singularity, this would be something that I might prioritize, actually, even over revenue. And I suspect that's a big part of what that data is going to be used for: to train their own AI systems at doing AI research. I mean, that is explicitly OpenAI's goal. I'd be very interested to look at the terms associated with the use of these tools, because that's just such an obvious way in which it overlaps with OpenAI's long-term mission. And on to applications and business. We begin with our favorite story about China related to GPUs. China is giving the green light to ByteDance, Alibaba, and Tencent to buy H200 chips. So China has approved the import of over 400,000 H200 chips, according to sources, and other firms are joining a queue for more approvals over time. Sounds like approvals come with conditions. And this is coming as Jensen Huang was apparently visiting China. And yeah, we still don't know what those conditions are. It's still nominally being decided on. And yeah, 400,000 of these GPUs have been approved for ByteDance, Alibaba, and Tencent. So that's a good quantity, actually, a big chunk of the capacity that has been discussed, at least to this point. This is interesting because, if you recall, I think last week or two weeks ago, we were talking about a story where Jensen Huang was saying, look, there's not going to be a big flashy announcement from the Chinese Communist Party saying, hey, we're open for business. What you'll see is the purchase orders will just flow, and nothing's going to stop it from happening. And that will be the sort of tacit endorsement that the Chinese Communist Party is giving. Looks like that's not the case. Looks like, in fact, they're just, you know, they're just coming. Well, this is also, according to sources from Reuters; this is not like a formal announcement. This is, you know, people familiar with the matter now know that these companies will be placing orders, I guess.
Yeah, actually, you know, sorry, that is a fair point. Yeah, I guess I'm confusing the big flashy headline with, you're right, the formal announcement for it. Yeah, you're right. Yeah. You could say it's basically, it is just kind of a leak, in a way, of that thing that Jensen told us. So that's fair enough. Yeah. Now, we did talk about how we expected this to go through. There had been a whole bunch of stories. And this, again, I think it was something last week, the week before, where this announcement was like, oh, China's saying no. And yeah, they're declining these orders and they just don't need these silly NVIDIA chips. At the time, I believe we said quite clearly, mark our words, this is not going to hold and China will open the floodgates. So this is it. I mean, it's not surprising. They just need more capacity. One of the biggest challenges that Chinese companies are facing right now is just that they don't have enough chips to even serve inference to their customers. So they'll have, like, 200 million users, as I think Baidu does for Ernie, and they just need to have enough hardware to serve up their existing models to those customers, which leaves them almost nothing left over for research. And so keeping up with the West actually becomes harder and not easier because they have such a huge user base. So there's this kind of interesting balance where, you know, money can't buy GPUs if you're in China, just because you're so supply constrained. So, weirdly, making more money by having a larger user base paradoxically leads you to having less GPU budget for R&D, which is kind of interesting. So that's a local thing. It's going to change over time as the capacity issues are eased. But it's kind of interesting. And speaking of chips, next up, we've got a startup. Recursive is hitting a valuation of $4 billion just two months after launch. They've raised $300 million at that $4 billion valuation. And their plan, or their pitch, is to develop chips that are meant to be specialized for AI, be better than presumably TPUs or other specialized chips for AI. So interesting to see that there are still players in the space. I would have expected at this point that all the companies building custom AI chips would have launched. Maybe investor appetite went up as knowledge or, I guess, awareness of TPUs rose over the last year and last few months. Yes, this is kind of an interesting and different take on recursive self-improvement. So they have supposedly a system, like a chip, that can create its own silicon substrate layer and then speed up AI chip improvements. And so this is kind of, like, I guess, this faster way of printing circuits. Anyway, so the idea is you iterate on that to get to AGI. So it's a very hardware-oriented version of this. We've talked about the software-only singularity model. This is a take that kind of blurs the line between the software and hardware stuff. So definitely interesting. And a former Google researcher is at the head of this. So these are serious people. They have already set up this thing called AlphaChip that's been used in four generations of TPU design. So they know their design really well. And another startup named Recursive is also in talks to get funding at a $4 billion valuation, apparently this week. This is, I guess, that software equivalent, founded or co-founded by Richard Socher, who has been around in the space for a while and is also a founder of You.com.
And their goal is to build AI agents that will self-improve, presumably recursively as well. Their pitch on the website, just looking there, is they have a platform that equips agentic AI with scientific simulation and optimization tools. So they, I guess, are trying to provide more of a product to solve problems. And then I'm sure part of the pitch is also recursive self-improvement. This one seems to be a little less far along. They don't have as much of a pitch that I can see. And it's kind of weird that Socher is the lead of You.com. The founder is going to still be there; sounds like he'll maybe be more of an advisor, or not the CEO, of this company. Anyway, it's all kind of still being cleared up. Yeah, it's sort of interesting. A bit of a red flag, then, for You.com, to be honest. I mean, the way these companies go. Unless you're Elon, you don't tend to run two different significant companies, or I guess unless you're Jack Dorsey or whatever. But the kind of philosophy here, their approach to the recursive self-improvement thesis, is somewhat different from OpenAI. So they want to be less product oriented, more a sort of pure-play recursive self-improvement company. Think here maybe more philosophically aligned with, like, Ilya, you know, Safe Superintelligence. They've got an approach to sort of recursive self-correction that focuses on basically formal verification for safety. So they're concerned about these models, as they recursively self-improve, kind of drifting away from their original goals, which is a problem a lot of people have flagged. It's like, you know, you might have a pretty well aligned initial model, but then as you start that recursive self-improvement loop, goal drift happens. And then you end up with this very kind of dangerous misaligned model that's also very capable. And so they're focusing on mathematical proofs to guarantee that goal drift doesn't happen in the way that they're concerned about. So anyway, one to watch for sure with a fundraise like this, and we'll see. Yeah, I do wonder, like, I will say, I don't know about this $4 billion number, if journalists got confused, because there are not many other sources you can find for one Recursive as opposed to the other Recursive. In fact, if you Google for one Recursive, you'll find the other Recursive, and vice versa. So there are fewer sources about this one, I'll just say. And one last big funding round: we've got a lab really called Flapping Airplanes that has launched with $180 million in seed funding. And that name, Flapping Airplanes, kind of hints at what they do. They are aiming to find a different approach to AI development, to AGI, that's more inspired by nature, so that it's more data efficient and the AIs are able to learn quicker without ingesting half the internet or the entire internet. That's the basic pitch, and they're definitely seeming like a more research-first effort. Yeah, exactly. Arguing that, like, hey, we're more insight bottlenecked than compute bottlenecked, which, again, is consistent with Safe Superintelligence's approach, and Recursive's approach, if in fact this Recursive thing is the thing that we just talked about. Yeah, so I mean, you know, this is a recurring theme, and it does change the way you think about investment in these sorts of problems. It's also reflective of what we're already seeing. Like, it's almost like Ilya came out and said, hey guys, I think we're insight bottlenecked rather than compute bottlenecked. And then all of Silicon Valley went, oh my God, we might be insight bottlenecked rather than compute bottlenecked.
And part of this could be interpreted to be, you know, Sam Altman really rode the coattails of Ilya on the scaling train, and Dario and all that, like, you know, for a long time. And in the regime we were at back in 2018, 2020, it was approximately correct to say scaling was the main thing blocking us from AGI. Like, in retrospect, it's clear that, like, I don't think anybody back in 2020 would have bet on us having stuff that is this close to AGI by now. But we're kind of at the point where we seem to be tapping out raw scaling. And so there's a debate as to whether that's actually the case. And I think that's an interesting debate. But you're certainly seeing Sequoia and Index and all these funds now sort of spreading their bets around a little bit more, looking for other kinds of companies that are not just hardcore scaling as much. The more that approach becomes appealing, the more the massive CapEx costs of pure-play compute scaling are going to come into doubt. And so we're figuring out right now what the ingredients are for intelligence. No one really knows, but you've got to have a balanced portfolio if you're concerned about getting surprised by some stray breakthrough that requires way less compute. Right. Yeah. This is most comparable to the SSI startup, which, as far as we know, is just doing research on fundamental insights, kind of the next couple of research breakthroughs necessary to get to AGI. That's the pitch here. They're still in stealth, really not much that we know, but founded by research types and still quite a small team. I do want to say, too, there's an important national security implication to this whole new paradigm. If it's not just about pure scaling of compute, then, for one, the US-China chip stuff becomes less important. I think it's still important. I think almost no matter what paradigm you come up with, having more compute to work with is probably going to give you an advantage. It would be hard to imagine that not being the case, but who knows? The other piece is your insights become national security level secrets. You can think of, like, Ilya right now is based in Israel, right? So pretty much assume that Israeli intelligence is all over that, right? Assume also that any company that has Chinese infrastructure that's plugged into that is all over that. Assume also that, to the extent that the Western countries have interest in that, they may be all over that. So your need for security actually starts a lot earlier if you're playing the game of, hey, it's all about the insights. Because there are these insights that you can pass along in a 10-minute conversation that could save billions and billions of dollars in compute costs, to the extent that's the case. Like, boy, does this become an interesting, basically, game of tradecraft and getting spies in places and monitoring communications. I think this is a really important aspect for all the people who are working on data center security and all this stuff; don't forget there's this giant hanging chad just waiting to cause problems if, in fact, we live in Ilya's world. And one last funding story, actually going back to chips: we've got a raise of $110 million in a Series A round by Neurophos, which is developing optical processors for AI inference. So this is dealing with optical processing units that are meant to outperform GPUs, which don't use light and photons, at least so far. So another bet on a different, more kind of speculative chip architecture that would make it more efficient and allow scaling.
I guess, you know, at some point we're going to hit a wall of scaling on GPUs. They already are insane. So I could see this being very attractive as a bet on, like, infinite scaling or something. Yeah, like, really it's all about how do we get down to almost, like, the physical limits of heat propagation in data processing and all that. And, like, just, you know, moving to the optical domain is a big win for that, because you don't have circuits. You don't have electrons bumping into each other and rubbing each other and producing heat that then has to be dissipated, this whole nightmare of kind of traditional fabrication. Optics is hard. Optics is really hard. Photons are really annoying because you can't keep them in one place. So you can use them for computation, but then storage becomes this massive problem. They are massless and they move at the speed of light, and so keeping that contained is not usually the solution. And so usually people look at hybrid things where you have storage in kind of electronic form and then photonics for the compute. But then that implies an interface between them. So there's a whole, like, this is a very difficult space, but at some point, I mean, it's physically possible, so somebody's going to crack it. It's a question, I guess, of whether it'll be smooth, whether Huang's law will continue where Moore's law left off, and will it continue into the optical domain smoothly? Who knows? I'm sure we'll find out at some point. Right. And it does sound like they're doing a slightly different photonics approach; there are other players in the space already. Just from briefly looking into the details, they have this metasurface modulator that has optical properties that can do matrix multiplication. So it's not exactly like you have photons instead of electrons; it's not kind of an optical transistor. There's some advanced physics and science going on here that might be a little fancier. Yeah, like, metasurfaces, so they are still using photons. It's just that a metasurface is something where you can modulate its optical properties, often by giving it electrical currents, or maybe there are, like, nonlinearities where higher intensity light induces different properties than lower intensity light. So, like, metasurfaces are this area that got really hot, I want to say, like, early 2000s or something. Anyway, the idea is you encode optical properties on a surface so that it looks like a matrix, almost literally: you could shine a beam of light on it, and then in the light that comes out the other side, a little patch is really bright where the number is big, and a patch is dark where it's small. Like, very roughly, that kind of idea. This is where you can get into the sort of spatial light modulation or whatever. I'm not sure exactly how they're doing it. It's not clear from this, but that, at a high level, is what they're sort of trying to do. And I haven't seen a startup succeed in this yet, obviously, but I mean, this seems like an important thing. Someone's going to crack this one day.
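To make that "matrix encoded on a surface" intuition concrete, here is a toy numerical sketch. This is purely illustrative and assumes a simplified intensity-only model; it is not Neurophos's actual design, which would involve phase, interference, and detector physics that this ignores.

```python
import numpy as np

# Toy model of an optical matrix-vector multiply.
# Imagine a metasurface as a grid whose per-element transmission
# encodes the matrix: brighter spots where the numbers are bigger.
W = np.array([[0.9, 0.1],
              [0.3, 0.7]])      # transmission pattern encoding the matrix

x = np.array([1.0, 0.5])        # input light intensity per column

# Each element modulates the incoming beam (elementwise multiply),
# and one photodetector per row integrates the transmitted light.
y = (W * x).sum(axis=1)         # same result as W @ x, but done "in light"

print(y)                        # [0.95 0.65]
```

The appeal is that the multiply-and-accumulate happens at propagation speed with essentially no resistive heat; the hard parts the hosts mention, storage and the electronic-photonic interface, are exactly what this sketch leaves out.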
On to projects and open source; quite a few releases this week. First up, we've got Qwen3-Max-Thinking, which is what it sounds like: it's a big version of Qwen3 that's optimized for thinking, trained on large-scale data, with a large context window. Its context window is 262,144 tokens, i.e., 256K. And yeah, they have various demonstrations that show that this is more comparable to that Gemini 3 Pro, Claude Opus, et cetera, level of reasoning, beyond existing Qwen models. So they've got a couple of things going on that are a little different. One of the key ones here is the way that it interacts with tools. So traditionally, the system prompt or the developer has to tell a model how to use tools or which tools to use, which mode to use. Like, hey, I want you to run in search mode or calculator mode, whatever the thing is. In this case, that has actually been trained into the model, so that it actually just autonomously chooses which mode applies. And they say that that actually helps with hallucinations, too. So anyway, it's kind of an interesting piece. They also do not have a separate router model that looks at your prompt to decide which tool. So it's really, like, all baked into that main model. It's basically trained natively to have search, memory, and code interpreter kinds of senses, rather than external plugins. And it's a really compelling take on that kind of model. It seems to have actually pretty good specs in terms of its performance on benchmarks, too. And it is big. Like, smaller Qwen3 models obviously kind of range from under a billion to, like, 30-ish billion parameters. But this Max series is over a trillion parameters, with, I think, something like 35 billion activated per token. So, you know, this is a big, big dude. Right. And, at least on the claimed performance, and you always have to take these with some skepticism, they're saying they beat Gemini 3 Pro, GPT-5.2, Claude Opus on kind of most of the major benchmarks, GPQA Diamond, et cetera, when you do test-time scaling. So they also have test-time scaling, where you give it extra reasoning or multiple agents, and it beats DeepSeek 3.2 out of the water, although there's also a big version of DeepSeek, so not exactly a fair comparison. I think in general it's probably not entirely a fair comparison in these plots; it's not apparent if they're using GPT-5.2 high or medium or low, whatever. But either way, this is clearly that kind of reasoning-maxed, thinking-maxed version of Qwen that seems pretty impressive. And speaking of impressive releases, we've also got Kimi K2.5, just released. Similar story. This one is more optimized for coding in particular. So this comes together with Kimi Code, which is Claude Code, but with Kimi. And from what I've seen, the vibe checks on Kimi have been very solid. People say Kimi K2 was very capable. Here, they say that this outperforms Gemini 3 Pro on the benchmarks, also better than GPT-5.2 on another benchmark, especially multilingual benchmarks. So yeah, the models coming out of Qwen, Kimi, DeepSeek, all of these are very usable and keep getting better at a fairly steady rate. Yeah, and this is quite a conceptual step forward. So it is an attempt to marry these two modalities, visual and basically text, in a more intimate way. So typically, you know, models are trained first on text, and then you'll, like, bolt on vision later through supervised fine-tuning. What they're doing is continual pre-training on 15 trillion tokens that mix visual and text data, so putting them kind of on the same footing. The context window is 250K tokens, so it's pretty much in the butter zone of things that we tend to see now in the open source. It's a trillion-parameter model, 32 billion active. So all kind of consistent with what we're seeing for big models in the space.
But again, thinking about the model itself: instead of doing this text-based training and then fixing it in post-production by doing some training on vision and video, what they do is, they don't use adapters. They natively train it to be multimodal. And in fact, they map the text and the images to the same latent space during training. So it is quite literally thinking in pixels and words simultaneously, on the same level. Usually, again, you have an adapter, right? So you'll have your image that comes in, you will generate some kind of embedding and then process it to make it compatible with token space, which is sort of this janky thing. This is very much like, no, no, no, whether it's an image or video, I'm thinking of it in the same plane. Kind of like with humans: if you say a pink elephant, right? Somehow that maps into the visual domain in a very natural way. This is sort of the idea here, and I'm sure there are a million problems with the analogy I just threw out there. But anyway, so they did a whole bunch of interesting causal relationship training as well. So during the pre-training phase, they got it to predict the next frame or the next action in a sequence, to specifically understand, like, that a button clicked in a video caused a new page to load or caused the video to pause or whatever. So developing this very natural understanding of how to interact with a computer, which is the main thing that they're after here. I mean, the reason to make these things natively visual is that they want it to interface with the computer, right? They want it to be able to use apps and tools and things like that the same way a human might, which is consistent with a lot of what Anthropic's been doing, especially lately. That's really the goal here. They also have a whole bunch of really interesting feedback loops baked in, where, for coding tasks, for example, the model is trained to actually look at the rendered output that it produces. So it'll write some code, and then it will run the code, produce some kind of UI, like a user interface. And then the model will look at that user interface and use that to calculate, basically, the difference between that user interface and, say, the target interface that it was asked to build. And based on that, see its own mistakes and kind of iterate on its reward. So that's one way in which it's really important for the model to be able to think in visual space very naturally, so that it can do a very clear semantic side-by-side between two images to understand what was missed. So it's kind of interesting. And coding with vision is a big part of this. The idea here is really to be able to take a screenshot or some video of a website and then recreate code for it. So the video piece is especially important, right? Like, go to X or whatever, scroll around, do a screen cap of that, and then share it and ask it, based on that, I want you to recreate this app, including all the flows that I went through, right? That's the sort of thing this can start to unlock, which is a lot more than what we've seen historically, where people will take a picture of a website and just, like, upload it and have, you know, the landing page copied or whatever. So, you know, this is a really interesting paradigm. Also, they've got a whole agent swarm process that they train into this thing, where it can create these swarms of up to a hundred sub-agents and get speedups through that. So a lot of tasks are parallelizable in this way. Not all of them are, but a lot are; you know, if you want to research 50 of my competitors or something, right, that can obviously be parallelized. You can have one agent researching each competitor, say, but that can't always be the case.
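As a rough illustration of that rendered-output feedback loop, here is a minimal toy sketch. The function names and the pixel-level similarity metric are assumptions for illustration only; Kimi's actual pipeline presumably uses a semantic visual comparison, and the ~95% threshold comes from the hosts' description of one training step.

```python
import numpy as np

def visual_match(rendered: np.ndarray, target: np.ndarray) -> float:
    """Toy similarity score in [0, 1] between two RGB screenshots.
    Real systems would use a perceptual/semantic metric, not raw pixels."""
    diff = np.abs(rendered.astype(float) - target.astype(float)) / 255.0
    return 1.0 - diff.mean()

def ui_reward(rendered: np.ndarray, target: np.ndarray,
              threshold: float = 0.95) -> float:
    # Thresholded RL reward as described on the show: credit only when
    # the rendered UI is a ~95%+ visual match to the target design.
    return 1.0 if visual_match(rendered, target) >= threshold else 0.0

# Toy usage with random 64x64 "screenshots" (illustrative only):
rng = np.random.default_rng(0)
target = rng.integers(0, 256, size=(64, 64, 3))
render = np.clip(target + rng.integers(-5, 6, size=target.shape), 0, 255)
print(ui_reward(render, target))  # near-identical render -> reward 1.0
```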
So anyway, they've got a lot going on here in this release. There's a whole bunch of stuff around how their reinforcement learning loops work, leveraging this deep understanding of the visual domain. For one example, right, the rewards that this model can give itself during RL can map onto the level of conceptual, sort of semantic, visual match to a target design, which is not something we've historically seen, at least not in open source. So, for example, during one training step, it gets a reward only when the code it generates results in, like, a 95% or higher visual match to its target design. That's the kind of thing, you know, that visual fidelity is just really hard to evaluate, and yet they're pulling that off here. So really impressive performance on all the things, including SWE-bench Verified, almost 77%, which really puts it up there with a lot of the big frontier models. And on OCRBench, for optical character recognition, it surpassed GPT-5.2 in at least document text extraction. So, you know, this is a really serious, hefty release. It's out there in the open source. Well, at least for Kimi; Qwen3-Max-Thinking, unlike previous Qwen models, is not open sourced. It's, I guess, a product, but this is their new flagship model that I guess they're probably not going to be releasing. Kimi K2.5, it looks like you can download it; it is open source. And as you said, they are very much highlighting the kind of visual stuff that Claude Code is pretty bad at, from my experience. Also similarly large, so this is, like, over one trillion parameters, 32 billion activated parameters, so, you know, also kind of max thinking in a way similar to Qwen 3. So it's easy to see these open source kind of flagship models, Qwen 3, Kimi, kind of moving towards closed source as they get better and better and people actually start using them for business purposes, I would expect. You know, a great... I mean, I got lost in the paper and started to get into the weeds. You're like, hey, it's not even open source. That is really interesting, right? I mean, the Kimi series was supposed to be... 2.5 is open source, Qwen3-Max isn't. And that's also, you know, isn't that a pattern, right? Eventually you get to the point where it's the open source model, it just can't, you know, you can't just run on that. It's not that it's bad for business, but, you know, we've seen OpenAI go through that. We've seen xAI go through that. We've seen Meta. We've seen a lot of big companies rotate into closed source. And one more open source release on the coding front: we've got AI2, the Allen Institute for AI, starting to release things on that front. They have this open coding agents project, and they are beginning with an agent and a paper. The paper is called Soft Verified Efficient Repository Agents. And so the idea there is you have these smaller models, 8 billion and 32 billion parameter models, and with this kind of researchy approach, they are meant to adapt to a specific existing code repository and do effective software engineering in that space. And they say that this means they are better than other open source models, Qwen3 Coder, and also some closed models. Of course, not at the Claude Code level, but Sera 8B and Sera 32B are pretty solid.
And the Allen Institute for AI is, like, open source max. They have a paper, they have the model, they give you everything. Yeah. And an early mover in this space too, obviously. Yeah. A lot of impressive models. I actually think, like, one of the challenges for the open source community is going to be: I don't know how we keep having big releases indefinitely in the future like this when things feel so commoditized, like, the pain of switching from one open source model to the next. I mean, this is why there are so many platforms now that just kind of help you pick your model really quickly. But it's an interesting challenge. And I'm curious what the incentives will be that govern the release of increasingly capable open source models going forward. But yeah. And speaking of that, another open source release, coming from Arcee AI, a 30-person startup: they have released Trinity, a 400 billion parameter open source model that they are saying is one of the largest open source models from a U.S. company. And this is competing with Llama 4 and GLM 4.5, kind of other open source models out there, not competing with other frontier models. These are available for free download. And, you know, maybe to that question, we've got investors and venture capitalists to thank for these very good startups. Kimi also comes from a startup, of course. So it's always fun when VC money funds free stuff for the rest of us. Yeah, I think that's a lot of what's happening here. Now, this is a really interesting release, not just because of its capabilities or the way it's trained, but who trained it, right? So this is Arcee, A-R-C-E-E, in a collaboration between them and Prime Intellect. And we've covered Prime Intellect and their big releases, including Intellect-1 and other sort of RL-flavored variants. These guys specialize in this very kind of globally distributed training infrastructure. The goal is to make it possible for people to train in a sort of torrent-based way, where you don't have a single centralized piece of compute. In this particular case, they do use a centralized cluster. It's about 2,000 B300 GPUs, so, you know, a pretty hefty piece of equipment, or bundle of equipment. So they're basically trying to debug and experiment with this new approach. I would expect, given who is doing the training here, that the next step is we start to see a lot of these ideas ported into this sort of distributed training context. A couple of interesting architectural details here. So they use local and global attention layers. So they have some layers of their model that are paying attention only to local correlations between tokens, and they use a sliding window attention. So basically, for any given token, you're only going to attend to the tokens that are within a certain window's width of that token. You're kind of going to ignore the rest, even if it's in the context window. And that local attention comes with rotary position embeddings, right? RoPE. We've talked about that before. The details don't matter here, but it's a fancy way of making sure that the model can keep track of which token is where in the sequence. Transformers do not natively know what the positions of each token are; they kind of just arrive as a bundle of tokens. And you have to help the model learn about token ordering by layering some information on top, which is what RoPE does.
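As a rough sketch of that local/global split, here is a minimal NumPy illustration. This is not Arcee or Prime Intellect's code; it omits causal masking, multiple heads, and the RoPE rotation itself, and just shows how a sliding window restricts which tokens a local layer can attend to.

```python
import numpy as np

def attention(q, k, v, window=None):
    """Scaled dot-product attention over one sequence.
    window=None -> global layer: every token attends to every token.
    window=w    -> local layer: token i only sees tokens j with |i-j| < w
                   (sliding-window attention)."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    if window is not None:
        i, j = np.indices((T, T))
        scores = np.where(np.abs(i - j) < window, scores, -np.inf)
    # Softmax over each row of scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: 8 tokens with 4-dimensional queries/keys/values.
# In a Trinity-style stack, local layers would apply RoPE to q and k
# before this call; global layers would skip positional info (NoPE).
rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(8, 4))
local_out = attention(q, k, v, window=3)   # sliding-window layer
global_out = attention(q, k, v)            # full-context layer
```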
So the local layers add that kind of positional-embedding information because the relationships between token positions matter a lot when you zoom in, right? You want to know, say, that "Andrey murdered Jeremie"; it's important to know what the order of those words is. But when you zoom out, at, say, the paragraph level or the page level or the book level, does it really matter so much? Yes, which page comes before which, but which book comes before which? Once you zoom out enough, the ordering of things starts to matter a bit less. And so the global layers, the layers that look at the whole context, do not use positional embeddings. They use NoPE, which basically stands for no positional embeddings. That's the idea here. They also have a whole bunch of interesting approaches to address certain problems that arise with the attention mechanism. You'll often find that models put really high attention scores on the very first token, because of the math around how attention probabilities get distributed in a transformer, and they find a way around this; if we had more time this episode, we'd go into it. But then they also have these really interesting strategies to assign experts. This is an MoE model, and typically, during training, you train the experts to load balance. So if one expert keeps being assigned the tokens that come in, you go, oh, okay, it's getting overused; let's spread the joy a little bit to the other experts. And usually the way you do this is with binary logic: if an expert is underutilized, you increase its bias by a fixed amount, one step, and if it's overutilized, you decrease its bias by that same amount. So you're taking these discrete steps. Their approach is smoother: they treat it as a continuous optimization problem. Instead of a single fixed step, they use a tanh function to create a smooth update that is magnitude-aware. The update scales based on how far the expert is from its target load, not just whether it is above or below it. There's a bunch more stuff; this is a really interesting paper. And in general, these open-source papers are really, really helpful, especially from Prime Intellect, because they do go into the weeds of the compute and how everything is tied together. Yeah, man, it's painful to skip these things, but I'm going to stop myself here because I know there are a few more things to dive into. As you said, there's a technical report, fairly detailed, 15 pages, that goes into the weeds of the training. They also discuss the data mix and the data generation, with a lot of synthetic data being used here. They have a graph of training with three phases, where you have different data mixtures. So a lot of that dark magic, beyond the basics, of what they had to do to train this model, which they did over many months, and relatively cheaply, at $20 million, it sounds like. So very interesting to see the nitty gritty of how that was done.
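Here is a minimal sketch of the smooth, magnitude-aware expert-bias update described above, contrasted with the usual fixed-step scheme; the constants and the exact normalization are assumptions, not Trinity's actual code.

```python
import math

def step_bias_update(load: float, target: float, step: float = 1e-3) -> float:
    """The standard discrete scheme: fixed-size step up if underloaded,
    down if overloaded. Sign-only; it ignores how far off the expert is."""
    return step if load < target else -step

def smooth_bias_update(load: float, target: float, scale: float = 1e-3) -> float:
    """Smooth, magnitude-aware scheme in the spirit of Trinity's approach:
    the update grows with how far the expert is from its target load,
    saturating via tanh. The scale and normalization are assumptions."""
    error = (target - load) / target  # relative load error
    return scale * math.tanh(error)

# An expert at 50% of its target load gets a bigger nudge than one at 95%.
print(smooth_bias_update(load=0.50, target=1.0))  # larger positive update
print(smooth_bias_update(load=0.95, target=1.0))  # small positive update
```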
Moving on to research and advancements. First paper: post-layer norm is back, stable, expressive, and deep. So there are these, I keep saying nerdy, but yeah, nerdy details in neural net architectures: layer norm, pre-layer norm, post-layer norm. Where you put the layer norm is the gist. There's this normalization step where you make the outputs look nice and similar, and you can put it in different places in your neural net. According to this paper, post-layer-norm architectures are potentially more expressive but harder to train; they have gradient instability, and the paper finds a reason for that. The reason is the residual paths. But they have a small tweak that they say makes it possible to train very, very large networks with post-layer norm without it being an issue. Yeah. And the intuition behind this is something like: at every layer in a transformer, you have an input that comes in, and the layer chews on that input a bunch, but then you add the input back in to what you spit out and send on to the next layer. So in a sense you're saying: imagine somebody gave you some sort of object and said, hey, I want you to fuck with this object. So you're like, okay, cool, I've now fucked with the object. It's a different object now, but you also want to pass on the original version of the object you got, and give both to the next layer. That way the next layer can go, okay, if your fuckery with that object was too extreme, if you went too far and lost some important context, I at least have the version the previous layer received as an input to work from. And this is how, down a very large number of layers, a lot of the core information can still be propagated. The math gets complicated, but it turns out that if you normalize in a way where the original object and the fucked-with object have to share a maximum amount of information space, they have to duke it out, basically, then you run into this problem where, over the course of many layers, the original version, the residual from the last layer that you're trying to keep in there, gets beaten down and beaten down. At every layer it gets squeezed more, and the information it contains gets progressively destroyed. So what they do here is say, okay, let's amplify that component at every layer to give it a fighting chance, so it continues to be robust all the way down the line. That's very roughly the intuition. The math is quite interesting, and the result is important. The bottom line is that where you put your layer norm significantly affects the gradient math, the math by which you determine what the updates to your model's weights ought to be. That's the gist. Right. And this combines with previous weeks, where we've discussed a lot of these tweaks to the transformer architecture that are small in a sense. This is one tweak in the way you create a layer, but that tweak makes a significant difference. It doesn't fundamentally change the transformer. It's not the kind of research breakthrough we've been discussing, but more and more of these really deep, advanced tools are being explored, it feels like, at least lately.
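For reference, here is a minimal sketch of the placement difference and the residual-amplification idea; the alpha scaling shown is an illustration of the paper's general direction, not its exact formulation.

```python
import torch
import torch.nn as nn

class PostLNBlock(nn.Module):
    """Post-layer-norm sublayer: normalize AFTER adding the residual.
    A residual scale alpha > 1 amplifies the skip path so it isn't
    squeezed away over many layers. Treat this as a sketch of the idea,
    not the paper's precise scheme."""

    def __init__(self, d_model: int, alpha: float = 1.0):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm = nn.LayerNorm(d_model)
        self.alpha = alpha  # residual amplification factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Post-LN: y = LN(alpha * x + sublayer(x)).
        # Pre-LN, by contrast, would be: y = x + sublayer(LN(x)).
        return self.norm(self.alpha * x + self.ff(x))

# Usage: a block with a gently amplified residual path.
block = PostLNBlock(d_model=512, alpha=1.5)
out = block(torch.randn(2, 16, 512))
```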
Next, we've got a paper on continual learning: Self-Distillation Enables Continual Learning. They're tackling continual learning in the traditional sense, where you have a sequence of tasks of different kinds that you need to learn, and the challenge is, as you learn new tasks, can you still be good at the previous tasks you learned? The gist of the approach is that you have a teacher model, a big, strong, good LLM, and you have a student model, and the teacher provides answers to the smaller model while the student works on the task itself. They go into a bunch of detail on on-policy versus off-policy. Briefly speaking, on-policy is when you're doing the task yourself and learning from your own outputs; off-policy is when you're getting data from somewhere else and trying to learn from that. And on-policy can often be better and more stable. You can see I'm trying to go fast because we still have a bunch of papers, but they demonstrate a way to do learning in a stable manner with no degradation across a few phases of learning. Yeah, and this on-policy, off-policy thing, intuitively, we all feel it every day when we're trying to learn something. This adage, learn by doing, that's really what this is getting at. People will either learn by being shown a textbook that tells you how to do a thing, and so you'll read the textbook and be like, I kind of get it, but if I ask you to do the thing, you're going to start shaking in your boots. Whereas if instead I had said from day one, okay, let's take you out and actually get you to start doing this right away, you are putting in your agency; your decisions determine your next actions, which determine the feedback you get, making you learn a lot faster, in a richer way, in a way that maps more directly onto the kinds of choices you would be making in the real world when you go to do the thing. So off-policy learning is basically that textbook thing: you're given a textbook, and you're off-policy because you're not actually testing your own approach to solving a problem, you're just reading somebody else's. On-policy is: I'm going to actually use my policies, the policies I have in my brain, and they'll probably be bad to start, but I'm going to use my policies, my strategies, my approaches to solve this problem, and get feedback from the world that directly tells me about my policies, not someone else's. And that way I can learn much faster and more effectively. So what they're doing here is exactly that: figuring out how to make this process of fine-tuning into something that looks more on-policy, like active learning. And their solution is, yeah, you take a teacher model and a student model, but instead of some external model, the teacher is actually the same base model as the student. They are both the same pile of weights. What the teacher gets, though, is a set of expert demonstrations loaded into its prompt, into its context. And based on that prompt, which helps the teacher better understand how to solve a problem, the teacher is going to evaluate the solutions that the student generates.
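A minimal sketch of that setup, assuming a Hugging Face-style model interface; the tensor slicing glosses over the usual one-token shift between logits and targets, and none of this is the paper's actual code.

```python
import torch
import torch.nn.functional as F

def self_distillation_step(model, prompt_ids, demo_ids):
    """One on-policy self-distillation step: the 'teacher' is the same
    weights conditioned on expert demonstrations in context; the 'student'
    is the bare model. The student samples its own response, and we push
    the student's token distribution toward the teacher's on those tokens."""
    # Student generates on-policy from the bare prompt
    # (assume generate returns only the new tokens here, a simplification).
    response_ids = model.generate(prompt_ids)
    n = response_ids.size(-1)

    # Teacher scores the SAME response, with demonstrations prepended.
    with torch.no_grad():
        teacher_in = torch.cat([demo_ids, prompt_ids, response_ids], dim=-1)
        teacher_logits = model(teacher_in).logits[:, -n:]

    # Student scores its response from the bare prompt only.
    student_in = torch.cat([prompt_ids, response_ids], dim=-1)
    student_logits = model(student_in).logits[:, -n:]

    # KL(teacher || student): distill the in-context knowledge into weights.
    return F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
```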
And so essentially what you're trying to do is cause the student, well, it's the same model really, but cause the student to update its weights in a way that encodes the information that was in the context window of the teacher. The teacher is still using the same weights it had to generate its feedback. So the teacher is on-policy, the student is on-policy, everybody's doing this active-learning thing, which again just means the model is being trained on its own generated outputs; it's getting to see the consequences of its own actions. So yeah, it's actually kind of an interesting paradigm, and potentially much better for catastrophic forgetting. We've talked about this before: going from textbook to textbook to textbook, you kind of forget what was in the first textbook because you're just reading, whereas if you go out into the real world and do a thing, you're getting a much more robust way of learning. And we've talked in previous episodes quite a few times about this difference between supervised fine-tuning and reinforcement learning from the standpoint of catastrophic forgetting; on-policy is just much more advantageous. Right. And I guess the key term, self-distillation, is notable here, because distillation typically means you have a big model that you trained, and then you get a new, different, smaller model that you distill into. Here you have this teacher and student, but as you say, the teacher is in some sense the same model, just with a different environment. So you distill into yourself; you kind of self-improve, right, is the idea. And the next paper is, amusingly, very similar and came out almost the same day: Reinforcement Learning via Self-Distillation, which conceptually is, let's say, a neighbor to the previous paper, not the same, but related. Briefly, the idea here is, again, you're using the model to provide a signal to train on, but instead of having a teacher with demonstrations, their pitch is: instead of just sparse, verifiable rewards, the key thing is to have feedback. So retrospection, a richer reward signal from the model itself: it looks back on its previous outputs and analyzes not just whether it got something wrong but also how, and then uses that to train. And so, again, you wind up with an on-policy approach with self-improvement at the core of it. They also have results demonstrating that this works. Continual learning, as we've said, seems to be the topic everyone's excited about, and on-policy learning, kind of predictably, is going to see exploration like this. Yeah. One of the big things we're seeing here is: how do you take context and turn it into weight updates? This is another example of that. The key here is that the teacher will take in the original prompt, its own original failed attempt, the one that didn't give the right answer, and the feedback you just talked about, and use all of that to generate a corrected response. And because it knows what the feedback was and knows about the failed attempt, has all this context, that corrected response is much more likely to be correct. And then what it can do is take that corrected response, which hopefully is going to be better, and update its own weights to minimize the difference between that corrected response and the initial response it gave, through a KL divergence loss, basically, which is what you would do in this kind of context. So essentially it's trained to produce the corrected answer directly from the original prompt, skipping the whole mistake-and-feedback loop in the future. This is a great way to make sure richer, more nuanced feedback is accounted for: it literally turns into a better answer, and then you directly use that answer to improve your weights. And the model is the teacher; it's also the student. Again, that whole idea of self-distillation, which is crucial in this paper and the previous one, as you mentioned.
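A minimal sketch of that retrospection loop, under the same Hugging Face-style assumptions as before; the paper's exact objective may differ, so treat the loss below as illustrative.

```python
import torch
import torch.nn.functional as F

def retrospective_step(model, prompt_ids, failed_ids, feedback_ids):
    """Sketch of RL via self-distillation as described above: the model,
    shown its own failed attempt plus rich feedback, generates a corrected
    response; weights are then updated so the bare prompt yields that
    corrected response directly next time."""
    # 1. Same weights, richer context: prompt + failed attempt + feedback.
    teacher_in = torch.cat([prompt_ids, failed_ids, feedback_ids], dim=-1)
    corrected_ids = model.generate(teacher_in)  # assume new tokens only
    n = corrected_ids.size(-1)

    # 2. Score the corrected response from the bare prompt alone
    #    (glossing over the one-token logits/targets shift for clarity).
    logits = model(torch.cat([prompt_ids, corrected_ids], dim=-1)).logits[:, -n:]

    # 3. Pull the bare-prompt distribution toward the corrected response
    #    (cross-entropy stands in for the KL-based objective mentioned above).
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           corrected_ids.reshape(-1))
```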
And speaking of that, we've got a real sequence of related papers here. The next one is Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability. Related, but again different. This one is what they call an asymmetric teacher-student meta-reinforcement-learning framework. So there's again a teacher and a student, but it's asymmetric in that the teacher is separate from the student. The teacher proposes question-answer pairs that the student is then trained on, and the teacher is rewarded based on the student's improvement. So you're teaching the teacher to be a good teacher based on how well the student learns from it. And that's where the meta-reinforcement-learning part comes in: the student is doing reinforcement learning, and the teacher is doing reinforcement learning on the student's reinforcement learning. So yet another exploration of continual learning from a different angle, learning to learn, or learning to teach, I guess, as opposed to the previous two papers that set in stone a way to do it. Yeah, and the goal here is, you give a hard problem that is way harder than either the teacher or the student could initially tackle. Then you use the teacher to generate simpler question-answer pairs and iterate with the student on getting the student able to solve those. If you've done a good job of choosing your question-answer pairs, even though they're simpler than the problem you're actually going after, that really hard target problem, they should help the student get a little better at solving it. And that's the key metric they use to train the teacher. So as the teacher and student continue to interact, the teacher should be learning to generate specifically the kinds of practice problems that make the student better at the harder problem, if that makes sense. It's a bi-level meta-RL loop: the student in the inner loop iterating hard to get smarter, and the teacher in the outer loop trying to get better at making problems that make the student better at solving the hard thing. They go into detail on some really interesting stuff to solve for this, but ultimately we've seen versions of this before. The challenge these setups always run into is that if you want to reward the teacher, you typically need to figure out exactly how the problems it generated made the student smarter, and mathematically, the gradient-propagation math of that is just a nightmare. So what they do here is treat the student's improvement as a black box: just a black-box reward for the teacher. They don't unroll the entire student training process, and they simplify everything that way. And it turns out it works reasonably well. So a pretty interesting conceptual update.
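A minimal sketch of that bi-level loop; every function name here is a hypothetical placeholder for illustration, since the paper's actual interfaces are more involved.

```python
def meta_rl_loop(teacher, student, hard_problems, eval_score, n_rounds=10):
    """Sketch of the bi-level loop described above. The teacher proposes
    simpler practice QA pairs; the student does ordinary RL on them; the
    teacher is rewarded by the student's improvement on the hard target
    problems, treated as a black box (no gradients flow through the
    student's training). All method names are assumed for illustration."""
    for _ in range(n_rounds):
        baseline = eval_score(student, hard_problems)

        # Outer loop: teacher generates practice problems near the
        # student's "edge of learnability".
        practice = teacher.propose_qa_pairs(hard_problems)

        # Inner loop: student does reinforcement learning on the practice.
        student.rl_train(practice)

        # Black-box reward: how much did the practice actually help?
        improvement = eval_score(student, hard_problems) - baseline
        teacher.rl_update(reward=improvement)
```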
All right, and that's it for research and advancements. Moving on to policy and safety, and unfortunately we're going to have to do some politics talk, which I know people in AI and tech aren't typically fans of, but I think now's the time. The first article: Amodei and Hoffman join tech workers decrying Minnesota violence. So this is after the events of this last weekend and the last couple of weeks in the United States, where we've had ICE invading Minnesota, more or less. And, unusually for tech and for big companies, as things have escalated rather radically, we've had people in AI, like Dario Amodei from Anthropic, like Jeff Dean from Google, and now Reid Hoffman, a major figure in Silicon Valley, not directly in AI, but a major investor and pretty influential in the space, starting to comment that this is not good, that ICE, and the Trump administration broadly speaking, is out of line. And this is coming after, politically, AI has tried to cozy up to Trump. We've had OpenAI donations; notably, Sam Altman did a lot of lobbying directly with Trump. And, to keep it focused on AI, beyond the obvious problems with ICE and immigration enforcement in general, there's a lot to be said about what's going on that we've indirectly touched on: the way Trump and this administration are dismantling export controls, which we've covered in the last couple of weeks; funding for scientific research being hurt pretty badly; and if you're a PhD student from China, or from abroad generally, working in AI, there's now a lot more anxiety. So the gist is that the political situation in the US is getting worse overall, and it is now so bad that Silicon Valley people, people in AI and tech, are starting to comment on it. We don't discuss it very often, but it is having significant effects on the overall trajectory of AI, in some ways that are arguably not bad, we have no regulations restricting progress, but in other ways, with regard to research funding and the general stability of the US for people in tech, things are not great. Yeah. I mean, setting aside the politics of this, which I'm not going to comment on, everybody has the views that they have, and I'm not going to toss a grenade into that. The implications from an AI standpoint are kind of interesting. Sam has had a lot of success tying himself to the Trump administration; think about the Project Stargate announcement, which was very much made from the Oval Office. And Jensen has done similar things. Whereas what we've seen historically is Dario tending to struggle more to make inroads with the Trump administration.
There was recently, I forget exactly what it was, some kind of roundtable. We keep seeing these events, these working groups, that bring together all the labs except Anthropic. And it's getting awkward in the same way that, you know, the Biden administration did that electric cars event and Elon wasn't invited. It's the classic thing: everything gets political at a certain point. There's been some interesting discussion about how Dario is trying to position with all the essays that have come out lately, and maybe we'll talk about the essay he put out recently next week. But just in terms of the framing: trying to make sure it's not unpalatable to the current administration, but also robust to changes in administration and changes in Congress. These labs, I mean, AI is becoming a political thing; there are just no two ways about it. With the amount of capex being invested, congressional races are being determined by the AI lobby in no small part. It's not surprising; we've seen it in previous technological generations. But what the different labs want is different, and the representatives of these labs are either in or out of grace with different administrations and sides. I didn't expect it to be the case, but it seems like there are some labs that are more Democrat-coded and some labs that are more Republican-coded. And what an interesting and, I will say, unfortunate turn of events, because it has us orient away from the underlying technical realities of what we're building here. When we see stuff like the risk of autonomy, the risk of loss of control, bioweapon design, cyberweapon design, these are things that should transcend politics, really. But it's interesting to see that they don't seem to, not entirely at least. Right. And I do want to be a little more explicit and direct, especially given the way 2026 has gone. The U.S. is sliding towards authoritarianism and the end of democracy, and when you get to that kind of extreme scenario, it's not just about whether you're Republican or Democrat; it's also about whether you bend the knee to Trump in particular. We've seen Apple, Google, all the CEOs cozying up, and no one in a position of power is willing to stand against increasingly non-democratic and authoritarian actions. As we get into 2026, I think this might actually be a major question for AI, since we have so much investment in data centers in particular, where the government has a lot of power to mess with you in all sorts of ways. Let's just say politics and AI are going to keep getting more entangled as the situation in the U.S. gets more extreme, which will probably keep happening, unfortunately. Yeah, I mean, well, yeah. And again, without speaking to the politics of the issue, I'm trying to separate this out a little bit. Yeah, I'm kind of over trying to separate it personally, because it's just beyond ridiculous. But if we try to separate it, there's a lot here that's purely technical, or purely nerdy, right? Well, there's also just the structural angle. Like, if you look at China, what's happening with the surveillance state there, I think everyone can agree that's not a great thing.
Here, fortunately, there are constitutional limitations on the state's ability to spy on people and things like that. Admittedly, we have Edward Snowden, we have a lot of cases where that gets sidestepped, but at least nominally it's there. Still, AI is going to exacerbate all of that. I mean, we're seeing it in China with the surveillance state: it used to be that you could maybe credibly make the case that, well, they're collecting everything I put out, sure, but collection is not analysis; there's no time to actually look at what I'm doing. And obviously AI changes that calculus quite materially. So anyway, the technological pace of things is definitely becoming political; there are no two ways about it. And everybody feels very strongly about a lot of things, and everybody is more or less right to feel strongly, because the world is actually in a state of massive, massive flux. Right. So the concrete news, beyond the general discussion of the topic, is that given the violence in Minnesota, leaders in AI like Dario Amodei from Anthropic, like Jeff Dean and Reid Hoffman, are starting to comment on the violence. And if there's more violence, there will also be a lot of pressure within the companies, from employees at Google or wherever, to take a stance. And that's going to have repercussions for the development of AI, for all sorts of things. And that is actually such an interesting problem, right? It's a rock and a hard place, because, obviously, Silicon Valley is not typically thought of as a bastion of hardcore conservatism, and you have to keep these employees on board; the best employees win the AI race, and that is a huge, huge part of this, at least. But at the same time, government subsidies, the support of the government in getting access to energy, getting licenses, all this stuff, is really important too. So you kind of have to pick one, and this is resulting in a lot of companies stuck between their employees on the West Coast, who tend to lean one way, and an administration that leans another. And boy, I don't envy anybody trying to ride that tiger. Right, yeah. So we'll try not to touch on US politics too much, but just FYI, things are crazy, and if you're in tech, you might start to get impacted by it more and see more discussion of it within the AI circus, really. For sure. Well, on that bummer note, we're going to close out. Thank you so much for listening to this week's episode. As always, we like to see your feedback, and if you can share the podcast with other AI fans, that's always great. But more than anything, we like to see people listening, so be sure to keep tuning in. Tune in when the AI news begins. It's time to break, break it down. Last week in AI, come and take a ride, get the lowdown on tech, can't let it slide. Last week in AI, come and take a ride, from the labs to the streets, AI's reaching high. New tech emerging, watching surgeons fly, from the labs to the streets, AI's reaching high. Algorithms shaping up the future seas, tune in, tune in, get the latest with ease. Last week in AI, come and take a ride, get the lowdown on tech, can't let it slide. Last week in AI, come and take a ride, from the labs to the streets, AI's reaching high. From neural nets to robots, the headlines pop. Data-driven dreams, they just don't stop. Every breakthrough, every code unwritten, on the edge of change, with excitement we're smitten.
From machine learning marvels to coding kings. Futures unfolding, see what it brings.