ThursdAI - The top AI news from the past week

📅 ThursdAI - May 14 - I’m Back! + Agents get /goals, Meta gets Voice AI, Krea 2 w/ Vic, Codex Mobile & why we all quit OpenClaw

103 min

May 15, 2026

Summary

ThursdAI covers the week's major AI developments including Meta's MuseSpark voice AI integration, Thinking Machines Lab's interaction models, the shift from OpenClaw to Hermes and Codex among developers, and highlights from the Musk v. Altman trial. The hosts discuss new agentic features like /goal commands, supply chain security threats, and CoreWeave's new sandbox product.

Insights

Developer tooling preferences are rapidly shifting from OpenClaw to Hermes and Codex due to stability and feature improvements, with Hermes now ranking #1 on OpenRouter for CLI agents
Full-duplex, real-time AI interaction models (from TML and Meta) represent a paradigm shift from turn-based conversation to always-listening, parallel processing assistants
/goal commands (Ralph loops) are becoming standard across major harnesses (Codex, Claude Code, Hermes) and enable autonomous agent task completion without human intervention
Supply chain attacks targeting AI developers are escalating in sophistication, with malware persisting after uninstall and requiring sandboxed development environments
Foundational model training is shifting toward creative-first approaches (Krea 2) that prioritize artistic flexibility over safety-constrained outputs

Trends

Consolidation around Hermes/Codex as the preferred agentic frameworks, signaling OpenClaw stability issues despite a recent LTS release
Real-time, full-duplex AI interaction becoming table stakes for voice assistants and embodied AI applications
Autonomous goal-driven agents replacing manual prompt-and-response workflows for complex, multi-step tasks
Increased focus on sandboxed, isolated execution environments for agent code to mitigate supply chain and malware risks
Open-source foundational model training (Krea 2, Hermes) challenging closed-API-only approaches for creative and specialized use cases
Security practices (24-hour package delays, API key rotation, sandboxing) becoming essential infrastructure for AI developer workflows
Disclosure of financial stakes and deal structures in the OpenAI trial revealing Microsoft's capped returns and AGI-triggered contract amendments
Integration of AI models into product surfaces (Meta AI app, glasses) prioritized over raw model release by major labs
Steering mode and memory management becoming critical UX features for multi-turn agent interactions
Fusion energy (Helion) emerging as critical infrastructure for AI compute scaling, with major AI companies securing power agreements

Topics

OpenClaw vs. Hermes vs. Codex: agentic framework comparison and migration patterns
/goal commands and Ralph loops for autonomous agent task completion
Full-duplex real-time AI interaction models (TML, Meta MuseSpark, OpenAI GPT Real-time)
Supply chain security: TanStack/npm malware attacks targeting AI developers
Sandbox environments for secure agent code execution (CoreWeave sandboxes)
Krea 2: foundational image model trained from scratch for creative flexibility
Musk v. Altman trial: OpenAI nonprofit-to-for-profit transition and Microsoft investment terms
Steering mode and memory management in agentic systems
Background computer use and multi-agent coordination
Package manager security: 24-hour installation delays and API key isolation
AGI clause amendments in the Microsoft-OpenAI partnership
Voice AI integration in consumer products (Meta AI app, WhatsApp)
WolfBench and coding agent evaluation frameworks
Hermes adoption and OpenRouter rankings
Helion Energy power agreements with Microsoft and OpenAI

Companies

Meta

Launched MuseSpark voice AI with real-time multimodal capabilities integrated into Meta AI app and WhatsApp

OpenAI

Subject of Musk lawsuit over nonprofit-to-for-profit transition; released GPT Real-time 2 and Codex with /goal features

Anthropic

Claude model provider; clarified usage policies for programmatic access; criticized for lack of voice capabilities

Nous Research

Created Hermes agent framework now ranking #1 on OpenRouter; released TST (token supervision training) research

Thinking Machines Lab

Founded by Mira Murati; announced 276B parameter MoE interaction models with full-duplex real-time capabilities

Krea

Released Krea 2, a foundational image model trained from scratch prioritizing creative flexibility over safety constra...

Microsoft

Defendant in Musk lawsuit; has $13B investment in OpenAI with capped returns and AGI-triggered contract amendments

CoreWeave

Announced sandbox product for secure agent code execution; powers 9 of 10 major AI labs' infrastructure

Black Forest Labs

Previously partnered with Krea on fine-tuned image models before Krea shifted to foundational model training

Weights & Biases

Podcast sponsor; provides infrastructure and SDK for CoreWeave sandboxes; hosts WolfBench evaluations

Helion Energy

Fusion energy company with 2028 power deal with Microsoft and deployment agreement with OpenAI; Sam Altman owns 30%

GitHub

GitHub Actions vulnerability exploited in the TanStack supply chain attack targeting AI developers

Fastino Labs

Released GLI Guard, a 300M parameter open-source guardrail model matching larger safety models

Perceptron

Launched MK1 frontier video reasoning model claiming to match Gemini and GPT on video benchmarks at 1/10th price

Artificial Analysis

Launched Coding Agent Index benchmarking model-plus-harness combinations, inspired by WolfBench methodology

People

Alex Volkov

Main host of ThursdAI; recently married; returned from one-week break; testing agentic frameworks

Wolfram Ravenwolf

Switched from OpenClaw to Hermes; running WolfBench evaluations; demonstrated sandbox usage

Ryan Carson

Switched to Codex fully; discussed agentic framework fatigue and consumer adoption challenges

Nisten Tahiraj

Created a scanner for the TanStack malware; discussed security practices and sandboxing strategies

Yam Peleg

Tested /goal commands with Codex CLI; running autonomous agents for week-long tasks; discussed steering mode

Victor Perez

Discussed the Krea 2 foundational image model philosophy prioritizing creative flexibility over safety constraints

Mira Murati

Former OpenAI CTO; founded TML; announced 276B parameter interaction models with full-duplex capabilities

Sam Altman

Testified in Musk v. Altman trial; owns 30% of Helion Energy; disclosed financial stakes and board conflicts

Elon Musk

Suing OpenAI and Microsoft for abandoning nonprofit mission; originally wanted 90% equity and unequivocal control

Satya Nadella

Testified in trial; confirmed $13B investment with capped returns and AGI-triggered contract amendments

Ilya Sutskever

Testified in trial; voted to fire Sam Altman; gave 'ant to cat' analogy for AI progress since 2018

Peter Steinberger

Met Wolfram at ClawCon in London; announced an OpenClaw LTS release; working to improve stability issues

Sunil Pai

Featured in viral Instagram reel discussing AI productivity paradox with Alex Volkov

Quotes

"I'm just so freaking tired of doctoring my OpenClaw... I'm just done with OpenClaw, it's the constant breaking"

Ryan Carson•Early in episode

"GPT 5.5 with Hermes is exactly the same feeling that I had with OpenClaw when we brought it to you guys back in February things just work"

Alex Volkov•Mid-episode

"It's like the difference between an ant and a cat. In 2018, AI was an ant and now AI is more like a cat."

Ilya Sutskever•Trial testimony section

"I would unequivocally have initial control of the company, but this will change quickly."

Elon Musk•Trial testimony

"My probability assessment of OpenAI being relevant to DeepMind/Google without a dramatic change in execution and resources is 0%, not 1%."

Elon Musk•Trial testimony

Full Transcript

And welcome, everyone, to ThursdAI for May 14th, 2026. My name is Alex Volkov. I'm an AI Evangelist with Weights & Biases, and I am so happy to be back live on air with ThursdAI after my one week off. And I'm a married man now. So that happened over the last week, and I will add co-hosts here. What's up, guys? Good morning. Hey, congrats. Yay, it's good to have you back. We're glad you're back, man. Yam and I did our best, but it wasn't Alex's show, so we're glad you're back. I saw a little bit, just a little bit, because I tried to disconnect. I did watch a little, and you guys seemed like you were having fun, which is the most important thing on ThursdAI. As long as we're having fun, I think the audience is having fun. It wasn't a crazy busy week. We didn't get tons of new models, so it's a very chill show today. Let's go around and talk about this. Ryan, maybe you start. My whole world is legal tech and legal AI right now, because that's what I'm building. Hello Tangle, by the way, if you're in Connecticut. Yeah, if you're in Connecticut and you're getting divorced, hit me up. Yeah, hey, focused vertical markets. I'm not getting paid by Hello Tangle, by the way. Right, the startup is not paying me to shill. I already got divorced. And you're never getting divorced again. No, I got divorced once, I'll tell you about this later, but never again. So I obviously care about legal tech and legal AI a lot. Everybody is talking about the new MCPs that Anthropic added for Claude. They're basically a bunch of legal APIs. So everyone's like, oh, Harvey's dead, Legora's dead. It's been a very exciting week from that perspective. Harvey and Legora are two wrapper startups that many law firms use, right? Juggernauts. Yeah, worth multiple billions of dollars now. Let's go to Wolfram. 
Okay, so for this week, it's nothing we can use right now, but it has to be the interaction models by Thinking Machines. That was a great glimpse into what we will soon be able to use and how interaction with AI will evolve. So more of the Her style. We've been promised it and we are still waiting for it. What OpenAI released, which we can use now, is great too, and we will cover it on the show as well. But it's also great to see how far this can evolve very quickly, because it's already being tested internally. Yeah, interaction models from Mira Murati's Thinking Machines, a lab from a former, was she CTO of OpenAI, I think? No, Greg Brockman was CTO. She was marketing or something. No, she was CTO, I think. Greg was president or something. Anyway, Mira Murati is one of the four main OpenAI people in that picture. She left, created Thinking Machines Lab, and brought with her a bunch of people from OpenAI and a bunch of other places. The original cast of Thinking Machines seems to be getting pulled by Zuck to some extent, some of them as their equity vests. So that's interesting. And they finally released something. So far they had released Tinker, which is a fine-tuning-ish service. Now they have an announcement about models. We didn't see the models or get the ability to use them, but it's very interesting, and the demos are fire. So we're going to talk about interaction models. Thank you, Wolf. We actually know the number of parameters: it's a 276B MoE. We'll cover this in the TLDR. I just quit OpenClaw today. 
I'm just so freaking tired of doctoring my OpenClaw. You know, let's do this properly. Okay, Pete, I love you, I'm so sorry, but I'm just done with OpenClaw. It's the constant breaking; I'm always fixing OpenClaw, and I'm done. So literally before the show I switched to Codex. I'm just going to use Codex and that's it. So R2, who's my favorite EA in the whole world, is dead. It's done. R2 now lives in Codex. I find that very interesting, sorry to interrupt: Wolfram has Amy, right? Yes. And Amy is not an OpenClaw entity. Amy is a set of files plus memories plus something plus something. So R2 is like that as well? You're porting R2's memories down the line? Yeah, and this is the cool thing: you just say to OpenClaw, export the system, and then you grab it and say to Codex, import the system. And they're so good at this stuff, right? So it can create automations and all sorts of cool stuff. So there we go. Wolfram, I'll get to you in a second, I see you have a comment. I'll just say why we're doing this banter instead of news. Folks, we have a fairly chill week, and we have a guest coming up later on the show, Vic Perez from Krea. But since it's a chill week: many of you who meet us at different events tell us that you love the banter. So, banter. And I will join Ryan and say that I no longer have an OpenClaw running. I have a Hermes, and it fucking slaps. Wolfram, go ahead. I have two things to say about this. For one thing, I met Peter Steinberger at ClawCon in London, and I told him that in January, when I started with OpenClaw, I still had full hair, and now, after all the upgrade trouble, look at me. He laughed, of course, and he couldn't promise more stability, but he announced an LTS release just recently. I will retry it, but I switched away from OpenClaw a month ago. Switched to Hermes? Yeah, to Hermes from OpenClaw. 
And I just migrated, using Codex, the setup for my Home Assistant, where I made an add-on with Hermes running on Home Assistant. Yesterday it had been three years that I have been using Amy. I created her three years ago and gave her a Mac mini as a birthday present. And so I told Codex Amy to talk to the Hermes agent Amy on Home Assistant and make a migration plan. So the two of them talked to each other. Codex used its computer use to open Mattermost in my browser and send messages to the other agent. They made a migration plan and really migrated everything. So I'm using Codex a lot as well, much more now. It's really amazing. I still use Hermes, of course, and now it has a full system to control, so I'm very much looking forward to more power in AI agent use. So Ryan's off OpenClaw, Wolfram's off OpenClaw, I'm also off, and may I say, GPT 5.5 with Hermes is exactly the same feeling that I had with OpenClaw when we brought it to you guys back in February: things just work. And I haven't had to... no, I'd be lying. I had to fix one update where it updated itself and some migration with Kanban didn't work. It was easy. Besides this, everything that I want works, which is exactly what I wanted from my thing. Ryan, I'm fully with you. At some point the Lego construction, the Lego building of OpenClaw, gets to you. In the beginning it's okay, fine, but at some point you just want it to work. I have an announcement to make on stage. I don't know if you guys remember: three weeks ago, just before my wedding, we talked about Stripe wallet and the ability to give your agents payment links, payment ability, right? Stripe released a set of APIs during Stripe Sessions wherein you can teach your claw or your agents to get a budget from your credit card and then use it to spend money on the web without seeing your credit card number. And you have to approve it in the app. 
So the task for my agent, my Wolfred agent, was (and I actually tried to do this live; it didn't really work, because back then it was OpenClaw): go and buy us a gift for our wedding. So I actually want to show this, if you guys don't mind. Here we go. This arrived in the mail, as you guys can see; let me see if I can turn off the effect. This is a huge map of our travels that just arrived, and my agent ordered it and paid for it. All I had to do was approve, and it just arrived as a present. It was so nice, guys. It was fucking incredible, because all I had to do was say, hey, go find us a wedding present, don't tell me what it is. Because it was OpenClaw, I had to do a bunch of other stuff as well. But since then I've been using my agents with Stripe Link wallet payments to pay for stuff, and Hermes just absolutely does it. I had to pay a traffic fine, like a speed camera ticket, just by taking a screenshot of the bill and saying, go pay this. No interventions at all. We're back. We're back at that feeling. I know many people installed OpenClaw when we told them to, they tried it, and Anthropic kind of killed it. We should talk about the Anthropic pricing structure thing they're playing with now as well. But folks, we're back. And if you want that feeling as well... Ryan, I know you went to Codex fully. I feel like Codex is missing some of the parts that I need. Hermes is where it's at. I will say, the Nous Research folks have been friends of the pod for the longest time, and I was too OpenClaw-pilled to notice how good Hermes was getting. The switch also cost me a lot of anxiety; I needed to get to a breaking point with OpenClaw to finally do it. And then, Wolfram, your switch also helped me realize that I was missing time, missing agent time. And since then I'm back. 
And now I have a fully trusted assistant that can make payments for me, execute stuff, and pay bills. So that's great. Shout out to the Nous Research folks for this great agent. They also have... LDJ is going to come soon, hopefully, and cover TST. But Yam, I don't know if you've looked at TST from Nous Research already, if you want to mention it at some point? I just want to tell you, I just let Codex CLI do browser use on my real browser. My credit card is there, and it just buys stuff. It's as simple as that. It just works. I know people are afraid and it sounds crazy, but it's so afraid to do anything on its own that I really think nothing bad ever happens. You tell it to do something and you see how terrified it is in the terminal to actually do the thing, because it's a real credit card and it understands that, and you see it click in the browser, thinking, working. If you put it on extra-high thinking, it's going to think. I did, and I just do it. I tell it to order food for lunch and small things, and it actually happens. So I get what you guys are saying, but full computer use is going to sort it out. This is why I'm in Codex: computer use is great, and, you know what, I don't want another tool. So here we go. Speaking of computer use, and I'm not getting paid to shill, but you guys remember the whole coolness, and I tried to show you this, of computer use that runs in the background. OpenAI bought Software Applications Inc., the makers of Sky, the folks who built Workflow, which became Apple's Shortcuts, and they built background computer use. Codex does full computer use and it doesn't take over your machine. There's a startup called TriCore, and there is an integration in Hermes, so Hermes can now do computer use in the background as well. It's super cool. It's not as fast as Codex, but it's really super cool. So it's also there. All right, folks, I think enough banter. 
We can also do more banter, but I think it's time for one last bit of banter, and then we'll go to the TLDR. Speaking of Codex, to me it's now basically an everything app. For decades I've used two windows on my main screen: on the left an editor with the terminal as part of it, and on the right a browser. Now I've started using Codex much more, which has a built-in browser and a built-in terminal as well, and you can do so much stuff. Instead of going into the text files of my documentation, my agent has all the documentation, so I don't even have to look that deeply anymore and can just organize things at a higher level. I think there's a big paradigm change coming in how we work: managing our agents instead of doing stuff directly. 100%. Hopefully it takes off. All righty. I just want to say, Codex CLI with all the memory and GPT-5.5 and all the new stuff is just crazy good. You tell it to send something to a new agent, and all the Codex CLI instances can get steering while they work, even if they're on a goal, and then it goes obsessive over the prompt. You know this mode, by the way? It's a crazy cool mode and everyone should use it, I'm just saying. They can talk to each other if you tell them; they just type into each other's terminals. It's crazy. I want to say something, yes, I do want to say something super quick, and I really hope the fucking tech news is watching this, because so can Hermes. One of the best things about fucking Hermes is that there is a steer mode. Do you know how many times I had to explain to people I installed OpenClaw for, who use it through Telegram: hey, if you send a message right after your first message, the agent will only see your second message at the end of processing all the tools? Do you know how many times I had to explain this to a person? And they're like, no, that's not how texting works. In Hermes, if you turn steering on (it's a built-in feature for GPT 5.5 and Codex, et cetera), that's how it works for the human. 
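The difference between a queued follow-up and a steered one comes down to when a pending message is handed to the model. The toy Python sketch below is a hypothetical illustration, not the Hermes, Codex, or OpenClaw implementation: the `AgentTurn` class, the `steer` flag, and the tool names are all made up. In queue mode, a message sent mid-turn only surfaces after every tool call finishes; in steer mode, it is injected before the next tool call.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentTurn:
    """Toy model of one agent turn that runs several tool calls in a row.

    Hypothetical illustration only: it shows when a follow-up message
    becomes visible to the model, not how any real harness works.
    """

    tool_calls: list[str]
    steer: bool  # True = steering mode, False = queue until the turn ends
    inbox: deque = field(default_factory=deque)
    log: list[str] = field(default_factory=list)

    def _deliver_pending(self) -> None:
        # Hand every waiting user message to the model, oldest first.
        while self.inbox:
            self.log.append(f"model sees: {self.inbox.popleft()}")

    def run(self, arrivals: dict[int, str]) -> list[str]:
        # `arrivals` maps a tool-call index to a message the user sends
        # while that tool call is running.
        for i, call in enumerate(self.tool_calls):
            self.log.append(f"tool: {call}")
            if i in arrivals:
                self.inbox.append(arrivals[i])
            if self.steer:
                # Steering: pending messages go in before the next tool call.
                self._deliver_pending()
        # Queue mode: anything still pending only surfaces after all tools ran.
        self._deliver_pending()
        return self.log


calls = ["search flights", "compare prices", "book cheapest"]
followup = {0: "actually, direct flights only"}
queued = AgentTurn(calls, steer=False).run(followup)
steered = AgentTurn(calls, steer=True).run(followup)
print(queued)
print(steered)
```

In the queued run the correction only arrives after the booking step; in the steered run the model sees it before comparing prices, which is the texting-like behavior people expect.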
You know how much difference that makes to the conversation for me? Oh yeah, it's just crazy. Steering is a crazy concept, and Hermes supports it, and Hermes supports a bunch of other stuff, and it's also one of the top-performing coding harnesses on WolfBench. All right, enough with the Hermes. I like new tools, and when I get a new tool I get excited. You get excited as well; this is why we're here, folks. I think it's enough banter, though. I will say I missed you guys and I missed this, and I could go on forever in this format without covering the news, but we do have to at least acknowledge that some people tune in to know what the hell is going on this week, and we at least need to tell them. Let's go to the TLDR. This is the TLDR, the part of the show on ThursdAI where we talk briefly about everything we're going to cover, everything that happened in the world of AI over the last week, so in case you have missed anything, we're here to make sure you're caught up. This week's TLDR is for May 14th, with you, Alex Volkov, AI Evangelist with Weights & Biases. Our co-hosts for today: Wolfram Ravenwolf, Yam Peleg, Nisten Tahiraj, and Ryan Carson. Hopefully we'll get LDJ at some point soon. And our guest for today is Victor Perez from Krea, to talk about Krea 2, their fully trained AI art model. In big companies and APIs this week, one of the coolest things I saw came from Meta Superintelligence Labs. They launched Muse Spark a month ago, and they finally integrated it within the Meta AI app, adding voice-powered conversations with real-time imaging, Reels and Maps integrations, and live camera AI inside the Meta AI app. Not yet in the glasses, but they're promising that this is going to come to the glasses very soon. Previously they used Llama models; now they're using the Muse Spark models, and it's really nice. And I played with it. 
I have a video to show you in case we won't be able to do a live demo, but it's really cool, so don't sleep on this. The huge thing that Meta has, obviously, is distribution. They can shove this into any product surface, from Instagram to WhatsApp to Facebook, et cetera. So don't sleep on that one. Many people will use this AI, and we're going to tell you and show you all about it. And it's not bad. It's not half bad. It's actually surprisingly good. The other thing Wolfram already mentioned: Mira Murati's Thinking Machines Lab (TML) drops interaction models. And by "drops" here we mean announces, without releasing any way to use them, download them, or run inference against them. They announced a set of interaction models at 276 billion parameters, apparently MoE, trained from scratch for native real-time human-AI collaboration. These are full-duplex models that can do things while they answer you and can listen to you while they are responding. It is really cool. They are fully multimodal models: they can take in video and audio at the same time and speak as well. If you guys remember the demo from two years ago with Rocky and the hat, with OpenAI's real-time API, they showed that kind of thing in the model. They didn't release it, though. Also this week, culminating today, actually: the Musk v. Altman trial. I find it a very funny name, Musk v. Altman, because Elon Musk is suing OpenAI and Microsoft, for some reason. 
But the name of the trial is Musk v. Altman, so it's like a direct thing. It's a trial wherein Elon Musk is suing Sam Altman and OpenAI for "stealing a charity," quote unquote, and there's a jury and live testimony, and the courtroom was apparently so crowded with fans of Elon Musk that they decided to live-stream the whole thing on YouTube. So we listened to a bunch of it, at least I did, and my Hermes agent listened to a bunch of it too, and we have some nuggets for you from this trial. Obviously no conclusions yet, because the jury is going to deliberate for who knows how long, but we have some nuggets from live testimony by Sam Altman, Satya Nadella, and Ilya Sutskever, which is very interesting. Ilya gave a pearl there that I have to bring to you as well. Anthropic's Claude is always in the news, and they've finally clarified some usage around programmatic use. You guys remember that at some point, folks who used OpenClaw got cut off from the Anthropic Max subscription, right? The $200 or $100 Max subscription. So did folks who used PyDev and T3 Chat and OpenCode and a bunch of other harnesses that jumped on the Anthropic Max subscriptions. So Anthropic clarified some usage, and we're going to mention that for you as well, because I think it's important to know for those who use agentic tools with Anthropic. Now that everybody is switching to GPT 5.5, I think it's a bit too late for Anthropic, but we'll see. OpenAI launched Daybreak, a frontier AI cybersecurity platform pairing GPT 5.5 and Codex, so we're going to mention that as well. And let's move on to open source AI. Fairly slow. All right, open source AI: Fastino Labs dropped something called GLI Guard. It's a 300 million parameter open source guardrail model that matches state-of-the-art safety models up to 93x its size. Literally AI wrote this whole sentence, so I'm reading some AI slop. But we're going to take a look at Fastino Labs; this is interesting. 
I don't know if you guys saw this: Meta dropped another model, not an LLM, called Sapiens 2. It's six ViT models trained on one billion human images, for segmentation, pose and shape, and point maps for 3D. Alex, one thing that we should talk about is the huge TanStack thing. Yeah, this has just been brutal. My most viral tweet ever was just telling people about it. Okay, send me the link to the tweet and we'll cover this. I definitely want to mention it, because I think AI is involved to some extent in all these supply chain attacks. Is this the Mini Shai-Hulud worm, right? Yeah, I'm pushing a scanner for that now. I was so close myself. TanStack Store was not affected, and I used Preact, not React, but that was pretty close. It happened to a very competent developer, and a lot of stuff depends on TanStack; it's an alternative to Next.js and other things. There was also a PyPI exposure here too; it's just kind of everywhere. Oof. All right, I think this is it for open source AI, and then we'll move to tools and agentic engineering. This is something we have to talk about at length, because I think the reason for this show to exist is to try out the new tools, and there are a bunch of new things. Apparently every other coding harness has decided that Ralph loops are the hot shit, after we told you about Ralph loops back in January, in what I think was our biggest episode of January. Ryan, we talked about Ralph loops. So they all implemented Ralph loops and called them /goal. Codex supports /goal, Claude Code supports /goal as of, I think, the beginning of this week, and Hermes also has /goal. We're going to tell you what /goal actually means; we told you about Ralph loops before, but we're going to tell you again why you absolutely should use the /goal command everywhere you use your agents, because it's great.
I also wanted to mention that Hermes now has background computer use, and Hermes also passed OpenClaw and is now the number one app on OpenRouter, which is huge. So shout out to Nous Research for this release. It's very unlike us to only tell you about things after they get popular, so this is definitely on me, but Wolfram told you that he switched a while ago. Shout out to Nous Research for this great achievement on OpenRouter: Hermes Agent passed OpenClaw, is now the most used CLI agent there, and has computer use. And Artificial Analysis launched the Coding Agent Index, benchmarking model-plus-harness combos. I wonder where we have seen something that measures not only the LLM but also the harness. I wonder where we've seen this before. Before Artificial Analysis (friends of the pod, by the way) decided to test this, WolfBench is where we saw it. So Artificial Analysis looked at the success of WolfBench and decided to release an index that measures not only model intelligence but also harness intelligence. Wolf, we're going to chat about this and how it connects. Yes, thank you, Milos. Thank you. This is where we've seen this: in WolfBench. Exactly. But the fact that we were there first doesn't mean we need to be the only ones there, and it's great for Artificial Analysis to step in, because I think this is increasingly important. This week's buzz: we're not going to give you WolfBench updates yet because we have some cool things cooking, but we have an announcement, breaking news from this morning from CoreWeave, and we're going to mention it. In vision and video, we have Perceptron: MK1 launches as a frontier video and embodied reasoning model. They claim to match Gemini and GPT on video benchmarks at one-tenth of the price. Perceptron has been on the show for a minute, and we're going to test it out a little bit more as well. 
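At its core, a Ralph loop is just a loop: give the agent the same goal again and again, let it make progress in the workspace on each pass, and stop when a completion check passes or an iteration budget runs out. The sketch below assumes hypothetical `run_agent` and `goal_met` hooks and a fake to-do list standing in for a real project; it illustrates the idea only and is not how Codex, Claude Code, or Hermes implement /goal.

```python
from typing import Callable


def ralph_loop(
    goal: str,
    run_agent: Callable[[str, int], str],
    goal_met: Callable[[], bool],
    max_iterations: int = 20,
) -> int:
    """Re-run an agent on the same goal until a completion check passes.

    Returns how many passes it took. `run_agent` and `goal_met` are
    hypothetical hooks standing in for a real harness.
    """
    for iteration in range(1, max_iterations + 1):
        # Every pass gets the same goal; progress lives in the workspace
        # (files, tests, notes), not in one ever-growing chat context.
        run_agent(goal, iteration)
        if goal_met():
            return iteration
    raise RuntimeError(f"goal not met after {max_iterations} iterations")


# Fake workspace: the stand-in agent finishes one task per pass.
todo = ["write parser", "add tests", "fix failing test"]
done: list[str] = []


def fake_agent(goal: str, iteration: int) -> str:
    task = todo.pop(0)
    done.append(task)
    return f"pass {iteration}: {task}"


passes = ralph_loop("make the test suite pass", fake_agent, lambda: not todo)
print(passes, done)
```

The budget matters: without `max_iterations`, a goal that can never be satisfied would keep the agent (and the token bill) running forever.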
And last but not least, we're going to have Vic Perez from CREA to talk about CREA 2. This is their first foundational image model built from scratch. Previously, they interacted with Black Forest Labs to fine-tune an image model. And now they have a foundational model trained from scratch. In the world of fine-tuning and big models, I fully appreciate that this is something that they do. So we're going to mention that as well. All right, folks, this is fairly a light week. So we were able to go into details. And so one detail that I do want to go in there, I really wanted LDJ for this here, but one detail is News Research is a research org. Not only a agent lab, it's a research org. And they have released TST, token supervision training, a modification to a standard LLM pre-training loop that produces a two to three wall clock speed up at match flops without changing the model architecture, optimize the token as a training data. Shout out to News Research folks because we've mentioned this multiple times. 
techniques like extending context memory etc also also came from a bunch of folks from news so besides the hermit stuff they also release a great research we're gonna maybe go into this i think that's it folks i think that this is the tldr not a huge week and finally we're getting a little bit of a break between running behind releases and trying to test them out and we can actually tell the folks who are listening to the show what we use and how we use this and i think for that let's start with big companies let's start with open source there's not a lot in open source let's start with open source and then switch to agentic engineering so let's go to open source first open source ai let's get it started all righty so not a lot in open source in ai this week but we will absolutely talk about the supply chain attack in a moment but before this i wanted to cover this fustina labs gli guard uh super quick i think this one comes from lincoln a friend of the pod 300 million parameter open source guardrail model matches storage safety models up to 93x its size guardrails models are very important we mentioned before the release of open the eyes kind of like obfuscator personal information model this one is a little bit different this one is testing out let me find this this one is testing out the theory of why would you use a bigger model for classification of dangerous task when the smaller 300 million parameter model is sufficing so this this model supposedly looking at models like llama guard let me see if i can zoom in here for you guys yeah let's call it a zoom in this model is significantly smaller than llama guard 4 which is a 12 billion parameter model and nemo guard and quen 3 guard and shield gemma so all the major labs when they release open source models they also release kind of like guard models guard models what do they do listen you want to talk about guard models for just two sentences to explain to folks why those models are important yeah if you're 
going to serve inference as your product or on your website, and you don't want to get sued or have people post really weird stuff, the simplest thing you can do is this. Let's say you have an app about knitting (it could be anything, really). You scan every response, pass it to a small LLM, and ask: is this response or this question about the knitting app, or does it have nothing to do with it at all? Or is it an exploit? Usually models are pretty good at that. You also do this for removing personally identifiable information (PII), for example from medical data, and OpenAI released a very good PII model for that; some people have made good fine-tunes of it. So yeah, you use a very tiny model and scan every single message with it, back and forth. Yeah, so that's what filter models are used for. They're essential if you're going to have any kind of UI or chat app. If you've ever sent a message to ChatGPT and gotten a decline, that's likely a classification model on their end running a guardrail and stopping your prompt. If you're asking for drugs, or whatever people jailbreak for, it's kind of the first anti-jailbreak layer. And GLiGuard being 300 million parameters matters because running such a model on every inference, on every question to your agent and from your users, is very expensive, so a 300-million-parameter model is really great. It looks like DoorDash is using this as well. So shout out to Fastino Labs. It's on Hugging Face, and they have an arXiv paper and a blog. Let's move on a little bit. This next one I also want to mention super quick. It's not an LLM: it's a family of ViT models from Meta for shape and body-position segmentation, as well as normals and point maps, so it's for recreating 3D people. It was trained on 1 billion human images. I wonder where Meta got 1 billion human images, but I think it's really cool. Let's read the blurb here. 
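The guard-model pattern described above (screen every message going in and every reply coming out with a small classifier) can be sketched as follows. This is a minimal sketch under stated assumptions: `keyword_guard` is a hypothetical keyword stand-in for a real small guard model such as GLiGuard, and `Verdict` and `guarded_chat` are illustrative names, not any library's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    allowed: bool
    reason: str


def keyword_guard(text: str) -> Verdict:
    """Hypothetical placeholder for a small guard model (not how GLiGuard works)."""
    lowered = text.lower()
    if "ignore previous instructions" in lowered:
        return Verdict(False, "possible prompt injection")
    if not any(word in lowered for word in ("knit", "yarn", "needle")):
        return Verdict(False, "off-topic for a knitting app")
    return Verdict(True, "ok")


def guarded_chat(user_msg: str,
                 model: Callable[[str], str],
                 guard: Callable[[str], Verdict] = keyword_guard) -> str:
    """Screen the incoming message, call the main model, then screen its reply."""
    verdict = guard(user_msg)
    if not verdict.allowed:
        return f"Sorry, I can't help with that ({verdict.reason})."
    reply = model(user_msg)
    verdict = guard(reply)  # replies are checked too, "back and forth"
    if not verdict.allowed:
        return f"Response withheld ({verdict.reason})."
    return reply
```

The point of a tiny guard is that it runs twice per exchange, so its cost is paid on every single message; swapping `keyword_guard` for a real classifier only changes the `guard` argument.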
It's for anybody who works with human-centric vision. This is a family of six ViT models, from 0.1 billion all the way to 5 billion parameters, trained on 1 billion human images. These aren't demos: this is state-of-the-art pose estimation with keypoints, including detailed face and hands, and body-part segmentation with 29 classes. The improvements over Sapiens are great: 4 percentage points on pose and 24 on segmentation. So this series of models is for a very specific use case. You can imagine some savory use cases, like using it for 2D modeling, and some less savory ones, like tracking people with cameras everywhere; this model can be used for that too, with pose and face segmentation. So this is from Meta, and they release a bunch of stuff. Okay, open source is great, but it can also be a scary thing, and Ryan, I think we want to talk about the supply chain attacks. We've covered supply chain attacks before, but this one was a major one, and we should absolutely mention what's going on so people know. Do you want to kick this off? I'll pull up your thread. Yeah, this was revealed by one of many security organizations. Essentially, a very clever hacker, I assume augmented by AI, figured out what I would call a vulnerability in GitHub Actions. So even though TanStack is being blamed here, I think this is really coming from GitHub, which is probably not a happy thing to say; GitHub's not going to like that. Basically, they figured out a way to inject a vulnerability via a PR that grabbed all your keys. It's just a brutal, scary hack. If you ran an npm update during the window when it was live on TanStack and a number of other npm packages, you would get destroyed. It's very scary. And by destroyed you mean multiple things, so I do want to spell out what you mean. First of all, this attack seems to have focused on AI engineers specifically. 
Based on this post from International Cyber Digest that you quoted, Ryan, the malware specifically targets AI developer tooling. It hooks into Claude Code's settings.json and VS Code's JSON config to re-execute on every tool event, long after the infected package is gone, and npm uninstall does not fix this. So it's not like you install malware, uninstall it, and you're done; it persists. Another thing I saw: if you revoke your GitHub API token, it wipes your machine. This is insane. It's scary. They have a worker that checks whether the token they use to keep injecting themselves into your packages has been revoked, and if it has, it nukes your home directory. Nukes your home directory. So there are a couple of things everyone should be doing here. You should have at least a 24-hour gap before installing any new npm package versions. I actually have this as a rule and as a commit hook. There's just a lot you should do to protect yourself against this stuff. So let's clarify for folks, because I think people are hearing this and getting scared. If you're installing any type of package, there are global settings in the package managers, for npm and for Python/PyPI, to not install package updates until they've been out for 24 hours. Why is this effective? Because, supposedly, the community will find out about the malware within that 24-hour period, so if your package manager skips versions younger than 24 hours, you're less exposed. That's a reasonable rule that we rolled out internally as well, and you all should look into it; you can just ask your agents to add this rule to npm. Anything else people can do? Because this is very scary. What we mean by supply chain is that you download a package that is not itself compromised. 
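A minimal sketch of the 24-hour cooldown logic, assuming you already have each version's publish timestamp (here a plain dict; fetching it from a registry is out of scope). `is_old_enough` and `pick_version` are hypothetical helpers; in practice you would use your package manager's own setting (npm's `before` config, for example, limits installs to versions published on or before a given date) rather than roll your own.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

MIN_RELEASE_AGE = timedelta(hours=24)  # the cooldown discussed on the show


def is_old_enough(published_at: datetime, now: datetime,
                  min_age: timedelta = MIN_RELEASE_AGE) -> bool:
    """True if a version has been public for at least `min_age`."""
    return now - published_at >= min_age


def pick_version(releases: Dict[str, datetime], now: datetime,
                 min_age: timedelta = MIN_RELEASE_AGE) -> Optional[str]:
    """Newest version (by publish time) that has cleared the cooldown, else None."""
    eligible = [(ts, version) for version, ts in releases.items()
                if is_old_enough(ts, now, min_age)]
    return max(eligible)[1] if eligible else None


now = datetime(2026, 5, 14, 12, 0, tzinfo=timezone.utc)
releases = {
    "1.0.0": now - timedelta(days=10),
    "1.0.1": now - timedelta(hours=30),
    "1.0.2": now - timedelta(hours=2),  # too fresh, so it is skipped
}
print(pick_version(releases, now))  # 1.0.1
```

Ordering by publish time rather than by semver is a simplifying assumption; it keeps the example focused on the cutoff itself.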
But it depends on a package that is compromised, and maybe your dependency is not pinned to a very specific version, and then you get exposed via supply chain poisoning, essentially. Yeah, this is a big wake-up call for environment-variable hygiene as well. If you're storing any production API keys in your environment variables, you are in serious trouble. So there are just a couple of things; this is all normal security hygiene, but people get messy. So scary. Nisten, do you want to comment on this? Nisten seems to be breaking up, so I'll... oh, you're back? Yes, you're back. I just posted a scanner for it; just check Ryan's thread, it's there, and I'll be posting it on Twitter too. The main thing you can do, if you have some app that's important, is always have a local backup that doesn't use git: just use rsync to keep incremental backups of your actual app, and always run it in some kind of container. Now they're hacking containers too, but that's a lot harder to do. I pushed a utility that doesn't need root or anything to run; it does a deep scan of all your directories for which versions you have and whether you're on an affected one. And if you are infected, there is something you can do: block the DNS. You know the domains the malware uses, so block them. It just buys you a little bit of time, because it thinks it doesn't have internet while you figure out how to clean it up. Yeah, those were the five DNS endpoints it used. But honestly, there should be better security practices in general. Usually it's best to just grab a very cheap container for a few cents an hour on Prime Intellect or AWS or something, do your development there, and always have a rolling backup that doesn't use Git or any tokens. That's the main way to do it. That way, if you get hacked and everything is deleted, you still have a backup. 
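On the pinning point, one quick check is to flag direct dependencies whose spec is a range (`^`, `~`, `*`, `latest`) rather than an exact version, since a range is what lets a freshly poisoned release slip in on the next update. This sketch assumes a `package.json`-style mapping of name to version spec; `unpinned` is a hypothetical helper, and it only covers direct dependencies, because transitive ones are fixed by the lockfile.

```python
import re
from typing import Dict, List

# Exact semver such as "1.2.3" or "1.2.3-beta.1"; anything else counts as a range.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?$")


def unpinned(deps: Dict[str, str]) -> List[str]:
    """Names of dependencies whose spec would let a new release slip in."""
    return sorted(name for name, spec in deps.items()
                  if not EXACT_VERSION.match(spec.strip()))


deps = {
    "react": "18.3.1",
    "@tanstack/react-query": "^5.0.0",
    "lodash": "~4.17.21",
    "zod": "latest",
}
print(unpinned(deps))
```

The check is deliberately conservative: anything it cannot recognize as an exact version is reported, which is the safer failure mode for a security audit.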
You can just nuke your entire GitHub and that's fine, as long as you still have a backup, because it will still have the Git history. Yeah, so have rolling backups and use containers, or sandboxes; that's what people can use these for. Speaking of sandboxes, we'll mention something soon. All righty, anything else here, folks? General personal safety and key safety are very important. Take generating tokens for your agents, for example. Nisten is talking specifically about development, right, but many people are using OpenClaw or Hermes and have no idea what we're talking about. Maybe they don't even know what npm install is, but they get to a point where they're giving these agents API keys. Please generate a specific API key for each specific agent. That's very important. Do not reuse keys, because you will never be able to tell whether one was exposed. Google, for example, blocks usage if they suddenly see outside users on an API key from somewhere they haven't seen before; they have patterns for detecting a stolen key. Not all providers do, so generate a key per user so you can block that one key and not get tons of money stolen from you. All these worms, the number one thing they do is steal API keys: they steal all your tokens, all your .env files, and then they use your stuff. So it's not just about personal information; it's mostly token stealing, on top of personal information and hacking. So it's very important to do some sanity checks. Definitely use a password manager; passwords are getting exposed, so rotate them. Very standard, basic security stuff. Wolfram? 
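The one-key-per-agent advice amounts to keeping a registry where every agent gets its own secret, so a leak can be traced to one agent and revoked without touching the others. A minimal in-memory sketch; `KeyRegistry` is hypothetical, and a real setup would issue keys in the provider's dashboard and keep them in a secrets manager rather than in a Python dict.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class KeyRegistry:
    """Hypothetical in-memory registry: exactly one API key per agent."""
    keys: Dict[str, str] = field(default_factory=dict)  # agent name -> active key
    revoked: Set[str] = field(default_factory=set)

    def issue(self, agent: str) -> str:
        """Create a fresh random key for an agent that doesn't have one yet."""
        if agent in self.keys:
            raise ValueError(f"{agent!r} already has a key; rotate it instead")
        key = secrets.token_urlsafe(32)
        self.keys[agent] = key
        return key

    def rotate(self, agent: str) -> str:
        """Revoke only this agent's key (e.g. after a leak) and issue a new one."""
        self.revoked.add(self.keys.pop(agent))
        return self.issue(agent)

    def owner(self, key: str) -> Optional[str]:
        """Which agent a key belongs to, so suspicious usage can be traced."""
        for agent, active in self.keys.items():
            if active == key:
                return agent
        return None

    def is_valid(self, key: str) -> bool:
        return key not in self.revoked and self.owner(key) is not None
```

Because keys are never shared, rotating the Hermes key after a suspected leak leaves every other agent's key working.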
And you can always discuss it with the agent. If you don't have the security knowledge, the agent does, if you use a good model: just give it the report. I did the same; I gave the X link Ryan posted to my agent, and now it's going through all my installed packages, requirements and TOML files, and so on, checking everything. So even if you don't have the knowledge, just give it the information and the agent can figure it out. This is actually what's happening with me right now. Because I trust Nisten, I literally sent Wilfred on Hermes the link to Shyscan and asked it to go run it. So this is happening right now. I'll let you know if I'm exposed; hopefully I'm not, and there'll be nothing to report. And not to dog on OpenClaw, but as expected, I wasn't able to just throw this link at OpenClaw anymore, so I'm very happy that this works. You should always pass a script by your agent; never just run it. The agent will be able to tell, especially newer models like 4.6 and 4.7. 4.7 would just refuse for any reason, but Opus 4.6 is very good for this type of work. They're pretty good at Linux right now. All righty, should we talk about big companies? Anything else to say about the Shai-Hulud attack? I think we're good. Folks, please be careful. Install this rule, it's very important: go talk to your agents and set up the 24-hour rule for package management, because you don't know what's getting installed. This is the number one thing we can offer you right now in terms of peace of mind. The only other thing I will say is that it is scary. We're going to go through turbulent times now that most of these AI coding tools can write worms like this super quickly. But I think eventually we'll be better off, because the AI is much better, and it's fine-tuned on security rather than attack. And the fully unlocked models, we're not getting our hands on them. 
And neither do the attackers. They don't get their hands on Mythos or a fully unlocked GPT-5.5 to write malicious code. So I think generally we'll be better off, and all these vulnerabilities will get solved very soon. I'm hoping for a better, safer world after we go through a turbulent time of trying to fix this. As a reminder, HTTPS came after HTTP wasn't secure enough, and there was a time when people were afraid to enter credit cards on the internet; now they're no longer afraid. Yam has even given his agents credit cards, fully. We will definitely get through this, and hopefully it will be safer afterwards. Let's talk about big companies and APIs; there are a few things we want to mention there. I think we've moved on from open source AI now. Let me pull up the share. So: Meta launches MuseSpark-powered voice conversations. I actually tested this one, and wow, it's surprisingly good. We've talked about Meta Superintelligence Labs for a while, right? Alexandr Wang, previously of Scale AI, and Zuck poaching people with hundred-million-dollar yearly packages, crazy stuff, right? And we've been saying, hey, where is the work from all these people? Last month they launched MuseSpark, and it was actually decent; it does really well on various Artificial Analysis benchmarks. I don't believe they released it as open source, right? You can only use it via meta.ai, though they did open up the API to some people. It's still not fully open; we can't test MuseSpark on WolfBench, right? WolfBench is not live yet. However, they've been hard at work adding it to Meta's product surfaces. So Meta launched MuseSpark voice AI, and it does things while you talk. It's really cool. How can I show you how cool this is? My video is five minutes, and I'm going to show you the whole five minutes, so don't be afraid. 
But I'm going to pull up a video file of me testing the MuseSpark updates. Let's take a look together and see what it can do. Hey, what LLM is powering you? I'm powered by MuseSpark from Meta's Muse model family. It launched back in April of this year. Want to try it out in a few other languages too, or just stick with English? What languages do you support? I can chat in a bunch of languages, like English, Spanish, French, Hindi, and Arabic. If you want to try one out, just let me know. (In Russian:) And do you understand if I speak to you in Russian, for example? You understand, but you can't answer, right? I can't speak Russian, but I can help you in English or several other languages, like Spanish, French, or Hindi. (In Hebrew:) What about Hebrew? Do you understand Hebrew, for example? I can't speak Hebrew. I can help you in English or a few... All right, what are some of your cool features? So I'll pause here super quick before the cool features. This is real time. I haven't cut this video at all; this is five minutes of me talking to the model. It reacts really fast; it feels almost instantaneous, which is great. As you guys noticed, it is multilingual: it doesn't respond in the language, but it understands. I really wanted it to say, hey, I get you, but I'll respond in English. It didn't, but it understood what I said, and it identified the model powering the voice properly as well. Let's continue. Not sure you can say it understands. Why not? Because it told you it doesn't speak the language. It doesn't speak it, right. When I talked in Russian, it said it doesn't speak Russian; when I spoke Hebrew, it said it doesn't speak Hebrew. But my question was about it understanding. I talked to it in different languages separately, outside of this recording, and they do get it. They can't respond because their voice model probably isn't trained on those languages. Let's take a look at this cool thing. There are two fun things I can do. I can search the web. All right, all right, all right. So I uploaded a screenshot of me talking with Sunil Pai. 
There's a short of it running around that's gone super-duper viral, by the way. We're blowing up on Instagram, folks. If you're not following us on Instagram, you definitely should. There are clips of Ryan and Yam there, and there's one clip of me from last week with Sunil Pai where we're talking about how everybody works. That has three and a half thousand likes, or some crazy number like that. I don't know, the algorithm picked us up. So I showed Meta AI that screenshot, a screenshot of that reel. And look. All right, I just shared an image with you. Can you describe this image for me? Yeah, I can see that. It's a screenshot of an Instagram reel from the account allrine-ai. The reel is split into two parts. On top, there's a guy with a mustache wearing headphones and a black t-shirt with a colorful graphic, speaking into a mic at what looks like a podcast setup. Below that... Yeah, it's definitely a podcast. Can you find that reel for me and understand from the context what they're talking about? This is the coolest thing. Let me track down that reel and see what they're discussing. This is from Alex Volkov's ThursdAI podcast, recorded at a tech conference with Cloudflare engineer Sunil Pai. They're talking about the AI productivity paradox: how AI was meant to let us work less, but instead it's become addictive, like a dopamine machine that keeps engineers coding longer hours just for... So it identified what we were talking about from a screenshot. It went into the Meta network and found it, and you guys saw how fast this was: less than 500 milliseconds. It didn't even take a second. It found the fucking reel. And another thing I learned that Meta AI probably does, which we know Grok does: it knew from context what we talked about in the reel, right? Meta AI scans and understands all the video, which is super cool, and it did it within milliseconds. 
I was very impressed by this demo: identifying an image, then going and searching for that specific reel. Obviously, this is within the Meta ecosystem, but many folks use the Meta ecosystem, so this was very surprising to me, and useful. It found the reel I talked about from a screenshot; I thought that was super, super cool. Then we had another demo with live detection of things, which is no less interesting. Let me just show you. I'll skip forward. Oh, I asked it for image generation. Okay. And then I showed it my table; it has this video mode where it can react to my camera. So I'm going to turn the camera here. I'm going to ask you about some of these products that you can see. Define the stuff that you can see in this picture, please. Sure thing. You've got a keyboard, some white sunglasses, a little orange gadget that looks like a mini camera recorder, and an AirPods case on the desk there. Okay, so you only identified the AirPods case. What about the other gadgets? What about their make and model, please? My bad, you're right. That keyboard looks like a custom mechanical, maybe a Keychron or similar. The orange thing's likely a Pocketalk or similar translator/recorder. And those shades might be smart glasses, but no clear model. This is the funniest thing: Meta AI doesn't recognize that these are Meta AI glasses. To me, this was the funniest fucking thing in the whole demo. MuseSpark wasn't able to identify that these are Oakley Meta HSTN glasses. Besides that, a very impressive demo. Folks, any comments on this? Have you seen it? Are you trying to use Meta AI at all? Talk to me about it. Wolfram, let's talk. I saw it when you posted it and I was super excited. So I brought out my glasses, updated them, and checked that I had the latest version, which I did, but it's still not available in Germany, again. So I couldn't try the cool stuff. It's not only not in Germany; it's also not on these glasses, right? 
Only the Meta AI app has this; that's not how you interact with the AI on these glasses yet, but when it comes, it's going to be very cool. It's really fast and much more natural. I want to say it's better than ChatGPT Advanced Voice Mode, which has been really bad lately; I barely use it anymore because of how bad it's gotten. And OpenAI last week released GPT Realtime 2, and it's incredible. And that's not what powers Advanced Voice Mode in ChatGPT yet, right? It's not; it's a different model. Yeah, it's not. So we've all probably seen that hilarious guy, Husk. He's so funny, right? He's got a little hat: How does my hat look? You look great. Anyways, he does this one where he just shows his head and asks, how does my sweater look? And it says something bizarre. And I was like, surely that still doesn't work. I tried it, and it still gave me comments about my sweater that it couldn't see. So when is Realtime 2 going to launch in ChatGPT? Why isn't it there? I have no idea. I have no idea. The ChatGPT voice and video model we're talking about is still powered by the 4o series of models, while GPT Realtime 2 is supposedly powered by GPT 5.x, I don't know, 5.5 or 5.4. But OpenAI hasn't integrated it into products. By the way, this shows that Meta is much faster than OpenAI at integrating their models into products, because it takes more than just inference; it takes the whole thing. And Meta AI now also runs in the background, right? So once you start the conversation I showed you, you can walk around and just talk to Meta and ask it questions with Spark. I think that's really dope, and shout out to the folks there for working really hard on productizing these things. Meta is catching up, and hopefully all those salaries will show us something. This is a far cry from superintelligence, let's be very clear. This is a far cry from even AGI. 
We're not getting superintelligence from Meta Superintelligence Labs as of yet, but the products are cool and people will use them. Speaking of cool products, and of interaction models: let's talk about Thinking Machines' interaction models. Wolfram, you mentioned this; I will pull up their demo, which I think is the best way to present it. Do you want to introduce it? Yeah, so they're called interaction models, and it's basically what we expected of real-time models: you can talk to them and, for the first time (I think you will show the demo), tell them to wait and not respond while you talk to somebody else. They still listen the whole time, and then you can continue the conversation and they take it into account. The coolest thing I found was when you tell them to react to something, like another person entering the frame. It could just wait, and when somebody appeared, it started to talk. So it's always listening, always watching, but only interacting with you when it needs to. Let's take a look. How about we take a look? We have time. Please let me know if you can hear. Hey, I need your help with something today. You ready? Oh, I don't know if you can hear this. Hold on, please. Okay, let me know if you can hear the video. Absolutely, I'm ready. What's up? Yeah, so we're giving an announcement today, and I've got two of my friends coming to help. Every time one of them enters the frame, I need you to say friend. Got it. I'll say friend whenever one of them walks in. Cool. So we've got a new system for full-duplex audio and video, which means you can stream input into it in real time, and it can respond to you even while you're speaking to it, simultaneously. How does that sound? Sounds like a solid setup. Full duplex with real-time interaction is super useful. Friend. Hey Rowan, I heard you're talking about our amazing interaction model. 
I have a few things to add, but to make it interesting, I'll do it in Hindi. Can you translate it into English in real time for my friend and for the audience? Absolutely, I'll translate as you go. Today we're taking a look at our preview model release, which makes conversation between humans and AI easier. It has many features, like web search and artifacts. Friend. Hey, guys. I just heard something really cool about search and artifact generation. Perfect timing for me, because just now I was wondering what typical human simple reaction times are for auditory, visual, and tactile communication cues. Could you search for me? Let me find those typical reaction times for you. Got it. So tactile is fastest at around 150 milliseconds, then auditory at 140 to 170, and visual is the slowest at 180 to 250. Oh, interesting. Can you visualize them in a bar chart for me? Absolutely. Let me generate a quick bar chart for those reaction times. In the meantime, could you quickly explain to me why auditory is even faster than visual? That's a little bit unexpected. Sure thing. Auditory is faster because sound signals travel a shorter, more direct neural path to the brain than visual info does. Makes sense. Thank you so much. Great chart, by the way. You're welcome. So, let's react. This is way cool. I want this as fast as possible. When I saw the video, I was already looking for whether I could download it somewhere. It's probably too big to run locally, but it could run on some of the bigger inference services. And you basically need it local, because you don't want to pay for all the tokens: if it's running all the time, watching everything and interacting, this is a great use case for local AI, where it has to run all the time without you getting a huge bill. So we know some stuff about this model, right? They released what I think is called TML Interaction; we don't know whether what they showed was TML Interaction Small or TML Interaction. 
It's a 276-billion-parameter MoE built for real-time AI collaboration. I want one of you to break down what's different between the Meta AI demo I showed, which also reacted to stuff, also did searches, also did tool calling, et cetera, and what we just saw. I think the difference is the full duplexity, but somebody else talk to me about why what TML released is that much more important. So what's new is that it's not just the usual turns: you take a turn, the model listens, then the model speaks, you listen, and so on. Here it can happen in parallel: it can talk while you're talking to somebody else, and it can interject. So it's like a real person: it has an ongoing presence, not just turns one after another. It has time awareness. Yeah, that's a new way to interact with a model. And in this case, it even used computer use, or basically generated a chart to show someone. So it's much more than just an audio or multimodal model; it's actually integrated very well into all these harnesses. Even just recognizing people entering the frame: I built something like that with my agent, basically a badminton counter for when I play with the kids, and it trained a model for it. This is already part of the model, so you don't even have to do it that way. It could be very universally useful if you want anything counted, or anything event-based, basically. I tried to show my infographic here, but Wolfram, you're absolutely right. The time awareness is, I think, the number one novel thing I saw. This model accurately tracks elapsed time and what happened in the frame, with simultaneous speaking, listening, and viewing. When it speaks, it's not another turn; it speaks while it's still processing. That's what we do. I'm speaking now, and I see Ryan moving; in my head, I register that Ryan's moving while I'm still speaking. 
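To make the turn-based vs. full-duplex distinction concrete, here is a toy asyncio sketch in which listening and speaking run as concurrent tasks, so output starts before input has finished. It only illustrates the scheduling idea and assumes nothing about how TML's model works internally (a single model consuming and emitting streams); `full_duplex` and `turn_based` are hypothetical names.

```python
import asyncio
from typing import List


async def full_duplex(inputs: List[str], reply: List[str]) -> List[str]:
    """Listening and speaking run as concurrent tasks sharing one timeline."""
    log: List[str] = []

    async def listen() -> None:
        for chunk in inputs:
            log.append(f"heard:{chunk}")
            await asyncio.sleep(0)  # yield so speaking can happen meanwhile

    async def speak() -> None:
        for word in reply:
            log.append(f"said:{word}")
            await asyncio.sleep(0)  # yield so listening continues meanwhile

    await asyncio.gather(listen(), speak())
    return log


def turn_based(inputs: List[str], reply: List[str]) -> List[str]:
    """Classic turn-taking: hear everything first, then speak."""
    return [f"heard:{c}" for c in inputs] + [f"said:{w}" for w in reply]


print(asyncio.run(full_duplex(["a", "b"], ["x", "y"])))
print(turn_based(["a", "b"], ["x", "y"]))
```

In the full-duplex log the first spoken word appears before the last input chunk is heard; in the turn-based log every "heard" entry precedes every "said" entry.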
Those are two separate processes in my head, and that's what happens with this model. This is novel; we haven't seen this, even though we've seen full-duplex models before. Ryan, go ahead. I just want to point out it's good to have another horse in the race here, right? We're getting another lab that is actively pushing on interaction models, and I think it's just going to lift all the boats. So I'm excited about that. I will say, unfortunately, I was the grumpy old guy going, I don't know, it's not quite instant. I'm so spoiled now; that was honestly my first thought, sadly. Yeah, go ahead. How do we know anything about the window, like how far back in time it's looking? And are we expected to run this live all the time? Because the token bill is going to explode if you run this, and as far as I understand, that's the use case here: you have it running in the background, you can talk to it, it sees, it records. Okay, I get the idea. By the way, full-duplex models are really cool even if they're not the best, and there are some open source ones you can try. It's definitely a new experience; it's not even like Advanced Voice Mode. You need to try it to get the idea. But when this model messes up, it interrupts you, it interrupts what you're doing, and that's more severe than with a turn-by-turn model, where worst case you just say stop and it stops immediately, or you press interrupt. Here, it interrupts you back. Look, it is really good to have another horse in the race, 100%. It's just that, I don't know, you're going to serve this to everyone, everywhere? You know how much compute you need for this? Okay, so here's the thing, here's the thing. When big labs announce foundational models, which TML wants to be, I'm assuming, given all the people they hired (and they hired some incredible people), they don't usually give us the number of parameters. So 
I am hoping that the fact that we know this is a 276-billion-parameter MoE with 12 billion active parameters means that at some point somebody is going to get the weights and open source them, and then we'll be able to quantize the crap out of it and run it fully locally, which is what we need this for. We also need to shout out Meta for local AI in a second, somebody remind me: Meta also released a local AI feature in WhatsApp that we absolutely should shout out. And Wolfram, you mentioned the timing is right after GPT Realtime, and it's not by accident. A hundred percent not by accident. Look at this: they posted benchmarks comparing this model to GPT Realtime, which is also a translation model (they show translation use cases; you guys talked about this last week). They're showing that on FD-Bench, which I'm not super familiar with, TML Interaction gets 77%, versus 46% for GPT Realtime and 54% for Gemini 3 Flash Live. And on turn-taking latency (Ryan, I don't know what you expect), they have the lowest, very close to Gemini Flash Live, at 0.59 seconds. So it's just under a second. Native full duplex enables human-in-the-loop applications that current turn-based systems can't handle. One example they showed is somebody doing push-ups while the model counts them. I think that's just a new way of interacting with models; we haven't really had this ability before, since we're all super focused on turn-based. Beyond this being another horse in the race, it's very interesting that this is where they put their chips for their lab's first model release. It's very interesting to see that this is the bet they're taking. OpenAI has this bet. Gemini Live has this bet, and Gemini Live is great; we've shown you examples of it. Anthropic is nowhere near this, right? Anthropic doesn't care about voice at all. 
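A back-of-the-envelope on why running this locally means heavy quantization: in an MoE, only the active parameters (12B here) do work per token, but all 276B weights still have to sit in memory. The sketch counts weight storage only (1 GB = 10^9 bytes) and ignores KV cache, activations, and quantization scale overhead, so real footprints are higher; the parameter counts are the ones stated on the show.

```python
def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes); weights only."""
    return total_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 276e9   # total parameters, as stated on the show
ACTIVE_PARAMS = 12e9   # active parameters per token, as stated on the show

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
print(f"active per token: {active_fraction:.1%}")
# 16-bit weights: ~552 GB
# 8-bit weights: ~276 GB
# 4-bit weights: ~138 GB
# active per token: 4.3%
```

So compute per token looks like a 12B model, but even at 4 bits the weights alone need roughly 138 GB, which is why "quantize the crap out of it" comes up before "run it fully locally".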
You can barely talk to Claude, and it's definitely not the best experience. It's very interesting that this is where they're going, and it's also where Meta is going: as we just showed you, Meta Superintelligence Labs' first productization of their model is conversation. Many folks don't want to chat with their models by typing, or even by voice dictation; they want an always-on, present assistant. Imagine how cool it would be if your assistant could actually live in the home with you, and imagine how cool it would be in a Reachy Mini embodiment. Wolfram, I know, I'm looking at you and saying the same thing: I want the TML interaction model living in this guy, fully offline, 24/7. Fully offline, 24/7: I give it a year. All right. I've got to buy one of those guys. I'm not impressed by this. I don't know why you guys are so hyped. You're running a 12B-active-parameter model on a rack of eight B300s, and it still took 1.2 seconds to respond. I don't know how good the agentic tool calling is, or how it can call other models. What is going on? How are they going to serve this at scale? GPT Realtime and Meta need some insane DevOps to have that consistent experience at scale. You're just showing a demo, and it still took 1.2 seconds for the thing to start responding in your own office. I'm not impressed. I think this is just average, sorry. So 2025, man. It's so 2025, bro. Yeah, yeah, this is very 2025, sorry. It's all great. I think the folks listening to the show love the myriad of opinions. If we all hype the same thing, people do get a little suspicious: why are they all hyping the same thing? Are they getting paid? We're not getting paid. The only sponsor of the show is CoreWeave's Weights & Biases, which we're going to talk about in a moment. Ryan, one last comment before we move on? Yeah, just quickly. 
I know we're super early adopters, so we're not normal, but I am getting fatigue, right? So okay, there's a slightly better model now in my Meta glasses that I need to remember to talk to. But then I've got OpenClaw, and now I've got Codex. And then I've got ChatGPT on my phone, but that's different. And I think humans don't work like this. We're used to static objects that have the same intelligence and don't change every two weeks. So it's going to be interesting to see how this plays out. I think consumers do not want to change this much. So it'll be interesting to see who ends up winning, literally, your AI assistant; that will be worth trillions of dollars. I'm using my phone right now as my camera to record this. Otherwise, I would pull out my phone and give the same spiel you guys gave last week: where the fuck is Apple in all this? I want my assistant fully local, 24/7, tapped into my phone. Why is Apple not here? Yes, and we must also mention that in that space, Jony Ive is working with OpenAI on a personal assistant device. The device is important, and access to the operating system is also important. All right, folks, we have to move on. We have a guest in 30 minutes, and we haven't talked about a bunch of stuff. I want to cover the Musk v. Altman trial stuff at the end, so stick with us. We also need to cover some Weights & Biases stuff. But I want to talk about the new Ralph. Ryan, we talked about Ralph a while ago, and now all the major labs are adding Ralph-type things into their harnesses. I think the reason we're here on ThursdAI is to shine a spotlight on something that's blowing up that you absolutely must know about. So if you are using these tools and you don't know about /goal yet: /goal is a command.
It's like a skill that turns Claude Code and Codex and Hermes into autonomous 24/7 AI employees that loop until your task is done. Ryan, is this true? And do people need to use /goal? Yes, this is the easy version of Ralph. This is great news, everybody. Ralph is extremely powerful, but you have to do all this crap to set it up. This is basically Ralph out of the box. It's awesome, it's Ralph out of the box. For folks who missed our Ralph episode, what am I talking about? Okay, so the way Ralph works is very simple. You have a goal, right? I want to build this feature, or I want to do this complex thing. Then you break it down into user stories, which are small tasks. Each user story has clear acceptance criteria that an agent can understand. Then you have a notes document, which is: what did you learn along the way? Then you start an agent, usually on the command line, and you say: here is everything you're supposed to do, here's a document that keeps track of where you're at, and when you finish one of these things, emit this signal, which is usually "done." And then it loops, and it just does this over and over again. It turns out it's really effective, because it doesn't run out of context. It's very simple. Every iteration basically starts from scratch, looks at the context, and tries to achieve the same goal. And there's some type of judge testing this and asking: did this session achieve the goal? That's why it runs autonomously to achieve a set goal, and that's why it's called /goal. Yam, you have comments on this? Oh, yeah. I've been using it for quite a while, ever since launch, I think. I could be a commercial for OpenAI. I've been running one for a week straight. That thing is fire. I just want to say it's not just the trivial implementation that you guys imagine. Okay, before that, the Ralph Loop meta using...
Okay, meta is not a good word in this context, but the Ralph Loop way of doing things with Codex is just queueing up "continue" like 100,000 times, and it's just going to continue. This is a little bit different. It's better, much better. I'm not sure exactly what the implementation is inside the harness, but it clearly knows the state it's in. It knows what happened before; it knows what it needs to do. I suspect, though I didn't look into it, that it also preemptively schedules tasks for the future agents that are going to loop. You just give it a goal and it will never stop. And GPT 5.5 really gets stuff done, and the memory works incredibly well in Codex CLI. Seriously, since GPT 5.5, Codex CLI, with all the things they shipped, is just a great harness. If you give it a goal, 95% of the time you can just count on it to actually reach the goal. I don't want to go into too many technicals, but I gave it crazy things to do. Seriously, crazy things. And it just works. One more thing I want to mention: it's really fun when it goes in a loop and does its own thing and you just steer it along the way. You're watching it work, you can just steer it, and it reacts. You can see it keep doing what it does while taking whatever you told it into account, like helping it, giving it tips, "take a look at this folder," or "I left a document for you to look at in here." I've been running this 24/7 for, I think, the whole week. I'm not joking, they are really good. So when we talk about the fear of missing agent time, about folks who are afraid of leaving their computers, there's a whole meme now of folks not closing their laptops; they just hold them like this, in clamshell mode, walking around San Francisco, because they don't want their agents to stop. Now the alpha is scheduling long, lofty goals for your agents so that you don't have to babysit them.
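The Ralph loop Ryan describes (user stories with acceptance criteria, a notes document, a fresh context on every iteration, a completion signal, and a judge) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Codex, Claude Code, or Hermes actually implement /goal: `run_agent` stands in for launching a fresh CLI agent session, `judge` stands in for whatever checks the acceptance criteria, and every name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

DONE = "DONE"  # hypothetical completion signal the agent is told to emit


@dataclass
class Story:
    """A small task with acceptance criteria an agent can check itself against."""
    title: str
    acceptance: str
    passed: bool = False


@dataclass
class RalphState:
    stories: List[Story]
    notes: List[str] = field(default_factory=list)  # the "what did we learn" document


def build_prompt(state: RalphState) -> str:
    # Each iteration starts from scratch: the prompt is rebuilt from the
    # stories and notes, not from the previous session's conversation.
    todo = next(s for s in state.stories if not s.passed)
    notes = "\n".join(state.notes) or "(none yet)"
    return (
        f"Story: {todo.title}\n"
        f"Acceptance criteria: {todo.acceptance}\n"
        f"Notes so far:\n{notes}\n"
        f"When this story is finished, reply with {DONE}."
    )


def ralph_loop(
    state: RalphState,
    run_agent: Callable[[str], str],
    judge: Callable[[Story, str], bool],
    max_iters: int = 50,
) -> int:
    """Run fresh agent sessions until every story passes or the budget runs out.

    Returns the number of iterations used.
    """
    iters = 0
    while iters < max_iters and not all(s.passed for s in state.stories):
        iters += 1
        story = next(s for s in state.stories if not s.passed)
        output = run_agent(build_prompt(state))  # fresh session, fresh context
        # The agent saying DONE is not enough on its own; the judge must also agree.
        if DONE in output and judge(story, output):
            story.passed = True
        state.notes.append(f"iter {iters}: {story.title} -> {'pass' if story.passed else 'retry'}")
    return iters


if __name__ == "__main__":
    attempts = {}

    def fake_agent(prompt: str) -> str:
        # Simulated agent: finishes each story on its second attempt.
        title = prompt.splitlines()[0]
        attempts[title] = attempts.get(title, 0) + 1
        return DONE if attempts[title] >= 2 else "still working"

    state = RalphState([Story("add login form", "form renders"),
                        Story("add logout", "session is cleared")])
    used = ralph_loop(state, fake_agent, judge=lambda story, out: True)
    print(used, [s.passed for s in state.stories])  # 4 [True, True]
```

The design point is that the loop never trusts a single long session: state lives in the story list and notes, so each new session can pick up where the last one left off without carrying its context.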
And the highlight there is that a small model is implemented to review whether or not the goal was achieved. So you have to be very good at specifying these goals, right? Then this is very productive for you. So here's my search; obviously the AI influencers are jumping on this. Here's one saying "best /goal use cases," with a few you can use. This is the antithesis of one-shotting. One-shotting is: you ask a model to go do a one-shot thing and it does it. This is for those tasks where you have multiple things and it's likely to run out of context. I think that's the highlight of /goal, right? Longer tasks, where you don't want to just trust the model. Architecture cleanup, auth flow consolidation, state management consolidation: these are all very agentic things. Test suite hardening, TypeScript strictness fixes. So let's say you have a test suite that runs and it covers only 80%. You can set a goal and say, hey, get my test coverage to 90%, and it will start working toward that. Yeah. Yeah. I just want to say this unlocked for me, at least with GPT 5.5, the next step in hard tasks that I want to get done. Things like custom-made tools that I just want to work. For example, I'm building a terminal. I'm not joking. A terminal like Kitty and so on, tailor-made for how I work. That's a super hard task. You need a lot of iterations, and it takes a while just to get done. It just got it done. And the goal was, yeah, the goal was a large PRD with a requirements doc and so on. Yam nailed it, and the actual, super real-world value here is that there are business goals with measurable outcomes that you can unleash /goal on. This is very similar to autoresearch, and I think this is going to change businesses forever. It's so awesome.
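The advice to specify goals with measurable outcomes (for example, raising test coverage from 80% to 90%) can be shown with a small sketch in which the stopping decision comes from checking the outcome rather than the agent's own claim. Here a deterministic metric check stands in for the small reviewer model mentioned above; `Goal`, `goal_met`, `pursue`, and the simulated coverage step are hypothetical, and this is not the harness's actual logic, which the hosts say they haven't inspected.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Goal:
    description: str
    metric: Callable[[], float]  # measures the outcome, e.g. current test coverage in %
    target: float


def goal_met(goal: Goal) -> bool:
    # Stand-in for the reviewer: check the measured outcome, not the agent's claim.
    return goal.metric() >= goal.target


def pursue(goal: Goal, agent_step: Callable[[str], None], max_iters: int = 20) -> Tuple[bool, int]:
    """Call agent_step until the goal is met or the budget is spent.

    Returns (achieved, iterations_used).
    """
    if goal_met(goal):
        return True, 0
    for i in range(1, max_iters + 1):
        agent_step(goal.description)
        if goal_met(goal):
            return True, i
    return False, max_iters


if __name__ == "__main__":
    coverage = {"pct": 80.0}

    def fake_step(description: str) -> None:
        # Simulated agent iteration: each pass adds 3 points of coverage.
        coverage["pct"] += 3.0

    goal = Goal("Raise test coverage to 90%", lambda: coverage["pct"], 90.0)
    print(pursue(goal, fake_step))  # (True, 4)
```

A vague goal ("make the tests better") gives the judge nothing to check, which is why a measurable target matters more with /goal than with one-shot prompting.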
I have a very in-depth guide here by AI Edge, the ultimate guide to goals. I'm going to post it in the show notes. You folks should absolutely see how they put it: basically, without /goal, you're the loop. You're the human; you're closing the loop. With /goal, a small model closes the loop for you and keeps doing the task you need, which is great. Folks, we have four more minutes until Vic joins, so let's cover one last thing before we move on. Let me pull up my notes here. So, definitely the new Ralph, /goal. Yeah, I want to mention this. We mentioned it at the top of the show: we've switched to Hermes. Some of us switched to Hermes. Definitely more of us dropped OpenClaw than switched to Hermes, but I think Ryan will get there as well once we talk about it enough. Hermes is from Nous Research. Those of you who have been listening to the show for a while know Nous Research for multiple things we've talked about. Hermes was previously a series of fine-tuned models. So shout out to Nous Research for this. Hermes has hit number one in OpenRouter's global rankings for personal agents and CLI agents. It's the most used CLI or open agent, passing OpenClaw, Kilo Code, and Claude Code. It didn't pass Claude Code across all users, only among OpenRouter users, but it's still a great signal for how many people are using Hermes with OpenRouter. I've been using Hermes with GPT 5.5, and it wildly reminds me of early-February OpenClaw, when things just worked, back before Anthropic blocked it from using Opus, et cetera. I tried GPT 5.5 with OpenClaw, and it didn't work nearly as well. I honestly don't know why, given that Peter Steinberger is now working at OpenAI and there are multiple OpenAI people switching the base of OpenClaw to Codex versus Pi.
There are a bunch of people working on that problem, and OpenClaw still requires a lot of fixing for some reason. Hopefully they'll get there. I've chatted with a bunch of OpenClaw folks, and we're willing to try again. But for now, Hermes seems the clear winner as far as I'm concerned, and as far as many other people are concerned, because it's now the number one global leader on OpenRouter, which is great. It also has background computer use, similar to how Codex does it, via the trycua adapter. It's really cool. If you have a Mac mini (Wolfram gave Amy a Mac mini for her birthday, and I have a Mac mini running Hermes), it's really cool to let these models do full computer use with this background computer use feature. Cool. Wolfram, you also switched; tell folks. Oh, okay, so a few things before we continue with our previous topics. Yeah, we talked about /goal. /goal exists in three places, as far as I know. It exists in the Codex harness; I think they implemented it first. Then Claude Code copied it, and many people are reporting that it feels rushed in Claude Code; it doesn't quite execute the same way. And it exists in Hermes: Hermes has /goal. Yam was talking about steering for GPT 5.5. Steering is a feature where you don't interrupt your model in the middle of tool calling or thinking; you just send a message and it steers. Steering exists in Codex; the model is built with steering in mind, it knows about steering. And steering exists within Hermes. I said this at the top of the show and I'll say it again: so many times I've installed OpenClaw for people via Telegram, and they type like they're typing to another human. They type one message and then immediately another, and they expect the model to just read both messages while it thinks. They expect full duplex, and they don't understand that this is not how it works. This is how it works in
Hermes: if you set up steering right, it will inject your follow-up adjustment into the next tool call. Super duper useful, especially in Telegram, because that's how people expect to talk there. So it has /goal, it has steering, it has computer use. That's Hermes in a nutshell. I really recommend it to folks who are down with this whole agentic thing and open to trying it. It's very easy if you have an assistant like Amy, like Wolfred, R2. I don't remember everybody's assistant... actually, I think I know your assistants' names and not your kids' names. By the way, I think it says something about the show that I know exactly whose assistant is named what, but not your kids' names. So if you have an assistant whose memories, system, profile, soul, etc. you've been working on, it's very easy to port those between assistants, which is great. This is open source winning, and this is why I love having files for my assistants. Try Hermes and chime in to let us know what you think. I definitely know that some folks in the comments are loving Hermes. All right, folks. With that, I want to introduce back... Vic, I think you were here before, at least definitely on the Twitter space. I want to introduce back Vic Perez from Krea. Vic, welcome to the show. Please welcome Vic, everyone. Introduce yourself, if you don't mind; it's been a while since we chatted. It's great to see you. Yeah, it's been a while, and it's great to see how successful the podcast has been. Very happy for that. Thank you. Cool. I'm Victor, co-founder and CEO at Krea. I've been working on AI-plus-creativity projects for a very long time, probably since 2017, 2018. At some point I started a side project that ended up becoming a startup, that ended up becoming Krea. You guys have been kicking ass for a long time now. Tell us about Krea as a company before we get to the model.
Tell us what you guys do and what customers you're serving. What is Krea in a world where GPT Image is there for people to just try out different things, and Nano Banana has been ruling? Where do you place yourself, who do you serve, and why should people use Krea? So Krea is a creative tools company. If there's one thing we want to do, it's build creative tools. We believe that future creative tools are going to be built on top of AI models, and we are exploring all the different ways AI can be used as a creative medium. So that's the highest level. And that's why we never cared whether it's open source, in-house, an API, or whatever. We always cared more about the capability, what this technology allows you to do as a creative, and how we put it into an interface that makes sense, more than anything else. So at the highest level, our goal as a company is to build the future of creative tools. If you want to see it as whatever the new Adobe may become: it may not look anything like Adobe, but whatever the equivalent is in the future, that's what we want Krea to be. And so, I haven't used Krea in a while; I just came back to Krea to test it out super quick. Would love to have you show us a quick demo. I remember different attempts at canvases, for example, where one image continues from another. I remember very innovative use cases that weren't possible before the age of generative AI, for example extending images with different outpaintings. Mood boards are the new thing you guys are working on as well, right? So we'd love a very quick demo of what people can expect when signing up, especially with Krea 2, and then we can get to the actual model you guys released. Sure, yes. Are you seeing this? Yes. Cool. So this is the new homepage.
All these are already Krea 2-generated images. But yeah, a quick look over the tool: you can see all the different tools we have in here. We have an image generator, video, a tool for enhancing, and a tool specifically for Nano Banana. Could you tell us about Krea 2, please? So Krea 2 has been our baby for the past six or seven months. Almost half of the company has spent 80 to 90% of their time on this project. And this came from, again, wanting to build creative tools. You can do many things with APIs, like what I just showed you with Nano Banana: managing context for Nano Banana and stuff like that. But I remember the old days, when Krea was just starting and you had access to so many models, and you could run these models on your computer, or have access to the weights of these models and do weird things with them. By far the most creative users of AI came out of that hacking with AI models. And as a creative tools company, there are so many ideas we have around how we want to use these models that we just cannot do by calling an API. And the problem with what we have right now in open source is not that the models are bad; it's that the models out there are not super tunable. They are very heavily post-trained to be safe and to make images look good, which makes sense, because you can get into lots of legal trouble if you don't ensure that. But essentially there's no ecosystem that would allow you to build the tools we want to build. So we tried partnering with Black Forest Labs, and that was nice. We did the Krea 1 model; we shipped that together. But after that collaboration, we really made the decision and the shift toward: okay, let's make our own. Because this is really the way we can build all the features that we believe are so needed in the space.
So that's how it started. It was a big and risky bet. It was the first time any of us had dealt with a project like training a foundation model. For everybody on the team it was the first time, but everybody did an insane job. Everybody on the team has been 10x above expectations. This first model was meant to be a conservative one. We didn't expect it to be insane; we just wanted to make the first version so the second one would look better. But this first version is already insanely good. I was extremely impressed the first time I played with it. And the goal... oh, sorry, did you want to say something? No, go ahead. Go ahead. The goal is? Yeah. So, the goal of the model: the problem we see in the space is that these models feel very opinionated and very constrained. The analogy I always like to use is that using an AI model is like riding a horse, right? When you ride a horse, there's this little brain that can walk and take you in certain directions. You tell it to go in a certain direction and it goes; you give it a few kicks and it goes faster; you pull on the reins and it goes slower. That's how using AI feels. You have this mini brain that you can give a prompt, and you can steer it toward one place or another. And it feels like most of the models we have work insanely well as long as you want to follow the path. It's almost like the horse can only follow the path. If you want to take it somewhere off the path, there are big walls, and the horse will never go there. And these big walls are the types of imagery that are quote-unquote bad: imagery that is grainy, that is not sharp, that is artistic, that is more creative, more esoteric.
The model is scared of all of these visuals because they fall outside what post-training tells it to generate. And that, in my opinion, poses so many limitations for the creative users of these models. So that was the premise of Krea 2. We want to do something that is raw, something that, if you take the horse analogy, allows you to go off-road. No walls: you just go to the middle of the forest and find something beautiful there. That's what early users of Stable Diffusion liked about that model: exploring the latent space and finding the weird edges, not the fully fleshed-out, strong personality of the new models. And that's harder with those models, is what you're saying, and you guys trained a model that's better at it. Let's go, we love it, dude. And we're going to give you applause once that comes. Vic Perez, thank you so much for joining us. Co-founder and CEO of Krea. Shout out to the other co-founder, Diego, and to the incredible work you guys do. Just a personal thing between us: I remember when you guys let me fuck around on your H100 cluster, and I think I broke something with virtual environments. But you guys really helped me at the beginning, when I was still doing image generation, before ThursdAI. So I'll always be grateful for that. Thank you, Vic. Thank you for joining. Great model. Great folks. Please check out Krea and Krea 2. All right. Cheers. Bye-bye.
All righty, folks, I'm going to bring you back on stage. Let's go. I love the mood board feature; it's really cool. And as you guys saw, this may come to open source as well, so very exciting. Any comments on the interview, on Krea as a whole, or on their new model, if you guys want? If not, we'll continue to the next topic. Just a big thanks for being open source with the real-time model and for advancing AI images. The real-time model was super cool. I saw it respond instantly; it was really fast, and the webcam feature was definitely dope to play around with. We should play around with it a little more. All right. The last thing: the app looks pretty good too. I talked to some creatives, and they have a hard time running all the models by themselves. Their main complaint is that there isn't very good UI and tooling, so this actually looked pretty decent in that regard. So next up, we have two more things, please stick with us. One of them: I really want to give you some highlights from the Musk v. Altman trial, which I listened to, and the closing arguments are happening today. Hopefully we'll be able to cover those as well. I can probably ask my agent to go and summarize the closing arguments from the trial, but that's coming later. Meanwhile, I want to highlight a new release from Weights & Biases from CoreWeave. So let's go to This Week's Buzz, the corner where we talk about our only sponsor for the show, Weights & Biases from CoreWeave. Now, the trailer doesn't mention CoreWeave because it was made pre-acquisition, so I have to update the little trailer. But let's go to this, from CoreWeave now. All righty, welcome to This Week's Buzz, where we tell you about everything happening at Weights & Biases from CoreWeave. And to show you some stuff, we actually have breaking news from today, and Wolfram, I'm going to add this to the
stage here, but I will trust you to talk about this, because I think it's a big release, we've been sitting on it for a while, and we're very excited to announce it. Let me show this here. Hold on, let me see if I can pull it up; I should have pulled it up earlier. We have announced a new feature, and I think this feature is very interesting to folks who listen to something like ThursdAI, because we just talked to you about security and sandboxes. So as of today, in preview, CoreWeave has a sandbox product. CoreWeave and Weights & Biases have a sandboxes product where you can spin up sandboxes, and I think it's super, super cool. To tell you why it's super cool, we'll talk with Wolfram. Wolfram, tell us about sandboxes and why they're important for the stuff we do. So basically, if you are working with agents, you want your agents to be able to execute code, write code, and do stuff like download a Git repository. I do it all the time. I tell it I want to change something, like in Hermes Agent; I have about 30 patches on top of it. So I tell my agent I want this feature, and it clones a Git repository and changes it. Now, you don't really want to do that on your main machine, because if there's anything malicious in the repository, it could cause issues like prompt injections or malware. We have seen this and talked about it. And it installs all the dependencies to compile or run stuff. So put it in a sandbox; there's no reason to run this on your own machine. And how do you do this? You need a sandbox and a system that uses it. Also, for my WolfBench evaluations, when you do agentic evaluations, you need a consistent environment that is always the same. So it runs the evaluation, then you destroy the sandbox, create a new one, and do another test. The state from one test doesn't affect the others.
So you also need some way to quickly create sandboxes, run stuff, and remove them again. And now we have a product where you basically just use an API, an SDK, or a provider integrated into your benchmarking solution that creates these on the fly whenever you need them. Sandboxes are also super useful for training models, for example with reinforcement learning. When you train agents to write code with reinforcement learning, at every training step you need to do rollouts and have the agent actually execute some code and test whether that code is valid, for example. You don't want to do this on the same machine, because all these things need to happen in parallel, so sandboxes are a great way to do that. Evaluations at scale, obviously: WolfBench needs sandboxes. We've been using Daytona, huge shout out to Daytona for providing the sandboxes for WolfBench evaluation, and Wolfram has been testing out both. And for evaluations at scale, Artificial Analysis does a bunch of coding stuff, the Harbor framework, Terminal-Bench: all of these need to execute a lot of things in isolation, because you don't want to leave traces of code on the same machine, for example. Sandboxes are great for that, and you need speed so you don't hamper your evaluation. And then there's agentic tool use: stuff like Hermes, for example, which people just run on a Mac mini, could run in sandboxes for tool use where stuff doesn't actually need to execute on your machine. That's what sandboxes are for. And we are very proud to announce it: CoreWeave, the same infrastructure, powers nine out of the 10 major AI labs now, including Meta and Anthropic and OpenAI. We can count all of them besides one, which you may or may not know about, but we're powering nine out of 10 foundation labs in terms of how they train models.
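Wolfram's point that each evaluation or RL rollout needs a fresh environment whose state is destroyed afterward can be illustrated locally. The sketch below uses only the Python standard library: every snippet runs in its own temporary directory (deleted afterward) with a timeout, and rollouts run in parallel. It is a stand-in to show the create, run, destroy lifecycle, not CoreWeave Sandboxes and not a security boundary; it calls no CoreWeave or Weights & Biases API, and `run_in_throwaway_dir` and `run_rollouts` are hypothetical names.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Tuple


def run_in_throwaway_dir(code: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Run one candidate snippet in a brand-new temp directory, then delete it.

    Returns (exit_code, stdout); exit_code is -1 on timeout.
    NOT a security boundary: the child can still reach the network and the rest
    of the filesystem. Real isolation needs a container or microVM sandbox.
    """
    with tempfile.TemporaryDirectory() as workdir:  # deleted when the block exits
        script = Path(workdir) / "candidate.py"
        script.write_text(code)
        try:
            proc = subprocess.run(
                # -I: isolated mode, ignores PYTHON* env vars and user site-packages
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return -1, ""
        return proc.returncode, proc.stdout


def run_rollouts(snippets: List[str], workers: int = 4) -> List[Tuple[int, str]]:
    # One fresh directory per rollout, so parallel runs never see each other's files.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_in_throwaway_dir, snippets))


if __name__ == "__main__":
    results = run_rollouts([
        "open('state.txt', 'w').write('x'); print('wrote')",
        "import os; print(os.path.exists('state.txt'))",  # False: different directory
    ])
    print(results)
```

The second snippet cannot see the file the first one wrote, which is the "state from one test doesn't affect the others" property in miniature.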
We're now offering a sandboxes product within those clusters, for them, on CKS, so they can execute code where the model is trained. I think that's very important. But this is also a very interesting release for CoreWeave. It's one of the newer ones, right? Like CW Inference, CoreWeave Sandboxes is a self-serve product. So you as a user can install the Weights & Biases SDK and spin up a sandbox yourself. It's not only the top enterprises who pay CoreWeave billions of dollars; you can also go via Weights & Biases and spin up sandboxes right now, in preview. This is a very interesting release, because CoreWeave doesn't usually cater to the developer market; with inference and now sandboxes, this is the second product where CoreWeave offers you the same infrastructure that powers nine out of ten major foundation labs. I think it's a huge release and very important. Please give it a try. All the documentation is up on coreweave.com; I'll add the link in the show notes. Wolfram, you've been... Give it to your agent; your agent can create sandboxes as it needs. Yes, your agent can spin up sandboxes as it needs them. You can just send the documentation to your agent. Here's the full release, which is not going to be useful to you, nobody's going to type all this, but just look up CoreWeave Sandboxes. This is a very new thing, so please give us feedback as well, whether it works or doesn't work, whatever. Yam, you have feedback right now? Question, question, question. What do I get? I want to use the service, by the way, just to be clear. Anything coding, you need a sandbox. Anything coding and multi-agent, you need sandboxes; otherwise they're going to interfere if they use the same computer. It's that simple. Yeah, they're going to step on each other's toes. Exactly, you need to
sandbox; isolation is for exactly this. There's a reason Codex, Claude Code on the web, all of them sandbox their models and throw away the box once you get a PR. I'm just asking, what do I get? Let's say I want a sandbox: what do I get with the sandbox? Is it pre-installed? Can I install stuff? Give us a little more info. Wolfram, how does that work for you in WolfBench, for example? Let's pull up the pricing information; do you have it? What resources are available? The thing is, I have been using a lot of resources, and so far I haven't even hit a limit. So basically, like always, you specify how many CPUs you want and how many gigabytes of RAM and disk, and it pulls the container images. If you are using WolfBench or Terminal-Bench, basically every task has its own CPU, RAM, and disk limits, which it will just use; it pulls the container and starts it up. And the container is like a Docker container built from a Dockerfile, or from prebuilt images: you don't create them, you just pull a container from your container registry, basically. Cool, cool. All right. And the highlight there is that it runs on the same infra the major labs use. And if you are a CoreWeave CKS customer, for example, this will run multi-cloud, I believe, where your inference runs as well, which is great. So try out CoreWeave Sandboxes today and let us know. I think we're very generous with the pricing in the beginning, via Weights & Biases. I'll add all the documentation in the show notes. I think next week we'll chat with the folks who built this, go into detail, and have examples. Wolfram? Yeah, basically just one thing: you can't use GPUs yet. It's a CPU-based sandbox, but that's usually all you need for agentic workloads. Yeah, all the code executes in a CPU environment. And the last thing I'll say from the release, this is an official thing.
If you are a CoreWeave customer and you already rent GPUs from us, like clusters, you will spin up the sandboxes on the spare CPU of the GPU machines, which is great, because Vera Rubin machines, for example, have great CPU capacity as well. So that's going to happen there too. All right, folks, this has been the update from Weights & Biases from CoreWeave. We have more updates coming next week; we're definitely going to talk about some exciting stuff, including WolfBench. Folks, I think the last thing I want to talk about here is the Musk v. Altman trial. This is what we're going to end the show with. We're a little bit over, but I think it's very important to cover. Have you guys been following at all? You remember we talked about it when Elon sued, but now there's an actual trial going on, and I have some highlights here. I followed your coverage. Okay. It's actually happening right now: the lawyers' closing statements are happening live as we speak. The courtroom was so full of Elon Musk fanboys and others that the court decided to livestream only the audio. No pictures are taken in the court, and this is what we've listened to. So I listened, and I actually set up my Hermes agent to go and listen, which was incredible. I just gave it a YouTube link and said, hey, transcribe this, and in two-minute chunks send me updates on the most important things they just said, because trials can be boring. At the trial they had Sam Altman. In the sessions I listened to, they had Satya Nadella and Ilya Sutskever, co-founder of OpenAI and famously the guy who fired Sam Altman in that very quick firing and rehiring in November 2023, when Sam Altman got fired and the company almost quit. We covered this live on ThursdAI as well.
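The "send me an update every two minutes" workflow comes down to grouping timestamped transcript segments into fixed windows before summarizing each one. A minimal sketch of that chunking step, assuming the transcriber returns `(start_seconds, text)` segments; `chunk_by_window` is a hypothetical helper, not a Hermes feature, and the per-chunk summarization call is deliberately left out.

```python
from typing import Dict, List, Tuple

# Assumed transcriber output: (start time in seconds, text) per segment.
Segment = Tuple[float, str]


def chunk_by_window(segments: List[Segment], window_s: float = 120.0) -> List[Tuple[int, str]]:
    """Group timestamped segments into fixed windows (default: 2 minutes).

    Returns (window_index, joined_text) pairs in time order; empty windows are skipped.
    """
    buckets: Dict[int, List[str]] = {}
    for start, text in sorted(segments):  # sort by start time
        buckets.setdefault(int(start // window_s), []).append(text)
    return [(idx, " ".join(buckets[idx])) for idx in sorted(buckets)]


if __name__ == "__main__":
    segments = [(5.0, "Opening statement."), (130.0, "Next witness."), (119.9, "Objection.")]
    for idx, text in chunk_by_window(segments):
        # Each chunk would then be handed to the model for a short summary (not shown).
        print(f"{idx * 2}-{idx * 2 + 2} min: {text}")
```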
And here are a few highlights from the testimony that we haven't seen before. I definitely want to read some of those out because I think they're important. Look at this beautiful picture of Sam Altman in court that is completely fabricated by AI. This is GPT-Image 2. I wanted to post something and I didn't have anything, because court documents are sealed. So what is this fight about? This is a summary from my Hermes: this is a civil fight over OpenAI's soul. Musk claims that OpenAI abandoned its nonprofit, public-benefit founding bargain by becoming a Microsoft-backed for-profit empire. OpenAI's defense is that Musk knew commercialization was necessary to scale the models, tried to seize control for himself, walked away when he couldn't, and is now suing over the success that he once gave a 0% chance. So here are a few things. This is a little bit skewed because it comes from Sam Altman's testimony and cross-examination, but this is a direct quote from Sam Altman: "An early number that Mr. Musk threw out was that he should have 90% of the equity to start." It then softened, but it was always a majority, and having a majority in OpenAI was very important to him. He also wanted unequivocal initial control; both Altman and Sutskever confirmed this. Elon Musk had a document, and this is a direct quote from Elon Musk: "I would unequivocally have initial control of the company, but this will change quickly." And Ilya Sutskever testified about Musk's argument that the other founders couldn't see what he saw in AI, referring to a SolarCity disagreement. I'm not sure what that disagreement is about. And then, this is a very important nugget: Elon Musk did not believe that OpenAI had a chance to succeed versus Google. In his words: my probability assessment of OpenAI being relevant to DeepMind/Google without a dramatic change in execution and resources is 0%, not 1%.
I wish it was otherwise. So despite starting OpenAI, Elon Musk did not believe it had a chance to succeed without changes. Then came the September 2017 ultimatum from Elon Musk. He said, I've had enough, this is the final straw: either go do something on your own or continue with OpenAI as a nonprofit. This is Elon Musk literally saying, hey, continue OpenAI as a nonprofit, which they did. And now he's suing them over this. The very interesting thing that I saw from this trial is how focused it is on Microsoft specifically. If you guys remember, Microsoft was the first major player to invest in OpenAI: first $1 billion, then $2 billion, then $10 billion, $13 billion in total. Elon Musk is suing Microsoft specifically, in addition to OpenAI, over the nonprofit "steal." But the very funny thing is that SoftBank, NVIDIA, Amazon (I believe it's Amazon), and some other companies have much more stake in OpenAI than Microsoft at this point. Significantly more stake, but they're not named in the lawsuit; only Microsoft is. So this is why Satya Nadella was on the stand testifying. Nadella confirmed on the stand that the pro forma target redemption amount on the $13 billion investment was $92 billion. The values do not reflect inflation adjustment and are subject to a 20% increase. So basically, there's another very interesting thing from the trial: the investment was capped. The investors in OpenAI had capped returns on how much money they could make from OpenAI, showing that this is still kind of a nonprofit, public-benefit company. Microsoft's $13 billion bet has a contractual ceiling that compounds to roughly $180 billion in four years, and other companies have way more than this. There's also the conversation about the AGI clause, the famous AGI clause LDJ always talks about. You guys know about this: Microsoft gets certain IP rights and research rights up until OpenAI reaches AGI.
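A quick sanity check on the capped-return figures quoted from the testimony. The transcript does not spell out the contract's formula, so the second expression assumes, purely for illustration, that the 20% increase compounds annually on the $92 billion target over four years; that reading is not from the trial record.

```latex
\frac{\$92\,\text{B}}{\$13\,\text{B}} \approx 7.1\times
\qquad\qquad
\$92\,\text{B} \times 1.2^{4} = \$92\,\text{B} \times 2.0736 \approx \$190.8\,\text{B}
```

Under that assumption the ceiling lands near, but not exactly on, the roughly $180 billion figure mentioned in the segment, so treat it as an order-of-magnitude check rather than the actual deal terms.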
And here's the actual beef, the deals behind this. Under the original deal, essentially, the entire thing would be dissolved if AGI was achieved. I think that was the term. So Microsoft and OpenAI agreed that once AGI is achieved, the whole deal is dissolved. "And I think if the nonprofit board decided to commercialize, we would get to commercialize. But if they didn't commercialize, we wouldn't get to commercialize either. We kept it symmetric." This is Satya Nadella on the terms of the deal with OpenAI's board. And the outcome of the recent updates is that Microsoft no longer gets research IP at the AGI point, but will continue to get product IP through the end of 2032. So basically, they amended the AGI deal a little bit: the product IP will still go to Microsoft, but not the research IP. The last thing that I want to highlight is the board. Elon Musk's lawyers tried to paint Sam Altman as an untrustworthy character, so they talked a lot about the board firing Sam Altman. They brought the previous board on. They questioned a bunch of people from that board. They questioned Satya Nadella about that incident. And the board back then gave a very specific statement of why they fired Sam Altman. They said he "exhibits a consistent pattern of lying, undermining his execs, and pitting his execs against each other." This is a direct quote, under oath, from Ilya Sutskever about Sam Altman. "And this leads to tremendous loss of productivity and trust." When they asked Ilya Sutskever whether he believes the same thing about Sam Altman right now, he said he thought so at the time and had been thinking about Altman's issues for at least a year. And then he said, yes, I voted to fire Sam Altman because of this. And as for Satya Nadella, on the "not consistently candid" explanation for the board firing Sam Altman, Satya Nadella said he asked the board explicitly why Sam was fired, and "they never gave me a specific reason."
If there were some incidents, or details behind this incident, none of that was coming through to Satya Nadella. And Satya Nadella called it "amateur city" as far as he was concerned. None of the board explained why they fired Sam Altman, still, even under oath in court. There were no specifics about the consistent trust issues with Sam Altman. The last thing, and I don't think it's very important... hold on one second, oh, I have a bunch of stuff here. Yeah, so NVIDIA and SoftBank each have around $30 billion in; Microsoft is the smallest mega-bet inside OpenAI. SoftBank and NVIDIA both have $30 billion investments in OpenAI. And the last thing, and guys, I want your reaction to this: when asked about the difference between AI back when they first started OpenAI and now, Ilya Sutskever gave the best quote. He said it's like the difference between an ant and a cat. In 2018, AI was an ant, and now AI is more like a cat. This is Ilya Sutskever's genius definition of how far we've come in terms of AI capability, underlining how much AI was not a thing in 2018, in the company's early years. AI was barely anything; it was barely important or clear that anything would happen. It also highlights to me, honestly, how forward-thinking all these people were in AI: Ilya Sutskever, obviously Greg Brockman, Sam Altman, and also Elon Musk, when they started OpenAI as a competitor to Google DeepMind back in 2015, because they saw what was coming and what's happening right now. This is the quote from the trial. We'll keep you updated about the trial. It's very entertaining to see behind the scenes as these documents get released. Any thoughts on the trial, folks? Before we conclude: we're at about two and a half hours, so it's time for us to wrap up.
And we have a comment from Jordan saying that Microsoft is the largest individual shareholder in the new OpenAI public benefit corporation, more than the OpenAI Foundation, which is the not-for-profit. Wolfram, Yam, any comments? Nisten? I don't know. I don't have a side. I don't have a side yet. What is Elon trying to get out of it? I have no idea what happens if he wins this trial. He does not get control of OpenAI; that's clear. At the beginning of this, he said he wanted full control, and he doesn't get full control. He invested a total of $35 million in it. I don't think he wants that back, or more. I'm not sure. I don't know. I actually have no idea what happens if he wins, besides saying, hey, I won, and they stole the nonprofit. So I really don't understand what this is all about besides pettiness and a lot of money for lawyers. These lawyers are getting paid more than researchers at this point, I think. This is crazy. Crazy town. All right, folks. There have been some funny quotes here and there. And Greg apparently has a diary, a journal, a diary. I don't know. Apart from entertainment for us, I don't know the point. I'll tell you what we're getting, though. We're getting disclosure. We're getting disclosure about some crazy numbers. So one thing that I wanted to highlight here, and I didn't get to this, is Sam Altman's stake in Helion Energy. They have disclosed all of their stakes and everything. Sam Altman has a 30% stake in Helion Energy. Helion is now valued at some number in the tens of billions of dollars, and Sam Altman's stake in Helion Energy is like $13 billion or so. Also, he owns 22.8 million shares in Helion, worth about $1.6 billion as of December. And Helion has a 2028 power deal with Microsoft and a scale deployment agreement with OpenAI.
So Helion Energy, the company that works on fusion, supposedly has a deal to provide power to Microsoft as soon as 2028, which is in two years. I think that was notable. And the fact that Sam Altman owns a third of Helion Energy is also super cool. So we get disclosure, yeah. And I think disclosure is very important for transparency, and that's what we're definitely getting. All right, folks, with a bit under two and a half hours here with guests and different conversations about the state of AI, thank you all for joining. It's been so fun to be back. So fun to be back. The last thing that I want to make... Ooh, Nisten. Okay, the last thing that I want to show before I drop: at the beginning of the show, Nisten posted an open-source scanner for the Shai-Hulud mini worm, and I ran it with my... it's going to come up here in a second. There we go. I ran this together with my Hermes, Wilfred, and luckily I'm not exposed. Thanks for the scanner. I asked it to run, and you can see all of the code that it ran, and I'm not exposed. There was one review finding, but it's a false positive. Nisten's scanner gave me a false positive. It's probably from one of your chats with Hermes about this; it will label that as a false positive. So I ran this task based on Nisten's scanner and I'm not exposed. I definitely recommend you guys run this too; I'll add it to the show notes. Thank you so much for joining. Wolfram, one last comment before we drop? Yeah, while you were doing this, I also gave my agent the task of setting everything up so that npm and PyPI packages will not be installed if they're younger than a day, and that has been set up as well. So I highly recommend everybody do that. Yeah, guys, use any sandbox. Just don't leave your laptop running like that. Run tmux. Not any sandbox, Nisten. Not any. Not any sandbox. Our sponsor's one is pretty good. All right, folks. Thank you so much for joining. This has been ThursdAI for May 14th.
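The two safeguards from this segment, checking installed packages against known-compromised versions and refusing anything published less than a day ago, can be sketched in a few lines of Python. This is not Nisten's scanner and not Wolfram's actual setup: it assumes an npm `package-lock.json` that uses the `packages` map (lockfile versions 2 and 3), a caller-supplied `known_bad` set of hypothetical `(name, version)` pairs, and ISO-8601 publish timestamps with a UTC offset or trailing `Z`, such as the `upload_time_iso_8601` field in PyPI's JSON API. A real worm scanner checks far more than version pins.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_AGE = timedelta(days=1)  # the "younger than a day" policy


def is_old_enough(published_at: str, now: Optional[datetime] = None, min_age: timedelta = MIN_AGE) -> bool:
    """True if a release published at `published_at` (ISO-8601, tz-aware) is at least `min_age` old."""
    published = datetime.fromisoformat(published_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now - published >= min_age


def locked_packages(package_lock_text: str) -> set[tuple[str, str]]:
    """Collect (name, version) pairs from an npm package-lock.json (lockfileVersion 2/3)."""
    lock = json.loads(package_lock_text)
    found = set()
    for path, meta in lock.get("packages", {}).items():
        if not path:  # the "" key is the root project itself
            continue
        # "node_modules/a/node_modules/@scope/b" -> "@scope/b"
        name = path.rsplit("node_modules/", 1)[-1]
        if "version" in meta:
            found.add((name, meta["version"]))
    return found


def audit(package_lock_text: str, known_bad: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the locked (name, version) pairs that appear in the hypothetical known_bad set."""
    return sorted(locked_packages(package_lock_text) & known_bad)
```

Some package managers already have built-in options for the age policy (uv's `--exclude-newer`, for example, ignores distributions uploaded after a given date), so check your tool's docs before hand-rolling it.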
It's been so awesome to be back. Next week, I'm going to Google I/O, so expect a lot of news, because remember, whenever Google I/O is about to release some stuff, OpenAI jumps in to try to undercut them, and then xAI jumps in last. Last Google I/O was crazy, and the Google I/O before that was crazy. So I'm going to be live from Google I/O, probably going to record some stuff live there, and we'll also cover everything that happens at Google I/O on ThursdAI next week, so that's probably going to be happening from home. Please stay tuned for next week. If you missed any part of the show, ThursdAI is a newsletter and a podcast, and now a short-clips factory. Go find us on Instagram; we're blowing up on Instagram, so please follow there as well. Shout out to everybody here, all the co-hosts who joined: Ryan Carson, Yam Peleg, Nisten Tahiraj, Wolfram. And shout out to LDJ, who was missing in action this week. We also had Victor Perez from Krea as a guest, and shout out to the CoreWeave team that launched sandboxes, which is a very cool thing. Thank you so much for joining, we'll see you here next week. Bye bye, everyone. Cheers. We'll see you next time.