Cal Newport on Mythos and Anthropomorphization

59 min

Apr 22, 2026

Summary

Cal Newport and Ed Zitron discuss how AI reporting has become increasingly sensationalized and directionally true rather than factually accurate, examining the hype around Claude Mythos, AI agents, and the anthropomorphization of language models. They argue that tech companies are marketing capabilities that don't yet exist while media outlets amplify fears without proposing solutions, creating a moral hazard that stresses the public without justification.

Insights

AI reporting prioritizes directional truth (making a general point) over factual accuracy, similar to COVID coverage but without the actionable guidance that pandemic reporting provided
The Mythos release appears to be primarily a marketing exercise, with security improvements being incremental (66.7% to 83.1% on benchmarks) rather than revolutionary, yet covered as a major breakthrough
AI agents don't currently exist as functional autonomous systems—what's marketed as agents are actually hand-coded wrapper programs around LLMs that still require significant human engineering
The future of AI is likely distributed, specialized applications rather than centralized trillion-parameter models, which threatens the business model of companies like OpenAI and Anthropic
Tech executives face a moral hazard: either they're marketing hype (unethical) or they genuinely believe in AI doom (in which case they should shut down, not continue operations)

Trends

Credulous tech journalism treating unverified claims from AI companies as fact without independent verification
Marketing-driven AI capability announcements that conflate incremental improvements with paradigm shifts
Shift from actionable crisis reporting (COVID) to nihilistic resignation reporting (AI) that stresses people without solutions
Anthropomorphization of language models creating false impressions of autonomy and intentionality
Regulatory theater: governments holding meetings about AI threats without meaningful action or understanding
Open-source models (3-5B parameters) replicating capabilities of proprietary trillion-parameter models, undermining moat arguments
AI safety research institutions (AISI) with ties to effective altruism producing studies that conflate correlation with causation
Tech companies pivoting to product clones (Figma, Word) rather than demonstrating claimed autonomous capabilities
Benchmark gaming as primary metric for AI progress rather than real-world economic value or capability
Decoupling of AI company marketing narratives from actual product capabilities and deployment

Topics

AI Reporting and Media Credibility
Claude Mythos Security Capabilities
AI Agents and Autonomous Systems
Language Model Anthropomorphization
Benchmark Gaming vs Real-World Performance
Distributed vs Centralized AI Architecture
AI Safety and Cybersecurity
Tech Company Marketing and Moral Hazard
Open-Source vs Proprietary AI Models
AI Job Displacement Claims
Natural Language Interfaces in Software
LLM Limitations in Planning and Reasoning
Agentic Web Proposals
AI Regulatory Theater
Post-Training and Harness Engineering

Companies

Anthropic

Primary focus of discussion regarding Claude Mythos release, marketing practices, and claimed security capabilities

OpenAI

Discussed alongside Anthropic regarding business model threats from open-source alternatives and AGI marketing

Axios

Criticized for credulous AI reporting, misquoting executives, and aligning with tech companies to amplify fears

Box

CEO Aaron Levie cited for making unfounded claims about AI agents and agentic workflows

Coinbase

CEO Brian Armstrong mentioned for promoting agentic web concepts without demonstrable technology

Salesforce

Criticized for building APIs for non-existent agents and spending billions on speculative technology

Meta

Example of company spending $70B+ on metaverse with minimal results, paralleling current AI spending

Microsoft

Discussed regarding Copilot attempts to create natural language interfaces for software automation

Google

Mentioned for natural language search capabilities and recent AI summary integration issues

Figma

Subject of Anthropic's recent product clone announcement, raising questions about AI capability claims

AI Safety Institute (AISI)

Criticized for producing studies conflating tweet volume with AI scheming, lacking methodological rigor

People

Cal Newport

Guest discussing AI reporting credibility, Mythos hype, and limitations of language models as autonomous agents

Ed Zitron

Host conducting critical analysis of AI industry marketing and media coverage patterns

Dario Amodei

Discussed for making vague statements about AI capabilities and job displacement without specifics

Sam Altman

Mentioned regarding marketing of AI capabilities and moral hazard of existential risk claims

Aaron Levie

Quoted making unfounded claims about AI agents and agentic workflows in enterprise software

Brian Armstrong

Mentioned for promoting agentic web concepts without demonstrable underlying technology

Gary Marcus

Cited for reporting on Anthropic's use of hand-coded symbolic AI rules in coding harnesses

Kevin Roose

Criticized for aligning with tech companies in AI fear-mongering coverage

Casey Newton

Criticized for aligning with tech companies in AI fear-mongering coverage

Meredith Whittaker

Mentioned for making directionally correct but unverified claims about AI agents booking flights

Quotes

"I'm just going to sort of, it's a, I call it like head shaking doomerism. You're just like, it's a, this field's just going away. What can we do? Like this, this sort of like passive head shaking."

Cal Newport • Early in discussion

"If you actually really believed 50% of the economy was going to be automated, that we're going to have to have government checks just so we can afford to buy the cat food to eat after all the jobs are gone... you wouldn't just write a sort of too cool for school head-shaking resignation article."

Cal Newport • Mid-discussion

"Either you need to be building the barricade, or you're just scaring people for the marketing. Neither of these, I think, is something that's defensible."

Cal Newport • On CEO moral hazard

"Agents don't exist. They don't. They don't have the ability to like, oh, they'll use computers. Computer use is basically non-functional in AI and it takes insane amounts of compute."

Ed Zitron • On AI agents

"What you're not doing is actually doing step-by-step evaluations. It doesn't have a clearly isolated goal that it's trying to measure how close you're getting to it. It doesn't have a world model."

Cal Newport • On LLM limitations as planners

Full Transcript

This is an iHeart Podcast.
Cool Zone Media. Hello and welcome to Better Offline. I'm, of course, your host, Ed Zitron. As ever, support your neighborhood Zitron by subscribing to the premium newsletter, discount link in the episode notes, of course, buy a t-shirt, download a blog, whatever it is you want to do, okay? It's not up to me what you do, but today I'm joined by the incredible comp sci professor and commentator Cal Newport. Cal, thank you for joining me. Always a pleasure, Ed. So I kind of wanted to start with, I asked you for a quote a few, like a week ago, maybe two weeks ago. I can't remember how time works anymore. But it was around the way the reporters cover AI and how it seems that a lot of the reporting is kind of directionally true rather than actually true. Yes, and I want to add something to it since. So I've been thinking about that quote. Yeah, I've been thinking about it. So what I said, if I remember that quote properly, what I was saying is I was picking up a lot in the reporting on AI that you would lean into a story without having necessarily verified that the details are true and that this is what's actually going on, say, with the new AI model. You would lean into it anyways because it was what I call directionally correct. It makes the general point that you see it as your job as a reporter to make, which is, hey, you need to be worried about this or this is a big deal, right? And so I think that is a problem. There's another issue I'm seeing, though. I've sort of been refining my thinking on this. 
I'm also wondering if some of what I'm seeing in some of the reporting on this is it's just an embrace of the form of, I'm going to give you a stress wave with no relief. Just like, we're all going to take turns. I will choose an area you haven't thought about. How about math? Mathematics are going to go away. Mathematicians are going to be, okay, I'll take that one. Yeah, let's go. Like negative clickbait. Yeah. And they, but there's this weird sort of passivity to it where it's like, I'm just going to sort of, it's, it's a, I call it like head shaking doomerism. You're just like, it's a, it's, this field's just going away. What can we do? Like this, this sort of like passive head shaking. It's a very specific style. You don't see a lot of other reporting historically, I think, that takes on this resignation of, I'm just going to make the case that like you're screwed and then kind of give you a shoulder shrug. And then we're, and then we're going to drop the mic and walk off. And I'm kind of getting tired of this. Like, I think there is a cost to stressing the hell out of people. I mean, I'm getting letters all the time now from people. They'll say things like, I feel like I'm trapped in a cage just being hit with wave after wave of stress, and there's no outlet. There's no door or possibility of making things better. And I think the CEOs are doing it, and I think increasingly we're seeing commentators doing it as well. This is not good in many different ways. So I don't know. I'm adding that to my list. Some of it's directionally true reporting. They really are worried that people aren't worried enough. And I think it's just sport now. Can you find an area to come in and just write a head-shaking article that's only trying to undermine the existence of this important human activity or this job or our lives or whatever? It's a very unusual style that quickly became a standard. And I see it a lot with anything to do with AI and job studies. 
Like, I've been sent this Tufts report where it's like, oh yeah, AI affected, or they find these weird weasel words where it's like, jobs that could be at risk from AI at some point, and we put them in one bucket, and then jobs that might one day be, we'll put that in another bucket, and there you go. Don't know what we're, like you said, don't know what we're meant to do with this. Don't know what anyone's meant to do with this information. But it's just like, well, there you have it. There you, yeah, but we're all fucked. It's the end. Even though the data does not say that. Like, I've read, I think, every AI jobs report now. Every single one. And they're all the same. They are all, right now, AI can do this. And then you look at what it says, it's like, it can do law. Well, it can't really do law. It can do one sigma within law, kind of. And even then, it isn't really obvious. And the people saying it can do that are partners at law firms that don't write motions, but don't do like the grunt work. So it's almost, it feels like the reporters have either given up or are just looking for clicks. And it's hard to tell sometimes. This is what I'm trying to figure out because I'm realizing if it's entirely just, I think this is directionally true and that's good enough, then they should be way more upset and in the streets and sparking a revolution, right? Like if you actually really believed 50% of the economy was going to be automated, that we're going to have to have government checks just so we can afford to buy the cat food to eat after all the jobs are gone. If you really thought that our entire infrastructure was about to collapse, that superintelligence was going to emerge suddenly and be a threat to human existence, you wouldn't just write a sort of too cool for school head-shaking resignation article. You would be like, we got to, where are the John Connors, right? 
Like we need to get on the cool trench coats and get out there and go against the Skynet revolution. You would be on your feet. Nothing would be more important to you. So this is my case about the tech CEOs. I think there's a moral hazard that I don't think that we're putting our finger on properly here, right? So you have the tech CEOs in the AI space that'll just come out and just drop these bombs. Yeah. White-collar blood. You know, he never actually said that. That's Axios putting words in people's mouths. Oh, that was Axios. I thought that that was it. Dario Amodei, Wario, he did say 50%, but I thought he said the blood bath. That's my bad. Well, I trust, I, I, this, the New York fact checkers figured that out for me, but Axios does a lot of this where they put like these really quotable quotes in the headlines about articles on interviews or speeches given by AI people. And it turns out the thing in the headline wasn't what they said. It was directionally what they said. But anyways, so they're out there making these big statements. The jobs are going away. The internet, as we know it, is about to all fall apart because Mythos is going to have this new capability. The superintelligence is coming. I don't even know what's going to happen. There's two possible things going on here, and both of them are morally bad. One is, which is the one I think is true, which is this is largely marketing. This works. It gets reported. It keeps us seeming inevitable and important, in which case that's a huge moral hazard because you are making many, many people, normal people, stress the hell out, actively scaring them, actively scaring them. The other option is you actually believe it's true. Well, this is an even larger moral trap that you've just fallen into because you are now perpetuating something that's going to cause exponentially more harm. You should be the very first person shutting down your company and trying to get the other ones to do it as well. 
So it's this weird moral trap they've set up where whatever is actually going on here, if they're coming out here saying these things, it is bad. This, this can't possibly, normatively speaking, be the right ethical behavior to be out there saying these scary things all the time. Because either you need to be building the barricade, or you're just scaring people for the marketing. Neither of these, I think, is something that's defensible. I have a third and worse option, which is I choose Axios. I think Axios, there are some good reporters there. I think the leadership over there is disgusting. I think that they are aligning themselves with the companies. I think that what, like, if you watch, there was a Jim, what's his name, interviewing Sam Altman. These, I think that there is a level of, and I would put this across people like Kevin Roose and Casey Newton, these are my words, not Cal's, that they're aligning them, that they're saying, we think this is going to happen, and we're here to tell you, great news. This is good news for me, the writer, because I will be safe somehow. I will be fine. You will not. You should be scared. But it's also a good thing because economy, marketing, market, good. And it's a very incoherent message because it's like, to your point, yeah, if this was a virus, like a pandemic, you wouldn't be writing, hey, millions of people are going to die. What? Pretty good, right? Hey, it would be good. We'll have less people. That'd be good, right? It would be seen as peculiar. Someone did write that. Someone did write that, by the way. Someone did write that? They did say, I remember early pandemic, someone did write, hey, you know what? This is good for the planet. Did it go out? Like, hey, we're driving less. This is great. And we're overpopulated. You're like, oh, I mean, I mean, that's a different conversation that maybe I, but in all seriousness, you didn't have mainstream media being like, well, COVID is going to kill everyone. 
The end, I guess, I guess, you know, maybe we'll just be inside forever. You didn't have this kind of straight. In fact, you had the direct opposite. It was, we need to get outside again. Who cares about this thing? Well, it's just, yeah, go on. Yeah. I think that's an interesting. Now I want to just pull on that thread a little bit because I think COVID gives an interesting, I think it gives two different interesting observations to go in both directions, right? So I think you're definitely right what you're saying is when the pandemic was coming or it was getting bad, really a lot of the coverage was about what should we be doing or who are the people doing the wrong thing. But it was very much coming from this angle of like, okay, we need to do whatever it is. Like we need to be better about this. It's got to be vaccines. It's got to be masks. It's got to be pickier mitigation whether you like it or not. It was very focused on what should we be doing or who is it that's getting in the way of a plan that maybe would get us out of this, which is where I think you're very right, is that you did not see a lot of COVID pieces that were just, well, I'm just going to kind of walk through like all the different ways. You might die and the morgues are going to fill up and that's COVID. That's just how life goes. But I also think the other thing we saw in a lot of COVID coverage is something that we are seeing in the AI coverage. That's where I saw a lot of the directionally true, not factually, but directionally true. There was definitely a period early on in COVID because I was following that coverage quite carefully where the papers were thinking, okay, this is the right behavior. And they were probably right about a lot of these things, but I just would notice this. There would be a lot of like, okay, we need people to buy into, for example, the lockdowns or whatever. 
And there'd be a lot of directionally true reporting where maybe they would like put on a photo of a mass grave that was sort of unrelated to COVID. Or you would see a lot of – there'd be pushback from like conservatives about schools. And then they put a lot of articles in the paper about teachers dying of COVID even though it was – they weren't in school. They got COVID elsewhere. And if you really pushed on it, it was because it's directionally true. Like the general or more general truth here is like we need to be worried about this or these mitigations work. It doesn't matter if this photo is actually right or if this teacher who died in Orlando, the fact that they hadn't yet been back in a school building yet. It's serving the directional truth. So it's like it highlights something. COVID highlights something we're seeing now that the reporters that are doing directional reporting, like we should be scared about it. I dare you not to be scared now. I dare you not to be scared now. Just trying to ratchet it up. But then you also get the contrast, which is this new style of just like head shaking resignation. And actually, I don't think the reporters think they're going to be safe. They're also like writing is going to go away. The media is going to go away. So it's an almost like nihilistic type of approach to this. Like, yeah, I'm screwed. We're all screwed. What are we going to do? And that is definitely different than we saw during that last crisis, which was obviously much more actually severe than what's happening now. So it's really confusing me, to be honest. 
Well, the directionally reporting during COVID, yeah, probably shouldn't have, but at the same time it was in, it was actually in pursuit of something good. Like, it was an attempt to make people take this seriously, because that's ultimately what it was, is take this seriously. Don't go outside, stay, like, don't, don't meet with people, don't be indoors with people, blah, blah, blah, blah. Great. In this case, it's like, yep, you should be scared of this. And what should you do? Fuck knows. Use ChatGPT, I guess. Yeah. And what's, what's really confusing to me as well is, you say all these people don't think they'll be safe. For the most part, I just don't, I actually take back what I said. I think a lot of them just don't acknowledge it. They don't acknowledge the core ridiculousness of being like, well, everyone's jobs are going to get replaced. Don't know. Like the Garfield meme with him looking at the, the Garfield with the cross out on the TV. Yeah. Flawlessly described there. Um, it's just, it's frustrating as well because it is terrifying people without, like, I'm not saying literally Axios or however, but stories like this are what made, that made a mentally unstable person throw a Molotov cocktail at Sam Altman's house. Like, it's obvious that these people were scared of the AI doom, partly because, to your point, what the fuck are we meant to do about it? Because using these tools is not, I don't really see how that works. Because if, going along that line of logic, if the answer is you need to use this stuff now, but the eventual end point is that it's intelligent enough to do everything for you, how does using it now matter at all? Like, what, what's the, surely ChatGPT would be seen as like a, a rock versus a shotgun at that point. Like, it's just technologically irrelevant if they get to AGI, which they probably won't. And it's just naturally illogical stuff. Yeah, and I'm with you. 
I've been making that same argument, this idea that you need to learn how to prompt some generation of a chatbot that exists right now is going to be the key to your long-term. I mean, even if, as you say, AI ends up playing a major sustained role in the economy, it's not going to be everyone typing on a web interface to a chatbot that's sycophantic and has a personality. Like, I think I've heard you say this recently, and I agree with it as well. I don't think we should be chatting with technology. We should not be chatting in a sort of anthropomorphized, humanized way. It doesn't mean you can't do natural language processing. I mean, Google is natural language processing. You're writing your Google searches in natural language, but no one's having a conversation with Google. You list the keywords as quickly as possible, and Google's pretty good at figuring out population, Spain, 1982. And you press enter, and you get that information. You're not like, hey, so I'm wondering what the population is of Spain in 1982. Can you help me find that, question mark? There's something odd about that anthropomorphized conversational interface. I guess we saw a lot of Star Trek growing up and that's what we think the future is supposed to be like, but it has all sorts of problems. Remembering Star Trek, when he would go, computer, do this, the computer didn't go, that's a great idea, Jean-Luc. What a great idea. Thank you. The computer just did the thing. 
I don't have any trouble with natural language queries, because I think the whole reason that, say, ChatGPT has grown comes from search. I think it is the core of it, because ChatGPT and Claude and all them are better at understanding what you asked for. Not saying the data output is necessarily great, but just they understand the, the, the inference they make from what you say is better than Google, or at least better than Google has been. I feel like it was better before, and I think that had Google not kind of boofed it on this one, we wouldn't be in this spot. But even then, using Google now, it forces you, it forces the AI summaries, and you could do minus AI and all that, but sometimes I don't remember to, and it just, it just turned search into this nightmare. But nevertheless, back to what you were saying, I agree. I think the anthropomorphization needs to go. I think that these things need to respond like terminal windows or what have you. They need to respond like computers and go, okay, here you go. Just don't need all that kludge. I don't need to be told, oh, what a great idea. I know, I had it. Or indeed, if I'm being told that, I need to be told if it's a bad idea. But I don't even necessarily need an answer. I just need stuff to look at so that I can come to my own conclusions. I think it's hard, actually. I think it's actually hard to get a language model to do that, right? Because if you think about, when you go back to the base layer of what's happening in the pre-training, is that you're building a language model that's trying to win at the token guessing game. So I'm trying to guess what word or part of word actually comes next to what I assume to be a real piece of text. And then if you do that autoregressively, so you call it again and again and again, adding the answer to the input so it grows out an answer, what you're going to get is text expansion. You've given me a text that I'm trying to expand as if there was a real text that exists and I'm trying to match it. 
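The token guessing loop described here can be sketched with a toy example. This is only an illustration under loud assumptions: the `BIGRAMS` table, `guess_next_token`, and `generate` are hypothetical names, and a hand-written lookup table stands in for a trained model, which would instead score every possible next token given the whole context.

```python
from typing import Dict, List

# Hypothetical stand-in for a trained model: maps the last token
# to the single most likely next token.
BIGRAMS: Dict[str, str] = {
    "the": "computer",
    "computer": "just",
    "just": "did",
    "did": "it",
    "it": "<end>",
}


def guess_next_token(context: List[str]) -> str:
    """Guess the next token (this toy only looks at the last token)."""
    return BIGRAMS.get(context[-1], "<end>")


def generate(prompt: List[str], max_tokens: int = 10) -> List[str]:
    """Call the guesser again and again, appending each guess to the input."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        next_token = guess_next_token(tokens)
        if next_token == "<end>":
            break
        tokens.append(next_token)  # the guess becomes part of the next input
    return tokens


print(" ".join(generate(["the"])))
```

The only state is the growing list of tokens; each call sees the text so far and extends it by one token, which is the sense in which the output is an expansion of the text it was given.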
You get that like kind of indirectly. So really its idiom is the type of text it's trained on, which for the most part is more sort of prose style text. So you can tune it away from it. Like you can tune its mood. You can tune its sycophancy. But it might be hard to actually tune an LLM because it deals with human written prose as its main training data. It might be harder than we think to tune that away from being verbose and to just give a table. Now, I guess you could take its output and then maybe run that through another thing that then strips away the other piece. It's like it's possible. But I think the anthropomorphized verbosity we see in language models is also, that's kind of the native tongue of this particular, which is why we still have a lot of chatbots being emphasized. And tools that are built upon LLM as the digital brain are still way more scarce than you would imagine outside of maybe computer programming and coding harnesses. We just don't have a lot of other examples where we just use the LLM as a general person's digital brain. Because I think this verbosity is okay. hey, humans can interpret that, but it's not great if the LLM is just a digital brain that's interfacing between you and another computer that doesn't need to hear that their idea is great or wants to try to parse the different types of text. So there's some interesting things going on there about the fundamental nature of these things. But even then with Google AI mode, it still seems kind, it still like actually seems like it can give fairly short answers. 
But if you mess, if you argue with it, as I have, it will just provide you with, even Google's will provide you with just hot dog shit. Yeah. Like, it will just claim something is true. I just did a private equity thing, on private credit even, and my favorite thing is being like, what fund is this part of? And it'll go, it's part of this fund. That fund was founded after this happened. And it goes, okay, well, maybe it's this one. Different fund, three years old, not involved. Do you have proof of that? Well, this is what you don't see in Star Trek is Captain Kirk or whoever, I'm going to mix up the episodes here, say like, hey, computer, we are approaching Deep Space Nine. Prepare docking procedures. The computer is like, photon torpedo fired, station destroyed. And you're like, well, no, I said we're supposed to dock. Oh, you're right, Kirk. I shouldn't have fired the photon torpedoes. Thank you for holding me accountable, Captain Kirk. That was. I did the opposite thing. That didn't happen in Star Trek. 
Listen to Thanks Dad on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts. so one thing that's really been driving me insane by which i mean going on twitter is looking at people like aaron levy of box and brian armstrong of coinbase talking about like agents spending money and the agentic web and how we need to prepare the web for agents doing stuff and the agents will do this fantastical doesn't exist agents don't do that just like not they don't have the ability to like, oh, they'll use computers. Computer use is basically non-functional in AI and it takes insane amounts of compute. It feels like a conversation keeps happening in theory, in the media, on social media about something that's possibly completely impossible, but the certainty they discuss it with is insane to me. This whole agent conversation, I've never seen anything like it in my life. I mean, it does feel a little bit like crypto to me. I think that is kind of a fair comparison where if you had a blockchain-driven software, like in theory that software would kind of work, but it just gave you a worse version of what you could already do for pennies using the actual Amazon server somewhere. And all you were really gaining was some sort of cyber-libertarian philosophical feel of goods about like, yes, but this was purely decentralized. I got worse versions of software to be decentralized. But now no one can control it. This is what like early agents, I mean, okay, so here's what, I've been writing about agents. I've been thinking a lot about it. I mean, the issue is, I don't think people understand what they are. I think people think that it's a new type of digital brain that is now able to go on and do more autonomous activity. I always see this get mixed up. It's just like people talking about Mythos breaking out of its sandbox to do XYZ. Mythos is a language model. You can give it an input and it can give you a token. 
And you're talking about a program that is calling Mythos and then taking actions based on what comes back. And this is really what we're talking about with agents: the digital brains are LLMs. And then you write a program that will say to the LLM, give me a plan for doing X. And then the LLM spits out text that seems like a reasonable plan. And then you execute that plan. The program executes that plan on behalf of the LLM. And I wrote about this. Yeah. And I wrote about this earlier this year. LLMs, as a digital brain, are bad planners. You're not going to get consistently usable plans, because what an LLM is actually trying to do is finish the story you gave it. So all it wants to do is produce a story that sounds reasonable. So it's giving you reasonable-sounding plans. Like, yeah, that's what a plan for doing this would more or less sound like. But what it's not doing is actually doing step-by-step evaluations. It doesn't have a clearly isolated goal where it's trying to measure how close you're getting to it. It doesn't have a world model to evaluate what's going to happen with the steps that are going to unfold next. And so in almost every context, it turns out, oh, a digital brain by itself, being an LLM, doesn't lead to good agents. In programming, it seems to work a little bit better. But I do think Gary Marcus, I don't know if it was a scoop, but Gary Marcus captured in a recent newsletter something really important. When Anthropic leaked the code for their Claude Code coding harness that sits on top of their LLMs to do coding, it turns out they've added a huge amount of old-fashioned, hand-coded, symbolic-AI-style rules and pattern recognizers and special if-thens. So they've just been sitting there tuning this program specifically for doing computer programming. And the LLM is being a little bit more isolated to just the code production. So they've kind of just gone back to old-fashioned.
And that's just like an old-fashioned system that is plussing up an LLM. But I'm with you. Yeah, it's very hard. Just asking an LLM, give me a plan for doing X: for almost any scenario of X, you really can't trust a plan from a model whose goal is primarily to finish text, to finish the story you gave it in a reasonable-sounding way. That's not how we plan. That's not how we think about planning. And it doesn't give you consistently usable plans. But you're right. It's magic. Like, the agents are coming. They've been saying this. I mean, the article I wrote in January was like, what happened to the year of the agent? 2025 was the year of the agent. All we had was coding agents. That's the only thing that worked that whole year. I mean, I have the receipts. Early 2025, all of these executives were saying, your work as a knowledge worker, not as a computer programmer, but just as a knowledge worker, is going to be largely done with agents. Agents are going to be a major part of your workforce in just a normal office setting. And none of that happened, because it turns out just asking an LLM, give me a plan for doing X, doesn't often actually produce a workable plan. And as a result, the only way to make agents work, which they do not, is to build a bunch of symbolic or if-this-then-that shit, just like scripts. Like, I mean, if you use Manus, for example, it's just writing a shit ton of Python. And it's writing it to do stuff where it's like, oh yeah, let me just do this, and it just writes a Python tool to fill out a spreadsheet. It's insane. It's really insane. But what's more insane to me is that the conversation around agents is as if they're already here. I'm about to read you something from Box CEO Aaron Levie, the CEO of a public company.
"One corollary to the fact that AI agents take real work to set up in a company at scale is that the role of the forward-deployed engineer, or whatever it gets called in the future, isn't going away anytime soon. When a vendor sells any kind of agents into an organization, you're no longer just selling a software tool that gets implemented and you're done, you're fundamentally selling some sort of actual workflow being done by your technology." What are you fucking talking about? You are a cloud storage and collaboration company. What do you sell? And the answer is nothing. They don't sell any agents. Oh, agents are going to do this. What you are describing is a different kind of technology. That's it. It's something else that doesn't exist. But this is everywhere you go. You look at any consultancy right now, any conference right now, there will be a speech about agents. Even Meredith Whittaker, who I deeply, deeply respect, went on stage last year and was like, yeah, AI agents are using money, they're booking plane tickets. No, they're not. That's not happening. And I say this again, I deeply respect Meredith. I said this online and people flipped their shit at me. It's like, oh, she's directionally correct. Yeah, she's directionally correct. It's like, let's be scared of the things that exist, because I think it's perhaps scarier, for a different reason, that we have large swaths of the tech industry talking about something that doesn't exist. Agents don't exist. People are talking about the agentic internet. I keep reading about it, even on The Verge, I read it all over the shop, where it's like, oh yeah, well, the internet needs to be rebuilt for agents to use. It's like, what do you mean? And they never say, because the answer is when we come up with something else. Because I don't even think neurosymbolic makes sense for this. I mean, neurosymbolic being the one where they have a
deterministic system that they access, from what I understand. The other thing as well, now that I think about it out loud, is how would they actually browse the internet? Where are they being housed? Are we using GPUs to make them browse the internet? That's very, very convoluted and probably quite expensive to do. And to what end? That's the real question, right? I mean, I've seen these proposals. Basically where a lot of these proposals go is: we thought that we could just make AI do anything. So we'll have it use the mouse and just use our computers for us. Oh, that's hard. We don't know how to do that. All right, so what we'll do is we'll rewire all applications that anyone uses on the internet so that it doesn't actually have to use the mouse. It can have a text interface so that an LLM, like the coding agents do, can give a description of how to do something in Excel in text without having to actually move a mouse or click things around. And then these evolve to say, okay, well, what's the one type of instruction that we're good at producing? Because when LLMs produce plans, they're directionally correct plans. They don't actually get the thing done. But they said, oh, what LLMs are good at is producing code that compiles, and we can actually check that it works. And so this is where this whole vision has changed: all applications and internet websites should have a code-accessible API that you can expose, and then an LLM can write a program that will then access that API. So we don't need to teach the LLM how to use Excel. It'll write a Python program that'll call hooks into Excel. The problem with this is no one wants to open up their application to just agents in general. If I'm Microsoft, I'm like, I want to write a custom tool for my program. Why would I expose my program for anyone else to use it? But your original question is a big one.
To what end? I've been writing about this recently, especially with work and AI. You've got to find the real bottlenecks, right? Yeah. It's the drunk looking for the keys under the streetlight. There's a lot of this going on, where this is what we can do with AI right now, so this now becomes the key to productivity. But the real bottlenecks in people's work are often not the things that we're trying to aim AI at. Like, I don't know that people are super frustrated at booking a plane ticket online. Yeah, it's really easy. How often do you book plane tickets? You kind of want to know. Like, let me see. Maybe this time will be better. What seat's available? It takes five minutes. So it was a huge jump to go from a travel agent to a web interface. But this is not a bottleneck in people's lives now. And they're easy. They're so simple. I can do it while sitting on the toilet. I don't want an agent to choose. And people are like, oh, your calendar will tell it. My calendar doesn't lay out my entire day. I don't have every single thing I do on there. It's just strange. Well, I had the same argument with social science researchers who are like, if you're geeky enough to learn coding agents, this is revolutionizing science research, because now, for example, you could have it write a program to process a data file and then format it into a plot. And that might have taken you four hours to do. And you work with it for a half hour and you get that result. This is revolutionizing research. And I'm saying, well, it's not. The bottleneck for social science researchers is not analyzing data and producing plots. You're not sitting there doing that eight hours a day, every day, so that if you could do it twice as fast, you'd produce twice as many papers. I might write one paper in a three-month period. Yeah, in there, there's like four hours I spent making a plot. And sure, it'd be nice if that four hours became 30 minutes.
But that's four hours out of like a multi-month process of sort of thinking about this paper. What is a plot, by the way? Like a graph. Oh, right. Yeah, the computer science term. But yeah, it's like that's nice. That got a little bit faster. But that's not the bottleneck. That's not what's going to unlock a lot more research. It's like, man, I would write more papers if it wasn't for how long it took me to draw a graph. And if you could have five graphs. Isn't the problem data? Getting the data. Like actually collecting data? That's what it is. I wrote about this talking to a well-known business school professor years ago for my book, Deep Work. And he talked about, he just realized, oh, being a business professor, publishing papers is about data access. I have to spend most of my year talking to people, building relationships, trying to set up an agreement with a company where I can get good data that I can get three papers out of. In all of that work, there's one day in there where you're crunching the numbers and making a plot. And that's nicer. If you could do a little bit faster, but it's not a productivity bottleneck. It's a marginal efficiency. I think there's a lot of that going on right now with AI and productivity as we look at what the AI can do and then try to make that thing into somehow being the key to getting things done. I just, my productivity problem is that the UI and UX and everything sucks. Everything's disjointed. Setting up Riverside is always fun. They move the menus around. Projects are in a different place. That takes up time. 
Moving files places also takes up a lot of time. This morning, when I put out my private credit piece, I had to do these threads. I had to click around a website and put in the alt text, but I had to tweak it slightly. It's like, I don't know how AI would possibly help me here. And they're not working on that. Well, they tried. This is what I was excited about earlier in the gen AI revolution. I was like, okay, here's the real value prop: a natural language interface into advanced features on software, where I can just say, all right, I want you to go take this column in the spreadsheet and get rid of all the rows that have values before this. And then I want to make a pie chart, because I don't want to learn how to do all that in Excel. I don't know how to do that. And they tried it. I mean, this was Microsoft Copilot. But it turns out we underestimated the degree to which, when we as humans are interacting with a chatbot, we're incredibly gracious. We're able to adjust and kind of get the gist of what it means and filter out the part of the chatbot response that's not really relevant, or ask the follow-up question. And when they tried to just use LLM responses to automate actions within programs, it's just not accurate enough. So they wanted that to be the case, that you could just be talking to a Riverside bot and you never would have to press a button ever again in Riverside. It's just not accurate enough. LLMs are fine for human conversation. They're just not accurate enough in this general case. Also, that thing you're describing, with how they want the agentic web to just be a series of APIs so that every agent writes Python or what have you to use them, that's a massive computational increase for no reason. Because you're basically saying, instead of someone clicking a mouse and hitting a keyboard, we will write code for everything.
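As a rough illustration of the wrapper-program pattern described in this conversation (a program asks a language model for text, and then the program itself, not the model, takes the actions), here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: `fake_llm` is a stub standing in for a real model call, and `TOOLS`, `run_agent`, and the `tool: argument` plan format are assumptions, not taken from any real agent product.

```python
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: all it can do is return text."""
    return "search: cheap flights\ndelete_files: /home\nsummarize: results"


# The harness, not the model, decides which actions are allowed to run.
# These tool names and behaviours are made up for this sketch.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda arg: f"searched for {arg!r}",
    "summarize": lambda arg: f"summarized {arg!r}",
}


def run_agent(goal: str, llm: Callable[[str], str] = fake_llm) -> List[str]:
    """Ask the model for a plan, then execute only the whitelisted steps."""
    plan = llm(f"Give me a plan for: {goal}")
    log: List[str] = []
    for line in plan.splitlines():
        # Each plan line is assumed to look like "tool_name: argument".
        name, _, arg = line.partition(":")
        tool = TOOLS.get(name.strip())
        if tool is None:
            # The model "asked" for something, but nothing happens unless
            # the surrounding program chooses to act on that text.
            log.append(f"refused unknown step {name.strip()!r}")
            continue
        log.append(tool(arg.strip()))
    return log


if __name__ == "__main__":
    for entry in run_agent("book a trip"):
        print(entry)
```

In this sketch, whether a step like `delete_files` ever runs is decided entirely by the hand-written harness; the model's output is only text until a program acts on it.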
Yeah, what a truly insane idea. I mean, Salesforce today, I don't know if you saw, they announced that they're doing Salesforce Headless 360. Marc Benioff needs to fire everyone in marketing. But they've made it so that you can do everything with Salesforce via an API. I mean, the first question I always ask is, what does Salesforce do? Because I've talked to so many people and they can't tell me. There's like 21 different features, no one knows what they do. It's just a very bizarre thing. It's very much a cart-before-the-horse thing. But also, what agent? This is the thing that really drives me insane. They're saying, we built this API for the agentic web, for agents to use it. Which one? What agent? What are you talking about? Well, it will be in the future. What do you mean? You change something materially with your publicly traded company worth 300 billion dollars because it might happen? Well, we're getting ahead of it. What the... And you talk to members of the media about this and they just go, yeah, you know, it'll happen. It's obviously gonna happen. They wouldn't put this much money behind it if it wasn't going to. It's like, I don't know, especially with Salesforce. You don't think Salesforce would spend a bunch of money for no reason? Well, buddy, you've not been following Salesforce at all then. Yeah, go on. Yeah, I was going to say, how much did Meta spend on the metaverse? Over $70 billion. Where did that money go? Where did it go? Customizing floating dinosaur avatars. Not building legs. That's the second $50 billion, right? If they had gotten the second half of the investment, they would have got to the legs. They're just not there yet. Another $100 billion and we'll have toes. So changing subject a little, Mythos has been one of my favorite media hysterias recently.
I genuinely wonder, if they ran War of the Worlds again today, I think Axios would have a headline within two minutes and it'd be like, there are aliens, they're attacking. I heard it on a podcast. I've looked through the system card, I don't know if you have, for Mythos. It's wacky. It's wacky. I can't believe we're letting people get away with having a psychologist talking to the chatbot. Like, in your system card, it's nuts. It's all gone through marketing. They had a psychiatrist or a psychologist, I can't remember, talk to it and be like, yeah, we found these emotional features. We need regulators to stop this stuff. And people's response to this is, well, banks are having meetings about it and the government's having meetings about it. Governments had meetings about NFTs. Gavin Newsom signed an executive order about Web3. These people will meet and talk about anything. Oh, it's scary, and they're not talking about it, which means it's powerful. Well, how is it powerful? What does it do? Because I think you probably saw this as well: it didn't list how many false positives there were. It also didn't mention that the FreeBSD bug that they talk about finding wasn't actually exploitable. I think it was something about the level it was at. I forget. I don't do programming other than very simple Python. A dog's Python. Yeah. I mean, the FreeBSD kernel is full of bugs. All these things are full of bugs. Because they're open source. I had to have this conversation with someone recently where they were like, Mythos, can you believe, of all the places, it found a bug in the kernel of Linux? Like, in Linux? Are you kidding me? All day long it's just bug fixes having to be pushed into that repository. Yeah, the Mythos story, I think, I mean, A, someone needs to get a Nobel Prize in marketing, because it was absolutely brilliant what they did there. I've spent a lot of time on it.
It's complicated because, again, you can't really trust the system cards. The system cards that Anthropic puts out are just gonzo, and Mythos is not publicly available. But there were, I think, a few very telling things. So there's two features they say Mythos has. One is finding vulnerabilities in source code, and two is writing programs to exploit them. It's first really important that people understand this has been something that people have been doing with LLMs since the beginning of publicly available LLMs, right? Not only is there nothing new about that, but I found, and I put this on my podcast, that in the Opus 4.6 system card, right, for a publicly available model that's already been out for many months, they said almost word for word what they said about Mythos, except with no coverage of it and no fear. They said, we have found 500 zero-day vulnerabilities, including some that had been existing for decades without having been discovered. That is what they said about what Opus 4.6 could do. For Mythos, they said the same thing. They just replaced the word 500 with thousands. But when Opus 4.6 came out, there was no, oh my God, they have found many hundreds of zero-day exploits, many of which have been around for decades, because they didn't push that marketing button. No one particularly cared about it. I went back on my podcast and showed multiple papers. This has been a huge concern. And it's a real concern, by the way, right? Yeah. Part of what slows down cracking, right, the breaking into systems, is the fact that it's annoying and hard. And LLMs have made it easier. GPT-4 was good at finding exploits, right? And this was a big deal. They were like, GPT-3.5 wasn't great at it. GPT-4 is.
And then as we got the more recent models, they've been much better at writing code to exploit them, because we had better agents for it and they're better able to produce multi-step software goals, and so they can better build software to exploit them. This is a real issue, but it's not new with Mythos, right? But Mythos was presented as if some Rubicon had been passed. But there were a couple of things I noticed right off the bat. One, they made the mistake of listing a bunch of the vulnerabilities they had found to try to brag. Look at this thing in FreeBSD. Look at this thing in FFP&G or whatever. They showed all these exploits they found. What they didn't count on is that a lot of security researchers said, well, wait a second. Why don't I get a much smaller, cheaper model, aim it at that same source code and say, can you find any vulnerabilities? They could find the same ones. So we don't have any way of knowing it's true that it's finding vulnerabilities better. And if anything, we actually are getting a lot of reports that they were paying big bounties to security researchers: I'm going to give you access to Mythos. I'm going to pay you for any bugs you can report that you found with it. So they had security researchers, and who knows how many false positives were coming out of that. And then on the exploitation side, we only really have one study. It comes from AISI, who I do not trust, but it's the only independent study. The fact that they gave them access should itself make us maybe a little bit suspicious. But it basically just showed normal progression. No massive leap. Model by model gets a little bit better on some of these tests and benchmarks, and Mythos has no out-of-scale leap. It's just, on some it's about the same, on some it's a little bit better. And yet it got covered as if we had just turned on WOPR from the movie WarGames. Like we had just created some new entity that was on its own undermining security.
And I think that was highly credulous coverage of what almost certainly is just a standard, slight, jagged move forward on these various capabilities that we've been seeing for the last three years.
Also, when you said that the difference between Opus 4.6 and Mythos was 500 versus thousands, it makes me ask the very simple question of, did they look as hard? To your point about the security researchers. They didn't.
Like, did they spend as much time? Probably not. So they probably could have found them. Also, by the way, I immediately was looking it up. AI Safety Institute is, of course, heavily linked to effective altruism. Can I say why I'm upset at AISI? I talked about them on my... I don't know when this is coming out, but I did a podcast in, whatever, March, where I looked at this report. And mainly, I looked at the Guardian's coverage of this report done by AISI. But it was just the most inane thing. The headline was, massive increase in AI scheming is detected. And they had a chart. Jesus fucking Christ. And they had a chart. God damn it. And bad line went up. And it went up in like January, and it goes up. And if you read this article about this study, they're like, something's going on. Scheming has been increasing rapidly recently. And they gave some examples of it or whatever. And so I look at this. Like, I want to look at what is going on here. So I look at this chart. What are they charting? Oh, they're charting tweets per day. They detect tweets about AI doing things that you didn't want it to do. And I said, huh. So when does this line start going up? The week that OpenClaw was released to the public. And everyone just started building their own bad agents and then tweeting about how bad they were. And you know what word was not mentioned in that article? OpenClaw. Even though that's what the examples they were giving came from, they just said scheming just started rising. I guess AI is becoming sentient. And all they were measuring was people paraphrasing the same viral story in their own fucking language. And then I looked at the biggest spike. I was like, well, this day in February on this chart had the biggest spike. It was like, oh, there was this one tweet about OpenClaw erasing someone's emails. And then it got retweeted. It went super viral. I was like, okay, great.
The real headline of this article: letting people write their own agents leads to terrible agents. That's it. But the whole thing, so that's AISI. I'm looking at the tweets as well. One of them is from a 47-follower account with AIR called underscore underscore just underscore underscore Lisa. And it's, this is really bad. Opus is editing files and making up reasons it's deleting adult content. So hallucinations. And also, Opus is not doing that. The stupid OpenClaw program you wrote, that's prompting Opus and then taking action on your computer based on what it says, is deleting your files. The program you wrote, that you gave access to your files and just said, whatever we get from this prompt, execute it, is erasing your files. Opus can't do anything. It can produce tokens. But here's the other point I want to make about Mythos that I don't think is being made. And it reminds me of the Sherlock Holmes story of the dog that didn't bark, right? Where the actual piece of evidence that mattered is not what you heard, but what you didn't hear. This is what I think the real story here is: you did not hear Dario Amodei, in the lead-up to the Mythos release, in the last year, let's say, or the last two years, saying that what we're working on, and why AI is important, is that we're going to be able to find vulnerabilities in software that have been long hidden. We're going to build the ultimate cybersecurity machine. This was not discussed. That's old-fashioned stuff. That's boring stuff. That's stuff that we were already worried about. Even with GPT-2, people were worried about that. What we've been hearing about steadily was, jobs are going to be automated. We're going to have whole creative industries wiped out. We might have sentience coming, and at the very least AGI and these massive disruptions. This is what they've been focusing on again and again. And then their biggest, best model, right?
Their newest, greatest, bestest model that they trained forever and used all the electricity on. What did they say about it? None of those things. They didn't talk about any of the things they said the key to AI was, the things they were afraid of, the things they're excited about. Instead, they went back and talked about a boring, parochial, old feature that has been an issue nerdy security researchers have been talking about for a half decade now. To me, if I was an investor, I would say, take off your Greek helmet cosplay, "Mythos is coming," and just hold on a second. Is this better at automating jobs? Is this better at producing code? Is this AGI? Like, why are we talking about finding bugs? We were worried about that with GPT-4. That's a problem, but that's not something new. Uh-oh, something must be going on. You just put a lot of money into a new model, and the best thing you could find to emphasize was that it's good at finding bugs. I think that is a problem. It's what they didn't say about this model. They would have much, much, much rather been able to brag that this model is now much better at any of those things that they've been saying are the key to the AI future. And you didn't hear them talk much at all about any of those. Yeah. And that's the thing. If it was so powerful, like, here's the thing. I don't know what would make me convinced that LLMs were the future, but a step toward it would be, we typed "create a Slack competitor," which they claim they did once and then didn't show it and refused to. And they said, oh, it worked autonomously for 30 hours, but then wouldn't talk about it. If they were like, we created the Slack clone, here it is. And it was bug-free.
Like, if it actually just worked and they were like, we have done this. Because theoretically, if this SaaSpocalypse story was true, which it's not, that AI is going to replace all software, if they actually did that, because someone from Anthropic just left the board of Figma and they created a Figma clone and the stock went down because the market's run by toddlers. If they were like, we've released a clone of Microsoft Word, like, we've done Anthropic Word and we now sell that as part of our subscription, that would actually be quite something. But the thing is, they're not. It gets back to the old talking point of, if they made AGI, why would they sell it? Wouldn't it be a massive competitive advantage to keep this? And I think you're right. I think maybe Mythos is not as powerful as they say and they've just had to dress it up. But it gets back to the thing of the directionally true media coverage. It's like, well, this is scary, right? I mean, that system card's like 180 pages long. I don't got all day. I have to write three 100-word blogs a week. I couldn't possibly spend time reading this. And it's just... We need so much more skepticism. We need so much more skepticism, right? I mean, this is why, again, like, the most skeptical, we're not skeptics, but, like, I call it the East Coast computer scientists. So those of us who are technically minded and not near Silicon Valley. So we're not in that world. It's very hard to be a professor in a world where there's just hundreds of millions of dollars being handed around and they try to, like, ignore it. But the East Coast computer scientists are all baffled by, you talk to any East Coast computer scientist, they're all baffled by, like, oftentimes there's claims that are just not true or wildly exaggerated. Why are we so credulous? I mean, it would be one thing if it was, like, a government agency we didn't realize was, like, trying to protect the fact that there was UFOs. And they're just straight up lying.
We've never encountered that before. Like, I didn't realize that... no, it's a business. And the credulity with which we're taking these claims. Like, Mythos is, I think the most important story there is, yeah, this is another example of what I wrote about last summer: AI has hit a bit of a wall, in the sense that the improvements that have come over the last two years have almost all been either in post-training or, more importantly, in the harnesses that you build. So it comes in the software you're building. What is a harness? I've seen this word used a lot. I think it's good for me and the listeners to hear the exact definition. Think of it as, like, a computer program that can do stuff. You can talk to it, it can do stuff, and it'll prompt or talk to an LLM as, like, its digital brain. So the harness might actually be able to touch your file system, write the files, compile code, move things around. But to figure out what actions to take, it will also then prompt an LLM and say, okay, what should I do next? And you can put it on different... Is that just a wrapper? Yeah, it's a wrapper. But that's where all the progress has come. All of the progress in coding agents over about the last year, especially starting this fall, has come from better wrappers, better harnesses. It's all in, let's build better, just hand coding, no machine learning, no intelligence, no Skynet here, but just hand coding these programs that will call LLMs. Let's just keep tuning and tweaking those to be better and better. And, of course, the programmers building those particular programs, they're building them to do their type of work. So it's a field they understand really well. So they can really just sit here and twist and tune. And also, programmers are very adaptable. They like tools, and they'll adapt around the weaknesses or not. So it's kind of like a best-case scenario.
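The harness described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any real coding agent is built: fake_llm, run_harness, and the JSON action format are hypothetical names invented for the example. What it tries to show is the division of labor from the conversation: the model only returns text, and the hand-written loop around it decides which actions actually happen on the file system.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: text in, text out, nothing else."""
    if "notes.txt" in prompt:
        return json.dumps({"action": "done"})
    return json.dumps({"action": "write_file", "path": "notes.txt", "text": "hello"})


def run_harness(llm: Callable[[str], str], workdir: Path, goal: str,
                max_steps: int = 5) -> list[str]:
    """Hand-coded loop: ask the model what to do next, then act on its reply."""
    log: list[str] = []
    for _ in range(max_steps):
        files = sorted(p.name for p in workdir.iterdir())
        prompt = f"Goal: {goal}\nFiles so far: {files}\nReply with one JSON action."
        reply = json.loads(llm(prompt))  # the model only ever hands back text
        action = reply.get("action")
        if action == "done":
            log.append("done")
            break
        if action == "write_file":
            # The harness, not the model, touches the file system.
            # Using only the file name keeps writes inside workdir.
            target = workdir / Path(reply["path"]).name
            target.write_text(reply["text"])
            log.append(f"wrote {target.name}")
        else:
            # Anything the programmer did not hand-code is simply not executed.
            log.append(f"refused {action}")
    return log


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_harness(fake_llm, Path(tmp), "create a notes file"))
```

In this sketch, a reply asking to delete files is refused only because the loop has no branch for it. A harness written to execute whatever comes back would do the deleting itself, which is the distinction drawn earlier about the program that erased someone's files.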
But this is another indication of, we're not getting these fundamental giant leaps in the capabilities of the digital brains. It's either some bench-maxing, like, we tuned it to do better on a particular benchmark, or we built better programs around it. So when you put the money that they put into Mythos, and if really the best thing you had to emphasize when it was done is, we have a cybersecurity benchmark where Opus 4.6 was at 66.7 and this is 83.1, that's not necessarily going to justify what's going on. Or that AISI has this, there's only one thing in there where they see a leap from Mythos, on a particular contrived security scenario they came up with. And this big leap that got them all worried was Opus 4.6 could, on average, complete 16 out of 32 steps in this challenge. And Mythos, on average, could do 22 steps out of 32. Wow. That's hundreds and hundreds of millions of dollars of training, electricity, or whatever. I think that's an issue. I just, I think that, and maybe this is a simplistic point, I don't think they know what they're doing at this point. Like, I don't get the sense that Anthropic or even OpenAI has a strategy, because today, as we're speaking, so this will be out next Wednesday, but they released Anthropic design, the thing I mentioned, the Figma clone. It's like, why are you fucking cloning Figma? What are you doing? I thought you were going to automate the economy. Yeah. You're going to replace a... so you've made a Figma clone? What? Like, we heard the rumors last year that they were going to do a product, and OpenAI was going to do a productivity suite. It's like, why? It's like they're doing everything they can to ignore the core problem, which is the core technology is not going anywhere. Because Mythos appears to be, they called it a step change, but that's a nice way of saying incremental improvement. It's 100% correct. Yeah. And let me tell you why I would be worried if I was them. Here's the worrisome thing about Mythos, right?
Is, again, they talk about these vulnerabilities hidden for decades that Mythos found, or what have you. And they replicated it: multiple different independent security teams were able to find most of those vulnerabilities using three- to five-billion-parameter open-weight models. So to put that in perspective, right, a model like Mythos is going to have hundreds of billions, if not a trillion, parameters. And they used a three- to five-billion-parameter model off the shelf. You could run this model on a chip inside your... Ten trillion. Ten trillion? Oh, 10 trillion parameters, that's crazy. Love the number, bro. Is that true? Yeah, that's what it's... Oh my god, 10 trillion parameters is insane. Like, that better be either gaming the stock market and creating billions of dollars a day in, like, fancy option returns, or changing lead into gold, because to run something that has 10 trillion parameters to do almost anything else is, it's like we're going to launch ourselves into space to do something and then land every time. That's so incredibly expensive. But the real fear then is like, well, wait a second. If they could do most of this stuff with a free, cheap model that I could just run on a machine at home, that's what keeps, I think, Dario Amodei up at night. That's what keeps Sam Altman up at night. It's the future. Look, I've been pitching this, right? I think the useful and the only ethical and sustainable future for AI is what I call distributed AGI. And I think it's just what the future is going to be, which is you have specialized applications for different things. Where, oh, we want to do this thing over here. We built something that has some AI in it. And maybe it has an LLM, or it's a modular architecture and it has a billion-parameter model in there and a world model. And it's really good at doing this thing. And it's small and it mainly runs on chip. And now this program can do this thing that I used to have to do.
And you multiply that across 10,000 different use cases and you're like, oh, we kind of have AGI, right? There's all these different things that have AI tools that, like, do pretty well. That's, like, probably the most probable future. It's a future I really like for a lot of reasons. There'll be a lot of things that we can't make progress on, a lot of things we will, but it's a much more heterogeneous future. There's no giant HAL 9000 brain. It's economically more interesting and diverse. It doesn't have all the sustainability issues. That has to be the future. But the problem with that future, if you're Sam Altman or Dario Amodei, is that their entire moat is that you need 10 trillion parameters. They want that to be the key to the AI future because that moat is something that no one can cross. And if that's not the moat, if it's just, oh, if I want to build a poker-playing AI that's really good, I just need people who are good at poker and to spend a couple of years and figure out a cool custom system, and that thing now does well, if that's the future, you don't need OpenAI and you don't need Anthropic. And I think that probably might be the future. And I think that's terrifying. And they're trying to race to an IPO and they're marketing out of their butts. Like, what can we do to kind of keep things going, so at least we can get our stock on the market? That's what would keep me up at night if I was them: the actual future. There might be a lot of AI in the future, and it's not going to be nearly as sexy as they're hoping. What if there's also... By the way, that 10 trillion number, I can't source it to Anthropic. I've seen it reported multiple places. This is a problem. They never talk about it. We have an issue with news right now where just, like, mythology spreads, ironic considering the name. But the other thing is as well, it's like, hundreds of billions, a trillion parameters. You're just using a nuke to kill a single gopher. Yeah.
Like, you're just like, we're going to throw everything we have at it. To the point that, I don't know if you've been seeing the amount of trouble Anthropic's had keeping its service online and how they're making the models dumber. Yeah. It just feels like we're in this weird hysterical moment where no one knows why they're doing this, but everyone's ready to accept whatever anyone says. Like, it's just like, oh, we're all doing this insane thing, so we're just going to repeat what kind of informs the bias and makes us look less dumb because the more excited we are. I think the frontier models are like F1 cars, and the equivalent of points on the F1 circuit are your positioning on the benchmark leaderboards. So you do this, you build these giant models and you spend all this money on electricity, and they're so big, they're not even economically viable to have people use, which might really be what's going on with Mythos. It's like, we have to make this seem super premium because otherwise people are going to get charged $5,000 a month. And just like if you're Red Bull or Ferrari, your F1 car doing well on this leaderboard just lets people know, this company builds good cars, and then you can sell your normal cars. I think that's a lot of what's going on here: they want to be high on that leaderboard because it means, we know how to do AI. We AI smart. Even though the future of actual consumer-deployed products is going to be much more like a Honda Odyssey minivan than it's going to be like a top Formula One car. Well, Cal, it's been an absolute pleasure having you, as ever. Where can people find you? You can find me at calnewport.com.
My podcast is Deep Questions. On Thursdays, the Thursday episodes are all AI reality checks, where I take a fun story. Actually, Ed's coming up, or he may have already been on it by the time this comes out, or maybe it's the day after this comes out. So now you have to check it out. Now the AI reality check episode's gonna double dose. You bring this out of me, Ed, by the way. You bring out my sort of ornery side. I'm normally, like, the very, very kind of staid professor, New Yorker writer, just like, well, on the one hand, on the other. You bring this out of me. I love it. But the thing is, you're critical only of things that need to be. You're still willing to humor these things as long as there's something to humor. And that's why I like having you on, because people claim I'm just a hater. So we've got to have people for a little balance. But thank you for joining me. Thank you, everyone, for listening. You have a monologue coming up as well on Friday. Thank you all. Thank you for listening to Better Offline. The editor and composer of the Better Offline theme song is Matt Osowski. You can check out more of his music and audio projects at mattosowski.com. M-A-T-T-O-S-O-W-S-K-I.com. You can email me at ez@betteroffline.com or visit betteroffline.com to find more podcast links and, of course, my newsletter. I also really recommend you go to chat.wheresyoured.at to visit the Discord and go to r/betteroffline to check out our Reddit. Thank you so much for listening. Better Offline is a production of Cool Zone Media. For more from Cool Zone Media, visit our website, coolzonemedia.com, or check us out on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts.