GPT 5.2 didn't just process data this week. It conjectured a brand-new formula for single-minus gluon tree amplitudes, which, if you aren't a theoretical physicist, is a calculation experts describe as incredibly long and painful. Yeah, we are talking about what is typically a quarter page of really dense, messy algebra. And the AI identified a hidden pattern that simplified all of that into a single, elegant product structure. And that is massively significant, because this goes way beyond a calculator doing arithmetic faster than a human. Right. We are looking at a system analyzing a mess of complexity and spotting a symmetry that human experts had missed for decades. It is the difference between simply computing a result and actually understanding the architecture of the physics problem itself. It suggests we are building genuine collaborators now. But while the science side is having this massive eureka moment, the way these AI companies operate is shifting just as drastically. We are moving away from the era of chatbots, where you type a question and get an answer, and into the era of agents and coworkers that actually execute tasks. That distinction really matters. A chatbot talks. An agent acts. And right now, the industry has split into two very distinct battles to make that happen: one for raw speed and hardware infrastructure, and another for deep autonomous execution. Exactly. Which brings us to the core question we are exploring today. In a week where AI is proving theorems and learning to drive our computers, are the safety guardrails crumbling under the pressure to win massive government and enterprise contracts? It is the defining tension of everything happening right now. And the autonomous side of that battle just got a huge injection of talent. As of this morning, February 25th, 2026, Anthropic has officially acquired Vercept. This is a major signal flare. Vercept was a Seattle-based startup founded by some real heavy hitters from the Allen Institute for AI, and they had built a desktop agent called Vy. I've actually seen demos of Vy. It is creepy and impressive in equal measure. It doesn't just read the underlying code of a page. It physically sees the screen elements. Right. The visual component is the main hurdle everyone has been trying to clear. For a long time, if you wanted an AI system to use a computer, you had to hook it up to an API, essentially a special side door that lets one piece of software talk directly to another. But the real world is incredibly messy. Most software out there doesn't have clean APIs. Exactly. So Vy interacts directly with the graphical user interface, the actual pixels on the monitor. It looks at the screen just like you do. It identifies that a specific gray rectangle is a submit button and a specific white box is a text field. Which sounds so simple to us. It does, but for a machine it is incredibly difficult. It's known as the grounding problem: you have to perfectly map pixel coordinates to semantic actions.
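To make the grounding problem concrete, here is a minimal sketch of that pixel-to-action loop, written against the real pyautogui automation library. The reference image name is invented for illustration, and an agent like Vy reportedly uses learned vision models rather than simple template matching, so treat this as a toy, not as how Vercept actually does it.

```python
import pyautogui  # real cross-platform library for scripted mouse/keyboard control

def click_submit():
    # Grounding step: scan a screenshot of the current screen for a region
    # matching a reference image. "submit_button.png" is an invented asset
    # for this sketch; a production agent would locate UI elements with a
    # learned vision model, not template matching like this.
    try:
        point = pyautogui.locateCenterOnScreen("submit_button.png")
    except pyautogui.ImageNotFoundException:
        point = None  # newer pyautogui raises; older versions return None
    if point is None:
        raise RuntimeError("could not ground 'submit button' to any pixels")
    # Action step: translate the semantic intent ("press submit") into
    # raw pixel coordinates and execute it.
    pyautogui.click(point.x, point.y)
```

Nothing in those pixels says "submit button" until some vision model decides it does, and that mapping is the whole ballgame.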
And this acquisition perfectly explains the sudden jump in Anthropic's performance numbers. We saw the new benchmarks for Claude Sonnet 4.6 this week, specifically on OSWorld. Yes. For context, OSWorld is the standard benchmark for measuring how well an AI can navigate an operating system. The test asks things like: can it open a spreadsheet, copy a specific cell, open a web browser, paste that data into a form, and hit enter? Real computer use. Yes. And in late 2024, the best models in the world were scoring under 15% on that benchmark. Which is functionally useless for a user. You would spend more time correcting its mistakes than it theoretically saves you. A 15% success rate is basically a toy. But Sonnet 4.6, which is clearly integrating this new Vercept tech, is now hitting 72.5%. That crosses a massive threshold. At over 72%, you can actually walk away from your desk and let the agent run. You can trust it with a multi-step workflow. That is entirely why Anthropic bought them. They aren't interested in keeping the Vy brand alive. They are integrating the team of about 20 engineers to make Claude fully capable of direct computer use. They want an AI that writes the email for you, opens your client, attaches the PDF, and clicks send. So Anthropic is heavily betting on the smart autonomous agent that navigates a messy desktop environment. OpenAI, however, seems to be betting on something else entirely. This week they released GPT 5.3 Codex Spark. Spark is the operative word there. This new model is obsessed with latency. And they are achieving that speed through specialized hardware, right? This goes beyond standard software optimization. It is absolutely a hardware play. Spark is running on the Cerebras Wafer-Scale Engine 3. I really need you to visualize this for a second, because I saw a photograph of this chip recently. Standard computer chips are small, roughly the size of a postage stamp. This Cerebras chip is the size of a dinner plate. Yeah, it's the size of a giant pancake. It is an entire uncut silicon wafer. In standard chip manufacturing, you take a silicon wafer and cut it into hundreds of tiny individual chips. Cerebras just keeps the whole thing intact as one massive processor. Why does keeping it intact matter so much for speed? Because in a traditional setup, you have your memory sitting on one physical stick and the processor sitting on another. Data has to physically travel back and forth through wires to compute anything. Even at the speed of light, that travel takes time and consumes a lot of energy. Right. On the Wafer-Scale Engine, the memory and the compute cores sit right next to each other on the same piece of silicon. The data transfer delay is effectively eliminated. It is nearly instant. And the practical result of that architecture is over 1,000 tokens per second. Well over 1,000. Wait, hold on. Let's put that in perspective. A human being reads roughly five words a second. Generating 1,000 tokens a second means a massive wall of text appears on your screen instantly. You can't even begin to read it as it generates. You absolutely cannot. But for writing code, which is exactly what Spark is designed for, it fundamentally changes the texture of the work. It is essentially 15 times faster at coding than their standard model. They are marketing this as conversational coding. Think about how you use a standard chatbot right now. You type a prompt. You wait maybe 10 seconds. The code appears block by block. You read it. It feels very much like sending a letter and waiting for a reply in the mail. A turn-based interaction. Right. With Spark, the code generates so fast you can interrupt it mid-thought. You can see it heading down the wrong logical path in line 3 of a function and just stop it instantly. You correct it on the fly. So it feels more like jamming with a musician in a studio. That is the perfect analogy. It creates a tight, real-time feedback loop.
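To pin down what "interrupt it mid-thought" means mechanically, here is a rough sketch of a streaming loop with a cancellation check. The client object and its stream() and close() methods are stand-ins for illustration, not any specific vendor's SDK, though real streaming APIs expose similar primitives; the arithmetic in the comment is just the hosts' numbers worked through.

```python
def stream_with_interrupt(client, prompt, looks_wrong):
    """Consume tokens as they arrive; bail the moment something looks off.

    client.stream() and stream.close() are hypothetical stand-ins for a
    vendor SDK's streaming and cancellation hooks.
    """
    # Scale check: ~1,000 tokens/s is roughly 750 words/s (at ~0.75 words
    # per token), versus the ~5 words/s a person reads. That is on the
    # order of 150x faster than you can follow, so any quality check has
    # to run on partial output, not on the finished answer.
    buffer = []
    stream = client.stream(prompt)  # assumed: yields tokens incrementally
    for token in stream:
        buffer.append(token)
        if looks_wrong("".join(buffer)):
            stream.close()  # assumed cancellation hook: stop generation early
            break
    return "".join(buffer)

def looks_wrong(partial_code: str) -> bool:
    # Toy predicate for the sketch: kill generation if the model starts
    # down a path you have already decided against.
    return "eval(" in partial_code
```

The point of the sketch is the feedback loop: at this speed the human stops being a reader of finished output and becomes a supervisor of output in flight.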
But there is a serious trade-off here. You do not get that kind of speed for free. The model is less rigorous. Exactly. Spark is completely latency-first. If you look at the SWE-Bench Pro scores, the premier software engineering benchmark, Spark scores significantly lower than the full GPT 5.3 Codex model. And perhaps more worryingly, OpenAI explicitly states it is not rated for high-capability cybersecurity work. So we have a purposeful division now. If you want the AI to discover the gluon tree amplitude formula, you wait patiently for the slow, deep-thinking model. If you want to hack together a website in 10 minutes, you use Spark for fast execution. It is a deliberate split between deep reasoning and velocity. Speaking of deep reasoning and that physics discovery we mentioned at the start, we sort of glossed over where it actually took place. That discovery did not happen in a standard chat window in a browser. It happened inside Prism. OpenAI Prism. Yeah. This is their brand-new workspace designed specifically for scientists, and it is fascinating because it attacks the actual workflow of doing science. It is a fully LaTeX-native writing environment. LaTeX is the typesetting system that pretty much every physicist and mathematician uses to write their formal papers, and Prism integrates the AI directly into that source mode. Plenty of standard text editors have AI plugins these days. But Prism is completely different because of the context window and the deep integration. Prism reads the entire project structure simultaneously. It sees your equations, your citations, your raw empirical data files, and all of your figures. So if I go in and tweak a variable in my core equation on page 2, Prism automatically knows to update the resulting graph in figure 3 on page 10? Yes. It validates whether your empirical results actually match your theoretical model. It can check all your citations against the actual text of the reference papers to make sure you are quoting them correctly. It is doing the tedious grunt work of consistency that usually drives researchers crazy.
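For a feel of what LaTeX-native means, here is a toy fragment of the kind of source such a tool works in. The equation is the classic 1986 Parke-Taylor formula, the most famous earlier case of pages of gluon algebra collapsing into one product, standing in here because the new conjecture itself hasn't been published in this episode; the label and cross-references are the sort of thing a workspace like Prism would track for consistency. The figure label is invented for the example.

```latex
% Toy fragment: an equation, a label, and cross-references that a
% LaTeX-native assistant can check against each other and against the
% data files behind the figure. (Color-ordered, overall factors
% stripped, as is conventional.)
\begin{equation}
  A_n\!\left(1^-, 2^-, 3^+, \dots, n^+\right)
    = \frac{\langle 1\,2\rangle^{4}}
           {\langle 1\,2\rangle \langle 2\,3\rangle \cdots \langle n\,1\rangle}
  \label{eq:parke-taylor}
\end{equation}
As Eq.~\eqref{eq:parke-taylor} shows, the full amplitude collapses to a
single product over adjacent spinor brackets; Fig.~\ref{fig:numcheck}
compares this expression against the raw numerical data.
```

Change the equation and the claim in the prose, the figure, and the numerics all silently drift out of sync. That drift is exactly the grunt work being described.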
And I asked about the business model earlier because I noticed they are offering this entirely for free for personal accounts. That is the classic Silicon Valley play. They want to become the default infrastructure for scientific discovery. Right now, a scientist might use Overleaf for collaborative writing, Zotero to manage citations, and Python for data analysis. Prism is designed to replace all of those fragmented tools. They want the next Nobel Prize-winning discovery to happen natively inside an OpenAI interface. Discovery is an incredibly valuable commodity. If you own the tool where the science happens, you get to see where human knowledge is going before anyone else does. Which brings us directly to the money. Because whether it is Vy navigating a messy desktop or Prism writing a theoretical physics paper, this technology has to be monetized. And simply selling an API key to developers isn't cutting it for these massive valuations anymore. We are seeing the rise of the true AI coworker and the massive consulting armies required to install them. OpenAI refers to these as frontier alliances. They have realized a harsh truth. You cannot just hand a Fortune 500 company access to a superintelligent model and expect them to magically become more productive. The companies literally do not know how to use it. They have no idea how to wire it into their legacy systems. So OpenAI has officially partnered with McKinsey, BCG, Accenture, and Capgemini. These are the massive consulting firms you traditionally hire when you want to fire half your staff and completely restructure the remaining half. Or, phrased more charitably, they're the people you hire when you need to redesign your organization's central nervous system. These firms are wiring the Frontier platform directly into massive corporate data warehouses and customer relationship management systems. They are actively redesigning organizational workflows to accommodate autonomous agents. It is essentially organizational surgery. And it is very expensive surgery. Anthropic is playing this enterprise game just as hard right now. They are currently hitting a $14 billion revenue run rate, and to fuel that massive infrastructure and enterprise expansion, they just closed their Series G funding round. The number on that round was staggering: $30 billion. $30 billion in cash, valuing Anthropic at $380 billion. That is an immense, almost incomprehensible amount of capital. But here is where the tension we talked about earlier really surfaces. We have $30 billion funding rounds, heavy military interest, and massive consulting armies deploying agents. What happens to the original safety mission? Anthropic was founded specifically by people leaving OpenAI to be the definitive safety company. That core mission is colliding hard with reality right now. Just yesterday, on February 24th, Anthropic released version 3.0 of their Responsible Scaling Policy, the RSP. I read through that document last night. There is a very specific, very controversial change in the language. In previous versions of the RSP, Anthropic had a hard written commitment. They stated they would unilaterally pause development if they couldn't meet certain strict safety measures. If a model was deemed too dangerous or autonomous to contain, they would stop training. Period. And in version 3.0, that is gone. The unilateral commitment is entirely gone. They've replaced it with a clear bifurcation. They now differentiate between industry recommendations and company plans. So they are publicly recommending that the entire AI industry should pause if things get dangerous, but they are no longer promising that they will pause if their competitors keep going. They are framing it as a classic collective action problem. Their argument, which is highly rational from a pure business perspective, is that if Anthropic pauses to build perfect safeguards but OpenAI or a massive state-backed lab in China keeps scaling, Anthropic just loses market share. They lose their influence over the industry. Exactly. They effectively cede the future to actors who might be far less concerned with safety than they are. It is the ultimate prisoner's dilemma. If I play nice and you play rough, I lose everything, so I am forced to keep playing rough.
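Since the collective action argument is doing so much work here, it is worth seeing the structure laid bare. Here is a tiny sketch with invented payoff numbers; only their ordering matters, and it is that ordering, not the specific values, that makes a unilateral pause unstable.

```python
# Toy payoff table for the pause-or-scale dilemma described above.
# Numbers are invented for illustration; the ordering is the point:
# scaling while the rival pauses beats mutual caution, and pausing
# alone is the worst outcome of all.
PAYOFFS = {
    # (our_move, rival_move): (our_payoff, rival_payoff)
    ("pause", "pause"): (3, 3),   # coordinated safety: good for both
    ("pause", "scale"): (0, 5),   # we pause alone: cede the market
    ("scale", "pause"): (5, 0),   # rival pauses alone: we take the lead
    ("scale", "scale"): (1, 1),   # race dynamics: worse than cooperation
}

def best_response(rival_move):
    # Whatever the rival does, "scale" pays more for us. That is the
    # defining property of a prisoner's dilemma.
    return max(("pause", "scale"),
               key=lambda us: PAYOFFS[(us, rival_move)][0])

assert best_response("pause") == "scale"
assert best_response("scale") == "scale"
```

Mutual scaling at (1, 1) is worse for everyone than mutual pausing at (3, 3), which is exactly why the new RSP can recommend an industry-wide pause while declining to promise a unilateral one.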
But there is another massive pressure point here that we absolutely have to discuss: the Pentagon. The U.S. Department of Defense has been getting incredibly loud about AI integration over the last year. They have. Yeah. The Pentagon explicitly communicated that a strict refusal by an AI company to work on national security tasks would be viewed as a supply chain risk. Supply chain risk. That is highly specific bureaucrat-speak for "we are not going to buy anything from you." Exactly. It is all about reliability. If the United States government is going to heavily integrate your AI agents into their defense systems, they need absolute certainty that you aren't going to suddenly turn the servers off because of an internal moral qualm about how the tech is being used. So this policy shift at Anthropic isn't just about preserving enterprise market share. It is about making themselves a viable long-term government contractor. This new policy is a calculated pivot to survive in a world where the U.S. government is suddenly the biggest, most important customer in the room. The entire concept of safety is being redefined in real time. It is no longer about slowing down to ensure alignment. It is about staying in the absolute lead so you have the power to set the rules. And while they are fiercely fighting for those lucrative government contracts, they are also simultaneously writing massive checks to make their foundational legal problems disappear. We really have to talk about the data that powers all of this, the cost of content, because we finally have a concrete price tag on it. The Bartz v. Anthropic settlement. $1.5 billion. That is a historic number for a copyright lawsuit, and it sets the precedent for the entire industry. For a very long time, the standard defense from these AI labs was fair use. The argument was always that an AI learns the way a human learns: it reads publicly available information and internalizes the concepts. But this settlement strongly suggests that the actual cost of doing business moving forward involves paying out billion-dollar class action settlements to authors and creators. And the details of this specific case are crucial. This wasn't just a broad philosophical debate about whether an AI reading a purchased book is fair use. It was specifically focused on the Books3 dataset. Right, the shadow libraries. This was a massive dataset largely composed of pirated books. So the legal battle shifted away from a high-level debate about copyright theory to a very specific, undeniable accusation: you downloaded this material from a known pirate site, and your engineers knew exactly what they were doing. A $1.5 billion penalty absolutely stings. But for a company that just raised $30 billion in cash a few days ago, it is highly affordable. It is literally just a line item on a spreadsheet for them. And that is the dark irony of this entire settlement. A $1.5 billion penalty instantly destroys any new startup. It bankrupts a university research lab trying to build an open-source model. But for Anthropic or OpenAI or Google, it is just the toll they pay to get on the highway. It effectively entrenches the giants. They're the only entities on Earth who can actually afford to retroactively pay for the data they already scraped from the internet. It completely clears the competitive field. So, tying all of this together: we have AI models right now that are capable of genuine, world-changing brilliance. They are discovering hidden physics formulas. They are coding at the speed of thought. And we have autonomous agents that can finally navigate the messy, pixel-based reality of our everyday computers.
But to actually get those agents out of the lab and into the real world, these companies are making a very specific set of compromises. They are wiring themselves deep into corporate structures through massive consulting firms. They are softening their founding safety pledges to align with military interests. And they are paying billion-dollar fines to retroactively legalize the aggressive data gathering that made the models smart in the first place. We spent years debating whether artificial intelligence would be safe, whether it would be perfectly aligned with human values. But looking at the reality of 2026, safety isn't a philosophical stance anymore. It is a clause in a massive enterprise or government contract, and it is being defined by whoever is signing the biggest check, whether that happens to be the Pentagon or the venture capitalists. If you're not subscribed yet, take a second and hit follow on whatever app you're using. It helps us keep making this show. We appreciate you being here.