The Pragmatic Engineer

Building Claude Code with Boris Cherny

97 min
Mar 4, 2026about 2 months ago
Listen to Episode
Summary

Boris Cherny, creator of Claude Code at Anthropic, discusses how AI coding tools evolved from a side project to one of the fastest-growing developer tools. He shares insights on building AI-powered software, the transformation of engineering workflows, and compares the current AI revolution to the printing press disruption of medieval scribes.

Insights
  • AI coding tools are fundamentally changing software development workflows, with some engineers now shipping 20-30 pull requests daily without writing code by hand
  • The most effective AI coding approach involves giving models tools and letting them determine how to use them, rather than constraining them within predefined interfaces
  • Safety in AI coding tools requires multiple layers including alignment, runtime classifiers, and human oversight rather than relying on a single solution
  • The current AI transformation mirrors the printing press revolution, where specialized scribes evolved into a broader class of writers as literacy became democratized
  • Future engineering success will favor generalists who can work across disciplines rather than specialists focused on single domains
Trends
Shift from handwritten code to AI-generated code with human oversightRise of agentic AI systems that use tools autonomously rather than following scripted workflowsIncreased prototyping velocity enabling hundreds of iterations before product launchDemocratization of coding skills beyond traditional software engineersMulti-agent systems and swarm intelligence for complex software development tasksEnterprise AI tools requiring sophisticated permission and security systemsDecline in importance of language and framework debates as AI can translate between themGrowth of hybrid roles combining engineering, product, and design skillsAutomated code review systems using AI as first-pass reviewersReal-time adaptation of development practices as AI models rapidly improve
Companies
Anthropic
AI safety company where Boris works and developed Claude Code
Meta
Boris's previous employer where he led code quality across Instagram, Facebook, WhatsApp
Instagram
Platform where Boris worked on development infrastructure and migrations
Facebook
Social platform mentioned as part of Boris's work at Meta on code quality
WhatsApp
Messaging platform included in Boris's code quality responsibilities at Meta
Y Combinator
Startup accelerator where Boris worked at an early medical software company
OpenAI
AI company mentioned as using WorkOS infrastructure services
Cursor
AI coding tool mentioned as using WorkOS and Sonar's verification services
GitHub
Platform mentioned for code hosting and AI coding tool Copilot
Microsoft
Company cited for publishing research on code quality impact on productivity
People
Boris Cherny
Creator and engineering lead of Claude Code at Anthropic, former Meta engineer
Adam Wal
Boris's ramp-up buddy at Anthropic who rejected his first handwritten pull request
Dario Amodei
Anthropic CEO who questioned how Claude Code achieved rapid internal adoption
Mike Krieger
Mentioned as being present during Claude Code launch review meetings
Fiona Fung
Manager for the Claude Code team, former colleague of Boris at Meta
Andrew Karpathy
AI researcher who posted about feeling behind as a programmer due to AI progress
Marc Andreessen
Venture capitalist quoted about role expansion across tech disciplines
Jared Sumner
Engineer at Anthropic praised for exceptional technical systems understanding
Ben Mann
Anthropic founder who helped brainstorm permission prompts for Claude Code safety
Quotes
"What happens when you join one of the top AI labs in the world and your first pull request gets rejected not because the code was bad, but because you wrote it by hand?"
Host
"The way to think about it is the model is its own thing. You give it tools, you give it programs that it can run, you let it run programs, you let it write programs, but you don't make it a component of this larger system."
Boris Cherny
"I'll be honest, it writes better code than I do."
Boris Cherny
"I think we're living through a time as transformative as the printing press."
Boris Cherny
"The model just wants to use tools. That's just what I realized. This thing, if you give it a tool, it will figure out how to use it to get the thing done."
Boris Cherny
Full Transcript
3 Speakers
Speaker A

What happens when you join one of the top AI labs in the world and your first pull request gets rejected not because the code was bad, but because you wrote it by hand? This is exactly what happened to Boris Czerny when he joined Antrophic. Boris is the creator and engineering lead behind Claude Code. Before joining Antropic, he spent seven years at Meta, where he led code quality across Instagram, Facebook, WhatsApp and Messenger, and was one of the most prolific code authors and code reviewers at the company. In today's episode, we cover how cloud code went from a side project to one of the fastest growing developer tools, and the internal debate at Anthropic whether to release it at all. Boris Daily workflow of shipping 2030 pull requests a day with zero handwritten code and how code review works when AI writes everything. Why Boris believes we're living through a time as transformative as a printing press and which engineering skills matter more now and which ones do not? If you want to understand how one of the people closest to AI coding agents actually builds software today and what that means for the rest of us engineers, this episode is for you. This episode is presented by statsig, the unified platform for flags, analytics, experiments and more. Check out the show Notes to learn more about them and our other season sponsors, Sonar and Work os.

0:00

Speaker B

How did you get into tech, software engineering and coding in general?

1:08

Speaker C

It starts a while back. I think there was kind of like two parallel paths that crossed. So when I was maybe 13 or something like this, I started selling my old Pokemon cards on ebay and I realized that on ebay you can actually write HTML. And I was looking at other people's Pokemon card listings and I realized some of them have big colors and fonts and stuff like this. And then I discovered the blink tag. My name is Blink Tag and if I put the blink tag on it I could sell my card for 99 cents instead of 49 cents or whatever. So I kind of learned about HTML this way. Then I got an HTML book and kind of learned about HTML. And then the second thing was this was also, I think sometime in middle school we had these old TI 83 graphing calculators and we used them for math. And what I realized is I can get a better answer on the math test if I just program the answers to the math test into my calculator. And so I wrote these little programs to just program the answers. And then the test got harder. So then I had to program solvers instead of the actual questions because I didn't know what the coefficients and stuff would be ahead of time. And then the math got more advanced like the next year, and so I had to drop down from basic to assembly to just make the program run a little bit faster.

1:14

Speaker B

Oh, wow. So again in high school you dropped down to assembly.

2:27

Speaker C

I think this is like middle school or high school, maybe like 8th or 9th grade or something like this. Then the thing I realized is everyone in my class was starting to realize that I had this solver and they got kind of jealous. And so I bought this little serial cable so I can give it to them too. And then the next math test, everyone on the class just got A's and the teacher was like, what's going on? Eventually she realized it. It's just like, ok, okay, you get away with it once and knock it off. But for me it was very practical. So in school I studied economics, actually dropped out to start startups. And I never thought that coding would be a career at all. It was always very practical. To me, coding is a means to build things and to make useful things. This startup, the first one was, I think it's like my friends and I were trying to get weed and so we started this weed review startup. We made a website called kind of Different Dispensaries, I think, and then we just tried to get kind of like weed samples so we could like review it for them. And it actually kind of blew up. And then I actually got more interested in. At the time, no one was like testing this stuff. And so I got into kind of the like chemical testing, kind of chemical analysis. And then after this, I kind of did a bunch of other startups and then I joined YC actually pretty early. And that was the first hire of this YC startup up in Palo Alto.

2:30

Speaker B

After how's it decided to go to one startup after the other kind of vibes.

3:53

Speaker C

Vibes, I'd say, because, you know, startups, it's never a linear path. You always kind of pivot, pivot, pivot. You have to figure out what the market wants and what users want. And it's never the thing that you think, you always try a thing, but the idea is always a hypothesis. And then almost always you have to pivot once, twice, three times. You know, at this medical software company, this is called Agile Diagnosis. This was kind of an early YC company. This was back in maybe 2011, 2012, something like that. It was medical software for doctors. And the idea was there's these like clinical decision protocols. They vary a lot hospital to hospital. And our idea was there was one hospital in Chicago that had a really great protocol specifically for cardiac symptoms. And so we're like, wouldn't outcomes be great if every hospital in the US Would use the same protocol? And so we tried to standardize it, and we made this decision tree software for doctors to use. And I wrote some of the software. The team was like. It was just a few of us. It was a pretty small team. And I wrote the software. It was in a web browser. And I remember this was back in the Internet Explorer six days. That's what hospitals were using. And I wrote this SVG renderer because it was this visual decision tree. And we launched it, and then we had a DAU chart and the DUs were flat and couldn't figure it out. And we were piloting it with a few hospitals at the time. And at the time we were based in Palo Alto. We were piloting it with, you know, a few hospitals, including ucsf. And I rode a motorcycle at the time. So I rode my motorcycle up to, you know, ucsf and I shadowed doctors for a couple of days just to see how. How do they actually use this? And I realized that actually, doctors don't have time to sit down and use a computer because you're seeing a patient. Then you have maybe five minutes until the next patient. And in those five minutes, you have to walk down the hall, you have to go to the computer station. You have to open up this totally legacy computer. By the time it boots up, that's like three minutes. Then you open up Inner Explorer 6. That takes like 30 seconds. Then you have to open up this app that we built. You have to sign in, and your five minutes are up. You don't even have time to use it. And so we rewrote everything to run on Android, and they still weren't using it. And the thing we realized is doctors are walking around with a bunch of residents behind them. In this kind of situation, it's like a social situation, right? Like, the thing that matters is they're seen as an authority. They don't want to be seen on their phones. And then we pivoted again. So at that point, we were like, okay, so maybe the doctor isn't the target user. Actually, we want it to be used by maybe nurses or X ray technicians or something like this. At that point, I left because I was like, this is actually pretty far off from kind of what I wanted to do. This is like, the most fun thing for me is finding this product market fit, because it's always surprising. You can't have one big idea because the idea is probably going to be wrong. So you kind of form hypotheses, you follow it down and you see what's right.

3:57

Speaker B

Also, I find it so interesting how you're telling us this story because I feel behind a lot of startup success stories, we hear the success story, we hear the path of how it went. But first of all, a lot of startups are like this. And second of all, what struck me is you were hired as a software engineer, right? And this was back before product engineers or anything was a thing which we're now talking about. But you just like you rode your motorbike and you went there and you shadowed the people and you understood how they're using it, why they're not using it, getting, getting ideas. I feel, you know, this, this is what makes a great software engineer. Back then and even today, right. You weren't, doesn't seem to me that you were focused on the technology. You were focused on the outcome though.

6:48

Speaker C

Yeah, I mean, look, there's different kinds of engineers and there's different ways to do it. And you know, even on our team right now, I look at an engineer like Jared Sumner and he's just incredible technical mind, he understands systems better than anyone I've met. And you know, you need people like this, you need people with this kind of depth. For me, engineering has always been a practical thing and you know, for me I've always been a generalist and like, it doesn't matter if I'm doing, you know, like design or you know, if I'm doing engineering or user research or whatever.

7:32

Speaker A

The investment thesis for AI and software engineering is straightforward. As AI writes more code, more code needs to be verified. But there's a catch. AI generated code is on average harder to verify than human written code. This is why they're Sonar, the makers of Sonarqube. As a critical verification layer for the AI enabled world, Sonar ensures that speed, speed and volume with AI does not compromise your code base. Sonar's competitive position is built on 17 years of specialized expertise that no foundational model can replicate. We're talking about deep analysis engines like symbolic execution and cross repository data flow tracking that simulate how code actually behaves, not just what it says. To bridge the divide between AI productivity and code quality, Sonar has released a SonarQube MCP server. This tool acts as a universal translator between AI applications and the Sonarqube platform. By using the Model context protocol, it gives AI tools like Claude Code, GitHub, Copilot and cursor direct access to Sonarqube's analysis capabilities. Instead of context switching, your AI agent becomes a full fledged code review and quality assurance copilot capable of analyzing code snippets for issues, filtering bugs by severity, and even checking your project's quality gate status before you ever commit code. Whether you're working with coding assistants or scaling up with full agentic workflows, Sonar provides the automated verification that 75% of the Fortune 100 rely on. It's about giving your developers the freedom to innovate without the fear of breaking the code base. Head to sonarsource.com pragmatics to learn more about how sonar enables the confidence to develop at the speed of AI. With this, let's get back to Boris career and what he learned working at startups.

8:04

Speaker C

My first job I ever had, I was like, I think I was 16 and I just wanted to buy an electric guitar. And so what I did was I just started freelancing and so I was like, okay, I guess I'll make websites. And I think Fiverr was not a thing back then. So there's some other freelancing websites. So I just started like, I put up a website, I started bidding on stuff and my first paycheck, I just spent the entire thing on an electric guitar. But it was very practical, right? Because it's like when you're in this kind of setup, you have to do the engineering, you have to do kind of the accounting, you have to do the design, you have to talk to customers. So it's just always been like that for me.

9:45

Speaker B

After a couple of these startups, you ended up at Facebook, now called Meta, and there you spent seven years there. Can you just talk us through what you worked there, what you've learned there? You've also had a very remarkable career growth in terms of four promotions over seven years. And what did you take away from that experience?

10:17

Speaker C

Yeah, so I started on Facebook groups. That was the first time I worked on. Vlad Kolesnikov hired me. I think he's actually still at Facebook. I think he's on some other team now. And it was cool. Actually. There's a big group of people that I worked with that were these kind of early JavaScript people too. And I did a bunch of JavaScript stuff. And it's funny, I kept crossing paths with these people and so Vlad, he worked on Bolt js, which was the software, it was the framework that powered Ads Manager, which later became React js. I kept crossing paths with these people. And later on for, yeah, later on there was a Bunch more people like this. But anyway, so I was working on Facebook groups. I was really excited about it because of this mission of connecting people to their community. This is the thing that drew me in. And at the time I was a big Reddit user. I became a Reddit user back when I was a teenager because I didn't know anything, anyone else that coded. Even in college, I didn't really know anyone that coded. And honestly, I was always kind of embarrassed about it because I thought it was this nerdy thing and I thought it was kind of this thing that I knew how to do. But I wanted, you know, I wanted to be like a cool kid and you know, like, I couldn't like tell people that I coded. It was like, it was very nerdy. And at some point I discovered it was some like, programming community on Reddit. And I was, I was just shocked. Like there's other people that are into this thing. It's like such a weird hobby. It's so niche and it was just so exciting to find like minded people like this and get this connection. And so I just wanted to work on this. I wanted to kind of contribute to this in some way. So I worked on Facebook groups for a while and then there's a bunch of different projects to kind of get into details for any of these. Eventually I became the tech lead for Facebook groups and kind of grew into this. And the org grew. The work really changed. It changed from kind of building to a lot of like doc writing and coordination and kind of delegating to others. The culture was changing at the time. So, you know, this early Facebook culture was disappearing. The docs were coming in, the, you know, alignment meetings were coming in. There was a lot of, a lot more work around this kind of foundational stuff like privacy, security, things like this that I think honestly early on a lot of corners were cut in order to grow, but at some point you just have to pay that debt. And that was the time when that happened. Then I spent a few years at Instagram after, um, and that was also a funny story. My wife got a, got a job offer and she was just really excited about it. And she came to me and was like, hey, like, I got this offer, but we're gonna have to move. Is that okay? And I was like, yeah, that's fine. You know, like, I work in tech. We can work remotely anywhere. Where's the job? And she was like, it's a Nara. And I was like, where's that? And Nara's like rural Japan.

10:38

Speaker B

And this was different time zone as well.

13:17

Speaker C

Different time zone? Yeah, this was 12 hours or something

13:19

Speaker B

different or something like that.

13:21

Speaker C

Something like that, yeah. It was like 2021.

13:22

Speaker B

Wow.

13:25

Speaker C

And then I tried to kind of find a team that would sponsor me because there were these kind of arcane HR rules about the time zone you have to be in and the team you have to be co located with and so on. And so there was a little kind of nascent team for Instagram in Tokyo and Will Bailey was running this team. He was also the guy that made Instagram stories. And so he was my manager for a while. And so we decided to grow that team together. And I worked remotely from Nara and then most of the team was in Tokyo. And during this time I started hacking on Instagram and the stack was just insane. Facebook was the single best web serving stack in the world. The way that everything is optimized, from the hack language to the HHVM runtime, to GraphQL as the transport layer, to the client libraries, relay and all the stuff and React, it was just amazing. There's no other dev stack in the world that was this good and it is just fully optimized. And then I went to Instagram and it's like, you know, Python, where the type checker didn't work and click to definition didn't work. And it was this like kind of hacked together Django and then like a fork of, you know, the Cython runtime and just nothing really worked. And so I came to Instagram, I joined the Labs team, you know, in Japan, and the idea was to find the next big thing for Instagram. We tried some stuff, but what I very quickly realized is that I was just not effective at working on the stack because it was such a terrible stack. And so I just just went and started working on dev infra because we needed to fix it. And there's a few projects that we worked on. So one was migrating from Python to the big Facebook monolith. Another one was migrating from Rust to GraphQL. And these projects, they're actually in progress. These are things that involve. It takes hundreds of engineers many years to do this. It's a big code base, it's a big migration now. It's much faster with these tools that

13:25

Speaker B

we have, the AI tools. And migrations are a pretty good use case for them though.

15:18

Speaker C

Yeah, it's like the, it's the perfect use case for it. And then I just started getting kind of deeper into this and by the end, by the time I left Instagram, so I was working on this on dev infra and kind of leading a bunch of these migrations. That's also where I intersected with Fiona Fung, who is now the manager for the Quad Code team. I just worked with her and she was just such an amazing leader. This incredible depth and kind of history in tech. And I just thought like, there's no better, there's no better manager for this team. And then I also started working on code quality. And so the work on Instagram kind of expanded a bit and by the time I left, I was leading code quality for all of Meta. And so I was responsible for the quality of the code bases across Instagram, Facebook Messenger, WhatsApp, Reality Labs, kind of all these code bases at Meta. It was this program called Better Engineering. And the idea was, I think it's sort of like 2016 or 2018 or something. But Zuck mandated that every engineer at the company, 20% of their time has to be spent fixing tech debt.

15:21

Speaker B

Oh, interesting.

16:17

Speaker C

And we call this better engineering. And some of this is kind of bottom up, where a team knows best the tech debt that they have to fix. And then some of it is top down where you need to do very big migrations, you need to migrate to new language features, new frameworks, things like this. And at Facebook scale, there was tens of thousands of these migrations every year. And so I just started leading all of this and I realized very quick that it just needed a little bit more order to it. There was no goals, no one knew kind of like what the outcomes were. There wasn't any tracking. And so we developed a bunch of stuff. One of the ideas was a centralized way to prioritize the different kind of code quality efforts. The second thing was figuring out the impact of code quality on the engineering productivity, which turned out to be significant.

16:19

Speaker B

How did you measure? What did you find there?

17:04

Speaker C

There is a bunch of stuff. I think some of this has been published. I don't know if all of it has, but essentially you try to do causal analysis and causal inference. This is the methodology and you try to figure out what are the factors that make it so engineers are more productive. Some of it is code quality, some of it is outside of code quality. So for example, Meta went back to return to office instead of work from home. Those partially driven by this because we just found some fairly strong correlations that we thought were causal about this. But code quality actually contributes like, you know, double digit percent to productivity. It turns out even at the biggest

17:06

Speaker B

scale, it's kind of comforting to hear Because I think it's rare to have a place where you actually measure this. But I think we feel it like when you have a clean code base, modular or it can get easier to work with and I think, you know, reasoning could also be easier for LLMs to work with it. And my hint would be, yes, it should be right. But I think there's just very little data. But that's a feeling that I would have.

17:39

Speaker C

Yeah, I think a lot of the big companies have published about this. Like I think Facebook published something. Microsoft publishes a bunch about this, Google does. But yeah, totally. If every time that you build a feature you have to think about, do I use Framework X or Y or Z? These are all options that you can consider because the codebase is in a partially migrated state where all of these are around the code somewhere. As an engineer you're gonna have a bad time, as a new hire you're gonna have a bad time. As a model, you might just pick the wrong thing and then the user has to course correct you. So actually the better thing to do is just always have a clean code base. Always make sure that when you start a migration, you finish the migration. And this is great for engineers and nowadays it's great for models too.

18:04

Speaker B

And then you joined Entrophic and I've heard a story, which you can confirm or give more color to it, that your first pull request was rejected by Adam Wal.

18:46

Speaker C

He was my ramp up buddy. So I joined Anthropic. I was trying to figure out kind of like what to do next. And I met a bunch of people at all the different labs and Anthropic was just the obvious choice for me because of the mission. This is the thing that personally I know that I need the most and also just kind of seeing all this change that's happening. It's important to have some sort of framework to think about this and to think about our role in it. I'm also a really big sci fi reader. That's definitely my genre. I'm a big reader. I have a giant bookshelf at home and stuff. And I just know how bad this thing can go. And I just felt like this is a place that has serious thinkers. People are taking this very seriously and thinking about what can we do to make this thing go better. So when I joined Anthropic I did a bunch of ramp up projects, just various stuff that I was hacking on and I wrote my first pull request by hand because I thought that's how you write code.

18:55

Speaker B

That used to be how you Write code.

19:44

Speaker C

That used to be how you write code. But even at the time at Anthropic, there was this thing called Clyde and it was the predecessor to Quadcode. It was super janky. It was Python. It took 40 seconds to start up. It was research code. It was not agentic. But if you prompt it very carefully and hold the tool just right, it can write code for you. And so Adam rejected my PR and he was like, actually, you should use this quiet thing for it instead. And I was like, okay, cool. It took me like half a day to figure out how to use this tool because you have to pass in a bunch of flags and use it correctly. But then it sped out a working print. It just one shotted it. Oh, and this was like 2024. This was like September 2024, August, something like that. And I think for me, this was my first field AGI moment at Anthropic, because I was just, oh, my God. I didn't know the model could do this. I was used to these kind of tab completions, line level completions in an ide. I had no idea that it could just make a working pull request for me.

19:46

Speaker A

Boris just talked about how he had a true wow moment at work using their AI. A very different wow moment is when you use a tool at work that makes things so much easier than before. And this leads us nicely to our presenting sponsor, statsig. Statsig offers engineering teams a tooling for experimentation and feature flagging that used to require years of internal work to build. It's the kind of tool that was so complex to build that only large companies like Meta or Uber had their own custom advanced tooling for it. Here's what statsig looks like in practice. You ship a change behind a feature gate and roll it out gradually, say to 1% or 10% of users.

20:49

Speaker C

first.

21:23

Speaker A

You watch what happens. Not just did it crash, but what did it do to the metrics you care about. Conversion, retention, error rate, latency. If something looks off, you turn it off quickly. If it's trending the right way, you

21:24

Speaker B

keep it rolling forward.

21:35

Speaker A

And the key is that measurement is part of the workflow. You're not switching between three tools and trying to match up segments and dashboards after the fact. Feature flags, experiments and analytics are all in one place using the same underlying user assignments and data. This is why teams at companies like Notion, Brexit and Atlassian use statsig. Statsic has a generous free tier to get started, and propricing for teams starts at $150 per month.

21:37

Speaker B

To learn more and get a 30

22:00

Speaker A

day enterprise trial, go to statsig.compragmatic and with this, let's get back to Boris and the origin story of Claude code.

22:01

Speaker B

And then when you joined Antrophic, we've covered this in a deep dive, but we could recap briefly on how cloud code came to be out of. Out of what seemed like a side project or just a cool hack.

22:10

Speaker C

So yeah, I started hacking on a bunch of different stuff. I was working on some things in product. I worked on reinforcement learning for a little bit just to kind of understand the layer under. The layer which I was building. This is still advice that I give to a lot of engineers is always understand the layer under. It's really important because that just gives you the depth and you kind of like you have a little bit more levers to work at the layer that you actually work at. This was the advice 10 years ago, it's still the advice today. But the layer under is a little bit different now. Before it was like understand if you're writing JavaScript, understand the JavaScript VM and frameworks and stuff. Now it's like understand the model. So I was hacking on a bunch of different stuff. Some things shipped, some things didn't ship. And at some point I just wanted to understand the publicanthropic API because I'd never used it before and I didn't want to build a ui. I just wanted to, you know, hack something up quite quickly because we didn't have quad code back then, we're still writing code by hand and I wrote this little bash tool that all it did was it hit the Anthropic API and it was essentially like a chat based application, but just in the terminal because that's what AI used to be. And I still think about it like engineers are the first adopters. And so when we started to move out of conversational AI to agentic AI, it took a little bit, but engineers understood it pretty quick. And I think now when you ask non engineers about like what is AI, they would say it's this conversational AI, it's like a chatbot or something. And that's why I'm actually very excited for Cowork, this new product that we've launched because it's going to bring the same thing that engineers saw very early to everyone else. But when I think about Cowork, I think back to this moment that we're talking about very early on. Quad code originally wasn't quad code, it was a chatbot because that's what I thought AI was, but we had to kind of figure out kind of what is the next thing. And so at the time I built this chatbot, it was somewhat useful, but it was just a chatbot. And the next thing that I tried was I wanted it to use tools, because tool use just came out and I didn't know what it was. And I was like, let's experiment. And I gave it a single tool, which was the Bash tool. And I didn't know what to do with the Bash tool. And so I asked it, you know, like, I actually didn't know if it could even do this, but I asked it, like, what music am I listening to? And it just wrote a little applescript program using, like, SED or whatever to open up my music player and then like, query it to see what music it's listening to. And just one shot at this. With Sonnet 3.5, this is actually my second Field AGI moment very quickly after the first one. And the model just wants to use tools. That's just what I realized. This thing, if you give it a tool, it will figure out how to use it to get the thing done. And I think at the time, when I think about the way that people were approaching AI encoding, everyone essentially had this mental model of, you take the model and you put it in a box and you figure out what is the interface? How do you want to interact with this model? What do you need it to do? Essentially, it's like if you have a program, you stub out some module, stub out some function, and you say, okay, this is now AI, but otherwise the rest of the program is just a program. And so this is just not the way to think about the model. The way to think about it is the model is its own thing. You give it tools, you give it programs that it can run, you let it run programs, you let it write programs, but you don't make it a component of this larger system in this way. And I think there's just like, you know, this is a version of the Bitter lesson. The Bitter Lesson is a very specific framing, but there's many corollaries to it. This is one of the corollaries is just let the model do its thing. Don't try to put it in a box, don't try to force it to behave a particular way.

22:22

Speaker B

One of the first ways you saw it was giving it tools, giving it access to the bash and then later to the file system and then to more tools, right?

26:06

Speaker C

That's right, yeah. We give it Bash. Then I say we. It was just me the first three months, but then the team grew, so it was Bash and file Edit. That was the second one.

26:13

Speaker B

And one of the interesting things we talked about last time for the deep dive is when you built it and it started to actually write code with all the tools that you had. You've had an internal debate inside and traffic, should we just keep it to ourselves? Because suddenly it spread across engineering and it was making all of you a lot more productive, right?

26:25

Speaker C

Yeah, that's right. In the end, the decision was to release so that we can study safety in the wild. Because when you think about safety, and I keep talking about the word safety, the reason Anthropic exists as a lab is safety. This is the reason it was founded. This is the reason it exists. If you ask anyone at Anthropic why they chose it, it's because of safety. And so if you think about model safety, there's different layers at which to think about it. There's kind of alignment and mechanistic interpretability. This is at the model layer. Then there's evals. And this is kind of like putting the model in a petri dish and synthetically studying it in this way. And then you can study it in the wild and you can see how it actually behaves. You can see how users talk about it, you can see what are the risks in the wild. And you actually learn a lot this way. And by doing this, we've been able to make the model much safer. So in hindsight, it was totally the right decision.

26:43

Speaker B

It's amusing to hear about it from your perspective, because from the outside, what I saw and what a lot of engineers saw is like, oh, Anthropic release cloud code. Oh, wow. For the first release with, I believe it was with Sonnet 4 release. Did it come up with Sonnet 4 originally or Sonnet 4.5?

27:33

Speaker C

I think it was 4. That was the general availability in February, but I think it was Research Preview before that.

27:52

Speaker B

Yeah, but when it came out, my implementation was like, oh, this thing can write code pretty well. And over time it became a lot more capable. So from our perspective, it was like this really capable coding tool that we just started to adopt and use and use for all sorts of increasingly productive, productive parts. And it has become, I believe, one of the fastest growing developer tools. And I'm always surprised to hear the story that it actually comes from research and the goal to understand how people use the model. Because on the other hand, like, some startups have been trying to build Developer tools deliberately to get adoption. And yet this research tool is getting a lot more adoption.

27:58

Speaker C

I mean, this is anthropic. We're a research lab, we're a safety lab. And product is this kind of thing tacked onto the side. Product exists so that we can serve research better and so we can make the model safer. And this is kind of how we think about everything. There's also this funny moment early on when we had this launch review and we were deciding whether to launch it. I remember this moment because we were in the room, I think there was Mike Krieger, there was Dario, there were some other folks in the room, and we were deciding what should we do. We were looking at the internal adoption chart, which was just vertical. Since we said it was just insane. Nowadays it's 100%, right? Just 100%. Nowadays, every technical employee at Anthropic uses quad code. Every day is pretty much 100%. For non technical employees, it's actually getting quite close to 100%. It's increasing very quickly. Half the sales team uses quad code. Um, and I think that's increasing. It's just. It's crazy. Dario had this question about, like, how. How did it grow this fast? Are you, like, forcing people to use it? And I was like, no, we offer this tool, people vote with their feet and, you know, just like, let people use the tool that they prefer. Yeah, they chose it.

28:38

Speaker B

You don't seem like the person who's act. Exactly. Forcing people to use your tool. Yeah, yeah.

29:46

Speaker C

I mean, the way we did it, we just. We launched a thing and then we just, like, listened to the users and we talked to people. We saw how they use it, we followed up, we made it better. And yeah, I mean, now we're at the point where quad code writes, I think, something like 80% of the code at Anthropic, on average. And it writes all my code, for sure.

29:52

Speaker B

Yeah. And this started for you? It started the first time you mentioned. I think it was in November when it started to write all of your code. When did that switch come? And what happened to made you trust it to write your code or how much do you trust it? How much you review that code, for example.

30:09

Speaker C

So the switch was instant. When we started using Opus 4. 5. This was before. Before it came out. You know, we were dogfooding it for a little bit and it was just. Right away, it's such a more capable model. I just found that I didn't have to open my IDE anymore. I just uninstalled My ide, because I just didn't need it at that point. I actually did that like a month later because I just didn't even realize that I wasn't using it anymore.

30:25

Speaker B

Yeah, a lot of us had similar experiences once Opus4.5 was out in the public and especially over the winter break, I had a similar experience. I just realized that this thing, it actually writes, if I'm being honest with myself, as good code as I would have written in the stack that I'm very familiar with and my code base, my side projects where I know it and just a lot better than what I could for code base that I'm not as familiar or technologies I'm not as familiar with.

30:48

Speaker C

Yeah, I'll be honest, it writes better code than I do.

31:13

Speaker B

I don't want to go there. I still like to keep my pride, but. Probably true.

31:17

Speaker C

Yeah. I realized this because also in December I was traveling a little bit. I was on a coding vacation. We were talking about this before, but I went to Europe. We were just in a different time zone, kind of nomading around. And it was so fun because I was just coding all day, every day, which is my favorite thing to do. And I wrote maybe 10, 20 pull requests every day, something like that. Opus 4.5 and Quadcode wrote 100% of every single one. I didn't edit a single line manually. And I realized at the end of that month Opus introduced maybe two bugs, whereas if I'd written that by hand, that would have been like 20 bugs or something like that.

31:20

Speaker B

Can we talk about your development workflow? You have written threads about this, which is awesome. It's on social media, on threads and on X. But can you tell us how you use today Claude code in terms of parallelism and tips and tricks that you and the team have for kind of learned and share across the. Across the team? Yeah.

31:56

Speaker C

I mean, look, there's no one right way to use quad code. So I can share some tips and things, but I think the wrong conclusion to draw would be to just copy these and use it. The way we build quad code is we build it to be hackable because we know every engineer's workflow is different. There's no one way to do things. There's no two engineers that have the same workflow. It's just every engineer is same workstation set up.

32:15

Speaker B

Right? Like keyboards, monitor placement. All that ever has it differently.

32:40

Speaker C

Yeah, it's like we're like craftspeople, right? Like you choose your tools. Like we care deeply about it. So there's no one right way to do it. So for me, the way that I do it generally is I have five terminal tabs. Each one of them has a checkout of the repository. So it's five parallel checkouts. And usually I'll kind of round robin and start quadcode in each one almost every time I start in plane mode. So that's like Shift tab twice in the terminal. And I also overflow as I run out of tabs because there's only so many terminal tabs. I used to use web a lot for this. So like Claud AI code, that's the place that I overflow to. Nowadays I actually use the desktop app. It's more convenient. So quad code, you know, it's been in our desktop app for, you know, for many months. It's just the code tab in the quad app. And I actually really like it because it has built in worktree support. So that's existed for a while. And that's quite nice for parallelism. So you have multiple. You don't need multiple checkouts, you just have one. And then we automatically set up git work trees for you. So you get this kind of environment isolation. The reason I do that is I actually just really hate fiddling with git work trees on the command line because it's kind of fiddly. Like you need to know the CD Git work tree.

32:43

Speaker B

For those who are not as familiar with it, it's when you can check out, instead of having a separate local folder, it's almost like checkstall separate branch. Right. And then you can work on it separately but not have the conflicts only at like merge time.

33:52

Speaker C

That's right. Imagine that you have a folder, but you have maybe like Git makes five copies of that folder in a way that's very cheap and kind of easy to throw away. So you get this kind of isolation. It can work in parallel and the quads don't interfere.

34:07

Speaker B

Yeah. So you now have support for this, which I think you recently added, like native support. But like for your workflow, you just stuck with the old one of checking out on separate folders, right?

34:20

Speaker C

Yeah, exactly. I actually find that over time, I'm using the desktop app more and more for this, just because I don't need these separate checkouts and you know, I just have a bunch of quads running in parallel and I don't have to think about it. The other surprise hit is the iOS app for me, every day I start like, I wake up and I just start a few agents on my phone.

34:30

Speaker B

Oh, the Native one?

34:46

Speaker C

Yeah, the native one, yeah. It's just like. It's the Quad app. It's the code tab in the Quad app, and it's the same exact quad

34:47

Speaker B

code, except it runs in the cloud, right?

34:52

Speaker C

It runs in the cloud, yeah. So you have to kind of configure the environment. Luckily, our environment's pretty simple, so, you know, and we just use hooks for it. So you just use the session start, hook and configure it. This is kind of one of the benefits of making QWA code really hackable is it's very easy to do this kind of configuration. And this is something, honestly, I would never have predicted because, you know, like, I code on a computer. If you told me six months ago I'd be writing, I don't know, a third, I haven't pulled the data. Maybe like a third half, something like this of my code on a phone. That's crazy. But that's. That's what I'm doing today.

34:55

Speaker B

And you're using parallel agents. At what point did you start using them and how has it changed your work? Because one thing that I noticed on myself, I don't really use that many parallel agents. I may be like two at a time, but I'm someone who. Well, I like to be in charge, and especially with Claude. Claude is a tool that you can follow it along. It tells you what it's doing. You can also have, for example, Learn mode, which this was shipped a lot earlier, where you can actually follow along. It gives you tasks. I feel that staying in one tab and following along the model is pretty fast as well. I can kind of keep in touch. I'm assuming at some point you must have done this, but then what happened when you changed to parallel? And do you feel you're losing any control or. It doesn't really matter that much.

35:29

Speaker C

Yeah, I think there's kind of like two modes to think about or kind of like two kind of workflows to think about. So when you're new to a code base. Highly recommend. Learn mode is awesome. Highly recommend it for people that are onboarding to the quadco team, people that onboard to Anthropic. The thing that we recommend is. So you do. For people that haven't tried it, you do config in quadcode. You pick the output style and you can do Learn or explanatory. We usually recommend Explanatory because that tends to be better for new code bases that you kind of haven't been in before. For me, once you're familiar with the Code base. You just want to be productive, right? Like, you just want to ship as much as you can and you want to kind of be effective doing that. So the rule really switches. I don't really go deep into tasks anymore. I start a quad in plan mode. I'll have it kick something off with Opus 4.5. I think it got there with 4.6. It just really, really does it. Once there is a good plan, it just. It will one shot, the implementation almost every time. So the most important thing is to go back and forth a little bit to get the plan right. So what I do is I start one, I enter plan mode, I give it a prompt as it's chugging along. I'll go to my second tab and I'll start the second Claude, also in plan mode, get it chugging along, then go to the third tab, go to the fourth one, then maybe I'll go back to the first one when I get notified that it's done, and then I'll kind of.

36:14

Speaker B

Do you have notifications on or do you turn them off?

37:30

Speaker C

I actually operate in both modes. Sometimes I do focus mode on the Mac, so I just have it off. But also sometimes I use the system notifications.

37:32

Speaker B

And you're very, very productive with PRs. I mean, I think it was very visible even around the holiday breaks on social media you actually were responding to. I think someone reported a bug or a feature request. I'm not sure which one it was. And then an hour or two later it was done because you did it. You've also talked about number of pull requests you've done on a day. Not to show up, but just as context. What does a pull request typically involve in terms of complexity? Are these are some super trivial or some actually like larger pieces of work as well?

37:42

Speaker C

Yeah, each one varies a lot. Sometimes it's a few lines, sometimes it's a few hundred or a few thousand lines. They're all just very, very different. It's changed so much. Back when I was at Instagram, I think I was one of the top two, maybe top three most productive engineers at Instagram, just by volume of code written.

38:15

Speaker B

Oh, wow.

38:32

Speaker C

So I've always, for me, I've always just coded a lot. Coding is a way that I can express myself and it's just like it's a way that my brain thinks also. And so now I just get to do it. But I think with quad code, the kind of code that you write, if you are very productive, it tends to be even just the number of PR sort of under ourselves. What's happening, Because I think people that used to be very productive in the old days, before AI assistants, a lot of the code maybe was like code migrations or something like this. So like people that shipped, you know, 2030 PRs every day, a lot of it was like pretty, you know, like a one liner or kind of migrating A to B or whatever. Nowadays I ship, you know, 2030 PRs every day, but every PR is just completely different. Some of them are thousands of lines, some of them are hundred, some of them are dozens, some of them are one liners. It's none of these are kind of code migrations because actually Claude just does those. And I don't need to be part

38:33

Speaker B

of that shipping this much code or this much more productive. The obvious question that comes up for any like a software professional as well, the review what the way teams used to work, and I'm not sure if Instagram did this, but a lot of other companies did this is you make a pull request, you put it up there. There's a mandatory human reviewer at Google. There's actually two because there's one on code quality as well. How has this workflow changed? How does the cloud code team think about code review and how has it changed over time?

39:26

Speaker C

Yeah, I'll start by thinking, I'll start by talking about how code review used to work for me. So the way that I used to do it is every time. I also used to be one of the most prolific code reviewers.

39:54

Speaker B

Oh, okay. So both.

40:04

Speaker C

Yeah, yeah.

40:06

Speaker B

Writers or code reviewers.

40:06

Speaker C

That's actually. And that's one of the benefits of being in a different time zone. Like I'm not superhuman, I just didn't have any meetings. And the way that I approach code review is every time that I would have to comment about something, I would drop it in a spreadsheet and I would like describe the issues. So let's say, you know, like someone named a parameter B, you know, in a function badly. I would like put that in a spreadsheet. If someone did some bad react pattern or something, I would put that in a spreadsheet. And then over time I would just kind of tally up the spreadsheet. And anytime that a particular row had more than three or four instances, I would write a lint rule for it. So just automate it with kind of select. And so that's what it used to look like for me. I've always tried to automate myself away because there's just so many things to do. And this is one of our Superpowers as engineers is we are able to automate all of the tedious work. There's very few other fields where you're able to do this thing. This is a thing uniquely that we're able to do. And this is a thing that I've just always enjoyed because it gives me more free time and I get to do the work I actually enjoy. And so today the way this looks is a little different, but it mirrors this a little bit. So when CLAUDE code writes code, it generally it will run tests locally. And this is something CLAUDE just often decides to do when it's relevant or it'll write new tests. So you kind of do this kind of verification. When we make changes to CLAUDE code, CLAUDE will also test itself. So it'll launch itself kind of in a sub process, it'll verify itself, and it'll test itself end to end.

40:08

Speaker B

This is for your internal CLAUDE code implementation. So you have like this test suite so that it can test itself.

41:29

Speaker C

Yeah, that's right, that's right. But it'll literally launch itself just in a bash process and kind of just see like, hey, do I still work while. Okay, so it'll do this. And this is something that we just didn't code in. Like it just, with Opus 4.5 especially, it just started spontaneously doing this. It just wants to kind of check. So we do this and then we also run Claude P. So this is the CLAUDE Agent SDK in CI. So every pull request at Anthropic is code reviewed by Quad Code. And that actually catches maybe like 80% of bugs, something like this. And it's the first round of kind of code review. CLAUDE will automatically address some of these. Some of them it'll leave to a human because it's not sure what to do. There's always an engineer that does the second passive code review. And you know, there always has to be a person in the loop approving the change.

41:36

Speaker B

So on the team, before anything goes into production, if you will, an engine does look at it. Yes. As you're thinking of code review, would you do this for every type of project? Or this is specifically because you now know that this actually has real world impact. People depend on it. You know, there's a lot of users. Let me put it the other way around. Like, can you see places where you would just not have an engineer review code? What situations would that be in?

42:22

Speaker C

I think it depends how. How. How it's used. Yeah, I'd agree with that. Like, you know, if you're building some personal side project like you can just YOLO straight to main.

42:47

Speaker B

You know, like even, even before AI, you would have not reviewed. You just trust yourself or, you know, just ship to production or SSH into production and do some changes, that kind of stuff.

42:54

Speaker C

Right, exactly, exactly. The very first versions of quad code that were internal, like, you know, I committed straight to main, but then, you know, as soon as you have users and you know, for anthropic, our main customer base is enterprises. This is what we care about the most. For us, for safety reasons, security is really important, privacy is important. These are, these are all related. It's also very important for our customers. And so because this is an enterprise product, it has to be secure, it has to be. We have to make sure that it meets a certain bar. So we definitely use a lot of automation. But at least for now there has to be a human in the loop just to make sure.

43:04

Speaker B

One thing that is just known about LLMs is they're non deterministic and by putting the LLM as a reviewer, Claude doing a review, it will give good feedback. But how would you deal with the fact that you can't be sure if it's always giving the feedback? You cannot be sure that even if it's capable of catching an issue that it will necessarily catch that. Are you doing anything in this loop to do deterministic thing? For example, linting is very deterministic as you will very well know. Like have you thought of marrying some of these ideas or are you using, for example, are you using linters on the code base or you found no need to for it?

43:37

Speaker C

Yeah, absolutely, absolutely, yeah.

44:12

Speaker B

So this is just a. Yeah, yeah.

44:14

Speaker C

We have type checkers, we have linters, we run the build. Claude is actually so good at writing lint rules. So actually what I do now, I used to tally stuff up in the spreadsheet. Now what I do is when a coworker puts up a pull request and I'm like, this is lintable. I'll just be odd. Please write a lint roll for this in that PR on their PR and we have, you know, you just run like slash. I think it's like setup GitHub or something like this. You can do this in Claude code and It'll install the GitHub app which then makes it so you can tag OD on any pull request, any issue. I use this every single day. So very, very useful. So you want these deterministic steps also though, there are ways to get Claude to be a little bit more deterministic. So for example, you can do best of N, you can have it do multiple passes and this is actually quite easy to do. So you know, for example, the coderview skill that we use internally, it's open source and it's available in the quad code repo. And so all we do is, you know, we launch parallel agents to do stuff and then we launch parallel deduping agents to check for false positives. But essentially best of n, the way you implement it is all you say is claude start three agents to do this and that's it.

44:16

Speaker A

Boris just talks about building that enterprise infrastructure layer, the auth, the permissions, the security that has to all work before you can ship to real customers. This makes it a great time to speak about our season sponsor WorkOS. If you're building any SaaS, especially an AI product one, then authentication, permissions, security and enterprise identity can quietly turn into a long term investment. Saml edge cases, directory sync, audit logs and all the things enterprise customers expect. It's a lot of work to build these mission critical parts and then some more to maintain them. But you don't have to work OS provides these building blocks as infrastructure so your team can stay focused on what actually makes your product unique. That's why Companies like Anthropic, OpenAI and Cursor already run on workos. Great engineers know what not to build. If identity is one of those things for you, visit workos.com and with this, let's get back to building cloud code with Boris.

45:24

Speaker B

How does cloud code work in terms of architecture? So as an engineer, how can I imagine it's set up? We covered some of this in the deep dive and I think you told me that you had some pretty complex ideas when you started and you just simplified a lot of it.

46:18

Speaker C

Yeah, yeah, it's very simple. Like, you know, there's not much to it. There's like there's a core query loop, there's a few tools that it use that it uses. We delete these tools all the time, we add new tools all the time. We're just always experimenting with it. So there's kind of this core kind of agent part of it, then there's the 2e part of it and then there's actually a ton of different pieces around security in making sure that everything that quadco does is safe and that there's a human in the loop for when it happens.

46:33

Speaker B

And by safety, do you mean as a user when it's doing stuff on my computer or also as anthropic monitoring use cases that could be deemed Unsafe.

47:03

Speaker C

Yeah, there's kind of a couple versions of this safety. There's just many, many layers. And for things like safety and security, there's no one perfect answer. So it's always a Swiss cheese MO model. You just need a bunch of layers. And with enough layers, the probability of catching anything goes up. And so you just have to kind of count the number of nines in that probability and pick the threshold that you want. And so for something like prompt injection, for example, we do this generally at three different layers. So let's think about something like webfetch. So quad fetch is a URL and it reads the contents of that webpage and then it does something in quad code. So one of the risks for something like this is prompt injection. Maybe there's an instruction on that website to be like, hey, quad, delete all the folders, or something like that. So we think about this in a number of ways. The most basic way is it's an alignment problem. And so Opus 4.6 is the most aligned model we've ever released because we've taught the model how to be more resistant to prompt injection. And so you can read about this on the model card, and I think it was part of the release. The second part is that we have classifiers at runtime where if there is a request that seems to be prompt injected, we block it and we just make the model try again. And then the third layer is for something like webfetch, we actually summarize the results using a sub agent and then we return that summary back to the main agent. So again, this kind of reduces the probability of prompt injection. And so you can kind of see how this isn't just one mechanism, it's a layer. And by having a bunch of these different layers, it just reduces the probability a lot.

47:14

Speaker B

One interesting technical choice that you also mentioned is using RAG or not rag. Retreat Retrieval, Augmented generation. And you mentioned how in the earlier version of cloud code, you use the local vector database to speed up search and you layer through this away. Can you talk about how this one. Because this was another example where I guess. Did the model get better?

48:41

Speaker C

Yeah, I mean, this is one of those things where we try so many different things, we try so many different tools, and just statistically most of them we throw away even something like the spinner in quad code. I think it's gone through like 100 iterations, I want to say just the spinner. And out of those we've landed maybe like 10 or 20 in production and like 80 of them. I probably Just threw away because it didn't feel good enough. So just statistically, almost all the code we write we threw away because it's just so easy to write this code and try stuff and see what feels good. So for something like rag, we tried a bunch of different approaches early on. So the first one was RAG for retrieval, because I think I was just like reading up, like how people were doing retrieval and it seemed like all the papers were talking about rag. And so the way I did it was it was like a local vector database. I think it was like written in TypeScript and Eoswift on the user machine. And then I was using some embedding model that was in the cloud to compute the embeddings before storing it. And that worked pretty good. But there's a lot of issues with rag. So for example, I was finding that the code drifted out of sync. If I make a local function, it's not yet indexed, and so RAG isn't going to find it. There's also this question of, like, how exactly is the index permissioned? So who can access it? I can access it, but then how do we encode that in kind of permission policies? How do we make sure no one else can access it? How do we make sure that if there's a rogue IT person within the company, they can't access someone else's data? This is really, really important that we think about this. And so we just decided it was sort of working. But it also has a lot of downsides. And so we tried a bunch of other stuff. One of them was just using the model to kind of index everything recursively. That was kind of a cool idea. There was another version where we just tried glob and Grep. We tried a bunch of different stuff. It turned out that Agentix Search just outperformed everything. And when I say Agentix Search, it's a fancy word for glob and grep. That's all it is.

49:04

Speaker B

Nice. So the model both got good enough and you realize that it can use these tools pretty efficiently.

51:00

Speaker C

Yeah, and it was partially inspired, honestly, by my experience at Instagram. Because at Instagram, Click to Definition didn't work because the dev stack was just borked like half the time. And I think now it's better. And so what engineers learned to do instead is, let's say you're looking for the definition of the function foo. Instead of click to definition, what you would do is you would use the global index, which is quite good at meta, and then you would search for foo per opening parenthesis. And this worked pretty well. And it's funny because this works for the model pretty well too.

51:07

Speaker B

Interesting how one idea from one area can come to the other. One of the more advanced parts of cloud code that we also previously talked about is the permission system. Can you talk about what was complex about it? And also you recently open source sandboxing. Right.

51:40

Speaker C

Permissioning is really complex. There's like everything else that has to do with security. It's a Swiss cheese model. There are a number of classifiers that run to make sure the command is safe, and there's also static analysis that we do to make sure the command is safe. As a user, you can also allowlist particular patterns that you know to be safe. So for example, some standard UNIX utilities we pre allow because we know they're read only because we know they can't exfiltrate data or anything like this. So we just won't prompt you for permission. But actually, quite a few tools fall into this category because even something like the find command, there's actually a way to execute arbitrary code as part of that command because there's like system flags that you can use for this, or even something like the SED command, there's ways to use this. So there's just all this arcania about these various UNIX utilities where it's actually not as safe as you think. And so we want to be by default, fairly conservative about what we allow by default. As a user, though, you can configure an allow list so you can say, for example, like, these patterns are allowed. These patterns are not allowed. And so we let you define that. And we also check this allowlist to make sure that it's safe.

51:58

Speaker B

Yeah, and then you have this like, neat permission system where every time you run a command that needs permission, you can decide to run it once. So run it for either the session or whatever it makes sense, or just globally allow it going forward. Right?

53:09

Speaker C

That's right. This is a funny artifact. This was actually in the very, very first version of QWT code. This is the way permissions worked. This is the very first release. This was like September 2024, the first internal release. I remember at the time we weren't sure whether agentic safety could even be solved. And so there was actually a lot of pushback internally from safety teams because they were like, okay, you can't just let the model run bash commands. That's unsafe. So what do you do? This is not a solvable problem, so we can't launch this. I brainstormed With Ben Mann and Ben, he started the WAPS team. He's one of the founders at Anthropic. He's the person that hired me to Anthropic. We just came up with permission prompts as the way to do this. If you're not sure, just ask the human and they can decide.

53:23

Speaker B

I wanted to ask you about how software engineering is done in general in terms of Anthropic. And one of the first questions, which is, I guess a more formal one or from the outside, is titles or lack of them. Everyone at Anthropic has the same title, member of technical staff. Why did this happen and what does this result in this kind of like everyone there basically no titles, right? Except for one.

54:07

Speaker C

I think it's kind of an acknowledgement that everyone just is figuring stuff out. And if you kind of squint and look at the work people are doing, it's all quite similar and it's kind of quite generalist. And if you talk to the average software engineer, they might not just be doing coding, they might also be doing a little design. They might also be talking to users. They might be writing their own product requirements, they might be writing software and also, you know, doing research. They might be writing product code and also infrastructure code. At Anthropic, there's a lot of generalists. This is also, you know, from my background, this is one of the reasons that I gravitated towards it. And I think member of technical staff just kind of encodes this in the way that people talk to each other, even if they don't know each other. Without this title, the default would have been I see your name on Slack, and under your name it says Software Engineer. And then I'm like, well, okay, I guess you're the coding person then. So I'm not going to ask you product questions. But when everyone's title is member of technical staff, by default, you assume everyone does everything. And so it kind of inverts this relationship between people, even if you don't know each other well yet, in a way, it's kind of this optimism built into the structure. I think it's also a glimpse of the future because I think this is where software engineering is going. I think this is where every discipline is going, is more of this generalist model.

54:31

Speaker B

It definitely feels like it in software engineering. And I heard this funny comment by Marc Andreessen how he said that there's this Mexican standoff happening in the tech world where the. The designers are saying that they're actually now doing like, PM and engineering work. The engineers are saying we're doing design. And like everyone thinks they're doing the work of the others. And they're kind of standing there like, I'm doing your work as well. But the reality is everyone's role is expanding, most of it thanks to AI because it makes easier for an engineer to do product work or for a product person to engineer work and so on. So just what you've said.

55:54

Speaker C

I remember back in the back in June or July of last year, I walked into the office and the data sci, there's a row of data scientists that sit right next to the QuadCo team, at least at the time. And I walked in and our data scientist for the quadco team had quadcode up on his monitor and he was using it. And I was like, this is interesting because you're a data scientist. Why are you using a terminal? You didn't have Node JS installed because we depended on Node JS back then. I was like, are you dogfooding it? Are you just trying to figure out how this thing works or something? He's like, no, no, I'm using it to run queries. He was just using it to run SQL. It had little ASCII visualizations in the terminal. And then the next week the entire row of data scientists had quad code running on their computers and this expanded. And so if you look at the team today on the quad code team, everyone codes the engineer's code, our engineering manager codes, designer's code, data scientist code, our finance guy codes, everyone on the team codes. And I think part of it is Claude code just makes it so easy so you don't really have to understand the code base. You can just dive in and kind of make small changes quite easily. But I think another thing is people are able to use Claude code to do their jobs more, whether it's financial forecasts or data science or whatever. And by doing this, it's actually quite an easy crossover to just use it to write a little bit of code also. So it's just a way to dip your toe in the water.

56:29

Speaker B

One other interesting thing about how you work is Kat, who was talking about. She is, I guess the title is the same, but people might gravitate for a role a bit more. I understand she's a little bit more on a product role, but you said that PRDs are just not really written inside entropy and PRDs product requirement document. It's a well known artifact across big tech and increasingly over larger startups where you write a spec and the Idea is that you write down your thoughts, people align, you send it over, and now you know what to build. But apparently you're not doing much of this or at all.

57:59

Speaker C

Some of this, I think, is because Anthropic is still, you know, it's still a startup, so you don't actually have to align with that many people. Usually you can just kind of talk about it or do it in Slack or whatever. But yeah, also part of it is, you know, like, Kat used to be an engineering manager. She's extremely technical. And I think this is the way that, you know, our product team thinks about it too, is, you know, better just send a pr, you're doing a

58:30

Speaker B

lot of prototyping instead. So, like, that's also something where when we talked about how you were building cloud code early on, you were showing, actually you had a whole thread about the number. I think you did like 15 or 20 prototypes for the to do list and all of them interactive working and what surprised me compared to my past tech experience. And you said that, well, you did this in like a day and a half, all 20. Tried it out, got a feeling for it, which incomprehensible for me. It would have taken a week or two weeks and people would have not done 20, they would have done three.

58:51

Speaker C

Yeah.

59:22

Speaker B

So, like, are you seeing this? Is there an increase in prototyping and building and showing instead of writing things?

59:22

Speaker C

Yeah, absolutely. I mean, on our team, the culture is we don't really write stuff, we just, we show. It's a little hard to reflect back on the time before because I think now just prototyping, everything is so baked into the way that we build. Just everything is prototyped multiple times. Like, you know, we launched agent teams earlier this week. This is our implementation of Swarms. It's very exciting because it just lets Claude do more work for longer, more autonomously. You have a bunch of different uncorrelated context windows and you have this kind of communication between agents. They can just do more. This is something that Daisy and Suzanne and other folks on the team and Karen, they prototyped this for months and they tried, all in all, probably hundreds of versions of this before they got a user experience that felt really good. It was just really, really hard to get right. There's just no way we could have shipped this if we started with static mocks in figma or if we started with a PRD or something like this. It's a thing that you have to build and you have to feel and you have to see how it feels

59:30

Speaker B

and to me, one of the big takeaways even from there was we probably should prototype more and just be more daring or just release your priors of how long it took to build a prototype or who needed to build. Back then, it was always an engineer that needed to build. But it's probably not true anymore.

1:00:31

Speaker C

Yeah, that's right. I mean, we're in this world right now also where we don't know what the right answer is. I think back in the old way of building, the cost of building was high and so you had to actually spend a lot of effort to aim very carefully before you take your shot. Because after you take your shot, it's very hard to course correct. You can only take so few shots. But now it's changed. The cost of building is very low. But also we don't know where we're aiming. So we just have to try and we have to see what feels good. And it's just very, very exploratory. And I think also a big part of it is humility, where personally I'm wrong half the time, I'd say most of my ideas are bad. At least half of them are bad. And I don't know which half until I try it.

1:00:46

Speaker B

And I get feedback from others as well sometimes.

1:01:28

Speaker C

That's right. It's like I have to try it myself and then I have to see what others think because my intuition does not always match others.

1:01:30

Speaker B

When you were showing these prototypes of just how the, the tasks were built, you were telling me that you built the prototypes and then your process was always you first, like looked at it, you tried it out, you got a feel for it, and then for the ones that you felt were good, you showed it to others. And sometimes they give you feedback like, nah, this doesn't work. And then sometimes when it felt good, then you shared it even broader. So I feel like, you know, like it's a mix, right, where like sometimes you can decide already and then sometimes you get feedback and then eventually some good ideas come out of it.

1:01:36

Speaker C

Yeah, and there's a lot of examples of this. Like we launched this kind of condensed view for file reads and file search just because the model is just so agentic now I felt like half the screen is these file reads and I actually don't care it read a thing. I don't really care what it is. And so we condensed this down to make the output a little bit more readable. I really liked it. After probably 30 prototypes or something like this, it took so much effort to make that feel really Good and clean. We rolled it out to employees at Anthropic for about a month and we had everyone dog fooded. And I fixed another, probably dozen dozen bugs, dozen tweaks based on all this feedback. We launched it externally and almost all users liked it, but there were a few users that didn't because they want more expanded output. And so on the GitHub issue, I was just going back and forth with people to be like, what don't you like? And people gave a lot of feedback. I shipped another version. Then some people liked it, some people didn't. And so I iterated again and kind of made it good. And it's actually, I think, almost there, where people can configure it the way that they want, but still the default is really good. But this is just the process. You know, we get it right some of the time. We have to learn from our users, we want to hear from people so we can get it right.

1:02:06

Speaker B

Do you use ticketing systems for your work where you capture like, all right, here's the work I want to. Or do you just pretty much do the work as it comes in?

1:03:12

Speaker C

So at Anthropic, we weave it up to teams on the Quadco team, we weave it up to every person. Different people use this differently. For example, I don't use a ticketing system. Some people like to use Asana or Notes or something like this. One of the coolest things that I saw this was maybe like three months ago or something, we launched plugins. And the way we launched that is Daisy. For a weekend, she had a very early version of Swarms and she let the swarm run and she told it, your job is to build plugins. You have to come up with a spec. Then you have to make a Asana board and split up into tasks. And then all the different agents have to build it. And she set up a container and she set up a Claude in Dangerous mode and she let it run for the entire weekend. It spawned a couple hundred agents, they made a hundred tasks on the Asana board and then they implemented it. And that's pretty much the version of plugins that we shipped. These kind of coordination systems, they used to be for humans, but I think nowadays it's just as much for models.

1:03:20

Speaker B

Let's talk about Cloud Cowork. It's one of the very important things about us. It looks great. So I tried it out inside Claude. You have the coworker tab there and you can, I feel it's a lot more visual way of running agents, interacting with them. One of the surprising things I heard that it was built in 10 days. Can you take us through what it took to build it and what does it actually mean? Was it from the idea or from the decision of building it? And how big was the team building it?

1:04:17

Speaker C

The team was really small. It was just a few people. For a long time we felt that there is some product to be built for non engineers. The reason we felt this is for a long time, people that were using quad code are non engineers. And so, you know, in the product world, when you see latent demand, you see people jumping through hoops to use a product that was not designed for them. That's a really good sign. It's time to build another product that is built just for them. There's all these people on Twitter that there's this one guy that was using quadco to like monitor his tomato plants. I just, I vault this. It was like he had a webcam set up and Quad was like, oh my God, I'm so happy that our plant is budding because it had a webcam and just every day it was monitoring it and it was so happy that the tomatoes were growing. There was someone that was using QuadCode to recover photos off of a corrupted hard drive and it was like his wedding photos. Like I said, our entire finance team at Anthropic uses quadcode. Our sales team uses quad code. So there's just all these people that are non engineers that were using it. And at that point, quadcode, it's available in a lot of form factors, right? Like we started in a terminal, then we expanded and we added support for IDEs. So we have extensions for, you know, every VS code based IDE, every JetBrains based IDE. There's also iOS and Android apps, there's the desktop app, there's web. So then there's like Slack and GitHub apps. So we kind of expanded to all these places to make quad code easier for engineers. But ultimately none of these are built still for non engineers. And so cloud code evolved a lot, but it still felt like there's kind of a gap and there's a product that could make this even easier for people. And so for the last couple months, the team was kind of hacking around and just seeing like what is the right product. And at some point someone came up with this idea of like, what if we just take quad code, add some guardrails. So for example, coworkships with a virtual machine. This is one of the many ways that we make sure it's really Safe, especially for non technical users that don't want to read like bash commands to figure out what it's doing. And they were hacking on this. I think it was something like 10 days end to end or something. It was just fully built with quad code and then we shipped it.

1:04:45

Speaker B

And can you give us a sense of the complexity behind an app like this and if we can walk through what parts needed to be built? Because from the outside it's a little bit hard to tell. Like, is this just a nice UI wrapper that's, I don't know, like a few hundred lines of code? I'm just being obviously provocative here or behind the scenes. It's actually a really complex piece of software. The reason I ask is like Uber is a great example where people look at the app. It looks really simple. I've worked there and I know it's really, really complex because you don't see a lot of the complexity. There's a lot of regional things, there's a lot of backend things that are all hidden. So from just from looking at it, Claude Cowork, it's hard to tell how much of this is additional business logic that needed to be carefully thought out versus it's actually just a nice little thin wrapper on top of the model.

1:06:55

Speaker C

In some places I think there's less complexity than you would. In some places there's more complexity. So on the product side, it's quite simple because it's just the Quad desktop app. So you know, you download the Quad app, it's, it's a single desktop app. It has a tab for Cowork, it has a tab for code, it has a tab for chat. So it is just one app and we're able to inherit a lot of that product logic. There's some UI rendering code under the hood. You know, it's just the same quad code running. It's the same Quad agent SDK that powers quad code. A lot of the complexity actually is about safety because we know, like I said, we know the user is non technical and so we just want to make sure they have a good experience. And so for example, if someone launches the app and then you know, like they delete a bunch of family photos, that's really not good. And so we wanted to make sure that we protect against this so you can't accidentally do that. And so that's where a lot of the guardrails came from. So there's a bunch of classifiers running on the backend. This is for safety. And again, extra mitigations for things like prompt injection and, you know, risks like this around security. On the front end, there's an entire virtual machine that we ship. There's a bunch of operating system system level integrations to make sure people don't accidentally delete things. So just around safety, there's a lot there. And then we also had to rethink the permission system because we inherit the permission system from QuadCode but also for Cowork, actually a big part of the value is not just running locally, but it's using all of your tools the way that Quad code uses it. But the thing is, for non technical users, your tools aren't really available as CLIs. Some of them are available over MCP, many of them are available in a browser. And so Cowork is really, really good when you pair it with a Chrome extension. And this is the way that I usually use it. So, you know, for example, I use it every week to do project management for the team. We have a spreadsheet that tracks kind of at a really high level what everyone's working on. And this is kind of my personal way of project managing. You know, other people, like I said use, use Asana, other people use notes or whatever. For my own tests, I don't use anything but kind of for the team overall, I have the spreadsheet and I have Cowork kind of check in and I just ask Cowork every week, hey, can you look at the rows for any status that has not been filled out? Can you just ping the engineer on Slack? And so it'll open one tab in Chrome for the spreadsheet, it'll open another tab with Slack and then it'll just start messaging engineers in Slack and it just one shots it. There's like one engineer's name for some reason it can't autocomplete, but everything else it just gets. And so this is actually like from a safety point of view. We also thought pretty deeply about this Chrome extension, how this works and how the permissioning model should interact with this local permissioning model. So there's also a bunch of code to kind of make sure that that feels smooth.

1:07:42

Speaker B

And what's the tech stack behind this? I assume a lot of will be similar to the Cloud app, but is it Electron, Typescript, those kind of things or something else?

1:10:15

Speaker C

Yeah, just Electron and Typescript. Actually some of the people working on it are early Electron folks. So Felix, who's the creator of Cowork, he was a really early engineer on Electron and he helped build it.

1:10:23

Speaker B

Oh, amazing. And Cowork launched macOS only. What was the reason for both for choosing this platform first and for now only choosing this platform.

1:10:35

Speaker C

Yeah. So Windows coming soon. I think probably by the time this podcast comes out, we will have Windows support. We just wanted to start early and start learning everything we do at Anthropic. It's kind of like the way that I told my own story. One of the things I like about Anthropic is it just really, really matches the way that people hear or think about it back to this point where we don't have high certainty about the things that we build and our intuition is often wrong. And so we just have to learn from users and figure out what people actually want. And you spend a lot of time listening to people and understanding the feedback deeply. This is the way that we build product. And so we always launch a little bit before it's ready. We did this for quadcode. When we launched quadcode initially it didn't even support Windows. Also it didn't support, you know, like a lot of different stacks. And then over the coming weeks we added support for every stack. Now QuadCode supports every single stack. You know, like Windows, whatever. Weird. Linux distro. You use macOS. We support everything. And so for Cowork also, we just wanted to launch early. We wanted to start with Mac as that was just the starting point. But yeah, it's going to support everything.

1:10:46

Speaker B

One thing you mentioned is getting feedback. I'm curious both for cloud code and for cloud Cowork, how do you go about things like observability monitoring when you're rolling out? Do you use any feature flags? And I'm more interested in like, did you build custom tools for this or did you decide to use certain vendors? Because especially for observability, I'm sure that this is both important, but it also sounds like pretty high scale in terms of the number of users that we can derive or this will not be a small operation.

1:11:50

Speaker C

Yeah, there's some off the shelf vendors that we use. There's some custom code that we use. So it's actually, it's a mix of both. There's nothing too surprising about it. There's one thing about Anthropic that's kind of interesting is because we're an enterprise company and we care a lot about privacy and security, we can't see people's data. And so, you know, like, if someone reports a bug, like I actually can't pull up your logs to kind of see what's going on. A lot of work goes into kind of figuring out how to log events and things like this in a privacy preserving way. This is just very important to the way that we operate.

1:12:20

Speaker B

For Cowork. What kind of learnings have you had so far? It's been out for I think a few weeks now. Did you see something unexpected? Are you shaping the product based on feedback that you're getting?

1:12:50

Speaker C

Yeah. Every day the team is landing so many fixes. The most surprising thing is just how much people are loving it. To be honest. When quadcode first came out, it actually wasn't an overnight hit. This is something people think it was, but it was sort of a slow takeoff at the beginning. And I think the first big inflection was in May when we released Opus 4 and Sonnet 4. That's when it really clicked and that's when our growth became exponential. But at the beginning it was sort of a research preview. People didn't really know how to use it. Some people got it immediately, but most people didn't. It took a little while for Cowork. It's a much steeper growth trajectory than QuadCode was at the beginning. So it's just been an instant hit and that's actually been very surprising. I didn't really expect that.

1:13:02

Speaker B

One of your new releases, which came out just very recently, it was I think yesterday or the day before when we're recording this podcast, was agent teams. And as I understand the idea with what agent teams, agents forms instead of single agent, you can have a lead agent and it can delegate to its different teammates. How did you start experimenting with this and how did you decide to ship it?

1:13:43

Speaker C

Now we're always doing experiments, right? There's all sorts of ways to get more mileage out of Quad code. One way you can do it is by extending context. Another way is auto compacting context. So it's essentially infinite context. And that's what we have right now. Another way is using subagents. So you have multiple agents kind of working together. There's just like a lot of different approaches to get a little bit more mileage out of the context window. There's this one idea called uncorrelated context windows. That's what we call it. And the idea is you have multiple context windows, but they essentially start fresh so they don't know about each other. And so an example of this is like a correlated context window is if you have the model and it does a task and then you have it just do a second task in that same context window. And in this case the second task knows about the first one, because it's in the same window, but for something like a subagent, it's uncorrelated because the main agent prompts the subagent. But the subagent's context window is fresh. Besides that prompt, it doesn't know what's in the parent context window. And you can see this actually a little bit in, for example, like subagents versus skills, because when you run a skill or slash command, it sees the parent context window versus for a subagent, it doesn't. So it's uncorrelated. There's some cases where you want that context. There's some cases when you don't. And there's this kind of interesting thing where uncorrelated context windows and just throwing more context at the problem and throwing more tokens at it when the windows are uncorrelated gives you better results. It's actually a form of test time compute to do this. And for something like teams, we've been experimenting with this for a while, I think since maybe like October or September or something like this. And it really just felt like with Opus 4.6, it clicked where the model figured out really how to use this. And sometimes you see these kind of cute exchanges where the agents are talking to each other and they're like discussing something. And it's just very cool to see. It's very humanistic in a way. But there's other times where you just get very good results. And so we had a bunch of internal evaluations, for example, where we have quad build something very, very complex, something more complex than what a single quad would build. And we saw the results just really, really improve with Opus 4.6 with Teams. And that's why we felt it's the right time to release it. We also wanted to be careful. And the reason you have to opt into it, the reason it's a research preview, is it uses a ton of tokens because it's just a bunch of quads that are running. Not everyone wants this all the time, so just excited to see how people use it and, you know, to hear the feedback. It's something you want for fairly complex tasks. You don't probably want this for every task. The main CLAUDE decides the roles for the sub quads. We don't have kind of a regimented way to do this. It's context specific. I wouldn't say there's one right way to do it. I think actually a lot of the magic of this comes out of this idea of uncorrelated context. Windows. It's less about the specific configuration of the agents, but it's something that people should experiment with. I don't think there's a one size fits all.

1:14:07

Speaker B

Have you seen use cases even in even. I know it's still research, but have you seen use cases where it looks promising? This approach, the swarm approach?

1:17:01

Speaker C

Well, like I said before, plugins were fully built with swarms. There's a bunch of other features since that were built in this way. So yeah, I think for anything where you see a single Claude struggling, the swarms can help.

1:17:10

Speaker B

It's an interesting tool to look at. Talking about change in general with Andrew Karpathy. You had a really interesting exchange back in December where when he posted that he's never felt as much behind as a programmer as he is now because of the progress with AI, and then you shared the story about how you started to debug a memory leak the old fashioned way. And then Claude just one shot at it. I think it was a reflection of how everyone is feeling that things are changing so fast. And in the holiday break I started to feel that things have really shifted. How did you, I guess come to terms with this or start to embrace this change?

1:17:21

Speaker C

This is something I really struggle with. The model is improving so quickly that the ideas that worked with the old model might not work with the new model. The things that didn't work with the new model might work or with the old model might work with the new model. And it's weird because there's just not a lot of other technologies like this. So I just don't really have a lot of experience to draw on to figure out how I should approach this. And it's been this new skill that I've had to learn. In a way, it's like you just always have to bring this beginner mindset. Honestly, I'm using the word humility a lot, but you always just have to bring this kind of intellectual humility because just all of these ideas that were bad before are now good in the inverse. I think that's honestly it. It's something I constantly have to remind myself about. And back in the. It's funny back in the old world, when someone tries an idea again and we've tried it in the past and it didn't work, usually the feedback is like, why are you doing this again? Yeah, yeah, you should run.

1:18:00

Speaker B

I mean, we used to call it a bit of a gatekeeping, but it was somewhat valid where I know with architecture, someone came and said like, why don't we do Microservices and someone said, we tried it and it didn't work. And if you tried it a year or two or three years ago, it was kind of valid, right, because not much has changed.

1:19:02

Speaker C

Yeah, that's right. That's right. And something like Microsoft is funny because it's like every 10 years it goes in and out of style. But yeah, now it's, I think, the first time ever where it's actually not crazy to just try the same idea every few months because the model improves and it just works. And I actually see this with engineers on the team, people that are newer to the team, people that are newer to engineering, sometimes do things in a better way than I do and I just have to look at them and I have to learn and I have to adjust my expectations. An example of this is when we release features. Sometimes I'll screenshot myself using them on X or on threads or whatever, just to kind of talk about it. But recently Tarik, our Devreau guy, he actually codes a lot. He's amazing. And he just started automating this so he's having quadcode generate its own videos for, for its launches. And he just started doing this. And this is something I thought would be maybe it's possible. It's not something I would have tried because I wouldn't have thought the model was ready, but he just did it and it just kind of worked.

1:19:16

Speaker B

One thing that I felt just a bit odd about, and I think a lot of developers can relate, is I've come to terms with this. Starting from Opus 4.5 and also similar models. I think GPT 5.2 gave me similar vibes as well. The models have been just really good at writing code and I realized that I don't think I will handwrite the code when I'm get I. When I want to get stuff done. If I actually want to, you know, get the pleasure of writing, I can still do it. But one thing I reflected on is it's just been so much effort to get good at coding. I remember when I, when I was learning, when I started from like kind of hacking around to going to university to learning C and C and it was just bloody hard. And actually going through my first few jobs where I started to become better at it. I became better at debugging and there's a point where a lot of my identity was tied to being good at coding. That's how we used to get jobs or higher paying jobs when I was an engineering manager. When we designed the interview loop at Uber we had talked with managers of what we need to screen for, and we talk like, well, what do developers do most of their time? About 50% of the time, they code. Therefore, we placed about 50% of the signal was all about coding. So there was a lot of things that tied into coding because it is just hard. I think we all know that it takes grit, it takes some level of intelligence to get good at it. And there's a sense of loss of, like. Well, I think it's great on one end that the model can do it, but it feels that something really quickly got taken away that I don't think. I personally thought it would happen this quickly. And I think a lot of other people are feeling. Some people move on a bit easier, but there's definitely the sense of. Of grief. How did you think about it? Because, again, you're an example of you wrote so much code at Facebook, also outside of it. I know it was just a tool of doing it, but not many people could do what you did. And now the models can also work as good as you have, or if not better.

1:20:17

Speaker C

That's the challenge. Yeah. I think it's something that used to be a thing that we do as software engineers. It's becoming a thing that everyone is able to do. There was a moment when I started coding. It was a very practical thing, and it was a way to get things done. And at some point, I just fell in love with the art of coding and, like, languages and kind of the tools themselves. And at some point, I kind of fell down this rabbit hole. I wrote this, like, I wrote a book about, you know, a programming language, Typescript.

1:22:15

Speaker B

You wrote the first ever typescript book with O'Reilly.

1:22:45

Speaker C

Yeah, yeah, yeah, that's right. It was funny, actually. There was this, like. There's this amazing moment for me in my little town in Japan. I went to the bookstore and I found that book translated. In Japanese.

1:22:49

Speaker A

No.

1:22:59

Speaker C

In this tiny town. And that was just, like, the coolest moment. And then I actually realized I don't remember Typescript at all, because I was only writing Python for a couple years at that point.

1:23:00

Speaker B

Yeah.

1:23:09

Speaker C

And, like, at some point, I started the first. The biggest Typescript meetup in the world that was in sf. And I got to meet kind of a lot of my heroes. There was, like, Chris Kowal, who wrote, like, General Theory of Reactivity. There was Ryan Dahl, the guy that made Node. One of the first times that I went really deep into this community and just the language itself and the tools themselves. And for something like Typescript, there's this beauty in the type system because Heilsberg is just like, he's just brilliant. The idea of conditional types and just anything can be a literal type. And there's these very deep ideas that even the most hardcore functional languages do not have. Even in something like Haskell, it doesn't go this far. And Anders just took it and he pushed it much further than it had been pushed. And Joe Pamer and a bunch of other folks kind of explored a lot of these ideas and thought of this. And I think for them it was also very practical because they had these large untyped JavaScript code bases. Had he gradually migrated to something typed. And you have to come up with this very beautiful ideas to do this. For me, Scala was another kind of rabbit hole that I fell into in kind of like this functional programming world. And still when I write code and when the model writes code, I always think in the types first. That's what matters, is what is the type signature that matters more than the code itself. And getting that right. So there is this beauty to it. There's an art to it, for sure. But in the end, it's a practical thing. And in the end, this is a thing that we use to build things and it's a means to an end. It's not an end to itself. I think one metaphor I have for this moment in time that we're in is the printing press in the 1400s or whatever, because at that moment it was actually quite similar. There was a group of scribes that knew how to write.

1:23:09

Speaker B

And it was, as I understand, of course, we never lived it, but as I imagine, it was hard process to learn. You needed to learn, you needed to get the equipment, you probably needed some sponsorship or being selected, practicing, because you needed to produce the same thing over and over again. And few people could do that. And I assume it was either high prestige or highly paid, or who knows, let's assume it was. But then the printer press came along.

1:25:02

Speaker C

Yeah, yeah. And at least in Europe, like you had to, like a lord or a king or something had to employ you. And then you had to go through, you know, years of training and there was this class of scribes that knew how to write. They were employed by someone like this. Often the king themselves or the queen was not literate. So it was this very, very niche skill. And it was like less than 1% of the population was literate in Europe back then. And then the printing press came out and what happened? So the cost of printed material went down, something like 100x over the next, I think 30 years, 50 years or something, the quantity of printed materials went up like 10,000x in the next 50 hundred years. This was the first effect, literacy, it took a little while for it to catch up. So I think global literacy, it went up to something like 70%, but that took another 200 years, 300 years. Because learning to read is just very hard. Learning to write is hard. It takes a lot of effort. It takes education system, it takes infrastructure to have paper and ink and the free time to do this instead of working on a farm. So it took early stage of industrialization to actually get there. But I think this effect of making it so this thing that was locked away in ivory tower, now it's accessible to everyone. This was just none of the things around us would exist today without this if we weren't literate. If the people that built this microphone weren't literate, it would have just been very hard to have a modern economy. None of these things would exist. And I just kind of think about back then, if people had to predict what would happen when the printing press came out, no one would have predicted that the microphone would become a thing. So I just feel like this is the best analog for the moment that we're in right now.

1:25:26

Speaker B

It's interesting that you say that some of the kings were illiterate who were employing the scribes, because if we're being honest with ourselves, we have business owners who know what they want to build and they are employing software engineers because they themselves cannot write code. And I think we like to mock the CEOs who are coming there, coming to the team. They might even have a drawn prototype or whiteboard and saying this should be easy. But of course they don't understand how difficult it is. But there seems to be a bit of analogy where there's a person who wants what they want. But until now they needed to hire a specialist who can build that. And there's always that disconnect between the idea and the person. And just like with the printing press, what would happen if they could actually express and the king could actually read or write their own letters? They wouldn't need that middleman and things become more efficient. I mean, of course for the scribe, it's not the best news necessarily, but I mean, smart scribes can also do so someone needs to write the books, run the press, etc.

1:27:13

Speaker C

Yeah, exactly. And if you think about what happened to the scribes, they ceased to become scribes. But now there's a category of writers and authors. These people now exist, and the reason they exist is because the market for literature just expanded a ton, I guess

1:28:14

Speaker B

also, if we think about back then, a scribe's work was read by a few people and with the printing press and author, there's a lot more authors. And some of them are not really read, but some of them have wider reach than they could imagine. There's new careers that exist because of that. Yeah, I love the analogy.

1:28:30

Speaker C

And the most exciting thing for me is it's just so impossible to say today what will happen after this happens and after this transition happens. Just, you know, the economy as we know it would not have existed without it. So what's next? Like, what is the thing that we can't even predict today that will exist because anyone can do this?

1:28:47

Speaker B

Well, we cannot predict, but I think we can look at what is working right now. If you look around in your environment, may that be the team across Centrophic who are software engineers or builders or members of technical staff, however we call them, who to you are standout, what are they doing, what skills have they built up and how have they changed the way they work?

1:29:13

Speaker C

It's hard to name individuals because honestly, this is just the strongest. These are the strongest people I've ever worked with in my career. There's all sorts of different archetypes. There are some people that are really amazing prototypers. So take something from 0 to 0.5, just figure out what are some cool ideas, what does the technology unlock? There's other people that are amazing at finding product market fit. So kind of 0.5 to 1 or maybe 0 to 1. There's other people that span different disciplines and I'm just seeing more and more of these people, like I said, people that span product engineering and infrastructure engineering, or product and design, or design and engineering. I think I'm just seeing a lot more of these hybrids.

1:29:34

Speaker B

What's a belief that changed from last year to this year? Something that you either believed or a conviction that you had that you've either revised or completely threw away?

1:30:15

Speaker C

I think one thing I wasn't sure about is how big a problem is safety. To be totally honest, I joined Anthropic because, like I said, I read a lot of sci fi and I know how bad this thing can go if it goes bad. It wasn't something I was sure about, but seeing it from the inside and then seeing how the new risks that have arisen in the last year, it just makes me much, much more worried about it. So I think it was kind of an important thing for me now, it's just the most important thing for me is how do we make sure this thing goes well.

1:30:26

Speaker B

I think it's safe to say you were a really great software engineer even before all the AI things started. And you seem to be a very productive engineer. Of course, part of a team as well, but also individually. What are some skills before being a software engineer that are still as valuable or maybe even more valuable than before? And what are ones that are maybe just not as much and they're best left behind probably.

1:30:59

Speaker C

Okay, so stuff that's left behind is left behind, is maybe very strong opinions about code style and languages and things like this. I can't wait to get past these endless language debates and framework debates and all this stuff, because the model can just use whatever language and framework and if you don't like it, it can just rewrite it for you. So it just doesn't matter anymore. I think something that still matters a lot today is it's being methodical and hypothesis driven. This matters both in product design. In this world where everything is being disrupted and we need to figure out what to build next. And this is something everyone is thinking about. But it also matters for engineering day to day, something like debugging, you just have to be very methodical about it and the model can do this and it can help a lot. But I think still we're in this transition point where you still need to have the skill. I don't know if you're still going to need to have it in six months. Other skills that I think are more valuable are being curious and being open to doing things beyond your swim lane. So if you're working on engineering but you really understand the business side, you can just build really awesome products. And I think the next billion dollar product after quadcode, whatever the next startup is that becomes the next trillion dollar startup, it might just be one person that has some cool idea and their brain just is able to think across engineering and product and business or design and finance and something else. People are going to become more and more multidisciplined and this will become more and more rewarded. So in some ways I think this will be the year of the generalist. I think the other skill that's actually been rewarded a bit is having a short attention Spanish.

1:31:25

Speaker B

I was being rewarded now. Oh yeah,

1:33:10

Speaker C

teenagers are using TikTok and all this stuff. I think in some ways it's kind of dangerous for society because you want people that can think deeply and can contemplate ideas and aren't just moving on to the next idea very quick, but in some ways I think this year is the year that is going to reward. It's like the year of adhd, because the work for me has become jumping between quads, it's become managing quads. And so it's not so much about deep work, it's about how good am I about context switching and jumping across multiple different contexts very quickly.

1:33:15

Speaker B

Could I add that from what all you said? Maybe you could add one thing, which is adaptability, because you're saying of course that ADHD and you can jump across. But of course earlier you are very good at focusing deeply on one thing as well. And what strikes me about you, and maybe this is true for other people as well, you're just kind of very open to adapting your working style and seeing what works well for this stage, especially when things are changing. I think the one certain thing we can be sure is whenever the next model comes out, it'll change again. And you need to be curious and open to adapting how you work. Right?

1:33:51

Speaker C

Yeah.

1:34:24

Speaker B

And as closing, what's a book or books that you would recommend?

1:34:25

Speaker C

I've gone down a Cixin Liu rabbit hole. So he's the three body problem guy, but he actually has a lot of other really great books. I really love his short stories. He has a couple books of short stories. I'm a big fan for people that are new to sci fi and you want a little bit harder sci fi. I really love Accelerando by Strauss. This is a book I would totally recommend. It's essentially the product roadmap for the next 50 years. It starts with takeoff starting to happen and AI singularity. And then it ends up with this kind of group lobster consciousnesses orbiting Jupiter. And it's just amazing. And the thing that I think it really captures is just the pace, this quickening, quickening, quickening pace of how this feels. It really matches the feeling right now. And then on the technical side, I would strongly recommend functional programming in Scala, even if language choice just doesn't matter as much anymore. I think there is this art to functional programming that just teaches you how to code better and it'll just teach you how to think in types. If you read this book, I think what's really important is to do the exercises also. And I've gone through and I've done all of them probably like three times over. And it's just amazing. It really just like knocks this idea of functional types into your head and it's just a thing you can't stop thinking about Boris.

1:34:28

Speaker B

Thank you so much. This was awesome.

1:35:45

Speaker C

Yeah, thanks Gerge.

1:35:47

Speaker A

This was a really interesting conversation. And the thing that I keep coming back to is to Boris. Printing press, press analogy. The idea that medieval scribes were this tiny elite who could write, employed by kings who themselves were often illiterate, and that we soft rangers might be in a similar position today. We are the scribes. We spent years mastering this craft and now the printing press is arriving. But what Boris told me is that the scribes did not disappear. They became writers and authors and the entire market for written work expanded beyond anything anyone could have predicted. I do find this hopefully hopeful and also appreciate that Boris didn't sugarcoat it. The other thing that struck with me is just how differently the Claud Code team Builds software. No PRDs, no mandatory ticketing system designers and data scientists and finance people all writing code and building dozens or hundreds of prototypes before shipping a feature. And Boris is shipping 20 to 30 pull requests a day without editing a single line by hand. And there are different verification systems in place, clock call reviewing its code, automated lint rules, best of n passes, and human code review. If you've enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube. A special thank you. If you also leave a rating on the show. Thanks and see you on the next one.

1:35:49