"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

"Descript Isn't a Slop Machine": Laura Burkhauser on the AI Tools Creators Love and Hate

83 min

• May 6, 2026

Summary

Laura Burkhauser, CEO of Descript, discusses how to build AI-powered creative tools responsibly without enabling content spam. She defines 'slop' as mass-produced content optimized for algorithmic engagement rather than artistic merit, and outlines Descript's strategy for choosing generative models, building proprietary AI where they have unique data advantages, and designing pricing that doesn't punish users for AI feature usage.

Insights

Slop is defined by incentive structure and scale (algorithmic arbitrage at volume) rather than quality alone; bad art is a necessary learning phase for creators new to any medium
Creator hostility toward AI features exists on a spectrum: deterministic tools (green screen, studio sound) are universally loved, agentic editing (Underlord) is polarizing due to quality gaps, and generative video faces visceral resistance due to hype-driven messaging and poor user experience
Descript's defensibility lies in proprietary models for augmented recorded media (voice cloning, retake removal, jump cut smoothing) where they have unique editing data, while borrowing frontier models for pure generation through FAL
Aesthetic judgment and human evaluation remain irreplaceable for model selection; vibes-based assessment by trusted evaluators outperforms automated metrics for creative outputs
The future of content will be shaped by artists adapting to new tools in unexpected ways, similar to how painting evolved after photography, rather than by economic logic alone driving infinite slop production

Trends

Shift from general-purpose AI models to narrowly-scoped, task-specific tools that creators can trust and control
Emergence of outcome-based pricing models replacing usage-based billing as AI costs stabilize and models improve
Growing importance of multimodal understanding (video + audio + text) as core capability for agentic video editing
Creator preference for AI tools that augment human expression rather than replace it, driving demand for editing assistance over content generation
Rise of AI agents as team members in creative workflows, requiring universal design principles where agents and humans access identical tool sets
Increasing need for aesthetic judgment and expert human evaluation in model selection as benchmarks become insufficient for creative applications
Expansion of AI editing tools beyond single products into broader agent ecosystems via APIs and MCP integrations
Persistent cost challenges in AI-powered software requiring creative pricing strategies that don't penalize exploration and experimentation
Cultural resistance to generative media driven by both technical limitations (poor consistency, difficult UX) and threatening rhetoric from vendors
Decoupling of AI feature quality perception from actual capability; users frustrated by overhyped tools that don't deliver promised ease-of-use

Topics

Defining and preventing content spam in AI-generated media
Model selection and evaluation for creative applications
Proprietary vs. frontier model strategy for AI products
Agentic video editing and AI co-editors
Multimodal AI understanding for video content
Pricing models for AI-powered software features
Creator attitudes toward generative AI tools
Aesthetic judgment in AI model evaluation
Voice cloning and audio regeneration ethics
Jump cut smoothing and video editing automation
API design for AI agents in creative workflows
Labor displacement and job transition in creative industries
Artistic adaptation to new creative technologies
Underlord API and agentic editing capabilities
Deterministic vs. generative AI feature design

Companies

Descript

AI-powered video editing platform; subject of episode discussion and Laura Burkhauser's company as CEO

Anthropic

Frontier AI lab; Claude model discussed as potential backbone for Descript's Underlord agent

OpenAI

Frontier AI lab; GPT models considered for agentic editing; mentioned for autonomous researcher timeline

Google

Frontier AI lab; Gemini and Veo models evaluated as defaults for image generation in Descript

FAL

Model provider platform; Descript's primary integration partner for accessing generative models

Runway

Generative video model provider; mentioned as customer of Sequence billing platform

Midjourney

Image generation model; CEO cited for aesthetic judgment approach to model curation

Kling

Generative video model; mentioned as specialized tool for specific video generation use cases

Nano Banana

Image generation model; Descript's current default for photorealistic image generation

Seedance

Generative video model; discussed as potential default but considered too opinionated for B-roll use

Waymark

TV commercial creation platform; host's company; discussed as example of aesthetic evaluation challenges

Cognition

AI coding platform; mentioned as customer of Sequence billing platform

incident.io

Incident management platform; mentioned as customer of Sequence billing platform

OpenRouter

Model routing platform; mentioned as customer of Sequence billing platform

Intercom

Customer service platform; mentioned for opening up customer service model to competitors

AvePoint

AI governance platform; sponsor discussing control layer for agentic AI operations

People

Laura Burkhauser

Discusses Descript's AI strategy, model selection, pricing, and vision for responsible creative AI tools

Nathan Labenz

Hosts episode; Descript customer and early Underlord API adopter; CEO of Waymark

Andrew Mason

Descript founder; previous guest on Cognitive Revolution in August 2024; preceded Burkhauser as CEO

David Foster Wallace

Referenced for Infinite Jest's prediction of infinite video consumption and algorithmic manipulation

Benedict Evans

Referenced for framework on software bundling and unbundling dynamics

Quotes

"Descript isn't a slop machine and we don't want it to be."

Laura Burkhauser•Opening email referenced

"Slop is when you can identify a temporary inefficiency in the market to create content that is extremely cheap for you to make that might get you enough revenue or engagement that it ends up being net positive for you."

Laura Burkhauser•Early discussion

"I think bad art is a really important stage that one must go through to get to good art or good content."

Laura Burkhauser•Mid-episode

"My job is to make sure that no matter how good Frontier models get, you have a better experience using Descript than you would with an AI agent alone."

Laura Burkhauser•Strategy discussion

"Art always reacts to technological advances in ways that surprise us. I think like and then you might be like, yeah, art, we're talking about content. But it's like artistic expression kind of content will change first."

Laura Burkhauser•Closing discussion on content future

Full Transcript

Hello, and welcome back to The Cognitive Revolution. Today, my guest is Laura Burkhauser, CEO of the pioneering video editing platform Descript, which originally burst onto the scene in 2017 with its revolutionary AI-powered, word processor-like editing paradigm. And as you'll hear, has continued to push the boundaries of what AI can do for creators ever since. Laura took over for Descript founder Andrew Mason, who was my guest on the show back in August 2024, after serving as VP of product for several years. And as a longtime Descript customer and early adopter of their new Underlord API, I've been impressed both by their customer obsession and product velocity. And so I was genuinely excited to get Laura's take on product management in the AI era. We begin with a remarkable email that Laura recently sent to customers in which she recognized that generative AI is a polarizing topic among creators and declared that, quote, Descript isn't a slop machine and we don't want it to be. For me, this begged the question, what is slop? For Laura, who emphasizes that all creators have to start somewhere and that all new media takes time to mature, it's less about the quality of the content and more about the incentives that drive its creation. In short, it's the mass production of content for the explicit purpose of algorithmic attention arbitrage that she objects to. In this, she's in step with Descript's creator customer base, who she says approach AI with a passionate mix of enthusiasm and hostility. Narrowly scoped, purpose-built, and critically reliable AI tools, such as Descript's studio sound, green screen, and audio overdub features, are pretty much universally loved. Underlord, their natural language instructable AI editing assistant, which I personally do find quite useful. Everyone wants to love, but many still find frustratingly limited. 
And then there are the infamously unruly image and video generation models, which, despite and perhaps in part because of their soaring popularity, are the object of visceral hatred. It's a lot to manage, particularly with general-purpose products like Claude Code accelerating to the point where they're starting to be capable of video editing. But Laura's true north is simple. It's her job to make sure that no matter how good Frontier models get, you have a better experience using Descript than you would with an AI agent alone. To this end, we get Laura's razor-sharp takes on how Descript decides which generative models to include in the product, why they plan to use Frontier models to power agentic editing for the foreseeable future, while also training task-specific models in-house where they happen to have a unique proprietary data advantage. the critical importance of and challenges associated with multimodal understanding, the critical role that expert aesthetic judgment plays in the process of model evaluation and iteration, the product design principle that says AI assistants should be able to do everything that human users can and vice versa, how Descript is designing the Underlord API to be hired by coding agents, and the pricing and design challenges that arise when a single button click or API call can consume multiple dollars worth of credits. Finally, we take stock of where we are in the big picture. Laura emphasizes that while economic logic might dictate a future of infinite slop, artists have a long history of adapting to and incorporating new technologies in unpredictable and often defiant ways. And so she's betting that our cultural reality will be far more vibrant than our black mirror fears. With that, I hope you enjoy this extremely insightful conversation about managing both AI products and customer bases with Descript CEO, Laura Burkhauser. Laura Burkhauser, CEO at Descript. Welcome to the Cognitive Revolution. Hi, Nathan. 
Thanks for being here. I'm excited for this conversation. As long-time listeners know, we are Descript customers and use Descript to help produce the podcast. And this is actually the second CEO of Descript episode on the Cognitive Revolution, although the person holding the seat has changed. And so you're new in the role. That's right. I looked back at a couple emails that you sent, one that I think you sent kind of right after taking over. And you wrote something I think will be a really great jumping off point for us, which was, Descript isn't a slop machine and we don't want it to be. So how do we keep building generative AI features without surrendering to the slop? Or is it impossible? That's, I think, going to be a defining question, honestly, of like how people, even in the big picture, like spend their time over the next few years. So I want to just start off with the kind of at least first question that comes to mind for me. What is slop? How do you know it when you see it? Yeah, and I might define it differently than other people. So to me, I think about slop as being a form of content arbitrage. So it's when you can identify a temporary, almost like inefficiency in the market or opportunity in the market to create content that is likely to give you a return on your investment. And in this case, it's like that you can pump the system with a lot of content that is extremely cheap for you to make that might not get a ton of engagement or might not get like a ton of subscribers, but gets you enough revenue or engagement from that content that it ends up being net positive for you.
And so there are people that can identify these kinds of slop arbitrage moments and really take advantage of it but the two key elements to me of slop are the incentive is money in some way ultimately and it is happening at scale i think there's a lot of bad art out there and uh i would say like generally i'm pro bad art like i think bad art is a really important stage that one must go through to get to good art or good content. I don't know if you can remember the first few things you put on the internet, Nathan, but my guess is that you're like, you would cringe if you looked at them now, knowing how sophisticated you've become in your creations. And so I guess like to me, there's a difference between slop, which is like, I'm trying to pump, I'm trying to juice the algorithm. I'm trying to pump YouTube full of a bunch of avatar meditation videos in this moment when it hasn't caught on so that I can get some ad revenue real quick. And like, oh, maybe I should be like a meditation guru on YouTube. I'm going to create this avatar. And, you know, like that's a different thing. And then the result may be the same, but I don't think that's slop. That's just someone's bad. That's just someone's bad idea. Yeah. Interesting. So if you're feeding the algorithmic hogs, you are producing slop. I wonder, though, and by the way, if the comment section is to be believed, I'm still in the bad art phase of my own personal development. So we'll see if that ever kind of still went. And, you know, one thing I will say about AI is it is allowing me to create stuff that I don't think is terrible, at least. And that I enjoy the process of creating in ways that I just never would have had any opportunity to do before. And so for now, I'm ignoring the haters in the comments who are almost uniformly opposed to my new AI-generated YouTube preview art. but I just figure, you know, I'm having fun doing it and I kind of like the look of it. So for now we'll keep going. 
Maybe it'll evolve and I'll finally- That's exactly what I mean though. That's like exactly what I mean. Like, okay, if you were learning how to paint for the first time and you sat in front of a canvas and you painted me a picture, it would probably be really bad. Like you haven't figured out what your voice is, what your aesthetic is, like what feels good to you. You haven't taken any classes. You haven't played with the paint, right? That's your very first thing. This is a new medium. A lot of the, like we're having fun with a new medium is how you get to good stuff, right? Like I'm sure that you're finding that as you play with this stuff more and more, you are starting to have your own opinions. Like, oh, I don't like this or, oh, this is working. This prompt is working better. I'm liking what I'm getting more. You're going out into the world, you're seeing other people who are doing stuff, and you're like, I like that person's style. Like, how can I show up in that same way that that person does? And that is the same way that if you were learning how to paint, you would be developing your painting style, right? So I also am not a hater who thinks anything created with Gen.AI is slop. I think that because this is a new technology, most of us are in our create a lot of bad stuff kind of phase. And also because the technology is still nascent. But I am very bullish that it will be possible and already is to create really awesome stuff with Gen AI, like stuff that is not sloppy at all. And I think the only way you're going to ever get there is by, yeah, creating a lot of bad stuff first, because that's how you get good at everything. That's how you get good at art. Yeah, no doubt in my mind that good quality stuff can be produced with the new Gen AI tools that my creative teammates at Waymark are like, just clearly head and shoulders above me in terms of their ability to do that. 
And it's a little bit hard sometimes to put our finger on exactly why they're so much better. But I think it's pretty undeniable that there is still like a significant skill gradient in terms of what people can do. And they've also put in the time to a much greater extent than I have, even still. And I say that as an early adopter and enthusiast of just about everything AI. Yeah, and they're also fighting their tools right now. Like right now, I think that there are two things that are really blocking us from seeing a lot more good Gen AI art or content or whatever you want to call it. And the first is that the technology is just not quite there yet. And that's like, it's getting better, but no one would actually choose to make a video by generating five to 10 seconds of it at a time. And like crossing their fingers that the voice consistency between clip one and clip two is good enough that no one notices or deciding we're just not going to have voice in video. Like right now, there's all kinds of constraints. You have to generate something like 50 times sometimes to get it exactly the way that you want it. And so if you're making good gen AI stuff right now, it's because like you're super invested in the medium and you want to fight your tools the entire time. And then I think the other reason we're not seeing a lot of it is because there's a lot of stigma right now for people with the kinds of taste and skills to be able to make good stuff. There's a lot of stigma for them to be using these tools and publishing and owning that they're using them. And so when you go on X or you go on like wherever it is that you're consuming social media, a lot of the people that you're seeing using these tools are people who are earlier in their journey of knowing what good looks like visually. And so like it is bad. 
And I think the average person who's never studied, whether by study I mean in school or in just experience, if you've never studied film or you've never studied photographic composition or whatever, yeah, you're going to need to spend some time in the medium before the stuff you make is actually good. And you're not going to know why. You're going to be like, I know this isn't good. I don't know why. I don't have a vocabulary to say why. If you stick with it, it might get better. It doesn't for everyone. But those are the two reasons. It's like right now we don't have the tastemakers really using it. And right now they really have to fight the technology for it to feel good, for them to ever get in a flow state, for it to really feel fun for them. So you just have early adopters that are really digging into this stuff right now. That comment on vocabulary I think is a very apt one. I had just said it's a little hard to put our finger sometimes on why the creative team is so much better than me. But I think that is a huge part of it. And so often, if I send our creative lead something that I'm working on, he'll give me like a few adjectives or an artist name for inspiration, something like that. Or even a, in some cases, it could be like an explicit direction for composition. And that really does usually take it up a pretty clear notch. So I think that's a great detail to highlight. I don't know if it was in the same email or another one, but you described adding generative AI models to the Descript product as a polarizing topic, and I guess a polarizing product move. And in that same email, I think you were recruiting people to join a generative AI advisory committee. I've been broadly very impressed, by the way, with my interactions with them because we're on the API early adopter data list. I think your team is like very plugged into what are people trying to do and how can we help them do it. 
I've been very impressed by my interaction with the people that are building the product. Who signed up for this? What did they want to tell you? What are you hearing from people? And how has your sense of what is making it polarizing and how the sort of discourse around these tools is evolving since you formed that committee? Yeah, it has been fascinating. First of all, there was a huge uptake in that invitation. We got way more people who signed up for it than we could convene in a reasonable way. Although I think I've talked to most folks who signed up for it now. And yeah, so why did I send this email in the first place? Because when I became the CEO of Descript, and I had been the VP of product for several years before that, so I wasn't like brand new to the discourse. But we had just changed our pricing, which was a tough moment. We are a really popular product with a huge and loyal and vocal set of customers that we love dearly. And I'm glad that came through in the way that we've talked to you. We are obsessed with our customers. But we did need to change our pricing, which is always like a kind of moment. And we got a lot of feedback on it. And there was this one type of dual feedback that I thought was really interesting and I wanted to dig in on. and came up a bunch in some form of like, I wish you would stop spending time building AI features and use that time to invest in the core quality of the app. And I'm like, when you see feedback like that a number of times as you're raising prices or as you're changing prices, you should get real curious because there's a lot that could be in that. There are like some dual things. So the first is like, do we have a problem with core quality? And what do you mean by core quality? What is core quality to you? Are we talking about performance? Are we talking about reliability? Are we talking about like upload speed? Are we talking about playback speed? 
And interestingly, and so there were some really good things that came out of that. And I hope folks see that we've been kind of knocking through a bunch of quality things. things. But then there was also a bunch of like core quality, being new features that in fact, to my mind are AI features, right? And so one of the things that became really like clear to me is that when we talk about AI features, different people really mean different things. So Descript is like an AI native product. We've been AI native since like the whole idea of editing a video like a transcript is actually an AI idea. The way that we implement green screen is like using an AI model, like a visual model, the way that we do studio sound, like that's an AI model that we created. We do like voice cloning, regenerate an overdub, like so you can change something that you've always, I recorded the wrong word. Let's just go back and change that and have it in my voice say the right thing. And now it lip dubs and we built that model. And it's like, these are all AI products. And many of the features they wanted us to improve are actually like AI features. And so what I came to understand is that there is a hierarchy of like hostility towards different types of AI features. And that Descript users, at least, love, love a lot of AI features, especially when those features are effects, transitions, things that have a button that do something that feels deterministic to the video. Even if it's powered by AI, green checkmark. Everyone loves it. Keep building that into infinity. Then there's like Underlord, which is our AI co-editor. Underlord is somewhat polarizing. Everyone wants it. People are very excited about speeding up their workflows, their AI editing workflows with AI. And they love the idea of an agentic co-editor that helps them do that. But they're mad that it's not as good as they want it to be, some folks, for some of their use cases. 
And so there it's, ah, like it's not like don't build Underlord, but it's like, why, why isn't it, why isn't it perfect yet? But I do want this. I do want you to build something that helps me get through the drudgery of editing faster. That's the general sentiment on agentic co-editing. Then, okay, so where I saw the hatred, it is really the generative video that is like the polarizing topic. Like to some extent, avatars, not so much voice cloning and TTS. People feel pretty good about that. But it's really like the generative videos. And when I dig into it, I think, and we've talked a little bit about this, so I don't want to belabor it. But there were really like two things that people didn't like that made people mad about this. In addition to some of the other kind of general pause AI, stop AI kind of stuff that's like in the air. But like strictly speaking from a creator perspective, I think there's like a, I feel like I'm going insane because everyone is telling me that this stuff is super good and it sucks and I hate working with it, which I think is a very reasonable perspective for the average creator to have. Because like I said, I think right now to have a good experience, you've just got to be really invested in the model because the technology is hard to use. And we don't talk about that enough in all the hype cycle. And so people feel like they're sold this story that this stuff is amazing and incredible in the future and they use it and they're like, what's wrong with the world? Like, I feel like I'm taking crazy pills here. So that's kind of one part. And then I think there is like this idea, like along with the hype cycle, part of this discourse is like, how many times have you seen someone say something like, Seedance just put a gun to Hollywood's head and pulled the trigger.
And it's like, yeah, okay, well, if that's how you're gonna sell your technology, the people who you just said got a gun put to their head aren't really gonna like or be excited to use the technology. And like, that's not how I talk about this stuff. That's not my perspective, that this stuff is going to end the role of traditional film, that it's going to end recorded media, that it's going to like put everyone who works in traditional media out of a job. Like personally, I just don't believe that. I think this is a new tool, like many new creation tools that we have gotten over the past. Like film was a new tool, and this is like generally pretty exciting. And I don't know, is it threatening? I don't know that the main story should be that this is like threatening and going to displace a ton of jobs because it's not clear to me it's going to displace a lot of jobs. I think it may also create a ton of jobs. It may shift jobs. But this is simply a creative tool that we ought to be approaching with fun, like a sense of play and curiosity. And instead, because of the discourse around it, it is perceived as being threatening and overhyped. Hey, we'll continue our interview in a moment after a word from our sponsors. Most billing platforms were built to send invoices and assume your pricing is simple and predictable. But if you're building an AI product, a fintech tool, or a developer platform in 2026, your pricing is anything but. Usage tiers, consumption billing, and bespoke enterprise contracts are now the norm, and you're probably managing it all across disconnected tools and fragmented systems. Sequence handles the entire revenue workflow from contract to cash: quoting, invoicing, metering, revenue recognition, plus Sequence agents that automate the manual finance work that usually takes teams days each month while also helping them to collect cash faster.
Companies like Cognition, incident.io, Runway, and OpenRouter use Sequence to run their full revenue process between CRM and ERP without the spreadsheet mess. If your pricing has gotten more complicated than your current billing setup can handle, check out SequenceHQ.com and use the code Cognizam in the source field when you book a public demo to save 20% off year one. Today's episode is brought to you by Anthropic, makers of Claude and Claude Code. Over the last few months, Claude has helped me build and refine a personal deep context database that now contains all of my emails, Slack messages, tweets, DMs across platforms, video calls, and podcast transcripts going back a full five years. On top of that, we've now layered summary articles describing my relationship with hundreds of contacts, organizations, and ideas. And now that this exists, there's almost nothing that Claude can't help with. For tax season, I asked Claude to help me get organized. It went through my inbox, tracked down 1099s for all 10 of my part-time jobs, and built me a comprehensive report on my expenses and donations. For my angel investing, Claude can now draft investment memos in exactly the form that my venture fund requires, based on the calls I've had and the emails I've exchanged with the founders. And when someone needs a favor, Claude can often do it as well as I can. Recently, a friend reached out to ask if I know anyone who might be a fit for a role that he is currently hiring for. Initially, nobody came to mind. But then I thought to ask Claude, and sure enough, it identified two great leads. Claude is the AI for minds that don't stop at good enough. It's the collaborator that actually understands your entire workflow and thinks with you. Whether you're debugging code at midnight or strategizing your next business move, Claude extends your thinking to tackle the problems that matter. So, for problems worth solving, get started with Claude at claude.ai/tcr.
That's claude.ai/tcr. And check out Claude Pro, which includes all of the features mentioned in today's episode. Once more, that's claude.ai slash TCR. I love the emphasis on play. That's one of my most common refrains as well. This technology rewards play, and not just the video or visual generative models, but really all of the current frontier AI capabilities really reward play more than any other technology I've ever used. And that really is the right mindset to go into it. And I couldn't agree more with that. I think there's like six different follow-up questions that I want to ask based on everything that you just told me. And maybe the first one would be, how do you choose which generative models to put into a product? There are obviously many. They have like very different strengths and weaknesses. They have different price points. And there's another question coming on price and how you're thinking about that and managing that. But these things are super hard to benchmark, right? It's not like, you know, when we get to the Underlord portion, I think you'll have a much clearer line of sight to like, is this, you know, new model or new prompt or whatever, like doing what we want it to do in a reliable way for a finite set of understood use cases. With the generative stuff, it's tough. Is it just vibes or do you have a better answer for like how you're figuring out what to actually pull the trigger on moving into the product? Yeah, so there's sort of two stage gates. The first is like, should this be available within Descript? And the second is, should we make this the default model? Because most people are not going to change the default model. They're going to accept whatever you put as the default model, right? That actually might be surprising to you. I feel like that's something that if you're deep in AI, you're sort of like, why aren't you using the model picker?
Obviously, Nano Banana 2 Pro is going to be like the best thing for photorealistic like face swaps, but then you should be using Kling for this other usage, right? That's how people who are deep in think about things. But the average kind of person doesn't have that level of sophistication, doesn't want that level of sophistication. And so like how do we make decisions about default models, how do we make decisions about what models to improve or to bring in, is a little bit vibes, I'm not gonna lie, because it's not like we like eval every single model out there and say like these are the five best or whatever. So it's often things where it needs to be available via, we use FAL as our provider. And so if you're not in FAL, you're not going to be in Descript, because we don't want to build our own custom connector for your thing unless it's like the best thing ever. But that means we need to like sign a new data license agreement and all this stuff that's like, what a headache. We've already done it with FAL. We're just going to do it there. And so that's why like Seedance is now in Descript, is it is finally in FAL. So we're like, great, you can come on in. And then like within stuff that's in FAL, we try to pick the stuff that feels like generally the best or in the game. Because what you see are these standard kind of industry benchmarks of these different things. And you'll sort of like see that you have the same labs on the leaderboard kind of month after month. And so we try to make sure that we have some representation from each of those labs, because you're always like one week away from that lab coming back to the top and having the best thing available. When it comes to the default, that is where we look at external evals, and then we run some of our own on common customer use cases to find out, like, generally, where we think that people are going to have the best experience.
Now we have, like, for image generation, Nano Banana Pro, I think, is our new default. And what we then do is we'll A-B test it against the existing default and make sure that we're seeing kind of good things from the A-B test and that the A-B test matches kind of like what our internal evals tell us. And if it does, then it's like a definite shift. This is our new default. When you do an internal eval, is it a panel of trusted people that are like scoring outputs? Yeah, it is. Yeah, interesting. Hard thing to automate. We have done some of that stuff. It's been a minute since I last did a version of that. I was also struck, and our original use case was a little bit different. So with Waymark, we have this basically TV commercial maker for small business. And we've, for a long time, had a tool that pulls in all the images that the small business has published to their website or their Facebook page or whatever. And then we make this library. And that was a great convenience factor even five years ago when there was not much we could do with it beyond just be like, here's what we collected for you so you don't have to go collect it for yourself and upload it. But obviously with AI, there's a lot more we can do. But the aesthetic quality of an image was always a really hard thing to evaluate. Like in early days, you could caption it, but did it look good? Like models who had a hard time with that. I'm not even sure that you should be trying to automate that. I don't know. Like, I went to this dinner with the CEO of Midjourney and he's like, the reason why we have the best, like still have aesthetically like the best image generation is because I have my thumb on the scale. And like Google just lets like some kind of democratic panel or automation decide what the best what the best image is. And the best image is always some generic pretty blonde lady or whatever when you ask for something. I thought that was pretty funny. 
But all of that is to say, and this may be an unpopular opinion, so fun for you because it's a little bit controversial, I don't think you should underestimate the importance of vibes in aesthetic evals. Take when we first built Studio Sound, which, by the way, is still our internal model. We recently evaled it against all the other new Studio Sound providers and we still chose ours, even though in other cases we have thrown ours out and taken another model that's obviously better. But we kept Studio Sound. Studio Sound was originally built by a cellist who just had a really good ear for things. And he did what we might now be calling eval, but our original eval thing was this guy would listen to different models and be like, this is better, this is better. Now, when he left Descript, then we had to actually write an eval that was like, what are the 37 different things that make one form of background noise removal better than another form of background noise removal? But I don't know. I think that it's reasonable to say, in something that is primarily judgment based, we're just going to have a human do this and we're going to have a human do this forever. Someone that we know, that we've vetted as having good taste, is going to make these decisions. Yeah. Okay. That's quite interesting. Could you do just a quick overview of the frontier model landscape as you see it? Like you started to a minute ago when you said Nano Banana is best for face swaps and then Kling for this other thing. Is there an expanded version of that that you would say is kind of, here's how users should generally orient themselves to their options? You mean specifically for like video and image generation? Yes. Or others if you have a similar account for others. But yeah, that's what I was thinking. Yeah, you know, I'm like, do I have a cheat sheet here that kind of tells me that? 
I'm honestly probably not the best person to ask about this. But I know that generally we have a perspective about it. I know that our defaults right now are Nano Banana Pro and Veo from Google and that we're considering replacing Veo with Seedance. What's interesting is like, I'm not going to get into that actually. Yeah. But I would say that, especially when it comes to video generation, okay, so maybe I will. I don't think that there is going to be a winner-take-all across all of these generative image and video models, because I just think the use cases for generative video, for example, are so different that it's very difficult for me to believe that the same model is going to be the winner for something like Oscar-worthy film special effects and for making the cheapest but high-quality-enough video for all of the product pages on amazon.com. I just think there's going to be models that are really good for massive bulk actions that don't require things like consistency across time or sound or voice. That's not going to be important. It's going to be about a bulk play, versus something where you'll pay thousands of dollars per generation if the quality is really, really good. And so, like most products, I think it really helps to understand who your core customer and hero use cases are. And one of the ways this came up for me with Seedance is that Seedance is an amazing generative video solution if you want a really opinionated edit, where it's going to make a lot of artistic choices for you that you didn't necessarily ask for. 
And so if you're someone who's wanting to abdicate a lot of that, or who's able to describe ahead of time, beat by beat, exactly what you want to happen, you'll have a good time with Seedance. But a lot of people at Descript are using generative media as B-roll, and so then Seedance feels almost too flashy and directed, when you do a general prompt into it, for it to not be distracting, right? B-roll is supposed to be generally not super distracting and not taking too much of your attention. Your attention should be on the A-roll. And so I don't know, it's a weird example where we're like, should we make Seedance the default? And I'm like, on one hand, yes, the quality is really good. But on the other hand, for the typical use case, which is not using it as A-roll but actually using it as filler B-roll, I don't know. First of all, Seedance may be overkill. It's like too expensive for that use case. And it's maybe too much of a scene stealer for that use case. You told me it was okay to get in the weeds. So yeah, please. An example of getting in the weeds. I rely on people to help me understand these different frontiers more and more all the time. I mean, three years ago, I could try all the new models myself. And now it's just getting to the point where I have to rely on the network to help me understand it. So, yeah, please don't shy away from any and all of the nitty-gritty detail. Okay, cool. I think that's really interesting stuff on generative models. And I think the perspective that there won't just be one winner makes a lot of sense, too. And I actually think that's why, with a lot of the orchestrator agents, one of the things that they're going to need to be good at doing is understanding which generative model or which other kind of model to use. They're going to need to orchestrate
between all of the different models and sort of understand, like, given the context that I have about the video or the project that this user is working on, this is likely the right model that I should use, and this is why, to kind of hit their cost, quality, and use case bullseye. Yeah. Yeah. Hey, we'll continue our interview in a moment after a word from our sponsors. AI is rapidly moving from assistants to agents and it's causing a sea change. AI isn't just helping anymore, it's taking action. And here's the reality. You don't get outcomes from agentic AI unless you trust it to operate at scale. That's why AvePoint is building a control layer for AI. This foundational layer helps you govern what agents can access, secure how they operate, make activity auditable, and recover when something goes wrong. All as one connected system. See every agent, app, and workflow, and what they touch. Govern with policy and guardrails that work at machine speed. And recover quickly so a mistake doesn't become an outage. That control layer creates trust, and trust is what unlocks the right outcomes, letting you automate more work, move faster, and deploy agents with confidence instead of hesitation. If you're scaling agents and want those outcomes by design, learn more about AvePoint at avpt.co/tcr. That's avpt.co slash TCR. How does it see the video? How does it understand video? Because I've done a bunch of stuff with this over time, even with Gemini, which is video native in some sense, in that I can throw a video at the API and it will accept a video file. I'm not quite sure what's going on under the hood. Like, are they taking frames out of the video and doing some sort of sampling? It doesn't always feel like it has a true, I'm watching the video. 
You know, it doesn't always feel to me like it's, like sometimes I've asked it to critique videos and it sort of says that there are like hard cuts when there weren't hard cuts, just because like I moved my head between two frames or things like that. So, and obviously video is just like a huge, heavy file in the first place, right? So it's a big part of video software over time has been just managing that. And we've got another generation of that problem in managing that, but as we provide video as inputs to models. So how does it get processed and how does it get structured so that it can be presented to one or more AIs in the most effective way? Yeah, right now we translate visuals to text and consume those. So it's called captioning. And so we do frame-by-frame captioning of what is in this frame. And then we use some clever tricks to sort of fake giving the agent eyes and ears that way. And I think it does. I think it is okay. I think this is an area of like huge opportunity for us. And like working multimodally is right now the agent quality team's number one priority. So I would stay tuned here to see a major upgrade in the next month or two. But but right now, right now we do visual captioning. This connects also to the idea of like training your own models versus, you know, going out and getting models. It sounds like if I interpret your previous statement correctly, you're kind of neutral on whether it's your own model or somebody else's model and really just focus on what's going to deliver the best, maybe cost adjusted user experience. How do you think about, first of all, choosing between build versus buy when it comes to models? Yeah, so we have sort of like a strategic bullseye of where we aspire to have the best models. And I'd say that where Descript aspires to have the best models. is when you start with recorded media and you're editing recorded media, we want to have amazing, we want to be like the world's best at that job. 
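[Editor's sketch] The captioning approach described above (translate visuals to text, then hand that text to the agent) could look roughly like the following. This is a minimal sketch, not Descript's implementation: the fixed sampling interval, the merging of repeated captions, and the names `caption_timeline`, `to_prompt`, and `caption_at` are all assumptions made for illustration. The captioner itself is passed in as a function, since the interview does not say which model produces the captions.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FrameCaption:
    t: float   # timestamp of the sampled frame, in seconds
    text: str  # caption describing what is visible in that frame


def caption_timeline(
    duration_s: float,
    caption_at: Callable[[float], str],  # hypothetical captioner: timestamp -> caption
    step_s: float = 1.0,                 # assumed sampling interval
) -> List[FrameCaption]:
    """Caption sampled frames and collapse runs of identical captions."""
    timeline: List[FrameCaption] = []
    for i in range(math.ceil(duration_s / step_s)):
        t = i * step_s
        text = caption_at(t)
        # Only record a new entry when the visible content changes.
        if not timeline or timeline[-1].text != text:
            timeline.append(FrameCaption(t, text))
    return timeline


def to_prompt(timeline: List[FrameCaption]) -> str:
    """Render the timeline as timestamped text an LLM agent can read."""
    return "\n".join(f"[{c.t:06.1f}s] {c.text}" for c in timeline)


# Usage with a stand-in captioner (a real one would call a vision model).
fake = lambda t: "speaker at desk" if t < 3 else "screen recording"
timeline = caption_timeline(5.0, fake)
prompt_text = to_prompt(timeline)
```

A text timeline like this is cheap to pass to a text-only agent, but it can only be as accurate as the captions, which is consistent with the speaker calling multimodal input the area of biggest opportunity.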
So that's, like I was telling you before, something we call regenerate, where we can go into this recording that we're in. And a few questions back, I may be like, oh, I really didn't like my answer. Can you actually make me say this instead? And you can change my voice and my lips to say the thing that I wish I had said in the first place. That's a great example of 98% recorded media, but, oh, we need to update all of our branding to say this, or we need to update the dates, or California just added a new law and we need to change some of the language from the way that Laura explained this concept two years ago. So can we just regenerate that without having to re-record the whole video? That's the kind of job Descript wants to be really good at. We're about to launch smoothing jump cuts. So if you do have some crazy jump cuts, you can use Descript and we'll just make it look like you naturally moved over there and you never made the edit in the first place. So those are the kinds of things where we have Descript models to do that and we want to own that space. For purely generative stuff, we've sort of said we don't want to own that space. It's very expensive to build those models. And I think most of the companies that are spending hundreds of millions of dollars to build those models are still going to lose to Google. So I don't want to set money on fire that way. So I think it's about deciding where you want to win and then deciding where you can borrow. And so we are very friendly to borrowing, especially around pure generation kind of stuff. What I'm really excited about, and I think this is where you kind of get into a blurry line, is heavily augmented recorded media. 
So for what it's worth, you didn't ask, but I will say that there is a part of me, maybe a stodgy part of me, that just feels possessive about human expression, human facial expression, and feels a little bit unsettled by AI clones, or something that purports to be me but isn't. At the same time, I'm very sympathetic to the idea of like, I had to turn on a whole bunch of lights in this studio. I had to make sure that my makeup was decent before I came on to record with you. And wouldn't it just be nice if mostly we could have this conversation in an authentic and human way, not something that we type on a piece of paper and then add my voice to and then add my robot face to. We can mostly just talk as humans. But then in post, we could do all kinds of magic to just make it look like I was wearing makeup and looked amazing and had a great outfit on and the light was perfect and I didn't say anything stupid. And so that's kind of the vision that I have for Descript: really making the killer use case that we're better than anyone at augmented human recorded media. So that's where we really like to build. Yeah, cool. That's quite interesting. I guess I'm sort of feeling out the strategic landscape of models. It sounds like for some of the models that you're building, maybe there are no offerings on the market. You know, I've not seen one, for example, that does jump cut smoothing. And I could also imagine, you know, you've got retake removal now. And that's been there for a while, but I'm also guilty all too often of vocalized pauses. So we can remove ums and uhs and that kind of stuff in a pretty smooth way. And it strikes me that the structured nature of the edits that people make in Descript is an unbelievable data set for some of these use cases, where probably nobody else has even collected the data. 
So I feel like I'm intuiting where the core advantage probably lies, based on all the work that people have done in the product over time. But maybe you could tell us a little bit more about how you think about the data flywheel. And I guess if I was going to say, what do I want Underlord to be better at? It would be some of these subtle things where I stuttered, I repeated myself, whatever. But then which one do I cut? Do I cut the first version that I said or do I cut the second version that I said? I think you'd usually think maybe cut the first one. If you felt the need to say it again, then probably the second would be better. It's not always the case. And often also, when I highlight a word and hit ignore on it, then I go back and watch that passage again to see how did that land, was it glitchy in any weird way or whatever. And I think if there was a model that could make those marginal decisions well, where you try this edit, try that edit, and see which one looks better, that right now is the bulk of the time that I spend in Descript that I would love to offload. I just made that edit. How does it look? Let me try the alt version. How does it look? Making good decisions there would be, I think, an amazing upgrade. It doesn't sound like that's something Google's going to solve anytime soon, or anybody else. I don't know. Maybe you have other candidates out there, but that sounds like something you're probably going to have to do at home. That's right. That's exactly right. We tend to invest where we think we have great data, where we think we can build something without breaking the bank, and where we think it's unlikely that one of the labs is suddenly going to care a ton about just removing retakes. 
Because that's much less interesting to Google, I think, than solving voice consistency between three-second clips or whatever. And so yeah, you're intuiting correctly. And if that's where you want us to work, first of all, I'll pass your feedback on to the product team. We're kind of taking things on chunk by chunk, but that's where the thumbs up and the thumbs down really help us identify where we are not hitting the quality bar, in either the AI actions, which are the deterministic-seeming tools like remove retakes, or the Underlord requests themselves, right? Where you may say something like, get rid of all of my filler words, unless you can't make a clean cut. If you can't make a clean cut, keep the filler word. I haven't tried something where I give it the freedom to determine that it can't make a clean cut. Is that something I should be doing more of? Yeah, you should. We have tried to make Underlord truly open world. So Underlord doesn't just have 28 tools that it can use, where if you choose something that's like idea 29, it doesn't know what to do. It ought to be able to handle nearly any request, not all with equal ability. So we have an eval set, which I can tell you about. But then we use user feedback, the thumbs up and thumbs down, to help us understand where the pain is, because there are all kinds of things that we're bad at. I'm just going to be real. There are some things we're great at. And in our evals, we have three grades that you can get for a user request. The first is, you didn't break anything. You just didn't break my video. You didn't do something that made me say, oh my God, you ruined everything. That's like grade one. Grade two is like, you did what I asked. You know, I said, remove filler words and you removed filler words. Thanks, buddy.
And "did it well" would be: you removed filler words and you didn't have these really striking jump cuts or changes in tone as a result. And so the way that we do evals is we have a random selection of real user queries. We run Underlord against those, you know, in version one and version two. And then we have actually a ton of LLM judges go through and grade them multiple times, and we take the average of those and we say, okay, this is the percent where we didn't break things, and we aspire for that percent to be close to 100%. We never want to break your stuff. Then there's "did what I asked." And right now I think we're aiming for that to be like 90% of the time. We do the thing that you ask us to do. And then there's "do it well." Now, right now with do it well, we're okay. We could be better. I would like that number to be 80% by, and I'm going to say, the end of the year. Because right now we're not doing it really well. And I think that it's when you really feel like 80% of the time that I ask you to do something, you do it at about the level that I would do it, that's when you really trust your AI co-editor. But that's across all use cases. There are some nice spots where we're doing it well a lot. And rough cut is an example of that. So whenever you're asking it to help you with a rough cut and to help you get a long story into a shorter form, we tend to do that well. A lot of the visual stuff we're not doing as well. And that's why multimodal is a real priority for us right now. But user feedback helps us understand like, wait, there's a hotspot here. There's a hotspot around filler words where people are just not happy. Why don't we go spend a couple sprints getting this part of the product really cleaned up?
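[Editor's sketch] The three-grade eval described above can be written down compactly. This is a minimal sketch under stated assumptions, not Descript's pipeline: the numeric scale (0 for "broke something", added here so failures can be represented, then 1, 2, 3 for the three grades named in the interview), the way judge votes are averaged, and the function names `tier_rates` and `compare_versions` are all hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Assumed grade scale: 0 = broke something, 1 = didn't break anything,
# 2 = did what was asked, 3 = did it well.
TIERS = {"no_break": 1, "did_ask": 2, "did_well": 3}


def tier_rates(judge_grades: List[List[int]]) -> Dict[str, float]:
    """judge_grades holds one list per user query, one grade per LLM judge.

    For each tier, compute the share of judges who awarded at least that
    grade on each query, then average those shares over all queries.
    """
    if not judge_grades:
        raise ValueError("need at least one graded query")
    return {
        name: mean(mean(g >= threshold for g in grades) for grades in judge_grades)
        for name, threshold in TIERS.items()
    }


def compare_versions(v1: Dict[str, float], v2: Dict[str, float]) -> Dict[str, float]:
    """Per-tier change when moving from agent version one to version two."""
    return {name: v2[name] - v1[name] for name in TIERS}


# Usage: four sampled queries, three judges each (made-up grades).
v1 = tier_rates([[3, 3, 2], [2, 2, 1], [0, 1, 1], [3, 3, 3]])
v2 = tier_rates([[3, 3, 3], [2, 2, 2], [1, 1, 1], [3, 3, 3]])
delta = compare_versions(v1, v2)
```

Reporting the three rates separately mirrors the targets mentioned in the conversation: close to 100% for not breaking things, around 90% for doing what was asked, and 80% as the goal for doing it well.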
So as you try to push the frontier on this, I can kind of imagine a couple of different strategic directions that you might go in terms of how to get the best performance out of the available models with the various constraints that they have. And maybe you're even doing multiple of these. But one angle would be to say, OK, well, Claude or GPT or maybe Gemini is going to be probably the best reasoning and tool use agent. So what we really need to do is set whichever one of those we're using up for success. How do we do that? Well, we need to give it a richer understanding of what it's working with. But since it can't natively detect these awkward moments, for example, maybe we need an awkward moment detector that we can run and then feed in to the model to flag when these things are happening, so that it knows to reason appropriately about that. But then you can imagine a different version where you're like, I've been hearing very good things about GLM 5.1, and I think the weights are out there for this. Maybe we want to try to do something deeper, where we actually teach the core model to understand some of these inputs. Adding video as a modality to a GLM 5.1 doesn't sound easy at all, but you could do some sort of late fusion, cross-training, what have you. And I guess this sort of decision probably depends a lot on what resources you have. Do you feel like you can hire the team to do frontier work at that level? Or is it just so hard to compete with the frontier labs for that kind of talent that that's out of range? And it definitely would also depend on, do we think open source models are going to continue to be competitive? 
Or do we feel like Claude 5 is going to run away, for compute reasons or whatever constitutional reasons, whatever else? If it runs away from the open source bases, then we can't really keep up even if we do get good at doing more advanced stuff on open source bases. So I guess to bottom line all that, what's the model strategy? How do you think about what trends you want to bet on carrying you forward? The main bet that we've made with our agent is trying to build a very generalized harness and to give the agent access to a bunch of low-level tools, assuming that generalized intelligence is going to get better and better, and that we'll use probably a handful of whatever models Anthropic or OpenAI or Gemini comes out with, so that when the next Claude model drops, we'll have it evaled within 15 minutes in the product. And building for that, at least for the short to medium term, is the right bet to make, rather than investing a lot of time and money and research trying to keep up with the labs. So then it's about, how do we build an agent harness that's going to be able to instantly take advantage of leaps in general intelligence? How do we not get bitter-lessoned into not being able to immediately take advantage of those leaps? And so that's how we've tried to build our agent. Just give it a ton of context about the Descript model and also about video editing and how to think about user requests, and give it access to our low-level tools. Yeah. And then we have various experiments that we're doing to make it better through personalization over time. Okay, I'm very interested in the personalization. Does that boil down to basically saying, though, that you want Underlord to be essentially in the same position as your human users? I mean, obviously, it can't see as well and it can't hear in the same native way. 
But subject to the constraints of some of these things having to be arm's length tool calls to do the sort of sensing, it sounds like, aside from that, you're building one harness for both humans and AIs at the same time. Is that a reasonable way to think about it? Yeah, I have not thought about it that way, but I think that is right. We do have a design principle that Underlord should not be able to do anything in the editor that a human can't do, and vice versa. They should all have access to the same tools. And that also aligns with the general design principle we have that Underlord is a collaborator with you in the editor, the same way that we've been a video collaboration tool since day one. We've been a tool that teams use, because often video is not a solo job. It's a job that you do with a team, and Underlord is like a member of that team. Yeah. Okay. That's quite interesting. I feel like I'm seeing this still in a fuzzy way, but it seems increasingly like a universal design pattern, which Descript doesn't quite follow yet, although what you articulate is very consistent with it: an app that you could use with your own intelligence, clicking all the buttons and doing all the things, and then framing that with an agentic companion to which, to varying degrees, you could say, do this point thing for me, or do everything for me. And it would log out for you what it's doing using the same exact tools that you could use. And then with that log, you could pretty easily go in and undo the one thing that it did that you didn't want it to do, or that didn't work well, or what have you. Do you see the form factor changing? I mean, right now there's not this very persistent Underlord presence that's like the long-running agent where I can see everything it did. 
It feels like it's more embedded into the product as opposed to framing and sitting outside the product. Is that something you think will change over time? Or do you feel like maybe I'm going the wrong direction with my framing notion? No, I think that it will change over time. And I think we need to decide exactly how. But ultimately, Underlord would be more powerful if it had a wider scope. Right now it lives at the project level. It needs to at least live at the drive level. But I think it would be even more powerful if it could live outside of the drive and be a collaborator that you can bring along. Like with MCP, I think about that as, Underlord is now my collaborator that I bring into Claude Cowork. And for example, now one of the ways that I create content is I say, hey, Claude Cowork, look across everything I've done in Notion and Slack over the last week. And can you come up with like six ideas for clips that I can make about my thoughts on the AI space? And this is much better than saying, hey, can you just brainstorm six thoughts about the AI space that maybe I believe or maybe I don't? It's like, no, I talk about this shit all the time. You've been listening. I know you have. Go find some things that I'm saying that you think are kind of interesting. Suggest them for me. I workshop it in Claude Cowork. And then I'm like, great. Go create scripts in Descript. Go create projects for each of these and put this script in there as scratch text so that I can then go into each of these projects and record. And then I can say, okay, I've done all the raw recordings, go turn them into LinkedIn quotes using the skill that I built that tells you what my LinkedIn clips look like, right? Or we have a user, I'm obsessed, actually, I should send you this Claude skill. He's been a big user of the MCP. And he has an edit podcast skill that he's created. 
And he just runs it. It's triggered whenever he finishes a Zoom recording, and it just goes through the skill, creates the project in Descript, and he goes and he looks at it. So all of this is to say that I think we need to bring Underlord not only out of the project and into the drive, but also out of Descript and into the team of agents, into the world where you're already doing a lot of app connections. Your video team, Underlord, needs to be in there with the rest of your teams, right? Working on creating your content. I would love to see the edit podcast skill. I've created my own. I wouldn't say it's very advanced by any means, but it's a working V1. Oh, that's awesome. It does like six different things. One is, I typically open every episode by addressing the guest and then saying, welcome to The Cognitive Revolution. And then usually at the end, I've got an outro. So: trim everything before the welcome, trim everything after the thank you. And there's like five other steps. Yeah. He gets a pull quote and then he has the podcast theme come in after the pull quote, and it shows up pretty well. Yeah, cool. I mean, I'm sure I could learn something from that, no doubt. And maybe there'd be somebody who could learn something from mine. Although, again, it's not that advanced. I'm sure they could. I think the main thing they could learn is just that you can do this at all. Not necessarily the quality with which I've done it so far. But, okay. So I think this is really interesting, and it does get at, in some ways, some of the biggest questions about the future of software and even just the future of knowledge. How is all this going to work? So as an early user of the Descript API, and the Underlord API is really the core of that, I think it's a really interesting pattern that you guys have gone with so far, where the tool is smart. 
With the Descript API, I am not even afforded the ability to do very specific, fully deterministic edits to a project. Instead, I'm prompting Underlord and it's doing the thing. And so I can prompt it very specifically. But there's always this kind of translation layer. It works pretty well in what I've experienced so far. I'm new to it like everyone else. And I think this is, in one way, maybe the only way for software companies to have any sort of defensibility, because you've got to have some smarts inside your periphery, right? That seems like it's got to be a core principle. At the same time, there are times where I'm like, I just want to make sure I'm doing exactly what I want to do. And then there's also personalization. My Claude Code universe is ever growing and has tremendous amounts of context and access to every podcast I've done, and also ones where I've guested, where it's not even in my Descript account. And it just has a broader view of me. So how do you see that? And then I'm also thinking of Fin too. I'm sure you've been following this pretty closely. Intercom has opened up their customer service model to other companies to build their own competitors to Fin using the same intelligence. Benedict Evans, you know, all software revolutions are either bundling or unbundling. How do you think this is all going to get bundled or unbundled or rebundled? Where are the lines going to get drawn? Where should personalization live? Like, where's all this going? Please de-confuse me as much as possible. Yeah, I can't. And I think anyone who tells you that they can is lying to you, probably for self-satisfying reasons. 
But what I would say is I think that Descript currently, and my job is to make sure for the foreseeable future, can give you a better experience if you are using Underlord and all of the context that we have about you in Descript. And it's just going to get more and more powerful as we build in personalization and drive-level understanding. So I think that is likely to be the primary way that you do video editing within Descript. Because of the way that the agent coordinates within Descript and understands the capital-D, capital-M Descript Model and how everything in the app is set up, we have important context in the Underlord layer, where you're just going to have a better time than if you're asking Claude to coordinate across a bunch of tools. But the other thing that gets you is, if you're doing everything in Descript, that guarantees that when you inevitably need to go in and do the last 10% yourself before you're really comfortable hitting publish, which for most people will be true, you're going to be able to go into Descript, and we will have access to all of the discrete things that have been done so that you can undo them, change them. They're not just flat files that have been put in there, where it's like, do you want to edit this thing that's fundamentally uneditable? And you're like, no, this isn't helpful. So I think that's our vision. However, we do think that it is important to break out some of our tools and make them generally accessible. Things like the transcript ought to just be callable in a deterministic way without having to go through Underlord. And there may be additional tools where you don't need a ton of context to be successful with them, where I think we may want to think about distributing those just as deterministic tools and giving access to Claude.
But when it comes to really orchestrating multiple types of media, doing visual edits, setting things up like a layout, things like that, I just generally think you're going to have a better time with Underlord. That's true today. And because of all the work that we're doing with the agent, I think it's going to be true long into the future.
Yeah. I like the framework of "my job is to make sure you have a better time using our thing than using Claude Code." That is going to be a challenge.
Or together. I actually think you should be using Claude Code. But I think the package that we're talking about is: think about what you're doing as telling Claude Code to hire Underlord as your video team, and then to go off and do its job the way that it thinks it ought to be done best.
Yeah. I think that is a good paradigm. And I do think video is one of the areas where it probably lasts longer than many others.
I think that's right. And look, I am not offended. A common question is just, how are you going to defend yourself against, I don't know, X big lab?
And I think the answer that any company that's telling the truth will give you is: sure, if Google or Claude or OpenAI decides that the thing that my app does is exactly the thing that they want to be great at, and they want to spend the time and the money and the years and the sustained effort to make a great product to do that job, what can I do? Probably nothing. But I think that we overestimate the number. I think that there will be some low-hanging fruit for them to do that with, a bunch of very lucrative businesses, and having a robust and reliable video editor is actually pretty high up the tree. You've got to climb a lot of branches before you're like, why don't we just build and maintain something like a robust video editor forever? I think that's why a lot of people don't do it.
Yeah, I think I agree with that analysis, as long as we're in something like the normal regime. The one thing that changes it is sort of beyond conventional business strategy. It is a crazy world, a crazy reality, but I no longer think it's crazy to think about the possibility that the AI coding agents and AI entrepreneur agents can just sustain that effort themselves. And that's where things get really through the looking glass. I don't know if you have a view on what, if anything, can be done if that threshold gets crossed. OpenAI, as I'm sure you're aware, has publicly stated a timeline for when they expect to have an autonomous AI researcher, and that is March 2028.
So we're less than two years away from their target to have the autonomous AI researcher. It starts to be sort of a weird world where you're like, geez, that thing might be able to create its own specialist models that could do all these very particular use cases, and create its own kind of sense organs to figure out what's going on in the video. And then I guess that's just the singularity. I don't know if there's any other interpretation of what happens at that point, but maybe it's just too remote to think about, or maybe you do have some thoughts.
My general thought is, I get really excited when I think about being able to automate more and more labor. I think that generally leads to an exciting future if we're willing to do the work to make it one. I am skeptical of that timeline, and I think that when someone tells you something like that, it's really important to get very concrete about what exact bet they're making, to get that on paper. And in your process of getting it on paper (what do you actually mean by an autonomous researcher? What is it able to do without human oversight? What is it not able to do without human oversight?), you often get to a more reasonable picture, one that still implies a different future, than you do by understanding things at the topic-sentence level, which I know you go deeper than. But that's just a general tip that I give people when they're trying to understand these claims of, like, "in six months, there will be no more white-collar jobs." It's like, is that what was actually said? Let's slow down and look at the claim that's being made. But generally, I am very bullish about the direction that labor automation is going in. And I think it's generally good news.
And I think that we can't predict exactly what the timelines will look like or how it'll play out across different industries. That's why I think the companies that will win are going to be companies that are able to make decisions quickly and well, and that generally embrace change and are not resistant to it. So when I think about it: can I right now tell you exactly what the labor market is going to look like in five years and where Descript will play within it? No. But do I think that we have built the company rituals that allow us to nimbly shift strategy, to quickly take advantage of leaps in labor automation, and to quickly build tactics to deal with the competitive situation as it evolves over the next several years? That's where I have a lot of confidence. But I tend to think that we are overstating a lot of the near-term changes in labor and society that will come from AI, and probably understating a lot of the longer-term changes in society and culture and labor that will come from AI.
I don't know.
Do you think podcast editor is a job in two or three years? Or maybe people still want to delegate, like, I don't want to watch it, you watch it and do a little quality control. And it's not even editing, it's just sort of giving feedback to the AI. That's one version of it I can imagine.
I don't know if podcast editor is going to be a job. Will people be employed to tell stories? Yes. What kind of stories, using what media, and who will they be employed by? That's all subject to change. But if the thing that you're really good at is telling stories for brands, or interviewing other people and finding out what's interesting about that and getting that out, doing a media job, that will still exist. That will still be a job. And especially if you embrace new media and embrace new distribution channels, you may still have that job in the future. Does that make sense?
Yeah, I do wonder. I think there's a huge question around: will people make these transitions? Because a lot of times when I have this conversation, I'm like, geez, it seems like this work is going to be pretty highly automated, and then people point to a different, sort of adjacent kind of work. But a lot of times I'm like, yeah, but are the people doing the first thing going to switch to the other thing in any effective way? I think that is where I see disruption being pretty meaningful. I feel like the winners and losers, in many cases, are different people. And that there will be winners doesn't necessarily mean that the losers will be able to pivot into a winning position. It might just be a major redistribution of who's winning and losing.
And that could be a huge challenge for society, even while all kinds of new things may pop up that do create new kinds of winners and new kinds of opportunities.
To some extent, that's just always true, right? Economies are always changing. Sectors are always growing and shrinking, and there's labor displacement. This may happen in a sped-up way, and whenever there's tremendous labor displacement in a short period of time, which there may be in this circumstance, you need to make sure that systems are in place to take care of people through those moments of extreme disruption. I have a lot of faith that long term there will be big shifts in labor, but I am not a subscriber to there being permanent and irrevocable losers, unless that is the society that we choose to build. I think there may be a very difficult moment where there's an accelerated kind of labor displacement, in which case we need to be ready to meet that moment as a society. But I think it's a moment that we've met before, and changing labor landscapes are not a new human problem.
Yeah, that might actually tie back into the original slop question in an interesting way. But maybe one more beat on some practical product stuff, because I did want to ask about something that I think is increasingly common and is quite prominent, actually, in the Descript experience today, which is that there are individual button clicks, and certainly individual prompts that I can give to Underlord, that will spend a few dollars' worth of credits for me in one go. And that's a weird new world for software, right? It used to be just click and do whatever you want. And now you're a few clicks in and you might be through your monthly token budget. How do you guys think about designing for that new cost paradigm?
I think it's a temporary cost paradigm myself. So first of all, what I'd say is, personally, as a consumer, I hate the feeling that pressing a button is going to cost me a dollar. Even though, actually, if you do a good job on creating my clips, that's a pretty damn good deal, because that used to take me a lot of time. Sure, I'd love to pay a dollar to get someone to do my clips. But as a consumer, I don't feel great about that experience. But AI costs Descript money, right? This stuff costs us money, so it can't be free and unlimited. So the way that we try to create pricing for Descript is: we are going to make it so that hobbyists can make one really good thing a month, creators can make one really good thing a week, and businesses can have teams of people that are making multiple good things a week. And then it's like, well, what kind of things? Those things really are very different for different people. The reason why it has to be a shared pool of AI credits is that we used to say, oh, every creator gets this much AI speech and this much filler word removal and this many clips. And it's like, yeah, but if I'm a podcaster, I may never need AI speech, and I need clips every single week, multiple times a week. So that's not a good deal for me. So we wanted to have a budget you can spend across any kind of AI job. We went and looked across all of our main use cases and asked, is this enough credits for someone who's making one podcast a month, or one long-form YouTube video a month, or something like that, for a hobbyist? And for a creator, one thing a week. Now, you're making, how many did you say a week? Two a week?
Usually two a week.
All right, you need a double license. But in any case, that's how we tried to price it out.
And then we have the idea that you can add on more credits or more media hours if you are in a special circumstance. So that's how we design pricing. But I think this is a temporary moment of pricing. Everyone's doing this. And the consensus around where pricing is heading is more outcome pricing, where what you're charged on is maybe something like exports. You're not going to get charged unless you get to the outcome that is getting you the value that you need, and then we'll charge you for that value. I think we all kind of want to live in that world. But because of the state that the models are in right now, and because of how expensive AI is right now, we live in this world that feels kind of uncomfortable for everyone. So what Descript tried to do is set up pricing, again using those general design principles, with the idea that less than 5% is our gate: less than 5% of active users have to buy some kind of top-up every month. And that feels okay to me. If less than 5% are hitting their limits and needing to buy extra stuff, I'm like, okay, this is feeling all right. If it were something like 50%, I'm like, wow, this is not a fun amusement park to be at. Everything costs so much damn money.
Yeah, interesting. That's, I think, again, a very interesting and useful frame. This has been great. I guess my last question, to tie it back to the beginning and also try to zoom out a little bit: is it going to be possible to avoid a future of infinite slop? It seems like right now one barrier to infinite slop is that the models are kind of expensive. So you've got to have some reach, or some reason to believe that you're going to get paid back, in order to spend all the credits.
Maybe in the future we have some sort of universal basic income or new social contract that reduces the need for people to push slop to try to earn whatever pennies per view. But if pricing is dropping over time, and that new social contract may or may not arrive in a timely fashion, what do you think is the future of content? Is it going to be infinite generation? And what is it going to look like on the consumption side? Are we going to be lost in slop? Or do you have a vision where, even if it's infinite generation, on the consumption side maybe we can somehow rise above that reality? I'd love to hear how you think the future content equilibrium shakes out in an aspirational way.
Yeah, I don't think that. So whenever people ask me this question, they tend to be people in tech or they tend to be economists. And with a ton of respect to people in tech and people who are economists, I think what we're missing about content is that content is a little bit businessy, and certainly a lot of the people who use Descript are using it in businessy ways, but it's a little bit art, too. Right? It's a little bit artistic expression, creative expression, and creative storytelling. And whenever you're playing in that field, I think it isn't as clearly driven by free market nihilism as other areas of the world. So I'm not talking about Descript now. I'm playfully thinking about video or film as a medium. But when something's about artistic expression, art, I think, tends to have a way of reacting to the technology of the day, to the culture of the day, in ways that surprise us. I think about the invention of the camera and how that changed the medium of painting forever. People were just not interested in photorealistic painting after the invention of the camera.
And so I bring that up to say that art always reacts to technological advances in ways that surprise us. And then you might be like, yeah, that's art; we're talking about content. But the artistic-expression kind of content will change first. There will be people who do very creative things in this moment that are unexpected and surprise us and raise the quality bar. And then there will be businesses that see that and say, I want a little hunk of that. It's like Miranda Priestly in The Devil Wears Prada, where she's like, this designer over here decided on this cerulean blue, it was in their spring collection, and then you buy it in a bargain bin at Marshalls. And that's the way that content works too. You'll have people who are interesting, exciting, truly creative, who have an aesthetic eye, and who are going to do interesting and exciting things both with this technology and in response and defiance to this technology and to this moment in culture. And that will inspire all of the marketing people to steal from that aesthetic and that response. It is easy to look into the future and see our nightmares. People pitch me on things like: there won't even be human creators anymore. What's going to happen is you're going to stare into your phone, and there will be a seed idea of a video, and then based on where your eyeballs go, it'll generate more and more video in a way that makes you never want to look away. And I'm like, oh, like in Infinite Jest. David Foster Wallace basically told us this would all happen, in the 90s. And maybe that'll happen. It's easy to look into the future and see our nightmares, especially in a world as skeptical of technology as the one that we're in right now. But I'm actually really excited to see what artists and creative people do with this technology and in response to it.
And I think that there will be really exciting things that come out of it, and that those are the things that will win in the marketplace.
Yeah, even in a world of slop. I love an optimistic vision for the future, so I think that's a great note to end on. Laura Burkhauser, CEO at Descript, thank you so much for being part of the Cognitive Revolution.
Thank you, Nathan.
Cognitive revolution.
Them say every pixel a nightmare, every frame a ghost. Say the algorithm ate the canvas, say the art is toast. But bad art is the road, yeah, that's how you grow. First stroke always crooked for the river start to flow. Solar gun to Hollywood pressed against the head, but the painter keep on painting, working with the thread.
Slop, nah slop, 'cause it ugly or it new. Slop is when the money pull the strings right through. At scale, at scale, them feeding the machine, but we put in a thumb, yeah, a thumb on the scale. Thumb on the scale, thumb on the scale. Cognitive revolution, we are set to tell.
Hierarchy of hostility, them draw the bottle line. Love the button at the avatar, fear the next design. Taking crazy pills, they telling me it's good, but it's up and I hate it, I'm not losing where I stood. Now the tool is young, my hand's still learning how. Spend time in the medium, find your voice right now.
Slop, nah slop, 'cause it ugly or it new. Slop is when the money pull the strings right through. At scale, at scale, them feeding the machine, but we're putting a thumb, yeah, a thumb on the scale. Thumb on the scale, thumb on the scale. Cognitive revolution, we are set to tell.
Camera come to paint, I change the game. Never paint the same, never stay the same. Art always answer, art always talk back, to the tool at the moment, that creative attack. Easy if you see nightmares, easy if you see the fall, but I excited to see what the artists do with art. Tastes of sellers listening to the rules, tastes of thumb on the scale that goes through the glue. Not some pen-a-la-robots picking pretty blonde lady, woman hand on the wheel keeping it all steady. From the bargain bin at Marshalls to the spring collection sheet, the artists move first, marketers just follow the scene.
Slop, nah slop, 'cause it ugly or it new. Slop is when the money pull the strings right through. At scale, at scale, them feeding the machine, but we put in a thumb, yeah, a thumb on the scale. Thumb on the scale, thumb on the scale. Cognitive revolution, we have set the tail.
Play and curiosity, that's the only way. Bad art today, good art someday. Thumb on the scale, thumb on the scale. Cognitive revolution, don't ever fail.
If you're finding value in the show, we'd appreciate it if you'd take a moment to share it with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries, either via our website, CognitiveRevolution.ai, or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine Network, a network of podcasts, which is now part of a16z, where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI Podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at AIpodcast.ing. And thank you to everyone who listens for being part of the Cognitive Revolution.