Summary
David Pierce interviews Thomas Paul Mann, CEO of Raycast, about how AI is being thoughtfully integrated into productivity software. They discuss the challenges of building reliable AI features, the importance of user control and privacy, and how AI can enhance rather than replace existing workflows without becoming a gimmick.
Insights
- AI integration should solve real, frequently-used problems rather than chase sci-fi demos that rarely work reliably in practice
- Operating system-level AI access is more powerful than app-specific AI because it can orchestrate across all user tools and data
- Users need discoverability and guidance about what AI can do, not just open-ended prompts, especially for safety-critical actions
- The most successful AI features are those that generate reusable artifacts (code, extensions) rather than one-off conversational outputs
- Privacy and user control must be built in from the start when AI has deep system access, not added as an afterthought
Trends
- Shift from AI as novelty feature to AI as infrastructure layer in operating systems and productivity platforms
- Model switching costs are collapsing; users rapidly adopt new models, creating pressure for intelligent routing between models
- Agentic AI workflows remain unreliable; most successful implementations focus on narrow, well-defined tasks rather than open-ended autonomy
- Designer-to-developer convergence accelerating as AI coding tools lower barriers to interactive prototyping
- Privacy-first AI design becoming competitive differentiator as users grow skeptical of surveillance-based personalization
- Prompt-first workflows becoming standard for knowledge workers, replacing traditional app-based task completion
- Extension ecosystems evolving toward AI-generated, ephemeral software rather than persistent, hand-coded integrations
- Personalization through contextual AI (understanding user patterns and preferences) emerging as key to differentiation
- Human-in-the-loop systems gaining favor over fully autonomous AI due to reliability and trust concerns
- Discoverability and guided workflows becoming critical UX challenges as AI capabilities expand beyond user understanding
Topics
- AI Model Integration and Orchestration
- Agentic AI Workflows and Reliability
- Privacy and User Control in AI Systems
- Prompt Engineering and Natural Language Interfaces
- AI-Powered File and Task Automation
- Operating System-Level AI Architecture
- Extension Ecosystems and Developer Platforms
- Contextual AI and Personalization
- AI Safety and Guard Rails
- Discoverability and User Guidance for AI Features
- Local vs. Cloud-Based AI Processing
- AI Model Selection and Routing
- Human-in-the-Loop AI Systems
- Designer-Developer Convergence
- AI for Code Generation and Scripting
Companies
Raycast
Launcher app integrating AI models to enable natural language control of apps, files, and system tasks across Mac, iO...
OpenAI
Provided ChatGPT and Whisper models that Raycast integrated early; pioneered accessible LLM APIs for third-party deve...
Todoist
To-do list app using AI transcription via Whisper to convert voice rambling into structured task lists
Microsoft
CEO Satya Nadella predicted computers will largely use themselves via AI, a vision the hosts debate as unrealistic ne...
Anthropic
Claude model provider; Claude Code capability mentioned as example of AI that can modify CSS and generate code
Google
Mentioned as provider of integrated services (Google Docs, Gmail) that Raycast extensions connect to
Notion
Productivity app integrated via Raycast extensions; used by host to capture and organize AI-generated content
Linear
Issue tracking tool integrated via Raycast for project management workflows
GitHub
Developer platform integrated via Raycast extensions for code and repository management
Spotify
Music service mentioned as example of app that Raycast can integrate with via extensions
Tesla
Self-driving car example used to illustrate how incremental AI improvements matter more than waiting for full autonomy
People
Thomas Paul Mann
Discusses philosophy of integrating AI into productivity software, challenges of agentic workflows, and privacy consi...
David Pierce
Leads discussion on AI integration, challenges hosts with real-time Raycast testing, and probes design philosophy que...
Satya Nadella
Vision of computers using themselves via AI is discussed and critiqued by hosts as philosophically problematic
Quotes
"There's a lot of stuff out there that actually does not benefit from having chat GPT shoved into it in some ridiculous way. But on the flip side, there are actually a lot of things that become better and more useful and more functional with these kinds of tools."
David Pierce•Early in episode
"We're not going to go and build our own models. What we did, we did some optimizations on the prompt level and also some fine tuning to like make the models really good in our case."
Thomas Paul Mann•Mid-episode
"If everything is like different and you can't find yourself around, it becomes quite annoying and not useful, right? That's like why people prefer apps in the first place."
Thomas Paul Mann•Mid-episode
"For tools having something unpredictable is like a no-go, right? Like you wouldn't use something complex like Photoshop and half of the time the pixel turns red and half of the time it turns blue."
Thomas Paul Mann•Later in episode
"AI should be on the operating system level. It just makes so much more sense to be there instead of like in every app and every app needs to rebuild it."
Thomas Paul Mann•End of episode
Full Transcript
Dell PCs with Intel inside are built for every moment. With long-lasting battery life and built-in intelligence, you can stay focused on what matters most. Dell Technologies, built for you. Dell.com slash Dell PCs. Welcome to the Verge Cast, the flagship podcast of using AI models to rename all the files on your computer, for better and for worse. I'm your friend David Pierce, and this is the first in a two-part series we're doing about AI, and more specifically, how people who are building AI tools are thinking about the AI tools that they're building. Basically, we're at this moment in time where everyone who makes any kind of app, any kind of software, any kind of hardware for goodness sake, is trying to figure out ways to put AI into it. And on some level, I think that's silly, right? There's a lot of stuff out there that actually does not benefit from having ChatGPT shoved into it in some ridiculous way. But on the flip side, there are actually a lot of things that become better and more useful and more functional with these kinds of tools. One thing I think about a lot is text transcription. It's a simple thing, but OpenAI put out this Whisper model that does really good, really fast transcription of audio, and that ends up being really powerful for lots of things. There's this feature in Todoist, the to-do list app that I really like, called Ramble. I think I've talked about this before, but you can just talk your to-do list: all the things you're thinking about, all the things on your mind, all the things on your shopping list. You just sort of yell it into the app, and then it will attempt to go through and structure it all and make sense of it. And there are a couple of different layers of AI in there, right? But the first one is just: take your voice and reliably, successfully transcribe it. That's very powerful. There's also an app I use called Mymind that is using AI to do really great search, so that instead of having to make a bunch of notes and then file them into folders or give them tags or do any kind of organizing, you just put it all in and trust that you'll be able to sort and search and find things as you need to. This stuff can really work. So for the next two Sundays, what I'm going to do is talk to two people who are making apps that I think are doing a smart job with AI. It's going to sound in both of these interviews like I like the products, and I do. That's why they're here: because I think they're thinking about AI not as just something to shove into the app to charge you more money or juice their stock price or whatever, but because there's something it actually makes possible. And sometimes it makes those things possible in ways that are complicated and messy and privacy-threatening, and maybe even threaten to ruin the vibe of the thing you're trying to build in the first place, but that also have upsides that make the things more useful and more fun and more discoverable. So we're going to talk about all of that. And my guest for this first one is Thomas Paul Mann, who is the founder and CEO of a company called Raycast. Raycast was initially a Mac app; it's now on iOS and on Windows. The way that I would describe it is it's sort of a launcher and then some, right? So you use it to replace Spotlight on your Mac, and it then will let you launch apps. You can use it to store text expansion things.
I have one set up so that when I type H-H-O-M-E, for home, it just immediately spits out my home address. That's a thing that lives in Raycast. You can also use it to manage the windows on your computer and move stuff around. But increasingly, one of the biggest things it can do is access AI models. You can use it just to chat with ChatGPT inside of Raycast, but you can also use ChatGPT to use your apps. So I can go in and I can type, you know, @browser, download all of the tabs as a CSV and put it into a text file that I can then send to somebody. And that's a thing it is, in theory, capable of doing. I can open it up and say, @finder, show me all the files that I have created in the last 24 hours. It's actually an AI system that can use your other apps and even use your computer. We've talked a lot about browsers, and we've talked a lot about these sorts of tools that have lots of additional context. Raycast has more context than just about any other app. I've been using this app for a long time. I really like it a lot. I have not made that much use of all of the AI stuff inside of it. So I wanted to have Thomas on to both talk me through how he thinks about putting AI into this product and also what it can do for you when it really starts to work. I really enjoyed talking to him. I think you'll enjoy hearing it. We're going to take a quick break, and then we're going to get to my interview with Thomas. We'll be right back. Learn more about both technologies on L'Oreal.com. L'Oreal Group. Create the beauty that moves the world. Thomas Paul Mann, welcome to the Vergecast. Hey, thanks for having me. You and I have talked many times, but we've never talked into a recorder like this, and I'm very excited to have you here. We're doing this series about people who are building and thinking about AI and what AI can do, which is a conversation you and I have had versions of many times. So now we're just going to do it again, and I'm excited about it. Sounds good. Yeah, sounds like we have done it a few times already, so let's see. Indeed. So first, give me a sense of, I think you've been thinking about AI inside of Raycast for a while, so just rewind to the early days of when you started thinking about how AI models fit into what Raycast was doing, a couple of years ago. What were those first conversations you were having? Yeah. So Raycast is sort of this global search bar on your Mac, right? And actually now also on Windows. Basically, what we realized when ChatGPT came around, and suddenly everybody talked about a prompt and everybody was looking for a text box to feed in a prompt, was that we were really well positioned for that, because Raycast itself is basically a search box. You can open it anywhere and just type something in. It used to be just very static text: oh, you're searching for an app or a command, or you want to do something. But then it felt quite natural to extend this and just put in natural language, like a prompt, and then get going. So pretty much right after ChatGPT came out, we were like, hey, wait a second, that suits us really well. And I think the first model we integrated was GPT-3, which was back by the end of 2022.
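For a sense of how the @browser and @finder examples from the intro tend to be wired up: features like this usually sit on top of model tool calling, where the app describes functions the model may invoke and then executes them locally. The sketch below uses an OpenAI-style tool schema; the names, the schema, and the handler are illustrative assumptions, not Raycast's actual internals.

```typescript
// Illustrative sketch: expose a "list recent files" capability to a model
// via OpenAI-style tool calling. All names here are hypothetical.
import { readdir, stat } from "node:fs/promises";
import { join } from "node:path";
import { homedir } from "node:os";

// Tool description the model sees. In a real request this object goes into
// the `tools` array of a chat-completion call.
const listRecentFilesTool = {
  type: "function",
  function: {
    name: "list_recent_files",
    description: "List files created in the last N hours in a folder",
    parameters: {
      type: "object",
      properties: {
        folder: { type: "string", description: "Absolute folder path" },
        hours: { type: "number", description: "Look-back window in hours" },
      },
      required: ["folder", "hours"],
    },
  },
};

// Local handler the app runs when the model decides to call the tool.
async function listRecentFiles(folder: string, hours: number): Promise<string[]> {
  const cutoff = Date.now() - hours * 60 * 60 * 1000;
  const recent: string[] = [];
  for (const name of await readdir(folder)) {
    const info = await stat(join(folder, name));
    if (info.birthtimeMs >= cutoff) recent.push(name);
  }
  return recent;
}

// e.g. the model answers "@finder, show me files from the last 24 hours"
// by requesting: list_recent_files({ folder: "~/Desktop", hours: 24 })
listRecentFiles(join(homedir(), "Desktop"), 24).then((files) => console.log(files));
```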
Okay. So the thinking even then was that you could solve some of the way people talk to their computers. Because I think, to me, the very first good thing that any of these models did was make it slightly easier to speak in English to your computer. Was that kind of your thinking too, that we can just make this make a little more sense? Yeah, I think the very first thing that people did was just asking questions, right? And it's kind of funny, because all the way back when we started Raycast, we were like, oh, we're programmers, we sometimes have questions: how do I do X? And then we used to go to something like Stack Overflow and find those questions, right? That somebody else asked, and then you go over it and read the answer yourself. And you kind of had to be very good at keywords to find the proper question that leads you to the answer. But now this whole thing got flipped upside down. So the very first thing we saw was: oh, people just ask questions, get the answers, and then carry on with whatever they do. These can be little things, fun things, but it helps them to basically stay informed. And so one of the first big challenges to overcome was that sometimes those models hallucinate, and how do we get over that? And I think the next wave was very easy: well, let the models do what we would do, search the web, then take that information and distill it down into the style and the tone of voice we want to have. So that was sort of the first thing we saw picking up. And since then it became more and more advanced, right? Then it was like, okay, maybe not just search the web, but search your calendar, search your files, read your files. Do actions, like organizing the folders and files on your Mac, or other things. As with every new technology, people adapt to what's possible, and then they take the next step and push the boundaries a bit more and more. And given that we have quite advanced users, they're oftentimes at the forefront, so they push really hard to the extremes, and then we can see what they wanted to do, integrate that quite nicely, and make it accessible to many more people. So did you have to make a decision early on about how much of the AI stack you wanted to be part of? I assume there was never a question of, should we train a Raycast LLM. But it does seem like, you know, you could build something that is essentially just a text box that replaces the ChatGPT text box. Or you could build something on top of it. You could try to integrate with an API and be a sort of developer on that. There just seemed like a lot of different ways you could try to do that thing you just described, at vastly different levels of complexity. Was it obvious to you where to land early on? No. I mean, this thing came out overnight, right? Suddenly it was there, and what used to be sort of sci-fi, the movie thing, was suddenly somewhat possible. But you kind of need to get started somewhere. So early on, we were just playing with the APIs as they were created. Initially just OpenAI, because that was literally the only API that was there, right? There was nothing else that was even available. And it took several months until somebody else popped up. And then it became clear that models are all a bit different.
Not even talking about which ones are smarter or faster; they just have nuances, and people prefer one over the other, oftentimes for personal reasons, like, oh, I like how this model talks to me. So really quickly we said it probably makes sense to integrate all the different models, because people are going to have personal preferences, and there are going to be better and worse models, or better models for certain cases. Sometimes you just don't need the full intelligence. You just want to do something simple, like, oh, summarize this blog post or rewrite this message. Those don't need the cutting-edge models; you want them to be rather fast. And then sometimes you want a model that goes on for several minutes, does a bunch of research, and then comes back with a big research paper for you. For that, you probably need a better model. So early on we said, okay, let's integrate with as many models as possible, because we have a quite technical audience, and that will help us see which models they prefer. And what we now see, after a couple of years, is that whenever a new model drops, everybody goes to the new model, tries it out, and basically wants the latest and greatest. And then they use that for several months until the next model drops, from a different company most likely, and then they go over to the next one. So the switching costs between those models are extremely low at the moment, at least for us and for our users. And then I think they're building up a bit of muscle memory, or even learning how to get the most out of those models, and those things are sometimes a bit more tied to, let's say, a model family. But early on we said, we're not going to go and build our own models. What we did, we did some optimizations on the prompt level and also some fine-tuning to make the models really good in our case. That is, for example, making a lot of agentic workflows work. But nowadays a lot of the models are pretty good at that on a basic level already. What were you doing in those early days? That was before everybody was talking about agentic stuff, but it also seems kind of perfectly up the alley of Raycast to figure out, okay, how do we use this unique access to your device and your files and your data? How do we teach this model how to do stuff? Were you poking at that stuff even in those early days, before everybody was talking about agentic AI? Yeah. We had this pretty early, when we said, this makes total sense for us. We have this extension platform, so there are over 2,000 extensions that are publicly available. You can integrate Raycast with Notion, Linear, Google Docs, GitHub, you name it, anything you can really think of, and also on your local computer: it can see your files and your calendar, et cetera. And so we had all of those extensions lying around, and we were thinking, oh my God, the obvious thing is, instead of you doing everything manually, you say what you want to do and the computer does it for you. Turns out it's a bit harder getting there, right? But the promised land is quite nice. You flip it upside down, essentially, and change how you use computers in the first place. Because if I think about how I used to use a computer, it's like, okay, I have an idea of what I want to do.
So in my head, I'm kind of transpiling that into clicks and keystrokes and navigating around my computer. But now, instead of doing that, I just write it down, or even speak into my microphone for several seconds, and then let the computer handle it for me. So we had this idea early on. Getting there was a bit harder because, one, we wanted to make sure those things work really well, and that isn't easy. You also had to figure out the UX a bit, because with a prompt, in a way, you can say anything, which is great. But we also used to have great UI that guides us, right? We have buttons we can click that help us navigate. And now suddenly the computer pretends that everything is possible, which is a bit of a lie, because that's oftentimes not the truth. So figuring out the middle ground between when it makes sense to have UI and when it makes sense to just have an open prompt field, that was a bit of a challenge. Well, that's also kind of an essential Raycast problem, right? This is a thing you and I have talked about before: the how-do-you-discover-what-this-thing-is-able-to-do problem. Because you open it up and it is just a text box. You have the exact same problem that ChatGPT has, which is that you open it up and it makes clear that you can do things, but it's not super clear what things to do or how you teach it to do things. And again, this is where all the agentic companies get really excited, because they're like, you just say it and we'll figure it out for you. I'm extremely suspicious of that as a concept. But it does seem like you're sort of stacking discovery problems on discovery problems here. Is there a way to start to push through those things? I think so, yeah. We've got to learn how to use this new technology, right? And it keeps changing how it behaves. If I look, for example, at coders for a moment: they're probably a bit further ahead in this adoption curve, and they're very close to the technology, which I think is why it progresses there really quickly. Programming used to be: you write text, and if you write something wrong, that's bad, and at some point somebody, like a compiler, tells you that's bad, and then you correct it, right? And then at some point it was like, oh, we can do some auto-completions. We kind of know what's possible, so we can show you possibilities, and you pick them. And that's great. And then the first LLM use case was piggybacking on those completions: maybe I can suggest a bit more of what you could write and predict that for you. And that worked really well. And now it's like, well, I don't even write code anymore. I just write what I want to do and let the LLM do it for me. And so you see sort of that pattern, which I oftentimes call prompt-first. If you know what you're doing and you know the system, you actually get really good results. But you're right, there is a discoverability phase where you need to know what a system can do. And I think we had this not that long ago, when we had all the voice assistants. I don't mean the ones we have right now, but back in the day, the Alexas and those kinds of things, when suddenly the voice interface was the hot thing. And then everybody was like, oh my God, this is amazing, I can order an Uber and God knows what. And then everybody got them, and, well, we all know how that one turned out. Right.
It wasn't that useful after all. So the tech is obviously much better now, but people still need to learn how to use the tech, and that just doesn't happen overnight. Prompting is still a skill. Oftentimes you get user feedback, oh, this didn't work, and then you look at it and think, well, I'm not surprised it didn't work, because you didn't really tell it what you wanted. So yeah, discoverability is something I think is extremely important. As those systems become much more proactive, I think this will get better: when a system pushes to you, like, hey, how about that? Or you start typing and it suggests, oh, I know kind of what you want to do, and I know what the system can do, so I can intelligently suggest what's possible and guide you in the right direction. So one place I think that approach could be really useful, and I'm sort of surprised to have not seen more people try to do it, is what you were talking about earlier, the fact that there are lots of different models that all have lots of different skills. It seems to me what we need is not just a sort of model switcher, right? Raycast offers you access to lots of different models, and there are lots of apps out there that are just like, we have all the models in one place. And that's something. But what I actually want is something like an intelligent router between the models, that's like, okay, this is actually the one that is going to do better image generation; and oh, what you prompted me is actually a huge research project, let me funnel it to this. I think the idea that we all have to understand which model is best for which thing is ridiculous, and just bad UI. And it seems like this is the thing that you're actually, in theory, in a pretty good position to orchestrate. Is this a possible thing to do? Why doesn't this exist yet? Yeah. In fact, we started doing that, right? Basically, first you need to understand what the best models are for what thing. Some of that you can measure, but some of it is also a personal choice. But we started doing this, and we noticed there are some models that are just better at, like you said, image generation, or some are better at agentic workflows, where they use certain tools to get a job done, and some are better at recognition of images and all this kind of stuff. So we started basically abstracting that away. We think about it as disclosing the complexity over time. We think the best experience is that you have an automatic mode, which just does what you want, automatically, and you don't need to worry about it. If you ask a question that needs deep research, it does it for you. If you want to generate an image, it picks the best model for that. But then, when you get more advanced, you maybe want the flexibility to go a level deeper, and we want to still give you the configurability where you say, hey, I kind of know what I'm doing, so I want that specific model doing that job. So you can go up and down in the configuration and pick what you want. And I think you see this in the industry, where ChatGPT put out an auto mode and then everybody was freaking out that you can't select models anymore.
And then they kind of had to open it up again. And I think this is something you will see more often, because, I mean, we have these massively smart systems and we still need to figure out which one to use. As you mentioned, it's a bit ridiculous, right? So over time, I think this will just go away, and it will just do what you want and figure it out. Which is not there yet, in a way.
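The automatic mode Thomas describes, with a manual override on top, can be sketched as a small router: classify the request, map it to a model tier, and let an explicit user choice win. Everything below, from the model IDs to the keyword heuristic, is an illustrative assumption rather than how Raycast actually routes.

```typescript
// Minimal sketch of "auto mode with override" model routing.
// Model IDs and the keyword heuristic are placeholder assumptions.
type Task = "quick-edit" | "image" | "deep-research" | "general";

const MODEL_FOR_TASK: Record<Task, string> = {
  "quick-edit": "small-fast-model",
  image: "image-model",
  "deep-research": "large-reasoning-model",
  general: "default-model",
};

// Crude intent classification; a production router might use a small
// classifier model here instead of regexes.
function classify(prompt: string): Task {
  const p = prompt.toLowerCase();
  if (/\b(summarize|rewrite|fix spelling|reformat)\b/.test(p)) return "quick-edit";
  if (/\b(draw|image|picture|illustration)\b/.test(p)) return "image";
  if (/\b(research|in-depth|report|compare)\b/.test(p)) return "deep-research";
  return "general";
}

// An explicit user selection always beats the automatic choice.
function pickModel(prompt: string, userOverride?: string): string {
  return userOverride ?? MODEL_FOR_TASK[classify(prompt)];
}

console.log(pickModel("summarize this blog post")); // "small-fast-model"
console.log(pickModel("deep research on EU AI rules")); // "large-reasoning-model"
console.log(pickModel("anything", "my-favorite-model")); // "my-favorite-model"
```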
Do you think Raycast is in a uniquely good position to do that? Because when I think about Raycast, I think about it as just a box with access to all of my stuff, right? On the one hand, it has access to all the files on my computer, which I think is sort of unique to Raycast. On the other hand, you have, like you said, all of these extensions and all of these apps that I'm plugging into, and I'm, you know, typing my API keys for all of these apps into Raycast. So you have this access. And then, on the third hand, you have access to all of these models. So it seems to me, at least in theory, there's nothing stopping you from orchestrating all of this stuff on my behalf. Is there some big blocker here that I'm missing? Or is it just a matter of figuring out how to make all of this stuff work? Yeah, it's more or less that. I guess you're right, we're basically in the perfect spot; we happen to be there at the right time, in the right place. So yes, we have access to all of that. On top, we also kind of see what you're doing throughout the day, which actions you perform through Raycast, so we have the usage patterns and those things as well. Connecting those things together is, I think, the magic sauce here. That's what makes this really personal and unique and tied to you, because we're all somewhat unique in using our computers differently. We use different apps, and we write differently, and we're interested in different things. So you kind of need this personalization layer. We're also spending hours a day in front of a computer performing things, right? So collecting those, analyzing those, making sure we can predict the next thing you want to do, becoming smarter over time, having this sort of reinforcement learning for you personally: I think that's what really makes a difference. We call this contextual AI. We talk a lot about context generally with AI, right? When you talk to people, prompting is providing context to a large language model so it can produce the best results. But sometimes it's really hard to have the relevant context. If you happen to always be around while somebody works on a computer, you can collect a lot of that and basically become smarter over time and help the person steer in the right direction. Ideally, the computer knows the same things you do, from what you have read and what you have consumed, building that up over time, so it can take the same resources and throw them back at you and connect the dots, because it has read and consumed all the same things, and it ties together the same apps and tools that are already connected to Raycast. How do you start on a project like this? I think a thing that I see a lot of AI developers and people building stuff with these tools do wrong is they try to do the 100 percent version of the thing all at once, right? And I can say with great confidence, most agent workflows don't work. They just don't. And there are a lot of perfectly valid reasons for that. But there's also a lot that this stuff can do. And to me it seems like the struggle right now is figuring out how to sequence this thing so that it eventually gets us to the place we think and hope this technology is going, but actually works today. And what I see is everybody either builds stuff that doesn't work, or it ends up just being a ChatGPT text box, and you just sort of offload the does-it-work-or-does-it-not question to ChatGPT. Have you figured out how to sequence your way to this magical thing that may someday be true, but clearly isn't yet? Yeah, I think that's the tricky part, right? We've all seen the shiny demos in launch videos, and then they fall apart the moment you use them, right? And it's kind of annoying. And sometimes those things don't even ship; they're just videos, and they never get materialized. It's science fiction, right? And it is the dream. This is the interesting thing about this moment: I think people mostly agree on what the dream is, but it is still a dream, right? It's a plausible future, but it is still the future. And I think everybody would do well to remember that a little more often. Yeah. I wonder if it ends up being like self-driving cars: oh, it's just one more year and then it's going to be self-driving, right? And then this goes on for 10 years. The progress that LLMs have made shows a bit of a different trend, right? The trend is more like, wow, we're making a lot of progress in a very short time, and it doesn't seem to stop anywhere close. But I think it boils down to the usual things: do the simple things first. Try things out, because it's such a new technology that you need to get an understanding of what's possible and what is not. So it's a lot of prototyping, and seeing what sticks and what brings real value, and not just the science fiction that maybe works one out of 10 times, because then it's not going to be useful for you, right? And so finding that middle ground is extremely hard. And you see some of those things happening where people see value, which may not be the sci-fi things we all dream up. Say, meeting recordings: those happen on a regular basis now, right? There's some true value there that was not possible before. Or the research case: just consuming and finding information about topics that you would otherwise not have looked up. So there are some of those very concrete examples, and I think there's a lot more out there. Coding is another one, right? It has made just so much progress in such a short period of time. But it's not the super sci-fi stuff, which I think for us is in a way a challenge, because Raycast is like an everything app, right? You open it and then you can type something in. And so finding that middle ground is sometimes hard for us.
But it comes back to, okay, let's see what people do very often, every day, see how we can improve those workflows, and then go sweat the details, do the prototyping, and see what actually makes a difference. And when you find something like that, you can bring it back to users, and that usually resonates, because that's what people are used to. But yeah, there's a lot of exploration, and at the end of the day, everybody cooks with the same water, right? Yeah. Is there an example of that in the product now that you can think of, that feels like that sort of middle ground that you either got right or are in the middle of getting right? Yeah. It's weird, it's actually the simple things sometimes. I use it in meetings, for example, all the time: a word pops up that you don't know, you open Raycast to get the answer. We also optimize Raycast as a tool for things that you use hundreds or thousands of times, right? It does a lot of little things that pile up. So it's for the short interactions. Things that we see people use all the time are just plain reformatting text and fixing spelling, because, well, we're still typing all day long, right? So we make those things easier and faster to do. And then, when people get comfortable with it, they get a bit more adventurous. Then it's like, oh, I just happened to download a bunch of files, I need to move them into a separate folder and also rename them so they make more sense. And so they type in those prompts and see, oh, this works as well, and then they take the next step, right? Yeah. Okay, renaming files is actually a perfect example of the kind of thing I want to talk about with Raycast specifically. Yes. Because we've been talking a lot on this show and elsewhere about the idea that Satya Nadella and Microsoft have right now, that before long you're going to barely use your computer and it will sort of use itself on your behalf. Just to put my own cards on the table: I think that is not correct, at least not in any sort of near future. But I do think there is a lot of room for doing computer tasks without having to do the tasks, right? And I think about all of the things that, you know, we've spent 20 years downloading little tiny utilities to do. These sort of one-off apps that are like, batch-resize a bunch of photos, to pick a simple example, or rename these photos that all have the same name into sequential order based on when I took them. These are the kinds of things we do a lot on our computers that are not hard tasks, and they're not particularly mentally complex tasks, but they're a constant part of computing life. It seems like you're in a position where I should just be able to say to Raycast: rename all of the photos on my desktop based on what they are and when I shot them, and put them in an order that makes sense. Just clean up my desktop for me. Yeah. Are we almost there? Are we there? Are we nowhere near there? Where are we? We're 90 percent there, I would say. Really? In fact, you can do this today in Raycast. We have that, right? You can do this. And then the other 10 percent is that every now and then it doesn't work, right? Thomas, I'm going to try this right now while we sit here, and it's not going to work, and I'm going to be mad at you about it. I'm scared.
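The desktop-cleanup request David tries here is exactly the kind of disposable script an AI tends to generate, and the generated artifact itself needs no AI at all. A minimal sketch of that artifact, assuming photos live on the Desktop and a date-plus-sequence naming scheme, might look like this (no collision handling; it is a sketch, not a hardened tool):

```typescript
// Sketch: rename photos on the Desktop to "YYYY-MM-DD_NNN.ext" by creation
// time, oldest first. Run with a TypeScript-capable Node runtime.
import { readdir, rename, stat } from "node:fs/promises";
import { extname, join } from "node:path";
import { homedir } from "node:os";

async function renamePhotosByDate(folder: string): Promise<void> {
  const photos = (await readdir(folder)).filter((f) =>
    /\.(jpe?g|png|heic)$/i.test(f)
  );
  // Pair each file with its creation time so we can sort chronologically.
  const dated = await Promise.all(
    photos.map(async (name) => {
      const info = await stat(join(folder, name));
      return { name, created: info.birthtime };
    })
  );
  dated.sort((a, b) => a.created.getTime() - b.created.getTime());
  for (const [i, file] of dated.entries()) {
    const day = file.created.toISOString().slice(0, 10); // YYYY-MM-DD
    const seq = String(i + 1).padStart(3, "0"); // 001, 002, ...
    const next = `${day}_${seq}${extname(file.name).toLowerCase()}`;
    if (next !== file.name) {
      await rename(join(folder, file.name), join(folder, next));
    }
  }
}

renamePhotosByDate(join(homedir(), "Desktop")).catch(console.error);
```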
But it is possible, right? And you mentioned this sort of super-agentic, OS-ish thing, right? Where the computer does everything for you? I mean, when we reach that state, we're talking about AGI, right? And then there's a question: why should I even open a computer? What is a computer at that moment, right? Sure. I think how we think about it is more: what's an intelligent OS? What's sort of the AI OS, right? How will operating systems change to adapt to this new future where everything can be smart and it's not necessarily static? You mentioned you maybe want a little app to do something. What if you could have this app just by asking AI, and it builds this little app for you, and then you have it for yourself, right? You use it for the job, and when the job is done, maybe it gets disposed of, and that's fine. So it's this one-off software, this personal kind of software, that is personal to you, but maybe also to your team or your company, that is very tailored to the use case you want. I think that's something quite fascinating as things get smarter and software maybe gets cheaper to build. I think there is something quite fascinating when your operating system works that way, where you can prompt things into existence for a short period when you need them, and when the job is done, you just don't need to use them anymore. And tomorrow you have a different one, or maybe at some point you have apps that just appear there as you progress through your day. Like, oh, I saw David needs to do certain things; hey, here's a little app for you that you can probably use. All right, we've got to take one more break, and then we're going to come back and finish my conversation with Thomas Paul Mann. Be right back. The straightener and multi-styler and the new LED face mask, both of which were recognized as CES 2026 Innovation Award Honorees. Learn more about both technologies on L'Oreal.com. L'Oreal Group, create the beauty that moves the world. All right, we're back. We're talking AI with Thomas Paul Mann. Let's get back into it. That brings up another thing that I've been wondering about. I think a thing that Raycast did really well early on was make it really easy to build Raycast extensions. It's just a little bit of fairly straightforward JavaScript, and you can have something up and running pretty fast. And so you've built sort of an app store on top of Raycast in a way that seems to be working really well, and there's a lot of stuff, and it's pretty easy to do. Does that all eventually go away if we get agentic AI that is good enough to just go do all this stuff on my behalf, and I no longer need this interim step of somebody having built an extension that helps go do it? Or is what we actually need lots and lots of extensions? Should I be using AI to build JavaScript extensions for Raycast, or should I be using Raycast to just completely obviate the JavaScript extensions? Yeah, fair point. So yeah, extensions were really what put us on the map, because we realized really quickly that people just want to integrate Raycast with everything, basically, and there's no way we can build all of that. So we gave it out to the community, and we made it super easy to build them, and that allowed us to have over 2,000 extensions now in the store. And every day there are new contributions coming in, and so on and so forth.
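For a sense of how little code "super easy to build" can mean in practice: a Raycast command built on the public @raycast/api package can be a single exported function. This is a hedged, minimal sketch of a no-view command, not taken from any shipped extension; the word-count behavior is just an illustrative choice.

```typescript
// Minimal sketch of a Raycast "no-view" command using the public
// @raycast/api package. showHUD briefly flashes a message and closes
// the window; Clipboard.readText reads the current clipboard contents.
import { Clipboard, showHUD } from "@raycast/api";

export default async function Command() {
  const text = await Clipboard.readText();
  const words = text ? text.trim().split(/\s+/).length : 0;
  await showHUD(`Clipboard contains ${words} word${words === 1 ? "" : "s"}`);
}
```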
But if you take a step back, what we really wanted to do is build a productivity platform. That's what we wanted to do, and extensions are almost an implementation detail, or JavaScript itself; even extensions are an implementation detail, right? Imagine for a second those didn't exist, but the services still exist, right? You still want to do something with Google Docs or Spotify or, you name it, all your files. And so the idea was always: how can we integrate with those things really easily so we can do the job for you? The solution we had was that people can build extensions and you can use them. But you could equally imagine an AI building something like that for you so you can use it, and then your extension might behave differently. So the notion of an extension becomes almost a bit blurry, right? It's just evolving software, in a way. And even for yourself, you're probably just downloading some extensions; you haven't built them in the first place, somebody else built them. So it's not too far off from you prompting an AI to come back with a solution for you, but tailored to you, right? The key thing, I think, is to make it all cohesive. If everything is different and you can't find your way around, it becomes quite annoying and not useful, right? That's why people prefer apps in the first place, and why apps on mobile phones won: because they're optimized for the phone. They follow the same UI and UX patterns, and people know how to use them. And I think it's going to be similar here. To make it really useful, we want to integrate with everything around us and make it extremely easy for you to consume that information. And then, because software becomes, in some ways, free to create, at least little apps, you can transform it into however you want to consume it. And I think that's super exciting, because we're all slightly different and we have different preferences. I maybe want to see a chart where you would want a different representation. And if you could just change that with a little prompt, and then you have it your way, I think that's super exciting. Software basically becomes malleable; you can change it ad hoc, it becomes just what you want, and it gets this really personal touch. That's what I'm personally really excited about, and that's where I feel like operating systems will really evolve: into something that is a personal operating system, where they're not all looking the same and software is not all the same; it's tailored to the person who sits in front of the screen. Yeah, it's funny. One of the things I talk about all the time with AI stuff that I think is actually really powerful is just simple CSS stuff for styling apps and webpages. Just the idea that all of a sudden I have the power to tell this app that I want it to be blue, and it can be, because that's a thing that Claude Code can do, right? It can change the CSS to make it blue.
That is a thing it is capable of doing. And then what you need on the other side is basically just the hooks that let it do that. And I think what it's been before is, okay, you have to build a bunch of complicated things, and you have to come up with a whole how-do-we-display-the-color-wheel scheme. That's not an impossible thing to do, but it is a thing to do. But if you just let people plug in that way, you give them all kinds of opportunities and options, just by opening it up to: we're going to let you build this however you want to build it. Yeah, I think we have all the building blocks, right? Right, but I think what I'm getting at with the extensions thing is, as you're thinking about AI, and just to go back to this: I want to rename a bunch of photos in a folder on my computer, which is a thing Raycast is very well set up to do. If I prompt Raycast kind of out of nowhere to just do that, you have a bunch of tools and a bunch of, you know, agentic systems that will go try and figure out how to do that for me. Or should I build the thing once, like vibe-code my way into a Raycast extension that renames files on my computer, and then just use that over and over, because now I've built a thing that is reliable and robust and simple and will do the same thing every time? The problem with a lot of these AI systems is they don't do the same thing the same way every time. And sometimes that's exciting and interesting and leads you down different roads, but other times I just want it to rename the photos. I don't need new ideas about renaming photos. I need you to rename the photos, in the same way, all the time. Exactly. And especially as you're thinking about this, it's like, okay, do we want to use all of these AI models to build rigid, structured things that you can then do on your computer over and over, reliably? Or is the kind of open-endedness of the system a feature, not a bug? And I just can't quite figure out where I land on that spectrum. Yeah, it's a tricky one, but I think for tools, having something unpredictable is a no-go, right? You wouldn't use, let's say, something complex like Photoshop if half of the time the pixel turns red and half of the time it turns blue. You couldn't work, right? Right. And so I think that's the strong argument for software, right? If you can generate the software once, you don't need any AI anymore; it just works, and it does the job perfectly all the time. I think that's a feature, right? It's not a bug; it's great. So I lean much more toward that, because that's kind of what the world runs on: software gets written once and then you use it. And you can always adapt it, right? Tomorrow you say, oh, rename the files this way now, and then you can use that. And I think there's something quite nice when you get out an artifact that you can use. That's what we have at the moment with extensions, right? You get this artifact out, and you can use those extensions over and over again. Where we sometimes struggle is that sometimes the non-smart ways of doing things, just because they're so reliable and fast and become muscle memory, are somewhat better, in a way. And so you kind of want to find a middle ground.
And I think for tasks that are very concrete, you want what you mentioned: you have an app, an extension, whatever it is, but it does the job, and it does it the same way all the time. Great. And there are other tasks, and I feel like they're oftentimes the more open-ended ones: they don't have a single solution, they have nuances, you don't even know exactly what you want. Those, I feel, are the ones that are really good with AI, where it just goes out and does something for you, and you come back and say, oh, I hadn't thought about that. That's cool. That's a nice solution. So yeah, I think there is something nice about the concreteness of software: you write it once, and then it works the same way the whole time. Yeah, that makes sense. Does your quality bar have to be higher than some others, because you have this kind of access to all of the apps, and even the system? Like, if you wanted to break my computer, or allow ChatGPT to break my computer, you could. You have an unusual level of access to my computer in that sense. Do you have to treat this kind of nascent technology differently because of it? There is certainly a lot of scrutiny there. When users come to us, they oftentimes ask, oh, is AI running in the background? Can it do something? And so we actually had to put a lot of UI and callouts into the product to say: okay, this is secure; this is not running if you're not triggering it; you're in control. So if there is a destructive action, for example deleting a file, you will be prompted, and you can say yes or no to that. And that's definitely something we need to do more than others, where others can go a bit more YOLO, in a way. Because we have this system access you mentioned, we can access your system in a very deep fashion, and so we need to build up this trust. And that's also what people expect from us. They've used it for years already, and it becomes, well, it always works, right? It's this app that basically can never fail, because it's always there, and if you don't have it, people feel like they can't use their computer anymore. So we put a lot of effort into making it super stable. And it's the same here: if you use this, it needs to work basically all the time, which, as we discussed, is really challenging, right? And with machine learning and AI generally, it will never be 100 percent, right? The technology just doesn't get you there. So it's always a question of how far you can push it. That's why we have all these benchmarks, where all the model providers try to climb up and be on top of each other. But you will never be 100 percent correct. And that makes it even more important to have the guard rails, right? So if something goes wrong, you can easily recover, or in an ideal world, it never goes off the rails. And you basically give the user the control, which is often described as having the human in the loop, even though that feels like, again, a bit of a sci-fi term, the human. I mean, yeah. Yeah.
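The confirm-before-destructive-actions behavior Thomas describes maps to a simple human-in-the-loop pattern: gate irreversible operations behind an explicit prompt unless the user has opted out. Here is a sketch using Raycast's public confirmAlert and trash APIs; the skipConfirmations flag is a hypothetical stand-in for the "always delete, don't ask me" preference he mentions below.

```typescript
// Sketch: human-in-the-loop gate for destructive actions. confirmAlert,
// Alert, and trash are part of the public @raycast/api; the
// skipConfirmations preference is a hypothetical assumption.
import { Alert, confirmAlert, trash } from "@raycast/api";

async function deleteFileWithGuardRail(
  path: string,
  skipConfirmations: boolean
): Promise<boolean> {
  if (!skipConfirmations) {
    const ok = await confirmAlert({
      title: "Delete file?",
      message: path,
      primaryAction: { title: "Delete", style: Alert.ActionStyle.Destructive },
    });
    if (!ok) return false; // user backed out; nothing happens
  }
  await trash(path); // recoverable via the system Trash, not a hard delete
  return true;
}
```

Note the design choice the conversation turns on: the undoable path (moving to the Trash) might skip the prompt entirely, while a truly irreversible action would always keep it.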
So do you have to be extra careful about that stuff at every turn? Does it make building Raycast harder, because you've built in this AI stuff that can do so much but is kind of unpredictable in that way? I wouldn't say necessarily harder, but it's something we think about from the get-go. We said, hey, we want to build a privacy-focused company. We don't want to collect your data and that kind of stuff. So that is something we build trust on. You just need to be smart about what you build, and maybe what you shouldn't build. And when you build it, build it in an elegant way, and give the user the choice of whether they want to use it, and if they use it, give them control. You can also say, hey, always delete my files, don't ask me for confirmations. That's a user configuration, right? But by default, that's not turned on, for reasons. So, basically, flexibility. Exactly. Yeah. Full nihilism. Just, whatever, delete anything you want. Go on, then. See what happens. But then you also want to be smart, right? If it's a rename that you could undo, you don't want to prompt the user for that. So this is, I think, the complexity you may be referring to. You maybe need to think a bit differently about certain things to make sure that users build up confidence over time. Okay. What's something you wouldn't build? You mentioned things you can and can't do because you have this kind of access. Is there something that feels obviously over the line to you on that front? I should watch out now what I say, obviously. But I think it goes to the privacy aspect. We had certain things, for example, to give you a sense of what we felt was quite cool: we have this feature called Focus, and the idea is that you can block distractions, like websites and other things, and it blocks them out, and if you go there, you see a warning, and so on and so forth. And initially we had ideas like, hey, wouldn't it be cool to make this smart, so that you don't even need to configure what you want to block? It just detects that this is probably a distraction. And the way you would do that is probably screen recording all the time, or some screenshots, and then you send them out and analyze them, and then you come back with an answer. But in the end we felt like this was maybe stretching it a bit too far, analyzing your screen all the time, which we don't really want to do. And we realized users would probably be very hesitant. Then we thought about using local LLMs for that. And then we said: actually, the person who sits in front of the computer kind of knows what the distractions are, and the better solution is probably just letting them define it, as boring as that sounds. So I feel like sometimes that's the right thing, right? I mean, we still have intelligence; we can think. So sometimes maybe we can also put in what we want ourselves. So that was just one of the things that came to mind, where we first started off with, oh, let's make this super cool AI solution, and then you ask yourself three times why, and you end up at: yeah, maybe a more traditional solution actually cuts it here. That's such a good example, because that is the sort of thing that at first glance you're like, yeah, it would be useful if Raycast or my system could understand the places where I'm wasting my time, right? Because it's going to be slightly different for everybody. I spend too much time on Reddit; you might spend too much time on Instagram. And if I could just be like, just delete all the places where I waste time, and it could do that. There's something that is cool about that.
And there is something that is immediately horrifying and off-putting about that. Exactly. What a lot of companies have said forever is: we're going to push through that discomfort and trust that people will eventually get used to it. We've made it so convenient that they're going to get past the ick factor. And I think, A, this stuff just doesn't work reliably enough yet to do that in a really predictable way. The minute I go to my work email and my focus session says no, I'm out, right? We've now broken the system. But also, I think, frankly, every developer has some responsibility here to say: it's actually okay that we're not comfortable with this, and maybe I shouldn't be pushing you to get comfortable with this. Maybe I should be asking you to make decisions, because you're a person capable of making decisions, not to get over the fact that I'm going to make them for you. And I think we're about to go through a million versions of that with all of this AI stuff. Should we just bet on the tech getting good enough that everybody will get used to it, or have to? Or should we continue to make an effort to let people be in charge of their own existence? And this gets big and heady and existential really fast. But it does feel like we're encountering that question a million times every day. And I keep thinking back to this thing Satya Nadella said, about how we're not that far away from people mostly not using their computers and just directing their computers to use themselves. And I think philosophically there are ways in which that feels wrong to me. I feel like it's always sort of this value exchange, right? What do you put in, and what do you get out? If it's super valuable, people are willing to put certain things in, right? I mean, people upload health stuff to chatbots nowadays and all this kind of stuff, but they're getting something out of it, right? So I think it's always the question: what is the value exchange here? I think at this moment it's really hard to predict the future. If I looked back two years, when this whole AI wave basically just started, would you have thought the world would be as it is right now, where everything is AI? I don't know. It's really hard to predict. Would you have thought coding would change that much? Would you have thought, whatever, pick any topic, really. It's really, really hard to predict. And I think it's the classic: we overestimate the short term and underestimate the long term. I think it's really like that here. I have no idea what's going to happen in the next six to 12 months. I mean, everything changes so rapidly. One thing is clear: these things are here to stay. You sometimes hear that even if no models progress any further, we have by no means reached the limit of what you can do with even the state of the art, right? And I think that's kind of nice for everybody in the industry, because, let's be honest, before AI there was a bit of a dry phase in tech, right? Everything was hyper-optimized, and nothing really radical changed, at least in terms of software. And now there's a lot of buzz, and every week there is something new. And even if everything stagnates, we haven't reached the limits of what we can do with all the technologies invented in the last two years alone. Yeah.
No, and it is strange that everyone is so busy with the end state. I mean, the self-driving car thing is a perfect example, right? Everybody is so busy trying to invent the absolute end state of this, like, what if it reshaped society? It's like: no, no, no. What if my car parked itself? That's awesome. Let's do that. Let's figure out how my car can park itself, and then how my car can run more efficiently. There are a million things along the way that are cool and exciting and powerful and don't require rethinking the way an economy works. Let's not skip all the steps, because those are interesting things on the way to something potentially bigger.

Before I let you go, let's just spend a couple of minutes talking about how you use AI, in Raycast and in general. Where does this stuff fit into your day-to-day life and workflows right now?

Yeah. The biggest change for me is that it's prompt first now. Basically everything I do starts with a prompt. Hmm, we launched something, okay, I've got to write a blog post: let me ramble into my microphone for five minutes, and that's my starting point, and then I iterate on it. That's one of the things. Oh, I need to answer emails, which I do a lot: okay, I'm going to do a lot with AI there. Writing code, same way. One thing that changed quite radically for me is that you can do things in parallel, in the background. I can just kick off a bunch of things. Oh, there's a feature request on Twitter: okay, let me kick something off and address that right away. Oh, there's another one here: let me do that as well. Oh, I have this idea: let me kick off some deep research and figure out what a good solution would be. Oh, I need to prepare for the board meeting: let me put a few things together. So my brain is completely rewired; I'm prompt first by now. I basically just start things with a prompt and then see.

Wait, I have a procedural question about that.

Oh, yeah, please.

If you start everything with a prompt, is the goal then to filter everything out into somewhere? Or do you find yourself living more and more of your life inside the chats of these LLMs?

Oh, yeah, there are sometimes things that just live inside our AI chat in Raycast and never really produce an output, right? Maybe it's me chatting through something for a while. Pick any topic: oh, I want to think about how we can land a deal, these are the points we have, what are elegant ways to continue the conversation? How could I find a solution to reach our customers better? I almost think about it as a thinking partner: throwing things back and forth, talking with somebody for a bit, and sharpening my thinking in a quicker fashion. I use it like that a whole lot. And I see the same thing in our company: more and more people just start with a prompt. A big change we see in the company is that all our designers code now. What used to be all static designs more and more becomes interactive prototypes directly in our product. They can produce something you can feel and see, and it works, and then oftentimes an engineer brushes it up.
But all our designers are basically also halfway developers now, which is an incredible change. And I think that's really nice for creative people as well, because there was always this barrier: oh, you draw a few pixels, and then somebody else needs to rebuild them to make them interactive. Now we cross that barrier, because essentially it's just a prompt. If you're a creative person and you have the will, you can make things happen. I'm super, super happy to see that coding is becoming more accessible in a way. It still falls short in plenty of ways when it comes to programming, but that's something we've seen happening really heavily in our company: designers becoming developers too.

Okay. Yeah. Part of the reason I ask is that one of the things that was most unlocking in my brain was the thing in Raycast where you can basically @-mention one of your extensions...

Oh, yes.

...and then prompt it. That, to me, is like, okay, now we're getting to the sequence of things that make sense together, right? I don't now need a bunch of different, very specific apps. I can just ask AI models to talk to the apps they already have access to. It sometimes works, it sometimes doesn't. My whole clean-up-the-desktop thing has not worked at all while we've been sitting here. Just nothing. It gave me a bunch of semi-helpful information about the files that I have.

We've got to improve it.

That's the way it goes, but I can do more prompting; I'll figure some stuff out. I think there's something that unlocks when you start to see, okay, here are the things that are available to me. And you've just seen more of those things than most people. So I was curious: are you constantly doing computer activities through prompts now? Does everything start with a prompt?

Yeah, pretty much. For me it's a lot of: I'm in a browser, I have a few tabs open, I pull them in with @browser. Essentially I get all the tabs in, and I start from there. Then I say, oh, by the way, put this in a Notion page. So it ends up in a Notion page and I can share it with my team, and then I iterate on the Notion page. I do those things quite a lot. But I also let it write code for me to do certain tasks. A bit of a silly example, but I had to do my tax return. Well, I didn't do the tax return with AI, right? But for it I needed to download all my payroll documents, and all of them had a password. So I just asked the AI: hey, take those ten PDFs, here's the password, can you remove it so I can send them to my accountant? And it did it for me. It just wrote some code. I didn't really look at the code, because I kind of know what it would do, and it worked perfectly. Otherwise I would have spent, I don't know, five minutes going over each PDF, first of all figuring out how to remove a password, which I have no idea how to do. That's the change I'm quite happy about.
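A throwaway script for that task might look something like the sketch below. It assumes the qpdf command-line tool is installed; the file names and password are placeholders, not details from the episode.

```typescript
// Sketch of a generated one-off script: strip the password from a batch of
// PDFs by shelling out to the qpdf CLI (must be installed separately).
import { execFileSync } from "node:child_process";

const password = "REPLACE_ME"; // the shared password on the PDFs
const files = ["payroll-01.pdf", "payroll-02.pdf"]; // ...and the other eight

for (const file of files) {
  const output = file.replace(/\.pdf$/, "-decrypted.pdf");
  // --decrypt rewrites the PDF without encryption so it opens freely.
  execFileSync("qpdf", [`--password=${password}`, "--decrypt", file, output]);
  console.log(`removed password: ${file} -> ${output}`);
}
```

Once it exists, a script like this is exactly the kind of reusable artifact Thomas gets at next: solve the problem once, then run it again whenever it comes up.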
And for programmers, this has kind of existed for a long time. We call these things scripts: little things every programmer has for the various random tasks they do multiple times a day. What if the script is just natural language? What if you just say it, and then, to your point, if it solves the problem once, you just reuse it? You can use it many times, right? Those are the kinds of little things that will make a big difference, and that's what we do with Raycast. We want to speed up every little thing, and you use Raycast hundreds of times a day. What are the next hundred things you should do with Raycast? That's how we think about it. What are the problems we can solve that you actually hit super often, not just once a year? That's the journey we're on.

That's pretty cool. And once you have computer access, the number of things that can fall under it becomes just enormous.

Yes. And you have access to the browser.

Again, this is why I think Raycast is so fascinating: you can see the whole stack in a way that is very hard for almost any other app. It means the trust bar for you is very high. And it also means, we talk a lot about how these AI agents just can't see and do all the things they need to. Raycast kind of can.

Yeah. That's the nice position to be in, being in a spot to do all of this kind of stuff. But we still have to connect all the dots and build up the discoverability, as you mentioned, and make sure people get it, and make sure people get real value out of it. I've seen so many demos of cool stuff that you're never going to use day to day, or so rarely that it doesn't really matter. So that's really the challenge for us: natural language is great, but discoverability is hard. You don't know what's feasible, and so on and so forth. But yeah, I'm excited about this: helping make your computer smarter using the same apps and tools you already have, with one AI that follows you around across your journey on your computer, instead of an AI in every app where everything is isolated. We've been there with apps, right? It's kind of annoying, and we don't want to repeat that, with all our knowledge, memory, and context living in each and every app. And I get it: every one of those app companies wants to have this, right? They want to lock you in so you stay in that single app. There are financial incentives, right? They don't want to give that away. But if you think purely from a user's standpoint, AI should be at the operating system level. It just makes so much more sense for it to be there instead of in every app, with every app needing to rebuild it. It just happens to be this gold rush that everybody sees. But truly, from a user's point of view, I feel the best thing is a smart operating system that helps you get your job done.

Yeah, I agree. All right, Thomas, this has been very fun. Thank you so much for doing this with me.

Well, thanks for having me. Long-time listener, and we finally made our way here together.

We did it. All right, that's it for the show. Thank you to Thomas again for being here, and thank you to all of you for watching and listening, as always. If you have questions, if you have Raycast extensions you want to tell me about, if you have thoughts, concerns, or feelings about any of this, I want to hear all of them. You can call the hotline at 866-VERGE11. You can email vergecast@theverge.com. I'm david@theverge.com. Hit us up.
I think this question of how AI belongs in our software is big and fascinating and messy, and I want to know how you feel about it. So get at us; ask us all your questions. We have another one of these coming up next week about a very different kind of app that I'm very excited to talk about. We'll get to that. But for now, The Vergecast is a Verge production and part of the Vox Media podcast network. The show is produced by Eric Gomez, Brandon Kieffer, and Travis Larchuk. We'll be back on Tuesday and Friday with all your usual good Vergecast stuff. We'll see you then. Rock and roll. Be strong, whatever you're doing.