Cursor's Third Era: Cloud Agents
Cursor's founders discuss their latest Cloud Agents launch, which gives AI agents full computer access in cloud VMs to write, test, and deploy code autonomously. They explore the shift from tab completion to agentic coding, parallel agent workflows, and how coding patterns are fundamentally changing as agents take on larger development tasks.
- Cloud agents with full VM access represent a paradigm shift from local coding assistance to autonomous development workflows
- Video recordings of agent work are becoming crucial for code review, making it easier to evaluate large diffs and agent decisions
- The future of coding involves parallel agent workflows where developers manage multiple concurrent tasks rather than writing code directly
- Agent labs like Cursor are building routing systems to automatically select optimal models, reducing user cognitive load
- Self-aware agents that can modify their own system prompts and understand their environment constraints will be key to scaling autonomous development
"We think that over the months the big unlock is not going to be one person with a model getting more done. Like the water flowing faster, it will be making the pipe much wider and so parallelizing more."
"If you put yourself in the model's shoes and you are seeing tokens stream by and all you could do was sight-read code and spit out tokens and hope that you had done the right thing. No chance, I'd be so bad."
"We have found that in this new world where agents can end to end write much more code, reviewing the code is one of these new bottlenecks that crop up."
"10 person startups need the DEVEX and pipelines that a 10,000 person company used to need."
"I think honestly it's 1%. If I just am like, get frustrated and I'm like, I don't want to go have it tell an agent to change this one thing."
So this is another experiment that we ran last year and didn't decide to ship at that time, but may come back to: an LLM judge, but one that was also agentic and could write code. So it wasn't just picking, but also taking the learnings from the two models or N models that it was looking at and writing a new diff. And what we found was that there were strengths to using models from different model providers as the base level of this process. Basically you could get an almost synergistic output that was better than having a very unified bottom model tier.
0:00
We think that over the months the big unlock is not going to be one person with a model getting more done. Like the water flowing faster, it will be making the pipe much wider and so parallelizing more. Whether that's swarms of agents or parallel agents, both of those are things that contribute to getting much more done in the same amount of time.
0:30
This week, one of the biggest launches that Cursor has ever done is cloud agents. I think you had cloud agents before, but this was like you give Cursor a computer, right?
0:53
Yeah.
1:03
So it's just basically they bought Autotab and then they repackaged it. Is that what's going on, or?
1:03
That's a big part of it. Yeah. Cloud agents already ran in their own computers, but they were sort of sight-reading code. Those computers were typically blank VMs that were not set up with the devex for whatever repo the agent's working on. One of the things that we talk about is: if you put yourself in the model's shoes, and you are seeing tokens stream by, and all you could do was sight-read code and spit out tokens and hope that you had done the right thing? No chance, I'd be so bad. Like, obviously you need to run the code. And so that, I think, is probably not that contrarian of a take, but no one has done that yet. And so giving the model the tools to onboard itself and then use full computer use, end to end, pixels in, coordinates out, and have a cloud computer with different apps in it, is the big unlock that we've seen internally in terms of usage of this going from "oh, we use it for little copy changes" to "no, we're really driving new features with this kind of new type of agentic workflow."
1:09
All right, let's see it.
2:07
Cool. So this is what it looks like in cursor.com/agents. So this is one I kicked off a while ago. So on the left-hand side is the chat, very classic sort of agentic thing. The big new thing here is that the agent will test its changes. So you can see here it worked for half an hour. That is because it not only took time to write the tokens of code, it also took time to test them end to end. So it started dev servers and iterated when needed. And so that's one part of it: the model works for longer and doesn't come back with an "I tried some things" PR, but an "I tested it" PR that's ready for your review. One of the other intuition pumps we use there is: if a human gave you a PR, asked you to review it, and they hadn't tested it, you'd also be annoyed, because you'd be like, only ask me for a review once it's actually ready. So that's what we've done.
2:08
Simple question I wanted to ask up front. Some PRs are way smaller, like just a copy change. Does it always do the video, or is it sometimes?
2:56
Sometimes.
3:03
Okay, so what's the judgment?
3:04
The model does it. So we do some default prompting with sort of what types of changes to test. There's a slash command that people can do called slash no test, where if you do that, the model will not
3:05
test, but the default is test.
3:17
The default is to be calibrated. So we tell it: don't test very simple copy changes, but test more complex things. And then users can also write their AGENTS.md and specify things like "if you're editing this subpart of my monorepo, never test it, because that won't work" or whatever. Okay, so pillar one is the model actually testing. Pillar two is the model coming back with a video of what it did. We have found that in this new world where agents can end to end write much more code, reviewing the code is one of these new bottlenecks that crop up. And so reviewing a video is not a substitute for reviewing code, but it is an entry point that is much, much easier to start with than glancing at some giant diff. And so typically you kick one off, it's done, you come back, and the first thing that you would do is watch this video. So this is a video of it. In this case, I wanted a tooltip over this button. And so it went and showed me what that looks like in this video. I think here it actually used a gallery. So sometimes it will build Storybook-type galleries where you can see that component in action. And so pillar two is these demo videos of what it built. And then pillar number three is I have full remote control access to this VM. So I can go in here, I can hover things, I can type, I have full control, and same thing for the terminal, I have full access. And so that is also really useful, because sometimes the video is all you need to see, and oftentimes, by the way, the video is not perfect. The video will show you: is this worth merging immediately, or, oftentimes, is this worth iterating with to get it to that final stage where I am ready to merge it? And so I can go through some other examples where the first video wasn't perfect, but it gave me confidence that we were on the right track, and two or three follow-ups later it was good to go.
And then I also have full access here, where some things you just want to play around with. You want to get a feel for what this is. And there's no substitute for a live preview, and the VNC kind of VM remote access gives you that.
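To make the "calibrated by default" behavior concrete, here is a minimal Python sketch of the precedence described above: an explicit opt-out first, then repo-specific rules, then the default of skipping only trivial copy changes. In the product this judgment is made by the model through prompting, not by a hard-coded function, so every name here (`Change`, `RepoRules`, `should_test`, the glob patterns, the five-line threshold) is a hypothetical illustration rather than Cursor's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Change:
    paths: list[str]          # files touched by the agent
    lines_changed: int        # rough size of the diff
    copy_only: bool = False   # True when only user-facing strings changed


@dataclass
class RepoRules:
    # Hypothetical stand-in for rules a user might write in their agents file.
    never_test_globs: list[str] = field(default_factory=list)


def should_test(change: Change, rules: RepoRules, no_test_flag: bool = False) -> bool:
    """Decide whether to run an end-to-end test for this change."""
    if no_test_flag:
        return False  # explicit user opt-out wins
    if any(fnmatch(p, g) for p in change.paths for g in rules.never_test_globs):
        return False  # repo-specific rule: testing is known not to work here
    if change.copy_only and change.lines_changed <= 5:
        return False  # very simple copy change (threshold is an assumption)
    return True       # default: test


rules = RepoRules(never_test_globs=["legacy/*"])
print(should_test(Change(["web/button.tsx"], 40), rules))
print(should_test(Change(["legacy/app/main.py"], 40), rules))
```

The ordering is the design point: user intent overrides repo rules, and repo rules override the default heuristic.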
3:20
Amazing. Well, sorry, what is VNC?
5:17
Just the remote desktop. Remote desktop, yeah.
5:19
Sam, any other details that you always want to call out?
5:22
Yeah, for me the videos have been super helpful, especially because a common problem for me with agents, and cloud agents beforehand, was underspecification in my requests. Plan mode, and really going back and forth to get a detailed implementation spec, is one way to reduce the risk of underspecification. But similar to how human communication breaks down over time, you have this risk where it's only when I go to the trouble of pulling down and running this branch locally that I see: I said this should be a toggle and you have a checkbox, why didn't you get that detail? Having the video up front makes that alignment very clear, like you're talking about a shared artifact with the agent, which has been just super helpful for me.
5:24
I can quickly run through some other examples. So this is a very front end heavy one. Yes, I was going to say is
6:07
this only for front end?
6:13
Exactly. One question you might have, is this only for front end? So this is another example where the thing I wanted it to implement was a better error message for saving secrets. So the cloud agents support adding secrets. That's part of what it needs to access certain systems, part of onboarding that
6:14
is giving on cloud agents.
6:34
Yes. So this is a fun thing is
6:36
it can get super meta, it can
6:38
get super meta, it can start its own cloud agents, it can talk to its own cloud agents. Sometimes it's hard to wrap your mind around that. We have disabled its cloud agents starting more cloud agents. So we currently disallow that.
6:40
Someday you might.
6:52
Someday we might. Someday we might. So this actually was mostly a backend change in terms of the error handling here, where if the secret is far too large, it would... oh, this is actually really cool. That's the devtools. So if the secret is far too large, we don't allow it: we have a size limit on secrets. And the error message there was really bad. It was just some generic "failed to save" message. So I was like, hey, we want a better error message. So the first cool thing it did here, with zero prompting on how to test this: instead of typing out the character "a" 5,000 times to hit the limit, it opens devtools, writes JS to paste 5,000 characters of the letter A into the input, closes the devtools, hits save, and gets the new error message. So it looks like the video actually cut off, but here you can see the screenshot of the error message. So that is a front end, back end, end-to-end feature to get that.
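As a rough illustration of the change described here, the Python sketch below replaces a generic failure with a specific size-limit error, then exercises it with a 5,000-character input like the one the agent generated. The actual limit, error wording, and storage layer are not stated in the conversation, so `MAX_SECRET_BYTES`, `SecretTooLargeError`, and `save_secret` are all hypothetical.

```python
from __future__ import annotations

MAX_SECRET_BYTES = 4096  # hypothetical limit; the real value is not stated


class SecretTooLargeError(ValueError):
    """Raised instead of a generic 'failed to save' error."""


def save_secret(store: dict[str, str], name: str, value: str) -> None:
    size = len(value.encode("utf-8"))
    if size > MAX_SECRET_BYTES:
        # Tell the user what went wrong and what the limit is.
        raise SecretTooLargeError(
            f"Secret '{name}' is {size} bytes; the maximum is {MAX_SECRET_BYTES} bytes."
        )
    store[name] = value


# Same trick as in the demo: generate the oversized input rather than typing it.
store: dict[str, str] = {}
try:
    save_secret(store, "API_KEY", "a" * 5000)
except SecretTooLargeError as err:
    message = str(err)
    print(message)
```

Generating the oversized value programmatically is the same shortcut the agent took in devtools; the point of the test is the error path, not the typing.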
6:53
Yeah. And you just need a full vm, full computer run, everything. Okay.
7:47
Yeah, yeah. So we've had versions of this. This is one of the Autotab lessons, where we started that in 2022. No, in 2023. And at the time it was like browser use, DOM, all these different things. And I think we ended up very sort of AGI-pilled, in the sense that you just give the model pixels, give it a box. A brain in a box is what you want. And you want to remove limitations around context and capabilities such that the bottleneck should be the intelligence. And given how smart models are today, that's a very far-out bottleneck. And so giving it its full VM and having it be onboarded with devex set up like a human would has just been, for us internally, a really big step change in capability.
7:51
Yeah, I would say let's call it a year ago, the models weren't even good enough to do any of this stuff.
8:36
So even six months ago.
8:41
Yeah, so yeah, what people have told me is like round about Sonnet 4.5 is when this started being good enough to just automate fully by pixel.
8:43
Yeah, I think it's always a question of when is good enough. I think we found in particular with Opus 4.5, 4.6, and Codex 5.3 that those were additional step changes in the autonomy-grade capabilities of the model to just go off and figure out the details and come back when it's done.
8:51
I want to appreciate a couple details. One, TanStack Router. I see it. Yeah, I'm a big fan. Do you know I helped name TanStack? This is random lore. I'm buddies with Tanner. And then the other thing, if you switch back to the video. Yeah, I want to shout out this thing. Probably Sam did it. I don't know.
9:06
The chapters.
9:21
What is this called? Yeah, this is called Chapters. It's like a Vimeo thing. I don't know. But it's so nice. The design details like the. And obviously a company called Cursor has to have a beautiful cursor.
9:21
And it is a cursor. Cursor.
9:31
You see it branded Cursor.
9:32
Cursor. Yeah. Okay, cool.
9:33
And then I was like. I complained to Evan. I was like, okay, you guys branded everything but the wallpaper. And he was like, no, that's a cursor wallpaper. I was like, what?
9:34
Yeah, Rio picked the wallpaper, I think. Yeah. The video. That's probably Alexi and a few others on the team with the Chapters on the video. Matthew, Frederica. There's been a lot of teamwork on this. It's a huge effort.
9:42
I just like design details. Yes. And then when you download it, it adds like a little cursor kind of TikTok clip thing.
9:53
Yes, yes.
10:00
So to make it really obvious it's
10:01
from Cursor, we did the TikTok branding at the end. This was actually in our launch video. Alexi demoed the cloud agent that built that feature, which was funny because that was an instance where... one of the things that's been a consequence of having these videos is we use Best of N, where you run different models head to head on the same prompt. We use that a lot more, because one of the complications with doing that before was you'd run four models and they would come back with some giant diff, like 700 lines of code times four. What are you going to do? You're going to review all of that? That's horrible. But if they come back with four 20-second videos? Yeah, watch four 20-second videos. And then even if none of them is perfect, you can figure out which one of those you want to iterate with to get it over the line. And so that's been really fun. Here's another example that we found really cool, which is we've actually turned this into a slash command as well, repro, where for bugs in particular, with the model having full access to its own VM, it can first reproduce the bug, make a video of the bug reproducing, fix the bug, and make a video of the bug being fixed, doing the same workflow with, obviously, the bug not reproducing. And that has been the single category that has gone from "these types of bugs are really hard to reproduce and take you tons of time locally, and even if you tried a cloud agent on it, are you confident it actually fixed it?" to "when this happens, you'll merge it in 90 seconds" or something like that. So this is an example where... let me see if this is the broken one or the... okay, this is the fixed one. Okay. So we had a bug on cursor.com/agents where if you would attach images, remove them, and then still submit your prompt, they would actually still get attached to the prompt. Okay. And so here you can see Cursor is using its full desktop.
By the way, this is one of the cases where if you just do browser-use type stuff, you will have a bad time, because it now needs to upload files. It just uses its native file viewer to do that. And so you can see here, it's uploading files, it's going to submit a prompt, and then it will go and open up... so this is the meta, this is Cursor agent prompting Cursor agent inside its own environment. And so you can see here the bug: there's five images attached, whereas when it's submitted, it only had one image.
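The Best of N workflow mentioned earlier in this answer can be sketched as a simple fan-out: the same prompt goes to several models in parallel, and each returns a diff plus a short demo artifact that a human can skim before choosing one to iterate on. This is only an illustrative Python sketch. The runners are stubs, the model names are placeholders, and `Candidate`, `best_of_n`, and `fake_runner` are hypothetical names, not part of any Cursor API.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    model: str
    diff: str
    demo_summary: str  # stands in for the short demo video


def best_of_n(prompt: str, runners: dict[str, Callable[[str], Candidate]]) -> list[Candidate]:
    """Run the same prompt on every model in parallel; keep results in input order."""
    with ThreadPoolExecutor(max_workers=len(runners)) as pool:
        futures = {name: pool.submit(run, prompt) for name, run in runners.items()}
        return [futures[name].result() for name in runners]


def fake_runner(model: str) -> Callable[[str], Candidate]:
    # Stub: a real runner would start an agent and wait for its PR and video.
    def run(prompt: str) -> Candidate:
        return Candidate(model, f"# {model} diff for: {prompt}", f"{model}: tooltip shown on hover")
    return run


runners = {name: fake_runner(name) for name in ("model-a", "model-b", "model-c", "model-d")}
candidates = best_of_n("add a tooltip to the save button", runners)
for c in candidates:
    print(c.demo_summary)  # the reviewer skims these instead of four large diffs
```

The selection step is deliberately left to a person here, matching the description of watching four short videos and picking the one worth iterating with.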
10:02
I see, yeah, but you gotta enable that if you're using cursor agent inside cursor agent.
12:23
Yeah, exactly. And so here, this is then the after video, where it does the same thing. It attaches images, removes some of them, hits send. And you can see here, once this agent is up, only one of the images is left in the attachments. Yeah.
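The before and after videos amount to running one reproduction script against the broken and the fixed build. A minimal Python sketch of that pattern follows. The real cause of the attachment bug is not described in the conversation, so the two-list state in `BuggyComposer` is just one plausible shape for it, and every class and function name here is hypothetical.

```python
from __future__ import annotations


class BuggyComposer:
    def __init__(self) -> None:
        self.visible: list[str] = []   # what the user sees in the prompt box
        self._pending: list[str] = []  # what actually gets submitted

    def attach(self, image: str) -> None:
        self.visible.append(image)
        self._pending.append(image)

    def remove(self, image: str) -> None:
        self.visible.remove(image)     # bug: _pending is never updated

    def submit(self, prompt: str) -> dict:
        return {"prompt": prompt, "images": list(self._pending)}


class FixedComposer(BuggyComposer):
    def remove(self, image: str) -> None:
        super().remove(image)
        self._pending.remove(image)    # fix: keep both lists in sync


def repro(composer_cls: type[BuggyComposer]) -> list[str]:
    """Attach five images, remove four, submit, and report what was sent."""
    composer = composer_cls()
    for i in range(5):
        composer.attach(f"img{i}.png")
    for i in range(1, 5):
        composer.remove(f"img{i}.png")
    return composer.submit("fix the layout")["images"]


print(len(repro(BuggyComposer)))  # removed images are still submitted
print(repro(FixedComposer))       # only the remaining image is submitted
```

Because the same `repro` function runs against both versions, a failing result before the fix and a passing result after it is evidence that the fix addresses the reported behavior rather than something adjacent.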
12:27
Beautiful.
12:41
Okay, so easy merge.
12:42
Yeah. When does it choose to do this? Because this is an extra step.
12:43
Yes. I think I've not done a great job yet of calibrating the model on when to reproduce these things. Sometimes it will do it of its own accord. We've been conservative, where we try to have it only do it when it's quite sure, because it does add some amount of time to how long it takes it to work on it. But we also have added things like the slash repro command, where you can just do "fix this bug, slash repro." And then it will know that it should first make you a video of it actually finding and making sure it
12:47
can reproduce the bug.
13:15
Yeah. Yeah.
13:15
One sort of ML topic this ties into is reward hacking, where you write tests that you then update just so they pass. So: first write the test, show me it fails, then make your test pass, which is classic red-green, like TDD. Very cool. Was that the last demo?
13:16
Yeah. Anything I missed on the demos or points that you think it covers it well?
13:32
Yeah.
13:35
Cool. Before we stop the screen share, can you give me just a tour of the slash commands? Because there are so goddamn many. What are the good ones?
13:36
Yeah, we want to increase discoverability around this too. I think that'll be like a future thing we work on. But there's definitely a lot of good stuff.
13:43
Now we have a lot of internal ones that I think will not be that interesting. Here's an internal one that I've made. I don't know if anyone else at Cursor uses this one.
13:49
FixBB.
13:57
I've never heard of it.
13:58
Yeah, Fix Bugbot. So this is a thing that we want to integrate more tightly.
13:59
So you made this for yourself?
14:04
I made this for myself. It's actually available to everyone on the team, but no one knows about it. But yeah, there will be Bugbot comments. And so Bugbot has a lot of cool things. We actually just launched Bugbot Auto Fix, where you can click a button, or change a setting, and it will automatically fix its own things. And that works great in a bunch of cases. There are some cases where having the context of the original agent that created the PR is really helpful for fixing the bugs, because it might be like, oh, the bug here is that this is a regression, and actually you meant to do something more like that. And so having the original prompt and all the context of the agent that worked on it helps. And so here I could just do fix, or I used to be able to do Fix BB, and it would do that. No test is another one that we've had. Repro is in here. We mentioned that one.
14:05
One of my favorites is Cloud Agent Diagnosis. This is one that makes heavy use of the Datadog MCP, and I think Nick and David on our team wrote it. Basically, if there is a problem with a cloud agent, it will take the ID as an argument and spin up a bunch of subagents using the Datadog MCP to explore the logs and find all of the problems that could have happened with that. You can do quick stuff quickly with the Datadog UI, but this takes the debugging time down to, again, a single agent call, as opposed to trawling through logs yourself.
14:53
You should also talk about the stuff we've done with transcripts.
15:26
Yes, also. So basically we've also done some things internally, and there'll be some versions of this that ship publicly soon, where you can spin up an agent and give it access to another agent's transcript to either debug something that happened, so act as an external debugger, or continue the conversation. Almost like forking it.
15:28
A transcript includes all the chain of thought for the 11 minutes here, 45 minutes there.
15:47
Yeah, exactly. So basically acting as a like secondary agent that debugs the first. So we've started to push and they're
15:51
all the same code. It's just a different prompt but the. The same.
15:59
Yeah. So basically same cloud agent infrastructure and then same harness. And then there's some extra infrastructure that goes into piping in an external transcript if we include it as an attachment. But for things like the Cloud Agent Diagnosis, that's mostly just using the Datadog MCP, because we also launched MCPs along with this cloud agent launch: support for MCPs in cloud agents.
16:02
Oh, that was drowned out.
16:25
We will, we'll be doing a bigger marketing moment for it next week, but you can now use MCPs.
16:27
You'll be ahead. And I actually don't know if the Datadog MCP is publicly available yet. I realize I'm beta testing it, but it's been one of my favorites to use.
16:35
So I think that one's interesting for Datadog, because Datadog wants to own that side. Right. Which is Bits. I don't know if you've tried Bits.
16:45
I haven't tried Bits.
16:52
Yeah, that's their cloud agent product.
16:53
Yeah, they want to be like: we own your logs, so give us some part of the self-healing software that everyone wants. But obviously Cursor has a strong opinion on coding agents, and you're taking away from that, which obviously you're going to do. And not every company is like Cursor, but it's interesting. If you're Datadog, what do you do here? Do you expose your logs to MCP?
16:55
Do you.
17:16
You let other people do it, or do you try to own that because it's extra business for you? Yeah, it's an interesting one.
17:16
Good question. All I know is that I love
17:22
the Datadog MCP and yeah, it's gonna be.
17:25
No.
17:28
No surprise that people will demand it. Right? Yeah, it's like any system-of-record company like this. It's like, how much do you give away? Cool. I think that's that for the sort of cloud agents tour. Cool. And we just talked about how cloud agents have been... when did Cursor launch cloud agents?
17:28
Do you know? June last year.
17:44
June last year. So it's been a slowly developing thing. You did like a bunch of like Michael did a post for himself where he like showed this chart of like agents overtaking tab and I'm like, wow, this is like the biggest transition in code. Like in. In like the last.
17:45
Yeah, I think that kind of got drowned out. Yeah, I think it's a very.
18:00
Not at all. I think it's been highlighted by our friend Andrej Karpathy today. Okay, talk more about it.
18:03
What does it mean?
18:08
Is it just got given like the cursor tab key?
18:09
Yes. Yes. That's like cool.
18:11
I know, but it's going to be like put in a museum.
18:12
It is.
18:15
I have to say, I haven't used Tab in a little bit myself.
18:16
Yeah. I think that what it looks like to code with AI, or generally create software, if you want to go higher level, is changing very rapidly. Not a hot take, but from our vantage point at Cursor, I think one of the things that is probably underappreciated from the outside is that we are extremely self-aware about that fact. And Cursor got its start in phase one, era one, of Tab and autocomplete. And that was really useful in its time. But a lot of people have stopped looking at text files and editing code. We call it hand coding now, when you type out the actual letters.
18:19
Oh, that's cute.
18:56
Yeah. Oh, that's cute.
18:57
So Boomer.
18:58
So Boomer. And so that, I think, has been a slowly accelerating, and now in the last few months rapidly accelerating, shift. And we think that's going to happen again with the next thing. I think some of the pains around Tab were: it's great, but I actually just want to give more to the agent, and I don't want to do one tab at a time. I want to just give it a task, and it goes off and does a larger unit of work, and I can lean back a little bit more and operate at that higher level of abstraction. That's going to happen again, where it goes from agents handing you back diffs, and you're in the weeds and giving it 30-second to three-minute tasks, to you're giving it three-minute to 30-minute to three-hour tasks and you're getting back videos and trying out previews rather than immediately looking at diffs every single time.
18:58
Yeah, nothing said.
19:43
One other shift that I've noticed as our cloud agents have really taken off internally has been a shift from primarily individually driven development to almost this collaborative nature of development. For us, Slack is actually almost like a development IDE, basically.
19:44
Maybe don't even build a custom ui. Like maybe that's like a debugging thing.
20:01
But actually, I feel like, yeah, there's still so much left to explore there. But basically for us, Slack is where a lot of development happens. We will have these issue channels or these product discussion channels where people are always @Cursor-ing, and that kicks off a cloud agent. And for us at least, we have team follow-ups enabled. So if Jonas kicks off an @Cursor in a thread, I can follow up with it and add more context. And so it turns into almost like a discussion surface where people can collaborate on UI. Oftentimes I will kick off an investigation, and then sometimes I even ask it to git blame and then tag people who should be brought in, because it can tag people in Slack and then
20:04
other people will come in, tag other people who are not involved in pharmacy
20:44
can just do Jonas. If I was talking to.
20:48
Yeah, that's cool.
20:50
You should.
20:52
You guys should make a bigger deal of that.
20:52
I know it's a lot to. I feel like there's a lot more to do with our Slack surface area to show people externally. But yeah, basically like it can bring other people in and then other people can also contribute to that thread and you can end up with a PR again with the artifacts visible and then people can be like, okay, cool, we can merge this. So for us it's like the IDE is almost like moving into Slack in some ways as well.
20:53
I have the same experience. But it's not developers, it's me, a designer, and salespeople. Yeah, so me on technical marketing vision, the designer on design, and then salespeople on "here's the legal terms of what we agreed on," and then they all just collaborate and correct the agents.
21:15
I think what we've found in these threads is that the work that is left, that the humans are discussing, is the nugget of what is actually interesting and relevant. It's not the boring details of where does this if statement go. It's: do we want to ship this? Is this the right UX? Is this the right form factor? How do we make this more obvious to the user? It's those really interesting, higher-order questions that are so easy to collaborate on, and you leave the implementation to the cloud agent.
21:31
Totally. And no more discussion of am I going to do this? Are you going to do this? Cursor is doing it. You just have to this idea like
21:57
it Sometimes I don't know if there's a this prob. You guys probably figured this out already but sometimes you need like a mute button. So like cursor. Like we're going to take this offline but still online, but like we need to talk among the humans first before you like stop responding to everything.
22:02
Yeah. This is a design decision where currently Cursor won't chime in unless you explicitly mention it.
22:18
Yeah. So it's not always listening but can
22:23
see all the intermediate messages.
22:26
Have you done the recursive? Can cursor add another cursor or spawn another cursor?
22:28
Oh, we've done some versions of this.
22:32
We can add humans.
22:34
Yes. One of the other things we've been working on, which is an implication of generating code being so easy, is that getting it to production is still harder than it should be. And broadly, you solve one bottleneck and three new ones pop up. And so one of the new bottlenecks is getting into production. And we have a joke internally where you'll be talking about some feature and someone says, "I have a PR for that." It's so easy to get to "I have a PR for that," but it's still relatively hard to get from "I have a PR for that" to "I'm confident and ready to merge this." And so I think that over the coming weeks and months, that's the thing that we'll think a lot about: how do we scale up compute to that pipeline of getting things from a first draft an agent did.
22:35
Isn't that what merge... isn't that what Graphite's for?
23:18
Graphite is a big part of that.
23:21
The cloud agency, is it fully integrated or still different companies working on.
23:24
I think we'll have more to share there in the future. But the goal is to have a great end-to-end experience where Cursor doesn't just help you generate code tokens, it helps you create software end to end. And so review is a big part of that, and I think especially as models have gotten much better at writing and generating code, we've felt that crop up relatively more.
23:28
Sorry, this was completely unplanned, but I had people arguing. One, you need AI to review AI. And then there's another school of thought where it's: no, reviews are dead. Just show me the video.
23:49
Yeah, I feel, again, for me the video is often about alignment, and then I often still want to go through a code review process, still look at the files. And there's a spectrum, of course. If the video is really well done and it fully tests everything, you can feel pretty confident. But it's still helpful to look at the code. I pay a lot of attention to Bugbot. I feel like Bugbot has been great, really highly adopted internally. We often tell people, don't leave Bugbot comments unaddressed, because we have such high confidence in it. So people always address their Bugbot comments.
24:05
Once you've had two cases where you merged something and then you went back later, there was a bug in it, you merged it, you went back later and you were like oh, bug bot had found that I should have listened to Bugbot. Once that happens two or three times, you learn to wait for Bugbot.
24:34
Yeah. So I think for us there's like that code level review where like, like it's looking at the actual code and then there's like the like feature level review where you're looking at the features. There's like a whole number of different like areas. There'll probably eventually be things like performance level review, security review, things like that where it's like more, more different aspects of how this feature might affect your code base that you want to potentially leverage an agent to help with.
24:44
And some of those, like Bugbot, will be synchronous, and you'll typically want to wait on them before you merge. But I think another thing that we're starting to see is that as, with cloud agents, you scale up this parallelism and how much code you generate, 10-person startups need the devex and pipelines that a 10,000-person company used to need. And that looks like a lot of the things that 10,000-person companies invented in order to get that volume of software to production safely. So that's things like: release frequently, release slowly, have different stages where you release, have checkpoints, automated ways of detecting regressions. And so I think we're going to need stacked diffs, merge queues. Exactly. A lot of those things are going to be important.
25:09
For what it's worth, I think the majority of people still don't know what stacked diffs are. I have many friends at Facebook, and I'm pretty friendly with Graphite. I've just never needed it, because I don't work on that large a team. And it's just democratization of: here's what we've already worked out at very large scale, and here's how it benefits you too. I think, to me, one of the beautiful things about GitHub is that it's actually useful to me as an individual solo developer, even though it's actually collaboration software. Yep. And I don't think a lot of dev tools have figured that out yet. That transition from large down to small.
25:50
Yeah. Cursor is probably an inverse story.
26:24
This is small down to a large.
26:27
Where historically, for Cursor, part of why we grew so quickly was anyone on the team could pick it up, and in fact people would pick it up on the weekend for their side project and then bring it into work because they loved using it so much. And I think a thing that we've started working on a lot more, not us specifically, but other folks at Cursor as a company, is making it really great for teams, so that the 10th person that starts using Cursor in a team is immediately set up. We launched Marketplace recently, with plugins, so skills and MCPs, and other people can configure those so that my Cursor is ready to go and set up. Sam loves the Datadog MCP and the Slack MCP, which you've also been using a lot, also
26:28
pre launch, but I feel like it's so good.
27:14
Yeah, my Cursor should be configured if Sam feels strongly. That's just amazing. And required.
27:16
Is it automatically shared or you have
27:20
to go and... It depends on the MCP. So some are obviously auth per user, and so Sam can't authenticate my Cursor with my Slack MCP, but some are team auth and those can be set up by admins.
27:22
Yeah. Yeah, that's cool. Yeah, I think we had Aman on the pod when of course it was five people, and everyone was like, okay, what's the thing? And then it's usually something like teams and org and enterprise, but it's actually working. But usually at that stage, when you're five, when you're just a VS Code fork, it's like, how do you get there? Will people pay for this? People do pick, right? Yeah.
27:34
And I think for cloud agents, we expect to have similar kind of PLG things where I think off the bat we've seen a lot of adoption with kind of smaller teams where the code bases are not quite as complex to set up. If you need some insane Docker layer caching thing for builds not to take two hours, that's going to take a little bit longer for us to be able to support that kind of infrastructure. Whereas if you have front end, back end like one click. Agents can install everything that they need themselves.
27:56
This is a good chance for me to just ask some technical, sort of check-the-box questions. Can I choose the size of the VM?
28:24
Not yet. We are planning on adding that because
28:30
obviously you want like L, XXL whatever, right? It's like the Amazon like sort of menu.
28:32
Yes, yes, exactly. We will add that.
28:37
Yeah. In some ways you have to basically become like an EC2, almost like you rent a box.
28:39
You rent a box. Yes. We talk a lot about brain in a box. Yeah. So Cursor, we want to be a brain in a box.
28:45
But is the mental model different? Is it more serverless? Is it more persistent? Is it something else?
28:50
We want it to be a bit persistent. The desktop should be something you can return to even after some days. Like maybe you go back, they're like still thinking about a feature for some period of time.
28:57
So full like suspend the memory and
29:06
bring it back and then keep going.
29:08
Exactly.
29:10
That's an interesting one, because what I actually do on, like, Manus and OpenClaw, whatever, is I want to be able to log in with my credentials to the thing but not actually store it in any secret store or whatever. Because this is the most sensitive stuff. This is my email, whatever. And just have it persist through the image. I don't know how it works under the hood, but to rehydrate and then just keep going from there. But I don't think a lot of infra works that way. A lot of it's stateless, where you save it to a Docker image and it's only whatever you can describe in a Dockerfile and that's it. Because that's the only thing you can clone multiple times in parallel.
29:10
Yeah, we have a bunch of different ways of setting them up. So there's a Dockerfile-based approach. The main default way is actually snapshotting
29:44
like a Linux VM.
29:51
Like a VM, right? You run a bunch of install commands and then you snapshot more or less the file system. And so that gets you set up with everything that you would want in order to bring a new VM up from that template, basically. And that's a bit distinct from what Sam was talking about with the hibernating and rehydrating, where that is a full memory snapshot as well. So there, if I had the browser open to a specific page and we bring that back, that page will still be there.
29:53
Was there any discussion internally in just building this stuff about every time you show the video, it's actually you show a little bit of the desktop and the browser. And it's not necessary if you just show the browser. If. No, if you're just demoing a front end application, why not just show the browser like it's.
30:16
We do have some panning and zooming, like it can decide that when it's actually recording and cutting the video to highlight different things. I think we played around with different ways of segmenting it and yeah, there's been some different rubs on it for sure.
30:30
Yeah.
30:43
I think one of the interesting things is the version that you see now in cursor.com actually is like half of what we had at peak, where we've decided to unship or unshipped quite a few things. So two of the interesting things to talk about, One is directly in answer to your question, where we had native browser that you would have locally, it was basically an iframe that via port forwarding could load the URL, could talk to localhost in the vm.
30:44
So that gets you basically in your machine's browser.
31:11
In your local browser you would go to localhost:4000 and that would get forwarded to localhost:4000 in the VM via port forwarding. We unshipped that.
31:15
Like an ngrok.
31:24
Like an ngrok, exactly. We unshipped that because we felt that the remote desktop was sufficiently low latency and more general purpose. So we build Cursor Web, but we also build Cursor Desktop. And so it's really useful to be able to have the full spectrum of things. And even for Cursor Web, as you saw in one of the examples, the agent was uploading files, and I couldn't upload files and open the file viewer if I only had access to the browser. And we've thought a lot about this. It might seem funny coming from Cursor, where we started as this VS Code fork and I think inherited a lot of amazing things, but also a lot of legacy UI from VS Code. And so with the web UI we wanted to be very intentional about keeping that very minimal and exposing the right set of primitives, sort of app surfaces we call them, that are shared features of that cloud environment that you and the agent both use. So the agent uses the desktop and controls it. I can use the desktop and control it. The agent runs terminal commands, I can run terminal commands. So that's our philosophy around it. The other thing that is maybe interesting to talk about that we unshipped is, and both of these things we may reship and decide at some point in the future that we've changed our minds on the trade-offs or gotten it to a point where put
31:25
it out there, let users tell you they want it.
32:37
Exactly.
32:39
All right, fine.
32:40
So one of the other things is actually a files app. And so we used to have the ability at one point during the process of testing this internally to see next to. I had Git desktop and terminal on the right hand side of the tab there earlier to also have a files app where you could see and edit files. And we actually felt that in some ways by restricting and limiting what you could do there, people would naturally leave more to the agent and fall into this new pattern of delegating, which we thought was really valuable. And there's currently no way in Cursor web to edit these files.
32:40
Yeah, except you like open up the PR and go to GitHub and do the thing, which is annoying.
33:14
Just tell the agent.
33:18
I have criticized OpenAI for this because OpenAI's Codex app doesn't have a file editor. Like it has a file viewer, but it doesn't have a file editor.
33:19
Do you use the file viewer a lot?
33:27
No, I understand, but sometimes I want it. The only way to do it is like freaking go in the... No, you have an open in Cursor button or open in Antigravity or open in whatever. And people pointed that out. So I was part of the early testers group. People pointed that out and they were like, this is like a design smell. It's like you actually want a VS Code fork that has all these things but also a file editor. And they were like, no, just trust us.
33:29
Yeah, I think we as Cursor will want to, as a product, offer the whole spectrum. And so you want to be able to work at really high levels of abstraction and double click and see the lowest level. That's important. But I also think that you won't be doing that in Slack. And so there are surfaces and ways of interacting where in some cases limiting the UX capabilities makes for a cleaner experience that's simpler and drives people into these new patterns where, even locally, and we kicked off joking about this, people don't really edit files or hand code anymore. And so we want to build for where that's going and not where it's been.
33:54
A lot of cool stuff. And okay, I have a couple more observations about the design elements of these things. One of the things that I'm always thinking about is Cursor and other peers of Cursor start from the dev tools and work their way towards cloud agents. Other people, like the Lovables and Bolts of the world, start with: here's the vibe code full cloud thing. They were already cloud agents before anyone else, really. And we'll give you the full deploy platform. So we own the whole loop, we own all the infrastructure, we have the logs, we have the live site, whatever. And you can do that cycle. Cursor doesn't own that cycle. Even today you don't have the Vercel, you don't have the whatever deploy infrastructure that you're going to have, which gives you powers because anyone can use it, any enterprise, whatever your infra, I don't care. But that also gives me limitations as to how much you can actually fully debug end to end. I guess I'm just putting out there: is there a future where there's full stack Cursor, where there's cursorapps.com where I host my Cursor site, which is basically a Vercel clone? I don't know.
34:33
I think that's an interesting question to be asking, and I think the logic that you laid out for how you would get there is logic that I largely agree with. I think right now we're really focused on what we see as the next big bottleneck. And because things like the Datadog MCP exist, I don't think that the best way we can help our customers ship more software is by building a hosting solution right now.
35:30
By the way, these are things I've actually discussed with some of the companies I just named.
35:52
Right now just this big bottleneck is getting the code out there. And also, unlike Lovable and Bolt, we focus much more on existing software, and the 0 to 1 greenfield is just a very different problem. Imagine going to Shopify and convincing them to deploy on your deployment solution. That's very different and I think will take much longer to see how that works. It may never happen, relative to, oh, it's like a zero to one app.
35:56
I'll say it's tempting, because look, like 50% of your apps are Vercel, Supabase, Tailwind, React. It's the stack. That's what everyone does. So that's kind of interesting. Yeah. The other thing is the model selector dying. Right now in cloud agents it's stuck down at the bottom left. Sure, it's Codex high today, but do I care if it's suddenly switched to Opus? Probably not.
36:22
We definitely want to give people a choice across models because I feel like the meta changes very frequently. I was a big Opus 4.5 maximalist, and when Codex 5.3 came out I hard switched. So that's all I use now.
36:44
Yeah, agreed. I don't know if any, but basically, like when I use it in Slack. Right. Cursor does a very good job of exposing, if people go use it, here's the model we're using, here's how you switch if you want. But otherwise it's abstracted away, which is beautiful, because then actually you should decide.
36:56
Yeah, I think we want to be doing more with defaults where we can suggest things to people. A thing that we have in the editor, the desktop app, is Auto, which will route your request and do things there. So I think we will want to do something like that for cloud agents as well. We haven't done it yet. And so I think we have both people like Sam, who are very savvy and want to know exactly what model they're using, and we also have people that want us to pick the best model for them, because we have amazing people like Sam and we are the experts. We have both the traffic and the internal taste and experience to know what we think is best.
37:11
Yeah, I have this ongoing thesis of agent lab versus model lab, and to me Cursor and other companies are examples of an agent lab that is building a new playbook that is different from a model lab, where it's very GPU heavy, although obviously it has a research team. And my thesis is every agent lab is going to have a router, because you are going to be asked... I don't keep up every day. I'm not a Sam. I'm using you as the arbiter of taste. Put me on Cursor Auto. Is it free? It's not free.
37:47
Auto is not free. But there's different pricing tiers. Yeah.
38:17
Put me on Cursor Auto. You decide for me, based on all the other people. You know better than me. And I think every agent lab should basically end up doing this, because that actually gives you extra power, because people stop caring or having loyalty to any one lab. Yeah.
38:19
Two other maybe interesting things that I don't know how much they're on your radar: one is the best-of-N thing we mentioned, where running different models head to head is actually quite interesting because.
38:34
Which exists in Cursor.
38:44
That exists in Cursor, IDE and web. So the problem is where do you run them?
38:45
Okay.
38:50
And so I can share my screen again if that's interesting.
38:50
Obviously, parallel agents. Very topical.
38:53
Yes, exactly. Parallel agents, in your mind, are they the same thing?
38:55
Best-of-N and parallel agents. I don't want to put words in your mouth.
38:58
Best-of-N is a subset of parallel agents where they're running on the same prompt. That would be my answer. So this is what that looks like. And so here in this dropdown picker, I can just select multiple models. And now if I do a prompt, I'm going to do something silly. I am running these five models.
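As a rough illustration of that definition (best-of-N as parallel agents that all share one prompt), here is a minimal Python sketch. The `run_agent` function, the model names, and the caller-supplied `score` function are hypothetical stand-ins rather than Cursor's API; in the product described here, each agent would run in its own cloud VM.

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(model: str, prompt: str) -> str:
    """Hypothetical stand-in for launching one agent; returns a fake 'diff'."""
    return f"[{model}] change for: {prompt}"

def best_of_n(models: list[str], prompt: str, score) -> tuple[str, str]:
    """Run the SAME prompt on every model in parallel and keep the top-scoring result."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        outputs = list(pool.map(lambda m: run_agent(m, prompt), models))
    # Pair each model with its output, then pick the best by the caller's score.
    return max(zip(models, outputs), key=lambda pair: score(pair[1]))

def parallel_agents(tasks: dict[str, str]) -> dict[str, str]:
    """General parallel agents: each model gets its OWN prompt; nothing is compared."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {m: pool.submit(run_agent, m, p) for m, p in tasks.items()}
        return {m: f.result() for m, f in futures.items()}
```

The only difference between the two functions is whether the prompt is shared and whether the outputs are compared, which is the subset relationship described above.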
39:00
Okay. This is a straight clone of Cursor 2.0.
39:20
Yeah, yes, exactly. But they are running. So with Cursor 2.0 you can do desktop or cloud. And so this is cloud specifically, where the benefit over worktrees is that they have their own VMs and can run commands and won't try to kill ports that the other one is running, which are some of the pains.
39:22
These are all cloud worktrees.
39:39
No, these are all cloud agents with their own VMs.
39:41
Okay. But you do it locally.
39:44
Sometimes people do worktrees, and that's been the main way that people have set up parallel.
39:45
I gotta say, that's so confusing for folks. Yeah. No one knows what worktrees are exactly.
39:49
I think we're phasing out worktrees. Really? Yeah.
39:53
Okay.
39:56
But yeah, and one other thing I would say, though, on the multi-model choice. So this is another experiment that we ran last year and decided not to ship at that time, but may come back to. And there was an interesting learning that's relevant for these different model providers. It was something that would run a bunch of best-of-Ns, but then synthesize, and basically run a synthesizer layer of models. And that was other agents that would LLM-judge, but one that was also agentic and could write code. So it wasn't just picking, but also taking the learnings from two models or N models that it was looking at and writing a new diff. And what we found was that, at the time at least, there were strengths to using models from different model providers as the base level of this process. Basically you could get almost a synergistic output that was better than having a very unified bottom model tier. So it was really interesting, because potentially, even in the future when maybe one model is ahead of the other for a little bit, there could be some benefit from having multiple top-tier models involved in a model swarm or whatever agent swarm that you're doing, since they each have strengths and weaknesses. Yeah.
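The difference between a judge that only picks a winner and a synthesizer that writes a new diff can be sketched in a few lines of Python. This is a deliberately simplified stand-in: the experiment described here used an agentic LLM that could write code, whereas this sketch just merges per-file patches by a numeric score. The `Candidate` type, the provider names, and the scores are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    provider: str                 # which (hypothetical) provider produced it
    patches: dict[str, str]       # file path -> proposed patch text
    scores: dict[str, float]      # file path -> judge score for that patch

def pick_one(candidates: list[Candidate]) -> Candidate:
    """Plain best-of-N judging: return the single candidate with the best total score."""
    return max(candidates, key=lambda c: sum(c.scores.values()))

def synthesize(candidates: list[Candidate]) -> dict[str, str]:
    """Synthesizer step: build a NEW diff by taking, per file, the best-scored patch."""
    merged: dict[str, tuple[float, str]] = {}
    for cand in candidates:
        for path, patch in cand.patches.items():
            score = cand.scores.get(path, 0.0)
            if path not in merged or score > merged[path][0]:
                merged[path] = (score, patch)
    return {path: patch for path, (_, patch) in merged.items()}
```

If one provider's candidate is stronger on one file and another provider's is stronger on a different file, `synthesize` returns a combination that neither candidate contained, which is one way to read the "synergistic output" observation.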
39:57
Andrej called this the council, right?
41:05
Yeah, exactly. We actually. Oh, that's another internal command we have that Ian wrote. Slash council, which they send. Yeah, yeah.
41:06
This idea is in various forms everywhere and I think for me, like, for me the productization of it you guys have done like this is very flexible. But if I were to add another what your thing is on here, it would be too much. I don't want.
41:13
Ideally it's all. It's something that the user can just choose and it all happens under the hood in a way where like you just get the benefit of that process at the end and better output basically. But don't have to get too lost in the complexity of judging along the way.
41:27
Okay.
41:40
Another thing on the many agents and different parallel agents that's interesting is an idea that's been around for a while as well that has started working recently is subagents. So this is one other way to get agents of different prompts and different goals and different models, different vintages to work together and collaborate and delegate.
41:40
Yeah, I'm very like. I like one of my. I always looking for. This is the year of the blah, right?
42:04
Yeah.
42:08
I think one of the things on the blah is subagents, I think, or search, but I haven't used them in Cursor. Are they fully formed? I almost need like an intro, because do I form them from new every time? Do I have fixed subagents? How are they different from slash commands? There's all these like really basic questions that no one stops to answer for people because everyone's just like too busy launching. We get the rig.
42:09
Honestly you could. You can see them in cursor now if you just say spin up like
42:33
50 subagents to cursor defines what subagents.
42:37
Yeah. So basically I think I shouldn't speak for the whole subagents team. This is like a different team that's been working on this. But our thesis, or the thing that we saw internally, is that they're great for context management, for kind of long-running threads, or if you're trying to just throw more compute at something. We have strongly used almost a generic task interface where the main agent can define what goes into the subagent. So if I say explore my codebase, it might decide to spin up an Explore subagent, or it might just decide to spin up five Explore subagents.
42:40
But I don't get to set what those subagents are.
43:09
Right.
43:11
It's all defined by the model.
43:11
I think I actually would have to refresh myself on the sub agents interface.
43:12
There are some built-in ones, like the Explore subagent is prebuilt, but you can also instruct the model to use other subagents and then it will. And one other example of a built-in subagent is, I actually just kicked one off in Cursor and I can show you what that looks like.
43:17
Yes, because I tried to do this in pure prompt space.
43:31
So this is the desktop app and
43:35
that's all you need to do, right?
43:36
Yeah, that's all you need to do. So I said use a subagent to explore and I think, yeah, so I can even click in and see what the subagent is working on here. It ran some find command, and this is Composer under the hood. Even though my main model is Opus, it does smart routing. In this instance the Explore subagent sort of requires reading a ton of things, and so a faster model is really useful to get an answer quickly. But this is what subagents look like, and I think we want to do a lot more to expose hooks and ways for people to configure these. Another example of a sort of built-in subagent is the computer use subagent in the cloud agents, where we found that those trajectories can be long and involve a lot of images, obviously, and execution of some testing or verification task. We wanted to use models that are particularly good at that. So that's one reason to use subagents. And then the other reason to use subagents is we want contexts to be summarized and reduced down at a subagent level. That's a really neat boundary at which to compress that rollout and testing into a final message that the agent writes, which then gets passed into the parent, rather than having to do some global compaction or something like that. Awesome.
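The context-compression point can be shown with a small Python sketch: the subagent keeps its long trajectory to itself, and only a short final message crosses the boundary into the parent's context. Everything here (`run_subagent`, `ParentAgent`, the summary format) is a hypothetical illustration, not Cursor's subagent interface, and a real subagent would have a model write the summary.

```python
def run_subagent(task: str, steps: list[str]) -> dict:
    """Hypothetical subagent: does many steps, keeps the full transcript private."""
    transcript = [f"{task}: {step}" for step in steps]   # long rollout, stays local
    summary = f"{task}: completed {len(steps)} steps; last result: {steps[-1]}"
    return {"transcript": transcript, "final_message": summary}

class ParentAgent:
    def __init__(self) -> None:
        self.context: list[str] = []   # what the parent model actually sees

    def delegate(self, task: str, steps: list[str]) -> None:
        result = run_subagent(task, steps)
        # Only the compressed final message crosses the boundary.
        self.context.append(result["final_message"])
```

However many steps or screenshots the subagent works through, the parent's context grows by a single entry per delegation, so no global compaction pass is needed for that work.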
43:37
Cool. While we're in the subagents conversation, I can't do a Cursor conversation and not talk about Wilson's stuff. What is that?
44:48
What is.
44:55
He built a browser. He built an OS. Yes. And he experimented with a lot of different architectures and basically ended up reinventing the software engineering org chart. It's all cool. But what's your take? Are there any behind-the-scenes stories about that whole adventure?
44:55
Some of those experiments have found their way into a feature that's available in cloud agents now, the long-running agent mode. Internally we call it grind mode. And I think there's some hint of grind mode accessible in the picker today, because you can choose grind until done. And so that was really the result of experiments that Wilson started in this vein where, I think, the Ralph Wiggum loop was floating around at the time, but it was something he also independently found and was experimenting with, and that was what led to this product surface.
45:12
And it's just simple idea of have criteria for completion and do not stop until you complete.
45:38
There's a bit more complexity as well in our implementation. Like there's a specific. You have to start out by aligning and there's like a planning stage where it will work with you and it will not get like start grind execution mode until it's decided that the plan is amenable to both of you.
45:43
Basically I refuse to work until you make me happy.
46:01
We found that it's really important, where people would give very underspecified prompts and then expect it to come back with magic. And if it's going to go off and work for three minutes, that's one thing. When it's going to go off and work for three days, you probably should spend a few hours upfront making sure that you have communicated what you actually want.
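The shape of that loop (align on a plan first, then keep working until a completion check passes) can be sketched as follows. This is a minimal sketch under stated assumptions, not Cursor's grind mode: the `approved` flag stands in for the planning conversation, `is_done` stands in for the completion criteria, and `max_iterations` is an added safety budget that the discussion does not mention.

```python
from typing import Callable

def grind(plan: list[str],
          approved: bool,
          do_step: Callable[[str], None],
          is_done: Callable[[], bool],
          max_iterations: int = 1000) -> int:
    """Align on a plan first, then loop until the completion check passes."""
    if not approved:
        # Refuse to start executing until the user has agreed to the plan.
        raise ValueError("plan not approved; align with the user before grinding")
    iterations = 0
    while not is_done():
        if iterations >= max_iterations:
            raise RuntimeError("iteration budget exhausted before completion")
        # Cycle through the plan's steps until the criteria are met.
        do_step(plan[iterations % len(plan)])
        iterations += 1
    return iterations
```

The important design choice is that the stopping condition is the completion check rather than the end of a single model response, which is what lets a run continue for hours or days.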
46:05
Yeah. And just to like really drive home the point. We really mean three days. No, no.
46:21
We've had no human intervention whatsoever.
46:26
I don't know what the record is, but there's been a long time with the grinds.
46:29
And so the thing that is available in Cursor, the long-running agent, is, if you want to think about it very abstractly, like one worker node. Whereas what built the browser is a society of workers and planners and different agents collaborating. Because we started building the browser with one worker node, at the time that was just the agent, and it became more than one worker node when we realized that the throughput of the system was not where it needed to be to get something as large in scale as the browser done. Yeah. And so this has also become a really big mental model for us with cloud agents: there's the classic engineering latency-throughput trade-off. And so, you know, the code is water flowing through a pipe. We think that over the coming months the big unlock is not going to be one person with a model getting more done, like the water flowing faster. It will be making the pipe much wider, and so parallelizing more. Whether that's swarms of agents or parallel agents, both of those are things that contribute to getting much more done in the same amount of time. But any one of those tasks doesn't necessarily need to get done that quickly. And throughput is this really big thing where if you see this system of a hundred concurrent agents outputting thousands of tokens a second, you can't go back. You see a glimpse of the future, where obviously there are many caveats: no one is using this browser IRL. There's a bunch of things not quite right yet. But we are going to get to systems that produce real production code at this scale much sooner than people think. And it forces you to think about what even happens to production systems. We've broken our GitHub Actions recently because we have so many agents producing and pushing code that CI/CD is just overloaded, because suddenly it's effectively like we grew. Cursor's growing very quickly anyway, but you grow headcount 10x when people run 10x as many agents. 
And so a lot of these systems, exactly a lot of these systems will need to adapt.
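The pipe analogy comes down to simple arithmetic: aggregate throughput is the number of agents times the rate of each one, so many slower agents can outproduce one fast agent. The figures below are illustrative assumptions only (they are not measurements from Cursor), and the calculation assumes the work splits evenly across agents, which real tasks rarely do.

```python
def throughput(agents: int, tokens_per_second_each: float) -> float:
    """Aggregate tokens per second across concurrent agents."""
    return agents * tokens_per_second_each

def wall_clock_seconds(total_tokens: float, agents: int, tokens_per_second_each: float) -> float:
    """Time to emit total_tokens if the work splits evenly across agents."""
    return total_tokens / throughput(agents, tokens_per_second_each)

# Illustrative numbers only: one fast agent vs. a hundred slower ones.
fast_single = wall_clock_seconds(1_000_000, agents=1, tokens_per_second_each=100)
wide_swarm = wall_clock_seconds(1_000_000, agents=100, tokens_per_second_each=30)
```

Under these assumed numbers the single fast agent needs 10,000 seconds for a million tokens, while the hundred slower agents need roughly 333 seconds, even though each individual agent in the swarm is slower.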
46:32
It also reminds me, the three of us live in the app layer, but if you talk to the researchers who are doing RL infrastructure, it's the same thing. It's like all these parallel rollouts and scheduling them and making sure as much throughput as possible goes through them. It's the same thing.
48:27
We were talking briefly before we started recording, you were mentioning memory chips and some of the shortages there. The other thing that I think is just like hard to wrap your head around the scale of the system that was building the browser, the concurrency there. If Sam and I both have a system like that running for us shipping our software, the amount of inference that we're going to need per developer is just really mind boggling. And that makes sometimes when I think about that, I think that even with the most optimistic projections for what we're going to need in terms of build out are underestimating the extent to which these swarm systems can like churn at scale to produce code that is valuable to the economy. And yeah, you can cut this if
48:42
it's sensitive, but I was just. Do you have estimates of how much your token consumption is like per developer? Yeah. Or yourself? I don't need like company average.
49:25
I just, I feel like for a while I wasn't an admin on the usage dashboard so I like wasn't able to actually see but it was a
49:32
mine has gone up.
49:39
Oh yeah. In terms of how much work I'm doing, it's more like I have no worries about developers losing their jobs, at least in the near term because I feel like that's a more broad discussion.
49:40
Yeah, but you went there. I didn't go. I wasn't going there. I was just like how much more are you using?
49:51
There's so much stuff to be built and so I feel like I'm basically just trying to constantly. I have more ambitions than I did before personally, so can't speak to the broader thing but for me it's like I'm busier than ever before. I'm using more tokens and I'm also doing more things.
49:55
Yeah, yeah. I don't have the stats for myself. But I think broadly a thing that we've seen that we expect to continue is Jevons paradox, where you can't do
50:11
an AI podcast without saying it.
50:21
Exactly. We've done it. Now we can wrap. We said the words. Phase one, tab autocomplete: people paid like 20 bucks a month, and that was great. Phase two, where you were iterating with these local models: today people pay like hundreds of dollars a month. I think as we think about these highly parallel kind of agents running off for long times in their own VM system, we are already at that point where people will be spending thousands of dollars a month per human, and I think potentially tens of thousands beyond. It's not like we are greedy for capturing more money, but what happens is just individuals get that much more leverage, and if one person can do as much as 10 people, yeah, that tool that allows them to do that is going to be tremendously valuable and worth investing in and taking the best thing that exists.
50:22
One more question on just Cursor in general, and then open-ended for you guys to plug whatever you want to plug. How is Cursor hiring these days?
51:03
What do you mean by how?
51:11
So obviously LeetCode is dead.
51:12
Oh, okay.
51:14
Everyone says work trial. Different people have different levels of adoption of agents. Some people can really adopt, can be much more productive. But other people, you just need to give them a little bit of time, and sometimes they've never lived in a token-rich place like Cursor, and once you live in a token-rich place, you just work differently. You need to have done that, and a lot of people... Anyway, it was just open-ended: how has agentic engineering, agentic coding, changed your opinions on hiring? Are there any broad insights? Yeah, basically I'm asking this for other people, right?
51:14
Yeah, totally. Totally. To hear Sam's opinion. We haven't talked about this, the two of us. I think that we don't see necessarily being great at the latest thing with AI coding as a prerequisite. I do think that's a sign that people are keeping up and curious and willing to upskill themselves in what's happening. Because as we were talking about the last three months, the game has completely changed. It's like what I do all day is very different.
51:45
Like it's my job and I can't.
52:09
Yeah, totally. I do think that still, as Sam was saying, the fundamentals remain important in the current age and being able to go and double click down. And models today do still have weaknesses where if you let them run for too long without cleaning up and refactoring, the code will get sloppy and there'll be bad abstractions. And so you still do need humans that, like, have built systems before, know good patterns when they see them and know where to steer things.
52:11
Yeah, I would agree with that. I would say again, Cursor also operates very quickly, and leveraging agentic engineering is probably one reason why that's possible in this current moment. I think in the past it was just people coding quickly, and now there are people who use agents to move faster as well. So as part of our process, we'll always select for that ability to make good decisions quickly and move well in this environment. And so I think being able to figure out how to use agents to help you do that is an important part of it too.
52:34
Yeah. Okay. The fork in the road. Either predictions for the end of the year, if you have any, or plugs
53:03
predictions are not going to go well.
53:09
I know it's hard.
53:11
So hard. Don't get it wrong. It's okay.
53:12
Just one other plug that may be interesting that I feel like we touched on but haven't talked a ton about is a thing that the kind of these new interfaces and this parallelism enables is the ability to hop back and forth between threads really quickly. And so a thing that we have.
53:14
You want to show something or.
53:30
Yeah, I can show something. A thing that we have felt with local agents is this pain around context switching. And you have one agent that went off and did some work and another agent that did something else. And so here, I just have three tabs open, let's say, but I can very quickly hop in here. This is an example I showed earlier. But the actual workflow here I think is really different in a way that may not be obvious. Where I start the morning, I kick off 10 agents or something, the first one of them finishes, I come in, watch the video. Either it's close, and so I might send a follow-up, I might say, hey, make it red, or I might hop into the desktop and try it out. And within 90, 120 seconds I've kicked this one back off and either started the merge process, like CI is running now and I'll come back to it later, or it's off with some additional follow-up information. And then I can hop into the next one, and then the next one I hop in and I'm like, okay, this looks interesting. Actually try it out for real in the app. I want to see it in action, not just in the gallery. So I can kick that off and the agent will go and work on that, because maybe I wanted to try out what the button looks like in the actual thing. And then here I might hop in as well and check the video here or do something. And so you're really parallelizing much more: follow up here, check in there. It's much more this higher level of abstraction, and having the different desktops where you can hop back and forth and you're not like, oh, I checked out this branch. Oh, where was that worktree again? It's really solving for that, which we ourselves have struggled with in Cursor and these local agents, to be like, where was that diff again? It's lost in some worktree, never going to find it. Oh, my local thing is rebuilding. Oh, just make another one. That's what you end up with. And then you wait for five more minutes for it to run. 
And so this is really a new way of parallelizing that we found to be really fun, honestly, where you're just hopping in and injecting taste. You're like, that doesn't quite feel right. Oh, actually this is not architected quite right. But you're just focusing on those taste questions, the interesting questions.
53:31
And for me, the cloud ecosystem also enabled this to be something that adds productivity to my dead time, like commuting or overnight or something like that. The fact that I don't have to leave my computer open, there's no Cursor...
55:37
There is a Cursor mobile app?
55:50
If there is, I'm not sure it's like the current thing we... I use it on my phone all the time, just on the web. So it's a pretty good experience there for checking in and unblocking, I think. Yeah, you can see the videos and stuff in the web app, which is awesome.
55:51
Yeah, yeah. I think this is one where the ADD will inherit the earth. Like, if your attention span is cooked but you can still manage, actually this is good for you. But I also think this is where the coding tools start coming into conflict with the productivity tools, like Linear, the Kanban boards. Because what you have there is cool. But you know what? You actually need a Kanban board. Like, which people have. Vibe.
56:04
Vibe.
56:28
Vibe Kanban is out there, open source. I'm sure you guys have talked about it. But it starts to conflict, because actually the code doesn't matter anymore. It's the process of the human interacting and checking and seeing, getting the World of Warcraft sound package to go "work" or whatever, "job done." I don't know. It's an interesting future productivity thing.
56:29
Yeah.
56:46
I also think there's another big theme. Last year was called the year of coding agents. This year coding agents spill over to the real world, to Claude Cowork and all the other stuff. I'm sure Cursor is going to focus on software. But, let's call it, OpenClaw is extremely mind-expanding, in terms of: I did not know that could happen. And it's all based on a coding agent, totally.
56:46
And I think one of the things that's interesting, like talking with friends and family that are not in the software world, is, speaking of predictions, I do think that we are going to start to see other industries go through what software development has started going through. I think it's by virtue of how good models are at writing software, and how much the people building the new technology are early adopters, trying it out and applying it to themselves, that certain kinds of shifts will happen to other industries too. And there's a lot to be learned from how that's gone down, and is continuing to go down, in software, in terms of all the interesting questions: to what point do people get more leverage? When do you start changing the role to become much more generalist? All of these questions that we've seen some data on, but will see a lot more on in the coming months, will come up everywhere.
57:07
So any parting thoughts? Any thoughts of your own?
57:55
Not really. Good. We covered so much. Good.
57:59
We covered.
58:02
We covered a lot. Coming up with a prediction: I just think agents are going to keep getting better, and I'm going to stop doing as much manual coding. Probably zero lines of code written by me in the whole month of December this year. 100% agents is a personal prediction, but...
58:02
Oh, you're not at zero today? In what cases?
58:16
I think honestly it's 1%. If I just, like, get frustrated and I'm like, I don't want to go tell an agent to change this one thing.
58:19
But prompting. Sometimes, when I'm working on prompts, I still go in and manually edit them, because it's so much, like, bare intent transfer, telling the agent what I want. It's like writing an essay: I don't use agents to write essays yet, because the process of writing it is the thinking.
58:26
I still can't stand AI generated writing. So, yeah, I also can't have the agent write prompts.
58:42
So no DSPy, no GEPA, nothing like that.
58:46
Here we have some internal tooling around some of the prompt optimization things, but there's a fair amount of just what concepts do I need to communicate to the agent, to the model?
58:49
Another thing I'm also looking for is voice. I noticed that you didn't use your voice to code. Even OpenAI, when we do a podcast with them, they don't use their voice. And I'm like, at some point this gets good. You can stop typing.
58:58
We have some people who like that a lot internally and I think we'll be experimenting in that space too, for sure.
59:09
Do you use voice a lot?
59:14
Not a lot. Sometimes it's bound to my caps lock, so I can press it. I just.
59:15
And when you use it, do you want it to talk back, or do you just want to dump in? Yeah, yeah.
59:19
But the brain dump is good because you can interrupt yourself, you can go on a tangent, whatever. It just captures everything. Yeah. And slopping it into an LLM, it's fine.
59:25
Yeah. The way that we did this with Autotab was people would record full screen recordings with audio to teach the model how to do a task. And one of the funny things that we learned was people would use their Siri voice, where they would start talking in short, stilted sentences and enunciate really clearly, because they were used to... they last used AI two years ago, where you had... Apple has damaged, like,
59:32
an entire generation of people's expectations.
59:54
Exactly. And we had to be like, no, you're very native. So you do this, but just dump everything in. You can say, you can repeat yourself, you can contradict yourself. The models are smart enough to figure
59:57
it out. But it's still very bad. Voice coding I always considered the hardest part, because you have to say technical things where spelling matters, capitalization matters, and that's all not in voice. So we'll see. So far it's been more sort of emotional companionship, that kind of stuff. But at some point it's going to hit voice coding. Yeah.
1:00:06
I have a prediction for you. Yeah, I predict that by the end of the year, the volume on... I think it will take longer than people think, and longer than we think, for cloud agents working in their own boxes to surpass local agents. But I think that crossover will happen before the end of the year, and probably by the end of the year agents running in the cloud will be a multiple, like more than 2x, the volume of local agents.
1:00:24
Okay, you're leaving me an opening. What's not good today?
1:00:52
Yeah, there's a bunch of hard things. One of them is just getting those sandboxes to be really good. And the thing that was part of this launch that we spent an inordinate amount of time on is cursor.com/onboard, where you pick a repo, add secrets, give it access to things, and the agent just goes off and installs things.
1:00:55
Yes. I think out of the whole thing, that was my favorite.
1:01:13
Yeah, we worked a lot on that. Sam and I in particular spent a lot of late nights making that good. But there's still a lot to do there, right? On setup, one or two things. Maybe it's too slow. It's too slow; we're working on it. And setup is not a unitary thing where everything is set up or not. Things will break over time. You have new dependencies, you need access to new systems, you change where your database lives. So that's one part of it.
And then the other part of it is that, having these agents run in the cloud and be more autonomous, we've really started to see the lack of memory. And Sam is someone who's thought a lot about this. Once you start getting the model operating the codebase, there are more particularities. It's not just a read-file tool. It needs to know: how do I start up the backend? How do I check the status of the backend? That's very particular to your codebase. And even if it's great at npm run watch, or whatever the default things are, there are always quirks. Everyone has quirks. Getting the model good at those things will require more work, and we're working on that. But we think that will be one of the big unlocks: having them be onboarded not only in terms of their environment, but also in terms of their understanding of design trade-offs, how the codebase works, how to be a good developer in any one codebase. It's not
1:01:15
Cursor rules, it's going to be something else? Is it going to be a file? Do we just call it another Markdown file with a different name? I don't know.
1:02:27
One thing that we learned this year, we being Cursor the company: there's a really great blog post that Jediah and other people on the agent quality team put out about dynamic file context.
1:02:35
Is it your team or is it a different team?
1:02:45
Different team, yeah. And they were basically leaning a lot on the idea that everything is a file system. So a lot of my thinking personally on memory this past year has changed to be more aligned with that, where it's giving the agent pointers to things, annotations to things. The second thing I've started to think differently about is that memory is a subset of agent self-auditability and self-awareness. So basically the agent might want to propose annotations or links or memory files to itself when it finds that there's some gap in its functionality, in its own harness, that might need to be filled by some piece of information on a semi-permanent basis. But there's a whole bunch of other things that are a side effect of self-auditability that are really interesting. Like potentially finding conflicting instructions, or skills and rules that might be, like, eh, these are bugging each other, and also things like fixing DevEx problems that it runs into. I think the dynamic file system stuff is probably very promising for memory, and there's also this notion of needing to have the agent be a little bit more self-aware, in terms of being able to identify gaps in its own functionality and decide how to fill them.
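A minimal sketch of the idea described in this answer, assuming memory is kept as small Markdown notes on disk that point at files rather than copying them. The directory layout and the names `MemoryNote`, `propose_note`, and `find_conflicts` are hypothetical, chosen for illustration; nothing here is Cursor's implementation.

```python
from dataclasses import dataclass
from pathlib import Path
import tempfile


@dataclass(frozen=True)
class MemoryNote:
    topic: str    # what the note is about, e.g. "start-backend" (hypothetical)
    pointer: str  # path the note points at, instead of copying its content
    body: str     # the annotation the agent wants to remember


def propose_note(memory_dir: Path, note: MemoryNote) -> Path:
    """Write a proposed note into pending/ so it can be reviewed before it
    becomes semi-permanent memory."""
    pending = memory_dir / "pending"
    pending.mkdir(parents=True, exist_ok=True)
    # Number the file so two proposals on one topic do not overwrite each other.
    index = len(list(pending.glob(f"{note.topic}-*.md")))
    path = pending / f"{note.topic}-{index}.md"
    path.write_text(f"pointer: {note.pointer}\n\n{note.body}\n", encoding="utf-8")
    return path


def find_conflicts(memory_dir: Path) -> dict[str, list[str]]:
    """Return topics whose notes contain differing instructions."""
    by_topic: dict[str, set[str]] = {}
    for path in sorted(memory_dir.rglob("*.md")):
        topic = path.stem.rsplit("-", 1)[0]  # strip the numeric suffix
        body = path.read_text(encoding="utf-8").split("\n\n", 1)[1].strip()
        by_topic.setdefault(topic, set()).add(body)
    return {t: sorted(b) for t, b in by_topic.items() if len(b) > 1}


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    propose_note(root, MemoryNote("start-backend", "scripts/dev.sh", "Run scripts/dev.sh"))
    propose_note(root, MemoryNote("start-backend", "Makefile", "Run make serve"))
    print(find_conflicts(root))
```

Keeping proposals in a separate `pending/` folder is one way to make the agent's suggestions auditable; the conflict check is deliberately naive and only flags notes on the same topic whose text differs.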
1:02:47
That's such a good point. Like self awareness broadly has been a really big thing that I think Sam has pushed us to do more and more of where the agent should understand how its environment works, it should understand how secrets work. Like it needs to be self aware about its own harness and its environment and anything.
1:03:54
This is not inherent in the model.
1:04:11
You have to do specifics, right? If it's running in Cursor versus some other sandbox, that's a bit different. And then the other part of it that starts to get really interesting is when the model starts editing its own system prompt.
1:04:12
Yeah.
1:04:25
What does that even mean? How do you do that safely and
1:04:26
in a way do that? This is just research, right?
1:04:28
I think it will do that. Yeah, it will manage its own context. And the system prompt is part of the context, and you can argue about
1:04:31
yeah, like other things that it might decide to turn off or on, depending. And all this self-awareness, to us in this context, is not the model itself having a notion of consciousness, but more knowing what system it's operating in and the constraints of that system, and potentially being able to have agency in optimizing itself to operate best in that system. This was one of the first things I learned at Dot when we launched: we had made the model, or the agent, or whatever we would call it (at that time it was far less agentic), we had made the product work very well at a certain number of things, but it didn't have complete self-awareness of its own boundaries. So people would be like, hey, can you do this thing? And the thing was there and could be done, and the product would be like, oh no. And I'd be like, but you can. So basically that was one of the earliest things I found.
1:04:37
Just believe in yourself.
1:05:26
I know. As a product developer, it needs to both be able to do the thing and have complete knowledge of its ability to do the thing. Those are not always obviously the same part of the prompt at all. Yeah, it's something that I think has continued to be a theme in the ecosystem: users will often attribute increased intelligence to a system that is more highly self-aware and more able to manipulate itself to do well in a system. If that makes sense.
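One way to picture the point about ability and knowledge of ability living in the same place is the minimal sketch below, which generates the "what you can do" section of a prompt from the same list that describes the harness's real capabilities. The `Capability` type, the `render_capabilities` function, and the example capability names are assumptions made for illustration and do not describe any product's actual prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    name: str          # hypothetical tool or feature name
    description: str   # what it lets the agent do
    enabled: bool      # whether the harness exposes it in this environment


def render_capabilities(caps: list[Capability]) -> str:
    """Render the harness's capabilities as a prompt section, so what the
    agent believes it can do is derived from what it can actually do."""
    lines = ["You are running in this environment. You CAN:"]
    lines += [f"- {c.name}: {c.description}" for c in caps if c.enabled]
    disabled = [c.name for c in caps if not c.enabled]
    if disabled:
        lines.append("You CANNOT currently: " + ", ".join(disabled))
    return "\n".join(lines)


if __name__ == "__main__":
    caps = [
        Capability("run_shell", "Run commands in the VM", True),
        Capability("record_video", "Record the desktop", True),
        Capability("deploy", "Deploy to production", False),
    ]
    print(render_capabilities(caps))
```

Because the section is rendered from one list, enabling a feature and telling the agent about it cannot drift apart, which is the failure described above where the product could do the thing but answered "oh no."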
1:05:27
Yeah. This is more abstract than I ever thought we'd get in this Cursor discussion. Cool. But isn't that the kind of conversation that you have?
1:05:53
We talk about this stuff all the
1:06:01
time, improving agents in general.
1:06:02
Yeah, I think to your point, right, about the agent layer, and thinking a lot about models and the harness and the product and the affordances that fall from the...
1:06:04
No, I mean, you guys are my sort of leading example of what an agent lab looks like and how it can be successful. And I think people are always hungry for insights into how you guys operate. So thank you for taking the time to share.
1:06:13
Thanks for coming.
1:06:24
Yeah, thank you.
1:06:25