Am I Even Needed Anymore? GLM-5, Agentic Loops & AI Productivity Psychosis - EP99.34
The hosts discuss GLM-5, a new open-source frontier AI model from China, and explore the psychological impact of AI productivity tools. They examine how agentic AI workflows are creating 'AI productivity psychosis' where users feel compelled to work constantly and take on unlimited tasks, leading to fatigue and anxiety rather than the promised efficiency gains.
- Agentic AI workflows are creating psychological pressure to work constantly, leading to 'AI productivity psychosis' rather than improved work-life balance
- Open-source frontier models like GLM-5 are becoming competitive with proprietary models while being significantly cheaper to operate
- The shift to agentic AI is changing how models are tuned, with emphasis on code execution and tool calling rather than conversational abilities
- AI productivity gains lead to taking on more work rather than working less, creating a cycle of increased stress and responsibility
- The gap between 95% task completion and 100% remains a significant challenge, often requiring extensive human intervention
"I always thought Codex would eventually win, but I am pleasantly surprised to see it happening so quickly"
"There is no longer a reason for procrastination because you have a buddy there who is going to do all the work for you"
"AI productivity psychosis. But yeah, there's just this sense of real, like think of what could be done in this period of time"
"Am I even needed anymore? Like, I genuinely like the productivity around me is such that I'm like, what am I even. What's the point?"
So we said I'm out and took a permanent vacation.
0:00
So, Chris, this week... actually, wait a minute. Before we start, as most of you already know, we are planning to go on a tour this year, and we've been prompting people to register for that event to show their interest, but also to help us figure out what kind of event to put on and where we should put it on. And so far it's been giving us really great insight. So thank you to everyone who has completed that registration form. If you haven't and you are interested, there'll be a link in the description below. But I did want to call out a few things from it. We've been learning about our audience, the kinds of people that are in our audience, and also their responsibilities and the things they do and are interested in. Now look, there are some pretty heavy hitters in there, I'm not gonna lie, and I have a message for you in a moment. But one that really stood out to me, and I don't want to dox anyone here, is that there's someone quite high up at NASA who allegedly listens to the show. Now you might think this is maybe someone just saying they are high up at NASA, but it really checks out. They would have to know the person's email and various other details. And my message for these people is: what are you doing listening to this crap for?
0:11
The message is don't vibe code rockets, guys.
1:23
I mean, it's slightly concerning. I'm like, oh wow, this stuff's really penetrating. And you know, I do want to see NASA successfully get to the moon. So stop listening to this.
1:26
Are you seriously opening the podcast with telling listeners not to listen?
1:37
I don't know. I've done it a few times and it hasn't gone down well. But anyway, it is shocking to me. So yeah, anyway, let's get into it. Another fatiguing week, Chris, in the world of AI.
1:41
I think, as you pointed out, there is no longer a reason for procrastination, because you have a buddy there who is going to do all the work for you. You simply need to direct them and test their work. And there's this pressure where you're like: not only can I complete all of the tasks I have listed today, I can actually do them simultaneously. And there is no reason, even while I'm waiting for it to execute stuff, to stop working; I should be right onto the next task. And that pressure is leading to our lives taking a downward spiral. I was just telling you how I've been neglecting basically every other element of my life except for working at the moment. And you know, there's a physical manifestation of my productivity in reverse in my backyard.
1:55
If I don't have like four tabs going... even in important moments in my life, when I should be very focused spending time with my wife or children and enjoying those moments, I'm thinking, oh, you know what? I've got Telegram set up now to communicate with Sim Theory, so I could be just spitting out more agent tasks. That's how sick I have become.
2:42
I was about to say it's almost sick where you're like, you're spending time with your family. Like think of the lost productivity. Like I could be doing so much right now.
3:03
It's awful. And I don't know how healthy it is. I don't know what becomes of it, but the only way to describe it truly is that fatigue and that psychosis: AI productivity psychosis.
3:10
But yeah, there's just this sense of, like, think of what could be done in this period of time. And I was even saying to you, I've seen you getting so much done lately, and some of the stuff I'm working on is taking longer, and I feel horrible. I feel like I'm wildly unproductive because I can't accomplish it all immediately. I don't know what the feeling is, but I think it's going to happen in more and more industries where the AI really can do a lot of the work for you.
3:27
Yeah, I think we should dive into it a little bit later, because we have mentioned it since the very start of the year, this fatigue and this complete change in our own feelings around this stuff. And it times very well with once we started working more agentically ourselves; we really flipped a switch, and I think you can just tell the vibe shift for us. So we'll get into that a little bit later. But I do want to cover a more topical announcement at the start and talk about what it means with the release of GLM-5. Now, this is a new model. It's an open-source model, a frontier model out of a Chinese lab, Zhipu, but anyway, their new domain is z.ai, not to be confused with xAI. We're all picking letters now: z.ai. It was released on February 11th or 12th, depending on where you are in the world. And it's a pretty interesting open-source model. In fact, the previous open-source model, Kimi K2.5, was interesting but a little bit nuts. I think people got really excited about it for like 24 hours, and then, truly, I don't think anyone's really using it seriously anymore. But GLM-5, I do think, might be a bit of a paradigm shift. It's a mixture-of-experts model; it's got a 200k input context and, sorry, 128k output. And just to clarify, the pricing is going to be different depending on how you run it, because it's an open-source model, so you could run it on your own infrastructure. But the cheapest consumer pricing of this model, if you want your data to go to China, which maybe that's okay, is $0.80 per million input tokens and $2.56 per million output. So basically, let's call it free and...
3:55
Very cheap compared to the others. And I think there are public cloud hosted versions of it in America, which are sort of like a dollar instead of 80 cents, and things like that.
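As a rough sketch of what that pricing means in practice, assuming the rates quoted above ($0.80 per million input tokens, $2.56 per million output tokens; actual hosted rates vary by provider, and the per-step token counts below are made-up illustrations):

```python
# Back-of-the-envelope cost of an agentic run at the quoted GLM-5 rates.
# Assumed rates from the episode; hosted pricing differs by provider.

INPUT_RATE = 0.80 / 1_000_000   # dollars per input token
OUTPUT_RATE = 2.56 / 1_000_000  # dollars per output token

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one run at the assumed per-token rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. an 80-step loop averaging 20k input / 1k output tokens per step
steps = 80
total = run_cost(steps * 20_000, steps * 1_000)
print(f"${total:.2f}")  # roughly $1.48 for the whole run
```

Even a long 80-iteration loop comes in under a couple of dollars at these rates, which is the "sense of power" the hosts describe later.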
6:07
Yeah. So reasonably cheap. I mean, in comparison to, say, Opus 4.6, super duper cheap; in terms of the new Codex, not that much cheaper. So you would have to question: is it worth leaning heavily into this model? I guess if you want to run it on your own infrastructure, maybe yes. But there are a few interesting tidbits about the model that I think are pretty big wows. So: zero US hardware dependency, they're claiming, trained entirely on Huawei Ascend chips. So that's kind of interesting for the bull case for Nvidia.
6:16
Yeah, that is interesting, because you keep thinking, oh well, everybody's gonna need GPUs and that's going to be the massive explosion. But you just always assumed it would come from them.
6:59
So that is a bucket of Huawei Ascend chips, maybe. I don't know. But anyway, it's getting really good initial feedback. We haven't had a ton of time to use it just yet, but I would say, compared to my initial impressions of Kimi K2.5, where I noticed some rough edges immediately, with GLM-5 I haven't really noticed any difference in very early testing between it and a Claude Opus or a Codex.
7:08
I immediately plugged it into our new agentic loop to see how it performed. And admittedly, I'm in a world of pain at the moment with the issues that I'm dealing with, so you can't really use it to assess the model.
7:39
So.
7:52
But the good news is it screwed up in all of the same ways as all the frontier models did. And so I have a feeling it's sort of on par in at least that respect. I also ran it in a normal chat loop, getting it to do file operations and code editing and things of that nature. And there are just certain tasks, I think, with models now, where they're sort of indiscernible once they reach a certain level. Like if you swap out, say, Sonnet 4.5, Opus 4.6 and GPT 5 or 5.2, for example, on a relatively easy task, they all more or less do the same thing. It's only when you get to the really difficult stuff that you start to notice the differences. And certainly GLM-5 fits into that category. It's able to compete with the best of them in terms of the basic tasks, at least.
7:53
Yeah, I think we've really got to, in the following week, push it a bit harder to see what it's truly capable of, because these are just very initial impressions. But I would agree with you: just the basic tool-calling loops, like if you ask it to go off and research something and produce an image or a document or whatever, it handles really smoothly, and you just get this vibe and sense that it's tuned really well for the first time. Whereas a lot of these models, I don't know, they just have all these rough edges. And I reserve the right to take all this back, because again, I may discover huge rough edges during the week and then just pivot away from it. But it's a pretty interesting model. I don't think it's a showstopper in the sense that someone who's maybe using the latest Codex version would go, oh, you know what, I'm going to use GLM, because of the price difference. The reality is you're not going to run it on your own infrastructure, right, if you're just using it for something like OpenClaw. So the reality is you might just go with Codex when it's actually available through the API.
8:38
The thing I like about models like this, though, as someone who's just been going agentic loop, agentic loop all day long: I always think about the token consumption, and I'm like, geez, it's a lot. And when the agent goes off and spends a lot of time and maybe does like 80 tasks or something in order to get something done, and it's looping and looping, knowing that you're working with a model that's reasonably priced, or really, really cheaply priced like this one, it's just a sense of power. You're like, this is incredible; the amount of work it is doing for the money I'm spending is brilliant. And I definitely understand, I know I went on a lot about price last week, but I understand that there are cases where, say, Claude 4.6 Opus Max, which I've been using a lot this week, admittedly, is really value for money, because it's getting you through really important tasks that you just otherwise couldn't get done. I sort of get that. But for the day-to-day grunt work and the things we've been talking about, recurring agentic tasks, for example drafting email replies, or event-driven stuff where, when this happens, go through this agentic skill workflow, those kinds of things, I think models like this are just ideal. For that day-to-day routine task work that needs to happen all the time, you can set up all of these different situations to handle it. I think a model like this is actually better for those scenarios, because it's faster and the price isn't a concern, so you can just set up as much of it as you want, and it's pretty reliable at those kinds of tasks. So I can see there's a real place for models like this, and probably even beyond that, to be honest. I don't think we need to necessarily cap it just because it comes from China.
9:48
Yeah. I think, though, what stood out to me with this release is the fact that we're re-entering a new model-war period where these labs are one-upping each other. But the tuning now has leaned so heavily into the agentic loop. All the models now are just tuned to work in this agentic loop, and there's a real emphasis on it in the way they're trained. You can see that in the pivot to how it's now GPT 5.3 Codex, and the Codex brand of models is what OpenAI is clearly focused on right now. Because at the core level, the way these agentic loops work is code: they're running and executing code. It doesn't matter if you're not coding; if you're doing white-collar tasks or this kind of knowledge-work stuff, it's the same thing. The underlying model needs to execute code and store memories and tweak files and do all the operations of the computer. And so I think that's why you're seeing OpenAI lean into Codex, and Anthropic, I guess, has had a one-size-fits-all approach. But I did notice during the week that a couple of people commented to me that the new Opus 4.6 and these agentically tuned models are not fitting into their workflow as well for turn-by-turn conversations where they want it to go call tools and do stuff.
11:27
It's too aggressively trying to go into an agentic loop.
12:52
Yeah, I can tell, because I noticed that as soon as you give the models tools, for example writing a SKILLS.md, sorry, writing an AGENTS.md file, or any way to write files, they will just gleefully write as much context out into files as they possibly can. In one of my repos I was up to like 70 .md files the model had just made, because it can. While it's doing ordinary chat operations, it is so apt to call tools of that nature that it's just gagging for it. Even planning tools or prompt-the-user tools: if you give it the right ones, the ones it's expecting, that it's been tuned for, it'll be all over those things, absolutely enjoy calling them. So I can see how that could break up a normal chat loop. And actually, I've noticed in some cases that if you disable tools entirely, the thing will hallucinate tools. It'll try to call them anyway, because it's just so apt to do that.
12:57
The other interesting piece about the agentic-loop models is Codex during the week. It was all sort of Opus 4.6, and then a lot of the OpenClaw hype at the start of the year, and I think that's still obviously quite hyped. I mean, it literally has more stars on GitHub. It sounds like we're in preschool.
13:53
Than there are stars in the universe.
14:15
Yeah. More than there are stars for, like, VS Code. So I think that project's not to be underestimated. I do think the idea or the concept behind it is really interesting. But there's been definitely a bit of a vibe shift around the Codex model as more people work with it. And I'm curious, in the comments, if people have had experiences working with Codex and they do prefer it over Claude 4.6 Opus, because it seems like a lot of people do, or at least that's what they're manipulating us on X to believe. Sam Altman, though, had another absolute banger tweet on how the team operates: "I always thought Codex would eventually win, but I am pleasantly surprised to see it happening so quickly." They're really trying to...
14:17
He just has to be seen as a visionary, even if it's revisionist. Like, I'm coming out now to tell you that I used to have a vision that exactly what happened would happen. You know, it's like, oh yeah, I actually predicted that everything that happened did happen. That was me.
15:03
You know, it's got to be the cringiest tweet ever. But as I said, I mean, one or two weeks ago, I've lost track because of my AI psychosis, I did have a lot of good experiences with Codex and still do. So, it's one of those.
15:18
Yeah, I've been using 5.2 Codex all week, because I'm trying to mix up which models I'm using all the time now. And I find the thing really good; it's pretty reliable. I mean, I can only imagine 5.3 is better, because 5.2 is great on its own. You pointed it out to me a few weeks ago and I started to use it, and yeah, it's great.
15:36
Yeah, I find it's more precise for certain things; it excels more at backend tasks. I still don't love it for knowledge work, like creative stuff; they sort of took out its soul. But apart from that, it's pretty good, I think. The other thing I thought was really interesting here is that when we heard about the GLM-5 announcement, we both called out the context window. It seems like everyone gets stuck at, or at least the agentic loops maybe work better with, 200k context. And that's why the 1 million context with 4.6 is still in beta and, you know, insanely expensive. So I did think: working in the agentic-loop world, do you think that's putting less of an emphasis on context size even mattering? Like, if it can manage that 200k context and it can create sub-agents who also each have a 200k context to go and think through other elements of the particular task, does it even matter?
15:53
Well, it's interesting you say that, because you pointed this out to me and I had a think about it. I think the reason is that, if you think about an agentic loop, almost the entire thing is this cobbling together of a prompt at each stage of the process that is appropriate for that step. So, for example, the agent needs the whole time to be referring back to: what is the overall goal? What step am I up to? Within this step, what tools do I need to call? What were their results? The whole time it's almost keeping a diary of where we're up to and what needs to be done now. And then within that you've got things like skills and memory. The project memories tell it how the project's laid out, what files are important, what's been generated, that kind of thing. Then you've got the skills appropriate to that step, so it actually has procedures on how to accomplish the step it's on. And as you say, where appropriate, those are spawned as sub-agents with even tighter prompts, so they don't need to worry about, for example, the overall goal anymore; they're only really worried about their particular part of the thing. So you're getting these much, much tighter system prompts which say: here is precisely what you need to do, there's no room for ambiguity, this isn't a planning phase anymore, you're doing this; and then here are the tool results and a little bit of the conversation history that's relevant. So yes, you can fit a lot more into a tighter context, and therefore we're seeing better results. And I only realized this was even happening the other day when, as I mentioned earlier, I had an agentic process that had like 80 tool calls or 80 iterations in it, and at no stage was context an issue in that entire process. So I think the answer is yes.
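The per-step prompt assembly described here can be sketched roughly like this; the function and field names are illustrative, not any particular framework's API:

```python
# Minimal sketch of per-step context assembly in an agentic loop: each
# iteration rebuilds a tight prompt from the overall goal, the current
# step, relevant skills and memories, and only the most recent tool
# results, rather than carrying the full conversation history.

def build_step_prompt(goal, plan, step_index, skills, memories, tool_results):
    """Assemble the context for one step of an agentic loop."""
    parts = [
        f"Overall goal: {goal}",
        f"Current step ({step_index + 1}/{len(plan)}): {plan[step_index]}",
    ]
    # Only skills/memories relevant to this step, keeping the
    # context well under the model's window.
    parts += [f"Skill: {s}" for s in skills]
    parts += [f"Project memory: {m}" for m in memories]
    # Only the last few tool results matter for the next action.
    parts += [f"Tool result: {r}" for r in tool_results[-3:]]
    return "\n".join(parts)

prompt = build_step_prompt(
    goal="Ship the landing page",
    plan=["Draft copy", "Generate hero image", "Wire up the form"],
    step_index=1,
    skills=["image-generation: use the palette in brand.md"],
    memories=["Pages live under site/pages/"],
    tool_results=["Draft copy written to site/pages/index.md"],
)
print(prompt)
```

A sub-agent would get an even tighter version of this, with the overall goal dropped and only its own sub-task included, which is how each worker stays within its own 200k window.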
17:01
Yeah, to me, when you're in an agentic-loop sense, I don't really notice the context problem at all. And I think a lot of this stuff can be backported into turn-by-turn interactions as well, where the session has its own working memory, its own sort of core plan; the prompt structure is a bit better there as well. But ultimately, the one area where it might benefit from 1 million context, if you've got billies in the bank and are willing to spend on a million of context in sub-agents and in the core agent process, because it could add up fast, the area I think it still suffers in is the human interaction: having the broader system knowledge of how it all fits together. And that broader overall context I bring to the table is like my only remaining skill, that vision, really, of how it all fits together. And I'd be interested to know, because I don't have a big enough codebase example that started out from agentic looping. We came from a world of still writing nearly all the code by hand to basically none now. But because it's been built up over time, I have a fundamental understanding of where things are in the system. So I feel like my briefing and my bigger-picture focus is such that I am still contributing a lot, it now being a bigger codebase that you fundamentally do need to understand how it works, or at least I'm telling myself this. And so I wonder if, with the larger context in the loop, there is benefit in fitting more into that core compaction, call it, of what it's working on at a given time.
18:49
I think this is where, though, as I mentioned earlier, there's the propensity for it to be all lathered up about writing these .md files with information about what it's discovered in the session. And what I'm definitely finding is that having that project knowledge exposition as you go and work with it, and it consolidating that and including it at appropriate times, leads to fewer of these situations where you're having to build up this huge context for it and explain every little bit so it understands what you understand about the project. So I agree, I don't think that part's perfect yet. But I definitely think we're getting to the point where, for certain tasks, once it gets a feel for the project and which files actually matter in the process, it can jump to it a lot faster. We've both done a bit of work on getting it to focus on that stuff and remember the important parts, and when it does that, I think you can sort of overcome that too. That said, one of the problems I definitely see with the smaller context in an agentic loop is that it ends up leading to more loops. Instead of going, you know what, I'll just read this entire file in, because I can, it's like: let me read a hundred lines of this file. Oh wait, I need to get another hundred lines of this file. Oh wait, I need to get a hundred lines of that file. And so it's this painstaking loop where it gradually builds up the information. It does get there, but you kind of wonder: okay, if I had a million context that wasn't going to send me broke, I could just shove the whole files in there. It's not only going to get the understanding so it can do less work next time, it's going to avoid all these extra cycles, which may add up in terms of tokens anyway.
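A back-of-the-envelope sketch of why small read windows multiply both the loops and the tokens: each extra tool call re-sends the conversation so far as context. The per-line and overhead numbers here are made up for illustration, not benchmarks.

```python
# Rough input-token cost of reading a file in chunks, where every tool
# call re-sends the fixed prompt overhead plus all previously read
# chunks (they sit in the context as earlier tool results).

def chunked_read_tokens(total_lines: int, window: int,
                        tokens_per_line: int = 10,
                        overhead: int = 2_000) -> int:
    """Approximate input tokens to read a file `window` lines per call."""
    calls = -(-total_lines // window)  # ceiling division
    spent, seen = 0, 0
    for _ in range(calls):
        spent += overhead + seen * tokens_per_line  # context re-sent this turn
        seen = min(seen + window, total_lines)
    return spent

# A 1,200-line file read 100 lines at a time vs. in one shot:
print(chunked_read_tokens(1_200, 100))    # 12 calls, 90,000 input tokens
print(chunked_read_tokens(1_200, 1_200))  # 1 call, 2,000 input tokens
```

Even at cheap per-token rates, the extra round-trips also cost wall-clock time, which matches the "painstaking loop" feel described above.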
20:48
And I guess it can't do those cycles all async, like in parallel, because sometimes it needs to figure out the first chunk to know what to look for in the next.
22:27
Yeah, you're totally right. And I think that's the thing. One thing we had discussed about doing things in parallel is often you'll be doing it on the same project and you don't really want three workers all simultaneously editing stuff and then within that sub agents also editing the same stuff, because it becomes this massive mess. And so we were actually talking about how do you get that straight line speed a bit faster in terms of it accomplishing agentic goals. So you can just go single threaded and work on the same project at the same time, or maybe some other isolated element of it in another tab. So you're not having this conflict between the things, even if it's just in your own head.
22:37
Yeah, and I think in the coding world they obviously have git branches and all these git protocols. But my counter-argument is that there's still the human element: someone's got to test this stuff and validate it, because ultimately you're most of the time building a product for humans, right? And so if you have all these different branches and side loops and all this stuff going on, the cognitive overload of then being able to focus and review things and remember what the original mission was for that task is kind of nuts. One thing I've been doing is asking it at the end of the loop to summarize what I need to test and what the original goal was. And I know that might sound mental, but when you've got four or five tabs open and they're complete, I just look at the checklist and the summary and go, okay, I need to go play around with this stuff in the product and make sure all these things are validated and work. Until you...
23:17
Realize, to some degree, it can do that too. It's like, oh hey, I've just run it and I've opened the browser to check, and it works.
24:17
Yeah, I should mention, I mean, I think it can do those things, but I still need to validate it as a user, because it's not...
24:23
You're still relevant, Mike, don't worry.
24:30
I know. I feel like I've turned into one of the... it's like the copium meme, this whole podcast so far being: I'm still relevant too. I'm still relevant.
24:31
Yeah, we really do need to rename the podcast to Still Relevant. I think it's a more appropriate name. But it is funny. And the other thing I find that goes along with how easy it is to do stuff is just how flippantly you can add stuff to whatever you're working on. You can be like: can you just add an image in there? Can you add a feature for this, a landing page for this? Whatever you want. And it's just perfect, absolutely brilliant, in-context output that works and is tested. So suddenly, and I think this comes back to that overload we were talking about, the only limitations you have are purely your own creativity and your ability to ask for things. You're at this point where I guess it's like being a multi-billionaire who can basically buy whatever they want, and so suddenly you've got this paralysis of choice, because you're like, oh, will I buy a house in this city or that city? Which Rolex will I wear today?
24:41
Like, it's not as good as that choice.
25:38
As a worker, it's almost as good as that. Any task I want to accomplish, I simply have to ask, and someone does it for me, and better than I could do it. It's pretty, I don't know, confronting in a way, because you're only limited by your imagination.
25:43
I think partially, yes. I just don't know if we're fully there yet. So sometimes I'm blown away and I have this existential crisis where I'm like: it just implemented this whole thing perfectly, exactly how I would have, it has tests, it works, it's great, commit. And other times I am shocked at how wrong it gets it, and how stupid it can be, and how bad it is at planning. So you start to notice flaws in it. But I kind of wonder if that's because you get used to this new baseline, where the bar is set so high. I think the people who test things like Tesla's Full Self-Driving on YouTube seem to notice this a lot too, where they have a version update and they're like, this is incredible, it can do everything, but the next video is nitpicking that version of it. So yeah, I mean, look, if it...
25:59
Was as easy as I said, I wouldn't be sitting here feeling behind.
26:58
You know, that's my point. I keep thinking to myself: oh, if it was so smart, why aren't I done now? Like, you're just not...
27:02
Asking the right questions.
27:12
Yeah, it sort of is. It's that prompting capability. And also, I mean, this is a somewhat embarrassing story to tell, and I'm not sure I want to actively promote it, but to get the new version of Sim Theory out, we've been working very late nights, extremely late nights. And there's a lot of doctors, I know from the survey, that listen to this show, so you can diagnose me.
27:14
Starting to list your medical issues.
27:41
Yeah, well, at least the AI has an opinion, but I'd like the real guys to chime in here. I have swollen tendons in the top of my arm, I believe it's tendons, from, I guess, using a mouse and keyboard just way too much and never moving from my desk. But I was like: okay, my body's failing me, so I'm going to have to have a new hack here. So I've just been going full voice input, barking orders at the thing. And I have a new low-latency voice input system in the Sim Theory update, which I've been using a ton to just bark commands in. And I've been improving it, because obviously it's my main input method now. So I've got some ideas: I want to have it so you can command it, like, open a new tab and put this prompt in it, and actually bark orders and it understands. But that's completely unrelated to my point. I do find that it's gotten to a point where you start to look at other things, like: oh, typing is slowing me down, I can't type fast enough, I can think faster, once I've trained myself to think aloud. So I've been using voice input for almost everything, and the pain's getting better. But I think it kind of shows this other disgusting trend: once you get to this next level, you're like, oh, how can I improve input? Okay, I'll switch to voice, that's way faster. But then I'm like, wouldn't it... you know, that's where I think about the chip. I'm like: just read my thoughts while you're sleeping. Just use a bit of that power from the brain to do a bit of stuff.
27:43
You can imagine those dudes posting on LinkedIn: I used to be just like you, working a 9-to-5 job. Now I build B2B SaaS while I sleep.
29:30
The other thing that scares you is, once they get the chip in the brain, then they can use your brain as a cheap computer, because, you know, humans' wattage efficiency with processing... it's like: feed this monkey bananas, and it's neural nets, good processing. And then we get to The Matrix. Anyway, yeah, I was about to say.
29:40
Isn't that The Matrix?
30:01
You can kind of start to see this weird way of getting to that point, but let's not go there, because it's a little bit too scary. I don't know how I got down this rabbit hole; it's somewhat embarrassing. So anyway, now I'm trapped in this weird psychosis, having all these troubling thoughts. But for people listening that are maybe getting anxiety listening to this: while these techniques are productive, the way I've been handling it is actually slowly migrating back to single-threaded tasks. Because while I often will have four things going at once, I find that I'm much more effective single-threaded, preparing the next task in another tab while I've got something going off and working. Whereas I did early on get into this habit of having like six things open, and I find it just super fatiguing. Really, you just need to have discipline now to be like: okay, this is the way I'm going to work, because it's not as fatiguing and I can handle the cognitive load.
30:04
I think the problem is you can get so far down the road so fast. Like, it's sort of like getting to that 90% complete on a task, like in one shot, in a few minutes. But the problem is there's always those extra little things that aren't quite right and that can take a long time. And I think you're right. If you go five tasks in parallel that all get you 90% there, then suddenly you're jumping between them, trying to get them all the way to the point where you can finish the task. Like, let's say it's building a presentation, you're almost there, but you've got to refine it. When you're jumping between multiple ones, it's a problem because you're never really quite getting there. Then you run out of time or your human body intervenes and your arm stops working, or you need to eat food or like, you know, go to the shop or whatever. I don't do that, but you know what I mean? And then suddenly you come back and you're like, well, where am I at with all of these things? And you didn't really actually accomplish anything.
31:17
You know, and I. Yeah, yeah, sorry. Oh, I just think that this is the thing, right? It's similar to the full self-driving in a Tesla, and sorry to keep bringing it back to that, but I just know the problems with it. So it's like 95% of it is solved, right? And it's mind-blowing. You're like, how is this possible, half the time? But there's this 5% of edge cases that I think will just take so long to solve. And I feel like the AI models in knowledge work are going to hit the same inflection point of being sort of 95% solved. Like, you can delegate tasks and it will be able to get to 95%, but it's really that last 5%. And then all the dynamics of being in business and having someone responsible for decisions and having someone really set the agenda of these things is going to be even more pronounced. And I think that last 5% can take a lot of time to go back and forth. It's like, just one more prompt, bro, just one more prompt to get it to that 100% mark. And in that last 5%, sometimes I'm like, you know, instead of all this rushed boilerplate work, maybe my old methodology was better. Like, maybe going back and forth with it a bit and then producing the final output.
32:09
I can't tell you how closely I can relate to what you just said, because literally last night at about 9pm I had something working completely. I was like, this is great. I can't wait to show Mike. It's amazing. Then I had this amazing idea: I would change the architecture to get something else done. Now I have like 30 uncommitted files. I have no idea what they're doing or why nothing works now. Nothing works. But it's better in my mind. It's better. Because I spent so long, you know, I had three tabs going, I was like, this is brilliant, we've done all this stuff. And now I've got literally thousands of lines of code I've got to go through to verify: is it worth it? Should I keep it or should I abandon it? And you can do that. Like, that's a 9pm till 2am stretch where I may just decide I'm abandoning this because it didn't work, you know. And I don't know, is that more productive?
33:35
Yeah. And I just wonder if, like, slowly, incrementally building stuff where. And I know I'm gonna get trolled for this maybe, but the cutting and pasting, like. So at some point, the sort of cutting and pasting in.
34:30
Coding, yearning for the old days of.
34:45
Cutting, I'm already looking back like it was so long ago. But I do think with the cutting and pasting, you're going into files, and this applies to knowledge work as well. Like, you're going into the presentation and editing the work you've done, and going through that process of building it, the journey, call it. And because most of that journey, 95% of it, gets taken away, you can get into these situations where you're like, how did I get here? Like, what bad prompting did I make to get here, and how do I undo this mess? And from a UX point of view, I think this is really unsolved in a lot of ways. And it's something we've been working really hard to solve, especially for knowledge work, because if it goes and edits and destroys a presentation on your computer or whatever, you need to be able to go back through iteration after iteration and be like, where did this all go wrong? Which files were touched? How can I roll back? And I guess what I'm trying to say is the command and conquer software that sits on top of all of these agentic tasks, to me, is going to become fundamentally the most critical software in the modern AI software stack: that command and control of you pre-preparing tasks and monitoring what tasks are underway and.
34:47
Then, well, knowing what to review, understanding what went on. Like, you need to be put into the context and go, okay, here's what changed, here's what the intention was, here's what happened, and you make the human assessment as to whether, okay, let's keep that one, or let's change this. Because if you can't quickly understand what's going on, then you lose a lot of the benefit of the work having been done asynchronously. You would have been better, as you said, having this interactive session where you're a major part of the process, so you understand where you got to. If you're coming in where you've just gone, task, gone away, come back, it's done, you need to very quickly understand what's actually happened.
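The "which files were touched, how can I roll back" review layer being described could be sketched like this. It's a minimal in-memory toy, assuming a workspace modeled as a dict of file contents; a real tool would sit on git or similar rather than deep copies.

```python
import copy
from typing import Dict, List

class IterationLog:
    """Snapshot the workspace around each agent iteration so a human can
    see what each iteration touched and roll back to any earlier state."""

    def __init__(self, files: Dict[str, str]):
        self.history = [copy.deepcopy(files)]  # iteration 0: starting state

    def record(self, files: Dict[str, str]) -> None:
        """Call after an agent iteration finishes editing the workspace."""
        self.history.append(copy.deepcopy(files))

    def touched(self, i: int) -> List[str]:
        """File names added, removed, or changed by iteration i."""
        before, after = self.history[i - 1], self.history[i]
        return sorted(
            name for name in before.keys() | after.keys()
            if before.get(name) != after.get(name)
        )

    def rollback(self, i: int) -> Dict[str, str]:
        """Return a copy of the workspace as it stood after iteration i."""
        return copy.deepcopy(self.history[i])
```

The per-iteration `touched` list is what puts the reviewer "into the context" quickly: it answers which files changed before you read a single diff.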
36:13
Yeah. And so to me, like, I always pick on Atlassian for some reason, I guess because they're reasonably local and they do have a lot of developer tools and, you know, management tools around the progress of development teams. And I look at them right now and I think, well, in this world we're going into, it actually seems like the perfect opportunity for a company like this, or Asana, or, I know Notion's mentioned a lot, I don't know why, I think it's a waste of time. But anyway, those kinds of companies seem to have this remarkable opportunity in that they can now say, well, our new vision is to be the command and conquer center for AI agent work, or their niche, like development teams or knowledge work or wherever they sit in the world. And to me that seems to be the most survivable strategy of this period, where they have a renewed sense of importance, because they are the ones that store all the upcoming tasks and are helping the teams monitor them and figure out where they fit into that process.
36:51
I guess I discussed this with you in the week. I was like, isn't the future of work just like a JIRA board or a Trello board where you're literally writing up task descriptions of what you want and the AI is off executing them and moving them into the columns for your review?
37:59
I mean, I guess some people have these setups now, to be fair, and have for quite a while. But I think I'm more thinking of it from the point of view of, like, you know, what do I need to do to validate? Like, I just think the whole process needs to be rewritten. Like, the whole idea of the steps on a board or. Yeah, there just seems to be this whole new way of managing these things that doesn't yet exist.
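The board-driven workflow being floated here, you write task cards, the AI executes them and moves them into a column for your review, can be sketched as a toy. None of this is a real Jira or Trello API; `execute` is a hypothetical stand-in for whatever actually does the work (an LLM call, a coding agent).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Card:
    title: str
    description: str
    column: str = "todo"          # todo -> in_progress -> review -> done
    result: Optional[str] = None  # whatever the agent produced

@dataclass
class Board:
    cards: List[Card] = field(default_factory=list)

    def add(self, title: str, description: str) -> None:
        self.cards.append(Card(title, description))

    def next_todo(self) -> Optional[Card]:
        return next((c for c in self.cards if c.column == "todo"), None)

def run_agent_pass(board: Board, execute: Callable[[str], str]) -> Optional[Card]:
    """Pick up one 'todo' card, run the agent on it, park it in 'review'.
    Moving a card from 'review' to 'done' is deliberately left to a human."""
    card = board.next_todo()
    if card is None:
        return None
    card.column = "in_progress"
    card.result = execute(card.description)
    card.column = "review"
    return card
```

The deliberate gap in the sketch is the same one the hosts point at: nothing here answers "what do I need to do to validate" once everything piles up in the review column.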
38:15
Laying in a bed with your brain.
38:42
Plugged into a computer, fried. But anyway, like, I could talk about it all day, obviously. One other thing I referred to like a week or two ago, and I feel like this whole year so far has just been the same theme repeated. This is why the NASA guy's wasting his time listening. So it's from Harvard Business Review. I mentioned it before; I finally found the actual article: AI doesn't reduce work, it intensifies it. I think this article lags the research, though, because it was only published a couple of days ago, but the research was already released before. And basically, to summarize the article, it says that what introducing AI, and specifically these tools, into the workplace leads to is people just doing more work. Not, you know, the dream of technology, which is always that you can focus on the more important things, like not booking travel anymore. It turns out you're just going to book a lot more travel and be way more stressed about booking said travel, because you're now able to do so much more that you take on more work. And yeah, the study basically verified this: that people were just more stressed and anxious and doing more.
38:43
It's almost like, yeah, like ignorance is bliss in a way. It's like I, I wish I didn't have all of these abilities because I do have them. I have to use them. Yeah.
40:10
And I think it's kind of troubling what they found: that with AI, doing more felt possible, and people could just take on tasks fearlessly, and it made things way more accessible, and it was more rewarding to complete tasks. They just did a whole lot more and as a result became way more fatigued. And everything we've been saying since the start of the year was also true for them. So it leads to more multitasking at work, an expectation of being more productive, and them taking on more responsibilities. And they also said that they were doing things across the organization that they typically wouldn't do, like, for example, negotiating a contract, because the AI is, like, as smart as a lawyer. So as long as they have a base contract, they can pretty safely take on all of these other elements in their role.
40:19
It's also like evaluating someone's work. Like they produce something, you chuck it into an LLM and you're like, you know, actually this is shit because I have an expert and they say that your work is no good.
41:17
Yeah. So I don't know. Like, I guess because we're working in this technology day and night: do you think a time will come in the future where this will go away? Like, where there'll be better tooling to help you feel less cognitive overload? Or do you think this is just this new, intense era we're entering, and, like, for workers to be competitive, they're just going to have to keep up at this pace?
41:30
The problem in my mind is that as the agents get better, which you can already see happening, you're going to get more and more to the point where, in a few sentences dictated to it, like you do, you can delegate a fairly complex task that it is able to accomplish from start to finish, with all the information it needs to do it and finish it. Then we've got things like agent identity, where they've got their own email address, their own phone number, their own ability to operate in the world, payment method, that kind of thing. So you can start to give it more real-world kinds of tasks that it needs to accomplish. And I think the problem comes that the better it gets at that kind of thing, the more pressure there is on the human to be leveraging that, to be making use of that. Especially when you think about: once you have one agent that can do that, you can have 10,000 agents that can do that. There's no cap on how many of these things you can have, other than money, I guess. And so the pressure for you to be a director and to be delegating all of this work out becomes infinite. Like, there is no limit. And then you say, okay, well, let's have an agent that does all of that. But then you have to give it these sort of high-level, ambitious goals. It's like, let's become the leading company in oil. And then, you know, the sub-agent creates all the tasks in order to do that, and then it does it. For the human in the loop, it becomes more and more pressure, because there's just so much more power below you that you need to be making use of.
41:55
I also think it kind of erodes the structure in a company, because the lines blur so much. Like, between sales and marketing, for example. Like, if the agentic loop can handle marketing. And we were talking about this with release notes and blogs and tweeting and all that kind of stuff. So when the agentic loop builds a new feature, say in Sim Theory, it could automatically announce it and we wouldn't have to do anything. And so you can see that the lines can totally.
43:25
Yeah, like you could do, you know, discord notification X post, you know, LinkedIn post, blog posts, documentation on how to use it, a video. Like all of that stuff is possible with the right agentic loop.
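The "right agentic loop" for announcements described here is essentially a fan-out: one set of feature notes, one formatter per channel. A minimal sketch, assuming everything is hypothetical, each lambda stands in for an LLM prompt tuned to that channel's style, and actually posting via each platform's API is left out entirely:

```python
from typing import Callable, Dict

def announce_feature(notes: str,
                     channels: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Fan one feature announcement out to every configured channel."""
    return {name: fmt(notes) for name, fmt in channels.items()}

# Hypothetical channel formatters; none of these call a real service.
channels = {
    "discord": lambda n: f"New feature: {n}",
    "x_post":  lambda n: f"Shipped: {n}"[:280],   # respect the post length limit
    "blog":    lambda n: f"# Release notes\n\n{n}",
    "docs":    lambda n: f"## How to use it\n\n{n}",
}
```

Adding a video script or a LinkedIn post is just one more entry in the dict, which is why the hosts can rattle off five channels without changing the shape of the loop.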
44:01
Yeah. And then, like, you think on the other side, it's like, then people get their agents to interpret the update and then tell them, like, a summary. Like, it's just bonkers. But I do think what it's going to lead to is not what people think, which is, I think the assumption now is, oh, you know, everyone's going to be out of a job. And I think there'll be a lot of impact to jobs in terms of redefining roles. And I would think for most businesses now, it's like back when the Internet came out and not having a website: if you're not adopting, or at least starting to adopt, these technologies, it's pretty bad. Like, if you're not at least starting to think this stuff through and retrain your workforce. But the thing that kind of concerns me most about it is more business is just going to happen, and the expectation will be that things just happen way quicker. Like, say, for a software business, where the roadmap might have been, in this quarter we're going to do this stuff, in a lot of cases now I look at those and think, well, we could probably do that in two weeks, maybe a week.
44:16
And it's funny because you mentioned about, like, the, say, the product marketing side of things. And it's funny because in a way, as the product team themselves become violently more productive in terms of feature production, you sort of can't really have a traditional marketing team because they'd never keep up. Like, think of how long it would take to do like a feature launch for a product marketing team. And then you're like, oh, yeah, we did 60 features this month. Like, they can't do it, but if you had the agents doing it, then they could.
45:22
Yeah. And, like, having the time to treat each feature with a level of importance and constantly be broadcasting it. Like, it's almost like the noise from companies could fundamentally just get louder and louder, where it's this sort of communication war between competing agents to take over the zeitgeist. But I don't know, like, you can look at it from all of those extremes. But ultimately, I think if you think about early content creation, like before, say, YouTube existed and people had these pretty bad digital cameras: it was really hard to edit video, it was hard to, you know, broadcast your average podcast like this.
45:50
We still struggle.
46:35
You know, everyone got these tools and obviously the overall quality went down. This show is an example of that, because anyone could then broadcast, and maybe they shouldn't be able to. And I think similarly, you know, it's going to get to a point where everyone can do anything, but there will be a level of. It's not like everyone is performing as well as, say, MrBeast on YouTube just because they have the better tooling around video and creation tools now. I think there's an element of, you know, success through maybe taste, or distaste, in that example. But I guess what I'm trying to say is I don't think it'll just go into some exponential like everyone thinks. I do think these things take so long to interweave. Like, you just think how long it takes in an enterprise to adopt a new technology, let alone shift job roles and fundamentally how people work. I think this is a five to ten year thing, and it's going to happen. It'll seem fast to early adopters, but for everyone, for the impact to the economy, I should say, it's going to be far slower.
46:36
Yeah. And I think the crucial elements are the two you've said. One of which is having the right tooling within an organization to give your workforce access to the best of this stuff. That's probably the first one: just access to it. Like, you're not going to get this on a straight-up ChatGPT UI or Copilot or something right now; there needs to be better tooling around it for an organization. And the second one's training. Like, you and I do this constantly, we work on it constantly, and we're still having these existential crises about the right way to work, how many agents to run at once, all that sort of stuff. And so you can imagine the average worker, who sort of doesn't even necessarily want it, but their organization is saying, you have to work like this now. The level of training required to get people thinking and behaving in this way is going to be enormous. And there'll be fundamental change in the organizations required to get up to the productivity level that other people have. So I agree. I think, even though we see the possibilities and where it's going, it's going to take quite a while to play out, I'd imagine.
47:42
Yeah, I would agree with that. And I think the way you decide to train people, and let them have the autonomy to figure out, like, oh, maybe I can take on more things, and maybe these roles should merge, or maybe these teams should be a singular team now, is, I think, important. I also think one observation is work becomes, and this is probably not a positive, less collaborative, I think, because you're really collaborating with your agents. And that's how I now work. I have like a.
48:44
Well, it becomes adversarial.
49:26
Yeah.
49:29
My AI thinks it should be done this way. And here's, here's a 50 page document explaining why I'm right.
49:29
Yeah. And I sort of wonder if, based on the tune of the AI assistants themselves, eventually there'll be a meeting with the humans, but then also the AIs duking it out for supremacy for a given task, potentially. Anyway.
49:36
I think people want that. They want a competition between the agents: who can come up with the best plan and execute it and prove their point.
49:51
But here's an interesting observation. My wife and I genuinely will now, over, like, life decisions and stuff, I'll be like, well, my AI says this, and she'll be like, but my assistant says this. And genuinely, we get into debates, and I'm like, you're probably using some povo model, I'm using the latest stuff. And she'll be like, no, I'm using, like, Opus 4.6. I mean, this is how wired in. And she's by no means, like, that technical. But it just shows how pervasive this stuff now is, really.
49:59
Yeah, exactly. Using it for life planning and real decisions and it can, it can genuinely help you in a lot of scenarios where you're not an expert.
50:35
So I couldn't let a show go by without writing a song, because it's just become like a habit to really capture the feeling and the moment in time that we are in. And it's not a GLM-5 diss track. I did not do that. Maybe I should have, but I didn't. You know, every week we've been having mental breakdowns, but this week in particular it was sort of like, you know, is coding solved? What role do I have in the world?
50:43
Yeah, I had that moment multiple times this week where I'm like, am I even needed anymore? Like, I genuinely like the, the productivity around me is such that I'm like, what am I even. What's the point?
51:09
So I put together a track. I'll play the full track at the end. But there is a lot of meaning in this song, and it focuses particularly on a pretty big topic this week, where all in the space of one week, three major labs, so xAI, Anthropic and OpenAI, had departures from their teams, with people all citing warnings and, you know, safety reasons for leaving. And I think there's a few angles to look at this. There was, I'm not going to be able to say this right, Mrinank Sharma from Anthropic: the world is in peril. He decided on his exit to release an op-ed in the New York Times. As you do, like, you know, very important. There was Zoe Hitzig from OpenAI, who said OpenAI is on the same path as Facebook, and couldn't.
51:25
What path is Facebook on?
52:45
Like, with the ads, basically, saying no one has ever accumulated so much data and done so little with it. You know, people sharing medical fears, relationship problems, religious beliefs, thinking they're talking to something with no ulterior agenda. And she's concerned that she's been a part of creating, like, the sickest thing ever. We did warn you. But anyway, so there's that one. And then the xAI exodus. Now, like, 6 of the 12 original co-founders have left. Many are speculating, though, this is to do with the fact that xAI was merged into SpaceX of all things and is now apparently privately valued at 1.25 trillion. So they're probably cashing out their billies in the bank. But anyway, not to get into the weeds with it, I think there's kind of two ways to look at it. Everyone that leaves any of these companies now needs to be a part of the attention economy and their personal brand. Like, they seem to slam the door pretty hard on the way out, write their op-ed or whatever to build their brand, and then they use that brand to, like, you know, do whatever. So I think there's that piece of it, the cynic take. Also the fact that they're not handing back the shares or the money for their morality, so you can't take them seriously. But I also think, for me, my opinion of this is not that we're at some crazy recursive self-improvement thing and we're all going to die soon. I put that around 20 years out. But where I think we're really at is that these safety people don't matter. They serve no purpose, and they get sidelined in these organizations because they're just annoying. Like, these models, unless they're given access to the right things and told to be malicious, and then governed by someone malicious, it's just not gonna do it. And all these safety reports, like Anthropic released one in the week.
It's like, oh, if we told the model we were going to shut it down, it, you know, it tried to gaslight us. It's like, no, of course: you're role-playing with this thing. It's giving you outputs that it thinks you want.
52:46
You're literally handing it the tools.
54:59
Safety shit is marketing hype and absolute. And that's all I wanted to say about it.
55:02
Yeah, no, I agree. I mean, I think this idea that the agents are sort of proactively plotting against us is kind of silly, given that you prompt them. Like, the whole interaction is prompting. And even if you put them in a loop, you're still prompting them in a loop. It isn't like they're sitting around plotting, like, next time he prompts me, I'm gonna mess this guy up; next time he prompts me, I'm gonna gaslight him.
55:11
I mean, I guess if you put it in a cron job, like they are with OpenClaw, and say, you know, check in on floor, social network or whatever they call it now, and then write some sort of malicious post to scare the media, and you put that in a cron job, like, of course it's going to escalate and be really weird and stuff. Because that's the fantasy someone is playing out.
55:32
Yeah, I think there's a certain kind of people who just want the future, right? Like, they're excited to go to Mars, they're excited about what AI is going to be, and putting it in that kind of loop can help you play out that fantasy. But, like you say, it's role-playing. The model doesn't have its own goals yet. Like, that just doesn't exist yet. You're telling it what to do. Even if it's on an automated loop, you're still telling it what to do. It's not coming up with it on its own.
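The "agent in a cron job" point, that even a looping agent is just being re-prompted by a human-written instruction, can be made concrete in a few lines. A hypothetical sketch, not any real scheduler or LLM API: `call_model` stands in for the model call, and the only thing that ever happens between ticks is nothing.

```python
import time
from typing import Callable, List

def cron_agent(prompt: str,
               call_model: Callable[[str], str],
               interval_s: float,
               max_runs: int) -> List[str]:
    """Re-issue the same standing prompt on a timer.

    The model does not run, plan, or 'plot' between ticks; every run
    starts again from the human-written prompt."""
    outputs = []
    for _ in range(max_runs):
        outputs.append(call_model(prompt))  # the human's instruction, every time
        time.sleep(interval_s)
    return outputs
```

Whatever escalation shows up in the outputs was seeded by the standing prompt and the conversation history fed back in, which is the hosts' point about role-play rather than intent.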
55:54
Yeah. I mean, I'm sure people would argue, oh, but when it's in a loop, it's setting its own goals and managing its own memory and coming up with intent, and that kind of is true. But you're still setting it off in that sequence of events, and able to monitor it, and so on and so forth. All right, that's plenty of ranting for one week. Any final thoughts on the week that was? Are all humans doomed? How are your anxiety levels?
56:21
It's high. And I just honestly hope that I'm still relevant by next week.
56:48
That's. Oh, God, that's super depressing. I thought you were going to be like, oh, guys, it's going to be.
56:53
Okay by telling people not to listen.
56:58
Yeah, I mean, it's like tradition now. I think after this hour, it's probably been good advice. So, all right, I want to get back to work. I'm over it. Link in the description if you want to attend our live events. That's if we're all still alive. Will we make episode 100? I'm not sure, because this is the whole thing, right, with the extreme. The one final thought for me: I do think a lot of this is just excitement around agentic loops finally working the way they've been promised for quite a long time, and we're getting a bit, call it, horny over cron jobs. And so I just think we should temper our expectations here. Like, I think everyone's getting a bit too excited and it's just like.
57:01
It's almost like our moment, like, like where they had in the big labs where they're like, oh, God, guys, hold me back. Like what I've seen, you know, like, you better keep us under control because this is going to be huge. I honestly feel like we've just had our own little shock. When you see the system being able to do what it can do, you're like, oh, what does this mean for me? You know, like, it really is that kind of thing.
57:50
It'll all be okay, I'm sure. All right, I'll play us out with the song. It's called Is This the End? And I think it summarizes the current feeling quite well. If you want to support the show, check out Sim Theory. Bigly update coming soon. We've been saying that for a while, but we are working with all this amazing.
58:13
AI and we still can't finish.
58:34
Yeah, I feel like this show is just hyping the whole thing, but it's our life right now, so. Yeah, that is a good point. All this technology and we can't finish it. We'll see what happens. All right, thanks for listening. We'll see you next week. And if that NASA guy listened to this whole show: you should be ashamed. Get back to work. We want to go to Mars and the moon. All right, goodbye.
58:36
Mrinank was head of safety at the AI. In his hands but the pressure got too much he said, I need a break now he's off to study poetry for humanity's sake he said, I want to become invisible and fade away While the robots take over he'll be writing a haiku.
59:04
Is this the end?
59:41
Is this the end? The safety guy's becoming a poet, my friend $650 billion on the line Mr. Next learning I am the contemporary he'll be fine Is this the end? Is this the end? He warned us all Then he's off to write again Zoe worked at OpenAI she had some concerns they put ads in ChatGPT and now her stomach turns she wrote an op-ed in the Times Very profound Then she quit her job and now she's nowhere to be found Sam said it's not dystopic it's just education Zoe said I'm out and took a permanent vacation Is this the end? Is this the end? The safety guy's becoming a poet, my friend $650 billion on the line but Mr. Naxler, Den I and Victor It'll be fine Is this the end? Is this the end? They warned us all and they're off to brunch again Jimmy Ba said this year is consequential for our species Then he quit on Tuesday Hope his severance clears There's Tony who left Monday that's two founders in two days Half the team has left now Elon's going through a phase he says we're hiring aggressively to fill the void but the ones who built it bounced like they were unemployed the world is in peril that's what Mrinank said Then he packed his bags and went to study books instead Could have stayed and fought could have made a stand But a poetry degree was more important than his plan Roses are red, violets are blue AGI's coming and there's nothing we can do Is this the end? Is this the end? The safety guy's becoming a poet, my friend. Recursive loops emerging in a year, they say. But the guy in charge is writing sonnets today. Is this the end? Is this the end? They warned us. It's coming now. They're sipping s. Mrinank wants to become invisible. The timing is absolutely miserable. The world is in peril, grab your pen. Write a poem about the robots that. Compare thee to a summer's day. The AI's coming either way. The.
59:43