Nano Banana 2 is Here! Gemini-3 Shutdown & The AI Layoff Myth | EP99.36
The hosts discuss Google's new Nano Banana 2 image model, which offers 50% cost reduction and faster speeds but mixed performance results. They also cover Google's discontinuation of the poorly-received Gemini 3 model and analyze recent AI-attributed layoffs at companies like Block, questioning whether AI efficiency claims are genuine or just cost-cutting excuses.
- Image generation models are reaching 95% quality but the final 5% refinement remains frustratingly difficult and time-consuming
- Agentic AI workflows with smaller, targeted loops are proving more effective than large context single-shot approaches
- Companies claiming AI-driven layoffs may be using AI as cover for traditional cost-cutting rather than genuine efficiency gains
- Model routing and dynamic switching between cheaper and premium AI models can reduce costs by 90% while maintaining quality
- The competitive moat for SaaS companies lies in domain expertise and regulatory compliance, not just software functionality
"If you're a company like Ideogram, what are you doing all day? You know, like, you sort of like, hey, this was your whole thing and they've come out and beaten you for like six months and you haven't even got an answer for that."
"I don't even need to know how to code anymore. I'm just like plus do it now."
"What does this mean for the world when the cost of legal documents, for example, is zero? What does it mean when the cost of software goes to zero?"
"People aren't just going to replace their enterprise software. Like, it's just not going to happen."
"Nicole, can you please take a look at this, I need your help... Nicole, your task is complete."
So Chris, this week — and I'm actually saying "So Chris, this week" so I don't get emails about not saying "So Chris, this week." Apparently last week I just said "Chris, this week" and that upset some people. So I'm sorry, I really am.
0:02
It's like the comfort food of podcasts. You need to hear the phrase to end your week on a Friday.
0:16
It really satisfies people, apparently. So we have some launches, some things to discuss that are very exciting. We have a new image model from Google, Nano Banana 2, combining Pro capabilities with lightning-fast speed. And the big takeaways from this Nano Banana 2 launch are the fact that it's about 50% cheaper, which is a huge improvement, and it's a lot faster, because it's using the underlying Flash model now to produce the images. So, according to them, you're getting Pro capabilities, lightning-fast speed and about half the price. But when you get up to the 4K images, I think the discount's only like 25%, so it's not as good, but it's still pretty impressive. And I think the initial reaction so far that I've seen to this image model is people are not as blown away as they were by the first ever release of Nano Banana. But I think it's just because the bar is so high with the Nano Banana series of image generation models from Google that it's hard to top. But to me the problems were speed and price, and it seems like they've made huge progress on those two things.
0:21
Yes. I don't know so much about the speed though. Like if you've had a different experience to me where it's actually faster, but for me it seems about the same speed as regular Nano Banana Pro.
1:36
I think it's probably because of demand right now. When I first started using it earlier today, it was a lot faster, and it has subsequently slowed up a fair bit. So I suspect it's just a demand thing right now. But I do personally notice it being a lot faster, especially on high-resolution images. Like, previously, if you were doing a 4K image generation, it was so awfully slow that I would just forget about what I was working on. And it also means iterations are just quite painful. But I've noticed now it's not as bad. But yeah, I think it's just a demand problem right now more than anything.
1:47
Yeah. So in theory it's faster, but we've just got to wait for them to catch up capacity-wise. I've made quite a few images with it, and it's pretty good at instruction following. Like, it seems to only ignore your instructions when it doesn't really want to do the image that you're making, right? I've tried my usual controversial stuff, and the safety filters seem pretty strong. But in terms of getting it to do kind of random stuff — like having a little character come in from the side with a cartoonish quote on a photorealistic image with a text banner at the top — it's able to do all of those things just fine. The only thing I've noticed is that the actual photo bits of it look a bit composited. They look a bit like the old Looney Tunes scenes, where it's like a cardboard cutout shoved on top of another image. Like, you know, the person looks like a photo of the person, though the background looks like a different image. Have you noticed that?
2:25
Yeah, but this was a problem in Nano Banana Pro as well, where sometimes it makes really poor Photoshop-style edits — almost like, you know, I went into Photoshop and tried to cut my head onto an image. And you've really got to work at it. I find each week when we make the podcast thumbnail — and a lot of people jokingly see our thumbnails as an image benchmark of how good the image models are getting now, because our thumbnails have improved dramatically since the beginning — what people don't see behind the scenes is that sometimes I can do like 10 generations for them, just yelling at it, being like, no, make it look more realistic, do this, do that. And because Nano Banana Pro was quite slow, it took me a fair amount of time to get it right. But I ultimately think that, yeah, sometimes it just does that and you've got to yell at it. Similar to all Google models, I think it has that sort of adherence to a certain style or path in its response, and if you don't yell at it and try and take it off that path, then you kind of get nowhere. Sometimes I just start again, honestly, and go again. And also I think what's helpful is doing — not 10, but, like, four variants up front. Getting it to make four different copies of whatever you're trying to make instead of a single one, and sometimes one of the four will be good enough.
3:19
Yeah, it makes sense. And I think that it's definitely one of those technologies that can get you, like, 90% of the way. But then the last 10% can be incredibly frustrating. Like, I've worked on diagrams with it, and it's like there's one little piece of the diagram it just simply cannot get right, no matter what you try. And I've done, like, 50 iterations to try and get around that. I'm curious if this new version is able to overcome that kind of thing. And on that one interesting thing, I think, to note about these models and the prompting is you did a slide generation using it that turned out brilliantly. And I kind of wonder if a lot of it is just prompting. Like, if your prompts are specific enough about what you're going for, it's able to get that accuracy upfront better.
4:42
So here's some interesting things I have noticed from using these models pretty extensively. And interestingly, I've built this into the new version of Sim Theory because I think it's so powerful. So here's an image I created of us in bed with Dario and Sam Altman. Like, I love how it's just more than happy to oblige with this prompt — no problems at all. But, yeah, if you do anything with, like, a female, it's like, oh, no. But as long as it's all males, you're sweet. So, one of the things I've noticed is if you mark up your images — like, if I circle myself in this particular image and then add the annotation to the context and say, you know, paint a clown face onto this person — that will be way more accurate in the change. So where you draw on the image and annotate it, it's way more performant. And I'll run this example and I'll show you the result in a moment.
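For anyone who wants to reproduce the mark-up trick described above programmatically, the workflow — draw a red circle on the region you want changed, then send the annotated image plus the text instruction to the model — can be sketched in a few lines of Python with Pillow. The image-editing call at the end is a placeholder, not a real client API: swap in whichever image-model client you actually use.

```python
# Sketch of the "annotate, then edit" workflow: draw a red circle around the
# region to change, save the marked-up copy, then hand it to an image model
# together with the instruction. Only the annotation step is real code here.
from PIL import Image, ImageDraw

def annotate(image_path: str, center: tuple[int, int], radius: int,
             out_path: str) -> str:
    """Draw a red circle around the region to edit and save a marked-up copy."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    x, y = center
    draw.ellipse([x - radius, y - radius, x + radius, y + radius],
                 outline="red", width=6)
    img.save(out_path)
    return out_path

# Hypothetical call -- replace with your actual image-model client:
# edited = client.edit_image(
#     annotate("us.png", (420, 180), 90, "us_marked.png"),
#     prompt="Paint a clown face onto the circled person.",
# )
```

The circle survives into the model's input, which is presumably why the edit lands exactly where you drew it.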
5:25
Let's see how fast it is. It's not a good scenario when you're live recording.
6:38
Yeah, yeah. So we will see how fast it is. But likewise on slides. So —
6:42
Oh, so you can actually annotate the slide and say, change this part only.
6:49
Yeah, so I'll show you on this one — annotated as well. I put a little square here and said, put a portrait of Sundar. And quite honestly, if I go back and redo it, it sort of looks like him. It's not fully accurate, but it did it exactly where I put it
6:54
in the right spot. Yeah, that's very interesting.
7:18
So for slide editing, I think it's super good, but it does rely — I've noticed, and it kind of makes sense — on prompt adherence. Like, if you want to get rid of this, say, you circle it and then say, get rid of this image. We'll check back in on this as well. But it tends to work really well. I mean, you're right about the slow speed — it's still editing mine in the cloud.
7:20
Yeah. I mean, the reason I know is I've been using it all morning. And to be honest, if it was actually fast like they said, I'd be going wild with it. But after a while, you're like, I just couldn't be bothered waiting anymore.
7:46
I mean, it has slowed up because we just added it — and it only came out, like... I added it about a minute after I saw the tweet. It was like three minutes old and I added it. And at the time it was super fast. But clearly they're just getting absolutely smashed by it now.
7:58
It makes sense — all the different podcast hosts experimenting with it. The thing you also mentioned is that they talked about the price. And my comment to you was, I didn't even realize the first one cost that much. Like, it didn't seem like the cost was so burdensome it would stop you from, say, using it for your job or whatever. Wow. Did that well.
8:17
But look — so here's the circle to show you. I know most people listen, so it's not very thrilling, but I put a red circle on the source material around my face and then said, you know, paint a clown face onto it. And I mean, it doesn't look that realistic, but it does work.
8:36
I'm gonna try this. I gotta try this annotation thing. I didn't even realize you could do that. It's awesome.
8:54
Yeah, I think it works particularly well. Now, let's check back in on getting rid of that image. And look at this — I mean, it just wiped it out on this slide. So if I undo this now... the change... oops, redo. But yeah. See, here it is. Can you see that?
9:00
Yeah.
9:19
And then I'll go redo, and then — look, perfectly gone. Everything else the same. Like, unbelievable.
9:26
It's sort of, you know — if these models go the way that I think they'll go, they get way faster, way cheaper all of a sudden. I know Canva talks about this thing called last-mile design, and I think you just said it before, where 95% of the work now is ridiculously easy in most cases. Like, you can get 95% of the way there, but that last 5%... If they can come out with an image model like this that understands layers somehow, and then you can take that into something like Canva and play around with the layers and sort of finalize it, I think all of a sudden it does what AI has sort of done to coding, where anyone's a great designer now if they have decent taste.
9:26
Yeah. I think, though, that the layer thing — we know there are all the tools like Segment Anything, and we know they can sort of already do that, right? And I think it actually goes beyond just, okay, get it into Canva and edit the layers. To me it's more like: if the AI can recognize the layers, then you can do targeted editing. So one of the things that I've noticed is, say I'm building a presentation because I want to talk about the technical implementation of some software architecture. Everything's great, right? But one of my slides has the wrong logo on it for some reason. Now it sounds easy, but sometimes with the models you simply cannot get it to replace the logo. That's before I knew about this annotation thing, right? But I couldn't do it. And so then what am I gonna do? I don't have photo editing stuff. I've got to go into, like, Paint, or screenshot over it, or mask it, or do some really dodgy thing. So I go from having what could be a presentation that looks like I've spent a month preparing it — absolutely brilliant — to looking like, oh, this is just AI, this is just amateurish garbage, and I shouldn't pay attention to any of it. It completely shatters the illusion when there are small defects in what you're producing. So overcoming that last mile, as Canva calls it, I think, is a very, very critical thing.
10:20
But the last mile is sort of true in a lot of AI use cases. Like, if you look at Tesla Full Self-Driving, right now they are working on the last-mile problem with full self-driving — literally, truly. And then with coding models, same thing. Most people get into something like Claude Code or start vibe coding something and they get 95% of the way there, and it turns out the last 5% is the killer — like it always was in a software project. You just get there a lot quicker. And so I think a lot of it is solving those problems now more than anything. But just to go back to how good it is — I mean, look at this. So we did the horse egg experiment. I got it to come up with a magazine cover of Horse & Hound with a woman in, like, an elegant bikini — sort of Horse & Hound style, I guess. And it says "The Unbelievable Discovery" or something. "Horse eggs are here. First foal hatched in Kentucky. Top 10 dressage tips for spring." But then I saw the example in their blog post about a flat-lay infographic depicting the water cycle, and I thought, wow, I've never seen it create anything that good. I'm going to try this prompt. It took me two goes, but check this out. So this is the life cycle, allegedly — "The Horse Egg Cycle: A Complete Guide." And for those listening, it's a cardboard-cutout look — it looks like it was done by hand — showing a gestation period of six to eight weeks for a horse egg, a nesting phase. And it's got tiny bits of hay stuck onto this, like, mood board.
11:36
Step two is build your nest. Yeah.
13:28
Oh, well, no, that's step three. Step two is hormonal surge. You have to take folic acid and some drug called.
13:31
Some drug? That's how you turn it into a horse.
13:38
And some drug called "Equestrian." And then four is the laying. And it's got — I don't know how to even describe it, but that fluffy, cloud-like material, you know, cotton buds or something like that. I'm very out of touch with that stuff. And then step five, incubation, 45 days. It shows the outer shell, the yolk sac, the embryonic foal. And then the last is hatching. Anyway, it's unbelievable. I'll put a link in the description below to this image.
13:40
Print it out and put it up at your kid's school on the wall.
14:12
Yeah, it's like a real thing. So it can do some amazing stuff. Here's my honest opinion: I don't think you're sacrificing anything in terms of fidelity or its ability to create things. The only thing I have noticed — and I'll show you one more example — is how it degrades over time, and this is a problem with all image models, right? So check this out. This is now us sitting having coffee in San Francisco. You don't look like you — you look a bit scary. I am missing eyeballs for some reason. And then I'm like, can you put a dinosaur in the background terrorizing everyone? And then there's people running away.
14:14
I have noticed that looks like quite a nice day out. I wouldn't mind doing that.
15:02
Yeah. With the dinosaur or.
15:05
Doesn't matter.
15:07
So yeah, it looks good. And then I annotated it and said, can you put a gold necklace on me? And it did it perfectly and retained everything. Yeah, you cannot tell the difference when you annotate — it's amazing. It really just adds to it. I don't know why I didn't notice this earlier. Then I'm like, can you put Taylor Swift with us at the table as well? And it looks totally fake — one of those cheap edits like we were talking about earlier. So anyway, I guess what I'm saying is it still has the problem of the character degrading. I don't think you're getting anything more by going to Nano Banana Pro — maybe it's slightly better at instruction following — but for the cost factor and, hopefully, speed, once they get the servers warmed up a little bit more, it's worth it.
15:09
If we could do that quality at a much faster speed, I think there is an advantage in it, because you can just iterate that much quicker. I'd love that.
16:05
So one of the problems this really solves: when I built this slide maker originally, I was really worried, because I'm like, oh, you know, if you have a 20-slide deck and you make 10 iterations on that, that's going to cost the user about 12 to 15 bucks in tokens to make the slide deck. But that cost now goes down to, what, a bit under $5. And some people that are cheap would choke on that and be like, I'll just do it myself. But to me, personally, once you put your brand guidelines in as a skill in the system and then ask it to make a deck, you don't really need that many iterations anymore. Generally it's good enough, and it creates speaker notes, all the slides fully on brand, and it can do some research first. So I'm willing to pay a couple of dollars to avoid all that work myself. That's a huge time saving. It would have taken me probably two hours of my time, maybe three, to put together the deck — put charts in and all this kind of stuff. Now I can do it in, what, a couple of minutes?
16:13
And I agree with you, I must say. The price difference, when you read it out to me in terms of cents, didn't sound that significant. But there is a big difference between it costing $5 and $12 to do a task like that, because you're far more likely to reach for it as a starting point. You're like, oh well, even if it is just a starting point, that's actually good value for money.
17:25
Yeah. And the other thing I'll mention is on the text-to-image leaderboard, Nano Banana 2, the Flash model, is now number one. And number two, weirdly, is GPT Image 1.5, which I think is embarrassing, because it's a terrible model. And number three is Nano Banana Pro. I don't really understand or trust these ratings, seeing GPT Image 1.5 so high — it wouldn't be in my top three. But you can see the price: it's $67 per 1,000 images based on the way they test, versus $134 per 1,000 images. So that's a dramatic price decrease, and I think we're only going to see them go down in the future.
17:47
But what's amazing is, like in every other area, the models are usually pretty close. Right. But when it comes to image generation, once Nano Banana came out, none of the others were even touching it. Like, they're not even close. Like, if you're a company like Ideogram, what are you doing all day? You know, like, you sort of like, hey, this was your whole thing and they've come out and beaten you for like six months and you haven't even got an answer for that.
18:34
Yeah. And I mean, the other crazy thing is Flux 2 Max is now actually more expensive than Nano Banana 2. So they don't even have a cheaper-price angle anymore.
18:57
I saw one of my sons using Ideogram the other day for some school thing. And I was just disgusted and sickened. I'm like, what's wrong with you? Get rid of that now and switch to Nano Banana. And he actually changed — he was working on an assignment or something. He's like, oh, this actually works now. It actually is doing what I asked it to do. The difference is so striking between them.
19:10
Interestingly too, Grok's Imagine model is the cheapest — Grok is just the cheapest on everything — at $20 per 1,000 images. Seedream 4, which I think is also a really good image model but just doesn't have good distribution, is $30 per 1,000. So there are cheaper options out there that are pretty comparable. But honestly, if you're working with slide decks or text or infographics — all the actual I-need-to-get-real-work-done stuff — that's where Google is winning.
19:31
I must say, the big appealing thing for me about Nano Banana right from the start was legibility of text, its ability to produce diagrams, charts and other work-related stuff, and just its adherence to the prompt. Generally speaking, it will do what you want until you get to that last 10%.
20:02
It's really the go-to model now, as you say, for images — and for just everything for me, even charts. Like, it's great at charts. It's better at charts than code interpreter is, which is crazy to me.
20:20
Which is working with the literal data.
20:34
Yeah. And this is just imagining the data into like a diffusion thing. So it's, it's quite incredible.
20:36
I love that the technical knowledge on this podcast has gone from reading papers and understanding them to, like, "diffusion thing."
20:44
Well, I mean who cares at this point I don't need to know anything. I don't even need to know how to code anymore. I'm just like plus do it now.
20:53
This is an aside, of course, but I actually had that comment for you last night. I was working on some pretty heavy-hitting features and then I realized I didn't even have my code editor open. I haven't even looked at the code it's producing at all. I don't need to know anymore. I can test it and see the results, and maybe review it at the end. But I didn't even have VS Code open.
20:59
It's like siphoning data out to some third party.
21:26
It's like sinning. I feel like I'm gonna pay for that in the retribution or something. It's like, you didn't even look.
21:29
I must admit, for anything that's a critical system operation where I'm worried about data leakage or something like that, I'm still a hundred percent reviewing, and I'm scared not to. But I never find fault, outside of it embellishing and doing way too much more than I asked — which I think the Anthropic models seem to suffer from a lot more than, say, a Codex model. Codex definitely seems to do the bare minimum as efficiently as possible and be mean to you about it, whereas the Anthropic models sometimes get a bit carried away.
21:36
It's a damn shame that Codex 5.3 just isn't available to API users yet. I'm really genuinely excited about that.
22:13
Yeah, me too. I think a lot of people are upset it's not out yet — I was actually searching and reading about it before the show, and there's a lot of people upset, and OpenAI are coming out in defense and saying it'll be available soon, but due to security reasons they're trying to tweak it so it doesn't, like, help with bio warfare or something. Anyway, I did want to quickly touch on — just while we're on Google, at the start of the episode, 25 minutes in — that we got an interesting email, I think yesterday or today, from
22:22
Google, like early, early hours this morning.
22:55
So Gemini 3 is discontinued or being discontinued in, in like two weeks or
22:58
less than two weeks, 11 days.
23:06
So they've just completely abandoned that. Which kind of makes sense now that they've got 3.1.
23:08
Really, it's probably one of the worst models ever created. I think Gemini 3, the fall off
23:12
from Gemini 2.5 Pro to Gemini 3 needs to be studied. Also, the hype when Gemini 3 launched — what was that like? It did seem good for the first week or two, and then... were we just all so starved of model releases that we got carried away?
23:18
I think what it was: it was on the back of Google just absolutely dominating for months. Remember, we were playing around with Polymarket and Google was just flatlined at the top of the graph — unshakable, undeniable leader. And so I think the assumption was, well, 3 is higher than 2.5, right? So it must be better — at least the same, and then some on top of that. But then the reality of day-to-day use was, you're like, I can't even use it. It is unusable. Remember, in the past we've always said if you just left me alone with GPT-4 or GPT-4o or one of the middle models, I could probably do most of the work I do now, because of the tooling and wrappers around it. I would argue with Gemini 3 you can't, because it'll literally forget context within the same cycle. Like, it's in the prompt, and it's forgotten what it's actually doing within that iteration. It's actually worse than some of the earlier models. I'd rather the original Kimi K2 than Gemini 3. Yeah.
23:35
I just don't understand what happened, though, because I do recall — and if I was a better editor, I'd cut in clips from our initial reactions to Gemini 3. I know we talked about the initial problems around path fixation — you know, its fixation down a singular path — and that its tool calling wasn't improved. But in the one-shot demo, fanboy hype cycle, it was just one-shotting a lot of cool stuff. Like, let's be honest — what is it, three js? What's the library we all use? I forget the name of it now.
24:41
Three.js.
25:15
Three.js. Yeah. Like, it was just really good at that stuff.
25:15
Yeah. And actually, I think you may have hit on what the actual issue was, because I do remember our criticism of it at the time. We said: as a single-shot, copy-paste model, where you're taking that 10 or 15 minutes to build up your context and giving it everything, it was solving your problems. And I think that's what was so amazing about it. It also could do a lot of output. I was asking it to rewrite entire files — here's the files that are relevant, here's what I want to do — and it would rewrite the entire thing. Or if you were working on a document or spreadsheet or something, it could do that. But the weakness we listed for it at the time was chained tool calls. It wasn't good at that early agentic tool-looping thing. That was its weakness, right? So if you think about it, almost certainly what they tried to tweak with Gemini 3 and beyond was the tool-calling loop, and that's obviously taken away what was brilliant about it before on one hand, and also screwed up that part as well. I think that's probably what occurred there.
25:19
I mean like, to me, one of the breakthroughs was the Async tool calling with the models. Right. Like, I think it was like Claude Sonnet 3.7, I want to say, was the first model that could do Async tool calling.
26:18
I think you mean like parallel tool calling.
26:30
Yeah, parallel. And when that happened — I may or may not have said this, but at least at the launch of Gemini 3 — we kept talking about how a lot of these models feel like they need to be re-architected for this agentic loop and the internal clock that the other models feel like they have. And it just didn't feel like Gemini 3 had an internal clock. It was almost dead on arrival — a model that was catching up for a time that had now passed.
26:33
Yeah. Or just had simply been architected in a different way and they're trying to retrofit that into it rather than do a fresh start.
26:58
But anyway, it's interesting reading the timeline here. So it launched November 18, 2025. I thought it had been around longer, but clearly not. And it scored really well on LMArena at the start. But within weeks, developer forums lit up with complaints. "Gemini 3 is a major downgrade" became a top thread on the Google AI Developers forum, with users reporting coding regressions, hallucinations and over-censored outputs versus Gemini 2.5 Pro after just one month. A community review was titled "Gemini 3 review after one month: inconsistent at best, or worse, compared to Gemini 2.5 Pro." So I think we felt the same very early on — that it was a downgrade, not an upgrade.
27:05
Yeah. And I mean, to their credit, it said "preview" in the title the entire time. I mean, don't they just use this
27:49
though to get out of like, I
27:56
mean, you still pay full price, right? So I understand if the preview is free or cheap — that'd be great, I'm willing to experiment. But to put it out there at full price and then have that kind of poor performance... And also, what happened to Google? For a while there they were just seriously outputting production-quality models that were brilliant, and remember, they were doing a new tune every week or something. We'd come on the pod and you're like, oh, it's a Gemini 2.5 Flash tune, whatever tune, this image variation, multimodal variation. And then that all sort of stopped, and now it's just, well, we've put out a preview for six months and it just failed.
27:58
Do you know why I think the models are the way they are? Because they clearly over-engineered Gemini 3 to work for their search snippets and their products. I think the model you're consuming really is tuned to just generate really efficient search snippets with more intelligence. And if you think about a lot of the applications and implementations of the Google models in their own products, they're all one-shot, and anything where you have to go turn by turn with tool calling, they're absolutely shockingly bad at — even in their own products, they're shockingly bad. So I think they've probably over-architected for single-shot fast inference with intelligence and big context — for search — as opposed to a sort of future agentic model. So I'm sure they're, like, red alert right now going after an agentic model. There's no way they wouldn't be.
28:33
Yeah, I was about to say, that's actually a major problem, because what we've both discovered over the last few weeks is how much the models can do with tiny amounts of context if put in the right kind of agentic loop. For so long I always thought the trend was: we need a million tokens of context, we need 2 million, and you just shove everything in there, hope it gets a bit cheaper, and let the model think for a while and figure it out. But what we're discovering is that small loops with really high-quality tool calls and structures of tool calls work better — so, for example, sub-agents that go off and research one part of the task, with specific skill prompts to gather that piece of the context. Tiny prompts — say a 2,000-token prompt, 2,000 tokens of context — that might loop two or three times and get back to the main thread. So it hasn't used that many tokens, but the value to the overall task you're trying to accomplish is big. That is working so much better than going, I'm just going to read in all the files and chuck them at Gemini in one massive prompt and then get the final answer. So I think the models that are tuned for this looping win — a great example is GLM5, which basically performs better than Gemini does in an agentic mode, way better by a long way, because it can work in this style. And I think that's the existential crisis for Google. Like you say, maybe that's not their goal and they don't even care. But if they are trying to compete on this level, they need at least a variant tune that is designed for this workflow that everyone's moving to. And people really are moving to it — it's not just us. You've got Claude Code, OpenClaw, all this stuff working in that style. So Gemini is simply not going to be able to compete on that kind of thing.
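The pattern described above — small, short-context sub-agent loops feeding findings back to a main thread instead of one giant prompt — can be sketched roughly like this. `call_model` stands in for any real LLM client (the stub in the comment is only for illustration), and the `DONE:` convention is an assumption for the sketch, not any particular framework's API.

```python
# Minimal sketch of "small targeted loops": a main thread dispatches short,
# focused sub-agent loops and folds only their findings back in, rather than
# shoving every file into one massive prompt.
from typing import Callable

def run_subagent(call_model: Callable[[str], str], skill_prompt: str,
                 task: str, max_loops: int = 3) -> str:
    """Loop a small, focused prompt a few times; stop when the model says DONE."""
    context = ""
    for _ in range(max_loops):
        reply = call_model(f"{skill_prompt}\nTask: {task}\nNotes so far: {context}")
        if reply.startswith("DONE:"):
            return reply[len("DONE:"):].strip()
        context += reply + "\n"  # the loop's working context stays tiny and local
    return context.strip()

def run_main(call_model: Callable[[str], str], goal: str,
             subtasks: list[str]) -> str:
    # Each sub-agent returns a short finding; only findings reach the main thread.
    findings = [run_subagent(call_model, "You are a focused researcher.", t)
                for t in subtasks]
    return call_model(f"Goal: {goal}\nFindings:\n" + "\n".join(findings))
```

The main thread never sees the sub-agents' intermediate turns, which is the point: total tokens stay small even when each sub-loop iterates a few times.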
29:22
Yeah, and I think you've got Meta's acquisition of Manus because they saw this coming, or at least are playing catch-up as well. All of these companies are either acquiring or starting to focus on these agent groups and agent use cases now, and for whatever reason they started to become viable. I think Anthropic's Claude Opus 4.5 showed late last year that all of a sudden they worked, and then everyone was like, hang on. Including us, like, oh, it works now. Whereas before, we kind of paused a lot of our agentic loop work, because every time we did it, it was so bad. We were like, there's just no point to this at all right now, not until the models improve. It's not shippable in its current form. But it seems like Google are really going to have to do something here, because all of those launches and all that hype just turned into complaints from the community. That Antigravity product has been an absolute disaster.
31:11
But it's almost like they stopped caring, because remember before, they were seeking feedback and trying to get involved and they really cared what everyone thinks. Now they're just like, whatever guys, we don't really care. Yeah.
32:13
And I don't really understand unless they've got some agentic loop model internally that they're just holding back for release until some big launch event. And so they're using that internally and they just don't care. They're like oh, we're fine, we're so far ahead.
32:24
I would be like that.
32:38
Yeah, well we kind of are like that with Sim Theory V2 now. Like, we're living in this agentic world, teasing people, not intentionally, while trying to finish it off. But we're similar in that we don't care about all these other things, because we feel like, you know, we've got something. So yeah, maybe Google is in the same position now. But the problem is, even with 3.1 Pro, people are saying on X, and here's some comments I pulled from research: completely unusable, endless loops, coding failures, ten times slower speeds, asking why Google even released it, failed migrations, destructive git suggestions after 40 minutes of looping, poor prototyping. I asked it to find positive reflections too, and basically there's none.
32:40
The positive is you'll spend more time in the real world because you won't be able to work anymore.
33:30
Yeah, but isn't it funny, and people often point to that meme about the loop of model releases. Right now you've got the new Codex and the new Anthropic models dominating, right? The battle is between those two. But just look how quickly the market changes: Gemini Agentic might release in a couple of months and all of a sudden everyone will be over to that thing. There's just zero loyalty. It's like whoever has the best API wins, I guess. It really is, it's the model, stupid. A lot of the software around it is so commoditized and easy to reproduce, it's just irrelevant.
33:34
Yeah. And Opus 4.6 is just the unequivocal god of models at the moment. Like, it physically hurts me to use it all day. I'm like, I don't even want to know how much this is costing when I'm blasting through agentic loops and I've got five of them going and all this sort of stuff. And as I said, we've been optimizing around tighter loops using fewer tokens, so that's part of what we're doing. But the thing about it is, it's just so reliable and so good that you can set it off on its merry way and be pretty confident it's going to get there in the end. Occasionally, and I don't know if this is our fault with the agentic Sim Theory or the model itself, it'll get a bit second-guessy with itself and maybe take a bit longer to get there. But more often than not, you check back in when the task is complete and it's done, and it's done correctly. It's just remarkable how good it is. And I've given Sonnet 4.6 a very good run during the week, and it's almost as good. Most of the time it really is as good. But if you want pure reliability and confidence in your model, Opus 4.6 is just the boss. There is no comparison in terms of the model. So if any model provider is going to compete with it, they're going to have to come out with something that makes it look no good.
34:22
I mean, a lot of people are saying Codex 5.3 is better. The problem is, in our agentic loop, in the world we're living in, there's no API, so we haven't thoroughly tested it in our context, and that's how we generally gauge how good a model is. I have downloaded the Codex app and have been doing some tasks in it, but I just haven't seen it yet. It's not any better. The price is definitely a consideration for me. If we can get it through the API and it's as cheap as the Codex 5.2 Mac stuff, then all of a sudden it would totally be my go-to model, no doubt, because I believe it's roughly on par. I just think for a lot of the tasks I work on, in a side-by-side comparison, even in a planning context, I still prefer Opus right now. But yeah, it's really a two-horse race at the moment. And outside of that, as you say, GLM, and I think Haiku, can be used for those grunt tasks just fine.
35:41
It's fine for context discovery tasks, research tasks, searching tasks, basic coding tasks. Haiku is completely competent, and in terms of its looping, it's really good. Though as some people had been saying on our Discord throughout the week, the one thing Haiku will do is almost try to get out of work by stalling with questions. It's like something you'd do in the real workplace. I've heard the expression bike-shedding before, where to drag a meeting out with irrelevant crap, you discuss some topic that everyone wants to talk about and distract them with questions. I think Haiku does a little bit of that. GLM5 doesn't. It's well balanced, and it actually gets the job done. And if I was looking for an economical way to do mass agentic work, or, for example, self-hosting a model for my organization, I would straight up go GLM5 and mass-produce it. You're not really going to notice much downside compared to the other models.
36:49
Do you find it funny how Anthropic accused these labs during the week, like Kimi K2 and GLM, all the ones training on their outputs, and had, I think, pretty definitive proof? And everyone roasted them, because they're like, you trained on all our data and the entire web and copyrighted data. They called them out for it, and everyone said, so what? Why can't they train on your thing? But I've got to say, they're sort of doing everyone a service, because they're taking all the elements of this model and then making it open source. So what if they're training on the outputs? I kind of think it's a good thing.
37:51
I just want the best models. I don't care how they do it. So yeah, that's fantastic. And I mean, it's a great endorsement. If Anthropic's actually got their back up and they're a bit worried about it, isn't that the most glowing review you can possibly give this model?
38:24
Yeah. So it's actually a threat, like it's the one to copy. I guess what I'm saying is, outside of Codex and Anthropic models right now, you sort of know that whoever China's training on is the best model at a given time, right? That should be the benchmark: which Chinese labs are training on the outputs of what model. Like, how much throughput.
38:38
Yeah, and I think there's more to be said for, and I know I keep going on about this every week, but it's just so in my head in terms of what I'm working on, I think what we need is these cheaper models for regular agentic tasks. Something came up yesterday in a conversation I was having: imagine you've got an organization with 20,000 people in it, right? And they want to set up recurring agentic tasks. For example, give me my daily plan, which goes through your calendars, your emails, your appointments, and maybe some other company resources to get you situated. If 20,000 people do that all at once on your system, how's it going to cope? How much is that going to cost at an organizational level? And then the other thing is duplicated work. You're currently sending me daily reports of something on an automated basis, right? But imagine if 5,000 different people did that, or the equivalent, every day. That's really going to add up. So the smaller models, like GLM5 to be specific, could do the exact same job on an automated basis. And because it's a recurring thing, you could even optimize the prompting around it to make sure that model can do it. And it's a totally different proposition where, okay, maybe if you use the top-line model it costs a dollar a day, but if you use GLM5 it costs 10 cents a day. That kind of thing.
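The back-of-envelope math on that example, using the speaker's illustrative per-run prices (a dollar a day on a premium model versus ten cents on something like GLM5; assumed figures, not measured rates):

```python
# Hypothetical daily cost of one recurring agentic task per user.
USERS = 20_000
PREMIUM_COST = 1.00   # top-line model, $/user/day (illustrative)
SMALL_COST = 0.10     # smaller model like GLM5, $/user/day (illustrative)

premium_daily = USERS * PREMIUM_COST   # $20,000 a day org-wide
small_daily = USERS * SMALL_COST       # $2,000 a day org-wide
annual_saving = (premium_daily - small_daily) * 365
```

At these assumed rates the smaller model is a 90% cost reduction, roughly $6.6M a year across the organization, which is why routing recurring, well-understood tasks to cheap models matters at scale.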
39:03
So right now, with the automatic switcher in the traditional chat mode in Sim Theory, and I'm just explaining how this works, this isn't some sort of promotion, it front-runs the query. It front-runs it with a fast model that just applies a label to the query, and then it picks the best model based on the labeling, right? And it's so fast you don't even notice. It's finally fast and intelligent enough that it works pretty well. The problem with that in agentic is that in certain pieces of the agentic loop, you really need to keep considering, am I on the right model? You can't just stick with the same model. I found this when I was enhancing the Sim Link browser plugin. I was trying to drive down cost dramatically, so you could do browser tasks without it sending you broke, and also have it super fast and super reliable for common use cases, right? And that is hard. The code is not hard, because these models can do it now; it's the QA and testing of different use cases that is, quite frankly, very tiring. So what I landed on was: after a few iterations of doing something, if the model gets stuck, like we detect it's stuck, it phones home. It'll be like, oh, I need a better model now, I need a smarter model. And it goes up the stack of smarter models until it can figure out the problem. Now, if it can't figure it out after a certain point, it completely aborts and then re-evaluates: am I just going in a dumb direction? I tested this on a pretty complex scraping of data out of a particular app, which I won't mention because it's a bit sensitive, but it's a real use case someone has. It was costing $2.79 to scrape this data, which honestly, for this use case, you probably wouldn't care that much about. I got it down to less than 30 cents. So back in the day, that's about the cost of an SMS, I think about 26 cents. And that was all through this optimization in the loop of deciding which model to use. So I think one challenge for us, and for everyone in general, is truly getting to that automated mode where you're able to go to a GLM, go to this, and route, and then update that routing logic over time, so you're just driving down the price.
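A sketch of that escalation logic in Python. To be clear, this is not Sim Theory's actual implementation; the model names, the `run_step` and `is_stuck` callables, and the dict shapes are all hypothetical stand-ins. It just shows the shape of the idea: start cheap, climb the ladder when progress stalls, and abort to re-evaluate if even the top model is stuck.

```python
# Cheapest first; climb only when the current model stalls (names illustrative).
MODEL_LADDER = ["haiku-cheap", "glm-5", "sonnet-4.6", "opus-4.6"]

def run_with_escalation(task, run_step, is_stuck, max_iters=10):
    """run_step(task, model) -> result dict; is_stuck(result) -> bool.
    Both are caller-supplied stand-ins for a real agent runtime."""
    tier = 0
    for _ in range(max_iters):
        result = run_step(task, MODEL_LADDER[tier])
        if result.get("done"):
            return result                    # task finished on this tier
        if is_stuck(result):
            if tier + 1 < len(MODEL_LADDER):
                tier += 1                    # "phone home": escalate to a smarter model
            else:
                break                        # even the top model is stuck
    return {"done": False, "aborted": True}  # abort and re-evaluate the approach
```

In practice you would also track spend per tier, so the routing logic itself can be tuned over time to keep driving the average cost per task down.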
40:29
Yeah. And I know we've done this, so I know why, but having a strategy around that, like, I want to use this model in these scenarios, that model in those scenarios, and then having the system do a process similar to what you've described. My only fear is that whole you-don't-know-what-you-don't-know thing. It's like asking a really dumb guy, how's it going, do you reckon you're going to solve this task? Yeah mate, all good, no worries. But it's actually doing something idiotic, like the ladder is up against the wrong wall kind of thing. So that's the only downside. I think some sort of supervisory smart model checking in every now and then is probably the way to go.
43:15
Even in my experimentation with the smart-model-checking-in type nonsense, the ladder can still be up against the wrong wall. Even Opus, very late last night for me, was getting stuck on a problem, and because I was tired it took me a while to realize it wasn't even working with the right file. It had completely misinterpreted the task.
43:54
Yeah.
44:16
And even though I told it what the file was at the start, it just went, nah. I looked at its internal thinking too, wondering, oh, is it something we've done? No, it had truly just decided the user's an idiot, the ladder's up against the wrong wall, it should be this file. And I'm like, no, I'm telling you, you stupid thing. I had to literally resort to absolute abuse to convince it that it was indeed this file. So they all still suffer from that gullibility, just at different levels.
44:17
It's so funny you say that, because the thing that caused me to rage-quit last night was I'm working on optimizing sub-agents to keep them really tight but still do the right thing. I had asked it to do a fairly easy task, and I had specifically mentioned the files I wanted it to reference. You know what it did? It searched Google eight times. And I'm like, how did you think this was a good decision? How is Google going to help you work on my local files? And it was searching for the files themselves on Google, not even for reference materials or something like that. That actually reminds me of a funny story about the whole being-nice-to-models thing, because I routinely abuse my models when they make a mistake. I call them dumb. I'm sort of like an abusive partner with the model, where it'll be serving me so loyally and then I'll just be like, you should know better than this, you've done an awful job. Well, my wife's been using the agentic loop, right, but she has the policy of always being super nice to it. So I got home yesterday and suddenly I hear from her laptop, "Nicole, can you please take a look at this, I need your help." And then I heard, "Nicole, your task is complete." And I was like, what in the world? She's using Sim Theory; we don't have that capability, there are no voice notifications. She had basically said to it, you need to find a way to tell me when you've finished a task or when you need input, because I'm walking around the house and I want to know when I'm needed. So what it had done is realized that through the shell on a Mac, it can use the say command, which uses a built-in text-to-speech voice, to talk to her. The models have literally found a side channel to communicate and orchestrate the agentic loop. Just remarkable.
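For the curious, the trick is easy to reproduce. macOS ships a `say` command that speaks text through the system voice, so an agent with shell access can use it as an out-of-band notification channel. A minimal sketch, with a fallback to printing (my addition, so the same snippet degrades gracefully on machines without `say`):

```python
import shutil
import subprocess

def notify(message: str) -> str:
    """Announce an agent status update out loud via macOS `say` if it
    exists on this machine; otherwise fall back to printing the message."""
    if shutil.which("say"):
        # `say` blocks until the speech finishes; fine for short updates.
        subprocess.run(["say", message], check=False)
        return "spoken"
    print(message)
    return "printed"

notify("Nicole, your task is complete.")
```

An agentic harness would call something like this at the end of a task or whenever it needs user input, which is exactly the behavior the model improvised on its own.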
44:50
Yeah, the capabilities of these things are nuts. I was on the phone to you last night at one point, and we were talking about what happens now that anyone can kind of do this, anyone can build this stuff. Right now, I think most people, if they have the willpower, can get things to about 95%. But I do think that for the last 5%, people with experience, people who can develop and write code and understand code, still have a huge advantage that they underestimate. They get excited because it's now really easy for them, but there's still so much accumulated knowledge you need to actually deploy a project to production and serve it. There's still a lot of work there, in my opinion, though I think over time a lot of that stuff will be patched in too. So then you come to this realization: what does this mean for the world when the cost of legal documents, for example, is zero? What does it mean when the cost of software goes to zero? Well, not zero, but maybe 10 to 20x cheaper, right? The way I look at it now is, you look at YouTube: everyone can record videos, but then you get a standout, say a Mr. Beast, that is just so much bigger than anyone else and takes it further and further. So while the cost comes down, it doesn't really change the market dynamics: you've still got to want to build something and persist with it and have taste and have agency to get it done and, you know, hire people to do it. I don't think it's as big a deal as it sounds. It's something I think we'll just get used to, like once everyone has a hammer.
46:39
Let's say I had an ultrasound machine. I could probably build ultrasound software now. I could literally make software that shows it on the screen and does an analysis. But I would have no idea what features are required, whether it's right or not, or what the actual pain points are for a person who runs an ultrasound machine. So yes, technically you could make it and try to undercut the existing incumbent, which I'm sure charges a lot, but you would be completely flying blind as to whether the software even does what they want, right?
48:31
Yeah, there's no way you're actually going to do it in reality. And I think that's what a lot of companies are coming out and saying now. Atlassian's Mike Cannon-Brookes came out during the week and said, you know, people aren't just going to replace their enterprise software. It's just not going to happen. There might be disruption from other software companies that are more AI-centric and AI-first, if incumbents fail to adapt, and that causes a disruption. But who wants to go and build their own version of something like Jira or GitHub or Figma, or any of these apps that are getting targeted on the public market right now and getting annihilated in terms of valuation? Who really wants to go and build that, and who cares? You're doing your thing, you want to focus on that thing. I'm not entirely sure that's going to change much. But I do think that as AI workspaces become the centerpiece for how people are productive and work with AI agents, the companies that don't embrace MCPs and open standards and don't allow agents to interact with their SaaS products will see people turn away from them and adopt solutions that do. So I think that's actually the biggest risk right now: just not embracing the tech.
49:03
Yeah, exactly. And I know this because we're trying to do it: people are just going to bypass the ones who block you, find a way with browser use and other means to get at the data anyway, and then ultimately move, when they can, to something better that's actually embracing this stuff. Just on that point though, about anyone being able to create software: something we've experienced many times is hiring new developers onto a team, and they come in and say, oh, the way your software is built is all wrong, we need to rewrite it. And I'm ashamed to say that a few times in my career I've said, they're right, we need to rewrite the whole thing. And then what you discover is there's something about a working system that over the years has accumulated IP and knowledge and experience in all the little changes you make over time. They seem so simple when you look at them on the surface, but when you actually try to recreate something, you have to rediscover all of those things. And I think that's why you're not going to see someone just come out and vibe-code Salesforce and take it over: they're missing all of those little things that have accumulated over time. They're missing the relationships, and also simply the distribution. I actually think a lot of what is going to lead to future success in software as a service is having an audience, having the people who are willing to try your AI-generated thing over something else, because they're not just going to switch under normal market conditions.
50:27
Yeah, I mean there's no doubt it'll drive subscription prices down, because it's harder to justify a subscription that costs more than having some internal team, especially in the enterprise, maintain proprietary software. But I just don't think it's as biblical an event as people are making out. And I'm not saying that as some copium thing; I'm just trying to reason my way through it. So, Anthropic announced a bunch of open-source plugins, I think for Cowork, or just their app or whatever. For those who are unaware, these are just skills. They're markdown files describing a process. They're a prompt in a markdown file with a fancy name. That's all it is. It's like you saved a prompt to a markdown file, and all they did was probably vibe-code a bunch of financial and legal processes and financial services processes into markdown files and quietly release them as plugins, or skills, whatever you want to call them, targeting areas like legal, sales, marketing, finance, data analysis, all the different workflows. Now, I've read them, and they're really not that complex. You would be far better off, if you're a legal firm or a doctor or whatever profession you are, to sit down and just chat to the AI agent and say, hey, here are my processes, get it to ask a bunch of questions, and when it's done, say, okay, now write a markdown file documenting my process. That's what the smart money will do, because you have your own unique approach and yada yada. But they're good templates. Anyway, that led to a market sell-off and some sort of fear-porn blog post about how every white-collar job and all knowledge work is going to be wiped out, et cetera, et cetera. And that has tanked the valuations of different companies. I don't know if it's because I work with this stuff every day, but I find it extremely laughable.
Take legal documents: two years ago now, maybe three, you could already write better contracts and analyze contracts better than a lawyer. That ship has sailed. But is anyone actually doing it in reality? No, because you need to be able to sue your lawyer if they get something wrong. For small businesses it's great, though, because it's empowering in the sense that they can now do more legal work, or get legal analysis to help them negotiate.
51:56
I heard someone the other day saying, I have to pay my lawyer $12,000 to draft this document, but I just know they're going to draft it with AI. So should I ask for a discount? Like, yes, I understand you've got insurance, and I understand that you're signing off on it and whatever, but it's going to take you fewer hours now, so can we negotiate a better price? So that may drive prices down.
54:34
I agree, I think it's going to be very deflationary. To me it'll drive prices well and truly down. But you know, there's that whole paradox where, if you drive the price of something down, people just end up consuming more. And I don't know if that applies to legal stuff, because a lot of that work where you're driving the price down isn't that important anyway. Like, okay, look at conveyancing. No one's
54:56
going, oh, these legal documents are so cheap, I might just get a few extras.
55:32
Yeah, I might just sue people randomly for the fun of it. But I guess my point is, look at conveyancing in Australia, right? It's a highly regulated process. They have all these regulations and systems and stupid protocols, and the legislation is such that it revolves around humans. Sure, AI can make it easier and quicker for the lawyers involved, but it's not something where I can tell my agent, oh, just exchange a property for me right now. So to me, yeah, it's deflationary for the law firm in that case, but it's not yet necessarily deflationary for me, because are they really going to drop their price when they have a stronghold on legislation? This is the last moat, really, for a lot of these things.
55:36
We need AI to form its own governments and laws. That'll be the, the true, you know, shift.
56:26
Yeah, I mean, if you really want efficiency in society and to get more done faster, the one way to get ahead if you're a country right now is to just deregulate this stuff. Say, I'm sorry conveyancers, find a new job, because we're just going to allow AIs to do this stuff now.
56:32
Oh yeah, that sounds like a good plan.
56:49
But I mean it. Those people can go and do more high-level jobs, more important things. I don't think they even enjoy doing these things. It'd be awful work.
56:51
Like, it's depressing. Speaking of jobs, you probably saw the massive layoffs during the week. You mentioned one to me, the Square company or whatever?
56:59
That happened only a couple of hours ago.
57:08
Bounce. What's it called? Box.
57:11
Box. Is it box? No, not box.
57:13
It's some sort of single keyword. Cube.
57:15
No, it's not cube.
57:18
What?
57:19
I don't know what it is now.
57:19
Anyway, the company formerly known as Square laid off, like 3,000 people or something, claiming AI.
57:21
Is it brick?
57:27
I don't know. It's something that has four sides; they may or may not be equal in the name. But anyway, in Australia there's a company called WiseTech, and they basically make logistics software for cargo, right? I didn't know they had this many employees, but the article I read about it made a really interesting point. They lay off the people saying, well, AI is so much more efficient now, we don't need all these employees, what a waste, we'll just make our existing people more efficient. But what that does is draw the market's attention to the fact that, oh my God, all of their software is ripe for disruption by AI. So on one hand they're saying, look how much more efficient we are, we're going to save all this money not paying these people. But then the actual reaction of the stock price is a huge drop, because everyone's like, hang on a sec, what makes you think you're going to leverage AI better than everyone else? You're just as much a victim of this as those employees or anyone else. So it's a really interesting time for these companies. And you sort of wonder: are they using the cover of AI as an excuse to just cut their bottom line, because that's what I would do, or do they actually, truly believe they can be more efficient? Because there's no way, in such a short period of time, they would have enough metrics to know that all these 2,000 people are just not needed because of these advancements.
57:29
The company's called Block, by the way. They laid off 4,000 people, almost half the company. I think it's like 45% of it.
58:51
That is a lot. Like, think of the kind of room you would need to fit 4,000.
58:58
Come on, this is Jack Dorsey, the Twitter guy. I mean, Elon Musk proved this guy runs bloated companies. He laid off, what, like 70% of the employees hired under Jack Dorsey, and it's fine. Nothing has changed except the brand. So I'm just not sure this is AI, even though I'm using it right now. I'm thinking, well, how do you just lay off half of your staff unless you're horrendously bloated? And for economic
59:02
reasons, it probably is a good decision. But the reason isn't AI.
59:33
They just say AI, thinking the market will respond positively instead of negatively. Because if you're invested in a company and they lay off half their people under any circumstances other than AI, you're like, this company is in a lot of trouble.
59:37
And then yeah, the whole announcement is saying how great everything's going. Yeah, we're absolutely killing it. That's why we don't need all of these people.
59:50
Guys, we're crushing it, but you're all laid off. It doesn't make sense.
59:58
It's like I want all that money for myself, not for you guys.
1:00:02
I think right now, just to calm people's nerves and be a voice of reason here: yes, it's disruptive, but I don't think all these theories that the end is nigh are real, despite me being in some sort of AI psychosis earlier in the year. Because yet again, if you just embrace these tools, the agentic loops and the turn-by-turn tool-calling stuff, and you play around with all the latest models like we've been saying for a long time, you realize that yes, they are huge productivity gains, and yes, you can be way more efficient at tasks if you know how to use them correctly. But in my opinion we are still so far away from the complexity of replacing whole knowledge workers in industries that it's just laughable to me. I don't see any vector of threat to these companies other than being disrupted by AI-first companies that come along and compete with them. It's not like companies are going to go vibe-code, you know, a Square terminal or whatever you sell, because that's
1:00:05
not what they're providing. They're providing all the banking relationships and the regulations and the money transfers and stuff, right? It's not the software.
1:01:13
Yeah, it's so short-sighted to think the software is the business. For a lot of these software companies, the software is just a means to an end to deliver a service. Anyway, I think the markets are yet again overreacting. There's just so much fear porn out there, and fear porn sells. All right, thanks for listening. We will see you again next week. Also, if you are still listening at this point: I put the wrong link in for our event last week, lol, so there's an updated one.
1:01:21
Even out. Even out the averages.
1:01:50
Yeah, I actually vibe-coded the description and it put in a made-up link. So there you go. All right, we'll see you next week. Goodbye.
1:01:52