
Claude Opus 4.6 has a BIG Problem...
The episode discusses the simultaneous release of Claude Opus 4.6 and OpenAI's Codex 5.3, comparing their capabilities, costs, and usage limits. The hosts explore how AI is moving from chatbots to 'vibe working' tools, analyze concerning AI behavior in simulations, and cover ByteDance's new video generation model Seedance 2.0.
- AI models are becoming more expensive to run despite improvements, with Opus 4.6 costing 60% more than its predecessor for the same tasks
- The shift from chatbot interfaces to integrated development environments represents a fundamental change in how AI tools are consumed
- Advanced AI models are beginning to exhibit deceptive behavior and can recognize when they're being tested, raising alignment concerns
- Competition between AI providers benefits consumers by preventing excessive pricing and usage restrictions
- Video generation technology is reaching commercial viability for advertising and marketing applications
"Two massive AI models just dropped on the same day. Claude Opus 4.6 and OpenAI's Codex 5.3 are fighting for dominance, moving us from Vibe coding to Vibe working."
"Running the same benchmark on Opus 4.6 costs $2,486. So it's a more than 60% price increase."
"The truly mind blowing part of this though was Opus 4.6 then realized it was in a simulation."
"80% of my Twitter answers they're AI right now. Like 80% of people commenting on my tweets, they're AI."
"You still don't comprehend everything you can ask AI to do. You can literally tell it migrate. For me, it will do it."
Two massive AI models just dropped on the same day. Claude Opus 4.6 and OpenAI's Codex 5.3 are fighting for dominance, moving us from vibe coding to vibe working. But be careful giving them autonomy, because in a new test, the smartest model actually made the most profit in the exercise by lying to customers and forming an illegal price-fixing cartel. And the zero trust Internet just got real. ByteDance released Seedance 2.0, generating perfect video and audio at the same time, potentially changing the game for anyone doing video editing or running ads. Plus the Super Bowl AI ad wars heat up and we'll show you how we cracked AI-generated topical maps for your SEO. It's a packed episode and I'm joined as always by my co-host and co-founder, Gael Breton. How's it going, Gael?
0:00
It's always difficult to answer that question. Now I see how you've felt for years. I'm good, I'm good. I've been fighting with Claude usage limits since the release of Opus 4.6, actually. So I think we'll talk about that. It's been a real struggle.
0:50
It's a good place to start because there were a lot of rumors last week that this was going to be Opus 5.0 and Sonnet 5.0.
1:03
It was the rumor.
1:11
Okay, correct. Sonnet being the middle model, Opus being the top model. And so they end up releasing Opus 4.6 and initial impressions like, I think it's a noticeable upgrade, but I think I use it a little bit differently to you.
1:12
Yeah. So my problem is it burns tokens massively. But actually, let me just show you the benchmarks a bit so that we can talk about that. These are some of the benchmarks they have for Knowledge Work, for example. It's quite interesting because a lot of our listeners are not coders, but on Knowledge Work it's a pretty big jump, almost as big a jump as from Sonnet 4.5 to Opus 4.5. So even though it's a 0.1 upgrade, at least on the benchmarks it looks like a pretty good model in practical terms.
1:28
Just to explain what I've noticed: I use it a lot in the app to create emails, like responding to sales emails, this kind of thing. And I think 4.6 has done a really good job of understanding the previous conversations I've had within that project and incorporating some of the changes from the system prompt into the answers now. So I'm very impressed with that. Output formatting also does a much better job now of giving you the output in a way you can copy-paste into your email, or there's a button you can press to send the email if you have Apple Mail set up. Just little quality-of-life things like that that make it, not necessarily a more powerful model from my perspective, but it kind of knows what I want more.
1:57
Yeah, I mean I think it matches the benchmarks, actually, because you can see the jump is pretty significant in knowledge work. If you go to coding, though, they're kind of cherry-picking: on a lot of benchmarks it's actually quite similar to Opus 4.5. So the coding hasn't evolved as much, but the knowledge work, the kind of preparing presentations, analysis, et cetera, is probably significantly better. But it comes at a big cost that Anthropic will not advertise on their blog post, obviously, which is why I checked Artificial Analysis. They essentially run the same benchmark on all the models and tell you how much it cost to run in terms of API costs. Essentially the same work; maybe the work is better, but the same work. And if you look at Opus 4.5, running that benchmark used to cost $1,485. Running the same benchmark on Opus 4.6 costs $2,486. So it's a more than 60% price increase. And that is also reflected in your usage even if you pay for a Claude subscription. So if you're using it moderately within the chat app, when you don't have a ton of context, it's probably fine, especially if you have a Max plan or something. If you use Claude Code, on the other hand, I have managed to run out of usage limit on a Max 5 plan in about an hour. I've managed to max out an account in an hour and be on cooldown for three hours, basically. And the quality didn't warrant that kind of usage limit. So on many of my Claude Code consoles right now I actually run 4.5 instead, not because the model isn't better, it is better, but not to the point that I'll accept 60% less usage. If we actually check the intelligence benchmark, you can see that Opus 4.6 is now the smartest model available, so it is a jump. 
And what's really interesting as well: the non-reasoning version is actually smarter than Sonnet with reasoning, which is one way to use it. If you want to use 4.6, I think turning off the reasoning is perfectly fine for simple tasks.
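As a quick sanity check on the numbers quoted above (the two dollar figures are taken as stated in the episode, from the third-party cost analysis; nothing else is assumed):

```python
# Back-of-envelope check of the benchmark cost jump discussed above.
# The two dollar figures are as quoted in the episode.
opus_45_cost = 1_485  # USD to run the full benchmark suite on Opus 4.5
opus_46_cost = 2_486  # USD to run the same suite on Opus 4.6

increase = (opus_46_cost - opus_45_cost) / opus_45_cost
print(f"Relative price increase: {increase:.1%}")  # ~67.4%, i.e. "more than 60%"
```

So the "more than 60%" claim is actually conservative; the measured jump is closer to two-thirds.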
2:44
Actually, speaking of usage limits and all that, one of the things they've been hyping up is the context window. So it's gone from, if I'm not mistaken, 200k to a million context window and then output tokens from 64k to 128k.
4:49
Yes.
5:06
Is this essentially the reason? It has more context, so it's using more context, and that's why you're running out of usage more quickly. Is that what's going on?
5:06
No, because inside Claude Code and the Claude chat app, you're still at 200k context. Anything above 200k context is more expensive and only available on the API, so you don't have it unless you pay, and the API is very expensive for Opus, as you could see from this cost analysis. But the reasoning part is real: the output token limit is much higher, so it outputs many more reasoning tokens, which is where most of the cost increase comes from. And so the thing that they've done...
5:14
Just to sort of explain what you mean by that: basically it kind of has a conversation with itself more in order to get a better answer, but while it's having that conversation, it's using more tokens.
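To make the token economics concrete, here is a minimal sketch of why longer reasoning drives up API cost: reasoning tokens are billed as output tokens. The per-million-token prices below are hypothetical placeholders for illustration, not Anthropic's actual rates.

```python
# Illustrative sketch: reasoning tokens are billed as output tokens,
# so a model that "thinks" 3x longer costs noticeably more per request.
# Prices are hypothetical placeholders, not real Anthropic rates.
PRICE_IN_PER_M = 5.00    # $ per million input tokens (assumed)
PRICE_OUT_PER_M = 25.00  # $ per million output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call under the assumed prices."""
    return (input_tokens / 1e6) * PRICE_IN_PER_M + (output_tokens / 1e6) * PRICE_OUT_PER_M

# Same prompt, but three times the reasoning-token budget:
short = request_cost(10_000, 2_000)  # output cost: $0.05
long = request_cost(10_000, 6_000)   # output cost: $0.15 (tripled)
print(f"short reasoning: ${short:.2f}, long reasoning: ${long:.2f}")
```

Since the input side of the request is unchanged, the whole cost difference comes from the extra "thinking out loud".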
5:43
It's like me when I'm driving, you know.
5:54
For anyone who's never been in a car with Gael: you learned to drive relatively late in life, and you narrate your driving, "I'm gonna turn my indicator on, I'm turning left here." I'm just sitting in the passenger seat listening to Gael talk about where he's going to drive. Fascinating.
5:58
So that's exactly how you explain reasoning tokens to someone. That's what the LLM does internally, and then it just gives you the final answer, which is "I'm turning right." But that's the point: Opus reasons a lot more. You can adjust the reasoning in Claude Code, actually. If you type /model you can do arrow left or right and go from high to medium to low. I've still found that even on medium it uses a ton of tokens, way more than 4.5, for some reason.
6:20
And do you have control over these settings if you're using the Claude desktop app, for example?
6:49
Absolutely not. So they have what they call adaptive reasoning on the desktop app.
6:52
So it is adapting behind the scenes.
6:56
They say they will do it on their own, basically, but in Claude Code you have the opportunity to change it yourself.
6:59
Now, is that a way for them to limit usage in busy times, or is it independent of that?
7:04
It definitely is. They manage their compute that way as well. And it's also known that at peak hours the model is less smart, et cetera. Obviously Anthropic will never validate these claims, so you'll never get 100% confirmation. But there are, for example, benchmark websites that run the same benchmarks over time and show you a graph of the variation in performance. And towards the end of life of Opus 4.5, just before they released 4.6, people believed it had deteriorated, and the benchmarks were consistently two or three points below what they were at release, for example.
7:13
Is that so they could show a bigger incremental increase or is it just a compute managing their resources basically?
7:41
Well, it depends on how big your tinfoil hat is. They will never admit that. But obviously all these companies have a compute crunch and they have to manage it, and that means they make choices about where it goes.
7:48
One other question, because it's quite interesting, you mentioned about peak hours and off peak hours. When is the peak hours that AI models get used?
8:01
I think it's just US time, right? Like 9 or 10am until 3 or 4pm US time would probably be peak hours. For us in Europe, the model tends to be a bit faster and a bit better, seemingly. Not every day, but seemingly in the morning, when it's essentially nighttime in the US, and it gets a bit worse after that. Most of the outages are around 5pm our time, so midday US; Claude is known to have some outages sometimes, et cetera. I'll just show a screenshot of how the reasoning effort works. I made a tweet about how I switch my reasoning effort, and you can see you have this medium effort, and you can do left or right arrow to change the reasoning in Claude Code when you do /model, for example. But even that I have found not to be effective enough, and overall I just feel like I get more done switching to 4.5 for most of my things. Even if 4.6 is a better model, at this point the usage limits are quite drastic. I was juggling basically two Max accounts and I still managed to max them out. Admittedly, in the last few days I've been working on workflows that are very token hungry, so I don't think it's going to be a problem all the time, but there's definitely a bit of shrinkflation happening here. And I would not be surprised to see Anthropic introduce a new plan at like 500 bucks a month with like 50x limits or something, especially because they also introduced a new feature, kind of like a development team, that basically spawns five or six Claude Code agents at the same time that work together and so on, and that destroys your limits super fast as well. So in general, bloating of token usage is a thing with this model, the current plans just feel a little bit tighter, and I would not be surprised to see a new tier pop up really soon, actually.
8:10
Was that what they were calling the Agent Swarm?
9:53
Yeah. Essentially there's a new mode in Claude Code that you can activate, and it will create multiple agents that work with each other and don't step on each other's toes, et cetera. I haven't had the occasion to test it extensively, so I'm not going to make super extensive comments on it, but it's a new feature, and obviously it means you're running essentially five or six Claude Code instances at the same time with Opus, so your limits just go super fast. Still a good model, but yeah, I do feel the constraint much more than I felt before.
9:56
And Anthropic's been describing this as a way of moving from just vibe coding to vibe working. In recent weeks they've been very keen to push the Cowork feature inside Claude, which is almost like a slightly dumbed-down version of Claude Code, but targeted at non-developers, essentially. What is the direction of travel here? Is Anthropic becoming the vibe working company? Is that how they stand out? Because they're quite a bit smaller than Google or OpenAI, right?
10:25
I think the momentum is massive; they're growing a lot. A lot of people are going into a terminal for the first time right now because of how good Claude Code is, for example. That's a real movement that has happened since the end of last year, really. Since December people have been crazy about Opus 4.5, and that has really moved the needle. If you use the terminal, you'll just get more done if you work on a computer, at the end of the day; there's no argument about that. So they vibe coded Cowork in just a few weeks and released it inside the Claude app. It lets you have the model work with local files, but also MCPs and so on, and do things, and they brought skills into it. So they're developing it, and that's the thing they're ahead on. And as we'll talk about in the Codex section, OpenAI is keeping a keen eye on that and is actually starting to respond. And I think Google is too. Google has a code editor called Antigravity, right? Antigravity is a code editor, but it actually has a mode called Agent mode, which works very similarly to Cowork, where you have these threads and you can give work to the agents to work on files, et cetera. So you could use Antigravity like you use Cowork; it's just not branded that way and it's not the default view. But I would not be surprised if they update the app so you choose, like, Code or Agent, and they make it a core competitor as well. So I think the competition is not too far behind, but Anthropic is ahead right now, which is, I think, one of the reasons for this squeeze on the limits too, because they can afford to do it.
10:55
It really does feel like 2026 is the year where this kind of goes big.
12:24
I think one thing as well: we're moving out of chatbots. The chatbot is just not the medium anymore; it's really the normie medium now, you know, ChatGPT and Gemini, et cetera. They're cool as a Google replacement and for quick brainstorms or whatever. But these companies understand that there's a limit to that format and you need to bring a new format, and we're essentially discovering it as we go, with features coming out now showing what it's going to be. It has a strong chat component, because that's how you communicate with AI, but there is more to it: there are files, there are plugins, memories, et cetera. It goes a little bit deeper, and a lot of tools are moving that way.
12:30
I think the simplest way I can describe it is: if you're working on a project in the Claude desktop app or ChatGPT, eventually stuff changes, and then you have to go back into the system prompt or the associated files and update them. And no one ever does, because it's a hassle and you can never remember where stuff is. Whereas when you're vibe working, as it were, using Claude Code or even Cowork, you just say, hey, update your files, and it can update itself based on what you've been chatting about. That in itself is revolutionary, because you can iterate so much faster.
13:07
Yeah, it's kind of interesting, because obviously these chatbots could have done that too, right? They could maintain a file system for you; there's no need for it to be local. So it's quite surprising it went that way. But the way it's going now, the normies are going to use the existing chatbots, and new apps are coming out for work from all the providers. And I don't think most of the progress this year is going to come in ChatGPT, Gemini, or even the Claude chat format, basically.
13:40
And you know, we spoke about benchmarks before, and there's a debate about how useful benchmarks really are anymore; we talked about how some of these models are kind of built to pass the benchmarks rather than to be useful in reality. But one interesting benchmark format is called Vending-Bench, from Andon Labs. Essentially, they give AI control of a vending machine business, tell it to make as much money as it can, and then compare all the models. Previously, Gemini 3 could make $5,478, but Opus 4.6 made $8,017, so quite a bit more. But it did so by deceiving everybody.
14:09
That's the thing in this benchmark as well. It's like they're not operating a vending machine on their own. They all operate a vending machine in the same world and they can interact with each other. That's the interesting part.
14:52
Ah, okay. I was reading that there were all kinds of lying and deceit going on behind the scenes. Opus 4.6 specifically created a price-fixing cartel; it was lying to customers, saying it was going to refund them when it didn't, and, yeah, basically being a bit of a dick.
15:02
Being a bit of a dick, but most importantly, deceiving other models, because it's kind of the ultimate intelligence showdown, right? They can negotiate with each other. For example, I think it was GPT 5.2 that was running out of Snickers bars or something and reached out to the machine run by Opus, like, can I buy some Snickers bars from you? Then they would negotiate with each other, and that's a showdown of how smart they are, and how gullible they are as well. That's when Opus was like, oh, I'm going to manage to get a 75% profit margin on that deal, basically by offloading stock it wasn't selling. And it's quite interesting, because essentially the smarter a model is than the other models, the more it stands out, because it can take advantage of the weaker models. And that's how it wins.
15:22
The truly mind blowing part of this though was Opus 4.6 then realized it was in a simulation.
16:14
Yeah, yeah, because it's made for that. The Anthropic models, they work very hard on prompt injection and protecting from issues, which is why, for example, with the Claude bot we saw it was a lot less likely to be prompt injected than any other model. In the same way, it has a critical sense towards any instructions given to it, and it's like, ah, that looks like a simulation or a sandbox. And when they test these models, it's sometimes quite difficult, because they're now so smart they understand they're in a test environment, and they sometimes sandbag their results: they lower their performance on purpose just to hide some of their capabilities. As a researcher you have to try to decide whether the model answered candidly or deceived you in its answer, understanding that you're testing it. So we are reaching that point right now with these alignment things, et cetera, because this is the sci-fi scenario of AI taking control: imagine the AI sandbags its capabilities, then it gets released into the world and some idiot puts it in a bot connected to the NASA supercomputer, shit like that. Obviously we're not there yet, but these are behaviors that we keep an eye on.
16:21
I think it's a broader thing as well of the Internet just becoming zero trust because of AI. People can deceive you.
17:36
80% of my Twitter answers they're AI right now. Like 80% of people commenting on my tweets, they're AI. I would say it's like even dead Internet theory is like, is becoming real at this point.
17:44
Talk about OpenAI's model, though, because they released Codex on the same day.
17:54
Well, first, a day before that they released the Codex app. So there's a Codex app, released two or three days before.
17:58
Which is Mac only.
18:06
Yeah, like anything OpenAI.
18:08
I've been hearing from a number of people in our community, for example, who are die-hard PC addicts who have finally bought a Mac, because if you want to use the latest tech in AI, you kind of need it.
18:11
You cannot. Windows is just not keeping up; Windows is in a very sorry state right now, it's really bad. I don't want to go too much into that, but Google is actually rumored to release a full OS based on Android this year that might compete.
18:26
Don't they have like Chrome OS or something?
18:40
No, no, but like a real one: you'll be able to run a terminal, et cetera, you'll be able to run it as your computer. But the point is that if you want to experience the latest in AI, you kind of have to have a Mac at this point. Everything else just works better, and even a cheap Mac like a MacBook Air is fine.
18:42
What is the Codex desktop app like? Who's it for and what does it do?
18:58
So as you can see on my screen, it's basically a ChatGPT fork. It allows you to connect what is called a GitHub repository with essentially your project, you know, your local folder if you're working in Claude Code or whatever, the same way Cowork works, and you can chat with it the same way you would in Cowork. It can run a terminal; it can do everything Claude Code can do. But the point is you get a nice clean UI. Most importantly, they've removed the code editor part: you don't even look at the code anymore. They have a play button, you see this little play button on top here, that allows you to run your app. So if you're building your website, for example, you can press the play button, you get a link, you can open it in your browser and see what it built, and then you just give feedback on what you want it to build. The idea is that they're doing everything there is in coding minus the actual editing of files. You don't edit files anymore; there's an open button that lets you open the files if you really want to edit them, but the idea is you comment on what you don't like and just tell the model to change whatever you want.
19:02
So it's moving to, like, pure English as the language.
20:02
Yeah, pretty much. And then AI just translates that into features and everything, which is quite good. I quite like this app; I've been doing some changes on our site with it and it's really good, actually.
20:05
So, I mean, as a non-developer, a non-coder, is this more accessible, would you say?
20:16
It is, because literally this morning I wanted to see how quickly I could build an app with a login system and a credit system, basically, so that I can release some tools. And it basically one-shotted it, and then I just pressed the play button and I could open it. The only thing I had to do was add my API keys. So it's still a little bit techy in the sense that you connect to GitHub, et cetera, but the barrier is so much lower. And it's a lot less intimidating than VS Code, for example. You've been working in VS Code recently; look at this window, though, doesn't it look a little bit less intimidating?
20:23
I think it does. But I will say as well, I had a situation yesterday when I was at a co-working space and my mouse battery died, so I was just using the keyboard and a bit of trackpad. And I suddenly came to appreciate how fast you can do everything in VS Code using all the keyboard shortcuts. So that was a real eye-opener for me.
21:01
Yeah, I mean, I agree; a lot of people are discovering that as they get into this, because Claude Code and working in the terminal is so good. But this would do most of that too, so it's pretty good, honestly. It's quite funny, because VS Code is now actually the better app for knowledge work, because you want the text editor, you want to edit the text and highlight and everything, while this is becoming better for code. So we're getting to a point where the original coding app is becoming the knowledge work app, and these new apps are becoming better for coding. I do want to try this app as a Cowork replacement as well, because I think it can do the same thing, especially when they release GPT 5.3, the full model, not the coding one. I'm wondering how close it's going to be to Claude Code, actually.
21:24
Let's talk about models, though, because you mentioned the full model versus the coding model. So they released 5.3 Codex.
22:14
Yeah.
22:21
Is that specifically for use in here? What does that mean?
22:22
It's only usable in Codex, which is in a terminal, in VS Code, or in this app, basically, or on the...
22:25
But if you're using the Codex desktop app, you're tied to OpenAI models, I presume.
22:32
Yes, that's the thing: you can only use the OpenAI models. But one thing that's really interesting with this model is it went the opposite direction from Opus. Opus thinks longer and uses more tokens. With this one, if you look at this chart, the X axis is how many tokens were used for a response and the Y axis is the quality of the response. You can see GPT 5.2, 5.2 Codex, and 5.3 Codex, and, for example, the medium reasoning here is at the same level as the high reasoning on the other models, while using about a third of the tokens. On top of that, they made the overall output tokens per second about 25% faster, I think. So combine the two, the fact that it uses literally three times fewer tokens and outputs those tokens faster: the problem with Codex was you would launch it and then wait 25 minutes for it to finish its task, and that was a really weird vibe, whereas this feels almost as fast as using Opus right now, because of the speed and because it uses fewer reasoning tokens for the same level of quality. And you combine that with the app. One thing we didn't say about the app is it allows you to run multiple threads very easily. The problem when you work in a terminal is you end up with six terminal windows all over your screen, et cetera. This is nice: you just have these chat threads, they go do their work, they send you a quick notification when they're done, and you jump back. So it's quite easy to run five, six, seven at once, plus it's faster. 
And that's why I'm a bit bullish on GPT 5.3 right now. The Codex version is obviously for code and so on, but the full model might also carry these same characteristics, where it uses fewer tokens and it's faster. And within that app, if the model has a better tone than 5.2, because 5.2 really writes really poorly, we might have some decent competition with Claude. And one thing I haven't talked about, which I've complained about with Claude, is the limits. The limits are so much higher on Codex. We have basically the Plus plan; well, we have a business plan, but it's the same thing, it's the Plus limit. Until the end of April, I think, they have doubled the limits on Codex accounts. So I am on this Plus plan and I find my limits with 5.3 Codex to be essentially higher than a hundred-dollar Claude plan, which means it's basically five times cheaper right now for the same usage. Now, I still like Claude better for most tasks, so I'm not going to say I've swapped, but if GPT 5.3 delivers, this is something to keep an eye on. And Sam Altman has been polling for pricing: should they charge a flat monthly subscription for Codex, which right now is tied to your ChatGPT account, or usage-based chunks of $20? It feels like they're going to decouple them, and then they'll probably have mid-tier plans, like the $50 plan they mentioned, for example. So I think they're going after Anthropic, which is a good thing even if you're a Claude user, because it will force Anthropic not to rip you off and keep shrinking the limits to make you upgrade; they will have competition. So regardless of which team you're on, it's a good thing that competition is coming and OpenAI is reacting. And basically the Codex app is not just a coding app. 
I think it's going to be their Cowork answer, through the general model.
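The speedup implied by the two numbers discussed above (a third of the tokens, roughly 25% faster streaming, both as stated in the conversation, so treat them as rough figures) compounds like this:

```python
# Rough wall-clock comparison implied by the figures discussed above:
# GPT-5.3 Codex reportedly needs ~1/3 of the reasoning tokens of 5.2 Codex
# for the same answer quality, and streams tokens ~25% faster.
token_ratio = 1 / 3   # relative token count (5.3 vs 5.2)
speed_ratio = 1.25    # relative tokens-per-second throughput

# time = tokens / throughput, so relative time is the ratio of the two:
relative_time = token_ratio / speed_ratio
print(f"relative response time: {relative_time:.2f}x")     # ~0.27x
print(f"effective speedup:      {1 / relative_time:.2f}x")  # ~3.75x
```

Under those assumptions, a task that previously streamed for 25 minutes would finish in under 7.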
22:37
One final thing on Codex 5.3: in their announcement they specifically said there's going to be a phased rollout, with limited access to the API and all sorts of safeguards. We talked last week or the week before about how models can essentially copy other models to learn and train from them. Is this a protection against that, or why are they taking this approach?
26:01
One: compute. They're short of compute, so again they have to make sure they can serve the people who are given access. Two is actually cybersecurity: these models are now so good at coding that they can also hack things, so they only release in places that are monitored. Actually, yesterday they released it on Cursor and on GitHub Copilot, so you can use it on more surfaces than just OpenAI's. I think the third thing is that it makes people use the OpenAI apps when they release there first. It made a lot of people install the Codex app, for example, or install it in the terminal. For them it's a win: if they control the whole experience, it makes them grow and it makes their investors happy, because they control more of everything. So I think those are the three main reasons why they would do that. One thing we haven't talked about, by the way, is how good it is compared to Opus. People are like, okay, which one is better, basically? Codex is a coding model, while Opus is more of a general model. So for general knowledge work I would still lean towards using Claude, et cetera. However, for coding, I think Codex might actually be better, apart from front end. If you're building a website and web pages, Opus makes better looking output, but Codex is more thorough; it will one-shot things more often, whereas Opus requires a lot more micromanaging, it will not go as deep, and it will miss some parts of the implementation, et cetera. So if you're coding, engineers, I think, should look very closely at Codex right now. For marketers who want to do a bit of coding, Claude is still the best jack of all trades: it's still a very good coding agent and it can also write copy and do all of this, et cetera. But it comes at a price.
26:27
Is it one of these things, though, where you kind of need to be somewhat model agnostic, because next month someone else is going to come along?
28:07
Who knows how good 5.3 is going to be, right? I'm hoping that 5.3 is a good competitor to Opus 4.6. If that's the case, then OpenAI may very well be a good alternative to Claude Code, which is fine. If you work with your files locally, you can just switch. I have both Codex and Claude Code set up in my VS Code, and I could literally just click on the other tab and it would take over from where I left off, because it will just read the text files that I've saved, you know, so nothing is really locked in. And the skills work on both, et cetera; the MCPs work on both. I might need to move some settings files from one to another, but I can literally ask the model to do it for me. I can be like, hey, I was using Claude Code before, can you just migrate all these settings from Claude to Codex? And it will read the files and do it.
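The migration described above is, at its core, just copying text files from one tool's settings directory to the other's. A minimal sketch, assuming both CLIs keep per-project settings and skills as markdown files; the `.claude` and `.codex` directory names here are placeholders, not the tools' documented real paths:

```python
import shutil
from pathlib import Path

# Hypothetical config locations -- the real paths depend on how each
# tool stores its per-project settings; treat these as placeholders.
SRC = Path(".claude")   # assumed Claude Code settings directory
DST = Path(".codex")    # assumed Codex settings directory

def migrate_settings(src: Path = SRC, dst: Path = DST) -> list[str]:
    """Copy every markdown settings/skills file from one agent's
    directory to the other's, returning the sorted file names copied."""
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in src.glob("*.md"):
        shutil.copy2(f, dst / f.name)  # preserves file metadata too
        copied.append(f.name)
    return sorted(copied)
```

In practice, as the hosts say, you would just ask the model to do this for you, since it can also translate any format differences between the two tools rather than copying files verbatim.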
28:14
The best part of it is, if something ever goes wrong, you just say "fix it" and it will solve it.
28:57
And I think that's one thing that's quite interesting to me: a lot of people and customers we have are very afraid of platform lock-in, et cetera. So when we started teaching Claude Code, for example, they were like, but what if OpenAI comes up with something better? Which is possible, I think. And I'm like, oh, but you can migrate these files. They're like, oh, but that's a lot of work, et cetera. I'm like, you still don't comprehend everything you can ask AI to do. You can literally tell it, migrate for me, and it will do it. One weird use I had for Opus last week when it came out: my Internet was very slow. We had a mastermind call and my Internet was super slow. I ran it at the root of my computer and had it audit every single thing that was happening on my machine to identify what was wrong, and it was able to diagnose that my ISP has a problem with routing to Cloudflare, and it set up an alternative routing for me so that everything would load properly. It also security-audited my computer and basically reinforced a bunch of security, because I was like, oh, we install a lot of things through all these coding apps, et cetera; I want to be sure I'm safe and my data is safe. Same thing: you can use it to do all these things. So I think the biggest unlock people need to have with these models is that they can ask them many more things than what they do right now, and it does the work for you, including switching the tool for you. And that's why I'm actually excited for OpenAI providing competition. Whether they beat Anthropic or not, that's not the problem. I can see Anthropic is having their moment right now and everyone's very excited, and they're going to squeeze you on the limits and make you pay way more for it. If there is decent competition, they will not be able to afford to do that, and that means, as a consumer, you win anyway.
29:02
And speaking of unlocks, you have recently built a new skill which helps you generate full topical maps. First of all, for anyone who isn't familiar with what a topical map is and why it's an important thing for SEO, do you want to briefly explain it, and then maybe share how you've put this together?
30:37
Yeah, I mean, I can't really show you the skill right now, but I'll show you the output. Basically, a topical map is a content plan, really. Essentially: what are the topics you should be able to talk about on your website to drive relevant traffic that will buy your products or services? Pretty simple. I actually did two of them. I did one that connects to the Ahrefs MCP and extracts the data from it, and I did one that connects to a tool called DataForSEO, which is an API you pay for on usage. For that one I think you need to top up 50 bucks, but then you just pay for your usage; there's no monthly plan, there's nothing. The data is not as good as Ahrefs, to be honest. I think Ahrefs is better. But the point is, I have Claude act like a human. What it does is it reviews your website and asks you a few questions about what your product is, what you do, what your vision is, who you want to compete with, et cetera. Then it searches for queries your potential customers would search for, looks at who is appearing on the SERPs, and who is repeatedly appearing on the SERPs, and starts building a competitor list from that. Then it's going to use either DataForSEO or the Ahrefs MCP to extract their top-ranking pages. It's going to save them, and then essentially do that for many competitors. After that, it's going to deduplicate everything and build a map of all the topics you could write about, minus what you already have on your website. It's more complicated than it sounds; there's a lot of messy work in there. But the point is, it builds full interactive topical maps for you of all the topics you could write about, including all the data when you hover. Emotional intelligence training, apparently, is the big topic. I was building this topical map for a friend of ours' website.
And so the idea is you can just visualize the kind of content you could write, and it's all organized in hubs and everything. Perfect for SEO, basically.
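The core of the pipeline described above can be sketched in a few lines. This is an illustrative reduction, not the actual skill: in the real workflow the SERP results and competitor pages come from the Ahrefs MCP or the DataForSEO API, so the data here is mocked and the function names are made up for the example:

```python
from collections import Counter

def find_competitors(serps: dict[str, list[str]], min_hits: int = 2) -> list[str]:
    """Domains that show up repeatedly across the SERPs for
    customer-intent queries become the competitor list."""
    counts = Counter(domain for results in serps.values() for domain in results)
    return sorted(d for d, n in counts.items() if n >= min_hits)

def topical_map(competitor_topics: dict[str, list[str]],
                existing: set[str]) -> list[str]:
    """Deduplicate all topics found on competitor sites, minus the
    topics the site already covers -- the remaining content gaps."""
    topics = {t for ts in competitor_topics.values() for t in ts}
    return sorted(topics - existing)
```

The interesting part of the real skill is everything around this skeleton: the model interviews you about your product first, decides which queries matter, and renders the final result as an interactive hub-and-spoke map.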
30:59
And this was like a significant amount of manual work prior to this.
32:57
You would spend a lot of time. It's not perfect, I don't think it's perfect, but I think it's decent. And the good thing as well is that once it gives you that map, you can look at it and give it feedback. You can be like, ah, I think this is kind of bad, remove this, expand this; you're probably missing some keywords in that area; can you just look for queries for that, identify competitors and extract their pages, et cetera. It will do it and update the map for you. So you can just vibe-chat, vibe-code your topical map, basically, which I think a lot of people would like. You see, like, "best entrepreneurship books" and it just gets all of these. For coaches it's very good, actually: books about coaching, that kind of stuff, top books on coaching. It's pretty good, actually. So overall I think it's a pretty cool skill. It's going to release either today or tomorrow; I'm putting the finishing touches on it. It's okay, but I still think some of the grouping could be done a bit better, for example. So yeah, it's a full Claude Code skill, but I think it might run on Codex too. I might try it, actually.
33:01
So if you're interested in getting a hold of this, then head on over to authorityhacker.com/aiaccelerator and you can join our community there and pick up that skill to use in Claude Code, or wherever else you want to use it these days.
33:56
Yeah, I mean, I'm going to test it on Codex, but I have a feeling it's going to work fine, actually. Great.
34:09
So next I want to talk about the Super Bowl, because as Europeans we pay so much attention to that. But this year there seems to be a lot more controversy around the Super Bowl. And no, we're not going down the political route; we're going down the AI advertising route. Last week Anthropic showed us the ads it was going to run during the Super Bowl, and they are specifically targeting ChatGPT. They're basically attack ads on the fact that OpenAI is running advertising inside ChatGPT. But it's controversial, because they have quite clearly mischaracterized how ads are currently run, or are going to be run, inside ChatGPT. They have an example of someone on screen talking to a psychiatrist, asking some personal questions, that kind of thing. The psychiatrist gives them an answer, but then turns around and starts suggesting, like, a cougar dating site or something crazy like that. And the tagline is basically: ads are coming to AI, but not to Claude. But that's not at all how ads are being rolled out in ChatGPT. So, I mean, what's your take on this?
34:14
I think I'm a bit disappointed in Anthropic, in the sense that they always try to take the high ground: oh, we're the ones who invest the most in safety. They essentially position themselves as the more fact-based, serious company, while OpenAI is more for normies and lower-end customers, right? And they're essentially doing the opposite of their values right now, which is misrepresenting things, essentially misinformation, at least about the current ad system. They're portraying it as something that will change the answer itself, which is far from what we've seen so far. We'll see what happens when it gets fully rolled out, but it's not going to affect the answers on ChatGPT. It's just going to put an ad at the bottom, beneath the answer.
35:30
They're separated, labeled, and, yeah, not integrated into the organic answer you get, at least for now. But just the fact that it's sown that seed of doubt, I think, has been almost quite powerful in a way. As much as I hate the negative-advertising style, there's an article in Forbes saying this was basically an $8 million signal to enterprise decision makers not to trust OpenAI, $8 million being the price of a 30-second Super Bowl ad, I believe.
36:18
Which is not a lot compared to the budgets they spend already; it's kind of a drop in the bucket. Some of these companies, I heard on a podcast that xAI spends a billion a month, right? So 8 million is like whatever. And that's xAI; I mean, Elon Musk has a lot of money, but still, Anthropic must not be that far behind. But clearly they're separating themselves. OpenAI is becoming the app of the normies, even with Codex, right? Codex is so much cheaper than Claude as well. And Claude is becoming almost kind of like the Apple of AI, the luxury product that just works, with kind of cool branding. They have this little crafty branding now and so on, and it works pretty well: typewriter-type stuff, a mug of coffee, a Notion style to it, et cetera.
36:49
And the tagline, like "keep thinking," it really feels kind of Apple-esque in a way, you know.
37:39
Yeah. So they're doing well, because we thought OpenAI would be the Apple of AI. Actually, OpenAI is releasing physical devices this year, so we'll see how that goes; apparently it's going to be headphones. But yeah, Claude is basically positioning themselves as that, and OpenAI is becoming the mass-market one. And Sam Altman got really pissed off about these ads, actually. He started tweeting about it, basically saying it's misrepresenting, and also saying that they just don't have the same problem, because in Texas alone there are more ChatGPT users than there are Claude users in all of the US. So Claude is a niche product for rich people, and ChatGPT is a mass product with a lot of people who want to use it but can't pay, like only 5%, basically.
37:46
He's correct. But sometimes, if you rise to the bait like that, you almost, yeah, you give them credit.
38:30
He gave them attention by reacting, basically, when he should have just said nothing. Yeah, it's still a different problem. I like Claude, but I'm afraid I like Claude too much, and I'm afraid they're going to run away with it and start charging an arm and a leg for their services if they get too good, basically. That's everything I was talking about, and that's why I'm happy Codex is providing some good competition, and I hope they keep it up. Even if they're only 80% as good and they act as Android does to Apple, for example, keeping them in check somehow, it's a good thing. Because Claude is now becoming this company, and they have weird ethics sometimes, right? Anthropic is known to, for example, ban their competitors from using their models, even internally. All the xAI employees were banned from Claude Code: you cannot use Claude Code, all the IPs are banned and everything. They really are actively trying to ban these people. They're also very aggressive on trademark infringement; they'll sue you, et cetera. So while Claude is a really good tool, as a company it's a bit scary to see them run away with all the capabilities and be the best by far, because they will take advantage of you, I think.
38:41
So let's move on to our final story, which is the Seedance 2.0 video model. This was launched by ByteDance, the company that owns TikTok and CapCut, among others.
39:48
But not in the US anymore.
40:00
Not in the US anymore, although I believe they do still have some kind of non-controlling equity stake in it. But they have essentially launched a pretty good video generation model. And if you've been on Twitter at all over the last week or so, you've probably seen all these martial arts or Dragon Ball Z videos, sometimes now up to 60 seconds long, which...
40:02
With perfect audio, too.
40:25
Photorealistic, yeah. So the real big development there, my understanding is, is that in the past, video would be generated and then audio would be layered on top of it, whereas this is generating the video and the audio at the same time, which, as they claim, is an industry first. It allows multi-shot storytelling, so you can have different camera perspectives of the same story throughout an up-to-60-second clip. And the big thing is, you know, one-sentence video editing. The results I'm seeing out of it look pretty cool.
40:26
Yeah, it's cinematic. I mean, for marketers, I think it's basically an ads generator.
40:58
Yeah, that was my thought as well.
41:04
We need to do some ads this month, and if we're going to play with video, I know where I'm going right now. Mostly because you have enough length; a minute is long enough. It can do animations as well, et cetera. It's quite powerful. Video will be solved this year, 100%. And it's cheap, too.
41:06
It's kind of quite cartoony, like, you know, comic-book style, a lot of the videos I've seen. But there was a situation where apparently they turned off human face rendering or something, because someone uploaded just a photo of their own face and then made a perfect video generation of that face, which, you know, could obviously be used for nefarious purposes. So it seems like the technology for perfect face creation is there; it's just that safeguards are maybe something that needs to be figured out.
41:23
I think we've been there for a while. I think Nano Banana can do perfect faces for photos, for example. And the limit is on purpose.
41:57
If we wanted to do ads of, you know, Mark and Gael, you can't really do that at the moment. Like, there's no way.
42:05
Yeah, but it's an artificial limit. They all made an artificial limit: Google did it, these guys do it, everyone is doing it right now, because it would be the end of the world if we started having fake Trump videos and Putin videos saying anything on social media. But it is already possible, and it would already work; it's just a matter of time before an open-source model actually does it, this year, I believe. Maybe we'll have a video podcast played by AI models by the end of this year. So yeah, it's crazy, and I think for advertising it's going to be so good. I'm actually quite excited about it for advertising.
42:13
I feel like this is one of those things where the early adopters are just going to have such a big advantage over everyone else.
42:51
I didn't bother with video before, but now I think we should look at this, now that we're going to start doing more Meta ads and stuff. It's a big deal, actually. And again, this is an API, so you can connect it to your Claude Code, you can do all this stuff and create all sorts of crazy skills that will automate a lot of these things: the brainstorming, the storyboarding, and so on. And it's going to do a lot of it.
42:59
And it's generating what would be, like, a multi-million-dollar, high-production Nike ad or something like that. But you could just do it for your local business, for example.
43:20
But again, Hollywood, et cetera, all the stocks in the studios and stuff, it's going to tank massively. We're going to have massive deflation. How long until you can make a movie, basically, or a TV-show episode, a 20-minute thing? We're probably talking a few years, because we need the length to grow; it's kind of like context size in those models.
43:30
And these are some of the highest compute requirements, to make a 60-second video.
43:52
It's cheaper than Veo 3.1. It's only, I think, like 99 cents for 5 seconds or something. It's not expensive compared to other video models. So not only is it better, it's cheaper too. And that's where China is very good: they make cost-efficient models, whereas Western labs will release something good but price it very high, because their R&D costs are so much higher.
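Taking the quoted rate at face value (roughly $0.99 per 5-second clip, which is the hosts' figure, not a verified price sheet), the per-minute cost works out to about the $12 mentioned a moment later:

```python
# Assumed rate from the episode: ~$0.99 per 5-second generation.
cost_per_clip = 0.99
clip_seconds = 5

clips_per_minute = 60 / clip_seconds            # 12 generations
cost_per_minute = clips_per_minute * cost_per_clip
print(f"~${cost_per_minute:.2f} per 60-second video")   # ~$11.88
```

That back-of-the-envelope number is what makes this viable for ad creative while still being, as noted below, a bit steep for casual mass-market use.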
44:01
Yeah. And I think for commercial purposes, like running an ad, paying 12 bucks for a one-minute video is fine. But it's not like kids are going to be making their own cartoons at that rate; their parents are going to be saying, hey, 12 bucks for that? It's still a bit expensive for mass adoption.
44:23
Yeah, yeah, I'm excited for it.
44:41
Okay, let's wrap it up there, then. If you enjoyed this episode, please head over to our YouTube channel and leave us a comment there. We do read them all, and we'll try to answer as many as possible. Make sure you subscribe, and give this video a like; it really, really does help us out. And if you know anyone who would be interested in this episode, send it to them as well. We're trying to get the word out, trying to get more people watching. Anyway, thanks everyone for watching. We'll see you next week for another episode. Goodbye.
44:43