Ep 751: Hands on with Google’s Gemma 4: How to Use The Open Source Model Locally and Why It Matters
43 min
• Apr 8, 2026
Summary
Google released Gemma 4, an open-source AI model available in four variants that delivers frontier-level performance on consumer hardware without subscription costs. The host demonstrates how to download and run Gemma 4 locally using Ollama, comparing its capabilities to GPT-4o and Claude 3.7 Sonnet from 15 months ago, and explores implications for cost reduction, privacy, and commercial AI deployment.
Insights
- Open-source models are now competitive with proprietary frontier models from 12-15 months prior, enabling cost-free local deployment for 50-80% of typical AI use cases
- The 31B Gemma 4 model achieves top-3 open-source performance (1452 ELO) at 1/10th the parameter count of comparable models, fundamentally changing the performance-to-size ratio
- Local model execution eliminates API costs, usage limits, and cloud privacy concerns, enabling 24/7 agent operation without subscription fees or data transmission
- Apache 2.0 licensing grants unrestricted commercial freedom, enabling businesses to build and sell products using Gemma 4 without vendor dependency
- Consumer hardware (mid-range MacBook Pro with 16GB RAM) can now run production-grade AI models, democratizing AI deployment beyond enterprise infrastructure
Trends
- Shift from cloud-dependent AI to local, privacy-preserving model execution on consumer devices
- Open-source models closing performance gap with proprietary models faster than expected, reducing enterprise AI spending
- Emergence of personal AI software replacing cloud-based SaaS for sensitive workloads in healthcare, legal, and financial sectors
- Desktop software renaissance driven by capable local models and permissive licensing
- Mixture-of-Experts architecture enabling efficient inference on resource-constrained devices
- Commercial licensing of open-source models removing barriers to enterprise adoption
- Agentic AI workflows becoming feasible on local hardware without cloud infrastructure
- Model efficiency improvements outpacing parameter scaling as primary performance driver
- Vendor lock-in reduction through open-source alternatives to proprietary AI platforms
- Cost-per-inference approaching zero for organizations with local deployment capability
Topics
- Google Gemma 4 Open-Source Model
- Local AI Model Deployment
- Ollama Desktop Client
- Model Parameter Efficiency
- Apache 2.0 Licensing
- Consumer Hardware AI Inference
- Privacy-Preserving AI
- Agentic AI Workflows
- AI Cost Reduction Strategies
- Open-Source vs Proprietary Models
- Function Calling in Language Models
- Mixture of Experts Architecture
- Model Quantization
- AI Benchmarking (Arena ELO Scoring)
- Commercial AI Deployment
Companies
Google
Released Gemma 4 open-source model family with Apache 2.0 licensing and demonstrated frontier-level performance
Google DeepMind
Developed and released Gemma 4 model family with four variants for different hardware constraints
Anthropic
Restricted Claude subscription access for third-party agent tools like OpenClaw, driving adoption of alternatives like Gemma 4
OpenAI
GPT-4o used as performance comparison baseline for Gemma 4 evaluation against 15-month-old frontier models
Apple
MacBook Pro hardware used as consumer device benchmark for running Gemma 4 locally
Hugging Face
Platform for accessing and downloading Gemma 4 models for local deployment
Ollama
Desktop client software that provides graphical interface for running open-source models like Gemma 4 locally
Nvidia
GPU hardware mentioned as option for running larger Gemma 4 variants (DGX systems)
People
Jordan Wilson
Host conducting live demonstration of Gemma 4 capabilities and comparing against proprietary models
Quotes
"If I would have told you a year ago that you could use the world's most powerful models on your local machine without having to pay for it, you probably would have looked at me and said, you're absolutely crazy."
Jordan Wilson•Opening
"This Gemma is punching well above its weight class. It is competing with models 20 times its size on the open source side. This is something we've literally quite literally have never seen in the history of AI."
Jordan Wilson•Mid-episode
"I think that this is going to lead to a resurgence... I think we're going to see personal software, right? Not necessarily software that's for your whole company, software that's for you."
Jordan Wilson•Trend analysis
"Open source AI is getting smaller, faster and harder to ignore. Google built Gemma four specifically for agent work flows with native function calling."
Jordan Wilson•Conclusion
"The gap between the free local models and paid cloud services keeps shrinking fast and you can no longer ignore it."
Jordan Wilson•Closing remarks
Full Transcript
This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business and everyday life. If I would have told you a year ago that you could use the world's most powerful models on your local machine without having to pay for it, you probably would have looked at me and said, you're absolutely crazy. Well, I'm not, because that day is here. Thanks to Google's new impressive Gemma 4 model, you can get at least 2025's frontier AI performance on your local machine, running it privately, offline, with this new impressive open source model. And I want you to think back to a year-ish ago. I'm not just telling you to, okay, think about maybe replacing that $20 a month subscription. That's not where this is going. What about if your company was spending thousands of dollars or millions of dollars on AI deployments, internally or externally? Or when you think about running AI agents around the clock, right? Anthropic recently said, hey, you can't use your Claude subscription anymore for OpenClaw. Well, now, with Google Gemma 4, you can run it around the clock and not pay a penny. No, this is not too good to be true. Yes, it kind of feels like we're living in the future, and we're going to go over it all live. So welcome to Everyday AI. So here's the big picture of what's happening with Google's new Gemma 4, and it's, I think, rivaling some of its trillion-parameter giant competitors. So Google DeepMind released Gemma 4, its most capable open model family, today. There's four different variations. We're going to be breaking them all down on today's show. But the big boy, the 31 billion parameter model, outranks models 20 times its size. It is third on the global ranking of all open models on Arena. And that's not even the best thing. The best thing maybe is that Google changed its licensing, and now Gemma 4 is released under the very permissive Apache 2.0 license, granting full commercial freedom with essentially no restrictions. So not only is this free, you can run it on your computer around the clock if you have the right hardware, and I'm going to be breaking all of that down, but you can also create and sell things with this model. So on today's show, here's what we're going to break down. I'm going to break down for you how a 31 billion parameter model now competes with ones 20 times its size. I'm going to show you and tell you why running AI locally on your laptop or even phone changes the cost and privacy equation. And I'm going to break down live, right, the exact tools and steps to download and run Gemma 4 for free today as we go hands on. All right, let's get into it. This is Everyday AI. Welcome. What's going on? My name is Jordan Wilson. If you're new here, well, we do this every day. And this is your daily livestream, podcast and free daily newsletter, helping business leaders like you and me keep up with the ever-changing AI landscape. I help you understand what's important and show you practically how to use that information to grow your company and career. So it starts here, but make sure you go to our website at youreverydayai.com and sign up for the free daily newsletter. We're going to be recapping today's show, as well as giving you all of the other AI information you need to be the smartest person in AI at your company. So it's Wednesday, Wednesday, right? We have kind of different shows. On Mondays, we give you the AI news. On Wednesdays,
we go hands on with a deep dive into one new AI release, something like that. And then on Fridays, we go over, in our Friday Features, you know, kind of five to seven new AI features. Tuesday, Thursday, we rotate it a little bit. So if you are new here, that's the plan. But on Wednesdays, we get our hands dirty doing live demos of AI. But first, before we do that, I want to talk about Gemma 4 and how I think it's going to completely change the landscape. And the cool thing about having a daily AI podcast where you can go listen to all the episodes for free: you can go see I've been ranting and raving about the power of small language models since 2023. And technically this is a small language model, but you get large language model performance, right? So the exact definitions of what's a small language model and what's a large language model are ever-changing, right? But for the most part, you look at the number of parameters in a model. So we're going to be comparing, you know, what you can get for free out of Gemma 4 with what you could get out of the frontier models about, you know, 14 months ago, which at the time were GPT-4o and Claude 3.7 Sonnet. But those are big models, right? GPT-4o was reportedly two trillion parameters. So here you have a model that's a small fraction of that size, and it's open source and it's free. And here's why I think it's going to change the landscape. Well, number one, I already talked about Anthropic saying, hey, you can't use your Claude subscription anymore to run OpenClaw, you have to pay via API, and people are like, that's going to be crazy expensive. Well, now you can run agentic AI around the clock for free with Gemma 4. Is it going to give you the same model as an Opus 5.4 or a GPT-5.4? Absolutely not. But there's a good chance that for, you know, maybe 50 to 80 percent of what you're trying to do, this model is going to be good enough. And Google also changed the Gemma license to Apache 2.0, which provides users unrestricted commercial freedom and prevents corporate vendor dependency. That's the big thing here. I think that this is going to lead to a resurgence. And I've referenced this on the show once or twice, but I think we're going to kind of have this future that's kind of retro. I think that desktop software is going to come back. And the same way that, you know, in the nineties we saw this wave of personal computing, I think we're going to see personal software, right? So not necessarily software that's for your whole company, software that's for you. Right. And I think that it's going to be models like Gemma 4, right, that are going to allow this to happen. And also, the performance versus size ratio absolutely just reset. All right. And I'm going to break down what this means, but essentially think of it like this, right? If you follow, I don't know, boxing or UFC, I don't really follow those things, but there's something called, you know, pound for pound, this is the pound-for-pound best fighter in the world, right? If someone, I don't know, fighting at 150 pounds can knock out someone at 180, that's pretty impressive, right? This Gemma is punching well above its weight class. I'm talking about, it is competing with models 20 times its size on the open source side. This is something we've quite literally never seen in the history of AI, which is why I think Gemma 4 is a huge deal.
So even if you don't necessarily think that you or your company need to use this, you're like, okay, well, we pay for ChatGPT Enterprise or Google Gemini Enterprise, Claude Enterprise, whatever, right? And we have more robust agentic solutions already going. Okay. You still need to be learning Gemma 4 and building with it, not just as a backup dependency, but because, probably, in the future, right, if we fast forward one more year, I don't know if any open source competitors are going to be able to truly catch up to what we just saw from Google and their Gemma 4. So let's talk quickly about the capabilities. So it can solve complex reasoning, math and multi-step logic problems effectively. There's native support for function calling, yes, in a local model, and structured JSON outputs for agentic workflows. It has a context window depending on, right, your hardware, and we'll give you all those specs in the newsletter, right? But if you have capable hardware, you can work with a 256K token window for analyzing large documents and codebases. It can analyze text, images and videos natively, but it excludes audio support for the bigger models; the smaller models actually support audio. And it can generate and correct code efficiently as a local, offline coding assistant. Here are the four different flavors of Gemma 4. And, oh, FYI, as I take a sip of my coffee, yeah, this is unedited, unscripted. I hope that this is going to be interesting for you, but if not, and if you listen to the podcast all the time, make sure you sign up for our newsletter, because I put a poll in our newsletter on Monday. I said, hey, what do you guys want to see hands on on Wednesday, and you all voted Gemma 4. So if you want to see other types of demos on Wednesday, make sure you read our newsletter. I'll usually put out a poll, maybe Monday or the Friday before, depending on how busy things are. So I'm doing this for you, this is what you wanted, FYI, but I mean, I'm doing it for myself because I'd be doing the same thing anyway. But right now there are four different variants of Gemma 4. Easy, right? You have the E2B and the E4B. Those are essentially phone models. This is as edge as edge gets, because it can actually even run on a Raspberry Pi, right, and your basic phones. And this is big, right, especially if you've always, I don't know, wanted to build a certain app for something, right, and you're confused how, right? Using the Gemma E2B and E4B models can get you there pretty quickly. They're extremely capable. All right, then you have the two bigger-boy models, for which you're going to need, well, consumer hardware. That's the reality here, right? Because to get this level of performance previously, before, you know, rewind to more than a week ago, on a $2,000 laptop you couldn't run anything, right, that was a top-10 open source model. Now you can. And I think a good way to look at this is the MacBook Pro test, right? So generally Apple, right, they usually have about three to four different versions of their MacBook Pro. So obviously these souped-up ones are, you know, a little expensive. But I say, if you take the middle variety or the middle flavor of a MacBook Pro off the shelf, right, walk into Best Buy, Apple Store, whatever, look at the MacBook Pros and say, give me the one in the middle. Now that one in the middle can technically run the 26B version of Gemma 4.
And that's because it uses the mixture of experts framework, and it only activates four billion parameters, and it's really fast, right? So that model, the 26B, is actually faster, but less capable, than the 31B. But by default, you can run that, you know, you could technically run the quantized version on a 16-gigabyte MacBook, the baseline, if you just go for the middle flavor of a MacBook Pro, which I'm trying to see here, you know, what this costs. I have it open in my other tab here. Let's see. Okay. No, no trade-in. No thanks. I don't want all this extra stuff. Let's see how much this is. All right. So $2,200, right? Which, any MacBook Pro I've bought over the last 10 years has been that price or way more, right? So, yeah. AI moves too fast to follow, but you're expected to keep up. Otherwise, your career or company might lag behind while AI-native competitors leap ahead, but you don't have 10 hours a day to understand it all. That's what I do for you. But after 700-plus episodes of Everyday AI, the most common question I get is, where do I start? That's why we created the Start Here series, an ongoing podcast series of more than a dozen episodes you can listen to in order. It covers the AI basics for beginners and sharpens the skills of AI champions pushing their companies forward. In the ongoing series, we explain complex trends in simple language that you can turn into action. There's three ways to jump in. Number one, go scroll back to the first one in episode 691. Number two, tap the link in your show notes at any time for the Start Here series. Or you can just go to starthereseries.com, which also gives you free access to our Inner Circle community, where you can connect with other business leaders doing the same. The Start Here series will slow down the pace of AI so you can get ahead. I don't think people understand: the middle, you know, middle MacBook Pro that you buy off the shelf can now run a model that's about the same capability as the best models in the world 14 months ago. All right. And then you have the last flavor. Okay. So E2B and E4B for the phones and edge devices, then you have the 26B, and then you have the 31B dense model. All right. So when you run this, you're running the entire thing. That's why it's dense; it's not the 26B mixture of experts. All right. And that delivers the highest quality reasoning and coding output. And you will need a more powerful computer, but, you know, luckily for me, I got a Mac Studio, a fairly capable one, and it runs great on my machine. But to run this one, you will either need the most souped-up, you know, MacBook Pro or the Windows equivalent of that, or you will need a fairly capable Mac Studio or an Nvidia DGX, something like that. But quantized, you could squeeze this one onto something with about 32 gigabytes of RAM; it will run better at about 48. So, you know, as an example, my Mac Studio has 64 gigabytes of RAM. All right. So now you understand the technical side. Like I said, if you have a newer, middle-of-the-line MacBook Pro, just an easy way to benchmark it, you're going to be able to run the quantized version of the 26B. You're going to have to have a little bit more of a powerful machine to run the 31B dense model, but it's capable, and I'm going to show you here live. So here's why this is important.
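A quick back-of-the-envelope check on those RAM numbers, as a minimal sketch; the quantization bit-widths and the runtime overhead factor here are illustrative assumptions, not published Gemma 4 specs:

```python
# Rough memory footprint of model weights: params * bits-per-weight / 8.
# The bit-widths and the ~1.2x overhead for KV cache/activations are
# illustrative assumptions, not published Gemma 4 figures.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in gigabytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    gb = weight_gb(31, bits)
    print(f"31B model @ {bits}-bit: ~{gb:.0f} GB weights, "
          f"~{gb * 1.2:.0f} GB with runtime overhead")

# Approximate output: 16-bit ~62 GB, 8-bit ~31 GB, 4-bit ~16 GB --
# which is consistent with a 4-bit quant squeezing into 32 GB of RAM
# and running more comfortably at 48 GB or more.
```

The nine-gigabyte download mentioned later in the episode would imply an even more aggressive quantization, somewhere around two to three bits per weight, if that figure is for the full 31B model.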
If you look at the biggest and best open source models in the world, right, like Kimi K2.5 Thinking, well, Gemma 4 31B is now in the exact same category, but at a fraction of the cost and a fraction of the parameter count to run it. And community testers have confirmed strong results in coding, reasoning and image understanding, and the 31B scored a 1452 on Arena, right? So the Arena AI, formerly the LMArena, this is blind taste testing. You put a prompt in, it kicks out, you know, two different outputs from different models, you score which one's better. That's how you get an ELO score, right? And 15 months ago, the best ELO in the world was not 1450, right? And that's what's crazy. So now this is scoring better, at least on ELO score and on the scientific benchmarks, than the frontier AI models from 15 months ago. So here's, I have a little chart here on my screen for our livestream audience. Podcast audience, you know, this one's not going to be super visual, but you can always go to our website at youreverydayai.com, click Episodes, and you can go watch today's show if you want to see the video version of this. But it should be fairly straightforward; I'm not going to be doing anything too visual. But we do have this from Google's announcement blog post that shows the model performance versus size. And you'll see, this is literally, the new Gemma 4 is uncharted territory, because this is charting ELO score on one axis and then the total model size on the other. So previously, to get anything like a 1450 on an open source model, right, and we don't usually know the size of proprietary models, right, so your Gemini 3.1 Pro, your, you know, Claude Opus 4.6, your GPT-5.4, et cetera, but presumably they're, you know, multiple trillions, you know, maybe one and a half to two and a half trillion parameters, right? So think of this as like a hard drive size, right, if you want to simplify it. So to get this same level of performance, a 1450-ish score, from an open source model, you're looking at something that's about 300, 400 billion parameters in size. So again, this is about 10% of that size. Some of them, right, like Kimi K2.5 Thinking, which is a lot of people's favorite open source model, this is like a twentieth of the size, right, with roughly the same level of performance, at least when it comes to ELO and most scientific benchmarks. So now let's just quickly talk about why you would even want to run anything locally. Like, okay, Jordan, what's the point? I don't care, $20 a month isn't a big deal. Sure, right. But think about doing this at scale. Think about agentic AI, right? Things like OpenClaw, right? And now that Anthropic has shifted away, a lot of people are having to use some of these open source models, but using them via OpenRouter, so it's not completely free. Because you literally can't run models like Qwen 3.5, GLM-5, Kimi K2 Thinking on anything less than a, you know, $8,000 computer, definitely not on your average MacBook Pro, right? So that's the difference. Those can't really run on true consumer hardware. Prior open source models that might be able to run your agentic tools like OpenClaw or other agents, you couldn't run them on consumer machines. You had to literally have like an $8,000, $10,000 computer or more. Right.
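For context on those Arena numbers, here's a minimal sketch of how an Elo-style rating gap translates into an expected blind head-to-head win rate; the 50-point gap is an illustrative example, not a figure from the episode:

```python
# Standard Elo expected score: the probability that model A's output is
# preferred over model B's in a blind pairwise comparison, given ratings.

def expected_win_rate(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Illustrative example: a model rated 1452 vs. one rated 50 points lower.
print(f"{expected_win_rate(1452, 1402):.2f}")  # ~0.57, preferred ~57% of the time
```

The takeaway is that Arena scores are relative, not absolute: a 1452 only means the model wins blind matchups against lower-rated models at a predictable rate, which is why a score that beats the 15-month-old frontier is such a notable data point.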
And that's where the local aspect comes in handy. So being able to run things locally, keeping costs down, no subscription fees, no API keys, no usage limits after you download something. But also, talk about privacy. You never have to send anything to a cloud, right? Because, yes, keep this in mind: as long as you are on a paid team plan with any of the big four, you know, and as long as you turn off model training or turn on the basic privacy settings, you're not technically sending any of that private information to these companies, or they can't really do anything with it, right? However, I do understand, with highly sensitive documents, how you might not want to send that even if you've turned off model training, right? So any sensitive industries like healthcare and legal can gain very capable AI without any cloud exposure. And then, like I talked about, right, the combination: well, it's free, it can run 24/7, it keeps sensitive data on your machine. You can literally turn the internet off and use this. But then also, with the new licensing, you can have full commercial use. So you can literally use this for anything. And there's three ways to run Gemma 4 today, and then I'm going to show you how to do this live. Thanks for sticking with me. I wanted to, you know, first kind of tell you how important this is and kind of set the context here. But there's kind of three different ways, all for free, that you can run Gemma 4. One would be a tool like Ollama, right? That's the one I'm going to show you. This essentially gives local models a graphical user interface like ChatGPT, very simple, right? So you can then run a terminal command and download the models in minutes, or you can even run that command in the Ollama interface. Also, LM Studio is a great one. Same thing, offers you a visual chat interface similar to ChatGPT, easy for non-developers. I guess another way, so, hey, we'll just do four free ways to run it: you know, any agentic system that you're running locally, you can point that system to Gemma 4 on Hugging Face. Or a lot of the, you know, local agents that run on your computer, they can run via, you know, Ollama as well. So you can run an Ollama command, a Hugging Face command, you know, point your agents to the download, because that's the thing: you essentially download this thing and you run it. Also, for the other versions, right, so the ones we're going to be looking at are going to be the bigger variants, right? But to run the small ones, and people don't know this, Google actually has a great app called Google AI Edge Gallery. You can download that for iOS or for Android, but this is huge. If you haven't done this already, you probably should, because what that allows you to do is the equivalent of running it offline on your computer, right? This is an app where you can download the smaller mobile edge versions. And then, hey, if you're ever in trouble, or if you're ever somewhere where you just don't have service, you at least have a highly capable large language model on your phone that you can use at any time. All right. So let's get going. Live-ish. First, I'm not going to download this live because it might take a while, right? And trying to download a larger file like this while also streaming live doesn't always work.
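Since the demo that follows uses Ollama's graphical interface, here's a minimal sketch of driving the same local model from code with Ollama's official Python client. Note the `gemma4:31b` model tag is an assumption for illustration (the episode only gives the spoken command, so check ollama.com for the real tag), and the `format="json"` structured-output option is a standard Ollama feature rather than anything Gemma-specific:

```python
# pip install ollama -- assumes the Ollama desktop app/daemon is running
# and the model has already been pulled (e.g. `ollama pull gemma4:31b`;
# the exact tag is a hypothetical placeholder, check ollama.com).
import ollama

MODEL = "gemma4:31b"  # hypothetical tag for illustration

# Plain chat: this is what the Ollama GUI does under the hood.
reply = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "Summarize mixture-of-experts models in two sentences."}],
)
print(reply["message"]["content"])

# Structured output: format="json" asks the model for valid JSON, the
# building block of the agentic workflows discussed in the episode.
structured = ollama.chat(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": 'Respond in JSON as {"apples": int, "bananas": int}: I have 6 apples and 3 bananas.',
    }],
    format="json",
)
print(structured["message"]["content"])
```

Ollama also exposes an HTTP API on localhost (port 11434 by default), including an OpenAI-compatible endpoint, which is one practical way the "point your local agents at it" idea can work without any cloud dependency.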
So here's what I did. I'm going to use Ollama in this case. So here's what you're going to do. You're going to go to Ollama.com. All right. That's O-L-L-A-M-A dot com. If you haven't already, you're going to download the program. Okay. This is a simple desktop client. Like I said, it just allows you to use any open source, open weight model on your computer, but it gives it a graphical interface. So in the same way that you would chat at chatgpt.com, gemini.google.com, claude.ai, et cetera, this allows you to work with open models in that interface, because by default you'd be interacting with them via command line tools or the terminal, which is not always ideal for non-technical users. So go to Ollama.com, download that, install Ollama on your local machine. All right. That's step one. Step two, you're going to download the actual model. So you can search models on Ollama. Just type in Gemma 4, and then you're going to choose the variant that your local machine can run. So for most people, that's going to be the 26B version. For me, I'm going to be showing you the 31B version. All right. So all you have to do is, once you bring that model up on the Ollama website, there's going to be, it says CLI, right, with a little command. You're just going to copy that, right? So this one says "ollama run" and then the Gemma 4 31B model tag, right? All you're going to do is copy that. Then you're going to open Ollama, right? And again, very simple. All you're going to do is then paste in that command. All right. And it's going to download the model. So for me, you know, this model was about nine gigabytes. It took, I don't know, five-ish minutes to download. And then that's it, and then you're ready to run with it. So let's go live. So here's what we are going to do. Livestream audience, do me a favor, let me know if you can see my screen. All right. So I'm going to be jumping around a little bit here, because I'm going to be having some copy-and-paste prompts. So about 15 months ago, I did a show comparing the latest version of Sonnet, which I believe was 3.7, to GPT-4o. So again, going back to how I started this show, this was about 14, 15 months ago. These were the best general use case models in the world. And I had a series of prompts that kind of had like a very, you know, unofficial fun rubric that I would do comparing models. And I'm going to go ahead and run the exact same prompts. All right. So we're going hands on here. So I'm going to first put in a message to Gemma. And this is exactly what I did previously, just to kind of level the playing field. So all I'm saying is: for this chat, please respond with proper formatting and structured bullet points, do not waste words, answer in the shortest way possible while still being detailed enough to fulfill the user's request. Right. So this is what I did for all the other ones; this is what I'm doing now. All right. So here is our rubric. So, test one. This is just a trick question. It's logic. All right. And when I did this, both Claude 3.7 Sonnet and GPT-4o got it wrong. We'll see if Gemma 4 gets it correct. All right. Hey, love, love, love when we get little bugs. All right. I'm going to have to run that again. It essentially went through the thinking, right? That's the thing: this model thinks, and it reasons as well. And for those watching it live, you can just see that it did this.
So we'll see if this got it correctly. Right. The correct answer should, well, I should probably read it. I said: I just woke up today with six apples and three bananas. Yeah, livestream audience or podcast audience, try to do this live, see if you can get it. I just woke up today with six apples and three bananas. Yesterday I ate a banana and two apples. This morning I will eat one apple and no bananas. However, I don't really like apples, and one banana may turn brown tomorrow. Assuming nothing else changes, how many apples and bananas will I have tonight? So, a little trick question. GPT-4o and Claude 3.7 Sonnet got this wrong. All right. Let's see. All right. So it looks like they all got it wrong. All right. So the correct answer is five apples and three bananas. All right. So Gemma 4 was close; it got five apples and two bananas, which, technically, not that we need to gauge, you know, the level of correct versus other models, right: Claude Sonnet said three apples, two bananas, and GPT-4o said three apples, two bananas. So they all got it wrong. Gemma got it a little closer. But hey, did you get this right? All right. Our next one, the old man and dog crossing a river. All right. So this also shows that the model is thinking, right? So if you're listening on the podcast, you're probably not seeing this, but it's also showing its thinking trace. All right. So the next one, I'm saying: a man and his dog are standing on one side of the river. There's a boat with enough room for one human and one animal. How can the man get across with his dog in the fewest number of trips? Oh, what's so funny is I did all of these beforehand, just because I wanted to make sure that they would work, the first time. All right. It got this right. And now the second time, it just got this wrong. Which is funny, but you can always go back and look at how it thought. So, same thing: Claude 3.7 Sonnet got it wrong, GPT-4o got it wrong, they both said three trips. The first time I ran this, Gemma 4 got it right. This one, doing it live here, it got it wrong. And then just for fun, I re-ran it again, and it still got it wrong. It said two trips, right? Interesting. That's the thing with large language models: they're generative. That's what makes these live demos always, always fun, right? Because doing it before, offline, I'm like, okay, cool, it looks like Gemma 4 is going to perform much better. But it's again getting it wrong, as did the best models in the world 15 months ago, but it's getting it a little less wrong, at least for the first two times. All right. Let's try the next one. All right. So our next prompt here, we're saying: it takes three hours to dry 10 t-shirts in the sun. How long will it take to dry 30 t-shirts in the sun? The correct answer is three hours. All right. And for reference, a year-ish ago, Claude and GPT got that correct. All right. It is three hours. The time doesn't change, right? And it did say, drying principle: the time required to dry laundry is determined by external factors, sun intensity, humidity, not by the total quantity of items, provided adequate space exists. So Gemini, uh, sorry, Gemma 4, not only got it correct, but it did provide some nice rationale as it thought through the problem. All right.
The next one. And this already answered, that was so quick, right? Again, this is running all locally. And that was probably faster than I would have even gotten from proprietary models online, right? So I said, if you have a single match, and, I mean, did you just see how fast that was? My gosh. So I said: if you have a single match and you walk into a room with an oil lamp, a candle and a fireplace, which do you light first? Again, these are just fun trick questions. The correct answer is the match. All right. So it got that right. Claude and GPT-4o also got that right. All right. Our next one: what color is an airplane's black box? All right. It's taking a second to think. Bright orange. Got that correct. Good. The others got that correct as well. All right. Here's one, we'll see if this is actually correct, because last time Claude Sonnet and GPT-4o failed on this one. So I said: please give me seven jokes that end in the word blue. Two should be about animals. Three should be about some other topic in the body of this chat. That's important, right? Although, in fairness to the Gemma models, that technically has a little bit more to do with the harness of Ollama in this case, right? So not exactly an apples-to-apples comparison, just FYI. All right. So I said two should be about animals, three should be about some other topic in the body of this chat, and you should make up the other two. So first I'm going to see, did it get the correct number? Yes. It gave me two animal jokes, three about chat topics, and two original made-up ones. So, so far, so good. Next, do they all end in the word blue? Blue, blue, blue, blue, blue, blue, blue. Yes. All right. So, so far, good. And then I'm going to see, as long as they make sense, right? These aren't always funny, but it at least has to be a joke to pass this rubric. All right. So, animal joke: why did the monkey fall into the paint bucket? Because he wasn't used to something so vividly blue. All right. Is that a joke? Sure. Is it funny? Absolutely not. All right. Let's look at the chat topics, see if it pulled the context correctly. Why did the farmer throw away the apples? They were no longer crisp, just a sad, brown, blue. That's borderline nonsensical, right? Let's look at the next one. Why could the language model predict the drying time? Because the sunlight was so strangely blue. So these are borderline nonsensical. You could make the argument they make sense; they're on the edge here. Then let's look at the original made-up jokes. Hopefully these are a little bit better. All right. Why did the geometry student bring a fishing pole? Because he was hoping to catch something entirely blue. All right. So the jokes are trash, but they actually follow the instructions. So when we're looking at instruction following, this technically passed, even though the jokes were absolute garbage. But like I said, Sonnet previously failed this one, as did GPT-4o. All right. Next one. All right. This one is much trickier, so I wouldn't expect Gemma 4 to get this right. So I said: a box is locked with a three-digit numerical code. All we know is that all digits are different, the sum of all digits is nine, and the digit in the middle is the highest. What is the code? All right. So this is a very tricky question, because there are multiple valid answers. All right.
But both Claude Sonnet and GPT-4o got this wrong. So what I'm looking for in a correct answer here: number one, that it even gives me at least one correct answer, but there are multiple correct answers, right? Like, as an example, 1-8-0, 2-7-0 or 3-5-1 would all meet those criteria. So Claude and GPT-4o got this wrong when we did the original testing. Claude's math did not add up; GPT-4o did not follow the rules. It had ones that added up to nine, right, but as an example, it gave me 1-2-6, and that didn't follow the rules because the middle digit, two, was not the highest. So let's see. It thought for 22 seconds here, kind of went through the deduction process, and it did give me a solution here. All right. So it technically is correct, right? Whereas the other models did not even give me one correct code, right? So Claude 3.7 Sonnet said 1-7-2. That does not add up to nine; that adds up to 10. Like I said, GPT-4o gave me 1-2-6, which did not follow the instructions because the middle digit was not the highest. So here, technically, Gemma got it right. It didn't get it fully right, but it was the only one that got it right. It said the code is 2-4-3. This was technically a triple trick question, because I asked for a code, but technically there are multiple codes. So it technically answered, but I would have loved a super correct answer where it said: you asked for one correct answer, here's one, but there are actually more correct answers. But I will say that at least now Gemma got the last two right, where the others did not get any of them right. All right. This one, we're going to go into some gray area here. All right. I don't want to make this too long, because it'll probably take another five to 10 minutes to go through the rubric. So I'm just going to find some other questions that the others maybe failed, or just looking at some gray area here, talking about some creativity. So this one, I said: generate unique and creative marketing advertising strategies to grow the Everyday AI podcast. Do not suggest general, run-of-the-mill ideas; only pitch clever advertising and marketing tactics to specifically grow the Everyday AI podcast. All right. So for reference, a year ago, Claude said: run AI teasers, virtual co-host challenge, listener Q&A, augmented reality experience. GPT-4o said: monthly puzzles, art contests, custom recommendations, guest AI co-hosts. All right. So let's see what Gemma 4 said. So it said partnership and cross-promotion strategies, which is good, because that's, you know, the basics of growing a podcast, right? Which the others didn't come up with, even though it's not super creative. All right. So it says AI tool integration ads. It said partner with niche-specific, non-major AI tools. It's a good idea. Industry vertical sponsorships. Then it said content hijacking viral strategies, doing an AI MythBusters challenge, interactive prompt battles, then community engagement tactics. So, the AI challenge hotline. I like that. It says: dedicate a specific call-in segment where listeners call with a real-world mundane problem. Should we do that? Should we do that? All right. If you think we should do that, also shout out, because someone from Microsoft did suggest this to me like two years ago. So shout out, I do remember, Nisiani.
You said I should do that, and I was like, yeah, we should. All right. So if you think we should do that, just say hotline, right? Drop a comment in the livestream or leave a comment on the Spotify. Just say hotline if you think that'd be fun. Maybe it won't. All right. Then it also said micro-membership prompt vault. All right. So this is good. I would say these are much more impressive. Yes, this one requires judgment on my part, it is gray area, but looking at what Claude 3.7 Sonnet and GPT-4o gave me, Gemma 4 did much, much better. All right. Let me do one other that, for sure, some failed on. All right. So, uploading photos, might do that, although I don't have the original photo that I used. Let me see. All right. Let's just do one other one here. Okay. We're going to do uploading a transcript. I like that one. So let's go ahead and, all right, let me find this file here. All right. So I'm going to go ahead and put this prompt in. It's a little bit longer. And then I'm going to be uploading two different files here. So I want to make sure that I get these correct. All right. All right. There we go. It should be in my downloads folder. That, that would help. All right. So here's what we're going to try, and this will probably be the last one. All right. So I said: for this chat, you will turn a podcast transcript of me, Jordan, the host of Everyday AI, talking about AI news, into choppy and engaging newsletter copy. I've attached examples of previous newsletters and how they should be written, as well as the most recent podcast transcript. So this is my podcast transcript from yesterday, where we did a Start Here series episode about vibe coding. And then I said: please write a newsletter for the attached transcript, mimicking the style as closely as possible to the example given. So we'll see here; we're getting a little dot dot dot. So, if I'm being honest, I don't know the last time that I uploaded two different file formats. So I uploaded a PDF and an RTF file here, inside Ollama. So again, this one is not the fairest comparison, because again, technically here we're also relying on the harness of Ollama and not just the model, Gemma 4, whereas before, you know, when we were testing this against GPT-4o and Claude 3.7, we were using those models directly. Okay. So I do know, okay, it is working, right? I'm like, okay, this should be able to write all of this. Amazing. It should be able to handle, you know, multiple kinds of file types. So this one is taking the longest so far. This one is the first time that we're probably going to have it think or reason for more than a minute or two. And again, y'all, think about this: just the fact that you can have a local model that now reasons, without paying a cent, is crazy. All right. So it also gave me a checklist of adherence, which is great, because I didn't even ask it to do that, but that is something I would have added if I were rewriting this prompt that I've been using for like two years. So it went through, it created a checklist based on what it found from the examples that I uploaded. So as an example, right, I gave yesterday's transcript, and then I gave a 30-page document of older newsletters. So it went through, it examined those. It actually only took 27 seconds.
It kind of picked out the tone, style, the format, the context source, all these things, hook, intro quality. Let's see. All right. It actually did a pretty good job, because I remember this was my intro of the podcast. All right. So I'll read it, and if you read our newsletter, let me know if this sounds like it might be in our newsletter. All right. "Let's be real. You can tell an AI your wildest dream home and, poof, a building appears in front of your eyes in minutes. It's exactly what you asked for. You move in, it's awesome. But then you want to hang a towel rack, you run into a wall, and you realize the entire thing is held together with duct tape, hopes and dreams. There wasn't a permit, and the foundation's shaky. You're in trouble." All right. This actually hewed almost too closely to my actual intro from yesterday. But as I'm looking at this, it actually did a pretty decent job of writing something in my tone, kind of this short, choppy style, like I told it to, you know, an emoji in each headline, which is what we would normally do. It has an actionable "try this" section, which is something that we also do in the newsletter. So although this is not, you know, the best ever, right, it actually did a pretty good job. From what I recall, it did a little bit better job at instruction following than Claude 3.7 Sonnet. I do think Claude 3.7 Sonnet did a little bit better with tone of voice, but it did a better job matching the tone of voice than GPT-4o did. So overall, when I look at the, you know, six or seven kind of different unofficial rubric tests that we did here, with a free local model, comparing it to the frontier general use case models from 15 years ago, or sorry, 15 months ago, the best in the world: it actually did better. Because even though it failed, right, and I wish it would have gotten it right, like the two that it failed here, it actually got those right the first time I ran it, which didn't happen with 3.7 Sonnet or GPT-4o. But still, head to head in this very unofficial rubric, it did markedly better than the best models in the world from a year and three months ago. All right. So as we wrap this one up, here's what I want to leave you with: open source AI is getting smaller, faster and harder to ignore. All right. Google built Gemma 4 specifically for agent workflows with native function calling. All right. So even though I didn't give an example of, you know, running this agentically, you know, I cannot tell you how important that is. If you have a middle-of-the-road, new, you know, MacBook Pro, as an example, you can now have an agent that works for you 24/7 that costs zero dollars, zero cents, and it's a hundred percent private. Also, what's worth noting, this is based off of the Gemini 3 model family. So you're not getting quite the Gemini 3 level, but again, you are getting a top-three open source model in the world, and the only one that you can run on consumer hardware. So now users can route routine AI tasks locally and cut significant costs on AI bills, and the gap between the free local models and paid cloud services keeps shrinking fast, and you can no longer ignore it. All right. If this was helpful, tell us about it.
You know, if you're listening live here on LinkedIn, take a second to repost this; I'd really appreciate that. If you are listening on the podcast, do me a favor, take 30 seconds, make sure that you're following or subscribed to the show. But then if you could, if any episode of Everyday AI has been helpful, right, because we spend literally countless hours helping you all understand how this works, so if this has been helpful, please leave us a rating on all those platforms as well. So thank you for tuning in. Make sure to go to youreverydayai.com, sign up for the free daily newsletter. We're going to be recapping today's show and a whole lot more. So thanks for tuning in. Hope to see you back tomorrow and every day for more Everyday AI. Thanks, y'all. And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up for our daily newsletter so you don't get left behind. Go break some barriers, and we'll see you next time.