335: EKS Network Policies: Now With More Layers Than Your Security Team's Org Chart
51 min
Dec 24, 2025
Summary
Episode 335 of The Cloud Pod covers major developments across AWS, GCP, and Azure, with significant focus on AI model releases, security enhancements, and infrastructure scaling. Key topics include Meta's shift to proprietary AI, Disney's copyright lawsuit against Google, OpenAI's new GPT models, and critical security features like EKS network policies and GuardDuty threat detection.
Insights
- Meta's strategic pivot from open-source to proprietary AI models represents a fundamental business model shift that may alienate the developer community that previously supported the company
- Cloud providers are increasingly integrating AI capabilities directly into infrastructure management tools, enabling non-experts to define and deploy complex systems through natural language
- Security is becoming a competitive differentiator with providers offering increasingly sophisticated threat detection and policy enforcement capabilities at the infrastructure level
- The race for AI supremacy is driving massive infrastructure investments in data centers, networking capacity, and GPU availability across all major cloud providers
- Copyright and licensing disputes around AI training data are becoming a major regulatory and business concern, with major content owners like Disney taking legal action
Trends
- AI model consolidation: Shift from open-source to proprietary models as companies seek competitive advantage and licensing revenue
- Infrastructure-as-code democratization: Natural language and visual tools reducing the barrier to entry for infrastructure deployment
- Security policy as code: Movement toward declarative, hierarchical security policies that can be enforced across cloud environments
- AI-driven operational intelligence: Automated analysis of system logs, thread dumps, and performance metrics using LLMs
- Network policy maturity: Layer 7 (application-level) network policies replacing traditional IP-based firewall rules in containerized environments
- Regulatory pressure on AI: Government and legal action forcing cloud providers to implement copyright detection and licensing frameworks
- Multi-region AI infrastructure: Massive expansion of data center capacity and networking specifically optimized for AI training workloads
- Identity-based cost allocation: Moving from resource tagging to user attribute-based cost tracking for chargeback and compliance
- MCP (Model Context Protocol) standardization: Emerging standard for AI agents to access enterprise data and APIs securely
- Hardware-software integration: Cloud providers tightly coupling devices with cloud services to justify premium pricing and prevent commoditization
Topics
- EKS Network Policies (Admin and Application)
- AI Model Licensing and Copyright Infringement
- GuardDuty Extended Threat Detection
- GPT Image 1.5, GPT-5.2, and Gemini 3 Model Releases
- Meta Avocado Proprietary AI Model
- Cedar Policy Language and CNCF Adoption
- Kubernetes Security and RBAC
- Cryptographic Deprecation (RC4 in Kerberos)
- AI Agent Integration with Enterprise Data
- Model Context Protocol (MCP) Servers
- Cost Allocation and FinOps
- Infrastructure-as-Code Automation
- GPU Capacity and AI Training Infrastructure
- Container Networking and eBPF
- Java Thread Dump Analysis Automation
Companies
Meta
Developing proprietary Avocado AI model, shifting from open-source strategy; spent $14.3B on AI talent and infrastruc...
OpenAI
Released GPT Image 1.5 and GPT-5.2 models; licensed Disney characters for Sora video generation platform
Google
Released Gemini 3 Flash model; integrated MCP servers into Antigravity IDE; announced Application Design Center for inf...
Amazon Web Services
Announced EKS network policies, GuardDuty threat detection, Cedar policy language CNCF adoption, and Java thread dump...
Microsoft Azure
Expanding data center footprint with new regions; tripling network capacity to 18 petabytes/second; deprecating RC4 e...
Disney
Sued Google for copyright infringement on AI training; invested $1B in OpenAI and licensed 200 characters for Sora
Cloudflare
Production user of Cedar policy language for authorization and access control
MongoDB
Production user of Cedar policy language for authorization and access control
Scale AI
Founder Alexander Wang hired by Meta; company received investment from Meta for AI infrastructure support
iRobot
Filed for bankruptcy; acquired by Chinese supplier Pichia Robotics; failed Amazon acquisition in 2023 due to EU antit...
Roborock
Chinese competitor that has gained market share from iRobot in consumer robot vacuum market
Kubernetes
Integrating Cedar Authorizer for policy-based access control; receiving new network policy features from EKS
Apigee
Google API management platform adding MCP support for exposing APIs as tools for AI agents
Looker
Google analytics platform integrated with MCP servers for AI agent access to business intelligence workflows
BigQuery
Google data warehouse integrated with MCP servers for AI agent data access and analytics
People
Alexander Wang
Scale AI founder hired by Meta to lead AI infrastructure and model development efforts
Chris Cox
Meta Chief Product Officer no longer overseeing the general AI unit after Llama 4's poor reception
Senator Ron Wyden
Called for FTC investigation of Microsoft regarding RC4 encryption default configuration in Active Directory
Ben Kehoe
Formerly of iRobot and a strong advocate for Lambda and serverless technologies; now works at Siemens
Quotes
"So their strategy is get rid of open source, go make it proprietary and we'll be successful. That's a weird, weird choice."
Host discussing Meta's AI strategy • Early in episode
"I'm sure that, you know, didn't help or didn't hurt that they would go sue the biggest competitor of theirs at the same time."
Host discussing Disney suing Google while investing in OpenAI • Mid-episode
"I'm really happy to see the improved safety features because that's just coming to, you know, the news recently and some high-profile sort of events happen where it's becoming a concern for sure."
Host on GPT-5.2 safety improvements • Mid-episode
"If you've ever had to support any amount of developers in production for the ECS and EKS, one of the biggest banes of your existence will probably be troubleshooting dump analysis."
Host introducing Java thread dump analysis solution • Later in episode
"It's one of those things that, you know, if you don't know how compromised that this cipher is, you don't really prioritize getting back to it and fixing the ciphers that are used in the encryption."
Host on RC4 deprecation in Kerberos • Azure segment
Full Transcript
Welcome to The Cloud Pod, where the forecast is always cloudy. We talk weekly about all things AWS, GCP, and Azure. We are your hosts, Justin, Jonathan, Ryan, and Matthew. Episode 335, recorded for December 16th, 2025: EKS Network Policies, now with more layers than your security team's org chart. Good evening, Ryan. How are you doing? I'm doing well. How are you? Doing well as well. You know, we're all turning into pumpkins here as the Christmas holidays rapidly approach, so this is our last episode that will publish this year. Then in January we'll drop our year-end look back and our look-ahead predictions, and see how we did on last year's predictions. That'll be our first show back in the new year. But we're taking off next week for Christmas, because we've realized trying to do Christmas shows is just bad. It doesn't work. Plus, after re:Invent, we're tired and we want to go do other things. We will definitely have to follow up on a couple things when we get back, but definitely look forward to the new year. New year in cloud. Maybe less AI. Probably not. Probably more AI. That's really how it's going to work. On that note, let's get into it, shall we? Yeah.

Meta is apparently developing Avocado, their new frontier AI model codenamed to succeed Llama, now expected to launch in Q1 2026 after internal delays related to training performance and testing. The model may be proprietary rather than open source, marking a significant shift from Meta's previous strategy of freely distributing Llama's weights and architecture to developers. Meta has apparently spent $14.3 billion since June to hire Scale AI founder Alexander Wang and acquire a stake in Scale. The company has apparently restructured AI leadership after Llama 4's poor reception in April, with Chief Product Officer Chris Cox no longer overseeing the general AI unit. Meta also cut 600 jobs in its superintelligence labs in October. So their strategy is: get rid of open source, go make it proprietary, and we'll be successful. That's a weird, weird choice. I don't know. I kind of feel like that was their real value prop, that they were open source, because a model built on Facebook and Instagram data is either probably very racist or full of thirst traps. So I'm not really sure how that works out in the model, but I just don't know if it's a really interesting, compelling story for them. Yeah, I was very surprised by the proprietary bit, and I'm just not sure. I guess I don't really understand the business of, you know, the AI models. And I guess if you are going to offer, like, a chat service or that kind of thing, you have to sort of have a proprietary model. But it's kind of strange. Yeah. So anyways, we'll keep an eye on that and see what they do, whether they actually release a new proprietary model in Q1. But apparently this is causing a lot of rifts inside their culture, so we'll see how that continues to evolve as they try to become a big player in AI. Hopefully better than their story in AR, VR, and the metaverse. Yes, yes, hopefully.

Disney has apparently sued Google for infringing copyright on a massive scale, they allege. They issued a cease and desist letter to Google alleging copyright infringement through its generative AI models, claiming Google trained its systems on Disney's copyrighted content without authorization and now enables users to generate Disney-owned characters like those from The Lion King, Deadpool, and Star Wars.
This represents one of the first major legal challenges from a content owner with substantial legal resources against a cloud AI provider. The legal notice targets two specific violations: Google's use of Disney's copyrighted works as training data for its image and video generation models, and the distribution of Disney character reproductions to end users through AI-generated outputs. The case could establish important precedents for how cloud providers handle copyrighted training data and implement content filtering in AI services. The outcome may force cloud AI platforms to develop more sophisticated copyright detection systems or negotiate license agreements with content owners before deploying generative AI. Disney's involvement brings considerable legal firepower to the battle, and so we'll see how this one goes next year. I'm sure it'll get to court. Yeah. Disney suing for copyright infringement. Shocking. Shocking. The biggest abuser of copyright out there. Not violating it, but making it last as long as possible for Mickey Mouse. Right. And then they are just renowned for being completely evil. Yes. For anyone, like, you know, really sticking to, like, parody standards for people making fun of Mickey Mouse or all kinds of things like that. Just any reference. So it does make sense that if there's going to be a major AI case that sets the precedent on what's okay and what's not for AI, it'll be brought by Disney. Yeah. Sort of weird timing, though, don't you think? You know, why in December of all times? And then this story comes across my desk: Disney is investing one billion dollars in OpenAI and has licensed 200 characters for its AI video app, Sora. Oh, okay. So this is apparently marking the first major Hollywood studio content licensing deal for OpenAI's video platform, which launched in late September and has faced industry criticism over copyright concerns. The three-year license agreement allows Sora users to create short video clips featuring licensed characters, representing a shift from OpenAI's previous approach of training models on copyrighted material without permission. This deal is notable given Disney's history of aggressive copyright protection and the lobbying that shaped modern U.S. copyright law in the 1990s. OpenAI has been pursuing content licensing deals with major IP holders after facing multiple lawsuits over unauthorized use of copyrighted training data. I'm sure that, you know, didn't help or didn't hurt that they would go sue the biggest competitor of theirs at the same time. So that's just super convenient. But it's sort of weird. It's like, well, so OpenAI is licensing these characters, but then Disney's investing a billion dollars. So is that like a, well, Disney wanted two billion, or they wanted to charge us more, but we gave them a discount because they're an investor, or something? I don't know. It's a weirdly worded story. Business relationships are never as straightforward, you know, as I've learned the more I see of the business, growing older, I guess. It's all these little hooks, trying to leverage each other for shared success, in theory. It's kind of funny. Or is it just a way to get out of the lawsuit so they can generate the content? I don't know. It's a weird product differentiator if that's the case. It is, yeah. I mean, I don't... Is Sora even a thing anymore?
Remember, it kind of had a big moment of everyone using it, and then I feel like it kind of died off pretty quickly because it was so bad in so many ways. I mean, I don't know. Do your kids like Sora? There's little tools and little websites and stuff where it sort of comes into play, but it does seem like it's sort of lost its luster. It's no longer shiny and new. Yeah. I see that Sora watermark on a video and I just swipe on by. I'm like, yeah, whatever. I don't care. Well, OpenAI, as we know, is in crunch mode, trying to deal with the existential threat brought on by Gemini, and so they released two new models for us this week. First is the release of GPT Image 1.5, their new flagship image generation model, now available in ChatGPT for all users and via the API. The model generates images up to 4x faster than the previous version and includes a dedicated Images feature in the ChatGPT sidebar, with preset filters and prompts for quick exploration. It's very competitive against Nano Banana; I was looking at some of the leaderboards, and it's already jumped to the top of the charts in image modifications, just slightly above Nano Banana. Wow. I mean, AI image generation has come a long way in a short period of time. I was pretty surprised by Nano Banana fixing some of the horribleness that I experienced in the early days of playing with these tools, so let's play around with this one and see if it's the same thing. I'm really happy to see that it also has improvements for text rendering. Yeah, the text in images has gotten significantly better on all the models. Thank goodness. I didn't realize how much I generated images with text in them, but yeah, it happens a lot. Well, the nice thing is too, in some of them you can now click in the image, like, oh, I don't want this part, and that helps the model actually do the edits you want now. So there's definitely some nice improvements coming to image editing in general. Yeah. Well, if you don't care about images but you care about ChatGPT and overall LLM performance, they released GPT-5.2, now available for paid users and via the API as gpt-5.2, with three variants: Instant for everyday tasks, Thinking for complex work, and Pro for the highest quality results. The model introduces native spreadsheet and presentation creation capabilities, with ChatGPT Enterprise users reporting 40 to 60 minutes saved daily on average. So apparently all the funding decks they had to build were getting old and they just wanted the AI to do it for them. GPT-5.2 Thinking achieves a 70.9% win rate against human experts on GDPval, a benchmark spanning 44 occupations, and set new records on SWE-Bench Pro at 55.6%. The model demonstrates 11x faster output generation at less than one percent of the cost of expert professionals on knowledge work tasks. Long-context performance reaches near-100% accuracy on four-needle MRCR runs at up to 256,000 tokens, and the new responses compact endpoint extends the effective context window for tool-heavy workloads. API pricing is set at $1.75 per million input tokens and $14 per million output tokens, with a 90% discount on cached inputs. The model introduces improved safety features, including strengthened responses for mental health and self-harm scenarios.
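If the API naming follows the announcement, calling the new model looks like any other chat completion; here's a minimal sketch, where the "gpt-5.2" model ID is taken from the episode and is an assumption, not something verified against the live API:

```python
# Minimal sketch of calling the new model from the OpenAI Python SDK.
# The model ID "gpt-5.2" is assumed from the episode; check the model list
# in your account before relying on it.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5.2",  # assumed ID; Instant/Thinking/Pro variants may have their own
    messages=[
        {"role": "user", "content": "Turn these bullet points into a slide outline."},
    ],
)
print(response.choices[0].message.content)
```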
Yeah, I mean, I'm happy to see the improved safety features, because that's just been in the news recently, and some high-profile events have happened where it's becoming a concern for sure. I want to see more protections there, and more in that space across all the providers. But I'm still very dubious of a new model from OpenAI right now, just because the last few models they've released have been almost unusable for my day-to-day. Yeah, I mean, the fact that they felt they were under threat from Gemini was because they were. And they were definitely hurting in general. So definitely interesting developments as well.

All right, let's move on to some cloud tool news. Amazon is open sourcing Cedar by having it join the... oh, sorry, it's already open source, but now it's joining the CNCF as a sandbox project. Cedar solves the problem of hard-coded access control by letting developers define fine-grained permissions as policies separate from the application code, supporting RBAC, ABAC, and ReBAC models with fast real-time evaluation. The language stands out for its formal verification using the Lean theorem prover and differential random testing against its specification, providing mathematical guarantees for security-critical authorization logic. Production adoption has been strong, with users including Cloudflare, MongoDB, AWS Bedrock, and Kubernetes integrations like the Kubernetes Cedar Authorizer. The CNCF move provides vendor-neutral governance and broader community access beyond AWS stewardship. And I'm sure this is really the key to why you do this: you want Google and Azure to adopt it, in addition to the other companies that didn't really care as much about that lock-in. But if you make it part of the CNCF, now you can push it to become a standard and become part of the larger ecosystem, which would be great, because if you could have the same Cedar-type controls across all three cloud providers, then you can make policies that actually reference things across those providers. Exactly. And I think this type of policy is going to be absolutely key to managing permissions going forward. I mean, there's already such a problem and constraint in allowing AI agents... you know, if you think about a cloud role, cloud permissions, it's not as easy to define what an AI agent can and can't do. And I think it's going to change a lot, and so policy evaluation is probably going to be very important when it comes to those things. I'm happy for Cedar to be more prevalent, because I like it better than Rego and Open Policy Agent. Yeah, I definitely like it better than OPA. So yeah, I'm with you right there.
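For a flavor of what Cedar policies look like, here's a minimal sketch; the entity types, group names, and attributes below are hypothetical illustrations, not anything from the episode:

```cedar
// Hypothetical RBAC-style rule: platform admins may manage any cluster resource
permit (
  principal in Group::"platform-admins",
  action in [Action::"create", Action::"update", Action::"delete"],
  resource
);

// Hypothetical ABAC-style rule: block restricted resources unless the
// caller's clearance attribute matches
forbid (principal, action, resource)
when { resource.classification == "restricted" }
unless { principal.clearance == "restricted" };
```

Cedar's forbid-overrides-permit semantics plus the when/unless clauses are what make attribute-based rules like the second one easy to layer on top of role-based permits.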
All right, moving on to general AWS news. GuardDuty Extended Threat Detection now identifies a coordinated crypto mining campaign, starting from November 2nd, 2025, where attackers used compromised IAM credentials to deploy miners across EC2 and ECS within 10 minutes of initial access. A new attack sequence finding correlates signals across multiple data sources to detect the sophisticated attack pattern, demonstrating how the Extended Threat Detection capabilities launched at re:Invent 2025 can identify coordinated campaigns. Attackers employed a novel persistence technique, using ModifyInstanceAttribute to disable API termination on all launched instances, forcing victims to manually re-enable termination before cleanup and disrupting automated remediation workflows. They also created public Lambda endpoints without authentication and established backdoor IAM users with ECS permissions, showing an advancement in crypto mining persistence methodologies beyond typical mining operations. The campaign targeted high-value GPU and ML instances through auto scaling groups configured to scale from 20 to 1,000 instances, with attackers first using dry-run flags to validate permissions without triggering costs. The malicious Docker Hub image (yannick65958/secret) accumulated over 100,000 pulls before takedown, and attackers created up to 50 ECS clusters per account, with Fargate tasks configured for the maximum CPU allocation of 16,384 units. AWS recommends enabling GuardDuty Runtime Monitoring alongside the foundational protection plan for comprehensive coverage, as runtime monitoring provides the host-level signals critical for Extended Threat Detection correlation. So yeah, this new thing they announced: you should use it, because it sounds pretty darn good. Yeah, I mean, the sophistication of these attacks is just so much more than the old-school days of, you accidentally check in your credentials and someone launches EC2 instances that just run for however long until you find them. It's the fact that they're creating artifacts that they can later exploit to get back into the account. And, you know, especially disabling termination. That's rough, right? Because that's one of those things you don't really know about; that's a deep feature. One of the other hacks I've heard about in the past is you change the encryption key for CloudTrail. Yeah, CloudTrail, right, that's the overall audit mechanism. Yeah, that's the overall log. Thank you. It's been a rough week. But yeah, with CloudTrail, you basically use an encryption key from a different account to encrypt the trail, and since the key lives in another account, the victim can no longer read their own logs, but it doesn't trigger the alarms you'd get if you actually turned CloudTrail off. Just changing the key was a pretty sophisticated hack. I think they fixed that. Oh, I see. Yeah, okay. So it kind of removes it as a signal. Yeah. So basically: we're still logging CloudTrail, but you can't read the data, and it's not actually providing any value, because you don't have the decryption key. Hackers use the same tools we do for development, right? And that's what we saw in the early days of AI as well: all of a sudden, all the telltale ways you identified phishing, the typos, the weird language, the images that don't look right, are all gone, because the AI just does it so beautifully for them now. Yep.
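Back on the GuardDuty campaign: undoing the attackers' termination-protection trick during cleanup is mechanical but easy to forget. A minimal boto3 sketch, where the helper and instance list are our own illustration rather than anything from AWS's writeup:

```python
# Attackers flipped DisableApiTermination on, which blocks automated cleanup.
# A responder has to flip it back per instance before terminating the miners.
# Hypothetical helper, assuming credentials with ec2:ModifyInstanceAttribute
# and ec2:TerminateInstances.
import boto3

ec2 = boto3.client("ec2")

def scrub_instances(instance_ids):
    for instance_id in instance_ids:
        # Re-enable API termination that the attacker disabled
        ec2.modify_instance_attribute(
            InstanceId=instance_id,
            DisableApiTermination={"Value": False},
        )
    ec2.terminate_instances(InstanceIds=instance_ids)
```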
Amazon EKS is now supporting admin network policies and application network policies, giving cluster administrators centralized control over network security across all namespaces, while allowing namespace administrators to filter outbound traffic using domain names instead of maintaining IP address lists. This addresses a key limitation of standard Kubernetes network policies, which only work within individual namespaces and lack explicit deny rules or policy hierarchies. The new admin network policies operate in two tiers: admin-tier rules that cannot be overridden by developers, and baseline-tier rules that provide default connectivity but can be overridden by standard network policies. This enables platform teams to enforce cluster-wide security requirements, like isolating sensitive workloads or ensuring monitoring access, while still giving application teams flexibility within those boundaries. Application network policies, exclusive to EKS Auto Mode clusters, add Layer 7 fully qualified domain name filtering to traditional Layer 3/4 network policies, solving the problem of managing egress to external services with frequently changing IP addresses, like the cloud. Instead of maintaining IP lists for SaaS providers or on-premise resources behind load balancers, teams can simply allow domain names like internal-api.companyname, and policies remain valid even when underlying IP addresses change. Requirements include Kubernetes 1.29 or later and Amazon VPC CNI version 1.21 or newer for standard EKS clusters, plus EKS Auto Mode for the application network policies with DNS filtering capability. Yeah, this is one of those things that shows a maturity level in container-driven applications, just because it's been a while, I think, since security teams became aware of some of the things you can do with network policies and routing. You want to empower your developers, but being able to comprehensively ban and approve traffic has been missing from a lot of these, basically, ingress controllers. So I think this is a great thing for security teams, and probably going to be a terrible thing for developers who have had a little bit more free rein and have been able to develop a little more freely on networks. But let's see. I mean, I've always been very conflicted about what you can do with ingress in Kubernetes in general. You've always been conflicted about Kubernetes, period. Yeah, Kubernetes completely. And this is just one of those things where we're watching it turn into what the old controls were, which were firewall-based at the network level. And so it's going to be one of those things where we have to figure out what the balance is between those protections and the flexibility. And that doesn't change depending on the technology that's actually doing the protection. So you just move the problem around to Kubernetes, and now it's being moved around more. So, yeah.
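For reference, the admin tier builds on the upstream Kubernetes AdminNetworkPolicy API. A rough sketch of the shape under the v1alpha1 schema follows; the labels and names are invented, field names vary between alpha revisions, and the EKS-specific FQDN policy CRD has its own schema we won't guess at here:

```yaml
# Hypothetical admin-tier policy: guarantee monitoring access to sensitive
# namespaces and deny all other ingress, in a way developers cannot override.
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: protect-sensitive-workloads
spec:
  priority: 10                  # lower number wins when policies conflict
  subject:
    namespaces:
      matchLabels:
        data-classification: sensitive
  ingress:
    - name: always-allow-monitoring
      action: Allow
      from:
        - namespaces:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
    - name: deny-everything-else
      action: Deny
      from:
        - namespaces: {}        # any namespace not matched above
```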
Well, if you've ever had to support any number of developers in production on ECS and EKS, one of the biggest banes of your existence has probably been troubleshooting thread dump analysis for your containers. It always becomes a problem, because the devs want to be able to connect their debuggers to the container and do their thing, and none of that really works well in a containerized environment, despite things like ECS Exec and other tools like that. So AWS is actually trying to solve that problem by giving you an automated Java thread dump analysis solution that combines Prometheus monitoring, Grafana alerting, Lambda orchestration, and Amazon Bedrock AI to diagnose JVM performance issues in seconds rather than hours. The system works across both ECS and EKS environments, automatically detecting high thread counts and generating actionable insights without requiring deep JVM expertise from operations teams. I mean, I don't have to know how to do that anymore. Does anyone have deep JVM experience anymore? Right. The solution uses Spring Boot Actuator endpoints for ECS deployments and Kubernetes API commands for EKS to capture thread dumps when Grafana alerts trigger. Amazon Bedrock then analyzes the dumps to identify deadlocks, performance bottlenecks, and thread states, while providing structured recommendations across six key areas, including an executive summary and optimization guidance. Deployment is handled through a CloudFormation template available in the Java on AWS Immersion Day workshop, with all thread dumps and AI analysis reports automatically stored in S3 for historical trending. The architecture follows event-driven principles, with modular components that can be extended to other diagnostic tools like heap dump analysis or automated remediation workflows. The system enriches JVM metrics with contextual tags, including cluster identification and container metadata, enabling the Lambda function to determine the appropriate thread dump collection method, and this metadata-driven approach allows a single solution to handle heterogeneous container environments without manual configuration for each deployment type. Pricing follows standard AWS service costs for Lambda invocations, Bedrock LLM usage per token, S3 storage, and CloudWatch metrics, which tells me that if you have a bad container that crashes a lot, you could spend a lot of money in LLM token usage analyzing the exact same crash dump every time. So do keep that in mind if you want to enable this. And is it automated? I mean, I don't know if it's automatically launched, but they do kick it off from a Grafana trigger, so the Grafana alert fires on the dump condition and kicks off the rest of the process, I think. Yeah, I wonder. It could be that it just dumps it into S3 and doesn't do the analysis, I don't know. But it's definitely one of those things for AI. Like, sweet Jesus, thread dump analysis. I've always hated it. Oh yeah, it's one of those things you have to do, but it sucks, and there's so much complexity. You have to understand the code path that's actually generating the dump, and you have to have all this context, and typically it falls on the support or operations guy who doesn't have the context and is just the middleman. So it's just a terrible job, one that AI can definitely do better than I can.
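The Bedrock step in that pipeline is easy to picture. A minimal sketch of a Lambda handler doing the analysis, where the bucket layout, prompt, and model choice are our own assumptions rather than the workshop template's actual contents:

```python
# Hypothetical sketch of the Lambda step that sends a captured thread dump
# to Bedrock for analysis and stores the report back in S3.
import json
import boto3

s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")

def handler(event, context):
    # The alert webhook tells us where the captured dump landed (assumed shape)
    bucket, key = event["bucket"], event["key"]
    dump = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode()

    prompt = (
        "Analyze this JVM thread dump. Identify deadlocks, blocked threads, "
        "and likely bottlenecks, then give prioritized recommendations:\n\n"
        + dump[:100_000]  # cap tokens so a crash-looping app doesn't burn money
    )
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model choice
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 2048,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    analysis = json.loads(response["body"].read())["content"][0]["text"]

    # Persist the report next to the dump for historical trending
    s3.put_object(Bucket=bucket, Key=key + ".analysis.md", Body=analysis.encode())
    return {"report": key + ".analysis.md"}
```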
Yeah, and even in my own work, when I get thread dumps now, I'm just like, here, I've got this thread dump. And there are even times where I've been looking at it thinking, I'm not sure what's actually wrong, and then it analyzes it and goes, oh, this is very clearly an obvious thing, and I'm like, okay, yeah, I did not connect those dots right away like you did. So, good job. I find it really powerful. I mean, I've never really felt great at analyzing thread dumps, but it is really nice to just have a solution that tells me what's wrong, because that's all I've wanted the entire time. And it turns out it's garbage collection. It's just always garbage collection. If you don't have your garbage collection tuned right, your app doesn't work.

EC2 Auto Scaling now offers a synchronous API to launch instances inside an auto scaling group. This provides synchronous feedback when launching instances, allowing customers to immediately know if capacity is available in their specified availability zone or subnet. It suits scenarios where customers need precise control over instance placement and real-time confirmation of scaling operations, rather than waiting for asynchronous results. If you've ever updated an auto scaling group and sat waiting for it to do a refresh, this is a great feature for you, because it allows you to be much more particular about how you want to do things. The API enables customers to override default auto scaling group configurations by specifying exact availability zones and subnets for new instances, while also maintaining the benefits of automated fleet management like health checks and scaling policies. The feature is particularly useful for workloads with strict placement requirements, and for implementing fallback strategies quickly when capacity constraints occur in a specific zone, which happens a lot more now that AI is using all the spot capacity. So overall, a nice little feature that I've always sort of wanted. Do you like this? I find that the knobs it's exposing are all the things I moved to auto scaling to avoid, right? I don't want to deal with any of this nonsense. And so you still have to maintain your own orchestration that understands which zone you need to roll out to, because it's going to have to call that API. Well, I mean, there are definitely times when I wanted it. I don't think I'd use it all the time, but I definitely would have appreciated it in certain scenarios. I'm always happy for another knob, that's for sure. But yeah.

AWS is now enabling cost allocation based on workforce user attributes like cost center, division, and department, imported from IAM Identity Center.
This allows organizations to automatically tag per-user subscription and on-demand fees for services like Amazon Q Business, Q Developer, and QuickSight with organizational metadata for chargeback purposes. The feature addresses a common FinOps challenge where companies struggle to attribute SaaS-style AWS application costs back to specific business units. Once user attributes are imported into IAM Identity Center and enabled as cost allocation tags in the billing console, usage automatically flows to Cost Explorer and CUR 2.0 with the appropriate organizational tags attached. This capability is particularly relevant for enterprises deploying Amazon Q Business or QuickSight at scale, where individual user subscriptions can quickly add up across departments, instead of manually tracking which users belong to which cost centers. I mean, I get why this matters for Q Business or QuickSight or those things, but also just the ability to create rules based on this data in IAM Identity Center, so that as you create new instances, you can automatically populate this data. There are lots of use cases where this gets interesting real quickly, and this is a really nice feature that I'm very happy about. Yeah. And, you know, how neat would it be to have things automatically tagged, rather than just on a policy that has to be predefined and hopefully encapsulates every business rule you need? Just having compute resources, or individual resources in the cloud, be automatically tagged based off of someone accessing it or deploying it, or who knows. Kind of neat. I like this model. I hope it sort of takes off. I hope it takes off too, because it has a ton of value in a lot of use cases. It is going to highlight me, though, because typically I ask a lot of questions of data, especially in a BI program. It's like, what is this? I'm constantly looking at things and filtering things, and now that AI is a thing, I'm just asking it questions all the time. So that's going to be a problem, because I'm going to be called out. I'm expensive.
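Once those attribute tags are flowing, chargeback reporting becomes a simple group-by in Cost Explorer. A minimal boto3 sketch, with the tag key being our guess at the naming rather than anything documented in the episode:

```python
# Sketch: user-attribute cost allocation tags show up in Cost Explorer like
# any other tag, so chargeback is a GroupBy. The tag key is an assumption.
import boto3

ce = boto3.client("ce")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2025-11-01", "End": "2025-12-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "identity:CostCenter"}],  # assumed key name
)
for group in resp["ResultsByTime"][0]["Groups"]:
    print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])
```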
Moving on to GCP. They're looking at OpenAI and saying, yeah, yeah, catch me, biatch. They released Gemini 3 Flash, and Gemini 3 Flash for enterprises, this week. Google positions Gemini 3 Flash as a frontier intelligence model optimized for speed at reduced cost. The model processes over 1 trillion tokens daily through Google's API and replaces Gemini 2.5 Flash as the default model in the Gemini app globally, at no cost to users. I mean, this in general is a pretty big improvement, not only from the cost perspective but also in overall performance, and the ability to run this on local devices like Android phones is going to be a huge breakthrough in LLM performance on device. So I suspect you'll see a lot of Gemini 3 Flash getting rolled out all over the place, because it does a lot of things really darn well. Yeah, and being able to run these smaller models is becoming at least more important and visible to me as AI becomes more ubiquitous in the little things here and there that I'm using it for. Not everything has to have a supercomputer behind it; you want quicker results. So that's kind of neat. You know, I still don't really understand any of the performance metrics they use on these things, but, you know, looks like better numbers, so, good. I mean, you'd have to go look at all the benchmarks and how the benchmarks work. I tried to read through some of them, and I can tell you they're dense. They make sense, but they're involved. And they all have different idiosyncrasies to them, just like any other benchmark does for CPUs. Google has integrated Model Context Protocol servers into its new Antigravity IDE, allowing AI agents to directly connect to Google Cloud data services, including AlloyDB, BigQuery, Spanner, Cloud SQL, and Looker. The MCP Toolbox for Databases provides pre-built connections that eliminate manual configuration, letting developers access enterprise data through a UI-driven setup process within the IDE. The integration enables AI agents to perform... oh, sorry. BigQuery and Looker connections extend agent capabilities into analytics and business intelligence workflows, and agents can forecast trends, search data catalogs, validate metric definitions against semantic models, and run ad hoc queries to ensure application logic matches production reporting standards. The MCP servers use IAM credentials and secure password storage, maintaining security while giving agents access to production data sources. And this all ties into the fact that Google now officially provides fully managed remote MCP servers that plug into Antigravity. So not only do you get this capability if you use Antigravity, you also get it through the officially supported Google MCP servers from anything that talks to MCP. So in general, they're very happy with that. Yeah, it's interesting. It's interesting that they're rolling this out as part of their IDE. I guess, you know, where else are you going to put an MCP connection? But there are a lot of IDEs, and I'm definitely not one to use specific IDEs for specific products anymore. I feel like that's an older model. I did download this IDE that I'd never heard of until this week. Well, it's new, right? I thought it was new. Yeah, I mean, it's new-ish, Antigravity. And so I'm curious to play with it.
I'm not super interested in moving off of my tried-and-true Visual Studio Code that I've used forever now, but I'm always willing to try a new IDE out once. That's how I got to Visual Studio Code in the first place, because I used other tools before that one. So definitely interesting. But in addition to all of the Google Cloud direct stuff, you also get access to things like the Google Maps platform for location grounding, which is pretty nice. So even some of the SaaS applications are giving you something as well. And then if you're like, well, MCPs are great, but I really could use this in Apigee: you can also get Apigee support for MCP, allowing organizations to expose their existing APIs as tools for AI agents without writing code or managing the MCP server itself, with Google handling the infrastructure, transcoding, and protocol management while Apigee applies its 30-plus built-in policies for authorization, authentication, and security to govern agentic interactions. So a lot of MCP love today in the Google Cloud world, all integrated, of course, into the ADK, the MCP proxies, and the security capabilities of Apigee. I just did some real-time analysis on the MCP features, and also the browser and stuff. It's one of those things where it is the newer model of coding, where you have distributed agents doing tasks, and the new IDEs are taking advantage of that. And it is a VS Code fork, so it's very comfortable for you VS Code users. Just like VS Code or Cursor. Yep, exactly. Cursor. I see: import from VS Code. I just happened to open it, because I'm like, oh yeah, I installed that earlier and didn't actually run it, and so I did hit that while we were talking about it. And I do see it's like, import all your stuff, and then how do you want to use your agents, et cetera. So, yeah, okay. Makes sense. Everyone's just using VS Code. That's the new thing. It's like everyone uses the Chrome engine now for their browser. Exactly. Yeah.
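If you haven't poked at MCP from code, the client side is pretty small. A minimal sketch using the reference Python SDK, where the server command and flags are placeholders for whatever toolbox binary you run locally, not a documented Google interface:

```python
# Minimal MCP client sketch: connect to a local MCP server over stdio and
# list the tools it exposes. The "toolbox" command and its arguments are
# placeholders, not a documented interface.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(command="toolbox", args=["--tools-file", "tools.yaml"])
    async with stdio_client(params) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            result = await session.list_tools()
            for tool in result.tools:
                print(f"{tool.name}: {tool.description}")

asyncio.run(main())
```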
Well, Google's Application Design Center has now reached general availability as a visual, AI-powered platform for designing and deploying Terraform-backed application infrastructure on GCP. The service integrates Gemini Cloud Assist to let users describe infrastructure needs in natural language and receive deployable architecture diagrams with Terraform code, automatically registering applications with App Hub for unified management. The platform addresses platform engineering needs by providing a curated catalog of opinionated application templates, including specialized GKE templates for AI inference workloads using various LLM models. Organizations can bring their own Terraform configurations from Git repositories and combine them with Google-provided components to create standardized infrastructure patterns for reuse across development environments. The GA features include public APIs and gcloud CLI support, VPC Service Controls compatibility, and GitOps integration for CI/CD workflows. The service offers application template revisions as an immutable audit trail and detects configuration drift between the intended design and deployed applications to maintain compliance. The platform is available free of cost for building and deploying application templates, with pricing details at the website in the show notes. That's a pretty nice one. Yeah, it's kind of the panacea that everyone's been hoping for for a long time, and with technology, I guess AI, making it possible: just being able to speak your infrastructure into existence in plain text, versus having to know the specific HCL language and how to call it, and doing all the research on which modules are available and linking it all together. Yeah. It'll be interesting to play with this. I'm definitely going to take a note to follow up on this one. Yeah. I mean, I definitely like this model better than, like, the Beanstalk or the sort of hosted application model, which has kind of been the solution up until this, right? You don't have the infrastructure expertise to make these things work, so you use a hosted platform. This is the answer I want, which is: I don't really want to create a whole bunch of underlying infrastructure configuration and maintain it into existence if I don't have to, but I do want the flexibility and the certainty that it provides, having it templated and something that I can apply policy against and review.
That's very cool. In Azure world, if you are familiar with Kerberos, and the curse of Kerberos, you are probably very familiar with Windows RC4-based authentication requests and the risk of such things. Microsoft is finally killing and deprecating RC4 encryption, after only 26 years of default support, following its role in major breaches, including the 2024 Ascension healthcare attack that affected 5.6 million patient records. The cipher has been cryptographically weak since 1994 and has enabled Kerberoasting attacks that have been compromising enterprise networks for over a decade. Windows servers have continued to accept RC4-based authentication requests by default, even after AES support was added, creating a persistent attack vector that hackers routinely exploited. Senator Ron Wyden called for an FTC investigation of Microsoft in September for gross cybersecurity negligence related to this default configuration. The deprecation addresses a fundamental security gap in enterprise identity management that has existed since Active Directory launched in 2000, and organizations using Windows authentication will need to ensure their systems are configured to use AES encryption and disable the RC4 fallback to prevent downgrade attacks. This change affects any organization running Active Directory for user auth and access control, particularly those in healthcare, finance, and other regulated industries, or really anybody who uses Windows. I mean, this has been such a problem for so long. So, about time. Thanks, Senator Ron Wyden, for, you know, the gross negligence claims against Microsoft. I mean, finally, but if that's what it takes to motivate them, that's not great. Literally an act of Congress, right? Well, an act of the Senate, I guess. But yeah, AD is so complex, and it's hard to get running in the beginning, so almost everyone just accepts the defaults to get it up and going. And then this is one of those things that, you know, if you don't know how compromised this cipher is, you don't really prioritize getting back to it and fixing the ciphers that are used in the encryption. So I'm really happy to see this. It's always been sort of this weird black mark that makes me not trust, you know, Windows in general. Well, I mean, this is the problem with not being secure by default. Even when they supported TLS 1.2, they would accept TLS 1.0 first, and to disable that you had to go turn it off in the registry and all these things. It's like, no. Why do you fall back to the less secure ones, versus the other way around? Then it becomes an opt-in versus an opt-out. And that was one of the reasons why people always said Windows was less secure and all that. So it's good. But yeah, to have a senator shame you into this is kind of sad.

Well, last week we saw S3 get 50-terabit bucket support, and so Azure has also announced that Azure Blob Storage will now scale similarly, to 50-plus terabits per second of throughput and millions of IOPS, specifically architected to keep GPUs continuously fed during AI training workloads. The platform powers OpenAI's model training and includes a new smart tier preview that automatically moves data between hot, cool, and cold tiers based on 30-to-90-day access patterns.
Azure Ultra Disk delivers new sub-0.5-millisecond latency, a 30% improvement, on Azure Boost VMs, scaling to 400,000 IOPS per disk and up to 800,000 IOPS per VM on the new Ebsv6 instances, and the new instant access snapshots preview eliminates pre-warming requirements and reduces recovery times from hours to seconds for Premium SSD v2 and Ultra Disk. Azure Managed Lustre (AMLFS) 2.0 preview supports 25-petabyte namespaces and 512 gigabytes per second of throughput, featuring auto-import and auto-export capabilities. Azure Files introduces Entra-only identity support for SMB shares, eliminating the need for on-premise AD infrastructure and enabling cloud-native identity management, including external identities for Azure Virtual Desktop. And Storage Mover adds cloud-to-cloud transfers and on-premise NFS to Azure Files migration capabilities. Wow, that's a lot of good stuff. It just dawned on me as you were reading through here, because I was like, this is interesting: getting all this high performance from object stores just sort of blows my mind. And then I realized that all these cloud file systems, quote unquote, have been backed underneath by these object stores for a long time. Like, oh, of course they need this. Yes, of course they do. Because I was wondering why they were talking about the managed Lustre and these file-system-based things in this announcement, and then I'm like, wait a second. Oh. You have to have that large object store in order to provide the file systems on top of it. Uh-huh. Because I've always wondered how people were using object stores at high performance. Like, how do you use a 50-terabit blob? I don't know how you do that. Ah, now I do: you don't, directly. Yeah, you don't, directly. Well, I mean, even when they first launched EBS, everyone kind of knew that it was built on top of S3, but how it was done, people were like, I don't really know how that works. Oh, we understood it was, and the performance really told you, didn't it? Yeah, it wasn't great. In the beginning, yeah. It's pretty good now, right? But it is interesting. And it's just funny; my eyes are opened all of a sudden.

Well, the insatiable thirst for power and AI supremacy is driving Microsoft to expand its US data center footprint, with a new US East 3 region launching in greater Atlanta in early 2027, plus adding availability zones to five existing regions by the end of 2027. The Atlanta region will support advanced AI workloads and features zone-redundant storage for improved application resilience, designed to meet LEED Gold standard certifications for sustainability. The expansion adds availability zones to the North Central US, West Central US, and US Gov Arizona regions, plus enhances existing zones in US East 2 (Virginia) and South Central US (Texas). Azure Government customers get dedicated infrastructure expansion, with three availability zones coming to US Gov Arizona in early 2026, specifically supporting defense industrial base requirements. This all represents a pretty large infrastructure investment to support organizations like the University of Miami, which uses availability zones for disaster recovery in a hurricane-prone region, and many other use cases that they highlight in their article. Yeah. I mean, AI is definitely driving a lot of this, right?
Just the need for space in general, but then also, like, large data sets: you don't really want those distributed globally, right? And so that is sort of a trick, but then I also think they're just purely running out of space. So, kind of nuts. And the government is probably adopting compute and cloud providers like crazy, instead of maintaining their own private data centers, taking advantage of these cloud hyperscalers. That's cool. Well, if you have all those data centers, and you have all that storage moving around at all that speed, you need to also upgrade your network. And so Azure is tripling down on AI infrastructure with its global network, now reaching 18 petabytes per second of total capacity, up from six petabytes per second at the end of FY24. So 3x in two years. The network spans over 60 AI regions, with 500,000 miles of fiber and four petabytes per second of WAN capacity, using InfiniBand and high-speed Ethernet for lossless data transfer between GPU clusters. NAT Gateway Standard v2 enters public preview with zone redundancy by default at no additional cost, delivering 100 gigabits per second of throughput and 10 million packets per second, and this joins ExpressRoute, VPN, and Application Gateway in offering zone-resilient SKUs as part of Azure's resiliency-by-default strategy. Security updates include DNS security policy with threat intel, now generally available, for blocking malicious domains; Private Link direct connects, in preview, for extending connectivity to any routable private IP; and JWT validation at Layer 7 in Application Gateway, in preview, to offload token validation from your back-end servers. ExpressRoute is getting 400-gigabit Direct ports in select locations starting in 2026 for multi-terabit throughput, while VPN Gateway, now generally available, supports five gigabits per second on a single TCP flow and 20 gigabits of total throughput across four tunnels. Private Link scales to 5,000 endpoints per VNet and 20,000 across peered VNets. And container networking improvements for AKS include eBPF host routing for lower latency, pod CIDR expansion without cluster redeployment, WAF for Application Gateway for Containers now generally available, and Azure Bastion support for private AKS cluster access. That's a lot of networking and stuff, too. Yeah, that's a great announcement. I mean, if you have those high network throughput needs, that's fantastic. It's been a while since I've really gotten into cloud at that deep a layer, but I do remember, in AWS, the VPN limitations really biting on certain connectivity things, because it was easy to hit those limits pretty fast, and, you know, Direct Connect and other things came along and fixed some of that. But then I'm sure you can exceed those as well. I do like some of these things they tacked on, like the DNS blocking of malicious domains. That's kind of great, because it's a great way to protect your environment, so that anything malicious, the outbound call just doesn't work, which is fantastic. I like to see those types of things. Very cool. Well, Ryan, we made it. It's the end of the show. Yes, we have. All right. Well, happy holidays to all of our listeners, and we will see you in the new year with our look back and look forward show. And hopefully the cloud providers take a little time off too, because they've been busy. Yeah.
We cut a lot of stories this week that Ryan and I didn't want to talk about. Really, nothing super exciting, just lots of didn't-make-the-re:Invent-cutoff type stories from AWS, and they didn't make the cutoff for a reason. Right. But yeah, it's all good. So we look forward to seeing you in the new year, and have a great one, all of you. Happy holidays, everybody. And that's all for this week in cloud. We'd like to thank our sponsor, Archera. Be sure to click the link in our show notes to learn more about their services. While you're at it, head over to our website at thecloudpod.net, where you can subscribe to our newsletter, join our Slack community, send us your feedback, and ask any questions you might have. Thanks for listening, and we'll catch you on the next episode.

Well, I do have an after show for you that we had to talk about. So when I talk to you about AWS Lambda and serverless and all of those things, is there a company that comes to mind for you as a big poster child for using those types of technologies? Oh, certainly. Yeah. Roombas. Right. That was the big Lambda success story. Their logo has been on many of my slide decks touting serverless technologies and patterns. Well, it might not be anymore. They've apparently filed for bankruptcy, marking the end of an era for the company that pioneered consumer robotics with the Roomba, which is now being acquired by its Chinese supplier, Pichia Robotics, after forever losing ground to cheaper competitors. The stock crashed from Amazon's $52 offer in 2023 to just $4, showing how market leaders can fall when undercut on price. The failed Amazon acquisition in 2023, due to EU antitrust concerns, looks particularly painful in hindsight, as iRobot might have been better off with Amazon's resources than facing bankruptcy. This highlights how regulatory decisions intended to preserve competition can accelerate a company's decline instead. And for cloud professionals, this shows how hardware IoT companies struggle without strong cloud services and the ecosystem lock-in that could justify premium pricing. iRobot's inability to differentiate beyond hardware shows why companies like Amazon, Google, and Apple integrate devices tightly with their cloud platforms. The Chinese supplier takeover raises questions about data privacy and security for the millions of Roombas out there, so maybe it's time to retire them out to pasture if you're still using them, and this could become a cautionary tale about supply chain dependencies and what happens when your manufacturer becomes your new owner. iRobot was founded in 1990 and sold over 40 million robot devices, and this is kind of a sad day, especially if you're a big serverless fan, because they were definitely the poster child of all things serverless. I totally forgot about Amazon's bid and that it failed, because that's kind of nuts. It really does show that the business was at least trying to find a situation they could have gotten out of. But they also just had their lunch eaten; there are a billion competitors now, and they were never able to really keep up and stay ahead of the game competitively, right? And I don't think this is really a takedown of serverless or anything like that, because it really was the combination of the hardware devices being sort of the same, you know, thinking about Ecovacs and some of these other competitors that have come through more recently.
The hardware was all just sort of basic and the same. They never really changed there. And then it was just more expensive than anything else. That was it. Yeah, I mean, when they got to like eight, nine hundred dollars for what they were able to do, it was sort of like, oof, this is getting pretty expensive for kind of a mediocre product in some ways. And some of the new vacuums and stuff are all vision-based, and I vacuum in the middle of the night, so I kind of liked the older iRobot ones, because you don't have to have the lights on. But if you don't have the lights on, your house is particularly dark, because I don't like sunlight, and those things don't work at night. I mean, you mentioned before the show that you don't sleep well at night. Maybe it's the vacuums. Maybe you shouldn't have run those at night. I'm just putting it out there. Yeah, yeah. You know, I like my house to be sort of like The Jetsons: it sort of resets overnight, and everything gets washed and everything, and I wake up during the day and it's like a brand new day. It's interesting, because, yeah, I was just looking at Wirecutter, who, the last time I bought a robot vacuum, which was years ago now, was iRobot all the way, and now iRobot's not even on the list. Their top pick is some company called Roborock, and their runner-up pick is a company called Tapo. I've heard of neither of these companies. I have not heard of Tapo. I've heard of Roborock. I'd heard of Ecovacs, and I knew of Narwal only because I bought their mop, because I was super excited about it, and their mop wasn't very good. Yeah. I tried buying it. It's in a closet now. Yeah. When we moved to this house here, we had the highest-end iRobot that you could buy, and my house is a single-story house, you know, like 3,400 square feet, and it couldn't keep the map of the house in its memory. It would basically fail somewhere randomly in the house, and I had to go hunt for it, like, where did the robot die? And so that was what killed the Roomba era of this house. So, you know, it's definitely sad it's gone, but when I was looking at buying another one, I was like, I'm not going to spend another $800 on a product that isn't really innovating and wasn't changing a lot. So if I were interested in replacing it, which I am not at the moment, I would definitely look elsewhere. Yeah. Which is why iRobot's filing bankruptcy: because you and I are having this conversation going, yeah, we wouldn't buy one right now either. No. And for the 800 bucks, I'd rather buy like three cheap ones, you know, and have them just fight it out at the boundaries, like robots to the death. But at least my floors would be clean. It gets really frustrating, though. It was one thing when it got stuck under a chair or something mechanical where it couldn't get out; you're like, okay, this is just silly. But it would just stop in the middle of the floor and go, I don't know where I am. I can't do anything else. Like, what are you doing? That's what happened to mine. I would get a message saying, you know, unknown fault on my robot, and you'd Google the message, and you're like, oh, the memory space ran out on the robot because the floor plan's too big. And you're like, oh, okay.
Yeah, that's how I figured out my mapping issues, because the maps kept resetting in the same way, and it was because it was too dark. It didn't have enough data points to say, oh, here I am in the map, and so it would be like, oh, I'm somewhere new, and it would just reset the whole thing. So frustrating, for sure. Well, Ben Kehoe was really one of the big reasons why I was really into iRobot as well, and he was a strong advocate for all things Lambda. He's been gone for a while, I think. That was our joke: he wouldn't apply to Amazon directly, so they were just going to have to buy the company, and then, when Amazon was going to buy them, it didn't come through. But I just looked him up; I was curious what he's up to these days. He's over at Siemens, another PLC-based robotics company, so I'm sure he's having a good time. We should reach out to him someday and see what he's up to. But, well, RIP iRobot, if they don't survive. Or, you know, I assume this new Chinese company will keep developing, and maybe they'll bring the cost down and make it better. And I'll be curious if they're still a big Amazon customer, or if they start changing their technology to be less dependent. So we'll see what happens. Alibaba Cloud? It'll be an uphill battle in the US market, because people hear Chinese supplier and panic. Yeah, I mean, all those brands, Roborock and Ecovacs, all of those, they're all Chinese vendors too. Narwal is, for sure. It cracks me up. It's just, the minute it gets branded as such, that's it. Yeah, it's all over for it. Yeah. All right, well, have a great Christmas. I'll see you in the new year. All right, later. Bye now.