Risky Business #835 -- Why the Fast16 malware is badass

66 min

Apr 29, 2026

Summary

Episode #835 examines distillation attacks on frontier AI models, the Fast16 malware mystery solved after 10 years, and emerging threats including private inference needs for sensitive workloads. Guest co-host Dmitri Alperovitch discusses US-China AI competition, chip restrictions, and the importance of compute as America's competitive edge.

Insights

Distillation attacks are a more significant threat to US AI dominance than chip restrictions alone; Chinese models improve 3-6 months after US model releases through stolen reasoning traces and post-training optimization
The real moat in AI competition is compute access and model quality, not hardware vendor lock-in; frontier labs reportedly ported from NVIDIA to TPUs/Trainium with about 10 people in a couple of months, which undercuts the chip addiction argument
Private inference infrastructure is becoming essential for security-sensitive work (exploit development, incident response, EDR audits) where data cannot be exposed to frontier labs due to legal/confidentiality constraints
Local open-source models are approaching frontier capabilities with proper hardware and evaluation; a 6-month lag exists but optimization continues to improve performance and efficiency
Supply chain attacks and malware persistence remain critical threats; patching alone is insufficient—firmware reflashing and assuming compromise are necessary for infected network devices

Trends

Post-training optimization becoming primary driver of AI model improvements, making it attractive target for distillation attacks
Shift toward private inference services (Tinfoil SH, WhatsApp's private compute) as enterprises require confidential AI workloads
Chinese AI innovation in mixture-of-experts architectures (Kimi, DeepSeek) demonstrating independent capability development beyond pure distillation
Increased use of AI by threat actors for recruitment, social engineering, and code generation in campaigns (North Korea IT worker scams)
Hardware supply chain vulnerabilities (TSMC manufacturing Huawei chips via shell companies) becoming critical national security concern
Regulatory focus on SS7/Diameter vulnerabilities in telecom networks for location tracking and SIM jacking attacks
Malware persistence techniques evolving to survive patching; firmware-level compromise requires complete device reflashing
Ransomware specialization on specific vulnerable products (Akira targeting SonicWall) proving more effective than general capabilities
Logging and telemetry systems becoming unintended vulnerability vectors (Apple notification database logging issue)
Reverse-engineering security patches and court documents to identify zero-day vulnerabilities becoming standard attacker methodology

Topics

AI Model Distillation Attacks
US-China AI Competition and Chip Restrictions
Private Inference and Trusted Execution Environments
Fast16 Kernel Malware and Shadow Brokers
Frontier AI Model Capabilities (Opus, Claude, Gemini, Mythos)
Open-Source AI Models (Kimi, DeepSeek, Qwen)
Supply Chain Security and Malware Persistence
Telecom Network Vulnerabilities (SS7/Diameter)
Location Tracking and SIM Jacking Attacks
Ransomware Specialization and Economics
GitHub Security Incidents and RCE Vulnerabilities
Apple Notification Database Logging Vulnerability
North Korean IT Worker Recruitment Campaigns
Cisco ASA Firewall Compromise and Incident Response
WhatsApp Private Inference Architecture

Companies

DeepSeek

Chinese AI model company accused of distillation attacks against US frontier models; improving rapidly through stolen...

Anthropic

Frontier AI lab developing Claude models; subject of distillation attacks and unauthorized access incident via Discor...

OpenAI

Frontier AI lab developing GPT models; mentioned as target for distillation attacks and comparison point for model ca...

Google

Develops Gemini frontier models and TPU hardware; successfully ported from NVIDIA chips demonstrating non-vendor-lock-in

NVIDIA

GPU manufacturer; subject of debate over chip export restrictions to China and claims of customer lock-in

CrowdStrike

Co-founded by guest Dmitri Alperovitch; mentioned in context of his background in cybersecurity

Kaspersky

Security firm that published research on Fast16 malware and Lotus Wiper ransomware; employs researchers Vitaly Kamluk...

Meta

Implemented private inference system for WhatsApp; subject of Trail of Bits security audit for private compute archit...

Apple

Developed private compute cloud; referenced as leading implementation of private inference; had notification database...

Cisco

ASA firewall devices compromised via vulnerability; Firestarter backdoor malware persists after patching

Huawei

Developing Ascend chips manufactured by TSMC via shell company; subject of chip export restriction discussions

TSMC

Semiconductor manufacturer; conned into manufacturing Huawei Ascend chips via shell company arrangement

Alibaba

Develops Qwen open-source AI models approaching frontier capabilities; focused on locally deployable models

GitHub

Had multiple security incidents including merge queue regression and RCE vulnerability in GitHub Enterprise

Vercel

Hosting platform that discovered unauthorized customer account access during incident response; implemented environme...

Tinfoil SH

Private inference service provider offering secure model hosting; subject of Trail of Bits security audit

Citizen Lab

Published research on SS7/Diameter location tracking attacks against telecom networks and SIM jacking campaigns

Wiz

Security firm that disclosed RCE vulnerability in GitHub and GitHub Enterprise overnight

Silverado Policy Accelerator

Washington DC think tank where guest Dmitri Alperovitch serves as chairman

Trail of Bits

Security engineering firm conducting private inference audits and using private inference for sensitive client work

People

Dmitri Alperovitch

Co-founder of CrowdStrike; guest co-host discussing US-China AI competition, chip restrictions, and testified to Hous...

Patrick Gray

Primary host conducting interviews and moderating discussion on security news and trends

James Wilson

Co-host analyzing distillation attacks, AI models, and security incidents; published solo 75-minute podcast on distil...

Dan Guido

Discussed private inference infrastructure, WhatsApp audit, and secure enclave implementations for sensitive workloads

Vitaly Kamluk

Co-authored research solving 10-year Fast16 malware mystery using Lua interpreter analysis technique

Juan Andrés Guerrero-Saade

Co-authored Fast16 malware research and identified LS-DYNA targeting for subtle mathematical calculation errors

Nicholas Carlini

Interviewed for Risky Business Features podcast on Mythos capabilities and local model development timelines

Michael Kratsios

Issued memo to executive agencies regarding distillation attacks against US frontier AI models

Jensen Huang

Referenced regarding arguments about customer lock-in and selling chips to China for AI development

Sergei Mineyev

Developed malware hunting techniques used to discover Fast16; passed away recently

Andy Greenberg

Co-authored article on North Korean AI-assisted recruitment and coding campaigns

Matt Burgess

Co-authored article on North Korean AI-assisted recruitment and coding campaigns

Kim Zetter

Published detailed analysis of Lotus Wiper malware targeting Venezuelan state petroleum company

Dan Goodin

Published article on Kyber ransomware marketing quantum-ready encryption capabilities

Joseph Cox

Referenced as security journalist who avoids phone connectivity for privacy; co-founded 404 Media

Quotes

"Post-training is becoming really, really important and is driving most of the improvements in models today. So it makes sense that if you steal a bunch of reasoning traces from our models and then use them for post-training, that the Chinese AI models would become quite good."

Dmitri Alperovitch•Early in news segment

"Chips ain't cocaine. They're not that addictive. The top two frontier AI models right now, Claude and Gemini, weren't even trained on NVIDIA chips. They moved off of NVIDIA very, very quickly."

Dmitri Alperovitch•During chip restriction discussion

"It's all about the models, not the chips. Did we win? Like, is that what victory looks like? Of course not."

Dmitri Alperovitch•During chip restriction debate

"These things, they operate in little trusted execution environments. They're a little virtual machine inside of either a CPU or a GPU that is completely separated from any infrastructure that the cloud provider or the vendor can access."

Dan Guido•Sponsor interview on private inference

"I would rather have something that does like 10,000 tokens per second and is completely stupid because then I can scale out a huge agent workflow, run like 50,000 checks on whatever it is that I'm working on and get some trust out of the process."

Dan Guido•Sponsor interview on model performance tradeoffs

Full Transcript

Hey everyone and welcome to Risky Business. My name's Patrick Gray. We've got a fabulous show for you this week. Mr. Adam Boileau is still traveling and not here with us at the moment. So we have a special guest co-host, which is Mr. Dmitri Alperovitch, who will be joining me and James Wilson in just a moment to talk through the week's security news. Dmitri, of course, many, many years ago was the co-founder of CrowdStrike. But these days he is the chairman of the Silverado Policy Accelerator, a Washington, D.C.-based think tank. But he is still very much someone who pays attention to events in cyber. So he's graciously agreed to join us to do this week's show, which he does sometimes. And we sure do appreciate it. This week's show is brought to you by Trail of Bits, which is a security engineering firm based in the United States. And Dan Guido will be joining us this week to talk through private inference. So he did a talk at Unprompted a while ago. And one of the interesting things he said was he wasn't really that keen on buying hardware for Trail of Bits to run local models privately when you can actually just basically rent private inference hardware, you know, as a service, right? And I thought, oh, that's an interesting topic for a conversation. So he's joining us in this week's sponsor interview to talk through how that works. Tinfoil SH is one of the services that they use. But they also did some work looking at WhatsApp's private inference approach. And yeah, it's just generally an interesting topic. So that is coming up after this week's news, which starts now. And Dmitri, we're going to start with you. It's really funny, man, because we've got, you know, we've had guest co-hosts the last couple of weeks, and the topics just keep miraculously lining up with whoever we've scheduled to be our guest co-host.
Like last week we had Grok and there was all this Groky sort of stuff to talk about. This week there's all this sort of very much Dmitri Alperovitch. All the Dmitri stuff. Yeah, exactly, right? It was meant to be. So we're going to start with this first item here, which is the State Department kicking up a huge stink about distillation attacks against frontier models in the United States. So they're saying that DeepSeek are a bunch of dirty, thieving scoundrels, and it will not stand. And yeah, what's the go here? Well, not just the State Department. This is really an all-of-government approach. Michael Kratsios, who heads the Office of Science and Technology at the White House, also put out a memo to all the executive agencies that they released publicly. Look, this is a big deal, and I'm glad that the government is now paying attention to this. I've talked to a number of frontier companies and their researchers. They actually believe that most of the progress that you're seeing currently from the main Chinese AI models, Kimi, DeepSeek, Qwen, is actually coming from two things. It's coming from distillation of U.S. models and obviously smuggling, or even legitimately buying, NVIDIA chips that you use to do this post-training. And by the way, post-training is now becoming… Hey, hey, hey, hey, whoa, whoa, whoa. Hey, hey, how dare you suggest that? I mean, the Chinese trained these things on a bunch of old like 486s they had lying around in a garden. They trained them on a potato, Dmitri. What are you talking about? That's crazy talk. As we know, potatoes are fantastic at matrix multiplication. So they're just perfect for training this stuff. Yeah. No, look, I mean, as we know, even the Chinese AI researchers are saying that they're training on NVIDIA chips. Now they're claiming that they're buying them legitimately. But this is not a secret.
But look, what's actually really important is that if you talk to frontier AI companies here in the U.S., they will tell you that post-training is becoming really, really important and is driving most of the improvements in models today. So it makes sense that if you steal a bunch of reasoning traces from our models and then use them for post-training, that the Chinese AI models would become quite good. Still behind us, but behind in a matter of months, which makes sense, right? New Opus 4.7 comes out. They do a bunch of distillation attacks on it. And then three to six months later, oh, my God, a new Kimi model is out. That's really good. I wonder where that came from. Yeah, yeah. I mean, we also saw the news. We got a bit of insight into how some of this chip smuggling is happening, right? We have a Supermicro co-founder, you know, personally assisting in using a hairdryer to remove labels from hardware, to stick them on other boxes and whatever, to move this stuff into China. I feel like the restrictions on chips are only ever going to be so successful. And the best you're ever going to do is slow the Chinese down. Now, you did some testimony to the U.S. government. We'll get into that in a minute. Your testimony was on this topic. But, James, you've actually recorded and will publish today a 75-minute deep dive solo podcast on distillation attacks. And I think the conclusion that you arrived at when going through this exercise was that really the moat, America's moat, America's edge in the AI war really does come down to compute. It comes down to who has the most hardware. Yeah, absolutely. It was a really interesting deep dive that I did, Pat, because it surfaced that. It also surfaced that distillation is not just for attackers. This is a legitimate technique that's actually used in training.
But for an adversary it's such a delicious economic sweet spot. You can take these freely available open-weight models and, with a relatively small data set compared to the massive data set you'd need for an actual frontier model to be trained, you end up with some really high-quality results, and results on smaller models as well. So that does help with the fact that when you're limited in your compute and your RAM, you can still run these things. But yes, everything's converging, everything's becoming commoditized. But the thing that still remains the huge limiting factor is getting the access to the GPUs and increasingly the RAM that's needed to host these models for the inference. Now, you know, we describe people as dovish or hawkish. I sometimes describe Dmitri as being less of a hawk and more of a pterodactyl when it comes to topics like this. You gave some testimony recently. Was it a Senate or a congressional committee? It was the House Select Committee on the Chinese Communist Party. Yeah, okay. So I'm guessing you went there and said, hey, it's fine. We should sell the Chinese our hardware. They're our friends, right? Was that basically what you said to them? Look, I basically said that we're in an AI race. No one is disputing that. Last time we were in a great power race was a space race. And selling chips to China is akin to selling rockets to the Soviet Union to get to the moon faster, right? It just makes no sense whatsoever. And Chairman Moolenaar actually asked me, well, what about this argument that NVIDIA uses that they'll get addicted to our chips? And I said, you know, Chairman, with all due respect, chips ain't cocaine. They're not that addictive. And look, you know, the top two frontier AI models right now, Claude and Gemini, weren't even trained on NVIDIA chips. They moved off of NVIDIA very, very quickly.
I'm told it was a matter of about 10 people in a couple of months to actually port to TPUs in the case of Google, Trainium in the case of Claude. So this is not something that you can actually get them to be addicted to. And the talking points from folks like Jensen and others is, well, wait, we want China to use the American AI stack. And my view is the American AI tech stack is not chips. You know, if you want to talk about the tech stack, it's chips in American cloud providers running American models. That's the tech stack. But selling China chips so that they can steal reasoning traces from American cloud models, AI models, and then use it to train their own models that will be running in Huawei data centers, that's not winning. And I actually had a little bit of a debate on X with Sriram, who is now, I guess, the acting AI czar now that David Sacks has moved on in the White House, where I said: imagine the situation where we actually get China addicted to NVIDIA chips, which is impossible, but let me just suppose that they would, and they train the very, very best models and the entire world is running on Kimi or DeepSeek and applications are built on top of them, but they're all running on NVIDIA chips. Did we win? Like, is that what victory looks like? Of course not. It's all about the models, not the chips. Yeah, I think also the argument that, well, if you restrict chips from China, they're just going to start developing their own. I mean, that's a heavy lift. I think for people who are really deep in the semiconductor space, like, that is a heavy lift. Well, by the way, they're doing that anyway. And most of them, this is a story that should be getting more publicity, but it's not. So they have the Huawei Ascend chips that are not even equivalent to the Blackwells in any shape or form. It's similar to the H20 from NVIDIA, but not even as good as that. But most of the Ascend chips were not made in China.
They were actually made by TSMC, who was kind of conned by a shell company to build them for Huawei. And, of course, that's now hopefully been stopped. So they can't even manufacture these chips when they design them. By the way, one other point on distillation to come back to that topic. This is why, and many of the listeners probably have noticed that, when you go to Claude, when you go to ChatGPT, you no longer see the chain-of-thought details that you used to see on these particularly deep research type of experiments. This is why, because they know that their models are being distilled. If you show all that information, it helps a lot for China to actually catch up quickly. Well, James and I were chatting about that, and your opinion on that, James, was it helps a little, more than a lot, right? Yeah, I mean, look, it makes sense why they've done it, but there are just so many other signals that are still going to be there to distill these models. And so, yeah, it helps a little, but it certainly hasn't closed the door on distillation. And just to really bring into focus that point Dmitri made about, are the chips the thing that people get addicted to? One of the things I covered in the pod that's coming out today is that the actual act of distilling a model is about 10 lines of Python code. Like this is so well embedded into frameworks and existing tooling that you could imagine that there is only probably one line in that 10-line Python script that would have to change to move from an NVIDIA chip to a GPU from anywhere else. And so we're thinking about this wrong. We're thinking about it as if there's some sort of incredible level of engineering that directly hard-wires the training process and inference process to a particular hardware vendor. But there isn't. It's all been modularized and abstracted away. And so I completely agree. It's easy to move.
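Editor's sketch: to make the "10 lines of Python" remark concrete, here is the core of a soft-target distillation loss in plain Python, assuming a teacher and a student that each emit raw logits for one token. The function names, the temperature value, and the toy logits are illustrative assumptions, not code from the episode or from any lab's pipeline. When only text is available, as with a hosted API, the teacher distribution is replaced by the sampled text and the loss reduces to ordinary cross-entropy on those tokens.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2  # conventional scaling for soft targets

# Hypothetical logits over a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # nearly matches the teacher
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher
```

Nothing in the sketch refers to a hardware vendor, which is the point being made in the conversation: in a real framework the device choice sits in a separate line from the loss itself.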
Now, James, look, there is some validity to the fact that PyTorch, one of the common frameworks, is more optimized for CUDA and NVIDIA. That is true. But, you know, TPU code has been in there for many years now. More optimizations are being done. And look, every single frontier company, every single hyperscaler is building their own chips, right? Maia from Microsoft, Facebook is building their own chips, you know, OpenAI is claiming that they're going to do their own chips, obviously TPUs from Google. So it's not like NVIDIA has a monopoly on this. I mean, this is matrix multiplication. Sure, you do it in a very parallelized way, you integrate high-bandwidth memory. But I actually think, you know, you guys may correct me on this, but building a TPU is actually a lot easier than building a CPU, where you have to do all the branch prediction stuff and deep cache integration, all that stuff. Here you're just optimizing for really, really fast, really huge matrix multiplication. Yeah, that is exactly right. It is simpler because of the tasks they do, kind of like the difference between the complex instruction set versus RISC, the reduced instruction set, and this is reduced even further because it really just does come down to matmul. But there are other challenges around just the memory bandwidth and parallelizing. Parallelization in computer science has always been one of those really difficult things, alongside naming things. But yeah, the sort of crux of your point is right there. Now, just for everyone listening at home, this is what happens when you ask Dmitri Alperovitch, hey man, what do you think about chips? A little bit of a passion project. Yeah, we're going to kick on to the next part of this discussion, which is, you've mentioned the Kimi model a couple of times now. This is interesting because James and I, we actually interviewed Nicholas Carlini from Anthropic. He's the security researcher at Anthropic, and we published that into our Risky Business Features
feed because there's no room in the main feed for this. So if you want to hear these sort of podcasts, listeners, please go and subscribe to Risky Business Features. It is a completely different podcast feed. And this is also where James' solo pod all about the distillation stuff is going. Now, the reason we're going to talk about Kimi just now is it's relevant to a bunch of stuff that we talked about with Nicholas in last week's podcast, James. Chiefly, you know, he was talking about how Mythos might be big and scary now, but this sort of capability will probably be in a model you can run locally on your laptop a year from now. And I guess the reason I wanted to talk about Kimi now is because they've done some really interesting stuff in terms of making the model very efficient at performing certain tasks by being able to sort of selectively load parts of it, right? Which makes Nicholas Carlini's claim about what these local models might be capable of, either on your own hardware or on something like Tinfoil SH, which we're chatting about with Dan Guido later. The point is, these local models are going to get a lot more powerful. Can you just tell us a little bit about the innovation here? Because the Chinese, we saw it with DeepSeek and now we're seeing it with Kimi, they are innovating in their own ways in AI. And this is a good example of that. Yeah, 100%. So let's cover Kimi and also Qwen, because these are the two sort of leading models out there in the open-weight space. Kimi is interesting because they've just come out with K2.6 and it follows in the pattern of DeepSeek, which uses this thing called mixture of experts. So Kimi K2.6 is a 1 trillion parameter model, which is vastly too big to even fit or to run efficiently on sort of a laptop or even the most beefed up Mac Studio.
But you can see the trajectory that they're heading towards through these innovations, because although it's a 1 trillion parameter model, only about 32 billion of those parameters are active at any one time. And so if this continues in this direction, we're going to see this very nice balance where the model is huge in terms of the overall capabilities it's got, but it's able to then zero in on certain parts of the model and activate those at that point in time to keep inference far more efficient. The other end of this spectrum that's interesting here is Qwen, which is another Chinese model. This is from Alibaba. And they focus more on models that are actually a manageable size for local deployment. So they've got models available at the moment, I think there's one that's 27 billion parameters and one that's 35 billion parameters, and the benchmarks basically say these are approaching those frontier model capabilities in agentic coding and a few other tasks. And to put that in perspective, I've got a Mac Studio M2 with 32 gig of RAM. I can run a Qwen 32 or 27 billion parameter model on that locally. The catch here is, though, these models are still incredibly slow compared to a hosted Anthropic model. You're talking about 400 to a thousand seconds for something that takes 20 seconds with Claude, and they're also an order of magnitude more inefficient with their token use. But, you know, sort of to what Dmitri was saying there, this is optimization, right? We've cleared the hurdle of "this is possible" and now we're just in optimization. And there's one thing we know about optimization: it just keeps getting better and better over time. So yes, locally running models with frontier capabilities are coming real soon and are kind of already here. Yeah. So I think, I mean, I think that was really an interesting part of that interview. Dmitri, you know, you and I, we've known each other a long time. We're good friends, right?
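Editor's sketch: the mixture-of-experts idea described above, where a router activates only a few experts per token, can be illustrated with a toy top-k router in plain Python. The scalar "experts", the router scores, and k=2 are hypothetical stand-ins; real models route per layer over learned sub-networks, and nothing here reflects Kimi's or DeepSeek's actual implementation. The last line simply works out the active fraction from the two figures quoted in the episode.

```python
import math

def top_k_route(router_scores, k):
    """Pick the k highest-scoring experts and normalise their weights."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)[:k]
    exps = [math.exp(router_scores[i]) for i in ranked]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(ranked, exps)]

def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and mix their outputs."""
    routes = top_k_route(router_scores, k)
    return sum(weight * experts[i](x) for i, weight in routes)

# Toy experts: each scalar function stands in for a whole sub-network.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
scores = [0.1, 2.0, -1.0, 2.0]  # hypothetical router output for one token

# Figures quoted in the episode: about 32 billion active out of 1 trillion.
active_fraction = 32e9 / 1e12
```

With these toy scores only experts 1 and 3 run for the token; the other two are never evaluated, which is where the inference saving comes from. On the quoted figures, roughly 3% of the parameters are active per token.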
Like our friendship, I best describe it as an argument that started in about 2019 and is still continuing. You've been, you're really an AI pumper, right? Like, you've been big on AI very early, way ahead of me. I think largely you've been proven right. That said, with Mythos, you think some of the hype around Mythos is not entirely justified. So I guess, you know, the idea that we might start seeing similar capabilities in local models might not be that frightening to you. Just a quick take from you on Mythos and what you think it means, because it's been such a big topic and we've got you here. Well, look, first of all, and I think Nicholas on your podcast talked really well about this. You know, the big difference with Mythos is the exploit generation, right? It is better at finding vulnerabilities. But look, other models have been able to find vulnerabilities as well, right? It's not like we woke up today and Mythos suddenly can find all these vulnerabilities that no one else could discover. It's incrementally better. And, you know, my guess is probably, and I haven't played with this, but I've talked to many people that have, maybe 30% better than, you know, the Opus 4.7, for example. But it is able to write exploits. But, you know, from what I've heard, even that capability is hit or miss. Like simpler exploits, sure, but anytime you need to sort of, you know, do an iPhone zero day that chains lots of things together, that becomes much, much harder. It is able to do it. You know, what I hear from my friends at Anthropic is that if you're spending the equivalent of a hundred thousand dollars on tokens, sure, you can probably generate some pretty cool exploits. I mean, very few people or companies can afford that much, right, on this level of experimentation and driving it that deep. So Mythos is incredibly expensive, which is, you know, yet another reason why it won't be available publicly. And by the way, you know, that really helps with distillation, right?
If the Chinese cannot get their hands on this model, we'll see how good Kimi is in six months and whether it can actually catch up with where Mythos is today. Yeah, I thought actually Carlini's argument that the cost thing is kind of irrelevant was the more compelling part of the interview, where he's like, look, you know, it's getting cheaper. Some of the dollar figures around the tokens required to do these things have been misreported. So I thought that part was pretty interesting. But yeah, anyway, it's an interesting thing, right? Because no one quite knows where all of this is 100% going. But look, speaking of Mythos, there was a big story. It was breaking kind of as we were preparing last week's show. I didn't think it was that important to cover, but it's been everywhere, which is a group of people in a Discord figured out how to get access to Mythos when they weren't on the, like, approved list of people who could use it. The reason I thought this story was funny is completely different, I'm guessing, to the reason most people thought this story was interesting. The reason I thought it was interesting is Anthropic totally hung themselves on their own messaging here, because they come out and talk about how dangerous it is, right? So obviously when you find out someone got access to it who wasn't supposed to, people are going to think that's a big deal when honestly it's not. Was that your take here, James? Yeah, 100%. It's just funny reading this as well.
A bunch of shady Discord people, and they basically used a data breach at Mercor to rifle through some logs and sort of found the URL patterns. Well, actually not even the URL. I think that it's been completely misreported. It's looking at the payload of the message transcript that goes back and forth with Anthropic, and they just basically guessed, hey, you know, the pattern for naming a model is pretty standard with Anthropic: it's the name, it's the version, it's the date. And they must have just, you know, put a few things together and determined, oh, this is what Mythos is called, and the number, and had access to it. There's also a little dot dot dot on the story around how the other part of this that made it possible is apparently they had, or one of them was working with, an Anthropic contractor. I do wonder if that's a soon-to-be former Anthropic contractor. But they had some sort of creds that probably gave them the ability to access models that were, you know, hosted in Anthropic but not yet publicly available. But overall, yeah, I agree with you. It's like, if the model's that dangerous, you kind of think there would have been a few more safeguards or things around it. Yeah, but it's like marketing meeting reality, right? And it's just a funny old thing. Like, you know, they've built a computer god in a box, but can't do ACLs. You know, like it is just such a classic in the genre of infosec stories. Moving on real quick, we're going to talk quickly about, I don't think this is going to be quick, actually, I lie. We're going to talk now about Fast16, which is some malware which was, I think, sent to one researcher here. It was Vitaly Kamluk, which is a name I haven't heard in a while. He's an ex-Kaspersky guy. And JAGS, Juan Andrés Guerrero-Saade, which I guess is why he goes by JAGS a lot of the time. Sorry for massacring your name, dude. They have done a bit of research into this malware that kind of connects to Shadow Brokers in an interesting way.
Dmitri, tell us about Fast16. Well, this is, I guess, a 10-year-old mystery that they were focused on solving. So people may recall the Shadow Brokers release back 10 years ago, allegedly from the NSA, and it contained, in addition to a bunch of malware and other things, this file that was called Territorial Dispute, which basically had a list of various kernel drivers and other identifiers that you would check when you land on a box to see if someone else is already there. So it could be like other Five Eyes malware, other malware you know about, and you basically wanted to make sure there's no kind of friendly fire. If you're also on there and they get detected, then suddenly you get detected as well. So you can kind of decide what you want to do from a deconfliction perspective at that point. And there was one line in that file that just said, Fast16 driver, kernel driver, move on, nothing to see here. And that was it, right? And no one knew, you know, what APT is this? Like, you know, why is it saying that? That seems like a really interesting thing. And JAGS and Vitaly have been looking for this elusive Fast16 driver for years now. And the way they found it is really, really interesting. They actually used a technique from another Kaspersky researcher called Sergei Mineyev, who actually passed recently, I think last month, who was like this unbelievable APT hunter who had all these ideas of how you look through a huge repository of data, telemetry or malware to find interesting stuff. And what they did is they started looking at binaries that integrated a Lua interpreter, because they said, wait a second, we see this Dukun malware and other malware that integrates Lua. Let us look at all old binaries from like the 2000 to 2010 timeframe that integrate Lua interpreters, right? And then weed out all the legitimate stuff and see what else we can find.
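Editor's sketch: the hunting idea described above (find old binaries that embed a Lua interpreter, then weed out known-legitimate software) might look roughly like this in Python. The two byte markers are assumptions: the Lua bytecode header and the copyright fragment that stock Lua builds carry. The researchers' actual signatures, corpus, and filtering are not described in the episode, and the allowlist by file name is a deliberately crude placeholder.

```python
import os

# Assumed indicators only: the Lua bytecode header and a fragment of the
# copyright string embedded by stock Lua builds.
LUA_MARKERS = (b"\x1bLua", b"PUC-Rio")

def embeds_lua(data: bytes) -> bool:
    """Return True if the blob contains any assumed Lua marker."""
    return any(marker in data for marker in LUA_MARKERS)

def hunt(directory, allowlist=frozenset()):
    """List files that embed Lua and are not on the known-good allowlist."""
    hits = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if name in allowlist:
                continue  # weed out the legitimate stuff
            with open(os.path.join(root, name), "rb") as handle:
                if embeds_lua(handle.read()):
                    hits.append(name)
    return hits
```

Calling `hunt("samples", allowlist={"game.exe"})` on a hypothetical sample directory returns the remaining file names that contain either marker. A real hunt would also filter on compile timestamps for the 2000 to 2010 window mentioned above and would use proper signatures rather than substring matches.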
And they stumbled on this file that eventually led them to this Fast16 kernel driver that basically was compiled in 2005, based on its compilation date. And it's a really interesting malware because it spreads itself via network shares. And then it hooks into Windows file read operations so that it can look for new executables that get loaded, .exe files. And then it will patch specific EXE files, and patch particularly the mathematical calculation code in those EXE files. And this was another sleuthing exercise by JAGS and Vitaly to figure out what are the EXEs that it's actually patching. And their top candidates that they identified were this LS-DYNA suite, which is a powerful engineering simulation software that's used to analyze how materials and structures behave in various extreme conditions, like, let's say, nuclear explosions. And lo and behold, another think tank has actually published information that Iran has used this software in its nuclear program to do its modeling. And then there was another program for open source water modeling and another Chinese construction and design software. And basically, when these executables load, it will patch up their math calculation code to basically introduce subtle errors in it, right? So that the modeling would produce wrong results, which would be really, really difficult to detect. So another kind of, you know, Stuxnet-like program that, you know, supposedly was hitting Iran back in the early 2000s. One thing I found really interesting, and I'll give JAGS credit for it because I was talking to him about it, is he said, you know, this is a type of model that you can use in terms of tactics to actually target Chinese companies, AI companies, right? If they're doing distillations, well, if you can introduce subtle errors in matmul operations when you're doing training, boy, you can really ruin those models. So I thought that was an interesting thought experiment to see if, you know, this might be happening even today. 
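On the defensive side of that thought experiment, one known way to spot-check a matrix product without recomputing it is Freivalds' algorithm: multiply both sides by a random 0/1 vector and compare. The sketch below is a generic illustration under stated assumptions and is not something anyone on the show described: it uses exact integer matrices in pure Python, and `matmul`, `matvec` and `freivalds_check` are names invented for the example. Real training runs use floating point, so a practical version would need a tolerance instead of exact equality.

```python
import random


def matmul(a, b):
    """Plain reference matrix product for lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def freivalds_check(a, b, c, rounds=20, rng=None):
    """Probabilistically test whether c equals a times b.

    Each round compares a(b r) with c r for a random 0/1 vector r, which
    costs O(n^2) instead of the O(n^3) of a full recomputation. A wrong
    product survives a single round with probability at most 1/2.
    """
    rng = rng or random.Random()
    width = len(b[0])
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(width)]
        if matvec(a, matvec(b, r)) != matvec(c, r):
            return False   # definitely tampered or miscomputed
    return True            # equal with high probability
```

A check like this only helps if it runs somewhere the tampering cannot reach; code that patches the arithmetic inside the same process could in principle patch the check as well.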
Well, that'd be interesting because that'd be virtually impossible to detect, right? It's one thing to have an error showing up in a spreadsheet and then cross-referencing that. But the models are so dense and un-introspectable. Like, how would you even know that your weights have been messed with through this process? So funnily enough, I remember, like, around the time that this malware was out there doing the thing, there was a lot of talk about how you shouldn't forget about the I in the CIA triad, the confidentiality, integrity and availability triad. And, you know, I sat through talks from people who were, like, adjacent to the intelligence community talking about how, you know, someone could write malware that would subtly impact, like, spreadsheets and things like that and cause drama. And, like, this was a big concern back then. And I guess I'm realizing the reason it was a big concern among people in the Five Eyes alliance back then is because you were doing it to other people, which kind of makes sense. There's a bit of projection going on there. But look, it's a fascinating story. And, you know, it makes me think, is this cyber war? You know, I just wonder, is that what we're calling cyber war? Now look, moving from the I in the CIA triad to the A in the CIA triad, we've got some analysis here, again from Kaspersky, looking at something they're calling Lotus Wiper, which is apparently the wiper that went after the state-owned petroleum company in Venezuela. Like, back in December, there was a ransomware attack that kind of looked like a wiper attack. The Venezuelans blamed the Americans. We actually published a podcast in which we said that was a credible accusation. It's looking more and more credible, James. I think you took a look at this and, by the process of elimination, you're thinking, yeah, what else could it be than a US-written wiper? Yeah, that's right. 
Between the Kaspersky write-up and Kim Zetter's write-up, which had a bit more detail, no one seems to be comfortable saying, aha, this proves that it was a US operation or a US ally operation. But it's very much a Sherlock Holmes, the dog's not barking kind of thing. This is a highly targeted malware. It had no financial incentive. There was no ransom or extortion element to it. It just straight up wiped things, and it did destroy data really well. And there was a compiled version of it with the timestamps that match up, and it had a hard-coded string pointing to the PDVSA. So, you know, like, what's the other thing? But it was Venezuelan dissidents, clearly. Clearly, walks like a duck, talks like a duck, must be a Venezuelan dissident. Yeah, okay. Yeah, yeah. Moving on, we've got one from Andy Greenberg and Matt Burgess over at Wired, just looking at how the North Koreans are using AI to, like, automate a lot of their campaigns. I mean, this is something actually, James, you and I spoke about recently, not in the podcast, which is we saw there was some incident and they talked about how it was an AI-enabled attack. And it's like, it's not going to be too long before putting that in a statement about some sort of data breach is going to be completely pointless, because they're all going to be AI-assisted, you know, hacks. I sort of feel like it's going to become the new sophisticated attacker line that goes into a press release. The new sophisticated attacker is, like, AI-assisted. What are the nuts and bolts here though, real quick, on this campaign that they've written up? Yeah, look, the campaign itself is just run-of-the-mill AI stuff, and of course everyone's using it, and it did its usual thing of leaving some database credentials exposed, et cetera, et cetera. The same old thing that always gets these vibe-coded apps popped, and that allowed folks to go and rifle through and just see the tradecraft and the process behind this. 
What I did find interesting about it is I think they've kind of stumbled on an interesting vector here that allows them to get away with otherwise sloppy, vibe-coded tools to do this. And that is, they go after their targets for this campaign here by posing as job ads. Now, we've all been through this. You get that job ad, you get the call from the recruiter, you want to start going through the recruitment process. You don't do that on your corporate laptop, right? You're doing that on your personal laptop, which doesn't have the EDR, doesn't have all those protections. So that's how they managed to get these otherwise pretty sloppy exploits onto a device. And then, you know, these days those devices have access to the things that they then can exfiltrate and get their crypto wallets and keys out of. So it's just kind of interesting that that interview aspect, this Contagious Interview campaign, as they call it, hits the sweet spot of quite an unprotected device. Yeah, they're not getting snapped by a kernel driver that Dimitri helped write 15 years ago, basically. I mean, we saw a whole thing on this on Twitter recently where someone walked through how they nearly got done, which was a LinkedIn account of some venture capitalist that they knew got compromised. And then they were, like, reaching out to them saying, hey, we should sync up. It's been a while. And they did the whole Calendly invite thing and waited a week. And then it's like, oh, yeah, you're going to need to download this binary to join the call. Dimitri, you had something you wanted to add there. 
Well, this just proves to me yet again, and I said this 15 years ago first, that North Koreans are by far the most creative actor, right? They pioneer techniques that then get adopted by others. They were the first ones back in, like, 2009 to do disruption at scale. They were the first ones to do hack and leak with Sony. They did the Bangladesh Bank heist and, you know, the IT workers and everything else. So it's not surprising to me at all that they are the first ones to really popularize the use of AI. And of course, they've been doing this with IT workers and passing interviews for a while. Now they're using it for coding. What was interesting, I thought, is that the way the Expel guys identified that they use Cursor and ChatGPT is by comments, because they were looking at the scripts and they saw, wait a second, there's a ton of comments here describing every step of this operation. That looks suspiciously like a vibe-coded piece of software. Yeah, in English too. So yeah, kind of obvious. That is funny. Now, this next one I wanted to talk about with you, Dimitri, because I feel like I'm taking crazy pills, because this story, it just keeps rattling around, right? There's been some sort of bug in Cisco ASA, and a threat actor, I think it was Chinese, was putting some malware onto these Ciscos. Now, CISA came along and said, hey, you've got to patch your Ciscos. Okay, yep, cool. And people have patched their Ciscos, US government have patched their Ciscos, and now they're putting out advisories saying, hey, they're warning you that this Firestarter backdoor malware survives patching. And I'm thinking, what malware doesn't survive patching? That's not how you do incident response. I don't get this. It's mad. Why was there any expectation that patching a compromised Cisco ASA device was going to get you anywhere in terms of cleaning it up? What is going on? Well, look, you know, in fairness, right, there's no EDR running on these devices. 
So people kind of assume that they're clean and the only thing they have to do is patch them, because usually exploits of these vulnerabilities don't actually land malware on the device itself, but kind of use it as a pass-through to inside the network. In this particular case, obviously malware was installed. It had persistence, so it would not be removed by a patch. And look, you know, we were talking about this before the show, like, you have to reflash your firmware on these devices. You have to assume compromise. And as painful as it is to take firewalls offline, hopefully you have redundancy, you absolutely have to do this. Just patching it is not sufficient. I just, look, I just find it a bit maddening seeing headlines like US, UK authorities warn that Firestarter backdoor malware survives patching. I just think, what? Like, okay. Anyway, okay, we're gonna move on to the next story now. And Citizen Lab has published a report about a threat actor that is tracking its targets. They're physically tracking, you know, tracking the physical location of their targets by exploiting vulnerabilities in SS7 and Diameter, which is another, like, SS7 protocol that isn't quite as awful. But it's the sort of stuff that you expect to be seeing happening on telcos. Dimitri, we'll get you in on this in a moment. But first up, James, you've had a look at this one as well. This report has resulted in the telco regulator in the UK shutting down the availability of, like, something called global titles, which are used in these campaigns. Can you explain to us what a global title is and how it's used to stage these sorts of attacks? I can. I can use my very fresh knowledge of this, because after reading the Citizen Lab report, I had to go and read up on a whole lot of topics. The write-up is exquisite. It's so good. 
But a global title is essentially, it's like a phone number style address with some metadata attached to it. And in the complex and convoluted ways that these telco networks work, there's kind of a resolution process. It's almost like, you can think of it as a form of, like, DNS resolving down to the IP address, and then, you know, when an IP address has to use ARP to resolve to the MAC address. A similar kind of multi-step resolution process for these things. But the point of using the global title here, I think, is actually to exploit that resolution, right? So they can essentially craft a global title coming from, I think there were only three or so candidate telco networks they were trying. But due to the way that those global title addresses get resolved, they're able to sort of, kind of, like, poke at various points of the network and see how is this getting resolved, where does that traffic go to? And of course, you know, the problem with these networks is they're insecure by default, really. It's built for interoperability between the telcos, not a high degree of security. And so people have bolted on firewalls to prevent a lot of this malicious traffic going back and forth. But what's interesting in this example here is the attacker is actually using lots of different steps where they basically say, I'll try this attack, see how it resolves, where does it get to, who blocked it? Okay, let's try something a little bit different. How does that get routed? Who blocked that? And it's like they're just incrementally building more and more knowledge about where the weaknesses are, because then they can funnel all the traffic they want to through that weak point in the network to get to the end subscriber that they're trying to surveil. Now, when it comes to these global titles, who is supposed to have access to a global title? I'm guessing that's a telco. Yeah, it's a telco, and they lease them out for other purposes as well. And that's what the UK has particularly. 
Of course, right. Like, yeah, of course, they lease them out to someone for some insane reason, because it's the telco industry. Dimitri, you are actually an investor in Cape, which occasionally sponsors the Risky Business podcast. And that's because you introduced them to us when you invested in them. I'm guessing you invested in them for kind of this reason, because they don't even allow SS7. It's Diameter only, and I'm guessing even then they're a little bit careful about the way that traffic is handled on their network. I mean, do you have some thoughts here? Yeah, absolutely. Look, location tracking, which is one of the things that they were using these SS7 and Diameter compromises for, is a huge issue, right? It's a safety, personal safety issue. You know, executives care about it. Government folks care about it. So trying to prevent that at a telco level is really, really important. And that's one of the reasons why Cape is becoming more and more popular across the board. What I found interesting in this write-up, and again, kudos to the Citizen Lab guys, really phenomenal work, is there's actually two types of attackers they identified. One that was just using SS7 and Diameter to track locations. Another that actually had a really interesting SMS exploit where they would send a binary SMS message to a device that contained a SIM card, basically a Simjacker exploit, that extracted location info and would basically have the Simjacker continuously ping you with the current location. So a lot of activity that's happening that most people aren't even aware of. Obviously, we have the on-device compromises that people are becoming more aware of, watering hole attacks and other things on Apple and Androids. But this is something that doesn't even impact your phone. It's done at the carrier level. So, you know, without things like this report, we would really not know the extent to which this is happening around the world today. 
And just the fact that that malicious SMS can include those codes to make those actions happen, boy, it just makes you nervous about carrying this thing in your pocket when you know that that's actually, you know, part of the protocol, that that can be done. It's, yeah, it's an incredible read. Turns out mixing data and code is bad in the mobile ecosystem as well. Who would have thought? So let's spend hundreds of billions of dollars on CapEx doing exactly that, Dimitri, with AI models. But anyway. And just what you were talking about, about, you know, feeling nervous about it being in your pocket, James. Like, I just still remember something like eight, nine years ago or something, meeting up with Joseph Cox, now the publisher and, you know, head of one of the partners in 404 Media. And he did not have a phone. He used an iPod Touch as his communication device for exactly that reason. He's like, I do not want to connect to any of these networks. Well, and by the way, if you talk to some of these Citizen Lab researchers and see how paranoid they are, they don't have phones. They, you know, do a hotspot to a device with your Wi-Fi. Like, they go through extraordinary efforts to try not to get tapped. Yeah. Yeah. Well, I mean, that's the thing. Like, I don't want to live that way. I just prefer not to think about this, frankly. Moving on and real quick, the U.S. has sanctioned a Cambodian senator because this senator is essentially, you know, renting compounds to scam operators. I mean, this is what I've been saying would be the case, where you'd wind up with a form of state capture by an illicit industry because it's, you know, something like 40% of the Cambodian GDP or whatever comes from these scams. So, of course, the government is going to essentially wind up operating as an enabler of these sorts of scams, and that seems to be what's happening. What else have we got? 
We've got Vercel, who have put out another statement talking about how some other company, other customer accounts got knocked over by attackers, because this is what happens, right? You hire Mandiant, they come in, they start kicking over rocks and they start finding other stuff. And that's what's happened here. Although it doesn't look like what they found here was necessarily a breach of Vercel systems, but they have, through the process of doing incident response on the breach that they had, discovered that, well, hey, here's some access to some customer accounts. It looks kind of funny. I mean, that's basically it, right, James? Yeah, that's exactly it. It's the same thing. These environment variables that weren't marked as sensitive are somehow being enumerated. But again, small number of accounts. It's not going to lead to much, I don't think. And I stand by what I said last week. I think Vercel did a really good job of just enacting three or four remediation corrective steps here that, you know, all still remain, you know, good mitigation strategies even for the new things they've found. And in particular, all the environment variables now are sensitive by default. So that completely alleviates this problem. We had a complaint too, that we were going too easy on Vercel on last week's show, because they were like, they didn't send us remediation things in time. And I think, look, the reason we're praising Vercel is because they didn't communicate ahead of them knowing stuff, right? Which is how you get into trouble when you're communicating about an incident. So I think we're going to stand by that. I mean, Dimitri, have you even followed this one? Not too closely, but yeah, I mean, look, major hosting provider, right? Of course, they're going to get popped by a variety of different actors. Of course, people want to get access to their information. So not at all surprising. Yeah. Yeah. Meanwhile, Checkmarx. 
We spoke, like, a couple of weeks ago about a very limited intrusion, you know, into, like, a Checkmarx open source thing that not many people use, no big deal. There was sort of a supply chain thing. About that, James. Yeah, it was about that. Yeah, it was originally the Checkmarx KICS, which is just basically an infrastructure-as-code scanner. So, yeah, it was interesting, but it was all TeamPCP and wrapped up in that. Then along comes April 22, and there's a new set of compromised packages that are released, and this went beyond KICS. This was some of the AST GitHub Actions, their VS Code extension, their developer assist extension. But the thing that makes you a little bit nervous about the state of Checkmarx is even in their updated advisory this week, they said a second wave of malicious Checkmarx artifacts was published, and this was April 22nd, indicating continued or renewed attacker access. So clearly, not quite sure how the attackers still have access, but suffice to say they do, and that resulted then in a whole bunch of data getting exfiltrated out of their GitHub repos, and Lapsus$ claimed credit for that. So it's a bad time for Checkmarx. Yeah, it is. Now, quickly, too, we spoke about the issue where the FBI were able to extract someone's Signal messages out of some cache of their display notifications for those messages, for their push notifications. And, you know, you said, having worked at Apple previously, that shouldn't have been possible. 
Apple has now patched that, but you have had a look at Apple's blog here and you think that it's not so much that the push notifications were cached. You think they've actually fixed a different bug. So the notification database is part of a thing in iOS called SpringBoard, which is actually a very, very well engineered and quite a hardened part of the operating system, because it does handle everything from arbitrating the apps that are running, it's the app launcher, it's the notification center, control center, et cetera. So I was a little bit surprised that a bug would creep into there that would have been essentially something along the lines of, when iOS posts a notification saying an app is deleted, a bug was causing the history of notifications to not get purged. That's what we thought the bug was. But still, I was a bit like, that's a bad bug to end up in SpringBoard or one of those components. Then along comes the fix, and the security bulletin from Apple has this line in it. It says, impact: notifications marked for deletion could be unexpectedly retained on device. Okay, we knew that. Description: a logging issue was addressed with improved data redaction. And this is where I can use a little bit of my internal knowledge of Apple to tell you that Apple has systems for collecting a whole lot of diagnostics, right? When I worked there, we'd ship a feature and we would make sure that it had things like MessageTracer keys or AWD diagnostics in there, so that once this software's out there, we could pull these metrics and we could see, you know, how often is someone using Mail's feature for threading, etc. I think what's happened here is someone accidentally put one of those traces into a log to say, log when a message has been deleted out of the notifications database, and included the payload in there. That gets into that logging data. And it fits also with this reported thing around the notification logs being cached for a month. 
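The class of bug James is speculating about (a diagnostic log line that carries the message payload along with it) and the fix Apple describes as improved data redaction can be illustrated in general terms. The sketch below has nothing to do with Apple's actual logging stack; it is a generic Python `logging` filter, and the field names in `SENSITIVE_FIELDS` are hypothetical ones chosen for the example.

```python
import logging

# Hypothetical field names that should never reach diagnostic logs.
SENSITIVE_FIELDS = {"payload", "body", "title"}


class RedactingFilter(logging.Filter):
    """Replace sensitive values in dict-style log arguments before they are emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # When a single dict is passed as the log arguments, logging stores
        # it on record.args, so it can be rewritten before formatting.
        if isinstance(record.args, dict):
            record.args = {
                key: "<redacted>" if key in SENSITIVE_FIELDS else value
                for key, value in record.args.items()
            }
        return True  # keep the record, just without the sensitive values
```

Attached with `logger.addFilter(RedactingFilter())`, a call such as `logger.info("deleted id=%(id)s payload=%(payload)s", {"id": 7, "payload": "..."})` still records that a deletion happened but no longer records what the notification said. A deny-list like this is the weaker design; logging only an explicit allow-list of fields fails safe when someone adds a new field later.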
That's generally the pull time for those diagnostics. They hang around, they batch up, and then they get shipped off. So, yeah, I think it was actually logging and telemetry that got us here. Yeah, you know, one thing that I found really interesting is, you know, as all of your listeners know, you have this model now where a patch comes out for a piece of software, and everyone is reverse engineering that patch to kind of figure out what the vulnerability was, write an exploit and kind of use it as a one-day exploit to compromise anyone who hasn't patched. Now it seems like the model is to actually reverse engineer or really read indictment documents to identify vulnerabilities that you can then use to go patch, right? So it tells you that DOJ probably was a little overzealous in the way they were describing how they got this data and they probably should have obfuscated it a little bit better so that Apple wouldn't patch it in the future. I know of stories where entire indictments have gone away because defense counsel have been pushing FBI to disclose how they've collected certain bits of evidence so that they could make the indictment go away because they didn't want to expose sources and methods. So, yeah, I actually had the same thought, which is like they messed up by talking about this in a court document because now obviously that capability is gone forever. Now, real quick, James, because we're like going over time. GitHub's had a hell of a week. They had a regression in merge queue behavior, which was not great, but wasn't as bad as it was like first thought to be. They've had all sorts of availability problems. And then there's been this security issue that Wiz disclosed like overnight for us here in Australia. Can you just walk us through quickly GitHub's horrible week? Yeah, bad week. So the merge queue one looked pretty scary because, you know, when you say it's a regression in merging and the regression is the code didn't merge, that's a big deal. 
You had one job, et cetera. Yeah, exactly. Regression. We didn't do what we were supposed to do. But it was in this thing called a merge queue, which is only used if you're at a very high volume of commits and PRs going through, and so hence the blast radius was small. But nevertheless, a dumb bug, and a real concern that the testing wasn't actually, you know, exercising the real purpose of a merge queue there. Then along comes Wiz with this advisory that there's, you know, an RCE basically in GitHub online as well as all of the self-hosted versions of this. The write-up's good, but the thing that made me really laugh out loud with this one was, towards the bottom of the article, GitHub says, you know, if you're looking for indicators of compromise on your GitHub Enterprise box, look in your GitHub pushes and see if any of the GitHub pushes contained a semicolon. It's like, oh, come on, guys. You're telling me that you just had basically a straight path from a push command to a shell, that you could put a semicolon in there to say, hey, start my new command here? It's just SQL injection, but for the shell. So, again, a really dumb bug. To their credit, they did post today a pretty good article that explains, look, we are having a rough time because AI agentic coding has just blown the roof off all of our metrics of how fast commits are coming in, how many repos are there. So, yeah, you know, but still tough. I mean, it's the vibe coding revolution, right? So, like, if you imagine the equivalent number of, like, human developers that they're dealing with now, that actually makes a lot of sense. Yeah, it makes a lot of sense. Now, we've got two items we're going to get through real quick, both kind of hilarious. Catalin spotted this one and dropped it into Slack today, which is the insurer At-Bay has said, oh, you know, one ransomware crew has, like, risen to the top of the leaderboard. You're thinking, oh, big and scary. It's Akira ransomware, and you're thinking, wow, you 
know, these guys must have mad skills. And it turns out, no, they've just got an MO where they go and rinse anyone who's running a SonicWall, basically, which is how they've been able to get to number one, which is just, I mean, what do you even say? Specialization works. Specialization works. That's right. And, you know, how the mighty have fallen. Like, if that's what gets you to the top of the ransomware leaderboard these days, like, you disappoint us with your, you know, subpar TTPs. And the last thing we want to talk about is, you know, the cybersecurity industry's marketing dross is starting to leak out into the ransomware ecosystem. Dan Goodin has this write-up for Ars Technica where a ransomware crew, Kyber, is going out there promoting its ransomware as being quantum ready. Dimitri? Well, this is military-grade technology, military-grade encryption for ransomware, right? Why wouldn't you use military-grade technology, Mr. Affiliate that I want to sign up? By the way, it's interesting that it's named after the Kyber lattice-based encryption algorithm for key encapsulation, obviously, that is quantum resistant. And yeah, if you want to differentiate yourself, one way to do that is obviously to target the most common vulnerabilities, like SonicWall. If you don't have that, resort to marketing. Well, guys, that's actually it for the week's news. 
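The semicolon indicator mentioned in the GitHub story is the classic signature of shell command injection: if attacker-controlled text is pasted into a command string that a shell then parses, a `;` ends the intended command and starts a new one. The sketch below is not GitHub's code and says nothing about how their bug actually worked; it is a generic Python illustration, with `run_unsafe` and `run_safe` as made-up names, and it assumes a POSIX system where `echo` is available.

```python
import subprocess


def run_unsafe(ref: str) -> str:
    """Anti-pattern: interpolate untrusted text into a string parsed by a shell."""
    result = subprocess.run(
        f"echo updating {ref}", shell=True, capture_output=True, text=True
    )
    return result.stdout


def run_safe(ref: str) -> str:
    """Pass arguments as a list so no shell ever interprets the untrusted text."""
    result = subprocess.run(
        ["echo", "updating", ref], capture_output=True, text=True
    )
    return result.stdout
```

With `ref` set to `main; echo INJECTED`, the unsafe version runs two commands and prints `INJECTED` on its own line, while the safe version treats the whole string, semicolon included, as one literal argument. That is the same fix as parameterised queries for SQL: keep the data out of the thing that gets parsed as code.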
But before we go, Dimitri, you know, how was your weekend, man? I heard you were off to the White House Correspondents' ball the other night. Was it fun? Well, you know, it was interesting, exciting. I was disappointed that the food was kind of lacking. We got salads and then not much else. So we were stuck there for an hour and a half when they told us the president would come back, which he, I think, wanted to do, and then in the end the Secret Service didn't let him. I remember I was texting with you, and at the time we thought that there was a dead body outside, because there were rumors that the shooter was killed. And you were marveling at the fact that in America, dinner could continue while a dead body's outside. But this is America. It's almost like gun deaths have become a little bit over-normalized, maybe. Yeah. I tell you, we were right at the stage, and one of the funniest things I saw in the initial moments, as Secret Service is swooping in and taking over the people that they were protecting, and they were grabbing Stephen Miller and Katie Miller, who were sitting at the Fox News table right next to us. And there's this couple, and I didn't know who they were, unfortunately, but they're jumping over people, jumping over chairs and screaming, Stephen, Stephen, can you please take us with you? And I'm, like, thinking, this is not the rapture. Where do you think he's going to take you? Well, Dimitri, I hate to tell you, mate, but if it were the rapture, I wouldn't want to go where Stephen Miller's going, basically. That is the last place I would want to go during the rapture. But, you know, to finish it up, the evening actually ended up really great, because we had the Canadian ambassador at our table, and at the end of it, when they finally let us out, he's like, why don't you come over to our house? We'll order a pizza. And it ended up being a really fun end of the night, actually. Well, mate, all's well that ends well. 
We're glad that you had a fun night and survived to tell the tale so you could come here and talk about the news with us. And on that note, Dimitri Alperovitch, James Wilson, thank you so much for joining me on the show to talk through the week's news. It's been a lot of fun. It's been fun, Pat, and looking forward to next week. Thanks so much for having me. That was Dimitri Alperovitch and James Wilson there with a look at the week's security news. Big thanks to them for that. It is time for this week's sponsor interview now with Dan Guido from Trail of Bits. And Trail of Bits is a really interesting sort of security consultancy and engineering firm that does a lot of interesting work. And, you know, Dan is big on AI at the moment, of course. I mean, they've always been a very sort of forward-leaning company. So when there's new tech, they dive right into it. And one of the things that they've been both making use of and looking at on behalf of their clients is when you want to do some sort of private inference, right? Rent some infrastructure so that you can run inference workloads without exposing them to the hosting provider. And I thought this is going to be an interesting topic for a sponsor interview. So that's what Dan joined me to talk about now. So, you know, there's certain workloads where you don't want to expose what you're working on to Anthropic, for example, right? You just can't, whether it's through, like, regulatory constraints or, you know, confidentiality constraints. Like, if you're an exploit developer working on high-end stuff, you can't just throw that into Anthropic. It's just not the way the world works. So you can use local models on very expensive hardware, or you can use services like Tinfoil SH, which will help you to actually load some local models onto their powerful hardware, and you can just pay to use that. But how do you then go about guaranteeing that that stuff is actually private? So that is the topic of this interview. 
And also, we talk about the work Trail of Bits did looking at the way Meta has done private inference for WhatsApp. So here's that interview. Here's Dan Guido, who is going to kick off right now, explaining how private inference works. Enjoy. So these things, they operate in little trusted execution environments. They're a little virtual machine inside of either a CPU or a GPU that is completely separated from any infrastructure that the cloud provider or the vendor can access. And that's just fundamentally different from what, you know, OpenAI, Anthropic and the rest of the frontier labs are doing right now. You know, they have a lot of legal agreements that say you should trust us. And they have lawyers standing over their back with knives, like, OK, like, we get it. But if they get hacked, or if somebody goes rogue internally, or if they just get curious from a business standpoint and their internal rules change, they can look into all your stuff. So this private inference stuff, it solves a lot of major problems that, for instance, companies like Trail of Bits have. Like, a lot of people trust me with their most sensitive intellectual property, and they don't want me to give my data to a frontier lab where I'm taking it on faith that they're not looking. So I think in 2026, this is a really big topic, and I am kind of excited to see where it goes. But also, it's been a great thing for us to be around for, since I think we have the skills to really help. Now, okay, so all of this makes a lot of sense, right? So you've explained why there is a need for this private inference stuff. But, you know, as you also pointed out, this means that you are not using a frontier model. How big is the gap these days between something like the latest and greatest Anthropic model or OpenAI model, you know, versus one of these local models? Because as I understand it, the gap is pretty substantial. 
So I'm guessing there's a whole bunch of stuff you just can't do when you're using, like, I know it sounds funny, but like a remotely hosted local model in one of these private inference rigs, right? So, yeah, I guess that's the question. What's the gap like there?

Yeah. So I think the way that we do it is we always prototype first on a frontier lab model. Like, I want to make sure that I'm going to go explore and figure out, can I get this problem solved with Opus, with Codex, with, you know, Gemini's latest model? And then once we get that done, then we can start building evaluations. We build some data sets around like, okay, can we reliably solve this problem? Can we swap out some of the parts and figure out that we can trust one of the open source models to do the same? I think for a lot of people, the experience they have with open source models is they go grab Ollama, which is doing CPU inference, not even MLX. And they're doing it with models that fit inside their laptop GPUs or their laptop CPUs. And those things just don't compare to using a full unquantized, you know, 230 billion parameter model. You're leaving so much performance on the table. And the only place you're really going to be able to do that is with some inference provider that's running it in the cloud. Notwithstanding that, I think personally, the way that Trail of Bits engineers use these sorts of things is that speed matters more than the model size. Like if this thing works at five tokens per second, you're still not going to use it, even if it's smarter. I would rather have something that does like 10,000 tokens per second and is completely stupid, because then I can scale out a huge agent workflow, run like 50,000 checks on whatever it is that I'm working on and get some trust out of the process.
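As a rough illustration of the workflow described here (prototype on a frontier model, build a data set, then check whether an open model can reliably do the same job), the following is a minimal sketch and not Trail of Bits' actual tooling. Every name in it (`EvalCase`, `pass_rate`, `can_swap`, the two stub models, the `BUG`/`OK` labels and the toy cases) is a hypothetical stand-in; a real harness would call real inference endpoints and use a far larger, proprietary data set.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str    # input handed to the model
    expected: str  # exact label a correct answer must return


def pass_rate(model: Callable[[str], str], cases: List[EvalCase], trials: int = 3) -> float:
    """Fraction of cases the model gets right on every one of `trials` runs."""
    if not cases:
        return 0.0
    passed = 0
    for case in cases:
        # Reliability matters: a case only counts if all repeated runs agree.
        if all(model(case.prompt).strip() == case.expected for _ in range(trials)):
            passed += 1
    return passed / len(cases)


def can_swap(frontier: Callable[[str], str], candidate: Callable[[str], str],
             cases: List[EvalCase], tolerance: float = 0.05) -> bool:
    """True if the candidate is within `tolerance` of the frontier pass rate."""
    return pass_rate(candidate, cases) >= pass_rate(frontier, cases) - tolerance


# Hypothetical stand-ins for real models (assumption: deterministic toy logic).
def frontier_stub(prompt: str) -> str:
    return "BUG" if ("strcpy" in prompt or "gets" in prompt) else "OK"


def candidate_stub(prompt: str) -> str:
    return "BUG" if "strcpy" in prompt else "OK"


CASES = [
    EvalCase("strcpy(dst, src);", "BUG"),
    EvalCase("gets(buf);", "BUG"),
    EvalCase('puts("hello");', "OK"),
    EvalCase("return 0;", "OK"),
]

if __name__ == "__main__":
    print("frontier :", pass_rate(frontier_stub, CASES))
    print("candidate:", pass_rate(candidate_stub, CASES))
    print("swap ok? :", can_swap(frontier_stub, candidate_stub, CASES))
```

Requiring every repeated trial to pass is one way to capture the "can we reliably solve this problem" part of the question; the tolerance is an arbitrary illustrative threshold, and choosing it is a judgment call for whoever owns the evaluation.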
So the ability to run one of these really high end, you know, latest generation, 200 billion parameter plus models in the cloud with inference that another person can't peek into is pretty attractive. And I think you can do it if you have the right evals.

Right. So I guess what you're saying is the local models aren't that bad if you've got the right hardware, if you've got hardware with the juice to run it.

Yeah, I think that's true. You can get mileage out of these things. I'm seeing that it's kind of like a six month delay where, you know, Opus comes out today, six months later, the open source models catch up. But a lot of that is left to like, well, show me the proof. And that's where Trail of Bits and other firms are building a lot of proprietary data sets that give us that trust that, oh, yes, this open source model can do it. But that's work that you have to do for yourself. You know, a lot of the open evaluation data sets aren't going to tell you that.

So what sort of tasks are we talking about that need to be hidden away from Anthropic or Google or OpenAI or whoever at a customer's request, right? What sort of gigs are you using this private inference infrastructure for? Is it code audits? What is it?

A lot of it's code audits. A lot of it's product security incident response. I think the most sensitive engagements that Trail of Bits works on are cases where a company got hacked and they are treading carefully because there are legal ramifications to the work that we are doing for them. You know, there's customer liability issues, they could be sued, there's lawyers involved.

So this is about what winds up in discovery, I guess, like if someone's dropping a subpoena on Anthropic and saying we want to see what this contractor did, blah, blah, blah. It's part of this incident response.

It's part of that too.
But, you know, we audit a lot of sensitive code. We have some clients that are EDR vendors, right? And through working with those clients, we are finding remotely exploitable bugs in these EDR products. And we don't want information about where those bugs are to end up in some log at a frontier lab where any curious person inside the company might be able to get access to that log of prompts.

Yeah. I mean, one of the reasons I was interested in having this conversation with you is because, well, first of all, you mentioned it in your talk at Unprompted. And secondly, I was chatting with someone who works very deep in exploit development who was talking about their frustration that they can't use models like, you know, all of the Anthropic models and whatever, because you can't expose those sort of exploits and vulnerabilities. You just can't take the risk. The stakes are too high when you're working in the national defense space, at least.

I think it's a really funny mindset to have. I remember conversations with similar people about Microsoft Security Research. Like, let's just booby trap, or let's look through the logs on, the documentation for different APIs inside of Windows. And from there, it's probably a data leak about, oh, this person's probably looking for bugs in this subsystem inside of Windows. It's like there's one person in the entire universe that found this specific API. And I think, you know, the turtles go way down. You would probably be motivated to get a completely local offline copy of everything that's hosted online, just so that you don't expose that stuff to a third-party vendor. But I mean, from my perspective, we have client obligations to live up to. I get a lot of sensitive intellectual property and I have to attest that I'm going to keep it confidential, and it's hard for me to do that with some of the way that AI works today. So this is attractive for me.
Let's just go into that, because you mentioned before that there is this sort of cryptographically enforced separation between these virtual machines that are doing the inference. How robust is that separation? Because look, I understand that from a contractual point of view, from a not having data lying around that could be subject to discovery later point of view, this is going to be good enough, right? But I've always wondered about that hardware separation, you know, separating VMs on a single piece of hardware. Like how robust is it?

I mean, so I think it can be very robust. The problem is that you need to actually do the work to make it robust. And that's the process where Trail of Bits has been publishing public reports of audits that we've done for these systems so that people gain that trust, because they're complicated machines. Like especially...

So you've had a look under the hood of, like, tinfoil.sh, for example. Are they a customer?

We've looked at tinfoil.sh. We've also looked at WhatsApp. WhatsApp has a Private Processing system.

That's the next thing I wanted to talk to you about. You did a whole gig where, you know, and the reason WhatsApp is doing private inference, it's funny, man. It's so funny because my wife just asked me yesterday, oh, there's like now an AI button in WhatsApp. She's Brazilian. All the Brazilians, that's their default app, right, is WhatsApp. I think it was like, is this so if you get lonely, you can just talk to WhatsApp? And I'm like, no, but, you know, nice one.

Kind of, yeah.

But they're doing, so in order not to break the end-to-end encryption, they're doing private inference, you know, for some of the reasons that we spoke about earlier, right? So the data isn't just leaking out into logs or whatever, or leaking out into queries and prompts of models that can be recovered later. So how did WhatsApp actually tackle this?
So WhatsApp is using a lot of commodity off-the-shelf systems. Like you look at Apple's Private Cloud Compute. They own the entire stack. They built custom hardware, custom software. They have a whole build chain that only exists at Apple. And it still sucks. But anyway, I mean, everybody's got the same problem. So WhatsApp is using a lot of AMD stuff. And, you know, there's challenges. There's challenges where a lot of the AMD hardware is not robust against physical attacks. It has trouble making reproducible builds. And what Meta is really selling is, they're saying that all the resources at a company like Meta will not allow them to peek into the operation of this enclave. And that is a really strong, really outrageous statement to make. And it means that you actually have to think about this beyond just the secure enclave. You have to think about physical access controls around the hardware. You have to think through what you're actually attesting to, how people check it, what's available from the transparency log. When Apple does Private Cloud Compute stuff, they don't even give you the source code to reproduce it. They just give you a binary. And they're like, oh, obviously there's some security researcher that's looking at every single new hash that comes out, downloading the binary, and then inspecting it fully to make sure there's no backdoors. But, you know, gaps and weaknesses abound.

You keep referencing Apple, right? And I'm guessing the reason you do that is because that is one of the better implementations of private inference, right? So I'm guessing that's why you keep mentioning that in a discussion about WhatsApp.

Well, they were the first to market with one. And I think when it came out, it was sort of a big bang. There was a lot of interest, and WhatsApp followed shortly after. So these are the two big comparable public implementations that people have.
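To make the "what you're actually attesting to, how people check it" point a little more concrete, here is a deliberately simplified sketch of the check a client or researcher might perform: hash the binary that was audited, confirm the enclave reports the same measurement, and confirm that measurement appears in a published log. This is an assumption-laden toy, not how WhatsApp, Apple or tinfoil.sh implement it. Real attestation relies on hardware-signed reports verified against a vendor certificate chain, and real transparency logs typically use signed, append-only structures with inclusion proofs rather than a plain list. The function names (`measure`, `verify_enclave`) are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable


def measure(binary: bytes) -> str:
    """SHA-256 hex digest standing in for an enclave measurement (assumption)."""
    return hashlib.sha256(binary).hexdigest()


def verify_enclave(reported_measurement: str,
                   audited_binary: bytes,
                   transparency_log: Iterable[str]) -> bool:
    """Accept only if the reported measurement matches the audited binary
    AND that measurement was publicly logged.

    Not modelled here: signature checks on the attestation report,
    certificate chains, freshness nonces, and log inclusion proofs.
    """
    expected = measure(audited_binary)
    matches_audit = hmac.compare_digest(reported_measurement, expected)
    is_logged = expected in set(transparency_log)
    return matches_audit and is_logged


if __name__ == "__main__":
    audited = b"inference-server-build-1"  # hypothetical audited binary
    log = [measure(audited)]                # hypothetical published log
    print(verify_enclave(measure(audited), audited, log))
    print(verify_enclave(measure(b"modified-build"), audited, log))
```

The sketch also shows why reproducible builds matter to this discussion: if nobody outside the vendor can rebuild the binary from source and arrive at the same hash, the logged measurement only tells you that the binary is the one the vendor published, not what it does.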
But I would say, you know, your question was, should we trust this? And why should we trust this? What these systems do, beyond just having technical controls, is force a company to think hard about how they're storing your data. And if you're not using a secure enclave, if you're not doing these things in a private inference system, then it's just sitting out there on the cloud somewhere. And, you know, your opinions about what you should do with that data could change. There might be thousands of DevOps engineers at your company that can access it. But when it's inside of an enclave, you can't just change your idea about how you're accessing data. You have to actually hack your own software. You have to spend considerable effort in an unambiguous project to hack software that you made. And the skills to do that are few and far between. There are like 10 people in the whole company that could figure it out, if that. And those 10 people probably are not motivated to do it. They know the work that went into it. So these things provide a real logical barrier inside the company as much as they provide a technical control.

Now, for anyone who's interested in looking at your full audit report for WhatsApp's private inference, you've actually published that.

Oh, yeah. I mean, these systems require people to trust them. They need to know what's actually running, and it's a cryptographic system. You know, we publish cryptographic audits all the time and this is one. So we have our full report that was co-published with WhatsApp and you can go dig through all the dirty details.

Yeah, awesome. Well, look, Dan Guido, fantastic to chat to you. I mean, you know, it's a short-ish interview, right? So we can only sort of start to scratch the surface of what is a very interesting topic.
But for people who want to go deeper, I mean, I'm looking at the audit report here and, you know, it is not short. I mean, what have we got here? About 118 pages of audit report for those of you who are really interested in going deep on private inference. But, mate, great to chat to you. Always good to see you. And I'll look forward to chatting to you again soon.

Thanks again, Patrick.

That was Dan Guido there from Trail of Bits, this week's Risky Business sponsor. Big thanks to them for that. And that is it for this week's show. I do hope you enjoyed it. I'll be back soon with more security news and analysis. But until then, I've been Patrick Gray. Thanks for listening.

Thank you.