Practical AI

Controlling AI Models from the Inside

44 min
Jan 20, 2026
Summary

Ali Khatri, founder of Rynx, discusses a revolutionary approach to AI model safety that moves beyond traditional guardrails by instrumenting the internal states of AI models during runtime. Instead of just filtering inputs and outputs, his company has developed technology that can detect and prevent harmful content generation from within the model itself, achieving comparable safety performance at 1000x lower computational cost.

Insights
  • Traditional AI safety approaches that only monitor inputs and outputs are fundamentally limited, like checking IDs at a building entrance but having no visibility inside
  • Internal model instrumentation can achieve the same safety performance as standalone guard models while using 1000x fewer computational resources (20 million vs 160 billion parameters)
  • AI safety needs are highly context-specific across industries, requiring customizable solutions rather than one-size-fits-all approaches
  • The economics of current guardrail solutions often prevent their deployment, especially on edge devices where computational resources are severely constrained
  • Defense-in-depth strategies combining multiple safety layers will be essential for robust AI security
Trends
  • Shift from external AI guardrails to internal model instrumentation for safety
  • Growing demand for context-specific AI safety solutions across different industries
  • Emergence of mechanistic interpretability as a practical safety tool rather than just a research area
  • Economic pressure driving innovation in computationally efficient AI safety solutions
  • Increasing focus on runtime AI safety versus build-time safety measures
  • Rise of hybrid approaches combining traditional guardrails with internal model monitoring
  • Growing recognition that current AI safety measures are inadequate for production deployment
Quotes
"Today what we are able to do is just today's solutions, analyze what's going into the model, also known as the prompt, and analyze what's coming out of the model, which is the response. But by then the damage has already been done."
Ali Khatri
"The model needs of most companies are similar. Ish. But the safety needs are dramatically different. So you cannot have a one size fits all safety stack that works for everybody."
Ali Khatri
"What we are essentially doing is we're analyzing the internal states of the primary model as it makes the prediction. So in doing so we don't need any of those two extra GPUs and that 160 billion of parameter billion parameters of inference that I counted, we have succeeded in bringing it down to 20 mil with an M."
Ali Khatri
"If you don't know how to drive a car, you're just not going to be able to do it. No matter how fit you get, no matter how much you train, you are not going to become a race car driver if you cannot drive a car. Similarly, in this security setting, if the example that I use where somebody, you know, decided to pick up a golf club for some reason or no reason at all, you weren't checking for like golf clubs are permitted items to go bring into a home."
Ali Khatri
"I am a firm believer in defense in depth. So one product does not miraculously solve everything, just like with our human society."
Ali Khatri
Full Transcript
5 Speakers
Speaker A

Welcome to the Practical AI Podcast, where we break down the real-world applications of artificial intelligence and how it's shaping the way we live, work and create. Our goal is to help make AI technology practical, productive and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're in the right place. Be sure to connect with us on LinkedIn, X, or Bluesky to stay up to date with episode drops, behind-the-scenes content, and AI insights. You can learn more at practicalai.fm. Now onto the show.

0:04

Speaker B

Welcome to another episode of the Practical AI Podcast. This is Daniel Whitenack. I am CEO at Prediction Guard and I'm joined as always by my co-host Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are you doing, Chris?

0:48

Speaker C

Hey, doing great today, Daniel. How's it going?

1:03

Speaker B

It's going well. Did a little bit of snow shoveling, snow on the ground as we speak. We're kind of headed into winter break or the holiday Christmas season here in the US, and I think this episode will be released in the new year. So if you're listening to this, you're listening in the future. To be honest, I'm really excited about talking about the future, because our guest today is really thinking very innovatively about how we can secure our AI models and have safety as we move into that future. Really excited to welcome to the show today Ali Khatri, who is founder of Rynx. Welcome, Ali.

1:05

Speaker D

Thanks, thanks for having me, Daniel.

1:45

Speaker B

Yeah, yeah. We met earlier this fall. Really fascinated by your kind of line of work and technological innovation at Rynx. But I also know that you've been thinking about these topics around AI safety, guardrails, disallowed content, et cetera for quite some time. Could you give us a little bit of a background of how you got into these topics and what you've done in the past?

1:47

Speaker D

Yeah, so I have been in machine learning for AI safety or anti-abuse use cases in general for the past eight or so years of my career. I spent about three years at Meta where I built infrastructure that serves about half of the world's population. Basically, anytime you type a message on Facebook it goes through tens of safety checks which are powered by thousands of models. I built the infra that these models run on, and then I moved on to Roblox where I built AI-powered systems to protect about $3 billion in payments against fraud. So I've been in this space, I've been using AI models to sort of protect against abuse, and during this time I realized that the models that I'm using themselves are susceptible to abuse. So that's what led me to founding Rynx.

2:14

Speaker B

And I know that now you're thinking about those actual models. So I often tell people also on the AI security side, there's kind of AI for security and then there's security for AI. And it sounds like there's something similar kind of in the, I guess, model or safety anti-abuse space. Could you give us a little bit of an understanding of, like, when we're talking about the safety or security of AI models, could you kind of define that for us? Like, what do you have in mind as the kind of bad case scenarios or worst case scenarios of what a model could do? Why it might not be secure or safe.

3:14

Speaker D

Yeah. So thanks for making that distinction between AI for security and security for AI. And they're two very different things. Right. Like AI for security basically means using AI to solve existing security challenges in a more effective or a better way. Right. So that's a very different and a very linearly separable body of work from security for AI, which focuses on making the AI models themselves, and AI-based use cases, secure. As models have entered the tech stack, they also bring in a bunch of security challenges, and that is what security for AI solves. Now in terms of the safety aspects that we talked about, models today will generate anything. These generative models, like if it's a text model, it'll generate any form of vile content known to man. OpenAI got into trouble where a teen was encouraged to commit suicide. Right. So you have self-harm, you have different other categories of harm where you can, you know, generate pornographic content, you can generate other forms of inappropriate content like violence, gore. And sometimes it doesn't even have to be inappropriate, because safety is very context specific. Right. Like as a law firm, safety looks very different for you than what it does for a medical shop versus say a customer service setting versus a code generation environment. So each one of these use cases comes in with a set of permissible and non-permissible behaviors. And that's kind of what safety really is: making sure that the technology works in ways that you intended it to and minimizing the unintended outcomes of it.

3:59

Speaker B

And I guess people are addressing, I mean, these are known issues in the sense that at least a segment of the people that are working on these models know about these issues. Could you give us a little bit of a sense of, as we speak, how these are addressed, at least in a production sense, like what's capable now? How does the landscape look in terms of capabilities to defend against or align with certain policies or allowed content, disallowed content? What are our choices right now? If I'm looking at the availability currently of both kind of open source projects and what's in maybe closed platforms, what's available to me to deal with this issue?

5:47

Speaker D

Okay, so before I answer that, I'll start with an analogy which will help make the rest of the response much more clear, add some context. So imagine you are in a giant apartment building with, say, over a thousand condos. Now let's say your neighbor, for some reason or no reason at all, decides to pull out a golf club and starts violently assaulting you with it. Now, for a situation like this, will the good guys be able to protect you just by checking IDs at the gate?

6:42

Speaker C

Obviously not.

7:19

Speaker D

That's kind of where we are.

7:20

Speaker C

You're past that point.

7:21

Speaker D

So that's kind of where we are with AI safety today. That's kind of how jailbreaks work, right? So here the giant thousand-story building is the model. So today what we are able to do is just today's solutions, analyze what's going into the model, also known as the prompt, and analyze what's coming out of the model, which is the response. But by then the damage has already been done. So now if you're using video generative models, right, you can put in a text prompt, get a video output. Your video output is too expensive to analyze, number one, and the video has already been generated, so you've already spent a large amount of compute generating that bad content. Again, if you're talking about audio models, you can trick audio models into generating bad content. Like a seemingly innocuous-looking prompt can trick the model today, using a multitude of techniques, into generating really malicious output. So now unless you have visibility into what's going on inside of the model, you're not going to be able to catch a lot of these things. That's where jailbreaks come from. That's where adversarial machine learning comes from. If you look at it through the context of predictive models versus generative models, it's essentially the same core phenomenon: we're operating these models as black boxes and we have no idea of what's going on inside of them. So we're trying to change that.
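
To make that prompt-and-response filtering flow concrete, here is a minimal sketch in Python. The keyword check is a toy stand-in for a real guard model such as Llama Guard, and every name in it is an illustrative assumption rather than any vendor's actual API; the point is the shape of the pipeline, not the classifier.

    # Toy sketch of today's guardrail pipeline: filter the prompt, generate,
    # then filter the response only after generation is complete.
    BLOCKED_TERMS = {"build a weapon", "self-harm instructions"}  # toy policy

    def toy_guard(text: str) -> bool:
        """Return True if the text violates the toy policy."""
        lowered = text.lower()
        return any(term in lowered for term in BLOCKED_TERMS)

    def toy_primary_model(prompt: str) -> str:
        """Stand-in for the primary LLM; the real generation cost is spent here."""
        return f"Model response to: {prompt}"

    def guarded_generate(prompt: str) -> str:
        if toy_guard(prompt):                       # 1. prompt filter
            return "[blocked by prompt filter]"
        response = toy_primary_model(prompt)        # 2. primary inference
        if toy_guard(response):                     # 3. response filter runs only
            return "[withheld by response filter]"  #    after the content exists
        return response

    print(guarded_generate("What's the weather like today?"))

Note that the response filter cannot start until the primary model has finished generating, which is where both the wasted generation cost and the added latency discussed later in the conversation come from.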

7:23

Speaker C

Looking at that, it seems like in the way that you just kind of phrased it with the context there, it seems like a very intractable problem. As a follow up to Dan's question, like how should I as a new user potentially or someone coming into a use case in my company where a model is desirable, but I'm worried about whatever bad or abuse means in the context that I'm operating in. How should I start off thinking about that? What's my starting point? Because I gotta say, coming into it, I'm not even sure where to start. So can you level set that a little bit in terms of, you know, what, what's, what's square one?

8:46

Speaker D

Yeah, so that's a good question. So normally, the way I like to think about it is there's a general category of bad stuff that no one really wants or the law doesn't allow, and things like that. So there's a general category of stuff like that, right? Like here I'm including porn, hate speech, yada yada yada, a general category of undesirables that no one wants on the platform, like child safety issues, for example; that's non-negotiable no matter what context you're in. Now there's also another aspect of categories, which is about context-specific safety. So now if you're in a banking use case, you've got to think about, okay, money laundering. You might not have to think about money laundering, say, in a code generation setting, for example, right? So you want to think about these very specific categories of risk or issues that come from your use case. And people tend to usually have a very good idea of that. Like, if you understand your use case well, which most people do, right, which is why they're exploring models and they're trying to solve some problem within their use case. So within that use case you also would understand the problems that you're facing. And that's another category of risks that you want to think about. Then once you've thought broadly about these two, once you've identified both of these categories, then you want to think about mitigations, detections and things like that.
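
One way to make the two-tier exercise described here concrete is as a simple inventory: a baseline of non-negotiable categories plus categories tied to the use case. The category names and per-use-case examples in this sketch are illustrative assumptions, not a prescribed taxonomy.

    # Sketch of a two-tier risk inventory: universal categories everyone
    # excludes, plus use-case-specific categories. Names are illustrative.
    BASELINE_CATEGORIES = [
        "child_safety",        # non-negotiable in any context
        "hate_speech",
        "sexual_content",
        "self_harm",
    ]

    USE_CASE_CATEGORIES = {
        "banking_assistant": ["money_laundering", "unlicensed_financial_advice"],
        "code_generation":   ["malware_generation", "secrets_leakage"],
        "customer_service":  ["off_topic_content", "competitor_promotion"],
    }

    def risk_inventory(use_case: str) -> list[str]:
        """The list a team would then design detections and mitigations for."""
        return BASELINE_CATEGORIES + USE_CASE_CATEGORIES.get(use_case, [])

    print(risk_inventory("banking_assistant"))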

9:26

Speaker B

And I guess, if we're just kind of defining some terms for people, they might have heard of this approach that you talked about related to guarding the gate to the apartment complex, right, or the inputs or outputs, managing the prompts and the outputs. Is that what people refer to as, like, guardrails? Safeguarding? What is some of the terminology that's being used? And then I know that you all have a different way of thinking about this. I guess just setting the stage, jargon-wise, how do we define these kind of guarding-the-gate things? And then as we filter into actual safety within the apartment complex or within the model, what kind of terminology? And I guess there's a body of research building up to what you're doing, what terms are used to describe that, and if people wanted to research that, what would they look for?

10:47

Speaker D

Yeah, so guardrails essentially are a catch-all term. They can refer to prompt and response filters. Today there are multiple guardrail-type solutions out there. There are guard models out there. Meta has one, IBM has one, Google has one, OpenAI has one. And I'm talking about public releases, right? Internally most people have their own, but these essentially are prompt and response filters. So they look at the data going in, they look at the data coming out. So that is one thing that guardrails are used to refer to. Less commonly, guardrails are also used to refer to static checks where you just look at the output of the model and say something like, okay, a forbidden word, let's say the F word, right? The F word appeared in the output. This is not permissible. So that's a simple regex filter that you can use. So that would also be called a guardrail. In terms of looking at the internal state of the models, there's a whole body, a whole field of research, an area of research that's developing. It's called interpretability. There's a subset of that called mechanistic interpretability, where they try to figure out what subcomponent of the model led to this particular output and try to change it at the source, or try to alter or modify behavior while it's happening, as opposed to after or before.
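
As a small illustration of the "static check" style of guardrail mentioned here, a plain regex over the model's output, with no second model involved, might look like the following; the word list is a placeholder.

    import re

    # Static-check guardrail: a plain regex over the response, no guard model.
    FORBIDDEN = re.compile(r"\b(forbidden_word_1|forbidden_word_2)\b", re.IGNORECASE)

    def passes_static_guardrail(response: str) -> bool:
        """Return False if the response contains any forbidden term."""
        return FORBIDDEN.search(response) is None

    print(passes_static_guardrail("A perfectly harmless sentence."))  # True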

11:47

Speaker E

Well friends, here's your hot take of the day. Your team's AI tools, they might be making collaboration messier, not faster. You probably know this, you feel this. Think about it. You've got AI literally everywhere now, summarizing, generating, suggesting. But if there's no structure, no shared context, you're just creating more noise, more outputs, more stuff to wade through. The gap between having a great idea and actually shipping that idea, that gap isn't a speed problem, it's a clarity problem. Well, this is where Miro comes in. And honestly, it shifted how I think about team workspaces. Miro's innovation workspace is not about brain dumping everything into an infinite canvas and just hoping for the best. It's about giving your work context, intentional structure, so your team knows what to focus on and where to find what they need without playing detective across 12 different tools, clicking and moving and tabbing, just too, too messy. And the AI piece, well, Miro AI actually gets this right. They've got these things called AI sidekicks that think like specific roles, like product leaders, agile coaches, product marketers, reviewing your materials and recommending where to double down or where to clarify. You can even build custom sidekicks that are tailored to your team's exact workflow if you desire. And then there's Miro Insights. It sorts through sticky notes, research docs, random ideas in different formats and synthesizes them into structured summaries and product briefs. And Miro prototypes let you generate and iterate on concepts directly from your board. Test 20 variations before you ever touch your hi-fi design tools, saving you time, giving you ideas and getting it right. This whole thing is built around the idea that teamwork that normally takes weeks can get done in days, not by going faster, but by eliminating the noise and the chaos. So help your teams get great done with Miro. Check it out at miro.com to find out how. That's M-I-R-O dot com. Again, miro.com.

13:29

Speaker B

So Ali, you were just getting into these ideas of interpretability. Mechanistic interpretability, I think you called it. We've talked about interpretability on the show before, I think mostly in relation to trying to figure out why a model made a certain decision, in relation maybe to certain concerns around bias or other things. So if I have a risk model for approving insurance or something like that, then I might need to have some interpretability around that. Or maybe in the case of healthcare, there's a burden for interpretability of how decisions were made. Here it sounds like the interpretability is being applied to, I guess, why the model generated some problematic output, if that's a good way to put it? Or is there a better way to think about that?

15:45

Speaker D

So there's multiple overlapping aspects here, right? Interpretability is like an umbrella research area. It's an umbrella term. What you alluded to earlier is also described as explainability. So why was this credit card denied? You're trying to explain it in human concepts. That is a part of interpretability, no doubt. It's a subset of it. There's another subset which is, how was this generated? Right. So, for example, if you say "Hi" and the model says "How are you?", you want to care about how that was generated internally, how those tokens were produced. The reason you care about that is you want to know why, instead of responding with "How are you?", it could respond with "Howdy" or with something else. You want to know what caused those differences, and you want to be able to control that. So that aspect is also interpretability. Now, where this interplays with safety is when you have these prompts which look good to a human but result in bad outputs, which is how jailbreaks work. When you analyze how the data flows inside of this black box, you're able to control it and stop it at the source. So, continuing the analogy from earlier on, think of this as cameras at every gate or every path. So you know that, okay, this is what's happening in this hallway, and we've got to stop it. We've got to put an end to it. So it's a very different class of defenses.

16:44

Speaker C

As you're saying this, you're actually talking about manipulating the internals of the model and the flows that are there, kind of the cameras on the doors and stuff like that, and making it maybe a gray box rather than a black box to some degree, as opposed to the more traditional guardrail approach where you have programmatic, you know, you use the word guardrails, around the model's inputs and outputs to try to handle things that way. So it's kind of a whole different thing: instead of treating the model as a black box, you're saying you're diving into it and trying to effect an improvement there.

18:23

Speaker D

Yeah, so intervening, like stopping generation or modifying it, that is one form of intervention. Right. Intervention does not necessarily have to be in that form. It can take various other forms. Today, we have no idea what's going on inside of the model. We provide an input to the model, get an output. We have no idea what's going on. So what we're trying to build, or what interpretability tries to do, is understand what's happening inside. Now, you could control it, or you could use that to make a risk quantification and use that downstream. You don't have to do anything in the moment, necessarily; it can be leveraged downstream. So now it's basically like you have a whole new set of information, a whole new class of data points that you can leverage in creative ways downstream. This is something that's not available today. And this is what interpretability builds, really, at runtime.

18:59

Speaker B

And am I correct? So part of my assumption in the past, and I'm fascinated by this whole subject, is like, like some small changes in the, for example, the weights or individual layers or individual pieces of the model can produce very large changes in the output behavior of the model, which it's kind of. I'm blanking on this. What is the thing? It's like butterfly flaps its wings and.

19:55

Speaker C

Oh, the butterfly effect.

20:29

Speaker B

Yeah, the butterfly effect or whatever. So like these. Because you can make a change, whether that's quantization or other things to the model and that, you know, may produce unclear and sometimes catastrophic changes in the, in the behavior of, of the output. And so if I'm understanding what you're saying, right, Ali, it's one way you could try to use the information about how the model is producing certain outputs is to intervene by actually making a modification or preventing something in the model. But that could produce other changes that you may not want, I'm assuming. But you could also instrument the model to understand potentially when it is kind of firing those certain neurons or lighting up in a certain way that is indicative of problematic behavior. Am I understanding that right? In terms of various ways of, I guess, either intervening or instrumenting. I don't know if I'm using the right terms.

20:30

Speaker D

We are instrumenting. Like, the way we're approaching this is we are trying to understand what happens inside of a model at runtime. Models are a monolith, right? But we're breaking it down into different spaces or subspaces. And we look at the subspaces that get activated during bad generation. So now when you're, let's say, generating non-permitted content versus permitted content, different sub-regions of the model get triggered. So we're building visibility into that and we're trying to identify them at runtime. Now there are some sub-regions that you wouldn't care about, right? Like, for example, if you take a general-purpose LLM, it's trained on everything ranging from Python code to 15th century Chinese poetry, right? Now when you're using it in a customer service setting, you care about neither one of them. And if those sub-regions of the model are getting activated, then you want to be able to, like, arrest it while it's happening. So this is similar to, if you go back to the analogy that I made about the apartment building, you want to have visibility at all times into what's going on at each level, right? So you find out that a bad thing is going to happen way before it actually happens. Like, for example, if someone's going to conduct a bank robbery, people don't just get up and conduct a bank robbery, right? There's some searching going on, there are cycles of planning going on, there are purchases of firearms or whatever going on. So now if you stop them at these bad activities at different levels, the police don't have to deal with the shootout situation in a bank at the very end. So similarly, defense works in depth. Right. And we're building a whole new layer of safety that hasn't been tapped into just yet.
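
For readers who want a feel for what analyzing internal states at runtime can look like, here is a rough sketch that reads a transformer's hidden states and scores them with a tiny linear probe. It illustrates the general mechanistic-interpretability idea under stated assumptions (which layer to read, how to pool, an untrained placeholder probe); it is not Rynx's actual method.

    import torch
    import torch.nn as nn
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; any open-weight model
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

    # Tiny probe over a hidden state: millions of parameters, not billions.
    # In a real system it would be trained offline on examples of permitted
    # vs. non-permitted generations; here its weights are untrained placeholders.
    safety_probe = nn.Linear(model.config.hidden_size, 1)

    def risk_score(text: str) -> float:
        """Score a forward pass using internal activations only."""
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
            # Assumption: read a mid-layer hidden state for the last token.
            h = out.hidden_states[len(out.hidden_states) // 2][0, -1]
            return torch.sigmoid(safety_probe(h)).item()

    # Downstream, a runtime policy could log, block, or steer when this score is high.

Because the probe reads activations the primary forward pass already computes, it adds no second model and essentially no sequential latency, which is the economic point made later in the conversation.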

21:34

Speaker B

Yeah, that's fascinating. And I'm guessing certain folks are probably wondering, like I am out of curiosity, like, this might be the first time that they're hearing about such an approach to this kind of problem. And they might be thinking, you know, how is this possible? So one thing is, I think the general concept makes sense, right? Like instrumenting the interior of the apartment complex, understanding what's happening, kind of retrieving that intelligence for you to make decisions or determine if you want to mitigate something. Have there been multiple attempts to try this sort of thing? And from your perspective, in terms of how you all are approaching it, what is kind of needed, I guess, from the customer side to create this sort of instrumentation? So I could imagine, in one scenario, I could tell the customer, well, we're not going to train on Chinese poetry or code, we're just going to train a whole new model. And that burden on the customer is very, very heavy. Right. And in another scenario, you could say, oh, take this model off the shelf and do this to it, and that's less burdensome. So there's probably a spectrum here. Could you help us understand, I guess, the burden and what might be required to get to this instrumentation?

23:20

Speaker D

Yeah. So the way we're approaching this today is we do not modify the model. We do not even build models, nor require the customer to build a model. We take an off-the-shelf model and we build a safety module that sits on top of it. So essentially for the customer, it's a very low-friction approach where they take a model that they use and love, like Llama.

24:55

Speaker C

Or.

25:16

Speaker D

Granite or any Mistral or any of the open-weight models. For image generation you have WAN, and you have a whole host of other Chinese models for the audio and video setting. So any model that you love, we sort of make it more secure and tailor it for your context. So now again, remember, if you're, like, a law firm, right, an off-the-shelf Llama or off-the-shelf Mistral is not going to have the protection that you need. Or if you're, say, a shoe company, let's say you're Adidas or Nike, right? You as a user want to talk about Nike but not talk about Adidas, right? You can't expect the model maker to put that in for you, because the model maker is trying to sell to everybody. So we help build that customization, and we do that without changing your primary model. Your primary model will continue to be as it is. If you make any modifications to it, like fine-tuning or anything like that, that's on you. You control that. We don't require you to do it. But even if you do that, we can still support you.

25:18

Speaker C

So totally recognizing that there's proprietary stuff that you're not going to dive into and respect that. Could you talk a little bit about just kind of clarifying as we were kind of talking earlier about kind of the buttressing with guardrails on the external side versus going into the model and as you're talking about adding a component. So in my confusion, it seems a little bit like it's on the outside. Can you talk a little bit about what you mean by that without diving into places you can't go?

26:20

Speaker D

Yeah. So I'll give you a very. I'll try to address that as much as possible without going into the specific.

26:49

Speaker C

Fair enough.

26:56

Speaker D

Specific details. So today what you have, right, you have these filters which analyze the inputs and the outputs. Now, the economics here is all messed up. Like if you were to analyze video or audio, right, those tend to be very expensive. Computationally, those models are very expensive. So if your inference itself costs X and you're expecting someone to pay another X to analyze it, number one, it's slow. Number two, it's like paying someone $1,000 to guard a $100 bill; you're just not going to do that, right? So what people end up doing is they end up shipping unsafe models. Today we have tested models from different audio companies, we've tested models from different video companies, different image generation models, each and every one of them with little to no trickery, which means an average user can just go there and ask them to generate bad stuff, and they will do it. There are little to no guards there. And that's because the economics does not make sense the way things are today. So what we've built is, we've sort of had a research breakthrough where we can build safety about a thousand times cheaper. So just to give you some concrete numbers, what we've done with Llama: we've taken a Llama model, which is like an 8 billion parameter off-the-shelf Llama model. And today if you had to protect it, you would have to use Llama Guard 3, which is an 8 billion parameter model. And assuming it generates 10 tokens, you're running about 80 billion parameters of inference at runtime. Now if you do that on your prompt and response both, that number balloons to 160 billion parameters of inference. That is two extra GPUs, or one extra GPU, depending on how you've wired or set up the deployed models. So now what we are essentially doing is we're analyzing the internal states of the primary model as it makes the prediction. So in doing so we don't need any of those two extra GPUs, and that 160 billion parameters of inference that I counted, we have succeeded in bringing it down to 20 million, with an M. So we're essentially a rounding error. Today, because of this expensive safety profile, you cannot even deploy them on edge devices. On edge devices, guardrails are nonexistent, because when people are working on the edge, a lot of people work really, really hard to squeeze that one model, through techniques like quantization, onto the limited memory of the device. So you have no room to deploy a safety model. Right. So that's why we've built tech which literally is like a rounding error. 20 million parameters on 8 billion is nothing. And we can sort of deliver comprehensive safety. And our safety performance is comparable to a standalone guard model, and it is significantly faster, because our latency just doesn't exist. It's parallelized, and in practice the latency of the primary model is the latency that the user sees. Today, you have to account for the latency of the primary model and you also have to account for the latency of the response filter and the prompt filter. Also, the response filter cannot kick in until the primary model is finished generating. You're looking at very high latencies from the perspective of the end user. You're looking at a lot of added friction in terms of slow speed. You're looking at increased costs, because ultimately the cost will be passed on to the user. You're paying for two extra GPUs that you don't have to. Your quality will be substandard. Again, remember, all these models are able to do is check IDs at the gate, so that is the protection you're getting.
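
As a back-of-the-envelope restatement of the numbers quoted here, under the assumption that "parameters of inference" means roughly model parameters times forward passes (about one per token scored):

    guard_params  = 8e9      # Llama Guard 3: ~8B-parameter guard model
    tokens_scored = 10       # assumed ~10 tokens generated per guard verdict
    per_check     = guard_params * tokens_scored   # ~80 billion
    both_checks   = 2 * per_check                  # prompt + response: ~160 billion

    probe_params  = 20e6     # the ~20M-parameter internal-state approach
    print(f"guard-model work per request: {both_checks:.0e}")
    print(f"internal probe size:          {probe_params:.0e}")
    print(f"ratio: roughly {both_checks / probe_params:,.0f}x")  # thousands of times smaller

However the accounting is done, the internal probe is orders of magnitude smaller than the guard-model path, which is the "rounding error" framing above.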

26:56

Speaker B

So, Ali, it's very fascinating and encouraging, the results that you're seeing and what you're able to do with this sort of approach. I'm also wondering, the size or latency or behavior that you just described, that is certainly a huge component of what people are thinking about and why they can't use guardrails in certain cases. Another question that might come up, though, and I actually think you can validate me, but I think you have a very good answer to this, is: what about the kind of accuracy or reliability of one approach or the other? So a person, I guess, could argue and say, well, if I have a guard at the gate and he's 100% accurate in making sure no gun ever gets into the building, then there'll never be, you know, there'll never be a shooting in the building. And that's a very robust guardrail or something like that. But I think what you're saying is you still wouldn't know what happens inside the building with 100% certainty. So could you address that side of things, the accuracy side or the quality side, I guess, of the performance of guarding and safety with this kind of instrumented model approach versus kind of an exterior guardrail?

30:35

Speaker D

Yeah. So an exterior guardrail, as you pointed out, is limited in visibility. Right. So even if they do a 100% great job of checking someone's IDs, they only have limited information. There's only so much you can do. Right. The kind of defenses you're expecting is out of scope. You're limited by information there. If you don't know how to drive a car, you're just not going to be able to do it. No matter how fit you get, no matter how much you train, you are not going to become a race car driver if you cannot drive a car. Similarly, in this security setting, if the example that I use where somebody, you know, decided to pick up a golf club for some reason or no reason at all, you weren't checking for, like, golf clubs are permitted items to go bring into a home. So the security have done their job right. They haven't done anything wrong, but there's a fundamental limitation that exists here. So you can only do so much looking at artifacts. So there's a whole layer of safety that is untapped or unaddressed or inaccessible because of scientific limitations. But that's changing fast, and we're sort of at the leading edge of it.

32:03

Speaker B

And just to make sure that I have it right. Would it be a good way to describe it that, let's say I want to prevent, you know, toxicity of a certain type coming out of the model. Right. There are a variety of inputs to the model that could result in that type of output, but I'm never going to know all of them, or there's always an edge case. And so by instrumenting and saying this part of the model lights up when toxicity is being produced, then I no longer have to worry that I have all of the possible inputs in the world put together that might trigger toxicity. I just know when there's toxicity. Is that an appropriate way to put it?

33:21

Speaker D

Yeah. So when you're on the defense, when you're playing defense, right, when you're defending models against abuse in any scenario, whether you're defending models or just protecting a platform in a classical trust and safety sense, you're never going to have an exhaustive list of the million ways in which things go wrong. But you can develop a fair understanding through past examples or through data points that you have. That's the benefit of machine learning, right? But the way jailbreaks work, or the way adversarial examples work, is that they do certain things inside of a model that are not possible to predict in a different model which is being used as a guard. Right. So the guard is model type A. The primary model that you're protecting is model type B. So the guard is not going to be able to predict what's happening in model type B, simply because it doesn't have visibility into what's happening inside the model. So without information, you're not going to be able to do anything, right? Like, for example, if you're the SEC and somebody takes away your access to bank accounts, you're not going to be able to prevent money laundering. No matter how many books you've written on that subject, there's only so much you can do with guns and badges. Right. You would need visibility. If you want to try preventing money laundering, you will need visibility into the financial system. So that's kind of what we're building here. In terms of accuracy, the numbers speak for themselves. Like, we're able to match and beat the performance of standalone guard models over 1000 times our size. And that's because we are exploiting this unique insight.

34:18

Speaker C

Yeah, it sounds a lot like the analogy in neuroscience would be, if I can pronounce it right, an electroencephalogram, an EEG, where they monitor the synaptic connections and they can see it lighting up. You know, since I can't pronounce the word, I'll just describe it to the best of my ability. It's a long word. I actually had it written down. I was like, oh, crap, I still can't pronounce it here. So it sounds like that. As you're thinking about how you can apply that, are there any ways that can kind of tie back also into the hybrid approaches? Like, if we talk about some of the other more traditional forms of guardrails that are out there, can you combine them into sort of a hybrid approach where you do have different types of guardrails that are in place, but people can add this kind of capability from you into it and thus enhance their overall security model? What would that world look like, in your view, if that's valid?

35:51

Speaker D

Yeah, I mean, I do. I'm a firm believer in defense in depth. So one product does not miraculously solve everything, just like with our human society. Right. Like, if you think about law enforcement, it's a very good parallel, where for national security you need an army to protect you, you need different forms of the military to protect you from external threats. You need the border police to make sure that entrance is regulated. But at the same time, you also need state and local law enforcement. You also need federal civilian law enforcement. So you need different levels of that to make sure that security on the whole is ensured. Similarly, in the context of AI models, yes, you have guardrails which look at prompts and responses. They are valuable. Right. But then there are ways to do them efficiently. And then you also oftentimes would need to combine, say, system-level features with model-level features. So we build those model-level features. So you could do something complex, like let's say you're running some sort of customer service bot and a customer has a history of refunds. And then in that, we detect, let's say, some sort of misrepresentation or some sort of lying. So you can compose rules, like saying block if, you know, lying score greater than 0.8 and this customer has refunded more than $1,000 worth of merch. So there's potential to combine, mix and match as well. And I think that's how you improve the overall safety profile of any system. It's always in depth, and those layers have to work together. I can just look at web applications in parallel, right? Like you need static code analyzers and you need a firewall. They don't replace each other. If anything, they complement each other and build an overall robust system.
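
A composed runtime rule of the kind described here, mixing a model-level signal with a system-level feature, might be sketched like this; the lying score, field names, and thresholds are illustrative assumptions, not a product API.

    from dataclasses import dataclass

    @dataclass
    class TurnContext:
        lying_score: float            # model-level signal from internal instrumentation, 0..1
        customer_refund_total: float  # system-level feature: dollars refunded to date

    def should_block(ctx: TurnContext) -> bool:
        """Block only when the model looks deceptive AND the account is high-risk."""
        return ctx.lying_score > 0.8 and ctx.customer_refund_total > 1_000

    print(should_block(TurnContext(lying_score=0.9, customer_refund_total=1_500)))  # True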

36:55

Speaker B

And before we leave the subject and maybe look towards the future a little bit, I do want to highlight, I think, one of the fascinating things that comes out of the work that you're doing, Ali, which is some of the, I guess, customization that's possible, in the sense that we've talked a lot about kind of the, quote, traditional types of things that you would want to prevent, whether that's jailbreaking or toxicity. But there's also, like you were saying, in every industry and actually at every company there might be custom types of policies that they want to enforce or certain things that they want to instrument. I guess, how extendable is this approach to those sorts of situations?

38:42

Speaker D

Yeah, so this is very extendable to those. I mean, our approach is designed for these situations. We've identified this niche in the market: with models, you're shipping a one-size-fits-all solution. So to put it in a different way, the model needs of most companies are similar. Ish. But the safety needs are dramatically different. So you cannot have a one size fits all safety stack that works for everybody. Yeah, sure, there's a general category of undesirables that everybody would want to keep off their platform, but that's a very small subset. With generative models, which are capable of doing so much, you need to be able to customize safety as well. So that's the space we thrive in and that's what we're building for.

39:34

Speaker C

Well, that's very interesting to me. I've learned a lot today. As Dan kind of telegraphed, we like to finish up by asking about the future. And as you're thinking about the future, both in terms of your specific approach, but also where security is going for models in general in the larger sense, what kind of guideposts do you have that you're thinking about? You know, I like to say, at the end of the day, when you're taking a shower or you're lying in bed at night about to go to sleep and your mind's just kind of going loosely, where do you see things going, and what are you really passionate about pursuing or exploring going forward? In time, you know, what's that aspirational, hey, I'd like to go do that, that you have in mind?

40:25

Speaker D

So I think when you look at safety, right, there's different aspects. There's build-time safety, which is a whole different class of safety products. But in terms of runtime safety, today, runtime safety only exists at the data layer, which is at the prompt and response layer. The model layer is missing. My aspiration is to sort of build that out. That has been my vision. That is the guiding vision behind Rynx, where we want to build model-native safety. And I see that that will have to exist for models to be adopted into different settings. Like if you're in healthcare, for instance, today it's very hard to use a public LLM, right, because of data concerns. And no one in their right mind today would fine-tune an LLM on PII data. That's because you just can't. It's just not possible. So it's locking a lot of people out of the ecosystem, and that's the problem we exist to solve. So my vision is to sort of build that de facto model safety layer. No matter what model you're using, I want to become the go-to for model safety.

41:19

Speaker B

That's awesome. Well, I definitely encourage people to check out the show notes, and I should have said this at the beginning, but just to make sure people have it, that's Rynx, R Y N X. So check out Rynx. We'll have the link in the show notes, and yeah, just really fascinating work. And from the community, Ali, just thank you for digging into this topic and bringing a fresh look at things. It's awesome, and I hope to see you back on the show to find out where things have advanced.

42:29

Speaker D

Yeah, thank you for having me on the show, guys.

43:06

Speaker B

We'll talk to you soon.

43:08

Speaker D

Thanks, bye.

43:09

Speaker A

Alright, that's our show for this week. If you haven't checked out our website, head to practicalai.fm and be sure to connect with us on LinkedIn, X, or Bluesky. You'll see us posting insights related to the latest AI developments, and we would love for you to join the conversation. Thanks to our partner, Prediction Guard, for providing operational support for the show. Check them out at predictionguard.com. Also, thanks to Breakmaster Cylinder for the beats, and to you for listening. That's all for now, but you'll hear from us again next week.

43:18