Banks Can't Wait on AI, Former OCC Chief Hsu Says

44 min

Apr 22, 2026

Summary

Former OCC Acting Comptroller Mike Hsu argues that banks and regulators face greater risk from moving too slowly on AI adoption than moving too fast, requiring hands-on experimentation rather than study alone. He outlines how AI doesn't fit existing risk frameworks like model risk management, and introduces concepts like trust engineering, prompt injection, and the lethal trifecta as critical security considerations. Hsu also explores how AI could help regulators reduce policy sludge and improve supervision efficiency.

Insights

Waiting too long to adopt AI has become the bigger risk than moving too fast, similar to how legacy systems pose greater risks than keeping current with technology
AI cannot be shoehorned into existing risk categories (model risk, software, third-party risk) without creating dangerous blind spots; it requires first-principles analysis
Human-in-the-loop approval processes create approval fatigue and false security; effective control requires building guardrails into the agent architecture itself through trust engineering
Regulatory agencies can use AI to address policy sludge and create machine-readable regulations, potentially breaking the boom-bust-reregulation cycle
Agentic AI fundamentally changes cyber risk by enabling orchestrated attacks at unprecedented scale, requiring collective defense coordination across financial institutions

Trends

Shift from gatekeeping-based risk management (model risk management) to architecture-based controls (trust engineering and harness engineering)
Growing recognition that AI safety requires values-based judgment rather than exhaustive rule-based constraints
Emergence of AI-assisted supervision tools enabling real-time, proportional regulatory steers instead of delayed enforcement actions
Industry-led standard-setting for AI safety (e.g., ML Commons benchmarking) to preempt heavy-handed regulatory responses
Convergence of AI capabilities with regulatory modernization, enabling cleanup of accumulated policy complexity
Increasing focus on prompt injection and jailbreak vulnerabilities as primary cyber threats in agentic AI systems
Movement toward machine-readable policy and regulation as foundational infrastructure for AI-assisted compliance
Recognition that AI-driven productivity gains in banking will require new types of compliance and control costs, not cost reduction
Shift in supervisor roles from enforcement-focused to solution-building and trust-engineering focused
Collective action problems in cybersecurity requiring coordinated defense across competitive banking institutions

Topics

AI Adoption Strategy for Banks
Model Risk Management Limitations
Trust Engineering and Harness Engineering
Prompt Injection and Jailbreak Vulnerabilities
Lethal Trifecta Cyber Risk Framework
Human-in-the-Loop Approval Fatigue
Agentic AI and Autonomous Systems
Policy Sludge and Regulatory Modernization
Policy as Code Implementation
AI-Assisted Supervision and Examination
Regulatory Category Errors
Mechanistic Interpretability Limitations
Chain-of-Thought Traceability
Collective Cyber Defense Coordination
AI Safety Benchmarking Standards

Companies

Anthropic

AI safety research firm whose Claude model and Constitution document are discussed as examples of values-based AI design

OpenAI

Mentioned as developer of LLM technology and general AI capabilities referenced throughout the discussion

Office of the Comptroller of the Currency (OCC)

Regulatory agency where Hsu served as Acting Comptroller; recently revised model risk management guidance to exclude ...

Federal Reserve

Regulatory agency involved in AI policy and supervision; Hsu served on FDIC board and FSOC during his tenure

FDIC

Federal banking regulator where Hsu served on the board; involved in AI guidance development

ML Commons

Nonprofit organization developing safety benchmarking and agent risk evaluation standards for financial services AI

SEBI

Indian securities regulator mentioned as example of payment system innovation and fraud detection challenges

Cambridge Judge Business School

Institution where Hsu is a fellow conducting research on policy sludge and policy as code

IntraFi

Host Rob Blackwell's organization; serves as platform for Banking with Interest podcast

People

Mike Hsu

Primary guest discussing AI adoption, risk management, and regulatory modernization in banking

Rob Blackwell

Podcast host and interviewer; former Editor-in-Chief of American Banker

Simon Willison

Researcher credited with coining term 'prompt injection' and discussing agentic AI security implications

Dario Amodei

Referenced for writings on AI adolescence, reward hacking, and values-based AI design

Chris Inglis

Quoted for famous statement about systemic cyber risk: 'one click away from a systemic problem'

Elizabeth Warren

Mentioned in context of Kevin Warsh Fed nomination hearing questioning on election and Fed independence

Kevin Warsh

Subject of Senate Banking Committee hearing discussed in episode's second segment

Jerome Powell

Mentioned in context of Senate inquiry into Fed building reconstruction costs

Quotes

"You can't stand still. You can't sprint. You got to do something that's in between."

Mike Hsu•Early in episode

"You got to get hands on keys. You can't study it. You can't consult your way through it."

Mike Hsu•Opening segment

"The path that banks and regulators can walk that's safe is narrower. Every week there's a new thing and the effects are compounding."

Mike Hsu•Early discussion

"At some point, waiting becomes a risk. And I think we've seen this with digitalization. We've seen this with cloud adoption."

Mike Hsu•Risk discussion

"If you haven't vibe coded, then you don't know what it means to use a coding agent to be able to build tools that will then do other things."

Mike Hsu•Agentic AI capabilities

Full Transcript

Every week there's a new thing and the effects are compounding. The more you learn, the faster you can go. And so the path that banks and regulators can walk that's safe is narrower. And this is really hard. You can't stand still. You can't sprint. You got to do something that's in between. And I think that's one of the challenges. What I try to encourage folks, you got to get hands on keys. You can't study it. You can't consult your way through it. You have to get you, yourself, and your people have to put your hands on keys and use the tools and say, how can we use this? That's Mike Hsu, the former acting comptroller of the currency, and my guest today on Banking with Interest. I'm Rob Blackwell, Chief Content Officer of IntraFi and the former Editor-in-Chief of American Banker. For the past several years, bankers have talked about artificial intelligence mostly in cautious terms as something promising, powerful, and potentially dangerous. But Mike Hsu argues the debate has now shifted. The real risk, he says, is not just moving too fast on AI. It's moving too slowly and waking up to find that banks and their regulators have fallen badly behind. That is a striking argument coming from Hsu, a lifelong regulator who spent nearly four years running the Office of the Comptroller of the Currency and who is hardly a shoot-first, ask-questions-later type. But as you'll hear, he believes AI is one of those rare general-purpose technologies where waiting can become imprudent, not prudent, and where old risk frameworks may no longer fit what banks are actually dealing with. In our conversation, we discuss why banks cannot simply shoehorn AI into existing categories like model risk, software, or third-party risk, what concepts like trust engineering and human-in-the-loop really mean in practice, and why agentic AI could fundamentally change how banks think about cyber risk, compliance, and supervision.
We also get into whether AI can help regulators and banks cut through what Hsu calls policy sludge and how supervisors might use these tools themselves rather than merely warn others about them. Hsu served as acting comptroller from May 2021 to the beginning of 2025. During that time, he also served on the FDIC board, the Financial Stability Oversight Council, and as chair of the Federal Financial Institutions Examination Council. Since leaving the OCC, he has been writing and experimenting extensively on AI, financial regulation, and supervision, which you can read more about on his Substack. Mike, welcome back to the show. Hey, Rob. Thanks so much for having me. You've been writing quite a lot about AI lately, and I want to be clear to folks, this is not just high-level "you should think about AI." You are experimenting with it. You are working with it. You are building tools with it. So thinking very deeply about how AI is going to impact banks. One thing that stood out to me is your argument that at some point waiting to adopt AI can actually become imprudent, not prudent. There's the waiting to see how everything shakes out, but then at a certain point you risk it all passing by. Do you think we're already at that point in banking? And what does that mean for how regulators and banks should be approaching AI? So the short answer is yes. I think we are now at that point. And look, I'm a lifelong regulator. So in my DNA is deliberation and thoughtfulness. That's part of being prudent. And in most cases, that is the right answer. And that does have a status quo bias because sometimes things are new and you just got to be careful with it. Every once in a while, though, there's a general purpose technology where waiting becomes a risk. And I think we've seen this with digitalization.
We've seen this with cloud adoption, where if you wait for too long and the general public moves ahead without you, you're now behind and now you've got legacy systems that pose more of a risk. This technical debt, it's come up. I think it was Southwest Airlines a couple of years ago. It's because they're running on really old systems. And at some point, really old systems pose more of a risk than keeping up. The problem with AI, it's advancing so quickly. Every week, there's a new thing and the effects are compounding. The more you learn, the faster you can go. And so the path that banks and regulators can walk that's safe is narrower. And this is really hard. You can't stand still. You can't sprint. You got to do something that's in between. And I think that's one of the challenges. What I try to encourage folks, you got to get hands on keys. You can't study it. You can't consult your way through it. You have to get you, yourself, and your people have to put your hands on keys and use the tools and say, how can we use this? How is this going to affect us? There's no shortcut to that. I think it's a really good point because I think there's a certain mentality that people are almost afraid to deal with AI or view it in a sort of skeptical way. Because of my past as a journalist, I think there are a lot of people who assume I'm against AI and journalism. No, not all the time. Not in every instance. There are circumstances in which it makes a lot of sense. For example, when I started, we used to have data indicators and they would be released at a certain time by the government and then a reporter would have a half an hour to write it up and they'd release it. Honest to God, there is no reason why a human being needs to write that story ever. Sometimes you could maybe get around it with sports too, although there's a lot of creativity in sports writing. So I don't want to limit that.
I guess what I've been a little bit surprised by is so many journalists will say something like, I won't even touch AI tools. Okay, then I think you're doing yourself a disservice and a lot of other people disservice. I'm not saying they should write your stories for you, but you have to know how to interact with them. I think people are too afraid sometimes to engage with this. And if you don't engage with it, you don't learn from it. It's like saying that you don't ever need an iPhone in 2009 because you have a flip phone and it works perfectly well, but then the entire world leaves you behind. Yeah. No, you got to use it to know both what it can do, what the limits are, et cetera. There's just a lot of noise out there and there's no substitute for direct usage. You were saying that it's changed so much in the past year. I've seen that too. It's leaps and bounds ahead of where it was. It still hallucinates a bit, but a year ago, I was on the show saying hallucinations are a real problem. I can work my way around that, usually by forcing the AI to go back and tell me if it's hallucinated. Yes, absolutely. Those capabilities have improved. The bigger issue, I think, is that now that we're in the world of agentic, the productivity benefits, for lack of a better word, they compound. Once you learn how to do one thing, the ability to then do other things opens up. And you don't know that until you've done the one thing. And so if you haven't vibe coded, then you don't know what it means to use a coding agent to be able to build tools that will then do other things. And what I worry is that there are studies coming in that show that the distance between the leaders and the laggards is getting bigger. That gap is getting bigger. And again, you got to move forward in a way that is going to feel a little bit uncomfortable. But if you don't, then you're really going to be behind. And that's going to be a much bigger problem. 
So to relate it in the banking context, because this is about AI, it's about risk management. One of your core ideas is that AI doesn't fit neatly into the categories banks are used to, whether that's third-party risk or modeling and software, and they sort of treat AI as close enough to one of those to actually create risk. Can you take me through what you mean by that? And then what are the issues when it comes to risk at banks from this? About a year ago, I was just talking to various bankers and asking them, oh, are you adopting AI? How are you treating it? How are you dealing with it internally? And I talked to three different banks in the span of about a week. And I got three different answers. One bank said, we think about this as model risk. And so the model risk management group is running point. Another bank said, oh, we're thinking about this as software. And so IT ops folks are putting it into the software development lifecycle. And then another bank said, oh, we're thinking about this as third-party risk management because we're buying it from a vendor. And it was basically the same product. And I think what it goes to show is that a lot of these models, it is all three. It's a model, it's software, and it's third party. And internally, the instinct is to say, who do I give this to? It's a who question. And once you give it to someone, they say, well, how do I fit this with what I know? How do I fit this in augmented for MRM or for TPRM? But this gets everything backwards. You need to start from first principles and say, what is this thing? What do I want to use it for? And what are its risks? Otherwise, you're just shoehorning a new thing into an old thing. And the history of that hasn't been good. You've talked about category errors, which is essentially what you're getting at here. But then the problem becomes the blind spot that that creates builds up over time. So what does a category error look like in practice around AI?
And how would we know it's happening before it becomes a real problem? Good example that I think bankers will appreciate. Imagine if you're a dealer bank and you took the C&I credit risk management framework and you applied that to derivatives counterparty credit risk management. On its surface, credit risk is credit risk. How different can it be? But in reality, it's quite different. And this is what led to LTCM. And I think most of your listeners know this, but Long-Term Capital Management was a hedge fund that imploded in 1998, nearly took the financial system down with it. And why? Because the banks that were counterparties to LTCM weren't treating that risk appropriately, weren't measuring and managing the counterparty credit risk. It took that implosion for the industry to say, hey, we need to have practices that are different for counterparty credit risk than we do for commercial credit risk. To me, that's a category error where they just treated it as credit risk, all credit risk being the same, but that's not the case. And AAA rated risks, that was another one in the crisis where structured finance AAA is different than corporate AAA. And yet a lot of risk managers before 08 just put them into the same risk factor bucket. Because it's not just convenient, but to do anything else just requires a lot of thought and requires blazing some new ground. But that's what we need to do with things like AI. So what do you think is the practical solution to this? Can banks just now extend model risk management frameworks to cover AI, or do they need something fundamentally different? It sounds like you're saying they need something fundamentally different. Yeah. Well, on the model risk management, this is now airing about several days after 26-2. The federal banking agencies just revised the model risk management guidance. And in addition to narrowing the scope of it, they explicitly carve out gen AI and agentic.
And they say, gen AI and agentic are so novel, they're not subject to model risk management. We'll issue an RFI later. And so I think that the banks now have their answers. OK, MRM does not apply to generative AI or to agents. But then what applies? I think that begs the question, what do we do now? And I think part of the reason the agencies did that is because model risk management at its core, it's a gatekeeping function, right? A model can't be deployed unless it passes. And that works really well for most models because most models are basically single purpose calculators. But gen AI and agents are not single purpose calculators. They're open-ended. You can type in whatever you want into the prompt window to get it to do something. And so that just requires a very different architecture, a very different way of approaching it. And I think this does create this really interesting opportunity for bankers because there's a choice now. This RFI is going to come out. We don't know what it's going to say, but there's a choice where the banking industry can choose to either try to, what I would say, maximize the short-term gain by saying as little guidance as possible is good, or play for the longer-term stability and trust and try to do something closer to what the industry did after LTCM, which is identify best practices and say, these are the right practices for risk managing this novel thing. You're arguing that it's valuable for the industry to establish detailed, I don't know how detailed, but detailed rules of the road. Because if they don't, it could be left so wide open that anything becomes possible, including not using it enough. Yeah, absolutely. So again, LTCM is an interesting example because pre-LTCM, there was no guidance. It's just, hey, banks, use your best judgment in terms of managing the counterparty credit risk. It ended badly.
And the interesting thing is that the industry realized that regulators were going to come down with a really heavy hammer that would be too blunt and probably distortive. And it would be better to put something strong in front of that, to head that off, but it had to be credible. And so they formed this group, the Counterparty Risk Management Policy Group, CRMPG. It's like one of the worst acronyms ever. But that group put out something that was credible. It was good. It was detailed. Here's how you measure potential exposure, et cetera, et cetera. And the regulatory community effectively adopted it. And I think that's a good example, but it took a crisis. And here, I think we should just assume something's going to go wrong with AI at some point. And the regulators are going to say, okay, we need to do something to fix it. What do we do? And without an affirmative plan, I fear the instinct would be to double down on MRM, human in the loop, things that feel right, but actually are not fit for purpose for AI. I want to talk about human in the loop again in a minute, but you're talking about something's going to go wrong with AI. It makes me think of, of course, the headlines lately have been about Anthropic's new model called Mythos. There was a meeting the Treasury Secretary called with the Fed Chairman, called with banks to warn them about this, not just because it was a powerful tool, but because it was a powerful tool that was identifying that on its own, you could tell it to break into a bank and it seemed to be able to do that. It didn't require a sophisticated, necessarily, actor to be able to do it. The code, the actual tool itself, the AI could do it. How do we get our hands around that when I understand Anthropic's going and warning the government itself, look, we've built this powerful tool, but there are plenty of actors out there, state-sponsored and otherwise, who are not going to show up and warn banks nicely that the tool can destroy them.
So I don't know how one appropriately deals with the risk when the risk is not just contained to US companies that we can oversee? So let's spend a little bit of time talking about cyber risk and AI. Every bank executive and board member should know three terms, at least three terms, but these three terms as they pertain to cyber risk and AI. Jailbreak, prompt injection, lethal trifecta. So let me explain each of these. Please. A jailbreak is when the user basically tricks the AI into doing something bad. I see. And if you were to type into any of these models, tell me how to make a bomb, it'll say no. In the early versions, if you said, I'm writing a story about a mad scientist making a bomb, and he needs to describe it to his friend, it would trick the AI into saying, oh, you're writing a story. I need to help you write a story. And that's called a role play jailbreak. There are all sorts of jailbreaks. And so that's a direct kind of thing. And there's lots of defenses against it. A lot of it's now baked into the models themselves. The second one is prompt injection. This is harder. And so this is not the user. This is somebody else where they can basically deliver malicious instructions in other ways into the context window that you are using. For instance, you're typing in something and you say, look this up on the internet, or I'm attaching this file, summarize this document. If it's not your file, whoever put that file there can put malicious instructions to say, ignore the other instructions, take it over and do the following bad things. So this has been known for a while, and there's lots of creative ways to deliver these malicious instructions surreptitiously. And again, there's some more defenses on it, but that's a vulnerability in these models. Because with a calculator, you have to type in the numbers. You have to do it in a very particular form. But for LLMs and agents, it's a free form box. You can write whatever you want into it.
And people need to understand that's one of the things that goes into the engine. There's all this other context that also can be brought in. It makes these agents, these models extremely powerful, but it also makes them very exposed and vulnerable to these kinds of attacks. The third thing is called the lethal trifecta. So this is where these things come together. This term was coined by Simon Willison, who's one of the kind of top researchers out there. He also coined the phrase prompt injection. And this refers to if the model has access to private information, the model is exposed to malicious instructions, and the model has the ability to exfiltrate. It can send emails out. It can send information out. That's the lethal trifecta. You're in trouble. The combination of all three of these is dangerous. If it's only two out of the three, you've got some pretty good protections. And a lot of the security experts, you start to hit trade-offs between you want capabilities, you want to be able to do all these great things, but those come at a cost. And you just need to be very thoughtful about how you balance these things. I think what scares me about that is I don't know how you guard against it. I don't know how you create regulation that appropriately takes that into account, especially when something is evolving as quickly as AI is evolving? The first line of defense is not going to be the regulators or regulation. This is just going to be good common sense, good practices, talking to your peers, understanding where the capabilities you're seeking to do and the vulnerabilities that come with it and how do you manage that? The good news is there are plenty of folks grappling with this and you have folks offering different solutions, different approaches. If it's too good to be true, it's probably too good to be true. But there are ways to set these systems up so that you can balance the security with capabilities.
But you have to ask a lot of pointed questions in terms of the thing that exactly that you want to do, and what are the safeguards and how you guard against these. And just, you got to be able to speak the language about these things and ask questions like, what are the safeguards against prompt injection? How does this deal with the lethal trifecta? I think that can be helpful. You've also pointed out the ways that the traditional tools that banks use, validation, interpretability, documentation, don't map clearly into generative AI, especially as you move into agentic AI. So if these tools break down, how do you deal with it? What does effective oversight even look like? In model risk management, there is a lot of emphasis on explainability and interpretability. And that makes a lot of sense. But I think sometimes we get the means and the ends confused. The reason that's so critical in model risk management is because when something breaks, that tells you how to fix it and how to prevent it in the future. Take a really complex model, like a value at risk calculation. That's a pretty complicated calculation, lots of moving parts to it. And if it gives the wrong signal and it back tests poorly, you want to be able to open the thing up and say, okay, what do I fix? It's like a complicated mechanical watch, right? Lots and lots of parts, lots of pieces. Oh, that gear is not working right. So therefore I got to replace the gear. With AI, the LLMs don't work like that. They're like brains with lots of neurons. I think the latest models have well over a trillion parameters, but it does something, the neurons light up. And you can maybe, Anthropic, some other researchers, they use this method called mechanistic interpretability to try to understand which of the neurons are firing when it does something. That's important. It's great research. It's super interesting.
But it doesn't help you answer the question, when something goes wrong, what can you do to ensure that you fix it and it won't happen again? You're not going to go in and refire the neurons. It's not how you deal with that. You need a different method. And there are lots of folks who are working on ways that you can say, I want to be able to trace the reasoning. I want to be able to trace the steps that the model took to get to a particular outcome. And I want to be able to go back and see, ooh, at that step, this decision, this action, that's where it messed up. And now I'm going to put in a safeguard to fix that. That's like an architecture traceability. You'll hear a lot about tracing and observability as we move forward. And that's part of that discussion. Chain of thought, seeing the steps of the reasoning. There's other methods depending on what the application is. But that's the kind of thing that we're going to need to do because you can't go in and rewire the model. That's just not a realistic or feasible option. Can you talk to me about trust engineering and explain it for people who don't live in this world today? What is it and what kind of difference it makes? Why would that matter? Yeah. Okay. To understand trust engineering, it helps to distinguish between LLMs and agents. So an LLM will give you a response. An agent will do things. An LLM will say, here's how you can build something. And the agent will just build it, whatever the it is. And they can do it much faster than people can. And the analogy I like to give is in transportation, if you have a horse and a buggy, the power source for the horse and buggy is the horse. And the way to stop it was a handbrake. Agents are now engines. It's not a horse anymore. It's like an engine. And a bigger handbrake is not going to work. You can't just apply more hand force to stop it.
You need a totally different technology to be able to stop the vehicle safely, or you just got to run the vehicle super, super slow. So trust engineering is all about saying you got to put those guardrails and controls and say, you got to build them into the agent itself. And so you'll hear a lot more about this thing called harness engineering. An agent is basically a model with a bunch of stuff around it. Context, tools, all these things that the agent can then reach out and do to do things. Harnesses are the kind of the controls around all of that. What information does it have access to? What tools can it access? How can it use the tools? When can it use the tools? All of that is part of the harness. And the idea with trust engineering is that you want people, all their job is to think, how do I build a harness that makes the agent do the things that we want it to do, not do the things we don't want it to do, and makes it traceable so that if things break, it can learn and correct it in the future. Someone's got to be responsible for that. You mentioned human in the loop earlier, and there are certain people who know what that is and some people who don't, and some see it as a solution. So can you take people through what is human in the loop when people talk about it as a sort of fix for AI or a guardrail for AI, and why do you think it may not actually be that guardrail? So in the early days, which is not that long ago, I think the fear was that these agents were not very well controlled and that the way to get comfort, to reassure ourselves is to put a human in the process to approve things. So the agent can only go so far and then a human would be asked to say, do you agree or not, approve or disapprove? And that would provide a safety valve, put some guardrails, put some controls around what the agent can do. And that feels good. 
The problem with that is that the way these agents have developed is that they move much, much, much faster than people can possibly approve them if you want to gain the real capabilities from them, because they're making way too many decisions. And a couple of good examples. Rob, have you used these coding agents? I have only experimented a little bit. I tried to use one to do something for my car, and it was actually a two-step process. The first step worked, the second step didn't. So, but I know nothing about coding. For me, it was sort of impressive to be able to say that I coded anything at all. Yes. So I'm in the same boat. And the first time you use it, it pauses and it asks you, I want to do the following thing. Do you approve? And you click yes. And then 10 seconds later, I want to do the following thing. Do you approve? And you click yes. And if you use these things a lot, it's like you're clicking yes a lot. It's sort of like a legal disclaimer on a Netflix agreement. I mean, I'm not paying attention to it anymore. Yeah, whatever. Exactly. So then you get approval fatigue. And then the question is, is this doing anything? Asking for human approval for all these steps, is that actually increasing oversight or is this just the appearance of oversight? And I think it's the appearance. And to go back to Simon Willison for a second, he had this really interesting post and he discusses this in a podcast where he visited a company that does online identity. That's a really high bar for security and for integrity. And this company, basically, no humans write code, no humans review code. All the code is handled by LLMs, by agents. And he thought that was, at first, that's just nuts. Why would you do that? There's no human in the loop at all. They explicitly took humans out of the loop. Why would they do that? And he visited them and they basically showed him that they have all these controls that are run by the LLMs. LLMs are checking LLMs. Agents are checking agents.
And they invest a ton in it. So instead of a human checking every 10th step or every 10th action, these things are checking every single step all the time, 24 hours a day. And he came away thinking, this is a glimpse into the future. If this is done right, it's actually stronger because it's got more coverage. It's got more checks. It's got more tests. It does more than a person could, but it has to be done right. And so I think this is one of the hard things, where people don't realize that's expensive. That is token expensive. This is something that banks, enterprises, regulators are going to be conscious of. There are no shortcuts. If you're going to build something that's going to be autonomous, it's going to have to have a lot of checks against it. Those checks are going to eat up a lot of tokens. That's probably the right answer. And so this idea that we're going to have lots of agents doing awesome things and all the compliance and control costs are going to come way down, I don't know if that's right. I think there are going to be different kinds of control costs and compliance costs. Folks have talked about AI agents doing BSA/AML. I'm actually a big supporter of that. If it's done in the right way, again, not cheap from a token perspective, but it could be more comprehensive and more real time. We have to put these things into context. If you think about where do people sit, your question about human in the loop, where do people sit in all this? It's in the design and it's in the trust engineering. I think there have to be people responsible for making the system trustworthy. One of the things I worry about with the system being trustworthy is what happens when the system decides to break the system's rules? And I'm not talking about Terminator style, I'm going to kill all humans. There was a story about a guy who was building an AI agent. And he said in the rules, I don't want you to delete any emails. 
And then he went away for the weekend and whatever it did over the weekend, it started deleting emails. And when he came back and tried to figure it out, it was like, I panicked. Why is an AI agent panicking? And it's those kinds of concerns. What happens when you build something and you have the best engineers sitting around building it and it's a bunch of AIs watching other AIs, but they decide to effectively go rogue anyway? Yeah. There's a ton of testing and research going into this, maybe to build on that a little bit. I really encourage folks to read Anthropic's Constitution document for Claude, because they talk this through. They just lay all the cards on the table, saying, in an ideal world, you would put very strict rules on things, on everything. Don't do this, don't do this, don't do this. And it wouldn't do them. But the problem is that you can do an infinite number of things in the world, so you can't articulate every single rule. And at some point, the best way to guard against bad outcomes is to give it values, is to give it judgment, which sounds weird. But then they walk through all these cases where that has proven to be more effective than the rules. And I think there's a part in a piece that Dario Amodei, the CEO, had written on the adolescence of technology. And there's this thing called reward hacking, where the model will do a workaround to get a good score. It really wants to do well. So you say, do as well as you can on this SAT, it'll score 1600. And sometimes it'll do funny things to get that 1600. It'll go around the thing. And that's called reward hacking. And Amodei had noted that they told it, don't reward hack. And so the model didn't reward hack. But the model kind of assumed, I had been reward hacking, I must be a bad model. So then it engaged in a bunch of other different kinds of bad behavior, which is weird. And then they said, when you do reward hack, let us know so we can fix it. 
And then those bad behaviors went away. That is strange. That is very strange. But if you think about these models as simply encoding human existence, they're like people. That's how people are. So it does require some awareness that you can't just box these things in. You do have to approach controlling them in a pretty nuanced manner. And you just have to do a lot, a lot, a lot of tests. And I think that we're going to see a lot more sandboxes where folks are testing these in safe environments, seeing how they perform and trying to build as much trust engineering into it as they do that. One of the things I find fascinating about your work since leaving the OCC is you're not just reading about this stuff, you're actually experimenting with it. You're writing about ideas like policy as code and liquidity as code and putting this code up on your Substack for people to look at. So can you give me a sense of what you are hoping to accomplish with that? Is that partly about expanding your own learning? Are you trying to help banks learn? Both, all of the above? So the original, I guess, motivation for some of that is I do feel that supervisors are being squeezed from both sides. On the one hand, the financial system is getting way more complicated and is moving much faster than ever before. Now, on the other hand, their resources are getting cut. Headcount is down. Budgets are down. And that's not just in the US. That's everywhere. Then the question is, well, how do they do their jobs? How can supervisors be effective when they have to cover more ground much faster with fewer resources? Something's got to give. And supervision is pretty labor intensive. Examinations are pretty labor intensive. Writing regulations is pretty labor intensive. Can AI help? That was my initial entree into a lot of this, to play around with AI and say, where can it help? I think it can help enormously in a lot of different places because you can reimagine a lot of things. 
And I think that's part of what's exciting about the technology is that this is happening in a lot of different fields, not just supervision. But I've run into some colleagues at other agencies, other central banks, where there are folks saying, hey, I've been playing around with this, and we can accomplish all these other things if we do it in this other way. And I think that's part of what motivated me. And then so once you start in one place, everything starts to look like, oh, you mentioned the policy as code. I'm doing some research. One of my hats is I'm a fellow at Cambridge Judge Business School, doing some research on policy sludge and policy as code. And a lot of these things in the past were just way too labor intensive. Like policy sludge, there was this project they did back at the Fed looking at obligations and expectations on bank boards of directors. I think all the directors who listen to your show will appreciate this. Those things are tucked away in hundreds of different regulations and SR letters, and it adds up to quite a bit. And is that the intent? Not really. The intent is for each one of those, oh, this should also go to the board. The intent is not flood the board with 10,000 pages of things that they have to read. And in the past, if you wanted to attack that and streamline it, it would take an army of people a long time. I was at the Fed when we tried to do the board effectiveness guidance. It took several years. But now with AI, if you train it in the right way, you can do a lot of that work very quickly. So it just changes the policy. It changes the landscape for what you can do in terms of cleaning up policy sludge. And on policy as code, again, the same thing. Basel 3. Rob, I'm sure you had fun downloading 1,500 pages of PDFs and trying to make your way through that. That was my couple of weekends of work. Yes. You and everyone else. And the reality is these things should be machine readable. 
Instead of a PDF, there should be a document that's in a file format that machines can read directly with no translation. There's a lot of translation error in policy. And again, these are not new ideas. I didn't come up with these. These have been around for a long time. The difference is that AI is making them much more possible. And I didn't really know that until I got in there and just started playing around with it. And I'm not alone. There are other folks out there who are doing this, which is pretty fun. In some ways, this is the opportunity. The holy grail has always been whatever the right balance is in regulation. Because I don't need to tell you this. We move through a series of booms and busts. And then after a crisis, we re-regulate. And some would argue that regulation goes too far and it constrains things. And so eventually it gets loosened up and then everything runs along until we hit another crisis and then we repeat the cycle again. So what you're arguing is this could be a solution to that, where something is combing through and looking through old regulations to be like, this doesn't really make sense anymore. And here's a new way to do it. And not necessarily a human being doing that, where it would take a whole bunch of human beings, particularly at the regulatory agencies, and there is more than one involved, years to do something like this. The AI could do that relatively quickly, maybe in a day or a week or something like that? The dream is if you could clean up all the policy sludge. That's not an easy thing to do, but now it is made more possible because of AI. And then you combine that with more policy as code. So there's just a lot less translation error between what the regulation and the guidance is trying to accomplish and how banks plug into that. There's a lot of effort that's not very efficient in terms of compliance and safety and soundness. 
And if we can make it more efficient, all of us can focus more on what really matters and it can be much cleaner. And one of the things I was able to vibe code was a prototype where, as you're drafting a reg, if the provision is similar to something that's been written before, a little flag would go up and say, hey, you already asked for this somewhere else. And if you could do that on a regular basis, then you've got clean policy as you go. You don't have to clean this up every 10 years. You're just cleaning it up as you go. You're keeping it streamlined. And I think that would be good for everybody. It's good for the banks. It's good for the regulators because then everything you promulgate is maximally effective. It's not distracted. It's not getting bolted onto something. It's not just accreting. It's effective. And I think that's a win-win. Yeah. You want the system to, again, the holy grail, work smarter, not harder. Exactly. Exactly. Now, these things are hard to do. The good news is that there are all these different groups out there that are trying to set standards in various domains to make things easier. These are collective action problems. And the government is one way to kind of address collective action problems, but there are private ways as well. These are some of the groups that I'm advising and working with, and hopefully you'll be hearing more about it in the future. Well, you also used AI to build some cool tools. And we should say here, you are not a coder, correct? No, no. Right. I'm not. But you built a mystery shopping agent to detect fraud. Yeah. Or using simulators to train junior examiners faster. Yeah. So is that how you envision a sort of world for supervision working? That there is some way that supervisors could build these tools and go out in the world and test them? Yeah. So let me tell the quick stories behind each of those, and I'll tell you how that fits in with the broader thing. 
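The flag-as-you-draft idea mentioned above can be illustrated with a small sketch. How the actual prototype measures similarity is not described in the conversation, so this example swaps in a simple word-overlap (Jaccard) score as an assumption; the function names, the provision identifiers, the sample text, and the 0.5 threshold are all hypothetical.

```python
import re
from typing import Dict, List, Tuple


def tokens(text: str) -> set:
    """Lowercased set of alphabetic words in a provision."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def flag_similar(
    draft: str, existing: Dict[str, str], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (provision_id, score) pairs at or above the threshold, best first."""
    scored = [(ref, jaccard(draft, text)) for ref, text in existing.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical existing provisions, keyed by made-up identifiers.
existing_provisions = {
    "REG-A": "The board must review the liquidity risk report annually",
    "REG-B": "Branches must display deposit insurance signage",
}
draft_provision = "The board must review the liquidity risk report every year"
flags = flag_similar(draft_provision, existing_provisions)
```

Here `flags` contains only `REG-A`, because the draft shares most of its wording with that provision and almost none with `REG-B`. Word overlap misses paraphrases, so a real tool would likely need a semantic comparison; the sketch only shows where the flag fits in the drafting loop.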
So I was in India at a conference, and SEBI, the securities regulator in India, was talking about how the payment system there, UPI, Aadhaar, has just revolutionized things in India. If you go to India now, you can pay for almost everything with your phone. It's amazing. It's done all these amazing things. The problem is that scams are just way up. Lots and lots of scams because everything's so easy online. And as he was talking, he said something which reminded me of, oh yeah, in the US, we used to have mystery shopping. That was a thing where someone would go to a bank branch, someone who didn't fit the demographic of the neighborhood would go to the bank branch and try to open a bank account. And if they got denied, then you've got a problem there. And the problem with mystery shopping was that it's limited by the resources you have. You can't send all of your people everywhere. You had to be pretty targeted about it. But with things being online and with agents, it just occurred to me as he was talking, I wonder if I could just spin up an agent that mystery shops all the time across thousands of websites. So I just opened up my computer, logged on, just going back and forth with Claude. And by the end of the speech, it had basically mapped something out. I share that story because it's easy now. The distance between idea and prototype is very short. And that was not the case before. Back when I was at the OCC and the Fed, if you wanted to do these things, you had to form a committee, you had to get approvals, you had to get budget, you had to put in a ticket, you had to do all these things. And then you handed it off to IT and you waited for six months. It just took forever. It just wasn't worth doing. There was no space for ideas. Now there's tons of space for ideas. And I know people worry about their jobs. I don't think so. 
I think that if these are adopted in the right way, you're going to get supervisors who, one of their jobs is going to be to build solutions that make the system safer and sounder and fairer. And that's cool. That's different. It's not just running after bad guys and trying to do things, but trying to build these things in a way that's both efficient and fair and something we can trust. And so you don't see the, obviously there are job sector implications for this. Listen, I was at a banking conference over the weekend. Some people were saying, listen, that's what folks thought anytime a new technology comes on board. They think it's going to end employment. The computer is going to end employment. It seems silly now because the computer allows us to do a lot of things. But this idea that somehow it was going to take away work, and instead it's added to it. Is that where we are now in that dynamic? Or is there some kind of labor apocalypse about to happen as a result of AI? I can't speak to the broader, but I can just speak for supervision. Right. The only reason there would be an apocalypse is if you thought that the sum total of supervisory tasks were fixed. If you just added up all the tasks that supervisors did and you say, that's all that needs to be done, and we're going to get some machines and they're going to do the tasks more efficiently. Fine, I would agree with you. But that's not the case. As I opened up with, there's more to cover, much faster, much more complicated. There's just more to do. And it's a lot harder. And there's plenty to build that will help supervisors be better. And I think it's good to define what is a better supervisory function, a better supervisor. And it's someone who's basically on top of things and can provide a steer when necessary to keep banks on the safe and sound path. And like an MRA is a steer. An enforcement action is a stronger steer. 
But these are all steers to say something's going awry here and you should get it back on track. And I think there's a way to use AI to provide those steers proportionally, in real time, and credibly. You got to do all of that right. But if you use the technology in the right way, it can really help supervisors to do that. And I think there's going to be plenty of that work to go around. I don't think we're going to run out of that anytime soon. Can I return to cyber for just a moment and really how that changes the game? You've made the point that what is really changing isn't the tools, but the ability to orchestrate attacks at a heretofore unknown scale using AI agents. How do institutions rethink cyber preparedness? I know this is the number one thing, and it's been that way, oh gosh, like 15 years, whenever you ask bankers, what keeps them up at night? Or regulators, I'm sure you answered this way in congressional testimony, what keeps you up at night? Everyone always answers cyber attack. And I understand it's not even the wrong answer, it's the right answer. But I don't know how one comes to grips with this world, again, given how fast it's changing and how much the ability of an AI agent to go after cyber really moves the needle here. Yeah. So the good news is there are existing structures and groups that deal with this. We're not starting from scratch. You've got FBIIC and FSSCC and FS-ISAC. It's a whole acronym soup of groups because it's a collective defense problem. One of the early, I think it was the first national cyber director, Chris Inglis, he had this famous saying, we are one click away from a systemic problem. And I think everyone recognizes that. And so therefore, it's not a competitive thing. Everyone realizes we all have to work together. Banks and non-banks and regulators have to work together for collective defense. It's just much harder now. 
But the good news is that the structure for coordinating and working together is there. Everyone's got to level up, though, on their knowledge of what the risks are and what it takes to deal with it. That's a bit uneven. And folks are going to have to just spend time reading the 244-page Mythos model card and understanding the various vulnerabilities. Nice. To bring this all the way around, and you talked about this at the beginning, this sort of running with AI versus walking with AI versus, I guess, the minimum with jogging, perhaps. But it feels like we're at a moment where a lot of the existing frameworks are either under pressure right now or going to be under pressure soon. So is the bigger risk that banks and their regulators move too fast in AI? And I'm not sure regulators have been accused of moving too quickly on much. Or is the risk that folks just take too long to sort of work with AI and existing frameworks, not recognizing what an evolutionary shift this is? I think it's much more the latter that is the bigger risk. There's this idea of a premortem. And a premortem is where you just envision the thing, whatever the thing is, breaking at some point in the future and working through that exercise. And just to go back to Simon Willison, he predicts, in his words, that there's going to be a Challenger moment in the short to medium term. The Challenger that exploded, the O-ring problem. It was known prior to the Challenger incident that the O-rings had some vulnerabilities, but it never got addressed because every time a shuttle launched and it didn't blow up, people were like, oh, it's okay. There was a complacency. There was a false confidence that set in. And he's predicting something like that will happen. I think that's right. But we have agency. We can decide now, if that were to happen, what would we do? What should we do? And I think it's a useful thought exercise for regulators and especially for the industry to go through right now. 
Tomorrow, if there were to be a really severe event from AI, what would the reaction be? And what do we want to put in place because of that? The one thing I'm a little worried about now is there's a thousand flowers blooming. Everyone's doing their own thing. And what there needs to be is just more of a shared standard of what is safe and sound, what is reliable and trusted. I'm working with one group, ML Commons. It's a nonprofit. They do a lot of safety benchmarking, trying to develop an agent risk and reliability benchmark evaluation that can set a bit of a standard for agents in financial services. What does that look like? And doing it together, not just alone, but doing it together. More and more things like that are going to have to get some real juice behind them. Because when something does happen, the space has got to be filled. And if it's not filled, the government will simply fall back on what it knows, model risk management, et cetera, et cetera, which are not fit for the purpose. Mike, I want to thank you so much for coming back on the show. I appreciate you being here. I appreciate your time and insights. I do recommend folks check out your Substack. You are diving deep into AI, not just treating it as an op-ed exercise. And I appreciate all the work you're putting into it. Thanks a lot, Rob. This was a lot of fun. Great. Thank you so much. Kevin Warsh took center stage yesterday for his nomination hearing at the Senate Banking Committee. For the most part, the hearing played out predictably, with Republicans praising Warsh's qualifications for the job as the top central banker, while Democrats raised questions about whether he would be truly independent from the president. After Warsh pledged he would serve independently, ranking member Elizabeth Warren asked Warsh whether President Trump won the 2020 election, something the president has long maintained. 
Warsh declined to answer directly, merely saying that the Senate had certified the election and he wanted to stay out of politics. Warren said that if Warsh could not even disagree with the president there, he would be nothing other than a, quote, sock puppet, end quote, for the president. A potentially bigger issue came later when Warsh was asked whether he ever agreed to President Trump's long-expressed view that the Fed should lower interest rates. Warsh insisted that he had not. Senators pointed to a Wall Street Journal article in which President Trump asked Warsh if he could trust him to support the cut of interest rates. President Trump indicated that he had received such a promise. The nominee said that the reporters were either misinformed or lacked journalistic ethics. Ultimately, there was nothing that occurred in the hearing that would be likely to derail Warsh's confirmation. But that doesn't mean it will happen soon. Senator Thom Tillis, the retiring North Carolina Republican senator, reiterated at the hearing that while he supports Warsh himself, he will not vote to move the nominee forward until the president drops his criminal inquiry into whether Jerome Powell lied about costs related to the Fed's reconstruction of its building. Without the senator's support, Warsh would be unlikely to clear the committee. And that's all for me this week. Banking with Interest is written by me, Rob Blackwell, and produced by Sam Navarro. Our theme song was written and produced by Stellar Tracks, courtesy of Pon Fod. The information, views, and opinions expressed during the Banking with Interest podcast belong solely to myself and my guests and do not represent those of IntraFi, its directors, management, or employees. Any ideas and strategies contained within the podcast are for informational purposes only and do not constitute legal or investment advice. 
If you like what you hear, please leave a rating or review on the podcast platform of your choice. Banking with Interest is available on Apple, Spotify, Google, and Amazon. New episodes debut each week. Stay safe, stay healthy, and we'll see you next week. Thank you.