Brex’s AI Hail Mary — With CTO James Reggio
Brex CTO James Reggio discusses the company's comprehensive AI transformation strategy, built around three pillars: corporate AI adoption, operational AI for cost reduction, and product AI for customer value. The episode covers Brex's multi-agent architecture, internal AI platform development, and how they've restructured their engineering organization to build AI-native financial products.
- Multi-agent networks with agent-to-agent communication can be more powerful than forcing LLMs into deterministic workflows, enabling more sophisticated planning and execution
- Building internal AI platforms with domain expert accessibility (through tools like Retool) enables non-engineers to refine prompts and manage AI systems directly
- AI fluency frameworks and re-interviewing existing employees on AI-native coding skills can drive cultural transformation without creating fear-based adoption
- Operational AI in highly regulated industries like finance benefits more from simple, auditable approaches than sophisticated techniques like reinforcement learning
- The half-life of code has declined significantly with agentic coding, making it easier to experiment with different tech stacks and frameworks
"We have like three pillars to our AI strategy. We have our corporate AI strategy, which is how are we going to adopt and like buy AI tooling across the business and basically every single function to be able to 10x our workflows."
"What would a company that was founded today to disrupt Brex look like? And then we tried to basically use the answer to that question to form this team internally."
"The half life of code has declined so significantly with agentic coding, it's actually quite easy for us and for anyone else to kind of try on for size a variety of different pieces of tech to figure out what is going to be most ergonomic for solving the problem."
"I think trying to craft LLMs into deterministic workflows and dags is kind of underselling the power that they have to actually plan and execute in a more sophisticated, fluid way."
"We're not going to try to pick winners in the horse race between the foundational model providers or the agentic coding tools. What we do instead is we will procure like a small number of seats, multiple solutions and then we'll give employees the ability to pick whatever one they want to use."
We have like three pillars to our AI strategy. We have our corporate AI strategy, which is how are we going to adopt and buy AI tooling across the business, in basically every single function, to be able to 10x our workflows. And we have our operational AI strategy, which is how are we going to buy and build solutions that enable us to lower our cost of operations as a financial institution. And then the final pillar is the product AI pillar, which is: are we going to introduce new features that enable Brex to be a part of the corporate AI pillar of our customers? We want to build features and be a solution where somebody else is saying to their board, hey, we adopted Brex and this is part of our corporate AI strategy.
0:00
Hey everyone, welcome to the Latent Space podcast. This is Alessio, founder of Kernel Labs, and I'm joined by swyx, editor of Latent Space.
0:51
Hey, hey, hey.
0:57
And we're here with James Reggio, CTO at Brex.
0:58
Welcome. Hey, thank you for having me.
1:00
Thanks for visiting from up in Seattle where I've been a little bit. It's cold up there, huh?
1:01
Yeah. And we have an atmospheric river hitting the city right now, so a lot of flooding. Yeah, we're getting the full-on winter effect right now.
1:07
Well, you're here. We're talking about the AI transformation within Brex. There are a lot of interesting tidbits that we're going to draw from your article, but also your background. You've got a wide array of experience, from Stripe to Banter to Convoy. And I think mostly I'm interested in your journey as one of the rare people who have transitioned from mobile engineering leader to CTO, which is a bit more rare. I used to have this comment that there's a career ceiling for people who work on client-only things, where they usually don't hit CTO; they typically promote the backend people, the backend and cloud people, to CTO.
1:15
Yeah, you know, it's something that I hear fairly frequently, because there aren't that many folks with a front-end background who reach this level of leadership, and it's exciting for me to be able to represent that group. But I'll say that even though my resume reflects that I've been more on the front end of things, it's probably more my experience as a founder a couple times over that actually helped me get to this level of my career working for somebody else. Becoming CTO is very much a leadership and general business role as much as it is a technical role. And so I think it was the skills I built from starting companies and trying to build them up that made me a decent fit and enabled me to get the nod from Pedro to take this on when my predecessor left about two years ago.
1:54
Yeah. One thing I'm curious for your commentary on, and this is a little bit broad and unscheduled, but a lot of startups are bragging about how many ex-founders they have. And yes, to some extent you want people with the founder mentality and agency, which is what you did, to be your employees and to take initiative in the company. But I also wonder if it's becoming an anti-signal sometimes. I don't know if you've thought about this.
2:36
I think it's more about the churn, for me, especially when people are hiring ex-founders. If you truly have the founder gene, it's kind of hard to just stay somewhere as an IC for too long, and then it's like, all right, I joined this thing, and in one year I'm back to being a founder. And I'm curious for you: I'm sure you thought about leaving and doing another company.
3:00
In fact, that was the alternative I was considering. Even at the time I got the phone call where they made me the offer to become CTO, I was thinking about leaving to go start a company. And what's interesting about it: we actually launched a new recruiting and employee value proposition for Brex a couple months ago called Quitters Welcome, where we intentionally lean into this idea that we have a disproportionate number of folks who go on to become founders, or, like, heads of a department, when they leave our company. And we celebrate that. It's actually something that I'm very proud of. It means that we welcome in people who want to get a different experience. There are certainly a lot of founders who don't scale their own businesses to the scale we've achieved at Brex, so there's something to be learned when they come in, and then we're very happy to support people on their way out. So I actually really like hiring former founders or future founders. The value proposition I find most relevant, because a lot of the folks we're hiring as AI engineers are either winding down their companies or considering running an AI startup, the thing that resonates the most with them, is that we can often give them interesting problems to solve, maybe even problems they wanted to build their own startup around, but with instant distribution. That's the allure: you can come into this business and build financial AI applications and instantly have them deployed to roughly 40,000 customers, from the Fortune 100 down to tens of thousands of startups. That's what I think is appealing to founders.

But the challenge then is making sure that we set them up for success in an environment that still feels a little bit like the startup they might build themselves, versus something that's too corporate.
3:19
Yeah. Instead of doing your own company and then coming to you and being like, can I integrate into Brex? You get all the data from my finance agent. How's the engineering team structured?
5:04
Yeah, so we have about 300 people in engineering, about 350 total across EPD. And for the most part we structure around our product domains. Brex is a corporate card, it's also a corporate bank account, it's expense management, travel, and accounting. And so we have full-stack product domains of roughly 30 to 40 people for each of those, owning everything from the low-level infrastructure up to the web and mobile experiences. That's generally the structure of our engineering organization. Then we naturally have an organization that focuses on infrastructure, security, and IT. And then there are two additional centers of excellence that we've built that kind of violate that org design, where we've felt the need to put more focus or operate slightly differently. AI is one of those areas: we have another team of roughly 10 people who are focused primarily on LLM applications. We wanted to create a bit of a separation there because of the way we were thinking about this. This is actually something we did this summer: we paused on our AI journey toward infusing our product with AI and generating customer value, and we asked ourselves, what would a company that was founded today to disrupt Brex look like? And then we tried to basically use the answer to that question to form this team internally. So it's a little bit off to the side. Ideally everybody comes up to speed and contributes LLM features, but we have this off on the side right now in a centralized manner.
5:13
What's the difference in AI adoption for those teams? So are the people on the LLM team much bigger Cursor users, Claude Code users, or do you see similar diffusion?
6:50
It's actually fairly uniform across the entire engineering department. And it's kind of funny: one of our largest Cursor users is actually an engineering manager. I think that speaks to our core value of operating at all levels, where we want all of our EMs and everybody in leadership to still basically do the job that they're managing, not just manage the work. So the journey of getting everybody into agentic coding was not exclusive to the AI group. Yeah.
7:00
In fact, I think this podcast was set up because I cold-outreached to Pedro after he tweeted this. I assume this is about that team at Brex. He says: I started a new company inside Brex to build the future of agentic finance. No BS, just builders building, 996, and pushing production-grade agents to 30,000 finance teams. Now 40,000. And then he actually has a little job description, which I think is really interesting. I'll skip that and go straight to: Brex accelerated growth 5x and cut burn 99% in the past 18 months. I assume that's a mix of internal AI, automation, and other stuff, but basically I wanted to put some headline numbers up front to impress people before we dig into the details.
7:34
Yeah, absolutely. And you're correct. That's the team that we have, this AI team. What was that?
8:13
Very young team.
8:18
Yeah, it's very young. And it's been really interesting. The composition of the team is very young, AI-native, like 20-year-olds who basically grew up with the tech, paired off with more staff-level software engineers who have been at Brex for a little while and can navigate the existing codebases and understand the product and the customer deeply. We've formed these couple of really tight-knit pods in the AI org of generally three people: somebody who has more of a product, customer-focused background; a staff engineer who knows where the skeletons are; and then a much younger, AI-native engineer who can just do things with agents that the rest of us dinosaurs can't even dream of. I think part of it is that sometimes too much experience, or too much knowledge of how to solve a problem, can actually be an impediment to thinking differently about it, thinking about it from an AI-first lens. But yes, we've been slowly growing that team in the same way a pre-seed startup would: you want to be very careful about talent density and very deliberate, and only hire when you absolutely need it. So at this point it's just about 10 people, and it was probably four or five people, I think everybody, actually, in the photo that was attached to that tweet when Pedro put it out a couple months ago.
8:19
Yeah, we'll put it up. It's a photo at 1:20am on a Friday.
9:34
Yes. Oh, yeah. Because we always do Friday demos, and that's a time for everybody to get exec review time. Those folks were all in Seattle that day, but the team is actually geographically distributed: we have a couple folks here, a couple in Sao Paulo, a couple in Seattle.
9:38
At Decibel, we have this AI center of excellence, which is basically the people running these teams across companies.
9:56
Yep.
10:01
How do you make the other engineers not feel like they're not special? I think that's something that I hear a lot. It's like, hey, why aren't I working on all the LLM things? I'm stuck working on, you know, the KYC integration with whatever. You know what I mean? How do you build that culture?
10:02
You know, it's interesting. I thought that would be more of a problem. But the benefit of having really optimized our engineering culture around business impact is that it actually cuts in the other direction: some folks don't want to work on the AI products because they don't have as much clear, direct business impact right now and don't impact revenue directly. For the most part, we've enabled folks who have a strong desire to work on AI products to join that team. Somebody transferred out of our expense management organization to come over because they were really passionate about taking their knowledge of policy evaluation and bringing it into the AI team. But for the most part, I think everybody understands how their work ladders up. And maybe there's some friendly rivalry: the folks who, say, work on the card product drive 60% of our direct revenue, so they're pretty happy with that, and they don't feel like they're being left out. I will also say, as you probably saw in the piece that we put out with First Round, there are a lot of smaller applications of LLMs peppered throughout all of our product and operations teams. It's just the more novel agentic layer that sits on top of Brex that has been put together in this isolated team. So it's not like folks aren't getting to build with LLMs or use LLMs on a daily basis.
10:17
Yeah, maybe run people through the Brex agent platform. We'll put the diagram in the video: you had the LLM gateway, you have the MCP layer. We just had David, the creator of MCP, on right before you, so this is very timely. How did you start building that? What's the architecture?
11:40
Yeah, the architecture, I think, is simple and elegant. We've had basically an LLM gateway and a basic hand-rolled platform from the very early days. In fact, right before being tapped to become CTO, I was leading an AI Labs team internally in the wake of the announcement of ChatGPT. Everybody saw this new technology and said, hey, what are we going to do with it? So one of the first things that we did, in January 2023 that would have been, was put together some internal infrastructure that made it possible for us to deploy, manage, version, and eval prompts, and then to manage data egress and model routing and have some very basic observability and cost monitoring in an LLM gateway. That's infrastructure we stood up, and it still continues to power a lot of those smaller, more, let's say, precise applications of LLMs. For instance, we've set up a completely automated pipeline for evaluating customer applications to get them onboarded instantly to Brex, which is something that used to require human intervention for underwriting or KYC, but now we basically have a series of agents, particularly research agents, that will go and do the work that humans would normally do. That's running on top of this hand-rolled framework. And then for the agents on Brex that we announced in our fall release, which is this agentic layer we're building that sits on top of Brex and can embody workflows that a finance team would normally hire humans for, we've actually started using Mastra as the primary framework for accelerating us. We've built everything in TypeScript, which is another technology choice that answers the question of what we would do if we started Brex today.

But that isn't the case for all of our existing backend code, which is either Kotlin or Elixir. And then we have a mix of pgvector and Pinecone. I think what we've seen is that we're always reevaluating the tech and framework choices as we go, because the half-life of code has declined so significantly with agentic coding that it's actually quite easy for us, and for anyone else, to try on for size a variety of different pieces of tech to figure out what is going to be most ergonomic for solving the problem.
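The gateway he describes, with versioned prompts, model routing, and basic cost observability, can be sketched in a few dozen lines. This is a hypothetical TypeScript sketch, not Brex's actual implementation; the names, the routing rule, and the template syntax are all illustrative, and the provider call is stubbed out.

```typescript
// Minimal sketch of an LLM gateway: versioned prompts, model routing,
// and a per-call observability log. All names are hypothetical.

type ModelName = "gpt-4o" | "claude-sonnet" | "gemini-pro";

interface PromptVersion {
  id: string;
  version: number;
  template: string; // e.g. "Summarize: {{input}}"
}

interface GatewayLogEntry {
  promptId: string;
  model: ModelName;
  inputChars: number; // stand-in for token/cost accounting
}

class LlmGateway {
  private prompts = new Map<string, PromptVersion[]>();
  readonly log: GatewayLogEntry[] = [];

  // Prompt versions are append-only, so older versions stay
  // available for evals and rollback.
  registerPrompt(id: string, template: string): PromptVersion {
    const versions = this.prompts.get(id) ?? [];
    const next = { id, version: versions.length + 1, template };
    versions.push(next);
    this.prompts.set(id, versions);
    return next;
  }

  latest(id: string): PromptVersion {
    const versions = this.prompts.get(id);
    if (!versions || versions.length === 0) {
      throw new Error(`unknown prompt: ${id}`);
    }
    return versions[versions.length - 1];
  }

  // A toy routing policy: sensitive traffic goes to one provider.
  // A real gateway would also enforce data-egress rules here.
  route(sensitive: boolean): ModelName {
    return sensitive ? "claude-sonnet" : "gpt-4o";
  }

  // Render the prompt, pick a model, and record an observability entry.
  // A real gateway would call the provider; we return the payload instead.
  invoke(promptId: string, input: string, sensitive = false) {
    const prompt = this.latest(promptId);
    const model = this.route(sensitive);
    const rendered = prompt.template.replace("{{input}}", input);
    this.log.push({ promptId, model, inputChars: rendered.length });
    return { model, rendered, promptVersion: prompt.version };
  }
}
```

The point of the sketch is the shape of the seam: every LLM call in the company flows through one place where versioning, routing, and monitoring can be applied uniformly.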
11:55
Double-click on Mastra. That's a new choice, an interesting one.
14:17
Yeah. The main reason we adopted Mastra is that the ergonomics of Mastra are quite similar to the internal LLM framework that we built two and a half years ago. Whereas LangChain, which was available at the time, two and a half or three years ago, didn't quite feel right to us when we tried it. It kind of addressed things that weren't the pieces we needed addressed, which were being able to have really simple observability, logging, and tracing.
14:20
LangChain didn't do it.
14:57
I mean, at that time it didn't. I think it was really...
14:58
Well, they fixed that.
15:01
Yeah, no, they certainly did. But so, I'm trying to remember ancient history: we evaluated LangChain, turned off of it, and built our own thing. And then, as we were looking around, we kind of wanted to deprecate this internal framework we built, because at the end of the day it's not leveraged for us to maintain it. And Mastra ended up fitting the bill for the feature set we were looking for. What's been interesting is that about half of the applications we're building right now on the agent layer are running on Mastra, and the other half are actually still running on yet another internally developed framework, one focused more on networks of agents, so multi-agent orchestration, versus the stricter single-turn or workflow patterns that are easier to build with either LangGraph or Mastra.
15:02
Tell us about your multi-agent framework. What are the design considerations? Why is this the first we're hearing about it?
15:55
Yeah. So it's funny, a big reason why we haven't written more about this is that it continues to evolve quite a bit. We actually had a blog post that we were going to put out in conjunction with the fall release, talking about how we built this, and by the time we finished the blog post and had the whole package ready, it was already halfway outdated. The way this multi-agent network approach to implementation started to emerge was when we were trying to scale up our consumer-grade Brex assistant. If you think about Brex and our customers, there are really two very broad personas that we serve. We serve members of a finance team, who are generally in roles like accountant, controller, or head of T&E. Those folks are going to be interacting with agents that are much more specific to their roles. The other broad cohort of users we have are employees of companies that have deployed Brex. So, you know, you join a new company, that company uses Brex, you get your Brex card. And our goal for employees is for Brex to completely disappear. The best UI/UX for Brex is just the card: every single thing you have to do in the software beyond swiping the card is an opportunity for AI to eliminate some work for you. And we thought the right approach to solving for that was to embody an executive assistant for every employee. I, as an executive at Brex, have an EA, and she knows enough about me: she has access to my calendar and my email, and has all the context on when I'm traveling and for what business purposes. So she's basically able to do everything that I would be obligated to do in Brex, be it booking travel or doing expense documentation.

And so what we wanted to do was build that EA, connected to the same data sources, and see if we couldn't simulate that behavior, so that, you know, your EA is basically your interface to Brex via SMS and the card. When we started building that out, the most naive architecture would be to have one agent with a variety of tools, and maybe do some RAG to ensure it has appropriate context for the conversation. But what we were finding is that the wide range of different product lines on Brex made it difficult for one agent to perform well while being responsible for everything from expense management, to finding and booking travel, to answering policy and procurement questions. So that's when we started breaking the problem down into a variety of sub-agents that sit behind an orchestrator. Obviously this is something that can be implemented using LangGraph, and Mastra even has the notion of these as agent networks. But what we found, especially when it came to being able to build evals for the system, is that it was easier for us to kind of hit the eject button and build our own framework, one in which agents are able to basically DM other agents and have multi-turn conversations amongst themselves to coordinate on completing a task or an objective. What's been nice about that is that your Brex assistant is one single point of contact between you as an employee and the Brex product. And then behind your assistant, if the company has expense management turned on, you have an agent for that. If they have reimbursements, there's another agent for that. If they have travel attached, there's an agent for that. Our conception here is that it's generally software encapsulation patterns projected into the agent space.

It also makes it easier for us to have the team that owns and understands travel be the ones to go and iterate on that, without needing to worry about regressing the total system, and without needing one team to own every single possible action you could take as an employee. And I'll say that I'm still of the mindset that somebody will build a great framework and we may ultimately migrate to it, or it might be us that ultimately open-sources this. But for us, this has worked out quite well, compared to a couple of other approaches we tried along the way that just didn't perform well. One was to overload the agent with a variety of tools. Another was contextual prompt switching, where we try to say, oh, this conversation looks like it's more about reimbursement, so let's update the prompt with more reimbursement context. That approach didn't perform as well as actually having a reimbursement agent to collaborate with.
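The orchestrator-plus-sub-agents pattern described above can be sketched minimally. This is an illustrative TypeScript toy, not Brex's framework: keyword routing stands in for LLM intent detection, and a hard-coded policy agent stands in for a real LLM-backed sub-agent. The key idea it shows is that a sub-agent's reply in the DM thread can be either a final answer or a clarifying question that the assistant relays back to the user.

```typescript
// Toy multi-agent network: sub-agents sit behind an orchestrator and
// exchange multi-turn messages ("DMs"). All names are hypothetical.

interface AgentMessage { from: string; text: string; }

// A sub-agent replies with either a final answer or a request for
// clarification that the orchestrator must relay to the user.
type AgentReply =
  | { kind: "answer"; text: string }
  | { kind: "clarify"; question: string };

interface Agent {
  name: string;
  handle(thread: AgentMessage[]): AgentReply;
}

// Toy policy agent: needs to know the event type before quoting a limit.
const policyAgent: Agent = {
  name: "policy",
  handle(thread) {
    const saidTeamEvent = thread.some((m) => /team event/i.test(m.text));
    if (!saidTeamEvent) {
      return { kind: "clarify", question: "Is this a team event or a customer event?" };
    }
    return { kind: "answer", text: "Team dinners are capped at $75 per person." };
  },
};

class Orchestrator {
  private threads = new Map<string, AgentMessage[]>();
  private lastAgent = new Map<string, string>();
  constructor(private agents: Map<string, Agent>) {}

  // Keyword routing stands in for LLM-based intent detection; follow-up
  // messages continue the user's most recent sub-agent conversation.
  private pick(userId: string, text: string): Agent {
    if (/policy|allowed|expense/i.test(text)) return this.agents.get("policy")!;
    const last = this.lastAgent.get(userId);
    if (last) return this.agents.get(last)!;
    throw new Error("no agent for message");
  }

  // Forward the user's message into the sub-agent's DM thread and
  // surface either its answer or its clarifying question.
  ask(userId: string, text: string): string {
    const agent = this.pick(userId, text);
    this.lastAgent.set(userId, agent.name);
    const key = `${userId}:${agent.name}`;
    const thread = this.threads.get(key) ?? [];
    thread.push({ from: "assistant", text });
    this.threads.set(key, thread);
    const reply = agent.handle(thread);
    return reply.kind === "answer" ? reply.text : reply.question;
  }
}
```

The per-agent thread is what distinguishes this from a plain tool call: the policy agent sees the whole conversation so far, so the clarifying answer on the second turn is enough for it to respond.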
16:02
What about MCPs as sub-agents? That's another pattern.
20:31
The key thing there is that there's actually a lot of value in having multi-turn conversations from the orchestrator, the assistant, to the sub-agent, whereas a tool call is basically just one RPC. Oftentimes what will happen is, let's say the user reaches out to their Brex assistant and says, hey, how much am I allowed to expense per person for dinner tonight? I'm taking my team out. Your assistant is then going to reach out to the policy agent. Maybe the policy agent, in order to answer that question, needs to know whether this was a customer event or a team event, or whether you're traveling. So it can't just answer the question; it's going to reply back to the assistant and say, hey, I need you to ask this clarifying question. Then the assistant will return to the user and ask the clarifying question, and they'll basically have this multi-turn conversation across multiple agents, versus it being encapsulated in a single call-and-response tool call. All the sub-agents still have a ton of tools, but I think of MCP and tool usage as the interface to all of our conventional imperative systems, not to the AI side.
20:35
Yeah, that's the conversation we were having earlier: whether it should be an agent on the other end as well, or whether there should be a chat back and forth.
21:52
Exactly, exactly. And that's the thing. One of the ways we actually grafted this onto Mastra, before we built our own framework, was to make every sub-agent a tool. The input was just natural language, the output was natural language, and if you needed multi-turn, you would basically just put in the full prior conversation as you kept calling this sub-agent as a tool. At that point you're like, okay, the framework is fighting me on this. It's actually helpful for us to conceive of it as an org chart: the agent org chart, where my EA is DMing other specialists and having brief conversations to support me as her client.
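The workaround described above, wrapping a sub-agent as a plain tool whose input and output are natural language, with the caller replaying the full prior conversation on every call, can be sketched like this. Hypothetical names throughout; this is not Mastra's actual API, just an illustration of the ergonomic friction.

```typescript
// A sub-agent exposed as a plain tool: one string in, one string out.
type NlTool = (conversation: string) => string;

// Toy "reimbursement" sub-agent wrapped as a tool. It can only see
// whatever the caller packs into the conversation string.
const reimbursementTool: NlTool = (conversation) => {
  if (!/amount:\s*\$\d+/i.test(conversation)) {
    return "What was the amount of the reimbursement?";
  }
  return "Filed. Expect the reimbursement in 3-5 business days.";
};

// The orchestrator must accumulate the transcript itself and replay it
// on every call, which is the friction that motivated a framework where
// agents hold real multi-turn threads instead.
function callWithHistory(tool: NlTool, history: string[], userTurn: string): string {
  history.push(`user: ${userTurn}`);
  const reply = tool(history.join("\n"));
  history.push(`agent: ${reply}`);
  return reply;
}
```

Two calls are enough to see the pattern: the first call gets a follow-up question back as plain text, and the second call succeeds only because the caller re-sent the whole history.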
22:00
Yep, that was a really good deep dive. Thanks for indulging. I feel like you guys are not afraid to build your own tech, which I think is a competitive advantage. I really like that culture. Maybe we should go a bit breadth-first as well; I think we deep-dove a little too much into one area, and we'll put up the chart. But I'm also very interested in the internal agent stuff, the operational stuff, and just the general platform scope. So please feel free to just go into your spiel on it.
22:43
Yeah, of course. So one of the things I was trying to do at the beginning of the year as CTO: I think it really fell to me to articulate what our AI strategy was as a business. Every member of our board was asking, hey, what's your AI strategy? And while we were doing a lot.
23:14
Of things. And Pedro would go, he's got it.
23:30
Well, yeah. And if I didn't, I'd be in trouble. I think he was also counting on me, given that I was running the AI organization before becoming CTO, to have that. That's true. But a big part of it was that we were doing a lot with LLMs, but it was more like these little one-off features: hey, maybe mix in some suggestions here, or maybe do a little bit of ops automation over there. It wasn't easy to create a verbal framework for all of these investments, and without that framework we weren't able to set a vision or a roadmap for them. So what we did at the beginning of the year was take everything that was going on, as well as all of our ambitions, all of the good ideas, and all the problems we were trying to tackle as a business this year, throw it all on the table, and see if there was a way to cluster it into a framework that made sense to the business, to our board, and to ourselves. We came up with something that I think is not particularly novel but has helped us quite a bit: three pillars for our AI strategy. We have our corporate AI strategy, which is how are we going to adopt and buy AI tooling across the business, in basically every single function, to be able to 10x our workflows. Then we have our operational AI strategy, which is how are we going to buy and build solutions that enable us to lower our cost of operations as a financial institution. I think it's fairly intuitive: financial institutions like ours face a lot of regulatory expectations, and there's just a high ops burden to running our business. So it's a lot of internal use cases: fraud detection, underwriting, KYC, handling dispute automation on card transactions. Those types of operational investments are our ops AI pillar.

And then the final pillar is the product AI pillar, which is: are we going to introduce new features that enable Brex to be a part of the corporate AI pillar of our customers? We want to build features and be a solution where somebody else is saying to their board, hey, we adopted Brex and this is part of our corporate AI strategy. So it has this nice little feedback loop. Within the company we did a little bit of divide and conquer: folks in IT and on our people team spend more of their effort driving corporate AI, making the procurement decisions and creating a culture of experimentation where we spotlight and incentivize people for trying to improve their personal workflows using AI. The pieces I've been more involved in have been operational and product. We were just talking about product, which is the agents on Brex stuff. But I think the operational AI investments have been some of the most immediately impactful to the business, because we have hundreds of people who work in our operations organization. And it's actually something that differentiates us, because our CSAT and the quality of our support and service are very, very high; that's something we're very proud of. So we're trying to figure out how we can automate a significant portion of this and use LLMs in a way that doesn't degrade the customer experience, and that also addresses what the future roles of the people who already work full time for us look like.

This is where Camilla, our COO, who co-wrote the First Round piece with me, has been leaning in really aggressively to help every member of the operations organization start rethinking their role: not as people who execute against an SOP, but as people who are going to build prompts, build evals, and become more AI-native in the way they do work. So a lot of the engineering we've done has been to enable folks in, say, fraud and risk to refine prompts and add additional automation to their workflows.
23:32
And its secret fourth pillar, the platform.
27:29
Yeah, yeah, exactly. That is the thing that ties it all together: the platform. And I think what's been really nice is that even though the platform is kind of a loose term, because it consists of a wide variety of technologies, as I said, we haven't been too religious or dogmatic about everybody needing to be on one particular thing. What we've seen is that by making a variety of ergonomic options for building with LLMs available, it's really made it easier for us to make a quick leap forward on operational AI. As soon as we put our mind to it, we said, look, we want to hit an 80% automated acceptance rate for all startup and commercial businesses that apply for Brex. We want a decision within 60 seconds that's fully touchless, no humans involved. We were able to break that down and then actually build the agents and build the tools on top of that platform really quickly. And a lot of those tools are the same tools that our product AI agents use as well.
27:32
I was pretty sold on ConductorOne. I don't know if this falls under exactly that bucket, the ConductorOne provisioning command. I was like, yep, I want that.
28:30
Yeah, I'd love to talk about that. That's actually on the corporate side, and I think this goes back to maybe another counterintuitive but, I'd say, bold decision that we made: we are not going to try to pick winners in the horse race between the foundational model providers or the agentic coding tools, or basically anywhere there's an active horse race. Instead of trying to pick a single solution, we'll procure a small number of seats across multiple solutions and then give employees the ability to pick whichever one they want to use. So for instance, employees can go into Slack and use ConductorOne to get a ChatGPT, a Claude, or a Gemini license, and basically build their own stack: you pick your chat provider, and as a dev you can pick between Cursor, Windsurf, Claude Code credits, and craft your stack to your preference and easily switch between them. Obviously we have enterprise agreements in place for all of them, for the sake of the privacy and non-training guarantees. But it's fun, because when we go to renew these contracts we can resist the need to do a wall-to-wall deployment. We can say, hey, look at the usage trends. Our employees are voting with their feet, they're voting with their dollars. And, you know, maybe your tool isn't as hot as it was a year ago.
28:39
Does it give you a dashboard of what people are choosing?
30:05
Yeah, actually, we look at that. We were looking at it as we're going into budgeting for next year. It's very interesting.
30:08
I would love to see that. What's... anything that's really up, anything that's really down?
30:13
It's fascinating how different the landscape is every three months. One of the interesting challenges we had early on, say 12 to 18 months ago, was getting folks to just try these tools, to incorporate agentic coding and take the time to try a new workflow. At this point, even when a new model hits, like when Codex came out and everybody said, oh, Codex is better at code gen but it's a little bit slower, I find fewer folks are kicking the tires on new things, because they're just so comfortable with the ergonomics of their current workflow. Some folks are like, I'm going to stick with Claude Code because I know it now, I've been working with it for nine months, so I don't feel the incessant need to keep trying new things. I'm an iPhone person and I'm just going to stay with an iPhone, even though there's some really sexy Android hardware out there.
30:17
Do you have one of those big numbers, like 80% of all of our code is written by AI? How do you measure it internally?
31:23
No, not really. What we do is measure attribution on the number of commits that are co-authored with AI, and we pull some of those stats. But I don't index heavily on those. In fact, I don't index on them at all. And honestly, I don't know how I would honestly calculate that number.
31:30
Yeah, I agree. Yeah.
31:51
So we're at the point now in our agentic coding journey where we're trying to solve the second-order effects: a little bit too much slop, maybe not enough rigor in code reviews. The adoption is there, and now we have to figure out how to mature in our usage of these tools so that quality and long-term maintainability don't suffer. Another facet of being able to generate a lot more code more quickly is that the drift between team members, as far as understanding the code that's in their services, increases. Everybody's moving faster and more independently. That's another risk we're starting to see, like in incident response, where folks don't know a service as well as they used to, because it's changed so much in the past couple of months, because everybody's moving more quickly.
31:52
Yeah, this has been a major topic for me this year: codebase understanding and slop. It's so much easier to generate code, but then we have to review it, and to some extent you can't really fight AI with more AI. You can't just throw an AI reviewer onto AI code and call it solved. So you do need to scale human attention. That's something I've been pushing a little bit: every engineer is just going to own more code, period, and be parachuted in and be expected to ramp up, be productive, and fix bugs if they're on PagerDuty or whatever. Because everyone's going to try to be more efficient, and you're supposed to see ROI in productivity; if you don't, then what's the whole point of this?
32:53
Exactly, exactly. And it's funny, going back to your point: you could add AI on top to solve the problems that the AI introduces, but that's an endless chain.
33:38
I mean, the CodeRabbits of the world, the Graphites of the world, would say, yes, actually you can. So that's a little bit of the tension there.
33:51
Yeah. You know, I've been thinking a lot about how the craft of engineering is evolving, and I will say that I feel further away from being able to predict what it looks like than I did this past summer. I actually went on leave for a month and joined the AI team that we were building, just to go and build alongside them. I felt it was really important for me to deeply understand the problems and the tech. So that was me writing and pushing code, effectively. And I went through so many different moments of realization, from, oh my God, this is going to change everything, to, oh my God, this is just amplifying all the good and the bad in the industry, to, oh my God, engineers are not going to have a job anymore. So I don't have any predictions now. I felt like I had all the predictions back then, and at this point I'm just very interested to watch the phenomenon continue to unfold in front of us. And I will say, I was chatting with a bunch of really bright college juniors and seniors at a dinner we hosted last night. All these folks are about to enter the industry, having basically come up in the era of agentic development and LLMs. I asked them: what is your workflow when you're building a project? How do you use agents, versus deciding you're going to actually write code by hand? And I was surprised to hear the consensus: most people there were using agents to collaborate on building a design document, collaborating on the architecture of the solution they want to build, then maybe asking it to emit a doc or an implementation plan. But then they'll still go and write a lot of the code themselves. So it's a little bit more of the rubber-duck co-architect use case that was most prevalent in that group. I was very surprised by that.
33:58
I'm impressed. The kid, the kids are.
35:56
Yeah, I know. They still want to actually write the code themselves. It's interesting.
35:58
Yeah. What we hear from the Gen Zs at OpenAI is they just YOLO everything into Codex.
36:01
Yeah. I would say most of the code I generate is like that.
36:08
Yeah.
36:10
But I spend a lot of time on the doc. It's curious: when you're younger in your career, you don't really have all the mental models of the different patterns to instruct with. I feel like there's overreliance, especially on the design doc. Most of the senior engineers will spend more time on that, even on things like which columns you should index, depending on what queries we usually run on this table. It's hard for any AI to know that. I feel like the role of the more senior engineer should actually be more of this: spending time teaching the AI, and then the AI can teach the junior people, in a way.
36:10
Yeah, yeah. And everything looks like mentorship and management at the end of the day, right? You're breaking down tasks, you're supervising work, you're giving feedback. It's basically management, except that...
36:48
Agents are really bad at memory still. They basically have zero memory, and it's the end of 2025. What's going on?
37:02
Yeah, yeah.
37:10
What's your internal stack for preferences? There's explicit preference, which you can set with AGENTS.md and all that stuff, and there's implicit preference with linter rules and things like that, where it just happens and you don't have to tell it. How do you structure that?
37:11
Oh, are you talking about agentic coding, or the memory thing, or the platform? Yeah, yeah.
37:29
For the coding specifically. And then we can talk about the whole Brex platform.
37:33
Yeah, nothing special. Just a lot of explicit rules in MD files. And then on linting, we still have traditional linters in place for the couple of different language toolchains. And then we're big fans of Greptile, and we use them for basically all of the smarter-than-linting agentic code review. That's been the one solution we have aligned around, and it has served us extremely well. We're huge fans. They've built something really impressive, and the thing that constantly blows my mind is how they're able to maintain a really impressive signal-to-noise ratio. The comments it leaves are very, very high signal. I never regret going through all 65 comments it leaves on my diffs, because it catches so many things.
37:38
Yeah, I've found the Codex review to be really good. I don't use Codex for code generation, but the review product...
38:31
Yeah.
38:36
...is, like, very good for some reason. When I was working in Rails, there was this project called Danger Systems.
38:36
Oh yeah.
38:43
It was kind of like a semantic linter.
38:43
Exactly.
38:45
I feel like there should be more of that now. The rules are one thing at generation time, but I want something in my CI that enforces those rules and calls out where they're broken, and then I can just copy-paste that into an agent.
38:45
But yeah, when we started building this new agent code, as we were saying, we were answering the question: what would you do if you built a Brex disruptor today? And the answer wouldn't be to pick Kotlin and Elixir as the back end. So we actually went with a full TypeScript stack, building on all public interfaces and really trying to make sure this agent layer was at arm's length from the good and the bad of the core of our product. One thing we did early on, and I don't actually know if this is still true because the team keeps iterating, is we had good luck using Claude Code in a GitHub Action to do more of that Danger-style code review: have a prompt for it that goes through all the facets that are more conceptual, versus rigidly enforceable by a linter, and have it leave a big comment at the end with your conformance to the idiomatic coding patterns of the new repo.
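As a rough illustration of that pattern (not Brex's actual workflow), a CI step might assemble the conceptual-review prompt from the repo's conventions doc plus the PR diff, then post the agent's conformance summary as a single comment. The function names here are hypothetical, and a deterministic stub stands in for the real agent call:

```typescript
// Hypothetical sketch of a Danger-style conceptual review step in CI.
// buildReviewPrompt and stubReviewer are illustrative, not a real API.

function buildReviewPrompt(conventions: string, diff: string): string {
  return [
    "You are reviewing a pull request for conformance to this repo's idioms.",
    "Conventions:",
    conventions,
    "Diff:",
    diff,
    "Leave one summary comment listing each conceptual violation.",
  ].join("\n");
}

// Stand-in for invoking an agent (e.g. Claude Code) inside a GitHub Action.
function stubReviewer(prompt: string): string {
  const violations: string[] = [];
  if (prompt.includes(": any")) violations.push("avoid `any` in new TypeScript code");
  return violations.length
    ? `Conformance issues:\n- ${violations.join("\n- ")}`
    : "Conformance: OK";
}

const prompt = buildReviewPrompt(
  "Prefer explicit types; avoid `any`.",
  "+ function parse(x: any) { return x; }",
);
const comment = stubReviewer(prompt);
```

In a real workflow the summary comment would be posted back to the PR by the action, so the conceptual checks live alongside the traditional linters in CI.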
38:57
I wanted to spend some time here. You said you wanted to deep dive on operational agents: customer support, onboarding, KYC, fraud, delinquent accounts, disputes. This, I imagine, is the bulk of the work. Anywhere there's a good story about how, when you started, you thought it was going to go one way, and then you discovered through building or through customer contact that it had to go a different direction. That difference in beliefs is something people can learn from.
40:00
The thing that immediately comes to mind is that at the beginning we believed using RL for credit decisions, for credit and underwriting, like how much of a limit we should give to a business, would be the way we'd end up going: that reinforcement learning would be how we'd build a model that makes decisions the way a human underwriter would. We made this big investment, working with an outside company that specializes in this, and the performance we ended up getting was inferior to just building a web research agent. So what we took away, and what has been most evident in operational AI, is that in operations you need to break down problems really granularly and form SOPs that humans can repeatedly follow, and that can thus be audited. So much of the responsibility in operations is to have auditable, repeatable processes that help ensure we're operating in a compliant manner. And that translates so cleanly to LLMs that we haven't needed many sophisticated techniques in operational AI. It's been relatively simple: tool-using agents, or even a lot of problems that can be solved with a single-turn chat completion. The one time we tried to over-engineer and use more sophisticated techniques, we discovered that the solutions are in fact plainer and less technically sophisticated. The challenge is really articulating and refining prompts to reflect the execution of the SOP, and all the institutional knowledge that isn't written down, so that agents can properly replace the humans or contractors we would otherwise have making these decisions.
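The SOP-to-LLM mapping described above can be sketched very simply. This is an illustrative toy, not Brex's system: each SOP step becomes one single-turn completion, and every call is recorded so the decision trail stays auditable. All names, thresholds, and the stubbed model are assumptions:

```typescript
// Hypothetical sketch: executing one SOP step as a single-turn completion
// while keeping an audit record of prompt, output, and timestamp.

type SopStep = { id: string; instruction: string };
type AuditRecord = { stepId: string; prompt: string; output: string; at: string };

// Stand-in type for a real chat-completion call (e.g. an LLM gateway client).
type Completion = (prompt: string) => string;

function runSopStep(
  step: SopStep,
  caseFacts: string,
  complete: Completion,
  audit: AuditRecord[],
): string {
  // Single-turn: the whole SOP step plus case facts go into one prompt,
  // so the decision is reproducible and auditable from the record alone.
  const prompt = `SOP step ${step.id}: ${step.instruction}\nCase facts: ${caseFacts}\nAnswer PASS or FAIL with a one-line reason.`;
  const output = complete(prompt);
  audit.push({ stepId: step.id, prompt, output, at: new Date().toISOString() });
  return output;
}

// Deterministic stub in place of a model, for demonstration only.
const stubComplete: Completion = (p) =>
  p.includes("revenue >= $1M") && p.includes("revenue: $2M")
    ? "PASS: meets threshold"
    : "FAIL: below threshold";

const audit: AuditRecord[] = [];
const verdict = runSopStep(
  { id: "kyb-3", instruction: "Check revenue >= $1M" },
  "revenue: $2M",
  stubComplete,
  audit,
);
```

The point of the shape is that a compliance reviewer can replay any decision from the audit records alone, which is what makes the single-turn framing attractive in a regulated setting.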
40:26
How do you decide what's worth spending a lot of time building, versus what you can leave to the models? Because some of these tasks are so generic, they're not really about Brex.
42:31
Yep.
42:39
You can assume the models will be good at those, versus some that are very specific to you.
42:40
We prioritize the tasks that are most common for the broadest number of customers, and some of them are fairly intuitive, like researching a customer to assess the legitimacy of the business and whether it fits our ideal customer profile for onboarding. There are certain types of businesses that we either legally cannot serve or are not comfortable serving. That's the type of basic research, a relatively straightforward problem, that isn't hyper-Brex-specific. The things that are more specific to us, or to companies in our sector, would be preparing documentation for a network card dispute. If you dispute a transaction on your personal card, you provide evidence to your card issuer. The card issuer then has to put together a three- or four-page Word document that goes to the card network and eventually to the acquiring bank. All of that is much more specific to our business. It's a huge operational overhead for us, and it's something we decided to automate later, because it's not on the critical path of serving the vast majority of our customers; disputes are expensive but not a very common operational process. So they're lower in the stack, and I think we're getting there right now. This year has basically been us looking at every single process and stack-ranking. And I will say, the thing that got us started down this path was wanting to expand our ideal customer profile to support a wider variety of commercial businesses, which tend to be businesses that aren't growing as quickly. They're not tech startups, which have a lot of growth, and they're not enterprises, which also tend to have a lot of growth. It's more like a law firm or a dentist's office. 
These are the types of solid businesses that we should be able to serve and underwrite, but the cost to onboard them, and the cost to serve them with all the humans in the loop, makes them ROI negative. That was the first use case of AI within our ops organization, and it led to us understanding we could automate much more than that.
42:44
Is this Brex going back into SMBs?
45:05
Ah, that's a good question. Yeah. So, never let that die. The way we've thought about this is that we want to offer our product to customers wherever we believe we have an offering well suited to the needs of those businesses. And I'd say that for very small businesses, our offering still isn't built for that. It's built for companies that have some degree of scale, typically with at least one person, if not a couple of people, on their finance team. So we consider these to be more like the commercial segment, and it rhymes with SMB. But our approach back then was a little more naive, and we were just playing a volume game. Our internal controls were not as strong, and we didn't have as much experience underwriting those businesses. So it ended up being a huge burden, almost existential for us, to have those tens of thousands of customers that were all ROI negative. Now we're trying to scale to serve more businesses outside of tech and outside of the up-market segment, but do it thoughtfully. Right now our minimum threshold is around a million dollars a year in annual revenue, or $10,000 or more per month in card transactions, as the low end of our ICP, which is obviously not what you'd think of when you think of a small business. Small businesses tend to be smaller than that.
45:08
Oh wow, that's really small. Okay. Yeah, yeah, mid market.
46:47
Yeah, exactly. And it's funny, just the names of these segments. You know, it's like, what we consider...
46:50
I don't know.
46:57
Yeah, no, I think that's, like, lower mid-market. And it's funny, because what we call enterprise may be a business that Salesforce would call mid-market, right? It just depends on your own scale as a business when you use these terms.
46:57
And all of these things, all of these automations that people build, are built on the Brex agent platform?
47:14
Yes, exactly. In fact, most of the operational AI is running on that original platform we built. One element I didn't mention is that most of the UI/UX for this platform is built in Retool. You can basically go into Retool and there's a prompt manager, a tool manager, an eval manager, and that's where much of this was built. The goal, again, was to make it more accessible and ergonomic to get started. But a secondary effect of having a more visual set of tools is that it's enabled members of the ops organization to do prompt refinement themselves. You don't need engineers to refine the prompts, or even to test new foundational models when they come out. That's another fun thing: when a new model drops, folks will go into the platform, run the evals on the new model, and see, can we get better performance here, or does this have different latency or cost characteristics?
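The prompt manager and eval manager described here could be sketched as a small registry. This is an illustrative guess at the shape, not Brex's actual platform: prompts and their eval sets live together, and anyone can re-run the stored evals against a candidate model adapter to compare pass rates. The class, types, and stub model are all assumptions:

```typescript
// Illustrative sketch of a prompt registry with a built-in eval runner,
// letting a non-engineer swap in a new model and re-run the stored evals.

type Model = { name: string; complete: (prompt: string) => string };
type EvalCase = { input: string; expectSubstring: string };

class PromptRegistry {
  private prompts = new Map<string, string>();
  private evals = new Map<string, EvalCase[]>();

  register(id: string, template: string, cases: EvalCase[]): void {
    this.prompts.set(id, template);
    this.evals.set(id, cases);
  }

  // Run every stored eval case for a prompt against a candidate model and
  // report the pass rate, so model swaps can be compared side by side.
  runEvals(id: string, model: Model): number {
    const template = this.prompts.get(id) ?? "";
    const cases = this.evals.get(id) ?? [];
    const passed = cases.filter((c) =>
      model.complete(template.replace("{input}", c.input)).includes(c.expectSubstring),
    ).length;
    return cases.length ? passed / cases.length : 0;
  }
}

// Deterministic stub standing in for a real provider, for demonstration.
const echoModel: Model = { name: "echo", complete: (p) => p };

const registry = new PromptRegistry();
registry.register("icp-check", "Classify: {input}", [
  { input: "law firm", expectSubstring: "law firm" },
  { input: "dentist office", expectSubstring: "dentist" },
]);
const passRate = registry.runEvals("icp-check", echoModel);
```

A visual layer like Retool would then just be CRUD over `register` plus a button wired to `runEvals` per model, which is what makes the workflow accessible to domain experts.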
47:18
Yeah, you want the domain experts, the people directly using the tool, not the engineers who are somewhat removed from it. I do want to highlight to listeners that a lot of the Brex agent platform is just things every company should have, basically: a prompt management system, which we talked about, where the domain experts manage it; multimodal testing, evaluation, and benchmarking frameworks; API integrations for automated workflows; an MCP-based architecture shared with Brex's external AI products, which is obviously very Brex-specific. One thing I did want to highlight, because I was semi-impressed and very few people talk about this: a knowledge base for understanding Brex's business.
48:22
Yeah.
49:00
So do you want to expand on that?
49:01
Yeah. This is an area where we've only scratched the surface, but a big challenge we face is that the world knowledge built into the model, what GPT-5 thinks Brex does and how it thinks our business operates, is actually quite different from what our business offers today or how our product works. So we've had to build a corpus of product documentation and process documentation, and curate this set of information to ground a variety of our LLM applications, including the Brex Assistant, the assistant that employees talk to. We don't want it to hallucinate features we don't have, or give wrong information. Similarly, some of the operational agents need to be grounded on what our ICP is. Because if you ask ChatGPT right now what types of businesses Brex onboards, or what types of businesses Brex serves, it might not give an accurate answer. It might say we're a corporate card for startups, which is what we did seven years ago. It might say we only serve enterprises. So that has been an interesting challenge, and I'm actually going to be spending time with folks internally next week talking about whether we can refresh our strategy and unify it. We have a lot of product documentation that's internal for our operations and go-to-market teams, a bunch of product documentation that's external for our customers, a lot of go-to-market enablement material that's more sales-pitchy, and documentation that's put into Sierra, the chat assistant we use for frontline support. All of this ideally could draw from the same source, but right now it's a little fragmented. 
It's something we're trying to invest in, though, because at the end of the day the duplication of effort is wasteful, and it's absolutely necessary to get this right.
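The grounding idea can be shown with a toy: retrieve the relevant curated doc and force the model to answer only from it, rather than from its stale world knowledge. The corpus, the naive keyword retrieval, and the prompt wording below are all invented stand-ins for a real knowledge base:

```typescript
// Toy sketch: grounding an assistant on a curated corpus so it answers from
// current docs instead of the model's outdated priors about the business.

type Doc = { id: string; text: string };

const corpus: Doc[] = [
  { id: "icp", text: "Brex serves startups and commercial businesses, not only a corporate card for startups." },
  { id: "support", text: "Frontline support is handled by the Sierra chat assistant." },
];

// Naive retrieval: rank docs by shared-word overlap with the question.
// A real system would use embeddings, but the grounding shape is the same.
function retrieve(question: string, docs: Doc[]): Doc {
  const qWords = new Set(question.toLowerCase().split(/\W+/));
  return docs
    .map((d) => ({
      d,
      score: d.text.toLowerCase().split(/\W+/).filter((w) => qWords.has(w)).length,
    }))
    .sort((a, b) => b.score - a.score)[0].d;
}

// The grounded prompt quotes the retrieved doc so the model can't fall back
// on what it "remembers" about the company.
function groundedPrompt(question: string): string {
  const doc = retrieve(question, corpus);
  return `Answer ONLY from this source:\n${doc.text}\nQuestion: ${question}`;
}

const p = groundedPrompt("What businesses does Brex serve?");
```

Keeping one shared corpus behind `retrieve` is exactly the deduplication point made above: internal docs, external docs, and the support assistant can all draw from the same source.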
49:02
Just to disambiguate: Sierra, meaning the Bret Taylor startup?
51:12
Yes, exactly.
51:17
I would expect, given you've built so many other agents, that that's one you could do yourself.
51:18
That's solving problems that are not differentiated enough for us. What's been really helpful about Sierra is, again, the ease of it: the UI and UX of administering a Sierra agent is really accessible to the ops and CX strategy teams. It's much more low-code, more workflow- and DAG-oriented, and we have engineers giving it tools to take actions, but for the most part it's nice to not have to build the UX for somebody to manage something like that. And Sierra speaks the language of CX; it can do all the reporting and telemetry that our VP of CX would like to see. It's just one fewer thing we have to build.
51:23
What about evals? How do you build evals? Who manages them?
52:12
Well, it depends on the application. On the operational AI side, those evals are basically baked into the platform around every prompt and every agent. Most of these use cases come online, like the V1 of our commercial underwriting agent or the V1 of our startup KYC agent, co-developed between a subject matter expert in ops and an engineer, and they'll co-develop an initial eval set. From there, in ops you're always doing QA, be it on humans or on the LLM decisions. So as part of our QA feedback loop, whenever there's a mistake, it almost always results in another eval being written as a regression test. All of that within ops AI is pretty straightforwardly managed. On the product AI side is where it starts getting more challenging, because the multi-agent network is quite hard to evaluate. What we do there is adopt some of the state of the art for multi-turn evals: we have an agent embody the user, give that end-user agent an objective, run a multi-turn conversation, and then use LLM-as-judge at the end for all the different assessments. One other interesting technique: these multi-turn evals are kind of like integration tests, in that they sometimes test more than what you want to assess. So sometimes we'll pre-can an initial preamble to a conversation, maybe a couple of handwritten turns, and set the eval to start from there, to see if we can isolate certain behaviors. 
It's still a work in progress, and at the end of the day a lot of it is periodic human review. What we'll do is reflect on a conversation after a certain amount of time has passed: summarize it, extract facets, like did it seem like the user accomplished their objective? And we'll manually review a lot of the cases where that failed and decide whether to write another eval for it.
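The multi-turn eval loop described above has a simple skeleton: a simulated user pursues an objective, the assistant replies each turn, and a judge scores the transcript at the end, optionally seeded with pre-canned turns to isolate a behavior. The sketch below stubs all three roles with deterministic functions; every name is illustrative:

```typescript
// Sketch of a multi-turn eval: simulated user + assistant + end-of-run judge.
// Real systems would back all three roles with LLM calls; stubs used here.

type Turn = { role: "user" | "assistant"; text: string };

function runMultiTurnEval(
  objective: string,
  simUser: (history: Turn[], objective: string) => string | null, // null = done
  assistant: (history: Turn[]) => string,
  judge: (history: Turn[], objective: string) => boolean,
  preamble: Turn[] = [], // optional pre-canned turns to isolate one behavior
  maxTurns = 6,
): { transcript: Turn[]; passed: boolean } {
  const transcript = [...preamble];
  for (let i = 0; i < maxTurns; i++) {
    const userMsg = simUser(transcript, objective);
    if (userMsg === null) break; // simulated user decides the objective is met
    transcript.push({ role: "user", text: userMsg });
    transcript.push({ role: "assistant", text: assistant(transcript) });
  }
  return { transcript, passed: judge(transcript, objective) };
}

// Deterministic stubs standing in for LLM calls, for demonstration.
const result = runMultiTurnEval(
  "find the card limit",
  (h) => (h.some((t) => t.text.includes("limit is")) ? null : "What is my card limit?"),
  () => "Your limit is $50,000.",
  (h) => h.some((t) => t.role === "assistant" && t.text.includes("limit is")),
);
```

The `preamble` parameter mirrors the pre-canned-turns trick mentioned above: starting the eval mid-conversation narrows what the "integration test" actually exercises.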
52:17
Are all the evals supposed to pass, or do you have a set of evals where it's like, someday the model will be good enough? And how do they change over time?
54:50
Yeah, it's interesting. I don't know if we have any that are like, oh, someday I hope it'll be good enough to do this. But there are evals that are blocking, because they would indicate an unacceptable regression; these tend to be accuracy-related evals. And then there are others that are more about tone and coherency, things that are more subjective, and we just look at those over time as a metric. The team is actually interesting here; I think we're going to get a big update on how the team is thinking about evals in our Friday review tomorrow. This is the area with the largest change we needed to make, from how we were executing as a lab or an incubator earlier this year to where we are now, where we've shipped and we're trying to increase the rigor: it's been around avoiding regressions and having increasingly robust evals.
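That blocking-versus-tracked split can be captured in a few lines. This is a minimal sketch under assumptions: the `accuracy`/`subjective` categories and the 0.95 floor are invented, not Brex's actual policy:

```typescript
// Minimal sketch of a release gate: accuracy evals block on regression,
// subjective evals (tone, coherency) are only tracked as metrics over time.

type EvalResult = { name: string; kind: "accuracy" | "subjective"; score: number };

function gateRelease(results: EvalResult[], accuracyFloor = 0.95) {
  // Any accuracy eval under the floor counts as an unacceptable regression.
  const blocking = results.filter((r) => r.kind === "accuracy" && r.score < accuracyFloor);
  // Subjective evals never block; they are reported for trend tracking.
  const tracked = Object.fromEntries(
    results.filter((r) => r.kind === "subjective").map((r) => [r.name, r.score] as const),
  );
  return { release: blocking.length === 0, blocking: blocking.map((r) => r.name), tracked };
}

const decision = gateRelease([
  { name: "kyc-accuracy", kind: "accuracy", score: 0.97 },
  { name: "limit-accuracy", kind: "accuracy", score: 0.91 },
  { name: "tone", kind: "subjective", score: 0.8 },
]);
```

The same structure also accommodates aspirational evals, the idea raised next in the conversation: a case can sit in the tracked bucket, expected to fail, until the model catches up.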
54:58
Yeah. I've worked with a company called Verus AI that does user simulations, and what's been interesting is that some of these things the customer does not expect the model to do, but they want to track the saturation of the model, in a way, if that makes sense. I feel like most companies know what they don't want to happen, but they cannot quite articulate: in the future I want the model to be able to do this. So they'll keep running this eval that's...
55:59
...actually really, really interesting to me, and I'm going to take that away and start thinking about it. I mean, we've already seen this, where users will ask the assistant for help with things that we don't support yet or haven't implemented yet. Those are opportunities for us to effectively write a test that's going to be failing for weeks or months and eventually go green, as a way to actually show the progression of sophistication of the assistant. I really like that as an idea.
56:27
Yeah. I wonder how you also catch hallucinations about capabilities it doesn't have.
56:59
That's usually the problem: it'll pretend it can assist with something. And one thing that's really annoying and has been tough to prevent: because the assistant is used to speaking with other agents that can support it in accomplishing various tasks, if you ask it to help with a task where it thinks it probably should have an agent to work with, it'll just hallucinate that. It'll say, oh yes, I'll reach out to the finance team on your behalf to pass this question along. But it's not doing anything. There's no finance team; there's no way for it to do that. This comes up a lot: would you like me to ask the finance team? And there are no actual guardrails for that. Yeah, that was something we had to...
57:04
Like a regex?
57:49
Oh no, we don't. I think we've been able to just beat that out of its system with a system prompt. But we don't have many guardrails in place right now, just around a couple of potential things that could get us into trouble.
57:50
Really? That's surprising. When I was first kicking around the idea of all these things two years ago, I would have said guardrails would be more prevalent, especially in finance use cases, but surprisingly they're not.
58:05
Yeah. And that was actually a feature I believe we built into the LLM gateway early on, as a sort of last chance. Hard coded, exactly: here's some regex and stuff like that. Or, in the way that if you go way afield on ChatGPT, you just get the inline 500 error. It doesn't even tell you it can't help; it just craps out. We built a couple of those circuit breakers, or the ability to put those circuit breakers in, and I don't believe we're using them for anything.
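A last-chance circuit breaker of the kind described can be sketched in a few lines. This is a hypothetical illustration, not Brex's gateway code: the blocked patterns and fallback message here are made up for the example.

```python
import re

# Hypothetical "last chance" patterns checked on every outbound model
# response before it reaches the user. Real patterns would be driven by
# the specific failure modes observed (e.g. hallucinated handoffs).
BLOCKED_PATTERNS = [
    re.compile(r"reach out to the finance team", re.IGNORECASE),
    re.compile(r"\b(?:ssn|social security number)\b", re.IGNORECASE),
]

# Generic refusal returned when a pattern trips, akin to the inline
# error ChatGPT shows rather than an explanation.
FALLBACK = "Sorry, I can't help with that request."

def apply_circuit_breaker(response: str) -> str:
    """Return the model response unchanged, or a fallback if any pattern trips."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(response):
            return FALLBACK
    return response
```

The gateway placement matters: because every model call flows through one choke point, a check like this applies uniformly across products without each team wiring up its own guardrails.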
58:21
One last thing I want to get your thoughts on is AI fluency levels, which you have a framework for: user, advocate, builder, native. Everyone goes through it, including Camilla. I just think it's interesting. It's a model other people are thinking about adopting, but they're worried about rolling it out and scaring everybody. And also, how do you keep this in-house training course up to date? Just tell us more about it.
58:51
Yeah, so the operations org is actually ahead of even engineering on this front, as far as trying to create learning pathways. Part of the reason they're ahead of us is that operations has to run training at scale: training is a very big part of how people build aptitude in their job function within ops, whereas in EPD a lot of it is getting hands-on building experience, getting mentored, getting code review. But it's been really neat, because we created an environment by speaking openly about the transformation we saw happening in this industry, with AI displacing a lot of the operations and CX roles. We were just honest about it. And in the same breath that we said a lot of these job responsibilities will go away, we also said we don't anticipate that meaning your job has to go away; it's just that your job has to change. So the fluency framework, the training and support, and a positive culture where we celebrate people making progress have been really helpful for avoiding a culture of fear, of "you have to do this or it's going to go in your performance evaluation." It's not as rote as "how much are you using AI, and is it enough?" We've built a pretty positively framed culture where we'll do spot bonuses for people with particularly novel uses of AI in their day-to-day. In our company all-hands every two weeks we'll do an AI spotlight, and it's very rarely somebody in EPD; for the most part it's folks in Ops, Finance, the People organization, showing off agents they're building in ChatGPT or on Glean, or a new use case they found helpful.
So we're trying to create that. At the end of the day, we've hired a bunch of really smart people, and I have full confidence that this type of work is within the reach of anybody who's motivated to challenge themselves. Then in engineering there's one other thing I want to call out, because I think it's kind of fun: we adapted our interview loop to be more agentic-coding-native. We had a coding question and a system design question that we've revamped into a project. We'll give you a brief before you come on-site, then an additional spec when you start, and we expect you to use agentic coding to complete the task. In fact, it's kind of impossible to get all the way through it if you don't. So we're evaluating your knowledge: we're watching how you work, evaluating whether you understand the code that's coming out, probing at you as you go. And to bootstrap the process of getting all of our existing engineers familiar with agentic coding, as soon as we had the interview ready to ship, we said everybody in engineering, including all the managers, is going to go through this interview. So we re-interviewed everybody internally. It's one of those things where we didn't keep score; I don't have any data on who passed or failed. But what we found is that as people took it, it caused moments of realization: "Oh, I can uplevel my skills around this," or "I want to be better at this." So we're trying a variety of techniques to push the culture along.
And as I reflect on the year, because this is the year where we really put all the effort into it, I'm really satisfied to see the extent to which everybody's leaning in on a daily basis. I was even shocked, when we were looking at our Cursor logs, that the number one user is an engineering manager in the infra org. That is super cool. To me it means that folks have taken this to heart and found ways of doing their job differently.
59:19
I had a closing question, or I guess a parting question, and this is broadening out from Brex.
1:03:33
Yeah.
1:03:39
And this is just: you interface with other engineering leaders all the time. Is there anything we didn't cover that other CTOs have top of mind today? Like, their number one problem is ____.
1:03:39
The thing I find myself discussing with folks, and I don't want to shy away from scary topics (in fact, we were just on an adjacent one, which is how you evaluate somebody's progression toward being more AI-native), the cousin to that question is: will we need as many people to operate our businesses? Are there layoffs coming? How are we thinking about headcount growth? Junior versus senior level mix? Yes, exactly. And I still have more questions than answers there. What has been really interesting is that I view agentic development as something that amplifies all the good just as much as it amplifies all the bad. It amplifies sloppiness, poor architectural thinking, misunderstanding of the requirements; for all the acceleration of good outcomes, it also accelerates bad outcomes. And when you sum that all together, there's less of an obvious capacity increase. It's more nuanced than that. So I'm not looking at headcount planning next year as, "well, because AI is giving us so much more leverage, we don't need as many people." The thing I'm really proud of in my tenure as CTO is that we haven't grown engineering at all. We've grown the business significantly, but we've built greater efficiencies in how we execute, how we think about building, how we roadmap, what we choose to do and not to do, so that we're able to serve significantly more customers with more lines of business without needing to grow engineering headcount. I think that's the way we're going to continue on this road. I like having 300 engineers. I would love to have 300 engineers a year from now, but be 30, 50, 100% more efficient.
That's the thing that comes up with other engineering leaders. And the other part of that conversation is: how much is AI getting blamed for ordinary performance-oriented reductions? If Microsoft is letting go of 4,000 people as a business with, what, 150,000 employees, I believe, is that really AI causing it, or is it them using AI as a way to avoid some harder performance-management decisions? I'm not entirely sure, but I'm listening more than I'm speaking on this topic, because every time I feel like I have a pretty firm point of view, some new anecdote or experience comes in that challenges or invalidates it.
1:03:51
Yeah, well, I take these signals as: it's my job to go find people who think they have answers and surface them. You may or may not agree, but at least you have something to use as a strawman in your work.
1:06:39
Exactly, exactly. And I think as an industry, it's just early innings on this transformation. So I'm looking forward to listening to this podcast episode a year from now and seeing what we got right, what we got wrong, and what's different, because so much changes quarter over quarter.
1:06:51
Yeah, I do think the AI CoE is a very well-established pattern. I think the internal platform is a very well-established pattern. And this fluency thing is something people are still figuring out that I think you guys have hit on.
1:07:09
I'm happy to hear that.
1:07:22
That'll be my feedback.
1:07:23
Yeah. Any final call to action? Things you want to buy, what should people build for you, problems you're trying to solve that you'd love people to reach out and help with?
1:07:23
The call I'd make is for folks who are interested in multi-agent networks to get in touch with us, because this is an area where we're innovating in service of our customers, and where I feel like the frameworks, the tooling, and the research are there. There are actually quite a lot of interesting papers and things we lean on, but I would love to see more of that encoded in what's available writ large in the industry. My intuition has been that trying to craft LLMs into deterministic workflows and DAGs is underselling the power they have to actually plan and execute in a more sophisticated, fluid way. I just want to see the industry lean in more on these agent-to-agent interactions.
1:07:34
Okay, so I'll dive in a little bit here because I have a minor opinion. You keep using the word "networks." Is that a reference to a specific paper, or is it your term for it?
1:08:30
It's just our term. And I think that's actually the term Mastra uses as well. Initially we called them agent runtimes internally, and then we just switched to networks.
1:08:39
And then the other thing I wanted clarification on: is it mostly a full agent talking with a full agent, or is there more of an orchestrator boss agent talking to sub-agents? I think that matters for the subset of people building these things, because when you say multi-agent, people don't always agree on what that means.
1:08:53
Yeah, so it's more of a tree than a graph.
1:09:13
Yeah, when you say network, it sounds more like a graph, but this seems more directional, like a tree. There's a hierarchy.
1:09:16
There's a hierarchy, yeah. But there are some violations of it. One of the interesting use cases, and this is where the power of having an assistant for every employee plus agents that embody members of the finance team really shows, is an audit agent, one of the finance team agents we brought to market. The audit agent embodies the work that a lot of larger finance teams do to look for patterns of waste, fraud, abuse, or systematic avoidance of policy that isn't obvious from a single expense. You can evaluate a single expense and the metadata around it to see if it's within policy or not. But what if an employee often makes a large number of $74 transactions when receipts are required at $75? Or what if you see a fair number of DoorDash expenses during business hours from an individual on days an office lunch is provided? Or rideshare patterns where you have to look at a broader context. So we built this audit agent that can ingest your SOP (this is the Brex customer's SOP, exactly), and it's basically always looking for potential violations. It is extremely zealous: it wants a minimum number of false negatives, so it will raise a large number of potential violations. Then a separate review agent applies the wisdom of: is this important enough to follow up on? Is the dollar amount in question high enough? Does this user generally show high-compliance behavior? It makes a judgment call about whether the violation is worthy of becoming a case.
Then once it's made into a case, generally you need to get more information from the individual. If humans were doing this, there'd be an outsourced team looking for all the potential violations, then a full-time employee on the finance team deciding which ones are important enough to follow up on, and then they'd hand it off to somebody who would go and Slack that employee: hey, what's going on here? So what we have is: the audit agent looks for violations, the review agent decides whether each one is worthy of becoming a case, and when a case is filed, that triggers an event to the Brex assistant for that employee, which collects any additional information about the business justification. Or maybe the assistant already knows, from its conversation history, why this expense looked out of policy. So the network becomes interesting when you have the finance team agents communicating with the assistants of various employees, and behind those you have other sub-agents. Then you start seeing more of a graph emerge. But when you look at just what serves one employee, it looks more like a tree.
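The audit, review, and assistant handoff described above can be sketched roughly as follows. Everything here (the agent functions, the `Expense` type, the $75 receipt policy, the thresholds) is a hypothetical illustration of the pipeline's shape, not Brex's implementation; in practice each stage would be an LLM-backed agent applying the customer's SOP rather than a hard-coded rule.

```python
from dataclasses import dataclass

@dataclass
class Expense:
    employee: str
    merchant: str
    amount: float

RECEIPT_THRESHOLD = 75.0  # assumed policy: receipts required at/above $75

def audit_agent(expenses: list[Expense]) -> list[str]:
    """Zealous first pass: flag employees with repeated just-under-threshold
    spend. Tuned for a minimum of false negatives, so it over-raises."""
    counts: dict[str, int] = {}
    for e in expenses:
        if RECEIPT_THRESHOLD - 5 <= e.amount < RECEIPT_THRESHOLD:
            counts[e.employee] = counts.get(e.employee, 0) + 1
    return [emp for emp, n in counts.items() if n >= 3]

def review_agent(flagged: list[str]) -> list[str]:
    """Second pass: decide which flags become cases. A real review agent
    would weigh dollar amounts and the employee's compliance history."""
    return list(flagged)

def assistant_agent(case_employee: str) -> str:
    """On case filing, the employee's assistant collects the justification
    (or answers from its own conversation history)."""
    return f"Asking {case_employee} for a business justification"
```

Note the topology this encodes: each stage only hands work downward (audit to review to assistant), which is the tree; the graph emerges once finance-team agents start talking to many employees' assistants.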
1:09:24
Amazing. Well, I didn't know you were going to go into that level of detail.
1:12:33
Yeah, yeah, sorry about that.
1:12:36
No, no, no, no. I'm actually really glad I asked. Like that is very impressive and I hope you do more content about that.
1:12:37
Yeah, absolutely. We're really excited about it. I think it's been good to finally figure out a use for agents, and to have the technology be robust enough to start realizing this vision, because it's something we dreamt of a couple of years ago and, to your earlier point, the tech just wasn't there. When we tried to make a similar concept work with GPT-3.5, it was like, nope: it was hallucinating tool calls back in that day.
1:12:44
Awesome, man. Thanks so much for joining us.
1:13:10
This was great, I really enjoyed it. Happy holidays, guys. Thank you for having me. Thank you.
1:13:12