Today, I am discussing a new way to think about AI and agent readiness inside your company, and it is called Maturity Maps. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.

Alright friends, quick announcements before we dive in. First of all, thank you to today's sponsors KPMG, Robots and Pencils, and Blitzy. To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe directly on Apple Podcasts. To learn more about sponsoring the show, or really to find out anything else about the show, like where we are with our Agent Madness Bracket, which is going on right now, head on over to aidailybrief.ai.

Today is the second day in our build week, which is happening while I am traveling with my family. Yesterday, we did a very high-level overview of everything that happened last quarter and what it meant for this quarter. If we put this in the framework of this being build week, yesterday was the context setting, the environment in which your building is happening. Today's episode takes that down a level to discuss the benchmarks that show where others like you are.

Now, one of the things that I've been thinking about a lot over the last six months or so is just how much we need a totally different set of data and benchmarks for this new AI era. Everyone is adapting incredibly quickly right now, or at least they're trying to. It's new processes, new workflows, new tooling, new everything. And by and large, we're doing all that exploration without a map.

Let me give you a practical example of where I think our lack of benchmarks could very significantly and meaningfully hurt a company when it comes to its AI adoption. Let's say that you are an early adopter company. Across your different functions, you've had really strong hands-on efforts to get your AI up and running. In the absence of knowing exactly what to measure, you're just trying to measure whatever you can, and early results are pretty positive. For example, in the marketing function, you have increased your content output 30% year over year without any proportional increase in the resources it takes to produce that content. Now, that 30% year-over-year growth sounds great. But what if I told you that all of your competitors had actually grown their content output by 50%? In AI world, this is not a far-fetched scenario, and it shows how the need for better benchmarks and numbers is not just a vanity exercise. When we don't know how we're doing relative to peers and competitors, it is really hard to judge what we need to change, what we need to shift, and what we need to do next.

Now, at AIDB and at Superintelligent, which is my enterprise AI planning and strategy company, we started exploring some of this with our AI ROI benchmarking survey at the end of last year. We had people submit their real use cases and share with us the impact those use cases were having across an array of eight different impact dimensions, things like time savings, cost savings, new capabilities, increased output, and a handful of others. We asked them to rate impact from negative to transformational and found that, by and large, at least when it comes to people self-reporting, they were already seeing strong and positive impact from their AI initiatives.
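To make the shape of that survey concrete, here is a minimal sketch of how self-reported ratings like these could be encoded and averaged per dimension. The scale labels, the numeric mapping, and the example responses are all illustrative assumptions, not the actual survey instrument, and only four of the eight dimensions are named on the show.

# Hypothetical sketch: encoding self-reported AI impact ratings per dimension.
# The labels, numeric mapping, and example data are assumptions for illustration.
from statistics import mean

IMPACT_SCALE = {"negative": -1, "neutral": 0, "positive": 1, "strong": 2, "transformational": 3}
DIMENSIONS = ["time savings", "cost savings", "new capabilities", "increased output"]  # 4 of the 8

def summarize(responses):
    # Average each dimension's rating across all respondents who answered it.
    return {d: round(mean(IMPACT_SCALE[r[d]] for r in responses if d in r), 2)
            for d in DIMENSIONS if any(d in r for r in responses)}

responses = [
    {"time savings": "strong", "cost savings": "positive", "increased output": "transformational"},
    {"time savings": "transformational", "new capabilities": "strong", "increased output": "strong"},
]
print(summarize(responses))  # e.g. {'time savings': 2.5, 'cost savings': 1, ...}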
But there are a couple of clear and obvious limitations with that study. First of all, while self-reporting is better than nothing, it's always going to be somewhat imprecise. Second, like pretty much everything that we do with this audience, you have to calibrate it to a more advanced individual and organizational user than if you just surveyed a broad cross-section of businesses in the world. Third, while it did give us some great information around individual use case impact, it didn't tell us all that much about other dimensions of AI readiness and adoption outside of the use cases themselves.

Anyone who's felt the sting of the capability overhang, in other words, the gap between what AI can do and what we're actually using it for, knows that raw capability isn't really the question. It's the systems we put around it to get value from it. And unfortunately, the research and information apparatus just has not adapted to this new reality. Not to pick on Gartner specifically, but they're the biggest in the space, and so they present an easy target. Tried-and-true benchmarks and information products like Gartner's Magic Quadrant have literally never been less useful than they are right now. The idea that success in something like AI application development was going to be even a little bit dictated by choosing the right AI application development platform vendor is just so far outside the reality of these tools as to be almost actively harmful if that's where you're putting your time and effort when it comes to figuring out how to adopt AI. Now, Gartner is more than the Magic Quadrant, and they are doing a lot to try to catch up to the AI world, so this isn't to single them out. It's more to make the point that we are in desperate need of some new frameworks, some new benchmarks, and some new tools.

And so at both AIDB and at Superintelligent, we've been thinking about this a lot over the last, call it, three to four months. We've experimented with a couple things that you'll probably see some version of come out in the near future. One of them I call AI Opportunity Radars, which are basically a way of organizing use cases by function, but then also by applicability depending on where your organization is in its development cycle. Simply put, it's a radar or bullseye type of visual where use cases are organized into one of three categories. Prime time means that most organizations, as they are, are well suited to get value from that use case right now. Emerging means that while there are a lot of organizations that can get value out of them, there is some amount of setup cost, or a right set of circumstances or infrastructure, that is going to be needed to get that value, and not all organizations are going to be there just yet. Finally, frontier is exactly what it sounds like: if your organization is well set up with the right infrastructure, you can be getting a lot of value from those use cases, but at this point, most organizations aren't there yet.

So over the last quarter, we built an agentic system that is basically constantly seeking out every new resource it can get its hands on, assessing what those resources tell us about the use cases in different functions, and keeping these radars continuously updated, both with new use cases as well as changes in where the existing use cases are placed. But as we were working on radars, it was clear again that there was something even more fundamental, and that overly or only focusing on use cases was leaving out so much of what actual AI readiness means.
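Before moving on to maturity, here is a minimal sketch of what one of those opportunity radars might look like as a plain data structure, with a small helper for re-placing a use case when new evidence comes in. The ring names follow the episode; the example function, use cases, and placements are illustrative assumptions rather than Superintelligent's actual radar.

# Hypothetical sketch of an AI Opportunity Radar: use cases grouped by function,
# each placed in one of three rings. Example entries and placements are assumptions.
RINGS = ("prime time", "emerging", "frontier")

radar = {
    "marketing": {
        "first-draft content generation": "prime time",      # assumed placement
        "campaign performance agents": "emerging",            # assumed placement
        "autonomous multi-channel optimization": "frontier",  # assumed placement
    },
}

def place_use_case(radar, function, use_case, ring):
    # Add a new use case or move an existing one as the evidence base changes.
    if ring not in RINGS:
        raise ValueError(f"unknown ring: {ring}")
    radar.setdefault(function, {})[use_case] = ring

# e.g. new studies suggest a use case has matured from emerging to prime time:
place_use_case(radar, "marketing", "campaign performance agents", "prime time")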
When we're doing AI readiness and planning assessments at Superintelligent, we're not just thinking about what use cases a company should do, but about the full set of change management, infrastructure development, new policy, investment in people, and all the other stuff that needs to go around them to actually get value from those use cases. And that led to the development of the framework which I'm going to be sharing today, which for simplicity we call AI maturity maps. Now, the concept of maturity is certainly not some proprietary thing that we invented. Maturity is just a heuristic and a framework to look at where different organizations are around some key areas relative to one another and where they should be.

So the way that maturity maps work is that they organize AI and agent maturity into six different categories. The first is deployment depth, which is sort of an expanded notion of use cases. Depth in the context of AI maturity not only captures how many use cases you have in play, but how much those use cases are assistants versus full workflow automations versus actual applied agentic systems that are doing work with some meaningful degree of autonomy. The second category is systems integration. This is a measure of how deeply the AI solutions and workflows that you're deploying are integrated with the existing systems that run your enterprise. Is everyone using ChatGPT independently, or does your CRM system have an agent running through it, automatically extracting insights, making recommendations, and even setting up new outreach campaigns? Systems integration is in some ways one part of the measure of how good the context is that an enterprise's AI has to work with. Now, the other piece that relates to context is of course data. How much, of what quality, and how well managed is your AI's access to your company's data? Does it require people dropping in PDFs, or is your company knowledge all set up on MCP servers? How does the AI that's supposed to transform your company get access to the information it needs to know what that transformation should look like? Outcomes is almost a measure of measurement. Are all of your deployments pilots and experiments, or do you have a track record of actual demonstrable and measured outcomes? Outcomes are in some ways the information you need to know what you should do next across all these other dimensions. The fifth dimension of AI maturity maps is people, and this is an admittedly broad category. A big part of it refers to upskilling and capabilities, but another piece has to do with attitudes. Given that one of the major barriers to adoption in many companies is not just going to be skills using AI but attitudes towards AI, people is an extremely important and, unfortunately, as we'll see, often neglected piece of the AI maturity pie. Lastly, of course, is governance. How clear, how established, how communicable, how known are the rules and guidelines and access provisioning around your AI systems? Do people know where to go to get the permissions they need? Do they know what expectations are? When issues come up, are there mechanisms for resolving them? So those are the six areas across which we look at AI maturity.
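As a way of pinning those six categories down, here is a minimal sketch of how one function's maturity map could be represented. The dataclass, the field names, and the example scores are my own illustrative shorthand under assumptions, not the actual system; the scores use the 1-to-5 scale I'm about to describe.

# Hypothetical sketch: one function's maturity map across the six dimensions above.
# The dataclass and the example scores are illustrative assumptions.
from dataclasses import dataclass, asdict

@dataclass
class MaturityMap:
    function: str
    deployment_depth: float     # assistants vs. workflow automation vs. autonomous agents
    systems_integration: float  # how deeply AI is wired into the systems running the business
    data: float                 # quantity, quality, and accessibility of proprietary context
    outcomes: float             # measured, demonstrable results vs. pilots and experiments
    people: float               # skills and attitudes
    governance: float           # rules, permissions, and escalation paths

    def dimensions(self):
        scores = asdict(self)
        scores.pop("function")
        return scores

example = MaturityMap("customer service", 3, 3, 1.5, 2, 1, 2)  # placeholder scores
print(example.dimensions())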
Now, for the purposes of developing these maps, we've started with 10 functional maps split across some of the most common, very broad-brush categories of knowledge work. That includes customer service, engineering, IT (by the way, the difference between those two for our purposes is effectively that engineering is all the stuff that's external facing and IT is all the technology stuff that's internal facing), sales, marketing, HR, operations, finance, legal, and product.

So at the end of last year, we started to put together a process for actually assessing and visualizing AI maturity across all these dimensions. What came out of that is the chart that you see here, which plots each of these six categories within a specific function on a five-point scale. Number three, the center of the chart, is the on-track line, in other words, where an average organization should be. And the word should, as you'll see, is doing a lot of heavy lifting there. Now, if on track is a three, four is ahead and five is significantly ahead, while two is behind and one is significantly behind. The idea is that when you look at a maturity map, without having to read a lot of words, you can instantly see the gaps between where organizations should be and where the average organization actually is. And when you compare your organization to it, you can also see where you are relative to both the general on-track line and the average. So, clarifying this a little bit more, our quarterly designation of on track is not where the average organization is. It is a subjective measure of where we think the average organization should be. As you'll see when you dig into this quarter's numbers, in the vast majority of cases, we believe that the average organization is behind that on-track line across pretty much all of these dimensions. To use a term that comes up a lot on this show, the fact that organizations tend to be behind this on-track line is effectively a visualization of the capability overhang.

Now, at this point you might be wondering, well, what gives you the authority to determine what the on-track line is? It's a totally reasonable question, and believe it or not, it is at least a little bit more than just my opinion. We have a few different places to pull from. The first is the sort of proprietary research and surveying that we do as part of AIDB Intel, which gives us some pretty good insight into where particularly leading organizations are. At Superintelligent, given that we are doing thousands and thousands of voice agent interviews every month to help organizations assess their AI maturity and plan their AI strategy, that's another pretty unique source of frontline data. And then, combined with that, we built a system to go out and effectively aggregate pretty much every new survey or study that comes out that even vaguely touches AI. You might have heard me mention before that my most useful OpenClaws are my research OpenClaws, and this is one of the main things that they do. They are in a never-ending, 24-hour-a-day hunting loop to surface new sources, to assess those sources in terms of their legitimacy, credibility, and bias, and then to integrate that information into our larger assessment system. There are more than 480 studies and surveys from the last quarter that went into these Q2 maturity maps. Among the sources that have explicit sample sizes, the combined survey respondent base is 150,000 professionals across more than 50 countries. The types of source categories that we have are, first, big four and top-tier consulting firm research; there are over 20 of those sources in that mix. Then there are major platform earnings and public market statements.
There are analyst firm predictions and research from companies like Gartner, Forrester, and IDC; function-specific regular or annual surveys, such as Stack Overflow's engineering survey or similar things for areas like marketing, legal, and IT; academic and government research; and behavioral data sources, where companies that have access to some unique user behavior data aggregate, analyze, and share it. A good example of that is Jellyfish's AI coding benchmark, which used behavioral data from more than 200,000 engineers across 700 companies with 20 million PRs. Finally, there are of course practitioner reports and vendor case studies, although the system is careful to rate them with some amount of skepticism given that they are, of course, selling something.

All right folks, quick pause. Here's the uncomfortable truth. If your enterprise AI strategy is "we bought some tools," you don't actually have a strategy. KPMG took the harder route and became their own client zero. They embedded AI and agents across the enterprise, how work gets done, how teams collaborate, how decisions move, not as a tech initiative but as a total operating model shift. And here's the real unlock. That shift raised the ceiling on what people could do. Humans stayed firmly at the center while AI reduced friction, surfaced insight, and accelerated momentum. The outcome was a more capable, more empowered workforce. If you want to understand what that actually looks like in the real world, go to www.kpmg.us/AI. That's www.kpmg.us/AI.

Today's episode is brought to you by Robots and Pencils, a company that is growing fast. Their work as a high-growth AWS and Databricks partner means that they're looking for elite talent ready to create real impact at velocity. Their teams are made up of AI-native engineers, strategists, and designers who love solving hard problems and pushing how AI shows up in real products. They move quickly using RoboWorks, their agent acceleration platform, so teams can deliver meaningful outcomes in weeks, not months. They don't build big teams, they build high-impact nimble ones. The people there are wicked smart, with patents, published research, and work that's helped shape entire categories. They work in velocity pods and studios that stay focused and move with intent. If you're ready for career-defining work with peers who challenge you and have your back, Robots and Pencils is the place. Explore open roles at robotsandpencils.com/careers. That's robotsandpencils.com/careers.

You've tried in-IDE copilots. They're fast, but they only see local silos of your code. Leverage these tools across a large enterprise codebase and they quickly become less effective. The fundamental constraint: context. Blitzy solves this with infinite code context, understanding your codebase down to the line-level dependency across millions of lines of code. While copilots help developers write code faster, Blitzy orchestrates thousands of agents that reason across your full codebase. Allow Blitzy to do the heavy lifting, delivering over 80% of every sprint autonomously with rigorously validated code. Blitzy provides a granular list of the remaining work for humans to complete with their copilots. Tackle feature additions, large-scale refactors, legacy modernization, greenfield initiatives, all 5x faster. See the Blitzy difference at blitzy.com. That's blitzy.com.

So in Q2, what are some of the patterns that we saw? The first you might call the adoption embedding gap.
Basically, every single function-specific survey reports the same pattern: high claimed adoption, but fairly low depth of utilization. This is maybe the most dominant finding across all these sources, that the story of Q2 when it comes to enterprise AI adoption is not just the capability overhang in general, but the applied capability overhang, even when it comes to adoption inside an individual organization.

A second very common finding across a huge array of these sources: there tends to be a fairly big gap between worker-level data and leader-level data. For example, one study found that in the area of customer service, 72% of leaders said that their AI training was adequate, with 55% of their employees disagreeing. In HR, a huge percentage of leaders report that AI is a priority, but more than two-thirds of HR staff say that their organizations are not proactive in upskilling. In fact, one can argue that people are the bottleneck that is not getting nearly enough investment. We gave seven of the 10 functions a score of one, significantly behind, when it came to that people category. The irony is that one could argue that the single largest barrier to converting AI adoption into AI value is on the human side, and it's the thing organizations are spending the least on. To use one dramatic example, Deloitte found 93% of AI spend going to infrastructure, with only 7% going to anything related to people.

Now, outside of people, another finding across all of these sources is that data is kind of the ceiling on everything else. Eight of the 10 functions score a one or a 1.5 on data. Now, I don't need to beat this drum anymore for this audience, but obviously, without proprietary context feeding AI, things like your code base, your customer history, your deal data, you really are not going to get past basic assisted usage no matter how good the assistant tools get. One could argue that data is not one pillar among six, but the floor constraint that caps all the others.

Another area of universal challenge is outcome measurement. And this is not surprising. Given how much pressure there has been to adopt AI as fast as possible, one of the consequences is that no one slowed down or paused their adoption while they went out and figured out how to actually measure the ROI of all those investments. Can you imagine someone in the C-suite right now suggesting with a straight face that you take six months off of adoption to figure out better ways to measure the ROI first, to ensure that you weren't spending too much? That, my friends, is a recipe for an early retirement. The byproduct, however, is that the actual evidence for AI ROI is pretty thin. Now, I will say that if I had to make a prediction about one area where you're going to see the biggest glow-up this year, it's this one. There are tons and tons of efforts around ROI measurement, and I would expect that to jump significantly in the quarters to come.

Across all of these functions, the vast majority do not have any scores that are actually on track. Customer service we rated on track in terms of deployment depth and systems, which makes sense given how much focus there has been on customer-service-related solutions, vertical and otherwise. Engineering as well we have as on track for deployment depth, systems, and people, given that software engineering organizations are effectively the harbingers and guinea pigs for everything else that's going to happen to all the other knowledge workers. I think that this will be intuitive.
And then, relatedly, in IT we also have deployment depth, systems, and people as on track. Now, of course, all of these areas share structural advantages: mature tooling, technical practitioners, and measurable workflows. Unfortunately, though, even if it is the case that other areas face bigger challenges in some ways when it comes to the technical capabilities of practitioners and the measurability of what you do, those other areas of the enterprise are going to have to catch up.

Now, a few observations looking across the different functions. While customer service did have a couple areas that we rated as on track, it also, I think, reveals something that could be a harbinger for other areas. Remember, when it came to CS, we heard 72% of leaders say that training is adequate, but 55% of people actually working in CS say it's not. 87% of customer service workers report high stress, and 75% of leaders acknowledge that AI may be increasing stress. So you've got a situation where AI is absorbing routine cases, humans get the harder, more emotional ones, and many people might not be trained for that shift. Add to that increased questions about the long-term sustainability of your job, and the result is stress, anxiety, and burnout. Basically, CS could be the canary in the coal mine for what happens when you deploy AI without investing simultaneously in the humans who work alongside it.

One of the areas that I think is interesting to point out is IT's two rating, or behind, when it comes to governance. In most organizations, IT owns AI governance for a big part of, if not the entire, organization. And yet only 54% have centralized frameworks, 50% of AI agents are unmonitored, and 88% have had security incidents. And the question becomes: if the governance function is behind on governance, what does that tell us about the rest of the organization?

One of the most interesting findings when it comes to sales is that it might be the cleanest example of the adoption mirage. 88% of sales teams say they use AI, but only 24% have it in their actual revenue workflows. A fair bit of that adoption is mirage, in other words, and this is why we rate them behind in deployment depth. Much of the quote-unquote adoption is reps using ChatGPT in a separate browser tab for email drafts and call prep, which is not a bad thing at all. It's just not the level of automation and autonomy that I think sales organizations are hoping for. The autonomous SDR dream has not fully come to fruition yet, and I don't think that most sales organizations have figured out the right integration and balance between humans and agents in the new sales working system.

The deployment depth score on operations I think is another really interesting one. In some ways, operations has had automatable functions longer than any other function in the enterprise. Think statistical forecasting, rules-based inventory management, predictive maintenance. That's all stuff that predates this latest wave of gen AI by a decade. What that means is that when 90% of operations teams say they're investing in AI, it sounds impressive. When you actually look at what that is, a lot of it is legacy optimization infrastructure that's been running since around 2015. The gen AI layer on top is often very thin, mostly asking AI questions about operational data and generating reports. In fact, one study found that only 23% of operations groups even have a formal AI strategy.
Operations is the function where the distinction between old automation and new AI maturity is showing up as a real, distinct challenge.

Lastly, one more interesting stat. Outside of the technical areas of engineering and IT, and the long-duration area of customer service, which has had so much emphasis on automation, finance is the only other function to hit on track on any pillar. And it does so on governance. Now, why do you think this might be? It's because, of course, finance is operating in an area where governance is not optional. 69% of CFOs report advanced or established AI risk governance frameworks. Why? SOX compliance, audit trails, fiduciary duty, decades of regulatory muscle memory. Basically, finance already knew how to govern risky tools even before AI existed. Now, look at the rest of finance, where we rated it significantly behind in every other category. Basically, they know how to control AI but haven't figured out how to use it. What will be interesting is whether, over the next few quarters, we see this turn into a tortoise-and-hare thing. In other words, when finance does figure out how to deploy, will they do it more safely and more effectively than functions that deployed first and governed later? And will that actually allow them to catapult at some point and jump out ahead of other functions that at the moment feel like they are further along when it comes to deployment depth?

So this is the idea of maturity maps. They are of course very nascent. This is the first quarter where we've actually fully trotted them out. And in the spirit of getting feedback and seeing how useful these things can be, we're putting up the ability on the Superintelligent website, besuper.ai, to not only review all of these maturity maps, but also take a short quiz that shows you where your organization is relative to both the on-track line and what we think is the average. And I do want to emphasize that this is not an assessment. This is not an audit. This is a quiz. It is an online quiz of 18 questions that is going to give you a very general idea of where you stand. We of course have ways to go deeper with your organization and get much better data to actually inform these things. But we want as many people as possible to have access to this to actually help us figure out where these lines should be and how we should evolve the entire system. Obviously, I'll have links to all of this in the show notes, but again, that's going to be at besuper.ai/quiz.

In terms of where we want to take this next, in addition to just continuously having better and more sources of data, probably the most glaring thing that stands out to me is that we're trying to argue for one on-track line and one average line across all different organization types. In other words, is it really even remotely reasonable to judge a 10-person startup by the same on-track lines as a 10,000-person enterprise? I obviously believe that there's enough value here that it's worth putting out, while acknowledging that where I'd like to go next is vastly more gradations in both the on-track and the average lines, organized by things like organization size. Industry is another obvious area where you might see some fairly significant differences. And while I think it is the right call to start with some very broad-brush, high-level functions, obviously most organizations are a lot more nuanced than just having these 10 clear departments.
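To make that concrete, here is a minimal sketch of how a self-assessment could be compared against an on-track line and an average line, with the benchmark keyed by organization size to illustrate the kind of gradation just described. Every number below is a placeholder assumption, not a published Q2 map value.

# Hypothetical sketch: gap between an organization's self-assessed scores, the
# on-track line, and an assumed average, keyed by function and size band.
# All benchmark numbers below are placeholders, not the published maps.
ON_TRACK = 3.0  # the center line: where an average organization *should* be

AVERAGES = {
    ("marketing", "enterprise"): {"deployment_depth": 2.0, "data": 1.5, "people": 1.0},
    ("marketing", "startup"):    {"deployment_depth": 2.5, "data": 2.0, "people": 2.0},
}

def gaps(self_scores, function, size_band):
    # Positive numbers mean ahead of that line; negative means behind.
    avg = AVERAGES[(function, size_band)]
    return {dim: {"vs_on_track": round(score - ON_TRACK, 2),
                  "vs_average": round(score - avg.get(dim, ON_TRACK), 2)}
            for dim, score in self_scores.items()}

print(gaps({"deployment_depth": 2.5, "data": 1.0, "people": 2.0}, "marketing", "enterprise"))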
Still, ultimately, it is my argument that right now we are in a moment where we need all the data that we can get, and the more we can all pile in on top of each other and actually share that information, create new benchmarks, and help each other know what on track is and then how to get there, the better off we're all going to be. So that is maturity maps. I'm excited to share them with you. I'm excited to see what you think. Again, you can go to besuper.ai/quiz, which of course will be on all my websites and in the show notes, and check this out for yourself. For the sake of precision: for every one of these ratings, we have at least a little bit of our argument about what it means to be significantly behind, behind, on track, ahead, or significantly ahead. Anyways, friends, that is going to do it for day two of Build Week. Appreciate you listening or watching, as always. And until next time, peace.