AI + a16z

From Vector Databases to Knowledge Engines: The Next Layer of AI

46 min
May 5, 2026
Summary

Pinecone CEO Ash Ashutosh discusses the evolution from vector databases to knowledge engines, explaining why AI agents need fundamentally different data retrieval systems than humans do. The company's new Nexus platform uses a context-compilation approach that reduces token usage by 40-90%, improves task completion rates from ~50% to 90%+, and introduces NoQL as a standardized query language for agentic applications.

Insights
  • 85% of agent work is knowledge retrieval, not model reasoning—the bottleneck is data access, not LLM capability, requiring infrastructure redesign rather than model improvements
  • Knowledge engines differ from vector databases by curating and contextualizing data at the source before retrieval, eliminating the need for agents to brute-force multiple queries across disconnected systems
  • Context compilation (iterative, on-the-fly data artifact generation) replaces traditional ETL pipelines, enabling rapid deployment of vertical AI applications without infrastructure overhead
  • NoQL as an open standard for agent-data communication mirrors SQL's role in databases, creating opportunity for ecosystem standardization and third-party innovation in agentic stacks
  • Dramatic token reduction (40,000→2,000 in internal use case) and cost savings create new business model opportunities based on task completion and knowledge quality rather than infrastructure consumption
Trends
  • Shift from human-centric to agent-centric system design across data infrastructure and retrieval layers
  • Offloading specialized functions (knowledge retrieval) from expensive LLMs to dedicated infrastructure, paralleling historical CPU→specialized processor transitions
  • Emergence of standardized protocols (NoQL, MCP interfaces) for agent-infrastructure communication as prerequisite for agentic application ecosystem growth
  • Enterprise demand for explainable AI and data governance driving knowledge engine design with citation, attribution, and trust-first architectures
  • Vertical AI application explosion expected as infrastructure commoditization removes 85% of development effort currently spent on knowledge retrieval
  • Pricing model evolution from infrastructure-based (compute, storage) to outcome-based (task completion, knowledge quality, accuracy metrics)
  • Multi-model agent workflows becoming standard, requiring data layer optimization rather than single-LLM improvements for performance gains
  • Real-time context compilation replacing batch ETL as deployment pattern for enterprise AI applications
  • Data governance and security becoming primary differentiators in knowledge engine competition, not just speed or cost
  • Marketplace-driven distribution of pre-built agentic solutions and blueprints accelerating adoption among non-infrastructure-focused developers
Topics
  • Vector databases vs. knowledge engines
  • Agent-centric system architecture
  • Knowledge retrieval optimization
  • NoQL query language specification
  • Context compilation and data curation
  • Token efficiency and cost reduction
  • Task completion rates for AI agents
  • Explainable AI and data attribution
  • ETL pipeline elimination
  • Agentic application development stack
  • Data governance for AI systems
  • Multi-model agent workflows
  • Enterprise AI deployment patterns
  • Vertical AI application development
  • Standardization in agentic infrastructure
Companies
Pinecone
Vector database company launching Nexus knowledge engine platform with NoQL query language for AI agents
Anthropic
Claude LLM mentioned as example of model too far from data to optimize knowledge retrieval for agents
UC Berkeley
Research study cited showing 85% of agent work is knowledge retrieval, only 15% is model reasoning
People
Ash Ashutosh
Discusses evolution from vector databases to knowledge engines and Nexus platform launch strategy
Peter Levine
Hosts discussion and provides board-level perspective on Pinecone's product evolution and market opportunity
Quotes
"85% of the agent's work is in just retrieving knowledge. Only 15% is the models. The models aren't a problem. The problem is the underlying system that you're trying to get information from."
Ash Ashutosh
"A vector database is like a library. A knowledge engine is like an expert in some task you're performing."
Ash Ashutosh
"We brought it down from 40,000 to about 2,000 tokens. It's under 500 milliseconds from a minute to two minutes. Most importantly, the accuracy dramatically goes up from best case over 68% to well over 90% accuracy."
Ash Ashutosh
"What happens when software is no longer built for humans, but for agents?"
Narrator
"You're off-boarding that to very specialized things. It's history repeating itself—much of this stuff you're putting on very expensive working models."
Peter Levine
Full Transcript
About eight, nine months ago, we started seeing a massive shift of who our users are. It turns out it wasn't a human being anymore. It wasn't a different persona. It was an agent.
85% of the agent's work is in just retrieving knowledge. Only 15% is the models. The models aren't a problem. The problem is the underlying system that you're trying to get information from.
We brought it down from 40,000 to about 2,000. It's under 500 milliseconds, from a minute to two minutes. Most importantly, the accuracy dramatically goes up from, I think, best case was over 68, to well over 90% accuracy. And that is just version one.
We finally understood why these things were taking so long. And they were fundamentally running on a system that was designed for human beings.
What happens when software is no longer built for humans, but for agents? For years, systems like databases and search were designed around human interaction. A person asks a question, evaluates the response, and decides what to do next. But with the rise of agents, that model starts to break down. Agents don't have context. They brute force their way through systems, issuing dozens of queries, consuming tokens, and often failing to complete tasks. This creates a new bottleneck, not in the models themselves, but in how data is retrieved, structured, and understood. In this episode, Peter Levine speaks with Ash Ashutosh, CEO of Pinecone, about the shift from vector databases to knowledge engines, and what it takes to build systems that actually work for agents.
Hey Ash, welcome.
Hey Peter, been a while.
Yeah, it has been a while, good to see you. So we're here to talk about, at least I'm here to talk about, Pinecone's new launch. Yeah. And I know we, I'm a board member with you, and we've been on this journey together now for a bit. And you all have been working on this new product called Nexus. And, you know, I'd love to hear more about it and sort of the, kind of the genesis of it,
and then what's happening at the launch and kind of where to from here. Yeah, I think we've been talking about it at our board level for several months now. About eight, nine months ago, we started seeing a massive shift of who our users are. It turns out it wasn't a human being anymore. It wasn't a different persona. It was an agent. And that shift fundamentally changed how we thought about what's the best way to serve this new user in the world of retrieval. If you think about what we had done for five years, six years, since we first pioneered the vector database market, the idea was you provided an interface to a human being who did a query, got a response back. And it was the human being who provided the context about whether the response was accurate, whether they had to re-ask the question, and they would finally take the action based on whether they verified the information or not. Unfortunately, agents don't have the luxury. The human gives them a task, and agents go there and start trying to perform the task. and they spend a ton of time going through this brute force loop of querying, getting some chunks of data back. And when you say, just so I have the context, when you say agents spend a lot of this brute force, what are they actually, let's say right now, before Nexus has launched, what are they actually doing in the background? Like querying what and what's the nature of that whole data flow? Yeah, so you give it a task to an agent to say, hey, is this product under warranty? Okay. Somebody asked them. Right now, let's say without Nexus. Yeah, customer service. Okay. Agent comes in and says, can you let me know if this product is under warranty? Right. Agent does something called a query expansion, breaks up the queries, and then says, okay, let me go figure out what this product is. And it goes to five or six different systems. Sometimes it might be sales order system, product definition system, things about warranty information. 
And it sends out different queries, just like a human being would, because that's the interface we provided as part of the database. So here's an agent trying to solve a problem without having any context, with a system built for a human being. And so it goes out, issues a query, and it asks six or seven different queries before it first starts to get an idea about the first, think of it as the first line of code, effectively. Sometimes it could be 40 different queries. And it might be an internal system or external, all over the place, whatever, it could be any kind of stuff, right? But the idea is, what most of these agents do is they do a ton of retrieval and figure out, oh, I don't have enough information, let me go ask more questions. Oh, I have a conflict here with this information. And this reasoning goes on until they finally figure out, okay, I'm done with the task, let me report back to my human that the task is complete. And the human has to actually examine it, because most of the time, it turns out, the task completion rate is less than 50%. So half the tasks, these agents don't actually complete, right? And they take a ton of time. In fact, there's a research study that came out of UC Berkeley, which showed 85% of the agent's work is in just retrieving knowledge. Only 15% is the models. The models aren't a problem. The problem is the underlying systems that you're trying to get information from: they were built for human beings. You're asking agents to come back and to pretend like, you know, and we talked about this before: when machines are talking to machines, why do we have an interface that looks like a human being?
Right, right. Yeah, you and I have talked about that for a while.
Yeah, and this is the same problem. It just happens to be agents performing specific tasks. And that change in our user led to what it means to fundamentally change how retrieval is done by Pinecone. Yeah. And that's what we're calling Nexus.
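The brute-force loop described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not anything from Pinecone: the three in-memory "systems", their record layouts, and the function name `brute_force_lookup` are all hypothetical stand-ins for the sales order, product definition, and warranty systems mentioned in the conversation.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the separate systems the agent has to query.
SYSTEMS = {
    "sales_orders": {"order-1001": {"product_id": "P-7", "purchased": "2025-11-02"}},
    "product_catalog": {"P-7": {"name": "Acme Router", "warranty_months": 24}},
    "warranty_policy": {"default": {"covers": "manufacturing defects"}},
}


@dataclass
class RetrievalTrace:
    queries: list = field(default_factory=list)  # every (system, key) the agent asked
    facts: dict = field(default_factory=dict)    # what it managed to collect


def brute_force_lookup(order_id: str) -> RetrievalTrace:
    """Answer 'is this product under warranty?' by querying system after system."""
    trace = RetrievalTrace()
    # The agent starts with no context, so it fans out to whatever it can think of.
    pending = [("sales_orders", order_id), ("warranty_policy", "default")]
    while pending:
        system, key = pending.pop(0)
        trace.queries.append((system, key))
        record = SYSTEMS[system].get(key)
        if record is None:
            continue  # dead end: the query cost time and tokens but returned nothing
        trace.facts[system] = record
        # A result from one system often reveals the next question to ask.
        if system == "sales_orders":
            pending.append(("product_catalog", record["product_id"]))
    return trace


trace = brute_force_lookup("order-1001")
print(len(trace.queries), "queries issued for one warranty question")
```

Even in this toy version, one question turns into several round trips, and each follow-up query only becomes known after an earlier one returns. With real systems and an LLM reasoning between every step, that is where the six, seven, or 40 queries come from.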
So maybe to help me and to put the context here: Pinecone, of course, you know, built and defined the vector database category. Okay. So now we're talking about this Nexus. You know, you call it, I believe we call it, a knowledge engine. Yep. So is this just a marketing term? Like, what actually, you know, like, yeah, vector database, like, you know, instead of doing this, we'll call it something else, but it's really the same thing. And so kind of help me to understand, like, one is, you know, you just put some lipstick on, it looks different, right? Or there's a really different approach, built on vectors, not built on vectors, like, what's the evolution of Pinecone into this? And maybe a second question on that is, how did you actually bump into this? I mean, what were users doing that informed the company that this shift was occurring and that Pinecone was a viable solution for this? So sorry to break, maybe both of those.
Yeah, I think the distinction is absolutely real in terms of what a knowledge engine is and what a vector database provides to the knowledge agents. Think of a vector database like a library. There's tons of information out there. A human being asks for some information, appropriate books and pages and documents are given to you, and you read through this stuff and you figure out the knowledge out of it and go ahead and perform a task. Now you allow the same vector database to operate with an agent. It has to do the same thing, except it doesn't have the context. So when you say to the agent, okay, go ahead, it has to go through everything: you read all the pages that are relevant, you synthesize across them, you hope it got the right answer. And that's the brute force approach, because agents are very, very good at reasoning. They can spin up more queries in a millisecond than you can do in an entire day.
Right, right.
And so they brute force their way through, which is why you see a ton of tokens consumed for even the smallest of the applications. Yeah, yeah. Now a knowledge engine is more like an expert, an expert in some task you're performing. You want to get some tasks done, but let's say you have a medical billing task agent and a knowledge engine for medical billing for that specific task is an expert in figuring out the medical billing part. It may not care about your prescriptions. It may not care about medical research. Got it. Let's just say billing. Just billing. And then, go ahead. That same knowledge engine uses the exact same data, which is, you know, let's say in a hospital, and may have a very different persona, a very different context when a doctor uses it. Sure. Versus when a hospital administrator uses it. Sure. And that's the difference. I think a vector database treats all data like it's a pool of data, like a library. And you need that. That is essential. But you need something else on top that literally creates a context, very, very specific. Okay, so we'll get back to the other one. I want to follow up on point B here on that. So you have, I get the library analogy, a bunch of books, and now I get, I also understand this knowledge. I think I do, the knowledge engine, which is as if you've read the books and it gives you the context. But I'm trying to distinguish between an LLM that I kind of thought did some of that stuff versus what added things did Pinecone do to turn the library into the knowledge agent? And then, you know, without having an LLM, like what is the contextualization in the, I mean, we can use the example. of the billing service, right? Okay, so now how does Pinecone know the context itself? Where does it learn that? I guess that's the question. You know, I think fundamentally today, all of the reasoning is done at the retrieval level, which means once you get the data, you got the LLM, you throw it in there. Sure, yeah, yeah, yeah. 
Let me figure out the answer. Yeah. May or may not be the right answer. Right. I don't even know if you have all the data. Got it. All I reasoned over was based on the data you gave me. Yeah, yeah, yeah. Okay.
When you move the reasoning closer to where the data is, closer to where the curation of the data, where the actual processing of the data is happening, you can do a lot more things. For instance, you can get the right kind of data, because now you know what context I'm addressing. More importantly, you can start citing and attributing. So you actually can say, this is the citation of why and where this answer came from, as opposed to, I don't know, but not. It just probably talks to some MCP server, gets some information, and brute forces its way to some answer, whether it's right or wrong. So when you move the reasoning from retrieval to curation, closer to the source, closer to the data, significant differences happen. And what you would do is you would tell Nexus, okay, I have this data, and typically these are the answers I expect to see. This is my context. So you give it the appropriate data.
And when you say you, that's a human that does that typically? Okay, just to set it up. You're effectively kind of training.
We call it building.
Okay, go ahead.
You're training the context of the knowledge engine to say, with this data, here are the answers I expect to have. Got it. So based on this test data, this is where the interesting part is. It's very similar to a compiler that we remember. Sure. You write code, it compiles and generates some code. This one is a continuous compiler, an iterative compiler that says, okay, you gave me this data and you want this output. I want to match it. So I want to keep figuring out how to curate, how to break up this data in a way to create new artifacts. In fact, we actually create completely different artifacts.
And is this happening all within the Pinecone system?
The entire reasoning has been moved inside.
I see. Okay.
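The "iterative compiler" idea, where you supply source data plus the answers you expect and the system keeps re-curating until the two match, can be illustrated with a deliberately tiny sketch. Nothing here reflects how Nexus is actually implemented: the greedy field-selection strategy, the function names `covered` and `compile_context`, and the hospital records are all assumptions made up for the example.

```python
def covered(answer, artifacts):
    """True if some curated artifact already contains the expected answer."""
    return any(answer in artifact.values() for artifact in artifacts)


def compile_context(records, expected_answers, max_turns=5):
    """Iteratively choose which fields to keep until every expected answer
    can be recovered from the curated artifacts (toy greedy strategy)."""
    kept_fields = []
    all_fields = sorted({name for record in records for name in record})
    for turn in range(1, max_turns + 1):
        # "Compile": project each source record down to the fields kept so far.
        artifacts = [{f: r[f] for f in kept_fields if f in r} for r in records]
        missing = [a for a in expected_answers if not covered(a, artifacts)]
        if not missing:
            return kept_fields, artifacts, turn

        def gain(field_name):
            # How many still-missing answers would this field explain?
            return sum(any(r.get(field_name) == a for r in records) for a in missing)

        remaining = [f for f in all_fields if f not in kept_fields]
        if not remaining:
            break
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break  # no field helps; the source data cannot produce these answers
        kept_fields.append(best)
    raise ValueError("could not match the expected answers within max_turns")


# Hypothetical hospital data: billing cares about patient, doctor, and bill only.
records = [
    {"patient": "Ana Ruiz", "doctor": "Dr. Lee", "bill_total": 420.0,
     "research_note": "trial cohort B"},
    {"patient": "Sam Poe", "doctor": "Dr. Kim", "bill_total": 95.5,
     "research_note": "n/a"},
]
expected_answers = ["Dr. Lee", 420.0, "Sam Poe"]

fields, artifacts, turns = compile_context(records, expected_answers)
print(fields, "after", turns, "turns")
```

In this run the loop settles on the billing-relevant fields and never pulls in `research_note`, which mirrors the point made in the conversation: the same source data yields different, narrower artifacts depending on the answers a given context is expected to produce.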
And that's where you start looking at, you gave me this data, but this is the output you want. Let me find the most effective way to just completely break up this data into new artifacts. So, for example, in the case of billing, you might give it the entire hospital data, but what you care about is just the patient, the doctor, and the bill. Got it. Maybe you don't care about the research part. Got it. After we break that up, that's when we embed that data back into Pinecone. I see. And so the fundamental shift here is the first build phase, which is you are now compiling the context very specifically for the knowledge engine. That's one part. So as the new data comes in, it gets converted into this new format that is very close. Got it. It gets cited back to other sources. Yeah. It gets put back into Pinecone's vector database. That's one part.
The second part is on the retrieval side. Now the agent says, not only did I give you the data, I want to get some information. And don't give me a poem. Don't give me an image. That's cute for a human being. Give me very structured data. Tell me exactly. I see. In a very structured format, because I'm a machine. I understand structure.
You're machines.
Yeah, I understand structure. And you don't care about images or whatever. So that's the second part you define. As part of your definition of context, not only do you define data and what kind of outputs, but you also define the format of the output. Because the format for billing might be very different from the format for the doctor. Got it. Might be very different from the hospital administration.
And how hard would it be for somebody to set this up? Let's say, you know, you start with the human, they kind of organize things. Like how, and then we'll get back to how customers actually bumped into this. But what are you, you know, what's the presentation and complexity that a user has to go through?
Literally, in fact, we were working on an internal one for our own contract management stuff. We've done hundreds of contracts. What we did was to say, okay, why don't we take all the contracts we did? Let's on one side talk about the successful contracts. Let's look at the input of all the contracts with the red lines. This is your source data. This is your destination. Figure out how I can approve something from here to there. And we just loaded it into the build phase. It runs about three to five turns, takes a few minutes, and you create new artifacts.
Wow.
This is literally, I hate to use the words training a model, but you're training a knowledge engine. Yeah. In a very, very different way. Right.
It's almost like you're training data to be present, you know, you're using data to train a knowledge engine.
Exactly. And the data is the foundation, the output, and the format of the output. And there are several things, and we'll talk about the new protocol that we have defined to make sure the agents can actually define how they want to get responses back. This is literally the massive gap that we've had between models that have spent a ton of time building reasoning capabilities, and people have completely ignored where the real value is, which is on the data side, not inside.
And then let's say in this case, the agent now, let's say we have the knowledge engine, the agent queries the knowledge engine, and it comes back in a query-understandable, sorry, an agent-understandable language. Yeah. Would the agent still use an LLM in that case afterwards? Is that sort of the best? Is that how this works? And so, I mean, my takeaway from that is it will simplify or reduce the number of tokens actually used for the backend LLM system and all that, because my data is much more prescriptive when it gets to the agent. Is that fair?
Absolutely. And three things happen.
One, the task completion rate, the success rate of a task, has gone up from, on average, about 50%, maybe 60% on a good day, to well above 90%. You actually have an agent finishing a task. This is even more important. There's no point giving someone a task, even if they did it for free. Okay, you just did the wrong thing. And if it fails, that's the biggest. That's even worse. So number one is the task completion rate goes up dramatically. Number two is the time it takes to complete the task. If you run any of the tasks today, it takes minutes. And part of the reason is it's spending a ton of time, 85% of the time, trying to just retrieve knowledge. That dramatically goes down. And in our own internal various applications that we've been building on Nexus, tokens have gone down, depending on how badly, how good it was written, but a 40 to 90% reduction in frontier model tokens.
Wow. And that is huge. I mean, that's a big cost. That's a cost savings, performance saving, the whole thing.
I mean, ultimately, the ability for you to come back and have, quote, unquote, an expert who gives you precise answers very quickly, at the lowest cost, that's huge.
Yeah. That is huge. I mean, it's really, it's accuracy, performance, and cost. It's like all of those benefits come together.
Yeah. And the problem for users hasn't been the models. That hasn't been the problem. Right, right, right. That's why you get demos really quickly. Right, right, right. It takes four hours to put a demo together. Sure. But then yet you understand, why is it taking so long for people? Interesting. I think the difference here is people have been traditionally using kind of ETL pipelines, right? They've been taking your data through, just like the old database. Yeah, yeah. This is not an ETL pipeline anymore. Yeah. This is context compiling completely on the fly.
Yeah, yeah. I love that. I love that concept of context compiling. Yeah, completely on the fly. I understand that.
Yeah, absolutely. That makes sense to me. Yeah. So, Ash, I had asked before in the multi-question, multiple questions, like, what were customers, you talked about current customers sort of doing this, and that's how Pinecone recognized that there was an opportunity. So maybe talk about a customer who had Pinecone, and then what were they doing? Like, how did you know that this was a real opportunity based on customers?
Yeah, let's take, maybe the customer zero was Pinecone, actually. Okay. Because we had started building our entire operations, an operations agent that allowed us to run our business without dashboards. We just banished the dashboards and moved to a model that kept the entire company's knowledge alive and accessible everywhere. So we still have this agentic backplane called Ask Data. And every query we put out there would take six to ten queries to come back with the result. It would take about 45 seconds or sometimes a couple of minutes. And oftentimes we would come back and actually validate that that was the right answer. And in the process, we also noticed it would take us about 40,000 tokens. Wow. And you look at this and say, this is a small application. Now it's bringing data from all kinds of places, our data warehouse, our Slack, our Gong, our Clay, all kinds of sources. And then you started looking at what it was doing. It turns out our agentic application and the frontier model just went out and blasted. Right, right. Tried to get everything possible. Yeah. Put it through the existing agents and keep doing this over and over again.
Like you were saying, yeah.
So once we moved that to Nexus, we literally took out 90% of the token usage. We brought it down from 40,000 to about 2,000. Wow. It's under 500 milliseconds, from a minute to two minutes. Right. Most importantly, the accuracy dramatically goes up from, I think, best case was over 68, to well over 90% accuracy. And that is just version one.
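As a quick check on the figures quoted for the internal Ask Data application, the arithmetic can be worked through directly. The inputs below are the numbers stated in the conversation (40,000 to about 2,000 tokens; one to two minutes down to under 500 milliseconds); the helper name `percent_reduction` is just for this example, and 500 ms is used as an upper bound on the new latency.

```python
def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) * 100 / before


# Token usage per query, as quoted: about 40,000 down to about 2,000.
token_cut = percent_reduction(40_000, 2_000)

# Latency, as quoted: one to two minutes down to under 500 ms (0.5 s).
speedup_low = 60 / 0.5    # if the old run took one minute
speedup_high = 120 / 0.5  # if the old run took two minutes

print(f"token reduction: {token_cut:.0f}%")
print(f"latency speedup: at least {speedup_low:.0f}x to {speedup_high:.0f}x")
```

Taken at face value, the quoted token counts work out to a 95% reduction, slightly more than the "90%" mentioned in passing, and the latency change is on the order of a hundredfold or more.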
And that, I think, was our first revelation that, okay, we finally understood why these things were taking so long. And they were fundamentally running on a system that was designed for human beings. And then we have a customer support agent that somebody had built about, you know, does Acme have, are they in warranty, are they in support? And you would go to three different sources like we talked about: the customer record, the sales record, the product record. You would watch this whole thing take a lot longer than it should. And so that was our first principle, to figure out maybe we need a system that actually brings a lot more of the context much, much closer to the data than trying to push it into an LLM update.
Yeah. I mean, again, it's just the compilation of data to provide context and knowledge is super, super important. And with the same data set, you might have different contexts.
Totally. And it's important to make sure that the artifacts that we created were created completely on the fly, like we talked about. It's a context compiler. But unlike the regular compiler, it keeps iterating until it gets to the right artifacts. And so, yeah, for your context, for the knowledge engine you want to build for this particular agent, this is the right form.
Right. So you mentioned that there's now this new language that the agent talks to Pinecone and all of that. For Nexus, how does all that work? And what was the innovation there?
Yeah, so once we built Nexus and you have an engine where you could have an agent define what its task was and what kind of a knowledge engine it needed, it just didn't have a way to specify that. There needed to be a language that both a knowledge engine and an agent could actually talk.
So we defined something called NoQL. It's a knowledge engine query language, or knowledge query language. And the intent was to put it into three buckets and six basic parameters. One was in terms of what is the intent of this query. I want to be able to say specifically, this particular query has some intent on what my ask is, what the scope of the data is. And second is in terms of time: I need this response in 45 milliseconds. Don't take an hour to come back. Figure out the best way to get me the response at a certain time. And the third one was to really talk about governance. How much of the data set am I going to go access? Don't give me the entire data set, because I need to be able to put governance across the board. We're able to come back and have explainability. It's not just the knowledge engine, it's about being a trusted knowledge engine. That makes a big difference about how you're doing in the enterprise.
Right. So, what about the economics of this? And how do we think about that? And how do you, you know, I mean, you mentioned kind of the completion rate and other things. Is it, I mean, if I'm a company, I'm going to go build, let's say, build agents, right? Can I quantify this up front, or do I just wait and see and say, hey, like, you know, I'm going to use Pinecone Nexus and we'll see what happens? Is there a way to say you're going to get 90% completion? It's going to be, you know, 40,000 to 2,000. Is there a certain class of data where we know or you know that that is going to be the outcome? Does it happen on all data? Like what? Yeah. How do you think about that? And how should customers think about it?
Firstly, if you think about where the cost is today, every vertical application is building their entire knowledge retrieval stack.
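To make the three buckets described above (intent, time, governance) concrete, here is a purely illustrative sketch of what a request shaped that way might look like. The conversation does not enumerate NoQL's six parameters or show any syntax, so every name below, including the class `KnowledgeQuery` and all of its fields, is a hypothetical placeholder and not the actual NoQL specification.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class KnowledgeQuery:
    """Hypothetical request shape grouped into the three buckets from the talk."""

    # Bucket 1, intent: what is being asked and over what scope of data.
    intent: str
    scope: str
    # Bucket 2, time: how long the caller is willing to wait.
    latency_budget_ms: int
    # Bucket 3, governance: which data may be touched, and whether sources must be cited.
    allowed_sources: tuple
    require_citations: bool = True
    # Agents want structure rather than prose, so the caller names a format.
    output_format: str = "json"

    def __post_init__(self):
        if self.latency_budget_ms <= 0:
            raise ValueError("latency_budget_ms must be positive")
        if not self.allowed_sources:
            raise ValueError("at least one allowed source is required")


query = KnowledgeQuery(
    intent="Is order-1001 still under warranty?",
    scope="customer_support",
    latency_budget_ms=45,
    allowed_sources=("sales_orders", "product_catalog"),
)
print(asdict(query))
```

The design point the sketch tries to capture is that the agent states its constraints up front (what it wants, how fast, and over which data) instead of discovering them through repeated trial queries.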
It's like, you know, I might go back in time and say every database application was writing its own query language, building its own database, or, even further up, saying I'm building my own operating system, my own silicon. So one is, from a user's perspective, even with our own Ask Data, we saw an 85% reduction in our actual code required, because that whole part is gone. So that's number one in terms of ROI and TCO. Second is, for the same data, how many knowledge engines do you want to go provide? So the larger the data set, the bigger it becomes. If it's a small data set, by definition, it's pretty constrained. A model can do fine. In fact, you can load up the entire data set in the context of the model and be fine. But in this case, it was important for us to go after large data sets with lots of knowledge engines, lots of tasks and agents running across the board. And the bigger they are, the exponentially higher the overall benefits are.
Right, right. Pinecone is an infrastructure company. Absolutely. That's just a statement. And, you know, infrastructure requires applications or agents to get built on top of the infrastructure. Yeah. So, you know, how do teams think about this? How should they go about thinking about building, you know, apps, agents on top of this? And how does one build this in and think about it in terms of the global, you know, sort of stack, or basically rewriting the stack here for agents? And so what should that stack be, and how do I as an enterprise actually leverage this as quickly as possible? Because everyone is saying, oh, we've got to go do AI, right? And so, you know, everyone's demanding, I mean, you know, the leadership of companies, do AI, right? So the faster you get it done, the better.
So firstly, I think if you go back to the DNA of Pinecone, it was started and continues to be a developer-centric company.
You have somewhere between 35,000 and 40,000 developers who continue to sign up, who learn about vector databases. And it is those same developers who are moving and building agent applications. For us, the starting point continues to be making NoQL public to these developers.
And does that now come with, let's say I do a, you know, for the 40,000 people signing up, it's just built in right up front? Or is there an added, like, how does that, how do I know about NoQL?
So, one, we have to continue to partner with the agent harness companies. Yeah, okay. And we may have to put out things like the skill.md files for Claude to define a whole interface. No different than how we promoted the existing APIs. We have to start partnering with some of these folks. So number one is getting NoQL to be adopted by the same development community that adopted the Pinecone vector database. Now, as they move up to agentic applications, they use a whole new API across the board. Second, we intend to make NoQL an open standard. Got it. So we're partnering with some of the industry standards at the right time. I think we need to get enough adoption to make sure this thing becomes an industry standard. Yeah. So just like you had SQL for databases and GraphQL for APIs, you expect to have NoQL for agentic applications. Got it. In addition, there's one more part we're also working on, which is to create a standardized agentic stack. What does an agentic app stack look like? Right. Now, if you think about it, traditionally agents are the applications, the LLM is the new operating system, and Pinecone is the disk. In between, now you have one more thing called knowledge. That becomes a standard stack. Got it. And to make it very easy, obviously we have the core database. Now we have this knowledge engine. Plus we're also opening up something called Pinecone Marketplace that we'll be announcing, which makes it very easy for someone to have a prepackaged, complete solution. You want time to value.
You can go to Marketplace and look at either an app that we built or a third party's. Use that as a blueprint to see how it's done. Or you can just use it. It might be production ready. Yeah, yeah, yeah. Or you might want to customize it. The idea is for you to start as the hardcore developer of the database, or as an agentic application developer on the knowledge engine, or as an end user with a full-fledged stack-based solution that you can interface with. And that part is both ours and the third-party partners'. So let's say, just so I'm clear here, Pinecone has 40,000 new people trying out the Pinecone vector database. Now, let's just say I want to try Nexus. Yeah. How do I do that as a developer? Is it a new thing that I add on? Is it embedded in? And, like, how do I get that? It's just another API service. It's a fully managed service, just like Pinecone. I see. Okay. So all you have to do is get your agentic applications to use NoQL. Got it. That completely changes the economics. And the most important part here is, once we start working with some of these other partners, so that it becomes even easier for these agentic harnesses that build agentic applications to directly use that, the friction gets even lower. Got it. I have this crazy question. Yeah. Anyway, the crazy question is, the knowledge layer with NoQL, is that dependent on Pinecone being there, the vector database, or can this work with any? It can work with anybody. The whole idea is NoQL is supposed to be an industry standard. No, but will our implementation of it be that way? Yeah. Can it work with any underlying? Well, I think Nexus is going to be built on the Pinecone vector database. Got it. NoQL is supported by Nexus, but somebody else could build NoQL. I see. Understood. Okay. So, yeah, that's a good distinction. But Nexus is the full, you have to have some knowledge. Nexus has both sides of it. Yeah. 
The top part, then the disk part, and the knowledge part together. Absolutely. And the disk part, and then in there is also the auto-ingest part. Yeah. Being able to connect to all kinds of sources of data. Right, right. Got it. So you can almost imagine, tomorrow, you can have a vertical application. Yeah. Somebody has a great idea. Yeah. You don't go through trying to build your own database, your own operating system. Yeah. You just point us to the data sources. Right. Point us to what context and what knowledge you want to get back and what task you're trying to accomplish. And that's it. And after that, you've got a vertical application you can focus on. Now let's look out two or three years. Yeah. What does it look like when all this is working? Yeah. And you sort of explained, you know, maybe what's possible that's not possible today. Yeah. You sort of explained that. But let's say two or three years from now, how does this all look? Very similar to the Cambrian explosion that happens every time somebody standardizes the most common layer, whether it is an operating system or a SQL interface. Now you'll have an explosion of vertical AI applications or agentic applications that now don't have to worry about what kind of tokenomics you're dealing with, the speed, the accuracy. All you have to do is point us to what data sources you want us to engage with. And certainly you can focus on the real vertical application, the real vertical business case that you're trying to focus on, rather than the infrastructure underneath. And like we said before, 85% of the agentic work today is knowledge retrieval. So certainly you're out of the business of dealing with that 85%. You take all that effort and put it back into where the vertical is. Second part, more importantly, if you truly are deploying in large enterprises, trust becomes important. 
So not only do we have a knowledge engine, but you actually have a trusted knowledge engine that gives you an entire trace of how we reasoned to get to this answer and gives you the citation of where the data came from, so that you have an explainable AI. Right. At the same time, it's not just the economics of using a model. You're also getting out of the business of building ETL pipelines. Right. You're building knowledge engines completely on the fly. The old model of analyze the source, transform it, load it into a vector database one time, that's gone. Now you have context compiling on the fly as you require. And that's a big change in how people go back and deploy. Today, if you think about it, the demo is great. It comes out very quickly. Everybody runs an AI agentic application, and then they stop. They have to go through this ETL pipeline, they have to do trust, they have to do security. You have removed all those barriers. You've just dramatically simplified it and dropped down the cost. Yeah. So speaking of cost, what does the pricing look like, and how is Pinecone thinking about evolving pricing relative to what we're talking about here? We have a first draft of it, and we continue to work with several partners to identify what the right pricing is. But it will be more aligned with how knowledge is curated, how knowledge is extracted, and how tasks are completed, and less about infrastructure. It will not be about reads and writes. It'll be at a level that is more about task completions and what kind of knowledge you want to curate. So we'll continue to evolve that one. Yeah. Sometimes we thought it could be as simple as how many tokens we are saving you. Yeah. It could be as simple as that. But it turns out that by itself is not a good metric, because somebody could give you a product for $0, but the trust is terrible. Yeah. Or the accuracy is terrible. Yeah. Then that's useless. So we tried to combine both of those. Yeah. 
I think one other thing we've done is, now that we've been opening up an entire new interface for agents, where you expect a thousand times more agents than human users, probably more, it was important for us to also change the economics of the underlying platform itself. The vector database itself needed to enable the economics, so that you have a vector database, you have a knowledge engine, and you can stack all of them at the same kind of pricing and margins. So we are also announcing an entirely new price point that allows for this entire knowledge engine to be much more successful in terms of adoption. So part of the announcement will be the first of the changes to the cost structure of the core database itself. We will be doing that the rest of the year. So not only are you democratizing the access, but you're also opening up the economics for a lot more use cases. Got it. Yeah, that's exciting. The fascinating element here, and I'll say this, it's hard to believe that this, you know, Nexus, the knowledge engine here, and the compiling of data to make context and all that, has such a dramatic impact on the number of tokens used, right? It's astounding. Yeah. And if you just think of it, I mean, it's sort of revolutionary, and the way we talk about it, you're like, oh, it's casual. Just put this thing in and you'll go from, say, 40,000 to 2,000. I mean, that's a freaking major, major shift. And just intellectually, it's hard for me to believe that, you know, Pinecone actually has this. Yeah. And, you know, I guess, yeah. But you and I have seen this parallel before. There was a time when all of the I/O code used to run on CPUs. Yeah. And CPUs are expensive. Everybody worried about the cycles you use. And then you started off-boarding that onto dedicated processors. Yeah. Like I/O boards, I/O cards. Yeah. If you remember. Yeah, yeah. Networking, same thing. Graphics was another. 
I mean, all of those, all those different, you know. It's based on off-boarding some of these specialized functions. That is exactly what we did. It's history repeating itself, to say, much of this stuff you're putting on very expensive working models. Yeah, yeah. You're off-boarding that to very specialized things. Right. And allowing applications to be built. I mean, yeah. You know, it really strikes me that, and this is good for the industry and good in general, we're really in the very early innings here of this whole transformation. Because if you think, okay, it's expensive, there's tokens, now we're going to optimize. It's kind of like all these industries, you know, the past examples, graphics, networking, all that. There were whole industries that got created by optimizing the first order, right? So the first order was, everything runs on a CPU, right? And, you know, it's, oh my God, we've got to have more CPUs and all this. But then it was like, no, no, no, we're going to offload that CPU and go do other specialized things. And, I mean, entire industries were created out of that, with a lot of the same use cases being the fundamentals. Like, you've got to move bits around on the network, or you've got to show graphics, or whatever. It's just that the cost load shifted to a more appropriate area. And that's what we're seeing here. And I will venture to say, no pun intended, there's going to be a lot of this. I mean, whether it's Pinecone or other areas of the industry, right? We're in the first inning of a multi-inning game, and somebody goes into overtime. I mean, I'm like, it's just... It has been done. It has been tried. The first one was, we looked at this one for some time. We knew the problem. We knew the solution. We also spent a lot of time wondering, are we the right people to do this? Yeah, yeah, yeah. 
First one was, why don't we just have Claude do this thing? Yeah. Why don't we do this thing? And you realize they are too far away from the data. To them, it's just data. Right, right. Everything is just brute force, trying to figure it out. Okay, they were too far away. And not only that, each agentic application uses multiple models within a single task. So what am I going to do? Load up the LLMs with the data? That's unbuilding. So ultimately, it comes back to first-order stuff. If you're talking about getting knowledge, and the knowledge is being derived from data, you have to be as close to the data as possible. Yeah, yeah. And beyond until this point. Yeah, yeah. So, I mean, it's awesome. And I think there's going to be a lot of opportunity, you know, Pinecone aside, to optimize. Like, AI is, you know, it's incredible, it's magical and all that. But it's a very blunt instrument right now, you know? And, yeah, we're going to sharpen a lot of things up over the next, you know, the long tail of this is to optimize and make efficient a lot of things. The biggest one continues to be around trust and security. Yeah, for sure. It's an entire new opportunity. For sure. That's an opportunity in and of itself, right? Absolutely. But all these other bits, I mean, if you look at the past history of computing, a lot of these things repeat themselves in terms of the importance of offloading processes, the importance of security, the importance of data governance, the importance of, you know, applications having the right access. I mean, all of these bits and pieces sort of come together. I'll give an example of what else we're doing, which is MCP interfaces, which have become the de facto way. In fact, I posted this yesterday or the day before. As we looked at that, they were the first way to define a standard access model to access any source of data. 
Gen 1, great, nobody cared. It made it very easy. And now you're finding out each MCP interface sucks up a lot of tokens because they're not optimized. So now you get to the point where, can I put an MCP interface optimization behind access? Or maybe somebody else designs a router. Yeah, for sure. So these are definitely very early innings. I think we found one part of the stack that we're focusing on. We'll continue to have other partners. Well, you know, I'm looking forward to seeing how all this evolves. Yeah, we love it. Love it. This changes on a daily basis. Yeah, for sure. This is amazing. And awesome. Great time to be in the business. All right, Ash. Thank you, Peter. Okay, brother. Bye. Thanks. Thanks for listening to this episode of the a16z podcast. If you liked this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X at a16z and subscribe to our Substack at a16z.substack.com. Thanks again for listening, and I'll see you in the next episode.