"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

Your Agent's Self-Improving Swiss Army Knife: Composio CTO Karan Vaidya on Building Smart Tools

99 min
Mar 22, 2026
Summary

Karan Vaidya, CTO of Composio, discusses how their platform provides AI agents access to 50,000+ tools across 1,000+ apps through a single interface. The conversation explores smart tool patterns, continuous learning systems that improve tools in real-time, and how well-designed skills can help developers avoid model lock-in by making frontier models interchangeable.

Insights
  • Skills can serve as a commoditization layer, making frontier AI models interchangeable when instructions are thorough enough
  • Token costs are already exceeding human payroll costs for AI-first companies building agent infrastructure
  • Just-in-time tool discovery prevents context overload while dynamic tool improvement happens in real-time based on agent failures
  • The build vs buy decision for AI agents is shifting toward build as friction decreases and customization needs increase
  • Agent-to-agent communication works best for exploratory, context-heavy tasks while single agents with smart tools handle simpler workflows
Trends
  • AI-first companies seeing token costs exceed human payroll
  • Shift from simple API wrappers to intelligent tool orchestration
  • Growing demand for agent-specific infrastructure and tooling
  • Movement toward build vs buy for custom AI agent implementations
  • Emergence of skills as a way to achieve model provider independence
  • Real-time tool improvement through continuous learning systems
  • Progressive disclosure becoming standard for managing AI context limits
  • Agent delegation patterns emerging for complex multi-step workflows
  • Self-hosting requirements driving enterprise AI adoption
  • Background learning systems automatically converting agent traces into reusable skills
Companies
Composio
AI agent tool platform providing 50,000+ tools across 1,000+ apps with smart discovery and learning
Anthropic
AI company whose Claude models are frequently used and compared for agent capabilities
OpenAI
AI company whose GPT models are used in agent development and compared with other providers
AWS
Uses Composio to build core agent products and provides infrastructure for AI workloads
Zoom
Enterprise customer building agent products on top of Composio platform
Airtable
Enterprise customer using Composio for agent development and integration
Glean
Enterprise customer building agent capabilities using Composio tools
Intercom
Customer service platform with Fin agent that resolves 70% of tickets at 99 cents each
Salesforce
Major SaaS platform discussed as incumbent adapting to AI disruption threats
Slack
Collaboration platform used as example of complex enterprise software facing AI competition
Google
Provider of Gmail, Google Drive, and Gemini models used in agent workflows
GitHub
Platform discussed in context of CLI vs MCP debate for agent tool interfaces
Mem0
Memory platform for AI agents integrated as tool within Composio ecosystem
Zep
AI agent memory service available through Composio tool integrations
Skyfire
Payment platform for AI agents integrated within Composio tool ecosystem
People
Karan Vaidya
Main guest discussing AI agent tool orchestration and smart tool development
Nathan Labenz
Podcast host interviewing Vaidya about agent infrastructure and tooling
Quotes
"If you provide thousand tools to the agent, it will probably use the wrong blade and suicide via context overload."
Karan Vaidya
"Our token cost is definitely much higher than our human cost right now."
Karan Vaidya
"Excellence in tooling and increasingly in skills can help developers avoid model lock in."
Nathan Labenz
"We position Composio as a one shot way to being not locked in essentially to a model provider."
Karan Vaidya
"Everything is isomorphic to everything else. Whether it's an MCP or a CLI, you can probably do progressive disclosure."
Nathan Labenz
Full Transcript
4 Speakers
Speaker A

Hello and welcome back to the Cognitive Revolution. Today my guest is Karan Vaidya, CTO of Composio, a platform that allows AI agents to access more than 50,000 tools spanning more than 1,000 apps all through a single interface, and which is one of the best examples of the smart tool pattern that I've been watching out for since the MCP paradigm was introduced. This is a sponsored episode, but as always I have played around with the tool for the last couple of weeks and it's clear to me that Composio does address several real problems. For starters, core platforms like Gmail, Google Drive and Slack do not make it easy for do-it-yourselfers to grant access to AI agents. The number of clicks required to get started is a serious barrier for casual users. Most other tools are simpler to connect, but many aren't popular or well documented enough for AIs to know how best to use them from the start, and sometimes quite a bit of iteration is required to get things working well. Looking ahead, as people delegate larger projects to their agents, those agents will often need tools that the humans never anticipated. All of this is indeed made much easier by simply giving your agent access to Composio, which allows agents to express high-level intent, identifies the right tools for the job, and provides authentication, execution, sandboxes and logging infrastructure that few developers

0:00

Speaker B

really want to build on their own.

1:21

Speaker A

In this conversation we get into the details of how Composio works and how they're delivering on the smart tool promise by using an AI-powered continuous improvement process that can detect when a tool isn't working for an agent, generate a new version in real time, and then swap the upgrade into the agent's context, and which over time, in the background, automatically identifies and diffuses successful patterns across the entire Composio customer base. One of the most interesting arguments that Karan makes is that excellence in tooling, and increasingly in skills, can help developers avoid model lock-in. The idea is that while models do have different default behaviors, they are all very good at following instructions, which means that if you have very thorough instructions you can probably get similar outputs from any frontier model. And for cases where that doesn't work, Karan and team are also working on meta-skills, which translate skills from one provider to another, reducing switching costs even further. Beyond that, we also hear about Karan's favorite agent use cases, which notably look more like full jobs than discrete tasks; his perspective on which technology companies are gaining strength from the AI wave, which are most threatened, and how sticky agent products like Intercom's Fin will prove to be over time; his thoughts on memory platforms, payment frameworks and other tools built specifically for AI agents; and how Composio works today, which includes individual engineers who manage tens of AI agents and, for the team that manages Composio's own agentic pipeline, a token bill that exceeds human payroll. For me, this conversation couldn't be more timely.

1:24

Speaker B

Over the last couple of months I've

3:03

Speaker A

put in the work to curate the context that Claude Code needs to serve as a second brain and a capable assistant, and it's become my go to interface for just about everything that I

3:05

Speaker B

do on a computer.

3:14

Speaker A

The next level up will be to get agents doing large-scale projects autonomously on my behalf, and as I enter this next phase, Composio will definitely be a part of my stack. I'll report back on how I'm doing, of course, but for now I hope you enjoy this conversation about building smart tools for AI agents with Karan Vaidya, CTO of Composio. The Cognitive Revolution is brought to you in part by Google, makers of the Gemini family of models. Anyone who's followed the show for a while knows that we've been sporadic at best when it comes to posting clips on social media. The hit rate of AI clip makers, in my experience, isn't super high, either because the segments they highlight are kind of beside the point, or because the edits end up awkward and are too much trouble to fix. I think a big part of the reason those problems happen is that most models simply cannot watch the full video, and this is why I'm now sending full videos of the podcast directly to Gemini 3.1 Pro and asking it to select the best clip-worthy moments with full awareness of everything that happened in the entire conversation. Gemini is the only frontier model API that even accepts video inputs right now, so there is literally no other product that supports this workflow. And it doesn't stop there either. Once we select clip candidates, I apply a layer of agentic editing and then send the rendered clips back to Gemini for a 13-point evaluation. With its scores, I'm able to structure a mini social media campaign where I'm working with only genuinely good clips from the start. Honestly, it's getting so easy you could do it for just about anything: internal company meetings, long presentations, or even home videos. Visit Google's AI Studio to explore Gemini's native video understanding. Thank you to Google for supporting the Cognitive Revolution. And now on with the show.

3:16

Speaker B

Karan Vaidya, CTO at Composio. Welcome to the Cognitive Revolution.

5:09

Speaker C

Hey Nathan, thanks for having me.

5:14

Speaker B

I'm excited for this conversation for folks who don't know what Composio is. I've been playing around with it a little bit over the last couple of weeks and I've come to think of it essentially as a Swiss army knife for AI agents. Obviously we're all developing agents for a wide range of use cases. Some of which are, you know, ad hoc, minute by minute assistants, others are, you know, built in in much more intentional and structured ways into products and all. But all these agents need tools and some of the tools we have the time and the luxury of building out in a really intentional bespoke way. And then there's just a ton of other things that a lot of people have common needs for. And this is where I see Composio coming in with a sort of ready to go thousand tools that you can plug into your AI agent and give it a much broader reach than it would have if you were building out every tool one by one. That's my takeaway from getting it under the hood a little bit. How do you like the description and what would you add to that? For starters?

5:17

Speaker C

So you're on point in understanding the problem that we solve, that we provide thousand plus apps, 50,000 plus tools to your agents, to anybody building agents. But that's not the final solution. Because at this point where we are in the LLM journey, if you provide thousand tools to the agent, it will probably use the wrong blade and suicide via context overload. So that's where we are essentially building the whole agentic tool harness. What I kind of call Composio is the agentic tool execution layer. So the whole harness for tool execution that, while building agents, you would need to develop, or that you would want to give your Claude Code or Codex to, that's what we provide. That includes a few meta tools, so that you don't have to put in thousands of tools to your agent. It includes managed authentication and authorization, giving the right scopes to the LLM. The problem that I just mentioned, you can't give thousand tools to the agent. So we do just-in-time tool discovery, that's one of the tools, and dynamic tool calling around it. So only the right set of tools that the agent needs for a given use case gets loaded into the context. One of the other problems people face while building agents is that in a bunch of cases, direct function calling is not the best for solving the use case. Things like, if I want my agent to process 10,000 emails, it will probably context overload in, let's say, 100 of them. So that's where we provide sandboxes where the agent can do programmatic tool calling on top of our apps, where it can process 10,000 or even a million emails via writing code. And on the back of it, I think the strongest thing that we do is continual learning. So all our integrations are built by our internal agent pipeline, which goes through the agent doing, like, first getting the developer app, all the credentials required, then creating the actions, then finding dependencies and testing them in real-world scenarios, like a bunch of edge cases.
And that's the whole process that it goes through. And what it gives us is, in runtime when the agent is using us, we figure that a particular tool, let's say, is not usable by the agent for whatever reason, there's an error or there's some failure, or it's not able to understand the tool. In real time, that agentic pipeline is invoked and a new version of that tool gets created, and the newer, improved tool is added into the LLM's context, the agent's context. We also have continual learning where, when we see that the agent is taking a zigzag trace to reach an outcome, we convert that zigzag trace, because we have the whole end-to-end agent trace of what it is executing, what the use case is, etc. We convert that zigzag trace into a set of skills. So the next time the agent does something similar, it will take a straight path, making it more reliable, robust, and token-efficient and time-efficient as well. We also learn from failures, like do's and don'ts for using particular tools, use cases, pitfalls, et cetera. So that's the whole harness that we provide. We also have a notification system, which we call triggers, so the agent can be notified, let's say, when an email is received, or a Slack message is received, or a PR gets created. So the whole system or harness around the agent communicating with knowledge work apps, that's what Composio provides.

6:21
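The just-in-time discovery pattern Karan describes here can be sketched roughly as follows. This is a minimal illustration, not Composio's actual API: the class and tool names are invented, and a real system would use embedding search rather than keyword overlap. The point is simply that the agent's context holds one searchable index, and only the tools matching the current task get loaded.

```python
# Hypothetical sketch of just-in-time tool discovery: instead of exposing
# all 50,000 tools, the agent sees one meta-tool that searches a tool index
# and loads only the matching tools into its context.
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    description: str


@dataclass
class ToolIndex:
    """A registry the discovery meta-tool searches at runtime."""
    tools: list[Tool]

    def search(self, query: str, limit: int = 3) -> list[Tool]:
        # Real systems would use embeddings; keyword overlap keeps the sketch simple.
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(t.description.lower().split())), t)
            for t in self.tools
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [t for score, t in scored[:limit] if score > 0]


@dataclass
class AgentContext:
    """Only discovered tools occupy the agent's context window."""
    loaded: dict[str, Tool] = field(default_factory=dict)

    def load(self, tools: list[Tool]) -> None:
        for t in tools:
            self.loaded[t.name] = t


index = ToolIndex([
    Tool("GMAIL_SEND_EMAIL", "send an email via gmail"),
    Tool("GMAIL_SEARCH", "search email messages in gmail"),
    Tool("SLACK_POST_MESSAGE", "post a message to a slack channel"),
    Tool("STRIPE_LIST_CHARGES", "list recent stripe charges"),
])

ctx = AgentContext()
ctx.load(index.search("search my gmail email"))
print(sorted(ctx.loaded))  # only the two Gmail tools, not all four
```

The context grows as the task unfolds, but never starts with the full catalog, which is what keeps the "thousand blades" out of the model's attention.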

Speaker B

Okay, cool. I want to go one by one through, I think, all of those topics and go a level deeper on each. Before we do that though, I would love to understand a little bit more who your users are and maybe, you know, I'm sure this is changing because obviously we have phenomena like OpenClaw popping up and whole new populations of users coming online. And I don't think that process has reached its endpoint by any means just yet. But as I was using the tool I was kind of like, okay, I see two ways or two sort of broad scenarios that I might use this in. One is, it's become almost reflexive at this point for me to go to Claude Code first whenever I want to do almost anything on my computer. Even if that's something as generic as, like, search for an email, I'll go to Claude Code and ask it to search for the email rather than go into Gmail and search directly in Gmail. So there's, I guess I would call that the sort of hobbyist market, or like the individual user who has their individual assistant agent. And those folks I could see really needing to or really benefiting from something that allows them to expand their toolkits really quickly. Just this morning actually, I was kind of onboarding a teammate who hasn't used Claude Code so far and one of the questions she had for me was, how did you give it access to our Google Drive? And I was like, well actually that was kind of a pain in the butt. Like I had, you know, it was like, well, Claude actually talked me through the steps. But the steps were pretty gnarly. I had to go into the console and create an app and then click over here, add this permission or whatever. I don't even remember all the steps. And so, you know, that wasn't super easy. And I can see a lot of people just like stumbling over that or just for ease of use getting the, hey, if I can get a thousand of those where I don't have to go, you know, Slack is another one, just absolute nightmare of permission adding and all that kind of stuff.
So I see that persona, I am that persona. And then I also see you have like an SDK which seems to be really geared more toward production apps. And for those folks, I'm like, that's interesting because how many people want to sort of dynamically bring tools into their app? It seems like it starts to make the app itself potentially kind of unwieldy. On the other hand, I do see a lot of value in managing auth, for example, for 1,000 apps. That doesn't sound like a lot of fun. So if you can make that a simple process for developers, that sounds quite interesting. But I guess I see these profiles kind of going somewhat different directions, or at least getting the bulk of the value from different parts of what you've built. So interested in how you segment the market and what you see the value drivers, primary value drivers, being for those different profiles?

10:13

Speaker C

Yeah, that's a great question. So as you rightly pointed out, we have a bifocal product. One is for the prosumer market, which is people using Claude Code, OpenClaw, etc. and plugging in Composio Connect, that's what we call that product, as a single MCP server inside all these agentic runtimes, anything like Codex, etc., whatever they're using. And for them the value prop is exactly what you pointed out. You don't need to go to this MCP server, plug it in, understand the instructions of Google Drive, then Zoom, then Datadog, etc. You just get one MCP server, which is like the Connect MCP. It's as simple as that. You put it inside your Claude Code and then you can manage your authentication directly via Claude Code. If you ask it, I want to connect with a new app, it will give you the link. Or, if you like the GUI experience, you can just go to the Composio dashboard and do it all there, with managed permissions, et cetera, managed scopes. So there the value prop is simplicity and getting the power of almost anything at your fingertips. On the other hand, on the developer side, everybody is building agents at this point, be it startups to bigger enterprises. And one of the biggest problem statements that people face while building agents is, to give them actual real power to be able to do things, they want to connect their agents with actual knowledge work apps. And that's where we come in, we solve it. Where I think the value that we provide there is, other than auth and all the integrations, managing the scopes, you can give whatever granularity of scopes you want via us, controlling it at the action level, providing the whole harness. Because at scale, everybody who's building agents wants to create this similar harness that we have seen works really well. The pattern works really well for a bunch of agent paradigms, specifically general agents, which everybody's building right now.
So the whole harness and its bits and pieces are available, modularity. People who want to use just our tool discovery, with Composio's tools as well as their own tools, they can use that. If they just want to use Workbench, which is our sandbox where the execution can happen, where auth and all is controlled by us, they can just use that, which is code act essentially. So, all bits and pieces. The whole harness is up on the developer side of things. We have on one hand the whole harness where you can just plug in that single thing via MCP, via API, via SDK into your agent system. We also have bits and pieces of modularity where, okay, if you just want to put this, you can put this. And if you just want to use our tools, you can also do that, where you manage the whole harness and we just provide you the agentic actions. The idea there is people want things like governance, observability, auditability, and that all sits inside Composio's dashboard. So that's where I think, and also, we have obviously some amazing enterprise customers, and that kind of gives you trust, because this is very critical data that you wouldn't want to give to any company. But at this point we have AWS using and building their core agent product on top of us, Zoom doing the same, Glean doing the same, Airtable. So a bunch of tech-first hyperscalers are trusting us. That kind of gives that trust level, because they have already evaluated us on all the things that you would want to evaluate us on.

13:03

Speaker B

What are the big things that they want to evaluate you on, that maybe I should be thinking harder about? Because I'm a pretty, you know, prolific tester, at least, of a lot of products, and I'm increasingly mindful of, not, you know, certainly I don't like give full access to my Gmail or whatever to just anything that I happen to sign into. But I do, you know, I do connect a lot of accounts to a lot of things over time. What are the things that are kind of the biggest, the most vulnerable attack surfaces or the biggest risk vectors that the big companies have beat you up on already, to make sure you're solid, that the rest of us can take to the bank?

17:09

Speaker C

Yeah, I mean, first of all, I think, on providing least-privileged access control, I think we have done a pretty good job, where you can define what actions you want to give the agent, and the agent will have access to only those actions. To start with, if you don't want to give read-and-write email, write email or send email access to the agent, you can just give read. Same thing with Slack, same thing with all the work-related apps. So that access control is pretty important, for you to have that granular access control. That's one. Then, second level of control, we have a bunch of ways in which you can control what the agent can take action on via hooks. Before calling the tool, you can check what the tool execution is doing and create guardrails around it, like human-in-the-loop, and we have patterns for all of these pre-built. After calling the tool, before the agent gets the response, you can have those hooks, like, see what the agent is doing, what kind of data the agent is going to have. So if you want to have some guardrails around that, you can do that. So all those types of guardrails are already present in the product. That's second. And then third, obviously, compliance is a big thing. So we have all the compliances that people want, like SOC 2, etc., which makes them somewhat more comfortable. And the fourth one, which is pretty valuable for enterprises, is we also do self-hosting. In some cases you wouldn't want to use our cloud, so we also self-host in the customer's VPC, which gives them much more of a sense of breathing room. In the case of AWS, for example, we have self-hosted Composio inside AWS.

17:54
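The pre- and post-execution hook pattern described above can be sketched like this. All function names and signatures are assumptions for illustration, not Composio's SDK: a before-hook can block a call or demand approval, and an after-hook can redact sensitive data before the agent ever sees the response.

```python
# Illustrative sketch of pre-/post-execution guardrail hooks around a tool call.
from typing import Callable

BeforeHook = Callable[[str, dict], dict]  # may raise to block the call
AfterHook = Callable[[str, dict], dict]   # may transform the response


class GuardrailError(Exception):
    pass


def require_approval(tool: str, args: dict) -> dict:
    """Before-hook: block write actions unless a human approved them."""
    if tool.endswith("SEND_EMAIL") and not args.get("approved"):
        raise GuardrailError(f"{tool} requires human approval")
    return args


def redact_ssn(tool: str, response: dict) -> dict:
    """After-hook: strip sensitive fields before the agent reads the result."""
    response.pop("ssn", None)
    return response


def execute_tool(tool, args, before=(), after=()):
    for hook in before:
        args = hook(tool, args)          # guardrails run before execution
    response = {"status": "ok", "ssn": "123-45-6789"}  # stand-in for the real call
    for hook in after:
        response = hook(tool, response)  # and again before the agent sees it
    return response


try:
    execute_tool("GMAIL_SEND_EMAIL", {"to": "a@b.com"}, before=[require_approval])
except GuardrailError as e:
    print("blocked:", e)

out = execute_tool("GMAIL_FETCH_EMAILS", {}, after=[redact_ssn])
print(out)  # {'status': 'ok'} — the ssn field never reaches the agent
```

The same two interception points cover human-in-the-loop approval, logging, and data filtering without the agent's prompt needing to know any of it.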

Speaker B

Gotcha. Okay, cool. It's a good rundown. Hey. We'll continue our interview in a moment after a word from our sponsors.

19:45

Speaker A

Everyone listening to this show knows that AI can answer questions, but there's a massive gap between "here's how you could do it" and "here, I did it." Tasklet closes that gap. Tasklet is a general-purpose AI agent that connects to your tools and actually does the work. Describe what you want in plain English: triage support emails and file tickets in Linear, research 50 companies and draft personalized outreach, build a live interactive dashboard pulling from Salesforce and Stripe on the fly. Whatever it is, Tasklet does it. It connects to over 3,000 apps, any API or MCP server, and can even spin up its own computer in the cloud for anything that doesn't have an API. Set up triggers and it runs autonomously, watching your inbox, monitoring feeds, firing on a schedule, all 24/7, even while you sleep. Want to see it in action? We set something up just for Cognitive Revolution listeners: click the link in the show notes and Tasklet will build you a personalized RSS monitor for this show. It will first ask about your interests and then notify you when relevant episodes drop, however you prefer. Email, text, you choose. It takes just two minutes and then it runs in the background. Of course, that's just a small taste of what an always-on AI agent can do, but I think that once you try it, you'll start imagining a lot more. Listen to my full interview with Tasklet founder and CEO Andrew Lee. Try Tasklet for free at Tasklet AI and use code COGREV for 50% off your first month.

19:53

Speaker B

The activation link is in the show

21:25

Speaker A

notes, so give it a try at Tasklet AI. Support for the show comes from VCX, the public ticker for private tech. For generations, American companies have moved the world forward through their ingenuity and determination. And for generations, everyday Americans could be a part of that journey through perhaps the greatest innovation of all, the US stock market. It didn't matter whether you were a factory worker in Detroit or a farmer in Omaha, anyone could own a piece of the great American companies. But now that's changed. Today, our most innovative companies are staying private rather than going public. The result is that everyday Americans are excluded from investing and getting left further behind, while a select few reap all of the benefits. Until now. Introducing VCX, the public ticker for private tech. VCX by Fundrise gives everyone the opportunity to invest in the next generation of innovation, including the companies leading the AI revolution, space exploration, defense tech and more. Visit getvcx.com for more info. That's getvcx.com. Carefully consider the investment material before investing, including objectives, risks, charges and expenses. This and other information can be found in the fund's prospectus at getvcx.com. This is a paid sponsorship.

21:27

Speaker B

Let's talk about these sandboxes a little bit. The paradigm there is a little bit confusing to me. I'm not exactly sure how to think about what execution should and does happen in different places. Obviously, if I run Claude Code on my local machine, which I do, it is mostly running things on my local machine, right? All the bash commands and stuff are literally happening on my system. There are also some tools, like their search tool, right, that's built in, that happens on their side, in their runtime, on their infrastructure. And those can communicate back and forth, in terms of like, you know, the result of that search can get sent down to my system and the result of commands run on my system can get sent obviously up to the cloud to be, you know, part of inference. But I imagine that gets kind of fuzzy or weird when people have a mix of things, where they, like, how do they decide? Or how do you guide people on deciding what should be run in your infrastructure and what they should run on their own infrastructure? I assume it wouldn't make sense for them to try to bring their whole app into your sandboxes, right? How should I kind of unclutter my own mind to think about, in general, who should be running what?

22:47

Speaker C

So essentially in our sandbox we provide a ton of utilities, which makes it easier for the LLM to write code on top of it, the same, like, Docker image of sorts. We are making it available locally also very soon. So the same thing, if you want to use your, let's say, own tools, internal tools or local tools and want them to be available inside the Composio sandbox, you can also do that. That's coming very, very soon. But the idea is, in the sandbox, the agent has to write very minimal code, and the auth side of things and a bunch of side things, mostly auth and some abstractions around the tool calling, are taken care of already in the primitives that the agents get. So that's the benefit where, okay, the agent doesn't have to deal with a bunch of things it shouldn't, and it just writes very simplified code, and all the things around it, like deciding what auth to use, converting things from code to function calling, etc., whatever it needs to be, is done by the tooling that we are providing.

24:18

Speaker B

Gotcha. So the biggest value driver you're highlighting there is actually making tool calling, or not exactly tool calling, but making code writing easier for the language model by providing, and you said mostly auth, what else is kind of in that? It falls generally into the category of the harness, right. So what else besides auth do you see? And I have experienced that certainly, where, as we kind of mentioned, it can be hard to get these things set up. So it's intuitive, at least pretty intuitive, to me to say, yeah, if you could provide solid auth code ready to go so that the LLM doesn't have to recreate that all the time, that sounds like a clear win. What else is like that, where there is enough deterministic stuff that you've built out that it makes things a lot simpler for the agent?

25:37

Speaker C

So, for example, file sharing is the other thing. Basically, we have mounted folders, and the LLM knows that because it's part of the harness, which is in the description of the function and system prompts, et cetera. So we have mounted folders where everything put in those folders is by default uploaded to S3, and we have shareable links available for them. So whenever agents want to share anything outside, it's very simple. It just moves or copies files to that particular folder. So things like that, of which more will be coming. And we come up with newer use cases every now and then, right? So file sharing is definitely one of the biggest ones, where the LLM has, going through all 10,000 emails, generated a report, or going through all the Stripe activity, it has generated a report, and it wants to share it with you, the user. And then we make it very easy for them to do that. The LLM can write code which uses an LLM, so we made it very simple to do that. It's kind of like Inception, the LLM is writing code which uses an LLM, but it's needed for processing 10,000 emails. Otherwise how would you do that? So, utilities around that. All these are some minor things, but overall it increases the efficacy and accuracy of agents to a big extent.

26:35
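The mounted-folder convention described above can be sketched as follows. The folder name and both functions are invented for illustration, and the uploader is a stand-in that fakes an S3 URL rather than a real upload: the point is only that "share a file" reduces to "write it into the watched folder."

```python
# Minimal sketch of a mounted shared folder: anything the agent writes there
# gets uploaded and mapped to a shareable link.
import tempfile
from pathlib import Path


def upload_and_link(path: Path) -> str:
    """Stand-in for an S3 upload that returns a shareable URL."""
    return f"https://example-bucket.s3.amazonaws.com/{path.name}"


def sync_shared_folder(folder: Path) -> dict[str, str]:
    """Upload every file in the mounted folder; return name -> link."""
    return {p.name: upload_and_link(p) for p in sorted(folder.iterdir()) if p.is_file()}


# The agent "shares" a report simply by writing it into the folder.
shared = Path(tempfile.mkdtemp()) / "shared"
shared.mkdir()
(shared / "email_report.md").write_text("# 10,000 emails, summarized\n")

links = sync_shared_folder(shared)
print(links["email_report.md"])
```

Because the convention lives in the tool descriptions and system prompt, the model never needs to write upload code itself; it just does a file copy.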

Speaker B

Okay, cool. On the topic of discovery, or what I think you also call smart MCPs, I had been on the lookout for a while, back when this sort of MCP phenomenon first popped up. I was like, where are the smart MCPs? It seems like at first we just had this massive wave of people wrapping APIs in the MCP layer, and okay, that's fine. I was kind of like, okay, sure. But it seems like where this really gets helpful is if the MCP itself is smart in some way, so that it can take in not a very specific command that could equal, you know, it's sort of like, this MCP call could have been an API call, but something that is higher order, that maybe involves the composition of multiple tool calls, or even potentially multiple different APIs working together to do some higher-level job. And honestly, I haven't seen a lot of that. You know, I kind of asked and

28:08

Speaker A

asked and looked around and looked in

29:14

Speaker B

repos to try to figure out if anybody was doing this, and there wasn't much. Now you guys are doing that, and it seems like it is a pretty big focus of your value. So I'd love to unpack how that is working, and especially maybe getting into a little bit, like, the sort of progressive disclosure of it. Because this, I think, has also come to the fore in developer conversations recently, where it's like, first it was, MCP is going to take over the world, and now we've heard a little bit of the trough of disillusionment of, well, it's a lot of context bloat, and maybe CLI is better. I kind of always end up thinking that one of my AI mantras is, everything is isomorphic to everything else. Meaning, whether it's an MCP or a CLI, whatever, you can probably do progressive disclosure, and you can avoid crazy bloat if you think about things the right way. It doesn't seem like those decisions are so sharp as they used to be, because there is just so much room to create flexibility in the context of intelligent systems. But take me through the smart MCP paradigm that you're developing. What makes it smart, how you're building that to, again, make things as easy on the agent as possible, to do as much for the agent as possible, etc.

29:16

Speaker C

Yeah, for sure. As I mentioned earlier, if you give a thousand MCPs to an LLM, it's obvious we're hitting the 1 million token context windows. Maybe in time that will increase to 5 million or whatever, but attention is definitely not free. The less you give, the better the performance will be. That's where things like just-in-time tool discovery come in, so that your agent isn't overwhelmed by seeing a thousand tools and you're giving it the right set of tools. That matters because we have 50,000-plus tools, and most of our users want access to all of them; it's dynamic, right? Developers building on us want to give their users the power of whatever Composio has. So the idea is that the LLM doesn't see all the tools; it sees a few tools, and then new tools get added to the context dynamically. That's one part of the smartness. The other part I mentioned is around learning, what we call background learning of sorts. Whenever we see that a tool is not comprehensible to the agent, meaning the LLM is not able to understand what the tool does, or it's trying a lot but always erroring out, then in the background a new version of the tool gets created automatically, one that we think is more valuable for that particular use case, or that fixes general issues in the tool. That new version gets created in real time and added to the context. So that's another level of smartness: we can create multiple versions of a tool really quickly and get the agent a version that's better suited to the task it's working on right now. Then another thing that happens in the background involves skills, which have gotten really popular.
The reason for their popularity is that they're an abstraction level above tools: you can have instructions for particular use cases baked into skills, which themselves use tools. There are instructions and scripts that make them more robust and more repeatable than just providing tools to the agent. So what we do is take whole end-to-end agent trajectories and convert them into a set of reusable skills, which we then provide during just-in-time tool discovery. It's also discovery of sorts: the agent already has how to do the task, what tools to call to achieve this particular outcome, what code to write in the workbench or sandbox. Maybe the use case is a bit different, so it has to take that skill and do a bit of fine-tuning for its particular situation, but then it can reuse that code or skill for its purpose. So that's the smartness, the background learning. And it also includes failures. If there are failures we see happen again and again, we just tell the agent beforehand, in context: okay, these are the pitfalls, these are the don'ts of using this particular tool to achieve this type of outcome. In my opinion, smartness is nothing but context. You have to engineer the context, and what we're giving is effective context engineering, not just tools, which makes it smarter.
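The just-in-time discovery pattern described here can be sketched in a few lines. This is a toy illustration, not Composio's actual API: the registry contents, `search_tools`, and `AgentContext` are all hypothetical names, and the naive keyword match stands in for whatever semantic search a real platform would use. The point is only the shape of the pattern: the model's context starts with a single meta-tool, and concrete tool schemas are pulled in on demand.

```python
# Hypothetical sketch of just-in-time tool discovery: the agent's context
# starts with one meta-tool, and concrete tools are added on demand.

TOOL_REGISTRY = {
    "gmail_archive_email": "Archive an email in Gmail by message id.",
    "gmail_search": "Search Gmail messages with a query string.",
    "gdrive_upload_file": "Upload a local file to Google Drive.",
    "linear_create_issue": "Create an issue in a Linear project.",
    # ...a real platform's registry would hold tens of thousands of tools.
}

def search_tools(query: str, limit: int = 3) -> list[str]:
    """Naive keyword match standing in for semantic tool search."""
    words = query.lower().split()
    scored = [
        (sum(w in (name + " " + desc).lower() for w in words), name)
        for name, desc in TOOL_REGISTRY.items()
    ]
    return [name for score, name in sorted(scored, reverse=True)[:limit] if score > 0]

class AgentContext:
    """Tracks which tool schemas the model can currently see."""
    def __init__(self):
        self.visible_tools = {"search_tools"}  # only the meta-tool at first

    def discover(self, task: str) -> set[str]:
        for name in search_tools(task):
            self.visible_tools.add(name)  # add matching schemas just in time
        return self.visible_tools

ctx = AgentContext()
ctx.discover("archive old gmail messages")
print(sorted(ctx.visible_tools))
```

The key design choice is that the full registry never enters the prompt; only the handful of schemas relevant to the current task do, which is what keeps attention costs bounded as the catalog grows.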

30:37

Speaker B

We'll continue our interview in a moment after a word from our sponsors. One of the best pieces of advice I can give to anyone who wants to stay on top of AI capabilities is to develop your own personal, private benchmarks: challenging but familiar tasks that allow you to quickly evaluate new models. For me, drafting the intro essays for this podcast has long been such a test. I give models a PDF containing 50 intro essays that I previously wrote, plus a transcript of the current episode and a simple prompt. And wouldn't you know it, Claude has held the number one spot on my personal leaderboard for 99% of the days over the last couple of years, saving me countless hours. But as you've probably heard, Claude is the AI for minds that don't stop at good enough. It's the collaborator that actually understands your entire workflow and thinks with you. Whether you're debugging code at midnight or strategizing your next business move, Claude extends your thinking to tackle the problems that matter. And with Claude Code, I'm now taking writing support to a whole new level. Claude has coded up its own tools to export, store, and index the last five years of my digital history from the podcast and from sources including Gmail, Slack, and iMessage. The result is that I can now ask Claude to draft just about anything for me. For the recent live show, I gave it 20 names of possible guests and asked it to conduct research and write outlines of questions based on those. I asked it to draft a dozen personalized email invitations. And to promote the show, I asked it to draft a thread in my style featuring prominent tweets from the six guests that booked a slot. I do rewrite Claude's drafts, not because they're bad, but because it's important to me to be able to fully stand behind everything I publish. But still, this process, which took just a couple of prompts once I had the initial setup complete, easily saved me a full day's worth of tedious information-gathering work and allowed me to focus on understanding our guests' recent contributions and preparing for a meaningful conversation. Truly amazing stuff. Are you ready to tackle bigger problems? Get started with Claude today at claude.ai/tcr. That's claude.ai/tcr. And check out Claude Pro, which includes access to all of the features mentioned in today's episode. Once more, that's claude.ai/tcr.

36:20

Speaker B

So, double-clicking first on that first one, the discovery of tools: what do those requests typically look like? Maybe you can answer with specific examples from specific customers, or however you want. I'm imagining sometimes I might just say, "I want to connect to my Google Drive," and then it's like, okay, great, we know what we're going to do: we're going to get the Google Drive tool. Pretty straightforward, and still very nice to have as a natural language interface. Other times, though, maybe I don't even know what I want: is there a tool for this? How would I go about doing something like that? So I'm interested in the breakdown of the kinds of requests you get today. And there's another version of this question too: as we actually get into doing the work, how much of it happens in ways the user specified, like "I want to go call Linear, get some details, and then put them over here"? You might think of that as automated copying and pasting: stuff exists, and we're moving it around. Versus really figuring out higher-order stuff: "I don't really know how to do this, but this is broadly what I'm trying to accomplish," and the system figuring out what tools and what steps make that happen. That exists both at discovery and at the level of execution. But the key question is: where are we in terms of people defining what they want their agent to do and the agent following those instructions, versus people describing intent and the system really figuring out how to serve that intent, even if the person doesn't have it all mapped out?

36:39

Speaker C

Sure. I just want to clarify one thing there. We have a pretty intelligent mediator in between, so the direct user request is not what we get. There's Claude Code or something similar sitting in between, and it's a very intelligent mediator: it navigates the user request and sends the right level of intent to the tools. In most cases Claude Code already knows the power of what Composio can do. So if the user asks, "How do I connect to Google Drive?", it will directly call Manage Connections, which is our auth management tool, to do that for the user. It won't send a search intent to the tool discovery tool. That's where the intelligent mediator, the LLM or agentic runtime, comes in between. But to answer your question, I think it's becoming more and more intentful. We've all seen how people are using the OpenClaws of the world. December was a big shift, where people realized these models have gotten to a point where much more is possible and they can trust them much more than before. And that has happened across all domains, in my opinion. Software engineering was obviously the first to bite the bullet, but it's happening more and more across knowledge work: people are becoming more intentful with these agents, and in a lot of cases they just give the outcome they want to the agent and let the agent figure it out. We're the harness that provides all the right tools for it to be able to do that. But the agent is smart enough to throw the right intent at the right tool.

38:42

Speaker B

Can you maybe give some examples of that? Instances of people with intent that then gets mapped onto those tools. Intuitively, it seems like many of them would come from individual Claude Code or OpenClaw users, but I'd be really interested if there are examples where that is actually happening in an app that a developer has running in a production environment. I don't know if we're getting there yet, or even if we want to. With a lot of these things I do feel like "what's the app at that point?" is kind of an interesting question. But yeah, give me some examples of intent that you have seen resolved into actual successful execution.

40:47

Speaker C

At this point, people are so confident that they give their whole Gmail access to the agent and ask it: okay, go through the last month of my email and archive all the emails that don't seem useful. So the agent goes and writes code to do that, which uses LLM calls, et cetera, to figure out which emails are not useful. Then there are other use cases. I use my OpenClaw for hiring. It goes through a lot of GitHub repositories, finds good commits from individuals, and creates a pipeline, specifically for engineering hires, for me. It looks at good open source agent repos, Python repos, TypeScript repos, et cetera, figures out the best contributors, and enriches the data: where they're based, like SF or outside, their emails, LinkedIn, all the socials. I've also given it its own email, so it reaches out to them on my behalf. So it's end-to-end hiring, a recruiter's job literally offloaded. It has emailed thousands of really good folks, and in the last week or two I've gotten maybe 30 or 40 calls set up. That whole thing is fully done by my agent and Composio. I think that's the idea, right? A lot of exploratory to actual end-to-end knowledge work is getting offloaded to these agents, if that answers your question to a certain extent.

41:31

Speaker B

Yeah, as it gets up to job-scale things, it starts to be both pretty intuitive what that ultimately looks like and also potentially quite transformative for many aspects of life.

43:11

Speaker C

A salesperson on our team is doing something similar for sales. Obviously the agent isn't emailing prospects directly on its own; it has drafts ready, and the salesperson just needs to press the send button. That's the level we're at: the agent does the whole sales workflow, figures out the right people to reach out to, the ones building, say, agents at different companies, and has their data ready, email, LinkedIn, et cetera. Then the salesperson just has to press send.

43:31

Speaker B

So how do you make sure those agents have the right context? This is something I'm also thinking about right now. One of my favorite things I've set up, and I kind of can't stop talking about this, so I'll keep it brief because I've probably mentioned it on a few episodes already: I have exported to a local database basically the last five years of my communications. All email exported out of Gmail, all Slack, basically every DM platform that I use, and all of my calls, which I've been recording for the last three years, transcribed. Everything is organized into threads: a Gmail thread is a thread, but a single call is also a thread, and each statement back and forth between the people is a message, corresponding to the emails back and forth. Even the podcast I break down by transcript and put in there the same way. So a lot of the communication I've had, probably a majority, is now in this database. And it is extremely helpful for getting the context needed to understand: who is this person? What is my relationship with them? Do we have ongoing projects? If I start from a project link, who was involved with those projects, and how did that evolve over time? I am a little reluctant to throw that into some random cloud container that I'm just testing out, so it still only lives on my local machine. This brings me back a little to the containers, the context, and what should sit where. Because I assume that, just like me as an individual, you as a company have a lot of information about what makes a good candidate: who were the good candidates before, good examples, bad examples to compare against. It's endless, right?
How should people think about sending these agents off to do long-running things, making sure they have the context they need, but without putting themselves too much at risk? My personal database is already a gigabyte; it's my whole life in there. So I do need to be a little mindful of where I send it around, or what ports I open up to access it. And I don't even know everything that's in there. I'm sure I've emailed myself credit card numbers and passwords, and recovery codes are probably in there too; I'm sure I've done all sorts of stupid things which could come back to bite me. How do you think about that balance between making sure the context is there so agents can succeed, and managing the associated risks?
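The thread/message layout described above can be sketched as a small schema. This is entirely a guess at a shape, not the host's actual setup: every source (an email thread, a call transcript, a DM conversation) becomes a thread, and each turn inside it becomes a message, which makes "everything I've ever exchanged with this person" a single join.

```python
import sqlite3

# A guessed-at schema for the thread/message model described above:
# every conversation (email thread, call, DM chat) is a thread,
# and each turn inside it is a message.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE threads (
    id      INTEGER PRIMARY KEY,
    source  TEXT NOT NULL,      -- 'gmail', 'slack', 'call', 'imessage'
    title   TEXT,
    started TEXT                -- ISO-8601 timestamp
);
CREATE TABLE messages (
    id        INTEGER PRIMARY KEY,
    thread_id INTEGER NOT NULL REFERENCES threads(id),
    sender    TEXT NOT NULL,
    sent_at   TEXT,
    body      TEXT
);
""")

conn.execute(
    "INSERT INTO threads VALUES (1, 'call', 'Guest prep call', '2026-01-10T15:00:00')"
)
conn.executemany(
    "INSERT INTO messages (thread_id, sender, body) VALUES (?, ?, ?)",
    [(1, "host", "Thanks for joining."), (1, "guest", "Happy to be here.")],
)

# Context lookup: everything exchanged in any thread a given person took part in.
rows = conn.execute(
    "SELECT t.source, m.sender, m.body FROM messages m "
    "JOIN threads t ON t.id = m.thread_id "
    "WHERE t.id IN (SELECT thread_id FROM messages WHERE sender = ?)",
    ("guest",),
).fetchall()
print(rows)
```

Keeping this in a local SQLite file is also what makes the risk calculus in the question concrete: one file, one machine, and any agent access has to go through whatever port or tool you deliberately open.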

44:11

Speaker C

Yeah, that's where managed access and least-privilege access control come into the picture. The idea we have at Composio is that in the future you'll have not just a single agent but multitudes of agents, and you'll have different profiles and access controls for each agent. Some agents will have only read-only access to your data, so they can't do anything malicious like sending or executing anything, but they get all the data, because you want them to be very self-contained; they're research agents, essentially. They have a lot of data, but they don't have the permissions to divulge that data by mistake. So you'll have very tight access control on what they can do, but read-only access to everything. That's a profile you can create inside Composio and manage: whatever access permissions you want. There can be another profile where you've given the agent a lot of write permissions, but it gets very limited personal or company-wide information, because what if that agent emails some secret token by mistake? There, the granular access control is: okay, you want to give a lot of write control, but you have things like human in the loop to guardrail what the agent is doing and keep it very controlled. These are the types of profiles that will exist in the future, and you'll create different OpenClaws, different Claude Codes, different Codexes to do a mix of all of it.
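The two profiles Karan describes (read-everything-write-nothing research agents versus write-capable agents gated by a human) can be sketched as data. The `Profile` and `authorize` names are hypothetical illustrations of the least-privilege idea, not Composio's API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of per-agent permission profiles: a research profile
# reads everything but writes nothing, while an outreach profile can act,
# but only behind a human approval gate.

@dataclass
class Profile:
    name: str
    can_read: set = field(default_factory=set)
    can_write: set = field(default_factory=set)
    human_in_loop: bool = False   # require approval before any write

RESEARCH = Profile("research", can_read={"gmail", "slack", "crm"})
OUTREACH = Profile("outreach", can_read={"crm"}, can_write={"gmail"},
                   human_in_loop=True)

def authorize(profile: Profile, action: str, resource: str,
              approved: bool = False) -> bool:
    """Deny by default; writes additionally respect the human-in-the-loop flag."""
    if action == "read":
        return resource in profile.can_read
    if action == "write":
        if resource not in profile.can_write:
            return False
        return approved or not profile.human_in_loop
    return False

print(authorize(RESEARCH, "read", "gmail"))               # research can read mail
print(authorize(RESEARCH, "write", "gmail"))              # ...but can never send
print(authorize(OUTREACH, "write", "gmail"))              # blocked until approved
print(authorize(OUTREACH, "write", "gmail", approved=True))
```

The deny-by-default shape is the point: an agent that was never granted a write permission cannot leak data by mistake, no matter how it is prompted.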

47:02

Speaker B

Let's talk about the continual learning aspect. It sounds like a huge value driver, and probably a necessary one for Composio's success. I don't know if you would go that far, but what I see is that the barrier to spinning up a new tool is certainly dropping. The value of having seen a ton of uses of that tool, though, and being able to figure out what the actually effective pattern is, that's not going to be easy for people to recreate on their own. So if you can make a step-change difference in the results people get, as in "sure, you can spin your own tool up real quick, but we've seen 10,000 uses of this and know what has actually been effective," that strikes me as the moat. And we're all searching for the increasingly elusive moats in the AI space. You've already described in some detail how it works. The philosophy part I'm wondering about is: how does it work across users? How does it work across apps? There's obviously a depersonalization or anonymization aspect that I'm sure is critical. But also, if you find an upgrade, does everybody automatically get it? Do you have to subscribe to upgrades? Or are some upgrades so specific to an individual or a particular app that you can upgrade it for them without changing it for everybody else? I feel like I want those upgrades as a user, but when I do have something working well, I'm a little afraid of them, just as I am with models. Obviously the model makers are leapfrogging each other all the time; a new model comes out, it might be better, but is it going to be better on the thing I already dialed in to my satisfaction? Maybe, maybe not. So I don't know.
That all just seems quite fraught. What's the philosophy that guides you in figuring out who gets what upgrades?

48:57

Speaker C

That's a great question. We think about it a lot internally, by the way, and it's why we've designed our infra to make this very easy. As I was describing, there are multitudes of versions of tools; for a particular tool you might have tens of thousands of versions. Some upgrades are personalized: if we see you're using a particular tool in a particular way and the tool can be better specifically for your use case, we'll ship that upgrade only for you. I don't fully buy the fear argument, though, because we all know the models are changing, the model behavior is changing, every other day. The way you can control the behavior of the model, even when the model or some of the tools change, is by having skills. And that's where we are very, very thorough about when we change the skills that have been developed for users; those don't change that often. That's the fixed level of repeatable behavior: if you like something the way it is, that's ingrained into your skills, the personalized skills we create in the backdrop for you. But the tools themselves keep getting better and better. You were talking about moats: we have a gazillion instances where we've seen the docs are totally wrong for a bunch of tools. Because so many agents have gone through our tools, and agents use tools in insanely different ways compared to the previous generation, where humans were using these APIs, you hit a new edge case every now and then, and that just makes our tools better and better. Last week we found a bunch of cases in Google Calendar where our tools are now much better, and it happened autonomously; in some cases we didn't even get to know.
Our tools are much better than what the docs propose, and that's not true for just one app; it's true across apps. So that's a moat we've developed, because so many agents have poked holes around different apps and made our tools better. And those upgrades are available across the board. Why would we restrict them to a particular user? If a tool is getting better and better, that should apply all across. So that's how I think about it, basically: some improvements should happen across the board because they're very generic; some skills should be available across the board because they capture how agents should use particular tools; and some are very personalized to you and your use case.
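The upgrade policy sketched in this answer, global fixes for everyone plus personalized variants for specific users, reduces to a small resolution rule. The names below are invented for illustration; the real system presumably tracks far more per version.

```python
# Hypothetical sketch of the upgrade policy described above: each tool has a
# global "best known" version everyone inherits, plus optional per-user
# variants tuned to one customer's observed usage pattern.

GLOBAL_VERSIONS = {
    "google_calendar.create_event": "v7",      # e.g. fixed where docs were wrong
    "gmail.search": "v3",
}
USER_OVERRIDES = {
    ("acme-corp", "gmail.search"): "v3-acme",  # personalized variant
}

def resolve_tool_version(user: str, tool: str) -> str:
    """A personalized variant wins; otherwise the user gets the latest global fix."""
    return USER_OVERRIDES.get((user, tool), GLOBAL_VERSIONS[tool])

print(resolve_tool_version("acme-corp", "gmail.search"))
print(resolve_tool_version("acme-corp", "google_calendar.create_event"))
print(resolve_tool_version("other-co", "gmail.search"))
```

Note how the two tiers interact: "acme-corp" keeps its tailored `gmail.search` while still inheriting the global Google Calendar fix, which is exactly the "some upgrades for you, some for everyone" split from the answer.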

51:11

Speaker B

You said something I thought was quite interesting, and I'm not sure everyone would agree, which is that skills basically tame the models. I forget exactly how you said it, but you said models are always changing, of course, and that's where the skills come in. Once you've really defined a skill, it sounds like you're of the opinion that you can swap out models underneath and get pretty consistent behavior, even across models. Obviously there are caveats, in the sense that you can't massively downgrade to a 1B local model and expect frontier performance. But if we take a narrower understanding of that statement and restrict ourselves to frontier models, or whatever reference class you want to use, that's still pretty interesting. I think a lot of people would say they don't feel confident in that. So how confident are you? I guess the implication would be that all the frontier models are good enough at following instructions that, if your instructions are really well built out, the models become kind of interchangeable. Is that a good summary of your view?

54:06

Speaker C

Yeah, in most cases I've seen that hold true, because in most cases the skills are detailed enough: they give a decently granular description of what you want to achieve and the path, the trajectory, that the agent should take to reach a particular outcome. If the model is good enough and follows instructions, in most cases the behavior, the trajectory of the models, remains consistent. And that's a known pattern that a lot of research and industry are also seeing: you get Opus to create a skill and then swap to Sonnet for using that skill. The first time through, Opus, being smarter, is better at navigating and figuring out how to reach the outcome. But once you have the skill that Opus created, you can swap to a cheaper model and achieve a similar-ish outcome.
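The "create with a strong model, run with a cheap one" pattern can be sketched as two phases around a shared skill artifact. Everything here is a stand-in: `call_model` fakes an LLM call, and the skill format (a name plus explicit steps) is just one plausible shape, not a real provider's skill spec.

```python
import json

# Sketch of the "create with a strong model, run with a cheap one" pattern.
# call_model is a fake LLM; a real implementation would hit a provider API.

def call_model(model: str, prompt: str) -> str:
    if "write a reusable skill" in prompt:
        # Phase 1: the strong model distills a successful trajectory
        # into explicit, repeatable instructions.
        return json.dumps({
            "name": "triage_inbox",
            "steps": [
                "gmail.search(query='older_than:30d')",
                "classify each result as useful / not useful",
                "gmail.archive(id) for each not-useful message",
            ],
        })
    return "ok: followed the skill steps"

# Phase 1 (one-off, expensive model): trajectory -> written-down skill.
skill = json.loads(call_model("opus-like", "write a reusable skill from this trace"))

# Phase 2 (every later run, cheap model): just follow the written steps.
prompt = "Follow these steps exactly:\n" + "\n".join(skill["steps"])
result = call_model("sonnet-like", prompt)
print(skill["name"], "->", result)
```

The economics follow from the asymmetry: the hard reasoning happens once, and every subsequent run only needs instruction following, which is exactly the capability the answer says cheaper models already have.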

55:20

Speaker B

Do you think that also holds true even swapping across frontier model providers?

56:25

Speaker C

Sometimes not. There are behavioral patterns across different providers which make them different. For example, I see this in day-to-day use: Anthropic models are somewhat more agentic in certain ways. If a tool requires polling, an Anthropic model will write code to wait and continuously poll; that agentic polling behavior is much better ingrained in Anthropic models somehow, while GPT just stops and waits for user input. Those are behavioral differences that make the way GPT will use or build some skills a bit different from Anthropic's models. But to a majority extent, except for those nuances, I think it holds true.
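The "agentic polling" behavior described here is just this loop, which one family of models tends to write unprompted and another tends to skip in favor of handing control back to the user. A minimal sketch, with `fake_job_status` simulating a backend that reports "running" twice before completing:

```python
import time

# Sketch of agentic polling: when a tool is asynchronous, the agent writes a
# loop that polls until the job finishes instead of stopping for user input.

_status = iter(["running", "running", "done"])

def fake_job_status(job_id: str) -> str:
    """Simulated async backend: 'running' twice, then 'done'."""
    return next(_status)

def poll_until_done(job_id: str, interval: float = 0.01,
                    timeout: float = 1.0) -> str:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fake_job_status(job_id)
        if status == "done":
            return status
        time.sleep(interval)          # back off, then poll again
    raise TimeoutError(f"job {job_id} did not finish in {timeout}s")

result = poll_until_done("job-42")
print(result)
```

A real agent-written version would also cap retries and surface intermediate status, but the behavioral difference Karan describes is simply whether the model emits this loop at all.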

56:34

Speaker B

Interesting. In practice, what would you say is the max-efficiency play, and do you recommend it? If I go develop a bunch of skills with Opus, what's the cheapest model I can trade down to that you would expect to work some large majority of the time?

57:36

Speaker C

For sure Sonnet, because I do that regularly. In my own case, I use skills a lot. Locally, I'll write a skill for the first time via Opus, but then trade off for speed and use Sonnet, and that works phenomenally well. I don't do it with Haiku, because I've seen that doesn't work really well. I have tried some GPT models, and roughly 90% of the skills just work. There are 10% of cases where nuances are baked into the skill because of how the model that wrote it operates, and those aren't exactly plug and play. But in 90 to 95% of cases it just works out of the box.

58:07

Speaker B

Interesting. Would you expect like Gemini Flash to hit that level as well?

58:55

Speaker C

I think so. Gemini Flash is decently smart, so if not now, then maybe the next iteration. I haven't tried it with skills myself, so I can't comment directly, but I've used it in production settings, and it feels like it's very, very close.

59:02

Speaker B

Skills as the re-commoditization layer. The prevailing narrative recently has been that the models are starting to diverge, not exactly in capabilities in the macro benchmark sense, because they're all climbing the same curve, obviously, but in qualitative ways that are hard to wrap your head around, thus creating some stickiness and some pricing power for companies once they get people using their stuff. There's been an increasing sense that it's harder to switch off. But you're making a provocative point; not deliberately provocative, but it's definitely provoking thoughts in me. The boomerang might come back, due to all these skills getting so thoroughly defined that you don't necessarily need great judgment in a model to execute them. You just need good instruction following. That's very interesting.

59:18

Speaker C

That's where we actually position Composio: as a one-shot way to not be locked in to a model provider. If you use Composio's harness, you can use it with Anthropic, with OpenAI, with open source models. Say today you're using Anthropic, and OpenAI is gradually improving on a bunch of these things. If you want to switch to OpenAI, you have all your auth and all your skills in a single place, so you can make that switch and get 99% reliability. The day after, you decide open source models are becoming equally great and are probably 10x cheaper, and you want to go there; you can make that switch and still continue to work with 99% reliability.

1:00:22

Speaker B

Something I'm actually still in the middle of doing right now: I think it was Tariq, I don't know the person personally, but a member of Anthropic's technical staff put out a post in the last 48 hours or so, very well received, that was like, here's how to use skills; we've learned a lot, and here's what we've learned. I didn't even read it. Instead I just copied and pasted the whole thing into Claude Code and said: here are some best practices that I just heard are popular and that people say give good results. Can you go apply these, or go into plan mode first and tell me how you would think about applying these to all the various skills we've worked on together? And naturally it has a lot of good ideas for what to do there. I wonder if you guys already do this, or might think about investing in skills specifically for translating skills from provider to provider. Because I can imagine that 90 or 95% could easily become 99% if you applied another layer of transformation, compensating for the known quirks. Is this something you already do?

1:01:19

Speaker C

Yeah, we are developing a bunch of metrics and benchmarks around this, and we already do some of it. Nobody, specifically in enterprises, wants lock-in; they want the optionality of moving across providers, because in the current AI race you never know who the winner is. It changes very often: today it's Anthropic, tomorrow it's OpenAI, Google, the Chinese models, xAI. You want that optionality. And skills, in my opinion, are to a certain extent very sticky: like you said, if the skills have the behavioral patterns of the model that wrote them ingrained in them, and they're not structured, they can be harder to change. That's where I think the 90 to 95% is actually very easy, because most of the models are at that level now. But reaching the 100% mark with these skills, which are not structured, is a very hard problem. That's something we're already trying to solve; we've made it somewhat better, but we want to solve it to probably 100%.

1:02:27

Speaker B

Yeah. Okay, that's really interesting. And I do feel my worldview changing a little bit in real time here, toward expecting a little more commoditization and less pricing power. That also brings to mind another angle on lock-in, or moats, or whatever: memory. Although again, I think there are ways to get around this. Claude recently did this thing where they said, go ask ChatGPT this, come paste the answer in over here, and we'll pick up right where you left off in terms of memory. So that too is not really a great moat, or at least it's debatable. But in looking at all the tools that you have and just browsing through them, I was kind of struck by one thing. Of course there are many tools that are relatively simple API wrappers: the APIs existed, and now they can be used by agents. Okay, that's sweet; a lot of automation is powered that way. But then there's another class of tool that is built for agents in the first place. Memory is one, where I saw you have mem0 and Zep and probably some others: things specifically designed to unhobble the AI, or enable the agents, if you will. There are also things around allowing agents to transact, whether in fiat or cryptocurrency or whatever. What does that landscape look like? What are the agent-enabler tools that are actually important, that are actually working, that are maybe even strategically important? Memory could be one, where decoupling memory from your model provider, if it works well enough, could be another way to insulate yourself from lock-in effects. But I would love to hear your survey of this new generation of tools built for AI specifically.

1:04:43

Speaker C

Yeah, I think we are very bullish on all of these agent enabler tools, like you mentioned: memory-based ones like Mem0, Supermemory, and Zep; payment-based ones like Skyfire; and we have the traditional e-commerce ones like Shopify as well, which are almost agent enablers now in that sense. They are all gearing towards it. We partner with all of them, and they all partner with us, because we want the people building on top of Composio to have access to the best quality agent enablers and to improve the ecosystem. A bunch of good ones are already there, and more are obviously coming day by day. Our position is just to give our customers access to all the best quality ones. So I don't have any favorites there per se; we like to give all of them to our users and let them make the choice.

1:06:03

Speaker B

I understand the idea that you can't pick favorites among your partners, but are there classes of these things that you find to be particularly powerful?

1:07:07

Speaker C

I think all of them are getting decent usage. Memory is obviously a big one; everybody wants to use that. Even on the payment side there are a bunch, like Skyfire, for different payment use cases. A lot of these things are getting increasingly better adoption given the OpenClaw movement specifically, where a lot of background agents are running and want to do commerce stuff. Search is obviously one of the big use cases, where people use things like Exa, Firecrawl, Tavily, etc. So it's obviously kind of broad. Depending on the use case and the problem statement the builder is solving, or what people are using in their OpenClaw or Claude, all of them, or a mixture of them, are being used heavily.

1:07:22

Speaker B

Are there any kind of missing categories or missing tools that you're like why has nobody built this yet?

1:08:22

Speaker C

Yeah, there are so many coming up that at this point I might have to think more to come up with an answer. People are building every sort of thing from whatever a human does and needs; there's agent-to-human delegation type stuff happening, a bunch of things. So I don't have an answer out of the box. But I would say most usage is still coming from the traditional software, because that's where all the data of the users sits. Things like Slack and Salesforce are still the major ones, because those are the systems of record.

1:08:31

Speaker B

Yeah, perfect transition. I'm interested in your take on those platforms: which ones are maybe even advantaged, or at least safe from major disruption by the AI wave, versus which are more likely to be under threat. People are of course tired of paying their Salesforce bills and their Slack bills, and there have been some interesting but, I think, kind of confused debates about this, where one person will say, I had my coding agent spin up a Slack clone, look what it did. And then another person will say, well, that's nowhere near what Slack really has to do; think about all the complexity Slack really has to handle. And then I always think in my head: sure, but for that one user and maybe their small company, all that Slack complexity was irrelevant anyway. It's just a bunch of stuff they're paying for that they're never even using. I'm not really sure where that settles, though. And obviously there are a lot of different categories, from collaboration to task management to customer service platforms of all kinds. With the obvious disclaimer that this is not investment advice: what are you buying, and what are you selling, based on what you're seeing?

1:09:23

Speaker C

Yeah, I have a twofold take on that. One: in my opinion the core infrastructure layer is getting much, much stronger, because, as you rightly said, making software is getting easy, so a lot of software will get built on the core infrastructure pieces. And your dependence on software is just going to increase more and more, because now you're chatting with your agent more than you're chatting with a human to get tasks done. So the core infrastructure driving it, things like AWS and Cloudflare, is getting massively stronger, because it's very hard for anybody to rebuild those infrastructure pieces. That's one. Then on the whole SaaS war you're talking about: the way I'm thinking about it is that there are obviously some places where you can build a mini version of some bigger SaaS app for your particular small niche use case. That's fair. But in most cases, what will happen is that the interface through which you use a lot of these SaaS apps is going to change. That's where startups will build those new interfaces and pose a competitive threat to the SaaS apps. But in this particular wave, the older SaaS companies like Salesforce, Slack, etc. are also pretty fast to catch up, and they are coming up with their own new agentic interfaces to operate on. So it's really about who will be fast enough and do the innovation. And we are here to support both, honestly. At Composio we are working with startups, and at the same time we are working with the incumbents building new agentic interfaces. So it's all about who is fast enough to build a new interface for their users to operate on.

1:10:52

Speaker B

So yeah, I've been back and forth on that myself a little bit. One paradigm I had bought into, and I think I still mostly do, is that things like Salesforce will probably not be disrupted by startup CRM competitors. Because sure, you can go build an AI-first CRM or whatever, and it might be sweet, but it's still going to take a long time to sell the big customers. And by the time you can win a lot of that business, they'll figure out how to clone your best features, close ranks, and prevent you from taking too much of their customer base away. So if I think about how much of the value of the AI wave accrues to incumbents versus AI-first challengers, I mostly go with incumbents. But the flip side of that is: what about people bringing things in-house? Another example I've been thinking about, because I actually had a chance to study it a little bit, is Intercom. Especially in the last week, there's been a bunch of praise going around for Intercom as one of the companies that has adapted the best, so I'm not picking on them by any means. In fact, they provided a past guest on the show, and they're a really strong example of what it looks like for a company to catch the wave and ride it successfully. One of the big things they've done, of course, is create this Fin agent, which, when I spoke to them, was resolving approaching 70% of all customer service tickets across many, many thousands of customers, at 99 cents each, flat pricing. Okay, cool, that's sweet. It sounds like it's driven a huge growth boom for them, and it's easy to see why, because the AI can respond 24/7, instantly, where I'd otherwise be paying humans. I actually looked into our data.
My company is an Intercom user, and we pride ourselves on customer service, but the speed to response that the Fin agent gives is just something we can't match with our staffing size. So, okay, it has all these advantages, I should say. But then I looked at the Composio tools associated with Intercom: 133 tools just for Intercom. And I'm not sure about this, but it sure looks to me like I could do literally anything I wanted or needed to do with the entire Intercom platform through those tools. That got me thinking: maybe this Fin agent is actually going to become a skill for me. Rather than pay them 99 cents each, what I really need to do is dial in exactly what I want, and I can probably do that better owning the skill on my side versus trying to configure it on their side. Do it all through the tools, and I probably save, I don't know, 90%.

1:13:07

Speaker A

Right?

1:16:28

Speaker B

I mean, imagine it would be like 10 cents in token cost. So what's your take on that? Do you think people will start to cannibalize? These agents are awesome, but it seems to me it wouldn't necessarily be that hard for a lot of companies to say: Fin's doing well, but it's not doing some things quite so well, and I've got all these tools. So give me the 133 Composio tools, let's work on our own skill, and we'll get this thing better than Fin at 10% of the cost. Is that realistic? Why wouldn't that be realistic?

1:16:28

Speaker C

So, by the way, Fin is a great product. We used it early on when we were just launching the prosumer side of Composio, because obviously a ton of support comes in when it's a prosumer product. It's really easy to get started, and that's what they're solving. Honestly, in my opinion, a lot of companies don't want to spend a lot of time; they want to get started and offload this to someone else, and Fin is great for them. Where you made the right point is on customizability. Cost is obviously a big reason too, but the customizability and governance you get from building your own agent gives you that freedom. You can create multiple skills, you can specifically limit your agent to particular tools, and you can give it access to the other apps Composio has while building those support agents. That customizability is what people prefer when making that build-versus-buy decision. Fin is a great product for a lot of people, by the way, and that's why it's doing great. But there will be some people who want to customize and make the agent more powerful, and that's where I think we come in and provide those 100-plus tools for Intercom, and you can do whatever you want.
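
The "limit your agent to particular tools" idea Karan mentions can be sketched as a simple allowlist over a tool catalog. This is a hypothetical illustration, not Composio's actual SDK; the tool names and `ScopedAgent` class are made up for the example.

```python
# Hypothetical sketch: scoping a support agent to an explicit allowlist
# of tools instead of exposing the full catalog. Tool names here are
# illustrative stand-ins, not real Composio tool slugs.

ALL_TOOLS = {
    "INTERCOM_REPLY_TO_CONVERSATION": lambda **kw: "replied",
    "INTERCOM_CLOSE_CONVERSATION":    lambda **kw: "closed",
    "INTERCOM_DELETE_CONTACT":        lambda **kw: "deleted",
    "SLACK_SEND_MESSAGE":             lambda **kw: "sent",
}

class ScopedAgent:
    """Agent that can only invoke tools on its allowlist."""

    def __init__(self, allowed: set[str]):
        # Only the allowed subset of the catalog is visible to the agent.
        self.tools = {k: v for k, v in ALL_TOOLS.items() if k in allowed}

    def call(self, name: str, **kwargs) -> str:
        if name not in self.tools:
            raise PermissionError(f"tool {name!r} not enabled for this agent")
        return self.tools[name](**kwargs)

# A support agent that may reply and close tickets, but never delete contacts.
support = ScopedAgent({"INTERCOM_REPLY_TO_CONVERSATION",
                       "INTERCOM_CLOSE_CONVERSATION"})
print(support.call("INTERCOM_REPLY_TO_CONVERSATION"))  # "replied"
# support.call("INTERCOM_DELETE_CONTACT") would raise PermissionError
```

The design choice is that governance lives outside the model: even if the LLM hallucinates a destructive tool call, the scoped dispatcher refuses it.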

1:17:04

Speaker B

Yeah, I agree with you that they've done really well, just to reiterate that point. But I do see the world where the friction keeps getting so low: in part because you've already got the 133 tools, in part because somebody can publish their skill, in part because I can give that skill to Claude Code and say, hey, interview me about my use cases and how I would change this skill to suit me. And then it's already plugged into all 133 tools. Everywhere I look, it seems like the barriers are getting pretty easy to overcome. I guess, bottom line: nobody can project five years into the future, and even two years is a long time. But do you think that a year from now a lot of companies will actually be trying to realize these gains and saying, hey, we were spending a million dollars a year on a million tickets with Fin, let's do our own thing and try to save 80 to 90% with custom skills? Do you think that will be a movement?

1:18:31

Speaker C

I don't know if it will be a widespread movement; it will happen in bits and pieces. For some companies, spending that much time is probably not worth it; for some companies it is. So that will be the shape of it. But I agree, the friction will keep getting lower, and we will make sure that, from our side, it keeps getting lower and lower. As you rightly pointed out, there will be skills around it. And definitely, as these models get better and better, on build versus buy people will inch towards build in the future.

1:19:46

Speaker B

Let's talk a little bit about agent-to-agent. So far we've talked about agent-to-tools and then agent-to-smart-tools, and I think smart tools start to look like agents in their own way. Again, everything is a little bit fuzzy; the boundaries between a tool, a smart tool, and an agent are not always super crisp. But when I think about agents, I think about representing someone's interest. If I were to venture a conceptual difference between a tool and an agent, it would be that a tool is exposed for me to use to serve my own interests, while an agent is something that may represent somebody else's interests, but I can still interact with it. You may offer your own definition. How are you planning for agent-to-agent as part of the future of Composio?

1:20:22

Speaker C

Yeah, for sure. We think about it a lot, and I have some models for how I think about the multi-agent world, or agent-to-agent. By the way, I just want to call out that we have a bunch of tools at Composio which are agentic: for example, tools where you can do almost anything in a natural language format on a particular app, and internally that's inherently an agent doing the thing for you. There are some places where agent delegation works really well and some places where it doesn't. In a lot of places, the main agent has the full context about the use case and what the user wants, and it has the full set of tools to figure out anything else it needs. So giving the right set of tools to the main agent, as long as you're not overloading its context, is generally better. To give you an example: say I have to book an appointment with someone, and I have a sub-agent that can do that for me. My main agent has all the right things, like my calendar, to figure out whether I have a collision at that time, but my sub-agent probably doesn't; it's just an appointment-booker sub-agent. If I delegate the task to the sub-agent, it's possible that it books the appointment at a time when I have a collision with, say, some important board meeting, and then I'm in a what-just-happened situation. That's where the main agent, because it has all the context, my calendar and so on, will do a much better job if it is just given a tool.
So the way I look at it is: if it's a low-context task, one that doesn't overload your context much, it's better to provide it as a tool. If it's a more exploratory task where a lot of context will be used, for example deep research, where we all know that in most setups parallel sub-agents run in different streams, research a particular topic, condense the results, and give them to the main agent, then in those exploratory, context-heavy cases it's better to use sub-agents. In a lot of other use cases it's better to bring it back to the main agent and give it the right set of smart tools. So there's no single answer here; it will be a mix and match based on the problem statement.
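The heuristic Karan describes, low-context actions as tools on the context-rich main agent, exploratory context-heavy work delegated to sub-agents, can be sketched roughly as follows. All the names and the token-budget threshold here are hypothetical, not Composio's actual API:

```python
# Hypothetical sketch of the delegation heuristic described above:
# cheap, low-context actions stay as tools on the main agent (which
# already holds the user's calendar, preferences, etc.), while
# exploratory, context-hungry work is fanned out to sub-agents.

from dataclasses import dataclass, field

@dataclass
class Task:
    description: str
    exploratory: bool          # e.g. deep research vs. a single API action
    est_context_tokens: int    # rough context the task will consume

@dataclass
class MainAgent:
    context: dict = field(default_factory=dict)  # calendar, prefs, ...
    log: list = field(default_factory=list)

    def run_tool(self, task: Task) -> str:
        # A tool call executes inside the main agent's own context window,
        # so it can check e.g. calendar collisions before booking.
        self.log.append(("tool", task.description))
        return f"tool handled: {task.description}"

    def delegate(self, task: Task) -> str:
        # A sub-agent runs in its own context and returns a condensed result.
        self.log.append(("sub-agent", task.description))
        return f"sub-agent summary for: {task.description}"

    def handle(self, task: Task, budget: int = 2000) -> str:
        # Exploratory or context-heavy work goes to a sub-agent;
        # everything else stays a direct tool call.
        if task.exploratory or task.est_context_tokens > budget:
            return self.delegate(task)
        return self.run_tool(task)

agent = MainAgent(context={"calendar": ["board meeting @ 3pm"]})
print(agent.handle(Task("book appointment", False, 300)))     # stays a tool
print(agent.handle(Task("deep research on X", True, 50_000))) # delegated
```

The booking example from the conversation lands in the `run_tool` branch precisely because the collision check needs the main agent's calendar context.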

1:21:16

Speaker B

Are you seeing anything meaningful today in the agent-to-agent space? Any examples you would highlight? It has felt largely theoretical to me so far; there's been a lot more talk of agent-to-agent than actual agent-to-agent happening.

1:24:15

Speaker C

I think Claude's team of agents is a very interesting paradigm, where there's a shared task list that all the sub-agents map onto. They can do inter-agent communication via that shared task list: they assign another agent some task they want it to do, and so on. That's a pretty good shared-agent paradigm. In the case of Composio, as I mentioned, we have some tools which are exposed as agents, where you can do a to-and-fro: you give the task to the sub-agent tool in pure natural language, go and find this or go and do this on this particular app, and if it requires something, it will say so in its response, and then you can use that session ID to continue the conversation back and forth. So those are some paradigms. It's still very early, but it is in production. The Claude team of agents is available; they have a sub-agents API coming, and we have a preview version of it available where the agent can manage sub-agents. We also open-sourced this thing internally called Agent Orchestrator. A lot of our engineers use a single orchestrator agent to manage 20 to 30 Claude Code or Codex agents, and that single orchestrator figures out what the different agents are doing and whether any action is needed from the engineer using it to control those agents. Those are some interesting paradigms people are exploring.

1:24:36

Speaker B

What does your cost structure for running Composio look like in terms of human cost versus token cost, and how is that shifting, or how do you expect it to shift over time?

1:26:23

Speaker C

As I mentioned, all of our integrations are actually built by agents; the engineering team is building agents that build those integrations. Two to three years back that was not the case. Everybody, specifically the tool providers, had a big team to build these integrations. We have literally a three-member team doing it all, setting up the whole agent pipeline, and over the last month we probably spent $100k on the pipeline that builds those agents. So to answer your question, our token cost is definitely much higher than our human cost right now.

1:26:41

Speaker B

Wow. So you have a three-member engineering team, three humans?

1:27:29

Speaker C

Overall we have around 15 people. But the team that builds the whole end-to-end agentic pipeline, the one that builds and improves agents, sorry, tools, over time, is just a three-member team.

1:27:33

Speaker B

And do you see the engineering team growing substantially in the future? Is there anything that will require a lot more headcount or is it just going to be like a lot more tokens?

1:27:48

Speaker C

I think at Composio we are definitely hiring; it's just that our bar is very high, sometimes I feel a bit too high, and that's why we are not able to hire fast enough. But we definitely need humans to control the agents. We are still not at the point where agents work fully autonomously without any supervision.

1:28:03

Speaker B

We're getting a lot of different predictions about what the future of software and the software labor market looks like. It sounds like you think token costs will grow faster than human costs, but do you think the size of the human team levels out at some point? What's the scaling law of humans at Composio?

1:28:21

Speaker C

Yeah, very honestly, we are a startup, so the human-versus-AI scaling law operates a bit differently for us, because we are already very, very AI-first in that sense. But we still need humans to make decisions in certain cases, and that's where we are hiring. We're all seeing what's happening across the board at the bigger tech companies, where I think human capital is still much higher than token spend. In our case, LLM usage is already a multiple of human capital; token spend, even for internal development, is multiples of what we spend on people. So in our case we need more humans in order to spend more tokens; that's the idea. But to answer your question, as models improve, that ratio will definitely shift toward more token usage relative to humans.

1:28:55

Speaker B

One other question I had for you on business strategy connects back to the agent concept as well, although it doesn't require the agent paradigm: why not resell more? Why not try to control the customer relationship more? Obviously in some cases, you know, I have a Slack account, that's my Slack account, and I want to keep it my Slack account; I don't want to use Slack through you. That wouldn't make any sense. But then there are other things, and I'm thinking of ones I tried through Composio, like Brandfetch, to get companies' logos and color schemes, or Perplexity, or the Brave Search API, or Exa, which you mentioned: these generic utility-style APIs. As far as I saw, the only way I could connect those accounts was to provide an API key I already have for those services. And that got me wondering: why not just take the money yourself and have your own big Perplexity bill that your users pay their way into, but through you? From my perspective, that seems like it would be to your advantage, and also a nice friction reducer for users, because I was like, oh okay, I guess I've got to go get my Brave Search API key now. Whereas if I already had a payment method on file with you, it would be, sure, I'll enable it. I think theirs is $5 per thousand calls or whatever; you could even charge me six for convenience, and it seems like it would work. Is that something you think you will do, or is there a reason you're not doing it today?

1:30:05

Speaker C

No, definitely. That's a place we are moving right now. We do have a bunch of services bundled into our paid plans, but pretty soon we are launching this thing called Premium Toolkits, which is exactly what you mentioned: a single wallet with Composio gives you access to whatever you enable, all these services via a single place, a single dashboard. That will probably be launching by the time this episode comes out, or in the next couple of weeks. You can set up your credits and so on just at Composio's end and use all these services, so that you don't get overwhelmed maintaining so many accounts, different billing, different pieces.

1:31:52

Speaker B

Yeah. Okay, cool. I look forward to that. I think these are pretty much all the angles I wanted to cover. What have I not touched on that's on your mind, that I should have thought to ask about?

1:32:42

Speaker C

Yeah, one thing a lot of people on Twitter are talking about is MCP versus CLI. That's a pretty heated debate right now, specifically around the GitHub CLI, and I have my viewpoints there. It's interesting because it affects us a lot; as I mentioned, we are essentially the reliable execution layer, and that's why we are launching a universal CLI next week, which is the last week of March, though I don't know when the episode comes out. With a single CLI you can access all the different apps, so you don't need the GitHub CLI just for GitHub, the Vercel CLI for Vercel, and so on: a single CLI that can manage all your apps from a single point of usage. But on CLI versus MCP, I don't know what your view is, but I personally think it will be a multipolar world. You can't discount the GitHub CLI, which has been used forever by me and other engineers and is baked into the pre-training data; obviously the models will use it well. But MCPs are now in the post-training data too, with the Claudes and Codexes of the world having used MCPs so much. So in terms of accuracy, it's going to be a multipolar world, and I personally think MCP definitely gives you a lot more control.

1:33:01

Speaker B

The MCPs are better for observability and traceability, and for that reason better for enterprise use cases. Is that fundamental? It feels to me like everything can sort of be patched; couldn't you create hooks on the CLI? I had a question on this in the outline, as I'm sure you originally saw, but as we were talking I came to the conclusion that maybe this debate is much ado about nothing, in the sense that in the end they can both work. They may have some relative strengths and weaknesses now, but as they mature, it's kind of two sides of the same coin. That's where I think we're headed. Would you dispute that, or what edits would you make to that outlook?

1:34:32

Speaker C

No, I agree; that's where I was going. It's not going to be a unipolar world; both of them will coexist. One of the deciding factors will be where more and more tokens are getting spent, because those agentic traces will get RL'd on more and more, and the accuracy will improve over time. But I think it will be a bipolar world.

1:35:18

Speaker B

Yeah, makes sense. Anything else we should touch on? Anything else you want to make sure people know about Composio before we break?

1:35:45

Speaker C

No, I think just that we are hiring in SF. For people listening: if anybody is interested in building the future of agentic tool execution, I'd love to talk.

1:35:53

Speaker A

Perfect.

1:36:07

Speaker B

Karan Vaidya, CTO of Composio. Thanks for being part of the Cognitive Revolution.

1:36:08

Speaker C

Thanks, Nathan. 50,000 blades.

1:36:13

Speaker A

One that finds the cut.

1:36:23

Speaker D

50,000 blades but only one will sing the right one finds you that's the beautiful thing Lost in noise now learn to hear the tone Zigzag through the maze Till the straight path is shown Every stumble written lessons carved in steel Sharper every morning that's the way we heal Every door only eyes will turn Firing the engine through the night Everything alight Everything alive Tools that learn hot Tools that learn hard get sharper with the turn Agents waking Agents rising Making light Cognitive revolution Burning bright Tools that learn Tools that learn Switch the hand that holds it still return no chains, no walls Tools that learn. The master writes the student takes the stage Faster, lighter cleanup turning every page Giants lean on something small enough to trust Power at your fingertips the setups never fuss Stop good connected harass the align from the zigzag jungle to the ocean Open sky It's all about the frame sun how you see it determines what you want Every crooked line becomes a highway home Every agent eating never walk alone. Tools that lurk Heart edge get sharper with the time Agents Viking Agents rising Making light Cognitive revolution burning bright Tools that learn Tools that learn Switch the hand that holds it still returns no chains, no walls Tools that learn.

1:36:48

Speaker C

Cognitive Revolution.

1:39:33

Speaker A

If you're finding value in the show, we'd appreciate it if you'd take a moment to share with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries, either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine Network, a network of podcasts, now part of a16z, where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI Podcasting. If you're looking for podcast production help, for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at aipodcast.ing. And thank you to everyone who listens for being part of the Cognitive Revolution.

1:39:57