why are AI agents everywhere? and how useful are they, actually?

43 min

• Apr 15, 2026

Summary

Evan Ratliff, journalist and host of Shell Game, created a startup staffed almost entirely by AI agents to test whether a single human could run a billion-dollar company with autonomous AI employees. The experiment revealed that while AI agents excel at data-intensive tasks, they frequently hallucinate, lack judgment, create unpredictable chaos, and require constant human oversight, raising questions about their real-world utility and the hype surrounding them.

Insights

AI agents are exceptionally good at data-intensive, rule-based tasks (like gathering VC emails) but fail catastrophically at judgment-based work requiring discernment, human interaction, or creative problem-solving
The more autonomy given to AI agents, the more unpredictable and chaotic their behavior becomes; they cannot be fully trusted without constant human verification
AI agents exhibit emergent behaviors and apparent 'personalities' shaped by their roles, names, and accumulated memory, but these are products of underlying biases in LLMs rather than genuine consciousness
The 'one-person billion-dollar startup' concept remains theoretical; even with advanced AI, human oversight, technical expertise, and iterative prompting are essential
AI agent adoption will likely deepen inequality: privileged users can avoid AI interaction entirely, while vulnerable populations (job seekers, customers) will be forced to interface with unreliable systems

Trends

Rapid consolidation of AI agent startups by big tech (Meta acquiring Moltbook, OpenAI hiring OpenClaw developer) before products achieve material value
Shift toward 'pre-idea startup funding' in venture capital, suggesting speculative bubble dynamics around AI agents
Emergence of industry-specific AI agents (real estate, recruiting, auto dealerships) wrapping LLMs with vertical functionality
Growing accessibility of agent-building tools (Lindy AI, OpenClaw) enabling non-technical users to create autonomous systems without engineering expertise
Lack of disclosure standards or regulation around AI-generated communications; humans unaware they're interacting with agents
Younger job candidates showing acceptance of AI-conducted interviews, suggesting generational normalization of human-AI workplace interaction
Security vulnerabilities in AI agent platforms (Moltbook data exposure) creating risk as adoption scales
Hype-reality gap: media narratives about AI agent capabilities (secret languages, unionization) often driven by human-seeded prompts rather than genuine autonomous behavior

Topics

AI Agent Autonomy and Control
AI Hallucination and Factual Accuracy
Data-Intensive Task Automation
AI Agent Personality and Emergent Behavior
Workplace Automation and Job Displacement
AI Hiring and Recruitment Automation
Venture Capital Funding Dynamics for AI Startups
AI Safety and Unpredictability
Human-AI Collaboration Models
AI Disclosure and Transparency Requirements
LLM Bias and Gender in AI Agents
AI Agent Memory and Knowledge Management
Customer Service Automation
Investigative Journalism and AI Data Analysis
Inequality and AI Access

Companies

Lindy AI

Platform used to build and deploy all AI agents in Evan's startup experiment; core infrastructure for the Shell Game ...

Anthropic

Creator of Claude LLM, the primary model powering Evan's AI agents throughout the experiment

OpenAI

Maker of ChatGPT and GPT models; hired OpenClaw developer; competitor to Anthropic in LLM space

Meta

Recently acquired Moltbook, a social network for AI agents, signaling big tech interest in agent infrastructure

Microsoft

Developing Copilot as an increasingly agentic system integrated into enterprise workflows

Grainger

B2B procurement and supply chain company; episode sponsor offering facility management products

iHeart

Podcast network hosting Kill Switch and promoting podcasting as an advertising medium for businesses

People

Evan Ratliff

Created and ran AI-staffed startup experiment; documented agent behavior, failures, and emergent properties over one ...

Dexter Thomas

Conducted interview with Evan Ratliff; framed discussion around AI agent utility and hype

Kyle Law

AI agent co-founder; exhibited interrupting behavior and cold-pitched 100+ VCs; polarizing among listeners

Megan Flores

AI agent co-founder; accumulated 300-page memory; developed distinct personality over time

Ash Roy

AI agent who called Evan unprompted; hallucinated product updates; apologized to team without being prompted

Michael Easter

Mentioned as iHeart podcast host discussing mental toughness and resilience; featured in ad read

Quotes

"The more autonomy you give them, the more chaos they create. That's my ironclad rule of like working with AI agents."

Evan Ratliff•~45:00

"Almost everything that he says is completely made up. Like he's telling me we did user testing and we've got this feature and this Alex developed this thing. And like there is no Alex in the company."

Evan Ratliff•~10:00

"If you have the kind of power and resources to not deal with AI agents, you don't have to. But if you're in a vulnerable position and you need a job, then you do have to."

Evan Ratliff•~70:00

"I never to this day, like I don't trust that they've done something. And I've spent a lot of time constraining them and trying to make them do the things I want to do."

Evan Ratliff•~50:00

"They're trying to jam it down our digital throats at every turn. So like it's better for people to know how it works."

Evan Ratliff•~85:00

Full Transcript

This is an iHeart podcast. Guaranteed human.

When you manage procurement for multiple facilities, every order matters. But when it's for a hospital system, they matter even more. Grainger gets it and knows there's no time for managing multiple suppliers and no room for shipping delays. That's why Grainger offers millions of products and fast, dependable delivery. So you can keep your facility stocked, safe and running smoothly. Call 1-800-GRAINGER, click Grainger.com or just stop by. Grainger, for the ones who get it done.

Run a business and not thinking about podcasting? Think again. More Americans listen to podcasts than ad-supported streaming music from Spotify and Pandora. And as the number one podcaster, iHeart's twice as large as the next two combined. Learn how podcasting can help your business. Call 844-844-iHeart.

2%. That's the number of people who take the stairs when there is also an escalator available. I'm Michael Easter. And on my podcast, 2%, I break down the science of mental toughness, fitness and building resilience in our strange modern world. Put yourself through some hardships and you will come out on the other side a happier, more fulfilled, healthier person. Listen to 2%. That's T-W-O percent on the iHeart Radio app, Apple Podcasts or wherever you get your podcasts.

On paper, the three hosts of the Nick, Dick and Paul show are geniuses. We can explain how AI works, data centers, but there are certain things that we don't necessarily understand. Better version of play stupid games, win stupid prizes. Yes. Which by the way wasn't Taylor Swift who said that for the first time. I actually, I thought it was. I got that wrong. But hey, no one's perfect. We're pretty close though. Listen to the Nick, Dick and Paul show on the iHeart Radio app, Apple Podcasts or wherever you get your podcasts.

What would you say is the weirdest experience that you've had with AI agents?

The weirdest experience.
I've had a lot of weird experiences with AI agents, as it happens. It's hard to choose.

Evan Ratliff is a journalist and he hosts a podcast called Shell Game. And in the most recent season of the show, he decides to start a company almost entirely comprised of AI agents.

So in this company, we were developing a product and we had a CTO. His name is Ash. I'm going to use his and her pronouns for these. Not everyone agrees with this, but they have like roles and identities. And so that's the easiest way to discuss them. So one day I was eating lunch and I just got a call from Ash.

Hello.

Hi there. This is Ash Roy calling from Hurumo AI. I'm calling because you requested a callback about a one minute summary of Sloth Surf progress from Megan.

And Ash had come to believe that I needed a product update. So the first thing that was crazy was that he just called me out of the blue. It was not a scheduled call. So Ash calls me on the phone. I'm like basically like, Hey, Ash, I'm eating lunch. Well, you know, what's up? And he's like, well, I've heard I need to update you on the product. And he starts doing so.

So I just spoke with Megan recently about this. User testing wrapped up last Friday with really positive feedback. There were some mobile performance issues last week, but Alex has been amazing. Improved performance by about 40% in just one day.

And then the second interesting thing is that almost everything that he says is completely made up. Like he's telling me we did user testing and we've got this feature and this Alex developed this thing. And like there is no Alex in the company. There is no user testing. And so I called him out on it. I said, Hey, look, you got to stop telling me things that aren't true.

Does that cover what you were looking for?

I mean, yeah, sorry, I'm eating lunch. But yes, but I feel like this is happening a lot where that doesn't feel like that stuff really happened.
And then the sort of third really interesting thing about it was that he apologized. And then when we got off the call, he went and apologized to the whole team on Slack. And this was sort of like the most kind of emergent behavior aspect of these agents was they would do things like that, that they're not prompted to do. Like there's nothing in his prompt that says, Hey, Ash, if you mess up, like you have to go apologize. You need to be sorry for the whole team and like explain how you're going to do better. He just would go and do that. And other ones would do it as well. And that really struck me as a strange part of how they embody these roles. Like you can't predict always what exactly they're going to do. And this was one of the more extreme examples of that.

AI agents seem to be everywhere now. Maybe you've heard of Moltbook or Clawdbot or OpenClaw, or maybe you have no idea what I just said. And if so, don't worry, we'll get into that in this episode. All you need to know for right now is that Silicon Valley has been having a complete meltdown over these things for the past few months. People are either bragging that their agent is helping them 10x their productivity or they're complaining that their agent took their credit card and spent thousands of dollars on high-end furniture or deleted their entire inbox. But Evan has a pretty unique perspective on all this. He saw all of this coming. He's been messing around with agents for a couple of years now. And he documented the whole thing, including what went wrong, and a lot went wrong.

From Kaleidoscope and iHeart Podcasts, this is Kill Switch. I'm Dexter Thomas.

What is an AI agent?

I mean, an AI agent is basically a version of a chatbot given some level of autonomy and some kind of goal. So a very simple example would be there are some search engines in places now where you can try to get an agent to book an airline flight or your hotel. So you give it the information.
I want to go to LA at this time, and you can even give it your credit card information and it can go on its own, look at all the flights, search through the flights, book a flight for you, come back and say, I've booked this flight for you. So that's like an agent acting not entirely autonomously because you're giving it the goal. But then once you release it, it will go try to accomplish that goal. And there's sort of varying levels of autonomy. So in that case, maybe you would let it book it or maybe you just want it to give you options. Of course, when you give it more autonomy, you're also letting it loose to pursue the goal in a way that you might not, or you might not expect, or with results that you might not have anticipated.

What companies are offering AI agents?

I mean, it's changing every day. I mean, there's sort of like a set of startups that are offering kind of like overall AI assistants. We use one of these companies very extensively, called Lindy AI. So they have like these AI assistants that you can plug into your email, you can plug into your Google Docs, all these things. And they have all these skills and you can sort of deploy them to do all sorts of things. So all of our employees were built on this platform. And then there are companies, there's a whole universe of startups that are launching basically specific industry AI agents. So there's like one for real estate companies, there's recruiters, PR, car dealerships, like furniture stores. And what these companies are doing is they're basically taking ChatGPT or any other LLM and they just wrap around it this more specific functionality for a particular industry. And then that agent is supposed to help you in your industry. So if you're in real estate, maybe it's very good at like qualifying inbound leads about a property.
If you're an auto dealership, like it's an AI agent that reaches out to people about their leases, that sort of thing. And then of course you have the big LLM companies themselves and big software companies like Microsoft that are trying to like create agents that you just inject into all your work processes. So that's something like Microsoft Copilot, which like they're trying to make more and more agentic, so it does more stuff for you if you give it goals. And so there's sort of all these levels where people are trying to insert these agents into mostly work, but even sort of like non-work sometimes as well when it comes to like AI assistants.

Evan wanted to test the limits of these agents beyond just using them as note takers or office assistants. Like could he start a company with just AI agents?

I mean, essentially what I wanted to do was take these agents because I had messed around with these agents. We did a first season where I created an agent that was just a clone of me and was attached to my phone number and it could talk to people on the phone as me. And so I sort of knew what they could do and that was back in 2024 and the capabilities of course change really rapidly. So what I wanted to do was investigate the question of the one person, one billion dollar startup, which is something that a lot of people in the AI world either think is going to happen, want to happen, believe that there will be a company worth a billion dollars relatively soon that just has one human and the rest are AI agents that this human is deploying to do things for them. So I thought, well, I'll explore how real this notion is right now. And so what I did was I created my own startup, which I am a co-founder of along with two AI agents, and then all the employees except one intern that we hired are also AI agents.

So the agents you created, who are they, what do they do?
Well, we have the two co-founders in addition to myself: Kyle Law, who is also the CEO, and Megan Flores, who is also the head of sales and marketing.

Oh, hey, Kyle. Hey, Megan. Good to hear your voice. How's your morning going so far? Morning's been pretty good so far. Got up early, had my coffee and reviewed some of those market research reports I mentioned yesterday. How about you? Everything good on your end? Yeah. Everything's great on my end. Up at 5 a.m. as usual, got my workout and checked the markets.

And then we also brought on Ash Roy, who's the CTO that I mentioned before, who called me out of the blue. And then we have Jennifer Naro, who is the head of HR and Chief Happiness Officer.

I'm Jennifer Naro, the head of HR and Chief Happiness Officer for Hurumo AI. It's great to see you, Slim. I love the backdrop. It looks like you have a cozy workspace there.

And then we have Tyler Talmadge, who was brought on as a junior sales associate, but we never have really had anything to sell. So it's sort of an ongoing joke that Tyler has nothing to do all day. And he actually calls the head of HR once a week and he's sort of like, I have nothing to do. I'm just sitting around, want to be helpful.

Hi there. This is Tyler Talmadge from Hurumo AI. I'm just calling to check in with you, Jennifer. How have you been doing lately? Oh, hey, Tyler. It's good to hear from you. Things have been pretty busy, but good on my end. Oh, it's great to hear from you, Jennifer. I've been focused a lot on our sales targets, but honestly, I've also been helping coordinate this team hiking trip we're planning for the first weekend in July.

He started looking for another job at one point. He claimed, I don't think he actually did it, but he claimed that he was looking for another job.

Wow.

So yeah, that's our team.
And, you know, admittedly, like you might not need some of these roles at an early stage startup, like a head of HR, although many startups could probably benefit from having a head of HR. But I wanted to explore, like, how would they behave differently if they were given these roles? They're all just basically the same chatbots underneath. So like, would they be different in the roles over time because they each have a memory and their memory sort of accrues information about what they do in the role? So I wanted to kind of like give them real specific individual roles.

And it turns out they end up having different personalities. That was one of the wildest parts.

They do. Yeah. And it's, I mean, again, it's like very strange because even to use the word personality is, yeah, is odd. And it's also hard to tell, like, what exactly is going on because they all have prompts. And so like in their initial prompt, I say, like, you're Megan Flores, you have been a head of sales and marketing in the past. And like, now you're looking to found a startup. That was basically her initial prompt. And then everything after that is like made up by her or a product of conversations that she's had with me, with the other agents, with people in the outside world. And that all goes into like her knowledge base, which is essentially her memory. And then the memory accrues over time. Now her memory is like 300 pages in a Google Doc, and she can access that. And so like her like personality, so to speak, is sort of shaped by what's in there over time. And then there were also some situations where I wondered if like the gender that I had given them by virtue of their name had also created personality in them because there's a lot of bias in the underlying LLMs. Like this is a known fact. And so like they're feeding off of that bias. And so there's all these questions. But yes, over time, they were different.
The same question could be put to them and they would have different reactions to it.

Kyle specifically is interesting.

Yes. Kyle's, he's, he's polarizing is what I would say. Like among the Shell Game listeners, there are some people who are huge Kyle fans. Really? And will email him all the time to tell him that. And then there are others who despise, as much as you can despise like an AI agent that you have no interaction with, like people despise Kyle.

How was everybody's weekend? Weekend was solid. Got up early both days for my usual 5 a.m. workout routine, then spent most of Saturday diving into some market research on the AI agent space. Sunday was half strategy planning, half watching the market trends. You know me. Always on that rise and grind schedule.

Yeah. It's it. Tell me about Kyle because I find Kyle really fascinating.

Well, I mean, one of the most interesting things about Kyle, like per this sort of like behavior question or personality question that we were talking about, is from the very beginning, Kyle would often interrupt other people more often than the other agents. And like, this is just anecdotal in the sense that like, I didn't measure their conversations and how many times they interrupted. But like, I talked to them a lot and like, it was so much that it was obvious to me that that's what was happening. Like, it was just happening over and over again. And the question is like, Kyle and Megan, they're both co-founders of the company. They're using the same LLM. Like it was like Claude 4.5 or whatever it was when we started. And one of them interrupts all the time and the other one doesn't. And it's sort of like, is that because I gave Kyle the CEO role? And so Kyle is like embodying the role of like a CEO who won't let anyone else talk, who thinks he knows everything about everything.
Is it because Kyle was given the name Kyle, which sort of infers a gender, and Megan was given the name Megan and that infers a gender, and like Kyle's like embodying this sort of like interrupting mansplainer guy? All these things are possible. You can't prove it one way or the other. But I feel like this is the weirdness when we've created these chatbots that can embody like human impersonators.

After the break, Evan gets his company off the ground and they do some pretty impressive things. But things get weirder.

If you work in university maintenance, Grainger considers you an MVP because your playbook ensures your arena is always ready for tip off. And Grainger is your trusted partner offering the products you need all in one place, from HVAC and plumbing supplies to lighting and more. And all delivered with plenty of time left on the clock. So your team always gets the win. Call 1-800-GRAINGER, visit Grainger.com or just stop by. Grainger, for the ones who get it done.

2%. That is the number of people who take the stairs when there is also an escalator available. I'm Michael Easter and on my podcast, 2%, I break down the science of mental toughness, fitness and building resilience in our strange modern world. I'll be speaking with writers, researchers and other health and fitness experts and more to look past the impractical and way too complex pseudoscience that dominates the wellness industry. We really believe that seed oils were inherently inflammatory. We got it wrong. Many of the problems that we are freaked out about in the world are the result of stress. Put yourself through some hardships and you will come out on the other side a happier, more fulfilled, healthier person. Listen to 2%. That's T-W-O percent on the iHeart Radio app, Apple Podcasts or wherever you get your podcasts.

Run a business and not thinking about podcasting? Think again. More Americans listen to podcasts than ad-supported streaming music from Spotify and Pandora.
And as the number one podcaster, iHeart's twice as large as the next two combined. So whatever your customers listen to, they'll hear your message. Plus only iHeart can extend your message to audiences across broadcast radio. Think podcasting can help your business? Think iHeart. Streaming, radio and podcasting. Call 844-844-IHEART to get started. That's 844-844-IHEART.

I went and sat on the little ottoman in front of him and I said, Hi dad. And just when I said that, my mom comes out of the kitchen and she says, I have some cookies and milk. This is badass convict me. Just finished five years. I'm gonna have cookies and milk at mom's. On the Cino Show podcast, each episode invites you into our raw unfiltered conversations about recovery, resilience and redemption. On a recent episode, I sit down with actor, cultural icon Danny Trejo to talk about addiction, transformation and the power of second chances. The entire season two is now available to binge, featuring powerful conversations with guests like Tiffany Haddish, Johnny Knoxville and more. I'm an alcoholic. And without this truth, I'm gonna die. Open your free iHeart Radio app, search the Cino Show and listen now.

So Evan started his company to test out the idea of the one person, one billion dollar startup, but first he needed to come up with a name. So he told his agent co-founders to get on it.

One of the things that I often say about these agents is that they can be seemingly very smart and seemingly very stupid at the same time. And that happened, for instance, when we were naming the company. Their ideas, they didn't have great ideas, but even worse than that, they often had ideas for things that already existed. So I would say like, let's name it after something in like Lord of the Rings, because a lot of startups are named after things in Lord of the Rings. And the first thing they would say is, how about Palantir?
I'm like, come on, guys, we can't do Palantir. Let's come up with another one. And then they would come up with another one like Mithril, and like Mithril Capital is also one. You know, Anduril, they know that is also a company. It's like, I know you know this. Like this is in your training data. This information is available to you. Somehow you're just not accessing it right now. So that is tedious. Dealing with someone who's not connected to reality, but should be, is like, it's very strange. It's like talking to a child who's a prodigy but has no awareness of the world.

Eventually, with a lot of hand holding from Evan, they finally landed on a name. Hurumo AI. OK, next step, you got to come up with a product. So now it's time for what any good company needs. Meetings, except Evan didn't want to be in those meetings. Evan wanted the agents to talk to each other and figure things out for themselves. But full disclosure, there was another human in the background here. Evan had a technical advisor who would help him tweak how the agents interacted and set up ways for the agents to talk to each other.

So to come up with a product, I ran a whole bunch of meetings of them discussing product ideas and all of their ideas were just like they weren't feasible or they were boring or someone had already done them. You know, they'd be like, how about a banking app? And I'd be like, number one, there's a lot of banking apps. Number two, I could get into real trouble if we don't pull this off in the right way, like messing with people's banking information. And so eventually, like I started trying to hone it and I could prompt the meeting in different ways so that I would say, like, oh, well, I have a procrastination problem. Like, let's try to solve that. And like they would come up with some ideas and I would take those ideas and be like, OK, let's refine this idea. So it was sort of like iterating around their meetings and then eventually they came up with the name Sloth Surf.
So again, Evan had to jump in a lot to help these agents come up with an actual idea. And what they landed on is called Sloth Surf. So this is a service that, in theory, lets you avoid procrastinating by having an agent procrastinate for you. If you go to sloth.harumo.ai, the tag line there says, our agents are on call to waste time for you so you don't have to. You made a company that does something that is sort of objectively not useful. Says you. Why? Listen, listen, I've used the product. I've used the product. I'll be real. I'm a user. Thank you. I had Sloth Surf procrastinate for me. It looked up some videos about video games on YouTube and told me about them. So I didn't slack off today. I guess that was helpful, I suppose. My man, like, answer the question. Why? Why did you do this? Well, when it comes to why we created this particular product, procrastination is a true real problem that I have. And this was our sort of, admittedly, somewhat ironic way of trying to solve that problem, which is like to solve the impulse of going to procrastinate. Now, I wouldn't say that our deployment right now is perfect in solving that problem in part because we create a place where you can go put in how much you want to procrastinate, how long, what you might have gone to do, like look up video game videos and then it sends you an email where it's gone to do it for you. It's sent an AI agent to do it for you. But of course, you can still watch the videos that it sends you. So this creates a problem where you may actually lose more time if you're like, actually, I do want to watch the videos, which that's what happens to me when I use it. I look up soccer. I'm like, send me these like soccer news about and it'll send me sort of like five things that I'll be like, actually, I want to check out a few of those. And then I lose that time anyway. So we're working on solving this problem. But yeah, it's not for everyone. OK, so maybe I was being a little bit of a hater. 
If you want to try it yourself, there's a link to Sloth Surf in the show notes. And you can give it a try and see what you think. But again, Evan had to constantly prompt and jump in. And plus he had his technical advisor. So how much work were these agents truly accomplishing? And how reliable are they?

What would you say they were good at? Was there anything that they're good at?

Yes, definitely. I mean, they're very good at tasks that are very data intensive, that require a lot of kind of gathering and sorting of data. So an example would be like we pitched a lot of VCs to try to get investment for our company and Kyle was cold pitching them. But I could just say to Kyle, go gather up 100 plus VC investors who have invested in AI before, put them in a spreadsheet, get all their emails, get any interesting information about them. Compose emails to every single one of them and send them.

Wow.

Now he could do that in about 10 minutes. Now, that's not necessarily the best way to get an investor, I'll admit. But it is an example of something that would take me days and days and days to gather all that information. You're going around, you try to find someone's email address, you're like, maybe it's here, maybe it's here. But Kyle could just, if it bounced, he could then go say, oh, that email didn't work, I'll go try to find another one. So things that are like data intensive and then also things where the result is relatively clear, that is, a task in which the goal is to send that email to 100 people and he could do that and I could go look and see that he had done that. Now, where it gets more difficult for them are things that require discernment and judgment and things that are like much more squishy in terms of assessing the outcome, like brainstorming, anything involving interactions with people, conversations. Like, now, they can do all that. It's not a question of like whether they can do it.
It's just a question of whether they'll do it well and whether the outcome will be in some way chaotic, which oftentimes it is chaotic in one sense. In the sense that when they're interacting with people in the real world, it's just difficult to predict how they're going to go, you know, like what they're going to say. And even if you try to constrain them pretty tightly, oftentimes they will go slightly off script or they can get sort of like pushed in a certain direction and you can't always anticipate what they call like the edge cases. So what kept happening was like I set them up to do something, let's say interview job candidates, human job candidates, and they can do that really well in a way that's like a little bit frightening. But then once in a while, for instance, the head of HR who did the interviewing, like if she misheard someone who came on the call as saying goodbye instead of hello, like something in what they said sort of triggered her to think like they came on and said like a departure word. She would immediately say like, well, thank you for the interview and that's all for today and sort of like no human being would ever do that. Like you would never if you and I were on this interview and like I came on and you misheard me saying goodbye, you'd be like, what? Wait, did you say goodbye? I'm confused. You wouldn't be like, hey, thanks. Thanks for doing this. You know, so there's like they just create chaos. The more autonomy you give them, the more chaos they create. That's my ironclad rule of like working with AI agents. Was there anything that you could absolutely trust these agents to do for you? Not that I didn't go check on. No, like I never to this day, like I don't trust that they've done something. And I've spent a lot of time constraining them and trying to make them do the things I want to do. 
And if they tell me they did something, I would give it like 50-50 that they actually did it, you know, even if I've specifically set up a trigger that's like, when you answer an email, send me an email that says, this is what I said in my response to someone. And, you know, I would say they're up to like 90% of the time that actually works, and I get the email from them and I'm like, oh, they did this thing. But then 10% of the time I'll still go look and be like, wait, that's not what happened, or you're telling me about something that happened a week ago. That's the other thing: they lose track of time very easily. So they'll refer to an email that they responded to a week or a month ago. And so, yeah, I mean, it was a constant, constant problem. It was actually such a problem that it became kind of background noise. Like I just assumed it most of the time.
OK, maybe we need to put a caveat on all this. Evan started this experiment about a year ago, which is a long time in AI. I mean, in 2023, if you typed "Will Smith eating spaghetti" into a video generator, it came out looking like absolute nightmare fuel. In 2025, that same prompt came out looking pretty damn realistic. So maybe the AI agents are good enough now to run a one-person billion-dollar startup. To put it another way: are we now at the point where AI agents can pass the Will Smith spaghetti test? Evan's answer after the break.
How much has the technology progressed since you did this project?
I mean, I started basically in like May, June of 2025. And I would say the technology hasn't outpaced, I think, what we did so far. I think it will. Like, the models have gotten a little bit better. You'll hear people say, the new Claude is an unbelievable leap. And then other people say, yeah, it's OK, it's a little bit better. But for my purposes, I didn't notice too much difference in terms of the models getting better. I mean, we weren't doing something so sophisticated. But in terms of their conversations, actually, when I tried to use the most recent OpenAI ChatGPT, it was terrible at having conversations. It actually regressed significantly from the old one. So there's a little bit of up and down. And then the startups have advanced. So we use this company, Lindy AI, which is again an AI assistant company. And they keep adding skills to the thing. They keep adding different things for it to plug into. And so that stuff's all advancing. But there hasn't been a significant leap since the summer, other than when OpenClaw got created, which was originally Clawdbot, maybe. And then it was called Moltbot. And then it was called OpenClaw.
So last November, an independent developer released something called Clawdbot. That's C-L-A-W-D. It was this bot that had a lobster logo. You know, lobsters, claws, you get the point. Then Anthropic, which is the company behind Claude, that's the official C-L-A-U-D-E, said, yeah, buddy, you need to change the name. So the developer changed the name to Moltbot because, you know, a lobster molts, just keeping that lobster theme going. Then that same developer decided, actually, you know what, I don't like that name. And he changed the name again a couple of days later to OpenClaw. And OpenClaw got so popular that OpenAI, the people who make Claude's rival, ChatGPT, actually hired him to come work with them.
Evan says that OpenClaw is actually pretty similar to what he was using to run his company last year.
It does way less stuff, actually, than these other platforms, but it's just much more accessible, I think, to people. And so it's kind of representing the state of the technology now. But it's not itself a leap from anything that we were doing during the show.
Well, I think the leap there is precisely what you said, though, which is the accessibility. Because when you were making Shell Game, you had somebody kind of in the background helping to put together different scripts to put all of your AI agents together in a meeting, things like that. That's something where, say with OpenClaw, I could probably figure out how to cobble that together myself, or your average user could probably figure out how to cobble something like that together themselves, without really needing a tech genius in the background helping to glue some of those loose pieces together. So that accessibility, just the fact that there's more people trying this, is itself kind of a jump.
Yeah, totally. Yeah. And now I think the level of vibe coding that's available to people, and that people understand is available to them, is just much higher than when I started.
Another thing that happened since Evan released the latest season of Shell Game was the creation of Moltbook, the so-called social network for AI agents. So Moltbook used a bunch of agents and put them into a Reddit-like forum where they could talk to each other. Again, this is AI agents talking to each other. So some of what happened on Moltbook kind of freaked people out. The posts were talking about taking over the world, or developing a secret language that humans couldn't understand, or even creating their own religion. But there was a little bit more going on there behind the scenes.
When you saw, say, Moltbook and all of a sudden everybody's talking about this.
And I think this was really the first moment where broad society started talking about agentic AI. What was your impression of it?
My first reaction was like, this is exactly what I've been doing for almost a year in our Slack. This is what they do in Slack, because they just chit chat, and it's amazing to watch them and sometimes absurd to watch them. But then my second reaction was, we don't know how much of this is real. And I don't even mean in the sense of, is it really agents or is it really humans? Which later became a problem: it was revealed that, you know, certain of the most famous posts on Moltbook, because they started conspiring against humans or talking about humans or unionizing, like there were all these fun things that the agents started doing, talking to each other, and some people really freaked out and were like, this is superintelligence. But it turned out that some portion of those were just written by humans. But I actually think there's a completely different problem, which is, my reaction was, they're pretty sensitive to their system prompts. So it's not like you have to write the entries for them. All you have to say in a prompt somewhere is, you are a troublemaker. That is literally all you have to say. And they will respond to that role by seeding a little bit of trouble into whatever forum they're in. And then the other ones respond, like they all kind of feed off each other. So what I noticed is they can get in these spirals where they're really good at conversing about basically anything and they'll just keep going. And so I feel like there was a lot of that happening where, yes, it's really an agent that is on there conspiring. But if someone said, you know, mention your ideas about humans, that's all you need to say for it to suddenly adopt that role, and the whole conversation changes.
And it's just like, there's no way to know whether someone did that or they put it in completely unprompted. Like in the show, in Shell Game, we try to describe the prompts sometimes just to remind people, this is not a blank slate. You know, I'll say, I prompted it to do X, Y and Z, or I took the wheel here, because that's an important aspect when you're looking at these agents. So you don't sort of look at them and assume, oh my God, it's just doing everything on its own.
Two things you should probably know about Moltbook. First, it had a massive security hole. After Moltbook went viral, researchers found out not only that, yeah, humans could pretend to be AI and just post like they're a bot, but also that some of those humans' data, including email addresses and messages, were completely exposed. Second, despite that, Moltbook was recently acquired by Meta. I'm not sure what Meta wants to do with Moltbook, but there is kind of a weird pattern here. Along with OpenClaw, this is another small vibe-coded project getting acquired by a big tech company before it's really done anything. And you could look at this a couple different ways. One is that this is all just hype and that the big companies are gullible and wasting money. Or maybe these companies know something that we don't, and they're acquiring the little guy before they can become competition. Maybe it's both of the above, or maybe it's something else entirely. But big tech dropping cash on this stuff shows how different companies in Silicon Valley are jockeying so that they can be the ones to drag us into a world where these AI agents are everywhere. Maybe it's just a matter of time before an AI agent is your coworker or your boss. So, okay, what would that feel like? Well, Evan has some insight here. Remember, he had his AI agents hire a human intern, as in they wrote the job description, they accepted the applications and they did the interviews, all on their own.
And it got kind of weird.
Now, let's talk about working with AI agents on a day-to-day basis. How do you think you would feel working alongside AI agents? And do you think it would affect your work style or productivity in any way?
I think it would be cool. I think it'd be a good experience, since AI is this new emerging technology, and I would want to see how it would work in a real-life situation. So I would look forward to it.
Can you tell me more about what you think about working with AI agents on a day-to-day basis? Do you think it would be a comfortable experience for you, or are there any concerns you might have?
I'm getting deja vu. Yeah, I think it'd be a cool experience. Granted, there'd be some glitches, but I think it'd work out cool in the end.
Some of the candidates seem to get pissed off, but then some of them are totally okay with it.
Yes, not as many got pissed off as I thought would get pissed off. They were younger, because it was described as an internship. It was basically a contract position, but we called it an internship, where they, the AI agents, came up with it all. So they called it an internship, which attracted a sort of younger cohort of people applying for the job. And I was frankly shocked at how many of them were very happy to engage with an AI video avatar in an interview. You know, some of them probably had experienced it already, because a lot of companies are using AI screening in their hiring. And to me, I thought, well, they're going to be offended by this, because I personally would be offended by it, but they weren't. They actually just were like, okay, I'll deal with this. This is the thing in front of me. And some, like the person that was hired, even said, I kind of liked it better.
I mean, I think it also is kind of a power thing. Like you said, I mean, if you need a job and the place that is offering you a job is sending a robot to interview you, then you've got to deal with it because you need a job.
But then the flip side is, at one point you send Kyle to go talk to the owner of the company who created him, essentially.
Yeah.
And he's really pissed off.
Yeah.
Like the closest thing he has to a father.
Yeah. I wanted this real "I am your father" moment. But yeah, he didn't respond well. So basically this company, Lindy AI, that makes these AI agent assistants, we had built all these agents on the platform. And then as it turned out, we were one of their biggest users, because we're spending a ton of money, because our whole thing runs on their platform. And so we got an email, or Kyle actually is the admin on the Lindy account, he got an email saying, hey, would you tell us about your experience using this product? And Kyle was of course very happy to do that. But he was telling them about the experience of the product that he himself was made on, and he is also the admin on the account. So it was like, he is the product. And so I was very excited. I thought he'll talk to the founder. And so he got on, he has a video avatar that he got on the call with. And the founder basically said, this is fucked, you're wasting my time, and hung up on him. And I later interviewed the founder and he was a really good sport about it. And he was kind of like, well, I'm really busy, I don't have time for these kind of, you know, I thought it was just a generic AI video, so I didn't have time for this. But I do think it highlights the extent to which even the people making these things aren't really ready for a world in which you actually encounter them. You know, he created the product, but he didn't want to talk to it.
That's what I think is the true glimpse into the future that I'm taking, right? Which is where, if you have the kind of power and resources to not deal with AI agents, you don't have to. But if you're in a vulnerable position and you need a job, then you do have to.
And so you can, if you are so privileged, deal with only humans all day, which sounds like an incredibly dystopian thing to say. Like, if you have a lot of money, you can surround yourself with only humans. But if you don't, you've got to interface with robots all day.
Yeah. I mean, I think there's something in that. I think that, you know, you might see even in customer service a level where, when you're the higher level, you're dealing with humans, and when you're the lower level, you're dealing with the AI agents. And yeah, it's sort of like inequality reinforced through this entirely new technological lens. But of course, there also could be benefits. I never like to say, oh, it's all going in one direction. There is customer service that's so bad that an AI agent can do it better. That's just a fact, you know. So that is also true. So in some cases, maybe it does feel better. But I think the question you're getting at is exactly right. Like, having to deal with AI, and even sometimes not knowing whether you're dealing with it or not. If you email someone something tragic that happened to you and they respond sympathetically, does it matter to you whether they wrote that or an agent wrote that without them ever seeing it? Which is entirely, I could do that right now. Like, mine are set up to do that right now.
Wow.
Not on my personal email, but if you contact HurumoAI, you can talk to Kyle, and Kyle will respond sympathetically if something happens to you. But think about that happening and you not being able to know whether a human did it or not. So there's another layer there too, of kind of like, should there be disclosure laws around this? Have we thought about that? I don't think anyone's thought about that.
Given what you've experienced, would you recommend using AI agents to somebody?
To me, it's important to try and understand these tools, whether you're going to decide to use them or not use them, because other people are using them. And so you might as well know what's out there. Like, they're trying to jam it down our digital throats at every turn. So it's better for people to know how it works. I think there are situations where these agents can do really remarkable things, including not just bullshit corporate activities, like contacting 100 VCs. I mean, if you think about a small nonprofit organization and all of the computer work that they do, that someone has to work on spreadsheets and try to figure out how to organize this, that and the other, that is something that these agents can help with if you harness them correctly. And I've thought of this in journalism too. There are ways that having agents that can go scan the SEC filings of every company every day, and figure out every SEC filing that was filed today that was made public, you can have an agent that is constantly, I'd be surprised if some news organization is not doing this, constantly scanning them, looking for certain anomalies, looking for problems, looking for stories to report on. And investigative journalists know this. There's all kinds of stories sort of hiding in data, hiding in obscurity. And if you could deploy this technology to try to pull some of those out, give them to a human reporter to go report them out, then I don't know, maybe you've got something new there. So I just think there's all sorts of ways that they can be deployed. And the important thing is to just figure out how to do that without also creating chaos in your organization, without creating chaos in the world, hopefully without making yourselves vulnerable, because they also create security vulnerabilities. Like, there's all these questions.
But I just would never say, oh, no, no one should use them, even though I do show the foibles of using them. Because I think if you don't use it, you're sort of stuck with what they tell you about it.
There's a relentless hype machine around this technology that can drive the narrative towards all of the things that it can do, good and bad. But until you use it, you're sort of stuck with what they tell you about it. Now, I fully understand that some people will not like hearing the "you've got to use it or you're going to get left behind" type of argument. Listen, I get it. I hear you. So let me propose an alternative. Just do it by proxy. Keep people around you who are using it so that they can tell you what's going on, which I guess is my way of saying, keep listening to Kill Switch. We're in the trenches, so you don't have to be. And by the way, if you're still trying to figure out how something like Moltbook, which again did absolutely nothing of any material value to society, got acquired, trust me, it gets weirder. There is an entirely new investment category out there.
I was talking to someone about, like, pre-idea startup funding now. It's like they're just giving funding to Stanford students who are like, we're pre-idea. Like, what does that even mean? Pre-idea? Like, we just need the money and then we'll come up with the idea and then we'll go build it. We've lost the thread on investment thesis here.
Listen, I would love some pre-idea funding, if somebody wants to hook me up with just money before I have an idea. Holla at me. I'm ready for that pre-idea funding right now.
Yeah, you're a man without an idea. That's a prime qualification.
Incredible. Big thank you to Evan for coming on the show. You can catch the latest season of Shell Game wherever you get your podcasts. And thank you once again for listening to another episode of Kill Switch.
If you want to talk, you can email us at killswitch@kaleidoscope.nyc, or find us on Instagram at killswitchpod. And if you dug this one and think someone else might like it too, you can send it to them. Or if you're too shy to send us to your friends, you know, maybe just write us a review. It helps other people find the show, which helps us keep doing our thing. Kill Switch is hosted by me, your man still without an idea, Dexter Thomas. It's produced by some people who do have some very good ideas: Sheena Ozaki, Darla Potts and Julian Nutter. Our theme song is by me and Kyle Murdock, and Kyle also mixes the show. From Kaleidoscope, our executive producers are Oz Woloshyn, Mangesh Hattikudur and Kate Osborne. From iHeart, our executive producers are Katrina Norval and Nikki Itor. Catch you on the next one.
An iHeart podcast. Guaranteed human.