How AI digital doppelgängers could change the way we communicate w/ Synthesia CEO Victor Riparbelli
50 min
Dec 17, 2024

Summary
Victor Riparbelli, CEO of Synthesia, discusses how AI-generated digital avatars are transforming corporate communication and video creation at scale. The conversation explores the technology's potential to democratize content creation, the ethical implications of photorealistic digital humans, and how authentication and media literacy will become critical as deepfakes become indistinguishable from reality.
Insights
- Digital avatars solve a B2B painkiller problem (replacing text/email) rather than a vitamin for entertainment—companies need video at scale but can't afford traditional production
- The uncanny valley is being crossed for video; within 12 months, photorealistic avatars will feel indistinguishable from real Zoom calls, triggering a 'ChatGPT moment for video'
- Content authentication (C2PA, blockchain watermarking) will become the default trust mechanism as deepfakes proliferate, shifting from 'believe what you see' to 'verify everything'
- Human connection will remain valuable in relationship-driven roles (sales, hospitality) while being replaced in transactional roles (customer support), creating a bifurcated labor market
- Democratizing video creation mirrors previous media revolutions (music production, YouTube, podcasting) that enabled new genres and voices previously gatekept by institutions
Trends
- Enterprise video production shifting from Hollywood-style VFX to AI-generated content as cost and speed advantages outweigh quality concerns
- Moderation moving upstream from distribution platforms to creation tools, raising new questions about platform responsibility vs. user freedom
- Celebrity licensing models evolving from one-time shoots to per-use micropayments, enabling smaller brands to access talent at scale
- Media literacy becoming critical infrastructure as photorealistic synthetic media becomes the default assumption online
- Digital avatars becoming standard corporate communication tools, similar to how email became table stakes for business operations
- Real-time interactive AI avatars emerging as viable alternatives to human customer service, sales training, and professional coaching
- Provenance and content verification becoming competitive advantages and trust signals in an era of synthetic media
- B2B video content explosion as creation barriers collapse, requiring new curation and storytelling strategies to stand out
- Regulatory frameworks shifting from post-hoc moderation to pre-creation guardrails, particularly around news, politics, and sensitive topics
- Synthetic media crossing the uncanny valley, making visual fidelity less differentiating than emotional expressiveness and body language
Topics
- AI-Generated Video Avatars
- Digital Human Creation and Realism
- Deepfake Detection and Content Authentication
- B2B Video Production at Scale
- Media Literacy and Trust in Digital Content
- AI Safety and Content Moderation
- Celebrity Licensing and Digital Likenesses
- Uncanny Valley in Synthetic Media
- Real-Time Interactive AI Avatars
- Generative AI for Corporate Communication
- Consent and Ethical Frameworks in AI
- Blockchain for Content Verification
- Labor Market Impact of AI Automation
- Video Democratization and Accessibility
- Photorealism in Computer-Generated Humans
Companies
Synthesia
AI video creation platform enabling users to build photorealistic digital avatars from 3-4 minutes of video; more than 50,000 customers
Heineken
Major global company using Synthesia avatars for corporate communication and training
Zoom
Enterprise customer of Synthesia integrating digital avatars into platform
Xerox
Global corporation leveraging Synthesia avatars for internal communications
Electrolux
Global appliance company using Synthesia avatars to distribute workforce training video modules
Bumble
Dating app founder discussed using digital avatars for pre-interview matching in future dating experiences
Meta
Recently announced digital avatar tools for creators on their platforms, competing in avatar space
Adobe
Collaborating with Synthesia on C2PA content authentication and watermarking standards
OpenAI
ChatGPT and Advanced Voice Mode discussed as examples of AI crossing uncanny valley in conversational interfaces
Instagram
Referenced as example of how digital self-presentation and profile curation has normalized online personas
TikTok
Platform discussed as evolution of video-based digital identity and self-representation
YouTube
Referenced as democratizing technology that enabled new creator economy and content formats
Prime Video
Sponsor/advertiser mentioned in pre-roll ad segment
HBO Max
Sponsor/advertiser mentioned in pre-roll ad segment
People
Victor Riparbelli
Co-founder and CEO of Synthesia; primary guest discussing digital avatar technology, ethics, and future applications
Bilawal Sidhu
Host of The TED AI Show; interviewer conducting conversation with Victor Riparbelli
FKA Twigs
Musician who built a digital clone of herself to let fans interact with a version of her
David Beckham
Featured in Synthesia's early AI dubbing demo, speaking in multiple languages via avatar
George R.R. Martin
Author referenced in sponsor ad read for Game of Thrones series
Quotes
"Almost any technology we've invented for communication abstracts something away. Text being the most obvious example—if you take the experience of me talking to you in real life versus the same words in a text message, it would be very different."
Victor Riparbelli
"When you're interacting with ChatGPT, people are actually quite polite. You say please. And it's kind of weird because you're interacting with a computer that has no feelings. But because technology is now so powerful, it's very hard to feel that way."
Victor Riparbelli
"The biggest market for visual effects is actually going to be corporate communication in a couple of years, not Hollywood. No one would have thought that to ever happen."
Victor Riparbelli
"We're going to have to move from a world where if something is recorded with a microphone and camera, people assume it's true, to a world where we presume everything is fiction and work backwards from that."
Victor Riparbelli
"What if everyone could be a Spielberg? What if a film student can go out and say I have a great idea and all they need is time and a good idea? That's what excites me most about this technology."
Victor Riparbelli
Full Transcript
Prime Video offers the best in entertainment. This should be fun. Jason Momoa and Dave Bautista go all out in the hilarious new action film The Wrecking Crew. Included with Prime. Yeah, I'm pumped. Find the new Game of Thrones series A Knight of the Seven Kingdoms, based on the bestseller by George R.R. Martin. Watch with an HBO Max membership. So be brave, be just. Whatever you want to find, Prime Video. Here you watch everything. Subscription required. Some content is 18+. All terms and conditions apply. If you could jump online and be able to chat with your favorite musician anytime you like, for as long as you'd like, what would that be worth to you? What if you could connect with a personal dating coach, as often as you wished, to help sharpen up your online dating skills? Would that be appealing? Or what if you could make a digital copy of yourself and release your doppelganger to the web to take care of some of your online identity work for you? Much of this is actually within reach. Companies are learning to pair AI tech with video, audio, and animation tools to effectively mimic real people and real-ish interactions all at the same time. Musician FKA Twigs, for instance, built a digital clone of herself and uses it to let fans interact with a version of her. The founder of Bumble, the dating app, talked about how the future of dating might begin with digital avatars pre-interviewing each other. And that sort of flips the AI argument on its head a little bit, doesn't it? We've talked a lot about the potential and risks of AI becoming too human-like, but this is the reverse story. This is about human beings becoming more digital-like to become, in a sense, digital humans. If that's something you'd find useful, there's a handful of companies ready to help you create the digital version of you. One of those is called Synthesia. 
Using a short five-minute video you can record with your phone or webcam, you can build a reasonable facsimile of a human being. You can then choose a voice, give it a script, get it translated to dozens of languages, add a few design flourishes, and now you can push relatively pro-looking video content to your followers, your employees, whoever. No sets, no actors, no sweat. Many of Synthesia's clients aren't individual people. They're massive global companies like Heineken, Zoom, Xerox. Synthesia says more than 50,000 customers have built digital avatars into their comm strategies. In today's demanding market, we as team leaders need to be more than just experts at our jobs. This means that we need to be a leader, a coach, and a trainer. And we also need to embody the values, mission, and vision of our company. That probably sounds to you like a generic, typical computer-generated voice. And sure, it is, but it's also the voice of a Synthesia avatar that Electrolux, the global appliance company, uses to distribute video modules to help train its workforce. Be open and listen. Be transparent and available. Is that difficult? The tech is impressive enough that last summer, investors lifted Synthesia's valuation to unicorn status, hitting that vaunted $1 billion valuation. It seems like a lot of people are very interested and now very invested in seeing digital humans take off and take over how we communicate with each other now and into the future. But in this quest to build lifelike, useful digital avatars of ourselves, are we rewriting our understanding of what communicating human to human looks like? Who are we in a world that could soon be dominated by digital doppelgangers? I'm Bilawal Sidhu, and this is The TED AI Show, where we figure out how to live and thrive in a world where AI is changing everything. What does it mean to be human in a world of digital doppelgangers? Big, juicy, philosophical question, I know. 
But Victor Riparbelli is one of those real humans who thinks about this a lot. He's the co-founder of Synthesia. Hey, Victor, welcome to the show. Thanks, man. Glad to be here. Just to level set this conversation first, we already have so many tools for communication, and you've talked about how text was the original data compression for human communication. But now we have video calls, messaging, social media, podcasts, newsletters, emails, the list goes on. Why are digital avatars necessary? I think at its very core, almost any technology we've invented for communication kind of abstracts something away, right? Text being the most obvious example: if you take the experience of me talking to you in real life and delivering some kind of message, the way you perceive and interpret that message would be very different than if I sent you exactly the same words written in a text message. I mean, even pre-text, right? We had cave paintings and all sorts of other technologies that essentially helped us store information and deliver it to someone else in a different time and a different space. And what we've been doing since then is really just trying, as much as we can, to make these technologies appear as close to the experience we have in real life as possible. And I think we have lots of ways we've gone around that. But obviously, you know, the ultimate way of doing this is that you can replicate the actual human experience of speaking to someone. And digital humans and digital avatars, of course, are an important part of that. And on that note, I've heard you refer to your avatars as digital humans. What's the difference in your mind? I think there are a lot of different words that go around. AI clones, AI avatars, AI humans. Ultimately, I think they all represent roughly the same thing. 
If you say it's an avatar or a face or a character, that kind of implies that it's a non-human entity, whereas if you use the word human, that implies something different about it. And with the era we're living through right now, with computational intelligence improving very, very rapidly, I think the reason people are talking about digital humans now is because it actually feels tangible that we can create something that very closely resembles human life, right? Both in the real world, but I think before that in the digital world. All of us have interacted with ChatGPT and LLMs. We've seen firsthand how powerful they are, how much they can actually pretend to be a human. And if we can give them the visual expression and the audio expression of that as well, it actually does feel like we're going to get pretty close to being able to create something that feels like a digital human. Not just because we use the word, but because of how it will actually feel to interact with it, right? So next year we'll launch a real-time avatar you can actually talk to. And I think there's probably something there where that's when we begin to think of it more as a human than as just a technology. And maybe a good way to anchor that is to think about a chatbot like ChatGPT. One thing that's very interesting, and I do this myself and I think most people do, is that when you're interacting with these systems, people are actually quite polite. Yeah, definitely. You talk to ChatGPT like it's actually a coworker. You say please. And it's kind of weird, right? Because you're interacting with a computer that has no feelings, as far as we're aware. But because the technology is now so powerful, it's very hard for us, despite consciously knowing that we're interacting with a large language model, to feel that way, right? And I think that is our relationship with machines that's about to change quite dramatically. 
And digital humans are going to be the most obvious expression of that. There are two ways that a person can use Synthesia to create a digital human. They can pick from these off-the-shelf avatars that you own and build, or they can get custom avatars made of themselves. I'm curious, which is the more popular route? It's actually roughly 50-50. In the beginning, we were kind of asking, which one is the most important, right? And as time has gone on, it's very clear that there's no answer to that. They both serve different purposes. One of the things we learned very early on, when we started the company, was that one of the big reasons people loved the product so much was that they didn't have to be on camera themselves. They don't like how they sound, they don't like their accent. And so a big part of the value proposition was actually that people could make video without having to be in it themselves, right? And that was a pretty big unlock. But then it's also very obvious that there's a bunch of use cases where you want it to be yourself, right? If you're a CEO creating a video about your company strategy for next year, that's kind of weird coming from an anonymous avatar. If you're a salesperson sending out videos to your prospects or to your existing customers to update them on something that's happening in the product, it makes a lot of sense that it's you, and so on and so forth. So I think there will be many different types of use cases. And I think we'll see a mix of people's own avatars. We'll see entirely generated avatars that are specific to companies and our customers, so you can build your own kind of IP, if you will. And there are also going to be existing real celebrities; there's going to be a big unlock in terms of how they can work with brands in a much more scalable way than they could before. 
Look, even for myself, I would love to have my digital avatar, my digital human, be delegated to do a bunch of this stuff. Especially the setup process of recording a video, I think, is painful. But I'm curious about the demographic that you talked about that is super excited about not having to go through that pain, or perhaps didn't grow up with selfie culture in this world with cameras all around them. When those folks first encounter their digital avatars, what kind of reactions do you typically see? A lot of people are very self-conscious, like they would be if they recorded, you know, just a screen recording of themselves, like a selfie video. But people like it when they like the result. And I think one interesting anecdote here is, you know, in the early days of Instagram, for example, the big growth hack that Instagram employed was actually filters on images and on videos, right? It's actually very simple. You take a picture and you make it, you know, slightly more saturated, you make it black and white or whatever. 
But that makes that picture appear to look much, much better than before, right? Whereas every single image that people were taking before on their phone cameras would look fairly crappy without having someone actually edit it, which was out of bounds for most people. And so I think what we see a lot of is that it's the same with avatars. People want to touch themselves up. They want to make sure that they're being shot in a nice environment with nice lighting, that they're wearing their best clothes. They want to be the best representation of themselves. But I think in general people love it, right? Especially people who don't want to be in video: once they're happy with their avatar, it unlocks so much for them. Like executives who are otherwise asked to record videos several days a week. They now don't have to do that; they can work with their team to just create the content automatically. And then, on a personal level, it's pretty odd the first time you see your avatar. It's pretty odd the first time you hear yourself speaking a language that you don't actually speak, and it's clearly your voice. It sounds like you. And I think that's a very interesting glimpse for people into the future, right? What I love about gen AI as a cultural movement and a technology movement is that it's so accessible that all of us actually get to feel firsthand what these technologies mean. What can they do? How powerful are they? And this is just such a visceral experience of some of the things that AI can do. And I think everyone also feels like, well, this is only going to get better and better and better. Even though, of course, we've made a lot of progress, there's still so much more to go. I mean, these avatars are really cool. 
And I will say, especially coming from a VFX and CG background, you can at this stage tell that they're still an avatar. There's that whole uncanny valley question. And I'm curious, on the consumption end of this, what are the reactions like? And does the context matter there? Like if people are reacting to a video in a sales inbound email versus encountering it on a banking website versus a virtual CEO address, how do people react to these digital humans in these various contexts? So I think you nailed it there, right? It's very much about the context. I'm pretty sure that if I used my avatar to record a love letter to my girlfriend, like outsourcing it, she'd probably be a bit disappointed that I sent my avatar to do that and not my real self. But if you're a user trying to understand your mortgage application on a banking website, and you're presented with 10 pages of text with very complex information, almost everyone prefers to watch a video that just simplifies it for them, right? So what we generally see with a lot of our customers, if not almost all of our customers, is that they introduce the avatars like, hey, this is your virtual facilitator. This is not a real person. This is an avatar, and they're going to help you through the buying process. They're going to help onboard you to your company, whatever. And then what we see overwhelmingly is just that people really love interacting with these videos, especially if the alternative is text, right? We just did a big study with UCL here in London because we wanted to investigate how people actually react to these videos. There are a few interesting stats. One of them is that people actually completed the videos with avatars faster than the ones with humans. That's because when they watch the videos of humans, humans are more imperfect, right? 
Like, you know, we kind of use a few too many words, or we say something a little bit clunky or whatever. And so people scroll back in the video to watch a section again. But with the avatars, because the script has been written from the get-go, it's kind of perfect in that sense, and the information is actually more concise. And it also very overwhelmingly shows that people by far prefer to learn by watching AI videos rather than ingesting text. The stats that you mentioned make total sense to me, right? It's like you're distilling down the information and communicating it in a far crisper fashion than, say, a long meandering conversation from a human. Though, you know, some humans are more concise than others. When it comes to that CEO example, though, how important is photorealism to you? Maybe to level set, if I asked you to grade the photorealism of your avatars right now on a scale of one to 10, where would you put it? I think you have to dissect it a little bit. If you take photorealism as in, how real does it look, I think it's very close to 10. As in, if you took a still frame of the video, I think it's very difficult to tell that it's an avatar, which is in very large part due to AI being very good at rendering. I think where avatars still have a bit of a way to go is the body language matching what you say. There's a beat to what we say. So when I speak to you now, my eyebrows move in a specific way, my hands move in a specific way. We have this whole language with our bodies. And we don't notice that in the real world because all of us do this. But we notice it when we see a video of an avatar whose body language is kind of out of sync. 
So what most avatar products in the market today, not ours, but most avatar companies, usually do is take a real video of someone, loop it in perpetuity, and just change the lips. This illusion sort of works pretty well in shorter bursts, but you begin to get this weird sense where the head movement is out of tune with what they're saying, the hands don't match what's being said. And that kind of throws you off quite a bit, right? And I think there, there's a little bit to go. Our new model that we're launching soon has full body language, including hands. That makes a big difference. And then I think there are still a few imperfections in the voice. But I think the visual quality is more or less there. It's more about the last percentage of body language and emotional expressiveness in these avatars, right? What you're saying makes sense to me. So it's almost like the visual fidelity, if you just look at it that way, is pretty cool. It's kind of crossed the uncanny valley. But on the other hand, yeah, you're totally right. That emotive quality and the body language, in motion, still needs a little bit of work. And that part, again, I think the models we have in-house have more or less solved. But basically, what we've seen is that no matter how many human animators you throw at animating a digital human, we cannot animate it to perfection. And as humans, we are so, so, so sensitive to even the slightest inconsistency, right? And what's amazing about generative AI is that the old-school way of doing this is that you sit down as a human being and try to make a list of instructions for exactly how someone should move. And of course, with AI, what we're doing is the opposite way around. We're saying, we're not going to tell you what to do. 
We're just going to show you a bunch of examples of how people actually move, and you can learn for yourself what that means, right? So we don't tell the computer, hey, there are these six or seven facial bones and muscles and all those abstractions that we as humans have built to animate digital humans. We can throw those out the window and say to the machine: you figure out your own taxonomy of how the body works and how people move. And that can be a 5-billion-parameter model that a human being would never be able to sit down and comprehend. But if the computer understands it, who cares, right? It can produce an output that actually looks and feels very realistic. And I think that's what we've seen in every modality, right? AI is extremely good at this because it can think in way more abstract terms and in way more parameters and dimensions than human beings ever could, right? I love this, because what you're describing is a huge difference from the way Hollywood has traditionally done it, where it's like, you know, a crazy light stage scan where you're essentially in this dome with a bunch of lights pointed at you, or, you know, a Medusa scan where you have to do these explicit expressions. So that really makes me curious. For a lot of these off-the-shelf avatars you offer, you do capture a ton of your own training data when generating those. And of course, there's a process for folks to make their own digital twin, their own replica as well. Yeah, what does that process look like now? And what is it going to look like in the future? So right now, we need around three to four minutes of footage of someone. And that's just, I mean, you can record it with your webcam. You can record it with your phone. You can go into a studio. Today, you're still... Basically, the input is the output, as we generally say. So if you record with your webcam, you're going to get a video back. 
Your avatar is going to be you sitting, recording yourself on a webcam. If you go into a studio, it's going to be you in a studio, and so forth. The big thing we're launching very soon is being able to essentially create an avatar and then create new variations of your avatar in different environments. So let's say you've recorded one where you're sitting at home in your podcast studio, but now you actually want to record a video where you're on top of a mountain, or you're flying a plane, or you're skydiving, or you're doing a million different other things. We can then create that avatar for you, with you essentially using text to prompt yourself into new scenarios. Cool. This is going to be a big, big, big unlock. So the way it works is that we still need some video of you. And the reason we need some video of you is because if we started from just an image of you, which is basically the modality you want this to work in, right? You take a single image and from that you can generate a scene of you. Then we don't know anything about how you look, how you move, how your head moves around, right? Even my teeth, you know? Even your teeth, the way you talk, we can never infer this from just a single image, right? Because the information is just not there. But what we want to be able to do is build a model that says, this is exactly how you move and how you speak and how your hands work in conjunction with what you're saying. And once we have that model, then we can much more easily say, okay, here's a picture of you standing on top of a mountain. Here's you in the supermarket. Here's you behind a bar or whatever. And then you can begin to create these new scenes. And I think, you know, this is going to be one of those advancements that has a huge impact on what people use the product for and how much fun you can have with it. I love that. 
It's kind of replacing the whole green screen visual effects workflow, right? You just go capture it in reasonably diffused, decent lighting, and suddenly you can choose a bunch of different backgrounds. That's virtual production democratized. Before I get carried away and get too excited about that, I do have a question. If someone creates this avatar, let's say I made it, who owns it? And can I license my digital doppelganger? So you own it 100%. And if you wanted to delete it, we'll of course fully delete it. No questions asked, and that'll always be the case. We are thinking about what to do with likenesses, and whether we should create a marketplace where people can rent out their likeness to work with brands or creators. It's not a functionality we have yet. What's exciting about it is that it opens up so many new ways of using your likeness, right? So let's say that you're a celebrity, for example. The traditional way a celebrity would engage with a brand is you say, okay, Miss Big Celebrity, we're going to go into this warehouse, we're going to shoot an advertisement with you, and we're going to take a bunch of still photos. And this is then the material for all of our campaigns moving forward, right? And maybe they'll record some social media clips as well. And then you're kind of done. You've recorded all the content, and now the brand can use that. What this unlocks is, what if you have an e-commerce store and every time someone buys a product, you want to send a thank-you message from a well-known celebrity? All of a sudden, the celebrity doesn't necessarily need to do much else than just say, yes, I'm fine with this. I'll license out my likeness. And maybe instead of that being a big upfront payment, the celebrity is just paid $1 every time someone buys a product in that store, right? 
And the store can quickly switch out the celebrity with someone else if they want to try someone else. Or maybe they think that for one segment of their customers, celebrity A is the best choice, and for another group of customers, celebrity B is the right choice. And because everything here is generated with code, you can actually begin to do these kinds of things. And so what I think we'll see is actually a democratization of working with celebrities in some sense, where today you need millions of dollars and big budgets to work with a big celebrity. In this world, the celebrity would actually pick who they want to work with, right? Maybe a celebrity would prefer to work with 500 small artisanal shops all over the US that each pay them, you know, less, but in aggregate it amounts to the same as one big Coca-Cola campaign. I think that's actually pretty interesting, because my guess would be, if you asked a lot of celebrities who they would prefer to work with, they probably would prefer to work with small artisanal shops with products that they actually love rather than some mega brand that'll just throw millions at them, right? So I think we'll see a lot of new business models emerge. And I personally think that's pretty exciting. That is exciting indeed. And it brings me back to the B2B focus for your company. Given that most of your customers are businesses, you know, what are the types of things that they're using it for? And, you know, in the past, you've described this as a vitamin for the entertainment industry, but really a painkiller for businesses. Why is that? So when we started the company, as you said, we initially set out to build tooling for video professionals to be more efficient. And the first thing we did was build this AI dubbing functionality. So you take a real video. 
We did a very famous one, David Beckham, speaking obviously in English. And then we could take that advertisement and we could create it in 10 different languages. And so it looks like David Beckham, in this case, speaking in a different language. And it's definitely a very cool product. And there was a lot of interest in it, and it did okay in the marketplace, but we just had this feeling that if we disappeared tomorrow, they would find another way of solving the problem, right? It was kind of a cool thing, but it wasn't really a painkiller, right? It was a nice thing to have. And it's very difficult to build a big company around something that's nice to have. You want to sell something that people really, really need to have. And so as we went through the motions of taking that product to market, and really just trying to build an understanding of video from first principles, we suddenly had this feeling that there's a lot of people in the world who are not making video today, and they're desperate to make video. And when we spoke to those people, they obviously did not work in the video industry, right? They work in big companies. They're a marketing manager, a training instructor, a sales professional, something like that. And they're all telling us that they are desperate to make video. They have a lot of great content, a lot of great knowledge that they want to share with their customers, with their employees, but nobody reads anymore, right? They send out these emails that just end up in the archive. So they wanted to make videos. They tried to make videos. The thing, if you work in a big company, is that often there's a lot of content to produce, which means the quantity of videos you have to make is very high. There's often a need to translate them, a need to update them after you've shot them because something changes in your business. And that's just impossible to do with a real video.
And so for these people, if we can give them a way to make video which is a thousand times easier and a thousand times more affordable than shooting it with a camera, they would probably be okay with the quality of those videos being lower than what the video industry would accept. Because for these people, the alternative is not a real video from a camera. The alternative is text. And so it's like, do you compare this to a real video, or do you compare it to text? It's not like people are saying, you know, all this content we used to shoot with a camera we now make with Synthesia instead. It's people saying, well, all this text that we have, and all these slide decks and all this kind of static information, we can now turn that into video content. And that became the inflection point for us once we figured that out. And I love what you said before, because we had the same feeling, right? It's like, how weird is it that potentially the biggest market for visual effects is actually going to be corporate communication in a couple of years, not Hollywood, right? That's very contradictory. No one would have thought that would ever happen. But in many ways, I think the biggest ideas, the most impactful ideas, always feel very weird and very contradictory, right? Like Airbnb: what if people just invite strangers to sleep in their home for a bit of money? Everyone would be like, you're absolutely crazy, right? But I think that's what technology does. It challenges a lot of these inherent assumptions. And I think in our little world, this is a pretty good example of that, because ultimately what we do, to your point, is special effects, right? It's visual effects. We call it AI because we use AI, but at its core, it's not too different from what Hollywood has been trying to do for many years. It definitely is the art and science of visual effects. And I'm kind of curious, right?
Like on the consumer side, there's this short-form video fatigue, and just video fatigue. Everyone's doing video all the time. But on the enterprise side, as you mentioned, there's a bunch of this content that just would never have been converted into video form. If you take that to the limit, do you think there is a similar risk where we just end up polluting our feeds with a bunch of throwaway content? It's just going to be an onslaught of enterprise B2B video content. I think what's going to happen is that video is going to become table stakes. Today, email is table stakes, right? You don't operate a company without sending out emails. At some point, if you're sending me emails with lots of text on them, I'm just not going to open them, right? Your inbox in the future is going to look more like your TikTok feed, where you just quickly scroll through what's interesting. And as always, just like it is with email today, just because something gets easier to produce, you still have to be a great storyteller. You still have to figure out what's the right hook to get my attention, to get me to watch your video all the way through and get in contact, or whatever it is that you want me to do. I think all those things around storytelling and building a good product and being good at communicating it, none of that goes away. So what's true now is going to be true in the future: it's about curation and standing out. So we are seeing an explosion of content. And of course, every time tools like the ones that you're creating come out, people use them for misinformation and disinformation, right? And so there have been instances in the past where Synthesia avatars were used to spread misinformation. How much did those incidents push you to lock down or put rails on the abilities of these avatars?
So the safety aspect has always been very important to us. You know, since we founded the company in 2017, we did so on an ethical framework called the three C's: consent, control, collaboration. Consent is about how we never create avatars of anyone without explicit consent, and that's kind of a hard stop, which means we lose out on some virality because we don't make funny videos or satire of celebrities or whatever, right? But that's a choice we decided to make. And the second one is control, right? So that's basically content moderation, which is that we take a very strong view on what you can use the platform for and what you can't use the platform for. We're a B2B product. We work with the enterprise. And so we're probably a bit overly strict in some senses. You know, there are legal categories of content that we are very restrictive around. And we put a lot of effort, both with machines and with humans, into making sure that people don't use our platform in ways they shouldn't. I think with these incidents that happened in the past, we'll always get judged by the one video that makes it through. And we learn something from that every single time. In many ways, right, when you do content moderation, a lot of people disagree with you, no matter what direction you go in. Yeah, you're not going to make everyone happy. Exactly. And especially, of course, when it comes to things like news and politics, religion, etc. This gets very, very hairy. And no matter what you do, there'll be people who don't like it, right? And so there was specifically one of these instances which was something we discussed a lot internally. Someone made a video, and I'll leave out the details of it. But essentially it was a video about a pretty hairy topic, right?
A topic that'll divide people into either you're very pro or you're very against. And the video was actually entirely factual, but it was perceived at this one big newspaper as being kind of a piece of propaganda. And that was a very interesting one for us, because we fact-checked it, and there was nothing that wasn't factual in there. You could argue that talking about it in a specific way was kind of a ploy to make people believe something specific, but, I mean, all communication has those properties. And so what we've decided to do is just to be, again, kind of overly restrictive. So we don't allow news and current events content unless you're an enterprise customer, for example. That's actually a shame, because we had a lot of NGOs, citizen journalists, and those kinds of folks making great content on the platform, but it was just too difficult to manage, eventually, and so we decided to make that rule. So it's something we always work on. As I said, you know, we're not claiming we're perfect, but I think we have very, very good systems in place today that keep bad people off the platform. I got to say, the stance you're taking is indeed more restrictive. I hear most platform creators sort of punting this to the point of distribution, where they're like, well, the creation tool shouldn't be responsible for this; the distribution platforms should be the ones, you know, bringing the hammer down. Look, I think these questions are so difficult, right? And there's so many different ways you can think about them. You can think about them philosophically, if there's a question like freedom of speech. From a practical perspective, is this just about keeping out the bad people that we all agree are bad people? Is it an economic question? Am I hindering my growth as a company because I'm overly restrictive and leaving the door open for other competitors? There are so many angles.
It's not an easy question, right? And what we have talked a lot about is that there is a shift happening right now, specifically in AI, where a lot of companies are moving the point of moderation to the point of creation, right? With the big language models, we see this all the time, right? There's a bunch of things they just won't talk about, and they'll definitely not help you with the recipe for a bomb or something like that, but even more vanilla topics too, politics being the obvious one. They'll also be kind of tiptoeing very much around those kinds of things. In our case, it's sort of the same thing, where we actually limit you from creating the content in the first place. And I always explain this as: that is actually very new, right? Imagine that when you're using PowerPoint or Microsoft Word, it would stop you from making a slide about how to do something horrible, right? That's a very weird thought for most people. But in many ways, that's actually what we're doing and what we're building, right? And no one has ever held Microsoft responsible for the fact that a school shooter can write their manifesto in Microsoft Word, right? Or that, I'm sure, PowerPoints have been made about how to do evil, horrible things in wars and so on. But we've never seen that as being Microsoft's responsibility. We've always seen that as being the distribution platform's responsibility, once that content actually gets uploaded somewhere. But I do think that as a society, it's probably good that we're extra careful when we roll out these things in the beginning. And then maybe in 10, 15 years, we'll have a different view on how these technologies should be used and governed. But as a starting point, my own moral inclination, and the rest of the company's, is that it's good to be a little bit on the back foot and be a little bit more restrictive than what some people will feel comfortable with.
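To make the "moderation at the point of creation" idea concrete, here is a minimal sketch of a creation-time screening step that flags a video script before anything is rendered. This is purely illustrative, not Synthesia's actual policy engine: the function name `screen_script`, the category names, and the keyword patterns are all made-up placeholders, and a real system would use trained classifiers and human review rather than regexes.

```python
import re

# Illustrative restricted categories; real policy taxonomies
# are far larger and not keyword-based.
RESTRICTED = {
    "news_and_politics": re.compile(r"\b(election|breaking news|president)\b", re.I),
    "medical_claims": re.compile(r"\b(cure|miracle treatment)\b", re.I),
}

def screen_script(script: str, is_enterprise: bool = False) -> list:
    """Flag restricted categories in a script *before* rendering.

    Mirrors the rule described in the interview: news and current
    events content is blocked unless you're an enterprise customer.
    """
    flags = [name for name, pattern in RESTRICTED.items() if pattern.search(script)]
    if is_enterprise:
        flags = [f for f in flags if f != "news_and_politics"]
    return flags
```

The design point is the placement of the check: it runs in the creation tool, so flagged content never exists to be distributed, which is exactly the inversion of the Word/PowerPoint model discussed above.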
Now, building off the discussion and looking towards the future: you talked about how next year you're going to have these avatars that you can talk to in real time. There's an interesting thing that we came across. We did this episode on ChatGPT's Advanced Voice Mode, where sort of the guardrails and restrictions that are put on it almost prevent the avatar from being fully human-like, you know? If it's too much in a box, you can kind of see those seams, and that pops the illusion. How do you think about that tension, especially as you're moving towards these more expressive product experiences? I totally agree with you. And it's so deeply fascinating to me how, as humans, we're so good at detecting something that's non-human. When you talk to the voice mode, right, you understand, okay, this will help you answer practical, factual questions. And every time you ask it for an opinion, or to be a little bit human, it'll just default back to the kind of robot speech, to some extent. At some point, you know, I think these restrictions will be lifted. There's a big market and there's a big appetite for interacting with computers that feel very, very lifelike, right? So I think we will see that kind of boundary disappear over time. As for us, again, you know, we've made a decision to be a B2B company, and so we're not going to be offering virtual boyfriends and girlfriends any time in the near future. But I think a lot of those properties are also very interesting in a business context, right? For example, if you're a salesperson and you do sales training, and you can role-play with a prospect that can be programmed and prompted to act in a specific way, you could probably ramp a lot faster than if you have to read documents about how to come back from different objections. And I think there are a lot of other, and potentially also more controversial, applications of this.
Think about psychology, therapists, and doctors. I think we'll see a lot of those pop up in the next couple of years. And I think ultimately, for a lot of these use cases to really work, it has to feel very lifelike. You know, if you're interacting with a sales simulator which looks like a computer game from the 90s, you're just going to disconnect from it. It's not going to work, right? And I think right now we're very, very close to passing through that uncanny valley, where it actually will feel very, very close to having a Zoom call with a real human being. It's interesting: even with your B2B focus, you just outlined a bunch of these scenarios where the box is large enough that you can have a very meaningful, interactive experience. So I have to ask you, how far away are we from having these AI avatars that feel indistinguishable from a human conversation? I don't think we're very far, to be honest. I think in 12 months' time, you could probably simulate Zoom calls at a pretty good fidelity. I think the voice component of this is kind of getting to full maturity; there's a lot of great technologies out there. And the video part of it depends a little bit on what you're trying to simulate, but if you look at the videos that we're watching each other on right now, right, and that's a compressed Zoom feed, then that's not the most challenging thing to replicate. And you're already going to expect a whole bunch of artifacts and compression and all those sorts of things, right? So if that's kind of the goal, then I think you're not very far from it. Let me ask it in a slightly different way, especially on the visual fidelity. And to use your example from earlier, how long before you can send that digital love letter to your girlfriend and she believes it was actually from you? I think next year. Like, I really don't think it's far away.
I think, looking at what we're building right now, we have the components. We've taught a system how to predict the correct body language, facial expressions, and gestures that go with what you're saying. We can generate the voice in high enough quality that it sounds heartfelt and emotional. So I really don't think that it's more than 12 months away. And it'll be very interesting. Internally, we talk about this as the ChatGPT moment for video. I think what's so powerful about ChatGPT is that it truly broke through the uncanny valley, right? The first time you used ChatGPT, it's so human that you begin talking to it like a human, subconsciously, without even thinking about it. For audio and text-to-speech, I think we've kind of got there. And for video, I think this is getting very close. So internally, we think of this as: when you can generate a video of a vlogger on YouTube, you know, the traditional style, sitting in my bedroom, talking at you, where you can generate that in high enough quality, high enough fidelity, that you would come home after work one day and you'd put on an avatar video and just sit down and watch an avatar talk for 18 minutes, like a lot of people do with vloggers. That's when the total market for these technologies explodes by a thousand. I think when that happens, Pandora's box is open. There's going to be lots of ethical questions, lots of cultural questions, lots of questions about art and what this all means. And I think it'll be a pretty meaningful and powerful moment. So let's get into those ethical questions. I mean, it's fascinating, right? Let's say you have these photorealistic avatars that you can talk to in real time. You know, could this tech eventually replace humans completely in, let's say, customer service roles?
And how do you think about that tension, right? It's like, how do you ensure this tech enhances rather than replaces human interactions? Because the thing that keeps popping into my head is pulling up to a hotel at, like, 11 PM, and instead of a human there, there's like a fricking iPad, you know? It's multimodal. It can see me, it'll check me in and do everything. It's perfect. It can work around the clock, but there's not a human. And you're already seeing some hotels try this, where they've got, you know, essentially a remote worker playing that role right now, but eventually it'll be autonomous. And that's just one example. So how do you think about that Pandora's box opening? I think there are ultimately two types of use cases. If you're calling customer support, for example, you don't really care about who the customer support agent is, right? You just care about solving your problem the fastest way you possibly can. And I think if we replace that with an agent or a bot, no one will care about that. And I think that'll definitely happen; it's a matter of when the technologies are good enough. Then take the example of a salesperson, or maybe a hotel receptionist. I think some hotels will want to sell the cheapest room. They'll want you to have the fastest experience, just getting the key card and getting into your room. Other hotels will put a lot of emphasis on meeting and greeting you at the door, taking your luggage for you, explaining what's happening in the city this weekend, and so on and so forth, right? That's a product that's pretty heavily service-dependent. And I think for those kinds of things, we will really value the human connection. I think it's a bit the same with a salesperson. A lot of people want to talk to a salesperson because it's a relationship that you build with someone else, right? And I don't think we can replace that.
And I think that the human touch and the human element will become much more important in the future. AI is going to be much faster at replacing people typing in Excel spreadsheets all day than a waiter giving you a great experience at the local restaurant. I think that's well said. But I want to ask you, do you foresee a world where having a digital avatar is as common as having a social media profile? Meta recently announced, you know, digital avatar tools for creators on their platforms, for instance. Absolutely. I think it's just an evolution of the profiles we all have today, right? In some sense, your profile on a social media network is also a clone of you. It's maybe not as visceral as an avatar of yourself, but that is what it is, right? It's a digital representation of who you are. And if I go back to my childhood, when I was on forums, right, we'd have a username. And then in the next generation of forums, you'd have a username and a profile picture. And then you'd have a profile picture with a profile page where you could write something about yourself and your interests or whatever. And then we all graduated to social media. And now we have not just one picture of ourselves, we have a whole gallery of pictures that talks about us. And on TikTok, we have a whole library of videos that explain something about ourselves and who we are and our place in the world and so on and so forth. So I think in many ways, it's just a natural evolution of that: we will have digital personas that represent us in the digital space. So are you imagining this tech evolves to a level where, let's say, my digital self not only represents me in the virtual world, but in a sense kind of lives my virtual life for me? I don't think it's off the table. You know, I don't think that I will enjoy interacting with my friends' AIs as much as I enjoy interacting with my friend in the flesh and knowing that it's actually him.
I think it'll be, again, probably more practical. Maybe we'll have agents that say, hey, you haven't seen Simon for six months, why don't we arrange something? And I'll say, yeah, that's a good idea, right? Then my AI will go to Simon's AI and say, hey, these guys haven't met up for a while, why don't we set up something for them in a couple of months' time, right? We know that they both love listening to techno music, so let's find a concert or a rave somewhere close by and set that up for you, right? So I think, again, it's more utilitarian. I don't think it's going to be our AIs catching up on behalf of us and then giving each of us humans the lowdown on what was discussed. I hope that's not going to be the case. But those kinds of things, I think we will see a lot more of, right? And for one, as someone who has a pretty busy life, I think that'd be pretty awesome, actually, you know? But I think from a very philosophical perspective, you can argue that basically everything online is already not real, right? Your Instagram profile is not a real representation of you. We present ourselves in the best light possible. And I think our avatars and all the digital content we'll create around ourselves will probably just be an extension of that. I think what we'll have to learn, and what I can see the younger generation to some extent is learning, is that this is fiction grounded in reality, right? And I usually use the example of when you go to a dinner party, and when your parents went to a dinner party, also when you do, for that matter, but in a different time and age, right? You sit down at the dinner table and you ask them, well, how's it going? And people do exactly the same thing in real life as they do on Instagram, right? Very few people sit down at the table and say, actually, you know what? I'm really tired of my wife. I want a divorce. I hate my job.
Most people are like, yeah, it's going pretty well. We project a version of ourselves to the world. And so this idea of projecting yourself is not something that Instagram has created. It's always been the case. It's amplified, perhaps. It amplifies it and it makes it more concrete in many ways. But I think most human behavior has been the same for thousands and thousands of years. We just express it in a different way. So in this future where these digital humans are photorealistic, they've crossed the uncanny valley, what does that mean for individuality? Will we be confused by the fact, like, I can't even tell if this is, like, Victor that I'm interviewing, or you delegated your deepfake to, like, come and do the interview? And it's indiscernible to me. What is going to happen to transparency in that context, and to individuality? I think that if you look at text, you have been able to produce text and share it with anyone online for many years now. And I think by now, most of us have some sort of critical sense that just because something exists as text on the internet somewhere does not make it true. If you see a tweet from some random account saying World War IV just kicked off, or whatever, your first instinct is going to be, that's probably not true. You've got to triangulate that information with a news source, or you've got to, whatever.
Right. And I think what's going to happen now is that we're going to have to move from a world in which, in general, if someone has been recorded with a microphone or a camera, most people assume that just the fact that it exists means that it's true. That's not going to be the case anymore, right? And so it'll be even more important that all of us learn how to be literate with media. We need to look at things from different angles. Who created this piece of content? When was it created? Is this from a reputable source? And I think this technology is developing very fast. I think it's going to bring us into a world where we just, by definition, believe nothing of what we see online. We presume that everything is fiction, everything is a Hollywood film, right? And I think also that we basically go back to saying we can only trust things if they happened in front of us, if we saw them in real life. That doesn't mean we can't trust anything we read or see online. We're just going to have to be more critical, rather than presuming that just because something exists, it must actually be true, right? And I think that's actually going to be a good thing, that we just by definition think that almost everything is fake, and we work backwards from that. And there's a couple of ways we can work backwards from that. We're working with Adobe and some other tech companies on what they call C2PA, which is the idea that you fingerprint and watermark content, essentially. I think we'll move into a world where content is by default verified. So when you take a picture with your phone, when you make a video in Synthesia, when you create an image in Photoshop, you choose to register that piece of content in a global database of all the world's content. I hate the word, but I actually think a blockchain can be a good solution here, because it's immutable.
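The fingerprint-and-register idea can be sketched in a few lines. Everything here is illustrative: `ProvenanceLedger`, the record fields, and the simple hash chain are made up for this example. Real C2PA manifests are cryptographically signed, embed much richer assertions, and a production system would use an actual distributed ledger rather than an in-memory list.

```python
import hashlib
import json

def fingerprint(content: bytes) -> str:
    """Content fingerprint: a SHA-256 digest of the raw bytes."""
    return hashlib.sha256(content).hexdigest()

class ProvenanceLedger:
    """Toy append-only, hash-linked ledger of provenance records.

    Each entry commits to the previous entry's hash, so rewriting
    history invalidates every later link: a minimal stand-in for the
    'immutable' property Victor attributes to a blockchain.
    """
    def __init__(self):
        self.entries = []

    def register(self, content: bytes, creator: str, tool: str) -> dict:
        """Register a piece of content with who made it and with what tool."""
        prev = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        record = {
            "fingerprint": fingerprint(content),
            "creator": creator,
            "tool": tool,
            "prev": prev,
        }
        # Hash the record itself so later entries can chain to it.
        record["entry_hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append(record)
        return record

    def verify(self, content: bytes):
        """Look up content by fingerprint; None means 'unverified'."""
        fp = fingerprint(content)
        for entry in self.entries:
            if entry["fingerprint"] == fp:
                return entry
        return None
```

The point of the sketch is the asymmetry he describes: verified content carries its origin with it, while anything that cannot be found in the ledger "sticks out like a sore thumb" as unverified.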
When you then upload it to YouTube, or whatever your social media platform is, it will look at the content, identify it in the database of all the world's content, and say: this video was created by Victor originally in 2019. It was made with Photoshop, or with Synthesia, or whatever. Here's some information around it. We know where this came from originally. And that will move us into an internet, I think, where most content will be verified. That'll help you evaluate every single piece of content, essentially. And we'll then be in a world in which, if content is not verified, it will stick out like a sore thumb. I think you're right. We are going into a world where authenticating content will be the default, and we'll have provenance for most pieces of content that are created. Leaving aside the concerns about the technology, what is it about the potential of digital avatars that excites you most, in terms of humans wanting to interact, live, work, and play in this future? What can go right if you execute your mission correctly? I think the beautiful thing about technology is that it enables everyone to essentially have a voice, to be able to bring their ideas to life, share their knowledge with the world. The two main vectors there are, of course, distribution, which is that you can share the content once you've created it. And the other one is creation, right?
And I think we've seen in many modalities how powerful it is when you allow more people to create. If you look at more recent examples, just in my own life, you know, I love music, and I've seen firsthand how the fact that we've been able to produce digital instruments and sample things has led to new genres like electronic music, house, and techno, for example. That wouldn't have been possible with real instruments. And more recently, camera technology becoming very accessible, like YouTube, and, I mean, podcasts like the one we're doing right now: those are essentially formats that didn't exist before we invented technologies that massively democratized creation. And so for me, the promise of all this is, well, what if everyone could be a Spielberg, right? What if anyone, a film student, can go out and say, I have a great idea, and all they need to realize it is a lot of time and a good idea, right? There'll be a whole bunch of content, as we've discussed, that's never going to be watched by anyone, that's going to be crappy content. But there will also be a film student from some small country in the world who manages to produce amazing art, despite not being connected to Hollywood. And I think that's really the thing that excites me the most. It's like freeing creativity. Culture and art are such an important part of moving humanity forward, of creating peace in the world, of bridging all the gaps that we have between us. And I think that's going to be a massively positive thing for the world. We've already seen it play out in many other types of media, and getting video there as well is going to be, I think, transformational for the world. Love it. Victor, thank you so much for joining us. Thank you. Victor Riparbelli is the co-founder and CEO of Synthesia. And yes, I'm quite sure I spoke with the real Victor, not his digital twin.
Though in a year or two, even that certainty might be up for debate. What fascinates me is how we've inadvertently paved the way for digital humans through our everyday tech compromises. I mean, think about it. We've grown completely comfortable with grainy video calls, audio glitches, and awkward Zoom delays. These imperfections have actually created the perfect landing pad for digital avatars. We're already operating in a world where good-enough video quality is, well, you know, good enough. But what Synthesia shows us is that this isn't just about making believable digital humans. It's about transforming how we create and share ideas at scale. When I started making videos, it meant countless hours of shooting, reshooting, and painstaking editing just to get a simple message across. Now we're approaching a world where anyone with an idea can spin up a video presentation in minutes, in any language, with any number of perfectly delivered takes. And that power to create is incredible. But it also means we're racing towards a fascinating cultural crossroads. Soon, everything we see online might come with its own digital birth certificate, a verified chain of creation that tells us exactly where it came from and how it was made. It's like we're building a new trust architecture for the digital age. In a world where anyone can create any video featuring any person saying anything, maybe what becomes most valuable isn't the tech that makes it all possible, but the story underneath it all. The TED AI Show is a part of the TED Audio Collective and is produced by TED with Cosmic Standard. Our producers are Dominic Girard and Alex Higgins. Our editor is Banban Cheng. Our showrunner is Ivana Tucker. And our engineer is Asia Pilar Simpson. Our researcher and fact-checker is Christian Aparta. Our technical director is Jacob Winnick. And our executive producer is Eliza Smith. And I'm Bilawal Sidhu. Don't forget to rate and comment, and I'll see you in the next one.