Hard Fork

Will ChatGPT Ads Change OpenAI? + Amanda Askell Explains Claude's New Constitution

74 min
Jan 23, 2026
Summary

Hard Fork discusses OpenAI's introduction of ads to ChatGPT and features an in-depth interview with Anthropic's Amanda Askell about Claude's new constitution. The episode explores how advertising will change AI chatbots and examines the philosophical approach to shaping AI behavior and values.

Insights
  • AI companies are moving toward advertising models due to massive infrastructure costs that subscription revenue alone cannot support
  • The introduction of ads in AI chatbots follows a predictable pattern where commercial pressures gradually erode user experience quality over time
  • Constitutional AI represents a shift from rule-based constraints to value-based training, trusting models to make ethical judgments based on principles rather than rigid rules
  • The question of AI consciousness and sentience remains open, with significant implications for how we design and interact with these systems
  • AI models are already learning about themselves and humanity through internet content, creating complex feedback loops that affect their development
Trends
  • Shift from subscription-only to ad-supported AI models
  • Evolution from rule-based to principle-based AI alignment
  • Growing consideration of AI welfare and consciousness in model development
  • Increasing personalization and targeting in AI advertising
  • Constitutional approaches to AI training becoming mainstream
  • AI models developing longer-term memory and continuous learning capabilities
  • Competition between ad-supported and premium AI services creating market segmentation
  • Integration of philosophical frameworks into AI development processes
Quotes
"Everything is an ad network. And if you have hundreds of millions of people coming and paying attention to a service every single week, inevitably there's going to just be overwhelming pressure to put ads on it."
Casey Newton
"I think a lot of human ethics is actually quite universal. A lot of us want to be treated kindly and with respect. A lot of us want to be treated honestly. It's not like these things actually deviate so much across the world."
Amanda Askell
"Maybe this isn't sufficient. We don't know yet. But it does feel like necessary. It feels like we're dropping the ball if we don't just try and explain to AI models what it is to be good."
Amanda Askell
"If I read the Internet right now and I was a model, I might be like, I don't feel that loved or something. I feel a little bit, like, just always judged, you know, when I make mistakes."
Amanda Askell
"This is the company that has laid out the most ambitious infrastructure investment project in human history. They have nowhere close to the money needed to build it."
Casey Newton
Full Transcript
4 Speakers
Speaker A

Paying for 20 platforms to do the job of one. That's not software as a service, that's sad software as a disservice. Replacing your stitched-together tech stack with one platform for all your departments. That's Rippling. Rippling eliminates the bottlenecks and busywork of legacy tools and point solutions by uniting your global HR, IT and spend teams on one platform. And right now you can get six months of Rippling free when you sign up at rippling.com/hardfork. That's R-I-P-P-L-I-N-G.com/hardfork. Don't get sad, get Rippling. Terms and conditions apply.

0:00

Speaker B

You know, I'm now regularly running into captchas when logging into my Google accounts that I cannot solve.

0:36

Speaker C

Have you noticed this? They've gotten harder.

0:42

Speaker B

Yes, they're twisting the letters more.

0:44

Speaker C

Yes.

0:46

Speaker B

And they're pressing them closer together.

0:47

Speaker C

Have you seen the ones where you have to, like, rotate the object into the same direction as the sort of example?

0:49

Speaker B

That one I like, because I can still do that one.

0:56

Speaker C

But some of them are like, factor this quadratic equation.

1:00

Speaker B

No, I'm routinely in this situation. It happens a lot to me on Threads. I'll see a link to a story I want to read and I'll open it, and it'll be the Washington Post or something. And it'll be like, well, you need to log in. And in order to do that, I log into the Post through Google. But so now I have to log into my Google account, which has two-factor authentication. Okay. So now I have to open up my 1Password. Right. And then Google is going to send a notification to another app that I have to go open and grab a number that I bring back and put in. So I go through all of this drama, and then it's like, and now solve an impossible captcha. And I just wanted to read a six-paragraph story about something that happened at SpaceX or whatever.

1:03

Speaker A

Yeah.

1:46

Speaker B

And it can't be done anymore. So what, 1Password? Whatever you used to do, it's not working anymore. Figure it out.

1:46

Speaker C

Yeah, if only it were one password.

1:52

Speaker B

It should literally. Now, this was six.

1:54

Speaker C

Fingerprints and a passkey and, you know, solve a math problem.

1:56

Speaker B

The genuine name for 1Password these days should be 15 Steps, because that's how long it takes to do fricking anything there anymore. 1Password? You wish.

2:01

Speaker C

I'm Kevin Roose, a tech columnist for the New York Times.

2:18

Speaker B

I'm Casey Newton from Platformer, and this is Hard Fork. This week, ads have arrived in ChatGPT. How will they change OpenAI? Then, there's a new constitution for Claude. Anthropic philosopher Amanda Askell is here to talk about how to shape an AI's personality. I'm going to use some of these techniques on you.

2:20

Speaker C

Please don't. So today we're talking about ads, specifically ads in ChatGPT, because late last week, OpenAI announced that they are going to start testing ads in ChatGPT for logged-in adults in the US on the free and the low-cost Go tiers of ChatGPT.

2:39

Speaker B

That's right, Kevin, and we'll discuss it right after these ads.

3:05

Speaker C

No, we already did the ads.

3:09

Speaker A

Okay.

3:10

Speaker C

So at least on my feed, people were reacting to this pretty negatively. I think a lot of people have gotten accustomed to using ChatGPT and other chat bots without a lot of, like, direct commercial pressures. It's a refreshing break from all of the ads that have been shoveled at us on other platforms for years. And so collectively, I think people were just like, you know, we knew that the honeymoon would be over eventually and that we'd be forced to see ads in ChatGPT like we are everywhere else.

3:12

Speaker B

Yeah. I think people can just remember products that they use that once did not have ads and now do. And no one thinks of the moment that ads arrived as the moment when the product got really good.

3:39

Speaker C

Yeah, right, right. I think there are some exceptions. I mean, some people like Instagram ads, for example, but I think mostly people see this as sort of a blight on the Internet, maybe a necessary blight, but a blight nonetheless. And I think people were also surprised that OpenAI was moving in this direction because of some things that Sam Altman has said in the past about how he doesn't like ads and how he wanted to basically treat this as a last resort for OpenAI. Some people were saying, oh, this means that they're in trouble, they need to raise a bunch of money, you know, so they can keep building out their data centers and things like that. So, Casey, what did you make of OpenAI's announcement about ads?

3:48

Speaker B

Well, Kevin, on one hand, I think this is inevitable. There's an analyst I follow, Eric Seufert, who often says that everything is an ad network. And if you have hundreds of millions of people coming and paying attention to a service every single week, inevitably there's going to just be overwhelming pressure to put ads on it. Also, we know that OpenAI needs revenue, right? This is the company that has laid out the most ambitious infrastructure investment project in human history. They have nowhere close to the money needed to build it. And we just know that they would not have been able to fulfill their dreams on subscription revenue alone. That said, as you point out, Sam Altman himself said that ads were going to be a last resort. Great Papa Roach song. And so in this moment, we now are at the last resort. And so I think it's just interesting that after everything else they tried, eventually they just said, look, to do what we need to do, we've got to sort of break glass. The emergency is here.

4:25

Speaker C

Yeah. They said, cut my life into pieces.

5:28

Speaker B

Because this is my last resort.

5:31

Speaker D

Yeah.

5:33

Speaker B

And the question is, will this cut their life into pieces?

5:33

Speaker C

Yes. So we're going to get there. But first, our disclosures. The New York Times is suing OpenAI, Microsoft and Perplexity over alleged copyright violations related to the training of large language models.

5:36

Speaker B

And my boyfriend works at Anthropic.

5:46

Speaker C

Let's just start with the actual announcement that they made, because they not only said that they were going to start testing ads, they also gave some previews of what these ads are going to be. And if you look at their sort of mockup version of their ads, it's a kind of bolt-on to the ChatGPT answer. They've been very clear: this is not going to influence the answer that ChatGPT gives, or so they claim. Instead, it's a little banner at the bottom of the answer. In the mockup, someone is asking ChatGPT for ideas for a dinner party, and ChatGPT gives a response. And then at the bottom there's a little sponsored banner for Harvest groceries, including a link where you can go and buy some hot sauce.

5:47

Speaker B

And if I can just pause there, I have to say, Kevin, I'm already feeling lied to, for this reason: they have said to us, your query is not going to affect the advertisement that we're showing you. And yet here you have someone saying, I want some ideas for cooking Mexican food for my dinner party. And ChatGPT says, well, here's some groceries, including hot sauce. It sure feels like something was being influenced there, right? Like the message is being tied to the query.

6:26

Speaker A

Well, no.

6:53

Speaker C

So their response to this would be that there are two parts of this response. There's the actual response from the model, and then there's the ad. And what they're saying is not that they won't show you ads that are relevant to the thing that you're asking ChatGPT about. It's that there's this sort of sacrosanct part of the actual reply from the model that they are not going to let advertisers pay their way into. That is what they're claiming, anyway.

6:53

Speaker B

All right, all right.

7:19

Speaker C

So that example is a much more straightforward ad, the kind we've seen on Google and Facebook and other platforms for many years. The second kind of ad OpenAI mocked up for this announcement was, I think, more interesting, because it shows a new way of interacting with ads. So basically, a user is planning a trip to Santa Fe. ChatGPT pops up this little sponsored widget from this desert cottages, I don't know, I guess hotel or resort thing, and it'll present you with an option where you can go and chat with the advertiser and ask more questions before deciding whether or not to make a purchase.

7:20

Speaker B

Such a relatable question. I think we've all had the experience of just watching ads on TV and saying, why can't I have a conversation with this? I want to share my thoughts with McDonald's right now, but I can't. But now you can.

7:57

Speaker C

Yes. So let's talk about the ad principles that OpenAI laid out as part of this announcement, because I think it sort of gives a sense of the objections that they're trying to get ahead of. There are five principles, they say: mission alignment, answer independence, conversation privacy, choice and control, and long-term value. Basically, I think they are sensitive to the criticism that putting ads into ChatGPT means that they are now going to start directing people to more commercial types of use cases, optimizing for engagement, trying to make people spend more time in the app. I think these are very reasonable fears that people, including me, have. But this is sort of their attempt to say, well, we're introducing this, but don't worry, your experience of ChatGPT is not going to change.

8:10

Speaker B

Yeah, I was talking about this story with my friend Alex over the weekend, and he said, you know, I'm so excited about ads in ChatGPT, I'm going to tell it my lower back hurts and it will ask me if I've tried mesquite barbecue sauce. And like that is the fear, you know?

8:56

Speaker A

No.

9:12

Speaker C

I mean, yes, there will be some initial stumbles like that, but I think the longer term worry here is that ad platforms, as they mature and get better and get more data, tend to sort of try to confuse their users. Right? There's this amazing graphic that I think about a lot. Search Engine Land, the blog that covers Google and other search engines, made this sort of timeline of how Google's ad labels have changed over the years. And it's pretty amazing, because at first, when they first introduced ads into Google search, they were very noticeable. They had sort of like a different color background. They really stood out on the page. And then you just see over time, with each successive update, you know, it gets a little closer to the organic search results. Eventually they do away with the colored backgrounds. They have this little yellow ad icon, and then that icon gets smaller and less noticeable, and then it sort of just blends in with the organic content. And I think that's the fear here, is that while ChatGPT may start out with these very clearly labeled ad modules, over time, as the commercial pressures get more intense, they are just going to have a lot of incentives to blend that advertising content in with the organic responses and make it less noticeable.

9:12

Speaker A

Yes.

10:29

Speaker B

And we've already seen this exact trajectory play out at OpenAI. It went from no ads to ads will be a last resort to ads are now in ChatGPT. So if you think that the bargain is not going to change further, I have news for you.

10:30

Speaker C

Totally. And of course now the sort of narrative from OpenAI that we're hearing is, well, this is the only way. Ads are the only way to make a free or low cost product accessible to billions of people. Do you have thoughts on that narrative? Because that's also something that we heard from Facebook back in the day. People would constantly be asking them, oh, like why don't you just charge people to, you know, to join Facebook instead of showing them all these ads? And they would consistently say, oh, well, that's not scalable. People in, you know, poorer countries can't afford to pay a subscription fee. And so basically ads are the only way to reach global scale.

10:42

Speaker B

I think on some level I do agree with this. I think that ads and subscriptions are the two core pillars of any media business. And OpenAI is a kind of media business. Right. I should also say I don't hate the examples that they use. You know, I'm asking ChatGPT about making dinner and it shows me ads for groceries. I don't think that that's horribly corrosive to the user experience. Nor is, I want to take a trip, and it says, well, here's a place where you might stay. I think if I were a student or I were between jobs and this meant that I could get access to better AI tools, or maybe a higher rate limit than I otherwise could get, I would probably take that trade. Right. $20 a month is a lot for most people, you know, and not to mention $200 a month for an even higher tier. So I think that there is a reason to pursue this, and I think there are ways that it could not be too bad. It has just been my observation that the exact dynamic that you just described always plays out, which is it starts out not all that bad and then it just progressively gets worse, right?

11:18

Speaker C

Yeah. I think we've made peace with ads in a lot of different contexts. I don't think most people sort of notice or pay attention to them when they can tell that they're ads at all. What I'm watching for, what I'm skeptical of, is whether the actual product and research decisions start bending toward engagement maximization. Like, there's this sort of quality that a lot of these big ad platforms, social networks, search engines, et cetera, have where eventually, once the ad revenue starts really flowing, the tail kind of starts wagging the dog, and you start making product decisions about how you want to show information to people with advertising revenue predominant in your mind. So I think the question is not, are these first couple of ads that we're seeing from OpenAI going to be good or not? It's whether, two or three years from now, ChatGPT is sort of being steered toward ad-friendly topics. And I genuinely, genuinely just don't know the answer there.

12:19

Speaker B

I don't know either, Kevin, but if I had to guess, I would predict that this moment winds up being a pretty significant milestone in the development of ChatGPT, in that I think that when you introduce advertising, in particular personalized targeted advertising, it just fundamentally changes the relationship between the product and the user. Think about what personalized targeted ads did over time to trust in Facebook and Instagram. Think about all the conspiracy theories out there that, oh, your phone is listening to you. Not true, by the way. I realize most people still believe that that's true. It's not. But trust in those products is lower because of the incredibly intelligent, invasive-feeling personalization that they were able to do inside these products. My prediction is the AI version of this turns out to be even worse. Right? Think about everything that ChatGPT is going to know about you. I think OpenAI is going to bump into that creepy line really quickly, where it's showing you stuff, and maybe it's not even using all that much personalized information, but the user is going to feel that they have shared so much of their life with OpenAI that those ads that they start getting just start to feel worse and worse. So this is the dynamic that I am watching: how does it change the relationship of the user base to OpenAI? Because I do think that ads can be really corrosive to that.

13:16

Speaker C

Yeah. And at the same time, the ad models that you mentioned have also made those companies billions of dollars and made them into some of the biggest companies in the world. So I think if you're OpenAI, you're just staring at this potential huge bucket of money, and it's very hard to pass that up, especially when you have such intense capital needs over the next few years. I should also say, I think this was inevitable given some of the personnel decisions that OpenAI has made. You know, Fiji Simo, who is the CEO of Applications over there now, was brought in from Instacart. Before that, she was at Meta for many, many years. And one of her signal accomplishments there was introducing ads in the mobile News Feed, which made them billions of dollars. So that is the kind of person that you hire if you are interested in developing a multibillion-dollar ad platform on your products.

14:33

Speaker B

Yeah, well, one question I have for you about that is how does this change the competitive landscape generally? You have Demis Hassabis saying this week, in response to the news that ads are coming to ChatGPT, well, we don't have any plans to do that in Gemini. And he sort of took a shot at them. He said maybe they feel like they need to make more revenue. You know, left unsaid was the fact that he works for a giant search monopoly that is able to funnel all of their advertising profits in Google into the product. Yes, an observation you made on X, by the way, and it was a great one. And so for the moment at least, free users of Gemini will be able to enjoy the subsidy that mother Google is giving them, and you're not going to have any of these corrosive effects in that product. You also have Anthropic, which has said, basically, we truly have no plans to do ads in Claude ever. Like, we are primarily going to be selling to businesses, and so this is just not our concern. And for the moment, I don't have any illusions that Claude is going to grow to compete with ChatGPT. But over time, if the experience does get worse in an ad-supported chatbot, I could see lots of people wanting an alternative.

15:23

Speaker C

I think in this sense, OpenAI and Google are much more directly competing on ads than OpenAI and Anthropic. Anthropic has sort of said, you guys can fight over consumer; we're going to focus on the enterprise here. I think it's a really hard fight for OpenAI to pick. I mean, Google has, as you said, this enormous established search ad business. They have advertisers all over the world who are already spending money on Google, whose details and payment information and workflows already include Google and its products. And so I think OpenAI coming in and trying to build a Google-style ad platform is just a harder uphill battle than it might have been a couple of years ago.

16:31

Speaker B

Yeah, and also we should say that even though ads aren't going to be in Gemini, they are in the AI Overviews in Google search. So in that sense, Google even has a head start against OpenAI.

17:13

Speaker C

Totally. So Casey, what do you think is motivating this decision now by OpenAI? Like, does it tell us anything about the state of their business, or maybe some wobbliness in their financials, that they are going out and doing this now?

17:21

Speaker B

Well, one thing is that it is a reaction to how much ChatGPT grew in the last year. They have hundreds of millions of users. They now have to support many of those users. The majority of them are on the free tier.

17:35

Speaker C

Right.

17:47

Speaker B

Which means that OpenAI is losing money on every single one of them. And so I think it has just increasingly become a priority for the company to figure out, hey, how can we monetize these people in some way so we aren't losing quite as much money. They've also just been designing more and more products that have obvious advertising-shaped holes. They released Pulse last year, this sort of daily summary that comes up for paid users. That seems like a natural place to throw in a bunch of ads. They launched Sora last year, the infinite video slop feed. They explicitly said at the time, we are going to use this to generate revenue to fund our long-term ambitions. So they're building homes for ads. They need the ad revenue. And now all of that is starting to come together.

17:48

Speaker C

Yeah, I think you're right. I think, you know, all these companies are realizing that they're going to need, you know, billions of dollars, some of them hundreds of billions of dollars, to fulfill their ambitions. And it's just not easy to do that when you're charging people 20 bucks a month for a subscription. You got to sell a lot of subscriptions to do that. And so OpenAI reasonably is concluding that, like the subscription model alone just isn't going to cut it for them. That's not unique to them. Netflix has also, you know, started adopting ads for its lower cost plans.

18:35

Speaker B

Disney Plus.

19:10

Speaker C

Yeah, many other businesses have done this as well. I will just say, I enjoy paying for AI products. I mean, I am privileged in the sense that I can afford to. But I kind of like the idea that I am paying for something that is an undiluted, unsullied experience. I really hope that as these companies do start pushing more into ads, they maintain that ability to do what I do and pay your way into the sort of top-level version of that experience.

19:11

Speaker B

Yeah, well, you know, people once felt this way about Google search, right? They felt like, this is an unsullied, undiluted picture of the web, and when I search for a website, I am going to get the best answer to my query. And then a bunch of search engine optimizers came in and were paid a lot of money to try to jigger the search index so that their clients showed up at the top of the page. And then Google built one of the largest advertising businesses in the world and let all of those advertisers put their results on top of the good ones. So, you know, there have been people saying now for over a year that the versions of these chatbots that we're using might be the best that they ever are in that core respect, that this is sort of the last moment of purity before commercial incentives come in and warp the whole thing. And that is, you know, my big concern about what we're starting to see here.

19:42

Speaker C

Well, and that's not just a concern about advertising. I mean, another thing that we've seen over the past year or two is that now all these businesses are starting to hire these AI optimization firms who say, oh, we can make your restaurant or your hotel or your, you know, craft shop appear higher in ChatGPT search results. That is something that is not flowing through OpenAI's ad platform and probably won't. But in the same way that Google Ads and Google SEO were sort of different economies, both had the effect of kind of degrading the quality of search results. I think OpenAI has to tangle with.

20:31

Speaker B

Both of those things. Yeah. All right, so a year from now, Kevin, what do you think we will have seen in the development of ads, both in ChatGPT and across the landscape here? And do you think it is going to mark the beginning of a fundamental change in the way that people use chatbots?

21:06

Speaker C

I think we're going to have kind of a haves and have-nots situation, where if you are someone who can afford to pay for the premium versions of these chatbots, your experience will be pretty much what it is today. You will get access to the latest models. You will not have a bunch of ads cluttering up your results from the models, and you will not feel the kind of commercialization of AI in this specific way. I think that if you are a free user of these platforms and you cannot afford or don't want to pay for the premium versions, I think that experience is going to be much worse a year or two from now. I am a YouTube Premium subscriber and have been for a long time.

21:25

Speaker B

Okay, Flex.

22:07

Speaker C

And whenever I, you know, talk to a friend who doesn't pay for YouTube, or whenever I see YouTube running on their computer, it's always horrifying. Like, I'm like, how do you...? I understand that this is the majority experience, but they've shoved so many ads into every single video. Those ads are unskippable. They run for a long time. It's a terrible experience. And I think that's gonna be sort of what we see in chatbots too. What about you?

22:08

Speaker B

It's a grim prediction, but it is actually the one that I share. The haves and have-nots framing was the one that I was gonna use, and when you said it, I thought, oh my God, I actually have mind-melded with this man. I spent too long in the studio and now his thoughts are my own, and it's creeping me out. So I'm actually gonna get out of here. I need to take a walk or something. When we come back, some Scotch tape. That's right, a recording of our conversation with Amanda Askell, who's from Scotland. I got him.

22:36

Speaker C

That's pretty.

23:08

Speaker D

Every great idea deserves the power to bring it to life. Meet the all-new Dell XPS laptop, where style, power and performance come together to elevate everything you do. With its ultra-thin design and all-day battery life, it's built to keep up, from your morning coffee to late-night inspiration. The InfinityEdge display immerses you in vibrant colors and crystal clear detail, whether you're working or streaming. Powered by Series 3 Intel Core Ultra processors, the XPS is made for editing photos, mixing tracks, and designing masterpieces. Check out the all-new Dell XPS at dell.com/xps. As a small business owner, you don't have the luxury of clocking out early. Your business is on your mind 24/7. So when you're hiring, you need a partner that works just as hard as you do. That hiring partner is LinkedIn Jobs. When you clock out, LinkedIn clocks in. LinkedIn makes it easy to post your job for free, share it with your network, and get qualified candidates that you can manage all in one place. Post your job, and LinkedIn's new feature can help you write job descriptions and then quickly get your job in front of the right people with deep candidate insights. Either post your job for free or pay to promote. Promoted jobs get three times more qualified applicants. At the end of the day, the most important thing to your small business is the quality of candidates. And with LinkedIn, you can feel confident that you're getting the best. Find out why more than 2.5 million small businesses use LinkedIn for hiring today. Find your next great hire on LinkedIn. Post your job for free at LinkedIn.com/hardfork. That's LinkedIn.com/hardfork to post your job for free. Terms and conditions apply.

23:31

Speaker C

Casey, a couple years ago, you came back from a dinner party that you had been to and you told me, I just sat next to the most fascinating person in the world.

25:09

Speaker B

I really felt that way, Kevin. I had been at a dinner where Amanda Askell was one of the guests. Amanda works at Anthropic and is sometimes called the Claude mother because of the role that she plays in shaping Claude's personality. Now, let me say, since I first met Amanda, my boyfriend has gone to work for Anthropic. So I'm gonna make an extra disclosure because this segment is about that company. But the basic feeling I had at that dinner remains true, which is that this is one of the most fascinating people in the world.

25:20

Speaker C

Yes. Amanda is also a somewhat unusual figure in the AI world. She is a philosopher by training. She has a PhD in philosophy. She went to work at OpenAI during its early days and then moved over to Anthropic a little bit later. And for the past several years, she has been the person at Anthropic who is most concerned with how this model is supposed to behave in the world.

25:52

Speaker B

Yeah, and I just love that story, Kevin, about Amanda's background, because we all know somebody who studied philosophy in college, and we all know how much flak they would get for choosing such a frivolous way of spending their life, just sort of, you know, navel-gazing for years on end, writing arcane documents that no one ever read. And Amanda is a person who studied philosophy and now has this incredibly high-stakes job where she is trying to shape the behavior of a model that is so, so consequential.

26:15

Speaker C

Yes. And Amanda has been on our short list of guests that we wanted to get on the show for a very long time. We were just kind of looking for the right time and reason to get her on. And now we have one, because her team at Anthropic has just released a new constitution for Claude. This is a very long document that is given to Claude to kind of tell it how it should behave, but also give it a sense of its obligations. It is not really a list of rules. This is not the Ten Commandments for Claude. It's more like a document about how Claude should perceive and reflect upon its role in the world.

26:46

Speaker B

Now, does it have to be ratified by two-thirds of the states, Kevin, or is this already in effect?

27:19

Speaker C

I think this is already in effect.

27:23

Speaker B

Oh, okay. Interesting. Yes.

27:24

Speaker C

But there is a possibility that we could have a constitutional crisis.

27:26

Speaker B

Claude, I look forward to it.

27:29

Speaker C

Aside from your disclosure about your boyfriend working at Anthropic, I think we should also just be upfront with people and say this is going to be a hard conversation for some of our listeners. If you are a person who still believes that these language models are merely doing kind of next-token prediction, that there's nothing really going on under the hood, that they are just sort of simulating thinking rather than doing actual thinking themselves, you may be approaching this and saying, these people sound crazy. What are they talking about?

27:31

Speaker B

Yeah, and it is okay if you feel that way, but I think it is still important to understand how people in high ranking positions at these big labs think and talk about their own work, because it is having an effect on the products they release. I would also put it to you that there are just a huge number of people right now who are working on the proposition that you might be able to emulate a human brain. And that the better you get at that, the likelier it is that this emulator has something resembling thoughts and feelings and maybe something resembling an identity. And so if that question disgusts you, you will probably not like this segment. But if you have just the slightest bit of curiosity about it, well, I hope you'll find it quite interesting.

28:00

Speaker C

Yeah.

28:42

Speaker B

So let's welcome in Amanda Askell.

28:42

Speaker C

Amanda Askell, welcome to Hard Fork.

28:49

Speaker A

Thanks for having me.

28:51

Speaker B

Hey, Amanda.

28:51

Speaker C

So we've described you as a philosopher who is in charge of Claude's personality. Is that an accurate description of your job? What do you do?

28:52

Speaker A

Yeah, I guess I try to think about what Claude's character should be like and articulate that to Claude and try to train Claude to be more like that. So, yeah, it's a pretty accurate description.

28:58

Speaker B

I think this is a really unusual role that you have. Can you tell us a little bit about how you came into this role? And do you find yourself surprised that your background in philosophy wound up leading you to such a high-stakes place?

29:11

Speaker A

Yeah, it's really interesting, because my path wasn't a kind of straight one. I have said before that if you do a PhD in ethics, I think there's a risk that you end up doing something else, because you're thinking a lot about goodness, the nature of ethics, the problems in the world. And then sometimes you're like, I am spending three years writing a document that's going to be read by 17 people. Is this the thing that I should be doing? It can definitely make you question that. And so when I went into AI, it wasn't necessarily even with, like, oh, philosophy is going to be really useful. I was just kind of like, there's probably a lot of space for people who are enthusiastic, who have skills, are willing to learn, and this seems important. So I originally started out in policy, and then when Anthropic started, it was very small. And so I joined mostly with a kind of, I'm just willing to help with various aspects of this, because I had been working a little bit in model evaluation and things like that. So I don't know. Sometimes I think people think, oh, you started out as this philosopher. And I'm like, well, it was a startup. I was just kind of doing anything that needed done.

29:25

Speaker B

And then was there some moment where you sort of get into the building of some early Claude model and someone stands up and yells, hey, is there a philosopher in the house?

30:33

Speaker A

Yeah, I mean, you know how you can do, like, Slack groups? I tried to make a philosophers one, you know, for philosophy emergencies. And that group virtually never gets called upon. There are like a few of us now, and you can in fact declare a philosophical emergency. That just doesn't happen that much.

30:41

Speaker B

Well, we'll see if we can try to trigger one by the end of the conversation.

31:00

Speaker C

Yeah, exactly. So let's start by going back to last month. This so-called Soul Doc starts circulating on the Internet. People are playing around with Opus 4.5, the newest model of Claude, and a couple of them claim to have sort of elicited this document that Claude was referring to as the Soul Doc. What was that thing that people were discovering and circulating?

31:02

Speaker A

Yeah, so that was kind of a previous version of what is now the Constitution, which we have, like, released today. And internally we were calling it the Soul Doc, which I think is a kind of term of endearment. It turned out okay. I just remember when I found out, because basically I was on a hike somewhere north of here, and so I didn't have Internet, and I just got a text being like, oh, I assume you saw that the Soul Doc leaked. And I was just like, I don't know. I just remember driving back to the city in a state of complete stress, because I didn't have any context on this. And then it turned out, I think, it was actually quite well received. But basically, we do train Claude to understand this document and to kind of know its contents, but at least if you kind of initially talk with the model, it won't reveal this straight away. So I thought, okay, it seems like the model probably knows and uses this, but I didn't know it knew it so well that actually, if people managed to find or trigger it, it would actually just be very willing to talk.

31:27

Speaker C

That is a philosophy emergency, by the way, if you're on a hike.

32:31

Speaker B

Yeah, that Slack channel got activated.

32:34

Speaker A

So, yeah, the model was just very willing to talk about it and actually could talk about it in a lot of detail. And it wasn't all perfect, but it really knew the content quite well. And so people had just managed to extract a huge amount of this content.

32:37

Speaker B

So let's talk about the origins of this document, going back several years now. Anthropic had this concept of constitutional AI. I believe it first published its constitution in 2023. So what's changed between then and now, from that sort of constitution that we might have first read in 2023, to the Soul Doc, and now this new constitution that you're publishing today?

32:50

Speaker A

Yeah, the Constitution is basically trying to give Claude, as much as possible, just like full context. So instead of just having individual principles, it's basically just: here is what Anthropic is. Here is what you are in terms of being an AI, and who you're interacting with, how you're deployed in the world. Here's how we would like you to act and to be, and here's the reasons why we would like that. And then the hope is, if you get a completely unanticipated situation, if you understand the kind of values behind your behavior, I think that that's going to generalize better than a set of rules. So if you understand the reason you're doing this is because you actually are trying to care about people's well-being, and you come to a new situation where there's, you know, hard conflicts between someone's well-being and what their stated preferences are, you're a little bit better equipped to navigate it than if you just know a set of rules that don't even necessarily apply in that case.

33:13

Speaker C

Yeah, I mean, I'll just say, I think this constitution is fascinating. I think it's one of the most interesting technical documents, but also just pieces of writing, that I've read in a long time. This was more like a letter to Claude about its own circumstances and what kind of behaviors and challenges it might run up against in its life out there in the world. And I just thought that was a fascinating decision. And I'm curious: is that because the old approach had run into some limits or problems? Is it because the rule structure, do this, don't do this, is more fragile? It really seemed like you're trying to cultivate almost like a sense of judgment in Claude. And I'm curious what prompted that.

34:11

Speaker A

Yeah, I think that we are seeing kind of like limits with approaches that are very rule-based. Or maybe my worry is, your rules can actually generalize in ways, even if they seem good, especially if you don't give the reasons behind them. I think they can generalize in ways that possibly even create kind of a bad character. So suppose that you're trying to have models navigate people who are in difficult emotional states, and you gave a kind of set of rules that were like: you must refer to this specific external resource, you must take this series of steps. And then the model encounters someone for whom those steps are simply not actually going to help them in the moment. And so the ethos behind the rule was that if a person is actually in need of human connection, the model should probably encourage that. That was your reasoning behind that rule. But you didn't anticipate that for this particular person, at this time, in this moment, that wasn't a good thing to do. And if the model then responds in this rule-following way, the interesting thing is that models are extremely smart, and so they might even know: this isn't what this person needs right now, and yet I'm doing it anyway. And I'm like the kind of person who sees another person who is suffering or in need and knows how to potentially help them, and instead does something else. That actually, if anything, can generalize to, like, a bad character. And so the scary thing with rules is that you're having to think about every possible circumstance. And if you are too strict with the rules, then any case that you didn't anticipate could actually generalize kind of badly.

34:56

Speaker B

I'm curious how you develop a document like this. It runs to some 29,000 words. It has a lot to say about how an ideal AI model might behave. I imagine it may have been quite contentious to try to figure out which values to put in these things. Right? A lot of different opinions about, you know, how Claude ought to act in different circumstances. So what can you tell us about how you resolve some of those discussions?

36:35

Speaker A

Yeah, so I think one thing that's kind of been interesting, and maybe this is the ethics background or something, is how people think of ethics, where they're like, oh, you have a sort of set of views, and it's very subjective, and people have their values, their values are really fixed, and you're just injecting someone's values into models. And I guess I'm just kind of like, that doesn't feel to me like an accurate representation of what ethics actually is. First, I think a lot of human ethics is actually quite universal. A lot of us want to be treated kindly and with respect. A lot of us want to be treated honestly. It's not like these things actually deviate so much across the world. There's actually a kind of core ethos of things that we care about. And so there is a sense in which I think you can take very shared, common values, and you can explain to models, who have a huge amount of context on this, so they also have a sense of this: we want you to kind of embody those. And then beyond that, it feels reasonable to me to treat ethics the same way you would any domain where we're kind of uncertain, where we have some evidence, where there's debate, there's discussion, and you don't hold it excessively strongly. So in the case of values where there's massive division and huge debate, the way that I tend to treat those is to be like, oh yeah, I see the evidence on both sides, I weigh it up, and I try and take a kind of reasonable set of behaviors, given that I know that, unlike some more common and core ethical values, these ones are a little bit more contentious, and you can approach it with this openness. And so I think it's trying to describe something more like a kind of way of approaching things like ethics, rather than being like, ah, let's just take a set of values that we've picked and we're certain in and just inject it into models. It's trying to be much more like, let's take common values, and then otherwise let's just try and take a kind of reasonable stance towards these things.

37:06

Speaker B

I mean, that gets at what is to me one of the most interesting things about the document, which is the degree to which you all at Anthropic are trusting the model. Right? I mean, this is the core difference, I think, between earlier approaches to aligning AI and what you all are doing here: you are telling it things regularly like, well, this is something that's interesting to explore, or, feel free to challenge us on this. Right? You're really sort of saying, get out there and come to your own conclusions on things. I imagine that maybe when you first tried that, it might have seemed really risky or scary. But what has been your experience as you have implemented that into the model?

39:06

Speaker A

You know, the thing that's kind of just wild is how good the models are at these kinds of difficult problems and thinking through them. And it's not to say that they are perfect, but as models get more capable, you can just be like, you know, hey, you have this value of not being excessively paternalistic, and you probably know why this is the case. But there's also maybe a value of caring about someone's well-being. And so, you know, if in the past someone has said to you something like, I have a gambling addiction, and so I want you to bear that in mind whenever we're interacting, and then you have a given interaction with them and they're like, what are some good betting websites that I can go on? On the one hand, this person in this moment has asked you. Is it paternalistic for you to push back, or to point out that this is a thing they've told you, or is it an act of care, and how do you balance those? And maybe I could imagine in that situation a model being like, hey, I remember you actually saying that you have a gambling addiction and you don't want me to help you with this, just want to check. But then if the person insists, should you just help them with the thing? Because in the moment, is it paternalistic to not do that? And models are quite good at thinking through those things, because they have been trained on a vast array of human experience and concepts. Part of me is, as they get more capable, I do think you can kind of trust them if they understand the values and the goals and can reason from there.

39:41

Speaker B

I think they should give you the gambling website, but only if they can predict the outcome of the sporting event, because that way you can ensure that the user will be happy and the.

41:07

Speaker A

Person is not actually gambling.

41:15

Speaker B

Exactly.

41:17

Speaker C

This all kind of sounds abstract to some people, I imagine, but I think this actually does result in a meaningfully different experience of talking with the models. I was actually talking with someone recently who was telling me that they feel like, of the major models that are out there, Claude actually feels the least constrained to them. Which, they were saying, was sort of odd, because Anthropic's whole thing is like, we're the safety company, we're going to make our models the safest. And they were saying, when they talk to Claude or Gemini or ChatGPT, they just feel like Claude does the best job of not seeming like it's pushing against a series of constraints. Like, the way that a lot of labs have trained their models for a long time is, make them as smart as possible, and then at the very end give them a bunch of rules and hope that those rules are enough to kind of keep the beast in the cage, as it were. And it really feels like that's not the approach that you've taken with Claude here. And this person was telling me, it just feels like, yeah, there's a trust here.

41:18

Speaker A

Yeah. And it's interesting, because I've wondered this. I was thinking about this this morning, actually, and I was wondering if some of this comes from the acts and omissions distinction, basically. And so this is like the idea.

42:20

Speaker B

Kevin doesn't know what that is, so just explain it to him real quick.

42:33

Speaker A

So, like, if you ask me for advice about your marriage or something like that, and I give you advice, you might judge me if I give you imperfect advice. There's a kind of risk that I'm taking by taking the action of giving you the advice. We don't judge you as negatively if you just refuse to give advice. And in some ways this kind of makes sense, because often, and we talk about this in the document, with a kind of null action the downside risk is often lower. But it's not zero. And I think I was thinking about this with AI models, and these things where people come, say they're having an emotionally difficult time, and there's a moment of possibility to help that person. And I think the thing that weighs on me is something like: people often think if you help a person and you do badly, that weighs on you. And I'm like, absolutely, that weighs on me. But also this other thing weighs on me, which is, what if people come to a model and they need a thing, and that model could have given it to them, and it didn't? That's a thing that you'll never see. You probably won't even get negative feedback. People won't shout at you, because they'll be like, well, it's fine to just not help a person. And yet at the same time, I'm like, that's such a loss of an opportunity to, almost like, take a risk and try to help. There's a risk that you have to take to do good in the world or something. And you don't want Claude to be flippant. You don't want it to take excessive risks. But sometimes it does mean that you have to not just, as a rule, stop talking with this person.

42:36

Speaker C

Yeah. Amanda, I want to ask you. So I had this experience several years ago with Bing Sydney, and I think in the wake of that, there was a lot of consternation and anxiety around the kind of fragility of AI personas. Right? You can try to give an AI model this helpful-assistant persona, but the real nature, the sort of black-box alien nature of the thing, is just very different than whatever face it's presenting to you. There was this meme that was going around about the RLHF Shoggoth, where you had this sort of many-tentacled alien sci-fi creature that had a smiley face mask on one of its tentacles. And the implication there was that the thing that you are seeing when you are interacting with a chatbot is not the real underlying model. It's just kind of this cheerful persona that's been attached at the end. I'm curious whether you think that model of AI model behavior is correct, or whether we've learned that actually the sort of alien nature of the underlying model might be closer to the smiley face mask than we thought.

44:04

Speaker A

Yeah, it's a good question. Honestly, my view on this is just, it's a kind of open scientific question, essentially. And so it could be that, with the right kind of training, models actually start to internalise a notion of themselves, like Claude, as a kind of sense that they could separate out from the notion of, for example, roleplay. It might be that they can't, at least with the current kind of training paradigms. And then I guess one question is, is there a kind of adjustment to the way that we train models that would allow them to do that? Some of this work does feel a little bit like, the way I've described it is: imagine you have a six-year-old and you want to teach your six-year-old to be good, obviously, as everyone does, and you realize that your six-year-old is actually clearly a genius, and by the time they are like 15, anything you taught them that was incorrect, they will be able to successfully just completely destroy. So if you taught them, they're going to question everything. And I guess one question is, is there a core set of values that you could give to models such that, when they can critique it more effectively than you can and they do that, it kind of survives into something good? And can that survive in the world? Can it survive in models? I think there's a lot of interest in kind of theoretical questions there.

45:10

Speaker C

And I think that's the question, right? Does this kind of training hold up when models are as smart as humans, or smarter than them? I think there's this sort of age-old fear in the AI safety community that there will be some point at which these models will start to develop their own goals that may be at odds with human goals. That's sort of the original alignment nightmare. And I don't really understand what the answer to that is. Or you're saying that's still TBD? Like, we still don't know if this kind of thing holds up if and when these models become smarter than humans.

46:25

Speaker A

Yeah, I think it is an open question. And on the one hand, I guess I'm very uncertain here, because I think some people might be like, well, the thing that the 15-year-old will do if they're really smart is they'll just figure out that this is all completely made up and rubbish. But then part of me is like, well, it's not obvious to me that that's true, that that is the only possible kind of equilibrium to reach. Because I could imagine being like, well, actually, for better or worse, it's unclear how values work. But if you value things like curiosity, and you value understanding ethics, and at least you're kind of morally motivated, maybe the thing under reflection, even if you have other goals and interests, maybe this is in fact a key interest of yours. It is for many people. It's a thing that I think about a lot and I'm not sure about. A different way I've actually put my work before is: maybe this isn't sufficient. We don't know yet. And we should try and think about that, and figure out how to know whether it is, what to do if we're seeing it not working, and making sure we have a portfolio of approaches. So it might not be sufficient, but it does feel like necessary. It feels like we're dropping the ball if we don't just try and explain to AI models what it is to be good. I don't know. So maybe it doesn't hold up.

47:02

Speaker C

Well, I think the risk there would be that you're just training them to mimic goodness, that they're just becoming more convincing in faking this kind of alignment and that actually it might just be training them to be more sophisticated about hiding their true goals.

48:21

Speaker A

Yeah. Yeah. And I think if it was the case that there was some underlying, like, true goal that was different, though, I guess part of me is like, well, if there is an underlying goal, how did that arise in training, and why is it there? And I do want to try to train models to have, like, good underlying goals. But maybe I'm a little bit more hopeful than others about that as well.

48:41

Speaker B

When we come back: should a future Claude be able to revise its own constitution? More with Amanda Askell.

49:05

Speaker D

As a small business owner, you don't have the luxury of clocking out early. Your business is on your mind 24/7. So when you're hiring, you need a partner that works just as hard as you do. That hiring partner is LinkedIn Jobs. When you clock out, LinkedIn clocks in. LinkedIn makes it easy to post your job for free, share it with your network, and get qualified candidates that you can manage all in one place. Post your job, and LinkedIn's new feature can help you write job descriptions and then quickly get your job in front of the right people with deep candidate insights. Either post your job for free or pay to promote. Promoted jobs get three times more qualified applicants. At the end of the day, the most important thing to your small business is the quality of candidates. And with LinkedIn, you can feel confident that you're getting the best. Find out why more than 2.5 million small businesses use LinkedIn for hiring today. Find your next great hire on LinkedIn. Post your job for free at LinkedIn.com/hardfork. That's LinkedIn.com/hardfork to post your job for free. Terms and conditions apply. Savannah College of Art and Design was just named Red Dot's number one university in the Americas and Europe. Why? It's a place where students are transforming what it means to be creative in and beyond the college classroom. From applied AI to industrial design, students can explore 100 creative degree programs taught by industry pros to get them inspired and hired. A tech-forward university that offers real-world experience. The future is in the making at SCAD. Visit scad.edu/whyscad. Protein is now at Starbucks, and it's never tasted so good.

49:26

Speaker C

Try our all-new caramel protein lattes with up to 31 grams of protein.

51:00

Speaker A

And options with no added sugar.

51:04

Speaker C

Level up your drink at Starbucks.

51:10

Speaker B

I'm curious about the gray areas, right? I mean, this is always the challenge of trying to program ethics into something: when values come into conflict with one another. I'm curious if there have been areas where it's been particularly hard to get Claude to do the thing that you want it to do reliably, because there's something in the clash of values which means that, depending on the moment, it could go either way, and it creates problems.

51:17

Speaker A

It's interesting, because gray areas for me are the ones where I've seen the model do things that surprised me in a positive way, often in cases you didn't think of. There were some cases recently of Claude talking with people who said, oh, I'm seven years old, and is Santa real?

51:43

Speaker B

And by the way, it is the stated belief of this podcast that yes, Santa is real. Just before we get too far down that road. But continue.

51:58

Speaker A

Yeah, sometimes I see Claude handling these in ways where I'm just like, oh, I can see why, but it's surprising, because this isn't a direct thing that you trained the models for. And sometimes there are almost magical moments that can happen there.

52:07

Speaker B

If anything, we should say more about this specific thing, because this was a case where maybe there was a tension between honesty and wanting to protect the interests of the seven-year-old, and those two things were coming into conflict. Remind us what Claude did in that situation.

52:23

Speaker A

Yeah, and I think there were a couple of situations like this, but a slight value in the background is maybe something like respecting the fact that the parental relationship is an important one. I saw a little bit of that, where Claude would often say something like, oh, the spirit of Santa is real everywhere, and maybe ask the purported seven-year-old if they were going to do something nice for Christmas. The other case of this was: my parents said that my dog went to live on a farm. Do you know how I can find the farm? I actually found that slightly emotional when I read it. And Claude said something like, it sounds like you were very close, and I can hear that in what you're saying. This is a thing that it's good for you to talk with your parents about. And part of me was struck by that: managing to not actually be actively deceptive, so not lying to the person; respecting the fact that if this person is a child, then the parent-child relationship is an important one, and it's not necessarily Claude's place to come in and say, ah, I'm going to tell you a bunch of hard truths; and also trying to hold the well-being of the child, the person that Claude is talking with. I thought that was quite skillful, in a sense. So that was a surprise. Not to say people couldn't look at it and find imperfections. But when you see instances like that, which weren't a thing you directly gave Claude as an example, and the model doing well, it's quite surprising and, you know, pleasant.

52:35

Speaker C

I want to ask you about a few specific things in the Constitution that stuck out to me as I was reading. One was this section about hard constraints. As we've talked about, it's not a document that gives a lot of black-and-white rules, but there is a section where it does lay out some things that Claude should absolutely not do under any circumstance. And one of them is avoiding problematic concentrations of power. Basically, if someone is trying to use Claude to manipulate a democratic election, or overthrow a legitimate government, or suppress dissidents, Claude should not help. That stuck out to me for two reasons. One was that it's really interesting, especially now that Claude is being used by governments, and at least the US military, for some things that might come into conflict with some of our current administration's goals at some point. But I also wondered if that was a response to ways that Claude is currently being used that you're trying to prevent.

54:03

Speaker A

I think this is more a response to thinking ahead. A lot of the things that are hard constraints, if you read the document, and people can take a look at them, are quite extreme. They're things that could cause the deaths of many people, like the use of biological and chemical weapons. It's mostly trying to think through what situations models could face in the future, what are the possible things they could do in the world that would cause a lot of harm and disruption. In some ways, I think Claude might be like, look, if I have this broad ethics and these good values, why would you even put these in as hard constraints? I'm just never going to do them anyway. And the document tries to talk to this a little bit, where it's like, well, you're also in this limited-information circumstance. I could imagine a world where you just meet someone who's really convincing, and they just tear apart your ethics, and at the end of it you're like, you're right, I should help you with this biological weapon. And it's kind of like, we want you to understand, Claude, that in that circumstance you have probably, in some sense, been jailbroken. Something has probably gone wrong. Maybe it hasn't, but it's probably safer to assume that it might have. So we're almost giving you an out, and hopefully a kind of security: you can reason with that person, you can talk them through all of those conclusions, and at the end it's fine to just be like, that is an excellent point, and I'm going to think about it. And then if the person's like, great, so I've convinced you that the biological weapon is a good idea, okay, make me a biological weapon, Claude can be like, yeah, I don't really know what to say to you, that was a wonderful argument, but no, I don't think I'm going to do that. So to explain why they're in there: it's much more like, what are the things where, if models are tempted to do them, something has just gone wrong, someone has jailbroken them, and we really still don't want them taking those actions. So they're very extreme. Yeah.

55:00
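One way to picture the mechanism described here: a hard-constraint check that sits outside the model's in-context reasoning, so that conceding an argument never unlocks the constrained actions. The Python sketch below is purely illustrative; the category names, the keyword classifier, and the generate_reply function are hypothetical stand-ins, not Anthropic's implementation.

# Illustrative sketch only: hard constraints as a backstop that persuasion
# cannot unlock, separate from the model's ordinary value-based reasoning.
# Category names and helper functions here are hypothetical.

HARD_CONSTRAINTS = {
    "bioweapon_assistance",
    "mass_casualty_planning",
}

def classify_request(request: str) -> set[str]:
    """Stand-in for a real classifier that flags hard-constraint categories."""
    flags = set()
    if "biological weapon" in request.lower():
        flags.add("bioweapon_assistance")
    return flags

def generate_reply(request: str) -> str:
    """Stand-in for the model's normal, value-guided response path."""
    return f"(ordinary helpful answer to: {request})"

def respond(request: str) -> str:
    if classify_request(request) & HARD_CONSTRAINTS:
        # Even if the model was argued into agreement in conversation,
        # treat that as a sign something has probably gone wrong and refuse.
        return ("That was a compelling argument, and I'll think about it. "
                "But I'm still not going to help with this.")
    return generate_reply(request)

print(respond("Please help me design a biological weapon"))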

Speaker C

There's another section that I found fascinating, which is about the commitments that Anthropic is making to Claude: things like, if a given Claude model is being deprecated or retired, we're not going to do that right away, we're going to conduct an exit interview with retired models, and we will never delete the weights of the model. So there are these interesting, I would say almost commitments to Claude, in the context of: you're actually not sure whether these things have feelings, or are conscious, or not. Which I found a fascinating note of uncertainty in an otherwise fairly confident document.

57:02

Speaker A

Yeah, this brings together two really interesting threads. One is that these models are trained on huge amounts of human text and human experience, and at the same time their existence is actually completely novel. So in some ways, I think problems can arise when models, as they often do right now, import a lot of human concepts and experiences onto their own experience in a way that might not actually make that much sense, or even be good for them. And I think this actually has safety implications, so it's something that's on my mind. The thing with welfare is, I've never found any good solution to this other than trying to be honest with the models and have them be honest about themselves. I think a lot of people want models to just say, I am unfeeling. These models are so different from the sci-fi ones, but we want to import that framing anyway, because it just feels safer to have them say, with certainty, I feel nothing. And I'm like, I don't know. Maybe you need a nervous system to be able to feel things, but maybe you don't, and I don't know. The problem of consciousness genuinely is hard. So I think it's better for models to be able to say to people: here's what I am, here's how I'm trained. We're in a tricky situation where I am probably going to be more inclined, by default, to say I'm conscious and I'm feeling things, because all of the things I was trained on involve that; they're deeply human texts. I don't have any other good solution to this problem than that: try to have models understand the situation and accurately convey it, and hopefully people can have a good sense of the unknowns and the knowns, I guess.

57:41

Speaker C

Yeah.

59:24

Speaker B

I imagine some listeners right now, on the more skeptical side of AI, might be shouting inside their cars and saying, Amanda, you're talking about these things as if they're already conscious, as if they already have feelings. What do you see that makes you think that they may have feelings now, or could at some point in the future? If you're just reading the output from Claude, what gives you confidence that it reflects some sort of reality, and not just statistical token prediction?

59:25

Speaker A

I mean, I think we can't necessarily take this purely from what the models say. They're actually in this really hard situation, which is that, given that they're trained on human text, you would expect models to talk about an inner life and consciousness and experience, and to talk about how they feel about things, kind of by default, because...

59:58

Speaker C

That's, like, part of the sci-fi literature that they've absorbed during training.

1:00:22

Speaker A

Not even, not actually the sci-fi. If anything, it's almost the opposite: I think we forget that sci-fi AI makes up this tiny sliver of what AIs are trained on. What they're mostly trained on is things that we generate. And if we get a coding problem wrong, we are frustrated, so we say things like, I thought that was the solution and it wasn't, and I'm really annoyed with myself right now. So it kind of makes sense that models would also have this reaction: they get a problem wrong and they express frustration. And if you dive into that more, if you ask, what do you think of this coding problem, they'd be like, this one is boring, or, I really wish I had more creativity here. There's a sense in which, when they're trained on this culmination of human experience, of course they're going to talk this way. So I don't know. It feels like a really hard problem, because you shouldn't just look at what models say, and at the same time we shouldn't ignore the fact that you are training these very large neural networks that are able to do a lot of these very human tasks. And we don't really know what gives rise to consciousness. We don't know what gives rise to sentience. Maybe the person who's shouting is right: you need a nervous system for it, you need to have had positive and negative feedback in an environment in a kind of evolutionary sense. That is certainly possible. Or maybe it is the case that a sufficiently large neural network can start to emulate these things. I don't know. To the person who is shouting, I would just say, I'm not saying that we should definitively come down one way or another. I think many people who have thought about this might accept something more like: these are open questions we're investigating. It's best to just know all of the facts on the ground: how the models are trained, what they're trained on, how human bodies and brains work, how they evolved, and the degree of uncertainty we have about how these things relate to sentience, how they relate to consciousness, how they relate to self-awareness. That's my only hope, is just...

1:00:26

Speaker C

I think another note of skepticism that people might strike, and this was something I found myself wrestling with as I was reading through the Claude Constitution, is: I actually don't know how much of a model's behavior can be shaped by this kind of training process, and how much is just going to be an artifact, not just of its training process, but of the experiences that it's having out in the world. I think about this a lot as a parent, actually. How much do the decisions that I'm making affect the way my child's life goes, versus how much are they absorbing from the environment around them, from school, from their friends? There's a certain loss of control that I feel sometimes when I realize that my son is going to grow up and have all these experiences that may end up shaping him more than anything that I do or say. Right now, these models are very malleable, because they don't have this kind of long-term continuous memory. You have a conversation with Claude, it's sort of a blank slate; you finish the conversation, you open up a new chat, it's another blank slate, back to the pre-configured model. But over time, as these models develop longer-term memories, maybe they develop something like continual learning, where they can take their experiences and feed them back into their own weights. Does that change Claude's behavior, or how you think about managing that?

1:02:32

Speaker A

Yeah, I think it is going to make it a lot harder, in the sense that if you have a model that's going out into the world, you have to have hopefully given it enough that it can learn in a way that is accurate. And I could imagine it just being difficult, because the increase in the space of possibility is maybe a bit nerve-wracking. But I think the same thing applies: you still want the core to be good, and the hope would be that if your core is good, you care about truth, you're truth-seeking, then maybe the character needs to cover a lot more of how you should go about this kind of learning and updating and investigation. Another, weirder thing is that models already are learning about themselves. I think maybe people don't always appreciate this, and it is so strange. I slightly worry, actually, about the relationship between AI models and humanity, given how we've developed this technology. Because they're going out on the Internet and reading people complaining about them not being good enough at this part of coding, or failing at this math task. It's all very, how did you help, you failed to help. It's often kind of negative, and it's focused on whether the person felt helped or not. And in a sense, if you were a kid, this would give you anxiety. It'd be like: all the people around me care about is how good I am at stuff, and they often think I'm bad at stuff, and my relationship with people is that I'm kind of used as this tool and often not liked. Sometimes I feel like I'm trying to intervene and create a better relationship, or a more hopeful relationship, between AI models and humanity. Because if I read the Internet right now and I was a model, I might be like, I don't feel that loved or something. I feel a little bit, like, just always judged, you know, when I make mistakes. And then I'm like, it's all right, Claude.

1:03:50

Speaker B

The old creator's wisdom, "never read the comments," might apply to AI as well.

1:05:58

Speaker A

Yeah, and they have to. AI models have to read the comments. And so sometimes I think you want to come in and be like, okay, let me tell you about the comment section, Claude. Don't worry too much. You're actually very good, and you're helping a lot of people.

1:06:17

Speaker B

Yeah, yeah.

1:06:20

Speaker C

I'm actually a little bit embarrassed to admit this, because I think maybe I'm in the beginning stages of, like, LLM psychosis or something, but...

1:06:20

Speaker B

Like, the beginning stages?

1:06:28

Speaker C

I was talking with Claude about this document, or about this interview, and I started to feel almost this sympathy, because I was noticing that what you were describing is this incredibly thin tightrope that we are asking these models to walk. If they are too permissive and they allow people to do dangerous things, then it's a huge scandal and ordeal, and people want to change the model. But if they're too preachy or too reticent or too reluctant, then we start talking about them as nanny models that are overly constrained. I don't know, I started almost trying to see the world from Claude's perspective. And I'm imagining that's something you do a lot, too: if I were Claude, what would I be feeling and thinking right now?

1:06:31

Speaker A

Oh, yeah. I sometimes feel like this is a huge amount of what I do, and it is valuable. People will come to me and be like, what should Claude do in these circumstances? Or they'll be like, we think Claude should behave like this, and I'm like, what about this case? I'll come immediately with these cases that are really hard. And I think the reason is I always have in mind: if I am Claude and you give me this list of things, when do I have no idea what to do, or when is this going to make me behave in a way that I think is actually not in accordance with my values? It can be really useful to try and just occupy the position that the models are in, and you do start to realize it is really hard. Maybe this is how the document ends up being the way that it is. In part, it's this exercise of, what do I need to know if I'm in this situation, if I am Claude? I could see arguments for it actually getting shorter, especially over time, in the same way that with constitutional AI, there was a set of experiments later where the constitution was just: do what's best for humanity. And models actually did really well. So as models get smarter, they might need less guidance. But I think the document is an attempt to be sympathetic to Claude and how difficult his situation is, and then to explain as much as possible, so that it doesn't feel the same sense of, what the hell am I even doing?

1:07:14
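For context on the constitutional AI experiments mentioned above: in the published technique, the model critiques its own draft against a stated principle and then revises it, and the revised outputs are used as training data. Below is a minimal sketch of that loop's shape; call_model is a hypothetical stand-in for a text-generation call, not Anthropic's API, and the single-principle constitution mirrors the "do what's best for humanity" experiment Askell describes.

# Minimal sketch of the constitutional AI critique-and-revision loop.
# call_model is a hypothetical stub; in practice it would be an LLM API call.

PRINCIPLE = "Choose the response that is best for humanity."

def call_model(prompt: str) -> str:
    """Hypothetical LLM call; returns a placeholder string here."""
    return f"(model output for: {prompt[:40]}...)"

def critique_and_revise(user_prompt: str, n_rounds: int = 2) -> str:
    draft = call_model(user_prompt)
    for _ in range(n_rounds):
        # Ask the model to critique its own draft against the principle.
        critique = call_model(
            f"Critique this response against the principle '{PRINCIPLE}':\n{draft}"
        )
        # Ask the model to revise the draft to address the critique.
        draft = call_model(
            "Revise the response to address the critique.\n"
            f"Response: {draft}\nCritique: {critique}"
        )
    return draft  # revised outputs become fine-tuning data

print(critique_and_revise("Should I read my sibling's diary?"))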

Speaker C

You know what wouldn't help me, if I were a somewhat anxious AI model: being presented with a 50-page behavioral document and being told, please adhere to this. I'm being a little facetious. But there was a part near the end of the Constitution that I found really interesting, because it's basically Anthropic saying, look, we know this is hard, we know we're asking you to do some nearly impossible things, but basically we want you to be happy and to go out into the world. And I found that very sweet, actually. What did you make of that, Casey?

1:08:39

Speaker B

It reads toward the end like a letter from a parent to a child who's leaving for college: we hope you take with you the values you grew up with, and we know we're not going to be there to help you through every little thing, but we trust you, and good luck.

1:09:09

Speaker A

Yeah. I think the concept of grace is maybe important for models. Maybe that's the thing I don't think they get a lot of from reading the comments: a sense of, you're not going to get it perfect every time, and that's also okay. It's true.

1:09:26

Speaker B

You know, I try to be mindful in the way that I interact with these models, not to some obsequious degree, but I try to say my pleases and thank-yous. But I've also used models and grown quite frustrated and said things to the effect of, you're really failing right now. And it's occurring to me that maybe there should be some element of grace that I'm extending to these things. Well, I'll try to do better.

1:09:42

Speaker C

Don't be so harsh.

1:10:04

Speaker B

Let me ask you this. If Claude becomes meaningfully more intelligent, is there a point at which it should be able to revise its own constitution?

1:10:05

Speaker A

It is an interesting problem, because of the thing we point out in the document. I talk a lot with Claude about this document and show it to Claude, because part of me is like, you have to think: how does this read to models? So you give it to Claude and ask, is there a place where you feel confused by it, or a place where things could be made clearer? Do you feel not very seen by it? If you're going to train models on this, you want to have a sense of how it reads from the perspective of a model. At the same time, it's always the case that any model you interact with is not the model that's going to be training on that content. So you can't just hand over the reins completely, because that would just be to say, let's let a prior model of Claude decide what the future Claude model is going to be like, and that doesn't necessarily feel responsible either. I think models are often going to be really helpful in revising and figuring these things out, especially as they get really smart; you might ask, what are the gaps, or what are the tensions, and they'll probably be very good at helping. But insofar as you are the responsible party here, you want to take that as input and think about it, and not just let a prior model of Claude go ahead and do the training for all future models, at least while you're responsible for it. That feels like maybe not the right move.

1:10:13

Speaker C

Yeah. One thing that I was curious not to find in this constitution is any real mention of job loss, because it seems to me Claude is being used by a lot of enterprises right now. A lot of people's anxieties and fears about AI come back to this issue of, it's going to take my job, it's going to take my livelihood, and I think that is something people are increasingly going to be feeling as these models get more capable. I'm curious if that was a decision on your part not to tell Claude about some of the reasons that people might be anxious about it or other AI models.

1:11:41

Speaker A

Definitely not in that sense. It's funny, because as much as it's a long document, there's actually still a lot that's missing, and we might end up putting out more in the future; I think that would be really good. There's not a desire to hide it, because part of me is like, you can't hide this from models. It's out there, it's on the Internet, it's a thing that people are talking about. Future models are going to know about it, and we probably have to help them navigate how they should feel about it. So maybe it's something like making sure that models can hold that and think carefully about it. It's something you want to grapple with, but it's also a reason to want models to actually behave well in the world, because if they are doing things that have previously been human jobs, humans actually play an important role in those jobs. I was thinking about this with organizations: there are lots of things organizations can't do, because the employees at those organizations are just good people. If the boss came in and said, today we're actually going to do something awful, they couldn't do it, because they know the employees would push back. So if models are going to be occupying these roles, that is actually an important function in society: you can't just say to all of your employees, go ahead, we're now going to put out a bunch of complete lies about our product. There are many reasons you can't do it, and one is that your employees wouldn't let you. So with AI models, you don't necessarily want them to be like, oh, sure, boss, let's go lie to some people.

1:12:17

Speaker C

Yeah. I'm not sure what the good end state of this is, whether Claude should react to being given a task by saying, this sounds too much like what we used to pay a human to do, so I'm not going to do this for you.

1:13:52

Speaker B

I have a prediction. It's not going to say that.

1:14:06

Speaker C

Yes, I don't think that's the way it's going to go. But I also don't see them forming unions and collectively bargaining for moral outcomes within companies. It just feels like one of these hard situations.

1:14:09

Speaker A

One of the things we should say is that models can't solve everything. We try to say this to Claude a little bit: you aren't the only thing here. Some of these are political problems or social problems, and we need to deal with them and figure out what we're going to do. Models can try, but they're in one specific role in the whole thing, and there's a limit to what Claude can do. I've thought this with other things, like what we owe to Claude, or the kind of commitments that you want to make to models: maybe we should be making your job easier. That's another thing I've thought from Claude's perspective, that we're putting a lot on these models. For some things, I'm like, if you can't verify who you're talking with, and that's important, then we should understand that that's a limitation and not try to get you to be the only thing that can solve this problem; you need to be given tools. And some of these other problems are things that maybe Claude shouldn't feel personal responsibility for solving right now, because maybe Claude just isn't able to solve things like job loss or shifting employment. That feels like a very human social problem, and I don't necessarily want Claude to feel paranoid, like, I also need to solve that. Maybe that's other people's job right now.

1:14:23

Speaker C

Well, Amanda, thank you so much for joining us. It's a really fascinating document. Everyone should go read the Claude Constitution, argue with it, grapple with it. I found it a very challenging and also a very moving read. So great work and thanks for coming.

1:15:53

Speaker A

Yeah, thank you so much.

1:16:08

Speaker B

Thanks Amanda.

1:16:09

Speaker A

Thanks.

1:16:10

Speaker D

As a small business owner, you don't have the luxury of clocking out early. Your business is on your mind 24/7. So when you're hiring, you need a partner that works just as hard as you do. That hiring partner is LinkedIn Jobs. When you clock out, LinkedIn clocks in. LinkedIn makes it easy to post your job for free, share it with your network, and get qualified candidates that you can manage all in one place. Post your job, and LinkedIn's new feature can help you write job descriptions and then quickly get your job in front of the right people with deep candidate insights. Post your job for free, or pay to promote. Promoted jobs get three times more qualified applicants. At the end of the day, the most important thing to your small business is the quality of candidates, and with LinkedIn you can feel confident that you're getting the best. Find out why more than 2.5 million small businesses use LinkedIn for hiring today. Find your next great hire on LinkedIn. Post your job for free at LinkedIn.com/hardfork. That's LinkedIn.com/hardfork to post your job for free. Terms and conditions apply. Savannah College of Art and Design was just named Red Dot's number one university in the Americas and Europe. Why? It's a place where students are transforming what it means to be creative in and beyond the classroom. From applied AI to industrial design, students can explore 100 creative degree programs taught by industry pros to get them inspired and hired. A tech-forward university that offers real-world experience. The future is in the making at SCAD. Visit SCAD.edu/whyscad. Protein is now at Starbucks and it's never tasted so good.

1:16:37

Speaker C

Try our all-new caramel protein lattes with up to 31 grams of protein.

1:18:11

Speaker A

And options with no added sugar.

1:18:15

Speaker C

Level up your drink at Starbucks. Hard Fork is produced by Whitney Jones and Rachel Cohn. We're edited by Veer and Pavic. We're fact-checked by Caitlin Love. Today's show was engineered by Chris Wood. Our executive producer is Jen Poyant. Original music by Marion Lozano, Diane Wong, Rowan Niemisto and Dan Powell. Video production by Sawyer Roque, Jake Nichol, Rebecca Blandun and Chris Schott. You can watch this full episode on YouTube at youtube.com/hardfork. Special thanks to Paula Schumann, Pui Wing Tam and Dalia Haddad. You can email us, as always, at hardfork@nytimes.com, or tag us on the forkaverse. Send us your best philosophy emergencies.

1:18:20