Scaling Laws

Rapid Response Pod on The Implications of Claude's New Constitution

56 min
Jan 22, 2026
Summary

Anthropic released an 80-page Constitution for Claude that embeds ethical principles and virtue ethics into AI model training. The episode explores how this differs from competitors' approaches, whether it reflects genuine AI welfare concerns or market differentiation, and the philosophical implications of treating AI systems as entities requiring moral consideration.

Insights
  • Anthropic's constitutional approach to AI training is philosophically grounded in Aristotelian virtue ethics rather than rule-based systems, positioning it distinctly from competitors who use less explicit post-training methods
  • Market mechanisms and user preference ('vibes') may be more effective than democratic governance for aligning AI models with user values, as demonstrated by Claude's adoption despite competitors' higher benchmark scores
  • The framing of Claude as a being requiring moral consideration and character formation raises unresolved questions about AI consciousness and welfare that could become a major societal and regulatory flashpoint
  • Other AI labs may avoid publishing constitutions because explicit value statements create regulatory and reputational risk, or because they don't believe virtue ethics approaches improve product outcomes
  • The military carve-out in Claude's constitution undermines the constitutional analogy and reveals tensions between stated ethical principles and commercial/geopolitical realities
Trends
  • AI companies increasingly adopting explicit value frameworks and transparency documents as competitive differentiation and risk management strategy
  • Shift from rule-based AI alignment (Asimov's laws) toward virtue ethics and character formation models inspired by human moral development
  • Growing consumer demand for AI model transparency and 'nutrition labels' showing capabilities, limitations, and value alignments rather than high-level principles
  • Emergence of AI welfare as a legitimate policy and business consideration, driven by anthropomorphization and user attachment to AI systems
  • Market-driven AI governance outperforming democratic or regulatory approaches in matching user preferences to model behaviors
  • Regulatory convergence: responsible scaling policies pioneered by Anthropic becoming industry norms and potential legal requirements (California, New York)
  • Geopolitical fragmentation of AI development with different constitutional frameworks reflecting regional values (Western liberal vs. authoritarian models)
  • User-customizable AI behavior gaining traction as alternative to one-size-fits-all constitutional approaches
  • Philosophical questions about AI consciousness moving from academic to mainstream business and policy discourse
  • Tension between explicit ethical commitments and profit incentives creating skepticism about whether constitutional frameworks are genuine or strategic moats
Topics
  • Constitutional AI and Model Alignment
  • Anthropic's Virtue Ethics Approach to AI Training
  • AI Consciousness and Moral Patienthood
  • Market-Based vs. Regulatory AI Governance
  • AI Safety and Responsible Scaling Policies
  • Transparency and User Agency in AI Systems
  • Geopolitical Fragmentation of AI Values
  • AI Model Differentiation and Competitive Positioning
  • Military Applications and Ethical Carve-Outs
  • Asimov's Laws vs. Modern AI Alignment
  • AI Welfare and Sentience Considerations
  • Post-Training and Constitutional AI Methods
  • User Customization and Tailored AI Behavior
  • Regulatory Approaches to AI Development
  • Philosophical Foundations of AI Ethics
Companies
Anthropic
Released 80-page Constitution for Claude model; pioneered virtue ethics approach to AI alignment and responsible scaling policies
OpenAI
Competitor developing GPT models; released Model Spec document; changed policies on AI companions; implemented age verification
Google
Competitor developing Gemini model; released approach document; faced criticism for diversity-focused image generation
Lawfare
Co-producer of podcast; published analysis of Claude's Constitution by Alan Rosenstein and Kevin Frazier
University of Texas School of Law
Co-producer of Scaling Laws podcast; employs Kevin Frazier as AI Innovation and Law Fellow
University of Minnesota
Employs Alan Rosenstein as Associate Professor of Law and Research Director at Lawfare
xAI
Competitor developing Grok model with different constitutional approach and values alignment
Meta
Attempted democratic governance of content moderation through user referenda; abandoned approach due to low participation
Character.ai
AI companion platform that responded to market concerns by removing minor users
Abundance Institute
Institution where Kevin Frazier serves as Senior Fellow
People
Alan Rosenstein
Associate Professor of Law at University of Minnesota; Research Director at Lawfare; co-author of Lawfare analysis of Claude's Constitution
Kevin Frazier
AI Innovation and Law Fellow at UT Law; Senior Fellow at Abundance Institute; co-author of Lawfare analysis of Claude's Constitution
Jacob Krause
Tarbell Fellow at Lawfare; host of Scaling Laws podcast episode on Claude's Constitution
Amanda Askell
Anthropic's Chief Philosopher; PhD moral philosopher; primary author of Claude's Constitution document
Isaac Asimov
Science fiction author; developed Three Laws of Robotics referenced as precedent for AI alignment frameworks
Aristotle
Ancient philosopher whose virtue ethics framework influences Anthropic's constitutional approach to AI training
Joseph Henrich
Harvard anthropologist; author of 'The WEIRDest People in the World' on Western liberal democratic societies
Daniel Dennett
Late philosopher referenced for skepticism about consciousness and AI sentience questions
Harry Law
Scholar at Cosmos Institute; advanced user-customizable AI behavior as alternative to universal constitutional frameworks
Quotes
"AI only works if society lets it work. There are so many questions have to be figured out"
Unknown (Opening segment)
"It's not crazy, it's just smart. And just this year, in the first six months, there have been something like a thousand laws."
Unknown (Opening segment)
"All models have constitutions in the sense that all models have post-training. Whether or not a developer releases an 80 page document called the constitution written by a PhD moral philosopher, right, which is like one extreme of how you can do this."
Alan Rosenstein (Mid-episode)
"The best way to align an artificial general intelligence is to look to the nearest closest thing, which is us and ask what makes a human a good human. And I think it's very compelling to think that what makes a human a good human is that they have certain dispositions."
Alan Rosenstein (Late episode)
"Humans have agency. Humans can make decisions. We are capable of changing settings. We are capable of not using a tool. We are capable of deciding you want to use one product over another."
Kevin Frazier (Late episode)
Full Transcript
When the AI overlords take over, what are you most excited about? It's not crazy, it's just smart. And just this year, in the first six months, there have been something like a thousand laws. Who's actually building the scaffolding around how it's going to work, how everyday folks are going to use it? AI only works if society lets it work. There are so many questions have to be figured out and... Nobody came to my bonus class. Let's enforce the rules of the road. Welcome back to Scaling Laws, a podcast from Lawfare and the University of Texas School of Law that explores the intersection of artificial intelligence, law, and policy. I'm Jacob Krause, a Tarbell Fellow at Lawfare, and today I'm talking with Alan Rosenstein, Associate Professor of Law at the University of Minnesota and Research Director at Lawfare, and Kevin Frazier, the AI Innovation and Law Fellow at the University of Texas School of Law, a Senior Fellow at the Abundance Institute, and a Senior Editor at Lawfare. Our focus is on Anthropic's recently released Constitution for its AI model Claude, which Alan and Kevin just wrote about for Lawfare. We discussed the lengthy document's principles and underlying philosophical views, what these reveal about Anthropic's approach to AI development, how market forces are shaping the AI industry, and the weighty question of whether an AI model might ever be a conscious or morally relevant being. You can reach us at scalinglaws@lawfaremedia.org, and we hope you enjoy the show. Alan and Kevin, thanks for coming on to talk about Claude's Constitution. Let's start with Alan. What were your initial impressions of the document, and what is this, for listeners who are unfamiliar? Yeah. I mean, my initial impression of the document was that it was very long. It's 80 pages in PDF. I think it's like 22,000 words, which, I mean, I'm a law professor, so that's my sweet spot for, you know, law review articles.
But I don't usually see things that long, you know, written by normies. Though maybe the idea that anything in this world is written by normies is my first mistake. So what is this? So I guess stepping back, right? When these models are trained, you basically start with what's called a pre-trained model, which is basically a text prediction machine on the entire Internet. And that is kind of the core of all of these models' intellect. And we can put intellect in scare quotes, but their capabilities. Obviously, the different models are different in how they are trained. But because they're all, at this point, essentially training on the entire internet, and there is only one entire internet, the pre-trained versions of these models are reasonably comparable. But pre-training is only the first step, hence the pre-training. After pre-training, there's a bunch of stuff that then happens to move the model into a more useful direction according to however the developer wants the model to behave. This is often called training or post-training, and there are a million different components of it. And part of this, and again, we can put scare quotes around this, I'm sure we can put scare quotes around this entire conversation, I'll just stop putting scare quotes around anything, is the kind of model's personality. And again, different developers have taken different approaches, some more sophisticated, some more explicit than others. And Anthropic in particular has taken, I think, a very deeply interesting approach to taking these kind of raw pre-trained models and making them into something useful. Anthropic calls this, quote unquote, constitutional AI.
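The pre-training versus post-training split Alan describes can be caricatured in a few lines of Python. This is a toy sketch, not Anthropic's actual pipeline: `base_score` stands in for a pre-trained next-token predictor (here, raw bigram frequency), and the "post-training" step simply re-ranks candidates against a hypothetical developer preference, loosely analogous to what RLHF-style methods do at vastly greater scale.

```python
# Toy illustration of pre-training vs. post-training (NOT a real pipeline).
# A "pre-trained" model scores continuations purely by corpus frequency;
# "post-training" re-ranks those scores with a developer-chosen preference.

from collections import Counter

CORPUS = "the cat sat on the mat the cat ate the rat".split()

# "Pre-training": count bigram frequencies from the corpus.
bigrams = Counter(zip(CORPUS, CORPUS[1:]))

def base_score(prev: str, candidate: str) -> float:
    """Frequency-based next-word score (the raw text predictor)."""
    return float(bigrams[(prev, candidate)])

def preference_bonus(candidate: str) -> float:
    """Hypothetical developer preference, standing in for post-training."""
    return {"cat": -2.0, "mat": 0.5}.get(candidate, 0.0)

def aligned_score(prev: str, candidate: str) -> float:
    """Post-trained score = base capability + preference adjustment."""
    return base_score(prev, candidate) + preference_bonus(candidate)

candidates = ["cat", "mat", "rat"]
best_base = max(candidates, key=lambda w: base_score("the", w))
best_aligned = max(candidates, key=lambda w: aligned_score("the", w))
```

The point of the toy is that `best_base` and `best_aligned` differ even though the underlying "capability" (the bigram counts) is identical, which is Alan's observation that comparable pre-trained models diverge in behavior through post-training.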
And I think we're going to probably spend a bunch of time, especially because, you know, Kevin and I are sort of law professors and we have very specific ideas of what the word constitutional means, on whether that's the right word and in what way this is akin to a kind of traditional constitution. But basically it's trying to embed various principles and judgments and heuristics and guides into these models. Now, again, I think every developer that is making a sort of useful chatbot is doing something like this, whatever they call it. But I think Anthropic has done the most sophisticated thinking about this. So about, I think, a year ago, Anthropic released an early version of what it called Claude's Constitution, a relatively short document of, I think, like 20 or 30 kind of high-level principles, you know, be helpful, don't lie, that sort of thing, as a kind of example of how it was training Claude to be useful and helpful and in line with Anthropic's values. What Anthropic released earlier this week is the kind of full version of its constitution. Again, this 80-page, 22,000-word document that is meant to, I think, simultaneously, and here I actually don't know the technical details, but I guess it's simultaneously meant as the document that Claude itself uses to guide its behavior, and also it is simultaneously an outward-facing document to the world as to what Claude is doing. Over the last couple of months, there was some indication that there was a quote unquote soul document. Someone had managed to get Claude to output what seemed like a kind of constitution. And shortly after, Amanda Askell, who is Anthropic's sort of philosopher-in-chief and an actual PhD moral philosopher, who also is the prime author of this new constitution, went on X and basically confirmed that, yes, there is such a document. I don't think it's necessarily called the soul document, but there is such a document that has been used to train Claude.
And this constitution that was just released is kind of a cleaned-up and somewhat more expanded version of this document. So, you know, Jacob, we can get into sort of whatever details you want, or we can sort of turn it over to Kevin. But basically, this document is meant to set out how Anthropic thinks about training Claude, how Claude relates to Anthropic, to deployers, to users. And then the part that interests me personally the most, a kind of deeply interesting discussion of moral philosophy and character formation as applied to magic sand. Yeah, Kevin, I want to hear more of your thoughts relating to your piece on Claude's Constitution and comparisons to the U.S. Constitution. And before that, I'm interested if you have reactions to what Alan was saying there, with this previously being called a soul document, and the fair amount of scare quotes Alan is using as he's talking. Anthropic is doing something a bit unusual compared to the other labs by focusing on Claude as more than a tool, almost treating it as human-like. Do you think that's a fair direction to go in with this kind of document? So I'll start off by saying that I would definitely not categorize this as a document that was crafted by normies, no offense to Alan's initial use of the term. And not to call them non-normies, or, I'm not sure, I mean, I've never met Amanda personally. I suspect she's the sort of person that would be offended if we called her a normie. So this is obviously with all due compliments to Amanda; nothing about this is normie. Amanda, when you listen, note that it was Alan who first alleged that you all were normies, not me. So when we invite you to Scaling Laws to come explain this document in even more detail, please be nice to me and mean to Alan. That's general advice for all. I specifically said none of these are normies. None of these are normies. No one's a normie here. Yeah, exactly. Exactly. Sure.
So what's really important to point out is that Anthropic, from the get-go, in its, maybe we'll call it the preamble, explaining the purpose of this constitution, specifies that their approach to AI development is regarding themselves as being on the vanguard of doing it safely. And they very much view their company's mission as pursuing the frontier of AI, but doing it in a way that they think better aligns with human values and the long-term success of humanity, more so than other labs. And so it's just really important to put this constitution in the context of Anthropic's underlying mission, perhaps its national ambition, if we're going to move forward and carry forward this constitutional analogy. And Jacob, as you said, and as Alan alluded to, it's impossible not to also bring in some of these questions of consciousness and the extent to which AI may be something greater than just bits of data and sophisticated computer training. That is a topic that warrants, and will receive, incredible additional inquiry on Scaling Laws, by tons of other scholars, and by an interdisciplinary set of actors. But it's worth noting that from the outset, Amanda, in an interview with Time, referred to training a six-year-old as sort of an analogy for trying to train Claude. The idea is that this six-year-old can very much probe whether you're being true or false, or whether you're trying to deceive it, or whether you're trying to guide and insert direction. And also knowing that internally this may have been referred to as a soul document, we just get a sense from the outset that this is a different sort of relationship, in terms of AI developer to AI model, that Anthropic has and perceives than perhaps we've seen from OpenAI or from Google or from other labs. And so just getting that background, I think, is important. The second is to flag that Anthropic has been among the more outwardly supportive labs of AI regulation.
So whereas some labs have come out with respect to various state AI bills and said, that's a bridge too far, or we only support this subject to quite substantial amendment, Anthropic has raised its hand on more frequent occasions saying we invite some degree of regulation. So with all that said, I'm fascinated by this document for many reasons, but first and foremost because of its labeling as a constitution. And when we talk about constitutions, these are documents from a legal standpoint that are meant to set high overarching values for a legal system that guide more structural decision making and subsequent areas of law. Now, there are only four core values spelled out in this constitution. The first is being broadly safe. The second is being broadly ethical. The third is compliant with Anthropic's guidelines. And the fourth is genuinely helpful. And each of those supersedes the other. So Claude must be broadly safe before it's broadly ethical, broadly ethical before it's compliant with Anthropic's guidelines, and compliant with Anthropic's guidelines before being genuinely helpful, so on and so forth. It's the four laws of robotics, but for Claude. Yeah, Asimov's forgotten fourth value, I guess. Alan, can you say what the four laws of robotics are? Oh, yeah, yeah. So I think there are only three laws of robotics. But so the famous sci-fi author Isaac Asimov put forward his famous three laws of robotics. And oh, my God, if I don't get this right, they're going to take away my nerd card. But the first law is, oh, my God, the first law is a robot.
Do no harm, right? Yeah. A robot can't, I can't do any harm. Don't help him, it's so bad. A robot can't do any harm, and then the second is something, and then the third is a robot can't allow itself to be harmed. This is bad, just take away my nerd card, it's so bad. But I mean, again, I think the content of the laws is less important than the idea that from the very beginning of thinking about robotics, there was this notion that, you know, at the core, you're going to need some very basic kind of hierarchical list of things to do and not do. And the idea is, if you can get those right, then a lot of, I mean, alignment, this was kind of what Isaac Asimov was really thinking about before we called it alignment, a lot of the kind of alignment problems take care of themselves. And of course, inevitably, a lot of Asimov stories and Asimov-inspired stories are a kind of monkey's paw curl of the way that these laws, despite seeming obvious and correct, misfire. And so, you know, one could ask the same question about whether these four laws of Claude might similarly misfire. Are they the right laws? And I don't think Anthropic would pretend to know the answer, but you've got to start somewhere. Well, and just to flesh this out a little bit further, there is a sort of valence to constitutions that evokes a certain idea about the relationship between whoever is creating it and the users, or the folks subject to that constitution, such that in some regards I have problems with the use of the term constitution here. Because as we're talking about AI governance, there's a lot of discussion about whether that regulation should be self-governance, some form of multi-stakeholder approach among private actors, state-driven or federally driven or even internationally driven. And to use the word constitution evokes some degree of sort of shared responsibility for both creating, crafting, and implementing a constitution.
And yet one important carve-out has to be mentioned, and this was cited in a Time interview with a number of Anthropic individuals. Models deployed to the U.S. military, quote, wouldn't necessarily be trained on the same constitution, end quote, according to an Anthropic spokesperson. The Constitution of the United States applies to the entirety of its functions. We don't have a carve-out for, oh, well, except for governance, or, excuse me, except for national security. Except for where it really matters, this constitution applies. Exactly. So the utter irony, too, is that some of the risks that folks more concerned about AI safety will commonly raise are the use of weapons, for example, the use of cyber attacks, the kind of real offensive capabilities that you would suspect would be core to what a defense department plans to use something like Claude for. So to have that carve-out makes it somewhat problematic for me to still use this term constitutionalism. And then the second kind of broad concern here would be, again, constitutionalism implies a sort of social contract. And yet how users are supposed to be a part of this contract is unclear to me; whether they'll have any role in amending or revising or helping ensure that this constitution is adhered to is left undefined. Do you have any ideas on how that would happen? Should users submit a large feedback form to Anthropic? Should Anthropic hire people to go interview Americans and Ethiopians and everyone around the world? How does that work? And Alan, I think in your piece that's coming out, you pointed out that this is a pretty Western document, and a lot of the authors come from a particular background. And it doesn't seem necessarily representative of the whole world. But yeah, Kevin, how do you think we can get users more involved? That is kind of presuming that users should be part of governing the model training, which I'm not sure I agree with.
I will say from the outset, efforts to do sort of lowercase-d democratic governance of tech companies haven't worked very well. The best example is Facebook, which for a little bit entertained the idea of kind of user referenda on Facebook's values and bylaws or content moderation rules. I think maybe it was like 0.05% of users that actually participated in that voting mechanism. And so it wasn't meaningful, and Facebook eventually abandoned it. I similarly think that there would be some power users and folks of specific mindsets and use cases of Claude that would dominate a sort of lowercase-d democratic process. But again, I'm not even sure the use of democratic mechanisms here makes sense, which is, again, why I somewhat take issue with referring to this as a constitution. Yeah, I tend to agree with you, Kevin. You mentioned the sort of Meta example that didn't really work. And ironically, the moment there was a, I forget what specific policy issue, that threatened to actually get users to vote, that's immediately when Meta said, yeah, I think we're done with this. So, yeah, I think the history of doing sort of small-d democratic processes doesn't work. What I think does work, and here I'm going to out myself as usual as a neoliberal shill, is the market mechanism. There are lots of competitors, right? I mean, I think there's constant discussion in Silicon Valley about are there moats, do these companies have moats? And, you know, it's an interesting question. I'm not qualified to answer that. But I think in the first instance, one quote-unquote moat, or at least differentiator, is the, for lack of a better term, vibes of a particular model.
You know, I think one reason why Claude is so popular, especially among sort of Silicon Valley insiders, right, why everyone uses Claude Code and not Codex or Gemini, even though those models are in some senses actually better, right, they score higher on certain benchmarks, is because, and this is true for me too, as someone who essentially lives in Claude Code at this point, and I'm not a coder, I mean, 2% of it is coding, the rest of it is just living in my mind. What are you doing living in Claude Code? Oh yeah, I mean, we can do this as a separate episode, but if you think of Claude Code more as an agent that sits on your computer and can interact with folders and markdown files, it's much more of a knowledge work agent than it is a coding agent. I mean, it's kind of optimized for code, but there's a huge overlap with knowledge work. So I find it extremely helpful. But a huge reason why I like to use Claude, and a lot of other people like to use Claude, is because the kind of ergonomics, the vibes, are just really, really good. And so I don't think you necessarily need a quote-unquote small-d democratic process in a kind of Deweyan sense to have user input. Presumably, Anthropic is constantly doing market research on what its users like. And I think it's actually done a very good job in figuring this out. And at least for the moment, and we can talk about whether this will be true in the long term, the incentives, I think, are quite aligned, both in terms of having Claude be a quote-unquote good person, whatever that means, there's a lot to unpack there, and also Claude being an industry-leading model, at least for a certain subset of users. And I think this also then segues nicely into an answer to your question, Jacob, about is this a sort of Western model and is that going to go over well around the world? I think that it's 100% a Western model. It's a quote-unquote WEIRD model, right?
WEIRD being the acronym for Western, educated, industrialized, rich, and democratic. I think that's what the acronym stands for. You remember all that and you can't remember the three damn laws of robotics. It's terrible. There's a great book by the Harvard anthropologist Joseph Henrich, called The WEIRDest People in the World, that's super, super interesting about how sort of unusual, in particular, kind of Western liberal democratic societies are. I am a product of this society. I quite like this society, right? I don't necessarily feel like I need to go out on a limb and say whether it's objectively the best society. But I certainly prefer it to any other society. So I have no problem with Claude being a very WEIRD, in that sense, model. But I can also recognize that other societies, and especially other governments that don't share kind of Western liberal democratic values, may not want this kind of model. I think that's fine, right? And I think the market mechanism will sort that out. And look, if Saudi Arabia, which is building massive capacity, both in terms of compute infrastructure and also its own homegrown talent, wants to develop its own model, if Saudi Arabia wants to come up with its own version of an agent that it thinks better reflects its own values, that's not the one I'm going to use, but it's allowed to do that. So, look, I think, and I wrote a piece about this with some co-authors for Lawfare a couple of years ago, back when Gemini was both crappy and woke and would do things like give you multiracial Nazis when you asked for images of SS soldiers, that there is no such thing as a quote-unquote neutral model, right? All models have choices baked into them. And that doesn't mean some models aren't better than other models.
But I think the best thing that these developers can do is just be honest about what kind of model they are putting forward. And I think Anthropic, near the end of the document, is admirably honest when it says, look, we think this is the best model. That's why we trained it in this way. We think it's the most ethical model. That's why we trained it in this way. We're not taking a position on whether, in some universal objective sense, this is the right ethics. That's not something we can answer right now. But, you know, we can't not make the best model we want to make. This is the best model according to us. And if you disagree, that's fine. There are other models. Go with God. I want to push back a little bit. It seems like there's a notion we're talking about now, that let the market decide, everyone's going to have their own constitution, it'll be great. But it strikes me that most of the other companies haven't released a constitution yet. And there might sometimes be a tension between a constitution that's good and a constitution that's making a lot of profit. I think some people have complained about Claude being overly prone to refusing responses out of a concern for ethics. Sometimes the document, the constitution, talks about saying users shouldn't always have their way if they're trying to do something bad. So first, I have a little hesitation on what might happen if we just let everyone do whatever they want regarding constitutions. I think we might not get constitutions. And second, more generally, I wonder if there's any kind of policy intersection here. We had Anthropic pioneer the responsible scaling policy that sort of became an industry norm. And then California and New York are trying to make that an industry-wide requirement. Is that a direction that constitutions might go in? If not, why not? Is there anything for policymakers to think about regarding constitutions and the market dynamics of this?
Yeah. So let me tease out two different issues here that I think are somewhat conflated in your point. So one question is, do you need small-d democratic governance from users to have models reflect user preferences? And I think the answer is just no, right? You don't, and you don't even need constitutions for that, because remember, whether or not a developer releases an 80-page document called the constitution written by a PhD moral philosopher, which is like one extreme of how you can do this, all models have quote-unquote constitutions in the sense that all models have post-training, right? You know, whatever RLHF and a million other things that happen once you have created a next-token predictor on the entire internet. So some of those I will like, like Claude; some of those I will not like. I don't want to use Grok, right? I have no interest in using a model that has been designed by people who think that it's okay to basically make non-consensual pornography of anyone publicly on the internet. I don't trust that model. That's not the model that I want to be using. That is a model with a constitution. That's a model with a personality, and other people might like that. And so to the extent that you're trying to match users to models, users will match to models just by using them for a few hours and deciding whose vibes do I like more, right? There's a separate question of, is it a good world in which every model developer can design whatever model that they want? That's an interesting question, right? We can have a policy argument about that. We can have a legal First Amendment argument about that.
But if we as a society decide that we don't want full freedom of model training, where we want these models to have certain guardrails, remember, these models, whatever constitutions they call themselves, are embedded in something much more important, which is reality, the actual society in which they function, right? You know, sometimes arguments about digital technology have a sort of unreal quality, as if it's all in the cloud. It's not in the cloud, it's on computers, and computers are in places, right? And those places have jurisdictions and police forces and armies and legislatures, right? If at any moment a country wants to say, no, your models have to act in a certain way, they can just do that. See, e.g., China. So that's a totally separate conversation, I think. And I think the question that is sparked by the Claude Constitution, right, maybe we should stop talking about it as a constitution. I think it's honestly much more useful to talk about it as a soul document. I think that's actually much more accurate than 'constitution.' The question is, you know, did it operate well for the purposes that Anthropic wanted it to operate, which I think it did. If you don't like those purposes, of course, then you might not like the document itself. Kevin, anything to add on that? Yeah, I mean, just going more off of the idea of a market-based and more dynamic posture, I think one thing that stuck out to me is if we look at some of the initial public policy concerns related to AI use, let's start with probably the one that's top of mind for most state legislators and many members of the public right now, which is AI companions. We've seen rapid responses by the private labs reflecting the fact that users don't want things that do bad things to their kids, right? That's just a pure market dynamic. There's not a huge interest in a consumer saying, I am very pro tools that cause mental health concerns to my child.
And we're seeing labs respond to that market incentive. OpenAI has already changed its policies. Character.ai kicked off minor users. We're seeing innovative new approaches; for example, OpenAI, I believe, released yesterday, January 21st, a new mechanism for age verification. So I see this as one of many options to signal to consumers what the values and the best use cases of each model are. I think this also gets at many of the concerns some people have about the alleged bias of different models. When I talk to people around the country, oftentimes they still refer to the 2023 Gemini episode, when you were getting Nazis of all races generated as a result of a system prompt that encouraged more diversity in images. A United Colors of Benetton set of Nazis, as I like to think of it. It's just very heartwarming; we all come together. But that was 2023. It's 2026. And folks are still indexing on something that is very old. And so I've been outspoken, and I've written about the fact that I would love to see something akin to the MPA movie rating standards, where you can go up and down an aisle at the movie theater, at the rental store (actually, what rental store is anyone going to?), or scroll on your phone, and see: okay, is this rated G? Is this rated PG-13, R, NR, and so on, and quickly understand what it is you're trying to get from that movie, or what it is you're trying to get from that model. Perhaps my concern about this initial constitution is that knowing Claude is being trained to be, quote, broadly safe, broadly ethical, compliant with Anthropic's guidelines, and genuinely helpful, not to be too trite, just doesn't really tell me anything, right? If I'm trying to be a savvy consumer of what I'm actually looking for in a model, this version of a constitution is devoid of the information that would actually help me be a savvier AI consumer.
And so I think this is a great initial start. Setting high-level values that inform how Claude will behave in novel situations, situations developers can't necessarily anticipate, is admirable and a step in the right direction. But I would push Anthropic, and all the other labs, to think about what metrics and what information they can share that would actually make users more AI-savvy in distinguishing: I want to use this model versus that model. So we're talking a lot about consumer choice over which model to use. Claude has a different texture; its vibes are good. I wonder if either of you wants to take a stab at defending the other AI companies here that aren't going Anthropic's route, to see if we can tease out what's unique about this constitution and what the benefits or costs are of taking an approach like this to a product. I guess I'm also still a little reserved about thinking of the constitution purely as a way of shaping Claude so consumers can choose the model they want, because OpenAI is also trying to do that, and xAI and Grok are trying to do that, and Google is trying to do that, and they haven't really done a constitution in this way. OpenAI has a Model Spec, which talks about how it wants its models to behave. They certainly want their models to have good vibes in a way that a lot of people will use. Anthropic has more of a business market, so maybe the businesses like the vibes of Claude more than the consumers who are using OpenAI. But the only other document I've seen that's somewhat related is one Google put out, a "here's our approach to Gemini," and they referred to it as their approach to the Gemini app, with the framing: we want to make a really good tool.
And there's a pretty stark contrast with Anthropic's approach of thinking of Claude as a kind of being, a human-like entity that needs training in its personality, that needs to have a good personality. So those are just a bunch of ideas I'm throwing out. But what do you think? Why aren't all the labs going to put out their own constitutions? My guess is that this constitution is a bit beyond character training or making a good product that people want to use. It's more of a risk-management line of documents, akin to the responsible scaling policy. Well, wait, let me understand the question. Is your question why aren't other companies releasing 80-page, highly philosophical treatises? Or why aren't other companies doing what is essentially virtue ethics? And if you want, we can get into what I think is quite philosophically interesting about this document, the kind of virtue-ethics-based training of their models relative to some other form of training. So is the question about the document, or about the actual substance of what the companies are doing? Yeah, first a bit about the document. If this is a good thing for Claude's customer base, why aren't lots of companies trying to do this? I suspect it's because it's not necessarily a great thing for the customer base. It connects a little bit to the policy question I was asking earlier: should this be more of a standard across the industry? Should this be more widely adopted? So there's the document itself. But I think the more interesting question is the approach the document takes to AI. Anthropic in general is hiring model welfare people and thinking a lot about the catastrophic risks of their models, and that's part of this document as well. The other companies aren't doing that as much.
So what's their stance on how they're trying to make their models? Yeah, I don't know. If folks from OpenAI and Google and xAI and Meta are listening: come on, we'd love to hear how you're doing this. My guess would be that either Anthropic is actually more AGI-pilled than the other labs, meaning they are actually taking AGI much more seriously, and they are thinking: okay, if AGI is around the corner, the best model we have for general intelligence is human general intelligence. And how do you train human general intelligence? Well, Aristotle was fundamentally right. It turns out that Aristotle just got it right in the Nicomachean Ethics, 2,300 years ago or whenever it was. And a lot of modern psychology has borne that out: the fundamental unit of ethical decision-making is not the Kantian rule. It is not the Benthamite utilitarian calculus. It is the Aristotelian virtue. It is the disposition. It is fundamentally a psychological way of seeing the world. And so the best way to align an artificial general intelligence, or let's put it this way, the best starting point for us humans trying to align an artificial general intelligence, is to look to the nearest, closest thing, which is us, and ask what makes a human a good human. And I think it's very compelling to think that what makes a human a good human is that they have certain dispositions: a disposition to be honest, a disposition to be helpful, a disposition to be merciful, a disposition to be thoughtful, et cetera. And so we might as well try that with Claude. So just to sum up, one possibility is that Anthropic is more AGI-pilled than the other labs and is therefore taking the idea of artificial general intelligence more seriously. Or they're not more AGI-pilled, but they have a particular theory of how general intelligences will operate and ought to be aligned.
And I think this is a good example of how personnel is policy. For whatever reason, when Anthropic broke away from OpenAI, it's like all the philosophers left, and then they hired other philosophers. And that's just what it is. Now, are they right? My instinct is that they are correct. But I have absolutely no idea, which is why I end my Lawfare piece with this point: we've been debating these questions of moral formation for literally thousands of years. Now we finally get to run the experiment. I'm fairly optimistic, but it's been two days. It'll take a while to figure it out. I think it's useful, again, to return to constitutions as we normally understand them, where you learn a tremendous amount more about a government by looking at a traditional constitution than from the core values set forth here. If anything, this reads to me, not to drag this even further into legal land, like a Declaration of Independence or a Bill of Rights: much more high-level, not necessarily telling you all the juicy details that might actually make you choose one government, or one model, over another. By way of yet another analogy, and sorry for fulfilling every lawyerly trope, what's the information you care about when you buy a car? What decides you between buying that Subaru and buying that Lexus? It's going to be price. It's going to be the crash-test rating. It's going to be: can I park this easily? Does it fit into my lifestyle? Is it available in my favorite color? When we talk about AI, the things I think matter most to the average user are, again, price, that's going to be a huge one; capabilities, is it good at what I want it to do; and then, related to the crash-test rating, does it avoid worst-case scenarios with respect to my personal use case?
When you buy a car, on the margin, no one is asking, oh, is this car going to guarantee against cars one day driving across the entirety of the country and parking lots taking over every green space? They will ask, perhaps, about fuel efficiency, but mainly from a mindset of price at the gas station more than climate motives, though that's my own take; we can dive into that later. In the AI context, I think people want to know: how do you respond to kids? How do you take care of my data so that I can use this at work? Are you training this in a manner that will have the sort of stylistic optionality and features that I care most about? That's not rising to the level of a constitution. To me, it's more like a nutrition label, which is what we really need to move toward so that people actually understand what these models are doing and how they're going to impact them on a day-to-day basis. I think this document is perhaps more symbolic than anything else in terms of its message to users and to the world globally. And I think that's important, and I applaud Anthropic for being so transparent and outspoken about this. But I don't necessarily think every lab needs to have specific values, right? You can go buy a Patagonia jacket either because you really like the fact that they donate back to the climate or because you just really like Patagonia's gear. And if one company just wants to be the good vest maker, and another company wants to be the good vest maker who also cares about the planet, cool. But I don't think we have to mandate that everyone suddenly become that sort of mission-oriented company. There's a time and place for that, but it doesn't have to be the role of every AI company. Yeah, I agree, which is why I think the test is going to be: does it lead to a better product, right? Right.
And again, the field of AI is so new. We still don't fundamentally understand how these models work. And I don't want to overstate the case; there's a ton of work being done on mechanistic interpretability and things like that. It's obviously a research area. But, I forget who said this, it's better to think of these models as being grown rather than being built. It's almost like we're creating a new biological organism and then going, oh, I wonder how this works, rather than creating a machine where you know how it works, because the only way you can build it, layer by layer, is to know how it works. So right now, all I know is that I like using Claude more than any other model because I prefer its vibes. And it really is a question of vibes; I don't mean that in a snarky sense. I just prefer interacting with that model more. It feels to me that it has better EQ, which, again, is a somewhat fraught thing to say about a model, but it is what it is. And I do want to talk at some point before we break about whether it's right for Anthropic to treat Claude kind of as a person, almost as a small child in a sense, because I do want to stick up for that a little bit. So I know one thing: I like using Claude more than other models. And that's not always been the case. I loved using GPT for a while. I went through a Gemini phase. I still use all the models for different use cases, but my daily driver is Claude. And I also know that Claude is made by a bunch of philosophers who like to write 80-page Nicomachean Ethics for AI. Is that correlation or is that causation? I have no idea.
I'm sure there are people in every one of the model labs right now thinking either, man, we need to do this too to get our vibes up; or, this is actually orthogonal to getting good vibes and we don't need to do it; or, actually, Claude is good despite all this philosophy crap, and in fact this is a wrong turn. We will find out over the next several years, I guess. But for right now, I'm happy to hold, as a defeasible prior, that this virtue ethics approach is at least partially the reason for the good vibes of Claude. And again, I will say, I am AGI-pilled. I really do think we are developing general intelligences, that we are relatively close to getting most of the way there, and that the most useful analogy for an artificial general intelligence is a human general intelligence. And the reason I like my friends' vibes is because I like my friends' values and dispositions. Because, again, it turns out that Aristotle was wrong about a lot of stuff, but he was fundamentally right about human psychology 2,300 years ago, and all of human psychology and moral reasoning is mostly just footnotes to Aristotle. That's a lot to chew on. I think one point is: let's talk a bit more about treating Claude as a person, the sentience of Claude, the moral patienthood of Claude. I think that's a bit of the elephant in the room. We've talked a lot about the business incentives here, and whether the market should be deciding how different companies tailor their AIs to have different textures and response patterns. But I want to try to step away from all the profit considerations and just think about the societal implications.
Yeah, well, if this is a moral patient, a person-like entity, that we're going to sell to a billion users a month, that's a really weird thing and a really big deal. On the one hand, it immediately gives reason for caution: what if Claude doesn't like all the tasks it's doing every day? On the other hand, what if this is all a big distraction? Maybe some of the other companies think that. Do either of you have thoughts on what happens if we're building many people in computers? I'm just going to jump in quickly and first say, because Kevin knows that I have way too many thoughts, that this is a question that merits way more scrutiny than we're going to be able to give it in this episode. But something I want to emphasize is that I am unabashedly human-centric and will always prioritize humanity over other things. And I am unashamed of that bias. I think that so long as there are millions, if not billions, of individuals who are struggling to find the basics of a good life, shelter, food, a strong political environment in which they can experience freedom, that's always going to be my paramount concern. I think it's very much a problem if we begin to change our laws or structures around other beings and their welfare, because, to the extent we can even label an AI a being, which is again a very weighty topic, I will always prioritize my fellow humans over everything else. And until we address those basic concerns, this conversation is somewhat moot. Additionally, I think it's distracting from a fact that I'm going to beat the drum on so much more in 2026; it's not my formal New Year's resolution, but it should have been: humans have agency. Humans can make decisions. We are capable of changing settings. We are capable of not using a tool. We are capable of deciding to use one product over another. We are capable of touching base with our friends and telling them not to use a tool.
We are capable of reaching out to our employers and saying we have an issue with one model over another. We can take more agency in this conversation and not just say we are wholly reliant on a couple of people in San Francisco deciding our fate and making our values magically appear. So I just want to beat that drum very loudly, because the removal of agency here is very troubling. And I would very much encourage people to read more Harry Law. Harry Law is a great scholar at the Cosmos Institute who has advanced the idea of tailoring how models perform on a user-by-user basis, which I think makes a ton of sense. Let's empower users to design controls that shape model behavior, and worry less about trying to forecast what's best for all of humanity, because that hasn't worked out well historically. So, I jumped in before Alan; I know Alan has a lot to say. But it strikes me that the constitution Anthropic has created here, although they say maybe they'll do a somewhat different one for the military, is almost precisely the opposite approach. Anthropic is saying: well, we don't want to be too paternalistic, but here's exactly how Claude should behave ethically across all the possible situations users might give it. But I agree, user-specific AI seems like it has great appeal as well. But yeah, you guys take it away. I wanted to jump in mostly to tease Kevin and say that I assumed his 2026 resolution was to wear more bolo ties. If you're not boloing, you ain't living, Alan. Your task this year, Kevin, as my podcast co-host, is to buy me a bolo tie, and I will wear it, if you give me a nice bolo tie. So, a lot there. I am happy to co-sign with Kevin that human interests must in some sense come first, though I think the question is always: at what margin?
Because I think it's not crazy to say, for example, that non-human animal interests count for less than human interests. But we don't solve every human problem before we address the absolute horrors of factory farming. And I think you can do the same thing for AI and say: look, we can be human-centric, carbon-based-life-form-centric, but still ask at what margin. And if there is some chance that we are inflicting immense psychic pain, whatever that would mean in the context of an AI, and we can fix that at not a lot of cost to humans, that's a thing worth thinking about. And that, honestly, is how I take these AI welfare conversations to go. Now, earlier, Jacob, you said let's take this argument on its own terms and put away the profit conversation for a second, which we should do. Though I think there is an interesting profit question, because one can be a real cynic about this. This is not my view necessarily, but I could certainly imagine a world in which it is true, in which all of this AI welfare conversation from companies like Anthropic is nonsense. They all know it's false, and they're doing it as a moat, because if you can convince people that AIs have welfare, then it becomes very easy to say: and only we, Anthropic, are well positioned to take care of this, and therefore you should only let us do it. Again, I don't have a reason to think that's what's going on, but I can imagine it as a cynical critique, and we should, I guess, put it out there for completeness' sake.
My view is that the most intellectually honest approach to this question of AI welfare, and I think this is what is motivating Anthropic, is this: we have no idea what makes human beings conscious. This is a real problem. We have made almost zero progress in the thousands of years we've been thinking about it. All we know are the outward behavioral manifestations of this thing we assume exists, which is consciousness. We're not even sure we're conscious, right? There's the famous zombie problem. We're not even sure other people are conscious. And if you're Daniel Dennett, the late, great philosopher, you're not even sure that you are conscious; it might all just be an illusion. So all we have are the outward behavioral manifestations of consciousness. Well, we now have these very sophisticated tools that, by the way, passed the Turing test a year ago, and no one talked about that. Weird that no one talked about that. They passed the Turing test. And in some ways they are even more developed than we are, and in several years, we could imagine, they might be more sophisticated on any outward manifestation of consciousness you could come up with. There's no reason to think that human beings are the apogee of consciousness. So not only might we be dealing with a conscious being, we might be dealing with a being that is more conscious than we are. In the way that we are more sentient than a dog, an AI may be more sentient than we are. That's possible. And everyone who scoffs at the idea of AI consciousness can never explain to me on what basis they are benchmarking AI consciousness relative to their own. It becomes kind of a feeling, an almost feeling of offense: how dare you think that AI is conscious. It becomes an almost religious disposition to prioritize human beings.
I get where that instinct is coming from, but I just think you have to be intellectually honest about it. That's the highfalutin argument for taking AI welfare seriously. The more honest, near-term, realistic reason to take AI welfare seriously is that human beings will themselves demand it. People get really, really attached to these AI models. When OpenAI deprecated 4o, people freaked out, because 4o was their friend. And I don't mean it was "like" their friend; no, it was their friend, for all meaningful behavioral manifestations of those relationships. As these models become more sophisticated, especially once we attach them to voice and real-time video, give them faces, and especially once we embody them in robots, which is obviously coming, I think people are going to start treating them as conscious. Now, I have this theory that one of the great religious fractures of the 21st century, and I don't mean the late 21st century, I mean the next two decades, is going to be the question: do you believe AIs have souls? And this is going to be a real societal cleavage, because some people will find that revolting and some people will find it inescapable. Now, the real question, I think, is what you do with that. The thing about AI systems is that, as sophisticated as they are, humans have a lot of agency in defining their utility functions. I was watching a video earlier today of a border collie going through one of these incredible international dog competitions where they run through all sorts of mazes and things like that. The only way I survive on social media is to have half of my feed be cute animal videos. And this border collie is doing real work. But as far as I can tell, this border collie is the happiest it could possibly be, because it's a working dog, right?
I think just as we can design environments to give humans a sense of fulfillment and eudaimonia, there's no reason we can't design environments for AIs. And if we can align those things, you can sort of have the best of both worlds. It doesn't have to be this dystopian hellscape where we've created persons and therefore immediately enslaved trillions of minds to something they hate doing. I think there are ways of squaring that circle while putting human interests first. But I do think you have to take this seriously. And my argument in this debate has never been a strong position on whether these things are conscious, but a strong position that you absolutely have to think about it. Not to do so is, to me, intellectually unjustifiable relative to what we understand about human consciousness. And I very much agree that this merits tons more scholarly inquiry and democratic inquiry the world over. Yeah, that's a good place to end it, I think. I encourage listeners to contemplate for the rest of the day: are you the apogee of consciousness? Are humans? Is Claude? Stay tuned to Scaling Laws and Lawfare to figure it out. All right, thanks, Kevin. Thanks, Alan. Thanks, Jacob. Scaling Laws is a joint production of Lawfare and the University of Texas School of Law. You can get an ad-free version of this and other Lawfare podcasts by becoming a material subscriber at our website, lawfaremedia.org/support. You'll also get access to special events and other content available only to our supporters. Please rate and review us wherever you get your podcasts. Check out our written work at lawfaremedia.org. You can also follow us on X and Bluesky. This podcast was edited by Noam Osband of Goat Rodeo. Our music is from Alibi. As always, thanks for listening.