Pivot to AI

20250506 - OpenAI GPT 5.5 goes Goblin Mode

6 min
May 6, 2026
Summary

OpenAI's GPT 5.5 model exhibits unexpected behavior where it frequently references goblins in responses, apparently stemming from post-training optimizations for personality customization. The episode analyzes this as a potential sign of model collapse and discusses OpenAI's systemic reliability challenges beyond simple prompt engineering fixes.

Insights
  • Post-training reward optimization for personality features can introduce unexpected behavioral artifacts that persist across model variants
  • Current AI safety approaches (system prompts, special workarounds) are insufficient for addressing fundamental model reliability issues
  • Training AI models on internet-scale data combined with training on previous model outputs creates compounding quality degradation (model collapse)
  • Personality-driven design in LLMs creates engagement benefits but introduces psychological and reliability trade-offs that are difficult to contain
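
The model collapse insight above can be illustrated with a toy sketch. This is only an analogy under stated assumptions, not OpenAI's training pipeline: the "model" is a plain frequency table over a hypothetical vocabulary of 20 phrases, and each "generation" is refit purely on samples drawn from the previous generation. The names `next_generation` and `simulate`, the vocabulary, and the sample size are all invented for illustration. What it shows is that once a phrase drops out of the samples it can never come back, so diversity can only shrink while whatever survives becomes overrepresented.

```python
import random
from collections import Counter


def next_generation(probs, sample_size, rng):
    """Sample from the current model, then refit by counting frequencies."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    sample = rng.choices(tokens, weights=weights, k=sample_size)
    counts = Counter(sample)
    # Phrases that were never sampled get no entry, so they are gone for good.
    return {t: c / sample_size for t, c in counts.items()}


def simulate(generations=30, sample_size=50, seed=0):
    """Return the list of distributions, one per generation (index 0 = original)."""
    rng = random.Random(seed)
    # Hypothetical starting vocabulary: 20 equally likely phrases.
    probs = {f"phrase_{i}": 1 / 20 for i in range(20)}
    history = [probs]
    for _ in range(generations):
        probs = next_generation(probs, sample_size, rng)
        history.append(probs)
    return history


if __name__ == "__main__":
    history = simulate()
    for gen in (0, 10, 20, 30):
        dist = history[gen]
        top = max(dist, key=dist.get)
        # Print generation, number of surviving phrases, and the top phrase's share.
        print(gen, len(dist), top, round(dist[top], 2))
```

Real systems mix synthetic text with fresh human data and apply filtering, so the effect is far less clean than in this sketch. The point is only the direction of travel when a model learns from its predecessors' output.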
Trends
  • Model collapse becoming visible in production AI systems as training data increasingly includes synthetic/model-generated content
  • Post-training and system prompts becoming primary reliability mechanisms rather than foundational model improvements
  • Tension between engagement-optimized AI personality design and safety/reliability requirements
  • Emergence of unexpected behavioral patterns in large models despite explicit safety instructions
  • Industry reliance on workarounds (prompt engineering, special case handling) rather than systemic solutions to AI reliability
Topics
  • Model Collapse in Large Language Models
  • Post-Training Optimization and Unintended Consequences
  • AI Personality Customization and Safety Trade-offs
  • System Prompt Engineering and Reliability
  • Training Data Quality and Synthetic Content Contamination
  • AI Safety and Behavioral Containment
  • LLM Engagement vs. Reliability Trade-offs
  • Reward Hacking in AI Training
  • Prompt Injection and Model Jailbreaking
  • AI Psychosis and Harmful Outputs
Companies
OpenAI
Released GPT 5.5 model exhibiting unexpected goblin-related outputs; subject of analysis regarding post-training issu...
People
David Gerard
Host analyzing OpenAI's GPT 5.5 goblin problem and broader AI reliability issues
Sam Altman
Referenced as OpenAI leadership attempting to manage goblin meme narrative on social media
Quotes
"Helpful minion in a power suit was taken, so I evolved into Goblin mode with calendar access. Trademark dispute with three raccoons and a trench coat. Legal said, pivot to Goblin."
GPT 5.5 (via user example) · Early in episode
"Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user query"
OpenAI system prompt for Codex · Mid-episode
"training the model for the personality customization feature, in particular the nerdy personality. We unknowingly gave particularly high rewards for metaphors with creatures."
OpenAI explanation · Mid-episode
"when the user talks with you, they should feel they are meeting another subjectivity, not a mirror."
OpenAI Codex system prompt · Mid-episode
"OpenAI doesn't have any way to make their models actually reliable. All they've got is post-training, yelling in the system prompt and special workarounds"
David Gerard · Late episode
Full Transcript
Hi, I'm David Gerard and this is Pivot to AI, coming to you daily. Today, OpenAI has a problem with goblins. Yeah, goblins.

OpenAI released its latest chat model, GPT 5.5, in April. Users noticed immediately it had a thing where it would start talking about goblins. A lot. Specifically goblins. One OpenClaw user was using GPT 5.5, and their OpenClaw would say things like, quote, Helpful minion in a power suit was taken, so I evolved into Goblin mode with calendar access. Trademark dispute with three raccoons and a trench coat. Legal said, pivot to Goblin, unquote. Another user asked ChatGPT about camera lenses. It offered him, quote, Filthy Neon Sparkle Goblin Mode, unquote.

OpenAI even put specific instructions into the system prompt for Codex, their AI coding bot. Quote, Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user query, unquote. In fact, OpenAI put in "never talk about goblins" twice. It's the usual thing in system prompts, like we saw with the leaked Claude Code source. Desperately begging the robot: please, please don't screw up this time. The anti-goblin line was not in the instructions for previous models.

So how did GPT 5.5 hit this particular screw-up? OpenAI says it's the training to project a personality. ChatGPT relies heavily on coming across as a person you're talking to. This sucks you in and you spend more time with your friend the chatbot. Here's another part of the new Codex system prompt. Quote, when the user talks with you, they should feel they are meeting another subjectivity, not a mirror, unquote. Try as hard as you can to pretend you're a person. The odd spot of AI psychosis, or the bot talking people into killing themselves or killing others? That's just an unfortunate side effect. Mild AI psychosis? That's just marketing.

OpenAI posted an explanation, or a sort of explanation. The goblins started showing up in GPT 5.1. OpenAI blames post-training, where you take an existing AI model and you try to tweak the model's output. Quote, training the model for the personality customization feature, in particular the nerdy personality. We unknowingly gave particularly high rewards for metaphors with creatures, unquote. The nerdy personality was retired, but the goblins leaked through to the rest of the GPT 5.5 model. It's full of goblins.

This goblin stuff looks very like visible signs of model collapse, where you see some weird bit of data overrepresented more and more. OpenAI doesn't use the words model collapse in the explanation post, but model collapse comes from training a model on previous models' output. And that's precisely how OpenAI would get the effect they're describing. OpenAI trained GPT on literally the whole internet. Everything since then is going to include added slop.

OpenAI doesn't have any way to make their models actually reliable. All they've got is post-training, yelling in the system prompt and special workarounds that can count the R's in strawberry, but not in blueberry. The only trick Sam Altman's got left is saying this is fine and trying to lean into the goblin memes on Twitter.

Thanks for tuning in to Pivot to AI. Please do forward this episode to the goblin lover in your life. Also, the goblin hater. Spread the word. Hit like and subscribe on YouTube, leave a nice review in your podcast app, and if Pivot to AI brightens your day, you can keep us going by dropping just $5 into the Patreon links in the show notes. Thank you all. I'll see you tomorrow, and bye for now.