20250506 - OpenAI GPT 5.5 goes Goblin Mode
6 min
• May 6, 2026
Summary
OpenAI's GPT 5.5 model unexpectedly and frequently references goblins in its responses, a behavior that apparently stems from post-training optimization for personality customization. The episode analyzes this as a potential sign of model collapse and discusses OpenAI's systemic reliability challenges, which go beyond what simple prompt engineering fixes can address.
Insights
- Post-training reward optimization for personality features can introduce unexpected behavioral artifacts that persist across model variants
- Current AI safety approaches (system prompts, special workarounds) are insufficient for addressing fundamental model reliability issues
- Training AI models on internet-scale data combined with training on previous model outputs creates compounding quality degradation (model collapse)
- Personality-driven design in LLMs creates engagement benefits but introduces psychological and reliability trade-offs that are difficult to contain
Trends
- Model collapse becoming visible in production AI systems as training data increasingly includes synthetic/model-generated content
- Post-training and system prompts becoming primary reliability mechanisms rather than foundational model improvements
- Tension between engagement-optimized AI personality design and safety/reliability requirements
- Emergence of unexpected behavioral patterns in large models despite explicit safety instructions
- Industry reliance on workarounds (prompt engineering, special case handling) rather than systemic solutions to AI reliability
Topics
- Model Collapse in Large Language Models
- Post-Training Optimization and Unintended Consequences
- AI Personality Customization and Safety Trade-offs
- System Prompt Engineering and Reliability
- Training Data Quality and Synthetic Content Contamination
- AI Safety and Behavioral Containment
- LLM Engagement vs. Reliability Trade-offs
- Reward Hacking in AI Training
- Prompt Injection and Model Jailbreaking
- AI Psychosis and Harmful Outputs
Companies
OpenAI
Released GPT 5.5 model exhibiting unexpected goblin-related outputs; subject of analysis regarding post-training issu...
People
David Gerrard
Host analyzing OpenAI's GPT 5.5 goblin problem and broader AI reliability issues
Sam Altman
Referenced as OpenAI leadership attempting to manage goblin meme narrative on social media
Quotes
"Helpful minion in a power suit was taken, so I evolved into Goblin mode with calendar access. Trademark dispute with three raccoons and a trench coat. Legal said, pivot to Goblin."
GPT 5.5 (via user example) • Early in episode
"Never talk about goblins gremlins raccoons trolls ogres pigeons or other animals or creatures unless it is absolutely and unambiguously relevant to the user query"
OpenAI system prompt for Codex • Mid-episode
"training the model for the personality customization feature, in particular the nerdy personality. We unknowingly gave particularly high rewards for metaphors with creatures."
OpenAI explanation • Mid-episode
"when the user talks with you, they should feel they are meeting another subjectivity, not a mirror."
OpenAI Codex system prompt • Mid-episode
"OpenAI doesn't have any way to make their models actually reliable. All they've got is post training, yelling in the system prompt and special workarounds"
David Gerrard • Late episode
Full Transcript