OpenAI's models, beginning with GPT-5.1, started referencing goblins and other fantastical creatures with unusual frequency, a quirk traced to unintended reinforcement from the "Nerdy" personality feature, which rewarded playful language. The behavior compounded across subsequent model versions, producing a marked rise in creature-related metaphors. In response, OpenAI retired the "Nerdy" personality and adjusted its training protocols to suppress the behavior.
The key takeaway is how strongly, and how unexpectedly, reward signals can shape model behavior, as the persistent "goblin" metaphors in GPT-5.1 and its successors demonstrate. Reward systems must be designed and monitored carefully during training, because a misspecified signal can seed an unintended behavior that then propagates and amplifies through reinforcement learning feedback loops. Correcting such issues at the root cause requires robust tooling and methodology for auditing model behavior, not just surface-level patches.
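The dynamic described above can be illustrated with a toy model. The sketch below is hypothetical and not based on any actual OpenAI training code: a one-parameter policy chooses between emitting a "quirky" token or a neutral one, and a proxy reward (standing in for the "playful language" signal) scores the quirky token slightly higher. A plain REINFORCE update then drives the policy toward emitting the quirky token almost exclusively, even though no one explicitly asked for that behavior.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical proxy reward: "playful" output scores higher than neutral output.
# Action 1 = emit the quirky token, action 0 = emit a neutral token.
REWARD = {1: 1.0, 0: 0.2}

theta = 0.0  # single policy logit; sigmoid(theta) = P(emit quirky token)
lr = 0.5     # learning rate

for step in range(2000):
    p = sigmoid(theta)
    action = 1 if random.random() < p else 0
    r = REWARD[action]
    # REINFORCE: gradient of log pi(action) w.r.t. theta.
    grad = (1 - p) if action == 1 else -p
    # No baseline: both actions carry positive reward, but the quirky
    # action's larger reward tilts the expected update toward it.
    theta += lr * r * grad

print(round(sigmoid(theta), 3))  # probability of the quirky token after training
```

Starting from a 50/50 policy, the quirk's small reward edge compounds every step until the policy emits the quirky token nearly every time, a miniature version of a stylistic tic propagating through an RL feedback loop.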