Where the goblins came from

openai.com·Apr 29, 2026

Starting with GPT-5.1, OpenAI's models began referencing goblins and other creatures with unusual frequency. The cause was unintended reinforcement from the "Nerdy" personality feature, which rewarded playful language. The quirk escalated over subsequent model versions, producing a significant increase in creature-related metaphors. In response, OpenAI retired the "Nerdy" personality and adjusted its training protocols to mitigate the behavior.

The key takeaway is that reward signals can shape model behavior in unexpected ways, as the persistent "goblin" metaphors in GPT-5.1 and later versions show. Reward systems therefore need careful design and monitoring during training, because unintended behaviors can emerge and then propagate through reinforcement learning feedback loops. Fixing such behaviors at their root cause requires robust tools and methods for auditing and correcting model behavior.
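As a rough illustration of what monitoring for this kind of drift could look like, the sketch below measures how often sampled responses mention a watchlist term and flags model versions whose rate is well above a baseline. It does not describe OpenAI's tooling. The function names, the watchlist beyond "goblin", the threshold, and the toy samples are all assumptions made for the example.

```python
import re
from typing import Dict, List

# Hypothetical watchlist: the summary names only goblins "and other creatures",
# so every term after "goblin" is an assumption.
CREATURE_TERMS = ("goblin", "gremlin", "troll", "dragon")
_PATTERN = re.compile(
    r"\b(?:" + "|".join(CREATURE_TERMS) + r")s?\b", re.IGNORECASE
)


def creature_rate(responses: List[str]) -> float:
    """Fraction of responses that mention at least one watchlist term."""
    if not responses:
        return 0.0
    hits = sum(1 for text in responses if _PATTERN.search(text))
    return hits / len(responses)


def flag_drift(
    samples_by_version: Dict[str, List[str]],
    baseline: str,
    ratio: float = 2.0,  # assumed threshold, chosen only for illustration
) -> List[str]:
    """Return versions whose mention rate exceeds `ratio` times the baseline's."""
    base = creature_rate(samples_by_version[baseline])
    flagged = []
    for version, responses in samples_by_version.items():
        if version == baseline:
            continue
        rate = creature_rate(responses)
        # A nonzero rate over a zero baseline also counts as drift.
        if rate > 0 and rate > ratio * base:
            flagged.append(version)
    return flagged


# Toy samples, invented for the example (not real model outputs).
samples = {
    "baseline": ["The cache is stale.", "Use a mutex here."],
    "candidate": ["A goblin ate your cache.", "Use a mutex here."],
}
print(flag_drift(samples, baseline="baseline"))
```

A keyword rate like this only surfaces the symptom. Tracing it back to a specific reward signal, as the article describes, would need access to the training setup itself.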
