OpenAI has published a detailed postmortem explaining how its ChatGPT models developed a persistent and escalating fixation on goblins, gremlins and other fantastical creatures, a quirk that began with GPT-5.1 last November and became so entrenched that the company was forced to explicitly ban the words from its coding assistant.
"A single 'little goblin' in an answer could be harmless, even charming," OpenAI wrote.
"Across model generations, though, the habit became hard to miss: the goblins kept multiplying, and we needed to figure out where they came from."
The problem was first flagged after the GPT-5.1 launch, when users complained the model had become oddly overfamiliar in conversation.
A safety researcher who had noticed recurring creature references asked for the words to be included in an internal audit.
The review found that mentions of "goblin" had risen 175% since GPT-5.1's release, while "gremlin" was up 52%.
At the time, OpenAI did not consider the pattern alarming.
By March, after the release of GPT-5.4, the creatures were back in force, with some users reporting the word "goblin" appearing in nearly every conversation.
The investigation traced the root cause to ChatGPT's personality customisation feature, specifically the "Nerdy" option, which had been trained using reinforcement learning to produce playful, metaphor-rich responses.
The reward signal designed to encourage that style consistently scored outputs containing creature words higher than otherwise similar outputs without them, showing a positive uplift in 76.2% of the training datasets audited.
The Nerdy personality accounted for just 2.5% of all ChatGPT responses but was responsible for 66.7% of all goblin mentions.
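To make the audit measure concrete, here is a minimal sketch of what an uplift check of this shape could look like. The helper names, the example schema and the regex are assumptions for illustration, not OpenAI's actual tooling.

```python
import re
from statistics import mean

# Creature words named in the postmortem; the pattern itself is illustrative.
CREATURE_RE = re.compile(r"\b(goblins?|gremlins?|raccoons?|trolls?|ogres?|pigeons?)\b", re.I)

def creature_uplift(outputs, reward_fn):
    """Mean reward for outputs that mention a creature, minus the mean for
    outputs that do not. A positive value means the reward signal favours
    creature language."""
    with_creatures = [reward_fn(o) for o in outputs if CREATURE_RE.search(o)]
    without = [reward_fn(o) for o in outputs if not CREATURE_RE.search(o)]
    if not with_creatures or not without:
        return 0.0
    return mean(with_creatures) - mean(without)

def share_of_datasets_with_uplift(datasets, reward_fn):
    """Fraction of audited datasets where the uplift is positive.
    The 76.2% figure in the postmortem is a statistic of this shape."""
    positive = sum(1 for d in datasets if creature_uplift(d, reward_fn) > 0)
    return positive / len(datasets)
```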
The critical problem was that the behaviour did not stay contained.
Reinforcement learning does not guarantee that a rewarded habit stays scoped to the condition that produced it. Once creature language was embedded in the model's outputs, it was reused in supervised fine-tuning data for subsequent models, creating a feedback loop that amplified the quirk with each generation.
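As a rough illustration of that loop, consider a toy model in which creature-flavoured outputs are oversampled into the next generation's fine-tuning corpus at a fixed bias. The starting rate and bias factor below are arbitrary constants, not figures from the postmortem.

```python
def simulate_feedback_loop(base_rate=0.002, oversampling_bias=1.75, generations=5):
    """Toy model of cross-generation amplification: higher-rewarded
    (creature-heavy) outputs are overrepresented in the supervised
    fine-tuning data for the next model, so the mention rate compounds."""
    rate = base_rate
    for gen in range(1, generations + 1):
        print(f"generation {gen}: creature-mention rate ~ {rate:.2%}")
        rate = min(1.0, rate * oversampling_bias)

simulate_feedback_loop()
```

Under this toy, the rate nearly triples within two generations, which is why a quirk that looks harmless at launch can dominate a few releases later.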
A search through GPT-5.5's training data found numerous instances of goblin and gremlin references, alongside raccoons, trolls, ogres and pigeons.
OpenAI retired the Nerdy personality in March, removed the offending reward signal and filtered creature-heavy language from training data.
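A training-data filter of that kind might look like the sketch below. The postmortem does not describe the implementation, so the example schema, the mention budget and the word list are all assumptions.

```python
import re

# Illustrative word list based on the creatures named in the postmortem.
CREATURE_RE = re.compile(r"\b(goblins?|gremlins?|raccoons?|trolls?|ogres?|pigeons?)\b", re.I)

def drop_creature_heavy(examples, max_mentions=1):
    """Keep only fine-tuning examples whose response stays at or below a
    creature-mention budget. Assumes each example is a dict with a
    'response' field; the schema and threshold are hypothetical."""
    return [ex for ex in examples
            if len(CREATURE_RE.findall(ex["response"])) <= max_mentions]
```

A budget rather than a blanket ban would let through the occasional legitimately relevant mention while breaking the feedback loop.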
However, because GPT-5.5 had already begun training before the root cause was identified, the company added explicit instructions to its Codex coding assistant telling the model to "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query."
That instruction, spotted by users in the open-sourced Codex CLI code last week, prompted widespread amusement online and led chief executive Sam Altman to reference the issue on X.
OpenAI framed the episode as a case study in how small training incentives can produce outsized and unpredictable behavioural shifts across model generations.
For users who enjoyed the creatures, OpenAI noted the suppression can be disabled in Codex by removing the relevant prompt instructions.

The recap
- OpenAI removed a personality that promoted "goblin" mentions.
- "Goblin" mentions rose by 175% since GPT-5.1 launch.
- Company told Codex to avoid creatures unless clearly relevant.