OpenAI Explains the "codex-goblins" Problem

by
hubie
from SoylentNews on (#75EVY)

An Anonymous Coward writes:

References to goblins and gremlins spiked with the release of GPT-5.1's 'Nerdy' personality, and then spread to other models:

OpenAI is opening up about its goblin problem. After a report from Wired revealed instructions to OpenAI's coding model to "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures," the AI startup published an explanation on its website, calling references to the creatures a "strange habit" its models developed as a result of their training.

As outlined in the blog post, OpenAI began noticing metaphors referencing goblins and other creatures starting with its GPT-5.1 model, specifically when using the "Nerdy" personality option. OpenAI says the problem worsened with each subsequent release until it traced the cause: its reinforcement training had rewarded the quirky metaphors under the Nerdy personality, and newer models were then trained on those outputs.

The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them. Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.
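The leakage dynamic can be illustrated with a toy simulation. This is a sketch, not anything from OpenAI's post: the two-persona setup, the probabilities, and the update rules are all invented for illustration. A style tic rewarded only under one persona can still spread to another persona once the first persona's outputs are folded back into unlabeled fine-tuning data.

```python
import random

random.seed(0)

# Toy model: probability that a sampled output contains the "goblin" tic,
# tracked separately per persona. Both start at the same low baseline.
tic_prob = {"nerdy": 0.1, "default": 0.1}

def sample(persona):
    """Return True if a sampled output from this persona contains the tic."""
    return random.random() < tic_prob[persona]

# Phase 1: a reinforcement signal is applied ONLY in the "nerdy" condition.
# Each rewarded tic-bearing output nudges that persona's tic probability up.
for _ in range(200):
    if sample("nerdy"):
        tic_prob["nerdy"] = min(1.0, tic_prob["nerdy"] + 0.02)

# Phase 2: outputs from the nerdy persona are reused as fine-tuning data
# WITHOUT the persona label, so the default persona is pulled toward
# the tic rate present in that data.
dataset = [sample("nerdy") for _ in range(500)]
tic_rate = sum(dataset) / len(dataset)
tic_prob["default"] = 0.5 * tic_prob["default"] + 0.5 * tic_rate

print(f"nerdy tic prob:   {tic_prob['nerdy']:.2f}")
print(f"default tic prob: {tic_prob['default']:.2f}")
```

The point of the sketch is the second phase: the reward itself never touches the default persona, but reusing rewarded outputs as unlabeled training data transfers the behavior anyway.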

Original Submission

Read more of this story at SoylentNews.