
Stable Diffusion 3 Mangles Human Bodies Due To Nudity Filters

by BeauHD, from Slashdot (#6NG1N)
An anonymous reader quotes a report from Ars Technica: On Wednesday, Stability AI released weights for Stable Diffusion 3 Medium, an AI image-synthesis model that turns text prompts into AI-generated images. Its arrival has been ridiculed online, however, because it generates images of humans in a way that seems like a step backward from other state-of-the-art image-synthesis models like Midjourney or DALL-E 3. As a result, it can churn out wildly anatomically incorrect visual abominations with ease.

A Reddit thread titled "Is this release supposed to be a joke? [SD3-2B]" details the spectacular failures of SD3 Medium at rendering humans, especially human limbs like hands and feet. Another thread, titled "Why is SD3 so bad at generating girls lying on the grass?", shows similar issues, but for entire human bodies.

AI image fans are so far blaming Stable Diffusion 3's anatomy failures on Stability's insistence on filtering adult content (often called "NSFW" content) out of the SD3 training data that teaches the model how to generate images. "Believe it or not, heavily censoring a model also gets rid of human anatomy, so... that's what happened," wrote one Reddit user in the thread. The release of Stable Diffusion 2.0 in 2022 suffered from similar problems in depicting humans accurately, and AI researchers soon discovered that censoring adult content that contains nudity also severely hampers an AI model's ability to generate accurate human anatomy. At the time, Stability AI reversed course with SD 2.1 and SD XL, regaining some of the abilities lost by excluding NSFW content. "It works fine as long as there are no humans in the picture, I think their improved nsfw filter for filtering training data decided anything humanoid is nsfw," wrote another Redditor.

Basically, any time a prompt homes in on a concept that isn't represented well in the model's training dataset, the image model will confabulate its best interpretation of what the user is asking for. And sometimes that can be completely terrifying. Using a free online demo of SD3 on Hugging Face, we ran prompts and saw results similar to those being reported by others. For example, the prompt "a man showing his hands" returned an image of a man holding up two giant-sized backward hands, although each hand at least had five fingers.
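For readers who want to try reproducing this outside the online demo, below is a minimal sketch using Hugging Face's diffusers library, assuming diffusers >= 0.29 (which added StableDiffusion3Pipeline) and access to the gated stabilityai/stable-diffusion-3-medium-diffusers checkpoint; it is one plausible local setup, not the exact configuration the article tested.

    import torch
    from diffusers import StableDiffusion3Pipeline

    # Load the SD3 Medium weights. The repository is gated, so you must
    # accept the license on Hugging Face and log in first (e.g. via
    # `huggingface-cli login`).
    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers",
        torch_dtype=torch.float16,
    ).to("cuda")

    # The same prompt the article ran against the online demo. The step
    # count and guidance scale here are assumptions, not values from the
    # article.
    image = pipe(
        "a man showing his hands",
        num_inference_steps=28,
        guidance_scale=7.0,
    ).images[0]
    image.save("sd3_hands_test.png")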


Read more of this story at Slashdot.
