There's No Tiananmen Square In the New Chinese Image-Making AI
An anonymous reader quotes a report from MIT Technology Review: There's a new text-to-image AI in town. With ERNIE-ViLG, developed by the Chinese tech company Baidu, you can generate images that capture the cultural specificity of China. It also makes better anime art than DALL-E 2 or other Western image-making AIs. But there are many things -- like Tiananmen Square, the country's second-largest city square and a symbolic political center -- that the AI refuses to show you. When a demo of the software was released in late August, users quickly found that certain words -- both explicit mentions of political leaders' names and words that are potentially controversial only in political contexts -- were labeled as "sensitive" and blocked from generating any result. China's sophisticated system of online censorship, it seems, has extended to the latest trend in AI. It's not rare for similar AIs to restrict users from generating certain types of content. DALL-E 2 prohibits sexual content, faces of public figures, and images of medical treatment. But the case of ERNIE-ViLG underscores the question of where exactly the line between moderation and political censorship lies. The ERNIE-ViLG model is part of Wenxin, a large-scale project in natural-language processing from China's leading AI company, Baidu. It was trained on a data set of 145 million image-text pairs and contains 10 billion parameters -- the values that a neural network adjusts as it learns, which the AI uses to discern the subtle differences between concepts and art styles. That means ERNIE-ViLG has a smaller training data set than DALL-E 2 (650 million pairs) and Stable Diffusion (2.3 billion pairs) but more parameters than either one (DALL-E 2 has 3.5 billion parameters and Stable Diffusion has 890 million). Baidu released a demo version on its own platform in late August and then later on Hugging Face, the popular international AI community. 
The main difference between ERNIE-ViLG and Western models is that the Baidu-developed one understands prompts written in Chinese and is less likely to make mistakes with culturally specific words. But ERNIE-ViLG will be defined, as the other models are, by what it allows. Unlike DALL-E 2 or Stable Diffusion, ERNIE-ViLG does not have a published explanation of its content moderation policy, and Baidu declined to comment for this story. When the ERNIE-ViLG demo was first released on Hugging Face, users inputting certain words would receive the message "Sensitive words found. Please enter again (...)," which was a surprisingly honest admission about the filtering mechanism. However, since at least September 12, the message has read "The content entered doesn't meet relevant rules. Please try again after adjusting it. (...)" In a test of the demo by MIT Technology Review, a number of Chinese words were blocked: names of high-profile Chinese political leaders like Xi Jinping and Mao Zedong; terms that can be considered politically sensitive, like "revolution" and "climb walls" (a metaphor for using a VPN service in China); and the name of Baidu's founder and CEO, Yanhong (Robin) Li. While words like "democracy" and "government" are allowed on their own, prompts that combine them with other words, like "democracy Middle East" or "British government," are blocked. Tiananmen Square in Beijing also can't be found in ERNIE-ViLG, likely because of its association with the Tiananmen Massacre, references to which are heavily censored in China. Giada Pistilli, a principal ethicist at Hugging Face, says it could be helpful for the developer of ERNIE-ViLG to release a document explaining its moderation decisions. "Is it censored because it's the law that's telling them to do so? Are they doing that because they believe it's wrong? It always helps to explain our arguments, our choices," says Pistilli. 
"Despite the built-in censorship, ERNIE-ViLG will still be an important player in the development of large-scale text-to-image AIs," concludes the report. "The emergence of AI models trained on specific language data sets makes up for some of the limitations of English-based mainstream models. It will particularly help users who need an AI that understands the Chinese language and can generate accurate images accordingly." "Just as Chinese social media platforms have thrived in spite of rigorous censorship, ERNIE-ViLG and other Chinese AI models may eventually experience the same: they're too useful to give up."
Read more of this story at Slashdot.