Google's Jigsaw Was Fighting Toxic Speech With AI. Then the AI Started Talking
tedlistens writes: All large language models are liable to produce toxic and other unwanted outputs, either by themselves or at the encouragement of users. To evaluate and "detoxify" their LLMs, OpenAI, Meta, Anthropic, and others are using Perspective API -- a free tool from Google's Jigsaw unit designed to flag toxic human speech on social media platforms and in comment sections. But, as Alex Pasternack reports at Fast Company, researchers and Jigsaw itself acknowledge problems with Perspective and other AI classifiers, and worry that AI developers using them to build LLMs could be inheriting their failures, false positives, and biases. That could, in turn, make the language models more biased or less knowledgeable about minority groups, harming some of the same people the classifiers are meant to help. "Our goal is really around humans talking to humans," says Jigsaw's Lucy Vasserman, "so [using Perspective to police AI] is something we kind of have to be a little bit careful about."

"Think of all the problems social media is causing today, especially for political polarization, social fragmentation, disinformation, and mental health," Eric Schmidt wrote in a recent essay with Jonathan Haidt about the coming harms of generative AI. "Now imagine that within the next 18 months -- in time for the next presidential election -- some malevolent deity is going to crank up the dials on all of those effects, and then just keep cranking."

While Jigsaw says the unit is focused on tackling toxicity and hate, misinformation, violent extremism, and repressive censorship, a former Jigsaw employee worries that Perspective may be only a stopgap measure for AI safety. "I'm concerned that the safeguards for models are becoming just lip service -- that what's being done is only for the positive publicity that can be generated, rather than trying to make meaningful safeguards," the ex-employee says. The article closes with a quote from Vasserman: "I think we are slowly but surely generally coming to a consensus around, these are the different types of problems that you want to be thinking about, and here are some techniques. But I think we're still -- and we'll always be -- far from having it fully solved."
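For readers unfamiliar with how Perspective is actually consulted, here is a minimal sketch of scoring text against its public Comment Analyzer endpoint. The endpoint and response fields follow Google's published API; the API key is a placeholder, and the 0.7 cutoff and the filtering helper are illustrative assumptions, not the actual detoxification pipeline used by Jigsaw or any of the labs named above.

```python
import requests

# Google's Perspective API (Comment Analyzer) endpoint.
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"
API_KEY = "YOUR_API_KEY"  # placeholder; requires a Google Cloud project with the API enabled


def toxicity_score(text: str) -> float:
    """Return Perspective's TOXICITY probability (0.0 to 1.0) for a piece of text."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(PERSPECTIVE_URL, params={"key": API_KEY}, json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def is_flagged(model_output: str, threshold: float = 0.7) -> bool:
    """Illustrative filter: flag an LLM output whose toxicity score exceeds a chosen threshold.

    The 0.7 threshold is arbitrary here; as the article notes, whatever cutoff is used,
    the score itself can carry the classifier's own false positives and biases.
    """
    return toxicity_score(model_output) >= threshold
```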
Read more of this story at Slashdot.