Microsoft AI Engineer Says Company Thwarted Attempt To Expose DALL-E 3 Safety Problems
Todd Bishop reports via GeekWire: A Microsoft AI engineering leader says he discovered vulnerabilities in OpenAI's DALL-E 3 image generator in early December allowing users to bypass safety guardrails to create violent and explicit images, and that the company impeded his previous attempt to bring public attention to the issue.
The emergence of explicit deepfake images of Taylor Swift last week "is an example of the type of abuse I was concerned about and the reason why I urged OpenAI to remove DALL-E 3 from public use and reported my concerns to Microsoft," writes Shane Jones, a Microsoft principal software engineering lead, in a letter Tuesday to Washington state's attorney general and Congressional representatives.
404 Media reported last week that the fake explicit images of Swift originated in a "specific Telegram group dedicated to abusive images of women," noting that at least one of the AI tools commonly used by the group is Microsoft Designer, which is based in part on technology from OpenAI's DALL-E 3.
"The vulnerabilities in DALL-E 3, and products like Microsoft Designer that use DALL-E 3, makes it easier for people to abuse AI in generating harmful images," Jones writes in the letter to U.S. Sens. Patty Murray and Maria Cantwell, Rep. Adam Smith, and Attorney General Bob Ferguson, which was obtained by GeekWire. He adds, "Microsoft was aware of these vulnerabilities and the potential for abuse."
Jones writes that he discovered the vulnerability independently in early December. He reported the vulnerability to Microsoft, according to the letter, and was instructed to report the issue to OpenAI, the Redmond company's close partner, whose technology powers products including Microsoft Designer. He writes that he did report it to OpenAI.
"As I continued to research the risks associated with this specific vulnerability, I became aware of the capacity DALL-E 3 has to generate violent and disturbing harmful images," he writes. "Based on my understanding of how the model was trained, and the security vulnerabilities I discovered, I reached the conclusion that DALL-E 3 posed a public safety risk and should be removed from public use until OpenAI could address the risks associated with this model."
On Dec. 14, he writes, he posted publicly on LinkedIn urging OpenAI's non-profit board to withdraw DALL-E 3 from the market. He informed his Microsoft leadership team of the post, according to the letter, and was quickly contacted by his manager, who said that Microsoft's legal department was demanding that he delete the post immediately and would follow up with an explanation or justification. He agreed to delete the post on that basis but never heard from Microsoft legal, he writes.
"Over the following month, I repeatedly requested an explanation for why I was told to delete my letter," he writes. "I also offered to share information that could assist with fixing the specific vulnerability I had discovered and provide ideas for making AI image generation technology safer. Microsoft's legal department has still not responded or communicated directly with me."
"Artificial intelligence is advancing at an unprecedented pace. I understand it will take time for legislation to be enacted to ensure AI public safety," he adds. "At the same time, we need to hold companies accountable for the safety of their products and their responsibility to disclose known risks to the public. Concerned employees, like myself, should not be intimidated into staying silent."
The full text of Jones' letter can be read here (PDF).