Research Summaries Written By AI Fool Scientists
An anonymous reader quotes a report from Scientific American: An artificial-intelligence (AI) chatbot can write such convincing fake research-paper abstracts that scientists are often unable to spot them, according to a preprint posted on the bioRxiv server in late December. "I am very worried," says Sandra Wachter, who studies technology and regulation at the University of Oxford, UK, and was not involved in the research. "If we're now in a situation where the experts are not able to determine what's true or not, we lose the middleman that we desperately need to guide us through complicated topics," she adds. Researchers are divided over the implications for science. The chatbot, ChatGPT, creates realistic and intelligent-sounding text in response to user prompts. It is a 'large language model', a system based on neural networks that learn to perform a task by digesting huge amounts of existing human-generated text. Software company OpenAI, based in San Francisco, California, released the tool on November 30, and it is free to use. Since its release, researchers have been grappling with the ethical issues surrounding its use, because much of its output can be difficult to distinguish from human-written text. Scientists have published a preprint and an editorial written by ChatGPT. Now, a group led by Catherine Gao at Northwestern University in Chicago, Illinois, has used ChatGPT to generate artificial research-paper abstracts to test whether scientists can spot them. The researchers asked the chatbot to write 50 medical-research abstracts based on a selection published in JAMA, The New England Journal of Medicine, The BMJ, The Lancet and Nature Medicine. They then compared these with the original abstracts by running them through a plagiarism detector and an AI-output detector, and they asked a group of medical researchers to spot the fabricated abstracts.
The ChatGPT-generated abstracts sailed through the plagiarism checker: the median originality score was 100%, which indicates that no plagiarism was detected. The AI-output detector spotted 66% of the generated abstracts. But the human reviewers didn't do much better: they correctly identified only 68% of the generated abstracts and 86% of the genuine abstracts. They incorrectly identified 32% of the generated abstracts as being real and 14% of the genuine abstracts as being generated. Wachter says that, if scientists can't determine whether research is true, there could be "dire consequences". As well as being problematic for researchers, who could be pulled down flawed routes of investigation because the research they are reading has been fabricated, there are "implications for society at large because scientific research plays such a huge role in our society". For example, it could mean that research-informed policy decisions are incorrect, she adds. Arvind Narayanan, a computer scientist at Princeton University in New Jersey, disagrees: "It is unlikely that any serious scientist will use ChatGPT to generate abstracts." He adds that whether generated abstracts can be detected is "irrelevant." "The question is whether the tool can generate an abstract that is accurate and compelling. It can't, and so the upside of using ChatGPT is minuscule, and the downside is significant," he says.