by msmash on (#6G7E8)
The scientific literature is polluted with fake manuscripts churned out by paper mills -- businesses that sell bogus work and authorships to researchers who need journal publications for their CVs. But just how large is this paper-mill problem? From a report:

An unpublished analysis shared with Nature suggests that over the past two decades, more than 400,000 research articles have been published that show strong textual similarities to known studies produced by paper mills. Around 70,000 of these were published last year alone. The analysis estimates that 1.5-2% of all scientific papers published in 2022 closely resemble paper-mill works. Among biology and medicine papers, the rate rises to 3%.

Without individual investigations, it is impossible to know whether all of these papers are in fact products of paper mills. But the proportion -- a few per cent -- is a reasonable conservative estimate, says Adam Day, director of scholarly data-services company Clear Skies in London, who conducted the analysis using machine-learning software he developed called the Papermill Alarm. In September, a cross-publisher initiative called the STM Integrity Hub, which aims to help publishers combat fraudulent science, licensed a version of Day's software for its set of tools to detect potentially fabricated manuscripts.

Paper-mill studies are produced in large batches at speed, and they often follow specific templates, with the occasional word or image swapped. Day set his software to analyse the titles and abstracts of more than 48 million papers published since 2000, as listed in OpenAlex, a giant open index of research papers that launched last year, and to flag manuscripts with text that very closely matched known paper-mill works.
These include both retracted articles and suspected paper-mill products spotted by research-integrity sleuths such as Elisabeth Bik, in California, and David Bimler (also known by the pseudonym Smut Clyde), in New Zealand.