by Eileen Guo from MIT Technology Review
Millions of images of passports, credit cards, birth certificates, and other documents containing personally identifiable information are likely included in one of the biggest open-source AI training sets, new research has found. Thousands of images, including identifiable faces, were found in a small subset of DataComp CommonPool, a major AI training set for image generation scraped from...