Pipedot

Articles

A major AI training data set contains millions of examples of personal data

Eileen Guo

from MIT Technology Review on 2025-07-18 13:08 (#6YQYV)

Millions of images of passports, credit cards, birth certificates, and other documents containing personally identifiable information are likely included in one of the biggest open-source AI training sets, new research has found. Thousands of images, including identifiable faces, were found in a small subset of DataComp CommonPool, a major AI training set for image generation scraped from...

The Download: how your data is being used to train AI, and why chatbots aren’t doctors

Rhiannon Williams

from MIT Technology Review on 2025-07-21 12:10 (#6YSH1)

This is today's edition of The Download, our weekday newsletter that provides a daily dose of what's going on in the world of technology.

A major AI training data set contains millions of examples of personal data

Millions of images of passports, credit cards, birth certificates, and other documents containing personally identifiable information are likely included in...

A major AI training data set contains millions of examples of personal data

from Hacker News on 2025-07-30 09:59 (#6YZVT)