Reddit Inks $60 Million AI Content Licensing Agreement with Google
Social media platform Reddit has finalized a landmark content licensing contract with Google. The deal is worth about $60 million per year and will supply the search and advertising giant with Reddit data to train artificial intelligence (AI) models.
According to sources, this is Reddit's first major agreement to provide its trove of user-generated content to an AI developer.
For Reddit, which is filing for an initial public offering (IPO) that would make its finances public, the deal underscores the company's push to diversify revenue through data licensing.
Seeking New Revenue Streams Before IPO
The agreement comes as the company is poised to make its highly anticipated IPO paperwork public as early as this week. The filing would give investors an unprecedented glimpse into Reddit's balance sheet as the 16-year-old company aims to go public.
Reddit, last valued at $10 billion in 2021, intends to sell around 10% of its shares in the offering. The IPO would be the first by a major social media platform since Pinterest debuted in 2019.
For Reddit, the imminent stock listing caps years of anticipation and of pressure on the company to demonstrate a viable business model to Wall Street.
The company has ramped up initiatives to diversify revenue, including its recent move to charge companies for API access to data. Notably, tech giants like Google have faced backlash for scraping websites without permission to obtain AI training data, raising copyright concerns.
This deal hands Google an enormous corpus of conversation data spanning virtually every topic imaginable to enhance its AI models.
User Content on AI Advancement and Recent Backlash on Tech Giants
Recently, OpenAI and its backer Microsoft faced a lawsuit alleging the unauthorized incorporation of nonfiction books into the dataset for ChatGPT, OpenAI's wildly popular chatbot.
Authors Nicholas Basbanes and Nicholas Gage spearheaded the proposed class action, claiming violations of copyright.
This controversy mirrors similar cases brought by creatives against AI developers, including a recent wide-spanning lawsuit targeting Google's data collection practices. Filed by the Clarkson Law Firm, the complaint asserted Google scraped users' data without consent to improve AI services like writing assistant Bard.
The complaint hinged on Google's updated privacy policy, which expressly mentions mining publicly available information to advance its AI. Although Google stated that this is not a new practice and has merely been extended to new offerings, the lawsuit spotlighted growing unease around the exploitation of personal data.
The lawsuit also highlighted the need for transparent sourcing of training data. By licensing Reddit content, Google gains access to a continually updating trove of real-world human conversations in niche communities covering sports, health, science, food, parenting, and more.
This self-moderated, real-time interplay holds immense value for advancing natural language AI. Since its 2005 founding by Steve Huffman and Alexis Ohanian, Reddit has built an engaged user base that drives the direction of conversations big and small. Discussions regularly span from lighthearted to deeply personal.
This ever-evolving digital record offers AI researchers a bottomless well of linguistic data reflecting how people communicate, argue, explain, and inform one another.
Access to the daily discourse of the site's more than 50 million daily active users is therefore expected to spur advances in language AI.
The post Reddit Inks $60 Million AI Content Licensing Agreement with Google appeared first on The Tech Report.