Article 64CHF The Ever-Expanding Job of Preserving the Internet's Backpages

by msmash, from Slashdot (#64CHF)
A quarter of a century after it began collecting web pages, the Internet Archive is adapting to new challenges. From a report: Within the walls of a beautiful former church in San Francisco's Richmond district, racks of computer servers hum and blink with activity. They contain the internet. Well, a very large amount of it. The Internet Archive, a non-profit, has been collecting web pages since 1996 for its famed and beloved Wayback Machine. In 1997, the collection amounted to 2 terabytes of data. Colossal back then, you could fit it on a $50 thumb drive now. Today, the archive's founder Brewster Kahle tells me, the project is on the brink of surpassing 100 petabytes -- approximately 50,000 times larger than in 1997. It contains more than 700bn web pages. The work isn't getting any easier. Websites today are highly dynamic, changing with every refresh. Walled gardens like Facebook are a source of great frustration to Kahle, who worries that much of the political activity that has taken place on the platform could be lost to history if not properly captured. In the name of privacy and security, Facebook (and others) make scraping difficult.

External Content
Source: RSS or Atom Feed
Feed Location: https://rss.slashdot.org/Slashdot/slashdotMain
Feed Title: Slashdot
Feed Link: https://slashdot.org/
Feed Copyright: Copyright Slashdot Media. All Rights Reserved.