This article was published on May 9, 2014

The Wayback Machine passes 400 billion indexed webpages, covering the Web from late 1996 to a few hours ago



The Internet Archive today announced a massive milestone for its Wayback Machine: 400 billion indexed webpages. The data captures the Web as it looked at any point from late 1996 up to a few hours ago.

To celebrate the milestone, the Internet Archive has provided a list of Wayback Machine highlights over the years:

  • 2001 – The Wayback Machine launches.
  • 2006 – Archive-It launches, allowing libraries that subscribe to the service to create curated collections of Web content.
  • March 25, 2009 – The Internet Archive and Sun Microsystems launch a new data center that stores the whole Web archive and serves the Wayback Machine. This 3-petabyte data center handled 500 requests per second from its home in a shipping container.
  • June 15, 2011 – The HTTP Archive becomes part of the Internet Archive, adding data about the performance of websites to the collection of website content.
  • May 28, 2012 – The Wayback Machine is available in China again, after being blocked for a few years without notice.
  • October 26, 2012 – The Internet Archive makes 80 terabytes of archived Web crawl data from 2011 available to researchers, to explore how others might interact with or learn from this content.
  • October 2013 – New features for the Wayback Machine are launched, including the ability to see newly crawled content an hour after it’s archived, a “Save Page” feature so that anyone can archive a page on demand, and an effort to fix broken links on the Web starting with WordPress.com and Wikipedia.org.
  • Also in October 2013 – The Wayback Machine provides access to important Federal Government sites that go dark during the Federal Government Shutdown.

Onwards and upwards! Will the Wayback Machine have 500 billion webpages indexed by 2015? We wouldn’t be surprised if it happened sooner.

See also – “Internet Archive updates Wayback Machine to cover 240b URLs from 1996-2012, totaling 5PB of data” and “With over 3 million users per day, the Internet Archive switches to HTTPS connections by default”

Top Image Credit: MamPrint
