
Internet Archive Project

Company Name: Internet Archive Project

Line of Business: The Internet Archive Project was founded by Brewster Kahle in 1996 to build an Internet library offering permanent access for researchers, historians, and scholars to historical collections in digital format.  The Archive works to prevent the Internet and other "born-digital" materials from disappearing into the past.

Objective: To grow a multi-petabyte data archive composed of high-density, energy-efficient, and reliable storage nodes at the absolute lowest price point.

Result: Starting with 100TB of Capricorn Technologies’ GB Series nearline storage, the Internet Archive Data Center currently houses more than 3 petabytes of GB and PS PetaBox nodes, and is continually adding PetaBox nodes to its datastore to accommodate the increasingly massive storage requirements associated with this ambitious and important humanitarian project.

http://www.archive.org

Capricorn Technologies Builds One of the World’s Largest Data Archives

Brewster Kahle is not a small thinker.

Brewster has built technologies, companies, and institutions aimed at advancing the goal of providing unfettered universal access to all knowledge in perpetuity. He currently oversees the non-profit Internet Archive (IA), one of the largest digital archives in the world, as Founder and Digital Librarian. Among its many endeavors, the IA has scanned over one hundred thousand books and has pioneered a print-on-demand bookmobile for people in remote areas of the globe who would otherwise have no access to a library.

Brewster began his career as a member of the Thinking Machines supercomputer team, where he invented the WAIS (Wide Area Information Servers) distributed text searching system built to search index databases on remote computers. After Thinking Machines, Brewster founded WAIS, Inc., which was sold to AOL in 1995. In 1996, Brewster went on to found the Internet Archive and its commercial counterpart, Alexa Internet. Alexa was then sold to Amazon.com.

Now, besides running the Internet Archive, he is a member of the board of directors of the Electronic Frontier Foundation, a non-profit advocacy and legal organization dedicated to preserving free speech rights in the context of today's digital age, and a key supporter of the Open Content Alliance, a consortium of non-profit and for-profit groups dedicated to building a free archive of digital text and multimedia.

Open and free access to literature and other writings has long been considered essential to education and to the maintenance of an open society. Public and philanthropic enterprises have supported it through the ages. Brewster Kahle is here to uphold this tradition and ensure that access remains. As Brewster sees it, without cultural artifacts, civilization has no memory and no mechanism to learn from its successes and failures.

The Internet Archive’s combined collections currently receive over 6 million downloads a day. The Wayback Machine receives approximately 10 million daily hits, an average of 100-200 hits per second, 100K lookups-by-URL, and 4 million retrieval requests per day. The entire IA digital library, including web pages, educational courseware, films, videos, music, spoken word, books, texts, and software, is housed on and served from more than 3 petabytes of Capricorn Technologies’ GB and PS series PetaBox technology.

The Archive could only grow to this massive scale under the proper storage conditions. Unable to find the low-cost, power-sparing, high-density devices he required to realistically house the huge amounts of data the IA would be storing, Brewster brought in engineering talent in the form of his good friend and MIT crony, CR Saikley, to develop inexpensive devices based on Linux and commodity PC components.

“We began by evaluating off-the-shelf solutions, thinking that surely there was already something out there that met Brewster’s cost, density, and power efficiency requirements,” said Saikley, who went on to found Capricorn. “We soon discovered that there was nothing available that achieved what we knew was possible.” Having come to that realization, Saikley began development of a solution based on a low-power, x86-based VIA EPIA Mini-ITX mainboard and a design that circulated air in a manner that would maintain a low overall temperature for the box.

At the same time, Kahle and Saikley began discussing ways to make PetaBox technology available on a wider scale, and Capricorn Technologies was formed. Capricorn began operations shortly thereafter and shipped its first products in September of 2004. The Internet Archive remains a major consumer of PetaBox technology today, helping to guide new product development.

The IA's PetaBox Storage Cluster

The IA’s current storage cluster consists of about 1,200 PetaBox units in roughly 30 racks, with 4,800 spinning drives, for a total capacity of roughly 3 petabytes. Despite its large size, the IA’s PetaBox installation draws only about 50 kW of power and is maintained by two full-time people. “The reliability of the PetaBox along with its low power-drawing and heat-dissipating properties are key to our operation,” states Kahle. “We couldn’t have achieved the scale of storage we have today without this technology.”
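The cluster figures quoted above imply a few per-unit ratios that make the density and power claims easier to picture. The following is a rough sketch in Python, not a vendor specification: the function name `cluster_ratios` is made up for illustration, the inputs are the approximate numbers from the text, and one petabyte is assumed to mean 1,000 terabytes.

```python
# Approximate figures quoted in the case study text.
RACKS = 30
UNITS = 1200
DRIVES = 4800
CAPACITY_TB = 3000   # assumption: 3 PB counted as 3,000 TB (decimal units)
POWER_KW = 50


def cluster_ratios(racks, units, drives, capacity_tb, power_kw):
    """Derive per-rack, per-unit, and per-drive ratios from cluster totals."""
    return {
        "units_per_rack": units / racks,
        "drives_per_unit": drives / units,
        "gb_per_drive": capacity_tb * 1000 / drives,
        "watts_per_unit": power_kw * 1000 / units,
        "watts_per_tb": power_kw * 1000 / capacity_tb,
    }


ratios = cluster_ratios(RACKS, UNITS, DRIVES, CAPACITY_TB, POWER_KW)
for name, value in ratios.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the totals work out to roughly 40 units per rack, 4 drives per unit, 625 GB per drive, and about 42 W per unit. Because every input is a rounded figure, the results are only indicative.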

In the nearly three years that the IA Data Center has been populated with Capricorn’s PetaBox, there has been ample opportunity for gauging the reliability and ownership costs associated with the storage. “Overall, we have seen an annual failure rate of about 3% on the individual systems, but given the redundancy, we are maintaining and growing our datasets,” states Kahle. “And the costs to scale and maintain these units decrease over time as component prices drop and more energy-efficient chips become available.”
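To put the quoted failure rate in operational terms, the short Python sketch below estimates how many systems might fail in a year. It assumes that "individual systems" refers to the roughly 1,200 PetaBox units mentioned earlier and that failures are spread evenly through the year. Both are simplifications, and the function name is hypothetical.

```python
UNITS = 1200                 # assumption: "systems" means PetaBox units
ANNUAL_FAILURE_RATE = 0.03   # "about 3%" per the quote above


def expected_failures_per_year(units, rate):
    """Expected number of unit failures in one year at a flat annual rate."""
    return units * rate


failures = expected_failures_per_year(UNITS, ANNUAL_FAILURE_RATE)
days_between = 365 / failures  # average spacing if failures were evenly spread

print(f"Expected failures per year: {failures:.0f}")
print(f"Average days between failures: {days_between:.1f}")
```

On those assumptions, a 3% rate corresponds to about 36 failed units a year, or roughly one every ten days. That workload is consistent with a two-person operations team, provided redundancy covers the data while a unit is replaced.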

Today Capricorn Technologies continues to expand the PetaBox family, providing storage solutions with the lowest possible Total Cost of Ownership to organizations worldwide. Capricorn Technologies is proud to be a leader in the current data storage movement towards truly affordable, manageable and reliable high-density storage.