Existing Petabyte Stores

Several large data storage facilities operate at a scale of interest to this project. For example, the Internet Archive and Google, the Internet search engine, both manage very large storage facilities, and both have succeeded in building useful and scalable facilities out of commodity parts. The Internet Archive's infrastructure comprises a set of HP desktop machines, each with four 100 GB IDE drives, for a hardware cost of $4k/TB. Similarly, Google is built from 8,000 no-name PCs, each with two 80 GB disks, resulting in 1.4 PB online at just over $5k/TB. These models have many other practical advantages as well. For example, the Internet Archive reports that it can have new storage up and running 3 weeks after ordering it. Google's machines are administered by 15 system administrators, each of whom is therefore able to support ~100 TB.
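The figures quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming decimal units (1 TB = 1,000 GB); the function and variable names are illustrative only and come from no system described here. Note that the raw product of the quoted Google figures comes to 1.28 PB, somewhat below the 1.4 PB stated; the source does not explain the difference.

```python
# Cross-check of the capacity and staffing figures quoted in the text.
# Assumption: decimal units, 1 TB = 1000 GB. All names are illustrative.

def raw_capacity_tb(machines, disks_per_machine, gb_per_disk):
    """Raw (unformatted) capacity in TB for a fleet of identical machines."""
    return machines * disks_per_machine * gb_per_disk / 1000.0

# Google: 8,000 PCs, each with two 80 GB disks.
google_raw_tb = raw_capacity_tb(8000, 2, 80)

# Google: 1.4 PB (as stated in the text) shared among 15 administrators.
tb_per_admin = 1400 / 15

# Internet Archive: one machine with four 100 GB drives, at $4k/TB.
ia_machine_tb = raw_capacity_tb(1, 4, 100)
ia_machine_cost_usd = ia_machine_tb * 4000

print(f"Google raw capacity: {google_raw_tb:.0f} TB")
print(f"TB per administrator: {tb_per_admin:.1f}")
print(f"Implied Internet Archive cost per machine: ${ia_machine_cost_usd:.0f}")
```

Under these assumptions, each administrator supports roughly 93 TB, consistent with the "~100 TB" figure, and the $4k/TB rate implies a cost of about $1,600 per Internet Archive machine.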

Hence these stores provide a very attractive model. Unfortunately, while they provide some of what we need, they do not provide all of it. In particular, no bytes in these stores are considered especially precious, and hence data are not systematically backed up. (The Internet Archive makes an additional disk copy of data it regards as important, and stores some data on DLT tape.) Since the cost of backup dominates all other costs, these models do not immediately provide an overall solution to our needs.

Scientific communities, for example CERN and SDSC, manage large data centers that support very large scientific data sets. Some of these are high-performance disk systems with access rates of multiple gigabytes per second, and concomitantly higher costs. However, SDSC also constructs generic storage out of commodity units, "grid bricks" in this case: Intel-based boxes that provide a 1.7 GHz CPU, 1.1 TB of disk, and a Gigabit Ethernet network card for ~$3,500. One can aggregate grid bricks using data grid technology to automate some administrative tasks, such as user authentication, access control, and disk management. Using data grid technology, the system is scalable to petabytes, but requires a database to manage the distributed state information. Amortizing the cost of running and maintaining the database, the total cost is about $2,000 per TB per year. However, such a setup is much like those of the Internet Archive and Google: it is used to support web access to data, but not to ensure data persistence. For archival storage systems, the current cost including labor, tape media, tape robots, software, and network connections is reported to be about $1,000 per TB more per year (according to Reagan Moore of the SDSC).
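To make these SDSC figures easier to compare with the earlier ones, the sketch below computes the grid brick hardware cost per TB and the annual cost of a store of a given size. It assumes that the archival figure of about $1,000 per TB per year is added on top of the $2,000 per TB per year for disk, which is one reading of the text; the names used are illustrative only.

```python
# Grid brick cost arithmetic, using only the figures quoted in the text.
# Assumption: the archival cost is additive to the disk cost.

GRID_BRICK_PRICE_USD = 3500        # one grid brick
GRID_BRICK_TB = 1.1                # disk capacity of one grid brick
DISK_COST_PER_TB_YEAR = 2000       # amortized, including the state database
ARCHIVE_EXTRA_PER_TB_YEAR = 1000   # labor, tape media, robots, software, network

def hardware_cost_per_tb(price_usd, capacity_tb):
    """Purchase price per TB of disk for one unit."""
    return price_usd / capacity_tb

def annual_cost_usd(capacity_tb, archival=True):
    """Yearly cost of a store of the given size, with or without archiving."""
    per_tb = DISK_COST_PER_TB_YEAR
    if archival:
        per_tb += ARCHIVE_EXTRA_PER_TB_YEAR
    return capacity_tb * per_tb

brick_per_tb = hardware_cost_per_tb(GRID_BRICK_PRICE_USD, GRID_BRICK_TB)
print(f"Grid brick hardware: ${brick_per_tb:,.0f} per TB")
print(f"1 PB, disk only:     ${annual_cost_usd(1000, archival=False):,.0f} per year")
print(f"1 PB, with archive:  ${annual_cost_usd(1000):,.0f} per year")
```

On this reading, a grid brick costs roughly $3,200 per TB to purchase, and a one-petabyte store would cost about $2 million per year on disk alone, or about $3 million per year with archival storage.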

All of these figures can be expected to change dramatically in the next six months. For example, the grid brick estimate was based on 160 GB disk drives. 320 GB disk drives now exist, and could drive the cost down by a factor of two once production increases. The tape estimate is based on 9940B tape media, which stores 200 GB per cartridge. Multiple vendors are working on 1 TB tape cartridges.
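The factor-of-two claim can be illustrated as follows. This is a rough sketch under two assumptions not stated in the text: that a grid brick holds seven drives (7 x 160 GB = 1.12 TB, close to the quoted 1.1 TB), and that the price of a brick stays at about $3,500 when the larger drives are substituted.

```python
# Effect of doubling drive capacity on grid brick cost per TB.
# Assumptions (not from the source): 7 drives per brick, unchanged brick price.

BRICK_PRICE_USD = 3500
DRIVES_PER_BRICK = 7  # hypothetical: 7 x 160 GB = 1.12 TB, near the quoted 1.1 TB

def brick_cost_per_tb(gb_per_drive, drives=DRIVES_PER_BRICK, price_usd=BRICK_PRICE_USD):
    """Purchase price per TB for a brick populated with identical drives."""
    capacity_tb = drives * gb_per_drive / 1000.0
    return price_usd / capacity_tb

cost_160 = brick_cost_per_tb(160)
cost_320 = brick_cost_per_tb(320)

print(f"160 GB drives: ${cost_160:,.0f} per TB")
print(f"320 GB drives: ${cost_320:,.0f} per TB")
print(f"Reduction factor: {cost_160 / cost_320:.1f}")
```

In practice the larger drives would initially cost more than the ones they replace, so the full factor of two would only be reached once their price falls to the current level.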

The SDSC grid brick approach is interesting for us to consider, in that it delivers backed-up storage for a fraction of our cost. However, it may or may not be applicable to our needs: it does not support multiple file system exports of the same data, and it may depend on an economy of scale that we may or may not be able to achieve. Also, the availability of these systems may be below that which is desired in an "enterprise" environment, which is how we view our research infrastructure.

Jim Gray and his colleagues have built a number of terabyte systems, including the TerraServer and the SkyServer. The TerraServer used somewhat higher-end Digital Alpha servers, while the later SkyServer uses more commodity components (albeit with SCSI-RAID disks). These systems do not in themselves provide archival stores of their data, but these researchers' experience (as described, for example, in Chung, Gray, Horst and Worthington) strongly suggests that low-end TB servers are a realistic option, and in many ways it inspired this proposal.