Existing Petabyte Stores
Several existing large data storage facilities operate at a scale of
interest to this project. For example, the Internet Archive and Google, the Internet search engine,
manage very large storage facilities. In both cases, they have
succeeded in building useful and scalable facilities out of commodity
parts. In the case of the Internet Archive, their infrastructure
comprises a set of HP desktop machines, each with four 100 GB IDE drives, for a
hardware cost of $4k/TB. Similarly, Google is built from 8,000 no-name
PCs, each with two 80 GB disks, resulting in
1.4 PB online at just over $5k/TB. These models have many other
practical advantages as well. For example, Internet Archive reports that they can
have new storage up and running 3 weeks after ordering it. Google
machines are administered by 15 system administrators, who are therefore
able to support ~100 TB apiece.
Hence these stores provide a very attractive model. Unfortunately, while they provide some of what we need, they do not provide all of it. In particular, no bytes in these stores are considered especially precious, and hence data are not systematically backed up. (The Internet Archive makes an additional disk copy of data it regards as important, and stores some data on DLT tape.) Since the cost of backup dominates all other costs, these models do not immediately provide an overall solution to our needs.
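The capacity and staffing figures quoted above can be checked with a little arithmetic. The following is a minimal sketch, assuming decimal units (1 TB = 1000 GB) and ignoring replication and file system overhead; the function and variable names are illustrative only and do not come from either organization.

```python
# Back-of-the-envelope check of the figures quoted above.
# All function and variable names are illustrative, not from the source.

GB_PER_TB = 1000  # assumption: decimal units (1 TB = 1000 GB)


def raw_capacity_tb(machines, disks_per_machine, gb_per_disk):
    """Raw disk capacity in TB, ignoring replication and file system overhead."""
    return machines * disks_per_machine * gb_per_disk / GB_PER_TB


def tb_per_admin(total_tb, admins):
    """Storage each administrator supports if the load is shared evenly."""
    return total_tb / admins


# Google: 8,000 PCs with two 80 GB disks each; 1.4 PB quoted online; 15 admins.
google_raw_tb = raw_capacity_tb(8000, 2, 80)
google_tb_per_admin = tb_per_admin(1400, 15)

# Internet Archive: four 100 GB drives per machine.
archive_tb_per_machine = raw_capacity_tb(1, 4, 100)

print(f"Google raw capacity from counts: {google_raw_tb:.0f} TB")
print(f"Google storage per administrator: {google_tb_per_admin:.0f} TB")
print(f"Internet Archive capacity per machine: {archive_tb_per_machine:.1f} TB")
```

Dividing the quoted 1.4 PB among 15 administrators gives roughly 93 TB each, consistent with the ~100 TB figure. The machine and disk counts taken at face value give 1.28 PB rather than 1.4 PB, which suggests the quoted counts are approximate.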
Scientific institutions, such as CERN and SDSC, manage large data
centers that support very large scientific data sets. Some of these
use high-performance disk systems with access rates of multiple gigabytes
per second, and concomitantly higher costs. However, SDSC also constructs
generic storage out of commodity units, "grid bricks" in this case:
Intel-based boxes that provide a 1.7 GHz CPU, 1.1 TB of disk, and a
Gigabit Ethernet network card for ~$3,500. One can aggregate grid
bricks using data grid technology to automate some administrative tasks,
such as user authentication, access control, and disk management. Using
data grid technology the system is scalable to petabytes, but requires a
database to manage the distributed state information. Amortizing the
cost of running and maintaining the database, the total cost is about
$2,000 per TB per year. However, such a setup is much like the Internet
Archive and Google: it is used to support web access to data, but not to
ensure data persistence. For archival storage systems, the current cost
including labor, tape media, tape robots, software, and network
connections is reported to be about $1,000 more per TB per year
(according to Reagan Moore of the SDSC).
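To make the SDSC figures concrete, the sketch below computes the hardware cost per TB of a single grid brick and the yearly cost of a store of a given size. It assumes the reported archival cost of about $1,000 per TB per year is added on top of the $2,000 per TB per year online cost, which is one reading of the figures above; the constant and function names are illustrative only.

```python
# Rough cost model built from the SDSC figures quoted above.
# Names are illustrative; the additive archival cost is an assumption.

BRICK_PRICE_USD = 3500           # quoted approximate price of one grid brick
BRICK_CAPACITY_TB = 1.1          # quoted disk capacity of one grid brick
ONLINE_USD_PER_TB_YEAR = 2000    # quoted total cost, database amortized
ARCHIVAL_USD_PER_TB_YEAR = 1000  # quoted archival cost, assumed additive


def brick_hardware_usd_per_tb():
    """Purchase price of a grid brick divided by its disk capacity."""
    return BRICK_PRICE_USD / BRICK_CAPACITY_TB


def annual_cost_usd(capacity_tb, archival=True):
    """Yearly cost of a store of the given size, with or without archiving."""
    rate = ONLINE_USD_PER_TB_YEAR
    if archival:
        rate += ARCHIVAL_USD_PER_TB_YEAR
    return capacity_tb * rate


print(f"Brick hardware cost: ${brick_hardware_usd_per_tb():,.0f} per TB")
print(f"1 PB online only:    ${annual_cost_usd(1000, archival=False):,} per year")
print(f"1 PB with archive:   ${annual_cost_usd(1000):,} per year")
```

Under these assumptions, a grid brick costs a little under $3,200 per TB in hardware, and a petabyte store with archival backup costs on the order of $3 million per year.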
All these figures can be expected to change dramatically in the next six months. For example, the grid brick estimate was based on 160 GB disk drives. 320 GB disk drives now exist, and could drive the cost down by a factor of two when production increases. The tape cost is likewise based on 9940B tape media, which holds 200 GB per cartridge, and multiple vendors are working on 1 TB tape cartridges.
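The effect of these technology changes can be sketched numerically. The example below assumes that a grid brick keeps the same price and number of drives when 160 GB drives are replaced by 320 GB drives, and that one petabyte is 1,000,000 GB; both are simplifying assumptions, and the function names are hypothetical.

```python
import math

# Projection of the technology changes described above.
# Assumes brick price and drive count stay fixed; names are hypothetical.


def projected_usd_per_tb(brick_price_usd, capacity_tb, old_drive_gb, new_drive_gb):
    """Hardware cost per TB after swapping in larger drives at the same price."""
    scale = new_drive_gb / old_drive_gb
    return brick_price_usd / (capacity_tb * scale)


def cartridges_needed(total_gb, cartridge_gb):
    """Number of tape cartridges required to hold one copy of the data."""
    return math.ceil(total_gb / cartridge_gb)


current = projected_usd_per_tb(3500, 1.1, 160, 160)
future = projected_usd_per_tb(3500, 1.1, 160, 320)

PETABYTE_GB = 1_000_000  # assumption: decimal petabyte

print(f"Brick cost per TB with 160 GB drives: ${current:,.0f}")
print(f"Brick cost per TB with 320 GB drives: ${future:,.0f}")
print(f"200 GB cartridges per PB: {cartridges_needed(PETABYTE_GB, 200)}")
print(f"1 TB cartridges per PB:   {cartridges_needed(PETABYTE_GB, 1000)}")
```

With these assumptions the hardware cost per TB halves, and the number of cartridges needed for one copy of a petabyte drops from 5,000 to 1,000.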
The SDSC grid brick approach is interesting for us to consider, in that it delivers backed-up storage for a fraction of our cost. However, it may or may not be applicable to our needs: it does not support multiple file system exports of the same data, and may depend on an economy of scale that we may or may not be able to achieve. Also, the availability of these systems may be below what is desired in an "enterprise" environment, which is how we view our research infrastructure.
Jim Gray and his
colleagues have built a number of terabyte systems, including the
TerraServer and the SkyServer. The TerraServer used somewhat higher-end
Digital Alpha servers, while the later SkyServer uses more commodity
components (albeit with SCSI-RAID disks). These systems do not in
themselves provide archival stores of these data, but these researchers'
experience (described, for example, in Chung,
Gray, Horst and Worthington) strongly suggests that low-end TB
servers are a realistic option (and in many ways inspired this
proposal).