The Structure of Generic Storage Costs

Users' expectations of storage seem to scale with the tremendous rate of hardware improvement. Users expect ever larger quantities of data to be durable and highly available everywhere, including at work, at home, and via nomadic devices. They also expect information to be accessible on different platforms, and at exponentially decreasing costs. Unfortunately, this collective set of needs is relatively expensive to provide, and its cost has largely not followed the exponentially decreasing cost of basic hardware.

The reason these needs have not tracked cost improvements is primarily that, as hardware storage costs decrease and the amount of data increases, the total cost of storage ownership becomes dominated by administrative costs, and, secondarily, that the needs require more than the cheapest commodity components can deliver. While it is true that one can fit a desktop PC with storage at exponentially decreasing cost, for most users, doing so does not meet the criteria of networked availability and durability. For availability, the storage needs to be on a server that can be accessed from any number of computing platforms; similarly, availability requires that the platform be something more than the simplest node, involving a hardware RAID or a variety of other features. For durability, the storage needs to be backed up or otherwise copied, which for administrative reasons generally means the bytes live on a server, which is subject to rigorous backup procedures.

Administering Storage

The cost of providing storage to users today is dominated by administrative costs, and, left to its own devices, the situation will only grow more pronounced. For example, one IT operation reports the following breakdown of storage costs for operating a 2 TB NAS system [14]:

Costs For Operating a 2 TB NAS System Used By 8000 Users:

Environment: 2%
Housing: 2%
Drives: 7%
Backups: 34%
Admin: 55%

In our own EECS department, the figures are perhaps more striking. It currently costs us $144/GB/year to provide a user with storage, broken down as follows:

UCB EECS Cost of Providing a GB of Storage in a 3 TB System Used by 1400 Users:

Initial hardware purchase price 12.09%
Annual backup 68.75%
Hw/Sw maintenance 1.73%
System adm/management 17.43%

We suspect that backups and administration are accounted for somewhat differently in our own case than in the figures reported in [14]. In our case, the backup costs are further divided as follows:

Hardware 32.15%
Supplies (tapes) 40.34%
Staff 22.51%
Administrative overhead 5.00%

In any case, the implication seems clear: The cost of storage hardware per se is a small, and decreasing, fraction of the overall cost of storage. Moreover, the cost of tape backup is a high-order term. In effect, if we were provided with hardware free of cost, the overall cost of delivering fully loaded storage to our users would shrink by about 12%.
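To make the arithmetic behind these figures concrete, the following minimal sketch converts the percentage shares quoted above into dollars per GB per year. The $144 total and the percentages are taken from the text; the variable and function names are hypothetical and chosen only for illustration.

```python
# Illustrative arithmetic only. The total and the percentage shares are the
# figures quoted in the text; all names here are hypothetical.
TOTAL_COST_PER_GB_YEAR = 144.0  # dollars per GB per year

cost_shares = {  # percent of total cost
    "Initial hardware purchase price": 12.09,
    "Annual backup": 68.75,
    "Hw/Sw maintenance": 1.73,
    "System adm/management": 17.43,
}

backup_shares = {  # percent of the annual backup cost
    "Hardware": 32.15,
    "Supplies (tapes)": 40.34,
    "Staff": 22.51,
    "Administrative overhead": 5.00,
}


def dollars(total, shares):
    """Convert percentage shares of `total` into dollar amounts."""
    return {name: total * pct / 100.0 for name, pct in shares.items()}


cost_dollars = dollars(TOTAL_COST_PER_GB_YEAR, cost_shares)
backup_dollars = dollars(cost_dollars["Annual backup"], backup_shares)

# Cost per GB per year if the initial hardware purchase were free.
cost_without_hardware = (
    TOTAL_COST_PER_GB_YEAR - cost_dollars["Initial hardware purchase price"]
)

if __name__ == "__main__":
    for name, amount in cost_dollars.items():
        print(f"{name:35s} ${amount:7.2f}")
    for name, amount in backup_dollars.items():
        print(f"  backup / {name:24s} ${amount:7.2f}")
    print(f"Cost with free hardware: ${cost_without_hardware:.2f}")
```

Under these figures, backup accounts for about $99 of the $144, tape supplies alone for roughly $40, and free hardware would still leave a cost of about $126.59 per GB per year.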

One may hope that the tape industry will come to the rescue, but there is good reason to believe that this will not happen. Thus far, tape has not kept pace with the rapid advance in disk storage capacity and cost. For example, recently, a 40 GB DLT tape (raw capacity) was priced at about 25% more than a 40 GB EIDE disk drive (~$50 vs. ~$40). And the economics of the industry may be such as not to motivate more rapid change. For data distribution and personal backups, other storage media will probably continue to dominate: Floppy disks are suitable for megabyte purposes, CDs for up to 650 MB, and soon, writable DVD for up to 5 GB. These capacities are of a useful order of magnitude for the needs of most PC users; and so, not surprisingly, while almost every PC comes with a CD drive and a disk drive, very few PCs employ a tape drive. Moreover, CD and DVD drives are manufactured for consumer markets much larger than the computer market, and it is credible that the hard disk manufacturers will eventually ship many more disk drives for the personal, home, automobile audio, and video market than they will for the computer market, continuing the already advanced displacement of tape from the consumer market. In sum, the tape industry does not benefit from the high volumes of the PC industry and consumer markets, and hence does not sustain the same level of investment as do disk and other technologies.

Tape backup may not be worth its high cost either. Backing up a petabyte at 1 GB/s would take about 12 days, so common "full dump" strategies have to be revisited. By the same token, restoring a site from tape after a disaster can take many days even under the most optimistic assumptions, so tape may be a poor option for that purpose. Also, the reliability of tape-based archival and backup systems has been called into question. Convincing data on the performance of tape during disaster recovery is hard to come by, but anecdotal evidence of failures of supposedly backed up data is all too common. (From what we have been able to gather, reasonably high-end tape systems have good hardware performance, provided that tapes are not re-used frequently. Most errors are then human errors, which can be avoided by bar-coding tapes and by other good, but in the end costly, administrative practices.)
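The full-dump estimate above can be checked with a short calculation. This is a sketch under stated assumptions: decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and a throughput sustained without interruption for the entire transfer. The names are hypothetical.

```python
# Back-of-the-envelope transfer time, assuming decimal units and a sustained,
# uninterrupted throughput. All names are illustrative.
PETABYTE_BYTES = 10**15
THROUGHPUT_BYTES_PER_SECOND = 10**9  # 1 GB/s
SECONDS_PER_DAY = 24 * 60 * 60


def transfer_days(size_bytes, bytes_per_second):
    """Days needed to move `size_bytes` at a constant `bytes_per_second`."""
    return size_bytes / bytes_per_second / SECONDS_PER_DAY


full_dump_days = transfer_days(PETABYTE_BYTES, THROUGHPUT_BYTES_PER_SECOND)

if __name__ == "__main__":
    print(f"Full dump of 1 PB at 1 GB/s: {full_dump_days:.1f} days")
```

The result is roughly 11.6 days, consistent with the figure of about 12 days. The same calculation applies to a full restore at the same rate, which is why recovery time is a concern as well.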

As a result, disk is emerging as an alternative to tape for backup. Affordable disk-to-disk (D2D) storage systems are now available in a variety of formats, and, although the merits of tape versus disk will still be debated, purely disk-based systems are now a real possibility. Moreover, such systems would appear to have a number of desirable characteristics: They would better support incremental rather than full-partition backup, would provide faster and better random access to data, could possibly be cost-competitive with tape, and may be less subject to the sort of human error that can compromise disaster recovery.