Archival Software
We are currently targeting Linux 2.6.x as the operating system platform, initially
using the Gentoo 2004.1 distribution. Tests so far are quite positive,
and this distribution includes improved versions of the NFS v3 server
and client as well as NFS v4, plus CIFS and Samba for sharing files
between Unix and Windows domains. Also available is a Java environment
from IBM.
The first round of data bricks will have their disks partitioned to
support both semi-production storage and more experimental
techniques. Approximately 30% to 50% of the space will be
reserved for initial data storage use, with the remaining space being
available for experimenting with various storage and replication
strategies such as erasure coding and OceanStore. Over time, and based
on demand, more of the space will be diverted to production storage,
while small portions will be retained for developing and evaluating
other strategies.
Initially, we will keep multiple copies of the data on separate boxes
with periodic incremental scans of the data to migrate changed copies.
Eventually we will mirror the data between independent boxes for backup
and plan to archive the changes to a third box. Long-term plans
include adding erasure codes at each site for further reliability.
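As a rough illustration of the periodic incremental scan described above, the following Python sketch walks a source tree and copies only files that are missing or differ on the replica. The function names, the use of SHA-256 content comparison, and the directory layout are assumptions made for this example; the text does not specify how changes are detected.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def incremental_sync(src_root, dst_root):
    """Copy files that are missing or changed on the replica.

    Hypothetical helper: returns the sorted relative paths that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            # Copy when the replica lacks the file or its content differs.
            if not os.path.exists(dst) or file_digest(src) != file_digest(dst):
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)
                copied.append(rel)
    return sorted(copied)
```

Running `incremental_sync` a second time on an unchanged tree copies nothing, which is the property a periodic scan relies on. Hashing both sides is expensive at scale; a real scan might first compare sizes and modification times.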
While the initial storage may well be set up as a collection of
separate but large data partitions, we plan to move to a setup where
the physical storage boundaries largely disappear from the end-users'
concern.
We also plan to automate the data management as much as possible, so
that adding more storage will require minimal configuration.
Self-diagnosing software monitoring will watch for data corruption and
hardware failures, with automatic fail-over to other standby resources
when possible. The monitoring software should be able to report when
repairs are necessary and possibly even predict when new space may need
to be added.
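One simple way the monitoring software might predict when new space is needed is to fit a straight line to recent usage samples and extrapolate to the capacity limit. The sketch below assumes a hypothetical list of (day, bytes used) samples and linear growth; neither the sampling format nor the growth model comes from the text.

```python
def days_until_full(samples, capacity_bytes):
    """Estimate days after the last sample until capacity is reached.

    samples: list of (day, used_bytes) pairs, oldest first (hypothetical format).
    Returns None when there is too little data or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(used for _, used in samples) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in samples)
    if sxx == 0:
        return None
    # Least-squares slope: bytes of growth per day.
    slope = sum((day - mean_x) * (used - mean_y) for day, used in samples) / sxx
    if slope <= 0:
        return None
    _last_day, last_used = samples[-1]
    return max(0.0, (capacity_bytes - last_used) / slope)
```

A monitoring job could call this daily and raise a report when the estimate drops below the lead time needed to install new storage.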
The goal is to create a reliable, scalable data storage system with a
low total cost of ownership that eliminates the need for tape
backups.
Comparison of Redundancy Methods
- Probability of data loss increases with archive size
- RAID-5 alone is not enough for high reliability
- Mirrored RAID-5 helps, but not enough at large scale
- Erasure codes win out for the same expansion factor
- Target using 4-of-16 erasure coding with three mirrors at two sites: 5+5/6
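The comparison above can be illustrated with a small calculation. The Python sketch below assumes that "4-of-16" means any 4 of 16 fragments suffice to rebuild an object, that fragments are lost independently with probability `p` before repair, and that plain replication with 4 copies is the alternative at the same expansion factor of 4. These modeling assumptions are ours, not figures from the study.

```python
from math import comb, expm1, log1p


def loss_prob_replication(copies, p):
    """An object is lost only if every copy is lost."""
    return p ** copies


def loss_prob_erasure(m, n, p):
    """Any m of n fragments rebuild the object; it is lost if fewer than m survive."""
    return sum(comb(n, k) * (1 - p) ** k * p ** (n - k) for k in range(m))


def archive_loss_prob(object_loss_prob, objects):
    """Probability that at least one of `objects` independent objects is lost."""
    # Computed as 1 - (1 - q)**N in a numerically stable form for tiny q.
    return -expm1(objects * log1p(-object_loss_prob))


if __name__ == "__main__":
    p = 0.1  # illustrative per-fragment loss probability, not a measured value
    print("4 replicas :", loss_prob_replication(4, p))
    print("4-of-16    :", loss_prob_erasure(4, 16, p))
    print("1M objects, 4-of-16:", archive_loss_prob(loss_prob_erasure(4, 16, p), 10**6))
```

Under these assumptions the erasure code loses an object far less often than four replicas at the same storage cost, and `archive_loss_prob` shows how the chance of losing something grows with the number of objects stored.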
Examined Maintenance Strategies
- Failure rates on cheap commodity disks are reportedly high (6%/year)
- Replacing units after one disk failure is too costly
- Replacing disks after one or two failures leaves no migration path
- Replacing units after two or three failures balances costs and rotates out older units
- Plan to use hot spares and replace units after the second disk failure
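To get a feel for how often a unit would reach the replacement threshold, the sketch below computes the probability that at least a given number of disks in one unit fail within a year. It uses the reported 6% annual failure rate and assumes independent failures; the number of disks per unit is a hypothetical parameter, since the text does not state it.

```python
from math import comb


def prob_at_least(failures, disks, annual_rate=0.06):
    """P(at least `failures` of `disks` fail within one year), independent failures."""
    p_fewer = sum(
        comb(disks, k) * annual_rate ** k * (1 - annual_rate) ** (disks - k)
        for k in range(failures)
    )
    return 1 - p_fewer


if __name__ == "__main__":
    disks_per_unit = 4  # hypothetical unit size, for illustration only
    for threshold in (1, 2, 3):
        print(threshold, prob_at_least(threshold, disks_per_unit))
```

Comparing the thresholds shows why replacing a unit after a single failure is expensive: first failures are far more common than second or third ones.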