
The RAID concept (Redundant Arrays of Inexpensive Disks) was first proposed in 1987 by researchers Patterson, Gibson, and Katz at the University of California at Berkeley. They proposed that the cost, performance, size limitations, and reliability of storage subsystems could be improved by combining several small disks into a disk "array" that is perceived as a single disk by applications (and the operating system). They described five ways of combining disks, numbered 1 through 5. Later, RAID 0 was added to the list. Each configuration has different costs, benefits, and drawbacks.
All RAID implementations are completely transparent to applications that use them. No special coding is required. An application that is accessing data on a RAID system does not know anything about its existence. Progress does not know when you are using RAID. The operating system may not know either if you have an independent RAID storage subsystem.
RAID 0
RAID level 0 is also called "striping". In this configuration, a group of disks acts as a single disk (a "stripe set") with data striped across all the disks. The striping strategy is that the total storage space is divided into equal-size sections or "stripe blocks" that are allocated round-robin among all the disks. The size of the stripe block is configurable and is normally a megabyte or longer.
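As a rough sketch of the round-robin allocation described above, the mapping from a logical byte offset to a physical disk can be written in a few lines of Python. The function name `locate` and its parameters are illustrative assumptions, not part of any RAID product. The 1 MB default simply follows the stripe block size mentioned in the text.

```python
def locate(offset, num_disks, stripe_size=1024 * 1024):
    """Map a logical byte offset to (disk index, byte offset on that disk).

    Illustrative only: real stripe sets may use other layouts.
    """
    stripe_block = offset // stripe_size   # index of the stripe block overall
    disk = stripe_block % num_disks        # round-robin choice of disk
    row = stripe_block // num_disks        # stripe blocks already on that disk
    return disk, row * stripe_size + offset % stripe_size


# With four disks, consecutive stripe blocks land on disks 0, 1, 2, 3, 0, ...
for block in range(6):
    print(block, locate(block * 1024 * 1024, num_disks=4))
```

Because neighbouring stripe blocks sit on different disks, a large sequential transfer touches every disk in turn, which is where the parallelism comes from.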
Benefits
- Read and write performance for large sequential files is greatly enhanced because all the disks in the stripe set can be accessed in parallel. Read and write performance for small random access transfers is about the same as for separate disks.
- Disk activity will, on the average, be balanced across all the disks in the stripe set so no one disk will become a "hot spot" with much more activity than the others. This improves performance on the average for all users.
- Several small disks can be combined into a single large logical disk to provide increased storage capacity.
Drawbacks
- Reliability is extremely low. Failure of a single disk causes loss of the entire stripe set. The mean time to failure (MTTF) of the entire stripe set is inversely proportional to the number of disks, so doubling the number of disks halves the expected time until the set is lost.
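The inverse relationship can be illustrated with a small calculation. This is a sketch under a simplifying assumption that is not stated in the text: every disk fails independently and at the same constant rate. The function name and the 500,000-hour figure are hypothetical and chosen only for illustration.

```python
def stripe_set_mttf(disk_mttf_hours, num_disks):
    """Approximate MTTF of a RAID 0 stripe set.

    Assumes independent disks with identical, constant failure rates,
    so the failure rates of the individual disks simply add up.
    """
    if num_disks < 1:
        raise ValueError("a stripe set needs at least one disk")
    return disk_mttf_hours / num_disks


# Hypothetical per-disk MTTF of 500,000 hours.
for n in (1, 2, 5, 10):
    print(n, "disks:", stripe_set_mttf(500_000, n), "hours")
```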
RAID 1
RAID level 1 is also called "mirroring". In this configuration, a duplicate copy of each disk is stored on a second "mirror" drive. All data are written to both sides of the mirror. This provides redundancy and fault tolerance because each disk is duplicated. If one disk fails, its mirror can still be accessed and no data are lost because of the failure. One can configure RAID 1 to have more than one duplicate copy of each disk (e.g. "triple mirroring").
Benefits
- Reliability is very high. A disk failure has no impact on performance. If any disk fails, its mirror can continue to function and the failed disk can be replaced when it is convenient. Many modern storage subsystems are designed so that failed components can be replaced while the storage subsystem is fully operational.
- Read performance is slightly better than single drives because data can be read from either side of the mirror.
Drawbacks
- Cost is highest. The cost per megabyte is slightly more than doubled (or tripled if two redundant copies are made) compared to single drives. On the other hand, disks are cheap. Prices have been declining rapidly and steadily for the past decade and are expected to continue to do so for the foreseeable future.
- There is a slight degradation in write performance compared to single drives.
RAID 2
RAID level 2 combines striping and redundancy. The striping strategy is to spread the data across multiple disks at the bit level. Redundancy is provided by writing an error correcting code (a Hamming code, for you buzzword fans) across several additional disks. The error correcting code has the property that data lost from failure of one data disk can be recovered by reading from one of the error correction disks.
Benefits
- High reliability.
- Slightly better storage efficiency and cost advantage over mirroring.
Drawbacks
- The error correction disks must all be updated whenever any data are written. Write performance is less than the throughput of a single disk.
- Since data are striped at the bit level, all reads and all writes require accessing multiple disks. This has the effect that performance is worse than for a single disk.
- RAID 2 is not commercially viable and can be safely forgotten.
RAID 3
RAID level 3 uses the same fine-grained striping strategy as level 2, but uses only one additional disk for error correction instead of several. The error correction code is computed by taking the exclusive OR (XOR) of all the data disks.
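A minimal sketch of the exclusive OR idea, using short byte strings to stand in for the contents of three data disks. The helper name `xor_blocks` and the sample values are assumptions made for illustration. The point is that XOR-ing the surviving disks with the parity yields the contents of the missing disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of several equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, two bytes each.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(data)

# Pretend disk 1 has failed: rebuild it from the survivors plus the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

The same property holds regardless of how many data disks there are, which is why a single extra disk is enough to survive one failure.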
Benefits
- Much better storage efficiency than RAID 1.
- Better reliability than RAID 0.
- Lower cost than RAID 2.
Drawbacks
- Since data are striped at the bit level, all reads and all writes require accessing multiple disks. This has the effect that performance is worse than for single disks.
- Since there is a single disk that contains error correction data, whenever any other disk is updated, the error correction disk must also be updated. Write performance is limited to the throughput of a single disk.
RAID 4
RAID level 4 is similar to level 3 except that the striping strategy is like RAID 0 (relatively large stripe blocks) rather than at the bit level. This technique allows reconstruction of all data that was present on any single drive that has failed.
Benefits
- Much better storage efficiency than mirroring (RAID 1).
- Better reliability than RAID 0.
- Read performance is excellent.
- Lower cost than RAID 2.
- Better performance than RAID 3.
Drawbacks
- Since there is a single disk that contains error correction data, whenever any other disk is updated, the error correction disk must also be updated. Write performance is limited to the throughput of a single disk.
- RAID 4 does not appear to be commercially viable and is rarely used.
RAID 5
In RAID level 5 configurations, data are striped across several disks along with "parity" data. The striping strategy is the same as for RAID 0 (relatively large stripe blocks). The parity data is distributed across the drives in such a way that a data group and its parity information are always written to different devices. This technique allows reconstruction of all data that was present on any single drive that has failed.
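One way to picture the distributed parity is a layout in which the parity block moves to a different disk on each stripe row. The rotation below is only one possible scheme, chosen for illustration; actual products use a variety of layouts, and the function names are hypothetical.

```python
def parity_disk(row, num_disks):
    """Disk that holds parity for a stripe row (one possible rotation)."""
    return (num_disks - 1 - row) % num_disks


def data_disks(row, num_disks):
    """Disks that hold data for a stripe row: all except the parity disk."""
    p = parity_disk(row, num_disks)
    return [d for d in range(num_disks) if d != p]


# Show where parity and data live for the first few rows of a 4-disk set.
for row in range(5):
    print("row", row, "parity on disk", parity_disk(row, 4),
          "data on disks", data_disks(row, 4))
```

Since the parity for a row is never on a disk that holds that row's data, losing any one disk removes at most one block from each row.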
Benefits
- Read performance can be quite good.
- Reliability will be high. If one disk fails, the remaining disks continue to function without loss of data or availability. The failed disk can be replaced when it is convenient.
- Cost is low compared with mirroring (RAID 1).
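The cost comparison with mirroring comes down to how much of the raw disk space holds user data. The sketch below is a simplified calculation that assumes equal-size disks and ignores controller and enclosure costs. The function name and the `copies` parameter are illustrative, and RAID 2 is left out because its overhead depends on the particular error correcting code.

```python
def usable_fraction(level, num_disks, copies=2):
    """Fraction of raw capacity available for data (simplified model)."""
    if level == 0:
        return 1.0                      # striping only, no redundancy
    if level == 1:
        return 1.0 / copies             # every block stored `copies` times
    if level in (3, 4, 5):
        if num_disks < 2:
            raise ValueError("parity RAID needs at least two disks")
        return (num_disks - 1) / num_disks   # one disk's worth of parity
    raise ValueError("level not covered by this sketch")


print("RAID 1, two-way mirror:", usable_fraction(1, 2))
print("RAID 5, five disks:    ", usable_fraction(5, 5))
```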
Drawbacks
- Write performance will be terrible. This is because the parity data must be updated whenever a block is written. In the worst case, writing a single database block requires four I/O operations (two reads and two writes), with parity calculations in between. The following steps are required:
  - Read the old stripe block
  - Read the old parity data
  - Exclusive OR the old data group with the parity data
  - Merge the new database block into the old data group
  - Exclusive OR the new data with the parity data
  - Write the new stripe block
  - Write the new parity data
"But since the data and parity are on separate drives, they can be read in parallel," you say. Yes, that is true. But TANSTAAFL (There Ain't No Such Thing As A Free Lunch). Reading two disks at the same time uses up half your disk bandwidth. As you can see, writing can use even more than half.
- Although a failed disk can be replaced, when it is, reconstructing its contents requires reading the entire contents of every other disk, writing the entire contents of the failed disk, and rewriting the parity information on all other disks. This can adversely affect overall system performance.
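The small-write sequence listed under the first drawback can be sketched in Python, with a list of byte strings standing in for the disks. This is an illustration under stated assumptions, not how any particular controller is implemented: the function names, the in-memory "disks", and the I/O counter are all hypothetical.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(disks, data_disk, parity_disk, offset, new_bytes):
    """Update part of one stripe block and its parity; return the I/O count."""
    ios = 0
    old_block = disks[data_disk]            # read the old stripe block
    ios += 1
    old_parity = disks[parity_disk]         # read the old parity data
    ios += 1
    partial = xor(old_parity, old_block)    # remove the old data from the parity
    new_block = (old_block[:offset] + new_bytes
                 + old_block[offset + len(new_bytes):])   # merge in the new data
    new_parity = xor(partial, new_block)    # fold the new data into the parity
    disks[data_disk] = new_block            # write the new stripe block
    ios += 1
    disks[parity_disk] = new_parity         # write the new parity data
    ios += 1
    return ios


# Three data disks plus one parity disk, one stripe row of four bytes each.
disks = [b"AAAA", b"BBBB", b"CCCC", None]
disks[3] = xor(xor(disks[0], disks[1]), disks[2])
ios = small_write(disks, data_disk=1, parity_disk=3, offset=1, new_bytes=b"zz")
print(ios, disks[1])
```

Only two of the four disks are touched, but each is touched twice, which is the cost the text describes.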
Implementation types
There are three common RAID implementations: in the operating system, in the disk controller, and in a separate subsystem.
A RAID implementation that is part of the operating system is cheap - you don't have to pay for it (or not much). It isn't worth much either and should be avoided.
A RAID implementation in a disk controller uses less processor power and fewer system resources than an operating-system-based solution. Usually, special device drivers are required for these disk controllers. These types of systems are inexpensive but provide very limited fault tolerance. Failed disks cannot be replaced while the system is operational.
A RAID implementation that is a separate subsystem from the computer is best. It can have its own power supplies, battery backup, spare controllers, hot-standby disks, and other capabilities all independent of the computer. Failure of a component in the computer should not affect the disk subsystem and vice versa. An excellent example of this type of system is Data General's Clariion disk array. Many other vendors also provide these types of systems.
Copyright 1997, Progress Software Corp., All Rights Reserved