Acronym for Redundant Array of Inexpensive (or Independent) Disks.
The idea behind RAID is to have an array of disks (usually inexpensive, although that doesn't stop people buying expensive disks for their RAID arrays) which are put together to form one logical disk. There are different types (levels) of RAID, as listed below, each with its own advantages and disadvantages.
See RaidOnLinux and RaidNotes for some specific notes about RAID under Linux.
RAID 0: Striping
RAID 0 technically isn't RAID: it provides no redundancy or fault tolerance.
However, data is spread (striped) across the disks rather than simply concatenated. This means file I/O is faster than with a single disk, since each disk can read or write data independently of the others.
- No parity generation
- Easy to implement in software and hardware
- Cheap to implement
- Utilise full disk capacity, no space is wasted storing redundant pages
See http://www.raidarray.eu.com/raid0.html or http://www.acnc.com/raid.html for diagrams.
- If any disk fails, you lose all your data
- Not true RAID
- Anything where you need fast I/O, particularly streaming I/O (for example, video editing).
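The striping idea above can be sketched in a few lines. This is an illustrative toy (the function names, chunk size, and use of in-memory lists as "disks" are all made up for demonstration), not how a real RAID driver works:

```python
# Sketch of RAID 0 striping: data is split into fixed-size chunks and
# distributed round-robin across the disks, so reads and writes can
# proceed on all disks in parallel.

STRIPE_SIZE = 4  # bytes per chunk; tiny here purely for demonstration

def stripe(data, num_disks):
    """Distribute data round-robin across num_disks 'disks' (lists of chunks)."""
    disks = [[] for _ in range(num_disks)]
    chunks = [data[i:i + STRIPE_SIZE] for i in range(0, len(data), STRIPE_SIZE)]
    for i, chunk in enumerate(chunks):
        disks[i % num_disks].append(chunk)
    return disks

def unstripe(disks):
    """Reassemble the original data by reading chunks back round-robin."""
    out = []
    for i in range(max(len(d) for d in disks)):
        for disk in disks:
            if i < len(disk):
                out.append(disk[i])
    return b"".join(out)

disks = stripe(b"ABCDEFGHIJKL", 3)
assert unstripe(disks) == b"ABCDEFGHIJKL"
```

Note that if any one "disk" is lost here, every third chunk of the data goes with it, which is exactly why RAID 0 offers no fault tolerance.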
RAID 1: Mirroring
When writing data, it is written to all disks in the array; when reading, it can be read from any disk in the array. If any disk in the array fails you can replace it and rebuild the array without losing data. If your disks are hot-swappable, you can do this with only minor performance loss.
- Can survive all but one disk in the array failing simultaneously
- Easy to implement in software and in hardware.
- The cost per MB is high, since you need to buy at least twice as much disk space as you need
- Extremely wasteful of disk space (at least 50% of your capacity is spent on redundant copies)
- Writes can be slowed down
- When you just can't afford to have your data die on you.
- When you need good read performance but don't care about write performance.
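The write-to-all, read-from-any behaviour described above can be sketched like this. The class, method names, and in-memory bytearrays standing in for disks are all illustrative assumptions:

```python
# Minimal sketch of RAID 1 mirroring: every write goes to all disks,
# while a read can be satisfied by any one of them.

import random

class Mirror:
    def __init__(self, num_disks, size):
        self.disks = [bytearray(size) for _ in range(num_disks)]

    def write(self, offset, data):
        for disk in self.disks:           # writes hit every disk
            disk[offset:offset + len(data)] = data

    def read(self, offset, length):
        disk = random.choice(self.disks)  # any disk holds a full copy
        return bytes(disk[offset:offset + length])

    def replace_disk(self, index):
        """Rebuild a replaced disk by copying from a surviving mirror."""
        source = self.disks[(index + 1) % len(self.disks)]
        self.disks[index] = bytearray(source)

m = Mirror(2, 16)
m.write(0, b"hello")
assert m.read(0, 5) == b"hello"
m.replace_disk(0)                         # simulate swapping a failed disk
assert m.read(0, 5) == b"hello"
```

The read path is why mirroring can improve read performance: independent reads can be served by different disks at the same time, while every write must wait for all disks.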
RAID 2: Striping + ECC
This uses striping with some disks holding ECC information. Apparently no one has ever implemented this level, because it's so complicated and the other RAID levels do the job better. Be sure to prove me wrong by finding someone that does RAID level 2 :)
RAID 3: Parity Disk
RAID 3 has a parity disk which stores an XOR of all the other disks. If any one data disk fails, its contents can be recreated by XOR'ing the remaining data disks together with the parity disk. If the parity disk fails, it can be regenerated by XOR'ing all the data disks together.
- Efficient use of data storage
- High read speed
- Requires at least 3 disks
- Inefficient with small data transfers
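The XOR parity trick described above is simple enough to demonstrate directly. This is a toy sketch (the byte values and helper name are made up), not a real RAID implementation:

```python
# Sketch of RAID 3/4-style parity: the parity disk holds the XOR of the
# data disks, so any single failed disk can be rebuilt by XOR'ing the
# survivors together.

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data_disks = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = xor_blocks(data_disks)        # what the parity disk would store

# Simulate losing data disk 1 and rebuilding it from the survivors + parity:
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data_disks[1]
```

The same XOR works in both directions: XOR'ing all the data disks regenerates the parity, and XOR'ing the parity with all-but-one data disk regenerates the missing one.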
RAID 4: Block level striping with Parity Disk
RAID 4 stripes data in blocks rather than bytes, spreading it across all disks except one, which stores the parity. Performance is good.
RAID 5: Parity shared across disks
Ah, RAID 5! RAID 5 combines the advantages of RAID 3 and RAID 0 by spreading the parity information across all drives. This is the most common type of RAID.
- Optimum Cost/Performance/Fault Tolerance
- Very efficient
- Handles small writes efficiently
- Handles multiple I/O requests
- Requires at least 3 disks
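The key difference from RAID 3/4 is that the parity moves from stripe to stripe, so no single disk becomes a write bottleneck. A sketch of one possible rotation (the "left-symmetric"-style placement shown is just one illustrative choice; real controllers differ in ordering):

```python
# Sketch of RAID 5 parity rotation: each stripe's parity block lives on a
# different disk, so parity writes are spread across the whole array.

def raid5_layout(num_disks, num_stripes):
    """Return a grid where grid[stripe][disk] is a data-block number or 'P'."""
    grid = []
    block = 0
    for stripe in range(num_stripes):
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(block)
                block += 1
        grid.append(row)
    return grid

for row in raid5_layout(3, 3):
    print(row)
```

With 3 disks and 3 stripes the parity 'P' steps from the last disk back to the first, one stripe at a time, while the data blocks fill in around it.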
RAID 6: Dual parity disks
A striped array with two parity disks. Any two disks can fail simultaneously and the array will keep running. Good for when the data can't EVER stop!
The only cards I've found that implement RAID 6 are SATA cards made by Areca (also rebadged as Tekram).
RAID 7 isn't a standard; a company trademarked the name and applied it to their own proprietary system.
RAID 1+0 (or "10")
Data is mirrored and striped across multiple disks (a combination of RAID 1 and RAID 0).
- Good performance
- Highly fault tolerant
- Very expensive
- Drive spindles must be synchronised for good performance
- Not very scalable
RAID 0+1 (or "01")
Two striped arrays mirrored against each other (a combination of RAID 0 and RAID 1).
- Lots of wasted disk space
- If one disk in each of the two striped arrays dies, you lose the entire array; RAID 1+0 only fails if both disks in the same mirrored pair die, which is far less probable.
Visual explanation of various RAID setups:
- One suggested way of calculating the stripe size for RAID systems doing a lot of random I/O (machines serving multiple users, e.g. mail or compute servers) is to work out the maximum throughput you can get through your disks (including controllers, PCI bus bandwidth etc.), then plug it into this formula:
- stripesize = throughput / (drives * RPM/60)
then round the stripe size down to the nearest multiple of your filesystem cluster size (usually 4k).
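The formula above is just arithmetic; here is a worked example. All the numbers (200 MB/s total throughput, 4 drives at 7200 RPM, 4k clusters) are made up for illustration:

```python
# Worked example of the stripe-size rule of thumb above.

throughput = 200 * 10**6   # bytes/sec through disks, controllers, bus, etc.
drives = 4
rpm = 7200
cluster = 4096             # filesystem cluster size (usually 4k)

# stripesize = throughput / (drives * RPM/60)
stripesize = throughput / (drives * rpm / 60)        # ~416,667 bytes
stripesize = int(stripesize // cluster) * cluster    # round down to a cluster multiple

print(stripesize)          # 413696 bytes, i.e. 101 clusters of 4k
```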
Suggestions for improving this estimate of the optimal stripe size are solicited.