[Tfug] Distributed storage / how parity in RAID works.
Bowie J. Poag
bpoag at comcast.net
Tue Aug 22 20:34:23 MST 2006
Hi Chris,
*sigh*.. Class time, kids. Pull up a chair, and grab a decent
calculator. Here's how parity works in RAID. :)
Before I begin, it's important to know that not all RAID types use
parity--Also, I assure you, no magical psychic unicorns were harmed in
the making of this example. For the sake of keeping things simple, we're
going to talk about RAID 3.
Parity relies on a Boolean operator called XOR, or "Exclusive OR". XOR
behaves in a "one or the other, but not both" fashion, as seen below:
0 XOR 0 = 0 (False)
0 XOR 1 = 1 (True)
1 XOR 0 = 1 (True)
1 XOR 1 = 0 (False)
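In code, that truth table falls right out of the bitwise XOR operator. A quick sketch in Python (any language with a `^` operator works the same way):

```python
# Print the XOR truth table using Python's bitwise XOR operator (^).
for a in (0, 1):
    for b in (0, 1):
        print(f"{a} XOR {b} = {a ^ b}")
```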
For the sake of keeping it simple, let's suppose we had a ridiculously
tiny, tiiiny RAID 3 set made up of 6 drives, each capable of holding a
whopping 2 bytes of data. Broken down by purpose, that's four drives to
store our raw data, a fifth drive to store our parity data (after we
calculate it), and a sixth drive for use as a "hot spare" i.e. a drive
that sits around waiting to be called into action should one of the
other drives fail.
Let's say we have a program that saves the string "C7EF011FAB07717B" to
disk. Striped across the RAID, it would look like so.. Two bytes per drive:
Drive 1 contains "C7EF"
Drive 2 contains "011F"
Drive 3 contains "AB07"
Drive 4 contains "717B"
Drive 5 contains nothing so far.
Drive 6 is the hot spare, ready to be used if another drive dies.
Now keep in mind, when any data is written, parity for that data has to
be calculated and stored as well.. It's calculated using XOR across the
width of the RAID set. Whip out a calculator and calculate parity for
the stripe by hand:
C7EF XOR 011F XOR AB07 XOR 717B (equals)
Congratulations, you just calculated parity for a RAID stripe. The
result will be "1C8C". This gets stored on Drive 5, the parity
drive. Now, the RAID set looks like so:
Drive 1 contains "C7EF"
Drive 2 contains "011F"
Drive 3 contains "AB07"
Drive 4 contains "717B"
Drive 5 contains "1C8C"
Drive 6 is the hot spare, ready to be used if another drive dies.
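If you'd rather let the computer whip out the calculator for you, the parity calculation above is just a chain of XORs across the stripe. A minimal sketch, treating each drive's 2 bytes as a 16-bit number:

```python
# Compute the parity stripe for our tiny RAID 3 set.
# Each "drive" holds 2 bytes, written here as a 16-bit hex value.
drives = [0xC7EF, 0x011F, 0xAB07, 0x717B]  # Drives 1 through 4

parity = 0
for d in drives:
    parity ^= d  # XOR each drive's data into the running parity

print(f"{parity:04X}")  # prints 1C8C -- what goes on Drive 5
```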
Now, let's suppose a week goes by, and Drive 3 dies in a horrible
accident involving ham, and a bicycle. Without knowing the contents of
Drive 3, it's possible to reconstruct it using the parity data we
calculated earlier when the data was originally written.
To find out what was stored on Drive 3, just take the parity information
you calculated, and substitute it in for the drive that failed:
C7EF XOR 011F XOR 1C8C XOR 717B (equals)
The result will be...drum roll........ AB07! Tah-dah! The contents of
Drive 3! Cool, huh?
Magic? Nope.
Psychic unicorns involved? No sir.
Just a clever use of a vastly under-appreciated Boolean operator.
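The reconstruction step is the same chain of XORs, just with the parity swapped in for the dead drive. A sketch:

```python
# Reconstruct the dead drive (Drive 3) by XORing the survivors
# together with the stored parity.
survivors = [0xC7EF, 0x011F, 0x717B]  # Drives 1, 2, and 4
parity = 0x1C8C                       # Drive 5, the parity drive

rebuilt = parity
for d in survivors:
    rebuilt ^= d

print(f"{rebuilt:04X}")  # prints AB07 -- Drive 3 is back
```

This works because XOR is its own inverse: XORing the same value in twice cancels it out, so everything except the missing drive's data drops away.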
Since one of our drives is dead, covered in ham and bicycle parts, we're
going to need a replacement drive. That's where the hot spare comes in.
Bring in the hot spare, write the result (AB07) to it, and declare the
hot spare as an official member of the RAID set. Voila--Your RAID has
been repaired.
What you've just done is exactly the same process that happens in "real
world" RAID.. It's just that in the real world, the stripe size is far
bigger, and the RAID set may involve many, many more drives.
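To show it scales, here's a hedged sketch of the same trick over stripes of arbitrary width and any number of drives (the function name and byte-string layout are my own invention for illustration, not anything a real controller exposes):

```python
from functools import reduce
from operator import xor

def parity_of(stripes: list[bytes]) -> bytes:
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*stripes))

# Works the same whether the stripe is 2 bytes or 2 megabytes:
data = [b"C7EF", b"011F", b"AB07", b"717B"]  # raw ASCII, just for show
parity = parity_of(data)

# Lose any one stripe, and XORing the rest with the parity gets it back:
rebuilt = parity_of([data[0], data[1], data[3], parity])
assert rebuilt == data[2]
```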
While it's an impressive trick, parity will only protect you from
>single drive< failures. If you're missing more than one member of that
RAID set, you don't have enough information for the XOR calculation to
yield the missing data. That's the point I was getting at--If these guys
are using something like distributed parity across 11 nodes, and one
node goes down, every single one of those 10 remaining nodes will have
to provide its data at some point in order to rebuild the
contents of the node that's been lost. With nodes constantly coming in
and out of existence, the overhead involved would be insane, bordering
on sheer---dare I say it--asshattery. Enough to totally make my monocle
pop out!
*Harumph!*
Cheers,
Bowie
Chris Niswander wrote:
>At 10:46 PM 8/21/2006 -0700, you wrote:
>
>
>>Two things come to mind here.
>>
>>If they're using something like distributed parity to fill in the gaps
>>for when one or more of the nodes goes offline, heh, good luck with that
>>one, chief. Unless the client comes bundled with a magical psychic
>>unicorn that can come up with the missing data out of it's butt, you're
>>going to have one hell of a time rebuilding a stripe because the
>>
>>
>
>Splitting up a secret into n pieces of information, so that:
>1. with any x (x<n) of those pieces the original secret can be regenerated
>2. fewer than x of those pieces are inadequate to find out anything
> about the secret.
>is a pretty well established area in cryptography by now.
>
>Try keyphrase: secret sharing
>
>In short, this is a feasible thing to do with backups.
>
>In digression: Bowie, are you trolling? ;-)
>
>Chris
>
>
>
>_______________________________________________
>Tucson Free Unix Group - tfug at tfug.org
>Subscription Options:
>http://www.tfug.org/mailman/listinfo/tfug_tfug.org