RAID 5 performance

John Macdonald john-Z7w/En0MP3xWk0Htik3J/w at public.gmane.org
Wed Sep 22 18:59:56 UTC 2004


On Wed, Sep 22, 2004 at 11:51:37AM -0400, Ralph Doncaster wrote:
> On Wed, 22 Sep 2004, John Macdonald wrote:
> 
> > On Wed, Sep 22, 2004 at 10:40:24AM -0400, Ralph Doncaster wrote:
> > > > If a write
> > > > spans some, but not all, of the data disks in a stripe, then
> > > > data must be read from the disk to compute the new parity info.
> > > This is a correct statement, and this is what happens when you write 16KB
> > > chunks to a stripe of 4 disks (3 data + 1 parity).  The Linux RAID code
> > > doesn't coalesce writes of multiple chunks (i.e. 3x 16KB chunks) to
> > > optimize away the theoretically unnecessary read for parity computation.
> >
> > So, the driver does not compute the parity directly from the
> > data for a 3+1 setup when you write exactly one entire data
> > stripe of 12k, but it does for a 4+1 setup when you write
> > exactly one data stripe of 16k?
> 
> No, there you go trying to put words in my mouth again.  I've been polite
> up to this point, but you seem intent on being rude because you don't like
> being proven wrong.
> 
> RAID doesn't write exactly one entire data stripe of 12KB, since that
> is not a valid chunk size.  When writing 16KB to a 3+1 drive array, the
> first 12KB written should not require a read for parity computation.  It's
> the last 4KB that will require the read.

I'm not trying to be rude, and I only put words into my
own mouth.  Consider my remarks to be prefaced with "if I
understand you correctly, that means...".  I would expect
that preface to be naturally assumed by all readers, but I'm
hereby saying it explicitly, and I apologise for not making
it explicit before.

Where does the 16k "chunk" get defined, by whom and for what
purpose?

With a 3+1 setup, there are 3 data disks, each using a 4k disk
allocation, giving 12k of data on the full RAID group in that
full stripe (plus 4k of parity data in the "extra" disk's
corresponding location).  If the program issues a 12k write,
why would the driver require that it get put into 16k "chunks"
before computing parity info?  The parity computation has to
be done on 12k units - the 3 4k portions are xor'ed together
to get the parity data.  What purpose does it serve to care
at any point in this process about 16k chunks?  (Is this to
match Linux's internal buffer allocation size or something?
I can't see any reason, but that doesn't mean there isn't one.)
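To make the arithmetic concrete, here is a minimal sketch (in Python,
with an assumed 4k per-disk allocation - not the actual Linux md code)
of a full-stripe write on a 3+1 array: the 12k write splits into three
4k portions, and the parity is simply their byte-wise xor, with no
read required.

```python
# Full-stripe write on a hypothetical 3+1 RAID-5 array.
# A 12k write splits into three 4k data chunks; the parity chunk
# is the byte-wise XOR of those three -- no read is needed.

CHUNK = 4 * 1024  # assumed 4k per-disk allocation

def xor_chunks(chunks):
    """Byte-wise XOR of equal-length chunks."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

# One full data stripe's worth of data: 3 x 4k = 12k.
write = bytes(range(256)) * 48                     # 12288 bytes
data = [write[i * CHUNK:(i + 1) * CHUNK] for i in range(3)]
parity = xor_chunks(data)

# The same xor also rebuilds any lost data chunk from the
# surviving chunks plus parity.
rebuilt = xor_chunks([data[1], data[2], parity])
assert rebuilt == data[0]
```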

I will admit that I haven't looked at the Linux driver, but
I have worked extensively with EMC Clariion and Symmetrix
disk arrays.  In those boxes, the data is processed at the
natural size for the stripe width.  They also have dual power
supplies and huge amounts of cache, so keeping the data in
memory long enough to collect the full stripe out of multiple
host write activities is easy.  That is equally true for 3+1,
4+1, and 7+1 setups, which are the ones most commonly supported
on those disk arrays.

> Despite all your efforts to complicate the issue, it's only grade-5 math.
> If the number of disk sectors written modulus the number of data drives !=
> 0, a read will be required for parity computation.  The number of disk
> sectors written for a RAID chunk has to be a power of 2.  Therefore with 2
> data disks (+1 for parity), a read will NEVER be required for parity
> computation when writing a chunk.

You state this as a rule that fifth-grade math can apply, but
you do not justify the rule.  I know that I was never taught
anything about using xor to compute parity in grade 5; not until
university.  Nor did grade 5 introduce any reason to take 12k
units and put them in 16k chunks on the way to storing them in
12k containers.  Give a valid reason and I'll accept the rule,
but not just an unsupported statement that this rule applies.

The xor parity track can be computed from all of the current
data, or it can be updated using the old and new values of any
changed portions to revise the old parity.  There is nothing
there that depends upon whether the total number of data disks
is even or odd.  I've done the math for this and it works.
It worked back in university and it continued to work in
various programs I've written over the years.
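As a sketch of the equivalence claimed above (an illustration, not
the md driver): updating the old parity with the old and new values
of a changed chunk gives exactly the same result as recomputing
parity from all the current data, and it does so for any number of
data disks, even or odd.

```python
import os

def xor_all(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

for ndisks in (2, 3, 4, 7):              # even and odd data-disk counts
    data = [os.urandom(512) for _ in range(ndisks)]
    parity_old = xor_all(data)

    # Read-modify-write of one chunk:
    # new parity = old parity XOR old data XOR new data.
    new_chunk = os.urandom(512)
    parity_rmw = xor_all([parity_old, data[0], new_chunk])

    # Full recomputation from all current data.
    data[0] = new_chunk
    parity_full = xor_all(data)

    assert parity_rmw == parity_full     # holds regardless of ndisks
```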

> If you still can't comprehend that, you'll have to ask someone else for
> help as I've sent in my unsubscribe request to majordomo-jmbJ75VLJBo at public.gmane.org

Offhand, I doubt anyone else could help explain this in the way
that you are describing, so I've copied you on this message.
Feel free to either ignore this message or to resubscribe -
I won't copy you personally again.

-- 
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml




