Raid 5 performance

Wed Sep 22 14:53:18 UTC 2004

On Wed, Sep 22, 2004 at 08:59:19AM -0400, Ralph Doncaster wrote:
> On Wed, 22 Sep 2004, Lennart Sorensen wrote:
> 
> > On Wed, Sep 22, 2004 at 07:58:01AM -0400, Ralph Doncaster wrote:
> > > OK, I'll make one last try to explain this.  The smallest RAID chunk size
> > > supported by Linux is 4KB.  The sector size on an IDE drive is 512 bytes.
> > > When writing a 4KB chunk to 2, 4, or 8 data drives, no read is necessary
> > > for checksum calculation.
> >
> > Well in that case I guess you would be right.  That would apply if you
> > actually ran the chunk size that small [...]
> 
> No, a larger chunk size doesn't eliminate the read for checksum
> calculation when you have an odd number of data drives.
> 
> > I well agree that if you are writing a multiple of the size of data that
> > fits across the number of disks, you gain a bit from having 2^n data
> > disks, although if I had 16k chunks and 3 data drives and wrote 48k, I
> > would get the same benefit.
> 
> No.  You're wrong again.  But you'll have to figure it out on your own;
> I'm not wasting any more of my time...

Actually, Lennart is right.  If a write exactly spans all of the
data disks in a stripe, you can use that data to compute the parity
without reading any old values from the disks first.  If a write
spans some, but not all, of the data disks in a stripe, then the
old data must be read from the disks to compute the new parity.

There is no difference between an odd and an even number of data
disks; the issue is whether a write goes to all or some of them.
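
To make the two cases concrete, here is a minimal sketch in Python.
It assumes plain XOR parity and uses tiny 4-byte chunks; the function
names are made up for illustration and are not taken from the Linux
md driver.  It uses 3 data disks to show that an odd count behaves
the same way.

```python
from functools import reduce

def xor_blocks(*blocks):
    """XOR any number of equal-length byte strings together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def full_stripe_parity(new_chunks):
    # Every data chunk in the stripe is being rewritten, so the parity
    # depends only on the new data: no reads from disk are needed.
    return xor_blocks(*new_chunks)

def partial_write_parity(old_parity, old_chunk, new_chunk):
    # Only one chunk changes, so the old chunk and the old parity must
    # be read back first (read-modify-write).
    return xor_blocks(old_parity, old_chunk, new_chunk)

# Three data disks (an odd number), 4-byte chunks for readability.
old = [b"\x01" * 4, b"\x02" * 4, b"\x04" * 4]
parity = full_stripe_parity(old)

# Rewrite only the chunk on the second disk.
new_chunk = b"\x0f" * 4
updated = partial_write_parity(parity, old[1], new_chunk)

# Same result as recomputing the parity from the complete new stripe.
assert updated == full_stripe_parity([old[0], new_chunk, old[2]])
```

The partial write needs two reads (old chunk, old parity) before its
two writes; the full-stripe write needs none.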

The chunk size affects how large a write buffer must be to
span all of the disks.  A larger chunk size means you have to
issue proportionally larger writes to fully span the stripe.
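
As a rough illustration of that arithmetic (a sketch only; the helper
names are hypothetical, and the sizes are just the ones mentioned in
this thread plus a 64 KiB example of my own):

```python
KIB = 1024

def stripe_bytes(chunk_size, data_disks):
    """Bytes of data in one full stripe (the parity chunk is not counted)."""
    return chunk_size * data_disks

def is_full_stripe_write(offset, length, chunk_size, data_disks):
    """True if a write starts on a stripe boundary and covers whole stripes."""
    stripe = stripe_bytes(chunk_size, data_disks)
    return length > 0 and offset % stripe == 0 and length % stripe == 0

# Larger chunks need proportionally larger writes to span the stripe.
for chunk in (4 * KIB, 16 * KIB, 64 * KIB):
    size = stripe_bytes(chunk, 3) // KIB
    print(chunk // KIB, "KiB chunks, 3 data disks:", size, "KiB per stripe")

# Lennart's example: 16k chunks, 3 data drives, a 48k aligned write.
print(is_full_stripe_write(0, 48 * KIB, 16 * KIB, 3))
```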

When performance is really important, using extremely large
buffers will ensure that most of the data in the buffer is a
series of complete stripes, for which the parity can be
computed directly.  If the buffer is big enough, it is less
important whether it is exactly aligned on a stripe boundary
and an exact multiple of the stripe size - the extra work for
the partial stripes at the front and/or back of the buffer is
small compared to the work saved on the complete stripes in
the middle (and that extra work can be removed by delaying the
write of the trailing fragment and merging it with the leading
fragment of the next write).
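
Here is a small sketch of how a large, unaligned buffer breaks down.
The function name and the example numbers (a 1 MiB write starting
10 KiB into the array, 16 KiB chunks, 3 data disks) are assumptions
chosen for illustration.

```python
def split_write(offset, length, chunk_size, data_disks):
    """Split a write into (head, full, tail) byte counts.

    head and tail are the partial-stripe fragments that need old data
    read back; full is the run of complete stripes in the middle.
    """
    stripe = chunk_size * data_disks
    end = offset + length
    first_full = -(-offset // stripe) * stripe  # round offset up
    last_full = (end // stripe) * stripe        # round end down
    if first_full >= last_full:
        # The write does not cover even one complete stripe.
        return length, 0, 0
    return first_full - offset, last_full - first_full, end - last_full

KIB = 1024
head, full, tail = split_write(10 * KIB, 1024 * KIB, 16 * KIB, 3)
print("head:", head, "full:", full, "tail:", tail)
print("fraction in complete stripes:", full / (head + full + tail))
```

With these numbers about 94% of the buffer lands in complete stripes,
and the share grows as the buffer gets larger.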

-- 
--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml