
Re: md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint

To: Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
Subject: Re: md-RAID5/6 stripe_cache_size default value vs performance vs memory footprint
From: Piergiorgio Sartor <piergiorgio.sartor@xxxxxxxx>
Date: Sat, 21 Dec 2013 13:20:14 +0100
Cc: Arkadiusz Miśkiewicz <arekm@xxxxxxxx>, linux-raid@xxxxxxxxxxxxxxx, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
In-reply-to: <52B57912.5080000@xxxxxxxxxxxxxxxxx>
References: <52B102FF.8040404@xxxxxxxxxxx> <52B2FE9E.50307@xxxxxxxxxxxxxxxxx> <52B41B67.9030308@xxxxxxxxxxx> <201312202343.47895.arekm@xxxxxxxx> <52B57912.5080000@xxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Sat, Dec 21, 2013 at 05:18:42AM -0600, Stan Hoeppner wrote:
> I renamed the subject as your question doesn't really apply to XFS, or
> the OP, but to md-RAID.
> 
> On 12/20/2013 4:43 PM, Arkadiusz Miśkiewicz wrote:
> 
> > I wonder why the kernel is giving defaults that everyone repeatedly
> > recommends changing/increasing? Has anyone tried to file a bug report
> > about that for the stripe_cache_size case?
> 
> The answer is balancing default md-RAID5/6 write performance against
> kernel RAM consumption, with more weight given to the latter.  The formula:
> 
> (4096 * stripe_cache_size) * num_drives = RAM consumed for stripe cache, in bytes
> 
> High stripe_cache_size values will cause the kernel to eat non-trivial
> amounts of RAM for the stripe cache buffer.  This table demonstrates the
> effect today for typical RAID5/6 disk counts.
> 
> stripe_cache_size     drives  RAM consumed
> 256                    4        4 MB
>                        8        8 MB
>                       16       16 MB
> 512                    4        8 MB
>                        8       16 MB
>                       16       32 MB
> 1024                   4       16 MB
>                        8       32 MB
>                       16       64 MB
> 2048                   4       32 MB
>                        8       64 MB
>                       16      128 MB
> 4096                   4       64 MB
>                        8      128 MB
>                       16      256 MB
> 
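
For anyone who wants to recompute this table for other drive counts or cache
sizes, a minimal Python sketch of the same arithmetic (4096-byte pages, as in
the formula above):

  # Approximate RAM used by the md-RAID5/6 stripe cache:
  # page_size * stripe_cache_size * number_of_member_drives
  PAGE_SIZE = 4096  # bytes per page, as in the formula above

  def stripe_cache_ram_mb(stripe_cache_size, num_drives):
      return PAGE_SIZE * stripe_cache_size * num_drives / 2**20

  for scs in (256, 512, 1024, 2048, 4096):
      for drives in (4, 8, 16):
          print(f"{scs:5d}  {drives:2d} drives  "
                f"{stripe_cache_ram_mb(scs, drives):4.0f} MB")
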
> The powers that be, Linus in particular, are not fond of default
> settings that create a lot of kernel memory structures.  The default
> md-RAID5/6 stripe_cache_size of 256 yields 1MB consumed per member device.
> 
> With SSDs becoming mainstream, and becoming ever faster, at some point
> the md-RAID5/6 architecture will have to be redesigned because of the
> memory footprint required for performance.  Currently the required size
> of the stripe cache appears directly proportional to the aggregate write
> throughput of the RAID devices.  Thus the optimal value will vary
> greatly from one system to another depending on the throughput of the
> drives.
> 
> For example, I assisted a user with 5x Intel SSDs back in January and
> his system required 4096, or 80MB of RAM for stripe cache, to reach
> maximum write throughput of the devices.  This yielded 600MB/s, or 60%
> greater throughput than with 2048 (40MB of RAM for cache).  In his case 60MB
> more RAM than the default was well worth the increase as the machine was
> an iSCSI target server with 8GB RAM.
> 
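As an illustration of that kind of change, here is a minimal Python sketch that
writes a new stripe_cache_size through sysfs and reports the approximate RAM
cost; the array name, member count and value below are only example
assumptions, and it needs root:

  # Sketch only: raise stripe_cache_size for one md array via sysfs.
  # Array name, member count and the chosen value are illustrative.
  PAGE_SIZE = 4096
  md_dev = "md0"      # assumed array name
  members = 5         # assumed number of member drives
  new_size = 4096     # stripes to cache, as in the SSD example above

  path = f"/sys/block/{md_dev}/md/stripe_cache_size"
  with open(path, "w") as f:
      f.write(str(new_size))

  print(f"{path} set to {new_size}, "
        f"~{PAGE_SIZE * new_size * members / 2**20:.0f} MB of stripe cache")
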
> In the previous case with 5x rust RAID6 the 2048 value seemed optimal
> (though not yet verified), requiring 40MB less RAM than the 5x Intel
> SSDs.  For a 3-drive modern rust RAID5 the default of 256, or 3MB, is close to
> optimal but maybe a little low.  Consider that 256 has been the default
> for a very long time, and was selected back when average drive
> throughput was much, much lower, as in 50MB/s or less, SSDs hadn't yet
> been invented, and system memories were much smaller.
> 
> Due to the massive difference in throughput between rust and SSD, any
> meaningful change in the default really requires new code to sniff out
> what type of devices constitute the array, if that's possible, and it
> probably isn't, and set a lowish default accordingly.  Again, SSDs
> didn't exist when md-RAID was coded, nor when this default was set, and
> this throws a big monkey wrench into the works.

Hi Stan,

nice analytical report, as usual...

My dumb suggestion would be to simply use udev to
set up the drives.
Everything (stripe_cache, read_ahead, SCT ERC, etc.)
can be configured, I suppose, by udev rules.
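
For example (an untested sketch; the md sysfs attribute names and the values
are assumptions that would need checking against the local setup), a rule
along these lines could be dropped into /etc/udev/rules.d/:

  # 60-md-stripe-cache.rules -- sketch only, values are illustrative;
  # stripe_cache_size only exists for RAID4/5/6 arrays.
  SUBSYSTEM=="block", KERNEL=="md*", ACTION=="add|change", ATTR{md/stripe_cache_size}="2048", ATTR{queue/read_ahead_kb}="4096"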

bye,

-- 

piergiorgio
