Re: Optimal XFS formatting options?

To: xfs@xxxxxxxxxxx
Subject: Re: Optimal XFS formatting options?
From: Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
Date: Sat, 14 Jan 2012 16:23:43 -0600
In-reply-to: <33140169.post@xxxxxxxxxxxxxxx>
References: <33140169.post@xxxxxxxxxxxxxxx>
Reply-to: stan@xxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:9.0) Gecko/20111222 Thunderbird/9.0.1
On 1/14/2012 11:44 AM, MikeJeezy wrote:
> 
> Hi, I have a 4.9 TB iSCSI LUN on a RAID 6 array with twelve 2 TB SATA disks
> (4.9T is only one of the logical volumes). It will contain several million
> files of various sizes, but 80% of them will be less than 50 MB.  I'm a
> novice at best and I usually just use the default #mkfs.xfs /dev/sdx1
> 
> This server will be write heavy for about 8 hours a night, but every
> morning there are many reads to the disk.  There is rarely a time where it
> will be write heavy and read heavy at the same time.  Are there other XFS
> format options that I could use to optimize performance?

    sunit=value

This is used to specify the stripe unit for a RAID device or a logical
volume. The value has to be specified in 512-byte block units. Use the
su suboption to specify the stripe unit size in bytes. This suboption
ensures that data allocations will be stripe unit aligned when the
current end of file is being extended and the file size is larger than
512KiB. Also inode allocations and the internal log will be stripe unit
aligned.

    su=value

This is an alternative to using sunit. The su suboption is used to
specify the stripe unit for a RAID device or a striped logical volume.
The value has to be specified in bytes, (usually using the m or g
suffixes). This value must be a multiple of the filesystem block size.

    swidth=value

This is used to specify the stripe width for a RAID device or a striped
logical volume. The value has to be specified in 512-byte block units.
Use the sw suboption to specify the stripe width size in bytes. This
suboption is required if -d sunit has been specified and it has to be a
multiple of the -d sunit suboption.

    sw=value

This suboption is an alternative to using swidth. The sw suboption is used to
specify the stripe width for a RAID device or striped logical volume.
The value is expressed as a multiplier of the stripe unit, usually the
same as the number of stripe members in the logical volume
configuration, or data disks in a RAID device.


Using su and sw is often easier because it requires fewer conversions.
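
To illustrate the conversion that su and sw save you, here is a minimal
shell sketch. The 64 KiB stripe unit and 10 data disks are example
values only, and the function names are made up for this illustration.

```shell
#!/bin/sh
# Convert su (bytes) and sw (multiplier) into sunit/swidth,
# both of which mkfs.xfs expects in 512-byte units.

su_to_sunit() {
    # $1 = stripe unit in bytes
    echo $(( $1 / 512 ))
}

sunit_to_swidth() {
    # $1 = sunit in 512-byte units, $2 = number of data disks (sw)
    echo $(( $1 * $2 ))
}

su_bytes=$(( 64 * 1024 ))   # example value: 64 KiB stripe unit
sw=10                       # example value: 10 data disks

sunit=$(su_to_sunit "$su_bytes")
swidth=$(sunit_to_swidth "$sunit" "$sw")

echo "su=64k,sw=$sw is the same geometry as sunit=$sunit,swidth=$swidth"
```

With these example values the script reports sunit=128 and swidth=1280.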

With a 12 drive RAID6 array your stripe width, or sw, is 10.  You will
need to consult the array controller admin interface and documentation
to discover the su value if you don't already know it.  Different
vendors call this parameter by different names.  It could be "chunk
size" or "strip size" or other.  Some/many vendors don't specify this
value at all, giving you only static pre-defined total stripe size
options for the array, such as 64KB, 128KB, 1MB, etc, only in power of 2
values.  In this case, if you have a 64KB stripe size and divide it by
the 10 data drives in the stripe, you end up with 6553.6 bytes per
drive, which is not a multiple of the filesystem block size.  This
presents serious problems for alignment.  You must then dig deep to find
out exactly how your vendor's controller handles this situation when
your effective RAID spindle count is not a power of 2.
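
The arithmetic behind the 64KB example above can be checked with a short
shell sketch. It assumes a 4096-byte filesystem block size (the usual
mkfs.xfs default), and check_strip is a hypothetical helper name, not an
XFS tool.

```shell
#!/bin/sh
# Report whether a fixed total stripe size splits into per-disk strips
# that are whole multiples of the filesystem block size.

check_strip() {
    total=$1    # total stripe size in bytes
    disks=$2    # number of data disks in the stripe
    block=$3    # filesystem block size in bytes (assumed 4096 below)
    # total/disks is a whole multiple of block exactly when
    # total is divisible by disks*block.
    if [ $(( total % (disks * block) )) -eq 0 ]; then
        echo "aligned: strip is $(( total / disks )) bytes"
    else
        echo "unaligned: $total bytes over $disks disks is not a whole number of ${block}-byte blocks per strip"
    fi
}

check_strip 65536 10 4096   # 64KB stripe over 10 data disks
check_strip 65536 8 4096    # 64KB stripe over 8 data disks
```

The first call reports the unaligned case described above; the second
shows that the same 64KB stripe divides cleanly across 8 data disks.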

So let's assume your vendor does the smart thing and allows you
flexibility in specifying per drive strip size.  Assume for example the
stripe unit (strip, chunk) of the array is 64KB, there are 10 stripe
spindles (12-2=10), and the local device name of the LUN is /dev/sdb.
To create an aligned XFS filesystem on this you would use something like:

$ mkfs.xfs -d su=64k,sw=10 /dev/sdb
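
If you want to check the geometry before committing to it, a sketch like
the following builds the command line with -N, which makes mkfs.xfs
print the filesystem parameters without creating anything. The mkfs_cmd
helper and its arguments are hypothetical; it only echoes the command
and does not run it.

```shell
#!/bin/sh
# Build (but do not run) an mkfs.xfs dry-run command for a parity RAID array.

mkfs_cmd() {
    total_drives=$1    # all drives in the array
    parity_drives=$2   # 2 for RAID6, 1 for RAID5
    su=$3              # per-drive strip size, e.g. 64k
    dev=$4             # block device of the LUN
    # sw is the number of data-bearing drives in the stripe
    sw=$(( total_drives - parity_drives ))
    # suboptions of -d are comma separated; -N means "print, don't create"
    echo "mkfs.xfs -N -d su=$su,sw=$sw $dev"
}

mkfs_cmd 12 2 64k /dev/sdb
```

Once the printed parameters look right, drop -N to actually create the
filesystem.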

When using vendor array hardware that only allows one to define what XFS
calls swidth, it is best to use a power of 2 stripe spindle count to get
proper alignment.  If you use a non power of 2 stripe spindle count the
vendor firmware will either round down or round up to create the stripe
unit size, and this formula is often not documented.

With such vendor hardware, for a RAID6 array you would want to have 6,
10, or 18 total drives in the array, giving you 4, 8, or 16 stripe
spindles.  Alternatively, you need to know exactly how the firmware
rounds up or down to arrive at the strip block size (sunit).
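
The drive counts above follow from requiring the number of data spindles
(total drives minus 2 for RAID6) to be a power of 2. A minimal shell
sketch of that rule, with hypothetical function names:

```shell
#!/bin/sh
# List RAID6 array sizes whose data spindle count is a power of 2.

is_power_of_two() {
    n=$1
    # a power of 2 has exactly one bit set, so n & (n-1) is zero
    [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

raid6_friendly_counts() {
    max=$1
    total=4            # smallest possible RAID6 array
    out=""
    while [ "$total" -le "$max" ]; do
        # RAID6 spends two drives' worth of capacity on parity
        if is_power_of_two $(( total - 2 )); then
            out="$out $total"
        fi
        total=$(( total + 1 ))
    done
    echo $out          # unquoted on purpose, to trim the leading space
}

raid6_friendly_counts 18
```

Besides 6, 10, and 18, the list also includes the minimal 4-drive array,
which has 2 data spindles.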

If you find yourself in such a situation, and are unable to determine
the strip size the array firmware is using, you may be better off using
the mkfs.xfs defaults rather than guessing and ending up with unaligned
writes.

-- 
Stan
