On Wed, 2001-11-14 at 11:22, Dan Yocum wrote:
> >
> > Have you read the sections about sunit and swidth?
>
>
> Yup. Still doesn't make much sense to me. It sounds like swidth is
> analogous to chunk-size in software raid, but what is sunit?
>
> And are 'sw' and 'su' just the abbreviated forms of swidth and sunit,
> respectively?
They are alternative forms with different units: sunit and swidth are in
512-byte blocks, while su is in bytes and sw is a multiple of the stripe
unit (see the man page text quoted below).
>
>
> >
> > In general mkfs.xfs will do the right thing. As the man page states,
> > when you run mkfs on an LVM or MD device it will automagically extract
> > the stripe unit and stripe width.
> >
> > If you have a hardware RAID device, however, you'll have to specify
> > these parameters manually to match the configuration of your device.
>
>
> So, to wit, in our systems we have 2, 8 disk HW RAID5 arrays which are SW
> RAID0 (striped) together. The HW chunk size is 64k (this is hardcoded).
> The SW chunk size is 512k. I wish that I could make this 448k (so one SW
> chunk goes to one array, with the left over used as the parity chunk), but
> that's not possible.
>
> So, xfs_info shows this:
>
> [root@sdssdp10 dp]# xfs_info /export/data/dp10.a/
> meta-data=/export/data/dp10.a isize=512 agcount=268, agsize=1048576 blks
> data = bsize=4096 blocks=280145408, imaxpct=25
> = sunit=128 swidth=256 blks, unwritten=0
> naming =version 2 bsize=4096
> log =internal bsize=4096 blocks=32768
> realtime =none extsz=1048576 blocks=0, rtextents=0
This is picking up info from the software RAID; it does not see any
info from the hardware. As far as it is concerned, you have two devices:
the stripe unit (the amount of data written to one device before it
switches to the next device) is 128 filesystem blocks, or 512 Kbytes, and
the stripe width (the amount of data written before it cycles back to the
first device again) is twice this (2 devices).
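
To make that arithmetic concrete, here is a small sketch that converts the numbers from the xfs_info output into bytes. The function and variable names are only illustrative; nothing here is part of the XFS tools.

```python
# Values copied from the xfs_info output quoted above.
BSIZE = 4096        # filesystem block size in bytes (bsize=4096)
SUNIT_BLKS = 128    # sunit, reported in filesystem blocks
SWIDTH_BLKS = 256   # swidth, reported in filesystem blocks

def describe_stripe(bsize, sunit_blks, swidth_blks):
    """Convert stripe geometry from filesystem blocks to bytes."""
    su_bytes = sunit_blks * bsize
    sw_bytes = swidth_blks * bsize
    members = swidth_blks // sunit_blks  # devices the stripe cycles across
    return su_bytes, sw_bytes, members

su_bytes, sw_bytes, members = describe_stripe(BSIZE, SUNIT_BLKS, SWIDTH_BLKS)
print(f"stripe unit  = {su_bytes // 1024} KiB")   # 512 KiB
print(f"stripe width = {sw_bytes // 1024} KiB")   # 1024 KiB
print(f"members      = {members}")                # 2
```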
You can override the automatically selected values at mkfs time; the
tricky part is working out what values will work for you. To quote from
the man page:
The sunit suboption is used to specify the stripe
unit for a RAID device or a logical volume. The
suboption value has to be specified in 512-byte
block units. Use the su suboption to specify the
stripe unit size in bytes. This suboption ensures
that data allocations will be stripe unit aligned
when the current end of file is being extended and
the file size is larger than 512KB. Also inode
allocations and the internal log will be stripe
unit aligned.
The su suboption is an alternative to using sunit.
The su suboption is used to specify the stripe unit
for a RAID device or a striped logical volume. The
suboption value has to be specified in bytes, (usually
using the m or g suffixes). This value must
be a multiple of the filesystem block size.
The swidth suboption is used to specify the stripe
width for a RAID device or a striped logical volume.
The suboption value has to be specified in
512-byte block units. Use the sw suboption to
specify the stripe width size in bytes. This suboption
is required if -d sunit has been specified
and it has to be a multiple of the -d sunit suboption.
The stripe width will be the preferred
iosize returned in the stat(2) system call.
The sw suboption is an alternative to using swidth.
The sw suboption is used to specify the stripe
width for a RAID device or striped logical volume.
The suboption value is expressed as a multiplier of
the stripe unit, usually the same as the number of
stripe members in the logical volume configuration,
or data disks in a RAID device.
When a filesystem is created on a logical volume
device, mkfs.xfs will automatically query the logical
volume for appropriate sunit and swidth values.
You could specify -d su=64k,sw=896k
To be honest, I am not totally sure what your sw value should be. I
presume an 8-disk RAID is 7 data disks plus one parity, so with two
arrays I multiplied the stripe unit by 14 (64k x 14 = 896k). The stripe
unit is the only one which really matters.
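
The same guess can be written out as a short calculation. This is only a sketch of the reasoning above: it assumes every data disk in both arrays counts as one stripe member, and the helper name `stripe_options` is made up for illustration.

```python
SECTOR = 512  # the sunit/swidth suboptions are given in 512-byte units

def stripe_options(chunk_bytes, data_disks_per_array, arrays):
    """Derive stripe geometry for equal-sized arrays striped together.

    Assumes each data disk in each array is one stripe member,
    which is the presumption made in the text above.
    """
    members = data_disks_per_array * arrays
    width_bytes = chunk_bytes * members
    return {
        "members": members,
        "su_bytes": chunk_bytes,
        "width_bytes": width_bytes,
        "sunit": chunk_bytes // SECTOR,
        "swidth": width_bytes // SECTOR,
    }

# 8-disk RAID5 = 7 data disks + 1 parity; two such arrays; 64k hardware chunk.
geom = stripe_options(64 * 1024, data_disks_per_array=7, arrays=2)
print(geom["members"])                # 14
print(geom["width_bytes"] // 1024)    # 896 (Kbytes)
print(geom["sunit"], geom["swidth"])  # 128 1792
```

The last line gives the equivalent values in 512-byte units, should you prefer the sunit/swidth form over su/sw.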
Also, we discovered a problem with the latest version of mkfs in
how it lays out allocation groups onto the stripes. On really
large filesystems it attempts to make the allocation groups 4G
in size, which typically makes them all start on the same LUN,
and that is not good. You should make the allocation group size
one stripe unit less than 4G, so for a 64k stripe unit this
would be
-d agsize=4194240k
That is 4G - 64k, expressed in kilobytes (4194304k - 64k).
This will tend to spread data over all the LUNs better I think.
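
A rough way to see the effect is shown below. It is a deliberately simplified model, not what mkfs actually does: it assumes allocation group i starts at byte offset i * agsize, that 64k stripe units are handed out round-robin, and, purely for illustration, that there are two stripe members. The helper names are hypothetical.

```python
KIB = 1024
GIB = 1024 ** 3

def agsize_kib(stripe_unit_kib, limit_bytes=4 * GIB):
    """Allocation group size, in Kbytes, one stripe unit below the limit."""
    return limit_bytes // KIB - stripe_unit_kib

def ag_start_members(agsize_bytes, su_bytes, members, count):
    """Which stripe member each of the first `count` AGs starts on.

    Simplified model: AG i starts at byte offset i * agsize, and
    stripe units are assigned to members round-robin.
    """
    return [(i * agsize_bytes // su_bytes) % members for i in range(count)]

su = 64 * KIB
print(agsize_kib(64))                                    # 4194240
print(ag_start_members(4 * GIB, su, 2, 6))               # [0, 0, 0, 0, 0, 0]
print(ag_start_members(agsize_kib(64) * KIB, su, 2, 6))  # [0, 1, 0, 1, 0, 1]
```

With exactly 4G every allocation group lands on the same member; shaving off one stripe unit makes the starting member rotate.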
Steve
p.s. interested in testing some code to allow you to use 256 byte inodes
on a device bigger than 1 Tbyte?
>
>
> Why? Shouldn't swidth be 1000? And about sunit... well, I'm just confused
> about what that should be.
>
>
> >
> > --
> > Martin K. Petersen, Principal Linux Consultant, Linuxcare, Inc.
> ^^^^^^^^^
maybe that should read:
Only ;-)
Steve
--
Steve Lord voice: +1-651-683-3511
Principal Engineer, Filesystem Software email: lord@xxxxxxx