Re: mkfs.xfs states log stripe unit is too large

To: Ingo Jürgensmann <ij@xxxxxxxxxxxxxxxxxx>
Subject: Re: mkfs.xfs states log stripe unit is too large
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Sun, 24 Jun 2012 09:44:45 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <D3F781FA-CEB0-4896-9441-772A9E533354@xxxxxxxxxxxxxxxxxx>
References: <D3F781FA-CEB0-4896-9441-772A9E533354@xxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Sat, Jun 23, 2012 at 02:50:49PM +0200, Ingo Jürgensmann wrote:
> muaddib:~# cat /proc/mdstat 
> Personalities : [raid1] [raid6] [raid5] [raid4] 
> md7 : active raid5 sdf4[3] sdd4[1] sde4[0]
>       7811261440 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

> The RAID devices /dev/md0 to /dev/md4 are on my old 3x 1 TB
> Seagate disks. Anyway, to finally come to the problem, when I try
> to create a filesystem on the new RAID5 I get the following:  
> muaddib:~# mkfs.xfs /dev/lv/usr
> log stripe unit (524288 bytes) is too large (maximum is 256KiB)
> log stripe unit adjusted to 32KiB
> meta-data=/dev/lv/usr            isize=256    agcount=16, agsize=327552 blks
>          =                       sectsz=512   attr=2, projid32bit=0
> data     =                       bsize=4096   blocks=5240832, imaxpct=25
>          =                       sunit=128    swidth=256 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=2560, version=2
>          =                       sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> As you can see I follow the "mkfs.xfs knows best, don't fiddle
> around with options unless you know what you're doing!"-advice.
> But apparently mkfs.xfs wanted to create a log stripe unit of 512
> kiB, most likely because it's the same chunk size as the
> underlying RAID device. 

Exactly. Best thing in general is to align all log writes to the
underlying stripe unit of the array. That way, as multiple frequent
log writes occur, they are guaranteed to form full stripe writes and
have basically no RMW overhead. 32k is chosen by default because
that's the default log buffer size and hence the typical size of
log writes.

If you increase the log stripe unit, you also increase the minimum
log buffer size that the filesystem supports. The filesystem can
support up to 256k log buffers, and hence the limit on maximum log
stripe alignment.
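
To make the numbers in the mkfs output above concrete, here is a minimal Python sketch of the adjustment. The function name and constants are made up for illustration and are not taken from the mkfs.xfs source; the sketch only mirrors the behaviour the two messages describe (a stripe unit above 256KiB falls back to 32KiB).

```python
KIB = 1024
MAX_LOG_SUNIT = 256 * KIB      # largest log buffer size the filesystem supports
DEFAULT_LOG_SUNIT = 32 * KIB   # default log buffer size

def pick_log_stripe_unit(data_sunit_bytes):
    """Hypothetical helper: mirror the adjustment reported by mkfs.xfs above."""
    if data_sunit_bytes > MAX_LOG_SUNIT:
        return DEFAULT_LOG_SUNIT
    return data_sunit_bytes

block_size = 4096                  # bsize=4096 in the mkfs output
data_sunit = 128 * block_size      # sunit=128 blks in the data section
log_sunit = pick_log_stripe_unit(data_sunit)

# Data stripe unit in bytes, log stripe unit in bytes, log stripe unit in blocks
print(data_sunit, log_sunit, log_sunit // block_size)
```

The last value corresponds to the "sunit=8 blks" shown in the log section of the mkfs output.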

> The problem seems to be related to RAID5, because when I try to
> make a filesystem on /dev/md6 (RAID1), there's no error message:

RAID1 devices don't have a stripe unit/stripe width, so no alignment
is needed or configured.

> So, the question is: 
> - is this a bug somewhere in XFS, LVM or Linux's software RAID
> implementation?

Not a bug at all.

> - will performance suffer from log stripe size adjusted to just 32
> kiB? Some of my logical volumes will just store data, but one or
> the other will have some workload acting as storage for BackupPC.

For data volumes, no. For BackupPC, it depends on whether the MD
RAID stripe cache can turn all the sequential log writes into a full
stripe write. In general, this is not a problem, and is almost never
a problem for HW RAID with BBWC....

> - would it be worth the effort to raise log stripe to at least 256
> kiB?

Depends on your workload. If it is fsync heavy, I'd advise against
it, as every log write will be padded out to 256k, even if you only
write 500 bytes worth of transaction data....
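
A rough way to see the cost of that padding, as a Python sketch. The helper name is hypothetical, and the model is simplified: it treats a log write as nothing more than the payload rounded up to the log stripe unit and ignores log record headers.

```python
def padded_log_write(payload_bytes, log_sunit_bytes):
    """Round a log write up to the next multiple of the log stripe unit."""
    units = -(-payload_bytes // log_sunit_bytes)  # ceiling division
    return max(units, 1) * log_sunit_bytes

for sunit_kib in (32, 256):
    size = padded_log_write(500, sunit_kib * 1024)
    print(f"{sunit_kib}k log stripe unit: 500 bytes -> {size} bytes written")
```

Under these assumptions, a 500 byte transaction costs 32768 bytes of log I/O at the default and 262144 bytes at 256k, so an fsync heavy workload pays eight times as much per small log write.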

> - or would it be better to run with external log on the old 1 TB

External logs provide much less benefit with delayed logging than they
used to. As it is, your external log needs to have the same
reliability characteristics as the main volume - lose the log,
corrupt the filesystem. Hence for RAID5 volumes, you need a RAID1
log, and for RAID6 you either need RAID6 or a 3-way mirror to
provide the same reliability....

> End note: the 4 TB disks are not yet "in production", so I can run
> tests with both RAID setup as well as mkfs.xfs. Reshaping the RAID
> will take up to 10 hours, though... 

IMO, RAID reshaping is just a bad idea - it changes the alignment
characteristic of the volume, hence everything that the
filesystem laid down in an aligned fashion is now unaligned, and you
have to tell the filesystem the new alignment before new files will be
correctly aligned. Also, it's usually faster to back up, recreate
and restore than to reshape, and that puts a lot less load on your
disks, too...
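
For reference, the new alignment is passed with the sunit= and swidth= mount options, which take values in 512-byte units. A small Python sketch of the arithmetic follows. The helper name is made up, and the 4-disk layout is a hypothetical reshape target, not something from this thread; the sketch does not check whether the kernel will accept a given change on an existing filesystem.

```python
def xfs_alignment_options(chunk_bytes, total_disks, parity_disks):
    """Hypothetical helper: build sunit=/swidth= values in 512-byte units."""
    data_disks = total_disks - parity_disks
    sunit = chunk_bytes // 512        # one RAID chunk
    swidth = sunit * data_disks       # one full stripe across the data disks
    return f"sunit={sunit},swidth={swidth}"

# The 3-disk RAID5 with 512k chunks from /proc/mdstat above
print(xfs_alignment_options(512 * 1024, 3, 1))

# Hypothetical: the same array after a reshape to 4 disks
print(xfs_alignment_options(512 * 1024, 4, 1))
```

The first result matches the data section of the mkfs output: sunit=128 and swidth=256 filesystem blocks of 4096 bytes are 1024 and 2048 units of 512 bytes.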


Dave Chinner
