mkfs.xfs fails with raid5 and smaller chunk sizes

Brian Hemme bmh at rincon.com
Tue Sep 16 17:47:43 CDT 2014


On 09/16/2014 03:17 PM, Dave Chinner wrote:
> On Tue, Sep 16, 2014 at 03:03:08PM -0700, Brian Hemme wrote:
>> Hello all,
>>
>> I am having some odd problems with mkfs.xfs when used on a raid 5
>> array.  The array is built from 6 960GB SSDs all connected to SATA
>> ports on the MB and created with mdadm.  If I use a chunk size any
>> smaller than 512K mkfs.xfs just hangs forever.  It continues to use
>> CPU and so does the raid array but never completes.  If the system
>> is just left running for an extended length of time the whole OS
>> eventually locks up.  I have tried this on three different systems
>> with the same results.   I have searched all over for someone with
>> similar issues without success.  I am hoping I am just doing
>> something clearly wrong and you all can set me straight quickly.
>>
>> Some specifics:
>>      Arch linux with 3.14.1 kernel
>>      mkfs.xfs version 3.1.11
>>      mdadm - v3.3 - 3rd September 2013
>>
>> Commands:
>>> mdadm --create /dev/md0 --chunk=64K --level=5 --raid-devices=6 /dev/sd[a-f]
>>> mkfs.xfs /dev/md0
>>    ** This command fails and locks up
>>
>> I have tried specifying the arguments to mkfs.xfs with the same
>> results.  Building a 4 drive array seems to require a chunk size of
>> 1M or greater to work.  Same results if I make a partition on the
>> array and make the fs there.
> mkfs.xfs really should only take a couple of seconds to complete.
> Seeing as you are using SSDs, my first suspicion is that md or the
> SSDs are having problems with discard. Hence you should first
> try 'mkfs.xfs -K /dev/md0' and see if that completes quickly.
>
> Otherwise, output of 'echo w > /proc/sysrq-trigger' from dmesg would be a
> good start, as would a 'perf top -G -U' snapshot (run for 30s at
> least a minute after mkfs.xfs starts) to tell us what is burning
> CPU.
>
> Cheers,
>
> Dave.
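
For anyone following along, the steps suggested above can be sketched
roughly as below. This is only an outline, not a tested procedure: the
function names and the DRY_RUN variable are made up for illustration,
/dev/md0 is the array from my commands, and by default the commands are
printed rather than run.

```shell
#!/bin/sh
# Sketch of the suggested triage steps. Dry-run by default: commands are
# printed, not executed. Set DRY_RUN=0 (as root) to really run them.
# run, try_mkfs_nodiscard, collect_hang_info and DRY_RUN are
# illustrative names, not part of xfsprogs or mdadm.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: make the filesystem without discarding blocks first (-K).
# WARNING: outside dry-run this destroys whatever is on the device.
try_mkfs_nodiscard() {
    run mkfs.xfs -K "$1"
}

# Step 2: only if mkfs still hangs, run this from a second terminal.
collect_hang_info() {
    # Ask the kernel to log blocked tasks, then read the log.
    run sh -c 'echo w > /proc/sysrq-trigger'
    run dmesg
    # See what is burning CPU (interactive; quit after about 30s).
    run perf top -G -U
}
```

With the defaults, "try_mkfs_nodiscard /dev/md0" just prints the
mkfs.xfs command line so it can be checked before running it for real.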

Thanks for the quick response!

Adding the -K seemed to do the trick.  However, for my education, why is 
this needed in this case?  It seems to work without it for larger chunk 
sizes or for raid 0 instead of 5.  It also worked on our old install 
with a 3.1.6 kernel.  And why would omitting -K cause enough of a 
problem that the whole machine hangs?  Just trying to understand this 
enough to make sure I don't run into problems down the road.
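
One way to compare the configurations that work with the ones that hang
might be to look at the discard limits the kernel advertises for the
array and its member disks. A minimal sketch, assuming the usual sysfs
queue files are present on this kernel; SYSFS_ROOT and the function
names are made up so the script can be pointed at a test directory, and
treating a discard_max_bytes of 0 as "no discard support" is my
assumption here.

```shell
#!/bin/sh
# Print the discard limits advertised for a block device.
# SYSFS_ROOT is a hypothetical override for testing; defaults to /sys.

discard_limits() {
    name="$1"                                  # e.g. md0 or sda
    q="${SYSFS_ROOT:-/sys}/block/$name/queue"
    for f in discard_granularity discard_max_bytes; do
        if [ -r "$q/$f" ]; then
            printf '%s %s=%s\n' "$name" "$f" "$(cat "$q/$f")"
        else
            printf '%s %s=unavailable\n' "$name" "$f"
        fi
    done
}

# Succeeds if the device appears to accept discards.
# Assumption: discard_max_bytes of 0 (or a missing file) means it does not.
discard_supported() {
    max=$(cat "${SYSFS_ROOT:-/sys}/block/$1/queue/discard_max_bytes" 2>/dev/null)
    [ -n "$max" ] && [ "$max" -gt 0 ]
}
```

Running "discard_limits md0" and "discard_limits sda" for each chunk
size would at least show whether the numbers differ between the setups
that complete and the ones that do not.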

Thanks again,
Brian


