
Re: XFS unhappy with large holey loopback and syncs

To: Andi Kleen <ak@xxxxxxx>
Subject: Re: XFS unhappy with large holey loopback and syncs
From: David Chinner <dgc@xxxxxxx>
Date: Tue, 29 Nov 2005 14:29:37 +1100
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20051129003611.GF7209@xxxxxxxxxxxxxx>
References: <20051129003611.GF7209@xxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.4.2.1i
On Tue, Nov 29, 2005 at 01:36:11AM +0100, Andi Kleen wrote:
> 
> I just found a new exciting way to break XFS. Or rather the
> version that's in 2.6.13. But it might be an interesting try anyway.

So I just ran this on a handy Altix I had lying about between other
testing ;)

It's currently not running 2.6.13, but I'm going to run this again
against 2.6.14 just to make sure it's not a regression. I suspect that
the problem is that you're generating a highly fragmented file
which requires high-order memory allocations to hold the extent list.
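
If that's what's happening, it should show up as high-order page
allocation failures in the kernel log. Roughly (just a sketch using
standard proc interfaces, nothing XFS-specific):

# look for high-order allocation failures the kernel has logged
dmesg | grep -i "page allocation failure"

# and see how fragmented the free lists are while the test runs
cat /proc/buddyinfo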

> You likely need a 64bit system for this.

Check.

> I created a large holey file on XFS with
> 
> # (the funny number is about the maximum that ext2 supports)
> dd if=/dev/zero of=LARGE bs=1 count=4096 seek=$[8*1024*1024*1024*1024-2*4096] 
> losetup /dev/loop0 LARGE
> mkfs.ext2 /dev/loop0
> 
> Now wait until it has written a few thousand of its inode tables
> and then press Ctrl-C.  mkfs.ext2 will close the loop device, which
> causes a sync. It will then hang for a very long time,
> until loop starts spewing out IO errors and then it deadlocks completely.
> The mkfs process is busy waiting for its sync; loop0 does:
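
Spelled out as a single sequence, the recipe above amounts to roughly
this (just a sketch; the sparse-file check, the interrupt timing and
the detach at the end are my own additions, not part of the original
recipe):

# create the ~8TiB sparse file and confirm it really is sparse
dd if=/dev/zero of=LARGE bs=1 count=4096 \
    seek=$((8*1024*1024*1024*1024 - 2*4096))
ls -lh LARGE; du -h LARGE

# attach it, start mkfs, interrupt it part way through, clean up
losetup /dev/loop0 LARGE
mkfs.ext2 /dev/loop0 & MKFS_PID=$!
sleep 60 && kill -INT $MKFS_PID
wait
losetup -d /dev/loop0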

First I tried aborting the mkfs but I didn't see any hangs, so I let
mkfs.ext2 run to completion - it dirtied all of memory (~23GiB) and
then it spent most of the time writing to disk at 300-400MB/s:

budgie:~ # vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 7  0 416096  27760 17509664 5153888   45    6    66 397116 5725 11054  0 45 46  9
 3  0 416096  28096 17299648 5363776   51   10    98 393118 5638  6693  1 46 43 10
10  0 416096  27984 16831344 5831968   26    0    38 440788 5627 11416  0 50 42  7
 2  0 416096  21712 17438816 5241232   22   19   246 362036 5737  8968  1 38 51 10
 7  0 416096  27312 18031552 4631792   67    0   193 347848 5574  7725  0 37 51 11

It also kept more than a million pages under writeback for the entire
run. As the number of inode tables written out increased, the
writeout rate slowed a bit. When mkfs completed, a sync was done and
everything was fine.
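
The writeback numbers are easy to watch directly rather than inferring
them from vmstat; something like this (a sketch, nothing XFS-specific):

# Dirty/Writeback are reported in kB; divide by the page size to get pages
getconf PAGE_SIZE
watch -n 5 'grep -E "^(Dirty|Writeback):" /proc/meminfo'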

But I can now guess why a smaller machine might hang on this test - you're
creating a file with a massive number of extents:

budgie:/usr/local/aspen/loadgen # xfs_bmap -v /mnt/dgc/stripe/LARGE |wc -l
217708
budgie:/usr/local/aspen/loadgen # 

Which is what makes me think you're having problems with high-order
memory allocations at a substantially lower number of extents. You're
testing on a machine with 4k pages, right?
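
For scale, a rough back-of-the-envelope number, assuming an in-core
extent record costs about the same as the 16-byte on-disk bmbt record
(the real in-core overhead may well be higher):

# ~217k extents at ~16 bytes each
echo $((217708 * 16))          # ~3.3MiB of extent records
echo $((217708 * 16 / 4096))   # ~850 contiguous 4k pages

If the extent list really does need to be contiguous, that's a big ask
of the page allocator on a busy 4k-page box.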

Cheers,

Dave.
-- 
Dave Chinner
R&D Software Engineer
SGI Australian Software Group

