Re: [PATCH] deadlocks on ENOSPC

To: Andi Kleen <ak@xxxxxxx>
Subject: Re: [PATCH] deadlocks on ENOSPC
From: Nathan Scott <nathans@xxxxxxx>
Date: Thu, 17 Jun 2004 11:38:07 +1000
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20040616164521.3382ed14.ak@suse.de>
References: <20040612040838.020a2efb.ak@suse.de> <20040615052909.GC816@frodo> <20040615223630.54b3e1b5.ak@suse.de> <20040616072548.GA1782@frodo> <20040616164521.3382ed14.ak@suse.de>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.3i
On Wed, Jun 16, 2004 at 04:45:21PM +0200, Andi Kleen wrote:
> On Wed, 16 Jun 2004 17:25:48 +1000
> Nathan Scott <nathans@xxxxxxx> wrote:
> 
> > Test 083 does take around half an hour to complete (and it's
> > running with fewer ops than your case, making it shorter);
> > i.e. with 30 fsstress processes and 10000 ops each.
> 
> 
> I'm using this simple script to reproduce it:
> 
> rm -rf /xfs/usr
> /path/to/ltp-full-20031106/testcases/bin/fsstress -d /xfs -n50000 -p30
> while true ; do cp -a /usr /xfs ; done

Is that fsstress run in the background?  (Otherwise the cp loop only
starts after fsstress finishes and isn't contributing to the deadlock.)
I did a bunch of tests with parallel cp's earlier too, and that
wasn't helping me hit it, so I dropped that part.
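
If the intent is for the cp loop to overlap with fsstress, a backgrounded
variant might look roughly like the sketch below.  This is only my guess at
the intended setup, not the script as posted: run_repro is a hypothetical
helper, and unlike the original "while true" loop it stops copying once
fsstress exits.

```shell
# Hypothetical helper: run_repro <fsstress-binary> <mountpoint> <source-tree>
# Assumption: fsstress should run concurrently with the cp loop.
run_repro() {
    fsstress_bin=$1
    mnt=$2
    src=$3

    # Start from a clean copy target, as in the posted script.
    rm -rf "$mnt/$(basename "$src")"

    # Run fsstress in the background so the copies below overlap with it.
    "$fsstress_bin" -d "$mnt" -n50000 -p30 &
    stress_pid=$!

    # Keep refilling the filesystem for as long as fsstress is alive.
    while kill -0 "$stress_pid" 2>/dev/null; do
        cp -a "$src" "$mnt"
    done

    wait "$stress_pid"
}
```

With the paths from the quoted script that would be invoked as
run_repro /path/to/ltp-full-20031106/testcases/bin/fsstress /xfs /usr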

I'll try to hijack someone else's ia64 test box today and see if
that helps me reproduce it.

> /dev/sda6             3.0G  3.0G   20K 100% /xfs
> 
> (with /usr copied to it, after it got deleted it has maybe 20-30MB free,
> but the script fills it quickly again)

OK, I'll use a similar layout too.

> 
> The disk is idle as seen by vmstat
> 

Yep, that's stopped alright.

> I don't know how to plot values using PCP and it was not obvious
> from the manpage. Can you give me a simple recipe?

Get pmcd chugging along...
/etc/init.d/pcp start

Then something like this will extract data...
9:06 fsgqa@bruce ~ 19> pmie -t 5 -v
allocs = xfs.allocs.alloc_block; 
frees = xfs.allocs.free_block;
full = filesys.full #'/dev/sdb5';
^D

> 
> The processes are unkillable and don't go away even after a kill -9
> and when you wait a bit.

OK, that sure sounds busted; I need to try some other machines
I think, maybe this one is just too slow.
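
Processes that survive a kill -9 are usually sitting in uninterruptible
sleep (state D), so a listing like the sketch below may show where they
are blocked.  It assumes the procps ps on Linux; filter_dstate and
list_dstate are hypothetical helper names, and the wchan column is only as
useful as the kernel's symbol information.

```shell
# Keep the header line plus any task whose STAT field starts with D
# (uninterruptible sleep).  Expects columns: PID STAT WCHAN COMMAND.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List D-state tasks along with the kernel function they are sleeping in.
list_dstate() {
    ps -eo pid,stat,wchan:32,comm | filter_dstate
}
```

Running list_dstate while the test is wedged should list the stuck fsstress
and cp processes, if they are indeed in state D.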

thanks.

-- 
Nathan

