Re: [PATCH] deadlocks on ENOSPC

To: Andi Kleen <ak@xxxxxxx>
Subject: Re: [PATCH] deadlocks on ENOSPC
From: Nathan Scott <nathans@xxxxxxx>
Date: Wed, 16 Jun 2004 17:25:48 +1000
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20040615223630.54b3e1b5.ak@suse.de>
References: <20040612040838.020a2efb.ak@suse.de> <20040615052909.GC816@frodo> <20040615223630.54b3e1b5.ak@suse.de>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.3i
On Tue, Jun 15, 2004 at 10:36:30PM +0200, Andi Kleen wrote:
> On Tue, 15 Jun 2004 15:29:09 +1000
> 
> I added some instrumentation and all lost buffers seem to originate
> from xfs_alloc_fix_freelists (with different calls). I also tried with
> only a single CPU and it happens too, so it isn't an SMP race.
> 
> I will look into this more later.
> 

hi Andi,

I've been running tests all day and I'm still not able to
hit any ENOSPC deadlocks with top of tree code (test box
is a 4 cpu ia32).  Frustrating.

I do notice that during the test, at any one point in time
most fsstress processes will be in 'D' state waiting for
resources held by others - and we do a lot of flushing and
synchronous activity near ENOSPC, so it seems these can take
some time to become available.
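
As a rough illustration of that observation, here is a small sketch
that counts how many processes of a given name are in 'D'
(uninterruptible sleep) state.  It assumes the input is text in the
form printed by `ps -eo stat,comm`; the function name and the sample
text are made up for the example and are not taken from the test
harness.

```python
def count_d_state(ps_output, command="fsstress"):
    """Return (in_d_state, total) for processes named `command`.

    `ps_output` is assumed to be text as printed by `ps -eo stat,comm`:
    a header line, then one "STAT COMMAND" pair per line.
    """
    in_d = 0
    total = 0
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue  # ignore blank or malformed lines
        stat, comm = fields[0], fields[1].strip()
        if comm != command:
            continue
        total += 1
        # The first character of STAT is the process state; 'D' means
        # uninterruptible sleep (usually waiting on IO or a lock).
        if stat.startswith("D"):
            in_d += 1
    return in_d, total


# Hypothetical sample, not captured from a real run.
SAMPLE = """STAT COMMAND
D    fsstress
S    bash
D+   fsstress
R    fsstress
"""

if __name__ == "__main__":
    print(count_d_state(SAMPLE))
```

A high count at one instant is not a deadlock by itself; what matters
is whether the same processes stay stuck across many samples.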

Test 083 does take around half an hour to complete (and it's
running with fewer ops than your case, making it shorter),
i.e. with 30 fsstress processes and 10000 ops each.

What are you using to gauge deadlock-ness?  Do you have any
monitoring tools watching disk IO while you run the test?
For my tests, I've been plotting these (PCP) metrics:
        disk.all.write,
        disk.all.read,
        filesys.full,
        xfs.allocs.free_block, &
        xfs.allocs.alloc_block

while the test runs and there is always some activity for
the duration of the test (very little CPU use though, as
might be expected here).  And I can interrupt at any time,
or let it run to completion, and I always see a clean
unmount and no corruption.  Hmmm.
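
One way to turn "is there still activity?" into a yes/no check is
sketched below.  It assumes you periodically sample some cumulative
counters (e.g. total reads and writes, from whatever monitoring source
is at hand) and treats the run as possibly stalled only if none of them
has moved for several consecutive intervals.  The function name, the
window size and the sample numbers are all hypothetical.

```python
def looks_stalled(samples, window=3):
    """Return True if no counter changed over the last `window` intervals.

    `samples` is a list of tuples of cumulative counters, oldest first,
    e.g. (total_reads, total_writes) taken at a fixed interval.
    """
    if len(samples) < window + 1:
        return False  # not enough history to judge
    recent = samples[-(window + 1):]
    # Compare each sample with the next one; any difference is progress.
    return all(a == b for a, b in zip(recent, recent[1:]))


if __name__ == "__main__":
    # Made-up counters: activity early on, then nothing moves.
    history = [(10, 5), (12, 9), (12, 9), (12, 9), (12, 9)]
    print(looks_stalled(history))
```

Near ENOSPC the intervals would need to be fairly long, since slow
progress and no progress look the same over a short window.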

cheers.

-- 
Nathan

