Re: [PATCH] deadlocks on ENOSPC

To: Nathan Scott <nathans@xxxxxxx>
Subject: Re: [PATCH] deadlocks on ENOSPC
From: Andi Kleen <ak@xxxxxxx>
Date: Wed, 16 Jun 2004 16:45:21 +0200
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20040616072548.GA1782@frodo>
References: <20040612040838.020a2efb.ak@xxxxxxx> <20040615052909.GC816@frodo> <20040615223630.54b3e1b5.ak@xxxxxxx> <20040616072548.GA1782@frodo>
Sender: linux-xfs-bounce@xxxxxxxxxxx

On Wed, 16 Jun 2004 17:25:48 +1000
Nathan Scott <nathans@xxxxxxx> wrote:

> Test 083 does take around half an hour to complete (and it's
> running with fewer ops than your case, making it shorter);
> i.e. with 30 fsstress processes and 10000 ops each.


I'm using this simple script to reproduce it:

rm -rf /xfs/usr
/path/to/ltp-full-20031106/testcases/bin/fsstress -d /xfs -n50000 -p30
while true ; do cp -a /usr /xfs ; done

/xfs is:

/dev/sda6             3.0G  3.0G   20K 100% /xfs

(That is with /usr copied to it. After the copy is deleted it has maybe
20-30MB free, but the script quickly fills it again.)

The filesystem contents are just lots of 10MB and 100MB files
(all created by dd) until it's nearly full, plus a copy of /usr and
the fsstress directories.


> What are you using to gauge deadlock-ness?  Do you have any
> monitoring tools watching disk IO while you run the test --
> for my tests, I've been plotting these (PCP) metrics:
>       disk.all.write,
>       disk.all.read,
>       filesys.full,
>       xfs.allocs.free_block, &
>       xfs.allocs.alloc_block
> 
> while the test runs and there is always some activity for
> the duration of the test (very little CPU use though, as
> might be expected here).  And I can interrupt at any time,

The disk is idle according to vmstat, e.g.:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  1     12 498820  18308  31604    0    0     0     0 1073    36  0  0 50 50
 0  1     12 498820  18308  31604    0    0     0     0 1074    32  0  0 50 50
 0  1     12 498820  18308  31604    0    0     0     0 1073    32  0  0 50 50
 0  1     12 498820  18308  31604    0    0     0     0 1074    34  0  0 50 50
 0  1     12 498820  18308  31604    0    0     0     0 1073    30  0  0 50 50
 0  1     12 498820  18308  31604    0    0     0     0 1075    32  0  0 50 50
 0  1     12 498820  18308  31604    0    0     0     0 1073    30  0  0 50 50
 0  1     12 498884  18308  31604    0    0     0     0 1074    30  0  0 50 50
 0  1     12 498884  18308  31604    0    0     0     0 1073    32  0  0 50 50
 0  1     12 498884  18308  31604    0    0     0     0 1074    30  0  0 50 50
 0  1     12 498884  18308  31604    0    0     0     0 1073    32  0  0 50 50
 0  1     12 498884  18308  31604    0    0     0     0 1074    30  0  0 50 50
 0  1     12 498884  18308  31604    0    0     0     0 1074    30  0  0 50 50
 0  1     12 498884  18308  31604    0    0     0     0 1074    32  0  0 50 50
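
The pattern in the output above (one blocked task in the b column, zero
in bi and bo) can be picked out automatically. What follows is only a
rough sketch: the helper name stalled_samples is made up for
illustration, and the column positions assume the procps vmstat layout
shown above.

```shell
# Flag vmstat samples where tasks are blocked (b > 0) but nothing is
# being read or written (bi == 0 and bo == 0).
# Assumed column positions (procps layout): b=$2, bi=$9, bo=$10.
# Header lines are skipped because their second field is not numeric.
stalled_samples() {
    awk '$2 ~ /^[0-9]+$/ && $2 > 0 && $9 == 0 && $10 == 0 { print "stalled:", $0 }'
}

# Take five one-second samples; skip quietly if vmstat is not installed.
if command -v vmstat >/dev/null 2>&1; then
    vmstat 1 5 | stalled_samples
fi
```

A long run of "stalled:" lines is only a hint of a hang, not proof of
one, since a task can legitimately block for a short while with no I/O
in flight.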


I don't know how to plot values using PCP and it was not obvious
from the manpage. Can you give me a simple recipe?

> or let it run to completion, and I always see a clean
> unmount and no corruption.  Hmmm.

The processes are unkillable: they don't go away even after a kill -9
and waiting a bit.

I don't get any FS corruption either, just deadlocked processes.
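
To show which tasks are stuck, something like the following could be
used. It is a minimal sketch assuming a procps-style ps; the helper name
blocked_tasks is made up for illustration. Tasks in uninterruptible
sleep show up with a state starting with D, which would also explain why
kill -9 has no effect on them.

```shell
# Keep the ps header plus every task whose state starts with D
# (uninterruptible sleep); such tasks do not act on signals,
# including SIGKILL, until they wake up.
blocked_tasks() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# wchan names the kernel function each task is sleeping in, when the
# kernel exposes it. Skip quietly if ps is not installed.
if command -v ps >/dev/null 2>&1; then
    ps -eo pid,stat,wchan,comm | blocked_tasks
fi
```

For full kernel stack traces of all tasks, writing t to
/proc/sysrq-trigger as root (with sysrq enabled) dumps them to the
kernel log.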

-Andi

