Re: vanilla, project quota enabled and process stuck in D state

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: vanilla, project quota enabled and process stuck in D state (repeatable every time)
From: Arkadiusz Miskiewicz <arekm@xxxxxxxx>
Date: Wed, 3 Dec 2008 22:42:29 +0100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20081203213028.GW18236@disturbed>
References: <200812021949.55463.arekm@xxxxxxxx> <200812031406.41882.arekm@xxxxxxxx> <20081203213028.GW18236@disturbed>
User-agent: PLD Linux KMail/1.9.10
On Wednesday 03 of December 2008, Dave Chinner wrote:

> > D-state lock is also happening if I drop usrquota,prjquota, reboot and
> > retry the test. I assume something was written on disk that triggers the
> > problem.
> Unlikely - locking doesn't generally get stuck due to on disk
> corruption. Are there any other blocked processes in the machine?
> i.e. what is the entire output of 'echo w > /proc/sysrq-trigger'?

Only this one program's trace is visible in the sysrq-w output. There are no
other traces, so no other programs are blocked.

> Are there any other signs of general unwellness (e.g. a CPU running
> at 100% when it shouldn't be)?

Nothing wrong.

> FWIW, if you are seeing this on two hosts, can you try to build
> a reproducible test case using a minimal data set and a simple
> set of commands? If you can do this and supply us with a
> xfs_metadump image of the filesystem plus the commands to reproduce
> the problem we'll be able to find the problem pretty quickly....

I was able to reproduce it with:

- mount the fs with usrquota,prjquota
- set up /home/users/arekm/rpm as project quota id = 10
- run the program below twice

[arekm@farm rpm]$ more a.c
#include <stdio.h>
int main() {
        int i;
        i = rename("/home/users/arekm/tmp/aa", "/home/users/arekm/rpm/testing");
        /* %m is a glibc extension that prints strerror(errno) */
        printf("ret=%d %m\n", i);
        return 0;
}
[arekm@farm rpm]$ touch /home/users/arekm/tmp/aa
[arekm@farm rpm]$ ./a.out
ret=-1 Invalid cross-device link
[arekm@farm rpm]$ ./a.out

The second run hangs in D state.

For clarification, the rpm and tmp directories are on the same
filesystem/partition (hda2). The rpm/ directory belongs to project quota
id=10, while tmp doesn't belong to any project quota.
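
Since the first run reports a cross-device error even though both directories sit on hda2, a slightly more verbose variant of the test could check that claim explicitly before calling rename(). What follows is only a sketch: the helper names same_device and try_rename are made up for illustration and are not part of the original a.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if both paths are on the same device
 * (equal st_dev), 0 if they are not, and -1 if either stat() call fails.
 */
int same_device(const char *a, const char *b)
{
        struct stat sa, sb;

        if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
                return -1;
        return sa.st_dev == sb.st_dev;
}

/*
 * Hypothetical helper: attempts the rename and returns 0 on success,
 * or the errno value set by rename() on failure (e.g. EXDEV).
 */
int try_rename(const char *from, const char *to)
{
        assert(from != NULL && to != NULL);

        if (rename(from, to) == 0)
                return 0;
        return errno;
}
```

For the case above one would call same_device("/home/users/arekm/tmp", "/home/users/arekm/rpm") and then try_rename() with the two paths from a.c. A result of 1 followed by EXDEV would match what the first run shows. On an affected kernel the second rename() may of course still hang as described.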

As for the rest of your questions: Christoph promised to look at the issue
today, so I'll wait until tomorrow, and if the issue is still a mystery
then I'll dig out all the data you asked for.

> Cheers,
> Dave.

Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/

