xfs

Re: More processes hanging in 'D' state.

To: linux-xfs@xxxxxxxxxxx
Subject: Re: More processes hanging in 'D' state.
From: Kelledin <kelledin+XFS@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 28 Apr 2003 07:27:09 -0500
Cc: Eric Sandeen <sandeen@xxxxxxx>
In-reply-to: <Pine.LNX.4.44.0304272129470.17953-100000@stout.americas.sgi.com>
References: <Pine.LNX.4.44.0304272129470.17953-100000@stout.americas.sgi.com>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: KMail/1.5.1

On Sunday 27 April 2003 09:41 pm, Eric Sandeen wrote:
> > (Pardoning my
> > very blunt and rather pointed question, do these patchsets
> > even get a test compile?)
>
> If you got something from
> ftp://oss.sgi.com/projects/xfs/patches/weekly-snapshot-patch/,
> no, they're just snapshots.  You pays your money (or not), you
> takes your chances.  :)  And they certainly don't get tested
> on alpha, we have nothing but x86 and ia64 here (and an athlon
> on the way....)

Well, what I got was the 2003-04-07 patchset, not from
weekly-snapshot-patch/ but from
ftp://oss.sgi.com/projects/xfs/patches/2.4.20/ .  AFAICT that's
the closest thing you've got to a release synced with my current
kernel; the 1.2 release patchset doesn't even apply cleanly to
2.4.20.

> Anyway, on to your question.  I would have suggested kdb to
> find out where the processes were stuck, but that's not going
> to work for alpha. If you can think of a way to hit it on an
> x86 or ia64 box, kdb might help.

I suppose I can turn on XFS debugging next and see if that tells
me anything.

> What was the exact test (RPM build) you ran?

[ root@eliudnir /usr/src ] # rpm --rebuild \
incept/SRPMS/i2c-2.7.0-4.src.rpm >& /var/log/rebuild.log

To mimic the testcase properly, /usr needs to be a mount point
for a filesystem that gets no write activity other than the RPM
rebuild (this is the normal condition for an FHS-compliant /usr
filesystem).  As I mentioned in my last post, once the deadlock
is hit, allocating or unlinking an inode on the same filesystem
the deadlocked process is working in breaks the deadlock.  That's
why the condition on /usr is so strict: otherwise, other write
activity may break the deadlock before it lasts long enough for
you to notice.
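
A rough pre-flight check, assuming a Linux /proc/mounts, that /usr
really is its own XFS mount before starting the rebuild.  The
mount_fstype helper is a hypothetical name, not an existing tool, and
it cannot verify that nothing else is writing to /usr; that part
still has to be arranged by hand:

```sh
#!/bin/sh
# mount_fstype MOUNTPOINT [MOUNTS_FILE]
# Hypothetical helper: prints the filesystem type mounted at MOUNTPOINT,
# read from /proc/mounts (field 2 = mount point, field 3 = fs type).
# If several entries are stacked on the same mount point, the last one
# listed is normally the one currently visible, so that one is reported.
# Prints nothing if MOUNTPOINT is not a mount point.
mount_fstype() {
    awk -v m="$1" '$2 == m { t = $3 } END { if (t != "") print t }' \
        "${2:-/proc/mounts}"
}

# The testcase wants /usr to be a separate XFS filesystem.
if [ "$(mount_fstype /usr)" = xfs ]; then
    echo "/usr is a separate XFS mount"
else
    echo "/usr is not a separate XFS mount; the deadlock may not reproduce" >&2
fi
```

The optional second argument only exists so the helper can be pointed
at a file other than /proc/mounts, e.g. a saved copy from the test box.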

The source RPM used for the testcase is here:

http://skarpsey.dyndns.org/i2c-2.7.0-4.src.rpm

You may have to run the rebuild two or three times to trip the
deadlock.  Once you do, you'll have an rm -Rf process stuck in
'D' state (uninterruptible sleep), and any process that
recursively reads the same directory tree (an ls -lR or du -hs,
for example) will probably get stuck in the same deadlock.
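
Between rebuilds, one way to see whether the deadlock has been
tripped is to look for processes in 'D' state.  This sketch reads
/proc/PID/status directly instead of relying on particular ps
options; list_d_state is a hypothetical helper name, and its
optional argument (an alternate /proc root) is an assumption added
so the function can be run against a fake tree:

```sh
#!/bin/sh
# list_d_state [PROC_ROOT]
# Hypothetical helper: prints "PID NAME" for every process whose
# PROC_ROOT/PID/status reports "State: D" (uninterruptible sleep).
# PROC_ROOT defaults to /proc.
list_d_state() {
    proc_root=${1:-/proc}
    for f in "$proc_root"/[0-9]*/status; do
        # Processes can exit between the glob and the read; skip those.
        # This also skips the literal pattern when nothing matched.
        [ -r "$f" ] || continue
        pid=${f%/status}
        pid=${pid##*/}
        awk -v pid="$pid" '
            /^Name:/  { sub(/^Name:[ \t]*/, ""); name = $0 }
            /^State:/ { state = $2 }
            END       { if (state == "D") print pid, name }
        ' "$f" 2>/dev/null
    done
}

list_d_state
```

Once the deadlock has been hit, the stuck rm -Rf (and any ls -lR or
du -hs sent after it) should show up in this list.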

-- 
Kelledin
"If a server crashes in a server farm and no one pings it, does 
it still cost four figures to fix?"

