
Re: [xfs-masters] xfs deadlock in stable kernel 3.0.4

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [xfs-masters] xfs deadlock in stable kernel 3.0.4
From: Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx>
Date: Wed, 21 Sep 2011 13:55:30 +0200
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, "xfs-masters@xxxxxxxxxxx" <xfs-masters@xxxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
In-reply-to: <20110921114237.GP15688@dastard>
References: <4E75B660.1030502@xxxxxxxxxxxx> <20110918230245.GF15688@dastard> <4E78665E.8030409@xxxxxxxxxxxx> <20110920160226.GA25542@xxxxxxxxxxxxx> <4E78CBF4.1030505@xxxxxxxxxxxx> <20110920172455.GA30757@xxxxxxxxxxxxx> <4E78CEFD.9030603@xxxxxxxxxxxx> <20110920223047.GA13758@xxxxxxxxxxxxx> <20110921021133.GM15688@dastard> <4E7994D3.5020103@xxxxxxxxxxxx> <20110921114237.GP15688@dastard>
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20110831 Thunderbird/3.1.13
On 21.09.2011 13:42, Dave Chinner wrote:
> OK, I got a hang in the random delete phase. Not sure what is wrong
> yet, but inode reclaim is trying to reclaim inodes but failing, and
> the AIL is trying to push items but failing. Hence the tail of the
> log is not being moved forward and new transactions are being
> blocked until log space becomes available.
OK, that matches my findings. It was also mostly in the random delete phase, but I've also seen it on creates.
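To make the stall described above concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not XFS code: the names `ToyLog`, `LogItem`, `reserve`, `push_ail` and `log_force` are hypothetical, every item starts out pinned, a single log force unpins everything at once, and log space is counted in abstract units. What it shows is that the tail can only advance past items that have been written back, so if the AIL keeps meeting pinned items and no log force is issued, a full log stays full and new reservations block.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LogItem:
    size: int            # log space the item holds until it is written back
    pinned: bool = True  # pinned items cannot be written back before a log force


@dataclass
class ToyLog:
    """Hypothetical toy model of log space and the AIL; not XFS code."""
    capacity: int
    used: int = 0
    ail: deque = field(default_factory=deque)  # leftmost item is the log tail

    def reserve(self, size: int) -> bool:
        """Reserve space for a new transaction; False means the caller would block."""
        if self.used + size > self.capacity:
            return False
        self.used += size
        self.ail.append(LogItem(size))  # in this model the new item enters the AIL pinned
        return True

    def push_ail(self) -> int:
        """Write back items from the tail, stopping at the first pinned one.

        Returns the amount of log space freed by moving the tail forward.
        """
        freed = 0
        while self.ail and not self.ail[0].pinned:
            freed += self.ail.popleft().size
        self.used -= freed
        return freed

    def log_force(self) -> None:
        """Flush the log; in this model that unpins every item in the AIL."""
        for item in self.ail:
            item.pinned = False


demo = ToyLog(capacity=100)
while demo.reserve(10):      # fill the log until a reservation would block
    pass
print(demo.push_ail())       # 0: the tail item is pinned, so nothing is freed
demo.log_force()
print(demo.push_ail())       # 100: every item can now be written back
print(demo.reserve(10))      # True: new transactions make progress again
```

Without the log force, `push_ail()` frees nothing and `reserve()` keeps failing, which is the hang pattern; once `log_force()` runs, the whole log drains and reservations succeed again.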

> Given this, just triggering a log force should get everything
> moving again. Running "echo 2 > /proc/sys/vm/drop_caches" gets inode
> reclaim running in sync mode, which causes pinned inodes to trigger
> a log force. And once I've done this, everything starts running again.
Oh man, I was thinking about trying this, but then I forgot about it ;-(
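The same drop_caches write can also be issued from a program. Below is a minimal Python sketch, assuming it runs as root on a kernel that provides /proc/sys/vm/drop_caches; the helper name `drop_caches` and its `path` parameter are hypothetical, the latter existing only so the function can be exercised against a scratch file. Per the kernel's vm sysctl documentation, writing 1 frees the page cache, 2 frees reclaimable slab objects (which includes dentries and inodes), and 3 frees both, so mode 2 is the one that drives inode reclaim.

```python
from pathlib import Path


def drop_caches(mode: int = 2, path: str = "/proc/sys/vm/drop_caches") -> None:
    """Write `mode` to drop_caches, like `echo 2 > /proc/sys/vm/drop_caches`.

    Hypothetical helper: needs root when `path` is the real proc file.
    mode 1 = page cache, 2 = reclaimable slab objects (dentries, inodes), 3 = both.
    """
    if mode not in (1, 2, 3):
        raise ValueError(f"invalid drop_caches mode: {mode}")
    # Opening for writing and writing "2\n" mirrors what the shell redirection does.
    Path(path).write_text(f"{mode}\n")
```

On a hung system this would be called as `drop_caches(2)` from a root process.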

> So, the log force not triggering in the AIL code looks to be the
> problem. That I simply cannot explain right now; it makes no sense,
> but that is what all the stats and trace events point to. I need to
> do more investigation.
Thanks, Dave, and great that you were able to reproduce it.

What helps is to build bonnie++ yourself with the stat tests removed. I've done this too, so bonnie++ runs a lot faster.

