Re: xfs deadlock in stable kernel 3.0.4

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: xfs deadlock in stable kernel 3.0.4
From: Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx>
Date: Wed, 14 Sep 2011 09:26:18 +0200
Cc: "xfs-masters@xxxxxxxxxxx" <xfs-masters@xxxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>, aelder@xxxxxxx
In-reply-to: <20110913205018.GA8543@xxxxxxxxxxxxx>
References: <1D2B34A7-7BB9-4E4E-9CA2-382C210E125F@xxxxxxxxxxxx> <20110912152133.GA8345@xxxxxxxxxxxxx> <C6515E45-5724-43DD-95A8-1F89AFE29601@xxxxxxxxxxxx> <20110912200543.GA22409@xxxxxxxxxxxxx> <4E6EF274.7050007@xxxxxxxxxxxx> <20110913205018.GA8543@xxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20110617 Thunderbird/3.1.11

On 13.09.2011 22:50, Christoph Hellwig wrote:
On Tue, Sep 13, 2011 at 08:04:36AM +0200, Stefan Priebe - Profihost AG wrote:
I just reported it to the SCSI list as I didn't know where the
problem is. But then some people told me it must be an XFS problem.

Some more information:
1.) It's running with 2.6.32 and 2.6.38
2.) I can also write to another ext2 partition on the same disk
array (aacraid driver) while XFS is stuck - so I think it must be an
XFS problem

That points a bit more towards XFS, although we've seen storage setups
create issues depending on the exact workload.  The prime culprit for
that used to be the md software RAID driver, though.

3.) I've also tried running 3.1-rc5 but then I'm seeing this error:

BUG: unable to handle kernel NULL pointer dereference at 000000000000012c
IP: [] inode_dio_done+0x4/0x25

Oops, that's a bug that I actually introduced myself.  Fix below:

Thanks for the patch.

Now we have the following situation:

1.) Systems run fine with 2.6.32, 2.6.38 and with 3.1-rc6 + patch.
2.) Sadly, 3.0.4 does not run for more than 1 hour. And 3.0.X will become the next long-term stable kernel, so a lot of people will be using it.
3.) I have seen this deadlock on systems with aacraid and with Intel AHCI onboard controllers (that's all we're using).
4.) I can still write to other devices / RAIDs on the same controller while the XFS root filesystem hangs.

What can we do or try next?

