
Re: xfs deadlock in stable kernel 3.0.4

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: xfs deadlock in stable kernel 3.0.4
From: Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx>
Date: Mon, 19 Sep 2011 12:54:37 +0200
Cc: "xfs-masters@xxxxxxxxxxx" <xfs-masters@xxxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>, aelder@xxxxxxx, Dave Chinner <david@xxxxxxxxxxxxx>
In-reply-to: <20110918200438.GC14899@xxxxxxxxxxxxx>
References: <1D2B34A7-7BB9-4E4E-9CA2-382C210E125F@xxxxxxxxxxxx> <20110912152133.GA8345@xxxxxxxxxxxxx> <C6515E45-5724-43DD-95A8-1F89AFE29601@xxxxxxxxxxxx> <20110912200543.GA22409@xxxxxxxxxxxxx> <4E6EF274.7050007@xxxxxxxxxxxx> <20110913205018.GA8543@xxxxxxxxxxxxx> <4E70571A.80108@xxxxxxxxxxxx> <4E705C42.6020909@xxxxxxxxxxxx> <20110914143005.GA28496@xxxxxxxxxxxxx> <4E75B660.1030502@xxxxxxxxxxxx> <20110918200438.GC14899@xxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20110831 Thunderbird/3.1.13
On 18.09.2011 22:04, Christoph Hellwig wrote:
On Sun, Sep 18, 2011 at 11:14:08AM +0200, Stefan Priebe - Profihost AG wrote:

At least I'm now able to reproduce the issue. I hope this will help
to investigate it, and hopefully you can reproduce it as well.

I'm using a vanilla 3.0.4 kernel with xfs as the root filesystem and
had hung task detection set to 120s. You'll then see that the bonnie++
command gets stuck in xlog_grant_log_space while creating or
deleting files. I was using an SSD or a fast RAID 10 (24x SAS disks);
I was not able to reproduce it on normal SATA disks, even a 20x
SATA RAID 10 did not trigger it.

Thanks a lot for the reproducer!

I've tried it on my laptop SSD and that didn't reproduce it yet.  I'll
try it on Monday on a real high-end setup.

Sadly my SSD bricked tonight while doing heavy testing ;-( I was able to reproduce the hang only on some partitions, not on all of them. Sadly I was not able to find what the affected partitions have in common.

I now have to set up a new machine and try to reproduce it again.

What I have so far is that bonnie++ always hangs here:

[] ? radix_tree_gang_lookup_slot+0x6a/0x8d
[] ? xfs_bmap_search_extents+0x56/0xb9
[] ? find_get_pages+0x39/0xd8
[] xlog_wait+0x58/0x70
[] ? try_to_wake_up+0x1c6/0x1c6
[] ? xlog_grant_push_ail+0xb7/0xbf
[] xlog_grant_log_space+0x162/0x2b1
[] xfs_log_reserve+0xbb/0xc4
[] xfs_trans_reserve+0xd6/0x1b1
[] xfs_free_eofblocks+0x16b/0x1fb
[] xfs_release+0x1c7/0x202
[] xfs_file_release+0x10/0x14
[] fput+0xfd/0x1eb
[] filp_close+0x6d/0x78
[] sys_close+0x9a/0xd4
[] system_call_fastpath+0x16/0x1b

With the traces we had in the past it was difficult to check which process was causing the lookup. So it doesn't seem to be xlog_grant_log_space itself; could it rather be xfs_bmap_search_extents or radix_tree_gang_lookup_slot?
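
As a reading aid for the trace above, here is a minimal sketch (not part of the original report) that separates the frames marked with "?" from the unmarked ones. In Linux kernel stack dumps a "?" generally means the address was found on the stack but the unwinder could not confirm it as part of the current call chain, so such entries may be leftovers from earlier calls. The names TRACE, FRAME_RE and split_frames are hypothetical helpers, and the regular expression assumes the "[...] name+0xoff/0xlen" layout shown above.

```python
import re

# Frame lines copied from the trace above (addresses appear stripped to "[]").
TRACE = """\
[] ? radix_tree_gang_lookup_slot+0x6a/0x8d
[] ? xfs_bmap_search_extents+0x56/0xb9
[] ? find_get_pages+0x39/0xd8
[] xlog_wait+0x58/0x70
[] ? try_to_wake_up+0x1c6/0x1c6
[] ? xlog_grant_push_ail+0xb7/0xbf
[] xlog_grant_log_space+0x162/0x2b1
[] xfs_log_reserve+0xbb/0xc4
[] xfs_trans_reserve+0xd6/0x1b1
[] xfs_free_eofblocks+0x16b/0x1fb
[] xfs_release+0x1c7/0x202
[] xfs_file_release+0x10/0x14
[] fput+0xfd/0x1eb
[] filp_close+0x6d/0x78
[] sys_close+0x9a/0xd4
[] system_call_fastpath+0x16/0x1b
"""

# Assumed layout: "[<addr or empty>] [? ]symbol+0xoffset/0xsize"
FRAME_RE = re.compile(
    r"^\[[^\]]*\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def split_frames(trace):
    """Return (unmarked, marked) symbol lists, innermost frame first."""
    unmarked, marked = [], []
    for line in trace.splitlines():
        m = FRAME_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a frame line
        (marked if m.group(1) else unmarked).append(m.group(2))
    return unmarked, marked

unmarked, marked = split_frames(TRACE)
print("unmarked frames:", " <- ".join(unmarked))
print("'?' frames:     ", ", ".join(marked))
```

Read this way, the unmarked chain starts at xlog_wait, called from xlog_grant_log_space, while xfs_bmap_search_extents and radix_tree_gang_lookup_slot are both "?" entries, so they may not belong to the call that is actually blocked.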

