3.2.9 and locking problem
Arkadiusz Miśkiewicz
arekm at maven.pl
Mon Mar 12 08:43:58 CDT 2012
On Monday 12 of March 2012, Dave Chinner wrote:
> On Fri, Mar 09, 2012 at 08:28:47PM +0100, Arkadiusz Miśkiewicz wrote:
> > Are there any bugs in area visible in tracebacks below? I have a system
> > where one operation (upgrade of single rpm package) causes rpm process
> > to hang in D-state, sysrq-w below:
> >
> > [ 400.755253] SysRq : Show Blocked State
> > [ 400.758507] task PC stack pid father
> > [ 400.758507] rpm             D 0000000100005781     0  8732   8698 0x00000000
> > [ 400.758507]  ffff88021657dc48 0000000000000086 ffff880200000000 ffff88025126f480
> > [ 400.758507]  ffff880252276630 ffff88021657dfd8 ffff88021657dfd8 ffff88021657dfd8
> > [ 400.758507]  ffff880252074af0 ffff880252276630 ffff88024cb0d005 ffff88021657dcb0
> > [ 400.758507] Call Trace:
> > [ 400.758507] [<ffffffff8114b22a>] ? kmem_cache_free+0x2a/0x110
> > [ 400.758507] [<ffffffff8114d2ed>] ? kmem_cache_alloc+0x11d/0x140
> > [ 400.758507] [<ffffffffa00df3c7>] ? kmem_zone_alloc+0x67/0xe0 [xfs]
> > [ 400.758507] [<ffffffff8148b78a>] schedule+0x3a/0x50
> > [ 400.758507] [<ffffffff8148d25d>] rwsem_down_failed_common+0xbd/0x150
> > [ 400.758507] [<ffffffff8148d303>] rwsem_down_write_failed+0x13/0x20
> > [ 400.758507] [<ffffffff812652a3>] call_rwsem_down_write_failed+0x13/0x20
> > [ 400.758507] [<ffffffff8148c8ed>] ? down_write+0x2d/0x40
> > [ 400.758507] [<ffffffffa00cf97c>] xfs_ilock+0xcc/0x120 [xfs]
> > [ 400.758507] [<ffffffffa00d4ace>] xfs_setattr_nonsize+0x1ce/0x5b0 [xfs]
> > [ 400.758507] [<ffffffff81265502>] ? __strncpy_from_user+0x22/0x60
> > [ 400.758507] [<ffffffffa00d52ab>] xfs_vn_setattr+0x1b/0x40 [xfs]
> > [ 400.758507] [<ffffffff8117c1a2>] notify_change+0x1a2/0x340
> > [ 400.758507] [<ffffffff8115ed80>] chown_common+0xd0/0xf0
> > [ 400.758507] [<ffffffff8115fe4c>] sys_chown+0xac/0x1a0
> > [ 400.758507] [<ffffffff81495112>] system_call_fastpath+0x16/0x1b
>
> I can't see why we'd get a task stuck here - it's waiting on the
> XFS_ILOCK_EXCL. The only reason for this is if we leaked an unlock
> somewhere. It appears you can reproduce this fairly quickly,
The Linux-VServer patch [1] seems to be messing with locking. It would be nice if you could take a quick look at it to see whether it can be considered the guilty party. On the other hand, I wasn't able to reproduce this on 3.0.22, even though the vserver patch for 3.0.22 [2] does the same thing as the one for 3.2.9.
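
Just to check that I understand the failure mode you describe: if some path (vserver or otherwise) takes XFS_ILOCK_EXCL and then returns without the matching xfs_iunlock(), the next exclusive locker of that inode sleeps forever in down_write(), which is exactly what rpm looks like above. A minimal userspace sketch of that pattern (not XFS code - a pthread rwlock stands in for the inode's i_lock rwsem, and buggy_setattr()/next_chown() are made-up names):

/* Illustration only -- NOT XFS code.  Shows why a leaked exclusive
 * unlock leaves the next exclusive locker blocked forever, which is
 * what the rpm task stuck in xfs_ilock() looks like. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t ilock = PTHREAD_RWLOCK_INITIALIZER;

/* Stands in for a code path that takes XFS_ILOCK_EXCL but returns
 * (e.g. on an error path) without the matching unlock. */
static void buggy_setattr(void)
{
        pthread_rwlock_wrlock(&ilock);
        /* ... modify inode attributes ... */
        /* BUG: missing pthread_rwlock_unlock(&ilock); */
}

/* Stands in for the next chown() hitting the same inode: it sleeps
 * waiting for the exclusive lock and never wakes up (D state). */
static void *next_chown(void *arg)
{
        (void)arg;
        fprintf(stderr, "second locker: waiting for exclusive lock...\n");
        pthread_rwlock_wrlock(&ilock);      /* blocks forever */
        fprintf(stderr, "second locker: got it (never printed)\n");
        return NULL;
}

int main(void)
{
        pthread_t tid;

        buggy_setattr();                     /* leaks the exclusive lock */
        pthread_create(&tid, NULL, next_chown, NULL);
        sleep(2);
        fprintf(stderr, "main: second locker is still stuck, giving up\n");
        return 0;
}

(Compile with gcc -pthread; the second thread stays blocked just like the D-state rpm process.)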
> so
> running an event trace via trace-cmd for all the xfs_ilock trace
> points and posting the report output might tell us what inode is
> blocked and where we leaked (if that is the cause).
I will try to get more information, but it will take some time (most likely
weeks) before I can take this machine down for debugging.
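
For when that window comes: I assume something along these lines should capture the xfs_ilock trace points you mention (guessing at the exact event names this kernel exposes):

  trace-cmd record -e xfs:xfs_ilock -e xfs:xfs_ilock_nowait \
      -e xfs:xfs_ilock_demote -e xfs:xfs_iunlock
  (reproduce the rpm upgrade hang, then stop the recording)
  trace-cmd report > xfs-ilock.txt

and I will post the report output.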
> Cheers,
> Dave.
1. http://vserver.13thfloor.at/Experimental/patch-3.2.9-vs2.3.2.7.diff
2. http://vserver.13thfloor.at/Experimental/patch-3.0.22-vs2.3.2.3.diff
--
Arkadiusz Miśkiewicz PLD/Linux Team
arekm / maven.pl http://ftp.pld-linux.org/