xfs

Re: 2.6.27.7 vanilla, project quota enabled and process stuck in D state

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: 2.6.27.7 vanilla, project quota enabled and process stuck in D state (repeatable every time)
From: Arkadiusz Miskiewicz <arekm@xxxxxxxx>
Date: Wed, 3 Dec 2008 14:06:41 +0100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20081203032013.GS18236@disturbed>
References: <200812021949.55463.arekm@maven.pl> <20081203032013.GS18236@disturbed>
User-agent: PLD Linux KMail/1.9.10
On Wednesday 03 of December 2008, Dave Chinner wrote:
> On Tue, Dec 02, 2008 at 07:49:55PM +0100, Arkadiusz Miskiewicz wrote:
> > Hello,
> >
> > I'm trying to use xfs project quota on kernel 2.6.27.7 (vanilla, no
> > additional patches), x86_64 UP machine (SMP kernel).
> >
> > Now some processes that are using /home/users/arekm/rpm are hanging in
> > D-state like:
> >
> > SysRq : Show Blocked State
> >   task                        PC stack   pid father
> > patch         D ffff88003a7dd080     0  3971   3965
> >  ffff880034453cd8 0000000000000086 0000000000000000 ffff8800344770d0
> >  ffff880034453cd8 ffff8800354d2440 ffffffff805d0340 ffff8800354d27b8
> >  00000000000041ed 00000000fffc7a61 ffff8800354d27b8 0000000000000250
> > Call Trace:
> >  [<ffffffffa00af4c4>] ? kmem_zone_alloc+0x94/0xe0 [xfs]
> >  [<ffffffff804a51cd>] __down_write_nested+0x8d/0xd0
> >  [<ffffffff804a521b>] __down_write+0xb/0x10
> >  [<ffffffff804a4229>] down_write+0x9/0x10
> >  [<ffffffffa008deb6>] xfs_ilock+0x76/0x90 [xfs]
> >  [<ffffffffa00aa7d0>] xfs_lock_two_inodes+0x70/0x120 [xfs]
> >  [<ffffffffa00ac651>] xfs_remove+0x141/0x3a0 [xfs]
> >  [<ffffffff804a54c9>] ? _spin_lock+0x9/0x10
> >  [<ffffffffa00b7c13>] xfs_setup_inode+0x673/0xa00 [xfs]
> >  [<ffffffff802d0849>] vfs_unlink+0xf9/0x140
> >  [<ffffffff802d3313>] do_unlinkat+0x1a3/0x1c0
> >  [<ffffffff80287ce0>] ? audit_syscall_entry+0x150/0x180
> >  [<ffffffff802d3341>] sys_unlink+0x11/0x20
> >  [<ffffffff8020c5aa>] system_call_fastpath+0x16/0x1b
>
> Can you enable lockdep in your kernel and retest? That will give
> us much more information about the locks that are causing problems
> here....

With some debugging options (including lockdep) enabled:

[  755.172243] SysRq : Show Blocked State
[  755.172265]   task                PC stack   pid father
[  755.172298] patch         D ef59de3c     0  3539   3533
[  755.172308]        c2f47520 00000086 00000002 ef59de3c ef59de44 00000000 ef4b4920 0291f000
[  755.172324]        00000046 00000010 c2e24100 c0504040 ef59de44 ef59de40 ef59de3c ef59c000
[  755.172339]        ef4b4920 ef4b4aa8 00000000 00021568 00000001 ef4b4920 00000000 00000000
[  755.172354] Call Trace:
[  755.172359]  [<c014bc6a>] trace_hardirqs_on_caller+0xfa/0x130
[  755.172371]  [<c0392a4d>] schedule_timeout+0x8d/0xf0
[  755.172379]  [<c010910f>] native_sched_clock+0x7f/0xb0
[  755.172386]  [<c01315c0>] process_timeout+0x0/0x10
[  755.172394]  [<c0392a48>] schedule_timeout+0x88/0xf0
[  755.172411]  [<f88975db>] xfs_lock_two_inodes+0xcb/0x120 [xfs]
[  755.172451]  [<f8899526>] xfs_remove+0x136/0x3c0 [xfs]
[  755.172480]  [<c0393357>] mutex_lock_nested+0x1f7/0x290
[  755.172486]  [<c01ad067>] vfs_unlink+0x87/0x130
[  755.172494]  [<c01ad067>] vfs_unlink+0x87/0x130
[  755.172502]  [<f88a4ac6>] xfs_vn_unlink+0x36/0x80 [xfs]
[  755.172533]  [<c01ad0bd>] vfs_unlink+0xdd/0x130
[  755.172540]  [<c0394a44>] _spin_unlock+0x14/0x20
[  755.172546]  [<c01af13e>] do_unlinkat+0x14e/0x160
[  755.172552]  [<c014bc6a>] trace_hardirqs_on_caller+0xfa/0x130
[  755.172558]  [<c03949c0>] _spin_unlock_irq+0x20/0x30
[  755.172564]  [<c02400d4>] copy_to_user+0x34/0x80
[  755.172570]  [<c023fdbc>] trace_hardirqs_on_thunk+0xc/0x10
[  755.172576]  [<c03970b0>] do_page_fault+0x0/0x780
[  755.172583]  [<c014bc6a>] trace_hardirqs_on_caller+0xfa/0x130
[  755.172589]  [<c0103cbd>] sysenter_do_call+0x12/0x31

[arekm@farm ~]$ zgrep LOCKDEP /proc/config.gz
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
# CONFIG_DEBUG_LOCKDEP is not set
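
For anyone repeating the check above, here is a minimal Python sketch (not part of the original report) that reads /proc/config.gz and lists the LOCKDEP-related options, much like the zgrep does. It assumes the kernel was built with CONFIG_IKCONFIG_PROC so that /proc/config.gz exists. The helper names parse_kconfig, lockdep_options and read_config are hypothetical, and recording "is not set" options as "n" is this sketch's own convention.

```python
import gzip


def parse_kconfig(text):
    """Parse kernel .config text into a {option: value} dict.

    'CONFIG_FOO=y'            -> {'CONFIG_FOO': 'y'}
    '# CONFIG_FOO is not set' -> {'CONFIG_FOO': 'n'}  (this sketch's convention)
    Other comment lines and blank lines are ignored.
    """
    suffix = " is not set"
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Strip the leading "# " and the trailing " is not set".
            opts[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            opts[name] = value
    return opts


def lockdep_options(opts):
    """Keep only options whose name contains LOCKDEP (same filter as the zgrep)."""
    return {name: value for name, value in opts.items() if "LOCKDEP" in name}


def read_config(path="/proc/config.gz"):
    """Read and parse the gzipped running-kernel config (needs CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as f:
        return parse_kconfig(f.read())
```

On the machine above, printing sorted(lockdep_options(read_config()).items()) should show the same three options as the zgrep output.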

I don't see any lockdep report in dmesg, so lockdep doesn't seem to have been
triggered.

The D-state hang also happens if I drop usrquota,prjquota from the mount
options, reboot and retry the test. I assume something was written to disk
that triggers the problem.

Note that I'm now testing on a second machine (UP i686, SMP kernel), so this
problem isn't unique to one machine.
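
To spot tasks stuck in D state without SysRq, a rough Python sketch (not part of the original report) can scan /proc/<pid>/stat, where the field after the parenthesised command name is the task state and "D" means uninterruptible sleep. It assumes the standard /proc layout, only looks at process-level entries (threads under /proc/<pid>/task are not scanned), and reports names only, not kernel stacks like SysRq-w does. The function names parse_stat and blocked_tasks are hypothetical.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' instead of on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm) for every process currently in 'D' state."""
    for entry in sorted(os.listdir(proc)):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited (or is unreadable) while scanning
        if state == "D":
            yield pid, comm
```

Running list(blocked_tasks()) while the test is hung should list the stuck patch process.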

> Cheers,
>
> Dave.

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
