Utz Lehmann wrote:
Well drat... there seems to be a real problem here.
Not sure we are going to be able to debug this remotely; hopefully we will be
able to reproduce the problem here.
But if you could extract a bit more info....
Load the modules kdbm_pg and xfsidbg.
Once the system is hung, break into kdb and backtrace
the umount command, then grab the first address in
the pagebuf_iostart function and feed it to kdb:
kdb> pb <address>
We are looking for the pincount and the lock status.
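
A minimal sketch of pulling those two values out of a saved kdb console log, in case it helps when repeating the test. It assumes the `ps` and `btp` output look like the capture quoted below, that the pid is the second column of `ps`, and that the first argument kdb prints for the frame is the pagebuf address. The helper names `find_pid` and `first_arg` are made up for this illustration, and the sample frame uses pagebuf_iorequest because that is the function that shows up in the quoted trace.

```python
import re

# Sample lines copied from the kdb capture quoted later in this thread.
PS_OUTPUT = """\
> 0xc1562000 00000008 00000001 0 000 stop 0xc1562260 pagebuf_daemon
> 0xce268000 00001520 00001130 0 000 stop 0xce268260 S20reboot
> 0xcf83c000 00001541 00001520 0 000 stop 0xcf83c260 umount
"""

BTP_OUTPUT = """\
> 0xcf83de58 0xc0112b18 schedule+0x2d8
> 0xc015fb13 pagebuf_iorequest+0x103 (0xcf6ca480)
> 0xc01c7c47 xfs_bdstrat_cb+0x27 (0xcf6ca480)
> 0xc0160763 pagebuf_delwri_flush+0xd3 (0xcf7b5ac0, 0x1, 0xcf83dec8)
"""


def find_pid(ps_output, command):
    """Return the pid of the first task whose command matches, or None."""
    for line in ps_output.splitlines():
        # Drop mail quoting ("> ") before splitting into columns.
        fields = line.lstrip("> ").split()
        # Assumed columns: task addr, pid, parent, [*], cpu, state, thread, command.
        if len(fields) == 8 and fields[-1] == command:
            return int(fields[1])
    return None


def first_arg(btp_output, func):
    """Return the first argument printed for func in a btp trace, or None."""
    pattern = re.compile(
        r"\b" + re.escape(func) + r"\+0x[0-9a-fA-F]+\s*\((0x[0-9a-fA-F]+)"
    )
    match = pattern.search(btp_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("btp", find_pid(PS_OUTPUT, "umount"))               # btp 1541
    print("pb", first_arg(BTP_OUTPUT, "pagebuf_iorequest"))   # pb 0xcf6ca480
```

The printed lines are the commands to type at the kdb prompt; nothing here talks to kdb itself.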
It would seem that somehow pagebuf_delwri_flush is waiting for
the pagebuf to get unpinned, which should have already happened
before this point.
My guess is that somehow a pagebuf is getting re-inserted into the delwri queue
while the flush code is trying to push everything out.
> Russell Cattelan [cattelan@xxxxxxxxxxx] wrote:
> > Utz Lehmann wrote:
> >
> > > Russell Cattelan [cattelan@xxxxxxxxxxx] wrote:
> > >
> >
> > OK, sorry, this was my fault; I should have explained this better.
> > We need the backtrace of the umount process.
> > To do this, simply type ps at the kdb prompt, find the umount pid,
> > then type
> > kdb> btp <umount pid>
>
> No problem:
>
> Unmounting file systems
> shmfs umounted
> /dev/sda2 umounted
> /dev/vg00/tmp umounted
> /dev/vg00/opt umounted
>
> Entering kdb (current=0xc035a000, pid 0) due to Keyboard Entry
> kdb> ps
> Task Addr  Pid      Parent   [*] cpu State Thread     Command
> 0xc15fe000 00000001 00000000  0  000 stop  0xc15fe260 init
> 0xc15f0000 00000002 00000001  0  000 stop  0xc15f0260 keventd
> 0xc15ec000 00000003 00000001  0  000 stop  0xc15ec260 kswapd
> 0xc15ea000 00000004 00000001  0  000 stop  0xc15ea260 kreclaimd
> 0xc15e8000 00000005 00000001  0  000 stop  0xc15e8260 bdflush
> 0xc15e6000 00000006 00000001  0  000 stop  0xc15e6260 kupdate
> 0xc1580000 00000007 00000001  0  000 stop  0xc1580260 mdrecoveryd
> 0xc1562000 00000008 00000001  0  000 stop  0xc1562260 pagebuf_daemon
> 0xce0b8000 00001130 00000001  0  000 stop  0xce0b8260 rc
> 0xce268000 00001520 00001130  0  000 stop  0xce268260 S20reboot
> 0xcf83c000 00001541 00001520  0  000 stop  0xcf83c260 umount
> kdb> btp 1541
> EBP EIP Function(args)
> 0xcf83de58 0xc0112b18 schedule+0x2d8
> kernel .text 0xc0100000 0xc0112840 0xc0112c70
> 0xc015fb13 pagebuf_iorequest+0x103 (0xcf6ca480)
> kernel .text 0xc0100000 0xc015fa10 0xc015fba0
> 0xc01c7c47 xfs_bdstrat_cb+0x27 (0xcf6ca480)
> kernel .text 0xc0100000 0xc01c7c20 0xc01c7c70
> 0xc0160763 pagebuf_delwri_flush+0xd3 (0xcf7b5ac0, 0x1, 0xcf83dec8)
> kernel .text 0xc0100000 0xc0160690 0xc0160880
> 0xc01c7d2d XFS_bflush+0x1d (0xcf7b5ac0, 0x3a01)
> kernel .text 0xc0100000 0xc01c7d10 0xc01c7d40
> 0xc01b6873 xfs_unmount+0xd3 (0xcf7aa400, 0x0, 0xc03b16a0)
> kernel .text 0xc0100000 0xc01b67a0 0xc01b6920
> 0xc01c189a fs_dounmount+0x5a (0xcf7aa400, 0x0, 0x0, 0xc03b16a0,
> 0xcf7b5dc4)
> kernel .text 0xc0100000 0xc01c1840 0xc01c18c0
> 0xc01c8d38 linvfs_put_super+0x58 (0xcf8eb800)
> kernel .text 0xc0100000 0xc01c8ce0 0xc01c8db0
> 0xc0136867 kill_super+0x87 (0xcf8eb800, 0x0, 0xc156cec0,
> 0xffffffff, 0xcfb29ac0)
> kernel .text 0xc0100000 0xc01367e0 0xc0136920
> 0xc0136c71 do_umount+0x1c1 (0xc156cec0, 0x0, 0x0)
> kernel .text 0xc0100000 0xc0136ab0 0xc0136c80
> 0xc0136d46 sys_umount+0xc6 (0x8052430, 0x0)
> more>
> kernel .text 0xc0100000 0xc0136c80 0xc0136d80
> 0xc0136d8c sys_oldumount+0xc (0x8052430, 0x804ee27, 0x8052478,
> 0x8052431, 0x804ee20)
> kernel .text 0xc0100000 0xc0136d80 0xc0136d90
> 0xc0108f77 system_call+0x33
> kernel .text 0xc0100000 0xc0108f44 0xc0108f7c
> kdb> reboot
>
> btw: the umount works in about 1 of 7 tries.
>
> utz
--
Russell Cattelan
--
Digital Elves inc. -- Currently on loan to SGI
Linux XFS core developer.