Resending, because my original mail seems to have been lost.
Utz Lehmann [leh@xxxxxxxxxx] wrote:
> Hi
>
> There was no pagebuf_iostart function in the backtrace, so I used
> pagebuf_iorequest; I hope that's right.
>
> utz
>
>
> Entering kdb (current=0xc035a000, pid 0) due to Keyboard Entry
> kdb> ps
> Task Addr Pid Parent [*] cpu State Thread Command
> 0xc15fe000 00000001 00000000 0 000 stop 0xc15fe260 init
> 0xc15f0000 00000002 00000001 0 000 stop 0xc15f0260 keventd
> 0xc15ec000 00000003 00000001 0 000 stop 0xc15ec260 kswapd
> 0xc15ea000 00000004 00000001 0 000 stop 0xc15ea260 kreclaimd
> 0xc15e8000 00000005 00000001 0 000 stop 0xc15e8260 bdflush
> 0xc15e6000 00000006 00000001 0 000 stop 0xc15e6260 kupdate
> 0xc1580000 00000007 00000001 0 000 stop 0xc1580260 mdrecoveryd
> 0xc1562000 00000008 00000001 0 000 stop 0xc1562260 pagebuf_daemon
> 0xce516000 00001716 00000001 0 000 stop 0xce516260 rc
> 0xce2b4000 00002102 00001716 0 000 stop 0xce2b4260 S20reboot
> 0xcf818000 00002123 00002102 0 000 stop 0xcf818260 umount
> kdb> btp 2123
> EBP EIP Function(args)
> 0xcf819e58 0xc0112b18 schedule+0x2d8
> kernel .text 0xc0100000 0xc0112840 0xc0112c70
> 0xc015fb13 pagebuf_iorequest+0x103 (0xce3e5d80)
> kernel .text 0xc0100000 0xc015fa10 0xc015fba0
> 0xc01c7c47 xfs_bdstrat_cb+0x27 (0xce3e5d80)
> kernel .text 0xc0100000 0xc01c7c20 0xc01c7c70
> 0xc0160763 pagebuf_delwri_flush+0xd3 (0xcf7e3ac0, 0x1, 0xcf819ec8)
> kernel .text 0xc0100000 0xc0160690 0xc0160880
> 0xc01c7d2d XFS_bflush+0x1d (0xcf7e3ac0, 0x3a01)
> kernel .text 0xc0100000 0xc01c7d10 0xc01c7d40
> 0xc01b6873 xfs_unmount+0xd3 (0xcf7d9400, 0x0, 0xc03b16a0)
> kernel .text 0xc0100000 0xc01b67a0 0xc01b6920
> 0xc01c189a fs_dounmount+0x5a (0xcf7d9400, 0x0, 0x0, 0xc03b16a0,
> 0xcf7e3dc4)
> kernel .text 0xc0100000 0xc01c1840 0xc01c18c0
> 0xc01c8d38 linvfs_put_super+0x58 (0xcf8b6800)
> kernel .text 0xc0100000 0xc01c8ce0 0xc01c8db0
> 0xc0136867 kill_super+0x87 (0xcf8b6800, 0x0, 0xc156cf40,
> 0xffffffff, 0xcfb73ac0)
> kernel .text 0xc0100000 0xc01367e0 0xc0136920
> 0xc0136c71 do_umount+0x1c1 (0xc156cf40, 0x0, 0x0)
> kernel .text 0xc0100000 0xc0136ab0 0xc0136c80
> 0xc0136d46 sys_umount+0xc6 (0x8052430, 0x0)
> more>
> kernel .text 0xc0100000 0xc0136c80 0xc0136d80
> 0xc0136d8c sys_oldumount+0xc (0x8052430, 0x804ee27, 0x8052478,
> 0x8052431, 0x804ee20)
> kernel .text 0xc0100000 0xc0136d80 0xc0136d90
> 0xc0108f77 system_call+0x33
> kernel .text 0xc0100000 0xc0108f44 0xc0108f7c
> kdb> pb 0xce3e5d80
> page_buf_t at 0xce3e5d80
> pb_flags WRITE MAPPED MAPPABLE LOCK LOCKABLE ALL_PAGES_MAPPED MEM_ALLOCATED
> pb_target 0xcf7e3ac0 pb_hold 1 pb_next 0xcf7d49c0 pb_prev 0xc144a848
> pb_file_offset 0x18000200 pb_buffer_length 0x200 pb_addr 0xcf600200
> pb_bn 0xc0001 pb_count_desired 0x200
> pb_io_remaining 0 pb_error 0 pb_mem 0xcf5fe180
> pb_iodonesema (0,0) pb_sema (0,0) pincount (1)
> pb_fspriv 0xcff2f648 pb_fspriv2 0x00000000
> kdb> reboot
>
>
>
> Russell Cattelan [cattelan@xxxxxxxxxxx] wrote:
> > Utz Lehmann wrote:
> >
> > Well drat... seems to be a real problem here.
> >
> > Not sure we are going to be able to debug this remotely; hopefully we will
> > be able to reproduce the problem here.
> > But if you could extract a bit more info....
> >
> > Load the modules kdbm_pg and xfsidbg.
> > Once the system is hung, break into kdb, backtrace
> > the umount command, then grab the first address shown for
> > the pagebuf_iostart function and feed it to kdb:
> > kdb> pb <address>
> >
> > We are looking for the pincount and the lock status.
> > It would seem that pagebuf_delwri_flush is somehow waiting for
> > the pagebuf to get unpinned, which should have already happened
> > before this point.
> > My guess is that a pagebuf is somehow getting re-inserted into the delwri
> > queue while the flush code is trying to push everything out.
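
For anyone comparing several of these dumps: the two fields asked for above
(pincount and lock status) can be pulled out of the "pb" output with a few
lines of Python. This is only a minimal sketch. The helper name summarize_pb
is made up for illustration, the sample text is copied from the dump quoted
in this mail, and the script reports the raw values without trying to
interpret what the semaphore numbers mean.

```python
import re

# Lines copied from the "kdb> pb 0xce3e5d80" output quoted in this thread.
PB_DUMP = """\
page_buf_t at 0xce3e5d80
pb_flags WRITE MAPPED MAPPABLE LOCK LOCKABLE ALL_PAGES_MAPPED MEM_ALLOCATED
pb_io_remaining 0 pb_error 0 pb_mem 0xcf5fe180
pb_iodonesema (0,0) pb_sema (0,0) pincount (1)
"""


def summarize_pb(dump):
    """Hypothetical helper: extract flags, pb_sema and pincount as raw values."""
    # Allow optional "> " mail-quote prefixes in front of the flags line.
    flags = re.search(r"^(?:>\s*)*pb_flags\s+(.*)$", dump, re.M)
    sema = re.search(r"pb_sema \((-?\d+),(-?\d+)\)", dump)
    pin = re.search(r"pincount \((\d+)\)", dump)
    return {
        "flags": flags.group(1).split() if flags else [],
        "pb_sema": tuple(int(v) for v in sema.groups()) if sema else None,
        "pincount": int(pin.group(1)) if pin else None,
    }


if __name__ == "__main__":
    info = summarize_pb(PB_DUMP)
    print("LOCK flag set:", "LOCK" in info["flags"])
    print("pb_sema:", info["pb_sema"])
    print("pincount:", info["pincount"])
```

For the dump above this shows the LOCK flag set and a pincount of 1, i.e. the
buffer is still pinned while umount sits in pagebuf_iorequest.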