I am happy to report that after upgrading to 2.4.9-7SGI_XFS_PR3, this
bug no longer occurs. I can now complete bonnie++ runs on both synced
and unsynced MD arrays. I still don't know what caused the original
problem, but it seems to be fixed in the later XFS/Red Hat kernel.
Fingers crossed.
-tduffy
On Fri, 2001-10-19 at 18:53, Thomas Duffy wrote:
> I have an MD RAID 10 device attached to a 2-processor, 2 GB machine
> (running the enterprise version of the kernel), set up like this:
>
> <--------------------->  stripe (RAID 0)
>
> [ d1 | d2 | d3 | d4 ]
>    ^    ^    ^    ^
>    |    |    |    |      mirrors (RAID 1)
>
> [ d5 | d6 | d7 | d8 ]
>
> Both the RAID 1 and RAID 0 arrays use 64k chunks.
>
> When I run bonnie++ on this, the process hangs in D state (uninterruptible
> sleep) very soon, about 5 minutes into the test. It is, of course,
> unkillable and unhappy. Here is the btp output from kdb for the bonnie++
> process:
>
> 0xeaa45e10 0xc01180a1 schedule+0x485 (0xc1fc9a78)
>              kernel .text 0xc0100000 0xc0117c1c 0xc01182ac
> 0xc012d022 ___wait_on_page+0x66 (0xc1fc9a78, 0x0)
>              kernel .text 0xc0100000 0xc012cfbc 0xc012d06c
> 0xc012c471 truncate_list_pages+0xd5 (0xeaa45e94)
>              kernel .text 0xc0100000 0xc012c39c 0xc012c6ac
> 0xc012c719 truncate_inode_pages+0x6d (0xeafb7118, 0x0, 0x0)
>              kernel .text 0xc0100000 0xc012c6ac 0xc012c76c
> 0xc017116e pagebuf_inval+0x1a (0xeafb7060, 0x0, 0x0, 0x0)
>              kernel .text 0xc0100000 0xc0171154 0xc0171174
> 0xc01e3b61 fs_tosspages+0x29 (0xf5a51bb0, 0x0, 0x0, 0xffffffff, 0xffffffff)
>              kernel .text 0xc0100000 0xc01e3b38 0xc01e3b68
> 0xc01c36af xfs_itruncate_start+0x8f (0xf5a51b98, 0x1, 0x0, 0x0, 0xf5a51b98)
>              kernel .text 0xc0100000 0xc01c3620 0xc01c36b8
> 0xc01dcac9 xfs_inactive+0x1b9 (0xf5a51bb0, 0x0)
>              kernel .text 0xc0100000 0xc01dc910 0xc01dcd90
> 0xc01ec7bf vn_put+0x4b (0xeafb7184)
>              kernel .text 0xc0100000 0xc01ec774 0xc01ec838
> 0xc01eb9ab linvfs_put_inode+0x17 (0xeafb7060)
>              kernel .text 0xc0100000 0xc01eb994 0xc01eb9b0
> 0xc0154f2d iput_free+0x2d (0xeafb7060)
>              kernel .text 0xc0100000 0xc0154f00 0xc015510c
> 0xc0152dc6 d_delete+0x62 (0xeac42560)
>              kernel .text 0xc0100000 0xc0152d64 0xc0152e04
> 0xc014b81d vfs_unlink+0x1e9 (0xecc287a0, 0xeac42560)
>              kernel .text 0xc0100000 0xc014b634 0xc014b854
> 0xc014b8fa sys_unlink+0xa6 (0x80529b0, 0x0, 0xbffff4e0, 0x3, 0x80529b0)
>              kernel .text 0xc0100000 0xc014b854 0xc014b978
> 0xc0107073 system_call+0x33
>              kernel .text 0xc0100000 0xc0107040 0xc0107078
>
>
> The system is still usable, but anything that tries to access that array
> hangs. Is there any other information I can gather to help debug this?
>
> -tduffy
>