http://bugzilla.kernel.org/show_bug.cgi?id=8414
------- Additional Comments From dap@xxxxxxxxxxxxx 2007-05-01 15:40 -------
I must correct myself, and I have some more info on this issue.
First, I did not upgrade from 2.6.20.3, I upgraded from 2.6.15.7. I've 3 NFS
servers, 2 of them are running 2.6.20.3 for a long time without problems.
Yesterday, I downgraded the third box (called Sulaco) to 2.6.20.3 from
2.6.21.1 and I believed the reported problem has gone, but I was wrong. The
same error occurred with 2.6.20.3 on Sulaco after 1 day! I've got a kernel
message now that cares about filesystem corruption (I think) just before the
lockup.
May 1 23:10:45 Sulaco kernel: Filesystem "dm-6": Access to block zero in
inode 2428580890 star
t_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: f0
May 1 23:10:45 Sulaco kernel: Filesystem "dm-6": Access to block zero in
inode 2428580890 star
t_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: fd
May 1 23:10:45 Sulaco kernel: Filesystem "dm-6": Access to block zero in
inode 2428580890 star
t_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: fe
May 1 23:10:45 Sulaco kernel: Filesystem "dm-6": Access to block zero in
inode 2428580890 star
t_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: fe
May 1 23:10:45 Sulaco kernel: Filesystem "dm-6": Access to block zero in
inode 2428580890 star
t_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: fe
and then:
May 1 23:10:55 Sulaco kernel: Call Trace:
May 1 23:10:55 Sulaco kernel: <IRQ> [softlockup_tick+233/272]
softlockup_tick+0xe9/0x110
May 1 23:10:55 Sulaco kernel: [update_process_times+80/144]
update_process_times+0x50/0x90
May 1 23:10:55 Sulaco kernel: [smp_local_timer_interrupt+52/96]
smp_local_timer_interrupt+0x3
4/0x60
May 1 23:10:55 Sulaco kernel: [smp_apic_timer_interrupt+89/128]
smp_apic_timer_interrupt+0x59
/0x80
May 1 23:10:55 Sulaco kernel: [apic_timer_interrupt+102/112]
apic_timer_interrupt+0x66/0x70
May 1 23:10:55 Sulaco kernel: <EOI>
[_end+127398072/2129470264] :xfs:xfs_iext_get_ext+0x0/0x
70
May 1 23:10:55 Sulaco kernel:
[_end+127274467/2129470264] :xfs:xfs_bmapi+0xdeb/0x1030
May 1 23:10:55 Sulaco kernel:
[_end+127551052/2129470264] :xfs:xfs_zero_eof+0xe4/0x190
May 1 23:10:55 Sulaco kernel:
[_end+127552484/2129470264] :xfs:xfs_write+0x4ec/0x9e0
May 1 23:10:55 Sulaco kernel: [wake_idle+26/176] wake_idle+0x1a/0xb0
May 1 23:10:55 Sulaco kernel: [activate_task+145/240]
activate_task+0x91/0xf0
May 1 23:10:55 Sulaco kernel: [__up_read+117/176] __up_read+0x75/0xb0
May 1 23:10:55 Sulaco kernel:
[_end+127471524/2129470264] :xfs:xfs_trans_unlocked_item+0x2c/0
x60
May 1 23:10:55 Sulaco kernel:
[_end+127534520/2129470264] :xfs:xfs_file_aio_write+0x0/0x70
May 1 23:10:55 Sulaco kernel:
[_end+127534617/2129470264] :xfs:xfs_file_aio_write+0x61/0x70
May 1 23:10:55 Sulaco kernel: [do_sync_readv_writev+195/272]
do_sync_readv_writev+0xc3/0x110
May 1 23:10:55 Sulaco kernel:
[_end+128189601/2129470264] :exportfs:find_exported_dentry+0x89
/0x500
May 1 23:10:55 Sulaco kernel: [dev_queue_xmit+585/608]
dev_queue_xmit+0x249/0x260
May 1 23:10:55 Sulaco kernel: [autoremove_wake_function+0/48]
autoremove_wake_function+0x0/0x
30
May 1 23:10:55 Sulaco kernel: [sunrpc_cache_lookup+126/336]
sunrpc_cache_lookup+0x7e/0x150
May 1 23:10:55 Sulaco kernel: [rw_copy_check_uvector+112/256]
rw_copy_check_uvector+0x70/0x10
0
May 1 23:10:55 Sulaco kernel: [do_readv_writev+229/464]
do_readv_writev+0xe5/0x1d0
May 1 23:10:55 Sulaco kernel:
[_end+127533722/2129470264] :xfs:xfs_fs_decode_fh+0xe2/0xf0
May 1 23:10:55 Sulaco kernel:
[_end+128239116/2129470264] :nfsd:nfsd_setuser+0x154/0x1c0
May 1 23:10:55 Sulaco kernel:
[_end+128213299/2129470264] :nfsd:fh_verify+0x50b/0x530
May 1 23:10:55 Sulaco kernel:
[_end+128220538/2129470264] :nfsd:nfsd_vfs_write+0xe2/0x2d0
When I rebooted the machine, I've got a corrupted filesystem! :(
attempt to access beyond end of device
dm-6: rw=1, want=0, limit=2097152000
Buffer I/O error on device dm-6, logical block 18446744073709551615
lost page write due to I/O error on dm-6
XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1590 of file
fs/xfs/xfs_alloc.c. Caller 0xffffffff8810b347
Call Trace:
[<ffffffff8810a457>] :xfs:xfs_free_ag_extent+0x4b7/0x690
[<ffffffff8810b347>] :xfs:xfs_free_extent+0xa7/0xd0
[<ffffffff8811a4bb>] :xfs:xfs_bmap_finish+0xeb/0x170
[<ffffffff88137e8d>] :xfs:xfs_itruncate_finish+0x1ad/0x2d0
[<ffffffff8815245e>] :xfs:xfs_inactive+0x24e/0x480
[<ffffffff88160695>] :xfs:xfs_fs_clear_inode+0xa5/0xf0
[<ffffffff80291f9a>] clear_inode+0xca/0x130
[<ffffffff80292f7f>] generic_delete_inode+0xdf/0x140
[<ffffffff802931cd>] iput+0x6d/0x70
[<ffffffff8028fc4f>] prune_one_dentry+0x6f/0xb0
[<ffffffff8028fdd4>] prune_dcache+0x144/0x1b0
[<ffffffff802903e9>] shrink_dcache_memory+0x19/0x50
[<ffffffff8025f6a7>] shrink_slab+0x107/0x180
[<ffffffff80260c72>] balance_pgdat+0x242/0x380
[<ffffffff80243680>] keventd_create_kthread+0x0/0x90
[<ffffffff80243680>] keventd_create_kthread+0x0/0x90
[<ffffffff80260ecd>] kswapd+0x11d/0x120
[<ffffffff80243c10>] autoremove_wake_function+0x0/0x30
[<ffffffff80243c10>] autoremove_wake_function+0x0/0x30
[<ffffffff80260db0>] kswapd+0x0/0x120
[<ffffffff80260db0>] kswapd+0x0/0x120
[<ffffffff80243642>] kthread+0x92/0xd0
[<ffffffff8020a5b8>] child_rip+0xa/0x12
[<ffffffff80243680>] keventd_create_kthread+0x0/0x90
[<ffffffff802435b0>] kthread+0x0/0xd0
[<ffffffff8020a5ae>] child_rip+0x0/0x12
xfs_force_shutdown(dm-6,0x8) called from line 4272 of file fs/xfs/xfs_bmap.c.
Return address = 0xffffffff88161368
Filesystem "dm-6": Corruption of in-memory data detected. Shutting down
filesystem: dm-6
Please umount the filesystem, and rectify the problem(s)
What going on here? I consider 2.6.20.3 stable for this job as I use it on the
another 2 NFS server box for a long time and the setup of this boxes are very
similar. Maybe hardware problem? Memory failure?? But why the call trace is so
similar?
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
|