
[xfs-masters] [Bug 8414] soft lockup on XFS write (by nfsd)

To: xfs-masters@xxxxxxxxxxx
Subject: [xfs-masters] [Bug 8414] soft lockup on XFS write (by nfsd)
From: bugme-daemon@xxxxxxxxxxxxxxxxxxx
Date: Tue, 1 May 2007 15:40:38 -0700
Reply-to: xfs-masters@xxxxxxxxxxx
Sender: xfs-masters-bounce@xxxxxxxxxxx
http://bugzilla.kernel.org/show_bug.cgi?id=8414

------- Additional Comments From dap@xxxxxxxxxxxxx  2007-05-01 15:40 -------
I must correct myself, and I have some more info on this issue.

First, I did not upgrade from 2.6.20.3, I upgraded from 2.6.15.7. I have 3 NFS 
servers; 2 of them have been running 2.6.20.3 for a long time without problems. 
Yesterday, I downgraded the third box (called Sulaco) from 2.6.21.1 to 2.6.20.3 
and believed the reported problem was gone, but I was wrong. The same error 
occurred with 2.6.20.3 on Sulaco after 1 day! I now also have a kernel message 
that points to filesystem corruption (I think) just before the lockup.

May  1 23:10:45 Sulaco kernel: Filesystem "dm-6": Access to block zero in inode 2428580890 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: f0
May  1 23:10:45 Sulaco kernel: Filesystem "dm-6": Access to block zero in inode 2428580890 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: fd
May  1 23:10:45 Sulaco kernel: Filesystem "dm-6": Access to block zero in inode 2428580890 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: fe
May  1 23:10:45 Sulaco kernel: Filesystem "dm-6": Access to block zero in inode 2428580890 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: fe
May  1 23:10:45 Sulaco kernel: Filesystem "dm-6": Access to block zero in inode 2428580890 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: fe

and then:

May  1 23:10:55 Sulaco kernel: Call Trace:
May  1 23:10:55 Sulaco kernel:  <IRQ>  [softlockup_tick+233/272] softlockup_tick+0xe9/0x110
May  1 23:10:55 Sulaco kernel:  [update_process_times+80/144] update_process_times+0x50/0x90
May  1 23:10:55 Sulaco kernel:  [smp_local_timer_interrupt+52/96] smp_local_timer_interrupt+0x34/0x60
May  1 23:10:55 Sulaco kernel:  [smp_apic_timer_interrupt+89/128] smp_apic_timer_interrupt+0x59/0x80
May  1 23:10:55 Sulaco kernel:  [apic_timer_interrupt+102/112] apic_timer_interrupt+0x66/0x70
May  1 23:10:55 Sulaco kernel:  <EOI>  [_end+127398072/2129470264] :xfs:xfs_iext_get_ext+0x0/0x70
May  1 23:10:55 Sulaco kernel:  [_end+127274467/2129470264] :xfs:xfs_bmapi+0xdeb/0x1030
May  1 23:10:55 Sulaco kernel:  [_end+127551052/2129470264] :xfs:xfs_zero_eof+0xe4/0x190
May  1 23:10:55 Sulaco kernel:  [_end+127552484/2129470264] :xfs:xfs_write+0x4ec/0x9e0
May  1 23:10:55 Sulaco kernel:  [wake_idle+26/176] wake_idle+0x1a/0xb0
May  1 23:10:55 Sulaco kernel:  [activate_task+145/240] activate_task+0x91/0xf0
May  1 23:10:55 Sulaco kernel:  [__up_read+117/176] __up_read+0x75/0xb0
May  1 23:10:55 Sulaco kernel:  [_end+127471524/2129470264] :xfs:xfs_trans_unlocked_item+0x2c/0x60
May  1 23:10:55 Sulaco kernel:  [_end+127534520/2129470264] :xfs:xfs_file_aio_write+0x0/0x70
May  1 23:10:55 Sulaco kernel:  [_end+127534617/2129470264] :xfs:xfs_file_aio_write+0x61/0x70
May  1 23:10:55 Sulaco kernel:  [do_sync_readv_writev+195/272] do_sync_readv_writev+0xc3/0x110
May  1 23:10:55 Sulaco kernel:  [_end+128189601/2129470264] :exportfs:find_exported_dentry+0x89/0x500
May  1 23:10:55 Sulaco kernel:  [dev_queue_xmit+585/608] dev_queue_xmit+0x249/0x260
May  1 23:10:55 Sulaco kernel:  [autoremove_wake_function+0/48] autoremove_wake_function+0x0/0x30
May  1 23:10:55 Sulaco kernel:  [sunrpc_cache_lookup+126/336] sunrpc_cache_lookup+0x7e/0x150
May  1 23:10:55 Sulaco kernel:  [rw_copy_check_uvector+112/256] rw_copy_check_uvector+0x70/0x100
May  1 23:10:55 Sulaco kernel:  [do_readv_writev+229/464] do_readv_writev+0xe5/0x1d0
May  1 23:10:55 Sulaco kernel:  [_end+127533722/2129470264] :xfs:xfs_fs_decode_fh+0xe2/0xf0
May  1 23:10:55 Sulaco kernel:  [_end+128239116/2129470264] :nfsd:nfsd_setuser+0x154/0x1c0
May  1 23:10:55 Sulaco kernel:  [_end+128213299/2129470264] :nfsd:fh_verify+0x50b/0x530
May  1 23:10:55 Sulaco kernel:  [_end+128220538/2129470264] :nfsd:nfsd_vfs_write+0xe2/0x2d0


When I rebooted the machine, I found the filesystem was corrupted! :(

attempt to access beyond end of device
dm-6: rw=1, want=0, limit=2097152000
Buffer I/O error on device dm-6, logical block 18446744073709551615
lost page write due to I/O error on dm-6
XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1590 of file 
fs/xfs/xfs_alloc.c.  Caller 0xffffffff8810b347

Call Trace:
 [<ffffffff8810a457>] :xfs:xfs_free_ag_extent+0x4b7/0x690
 [<ffffffff8810b347>] :xfs:xfs_free_extent+0xa7/0xd0
 [<ffffffff8811a4bb>] :xfs:xfs_bmap_finish+0xeb/0x170
 [<ffffffff88137e8d>] :xfs:xfs_itruncate_finish+0x1ad/0x2d0
 [<ffffffff8815245e>] :xfs:xfs_inactive+0x24e/0x480
 [<ffffffff88160695>] :xfs:xfs_fs_clear_inode+0xa5/0xf0
 [<ffffffff80291f9a>] clear_inode+0xca/0x130
 [<ffffffff80292f7f>] generic_delete_inode+0xdf/0x140
 [<ffffffff802931cd>] iput+0x6d/0x70
 [<ffffffff8028fc4f>] prune_one_dentry+0x6f/0xb0
 [<ffffffff8028fdd4>] prune_dcache+0x144/0x1b0
 [<ffffffff802903e9>] shrink_dcache_memory+0x19/0x50
 [<ffffffff8025f6a7>] shrink_slab+0x107/0x180
 [<ffffffff80260c72>] balance_pgdat+0x242/0x380
 [<ffffffff80243680>] keventd_create_kthread+0x0/0x90
 [<ffffffff80243680>] keventd_create_kthread+0x0/0x90
 [<ffffffff80260ecd>] kswapd+0x11d/0x120
 [<ffffffff80243c10>] autoremove_wake_function+0x0/0x30
 [<ffffffff80243c10>] autoremove_wake_function+0x0/0x30
 [<ffffffff80260db0>] kswapd+0x0/0x120
 [<ffffffff80260db0>] kswapd+0x0/0x120
 [<ffffffff80243642>] kthread+0x92/0xd0
 [<ffffffff8020a5b8>] child_rip+0xa/0x12
 [<ffffffff80243680>] keventd_create_kthread+0x0/0x90
 [<ffffffff802435b0>] kthread+0x0/0xd0
 [<ffffffff8020a5ae>] child_rip+0x0/0x12

xfs_force_shutdown(dm-6,0x8) called from line 4272 of file fs/xfs/xfs_bmap.c.  
Return address = 0xffffffff88161368
Filesystem "dm-6": Corruption of in-memory data detected.  Shutting down 
filesystem: dm-6
Please umount the filesystem, and rectify the problem(s)
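
One more observation: the "logical block 18446744073709551615" in the Buffer I/O 
error above is exactly 2^64 - 1, i.e. -1 stored in an unsigned 64-bit value, so 
it looks like an invalid/uninitialized block number rather than a real offset. 
A minimal userspace sketch of that arithmetic (not kernel code, just to show 
where the number comes from):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            /* -1 assigned to an unsigned 64-bit variable wraps to 2^64 - 1,
             * which prints as the value seen in the Buffer I/O error line. */
            uint64_t invalid_block = (uint64_t)-1;
            printf("%llu\n", (unsigned long long)invalid_block);
            /* prints: 18446744073709551615 */
            return 0;
    }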


What is going on here? I consider 2.6.20.3 stable for this job, as I have been 
using it on the other 2 NFS server boxes for a long time and their setup is 
very similar to this one. Maybe a hardware problem? Memory failure?? But then 
why is the call trace so similar?


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

