http://bugzilla.kernel.org/show_bug.cgi?id=8414
Summary: soft lockup on XFS write (by nfsd)
Kernel Version: 2.6.21.1
Status: NEW
Severity: high
Owner: xfs-masters@xxxxxxxxxxx
Submitter: dap@xxxxxxxxxxxxx
Most recent kernel where this bug did *NOT* occur: 2.6.20.3
Distribution: Debian Etch
Hardware Environment: Pentium D SMP, 4G RAM, x86_64
Software Environment: NFSv3 served over UDP from XFS on LVM, which sits on software
RAID5 (built from 16 disks, 128k chunk)
Problem Description:
I upgraded the kernel on my NFS server from 2.6.20.3 to 2.6.21.1, and one day
later the box stopped serving NFS with the following message:
BUG: soft lockup detected on CPU#1!
Call Trace:
<IRQ> [<ffffffff802acdea>] softlockup_tick+0xfa/0x120
[<ffffffff802122af>] __do_softirq+0x5f/0xd0
[<ffffffff80295877>] update_process_times+0x57/0x90
[<ffffffff8027a314>] smp_local_timer_interrupt+0x34/0x60
[<ffffffff8027a8ab>] smp_apic_timer_interrupt+0x4b/0x80
[<ffffffff80263696>] apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff881466b0>] :xfs:xfs_iext_idx_to_irec+0x100/0x110
[<ffffffff88146703>] :xfs:xfs_iext_get_ext+0x43/0x70
[<ffffffff88146d68>] :xfs:xfs_iext_bno_to_ext+0x138/0x160
[<ffffffff881248f5>] :xfs:xfs_bmap_search_multi_extents+0x75/0x120
[<ffffffff88124a13>] :xfs:xfs_bmap_search_extents+0x73/0x120
[<ffffffff8812a258>] :xfs:xfs_bmapi+0x2b8/0x1150
[<ffffffff8023c966>] unmap_underlying_metadata+0x6/0x40
[<ffffffff80217b3b>] __block_commit_write+0x7b/0xc0
[<ffffffff88170c22>] :xfs:xfs_zero_eof+0xd2/0x190
[<ffffffff88171a96>] :xfs:xfs_write+0x576/0xb70
[<ffffffff8026aac9>] _write_lock_bh+0x9/0x20
[<ffffffff881fbacd>] :nf_conntrack:__nf_ct_refresh_acct+0x3d/0x120
[<ffffffff8814643d>] :xfs:xfs_iget+0x12d/0x170
[<ffffffff802236d1>] __up_read+0x21/0xb0
[<ffffffff8815cd1c>] :xfs:xfs_trans_unlocked_item+0x2c/0x60
[<ffffffff8816d460>] :xfs:xfs_file_aio_write+0x0/0x60
[<ffffffff8816d4ba>] :xfs:xfs_file_aio_write+0x5a/0x60
[<ffffffff802c9273>] do_sync_readv_writev+0xc3/0x110
[<ffffffff8821c46f>] :exportfs:find_exported_dentry+0x8f/0x530
[<ffffffff8029f390>] autoremove_wake_function+0x0/0x30
[<ffffffff804ed14e>] sunrpc_cache_lookup+0x7e/0x160
[<ffffffff802c910a>] rw_copy_check_uvector+0x8a/0x130
[<ffffffff802c97a3>] do_readv_writev+0x113/0x230
[<ffffffff8816cc94>] :xfs:xfs_fs_decode_fh+0xd4/0xe0
[<ffffffff882232d3>] :nfsd:nfsd_vfs_write+0x113/0x350
[<ffffffff8021f945>] __dentry_open+0x115/0x1e0
[<ffffffff882239e7>] :nfsd:nfsd_open+0x157/0x1c0
[<ffffffff88223cb9>] :nfsd:nfsd_write+0xd9/0x120
[<ffffffff8822aec0>] :nfsd:nfsd3_proc_write+0x110/0x150
[<ffffffff8821f25d>] :nfsd:nfsd_dispatch+0xfd/0x1f0
[<ffffffff804e4c54>] svc_process+0x3e4/0x730
[<ffffffff8026a432>] __down_read+0x12/0xa2
[<ffffffff8821f720>] :nfsd:nfsd+0x0/0x2e0
[<ffffffff8821f8c0>] :nfsd:nfsd+0x1a0/0x2e0
[<ffffffff80263878>] child_rip+0xa/0x12
[<ffffffff8821f720>] :nfsd:nfsd+0x0/0x2e0
[<ffffffff8821f720>] :nfsd:nfsd+0x0/0x2e0
[<ffffffff8026386e>] child_rip+0x0/0x12
The network graph shows a massive write at that time; I think that is what
triggered it.
The clients mount the export with the following options:
rw,nosuid,nodev,noexec,vers=3,rsize=32768,wsize=32768,acregmin=1800,acregmax=3600,acdirmin=1800,acdirmax=3600,soft,intr,proto=udp,timeo=14,retrans=2,sec=sys
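For reference, an equivalent client-side mount command would look roughly like the following (the server name and paths here are hypothetical; the option string is copied from the report above):

```shell
# Hypothetical host and mount point; options as captured from the clients.
mount -t nfs -o rw,nosuid,nodev,noexec,vers=3,rsize=32768,wsize=32768,\
acregmin=1800,acregmax=3600,acdirmin=1800,acdirmax=3600,soft,intr,\
proto=udp,timeo=14,retrans=2,sec=sys nfsserver:/export /mnt/export
```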
One of the nfsd processes is spinning at 99% CPU; the rest are in state D:
root 4817 3.1 0.0 0 0 ? R Apr29 78:46 [nfsd]
root 4818 1.0 0.0 0 0 ? D Apr29 25:28 [nfsd]
root 4819 1.0 0.0 0 0 ? D Apr29 25:40 [nfsd]
root 4820 1.0 0.0 0 0 ? D Apr29 25:31 [nfsd]
root 4821 1.0 0.0 0 0 ? D Apr29 25:34 [nfsd]
root 4822 1.0 0.0 0 0 ? D Apr29 25:40 [nfsd]
root 4823 1.0 0.0 0 0 ? D Apr29 25:32 [nfsd]
root 4824 1.0 0.0 0 0 ? D Apr29 25:36 [nfsd]
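The spinning thread can be picked out of a capture like the one above mechanically; a minimal sketch, assuming the `ps` column layout shown (column 8 is the process state, R = runnable/spinning, D = uninterruptible sleep):

```shell
# Captured ps lines from the report (abbreviated); column 8 is the state.
ps_capture='root 4817 3.1 0.0 0 0 ? R Apr29 78:46 [nfsd]
root 4818 1.0 0.0 0 0 ? D Apr29 25:28 [nfsd]
root 4819 1.0 0.0 0 0 ? D Apr29 25:40 [nfsd]'

# Print the PID of any task in state R (here, the nfsd burning CPU).
spinning=$(printf '%s\n' "$ps_capture" | awk '$8 == "R" { print $2 }')
echo "spinning nfsd pid: $spinning"
```

On a live system the same idea applies to `ps aux` output directly; dumping all task stacks with SysRq-t (`echo t > /proc/sysrq-trigger`, requires CONFIG_MAGIC_SYSRQ) would show where the D-state threads are blocked.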
Steps to reproduce:
I don't know exactly; it happened on a production system, so I cannot
experiment with it. I have downgraded it to 2.6.20.3 for now. I used default
settings in /proc and /sys.