[xfs-masters] [Bug 8414] New: soft lockup on XFS write (by nfsd)

To: xfs-masters@xxxxxxxxxxx
Subject: [xfs-masters] [Bug 8414] New: soft lockup on XFS write (by nfsd)
From: bugme-daemon@xxxxxxxxxxxxxxxxxxx
Date: Tue, 1 May 2007 04:21:48 -0700
Reply-to: xfs-masters@xxxxxxxxxxx
Sender: xfs-masters-bounce@xxxxxxxxxxx
http://bugzilla.kernel.org/show_bug.cgi?id=8414

           Summary: soft lockup on XFS write (by nfsd)
    Kernel Version: 2.6.21.1
            Status: NEW
          Severity: high
             Owner: xfs-masters@xxxxxxxxxxx
         Submitter: dap@xxxxxxxxxxxxx


Most recent kernel where this bug did *NOT* occur: 2.6.20.3
Distribution: Debian Etch
Hardware Environment: Pentium D SMP, 4G RAM, x86_64
Software Environment: NFSv3 serving via UDP from XFS on LVM that's on software 
RAID5 (built from 16 disks, 128k chunk)
Problem Description:
I upgraded the kernel on my NFS server from 2.6.20.3 to 2.6.21.1, and one day 
later the server stopped serving NFS with the following message:

BUG: soft lockup detected on CPU#1!

Call Trace:
 <IRQ>  [<ffffffff802acdea>] softlockup_tick+0xfa/0x120
 [<ffffffff802122af>] __do_softirq+0x5f/0xd0
 [<ffffffff80295877>] update_process_times+0x57/0x90
 [<ffffffff8027a314>] smp_local_timer_interrupt+0x34/0x60
 [<ffffffff8027a8ab>] smp_apic_timer_interrupt+0x4b/0x80
 [<ffffffff80263696>] apic_timer_interrupt+0x66/0x70
 <EOI>  [<ffffffff881466b0>] :xfs:xfs_iext_idx_to_irec+0x100/0x110
 [<ffffffff88146703>] :xfs:xfs_iext_get_ext+0x43/0x70
 [<ffffffff88146d68>] :xfs:xfs_iext_bno_to_ext+0x138/0x160
 [<ffffffff881248f5>] :xfs:xfs_bmap_search_multi_extents+0x75/0x120
 [<ffffffff88124a13>] :xfs:xfs_bmap_search_extents+0x73/0x120
 [<ffffffff8812a258>] :xfs:xfs_bmapi+0x2b8/0x1150
 [<ffffffff8023c966>] unmap_underlying_metadata+0x6/0x40
 [<ffffffff80217b3b>] __block_commit_write+0x7b/0xc0
 [<ffffffff88170c22>] :xfs:xfs_zero_eof+0xd2/0x190
 [<ffffffff88171a96>] :xfs:xfs_write+0x576/0xb70
 [<ffffffff8026aac9>] _write_lock_bh+0x9/0x20
 [<ffffffff881fbacd>] :nf_conntrack:__nf_ct_refresh_acct+0x3d/0x120
 [<ffffffff8814643d>] :xfs:xfs_iget+0x12d/0x170
 [<ffffffff802236d1>] __up_read+0x21/0xb0
 [<ffffffff8815cd1c>] :xfs:xfs_trans_unlocked_item+0x2c/0x60
 [<ffffffff8816d460>] :xfs:xfs_file_aio_write+0x0/0x60
 [<ffffffff8816d4ba>] :xfs:xfs_file_aio_write+0x5a/0x60
 [<ffffffff802c9273>] do_sync_readv_writev+0xc3/0x110
 [<ffffffff8821c46f>] :exportfs:find_exported_dentry+0x8f/0x530
 [<ffffffff8029f390>] autoremove_wake_function+0x0/0x30
 [<ffffffff804ed14e>] sunrpc_cache_lookup+0x7e/0x160
 [<ffffffff802c910a>] rw_copy_check_uvector+0x8a/0x130
 [<ffffffff802c97a3>] do_readv_writev+0x113/0x230
 [<ffffffff8816cc94>] :xfs:xfs_fs_decode_fh+0xd4/0xe0
 [<ffffffff882232d3>] :nfsd:nfsd_vfs_write+0x113/0x350
 [<ffffffff8021f945>] __dentry_open+0x115/0x1e0
 [<ffffffff882239e7>] :nfsd:nfsd_open+0x157/0x1c0
 [<ffffffff88223cb9>] :nfsd:nfsd_write+0xd9/0x120
 [<ffffffff8822aec0>] :nfsd:nfsd3_proc_write+0x110/0x150
 [<ffffffff8821f25d>] :nfsd:nfsd_dispatch+0xfd/0x1f0
 [<ffffffff804e4c54>] svc_process+0x3e4/0x730
 [<ffffffff8026a432>] __down_read+0x12/0xa2
 [<ffffffff8821f720>] :nfsd:nfsd+0x0/0x2e0
 [<ffffffff8821f8c0>] :nfsd:nfsd+0x1a0/0x2e0
 [<ffffffff80263878>] child_rip+0xa/0x12
 [<ffffffff8821f720>] :nfsd:nfsd+0x0/0x2e0
 [<ffffffff8821f720>] :nfsd:nfsd+0x0/0x2e0
 [<ffffffff8026386e>] child_rip+0x0/0x12


The network graph shows a massive write at that time; I think that was the 
trigger.

The clients mount the NFS export with these options: 
rw,nosuid,nodev,noexec,vers=3,rsize=32768,wsize=32768,acregmin=1800,acregmax=3600,acdirmin=1800,acdirmax=3600,soft,intr,proto=udp,timeo=14,retrans=2,sec=sys

One of the nfsd processes is spinning at 99% CPU; the rest are in state D (uninterruptible sleep):
root      4817  3.1  0.0      0     0 ?        R    Apr29  78:46 [nfsd]
root      4818  1.0  0.0      0     0 ?        D    Apr29  25:28 [nfsd]
root      4819  1.0  0.0      0     0 ?        D    Apr29  25:40 [nfsd]
root      4820  1.0  0.0      0     0 ?        D    Apr29  25:31 [nfsd]
root      4821  1.0  0.0      0     0 ?        D    Apr29  25:34 [nfsd]
root      4822  1.0  0.0      0     0 ?        D    Apr29  25:40 [nfsd]
root      4823  1.0  0.0      0     0 ?        D    Apr29  25:32 [nfsd]
root      4824  1.0  0.0      0     0 ?        D    Apr29  25:36 [nfsd]
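
For anyone checking a similar hang, here is a minimal sketch of how the listing above could be summarized. It assumes the lines come from `ps aux` (11 columns, with STAT as the eighth); the function name `nfsd_states` and the `sample` list are illustrative only and not part of the original report.

```python
from collections import Counter

def nfsd_states(ps_lines):
    """Count process states (STAT column) of [nfsd] threads in `ps aux`-style lines."""
    states = Counter()
    for line in ps_lines:
        # Assumed layout: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) == 11 and fields[10].strip() == "[nfsd]":
            # Keep only the first letter of STAT (R = running, D = uninterruptible sleep)
            states[fields[7][0]] += 1
    return states

# Two lines copied from the listing above, used as sample input.
sample = [
    "root      4817  3.1  0.0      0     0 ?        R    Apr29  78:46 [nfsd]",
    "root      4818  1.0  0.0      0     0 ?        D    Apr29  25:28 [nfsd]",
]

if __name__ == "__main__":
    print(dict(nfsd_states(sample)))
```

One thread in R with all the others in D is consistent with a single nfsd looping in the kernel while the rest wait on something it holds, but the listing alone does not prove that.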

Steps to reproduce:
I don't know exactly. It happened on a production system that I cannot 
experiment with, so I have downgraded it to 2.6.20.3 for now. I used the default 
settings in /proc and /sys.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

