http://oss.sgi.com/bugzilla/show_bug.cgi?id=433
Summary: Soft lockups in xfs_finish_reclaim_all
Product: Linux XFS
Version: Current
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: XFS kernel code
AssignedTo: xfs-master@xxxxxxxxxxx
ReportedBy: c.pascoe@xxxxxxxxxxxxxx
I've seen soft lockups reported at least eight times on one of our slower
(dual P3 Xeon) servers, which has heavy hardlink and unlink activity. Two
different traces show up with approximately equal frequency:
BUG: soft lockup detected on CPU#1!
Pid: 5210, comm: xfssyncd
EIP: 0060:[<c03421bd>] CPU: 1
EIP is at _spin_unlock_irqrestore+0xd/0x10
EFLAGS: 00000292 Tainted: PF (2.6.14-xfs-20051103)
EAX: 00000001 EBX: 00000292 ECX: 00000001 EDX: 00000292
ESI: c379d498 EDI: 00000292 EBP: db94def0 DS: 007b ES: 007b
CR0: 8005003b CR2: bfdbdf88 CR3: 0043c000 CR4: 000006d0
[<c0101110>] show_regs+0x150/0x178
[<c0143df0>] softlockup_tick+0x90/0xa0
[<c012ac53>] do_timer+0x43/0xf0
[<c0107901>] timer_interrupt+0x71/0x90
[<c0143fd5>] handle_IRQ_event+0x35/0x70
[<c014409a>] __do_IRQ+0x8a/0xf0
[<c0104ec7>] do_IRQ+0x37/0x70
[<c01037ea>] common_interrupt+0x1a/0x20
[<c01deeb6>] __down_trylock+0x66/0x80
[<c0340792>] __down_failed_trylock+0xa/0x10
[<f8ab0d67>] .text.lock.xfs_iget+0x143/0x15c [xfs]
[<f8ad1b1c>] xfs_finish_reclaim_all+0x9c/0x100 [xfs]
[<f8acb2d7>] xfs_syncsub+0x67/0x2e0 [xfs]
[<f8aca94e>] xfs_sync+0x1e/0x30 [xfs]
[<f8addabe>] vfs_sync+0x3e/0x50 [xfs]
[<f8add07f>] vfs_sync_worker+0x3f/0x50 [xfs]
[<f8add199>] xfssyncd+0x109/0x180 [xfs]
[<c01363d6>] kthread+0x96/0xe0
[<c010113d>] kernel_thread_helper+0x5/0x18
and
EIP is at xfs_finish_reclaim_all+0x83/0x100 [xfs]
EFLAGS: 00000202 Tainted: PF (2.6.14-xfs-20051103)
EAX: 00000000 EBX: e69c0a60 ECX: e69c0a60 EDX: 00000001
ESI: e69c08c0 EDI: f776f54c EBP: f73a5f3c DS: 007b ES: 007b
CR0: 8005003b CR2: 08059548 CR3: 0043c000 CR4: 000006d0
[<c0101110>] show_regs+0x150/0x178
[<c0143df0>] softlockup_tick+0x90/0xa0
[<c012ac53>] do_timer+0x43/0xf0
[<c0107901>] timer_interrupt+0x71/0x90
[<c0143fd5>] handle_IRQ_event+0x35/0x70
[<c014409a>] __do_IRQ+0x8a/0xf0
[<c0104ec7>] do_IRQ+0x37/0x70
[<c01037ea>] common_interrupt+0x1a/0x20
[<f8acb2d7>] xfs_syncsub+0x67/0x2e0 [xfs]
[<f8aca94e>] xfs_sync+0x1e/0x30 [xfs]
[<f8addabe>] vfs_sync+0x3e/0x50 [xfs]
[<f8add07f>] vfs_sync_worker+0x3f/0x50 [xfs]
[<f8add199>] xfssyncd+0x109/0x180 [xfs]
[<c01363d6>] kthread+0x96/0xe0
[<c010113d>] kernel_thread_helper+0x5/0x18
Whilst the machine seems to continue operating fine, it may be a concern that
we can loop in xfs_finish_reclaim_all for 10 seconds without yielding the CPU.
Perhaps a cond_resched() in the "continue" paths would help, though this may
have other implications.
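
To illustrate the suggestion, here is a minimal userspace sketch of a
non-blocking reclaim pass that yields whenever it skips a busy entry. It is
not the XFS code: struct item, item_trylock(), item_unlock() and
reclaim_pass() are hypothetical names made up for this example, and
sched_yield() merely stands in for the kernel's cond_resched(). It also does
not model whatever locks the real loop holds at that point, which is where
the "other implications" would come from.

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode queued for reclaim. */
struct item {
    int locked;     /* nonzero: held elsewhere, so a trylock fails */
    int reclaimed;  /* set once this item has been processed */
};

/* Hypothetical trylock: succeeds only if nobody holds the item. */
static int item_trylock(struct item *it)
{
    if (it->locked)
        return 0;
    it->locked = 1;
    return 1;
}

static void item_unlock(struct item *it)
{
    it->locked = 0;
}

/*
 * One non-blocking pass over the list. Busy items are skipped, but the
 * "continue" path yields the CPU first so that repeated skipping cannot
 * monopolise it. Returns the number of items reclaimed in this pass;
 * *yields is incremented once per skipped (busy) item.
 */
static size_t reclaim_pass(struct item *items, size_t n, size_t *yields)
{
    size_t done = 0;

    for (size_t i = 0; i < n; i++) {
        if (items[i].reclaimed)
            continue;               /* already handled in an earlier pass */
        if (!item_trylock(&items[i])) {
            sched_yield();          /* userspace stand-in for cond_resched() */
            (*yields)++;
            continue;
        }
        items[i].reclaimed = 1;     /* placeholder for the real reclaim work */
        item_unlock(&items[i]);
        done++;
    }
    return done;
}
```

A caller would invoke reclaim_pass() repeatedly until a pass reclaims
nothing. The point of the sketch is only that the skip path gives other
tasks a chance to run instead of spinning straight back into the loop.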
--
Configure bugmail: http://oss.sgi.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.