
xfs_freeze stuck in pagebuf_delwri_flush()

To: "'linux-xfs@xxxxxxxxxxx'" <linux-xfs@xxxxxxxxxxx>
Subject: xfs_freeze stuck in pagebuf_delwri_flush()
From: Dale Stephenson <dale.stephenson@xxxxxxxxxxx>
Date: Wed, 28 Aug 2002 12:20:51 -0700
Sender: linux-xfs-bounce@xxxxxxxxxxx
I've been looking at a box that is endlessly looping inside
pagebuf_delwri_flush().  It was running the August 23rd CVS code and testing
dbench with snapshots.  It had XFS on a source volume, four snapshots of the
source volume also mounted, and dbench running on the source volume.  A
program was trying to take a fifth snapshot but got no further than the
xfs_freeze -f command.  The first xfs_freeze -f did not complete in time, so
an alarm signal fired, an xfs_freeze -u was run to completion, and
xfs_freeze -f was tried again.  The second xfs_freeze is looping inside
pagebuf_delwri_flush(), while the first is sleeping in get_active_stripe()
(in drivers/md/raid5.c), called from pagebuf_delwri_flush().

I'm not sure how the first freeze got stuck, but the second really mystifies
me.  The loop is list_for_each_safe(), but it alternates between two, and
only two, pagebuf structures.  A dump of pb_daemon->pagebuf_delwri_l looks to
me like a list head with no other nodes attached to it.  I'm not sure how
the process could get into this state.

The pagebuf daemon isn't doing anything, and the two dbench processes are
both stopped in xfs_check_frozen(), as I would expect.  There are 34
runnable processes, but as far as I can tell none of them are getting any
CPU time.

Below I've included the following from a kdb session:
1) A backtrace from the looping xfs_freeze
2) Disassembly of the beginning of pagebuf_delwri_flush
3) A memory dump of what I assume is &pb_daemon->pagebuf_delwri_l (the
address is hardcoded into the assembly)
4) Register dumps taken when a breakpoint at the beginning of the
list_for_each_safe loop is hit.  On repeated stops, ebx only ever held these
two values.
5) Dumps of page_buf_t for the two addresses I found in ebx
6) A backtrace from the first (non-looping) xfs_freeze

I've still got access to the box, so if there is any other information I can
get off it, please let me know.

Dale Stephenson
steph@xxxxxxxxxxxxxx

kdb> bt
EBP        EIP        Function (args)
0xda5a5bfc 0xc01f6801 pagebuf_delwri_flush+0x41 (0xcf71735c, 0x1, 0x0)
                               kernel .text 0xc0100000 0xc01f67c0 0xc01f6944
0xda5a5c10 0xc01fe554 XFS_bflush+0x10 (0xcf71735c, 0xee91e000, 0x9, 0x0, 0xda5a4000)
                               kernel .text 0xc0100000 0xc01fe544 0xc01fe558
0xda5a5c38 0xc01c3aed xfs_fs_freeze+0x75 (0xee91e000, 0xee760380, 0xee7603a0, 0xbffffcac, 0xee8b08dc)
                               kernel .text 0xc0100000 0xc01c3a78 0xc01c3b2c
0xda5a5f4c 0xc01fa080 xfs_ioctl+0x1414 (0xecafae40, 0x400f1000, 0x0, 0x400f1480, 0xef13b7f4)
                               kernel .text 0xc0100000 0xc01f8c6c 0xc01fa1a0
0xda5a5ec4 0xc01239e8 do_no_page+0x48 (0xef13b7f4, 0xecafae40, 0x400f1480, 0x0, 0xd4ef83c4)
                               kernel .text 0xc0100000 0xc01239a0 0xc0123b10
0xda5a5ef0 0xc0123b6a handle_mm_fault+0x5a (0xef13b7f4, 0xecafae40, 0x400f1480, 0x0, 0xda5a4000)
                               kernel .text 0xc0100000 0xc0123b10 0xc0123bd0
0xda5a5fb4 0xc0112eac do_page_fault+0x1a4 (0xee91e000, 0xef13b810, 0x400f1480, 0xecafae40)
                               kernel .text 0xc0100000 0xc0112d08 0xc01131b6
0xda5a5f34 0xc01ff7ba linvfs_statfs+0x22 (0x30002, 0xda5a5f7c, 0xfffffff7, 0xda5a5f7c)
                               kernel .text 0xc0100000 0xc01ff798 0xc01ff7c4
0xda5a5f54 0xc0131600 vfs_statfs+0x40 (0xee8b08f4, 0xee7603a0, 0xec6d6484, 0xc0045877, 0xbffffcac)
                               kernel .text 0xc0100000 0xc01315c0 0xc0131610
0xda5a5f70 0xc01f7e94 linvfs_ioctl+0x48 (0xee7603a0, 0xec6d6484, 0xc0045877, 0xbffffcac, 0xda5a4000)
                               kernel .text 0xc0100000 0xc01f7e4c 0xc01f7eb4
0xda5a5fbc 0xc013fe2a sys_ioctl+0x26a (0xbffffd28, 0xc0108c6b, 0x5, 0xc0045877, 0xbffffcac)
 
 
kdb> id pagebuf_delwri_flush
0xc01f67c0 pagebuf_delwri_flush:         push   %ebp
0xc01f67c1 pagebuf_delwri_flush+0x1:     mov    %esp,%ebp
0xc01f67c3 pagebuf_delwri_flush+0x3:     sub    $0x1c,%esp
0xc01f67c6 pagebuf_delwri_flush+0x6:     push   %edi
0xc01f67c7 pagebuf_delwri_flush+0x7:     push   %esi
0xc01f67c8 pagebuf_delwri_flush+0x8:     push   %ebx
0xc01f67c9 pagebuf_delwri_flush+0x9:     movl   $0x0,0xfffffff0(%ebp)
0xc01f67d0 pagebuf_delwri_flush+0x10:    lea    0xfffffff8(%ebp),%ecx
0xc01f67d3 pagebuf_delwri_flush+0x13:    mov    %ecx,0xfffffff8(%ebp)
0xc01f67d6 pagebuf_delwri_flush+0x16:    mov    %ecx,%esi
0xc01f67d8 pagebuf_delwri_flush+0x18:    mov    %esi,0xffffffec(%ebp)
0xc01f67db pagebuf_delwri_flush+0x1b:    mov    %esi,0x4(%esi)
0xc01f67de pagebuf_delwri_flush+0x1e:    mov    0xc03a0ca8,%edx
0xc01f67e4 pagebuf_delwri_flush+0x24:    lea    0xc(%edx),%eax
0xc01f67e7 pagebuf_delwri_flush+0x27:    mov    0xc(%ebp),%esi
0xc01f67ea pagebuf_delwri_flush+0x2a:    mov    %esi,0xffffffe8(%ebp)
kdb>
0xc01f67ed pagebuf_delwri_flush+0x2d:    mov    0xc(%edx),%ebx
0xc01f67f0 pagebuf_delwri_flush+0x30:    andl   $0x1,0xffffffe8(%ebp)
0xc01f67f4 pagebuf_delwri_flush+0x34:    mov    (%ebx),%ecx
0xc01f67f6 pagebuf_delwri_flush+0x36:    mov    %ecx,0xfffffff4(%ebp)
0xc01f67f9 pagebuf_delwri_flush+0x39:    cmp    %eax,%ebx
0xc01f67fb pagebuf_delwri_flush+0x3b:    je     0xc01f68c8 pagebuf_delwri_flush+0x108
0xc01f6801 pagebuf_delwri_flush+0x41:    mov    %ebx,0xffffffe4(%ebp)
0xc01f6804 pagebuf_delwri_flush+0x44:    mov    0x8(%ebx),%eax
0xc01f6807 pagebuf_delwri_flush+0x47:    test   %eax,%eax
 
kdb> md 0xc03a0ca8
0xc03a0ca8 c184f424 c03a0cac c03a0cac 00000000
0xc03a0cb8 effabfe4 effabfe4 00000000 c03a0cc4
0xc03a0cc8 c03a0cc4 00000000 00000000 00000000
0xc03a0cd8 00000000 00000000 e856f0f0 e89f7280
 
Instruction(i) breakpoint #0 at 0xc01f6801 (adjusted)
0xc01f6801 pagebuf_delwri_flush+0x41:    int3
 
Entering kdb (current=0xda5a4000, pid 23188) due to Breakpoint @ 0xc01f6801
kdb> rd
eax = 0xc184f430 ebx = 0xc222414c ecx = 0xcf71735c edx = 0xc184f424
 
Entering kdb (current=0xda5a4000, pid 23188) due to Breakpoint @ 0xc01f6801
kdb> rd
eax = 0xc184f430 ebx = 0xeb50bbf4 ecx = 0xcf71735c edx = 0xc184f424
 
kdb> pb 0xeb50bbf4
page_buf_t at 0xeb50bbf4
  pb_flags ASYNC STALE FS_MANAGED RELEASE LOCK ALLOCATE NEXT_KEY
              ALL_PAGES_MAPPED ADDR_ALLOCATED MEM_ALLOCATED FORCEIO
  pb_target 0x00000001 pb_hold 0 pb_next 0xc222414c pb_prev 0xc222414c
  pb_hash_index 235 pb_hash_next 0xc01fe554 pb_hash_prev 0xcf71735c
  pb_file_offset 0xee91e000cf71735c pb_buffer_length 0x9 pb_addr 0xeb50a000
  pb_bn 0xc01c3aedeb50bc38 pb_count_desired 0x0
  pb_io_remaining 0 pb_error 3
  pb_page_count 0 pb_offset 0xbcbc pb_pages 0xc019fbd7
  pb_iodonesema (-1073742676,-292878116) pb_sema (-347030016,0) pincount (0)
 
kdb> pb 0xc222414c
page_buf_t at 0xc222414c
  pb_flags WRITE MAPPABLE LOCKABLE ALL_PAGES_MAPPED MEM_ALLOCATED
  pb_target 0xcf71735c pb_hold 2 pb_next 0xeb50bbf4 pb_prev 0xeb50bbf4
  pb_hash_index 8 pb_hash_next 0xc2224ab8 pb_hash_prev 0xe89e60f0
  pb_file_offset 0x4b010f0000 pb_buffer_length 0x2000 pb_addr 0x00000000
  pb_bn 0x25808780 pb_count_desired 0x2000
  pb_io_remaining 2 pb_error 0
  pb_page_count 2 pb_offset 0x0 pb_pages 0xc22241d0
  pb_iodonesema (0,0) pb_sema (0,0) pincount (0)
pb_fspriv 0xd9fdb24c pb_fspriv2 0x00000000
 
kdb> btp 22750
EBP        EIP        Function (args)
0xeb50b9b0 0xc0113a93 schedule+0x373 (0xcf904000, 0x0, 0xeb282bbc, 0x25810200, 0xed16d800)
                               kernel .text 0xc0100000 0xc0113720 0xc0113ab8
0xeb50ba00 0xc024b185 get_active_stripe+0xa1 (0xcf904000, 0x3c02700, 0x200, 0x0, 0xeb282bbc)
                               kernel .text 0xc0100000 0xc024b0e4 0xc024b594
0xeb50ba2c 0xc024d250 raid5_make_request+0x54 (0xef13f79c, 0x1, 0xeb282bbc, 0x900)
                               kernel .text 0xc0100000 0xc024d1fc 0xc024d300
0xeb50ba44 0xc0250983 md_make_request+0x3b (0xc03abcf8, 0x1, 0xeb282bbc, 0xc6530f3c, 0x0)
                               kernel .text 0xc0100000 0xc0250948 0xc02509b0
0xeb50ba68 0xc022ca59 generic_make_request+0x9d (0x1, 0xeb282bbc, 0x0, 0xc1164508, 0xc222414c)
                               kernel .text 0xc0100000 0xc022c9bc 0xc022cae8
0xeb50bae0 0xc01f5c8a _pagebuf_page_io+0x39e (0xc1164508, 0xc222414c, 0x25808788, 0x0, 0x3a00)
                               kernel .text 0xc0100000 0xc01f58ec 0xc01f5ce0
0xeb50bb34 0xc01f5dfc _page_buf_page_apply+0x11c (0xc222414c, 0x10f0000, 0x4b, 0xc1164508, 0x0)
                               kernel .text 0xc0100000 0xc01f5ce0 0xc01f5e0c
0xeb50bb80 0xc01f630c _pagebuf_segment_apply+0xc0 (0xc222414c, 0xc222414c, 0xc2224214, 0x0, 0x1)
                               kernel .text 0xc0100000 0xc01f624c 0xc01f635c
0xeb50bbbc 0xc01f5eef pagebuf_iorequest+0xe3 (0xc222414c)
                               kernel .text 0xc0100000 0xc01f5e0c 0xc01f5f48
0xeb50bbc8 0xc01fe4cc xfs_bdstrat_cb+0x18 (0xc222414c, 0xeeb8365c, 0xeb50a000, 0xee91e000, 0xc222414c)
                               kernel .text 0xc0100000 0xc01fe4b4 0xc01fe4f0
0xeb50bbfc 0xc01f68a4 pagebuf_delwri_flush+0xe4 (0xcf71735c, 0x1, 0x0)
more>
                               kernel .text 0xc0100000 0xc01f67c0 0xc01f6944
0xeb50bc10 0xc01fe554 XFS_bflush+0x10 (0xcf71735c, 0xee91e000, 0x9, 0x0, 0xeb50a000)
                               kernel .text 0xc0100000 0xc01fe544 0xc01fe558
0xeb50bc38 0xc01c3aed xfs_fs_freeze+0x75 (0xee91e000, 0xee760380, 0xee7603a0, 0xbffffcac, 0xee8b08dc)
                               kernel .text 0xc0100000 0xc01c3a78 0xc01c3b2c
0xeb50bf4c 0xc01fa080 xfs_ioctl+0x1414 (0xecafad10, 0x400f1000, 0x0, 0x400f1480, 0xef13bd94)
                               kernel .text 0xc0100000 0xc01f8c6c 0xc01fa1a0
0xeb50bec4 0xc01239e8 do_no_page+0x48 (0xef13bd94, 0xecafad10, 0x400f1480, 0x0, 0xe21fd3c4)
                               kernel .text 0xc0100000 0xc01239a0 0xc0123b10
0xeb50bef0 0xc0123b6a handle_mm_fault+0x5a (0xef13bd94, 0xecafad10, 0x400f1480, 0x0, 0xeb50a000)
                               kernel .text 0xc0100000 0xc0123b10 0xc0123bd0
0xeb50bfb4 0xc0112eac do_page_fault+0x1a4 (0xee91e000, 0xef13bdb0, 0x400f1480, 0xecafad10)
                               kernel .text 0xc0100000 0xc0112d08 0xc01131b6
0xeb50bf34 0xc01ff7ba linvfs_statfs+0x22 (0x30002, 0xeb50bf7c, 0xfffffff7, 0xeb50bf7c)
                               kernel .text 0xc0100000 0xc01ff798 0xc01ff7c4
0xeb50bf54 0xc0131600 vfs_statfs+0x40 (0xee8b08f4, 0xee7603a0, 0xe5d50594, 0xc0045877, 0xbffffcac)
                               kernel .text 0xc0100000 0xc01315c0 0xc0131610
0xeb50bf70 0xc01f7e94 linvfs_ioctl+0x48 (0xee7603a0, 0xe5d50594, 0xc0045877, 0xbffffcac, 0xeb50a000)
                               kernel .text 0xc0100000 0xc01f7e4c 0xc01f7eb4
0xeb50bfbc 0xc013fe2a sys_ioctl+0x26a (0xbffffd28, 0xc0108c6b, 0x9, 0xc0045877, 0xbffffcac)
                               kernel .text 0xc0100000 0xc013fbc0 0xc013fe50
           0xc0108d5c error_code+0x34
                               kernel .text 0xc0100000 0xc0108d28 0xc0108d64
more>
Interrupt registers:
eax = 0x00000001 ebx = 0xbffffd28 ecx = 0xc0108c6b edx = 0x00000009
esi = 0xc0045877 edi = 0xbffffcac esp = 0x00000023 eip = 0x0000002b
ebp = 0xbffffd94 xss = 0x00000246 xcs = 0x00000036 eflags = 0x400f1494
xds = 0xbffffd28 xes = 0x00000036 origeax = 0x400e002b &regs = 0xeb50bfbc
Interrupt from user space, end of kernel trace

