[Bug 86051] New: xfsdump gets stuck with 3.17

To: xfs-masters@xxxxxxxxxxx
Subject: [Bug 86051] New: xfsdump gets stuck with 3.17
From: bugzilla-daemon@xxxxxxxxxxxxxxxxxxx
Date: Sat, 11 Oct 2014 14:02:37 +0000
Auto-submitted: auto-generated
Delivered-to: xfs-masters@xxxxxxxxxxx
https://bugzilla.kernel.org/show_bug.cgi?id=86051

            Bug ID: 86051
           Summary: xfsdump gets stuck with 3.17
           Product: File System
           Version: 2.5
    Kernel Version: 3.17.0
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: XFS
          Assignee: xfs-masters@xxxxxxxxxxx
          Reporter: v13@xxxxxx
        Regression: No

Hi,

I recently built 3.17 and started seeing xfsdump get stuck. Once xfsdump is
stuck it's unkillable, even with -9. The backtraces listed below are caused by
this. I tested this twice (test, got stuck, reboot, xfs_repair (no errors),
test, got stuck). I left xfsdump running for ~24 hours but nothing happened.
I'm including two backtraces, the first and the last; there were more in
between, but after the last one nothing more was printed, even though xfsdump
was still stuck.
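
That xfsdump ignores even SIGKILL is consistent with the "D" state in the
traces below: the task is in uninterruptible sleep inside the kernel, and the
signal can't take effect until the wait finishes. A minimal sketch (my own
illustration, not part of xfsdump) of checking that state via /proc/<pid>/stat:

    /* Print the scheduler state letter of a pid from /proc/<pid>/stat;
     * "D" means uninterruptible sleep, which is why kill -9 has no effect. */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        char path[64], comm[64], state;
        int pid;

        if (argc < 2) {
            fprintf(stderr, "usage: %s <pid>\n", argv[0]);
            return 1;
        }
        snprintf(path, sizeof(path), "/proc/%s/stat", argv[1]);

        FILE *f = fopen(path, "r");
        if (!f) {
            perror("fopen");
            return 1;
        }
        /* /proc/<pid>/stat starts with: pid (comm) state ... */
        if (fscanf(f, "%d (%63[^)]) %c", &pid, comm, &state) == 3)
            printf("%d (%s) state=%c\n", pid, comm, state);
        fclose(f);
        return 0;
    }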

First backtrace:
Oct 11 03:53:31 hell kernel: INFO: task xfsdump:3269 blocked for more than 120 seconds.
Oct 11 03:53:31 hell kernel:      Not tainted 3.17.0-v2-v #34
Oct 11 03:53:31 hell kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 11 03:53:31 hell kernel: xfsdump         D 0000000000000001     0  3269   3252 0x00000080
Oct 11 03:53:31 hell kernel: ffff8802aa23f9a0 0000000000000002 000000000000a000 ffff8802accce180
Oct 11 03:53:31 hell kernel: ffff8802aa23ffd8 ffff880408e0c920 ffff8802accce180 ffff8802aa23f8e8
Oct 11 03:53:31 hell kernel: ffffffff8113e1b7 0000000001b56000 ffff8802aa23f978 ffff8802aa23f960
Oct 11 03:53:31 hell kernel: Call Trace:
Oct 11 03:53:31 hell kernel: [<ffffffff8113e1b7>] ? lru_cache_add_active_or_unevictable+0x27/0x90
Oct 11 03:53:31 hell kernel: [<ffffffffa033f7b1>] ? xfs_iext_bno_to_ext+0xa1/0x1b0 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffffa0324b88>] ? xfs_bmbt_get_all+0x18/0x20 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffffa031a4e8>] ? xfs_bmap_search_multi_extents+0xa8/0x130 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffff814be799>] schedule+0x29/0x70
Oct 11 03:53:31 hell kernel: [<ffffffff814c13b9>] schedule_timeout+0x179/0x200
Oct 11 03:53:31 hell kernel: [<ffffffff81137135>] ? get_page_from_freelist+0x3c5/0x6c0
Oct 11 03:53:31 hell kernel: [<ffffffff814c0544>] __down+0x64/0xa0
Oct 11 03:53:31 hell kernel: [<ffffffffa034d4db>] ? _xfs_buf_find+0x14b/0x2a0 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffff8108d674>] down+0x44/0x50
Oct 11 03:53:31 hell kernel: [<ffffffffa034d2fc>] xfs_buf_lock+0x3c/0xd0 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffffa034d4db>] _xfs_buf_find+0x14b/0x2a0 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffffa034d75a>] xfs_buf_get_map+0x2a/0x190 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffffa034e42c>] xfs_buf_read_map+0x2c/0x110 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffffa0379669>] xfs_trans_read_buf_map+0x1b9/0x460 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffffa033d3dd>] xfs_read_agi+0x8d/0xe0 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffffa033d464>] xfs_ialloc_read_agi+0x34/0xd0 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffffa036189b>] xfs_bulkstat+0x16b/0x4d0 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffffa0361590>] ? xfs_bulkstat_one_int+0x2e0/0x2e0 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffff811a3946>] ? dput+0x26/0x1b0
Oct 11 03:53:31 hell kernel: [<ffffffffa0357071>] xfs_ioc_bulkstat+0xd1/0x1a0 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffffa035967e>] xfs_file_ioctl+0x81e/0xb20 [xfs]
Oct 11 03:53:31 hell kernel: [<ffffffff810f443c>] ? acct_account_cputime+0x1c/0x20
Oct 11 03:53:31 hell kernel: [<ffffffff81079f1b>] ? account_system_time+0x8b/0x190
Oct 11 03:53:31 hell kernel: [<ffffffff812a8838>] ? lockref_put_or_lock+0x48/0x80
Oct 11 03:53:31 hell kernel: [<ffffffff8119f8b8>] do_vfs_ioctl+0x2c8/0x490
Oct 11 03:53:31 hell kernel: [<ffffffff8107a390>] ? vtime_account_user+0x40/0x60
Oct 11 03:53:31 hell kernel: [<ffffffff810e0c3c>] ? __audit_syscall_entry+0x9c/0xf0
Oct 11 03:53:31 hell kernel: [<ffffffff8119fb01>] SyS_ioctl+0x81/0xa0
Oct 11 03:53:31 hell kernel: [<ffffffff814c2ad3>] tracesys+0xe1/0xe6

Last backtrace:
Oct 11 04:11:31 hell kernel: INFO: task xfsdump:3269 blocked for more than 120 seconds.
Oct 11 04:11:31 hell kernel:      Not tainted 3.17.0-v2-v #34
Oct 11 04:11:31 hell kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 11 04:11:31 hell kernel: xfsdump         D 0000000000000001     0  3269   3252 0x00000080
Oct 11 04:11:31 hell kernel: ffff8802aa23f9a0 0000000000000002 000000000000a000 ffff8802accce180
Oct 11 04:11:31 hell kernel: ffff8802aa23ffd8 ffff880408e0c920 ffff8802accce180 ffff8802aa23f8e8
Oct 11 04:11:31 hell kernel: ffffffff8113e1b7 0000000001b56000 ffff8802aa23f978 ffff8802aa23f960
Oct 11 04:11:31 hell kernel: Call Trace:
Oct 11 04:11:31 hell kernel: [<ffffffff8113e1b7>] ? lru_cache_add_active_or_unevictable+0x27/0x90
Oct 11 04:11:31 hell kernel: [<ffffffffa033f7b1>] ? xfs_iext_bno_to_ext+0xa1/0x1b0 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffffa0324b88>] ? xfs_bmbt_get_all+0x18/0x20 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffffa031a4e8>] ? xfs_bmap_search_multi_extents+0xa8/0x130 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffff814be799>] schedule+0x29/0x70
Oct 11 04:11:31 hell kernel: [<ffffffff814c13b9>] schedule_timeout+0x179/0x200
Oct 11 04:11:31 hell kernel: [<ffffffff81137135>] ? get_page_from_freelist+0x3c5/0x6c0
Oct 11 04:11:31 hell kernel: [<ffffffff814c0544>] __down+0x64/0xa0
Oct 11 04:11:31 hell kernel: [<ffffffffa034d4db>] ? _xfs_buf_find+0x14b/0x2a0 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffff8108d674>] down+0x44/0x50
Oct 11 04:11:31 hell kernel: [<ffffffffa034d2fc>] xfs_buf_lock+0x3c/0xd0 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffffa034d4db>] _xfs_buf_find+0x14b/0x2a0 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffffa034d75a>] xfs_buf_get_map+0x2a/0x190 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffffa034e42c>] xfs_buf_read_map+0x2c/0x110 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffffa0379669>] xfs_trans_read_buf_map+0x1b9/0x460 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffffa033d3dd>] xfs_read_agi+0x8d/0xe0 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffffa033d464>] xfs_ialloc_read_agi+0x34/0xd0 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffffa036189b>] xfs_bulkstat+0x16b/0x4d0 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffffa0361590>] ? xfs_bulkstat_one_int+0x2e0/0x2e0 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffff811a3946>] ? dput+0x26/0x1b0
Oct 11 04:11:31 hell kernel: [<ffffffffa0357071>] xfs_ioc_bulkstat+0xd1/0x1a0 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffffa035967e>] xfs_file_ioctl+0x81e/0xb20 [xfs]
Oct 11 04:11:31 hell kernel: [<ffffffff810f443c>] ? acct_account_cputime+0x1c/0x20
Oct 11 04:11:31 hell kernel: [<ffffffff81079f1b>] ? account_system_time+0x8b/0x190
Oct 11 04:11:31 hell kernel: [<ffffffff812a8838>] ? lockref_put_or_lock+0x48/0x80
Oct 11 04:11:31 hell kernel: [<ffffffff8119f8b8>] do_vfs_ioctl+0x2c8/0x490
Oct 11 04:11:31 hell kernel: [<ffffffff8107a390>] ? vtime_account_user+0x40/0x60
Oct 11 04:11:31 hell kernel: [<ffffffff810e0c3c>] ? __audit_syscall_entry+0x9c/0xf0
Oct 11 04:11:31 hell kernel: [<ffffffff8119fb01>] SyS_ioctl+0x81/0xa0

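Both traces show the same thing: xfsdump is walking inodes via the bulkstat
ioctl (xfs_ioc_bulkstat -> xfs_bulkstat -> xfs_ialloc_read_agi) and is sleeping
in xfs_buf_lock() on an AGI buffer; the log alone doesn't show which task holds
that buffer. For context, a minimal sketch of the XFS_IOC_FSBULKSTAT loop that
xfsdump is blocked in (my own illustration against the xfsprogs headers, not
xfsdump's actual source):

    /* Walk every inode of an XFS filesystem in batches via bulkstat.
     * In the hang above this ioctl never returns, because xfs_bulkstat()
     * blocks in xfs_buf_lock() while reading an AGI header. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <xfs/xfs.h>          /* XFS_IOC_FSBULKSTAT, xfs_fsop_bulkreq */

    int main(int argc, char **argv)
    {
        struct xfs_bstat buf[64]; /* one batch of inode stat records */
        __u64 lastino = 0;        /* resume cursor; 0 = start of filesystem */
        __s32 ocount = 0;
        struct xfs_fsop_bulkreq req = {
            .lastip  = &lastino,
            .icount  = 64,        /* records requested per call */
            .ubuffer = buf,
            .ocount  = &ocount,
        };

        /* Any fd on the filesystem works; /home is the affected mount. */
        int fd = open(argc > 1 ? argv[1] : "/home", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        while (ioctl(fd, XFS_IOC_FSBULKSTAT, &req) == 0 && ocount > 0)
            printf("got %d inodes, last ino %llu\n",
                   (int)ocount, (unsigned long long)lastino);
        close(fd);
        return 0;
    }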

My details:
* kernel 3.17.0, built by me
* xfs_repair version 3.2.1
* 1 cpu, 4 cores
* I don't have meminfo from when the problem was happening
* Relevant /proc/mounts line: /dev/mapper/tera1-home /home xfs rw,noatime,attr2,inode64,noquota 0 0
* The relevant part of the layout is as follows (take a deep breath):
  * Two physical rotational disks
  * An SSD disk
  * md116 made of two partitions from these disks
  * md118 made of two partitions from these disks
  * A partition on the SSD disk that's used for bcache caching
  * bcache1 made of md116 + ssd_part
  * bcache3 made of md118 + ssd_part
  * LVM PV on bcache1
  * LVM PV on bcache3
  * LVM VG with bcache1 and bcache3
  * LVM LV on that VG
  * XFS partition on that LV
* Write cache is enabled

Thanks,
Stefanos

