XFS / xfssyncd lock-ups on 2.6.38-8

Subject: XFS / xfssyncd lock-ups on 2.6.38-8
Date: Sun, 20 Nov 2011 08:34:02 -0800
We're running a dozen Amazon AWS instances (on Ubuntu Natty Narwhal, kernel 
2.6.38-8). We've recently brought up several machines based on some previous 
snapshots (EBS snapshots, rather than LVM), and they've been locking up under 
load. The dmesg output is below; does this issue look familiar, or perhaps 
fixed in a later kernel? Or could it be indicative of some data corruption in 
the snapshot process? 

The drive is being used for Postgres write-ahead-logs, so it's a write-heavy, 
read-light drive. When the array (4 drives in RAID0) freezes up, nothing seems 
to fix it short of a hard restart of the machine—we've tried things like 
stopping Postgres, issuing a 'drop cache' to the kernel, and trying to kill the 
locked process, to no avail.

Would appreciate any thoughts/pointers to fixes or workarounds if this is a 
known issue.



(The /dev/md126 array that is locking up is a 4-volume RAID0 formatted with XFS.)

The errors look like this:

[558307.361854] INFO: task xfssyncd/md126:1029 blocked for more than 120 
[558307.361867] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[558307.361874] xfssyncd/md126  D ffff881116f13b00     0  1029      2 0x00000000
[558307.361879]  ffff881088989d00 0000000000000246 ffff881088989fd8 
[558307.361884]  0000000000013b00 ffff8810866c3120 ffff881088989fd8 
[558307.361889]  ffff881089b84440 ffff8810866c2d80 ffffffff815dc13e 
[558307.361894] Call Trace:
[558307.361904]  [<ffffffff815dc13e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[558307.361932]  [<ffffffffa00e52d8>] xlog_grant_log_space+0x4a8/0x500 [xfs]
[558307.361937]  [<ffffffff8105f180>] ? default_wake_function+0x0/0x20
[558307.361951]  [<ffffffffa00e71ff>] xfs_log_reserve+0xff/0x140 [xfs]
[558307.361967]  [<ffffffffa00f31fc>] xfs_trans_reserve+0x9c/0x200 [xfs]
[558307.361980]  [<ffffffffa00d7383>] xfs_fs_log_dummy+0x43/0x90 [xfs]
[558307.361995]  [<ffffffffa010a3c1>] xfs_sync_worker+0x81/0x90 [xfs]
[558307.362009]  [<ffffffffa01090f3>] xfssyncd+0x183/0x230 [xfs]
[558307.362025]  [<ffffffffa0108f70>] ? xfssyncd+0x0/0x230 [xfs]
[558307.362030]  [<ffffffff81086ac6>] kthread+0x96/0xa0
[558307.362035]  [<ffffffff8100cde4>] kernel_thread_helper+0x4/0x10
[558307.362038]  [<ffffffff8100c1e3>] ? int_ret_from_sys_call+0x7/0x1b
[558307.362041]  [<ffffffff815dc621>] ? retint_restore_args+0x5/0x6
[558307.362045]  [<ffffffff8100cde0>] ? kernel_thread_helper+0x0/0x10
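
To compare traces like this across several instances, something along the following lines can pull out the blocked task and the frames of the call trace. This is only a rough sketch: the function name parse_hung_task and both regular expressions are illustrative assumptions based on the format of the lines above, not a kernel-provided tool, and other kernel versions may format the trace differently. Frames that the kernel prints with a leading "?" are recorded as unreliable.

```python
import re

# Hung-task header, e.g. "INFO: task xfssyncd/md126:1029 blocked for more than 120"
TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+)"
)
# One call-trace frame, e.g. "[<ffffffffa00e52d8>] xlog_grant_log_space+0x4a8/0x500 [xfs]"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_hung_task(text):
    """Return the blocked task and its call-trace frames from dmesg text."""
    report = {"task": None, "pid": None, "secs": None, "frames": []}
    for line in text.splitlines():
        m = TASK_RE.search(line)
        if m:
            report["task"] = m.group("name")
            report["pid"] = int(m.group("pid"))
            report["secs"] = int(m.group("secs"))
            continue
        m = FRAME_RE.search(line)
        if m:
            report["frames"].append({
                "symbol": m.group("sym"),
                "module": m.group("mod"),  # None for core-kernel symbols
                "reliable": m.group("unreliable") is None,
            })
    return report


# A few lines copied from the report above.
SAMPLE = """\
[558307.361854] INFO: task xfssyncd/md126:1029 blocked for more than 120
[558307.361894] Call Trace:
[558307.361904]  [<ffffffff815dc13e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[558307.361932]  [<ffffffffa00e52d8>] xlog_grant_log_space+0x4a8/0x500 [xfs]
[558307.361951]  [<ffffffffa00e71ff>] xfs_log_reserve+0xff/0x140 [xfs]
"""

if __name__ == "__main__":
    report = parse_hung_task(SAMPLE)
    print(report["task"], report["pid"])
    for frame in report["frames"]:
        if frame["reliable"]:
            print(frame["symbol"], frame["module"])
```

Looking only at the reliable frames shows where the task is actually waiting; in the trace above that is xlog_grant_log_space, reached via xfs_log_reserve.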

