Hi,
I hit a deadlock on 2.6.10. Because of some fs corruption somewhere (not the
point of this e-mail), xfs_trans_delete_ail called xfs_do_force_shutdown while
holding AIL_LOCK. Further down the same call chain, xfs_trans_tail_ail was
called, which went for AIL_LOCK again...
The code path in question (though perhaps there are other possible ones) looks
like:
xfs_trans_delete_ail holds AIL_LOCK
 -> calls xfs_do_force_shutdown
  -> calls xfs_log_force_umount
   -> calls xlog_state_sync_all
    -> calls xlog_state_release_iclog
     -> calls xlog_assign_tail_lsn
      -> calls xfs_trans_tail_ail
       -> tries to take AIL_LOCK
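To make the pattern concrete, here is a rough single-threaded userspace model of that call chain. It is only a sketch: the lock type and every model_* name are made up for illustration and are not XFS code, and the intermediate log functions are collapsed into one step. A real spinlock would spin forever on the second acquisition; the model just counts it.

```c
#include <stdbool.h>

/* Hypothetical model of a non-recursive lock. A real spinlock would
 * spin forever when re-acquired by its holder; this model records the
 * attempt instead so the problem can be observed. */
struct model_lock {
    bool held;
};

static struct model_lock ail_lock;  /* stands in for AIL_LOCK */
static int would_deadlock;          /* recursive acquisition attempts */

static bool model_lock_acquire(struct model_lock *l)
{
    if (l->held) {
        would_deadlock++;           /* the real code hangs here */
        return false;
    }
    l->held = true;
    return true;
}

static void model_lock_release(struct model_lock *l)
{
    l->held = false;
}

/* Stands in for xfs_trans_tail_ail(): it needs the lock itself. */
static void model_tail_ail(void)
{
    if (model_lock_acquire(&ail_lock))
        model_lock_release(&ail_lock);
}

/* Stands in for xfs_do_force_shutdown() and the log calls below it,
 * collapsed into a single step that ends in model_tail_ail(). */
static void model_force_shutdown(void)
{
    model_tail_ail();
}

/* Stands in for the error path of xfs_trans_delete_ail(). */
static void model_delete_ail_error_path(void)
{
    model_lock_acquire(&ail_lock);
    model_force_shutdown();         /* called with the lock still held */
    model_lock_release(&ail_lock);
}
```

Calling model_delete_ail_error_path() once leaves would_deadlock at 1, which corresponds to the point where the first CPU hangs.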
A sample backtrace I got (seen due to memory shortages as it happens, but this
too is a separate problem) was:
Call Trace:<IRQ> <ffffffff80159260>{__alloc_pages+816}
<ffffffff801592fe>{__get_free_pages+14}
<ffffffff8015cbc1>{cache_grow+273}
<ffffffff8015d0d8>{cache_alloc_refill+440}
<ffffffff8015caa6>{kmem_cache_alloc+54} <ffffffff802f87ec>{alloc_skb+44}
<ffffffffa001029e>{:e1000:e1000_alloc_rx_buffers+110}
<ffffffffa0012b8d>{:e1000:e1000_clean+1869}
<ffffffff802fecd4>{net_rx_action+132}
<ffffffff8013a931>{__do_softirq+113} <ffffffff8013a9e5>{do_softirq+53}
<ffffffff8011124f>{do_IRQ+63} <ffffffff8010e9cd>{ret_from_intr+0}
<EOI> <ffffffff80135d9d>{printk+141}
<ffffffff8011ce10>{flat_send_IPI_mask+0}
<ffffffff8035ed37>{.text.lock.spinlock+0}
<ffffffff802150a1>{xfs_trans_tail_ail+33}
<ffffffff8020909e>{xlog_assign_tail_lsn+30}
<ffffffff80209d69>{xlog_state_release_iclog+57}
<ffffffff8020b0a1>{xlog_state_sync_all+209}
<ffffffff801fa0a6>{xfs_cmn_err+214}
<ffffffff8020c422>{xfs_log_force_umount+322}
<ffffffff80222ea0>{pagebuf_iodone_work+0}
<ffffffff8021fc14>{xfs_do_force_shutdown+132}
<ffffffff8021538b>{xfs_trans_delete_ail+219}
<ffffffff8021538b>{xfs_trans_delete_ail+219}
<ffffffff8035e9d7>{__up_wakeup+53}
<ffffffff801e885c>{xfs_buf_iodone+44}
<ffffffff801e806a>{xfs_buf_do_callbacks+42}
<ffffffff801e8742>{xfs_buf_iodone_callbacks+322}
<ffffffff801313c3>{__wake_up+67}
<ffffffff80222ea0>{pagebuf_iodone_work+0}
<ffffffff80146450>{worker_thread+496}
<ffffffff80131300>{default_wake_function+0}
<ffffffff80131300>{default_wake_function+0}
<ffffffff8014a840>{keventd_create_kthread+0}
<ffffffff80146260>{worker_thread+0}
<ffffffff8014a840>{keventd_create_kthread+0}
<ffffffff8014a7f9>{kthread+217}
<ffffffff8010ef77>{child_rip+8}
<ffffffff8014a840>{keventd_create_kthread+0}
<ffffffff8014a720>{kthread+0} <ffffffff8010ef6f>{child_rip+0}
The dmesg said:
Filesystem "sdf1": xfs_trans_delete_ail: attempting to delete a log item
that is not in the AIL
xfs_force_shutdown(sdf1,0x8) called from line 382 of file
fs/xfs/xfs_trans_ail.c. Return address = 0xffffffff8021538b
Soon after the first CPU deadlocked, every other CPU on my system locked up
going for the same AIL_LOCK. It'd be great if this particular deadlock case
could be fixed so that fs problems like this don't bring entire systems down.
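One possible shape for a fix, purely as an assumption on my part and untested against the real code, would be to drop the lock on the error path before starting the shutdown. The userspace sketch below models that ordering; the toy lock and all sketch_* names are hypothetical and do not correspond to actual XFS functions.

```c
#include <stdbool.h>

/* Hypothetical toy lock: counts attempts to re-acquire it while held
 * (where a real spinlock would hang). */
struct sketch_lock {
    bool held;
};

static struct sketch_lock sketch_ail_lock;  /* stands in for AIL_LOCK */
static int sketch_recursive_attempts;
static bool sketch_shut_down;

static bool sketch_acquire(struct sketch_lock *l)
{
    if (l->held) {
        sketch_recursive_attempts++;
        return false;
    }
    l->held = true;
    return true;
}

static void sketch_release(struct sketch_lock *l)
{
    l->held = false;
}

/* Stands in for the shutdown path, which eventually takes the lock. */
static void sketch_force_shutdown(void)
{
    if (sketch_acquire(&sketch_ail_lock))
        sketch_release(&sketch_ail_lock);
    sketch_shut_down = true;
}

/* Stands in for the delete path. item_in_ail == false models the
 * "log item that is not in the AIL" case from the dmesg output. */
static void sketch_delete_ail(bool item_in_ail)
{
    sketch_acquire(&sketch_ail_lock);
    if (item_in_ail) {
        /* normal removal would happen here, under the lock */
        sketch_release(&sketch_ail_lock);
        return;
    }
    /* Error path: release the lock first, then shut down. */
    sketch_release(&sketch_ail_lock);
    sketch_force_shutdown();
}
```

With this ordering, sketch_delete_ail(false) finishes the shutdown without ever trying to take the lock twice. Whether the real error path can safely drop AIL_LOCK at that point is something only someone who knows the code can judge.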
Cheers,
Jim Minter <jim@xxxxxxxxxxxxxxxxxx>