http://bugzilla.kernel.org/show_bug.cgi?id=7668
Summary: XFS mount of LVM2 snapshot hangs occasionally
Kernel Version: 2.6.17.7
Status: NEW
Severity: high
Owner: xfs-masters@xxxxxxxxxxx
Submitter: official.box@xxxxxxxxx
Most recent kernel where this bug did *NOT* occur: unknown
Distribution: Fedora
Hardware Environment: PC, P4 2.4GHz HT, Abit Mainboard, HD: WD2000-JS 200GB
Software Environment: LVM2, Logical Volume and LVM snapshots
Problem Description:
A mount operation applied to an LVM2 snapshot occasionally hangs whenever the
main Logical Volume is in use, possibly due to heavy IO load or congestion.
I was able to track this error down with the help of a simple debugging
methodology.
First, I discovered that the hang occurs within the xlog_recover() function of
the xfs_log_recover.c source file.
xlog_recover() calls xlog_do_recover(), where I placed a few printk's to
narrow down the location of the problem:
STATIC int
xlog_do_recover(
        xlog_t          *log,
        xfs_daddr_t     head_blk,
        xfs_daddr_t     tail_blk)
{
        int             error;
        xfs_buf_t       *bp;
        xfs_sb_t        *sbp;

        printk("xlog_do_recover: #1\n");
        /*
         * First replay the images in the log.
         */
        error = xlog_do_log_recovery(log, head_blk, tail_blk);
        if (error) {
                return error;
        }
        printk("xlog_do_recover: #2\n");
        XFS_bflush(log->l_mp->m_ddev_targp);
        printk("xlog_do_recover: #3\n");

        /*
         * If IO errors happened during recovery, bail out.
         */
        if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
                return (EIO);
        }
        printk("xlog_do_recover: #4\n");

        /*
         * We now update the tail_lsn since much of the recovery has completed
         * and there may be space available to use. If there were no extent
         * or iunlinks, we can free up the entire log and set the tail_lsn to
         * be the last_sync_lsn. This was set in xlog_find_tail to be the
         * lsn of the last known good LR on disk. If there are extent frees
         * or iunlinks they will have some entries in the AIL; so we look at
         * the AIL to determine how to set the tail_lsn.
         */
        printk("xlog_do_recover: #5\n");
        xlog_assign_tail_lsn(log->l_mp);

        /*
         * Now that we've finished replaying all buffer and inode
         * updates, re-read in the superblock.
         */
        printk("xlog_do_recover: #6\n");
        bp = xfs_getsb(log->l_mp, 0);
        XFS_BUF_UNDONE(bp);
        XFS_BUF_READ(bp);
        xfsbdstrat(log->l_mp, bp);
        printk("xlog_do_recover: #7\n");
        if ((error = xfs_iowait(bp))) {
                printk("xlog_do_recover: #100\n");
                xfs_ioerror_alert("xlog_do_recover",
                                  log->l_mp, bp, XFS_BUF_ADDR(bp));
                ASSERT(0);
                xfs_buf_relse(bp);
                printk("xlog_do_recover: #101\n");
                return error;
        }

        /* Convert superblock from on-disk format */
        sbp = &log->l_mp->m_sb;
        printk("xlog_do_recover: #8\n");
        xfs_xlatesb(XFS_BUF_TO_SBP(bp), sbp, 1, XFS_SB_ALL_BITS);
        ASSERT(sbp->sb_magicnum == XFS_SB_MAGIC);
        ASSERT(XFS_SB_GOOD_VERSION(sbp));
        xfs_buf_relse(bp);
        printk("xlog_do_recover: #9\n");
        xlog_recover_check_summary(log);

        /* Normal transactions can now occur */
        log->l_flags &= ~XLOG_ACTIVE_RECOVERY;
        return 0;
}
Whenever the mount operation hangs, it does so after displaying the
"xlog_do_recover: #7" message, i.e. on the xfs_iowait(bp) function call.
The WCHAN of the hanging mount process suggested an infinite wait on a
semaphore.
In xfs_buf.h we read:
#define xfs_iowait(bp) xfs_buf_iowait(bp)
and xfs_buf_iowait() is in turn defined in xfs_buf.c.
Within that function, the mount operation hangs on: down(&bp->b_iodonesema);
In order to log some useful information I changed the xfs_buf_iowait()
definition so that:
a. it first tries to get the semaphore using down_trylock();
b. if that fails, it logs some fields of the xfs_buf_t structure involved;
c. finally, it blocks on the original down(&bp->b_iodonesema).
My function looks as follows:
int
xfs_buf_iowait(
        xfs_buf_t       *bp)
{
        XB_TRACE(bp, "iowait", 0);
        if (atomic_read(&bp->b_io_remaining))
                blk_run_address_space(bp->b_target->bt_mapping);
        if (down_trylock(&bp->b_iodonesema)) {
                printk("xfs_buf_iowait blocked, b_io_rem: %d, b_error: %d, "
                       "b_flags: %d, block: %lld, dev_maj: %d, dev_min: %d, "
                       "XBF_ASYNC: %d\n",
                       atomic_read(&bp->b_io_remaining), bp->b_error,
                       bp->b_flags, bp->b_bn,
                       MAJOR(bp->b_target->bt_dev),
                       MINOR(bp->b_target->bt_dev),
                       ((bp->b_flags & XBF_ASYNC) != 0) ? 1 : 0);
                down(&bp->b_iodonesema);
        }
        XB_TRACE(bp, "iowaited", (long)bp->b_error);
        return bp->b_error;
}
To log even more diagnostic information I added additional printk's within the
dm_request() and dec_pending() functions of the dm.c file.
This logging takes place at the start and at the end of any IO operation
entering the device mapper. To limit excessive logging I only log BIOs
referring to bi_sector == 0, which is sufficient in this case.
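For reference, the instrumentation looked roughly like this (a sketch against
2.6.17 drivers/md/dm.c; the elided function bodies and the exact dm_io layout
are assumptions based on that tree, not a verbatim patch):

static int dm_request(request_queue_t *q, struct bio *bio)
{
        ...
        /* limit logging to BIOs targeting sector 0 */
        if (bio->bi_sector == 0)
                printk("dm_request: request for sector 0, device: %d:%d, "
                       "dir: %s\n",
                       MAJOR(bio->bi_bdev->bd_dev),
                       MINOR(bio->bi_bdev->bd_dev),
                       bio_data_dir(bio) == WRITE ? "WRITE" : "READ");
        ...
}

static void dec_pending(struct dm_io *io, int error)
{
        ...
        /* mirror the same filter at IO completion */
        if (io->bio->bi_sector == 0)
                printk("dec_pending: ending bio for sector 0, device: %d:%d, "
                       "dir: %s\n",
                       MAJOR(io->bio->bi_bdev->bd_dev),
                       MINOR(io->bio->bi_bdev->bd_dev),
                       bio_data_dir(io->bio) == WRITE ? "WRITE" : "READ");
        ...
}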
Now, with a proper, non-hanging execution of the mount operation, the log
looks like this:
Dec 12 01:13:08 localhost kernel: [17179773.208000] XFS mounting filesystem dm-5
Dec 12 01:13:08 localhost kernel: [17179773.344000] xfs_buf_iowait blocked,
b_io_rem: 1, b_error: 0, b_flags: 262149, block: 40960032, dev_maj: 253,
dev_min: 5, XBF_ASYNC: 0
...
Dec 12 01:13:15 localhost kernel: [17179786.228000] xlog_do_recover: #2
...
Dec 12 01:13:21 localhost kernel: [17179786.300000] xlog_do_recover: #3
Dec 12 01:13:21 localhost kernel: [17179786.300000] xlog_do_recover: #4
Dec 12 01:13:24 localhost kernel: [17179786.300000] xlog_do_recover: #5
Dec 12 01:13:33 localhost kernel: [17179786.300000] xlog_do_recover: #6
>>>>>>>>
Dec 12 01:13:33 localhost kernel: [17179786.300000] dm_request: request for
sector 0, device: 253:5, dir: READ
Dec 12 01:13:33 localhost kernel: [17179786.300000] xlog_do_recover: #7
Dec 12 01:13:34 localhost kernel: [17179786.300000] xfs_buf_iowait blocked,
b_io_rem: 1, b_error: 0, b_flags: 131333, block: 0, dev_maj: 253, dev_min: 5,
XBF_ASYNC: 0
Dec 12 01:13:35 localhost kernel: [17179786.416000] dec_pending: ending bio for
sector 0, device: 253:5, dir: READ
>>>>>>>>
Dec 12 01:13:35 localhost kernel: [17179786.416000] xlog_do_recover: #8
Dec 12 01:13:35 localhost kernel: [17179786.416000] xlog_do_recover: #9
...
Our focus is on the xlog_do_recover: #7 line, together with the preceding IO
request, the xfs_iowait() call that follows, and the final endio of that IO
request.
The preceding IO request is always a READ request for block number 0.
On the following line:
Dec 12 01:13:34 localhost kernel: [17179786.300000] xfs_buf_iowait blocked,
b_io_rem: 1, b_error: 0, b_flags: 131333, block: 0, dev_maj: 253, dev_min: 5,
XBF_ASYNC: 0
you can see that the xfs_buf_iowait() call was about to block on b_iodonesema
but finally got woken up.
The b_flags of the xfs_buf_t involved did NOT have the XBF_ASYNC flag set.
Now, with the hanging path of mount execution you get something like this:
>>>>>>>>>>
Dec 12 01:20:33 localhost kernel: [17180171.176000] dm_request: request for
sector 0, device: 253:9, dir: READ
Dec 12 01:20:33 localhost kernel: [17180171.176000] xlog_do_recover: #7
Dec 12 01:20:34 localhost kernel: [17180171.176000] xfs_buf_iowait blocked,
b_io_rem: 1, b_error: 0, b_flags: 131349, block: 0, dev_maj: 253, dev_min: 9,
XBF_ASYNC: 1
Dec 12 01:20:34 localhost kernel: [17180171.176000] dec_pending: ending bio for
sector 0, device: 253:9, dir: READ
>>>>>>>
Everything looks similar except for the fact that the preceding READ operation
on block number 0 was initiated in asynchronous mode.
This pattern is 100% repeatable: whenever the XBF_ASYNC flag is 1 for this
READ operation on block number 0, the outstanding xfs_iowait() never exits its
waiting state!
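As a sanity check on the flag values: the two observed b_flags values, 131333
(non-hanging) and 131349 (hanging), differ by exactly 16, which is the
XBF_ASYNC bit. A minimal userspace decoder to illustrate this follows; the
flag bit positions are copied from 2.6.17 fs/xfs/linux-2.6/xfs_buf.h and
should be treated as an assumption if your tree differs:

#include <stdio.h>

/* Bit positions as in 2.6.17 fs/xfs/linux-2.6/xfs_buf.h (assumption). */
#define XBF_READ        (1 << 0)
#define XBF_WRITE       (1 << 1)
#define XBF_MAPPED      (1 << 2)
#define XBF_ASYNC       (1 << 4)

static void decode(int flags)
{
        printf("b_flags %d (0x%x):%s%s%s%s\n", flags, flags,
               (flags & XBF_READ)   ? " READ"   : "",
               (flags & XBF_WRITE)  ? " WRITE"  : "",
               (flags & XBF_MAPPED) ? " MAPPED" : "",
               (flags & XBF_ASYNC)  ? " ASYNC"  : "");
}

int main(void)
{
        decode(131333); /* non-hanging case: READ, MAPPED (plus internal bits) */
        decode(131349); /* hanging case: the same plus ASYNC (difference = 16) */
        return 0;
}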
Steps to reproduce:
1. Create a Logical Volume and mount it under /lv.
2. Enter the following loop:
while [ true ]; do
    lvcreate -L15GB -s -c256k -n1 /dev/VolGroup00/LogVol00
    lvcreate -L15GB -s -n2 /dev/VolGroup00/LogVol00
    lvcreate -L15GB -s -c256k -n3 /dev/VolGroup00/LogVol00
    lvcreate -L15GB -s -n4 /dev/VolGroup00/LogVol00
    for i in 1; do
        mount -onouuid /dev/VolGroup00/1 /s/1
        mount -onouuid /dev/VolGroup00/2 /s/2
        mount -onouuid /dev/VolGroup00/3 /s/3
        mount -onouuid /dev/VolGroup00/4 /s/4
        umount /s/1
        umount /s/2
        umount /s/3
        umount /s/4
    done
    echo "Removing /dev/VolGroup00/1"
    time echo "y" | lvremove /dev/VolGroup00/1
    echo "Removing /dev/VolGroup00/2"
    time echo "y" | lvremove /dev/VolGroup00/2
    echo "Removing /dev/VolGroup00/3"
    time echo "y" | lvremove /dev/VolGroup00/3
    echo "Removing /dev/VolGroup00/4"
    time echo "y" | lvremove /dev/VolGroup00/4
    echo "Removed all snapshots"
done
3. In the background, run an additional script to cause IO load on the system:
while [ true ]; do
    cat /dev/zero | head -c1000m > /lv/output
    rm -f /lv/output
done
4. After some time, the script from step 2 will hang on a mount operation.