Hi Dave, Alex,
Debugging with trace, crash and systemtap, I found that the hang
happens when xfs_sync_worker() (running in a kworker) gets stuck in
xlog_wait() while reserving a transaction log buffer for the dummy log.
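A toy user-space model of why such a waiter can sleep forever. It assumes the dummy transaction is waiting for log space that is only released when the log tail moves forward. This is not the real XFS grant-head code: struct toy_log and the toy_log_* functions are hypothetical, and head/tail are plain byte counters rather than cycle/block LSNs. A reservation that does not fit has to wait, and only a tail move (i.e. AIL writeback) can make it fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical toy model of log space, NOT the XFS grant heads.
 * head and tail are monotonically increasing byte offsets into a
 * circular log of log_size bytes (cycle wrapping is not encoded).
 */
struct toy_log {
	uint64_t log_size;
	uint64_t head;	/* where the next reservation starts */
	uint64_t tail;	/* oldest byte still needed; moved by AIL writeback */
};

/* Bytes that can still be reserved without overwriting the tail. */
static uint64_t toy_log_free(const struct toy_log *log)
{
	assert(log->head >= log->tail && log->head - log->tail <= log->log_size);
	return log->log_size - (log->head - log->tail);
}

/*
 * Try to reserve need bytes. Returns false when the caller would have
 * to sleep (the xlog_wait() case); retrying does not help until the
 * tail has moved.
 */
static bool toy_log_try_reserve(struct toy_log *log, uint64_t need)
{
	if (toy_log_free(log) < need)
		return false;
	log->head += need;
	return true;
}

/* Writeback of the oldest AIL items lets the tail advance to new_tail. */
static void toy_log_move_tail(struct toy_log *log, uint64_t new_tail)
{
	assert(new_tail >= log->tail && new_tail <= log->head);
	log->tail = new_tail;
}
```

With log_size 1000, head 950 and tail 0, a 100-byte reservation fails on every retry; after the tail moves to 200 there are 250 bytes free and it succeeds.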
I also found that even though xfsaild_push() keeps getting invoked, it
doesn't push anything to disk, because ailp->xa_target has not changed
since it was last set from process context a while back.
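A toy model of why a stale target stalls the AIL. As a simplification, it assumes a push writes back every item from the tail whose LSN is <= the target and that writeback completes immediately. struct toy_ail and toy_ail_push() are hypothetical names, LSNs are plain integers instead of cycle/block pairs, and the pinned/locked item handling of the real xfsaild is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_AIL_MAX 16

/* Hypothetical toy AIL, NOT kernel code. */
struct toy_ail {
	int64_t lsn[TOY_AIL_MAX];	/* item LSNs, sorted ascending (tail first) */
	size_t  count;			/* items still in the AIL */
	int64_t target;			/* push target, like ailp->xa_target */
};

/*
 * Push every item from the tail whose LSN is <= target and drop it
 * (modelling writeback completing). Returns the number pushed. Items
 * beyond the target are untouched, so a stale target pushes nothing
 * however often this runs.
 */
static size_t toy_ail_push(struct toy_ail *ail)
{
	size_t pushed = 0;

	assert(ail->count <= TOY_AIL_MAX);
	while (pushed < ail->count && ail->lsn[pushed] <= ail->target)
		pushed++;

	/* Shift the remaining items down to the tail position. */
	for (size_t i = pushed; i < ail->count; i++)
		ail->lsn[i - pushed] = ail->lsn[i];
	ail->count -= pushed;
	return pushed;
}
```

With items at LSNs 200, 300 and 400 and a target of 100, every call pushes nothing; moving the target to 300 lets two items go.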
So I thought that resetting the target to the maximum LSN in the AIL
would help nudge the AIL contents out to disk, and added the following
code:
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index ed9252b..f59fd9f 100644
@@ -534,6 +534,10 @@ out_done:
ailp->xa_last_pushed_lsn = 0;
+ lsn = xfs_ail_max_lsn(ailp);
+ xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &lsn);
and it seems to do the trick.
With this change, test 234 runs fine.
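The effect of the two added lines can be modelled the same way. This is hypothetical user-space code: struct toy_ail, toy_ail_max_lsn(), toy_ail_pushable() and toy_ail_retarget_max() are made-up names, and toy_ail_max_lsn() assumes xfs_ail_max_lsn() returns the LSN of the newest item, or 0 when the AIL is empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy AIL, NOT kernel code: LSNs sorted ascending, tail first. */
struct toy_ail {
	const int64_t *lsn;
	size_t count;
	int64_t target;
};

/* Assumed analogue of xfs_ail_max_lsn(): newest item's LSN, 0 if empty. */
static int64_t toy_ail_max_lsn(const struct toy_ail *ail)
{
	return ail->count ? ail->lsn[ail->count - 1] : 0;
}

/* How many items a push would write back: those with LSN <= target. */
static size_t toy_ail_pushable(const struct toy_ail *ail)
{
	size_t n = 0;

	while (n < ail->count && ail->lsn[n] <= ail->target)
		n++;
	return n;
}

/* What the patch does on the idle path: retarget to the max LSN. */
static void toy_ail_retarget_max(struct toy_ail *ail)
{
	ail->target = toy_ail_max_lsn(ail);
}
```

Retargeting to the maximum LSN makes the next push cover everything currently in the AIL. That is why it unsticks the log, and also why it may push more than strictly necessary.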
Is this a good fix, a bad fix, or overkill?
Please let me know.