On Thu, Jul 25, 2013 at 10:02:40AM -0500, Mark Tinguely wrote:
> On 07/24/13 19:21, Dave Chinner wrote:
> >On Wed, Jul 24, 2013 at 08:28:42AM -0500, Mark Tinguely wrote:
> >>If you could please redo the test and get the stack traces with
> >>/proc/sysrq-trigger and, if your kernel works with crash, a core dump.
> >>For the stack trace, I mostly want to know if it has several
> >>"xlog_grant_head_wait" entries in it, because ...
> >>...I seem to have triggered a couple of log space reservation hangs
> >>with fsstress on one XFS partition and a mega-copy on another
> >>partition, but will have to graft the new XFS tree onto a Linux 3.10
> >>kernel to get crash (and one of my sata controllers) to work again.
> >They are unrelated to this patchset.
> >Somewhere in the code there
> >is a mismatch between what we reserve as the base requirement for an
> >actual log write and what the CIL actually steals, and that is, most
> >likely, what is leading to log hangs.
> >This is demonstrable in the fact that generic/070 on 512 byte
> >block size filesystems regularly hits a transaction reservation
> >exhausted assert failure on transaction commit of the periodic log
> >dummy transaction on my test rigs.
> In testing patch 44, I did not trip over any cil stealing asserts
> before the hang. I think the cil steal assert is a different and a
> legitimate complaint. When I tripped over the ASSERT with the v3
> inode enabled, the writeid only reserves space for the sb but there
> were occasions of root btree and attribute fork entry that were also
> patch 43 runs for hours without incident. Previous to this series, I
> ran the same tests with parent pointer testing with much higher log
> reservations for a day or two and never got a hang.
> I tested patch 44 with copy-like tests twice and it hung both
> times - not a convincing number of tests. At a quick look, I see an
> empty AIL, empty CIL, the CTX is using 0 bytes, doesn't look like
> there are any cil pushes going nor any older ctx, the ctx has an
> empty ticket reservation. The log tail is 0xd000014d7 and
> reserve/grant is 0xe00204d04. The next reservation is for a rename
> transaction that uses just over the log space left. There has to be
> a log space leak. I will go back to patch 43 on one machine and patch
> 44 on another and make sure it is patch 44 that is causing the problem.
Right, a patch that makes transaction commits go faster is likely to
cause a pre-existing reservation leak to leak faster....
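
For reference, the tail and reserve/grant values quoted above can be
decoded by hand. The following is only a rough sketch in Python, not the
kernel code. It assumes both values pack a cycle number in the upper 32
bits, that the low 32 bits of the tail count 512-byte basic blocks, and
that the low 32 bits of the grant head count bytes. The function names
are made up for illustration.

```python
# Sketch only: decode a packed log tail and grant head and estimate the
# free log space between them. The packing (cycle in the upper 32 bits)
# and the units (tail in 512-byte blocks, grant head in bytes) are
# assumptions stated in the text above.

BASIC_BLOCK_SHIFT = 9  # assumed 512-byte basic blocks


def split_packed(value):
    """Return (cycle, low 32 bits) of a packed 64-bit log position."""
    return value >> 32, value & 0xFFFFFFFF


def space_left(tail_lsn, grant_head, log_size_bytes=None):
    """Estimate the bytes still available for reservation."""
    tail_cycle, tail_block = split_packed(tail_lsn)
    head_cycle, head_bytes = split_packed(grant_head)
    tail_bytes = tail_block << BASIC_BLOCK_SHIFT

    if head_cycle == tail_cycle + 1 and head_bytes <= tail_bytes:
        # The head has wrapped once and is chasing the tail from behind.
        return tail_bytes - head_bytes
    if head_cycle == tail_cycle and head_bytes >= tail_bytes:
        # Same cycle: everything outside [tail, head) is free, which
        # needs the total log size (not given in this thread).
        if log_size_bytes is None:
            raise ValueError("log size needed when cycles match")
        return log_size_bytes - (head_bytes - tail_bytes)
    raise ValueError("head and tail positions are inconsistent")


if __name__ == "__main__":
    tail = 0xd000014d7   # log tail quoted above
    grant = 0xe00204d04  # reserve/grant head quoted above
    print(split_packed(tail), split_packed(grant))
    print(space_left(tail, grant))
```

Under those assumptions the tail sits at cycle 0xd, block 0x14d7 and the
grant head at cycle 0xe, byte 0x204d04, which leaves 614652 bytes free.
That would be consistent with a rename reservation slightly larger than
that waiting forever on a log that is otherwise idle.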