Still seeing hangs in xlog_grant_log_space
Dave Chinner
david at fromorbit.com
Wed May 9 02:35:07 CDT 2012
On Tue, May 08, 2012 at 08:59:44AM +1000, Dave Chinner wrote:
> On Sat, May 05, 2012 at 09:44:35AM +0200, Juerg Haefliger wrote:
> > >> But better would be to use the device number of the relevant
> > >> filesystem to filter the events. The device is 252:2, which means in
> > >> kernel terms is it:
> > >>
> > >> dev = (major << 20) | minor
> > >> = 0xfc00002
> > >>
> > >> So you should be able to get just the xfs-hang events via:
> > >>
> > >> # trace-cmd record -e xfs\* -d 'dev == 0xfc00002'
> > >>
> > >> and as long as you don't host log files on /xfs-hang, it'll only
> > >> record the workload running on the xfs-hang filesystem.
> > >
> > > Third try: https://region-a.geo-1.objects.hpcloudsvc.com:443/v1.0/AUTH_9630ead2-6194-40df-afd3-7395448d4536/xfs-hang/report-2012-04-27-180secs.tgz
> > > Filtered by device, trace events go to a different filesystem.
> >
> > Did anybody have a chance to look at the data?
>
> I've had a quick look, but I need to write scripts to visualise it
> (i.e. graph it) to determine if there's any pattern to the issue.
And, as expected, something unexpected popped out.
Judicious use of awk on the log space grant events shows an
interesting pattern occuring from time to time:
Transaction Wait queues Grant head Write Head Log head Log tail
-----------------------------------------------------------------------------------------
INACTIVE empty empty 118 438024 118 438024 118 802 118 697
INACTIVE empty empty 119 20240 119 20240 118 802 118 697
REMOVE empty empty 118 438456 118 438456 118 802 118 697
REMOVE empty empty 118 438772 118 438772 118 802 118 697
CREATE empty empty 119 35872 119 35872 118 802 118 697
FSYNC_TS active empty 119 205428 119 205428 118 802 118 697
FSYNC_TS active empty 119 202944 119 202944 118 802 118 697
FSYNC_TS active empty 119 200664 119 200664 118 802 118 697
REMOVE empty empty 118 380316 118 380316 118 724 118 652
INACTIVE empty empty 118 552532 118 552532 118 724 118 652
FSYNC_TS empty empty 118 382140 118 382140 118 724 118 652
INACTIVE empty empty 118 565968 118 565968 118 724 118 652
REMOVE empty empty 119 25404 119 25404 118 802 118 697
INACTIVE active empty 119 25580 119 25580 118 802 118 697
Anyone notice something fishing with the log head and tail?
That's right, they go *backwards* when a particular REMOVE
transaction is executed. That's, well, completely unexpected, and
completely breaks the assumptions made in the log space reservation
code. Essentially, the log tail ismoving backwards, and the head is
relative to the tail so moves with it.
That is *nasty*. It means that if this occurs at just the right
(wrong) time (i.e. just before a checkpoint), we can overwrite log
metadata and essentially corrupt the log. If we were to crash with
something like this active in the log, then log reocvery would fail
and or corrupt the filesystem. It is however, very hard to trigger a
crash while this conditions exists because, as you can see, it only
existed for 4 transactions - about 100uS according to the traces in
this case.
I think this has to do with how inode allocation/unlink buffers are
handled - we handle their position (via their LSN) in the AIL
specially, and I think that's affecting the in memory view of the
log tail and hence how much space is really available in the log. I
need to do more investigation to really understand the way it fails,
and write a reproducable test case, but I think I've found the
smoking gun....
Cheers,
Dave.
--
Dave Chinner
david at fromorbit.com
More information about the xfs
mailing list