Re: Still seeing hangs in xlog_grant_log_space

To: Juerg Haefliger <juergh@xxxxxxxxx>
Subject: Re: Still seeing hangs in xlog_grant_log_space
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 27 Apr 2012 09:07:38 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <CADLDEKs6oMDA-6OhmcFxyRoBVpduKtSput=53TQGn9NCAOXC1Q@xxxxxxxxxxxxxx>
References: <20120423143843.GN9541@dastard> <CADLDEKvFF3FvEHVtmwdWhbM58_jrCRX+Uk9vLBg1hA8sizh5BQ@xxxxxxxxxxxxxx> <20120423235840.GQ9541@dastard> <CADLDEKsfckBw2oVYFfaaTbpe8Ri+rYJr2e5SB7-pM0BU9nRUeA@xxxxxxxxxxxxxx> <20120424120731.GT9541@dastard> <CADLDEKs01GnxgYh2UTt1waVDUXHbB_RcBcUTBr5REFg5aD5jHA@xxxxxxxxxxxxxx> <20120425223845.GX9541@dastard> <CADLDEKvYkpUnMrqdMyqCmsYrZcUtiJ6ZRhrRu_ERTjn=r7M3Pg@xxxxxxxxxxxxxx> <20120426224412.GA9541@dastard> <CADLDEKs6oMDA-6OhmcFxyRoBVpduKtSput=53TQGn9NCAOXC1Q@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Apr 27, 2012 at 01:00:08AM +0200, Juerg Haefliger wrote:
> On Fri, Apr 27, 2012 at 12:44 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Thu, Apr 26, 2012 at 02:37:50PM +0200, Juerg Haefliger wrote:
> >> On Thu, Apr 26, 2012 at 12:38 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >> > On Tue, Apr 24, 2012 at 08:26:04PM +0200, Juerg Haefliger wrote:
> >> >> On Tue, Apr 24, 2012 at 2:07 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >> >> > On Tue, Apr 24, 2012 at 10:55:22AM +0200, Juerg Haefliger wrote:
> >> >> >> > Alright, then I need all the usual information. I suspect an event
> >> >> >> > trace is the only way I'm going to see what is happening. I just
> >> >> >> > updated the FAQ entry, so all the necessary info for gathering a
> >> >> >> > trace should be there now.
> >> >> >> >
> >> >> >> > http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> >> >> >>
> >> >> >> Very good. Will do. What kernel do you want me to run? I would prefer
> >> >> >> our current production kernel (2.6.38-8-server) but I understand if
> >> >> >> you want something newer.
> >> >> >
> >> >> > If you can reproduce it on a current kernel - 3.4-rc4 if possible, if
> >> >> > not a 3.3.x stable kernel would be best. 2.6.38 is simply too old to
> >> >> > be useful for debugging these sorts of problems...
> >> >>
> >> >> OK, I reproduced a hang running 3.4-rc4. The data is here but it's a
> >> >> whopping 2GB (yes it's compressed):
> >> >> https://region-a.geo-1.objects.hpcloudsvc.com:443/v1.0/AUTH_9630ead2-6194-40df-afd3-7395448d4536/xfs-hang/report-2012-04-24.tar
> >> >
> >> > That's a bit big to be useful, and far bigger than I'm willing to
> >> > download given that I'm on the end of a wet piece of string, not a
> >> > big fat intarwebby pipe.
> >>
> >> Fair enough.
> >>
> >>
> >> > I'm assuming it is the event trace
> >> > that is causing it to blow out? If so, just the 30-60s either side of
> >> > the hang first showing up is probably necessary, and that should cut
> >> > the size down greatly....
> >>
> >> Can I shorten the existing trace.dat?
> >
> > No idea, but that's likely the problem - I don't want the binary
> > trace.dat file. I want the text output of the report command
> > generated from the binary trace.dat file...
> 
> Well yes. I did RTFM :-) trace.dat is 15GB.

OK, that's a lot larger than I expected for a hung filesystem....

> >> I stopped the trace
> >> automatically 10 secs after the xlog_... trace showed up in syslog
> >> so effectively some 130+ secs after the hang occurred.

Can you look at the last timestamp in the report file, and trim off
anything from the start that is older than, say, 180s before that?
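
One way to do that trimming, as a rough sketch only: the script below
assumes the report is the plain-text output of "trace-cmd report", with
each event line shaped like "task-1234 [001] 5678.123456: event: ...".
The regular expression, the function names and the 180s default are
illustrative assumptions, not part of any XFS or trace-cmd tooling. It
reads the file twice so a multi-gigabyte report never has to fit in
memory.

```python
import re

# Assumed event line shape in the text report:
#   "  task-1234  [001]  5678.123456: event_name: details"
# The timestamp is the first "seconds.fraction:" after the "[cpu]" field.
TS_RE = re.compile(r"\]\s+(\d+\.\d+):")


def parse_ts(line):
    """Return the event timestamp in seconds, or None if the line has none."""
    m = TS_RE.search(line)
    return float(m.group(1)) if m else None


def last_timestamp(lines):
    """Return the timestamp of the last event line seen, or None."""
    last = None
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            last = ts
    return last


def trim_lines(lines, cutoff):
    """Yield header lines plus every event at or after the cutoff."""
    keep = True  # lines before the first event (e.g. a header) are kept
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            keep = ts >= cutoff
        # Lines without a timestamp follow the decision made for the
        # most recent event line.
        if keep:
            yield line


def trim_report(src, dst, window=180.0):
    """Write to dst only the last `window` seconds of the report in src."""
    with open(src, errors="replace") as f:
        last = last_timestamp(f)
    if last is None:
        raise ValueError("no timestamps found in " + src)
    with open(src, errors="replace") as f, open(dst, "w") as out:
        out.writelines(trim_lines(f, last - window))
```

Calling trim_report("report.txt", "report-trimmed.txt") would then leave
only the final 180s of events; both file names here are placeholders.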

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
