Re: Still seeing hangs in xlog_grant_log_space

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: Still seeing hangs in xlog_grant_log_space
From: Juerg Haefliger <juergh@xxxxxxxxx>
Date: Fri, 27 Apr 2012 11:04:33 +0200
Cc: xfs@xxxxxxxxxxx
On Fri, Apr 27, 2012 at 1:07 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Fri, Apr 27, 2012 at 01:00:08AM +0200, Juerg Haefliger wrote:
>> On Fri, Apr 27, 2012 at 12:44 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Thu, Apr 26, 2012 at 02:37:50PM +0200, Juerg Haefliger wrote:
>> >> On Thu, Apr 26, 2012 at 12:38 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> >> > On Tue, Apr 24, 2012 at 08:26:04PM +0200, Juerg Haefliger wrote:
>> >> >> On Tue, Apr 24, 2012 at 2:07 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> >> >> > On Tue, Apr 24, 2012 at 10:55:22AM +0200, Juerg Haefliger wrote:
>> >> >> >> > Alright, then I need all the usual information. I suspect an event
>> >> >> >> > trace is the only way I'm going to see what is happening. I just
>> >> >> >> > updated the FAQ entry, so all the necessary info for gathering a
>> >> >> >> > trace should be there now.
>> >> >> >> >
>> >> >> >> > http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>> >> >> >>
>> >> >> >> Very good. Will do. What kernel do you want me to run? I would prefer
>> >> >> >> our current production kernel (2.6.38-8-server) but I understand if
>> >> >> >> you want something newer.
>> >> >> >
>> >> >> > If you can reproduce it on a current kernel - 3.4-rc4 if possible, if
>> >> >> > not a 3.3.x stable kernel would be best. 2.6.38 is simply too old to
>> >> >> > be useful for debugging these sorts of problems...
>> >> >>
>> >> >> OK, I reproduced a hang running 3.4-rc4. The data is here but it's a
>> >> >> whopping 2GB (yes it's compressed):
>> >> >> https://region-a.geo-1.objects.hpcloudsvc.com:443/v1.0/AUTH_9630ead2-6194-40df-afd3-7395448d4536/xfs-hang/report-2012-04-24.tar
>> >> >
>> >> > That's a bit big to be useful, and far bigger than I'm willing to
>> >> > download given that I'm on the end of a wet piece of string, not a
>> >> > big fat intarwebby pipe.
>> >>
>> >> Fair enough.
>> >>
>> >>
>> >> > I'm assuming it is the event trace
>> >> > that is causing it to blow out? If so, just the 30-60s either side of
>> >> > the hang first showing up is probably necessary, and that should cut
>> >> > the size down greatly....
>> >>
>> >> Can I shorten the existing trace.dat?
>> >
>> > No idea, but that's likely the problem - I don't want the binary
>> > trace.dat file. I want the text output of the report command
>> > generated from the binary trace.dat file...
>>
>> Well yes. I did RTFM :-) trace.dat is 15GB.
>
> OK, that's a lot larger than I expected for a hung filesystem....
>
>> >> I stopped the trace
>> >> automatically 10 secs after the xlog_... trace showed up in syslog
>> >> so effectively some 130+ secs after the hang occurred.
>
> Can you look at the last timestamp in the report file, and trim off
> anything from the start that is older than, say, 180s before that?

I cut the trace down to 180 secs, which brought the file size down to
93MB:
https://region-a.geo-1.objects.hpcloudsvc.com:443/v1.0/AUTH_9630ead2-6194-40df-afd3-7395448d4536/xfs-hang/report-2012-04-24-180secs.tgz
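
For anyone who needs to do the same kind of trimming, here is a rough
Python sketch of the approach Dave describes: find the last timestamp in
the text report, then drop every event older than 180s before it. This is
not necessarily how the file above was produced. It assumes each event
line carries a "seconds.fraction:" timestamp field (as in the usual text
report output); the regex and the function names are made up for
illustration and may need adjusting for a real report.

```python
import re

# Assumption: event lines contain a timestamp like "  1234.567890: " before
# the event name. Adjust the pattern if the report uses a different format.
TS_RE = re.compile(r"\s(\d+\.\d+):\s")


def parse_ts(line):
    """Return the timestamp of an event line as a float, or None."""
    m = TS_RE.search(line)
    return float(m.group(1)) if m else None


def last_timestamp(lines):
    """First pass: return the last timestamp seen, or None if there is none."""
    last = None
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            last = ts
    return last


def trim_lines(lines, cutoff):
    """Second pass: yield lines at or after cutoff.

    Lines without a parsable timestamp (e.g. report headers) are kept.
    """
    for line in lines:
        ts = parse_ts(line)
        if ts is None or ts >= cutoff:
            yield line


def trim_report(src_path, dst_path, window=180.0):
    """Write only the last `window` seconds of src_path to dst_path."""
    with open(src_path, errors="replace") as src:
        last = last_timestamp(src)
    with open(src_path, errors="replace") as src, open(dst_path, "w") as dst:
        cutoff = float("-inf") if last is None else last - window
        dst.writelines(trim_lines(src, cutoff))
```

Both passes stream the file line by line, so the whole report never has to
fit in memory. Usage would be something like
trim_report("report.txt", "report-180secs.txt"), where both file names are
placeholders.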

...Juerg


> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
