Re: Still seeing hangs in xlog_grant_log_space

To: Ben Myers <bpm@xxxxxxx>
Subject: Re: Still seeing hangs in xlog_grant_log_space
From: Juerg Haefliger <juergh@xxxxxxxxx>
Date: Sat, 19 May 2012 09:28:55 +0200
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20120518171959.GQ16099@xxxxxxx>
References: <CADLDEKvYkpUnMrqdMyqCmsYrZcUtiJ6ZRhrRu_ERTjn=r7M3Pg@xxxxxxxxxxxxxx> <20120426224412.GA9541@dastard> <CADLDEKs6oMDA-6OhmcFxyRoBVpduKtSput=53TQGn9NCAOXC1Q@xxxxxxxxxxxxxx> <20120426230738.GB9541@dastard> <CADLDEKuKLeYiqhQW0E9g_bS0VXoxPGPOck3N004Pxg4_Opbzow@xxxxxxxxxxxxxx> <20120427110922.GF9541@dastard> <CADLDEKtUHAGcOPT1jtcvyJVk+zsoL5_thYFtHJYs+w=6EGuVSA@xxxxxxxxxxxxxx> <CADLDEKs4YbNzj2c0HKHwSdUfKy0efdQRe1rOsWDkWUgd+BOGHw@xxxxxxxxxxxxxx> <20120507171908.GA16881@xxxxxxx> <CADLDEKvgT_FcGhJKoPaQv0mh_Jqdaqu8SYatc9xxU7vOY217YQ@xxxxxxxxxxxxxx> <20120518171959.GQ16099@xxxxxxx>
Hi Ben,

> Hey Juerg,
>
> On Wed, May 09, 2012 at 09:54:08AM +0200, Juerg Haefliger wrote:
>> > On Sat, May 05, 2012 at 09:44:35AM +0200, Juerg Haefliger wrote:
>> >> Did anybody have a chance to look at the data?
>> >
>> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/979498
>> >
>> > Here you indicate that you have created a reproducer.  Can you post it to the list?
>>
>> Canonical attached them to the bug report that they filed yesterday:
>> http://oss.sgi.com/bugzilla/show_bug.cgi?id=922
>
> I'm interested in understanding to what extent the hang you see in production
> on 2.6.38 is similar to the hang of the reproducer.  Mark is seeing a situation
> where there is nothing on the AIL and is clogged up in the CIL, others are
> seeing items on the AIL that don't seem to be making progress.  Could you
> provide a dump or traces from a hang on a filesystem with a normal sized log?
> Can the reproducer hit the hang eventually without resorting to the tiny log?

I'm not certain that the reproducer hang is identical to the
production hang. One difference that I've noticed is that a reproducer
hang can be cleared with an emergency sync while a production hang
can't. I'm trying to get a trace from a production machine.
Any ideas on how to do the tracing without filling up the filesystem
with trace data? I need to run it for at least a week to catch a hang.
I was thinking of tracing in 15-minute batches and keeping only 30
minutes' worth of trace data, but that will leave gaps whenever I
stop and restart the tracing. I'm not familiar with ftrace; maybe it
provides some sort of ring-buffer dumping that keeps only the last
x minutes of data?
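
To make the batching idea concrete, here is a rough sketch of what I have
in mind. It reads one continuous text stream and rotates the output file
every 15 minutes, deleting all but the newest two files, so there is no
stop/restart gap between batches. The function name, file naming and
parameters are all made up for illustration, and I haven't tried this on a
production box.

```python
import os
import time
from collections import deque


def record_rolling(stream, out_dir, chunk_seconds=900, keep=2,
                   clock=time.monotonic):
    """Copy lines from `stream` into numbered chunk files in `out_dir`,
    starting a new file every `chunk_seconds` and keeping only the newest
    `keep` files. Returns the paths of the files left on disk."""
    os.makedirs(out_dir, exist_ok=True)
    kept = deque()       # chunk files still on disk, oldest first
    index = 0            # sequence number for the next chunk file
    out = None           # currently open chunk file
    chunk_start = None   # clock value when the current chunk was opened
    try:
        for line in stream:
            now = clock()
            # Rotation is only checked when a line arrives, so an idle
            # stream keeps the current chunk open past its deadline.
            if out is None or now - chunk_start >= chunk_seconds:
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, "trace.%06d.txt" % index)
                index += 1
                out = open(path, "w")
                chunk_start = now
                kept.append(path)
                # Drop the oldest chunks beyond the retention limit.
                while len(kept) > keep:
                    os.remove(kept.popleft())
            out.write(line)
    finally:
        if out is not None:
            out.close()
    return list(kept)
```

With the defaults this holds between 15 and 30 minutes of data at any time.
The stream could be something like an open handle on the ftrace trace_pipe
file, assuming debugfs is mounted at /sys/kernel/debug and the events of
interest are already enabled; that part is an assumption on my side.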

Thanks
...Juerg


> Regards,
>        Ben