Re: XFS hung on kernel

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: XFS hung on kernel
From: Ilia Mirkin <imirkin@xxxxxxxxxxxx>
Date: Sun, 18 Jul 2010 01:28:43 -0400
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20100718045702.GB6282@xxxxxxxxxxxxx>
References: <AANLkTilX3l8TbUztLStj_u9OqOZnBrsNQxmeV4DuBmYJ@xxxxxxxxxxxxxx> <20100718012033.GA18888@dastard> <20100718045702.GB6282@xxxxxxxxxxxxx>
Sender: ibmirkin@xxxxxxxxx
On Sun, Jul 18, 2010 at 12:57 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> On Sun, Jul 18, 2010 at 11:20:33AM +1000, Dave Chinner wrote:
>> So, back to the situation with the WARN_ON(). You're running
>> applications that are doing something that:
>>       a) is not supported;
>>       b) compromises data integrity guarantees;
>>       c) is not reliably reported; and
>>       d) might be causing hangs
>> Right now I'm not particularly inclined to dig into this further;
>> it's obvious the applications are doing something that is not
>> supported (by XFS or the generic page cache code), so this is the
>> first thing you really need to care about getting fixed if you value
>> your backups...
> While it's slightly crazy, it's also a pretty easy way for users to shoot
> themselves in the foot.  Unlike the generic filesystems with their
> simplistic i_mutex locking we have a way to assure this works properly
> in XFS with the shared/exclusive iolock, so I'm willing to look into
> this further.
> Ilia, would you be willing to test patches for this?

If by "this" you mean the WARN_ON's, no problem. It should be easy to
repro in a non-critical setup, although I haven't tried. If you mean
the hang, it will not be so easy to reproduce, as it has only happened
once so far.

I would also be happy to share the details of our setup, if you'd like
to be able to play with it directly yourself. With our setup, multiple
WARN_ON's happen every time we run a backup (last time I checked, it
was ~50, but I'm sure it varies).

As a last thought on the matter, I'm sure that this is all brought on
by our use of innodb_flush_method=O_DIRECT, which tells MySQL to use
direct I/O. Leaving that at its default, which just opens the files
normally (without O_DIRECT), will probably make all these issues go
away. I do not understand the details of how MySQL uses direct I/O, or
how that interacts with xtrabackup (which claims to work fine with
direct I/O, but who knows), well enough to declare that it's safe, so
I'm happy to accept Dave's advice to Not Do That.
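
For readers following along, a minimal sketch of the setting in
question. It assumes a standard my.cnf layout with the option in the
[mysqld] section; the file location and the surrounding comments are
illustrative and not taken from the setup discussed in this thread.

```ini
[mysqld]
# Tells InnoDB to open its data files with O_DIRECT (direct I/O),
# bypassing the page cache for those files.
innodb_flush_method = O_DIRECT

# To go back to the server default (regular, buffered opens),
# remove the line above or comment it out and restart mysqld.
```

innodb_flush_method is read at server startup, so a change only takes
effect after a restart.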
