
Re: XFS hung on kernel

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: XFS hung on kernel
From: Ilia Mirkin <imirkin@xxxxxxxxxxxx>
Date: Sun, 18 Jul 2010 16:17:41 -0400
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <AANLkTil7lvYW1awXbODVP5r6mXlTrlt4CCncvn5hU4b8@xxxxxxxxxxxxxx>
References: <AANLkTilX3l8TbUztLStj_u9OqOZnBrsNQxmeV4DuBmYJ@xxxxxxxxxxxxxx> <20100718012033.GA18888@dastard> <20100718045702.GB6282@xxxxxxxxxxxxx> <AANLkTil7lvYW1awXbODVP5r6mXlTrlt4CCncvn5hU4b8@xxxxxxxxxxxxxx>
Sender: ibmirkin@xxxxxxxxx
On Sun, Jul 18, 2010 at 1:28 AM, Ilia Mirkin <imirkin@xxxxxxxxxxxx> wrote:
> On Sun, Jul 18, 2010 at 12:57 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>> On Sun, Jul 18, 2010 at 11:20:33AM +1000, Dave Chinner wrote:
>>> So, back to the situation with the WARN_ON(). You're running
>>> applications that are doing something that:
>>>       a) is not supported;
>>>       b) compromises data integrity guarantees;
>>>       c) is not reliably reported; and
>>>       d) might be causing hangs
>>> Right now I'm not particularly inclined to dig into this further;
>>> it's obvious the applications are doing something that is not
>>> supported (by XFS or the generic page cache code), so this is the
>>> first thing you really need to care about getting fixed if you value
>>> your backups...
>> While it's slightly crazy, it's also a pretty easy way for users to shoot
>> themselves in the foot.  Unlike the generic filesystems with their
>> simplistic i_mutex locking we have a way to assure this works properly
>> in XFS with the shared/exclusive iolock, so I'm willing to look into
>> this further.
>> Ilia, would you be willing to test patches for this?
> If by "this" you mean the WARN_ON's, no problem. It should be easy to
> repro in a non-critical setup, although I haven't tried. If you mean
> the hang, it will not be so easy to reproduce, as it has only happened
> once so far.
> I would also be happy to share the details of our setup, if you'd like
> to be able to play with it directly yourself. With our setup, multiple
> WARN_ON's happen every time we run a backup (last time I checked, it
> was ~50, but I'm sure it varies).
> As a last thought on the matter, I'm sure that this is all brought on
> by our use of innodb_flush_method=O_DIRECT which tells mysql to use
> direct I/O. Leaving that at its default, which just opens the files
> normally, will probably make all these issues go away. I do not have a
> sufficient understanding of the details of how mysql uses direct io,
> and how it interacts with xtrabackup (which claims to work fine with
> direct io, but who knows) to be able to declare that it's safe, so I'm
> happy to accept Dave's advice to Not Do That.

In case you guys are interested, I've also opened a bug at
https://bugs.launchpad.net/percona-xtrabackup/+bug/606981. I never
quite bothered to _really_ understand all of the details of O_DIRECT
vs mmap vs read, so if I've misrepresented reality in the bug, feel
free to correct it there, or let me know and I'll try to straighten
things out.
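
For anyone else fuzzy on the three access paths mentioned above, here is
a minimal sketch that reads the same block of a file via a buffered
read(), via mmap, and via O_DIRECT. It is only an illustration under
stated assumptions: the function names, the 4096-byte block size, and
the temporary file are all hypothetical, and this is not a description
of how mysql or xtrabackup actually open their files. O_DIRECT is
Linux-specific and some filesystems reject it, so that path is allowed
to fail and report None.

```python
import mmap
import os
import tempfile

BLOCK = 4096  # assumed alignment; the real requirement depends on device and filesystem


def write_sample(payload=b"x" * BLOCK):
    """Create a throwaway file of exactly BLOCK bytes and return its path."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        return tmp.name


def read_three_ways(path):
    """Read the first BLOCK bytes of path via read(), mmap, and O_DIRECT."""
    results = {}

    # 1. Buffered read(): data is copied out of the page cache.
    with open(path, "rb") as f:
        results["read"] = f.read(BLOCK)

    # 2. mmap: page cache pages are mapped into this process's address space.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), BLOCK, access=mmap.ACCESS_READ) as m:
            results["mmap"] = m[:BLOCK]

    # 3. O_DIRECT: bypasses the page cache and needs an aligned buffer.
    results["direct"] = None
    if hasattr(os, "O_DIRECT"):
        try:
            fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        except OSError:
            fd = None  # filesystem does not support O_DIRECT
        if fd is not None:
            buf = mmap.mmap(-1, BLOCK)  # anonymous mapping is page-aligned
            try:
                n = os.readv(fd, [buf])
                results["direct"] = buf[:n]
            except OSError:
                pass  # alignment or filesystem restriction; leave as None
            finally:
                buf.close()
                os.close(fd)
    return results
```

Calling read_three_ways(write_sample()) on a quiet file should give the
same bytes on every path that succeeds. The case discussed in this
thread is the harder one, where one process does direct I/O while
another has the same file in the page cache or mapped.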

Ilia Mirkin
