Re: XFS crashes on VMs

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: XFS crashes on VMs
From: Shrinand Javadekar <shrinand@xxxxxxxxxxxxxx>
Date: Fri, 19 Jun 2015 11:34:36 -0700
Cc: "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CABppvi58MHbtmhx226yM+e1ntR+bPvLqk6h80guW=2cimCzSxg@xxxxxxxxxxxxxx>
References: <CABppvi4FZKTsu22uk6nOSaShJOUyWg7cO6h-i4YekF4MLKH8RQ@xxxxxxxxxxxxxx> <8BF495B8-F444-415A-B7BF-5E1961C75817@xxxxxxxxxxx> <CABppvi5ODWyjMw0pouAQ__SQ-aGb48PEyTCPU7Dt65Z1DhKmVA@xxxxxxxxxxxxxx> <556666B2.8060301@xxxxxxxxxxx> <54A83597-7AE8-4846-944F-EEE7E2AD8B3C@xxxxxxxxxxx> <CABppvi58MHbtmhx226yM+e1ntR+bPvLqk6h80guW=2cimCzSxg@xxxxxxxxxxxxxx>
I hit this problem again and captured the output of all the steps
while repairing the filesystem. Here's the crash:
http://pastie.org/private/prift1xjcc38s0jcvehvew

And the output of the xfs_repair steps (also attached if needed):
http://pastie.org/private/gvq3aiisudfhy69ezagw

Hope this can provide some insights.

-Shri

On Thu, May 28, 2015 at 11:08 AM, Shrinand Javadekar
<shrinand@xxxxxxxxxxxxxx> wrote:
> We'll try and reproduce this and capture the output of xfs_repair when
> it happens next. Will keep an eye on what else was happening in the
> infrastructure when it happens.
>
> FWIW, we've seen this in a local VMware environment as well as when we
> were running on Amazon EC2 instances. So it doesn't seem
> hypervisor-specific.
>
> On Wed, May 27, 2015 at 5:53 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>> And did anything else "interesting" happen prior to the detection?
>>
>>> On May 27, 2015, at 7:52 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>>>
>>> You'll need to try to narrow down how it happened.
>>>
>>> The hexdumps in the logs show what data was in the buffer; in one case it 
>>> was ascii, and was definitely not xfs metadata.
>>>
>>> Either:
>>>
>>> a) xfs wrote the wrong metadata - almost impossible, because we verify the 
>>> data on write in the same way as we do on read
>>>
>>> b) xfs read the wrong block due to other metadata corruption.
>>>
>>> c) something corrupted the storage after it was written
>>>
>>> d) the storage returned the wrong data on a read request ...
>>>
>>> e) ???
>>>
>>> Did you save the xfs_repair output?  That might offer more clues.
>>>
>>> Unless you can reproduce it, it'll be hard to come up with a definitive 
>>> root cause... can you try?
>>>
>>> -Eric
>>>
>>>
>>>> On 5/27/15 7:03 PM, Shrinand Javadekar wrote:
>>>> Thanks Eric,
>>>>
>>>> We ran xfs_repair and were able to get it back into a running state.
>>>> This is fine for test & dev but in production it won't be
>>>> acceptable. What other data do we need to get to the bottom of this?
>>>>
>>>>> On Wed, May 27, 2015 at 4:27 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>>>>> That's not a crash. That is xfs detecting on-disk corruption which likely 
>>>>> happened at some time prior. You should unmount and run xfs_repair, 
>>>>> possibly with -n first if you would like to do a dry run to see what it 
>>>>> might do.  If you get fresh corruption after a full repair, then that 
>>>>> becomes more interesting. It's possible that you have a problem with the 
>>>>> underlying block layer or it's possible that it is an xfs bug -  but I 
>>>>> think this is not something that we have seen before.
>>>>>
>>>>> Eric
>>>>>
>>>>>> On May 27, 2015, at 6:06 PM, Shrinand Javadekar 
>>>>>> <shrinand@xxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am running OpenStack Swift in a VM with XFS as the underlying
>>>>>> filesystem. This is generating a metadata-heavy workload on XFS.
>>>>>> Essentially, it is creating a new directory and a new file (256KB) in
>>>>>> that directory. This file has extended attributes of size 243 bytes.
>>>>>>
>>>>>> I am seeing the following two crashes of the machine:
>>>>>>
>>>>>> http://pastie.org/pastes/10210974/text?key=xdmfvaocvawnyfmkb06zg
>>>>>>
>>>>>> AND
>>>>>>
>>>>>> http://pastie.org/pastes/10210975/text?key=rkiljsdaucrk7frprzgqq
>>>>>>
>>>>>> I have only seen these when running in a VM. We have run several tests
>>>>>> on physical servers but have never seen these problems.
>>>>>>
>>>>>> Are there any known issues with XFS running on VMs?
>>>>>>
>>>>>> Thanks in advance.
>>>>>> -Shri
>>>>>>
>>>>>> _______________________________________________
>>>>>> xfs mailing list
>>>>>> xfs@xxxxxxxxxxx
>>>>>> http://oss.sgi.com/mailman/listinfo/xfs

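Eric's observation in the thread, that one logged buffer held ASCII rather than XFS metadata, can be checked mechanically. What follows is a minimal Python sketch, assuming the hexdump from the kernel log has already been converted back into raw bytes. The function name `classify_buffer` and the `XFS_MAGICS` table are illustrative only: the table is a partial list of block magic numbers, and the real XFS verifiers check much more than the first four bytes.

```python
import string

# Partial, illustrative table of 4-byte magic numbers that open
# common XFS metadata blocks. Not the complete set.
XFS_MAGICS = {
    b"XFSB": "superblock",
    b"XAGF": "AG free space header",
    b"XAGI": "AG inode header",
    b"ABTB": "free space btree (by block)",
    b"ABTC": "free space btree (by count)",
    b"IABT": "inode btree",
    b"BMAP": "block map btree",
    b"XD2B": "directory block",
    b"XD2D": "directory data block",
}

# Byte values that count as printable ASCII (includes whitespace).
PRINTABLE = set(string.printable.encode("ascii"))


def classify_buffer(buf: bytes) -> str:
    """Rough guess at what a dumped buffer contains (heuristic only)."""
    magic = buf[:4]
    if magic in XFS_MAGICS:
        return "xfs metadata: " + XFS_MAGICS[magic]
    if buf and all(b in PRINTABLE for b in buf):
        return "ascii text"
    return "unknown"


if __name__ == "__main__":
    print(classify_buffer(b"XFSB" + bytes(60)))
    print(classify_buffer(b"GET /v1/AUTH_test HTTP/1.1\r\n"))
```

A result of "ascii text" for a block that should hold metadata is consistent with options (b) through (d) in Eric's list, but it does not distinguish between them.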
Attachment: xfs_crash
Description: Binary data
