Re: XFS crashes on VMs

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: XFS crashes on VMs
From: Shrinand Javadekar <shrinand@xxxxxxxxxxxxxx>
Date: Thu, 28 May 2015 11:08:59 -0700
Cc: "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <54A83597-7AE8-4846-944F-EEE7E2AD8B3C@xxxxxxxxxxx>
References: <CABppvi4FZKTsu22uk6nOSaShJOUyWg7cO6h-i4YekF4MLKH8RQ@xxxxxxxxxxxxxx> <8BF495B8-F444-415A-B7BF-5E1961C75817@xxxxxxxxxxx> <CABppvi5ODWyjMw0pouAQ__SQ-aGb48PEyTCPU7Dt65Z1DhKmVA@xxxxxxxxxxxxxx> <556666B2.8060301@xxxxxxxxxxx> <54A83597-7AE8-4846-944F-EEE7E2AD8B3C@xxxxxxxxxxx>
We'll try and reproduce this and capture the output of xfs_repair when
it happens next. Will keep an eye on what else was happening in the
infrastructure when it happens.

FWIW, we've seen this in a local VMware environment as well as when we
were running on Amazon EC2 instances. So it doesn't seem to be
hypervisor-specific.
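
For capturing the xfs_repair output, something along these lines might do. This is only a rough sketch, not something taken from this thread: it assumes the filesystem is already unmounted, the helper names and the `/dev/sdb1` device path are made up for illustration, and the injectable `runner` parameter exists purely so the function can be exercised without touching a real device. The only xfs_repair option used is `-n` (no-modify mode).

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def dry_run_command(device):
    """Build an xfs_repair invocation in no-modify mode (-n)."""
    return ["xfs_repair", "-n", device]


def capture_dry_run(device, log_dir=".", runner=subprocess.run):
    """Run the dry run against an unmounted device and save its output.

    `runner` is a hypothetical hook (defaults to subprocess.run) so the
    function can be tested without a real block device.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    log_path = Path(log_dir) / f"xfs_repair-n-{stamp}.log"
    result = runner(
        dry_run_command(device),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep messages interleaved in one log
        text=True,
        errors="replace",          # tolerate non-UTF-8 bytes in the output
        check=False,               # a non-zero status is data here, not an error
    )
    log_path.write_text(f"exit status: {result.returncode}\n{result.stdout}")
    return log_path


if __name__ == "__main__":
    # Hypothetical device path; replace with the affected, unmounted device.
    print(capture_dry_run("/dev/sdb1"))
```

Keeping the exit status next to the text output means each log can be read on its own later, alongside notes on what else was going on in the infrastructure at the time.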

On Wed, May 27, 2015 at 5:53 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
> And did anything else "interesting" happen prior to the detection?
>> On May 27, 2015, at 7:52 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>> You'll need to try to narrow down how it happened.
>> The hexdumps in the logs show what data was in the buffer; in one case it 
>> was ascii, and was definitely not xfs metadata.
>> Either:
>> a) xfs wrote the wrong metadata - almost impossible, because we verify the 
>> data on write in the same way as we do on read
>> b) xfs read the wrong block due to other metadata corruption.
>> c) something corrupted the storage after it was written
>> d) the storage returned the wrong data on a read request ...
>> e) ???
>> Did you save the xfs_repair output?  That might offer more clues.
>> Unless you can reproduce it, it'll be hard to come up with a definitive root 
>> cause... can you try?
>> -Eric
>>> On 5/27/15 7:03 PM, Shrinand Javadekar wrote:
>>> Thanks Eric,
>>> We ran xfs_repair and were able to get it back into a running state.
>>> This is fine for test & dev, but in production it won't be
>>> acceptable. What other data do we need to get to the bottom of this?
>>>> On Wed, May 27, 2015 at 4:27 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>>>> That's not a crash. That is xfs detecting on-disk corruption which likely 
>>>> happened at some time prior. You should unmount and run xfs_repair, 
>>>> possibly with -n first if you would like to do a dry run to see what it 
>>>> might do.  If you get fresh corruption after a full repair, then that 
>>>> becomes more interesting. It's possible that you have a problem with the 
>>>> underlying block layer or it's possible that it is an xfs bug -  but I 
>>>> think this is not something that we have seen before.
>>>> Eric
>>>>> On May 27, 2015, at 6:06 PM, Shrinand Javadekar <shrinand@xxxxxxxxxxxxxx> 
>>>>> wrote:
>>>>> Hi,
>>>>> I am running Openstack Swift in a VM with XFS as the underlying
>>>>> filesystem. This is generating a metadata heavy workload on XFS.
>>>>> Essentially, it is creating a new directory and a new file (256KB) in
>>>>> that directory. This file has extended attributes of size 243 bytes.
>>>>> I am seeing the following two crashes of the machine:
>>>>> http://pastie.org/pastes/10210974/text?key=xdmfvaocvawnyfmkb06zg
>>>>> AND
>>>>> http://pastie.org/pastes/10210975/text?key=rkiljsdaucrk7frprzgqq
>>>>> I have only seen these when running in a VM. We have run several tests
>>>>> on physical server but have never seen these problems.
>>>>> Are there any known issues with XFS running on VMs?
>>>>> Thanks in advance.
>>>>> -Shri
>>>>> _______________________________________________
>>>>> xfs mailing list
>>>>> xfs@xxxxxxxxxxx
>>>>> http://oss.sgi.com/mailman/listinfo/xfs
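
The hexdump remark quoted above (a buffer that "was ascii, and was definitely not xfs metadata") suggests a quick triage check that could be applied to bytes copied out of such a dump. The following is a rough heuristic sketch only, not how xfs itself verifies buffers: the function names and the 0.95 threshold are assumptions chosen for illustration, and the magic list is deliberately partial (superblock, AGF and AGI headers only).

```python
import string

# Partial list of on-disk XFS magic values: superblock, AGF and AGI headers.
# Many other metadata block types exist and are not listed here.
KNOWN_MAGICS = (b"XFSB", b"XAGF", b"XAGI")

# Byte values Python considers printable ASCII (includes tab, CR and LF).
PRINTABLE = frozenset(string.printable.encode("ascii"))


def printable_fraction(buf):
    """Return the share of bytes in buf that are printable ASCII."""
    if not buf:
        return 0.0
    return sum(b in PRINTABLE for b in buf) / len(buf)


def looks_like_text(buf, threshold=0.95):
    """Guess whether a buffer holds plain text rather than XFS metadata.

    A buffer starting with one of the listed magic values is never
    reported as text; otherwise the printable share decides.
    """
    if buf.startswith(KNOWN_MAGICS):
        return False
    return printable_fraction(buf) >= threshold


if __name__ == "__main__":
    # Bytes would normally come from the logged dump, e.g. via bytes.fromhex().
    sample = bytes.fromhex("48656c6c6f2c20776f726c640a")
    print(looks_like_text(sample), printable_fraction(sample))
```

A high printable share only says the block does not resemble metadata; it does not distinguish between the causes listed in the thread (a misdirected read, later corruption of the storage, or the storage returning the wrong data).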
