
Re: XFS filesystem corruption

To: Julien FERRERO <jferrero06@xxxxxxxxx>
Subject: Re: XFS filesystem corruption
From: Ric Wheeler <rwheeler@xxxxxxxxxx>
Date: Wed, 06 Mar 2013 11:47:39 -0500
Cc: Emmanuel Florac <eflorac@xxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CAPcwv6wqv0b_CPqDpBfOwVDg23uBi=tpGQSy9XuH2uWS5oVMWQ@xxxxxxxxxxxxxx>
References: <CAPcwv6wZJSBtgF-L6KNSn6N6Y+wUZJFXdbcg+zYRwoaB2sDdjw@xxxxxxxxxxxxxx> <20130306161519.2c28d911@xxxxxxxxxxxxxx> <CAPcwv6wqv0b_CPqDpBfOwVDg23uBi=tpGQSy9XuH2uWS5oVMWQ@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130219 Thunderbird/17.0.3
On 03/06/2013 11:16 AM, Julien FERRERO wrote:
Hi Emmanuel

2013/3/6 Emmanuel Florac <eflorac@xxxxxxxxxxxxxx>:
On Wed, 6 Mar 2013 16:08:59 +0100 you wrote:

I am totally stuck and I really don't know how to reproduce the
corruption. I only know that the units are routinely power cycled by
operators while the fs is still mounted (no proper shutdown / reboot).
My guess is that the fs journal should handle this case and prevent
such corruption.
Wrong guess. It may or may not work, depending on a long list of
parameters, but basically not turning it off properly is asking for
problems and corruption. The problem will be tragically aggravated if
your hardware RAID doesn't have a battery-backed cache.

OK, but our server spends 95% of its time reading data and 5% of its
time writing data. We have a case of a server that did not write
anything at the time of failure (or during the entire uptime session).
Moreover, the failure affects files that were opened read-only or
weren't accessed at all at the time of failure. I don't think the H/W
RAID is the issue, since we see the same corruption on other setups
without H/W RAID.

Does the "ls" output with "???" look like fs corruption?


Caches can hold dirty data in volatile memory for a very long time. Even if you open a file in "read-only" mode, you still do a fair amount of writes to storage (access-time updates, for example). You can use blktrace or a similar tool to see just how much data is written.
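
As a rough illustration of the point about hidden writes, here is a minimal Python sketch that estimates write traffic from the kernel's own counters instead of blktrace. It is not something from this thread: it assumes a Linux system with /proc/diskstats and /proc/meminfo, the device name "sda" is a placeholder, and the helper names are made up for the example.

```python
"""Sketch: estimate how much a block device is written to, without blktrace."""
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def sectors_written(diskstats_text, device):
    """Return the cumulative 'sectors written' counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major minor name reads reads_merged sectors_read ms_reading
        #         writes writes_merged sectors_written ...
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[9])
    raise KeyError(f"device {device!r} not found")


def dirty_kib(meminfo_text):
    """Return the 'Dirty' figure from /proc/meminfo text (reported in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("Dirty:"):
            return int(line.split()[1])
    raise KeyError("Dirty: line not found")


def read_text(path):
    """Read a whole /proc file as text."""
    with open(path) as handle:
        return handle.read()


def bytes_written_during(device, seconds, read=read_text):
    """Sample the write counter twice and return the bytes written in between."""
    before = sectors_written(read("/proc/diskstats"), device)
    time.sleep(seconds)
    after = sectors_written(read("/proc/diskstats"), device)
    return (after - before) * SECTOR_BYTES


if __name__ == "__main__":
    # "sda" is a placeholder; substitute the device backing the filesystem.
    print("bytes written in 10 s:", bytes_written_during("sda", 10))
    print("dirty page cache (kB):", dirty_kib(read_text("/proc/meminfo")))
```

Running this on a box that is supposedly only reading should show whether the write counter still moves, and the Dirty figure gives a hint of how much data is sitting in the page cache waiting to be flushed. It says nothing about data held in a RAID controller or drive cache, which the kernel cannot see.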

As mentioned earlier, you must always unmount cleanly as a best practice. An operator who powers off with file systems still mounted needs to be educated or let go :)

Ric
