XFS filesystem corruption
Ric Wheeler
rwheeler at redhat.com
Wed Mar 6 10:47:39 CST 2013
On 03/06/2013 11:16 AM, Julien FERRERO wrote:
> Hi Emmanuel
>
> 2013/3/6 Emmanuel Florac <eflorac at intellique.com>:
>> On Wed, 6 Mar 2013 16:08:59 +0100, you wrote:
>>
>>> I am totally stuck and I really don't know how to reproduce the
>>> corruption. I only know that the units tend to be power-cycled by the
>>> operator while the fs is still mounted (no proper shutdown / reboot).
>>> My guess was that the fs journal should handle this case and avoid such
>>> corruption.
>> Wrong guess. It may or may not work, depending on a long list of
>> parameters, but basically not shutting down properly is asking for
>> problems and corruption. The problem will be dramatically aggravated if
>> your hardware RAID doesn't have a battery-backed cache.
>>
> OK, but our server spends 95% of its time reading data and 5% of its
> time writing data. We have a case of a server that did not write anything
> at the time of failure (or at any point during that uptime session).
> Moreover, the corruption affects files that were opened read-only or
> weren't accessed at all at the time of failure. I don't think the H/W
> RAID is the issue, since we see the same corruption on other setups
> without H/W RAID.
>
> Does the "ls" output with "???" look like fs corruption?
>
Caching can hold dirty data in a volatile cache for a very long time. Even if
you open a file in "read-only" mode, you still generate a fair amount of writes
to storage (for example, metadata updates such as file access times). You can
use blktrace or a similar tool to see just how much data is actually written.
As mentioned earlier, you must always unmount cleanly as a best practice. An
operator who powers off machines with mounted file systems needs to be educated
or let go :)
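As an aside, applications that do write can at least bound the damage from a
power cut by pushing their data out of the volatile caches themselves. A
minimal sketch (the path "/mnt/xfs/important.dat" is made up for illustration):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* "/mnt/xfs/important.dat" is a hypothetical path for this sketch. */
        int fd = open("/mnt/xfs/important.dat",
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        const char msg[] = "data that must survive a power cycle\n";
        if (write(fd, msg, sizeof(msg) - 1) < 0)
            perror("write");

        /* write() alone may leave the data sitting in the page cache or in
         * the drive/RAID controller cache; fsync() asks the kernel to push
         * it to stable storage. With a non-battery-backed write cache that
         * lies about completion, even this is not a hard guarantee. */
        if (fsync(fd) < 0)
            perror("fsync");

        close(fd);
        return 0;
    }

That only protects the application's own data, though; it is no substitute for
a clean unmount.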
Ric