Re: xfs caching and data loss

To: linux-xfs@xxxxxxxxxxx
Subject: Re: xfs caching and data loss
From: David J N Begley <d.begley@xxxxxxxxxx>
Date: Tue, 1 Feb 2005 05:02:44 +1100 (EST)
In-reply-to: <F62740B0EFCFC74AA6DCF52CD746242D01033823@iu-mssg-mbx05.exchange.iu.edu>
References: <F62740B0EFCFC74AA6DCF52CD746242D01033823@iu-mssg-mbx05.exchange.iu.edu>
Reply-to: David J N Begley <d.begley@xxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
On Mon, 31 Jan 2005, Wilkins, Vern wrote:

> I've read as much as I could find on XFS, and have used XFS extensively on
> production systems, for quite a long time, with virtually no problems
> whatsoever.  Still, I regularly hear comments about the increased risk of
> data loss with XFS, due to its aggressive caching.

I am coming at this same issue/question from a slightly different perspective;
I have almost no XFS experience, though I am considering deploying the file
system on some production machines (I am trying to find time to conduct some
realistic simulations to test performance of XFS versus others for our
applications).  Consequently, I am reading all I can about XFS's design and
people's real-world experiences with the file system.

> Could someone explain to me, is this a myth, something that carried over
> from older versions, or is it a reality.

Whilst more experienced users/designers are more qualified to comment here
than I, hopefully by putting my thoughts in writing someone can correct my
train of thought and in the process, help us both (and others).

Pretty much any system risks data loss due to unexpected power loss - this
could be because data sent from the app to the FS has not yet been flushed
from buffers/cache, and the same applies to the underlying device code (be it
real device drivers or pseudo drivers such as logical volume managers).  Add
to that the transmission of data between hardware components and, finally,
the successful writing of said data to a physical disk - all without
considering a physical disk fault caused by power loss.
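
To make those layers a little more concrete, here is a minimal Python sketch
of an application pushing its own data down as far as it can.  The function
name write_durably and the file name are made up for illustration; flush()
and os.fsync() are the standard calls.  Note that even after fsync returns,
a drive with a volatile write cache may still be holding the data, so this
narrows the window rather than closing it.

```python
import os
import tempfile

def write_durably(path, payload):
    """Write payload to path and ask the OS to push it towards the disk.

    Hypothetical helper for illustration only.
    """
    with open(path, "wb") as f:
        f.write(payload)      # data sits in the application's own buffer
        f.flush()             # application buffer -> kernel page cache
        os.fsync(f.fileno())  # kernel page cache -> storage device
    return os.path.getsize(path)

if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    demo_path = os.path.join(demo_dir, "demo.dat")
    print(write_durably(demo_path, b"important record\n"))
```

Without the flush/fsync pair, the data may sit in memory for some time after
write() returns, which is exactly the window a power loss exploits.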

In that sense, from what I can tell XFS risks data loss (data intended to be
sent to the disk that never made it before the power disappeared) - though
so does any other file system (only the amount varies).

> If it is a real risk, is the risk greater than with other filesystems such
> as reiser3, ext3, jfs, etc, and to what degree?

This is probably the more important issue - the risk of losing data on the
disk due to incomplete transactions (metadata updated but the data itself not
on the disk).  It is this issue which (I suspect) raises more concern in
admins' minds than any aggressive caching.

To use the ext3-related terminology, the main Linux journalling file systems
could be categorised into three types:

- data=journal;  highest theoretical protection, lowest theoretical
  performance except in certain circumstances;  all data and metadata
  written to journal before main file system

- data=ordered;  reasonable/high protection, reasonable/high performance;
  data forced to disk before the associated metadata is committed to the
  journal

- data=writeback;  lowest protection, highest theoretical performance;
  metadata and associated data written to disk in no guaranteed order
  (i.e., after a crash old data may appear in files)
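
The difference between "ordered" and "writeback" can be sketched with a toy
model in Python.  This is purely my own simplification, not how any real
journal works: each mode is reduced to the orders in which a "data" write and
its "metadata" write may reach the disk, and a crash may happen after any
prefix of that order.  The names ALLOWED_ORDERS, crash_states and
can_expose_stale_data are invented for the illustration.  data=journal is
left out because there both items go into the journal first.

```python
from itertools import permutations

# Toy model (an assumption for illustration): a mode is just the set of
# orders in which the two writes are allowed to reach the disk.
ALLOWED_ORDERS = {
    "ordered": [("data", "metadata")],
    "writeback": list(permutations(("data", "metadata"))),
}

def crash_states(order):
    """Every set of writes that could be on disk if power fails partway."""
    return [frozenset(order[:k]) for k in range(len(order) + 1)]

def can_expose_stale_data(mode):
    """True if some crash point leaves metadata on disk without its data."""
    return any(
        "metadata" in state and "data" not in state
        for order in ALLOWED_ORDERS[mode]
        for state in crash_states(order)
    )

if __name__ == "__main__":
    for mode in ALLOWED_ORDERS:
        print(mode, can_expose_stale_data(mode))
```

In this model "ordered" never reaches a state where the metadata is on disk
but the data is not, whereas "writeback" can.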

Obviously ext3 offers all three modes though defaults to "ordered".  I have
(through unintended experience) nothing but positive things to say about
ext3's reliability in ordered mode (even on ATA disks in a standard PC).

Reiser3 on standard Linux 2.4 kernels is more akin to ext3's "writeback" mode,
whilst on Linux 2.6 kernels it apparently also supports "ordered" mode (note,
some distributors, such as SuSE, have added "ordered" mode to Reiser3 on their
2.4-based systems).

This brings us to XFS.  From what I can tell, XFS is also more like ext3's
"writeback" mode - this is (in part) why under some circumstances, files will
appear to contain zeros for "no apparent reason" after an unclean
shutdown/restart (that is, metadata written to disk prior to the data itself).
From my reading of experiences posted on the Web, people seem to claim this
"data loss" happens more often with XFS than with other file systems - hence
the claim that XFS loses data more often (though as above, this is probably
more due to the order of updates being written to the disk than merely
"aggressive caching").
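
For what it is worth, an application can guard against this particular
symptom itself, whatever the file system does.  A common approach on POSIX
systems (sketched below in Python, as an illustration rather than anything
specific to XFS) is to write the new contents to a temporary file, fsync it,
and only then rename it over the old file.  The helper name
replace_file_safely is made up; the sketch assumes a POSIX system where a
directory can be opened and fsync'd.

```python
import os
import tempfile

def replace_file_safely(path, payload):
    """Replace path with payload via write-temp, fsync, rename.

    Hypothetical helper; assumes POSIX rename and directory fsync semantics.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())   # data reaches the disk before the rename
        os.replace(tmp_path, path)   # atomic rename on POSIX file systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)      # do not leave a stray temporary file
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)             # push the rename itself towards the disk
    finally:
        os.close(dir_fd)

if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    target = os.path.join(demo_dir, "config.txt")
    replace_file_safely(target, b"version 1\n")
    replace_file_safely(target, b"version 2\n")
    with open(target, "rb") as f:
        print(f.read())
```

The point of the ordering is that the name only ever refers to a file whose
data has already been forced out, so a crash should leave the old contents
or the new ones rather than a file full of zeros.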

So why would I bother even considering XFS?  Scalable performance - though
this is yet to be proven (after I complete my simulation tests).

> I don't see this being much of an issue on servers anyway, since who
> wouldn't have a production server on a UPS, but still it seems the data
> loss/caching issue is one of the only negatives (real or not), that I see
> regularly mentioned in regards to XFS.

Unfortunately the "just use a UPS" response is not as useful as you may think;
for example, the rooms into which I will have to place our servers all have
remote aircon alarms/monitoring, UPS, backup generators, etc.  None of this
protects you against a catastrophic UPS failure, or a tight-arse facilities
manager who won't provide you with access to the UPS' monitoring signals so
you can shut down cleanly before power is lost.

Of course, a real failure may occur only once a year or even more rarely, but
if it leads to the loss of important data then that's one loss too many -
hence people freaking out about XFS (even if the real incidents of loss are
few).

Sorry for the long reply, especially if I've managed to completely screw up
any details - but as above, I figure it's better to put my thoughts in writing
so someone can correct me and help us both.

Cheers..


-- 
"You win again, gravity!" - Zapp Brannigan (Futurama)

