Re: LWN article: ext4 and data loss

To: Linux XFS <xfs@xxxxxxxxxxx>
Subject: Re: LWN article: ext4 and data loss
From: pg_xf2@xxxxxxxxxxxxxxxxxx (Peter Grandi)
Date: Sun, 15 Mar 2009 14:26:42 +0000
In-reply-to: <200903142042.51574.Martin@xxxxxxxxxxxx>
References: <200903121239.35442@xxxxxx> <200903121514.12732.Martin@xxxxxxxxxxxx> <49B92423.4020708@xxxxxxxxxxx> <200903142042.51574.Martin@xxxxxxxxxxxx>
[ ... usual misunderstanding about caching and transactions ... ]

>>>> ext4 is taking its hints from XFS in this regard, not the
>>>> other way around.  XFS dealt with this long ago.

>>> Hmmm, I remember having had similar issues with XFS not too
>>> long ago,

>> depends on what you mean by not too long ago, I think.  Yes,
>> kde had this issue on xfs too, and xfs gave up on teaching
>> apps to fsync, and implemented the same sorts of things ext4
>> has done (or will do) to mitigate this quite some time ago.

> Well 2.6.28 and See
> http://oss.sgi.com/archives/xfs/2008-12/msg00540.html

>>> [ ... ] applications will have to get rid of behavioral
>>> assumptions regarding filesystems and use safe writing via
>>> fsync and whatever else for configuration and other
>>> important files.

>> It's simple.  Want your data safe on disk?  fsync. There's
>> not a lot more to it than that. (and if fsync hurts perf too
>> much, re-think how you are storing your data)

>> Filesystems can hack around some heuristics to try to make
>> unsafe apps safer, but in the end, it's the app's job to make
>> sure a buffered write hits permanent storage when it matters.

This discussion is partially misguided, but then how many people
study storage system semantics...

The goal is to do atomic transactions: within a transaction
there are no guarantees, but at the end of transaction things
get stored permanently.

Unfortunately, as described, 'ext3' has historically done
''rolling'' auto-saving, so many people and application
developers have not appreciated the need for transaction
semantics (a common attitude: how many programmers, for
example, check the return code of 'close'?).

Now under Linux and POSIX it is essentially impossible to do
atomic, persistent transactions, because:

* 'fsync' does NOT guarantee persistency. Only that *RAM*
  buffers are flushed; therefore host adapter and disk buffers
  are not required to be flushed.

* Linux write barriers also only guarantee ordering and not
  persistence, and there are a number of misguided people who
  think that this is how things should be.

> Hmmm, okay. So here is:
> http://bugs.kde.org/187172

In practice, for systems without caching host adapters, and with
'ext3', most of the time the informal ''rolling'' transactions
every 5s fool most people, or work as if they were right, and as
asserted this has lulled developers into thinking that
transactions don't matter. Too bad this kills performance and/or
reliability on anything else.

This is just another example of how much userspace sucks.

Note that in a proper design where 'fsync' would guarantee
persistence, as in every transactional system, lots of small
transactions have very sharp performance implications. People
who earn a living doing transactional systems therefore spend a
great deal of money and effort designing them to perform well
despite lots of small transactions, with 15k drives, vast
parallel RAID, battery-backed logs, etc.

You cannot have all of these:

* Reliable transactions.
* Fast with lots of small transactions.
* With cheap hardware.

In the end one must decide whether to follow the Microsoft
strategy (f*ck doing the right thing, cultivate bugs that users
are relying on) or the UNIX one (try to do the right thing).
