Re: Filesystem Consistency Issues

To: Linux-XFS Mailing List <linux-xfs@xxxxxxxxxxx>
Subject: Re: Filesystem Consistency Issues
From: pg_xfs@xxxxxxxxxxxxxxxxxx (Peter Grandi)
Date: Sun, 4 Dec 2005 23:41:42 +0000
In-reply-to: <20051204142506.GE2605@xxxxxxxxxxx>
References: <20051204142506.GE2605@xxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
>>> On Sun, 4 Dec 2005 22:25:06 +0800, Federico Sevilla III
>>> <jijo@xxxxxxxxxxx> said:

jijo> Hi, We use XFS (Linux kernel 2.6.12 on Debian 3.1 Sarge)
jijo> on fat client point-of-sale terminals.

Somewhat odd idea... XFS is designed for really large systems,
even if it supports small ones fairly decently.

jijo> [ ... ] The POS application regularly writes transaction
jijo> audit information and maintains a local cache of the
jijo> product database on the local drive, [ ... ]

jijo> [ ... 800MB used out of 20-40GB, but in two months the free
jijo> list is gone ... ] When I investigated one of the machines,
jijo> mounting the filesystem in read/write mode after having
jijo> booted from a rescue CD automatically fixed part of it,
jijo> freeing about 60% of the filesystem. Running xfs_repair
jijo> further freed up space by moving disconnected inodes to
jijo> lost+found.  I found that these machines were regularly
jijo> powered off without a proper shutdown, and presumably with
jijo> dirty data in the buffers. [ ... ]

jijo> I find XFS's behavior troubling, though.

Perhaps, but that is about as good as it goes.

jijo> First, that I had to mount the filesystem (the root
jijo> partition) from a rescue CD for the log replay to "fix"
jijo> things properly.

IIRC Debian Sarge mounts the root filesystem in a way that for
example does not necessarily trigger the mount count/time based
check. At least it did not happen on my mostly-Sarge PC.

jijo> Isn't it supposed to do this by itself on bootup?

This really depends on how the distribution handles that. I
would check the system log for any messages saying that checking
has been skipped, or should be done.

jijo> And second, that the filesystem's consistency needed
jijo> xfs_repair to completely repair things.

That's not surprising; several journaling filesystems don't
journal free list transactions, because they can reconstruct the
free list from a filesystem scan, and that is usually
cheaper/faster.

jijo> Data loss during incorrect shutdown is understandable and
jijo> acceptable, but we use a journaling filesystem like XFS in
jijo> particular so that filesystem consistency is guaranteed,
jijo> right?

I would guess that means consistent as far as it goes; that is,
the _visible_ parts of the filesystem metadata are consistent.
Things like the free list and unattached inodes are invisible to
user programs. Making sure that on restart they are fine means
either lots more overhead journaling, or running a full scan on
restart.

jijo> The systems are configured so that hdparm disables write
jijo> caching on the drives, [ ... ]

That's a good idea, but check it actually works; some drives
(especially 2.5" ones) don't support that or just ignore that
command...

  http://WWW.sabi.co.UK/Notes/anno05-3rd.html#050912

jijo> We don't have ECC RAM, though, since these are POS
jijo> terminals, not servers.

That's a different topic, and a pet peeve of mine: _everybody_,
not just servers, should use ECC RAM. There are good arguments
that even with reliable memory, as soon as it gets over around
64MiB things are dangerous, and at least _detection_ of RAM
errors should be there (and then parity can do ECC too).

But most PC buyers don't know, so it is hard to find chipsets
and motherboards that support it. Bah!

jijo> What's the known behavior of XFS as far as not being
jijo> properly unmounted on a regular basis is concerned?

Just like with any other modern file system, very bad things may
happen, more so because of the delayed allocation policy for
data blocks.

Applications that care about that should indeed use 'fsync' (and
perhaps on directories too [I wonder if it is supported] if
files are created/deleted) as you say:

jijo> I have a number of projects where this is a "way of life"
jijo> and where the best thing we can do on the application level
jijo> is to issue an fsync() after critical operations, [ ... ]

but 'sync' as a mount option is very likely a good idea; no
modern (post-MS-DOS/MS-Win 9x) file system is designed to behave
well without proper unmounting, except in 'sync' mode.
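
To make the 'fsync' suggestion concrete, here is a minimal sketch
in Python of an append that is forced to disk before returning.
The function name 'durable_append' and the audit log file name
are made up for illustration, not anything from the POS
application. The directory 'fsync' is attempted and any error
from it is only reported, since I am not sure every file system
supports it.

```python
import os
import tempfile


def durable_append(path, data):
    """Append bytes to path and fsync the file, then its directory.

    Returns True if the directory fsync also succeeded, False if the
    file system refused it (the file data itself was still fsync'ed).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        offset = 0
        while offset < len(data):
            # os.write may write fewer bytes than requested.
            offset += os.write(fd, data[offset:])
        os.fsync(fd)  # flush file data and metadata to the device
    finally:
        os.close(fd)

    # A newly created file also needs its directory entry on disk.
    dir_path = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
        return True
    except OSError:
        # Assumption: some file systems may not support this.
        return False
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "audit.log")  # hypothetical file name
        print(durable_append(log, b"sale 0001 committed\n"))
```

Note that this only helps if the drive really honours the flush,
which is the write caching issue mentioned earlier.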

There is a rather important issue for all journaled file
systems: how often is the in-memory journal written to disk?

With 'ext3' one can use the 'commit' mount parameter to set an
interval in seconds, while XFS and JFS are subject instead to
parameters like 'dirty_background_ratio' and 'dirty_ratio' for
the kernel 'pdflush' daemon.
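
To see what those two parameters are currently set to on a given
terminal, something like the following Python sketch may do. It
assumes the usual '/proc/sys/vm' location; the function name
'read_writeback_settings' is mine, and a value that cannot be
read is reported as None rather than guessed.

```python
import os

VM_DIR = "/proc/sys/vm"  # assumed standard location on Linux
NAMES = ("dirty_background_ratio", "dirty_ratio")


def read_writeback_settings(vm_dir=VM_DIR):
    """Return a dict of the writeback parameters named above.

    Each value is an int, or None if the file is missing or does
    not contain a plain integer.
    """
    settings = {}
    for name in NAMES:
        try:
            with open(os.path.join(vm_dir, name)) as f:
                settings[name] = int(f.read().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


if __name__ == "__main__":
    for name, value in read_writeback_settings().items():
        print(name, "=", value)
```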

But even setting the journal writing interval to as low as say 1
second (or equivalent) means that there is a 1 second window of
vulnerability, and in one second one can conceivably journal _a
lot_ of stuff, and bye-bye if someone powers off the machine at
that point.

There is probably no way around it other than '-o sync'.