
Filesystem Consistency Issues

To: Linux-XFS Mailing List <linux-xfs@xxxxxxxxxxx>
Subject: Filesystem Consistency Issues
From: Federico Sevilla III <jijo@xxxxxxxxxxx>
Date: Sun, 4 Dec 2005 22:25:06 +0800
Mail-followup-to: Linux-XFS Mailing List <linux-xfs@xxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.9i
Hi,

We use XFS (Linux kernel 2.6.12 on Debian 3.1 Sarge) on fat client
point-of-sale terminals. Right now we have four hypermarts with about 40
terminals each, all running the same setup. The POS application
regularly writes transaction audit information and maintains a local
cache of the product database on the local drive, which for simplicity
we've partitioned with a single root filesystem.

Recently we noticed a growing discrepancy between actual disk usage and
the free space the system reports. Most of the systems have 20GB to 40GB
hard drives but actually use only about 800MB in total. Disk usage as
reported by

    # du -csh /

stays at around 800MB, but disk usage as reported by

    # df -h

continuously increases until the filesystem reaches 100% utilization. It
only takes a couple of months of daily use (power-on in the morning,
power-off in the evening) for this to happen.
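
For what it's worth, the gap is easy to quantify over time. A minimal
sketch (the log file path is just a placeholder) that could be run once
a day from cron:

    # echo "$(date +%F) df-used=$(df -kP / | awk 'NR==2 {print $3}')K \
        du-used=$(du -skx / | awk '{print $1}')K" >> /var/log/space-check.log

On a healthy filesystem the two figures stay close; on these terminals
the df figure keeps climbing while the du figure does not.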

When I investigated one of the machines, mounting the filesystem
read/write after booting from a rescue CD automatically fixed part of
the problem, freeing about 60% of the filesystem. Running xfs_repair
freed further space by moving disconnected inodes to lost+found.
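
For the record, the sequence from the rescue CD was roughly the
following, assuming the root filesystem is on /dev/hda1 (the device
name varies per machine):

    # mount -t xfs /dev/hda1 /mnt    # mounting replays the dirty log
    # umount /mnt                    # xfs_repair wants it unmounted
    # xfs_repair /dev/hda1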

I found that these machines were regularly powered off without a proper
shutdown, and presumably with dirty data in the buffers. Of course the
"if it hurts, don't do it" rule applies here, and we're working to
correct this procedure by having store personnel shut down the machines
properly.

I find XFS's behavior troubling, though. First, that I had to mount the
filesystem (the root partition) from a rescue CD for the log replay to
"fix" things properly. Isn't it supposed to do this by itself on bootup?
And second, that xfs_repair was needed on top of the log replay to fully
restore the filesystem's consistency.
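
One thing I still need to do is look at the kernel log right after one
of these power-cycled boots, to see whether XFS even attempts recovery
of the root filesystem's log. Something as simple as

    # dmesg | grep -i xfs

right after boot should show whatever recovery messages there are, if
any.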

Data loss during an unclean shutdown is understandable and acceptable,
but we use a journaling filesystem like XFS precisely so that
filesystem consistency is guaranteed, right?

The systems are configured to have hdparm disable write caching on the
drives, so I've ruled that common mistake out. We don't have ECC RAM,
though, since these are POS terminals, not servers.
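
Concretely, each terminal runs something like the following at boot
(again, the device name varies), and querying the flag afterwards
confirms that write caching stays off:

    # hdparm -W 0 /dev/hda    # turn off the drive's write cache
    # hdparm -W /dev/hda      # report the current write-caching setting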

What is XFS's known behavior when it is regularly not unmounted cleanly?
I have a number of projects where this is a "way of life" and where the
best we can do at the application level is to issue an fsync() after
critical operations. I want to know whether I should stick by my
decision to use XFS, or start evaluating other filesystems.
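
To illustrate that last point, the crude shell-level equivalent of what
the application does would be something along these lines (the audit
log path and record variable are made up); in the POS code itself it is
an fsync() on the audit file's descriptor immediately after the write:

    # echo "$TXN_RECORD" >> /var/pos/audit.log
    # sync    # flushes all dirty buffers; the app uses per-file fsync()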

Thanks in advance for any insights.

Cheers!

 --> Jijo

-- 
Federico Sevilla III : jijo.free.net.ph : When we speak of free software
GNU/Linux Specialist : GnuPG 0x93B746BE : we refer to freedom, not price.

