
Re: Kernel OOPS and several filed nullified

To: Blake Matheny <matheny@xxxxxxxxxxx>
Subject: Re: Kernel OOPS and several filed nullified
From: Andi Kleen <ak@xxxxxxx>
Date: Thu, 24 Jan 2002 00:52:08 +0100
Cc: Seth Mos <knuffie@xxxxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: <20020123225816.GA43826@mail.dbaseiv.net>
References: <20020123193846.GA43044@mail.dbaseiv.net> <4.3.2.7.2.20020124001056.02b721b0@pop.xs4all.nl> <20020123225816.GA43826@mail.dbaseiv.net>
Sender: owner-linux-xfs@xxxxxxxxxxx
User-agent: Mutt/1.3.22.1i
On Wed, Jan 23, 2002 at 05:58:16PM -0500, Blake Matheny wrote:
> Eric Sandeen was good enough to email me a utility which looks for
> null'd files. That found a few of them. Most of /usr/local/lib
> (anything that had been open ~30 seconds before going to single user
> mode) got the whack. When the system oops'd I didn't have any files
> open. But it seems like anything that had been open up to 30 seconds
> before the crash was corrupted. 'Recovery' took place after reboot. Is
> there any chance that rmap11c, preempt or some of the hash patches I'm
> using is causing this behaviour? I was running XFS for months with no

The behaviour is 'designed' into XFS. XFS does very aggressive delayed
flushing of file data to get good extent allocation. On the other hand,
file system metadata like the file size is logged to a log on disk.
When a file is opened for rewrite, e.g. by an editor, it is first truncated
(file size set to 0, data discarded), and then when the editor writes it,
a file size update is put into the log. The data is not flushed yet, but
only later (this is needed to get very good IO performance). Now when the
log is flushed earlier than the data, you have a file with the old data
discarded and the size already set to the new size, but the new data not
flushed to disk. The result is a file with a 'hole' over the whole file
size. A hole is returned to you as zeroes. Log flushes usually happen a
lot more often than data flushes; a log flush can be triggered by a delete
or rename, while a data flush is only triggered after 30 seconds (the
default buffer flush time).

Basically it's the price you have to pay for the high performance with
bulk IO in XFS. 

One way to make the problem less visible is to change the buffer flush
delay. As you discovered, it is 30 seconds by default. It can be changed
via the /proc/sys/vm/bdflush sysctl; the 6th number there is the buffer
flush delay in jiffies (on i386 a jiffy is 10ms). For example, you could
set a buffer flush time of 5 seconds; this would force earlier file data
flushing. Of course it could also have a bad impact on your IO
performance; if you rely on fast IO you should benchmark first to check
that it doesn't cause too big a slowdown.


-Andi

