
Re: write-caching with XFS

To: Chris Parrott <chris.parrott@xxxxxxxxxxxx>
Subject: Re: write-caching with XFS
From: Steve Lord <lord@xxxxxxx>
Date: 07 Jan 2002 13:00:10 -0600
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <3C39DADF.8010700@xxxxxxxxxxxx>
References: <3C39DADF.8010700@xxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
On Mon, 2002-01-07 at 11:29, Chris Parrott wrote:
> 
> Greetings:
> 
> I have noticed a very strange phenomenon involving XFS with hardware 
> write-caching being active on Maxtor hard drives.  We have seen this on 
> both 80 GB and 120 GB drives, so it's not limited to any one drive model 
> in particular.  Maxtor turns on write-caching by default in their hard 
> drives.
> 
> We are working on a project which involves streaming live video data to 
> a large (approx. 78-118 GB, depending on the drive) partition formatted 
> with XFS.  As the data comes in, it is held in a ring buffer before 
> being dumped to the disk in fixed (approx. 99 KB) chunks.  With 
> write-caching turned on, dumping data to the XFS partition causes the 
> ring buffer to eventually overflow, resulting in periodic data loss. 
> However, if we turn off write-caching, the ring buffer never seems to 
> overflow.  It seems that the write calls just block longer with 
> write-caching turned on.  Unfortunately, the extra blocking time prevents 
> us from processing our data promptly enough to avoid buffer 
> overflows.
> 
> We had an engineer from Maxtor perform some IDE bus traces while data 
> was being spooled to the drive, and he could not find any indication 
> that drive performance itself was the culprit.  All of the I/O requests 
> to the drive itself were completed within the usual, expected durations 
> of time, once the corresponding IDE commands had been issued.
> 
> I tried another experiment, in which I replaced the XFS filesystem with 
> ReiserFS, to determine whether the problem was filesystem-related or 
> IDE-driver-related.  The ring buffer did not overflow when writing to the ReiserFS 
> partition.  (We cannot use ReiserFS in production, as we depend on some 
> features only available in XFS.)
> 
> We are using a 2.4.8 kernel, with the corresponding XFS patch applied. 
> This kernel has been heavily modified to support our product, so we 
> cannot easily upgrade to the very latest kernel revision.  Hence, we 
> have not been able to track all the subsequent XFS developments.
> 
> Does anyone know what might be going on in XFS to cause this sort of 
> behavior?  I am curious as to why the write requests to XFS would take 
> longer to complete with write-caching turned on.  I would like to keep 
> write-caching on, if at all possible, due to the overall performance gains.
> 
> Many thanks in advance,

You might consider this:

 Journaled filesystems rely on controlling the ordering of writes to the
 disk to maintain integrity. If a log write is reported by the device 
 driver as being on disk, then the filesystem assumes it is free to
 write out the metadata itself. Let's assume we have an operation that
 takes a block from the free space and assigns it to a file. We create
 a transaction to do this and write it to the log. Once the log write
 is completed, we allow the metadata to go out to disk. There are two
 chunks of metadata written independently.

 Let's assume write caching is on. We write the log record into the
 cache, it returns saying the data is safe, and we allow the metadata
 to go out. For some reason, one of the metadata writes makes it
 through the cache before the log write does. If you crash at this point,
 you have a corrupt filesystem. Unless Maxtor can guarantee that they
 never lose write-cached data in a power failure, you are on shaky
 ground here.
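If the data cannot tolerate the ordering risk described above, the conservative choice is to turn the drive's write cache off. A log write that the drive reports as complete is then really on the platter. This gives up the throughput gain of caching. The following is a minimal sketch that assumes an IDE drive hdparm can control: `hdparm -W0` disables the drive's write cache and `hdparm -W1` re-enables it. The wrapper name `set_write_cache` and the `DRY_RUN` switch are hypothetical. They exist only so the command can be checked without root or real hardware.

```shell
#!/bin/sh
# Hypothetical helper (not part of hdparm or XFS): toggle an IDE drive's
# write cache. hdparm -W0 disables the cache, -W1 enables it.
# With DRY_RUN=1 the command is printed instead of executed.
set_write_cache() {
    dev=$1
    state=$2
    case $state in
        on)  flag=-W1 ;;
        off) flag=-W0 ;;
        *)   echo "usage: set_write_cache DEVICE on|off" >&2
             return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "hdparm $flag $dev"
    else
        hdparm "$flag" "$dev"    # needs root and a real device
    fi
}
```

For example, `DRY_RUN=1 set_write_cache /dev/hda off` prints `hdparm -W0 /dev/hda`. Without `DRY_RUN`, the same call actually runs hdparm.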


As for why you are seeing this behavior, I am not sure, but the
XFS log is probably being written continually; it is a circular buffer
in the middle of the partition. If you have a spare spindle to experiment
with, create a filesystem with an external log and see how it behaves.

        mkfs -t xfs -f -l logdev=/dev/xxx,size=16384b /dev/yyy

        mount -t xfs -o logdev=/dev/xxx /dev/yyy /xfs

Where /dev/xxx is on a separate drive, so it does not share a write cache with /dev/yyy.
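The external-log experiment only tests anything if the log device and the data device do not share a write cache. In practice that means using different physical drives. The following is a minimal pre-flight sketch. It refuses to proceed when both arguments look like partitions of the same disk, and otherwise prints the mkfs and mount commands given above. It assumes traditional names such as /dev/hda3 or /dev/sdb1, where stripping the trailing partition number yields the whole disk. The helper names `disk_of` and `external_log_cmds` are hypothetical. Names like /dev/md0, or disks behind a shared RAID controller cache, are not handled.

```shell
#!/bin/sh
# Hypothetical helpers. Assumes /dev/hdXN or /dev/sdXN style names only.

disk_of() {
    # Strip trailing partition digits: /dev/hda3 -> /dev/hda
    printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

external_log_cmds() {
    logdev=$1
    datadev=$2
    mnt=$3
    if [ "$(disk_of "$logdev")" = "$(disk_of "$datadev")" ]; then
        echo "error: $logdev and $datadev are on the same disk" \
             "and would share its write cache" >&2
        return 1
    fi
    # Print (rather than run) the commands from the message.
    echo "mkfs -t xfs -f -l logdev=$logdev,size=16384b $datadev"
    echo "mount -t xfs -o logdev=$logdev $datadev $mnt"
}
```

For example, `external_log_cmds /dev/hdc1 /dev/hda3 /xfs` prints the two commands. `external_log_cmds /dev/hda1 /dev/hda3 /xfs` prints an error and returns 1.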

It is possible the log writes are causing pathological behavior in
the cache.
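A rough probe that mimics the reported workload can help compare the internal-log and external-log setups. The following is a minimal sketch that assumes GNU dd (for `conv=fsync` and `status=none`) and GNU date (for `%N`). It writes COUNT chunks of 99 KB, the chunk size from the original report, to a file. It forces each chunk to disk and prints the slowest write in milliseconds. The function name `write_latency_probe` is hypothetical. Each timing also includes the cost of starting a dd process, so treat the numbers as relative rather than absolute.

```shell
#!/bin/sh
# Hypothetical latency probe. Requires GNU dd and GNU date.
# Writes COUNT 99 KB chunks to FILE, one dd per chunk, each flushed
# with conv=fsync, and prints the largest per-chunk time in ms.
write_latency_probe() {
    file=$1
    count=$2
    max_ms=0
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$(date +%s%N)
        dd if=/dev/zero of="$file" bs=99k count=1 seek="$i" \
            conv=notrunc,fsync status=none || return 1
        end=$(date +%s%N)
        ms=$(( (end - start) / 1000000 ))
        if [ "$ms" -gt "$max_ms" ]; then
            max_ms=$ms
        fi
        i=$((i + 1))
    done
    echo "$max_ms"
}
```

One approach is to run, for example, `write_latency_probe /xfs/probe.dat 1000` on each filesystem with write caching enabled in both cases. Removing `fsync` from `conv=` measures only the buffered write path, which is closer to what a plain `write()` call sees.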

Steve

> 
> +chris
> 
> 
> Chris Parrott
> Linux Software Engineer
> Echostar Technologies Corp.
> 94 Inverness Terrace East
> Englewood, CO 80112
> phone: 303 706 5383 / fax: 303 799 6222
> e-mail: chris.parrott@xxxxxxxxxxxx
> 
-- 

Steve Lord                                      voice: +1-651-683-3511
Principal Engineer, Filesystem Software         email: lord@xxxxxxx

