
Re: write-caching with XFS

To: Steve Lord <lord@xxxxxxx>
Subject: Re: write-caching with XFS
From: Simon Matter <simon.matter@xxxxxxxxxxxxxxxx>
Date: Tue, 08 Jan 2002 08:13:54 +0100
Cc: Chris Parrott <chris.parrott@xxxxxxxxxxxx>, linux-xfs@xxxxxxxxxxx
Organization: Sauter AG, Basel
References: <3C39DADF.8010700@echostar.com> <1010430010.7120.45.camel@jen.americas.sgi.com>
Sender: owner-linux-xfs@xxxxxxxxxxx
Steve Lord wrote:
> 
> On Mon, 2002-01-07 at 11:29, Chris Parrott wrote:
> >
> > Greetings:
> >
> > I have noticed a very strange phenomenon involving XFS with hardware
> > write-caching being active on Maxtor hard drives.  We have seen this on
> > both 80 GB and 120 GB drives, so it's not limited to any one drive model
> > in particular.  Maxtor turns on write-caching by default in their hard
> > drives.
> >
> > We are working on a project which involves streaming live video data to
> > a large (approx. 78-118 GB, depending on the drive) partition formatted
> > with XFS.  As the data comes in, it is held in a ring buffer before
> > being dumped to the disk in fixed (approx. 99 KB) chunks.  With
> > write-caching turned on, dumping data to the XFS partition causes the
> > ring buffer to eventually overflow, resulting in periodic data loss.
> >  However, if we turn off write-caching, the ring buffer never seems to
> > overflow.  It seems that the write calls just block longer with
> > write-caching turned on.  Unfortunately, the extra blocking time keeps
> > us from processing our data promptly enough to prevent buffer
> > overflows.
> >
> > We had an engineer from Maxtor perform some IDE bus traces while data
> > was being spooled to the drive, and he could not find any indication
> > that drive performance itself was the culprit.  All of the I/O requests
> > to the drive itself were completed within the usual, expected durations
> > of time, once the corresponding IDE commands had been issued.
> >
> > I tried another experiment, in which I replaced the XFS filesystem with
> > ReiserFS, to determine whether the problem was filesystem- or IDE-driver-
> > related.  The ring buffer did not overflow when writing to the ReiserFS
> > partition.  (We cannot use ReiserFS in production, as we depend on some
> > features only available in XFS.)
> >
> > We are using a 2.4.8 kernel, with the corresponding XFS patch applied.
> >  This kernel has been heavily modified to support our product, so we
> > cannot easily upgrade to the very latest kernel revision.  Hence, we
> > have not been able to track all the subsequent XFS developments.
> >
> > Does anyone know what might be going on in XFS to cause this sort of
> > behavior?  I am curious as to why the write requests to XFS would take
> > longer to complete with write-caching turned on.  I would like to keep
> > write-caching on, if at all possible, due to the overall performance gains.
> >
> > Many thanks in advance,
> 
> You might consider this:
> 
>  Journaled filesystems rely on controlling the ordering of writes to the
>  disk to maintain integrity. If a log write is reported by the device
>  driver as being on disk, then the filesystem assumes it is free to
>  write out the metadata itself. Let's assume we have an operation which
>  takes a block from the free space and assigns it to a file. We create
>  a transaction to do this and write it to the log. Once the log write
>  is completed, we allow the metadata to go out to disk. There are two
>  chunks of metadata written independently.
> 
>  Let's assume write caching is on. We write the log record into the
>  cache, it returns saying the data is safe, we allow the metadata
>  to go out. For some reason, one of the metadata writes makes it
>  through cache before the log write does. If you crash at this point
>  you have a corrupt filesystem. Unless Maxtor can guarantee that they
>  never lose write cached data in a power failure you are on shaky
>  ground here.
> 
> As for why you are seeing the behavior you are, I am not sure, but the
> xfs log is probably being continually written to - a circular buffer
> in the middle of the partition. If you have a spare spindle to experiment
> with, create a filesystem with an external log and see how it behaves.
> 
>         mkfs -t xfs -f -l logdev=/dev/xxx,size=16384b /dev/yyy
> 
>         mount -t xfs -o logdev=/dev/xxx /dev/yyy /xfs
> 
> Where /dev/xxx does not share the write cache with /dev/yyy
> 
> It is possible the log writes are causing pathological behavior in
> the cache.
> 
> Steve

I ran some performance tests over the last few days because I was
wondering how badly XFS performs on SoftRAID. You didn't tell us whether
you are using some kind of RAID, so I assume you aren't?

The XFS FAQ says that XFS performs slightly worse than ext2 on software
RAID1 and RAID5. I wish we could update the FAQ (Seth?) to say it
performs really badly (if you don't take care of your logdev)! My test
program took about 40 minutes to complete on SoftRAID5, while without
RAID, or on hardware RAID, it finished in about 10 minutes. My setup was:

Dell PowerEdge 1400 with 4 U160 SCSI drives, a Dell RAID adapter with
64 MB cache (MegaRAID), onboard dual U160 Adaptec SCSI, PIII/800, and
256 MB RAM.
My test program is a mix of disk, CPU and network load: a few hundred
cp processes copying a large number of small files, with a bonnie
running in the background and some data being copied over NFS at the
same time; mixed load, as I said. I did it this way because when you
just compare bonnie runs or other single benchmarks you don't see
possible bottlenecks, but in a dirty mixed test you may find them. (A
rough sketch of the load is at the end of this mail.) From memory, the
results were roughly:

XFS  on Hardware RAID5 w/o write caching                : ~10 min
XFS  on Hardware RAID5 w   write caching                : ~13 min
EXT3 on Hardware RAID5 w/o write caching                : ~13 min
XFS  on Software RAID5 w/o write caching                : ~42 min
EXT3 on Software RAID5 w/o write caching                : ~12 min
XFS  on Software RAID5 w/o write caching,
        logdev on SoftRAID1 on the same disks           : ~10 min
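
The last variant is basically Steve's suggestion applied to software
RAID. Roughly, it was set up like this (the device names are only
examples; assume /dev/md0 is the RAID5 array and /dev/md1 a small RAID1
set on the same disks):

        mkfs -t xfs -f -l logdev=/dev/md1,size=16384b /dev/md0

        mount -t xfs -o logdev=/dev/md1 /dev/md0 /xfs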

As I understand it, we can say:
- Write caching does not always boost performance with XFS, and it is
very dangerous, as Steve mentioned before (see the hdparm note below).
- An external logdev can increase performance considerably under some
circumstances.
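
About the write cache itself: on IDE drives like the Maxtors mentioned
above it can usually be switched off with hdparm (I have not tried this
on those particular drives, so check the man page for your version):

        hdparm -W0 /dev/hda     (disable the drive's write cache)
        hdparm -W1 /dev/hda     (enable it again)

On SCSI drives it is a mode page setting (the WCE bit) instead, so
hdparm does not apply there.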

Sorry if my writing was confusing, but I wanted to share what I found
out after stressing my spindles for many hours.
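
For what it's worth, the load looked roughly like the sketch below. The
paths, counts and sizes are made up for illustration; the real script
is specific to our setup:

        #!/bin/sh
        # Rough sketch of the mixed load (illustrative values only).

        # A few hundred cp processes copying trees of small files.
        i=0
        while [ $i -lt 200 ]; do
                cp -a /data/smallfiles /xfs/copy$i &
                i=`expr $i + 1`
        done

        # A bonnie run on the same filesystem in the background.
        bonnie -d /xfs -s 512 &

        # Some data copied in over NFS at the same time.
        cp -a /mnt/nfs/stuff /xfs/nfs-copy &

        wait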

Simon

> 
> >
> > +chris
> >
> >
> > Chris Parrott
> > Linux Software Engineer
> > Echostar Technologies Corp.
> > 94 Inverness Terrace East
> > Englewood, CO 80112
> > phone: 303 706 5383 / fax: 303 799 6222
> > e-mail: chris.parrott@xxxxxxxxxxxx
> >
> --
> 
> Steve Lord                                      voice: +1-651-683-3511
> Principal Engineer, Filesystem Software         email: lord@xxxxxxx


