Re: External log size limitations

To: Andrew Klaassen <ak@xxxxxxxxxxx>
Subject: Re: External log size limitations
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 21 Feb 2011 08:14:10 +1100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <4D5E8FAD.9080802@xxxxxxxxxxx>
References: <4D5C1D77.1060000@xxxxxxxxxxx> <20110217003233.GH13052@dastard> <4D5E8FAD.9080802@xxxxxxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)
On Fri, Feb 18, 2011 at 10:26:37AM -0500, Andrew Klaassen wrote:
> Dave Chinner wrote:
> >The limit is just under 2GB now - that document is a couple of years
> >out of date - so if you are running on anything more recent than a
> >~2.6.27 kernel 2GB logs should work fine.
> Ah, good to know.
> >Data write speed or metadata write speed? What sort of write
> >patterns?
> A couple of hundred nodes on a renderfarm doing mostly compositing
> with some 3D.  It's about 80/20 read/write.  On the current system
> that we're thinking of converting - an Exastore version 3 system -
> browsing the filesystem becomes ridiculously slow when write loads
> become moderate, which is why snappier metadata operations are
> attractive to us.

OK, but I don't think that the metadata operations are becoming slow
because you are doing write operations - they are likely to be slow
due to doing _lots of IO_. That won't change with XFS....

> One thing I'm worried about, though, is moving from the Exastore's
> 64K block size to the 4K Linux blocksize limitation.  My quick
> calculation says that that's going to reduce our throughput under
> random load (which is what a renderfarm becomes with a couple of
> hundred nodes) from about 200MB/s to about 13MB/s with our
> 56x7200rpm disks.  It's too bad those large blocksize patches from a
> couple of years back didn't go through to make this worry moot.

How much data is actually being changed out of each of those 64k
blocks? Last time I analysed a compositing application, it was
reading full frames and textures, then writing only the modified
portions of the frames back to the server. Because these were small
sections of the frames, it was typically writing only a few KB at a
time per IO, with several write IOs and seeks for each region it was
working on. It was completely small random write bound, and while
XFS does OK at that sort of workload, it's not optimised for it like

IOWs the write bandwidth of XFS will be determined by how big these
IOs are, not the block size. It may be faster doing smaller IOs
because the 64k block size would probably require read-modify-write
cycles for this workload. XFS will still max out the disk IOPS under
this workload, so don't expect cold-cache metadata operations to be
miraculously faster than on your current system...
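
As a rough sketch of that arithmetic: for a seek-bound workload,
throughput is approximately IOPS times IO size. The figures below are
backed out of the numbers quoted above (200MB/s at 64K per IO on 56
disks); the function name is hypothetical, and the model assumes every
IO costs one seek, which ignores caching and request merging.

```python
KIB = 1024
MIB = 1024 * KIB


def random_throughput(iops: float, io_size_bytes: int) -> float:
    """Bytes per second for a seek-bound workload: IOPS x IO size."""
    return iops * io_size_bytes


# Aggregate random IOPS implied by the quoted estimate:
# 200MB/s when every IO moves one 64K block.
array_iops = 200 * MIB / (64 * KIB)
per_disk_iops = array_iops / 56  # spread over 56 x 7200rpm disks

print(f"implied: {array_iops:.0f} IOPS total, {per_disk_iops:.0f} per disk")

# The same IOPS budget at different IO sizes. The IO size the
# application issues sets the bandwidth, not the filesystem block size.
for io_kib in (4, 16, 64):
    mb_per_s = random_throughput(array_iops, io_kib * KIB) / MIB
    print(f"{io_kib:>2}K IOs: {mb_per_s:6.1f} MB/s")
```

If the application only writes a few KB per IO, it sits near the 4K
row whichever block size the filesystem uses; a 64K block size would
additionally pay for the read-modify-write of each partially written
block.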

> >As it is, 2GB is still not enough for preventing metadata writeback
> >for minutes if that is what you are trying to do.  Even if you use
> >the new delaylog mount option - which reduces log traffic by an
> >order of magnitude for most non-synchronous workloads - log write
> >rates can be upwards of 30MB/s under concurrent metadata intensive
> >workloads....
> Is there a rule-of-thumb to convert number of files being written to
> log write rates?  We push a lot of data through, but most of the
> files are a few megabytes in size instead of a few kilobytes.

Not really. Run your workload and measure it - XFS exports stats
that include the amount written to the journal. See:

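A minimal sketch of such a measurement, assuming the stats are read
from /proc/fs/xfs/stat, that the "log" line carries the journal
counters, and that its second counter (xs_log_blocks) is in 512-byte
units. Check those assumptions against the stats documentation for
your kernel. The function names are hypothetical.

```python
import time

SECTOR = 512  # assumed unit of the xs_log_blocks counter


def log_bytes_written(stat_text: str) -> int:
    """Return total bytes written to the journal, from XFS stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "log":
            # Assumed layout: log <writes> <blocks> <noiclogs> <force> <force_sleep>
            return int(fields[2]) * SECTOR
    raise ValueError("no 'log' line found in stats")


def log_write_rate(before: str, after: str, seconds: float) -> float:
    """Journal write rate in bytes/s between two stat snapshots."""
    return (log_bytes_written(after) - log_bytes_written(before)) / seconds


def seconds_to_fill(log_size_bytes: float, rate_bytes_per_s: float) -> float:
    """How long a log of the given size lasts at a steady write rate."""
    return log_size_bytes / rate_bytes_per_s


def sample(path: str = "/proc/fs/xfs/stat", interval: float = 10.0) -> float:
    """Snapshot the stats twice, `interval` seconds apart, under load."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return log_write_rate(before, after, interval)
```

Calling sample() while the real workload is running gives the rate;
seconds_to_fill(2 * 1024**3, rate) then shows how long a 2GB log would
last. At a steady 30MB/s that is a little over a minute. The counters
are global across all mounted XFS filesystems, so measure on a machine
where the filesystem of interest is the only busy one.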

> > filling a 2GB log means you'll
> >probably have tens of gigabytes of dirty metadata in memory, so
> >response to memory shortages can cause IO storms and severe
> >interactivity problems, etc.
> I assume that if we packed the server with 128GB of RAM we wouldn't
> have to worry about that as much.  But... short of that, would you
> have a rule of thumb for log size to memory size?  Could I expect
> reasonable performance with a 2GB log and 32GB in the server?  With
> 12GB in the server?


It's all dependent on your workload. Test it and see...

> I know you'd have to mostly guess to make up a rule of thumb, but
> your guesses would be a lot better than mine.  :-)
> >In general, I'm finding that a log size of around 512MB w/ delaylog
> >gives the best tradeoff between scalability, performance, memory
> >usage and relatively sane recovery times...
> I'm excited about the delaylog and other improvements I'm seeing
> entering the kernel, but I'm worried about stability.  There seem to
> have been a lot of bugfix patches and panic reports since 2.6.35 for
> XFS to go along with the performance improvements, which makes me
> tempted to stick to 2.6.34 until the dust settles and the kinks are
> worked out.  If I put the new XFS code on the server, will it stay
> up for a year or more without any panics or crashes?

If you are concerned about stability under heavy load in production
environments, then you should be running a well tested environment
such as RHEL or SLES. The latest and greatest mainline kernel is not
for you....


Dave Chinner
