Dave Chinner wrote:
> The limit is just under 2GB now - that document is a couple of years
> out of date - so if you are running on anything more recent than a
> ~2.6.27 kernel 2GB logs should work fine.
Ah, good to know.
> Data write speed or metadata write speed? What sort of write
> patterns?
A couple of hundred nodes on a renderfarm doing mostly compositing with
some 3D. It's about 80/20 read/write. On the current system that we're
thinking of converting - an Exastore version 3 system - browsing the
filesystem becomes ridiculously slow when write loads become moderate,
which is why snappier metadata operations are attractive to us.
One thing I'm worried about, though, is moving from the Exastore's 64K
block size to the 4K Linux block size limitation. My quick calculation
says that's going to reduce our throughput under random load (which is
what a renderfarm becomes with a couple of hundred nodes) from about
200MB/s to about 13MB/s with our 56 x 7200rpm disks. It's too bad those
large blocksize patches from a couple of years back didn't go through to
make this worry moot.
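
For what it's worth, here is the back-of-envelope arithmetic I used, as
a small Python sketch. The ~60 random IOPS per 7200rpm spindle is my own
assumption, not a measured figure:

# Rough estimate of aggregate throughput under fully random I/O.
# IOPS_PER_DISK is an assumption for a 7200rpm drive, not a measurement.
DISKS = 56
IOPS_PER_DISK = 60

def random_throughput_mib_s(block_size_bytes):
    """Aggregate MiB/s if every I/O is one random block of the given size."""
    return DISKS * IOPS_PER_DISK * block_size_bytes / (1024 * 1024)

for bs in (4 * 1024, 64 * 1024):
    print("%2dKiB blocks: ~%.0f MiB/s" % (bs // 1024, random_throughput_mib_s(bs)))
# -> 4KiB blocks: ~13 MiB/s, 64KiB blocks: ~210 MiB/s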
> Also, don't forget that data is not logged so increasing
> the log size won't change the speed of data writeback.
Yes, of course... that momentarily slipped my mind.
> As it is, 2GB is still not enough for preventing metadata writeback
> for minutes if that is what you are trying to do. Even if you use
> the new delaylog mount option - which reduces log traffic by an
> order of magnitude for most non-synchronous workloads - log write
> rates can be upwards of 30MB/s under concurrent metadata intensive
> workloads....
Is there a rule of thumb to convert the number of files being written
into a log write rate? We push a lot of data through, but most of the
files are a few megabytes in size instead of a few kilobytes.
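
To make the question concrete, this is the naive conversion I've been
doing; the bytes-of-log-traffic-per-operation figure is a placeholder
assumption - it's exactly the number I don't know:

# Naive conversion from metadata operation rate to log write rate.
# LOG_BYTES_PER_OP is a placeholder assumption, not a documented XFS figure.
LOG_BYTES_PER_OP = 4 * 1024

def log_write_rate_mib_s(metadata_ops_per_sec):
    """Estimated log bandwidth for a given metadata operation rate."""
    return metadata_ops_per_sec * LOG_BYTES_PER_OP / (1024 * 1024)

for ops in (1000, 5000, 10000):
    print("%5d ops/s -> ~%.1f MiB/s of log traffic"
          % (ops, log_write_rate_mib_s(ops)))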
> If you want a log larger than 2GB, then there are a lot of code
> changes needed in both kernel and userspace as the log arithmetic is
> all done via 32 bit integers and a lot of it is byte based.
Good to know.
> As it is, there are significant scaling issues with logs of even 2GB
> in size - log replay can take tens of minutes when a log full of
> inode changes has to be replayed,
We've got a decent UPS, so unless we get kernel panics, those tens of
minutes for an occasional unexpected hard shutdown should mean less lost
production time than the drag of slower metadata operations all the time.
> filling a 2GB log means you'll
> probably have tens of gigabytes of dirty metadata in memory, so
> response to memory shortages can cause IO storms and severe
> interactivity problems, etc.
I assume that if we packed the server with 128GB of RAM we wouldn't have
to worry about that as much. But... short of that, would you have a
rule of thumb relating log size to memory size? Could I expect reasonable
performance with a 2GB log and 32GB in the server? With 12GB in the server?
I know you'd have to mostly guess to make up a rule of thumb, but your
guesses would be a lot better than mine. :-)
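
Here's how I'm framing that question for myself, just turning your
"a 2GB log means tens of gigabytes of dirty metadata" comment into a
crude multiplier - the 5-10x range is only my reading of that remark:

# Crude estimate of dirty metadata pinned in memory when the log fills.
# The 5x-10x multiplier is only my reading of "tens of gigabytes for a
# 2GB log", not a documented XFS ratio.
def dirty_metadata_range_gib(log_size_gib, low=5, high=10):
    """Return (low, high) estimate of dirty metadata for a full log."""
    return log_size_gib * low, log_size_gib * high

for log_gib, ram_gib in ((2, 12), (2, 32), (0.5, 12)):
    lo, hi = dirty_metadata_range_gib(log_gib)
    print("%.1fGiB log, %dGiB RAM: ~%g-%gGiB dirty metadata at log-full"
          % (log_gib, ram_gib, lo, hi))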
> In general, I'm finding that a log size of around 512MB w/ delaylog
> gives the best tradeoff between scalability, performance, memory
> usage and relatively sane recovery times...
I'm excited about delaylog and the other improvements I'm seeing
entering the kernel, but I'm worried about stability. There seem to
have been a lot of bugfix patches and panic reports for XFS since
2.6.35 to go along with the performance improvements, which makes me
tempted to stick with 2.6.34 until the dust settles and the kinks are
worked out. If I put the new XFS code on the server, will it stay up
for a year or more without any panics or crashes?
Thanks for your great feedback. This is one of the things that makes
open source awesome.
Andrew