
Re: XFS use within multi-threaded apps

To: Peter Grandi <pg_xf2@xxxxxxxxxxxxxxxxxx>
Subject: Re: XFS use within multi-threaded apps
From: Angelo McComis <angelo@xxxxxxxxxxx>
Date: Sat, 23 Oct 2010 16:59:06 -0400
Cc: Linux XFS <xfs@xxxxxxxxxxx>
In-reply-to: <19651.15840.54770.942761@xxxxxxxxxxxxxxxxxx>
References: <AANLkTi=w1o8EF6-M7o8Qi9VpY-10m+MCR8U+K1_Aze=g@xxxxxxxxxxxxxx> <87eibm4xon.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <AANLkTikqHvSmr=WSZ3t8m3wKOGcUmQQ1v9Qx9MqGArS+@xxxxxxxxxxxxxx> <19651.15840.54770.942761@xxxxxxxxxxxxxxxxxx>
On Sat, Oct 23, 2010 at 3:56 PM, Peter Grandi <pg_xf2@xxxxxxxxxxxxxxxxxx> wrote:
>>> I have a use case where I'd like to forward the use of XFS. This is for
>>> large (multi-GB, say anywhere from 5GB to 300GB) individual files, such as
>>> what you'd see under a database's data file / tablespace.

>> Step 1) Use XFS.
>> Nothing, and I do mean nothing comes close to reliability and consistent
>> performance.

> I have been running iozone benchmarks, [ ... ]

I think that it is exceptionally difficult to get useful results
out of Iozone...

True - the benchmarks themselves don't tell a complete story. Specific to iozone, I was basically comparing XFS to EXT3 and showing the results (various record sizes, file sizes, and worker thread counts). The only true benchmark is to run the application in the way that is characteristic of how it will be used. Database benchmarks themselves vary greatly between use cases: generic data lookups (random reads), data warehouse analytics (sequential reads), ETL (sequential reads, sequential writes), etc.
>>> My database vendor (who, coincidentally markets their own
>>> filesystems and operating systems) says that there are
>>> certain problems under XFS with specific mention of
>>> corruption issues, if a single root or the metadata become
>>> corrupted, the entire filesystem is gone,

If that's bad enough it applies to any file system out there
except FAT and Reiser, as they store some metadata with each
block. ZFS and BTRFS may have something similar. But it is not
an issue.

>>> and it has performance issues on a multi-threaded workload,
>>> caused by the single root filesystem for metadata becoming a
>>> bottleneck.

That's actually more of a problem with Lustre, in extreme cases.

>> XFS has anything but performance problems on multithreaded
>> workloads. It is *the* best of the Linux filesystems
>> (actually... possibly any file system anywhere) for
>> multithreaded IO.

That's actually multithreaded IO to the same file, for
multithreaded IO to different files JFS (and allegedly 'ext4') are
also fairly good.
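For anyone unsure what "multithreaded IO to the same file" looks like in practice, here is a minimal sketch of several threads writing disjoint regions of one file with positional writes. The names `write_region`, `parallel_fill`, and `CHUNK` are made up for illustration, and this says nothing about how any particular DBMS does it:

```python
import os
import tempfile
import threading

CHUNK = 4096  # bytes written by each worker (illustrative size, an assumption)

def write_region(fd, index):
    # os.pwrite takes an explicit offset, so the threads do not share
    # (or race on) a single file position.
    payload = bytes([index]) * CHUNK
    os.pwrite(fd, payload, index * CHUNK)

def parallel_fill(path, workers):
    """Have `workers` threads (at most 256 here) each fill one region of `path`."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        threads = [threading.Thread(target=write_region, args=(fd, i))
                   for i in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        os.fsync(fd)  # push the data to stable storage before closing
    finally:
        os.close(fd)
    return os.path.getsize(path)

if __name__ == "__main__":
    tmp_fd, tmp_path = tempfile.mkstemp()
    os.close(tmp_fd)
    print(parallel_fill(tmp_path, 4))
    os.remove(tmp_path)
```

This only shows the access pattern. How well it scales depends on the file system's locking for concurrent writers to one file, which is the point under discussion here.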

> Well - I mentioned it above. Their current recommendation for
> Linux is to stick with ext3... and for big file/big IO
> operations, switch to ext4.

That's just about because those are the file systems that are
"qualified", and 'ext3' defaults give the lowest risks in case the
application environment is misdesigned and relies on 'O_PONIES'.

> [ ... ] "well, ext3 has problems whenever the kernel journal
> thread wakes up to flush under heavy I/O,

That actually happens with every file system, and it is one of
several naive misdesigns in the Linux IO subsystem. The default
Linux page cache flusher parameters are often too "loose" by 1-2
orders of magnitude, and this can cause serious problems. Never
mind that the page cache

In any case the Linux page cache itself is also a bit of a joke, and
(hopefully) a DBMS will not use it anyhow, but use direct IO, and
XFS is targeted at direct IO, large file, multistreaming loads.
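To put rough numbers on the "loose" flusher remark, here is a back-of-the-envelope sketch of how much dirty data the percentage-based `vm.dirty_background_ratio` and `vm.dirty_ratio` sysctls allow to accumulate. The helper names, the 10/20 percent values, the 100 GiB memory size, and the 200 MiB/s device rate are all assumptions for illustration. The kernel applies the percentages to dirtyable memory rather than total RAM, so treat this as an approximation only:

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def dirty_thresholds(dirtyable_bytes, background_ratio, dirty_ratio):
    """Return (background_bytes, hard_limit_bytes) for percentage settings."""
    background = dirtyable_bytes * background_ratio // 100
    hard_limit = dirtyable_bytes * dirty_ratio // 100
    return background, hard_limit

def seconds_to_drain(dirty_bytes, device_bytes_per_sec):
    """Time to write back `dirty_bytes` at a steady device rate."""
    return dirty_bytes / device_bytes_per_sec

if __name__ == "__main__":
    # Assumed example: 100 GiB of dirtyable memory, ratios of 10% and 20%.
    background, hard_limit = dirty_thresholds(100 * GIB, 10, 20)
    print(background // GIB, hard_limit // GIB)
    # Assumed device rate of 200 MiB/s for draining the hard limit.
    print(seconds_to_drain(hard_limit, 200 * MIB))
```

With these assumed figures, tens of gigabytes can be dirty before writers are throttled, and draining that takes minutes. Direct IO sidesteps this by bypassing the page cache.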

Peter, and others:
Thanks for this great discussion. I appreciate the thought that went into all of the replies.

In the end, we had a sit-down discussion with our vendor. They admitted that they "support" XFS, but have very few customers using it (said they can count them on one hand), and when I pressed them on whether it's a technology limitation, they threw down the gauntlet and said "look, we're giving you our frank recommendation here. EXT3." They cited having 10+TB databases running OLTP transactions on XFS, with 4-5GB/sec sustained throughput to the disk system, and 20-30TB for data warehouse type operations. When pressed about the cache flush issue, they mentioned they use direct IO under ext3, and it's not an issue in that case.

In doing my research, I searched for references to other Fortune nn-sized companies who use this DB and use XFS underneath it. I came up empty-handed... I searched my network for large-ish companies using XFS, and how they were using it. I'm not sure if we're bordering on "secret sauce" type stuff here, but I had an extremely difficult time getting enterprise references to back up the research I've done.

For our use, we had to opt to follow the vendor recommendation, and it came down to not wanting to be one of those that they can count on one hand using XFS with their product.

I'm still confounded as to why XFS, when it is technically superior in these cases, is so obscure. Are Enterprise Linux guys just not looking this deep under the covers to uncover performance enhancements like this? Is it because Red Hat didn't include the XFS tools in the distro until recently, so XFS was never one of the install-time choices? Are other Linux folks "next, next, finish..." people when it comes to how they install? I really don't get it.

Thanks for all the discussion, folks. I hope to put forth other use cases as they surface.
