On Mon, Oct 18, 2010 at 09:42:04AM -0400, Angelo McComis wrote:
> Apologies but I am new to this list, and somewhat new to XFS.
> I have a use case where I'd like to forward the use of XFS. This is for
> large (multi-GB, say anywhere from 5GB to 300GB) individual files, such as
> what you'd see under a database's data file / tablespace.
Yup, perfect use case for XFS.
> My database vendor (who, coincidentally markets their own filesystems and
> operating systems) says that there are certain problems under XFS with
> specific mention of corruption issues, if a single root or the metadata
> become corrupted, the entire filesystem is gone,
Yes, they are right about detected metadata corruption causing a
filesystem _shutdown_, but that does not mean that a metadata
corruption event will cause your entire filesystem to disappear.
Besides, the worst case for _any_ filesystem is that it gets
corrupted beyond repair and you have to restore from backups,
so you still have to plan for this eventuality when dealing with
disaster recovery scenarios.
What they neglect to mention is that XFS has a lot of metadata
corruption detection code, and shuts down at the first detection to
prevent the filesystem from being further damaged before a repair
process can be run. Apart from btrfs, XFS has the best run-time
metadata corruption detection of any filesystem in Linux, and even
so there are plans to improve that over the next year or so....
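To make the "repair process" concrete, here's a minimal sketch of
driving xfs_repair from a script in its no-modify (check-only) mode.
The image path is hypothetical, and it assumes xfsprogs is installed;
-n (no modify) and -f (target is a regular file image) are real
xfs_repair flags.

```python
# Sketch: dry-run check of an XFS filesystem image with xfs_repair.
# Assumes xfsprogs is installed; the image path is hypothetical.
import shutil
import subprocess

def check_xfs(target):
    """Run xfs_repair in no-modify mode; True if no corruption found,
    False if problems were reported, None if xfsprogs is unavailable."""
    if shutil.which("xfs_repair") is None:
        return None  # xfsprogs not installed
    # -n: no modify (check only); -f: target is a regular file image
    result = subprocess.run(["xfs_repair", "-n", "-f", target],
                            capture_output=True)
    return result.returncode == 0

print(check_xfs("/var/tmp/xfs.img"))
```

Dropping -n would perform the actual repair; on a real device you'd
pass the block device path instead of an image file.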
> and it has performance
> issues on a multi-threaded workload, caused by the single root filesystem
> for metadata becoming a bottleneck.
Single root design has nothing to do with performance on
multithreaded workloads. However, XFS really isn't a single-root
design. While it has a single root for the _directory structure_,
the allocation subsystem has a root per allocation group and hence
allocation operations can occur in parallel in XFS.
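As a toy illustration of the workload that per-AG allocation serves
(plain Python, hypothetical file names, nothing XFS-specific in the
code itself): several threads extending separate files at once. On
XFS, blocks for different files can be allocated from different
allocation groups, so these writes need not serialise on one global
allocator.

```python
import os
import tempfile
import threading

def writer(path, nbytes):
    # Each thread extends its own file; on XFS the block allocations
    # for different files can be satisfied by different allocation
    # groups rather than funnelling through a single allocator.
    with open(path, "wb") as f:
        f.write(b"\0" * nbytes)
        f.flush()
        os.fsync(f.fileno())

tmpdir = tempfile.mkdtemp()
paths = [os.path.join(tmpdir, "datafile%d" % i) for i in range(8)]
threads = [threading.Thread(target=writer, args=(p, 1 << 20))
           for p in paths]
for t in threads:
    t.start()
for t in threads:
    t.join()

sizes = [os.path.getsize(p) for p in paths]
print(sizes)
```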
Hence the only points of serialisation for most operations are the
individual directory being operated on and the journalling subsystem.
Simultaneous directory modifications are not something that
databases (or any application) do very often, so that point of
serialisation is not something you're ever likely to hit. Besides,
this serialisation is a limitation of the Linux VFS, not something
specific to XFS. Similarly, databases don't do a lot of metadata
operations, so the journalling subsystem won't be a bottleneck.
Databases do large amounts of _data IO_ to and from files, and that
is what XFS excels at. Especially if the database is using direct
IO, because then XFS allows concurrent read and write access to the
file, so the only limitations on throughput are the storage subsystem
and the database itself...
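Here's a rough sketch of that access pattern (invented block layout,
nothing XFS-specific in the code): a database-style process reading
and writing one file concurrently at distinct offsets with positional
I/O. With ordinary buffered I/O, most Linux filesystems still take
the inode lock exclusively for each write; the point above is that
under XFS direct IO (O_DIRECT, with properly aligned buffers) this
same pattern can run with shared locking, so the threads genuinely
proceed in parallel.

```python
import os
import tempfile
import threading

BLOCK = 4096  # toy page size

path = os.path.join(tempfile.mkdtemp(), "tablespace.dat")
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
os.pwrite(fd, b"\0" * BLOCK * 4, 0)  # pre-allocate four pages

def db_writer(block_no):
    # Positional write: no shared file offset, so threads never race
    # on lseek(). This is how a database typically rewrites a page.
    os.pwrite(fd, bytes([block_no]) * BLOCK, block_no * BLOCK)

def db_reader(block_no, out):
    out[block_no] = os.pread(fd, BLOCK, block_no * BLOCK)

results = {}
writers = [threading.Thread(target=db_writer, args=(i,)) for i in range(4)]
for t in writers:
    t.start()
for t in writers:
    t.join()

readers = [threading.Thread(target=db_reader, args=(i, results))
           for i in range(4)]
for t in readers:
    t.start()
for t in readers:
    t.join()

ok = all(results[i] == bytes([i]) * BLOCK for i in range(4))
print(ok)
os.close(fd)
```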
And FWIW, I've done nothing but improve multithreaded throughput for
metadata operations in XFS for the past few months, so the claims
your vendor is making really have no basis in reality.
> This feedback from the vendor is surely taken with a grain of salt as they
> have marketing motivations of their own product to consider.
> Surely, something like corruption and bottlenecks under heavy load /
> multi-threaded use would be a bug that would be addressed, right?
Yes, absolutely. Please ask the vendor to raise bugs for any issues
they have seen next time they say this to you.
> And surely, something like a BTree structure, with a root node, journaled
> metadata, etc. would be inherent in other filesystem choices as well, right?
> The vendor, in the end, did recommend ext4, but ext4 is not in my mainline
> Linux kernel as anything beyond "tech preview" at this point.
Oh, man, I almost spat out my coffee all over my keyboard when I
read that. I needed a good laugh this morning. :)
So what we have here is a classic case of FUD.
Your vendor's recommendation to use ext4 instead of XFS directly
contradicts their message not to use XFS. ext4 is exactly the same
as XFS in regard to the single root/metadata corruption design
issues, but ext4 does a much worse job of detecting corruption
at runtime compared to XFS.
ext4 is also immature, is pretty much untested in long-term
production environments and has developers that are already
struggling to understand and maintain the code because of the way it
has been implemented.
IOWs, your vendor is recommending a filesystem that is _inferior to XFS_.
That's a classic sales technique - level FUD at a competitor, then
recommend an inferior solution as the _better alternative_. The key
to this technique is that the alternative needs to be something that
the customer will recognise as not being viable for deployment in
business critical systems. So now the customer doesn't want to use
either, and they are ready for the "but we've got this really robust
solution and it only costs $$$" sucker-punch.
My best guess at the reason for such a carefully targeted sales
technique is that their database is just as robust and performs just
as well on XFS as it does on their own solutions that cost mega-$$$.
What other motivation is there for taking such an approach?