
Re: Is XFS suitable for 350 million files on 20TB storage?

To: Brian Foster <bfoster@xxxxxxxxxx>
Subject: Re: Is XFS suitable for 350 million files on 20TB storage?
From: Stefan Priebe <s.priebe@xxxxxxxxxxxx>
Date: Fri, 05 Sep 2014 22:14:51 +0200
Cc: "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140905191815.GB8400@xxxxxxxxxxxxxx>
References: <540986B1.4080306@xxxxxxxxxxxx> <20140905123058.GA29710@xxxxxxxxxxxxxxx> <5409AF40.10801@xxxxxxxxxxxx> <20140905134810.GA3965@xxxxxxxxxxxxxx> <5409FBEA.9050708@xxxxxxxxxxxx> <20140905191815.GB8400@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 05.09.2014 21:18, Brian Foster wrote:
...

On Fri, Sep 05, 2014 at 08:07:38PM +0200, Stefan Priebe wrote:
Interesting, that seems like a lot of free inodes. That's 1-2 million in
each AG that we have to search through each time we want to allocate an
inode. I can't say for sure that's the source of the slowdown, but this
certainly looks like the kind of workload that inspired the addition of
the free inode btree (finobt) to more recent kernels.
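
For reference, the per-AG free inode counts described above can be inspected roughly like this (assuming xfs_db is available; /dev/sdX stands in for the real device):

# xfs_db -r -c "agi 0" -c "print freecount" /dev/sdX    # free inodes in AG 0; repeat per AG

The finobt itself can only be selected at mkfs time and needs a reasonably new xfsprogs and kernel (roughly xfsprogs 3.2+ / kernel 3.16+), e.g.:

# mkfs.xfs -m finobt=1 /dev/sdX    # sketch only - this destroys the existing filesystem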

It appears that you still have quite a bit of space available in
general. Could you run some local tests on this filesystem to try and
quantify how much of this degradation manifests on sustained writes vs.
file creation? For example, how is throughput when writing a few GB to a
local test file?

Not sure if this is what you expect:

# dd if=/dev/zero of=bigfile oflag=direct,sync bs=4M count=1000
1000+0 records in
1000+0 records out
4194304000 bytes (4,2 GB) copied, 125,809 s, 33,3 MB/s

or without sync
# dd if=/dev/zero of=bigfile oflag=direct bs=4M count=1000
1000+0 records in
1000+0 records out
4194304000 bytes (4,2 GB) copied, 32,5474 s, 129 MB/s

> How about with that same amount of data broken up
> across a few thousand files?

This results in heavy kworker usage.

4 GB in 32 KB files
# time (mkdir test; for i in $(seq 1 1 131072); do dd if=/dev/zero of=test/$i bs=32k count=1 oflag=direct,sync 2>/dev/null; done)

...

55 min
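
If it helps to narrow down what the kworkers are busy with during that run, a rough profile could be captured with perf (assuming perf is installed for this kernel):

# perf record -a -g -- sleep 30    # sample all CPUs for 30s while the test is running
# perf report                      # look for XFS allocation/log paths under the kworkers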

Brian

P.S., Alternatively if you wanted to grab a metadump of this filesystem
and compress/upload it somewhere, I'd be interested to take a look at
it.
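
In case it is useful, a metadump is typically produced along these lines (the device path is a placeholder; xfs_metadump obfuscates most file and directory names by default, but checking the output before uploading is still advisable):

# xfs_metadump -g /dev/sdX /tmp/fs.metadump    # -g prints progress
# xz /tmp/fs.metadump                          # compress before uploading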

I think there might be file and directory names in it. If that's the case, I can't do it.

Stefan



Thanks!

Stefan



Brian

... as well as what your typical workflow/dataset is for this fs. It
seems like you have relatively small files (15TB used across 350m files
is around 46k per file), yes?

Yes - most of them are even smaller. And some files are > 5GB.

If so, I wonder if something like the
following commit introduced in 3.12 would help:

133eeb17 xfs: don't use speculative prealloc for small files

Looks interesting.
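
As a rough sanity check, one could verify whether the running kernel predates that commit, and whether speculative preallocation is actually inflating any of the small files (GNU stat assumed; the path is a placeholder):

# uname -r                                  # the commit above landed in 3.12
# stat -c '%s bytes, %b 512-byte blocks' /path/to/small/file    # blocks far beyond the size hint at prealloc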

Stefan
