
Re: xfs and swift

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: xfs and swift
From: Mark Seger <mjseger@xxxxxxxxx>
Date: Wed, 6 Jan 2016 17:46:33 -0500
Cc: Laurence Oberman <loberman@xxxxxxxxxx>, Linux fs XFS <xfs@xxxxxxxxxxx>
In-reply-to: <20160106221004.GJ21461@dastard>
References: <CAC2B=ZGX2bkEhdgCrpS2X5v+SpAg0jtxZ19vk_9+O9aHME-FSA@xxxxxxxxxxxxxx> <20160106220454.GI21461@dastard> <20160106221004.GJ21461@dastard>
Dave, thanks for getting back to me and for the pointer to the config doc. Lots to absorb and play with.

The real challenge for me is that I'm doing testing at different levels. While I realize running 100 parallel swift PUT threads on a small system is not the ideal way to do things, it's the only easy way to get massive numbers of objects into the filesystem. Once they're there, the performance of a single stream is pretty poor, and by instrumenting the swift code I can clearly see excess time being spent in creating/writing the objects, which has led us to believe the problem lies in the way xfs is configured. Creating a new directory structure on that same mount point immediately results in high levels of performance.

As an attempt to reproduce the problems w/o swift, I wrote a little python script that simply creates files in a 2-tier structure: the first tier consists of 1024 directories, and each of those contains 4096 subdirectories into which 1K files are created. I'm doing this for 10000 objects at a time and timing each batch, reporting the times 10 per line, so each line represents 100 thousand file creates.
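
For anyone who wants to try something similar, here is a minimal sketch of that kind of test. It is not the actual script: the function names, the random choice of directories, and the uuid-based file names are all assumptions made for illustration.

```python
import os
import random
import tempfile
import time
import uuid


def create_files(root, total, batch=10000, tier1=1024, tier2=4096, size=1024):
    """Create `total` files of `size` bytes under root/<d1>/<d2>/ and
    return the elapsed seconds for each completed batch of `batch` creates."""
    payload = b"x" * size
    timings = []
    start = time.perf_counter()
    for n in range(1, total + 1):
        # Assumption: both directory tiers are picked at random; the
        # original script may distribute files differently.
        d = os.path.join(root,
                         "%04d" % random.randrange(tier1),
                         "%04d" % random.randrange(tier2))
        os.makedirs(d, exist_ok=True)
        # uuid names avoid collisions with files from earlier runs.
        with open(os.path.join(d, uuid.uuid4().hex), "wb") as f:
            f.write(payload)
        if n % batch == 0:
            now = time.perf_counter()
            timings.append(now - start)
            start = now
    return timings


def report(timings, per_line=10):
    """Format the batch timings, `per_line` values per output line."""
    return [" ".join("%10.6f" % t for t in timings[i:i + per_line])
            for i in range(0, len(timings), per_line)]


if __name__ == "__main__":
    # Tiny demo in a temporary directory; point `root` at the mount under
    # test and raise `total` (keeping the defaults) for a real run.
    with tempfile.TemporaryDirectory() as root:
        for line in report(create_files(root, total=200, batch=20,
                                        tier1=4, tier2=8)):
            print(line)
```

With the defaults, each printed line covers 10 batches of 10000 creates, i.e. 100 thousand files.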

Here too I'm seeing degradation. If I look at what happens when there are already 3M files and I write 1M more, I see these creation times per 10 thousand files:

  1.004236   0.961419   0.996514   1.012150   1.101794   0.999422   0.994796   1.214535   0.997276   1.306736
  2.793429   1.201471   1.133576   1.069682   1.030985   1.096341   1.052602   1.391364   0.999480   1.914125
  1.193892   0.967206   1.263310   0.890472   1.051962   4.253694   1.145573   1.528848  13.586892   4.925790
  3.975442   8.896552   1.197005   3.904226   7.503806   1.294842   1.816422   9.329792   7.270323   5.936545
  7.058685   5.516841   4.527271   1.956592   1.382551   1.510339   1.318341  13.255939   6.938845   4.106066
  2.612064   2.028795   4.647980   7.371628   5.473423   5.823201  14.229120   0.899348   3.539658   8.501498
  4.662593   6.423530   7.980757   6.367012   3.414239   7.364857   4.143751   6.317348  11.393067   1.273371
146.067300   1.317814   1.176529   1.177830  52.206605   1.112854   2.087990  42.328220   1.178436   1.335202
 49.118140   1.368696   1.515826  44.690431   0.927428   0.920801   0.985965   1.000591   1.027458  60.650443
  1.771318   2.690499   2.262868   1.061343   0.932998  64.064210  37.726213   1.245129   0.743771   0.996683

Note that one set of 10K took almost 3 minutes!

My main questions at this point are: is this performance expected, and/or might a newer kernel help? And might it be possible to significantly improve things via tuning, or is it what it is? I do realize I'm starting with an empty directory tree whose performance degrades as it fills, but if I wanted to tune for say 10M or maybe 100M files, might I be able to expect more consistent numbers (perhaps starting out at lower performance) as the number of objects grows? I'm basically looking for more consistency over a broader range of numbers of files.

-mark

On Wed, Jan 6, 2016 at 5:10 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
On Thu, Jan 07, 2016 at 09:04:54AM +1100, Dave Chinner wrote:
> On Wed, Jan 06, 2016 at 10:15:25AM -0500, Mark Seger wrote:
> > I've recently found the performance of our development swift system is
> > degrading over time as the number of objects/files increases. This is a
> > relatively small system, each server has 3 400GB disks. The system I'm
> > currently looking at has about 70GB tied up in slabs alone, close to 55GB
> > in xfs inodes and ili, and about 2GB free. The kernel
> > is 3.14.57-1-amd64-hlinux.
>
> So you've got 50M cached inodes in memory, and a relatively old kernel.
>
> > Here's the way the filesystems are mounted:
> >
> > /dev/sdb1 on /srv/node/disk0 type xfs
> > (rw,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,logbsize=256k,sunit=512,swidth=1536,noquota)
> >
> > I can do about 2000 1K file creates/sec when running 2 minute PUT tests at
> > 100 threads. If I repeat that test for multiple hours, I see the number
> > of IOPS steadily decreasing to about 770 and the very next run it drops to
> > 260 and continues to fall from there. This happens at about 12M files.
>
> According to the numbers you've provided:
>
>        lookups    creates    removes
> Fast:     1550       1350        300
> Slow:     1000        900        250
>
> This is pretty much what I'd expect on the XFS level when going from
> a small empty filesystem to one containing 12M 1k files.
>
> That does not correlate to your numbers above, so it's not at all
> clear that there is really a problem here at the XFS level.
>
> > The directory structure is 2 tiered, with 1000 directories per tier so we
> > can have about 1M of them, though they don't currently all exist.
>
> That's insane.
>
> The xfs directory structure is much, much more space, time, IO and
> memory efficient than a directory hierarchy like this. The only thing
> you need a directory hash hierarchy for is to provide sufficient
> concurrency for your operations, which you would probably get with a
> single level with one or two subdirs per filesystem AG.

BTW, you might want to read the section on directory block size for
a quick introduction to XFS directory design and scalability:

https://git.kernel.org/cgit/fs/xfs/xfs-documentation.git/tree/admin/XFS_Performance_Tuning/filesystem_tunables.asciidoc

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
