<div dir="ltr">dave, thanks for getting back to me and the pointer to the config doc. lots to absorb and play with.<div><br></div><div>the real challenge for me is that I'm doing testing as different levels. While i realize running 100 parallel swift PUT threads on a small system is not the ideal way to do things, it's the only easy way to get massive numbers of objects into the fillesystem and once there, the performance of a single stream is pretty poor and by instrumenting the swift code I can clearly see excess time being spent in creating/writing the objects and so that's lead us to believe the problem lies in the way xfs is configured. creating a new directory structure on that same mount point immediately results in high levels of performance.</div><div><br></div><div>As an attempt to try to reproduce the problems w/o swift, I wrote a little python script that simply creates files in a 2-tier structure, the first tier consisting of 1024 directories and each directory contains 4096 subdirectories into which 1K files are created. I'm doing this for 10000 objects as a time and then timing them, reporting the times, 10 per line so each line represents 100 thousand file creates.</div><div><br></div><div>Here too I'm seeing degradation and if I look at what happens when there are already 3M files and I write 1M more, I see these creation times/10 thousand:</div><div><br></div><div><div> 1.004236 0.961419 0.996514 1.012150 1.101794 0.999422 0.994796 1.214535 0.997276 1.306736</div><div> 2.793429 1.201471 1.133576 1.069682 1.030985 1.096341 1.052602 1.391364 0.999480 1.914125</div><div> 1.193892 0.967206 1.263310 0.890472 1.051962 4.253694 1.145573 1.528848 13.586892 4.925790</div><div> 3.975442 8.896552 1.197005 3.904226 7.503806 1.294842 1.816422 9.329792 7.270323 5.936545</div><div> 7.058685 5.516841 4.527271 1.956592 1.382551 1.510339 1.318341 13.255939 6.938845 4.106066</div><div> 2.612064 2.028795 4.647980 7.371628 5.473423 5.823201 14.229120 0.899348 3.539658 8.501498</div><div> 4.662593 6.423530 7.980757 6.367012 3.414239 7.364857 4.143751 6.317348 11.393067 1.273371</div><div>146.067300 1.317814 1.176529 1.177830 52.206605 1.112854 2.087990 42.328220 1.178436 1.335202</div><div>49.118140 1.368696 1.515826 44.690431 0.927428 0.920801 0.985965 1.000591 1.027458 60.650443</div><div> 1.771318 2.690499 2.262868 1.061343 0.932998 64.064210 37.726213 1.245129 0.743771 0.996683</div></div><div><br></div><div>nothing one set of 10K took almost 3 minutes!</div><div><br></div><div>my main questions at this point are is this performance expected and/or might a newer kernel help? and might it be possible to significantly improve things via tuning or is it what it is? I do realize I'm starting with an empty directory tree whose performance degrades as it fills, but if I wanted to tune for say 10M or maybe 100M files might I be able to expect more consistent numbers (perhaps starting out at lower performance) as the numbers of objects grow? I'm basically looking for more consistency over a broader range of numbers of files.</div><div><br></div><div>-mark</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 6, 2016 at 5:10 PM, Dave Chinner <span dir="ltr"><<a href="mailto:david@fromorbit.com" target="_blank">david@fromorbit.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Thu, Jan 07, 2016 at 09:04:54AM +1100, Dave Chinner wrote:<br>
On Wed, Jan 6, 2016 at 5:10 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Jan 07, 2016 at 09:04:54AM +1100, Dave Chinner wrote:
> > On Wed, Jan 06, 2016 at 10:15:25AM -0500, Mark Seger wrote:
> > > I've recently found the performance of our development swift system is
> > > degrading over time as the number of objects/files increases. This is a
> > > relatively small system, each server has 3 400GB disks. The system I'm
> > > currently looking at has about 70GB tied up in slabs alone, close to 55GB
> > > in xfs inodes and ili, and about 2GB free. The kernel
> > > is 3.14.57-1-amd64-hlinux.
> >
> > So you've got 50M cached inodes in memory, and a relatively old kernel.
> >
> > > Here's the way the filesystems are mounted:
> > >
> > > /dev/sdb1 on /srv/node/disk0 type xfs
> > > (rw,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,logbsize=256k,sunit=512,swidth=1536,noquota)
> > >
> > > I can do about 2000 1K file creates/sec when running 2 minute PUT tests at
> > > 100 threads. If I repeat that test for multiple hours, I see the number
> > > of IOPS steadily decreasing to about 770, and the very next run it drops to
> > > 260 and continues to fall from there. This happens at about 12M files.
> >
> > According to the numbers you've provided:
> >
> >        lookups  creates  removes
> > Fast:     1550     1350      300
> > Slow:     1000      900      250
> >
> > This is pretty much what I'd expect on the XFS level when going from
> > a small empty filesystem to one containing 12M 1k files.
> >
> > That does not correlate to your numbers above, so it's not at all
> > clear that there is really a problem here at the XFS level.
> >
> > > The directory structure is 2 tiered, with 1000 directories per tier so we
> > > can have about 1M of them, though they don't currently all exist.
> >
> > That's insane.
> >
> > The xfs directory structure is much, much more space, time, IO and
> > memory efficient than a directory hierarchy like this. The only thing
> > you need a directory hash hierarchy for is to provide sufficient
> > concurrency for your operations, which you would probably get with a
> > single level with one or two subdirs per filesystem AG.
> >
> BTW, you might want to read the section on directory block size for
> a quick introduction to XFS directory design and scalability:
>
> https://git.kernel.org/cgit/fs/xfs/xfs-documentation.git/tree/admin/XFS_Performance_Tuning/filesystem_tunables.asciidoc
<div class="HOEnZb"><div class="h5"><br>
Cheers,<br>
<br>
Dave.<br>
--<br>
Dave Chinner<br>
<a href="mailto:david@fromorbit.com">david@fromorbit.com</a><br>