
Re: Issues and new to the group

To: xfs@xxxxxxxxxxx
Subject: Re: Issues and new to the group
From: Jay Ashworth <jra@xxxxxxxxxxx>
Date: Thu, 26 Sep 2013 11:26:47 -0400 (EDT)
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <52444BDD.9060100@xxxxxxxxx>
----- Original Message -----
> From: "Joe Landman" <joe.landman@xxxxxxxxx>

> > takes. The folders are image folders that have anywhere between 5 to
> > 10 million images in each folder.
> 
> The combination of very large folders, and virtualization is working
> against you. Couple that with an old (ancient by Linux standards) xfs
> in the virtual CentOS 5.9 system, and you aren't going to have much
> joy with this without changing a few things.

> Can you change from one single large folder to a hierarchical set of
> folders? The single large folder means any metadata operation (ls,
> stat, open, close) has a huge set of lists to traverse. It will work,
> albeit slowly. As a rule of thumb, we try to make sure our users don't
> go much beyond 10k files/folder. If they need to, building a hierarchy
> of folders slightly increases management complexity, but keeps the
> lists that need to be traversed much smaller.
> 
> A strategy for doing this: If your files are named "aaaa0001"
> "aaaa0002" ... "zzzz9999" or similar, then you can chop off the first
> letter, and make a directory of it, and then put all files starting
> with that letter in that directory. Then within each of those directories,
> do the same thing with the second letter. This gets you 676
> directories and about 15k files per directory. Much faster directory 
> operations.
> Much smaller lists to traverse.
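Joe's fan-out arithmetic is easy to sanity-check (assuming the leading
characters are letters a-z, so 26 choices per level):

```shell
# Two directory levels of 26 letters each give 26 * 26 leaf directories;
# 10 million files spread evenly across them is roughly 15k per directory.
echo $((26 * 26))                  # 676 directories
echo $((10000000 / (26 * 26)))    # 14792 files per directory, i.e. ~15k
```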

While this problem isn't *nearly* as bad on XFS as it was on older
filesystems, where exceeding maybe 500-1000 files in a directory would
result in 'ls' commands taking over a minute...

It's still a good idea to filename-hash large collections of files of
similar types into a directory tree, as Joe recommends.  The best approach
I myself have seen to this is to hash a filename of

835bfak3f89yu12.jpg

into

8/3/5/b/835bfak3f89yu12.jpg
8/3/5/b/f/835bfak3f89yu12.jpg
8/3/5/b/f/a/835bfak3f89yu12.jpg

Going as deep as necessary to reduce the size of the directories. What
you lose in needing to cache the extra directory levels is outweighed
(probably far outweighed) by not having to handle Directories Of Unusual
Size.
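A minimal sketch of computing such a path, assuming the scheme above
(first N characters become directory levels, and the leaf keeps the full
filename -- the function name and fixed depth are illustrative):

```shell
# hashpath FILENAME DEPTH -> prints the hashed relative path, using the
# first DEPTH characters of FILENAME as one directory level each.
hashpath() {
    f=$1; depth=$2
    dir=""
    i=0
    while [ "$i" -lt "$depth" ]; do
        # take character i+1 of the filename as the next directory level
        dir="$dir$(printf '%s' "$f" | cut -c$((i + 1)))/"
        i=$((i + 1))
    done
    printf '%s%s\n' "$dir" "$f"
}

hashpath 835bfak3f89yu12.jpg 4    # prints 8/3/5/b/835bfak3f89yu12.jpg
```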

Note that I didn't actually trim the filename proper; the final file still
has its full name.  This hash is easy to build, as long as you fix the
number of layers in advance... and if you need to make it deeper later,
it's easy to build a shell script that crawls the current tree and adds
the next layer.
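That crawl-and-deepen script might look something like this (a hedged
sketch, not a tested production tool -- the function name, the root path,
and the depth argument are all illustrative, and it assumes filenames are
long enough to supply the next character):

```shell
# deepen ROOT DEPTH: for every file currently sitting under DEPTH levels
# of hash directories, create one more level named after the next
# character of the filename and move the file down into it.
deepen() {
    root=$1; depth=$2
    # files live one level below the last hash directory, hence depth+1
    find "$root" -mindepth "$((depth + 1))" -maxdepth "$((depth + 1))" -type f |
    while IFS= read -r path; do
        name=$(basename "$path")
        next=$(printf '%s' "$name" | cut -c$((depth + 1)))
        mkdir -p "$(dirname "$path")/$next"
        mv "$path" "$(dirname "$path")/$next/"
    done
}

# e.g. deepen /data/images 4 turns 8/3/5/b/835bfak3f89yu12.jpg
# into 8/3/5/b/f/835bfak3f89yu12.jpg
```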

Cheers,
-- jra
-- 
Jay R. Ashworth                  Baylink                       jra@xxxxxxxxxxx
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com         2000 Land Rover DII
St Petersburg FL USA               #natog                      +1 727 647 1274
