
Re: Lots of files

To: Michel Machado <michel@xxxxxxxxxxxxxxx>
Subject: Re: Lots of files
From: Michael Sinz <msinz@xxxxxxxxx>
Date: Thu, 30 Jan 2003 08:04:30 -0500
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <015a01c2c856$bac63610$1601070a@mz.digirati.com.br>
References: <200301291303.39958.joshhelmer@cox.net> <015a01c2c856$bac63610$1601070a@mz.digirati.com.br>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021118
Michel Machado wrote:
Hi,

    Book "Postfix", ISBN 0-672-32114-9, page 46, item "Modified active,
bounce, and deferred Queues":

========================
[...]
    It is a well-known fact on Unix systems that accessing files in a
directory containing lots of files is slower than accessing fewer files in
multiple subdirectories. Using this information, Venema has modified the
crucial message queue directories (active, bounce, and deferred),
subdividing them into new subdirectories.

    Each of the three main queue directories is split into two levels of
subdirectories. Each message is placed into a subdirectory based on the
first two characters of its filename.

    Figure 2.5 demonstrates this layout:

    /var/spool/postfix/active/
        0/0/
            0023A55F2
            00A3CF9D3
        0/2/
        A/1/
        A/B/
            ABC343CC2
            AB224CD21
[...]
========================

    Is this well-known fact on Unix systems true on XFS? Why? Does the XFS
hash algorithm solve this?

While I do not have a system that is *very* large, our tests on
somewhat large systems show that XFS does not have the access problem
that filesystems with linear directory entries have (such as BSD's
filesystem, Linux/EXT2, or MS-DOS FAT(12/16/32)).

What this technique basically describes is a hashing mechanism
that the application implements due to performance problems within
the operating system and filesystem.

Depending on what the code does and how it was "tuned" to the
behavior patterns of the filesystem(s) it was built on, XFS
may or may not be impacted by this.

For example, if you do not have the performance problem of
many files within a directory, doing the pseudo-hash within
the application actually slows things down relative to not
doing it.  Even ignoring the extra code overhead within the
application, you have the fact that opening a file with the
name "foobar" will be one directory lookup while opening up
the file with "f/o/foobar" will be three directory lookups.
When directory lookups are O(n) (UFS/EXT2/etc.), an application
hash like the above can drop the time to something like
3*O(n/(36*36)), assuming a nice hash with 36 possible values
(0-9, A-Z) for each of the first two characters of the name,
and names that are evenly distributed.

That is a big win.  However, if the lookup is more like O(log2 n)
(b-tree), the 3x factor starts to become the primary cost.
(Especially since there is some fixed overhead in any directory
lookup, which is effectively "ignored" in the O(n) case for large
n because the O(n) term is significantly greater than the fixed
overhead.  Once you get to O(log2 n), the fixed overhead becomes
a much more significant factor.)
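To put some back-of-the-envelope numbers on that (a sketch only:
it assumes 100,000 evenly distributed names, a perfect 36x36
split, and counts only entry comparisons, ignoring the fixed
per-lookup overhead mentioned above):

    /* Rough comparison of average entry scans per open(), assuming
     * n = 100,000 evenly distributed names and a perfect 36x36
     * split.  Compile with -lm. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double n = 100000.0, fanout = 36.0 * 36.0;

        /* Linear directory, flat layout: scan ~n/2 entries on average. */
        printf("linear, flat   : %8.0f\n", n / 2.0);

        /* Linear directory, hashed layout: two small directories of
         * ~36 entries each, plus a leaf of ~n/1296 entries.           */
        printf("linear, hashed : %8.0f\n",
               36.0 / 2.0 + 36.0 / 2.0 + (n / fanout) / 2.0);

        /* B-tree directory, flat layout: ~log2(n) comparisons.        */
        printf("btree,  flat   : %8.0f\n", log2(n));

        /* B-tree directory, hashed layout: three lookups of much
         * smaller trees -- the comparison count barely changes, so
         * the 3x per-lookup fixed overhead dominates.                 */
        printf("btree,  hashed : %8.0f\n",
               log2(36.0) + log2(36.0) + log2(n / fanout));

        return 0;
    }

Roughly: ~50,000 versus ~75 comparisons on a linear directory, but
~17 versus ~17 on a b-tree directory, which is why the extra
lookups stop paying for themselves.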

How close a filesystem can get to O(log2 n) is still a complex
question due to performance considerations during file creation
and deletion.  (Those operations become more expensive as you
try to keep b-tree structures balanced vs simple linear lists or
hash lists)

There are other reasons applications may wish to implement a
form of "hash" themselves.  For example, a mail server needs
to "scan" the mail queue to find the work it needs to do.  In
the simplistic case this could significantly impact either
memory performance or system throughput due to cache thrash,
depending on whether you scan one very large directory at once
or do the scan in parts.  This is, to some extent, outside the
direct details of the filesystem, except for what the cost is
of enumerating the files within a directory.

For example, the Amiga FFS filesystem had a physical disk block
per directory entry.  Thus, the directory itself was one block
and any entries in that directory were their own blocks.  This
made file access very deterministic and very fast (a very good
hashing mechanism helped, along with caches) but caused extra
disk operations in order to list all of the files in a
directory.  (You had to read a block for each file just to get
the file names.)  Thus, you could have a directory such as
"/bin" with thousands of files in it and see no noticeable
impact on the speed of command execution, but enumerating
(ls /bin) such a large directory was significantly slower than
on filesystems that have linear access.  The tradeoff was that
access to files was seen as more important (the most common
case?) than enumeration of the files.  (This was later addressed
by adding a directory cache that contained the information
needed for enumeration, thus giving linear directory performance
for enumeration while still providing full-speed file access at
the cost of slightly slower create and delete performance.)
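A quick way to see that direct-access versus enumeration
tradeoff on any filesystem is something like the following
sketch (the paths are just the ones from the Figure 2.5 quote
above, used for illustration):

    /* Minimal sketch contrasting direct access with enumeration.
     * The directory and file name below are placeholders.        */
    #include <stdio.h>
    #include <dirent.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* Direct access: one path lookup, regardless of how many
         * entries the directory holds.                            */
        int fd = open("/var/spool/postfix/active/0/0/0023A55F2", O_RDONLY);
        if (fd >= 0)
            close(fd);

        /* Enumeration: the whole directory must be read; on a
         * filesystem like the Amiga FFS described above, that
         * meant one disk block per entry.                         */
        long count = 0;
        DIR *dir = opendir("/var/spool/postfix/active/0/0");
        if (dir != NULL) {
            struct dirent *de;
            while ((de = readdir(dir)) != NULL)
                count++;
            closedir(dir);
        }
        printf("entries enumerated: %ld\n", count);
        return 0;
    }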

Is the solution used by Venema good? Why? On XFS?

I have not studied the code to see what the performance
considerations would be.  However, I would say that the mechanism
described can be a significant win on any traditional filesystem
once you get into the thousands of files.  The point at which
more advanced filesystems such as XFS or JFS show significant
gains depends on the behavior of the code.  From my tests of XFS
scaling performance (this was with many home directories), XFS
scales to at least two orders of magnitude more entries than UFS
(I did not have large enough systems to try more), but with very
few entries the slightly higher overhead of XFS made UFS slightly
faster.  (However, in the low-numbers case, both were faster than
the system needed, or even could handle, as far as delivering
data to the clients.  The only cases we cared about were when
performance became an issue, and there XFS scaled much better
than UFS, and not just in directory entries.)

    How many files are necessary to produce this effect? 100, 1000, 10000,
100000, 1000000?

Sounds like an interesting test to run.  I should compile a
kernel with as many R/W filesystems as possible and see if I can
do some benchmarks to compare them for this type of access
pattern.  (Or maybe someone already has?)
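Something along these lines is what I have in mind (just a
sketch; the file count, name pattern, and target directory are
arbitrary choices):

    /* Rough micro-benchmark sketch: create NFILES empty files in
     * one directory, then time how long it takes to open() each
     * of them again by name.  Run it on different filesystems
     * (pass the target directory as argv[1]) to compare.          */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>

    #define NFILES 100000

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(int argc, char **argv)
    {
        char path[512];
        const char *dir = (argc > 1) ? argv[1] : ".";
        int i, fd;
        double t0, t1;

        /* Populate the test directory with NFILES empty files. */
        for (i = 0; i < NFILES; i++) {
            snprintf(path, sizeof(path), "%s/f%08d", dir, i);
            fd = open(path, O_CREAT | O_WRONLY, 0644);
            if (fd >= 0)
                close(fd);
        }

        /* Time the lookups. */
        t0 = now();
        for (i = 0; i < NFILES; i++) {
            snprintf(path, sizeof(path), "%s/f%08d", dir, i);
            fd = open(path, O_RDONLY);
            if (fd >= 0)
                close(fd);
        }
        t1 = now();

        printf("%d lookups in %.3f seconds (%.1f usec each)\n",
               NFILES, t1 - t0, (t1 - t0) * 1e6 / NFILES);
        return 0;
    }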

--
Michael Sinz -- Director, Systems Engineering -- Worldgate Communications
A master's secrets are only as good as
        the master's ability to explain them to others.

