
XFS: Abysmal write performance because of excessive seeking (allocation groups to blame?)

To: xfs@xxxxxxxxxxx
Subject: XFS: Abysmal write performance because of excessive seeking (allocation groups to blame?)
From: Stefan Ring <stefanrin@xxxxxxxxx>
Date: Thu, 5 Apr 2012 20:10:43 +0200
Encouraged by reading about the recent improvements to XFS, I decided
to give it another try on a new server machine. I am happy to report
that, compared to my previous tests a few years ago, performance has
progressed from unusably slow to barely acceptable, which is a
noticeable (and notable) improvement indeed, although it still lags
behind ext4.

The filesystem operations I care about most are those that involve
thousands of small files spread across many directories, such as
large source-code trees. For my test, I created a tarball of a
finished IcedTea6 build, about 2.5 GB in size. It contains roughly
200,000 files in 20,000 directories. The test I want to report on
here is extracting this tarball onto an XFS filesystem. I tested
other operations as well, but they didn't reveal anything noteworthy.

So the test consists of nothing but un-tarring the archive, followed
by a "sync" to make sure that the time-to-disk is measured. Prior to
running it, I had populated the filesystem in the following way:

I created two directory hierarchies, each containing the unpacked
tarball 20 times, and rsynced both of them simultaneously to the target
filesystem. When this was done, I deleted one half of them, creating
some free-space fragmentation that I hoped would mimic real-world
conditions to some degree.
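A minimal Python sketch of this priming step, assuming two threads running `shutil.copytree` are an acceptable stand-in for the two simultaneous rsync runs, and assuming the deleted half is one of the two hierarchies. `prime_fragmented` and the `a`/`b` directory names are hypothetical; whether the two writers' files actually end up interleaved on disk depends on the filesystem's allocator.

```python
import shutil
import threading
from pathlib import Path


def prime_fragmented(tree: str, target: str, copies: int = 20) -> None:
    """Hypothetical helper: copy `tree` `copies` times into target/a/<i>
    and target/b/<i> from two concurrent threads, then delete target/b.

    The concurrent writers encourage interleaved allocation; removing
    one hierarchy leaves its freed extents scattered between the files
    of the surviving one.
    """
    def writer(name: str) -> None:
        for i in range(copies):
            # copytree creates missing parent directories itself
            shutil.copytree(tree, Path(target, name, str(i)))

    threads = [threading.Thread(target=writer, args=(n,)) for n in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    shutil.rmtree(Path(target, "b"))  # free one half -> free-space holes
```

For example, `prime_fragmented("/data/icedtea-unpacked", "/mnt/xfs/prime")` (placeholder paths) would roughly reproduce the setup described above.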

So now to the test itself -- the tar "x" command returned quite quickly
(within a few seconds), but the subsequent sync took ages. I created a
diagram using seekwatcher, and it reveals that the disk head jumps
wildly between four zones, each of which is written to in an almost
perfectly linear fashion.
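A small Python sketch of the measurement, timing the extraction and the following sync separately, since nearly all of the time is reported to go into the sync. `timed_extract_and_sync` is a hypothetical helper, Python's `tarfile` stands in for the tar command, and `os.sync()` (Unix, Python 3.3+) plays the role of `sync`.

```python
import os
import tarfile
import time


def timed_extract_and_sync(tarball: str, dest: str) -> tuple[float, float]:
    """Extract `tarball` into `dest`, then flush with os.sync().

    Returns (extract_seconds, sync_seconds). The first number mostly
    measures copying into the page cache; the second measures writeback.
    """
    # Use the "tar" extraction filter where available (Python 3.12+ and
    # some backports) to avoid deprecation warnings while staying close
    # to what a plain `tar x` does.
    kwargs = {"filter": "tar"} if hasattr(tarfile, "data_filter") else {}
    start = time.monotonic()
    with tarfile.open(tarball) as tf:
        tf.extractall(dest, **kwargs)
    extracted = time.monotonic()
    os.sync()  # system-wide flush, like the sync(1) command
    done = time.monotonic()
    return extracted - start, done - extracted
```

Usage might look like `ext, syn = timed_extract_and_sync("build.tar", "/mnt/xfs/test")` with placeholder paths; a large `syn` relative to `ext` matches the pattern described above.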

When I reran the test with only a single allocation group, behavior
was much better (about twice as fast).
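A toy model of why the allocation group count shows up in the seek pattern, assuming the device is split into `agcount` equally sized regions (XFS sizes its AGs equally, except possibly the last) and that each write is described only by its offset. `ag_index` and `zone_switches` are hypothetical helpers; each switch between AGs stands in for one of the long seeks visible in the seekwatcher graph. The model ignores where a single-AG filesystem would really place the data; it only shows that with one AG there is no AG boundary to cross.

```python
def ag_index(offset: int, device_size: int, agcount: int) -> int:
    """Return the allocation group containing `offset`, assuming the
    device is divided into `agcount` equally sized regions."""
    ag_size = -(-device_size // agcount)  # ceiling division
    return offset // ag_size


def zone_switches(offsets: list[int], device_size: int, agcount: int) -> int:
    """Count consecutive writes that land in different AGs; in this toy
    model each switch costs one long seek across the disk."""
    ags = [ag_index(off, device_size, agcount) for off in offsets]
    return sum(1 for prev, cur in zip(ags, ags[1:]) if prev != cur)


# Four linear streams written round-robin, one per AG (arbitrary units).
interleaved = [ag * 100 + i for i in range(3) for ag in range(4)]
print(zone_switches(interleaved, device_size=400, agcount=4))  # 11
print(zone_switches(interleaved, device_size=400, agcount=1))  # 0
```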

OTOH, when I repeatedly extracted the same tarball in a loop without
syncing in between, the ag=1 case slowed down progressively until it
became unacceptably slow. The same behavior did not occur with ag=4.
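A sketch of the loop variant, assuming each round extracts into its own fresh subdirectory (the post does not say whether the same target was reused); `repeated_extract` is a hypothetical helper. With no sync between rounds, dirty data presumably piles up and background writeback competes with later extractions, so steadily growing per-round times would reproduce the slowdown reported for ag=1.

```python
import tarfile
import time
from pathlib import Path


def repeated_extract(tarball: str, dest: str, rounds: int) -> list[float]:
    """Extract `tarball` into dest/0, dest/1, ... without syncing in
    between and return the wall-clock seconds taken by each round."""
    # "tar" filter where available (Python 3.12+ and some backports)
    kwargs = {"filter": "tar"} if hasattr(tarfile, "data_filter") else {}
    durations = []
    for i in range(rounds):
        start = time.monotonic()
        with tarfile.open(tarball) as tf:
            tf.extractall(Path(dest, str(i)), **kwargs)
        durations.append(time.monotonic() - start)
    return durations
```

Comparing the returned lists for filesystems made with one and with four allocation groups would show whether the per-round time keeps climbing in only one of them.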

I am aware that no filesystem can be optimal, but given that the
entire write set -- all 2.5 GB of it -- is "known" to the file system,
that is, in memory, wouldn't it be possible to write it out to disk in
a somewhat more reasonable fashion?
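The question can be illustrated with a toy model: if the whole write set is in memory, issuing it in ascending address order lets the head sweep across each zone once instead of bouncing between zones. `head_travel` is a hypothetical helper, offsets are in arbitrary units, and real disks, RAID controllers and the Linux I/O scheduler behave far less simply, so this only shows the direction of the effect, not how XFS actually performs writeback.

```python
def head_travel(offsets: list[int], start: int = 0) -> int:
    """Total distance the head moves when servicing writes in this order."""
    pos, total = start, 0
    for off in offsets:
        total += abs(off - pos)
        pos = off
    return total


# Four linear streams, one per zone, issued round-robin (arbitrary units).
interleaved = [zone * 100 + i for i in range(3) for zone in range(4)]
print(head_travel(interleaved))          # 1498
print(head_travel(sorted(interleaved)))  # 302
```

Sorting does not remove the spread across four zones; it only turns many cross-zone trips into a single sweep.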

This is the seekwatcher graph:

And for comparison, the same on ext4, on the same partition primed in
the same way (parallel rsyncs mentioned above):

As can be seen from the time scale in the bottom part, the ext4
version performed about 5 times as fast because of a much more
disk-friendly write pattern.

I ran the tests with a current RHEL 6.2 kernel and also with a 3.3-rc2
kernel. Both exhibited the same behavior. The disk hardware was a
Smart Array P400 controller with six 10k rpm 300 GB SAS disks in
RAID 6. The server has plenty of RAM (64 GB).
