On Wed, Apr 24, 2002 at 06:16:56PM +0200, Paul Schutte wrote:
>
> This is another thing that is bothering me.
> If you create a filesystem with only one AG, all the benchmarks, tars, cps and
> whatever you can dream up are a lot faster.
>
> I concluded that the seeks across to other allocation groups were killing
> performance.

This is probably true only in the uniprocessor, single-spindle case.
On a multiprocessor, the existence of largely independent allocation
groups should be a big win, allowing much more processor concurrency in the
filesystem code. On a filesystem that lives on multiple spindles (even if
they're hidden behind a logical disk device of some kind) the inherent
ability of the multiple heads to service requests from different areas of
the disk at different times is discarded without multiple allocation groups.

Of course, you have to understand what you're optimizing your disk subsystem
for. The typical naive RAID configuration using mirroring or RAID5 for
redundancy and small-stripe striping for "parallelism" actually just boosts
single-threaded I/O *throughput*; it discards potential parallelism.
When you have to seek, you still have to wait for the full seek time of *one
of* the disks (though on stripe-sized or larger I/O, you have to seek only
1/N as often, where "N" is the number of disks) because all heads must settle
on the same stripe before your I/O request to the stripe can be satisfied.
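
To put rough numbers on that, here is a back-of-envelope model; the seek and
transfer times are made-up round figures of mine, not measurements from any
real array:

/* Toy model: small random I/Os on an N-disk stripe, where every
 * request occupies all N heads at once, versus N independent
 * spindles (concatenated) servicing N requests in parallel.
 * All constants are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
	const double seek_ms = 8.0;  /* assumed average seek + settle */
	const double xfer_ms = 1.0;  /* assumed transfer time, one disk */
	const int    ndisks  = 4;

	/* Striped: one request in flight; the transfer is split N
	 * ways, but the seek is paid in full by the whole gang of
	 * heads. */
	double striped = seek_ms + xfer_ms / ndisks;

	/* Concatenated: N requests in flight, each paying one seek
	 * and one full transfer on its own spindle. */
	double concat = seek_ms + xfer_ms;

	printf("striped:      %6.1f req/s\n", 1000.0 / striped);
	printf("concatenated: %6.1f req/s\n", ndisks * 1000.0 / concat);
	return 0;
}

For small, seek-dominated requests the concatenated layout wins by nearly a
factor of N, because the seeks happen on all the spindles concurrently
instead of on one gang of heads.
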
If you're looking for maximum concurrency for small transactions -- for
example, precisely the benchmark you describe -- you want to do something
like create a plex with your disks (RAID 5 volumes for redundancy, perhaps)
laid out in sequence, not striped, and arrange your allocation groups such
that there is one per disk, so requests to separate AGs hit separate I/O
paths at the lowest level, and the seek penalty is greatly reduced.
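
As a sketch of why that layout gets you independent I/O paths, consider the
block-to-spindle mapping on a concatenated volume when the AG size matches
the disk size; the sizes and names here are mine for illustration, not
XFS's actual layout code:

/* On a concatenated volume with agsize == disksize, the AG number
 * *is* the disk number, so AG-local allocations never cross a
 * spindle boundary. */
#include <stdio.h>

#define AG_BLOCKS 4500000ULL  /* assumed: one AG == one 18GB disk, 4K blocks */

struct target {
	int disk;
	unsigned long long offset;
};

static struct target map_block(unsigned long long blkno)
{
	struct target t;
	t.disk   = (int)(blkno / AG_BLOCKS); /* AG number == disk number */
	t.offset = blkno % AG_BLOCKS;        /* offset on that spindle   */
	return t;
}

int main(void)
{
	/* Requests in different AGs land on different spindles and can
	 * seek concurrently; striped, they would contend for every head
	 * in the array. */
	struct target a = map_block(1000);
	struct target b = map_block(AG_BLOCKS + 1000);
	printf("A -> disk %d, B -> disk %d\n", a.disk, b.disk);
	return 0;
}

You can set this up at mkfs time, since mkfs.xfs lets you choose the AG
count with -d agcount=N; matching that to the number of spindles gives you
the one-AG-per-disk layout.
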
One of the really nice things about XFS is that it is flexible enough that
one *can* optimize it for either the maximum-throughput or maximum-concurrency
case, in configurations from single-spindle-uniprocessor to really big
servers with gigantic arrays of many disks.

In the small case, though, allocating across a small number of AGs should not
hurt performance much, so long as the allocation policy batches transactions
into the same AG whenever possible. What you really, really *don't* want is
a policy like the broken old BSD FFS policy: switch allocation groups, and
make a fresh allocation choice, every time you create a new file. Spreading
stuff across the disk is good; not doing it in
batches does murder your single-spindle performance.
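
Here is a toy contrast of the two policies, purely illustrative and not the
actual FFS or XFS allocator:

#include <stdio.h>

#define NUM_AGS 8

/* Broken FFS-style policy: a fresh AG for every file created, so
 * consecutive creates in one directory scatter across the disk and
 * every one of them pays a long seek. */
static int next_ag;
static int alloc_ag_roundrobin(void)
{
	return next_ag++ % NUM_AGS;
}

/* Batching policy: a file inherits its parent directory's AG, so a
 * tar extract or recursive copy stays in one AG and the seeks stay
 * short. */
static int alloc_ag_batched(int parent_dir_ag)
{
	return parent_dir_ag;
}

int main(void)
{
	int dir_ag = 3, i;
	printf("round-robin: ");
	for (i = 0; i < 6; i++)
		printf("%d ", alloc_ag_roundrobin());
	printf("\nbatched:     ");
	for (i = 0; i < 6; i++)
		printf("%d ", alloc_ag_batched(dir_ag));
	printf("\n");
	return 0;
}

Six creates in one directory touch six different AGs under the first policy
and one AG under the second; on a single spindle that is the difference
between six long seeks and none.
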
Thor