> On Sat, Mar 24 2001, Petru Paler wrote:
> > Hi,
> > What are the gains of using the "kio" mount options? Are
> > kiobufs faster than buffer heads, or do they just allow more
> > efficient merging of requests?
> The kiobufs allow efficient "merging" from the fs side, so that
> it can submit single large chunks of I/O. The gains can be pretty
> good; I know Steve Lord had some numbers on IDE and SCSI bh vs kiobuf
> that showed this nicely.
> Jens Axboe
Hmm, not sure I have those numbers anymore, Jens. The other point is that
merging of kiobuf requests within the elevator has never been implemented.
Kiobufs and XFS are a long story, but XFS comes from an environment where the
filesystem and buffer cache do request merging and the elevator does not.
For the raw I/O path we have shown that not having to set up buffer heads
on requests, and instead being able to pass the kiobuf directly to the block
layer, was a win, though I do not have numbers handy for this myself. There is
definitely a CPU penalty to pay in some cases by using buffer heads when
you start out with large chunks of data you want to do I/O on.
XFS itself has chunks of metadata larger than a page, and writes to the
log are up to 32K at once. The work of setting up buffer_heads in these
cases would be worth avoiding, but doing so is not a huge benefit.
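To put a rough size on that setup work, here is a minimal sketch. It assumes
one buffer head describes exactly one filesystem block, and the block sizes
listed are illustrative choices, not measurements from XFS. The function name
is hypothetical.

```python
# Hypothetical model: how many buffer heads one request needs,
# assuming each buffer head covers exactly one block.
def buffer_heads_needed(io_bytes: int, block_size: int) -> int:
    """Round up so a partial trailing block still gets a buffer head."""
    return -(-io_bytes // block_size)


LOG_WRITE = 32 * 1024  # largest log write mentioned above

# Block sizes below are assumptions chosen for illustration.
for block_size in (512, 1024, 4096):
    n = buffer_heads_needed(LOG_WRITE, block_size)
    print(f"{block_size:>4}-byte blocks: {n} buffer heads vs 1 kiobuf")
```

With 4096-byte blocks this comes to 8 buffer heads for a 32K log write, which
fits the point above: avoidable work, but not a large amount of it.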
We used to have code in the XFS write path where we could cluster together
several hundred pages in one I/O request. This gave good throughput, but
did not play well with the rest of the kernel in terms of resource
consumption: getting pages out of XFS when there was high memory pressure
was a problem, and the ordering of writes to the disk was somewhat random as well.
We changed the algorithm used in the write path to fix these problems
and removed the code which did the clustering in this manner. Our write path
now clusters in the elevator using buffer heads. Thanks to the work Jens
has done there, the elevator is doing a pretty good job nowadays. This
leaves only two things in XFS capable of issuing requests bigger than a page:
the metadata component, which is going to use single-page I/O in most
places, and the direct I/O path.
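For anyone unfamiliar with what clustering in the elevator amounts to, a toy
sketch follows. It is not the kernel's elevator code: the real thing merges
requests as they are queued, while this version sorts a batch and joins
neighbours that are contiguous on disk. The tuple layout, the function name
and the max_sectors cap of 256 are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical request shape: (start_sector, sector_count).
Request = Tuple[int, int]


def merge_contiguous(requests: List[Request], max_sectors: int = 256) -> List[Request]:
    """Sort by start sector and join requests that touch end to start."""
    merged: List[Request] = []
    for start, count in sorted(requests):
        if merged:
            prev_start, prev_count = merged[-1]
            touches = prev_start + prev_count == start
            fits = prev_count + count <= max_sectors  # assumed size cap
            if touches and fits:
                merged[-1] = (prev_start, prev_count + count)
                continue
        merged.append((start, count))
    return merged


# Four single-page writes (8 sectors each, assuming 512-byte sectors
# and 4K pages); three of them are adjacent on disk.
pages = [(16, 8), (0, 8), (8, 8), (100, 8)]
print(merge_contiguous(pages))  # [(0, 24), (100, 8)]
```

The three adjacent pages go to the disk as one 24-sector request, which is how
single-page submissions from the filesystem can still turn into large I/Os.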
Long term I would still like to see some mechanism where the filesystem can
issue requests larger than a page and have them dealt with correctly. This
will probably be something lighter weight than a kiobuf.
We now return to our regularly scheduled programming!