On Tue, Sep 14, 2010 at 05:12:17PM -0500, Alex Elder wrote:
> On Tue, 2010-09-14 at 20:56 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> >
> > Under heavy multi-way parallel create workloads, the VFS struggles to write
> > back all the inodes that have been changed in age order. The bdi flusher
> > thread becomes CPU bound, spending 85% of its time in the VFS code, mostly
> > traversing the superblock dirty inode list to separate dirty inodes old
> > enough to flush.
> >
> > We already keep an index of all metadata changes in age order - in the AIL -
> > and continued log pressure will do age ordered writeback without any extra
> > overhead at all. If there is no pressure on the log, the xfssyncd will
> > periodically write back metadata in ascending disk address offset order so
> > will be very efficient.
>
> So log pressure will cause the logged updates to the inode to be
> written to disk (in order), which is all we really need. Is that
> right?
Yes. And if there is no log pressure, xfssyncd will do the writeback
in a disk order efficient manner.
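
To make the two orderings concrete, here is a rough userspace sketch. It
is not XFS code: struct meta_item, order_for_log_push() and
order_for_periodic_sync() are hypothetical names made up for
illustration, and the sketch sorts on demand, whereas the AIL is already
kept in age order as items are logged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a logged metadata item; not an XFS structure. */
struct meta_item {
	uint64_t lsn;	/* log sequence number of the change, i.e. its age */
	uint64_t daddr;	/* disk address of the backing metadata */
};

/* Compare by LSN, oldest (smallest) first. */
static int cmp_lsn(const void *a, const void *b)
{
	const struct meta_item *x = a, *y = b;

	return (x->lsn > y->lsn) - (x->lsn < y->lsn);
}

/* Compare by disk address, lowest first. */
static int cmp_daddr(const void *a, const void *b)
{
	const struct meta_item *x = a, *y = b;

	return (x->daddr > y->daddr) - (x->daddr < y->daddr);
}

/* Age order: the order in which log pressure would push items out. */
void order_for_log_push(struct meta_item *items, size_t n)
{
	if (n == 0)
		return;
	assert(items != NULL);
	qsort(items, n, sizeof(*items), cmp_lsn);
}

/* Disk order: ascending address, as a periodic sync pass would issue I/O. */
void order_for_periodic_sync(struct meta_item *items, size_t n)
{
	if (n == 0)
		return;
	assert(items != NULL);
	qsort(items, n, sizeof(*items), cmp_daddr);
}
```

Either ordering is derived from per-item state we already track, which is
why neither needs the VFS dirty inode list.
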
> Therefore we don't need to rely on the VFS layer to get
> the dirty inode pushed out?
No, we don't. Indeed, for all other types of metadata (btree blocks,
directory/attribute blocks, etc) we already rely on the
xfsaild/xfsbufd to write them out in a timely manner because the VFS
knows nothing about them.
> Is writeback the only reason we should inform the VFS that an
> inode is dirty? (Sorry, I have to leave shortly and don't have
> time to follow this at the moment--I may have to come back to
> this later.)
Yes, pretty much. Take your time - this is one of the more radical
changes in the patch set...
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx