On Tue, Sep 14, 2010 at 05:12:17PM -0500, Alex Elder wrote:
> On Tue, 2010-09-14 at 20:56 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > Under heavy multi-way parallel create workloads, the VFS struggles to write
> > back all the inodes that have been changed in age order. The bdi flusher
> > thread becomes CPU bound, spending 85% of its time in the VFS code, mostly
> > traversing the superblock dirty inode list to separate dirty inodes old
> > enough to flush.
> > We already keep an index of all metadata changes in age order - in the AIL -
> > and continued log pressure will do age ordered writeback without any extra
> > overhead at all. If there is no pressure on the log, the xfssyncd will
> > periodically write back metadata in ascending disk address offset order, so
> > it will be very efficient.
> So log pressure will cause the logged updates to the inode to be
> written to disk (in order), which is all we really need. Is that
Yes. And if there is no log pressure, xfssyncd will do the writeback
in a disk-order-efficient manner.
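
To make the two orderings concrete, here is a toy userspace sketch. It is
not the XFS code: struct toy_item, toy_ail_push() and toy_sync_order() are
hypothetical names invented for illustration, and the array stands in for
the AIL under the assumption that it is always kept sorted by LSN. The
point is only that an age-ordered push walks from the head and stops at the
first item that is too new, rather than scanning every dirty item, while
the periodic pass can simply sort by disk address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a logged metadata item (hypothetical, not xfs_log_item). */
struct toy_item {
	uint64_t lsn;	/* log sequence number of first change: lower = older */
	uint64_t daddr;	/* disk address the item is written back to */
};

/* qsort comparator: ascending disk address. */
static int toy_cmp_daddr(const void *a, const void *b)
{
	const struct toy_item *x = a;
	const struct toy_item *y = b;

	return (x->daddr > y->daddr) - (x->daddr < y->daddr);
}

/*
 * Log pressure path: the list is already in ascending LSN (age) order, so
 * selecting everything at or below target_lsn is a walk from the head that
 * stops at the first newer item. Returns how many items would be pushed.
 */
size_t toy_ail_push(const struct toy_item *ail, size_t n, uint64_t target_lsn)
{
	size_t i = 0;

	while (i < n && ail[i].lsn <= target_lsn) {
		/* Check the age-order invariant this sketch relies on. */
		assert(i == 0 || ail[i - 1].lsn <= ail[i].lsn);
		i++;
	}
	return i;
}

/*
 * No pressure path: a periodic pass reorders the items in place by
 * ascending disk address before writing them back.
 */
void toy_sync_order(struct toy_item *items, size_t n)
{
	qsort(items, n, sizeof(*items), toy_cmp_daddr);
}
```

With items at LSNs 10, 20, 30 and 40, a push to target LSN 25 touches only
the first two entries and never looks at the rest of the list, which is the
property the VFS dirty inode list traversal lacks.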
> Therefore we don't need to rely on the VFS layer to get
> the dirty inode pushed out?
No. Indeed, for all other types of metadata (btree blocks,
directory/attribute blocks, etc) we already rely on the
xfsaild/xfsbufd to write them out in a timely manner because the VFS
knows nothing about them.
> Is writeback the only reason we should inform the VFS that an
> inode is dirty? (Sorry, I have to leave shortly and don't have
> time to follow this at the moment--I may have to come back to
> this later.)
Yes, pretty much. Take your time - this is one of the more radical
changes in the patch set...