
XFS slow when project quota exceeded (again)

To: xfs@xxxxxxxxxxx
Subject: XFS slow when project quota exceeded (again)
From: Jan Kara <jack@xxxxxxx>
Date: Thu, 14 Feb 2013 14:14:52 +0100
Cc: dchinner@xxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
User-agent: Mutt/1.5.20 (2009-06-14)
  Hi,

  this is a follow-up on a discussion started here:
http://www.spinics.net/lists/xfs/msg14999.html

To just quickly sum up the issue: 
When project quota gets exceeded XFS ends up flushing inodes using
sync_inodes_sb(). I've tested (in 3.8-rc4) that if one writes 200 MB to a
directory with 100 MB project quota like:
        fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        for (i = 0; i < 50000; i++)
                pwrite(fd, buf, 4096, i*4096);
it takes about 3 s to finish, which is OK. But when there are lots of
inodes cached (I've tried with 10000 inodes cached on the fs), the same
test program takes ~140 s. This is because sync_inodes_sb() iterates over
all inodes in the superblock and waits for IO, and this iteration eats CPU
cycles.
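To reproduce the measurement, the loop above can be wrapped into a
self-contained helper that also times itself. This is a sketch under
assumptions: the name write_blocks(), the fill byte and the
stop-on-first-failed-write behaviour are illustrative and not part of the
original test, and setting up the 100 MB project quota and the ~10000
cached inodes is not shown.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Block size used by the test loop in the mail (4 KiB per pwrite). */
#define BLK 4096

/*
 * Hypothetical helper: write 'nblocks' blocks of BLK bytes to 'path',
 * block i at offset i*BLK, mirroring the open()/pwrite() loop above.
 * Unlike the original loop it stops at the first failed or short write
 * (e.g. once the quota is exceeded) and returns the number of blocks
 * fully written, or -1 if the file could not be opened. On success the
 * elapsed wall-clock time in seconds is stored in *secs if non-NULL.
 */
long write_blocks(const char *path, long nblocks, double *secs)
{
    static char buf[BLK];
    struct timespec t0, t1;
    long i;
    int fd;

    memset(buf, 'x', sizeof(buf));
    clock_gettime(CLOCK_MONOTONIC, &t0);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    for (i = 0; i < nblocks; i++) {
        ssize_t n = pwrite(fd, buf, BLK, (off_t)i * BLK);
        if (n != BLK)
            break;  /* write failed or was short: stop here */
    }
    close(fd);

    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (secs)
        *secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return i;
}
```

Called as write_blocks(argv[1], 50000, &secs), it writes the same
50000 x 4096 bytes (~200 MB) as the original loop and reports the elapsed
time; on a directory with a 100 MB project quota it would presumably return
fewer than 50000 once writes start failing.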

One can argue that sync_inodes_sb() should be optimized, but that isn't
easy: it is a data integrity operation, so we have to be sure that even IO
which was submitted before sync_inodes_sb() was called has finished by the
time it returns.

So another POV is that it is overkill to call a data integrity operation
in a place which just needs to make sure IO is submitted, so that delalloc
uncertainty is reduced. The comment before xfs_flush_inodes() says that
sync_inodes_sb() is used to throttle multiple callers to the rate at which
IO is completing. I wonder why that is needed. Delalloc blocks are already
allocated at IO submission time, so shouldn't it be enough to wait for IO
submission to happen? And writeback_inodes_sb() does exactly that. The
changelog of the commit that changed xfs_flush_inodes() to use
sync_inodes_sb() claims that premature ENOSPC can happen if
writeback_inodes_sb() is used. But why would that be the case, when
writeback_inodes_sb() waits for IO submission to happen and thus for
extent conversion? Isn't sync_inodes_sb() just papering over some other
problem?
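The argument turns on the difference between starting writeback and
waiting for it to complete. Linux exposes a per-file version of the same
split to user space, so the sketch below is an analogy only: it does not
call the kernel-internal writeback_inodes_sb() or sync_inodes_sb(), and
the helper names flush_submit_only() and flush_integrity() are
hypothetical. sync_file_range() with SYNC_FILE_RANGE_WRITE starts writeback
of dirty pages without waiting for it to complete, while fdatasync() is a
data integrity operation.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Analogy only, not the kernel code path discussed above.
 *
 * Submit side (roughly what writeback_inodes_sb() guarantees): start
 * write-out of all dirty pages of fd (offset 0, nbytes 0 = to end of
 * file) and return without waiting for the IO to complete.
 */
int flush_submit_only(int fd)
{
    return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
}

/*
 * Integrity side (roughly what sync_inodes_sb() guarantees): return only
 * once the file's data is on stable storage, which includes waiting for
 * writeback that was started before this call.
 */
int flush_integrity(int fd)
{
    return fdatasync(fd);
}
```

The integrity variant has to wait for pages already under writeback,
including IO started before the call, which mirrors the constraint
described for sync_inodes_sb(); the submit-only variant gives no such
guarantee.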

                                                                Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
