
Re: [PATCH] xfs: Don't flush inodes when project quota exceeded

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH] xfs: Don't flush inodes when project quota exceeded
From: Jan Kara <jack@xxxxxxx>
Date: Wed, 21 Nov 2012 01:24:59 +0100
Cc: Jan Kara <jack@xxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20121120000428.GZ14281@dastard>
References: <1352766973-14197-1-git-send-email-jack@xxxxxxx> <20121119213913.GB29498@xxxxxxxxxxxxx> <20121120000428.GZ14281@dastard>
User-agent: Mutt/1.5.20 (2009-06-14)
On Tue 20-11-12 11:04:28, Dave Chinner wrote:
> On Mon, Nov 19, 2012 at 10:39:13PM +0100, Jan Kara wrote:
> > On Tue 13-11-12 01:36:13, Jan Kara wrote:
> > > When project quota gets exceeded xfs_iomap_write_delay() ends up flushing
> > > inodes because ENOSPC gets returned from xfs_bmapi_delay() instead of
> > > EDQUOT. This makes handling of writes over project quota rather slow as a
> > > simple test program shows:
> > >   fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
> > >   for (i = 0; i < 50000; i++)
> > >           pwrite(fd, buf, 4096, i*4096);
> > > 
> > > Writing 200 MB like this into a directory with 100 MB project quota takes
> > > around 6 minutes while it takes about 2 seconds with this patch applied.
> > > This actually happens in a real world load when nfs pushes data into a
> > > directory which is over project quota.
> > > 
> > > Fix the problem by replacing XFS_QMOPT_ENOSPC flag with XFS_QMOPT_EPDQUOT.
> > > That makes xfs_trans_reserve_quota_bydquots() return new error EPDQUOT
> > > when project quota is exceeded. xfs_bmapi_delay() then uses this flag so
> > > that xfs_iomap_write_delay() can distinguish real ENOSPC (requiring
> > > flushing) from exceeded project quota (not requiring flushing).
> > > 
> > > As a side effect this patch fixes inconsistency where e.g. xfs_create()
> > > returned EDQUOT even when project quota was exceeded.
> >   Ping? Any opinions?
> 
> FWIW, it doesn't look like it'll apply to a current XFS tree:
> 
> > > @@ -441,8 +442,11 @@ retry:
> > >    */
> > >   if (nimaps == 0) {
> > >           trace_xfs_delalloc_enospc(ip, offset, count);
> > > -         if (flushed)
> > > -                 return XFS_ERROR(error ? error : ENOSPC);
> > > +         if (flushed) {
> > > +                 if (error == 0 || error == EPDQUOT)
> > > +                         error = ENOSPC;
> > > +                 return XFS_ERROR(error);
> > > +         }
> > >  
> > >           if (error == ENOSPC) {
> > >                   xfs_iunlock(ip, XFS_ILOCK_EXCL);
> 
> This xfs_iomap_write_delay() looks like this now:
> 
>         /*
>          * If bmapi returned us nothing, we got either ENOSPC or EDQUOT. Retry
>          * without EOF preallocation.
>          */
>         if (nimaps == 0) {
>                 trace_xfs_delalloc_enospc(ip, offset, count);
>                 if (prealloc) {
>                         prealloc = 0;
>                         error = 0;
>                         goto retry;
>                 }
>                 return XFS_ERROR(error ? error : ENOSPC);
>         }
> 
> The flushing is now way up in xfs_file_buffered_aio_write(), and the
> implementation of xfs_flush_inodes() has changed as well. Hence it
> may or may not behave differently now....
  OK, so I tested the latest XFS tree, and the changes in commit 9aa05000
(changing xfs_flush_inodes()) indeed improve the performance from those
~6 minutes to ~6 seconds, which is good enough I believe. Thanks for the
pointer! I was wondering for a while why sync_inodes_sb() is so much faster
than the original XFS implementation, and I believe it's because we don't
force the log on each sync now.
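
  For anyone who wants to repeat the timing, the test fragment quoted above
can be fleshed out roughly as follows. This is only a sketch: the names
write_blocks(), BLOCK_SIZE and the QUOTA_TEST_MAIN guard are illustrative
and not part of the original patch or test program. Like the original
fragment, it keeps issuing writes after the first failure, since the slow
path is hit by every write that runs into the quota limit.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for pwrite() */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096		/* same write size as the quoted fragment */

/*
 * Write nblocks consecutive BLOCK_SIZE blocks to path with pwrite().
 * Failed writes do not stop the loop; only the first errno is recorded.
 * Returns the number of full-size writes that succeeded, or -1 if the
 * file could not be opened. *first_err is 0 when nothing failed.
 */
long write_blocks(const char *path, long nblocks, int *first_err)
{
	static char buf[BLOCK_SIZE];
	long ok = 0;
	int fd;

	*first_err = 0;
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		*first_err = errno;
		return -1;
	}
	memset(buf, 'a', sizeof(buf));

	for (long i = 0; i < nblocks; i++) {
		ssize_t n = pwrite(fd, buf, BLOCK_SIZE, (off_t)i * BLOCK_SIZE);

		if (n == BLOCK_SIZE)
			ok++;
		else if (n < 0 && *first_err == 0)
			*first_err = errno;	/* e.g. ENOSPC or EDQUOT */
	}

	if (close(fd) < 0 && *first_err == 0)
		*first_err = errno;
	return ok;
}

#ifdef QUOTA_TEST_MAIN
/* Optional driver: 50000 blocks of 4096 bytes, i.e. about 200 MB. */
int main(int argc, char **argv)
{
	int err;
	long ok;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 2;
	}
	ok = write_blocks(argv[1], 50000, &err);
	if (ok < 0) {
		fprintf(stderr, "open %s: %s\n", argv[1], strerror(err));
		return 1;
	}
	printf("%ld of 50000 writes succeeded, first error: %s\n",
	       ok, err ? strerror(err) : "none");
	return err ? 1 : 0;
}
#endif
```

Built with -DQUOTA_TEST_MAIN and run under time(1) against a file inside a
directory that is over its project quota, this should show the difference
between kernels. The reported errno depends on the kernel version tested.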

                                                                Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
