Re: [PATCH v2 1/3] xfs: add scan owner field to xfs_eofblocks

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: [PATCH v2 1/3] xfs: add scan owner field to xfs_eofblocks
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Wed, 28 May 2014 10:00:36 -0400
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140528053019.GB3816@xxxxxxxxxxxxx>
References: <1400845950-41435-1-git-send-email-bfoster@xxxxxxxxxx> <1400845950-41435-2-git-send-email-bfoster@xxxxxxxxxx> <20140527104428.GC1440@xxxxxxxxxxxxx> <20140527121810.GB63281@xxxxxxxxxxxxxxx> <20140527212653.GC6677@dastard> <20140528053019.GB3816@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, May 27, 2014 at 10:30:19PM -0700, Christoph Hellwig wrote:
> On Wed, May 28, 2014 at 07:26:53AM +1000, Dave Chinner wrote:
> > > Right... maybe I'm not parsing your point. The purpose here is to avoid
> > > the trylock entirely. E.g., Indicate that we have already acquired the
> > > lock and can proceed with xfs_free_eofblocks(), rather than fail a
> > > trylock and skip (which appears to be a potential infinite loop scenario
> > > here due to how the AG walking code handles EAGAIN).
> > 
> > I think Christoph's concern here is that we are calling a function
> > that can take the iolock while we already hold the iolock. i.e. the
> > reason we have to add the anti-deadlock code in the first place.
> Indeed.

Ah, I didn't parse correctly then. Thanks...

> > To
> > address that, can we restructure xfs_file_buffered_aio_write() such
> > that the ENOSPC/EDQUOT flush is done outside the iolock?
> > 
> > From a quick check, I don't think there is any problem with dropping
> > the iolock, doing the flushes and then going all the way back to the
> > start of the function again, but closer examination and testing is
> > warranted...

I considered this briefly early on, but wasn't sure about whether we
should run through the write_checks() bits more than once (e.g.,
potentially do the eof zeroing, etc., multiple times..?).

> I think we'd need some form of early space reservation, otherwise we'd
> get non-atomic writes.  Time to get those batched write patches out
> again..

So the concern is that multiple writers to an overlapped range could
become interleaved? From a pass through the code, we hit
generic_perform_write(), which iterates over the iov in a
write_begin/copy/write_end loop. If we hit ENOSPC somewhere in the
middle, we'd return what we've written so far. I don't believe the
buffered_aio_write() path would see the error unless it was the first
attempt at a delayed allocation. IOW, mid-write failure will be a short
write vs. an ENOSPC error.
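
To make the short write vs. ENOSPC distinction concrete, here is a toy
userspace model of that loop. It is only a sketch and not the kernel code:
struct fake_fs, fake_write_begin() and fake_perform_write() are made-up
names, and "free_pages" stands in for whatever space the delayed
allocation would need.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#define FAKE_PAGE_SIZE 4096

/* Hypothetical stand-in for the filesystem: free space in page units. */
struct fake_fs {
	size_t free_pages;
};

/* Stand-in for ->write_begin(): reserve one page or fail with -ENOSPC. */
static int fake_write_begin(struct fake_fs *fs)
{
	if (fs->free_pages == 0)
		return -ENOSPC;
	fs->free_pages--;
	return 0;
}

/*
 * Toy model of the write_begin/copy/write_end loop. If any bytes were
 * written before the failure, the caller sees a short write; the error
 * is only returned when the very first reservation fails.
 */
static ssize_t fake_perform_write(struct fake_fs *fs, size_t len)
{
	size_t written = 0;
	int status = 0;

	while (written < len) {
		size_t chunk = len - written;

		if (chunk > FAKE_PAGE_SIZE)
			chunk = FAKE_PAGE_SIZE;

		status = fake_write_begin(fs);
		if (status < 0)
			break;

		/* The copy and ->write_end() steps would happen here. */
		written += chunk;
	}

	return written ? (ssize_t)written : status;
}
```

With two free pages, a three page write in this model returns two pages
worth of bytes, and only the next call returns -ENOSPC.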

It seems like it _might_ be safe to drop and reacquire iolock given
these semantics (notwithstanding the write_checks() bits), but I could
certainly be missing something...
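
For discussion, a rough userspace sketch of the "drop the iolock, flush,
start over" shape suggested above. Everything here is hypothetical
(struct toy_inode, toy_flush(), toy_write_locked(), toy_buffered_write())
and the lock is just a flag; it only shows the control flow, with the
flush done outside the lock and a single retry so we can't loop forever.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of an inode; none of these fields exist in XFS. */
struct toy_inode {
	bool   iolock_held;        /* models holding the iolock */
	size_t free_bytes;         /* space currently available */
	size_t reclaimable;        /* space a flush could give back */
	int    flushes;            /* number of flushes performed */
	int    flushes_under_lock; /* flushes done with the lock held */
};

/* Models the ENOSPC/EDQUOT flush; records whether the lock was held. */
static void toy_flush(struct toy_inode *ip)
{
	if (ip->iolock_held)
		ip->flushes_under_lock++;
	ip->free_bytes += ip->reclaimable;
	ip->reclaimable = 0;
	ip->flushes++;
}

/* Models the write itself: short write if possible, else -ENOSPC. */
static ssize_t toy_write_locked(struct toy_inode *ip, size_t len)
{
	size_t n = len < ip->free_bytes ? len : ip->free_bytes;

	if (len > 0 && n == 0)
		return -ENOSPC;
	ip->free_bytes -= n;
	return (ssize_t)n;
}

static ssize_t toy_buffered_write(struct toy_inode *ip, size_t len)
{
	bool retried = false;
	ssize_t ret;

again:
	ip->iolock_held = true;
	/* The write checks would be rerun here on every pass. */
	ret = toy_write_locked(ip, len);
	ip->iolock_held = false;

	if (ret == -ENOSPC && !retried) {
		/* Lock already dropped, so the flush may take it safely. */
		toy_flush(ip);
		retried = true;
		goto again;
	}
	return ret;
}
```

Note that the retry reruns everything from the top, which is exactly the
write_checks() question raised earlier.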

