On Tue, May 27, 2014 at 10:30:19PM -0700, Christoph Hellwig wrote:
> On Wed, May 28, 2014 at 07:26:53AM +1000, Dave Chinner wrote:
> > > Right... maybe I'm not parsing your point. The purpose here is to avoid
> > > the trylock entirely. E.g., indicate that we have already acquired the
> > > lock and can proceed with xfs_free_eofblocks(), rather than fail a
> > > trylock and skip (which appears to be a potential infinite loop scenario
> > > here due to how the AG walking code handles EAGAIN).
> > I think Christoph's concern here is that we are calling a function
> > that can take the iolock while we already hold the iolock. i.e. the
> > reason we have to add the anti-deadlock code in the first place.
Ah, I didn't parse correctly then. Thanks...
> > To
> > address that, can we restructure xfs_file_buffered_aio_write() such
> > that the ENOSPC/EDQUOT flush is done outside the iolock?
> > From a quick check, I don't think there is any problem with dropping
> > the iolock, doing the flushes and then going all the way back to the
> > start of the function again, but closer examination and testing is
> > warranted...
I considered this briefly early on, but wasn't sure whether we should
run through the write_checks() bits more than once (e.g., potentially
doing the EOF zeroing, etc., multiple times?).
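A minimal user-space sketch of the drop-flush-restart structure under discussion, assuming a pthread mutex as a stand-in for the iolock. Every name here (fake_inode, fake_write_checks, fake_flush, fake_buffered_write) is hypothetical and not the real XFS code; the checks_run counter just makes visible that restarting from the top runs the write_checks() step a second time.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inode model; NOT the real struct xfs_inode. */
struct fake_inode {
	pthread_mutex_t iolock;	/* stand-in for the XFS iolock */
	size_t free_space;	/* space currently available for the write */
	size_t reclaimable;	/* space a flush could give back */
	int checks_run;		/* how many times the checks step ran */
};

/* Hypothetical stand-in for write_checks() (EOF zeroing etc.). */
static void fake_write_checks(struct fake_inode *ip)
{
	ip->checks_run++;
}

/* Hypothetical flush: called WITHOUT the iolock held, so it may take it. */
static void fake_flush(struct fake_inode *ip)
{
	pthread_mutex_lock(&ip->iolock);
	ip->free_space += ip->reclaimable;
	ip->reclaimable = 0;
	pthread_mutex_unlock(&ip->iolock);
}

static int fake_buffered_write(struct fake_inode *ip, size_t len)
{
	bool flushed = false;
	int ret;

restart:
	pthread_mutex_lock(&ip->iolock);
	fake_write_checks(ip);		/* reruns on every restart */
	if (ip->free_space >= len) {
		ip->free_space -= len;
		ret = 0;
	} else {
		ret = -ENOSPC;
	}
	if (ret == -ENOSPC && !flushed) {
		/* drop the lock, flush, then go back to the start */
		pthread_mutex_unlock(&ip->iolock);
		fake_flush(ip);
		flushed = true;
		goto restart;
	}
	pthread_mutex_unlock(&ip->iolock);
	return ret;
}
```

The single `flushed` flag bounds the retry to one flush, so a persistently full filesystem still ends in -ENOSPC rather than looping.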
> I think we'd need some form of early space reservation, otherwise we'd
> get non-atomic writes. Time to get those batched write patches out
So the concern is that multiple writers to an overlapped range could
become interleaved? From passing through the code, we hit
generic_perform_write(), which iterates over the iov in a
write_begin/copy/write_end loop. If we hit ENOSPC somewhere in the
middle, we'd return what we've written so far. I don't believe
xfs_file_buffered_aio_write() would see the error unless it occurred on
the first delayed allocation attempt. IOW, a mid-write failure results
in a short write rather than an ENOSPC error.
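A minimal model of the short-write semantics described above, assuming a hypothetical fake_fs whose space_left stands in for space available for delayed allocation (none of these names are kernel interfaces). The final `written ? written : status` mirrors the return convention of generic_perform_write(): an error only surfaces if nothing at all was written.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical filesystem model; NOT a real kernel structure. */
struct fake_fs {
	size_t space_left;	/* space available for delayed allocation */
};

/* Hypothetical write_begin: reserve space for one chunk or fail. */
static int fake_write_begin(struct fake_fs *fs, size_t len)
{
	if (fs->space_left < len)
		return -ENOSPC;
	fs->space_left -= len;
	return 0;
}

/*
 * Write 'count' bytes in chunks of at most 'chunk' bytes. A failure
 * after some progress yields a short write; only a failure on the
 * first chunk is reported as an error.
 */
static ssize_t fake_perform_write(struct fake_fs *fs, size_t count,
				  size_t chunk)
{
	size_t written = 0;
	int status = 0;

	assert(chunk > 0);
	while (written < count) {
		size_t len = count - written < chunk ? count - written : chunk;

		status = fake_write_begin(fs, len);
		if (status)
			break;
		/* the copy and write_end steps would go here */
		written += len;
	}
	return written ? (ssize_t)written : status;
}
```

For example, with 5000 bytes of space, an 8192-byte write in 4096-byte chunks returns 4096 (a short write), while with only 100 bytes it returns -ENOSPC.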
It seems like it _might_ be safe to drop and reacquire iolock given
these semantics (notwithstanding the write_checks() bits), but I could
certainly be missing something...