On Mon, Feb 20, 2012 at 07:38:28PM -0500, Christoph Hellwig wrote:
> Do not use unlogged metadata updates and the VFS dirty bit for updating
> the file size after writeback. In addition to causing various problems
> with updates getting delayed for far too long this also drags in the
> unscalable VFS dirty tracking, and is one of the few remaining unlogged
> metadata updates.
>
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
....
>
> +STATIC int
> +xfs_setfilesize_trans_alloc(
> + struct xfs_ioend *ioend)
> +{
....
> + /*
> + * We hand off the transaction to the completion thread now, so
> + * clear the flag here.
> + */
> + current_restore_flags_nested(&tp->t_pflags, PF_FSTRANS);
......
> @@ -173,18 +214,32 @@ xfs_end_io(
> * range to normal written extens after the data I/O has finished.
> */
> if (ioend->io_type == IO_UNWRITTEN) {
> + /*
> + * For buffered I/O we never preallocate a transaction when
> + * doing the unwritten extent conversion, but for direct I/O
> + * we do not know if we are converting and unwritten extent
> + * or not at the point where we preallocate the transaction.
> + */
> + if (ioend->io_append_trans) {
> + ASSERT(ioend->io_isdirect);
> +
> + current_set_flags_nested(
> + &ioend->io_append_trans->t_pflags, PF_FSTRANS);
> + xfs_trans_cancel(ioend->io_append_trans, 0);
.....
> @@ -1280,17 +1340,33 @@ xfs_vm_direct_IO(
> {
> struct inode *inode = iocb->ki_filp->f_mapping->host;
> struct block_device *bdev = xfs_find_bdev_for_inode(inode);
> + struct xfs_ioend *ioend = NULL;
> ssize_t ret;
>
> if (rw & WRITE) {
> - iocb->private = xfs_alloc_ioend(inode, IO_DIRECT);
> + size_t size = iov_length(iov, nr_segs);
> +
> + /*
> + * Direct I/O code may have to convert unwritten extents from
> + * the AIO and I/O handler in interrupt context. To make this
> + * possible we have to preallocate an ioend that allows defering
> + * it here. For the case where we did not find an unwritten
> + * extent we'll free it again later.
> + */
> + iocb->private = ioend = xfs_alloc_ioend(inode, IO_DIRECT);
> + if (offset + size > XFS_I(inode)->i_d.di_size) {
> + ret = xfs_setfilesize_trans_alloc(ioend);
> + if (ret)
> + goto destroy_ioend;
> + ioend->io_isdirect = 1;
> + }
>
> ret = __blockdev_direct_IO(rw, iocb, inode, bdev, iov,
> offset, nr_segs,
> xfs_get_blocks_direct,
> xfs_end_io_direct_write, NULL, 0);
> if (ret != -EIOCBQUEUED && iocb->private)
> - xfs_destroy_ioend(iocb->private);
> + goto destroy_ioend;
> } else {
> ret = __blockdev_direct_IO(rw, iocb, inode, bdev, iov,
> offset, nr_segs,
> @@ -1299,6 +1375,12 @@ xfs_vm_direct_IO(
> }
>
> return ret;
> +
> +destroy_ioend:
> + if (ioend->io_append_trans)
> + xfs_trans_cancel(ioend->io_append_trans, 0);
Hmmm, I missed something here first time through: do we need to
restore the PF_FSTRANS flag here before cancelling the transaction?
I don't think we do (clearing a flag that is aready cleared is not a
problem), but we should be consistent with the IO completion
handling of this transaction.
Other than that, everything looks good, so you can add:
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
When that inconsistency is fixed.
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
|