[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: TAKE - make "osyncisdsync" the default



Andrew Morton wrote:

>Eric Sandeen wrote:
>
>>"osyncisdsync" makes us go faster, and it's how other Linux
>>filesystems are set up, anyway - so make it the default.
>>
>
>The comment lies :)
>
>        /* For now, when the user asks for O_SYNC, we'll actually
>         * provide O_DSYNC. */
>        if (status >= 0) {
>                if ((file->f_flags & O_SYNC) || IS_SYNC(inode))
>                        status = generic_osync_inode(inode, OSYNC_METADATA|OSYNC_DATA);
>        }
>
>On other filesystems, O_SYNC writes actually sync both metadata
>and data, as well as the inode, if i_size changed.
>
>For default behaviour I suggest the best semantics are for
>an O_SYNC write to guarantee that the written data will be
>available after a crash.
>

Ah, but we do not go there:

        /* Handle various SYNC-type writes */
        if (ioflags & O_SYNC) {

                /* Flush all inode data buffers */

                error = -fsync_inode_buffers(ip);
                if (error)
                        goto out;

                /*
                 * If we're treating this as O_DSYNC and we have not 
updated the
                 * size, force the log.
                 */

                if (!(mp->m_flags & XFS_MOUNT_OSYNCISOSYNC)
                        && !(xip->i_update_size)) {
                        /*
                         * If an allocation transaction occurred
                         * without extending the size, then we have to force
                         * the log up the proper point to ensure that the
                         * allocation is permanent.  We can't count on
                         * the fact that buffered writes lock out direct I/O
                         * writes - the direct I/O write could have extended
                         * the size nontransactionally, then finished before
                         * we started.  xfs_write_file will think that 
the file
                         * didn't grow but the update isn't safe unless the
                         * size change is logged.
                         *
                         * Force the log if we've committed a transaction
                         * against the inode or if someone else has and
                         * the commit record hasn't gone to disk (e.g.
                         * the inode is pinned).  This guarantees that
                         * all changes affecting the inode are permanent
                         * when we return.
                         */

                        xfs_inode_log_item_t *iip;
                        xfs_lsn_t lsn;

                        iip = xip->i_itemp;
                        if (iip && iip->ili_last_lsn) {
                                lsn = iip->ili_last_lsn;
                                xfs_log_force(mp, lsn,
                                                XFS_LOG_FORCE | 
XFS_LOG_SYNC);
                        } else if (xfs_ipincount(xip) > 0) {
                                xfs_log_force(mp, (xfs_lsn_t)0,
                                                XFS_LOG_FORCE | 
XFS_LOG_SYNC);
                        }

                } else {
                        /*
                         * O_SYNC or O_DSYNC _with_ a size update are 
handled
                         * the same way.
                         *
                         * If the write was synchronous then we need to make
                         * sure that the inode modification time is 
permanent.
                         * We'll have updated the timestamp above, so here
                         * we use a synchronous transaction to log the 
inode.
                         * It's not fast, but it's necessary.
                         *
                         * If this a dsync write and the size got changed
                         * non-transactionally, then we need to ensure that
                         * the size change gets logged in a synchronous
                         * transaction.
                         */

                        tp = xfs_trans_alloc(mp, XFS_TRANS_WRITE_SYNC);
                        if ((error = xfs_trans_reserve(tp, 0,
                                                      
XFS_SWRITE_LOG_RES(mp),
                                                      0, 0, 0))) {
                                /* Transaction reserve failed */
                                xfs_trans_cancel(tp, 0);
                        } else {
                                /* Transaction reserve successful */
                                xfs_ilock(xip, XFS_ILOCK_EXCL);
                                xfs_trans_ijoin(tp, xip, XFS_ILOCK_EXCL);
                                xfs_trans_ihold(tp, xip);
                                xfs_trans_log_inode(tp, xip, XFS_ILOG_CORE);
                                xfs_trans_set_sync(tp);
                                error = xfs_trans_commit(tp, 0, 
(xfs_lsn_t)0);
                                xfs_iunlock(xip, XFS_ILOCK_EXCL);
                        }
                }