Andrew Morton wrote:
Eric Sandeen wrote:
"osyncisdsync" makes us go faster, and it's how other Linux
filesystems are set up, anyway - so make it the default.
The comment lies :)
/* For now, when the user asks for O_SYNC, we'll actually
* provide O_DSYNC. */
if (status >= 0) {
if ((file->f_flags & O_SYNC) || IS_SYNC(inode))
status = generic_osync_inode(inode,
OSYNC_METADATA|OSYNC_DATA);
}
On other filesystems, O_SYNC writes actually sync both metadata
and data, as well as the inode, if i_size changed.
For default behaviour I suggest the best semantics are for
an O_SYNC write to guarantee that the written data will be
available after a crash.
Ah, but we do not go there:
/* Handle various SYNC-type writes */
if (ioflags & O_SYNC) {
/* Flush all inode data buffers */
error = -fsync_inode_buffers(ip);
if (error)
goto out;
/*
* If we're treating this as O_DSYNC and we have not
updated the
* size, force the log.
*/
if (!(mp->m_flags & XFS_MOUNT_OSYNCISOSYNC)
&& !(xip->i_update_size)) {
/*
* If an allocation transaction occurred
* without extending the size, then we have to force
* the log up the proper point to ensure that the
* allocation is permanent. We can't count on
* the fact that buffered writes lock out direct I/O
* writes - the direct I/O write could have extended
* the size nontransactionally, then finished before
* we started. xfs_write_file will think that
the file
* didn't grow but the update isn't safe unless the
* size change is logged.
*
* Force the log if we've committed a transaction
* against the inode or if someone else has and
* the commit record hasn't gone to disk (e.g.
* the inode is pinned). This guarantees that
* all changes affecting the inode are permanent
* when we return.
*/
xfs_inode_log_item_t *iip;
xfs_lsn_t lsn;
iip = xip->i_itemp;
if (iip && iip->ili_last_lsn) {
lsn = iip->ili_last_lsn;
xfs_log_force(mp, lsn,
XFS_LOG_FORCE |
XFS_LOG_SYNC);
} else if (xfs_ipincount(xip) > 0) {
xfs_log_force(mp, (xfs_lsn_t)0,
XFS_LOG_FORCE |
XFS_LOG_SYNC);
}
} else {
/*
* O_SYNC or O_DSYNC _with_ a size update are
handled
* the same way.
*
* If the write was synchronous then we need to make
* sure that the inode modification time is
permanent.
* We'll have updated the timestamp above, so here
* we use a synchronous transaction to log the
inode.
* It's not fast, but it's necessary.
*
* If this a dsync write and the size got changed
* non-transactionally, then we need to ensure that
* the size change gets logged in a synchronous
* transaction.
*/
tp = xfs_trans_alloc(mp, XFS_TRANS_WRITE_SYNC);
if ((error = xfs_trans_reserve(tp, 0,
XFS_SWRITE_LOG_RES(mp),
0, 0, 0))) {
/* Transaction reserve failed */
xfs_trans_cancel(tp, 0);
} else {
/* Transaction reserve successful */
xfs_ilock(xip, XFS_ILOCK_EXCL);
xfs_trans_ijoin(tp, xip, XFS_ILOCK_EXCL);
xfs_trans_ihold(tp, xip);
xfs_trans_log_inode(tp, xip, XFS_ILOG_CORE);
xfs_trans_set_sync(tp);
error = xfs_trans_commit(tp, 0,
(xfs_lsn_t)0);
xfs_iunlock(xip, XFS_ILOCK_EXCL);
}
}
|