Re: TAKE - make "osyncisdsync" the default
Andrew Morton wrote:
>Stephen Lord wrote:
>
>>Andrew Morton wrote:
>>
>>>Eric Sandeen wrote:
>>>
>>>>"osyncisdsync" makes us go faster, and it's how other Linux
>>>>filesystems are set up, anyway - so make it the default.
>>>>
>>>The comment lies :)
>>>
>>> /* For now, when the user asks for O_SYNC, we'll actually
>>> * provide O_DSYNC. */
>>> if (status >= 0) {
>>> if ((file->f_flags & O_SYNC) || IS_SYNC(inode))
>>> status = generic_osync_inode(inode, OSYNC_METADATA|OSYNC_DATA);
>>> }
>>>
>>>On other filesystems, O_SYNC writes actually sync both metadata
>>>and data, as well as the inode, if i_size changed.
>>>
>>>For default behaviour I suggest the best semantics are for
>>>an O_SYNC write to guarantee that the written data will be
>>>available after a crash.
>>>
>>Ah, but we do not go there:
>>
>
>Now I'm lost. Eric's commit message seems to indicate that
>other filesystems do not sync metadata with O_SYNC, which
>isn't the case.
>
>What is the proposed XFS behaviour with O_SYNC??
>
>Thanks.
>
File data is always pushed out to disk.
Basically, if we did not update the inode size during this write, then the
last transaction which affected the inode is flushed out to the on-disk log.
If the inode size was changed by the write, then a new transaction is created
with the inode in it and flushed out to disk immediately.
The difference between the two behaviors is that with the old default we
always do the transaction at the end of the write. The only consequence of
the change is that the inode timestamps are not forced out to disk as part
of each and every write.
Having the last transaction which modified the inode forced out to disk
makes all the metadata associated with the write safe.
So the only 'metadata' affected by this change is the inode timestamps;
this is actually the default behavior on a number of Unix implementations.
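
To make the two cases concrete, here is a rough sketch of the decision
described above. It is a toy model only, not the XFS code: every name in it
(osync_action, osync_metadata_action, timestamps_forced) is made up for
illustration, and it assumes that a new transaction containing the inode
carries the timestamps along with it.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none of these are XFS symbols. */
enum osync_action {
    /* Force the last transaction that touched the inode out to the log. */
    FORCE_LAST_INODE_TRANSACTION,
    /* Create a new transaction containing the inode and flush it now. */
    LOG_NEW_INODE_TRANSACTION
};

/*
 * Models only the metadata step of an O_SYNC write; file data is pushed
 * out to disk in every case.
 *
 * size_changed: the write changed the inode size.
 * osyncisdsync: the proposed default (true) versus the old default (false).
 */
static enum osync_action osync_metadata_action(bool size_changed,
                                               bool osyncisdsync)
{
    /* Old default: always do the transaction at the end of the write. */
    if (!osyncisdsync)
        return LOG_NEW_INODE_TRANSACTION;

    /* Proposed default: a new transaction only when the size changed. */
    return size_changed ? LOG_NEW_INODE_TRANSACTION
                        : FORCE_LAST_INODE_TRANSACTION;
}

/*
 * Assumption: timestamps reach disk with the write only when the inode
 * itself is logged in a new transaction.
 */
static bool timestamps_forced(bool size_changed, bool osyncisdsync)
{
    return osync_metadata_action(size_changed, osyncisdsync)
           == LOG_NEW_INODE_TRANSACTION;
}
```

In this model the only case that differs between the two defaults is a write
that leaves the inode size alone: the old default logs the inode anyway,
while the new one just forces the last transaction for that inode.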
Steve