
Re: TAKE - make "osyncisdsync" the default

To: Andrew Morton <akpm@xxxxxxxxxx>
Subject: Re: TAKE - make "osyncisdsync" the default
From: Stephen Lord <lord@xxxxxxx>
Date: Fri, 15 Feb 2002 19:46:57 -0600
Cc: Eric Sandeen <sandeen@xxxxxxx>, "linux-xfs@xxxxxxxxxxx" <linux-xfs@xxxxxxxxxxx>
References: <200202152250.g1FMokw05049@xxxxxxxxxxxxxxxxxxxxxx> <3C6D9686.5444468A@xxxxxxxxxx> <3C6D9B4A.2000601@xxxxxxx> <3C6D9D57.FA92EA6A@xxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7) Gecko/20011226
Andrew Morton wrote:

Stephen Lord wrote:

Andrew Morton wrote:

Eric Sandeen wrote:

"osyncisdsync" makes us go faster, and it's how other Linux
filesystems are set up, anyway - so make it the default.

The comment lies :)

      /* For now, when the user asks for O_SYNC, we'll actually
       * provide O_DSYNC. */
      if (status >= 0) {
              if ((file->f_flags & O_SYNC) || IS_SYNC(inode))
                      status = generic_osync_inode(inode,
                                      OSYNC_METADATA|OSYNC_DATA);
      }

On other filesystems, O_SYNC writes actually sync both metadata
and data, as well as the inode, if i_size changed.

For default behaviour I suggest the best semantics are for
an O_SYNC write to guarantee that the written data will be
available after a crash.

Ah, but we do not go there:


Now I'm lost.  Eric's commit message seems to indicate that
other filesystems do not sync metadata with O_SYNC, which
isn't the case.

What is the proposed XFS behaviour with O_SYNC??

Thanks.


File data is always pushed out to disk.

Basically, if this write did not change the inode size, then the last
transaction which affected the inode is flushed out to the on-disk log.
If the write did change the inode size, then a new transaction containing
the inode is created and flushed out to disk immediately.

The difference between the two behaviors is that under the old default we
always did the transaction at the end of the write. The only consequence of
the change is that the inode timestamps are no longer forced out to disk as
part of each and every write.

Having the last transaction which modified the inode forced out to disk
makes all the metadata associated with the write safe.

So the only 'metadata' affected by this change is the inode timestamps.
This is actually the default behavior on a number of unix implementations.
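
As a rough illustration of the decision described above, here is a minimal
sketch in C. It is only a model of the prose, not the XFS code: the enum, the
function name and both parameters are hypothetical, and the data flush (which
happens in every case) is left out.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; these are not XFS identifiers. */
enum sync_action {
    FORCE_LAST_TRANSACTION, /* flush the log up to the inode's last transaction */
    LOG_INODE_AND_FORCE     /* new transaction containing the inode, then flush */
};

/*
 * Model of the metadata step that follows an O_SYNC write.
 * size_changed:  the write changed the inode size.
 * osyncisdsync:  the new default is in effect (false models the old default).
 */
static enum sync_action osync_metadata_action(bool size_changed,
                                              bool osyncisdsync)
{
    /* A size change is always logged in a fresh transaction. */
    if (size_changed)
        return LOG_INODE_AND_FORCE;

    /* Old default: do the transaction at the end of every write. */
    if (!osyncisdsync)
        return LOG_INODE_AND_FORCE;

    /* New default: forcing the last transaction is enough; only the
     * timestamps are not pushed out by this write. */
    return FORCE_LAST_TRANSACTION;
}
```

In this model the two settings differ only for writes that leave the inode
size unchanged, which matches the point that only the timestamps are affected.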

Steve


