
Re: vim file write mode on journaling fs.

To: Bram Moolenaar <Bram@xxxxxxxxxxxxx>
Subject: Re: vim file write mode on journaling fs.
From: Steve Lord <lord@xxxxxxx>
Date: Sat, 11 Aug 2001 16:26:32 -0500
Cc: Russell Cattelan <cattelan@xxxxxxxxxxx>, Seth Mos <knuffie@xxxxxxxxx>, Linux XFS Mailing List <linux-xfs@xxxxxxxxxxx>
Comments: In-reply-to Bram Moolenaar <Bram@moolenaar.net> message dated "Sat, 11 Aug 2001 20:48:01 +0200."
References: <200108111848.f7BIm1704198@moolenaar.net>
Sender: owner-linux-xfs@xxxxxxxxxxx
OK, just picking a message to respond to from this thread.

  1. It is not XFS that decides when to write the file data out
     to disk; it is Linux. The bdflush daemon is responsible for this,
     and that 30 seconds is one of its control parameters.

  2. An individual thread doing a write in XFS has no way of knowing
     or predicting what else may just have happened, or be about to happen,
     on the system. You cannot say 'I will write my data now because
     the system is idle'; there may be a couple of Gbytes of I/O about
     to come in from another source.

  3. O_SYNC is a fairly standard flag on file open; if FreeBSD does not
     have it, then it is missing a fairly major feature of filesystems.
     Having said that, I do not recommend using O_SYNC, as it is more
     expensive than an fsync.

  4. Anyone who is shutting down their system by pulling the plug 
     rather than doing an orderly shutdown is asking for trouble.
     Yes, it is a situation we want to deal with fairly gracefully,
     but filesystem recovery in journaling filesystems and fsck in
     others is there to 'recover' from problems, not to cater to
     lazy users. umount is there for a reason.

  5. The delayed write you talk about is the norm for ALL filesystems
     operating on spinning disks. If you don't delay writes in a filesystem,
     then you will be here until Christmas responding to this email.
     XFS, however, also has delayed allocation, which is something different.

     The normal process of a write in Linux is something like this:

        o The write system call comes in and looks for space in the file;
          if there is none, it asks the filesystem to allocate some. The
          data is copied into a buffer which has the disk address of this
          data and which is marked dirty.
          The write then returns: the data is NOT written to disk, nor,
          in the case of ext2, would any of the metadata changes be
          written to disk.

        o The bdflush daemon comes along and sees the buffers as suitable
          for flushing, and the data and metadata get written out to disk.

     With XFS it looks like this:

        o The write system call comes in and looks for space in the file;
          if there is none, it asks the filesystem for some. The filesystem
          records the fact that space was requested at this point in the
          file. A buffer is allocated as before and marked dirty; it is
          also marked delayed allocate. The write returns.

        o Possibly an inode flush or a log flush pushes the new inode
          out to disk.

        o The bdflush daemon comes along, sees that the buffers are marked
          delayed allocate, and calls the filesystem to allocate the space.
          The allocation is done, and the buffers are written out to disk.
          The transaction which records the extents is still in memory
          and will not be flushed for a few seconds yet.

     That last sentence is the major difference, and it is probably what
     is biting here: the write has not really happened until this metadata
     makes it out to disk. We may be having some issues with how long
     this takes in Linux.
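
Point 3 recommends fsync over O_SYNC. The sketch below illustrates the two approaches from user space, using Python's os module on a POSIX system. The helper and function names are hypothetical. With O_SYNC, each write() call waits for that chunk to reach stable storage. With plain writes followed by one fsync(), the data sits in the cache until a single flush at the end. That single flush is why writing a file in many small pieces is typically cheaper with fsync.

```python
import os
from typing import Iterable

def _write_all(fd: int, data: bytes) -> None:
    # os.write may write fewer bytes than requested, so loop until done
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]

def save_with_osync(path: str, chunks: Iterable[bytes]) -> None:
    # O_SYNC: every write() returns only once that chunk is on stable storage
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o644)
    try:
        for chunk in chunks:
            _write_all(fd, chunk)  # one synchronous flush per chunk
    finally:
        os.close(fd)

def save_with_fsync(path: str, chunks: Iterable[bytes]) -> None:
    # Ordinary buffered writes, then a single fsync() before close
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for chunk in chunks:
            _write_all(fd, chunk)  # lands in the cache only
        os.fsync(fd)               # one flush of the file's data and metadata
    finally:
        os.close(fd)
```

Both functions leave the same bytes in the file. The difference lies in how many times the caller waits on the disk.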
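
To make the contrast in point 5 concrete, here is a toy model of the two write paths. It is not Linux or XFS code, and every name in it is hypothetical. The model keeps two kinds of state:
- Crash-surviving state: the data blocks and the extent map (file offset to disk block) that are actually on disk.
- In-memory state: dirty buffers and pending log transactions.

In the classic path, space is allocated at write() time, and bdflush writes the data and metadata together. In the delayed-allocation path, bdflush allocates and writes the data, but the extent record only reaches disk on a later log flush. The "possibly an inode flush" step is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Toy model only: hypothetical names, not kernel or XFS code.

@dataclass
class Buf:
    offset: int                  # logical block within the file
    data: bytes
    block: Optional[int] = None  # disk address; None means "delayed allocate"
    dirty: bool = True

class ToyFS:
    def __init__(self, delalloc: bool):
        self.delalloc = delalloc
        self.cache: List[Buf] = []
        self.next_free = 100
        # State that survives a crash:
        self.disk_data: Dict[int, bytes] = {}   # disk block -> contents
        self.disk_extents: Dict[int, int] = {}  # file offset -> disk block
        # Extent records logged in memory, waiting for a log flush:
        self.pending_log: List[Tuple[int, int]] = []

    def _allocate(self, buf: Buf) -> None:
        buf.block = self.next_free
        self.next_free += 1

    def write(self, offset: int, data: bytes) -> None:
        buf = Buf(offset, data)
        if not self.delalloc:
            self._allocate(buf)  # classic path: disk address chosen now
        self.cache.append(buf)   # buffer is dirty; write() returns here

    def bdflush(self) -> None:
        for buf in self.cache:
            if not buf.dirty:
                continue
            if buf.block is None:
                self._allocate(buf)  # delayed allocation happens at flush time
                # The extent record only goes into an in-memory transaction
                self.pending_log.append((buf.offset, buf.block))
            else:
                # Classic path: metadata is written out along with the data
                self.disk_extents[buf.offset] = buf.block
            self.disk_data[buf.block] = buf.data
            buf.dirty = False

    def log_flush(self) -> None:
        # Commit pending extent records to the on-disk extent map
        for offset, block in self.pending_log:
            self.disk_extents[offset] = block
        self.pending_log.clear()

    def read_after_crash(self, offset: int) -> Optional[bytes]:
        # After a crash only disk_data and disk_extents survive
        block = self.disk_extents.get(offset)
        return None if block is None else self.disk_data.get(block)
```

For example, after `write()` and `bdflush()`, `ToyFS(delalloc=False).read_after_crash(0)` finds the data. `ToyFS(delalloc=True)` returns `None` until `log_flush()` runs, even though the data blocks themselves are already on disk.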
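
As point 5 notes, on XFS a write has not really happened until the extent metadata reaches disk. An application that cannot tolerate that window can ask for durability explicitly. The sketch below shows a common save pattern, assuming Linux and Python's os module. It is not something this thread prescribes, and the `durable_replace` name and ".tmp" suffix are hypothetical. The steps are:
1. Write a temporary file.
2. fsync() it. This is expected to commit the file's data and the metadata needed to read it back.
3. Rename it over the original.
4. fsync() the directory so the rename itself is durable.

```python
import os

def durable_replace(path: str, data: bytes) -> None:
    # Hypothetical helper: save `data` so a crash leaves either the old or new file
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical temp-file naming scheme
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # force the file's data and metadata to disk
    os.replace(tmp_path, path)  # atomic rename within one filesystem on POSIX
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)      # persist the directory entry change (Linux)
    finally:
        os.close(dir_fd)
```

The cost is one or two forced flushes per save. That is the trade-off the delayed write path is designed to avoid for everything else.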

So the upshot of all of this is that I suspect we do have an issue, and we
will get to it at some point. In the meantime, there is no need to start
discarding filesystems which do not behave as you want them to when you
pull the plug.

Steve

