Steve Lord wrote:
> OK, just picking a message to respond to from this thread.
> 1. It is not XFS which is deciding when to write the file data out
> to disk, it is Linux. The bdflush daemon is responsible for this,
> that 30 seconds is one of its control parameters.
> 2. An individual thread doing a write in XFS has no way of knowing
> or predicting what else may just have happened, or be about to happen
> on the system. You cannot say 'I will write my data now because
> the system is idle', there may be a couple of Gbytes of I/O about
> to come in from another source.
These two are technical reasons. I like to look at it from the POV of the
user or system administrator. The technical implementation needs to match
what they want to do. The implementation may need to be changed to make it
work properly. Yes, that would be work.
> 3. O_SYNC is a fairly standard flag on file open, if freebsd does not
> have it then it is missing a fairly major feature of filesystems.
> Having said that, I do not recommend using O_SYNC, it is more
> expensive than an fsync.
Does O_SYNC mean every single write() is synced? We don't need that.
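For what it's worth, the difference can be sketched like this (a minimal illustration using Python's os module; the flags are the standard POSIX ones):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo")

# O_SYNC variant: every single write() blocks until the data is on disk,
# which is why it is more expensive than calling fsync() yourself.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
os.write(fd, b"synced on every write\n")
os.close(fd)

# fsync variant: writes stay buffered; one explicit flush at the moment
# the application actually cares about (e.g. when the user saves).
fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
os.write(fd, b"flushed once, when we choose\n")
os.fsync(fd)  # data forced to disk only here
os.close(fd)

with open(path, "rb") as f:
    data = f.read()
```

So no, O_SYNC is not needed for this: a single fsync() at save time gives the same guarantee for the write that matters.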
> 4. Anyone who is shutting down their system by pulling the plug
> rather than doing an orderly shutdown is asking for trouble.
> Yes, it is a situation we want to deal with fairly gracefully,
> but filesystem recovery in journaling filesystems and fsck in
> others is there to 'recover' from problems, not to cater to
> lazy users. umount is there for a reason.
The "pulling the plug" test is a very good way to check what would happen if
the power fails unexpectedly. E.g., when a fuse blows.
> 5. The delayed write you talk about is the norm for ALL filesystems
> operating on spinning disks. If you don't delay writes in a filesystem
> then you will be here until Christmas responding to this email.
> Now XFS has delayed allocation which is different.
More technical reasons, which explain why it works this way, but give no
reason to want it that way.
> This last sentence is the major difference, and is probably what is
> biting here, the write has not really happened until this metadata
> makes it out to disk. We may be having some issues with how long
> this is taking in Linux.
There appears to be a problem: there is only a fixed delay, no clever
mechanism that starts flushing data when the disk isn't very busy.
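To make the idea concrete, here is a toy user-space sketch of that behaviour (the threshold and the use of the load average are my own guesses; a real implementation would live in the kernel, where the actual disk-queue state is visible):

```python
import os

def flush_if_idle(idle_threshold=0.25):
    # Instead of waiting for a fixed 30-second timer, look at the
    # 1-minute load average and push dirty buffers out with sync()
    # whenever the machine looks idle.  Returns True if it flushed.
    load1, _, _ = os.getloadavg()
    if load1 < idle_threshold:
        os.sync()  # flush now, while the disk is cheap to use
        return True
    return False
```

Called from a periodic loop, this would get the coffee-break scenario below right: the data goes out as soon as the typing stops, not 30 seconds later.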
> So the upshot of all of this is that I suspect we do have an issue, and we
> will get to it at some point. In the mean time there is no need to start
> discarding filesystems which do not behave as you want them to do if you
> pull the plug.
I think the main issue is that I would expect the system to flush data when
the system isn't doing anything. Scenario: I'm editing a file, write it out
and go for coffee. Switching on the coffee machine blows a fuse, and the
computer was on the same one. Now, how big is the chance that my work was
saved? If it takes 30 seconds before the data gets written, I probably lost
it. If the system recognized that I stopped typing and decided it has some
time to write to disk, I would have probably lost nothing. If I started
running make and caused a lot of disk I/O, I would not be surprised if some
data was lost.
One specific issue is when I copy a file (or in Vim: make a small change to a
file and write the new version). You want to end up with either the old
version or the new version. You might lose your most recent changes, but
that's only a small problem. It's a big problem when both are lost.
Technically this probably means that the metadata shouldn't be updated before
the data has been written. This is probably very complicated, but that is
what I see as an important issue for data safety.
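That old-version-or-new-version guarantee can already be had from user space, by writing the new file first and only renaming it into place after an fsync (a sketch in Python; whether an editor wants to pay an fsync on every save is another question):

```python
import os
import tempfile

def safe_replace(path, data):
    # Write the new version to a temporary file in the same directory,
    # force the data to disk, then rename over the old name.  rename()
    # is atomic on POSIX filesystems, so a crash at any point leaves
    # either the complete old file or the complete new one: the data
    # reaches disk before the metadata (the directory entry) changes.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    os.write(fd, data)
    os.fsync(fd)          # data first ...
    os.close(fd)
    os.rename(tmp, path)  # ... directory update second

# Example: the second call can only ever leave "old" or "new" on disk.
target = os.path.join(tempfile.mkdtemp(), "letter.txt")
safe_replace(target, b"old version\n")
safe_replace(target, b"new version\n")
```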
5 out of 4 people have trouble with fractions.
/// Bram Moolenaar -- Bram@xxxxxxxxxxxxx -- http://www.moolenaar.net \\\
((( Creator of Vim -- http://vim.sf.net -- ftp://ftp.vim.org/pub/vim )))
\\\ Help me helping AIDS orphans in Uganda - http://iccf-holland.org ///