> The "xFS Project Architecture" design document has the following to say
> about the requirements made on disk drivers by XFS:
> Disk drivers are the same as in traditional and current IRIX systems,
> except for ordering constraints needed by the log and volume managers
> and rate guarantees. In particular, it must be possible for the volume
> manager to be certain that a write request has actually been completed,
> not merely cached for later writing. It should also be possible for the
> volume manager to specify that a given write not be reordered - that all
> blocks passed to the driver before this block will be written before
> this block is written, and all blocks passed to the driver after this
> block will be written after this block. (If ordered writes are not
> available on a particular driver, the volume manager can synthesize
> this behavior by waiting for completion, but this is much less efficient.)
> I assume the above still holds true. To my knowledge, only IBM's IDE
> drives support those semantics. Does this mean that all log writes on
> non-IBM IDE drives are synchronous? How does the volume manager know
> whether or not the disk driver supports this behaviour?
All ordering guarantees are handled by XFS itself nowadays - it does not
queue buffers down to the driver layer until it is OK for the I/O to complete.
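The fallback the quoted design document mentions - synthesizing ordering by waiting for each write to complete before issuing the next - can be sketched in userspace. This is illustrative only (not XFS or volume manager code), and note that fsync() only pushes data through the OS; the drive's own write cache may still hold it, which is exactly the problem discussed further down:

```python
import os
import tempfile

def ordered_writes(path, records):
    """Write records so each completes before the next is issued.

    This mimics the "wait for completion" fallback: if the driver
    cannot guarantee ordering, the caller serializes the writes
    itself by waiting for each one to finish.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for rec in records:
            os.write(fd, rec)
            os.fsync(fd)  # wait for this write before submitting the next
    finally:
        os.close(fd)

path = os.path.join(tempfile.mkdtemp(), "log")
ordered_writes(path, [b"first\n", b"second\n"])
```

The cost is obvious: every record pays a full round trip to the disk, which is why the design document calls this "much less efficient" than driver-level ordering.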
The one remaining aspect of this is probably GRIO handling, where more
explicit control over I/O requests is required.
The issue this is talking about now boils down to the fact that XFS wants the
I/O completion for a log or metadata write to really mean that the data is
going to survive the machine going down. This completion is used as an
indication that we can start writing the metadata (in the case of a log write
completion) or reuse log space (in the case of metadata I/O completion).
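That write-ahead discipline - log record durable first, metadata write only after the log completion - can be sketched as follows. This is a minimal illustration of the ordering rule, not XFS code, and it treats fsync() completion as "on stable storage", which is precisely the assumption drive write caching can break:

```python
import os

def commit_transaction(log_fd, meta_path, log_record, metadata):
    # 1. Write the log record and wait for its completion.
    #    Only after this completion may the metadata be written in place.
    os.write(log_fd, log_record)
    os.fsync(log_fd)

    # 2. Write the metadata; once THIS completes, the log space
    #    covering the record could safely be reused.
    with open(meta_path, "wb") as f:
        f.write(metadata)
        f.flush()
        os.fsync(f.fileno())

import tempfile
tmp = tempfile.mkdtemp()
log_path = os.path.join(tmp, "log")
meta_path = os.path.join(tmp, "meta")
log_fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
commit_transaction(log_fd, meta_path, b"T1: set inode size\n", b"inode-block-v2")
os.close(log_fd)
```

If the drive acknowledges the log write while the data is still only in its volatile cache, step 2 can start too early, and a power loss leaves metadata on disk with no log record to recover from.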
So drive write caching can be an issue: what happens to write-cached data at
1) power off
2) a SCSI bus reset (and the IDE equivalent)?
Generally this is controllable externally; I do not know whether manufacturers
default drive write caching to off or not - I know IBM does for SCSI drives.
I believe this will be an issue for other filesystems with logging, such as
ext3 and ReiserFS.
Long term it would be nice to have a driver interface which accepts a flag
with an I/O request indicating whether it is cacheable or not. There is some
I/O in XFS (user data) where write caching would be an acceptable thing to
do, and we have talked about being able to take advantage of it.
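Such an interface might look something like the following sketch. All the names here (IORequest, submit_io, StubDriver) are hypothetical illustrations of the idea, not any real kernel API; the stub just models the two destinations a request could take:

```python
from dataclasses import dataclass

@dataclass
class IORequest:
    data: bytes
    # True: the drive may hold this in its write cache (e.g. user data).
    # False: completion must mean the data is on stable media
    #        (e.g. log and metadata writes).
    cacheable: bool

class StubDriver:
    """Toy model of a driver honoring a per-request cacheable flag."""

    def __init__(self):
        self.cache = []  # contents lost on power failure / bus reset
        self.media = []  # contents survive a crash

    def submit_io(self, req):
        if req.cacheable:
            self.cache.append(req.data)
        else:
            self.media.append(req.data)

drv = StubDriver()
drv.submit_io(IORequest(b"user data", cacheable=True))
drv.submit_io(IORequest(b"log record", cacheable=False))
```

The point of the flag is that the filesystem, not the drive, knows which completions carry durability meaning, so only the writes that need it pay the cost of bypassing the cache.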
P.S. We do have some synchronous transactions where the log write gets forced
out to disk at transaction commit time. In most cases this does not happen,
though; XFS can (and does) do less than one disk I/O per transaction in the
best case. Another long-term goal would be to restructure these transactions
to avoid the requirement that they be synchronous.