On Wed, Jul 30, 2014 at 1:41 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Tue, Jul 29, 2014 at 08:38:16AM -0400, Brian Foster wrote:
>> On Tue, Jul 29, 2014 at 10:53:09AM +0200, Frank . wrote:
>> > Hello.
>> > I just wanted more information about the delaylog feature.
>> > From what I understand, it seems to be a feature common to different
>> > filesystems. It's supposed to retain information such as metadata for
>> > a time (how long?). Unfortunately, I could not find further
>> > information about the journaling log section in the official XFS
>> > documentation.
>> > I just figured out that the delaylog feature is now always enabled
>> > and there is no way to disable it
>> > (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/xfs.txt?id=HEAD).
>> There is a design document for XFS delayed logging co-located with the
>> xfs doc:
> Or, indeed, here:
>> I'm not an expert on the delayed logging infrastructure so I can't give
>> details, but it's basically a change to aggregate logged items into a
>> list (committed item list - CIL) and "local" areas of memory (log
>> vectors) at transaction commit time rather than logging directly into
>> the log buffers. The benefits and tradeoffs of this are described in the
>> link above. One tradeoff is that more items can be aggregated before a
>> checkpoint occurs, so that naturally means more items are batched in
>> memory and written to the log at a time.
>> This in turn means that in the event of a crash, more logged items are
>> lost than with the older, less efficient implementation. This doesn't
>> affect the consistency of the fs, which is the purpose of the log.
> In a nutshell.
> Basically, logging in XFS is asynchronous unless directed by the
> user application, specific operational constraints or mount options
> to be synchronous.
>> > Whatever that information may be, I understand that it is held
>> > temporarily in RAM.
>> > Recently, I had a crash on a server and I had to execute the repair
>> > procedure which worked fine.
>> A crash should typically only require a log replay and that happens
>> automatically on the next mount. If you experience otherwise, it's a
>> good idea to report that to the list with the data listed here:
>> > But I would like to disable this feature so that no temporary data
>> > goes unwritten to disk. (Write cache is already disabled on both the
>> > hard drives and the RAID controller.)
>> > Perhaps disabling it is a bad idea. If so, I would like your opinion
>> > on where memory corruption could occur.
>> Delayed logging is not configurable these days. The original
>> implementation was optional via a mount option, but my understanding is
>> that might have been more of a precaution for a new feature than a real
>> tuning option.
>> If you want to ensure consistency of certain operations, those
>> applications should issue fsync() calls as appropriate. You could also
>> look into the 'wsync' mount option (and probably expect a significant
>> performance hit).
> Using the 'wsync' or 'dirsync' mount options effectively causes the
> majority of transactions to be synchronous - this has always been the
> case, even before delayed logging was implemented - so that once a
> user-visible namespace operation completes, it is guaranteed to be on stable
> storage. This is necessary for HA environments so that failover from
> one server to another doesn't result in files appearing or
> disappearing on failover...
> Note that this does not change file data behaviour. In this case you
> need to add the "sync" mount option, which forces all buffered IO to
> be synchronous and so will be *very slow*. But if you've already
> turned off the BBWC on the RAID controller then your storage is
> already terribly slow and so you probably won't care about making
> performance even worse...
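For reference, these are mount options, so they would normally go in /etc/fstab (a hypothetical entry; the device and mount point are made up):

```
# /etc/fstab - hypothetical line. 'wsync' makes namespace operations
# synchronous; adding 'sync' as well forces synchronous buffered data IO.
/dev/sdb1  /data  xfs  wsync  0  2
```

They can also be passed directly on the command line via mount(8) with `-o`.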
Dave, excuse my ignorant questions.
I know the Linux kernel keeps dirty data in cache for up to 30 seconds
before a kernel daemon flushes it to disk, unless the configured dirty
ratio (which is 40% of RAM, iirc) is reached before those 30 seconds, in
which case the flush happens sooner.
What I did is lower those 30 seconds to 5 seconds, so every 5 seconds
dirty data is flushed to disk (I've set dirty_expire_centisecs to 500).
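For what it's worth, that knob lives under /proc/sys/vm and can be made persistent via sysctl configuration:

```
# /etc/sysctl.conf - mark dirty pages for writeback after ~5s
# instead of the 30s default
vm.dirty_expire_centisecs = 500
```

Note that dirty_expire_centisecs controls when dirty data becomes *eligible* for writeback; the flusher threads wake on a separate interval, vm.dirty_writeback_centisecs (500 centisecs, i.e. 5 seconds, by default).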
So, are there any drawbacks to doing this? I mean, I don't care *that*
much about performance, but I do want my dirty data to reach stable
storage in a reasonable amount of time. I looked at the various sync
mount options, but they are all synchronous, so my impression is that
they will be slower than letting the kernel keep data for 5 seconds and
then flush it.
From an XFS perspective, I'd like to know whether this is recommended or
not. I know that setting the above to 500 centisecs means there will be
more writes to disk, which may result in wear & tear, thus shortening
the lifetime of the disk.
This is a regular desktop system with a single Seagate Constellation
SATA disk, so no RAID, LVM, thin provisioning or anything else.
What do you think? :)
> Dave Chinner
> xfs mailing list