
Re: Delaylog information enquiry

To: Brian Foster <bfoster@xxxxxxxxxx>
Subject: Re: Delaylog information enquiry
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 30 Jul 2014 09:41:51 +1000
Cc: "Frank ." <frank_1005@xxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140729123815.GA13120@xxxxxxxxxxxxxxx>
References: <DUB129-W7B2973281D7E749989D43EEF80@xxxxxxx> <20140729123815.GA13120@xxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Jul 29, 2014 at 08:38:16AM -0400, Brian Foster wrote:
> On Tue, Jul 29, 2014 at 10:53:09AM +0200, Frank . wrote:
> > Hello. 
> > 
> > I just wanted to have more information about the delaylog feature. 
> > From what I understand, it seems to be a feature common to several
> > filesystems. It's supposed to retain information such as metadata for a
> > time (how long?). Unfortunately, I could not find further information
> > about the journaling log section in the official XFS documentation.
> > I just figured out that delaylog feature is now included and there is no 
> > way to disable it 
> > (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/xfs.txt?id=HEAD).
> >  
> > 
> 
> There is a design document for XFS delayed logging co-located with the
> xfs doc:
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/xfs-delayed-logging-design.txt?id=HEAD

Or, indeed, here:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs-documentation.git;a=blob;f=design/xfs-delayed-logging-design.asciidoc

> I'm not an expert on the delayed logging infrastructure so I can't give
> details, but it's basically a change to aggregate logged items into a
> list (committed item list - CIL) and "local" areas of memory (log
> vectors) at transaction commit time rather than logging directly into
> the log buffers. The benefits and tradeoffs of this are described in the
> link above. One tradeoff is that more items can be aggregated before a
> checkpoint occurs, so that naturally means more items are batched in
> memory and written to the log at a time.
> 
> This in turn means that in the event of a crash, more logged items are
> lost than with the older, less efficient implementation. This doesn't
> affect the consistency of the fs, which is the purpose of the log.

In a nutshell.

Basically, logging in XFS is asynchronous unless directed by the
user application, specific operational constraints or mount options
to be synchronous.
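
As an illustration of the "directed by the user application" case, here
is a minimal Python sketch, assuming a Linux system where os.O_SYNC is
available. The function name and file name are hypothetical. Opening
with O_SYNC asks the kernel to make each write() synchronous, so the
call should not return until the data has been handed to stable storage:

```python
import os
import tempfile


def append_record_sync(path, payload):
    """Append payload to path using O_SYNC.

    Assumption: os.O_SYNC is available (Linux and most Unix systems).
    Each write() blocks until the kernel reports the data as written.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        written = 0
        # os.write() may write fewer bytes than requested, so loop.
        while written < len(payload):
            written += os.write(fd, payload[written:])
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "journal.dat")  # hypothetical file name
        append_record_sync(target, b"record 1\n")
        append_record_sync(target, b"record 2\n")
```

Only files opened this way pay the cost; everything else on the
filesystem keeps the default asynchronous behaviour.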

> > Whatever the information it could be, I understood that this is a temporary 
> > memory located in RAM. 
> > Recently, I had a crash on a server and I had to execute the repair 
> > procedure which worked fine. 
> > 
> 
> A crash should typically only require a log replay and that happens
> automatically on the next mount. If you experience otherwise, it's a
> good idea to report that to the list with the data listed here:
> 
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> 
> > But I would like to disable this feature to prevent any temporary data
> > from not being written to disk. (Write cache is already disabled on both
> > the hard drive and the RAID controller).
> > 
> > Perhaps it's a bad idea disabling it. If so, I would like to have your 
> > opinion about where memory corruption could happen. 
> > 
> 
> Delayed logging is not configurable these days. The original
> implementation was optional via a mount option, but my understanding is
> that might have been more of a precaution for a new feature than a real
> tuning option.
> 
> If you want to ensure consistency of certain operations, those
> applications should issue fsync() calls as appropriate. You could also
> look into the 'wsync' mount option (and probably expect a significant
> performance hit).

Using the 'wsync' or 'dirsync' mount options effectively causes the
majority of transactions to be synchronous - they always have, even
before delayed logging was implemented - so that once a user visible
namespace operation completes, it is guaranteed to be on stable
storage. This is necessary for HA environments so that failover from
one server to another doesn't result in files appearing or
disappearing on failover...

Note that this does not change file data behaviour. In this case you
need to add the "sync" mount option, which forces all buffered IO to
be synchronous and so will be *very slow*. But if you've already
turned off the BBWC (battery-backed write cache) on the RAID controller
then your storage is
already terribly slow and so you probably won't care about making
performance even worse...
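
For applications that can be modified, the per-file alternative to
mounting with "sync" is to call fsync() on just the files that matter,
as suggested earlier in the thread. A minimal sketch in Python, assuming
Linux; durable_write and the temp-file-plus-rename pattern are
illustrative choices and not anything XFS specific:

```python
import os
import tempfile


def durable_write(path, data):
    """Replace path with data and return only after fsync() completes.

    Hypothetical helper: writes a temporary file in the same directory,
    fsyncs it, renames it over the target, then fsyncs the directory so
    the rename itself is also flushed.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to flush file data
        os.replace(tmp, path)     # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # Flush the directory entry change (the namespace operation).
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        durable_write(os.path.join(d, "state.bin"), b"important data")
```

This keeps the cost of synchronous IO limited to the files that need
it, rather than applying it to every buffered write on the filesystem.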

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
