
Re: op-journaled fs, journal size and storage speeds

To: Peter Grandi <pg_mh@xxxxxxxxxx>
Subject: Re: op-journaled fs, journal size and storage speeds
From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Mon, 2 May 2011 06:40:31 -0400
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Linux fs XFS <linux-xfs@xxxxxxxxxxx>, Linux fs JFS <jfs-discussion@xxxxxxxxxxxxxxxxxxxxx>
In-reply-to: <19901.41647.606112.243194@xxxxxxxxxxxxxxxxxx>
References: <19900.8703.214676.218477@xxxxxxxxxxxxxxxxxx> <20110501092758.GG13542@dastard> <19901.41647.606112.243194@xxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Sun, May 01, 2011 at 07:13:03PM +0100, Peter Grandi wrote:
> > That's why you can configure an external log....
> ...and lose barriers :-). But indeed.

Using a writeback cache on the log device is rather pointless, as
every write needs write-through semantics using FUA or a post-flush
anyway.  But I actually have a patch to allow for devices with
a writeback cache in external log configurations; it's just a bit
complicated as we basically need to copy the pre-flush state machine
into XFS to deal with the preflush being for a different device
than the actual write.
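
To illustrate the ordering only, here is a minimal userspace analogue
in Python.  It is not the XFS patch and not kernel code:
commit_log_record and both file descriptors are hypothetical, fsync()
on the data file stands in for the pre-flush of the data device, and
O_DSYNC on the log file stands in for FUA / post-flush on the log
device.

```python
import os

def commit_log_record(data_fd, log_fd, record, log_offset):
    """Hypothetical helper: pre-flush one device, then write the log to another.

    Assumes log_fd was opened with os.O_DSYNC, so the write below only
    returns once the record itself has reached stable storage.
    """
    # Step 1, the "pre-flush": make earlier writes to the data file stable
    # before the log record that depends on them is written.
    os.fsync(data_fd)
    # Step 2: write the log record with write-through semantics (O_DSYNC).
    return os.pwrite(log_fd, record, log_offset)
```

The point of the sketch is just that the flush and the write target
different descriptors, which is the case the block layer's pre-flush
handling for a single device does not cover.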

> >> But if they can be pretty small, I wonder whether putting the
> >> journals of several filesystems on the same storage device then
> >> becomes a sensible option as the locality will be quite narrow
> >> (e.g. a single physical cylinder) or it could be worthwhile like
> >> the database people do to journal to battery-backed RAM.
> For example as described in this old paper:

It only makes sense if the log activity bursts for the different
filesystems happen at different times, or none of the filesystems
maxes out the log IOP rate.  
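
A rough way to check that condition, as a Python sketch.  The function
name, the per-interval sampling and all numbers are made up for
illustration; it approximates the condition as "the combined log IOP
demand never exceeds what the shared device can deliver".

```python
def shared_log_fits(per_fs_iops, device_iops):
    """Return True if the shared log device keeps up in every interval.

    per_fs_iops: one list per filesystem, each holding the log IOPs that
    filesystem demands in successive time intervals (hypothetical samples;
    lists are assumed to be of equal length).
    device_iops: log IOPs the shared device can sustain (assumed constant).
    """
    # Add up the demand of all filesystems for each interval.
    combined = [sum(interval) for interval in zip(*per_fs_iops)]
    # The device is sufficient if even the busiest interval fits.
    return max(combined, default=0) <= device_iops

# Bursts at different times: the peaks never coincide.
staggered = shared_log_fits([[900, 0, 0, 900], [0, 800, 800, 0]], 1000)
# Bursts at the same time: the combined peak exceeds the device.
overlapping = shared_log_fits([[900, 0], [800, 0]], 1000)
```

With these made-up samples the staggered case fits and the overlapping
one does not, even though the individual filesystems are identical.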

> But they seem to me fundamentally terrible for journals, because
> of the large erase blocks sizes and the enormous latency of erase
> operations (lots of read-erase-write cycles for small commits).
> They seem more oriented to large mostly read-only data sets than
> very small mostly write ones.

As mentioned earlier in this thread, XFS can align and pad
log writes.  Just make sure to get a device with an erase block
size <= 256 kilobytes, which usually means SLC.  Even drives
with a larger erase block size and sane firmware tend to be faster
than plain old disks, though.  But as Dave mentioned, there's nothing
that's going to beat a battery-backed cache/memory for log IOP
performance.
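
The padding arithmetic can be sketched in Python as follows.  This is
a simplification and not the XFS code: pad_log_write and the sizes are
hypothetical, and the alignment unit is assumed to equal a 256 KiB
erase block.

```python
ERASE_BLOCK = 256 * 1024  # assumed erase block size / alignment unit, in bytes

def pad_log_write(length, unit=ERASE_BLOCK):
    """Round a log write of `length` bytes up to a whole number of units."""
    # Ceiling division without floats, then scale back to bytes.
    return -(-length // unit) * unit

small = pad_log_write(4096)      # a small commit still occupies one full unit
large = pad_log_write(300_000)   # slightly over one unit rounds up to two
```

When every log write starts on a unit boundary and covers whole units,
the device never has to read back and rewrite a partially filled erase
block for a small commit; the cost is the log space used by the padding.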

> The saving grace is the capacitor-backed RAM in SSDs (used to work
> around erase block size issues as you probably know) which to a
> significant extent may act as the  battery-backed RAM  I was
> mentioning; and similarly as another post says the  battery-backed
> RAM  in RAID host adapters would do much the same function.

Just make sure your device actually has it.  Both the Intel X25 SSDs
and many other consumer / prosumer SSDs actually don't have it
and will lose data in case of a power loss.
