Re: Issues with delalloc->real extent allocation

To: Geoffrey Wehrman <gwehrman@xxxxxxx>
Subject: Re: Issues with delalloc->real extent allocation
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 24 Jan 2011 10:26:37 +1100
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, bpm@xxxxxxx, xfs@xxxxxxxxxxx
In-reply-to: <20110121144152.GD10729@xxxxxxx>
References: <20110114235549.GI16267@dastard> <20110118204752.GB28791@xxxxxxxxxxxxx> <20110118231831.GZ28803@dastard> <20110119120321.GC12941@xxxxxxxxxxxxx> <20110119133147.GN16267@dastard> <20110119135548.GA11502@xxxxxxxxxxxxx> <20110120013346.GO16267@dastard> <20110120144503.GA7225@xxxxxxx> <20110121025140.GY16267@dastard> <20110121144152.GD10729@xxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)

On Fri, Jan 21, 2011 at 08:41:52AM -0600, Geoffrey Wehrman wrote:
> On Fri, Jan 21, 2011 at 01:51:40PM +1100, Dave Chinner wrote:
> | Realistically, for every disadvantage or advantage we can enumerate
> | for specific workloads, I think one of us will be able to come up
> | with a counter example that shows the opposite of the original
> | point. I don't think this sort of argument is particularly
> | productive. :/
> Sorry, I wasn't trying to be argumentative.  Rather I was just documenting
> what I saw as potential issues.  I'm not arguing against your proposed
> change.  If you don't find my sharing of observations productive, I'm
> happy to keep my thoughts to myself in the future.

Ah, that's not what I meant, Geoffrey. Reading it back, I probably
should have said "direction of discussion" rather than "sort of
argument" to make it more obvious that I was trying not to get stuck
with us going round and round trying to demonstrate the pros and cons
of different approaches on a workload-by-workload basis.

Basically all I was trying to do is move the discussion past a
potential sticking point - I definitely value the input and insight
you provide, and I'll try to write more clearly to hopefully avoid
such misunderstandings in future discussions.

> | Instead, I look at it from the point of view that a 64k IO is little
> | slower than a 4k IO so such a change would not make much difference
> | to performance. And given that terabytes of storage capacity is
> | _cheap_ these days (and getting cheaper all the time), the extra
> | space of using 64k instead of 4k for sparse blocks isn't a big deal.
> | 
> | When I combine that with my experience from SGI where we always
> | recommended using filesystems block size == page size for best IO
> | performance on HPC setups, there's a fair argument that using page
> | size extents for small sparse writes isn't a problem we really need
> | to care about.
> | 
> | I'd prefer to design for where we expect storage to be in the next
> | few years e.g. 10TB spindles. Minimising space usage is not a big
> | priority when we consider that in 2-3 years 100TB of storage will
> | cost less than $5000 (it's about $15-20k right now).  Even on
> | desktops we're going to have more capacity than we know what to do
> | with, so trading off storage space for lower memory overhead, lower
> | metadata IO overhead and lower potential fragmentation seems like
> | the right way to move forward to me.
> | 
> | Does that seem like a reasonable position to take, or are there
> | other factors that you think I should be considering?
> Keep in mind that storage of the future may not be on spindles, and
> fragmentation may not be an issue.  Even so, with SSD 64K I/O is very
> reasonable as most flash memory implements at a minimum 64K page.  I'm
> fully in favor of your proposal to require page sized I/O.

With flash memory there is the potential that we don't even need to
care. The trend is towards on-device compression (e.g. Sandforce
controllers already do this) to reduce write amplification to values
lower than one. Hence a 4k write surrounded by 60k of zeros is
unlikely to be a major issue as it will compress really well.... :)
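
A rough way to sanity-check that intuition is sketched below. It is
only an illustration under stated assumptions: zlib stands in for
whatever compression a controller really implements (those algorithms
are proprietary and will differ), the 64k page and 4k block sizes are
taken from the discussion above, and the helper names padded_page and
compressed_size are made up for the example.

```python
import os
import zlib

PAGE = 64 * 1024   # assumed page-sized extent (64k, as discussed above)
BLOCK = 4 * 1024   # assumed size of the real data written (4k)


def padded_page(payload: bytes, offset: int = 0, page_size: int = PAGE) -> bytes:
    """Return a zero-filled page with `payload` placed at `offset`."""
    if offset < 0 or offset + len(payload) > page_size:
        raise ValueError("payload does not fit inside the page")
    page = bytearray(page_size)  # bytearray(n) starts out as n zero bytes
    page[offset:offset + len(payload)] = payload
    return bytes(page)


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression (a stand-in, not a controller's algorithm)."""
    return len(zlib.compress(data))


if __name__ == "__main__":
    # Random bytes are close to incompressible, so this is a pessimistic
    # choice of payload: only the zero padding can be squeezed out.
    payload = os.urandom(BLOCK)
    page = padded_page(payload, offset=28 * 1024)

    print("raw page bytes:        ", len(page))
    print("compressed page bytes: ", compressed_size(page))
    print("compressed 4k payload: ", compressed_size(payload))
```

With an incompressible payload, the compressed 64k page should come
out only slightly larger than the compressed 4k payload on its own,
i.e. the 60k of zero padding costs very little once compression is in
the picture. Real devices may behave differently.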


Dave Chinner
