
Re: xfs: add FITRIM support

To: Lukas Czerner <lczerner@xxxxxxxxxx>
Subject: Re: xfs: add FITRIM support
From: Lukas Czerner <lczerner@xxxxxxxxxx>
Date: Thu, 6 Jan 2011 09:40:48 +0100 (CET)
Cc: Michael Monnerie <michael.monnerie@xxxxxxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx, Dave Chinner <david@xxxxxxxxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>
In-reply-to: <alpine.LFD.2.00.1101060923210.2731@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <20101125112304.GA4195@xxxxxxxxxxxxx> <201101052307.38379@xxxxxx> <20110105225039.GD8322@dastard> <201101060910.34534@xxxxxx> <alpine.LFD.2.00.1101060923210.2731@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
User-agent: Alpine 2.00 (LFD 1167 2008-08-23)
On Thu, 6 Jan 2011, Lukas Czerner wrote:

> On Thu, 6 Jan 2011, Michael Monnerie wrote:
> 
> > On Mittwoch, 5. Januar 2011 Dave Chinner wrote:
> > > No state or additional on-disk
> > > structures are needed for xfs_fsr to do its work....
> > 
> > That's not exactly the same - once you have defragmented a file, you know
> > it's done, and can skip it next time. But you don't know whether the (free)
> > space between block 0 and 20 on disk has been rewritten since the last trim
> > run or not used at all, so you'd have to do it all again.
> >  
> > > The background trim is intended to enable even the slowest of
> > > devices to be trimmed over time, while introducing as little runtime
> > > overhead and complexity as possible. Hence adding complexity and
> > > runtime overhead to optimise background trimming tends to defeat the
> > > primary design goal....
> > 
> > It would be interesting to have real world numbers to see what's "best". 
> > I'd imagine a normal file or web server to store tons of files that are 
> > mostly read-only, while 5% of them are used a lot, as well as lots of temp
> > files. For this, knowing what's been used would be great.
> > 
> > Also, I'm thinking of a NetApp storage system that has been set up to run
> > deduplication on Sunday. It's best to run trim on Saturday, and it should
> > be finished before Sunday. For big storage systems that might not be easy
> > to finish if all disk space has to be freed explicitly.
> > 
> > And wouldn't it still be cheaper to keep a "written bmap" than to run
> > over the full space of a (big) disk? I'd say it depends on the workload.
> > 
> 
> I have already investigated the approach of storing information about
> blocks freed since the last trim. However, I found it not that useful for
> several reasons.
> 
> 1. Bitmaps are big; especially on the huge filesystems you are talking
> about, they will significantly increase memory utilization.
> 
> 2. An rbtree might be better; however, there is a threshold we need to
> watch, because when free space gets really fragmented the tree can become
> bigger than the bitmap. Moreover, it adds significant complexity and, of
> course, CPU utilization.

Not to mention the fact that neither bitmaps nor rbtrees can survive
umount.
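
To put rough numbers on points 1 and 2, here is a back-of-the-envelope
sketch in Python. It is only an illustration, not code from the patch: the
4 KiB block size and the 40-byte tree node (roughly a 64-bit struct rb_node
plus a start and a length field) are assumptions, and the function names
are made up for this example.

```python
def bitmap_bytes(fs_bytes, block_size=4096):
    """Memory needed for a bitmap with one bit per filesystem block."""
    blocks = -(-fs_bytes // block_size)   # ceiling division
    return -(-blocks // 8)                # 8 blocks tracked per byte


def extent_tree_bytes(n_extents, node_bytes=40):
    """Memory for a tree holding one node per freed extent.

    node_bytes is an assumed per-node cost, not a measured one.
    """
    return n_extents * node_bytes


def break_even_extents(fs_bytes, block_size=4096, node_bytes=40):
    """Largest extent count for which the tree is no bigger than the bitmap."""
    return bitmap_bytes(fs_bytes, block_size) // node_bytes


if __name__ == "__main__":
    fs = 16 * 2**40  # a 16 TiB filesystem
    print("bitmap:", bitmap_bytes(fs) // 2**20, "MiB")
    print("break-even extents:", break_even_extents(fs))
```

Under these assumptions a 16 TiB filesystem needs a 512 MiB bitmap, and the
tree stays smaller only while it tracks fewer than about 13.4 million freed
extents.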

> 
> 3. As I said several times, we do not need to trim when there have not
> been enough writes since the last trim, because when we have enough space,
> for example for wear leveling in an SSD, we do not need to reclaim more, OR
> we can do it really slowly as a precautionary measure.
> 
> All that said, we have much more flexibility in user space and we can
> think of lots of different heuristics to determine whether or not to do
> the trim and how.
> 
> Thanks!
> -Lukas
> 
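
One possible user-space heuristic of the kind described in point 3, as a
minimal Python sketch. It assumes the amount written is taken from the
"sectors written" field (the seventh field, in 512-byte units) of
/sys/block/<dev>/stat, and that a trim is worthwhile once writes since the
last trim exceed some fraction of the device size. The 10% threshold and
the function names are hypothetical, and the counter resets at boot, so a
real tool would have to persist its own state.

```python
SECTOR_BYTES = 512  # /sys/block/<dev>/stat counts 512-byte sectors


def bytes_written(stat_text):
    """Return total bytes written, parsed from a /sys/block/<dev>/stat line."""
    fields = stat_text.split()
    return int(fields[6]) * SECTOR_BYTES  # field 7: sectors written


def should_trim(written_since_last_trim, device_bytes, fraction=0.10):
    """Trim only if enough data was written since the last trim.

    fraction is an assumed tuning knob, not a recommended value.
    """
    return written_since_last_trim >= fraction * device_bytes


if __name__ == "__main__":
    sample = "1000 10 80000 500 2000 20 4194304 900 0 1200 1400"
    written = bytes_written(sample)            # 2 GiB in this sample
    print(should_trim(written, 16 * 2**30))    # 16 GiB device
    print(should_trim(written, 64 * 2**30))    # 64 GiB device
```

When the check passes, the tool would then issue the FITRIM ioctl on the
mounted filesystem, possibly in small ranges to keep the impact low.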
