Re: [PATCH, RFC] xfs: batched discard support

To: Mark Lord <liml@xxxxxx>
Subject: Re: [PATCH, RFC] xfs: batched discard support
From: Rolf Eike Beer <eike-kernel@xxxxxxxxx>
Date: Thu, 20 Aug 2009 17:43:41 +0200
Cc: Ric Wheeler <rwheeler@xxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>, Paul Mackerras <paulus@xxxxxxxxx>, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, linux-scsi@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, jens.axboe@xxxxxxxxxx, "IDE/ATA development list" <linux-ide@xxxxxxxxxxxxxxx>, Neil Brown <neilb@xxxxxxx>
In-reply-to: <4A8D5FDB.7080505@xxxxxx>
References: <20090816004705.GA7347@xxxxxxxxxxxxx> <4A8D5442.1000302@xxxxxxxxxx> <4A8D5FDB.7080505@xxxxxx>
User-agent: KMail/1.12.0 (Linux/2.6.31-rc6-git; KDE/4.3.0; i686; ; )

Mark Lord wrote:
> Ric Wheeler wrote:
> > Note that returning consistent data is critical for devices that are
> > used in a RAID group since you will need each RAID block that is used to
> > compute the parity to continue to return the same data until you
> > overwrite it with new data :-)
> >
> > If we have a device that does not support this (or is misconfigured not
> > to do this), we should not use those devices in an MD group & do discard
> > against it...
> ..
> Well, that's a bit drastic.  But the RAID software should at least
> not issue TRIM commands in ignorance of such.
> Would it still be okay to do the TRIMs when the entire parity stripe
> (across all members) is being discarded?  (As opposed to just partial
> data there being dropped)

I think there is a related use case that could benefit from
TRIM/UNMAP/whatever support in file systems even if the physical devices do
not support it. I have a RAID5 at work with LVM on top of it. This week I
deleted an old logical volume of some 200GB that had been moved to a different
volume group; tomorrow I will start replacing all the disks in the RAID with
bigger ones. If LVM had told the RAID "hey, this space is total garbage from
now on", the RAID would not have to do any calculation when it rebuilds that
area but could simply write fixed patterns to all disks (e.g. 0 to the first
data disk, 0 to the second data disk and 0, being "0 xor 0", to parity). If
the RAID also knew that some of the underlying devices support "write all
zeros", this operation could be sped up even more. With "write all fixed
pattern", every unused chunk would come down to a single write operation (per
disk) on rebuild, regardless of which parity algorithm is used.
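
To make the rebuild shortcut concrete, here is a minimal sketch in Python.
It is only an illustration of the idea, not md code: the chunk size, the
function names and the "discarded" flag are all made up for this example,
and it assumes plain XOR parity as in RAID5.

```python
from functools import reduce

CHUNK = 16  # hypothetical chunk size in bytes, tiny for illustration


def xor_chunks(chunks):
    """XOR equal-length byte chunks together (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def rebuild_stripe(data_chunks, discarded):
    """Return (data, parity) to write for one stripe during a rebuild.

    'discarded' is an assumed per-stripe flag meaning the upper layer
    (e.g. LVM) has declared the whole stripe to be garbage.
    """
    if discarded:
        # Nothing to read and nothing to compute: 0 xor 0 ... xor 0 == 0,
        # so every disk simply gets a chunk of zeros.
        zero = bytes(CHUNK)
        return [zero] * len(data_chunks), zero
    # Normal case: parity has to be computed from the real data.
    return list(data_chunks), xor_chunks(data_chunks)
```

For a discarded stripe the old contents are never looked at, which is where
the saving comes from.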

And even where space is in use the RAID can benefit. If we simply define that
unmapped space always reads as 0, and I write to a RAID volume where the other
data that goes into the parity calculation is unmapped, computing the parity
becomes easy, as we already know half of the input values: 0. So we can save
the read from the second data stripe and most of the calculation.
"dd if=/dev/md0" on unmapped space is then more or less the same as "dd
if=/dev/zero".
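
The write case could look roughly like this. Again this is only a Python
sketch under my own assumptions: two data disks plus XOR parity, a
hypothetical "other_mapped" flag, and a hypothetical read_other() callback
that stands for reading the other data chunk from disk.

```python
def parity_for_write(new_data, other_mapped, read_other):
    """Parity of a two-data-disk XOR stripe after writing new_data.

    other_mapped and read_other are hypothetical: the first says whether
    the other data chunk is mapped, the second reads it from disk.
    """
    if not other_mapped:
        # Unmapped space is defined to read as 0, and x xor 0 == x,
        # so the parity is just the new data and no read is needed.
        return bytes(new_data)
    other = read_other()
    return bytes(a ^ b for a, b in zip(new_data, other))
```

In the unmapped case read_other() is never called, so the read from the
second data disk is skipped entirely.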

I only fear that this is too obvious for me to be the first to have this
idea ;)

