On Mon, Apr 07, 2008 at 08:57:38AM -0400, Christoph Hellwig wrote:
> I don't really like this. The chance to hit a previously used generation
> seems too high.

The chance to hit an existing generation number is almost non-existent.
The counter is incremented on every allocation and not just when
inode chunks are allocated on disk. Hence a series of "allocate
chunk, unlink + free chunk, realloc chunk" is guaranteed to get a
higher generation number on reallocation, as is the "allocate a
chunk, while [1] {allocate; unlink}, unlink chunk, reallocate
chunk." These are the issues that are causing us problems right
now.

The generation number won't get reused at all until it wraps at 2^32
allocations within the AG, and then you've got to have a chunk of inodes
get freed and reallocated at the same time the counter matches an inode
generation number. While not impossible, it'll be pretty rare....
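
To make the sequences above concrete, here is a minimal sketch of a
per-AG counter. It is a simplified model and not the actual patch: the
names ag_state, chunk, chunk_init, inode_alloc and chunk_max_gen are
hypothetical, and it assumes a new chunk is stamped with the current
counter value and that every inode allocation bumps both the inode's
generation and the counter.

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64

/* Hypothetical per-AG state: one counter, bumped on every inode allocation. */
struct ag_state {
    uint32_t next_gen;
};

/* Hypothetical stand-in for an on-disk inode chunk: one generation per slot. */
struct chunk {
    uint32_t gen[INODES_PER_CHUNK];
};

/* Stamp every inode in a newly allocated chunk with the current counter. */
static void chunk_init(const struct ag_state *ag, struct chunk *c)
{
    for (int i = 0; i < INODES_PER_CHUNK; i++)
        c->gen[i] = ag->next_gen;
}

/*
 * Allocate the inode in 'slot': bump the AG counter and the inode's own
 * generation, and return the new generation. Unsigned arithmetic wraps
 * at 2^32, which is the only point where values can repeat.
 */
static uint32_t inode_alloc(struct ag_state *ag, struct chunk *c, int slot)
{
    assert(slot >= 0 && slot < INODES_PER_CHUNK);
    ag->next_gen++;
    return ++c->gen[slot];
}

/* Largest generation in a chunk (meaningful only while nothing has wrapped). */
static uint32_t chunk_max_gen(const struct chunk *c)
{
    uint32_t max = c->gen[0];

    for (int i = 1; i < INODES_PER_CHUNK; i++)
        if (c->gen[i] > max)
            max = c->gen[i];
    return max;
}
```

In this model no inode can be allocated more often than the counter has
been bumped, so a chunk that is freed and then re-stamped with
chunk_init() always hands out generations above anything the old chunk
used, at least until the counter wraps.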

> What about making the first few bits of each generation
> number a per-ag counter that's incremented anytime we deallocate an inode
> cluster?

First thing I considered - increment on chunk freeing is not
sufficient guarantee of short-term uniqueness. To guarantee short
term uniqueness, the generation number used to initialise the inode
chunk if it is immediately reallocated needs to be greater than the
maximum used by any inode in the chunk that got freed. Now the "counter"
becomes a "maximum generation number used in the AG" value. This
also adds significant complexity to xfs_icluster_free() as we have to
look at every inode in the chunk and not just the ones that are
in-core.

FWIW, the biggest complexity with this approach is wrapping - how do
you tell what the highest generation number in the inode
chunk being freed is when some have wrapped through zero?
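
A small sketch of the wrapping problem, using hypothetical helper
names (naive_max_gen, gen_after, serial_max_gen). The plain unsigned
maximum picks the wrong inode once some generations have wrapped. The
wrap-aware comparison shown here is a standard serial-number trick
swapped in for illustration, and it is only valid under the assumption
that all generations in the chunk lie within 2^31 of each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain unsigned maximum: wrong once any generation has wrapped through zero. */
static uint32_t naive_max_gen(const uint32_t *gen, size_t n)
{
    assert(n > 0);
    uint32_t max = gen[0];

    for (size_t i = 1; i < n; i++)
        if (gen[i] > max)
            max = gen[i];
    return max;
}

/*
 * Wrap-aware "a is newer than b". Assumption: a and b are less than 2^31
 * apart; beyond that the answer is meaningless. The cast relies on the
 * usual two's complement conversion.
 */
static int gen_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Newest generation in the chunk under the windowing assumption above. */
static uint32_t serial_max_gen(const uint32_t *gen, size_t n)
{
    assert(n > 0);
    uint32_t newest = gen[0];

    for (size_t i = 1; i < n; i++)
        if (gen_after(gen[i], newest))
            newest = gen[i];
    return newest;
}
```

For a chunk holding 0xFFFFFFFE, 0xFFFFFFFF and a wrapped 1, the naive
maximum is 0xFFFFFFFF, so "max + 1" restarts at 0 and soon collides with
the inode already at 1. The wrap-aware version picks 1, but only because
the values happen to sit close together.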

I basically gave up on this approach because of the extra complexity
and nasty, untestable corner cases it introduced into code that is
already complex. A simple incrementing counter solves the short-term
uniqueness problem while still making it very hard to get duplicates in
the long term. If you really, really need long term uniqueness, then
use 'ikeep'.

Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group