Re: [PATCH] Don't initialise new inode generation numbers to zero V2

To: David Chinner <dgc@xxxxxxx>
Subject: Re: [PATCH] Don't initialise new inode generation numbers to zero V2
From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Mon, 28 Apr 2008 02:25:47 -0400
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, Greg Banks <gnb@xxxxxxxxxxxxxxxxx>, xfs-dev <xfs-dev@xxxxxxx>, xfs-oss <xfs@xxxxxxxxxxx>
In-reply-to: <20080428062032.GI103491721@xxxxxxx>
References: <20080422015806.GU108924158@xxxxxxx> <480D641B.8060301@xxxxxxxxxxxxxxxxx> <20080422050447.GV103491721@xxxxxxx> <20080425085750.GA6395@xxxxxxxxxxxxx> <20080428031120.GD103491721@xxxxxxx> <20080428055922.GD3192@xxxxxxxxxxxxx> <20080428062032.GI103491721@xxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.17 (2007-11-01)

Looks good.

On Mon, Apr 28, 2008 at 04:20:32PM +1000, David Chinner wrote:
> Don't initialise new inode generation numbers to zero
> 
> When we allocate new inode chunks, we initialise the generation
> numbers to zero. This works fine until we delete a chunk and then
> reallocate it, resulting in the same inode numbers but with a
> reset generation count. This can result in inode/generation
> pairs of different inodes occurring relatively close together.
> 
> Given that the inode/gen pair makes up the "unique" portion of
> an NFS filehandle on XFS, this can result in file handles cached
> on clients being seen on the wire from the server but referring to
> a different file. This causes .... issues for NFS clients.
> 
> Hence we need a unique generation number initialisation for
> each inode to prevent reuse of a small portion of the generation
> number space. Make this initialiser per-allocation group so
> that it is not a single point of contention in the filesystem,
> and increment it on every allocation within an AG to reduce the
> chance that a generation number is reused for a given inode number
> if the inode chunk is deleted and reallocated immediately
> afterwards.
> 
> Version 3:
> o use random32 rather than get_random_int() as cryptographically
>   secure random numbers are not really necessary here.
> 
> Version 2:
> o remove persistent per-AGI agi_newinogen field and replace with
>   randomly generated 32 bit number for each new cluster. This prevents
>   NFS clients from potentially guessing what the next generation
>   number is going to be and removes the need for persistent numbers on
>   disk.
> 
> Signed-off-by: Dave Chinner <dgc@xxxxxxx>
> ---
>  fs/xfs/xfs_ialloc.c |   10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c    2008-04-28 16:12:57.376445802 +1000
> +++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c 2008-04-28 16:15:04.427919630 +1000
> @@ -147,6 +147,7 @@ xfs_ialloc_ag_alloc(
>       int             version;        /* inode version number to use */
>       int             isaligned = 0;  /* inode allocation at stripe unit */
>                                       /* boundary */
> +     unsigned int    gen;
>  
>       args.tp = tp;
>       args.mp = tp->t_mountp;
> @@ -290,6 +291,14 @@ xfs_ialloc_ag_alloc(
>       else
>               version = XFS_DINODE_VERSION_1;
>  
> +     /*
> +      * Seed the new inode cluster with a random generation number. This
> +      * prevents short-term reuse of generation numbers if a chunk is
> +      * freed and then immediately reallocated. We use random numbers
> +      * rather than a linear progression to prevent the next generation
> +      * number from being easily guessable.
> +      */
> +     gen = random32();
>       for (j = 0; j < nbufs; j++) {
>               /*
>                * Get the block.
> @@ -309,6 +318,7 @@ xfs_ialloc_ag_alloc(
>                       free = XFS_MAKE_IPTR(args.mp, fbuf, i);
>                       free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
>                       free->di_core.di_version = version;
> +                     free->di_core.di_gen = cpu_to_be32(gen);
>                       free->di_next_unlinked = cpu_to_be32(NULLAGINO);
>                       xfs_ialloc_log_di(tp, fbuf, i,
>                               XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED);
---end quoted text---
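
For anyone reading the archive later, the failure mode described in the
commit message can be reproduced with a small user-space model. This is
only a sketch under stated assumptions, not the XFS code: struct fhandle,
struct chunk, chunk_alloc(), handle_valid() and stale_handle_accepted()
are hypothetical names made up for illustration, and fake_random32() is
an xorshift32 stand-in for the kernel's random32().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64

/* Hypothetical model of the "unique" part of an NFS file handle. */
struct fhandle {
	uint32_t ino;
	uint32_t gen;
};

/* Hypothetical model of an inode chunk; only generation numbers matter. */
struct chunk {
	uint32_t first_ino;
	uint32_t gen[INODES_PER_CHUNK];
};

/* xorshift32: user-space stand-in (assumption) for the kernel's random32(). */
static uint32_t prng_state = 2463534242u;

static uint32_t fake_random32(void)
{
	prng_state ^= prng_state << 13;
	prng_state ^= prng_state >> 17;
	prng_state ^= prng_state << 5;
	return prng_state;
}

/* Initialise every inode in a newly allocated chunk with the same seed. */
static void chunk_alloc(struct chunk *c, uint32_t first_ino, uint32_t seed)
{
	assert(c != NULL);
	c->first_ino = first_ino;
	for (size_t i = 0; i < INODES_PER_CHUNK; i++)
		c->gen[i] = seed;
}

/* A handle is accepted only if inode number and generation both match. */
static bool handle_valid(const struct chunk *c, struct fhandle fh)
{
	assert(c != NULL);
	if (fh.ino < c->first_ino || fh.ino - c->first_ino >= INODES_PER_CHUNK)
		return false;
	return c->gen[fh.ino - c->first_ino] == fh.gen;
}

/*
 * Take a handle, free and reallocate the chunk at the same inode
 * numbers, then report whether the old handle is still accepted.
 */
static bool stale_handle_accepted(bool random_seed)
{
	struct chunk c;
	struct fhandle fh;

	chunk_alloc(&c, 128, random_seed ? fake_random32() : 0);
	fh.ino = 130;
	fh.gen = c.gen[2];

	/* Chunk is deleted and immediately reallocated. */
	chunk_alloc(&c, 128, random_seed ? fake_random32() : 0);
	return handle_valid(&c, fh);
}
```

With zero-initialised generations, stale_handle_accepted(false) returns
true: the old handle silently resolves to the new inode. With a fresh
seed per chunk, stale_handle_accepted(true) returns false in this model.
A real 32-bit random source could still repeat a seed with probability
of roughly 1 in 2^32 per reallocation, so this reduces the chance of
reuse rather than eliminating it.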

