xfs
[Top] [All Lists]

Re: [PATCH 2/4] xfs: ensure verifiers are attached to recovered buffers

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 2/4] xfs: ensure verifiers are attached to recovered buffers
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Wed, 30 Jul 2014 12:30:03 -0400
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1406684929-11133-3-git-send-email-david@xxxxxxxxxxxxx>
References: <1406684929-11133-1-git-send-email-david@xxxxxxxxxxxxx> <1406684929-11133-3-git-send-email-david@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.23 (2014-03-12)
On Wed, Jul 30, 2014 at 11:48:47AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> Crash testing of CRC enabled filesystems has resulted in a number of
> reports of bad CRCs being detected after the filesystem was mounted.
> Errors such as the following were being seen:
> 
> XFS (sdb3): Mounting V5 Filesystem
> XFS (sdb3): Starting recovery (logdev: internal)
> XFS (sdb3): Metadata CRC error detected at xfs_agf_read_verify+0x5a/0x100 
> [xfs], block 0x1
> XFS (sdb3): Unmount and run xfs_repair
> XFS (sdb3): First 64 bytes of corrupted metadata buffer:
> ffff880136ffd600: 58 41 47 46 00 00 00 01 00 00 00 00 00 0f aa 40  
> XAGF...........@
> ffff880136ffd610: 00 02 6d 53 00 02 77 f8 00 00 00 00 00 00 00 01  
> ..mS..w.........
> ffff880136ffd620: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03  
> ................
> ffff880136ffd630: 00 00 00 04 00 08 81 d0 00 08 81 a7 00 00 00 00  
> ................
> XFS (sdb3): metadata I/O error: block 0x1 ("xfs_trans_read_buf_map") error 74 
> numblks 1
> 
> The errors were typically being seen in AGF, AGI and their related
> btree block buffers some time after log recovery had run. Often it
> wasn't until later subsequent mounts that the problem was
> discovered. The common symptom was a buffer with the correct
> contents, but a CRC and an LSN that matched an older version of the
> contents.
> 
> Some debug added to _xfs_buf_ioapply() indicated that buffers were
> being written without verifiers attached to them from log recovery,
> and Jan Kara isolated the cause to log recovery readahead an dit's
> interactions with buffers that had a more recent LSN on disk than
> the transaction being recovered. In this case, the buffer did not
> get a verifier attached, and os when the second phase of log
> recovery ran and recovered EFIs and unlinked inodes, the buffers
> were modified and written without the verifier running. Hence they
> had up to date contents, but stale LSNs and CRCs.
> 
> Fix it by attaching verifiers to buffers we skip due to future LSN
> values so they don't escape into the buffer cache without the
> correct verifier attached.
> 
> This patch is based on analysis and a patch from Jan Kara.
> 
> cc: <stable@xxxxxxxxxxxxxxx>
> Reported-by: Jan Kara <jack@xxxxxxx>
> Reported-by: Fanael Linithien <fanael4@xxxxxxxxx>
> Reported-by: Grozdan <neutrino8@xxxxxxxxx>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_log_recover.c | 47 +++++++++++++++++++++++++++++++----------------
>  1 file changed, 31 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index fbc2362..0a015cc 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2126,6 +2126,17 @@ xlog_recover_validate_buf_type(
>       __uint16_t              magic16;
>       __uint16_t              magicda;
>  
> +     /*
> +      * We can only do post recovery validation on items on CRC enabled
> +      * fielsystems as we need to know when the buffer was written to be able
> +      * to determine if we should have replayed the item. If we replay old
> +      * metadata over a newer buffer, then it will enter a temporarily
> +      * inconsistent state resulting in verification failures. Hence for now
> +      * just avoid the verification stage for non-crc filesystems
> +      */
> +     if (!xfs_sb_version_hascrc(&mp->m_sb))
> +             return;
> +

This function has a couple other similar *_hascrc() checks further down
that we should probably kill now. Otherwise, looks Ok...

Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>

>       magic32 = be32_to_cpu(*(__be32 *)bp->b_addr);
>       magic16 = be16_to_cpu(*(__be16*)bp->b_addr);
>       magicda = be16_to_cpu(info->magic);
> @@ -2197,10 +2208,6 @@ xlog_recover_validate_buf_type(
>  #endif
>               break;
>       case XFS_BLFT_DINO_BUF:
> -             /*
> -              * we get here with inode allocation buffers, not buffers that
> -              * track unlinked list changes.
> -              */
>               if (magic16 != XFS_DINODE_MAGIC) {
>                       xfs_warn(mp, "Bad INODE block magic!");
>                       ASSERT(0);
> @@ -2388,16 +2395,7 @@ xlog_recover_do_reg_buffer(
>       /* Shouldn't be any more regions */
>       ASSERT(i == item->ri_total);
>  
> -     /*
> -      * We can only do post recovery validation on items on CRC enabled
> -      * fielsystems as we need to know when the buffer was written to be able
> -      * to determine if we should have replayed the item. If we replay old
> -      * metadata over a newer buffer, then it will enter a temporarily
> -      * inconsistent state resulting in verification failures. Hence for now
> -      * just avoid the verification stage for non-crc filesystems
> -      */
> -     if (xfs_sb_version_hascrc(&mp->m_sb))
> -             xlog_recover_validate_buf_type(mp, bp, buf_f);
> +     xlog_recover_validate_buf_type(mp, bp, buf_f);
>  }
>  
>  /*
> @@ -2505,12 +2503,29 @@ xlog_recover_buffer_pass2(
>       }
>  
>       /*
> -      * recover the buffer only if we get an LSN from it and it's less than
> +      * Recover the buffer only if we get an LSN from it and it's less than
>        * the lsn of the transaction we are replaying.
> +      *
> +      * Note that we have to be extremely careful of readahead here.
> +      * Readahead does not attach verfiers to the buffers so if we don't
> +      * actually do any replay after readahead because of the LSN we found
> +      * in the buffer if more recent than that current transaction then we
> +      * need to attach the verifier directly. Failure to do so can lead to
> +      * future recovery actions (e.g. EFI and unlinked list recovery) can
> +      * operate on the buffers and they won't get the verifier attached. This
> +      * can lead to blocks on disk having the correct content but a stale
> +      * CRC.
> +      *
> +      * It is safe to assume these clean buffers are currently up to date.
> +      * If the buffer is dirtied by a later transaction being replayed, then
> +      * the verifier will be reset to match whatever recover turns that
> +      * buffer into.
>        */
>       lsn = xlog_recover_get_buf_lsn(mp, bp);
> -     if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0)
> +     if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
> +             xlog_recover_validate_buf_type(mp, bp, buf_f);
>               goto out_release;
> +     }
>  
>       if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
>               error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
> -- 
> 2.0.0
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs

<Prev in Thread] Current Thread [Next in Thread>