[PATCH 2/4] xfs: ensure verifiers are attached to recovered buffers
Brian Foster
bfoster at redhat.com
Wed Jul 30 11:30:03 CDT 2014
On Wed, Jul 30, 2014 at 11:48:47AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner at redhat.com>
>
> Crash testing of CRC enabled filesystems has resulted in a number of
> reports of bad CRCs being detected after the filesystem was mounted.
> Errors such as the following were being seen:
>
> XFS (sdb3): Mounting V5 Filesystem
> XFS (sdb3): Starting recovery (logdev: internal)
> XFS (sdb3): Metadata CRC error detected at xfs_agf_read_verify+0x5a/0x100 [xfs], block 0x1
> XFS (sdb3): Unmount and run xfs_repair
> XFS (sdb3): First 64 bytes of corrupted metadata buffer:
> ffff880136ffd600: 58 41 47 46 00 00 00 01 00 00 00 00 00 0f aa 40 XAGF...........@
> ffff880136ffd610: 00 02 6d 53 00 02 77 f8 00 00 00 00 00 00 00 01 ..mS..w.........
> ffff880136ffd620: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03 ................
> ffff880136ffd630: 00 00 00 04 00 08 81 d0 00 08 81 a7 00 00 00 00 ................
> XFS (sdb3): metadata I/O error: block 0x1 ("xfs_trans_read_buf_map") error 74 numblks 1
>
> The errors were typically being seen in AGF, AGI and their related
> btree block buffers some time after log recovery had run. Often it
> wasn't until later subsequent mounts that the problem was
> discovered. The common symptom was a buffer with the correct
> contents, but a CRC and an LSN that matched an older version of the
> contents.
>
> Some debug added to _xfs_buf_ioapply() indicated that buffers were
> being written without verifiers attached to them from log recovery,
> and Jan Kara isolated the cause to log recovery readahead an dit's
> interactions with buffers that had a more recent LSN on disk than
> the transaction being recovered. In this case, the buffer did not
> get a verifier attached, and os when the second phase of log
> recovery ran and recovered EFIs and unlinked inodes, the buffers
> were modified and written without the verifier running. Hence they
> had up to date contents, but stale LSNs and CRCs.
>
> Fix it by attaching verifiers to buffers we skip due to future LSN
> values so they don't escape into the buffer cache without the
> correct verifier attached.
>
> This patch is based on analysis and a patch from Jan Kara.
>
> cc: <stable at vger.kernel.org>
> Reported-by: Jan Kara <jack at suse.cz>
> Reported-by: Fanael Linithien <fanael4 at gmail.com>
> Reported-by: Grozdan <neutrino8 at gmail.com>
> Signed-off-by: Dave Chinner <dchinner at redhat.com>
> ---
> fs/xfs/xfs_log_recover.c | 47 +++++++++++++++++++++++++++++++----------------
> 1 file changed, 31 insertions(+), 16 deletions(-)
>
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index fbc2362..0a015cc 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2126,6 +2126,17 @@ xlog_recover_validate_buf_type(
> __uint16_t magic16;
> __uint16_t magicda;
>
> + /*
> + * We can only do post recovery validation on items on CRC enabled
> + * fielsystems as we need to know when the buffer was written to be able
> + * to determine if we should have replayed the item. If we replay old
> + * metadata over a newer buffer, then it will enter a temporarily
> + * inconsistent state resulting in verification failures. Hence for now
> + * just avoid the verification stage for non-crc filesystems
> + */
> + if (!xfs_sb_version_hascrc(&mp->m_sb))
> + return;
> +
This function has a couple other similar *_hascrc() checks further down
that we should probably kill now. Otherwise, looks Ok...
Reviewed-by: Brian Foster <bfoster at redhat.com>
> magic32 = be32_to_cpu(*(__be32 *)bp->b_addr);
> magic16 = be16_to_cpu(*(__be16*)bp->b_addr);
> magicda = be16_to_cpu(info->magic);
> @@ -2197,10 +2208,6 @@ xlog_recover_validate_buf_type(
> #endif
> break;
> case XFS_BLFT_DINO_BUF:
> - /*
> - * we get here with inode allocation buffers, not buffers that
> - * track unlinked list changes.
> - */
> if (magic16 != XFS_DINODE_MAGIC) {
> xfs_warn(mp, "Bad INODE block magic!");
> ASSERT(0);
> @@ -2388,16 +2395,7 @@ xlog_recover_do_reg_buffer(
> /* Shouldn't be any more regions */
> ASSERT(i == item->ri_total);
>
> - /*
> - * We can only do post recovery validation on items on CRC enabled
> - * fielsystems as we need to know when the buffer was written to be able
> - * to determine if we should have replayed the item. If we replay old
> - * metadata over a newer buffer, then it will enter a temporarily
> - * inconsistent state resulting in verification failures. Hence for now
> - * just avoid the verification stage for non-crc filesystems
> - */
> - if (xfs_sb_version_hascrc(&mp->m_sb))
> - xlog_recover_validate_buf_type(mp, bp, buf_f);
> + xlog_recover_validate_buf_type(mp, bp, buf_f);
> }
>
> /*
> @@ -2505,12 +2503,29 @@ xlog_recover_buffer_pass2(
> }
>
> /*
> - * recover the buffer only if we get an LSN from it and it's less than
> + * Recover the buffer only if we get an LSN from it and it's less than
> * the lsn of the transaction we are replaying.
> + *
> + * Note that we have to be extremely careful of readahead here.
> + * Readahead does not attach verfiers to the buffers so if we don't
> + * actually do any replay after readahead because of the LSN we found
> + * in the buffer if more recent than that current transaction then we
> + * need to attach the verifier directly. Failure to do so can lead to
> + * future recovery actions (e.g. EFI and unlinked list recovery) can
> + * operate on the buffers and they won't get the verifier attached. This
> + * can lead to blocks on disk having the correct content but a stale
> + * CRC.
> + *
> + * It is safe to assume these clean buffers are currently up to date.
> + * If the buffer is dirtied by a later transaction being replayed, then
> + * the verifier will be reset to match whatever recover turns that
> + * buffer into.
> */
> lsn = xlog_recover_get_buf_lsn(mp, bp);
> - if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0)
> + if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
> + xlog_recover_validate_buf_type(mp, bp, buf_f);
> goto out_release;
> + }
>
> if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
> error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
> --
> 2.0.0
>
> _______________________________________________
> xfs mailing list
> xfs at oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
More information about the xfs
mailing list