
rebuilt HW RAID60 array; XFS filesystem looks bad now

To: <xfs@xxxxxxxxxxx>
Subject: rebuilt HW RAID60 array; XFS filesystem looks bad now
From: Paul Brunk <pbrunk@xxxxxxx>
Date: Mon, 3 Mar 2014 16:05:27 -0500
Delivered-to: xfs@xxxxxxxxxxx
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Thunderbird/29.0a2

Short version: XFS filesystem on a HW RAID60 array.  The array has
been rebuilt multiple times due to drive removals and insertions.  The
XFS filesystem is damaged; I'm trying to salvage what I can, and I
want to make sure I have no option other than "xfs_repair -L".
Details follow.

# uname -a
Linux rccstor7.local 2.6.32-431.5.1.el6.x86_64 #1 SMP Wed Feb 12 00:41:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

xfs_repair version 3.1.1.  The box has one 4-core Opteron CPU and 8
GB of RAM.

I have a 32TB HW RAID60 volume (Areca 1680 HW RAID) made of two RAID6
raid sets.

This volume is a PV in Linux LVM, with a single LV defined in it.  The
LV had an XFS filesystem created on it (no external log).

I can't do xfs_info on it because I can't mount the filesystem.
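I can, however, still read the superblock offline with xfs_db in
read-only mode (this is just how I've been poking at it; output
trimmed here):

```shell
# Inspect superblock 0 without mounting; -r opens the device read-only
# so nothing is written to the damaged filesystem.
xfs_db -r -c "sb 0" -c "p" /dev/dm-2
```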

I had multiple drive removals and insertions (due to timeout error
with non-TLER drives in the RAID array, an unfortunate setup I
inherited), which triggered multiple HW RAID rebuilds.  This caused
the RAID volume to end up defined twice in the controller, with each
of the two constituent RAID sets being defined twice.  At Areca's
direction, I did a "raid set rescue" in the Areca controller.  That
succeeded in reducing the number of volumes from two to one, and the
RAID volume is now "normal" in the RAID controller instead of
"failed".
The logical volume is visible to the OS now, unlike when the RAID
status was "failed".

  # lvdisplay
  --- Logical volume ---
  LV Path                /dev/vg0/lv0
  LV Name                lv0
  VG Name                vg0
  LV UUID                YMlFWe-PTGe-5kHx-V3uo-31Vp-grXR-9ZBt3R
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              available
  # open                 0
  LV Size                32.74 TiB
  Current LE             8582595
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2

That's good, but now I think the XFS filesystem is in bad shape.

 # grep /media/shares /etc/fstab
 UUID="9cba4e90-1d8f-4a98-8701-df10a28556da" /media/shares xfs pquota 0 0

That UUID entry in /dev/disk/by-uuid is a link to /dev/dm-2.

"dm-2" is the RAID volume.  Here it is in /proc/partitions:
 major minor  #blocks     name
  253     2   35154309120 dm-2

When I try to mount the XFS filesystem:

 # mount /media/shares
 mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg0-lv0,
        missing codepage or helper program, or other error
        In some cases useful info is found in syslog - try
        dmesg | tail  or so

 # dmesg|tail
 XFS (dm-2): Mounting Filesystem
 XFS (dm-2): Log inconsistent or not a log (last==0, first!=1)
 XFS (dm-2): empty log check failed
 XFS (dm-2): log mount/recovery failed: error 22
 XFS (dm-2): log mount failed
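Since log recovery is what's failing, one thing I thought I might try
first is a read-only mount that skips log replay entirely (the mount
point /mnt/recovery below is just a scratch directory I'd create for
this):

```shell
# Attempt a salvage mount: ro avoids writes, norecovery skips replaying
# the (apparently corrupt) log.  Any data covered only by unreplayed log
# entries would be stale or missing.
mkdir -p /mnt/recovery
mount -o ro,norecovery /dev/mapper/vg0-lv0 /mnt/recovery
```

If that works I could at least copy files off before doing anything
destructive to the log.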

 # xfs_check /dev/dm-2
 xfs_check: cannot init perag data (117)
 XFS: Log inconsistent or not a log (last==0, first!=1)
 XFS: empty log check failed

 # xfs_repair -n /dev/dm-2
 produced at least 7863 lines of output.  It begins:

 Phase 1 - find and verify superblock...
 Phase 2 - using internal log
         - scan filesystem freespace and inode maps...
 bad magic # 0xa04850d in btbno block 0/108
 expected level 0 got 10510 in btbno block 0/108
 bad btree nrecs (144, min=255, max=510) in btbno block 0/108
 block (0,80-80) multiply claimed by bno space tree, state - 2
 block (0,108-108) multiply claimed by bno space tree, state - 7

 # egrep -c "invalid start block" xfsrepair.out
 # egrep -c "multiply claimed by bno" xfsrepair.out

 Included in the output are 381 occurrences of this pair of messages:

 bad starting inode # (0 (0x0 0x0)) in ino rec, skipping rec
 badly aligned inode rec (starting inode = 0)

Is there anything I should try prior to xfs_repair -L?
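Before anything irreversible, my plan (assuming I have somewhere to
put the dump; the /backup paths below are hypothetical) is to capture
a metadata image so a repair can be rehearsed on a copy first:

```shell
# Capture filesystem metadata (no file data) to a dump file; -g shows
# progress, -o disables obfuscation of names so the result is debuggable.
xfs_metadump -g -o /dev/dm-2 /backup/shares.metadump

# Restore the dump to a sparse image file and trial-run the repair there.
xfs_mdrestore /backup/shares.metadump /backup/shares.img
xfs_repair -n /backup/shares.img
```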

I'm just trying to salvage whatever I can from this FS.  I'm aware it
could be all gone.  Thanks.

Paul Brunk, system administrator
Georgia Advanced Computing Resource Center (GACRC)
Enterprise IT Svcs, the University of Georgia
