| To: | xfs@xxxxxxxxxxx |
|---|---|
| Subject: | XFS filesystem recovery from secondary superblocks |
| From: | Aaron Goulding <aarongldng@xxxxxxxxx> |
| Date: | Tue, 30 Oct 2012 22:02:28 -0700 |
Hello! I have an XFS filesystem that isn't mounting, and quite a long story as to why and what I've tried. And before you start: yes, backups are the preferred method of restoration at this point, never trust your files to a single FS, etc.

I have a 9-disk MD array (0.9 superblock format, roughly 14TB usable) configured as an LVM PV, one VG, then one LV with not quite all the space allocated. That LV is formatted XFS and mounted as /mnt/storage. This was set up on Ubuntu 10.04, which has since been release-upgraded to 12.04. The LV has been grown 3 times over the last two years. The system's boot, root and swap partitions are on a separate drive.

So what happened? One drive died spectacularly. It had a full bearing failure which caused a power drain, and the system instantly kicked out two more drives. This put the array into an offline state, as expected. I replaced the failed drive with a new one and carefully checked the disk order before attempting to re-assemble the array. At the time I didn't know about mdadm --re-add (likely my first mistake):

    mdadm --create --assume-clean --level=6 --raid-devices=9 /dev/md0 \
        /dev/sdg1 missing /dev/sdh1 /dev/sdj1 /dev/sdd1 /dev/sdb1 /dev/sde1 /dev/sdf1 /dev/sdc1

The first problem with this is that the Ubuntu upgrade meant mdadm created the superblocks in 1.2 format instead of 0.9 (there's a quick check for this just below the xfs_repair output, for anyone retracing my steps). Not catching it, I then added in the replacement /dev/sdi1, which started the array rebuilding incorrectly. I quickly realized my mistake and stopped the array, then recreated it again, this time using superblock 0.9 format, but the damage had already been done to roughly the first 100GB of the array, possibly more.

I attempted to restore the LVM superblock from the backup stored in /etc/lvm/backup/:

    pvcreate -f -vv --uuid "hJrAn2-wTd8-vY11-steD-23Jh-AwKK-4VvnkH" --restorefile /etc/lvm/backup/vg1 /dev/md0

When that failed, I decided to attach a second array so I could examine the problem more safely. I built a second MD array with seven 3TB disks in RAID6, giving me a 15TB /mnt/restore volume to work with, and made a dd copy of /dev/md0 to a test file I could manipulate safely.

Once I had the file created, I tried xfs_repair -f /mnt/restore/md0.dat, with no luck. I then used a hex editor to add XFSB to the beginning, hoping the repair would just clean up around the LVM data; that gave similar results. The result looks like the following:

    Phase 1 - find and verify superblock...
    bad primary superblock - bad or unsupported version !!!
    attempting to find secondary superblock...
    ....................................................................................................
    unable to verify superblock, continuing...
    ....................................................................................................
    unable to verify superblock, continuing...
    ....................................................................................................
    unable to verify superblock, continuing...
    ....................................................................................................
    unable to verify superblock, continuing...
    ....................................................................................................
    unable to verify superblock, continuing...
    ....................................................................................................
    unable to verify superblock, continuing...
    ....................................................................................................
    unable to verify superblock, continuing...
    ....................................................................................................
    unable to verify superblock, continuing...
    ....................................................................................................
    unable to verify superblock, continuing...
    ....................................................................................................
    unable to verify superblock, continuing...
    ....................................................................................................
    Exiting now.

Running xfs_db /mnt/restore/md0.dat appears to run out of memory.
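In case it helps with next steps: one thing I've been considering, but haven't run to completion yet, is brute-force scanning for the XFSB magic myself, since the secondary superblocks should sit at the start of each allocation group. A rough sketch (the paths and the 4 KiB block-size guess are from my setup above):

    # Scan the first 2 GiB of the image for the ASCII magic "XFSB".
    # Only block-aligned hits are plausible superblocks (assuming 4 KiB
    # filesystem blocks); anything else is probably just file data.
    dd if=/mnt/restore/md0.dat bs=1M count=2048 2>/dev/null \
      | grep -abo 'XFSB' \
      | awk -F: '$1 % 4096 == 0 { print "candidate superblock at byte", $1 }'

If genuine superblocks turn up, the spacing between hits should also give away the allocation group size, which would be useful to know given the primary superblock is toast.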
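Also, for anyone retracing the mdadm steps above: the metadata-version difference that started all this shows up in mdadm --examine, so a quick check before any --create would have caught it. Something like the following (device names are from my box, and this is only a sketch):

    # Check what metadata/superblock version the existing members carry.
    # 0.90 and 1.2 superblocks live at different offsets, so recreating
    # with the wrong one shifts where the data starts.
    for d in /dev/sd[b-j]1; do
        echo "== $d"
        mdadm --examine "$d" | grep -i version
    done

    # If recreating really is unavoidable, the old format can be pinned explicitly:
    # mdadm --create /dev/md0 --metadata=0.90 --assume-clean --level=6 --raid-devices=9 ...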
So I realized I needed to pull the data out of LVM and re-assemble it properly if I was going to make any progress, so I checked the backup config again:

    # Generated by LVM2 version 2.02.66(2) (2010-05-20): Sun Jul 29 13:40:58 2012

    contents = "Text Format Volume Group"
    version = 1

    description = "Created *after* executing 'vgcfgbackup'"

    creation_host = "jarvis"   # Linux jarvis 3.0.0-23-server #39-Ubuntu SMP Thu Jul 19 19:37:41 UTC 2012 x86_64
    creation_time = 1343594458 # Sun Jul 29 13:40:58 2012

    vg1 {
        id = "hJrAn2-wTd8-vY11-steD-23Jh-AwKK-4VvnkH"
        seqno = 19
        status = ["RESIZEABLE", "READ", "WRITE"]
        flags = []
        extent_size = 8192     # 4 Megabytes
        max_lv = 0
        max_pv = 0

        physical_volumes {
            pv0 {
                id = "VRHqH4-oIje-iQWV-iLUL-dLXX-eEf9-mLd9Z7"
                device = "/dev/md0"    # Hint only
                status = ["ALLOCATABLE"]
                flags = []
                dev_size = 27349166336 # 12.7354 Terabytes
                pe_start = 768
                pe_count = 3338521     # 12.7354 Terabytes
            }
        }

        logical_volumes {
            storage {
                id = "H47IMn-ohEG-3W6l-NfCu-ePjJ-U255-FcIjdp"
                status = ["READ", "WRITE", "VISIBLE"]
                flags = []
                segment_count = 4

                segment1 {
                    start_extent = 0
                    extent_count = 2145769 # 8.18546 Terabytes
                    type = "striped"
                    stripe_count = 1       # linear
                    stripes = [
                        "pv0", 25794
                    ]
                }
                segment2 {
                    start_extent = 2145769
                    extent_count = 626688  # 2.39062 Terabytes
                    type = "striped"
                    stripe_count = 1       # linear
                    stripes = [
                        "pv0", 2174063
                    ]
                }
                segment3 {
                    start_extent = 2772457
                    extent_count = 384170  # 1.46549 Terabytes
                    type = "striped"
                    stripe_count = 1       # linear
                    stripes = [
                        "pv0", 2954351
                    ]
                }
                segment4 {
                    start_extent = 3156627
                    extent_count = 140118  # 547.336 Gigabytes
                    type = "striped"
                    stripe_count = 1       # linear
                    stripes = [
                        "pv0", 2800751
                    ]
                }
            }
        }
    }

I noticed that segment4 actually sits before segment3 on the PV (based on stripes = [ "pv0", 2800751 ]), and the extent size is the standard 4MB, so I wrote the following (the byte arithmetic behind these numbers is spelled out at the end of this mail for anyone who wants to check it):

    echo "writing seg 1 .."
    dd if=/dev/md0 of=/dev/md1 bs=4194304 seek=0 skip=25794 count=2145769
    echo "writing seg 2 .."
    dd if=/dev/md0 of=/dev/md1 bs=4194304 seek=2145769 skip=2174063 count=626688
    echo "writing seg 3 .."
    dd if=/dev/md0 of=/dev/md1 bs=4194304 seek=2772457 skip=2954351 count=384170
    echo "writing seg 4 .."
    dd if=/dev/md0 of=/dev/md1 bs=4194304 seek=3156627 skip=2800751 count=140118

Then, just to make sure things were clean, I zeroed out the remainder of /dev/md1 and used the hex editor (shed) again to make sure the first four bytes of the device were XFSB.

Once that was done, I tried xfs_repair again, this time on /dev/md1, with the same results as above. Next I tried xfs_db /dev/md1 to see if anything would load. I get the following:

    root@jarvis:/mnt# xfs_db /dev/md1
    Floating point exception

with this in dmesg:

    [1568395.691767] xfs_db[30966] trap divide error ip:41e4b5 sp:7fff5db8ab90 error:0 in xfs_db[400000+6a000]

So at this point I'm stumped, and I'm hoping one of you clever folks out there might have some next steps I can take.
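For anyone who wants to double-check my segment maths rather than eyeball the dd lines above, this is the mapping I'm relying on (extent_size = 8192 sectors, i.e. 4 MiB per extent; the loop is just a sketch that prints the byte offsets):

    # Derive the byte offsets behind each dd line from the LVM backup above.
    # Fields per segment: LV start extent, PV start extent, extent count.
    for seg in "0 25794 2145769" \
               "2145769 2174063 626688" \
               "2772457 2954351 384170" \
               "3156627 2800751 140118"; do
        set -- $seg
        echo "LV extent $1 <- PV extent $2 ($3 extents):" \
             "dest byte $(( $1 * 4194304 )), src byte $(( $2 * 4194304 )), $(( $3 * 4194304 )) bytes"
    done

If those numbers look wrong to anyone, that alone would explain a lot.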
I'm okay with a partial recovery, and I'm okay if the directory tree gets horked and I have to dig through lost+found, but I'd really like to at least be able to recover something from this. I'm happy to post any info needed on this.

Thanks!

-Aaron