Hello Everyone,<br><br>Hopefully this is the correct kind of information to send to this list.<br><br>I have an issue with a large XFS volume (17TB) that mounts but is not readable. I can view the folder structure on the volume, but I can't access any of the actual data. A disk failed in the RAID5 array, and although the array has since rebuilt, the failure appears to have caused serious data integrity issues.<br>
<br>Here is the CentOS release / Kernel version:<br> [root@svr608 ~]# uname -a<br> Linux svr608 2.6.18-308.1.1.el5 #1 SMP Wed Mar 7 04:16:51 EST 2012 x86_64 x86_64 x86_64 GNU/Linux<br> [root@svr608 ~]# cat /etc/redhat-release<br>
CentOS release 5.8 (Final)<br> [root@svr608 ~]# cat /tmp/yum.list | grep xfs | grep installed<br> kmod-xfs.x86_64 0.4-2 installed<br> xfsdump.x86_64 2.2.46-1.el5.centos installed<br>
xfsprogs.x86_64 2.9.4-1.el5.centos installed<br> xorg-x11-xfs.x86_64 1:1.0.2-5.el5_6.1 installed<br><br>On startup, the OS thinks everything's fine with the drives/volume:<br>
SCSI subsystem initialized<br> HP CISS Driver (v 3.6.28-RH2)<br> GSI 20 sharing vector 0x42 and IRQ 20<br> ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 32 (level, low) -> IRQ 66<br> cciss 0000:04:00.0: cciss: Trying to put board into performant mode<br>
cciss 0000:04:00.0: Placing controller into performant mode<br> cciss/c0d0: p1 p2 p3 p4 < p5 ><br> usb 5-2: new low speed USB device using uhci_hcd and address 2<br> cciss/c0d1:<br> cciss 0000:04:00.0: blocks= 35162671280 block_size= 512<br>
cciss 0000:04:00.0: blocks= 35162671280 block_size= 512<br> cciss/c0d2: unknown partition table<br> scsi0 : cciss<br> shpchp: Standard Hot Plug PCI Controller Driver version: 0.4<br> libata version 3.00 loaded.<br>
ata_piix 0000:00:1f.2: version 2.12<br> ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 58<br> ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]<br> PCI: Setting latency timer of device 0000:00:1f.2 to 64<br>
scsi1 : ata_piix<br> scsi2 : ata_piix<br> ata1: SATA max UDMA/133 bmdma 0xff90 irq 14<br> ata2: SATA max UDMA/133 bmdma 0xff98 irq 15<br> usb 5-2: configuration #1 chosen from 1 choice<br> input: Rextron USB as /class/input/input0<br>
input,hidraw0: USB HID v1.10 Keyboard [Rextron USB] on usb-0000:00:1d.1-2<br> input: Rextron USB as /class/input/input1<br> input,hidraw0: USB HID v1.00 Mouse [Rextron USB] on usb-0000:00:1d.1-2<br> ata1: SATA link down (SStatus 0 SControl 300)<br>
ata2: SATA link down (SStatus 0 SControl 300)<br> ACPI: PCI Interrupt 0000:00:1f.5[B] -> GSI 19 (level, low) -> IRQ 58<br> ata_piix 0000:00:1f.5: MAP [ P0 -- P1 -- ]<br> PCI: Setting latency timer of device 0000:00:1f.5 to 64<br>
scsi3 : ata_piix<br> scsi4 : ata_piix<br> ata3: SATA max UDMA/133 cmd 0xcc00 ctl 0xc880 bmdma 0xc400 irq 58<br> ata4: SATA max UDMA/133 cmd 0xc800 ctl 0xc480 bmdma 0xc408 irq 58<br> ata3: SATA link down (SStatus 0 SControl 300)<br>
ata4: SATA link down (SStatus 0 SControl 300)<br> device-mapper: uevent: version 1.0.3<br> device-mapper: ioctl: 4.11.6-ioctl (2011-02-18) initialised: dm-devel@redhat.com<br>
device-mapper: dm-raid45: initialized v0.2594l<br> kjournald starting. Commit interval 5 seconds<br> EXT3-fs: mounted filesystem with ordered data mode.<br> SELinux: Disabled at runtime.<br> SELinux: Unregistering netfilter hooks<br>
type=1404 audit(1334501635.200:2): selinux=0 auid=4294967295 ses=4294967295<br> ... snip (network devices) ...<br> dell-wmi: No known WMI GUID found<br> md: Autodetecting RAID arrays.<br> md: autorun ...<br>
md: ... autorun DONE.<br> device-mapper: multipath: version 1.0.6 loaded<br> loop: loaded (max 8 devices)<br> EXT3 FS on cciss/c0d0p5, internal journal<br> kjournald starting. Commit interval 5 seconds<br>
EXT3 FS on cciss/c0d0p3, internal journal<br> EXT3-fs: mounted filesystem with ordered data mode.<br> kjournald starting. Commit interval 5 seconds<br> EXT3 FS on cciss/c0d0p1, internal journal<br> EXT3-fs: mounted filesystem with ordered data mode.<br>
SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled<br> SGI XFS Quota Management subsystem<br> XFS mounting filesystem cciss/c0d2<br> Ending clean XFS mount for filesystem: cciss/c0d2<br>
Adding 4192956k swap on /dev/cciss/c0d0p2. Priority:-1 extents:1 across:4192956k<br><br>But even though the volume mounts, any attempt to access data just fails with a "Structure needs cleaning" error.<br><br>
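As a sanity check on the size the controller reports, here is a small sketch that converts the figures from the cciss line in the boot log above (blocks= 35162671280, block_size= 512) into bytes. The function name volume_size is just something made up for illustration. It works out to roughly 18 TB in decimal units, or about 16.4 TiB, which is in line with the ~17TB I quoted, so the controller at least still presents the full volume.<br>

```python
# Figures copied from the "cciss 0000:04:00.0: blocks= ... block_size= ..." boot log line.
BLOCKS = 35162671280
BLOCK_SIZE = 512  # bytes per block


def volume_size(blocks, block_size):
    """Return (total bytes, decimal terabytes, binary tebibytes).

    Hypothetical helper for illustration only; it is not part of any XFS or cciss tool.
    """
    total = blocks * block_size
    return total, total / 10**12, total / 2**40


total, tb, tib = volume_size(BLOCKS, BLOCK_SIZE)
print(f"{total} bytes = {tb:.2f} TB = {tib:.2f} TiB")
```

<br><br>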
Running xfs_check and xfs_repair yields the following:<br> [root@svr608 ~]# xfs_check /dev/cciss/c0d2<br> bad agf magic # 0x58418706 in ag 0<br> bad agf version # 0x30002 in ag 0<br> /usr/sbin/xfs_check: line 28: 5259 Segmentation fault xfs_db$DBOPTS -i -p xfs_check -c "check$OPTS" $1<br>
[root@svr608 ~]# xfs_repair -n /dev/cciss/c0d2<br> Phase 1 - find and verify superblock...<br> superblock read failed, offset 0, size 524288, ag 0, rval -1<br><br> fatal error -- Input/output error<br><br>And they leave the following in dmesg:<br>
xfs_db[5259]: segfault at 000000000555a134 rip 00000000004070c3 rsp 00007fff986bae50 error 4<br> cciss 0000:04:00.0: cciss: c ffff810037e00000 has CHECK CONDITION sense key = 0x3<br><br>And finally if I try to ls or stat a directory, I get the following call trace:<br>
Call Trace:<br> [<ffffffff8835d8b8>] :xfs:xfs_da_do_buf+0x4ee/0x59c<br> [<ffffffff8835d9b9>] :xfs:xfs_da_read_buf+0x16/0x1b<br> [<ffffffff8835d9b9>] :xfs:xfs_da_read_buf+0x16/0x1b<br> [<ffffffff88362414>] :xfs:xfs_dir2_leaf_lookup_int+0x57/0x24f<br>
[<ffffffff88362414>] :xfs:xfs_dir2_leaf_lookup_int+0x57/0x24f<br> [<ffffffff8004ad3e>] try_to_del_timer_sync+0x7f/0x88<br> [<ffffffff883628c5>] :xfs:xfs_dir2_leaf_lookup+0x1f/0xb6<br> [<ffffffff8835f50c>] :xfs:xfs_dir2_isleaf+0x19/0x4a<br>
[<ffffffff8003f8b2>] memcpy_toiovec+0x36/0x66<br> [<ffffffff8835fc1a>] :xfs:xfs_dir_lookup+0xf9/0x140<br> [<ffffffff88384309>] :xfs:xfs_lookup+0x49/0xa8<br> [<ffffffff8805c27c>] :ext3:ext3_get_acl+0x63/0x310<br>
[<ffffffff8838f772>] :xfs:xfs_vn_lookup+0x3d/0x7b<br> [<ffffffff8000d0b0>] do_lookup+0x126/0x227<br> [<ffffffff80009c59>] __link_path_walk+0x3aa/0xf39<br> [<ffffffff8000eb37>] link_path_walk+0x45/0xb8<br>
[<ffffffff8000ce0a>] do_path_lookup+0x294/0x310<br> [<ffffffff80012969>] getname+0x15b/0x1c2<br> [<ffffffff80023a11>] __user_walk_fd+0x37/0x4c<br> [<ffffffff8002898c>] vfs_stat_fd+0x1b/0x4a<br>
[<ffffffff80067235>] do_page_fault+0x4cc/0x842<br> [<ffffffff8023074b>] sys_connect+0x7e/0xae<br> [<ffffffff80023741>] sys_newstat+0x19/0x31<br> [<ffffffff8005d229>] tracesys+0x71/0xe0<br>
[<ffffffff8005d28d>] tracesys+0xd5/0xe0<br><br> 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................<br> Filesystem cciss/c0d2: XFS internal error xfs_da_do_buf(2) at line 2112 of file fs/xfs/xfs_da_btree.c. Caller 0xffffffff8835d9b9<br>
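<br>For reference, here is a quick decode of the sense key from the CHECK CONDITION line in the dmesg output above. This is only a lookup of the standard SCSI sense key names. The helper name describe_sense_key is made up for illustration, and I'm assuming the cciss driver prints the raw sense key value unmodified. If that assumption holds, 0x3 means a medium error, which would fit with the Input/output error xfs_repair hit while reading the superblock.<br>

```python
# Sense key names as defined by the SCSI Primary Commands (SPC) standard.
# Values not listed here are reserved or obsolete.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC",
    0xA: "COPY ABORTED",
    0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW",
    0xE: "MISCOMPARE",
}


def describe_sense_key(key):
    """Hypothetical helper: map a 4-bit SCSI sense key to its standard name."""
    return SENSE_KEYS.get(key & 0xF, "RESERVED/UNKNOWN")


# The value reported in dmesg: "has CHECK CONDITION sense key = 0x3"
print(hex(0x3), describe_sense_key(0x3))
```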
<br>hpacucli says the array is fine, but it looks like it's corrupted to me. This is probably a lost cause, but if anyone has any ideas I'd love to hear them.<br><br><br>Thanks,<br><br>Drew<br>