X-Spam-Checker-Version: SpamAssassin 3.4.0-r929098 (2010-03-30) on oss.sgi.com
X-Spam-Level: *
X-Spam-Status: No, score=1.1 required=5.0 tests=BAYES_00,FREEMAIL_FROM,
        HTML_MESSAGE,T_DKIM_INVALID autolearn=no version=3.4.0-r929098
Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25])
        by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id q3K4CGHd168902
        for ; Thu, 19 Apr 2012 23:12:16 -0500
X-ASG-Debug-ID: 1334895134-04cbb005682651c0001-NocioJ
Received: from mail-iy0-f181.google.com (mail-iy0-f181.google.com [209.85.210.181])
        by cuda.sgi.com with ESMTP id prsNRCH3NtGDHfHj
        (version=TLSv1 cipher=RC4-SHA bits=128 verify=NO)
        for ; Thu, 19 Apr 2012 21:12:15 -0700 (PDT)
X-Barracuda-Envelope-From: m3rlin@gmail.com
X-Barracuda-Apparent-Source-IP: 209.85.210.181
Received: by iagk10 with SMTP id k10so14727674iag.26
        for ; Thu, 19 Apr 2012 21:12:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20120113;
        h=mime-version:in-reply-to:references:from:date:message-id:subject:to
         :cc:content-type;
        bh=K0rrE8R3taQuuTNknEzeibXCQDNIgxqMYBb67qboZBo=;
        b=K1InI9OXc/KV88w9IY9GmwG4oRgnCiFyVQ8+d41iEHy82vXomSsHt/Eo4odHtTARZm
         tEcO5dGBLlRH+3I43rZONeRmgvluYAxuMUqPMSekHUPyydSzHKJOXJveiCo8EYGbodUP
         d7wXIUQ5d5o7hWkdK6AZsasBOri90bRW3eBmEDjOcf7+kcHypNWr/B3ie3BG7AGtKzQC
         RZpgYADvY/JNKLGNj/jZHGyZxMsRU7W8Mhk1+0AwGYnUi3FVzGgD0OnEhUPIbS3pI3lt
         of3+EaYwcb3VmI7bQIkYyfN4g8LUkDfr8Gasazi9BCCozCoOS6YByDBDNyEqVk1ejdf5
         3H6A==
Received: by 10.50.217.230 with SMTP id pb6mr4514799igc.1.1334895134737;
        Thu, 19 Apr 2012 21:12:14 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.231.203.77 with HTTP; Thu, 19 Apr 2012 21:11:54 -0700 (PDT)
In-Reply-To: <20120415223106.GU6734@dastard>
References: <20120415223106.GU6734@dastard>
From: Drew Wareham
Date: Fri, 20 Apr 2012 14:11:54 +1000
Message-ID:
Subject: Re: xfs_check segfault / xfs_repair I/O error
To: Dave Chinner
X-ASG-Orig-Subj: Re: xfs_check segfault / xfs_repair I/O error
Cc: xfs@oss.sgi.com
Content-Type: multipart/alternative;
        boundary=14dae93411e751836604be147ea3
X-Barracuda-Connect: mail-iy0-f181.google.com[209.85.210.181]
X-Barracuda-Start-Time: 1334895134
X-Barracuda-Encrypted: RC4-SHA
X-Barracuda-URL: http://192.48.176.25:80/cgi-mod/mark.cgi
X-Virus-Scanned: by bsmtpd at sgi.com
X-Barracuda-Spam-Score: 0.00
X-Barracuda-Spam-Status: No, SCORE=0.00 using per-user scores of TAG_LEVEL=1000.0
        QUARANTINE_LEVEL=1000.0 KILL_LEVEL=1.3 tests=DKIM_SIGNED, DKIM_VERIFIED,
        HTML_MESSAGE
X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.94629
        Rule breakdown below
         pts  rule name              description
        ----- ---------------------- --------------------------------------------------
        -0.00 DKIM_VERIFIED          Domain Keys Identified Mail: signature passes verification
         0.00 DKIM_SIGNED            Domain Keys Identified Mail: message has a signature
         0.00 HTML_MESSAGE           BODY: HTML included in message

--14dae93411e751836604be147ea3
Content-Type: text/plain; charset=UTF-8

Hi Dave / Stan,

Thanks for taking the time to reply.  Unfortunately none of the suggestions
were able to recover the data - I'm going to rebuild the array now, but as
RAID6 for the extra level of security.

Thanks again for all your help!


Cheers,

Drew


On Mon, Apr 16, 2012 at 8:31 AM, Dave Chinner wrote:

> On Sun, Apr 15, 2012 at 11:15:09PM +1000, Drew Wareham wrote:
> > Hello Everyone,
> >
> > Hopefully this is the correct kind of information to send to this list.
> >
> > I have an issue with a large XFS volume (17TB) that mounts, but is not
> > readable.  I can view the folder structure on the volume but I can't access
> > any of the actual data.  A disk failed in a RAID5 array and while it has
> > rebuilt now, it looks like it's caused serious data integrity issues.
> >
> > Here is the CentOS release / Kernel version:
> >     [root@svr608 ~]# uname -a
> >     Linux svr608 2.6.18-308.1.1.el5 #1 SMP Wed Mar 7 04:16:51 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
> >     [root@svr608 ~]# cat /etc/redhat-release
> >     CentOS release 5.8 (Final)
> >     [root@svr608 ~]# cat /tmp/yum.list | grep xfs | grep installed
> >     kmod-xfs.x86_64      0.4-2                  installed
> >     xfsdump.x86_64       2.2.46-1.el5.centos    installed
> >     xfsprogs.x86_64      2.9.4-1.el5.centos
>
> Try upgrading xfsprogs to the latest version first. this is rather
> old, and the latest versions handle IO errors better...
>
> > But even though the volume mounts, when trying to access data it just gives
> > a "Structure needs cleaning" error.
> >
> > Running xfs_check and xfs_repair yield the following:
> >     [root@svr608 ~]# xfs_check /dev/cciss/c0d2
> >     bad agf magic # 0x58418706 in ag 0
>
> Oh, that's bad. 2 bytes of the magic number are corrupt...
>
> >     bad agf version # 0x30002 in ag 0
>
> And the version is completely toast.
>
> >     /usr/sbin/xfs_check: line 28:  5259 Segmentation fault      xfs_db$DBOPTS -i -p xfs_check -c "check$OPTS" $1
> >     [root@svr608 ~]# xfs_repair -n /dev/cciss/c0d2
> >     Phase 1 - find and verify superblock...
> >     superblock read failed, offset 0, size 524288, ag 0, rval -1
> >
> >     fatal error -- Input/output error
> >
> > And they leave the following in dmesg:
> >     xfs_db[5259]: segfault at 000000000555a134 rip 00000000004070c3 rsp 00007fff986bae50 error 4
> >     cciss 0000:04:00.0: cciss: c ffff810037e00000 has CHECK CONDITION sense key = 0x3
>
> This is clearly a raid array error....
>
> ....
>
> > ................
> >     Filesystem cciss/c0d2: XFS internal error xfs_da_do_buf(2) at line 2112 of file fs/xfs/xfs_da_btree.c.  Caller 0xffffffff8835d9b9
> >
> > hpacucli says the array is fine, but it looks like it's corrupted to me.
>
> It's badly corrupted. Try a newer version of check/repair, otherwise
> you're in a disaster recovery situation...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>

--14dae93411e751836604be147ea3--
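
Dave's remark that two bytes of the AGF magic number are corrupt can be checked by hand. The short Python sketch below compares the value reported by xfs_check (0x58418706) against the expected magic. Note the assumptions: the expected value, the ASCII string "XAGF" stored big-endian (0x58414746), is taken from the XFS on-disk format rather than from this thread, and the helper names differing_bytes and flipped_bits are made up for illustration.

```python
# Assumption: the XFS AGF magic is the ASCII string "XAGF", stored big-endian.
EXPECTED_AGF_MAGIC = int.from_bytes(b"XAGF", "big")  # 0x58414746
# Value reported by xfs_check in the thread above.
OBSERVED_AGF_MAGIC = 0x58418706


def differing_bytes(expected, observed, width=4):
    """Return byte offsets (0 = most significant) where the two values differ."""
    exp = expected.to_bytes(width, "big")
    obs = observed.to_bytes(width, "big")
    return [i for i, (e, o) in enumerate(zip(exp, obs)) if e != o]


def flipped_bits(expected, observed):
    """Count the bits that differ between the two values."""
    return bin(expected ^ observed).count("1")


if __name__ == "__main__":
    print("expected magic: 0x%08x" % EXPECTED_AGF_MAGIC)
    print("observed magic: 0x%08x" % OBSERVED_AGF_MAGIC)
    print("differing byte offsets:",
          differing_bytes(EXPECTED_AGF_MAGIC, OBSERVED_AGF_MAGIC))
    print("flipped bits:", flipped_bits(EXPECTED_AGF_MAGIC, OBSERVED_AGF_MAGIC))
```

Under those assumptions the two high bytes ("XA") are intact and only the two low bytes differ, which is consistent with Dave's reading of the xfs_check output.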