
Re: fs corruption

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: fs corruption
From: Leo Davis <leo1783@xxxxxxxxx>
Date: Sun, 24 Apr 2011 22:47:24 -0700 (PDT)
Cc: xfs@xxxxxxxxxxx
Just to add, in case it helps: I found this logged by the Smart Array controller:

Corrected ECC Error, Status=0x00000001 Addr=0x060f4e00

From: Leo Davis <leo1783@xxxxxxxxx>
To: Dave Chinner <david@xxxxxxxxxxxxx>
Cc: xfs@xxxxxxxxxxx
Sent: Mon, April 25, 2011 9:55:02 AM
Subject: Re: fs corruption

Thank you for that :).

However, I've run into another fs corruption issue on a different server. I thought I would reuse this thread rather than open a new one.

I was troubleshooting a weird fibre channel issue (logins to my storage going missing) when I noticed these backtraces in dmesg:

Filesystem "cciss/c3d1p1": XFS internal error xfs_btree_check_lblock at line 186 of file fs/xfs/xfs_btree.c. Caller 0xffffffff881b92d6
Call Trace:
 [<ffffffff881bce83>] :xfs:xfs_btree_check_lblock+0xf4/0xfe
 [<ffffffff881b92d6>] :xfs:xfs_bmbt_lookup+0x159/0x420
 [<ffffffff881b41cc>] :xfs:xfs_bmap_add_extent_delay_real+0x62a/0x103a
 [<ffffffff881a8cfa>] :xfs:xfs_alloc_vextent+0x379/0x3ff
 [<ffffffff881b543a>] :xfs:xfs_bmap_add_extent+0x1fb/0x390
 [<ffffffff881b7f34>] :xfs:xfs_bmapi+0x895/0xe79
 [<ffffffff881d4082>] :xfs:xfs_iomap_write_allocate+0x201/0x328
 [<ffffffff881d4b09>] :xfs:xfs_iomap+0x22a/0x2a5
 [<ffffffff881e9ae3>] :xfs:xfs_map_blocks+0x2d/0x65
 [<ffffffff881ea723>] :xfs:xfs_page_state_convert+0x2af/0x544
 [<ffffffff881eab04>] :xfs:xfs_vm_writepage+0xa7/0xdf
 [<ffffffff8001cef2>] mpage_writepages+0x1bf/0x37d
 [<ffffffff881eaa5d>] :xfs:xfs_vm_writepage+0x0/0xdf
 [<ffffffff8005b1ea>] do_writepages+0x20/0x2f
 [<ffffffff8005000e>] __filemap_fdatawrite_range+0x50/0x5b
 [<ffffffff80050717>] do_fsync+0x2f/0xa4
 [<ffffffff800e1ce9>] __do_fsync+0x23/0x36
 [<ffffffff8005e116>] system_call+0x7e/0x83

Filesystem "cciss/c3d1p1": XFS internal error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c. Caller 0xffffffff881d4186
Call Trace:
 [<ffffffff881e1b37>] :xfs:xfs_trans_cancel+0x55/0xfa
 [<ffffffff881d4186>] :xfs:xfs_iomap_write_allocate+0x305/0x328
 [<ffffffff881d4b09>] :xfs:xfs_iomap+0x22a/0x2a5
 [<ffffffff881e9ae3>] :xfs:xfs_map_blocks+0x2d/0x65
 [<ffffffff881ea723>] :xfs:xfs_page_state_convert+0x2af/0x544
 [<ffffffff881eab04>] :xfs:xfs_vm_writepage+0xa7/0xdf
 [<ffffffff8001cef2>] mpage_writepages+0x1bf/0x37d
 [<ffffffff881eaa5d>] :xfs:xfs_vm_writepage+0x0/0xdf
 [<ffffffff8005b1ea>] do_writepages+0x20/0x2f
 [<ffffffff8005000e>] __filemap_fdatawrite_range+0x50/0x5b
 [<ffffffff80050717>] do_fsync+0x2f/0xa4
 [<ffffffff800e1ce9>] __do_fsync+0x23/0x36
 [<ffffffff8005e116>] system_call+0x7e/0x83

xfs_force_shutdown(cciss/c3d1p1,0x8) called from line 1165 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff881e1b50
Filesystem "cciss/c3d1p1": Corruption of in-memory data detected. Shutting down filesystem: cciss/c3d1p1
Please umount the filesystem, and rectify the problem(s)
Filesystem "cciss/c3d1p1": xfs_log_force: error 5 returned.
Filesystem "cciss/c3d1p1": xfs_log_force: error 5 returned.

Any thoughts on what the root cause might be? I've checked the underlying drives, array controller, etc., and everything looks healthy, which suggests a filesystem-level issue. Running xfs_repair corrected the problem, but I'm worried about how the filesystem ended up in this state, as this is a production box.
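For reference, the repair was along these lines (a sketch, using the device path from the logs above; xfs_repair's -n flag does a read-only check without modifying anything):

```shell
# The filesystem must be unmounted before running xfs_repair
umount /dev/cciss/c3d1p1

# Dry run: -n reports any problems found but makes no changes
xfs_repair -n /dev/cciss/c3d1p1

# If problems are reported, run the actual repair
xfs_repair /dev/cciss/c3d1p1
```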
 
Thanks in advance.


From: Dave Chinner <david@xxxxxxxxxxxxx>
To: Leo Davis <leo1783@xxxxxxxxx>
Cc: xfs@xxxxxxxxxxx
Sent: Tue, April 12, 2011 4:35:32 PM
Subject: Re: fs corruption

On Tue, Apr 12, 2011 at 03:51:20AM -0700, Leo Davis wrote:
> > You have a corrupted free space btree.
>
> Err... apologies for my ignorance, but what is a free space btree?

A tree that indexes the free space in the filesystem. Every time you
write a file or remove a file you are allocating or freeing space,
and these trees keep track of that free space.

If you want to know - at a high level - how XFS is structured (good
for understanding what a free space tree is), read this paper:

http://oss.sgi.com/projects/xfs/papers/xfs_usenix/index.html

It's from 1996, but still correct on all the major structural
details.

> I had serial trace from raid controller which i just checked and
> it logged some 'Loose cabling', but this was months back.....  not
> sure whether that can be the cause of this.. strange if that is
> the case since it's been a long time

It's possible that it took a couple of months to trip over random
metadata corruption. I've seen that before in directory trees and
inode clusters, where corruption is not detected until the next time
they are read from disk....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx