| To: | Dave Chinner <david@xxxxxxxxxxxxx> |
|---|---|
| Subject: | Re: fs corruption |
| From: | Leo Davis <leo1783@xxxxxxxxx> |
| Date: | Sun, 24 Apr 2011 22:47:24 -0700 (PDT) |
| Cc: | xfs@xxxxxxxxxxx |
|
Just to add, in case it helps: I found this logged by the Smart Array controller:
Corrected ECC Error, Status=0x00000001 Addr=0x060f4e00
From: Leo Davis <leo1783@xxxxxxxxx>
To: Dave Chinner <david@xxxxxxxxxxxxx>
Cc: xfs@xxxxxxxxxxx
Sent: Mon, April 25, 2011 9:55:02 AM
Subject: Re: fs corruption

Thank you for that :).
However, I've run into another filesystem corruption issue on one of my other servers. I thought I'd use the same thread rather than opening a new one.
I was troubleshooting a weird fibre channel issue (logins to my storage going missing) when I noticed these backtraces in dmesg:
Filesystem "cciss/c3d1p1": XFS internal error xfs_btree_check_lblock at line 186 of file fs/xfs/xfs_btree.c.  Caller 0xffffffff881b92d6
Call Trace:
 [<ffffffff881bce83>] :xfs:xfs_btree_check_lblock+0xf4/0xfe
 [<ffffffff881b92d6>] :xfs:xfs_bmbt_lookup+0x159/0x420
 [<ffffffff881b41cc>] :xfs:xfs_bmap_add_extent_delay_real+0x62a/0x103a
 [<ffffffff881a8cfa>] :xfs:xfs_alloc_vextent+0x379/0x3ff
 [<ffffffff881b543a>] :xfs:xfs_bmap_add_extent+0x1fb/0x390
 [<ffffffff881b7f34>] :xfs:xfs_bmapi+0x895/0xe79
 [<ffffffff881d4082>] :xfs:xfs_iomap_write_allocate+0x201/0x328
 [<ffffffff881d4b09>] :xfs:xfs_iomap+0x22a/0x2a5
 [<ffffffff881e9ae3>] :xfs:xfs_map_blocks+0x2d/0x65
 [<ffffffff881ea723>] :xfs:xfs_page_state_convert+0x2af/0x544
 [<ffffffff881eab04>] :xfs:xfs_vm_writepage+0xa7/0xdf
 [<ffffffff8001cef2>] mpage_writepages+0x1bf/0x37d
 [<ffffffff881eaa5d>] :xfs:xfs_vm_writepage+0x0/0xdf
 [<ffffffff8005b1ea>] do_writepages+0x20/0x2f
 [<ffffffff8005000e>] __filemap_fdatawrite_range+0x50/0x5b
 [<ffffffff80050717>] do_fsync+0x2f/0xa4
 [<ffffffff800e1ce9>] __do_fsync+0x23/0x36
 [<ffffffff8005e116>] system_call+0x7e/0x83

Filesystem "cciss/c3d1p1": XFS internal error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c.  Caller 0xffffffff881d4186
Call Trace:
 [<ffffffff881e1b37>] :xfs:xfs_trans_cancel+0x55/0xfa
 [<ffffffff881d4186>] :xfs:xfs_iomap_write_allocate+0x305/0x328
 [<ffffffff881d4b09>] :xfs:xfs_iomap+0x22a/0x2a5
 [<ffffffff881e9ae3>] :xfs:xfs_map_blocks+0x2d/0x65
 [<ffffffff881ea723>] :xfs:xfs_page_state_convert+0x2af/0x544
 [<ffffffff881eab04>] :xfs:xfs_vm_writepage+0xa7/0xdf
 [<ffffffff8001cef2>] mpage_writepages+0x1bf/0x37d
 [<ffffffff881eaa5d>] :xfs:xfs_vm_writepage+0x0/0xdf
 [<ffffffff8005b1ea>] do_writepages+0x20/0x2f
 [<ffffffff8005000e>] __filemap_fdatawrite_range+0x50/0x5b
 [<ffffffff80050717>] do_fsync+0x2f/0xa4
 [<ffffffff800e1ce9>] __do_fsync+0x23/0x36
 [<ffffffff8005e116>] system_call+0x7e/0x83

xfs_force_shutdown(cciss/c3d1p1,0x8) called from line 1165 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffff881e1b50
Filesystem "cciss/c3d1p1": Corruption of in-memory data detected.  Shutting down filesystem: cciss/c3d1p1
Please umount the filesystem, and rectify the problem(s)
Filesystem "cciss/c3d1p1": xfs_log_force: error 5 returned.
Filesystem "cciss/c3d1p1": xfs_log_force: error 5 returned.
Any thoughts on what the root cause might be?
I've checked the underlying drives, the array controller, etc., and everything looks healthy. Does that indicate it is a filesystem issue for sure?
I ran xfs_repair, which corrected the issue, but I'm worried about how the filesystem ended up in this state, as this is a production box.
Thanks in advance.
From: Dave Chinner <david@xxxxxxxxxxxxx>
To: Leo Davis <leo1783@xxxxxxxxx>
Cc: xfs@xxxxxxxxxxx
Sent: Tue, April 12, 2011 4:35:32 PM
Subject: Re: fs corruption

On Tue, Apr 12, 2011 at 03:51:20AM -0700, Leo Davis wrote:
> > You have a corrupted free space btree.
>
> Err... apologies for my ignorance, but what is a free space btree?

A tree that indexes the free space in the filesystem. Every time you write a file or remove a file you are allocating or freeing space, and these trees keep track of that free space.

If you want to know - at a high level - how XFS is structured (good for understanding what a free space tree is), read this paper:

http://oss.sgi.com/projects/xfs/papers/xfs_usenix/index.html

It's from 1996, but still correct on all the major structural details.

> I had a serial trace from the raid controller which I just checked, and
> it logged some 'Loose cabling', but this was months back..... not
> sure whether that can be the cause of this.. strange if that is
> the case, since it's been a long time.

It's possible that it took a couple of months to trip over a random metadata corruption. I've seen that before in directory trees and inode clusters, where corruption is not detected until the next time they are read from disk....

Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx