[xfs-masters] [Bug 720] xfs_da_do_buf error under load on Core 5 current

To: xfs-master@xxxxxxxxxxx
Subject: [xfs-masters] [Bug 720] xfs_da_do_buf error under load on Core 5 current kernel - XFS over LVM on 3Ware 9500 HW RAID
From: bugzilla-daemon@xxxxxxxxxxx
Date: Fri, 22 Sep 2006 12:35:00 -0700
Reply-to: xfs-masters@xxxxxxxxxxx
Sender: xfs-masters-bounce@xxxxxxxxxxx
http://oss.sgi.com/bugzilla/show_bug.cgi?id=720





------- Additional Comments From blackavr@xxxxxxxxxxxxx  2006-09-22 12:35 CST -------
(In reply to comment #4)
> This is probably caused by a known problem, fixed in 2.6.17.7. You 
> also need a recent xfsprogs (2.8.11 IIRC) to fix the on disk corruption. 
> There's more info in the XFS FAQ about how to recover from this problem here: 
>  
> http://oss.sgi.com/projects/xfs/faq.html#dir2 

That's possible, but this seems different. First, I was under the impression
that that bug was patched in 2.6.17-1.2157_FC5. Second, as far as I know, the
hallmark of that bug was that the corruption could not be fixed with the
xfs_repair that I have. However, my xfsprogs-2.7.3-1.2.1 repairs the filesystem
fine, with no need to use xfs_db to mark blocks bad manually. The long repair on
the latest occurrence is likely due to the 450K files in that directory, hashed
5 layers deep: 450K files plus up to 2,250,000 directories (at most one new
directory per file per level, 450K x 5) adds up to a long rebuild time. Nothing
ended up in lost+found, either, just the expected nulls in files whose data was
still only in memory. The previous occurrence repaired quickly and completely,
and we've had the same issue once before that. All have worked fine after
xfs_repair, and xfs_repair didn't complain at all.
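
For anyone unfamiliar with hashed directory layouts, here is a minimal Python sketch of one, showing where a ceiling like "up to 2,250,000 directories" comes from. The md5-based scheme (two hex characters per level) and all function names are hypothetical, not the layout these machines actually use; only the 5 levels and the 450K file count come from the report above.

```python
import hashlib
from typing import Iterable, List

LEVELS = 5  # "hashed 5 layers deep", as described above


def hash_dirs(name: str, levels: int = LEVELS) -> List[str]:
    """Hypothetical scheme: one directory level per 2 hex chars of md5(name)."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return [digest[2 * i : 2 * i + 2] for i in range(levels)]


def hashed_path(name: str, levels: int = LEVELS) -> str:
    """Relative path at which a file would be stored under this layout."""
    return "/".join(hash_dirs(name, levels) + [name])


def count_directories(names: Iterable[str], levels: int = LEVELS) -> int:
    """Number of distinct directories (at every level) the layout creates."""
    dirs = set()
    for name in names:
        parts = hash_dirs(name, levels)
        for depth in range(1, levels + 1):
            dirs.add(tuple(parts[:depth]))  # one entry per directory prefix
    return len(dirs)


def directory_upper_bound(n_files: int, levels: int = LEVELS) -> int:
    """Ceiling: each file adds at most one new directory per level."""
    return n_files * levels


# The figure from the report: 450K files, 5 levels.
print(directory_upper_bound(450_000))  # 2250000
```

In this hypothetical scheme the top levels are also capped by fan-out (at most 256 two-hex-character names per directory), so the real count is normally below the ceiling; count_directories measures it for a given list of names.
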

We're going to force a flush (fsync) before reporting success up the chain;
that, plus battery backup on the 3Ware cards, should stop the nulls. However,
these machines run essentially unattended, so what I'm really after is a fix for
the error that takes them offline. It feels like a race condition, as it happens
under heavy read/write/delete load. I'm certainly going to dig deeper into what
Fedora is shipping in the 2157 kernel build and look into an update, but this
feels different from that bug.
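
A minimal Python sketch of "force a flush before reporting success up the chain", assuming the application writes each file itself. The write-to-temp-then-rename step and the name durable_write are illustrative additions, not necessarily what our pipeline does, and fsyncing a directory through os.open as written assumes Linux.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and force it to stable storage before returning.

    Illustrative sketch: write to a temp file in the same directory, fsync it,
    rename it over the target, then fsync the directory so the rename itself
    is durable. Only after this returns should success be reported upstream.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # same filesystem as target
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer into the kernel
            os.fsync(f.fileno())  # ask the kernel to push it to the device
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # don't leave a half-written temp file behind
        raise
    dir_fd = os.open(directory, os.O_RDONLY)  # Linux: directories can be fsync'd
    try:
        os.fsync(dir_fd)  # make the new directory entry durable too
    finally:
        os.close(dir_fd)
```

Usage: call durable_write(path, payload) and only then send the success reply upstream. fsync only guarantees durability as far as the storage stack honors cache flushes, which is why the battery backup on the RAID cards still matters.
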

-- 
Configure bugmail: http://oss.sgi.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.