http://oss.sgi.com/bugzilla/show_bug.cgi?id=720
------- Additional Comments From blackavr@xxxxxxxxxxxxx 2006-09-22 12:35 CST
-------
(In reply to comment #4)
> This is probably caused by a known problem, fixed in 2.6.17.7. You
> also need a recent xfsprogs (2.8.11 IIRC) to fix the on disk corruption.
> There's more info in the XFS FAQ about how to recover from this problem here:
>
> http://oss.sgi.com/projects/xfs/faq.html#dir2
That's possible, but this seems different. First, I was under the impression
that that bug was patched in 2.6.17-1.2157_FC5. Second, as far as I know, the
hallmark of that bug was that the corruption could not be fixed with the
xfs_repair that I have. However, my xfsprogs-2.7.3-1.2.1 repairs the filesystem
fine; there is no need to use xfs_db to mark blocks bad manually. The long repair
on the latest occurrence is likely due to the 450K files in that directory,
hashed 5 levels deep: 450K files plus up to 2,250,000 directories (450K x 5)
adds up to a long rebuild time. Nothing ended up in lost+found either, just the
expected nulls in files that had not yet been flushed from memory. The previous
occurrence repaired quickly and completely, and we've had the same issue once
before. All of them have worked fine after xfs_repair, and xfs_repair didn't
complain at all.
We're going to force a flush before reporting success up the chain; that, plus
battery backup on the 3Ware cards, should stop the nulls. However, these
machines run essentially unattended, so what I'm really after is a fix for the
error that takes them offline. It feels like a race condition, as it happens under
heavy read/write/delete load. I'm certainly going to dig deeper into what Fedora is
shipping in the 2157 kernel build and look into an update, but this feels
different from that bug.
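For reference, "force a flush before reporting success" could look something like the minimal sketch below. It assumes the application writes each file itself on Linux/POSIX and can hold back its acknowledgement until the data is on disk. The function name `durable_write` is hypothetical, and the write-to-temp, fsync, rename, fsync-the-directory pattern is a common POSIX idiom, not necessarily what this application actually does.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> bool:
    """Hypothetical helper: return True only after data is flushed to stable storage.

    Assumes a POSIX system (directory fsync via os.open does not work on Windows).
    Note: tempfile.mkstemp creates the file with mode 0600.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's userspace buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to write the file data to disk
        os.replace(tmp_path, path)  # atomically move the finished file into place
    except BaseException:
        os.unlink(tmp_path)       # don't leave a half-written temp file behind
        raise
    # Flush the directory entry too, so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return True
```

The caller would only report success up the chain after `durable_write` returns. fsync only guarantees the data has left the OS; a volatile controller write cache can still lose it on power failure, which is why the battery backup on the RAID cards matters as well.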