http://bugzilla.kernel.org/show_bug.cgi?id=8414
------- Additional Comments From dgc@xxxxxxx 2007-05-02 06:50 -------
The original problem - the soft lockup - is probably a result
of a massively fragmented file, as we were searching the extent
list when the soft lockup detector fired. The detector firing
does not by itself indicate an actual bug, though.
The second problem you report (access to block zero) indicates
something did go wrong and there is on-disk corruption of an
extent tree - there are zeros instead of real block numbers.
You need to run repair to fix this.
> attempt to access beyond end of device
> dm-6: rw=1, want=0, limit=2097152000
> Buffer I/O error on device dm-6, logical block 18446744073709551615
> lost page write due to I/O error on dm-6
Shows an attempt to write to a block marked as either a hole or delayed
allocate. You're not having memory errors, are you?
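A rough sketch of the arithmetic behind those log lines, written as standalone C rather than kernel code. It assumes that `limit` is counted in 512-byte sectors and that `want` is the start sector plus the I/O length; both are assumptions on my part, and the function names are made up for illustration. The logical block 18446744073709551615 is 2^64 - 1, i.e. -1 stored in an unsigned 64-bit value, and an I/O starting at that block wraps around to end at sector 0 for any block size, which would be consistent with want=0.

```c
#include <assert.h>
#include <stdint.h>

/* Device size in bytes for a limit given in sectors.
 * ASSUMPTION: the caller passes the sector size (512 in the log above). */
uint64_t limit_bytes(uint64_t limit_sectors, uint64_t sector_size)
{
    assert(sector_size > 0);
    return limit_sectors * sector_size;
}

/* End sector of a one-block I/O starting at `block`.
 * Unsigned 64-bit arithmetic wraps modulo 2^64, so block == UINT64_MAX
 * (that is, -1) yields an end sector of 0. */
uint64_t end_sector(uint64_t block, uint64_t sectors_per_block)
{
    uint64_t start = block * sectors_per_block; /* may wrap */
    return start + sectors_per_block;           /* may wrap again */
}
```

With these, limit_bytes(2097152000, 512) gives 1073741824000 bytes, i.e. a 1000 GiB device, and end_sector(UINT64_MAX, 8) gives 0.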
Now, the mount error:
	error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL, &rip, 0);
	if (error) {
		cmn_err(CE_WARN, "XFS: failed to read root inode");
		goto error3;
	}
Means the root inode in the superblock is wrong. Something went really
wrong here. Can you run:
# xfs_db -r -c "sb 0" -c p <dev>
and
# dd if=<dev> bs=512 count=1 | od -Ax
And attach the output so we can see how badly corrupted the superblock
is?
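For reading the raw dump, a minimal sketch of the two fields worth checking first. It assumes the superblock is stored big-endian, starts with the magic "XFSB" (0x58465342) at byte 0, and keeps sb_rootino as a 64-bit value at byte offset 56. That offset is my reading of the on-disk layout and should be checked against the xfs_sb definition for the kernel in question. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read an n-byte big-endian unsigned integer from buf (n <= 8). */
uint64_t be_read(const unsigned char *buf, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | buf[i];
    return v;
}

/* Returns 1 if the sector begins with the XFS superblock magic "XFSB". */
int has_xfs_magic(const unsigned char *sector, size_t len)
{
    return len >= 4 && be_read(sector, 4) == 0x58465342ULL;
}

/* Root inode number from the first sector, or UINT64_MAX if the buffer
 * is too short. ASSUMPTION: sb_rootino sits at byte offset 56. */
uint64_t sb_rootino(const unsigned char *sector, size_t len)
{
    const size_t off = 56; /* assumed offset, verify before relying on it */
    if (len < off + 8)
        return UINT64_MAX;
    return be_read(sector + off, 8);
}
```

If the magic is missing, the sector is not a recognisable XFS superblock at all. If the magic is present but the root inode number looks implausible, that would fit the "failed to read root inode" message.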
And finally, repair. What problem did you encounter? Did you just end
up with everything in lost+found?
Hmmm - this reminds me of problems seen when the filesystem wraps the
device at 2TB. How big is the LVM volume this partition is on and where is
it placed? Along the same train of thought, I'm wondering
if the device wasn't reconstructed correctly with the new kernel and that
led to the problem. Were there any other error messages or warnings in the
syslog from the boot when the problem first happened?
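To illustrate the kind of 2TB wrap meant above, here is a small sketch of one way it can arise: a sector number held in 32 bits somewhere along the I/O path. That this is what happened here is only a guess, and the 512-byte sector size and the function names are assumptions for illustration. With 512-byte sectors, 2^32 sectors is exactly 2 TiB, so any sector at or beyond that point aliases onto the start of the device once truncated.

```c
#include <stdint.h>

#define SECTOR_SIZE 512ULL /* assumed sector size in bytes */

/* Byte offset at which a 32-bit sector number wraps: 2^32 sectors. */
uint64_t wrap_point_bytes(void)
{
    return (1ULL << 32) * SECTOR_SIZE;
}

/* What a 64-bit sector number becomes if it is squeezed into 32 bits. */
uint32_t truncate_sector(uint64_t sector)
{
    return (uint32_t)sector; /* keeps only the low 32 bits */
}
```

For example, sector 2^32 + 100 truncates to sector 100, so a write meant for just past the 2 TiB mark would land near the front of the device, where superblocks and other metadata live.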