Thanks for your reply.
I am trying to act on the hints you gave me, but I still have a few
questions.
On Thu, Oct 25, 2012 at 6:47 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Thu, Oct 25, 2012 at 09:45:10AM -0400, Kamal Dasu wrote:
>> with "CONFIG_XFS_DEBUG=y" I get the following assertion:
>> Assertion failed: prev.br_state == XFS_EXT_NORM, file:
>> fs/xfs/xfs_bmap.c, line: 5192
> Yup, that's pretty clear indication of a corrupted extent record.
What is the best way to prevent transactions from recording bad
extent lengths and block numbers?
>> would have cleared inode 6776
>> - agno = 1
>> 771a3500: Badness in key lookup (length)
>> bp=(bno 16107312, len 16384 bytes) key=(bno 16107312, len 8192 bytes)
>> - agno = 2
>> bad nblocks 5120 for inode 33701135, would reset to 4096
>> inode 34297761 - bad rt extent start block number 2392537303836672,
> That's the open, unlinked file at the time the system crashed. That
> may be where your problems are coming from. The RT is mostly
> untested, and we sure as anything don't do any crash resiliency or
> recovery testing on it, so there's a good chance there are bugs in
> it that might show up in situations like this....
> You need to detect extents with invalid lengths in them and trigger
> a corruption-based filesystem shutdown.
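To make sure I understand the suggestion, here is a minimal user-space
sketch of the kind of range check I take this to mean. The struct and
function names are hypothetical and simplified for illustration; they are
not XFS structures or XFS functions, and a real check would live in the
extent handling code and end in a filesystem shutdown instead of a return
value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified extent record; NOT the XFS on-disk format. */
struct extent_rec {
    uint64_t start_block;  /* first block of the extent */
    uint64_t block_count;  /* extent length in blocks */
};

/*
 * Return true only if the extent is non-empty and lies entirely inside
 * a device that is dev_blocks blocks long. The comparison is arranged so
 * that start_block + block_count is never computed and cannot overflow.
 */
static bool extent_is_sane(const struct extent_rec *e, uint64_t dev_blocks)
{
    if (e->block_count == 0)
        return false;
    if (e->start_block >= dev_blocks)
        return false;
    if (e->block_count > dev_blocks - e->start_block)
        return false;
    return true;
}
```

With a check of this shape, a start block such as 2392537303836672 (from
the repair output above) would be rejected for any device of realistic
size, before it is used to issue I/O.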
I looked at the log from one of the filesystem shutdowns, taken when the
I/O error occurs. Is this an indication of a log that is already corrupted
because of corrupted in-memory metadata structures?
attempt to access beyond end of device
sda2: rw=0, want=33792081130943048, limit=31471329
I/O error in filesystem ("sda2") meta-data dev sda2 block
0x780db80007f240 ("xfs_trans_read_buf") error 5 buf count 4096
xfs_force_shutdown(sda2,0x1) called from line 395 of file
fs/xfs/xfs_trans_buf.c. Return address = 0x801f4f88
Filesystem "sda2": I/O Error Detected. Shutting down filesystem: sda2
Please umount the filesystem, and rectify the problem(s)
However, the log is already corrupted by then. Is there any check on a
write to the log?
>> also if there is something that can be done to avoid this situation in
>> the first place.
> Track down where those stray upper bits in the block numbers are
> coming from, and you'll have your answer.
I have not been able to track this down yet. Could it be memory
corruption that causes the in-memory metadata to become corrupted?
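As a cross-check of the "stray upper bits" hint, the numbers in the I/O
error log above can be compared with a small user-space sketch. The helper
names are made up for illustration, and the 512-byte sector size is my
assumption about the units of "want" and "limit". Under that assumption the
reported block 0x780db80007f240 plus 8 sectors (4096 bytes) is exactly the
"want" value, the upper 32 bits hold 0x780db8, and the lower 32 bits
(0x7f240) would be inside the device limit. That does not prove the low
bits are the intended address, only that the high bits are what push the
request off the end of the device.

```c
#include <stdint.h>

/* Values copied from the log in this message. */
#define LOG_DADDR    UINT64_C(0x780db80007f240)   /* "block 0x780db80007f240" */
#define LOG_WANT     UINT64_C(33792081130943048)  /* "want=" */
#define LOG_LIMIT    UINT64_C(31471329)           /* "limit=" */
#define BUF_BYTES    UINT64_C(4096)               /* "buf count 4096" */
#define SECTOR_BYTES UINT64_C(512)                /* assumption: 512-byte sectors */

/* First sector past the end of a buffer starting at sector daddr. */
static uint64_t end_sector(uint64_t daddr, uint64_t bytes)
{
    return daddr + bytes / SECTOR_BYTES;
}

/* Upper 32 bits of a 64-bit block number. */
static uint32_t high32(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Lower 32 bits of a 64-bit block number. */
static uint32_t low32(uint64_t v)
{
    return (uint32_t)(v & UINT64_C(0xffffffff));
}
```

Calling end_sector(LOG_DADDR, BUF_BYTES) and comparing the result with
LOG_WANT, then comparing low32(LOG_DADDR) with LOG_LIMIT, reproduces the
observations above.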
A similar occurrence of this issue, during recovery after a reboot, seems
to always go through the evict path:
Filesystem "sda2": XFS internal error xfs_trans_cancel at line 1815
of file fs/xfs/xfs_trans.c. Caller 0x801f8524
Filesystem "sda2": Corruption of in-memory data detected. Shutting
down filesystem: sda2