Thanks Dave.
>> with 2.6.18 kernel,128 MB of RAM, MIPS architecture and XFS version
>> 2.8.11.
> [...]
>> Can anyone let me know what could be the probable cause of this issue.
> they are all from corrupted extent btrees.
> There are many possible causes of this that we've fixed over the past years
> since 2.6.18 was released. Indeed, we are currently discussing fixes for a
> bunch of problems that lead to corrupted extent btrees and problems like
> this. I'd suggest that you should probably start with a more recent kernel,
> make sure you have a serial console and set the xfs_error_level to 11 so that
> it gives as much information as possible on the console when the error is
> hit.
> If that doesn't give a stack trace, then you need to set the xfs_panic_mask
> to crash the machine on block zero accesses and report the stack traces
> that it outputs...
Yes, I went through the changes between 2.6.18 and 2.6.24, and there are quite a
few. But as this is a production system deployed in the field, it is not viable
to upgrade the kernel. I do understand that many places could cause the
corruption. Unfortunately, three different systems have reported corruption in
three different places, as stated. For now, I sleep and reschedule in the
access-to-block-zero exception path so that it won't stall the system and I can
monitor the state of the filesystem. As the error occurs only about once every
2.5 days under extreme stress, if you could point me to the most probable places
to look at, I can narrow down the debugging path.
Thanks in advance
Sagar