>>>> I'm seeing some irregular halts on one of my XFS volumes (/srv). I
>>>> can only use umount -l to unmount the volume, or do a hot reboot.
>>>>
>>>> Dec 9 15:47:25 smbserver kernel: xfs_force_shutdown(md(9,5),0x8)
>>>> called from line 1039 of file xfs_trans.c. Return address =
>>>> 0xe109f312
>>>> Dec 9 15:47:25 smbserver kernel: Corruption of in-memory data
>>>> detected. Shutting down filesystem: md(9,5)
>>>> Dec 9 15:47:25 smbserver kernel: Please umount the filesystem, and
>>>> rectify the problem(s)
>>>>
>>>> I'm not sure if the disk has problems, but no errors are found by
>>>> fsck during boot-up. The stall sometimes happens once in several
>>>> weeks and sometimes a few times per day, so I really doubt this is
>>>> a disk problem. Is there any way I can trace or perhaps fix this?
>>>> BTW, how do I manually force a disk check?
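
For reference: XFS is not actually checked by fsck at boot (fsck.xfs is
a no-op), so forcing a check means unmounting the volume and running
xfs_repair in no-modify mode. A minimal sketch, assuming the /srv volume
is the md device named in the log (md(9,5), i.e. /dev/md5 -- substitute
your real device):

    # release the filesystem completely (a plain umount rather than the
    # lazy umount -l, so nothing is still using the device)
    umount /srv

    # no-modify mode: report problems without writing anything to disk
    xfs_repair -n /dev/md5

    # only if the dry run looks reasonable, repair for real
    xfs_repair /dev/md5

If xfs_repair refuses to run because of a dirty log, mounting and
cleanly unmounting the filesystem once lets the log replay first.
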
>>>
>>> I think you might have a memory problem. Try memtest86. Bad memory
>>> often goes unnoticed with other filesystems; I have already seen a
>>> number of cases where it only showed up with XFS filesystems.
>>
>> I did try memtest86, but it found no problems. I even swapped in
>> brand-new RAM. Is there more info I can provide?
>
> How long did you let memtest86 run (number of passes, hours)? It
> could also be a CPU cache problem, or even the motherboard going
> flaky. I've also seen power supply problems show up as memory issues
> (insufficient or unstable power delivery, spikes, fluctuations
> outside spec, etc.).
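
One cheap cross-check for flaky hardware is to look in the kernel log
for machine check exceptions or ECC error reports around the time of
the shutdowns. A sketch, assuming the kernel and chipset actually log
such events:

    # look for machine check / ECC reports in the ring buffer
    dmesg | grep -i -e 'machine check' -e mce -e ecc

    # and in the persisted syslog
    grep -i -e 'machine check' -e mce -e ecc /var/log/messages

Intermittent faults of this kind often need much longer memtest86 runs
(overnight, many passes) before they show up, if they show up at all.
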
My last memtest86 run lasted about half an hour to an hour; I think it
completed 3 passes. I hope it is not hardware, as I have 2 other boxes
running the exact same config, except with HW RAID. I'm running an Intel
SE7500WV2S motherboard with dual P4 1.8GHz CPUs and 2x256MB PC2100 DDR
ECC Registered RAM. I ran top on the halted system but saw little CPU or
swap usage. The system also runs on dual hot-swap 400W power supplies
behind a UPS.
Regards,
Norman
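
Given that the memtest86 run was only three passes over roughly an
hour, a longer stress under real I/O load may be more telling. A crude
sketch (the file name and size are arbitrary; write to the suspect /srv
volume only if triggering the shutdown on purpose is acceptable):

    # write a test file larger than RAM so data keeps cycling through
    # memory, cache, and the disk path
    dd if=/dev/urandom of=/srv/stress.bin bs=1M count=2048

    # take a reference checksum, then recheck it in a loop; any pass
    # that prints a different checksum points at hardware, not XFS
    # (stop with Ctrl-C)
    md5sum /srv/stress.bin | tee /tmp/stress.ref
    while true; do md5sum /srv/stress.bin; sleep 5; done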