On Wed, 31 Jan 2001 13:57:26 +0100,
Aldert Zomer <A.L.Zomer@xxxxxxxxxxx> wrote:
>I'm using XFS on a SMP machine with 2 GB, also built with
>CONFIG_HIGHMEM. I'm receiving these errors after the machine has been up
>for a while. I didn't have this problem with the standard 2.4.0 kernel
>or with a 2.2.18 kernel. Either my memory is suddenly faulty, or perhaps
>there is a problem with the 2.4.0-XFS kernel. Can anyone help me here?
>
>It's a Red Hat 7 box, all the latest patches applied, 2.4.0-XFS kernel
>from CVS (from 30 January), built with kgcc.
>
>snippet from /var/log/messages:
>
>Jan 30 17:01:49 molgen kernel: Uhhuh. NMI received. Dazed and confused,
>but trying to continue
>Jan 30 17:01:49 molgen kernel: Uhhuh. NMI received. Dazed and confused,
>but trying to continue
>Jan 30 17:01:49 molgen kernel: You probably have a hardware problem with
>your RAM chips
>Jan 30 17:01:49 molgen kernel: You probably have a hardware problem with
>your RAM chips
>
>I've placed /var/log/dmesg online for some more information about my
>system:
>
>http://molgen.biol.rug.nl/dmesg
I bounced this problem off Ingo Molnar and Maciej W. Rozycki, who look
after the IO-APIC and NMI code. Maciej said:
I've looked at the log and it seems to be a ServerWorks chipset,
unfortunately. The manufacturer is hard to cooperate with -- they declare
they want to support Linux, but they are very worried about the
competition and do not provide documentation. I've already asked
them about the IRQ 0 routing problem you may see in the log -- we
must route it through the 8259A to make it work at all. They
answered an NDA is required to get any docs.
With this in mind I can only assume the guy has faulty RAM in his
system and an NMI reports it. Note that certain chipsets use the NMI
to report ECC-corrected errors as well. We don't have a real memory
error handler in Linux, unfortunately, although there is a patch for
certain chipsets available.
They confirm my suspicion that the problem is probably not XFS related.
It is more likely to be memory or the chipset. BTW, is this a
ServerWorks chipset?
Transient memory problems are notoriously sensitive to data access
patterns. Slightly faulty memory will work under Windows but fail under
Linux because we drive it harder. Adding XFS will change the data
access patterns.
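
To illustrate why the access pattern matters, here is a minimal user-space
sketch, not a real memory tester. It writes a few byte patterns into a
buffer in strided (non-linear) order and reads them back. The function
name pattern_test, the buffer size, the patterns and the stride are all
assumptions chosen for illustration. It only touches the pages the process
happens to be given, so a clean run proves very little; a dedicated memory
tester run from boot covers far more.

```python
def pattern_test(size=1 << 16, patterns=(0x00, 0xFF, 0xAA, 0x55), stride=4099):
    """Write each byte pattern to a buffer in strided order, then verify it.

    Returns a list of (pattern, offset, value_read) for every mismatch.
    All parameters are illustrative; nothing here comes from the kernel.
    """
    buf = bytearray(size)
    errors = []
    for pat in patterns:
        # Jump by `stride` so consecutive writes land far apart instead of
        # walking memory linearly. Every offset is visited exactly once
        # only when stride and size share no common factor (an odd stride
        # with a power-of-two size satisfies this).
        off = 0
        for _ in range(size):
            buf[off] = pat
            off = (off + stride) % size
        # Read back sequentially and record anything that changed.
        for off, val in enumerate(buf):
            if val != pat:
                errors.append((pat, off, val))
    return errors


if __name__ == "__main__":
    bad = pattern_test()
    print("mismatches:", len(bad))
```

Changing the stride or the pattern set changes which addresses are hit
back to back, which is the same kind of shift in access pattern that a
different filesystem introduces.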