On Fri, 5 Oct 2001, Ian D. Hardy wrote:
> Hi,
>
> I've also been seeing these '__alloc_pages: 0-order allocation' messages
> (sometimes 4 or 5 order). I must admit I don't understand what they mean.
> Looking through
> the archives of various Linux kernel groups these messages seem to be
> a regularly recurring thread that never reaches a conclusion!
Correct
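
On what the messages mean: as far as I understand it, the "order" is the
power-of-two exponent of the request, so an order-n allocation asks for 2^n
physically contiguous pages. A minimal sketch of the arithmetic, assuming the
4 KiB page size of i386 (the function name is made up for illustration):

```python
PAGE_SIZE = 4096  # bytes; assumption: i386 page size


def allocation_size(order, page_size=PAGE_SIZE):
    """Bytes requested by an order-`order` allocation (2**order contiguous pages)."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (1 << order) * page_size


# The orders mentioned in this thread: 0, and sometimes 4 or 5.
for order in (0, 4, 5):
    pages = 1 << order
    print(f"order {order}: {pages} page(s), {allocation_size(order) // 1024} KiB")
```

So a failing 0-order allocation means the kernel could not get even a single
page for that request, which is a different situation from failing to find
16 or 32 contiguous ones.
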
> As in the previous messages in this thread 'HIGHMEM 4GB' and SMP appear
> frequently in the reports of people seeing these messages. Another
> observation is that whenever specific HW is mentioned the systems showing
> these problems seem to be based on ServerWorks LE/HE motherboards (as is the
> case with my systems, which are SuperMicro 370DL3 based). I wonder if this
> is significant and if it would explain why Eric could not reproduce the
> problem? What is the motherboard in your system, Eric?
The server at work is a ServerWorks LE based board with 2GB of PC133 RAM.
The lower-end Dell servers have these.
> I'm currently running 2.4.9 and 2.4.10 kernels but have seen these messages
> on earlier 2.4.x kernels. If anything they are much less frequent in
> 2.4.10, which appears to be the first version to include the additional
> '(gfp=0x3d0/0) from c0127fe9' info at the end of the line (I'm sure this
I can reliably kill the box just by starting mongo with 5 processes on
any filesystem. It seems to be rather generic. Most of the time it just
deadlocks and no further disk IO is possible. In a previous mail I also
posted that on another filesystem I suddenly got a response that an
executable did not exist anymore!
Note that this is very likely to occur in multiple IO situations. I can run
a single Bonnie without problems, or a mongo.pl with 1 process. As soon as
I have more than 1 process that generates any form of decent IO, the system
deadlocks.
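
For anyone without mongo.pl at hand, the kind of load I mean can be
approximated with a few concurrent writers. This is only a rough stand-in,
not what mongo.pl actually does, and I have not checked that it triggers the
deadlock; the names `writer` and `run_load`, the file names and the sizes
are all made up for illustration:

```python
import os


def writer(path, total_bytes, chunk=1 << 20):
    """Write `total_bytes` of zeros to `path` and force them out to disk."""
    buf = b"\0" * chunk
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk, total_bytes - written)
            f.write(buf[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())


def run_load(directory, nproc=5, total_bytes=64 << 20):
    """Fork `nproc` writer processes in `directory`; return their exit codes."""
    pids = []
    for i in range(nproc):
        pid = os.fork()
        if pid == 0:
            # Child: write one file, then exit without returning to the caller.
            status = 1
            try:
                writer(os.path.join(directory, f"load.{i}"), total_bytes)
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)
    # Parent: wait for every child and collect its exit code.
    return [os.waitstatus_to_exitcode(os.waitpid(pid, 0)[1]) for pid in pids]
```

Calling `run_load("/some/dir/on/the/fs/under/test")` starts 5 writers of
64 MiB each; on a 2GB box the total would have to be made much larger to put
real pressure on memory.
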
> means something to someone?). The most reliable way to reproduce them seems
> to be to exercise a disk/filesystem, particularly XFS (though I've seen
> the errors when exercising a ReiserFS FS). Indeed, I've recently seen
And ext2 for me too. I cannot tell if XFS is really any worse than the
other filesystems, but checking 200GB of ext2 in 8 hours' time is not
feasible either ;-)
> these messages on a Dell 1550, 2Gbyte RAM, ServerWorks HE based,
> running a 2.4.2 SMP kernel, but without XFS built into the kernel (or
> as modules)
The Dell PE 2500 is ServerWorks LE based, with 2GB RAM, running both 2.4.10
and 2.4.11-pre3.
The only fix so far is to not use HIGHMEM. As soon as I compile without
HIGHMEM (4GB) the box is stable and does not deadlock or crash even under
heavy load. I have about a month before the system must go into production,
so if anyone has some hints or tests I could do, they are most welcome.
I cannot bring myself to tell people that we cannot use half the
memory available. There goes my reputation :-/
Cheers
Seth