On Mon, Dec 17, 2007 at 02:39:07PM +0100, Laurent Caron wrote:
> I'm still experiencing strange behavior on one of my DRBD setups.
> It basically consists of:
> 2 servers with XFS filesystems on top of DRBD, itself on top of MD (aka
> soft raid).
> The two servers exhibit the same behavior. This strange behavior might
> appear between 1 day and 3 weeks after having started the machines.
> Slab debugging is turned on.
> Does anyone have a clue about that problem?
The symptoms you see are the machine running out of memory and the OOM
killer being invoked. There's nothing XFS-specific here - you'd do
better to post to lkml about this.
> I already posted about it some time ago, and was asked to turn slab debugging
> on.
What you posted recently appeared to be the result of memory corruption,
hence the request for debugging to be turned on. This appears to be a
different problem.
> Dec 16 01:12:27 mailserver-1 kernel: DMA: 5*4kB 11*8kB 7*16kB 2*32kB 2*64kB
> 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3484kB
> Dec 16 01:12:27 mailserver-1 kernel: Normal: 195*4kB 82*8kB 5*16kB 9*32kB
> 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 3788kB
> Dec 16 01:12:27 mailserver-1 kernel: HighMem: 37376*4kB 104969*8kB 97167*16kB
> 61944*32kB 34197*64kB 13138*128kB 3479*256kB 502*512kB 24*1024kB 2*2048kB
> 2*4096kB = 9580920kB
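Each of those lines lists free blocks per size for one zone, as
count*size pairs. A rough Python sketch for totalling them follows; the
zone strings are copied from the log above, and the helper name
zone_free_kb is just something made up for illustration:

```python
import re

def zone_free_kb(line):
    """Sum the 'count*sizekB' terms of one free-list line; result is in kB."""
    return sum(int(count) * int(size)
               for count, size in re.findall(r"(\d+)\*(\d+)kB", line))

# Free-list lines copied from the log above, syslog prefix removed.
zones = {
    "DMA": "5*4kB 11*8kB 7*16kB 2*32kB 2*64kB 0*128kB 0*256kB "
           "0*512kB 1*1024kB 1*2048kB 0*4096kB",
    "Normal": "195*4kB 82*8kB 5*16kB 9*32kB 1*64kB 1*128kB 1*256kB "
              "1*512kB 1*1024kB 0*2048kB 0*4096kB",
    "HighMem": "37376*4kB 104969*8kB 97167*16kB 61944*32kB 34197*64kB "
               "13138*128kB 3479*256kB 502*512kB 24*1024kB 2*2048kB 2*4096kB",
}

totals = {name: zone_free_kb(line) for name, line in zones.items()}

# On a highmem box, DMA and Normal together make up low memory.
low_kb = totals["DMA"] + totals["Normal"]

for name, kb in totals.items():
    print(f"{name:8s} {kb:>10d} kB free")
print(f"low memory (DMA + Normal): {low_kb} kB free")
```

The totals agree with the figures in the log (3484kB, 3788kB and
9580920kB), and DMA plus Normal comes to 7272kB, i.e. roughly 7MB of
free low memory.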
Hmmm - you appear to have a highmem-based box and have run out of
low memory for the kernel. So while you have ~9.5GB of free high
memory (that the kernel can't directly use), only about 7MB is
free in the DMA and Normal zones, which is the low memory that the
kernel can use, and hence it is going OOM. The output of
/proc/slabinfo or watching slabtop will tell you where most of
this memory is going.
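If you'd rather script it than watch slabtop, something along these
lines should rank the caches by approximate size. It is only a sketch:
it assumes the slabinfo 2.x column layout (<pagesperslab> in the sixth
column, <num_slabs> as the second value after "slabdata") and a 4kB
page unless told otherwise. The function name rank_slab_caches is made
up for this example.

```python
import os

def rank_slab_caches(slabinfo_text, page_size=4096):
    """Return (cache_name, approx_kB) pairs, largest first.

    Assumes the slabinfo 2.x layout: <pagesperslab> is the sixth column
    and <num_slabs> is the second value after the 'slabdata' keyword.
    """
    usage = []
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Skip the version line, the '# name ...' header and odd lines.
        if not fields or fields[0] in ("slabinfo", "#") or "slabdata" not in fields:
            continue
        try:
            pages_per_slab = int(fields[5])
            num_slabs = int(fields[fields.index("slabdata") + 2])
        except (IndexError, ValueError):
            continue
        usage.append((fields[0], num_slabs * pages_per_slab * page_size // 1024))
    return sorted(usage, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    try:
        with open("/proc/slabinfo") as f:  # often readable by root only
            text = f.read()
    except OSError as exc:
        print(f"cannot read /proc/slabinfo: {exc}")
    else:
        page_size = os.sysconf("SC_PAGE_SIZE")
        for name, kb in rank_slab_caches(text, page_size)[:10]:
            print(f"{name:24s} {kb:>10d} kB")
```

Running it periodically and comparing the top few entries should show
which cache keeps growing.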
FWIW, I suggest upgrading to a 64-bit machine ;)
SGI Australian Software Group