On Mon, Dec 17, 2007 at 02:39:07PM +0100, Laurent Caron wrote:
> Hi,
> I'm still experiencing a strange behavior on one of my DRBD setup.
> It basically consists in:
> 2 servers with XFS filesystems on top of DRBD, itself on top of MD (aka
> soft raid).
> The two servers exhibit the same behavior. This strange behavior might
> appear between 1 day and 3 weeks after having started the machines.
> Slab debugging is turned on.
> Do anyone have a clue about that problem?

The symptoms you see are the machine running out of memory and the OOM
killer being invoked. There's nothing XFS-specific here - you'd do better
to post to lkml about this.

> I already posted about it some time ago, and was asked to turn slab debugging 
> on.

What you posted recently appeared to be the result of memory corruption,
hence the request for debugging to be turned on. This appears to be a
different problem.

> Dec 16 01:12:27 mailserver-1 kernel: DMA: 5*4kB 11*8kB 7*16kB 2*32kB 2*64kB 
> 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3484kB
> Dec 16 01:12:27 mailserver-1 kernel: Normal: 195*4kB 82*8kB 5*16kB 9*32kB 
> 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 3788kB
> Dec 16 01:12:27 mailserver-1 kernel: HighMem: 37376*4kB 104969*8kB 97167*16kB 
> 61944*32kB 34197*64kB 13138*128kB 3479*256kB 502*512kB 24*1024kB 2*2048kB 
> 2*4096kB = 9580920kB
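
For anyone reading these lines for the first time: each term is
"number of free blocks * block size", and the figure after the "=" is
the total free memory in that zone. A minimal Python sketch to check
the totals, assuming the line format quoted above (the function name
zone_free_kb is just illustrative):

```python
import re

def zone_free_kb(line):
    """Sum the 'count*sizekB' terms of one zone line from the OOM report."""
    # Only terms containing '*' match, so the trailing '= NNNNkB' total
    # and the syslog timestamp are ignored.
    return sum(int(count) * int(size)
               for count, size in re.findall(r"(\d+)\*(\d+)kB", line))

# Zone lines copied from the quoted log, with the mail wrapping undone.
dma = ("DMA: 5*4kB 11*8kB 7*16kB 2*32kB 2*64kB "
       "0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3484kB")
normal = ("Normal: 195*4kB 82*8kB 5*16kB 9*32kB "
          "1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 3788kB")

print(zone_free_kb(dma))     # -> 3484
print(zone_free_kb(normal))  # -> 3788
```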

Hmmm - you appear to have a highmem-based box and have run out of
low memory for the kernel. So while you have ~9.5GB of free high
memory (that the kernel can't directly use), only ~3.7MB is free in
the Normal zone. That is the low memory the kernel can use, and
hence it is going OOM. The output of /proc/slabinfo or watching
slabtop will tell you where most of this memory is going.

FWIW, I suggest upgrading to a 64-bit machine ;)


Dave Chinner
Principal Engineer
SGI Australian Software Group

