Steve,
I enabled the 'CONFIG_DEBUG_SLAB' option in the kernel (built from a recent
CVS snapshot of the 2.4.17 XFS tree, taken on 12th Feb) and got the following
Oops (which I hope means more to you than to me!).
Feb 14 00:41:31 blue00 kernel: kfree: bad ptr f8f3d000h.
Feb 14 00:41:31 blue00 kernel: invalid operand: 0000
Feb 14 00:41:32 blue00 kernel: CPU: 1
Feb 14 00:41:32 blue00 kernel: EIP: 0010:[kmem_cache_free+54/128] Not
tainted
Feb 14 00:41:32 blue00 kernel: EFLAGS: 00010086
Feb 14 00:41:32 blue00 kernel: eax: 0000001d ebx: 00e3cf40 ecx: 0000002e
edx: 00000000
Feb 14 00:41:32 blue00 kernel: esi: d42520e4 edi: f8f3d000 ebp: 00000000
esp: f7ee1e30
Feb 14 00:41:32 blue00 kernel: ds: 0018 es: 0018 ss: 0018
Feb 14 00:41:32 blue00 kernel: Process kswapd (pid: 5, stackpage=f7ee1000)
Feb 14 00:41:32 blue00 kernel: Stack: c02b3322 f8f3d000 d4252130 d42520e4
00000000 00000000 00000286 c74bfecc
Feb 14 00:41:32 blue00 kernel: c01f6f86 f8f3d000 c01cabfe f8f3d000
00014460 d42520e4 d42520e4 00000000
Feb 14 00:41:32 blue00 kernel: c01cac6f d42520e4 00000000 d42520e4
c01c78ae d42520e4 00000001 c01e649a
Feb 14 00:41:32 blue00 kernel: Call Trace: [change_termios+118/400]
[xlog_recover_do_efi_trans+158/192] [xlog_recover_do_efd_trans+79/256]
[xlog_regrant_write_log_space+94/784] [linvfs_follow_link+10/240]
Feb 14 00:41:32 blue00 kernel: Code: 0f 0b 83 c4 08 8b 15 8c 85 3f c0 8b 2c 1a
89 7c 24 14 b8 00
Using defaults from ksymoops -t elf32-i386 -a i386
Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 0f 0b ud2a
Code; 00000002 Before first symbol
2: 83 c4 08 add $0x8,%esp
Code; 00000004 Before first symbol
5: 8b 15 8c 85 3f c0 mov 0xc03f858c,%edx
Code; 0000000a Before first symbol
b: 8b 2c 1a mov (%edx,%ebx,1),%ebp
Code; 0000000e Before first symbol
e: 89 7c 24 14 mov %edi,0x14(%esp,1)
Code; 00000012 Before first symbol
12: b8 00 00 00 00 mov $0x0,%eax
Ian
Steve Lord wrote:
>
> On Mon, 2002-02-04 at 07:30, Ian D. Hardy wrote:
> > Hi,
> >
> > Does anyone have any ideas about the following Oops (processed with
> > ksymoops 2.4.3)? It is from an NFS server (Dual 1GHz Supermicro LE,
> > 1Gbyte RAM, 40Gbyte Maxtor IDE
> > system disk, Zero-D/GForce RI Fibrechannel to IDE hardware RAID-5 500Gbyte
> > disk unit). It is running the Linux 2.4.17-xfs kernel taken as a CVS image
> > on 27th January. The main area of disk it is serving is on the HW RAID unit,
> > which is the only XFS filesystem on the system. The system had been up
> > for just over 3 days when it crashed.
> >
> > I reported a very similar failure a few weeks ago, at that time running a
> > 2.4.9-based kernel; Steve Lord suggested that we try the latest CVS image,
> > as this had fixed some memory allocation problems.
> >
> > The machine is essentially an NFS fileserver to a computational cluster.
> > Of possible interest, though, is the 'save' process that was running on one
> > of the processors; this is the Legato Networker backup client process (which
> > was performing a full backup of the XFS filesystem at the time). I don't think
> > this is significant, as I was seeing these crashes (at ~4 to 12 day intervals)
> > with the 2.4.9 kernel regardless of whether a 'save' session was running.
> >
> >
>
> You have not been forgotten; I am just trying to do too many things at once
> around here right now. Both of you ended up with an oops in kfree, so
> would it be possible to turn on CONFIG_DEBUG_SLAB?
> This will turn on a number of memory checking features and might make
> things fall over at a different, and more insightful, point.
>
> In Chip's case I suspect the config flag does not exist, so hand edit
> mm/slab.c and turn on the DEBUG options in there.
>
> On a side note, today I experienced an oops due to what appeared to be
> a failure to allocate a buffer - we had been assuming these were caused
> by being out of memory, but in my case I had plenty of available memory,
> it turns out to be a bug in the pagebuf code when we reallocate metadata
> space. I am thrashing the fix on some test boxes now, but it is possible
> that the cases people were seeing were not really out of memory at all,
> but were due to this bug.
>
> Steve
>
> --
>
> Steve Lord voice: +1-651-683-3511
> Principal Engineer, Filesystem Software email: lord@xxxxxxx
--
/////////////Technical Coordination, Research Services////////////////////
Ian Hardy
Computing Services
Southampton University email: idh@xxxxxxxxxxx
Southampton SO17 1BJ, UK. i.d.hardy@xxxxxxxxxxx
\\'BUGS: The notion of errors is ill-defined' (IRIX man page for netstat)\