
Re: Oopses in kfree

To: Steve Lord <lord@xxxxxxx>
Subject: Re: Oopses in kfree
From: "Ian D. Hardy" <i.d.hardy@xxxxxxxxxxx>
Date: Thu, 14 Feb 2002 10:40:54 +0000
Cc: linux-xfs@xxxxxxxxxxx, oz@xxxxxxxxxxx
Organization: University of Southampton
References: <3C5E8CFA.CACF28C3@soton.ac.uk> <1013126082.21881.65.camel@jen.americas.sgi.com>
Sender: owner-linux-xfs@xxxxxxxxxxx
Steve,

I enabled the 'CONFIG_DEBUG_SLAB' option in the kernel (built from a CVS
snapshot of the 2.4.17 XFS tree taken on 12th Feb) and have since had the
following Oops (which I hope means more to you than to me!).


Feb 14 00:41:31 blue00 kernel: kfree: bad ptr f8f3d000h.
Feb 14 00:41:31 blue00 kernel: invalid operand: 0000
Feb 14 00:41:32 blue00 kernel: CPU:    1
Feb 14 00:41:32 blue00 kernel: EIP:    0010:[kmem_cache_free+54/128]    Not tainted
Feb 14 00:41:32 blue00 kernel: EFLAGS: 00010086
Feb 14 00:41:32 blue00 kernel: eax: 0000001d   ebx: 00e3cf40   ecx: 0000002e   edx: 00000000
Feb 14 00:41:32 blue00 kernel: esi: d42520e4   edi: f8f3d000   ebp: 00000000   esp: f7ee1e30
Feb 14 00:41:32 blue00 kernel: ds: 0018   es: 0018   ss: 0018
Feb 14 00:41:32 blue00 kernel: Process kswapd (pid: 5, stackpage=f7ee1000)
Feb 14 00:41:32 blue00 kernel: Stack: c02b3322 f8f3d000 d4252130 d42520e4 00000000 00000000 00000286 c74bfecc
Feb 14 00:41:32 blue00 kernel:        c01f6f86 f8f3d000 c01cabfe f8f3d000 00014460 d42520e4 d42520e4 00000000
Feb 14 00:41:32 blue00 kernel:        c01cac6f d42520e4 00000000 d42520e4 c01c78ae d42520e4 00000001 c01e649a
Feb 14 00:41:32 blue00 kernel: Call Trace: [change_termios+118/400] [xlog_recover_do_efi_trans+158/192] [xlog_recover_do_efd_trans+79/256] [xlog_regrant_write_log_space+94/784] [linvfs_follow_link+10/240]
Feb 14 00:41:32 blue00 kernel: Code: 0f 0b 83 c4 08 8b 15 8c 85 3f c0 8b 2c 1a 89 7c 24 14 b8 00
Using defaults from ksymoops -t elf32-i386 -a i386

Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   0f 0b                     ud2a   
Code;  00000002 Before first symbol
   2:   83 c4 08                  add    $0x8,%esp
Code;  00000004 Before first symbol
   5:   8b 15 8c 85 3f c0         mov    0xc03f858c,%edx
Code;  0000000a Before first symbol
   b:   8b 2c 1a                  mov    (%edx,%ebx,1),%ebp
Code;  0000000e Before first symbol
   e:   89 7c 24 14               mov    %edi,0x14(%esp,1)
Code;  00000012 Before first symbol
  12:   b8 00 00 00 00            mov    $0x0,%eax
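
In case it helps anyone reading the trace: the bracketed entries on the EIP
and Call Trace lines have the form [symbol+offset/size]. Below is a minimal
Python sketch for pulling them apart. It assumes the offset and size are
decimal, as they appear to be in the log above; the names ENTRY and
parse_trace are made up for illustration and are not part of ksymoops or
any kernel tool.

```python
import re

# Matches entries such as "[kmem_cache_free+54/128]":
# a symbol name, then an offset and a function size (assumed decimal).
ENTRY = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\+(\d+)/(\d+)\]")


def parse_trace(text):
    """Return (symbol, offset, size) for every [symbol+offset/size] entry."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in ENTRY.finditer(text)
    ]


# Two entries copied from the Call Trace line of the Oops above.
trace = "[change_termios+118/400] [xlog_recover_do_efi_trans+158/192]"
for symbol, offset, size in parse_trace(trace):
    print(f"{symbol}: offset {offset} into a function of size {size}")
```

This only splits the text that is already in the log; it does not check the
symbols against System.map, so a wrongly resolved symbol stays wrong.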


Ian

Steve Lord wrote:
> 
> On Mon, 2002-02-04 at 07:30, Ian D. Hardy wrote:
> > Hi,
> >
> > Does anyone have any ideas on the following Oops (processed with ksymoops
> > 2.4.3)? It is from an NFS server (dual 1GHz Supermicro LE, 1Gbyte RAM,
> > 40Gbyte Maxtor IDE system disk, Zero-D/GForce RI Fibrechannel to IDE
> > hardware RAID-5 500Gbyte disk unit). It is running the Linux 2.4.17-xfs
> > kernel taken as a CVS image on 27th January. The main area of disk it is
> > serving is on the HW RAID unit, which is the only XFS filesystem on the
> > system. The system had been up for just over 3 days when it crashed.
> >
> > I reported a very similar failure a few weeks ago, at that time running a
> > 2.4.9-based kernel. Steve Lord suggested that we try the latest CVS image,
> > as it had fixed some memory allocation problems.
> >
> > The machine is essentially an NFS fileserver to a computational cluster.
> > Of possible interest, though, is the 'save' process that was running on
> > one of the processes; this is the Legato Networker backup client process
> > (which was performing a full backup of the XFS filesystem at the time).
> > I don't think this is significant, as I was seeing these crashes (at ~4
> > to 12 day intervals) with the 2.4.9 kernel whether or not a 'save'
> > session was running.
> >
> >
> 
> You have not been forgotten, just trying to do too many things at once
> around here right now. But both of you ended up with an oops in kfree;
> would it be possible to turn on CONFIG_DEBUG_SLAB?
> This will turn on a number of memory checking features and might make
> things fall over at a different - and more insightful - point.
> 
> In Chip's case I suspect the config flag does not exist, so hand edit
> mm/slab.c and turn on the DEBUG options in there.
> 
> On a side note, today I experienced an oops due to what appeared to be
> a failure to allocate a buffer - we had been assuming these were caused
> by being out of memory, but in my case I had plenty of available memory,
> it turns out to be a bug in the pagebuf code when we reallocate metadata
> space. I am thrashing the fix on some test boxes now, but it is possible
> that those really were not out of memory cases people were seeing, but
> due to this bug.
> 
> Steve
> 
> --
> 
> Steve Lord                                      voice: +1-651-683-3511
> Principal Engineer, Filesystem Software         email: lord@xxxxxxx

-- 

/////////////Technical Coordination, Research Services////////////////////
Ian Hardy                                   
Computing Services                            
Southampton University                      email: idh@xxxxxxxxxxx
Southampton  SO17 1BJ, UK.                         i.d.hardy@xxxxxxxxxxx
\\'BUGS: The notion of errors is ill-defined' (IRIX man page for netstat)\

