
RE: Re-occurance of NFS server panics

To: I.D.Hardy@xxxxxxxxxxx
Subject: RE: Re-occurance of NFS server panics
From: Stephen Lord <lord@xxxxxxx>
Date: 16 Sep 2002 10:35:43 -0500
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <E5CC9E66DAF2D411A0D700B0D079331B41F1F2@exchange2.soton.ac.uk>
References: <E5CC9E66DAF2D411A0D700B0D079331B41F1F2@exchange2.soton.ac.uk>
Sender: linux-xfs-bounce@xxxxxxxxxxx
On Mon, 2002-09-16 at 09:52, Ian D. Hardy wrote:
> Steve +,
> 
> Sorry to bother you again. You may remember that we've corresponded
> several times over the past ~9 months regarding kernel memory
> allocation problems and fragmented files (see below).
> 
> We had a period of relative stability; however, over the last few weeks
> we have gone back to having one or more crashes/hangs every week, and we
> are now having to review our continued use of XFS again. Therefore any
> update on progress towards a fix for these problems would be very useful
> (I'd hate to go through the pain of converting our ~1 Tbyte filesystem
> to ReiserFS or ext3 if fixes are imminent).
> 
> We have been running a 2.4.18 XFS CVS kernel from mid-May for some time
> now. I'm just in the process of compiling and testing the current 2.4.19
> XFS CVS; is this likely to help? (Looking through the list archive I
> can't find anything of direct relevance, but I may have missed something.)
> 
> We appear to be running at a lower overall filesystem fragmentation level
> now (currently 13%; in the past it has been 28% or more), though I guess
> it is possible for just a couple of large, very fragmented files to cause
> kernel memory allocation problems while the overall FS fragmentation
> level stays reasonably low?
> 
> Unfortunately the NFS load on our server is such that it is difficult,
> if not impossible, to predict periods of light NFS load in which to run
> fsr, and as reported before we've had several incidents of filesystem
> corruption and of the kernel taking the FS offline while running fsr
> under an NFS load.
> 
> Thanks for your time. (BTW: we've persevered with XFS for so long
> because it seems to give better performance for our workload than ext3
> or ReiserFS; however, stability is again becoming a problem.)
> 
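
As an illustration of the point above about a couple of very fragmented
files: what matters for these kernel allocations is the extent count of the
individual file, not the filesystem-wide fragmentation figure, since each
inode's in-core extent list is kept in one contiguous buffer that is
reallocated as it grows (the xfs_iext_realloc/kmem_realloc frames in the
trace below are that buffer being resized). A minimal userspace sketch for
counting per-file extents through the XFS_IOC_GETBMAP ioctl follows; the
count_extents() helper is purely illustrative, and the xfsprogs header
location may differ on your installation.

/*
 * Illustration only: count the data extents of each file named on the
 * command line via XFS_IOC_GETBMAP.  Files with very large extent
 * counts are the ones worth feeding to fsr individually.
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <xfs/xfs.h>            /* struct getbmap, XFS_IOC_GETBMAP */

#define NSLOTS 1024             /* extents fetched per ioctl call */

static long count_extents(int fd)
{
    struct getbmap *bmv, *last;
    long total = 0;
    int j;

    /* slot 0 is the request header; slots 1..NSLOTS receive extents */
    bmv = calloc(NSLOTS + 1, sizeof(*bmv));
    if (bmv == NULL)
        return -1;

    bmv->bmv_offset = 0;        /* start of file, in 512-byte blocks */
    bmv->bmv_length = -1;       /* -1 = map through end of file */
    bmv->bmv_count  = NSLOTS + 1;

    for (;;) {
        if (ioctl(fd, XFS_IOC_GETBMAP, bmv) < 0) {
            perror("XFS_IOC_GETBMAP");
            total = -1;
            break;
        }
        if (bmv->bmv_entries == 0)
            break;

        for (j = 1; j <= bmv->bmv_entries; j++)
            if (bmv[j].bmv_block != -1)   /* skip hole records */
                total++;

        if (bmv->bmv_entries < NSLOTS)    /* short batch: hit EOF */
            break;

        /* continue mapping after the last extent in this batch */
        last = &bmv[bmv->bmv_entries];
        bmv->bmv_offset = last->bmv_offset + last->bmv_length;
        bmv->bmv_length = -1;
    }
    free(bmv);
    return total;
}

int main(int argc, char **argv)
{
    int i, fd;

    for (i = 1; i < argc; i++) {
        fd = open(argv[i], O_RDONLY);
        if (fd < 0) {
            perror(argv[i]);
            continue;
        }
        printf("%s: %ld extents\n", argv[i], count_extents(fd));
        close(fd);
    }
    return 0;
}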

Nothing immediately rings a bell. There have been some recent changes
which fixed some hangs HP was hitting while doing large-scale NFS
benchmarking; these might be beneficial to you. The last oops output
I have from you looked like this:

>>EIP; c012ff76 <kfree+66/14c>   <=====
Trace; c01fd146 <kmem_free+22/28>
Trace; c01fd1a4 <kmem_realloc+58/68>
Trace; c01d04f0 <xfs_iext_realloc+f0/108>
Trace; c01a6996 <xfs_bmap_delete_exlist+6a/74>
Trace; c01a5f12 <xfs_bmap_del_extent+58a/f68>
Trace; c01d6474 <xlog_state_do_callback+2a4/2ec>
Trace; c01fd2e0 <kmem_zone_zalloc+44/d0>
Trace; c01aac14 <xfs_bunmapi+b78/fd0>
Trace; c01cf962 <xfs_itruncate_finish+23e/3e0>
Trace; c01e6e22 <xfs_setattr+ae2/f7c>
Trace; c01e6340 <xfs_setattr+0/f7c>
Trace; c026cf44 <qdisc_restart+14/178>
Trace; c01f696e <linvfs_setattr+152/17c>
Trace; c01e6340 <xfs_setattr+0/f7c>
Trace; c014f45c <notify_change+7c/2a4>
Trace; f8d2e972 <[nfsd]nfsd_setattr+3ea/524>
Trace; f8d33f7a <[nfsd]nfsd3_proc_setattr+b6/c4>
Trace; f8d3b4a0 <[nfsd]nfsd_procedures3+40/2c0>
Trace; f8d2b5d2 <[nfsd]nfsd_dispatch+d2/19a>
Trace; f8d3b4a0 <[nfsd]nfsd_procedures3+40/2c0>
Trace; f8cf6f88 <[sunrpc]svc_process+28c/51c>
Trace; f8d3b400 <[nfsd]nfsd_svcstats+0/40>
Trace; f8d3aed8 <[nfsd]nfsd_version3+0/10>
Trace; f8d2b348 <[nfsd]nfsd+1b8/370>
Trace; c01057ea <kernel_thread+22/30>
Code;  c012ff76 <kfree+66/14c>
00000000 <_EIP>:
Code;  c012ff76 <kfree+66/14c>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c012ff78 <kfree+68/14c>
   2:   83 c4 08                  add    $0x8,%esp
Code;  c012ff7a <kfree+6a/14c>
   5:   8b 15 2c 95 3f c0         mov    0xc03f952c,%edx
Code;  c012ff80 <kfree+70/14c>
   b:   8b 2c 1a                  mov    (%edx,%ebx,1),%ebp
Code;  c012ff84 <kfree+74/14c>
   e:   89 7c 24 14               mov    %edi,0x14(%esp,1)
Code;  c012ff88 <kfree+78/14c>
  12:   b8 00 00 00 00            mov    $0x0,%eax

Is this still what you see? 
 
This one does bear the symptoms of a problem which was fixed a while
ago - memory was freed into the wrong pool.
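
For reference, the failure class that describes, sketched in userspace with
hypothetical pool_alloc()/pool_free() helpers (an illustration of the bug
class only, not XFS's kmem wrappers): a block obtained from one pool is
handed back to a different one, and the receiving allocator's consistency
check aborts, which is essentially what the ud2a assertion in kfree() above
amounts to.

/*
 * Illustration only -- hypothetical fixed-size pools, not XFS code.
 * Freeing a block into a pool it did not come from trips the
 * allocator's sanity check, much like the ud2a in kfree() above.
 */
#include <assert.h>
#include <stdlib.h>

struct pool {
    const char *name;
    size_t      objsize;      /* every block in this pool has this size */
};

/* each block carries a hidden header recording which pool owns it */
struct blkhdr {
    struct pool *owner;
};

static void *pool_alloc(struct pool *p)
{
    struct blkhdr *h = malloc(sizeof(*h) + p->objsize);

    if (h == NULL)
        return NULL;
    h->owner = p;
    return h + 1;             /* the caller only sees the payload */
}

static void pool_free(struct pool *p, void *obj)
{
    struct blkhdr *h = (struct blkhdr *)obj - 1;

    /* the kind of consistency check a real allocator makes on free */
    assert(h->owner == p && "block freed into the wrong pool");
    free(h);
}

int main(void)
{
    struct pool small = { "small-32",  32  };
    struct pool large = { "large-256", 256 };
    void *obj = pool_alloc(&large);

    pool_free(&small, obj);   /* wrong pool: the assert fires here */
    return 0;
}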

Steve


