On Mon, 2002-09-16 at 09:52, Ian D. Hardy wrote:
> Steve +,
>
> Sorry to bother you again. You may remember that we've corresponded
> several times over the past ~9 months with regard to kernel memory
> allocation problems and fragmented files (see below).
>
> We had a period of relative stability; however, in the last few weeks
> we have gone back to having one or more crashes/hangs every week and
> are again having to review our continued use of XFS. Any update on
> progress towards a fix for these problems would therefore be very
> useful (I'd hate to go through the pain of converting our ~1 Tbyte
> filesystem to ReiserFS or ext3 if fixes are imminent).
>
> We have been running a 2.4.18 XFS CVS kernel from mid-May for some
> time now, and I'm in the process of compiling and testing the current
> 2.4.19 XFS CVS. Is this likely to help? (Looking through the list
> archive I can't find anything of direct relevance, but I may have
> missed something.)
>
> We appear to be running at a lower overall filesystem fragmentation
> level now (currently 13%; in the past it has been 28% or more),
> though I guess a couple of large, very fragmented files could still
> cause kernel memory allocation problems even with a reasonably low
> overall fragmentation level?
>
> Unfortunately the NFS load on our server is such that it is difficult
> or impossible to predict quiet periods in which to run fsr, and, as
> reported before, we've had several incidents of filesystem corruption
> and of the kernel taking the filesystem offline when running fsr
> under NFS load.
>
> Thanks for your time. (BTW: we've persevered with XFS for so long
> because it seems to give better performance for our workload than
> ext3 or ReiserFS; however, stability is again becoming a problem.)
>
Nothing immediately rings a bell, but there have been some recent
changes which fixed some hangs HP was hitting while doing large-scale
NFS benchmarking; these might be beneficial to you. The last oops
output I have from you looked like this:
>>EIP; c012ff76 <kfree+66/14c> <=====
Trace; c01fd146 <kmem_free+22/28>
Trace; c01fd1a4 <kmem_realloc+58/68>
Trace; c01d04f0 <xfs_iext_realloc+f0/108>
Trace; c01a6996 <xfs_bmap_delete_exlist+6a/74>
Trace; c01a5f12 <xfs_bmap_del_extent+58a/f68>
Trace; c01d6474 <xlog_state_do_callback+2a4/2ec>
Trace; c01fd2e0 <kmem_zone_zalloc+44/d0>
Trace; c01aac14 <xfs_bunmapi+b78/fd0>
Trace; c01cf962 <xfs_itruncate_finish+23e/3e0>
Trace; c01e6e22 <xfs_setattr+ae2/f7c>
Trace; c01e6340 <xfs_setattr+0/f7c>
Trace; c026cf44 <qdisc_restart+14/178>
Trace; c01f696e <linvfs_setattr+152/17c>
Trace; c01e6340 <xfs_setattr+0/f7c>
Trace; c014f45c <notify_change+7c/2a4>
Trace; f8d2e972 <[nfsd]nfsd_setattr+3ea/524>
Trace; f8d33f7a <[nfsd]nfsd3_proc_setattr+b6/c4>
Trace; f8d3b4a0 <[nfsd]nfsd_procedures3+40/2c0>
Trace; f8d2b5d2 <[nfsd]nfsd_dispatch+d2/19a>
Trace; f8d3b4a0 <[nfsd]nfsd_procedures3+40/2c0>
Trace; f8cf6f88 <[sunrpc]svc_process+28c/51c>
Trace; f8d3b400 <[nfsd]nfsd_svcstats+0/40>
Trace; f8d3aed8 <[nfsd]nfsd_version3+0/10>
Trace; f8d2b348 <[nfsd]nfsd+1b8/370>
Trace; c01057ea <kernel_thread+22/30>
Code; c012ff76 <kfree+66/14c>
00000000 <_EIP>:
Code; c012ff76 <kfree+66/14c> <=====
0: 0f 0b ud2a <=====
Code; c012ff78 <kfree+68/14c>
2: 83 c4 08 add $0x8,%esp
Code; c012ff7a <kfree+6a/14c>
5: 8b 15 2c 95 3f c0 mov 0xc03f952c,%edx
Code; c012ff80 <kfree+70/14c>
b: 8b 2c 1a mov (%edx,%ebx,1),%ebp
Code; c012ff84 <kfree+74/14c>
e: 89 7c 24 14 mov %edi,0x14(%esp,1)
Code; c012ff88 <kfree+78/14c>
12: b8 00 00 00 00 mov $0x0,%eax
Is this still what you see?
This one does bear the symptoms of a problem which was fixed a while
ago: memory was freed into the wrong pool. The kfree() at the top of
the trace trips a BUG (the ud2a) because the buffer being released by
kmem_realloc() (called from xfs_iext_realloc()) apparently no longer
belongs to the pool it is handed back to.
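
To illustrate, here is a minimal userspace sketch of that failure
mode. This is not the XFS code; the two pools, pool_alloc() and
pool_free(), and the grow-then-free sequence are hypothetical
stand-ins for the slab caches behind kmem_realloc()/kfree(). The
ownership check plays the role of the sanity check in kfree() that
the oops trips:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define POOL_SLOTS 8

struct pool {
	const char *name;
	size_t obj_size;
	char *base;                 /* backing store, POOL_SLOTS objects */
	unsigned char in_use[POOL_SLOTS];
};

static void pool_init(struct pool *p, const char *name, size_t obj_size)
{
	p->name = name;
	p->obj_size = obj_size;
	p->base = calloc(POOL_SLOTS, obj_size);
	memset(p->in_use, 0, sizeof(p->in_use));
}

static void *pool_alloc(struct pool *p)
{
	for (int i = 0; i < POOL_SLOTS; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = 1;
			return p->base + (size_t)i * p->obj_size;
		}
	}
	return NULL;
}

/* Like kfree(), refuse an object this pool does not own. */
static void pool_free(struct pool *p, void *obj)
{
	char *c = obj;

	if (c < p->base || c >= p->base + POOL_SLOTS * p->obj_size) {
		fprintf(stderr, "BUG: %s: freeing object it does not own\n",
			p->name);
		abort();        /* the userspace analogue of the ud2a trap */
	}
	p->in_use[(c - p->base) / p->obj_size] = 0;
}

/*
 * A realloc in the style of the kmem_realloc() in the trace: the
 * extent list outgrows its pool, so a bigger buffer is taken from
 * another pool and the data copied over.  The bug is then handing
 * the old buffer back to the wrong pool.
 */
int main(void)
{
	struct pool small, large;
	void *extents, *grown;

	pool_init(&small, "small-pool", 64);
	pool_init(&large, "large-pool", 256);

	extents = pool_alloc(&small);   /* initial extent list */
	grown = pool_alloc(&large);     /* list outgrew its size class */
	memcpy(grown, extents, 64);

	pool_free(&large, extents);     /* WRONG pool: aborts, like the oops */
	pool_free(&small, extents);     /* the correct owner (never reached) */
	return 0;
}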
Steve