Steve +,
Sorry to bother you again. You may remember that we've corresponded
several times over the past ~9 months with regard to kernel memory
allocation problems and fragmented files (see below).
We had a period of relative stability; however, over the last few weeks
we have gone back to having one or more crashes/hangs every week and
are once again having to review our continued use of XFS.
Therefore any update on progress towards a fix for these problems would
be very useful (I'd hate to go through the pain of converting our ~1 TB
filesystem to ReiserFS or ext3 if fixes are imminent).
We have been running a 2.4.18 XFS CVS kernel from mid-May for some time
now. I'm just in the process of compiling and testing the current 2.4.19
XFS CVS; is this likely to help? (Looking through the list archive I
can't find anything of direct relevance, but I may have missed something.)
We appear to be running at a lower overall filesystem fragmentation level
now (currently 13%; in the past it has been 28% or more), though I guess
it is possible for just a couple of large, very fragmented files to
cause kernel memory allocation problems while overall FS fragmentation
stays reasonably low?
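To make the question concrete, here is a small back-of-the-envelope calculation in Python. It assumes the overall percentage is modelled on what we understand xfs_db's `frag` command to report, (actual extents - ideal extents) / actual extents, with one ideal extent per file, and that each in-core extent record takes 16 bytes; the file counts are made up for illustration.

```python
# Hypothetical numbers, for illustration only: a few heavily fragmented files
# barely move the overall fragmentation percentage.
#
# Assumption: the overall figure is modelled on xfs_db's "frag" output,
#   (actual extents - ideal extents) / actual extents * 100,
# where "ideal" is taken as one extent per file.
# Assumption: one in-core extent record costs 16 bytes.

EXTENT_RECORD_BYTES = 16  # assumed size of one in-core extent record


def fragmentation_factor(actual: int, ideal: int) -> float:
    """Overall fragmentation percentage, xfs_db 'frag' style."""
    if actual == 0:
        return 0.0
    return (actual - ideal) / actual * 100.0


def extent_list_bytes(extents: int) -> int:
    """Memory for one file's extent list held as a single contiguous array."""
    return extents * EXTENT_RECORD_BYTES


# Hypothetical filesystem: 1,000,000 files in one extent each,
# plus 2 large files split into 50,000 extents each.
extent_counts = [1] * 1_000_000 + [50_000, 50_000]

actual = sum(extent_counts)   # extents actually in use
ideal = len(extent_counts)    # best case: one extent per file

print(f"overall fragmentation: {fragmentation_factor(actual, ideal):.1f}%")
print(f"worst file's extent list: {extent_list_bytes(max(extent_counts))} bytes")
```

With these hypothetical numbers the overall figure comes out at about 9.1%, yet each of the two bad files would still need an 800,000-byte contiguous extent list, so a low overall percentage does not by itself rule out the per-file problem.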
Unfortunately the NFS load on our server is such that it is
difficult or impossible to predict periods of light NFS load in which to
run fsr, and, as reported before, we've had several incidents of
filesystem corruption, with the kernel taking the FS offline, when
running fsr under NFS load.
Thanks for your time. (BTW: we've persevered with XFS for so long because
it seems to give better performance for our workload than ext3 or
ReiserFS; however, stability is again becoming a problem.)
Regards
Ian Hardy
Research Support Services
ISS
Southampton University, UK
-----Original Message-----
From: Stephen Lord [mailto:lord@xxxxxxx]
Sent: 26 June 2002 19:08
To: Ian D. Hardy
Cc: linux-xfs@xxxxxxxxxxx; I.D.Hardy@xxxxxxxxxxx; O.G.Parchment@xxxxxxxxxxx
Subject: Re: Re-occurance of NFS server panics
On Wed, 2002-06-26 at 12:36, Ian D. Hardy wrote:
Sorry, you fell through the cracks there. I am currently sitting in the
back of a talk at the Ottawa Linux Symposium, so my coding time is a
little limited this week. Next week there will also be no one in the
office (except the Australian contingent).
It seems you have two issues: first, file fragmentation, and second, the
fact that fsr appears to have problems on a live system. Yes, I agree
that running fsr during downtime is the best solution available right
now. I do not know whether you have any period when the system is known
to be idle. I think fsr has options to run for a fixed amount of time
instead of running to completion, so if you have known times when
activity is low, you could run it during those periods.
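A rough way to script the fixed-time approach is sketched below in Python. It assumes the reorganizer is installed as `xfs_fsr` and that its `-t` option limits the run time in seconds; the window times, safety margin and mount point are hypothetical placeholders. It only helps if a quiet window can actually be identified, which is the hard part under a heavy, unpredictable NFS load.

```python
# A minimal sketch: run fsr only inside a known quiet window and cap its
# run time so it stops before the window ends.
#
# Assumptions (hypothetical, adjust locally):
#   * the reorganizer is installed as "xfs_fsr" and accepts "-t <seconds>"
#     as a limit on how long it runs;
#   * QUIET_START/QUIET_END describe a nightly low-NFS-load window;
#   * MOUNT_POINT is the XFS filesystem to reorganize.
import datetime
import os
import shutil
import subprocess
from typing import List

QUIET_START = datetime.time(2, 0)   # hypothetical window start, 02:00
QUIET_END = datetime.time(5, 0)     # hypothetical window end, 05:00
MOUNT_POINT = "/export/home"        # hypothetical XFS mount point
SAFETY_MARGIN = 600                 # stop 10 minutes before the window ends


def in_quiet_window(now: datetime.time,
                    start: datetime.time = QUIET_START,
                    end: datetime.time = QUIET_END) -> bool:
    """True if now is in [start, end); handles windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def seconds_left(now: datetime.datetime,
                 end: datetime.time = QUIET_END) -> int:
    """Whole seconds from now until the next occurrence of the window end."""
    end_dt = datetime.datetime.combine(now.date(), end)
    if end_dt <= now:
        end_dt += datetime.timedelta(days=1)
    return int((end_dt - now).total_seconds())


def fsr_command(mount_point: str, limit_seconds: int) -> List[str]:
    """Command line that limits the reorganizer to limit_seconds."""
    return ["xfs_fsr", "-t", str(limit_seconds), mount_point]


def main() -> None:
    now = datetime.datetime.now()
    budget = seconds_left(now) - SAFETY_MARGIN
    # Do nothing outside the window or if too little time remains.
    if not in_quiet_window(now.time()) or budget <= 0:
        return
    # Do nothing if the tool or the filesystem is not present here.
    if shutil.which("xfs_fsr") is None or not os.path.ismount(MOUNT_POINT):
        return
    subprocess.run(fsr_command(MOUNT_POINT, budget), check=False)


if __name__ == "__main__":
    main()
```

Run from cron at the start of the window (or periodically through it); outside the window it exits without doing anything, and the time limit shrinks as the window closes.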
The fundamental issue is the amount of memory which one of
these fragmented files needs to hold its extents, and the
ideal solution is to change how this memory is organized.
I have tinkered with the idea, but it is a non-trivial
project and I do not know when I might get to do it.
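One general technique for changing how extent memory is organized is to store the records in fixed-size chunks instead of one contiguous array sized to the whole file. The Python sketch below illustrates only that idea; it is not a description of Steve's planned change, and the 16-byte record and 4 KiB chunk sizes are assumptions.

```python
# Hypothetical sketch, not the XFS design: hold extent records in
# fixed-size chunks so that no single allocation grows with the number of
# extents in a file. All sizes are assumptions.
from typing import List, Tuple

RECORD_BYTES = 16                                # assumed bytes per extent record
CHUNK_BYTES = 4096                               # assumed chunk size (one 4 KiB page)
RECORDS_PER_CHUNK = CHUNK_BYTES // RECORD_BYTES  # 256 records per chunk

Extent = Tuple[int, int, int]  # (file offset, disk block, length)


class ChunkedExtentList:
    """Extent records kept in page-sized chunks rather than one big array."""

    def __init__(self) -> None:
        self.chunks: List[List[Extent]] = []
        self.count = 0

    def append(self, extent: Extent) -> None:
        # Start a new small chunk only when the last one is full.
        if not self.chunks or len(self.chunks[-1]) == RECORDS_PER_CHUNK:
            self.chunks.append([])
        self.chunks[-1].append(extent)
        self.count += 1

    def __len__(self) -> int:
        return self.count

    def __getitem__(self, index: int) -> Extent:
        if not 0 <= index < self.count:
            raise IndexError(index)
        # Constant-time lookup: which chunk, then which slot inside it.
        chunk, slot = divmod(index, RECORDS_PER_CHUNK)
        return self.chunks[chunk][slot]

    def largest_allocation(self) -> int:
        """Largest single block requested, assuming full-size chunks."""
        return CHUNK_BYTES if self.chunks else 0


def flat_allocation(extent_count: int) -> int:
    """Bytes one contiguous array would need for the same records."""
    return extent_count * RECORD_BYTES


extents = ChunkedExtentList()
for i in range(50_000):
    extents.append((i * 8, 1_000 + i * 16, 8))  # made-up fragmented layout

print(flat_allocation(len(extents)), extents.largest_allocation())
```

For the made-up 50,000-extent file, a flat array needs 800,000 contiguous bytes, whereas the chunked layout never asks for more than 4,096 bytes at once; the trade-off is a small list of chunk references (196 here) and an extra indexing step on each lookup.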
So I don't really have a code solution for you right now.
We need to look into what is happening to fsr under NFS
load; there should be something we can do to fix that
faster than the extent allocation code.
Steve
> Steve ++ Colleagues,
>
> Sorry to bother you (I understand that you're busy and short-
> staffed), but it would be useful to get some feedback on
> the problems/issues I raised a couple of weeks ago (I did note that
> you mentioned continuing problems due to fragmentation in another
> thread a few days ago). Do you have any idea if/when it should be
> possible to fix this problem? (I feel bad asking, but I'm getting
> pressure to look again at alternatives... which I'd rather not do,
> as I'm sure they have their own problems!)
>
> FYI: in the last ~20 days we've had another panic that looked like
> another memory allocation error (I was on leave, so didn't get the full
> details), plus a couple of system lockups (high load average and failing
> to serve files), possibly not related. We reduced the load by
> introducing another server/filesystem (ReiserFS!!) and moving some
> users onto that. Today we had some scheduled maintenance time and did
> an offline defrag of the XFS filesystem, bringing it down from ~28% to
> <1%.
>
> Is there anything I can do to help (remember I'm not a kernel
> writer/expert)? Are there any further diagnostics that would help?
>
> Again many thanks for your help.
>
> Ian Hardy
>
>
> /////////////Technical Coordination, Research Services////////////////////
> Ian Hardy Tel: 023 80 593577
> Computing Services Mobile: 0709 2127503
> Southampton University email: idh@xxxxxxxxxxx
> Southampton SO17 1BJ, UK. i.d.hardy@xxxxxxxxxxx
> \\'BUGS: The notion of errors is ill-defined' (IRIX man page for netstat)\