RE: ADD 804570 - The elevator bug

To: 'Tony Gale' <gale@xxxxxxxxxxxxxxxxxx>, Russell Cattelan <cattelan@xxxxxxxxxxx>
Subject: RE: ADD 804570 - The elevator bug
From: "Lord, Steve" <SLord@xxxxxxxxxxx>
Date: Tue, 5 Dec 2000 07:05:53 -0600
Cc: linux-xfs@xxxxxxxxxxx
Sender: owner-linux-xfs@xxxxxxxxxxx
I should comment here, since it was an old message of mine which
was sent around again. The code I suggested people try is buggy;
there was a follow-up on linux-kernel about this. Basically the
elevator 'bug' is a starvation problem, caused by the very large
constants it uses for how many times a request can be passed over
before it has to be processed. It never caused a crash; it just
caused some test programs to hang for about a day before they
worked through the elevator.

Fixing the elevator should be as simple as either getting hold of
elvtune, which can tweak these parameters, or changing them in the
code (probably elevator.c). I am in a hotel room in Denver and I
don't have direct access to the source at the moment, or I would
point you at the right places.
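
To illustrate the starvation mechanism described above, here is a
minimal sketch in C. It is a toy simulation, not the kernel's
elevator code: the names (struct request, passes_left, enqueue,
dispatch, dispatches_until_served) and the queue layout are all
assumptions made for the example. Each queued request carries a
budget of how many times a newer request may be placed ahead of it;
once the budget is used up, nothing more may pass it.

```c
#include <assert.h>

#define QUEUE_MAX 64

/* Hypothetical request: a target sector plus a pass-over budget. */
struct request {
    long sector;
    int passes_left;   /* times newer requests may still go ahead of us */
};

struct queue {
    struct request items[QUEUE_MAX];
    int len;
};

/*
 * Insert in ascending sector order, scanning back from the tail.
 * Each request we step over loses one unit of budget; a request
 * whose budget is exhausted acts as a barrier.
 */
static void enqueue(struct queue *q, long sector, int max_passes)
{
    int pos = q->len;

    assert(q->len < QUEUE_MAX);
    while (pos > 0 &&
           q->items[pos - 1].sector > sector &&
           q->items[pos - 1].passes_left > 0) {
        q->items[pos - 1].passes_left--;
        q->items[pos] = q->items[pos - 1];
        pos--;
    }
    q->items[pos].sector = sector;
    q->items[pos].passes_left = max_passes;
    q->len++;
}

/* Remove and return the request at the head of the queue. */
static struct request dispatch(struct queue *q)
{
    struct request head;
    int i;

    assert(q->len > 0);
    head = q->items[0];
    for (i = 1; i < q->len; i++)
        q->items[i - 1] = q->items[i];
    q->len--;
    return head;
}

/*
 * One "victim" request sits at a high sector while a steady stream of
 * low-sector requests arrives, one per dispatch. Returns how many
 * dispatches happen up to and including the victim's.
 */
static long dispatches_until_served(int max_passes)
{
    const long victim_sector = 1000000L;
    struct queue q = { .len = 0 };
    long next_sector = 0;
    long served = 0;

    enqueue(&q, victim_sector, max_passes);
    for (;;) {
        enqueue(&q, next_sector++, max_passes);
        served++;
        if (dispatch(&q).sector == victim_sector)
            return served;
    }
}
```

In this toy model the victim waits for exactly max_passes + 1
dispatches: dispatches_until_served(4) returns 5, while a budget of
1000000 makes it wait for 1000001. Nothing crashes in either case;
the request is simply delayed, which matches the day-long hangs seen
with very large constants.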

[ Side note on repair and the lost+found directory: the first thing
repair does is remove this directory without removing its contents,
which means that if you run repair on a lost+found with contents
you will always get unlinked files. If you want to keep the files,
move them somewhere else (renaming lost+found would work); otherwise
just delete them between repair runs. I do not know the reasoning
behind this logic; the authors of repair left SGI a long time ago. ]

You state that your news server hangs after about a week. This is
the thing which needs digging into, I think. Have you tried dropping
into kdb when this happens? A dump of all stack traces might be a
good starting point.

> -----Original Message-----
> From: Tony Gale [mailto:gale@xxxxxxxxxxxxxxxxxx]
> Sent: Tuesday, December 05, 2000 4:15 AM
> To: Russell Cattelan
> Cc: linux-xfs@xxxxxxxxxxx
> Subject: Re: ADD 804570 - The elevator bug
> On 04-Dec-2000 Russell Cattelan wrote:
> > Tony Gale wrote:
> > 
> >> But, the filesystem pretty much goes unrecoverable after
> >> I am forced to reset the box:
> > 
> > This could be related to other issues.
> > First exactly what version of the XFS tree are you running?
> > If you are running anything less than current (as of today)
> > or the XFS_BETA_4 image, please upgrade immediately.
> I was running linux-2.4-xfs cvs from Nov 27. I can't gauge the
> stability of xfs if I keep updating it.
> I've upgraded to the current cvs + elevator patch now.
> > 
> > There was a corruption problem in all previous versions.
> > Symptoms of the corruption do sound similar to what
> > you are describing.
> > 
> > At this point you will need to run xfs_repair to get your
> > file system back, if repair fails let us know hopefully we
> > can fix whatever went wrong.
> >
> xfs_repair seems to sort out the (numerous) problems, although innd
> still won't work with the resulting spool. Will have to speak to innd
> people about that.
> But, xfs_repair does have an annoyance. I like to run fs repair 
> programs until it reports no problems, but the way xfs_repair
> unlinks lost+found, if there are any files in there you always get
> disconnected inodes in phase 6.
> Thanks
> -tony
> ---
> E-Mail: Tony Gale <gale@xxxxxxxxxxxxxxxxxx>
> grep me no patterns and I'll tell you no lines.
> The views expressed above are entirely those of the writer
> and do not represent the views, policy or understanding of
> any other person or official body.
