
Re: TAKE - fix panic caused by mix of local and remote access to xfs

To: Steve Lord <lord@xxxxxxx>
Subject: Re: TAKE - fix panic caused by mix of local and remote access to xfs
From: <marchuk@xxxxxxxxxxxxxxxxx>
Date: Wed, 30 May 2001 11:01:23 -0700 (PDT)
Cc: Galen Arnold <arnoldg@xxxxxxxxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: <200105301759.f4UHxIh18635@jen.americas.sgi.com>
Sender: owner-linux-xfs@xxxxxxxxxxx
May 29 20:01:09 gauss kernel: __alloc_pages: 0-order allocation failed.
...after a reboot:
May 30 03:27:39 gauss kernel: __alloc_pages: 0-order allocation failed.
Another crash.


*****************************
Walter Marchuk
Senior Computer Specialist
University of Washington
Electrical Engineering
Room: 307g
206-221-5421
marchuk@xxxxxxxxxxxxxxxxx
*****************************

On Wed, 30 May 2001, Steve Lord wrote:

> 
> Hi,
> 
> I am running iozone on a 2 GB box with 2.4.5-xfs right now; it is still in
> the first pass. I only have a striped md filesystem, not hardware RAID.
> I have not been running long enough to say whether the problem is fixed.
> 
> There continue to be general highmem problems in 2.4 Linux right now. There
> are known deadlocks in the generic code - nothing to do with XFS itself,
> but XFS may be better at exposing them. Things do appear to be improving:
> the 2.4.2 kernel did not last 10 seconds on this test for Galen, while the
> 2.4.4 kernel lasted several hours. There are more changes in 2.4.5, but Linus
> rushed them in right before he put out the 2.4.5 release and left for Japan,
> and there was continuing discussion that they would not fix the deadlocks.
> 
> Since then there has been a patch to change how I/O is done on highmem
> boxes, which a) improves performance and b) should, if I understand it
> correctly, get rid of the deadlocks.
> 
> I would recommend trying the 2.4.5 kernel and, should that still have
> problems, I can try the patch from Jens Axboe that changes the way the
> I/O is done and pass it on to you.
> 
> Thanks for continuing to pound on xfs though.
> 
> Steve
> 
> > linux-2.4.4-xfs
> > 
> > 
> > On Wed, 30 May 2001, Galen Arnold wrote:
> > 
> > > Walter,
> > > 
> > > Welcome to the club!  I can reproduce that behavior with 2.4.4-xfs and
> > > this iozone test (my host has 2G mem, so I test with big files):
> > > 
> > >   iozone -s 4000m -r 64k
> > > 
> > > My box also hangs (doesn't crash or panic).  Was that 2.4.5-xfs you
> > > tested? If so, you saved me the trouble and I'll delay my next cvs
> > > checkout/rebuild.
> > > 
> > > -Galen
> > > 
> > > +
> > > Galen Arnold, system engineer--systems group       arnoldg@xxxxxxxxxxxxx
> > > National Center for Supercomputing Applications           (217) 244-3473
> > > 152 Computer Applications Bldg., 605 E. Spfld. Ave., Champaign, IL 61820
> > > 
> > > On Wed, 30 May 2001 marchuk@xxxxxxxxxxxxxxxxx wrote:
> > > 
> > > > My fileserver has the latest XFS CVS kernel.  Yesterday the machine
> > > > crashed/froze.  It stopped at 8PM right after this allocation failure
> > > > error.  Notice the time when the machine came back, 1AM (it was
> > > > physically rebooted).
> > > > 
> > > > I did a search for this error and saw a correlation between XFS and this
> > > > error.  Do any of you know if the error and the crash were due to an XFS
> > > > bug?
> > > > 
> > > > May 29 20:01:09 gauss kernel: __alloc_pages: 0-order allocation failed.
> > > > May 30 01:34:19 gauss syslogd 1.3-3: restart.    
> > > > 
> > > > On Tue, 29 May 2001, Steve Lord wrote:
> > > > 
> > > > > > That means it's been tagged?
> > > > > 
> > > > > I am not sure what you mean by tagged. I checked in a fix for the problem
> > > > > with the implementation I checked in on Friday, and while cleaning out my
> > > > > mailbox I saw that I had said I would do a followup under the same heading.
> > > > > 
> > > > > Steve
> > > > > 
> > > > > > 
> > > > > > -- 
> > > > > > Austin Gonyou
> > > > > > Systems Architect, CCNA
> > > > > > Coremetrics, Inc.
> > > > > > Phone: 512-796-9023
> > > > > > email: austin@xxxxxxxxxxxxxxx
> > > > > > 
> > > > > > On Tue, 29 May 2001, Steve Lord wrote:
> > > > > > 
> > > > > > > >
> > > > > > > > Sod's law says you find the hole right after you ship the code. There is a
> > > > > > > > bug in this code. I would not do a cvs update until I get a fix in; I know
> > > > > > > > roughly how to fix it, but it will take me a while to code and test it.
> > > > > > > >
> > > > > > > > I would hold off on the cvs tree for a while until you see another message
> > > > > > > > from me on this thread - probably not today.
> > > > > > > >
> > > > > > > > Steve
> > > > > > > >
> > > > > > >
> > > > > > > This was fixed over the weekend by the way.
> > > > > > >
> > > > > > > Steve
> > > > > > >
> > > > > 
> > > > > 
> > > > 
> > > 
> 
> 

