Subject: Re: problems booting/recovery after crash (xfs cvs)
From: Russell Cattelan
To: Michael Loftis
Cc: Mihai RUSU, Linux XFS List
Date: 17 Jun 2003 19:19:25 -0500

BTW, file a bug at http://oss.sgi.com/bugzilla/ so this doesn't get lost.

On Tue, 2003-06-17 at 18:11, Michael Loftis wrote:
> Boot single-user again (with the latest kernel, or whatever works), don't
> mount the filesystem, and run xfs_repair on it.
>
> Recently I experienced almost exactly the same thing (except that our
> machine room is loud and the machine has no external drive-activity
> indicator, so I can't comment on the disk I/O part). Ours was a problem
> with the root partition, though, which required booting from a rescue
> disk.
>
> --On Tuesday, June 17, 2003 22:50 +0300 Mihai RUSU wrote:
>
> > Hi
> >
> > After the 2.4.21 kernel release I wanted to try it out on some of the
> > machines here, so I checked out CVS (SGI-XFS CVS-2003-06-16_05:00_UTC,
> > with debugging disabled), compiled it, and booted it.
> > After 20 hours of uptime the machine had to be rebooted hard, by
> > unplugging it (that was actually a mistake, but that's not the point).
> > After power-on, it booted as far as the recovery of the first "big" XFS
> > filesystem, where it hung: disk activity on that filesystem stopped
> > after a few seconds. By big I mean a ~140 GB partition with an external
> > journal. Another XFS partition recovered earlier in the boot did not
> > hang, but it is much smaller (16 GB) and uses an internal journal. I
> > tried many kernels from the LILO boot menu, and none succeeded except
> > 2.4.9-34 (the contributed kernel from the 1.1 release directory). The
> > kernels that failed were 2.4.21-cvs (the version mentioned above) and
> > 2.4.18-18SGI_XFS_1.2.0. I should also mention that I compiled all of
> > these kernels myself with gcc 2.95.3.
> >
> > Another strange thing is that I couldn't find the kernel log messages
> > from any of the boot attempts before the last one (2.4.9-34), which
> > worked. When 2.4.9-34 booted, it also produced some interesting kernel
> > log messages:
> >
> > Jun 17 15:41:08 s1 kernel: Starting XFS recovery on filesystem: dac960(48,9) (dev: 8/6)
> > Jun 17 15:41:08 s1 kernel: xfs_inotobp: xfs_imap() returned an error 22 on dac960(48,9). Returning error.
> > Jun 17 15:41:08 s1 kernel: xfs_iunlink_remove: xfs_inotobp() returned an error 22 on dac960(48,9). Returning error.
> > Jun 17 15:41:08 s1 kernel: xfs_inactive: xfs_ifree() returned an error = 22 on dac960(48,9)
> > Jun 17 15:41:08 s1 kernel: xfs_force_shutdown(dac960(48,9),0x1) called from line 1962 of file xfs_vnodeops.c. Return address = 0xc01cf242
> > Jun 17 15:41:08 s1 kernel: I/O Error Detected. Shutting down filesystem: dac960(48,9)
> > Jun 17 15:41:08 s1 kernel: Please umount the filesystem, and rectify the problem(s)
> > Jun 17 15:41:08 s1 kernel: Ending XFS recovery on filesystem: dac960(48,9) (dev: 8/6)
> >
> > After 2.4.9-34 booted, I rebooted the machine (a software reboot) and it
> > booted 2.4.29-cvs just fine.
> >
> > Because I have never had hardware problems with that machine before,
> > and because this happened only during XFS recovery, I presume it's an
> > XFS bug?
> >
> > I would like to know what I can do to make sure it doesn't happen
> > again. Thanks :)
> >
> > ----------------------------
> > Mihai RUSU
>
> --
> Michael Loftis
> Modwest Sr. Systems Administrator
> Powerful, Affordable Web Hosting

--
Russell Cattelan
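The "error 22" in the quoted kernel log is a numeric errno value; on Linux, 22 is EINVAL ("Invalid argument"). The short Python sketch below decodes the codes from those log lines. It assumes only the three error lines quoted above, and the regular expression and the decode_errors helper are hypothetical, fitted to this log format rather than taken from any XFS tool.

```python
import errno
import os
import re

# The three error lines from the kernel log above, with quote markers removed.
LOG = """\
Jun 17 15:41:08 s1 kernel: xfs_inotobp: xfs_imap() returned an error 22 on dac960(48,9). Returning error.
Jun 17 15:41:08 s1 kernel: xfs_iunlink_remove: xfs_inotobp() returned an error 22 on dac960(48,9). Returning error.
Jun 17 15:41:08 s1 kernel: xfs_inactive: xfs_ifree() returned an error = 22 on dac960(48,9)
"""

# Hypothetical pattern: matches both "returned an error 22" and
# "returned an error = 22", capturing the failing function and the code.
ERR_RE = re.compile(r"(\w+)\(\) returned an error =? ?(\d+)")


def decode_errors(log: str) -> list[tuple[str, int, str]]:
    """Return (function, errno value, symbolic name) for each error line."""
    results = []
    for line in log.splitlines():
        m = ERR_RE.search(line)
        if m:
            func, code = m.group(1), int(m.group(2))
            # errno.errorcode maps numeric values to names such as "EINVAL".
            name = errno.errorcode.get(code, "UNKNOWN")
            results.append((func, code, name))
    return results


for func, code, name in decode_errors(LOG):
    print(f"{func}: {code} = {name} ({os.strerror(code)})")
```

On Linux each line prints in the form "xfs_imap: 22 = EINVAL (Invalid argument)". The decoded name only says which error was returned, not why xfs_imap() rejected the inode.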