That sounds very similar to my experience with recent kernels from CVS.
I found that upon the stall (oops), nothing more could be written to the
filesystem and it was impossible to sync/unmount the online volumes,
forcing a hard reset followed by a boot from my rescue CD. I attributed
this (possibly mistakenly, it turns out) to the kernel being compiled by
gcc 3.1. I've since gone back to 2.4.18-XFS and a different compiler, and
all seems well now. The one thing that made this hard for me to pinpoint
is that when I got the oops and nothing more could be written, I couldn't
even see the oops output in my logs.
-Walt
D. Stimits wrote:
Steve Lord wrote:
This one crept in about a week ago; I should have left this alone!
Nathan hit it in testing, though for some reason I never could.
Date: Tue Jul 16 19:33:03 PDT 2002
Workarea: jen.americas.sgi.com:/src/lord/xfs-linux.2.4
The following file(s) were checked into:
bonnie.engr.sgi.com:/isms/slinx/2.4.x-xfs
Modid: 2.4.x-xfs:slinx:123144a
linux/fs/xfs/pagebuf/page_buf.c - 1.40
- fix unlock without lock bug in pagebuf, causes a BUG macro to trip
also remove need for xfs_fs.h
Out of curiosity, how would this show up? I'm experimenting on a new
Red Hat 7.3 install: SMP (dual PIII), i840 chipset, SCSI aic7xxx mixed
with IDE. I am running "noapic" to avoid some i840 chipset issues (if
they have been solved by microcode I will find out, but first I am
getting it running with noapic), and I compile the kernel and related SGI
code with kgcc. This is 2.4.19-rc1-xfs, from CVS of about 2 days ago.
This TAKE and today's takes are the first in a few days that are not
in my CVS checkout (it is about 2 or 3 days out of date).
What I noticed is that when copying a large subdirectory from an existing
XFS partition to an ext2 partition (cp -adpr, from one SCSI disk to
another), it copies a lot, then stalls. The hard drives stop showing any
activity. I was running KDE during this, with tail -f going on
/var/log/messages. Not only did the copy stall, but the keyboard became
completely unresponsive, though the mouse still worked. I could focus
different windows, minimize them, and so on, but had no keyboard access.
Running top showed no unusual CPU or memory use. I couldn't even get to a
console. But telnet from another local machine worked fine, and from that
remote login I could kill either the cp or the X11 session, and all was
restored. If I copied the same directory by feeding "find" to cpio, it
worked flawlessly. It might not have anything to do with XFS, but it
seemed worth reporting and asking about (there was no oops, backtrace, or
other concrete data to save). Is this anything familiar? It seems like
something deadlocked and stalled; I couldn't really call it a crash, nor
could I say for certain it wasn't a KDE, window manager, or other issue.
FYI, I am on the mailing list.
D. Stimits, stimits @ idcomm.com