Re: mkinitrd, ramdisk failure?

Subject: Re: mkinitrd, ramdisk failure?
From: "D. Stimits" <stimits@xxxxxxxxxx>
Date: Wed, 13 Jun 2001 18:57:51 -0600
Cc: "XFS: linux-xfs@xxxxxxxxxxx" <linux-xfs@xxxxxxxxxxx>
References: <Pine.GSO.4.33.0106140812160.16255-100000@xxxxxxxxxxxxxxxxxxxx>
Reply-to: stimits@xxxxxxxxxx
Sender: owner-linux-xfs@xxxxxxxxxxx
Chris Pascoe wrote:
> 
> > NOTE: The /boot partition seems to be read fine, and the scsi controller
> > is detected and works fine, including apparently the read of /boot. SCSI
> > is directly compiled in.
> 
> I don't believe the read of /boot is done using the system SCSI
> modules; it's done using BIOS calls.  So the fact that it loads the kernel
> and initrd is not an indication that the SCSI drivers, etc., actually
> loaded properly.

Ok, I did some experimenting. Background first: my /boot/ is the first
partition; it is ext2; ext2 is compiled in. The test: I went ahead and
made all scsi become a module. Voila, you were correct, running aic7xxx
as module caused failure to load the scsi driver, and the boot failed
earlier, though it got past networking. I'm about to try again, this
time explicitly naming the scsi modules. I suspect it won't matter...the
ramdisk is being completely ignored by lilo (I double-checked, no
spelling errors in naming the file or path).

> 
> Bear with me and check that you are seeing everything below:
> 
> The point where the initrd is loaded starts with:
>         RAMDISK: Compressed image found at block 0

I have never seen this message; if it occurs early on, it is doing so
when screen mode changes, and the screen is blank. By the time it is
unblanked, no such RAMDISK message occurs. I am set for something like
60 lines vertical by 142 characters horizontal, so I see a *lot* of the
bootup on just one screen page, but I still miss the few lines right
after the uncompressing of the boot kernel.
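Since that line scrolls by unseen, one thing I can at least confirm is that the image file itself is what the kernel looks for. As far as I know, the 2.4 kernel prints "RAMDISK: Compressed image found at block 0" when the image starts with the gzip magic bytes 1f 8b. A minimal sketch, assuming a standard gzip image (the function names are my own, hypothetical helpers):

```shell
#!/bin/sh
# Sketch (assumption): the kernel recognizes a compressed initrd by the
# gzip magic bytes 1f 8b at the very start of the image. This checks a
# candidate image file on disk for those two bytes.

# Print the first two bytes of file $1 as lowercase hex, e.g. "1f8b".
first_two_bytes() {
    od -An -tx1 -N2 "$1" | tr -d ' \n'
}

# Print "gzip" if file $1 starts with the gzip magic, else "not gzip".
is_gzip_image() {
    if [ "$(first_two_bytes "$1")" = "1f8b" ]; then
        echo "gzip"
    else
        echo "not gzip"
    fi
}
```

Running is_gzip_image /boot/initrd-2.4.6-pre1-xfs-3.img should print "gzip" for an image built with the gzip scheme. This only checks the file on disk; it says nothing about whether lilo actually hands the image to the kernel.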

> 
> Then it is followed by (as the root filesystem on the ramdisk is mounted):
>         VFS: Mounted root (ext2 filesystem).

In the several paragraphs just before this (or rather, before the VFS
failure message), no "RAMDISK" line appears.

> [I assume if you're loading XFS as a module, the filesystem on the initrd
>  is ext2, which is compiled into the booting kernel?  Otherwise, you're
>  never going to get anywhere, because it needs to mount/run the initrd to
>  get xfs support, and it can't mount the initrd because it doesn't have
>  the xfs modules loaded yet..]

Yes, XFS as module, initrd on ext2 of separate /boot, which is compiled
in. If I don't have XFS or scsi as module, but all other things remain
constant, it boots correctly and runs great.
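Related to this: the kernel can only use an initrd if RAM disk and initrd support are built in, not just ext2. A quick sketch for checking a kernel .config for that, assuming the 2.4 option names CONFIG_BLK_DEV_RAM, CONFIG_BLK_DEV_INITRD and CONFIG_EXT2_FS (the helper names are hypothetical):

```shell
#!/bin/sh
# Sketch: check that the options an ext2 initrd depends on are built in
# (=y) rather than modules (=m) or absent. Option names assume a 2.4
# kernel; the helper names are made up for illustration.

# Print "y", "m", or "unset" for option $1 in config file $2.
config_value() {
    v=$(grep "^$1=" "$2" | tail -n 1 | cut -d= -f2)
    echo "${v:-unset}"
}

# Print each required option's value; return 1 if any is not built in.
check_initrd_support() {
    status=0
    for opt in CONFIG_BLK_DEV_RAM CONFIG_BLK_DEV_INITRD CONFIG_EXT2_FS; do
        v=$(config_value "$opt" "$1")
        echo "$opt=$v"
        [ "$v" = "y" ] || status=1
    done
    return $status
}
```

For example, check_initrd_support /usr/src/linux/.config (path is just the usual location) lists the three values and returns nonzero if any of them is not =y.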

> 
> You should now see a few messages as the pagebuf/xfs_support/xfs modules
> load (I assume, never used them as modules):  (NB This might come after
> the SCSI support has loaded):
>         Loading pagebuf module
>         Pagebuf cache Copyright (c) 2001 Silicon Graphics, Inc.
>         Loading xfs_support module
>         Loading xfs module
>         XFS filesystem Copyright (c) 2001 Silicon Graphics, Inc.

This never occurs. It is obvious the ramdisk is not loading, but I
cannot figure out why. The initial ramdisk is correctly made, and I can
use the gzip scheme followed by mounting on loopback, and view the
actual ramdisk contents to verify it has what it should need.
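For reference, the gzip-plus-loopback check can be scripted roughly like this. It is a sketch under a few assumptions: the image is a gzip-compressed ext2 filesystem (as here), and the helper names and /tmp paths are made up for illustration. It also checks the ext2 magic number (0xEF53, stored little-endian at byte offset 1080, i.e. 56 bytes into the superblock at 1024) before mounting:

```shell
#!/bin/sh
# Sketch: unpack a gzip-compressed initrd, confirm it is ext2, and
# loop-mount it read-only for inspection. Helper names are hypothetical.

# Decompress gzip image $1 into file $2.
unpack_initrd() {
    gzip -dc "$1" > "$2"
}

# Print "ext2" if file $1 has the ext2 superblock magic, else "unknown".
# The magic 0xEF53 is stored little-endian, so the bytes read "53 ef".
fs_type() {
    magic=$(od -An -tx1 -j1080 -N2 "$1" 2>/dev/null | tr -d ' \n')
    if [ "$magic" = "53ef" ]; then
        echo ext2
    else
        echo unknown
    fi
}

# Loop-mount unpacked image $1 read-only on directory $2 (needs root).
mount_initrd() {
    mount -o loop,ro "$1" "$2"
}
```

As root: unpack_initrd /boot/initrd-2.4.6-pre1-xfs-3.img /tmp/initrd, then fs_type /tmp/initrd should print "ext2", after which mount_initrd /tmp/initrd /mnt lets you look at /mnt/linuxrc and /mnt/lib.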

> 
> After this, you should see a few SCSI driver messages at this
> time, as the SCSI adaptor drivers load, then a bunch of lines like:
> 
> (if you have SCSI support compiled as modules)
>         Loading scsi_mod module
>         Loading sd_mod module

Never occurs.

> 
> (always:)
>         Loading aacraid module (my SCSI driver)
>           Vendor: DELL      Model: PERCRAID Mirror   Rev: 0001
>           Type:   Direct-Access                      ANSI SCSI revision: 02
>         Detected scsi removable disk sda at scsi0, channel 0, id 0, lun 0
>           Vendor: DELL      Model: PERCRAID RAID5    Rev: 0001
>           Type:   Direct-Access                      ANSI SCSI revision: 02
>         Detected scsi removable disk sdb at scsi0, channel 0, id 1, lun 0

This occurs if I have scsi compiled in, regardless of whether the xfs
root filesystem mount fails; if scsi is a module, it fails regardless of
everything else. It is painfully obvious that my initial ramdisk is 100%
ignored.

> 
> Indicating that the disks were detected.
> 
> After that, it enumerates the partitions on the disk:
>         Partition check:
>          sda: sda1 sda2 sda3 < sda5 sda6 sda7 >
>          sdb: sdb1 sdb2 sdb3
> 
> And then it goes to mount the new root file system:
>         VFS: Mounted root (xfs filesystem) readonly.
> 
> If you're not seeing the "Detected scsi disk..." lines above, I'm guessing
> that you haven't got the scsi disk support compiled in, or the disks
> loaded.

The "Detected" part works with compiled-in scsi. I'll have to look more
closely at the partition check; I can't recall if that is visible (when
boot fails, I have to hand-copy everything in order to report on it).

> 
> Can you compare the output you get from the boot process with what
> I've pasted above, and let us know the differences?  Also, knowing the
> contents of the /linuxrc on the initrd would be handy.
> 
> Chris

The linuxrc from ramdisk when scsi and xfs are both modules:
#!/bin/sash

aliasall

echo "Loading scsi_mod module"
insmod /lib/scsi_mod.o
echo "Loading sd_mod module"
insmod /lib/sd_mod.o
echo "Loading aic7xxx module"
insmod /lib/aic7xxx.o
echo "Loading pagebuf module"
insmod /lib/pagebuf.o
echo "Loading xfs_support module"
insmod /lib/xfs_support.o
echo "Loading xfs module"
insmod /lib/xfs.o


I cannot see how this should fail, except that in the process of running
lilo (I use lilo -v -v), the initial ramdisk is being totally ignored. At
one point I found there was a length limit on lilo labels. When I tried
to create this label (much earlier, on a kernel that booted fine), it
failed, telling me the name was too long:
label=2.4.6-pre1-xfs-2
So I shortened it and it accepted it:
label=2.4.6-p1-xfs-2

I don't know exactly where the size limit is, but it accepts labels of 14
characters and rejects labels of 16 characters; my guess is it needs 15
characters plus NULL in a 16-char buffer. On the failing kernel, though,
my initrd line (via lilo.conf) is:
  initrd=/boot/initrd-2.4.6-pre1-xfs-3.img
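To sanity-check labels against that guess, a small sketch. MAX_LABEL=15 is only my inference from the 14/16 results above, not a documented lilo constant, and the function names are hypothetical:

```shell
#!/bin/sh
# Sketch: compare lilo labels against an inferred length limit.
# Assumption: 14-char labels are accepted and 16-char labels rejected,
# so the limit is presumably 15 chars (plus NUL in a 16-byte buffer).
MAX_LABEL=15

# Print the length of string $1.
str_len() {
    printf '%s' "${#1}"
}

# Print "ok" if label $1 fits the inferred limit, else say by how much.
check_label() {
    n=$(str_len "$1")
    if [ "$n" -le "$MAX_LABEL" ]; then
        echo ok
    else
        echo "too long ($n > $MAX_LABEL)"
    fi
}
```

check_label 2.4.6-pre1-xfs-2 reports "too long (16 > 15)", while the shortened 2.4.6-p1-xfs-2 passes. For comparison, str_len on the initrd path above gives 33; whether lilo limits path length at all is a separate question this check can't answer.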

I wonder if it can't accept a name or path that long? If it fails due
to truncation but isn't telling me, this could be it (I'll try a
shorter name for testing; I haven't done so yet). But in general,
lilo -v -v tells me everything is successful...I have been watching it
for errors, and there are no errors or warnings.

D. Stimits, stimits@xxxxxxxxxx
