Jeremy Jackson wrote:
Hi,
I'm wondering what's the official word about xfs_repair on a read-only
mounted fs. The utilities complain for me, so I have to boot from a
repair partition to fix XFS (a while back when the shutdown files in use
bug was still a problem). Ext2 has no problem with this. I'd just like to
know for future reference, so I know if I have to have a spare root fs
or not.
While I too would like to have a way to repair XFS read-only mounts,
there are other reasons to have a special recovery partition for
this.
What I have done is to make /boot its own partition at the start of
the disk. This is where the kernel lives, along with lilo stuff (or
grub if you use that).
Anyway, I have made a script that will build a mini-boot system in
that partition and that will then run as the init process to fix up
any other filesystems. To help reduce the chance that /boot is
corrupted, I mount it read-only and, since nothing normally runs from
it, if it ever needs fixing, I can just unmount it and fix it after
a regular boot.
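As a rough illustration of the read-only arrangement described above, the sketch below checks whether a mount point is currently mounted read-only by reading a mounts table in /proc/mounts format. The helper name is_readonly and its optional second argument are assumptions made for this example; they are not part of the attached script.

```shell
#!/bin/sh
# Sketch: report whether a mount point is mounted read-only.
# is_readonly is a hypothetical helper; the optional second argument
# names the mounts table to read (defaults to /proc/mounts).
is_readonly()
{
    mountpoint="$1"
    table="${2:-/proc/mounts}"
    # Fields: device mountpoint fstype options dump pass.
    # Keep the options of the last matching entry (the mount on top).
    opts=$(awk -v mp="$mountpoint" '$2 == mp { o = $4 } END { print o }' "$table")
    # Wrap in commas so "ro" only matches as a whole option.
    case ",$opts," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Typical use (not executed here):
#   is_readonly /boot || echo "warning: /boot is not read-only"
```

If the check shows /boot as read-write, `mount -o ro,remount /boot` should put it back into the protected state.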
The install script (see attached) builds everything needed and even
tells you how much space it used in /boot to do its work. You may need
to add lilo/grub entries that match your environment. Also, as currently
written, it assumes a devfs kernel (does someone want to make a
non-devfs version?).
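Because the attached script writes its lilo entries with a fixed devfs root device, a small helper along these lines could generate the same stanza for a different device naming scheme. This is only a sketch: recovery_stanza is a hypothetical function, and /dev/hda1 is an assumed example device, not something taken from the script.

```shell
#!/bin/sh
# Sketch: print a lilo.conf recovery stanza for a given root device.
# recovery_stanza is a hypothetical helper, not part of the attached script.
recovery_stanza()
{
    label="$1"     # lilo label, e.g. Recovery
    rootdev="$2"   # device holding /boot, in your kernel's naming scheme
    init="$3"      # init script inside the recovery area
    printf 'image=/boot/linux.lastchance\n'
    printf 'label=%s\n' "$label"
    printf 'root=%s\n' "$rootdev"
    printf 'append="init=%s"\n' "$init"
}

# Example: a non-devfs system whose /boot partition is /dev/hda1 (assumed)
recovery_stanza Recovery /dev/hda1 /sbin/init-manual
recovery_stanza Autofix  /dev/hda1 /sbin/init-auto
```

The output can be reviewed and then appended to /etc/lilo.conf by hand, followed by a run of lilo.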
It also supports modular recovery kernels, but you do need to add
the linux.lastchance kernel yourself. (I take a known good kernel, put
it there, and only update it when I once again have a known good kernel.)
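For the linux.lastchance step, something like the following could be used to save a known good kernel. It is a minimal sketch under assumptions: save_lastchance is a hypothetical helper, and the kernel path in the usage comment is only an example.

```shell
#!/bin/sh
# Sketch: save a known-good kernel image as linux.lastchance.
# save_lastchance is a hypothetical helper; the optional second
# argument is the directory to install into (defaults to /boot).
save_lastchance()
{
    kernel="$1"
    bootdir="${2:-/boot}"
    if [ ! -f "$kernel" ]; then
        echo "No such kernel image: $kernel" >&2
        return 1
    fi
    # Copy under a temporary name first so that a failed copy never
    # replaces the previous known-good kernel.
    cp "$kernel" "$bootdir/linux.lastchance.new" || return 1
    mv "$bootdir/linux.lastchance.new" "$bootdir/linux.lastchance"
}

# Typical use (not executed here; /boot/vmlinuz is an assumed path):
#   mount -o rw,remount /boot
#   save_lastchance /boot/vmlinuz
#   lilo
#   mount -o ro,remount /boot
```

Rerunning lilo afterwards matters because lilo records where the kernel file sits on disk, so a replaced file is not picked up until the map is rebuilt.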
Given that this is a recurring issue, I may make a web page for this.
--
Michael Sinz -- Director, Systems Engineering -- Worldgate Communications
A master's secrets are only as good as
the master's ability to explain them to others.
#!/bin/bash
#
# $Id: InstallRecoveryBoot 2002/11/23 -- MKSoft Development $
#
# This module contains the script that, when run, will put onto the /boot
# partition, a recovery boot feature. The script will also add the entry
# into the lilo.conf such that it can be used.
#
# NOTE - If you do not clean out the /lib/modules tree from various kernel
# builds, you
#
# Note that this script will destroy any recovery boot feature that it
# may have already installed in order to be able to ensure that the new
# one is complete and correct.
#
## Only root can run this
if [ `id -u` != 0 ]; then
echo "Only root can run this script!"
exit 1
fi
## Remount /boot as read-write...
mount -o rw,remount /boot
if [ $? != 0 ]; then
echo "Unable to mount /boot as read/write. Is /boot a partition?"
exit 1
fi
echo "Installing recovery boot feature into /boot"
## Make sure that it is all just owned by root...
umask 077
## Clean up any old install
rm -rf /boot/bin /boot/lib /boot/etc /boot/sbin /boot/boot /boot/dev /boot/proc \
/boot/var /boot/tmp
## Get the size before we install
before_size="`du -sb /boot`"
mkdir -p -m 700 /boot/bin /boot/lib /boot/etc /boot/sbin
mkdir -p -m 000 /boot/boot /boot/proc /boot/dev /boot/var /boot/tmp
## We need an "sh" processor too...
ln -s bash /boot/bin/sh
## Swapoff is just a softlink to swapon
ln -s swapon /boot/sbin/swapoff
## And, since we mount read-only, put /proc/mounts in /etc/mtab
ln -s /proc/mounts /boot/etc/mtab
## Make the special recovery fstab - this is needed to
## make sure that we can run in this mode
cat << 'boot-fstab' > /boot/etc/fstab
# $Id: InstallRecoveryBoot 2002/11/23 -- MKSoft Development $
#
# This is the special recovery boot fstab
# We mount the recovery boot as read-only, just to be safe
# We also bind-mount it into /boot in case lilo.conf is needed.
# We also mount /var and /tmp as tmpfs mounts such that we can
# do some disk operations (everything else is read-only)
/dev/root / auto ro,sync 0 0
/ /boot none bind 0 0
none /proc proc defaults 0 0
none /var tmpfs defaults 0 0
none /tmp tmpfs defaults 0 0
#
### Any swap partitions found in the system's fstab go here:
boot-fstab
## Now, grab, from the system fstab any swap information...
grep "^/dev/.*swap.*swap" /etc/fstab >>/boot/etc/fstab
## The common bit of code that starts the recovery system.
cat << 'common-init' >/boot/sbin/init
# $Id: InstallRecoveryBoot 2002/11/23 -- MKSoft Development $
#
export PATH="/sbin:/bin"
export TERM="ansi"
mount -n -a
## Start a pre-probing shell on vc/4 just as a backup in case
## there is some problem that needs a shell.
export PS1="XFS pre-Recovery Shell\n\w # "
bash 0<>/dev/vc/4 1>&0 2>&0 &
LastWord()
{
while [ "x$1" != "x" ]; do
word="$1"
shift
done
echo $word
}
bootdev="`ls -l /dev/root`"
## Export the boot device name (so we skip it)
export xfs_boot="/dev/`LastWord $bootdev`"
## Start building the list of XFS partitions
## (We assume that the boot device is XFS just
## so that we can display something useful there)
export xfs_parts=$xfs_boot
## Turn on swap, if we have it...
swapon -a -e
## Try and let some async boot items finish
## so let's wait for 5 seconds...
echo ""
echo -n "Waiting for the dust to settle ... "
usleep 5000000
echo "done"
echo -n "Probing disk(s) and partition(s) ... "
## Note that we look for all parts of a disk on
## any host/bus/target/lun - This includes
## "whole" disks which do not have partitions
## This should work for scsi and ide
##
## Arg! - mount/xfs/kernel output even if redirected!
## So, we jump to vc/2 to do all of the work (and thus
## get all of the nasty details there) and then bounce
## back afterwards...
chvt 2
echo "Probing disk(s) and partition(s) ..." 0<>/dev/vc/2 1>&0 2>&0
for part in /dev/*/host*/bus*/target*/lun*/*; do
mkdir -p /tmp/test
echo "$part ... "
if [ "$part" != "$xfs_boot" ]; then
## Note that using mount to test if the
## partition is XFS does two things for us:
## 1) It forces XFS to replay anything that is
## in the log for us
## 2) It keeps us from trying to auto-fix any
## partition that is so corrupted that
## mount can not even replay the log
mount -n -t xfs $part /tmp/test
if [ $? = 0 ]; then
xfs_parts="$xfs_parts $part"
fi
umount /tmp/test
fi
echo ""
rmdir /tmp/test
done 0<>/dev/vc/2 1>&0 2>&0
chvt 1
echo "done"
## Start the remaining recovery process
echo -e "\nXFS Recovery System (details on vc/2)\n"
## Now for each XFS partition that is not the
## boot partition we run xfs_check to see if anything
## is even remotely wrong...
## Note that this exports the xfs_needs_repair variable
## which will contain all of the disks/partitions that
## have something wrong with them.
echo "XFS Partitions:"
echo -e "\n\nXFS Partitions: (details)" 0<>/dev/vc/2 1>&0 2>&0
export xfs_needs_repair=""
for part in $xfs_parts; do
if [ "$part" != "$xfs_boot" ]; then
echo -n "Checking $part ... "
echo -e "\nChecking $part" 0<>/dev/vc/2 1>&0 2>&0
xfs_check $part 0<>/dev/vc/2 1>&0 2>&0
if [ $? = 0 ]; then
echo -e -n "\b\b\b\b- "
echo "OK"
else
echo -e -n "\b\b\b\b- "
echo "needs repair"
xfs_needs_repair="$xfs_needs_repair $part"
fi
else
echo "Skipping $part - boot partition"
fi
done
echo ""
## Start a post-probe recovery shell on vc/3
## This is just in case you need more than 1 for some work
export PS1="XFS Recovery Shell\n\w # "
bash -c 'set ; echo -e "\nRun /sbin/repair to auto-repair\n" ; exec bash' \
0<>/dev/vc/3 1>&0 2>&0 &
## Now start the shell or auto-repair script (depending)
export PS1="XFS Recovery Shell (exit to reboot) [auto-repair = /sbin/repair]\n\w # "
$AUTO_XFS_REPAIR
echo -n "rebooting..."
sync
swapoff -a
umount -a >/dev/null 2>&1
sync
echo -n " please wait..."
reboot -f -d
common-init
chmod 500 /boot/sbin/init
## Make our interactive recovery init script
cat << 'manual-init' >/boot/sbin/init-manual
#!/bin/bash
#
# $Id: InstallRecoveryBoot 2002/11/23 -- MKSoft Development $
#
AUTO_XFS_REPAIR=bash
. /sbin/init
manual-init
chmod 500 /boot/sbin/init-manual || exit 1
## Make our autofix init script
cat << 'auto-init' >/boot/sbin/init-auto
#!/bin/bash
#
# $Id: InstallRecoveryBoot 2002/11/23 -- MKSoft Development $
#
AUTO_XFS_REPAIR=/sbin/repair
. /sbin/init
auto-init
chmod 500 /boot/sbin/init-auto || exit 1
## Make the auto-repair script/command
cat << 'auto-repair' >/boot/sbin/repair
#!/bin/bash
#
# $Id: InstallRecoveryBoot 2002/11/23 -- MKSoft Development $
#
## This script uses the exported xfs_needs_repair
## variable to do its work. This variable should
## contain all of the devices that are XFS filesystems
## that did not pass xfs_check.
_xfs_needs_repair="x $xfs_needs_repair"
for part in $_xfs_needs_repair; do
if [ "$part" != "x" ]; then
echo "Repairing $part..."
xfs_repair $part
echo "Finished $part"
echo ""
fi
done
auto-repair
chmod 500 /boot/sbin/repair || exit 1
## Copy the fstab of the real system into a special file
## for easier reference...
cp /etc/fstab /boot/etc/fstab.system
## A simple routine to check the libraries needed...
CheckLibs ()
{
if [ "$1" != "not" ]; then
libfile=$3
libtarget="/boot/lib/`basename $1`"
if [ ! -f $libtarget ]; then
echo " requires $libtarget"
install -m 500 --strip "$libfile" "$libtarget" || exit 1
fi
fi
}
for file in \
/bin/bash \
/bin/cat \
/bin/chmod \
/bin/chown \
/bin/cp \
/bin/dd \
/bin/df \
/bin/dmesg \
/bin/echo \
/bin/grep \
/bin/ls \
/bin/mkdir \
/bin/more \
/bin/mount \
/bin/mv \
/bin/rm \
/bin/rmdir \
/bin/sync \
/bin/umount \
/bin/usleep \
/bin/vi \
/etc/lilo.conf \
/sbin/lilo \
/sbin/reboot \
/sbin/swapon \
/sbin/xfs_repair \
/usr/bin/chvt \
/usr/bin/du \
/usr/sbin/chroot \
/usr/sbin/xfs_check \
/usr/sbin/xfs_db \
; do
## We don't have a "/usr" in the boot
## recovery area so we delete "/usr"
## if it is there...
target="/boot${file##/usr}"
## Copy the file
echo " installing $target"
install -m 500 --strip "$file" "$target" 2>/dev/null || exit 1
## Now, do we also want to make sure that
## we have the shared libraries that are needed
IFS=$'\n'
for lib in `ldd $file 2>/dev/null`; do
unset IFS
CheckLibs $lib
done
unset IFS
done
## Just to be sure that we have the modules for
## whatever kernel we are using... (stripped :-)
echo " installing kernel modules..."
find /lib/modules -type d -exec mkdir -p /boot\{\} \;
find /lib/modules -type f -exec install -m 500 --strip \{\} /boot\{\} \; \
2>/dev/null
## Check if our lilo.conf has the recovery option yet
if [ "x`grep Recovery /etc/lilo.conf`" == "x" ]; then
cat << 'lilo.conf' >>/etc/lilo.conf
# Recovery entries
image=/boot/linux.lastchance
label=Recovery
root=/dev/ide/host0/bus0/target0/lun0/part1
append="init=/sbin/init-manual"
image=/boot/linux.lastchance
label=Autofix
root=/dev/ide/host0/bus0/target0/lun0/part1
append="init=/sbin/init-auto"
lilo.conf
fi
## Run lilo, just to be sure...
echo "Running lilo..."
lilo
## Get the size after we install
after_size="`du -sb /boot`"
## Now, remount /boot based on its fstab settings...
sync
mount -o remount /boot || exit
echo "Done."
echo ""
CalcUsage()
{
diff=$(( $3 - $1 ))
echo "Recovery feature is using $diff bytes in /boot"
}
CalcUsage $before_size $after_size
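
The library-copy loop in the script above depends on the shape of each ldd output line ("name => path (address)"). Newer ldd versions also print lines with no "=>" at all, so a more defensive parser might look like the sketch below. lib_path is a hypothetical helper written for this illustration, not part of the script, and the sample lines in the comments are assumed examples of ldd output.

```shell
#!/bin/sh
# Sketch: extract the library path from one line of ldd output.
# lib_path is a hypothetical helper; it prints nothing for lines
# that carry no usable path.
lib_path()
{
    # Split the line into words, as the unquoted $lib does in the script.
    set -- $1
    if [ "$2" = "=>" ]; then
        # "libc.so.6 => /lib/libc.so.6 (0x...)": the path is the third word.
        # "libfoo.so => not found" is skipped because "not" is not a path.
        case "$3" in
            /*) echo "$3" ;;
        esac
    else
        # "/lib/ld-linux.so.2 (0x...)": the path is the first word.
        # "not a dynamic executable" and virtual objects are skipped.
        case "$1" in
            /*) echo "$1" ;;
        esac
    fi
}
```

Each non-empty result could then be handed to install in place of the positional $3 that CheckLibs uses.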