Re: Kernel panic, SB validate failed

To: Gaspar Bakos <gbakos@xxxxxxxxxxxxxxx>
Subject: Re: Kernel panic, SB validate failed
From: Net Llama! <netllama@xxxxxxxxxxxxx>
Date: Thu, 4 Dec 2003 14:28:43 -0500 (EST)
Cc: Steve Lord <lord@xxxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: <Pine.SOL.4.58.0312041335060.18049@xxxxxxxxxxxxxxxxxxxx>
References: <Pine.SOL.4.58.0312041209470.18049@xxxxxxxx> <3FCF7131.9080703@xxxxxxxx> <Pine.SOL.4.58.0312041310450.18049@xxxxxxxxxxxxxxxxxxxx> <Pine.SOL.4.58.0312041335060.18049@xxxxxxxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
On Thu, 4 Dec 2003, Gaspar Bakos wrote:
> Hi,
> > I'll check the cable, or swap it, and see what
> > happens.
> Swapping the cable did not change things, unfortunately.
> The main question now is recovery: how to start gently without losing
> things - if this is possible at all.
> I remember that last time I had to run xfs_repair -L, because the
> filesystems were not mountable, and plain xfs_repair suggested using
> -L. So I lost a lot of things (not a surprise, but there seemed to be
> no other way to proceed, at least with my knowledge).
> Here is the cable situation a bit more detailed:
> Originally the failed disk was connected as hda with a cable that indeed
> shows some wear (cable #1). The hdb on the same cable, however, showed no
> problems (also XFS, what else).
> After the crash, I replaced the faulty hda with another Linux disk to boot
> from, and the faulty disk took the place of hdd, which has a seemingly new
> cable, and with which the previous hdd worked fine. All the xfs_check
> messages I reported were produced with this setup, i.e. with cable #2.
> Then I put the faulty disk back as hda, and swapped cable #1 for yet
> another cable (#3), but the same kernel panic happens. It seems that all
> disks work fine with all the cables I have (as the faulty one also used to).
> This is off-topic, not directly XFS-related, although it would be useful
> for filtering out non-XFS problems:
> I am wondering whether there is a way to find out if the 'faulty' drive
> has real hardware problems. It is recognized at boot (as hdd), the CHS
> values are correct, and hdparm -Ii gives a correct response.
> - Is there any point in running "badblocks" on the device?
> - Or is there an equivalent for this with XFS (badblocks was mainly used
> through e2fsck -c)?

You can start by checking your messages log (usually /var/log/messages) for
any errors reported against the drive.  If that comes up clean, and the disk
& controller are SMART capable, there is a SMART suite of utilities (google
for it) that you can run to see whether the drive is on the verge of failure.
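
A rough sketch of those two steps, under a few assumptions: the log lives at
/var/log/messages (the path varies by distribution), the suspect disk is hdd
as in your current setup, and the SMART suite in question is smartmontools,
whose smartctl tool is shown only in comments because it needs root and real
hardware. The helper name scan_disk_errors is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that mention a device together
# with a word that usually signals trouble.
#   $1 = log file (e.g. /var/log/messages; location is an assumption)
#   $2 = device name without /dev/ (e.g. hdd)
scan_disk_errors() {
    grep -i "$2" "$1" | grep -i -E 'error|fail|timeout|reset'
}

# Usage (as root, on the affected machine):
#   scan_disk_errors /var/log/messages hdd
#
# If the log is clean, SMART can be queried with smartctl (smartmontools):
#   smartctl -H /dev/hdd         # overall health self-assessment
#   smartctl -a /dev/hdd         # full attribute and error-log dump
#   smartctl -t short /dev/hdd   # start a short self-test
```

The keyword list is only a starting point; anything the filter prints still
has to be read in context, and an empty result does not prove the drive is
healthy.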

Also, some drive vendors have diagnostic tools that you can run, normally
from a bootable floppy disk.

Lonni J Friedman                                netllama@xxxxxxxxxxxxx
Linux Step-by-step & TyGeMo                  http://netllama.ipfox.com
