On Mon, 5 Jan 2004, Rainer Krienke wrote:
> Hello,
>
> a happy new year to everyone on this list ...
>
> well for some of my xfs filesystems the year had a bad start. Due to a
> power failure on some systems that possibly rebooted and later crashed
> again due to another power failure, some xfs filesystems were damaged. I
> could not mount these filesystems and so ran xfs_repair -L.
What was the failure when you first tried to mount?
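If you can still reproduce it, the exact error from mount plus whatever
the kernel logged at that moment would help. Roughly something like this
(the device path is just a placeholder for your logical volume):

  mount -t xfs /dev/vgXX/lvXX /mnt/test
  dmesg | tail -30     # the kernel message usually says why the mount failed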
> For one filesystem (~150GB) this worked. xfs_repair reported some errors in
> the filesystem but finished its work. Next I tried to mount this filesystem,
> but mount complained that it could not find a valid superblock. So I ran
> xfs_repair once again. It still found some errors (but fewer than before).
> Next I rebooted the machine and the filesystem was mounted.
Unable to find a superblock immediately after repair? I have never
seen this before; it sounds very odd. BTW, running repair twice in
succession will keep finding "errors" because the first run
creates /lost+found, and the second run unlinks /lost+found and then
rediscovers everything that was in it as disconnected. This
is a "feature" that I'd like to change some day. Moving /lost+found
after the first run should make repair run cleanly the next time,
unless something is really still wrong with the filesystem.
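A minimal sketch of that workaround, if you want to try it (mountpoint and
device are just examples):

  mount /dev/vgXX/lvXX /mnt/test
  mv /mnt/test/lost+found /mnt/test/lost+found.old   # keep the recovered files
  umount /mnt/test
  xfs_repair /dev/vgXX/lvXX     # should now come up clean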
In any case a successful xfs_repair run followed by a bad superblock
is a big red flag indicating... something. :)
> Another filesystem on a second machine (~40GB), which strangely enough had
> been mounted on reboot but was not accessible (ls /filesystem: input/output
> error), could not be repaired by xfs_repair:
Sounds like the filesystem shut down due to some error, can you check
your logs? In fact checking your logs in general might be useful
here, I wonder if there is anything else going on.
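Something along these lines should turn up any XFS shutdown or I/O error
messages (the log file name is the usual SuSE default; adjust if yours
differs):

  dmesg | grep -i xfs
  grep -iE 'xfs|i/o error' /var/log/messages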
> again I had to use -L since after unmounting I was no longer able to mount
> it. xfs_repair reported a lot more errors compared to the first filesystem
> from above, and then in pass 4, I think, while traversing the directory
> hierarchy from /, it segfaulted. A second and third run of xfs_repair each
> produced more errors but always ended in a segfault. In the meantime I
> recovered the data from tape, but I still have the old broken filesystem
> for further investigation if needed.
Can you convince it to dump a corefile? We could then have a better
idea of what's going wrong.
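Roughly like this, assuming your shell has core dumps disabled by default
(device path again just a placeholder):

  ulimit -c unlimited           # allow core dumps in this shell
  xfs_repair /dev/vgXX/lvXX     # let it run until it segfaults
  ls -l core*                   # the core normally lands in the current dir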
> Now I would like to know if this behaviour is "normal":
>
> - Can a filesystem with transaction logging like xfs become inconsistent
> because of a power failure? There is no disk failure!
It should not be inconsistent (disk+log should always be consistent).
You may have data loss though due to cached data which never makes it
to disk.
> - Should xfs_repair find all errors in one run (like a regular fsck does) or
> do I have to run it again and again until it reports no more errors?
See above; in general it should find all errors except for the
/lost+found "feature."
> - Is it a known issue that xfs_repair segfaults sometimes, or is it perhaps
> a problem of my version (see below)?
I wonder if maybe it's failing a memory allocation for the large
filesystem; I thought I remembered a (fixed) problem like this
but I don't see it in the changelogs. A little gdb debugging
would be a big help.
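A backtrace from the segfault would be the most useful starting point; for
example (the xfs_repair path is just the usual location, and a debug build
of xfsprogs would give an even more informative trace):

  gdb /sbin/xfs_repair          # or: gdb /sbin/xfs_repair core
  (gdb) run /dev/vgXX/lvXX
  (gdb) bt                      # once it stops at the segfault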
> - Can I do something else to avoid corrupted xfs filesystems in case of a
> crash?
You shouldn't have to do anything special; let's try to get to the
bottom of what happened here.
> The machine is a SuSE Linux 8.2 multiprocessor system with a SuSE-patched
> kernel 2.4.21-144 (from a SuSE 9.0 system). The xfs kernel driver reports
> version 1.3.1, but I think the filesystems in question were created with an
> earlier kernel, I guess with driver version 1.3beta. xfs_repair is version
> 2.5.6. All filesystems are on logical volumes that are handled by lvm2. The
> underlying (lvm) physical volume is a software RAID 1, to prevent data loss
> due to a disk failure.
Those look like pretty recent versions of everything.
-Eric