
[xfs-masters] Re: potential xfs_repair bug

To: Eric Sandeen <sandeen@xxxxxxx>
Subject: [xfs-masters] Re: potential xfs_repair bug
From: Junfeng Yang <yjf@xxxxxxxxxxxx>
Date: Thu, 2 Sep 2004 12:34:42 -0700 (PDT)
Cc: xfs-masters@xxxxxxxxxxx
In-reply-to: <4136A0EB.5000204@sgi.com>
Reply-to: xfs-masters@xxxxxxxxxxx
Sender: xfs-masters-bounce@xxxxxxxxxxx
the image was created by:
1. mkfs.xfs -q -d agcount=2
2. create one file with a 255-char long name in dir '/'
3. umount to sync the modification

in other words, this image is a perfectly valid one.  presumably,
xfs_repair shouldn't repair anything at all.
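
for example, the image can be re-created with something like the following
(the image size, mount point, and file name below are just placeholders,
not necessarily the exact ones used to build orig.img):

# build a small XFS image with two allocation groups
dd if=/dev/zero of=orig.img bs=1M count=512
mkfs.xfs -q -d agcount=2 orig.img

# loopback-mount it and create one file with a 255-char name in '/'
mount -o loop orig.img /mnt
touch /mnt/$(printf 'a%.0s' {1..255})

# unmount to sync the modification to disk
umount /mnt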


to reproduce the error,

1. get http://keeda.stanford.edu/~junfeng/orig.img.bz2, or
re-create the image following the above instructions.

2. change the function libxfs_writebuf_int in xfsprogs/libxfs/rdwr.c so that
it stops after writing out only the first 7 sectors of block 96 (assuming
only single-sector writes are guaranteed to be atomic), as follows:

int
libxfs_writebuf_int(xfs_buf_t *buf, int flags)
{
        int     sts;
        int     fd = libxfs_device_to_fd(buf->b_dev);

#ifdef IO_DEBUG
        fprintf(stderr, "writing %ubytes at blkno=%llu(%llu), %p\n",
                buf->b_bcount, BBTOOFF64(buf->b_blkno), buf->b_blkno, buf);
#endif
        if (buf->b_blkno == 96) {
                /*
                 * injected change: simulate a crash by writing out only
                 * the first 7 sectors of block 96, then exiting
                 */
                sts = pwrite64(fd, buf->b_addr, buf->b_bcount - 512,
                               BBTOOFF64(buf->b_blkno));
                exit(0);
        }

        sts = pwrite64(fd, buf->b_addr, buf->b_bcount,
                       BBTOOFF64(buf->b_blkno));
        if (sts < 0) {
                fprintf(stderr, _("%s: pwrite64 failed: %s\n"),
                        progname, strerror(errno));
                if (flags & LIBXFS_EXIT_ON_FAILURE)
                        exit(1);
                return errno;
        }
        else if (sts != buf->b_bcount) {
                fprintf(stderr, _("%s: error - wrote only %d of %d bytes\n"),
                        progname, sts, buf->b_bcount);
                if (flags & LIBXFS_EXIT_ON_FAILURE)
                        exit(1);
                return EIO;
        }
        return 0;
}
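
roughly, the instrumented repair can then be built and run like this
(the path to the built xfs_repair binary may differ in your xfsprogs tree):

# rebuild xfsprogs with the modified rdwr.c
cd xfsprogs && make

# run the instrumented xfs_repair on the image; it exits after the
# partial (7-of-8 sector) write of block 96, simulating a crash
bunzip2 -k orig.img.bz2
repair/xfs_repair orig.img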


3. run the *unmodified* xfs_repair on the same image again, and you'll see:


Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
corrupt block 0 in directory inode 128
    will junk block
no . entry for directory 128
no .. entry for root directory 128
problem with directory contents in inode 128
cleared root inode 128
        - agno = 1
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
root inode lost
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
reinitializing root directory
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 131, moving to lost+found
disconnected dir inode 132, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 128 nlinks from 2 to 3
done

after this step, the content of '/' gets wiped out.
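
to confirm, you can loopback-mount the repaired image (the mount point is
just an example) and look at '/':

mount -o loop orig.img /mnt
ls /mnt              # '/' now contains only lost+found
ls /mnt/lost+found   # the disconnected inodes ended up here
umount /mnt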

-Junfeng

On Wed, 1 Sep 2004, Eric Sandeen wrote:

> (cc:ing xfs-masters to keep everyone in the loop)
>
> How was the filesystem damaged before running xfs_repair the first time?
>
> I'm not sure how much effort was put into protecting xfs_repair from
> crashes... I can believe that a poorly-timed crash could cause problems.
>
> -Eric
>
> Junfeng Yang wrote:
>
> > Hi Eric,
> >
> > do you guys intend to make xfs_repair crash-safe?
> >
> > our tool flagged a warning for xfs_repair, where a perfectly correct file
> > system can be messed up by xfs_repair if a crash happens during xfs_repair.
> > the warning can be triggered this way:
> >
> > 1. run xfs_repair on an xfs image containing 1 file.  crash it right before
> > the last sector is written to disk.  (in our case it is the write to
> > sector 103.)
> >
> > 2. run xfs_repair again on the same image.  it reports that root dir
> > content is corrupted so it cleans up root dir and moves the file to
> > lost+found
> >
> > it appears to me that a crash during xfs_repair may wipe out all the
> > entries under '/'.  does xfs_repair try to rebuild the root dir?  I'm not
> > sure if this should be considered a bug or not.  any
> > confirmation/clarification is appreciated.
> >
> > the fs image before step 1 can be obtained at
> > http://keeda.stanford.edu/~junfeng/orig.img.bz2
> >
> > image after step 1 is at
> > http://keeda.stanford.edu/~junfeng/half.img.bz2
> >
> > Thanks,
> > -Junfeng
>

