
Re: xfs_repair segfault in stage 6

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: xfs_repair segfault in stage 6
From: Alex Elder <aelder@xxxxxxx>
Date: Tue, 20 Sep 2011 15:25:41 -0500
Cc: Bartosz Cisek <bartosz.cisek@xxxxxxxxxxxxxx>, Michael Monnerie <michael.monnerie@xxxxxxxxxxxxxxxxxxx>, <xfs@xxxxxxxxxxx>
In-reply-to: <20110914153852.GA11050@xxxxxxxxxxxxx>
References: <4E69D1B9.8070201@xxxxxxxxxxxxxx> <201109091401.31422@xxxxxx> <4E6A2B73.50503@xxxxxxxxxxxxxx> <20110912161215.GA17798@xxxxxxxxxxxxx> <4E707624.9030703@xxxxxxxxxxxxxx> <20110914142430.GA28049@xxxxxxxxxxxxx> <4E70C15C.3030502@xxxxxxxxxxxxxx> <20110914153852.GA11050@xxxxxxxxxxxxx>
Reply-to: <aelder@xxxxxxx>
On Wed, 2011-09-14 at 11:38 -0400, Christoph Hellwig wrote:
> On Wed, Sep 14, 2011 at 04:59:40PM +0200, Bartosz Cisek wrote:
> > The stack trace is pasted in the bug report [1] that is linked in my first mail ;)
> > I compiled it by hand from git with "DEBUG=-DDEBUG make". I don't know why some
> > of the values are 'optimized out'.
> > 
> > [1] http://oss.sgi.com/bugzilla/show_bug.cgi?id=914
> 
> Looks like we do not handle read I/O errors very well (if at all)
> in phase6.  Can you see if the patch below makes a difference?

Christoph, I'm assuming you want this reviewed
as a submitted patch.


> ---
> From: Christoph Hellwig <hch@xxxxxx>
> Subject: repair: fix I/O error handling
> 
> Currently libxfs_trans_read_buf never returns an error, even if
> libxfs_readbuf did not manage to complete the I/O.  This is different
> from the kernel behaviour and can lead to segfaults in code that
> doesn't expect it.  Add a new b_error member to xfs_buf (mirroring
> the kernel version) and use that to propagate proper error codes
> to the caller.  Also fix libxfs_readbufr to handle short reads
> properly, and not to overwrite errno values, e.g. with an fprintf.
> 
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> 
> Index: xfsprogs-dev/include/libxfs.h
> ===================================================================
> --- xfsprogs-dev.orig/include/libxfs.h        2011-09-14 11:17:42.660738577 -0400
> +++ xfsprogs-dev/include/libxfs.h     2011-09-14 11:20:45.959738580 -0400
> @@ -230,6 +230,7 @@ typedef struct xfs_buf {
>       void                    *b_fsprivate2;
>       void                    *b_fsprivate3;
>       char                    *b_addr;
> +     int                     b_error;
>  #ifdef XFS_BUF_TRACING
>       struct list_head        b_lock_list;
>       const char              *b_func;
> Index: xfsprogs-dev/libxfs/rdwr.c
> ===================================================================
> --- xfsprogs-dev.orig/libxfs/rdwr.c   2011-09-14 11:12:08.807741720 -0400
> +++ xfsprogs-dev/libxfs/rdwr.c        2011-09-14 11:20:21.183238272 -0400
> @@ -314,6 +314,7 @@ libxfs_initbuf(xfs_buf_t *bp, dev_t devi
>       bp->b_blkno = bno;
>       bp->b_bcount = bytes;
>       bp->b_dev = device;
> +     bp->b_error = 0;
>       if (!bp->b_addr)
>               bp->b_addr = memalign(libxfs_device_alignment(), bytes);
>       if (!bp->b_addr) {
> @@ -454,15 +455,17 @@ libxfs_readbufr(dev_t dev, xfs_daddr_t b
>  {
>       int     fd = libxfs_device_to_fd(dev);
>       int     bytes = BBTOB(len);
> +     int     error;
>  
>       ASSERT(BBTOB(len) <= bp->b_bcount);
>  
> -     if (pread64(fd, bp->b_addr, bytes, LIBXFS_BBTOOFF64(blkno)) < 0) {
> +     if (pread64(fd, bp->b_addr, bytes, LIBXFS_BBTOOFF64(blkno)) != bytes) {

If we reach EOF this returns 0, but I don't think
errno gets set in that case.  Do we want to print a
"read failed" message then?  Is EOF a failure, or
just a somewhat normal condition?
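To illustrate the distinction, here is a minimal sketch (not the libxfs code) of classifying the return value of a positioned read. It uses plain POSIX pread() rather than pread64(), and the enum and function names are hypothetical. A negative return means the call failed and errno is meaningful; a non-negative return smaller than the request (0 at EOF) is a short read for which pread() does not set errno, so the sketch picks EIO itself. That choice is an assumption, not what the patch does. errno is saved before fprintf() runs, in the spirit of the patch's "don't overwrite errno" fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Possible outcomes of a positioned read (hypothetical names). */
enum read_status {
	READ_OK,	/* all requested bytes were read */
	READ_ERROR,	/* pread() returned -1 and set errno */
	READ_SHORT,	/* fewer bytes than requested (0 at EOF); errno NOT set */
};

/*
 * Read 'bytes' bytes at 'offset' and classify the result.
 * *err receives 0 or a positive errno-style code.
 */
static enum read_status
classify_pread(int fd, void *buf, size_t bytes, off_t offset, int *err)
{
	ssize_t n;

	assert(buf != NULL && err != NULL);
	n = pread(fd, buf, bytes, offset);
	if (n < 0) {
		int saved = errno;	/* save first: fprintf() may change errno */

		fprintf(stderr, "read failed at offset %lld: %s\n",
			(long long)offset, strerror(saved));
		*err = saved;
		return READ_ERROR;
	}
	if ((size_t)n != bytes) {
		/* Short read or EOF: pread() left errno alone, so choose EIO. */
		*err = EIO;
		return READ_SHORT;
	}
	*err = 0;
	return READ_OK;
}
```

Keeping short reads separate from real I/O errors lets the caller decide whether EOF deserves a "read failed" message or is treated as a normal condition; here only genuine system-call failures are printed.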

Also, it may not matter in the calling code (I
only did a quick check), but maybe it would be
better to set bp->b_error here, where the error
really occurred, rather than in libxfs_readbuf().
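A simplified sketch of that suggestion: the low-level read records the error in the buffer at the point of failure, and the higher-level read hands the buffer to its caller only when b_error is 0, so code like phase 6 never sees a half-filled buffer. struct sketch_buf, sketch_readbufr(), sketch_read_buf() and sketch_buf_free() are hypothetical stand-ins, not the real xfs_buf, libxfs_readbufr(), libxfs_readbuf() or libxfs_trans_read_buf(); mapping short reads to EIO is also an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Simplified stand-in for xfs_buf: only the fields this sketch needs. */
struct sketch_buf {
	char	*b_addr;	/* data buffer */
	size_t	b_bcount;	/* buffer size in bytes */
	int	b_error;	/* 0, or a positive errno value from the last read */
};

/* Low-level read: set b_error right where the failure is detected. */
static int
sketch_readbufr(int fd, off_t offset, struct sketch_buf *bp, size_t bytes)
{
	ssize_t n;

	assert(bytes <= bp->b_bcount);
	n = pread(fd, bp->b_addr, bytes, offset);
	if (n < 0)
		bp->b_error = errno;	/* real I/O error */
	else if ((size_t)n != bytes)
		bp->b_error = EIO;	/* short read/EOF: errno unset, assume EIO */
	else
		bp->b_error = 0;
	return bp->b_error;
}

/* Higher-level read: return the buffer only if the read succeeded. */
static int
sketch_read_buf(int fd, off_t offset, size_t bytes, struct sketch_buf **bpp)
{
	struct sketch_buf *bp;
	int error;

	*bpp = NULL;
	bp = calloc(1, sizeof(*bp));
	if (!bp)
		return ENOMEM;
	bp->b_addr = malloc(bytes);
	if (!bp->b_addr) {
		free(bp);
		return ENOMEM;
	}
	bp->b_bcount = bytes;

	error = sketch_readbufr(fd, offset, bp, bytes);
	if (error) {
		/* Propagate the error instead of handing out bad data. */
		free(bp->b_addr);
		free(bp);
		return error;
	}
	*bpp = bp;
	return 0;
}

static void
sketch_buf_free(struct sketch_buf *bp)
{
	if (bp) {
		free(bp->b_addr);
		free(bp);
	}
}
```

Callers then check the return value before touching the buffer, e.g. `if (sketch_read_buf(fd, off, len, &bp)) { /* report and skip this block */ }`.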

Other than that, this change looks good to me.

Reviewed-by: Alex Elder <aelder@xxxxxxx>

