Re: mkfs.xfs pagefault when removed storage during operation

To: Ajeet Yadav <ajeet.yadav.77@xxxxxxxxx>
Subject: Re: mkfs.xfs pagefault when removed storage during operation
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Wed, 02 Feb 2011 22:10:03 -0600
Cc: xfs@xxxxxxxxxxx
In-reply-to: <AANLkTi=wi_Fhr5v1J4wopvFTY=hC2EA_QmJu4Uc_XgGs@xxxxxxxxxxxxxx>
References: <AANLkTi=wi_Fhr5v1J4wopvFTY=hC2EA_QmJu4Uc_XgGs@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7
On 2/1/11 5:06 AM, Ajeet Yadav wrote:
> We are testing the stability of mkfs.xfs and xfs_repair, looking for
> crashes and other issues, especially with removable devices.
> Unfortunately, crashes do occur.
> Code inspection shows that in most cases the caller does not handle
> libxfs_readbuf() failures, i.e. a NULL return value.
> 
> I need your suggestion.
> Should we fix all such cases, or is the simplest way to exit if
> read() or write() fails with errno EIO in libxfs_readbufr() and
> libxfs_writebufr()?

I see very little reason to gracefully handle all error cases
during mkfs. It would be prettier, yes, but if mkfs fails, with
or without an error, with or without a segfault, you have to 
just start it over anyway, right?

I think there are better places to focus effort.

-Eric

> Fortunately these functions already support exiting if we pass the
> LIBXFS_EXIT_ON_FAILURE or LIBXFS_B_EXIT flags, but those flags are used
> only selectively.
> 
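To make the "exit on EIO" option concrete, here is a minimal sketch of a read wrapper that exits only when the caller asked for it and the error is EIO. It is not the libxfs implementation: sketch_read_block() and SKETCH_EXIT_ON_FAILURE are hypothetical names, the flag value is made up, and treating a short read as EIO is an assumption of this sketch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flag; not the value or name used by libxfs. */
#define SKETCH_EXIT_ON_FAILURE 0x1

/*
 * Read len bytes at offset off. Returns 0 on a full read, otherwise a
 * positive errno value. Exits the process only when the error is EIO
 * and the caller passed SKETCH_EXIT_ON_FAILURE.
 */
int
sketch_read_block(int fd, void *buf, size_t len, off_t off, int flags)
{
	ssize_t n;
	int err;

	assert(buf != NULL);

	n = pread(fd, buf, len, off);
	if (n == (ssize_t)len)
		return 0;

	/* Assumption: a short read is reported as an I/O error. */
	err = (n < 0) ? errno : EIO;

	if (err == EIO && (flags & SKETCH_EXIT_ON_FAILURE)) {
		fprintf(stderr, "read failed with EIO, exiting\n");
		exit(1);
	}
	return err;
}
```

With this shape, callers that do not pass the flag still have to check the return value, which is the case the rest of this report is about.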
> The current problem is in the function libxfs_trans_read_buf():
> 
>         bp = libxfs_readbuf(dev, blkno, len, flags);
> #ifdef XACT_DEBUG
>         fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp);
> #endif
>         xfs_buf_item_init(bp, tp->t_mountp);
>         bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
>         bip->bli_recur = 0;
>         xfs_trans_add_item(tp, (xfs_log_item_t *)bip);
> 
>         /* initialise b_fsprivate2 so we can find it incore */
>         XFS_BUF_SET_FSPRIVATE2(bp, tp);
>         *bpp = bp;
>         return 0;
> 
> If libxfs_readbuf() fails due to device removal or another error, bp = NULL.
> In xfs_buf_item_init(bp, tp->t_mountp), as soon as bp is dereferenced,
> the following occurs:
> 
> mkfs.xfs: unhandled page fault (11) at 0x00000070, code 0x017
> 
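The alternative to exiting, checking the buffer pointer before it is used, could look roughly like the sketch below. It only mirrors the shape of the quoted function: the sketch_* types and functions are hypothetical stand-ins rather than libxfs code, and returning a positive EIO on failure is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the buffer and transaction types. */
struct sketch_buf {
	void *fsprivate2;	/* would point back at the owning transaction */
};

struct sketch_trans {
	int unused;
};

/* Stand-in for the read: returns NULL when the "device" is gone. */
struct sketch_buf *
sketch_readbuf(struct sketch_buf *backing, int device_present)
{
	return device_present ? backing : NULL;
}

/* Same flow as the quoted code, plus a NULL check before any dereference. */
int
sketch_trans_read_buf(struct sketch_trans *tp, struct sketch_buf *backing,
		      int device_present, struct sketch_buf **bpp)
{
	struct sketch_buf *bp;

	assert(bpp != NULL);
	*bpp = NULL;

	bp = sketch_readbuf(backing, device_present);
	if (bp == NULL) {
		/* Read failed: report it instead of touching bp. */
		fprintf(stderr, "trans_read_buf: read failed\n");
		return EIO;
	}

	/* Only reached with a valid buffer. */
	bp->fsprivate2 = tp;
	*bpp = bp;
	return 0;
}
```

Every caller would then need to check the return value before using *bpp, which is the per-call-site work the question above is weighing against a single exit in the read path.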
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
> 
