mkfs.xfs pagefault when removed storage during operation
Eric Sandeen
sandeen at sandeen.net
Thu Feb 3 00:07:22 CST 2011
On 2/3/11 12:03 AM, Ajeet Yadav wrote:
> Sorry I do not agree, we have a bug so we cannot ignore it.
> Solving at first place can save a lot of time if same problem create a
> side effect that may sometime be very hard to catch.
>
> Now lets consider the current problem
> 1. Its related to libxfs in xfsprogs, so its not mkfs issue anymore
> 2. If we come across any critical problem in libxfs we can cross
> verify kernel xfs implementation to find if there also a logical
> issue.
> One learning and be used in other part.
> 3. Yes I agree that if mkfs.xfs fails we have to re-run it anyways,
> but then what is the difference between a novice code and professional
> product.
> If you cscope libxfs_trans_read_buf() in xfsprogs, its caller
> always checks the return value, and its used extensively in xfsprogs.
> But this function always return 0. Infact there is no error handding
> at all, lets not consider EIO error only.
> 4. We are here in open community out of need, at the same time to make
> it better.
>
> I was wondering why I am not getting any reply, I think mail subject
> was wrong......mkfs ;)
I may have jumped at that too quickly, yes ;)
> I will release the patch, please take out time to review it.
Well, that's fair enough, that's how it works - if it's important to you,
and you want to fix it, then you can! And if properly done it gets
merged.
-Eric
> On Thu, Feb 3, 2011 at 1:10 PM, Eric Sandeen <sandeen at sandeen.net> wrote:
>> On 2/1/11 5:06 AM, Ajeet Yadav wrote:
>>> We are testing mkfs.xfs and xfs_repair stability to look for crashes
>>> and other issues specially with removable devices.
>>> And unfortunately crashes does occur.
>>> Code inspection shows in most cases the caller does not handle
>>> libxfs_readbuf() for error cases i.e when return value = NULL.
>>>
>>> Now I need your suggestion.
>>> We should fix all such cases or the simplest way is to exit... if
>>> read() or write() fails with EIO errorno in libxfs_readbufr() and
>>> libxfs_writebufr().
>>
>> I see very little reason to gracefully handle all error cases
>> during mkfs. It would be prettier, yes, but if mkfs fails, with
>> or without an error, with or without a segfault, you have to
>> just start it over anyway, right?
>>
>> I think there are better places to focus effort.
>>
>> -Eric
>>
>>> Fortunately these function already support exit, if we use flag
>>> LIBXFS_EXIT_ON_FAILURE, LIBXFS_B_EXIT but they are used selectively.
>>>
>>> The current problem is related to function libxfs_trans_read_buf()
>>>
>>> bp = libxfs_readbuf(dev, blkno, len, flags);
>>> #ifdef XACT_DEBUG
>>> fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp);
>>> #endif
>>> xfs_buf_item_init(bp, tp->t_mountp);
>>> bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
>>> bip->bli_recur = 0;
>>> xfs_trans_add_item(tp, (xfs_log_item_t *)bip);
>>>
>>> /* initialise b_fsprivate2 so we can find it incore */
>>> XFS_BUF_SET_FSPRIVATE2(bp, tp);
>>> *bpp = bp;
>>> return 0;
>>>
>>> if libxfs_readbuf() fails due to device removal or other error, bp = NULL.
>>> In function xfs_buf_item_init(bp, tp->t_mountp) as soon as bp is
>>> dereferenced occurs
>>>
>>> mkfs.xfs: unhandled page fault (11) at 0x00000070, code 0x017
>>>
>>> _______________________________________________
>>> xfs mailing list
>>> xfs at oss.sgi.com
>>> http://oss.sgi.com/mailman/listinfo/xfs
>>>
>>
>>
>
More information about the xfs
mailing list