Re: mkfs.xfs pagefault when removed storage during operation

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: mkfs.xfs pagefault when removed storage during operation
From: Ajeet Yadav <ajeet.yadav.77@xxxxxxxxx>
Date: Thu, 3 Feb 2011 15:03:47 +0900
Cc: xfs@xxxxxxxxxxx
In-reply-to: <4D4A2A9B.6090803@xxxxxxxxxxx>
References: <AANLkTi=wi_Fhr5v1J4wopvFTY=hC2EA_QmJu4Uc_XgGs@xxxxxxxxxxxxxx> <4D4A2A9B.6090803@xxxxxxxxxxx>
Sorry, I do not agree. We have a bug, so we cannot ignore it.
Fixing it at the source can save a lot of time, because the same problem
may later show up as a side effect that is very hard to track down.

Now let's consider the current problem:
1. It is in libxfs in xfsprogs, so it is no longer just a mkfs issue.
2. If we come across any critical problem in libxfs, we can cross-check
the kernel xfs implementation to see whether the same logic error
exists there.
    What we learn in one place can be used in the other.
3. Yes, I agree that if mkfs.xfs fails we have to re-run it anyway,
but then what is the difference between novice code and a professional
product?
     If you cscope libxfs_trans_read_buf() in xfsprogs, its callers
always check the return value, and it is used extensively in xfsprogs.
But this function always returns 0. In fact there is no error handling
at all, and not only for EIO.
4. We are here in an open community out of need, and at the same time to
make it better.
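
To make point 3 concrete, here is a minimal sketch (not the patch itself) of
a read wrapper that hands the failure back to its caller instead of always
returning 0. Every demo_* name and the device_present flag are hypothetical
stand-ins so that the example compiles on its own; they are not the real
xfsprogs types or functions. Returning a positive EIO is also only an
assumption made for this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the xfsprogs buffer and transaction types. */
struct demo_buf { int blkno; };
struct demo_trans { int nitems; };

/*
 * Stand-in for a read routine such as libxfs_readbuf(): it returns NULL
 * when the read fails, simulated here by device_present == 0.
 */
struct demo_buf *demo_readbuf(int blkno, int device_present)
{
    static struct demo_buf buf;

    if (!device_present)
        return NULL;
    buf.blkno = blkno;
    return &buf;
}

/*
 * Sketch of a trans_read_buf-style wrapper. The NULL check comes before
 * any use of bp, so a failed read becomes an error code for the caller
 * rather than a NULL dereference.
 */
int demo_trans_read_buf(struct demo_trans *tp, int blkno,
                        int device_present, struct demo_buf **bpp)
{
    struct demo_buf *bp = demo_readbuf(blkno, device_present);

    *bpp = NULL;
    if (bp == NULL)
        return EIO;   /* assumption: positive errno value on failure */

    tp->nitems++;     /* stands in for attaching the buffer to the transaction */
    *bpp = bp;
    return 0;
}
```

Since the callers already check the return value, a non-zero result here
would give them something real to act on.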

I was wondering why I was not getting any reply; I think the mail subject
was wrong... mkfs ;)
I will post the patch; please take the time to review it.

On Thu, Feb 3, 2011 at 1:10 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
> On 2/1/11 5:06 AM, Ajeet Yadav wrote:
>> We are testing mkfs.xfs and xfs_repair stability to look for crashes
>> and other issues, especially with removable devices.
>> And unfortunately crashes do occur.
>> Code inspection shows that in most cases the caller does not handle
>> the error case of libxfs_readbuf(), i.e. when the return value is NULL.
>>
>> Now I need your suggestion.
>> Should we fix all such cases, or is the simplest way to exit if
>> read() or write() fails with errno EIO in libxfs_readbufr() and
>> libxfs_writebufr()?
>
> I see very little reason to gracefully handle all error cases
> during mkfs. It would be prettier, yes, but if mkfs fails, with
> or without an error, with or without a segfault, you have to
> just start it over anyway, right?
>
> I think there are better places to focus effort.
>
> -Eric
>
>> Fortunately these functions already support exiting if we use the flags
>> LIBXFS_EXIT_ON_FAILURE and LIBXFS_B_EXIT, but they are used selectively.
>>
>> The current problem is related to function libxfs_trans_read_buf()
>>
>>         bp = libxfs_readbuf(dev, blkno, len, flags);
>> #ifdef XACT_DEBUG
>>         fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp);
>> #endif
>>         xfs_buf_item_init(bp, tp->t_mountp);
>>         bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
>>         bip->bli_recur = 0;
>>         xfs_trans_add_item(tp, (xfs_log_item_t *)bip);
>>
>>         /* initialise b_fsprivate2 so we can find it incore */
>>         XFS_BUF_SET_FSPRIVATE2(bp, tp);
>>         *bpp = bp;
>>         return 0;
>>
>> If libxfs_readbuf() fails due to device removal or another error, bp is NULL.
>> As soon as xfs_buf_item_init(bp, tp->t_mountp) dereferences bp, this
>> occurs:
>>
>> mkfs.xfs: unhandled page fault (11) at 0x00000070, code 0x017