xfs
[Top] [All Lists]

Re: [PATCH] Re-dirty pages on I/O error

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: [PATCH] Re-dirty pages on I/O error
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Thu, 11 Sep 2008 16:21:40 -0500
Cc: Lachlan McIlroy <lachlan@xxxxxxx>, xfs-dev <xfs-dev@xxxxxxx>, xfs-oss <xfs@xxxxxxxxxxx>
In-reply-to: <20080911103342.GA17482@xxxxxxxxxxxxx>
References: <48C8D8CD.7050508@xxxxxxx> <20080911103342.GA17482@xxxxxxxxxxxxx>
User-agent: Thunderbird 2.0.0.16 (X11/20080723)
Christoph Hellwig wrote:
> On Thu, Sep 11, 2008 at 06:37:33PM +1000, Lachlan McIlroy wrote:
>> If we get an error in xfs_page_state_convert() - and it's not EAGAIN - then
>> we throw away the dirty page without converting the delayed allocation.  This
>> leaves delayed allocations that can never be removed and confuses code that
>> expects a flush of the file to clear them.  We need to re-dirty the page on
>> error so we can try again later or report that the flush failed.
>>
>> --- a/fs/xfs/linux-2.6/xfs_aops.c    2008-09-11 16:32:11.000000000 +1000
>> +++ b/fs/xfs/linux-2.6/xfs_aops.c    2008-09-11 15:44:09.000000000 +1000
>> @@ -1147,16 +1147,6 @@ error:
>>      if (iohead)
>>              xfs_cancel_ioend(iohead);
>>
>> -    /*
>> -     * If it's delalloc and we have nowhere to put it,
>> -     * throw it away, unless the lower layers told
>> -     * us to try again.
>> -     */
>> -    if (err != -EAGAIN) {
>> -            if (!unmapped)
>> -                    block_invalidatepage(page, 0);
>> -            ClearPageUptodate(page);
>> -    }
> 
> While this always looked fishy to me we it needs a good explanation to
> kill this.  I try to remember why Steve did it this way long time ago.

Hrm some of that was my fault, ages ago.

http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/pagebuf/Attic/page_buf_io.c.diff?r1=1.2;r2=1.3;hideattic=0

I don't remember the details fo why.... ah here's a clue

http://oss.sgi.com/archives/xfs/2002-01/msg00475.html

> As _xfs_force_shutdown was written, it tried to schedule in an interrupt 
> context
> and caused a BUG() to be thrown.

> Also, even if we didn't try to deal with leftover buffers in the interrupt,
> they subsequently had their delalloc flags removed, and thus queued up
> to clobber block 0 (1,2,3) on the disk, thus corrupting the filesystem.

so back then, delalloc buffers w/o a home would eventually slam into the
superblock, I guess.

Anyway, if this is redirtied, will it ever go away for an IO error that
persists?

-Eric

>> @@ -1216,8 +1206,11 @@ xfs_vm_writepage(
>>       * then mark the page dirty again and leave the page
>>       * as is.
>>       */
>> -    if (current_test_flags(PF_FSTRANS) && need_trans)
>> -            goto out_fail;
>> +    if (current_test_flags(PF_FSTRANS) && need_trans) {
>> +            redirty_page_for_writepage(wbc, page);
>> +            unlock_page(page);
>> +            return -EAGAIN;
> 
> The redirty, unlock, return sequence is duplicated after your
> patch, I think we should still keep the out_fail goto.  Also returning
> -EGAIN from ->writepage is wrong.  The return values goes through
> handle_write_error and mapping_set_error into the return value of e.g.
> msync.  If you look at all similar writepage implementation they only
> return a negative error for a real error condition and simply return 0
> when just redirtying it due to transaction constraints or when trylocks
> fail.
> 
> 

<Prev in Thread] Current Thread [Next in Thread>