
Re: update: Re: XFS 1.2 bug?

To: Russell Cattelan <cattelan@xxxxxxxxxxx>
Subject: Re: update: Re: XFS 1.2 bug?
From: Steven Dake <sdake@xxxxxxxxxx>
Date: Wed, 26 Mar 2003 10:22:58 -0700
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <3E8151E4.809@thebarn.com>
References: <3E809EB1.6000904@mvista.com> <1048639851.11129.44.camel@rose.americas.sgi.com> <3E80FEB5.90801@mvista.com> <3E8106CD.2050101@mvista.com> <3E8151E4.809@thebarn.com>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021130
Russell, thanks for the responses and questions below.

Russell Cattelan wrote:

Ok I've looked over the code.
write_full_page is modeled after block_write_full_page.

Is block_write_full_page something new in XFS CVS? I don't see it in 1.2...

The past-the-end-of-file check is the same check used in that function.
Basically the offset var is used to see if the size of the file ends part
way into a page, so if the size of the file lands on a page boundary,
offset will be 0.

I understand that that is what the code is doing, but I was wondering why write_full_page should fail when the file size is on a page boundary. If the file size lands on a page boundary, shouldn't XFS be able to write the page just fine?



As far as the unlock_page in the error case ... again based on block_write_full_page
it seems correct to unlock the page in case of an error.

If you unlock the page here, you must not unlock it in generic_file_write_nolock. It can only be unlocked once, so the code should pick one place to unlock it. Since it will be unlocked later in generic_file_write_nolock, I don't understand the need to unlock it in write_full_page at all. The only case I can think of for unlocking the page here is when write_full_page is _not_ called from generic_file_write_nolock. Perhaps that is happening somewhere, and that is the reason for the unlock; but if so, generic_file_write_nolock must not unlock the page a second time. The error code from write_full_page is never passed up to generic_file_write_nolock, so generic_file_write_nolock is never aware that it should _not_ unlock the page because the I/O had a previous error.



Do you have the backtrace of the crash again? I'm wondering who called write_full_page
in this case.

The crash is not in write_full_page. write_full_page is the first caller to unlock the page. The crash occurs in generic_file_write_nolock which also calls unlock_page on the same page as write_full_page did, causing a BUG() call. The calling chain is:


First generic_file_write_nolock is entered from a system call write:

system_call
sys_write
linvfs_write
xfs_write

Then the page is unlocked via this call chain:
generic_file_write_nolock
generic_commit_write
__block_commit_write
balance_dirty
write_some_buffers
write_buffer_delay
linvfs_write_full_page
write_full_page
unlock_page

Then the page is unlocked again via this call chain:
generic_file_write_nolock
unlock_page



Steven Dake wrote:

Russell,

I had a deeper look at why write_full_page was failing...  In the code:
STATIC int
write_full_page(
   struct page     *page,
   int         delalloc)
{
   struct inode        *inode = (struct inode*)page->mapping->host;
   unsigned long       end_index = inode->i_size >> PAGE_CACHE_SHIFT;
   int         ret;

   /* Are we off the end of the file ? */
   if (page->index >= end_index) {
       unsigned offset = inode->i_size & (PAGE_CACHE_SIZE-1);
       if ((page->index >= end_index+1) || !offset) {
           ret = -EIO;
           printk("Off the end of the file %lu %lu %u %Ld\n",
                  page->index, end_index, offset, inode->i_size);
           goto out;
       }
   }


I added the above printk and got the output
Mar 24 22:49:47 192 kernel: Off the end of the file 91406 91406 0 374398976


offset is zero because 374398976 & ((1 << 12) - 1) is zero. This operation looks like it should proceed, since the size is ok and the page index and end index are within range. Why should we error out if size & (PAGE_CACHE_SIZE-1) is zero?

Any help appreciated on the two issues
(the offset issue and the extra unlock_pages)

Thanks!
-steve

Steven Dake wrote:

Russell,

Thanks for the hints but I believe I found a bug in XFS 1.2, and perhaps you could verify it since I'm not very familiar with the XFS code.

Both linvfs_write_full_page and write_full_page will attempt to unlock_page in some circumstances. The page is previously locked in generic_file_write_nolock, which will unlock the page after it is done using it. Unlocking a page twice results in an Oops, so either the mm layer (generic_file_write_nolock) shouldn't be unlocking the page, or the xfs code in linvfs_write_full_page and write_full_page shouldn't unlock the page a second time. The unlock in the xfs code is suspect, since it only unlocks under certain conditions (which will result in an Oops).

The function linvfs_write_full_page unlocks a page in the case of an error condition. I have not been able to hit this error condition, but if it occurs, an Oops will most certainly follow.

STATIC int
linvfs_write_full_page(
   struct page     *page)
{
   int         flagset = 0;
   int         error;
   int         need_trans;
   int         nr_delalloc, nr_unmapped;

   if (count_page_state(page, &nr_delalloc, &nr_unmapped)) {
       need_trans = nr_delalloc + nr_unmapped;
   } else {
       need_trans = 1;
   }

   if ((current->flags & (PF_FSTRANS|PF_NOIO)) && need_trans)
       goto out_fail;

   if (need_trans) {
       current->flags |= PF_NOIO;
       flagset = 1;
   }

   error = write_full_page(page, nr_delalloc);

   if (flagset)
       current->flags &= ~PF_NOIO;
   return error;

out_fail:
   SetPageDirty(page);
   unlock_page(page);
   return 0;
}

The function write_full_page will unlock the page if delalloc_convert returns a negative value. I don't understand why delalloc_convert returns a negative value, or whether the page should be unlocked there, but it seems that it shouldn't. Could you give me some background on why the page is unlocked at this point? If I comment out this unlock, bonnie++ does not crash and the filesystem still seems stable, but I'd like to understand _why_ this condition is occurring, and whether the unlock_page is really needed or letting the mm handle it is appropriate.

STATIC int
write_full_page(
   struct page     *page,
   int         delalloc)
{
   struct inode        *inode = (struct inode*)page->mapping->host;
   unsigned long       end_index = inode->i_size >> PAGE_CACHE_SHIFT;
   int         ret;

   /* Are we off the end of the file ? */
   if (page->index >= end_index) {
       unsigned offset = inode->i_size & (PAGE_CACHE_SIZE-1);
       if ((page->index >= end_index+1) || !offset) {
           ret =  -EIO;
           goto out;
       }
   }

   if (!page_has_buffers(page)) {
       create_empty_buffers(page, inode->i_dev, 1 << inode->i_blkbits);
   }

   ret = delalloc_convert(inode, page, 1, delalloc == 0);

out:
   if (ret < 0) {
       /*
        * If it's delalloc and we have nowhere to put it,
        * throw it away.
        */
       if (delalloc)
           block_flushpage(page, 0);
       ClearPageUptodate(page);
//      unlock_page(page);
   }

   return ret;
}

It looks like the real problem is that no error return is propagated up the call chain, so generic_file_write_nolock has no way of knowing it should not unlock the page when linvfs_write_full_page returns an error. Worse, the return values of some of the intermediate calls are used to indicate state, so they cannot be reused as error returns...

The call stack that gets to the point where the page is unlocked is:
generic_file_write_nolock()
generic_commit_write()
__block_commit_write()
balance_dirty()
write_some_buffers()
write_buffer_delay()
linvfs_write_full_page()
write_full_page()

After that, generic_file_write_nolock finishes and unlocks that same page again, causing an Oops.

Thanks!
-steve

Russell Cattelan wrote:

On Tue, 2003-03-25 at 12:23, Steven Dake wrote:


XFS Developers,

Ok here is what I have done:

I have a tree that already had XFS 1.1 (with a bunch of other stuff) included. I hand-applied the XFS-1.2-on-top-of-XFS-1.1 patch and everything seems to work fine, except that when I run bonnie++ under load (the "Writing intelligently" operation), I receive the following Oops:




If possible, get kdb into your tree. Then open a bug and attach whatever backtraces you can get. bta is a good place to start.

That might shed some light on the situation, but without wading through
the lock changes you might have in your tree, this problem may be hard to track down.


-Russell
