Russell,
Thanks for the hints but I believe I found a bug in XFS 1.2, and
perhaps you could verify it since I'm not very familiar with the XFS
code.
both linvfs_write_full_page and write_full_page will attempt to
unlock_page in some circumstances. The page is previously locked in
the function generic_file_write_nolock. This
generic_file_write_nolock function will unlock the page after it is
done using it. Unlocking a page twice will result in an Oops, so
either the mm layer (generic_file_write_nolock) shouldn't be
unlocking the page, or the xfs code in linvfs_write_full_page and
write_full_page shouldn't unlock the page a second time. The unlock
in the xfs code is suspect, since it only unlocks on certain
conditions (which will result in an Oops)
The function linvfs_write_full_page unlocks a page in the case of an
error conditon. I have not been able to hit this error condition,
but if it occurs, an Oops will most certainly occur.
STATIC int
linvfs_write_full_page(
struct page *page)
{
int flagset = 0;
int error;
int need_trans;
int nr_delalloc, nr_unmapped;
if (count_page_state(page, &nr_delalloc, &nr_unmapped)) {
need_trans = nr_delalloc + nr_unmapped;
} else {
need_trans = 1;
}
if ((current->flags & (PF_FSTRANS|PF_NOIO)) && need_trans)
goto out_fail;
if (need_trans) {
current->flags |= PF_NOIO;
flagset = 1;
}
error = write_full_page(page, nr_delalloc);
if (flagset)
current->flags &= ~PF_NOIO;
return error;
out_fail:
SetPageDirty(page);
unlock_page(page);
return 0;
}
The function write_full_page will unlock the page if delalloc_convert
returns a negative value. I don't understand why delalloc_convert
returns a negative value, and if the page should be unlocked, but it
seems that it shouldn't. Could you give me some background on why
the page is unlocked at this point? If I comment out this unlock,
bonnie++ does not crash and the filesystem seems still stable, but
I'd like to understand _why_ this condition is occuring and if the
unlock_page is really needed, or if letting the mm handle it is
appropriate.
STATIC int
write_full_page(
struct page *page,
int delalloc)
{
struct inode *inode = (struct inode*)page->mapping->host;
unsigned long end_index = inode->i_size >> PAGE_CACHE_SHIFT;
int ret;
/* Are we off the end of the file ? */
if (page->index >= end_index) {
unsigned offset = inode->i_size & (PAGE_CACHE_SIZE-1);
if ((page->index >= end_index+1) || !offset) {
ret = -EIO;
goto out;
}
}
if (!page_has_buffers(page)) {
create_empty_buffers(page, inode->i_dev, 1 << inode->i_blkbits);
}
ret = delalloc_convert(inode, page, 1, delalloc == 0);
out:
if (ret < 0) {
/*
* If it's delalloc and we have nowhere to put it,
* throw it away.
*/
if (delalloc)
block_flushpage(page, 0);
ClearPageUptodate(page);
// unlock_page(page);
}
return ret;
}
It looks like the real problem is that no error return is processed
to not unlock the buffer if an error is returned by
linvfs_write_full_page by the function generic_file_write_nolock in
the call chain but the return values are used to indicate state in
some of the calls and cannot be used as error returns...
The call stack that gets to the point where the page is unlocked is:
generic_file_write_nolock()
generic_commit_write()
__block_commit_write()
balance_dirty()
write_some_buffers()
write_buffer_delay()
linvfs_write_full_page()
write_full_page()
After that the finish of the call generic_file_write_nolock will also
unlock that same page causing an oops.
Thanks!
-steve
Rusell Cattelan wrote:
On Tue, 2003-03-25 at 12:23, Steven Dake wrote:
XFS Developers,
Ok here is what I have done:
I have a tree that already had XFS 1.1 (with a bunch of other
stuff) included. I hand applied the XFS 1.2 on top of XFS 1.1
patch and everything seems to work fine, except when I run bonnie++
and under load (Writing intelligently operation), I receive the
following Oops:
If possible get kbd into your tree.
Then open a bug and attach what ever backtraces you can get.
bta is a good place to start.
That might shed some light on the situation, but without wading through
the lock changes you might have in your tree this problem may be
hard to track down.
-Russell