Received: with ECARTIS (v1.0.0; list linux-xfs); Tue, 25 Mar 2003 17:24:59 -0800 (PST) Received: from zipcode.az.mvista.com (fw-az.mvista.com [65.200.49.158]) by oss.sgi.com (8.12.8/8.12.5) with SMTP id h2Q1OFq9011489 for ; Tue, 25 Mar 2003 17:24:56 -0800 Received: from mvista.com (persist.az.mvista.com [10.50.1.246]) by zipcode.az.mvista.com (8.9.3/8.9.3) with ESMTP id SAA14226; Tue, 25 Mar 2003 18:35:47 -0700 Message-ID: <3E80FEB5.90801@mvista.com> Date: Tue, 25 Mar 2003 18:13:25 -0700 From: Steven Dake User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Rusell Cattelan CC: linux-xfs@oss.sgi.com Subject: XFS 1.2 bug? References: <3E809EB1.6000904@mvista.com> <1048639851.11129.44.camel@rose.americas.sgi.com> In-Reply-To: <1048639851.11129.44.camel@rose.americas.sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 3348 X-ecartis-version: Ecartis v1.0.0 Sender: linux-xfs-bounce@oss.sgi.com Errors-to: linux-xfs-bounce@oss.sgi.com X-original-sender: sdake@mvista.com Precedence: bulk X-list: linux-xfs Content-Length: 4631 Lines: 156 Russell, Thanks for the hints but I believe I found a bug in XFS 1.2, and perhaps you could verify it since I'm not very familiar with the XFS code. both linvfs_write_full_page and write_full_page will attempt to unlock_page in some circumstances. The page is previously locked in the function generic_file_write_nolock. This generic_file_write_nolock function will unlock the page after it is done using it. Unlocking a page twice will result in an Oops, so either the mm layer (generic_file_write_nolock) shouldn't be unlocking the page, or the xfs code in linvfs_write_full_page and write_full_page shouldn't unlock the page a second time. The unlock in the xfs code is suspect, since it only unlocks on certain conditions (which will result in an Oops) The function linvfs_write_full_page unlocks a page in the case of an error conditon. I have not been able to hit this error condition, but if it occurs, an Oops will most certainly occur. STATIC int linvfs_write_full_page( struct page *page) { int flagset = 0; int error; int need_trans; int nr_delalloc, nr_unmapped; if (count_page_state(page, &nr_delalloc, &nr_unmapped)) { need_trans = nr_delalloc + nr_unmapped; } else { need_trans = 1; } if ((current->flags & (PF_FSTRANS|PF_NOIO)) && need_trans) goto out_fail; if (need_trans) { current->flags |= PF_NOIO; flagset = 1; } error = write_full_page(page, nr_delalloc); if (flagset) current->flags &= ~PF_NOIO; return error; out_fail: SetPageDirty(page); unlock_page(page); return 0; } The function write_full_page will unlock the page if delalloc_convert returns a negative value. I don't understand why delalloc_convert returns a negative value, and if the page should be unlocked, but it seems that it shouldn't. Could you give me some background on why the page is unlocked at this point? If I comment out this unlock, bonnie++ does not crash and the filesystem seems still stable, but I'd like to understand _why_ this condition is occuring and if the unlock_page is really needed, or if letting the mm handle it is appropriate. STATIC int write_full_page( struct page *page, int delalloc) { struct inode *inode = (struct inode*)page->mapping->host; unsigned long end_index = inode->i_size >> PAGE_CACHE_SHIFT; int ret; /* Are we off the end of the file ? */ if (page->index >= end_index) { unsigned offset = inode->i_size & (PAGE_CACHE_SIZE-1); if ((page->index >= end_index+1) || !offset) { ret = -EIO; goto out; } } if (!page_has_buffers(page)) { create_empty_buffers(page, inode->i_dev, 1 << inode->i_blkbits); } ret = delalloc_convert(inode, page, 1, delalloc == 0); out: if (ret < 0) { /* * If it's delalloc and we have nowhere to put it, * throw it away. */ if (delalloc) block_flushpage(page, 0); ClearPageUptodate(page); // unlock_page(page); } return ret; } It looks like the real problem is that no error return is processed to not unlock the buffer if an error is returned by linvfs_write_full_page by the function generic_file_write_nolock in the call chain but the return values are used to indicate state in some of the calls and cannot be used as error returns... The call stack that gets to the point where the page is unlocked is: generic_file_write_nolock() generic_commit_write() __block_commit_write() balance_dirty() write_some_buffers() write_buffer_delay() linvfs_write_full_page() write_full_page() After that the finish of the call generic_file_write_nolock will also unlock that same page causing an oops. Thanks! -steve Rusell Cattelan wrote: >On Tue, 2003-03-25 at 12:23, Steven Dake wrote: > > >>XFS Developers, >> >>Ok here is what I have done: >> >>I have a tree that already had XFS 1.1 (with a bunch of other stuff) >>included. I hand applied the XFS 1.2 on top of XFS 1.1 patch and >>everything seems to work fine, except when I run bonnie++ and under load >>(Writing intelligently operation), I receive the following Oops: >> >> >> >If possible get kbd into your tree. >Then open a bug and attach what ever backtraces you can get. >bta is a good place to start. > >That might shed some light on the situation, but without wading through >the lock changes you might have in your tree this problem may be hard >to track down. > >-Russell > > > > > >