BUG() in end_page_writeback(), stack overflows and system speed decrease

To: xfs@xxxxxxxxxxx
Subject: BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB
From: Juergen Urban <JuergenUrban@xxxxxx>
Date: Thu, 19 Nov 2009 09:57:45 +0100
User-agent: KMail/1.10.3 (Linux/2.6.27-7-generic; KDE/4.1.3; i686; ; )
Hello,

my machine has been very unstable since I started using XFS on an external
USB hard disk (855 GByte XFS partition on a 1 TByte disk). One problem was
stack overflows caused by the large stack usage of XFS, USB, SCSI and VFS in
Linux 2.6.23.13. NFS on XFS caused many more stack overflows. I think I worked
around the stack overflows by disabling preemption, SMP and NFS in the kernel,
but I am not sure about it; I don't think I have seen a message from the stack
overflow detection since then. I also tried a Live-CD (KNOPPIX), but it shows
the same problems. I exchanged some of the hardware. XFS also decreases system
performance. I use the Linux VDR with DVB-S, which seems to make the problems
worse. Before using XFS I was able to record 3 high-bandwidth streams in
parallel. Now the system has problems recording even one high-bandwidth stream.
The system became somewhat usable after I changed the IO scheduler to deadline.
It is difficult to get a good backtrace of the kernel crash, because the
kernel log is not saved to the internal hard disk (reiserfs and ext3). I was
able to find out that XFS triggers a BUG() in end_page_writeback() at
mm/filemap.c:552:

void end_page_writeback(struct page *page)
{
        if (!TestClearPageReclaim(page) || rotate_reclaimable_page(page)) {
                if (!test_clear_page_writeback(page))
                        BUG();
        }
        smp_mb__after_clear_bit();
        wake_up_page(page, PG_writeback);
}
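
For illustration, the condition behind that BUG() can be modelled in userspace.
In the function above, test_clear_page_writeback() clears the writeback flag
and reports whether it was set; BUG() is reached when it was not set, for
example if writeback is completed a second time on the same page. The sketch
below is only a simplified model under that assumption: the model_* names and
MODEL_PG_WRITEBACK are hypothetical stand-ins, the reclaim check is left out,
and it does not claim to reproduce what XFS actually does in this crash.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel definitions. */
enum { MODEL_PG_WRITEBACK = 1u << 0 };

struct model_page {
    unsigned int flags;
};

/* Clear the writeback bit and report whether it was set beforehand. */
static bool model_test_clear_writeback(struct model_page *page)
{
    bool was_set = (page->flags & MODEL_PG_WRITEBACK) != 0;

    page->flags &= ~(unsigned int)MODEL_PG_WRITEBACK;
    return was_set;
}

/*
 * Simplified model of end_page_writeback(): returns true on success and
 * false at the point where the kernel function would call BUG().
 */
static bool model_end_page_writeback(struct model_page *page)
{
    return model_test_clear_writeback(page);
}

/*
 * Complete writeback twice on one page and count how many of the two
 * calls would have hit BUG().
 */
static int model_double_completion(void)
{
    struct model_page page = { .flags = MODEL_PG_WRITEBACK };
    int bugs = 0;

    bool first = model_end_page_writeback(&page);
    assert(first);              /* flag was set, so the first call is fine */

    if (!model_end_page_writeback(&page))
        bugs++;                 /* flag already cleared: BUG() in the kernel */

    return bugs;
}
```

In this model the first completion succeeds and the second one fails, which is
the same check that fails at mm/filemap.c:552 when the page is not marked as
under writeback.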

The backtrace looks like this (sorry, I had to copy it from the screen by
hand, and it is incomplete):

end_page_writeback()
end_buffer_async_write()
update_stats_wait_end()
xfs_setfilesize()
xfs_???_dealloc()
xfs_destroy_ioend()
run_workqueue()

After searching the code I found this comment:
/* TODO: cleanup count and page_dirty */

It seems that page_dirty may be handled incorrectly and could cause the
problem, but I don't know what this code is for. The same comment is still in
the latest source code from GIT.
After running the system for a while, I was able to trigger the kernel crash
by running "sync" on the command line.
My stack traces often include dvb_dmx_swfilter_packets(), do_IRQ()/tasklets
and sys_write()/vfs_write(). In most cases I can't scroll up to see the rest.
Can anyone help me?
Is there an easy way to back up the data or replace the file system without
a kernel crash in between?

Best regards
Juergen Urban
