BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB
Eric Sandeen
sandeen at sandeen.net
Thu Nov 19 12:00:16 CST 2009
Juergen Urban wrote:
> Hello,
>
> My machine has been very unstable since I started using XFS on an external
> USB hard disk (855 GByte XFS partition on a 1 TByte disk). One problem was
> stack overflows caused by the large stack usage of XFS, USB, SCSI and VFS in
> Linux 2.6.23.13. NFS on XFS caused many more stack overflows. I think I got
> around the stack overflows by disabling preemption, SMP and NFS in Linux, but
> I am not sure. I don't think I have received a message from the stack
> overflow detection since then.
Are you on 4k stacks? To be honest I'd still expect things to be mostly
ok stack-wise even if so.
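If the kernel config is at hand as plain text (for example a copy of the
config file shipped with the kernel, or /proc/config.gz after decompression
where the kernel provides it), the 4k-stack question comes down to whether
the line CONFIG_4KSTACKS=y appears in it. A minimal C sketch of that check
follows. config_has_4kstacks() is a made-up helper name, not something from
the kernel tree, and the sketch assumes an architecture that offers this
option at all (32-bit x86 did at the time).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the given kernel config text
 * contains "CONFIG_4KSTACKS=y" as a complete line. A disabled option
 * shows up as "# CONFIG_4KSTACKS is not set" and does not match.
 */
bool config_has_4kstacks(const char *config_text)
{
    static const char key[] = "CONFIG_4KSTACKS=y";
    const size_t key_len = sizeof(key) - 1;
    const char *p = config_text;

    assert(config_text != NULL);

    while ((p = strstr(p, key)) != NULL) {
        /* Accept the match only if it spans a whole line. */
        bool at_line_start = (p == config_text) || (p[-1] == '\n');
        char next = p[key_len];
        bool at_line_end = (next == '\0' || next == '\n' || next == '\r');

        if (at_line_start && at_line_end)
            return true;
        p += key_len;
    }
    return false;
}
```

Reading the config file into a buffer and passing it to the helper is left
out; any plain read of the whole file into a NUL-terminated string will do.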
> I also tried a Live-CD (KNOPPIX), but the same problems occur there. I
> exchanged some of the hardware. XFS is decreasing system performance. I use
> the Linux VDR with DVB-S, which seems to make the problems worse. I was able
> to record 3 high bandwidth streams in parallel before using XFS.
Really, you could record 3 parallel high-def TV streams to ext3 via USB?
I guess I'm a little surprised...
> Now it has trouble recording even one high bandwidth stream. The system
> became somewhat usable after I changed the IO scheduler to deadline.
> It is difficult to get a good backtrace of the kernel crash, because the
> backlog is not saved on the internal hard disk (reiserfs and ext3). I was
> able to find out that XFS triggers a BUG() in end_page_writeback() at
> mm/filemap.c:552:
>
> void end_page_writeback(struct page *page)
> {
>         if (!TestClearPageReclaim(page) || rotate_reclaimable_page(page)) {
>                 if (!test_clear_page_writeback(page))
>                         BUG();
>         }
>         smp_mb__after_clear_bit();
>         wake_up_page(page, PG_writeback);
> }
Regarding the bug, if there is any way to test a kernel newer than .23,
I'd start there; I don't know offhand of a bug that was fixed here, but
.23 was a long time ago...
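For what it's worth, the inner check in the quoted function fires when
test_clear_page_writeback() finds the writeback flag already clear, i.e.
writeback is being ended on a page that is not (or is no longer) marked as
under writeback. The following is only a user-space sketch of that one
condition, not kernel code: struct model_page, MODEL_PG_WRITEBACK and the
model_* functions are invented for illustration, the flag operations are not
atomic as they are in the kernel, and the PG_reclaim /
rotate_reclaimable_page() branch is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one page flag this sketch cares about. */
#define MODEL_PG_WRITEBACK 0x1u

/* Hypothetical stand-in for struct page: just a flags word. */
struct model_page {
    unsigned int flags;
};

/* Clear `bit` and report whether it was set beforehand (not atomic). */
static bool model_test_and_clear(struct model_page *page, unsigned int bit)
{
    bool was_set = (page->flags & bit) != 0;

    page->flags &= ~bit;
    return was_set;
}

/*
 * Models only the inner check of end_page_writeback():
 * returns true if the page really was under writeback,
 * false in the case where the kernel would hit BUG().
 */
bool model_end_page_writeback(struct model_page *page)
{
    assert(page != NULL);
    return model_test_and_clear(page, MODEL_PG_WRITEBACK);
}
```

In this model, ending writeback twice on the same page returns true the
first time and false the second time; a completion that runs twice for one
page is one way to reach that state, though nothing in the report so far
shows that this is what happens here.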
> The backtrace looks like this (sorry, I had to copy it down from the screen
> and I don't have everything):
>
> end_page_writeback()
> end_buffer_async_write()
> update_stats_wait_end()
> xfs_setfilesize()
> xfs_???_dealloc()
> xfs_destroy_ioend()
> run_workqueue()
>
> After searching in the code I found:
> /* TODO: cleanup count and page_dirty */
>
> It seems that page_dirty may be handled incorrectly and could cause the
> problem, but I don't know the purpose of this code. The same comment is in
> the latest source code from GIT.
> After running the system for a while, I was able to trigger the kernel crash
> by running "sync" on the command line.
> My stack traces often include dvb_dmx_swfilter_packets(), do_IRQ()/tasklets
> and sys_write()/vfs_write(). I can't scroll up in most situations.
> Can anyone help me?
> Is there an easy way to back up the data or replace the file system without
> a kernel crash in between?
You should certainly be able to copy data off xfs via usb; if it's
failing, I guess we'll need more info to find out why, but I'd suggest
at least booting a newer livecd to do that copy and see if things fare
better.
-Eric
> Best regards
> Juergen Urban