To: Juergen Urban <JuergenUrban@xxxxxx>
Subject: Re: BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Thu, 19 Nov 2009 12:00:16 -0600
Cc: xfs@xxxxxxxxxxx
In-reply-to: <200911190957.45957.JuergenUrban@xxxxxx>
References: <200911190957.45957.JuergenUrban@xxxxxx>
User-agent: Thunderbird 2.0.0.21 (X11/20090320)

Juergen Urban wrote:
> Hello,
> 
> My machine has been very unstable since I started using XFS on an external
> USB hard disk (855 GByte XFS partition on a 1 TByte disk). One problem was
> stack overflows caused by the large stack usage of XFS, USB, SCSI and VFS in
> Linux 2.6.23.13. NFS on XFS caused many more stack overflows. I think I got
> around the stack overflows by disabling preemption, SMP and NFS in Linux,
> but I am not sure about it. I don't think I have seen a message from the
> stack overflow detection since then.

Are you on 4k stacks?  To be honest I'd still expect things to be mostly
ok stack-wise even if so.

> I also tried a Live-CD (KNOPPIX), but the same problems occur. I exchanged
> some of the hardware. XFS is decreasing system performance. I use the Linux
> VDR with DVB-S, which seems to make the problems worse. I was able to record
> 3 high-bandwidth streams in parallel before using XFS.

Really, you could record 3 parallel high-def TV streams to ext3 via USB?
I guess I'm a little surprised...

> Now it has problems recording one high-bandwidth stream. The system became
> somewhat usable after I changed the IO scheduler to deadline.
> It is difficult to get a good backtrace of the kernel crash, because the
> backlog is not saved on the internal hard disk (reiserfs and ext3). I was
> able to find out that XFS triggers a BUG() in end_page_writeback() at
> mm/filemap.c:552:
> 
> void end_page_writeback(struct page *page)
> {
>         if (!TestClearPageReclaim(page) || rotate_reclaimable_page(page)) {
>                 if (!test_clear_page_writeback(page))
>                         BUG();
>         }
>         smp_mb__after_clear_bit();
>         wake_up_page(page, PG_writeback);
> }

Regarding the bug, if there is any way to test a kernel newer than .23,
I'd start there; I don't know offhand of a bug that was fixed here, but
.23 was a long time ago...

> The backtrace looks like this (sorry, I had to copy it down from the screen
> and I don't have everything):
> 
> end_page_writeback()
> end_buffer_async_write()
> update_stats_wait_end()
> xfs_setfilesize()
> xfs_???_dealloc()
> xfs_destroy_ioend()
> run_workqueue()
> 
> After searching in the code I found:
> /* TODO: cleanup count and page_dirty */
> 
> It seems that page_dirty may be handled incorrectly and could cause the
> problem, but I don't know the purpose of this code. The same comment is in
> the latest source code from GIT.
> After running the system for a while, I was able to trigger the kernel crash
> by running "sync" on the command line.
> My stack traces often include dvb_dmx_swfilter_packets(), do_IRQ()/tasklets
> and sys_write()/vfs_write(). I can't scroll up in most situations.
> Can anyone help me?
> Is there an easy way to back up the data or replace the file system without
> a kernel crash in between?

You should certainly be able to copy data off xfs via usb; if it's
failing, I guess we'll need more info to find out why, but I'd suggest
at least booting a newer livecd to do that copy and see if things fare
better.

-Eric

> Best regards
> Juergen Urban
