To: xfs@xxxxxxxxxxx
Subject: Re: BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB
From: Juergen Urban <JuergenUrban@xxxxxx>
Date: Fri, 20 Nov 2009 17:23:47 +0100
In-reply-to: <4B0587B0.3020702@xxxxxxxxxxx>
References: <200911190957.45957.JuergenUrban@xxxxxx> <4B0587B0.3020702@xxxxxxxxxxx>
User-agent: KMail/1.10.3 (Linux/2.6.27-7-generic; KDE/4.1.3; i686; ; )
On Thursday 19 November 2009 19:00:16 Eric Sandeen wrote:
> Juergen Urban wrote:
> > Hello,
> >
> > my machine has been very unstable since I started using XFS on an external
> > USB harddisc (855 GByte XFS partition on 1TByte). One problem was the stack
> > overflows caused by the large stack use of XFS, USB, SCSI and VFS in
> > Linux 2.6.23.13. NFS on XFS caused much more stack overflows. I think I
> > got around the stack overflows by disabling preemption, SMP and NFS in
> > Linux, but I am not sure about it. I think that I didn't get a message
> > from the stack overflow detection after this.
>
> Are you on 4k stacks?  To be honest I'd still expect things to be mostly
> ok stack-wise even if so.

No, I am using 8k stacks.

>
> > I also tried a Live-CD (KNOPPIX), but there are the same
> > problems. I exchanged some of the hardware. XFS is decreasing system
> > performance.  I use the Linux VDR with DVB-S which seems to increase the
> > problems. I was able to record 3 high bandwidth streams in parallel
> > before using XFS.
>
> Really, you could record 3 parallel high-def TV streams to ext3 via USB?
> I guess I'm a little surprised...
>

No, I meant that I was able to record 3 high bandwidth SDTV streams on the
internal hard disc with ext3. Then I got an external USB drive and
formatted it with XFS, because someone told me that XFS runs stably with
VDR on an internal hard disc.

> > Now it has problems recording one high bandwidth stream.  The
> > system became somewhat usable after I changed the IO scheduler to
> > deadline. It is difficult to get a good backtrace of the kernel crash,
> > because the backlog is not saved on the internal harddisc (reiserfs and
> > ext3). I was able to find out that XFS triggers a BUG() in
> > end_page_writeback() at mm/filemap.c:552:
> >
> > void end_page_writeback(struct page *page)
> > {
> >         if (!TestClearPageReclaim(page) || rotate_reclaimable_page(page)) {
> >                 if (!test_clear_page_writeback(page))
> >                         BUG();
> >         }
> >         smp_mb__after_clear_bit();
> >         wake_up_page(page, PG_writeback);
> > }
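
To make the BUG() condition in the quoted function easier to follow, here is a
minimal userspace sketch. It is not kernel code: struct model_page,
test_and_clear() and model_end_page_writeback() are hypothetical names made up
for illustration, the flag handling is not atomic, and the PG_reclaim /
rotate_reclaimable_page() branch is left out. It only shows that the check
fires when completion runs on a page whose writeback flag is already clear,
for example when the same page is completed twice. It does not claim that this
is what happens in the crash reported here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the flag that matters here. */
struct model_page {
    bool writeback; /* models the PG_writeback bit */
};

/* Return the old value of *flag and clear it. This is a non-atomic model
 * of the kernel's test-and-clear page flag helpers. */
static bool test_and_clear(bool *flag)
{
    bool old = *flag;
    *flag = false;
    return old;
}

/* Simplified model of end_page_writeback().
 * Assumption: the PG_reclaim / rotate_reclaimable_page() branch is omitted.
 * Returns true on a normal completion and false where the real function
 * would call BUG(). */
static bool model_end_page_writeback(struct model_page *page)
{
    assert(page != NULL);
    if (!test_and_clear(&page->writeback))
        return false; /* writeback flag was not set: kernel would BUG() */
    return true;
}
```

With a page that starts with writeback set, the first call to
model_end_page_writeback() returns true and clears the flag, and a second call
on the same page returns false, which corresponds to the BUG() path.
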
>
> Regarding the bug, if there is any way to test a kernel newer than .23,
> I'd start there; I don't know offhand of a bug that was fixed here, but
> .23 was a long time ago...

Now I tried linux-2.6.31.6. My system hangs in the start scripts. Maybe this
is caused by the network scripts. I got the message that ehci_hcd needs to be
loaded before uhci_hcd and ohci_hcd, so I skipped uhci_hcd and ohci_hcd in
/etc/discover.conf. Now I have higher performance with linux-2.6.23.13 and I
can record 3 normal streams in parallel on the USB disc with XFS. But it is
still unstable. The last error I got was in block_prepare_write
(fs/buffer.c). This caused follow-up errors in do_invalidate_page() called by
xfs_get_blocks().
Sometimes there are file system deadlocks. I can do everything except access
the file system. Every attempt to access the file system leads to a deadlock
of the program. This normally happens after a kernel exception.

>
> > The backtrace looks like this (Sorry, I needed to write it down from
> > screen and I don't have everything):
> >
> > end_page_writeback()
> > end_buffer_async_write()
> > update_stats_wait_end()
> > xfs_setfilesize()
> > xfs_???_dealloc()
> > xfs_destroy_ioend()
> > run_workqueue()
> >
> > After searching in the code I found:
> > /* TODO: cleanup count and page_dirty */
> >
> > It seems that page_dirty may be handled incorrectly and could cause the
> > problem, but I don't know the purpose of this stuff. The same comment is
> > in the latest source code from GIT.
> > After running the system for a while, I was able to trigger the kernel
> > crash by starting "sync" in the command line.
> > My stack traces often include dvb_dmx_swfilter_packets(),
> > do_IRQ()/tasklets and sys_write()/vfs_write(). I can't scroll up in most
> > situations. Can anyone help me?
> > Is there an easy way to backup the data or replace the file system
> > without kernel crash in between?
>
> You should certainly be able to copy data off xfs via usb; if it's
> failing, I guess we'll need more info to find out why, but I'd suggest
> at least booting a newer livecd to do that copy and see if things fare
> better.

My idea was to shrink it and create a new partition where I can copy the data. 
As far as I understand I need to mount it for the shrink process, so I may 
have the problem of kernel exceptions while shrinking.

>
> -Eric
>
> > Best regards
> > Juergen Urban
> >
> > _______________________________________________
> > xfs mailing list
> > xfs@xxxxxxxxxxx
> > http://oss.sgi.com/mailman/listinfo/xfs
