Bugzilla – Bug 230
umount hangs after high disk load
Last modified: 2003-05-08 10:33:35 CDT
kernel: 2.4.20+xfs(some modern snapshots up to 2003-03-09, from SGI FTP)+netfiler+openwall build by gcc 3.2.(1-2), XFS in kernel, not in modules. All linux partitions is XFS: /=hda5(300M), /usr=hda8(7G), /var=hda9(2G), /home=hda10(12G). swap=hda7(1G). 256M RAM. Redhat-like initscripts. If afer boot run some disk-using program (I use updatedb), and than try to reboot or halt, shutdown hangs in /etc/rc.d/init.d/halt while first umount loop. Using -v as umount, I found, that hang occured while umounting /usr. fuser shows no processes, using /usr. The same result can be created by: mount /usr -o remount,ro or Alt-PrintScreen-u Alt-PrintScreen-P shows, shows, that current process in swapper. Before hang essenshial disk activity (by LED, approx 10-20s).
Can you drop the system into kdb and btp the umount process?
Created attachment 68 [details] result of btp(mount)
I've been experiencing the same problem for several months now, on my Debian-sarge system (1.2G Athlon/512M RAM). I have xfs partitions on /usr (30G), /home (20G), /var (7.5G) (but not on /, too lazy to go through the conversion). Kernel versions I've had it happen with are 2.4.18 and 2.4.19, xfs versions uncertain but at least 1.2, perhaps 1.1 (but I may have been using one of the pre-1.2 versions). Again it's /usr that umount hangs on -- but not all the time. Sometimes it boots clean, sometimes it writes to disk for upwards of two minutes (!) and *then* boots clean. And sometimes it writes to disk for upwards of two minutes, or not at all, and then hangs. I haven't been able to correlate it with any particular disk usage pattern, unlike the person filing this bug. I might go a week or two without rebooting the thing and it'll boot clean, then reboot into Linux for five minutes, reboot again, and watch it lock up. The opposite is equally likely. I have tried reversing the mount order (which reverses the umount order) from having /usr first to having it last, and it still hangs on /usr, never the others. I've waited for the system to show no activity for 5-10 minutes and then rebooted, and still had it write to disk for a long time and then lock up. Suggestions for further troubleshooting welcome. .....Ron
bta shows no processes with pagebuf_lock (exept mount), and no processes in __down at all.
Argghh this bit of code again! XFS_log_write_unmount_ro Calls 3 times... can you toss in some printk's in to find out which of the 3 calls is hanging, dump the pincount also.
Hang observed while first call to pagebuf_delwri_flush from XFS_log_write_remount_ro, before loop, (in case of remount ro), and while single call to pagebuf_delwri_flush from XFS_bflush in case of umount.
Kernel recompiled with xfs-2003-03-23, and without any additional patches. With the same conditions, whan I enter kdb and use 'ps', some untrivial output occures: all processes, exept init have string: Error: does not matsh running process tables:( 0xc03e0000). As the result, bta shows no useful information.
How are you entering kdb? I need the output from these kdb commands cpu ps
Created attachment 71 [details] result of cpu, ps and some of bta Created by this sequence: normal boot on initlevel3, log as root, than: updatedb ; reboot After updatedb (approx 4 min) system begins to reboot, try to umout disks, (approx 2-3min) , than hangs. Than I press 'pause' and enter kdb. Results in attach.
Kernel bug. UP systems call task_set_cpu() but never call task_release_cpu() so all tasks appear to own the cpu, which confuses debuggers. This problem has been bounced to linux-kernel. I doubt that the missing task_release_cpu() has anything to do with your umount hang. Unfortunately it does stop kdb backtrace working on UP systems. I expect to get a fix from l-k in the next day or so.
Created attachment 72 [details] some kdb outputs I'm also seeing that bug on a machine here. I can also reproduce it with 'updatedb; reboot'. System is Debian Woody, Kernel-2.4.20 + xfs-030316+ ptrace patch. I've created some kdb backtraces, maybe they help.
Created attachment 73 [details] another umount backtrace This is another backtrace of the umount process. It has been created some 50 Minutes after the umount process startet.
Hmm for some reaon you are out of requests. 0xce95fd34 0xc01b856a __get_request_wait+0x9a (0xc03b23c4, 0x1) kernel .text 0xc0100000 0xc01b84d0 0xc01b85d0 load the kdb_pg and kdbm_vm you can then look at the request queue for that device requeue <first arg of __get_request_wait> Also send a bta if you could. or at the very least btp of kupdated, dbflush, kswapd, pagebufd, and page_buf_IO
Created attachment 74 [details] ouput of ps, cpu and bta
would ext3 happen to be in the mix at all on this system?
Nope there's no ext3 filesystem (But ext3 is compiled as module). The system has 4 partitions, where hda1 (ext2) mounted as /, hda2 (xfs) as /var, hda3 (swap) and hda4 (xfs) mounted as /usr. When I use SysRQ+sync, while the hanging umount process during shutdown occures, I see that all devices except hda4(/usr) can be emergency sync'd. SYSRQ+remount,ro then can do nothing anymore. But then there's another weird thing: When I reboot the machine(while the umount process is hanging - and only with SySRQ-boot, no sync and ro), hda1 will be e2fsck'd on startup, xfs-recovery will be done on hda2 (/var), but not on hda4 (/usr). The kernel claims to do a clean xfs mount here... hmmm? The xfs fs had been created with xfsprogs-2.0.3
KDB note: if you are using kdb on uni-processors then you will hit a kernel bug where it does not correctly set scheduler flags. Fetch and apply ftp://oss.sgi.com/projects/kdb/download/v4.0/sched-bug-2.4.20.bz2 Patch is against 2.4.20, it fits 2.4.21-pre5.
Yes it's an UP System here, but I don't run in those kdb probs as Anton does. Maybe because I'm using an SMP Kernel? But, I'll try Keith's schedule patch anyway.
Created attachment 75 [details] new backtraces with schedule-kdb patch applied
This bascially the same problem the umount code has run out of request. dumping the rqueue will show how many requests are pending vs how many are in flight. The request queue stuff in linux has a very delicate balnce point we need to find out what is getting stuck and thus causing the out of requests situation.
okay. If someone gives me a hint, how to dump the requeue (and how to load kdb_pg and kdbm_vm - as mentioned in comment #13 of Bug 230), I'll take a look at it. thanks.
Created attachment 76 [details] trace of new weekly snapshot (2003-03-30)
Still would be helpful to have a look at the request queue rqueue <addr> where <addr> is the first arg to get_request_wait
Sorry, but no functions like *get_req* in my trace. What else can I do?
Created attachment 77 [details] rqueue dump of get_request_wait
Well the request queue is stuck... for some reason most (60 of 64) requests are stuck in the queue waiting to go out. When this happens the system doesn't make progress since the code to wake people up waiting for a request requires 32 free requests before it wakes anybody up. It's possible there is a large amount of cached meta data that needs to be written out but updatedb should more of a read intensive thing than a write intensive thing. We need some indea of how much meta data is being written out at unmount time. Some printk in the flush code might help, hopefully we can come up with some of those so called printk's in the new few days. Well I won't since I'll be gone till monday but hopefully Eric will be able to get something to you
Tested snapshot 2003-04-06: same result. I have 14388 dirs in /usr, atime of each changed during updatedb. Some unusual phenomena: after updatedb, sync don't lead to disk i/o, (unlike ext2), and don't consume any time.
It almost looks like the pagebuf daemon is getting stuck. try doing a btp <pid> (in kdb) once the updatedb has run. If it is waiting for a request dump the request queue also. One other thing, has xfs_check been run on the fs? Try running xfs_repair from single user before /usr is mounted just to make we are not dealing with something strange on disk.
Created attachment 78 [details] btp of pagebufd, rqueue dump please check bugzilla, the latest comments did'n't get forwarded to me and linux-xfs....
I ran xfs_repair(from xfsprgs 2.0.3 - will try again with more recenct tools) on /var and /usr, but nothing changed. The request queue is still stuck. Btw: my last comment got cc'd to me, but I'm still not sure if it makes it's way to the list.
tried again with xfsprogs-2.3.11 Something strange: every time I run xfs_repair on /usr, I see following in Phase 6: Phase 6 - check inode connectivity... - resetting contents of realtime bitmap and summary inodes - ensuring existence of lost+found directory - traversing filesystem starting at / ... rebuilding directory inode 128 - traversal finished ... - traversing all unattached subtrees ... - traversals finished ... - moving disconnected inodes to lost+found ... Is this normal that inode 128 gets rebuild every time?
In my case, pagebufd is in other place: schedule+0x1F7, interruptible_sleep_on+0x36, pagebuf_daemon+0x1a9 = page_buf.c+2023, arch_kernel_thread+0x26 xfs_restore don't decected any bugs. If I do updatedb ; halt in single-user mode, umount don't locked, but use disk approx 30-40 s before complete.
Yes xfs_repair unlinks the lost&found before it starts to any files left in there from run to run will be refound by repair. I really does seem like something is causing the the meta data delay write queue to get stuck. What I don't undertand is how updatedb is causing alot dirty meta data? That should mainly be a read operation. We have also be unable to reproduce this locally.
Created attachment 79 [details] page_buf.c patch This patch may or may not make a difference, but give it a try. Hopefully by running the tq_disk task_queue more often while flushing the the delwri queue the request queue won't get stuck.
Patch from comment 35 don't help. Now I trying to trace, who (and why) don't release pagebuf_lock. It may be a rare case on other architectures. In similar machines different result: Duron 950/128/60G: only updatedb lead to lock Duron 1200/512/40G: also find /usr >& /dev/null too.
Russell, thanks for the patch. But I'm currently away from Regensburg - I'll do further tests, when I'm back in about two weeks. cheers. Christian
Created attachment 81 [details] Statistics from /proc May be this statistics will be usefull
Do you have XFS patch for 2.4.21-preX kernel? In pre7 i found some changes, that may by relevant to this bug, but prevents to apply XFS patch with minimal use of hand. And is some documentation exist about ideology of XFS and pagebuf? It's hard to find a bug without undestanding key ideas.
Tested new patch linux-2.4.20-xfs-2003-05-04 for 21-rc1. It works! Moreover: unmounting in fast now! My proposal: treat as FIXED after 2.4.21 release.
Yep! It works here with latest cvs checkout, too. IMHO bug #230 could be closed. Great work, Steve et. al. !
The latest sync rework appears to have fixed this problem.
*** Bug 227 has been marked as a duplicate of this bug. ***