Bug 230 - umount hangs after high disk load
: umount hangs after high disk load
Status: RESOLVED FIXED
Product: XFS
Classification: Unclassified
Component: XFS kernel code
: Current
: Linux
: major
: ---
Assigned To: Eric Sandeen
:
:
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2003-03-13 06:23 CST by Anton Guda
Modified: 2003-05-08 10:33 CDT (History)
4 users (show)

See Also:


Attachments
result of btp(mount) (2.65 KB, text/plain)
2003-03-15 13:05 CST, Anton Guda
Details
result of cpu, ps and some of bta (1.84 KB, text/plain)
2003-03-25 13:59 CST, Anton Guda
Details
some kdb outputs (3.91 KB, text/plain)
2003-03-26 04:23 CST, Christian Guggenberger
Details
another umount backtrace (1.89 KB, text/plain)
2003-03-26 05:25 CST, Christian Guggenberger
Details
ouput of ps, cpu and bta (11.75 KB, text/plain)
2003-03-26 13:26 CST, Christian Guggenberger
Details
new backtraces with schedule-kdb patch applied (11.86 KB, text/plain)
2003-03-27 05:19 CST, Christian Guggenberger
Details
trace of new weekly snapshot (2003-03-30) (2.02 KB, text/plain)
2003-03-31 02:25 CST, Anton Guda
Details
rqueue dump of get_request_wait (17.56 KB, text/plain)
2003-04-01 04:35 CST, Christian Guggenberger
Details
btp of pagebufd, rqueue dump (15.50 KB, text/plain)
2003-04-08 05:17 CDT, Christian Guggenberger
Details
page_buf.c patch (603 bytes, patch)
2003-04-11 13:28 CDT, Russell Cattelan
Details | Diff
Statistics from /proc (1.56 KB, text/plain)
2003-04-19 03:33 CDT, Anton Guda
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Anton Guda 2003-03-13 06:23:35 CST
kernel: 2.4.20+xfs(some modern snapshots up to 2003-03-09, from SGI FTP)+netfiler+openwall 
build by gcc 3.2.(1-2), XFS in kernel, not in modules. 
 
All linux partitions is XFS: /=hda5(300M), /usr=hda8(7G), /var=hda9(2G), /home=hda10(12G). 
swap=hda7(1G). 256M RAM. 
Redhat-like initscripts. 
 
If afer boot run some disk-using program (I use updatedb), and than try to reboot or halt, 
shutdown hangs in /etc/rc.d/init.d/halt while first umount loop. 
Using -v as umount, I found, that hang occured while umounting /usr. 
fuser shows no processes, using /usr. 
The same result can be created by:  
  mount /usr -o remount,ro   or   Alt-PrintScreen-u 
Alt-PrintScreen-P shows, shows, that current process in swapper. 
Before hang essenshial disk activity (by LED, approx 10-20s).
Comment 1 Russell Cattelan 2003-03-13 16:31:41 CST
Can you drop the system into kdb and btp the umount process?
Comment 2 Anton Guda 2003-03-15 13:05:54 CST
Created attachment 68 [details]
result of btp(mount)
Comment 3 Ron Murray 2003-03-15 13:16:13 CST
   I've been experiencing the same problem for several months now, on
my Debian-sarge system (1.2G Athlon/512M RAM). I have xfs partitions
on /usr (30G), /home (20G), /var (7.5G) (but not on /, too lazy to go
through the conversion). Kernel versions I've had it happen with are
2.4.18 and 2.4.19, xfs versions uncertain but at least 1.2, perhaps
1.1 (but I may have been using one of the pre-1.2 versions).

   Again it's /usr that umount hangs on -- but not all the
time. Sometimes it boots clean, sometimes it writes to disk for
upwards of two minutes (!) and *then* boots clean. And sometimes it
writes to disk for upwards of two minutes, or not at all, and then
hangs.

   I haven't been able to correlate it with any particular disk usage
pattern, unlike the person filing this bug. I might go a week or two
without rebooting the thing and it'll boot clean, then reboot into
Linux for five minutes, reboot again, and watch it lock up. The
opposite is equally likely.

   I have tried reversing the mount order (which reverses the umount
order) from having /usr first to having it last, and it still hangs on
/usr, never the others. I've waited for the system to show no activity
for 5-10 minutes and then rebooted, and still had it write to disk for
a long time and then lock up.

Suggestions for further troubleshooting welcome.

 .....Ron
Comment 4 Anton Guda 2003-03-16 08:52:44 CST
bta shows no processes  with pagebuf_lock (exept mount), 
and no processes in __down at all. 
 
Comment 5 Russell Cattelan 2003-03-17 08:01:38 CST
Argghh this bit of code again!
XFS_log_write_unmount_ro
Calls 3 times... can you toss in some printk's in to find out which of 
the 3 calls is hanging, dump the pincount also.
Comment 6 Anton Guda 2003-03-17 14:49:58 CST
Hang observed while first call to pagebuf_delwri_flush from 
XFS_log_write_remount_ro, 
before loop, (in case of remount ro), 
and while single call to  pagebuf_delwri_flush from XFS_bflush 
in case of umount. 
 
Comment 7 Anton Guda 2003-03-25 02:09:36 CST
Kernel recompiled with xfs-2003-03-23, and without any additional patches. 
With the same conditions, whan I enter kdb and use 'ps', 
some untrivial output occures: 
all processes, exept init have string: 
Error: does not matsh running process tables:( 0xc03e0000). 
As the result, bta shows no useful information. 
Comment 8 Keith Owens 2003-03-25 02:39:59 CST
How are you entering kdb?  I need the output from these kdb commands
  cpu
  ps
Comment 9 Anton Guda 2003-03-25 13:59:01 CST
Created attachment 71 [details]
result of cpu, ps and some of bta

Created by this sequence:
normal boot on initlevel3, log as root, than:
updatedb ; reboot

After updatedb (approx 4 min) system begins to reboot, try to umout disks,
(approx 2-3min) , than hangs.
Than I press 'pause' and enter kdb. Results in attach.
Comment 10 Keith Owens 2003-03-25 16:42:40 CST
Kernel bug.  UP systems call task_set_cpu() but never call
task_release_cpu() so all tasks appear to own the cpu, which confuses
debuggers.  This problem has been bounced to linux-kernel.

I doubt that the missing task_release_cpu() has anything to do with
your umount hang.  Unfortunately it does stop kdb backtrace working on
UP systems.  I expect to get a fix from l-k in the next day or so.
Comment 11 Christian Guggenberger 2003-03-26 04:23:54 CST
Created attachment 72 [details]
some kdb outputs

I'm also seeing that bug on a machine here. I can also reproduce it with
'updatedb; reboot'. System is Debian Woody, Kernel-2.4.20 + xfs-030316+ ptrace
patch. I've created some kdb backtraces, maybe they help.
Comment 12 Christian Guggenberger 2003-03-26 05:25:04 CST
Created attachment 73 [details]
another umount backtrace

This is another backtrace of the umount process. It has been created some 50
Minutes after the umount process startet.
Comment 13 Russell Cattelan 2003-03-26 11:21:03 CST
Hmm for some reaon you are out of requests.
0xce95fd34 0xc01b856a __get_request_wait+0x9a (0xc03b23c4, 0x1)
                               kernel .text 0xc0100000 0xc01b84d0 0xc01b85d0
load the kdb_pg and kdbm_vm
you can then look at the request queue for that device
requeue <first arg of __get_request_wait>

Also send a bta if you could.
or at the very least btp of kupdated, dbflush, kswapd, pagebufd, and page_buf_IO
Comment 14 Christian Guggenberger 2003-03-26 13:26:04 CST
Created attachment 74 [details]
ouput of ps, cpu and bta
Comment 15 Russell Cattelan 2003-03-26 15:04:07 CST
would ext3 happen to be in the mix at all on this system?
Comment 16 Christian Guggenberger 2003-03-26 16:24:42 CST
Nope there's no ext3 filesystem (But ext3 is compiled as module). The system has
4 partitions, where hda1 (ext2) mounted as /, hda2 (xfs) as /var, hda3 (swap)
and hda4 (xfs) mounted as /usr.
When I use SysRQ+sync, while the hanging umount process during shutdown occures,
I see that all devices except hda4(/usr) can be emergency sync'd.
SYSRQ+remount,ro then can do nothing anymore.

But then there's another weird thing: 
When I reboot the machine(while the umount process is hanging - and only with
SySRQ-boot, no sync and ro), hda1 will be e2fsck'd on startup, xfs-recovery will
be done on hda2 (/var), but not on hda4 (/usr). The kernel claims to do a clean
xfs mount here... hmmm?
The xfs fs had been created with xfsprogs-2.0.3

Comment 17 Keith Owens 2003-03-26 18:40:30 CST
KDB note: if you are using kdb on uni-processors then you will hit a kernel bug
where it does not correctly set scheduler flags.  Fetch and apply

ftp://oss.sgi.com/projects/kdb/download/v4.0/sched-bug-2.4.20.bz2

Patch is against 2.4.20, it fits 2.4.21-pre5.
Comment 18 Christian Guggenberger 2003-03-27 00:58:20 CST
Yes it's an UP System here, but I don't run in those kdb probs as Anton does.
Maybe because I'm using an SMP Kernel? 
But, I'll try Keith's schedule patch anyway.
Comment 19 Christian Guggenberger 2003-03-27 05:19:29 CST
Created attachment 75 [details]
new backtraces with schedule-kdb patch applied
Comment 20 Russell Cattelan 2003-03-27 09:25:39 CST
This bascially the same problem the umount code has run out of request.
dumping the rqueue will show how many requests are pending vs how many 
are in flight.

The request queue stuff in linux has  a very delicate balnce point 
we need to find out what is getting stuck and thus causing the out
of requests situation.
Comment 21 Christian Guggenberger 2003-03-27 10:35:42 CST
okay. If someone gives me a hint, how to dump the requeue (and how to load
kdb_pg and kdbm_vm - as mentioned in comment #13 of Bug 230), I'll take a look
at it.
thanks.
Comment 22 Anton Guda 2003-03-31 02:25:44 CST
Created attachment 76 [details]
trace of new weekly snapshot (2003-03-30)
Comment 23 Russell Cattelan 2003-03-31 13:15:20 CST
Still would be helpful to have a look at the request queue

rqueue <addr>
where <addr> is the first arg to get_request_wait
Comment 24 Anton Guda 2003-04-01 01:51:29 CST
Sorry, but no functions like *get_req*  in my trace. 
What else can I do? 
Comment 25 Christian Guggenberger 2003-04-01 04:35:33 CST
Created attachment 77 [details]
rqueue dump of get_request_wait
Comment 26 Russell Cattelan 2003-04-02 13:55:04 CST
Well the request queue is stuck... 
for some reason most (60 of 64) requests are stuck in the queue
waiting to go out. When this happens the system doesn't make progress
since the code to wake people up waiting for a request requires 32 free
requests before it wakes anybody up.
It's possible there is a large amount of cached meta data that needs to be 
written out but updatedb should more of a read intensive thing than a write
intensive thing.
We need some indea of how much meta data is being written out at unmount time. 
Some printk in the flush code might help, hopefully we can come up with some of
those so called printk's in the new few days. 
Well I won't since I'll be gone till monday but hopefully Eric will be able to
get something to you
Comment 27 Anton Guda 2003-04-07 01:25:52 CDT
Tested snapshot 2003-04-06: same result. 
I have 14388 dirs in /usr, atime of each changed during updatedb. 
Some unusual phenomena:  
after updatedb, sync don't lead to disk i/o, 
(unlike ext2), and don't consume any time. 
Comment 28 Russell Cattelan 2003-04-07 16:30:19 CDT
It almost looks like the pagebuf daemon is getting stuck.
try doing a btp <pid> (in kdb) once the updatedb has run.

If it is waiting for a request dump the request queue also.

One other thing, has xfs_check been run on the fs?

Try running xfs_repair from single user before /usr is mounted
just to make we are not dealing with something strange on disk.
Comment 29 Christian Guggenberger 2003-04-08 05:17:11 CDT
Created attachment 78 [details]
btp of pagebufd, rqueue dump

please check bugzilla, the latest comments did'n't get forwarded to me and
linux-xfs....
Comment 30 Christian Guggenberger 2003-04-08 05:59:48 CDT
I ran xfs_repair(from xfsprgs 2.0.3 - will try again with more recenct tools) on
/var and /usr, but nothing changed. The request queue is still stuck. 

Btw: my last comment got cc'd to me, but I'm still not sure if it makes it's way
to the list.
Comment 31 Christian Guggenberger 2003-04-08 06:13:48 CDT
tried again with xfsprogs-2.3.11
Something strange: every time I run xfs_repair on /usr, I see following in Phase 6:

Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ... 
rebuilding directory inode 128
        - traversal finished ... 
        - traversing all unattached subtrees ... 
        - traversals finished ... 
        - moving disconnected inodes to lost+found ... 

Is this normal that inode 128 gets rebuild every time?
Comment 32 Anton Guda 2003-04-08 06:24:14 CDT
In my case, pagebufd is in other place: 
  schedule+0x1F7, 
  interruptible_sleep_on+0x36, 
  pagebuf_daemon+0x1a9 = page_buf.c+2023, 
  arch_kernel_thread+0x26 
xfs_restore don't decected any bugs. 
If I do updatedb ; halt in single-user mode, 
umount don't locked, but use disk approx 30-40 s before complete. 
Comment 33 Russell Cattelan 2003-04-10 10:01:30 CDT
Yes xfs_repair unlinks the lost&found before it starts to any files
left in there from run to run will be refound by repair.

I really does seem like something is causing the the meta data delay write
queue to get stuck.

What I don't undertand is how updatedb is causing alot dirty meta data?
That should mainly be a read operation.

We have also be unable to reproduce this locally.
Comment 34 Russell Cattelan 2003-04-10 10:02:39 CDT
Yes xfs_repair unlinks the lost&found before it starts to any files
left in there from run to run will be refound by repair.

I really does seem like something is causing the the meta data delay write
queue to get stuck.

What I don't undertand is how updatedb is causing alot dirty meta data?
That should mainly be a read operation.

We have also be unable to reproduce this locally.
Comment 35 Russell Cattelan 2003-04-11 13:28:43 CDT
Created attachment 79 [details]
page_buf.c patch

This patch may or may not make a difference, but give it a try.
Hopefully by running the tq_disk task_queue more often while 
flushing the the delwri queue the request queue won't get stuck.
Comment 36 Anton Guda 2003-04-14 09:20:51 CDT
Patch from comment 35 don't help. 
Now I trying to trace, who (and why) don't release pagebuf_lock. 
It may be a rare case on other architectures. 
In similar machines different result: 
Duron 950/128/60G: only updatedb lead to lock 
Duron 1200/512/40G: also find /usr >& /dev/null too. 
Comment 37 Christian Guggenberger 2003-04-15 13:51:43 CDT
Russell, thanks for the patch. But I'm currently away from Regensburg -  I'll do
further tests, when I'm back in about two weeks.

cheers.
Christian
Comment 38 Anton Guda 2003-04-19 03:33:26 CDT
Created attachment 81 [details]
Statistics from /proc

May be this statistics will be usefull
Comment 39 Anton Guda 2003-04-21 07:09:25 CDT
Do you have XFS patch for 2.4.21-preX kernel? 
In pre7 i found some changes, that may by relevant to this bug, 
but prevents to apply XFS patch with minimal use of hand. 
And is some documentation exist about ideology of XFS and pagebuf? 
It's hard to find a bug without undestanding key ideas. 
Comment 40 Anton Guda 2003-05-07 05:28:26 CDT
Tested new patch linux-2.4.20-xfs-2003-05-04 for 21-rc1. 
It works! Moreover: unmounting in fast now! 
My proposal: treat as FIXED after 2.4.21 release. 
Comment 41 Christian Guggenberger 2003-05-08 02:11:25 CDT
Yep! It works here with latest cvs checkout, too.
IMHO bug #230 could be closed.

Great work, Steve et. al. !

Comment 42 Russell Cattelan 2003-05-08 08:30:03 CDT
The latest sync rework appears to have fixed this problem.
Comment 43 Russell Cattelan 2003-05-08 08:33:35 CDT
*** Bug 227 has been marked as a duplicate of this bug. ***