Bugzilla – Bug 158
FAM does not recover from queue overflows
Last modified: 2007-05-23 07:16:31 CDT
It often happens that fam consumes 100% CPU until it is restarted, even when no processes are using fam. Only restarting helps. Is there a way to track it down? When I use the -d option it doesn't seem to happen. My system: Linux 2.4.19-pre10 i686 AuthenticAMD, fam-2.6.7 with the dnotify patch. Some of our other users (Gentoo) are also reporting this problem. See: http://forums.gentoo.org/viewtopic.php?t=6761 and http://forums.gentoo.org/viewtopic.php?t=5708 I will help you track down the problem, but I need assistance. felix
Hi Felix, and thanks for your interest. I don't think FAM is enabled by default on Gentoo, indeed it seems that the user must manually install FAM. Is FAM running from xinetd or some other way? If the problem does not appear when using the -d flag, what behavior do you get if you use the -f flag? What clients are using FAM (GNOME, KDE, something else)? Some users have reported problems with the DNotify patch. Are you able to rebuild FAM without the DNotify patch and report whether this problem still occurs?
I'm now running with the -d option. What is this? It is printed a million times on the console, but it stops when the last client is closed (I'm using Konqueror/kde-3.01). felix *************** overflow sigqueue ***********************
The only place I see this text is in the DNotify patch.
It seems the DNotify patch uses a queue of 1024 elements (as does FAM currently), so if Konqueror is trying to monitor more than 1024 files this may explain the problem.
The bit in the DNotify code that seems to describe why this might occur reads: "When the RT queue overflows we get a SIGIO".
This issue relates to a message posted to fam@oss.sgi.com some time back. I don't think my earlier comments in this bug are pertinent. My guess is that the (fam|sig)queue is filled too quickly for FAM to recover, but I could be wrong. Is there any possibility of you looking into this, Alex? FWIW: When I use the FAM test program to FAM /dev on my Red Hat 7.3 box I actually get a famqueue overflow rather than a sigqueue overflow.
It should be able to recover from overflows, so it should be fixed. I'm going on vacation tomorrow, so I don't have time to look at it right now. But eventually I'd like to fix it.
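One way recovery from an overflow could work is sketched below. This is only an illustration, not FAM's or the DNotify patch's actual code: the names event_queue, queue_push, queue_pop and queue_recover are hypothetical, and the capacity of 1024 is taken from the figure mentioned earlier in this bug. The idea is that a full queue latches an overflow flag instead of retrying forever, and the consumer then discards the stale entries and rescans every watched directory once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 1024   /* assumed: the queue size quoted in this bug */

/* Hypothetical fixed-size ring buffer of file descriptors with pending events. */
struct event_queue {
    int    fds[QUEUE_CAPACITY];
    size_t head;        /* index of the oldest entry */
    size_t count;       /* number of queued entries */
    bool   overflowed;  /* latched when an event had to be dropped */
};

void queue_init(struct event_queue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    q->overflowed = false;
}

/* Returns false and latches the overflow flag when the queue is full. */
bool queue_push(struct event_queue *q, int fd)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAPACITY) {
        q->overflowed = true;
        return false;
    }
    q->fds[(q->head + q->count) % QUEUE_CAPACITY] = fd;
    q->count++;
    return true;
}

/* Removes the oldest entry into *fd; returns false when the queue is empty. */
bool queue_pop(struct event_queue *q, int *fd)
{
    assert(q != NULL && fd != NULL);
    if (q->count == 0)
        return false;
    *fd = q->fds[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}

/*
 * If events were dropped, the queued entries can no longer be trusted:
 * empty the queue, clear the flag, and return true so the caller knows
 * it must rescan every watched directory.
 */
bool queue_recover(struct event_queue *q)
{
    assert(q != NULL);
    if (!q->overflowed)
        return false;
    queue_init(q);
    return true;
}
```

A consumer loop under these assumptions would call queue_recover first and do a full rescan when it returns true, then pop and handle the remaining entries as usual. The cost of an overflow is then one rescan rather than a busy loop.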
I notice a couple of interesting things: 1. If I redirect output to a regular file (rather than my xterm), I can run test/test -d /dev successfully (in both instances I use fam -d). 2. If I put a couple of printfs in the signal handler, I now get a sigqueue overflow rather than a famqueue overflow.
Created attachment 39: block SIGIO and SIGRTMIN so we handle each one fully
As it seems that events are being created faster than FAM/DNotify can handle them (and we may be receiving new signals before the signal-handling function has finished), one way around this could be to block the SIGIO (RT queue overflowed) and SIGRTMIN (file changed) signals, so that we handle one signal at a time. I'm not familiar with real-time signals, so please regard this as an idea rather than a definite solution. Whether both signal types or only one should be blocked needs consideration. Doing things this way may also be far less efficient, but it does seem to stop the problem. :-)
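The idea of blocking both signals while either one is being handled can be sketched with sigaction and its sa_mask field. This is a minimal sketch under stated assumptions, not the contents of attachment 39: the handler names, the counters, and install_serialized_handlers are hypothetical, and the handlers only record what they saw instead of doing any real FAM work.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical bookkeeping, written only from the signal handlers. */
volatile sig_atomic_t events_seen = 0;            /* SIGRTMIN deliveries */
volatile sig_atomic_t overflow_seen = 0;          /* set when SIGIO arrives */
volatile sig_atomic_t blocked_during_handler = 0; /* set if both were masked */

/* Record whether SIGIO and SIGRTMIN are both blocked right now. */
static void note_mask(void)
{
    sigset_t cur;
    if (sigprocmask(SIG_BLOCK, NULL, &cur) == 0 &&
        sigismember(&cur, SIGIO) == 1 &&
        sigismember(&cur, SIGRTMIN) == 1)
        blocked_during_handler = 1;
}

/* "File changed" notification. */
static void on_change(int signo, siginfo_t *info, void *ctx)
{
    (void)signo; (void)info; (void)ctx;
    events_seen++;
    note_mask();
}

/* "RT queue overflowed" notification. */
static void on_overflow(int signo, siginfo_t *info, void *ctx)
{
    (void)signo; (void)info; (void)ctx;
    overflow_seen = 1;
    note_mask();
}

/*
 * Install both handlers with a shared sa_mask, so that while either
 * handler runs, delivery of SIGIO and SIGRTMIN is deferred until it
 * returns. Returns 0 on success, -1 on failure.
 */
int install_serialized_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGIO);
    sigaddset(&sa.sa_mask, SIGRTMIN);

    sa.sa_sigaction = on_change;
    if (sigaction(SIGRTMIN, &sa, NULL) < 0)
        return -1;

    sa.sa_sigaction = on_overflow;
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;

    return 0;
}
```

Blocked real-time signals stay queued rather than being lost, but the kernel's queue is still finite, so masking alone serializes the handlers without removing the possibility of an overflow. Blocking only one of the two signals would amount to adding just that signal to sa_mask.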
Wil Evers suggests building with -lrt -lpthread here: <http://oss.sgi.com/projects/fam/mail_archive/200301/msg00011.html>
*** Bug 210 has been marked as a duplicate of this bug. ***
I am trying to use FAM to implement an auto-refresh feature in the IDE tooling of the Eclipse project (www.eclipse.org). Essentially, we want any resources in the user's workspace to be refreshed automatically if changes are made with an external tool. I believe I am running into this apparent limit of 1024 on the number of directories that can be monitored. User workspaces in Eclipse can easily contain thousands of directories. The behavior I see is that the FAM API calls (FAMMonitorDirectory, FAMCancelMonitor) seem to hang once this limit is exceeded. I take it this is a known limitation? Are there any plans to remove it?
I would be surprised if Red Hat hadn't already solved this. Red Hat currently ships Fedora with FAM 2.6.10. Could someone check whether Red Hat's current dnotify patch has any improvements over the one on SGI's site and, if it has, merge those improvements into SGI's dnotify patch?
Is this ~5yr old bug going to get fixed? :-X I regularly see this problem and have to restart fam due to the 99% cpu use it causes.