From proski@gnu.org Sun May 4 21:34:02 2003
Date: Mon, 5 May 2003 00:33:59 -0400 (EDT)
From: Pavel Roskin
To: devfs@oss.sgi.com, Andrey Borzenkov
Subject: Re[2]: [PATCH] fix initlog/minilogd deadlock on /dev/log access

Hello!

The problem with the hang on "Finding module dependencies" has been traced to a deadlock in the kernel, possibly in devfs.

One minilogd calls bind() on /dev/log, and the other calls unlink() on the same file. The problem only happens with Linux 2.4.x (not 2.5.x) and only if devfs is mounted.

Given that it's essentially a race condition, we cannot be sure that the problem doesn't exist without devfs or on 2.5.x kernels. However, devfs must be involved somehow.

Since both minilogd processes are in the "D" state, I don't think devfsd is involved. They are both waiting in the kernel.

The attached patch can be used as a workaround.
It prevents minilogd from doing unlink() while another one is calling bind() on the same file. The patch is very ugly and is not meant to be applied. However, it demonstrates where the problem lies.

I think there is more than one program to blame. We have two initlog processes, each calling minilogd. That's the first problem.

We have minilogd removing /dev/log without trying to find out if it's in use. This may or may not be OK. At least it's not OK if the kernel is not fixed. That's the second problem.

unlink() and bind() can deadlock on devfs. That's the third problem. If it's a known problem and it was fixed in 2.5.x kernels, it would be really nice to propagate the fix to the 2.4.x series, because a hanging Red Hat system is a major annoyance, and it took me months of living with even worse workarounds and a whole day to track it down.

There are some more details in this bug:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=85621

I'll try to create a testcase not involving a reboot. Also, I'll be able to run the logs produced by Alt-SysRq-T through ksymoops. But I'm going to be quite busy next week, so I'd appreciate it if someone else beats me to that.
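The workaround in minilogd.diff relies on open() with O_CREAT | O_EXCL, which checks for the file and creates it in one atomic step, so only one process can win the lock. A minimal C sketch of that pattern follows. The helper names try_lock() and release_lock() and the lock_result enum are hypothetical, not taken from minilogd; the one-second retry mirrors the attachment's loop.

```c
/* Hypothetical helpers illustrating the lock-file idea; not minilogd code. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum lock_result { LOCK_ERROR = -1, LOCK_ACQUIRED = 0, LOCK_BUSY = 1 };

/* Try to create the lock file exclusively, up to max_tries attempts. */
enum lock_result try_lock(const char *path, int max_tries)
{
    assert(path != NULL && max_tries > 0);

    for (int i = 0; i < max_tries; i++) {
        /* O_CREAT | O_EXCL: "does it exist?" and "create it" in one step. */
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd >= 0) {
            close(fd);            /* only the file's existence matters */
            return LOCK_ACQUIRED;
        }
        if (errno != EEXIST)
            return LOCK_ERROR;    /* e.g. the lock directory is not writable */
        if (i + 1 < max_tries)
            sleep(1);             /* another process holds it; wait, retry */
    }
    return LOCK_BUSY;
}

/* Release the lock by removing the file. */
void release_lock(const char *path)
{
    unlink(path);
}
```

A caller in the spirit of the attachment would call try_lock() with 10 attempts before unlink(_PATH_LOG), give up on LOCK_BUSY, carry on without the lock on LOCK_ERROR (presumably so that minilogd still works while /var is not yet writable), and call release_lock() once bind() has returned, whether or not it succeeded. One caveat of any lock file: a process that dies while holding it leaves the file behind, and later callers then see LOCK_BUSY.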
--
Regards,
Pavel Roskin

[Attachment: minilogd.diff]

--- src/minilogd.c
+++ src/minilogd.c
@@ -19,6 +19,8 @@
 #include <sys/stat.h>
 #include <sys/un.h>
 
+#define LOG_LOCK "/var/lock/subsys/minilogd"
+
 static int we_own_log=0;
 static char **buffer=NULL;
 static int buflines=0;
@@ -139,22 +141,49 @@ int main(int argc, char **argv) {
    struct sockaddr_un addr;
    int sock;
    int pid;
+   int we_hold_lock = 0;
+   int cannot_lock = 0;
+   int i;
 
    /* option processing made simple... */
    if (argc>1) debug=1;
-   /* just in case */
-   sock = open("/dev/null",O_RDWR);
-   dup2(sock,0);
-   dup2(sock,1);
-   dup2(sock,2);
-
    bzero(&addr, sizeof(addr));
    addr.sun_family = AF_LOCAL;
    strncpy(addr.sun_path,_PATH_LOG,sizeof(addr.sun_path)-1);
    sock = socket(AF_LOCAL, SOCK_STREAM,0);
+
+   for (i = 0; i < 10; i++) {
+      if (open(LOG_LOCK, O_WRONLY | O_CREAT | O_EXCL, 0) == -1) {
+         if (errno == EEXIST) {
+            sleep(1);
+            continue;
+         } else {
+            cannot_lock = 1;
+            break;
+         }
+      } else {
+         we_hold_lock = 1;
+         break;
+      }
+   }
+
+   if (!cannot_lock && !we_hold_lock) {
+      fprintf(stderr, "minilogd: cannot get lock for /dev/log\n");
+      exit(6);
+   }
+
    unlink(_PATH_LOG);
+
+   /* just in case */
+   sock = open("/dev/null",O_RDWR);
+   dup2(sock,0);
+   dup2(sock,1);
+   dup2(sock,2);
+
    /* Bind socket before forking, so we know if the server started */
    if (!bind(sock,(struct sockaddr *) &addr, sizeof(addr))) {
+      if (we_hold_lock)
+         unlink(LOG_LOCK);
       we_own_log = 1;
       listen(sock,5);
       if ((pid=fork())==-1) {
@@ -169,6 +198,8 @@ int main(int argc, char **argv) {
           exit(4);
       }
    } else {
+      if (we_hold_lock)
+         unlink(LOG_LOCK);
       exit(5);
    }
 }

From arvidjaar@mail.ru Sun May 4 22:20:12 2003
Date: Mon, 05 May 2003 09:20:08 +0400
From: "Andrey Borzenkov"
To: "Pavel Roskin"
Cc: devfs@oss.sgi.com
Subject: Re[3]: [PATCH] fix initlog/minilogd deadlock on /dev/log access

> Hello!

Thank you for taking the time to investigate.

> The problem with the hang on "Finding module dependencies" has been traced to a deadlock in the kernel, possibly in devfs.
>
> One minilogd calls bind() on /dev/log, and the other calls unlink() on the same file. The problem only happens with Linux 2.4.x (not 2.5.x) and only if devfs is mounted.
>
> Given that it's essentially a race condition, we cannot be sure that the problem doesn't exist without devfs or on 2.5.x kernels. However, devfs must be involved somehow.
>
> Since both minilogd processes are in the "D" state, I don't think devfsd is involved. They are both waiting in the kernel.

I could not reproduce the D state. Could you send me your stack?

> The attached patch can be used as a workaround. It prevents minilogd from doing unlink() while another one is calling bind() on the same file. The patch is very ugly and is not meant to be applied. However, it demonstrates where the problem lies.

Yes, I know that it involves several concurrent minilogds. Your patch fixes just the symptom, not the reason why it happens :)

> I think there is more than one program to blame. We have two initlog processes, each calling minilogd. That's the first problem.

This is the exact problem that was fixed in my patch. What happens is:

1) the first initlog is started, finds out that /dev/log is unavailable, and starts minilogd;
2) the first initlog calls openlog/syslog/closelog. openlog (which does connect) updates the /dev/log atime. minilogd assumes syslog has been started and exits;
3) on the next call, initlog finds that /dev/log is unavailable and starts minilogd again;
4) see 2).

To have the deadlock we need a second initlog somewhere. Dammit, I missed that fact:

# The root filesystem is now read-write, so we can now log
# via syslog() directly..
if [ -n "$IN_INITLOG" ]; then
    IN_INITLOG=
fi

That perfectly explains why it happens at this place.
Instead of logging via a single initlog that is the parent of rc.sysinit, we now start a separate initlog for every action, each one starting minilogd every time it logs a line. At some point one of them happily hangs.

I am about to clean up initscripts anyway; this is one more point to consider.

> We have minilogd removing /dev/log without trying to find out if it's in use. This may or may not be OK. At least it's not OK if the kernel is not fixed. That's the second problem.

There is no way to find out if it is in use without race conditions. The only way is to try to connect, but you can't be sure it won't be removed and recreated between connect and bind.

> unlink() and bind() can deadlock on devfs. That's the third problem. If it's a known problem and it was fixed in 2.5.x kernels, it would be really nice to propagate the fix to the 2.4.x series, because a hanging Red Hat system is a major annoyance, and it took me months of living with even worse workarounds and a whole day to track it down.

I do not think it is related to devfs, but who knows... I still believe devfs just triggers bad minilogd behaviour that was hidden before.

> There are some more details in this bug:
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=85621
>
> I'll try to create a testcase not involving a reboot. Also, I'll be able to run the logs produced by Alt-SysRq-T through ksymoops. But I'm going to be quite busy next week, so I'd appreciate it if someone else beats me to that.

I was not able to reproduce the D state, but I can add a fourth problem: initlogs sleep in connect. As far as I can tell, this cannot happen: if the socket is not in LISTEN, connect should be refused. What I did was:

- stop syslog
- do something like this (in zsh):

  for i in {1..20}; do initlog test & initlog test & initlog test & done

I get many minilogds looping on poll and many initlogs hanging in connect. The fact that minilogd loops on poll means it listens on the socket but never gets anything to accept.
So, what is this socket that all the initlogs are trying to connect to?

thank you
-andrey

From proski@gnu.org Mon May 5 11:40:36 2003
Date: Mon, 5 May 2003 14:40:34 -0400 (EDT)
From: Pavel Roskin
To: Andrey Borzenkov
Cc: devfs@oss.sgi.com
Subject: Re[3]: [PATCH] fix initlog/minilogd deadlock on /dev/log access

On Mon, 5 May 2003, Andrey Borzenkov wrote:

> > One minilogd calls bind() on /dev/log, and the other calls unlink() on the same file. The problem only happens with Linux 2.4.x (not 2.5.x) and only if devfs is mounted.

Yes. I was able to run that system with a serial console and dump the process list with Break-T (the same as Alt-SysRq-T on the local console). The result is available here:

https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=91500&action=view

It is clearly visible that there is a deadlock between sys_unlink() and sys_bind(), which confirms my findings based on adding debug information to minilogd.

> > Since both minilogd processes are in the "D" state, I don't think devfsd is involved. They are both waiting in the kernel.
>
> I could not reproduce the D state. Could you send me your stack?

You can see the stack there. I have a standard /etc/devfsd.conf without statements to ignore /dev/log.

> Yes, I know that it involves several concurrent minilogds.
> Your patch fixes just the symptom, not the reason why it happens :)

Absolutely. Actually, 2.5.x kernels don't hang, but I see a lot of output on the console on startup, maybe because minilogd fails to work correctly. Fixing the hang in the devfs code is important but certainly insufficient.

> This is the exact problem that was fixed in my patch. What happens is:
>
> 1) the first initlog is started, finds out that /dev/log is unavailable, and starts minilogd;
[snip]

It would be great if you posted it on bugzilla. I feel my knowledge in this area is insufficient. Besides, this is the wrong list.

> > We have minilogd removing /dev/log without trying to find out if it's in use. This may or may not be OK. At least it's not OK if the kernel is not fixed. That's the second problem.
>
> There is no way to find out if it is in use without race conditions. The only way is to try to connect, but you can't be sure it won't be removed and recreated between connect and bind.

A properly designed API should provide a means to lock an object atomically when it's created, but UNIX sockets are decades old; it's hard to expect them to be properly designed. Still, I hope there is a solution (although I'm not an expert in this area).

> I do not think it is related to devfs, but who knows... I still believe devfs just triggers bad minilogd behaviour that was hidden before.

It's a deadlock between devfsd_notify_de() and devfs_d_revalidate_wait(). It's pretty hard to deny that devfs is involved :-)

> I get many minilogds looping on poll and many initlogs hanging in connect. The fact that minilogd loops on poll means it listens on the socket but never gets anything to accept. So, what is this socket that all the initlogs are trying to connect to?

Please copy it to bugzilla. I didn't write minilogd.
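Andrey's point that the only way to tell whether /dev/log is in use is to try to connect can be made concrete. The sketch below is a hypothetical helper, not code from minilogd or initlog. It assumes a stream socket (minilogd creates one with SOCK_STREAM) and classifies the path by the outcome of connect(): success means something is listening, ECONNREFUSED means the file exists but nobody listens on it, and ENOENT means there is no file. As the thread notes, the answer can be out of date by the time the caller acts on it.

```c
/* Hypothetical probe for a UNIX stream socket path; not minilogd code. */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

enum log_socket_state {
    LOG_SOCKET_IN_USE,  /* connect() succeeded: something is listening */
    LOG_SOCKET_STALE,   /* the file exists but nobody listens on it */
    LOG_SOCKET_ABSENT,  /* there is no such file */
    LOG_SOCKET_UNKNOWN  /* any other failure */
};

/* The result may be stale on return: another process can unlink() or
 * bind() the path before the caller acts on it. */
enum log_socket_state probe_log_socket(const char *path)
{
    struct sockaddr_un addr;
    enum log_socket_state state;
    int fd;

    assert(path != NULL);
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return LOG_SOCKET_UNKNOWN;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    /* A blocking connect() may wait if the listener's queue is full. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        state = LOG_SOCKET_IN_USE;
    else if (errno == ECONNREFUSED)
        state = LOG_SOCKET_STALE;
    else if (errno == ENOENT)
        state = LOG_SOCKET_ABSENT;
    else
        state = LOG_SOCKET_UNKNOWN;

    close(fd);  /* classify errno first, since close() may change it */
    return state;
}
```

Only LOG_SOCKET_STALE would make removing the file look safe, and even then a second process can bind the path between the probe and the unlink(). That window is why the thread turns to a lock (or a kernel fix) rather than probing alone.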
--
Regards,
Pavel Roskin

From arvidjaar@mail.ru Mon May 5 13:48:35 2003
Date: Tue, 6 May 2003 00:47:38 +0400
From: Andrey Borzenkov
To: Pavel Roskin
Cc: devfs@oss.sgi.com
Subject: Re: [PATCH] fix initlog/minilogd deadlock on /dev/log access

Please try the attached proof-of-concept patch (untested). It is the best I can come up with at 0:45 a.m.

On Monday 05 May 2003 22:40, Pavel Roskin wrote:
> On Mon, 5 May 2003, Andrey Borzenkov wrote:
> > > One minilogd calls bind() on /dev/log, and the other calls unlink() on the same file. The problem only happens with Linux 2.4.x (not 2.5.x) and only if devfs is mounted.
>
> Yes. I was able to run that system with a serial console and dump the process list with Break-T (the same as Alt-SysRq-T on the local console).
> The result is available here:
>
> https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=91500&action=view
>
> It is clearly visible that there is a deadlock between sys_unlink() and sys_bind(), which confirms my findings based on adding debug information to minilogd.

Argh! It was not bind/unlink. It was two concurrent lookups on a non-existent entry that had just been removed by one of the minilogds.

minilogd1                                  minilogd2

path_lookup("dev/log", LOOKUP_PARENT, &nd);
  -> yields dentry for
                                           path_lookup("dev/log", LOOKUP_PARENT, &nd);
                                             -> yields dentry for
down(->i_sem)
  holds i_sem
                                           down(->i_sem) - sleeps
lookup_hash("log", )
. devfs_lookup(, "log")
. MISS
. set "log"->d_op to &devfs_wait_dops;
. init "log"->wait_queue
. up(->i_sem)
.                                          obtains i_sem
.                                          lookup_hash("log", );
.                                          cached_lookup(, "log", 0)
.                                          devfs_d_revalidate_wait("log", 0)
.                                          wait on "log"->wait_queue ...
.                                            waits to be woken up by devfs_lookup
try_modload("log", bla bla bla)
down(->i_sem)
  deadlock
wakeup("log"->wait_queue)
  happens only here

[...]

I have to think it over. It is not trivial to fix. It appears ->d_revalidate is sometimes called under i_sem and sometimes without i_sem. Rearranging devfs_lookup will probably do.

The whole story sucks. Do you know who maintains devfs in 2.4 now?

I am not sure the problem does not exist in 2.5. It depends on precise timing and scheduling, so it may still be there, just not exposed as yet.

-andrey

Thousands of thanks, especially as the problem magically stopped happening here :(

[Attachment: devfs.minilogd.patch]

--- linux-2.4.21-0.13mdk/fs/devfs/base.c.minilogd 2002-11-29 02:53:15.000000000 +0300
+++ linux-2.4.21-0.13mdk/fs/devfs/base.c 2003-05-06 00:41:35.000000000 +0400
@@ -3038,7 +3038,6 @@ static struct dentry *devfs_lookup (stru
 	   revalidation */
     up (&dir->i_sem);
     wait_for_devfsd_finished (fs_info); /* If I'm not devfsd, must wait */
-    down (&dir->i_sem); /* Grab it again because them's the rules */
     de = lookup_info.de;
     /* If someone else has been so kind as to make the inode, we go home
 	early */
@@ -3068,6 +3067,7 @@ out:
     write_lock (&parent->u.dir.lock);
     wake_up (&lookup_info.wait_queue);
     write_unlock (&parent->u.dir.lock);
+    down (&dir->i_sem); /* Grab it again because them's the rules */
     devfs_put (de);
     return retval;
 } /* End Function devfs_lookup */

From proski@gnu.org Mon May 5 14:34:43 2003
Date: Mon, 5 May 2003 17:34:36 -0400 (EDT)
From: Pavel Roskin
To: Andrey Borzenkov
Cc: devfs@oss.sgi.com
Subject: Re: [PATCH] fix initlog/minilogd deadlock on /dev/log access

On Tue, 6 May 2003, Andrey Borzenkov wrote:

> Please try the attached proof-of-concept patch (untested). It is the best I can come up with at 0:45 a.m.

Yes, it helps, thank you! 5 reboots with the original minilogd and rc.sysinit, no hangs.

The patch also applies to Linux 2.5.69 and has no visible effect, which is good because that kernel was already OK.

--
Regards,
Pavel Roskin

From james_mcmechan@hotmail.com Mon May 5 22:12:42 2003
Date: Mon, 5 May 2003 22:11:55 -0700
From: "James McMechan"
Subject: Re: [PATCH] fix initlog/minilogd deadlock on /dev/log access

I have also found that your patch greatly improves things on my systems. I had been preparing to try the minilogd patch, but I ended up trying the kernel patch since I had the kernel tree installed.

I have booted my systems several times without problem.
I had previously had a boot failure rate of about 50%.

Thank you for finding this problem and, better yet, the solution.

From Andrey.Borzenkov@siemens.com Tue May 6 01:28:45 2003
Date: Tue, 6 May 2003 12:28:30 +0400
From: Borzenkov Andrey
To: 'James McMechan', devfs@oss.sgi.com
Subject: RE: [PATCH] fix initlog/minilogd deadlock on /dev/log access

> I have also found that your patch greatly improves things on my systems.
> I had been preparing to try the minilogd patch, but I ended up trying the kernel patch since I had the kernel tree installed.

I still advise you to use the minilogd patch as well. It prevents you from losing boot messages up to the point where syslog is started.

The second thing you need to do is make devfsd ignore /dev/log and remove it from any dev backing store you have (if any). This will trigger minilogd exit as well, so at least some of the messages (up to devfsd start) get lost.

> I have booted my systems several times without problem. I had previously had a boot failure rate of about 50%.

Yeah, same here.

> Thank you for finding this problem and, better yet, the solution.

Pavel did all the work. I just put two ends together. With the stack trace, the problem became obvious.

-andrey

From arvidjaar@mail.ru Tue May 6 02:03:47 2003
Date: Tue, 06 May 2003 12:31:32 +0400
From: "Andrey Borzenkov"
To: "Pavel Roskin"
Cc: devfs@oss.sgi.com
Subject: Re[2]: [PATCH] fix initlog/minilogd deadlock on /dev/log access

> On Tue, 6 May 2003, Andrey Borzenkov wrote:
>
> > Please try the attached proof-of-concept patch (untested). It is the best I can come up with at 0:45 a.m.
>
> Yes, it helps, thank you! 5 reboots with the original minilogd and rc.sysinit, no hangs.

I'll wait till the end of the week for any regressions. Please tell me if you observe any. After review, the patch appears safe, so I'll submit it for the 2.4 tree.

> The patch also applies to Linux 2.5.69 and has no visible effect, which is good because that kernel was already OK.

If it applies, that is bad. It means that, unless the locking rules have changed, 2.5 is likely to have the same deadlock situation.

Thank you once more for the excellent debugging work.

-andrey

From proski@gnu.org Tue May 13 11:59:22 2003
Date: Tue, 13 May 2003 14:59:19 -0400 (EDT)
From: Pavel Roskin
To: devfs@oss.sgi.com
Subject: Is this list archived?

Hello!

I wanted to show the recent discussion about the race condition in devfs to Christoph Hellwig, who is also aware of some race condition in devfs (http://lwn.net/Articles/31952/), but I cannot find the archives of the mailing list.

The link from http://www.atnf.csiro.au/people/rgooch/linux/docs/devfs.html to http://oss.sgi.com/projects/devfs/archive/ doesn't work (page not found).

--
Regards,
Pavel Roskin

From Andrey.Borzenkov@siemens.com Tue May 13 22:17:24 2003
Date: Wed, 14 May 2003 09:17:09 +0400
From: Borzenkov Andrey
To: 'Pavel Roskin', devfs@oss.sgi.com
Subject: RE: Is this list archived?

> Hello!
>
> I wanted to show the recent discussion about the race condition in devfs to Christoph Hellwig, who is also aware of some race condition in devfs (http://lwn.net/Articles/31952/), but I cannot find the archives of the mailing list.
>
> The link from http://www.atnf.csiro.au/people/rgooch/linux/docs/devfs.html to http://oss.sgi.com/projects/devfs/archive/ doesn't work (page not found).

The archives have been dead for a long time. I posted the full details, including the patch, on lkml:

http://marc.theaimsgroup.com/?l=linux-kernel&m=105233420622539&w=2

You can refer to this post. I would appreciate it if you kept me informed.

BTW, I checked 2.5, and the devfs code in question is the same. Unless the locking rules changed substantially (not yet checked), it means the same race condition.

What do people think about moving the project to a public place like SourceForge? I have a patch for devfsd modutils/module-init-tools interoperability (will post later) and some modules.devfs cleanups, but how can updates be made available?

-andrey

From rgooch@ras.ucalgary.ca Tue May 13 22:52:17 2003
Date: Tue, 13 May 2003 23:52:07 -0600
From: Richard Gooch
To: Borzenkov Andrey
Cc: 'Pavel Roskin', devfs@oss.sgi.com
Subject: RE: Is this list archived?

Borzenkov Andrey writes:
> > Hello!
> >
> > I wanted to show the recent discussion about the race condition in devfs to Christoph Hellwig, who is also aware of some race condition in devfs (http://lwn.net/Articles/31952/), but I cannot find the archives of the mailing list.
> >
> > The link from http://www.atnf.csiro.au/people/rgooch/linux/docs/devfs.html to http://oss.sgi.com/projects/devfs/archive/ doesn't work (page not found).
>
> The archives have been dead for a long time. I posted the full details, including the patch, on lkml:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=105233420622539&w=2
>
> You can refer to this post. I would appreciate it if you kept me informed.
>
> BTW, I checked 2.5, and the devfs code in question is the same. Unless the locking rules changed substantially (not yet checked), it means the same race condition.
>
> What do people think about moving the project to a public place like SourceForge? I have a patch for devfsd modutils/module-init-tools interoperability (will post later) and some modules.devfs cleanups, but how can updates be made available?

No, I haven't given up maintenance. Let me get my life back in order and patches will start flowing again.

Regards,

Richard....
Permanent: rgooch@atnf.csiro.au
Current: rgooch@ras.ucalgary.ca

From Andrey.Borzenkov@siemens.com Tue May 13 23:18:07 2003
Received: with ECARTIS (v1.0.0; list devfs); Tue, 13 May 2003 23:18:20 -0700 (PDT)
Received: from thoth.sbs.de (thoth.sbs.de [192.35.17.2]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h4E6I5Fu023164 for ; Tue, 13 May 2003 23:18:06 -0700
Received: from mail2.siemens.de (mail2.siemens.de [139.25.208.11]) by thoth.sbs.de (8.11.7/8.11.7) with ESMTP id h4E6I4229629 for ; Wed, 14 May 2003 08:18:04 +0200 (MEST)
Received: from MOWD019A.mow.siemens.ru ([163.242.196.119]) by mail2.siemens.de (8.11.7/8.11.7) with ESMTP id h4E6I3E24169 for ; Wed, 14 May 2003 08:18:04 +0200 (MEST)
Received: by mowd019a.mow.siemens.ru with Internet Mail Service (5.5.2653.19) id ; Wed, 14 May 2003 10:22:21 +0400
Received: from mw2b210c (163.242.193.12 [163.242.193.12]) by MOWD019A.mow.siemens.ru with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id K661J4DC; Wed, 14 May 2003 10:22:14 +0400
From: Borzenkov Andrey
To: devfs@oss.sgi.com
Subject: RE: Is this list archived?
Date: Wed, 14 May 2003 10:17:52 +0400
Message-ID: <6134254DE87BD411908B00A0C99B044F05A0C920@mowd019a.mow.siemens.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook, Build 10.0.4510
In-Reply-To: <200305140552.h4E5q7Bo011113@vindaloo.ras.ucalgary.ca>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
X-archive-position: 136
X-ecartis-version: Ecartis v1.0.0
Sender: devfs-bounce@oss.sgi.com
Errors-to: devfs-bounce@oss.sgi.com
X-original-sender: Andrey.Borzenkov@siemens.com
Precedence: bulk
X-list: devfs

>
> No, I haven't given up maintenance. Let me get my life back in order
> and patches will start flowing again.
>

Hey ho! Very pleased to hear from you again. I hope nothing bad happened.
Best wishes
-andrey

From proski@gnu.org Tue May 27 08:29:52 2003
Received: with ECARTIS (v1.0.0; list devfs); Tue, 27 May 2003 08:30:33 -0700 (PDT)
Received: from fencepost.gnu.org (fencepost.gnu.org [199.232.76.164]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h4RFTh2x003651 for ; Tue, 27 May 2003 08:29:52 -0700
Received: from proski by fencepost.gnu.org with local (Exim 4.20) id 19KgOU-0007eW-Nt; Tue, 27 May 2003 11:29:42 -0400
Date: Tue, 27 May 2003 11:29:53 -0400 (EDT)
From: Pavel Roskin
X-X-Sender: proski@marabou.research.att.com
To: devfs@oss.sgi.com
cc: linux-kernel@vger.kernel.org
Subject: [PATCH] Graceful failure in devfs_remove() in 2.5.x
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 137
X-ecartis-version: Ecartis v1.0.0
Sender: devfs-bounce@oss.sgi.com
Errors-to: devfs-bounce@oss.sgi.com
X-original-sender: proski@gnu.org
Precedence: bulk
X-list: devfs

Hello!

This is the second time I have encountered a kernel panic in the same
place. When devfs_remove() is called on a non-existent file entry, the
kernel panics and I have to reboot the system.

The first time it was the unregistering of pseudoterminals. This time it's
the ide-floppy module, which doesn't register devfs entries if the media is
absent but still tries to unregister them. The bug in ide-floppy will be
reported separately.

The point of this message is threefold. First, a failure in devfs_remove()
is possible, especially with rarely used drivers. Secondly, it is not fatal
enough to justify an immediate panic and reboot. Thirdly, by panicking,
devfs misses a chance to tell the user what's going wrong.

This patch makes devfs_remove() print an error to the kernel log and
continue. PRINTK is defined in fs/devfs/base.c to report errors in cases
like this one:

#define PRINTK(format, args...) \
   {printk (KERN_ERR "%s" format, __FUNCTION__ , ## args);}

The patch:
==============================================
--- linux.orig/fs/devfs/base.c
+++ linux/fs/devfs/base.c
@@ -1710,6 +1710,11 @@ void devfs_remove(const char *fmt, ...)
 	if (n < 64 && buf[0]) {
 		devfs_handle_t de = _devfs_find_entry(NULL, buf, 0);
 
+		if (!de) {
+			PRINTK ("(%s): not found, cannot remove\n", buf);
+			return;
+		}
+
 		write_lock(&de->parent->u.dir.lock);
 		_devfs_unregister(de->parent, de);
 		devfs_put(de);
==============================================

The patch is against Linux 2.5.70. Linux 2.4.21-rc4 already has protection
against the panic, although it doesn't print an error message; see
devfs_unlink() in fs/devfs/base.c.

--
Regards,
Pavel Roskin

From hch@infradead.org Wed May 28 02:46:09 2003
Received: with ECARTIS (v1.0.0; list devfs); Wed, 28 May 2003 02:46:15 -0700 (PDT)
Received: from phoenix.infradead.org (phoenix.infradead.org [195.224.96.167]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h4S9k82x015938 for ; Wed, 28 May 2003 02:46:09 -0700
Received: from hch by phoenix.infradead.org with local (Exim 4.10) id 19KxVT-0007AX-00; Wed, 28 May 2003 10:46:03 +0100
Date: Wed, 28 May 2003 10:46:03 +0100
From: Christoph Hellwig
To: Pavel Roskin
Cc: devfs@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Graceful failure in devfs_remove() in 2.5.x
Message-ID: <20030528104603.A27503@infradead.org>
Mail-Followup-To: Christoph Hellwig , Pavel Roskin , devfs@oss.sgi.com, linux-kernel@vger.kernel.org
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
In-Reply-To: ; from proski@gnu.org on Tue, May 27, 2003 at 11:29:53AM -0400
X-archive-position: 138
X-ecartis-version: Ecartis v1.0.0
Sender: devfs-bounce@oss.sgi.com
Errors-to: devfs-bounce@oss.sgi.com
X-original-sender: hch@infradead.org
Precedence: bulk
X-list: devfs

On Tue, May 27, 2003 at 11:29:53AM -0400, Pavel Roskin wrote:
> This patch makes devfs_remove() print an error to the kernel log and
> continue. PRINTK is defined in fs/devfs/base.c to report errors in the
> cases like this one:

Patch looks okay _except_ for use of this gross macro. Just
use plain printk instead.

From proski@gnu.org Wed May 28 09:00:30 2003
Received: with ECARTIS (v1.0.0; list devfs); Wed, 28 May 2003 09:01:04 -0700 (PDT)
Received: from fencepost.gnu.org (fencepost.gnu.org [199.232.76.164]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h4SG0T2x029607 for ; Wed, 28 May 2003 09:00:29 -0700
Received: from proski by fencepost.gnu.org with local (Exim 4.20) id 19L3Lh-0000hG-G0; Wed, 28 May 2003 12:00:21 -0400
Date: Wed, 28 May 2003 12:00:20 -0400 (EDT)
From: Pavel Roskin
X-X-Sender: proski@marabou.research.att.com
To: Christoph Hellwig
cc: devfs@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Graceful failure in devfs_remove() in 2.5.x
In-Reply-To: <20030528104603.A27503@infradead.org>
Message-ID:
References: <20030528104603.A27503@infradead.org>
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="8323328-869121793-1054137620=:1702"
X-archive-position: 139
X-ecartis-version: Ecartis v1.0.0
Sender: devfs-bounce@oss.sgi.com
Errors-to: devfs-bounce@oss.sgi.com
X-original-sender: proski@gnu.org
Precedence: bulk
X-list: devfs

--8323328-869121793-1054137620=:1702
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Wed, 28 May 2003, Christoph Hellwig wrote:

> On Tue, May 27, 2003 at 11:29:53AM -0400, Pavel Roskin wrote:
> > This patch makes devfs_remove() print an error to the kernel log and
> > continue. PRINTK is defined in fs/devfs/base.c to report errors in the
> > cases like this one:
>
> Patch looks okay _except_ for use of this gross macro. Just
> ise plain printk instead.
I always try to follow the existing code style, but if you want me to make
an exception, here it is. The fixed patch is attached.

--
Regards,
Pavel Roskin

--8323328-869121793-1054137620=:1702
Content-Type: TEXT/PLAIN; charset=US-ASCII; name="devfs-rm.diff"
Content-ID:
Content-Description:
Content-Disposition: attachment; filename="devfs-rm.diff"

--- linux.orig/fs/devfs/base.c
+++ linux/fs/devfs/base.c
@@ -1710,6 +1710,12 @@ void devfs_remove(const char *fmt, ...)
 	if (n < 64 && buf[0]) {
 		devfs_handle_t de = _devfs_find_entry(NULL, buf, 0);
 
+		if (!de) {
+			printk(KERN_ERR "%s: %s not found, cannot remove\n",
+			       __FUNCTION__, buf);
+			return;
+		}
+
 		write_lock(&de->parent->u.dir.lock);
 		_devfs_unregister(de->parent, de);
 		devfs_put(de);

--8323328-869121793-1054137620=:1702--

From proski@gnu.org Wed May 28 09:36:01 2003
Received: with ECARTIS (v1.0.0; list devfs); Wed, 28 May 2003 09:36:13 -0700 (PDT)
Received: from fencepost.gnu.org (fencepost.gnu.org [199.232.76.164]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h4SGa02x029981 for ; Wed, 28 May 2003 09:36:00 -0700
Received: from proski by fencepost.gnu.org with local (Exim 4.20) id 19L3uB-0002NB-TG; Wed, 28 May 2003 12:35:59 -0400
Date: Wed, 28 May 2003 12:35:58 -0400 (EDT)
From: Pavel Roskin
X-X-Sender: proski@marabou.research.att.com
To: devfs@oss.sgi.com, ide-floppy-test@lists.sourceforge.net
cc: Paul Bristow
Subject: ide-floppy doesn't register with devfs if no media
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 140
X-ecartis-version: Ecartis v1.0.0
Sender: devfs-bounce@oss.sgi.com
Errors-to: devfs-bounce@oss.sgi.com
X-original-sender: proski@gnu.org
Precedence: bulk
X-list: devfs

Hello!
There is a bug in 2.5.x kernels (last tested with 2.5.69-bk18): devfs
entries are not registered for IDE floppies if the media is absent while
the ide-floppy driver is being initialized.

ide-floppy sets the number of minor numbers to more than one in
idefloppy_attach() (drivers/ide/ide-floppy.c) and calls add_disk() in
drivers/block/genhd.c, which calls register_disk() in
fs/partitions/check.c. register_disk() registers devfs entries for devices
with one minor number (e.g. a standard PC floppy) and then checks the
media. If the media is missing, no devfs entries are registered at all.
I think it would be better to register at least the "disc" entry.

By the way, if ide-floppy is a module, it tries to unregister the devfs
entries on unload, which results in a kernel panic. My earlier patch,
posted today to the devfs list, prevents the panic and may be useful for
development of ide-floppy until this problem is fixed.

--
Regards,
Pavel Roskin