From arvidjaar@mail.ru Sun Jul 6 10:23:07 2003 Received: with ECARTIS (v1.0.0; list devfs); Sun, 06 Jul 2003 10:23:19 -0700 (PDT) Received: from hueymiccailhuitl.mtu.ru (hueytecuilhuitl.mtu.ru [195.34.32.123]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h66HN52x019222 for ; Sun, 6 Jul 2003 10:23:06 -0700 Received: from ppp135-79.dialup.mtu-net.ru (ppp135-79.dialup.mtu-net.ru [62.118.135.79]) by hueymiccailhuitl.mtu.ru (Postfix) with ESMTP id A8CB1E0E43; Sun, 6 Jul 2003 21:22:59 +0400 (MSD) (envelope-from arvidjaar@mail.ru) From: Andrey Borzenkov To: linux-kernel@vger.kernel.org Subject: [PATCH][2.5.73] stack corruption in devfs_lookup Date: Sun, 6 Jul 2003 21:06:53 +0400 User-Agent: KMail/1.5 References: In-Reply-To: Cc: devfs@oss.sgi.com MIME-Version: 1.0 Message-Id: <200307062058.40797.arvidjaar@mail.ru> Content-Type: Multipart/Mixed; boundary="Boundary-00=_tcFC/K5rog3javx" X-archive-position: 145 X-ecartis-version: Ecartis v1.0.0 Sender: devfs-bounce@oss.sgi.com Errors-to: devfs-bounce@oss.sgi.com X-original-sender: arvidjaar@mail.ru Precedence: bulk X-list: devfs --Boundary-00=_tcFC/K5rog3javx Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Doing concurrent lookups for the same name in devfs with devfsd and modules enabled may result in stack coruption. When devfs_lookup needs to call devfsd it arranges for other lookups for the same name to wait. It is using local variable as wait queue head. After devfsd returns devfs_lookup wakes up all waiters and returns. Unfortunately there is no garantee all waiters will actually get chance to run and clean up before devfs_lookup returns. so some of them attempt to access already freed storage on stack. It is trivial to trigger with SMP kernel (I have single-CPU system if it matters) doing while true do ls /dev/foo & done With spinlock debug enabled this results in large number of stacks similar to ------------[ cut here ]------------ kernel BUG at include/asm/spinlock.h:120! invalid operand: 0000 [#1] CPU: 0 EIP: 0060:[] Tainted: G S EFLAGS: 00010082 EIP is at remove_wait_queue+0xac/0xc0 eax: 0000000e ebx: f6617e4c ecx: 00000000 edx: 00000001 esi: f6747dd0 edi: f6616000 ebp: f6617e10 esp: f6617df0 ds: 007b es: 007b ss: 0068 Process ls (pid: 1517, threadinfo=f6616000 task=f6619900) Stack: c03eb9d5 c011ffa0 00000286 f6617e24 c0443880 f6747dd0 f6616000 f6617e4c f6617e78 c01cb3e6 c04470c0 f6616000 00000246 f6747dcc c1a6f1dc 00000000 f6619900 c011d4e0 00000000 00000000 f7d4b73c f663d005 f6759828 00000000 Call Trace: [] remove_wait_queue+0x0/0xc0 [] devfs_d_revalidate_wait+0x1d6/0x1f0 [] default_wake_function+0x0/0x30 [] default_wake_function+0x0/0x30 [] do_lookup+0x5a/0xa0 [] link_path_walk+0x5be/0xb20 [] kmem_cache_alloc+0x14b/0x190 [] __user_walk+0x3e/0x60 [] vfs_stat+0x1e/0x60 [] do_brk+0x12b/0x200 [] sys_stat64+0x1b/0x40 [] sys_brk+0xf2/0x120 [] do_page_fault+0x0/0x4c5 [] sysenter_past_esp+0x52/0x71 Code: 0f 0b 78 00 6c b0 3e c0 e9 72 ff ff ff 8d b4 26 00 00 00 00 <6>note: ls[1517] exited with preempt_count 1 eip: c011ffa0 without spinlock debug system usually hung dead with reset button as the only possibility. I was not able to reproduce it on 2.4 on single-CPU system - in 2.4 devfs_d_revalidate_wait does not attempt to remove itself from wait queue so it appears to be safe. attached patch is against 2.5.73 but applies to 2.5.74 as well. It makes lookup struct be allocated from heap and adds reference counter to free it when no more needed. regards -andrey --Boundary-00=_tcFC/K5rog3javx Content-Type: text/x-diff; charset="iso-8859-1"; name="2.5.73-devfs_stack_corruption.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="2.5.73-devfs_stack_corruption.patch" --- linux-2.5.73/fs/devfs/base.c.stack 2003-06-23 19:46:44.000000000 +0400 +++ linux-2.5.73/fs/devfs/base.c 2003-06-26 21:01:45.000000000 +0400 @@ -2208,8 +2208,46 @@ struct devfs_lookup_struct { devfs_handle_t de; wait_queue_head_t wait_queue; + atomic_t count; }; +static struct devfs_lookup_struct * +new_devfs_lookup_struct(void) +{ + struct devfs_lookup_struct *p = kmalloc(sizeof(*p), GFP_KERNEL); + + if (!p) + return NULL; + + init_waitqueue_head (&p->wait_queue); + atomic_set(&p->count, 1); + return p; +} + +static void +get_devfs_lookup_struct(struct devfs_lookup_struct *info) +{ + if (info) + atomic_inc(&info->count); + else { + printk(KERN_ERR "null devfs_lookup_struct pointer\n"); + dump_stack(); + } +} + +static void +put_devfs_lookup_struct(struct devfs_lookup_struct *info) +{ + if (info) { + if (!atomic_dec_and_test(&info->count)) + return; + kfree(info); + } else { + printk(KERN_ERR "null devfs_lookup_struct pointer\n"); + dump_stack(); + } +} + /* XXX: this doesn't handle the case where we got a negative dentry but a devfs entry has been registered in the meanwhile */ static int devfs_d_revalidate_wait (struct dentry *dentry, int flags) @@ -2252,11 +2290,13 @@ static int devfs_d_revalidate_wait (stru read_lock (&parent->u.dir.lock); if (dentry->d_fsdata) { + get_devfs_lookup_struct(lookup_info); set_current_state (TASK_UNINTERRUPTIBLE); add_wait_queue (&lookup_info->wait_queue, &wait); read_unlock (&parent->u.dir.lock); schedule (); remove_wait_queue (&lookup_info->wait_queue, &wait); + put_devfs_lookup_struct(lookup_info); } else read_unlock (&parent->u.dir.lock); return 1; @@ -2268,7 +2308,7 @@ static int devfs_d_revalidate_wait (stru static struct dentry *devfs_lookup (struct inode *dir, struct dentry *dentry) { struct devfs_entry tmp; /* Must stay in scope until devfsd idle again */ - struct devfs_lookup_struct lookup_info; + struct devfs_lookup_struct *lookup_info; struct fs_info *fs_info = dir->i_sb->s_fs_info; struct devfs_entry *parent, *de; struct inode *inode; @@ -2285,9 +2325,10 @@ static struct dentry *devfs_lookup (stru read_lock (&parent->u.dir.lock); de = _devfs_search_dir (parent, dentry->d_name.name, dentry->d_name.len); read_unlock (&parent->u.dir.lock); - lookup_info.de = de; - init_waitqueue_head (&lookup_info.wait_queue); - dentry->d_fsdata = &lookup_info; + lookup_info = new_devfs_lookup_struct(); + if (!lookup_info) return ERR_PTR(-ENOMEM); + lookup_info->de = de; + dentry->d_fsdata = lookup_info; if (de == NULL) { /* Try with devfsd. For any kind of failure, leave a negative dentry so someone else can deal with it (in the case where the sysadmin @@ -2297,6 +2338,7 @@ static struct dentry *devfs_lookup (stru if (try_modload (parent, fs_info, dentry->d_name.name, dentry->d_name.len, &tmp) < 0) { /* Lookup event was not queued to devfsd */ + put_devfs_lookup_struct(lookup_info); d_add (dentry, NULL); return NULL; } @@ -2309,7 +2351,7 @@ static struct dentry *devfs_lookup (stru up (&dir->i_sem); wait_for_devfsd_finished (fs_info); /* If I'm not devfsd, must wait */ down (&dir->i_sem); /* Grab it again because them's the rules */ - de = lookup_info.de; + de = lookup_info->de; /* If someone else has been so kind as to make the inode, we go home early */ if (dentry->d_inode) goto out; @@ -2333,10 +2375,11 @@ static struct dentry *devfs_lookup (stru de->name, de->inode.ino, inode, de, current->comm); d_instantiate (dentry, inode); out: + write_lock (&parent->u.dir.lock); dentry->d_op = &devfs_dops; dentry->d_fsdata = NULL; - write_lock (&parent->u.dir.lock); - wake_up (&lookup_info.wait_queue); + wake_up (&lookup_info->wait_queue); + put_devfs_lookup_struct(lookup_info); write_unlock (&parent->u.dir.lock); devfs_put (de); return retval; --Boundary-00=_tcFC/K5rog3javx-- From akpm@osdl.org Sun Jul 6 12:02:22 2003 Received: with ECARTIS (v1.0.0; list devfs); Sun, 06 Jul 2003 12:02:27 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h66J2K2x020379 for ; Sun, 6 Jul 2003 12:02:22 -0700 Received: from mnm (build.pdx.osdl.net [172.20.1.2]) by mail.osdl.org (8.11.6/8.11.6) with ESMTP id h66J24q03331; Sun, 6 Jul 2003 12:02:05 -0700 Date: Sun, 6 Jul 2003 12:03:15 -0700 From: Andrew Morton To: Andrey Borzenkov Cc: linux-kernel@vger.kernel.org, devfs@oss.sgi.com Subject: Re: [PATCH][2.5.73] stack corruption in devfs_lookup Message-Id: <20030706120315.261732bb.akpm@osdl.org> In-Reply-To: <200307062058.40797.arvidjaar@mail.ru> References: <200307062058.40797.arvidjaar@mail.ru> X-Mailer: Sylpheed version 0.9.0pre1 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 146 X-ecartis-version: Ecartis v1.0.0 Sender: devfs-bounce@oss.sgi.com Errors-to: devfs-bounce@oss.sgi.com X-original-sender: akpm@osdl.org Precedence: bulk X-list: devfs Andrey Borzenkov wrote: > > When devfs_lookup needs to call devfsd it arranges for other lookups for the > same name to wait. It is using local variable as wait queue head. After > devfsd returns devfs_lookup wakes up all waiters and returns. Unfortunately > there is no garantee all waiters will actually get chance to run and clean up > before devfs_lookup returns. so some of them attempt to access already freed > storage on stack. OK, but I think there is a simpler fix. We can rely on the side-effects of prepare_to_wait() and finish_wait(). The wakeup() will remove all wait_queue_t's from the wait_queue_head, and so when the waiters wake up and call finish_wait(), they will never touch the now-out-of-scope waitqueue head. It is a little faster than the currentcode, too. Could you please test this? diff -puN fs/devfs/base.c~devfs-oops-fix-2 fs/devfs/base.c --- 25/fs/devfs/base.c~devfs-oops-fix-2 2003-07-06 11:55:38.000000000 -0700 +++ 25-akpm/fs/devfs/base.c 2003-07-06 11:59:23.000000000 -0700 @@ -2218,7 +2218,6 @@ static int devfs_d_revalidate_wait (stru struct fs_info *fs_info = dir->i_sb->s_fs_info; devfs_handle_t parent = get_devfs_entry_from_vfs_inode (dir); struct devfs_lookup_struct *lookup_info = dentry->d_fsdata; - DECLARE_WAITQUEUE (wait, current); if ( is_devfsd_or_child (fs_info) ) { @@ -2252,11 +2251,12 @@ static int devfs_d_revalidate_wait (stru read_lock (&parent->u.dir.lock); if (dentry->d_fsdata) { - set_current_state (TASK_UNINTERRUPTIBLE); - add_wait_queue (&lookup_info->wait_queue, &wait); - read_unlock (&parent->u.dir.lock); - schedule (); - remove_wait_queue (&lookup_info->wait_queue, &wait); + DEFINE_WAIT(wait); + + prepare_to_wait(&lookup_info->wait_queue, &wait, TASK_UNINTERRUPTIBLE); + read_unlock(&parent->u.dir.lock); + schedule(); + finish_wait(&lookup_info->wait_queue, &wait); } else read_unlock (&parent->u.dir.lock); return 1; @@ -2336,6 +2336,12 @@ out: dentry->d_op = &devfs_dops; dentry->d_fsdata = NULL; write_lock (&parent->u.dir.lock); + /* + * This wakeup will remove all waiters' wait_queue_t's from the waitqueue + * head, because the waiters use prepare_to_wait()/finished_wait(). + * Hence it is OK that the waitqueue_head goes out of scope immediately + * after the wakeup is delivered + */ wake_up (&lookup_info.wait_queue); write_unlock (&parent->u.dir.lock); devfs_put (de); _ From arvidjaar@mail.ru Mon Jul 7 12:06:57 2003 Received: with ECARTIS (v1.0.0; list devfs); Mon, 07 Jul 2003 12:07:07 -0700 (PDT) Received: from hueymiccailhuitl.mtu.ru (hueytecuilhuitl.mtu.ru [195.34.32.123]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h67J6s2x002219 for ; Mon, 7 Jul 2003 12:06:56 -0700 Received: from ppp128-147.dialup.mtu-net.ru (ppp128-147.dialup.mtu-net.ru [62.118.128.147]) by hueymiccailhuitl.mtu.ru (Postfix) with ESMTP id 6CBB6D97F2; Mon, 7 Jul 2003 23:06:49 +0400 (MSD) (envelope-from arvidjaar@mail.ru) From: Andrey Borzenkov To: Andrew Morton Subject: [PATCH][2.5.74] devfs lookup deadlock/stack corruption combined patch Date: Mon, 7 Jul 2003 23:06:15 +0400 User-Agent: KMail/1.5 References: <20030706120315.261732bb.akpm@osdl.org> <20030706175405.518f680d.akpm@osdl.org> In-Reply-To: <20030706175405.518f680d.akpm@osdl.org> Cc: linux-kernel@vger.kernel.org, devfs@oss.sgi.com MIME-Version: 1.0 Content-Type: Multipart/Mixed; boundary="Boundary-00=_nScC/Bk6DiZ1tYs" Message-Id: <200307072306.15995.arvidjaar@mail.ru> X-archive-position: 147 X-ecartis-version: Ecartis v1.0.0 Sender: devfs-bounce@oss.sgi.com Errors-to: devfs-bounce@oss.sgi.com X-original-sender: arvidjaar@mail.ru Precedence: bulk X-list: devfs --Boundary-00=_nScC/Bk6DiZ1tYs Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline On Monday 07 July 2003 04:54, you wrote: > Actually, don't bother. This idea can be made to work, but > we already have enough tricky stuff in the wait/wakeup area. > > Let's run with your original patch. > I finally hit a painfully trivial way to reproduce another long standing devfs problem - deadlock between devfs_lookup and devfs_d_revalidate_wait. When devfs_lookup releases directory i_sem devfs_d_revalidate_wait grabs it (it happens not for every path) and goes to wait to be waked up. Unfortunately, devfs_lookup attempts to acquire directory i_sem before ever waking it up ... To reproduce (2.5.74 UP or SMP - does not matter, single CPU system) ls /dev/foo & rm -f /dev/foo & or possibly in a loop but then it easily fills up process table. In my case it hangs 100% reliably - on 2.5 OR 2.4. The current fix is to move re-acquire of i_sem after all devfs_d_revalidate_wait waiters have been waked up. Much better fix would be to ensure that ->d_revalidate either is always called under i_sem or always without. But that means the very heart of VFS and I do not dare to touch it. The fix has been tested on 2.4 (and is part of unofficial Mandrake Club kernel); I expected the same bug is in 2.5; I just was stupid not seeing the way to reproduce it before. Attached is combined patch and fix for deadlock only (to show it alone). Andrew, I slightly polished original stack corruption version to look more consistent with the rest of devfs; also removed NULL pointer checks - let it just BUG in this case if it happens. I have already sent the patch for 2.4 two times - please, could somebody finally either apply it or explain what is wrong with it. Richard is out of reach apparently and the bug is real and seen by many people. regards -andrey --Boundary-00=_nScC/Bk6DiZ1tYs Content-Type: text/x-diff; charset="iso-8859-1"; name="2.5.74-devfs_combined.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="2.5.74-devfs_combined.patch" --- linux-2.5.74-smp/fs/devfs/base.c.deadlock 2003-06-23 19:46:44.000000000 +0400 +++ linux-2.5.74-smp/fs/devfs/base.c 2003-07-07 22:38:35.000000000 +0400 @@ -2208,8 +2208,38 @@ struct devfs_lookup_struct { devfs_handle_t de; wait_queue_head_t wait_queue; + atomic_t count; }; +static struct devfs_lookup_struct *new_devfs_lookup_struct (void) +{ + struct devfs_lookup_struct *info = kmalloc (sizeof (*info), GFP_KERNEL); + + if (info == NULL) { + PRINTK ("(): cannot allocate memory\n"); + return NULL; + } + + init_waitqueue_head (&info->wait_queue); + atomic_set (&info->count, 1); + DPRINTK (DEBUG_I_LOOKUP, "(%p): allocated\n", info); + return info; +} + +static inline void get_devfs_lookup_struct (struct devfs_lookup_struct *info) +{ + atomic_inc (&info->count); +} + +static inline void put_devfs_lookup_struct (struct devfs_lookup_struct *info) +{ + if (!atomic_dec_and_test (&info->count)) + return; + + DPRINTK (DEBUG_I_LOOKUP, "(%p): freed\n", info); + kfree (info); +} + /* XXX: this doesn't handle the case where we got a negative dentry but a devfs entry has been registered in the meanwhile */ static int devfs_d_revalidate_wait (struct dentry *dentry, int flags) @@ -2252,11 +2282,13 @@ static int devfs_d_revalidate_wait (stru read_lock (&parent->u.dir.lock); if (dentry->d_fsdata) { + get_devfs_lookup_struct (lookup_info); set_current_state (TASK_UNINTERRUPTIBLE); add_wait_queue (&lookup_info->wait_queue, &wait); read_unlock (&parent->u.dir.lock); schedule (); remove_wait_queue (&lookup_info->wait_queue, &wait); + put_devfs_lookup_struct (lookup_info); } else read_unlock (&parent->u.dir.lock); return 1; @@ -2268,7 +2300,7 @@ static int devfs_d_revalidate_wait (stru static struct dentry *devfs_lookup (struct inode *dir, struct dentry *dentry) { struct devfs_entry tmp; /* Must stay in scope until devfsd idle again */ - struct devfs_lookup_struct lookup_info; + struct devfs_lookup_struct *lookup_info; struct fs_info *fs_info = dir->i_sb->s_fs_info; struct devfs_entry *parent, *de; struct inode *inode; @@ -2285,9 +2317,10 @@ static struct dentry *devfs_lookup (stru read_lock (&parent->u.dir.lock); de = _devfs_search_dir (parent, dentry->d_name.name, dentry->d_name.len); read_unlock (&parent->u.dir.lock); - lookup_info.de = de; - init_waitqueue_head (&lookup_info.wait_queue); - dentry->d_fsdata = &lookup_info; + lookup_info = new_devfs_lookup_struct (); + if (lookup_info == NULL) return ERR_PTR (-ENOMEM); + lookup_info->de = de; + dentry->d_fsdata = lookup_info; if (de == NULL) { /* Try with devfsd. For any kind of failure, leave a negative dentry so someone else can deal with it (in the case where the sysadmin @@ -2297,6 +2330,7 @@ static struct dentry *devfs_lookup (stru if (try_modload (parent, fs_info, dentry->d_name.name, dentry->d_name.len, &tmp) < 0) { /* Lookup event was not queued to devfsd */ + put_devfs_lookup_struct (lookup_info); d_add (dentry, NULL); return NULL; } @@ -2308,8 +2342,7 @@ static struct dentry *devfs_lookup (stru revalidation */ up (&dir->i_sem); wait_for_devfsd_finished (fs_info); /* If I'm not devfsd, must wait */ - down (&dir->i_sem); /* Grab it again because them's the rules */ - de = lookup_info.de; + de = lookup_info->de; /* If someone else has been so kind as to make the inode, we go home early */ if (dentry->d_inode) goto out; @@ -2333,11 +2366,13 @@ static struct dentry *devfs_lookup (stru de->name, de->inode.ino, inode, de, current->comm); d_instantiate (dentry, inode); out: + write_lock (&parent->u.dir.lock); dentry->d_op = &devfs_dops; dentry->d_fsdata = NULL; - write_lock (&parent->u.dir.lock); - wake_up (&lookup_info.wait_queue); + wake_up (&lookup_info->wait_queue); + put_devfs_lookup_struct (lookup_info); write_unlock (&parent->u.dir.lock); + down (&dir->i_sem); /* Grab it again because them's the rules */ devfs_put (de); return retval; } /* End Function devfs_lookup */ --Boundary-00=_nScC/Bk6DiZ1tYs Content-Type: text/x-diff; charset="iso-8859-1"; name="2.5.74-devfs_lookup_deadlock.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="2.5.74-devfs_lookup_deadlock.patch" --- linux-2.5.74-smp/fs/devfs/base.c.orig 2003-07-07 20:58:36.000000000 +0400 +++ linux-2.5.74-smp/fs/devfs/base.c 2003-07-07 20:58:50.000000000 +0400 @@ -2308,7 +2308,6 @@ static struct dentry *devfs_lookup (stru revalidation */ up (&dir->i_sem); wait_for_devfsd_finished (fs_info); /* If I'm not devfsd, must wait */ - down (&dir->i_sem); /* Grab it again because them's the rules */ de = lookup_info.de; /* If someone else has been so kind as to make the inode, we go home early */ @@ -2338,6 +2337,7 @@ out: write_lock (&parent->u.dir.lock); wake_up (&lookup_info.wait_queue); write_unlock (&parent->u.dir.lock); + down (&dir->i_sem); /* Grab it again because them's the rules */ devfs_put (de); return retval; } /* End Function devfs_lookup */ --Boundary-00=_nScC/Bk6DiZ1tYs-- From akpm@osdl.org Mon Jul 7 14:06:27 2003 Received: with ECARTIS (v1.0.0; list devfs); Mon, 07 Jul 2003 14:06:34 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h67L6Q2x008236 for ; Mon, 7 Jul 2003 14:06:27 -0700 Received: from dhcp-140-235.pao.digeo.com (build.pdx.osdl.net [172.20.1.2]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h67L6Hq06347; Mon, 7 Jul 2003 14:06:17 -0700 Date: Mon, 7 Jul 2003 14:00:10 -0700 From: Andrew Morton To: Andrey Borzenkov Cc: linux-kernel@vger.kernel.org, devfs@oss.sgi.com Subject: Re: [PATCH][2.5.74] devfs lookup deadlock/stack corruption combined patch Message-Id: <20030707140010.4268159f.akpm@osdl.org> In-Reply-To: <200307072306.15995.arvidjaar@mail.ru> References: <20030706120315.261732bb.akpm@osdl.org> <20030706175405.518f680d.akpm@osdl.org> <200307072306.15995.arvidjaar@mail.ru> X-Mailer: Sylpheed version 0.9.0pre1 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 148 X-ecartis-version: Ecartis v1.0.0 Sender: devfs-bounce@oss.sgi.com Errors-to: devfs-bounce@oss.sgi.com X-original-sender: akpm@osdl.org Precedence: bulk X-list: devfs Andrey Borzenkov wrote: > > I finally hit a painfully trivial way to reproduce another long standing devfs > problem - deadlock between devfs_lookup and devfs_d_revalidate_wait. uh. > The current fix is to move re-acquire of i_sem after all > devfs_d_revalidate_wait waiters have been waked up. Directory modifications appear to be under write_lock(&dir->u.dir.lock); so that obvious problem is covered. Your patch might introduce a race around _devfs_get_vfs_inode() - two CPUs running that against the same inode at the same time? > Andrew, I slightly polished original stack corruption version to look more > consistent with the rest of devfs; I think it's Lindent time. From proski@gnu.org Mon Jul 7 14:41:44 2003 Received: with ECARTIS (v1.0.0; list devfs); Mon, 07 Jul 2003 14:41:49 -0700 (PDT) Received: from fencepost.gnu.org (fencepost.gnu.org [199.232.76.164]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h67Lfg2x008824 for ; Mon, 7 Jul 2003 14:41:43 -0700 Received: from proski by fencepost.gnu.org with local (Exim 4.20) id 19Zdjy-0002jA-4G; Mon, 07 Jul 2003 17:41:42 -0400 Date: Mon, 7 Jul 2003 17:41:40 -0400 (EDT) From: Pavel Roskin X-X-Sender: proski@marabou.research.att.com To: Andrey Borzenkov cc: Andrew Morton , linux-kernel@vger.kernel.org, devfs@oss.sgi.com Subject: Re: [PATCH][2.5.74] devfs lookup deadlock/stack corruption combined patch In-Reply-To: <200307072306.15995.arvidjaar@mail.ru> Message-ID: References: <20030706120315.261732bb.akpm@osdl.org> <20030706175405.518f680d.akpm@osdl.org> <200307072306.15995.arvidjaar@mail.ru> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 149 X-ecartis-version: Ecartis v1.0.0 Sender: devfs-bounce@oss.sgi.com Errors-to: devfs-bounce@oss.sgi.com X-original-sender: proski@gnu.org Precedence: bulk X-list: devfs On Mon, 7 Jul 2003, Andrey Borzenkov wrote: > To reproduce (2.5.74 UP or SMP - does not matter, single CPU system) > > ls /dev/foo & rm -f /dev/foo & > > or possibly in a loop but then it easily fills up process table. In my case it > hangs 100% reliably - on 2.5 OR 2.4. I've done some testing too. I was using this script: while :; do ls /dev/foo & rm -f /dev/foo & done On Linux 2.5.74-bk5 without patches, running this script on a local virtual console makes the system very slow. I could not even login on another virtual console. However, I could interrupt the script by Ctrl-C. After that I had a large number of processes in the D state. Here's an excerpt from the output of "ps ax": 31011 vc/1 D 0:00 ls /dev/foo 31012 vc/1 D 0:00 rm -f /dev/foo 31013 vc/1 D 0:00 ls /dev/foo 31014 vc/1 D 0:00 rm -f /dev/foo I couldn't reboot the system cleanly. My guess is that no new processes could be run. Linux 2.5.74-bk5 with the "combined" patch is OK. Running the test script doesn't prevent new logins. There are no hanging processes after interrupting the script. > I have already sent the patch for 2.4 two times - please, could somebody > finally either apply it or explain what is wrong with it. Richard is out of > reach apparently and the bug is real and seen by many people. I confirm seeing the bug on two systems with recent 2.4.x kernels with probability over 50%. Upgrading both systems to 2.5.x cured the problem (of course, it's just a race condition that stopped happening). Yes, it's an important problem to fix, and maybe we could remove the "experimental" mark from CONFIG_DEVFS once it's done. Maybe it's better to have the patch accepted in the 2.5 series first just for "methodical" reasons, but in practical terms, it's 2.4 kernels that need the deadlock fix very badly. -- Regards, Pavel Roskin From arvidjaar@mail.ru Tue Jul 8 10:59:32 2003 Received: with ECARTIS (v1.0.0; list devfs); Tue, 08 Jul 2003 10:59:41 -0700 (PDT) Received: from hueymiccailhuitl.mtu.ru (hueytecuilhuitl.mtu.ru [195.34.32.123]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h68HxU2x010848 for ; Tue, 8 Jul 2003 10:59:32 -0700 Received: from ppp137-109.dialup.mtu-net.ru (ppp137-109.dialup.mtu-net.ru [62.118.137.109]) by hueymiccailhuitl.mtu.ru (Postfix) with ESMTP id 2C32FD6224; Tue, 8 Jul 2003 21:59:27 +0400 (MSD) (envelope-from arvidjaar@mail.ru) From: Andrey Borzenkov To: Andrew Morton Subject: Re: [PATCH][2.5.74] devfs lookup deadlock/stack corruption combined patch Date: Tue, 8 Jul 2003 21:49:17 +0400 User-Agent: KMail/1.5 Cc: linux-kernel@vger.kernel.org, devfs@oss.sgi.com References: <200307072306.15995.arvidjaar@mail.ru> <20030707140010.4268159f.akpm@osdl.org> In-Reply-To: <20030707140010.4268159f.akpm@osdl.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200307082149.17918.arvidjaar@mail.ru> X-archive-position: 150 X-ecartis-version: Ecartis v1.0.0 Sender: devfs-bounce@oss.sgi.com Errors-to: devfs-bounce@oss.sgi.com X-original-sender: arvidjaar@mail.ru Precedence: bulk X-list: devfs On Tuesday 08 July 2003 01:00, Andrew Morton wrote: > Andrey Borzenkov wrote: > > I finally hit a painfully trivial way to reproduce another long standing > > devfs problem - deadlock between devfs_lookup and > > devfs_d_revalidate_wait. > > uh. > > > The current fix is to move re-acquire of i_sem after all > > devfs_d_revalidate_wait waiters have been waked up. > > Directory modifications appear to be under write_lock(&dir->u.dir.lock); so > that obvious problem is covered. Your patch might introduce a race around > _devfs_get_vfs_inode() - two CPUs running that against the same inode at > the same time? > Actually it just makes it marginally more probable. Normal open without O_CREATE runs ->d_revalidate outside of i_sem i.e. neither devfs_lookup vs. devfs_d_revalidate_wait nor devfs_d_revalidate_wait vs. itself are protected by i_sem and this is (should be) the most common case for /dev access. This race happens under non-trivial conditions. devfsd descendant (i.e. some action) should be involved; and action triggered by devfs_lookup does not race with it by definition because devfs_lookup waits for action to finish. I.e. it needs another devfsd action that would access /dev entry after it just has been created or two concurrent lookups in LOOKUP action itself. Quite unlikely in real life and race window is very small. I do not want to sound like it has to be ignored - but devfs code is so messy that no trivial fix exists that would not make code even more messy. So I would still apply original fixes and let this problem be solved later - it is not so important as to delay two other. -andrey From arvidjaar@mail.ru Fri Jul 11 12:11:00 2003 Received: with ECARTIS (v1.0.0; list devfs); Fri, 11 Jul 2003 12:11:04 -0700 (PDT) Received: from hueymiccailhuitl.mtu.ru (hueytecuilhuitl.mtu.ru [195.34.32.123]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h6BJAm2x001693 for ; Fri, 11 Jul 2003 12:10:49 -0700 Received: from ppp135-145.dialup.mtu-net.ru (ppp135-145.dialup.mtu-net.ru [62.118.135.145]) by hueymiccailhuitl.mtu.ru (Postfix) with ESMTP id 4D3481CA22D; Fri, 11 Jul 2003 22:48:01 +0400 (MSD) (envelope-from arvidjaar@mail.ru) From: Andrey Borzenkov To: devfs@oss.sgi.com Subject: devfsd hangs on restart - is_devfsd_or_child() problem Date: Fri, 11 Jul 2003 22:47:12 +0400 User-Agent: KMail/1.5 Cc: linux-kernel@vger.kernel.org, Thierry Vignaud MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200307112247.12646.arvidjaar@mail.ru> X-archive-position: 151 X-ecartis-version: Ecartis v1.0.0 Sender: devfs-bounce@oss.sgi.com Errors-to: devfs-bounce@oss.sgi.com X-original-sender: arvidjaar@mail.ru Precedence: bulk X-list: devfs I cannot believe it is so fragile ... is_devfsd_or_child() simplemindedly checks for pgrp: static int is_devfsd_or_child (struct fs_info *fs_info) { if (current == fs_info->devfsd_task) return (TRUE); if (current->pgrp == fs_info->devfsd_pgrp) return (TRUE); return (FALSE); } /* End Function is_devfsd_or_child */ unfortunately, bash (I do not know if it does it always or not) resets pgrp on startup. I.e. if your action is using shell it is no more considered devfsd descendant ... and it will attempt in turn start devfsd action while devfsd is waiting for it ot finish. Thierry, i refer mostly to dynamics scripts currently. Every time I update devfsd it hangs in one of them. And actually it is enough to do service devfsd restart to trigger this. It may be 2.5 specific again in that it is not as easily seen under 2.4. I have no idea what can be done. Is there any way in kernel to find out if one task is descendant of other task? Even rewriting devfsd to use non-blocking calls and request queue does not help as it apprently just results in endless loop (action triggering action triggering action ...) -andrey From arvidjaar@mail.ru Sat Jul 12 03:13:21 2003 Received: with ECARTIS (v1.0.0; list devfs); Sat, 12 Jul 2003 03:13:27 -0700 (PDT) Received: from hueymiccailhuitl.mtu.ru (hueytecuilhuitl.mtu.ru [195.34.32.123]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h6CADK2x003867 for ; Sat, 12 Jul 2003 03:13:21 -0700 Received: from ppp138-13.dialup.mtu-net.ru (ppp138-13.dialup.mtu-net.ru [62.118.138.13]) by hueymiccailhuitl.mtu.ru (Postfix) with ESMTP id 83886D9BB2; Sat, 12 Jul 2003 14:13:16 +0400 (MSD) (envelope-from arvidjaar@mail.ru) From: Andrey Borzenkov To: devfs@oss.sgi.com Subject: [PATCH][2.5.75] devfsd hangs on restart - is_devfsd_or_child() problem Date: Sat, 12 Jul 2003 14:11:41 +0400 User-Agent: KMail/1.5 Cc: linux-kernel@vger.kernel.org, Thierry Vignaud , Andrew Morton References: <200307112247.12646.arvidjaar@mail.ru> In-Reply-To: <200307112247.12646.arvidjaar@mail.ru> MIME-Version: 1.0 Content-Type: Multipart/Mixed; boundary="Boundary-00=_d79D/ejDfe6Z1Wd" Message-Id: <200307121411.41131.arvidjaar@mail.ru> X-archive-position: 152 X-ecartis-version: Ecartis v1.0.0 Sender: devfs-bounce@oss.sgi.com Errors-to: devfs-bounce@oss.sgi.com X-original-sender: arvidjaar@mail.ru Precedence: bulk X-list: devfs --Boundary-00=_d79D/ejDfe6Z1Wd Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline On Friday 11 July 2003 22:47, Andrey Borzenkov wrote: > I cannot believe it is so fragile ... > > is_devfsd_or_child() simplemindedly checks for pgrp: > > static int is_devfsd_or_child (struct fs_info *fs_info) > { > if (current == fs_info->devfsd_task) return (TRUE); > if (current->pgrp == fs_info->devfsd_pgrp) return (TRUE); > return (FALSE); > } /* End Function is_devfsd_or_child */ > Andrew, one more for your collection :) The code that did proper check existed in 2.4 and was removed in 2.5 for whatever reason. The patch restores it slightly modified as below. 2.4 code looks somewhat unclean in that - it traverses task list without lock. - is starts from current->real_parent but nothing prevents current be init_task itself. This hung for me on 2.5 during boot. May be 2.4 does something differently. Comments? regards -andrey This is trivially reproduced under 2.5 by using devfsd.conf lines LOOKUP ^foo$ EXECUTE /home/bor/tmp/devfsd/handler /dev/bar LOOKUP ^bar$ EXECUTE /home/bor/tmp/devfsd/handler /dev/foo and handler like -------- cut here ---------- #include int main(int argc, char **argv, char **envp) { if (argc <= 1) return 0; setpgrp(); return access(argv[1], R_OK); } -------- cut here ---------- and doing ls /dev/foo --Boundary-00=_d79D/ejDfe6Z1Wd Content-Type: text/x-diff; charset="iso-8859-1"; name="2.5.75-is_devfsd_or_child.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="2.5.75-is_devfsd_or_child.patch" --- linux-2.5.75-smp/fs/devfs/base.c.devfsd_child 2003-07-11 19:41:46.000000000 +0400 +++ linux-2.5.75-smp/fs/devfs/base.c 2003-07-12 13:51:49.000000000 +0400 @@ -676,6 +676,7 @@ #include #include #include +#include #include #include @@ -1325,8 +1326,20 @@ static void free_dentry (struct devfs_en static int is_devfsd_or_child (struct fs_info *fs_info) { - if (current == fs_info->devfsd_task) return (TRUE); - if (current->pgrp == fs_info->devfsd_pgrp) return (TRUE); + struct task_struct *p = current; + + if (p == fs_info->devfsd_task) return (TRUE); + if (p->pgrp == fs_info->devfsd_pgrp) return (TRUE); + read_lock(&tasklist_lock); + for ( ; p != &init_task; p = p->real_parent) + { + if (p == fs_info->devfsd_task) + { + read_unlock (&tasklist_lock); + return (TRUE); + } + } + read_unlock (&tasklist_lock); return (FALSE); } /* End Function is_devfsd_or_child */ --Boundary-00=_d79D/ejDfe6Z1Wd-- From arvidjaar@mail.ru Sun Jul 13 08:37:29 2003 Received: with ECARTIS (v1.0.0; list devfs); Sun, 13 Jul 2003 08:37:38 -0700 (PDT) Received: from hueymiccailhuitl.mtu.ru (hueytecuilhuitl.mtu.ru [195.34.32.123]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h6DFbIFl003794 for ; Sun, 13 Jul 2003 08:37:19 -0700 Received: from ppp133-181.dialup.mtu-net.ru (ppp133-181.dialup.mtu-net.ru [62.118.133.181]) by hueymiccailhuitl.mtu.ru (Postfix) with ESMTP id 7DD07D6BBC; Sun, 13 Jul 2003 19:14:23 +0400 (MSD) (envelope-from arvidjaar@mail.ru) From: Andrey Borzenkov To: linux-kernel@vger.kernel.org Subject: [PATCH][2.5.75] fix removable partitioned media with devfs Date: Sun, 13 Jul 2003 18:38:07 +0400 User-Agent: KMail/1.5 Cc: devfs@oss.sgi.com MIME-Version: 1.0 Content-Type: Multipart/Mixed; boundary="Boundary-00=_P7WE/Nf/oBjoHUO" Message-Id: <200307131838.07341.arvidjaar@mail.ru> X-archive-position: 153 X-ecartis-version: Ecartis v1.0.0 Sender: devfs-bounce@oss.sgi.com Errors-to: devfs-bounce@oss.sgi.com X-original-sender: arvidjaar@mail.ru Precedence: bulk X-list: devfs --Boundary-00=_P7WE/Nf/oBjoHUO Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Current 2.5 does not register any device node in devfs for empty media (capacity == 0) case. This makes removables unusable with devfs. Partition rescan is done only on bdev open, but without any device node for device it is impossible to open it. In 2.4 it was finally solved by always registering .../disc node as representation for "whole" disk and using devfsd action to force partition rescan on access to (non-existing) partition name. For primary names it was handled internally by devfs - it kept track of removable devices in directory and initiated partition rescan when name was not found. Both are obviously broken now. You can't do partition rescan because no node is registered at all and internal handling was removed. Very nice. The attached patch makes resgister_disk always register at least disc node. This now works for old and new compat names as per devfsd configuration; canonical names are still broken: {pts/3}% ll /dev/scsi/host1/bus0/target4/lun0/part4 ls: /dev/scsi/host1/bus0/target4/lun0/part4: No such file or directory but it can be fixed using the same technique as above so I won't push it. -andrey --Boundary-00=_P7WE/Nf/oBjoHUO Content-Type: text/x-diff; charset="us-ascii"; name="2.5.75-removable_media_with_devfs.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="2.5.75-removable_media_with_devfs.patch" --- linux-2.5.75-smp/fs/partitions/check.c.removable 2003-06-26 21:41:24.000000000 +0400 +++ linux-2.5.75-smp/fs/partitions/check.c 2003-07-13 17:20:16.000000000 +0400 @@ -348,6 +348,9 @@ void register_disk(struct gendisk *disk) return; } + /* always add handle for the whole disk */ + devfs_add_partitioned(disk); + /* No such device (e.g., media were just removed) */ if (!get_capacity(disk)) return; @@ -356,7 +359,6 @@ void register_disk(struct gendisk *disk) if (blkdev_get(bdev, FMODE_READ, 0, BDEV_RAW) < 0) return; state = check_partition(disk, bdev); - devfs_add_partitioned(disk); if (state) { for (j = 1; j < state->limit; j++) { sector_t size = state->parts[j].size; --Boundary-00=_P7WE/Nf/oBjoHUO-- From arvidjaar@mail.ru Sat Jul 26 09:15:01 2003 Received: with ECARTIS (v1.0.0; list devfs); Sat, 26 Jul 2003 09:15:14 -0700 (PDT) Received: from hueymiccailhuitl.mtu.ru (hueytecuilhuitl.mtu.ru [195.34.32.123]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h6QGEwFl019547 for ; Sat, 26 Jul 2003 09:15:00 -0700 Received: from ppp128-65.dialup.mtu-net.ru (ppp128-65.dialup.mtu-net.ru [62.118.128.65]) by hueymiccailhuitl.mtu.ru (Postfix) with ESMTP id 9A533DCFB3; Sat, 26 Jul 2003 20:14:54 +0400 (MSD) (envelope-from arvidjaar@mail.ru) From: Andrey Borzenkov To: Andrew Morton Subject: [PATCH][2.6.0-test1] redesign - stack corruption in devfs_lookup Date: Sat, 26 Jul 2003 18:58:24 +0400 User-Agent: KMail/1.5 Cc: linux-kernel@vger.kernel.org, devfs@oss.sgi.com References: <200307062058.40797.arvidjaar@mail.ru> <20030706120315.261732bb.akpm@osdl.org> In-Reply-To: <20030706120315.261732bb.akpm@osdl.org> MIME-Version: 1.0 Content-Type: Multipart/Mixed; boundary="Boundary-00=_QcpI/G4o40gkrAj" Message-Id: <200307261858.25144.arvidjaar@mail.ru> X-archive-position: 154 X-ecartis-version: Ecartis v1.0.0 Sender: devfs-bounce@oss.sgi.com Errors-to: devfs-bounce@oss.sgi.com X-original-sender: arvidjaar@mail.ru Precedence: bulk X-list: devfs --Boundary-00=_QcpI/G4o40gkrAj Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline On Sunday 06 July 2003 23:03, Andrew Morton wrote: > Andrey Borzenkov wrote: > > When devfs_lookup needs to call devfsd it arranges for other lookups for > > the same name to wait. It is using local variable as wait queue head. > > After devfsd returns devfs_lookup wakes up all waiters and returns. > > Unfortunately there is no garantee all waiters will actually get chance > > to run and clean up before devfs_lookup returns. so some of them attempt > > to access already freed storage on stack. > > OK, but I think there is a simpler fix. We can rely on the side-effects of > prepare_to_wait() and finish_wait(). > and here is even more simple fix. there is no need to ever bother about wait queue because it dies soon without our intervention. This is exactly what code in 2.4 does - somebody "improved" code in 2.5 (again) :( the patch against 2.6.0-test1 removes my previous patch and adds comment in revalidate_wait to prevent it from hapenning again. It is basically 2.4 code except one spinlock acquisition has been moved a bit earlier (probably does not matter just looks better). This probably should use wake_up_all instead of wake_up to make intention more clear. It is tested using the same test case. Also 2.4 never had this problem as well. -andrey --Boundary-00=_QcpI/G4o40gkrAj Content-Type: text/x-diff; charset="iso-8859-1"; name="2.6.0-test1-devfs_stack_corruption-3.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="2.6.0-test1-devfs_stack_corruption-3.patch" --- linux-2.6.0-test1/fs/devfs/base.c.2.6.0-test1 2003-07-26 12:11:38.000000000 +0400 +++ linux-2.6.0-test1/fs/devfs/base.c 2003-07-26 12:11:06.000000000 +0400 @@ -2208,46 +2208,8 @@ struct devfs_lookup_struct { devfs_handle_t de; wait_queue_head_t wait_queue; - atomic_t count; }; -static struct devfs_lookup_struct * -new_devfs_lookup_struct(void) -{ - struct devfs_lookup_struct *p = kmalloc(sizeof(*p), GFP_KERNEL); - - if (!p) - return NULL; - - init_waitqueue_head (&p->wait_queue); - atomic_set(&p->count, 1); - return p; -} - -static void -get_devfs_lookup_struct(struct devfs_lookup_struct *info) -{ - if (info) - atomic_inc(&info->count); - else { - printk(KERN_ERR "null devfs_lookup_struct pointer\n"); - dump_stack(); - } -} - -static void -put_devfs_lookup_struct(struct devfs_lookup_struct *info) -{ - if (info) { - if (!atomic_dec_and_test(&info->count)) - return; - kfree(info); - } else { - printk(KERN_ERR "null devfs_lookup_struct pointer\n"); - dump_stack(); - } -} - /* XXX: this doesn't handle the case where we got a negative dentry but a devfs entry has been registered in the meanwhile */ static int devfs_d_revalidate_wait (struct dentry *dentry, struct nameidata *nd) @@ -2290,13 +2252,19 @@ static int devfs_d_revalidate_wait (stru read_lock (&parent->u.dir.lock); if (dentry->d_fsdata) { - get_devfs_lookup_struct(lookup_info); set_current_state (TASK_UNINTERRUPTIBLE); add_wait_queue (&lookup_info->wait_queue, &wait); read_unlock (&parent->u.dir.lock); schedule (); - remove_wait_queue (&lookup_info->wait_queue, &wait); - put_devfs_lookup_struct(lookup_info); + /* + * This does not need nor should remove wait from wait_queue. + * Wait queue head is never reused - nothing is ever added to it + * after all waiters have been waked up and head itself disappears + * very soon after it. Moreover it is local variable on stack that + * is likely to have already disappeared so any reference to it + * at this point is buggy. + */ + } else read_unlock (&parent->u.dir.lock); return 1; @@ -2308,7 +2276,7 @@ static int devfs_d_revalidate_wait (stru static struct dentry *devfs_lookup (struct inode *dir, struct dentry *dentry, struct nameidata *nd) { struct devfs_entry tmp; /* Must stay in scope until devfsd idle again */ - struct devfs_lookup_struct *lookup_info; + struct devfs_lookup_struct lookup_info; struct fs_info *fs_info = dir->i_sb->s_fs_info; struct devfs_entry *parent, *de; struct inode *inode; @@ -2325,10 +2293,9 @@ static struct dentry *devfs_lookup (stru read_lock (&parent->u.dir.lock); de = _devfs_search_dir (parent, dentry->d_name.name, dentry->d_name.len); read_unlock (&parent->u.dir.lock); - lookup_info = new_devfs_lookup_struct(); - if (!lookup_info) return ERR_PTR(-ENOMEM); - lookup_info->de = de; - dentry->d_fsdata = lookup_info; + lookup_info.de = de; + init_waitqueue_head (&lookup_info.wait_queue); + dentry->d_fsdata = &lookup_info; if (de == NULL) { /* Try with devfsd. For any kind of failure, leave a negative dentry so someone else can deal with it (in the case where the sysadmin @@ -2338,7 +2305,6 @@ static struct dentry *devfs_lookup (stru if (try_modload (parent, fs_info, dentry->d_name.name, dentry->d_name.len, &tmp) < 0) { /* Lookup event was not queued to devfsd */ - put_devfs_lookup_struct(lookup_info); d_add (dentry, NULL); return NULL; } @@ -2350,7 +2316,7 @@ static struct dentry *devfs_lookup (stru revalidation */ up (&dir->i_sem); wait_for_devfsd_finished (fs_info); /* If I'm not devfsd, must wait */ - de = lookup_info->de; + de = lookup_info.de; /* If someone else has been so kind as to make the inode, we go home early */ if (dentry->d_inode) goto out; @@ -2377,8 +2343,7 @@ out: write_lock (&parent->u.dir.lock); dentry->d_op = &devfs_dops; dentry->d_fsdata = NULL; - wake_up (&lookup_info->wait_queue); - put_devfs_lookup_struct(lookup_info); + wake_up (&lookup_info.wait_queue); write_unlock (&parent->u.dir.lock); down (&dir->i_sem); /* Grab it again because them's the rules */ devfs_put (de); --Boundary-00=_QcpI/G4o40gkrAj-- From sandeen@sgi.com Mon Jul 28 09:32:11 2003 Received: with ECARTIS (v1.0.0; list devfs); Mon, 28 Jul 2003 09:32:16 -0700 (PDT) Received: from zok.sgi.com (zok.SGI.COM [204.94.215.101]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h6SGWAFl006200 for ; Mon, 28 Jul 2003 09:32:10 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [192.48.203.135]) by zok.sgi.com (8.12.9/8.12.9/linux-outbound_gateway-1.1) with ESMTP id h6SGW5q0012903 for ; Mon, 28 Jul 2003 09:32:05 -0700 Received: from poppy-e236.americas.sgi.com (poppy-e236.americas.sgi.com [128.162.236.207]) by flecktone.americas.sgi.com (8.12.9/8.12.9/generic_config-1.2) with ESMTP id h6SGV4QK2418745 for ; Mon, 28 Jul 2003 11:31:04 -0500 (CDT) Received: from stout.americas.sgi.com (stout.americas.sgi.com [128.162.232.50]) by poppy-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id h6SGV4Yk15770796 for ; Mon, 28 Jul 2003 11:31:04 -0500 (CDT) Subject: [PATCH] better handling of fstab labels with devfs? From: Eric Sandeen To: "devfs@oss.sgi.com" Content-Type: text/plain Organization: Message-Id: <1059409863.28134.14.camel@stout.americas.sgi.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 28 Jul 2003 11:31:04 -0500 Content-Transfer-Encoding: 7bit X-archive-position: 155 X-ecartis-version: Ecartis v1.0.0 Sender: devfs-bounce@oss.sgi.com Errors-to: devfs-bounce@oss.sgi.com X-original-sender: sandeen@sgi.com Precedence: bulk X-list: devfs As is well known at this point, if you turn on devfs but do not mount it on /dev, mount-by-label will break. This is a patch against the 2.4.21-kernel which puts the "old-style" device names back into /proc/partitions _unless_ 1) CONFIG_DEVFS_MOUNT is turned on, meaning devfs is mounted on /dev at boot time, or 2) the device major # is dynamically allocated by devfs (so, assume there is no "old" name available) This still breaks in some corner cases, one that I can think of is if the user does not mount devfs on /dev at boot time, but later does mount it, then /proc/partitions will have old names in it, so they can't be found -if- the user is also not running devfsd to get the old compat names. I'm not necessarily asking this to be applied upstream; I know it is not a perfect solution, and maybe it doesn't make things much better. But, I thought I'd throw it out there. Does anyone see anything horribly broken about the approach? Thanks, -Eric =========================================================================== linux/fs/partitions/check.c =========================================================================== --- /usr/tmp/TmpDir.11207-0/linux/fs/partitions/check.c_1.39 2003-07-21 11:39:51.000000000 -0500 +++ linux/fs/partitions/check.c 2003-07-17 14:16:56.000000000 -0500 @@ -107,7 +107,14 @@ unsigned int unit = (minor >> hd->minor_shift); unsigned int part = (minor & ((1 << hd->minor_shift) -1 )); - if ((unit < hd->nr_real) && hd->part[minor].de) { + /* Only show devfs names if devfs is mounted at boot time, + * or devfs owns this major. Otherwise show old names. + */ + if ((unit < hd->nr_real) && hd->part[minor].de +#ifndef CONFIG_DEVFS_MOUNT + && devfs_owns_block_major(hd->major) +#endif + ) { int pos; pos = devfs_generate_path (hd->part[minor].de, buf, 64); =========================================================================== linux/fs/devfs/util.c =========================================================================== --- /usr/tmp/TmpDir.11207-0/linux/fs/devfs/util.c_1.9 2003-07-21 11:39:51.000000000 -0500 +++ linux/fs/devfs/util.c 2003-07-17 14:36:02.000000000 -0500 @@ -150,6 +150,16 @@ 0-3, 7-9, 11-63, 65-99, 101-113, 120-127, 199, 201, 240-255 Total free: 122 */ + +/* pre-assigned majors */ +unsigned long assigned_major_list[] = + {INITIALISER64 (0xfffffb8f, 0xffffffff), /* Majors 0-31, 32-63 */ + INITIALISER64 (0xfffffffe, 0xff03ffef), /* Majors 64-95, 96-127 */ + INITIALISER64 (0x00000000, 0x00000000), /* Majors 128-159, 160-191 */ + INITIALISER64 (0x00000280, 0xffff0000), /* Majors 192-223, 224-255 */ + }; + +/* currently allocated majors */ static struct major_list block_major_list = {SPIN_LOCK_UNLOCKED, {INITIALISER64 (0xfffffb8f, 0xffffffff), /* Majors 0-31, 32-63 */ @@ -174,6 +184,22 @@ /** + * devfs_owns_block_major - Test whether a device nr is from devfs' pool + * @major: The major number to test + + * Returns 1 if the major number is owned by devfs, 0 otherwise. + */ + +int devfs_owns_block_major (int major) +{ + int devfs_major; + + devfs_major = !test_bit (major, &assigned_major_list); + printk("major %d is %sowned by devfs (%d)\n", major, devfs_major ? "" : "not ", devfs_major); + return devfs_major; +} /* End Function devfs_owns_block_major */ + +/** * devfs_alloc_major - Allocate a major number. * @type: The type of the major (DEVFS_SPECIAL_CHR or DEVFS_SPECIAL_BLK) =========================================================================== linux/include/linux/devfs_fs_kernel.h =========================================================================== --- /usr/tmp/TmpDir.11207-0/linux/include/linux/devfs_fs_kernel.h_1.9 2003-07-21 11:39:51.000000000 -0500 +++ linux/include/linux/devfs_fs_kernel.h 2003-07-17 14:15:44.000000000 -0500 @@ -107,6 +107,7 @@ unsigned int flags, unsigned int major, unsigned int minor_start, umode_t mode, void *ops, void *info); +extern int devfs_owns_block_major (int major); extern int devfs_alloc_major (char type); extern void devfs_dealloc_major (char type, int major); extern kdev_t devfs_alloc_devnum (char type); @@ -276,6 +277,11 @@ return; } +static inline int devfs_owns_block_major (int major) +{ + return 0; +} + static inline int devfs_alloc_major (char type) { return -1; -- Eric Sandeen [C]XFS for Linux http://oss.sgi.com/projects/xfs sandeen@sgi.com SGI, Inc. 651-683-3102