<div dir="ltr"><div><div>Dear Eric,<br><br></div>I checked the RH SRPM <a href="https://content-web.rhn.redhat.com/rhn/public/NULL/kernel/2.6.32-279.19.1.el6/SRPMS/kernel-2.6.32-279.19.1.el6.src.rpm?__gda__=1366390847_8550b8568c50ea46b3180266b476353d&ext=.rpm">https://content-web.rhn.redhat.com/rhn/public/NULL/kernel/2.6.32-279.19.1.el6/SRPMS/kernel-2.6.32-279.19.1.el6.src.rpm?__gda__=1366390847_8550b8568c50ea46b3180266b476353d&ext=.rpm</a><br>
</div>And the code is the same, as follows:<br><br>static struct rw_semaphore *<br>__rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)<br>{<br> struct rwsem_waiter *waiter;<br> struct task_struct *tsk;<br> int woken;<br><br> waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);<br>
<br> if (!wakewrite) {<br> if (waiter->flags & RWSEM_WAITING_FOR_WRITE)<br> goto out;<br> goto dont_wake_writers;<br> }<br><br> /* if we are allowed to wake writers try to grant a single write lock<br>
* if there's a writer at the front of the queue<br> * - we leave the 'waiting count' incremented to signify potential<br> * contention<br> */<br> if (waiter->flags & RWSEM_WAITING_FOR_WRITE) {<br>
sem->activity = -1;<br> list_del(&waiter->list);<br> tsk = waiter->task;<br> /* Don't touch waiter after ->task has been NULLed */<br> smp_mb();<br> waiter->task = NULL;<br>
wake_up_process(tsk);<br> put_task_struct(tsk);<br> goto out;<br> }<br><br> /* grant an infinite number of read locks to the front of the queue */<br> dont_wake_writers:<br> woken = 0;<br> while (waiter->flags & RWSEM_WAITING_FOR_READ) {<br>
struct list_head *next = waiter->list.next;<br><br> list_del(&waiter->list);<br> tsk = waiter->task;<br> smp_mb();<br> waiter->task = NULL;<br> wake_up_process(tsk);<br>
put_task_struct(tsk);<br> woken++;<br> if (list_empty(&sem->wait_list))<br> break;<br> waiter = list_entry(next, struct rwsem_waiter, list);<br> }<br><br> sem->activity += woken;<br>
<br> out:<br> return sem;<br>}<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">2013/4/20 Yongtao Fu <span dir="ltr"><<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>></span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>Hi Eric,<br></div>Here's the server info:<br>[root@10.23.72.95 ~]# rpm -qa|grep kernel<br>
kernel-debug-debuginfo-2.6.32-279.19.1.el6.x86_64<br>kernel-headers-2.6.32-279.19.1.el6.x86_64<br>
abrt-addon-kerneloops-2.0.8-6.el6.x86_64<br>dracut-kernel-004-283.el6.noarch<br>kernel-debuginfo-common-x86_64-2.6.32-279.19.1.el6.x86_64<br>kernel-debuginfo-2.6.32-279.19.1.el6.x86_64<br>kernel-debug-2.6.32-279.19.1.el6.x86_64<br>
kernel-devel-2.6.32-279.19.1.el6.x86_64<br>libreport-plugin-kerneloops-2.0.9-5.el6.x86_64<br>kernel-firmware-2.6.32-279.19.1.el6.noarch<br>kernel-2.6.32-279.19.1.el6.x86_64<br>kernel-debug-devel-2.6.32-279.19.1.el6.x86_64<br>
[root@10.23.72.95 ~]# uname -a<br>Linux 10.23.72.95 2.6.32-279.19.1.el6.x86_64 #1 SMP Fri Apr 19 10:44:52 CST 2013 x86_64 x86_64 x86_64 GNU/Linux<br>[root@10.23.72.95 ~]# <br>
<br>The kernel code looks like:<br>static struct rw_semaphore *<br>__rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)<br>{<br> struct rwsem_waiter *waiter;<br> struct task_struct *tsk;<br> int woken;<br><br> waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);<br>
<br> if (!wakewrite) {<br> if (waiter->flags & RWSEM_WAITING_FOR_WRITE)<br> goto out;<br> goto dont_wake_writers;<br> }<br><br> /* if we are allowed to wake writers try to grant a single write lock<br>
* if there's a writer at the front of the queue<br> * - we leave the 'waiting count' incremented to signify potential<br> * contention<br> */<br> if (waiter->flags & RWSEM_WAITING_FOR_WRITE) {<br>
sem->activity = -1;<br> list_del(&waiter->list);<br> tsk = waiter->task;<br> /* Don't touch waiter after ->task has been NULLed */<br> smp_mb();<br>
waiter->task = NULL;<br> wake_up_process(tsk);<br> put_task_struct(tsk);<br> goto out;<br> }<br><br> /* grant an infinite number of read locks to the front of the queue */<br>
dont_wake_writers:<br> woken = 0;<br> while (waiter->flags & RWSEM_WAITING_FOR_READ) {<br> struct list_head *next = waiter->list.next;<br><br> list_del(&waiter->list);<br>
tsk = waiter->task;<br> smp_mb();<br> waiter->task = NULL;<br> wake_up_process(tsk);<br> put_task_struct(tsk);<br> woken++;<br>
if (list_empty(&sem->wait_list))<br> break;<br> waiter = list_entry(next, struct rwsem_waiter, list);<br> }<br><br> sem->activity += woken;<br>
<br> out:<br> return sem;<br>}<br><br></div>I build from the SRPM because I want to apply the trace patch. Could you provide a link to the official 279.19.1 SRPM?<br></div>Thank you.<br></div><div class="gmail_extra"><div><div class="h5">
<br><br>
<div class="gmail_quote">2013/4/20 Eric Sandeen <span dir="ltr"><<a href="mailto:sandeen@sandeen.net" target="_blank">sandeen@sandeen.net</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>On 4/19/13 8:18 AM, Yongtao Fu wrote:<br>
> Dear Eric,<br>
> If it's a racing issue, where is the lock introduced? I want to study the code. Thank you.<br>
><br>
<br>
</div>essentially:<br>
<br>
xfs_remove()<br>
{<br>
...<br>
xfs_lock_two_inodes(dp, ip, XFS_ILOCK_EXCL);<br>
...<br>
xfs_droplink()<br>
<br>
You are 100% sure that you were running the 279.19.1 kernel?<br>
<br>
(I'm not very familiar with Oracle's clone of RHEL - I assume that they have copied all of Red Hat's work verbatim, but I have not looked)<br>
<br>
Can you verify that in:<br>
<br>
__rwsem_do_wake()<br>
<br>
the undo target looks like:<br>
<br>
out:<br>
return sem;<br>
<br>
<br>
/* undo the change to the active count, but check for a transition<br>
* 1->0 */<br>
undo:<br>
if (rwsem_atomic_update(-RWSEM_ACTIVE_BIAS, sem) & RWSEM_ACTIVE_MASK)<br>
goto out;<br>
goto try_again;<br>
<br>
<br>
thanks,<br>
-Eric<br>
<br>
> 2013/4/19 Yongtao Fu <<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>><br>
<div>><br>
> Sure. The serious thing here is that it corrupts the unlinked list. The inode 0x1bd33 that triggered the XFS shutdown is not 0x6b133.<br>
><br>
><br>
</div>> 2013/4/19 Eric Sandeen <<a href="mailto:sandeen@sandeen.net" target="_blank">sandeen@sandeen.net</a>><br>
<div><div>><br>
> > On 4/19/13 4:41 AM, Yongtao Fu wrote:<br>
> > Dear Brian and Eric,<br>
> ><br>
> > kernel <a href="http://mirror.linux.duke.edu/pub/centos/6.3/updates/x86_64/Packages/kernel-2.6.32-279.19.1.el6.x86_64.rpm" target="_blank">kernel-2.6.32-279.19.1.el6.x86_64.rpm</a> still has this problem<br>
> > I build the kernel from this srpm<br>
> > <a href="https://oss.oracle.com/ol6/SRPMS-updates/kernel-2.6.32-279.19.1.el6.src.rpm" target="_blank">https://oss.oracle.com/ol6/SRPMS-updates/kernel-2.6.32-279.19.1.el6.src.rpm</a><br>
> ><br>
> > Today the shutdown happened again during testing.<br>
> > See the logs below:<br>
> ><br>
> > /var/log/message<br>
> > Apr 19 16:40:05 10 kernel: XFS (sdb): xfs_iunlink_remove: xfs_inotobp() returned error 22.<br>
> > Apr 19 16:40:05 10 kernel: XFS (sdb): xfs_inactive: xfs_ifree returned error 22<br>
> > Apr 19 16:40:05 10 kernel: XFS (sdb): xfs_do_force_shutdown(0x1) called from line 1184 of file fs/xfs/xfs_vnodeops.c. Return address = 0xffffffffa02d4bda<br>
> > Apr 19 16:40:05 10 kernel: XFS (sdb): I/O Error Detected. Shutting down filesystem<br>
> > Apr 19 16:40:05 10 kernel: XFS (sdb): Please umount the filesystem and rectify the problem(s)<br>
> > Apr 19 16:40:07 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.<br>
> > Apr 19 16:40:37 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.<br>
> ><br>
> > systemtap script output:<br>
> > --- xfs_imap -- module("xfs").function("xfs_imap@fs/xfs/xfs_ialloc.c:1257").return -- return=0x16<br>
> > vars: mp=0xffff88101801e800 tp=0xffff880ff143ac70 ino=0xffffffff imap=0xffff88100e93bc08 flags=0x0 agbno=? agino=? agno=? blks_per_cluster=? chunk_agbno=? cluster_agbno=? error=? offset=? offset_agbno=? __func__=[...]<br>
> > mp: m_agno_log = 0x5, m_agino_log = 0x20<br>
> > mp->m_sb: sb_agcount = 0x1c, sb_agblocks = 0xffffff0, sb_inopblog = 0x4, sb_agblklog = 0x1c, sb_dblocks = 0x1b4900000<br>
> > imap: im_blkno = 0x0, im_len = 0xe778, im_boffset = 0xd997<br>
> > kernel backtrace:<br>
> > Returning from: 0xffffffffa02b4260 : xfs_imap+0x0/0x280 [xfs]<br>
> > Returning to : 0xffffffffa02b9d59 : xfs_inotobp+0x49/0xc0 [xfs]<br>
> > 0xffffffffa02b9ec1 : xfs_iunlink_remove+0xf1/0x360 [xfs]<br>
> > 0xffffffff814ede89<br>
> > 0x0 (inexact)<br>
> > user backtrace:<br>
</div></div>> > 0x3ec260e5ad [/lib64/libpthread-2.12.so+0xe5ad/0x219000]<br>
<div><div>> ><br>
> > --- xfs_iunlink_remove -- module("xfs").function("xfs_iunlink_remove@fs/xfs/xfs_inode.c:1681").return -- return=0x16<br>
> > vars: tp=0xffff880ff143ac70 ip=0xffff8811ed111000 next_ino=? mp=? agi=? dip=? agibp=? ibp=? agno=? agino=? next_agino=? last_ibp=? last_dip=0xffff881000000001 bucket_index=? offset=? last_offset=0xffffffffffff8811 error=? __func__=[...]<br>
> > ip: i_ino = 0x1bd33, i_flags = 0x0<br>
> > ip->i_d: di_nlink = 0x0, di_gen = 0x53068791<br>
> ><br>
> > debugfs events trace:<br>
> > <a href="https://docs.google.com/file/d/0B7n2C4T5tfNCREZtdC1yamc0RnM/edit?usp=sharing" target="_blank">https://docs.google.com/file/d/0B7n2C4T5tfNCREZtdC1yamc0RnM/edit?usp=sharing</a><br>
><br>
> Same issue, one file was unlinked twice in a race:<br>
><br>
> === ino 0x6b133 ===<br>
> <...>-4477 [003] 2721.176790: xfs_iunlink: dev 8:16 ino 0x6b133<br>
> <...>-4477 [003] 2721.176839: xfs_iunlink_remove: dev 8:16 ino 0x6b133<br>
> <...>-4477 [009] 3320.127227: xfs_iunlink: dev 8:16 ino 0x6b133<br>
> <...>-4477 [001] 3320.141126: xfs_iunlink_remove: dev 8:16 ino 0x6b133<br>
> <...>-4477 [003] 7973.136368: xfs_iunlink: dev 8:16 ino 0x6b133<br>
> <...>-4479 [018] 7973.158457: xfs_iunlink: dev 8:16 ino 0x6b133<br>
> <...>-4479 [018] 7973.158497: xfs_iunlink_remove: dev 8:16 ino 0x6b133<br>
><br>
> -Eric<br>
><br>
><br>
><br>
><br>
> --<br>
> Yongtao Fu<br>
><br>
><br>
><br>
><br>
> --<br>
> Yongtao Fu<br>
<br>
</div></div></blockquote></div><br><br clear="all"><br></div></div><span class="HOEnZb"><font color="#888888">-- <br>Yongtao Fu
</font></span></div>
</blockquote></div><br><br clear="all"><br>-- <br>Yongtao Fu
</div>