| To: | David Chinner <dgc@xxxxxxx>, xfs@xxxxxxxxxxx |
|---|---|
| Subject: | Re: Issues with XFS on Sles9 sp2. |
| From: | Roger Heflin <rheflin@xxxxxxxxx> |
| Date: | Wed, 20 Dec 2006 09:45:04 -0600 |
| In-reply-to: | <20061205004416.GU33919298@melbourne.sgi.com> |
| References: | <45704570.7020209@atipa.com> <20061203224832.GY37654165@melbourne.sgi.com> <457431B1.20700@atipa.com> <20061204224208.GO33919298@melbourne.sgi.com> <4574AA4F.4030902@atipa.com> <20061205004416.GU33919298@melbourne.sgi.com> |
| Sender: | xfs-bounce@xxxxxxxxxxx |
| User-agent: | Thunderbird 1.5 (X11/20060313) |
David, I have a bit more data on this hang up. So far we have not been able to reproduce it with any test other applications. It so far will only duplicate with one certain application and the trace from that application shows something interesting: <Oct/27 07:40 am>acuSolve-gmpi D 00000000000493e0 0 14418 1 14419 16873 (NOTLB) <Oct/27 07:40 am>Call Trace:<ffffffff80165ad4>{wait_on_page_writeback_range_wq+324} <Oct/27 07:40 am> <ffffffff8010f9c8>{__down+152} <ffffffff80135c50>{default_wake_function+0} <Oct/27 07:40 am> <ffffffff80234447>{__down_failed+53} <ffffffff801949dc>{.text.lock.super+169} <Oct/27 07:40 am> <ffffffff8018fcea>{do_sync+42} <ffffffff8018fd5e>{sys_sync+62} <Oct/27 07:40 am> <ffffffff80110794>{system_call+124} <Oct/27 07:40 am>acuSolve-gmpi D 00000000000493e0 0 14419 1 18864 14418 (NOTLB) <Oct/27 07:40 am>Call Trace:<ffffffff8010f9c8>{__down+152} <ffffffff80135c50>{default_wake_function+0} <Oct/27 07:40 am> <ffffffff80234447>{__down_failed+53} <ffffffffa0141642>{:xfs:.text.lock.xfs_buf+15} <Oct/27 07:40 am> <ffffffffa0126618>{:xfs:xfs_getsb+40} <ffffffffa012ecea>{:xfs:xfs_syncsub+2602} <Oct/27 07:40 am> <ffffffffa013e468>{:xfs:vfs_sync+40} <ffffffffa013d434>{:xfs:linvfs_sync_super+68} <Oct/27 07:40 am> <ffffffff80193cff>{sync_filesystems+223} <ffffffff8018fcf1>{do_sync+49} <Oct/27 07:40 am> <ffffffff8018fd5e>{sys_sync+62} <ffffffff80110794>{system_call+124} Both copies of the application appear to be calling sys_sync at the same time, and each one is hung in a different part of the sys_sync call, I wonder if there is a deadlock condition possible if one is doing this sort of thing? All other processes are always running on the machines, these 2 process are the extra ones that appear to be required to get the xfs lockup condition. Other application process have never reproduced the issue, so it is likely that something that these 2 processes are doing is required to cause the deadlock condition. Both processes are trying to get locks and failing, if I look at the other processes that are working with locks I see: <Oct/27 07:40 am>Call Trace:<ffffffffa0141832>{:xfs:kmem_zone_zalloc+50} <ffffffffa012a9c4>{:xfs:_xfs_trans_alloc+36} <Oct/27 07:40 am> <ffffffff80231b35>{__down_write+117} <ffffffffa0116ead>{:xfs:xfs_ilock+93} <Oct/27 07:40 am> <ffffffffa012eda3>{:xfs:xfs_syncsub+2787} <ffffffff80146970>{del_timer_sync+80} <Oct/27 07:40 am> <ffffffff80135c50>{default_wake_function+0} <ffffffff80234447>{__down_failed+53} <Oct/27 07:40 am> <ffffffffa0141642>{:xfs:.text.lock.xfs_buf+15} <ffffffffa0126618>{:xfs:xfs_getsb+40} <Oct/27 07:40 am> <ffffffffa012b8aa>{:xfs:xfs_trans_getsb+106} <ffffffffa012a10c>{:xfs:xfs_trans_commit+332} <Oct/27 07:40 am> <ffffffffa012b37b>{:xfs:xfs_trans_log_buf+107} <ffffffff80234447>{__down_failed+53} <Oct/27 07:40 am> <ffffffffa0141642>{:xfs:.text.lock.xfs_buf+15} <ffffffffa0126618>{:xfs:xfs_getsb+40} <Oct/27 07:40 am> <ffffffffa012b8aa>{:xfs:xfs_trans_getsb+106} <ffffffffa012a10c>{:xfs:xfs_trans_commit+332} <Oct/27 07:40 am> <ffffffffa013194f>{:xfs:xfs_create+1359} <ffffffffa013b429>{:xfs:linvfs_mknod+521} <Oct/27 07:40 am> <ffffffffa0116d16>{:xfs:xfs_iunlock+102} <ffffffffa0133387>{:xfs:xfs_lookup+119} <Oct/27 07:40 am> <ffffffffa013b704>{:xfs:linvfs_lookup+84} <ffffffff8019c49b>{real_lookup+123} <Oct/27 07:40 am> <ffffffff8010f9c8>{__down+152} <ffffffff80135c50>{default_wake_function+0} <Oct/27 07:40 am> <ffffffff80234447>{__down_failed+53} <ffffffff801949dc>{.text.lock.super+169} <Oct/27 07:40 am> <ffffffff8018fcea>{do_sync+42} <ffffffff8018fd5e>{sys_sync+62} <Oct/27 07:40 am>Call Trace:<ffffffff8010f9c8>{__down+152} <ffffffff80135c50>{default_wake_function+0} <Oct/27 07:40 am> <ffffffff80234447>{__down_failed+53} <ffffffffa0141642>{:xfs:.text.lock.xfs_buf+15} <Oct/27 07:40 am> <ffffffffa0126618>{:xfs:xfs_getsb+40} <ffffffffa012ecea>{:xfs:xfs_syncsub+2602} It very much looks like a deadlock as almost everyone hung looks to be working/waiting for locks. Is there a way to list the kernel locks and see who is holding what locks? Roger Full Trace follows: <Oct/27 07:40 am>xfssyncd D 00000000000493e0 0 2760 1 3876 2755 (L-TLB) <Oct/27 07:40 am>Call Trace:<ffffffffa0141832>{:xfs:kmem_zone_zalloc+50} <ffffffffa012a9c4>{:xfs:_xfs_trans_alloc+36} <Oct/27 07:40 am> <ffffffff80231b35>{__down_write+117} <ffffffffa0116ead>{:xfs:xfs_ilock+93} <Oct/27 07:40 am> <ffffffffa012eda3>{:xfs:xfs_syncsub+2787} <ffffffff80146970>{del_timer_sync+80} <Oct/27 07:40 am> <ffffffff80146a55>{del_singleshot_timer_sync+21} <ffffffff80146d2e>{schedule_timeout+254} <Oct/27 07:40 am> <ffffffffa013e468>{:xfs:vfs_sync+40} <ffffffffa013da79>{:xfs:vfs_sync_worker+25} <Oct/27 07:40 am> <ffffffffa013dc1a>{:xfs:xfssyncd+378} <ffffffffa013d780>{:xfs:linvfs_fill_super+0} <Oct/27 07:40 am> <ffffffff801112b7>{child_rip+8} <ffffffffa013d780>{:xfs:linvfs_fill_super+0} <Oct/27 07:40 am> <ffffffffa013daa0>{:xfs:xfssyncd+0} <ffffffff801112af>{child_rip+0} <Oct/27 07:40 am> <Oct/27 07:40 am>res D 000000000000000a 0 16149 1 26319 16151 5825 (NOTLB) <Oct/27 07:40 am>Call Trace:<ffffffff80231bcd>{__down_read+125} <ffffffffa01333dc>{:xfs:xfs_access+44} <Oct/27 07:40 am> <ffffffffa013af44>{:xfs:linvfs_permission+20} <ffffffff8019c767>{permission+55} <Oct/27 07:40 am> <ffffffff8019df1c>{link_path_walk+348} <ffffffff801a0706>{__user_walk_it+70} <Oct/27 07:40 am> <ffffffff801974b0>{vfs_lstat+128} <ffffffff80122868>{do_page_fault+536} <Oct/27 07:40 am> <ffffffff801975bf>{sys_newlstat+31} <ffffffff80111101>{error_exit+0} <Oct/27 07:40 am> <ffffffff80110794>{system_call+124} <Oct/27 07:40 am>sbatchd D 00000000000493e0 0 16151 1 12686 16149 (NOTLB) <Oct/27 07:40 am>Call Trace:<ffffffff801a7b51>{dput+33} <ffffffff8019c2cd>{follow_mount+93} <Oct/27 07:40 am> <ffffffff801a7b51>{dput+33} <ffffffff80231bcd>{__down_read+125} <Oct/27 07:40 am> <ffffffffa01333dc>{:xfs:xfs_access+44} <ffffffffa013af44>{:xfs:linvfs_permission+20} <Oct/27 07:40 am> <ffffffff8019c767>{permission+55} <ffffffff8018aeca>{sys_chdir+138} <Oct/27 07:40 am> <ffffffff801a394c>{sys_select+1244} <ffffffff80110794>{system_call+124} <Oct/27 07:40 am> <Oct/27 07:40 am>gm_mapper D 000000000000000a 0 12686 1 16834 16151 (L-TLB) <Oct/27 07:40 am>Call Trace:<ffffffffa012b37b>{:xfs:xfs_trans_log_buf+107} <ffffffff8010f9c8>{__down+152} <Oct/27 07:40 am> <ffffffff80135c50>{default_wake_function+0} <ffffffff80234447>{__down_failed+53} <Oct/27 07:40 am> <ffffffffa0141642>{:xfs:.text.lock.xfs_buf+15} <ffffffffa0126618>{:xfs:xfs_getsb+40} <Oct/27 07:40 am> <ffffffffa012b8aa>{:xfs:xfs_trans_getsb+106} <ffffffffa012a10c>{:xfs:xfs_trans_commit+332} <Oct/27 07:40 am> <ffffffffa00e7a9c>{:xfs:xfs_free_extent+204} <ffffffffa0111634>{:xfs:xfs_efd_init+68} <Oct/27 07:40 am> <ffffffffa014179b>{:xfs:kmem_zone_alloc+75} <ffffffffa0141832>{:xfs:kmem_zone_zalloc+50} <Oct/27 07:40 am> <ffffffffa011a9cd>{:xfs:xfs_itruncate_finish+557} <ffffffffa012aae9>{:xfs:xfs_trans_alloc+217} <Oct/27 07:40 am> <ffffffff8011081d>{sysret_signal+28} <ffffffffa01300af>{:xfs:xfs_inactive+591} <Oct/27 07:40 am> <ffffffff8011081d>{sysret_signal+28} <ffffffff80169f50>{__pagevec_free+32} <Oct/27 07:40 am> <ffffffff8011081d>{sysret_signal+28} <ffffffffa013ebc8>{:xfs:vn_rele+72} <Oct/27 07:40 am> <ffffffffa013d392>{:xfs:linvfs_clear_inode+18} <ffffffff801a9d3b>{clear_inode+155} <Oct/27 07:40 am> <ffffffff801aa3f5>{generic_delete_inode+245} <ffffffff801a95ee>{iput+158} <Oct/27 07:40 am> <ffffffff801a7cb5>{dput+389} <ffffffff8018d9de>{__fput+270} <Oct/27 07:40 am> <ffffffff8018965e>{filp_close+126} <ffffffff8013f073>{put_files_struct+115} <Oct/27 07:40 am> <ffffffff80140522>{do_exit+1010} <ffffffff801484b5>{__dequeue_signal+501} <Oct/27 07:40 am> <ffffffff8011081d>{sysret_signal+28} <ffffffff80140fa8>{do_group_exit+232} <Oct/27 07:40 am> <ffffffff8014ab37>{get_signal_to_deliver+1175} <ffffffff8011004b>{do_signal+1179} <Oct/27 07:40 am> <ffffffff8010fc45>{do_signal+149} <ffffffffa02dbea0>{:gm:gm_linux_ioctl+0} <Oct/27 07:40 am> <ffffffffa02dbf0a>{:gm:gm_linux_ioctl+106} <ffffffff801a2094>{sys_ioctl+1092} <Oct/27 07:40 am> <ffffffff8011052d>{sys_rt_sigreturn+653} <ffffffff8011081d>{sysret_signal+28} <Oct/27 07:40 am> <ffffffff80110adf>{ptregscall_common+103} <Oct/27 07:40 am>lim D 000000000000000a 0 16834 1 16835 17594 12686 (NOTLB) <Oct/27 07:40 am>Call Trace:<ffffffff80231bcd>{__down_read+125} <ffffffffa01333dc>{:xfs:xfs_access+44} <Oct/27 07:40 am> <ffffffffa013af44>{:xfs:linvfs_permission+20} <ffffffff8019c767>{permission+55} <Oct/27 07:40 am> <ffffffff8019df1c>{link_path_walk+348} <ffffffff801a0706>{__user_walk_it+70} <Oct/27 07:40 am> <ffffffff801974b0>{vfs_lstat+128} <ffffffff80117ec4>{save_i387+148} <Oct/27 07:40 am> <ffffffff8011018d>{do_signal+1501} <ffffffff801975bf>{sys_newlstat+31} <Oct/27 07:40 am> <ffffffff80147d04>{sys_rt_sigaction+148} <ffffffff80110794>{system_call+124} <Oct/27 07:40 am> <Oct/27 07:40 am>pim D 00000000000493e0 0 16835 16834 16870 (NOTLB) <Oct/27 07:40 am>Call Trace:<ffffffffa01412ad>{:xfs:xfs_buf_get_flags+877} <ffffffffa014179b>{:xfs:kmem_zone_alloc+75} <Oct/27 07:40 am> <ffffffff8010f9c8>{__down+152} <ffffffff80135c50>{default_wake_function+0} <Oct/27 07:40 am> <ffffffffa012b37b>{:xfs:xfs_trans_log_buf+107} <ffffffff80234447>{__down_failed+53} <Oct/27 07:40 am> <ffffffffa0141642>{:xfs:.text.lock.xfs_buf+15} <ffffffffa0126618>{:xfs:xfs_getsb+40} <Oct/27 07:40 am> <ffffffffa012b8aa>{:xfs:xfs_trans_getsb+106} <ffffffffa012a10c>{:xfs:xfs_trans_commit+332} <Oct/27 07:40 am> <ffffffffa0104d26>{:xfs:xfs_dir2_createname+278} <ffffffffa0117d3d>{:xfs:xfs_ichgtime+301} <Oct/27 07:40 am> <ffffffffa013194f>{:xfs:xfs_create+1359} <ffffffffa013b429>{:xfs:linvfs_mknod+521} <Oct/27 07:40 am> <ffffffffa0116d16>{:xfs:xfs_iunlock+102} <ffffffffa0133387>{:xfs:xfs_lookup+119} <Oct/27 07:40 am> <ffffffffa013b704>{:xfs:linvfs_lookup+84} <ffffffff8019c49b>{real_lookup+123} <Oct/27 07:40 am> <ffffffff8019cedb>{vfs_create+251} <ffffffff8019f3a0>{open_namei+464} <Oct/27 07:40 am> <ffffffff80189cc7>{filp_open+87} <ffffffff80189d8f>{sys_open+159} <Oct/27 07:40 am> <ffffffff80189765>{sys_close+229} <ffffffff80110794>{system_call+124} <Oct/27 07:40 am> <Oct/27 07:40 am>elim.uptime D 00000000000493e0 0 16873 1 14418 18756 (NOTLB) <Oct/27 07:40 am>Call Trace:<ffffffff80231bcd>{__down_read+125} <ffffffffa01333dc>{:xfs:xfs_access+44} <Oct/27 07:40 am> <ffffffffa013af44>{:xfs:linvfs_permission+20} <ffffffff8019c767>{permission+55} <Oct/27 07:40 am> <ffffffff8019df1c>{link_path_walk+348} <ffffffff8019f2a1>{open_namei+209} <Oct/27 07:40 am> <ffffffff80189cc7>{filp_open+87} <ffffffff80189d8f>{sys_open+159} <Oct/27 07:40 am> <ffffffff80111101>{error_exit+0} <ffffffff80110794>{system_call+124} <Oct/27 07:40 am> <Oct/27 07:40 am>res D 00000000000493e0 0 14323 16149 26319 (NOTLB) <Oct/27 07:40 am>Call Trace:<ffffffff80301c83>{inet_recvmsg+51} <ffffffff802b520a>{sock_aio_read+346} <Oct/27 07:40 am> <ffffffff80231bcd>{__down_read+125} <ffffffffa01333dc>{:xfs:xfs_access+44} <Oct/27 07:40 am> <ffffffffa013af44>{:xfs:linvfs_permission+20} <ffffffff8019c767>{permission+55} <Oct/27 07:40 am> <ffffffff8019df1c>{link_path_walk+348} <ffffffff8019f27f>{open_namei+175} <Oct/27 07:40 am> <ffffffff80189cc7>{filp_open+87} <ffffffff80189d8f>{sys_open+159} <Oct/27 07:40 am> <ffffffff802b58a8>{sys_socket+104} <ffffffff80110794>{system_call+124} <Oct/27 07:40 am> <Oct/27 07:40 am>acuSolve-gmpi D 00000000000493e0 0 14418 1 14419 16873 (NOTLB) <Oct/27 07:40 am>Call Trace:<ffffffff80165ad4>{wait_on_page_writeback_range_wq+324} <Oct/27 07:40 am> <ffffffff8010f9c8>{__down+152} <ffffffff80135c50>{default_wake_function+0} <Oct/27 07:40 am> <ffffffff80234447>{__down_failed+53} <ffffffff801949dc>{.text.lock.super+169} <Oct/27 07:40 am> <ffffffff8018fcea>{do_sync+42} <ffffffff8018fd5e>{sys_sync+62} <Oct/27 07:40 am> <ffffffff80110794>{system_call+124} <Oct/27 07:40 am>acuSolve-gmpi D 00000000000493e0 0 14419 1 18864 14418 (NOTLB) <Oct/27 07:40 am>Call Trace:<ffffffff8010f9c8>{__down+152} <ffffffff80135c50>{default_wake_function+0} <Oct/27 07:40 am> <ffffffff80234447>{__down_failed+53} <ffffffffa0141642>{:xfs:.text.lock.xfs_buf+15} <Oct/27 07:40 am> <ffffffffa0126618>{:xfs:xfs_getsb+40} <ffffffffa012ecea>{:xfs:xfs_syncsub+2602} <Oct/27 07:40 am> <ffffffffa013e468>{:xfs:vfs_sync+40} <ffffffffa013d434>{:xfs:linvfs_sync_super+68} <Oct/27 07:40 am> <ffffffff80193cff>{sync_filesystems+223} <ffffffff8018fcf1>{do_sync+49} <Oct/27 07:40 am> <ffffffff8018fd5e>{sys_sync+62} <ffffffff80110794>{system_call+124} <Oct/27 07:40 am> <Oct/27 07:40 am>mktemp D 00000000000493e0 0 17594 1 17656 16834 (NOTLB) <Oct/27 07:40 am>Call Trace:<ffffffff8025bbe9>{SHATransform+25} <ffffffff8019c2cd>{follow_mount+93} <Oct/27 07:40 am> <ffffffff801a7b51>{dput+33} <ffffffff80231bcd>{__down_read+125} <Oct/27 07:40 am> <ffffffffa01333dc>{:xfs:xfs_access+44} <ffffffffa013af44>{:xfs:linvfs_permission+20} <Oct/27 07:40 am> <ffffffff8019c767>{permission+55} <ffffffff8019df1c>{link_path_walk+348} <Oct/27 07:40 am> <ffffffff801a02ac>{sys_mkdir+220} <ffffffff80110794>{system_call+124} <Oct/27 07:40 am> <Oct/27 07:40 am>check_EWNstag D 00000000000493e0 0 17620 1 17751 17656 (NOTLB) <Oct/27 07:40 am>Call Trace:<ffffffff8018cbbd>{do_sync_write+173} <ffffffff80231bcd>{__down_read+125} <Oct/27 07:40 am> <ffffffffa01333dc>{:xfs:xfs_access+44} <ffffffffa013af44>{:xfs:linvfs_permission+20} <Oct/27 07:40 am> <ffffffff8019c767>{permission+55} <ffffffff8019df1c>{link_path_walk+348} <Oct/27 07:40 am> <ffffffff8019f2a1>{open_namei+209} <ffffffff80189cc7>{filp_open+87} <Oct/27 07:40 am> <ffffffff80189d8f>{sys_open+159} <ffffffff80111101>{error_exit+0} <Oct/27 07:40 am> <ffffffff80110794>{system_call+124} <Oct/27 07:40 am> <Oct/27 07:40 am> <Oct/27 07:40 am>sh D 00000000000493e0 0 17959 1 17793 17858 (NOTLB) <Oct/27 07:40 am>Call Trace:<ffffffff80231bcd>{__down_read+125} <ffffffffa01333dc>{:xfs:xfs_access+44} <Oct/27 07:40 am> <ffffffffa013af44>{:xfs:linvfs_permission+20} <ffffffff8019c767>{permission+55} <Oct/27 07:40 am> <ffffffff8019df1c>{link_path_walk+348} <ffffffff8019f2a1>{open_namei+209} <Oct/27 07:40 am> <ffffffff80189cc7>{filp_open+87} <ffffffff80189d8f>{sys_open+159} <Oct/27 07:40 am> <ffffffff80111101>{error_exit+0} <ffffffff80110794>{system_call+124} <Oct/27 07:40 am> |
| <Prev in Thread] | Current Thread | [Next in Thread> |
|---|---|---|
| ||
| Previous by Date: | Re: [PATCH] incorrect direct io error handling, David Chinner |
|---|---|
| Next by Date: | Re: [PATCH] rethink removal of xattr syscall stubs (for now), Lachlan McIlroy |
| Previous by Thread: | RE: Issues with XFS on Sles9 sp2., Roger Heflin |
| Next by Thread: | Re: mkfs.xfs questions, Iustin Pop |
| Indexes: | [Date] [Thread] [Top] [All Lists] |