xfs
[Top] [All Lists]

Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
From: Rafał Kupka <kupson@xxxxxxxxxx>
Date: Sat, 03 Mar 2012 12:37:07 +0100
Cc: xfs@xxxxxxxxxxx
Dkim-signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=kupson.net; s=pn10; h=Content-Transfer-Encoding:Mime-Version:Message-ID:Date:Content-Type:References:In-Reply-To:Cc:To:From:Subject; bh=MPzqe66mXmfQVzURm9kDEC+DXRx+sQ/wKEGhJkIrgj8=; b=OY9sWFNJu75AP9CWN/66bgJzkXaYPp/k1RVqTPuw37SjKt7qp+UELIIi1t19S+GpU2gBIEPdnd29QerSOvlYYE0igVa09vO7pkOh2GKyMYVAvkGKUlwg2V1bIIbQVu78iMtilxJoKrtlmQXWF5crX+/GdfF+Un4BSnIINPm2KWc=;
In-reply-to: <20120302104348.GA22230@xxxxxxxxxxxxx>
References: <1330539626.7615.73.camel@xxxxxxxxxxxxxxxx> <20120302104348.GA22230@xxxxxxxxxxxxx>
On Fri, 2012-03-02 at 05:43 -0500, Christoph Hellwig wrote:
Hello,

> From the trace this looks like we got incorrect results from xfs_imap,
> which got some fairly significant changes between the two kernels
> you tested.  Can you check if 2.6.36 shows this issue?  It's the first
> kernel that had these big changes in inode lookup.

I've tested more kernels and here are results (linus git tree, no
additional patches):
- v2.6.36 -- good
- v2.6.38 -- good
- v2.6.39 -- bad, I/O error log entry and locked processes:

Mar  2 20:13:58 goscie kernel: [  431.392301] XFS (vda2): I/O error occurred: 
meta-data dev vda2 block 0xe6700       ("xfs_trans_read_buf") error 5 buf count 
8192
Mar  2 20:13:58 goscie kernel: [  431.427181] XFS (vda2): xfs_imap_to_bp: 
xfs_trans_read_buf() returned error 5.
Mar  2 20:13:59 goscie kernel: [  433.086049] XFS (vda2): I/O error occurred: 
meta-data dev vda2 block 0xe6700       ("xfs_trans_read_buf") error 5 buf count 
8192
Mar  2 20:13:59 goscie kernel: [  433.088638] XFS (vda2): xfs_imap_to_bp: 
xfs_trans_read_buf() returned error 5.
Mar  2 20:16:47 goscie kernel: [  600.584162] INFO: task cron:718 blocked for 
more than 120 seconds.
Mar  2 20:16:47 goscie kernel: [  600.585520] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  2 20:16:47 goscie kernel: [  600.587651] cron            D 
ffff8800174d2180     0   718      1 0x00020000
Mar  2 20:16:47 goscie kernel: [  600.589632]  ffff8800174d2180 
0000000000000082 ffffffff8100e0d1 ffff88001822c7a0
Mar  2 20:16:47 goscie kernel: [  600.592651]  0000000000013d40 
ffff880013831fd8 ffff880013831fd8 0000000000013d40
Mar  2 20:16:47 goscie kernel: [  600.595718]  ffff8800174d2180 
ffff880013830010 ffff88001fc13d40 ffff88001821ed00
Mar  2 20:16:47 goscie kernel: [  600.598314] Call Trace:
Mar  2 20:16:47 goscie kernel: [  600.598640]  [<ffffffff8100e0d1>] ? 
sched_clock+0x5/0x8
Mar  2 20:16:47 goscie kernel: [  600.599099]  [<ffffffff8139166c>] ? 
schedule_timeout+0x2d/0xd7
Mar  2 20:16:47 goscie kernel: [  600.599503]  [<ffffffff81391fab>] ? 
__down_common+0x92/0xe3
Mar  2 20:16:47 goscie kernel: [  600.599935]  [<ffffffff811d7ba0>] ? 
_xfs_buf_find+0x1ed/0x2c8
Mar  2 20:16:47 goscie kernel: [  600.600383]  [<ffffffff8105f3bc>] ? 
down+0x27/0x37
Mar  2 20:16:47 goscie kernel: [  600.600782]  [<ffffffff811d6365>] ? 
xfs_buf_lock+0x64/0xb0
Mar  2 20:16:47 goscie kernel: [  600.601179]  [<ffffffff811d7ba0>] ? 
_xfs_buf_find+0x1ed/0x2c8
Mar  2 20:16:47 goscie kernel: [  600.601581]  [<ffffffff811d7cd6>] ? 
xfs_buf_get+0x5b/0x14f
Mar  2 20:16:47 goscie kernel: [  600.601986]  [<ffffffff811d7de0>] ? 
xfs_buf_read+0x16/0xbf
Mar  2 20:16:47 goscie kernel: [  600.602360]  [<ffffffff811ce10a>] ? 
xfs_trans_read_buf+0x52/0x573
Mar  2 20:16:47 goscie kernel: [  600.602750]  [<ffffffff811b65ee>] ? 
xfs_imap_to_bp+0x47/0x206
Mar  2 20:16:47 goscie kernel: [  600.603171]  [<ffffffff811d6c56>] ? 
xfs_buf_rele+0xfa/0x17a
Mar  2 20:16:47 goscie kernel: [  600.603567]  [<ffffffff811b9531>] ? 
xfs_iread+0x59/0x1cc
Mar  2 20:16:47 goscie kernel: [  600.603983]  [<ffffffff811b3727>] ? 
xfs_iget+0x3e4/0x611
Mar  2 20:16:47 goscie kernel: [  600.612400]  [<ffffffff811d0d80>] ? 
xfs_lookup+0xaa/0xd9
Mar  2 20:16:47 goscie kernel: [  600.612757]  [<ffffffff811dc6f3>] ? 
xfs_vn_lookup+0x3d/0x7c
Mar  2 20:16:47 goscie kernel: [  600.613142]  [<ffffffff810fed3f>] ? 
d_alloc_and_lookup+0x43/0x60
Mar  2 20:16:47 goscie kernel: [  600.613611]  [<ffffffff8110061c>] ? 
walk_component+0x161/0x30c
Mar  2 20:16:47 goscie kernel: [  600.613967]  [<ffffffff81101683>] ? 
path_lookupat+0xad/0x33f
Mar  2 20:16:47 goscie kernel: [  600.614398]  [<ffffffff81033b5a>] ? 
update_cfs_shares+0x7c/0x149
Mar  2 20:16:47 goscie kernel: [  600.614779]  [<ffffffff81101933>] ? 
do_path_lookup+0x1e/0x9a
Mar  2 20:16:47 goscie kernel: [  600.615125]  [<ffffffff810ff44d>] ? 
getname_flags+0x14f/0x1c6
Mar  2 20:16:47 goscie kernel: [  600.615496]  [<ffffffff811023df>] ? 
user_path_at+0x48/0x7a
Mar  2 20:16:47 goscie kernel: [  600.615857]  [<ffffffff8105e7f5>] ? 
hrtimer_cancel+0xc/0x16
Mar  2 20:16:47 goscie kernel: [  600.616220]  [<ffffffff81391de2>] ? 
do_nanosleep+0x77/0xae
Mar  2 20:16:47 goscie kernel: [  600.616583]  [<ffffffff811078e7>] ? 
dput+0x84/0xe8
Mar  2 20:16:47 goscie kernel: [  600.616909]  [<ffffffff810fa00b>] ? 
vfs_fstatat+0x43/0x70
Mar  2 20:16:47 goscie kernel: [  600.617247]  [<ffffffff8102dbc0>] ? 
sys32_stat64+0x11/0x29
Mar  2 20:16:47 goscie kernel: [  600.617621]  [<ffffffff81399180>] ? 
cstar_dispatch+0x7/0x2e
Mar  2 20:16:47 goscie kernel: [  600.617984] INFO: task xfsdump:1849 blocked 
for more than 120 seconds.
Mar  2 20:16:47 goscie kernel: [  600.618346] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  2 20:16:47 goscie kernel: [  600.618964] xfsdump         D 
ffff8800178936e0     0  1849   1826 0x00020000
Mar  2 20:16:47 goscie kernel: [  600.619448]  ffff8800178936e0 
0000000000000086 ffff880000000000 ffffffff8160b020
Mar  2 20:16:47 goscie kernel: [  600.620276]  0000000000013d40 
ffff88000d841fd8 ffff88000d841fd8 0000000000013d40
Mar  2 20:16:47 goscie kernel: [  600.621068]  ffff8800178936e0 
ffff88000d840010 ffffea000053c950 0000000000000008
Mar  2 20:16:47 goscie kernel: [  600.621814] Call Trace:
Mar  2 20:16:47 goscie kernel: [  600.622155]  [<ffffffff8139166c>] ? 
schedule_timeout+0x2d/0xd7
Mar  2 20:16:47 goscie kernel: [  600.622509]  [<ffffffff811ce10a>] ? 
xfs_trans_read_buf+0x52/0x573
Mar  2 20:16:47 goscie kernel: [  600.622888]  [<ffffffff8119a361>] ? 
xfs_btree_check_sblock+0x64/0x15e
Mar  2 20:16:47 goscie kernel: [  600.623308]  [<ffffffff81391fab>] ? 
__down_common+0x92/0xe3
Mar  2 20:16:47 goscie kernel: [  600.623650]  [<ffffffff811d7ba0>] ? 
_xfs_buf_find+0x1ed/0x2c8
Mar  2 20:16:47 goscie kernel: [  600.624065]  [<ffffffff8105f3bc>] ? 
down+0x27/0x37
Mar  2 20:16:47 goscie kernel: [  600.624398]  [<ffffffff811d6365>] ? 
xfs_buf_lock+0x64/0xb0
Mar  2 20:16:47 goscie kernel: [  600.624739]  [<ffffffff811d7ba0>] ? 
_xfs_buf_find+0x1ed/0x2c8
Mar  2 20:16:47 goscie kernel: [  600.625127]  [<ffffffff811d7cd6>] ? 
xfs_buf_get+0x5b/0x14f
Mar  2 20:16:47 goscie kernel: [  600.625475]  [<ffffffff811d7de0>] ? 
xfs_buf_read+0x16/0xbf
Mar  2 20:16:47 goscie kernel: [  600.625820]  [<ffffffff811ce10a>] ? 
xfs_trans_read_buf+0x52/0x573
Mar  2 20:16:47 goscie kernel: [  600.626198]  [<ffffffff811b65ee>] ? 
xfs_imap_to_bp+0x47/0x206
Mar  2 20:16:47 goscie kernel: [  600.626550]  [<ffffffff811b9531>] ? 
xfs_iread+0x59/0x1cc
Mar  2 20:16:47 goscie kernel: [  600.626888]  [<ffffffff811b3727>] ? 
xfs_iget+0x3e4/0x611
Mar  2 20:16:47 goscie kernel: [  600.627277]  [<ffffffff811d3715>] ? 
kmem_alloc+0x28/0x9e
Mar  2 20:16:47 goscie kernel: [  600.627641]  [<ffffffff811bc59b>] ? 
xfs_bulkstat_one_int+0x9a/0x318
Mar  2 20:16:47 goscie kernel: [  600.628000]  [<ffffffff8117f3b4>] ? 
xfs_bstime_store_compat+0x4d/0x4d
Mar  2 20:16:47 goscie kernel: [  600.628433]  [<ffffffff8117fb58>] ? 
xfs_bulkstat_one_compat+0x17/0x1c
Mar  2 20:16:47 goscie kernel: [  600.628857]  [<ffffffff811bc350>] ? 
xfs_bulkstat+0x58c/0x73d
Mar  2 20:16:47 goscie kernel: [  600.629220]  [<ffffffff8117fb41>] ? 
xfs_compat_fssetdm_by_handle+0xc7/0xc7
Mar  2 20:16:47 goscie kernel: [  600.629635]  [<ffffffff8118006a>] ? 
xfs_compat_ioc_bulkstat+0x162/0x1cf
Mar  2 20:16:47 goscie kernel: [  600.630080]  [<ffffffff8103680c>] ? 
task_rq_lock+0x47/0x7a
Mar  2 20:16:47 goscie kernel: [  600.630463]  [<ffffffff81180462>] ? 
xfs_file_compat_ioctl+0x38b/0x465
Mar  2 20:16:47 goscie kernel: [  600.630829]  [<ffffffff81030962>] ? 
__wake_up_common+0x41/0x78
Mar  2 20:16:47 goscie kernel: [  600.631177]  [<ffffffff810366c2>] ? 
__wake_up+0x35/0x46
Mar  2 20:16:47 goscie kernel: [  600.631537]  [<ffffffff812915c0>] ? 
pty_write+0x48/0x53
Mar  2 20:16:47 goscie kernel: [  600.631895]  [<ffffffff8103e770>] ? 
perf_event_task_sched_out+0x48/0x54
Mar  2 20:16:47 goscie kernel: [  600.632285]  [<ffffffff8128cc83>] ? 
n_tty_write+0x322/0x360
Mar  2 20:16:47 goscie kernel: [  600.632635]  [<ffffffff8112dec3>] ? 
compat_sys_ioctl+0x1bb/0xfb4
Mar  2 20:16:47 goscie kernel: [  600.632993]  [<ffffffff810086b8>] ? 
__switch_to+0x1c9/0x288
Mar  2 20:16:47 goscie kernel: [  600.633340]  [<ffffffff810379f3>] ? 
pick_next_task_fair+0xf9/0x10a
Mar  2 20:16:47 goscie kernel: [  600.633707]  [<ffffffff8102602a>] ? 
pvclock_clocksource_read+0x46/0xb4
Mar  2 20:16:47 goscie kernel: [  600.634079]  [<ffffffff810622ff>] ? 
timekeeping_get_ns+0xe/0x2e
Mar  2 20:16:47 goscie kernel: [  600.634430]  [<ffffffff81062975>] ? 
getnstimeofday+0x4d/0x80
Mar  2 20:16:47 goscie kernel: [  600.634778]  [<ffffffff810629f5>] ? 
do_gettimeofday+0x10/0x2e
Mar  2 20:16:47 goscie kernel: [  600.635136]  [<ffffffff81399180>] ? 
cstar_dispatch+0x7/0x2e

- git revision 243b422af9ea9af4ead07a8ad54c90d4f9b6081a -- good
It's one patch before "xfs: preallocation transactions do not need to be 
synchronous"

- git revision c5850150d0b9ae16840c5d9846f9d5a759996a15:
    Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
    
    * 'for-linus' of git://oss.sgi.com/xfs/xfs:
      xfs: stop using the page cache to back the buffer cache
      xfs: register the inode cache shrinker before quotachecks
      xfs: xfs_trans_read_buf() should return an error on failure
      xfs: introduce inode cluster buffer trylocks for xfs_iflush
      vmap: flush vmap aliases when mapping fails
      xfs: preallocation transactions do not need to be synchronous
    
Hung xfsdump (only?) without any additional log entries
(CONFIG_XFS_DEBUG is set):

Mar  3 12:05:57 goscie kernel: [  960.816216] INFO: task xfsdump:1519 blocked 
for more than 120 seconds.
Mar  3 12:05:57 goscie kernel: [  960.818113] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  3 12:05:57 goscie kernel: [  960.820223] xfsdump         D 
ffff88001e196950     0  1519   1490 0x00020000
Mar  3 12:05:57 goscie kernel: [  960.821572]  ffff88001e196950 
0000000000000086 ffff880000000000 ffffffff8160b020
Mar  3 12:05:57 goscie kernel: [  960.823618]  0000000000013d00 
ffff880009903fd8 ffff880009903fd8 0000000000013d00
Mar  3 12:05:57 goscie kernel: [  960.825707]  ffff88001e196950 
ffff880009902010 0000000100000000 0000000000000008
Mar  3 12:05:57 goscie kernel: [  960.827746] Call Trace:
Mar  3 12:05:57 goscie kernel: [  960.828672]  [<ffffffff81391493>] ? 
schedule_timeout+0x2d/0xd7
Mar  3 12:05:57 goscie kernel: [  960.829937]  [<ffffffff811ce4d2>] ? 
xfs_trans_read_buf+0x52/0x573
Mar  3 12:05:57 goscie kernel: [  960.831118]  [<ffffffff8119a8a5>] ? 
xfs_btree_check_sblock+0x64/0x15e
Mar  3 12:05:57 goscie kernel: [  960.831459]  [<ffffffff81391dd2>] ? 
__down_common+0x92/0xe3
Mar  3 12:05:57 goscie kernel: [  960.831780]  [<ffffffff811d7fab>] ? 
_xfs_buf_find+0x1ed/0x2c8
Mar  3 12:05:57 goscie kernel: [  960.832125]  [<ffffffff8105f2fc>] ? 
down+0x27/0x37
Mar  3 12:05:57 goscie kernel: [  960.832424]  [<ffffffff811d6776>] ? 
xfs_buf_lock+0x7c/0xc8
Mar  3 12:05:57 goscie kernel: [  960.832739]  [<ffffffff811d7fab>] ? 
_xfs_buf_find+0x1ed/0x2c8
Mar  3 12:05:57 goscie kernel: [  960.833068]  [<ffffffff811d80e1>] ? 
xfs_buf_get+0x5b/0x14f
Mar  3 12:05:57 goscie kernel: [  960.833412]  [<ffffffff811d81eb>] ? 
xfs_buf_read+0x16/0xbf
Mar  3 12:05:57 goscie kernel: [  960.833737]  [<ffffffff811ce4d2>] ? 
xfs_trans_read_buf+0x52/0x573
Mar  3 12:05:57 goscie kernel: [  960.834071]  [<ffffffff811b6b2a>] ? 
xfs_imap_to_bp+0x47/0x206
Mar  3 12:05:57 goscie kernel: [  960.834396]  [<ffffffff811b9a6d>] ? 
xfs_iread+0x59/0x1cc
Mar  3 12:05:57 goscie kernel: [  960.834706]  [<ffffffff811b3c63>] ? 
xfs_iget+0x3e4/0x611
Mar  3 12:05:57 goscie kernel: [  960.835022]  [<ffffffff811d3b15>] ? 
kmem_alloc+0x60/0x9e
Mar  3 12:05:57 goscie kernel: [  960.835338]  [<ffffffff811bca6b>] ? 
xfs_bulkstat_one_int+0x9a/0x318
Mar  3 12:05:57 goscie kernel: [  960.835677]  [<ffffffff8117f8d4>] ? 
xfs_bstime_store_compat+0x4d/0x4d
Mar  3 12:05:57 goscie kernel: [  960.836029]  [<ffffffff81180078>] ? 
xfs_bulkstat_one_compat+0x17/0x1c
Mar  3 12:05:57 goscie kernel: [  960.836377]  [<ffffffff811bc820>] ? 
xfs_bulkstat+0x58c/0x73d
Mar  3 12:05:57 goscie kernel: [  960.836697]  [<ffffffff81180061>] ? 
xfs_compat_fssetdm_by_handle+0xc7/0xc7
Mar  3 12:05:57 goscie kernel: [  960.837049]  [<ffffffff8118058a>] ? 
xfs_compat_ioc_bulkstat+0x162/0x1cf
Mar  3 12:05:57 goscie kernel: [  960.837439]  [<ffffffff810b1d7a>] ? 
generic_file_aio_read+0x58d/0x5a2
Mar  3 12:05:57 goscie kernel: [  960.837790]  [<ffffffff81180982>] ? 
xfs_file_compat_ioctl+0x38b/0x465
Mar  3 12:05:57 goscie kernel: [  960.838132]  [<ffffffff811b3219>] ? 
xfs_iunlock+0xea/0x147
Mar  3 12:05:57 goscie kernel: [  960.838467]  [<ffffffff811d1221>] ? 
xfs_free_eofblocks+0xaa/0x222
Mar  3 12:05:57 goscie kernel: [  960.838810]  [<ffffffff8112e413>] ? 
compat_sys_ioctl+0x1bb/0xfb4
Mar  3 12:05:57 goscie kernel: [  960.846656]  [<ffffffff8102602a>] ? 
pvclock_clocksource_read+0x46/0xb4
Mar  3 12:05:57 goscie kernel: [  960.847015]  [<ffffffff8106222f>] ? 
timekeeping_get_ns+0xe/0x2e
Mar  3 12:05:57 goscie kernel: [  960.847343]  [<ffffffff810628a5>] ? 
getnstimeofday+0x4d/0x80
Mar  3 12:05:57 goscie kernel: [  960.847664]  [<ffffffff81062925>] ? 
do_gettimeofday+0x10/0x2e
Mar  3 12:05:57 goscie kernel: [  960.847994]  [<ffffffff81398fc0>] ? 
cstar_dispatch+0x7/0x2e

I don't know if it's the same bug - there is no assertion nor I/O error
log entry...

Regards,
Rafał Kupka

<Prev in Thread] Current Thread [Next in Thread>