<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7654.12">
<TITLE>2.6.23 kdb in xfs_bmbt_get_block with unwritten extents</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>Hello All,<BR>
<BR>
I am getting random drops into kdb when creating preallocated files that are excessively 'holey' (e.g. a 500MB+ file with alternating 4K written and 4K unwritten extents). Creating such files is not my intention, and that is being addressed in the userspace writer. That said, I am still concerned about running into kdb.<BR>
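<BR>
For reference, the file layout described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name write_alternating() is hypothetical, and posix_fallocate() stands in for whatever preallocation call the real writer uses, so it may not produce unwritten extents on every kernel or filesystem.<BR>

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* for posix_fallocate() and pwrite() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: preallocate `total` bytes, then write every other
 * `blk`-sized block so written and untouched ranges alternate.
 * Returns the number of blocks written, or -1 on error.
 */
long write_alternating(const char *path, off_t total, size_t blk)
{
    char *buf;
    long written = 0;
    off_t off;
    int fd;

    assert(blk > 0);

    fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;

    /* Assumption: this reserves the space up front. */
    if (posix_fallocate(fd, 0, total) != 0) {
        close(fd);
        return -1;
    }

    buf = malloc(blk);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    memset(buf, 0xAB, blk);

    /* Write one block, skip one block, until the end of the file. */
    for (off = 0; off + (off_t)blk <= total; off += 2 * (off_t)blk) {
        if (pwrite(fd, buf, blk, off) != (ssize_t)blk) {
            written = -1;
            break;
        }
        written++;
    }

    free(buf);
    if (close(fd) != 0)
        return -1;
    return written;
}
```

Calling this with a total of 500MB or more and a 4K block size should approximate the pattern that triggers the problem here, although I have not confirmed that this exact sketch reproduces it.<BR>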
<BR>
I am currently running 2.6.23.9 and have done some digging through the changelogs, but can't seem to find a match. Also, 2.6.24 seems to have a massive rewrite in this area, which significantly limits the scope that I can search.<BR>
<BR>
The cause of the crash is a straightforward NULL dereference in xfs_bmap_btree.c:xfs_bmbt_get_block(), but I suspect the root cause is going to be some complex condition that corrupts the cursor...<BR>
<BR>
if (level &lt; cur-&gt;bc_nlevels - 1) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;*bpp = cur-&gt;bc_bufs[level]; &lt;----- cur-&gt;bc_bufs[level] == NULL<BR>
&nbsp;&nbsp;&nbsp;&nbsp;rval = XFS_BUF_TO_BMBT_BLOCK(*bpp); &lt;----- BAM! NULL dereferenced<BR>
}<BR>
<BR>
Scanning the source, I see numerous instances of this same unchecked dereference from bc_bufs, but so far I have only hit this one condition.<BR>
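<BR>
To make the failing pattern concrete outside the kernel, here is a minimal userspace sketch. All names in it (toy_block, toy_buf, toy_cursor, toy_get_block, TOY_MAXLEVELS) are hypothetical stand-ins and not the real XFS definitions. It only shows what a checked lookup of a per-level buffer slot could look like. It is not a proposed fix, since a NULL slot presumably means the cursor was already corrupted somewhere upstream.<BR>

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXLEVELS 8               /* hypothetical; not the XFS constant */

struct toy_block { int level; };                 /* stand-in for a btree block */
struct toy_buf   { struct toy_block *data; };    /* stand-in for a buffer      */

/* Hypothetical cursor: one buffer slot per non-root level. */
struct toy_cursor {
    int               nlevels;               /* plays the role of bc_nlevels */
    struct toy_buf   *bufs[TOY_MAXLEVELS];   /* plays the role of bc_bufs    */
    struct toy_block *root;                  /* root block, held elsewhere   */
};

/*
 * Return the block at `level`, or NULL if the level is out of range or the
 * cursor has no buffer attached there. The unchecked version would read
 * cur->bufs[level]->data directly and fault on a NULL slot.
 */
struct toy_block *toy_get_block(struct toy_cursor *cur, int level)
{
    struct toy_buf *bp;

    assert(cur != NULL);

    if (level < 0 || level >= cur->nlevels || level >= TOY_MAXLEVELS)
        return NULL;

    if (level < cur->nlevels - 1) {
        bp = cur->bufs[level];
        if (bp == NULL)
            return NULL;        /* the condition hit in the crash above */
        return bp->data;
    }

    return cur->root;           /* top level: no buffer slot involved */
}
```

A caller would then have to treat a NULL return as a corrupted cursor and bail out, rather than continuing to walk the tree.<BR>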
<BR>
Here is the call trace...<BR>
<BR>
[&lt;ffffffff8034fec0&gt;] xfs_bmbt_increment+0xb0/0x2c0<BR>
[&lt;ffffffff80346c4b&gt;] xfs_bmap_add_extent_unwritten_real+0x5eb/0xd50<BR>
[&lt;ffffffff80349c72&gt;] xfs_bmap_add_extent+0x152/0x480<BR>
[&lt;ffffffff8038f8d2&gt;] kmem_zone_zalloc+0x32/0x50<BR>
[&lt;ffffffff8034cd40&gt;] xfs_bmapi+0xbe0/0x11f0<BR>
[&lt;ffffffff806ea404&gt;] _spin_unlock+0x14/0x40<BR>
[&lt;ffffffff806ea4cd&gt;] _spin_lock+0x1d/0x90<BR>
[&lt;ffffffff80375e63&gt;] xfs_log_reserve+0xa3/0x100<BR>
[&lt;ffffffff806ea975&gt;] _spin_unlock_irq+0x15/0x40<BR>
[&lt;ffffffff806e9fa6&gt;] __down_write_nested+0x96/0xa0<BR>
[&lt;ffffffff80381ec9&gt;] xfs_trans_reserve+0xa9/0x1f0<BR>
[&lt;ffffffff8037274a&gt;] xfs_iomap_write_unwritten+0x14a/0x230<BR>
[&lt;ffffffff8037166e&gt;] xfs_iomap+0x2fe/0x390<BR>
[&lt;ffffffff806ea3c6&gt;] __lock_text_start+0x16/0x40<BR>
[&lt;ffffffff8038fb20&gt;] xfs_end_bio_unwritten+0x0/0x50<BR>
[&lt;ffffffff8038fb51&gt;] xfs_end_bio_unwritten+0x31/0x50<BR>
[&lt;ffffffff802429a3&gt;] run_workqueue+0x73/0x130<BR>
[&lt;ffffffff80242afc&gt;] worker_thread+0x9c/0xf0<BR>
[&lt;ffffffff80246cd0&gt;] autoremove_wake_function+0x0/0x30<BR>
[&lt;ffffffff80246cd0&gt;] autoremove_wake_function+0x0/0x30<BR>
[&lt;ffffffff80242a60&gt;] worker_thread+0x0/0xf0<BR>
[&lt;ffffffff8024660c&gt;] kthread+0x6c/0xa0<BR>
[&lt;ffffffff8020c9a8&gt;] child_rip+0xa/0x12<BR>
[&lt;ffffffff802465a0&gt;] kthread+0x0/0xa0<BR>
[&lt;ffffffff8020c99e&gt;] child_rip+0x0/0x12<BR>
<BR>
Given the trace, I assume that if I avoid all B+tree-managed unwritten extents, I can avoid the crash. However, avoiding such files completely seems a bit unrealistic, as I need to store files with a reasonable number of holes...<BR>
<BR>
Thanks,<BR>
Richard<BR>
</FONT>
</P>
</BODY>
</HTML>