[Top] [All Lists]

Re: Help debugging a use after free

To: Stuart MENEFY <stuart.menefy@xxxxxx>
Subject: Re: Help debugging a use after free
From: Russell Cattelan <cattelan@xxxxxxxxxxx>
Date: Tue, 14 Apr 2009 14:10:04 -0500
Cc: xfs@xxxxxxxxxxx
In-reply-to: <49E37972.1070101@xxxxxx>
References: <49E37972.1070101@xxxxxx>
User-agent: Thunderbird (Macintosh/20070728)
Hash: SHA1

Stuart MENEFY wrote:
> Folks
> For the last few days I've been trying to debug an apparent use
> after free in XFS code. This is in a kernel on an SH4
> processor, so this could easily be an XFS problem which has already
> been solved, or an architecture problem. This started because I've
> been trying to work around the D-cache aliasing problems which the
> SH4 suffers from because of XFS's use of vmap, so its also possible
> this is an obscure aliasing problem. I've also been unable to
> reproduce it on an x86_64, albeit with a slightly later kernel.
> The problem occurs when running a loop which is essentially: mount
> /dev/sda1 /mnt bonnie++  -u root -f -s0 -n16:1024:1024 -b -d /mnt
> umount /mnt on an otherwise clean partition, and usually takes at
> least 24 hours to appear.  The problem only occurs with SLAB
> poisoning enabled, and results in a misaligned access because the
> poison value when treated as a pointer is an illegal misaligned:
> Unaligned kernel access on behalf of "bonnie++" pid=22419
> pc=0x841a6c6c ins=0x8433 Pid : 22419, Comm:             bonnie++ PC
> is at xfs_trans_unlocked_item+0xc/0x60 PC  : 841a6c6c SP  :
> 87f29d3c SR  : 40008000 TEA : 00402ebe    Not tainted R0  :
> 00000000 R1  : 841a6c60 R2  : 87f28000 R3  : 00000001 R4  :
> 6b6b6b6b R5  : 6b6b6b6b R6  : 843342ac R7  : 840139c0 R8  :
> 6b6b6b6b R9  : 6b6b6b6b R10 : 878cb000 R11 : 8452eb8c R12 :
> 00003255 R13 : 00000625 R14 : 87f29d4c MACH: 00000000 MACL:
> 0df75800 GBR : 297f9658 PR  : 84189d20
> Call trace: [<84189d20>] xfs_iunlock+0x60/0xc0 [<841915da>]
> xfs_inode_item_push+0x1a/0x40 [<841a6ff4>]
> xfs_trans_push_ail+0x1b4/0x240 [<841971b6>]
> xlog_grant_push_ail+0xf6/0x180 [<8419961c>]
> xfs_log_reserve+0x3c/0xc0 [<841a540a>] xfs_trans_reserve+0x8a/0x220
>  [<8418d6f0>] xfs_itruncate_finish+0x170/0x420 [<841a7de0>]
> xfs_trans_ihold+0x0/0x20 [<841a7de0>] xfs_trans_ihold+0x0/0x20
> [<841a7e60>] xfs_trans_ijoin+0x0/0xa0 [<841b112a>]
> xfs_inactive+0x3ea/0x500 [<84189da0>] xfs_ilock+0x0/0xe0
> [<841bf75c>] xfs_fs_clear_inode+0x3c/0xa0 [<843379a0>]
> _spin_unlock+0x0/0x60 [<840818da>] clear_inode+0x5a/0x140
> [<84081a92>] generic_delete_inode+0xd2/0x120 [<84081bc6>]
> generic_drop_inode+0xe6/0x1c0 [<84080e2e>] iput+0x4e/0x80
> [<840764ba>] do_unlinkat+0xfa/0x1a0 [<8408fdba>] do_fsync+0x7a/0xe0
>  [<8408fe40>] __do_fsync+0x20/0x60 [<8407656e>] sys_unlink+0xe/0x20
>  [<84076560>] sys_unlink+0x0/0x20 [<840088f8>] syscall_call+0xa/0xe
> (Note that due to the way backtrace is implemented on SH
> architectures there are spurious values in this backtrace, so it is
> probably safe to ignore any entries of the form FN+0x0.)
> The problem is caused by the "xfs_inode_t" which is passed into
> xfs_iunlock() having already been freed at the SLAB level. Putting
> some additional tracing code into the free path, this structure is
> being freed by xfs_inode_item_destroy(), which is called from from
> xfs_idestroy() which is called from xfs_ireclaim().
I guess the first question is which thread has actually free'ed the
inode? it sounds like a race
that seems unlikely unless you are on a multi processor system. I
don't know anything about the SH4 architecture
but I assume this is a single processor system?

If you can instrument actually free that is causing the problem it
would be good to print out the
actually pid doing the free and as many return addresses that you can,
so we can get an
idea of the actual call chain.

> In between the free and use there appears to be some disk activity,
>  as I see an allocate from the SCSI layer, plus a few other kmalloc
>  and kfree's.
> Running with XFS_DEBUG enabled hasn't shown any problems, and still
>  failed in the same way.
I wouldn't expect XFS_DEBUG to point out anything particularly
interesting in this area,
 but it never hurts to try.
> So, does this sound like a problem which has been seen before?
> Alternatively can anyone suggest what would normally prevent this
> happening, and I can look into that (although I know next to
> nothing about the guts of the XFS code, so please don't be afraid
> to state the obvious!). Or any other suggestions welcome.
> Stuart
> _______________________________________________ xfs mailing list
> xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs

Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org


<Prev in Thread] Current Thread [Next in Thread>