[PATCH] xfs: shutdown filesystem if xfs_perag_get fails
Mark Tinguely
tinguely at sgi.com
Mon Apr 22 10:11:39 CDT 2013
On 04/22/13 09:32, Eric Sandeen wrote:
> On 4/22/13 8:45 AM, Mark Tinguely wrote:
>> On 04/21/13 16:55, Eric Sandeen wrote:
>>> On 4/21/13 12:41 PM, Mark Tinguely wrote:
>>>
>>>> This problem happened locally with a bad inode number from xfs
>>>> recovery. xfs_perag_get() can return NULL if given a bad agno.
>>>> Most callers of xfs_perag_get() do not check for a NULL before
>>>> using the pointer. This patch forces a shutdown of the filesystem
>>>> for those callers that do not check the return value rather than
>>>> crashing on a dereferenced NULL pointer.
>>>
>>> Hi Mark -
>>>
>>> I'm curious, what was the callchain when this happened? Was it
>>> during recovery? If so, would aborting recovery be more prudent?
>>>
>>> I might be missing something, but I'm not sure how shutting
>>> down avoids a subsequent null ptr deref& crash.
>>>
>>> i.e. if a caller does something like:
>>>
>>> pag = xfs_perag_get(mp, agno);
>>> spin_lock(&pag->pagb_lock);
>>>
>>> shutting down in xfs_perag_get doesn't save us from a
>>> null pag pointer, would it?
>>>
>>> Thanks,
>>> -Eric
>>
>> You are correct, we have to exit the routine(s) to avoid the dereference. Let the callers handle the error.
>>
>> Sorry for the noise.
>
> No problem, glad I'm useful on the rare occasion. ;)
>
> Can you share the backtrace on the null deref you saw?
>
> Thanks,
> -Eric
>
PID: 7965 TASK: ffff88013566c040 CPU: 0 COMMAND: "mount"
#0 [ffff8801356035d0] machine_kexec at ffffffff810265ce
#1 [ffff880135603620] crash_kexec at ffffffff810a3b5a
#2 [ffff8801356036f0] oops_end at ffffffff81442de8
#3 [ffff880135603710] __bad_area_nosemaphore at ffffffff81032995
#4 [ffff8801356037d0] do_page_fault at ffffffff8144558b
#5 [ffff8801356038d0] page_fault at ffffffff81442065
[exception RIP: _raw_spin_lock+5]
RIP: ffffffff81441985 RSP: ffff880135603980 RFLAGS: 00010206
RAX: 0000000000010000 RBX: ffff880135530340 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000013 RDI: 0000000000000090
RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000003
R10: ffff880134d96d00 R11: ffffffff8101ff60 R12: 0000000000000000
R13: 00000042bb4c2000 R14: 0000000000002000 R15: ffff880135530340
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffff880135603980] _xfs_buf_find at ffffffffa01a7fef [xfs]
#7 [ffff8801356039f0] xfs_buf_get at ffffffffa01a824a [xfs]
#8 [ffff880135603a30] xfs_buf_read at ffffffffa01a83a4 [xfs]
#9 [ffff880135603a60] xlog_recover_inode_pass2 at ffffffffa0193629 [xfs]
#10 [ffff880135603ac0] xlog_recover_commit_trans at ffffffffa0194166 [xfs]
#11 [ffff880135603af0] xlog_recover_process_data at ffffffffa0194351 [xfs]
#12 [ffff880135603b50] xlog_do_recovery_pass at ffffffffa0194804 [xfs]
#13 [ffff880135603c80] xlog_do_log_recovery at ffffffffa0194aa7 [xfs]
#14 [ffff880135603cb0] xlog_do_recover at ffffffffa0194aea [xfs]
#15 [ffff880135603cd0] xlog_recover at ffffffffa019592e [xfs]
#16 [ffff880135603cf0] xfs_log_mount at ffffffffa018d524 [xfs]
#17 [ffff880135603d20] xfs_mountfs at ffffffffa0198a13 [xfs]
#18 [ffff880135603d70] xfs_fs_fill_super at ffffffffa01b0ac9 [xfs]
#19 [ffff880135603db0] mount_bdev at ffffffff811544fe
#20 [ffff880135603e20] mount_fs at ffffffff81153ece
#21 [ffff880135603e60] vfs_kern_mount at ffffffff8116f975
#22 [ffff880135603e90] do_kern_mount at ffffffff8116fa63
#23 [ffff880135603ed0] do_mount at ffffffff81170a7d
#24 [ffff880135603f30] sys_mount at ffffffff81170b80
#25 [ffff880135603f80] system_call_fastpath at ffffffff81449692
RIP: 00007fd46cac51ea RSP: 00007fff910bdde8 RFLAGS: 00010206
RAX: 00000000000000a5 RBX: ffffffff81449692 RCX: ffffffffc0ed0000
RDX: 0000000000626fb0 RSI: 000000000061de90 RDI: 000000000061de70
RBP: 00007fff910be0cc R8: 0000000000000000 R9: 00007fff910bdad0
R10: ffffffffc0ed0000 R11: 0000000000000206 R12: 0000000000626fb0
R13: 0000000000000000 R14: 00000000c0ed0000 R15: 00007fff910be040
ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b
The recovery value is bad and is a problem on its own, but XFS does not
verify the validity of ag number when doing a xfs_perag_get().
--Mark.
More information about the xfs
mailing list