xfs

Re: fw: [PATCH] fix instant oops with tracing enabled

To: Lachlan McIlroy <lachlan@xxxxxxx>
Subject: Re: fw: [PATCH] fix instant oops with tracing enabled
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 15 Oct 2008 11:54:41 +1100
Cc: Christoph Hellwig <hch@xxxxxx>, Mark Goodwin <markgw@xxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <48F546ED.6050702@xxxxxxx>
Mail-followup-to: Lachlan McIlroy <lachlan@xxxxxxx>, Christoph Hellwig <hch@xxxxxx>, Mark Goodwin <markgw@xxxxxxx>, xfs@xxxxxxxxxxx
References: <20081013223932.GE10716@disturbed> <48F3EA6F.9000209@xxxxxxx> <20081014131140.GB17351@xxxxxx> <48F546ED.6050702@xxxxxxx>
User-agent: Mutt/1.5.18 (2008-05-17)
On Wed, Oct 15, 2008 at 11:27:09AM +1000, Lachlan McIlroy wrote:
> Christoph Hellwig wrote:
>> On Tue, Oct 14, 2008 at 10:40:15AM +1000, Mark Goodwin wrote:
>>> Lachlan also saw some regressions after merging these patchsets :
>>> . replace the mount inode list with radix tree traversals
>>> . clean up sync code
>>
>> What exactly?  I saw some soft lockup in 042, but when applying Dave's
>> xfs_sync_inodeS_ag fix (or the half of it applying without the del inodes
>> tracking in the radix tree) it goes away.
>
> I saw this panic but I don't think it's related to the above patches:
>
> [252921.307588] BUG: unable to handle kernel <3>BUG: scheduling while atomic: 
> dd/16976/0xf101da90

Isn't there another line with this output that looks like:

        atomic = 1 in_interrupt = 0

To indicate the "atomic" reason?

> [252921.307908] Modules linked in:
> [252921.307911] Pid: 16976, comm: dd Not tainted 2.6.27-rc8 #183
> [252921.307913] [252921.307913] Call Trace:

[ snip exceedingly deep stack that'll blow a 4k ia32 stack
completely ]

In summary, the stack is:

        write
          balance_dirty_pages
            xfs_iomap_write_allocate
              <enter memory reclaim>
              try_to_free_pages
                xfs_iomap_write_allocate
                   _xfs_trans_commit
                     xlog_write
                       xlog_state_get_iclog_space
                         <sleep>

The question is: why are we running in atomic context at all?
The only place I can see a sleep happening in this function is
the call to sv_wait(), which means the atomic state must have come
from higher up.... Seems very strange.
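
To illustrate the reasoning, here is a minimal user-space sketch of how
a sleep deep in a call chain can trip over atomic state set by a caller.
It is not kernel code: struct task_model and every model_* function are
hypothetical names, and model_sleep() only stands in for a blocking
primitive such as sv_wait(). It assumes the usual rule that sleeping
with a non-zero preempt count is what gets reported as "scheduling
while atomic".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's implementation. */
struct task_model {
    int preempt_count;  /* non-zero means "atomic": sleeping is not allowed */
    int atomic_sleeps;  /* how many sleeps were attempted while atomic */
};

/* Taking a spinlock-like lock enters atomic context; dropping it leaves. */
static void model_lock(struct task_model *t)   { t->preempt_count++; }
static void model_unlock(struct task_model *t) { t->preempt_count--; }

/*
 * Stand-in for a blocking primitive such as sv_wait(). Returns false
 * and records the event if it is reached in atomic context.
 */
static bool model_sleep(struct task_model *t)
{
    if (t->preempt_count != 0) {
        t->atomic_sleeps++;
        return false;
    }
    return true;
}

/* Deep callee: the only place in the chain that sleeps. */
static bool model_get_log_space(struct task_model *t)
{
    return model_sleep(t);
}

/* A caller higher up that is still holding a lock when it calls down. */
static bool model_caller_holding_lock(struct task_model *t)
{
    bool ok;

    model_lock(t);
    ok = model_get_log_space(t);
    model_unlock(t);
    return ok;
}
```

Calling model_get_log_space() directly succeeds, while going through
model_caller_holding_lock() fails even though the sleeping function
itself is unchanged. In this model the fault lies with whoever set the
atomic state further up the chain.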

> I saw sync get stuck in an infinite loop running test 042 - maybe the same
> problem you saw.

Yes, that's the lockup that the later patch I posted fixes.

> I saw the panic in _xfs_itrace_exit() which has now been fixed.
>
> And I also saw this assertion:
>
> <4>[34770.626472] Assertion failed: (index >= 0) && (index < 
> ktp->kt_nentries), file: fs/xfs/support/ktrace.c, line: 173
> <0>[34770.626511] ------------[ cut here ]------------
> <2>[34770.627419] kernel BUG at fs/xfs/support/debug.c:81!

I can't see how that is related to the changes - it's a trace
buffer index overrun. That kind of implies that the ktrace_t
has been corrupted. Memory corruption of some kind?
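
As a rough sketch of why a failed range check points at a damaged
header rather than at the caller: the structure below is a hypothetical,
simplified stand-in and not the real ktrace_t. Only kt_nentries is taken
from the assertion text; kt_index, index_in_range() and next_slot() are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified trace buffer header (not the real ktrace_t). */
struct trace_buf_model {
    int kt_nentries;  /* number of slots in the buffer */
    int kt_index;     /* assumed: monotonically increasing write counter */
};

/* The same condition as the failed assertion, returned instead of asserted. */
static bool index_in_range(const struct trace_buf_model *ktp, int index)
{
    return (index >= 0) && (index < ktp->kt_nentries);
}

/* Map the write counter onto a slot. Assumes kt_nentries > 0. */
static int next_slot(struct trace_buf_model *ktp)
{
    int slot = ktp->kt_index % ktp->kt_nentries;

    ktp->kt_index++;
    return slot;
}
```

With an intact header, next_slot() always yields a value that passes
index_in_range(). The check only fails if kt_nentries shrinks after the
slot was computed, or if the counter itself has been overwritten with
something like a negative value.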

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
