On Wed, Oct 15, 2008 at 11:27:09AM +1000, Lachlan McIlroy wrote:
> Christoph Hellwig wrote:
>> On Tue, Oct 14, 2008 at 10:40:15AM +1000, Mark Goodwin wrote:
>>> Lachlan also saw some regressions after merging these patchsets :
>>> . replace the mount inode list with radix tree traversals
>>> . clean up sync code
>>
>> What exactly? I saw some soft lockup in 042, but when applying Dave's
>> xfs_sync_inodes_ag fix (or the half of it applying without the del inodes
>> tracking in the radix tree) it goes away.
>
> I saw this panic but I don't think it's related to the above patches:
>
> [252921.307588] BUG: unable to handle kernel <3>BUG: scheduling while atomic:
> dd/16976/0xf101da90
Isn't there another line with this output that looks like:
atomic = 1 in_interrupt = 0
To indicate the "atomic" reason?
> [252921.307908] Modules linked in:
> [252921.307911] Pid: 16976, comm: dd Not tainted 2.6.27-rc8 #183
> [252921.307913] [252921.307913] Call Trace:
[ snip exceedingly deep stack that'll blow a 4k ia32 stack
completely ]
In summary, the stack is:

    write
    balance_dirty_pages
    xfs_iomap_write_allocate
    <enter memory reclaim>
    try_to_free_pages
    xfs_iomap_write_allocate
    _xfs_trans_commit
    xlog_write
    xlog_state_get_iclog_space
    <sleep>

The question is: why are we running in atomic context?
The only place I can see a sleep happening in xlog_state_get_iclog_space()
is the call to sv_wait(), which means the atomic state must have come
from higher up.... Seems very strange.
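
To illustrate the "must have come from higher up" reasoning, here is a toy
userspace model, not kernel code. Every name in it (task_model,
model_enter_atomic, model_might_sleep and so on) is made up for the sketch.
It only shows that a sleep check fires in the leaf function even when the
atomic state was set by a caller further up the stack:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not kernel APIs. */
struct task_model {
    int atomic_depth;        /* > 0 means "atomic": sleeping is not allowed */
    int sleep_while_atomic;  /* number of illegal sleep attempts recorded */
};

static void model_enter_atomic(struct task_model *t) { t->atomic_depth++; }
static void model_exit_atomic(struct task_model *t)  { t->atomic_depth--; }

/* Called wherever the code may block; records a violation if atomic. */
static bool model_might_sleep(struct task_model *t)
{
    if (t->atomic_depth > 0) {
        t->sleep_while_atomic++;
        return false;
    }
    return true;
}

/* Leaf that sleeps but never touches atomic_depth itself. */
static bool model_leaf_wait(struct task_model *t)
{
    return model_might_sleep(t);
}

/* Correct caller: leaves atomic state before calling the sleeping leaf. */
static bool model_good_caller(struct task_model *t)
{
    model_enter_atomic(t);
    model_exit_atomic(t);
    return model_leaf_wait(t);
}

/* Buggy caller: still atomic when the leaf tries to sleep. */
static bool model_buggy_caller(struct task_model *t)
{
    bool ok;

    model_enter_atomic(t);
    ok = model_leaf_wait(t);  /* reported in the leaf, caused here */
    model_exit_atomic(t);
    return ok;
}
```

In this model the violation is counted inside model_leaf_wait(), yet the
only way to fix it is in model_buggy_caller(), which is the same shape as
the trace above.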
> I saw sync get stuck in an infinite loop running test 042 - maybe the same
> problem you saw.
Yes, that's the lockup that the later patch I posted fixes.
> I saw the panic in _xfs_itrace_exit() which has now been fixed.
>
> And I also saw this assertion:
>
> <4>[34770.626472] Assertion failed: (index >= 0) && (index <
> ktp->kt_nentries), file: fs/xfs/support/ktrace.c, line: 173
> <0>[34770.626511] ------------[ cut here ]------------
> <2>[34770.627419] kernel BUG at fs/xfs/support/debug.c:81!
I can't see how that is related to the changes - it's a trace
buffer index overrun. That kind of implies that the ktrace_t
has been corrupted. Memory corruption of some kind?
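
A minimal sketch of why an out-of-range index points at corruption, assuming
a ring buffer that only ever advances its index with a wrap. The struct and
helpers here (trace_buf, trace_index_ok, trace_next_index) are hypothetical
stand-ins, not the real ktrace_t or its code; only the shape of the check is
taken from the assertion quoted above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a trace buffer header; not the real ktrace_t. */
struct trace_buf {
    int nentries;  /* capacity of the ring */
    int index;     /* next slot to use; valid range is [0, nentries) */
};

/* Same shape as the failed assertion: (index >= 0) && (index < nentries). */
static bool trace_index_ok(const struct trace_buf *tb, int index)
{
    return index >= 0 && index < tb->nentries;
}

/* Return the current slot and advance the index, wrapping at the end. */
static int trace_next_index(struct trace_buf *tb)
{
    int cur = tb->index;

    tb->index = (cur + 1) % tb->nentries;
    return cur;
}
```

Under that assumption, normal operation keeps the index in range no matter
how many entries are logged, so the check can only fail if nentries or index
were overwritten by something else.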
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx