Re: Crashes in various ext2 functions while running xfstest/check

To: Steve Lord <lord@xxxxxxx>
Subject: Re: Crashes in various ext2 functions while running xfstest/check
From: Chris Pascoe <c.pascoe@xxxxxxxxxxxxxx>
Date: Mon, 4 Jun 2001 19:10:02 +1000 (EST)
Cc: <linux-xfs@xxxxxxxxxxx>
In-reply-to: <200105241434.f4OEY6214966@xxxxxxxxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx

Hi Steve,

Further to my last emails on this, I think I've tracked down why the
crashes occur, but I don't know how to fix it.  I eliminated the SCSI
hardware, Ethernet card, etc. that Seth Mos suggested might be the problem
(I got loans of completely different hardware).  I can reliably crash my
test machine in under an hour by running test 013 in a loop and letting
the "/etc/cron.hourly/sysstat" cron job run.  Running some other random
commands during the process helps speed the crash up.

The crashes I see are related to the machine having highmem support, and
buffers allocated with pages in high memory making their way onto the
(fs/buffer.c) free_list.  I added an extra field to struct buffer_head
that records in the buffer head who created it (in create_empty_buffers),
and what function called put_last_free.  In every instance, the
buffer_head that causes the crash was created by
hook_buffers_to_page_delay, and put onto the free list later by a call to
__invalidate_buffers.  (I have since added code to record in the bh who
called that; when it next crashed, the caller was blkdev_put, but I'll
run a few more tests.)
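
To illustrate the kind of instrumentation described above, here is a minimal
user-space sketch in C.  It is not the kernel patch itself: struct tracked_bh,
track_create, track_put_free and track_find_highmem_on_free_list are all
hypothetical names, and the free list is modelled as a simple singly linked
list.  It only shows the idea of tagging each bh with its creator and with
whoever put it on the free list, then looking for highmem-backed entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for the kernel's struct buffer_head. */
struct tracked_bh {
    int b_in_highmem;               /* 1 if the backing page is in high memory */
    const char *b_creator;          /* added field: who created the bh */
    const char *b_freed_by;         /* added field: who put it on the free list */
    struct tracked_bh *b_next_free; /* link in the modelled free list */
};

/* Modelled free list; insertion order does not matter for this sketch. */
struct tracked_bh *track_free_list = NULL;

/* Record the creator at the point where the bh is set up. */
void track_create(struct tracked_bh *bh, int in_highmem, const char *creator)
{
    assert(bh != NULL);
    memset(bh, 0, sizeof(*bh));
    bh->b_in_highmem = in_highmem;
    bh->b_creator = creator;
}

/* Record the caller at the point where the bh goes onto the free list. */
void track_put_free(struct tracked_bh *bh, const char *caller)
{
    assert(bh != NULL);
    bh->b_freed_by = caller;
    bh->b_next_free = track_free_list;
    track_free_list = bh;
}

/* Return the first free-list entry backed by a highmem page, or NULL. */
struct tracked_bh *track_find_highmem_on_free_list(void)
{
    struct tracked_bh *bh;

    for (bh = track_free_list; bh != NULL; bh = bh->b_next_free)
        if (bh->b_in_highmem)
            return bh;
    return NULL;
}
```

With tags like these, the offending bh can be inspected at crash time to see
both where it came from and which path freed it.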

When one of these bh's with bh->b_page in high memory is given to ext2 by
getblk and a "bread" is performed, bh->b_data gets set to a value < PAGE_SIZE
by a call to set_bh_page.  This is why it looked like the bh's were
corrupted in my previous backtraces.  The actual disk I/O performed on
these pages proceeds okay though, as ll_rw_blk() calls create_bounce for
the real disk I/O (which is why the dereferences you saw came after a
successful call to bread).
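
The b_data behaviour can be sketched with a small user-space model in C.
This is not the kernel source: struct model_page, struct model_bh,
model_set_bh_page and model_b_data_is_bogus are hypothetical names, and the
sketch assumes (consistent with the values < PAGE_SIZE observed above) that
for a highmem page only the offset within the page is stored in b_data,
because such a page has no permanent kernel virtual address.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumed page size for the model */

/* Hypothetical stand-in for struct page. */
struct model_page {
    int highmem;  /* 1 if the page has no permanent kernel mapping */
    char *virt;   /* virtual address; only meaningful for lowmem pages */
};

/* Hypothetical stand-in for the relevant part of struct buffer_head. */
struct model_bh {
    struct model_page *b_page;
    char *b_data;
};

/* Modelled after the behaviour described in the text, not copied from it. */
void model_set_bh_page(struct model_bh *bh, struct model_page *page,
                       unsigned long offset)
{
    assert(offset < MODEL_PAGE_SIZE);
    bh->b_page = page;
    if (page->highmem)
        /* No usable address: b_data ends up holding just the offset. */
        bh->b_data = (char *)(uintptr_t)offset;
    else
        bh->b_data = page->virt + offset;
}

/* A b_data below the page size cannot be a valid pointer to buffer data. */
int model_b_data_is_bogus(const struct model_bh *bh)
{
    return (uintptr_t)bh->b_data < MODEL_PAGE_SIZE;
}
```

Code that dereferences b_data directly, as a filesystem expecting lowmem
buffers would, then faults on what looks like a corrupted pointer.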

I can seemingly (no crashes after a weekend of repeats) make the crashes
go away by replacing GFP_HIGHUSER with GFP_USER in both clean_inode
(fs/inode.c) and _pagebuf_lookup_pages (fs/pagebuf/page_buf.c).
Changing only one of them doesn't make any difference.

Hope that this makes some sense to you, and you can just say aha, and wave
the magic wand :).  I hope you can replicate it locally with this
information.

Regards,
Chris

