
Re: xfs recovery oops in 2.6.4-mm1

To: William Lee Irwin III <wli@xxxxxxxxxxxxxx>, Steve Lord <lord@xxxxxxx>, tes@xxxxxxx
Subject: Re: xfs recovery oops in 2.6.4-mm1
From: Nathan Scott <nathans@xxxxxxx>
Date: Mon, 15 Mar 2004 10:30:54 +1100
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <4052834B.5010208@xxxxxxx>
References: <20040312100025.GP655@xxxxxxxxxxxxxx> <4051D517.4070005@xxxxxxx> <20040312233153.GT655@xxxxxxxxxxxxxx> <4052834B.5010208@xxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.3i
Hi guys,
On Fri, Mar 12, 2004 at 09:19:51AM -0600, Steve Lord wrote:
> Have you successfully used xfs on this box with older kernels, or is
> this a new filesystem? Was this the first mount under 2.6.4-mm1?

This would be useful to know.

On Fri, Mar 12, 2004 at 09:43:07PM -0600, Steve Lord wrote:
> > On Fri, Mar 12, 2004 at 09:19:51AM -0600, Steve Lord wrote:
> > 
> >>I see this is a sparc, any chance you could provide disassembly of the
> >>xfs_next_bit function. I wonder if it is playing up on this processor,
> >>it makes use of ffs and we have had some architecture issues with it
> >>before.
> Well, it's a little weird, looks like something is broken compilation
> wise in the recovery code, or we trampled on ourselves.
> We passed a null pointer in xfs_next_bit which is why it blew up,
> the caller does have a way of doing this:
>          unsigned int            *data_map = NULL;
>          unsigned int            map_size = 0;
>          switch (buf_f->blf_type) {
>          case XFS_LI_BUF:
>                  data_map = buf_f->blf_data_map;
>                  map_size = buf_f->blf_map_size;
>                  break;
>          case XFS_LI_6_1_BUF:
>          case XFS_LI_5_3_BUF:
>                  obuf_f = (xfs_buf_log_format_v1_t*)buf_f;
>                  data_map = obuf_f->blf_data_map;
>                  map_size = obuf_f->blf_map_size;
>                  break;
>          }
> ......
>       bit = xfs_next_bit(data_map, map_size, bit);
> We definitely did not hit any of the expected cases in the switch
> statement and we passed null into xfs_next_bit().

Hmmm.  Should be impossible, given the caller, which does
suggest a compiler issue, I agree.

> However, the only caller of this function does a check of the same
> data structure and will fail recovery if it does not get one of the
> recognized flags.
> So unless some of the code in between the two stamped on the memory
> in the mean time it looks like a compilation bug.

There's not much code in between for something to go wrong, and
nothing I can see that explicitly futzes with buf_f where the
null pointer would be originating.

> Is this repeatable? We could probably come up with some extra checks in
> here to narrow it down. It might also be worth doing an
>       xfs_logprint -t -i -b /dev/xxx
> if this happens again and sending the output. This would at least
> determine if what was on the disk was correct.
> Nathan, any other ideas?

Yeah, above dump would be useful; ideally we need to find a
reproducible test case too - this might help there...
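
As a rough illustration of the kind of extra check Steve mentions, here
is a minimal standalone sketch of a bitmap scan that refuses a null map
instead of dereferencing it.  This is not the kernel's xfs_next_bit();
the name next_bit_checked, the -1 return convention, and the assumption
that the size is counted in unsigned int words are all hypothetical and
chosen only for the example.

```c
#include <stddef.h>
#include <limits.h>

/* Number of bits in one bitmap word (assumed to be an unsigned int). */
#define NBWORD (sizeof(unsigned int) * CHAR_BIT)

/*
 * Hypothetical stand-in for a next-set-bit scan.
 * map:       bitmap, an array of `size` words (may be NULL)
 * start_bit: first bit index to examine
 * Returns the index of the first set bit at or after start_bit,
 * or -1 if there is none or if the map is NULL or empty.
 */
static int next_bit_checked(const unsigned int *map, unsigned int size,
                            unsigned int start_bit)
{
    /* The extra check: a NULL map means the caller never set it up. */
    if (map == NULL || size == 0)
        return -1;

    for (size_t bit = start_bit; bit < (size_t)size * NBWORD; bit++) {
        if (map[bit / NBWORD] & (1u << (bit % NBWORD)))
            return (int)bit;
    }
    return -1;
}
```

With a check like this, a caller that fell through the switch with
data_map still NULL would get a clean "no more bits" result (or could
log the unexpected blf_type) rather than an oops, which might help
narrow down where the bad value comes from.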

On Fri, Mar 12, 2004 at 07:48:14PM -0800, William Lee Irwin III wrote:
> On Fri, Mar 12, 2004 at 09:43:07PM -0600, Steve Lord wrote:
> I have to brew up some way to corrupt the fs and get it to try to
> recover via mounting it to reproduce, which doesn't happen for ordinary
> pulling the plug -type of affairs. It did recur 3 times once I
> got it corrupted properly, so it's mostly a question of doing that. I
> think it's related to heavy io going on when the plug is pulled.

Attached is a test program that's been used to exercise log
recovery in a more user-friendly fashion.  It uses the xfs
forced-shutdown mode to get a dirty log without having to
pull the plug.  So, generate traffic, shutdown, unmount, &
then the next mount will do log recovery.  If you find the
right traffic to generate a reproducible failure, diagnosing
this becomes a whole lot easier.  There are other tools for
generating all manner of different types of traffic in the
xfstests directory in the xfs userspace cvs too - fsstress
is a good one for generating lots of metadata operations.



-- Binary/unsupported file stripped by Ecartis --
-- Type: text/x-csrc
