
Re: xfs oops (CVS-2004-05-15_05:00_UTC)

To: linux-xfs@xxxxxxxxxxx
Subject: Re: xfs oops (CVS-2004-05-15_05:00_UTC)
From: Nathan Scott <nathans@xxxxxxx>
Date: Wed, 23 Jun 2004 21:05:47 +1000
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.2.5i

Bounced, resending.

cheers.

----- Forwarded message from Mail Delivery Subsystem 
<MAILER-DAEMON@xxxxxxxxxxxx> -----

Date: Wed, 23 Jun 2004 04:12:31 -0700
To: <nathans@xxxxxxxxxxxxxxxxxxxxxxxx>
From: Mail Delivery Subsystem <MAILER-DAEMON@xxxxxxxxxxxx>
Subject: Returned mail: see transcript for details

The original message was received at Wed, 23 Jun 2004 03:56:18 -0700
from fddi-nodin.corp.sgi.com [198.29.75.193]

   ----- The following addresses had permanent fatal errors -----
<linux-xfs@xxxxxxxxxxx>
    (reason: 554 5.4.6 Too many hops)

   ----- Transcript of session follows -----
554 5.4.6 Too many hops 18 (17 max): from <nathans@xxxxxxxxxxxxxxxxxxxxxxxx> 
via localhost, to <linux-xfs@xxxxxxxxxxx>

Reporting-MTA: dns; omx2.sgi.com
Arrival-Date: Wed, 23 Jun 2004 03:56:18 -0700

Final-Recipient: RFC822; linux-xfs@xxxxxxxxxxx
Action: failed
Status: 5.4.6
Diagnostic-Code: SMTP; 554 5.4.6 Too many hops
Last-Attempt-Date: Wed, 23 Jun 2004 04:12:31 -0700

Date: Wed, 23 Jun 2004 20:36:39 +1000
To: Krzysztof Rusocki <kszysiu@xxxxxxxxxxxxxxxxxxxx>
Cc: linux-xfs@xxxxxxxxxxx
User-Agent: Mutt/1.2.5i
From: Nathan Scott <nathans@xxxxxxx>
Subject: Re: xfs oops (CVS-2004-05-15_05:00_UTC)

Hi there,

On Wed, Jun 23, 2004 at 10:49:22AM +0200, Krzysztof Rusocki wrote:
> On Tue, Jun 22, 2004 at 06:29:06PM +1000, Nathan Scott wrote:
> > 
> > Actually, since this looks a bit like an IO completion is
> > happening after we've freed a buffer, could you see if the
> > BUG_ONs in this patch are hit on your machine?  If so, the
> > kdb backtrace there will be much closer to the fault and
> > might give me enough information to figure out what's going
> > on here.
> 
> Hi,
> 
> Here's a preliminary report:

Thanks!

> 2) I browsed the xfs_buf.c changesets and reverted (along with the recent
> undelay fix which, as far as I can see, did not reach the linux-2.5 bk tree)

Not yet, no.  It's sitting in Linus' merge queue though.

> the following two:
> 
> - 1.1722.10.100 [XFS] Don't leak locked pages on readahead failure
> - 1.1587.5.6 [XFS] close external blockdevice after final flush
> 
> with no effect (still crashing), actually. Currently I'm running a kernel with
> - 1.1371.750.18 [XFS] cleanup pagebuf flag usage and simplify pagebuf_free.
> reverted as well. No conclusions on this one yet, though.
> 
> I don't know why, but I gave up on the binary chopping idea; 2.6.4 had been
> working fine, so I think there are not too many changes to consider...
> 
> What changesets would you suggest to focus on if problem persists?
> 

Firstly, I guess it would be good to verify that 2.6.4 really
does work.  Then, I would focus on all (and there shouldn't be
_too_ many) mods that modified xfs_buf.c -- a bk revtool on
fs/xfs/linux-2.6/xfs_buf.c will give the list going back to
2.6.4 (and further if needed).
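
(If it helps, something along these lines should bring up that history --
just an illustration, run from the top of the XFS bk tree:

    bk revtool fs/xfs/linux-2.6/xfs_buf.c

revtool graphs the changesets touching that file, so you can walk back
past the 2.6.4 point and note the candidates.)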

> 3) xfs_repair (2.6.3 cmds) on the rootfs of one of the failing machines
> *constantly* seems to show odd (for me at least) things - I do not know
> whether they indicate any trouble; however, I believe they shouldn't be there...
> 
>         - ...
>         - agno = 6
> LEAFN node level is 1 inode 25165984 bno = 8388608
>         - agno = 7
>         - ...
> LEAFN node level is 1 inode 25165984 bno = 8388608
>         - agno = 7

These two I've not come across before; from a read through
the code they seem harmless (the function in repair that's
printing this treats it as OK, reports success and
carries on).  Could you send me an xfs_db dump of that
directory inode (25165984) anyway, though?  Thanks.
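
(Something along these lines should capture it -- the device below is just
a placeholder, so substitute your real root device; -r keeps xfs_db
read-only:

    # xfs_db -r /dev/hda1
    xfs_db> inode 25165984
    xfs_db> print

The "print" output for that inode is what I'm after.)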

> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - ensuring existence of lost+found directory
>         - traversing filesystem starting at / ... 
> rebuilding directory inode 128
>         - traversal finished ... 

... and this is just the good ol' lost+found entry being
recreated each time - also harmless.

cheers.

-- 
Nathan


----- End forwarded message -----

-- 
Nathan

