Bounced, resending.
cheers.
----- Forwarded message from Mail Delivery Subsystem <MAILER-DAEMON@xxxxxxxxxxxx> -----
Date: Wed, 23 Jun 2004 04:12:31 -0700
To: <nathans@xxxxxxxxxxxxxxxxxxxxxxxx>
From: Mail Delivery Subsystem <MAILER-DAEMON@xxxxxxxxxxxx>
Subject: Returned mail: see transcript for details
The original message was received at Wed, 23 Jun 2004 03:56:18 -0700
from fddi-nodin.corp.sgi.com [198.29.75.193]
----- The following addresses had permanent fatal errors -----
<linux-xfs@xxxxxxxxxxx>
(reason: 554 5.4.6 Too many hops)
----- Transcript of session follows -----
554 5.4.6 Too many hops 18 (17 max): from <nathans@xxxxxxxxxxxxxxxxxxxxxxxx>
via localhost, to <linux-xfs@xxxxxxxxxxx>
Reporting-MTA: dns; omx2.sgi.com
Arrival-Date: Wed, 23 Jun 2004 03:56:18 -0700
Final-Recipient: RFC822; linux-xfs@xxxxxxxxxxx
Action: failed
Status: 5.4.6
Diagnostic-Code: SMTP; 554 5.4.6 Too many hops
Last-Attempt-Date: Wed, 23 Jun 2004 04:12:31 -0700
Date: Wed, 23 Jun 2004 20:36:39 +1000
To: Krzysztof Rusocki <kszysiu@xxxxxxxxxxxxxxxxxxxx>
Cc: linux-xfs@xxxxxxxxxxx
User-Agent: Mutt/1.2.5i
From: Nathan Scott <nathans@xxxxxxx>
Subject: Re: xfs oops (CVS-2004-05-15_05:00_UTC)
Hi there,
On Wed, Jun 23, 2004 at 10:49:22AM +0200, Krzysztof Rusocki wrote:
> On Tue, Jun 22, 2004 at 06:29:06PM +1000, Nathan Scott wrote:
> >
> > Actually, since this looks a bit like an IO completion is
> > happening after we've freed a buffer, could you see if the
> > BUG_ONs in this patch are hit on your machine? If so, the
> > kdb backtrace there will be much closer to the fault and
> > might give me enough information to figure out what's going
> > on here.
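(For the record, since the patch itself isn't reproduced in this thread:
the BUG_ONs boil down to something like the sketch below.  This is an
illustration only -- the PB_FREED flag is made up and the function
bodies/signatures are stripped right down -- it is not the actual patch.)

	/*
	 * Rough sketch of the idea, not the real debug patch: mark the
	 * pagebuf as it is torn down, and trip a BUG_ON if the I/O
	 * completion path ever runs against a buffer we already freed.
	 */
	#define PB_FREED	0x80000000	/* made-up "already freed" flag */

	void pagebuf_free(page_buf_t *pb)
	{
		BUG_ON(pb->pb_flags & PB_FREED);	/* freed twice? */
		pb->pb_flags |= PB_FREED;
		/* ... normal pagebuf teardown ... */
	}

	void pagebuf_iodone(page_buf_t *pb)
	{
		BUG_ON(pb->pb_flags & PB_FREED);	/* completion after free? */
		/* ... normal I/O completion handling ... */
	}

If either BUG_ON fires, the kdb backtrace at that point should land much
closer to the culprit than the later oops does.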
>
> Hi,
>
> Here's a preliminary report:
Thanks!
> 2) I browsed the xfs_buf.c changesets and reverted (along with the recent
> undelay fix which, as far as I can see, did not reach the linux-2.5 bk tree)
Not yet, no.  It's sitting in Linus' merge queue though.
> the following two:
>
> - 1.1722.10.100 [XFS] Don't leak locked pages on readahead failure
> - 1.1587.5.6 [XFS] close external blockdevice after final flush
>
> with no effect (still crashing), actually. Currently I'm running a kernel with
> - 1.1371.750.18 [XFS] cleanup pagebuf flag usage and simplify pagebuf_free.
> reverted as well. No conclusions on this one yet, though.
>
> I don't know why, but I gave up on the binary chopping idea; 2.6.4 had been working
> fine, so I think there are not too many changes to consider...
>
> Which changesets would you suggest focusing on if the problem persists?
>
Firstly, I guess it would be good to verify that 2.6.4 really
does work. Then, I would focus on all (and there shouldn't be
_too_ many) changesets that touched xfs_buf.c -- a bk revtool on
fs/xfs/linux-2.6/xfs_buf.c will give the list going back to
2.6.4 (and further if needed).
> 3) xfs_repair (2.6.3 cmds) on the rootfs of one of the failing machines *constantly*
> seems to show odd (to me at least) things - I do not know whether
> they indicate some trouble; however, I believe they shouldn't be there...
>
> - ...
> - agno = 6
> LEAFN node level is 1 inode 25165984 bno = 8388608
> - agno = 7
> - ...
> LEAFN node level is 1 inode 25165984 bno = 8388608
> - agno = 7
These two I've not come across before; from a read through
the code they seem harmless (the function in repair that's
printing this suggests this is OK, reports success and
carries on).  Could you send me an xfs_db dump of that
directory inode (25165984) anyway, though?  Thanks.
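Something like the following would capture it (an example invocation only --
/dev/sda1 is just a placeholder here, substitute whatever device that
root filesystem actually lives on):

	# open the device read-only, select inode 25165984, and print it
	xfs_db -r -c "inode 25165984" -c "print" /dev/sda1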
> Phase 5 - rebuild AG headers and trees...
> - reset superblock...
> Phase 6 - check inode connectivity...
> - resetting contents of realtime bitmap and summary inodes
> - ensuring existence of lost+found directory
> - traversing filesystem starting at / ...
> rebuilding directory inode 128
> - traversal finished ...
... and this is just the good ol' lost+found entry being
recreated each time - also harmless.
cheers.
--
Nathan
----- End forwarded message -----
--
Nathan