
Re: XFS _apparent_ corruption: "DATA POINT" (worked around); 2.6.15.4-bi

To: Linda Walsh <xfs@xxxxxxxxx>
Subject: Re: XFS _apparent_ corruption: "DATA POINT" (worked around); 2.6.15.4-biglowmem
From: Timothy Shimmin <tes@xxxxxxx>
Date: Mon, 06 Mar 2006 17:35:01 +1100
Cc: Linux-Xfs <linux-xfs@xxxxxxxxxxx>
In-reply-to: <440B68D7.8060106@xxxxxxxxx>
References: <440B68D7.8060106@xxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Thunderbird 1.5 (X11/20051201)
Hi Linda,

Thanks for the report. I just wanted to comment on some log stuff.

Basically, it would be more useful to use the "-t" option to logprint.
More details (probably unwanted ;-) are mentioned below.

Linda Walsh wrote:
Running 2.6.15.4 with the "biglowmem" patch (to allow using last 128M of 1G
address space w/o calling it HIGHMEM), and using a 3+1G memory split.

System has been _stable_: uptime was 20days+11:04.

I tried doing an 'ls' of a directory and my system hung -- no panic, no message.
Had been doing compiles/tests on same disk w/no problems (~26G used, 94G total,
68G free).

* Rebooted, went back to same dir -- hung again.
* Rebooted, unmounted partition
  >  xfs_check claimed a journal needed to play.
  *  Remounted partition -- no problem; unmount;
  >  xfs_check -- claimed journal present
  >  xfs_repair -- claimed journal present
 *>  remount & unmount; xfs_repair still sees journal;
* xfs_logprint gave:
----
ls ->hang
xfs_logprint:
xfs_logprint: /dev/hde1 contains a mounted and writable filesystem
    data device: 0x2101
    log device: 0x2101 daddr: 100663328 length: 95392

Header 0x3ef wanted 0xfeedbabe
**********************************************************************
* ERROR: header cycle=1007        block=38747                        *
**********************************************************************
Bad log record header
--------
* Decide to delete bad log: run xfs_repair -L /dev/hde1 :
   runs completely through: NO ERRORS;

The "-t" option looks at the log from a recovery point of view: it starts
at the tail of the log and works up to the head, so it shows the
outstanding transactions.

Without the "-t" option, logprint prints in "operational" mode: it starts
at the beginning of the log and expects to find a log record header there.
If the log has wrapped (which will generally be the case), then it's quite
possible not to see a log record header at the start, because we do
wrapping at a lower level than this; there will instead be an operation
header.
(Sometimes I wonder if the default for logprint would be better off as -t.)

So, if you had used "-t", you probably would have got output, and it would
have shown where the head and tail were.
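
To illustrate the tail-to-head walk, here is a minimal Python sketch of a circular log. It is a toy model, not the logprint code: the function name and the tail/head values are made up for illustration, and only the log length (95392 blocks) comes from the output quoted above.

```python
def blocks_tail_to_head(tail, head, log_len):
    """Yield block numbers from tail up to (not including) head.

    Toy model of a circular log: block numbers wrap to 0 after log_len - 1.
    """
    blk = tail
    while blk != head:
        yield blk
        blk = (blk + 1) % log_len


LOG_LEN = 95392  # log length in blocks, from the logprint output above

# Hypothetical wrapped log: the tail sits near the end, the head near the start.
wrapped = list(blocks_tail_to_head(tail=95390, head=2, log_len=LOG_LEN))
print(wrapped)  # [95390, 95391, 0, 1]
```

In the wrapped case block 0 falls in the middle of the walk, so a reader that starts at block 0 cannot count on finding a record header there.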

Something is still wrong with the log, of course, if check and repair say
the log is dirty after a clean unmount.
That shouldn't happen, and it would be interesting to see the log.

The log can be saved with the "-C filename" option, which is useful for
looking at it later, although the log could be up to 128MiB in size (more
awkward to send). (In your case it's 95,392 BBs, i.e. 512-byte basic
blocks, which is about 47MiB.)
Usually the last thing written to the log is an unmount record, and it
would be interesting to see where that unmount record has gone.
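
The size figure can be checked with a couple of lines of Python. This is only the arithmetic, assuming a basic block (BB) is 512 bytes; the helper name is made up for illustration.

```python
BB_SIZE = 512        # bytes per basic block (BB)
MIB = 1024 * 1024    # bytes per MiB


def bbs_to_mib(bbs):
    """Convert a count of 512-byte basic blocks to MiB."""
    return bbs * BB_SIZE / MIB


print(bbs_to_mib(95392))      # 46.578125, i.e. roughly 47MiB
print(128 * MIB // BB_SIZE)   # 262144 BBs in a 128MiB log
```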

Without the "-t" option, when looking at the lower-level view (which will
show the unmount record), I've found it useful to use the "-s startblk"
option to start at a valid log record. To find out where the log records
are, one can use the "-d" option.
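
A saved copy of the log could also be scanned directly for record headers. The sketch below is not how logprint does it. It assumes the file written by "-C" is a raw image of the log made of 512-byte basic blocks, and that a record header begins with the 32-bit big-endian magic 0xfeedbabe followed by a 32-bit cycle number. The magic value is taken from the error output quoted above; the layout, the function name and the synthetic test data are assumptions for illustration.

```python
import struct

BB_SIZE = 512               # assumed basic block size in bytes
HEADER_MAGIC = 0xFEEDBABE   # magic reported by logprint ("wanted 0xfeedbabe")


def find_record_headers(data):
    """Return (block_number, cycle) for each block that starts with the magic.

    Assumes a big-endian 32-bit magic at offset 0 and the cycle at offset 4.
    """
    hits = []
    for off in range(0, len(data) - BB_SIZE + 1, BB_SIZE):
        magic, cycle = struct.unpack_from(">II", data, off)
        if magic == HEADER_MAGIC:
            hits.append((off // BB_SIZE, cycle))
    return hits


# Synthetic three-block image: only block 1 starts with the magic.
plain = struct.pack(">I", 0x3EF).ljust(BB_SIZE, b"\0")   # first word 0x3ef
header = struct.pack(">II", HEADER_MAGIC, 1007).ljust(BB_SIZE, b"\0")
print(find_record_headers(plain + header + plain))  # [(1, 1007)]
```

On a real copy one would read the file in binary mode and pass the bytes in; any block numbers found this way are only candidates for "-s startblk". Note that 0x3ef is 1007 in decimal, the same number as the cycle in the error message above.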

--Tim

