On Mon, Apr 18, 2005 at 05:27:50PM +1000, Nathan Scott wrote:
> On Sun, Apr 17, 2005 at 09:57:58PM -0500, KrishnaPradeep Tamma wrote:
> > Hi,
> > Does XFS write an explicit CHECK_POINT block in log or does it go
> > through the whole log during recovery?
> Hmm, neither of those really - it uses a binary chop type of algorithm
> to find the head and tail, and replays just that section of the log.
Elaborating a little more as I understand it...
Every 512-byte block of the ondisk log has a cycle# stored in its first word,
or in its 2nd word if the block is a log record header. Each time the log wraps
around as records are written to it, the current cycle# is incremented.
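
To make the cycle# stamping concrete, here is a toy sketch in Python (not the
kernel code). It assumes a 10-block log, keeps only the cycle# of each block,
and starts counting cycles at 1; write_blocks() and the variable names are made
up for illustration.

```python
def write_blocks(log, head, cycle, count):
    """Stamp `count` blocks, starting at `head`, with the current cycle#.

    `log` holds one cycle# per block (a stand-in for the word stored in
    each 512-byte block). Returns the new (head, cycle).
    """
    for _ in range(count):
        log[head] = cycle
        head += 1
        if head == len(log):   # ran off the physical end of the log
            head = 0
            cycle += 1         # every wrap bumps the cycle#
    return head, cycle


log = [0] * 10                 # 0 = never written (toy convention only)
head, cycle = write_blocks(log, 0, 1, 35)
print(log, head, cycle)
```

After 35 block writes the log has wrapped three times, so the first five blocks
carry cycle# 4 and the rest still carry cycle# 3, with the head at block 5.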
Log records are added/written to the head of the ondisk log each time
internal log buffers get full (or are forced out) which happens as we
initiate more metadata operations.
As metadata is actually written to disk (the non-log part),
the tail of the log can move forward.
On recovery we then want to go through all the outstanding metadata ops
which never made it to disk, so we need to replay the log from the tail to the head.
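
Since the log is circular, "from the tail to the head" may cross the physical
end of the log. A rough sketch of which blocks that covers, assuming block
numbers 0..log_size-1 and treating tail == head as "nothing to replay" (a toy
convention; blocks_to_replay() is a made-up helper, not an XFS function):

```python
def blocks_to_replay(tail, head, log_size):
    """Block numbers from tail up to (not including) head, in replay order."""
    count = (head - tail) % log_size      # handles the wrapped case too
    return [(tail + i) % log_size for i in range(count)]


print(blocks_to_replay(2, 5, 10))   # tail behind head, no wrap
print(blocks_to_replay(7, 3, 10))   # region wraps past the end of the log
```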
To find the head of the ondisk log, recovery looks at the cycle# of each block.
For instance if we had cycle#s of 4,4,4,4,4,3,3,3,3,3 we would probably
have a good idea of where it last finished writing to the ondisk log
(just before the first block with cycle# 3, as we assume that is old data).
However, it can get a bit more complicated than this. One such complication
is that the log records are written out from a ring of log buffers and each
of the log buffers can complete out of order compared with the order in
which the writes were issued.
i.e. we ask for bufA-cyc4 bufB-cyc4 bufC-cyc4 bufD-cyc4 to be written, but
it is possible that, say, only bufB and bufD finish just prior to the power
being turned off. So we will see 3,4,3,4 on the disk during recovery.
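
That scenario can be mimicked with a small toy (Python, for illustration only).
It assumes each log buffer covers exactly one block, which is a simplification,
and crash_after() is a made-up helper:

```python
def crash_after(old_disk, writes, completed):
    """Return the disk contents if only the writes named in `completed` landed.

    writes: list of (buffer_name, block_number, cycle) in issue order.
    """
    disk = list(old_disk)
    for name, block, cycle in writes:
        if name in completed:
            disk[block] = cycle    # this write made it to disk before the crash
    return disk


old_disk = [3, 3, 3, 3]            # previous pass left cycle# 3 everywhere
writes = [("bufA", 0, 4), ("bufB", 1, 4), ("bufC", 2, 4), ("bufD", 3, 4)]
disk = crash_after(old_disk, writes, {"bufB", "bufD"})
print(disk)
```

The result has no single clean boundary between cycle# 4 and cycle# 3, so a
plain "find where the cycle# changes" scan is not enough on its own.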
The binary search part comes in when we go searching for blocks with particular
cycle#s. Look at xlog_find_head() for details.
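
As a much simplified sketch of the binary search idea (Python, and not what
xlog_find_head() actually does in detail): it assumes the clean layout from the
4,4,4,4,4,3,3,3,3,3 example, i.e. one run of the newer cycle# followed by one
run of the older one. find_cycle_change() is a made-up name.

```python
def find_cycle_change(cycles):
    """Index of the first block whose cycle# differs from block 0's.

    Returns len(cycles) if the last block has the same cycle# as block 0.
    Only valid when the cycle# changes at most once across the log.
    """
    first = cycles[0]
    if cycles[-1] == first:
        return len(cycles)
    lo, hi = 0, len(cycles) - 1    # cycles[lo] == first, cycles[hi] != first
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cycles[mid] == first:
            lo = mid               # change point is to the right of mid
        else:
            hi = mid               # change point is at mid or to its left
    return hi


print(find_cycle_change([4, 4, 4, 4, 4, 3, 3, 3, 3, 3]))
```

For the example above this gives block 5. With out-of-order completions the
"changes at most once" assumption no longer holds, which is part of why the
real code is more involved than this.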
Finding the tail is done after finding the head. Every log record header
contains the tail ptr as of the time the log record was issued to be
written to disk. So to find the tail we scan back from the head to
find the log record header and then use the h_tail_lsn field.
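
A toy version of that backwards scan (Python, illustration only). It assumes
each block is a dict, that header blocks carry a "magic" marker, and that an
LSN can be shown as a (cycle, block) pair; the dict layout, HEADER_MAGIC and
find_tail_lsn() are all made up here, only the h_tail_lsn name is from the
real header.

```python
HEADER_MAGIC = 0xFEEDBABE          # stand-in marker for "this block is a header"


def find_tail_lsn(blocks, head):
    """Scan back from the block before `head` to the nearest record header
    and return its h_tail_lsn, or None if no header is found."""
    n = len(blocks)
    for i in range(1, n + 1):
        blk = blocks[(head - i) % n]          # wrap around the start of the log
        if blk.get("magic") == HEADER_MAGIC:
            return blk["h_tail_lsn"]
    return None


blocks = [
    {"magic": HEADER_MAGIC, "h_tail_lsn": (3, 7)},   # block 0: record header
    {}, {},                                          # blocks 1-2: record data
    {"magic": HEADER_MAGIC, "h_tail_lsn": (3, 8)},   # block 3: record header
    {},                                              # block 4: record data
    {}, {}, {},                                      # blocks 5-7: old data
]
print(find_tail_lsn(blocks, 5))
```

With the head at block 5, the scan skips the data block at 4 and stops at the
header in block 3, so the tail is whatever that header recorded.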
I think that's the basic idea.