Wrapped journal record corruption on read at recovery - patch attached (was Re: XFS corruption with failover)
Andy Poling
andy at realbig.com
Wed Oct 14 11:43:16 CDT 2009
On Wed, 14 Oct 2009, Christoph Hellwig wrote:
>> I think the complexity here stems from an uncertainty (as we prepare for the
>> "second" read) whether there was a first read or not. As the code reads
>> today, if there is data before the end, the first read is done and offset has
>> been set. If not, offset is NULL.
>>
>> It seems like the more elegant approach would be to set offset before the
>> first read, and then update it if the first read takes place (in case it was
>> unaligned). That also gets rid of bufaddr, and seems like it might read
>> better.
>
> Yeah. Note that in Linux 2.6.29 I did some changes in that are to
> take the read and offset calculation into a common helper, so if the
> changes become larger we might see some cosmetic difference between
> older and newer kernels.
I'll definitely take a look at that. When I looked into doing what I had
previously described above, I realized it doesn't end up looking any clearer
than the original code. If there is no first read, the second read could
potentially be re-aligned by xlog_bread(), which means we then need to use
xlog_align() to determine the real beginning of the data.
Given that, I'm beginning to think the existing code that only sets offset
when a read is done is pretty good. I have one more idea to clean it up a bit
that I'll try.
Additionally, I have not been able to conclusively convince myself that with a
wrapped record, after a first read, the second read will never need to be
re-aligned... which would be problematic. Is the beginning of the log always
correctly aligned? If so, I will stop worrying about it.
> We have a tool called loggen to produce log traffic as part of the QA
> test suite. We could try to use it to reproduce this case. The most
> important bit is that we work on a filesystem that actually requires
> the alignment bits, that is one using a larger log sector size. Just
> curious, what is the value of sb_logsectlog for your test fs?
If I'm interpreting this correctly, xfs_info says it is 4KiB:
meta-data=/dev/loop0 isize=256 agcount=8, agsize=65536 blks
= sectsz=4096 attr=1
data = bsize=4096 blocks=524288, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=2560, version=2
= sectsz=4096 sunit=1 blks, lazy-count=0
realtime =none extsz=4096 blocks=0, rtextents=0
-Andy
It ain't what you don't know that gets you into trouble.
It's what you know for sure that just ain't so. - Mark Twain
More information about the xfs
mailing list