Problems with kernel 3.6.x (VM?) (was: Is kernel 3.6.1 or filestreams option toxic?)
Yann Dupont
Yann.Dupont at univ-nantes.fr
Mon Oct 29 07:43:00 CDT 2012
On 29/10/2012 13:18, Dave Chinner wrote:
> On Mon, Oct 29, 2012 at 12:25:40PM +1100, Dave Chinner wrote:
>> On Mon, Oct 29, 2012 at 10:48:02AM +1100, Dave Chinner wrote:
>>> On Sat, Oct 27, 2012 at 12:05:34AM +0200, Yann Dupont wrote:
>>>> On 26/10/2012 12:03, Yann Dupont wrote:
>>>>> On 25/10/2012 23:10, Dave Chinner wrote:
>>>> - mkfs.xfs on it, with default options
>>>> - mounted with default options
>>>> - launch something that hammers this volume. I launched compilebench
>>>> 0.6 on it
>>>> - wait some time to fill memory and buffers, and make sure your disks
>>>> are really busy. I waited a few minutes after the initial 30 kernel
>>>> tree unpacks in compilebench
>>>> - hard reset the server (I'm using the server's iDRAC to
>>>> generate a power cycle)
>>>> - After a few tries, I finally could not mount the XFS volume,
>>>> getting the error reported in previous mails. So far this is
>>>> expected.
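(For reference, the steps above boil down to roughly the following command
sequence; the device name, mount point and compilebench location here are
placeholders for the actual setup, not the exact values used:)

    mkfs.xfs /dev/sdX                 # default options
    mount /dev/sdX /mnt/test          # default options
    ./compilebench -D /mnt/test &     # hammer the volume
    sleep 300                         # let memory/buffers fill, keep disks busy
    # hard reset the box (e.g. power cycle via the iDRAC), then after reboot:
    mount /dev/sdX /mnt/test          # eventually fails with the log
                                      # recovery error reported earlier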
>>> So it doesn't happen every time, and it may be power cycle related.
>>> What is your "local disk"?
>> I can't reproduce this with a similar setup but using KVM (i.e.
>> killing the VM instead of power cycling) or forcing a shutdown of
>> the filesystem without flushing the log. The second case is very
>> much the same as power cycling, but without the potential "power
>> failure caused partial IOs to be written" problem.
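(Side note: a filesystem shutdown without a log flush can also be forced
with xfs_io instead of the xfstests godown helper used below; a rough
sketch, assuming the filesystem is mounted at /mnt/scratch:)

    # -x enables expert commands; without the -f flag the log is not
    # flushed, which mimics a sudden crash/power loss as closely as possible
    xfs_io -x -c shutdown /mnt/scratch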
>>
>> The only thing I can see in the logprint that I haven't seen so far
>> in my testing is that your log print indicates a checkpoint that
>> wraps the end of the log. I haven't yet hit that situation by
>> chance, so I'll keep trying to see if that's the case that is
>> causing the problem....
> Well, it's taken about 12 hours of random variation of parameters
> in the loop of:
>
> mkfs.xfs -f /dev/vdb
> mount /dev/vdb /mnt/scratch
> ./compilebench -D /mnt/scratch &
> sleep <some period>
> /home/dave/src/xfstests-dev/src/godown /mnt/scratch
> sleep 5
> umount /mnt/scratch
> xfs_logprint -d /dev/vdb
>
> to get a log with a wrapped checkpoint. That was with <some
> period> equal to 36s. In all that time, I hadn't seen a single log
> mount failure, and the moment I get a wrapped log:
>
> 1917 HEADER Cycle 10 tail 9:018456 len 32256 ops 468
> 1981 HEADER Cycle 10 tail 9:018456 len 32256 ops 427
> ^^^^^^^^^^^^^^^
> [00000 - 02045] Cycle 0x0000000a New Cycle 0x00000009
>
> [ 368.364232] XFS (vdb): Mounting Filesystem
> [ 369.096144] XFS (vdb): Starting recovery (logdev: internal)
> [ 369.126545] XFS (vdb): xlog_recover_process_data: bad clientid 0x2c
> [ 369.129522] XFS (vdb): log mount/recovery failed: error 5
> [ 369.131884] XFS (vdb): log mount failed
>
> Ok, so no LVM, no power failure involved, etc. Dig deeper. Let's see
> if logprint can dump the transactional record of the log:
>
> # xfs_logprint -f log.img -t
> .....
> LOG REC AT LSN cycle 9 block 20312 (0x9, 0x4f58)
>
> LOG REC AT LSN cycle 9 block 20376 (0x9, 0x4f98)
> xfs_logprint: failed in xfs_do_recovery_pass, error: 12288
> #
>
> Ok, xfs_logprint failed to decode the wrapped transaction at the end
> of the log. I can't see anything obviously wrong with the contents
> of the log off the top of my head (logprint is notoriously buggy),
> but the above command can reproduce the problem (3 out of 3 so far),
> so I should be able to track down the bug from this.
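(Side note: the log.img file decoded with -f above was presumably copied
off the block device first; one way to do that, assuming xfs_logprint's -C
copy option, would be:)

    xfs_logprint -C log.img /dev/vdb   # copy the on-disk log to a regular file
    xfs_logprint -f log.img -t         # then decode transactions from the copy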
>
> Cheers,
>
> Dave.
OK, very glad to hear you were able to reproduce it.
Good luck, and now let the chase begin :)
Cheers,