
To: David Chinner <dgc@xxxxxxx>
Subject: Re: 2.6.21-git10/11: files getting truncated on xfs? or maybe an nlink problem?
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Thu, 10 May 2007 07:46:33 -0700
Cc: Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, Matt Mackall <mpm@xxxxxxxxxxx>, xfs@xxxxxxxxxxx, michal.k.k.piotrowski@xxxxxxxxx
In-reply-to: <20070510012609.GU85884050@sgi.com>
References: <4642389E.4080804@goop.org> <20070509231643.GM85884050@sgi.com> <4642598E.3000607@goop.org> <20070510000119.GO85884050@sgi.com> <46426194.3040403@goop.org> <20070510004918.GS85884050@sgi.com> <46426D31.8070000@goop.org> <20070510012609.GU85884050@sgi.com>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Thunderbird 1.5.0.10 (X11/20070302)
David Chinner wrote:
> On Wed, May 09, 2007 at 05:54:09PM -0700, Jeremy Fitzhardinge wrote:
>   
>> David Chinner wrote:
>>     
>>> Suspend-resume, eh?
>>>
>>> There's an immediate suspect. Can you test this specifically for us?
>>> i.e. download a known good file set, do some stuff, suspend, resume,
>>> then check the files? If it doesn't show up the first time, can
>>> you do it a few times just to rule it out?
>>>       
>> Well, I've been doing suspend-resume with xfs for a while without
>> problems; the problems seem to be recent and easily repeatable.  Which
>> just means that it could be a new suspend-resume problem, of course.
>>     
>
> Ok. I'm just trying to find a relatively simple test case for the
> problem - seeing as you seem to be able to reliably reproduce this
> we should be able to work out the trigger...
>   

OK, I was able to reproduce it reliably with a script which did basically:

    for i in `seq 20`; do
        hg clone -U --pull a b-$i
        hg verify b-$i          # always OK
        umount /home
        sleep 5
        mount /home
        hg verify b-$i          # often found truncated files
    done
      

No suspend/resume involved.  The trees are Linux kernel ones, so fairly
large, but small enough to fit entirely in core.  My script also
captured xfs_bmap before/after output for files which had tended to be
corrupted in the past, but unfortunately none of them got corrupted in
these tests.  But I do have all the trees lying around to extract more
detail from if you like.
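A complementary check, which is not something the original script does, is to snapshot the size of every store file just before the unmount and compare it after the remount, so that truncated files are named directly rather than only through `hg verify` errors. The helpers `snapshot_sizes` and `shrunk` below are hypothetical; this is a minimal POSIX sh sketch assuming file paths contain no newlines.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original clonetest script.

# snapshot_sizes DIR: print "<size> <path>" for every regular file under DIR,
# sorted by path. Sizes come from "wc -c", with padding spaces stripped.
snapshot_sizes() {
        find "$1" -type f | sort | while IFS= read -r f; do
                printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
        done
}

# shrunk BEFORE AFTER: given two snapshot_sizes outputs, print the paths
# that are smaller in AFTER than in BEFORE, or missing from AFTER entirely.
shrunk() {
        awk '
                # First file: remember the old size of each path
                # (the path is everything after the first field and its space).
                FILENAME == ARGV[1] { before[substr($0, length($1) + 2)] = $1; next }
                # Second file: remember the new size of each path.
                { after[substr($0, length($1) + 2)] = $1 }
                END {
                        for (p in before)
                                if (!(p in after) || after[p] + 0 < before[p] + 0)
                                        print p
                }
        ' "$1" "$2" | sort
}
```

In a loop like the one above, one would run `snapshot_sizes b-$i/.hg/store > before-$i` just before `umount /home`, the same into `after-$i` after `mount /home`, and then `shrunk before-$i after-$i` to list the files that lost data. Comparing against the pre-unmount snapshot of the same clone, rather than against the source repository, avoids assuming that `hg clone --pull` produces byte-identical store files.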

Interestingly, the corruption happened in each case around the same
place in the tree, often in the SATA drivers.  I wonder if that is just
an artifact of this script's timing.

Attaching script and results.

    J
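#!/bin/sh
# Repeatedly clone a Mercurial repository onto XFS, unmount and remount
# the filesystem, and re-verify the clone to catch truncated files.

#set -x
#set -e

D=/home/jeremy/hg          # directory holding the repositories
F=linux-clone-test         # source repository to clone from

# Print a line of output (use the commented form to echo to the tty instead).
emit() {
        #echo "$@" > /dev/tty
        echo "$@"
}

# Echo and run a command; abort the whole test if it fails.
run() {
        emit "   $*"
        if ! eval "$@"; then
                echo "Command failed"
                exit 1
        fi
}

# Echo and run a command, carrying on even if it fails
# (used for "hg verify", whose failure is the result being recorded).
nofail() {
        emit "   $*"
        eval "$@"
}

# Verify a repository and record the block maps and sizes of its store files.
validaterepo() {
        nofail hg -R "$1" verify
        run xfs_bmap -vvp "$1"/.hg/store/*
        run ls -ld "$1"/.hg/store/*
}

# Make sure /home is mounted before starting.
[ -d "$D" ] || nofail mount /home

validaterepo "$D/$F" || exit
for i in $(seq 20); do
        emit "Iteration $i" "$(date)"
        run hg clone -U --pull "$D/$F" "$D/$F-$i"
        validaterepo "$D/$F-$i"
        # Unmount (forcing dirty data out to disk), wait, remount, re-check.
        run umount /home
        #run xfs_check /dev/vg00/homexfs || exit
        run sleep 5
        run mount /home
        nofail hg -R "$D/$F-$i" verify
        run xfs_bmap -vvp "$D/$F-$i"/.hg/store/*
        run ls -l "$D/$F-$i"/.hg/store/*
        emit
done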

Attachment: clonetest.log.gz
Description: GNU Zip compressed data
