Re: XFS umount issue

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: XFS umount issue
From: Nuno Subtil <subtil@xxxxxxxxx>
Date: Tue, 24 May 2011 03:18:11 -0700
Cc: xfs-oss <xfs@xxxxxxxxxxx>
In-reply-to: <20110524075404.GG32466@dastard>
References: <BANLkTikNMrFzxJF4a86ZM55r3D=ThPFmOw@xxxxxxxxxxxxxx> <20110524000243.GB32466@dastard> <BANLkTinJecB+CB-n0Au=yaUFLDiDUwhzwg@xxxxxxxxxxxxxx> <20110524075404.GG32466@dastard>
On Tue, May 24, 2011 at 00:54, Dave Chinner <david@xxxxxxxxxxxxx> wrote:

...

>> > Ok, so there's nothing here that actually says it's an unmount
>> > error. More likely it is a vmap problem in log recovery resulting in
>> > aliasing or some other stale data appearing in the buffer pages.
>> >
>> > Can you add a 'xfs_logprint -t <device>' after the umount? You
>> > should always see something like this telling you the log is clean:
>>
>> Well, I just ran into this again even without using the script:
>>
>> root@howl:/# umount /dev/md5
>> root@howl:/# xfs_logprint -t /dev/md5
>> xfs_logprint:
>>     data device: 0x905
>>     log device: 0x905 daddr: 488382880 length: 476936
>>
>>     log tail: 731 head: 859 state: <DIRTY>
>>
>>
>> LOG REC AT LSN cycle 1 block 731 (0x1, 0x2db)
>>
>> LOG REC AT LSN cycle 1 block 795 (0x1, 0x31b)
>
> Was there any other output? If there were valid transactions between
> the head and tail of the log xfs_logprint should have decoded them.

There was no more output here.

>
>> I see nothing in dmesg at umount time. Attempting to mount the device
>> at this point, I got:
>>
>> [  764.516319] XFS (md5): Mounting Filesystem
>> [  764.601082] XFS (md5): Starting recovery (logdev: internal)
>> [  764.626294] XFS (md5): xlog_recover_process_data: bad clientid 0x0
>
> Yup, that's got bad information in a transaction header.
>
>> [  764.632559] XFS (md5): log mount/recovery failed: error 5
>> [  764.638151] XFS (md5): log mount failed
>>
>> Based on your description, this would be an unmount problem rather
>> than a vmap problem?
>
> Not clear yet. I forgot to mention that you need to do
>
> # echo 3 > /proc/sys/vm/drop_caches
>
> before you run xfs_logprint, otherwise it will see stale cached
> pages and give erroneous results.

I added that before each xfs_logprint and ran the script again. Still
the same results:

...
+ mount /store
+ cd /store
+ tar xf test.tar
+ sync
+ umount /store
+ echo 3
+ xfs_logprint -t /dev/sda1
xfs_logprint:
    data device: 0x801
    log device: 0x801 daddr: 488384032 length: 476936

    log tail: 2048 head: 2176 state: <DIRTY>


LOG REC AT LSN cycle 1 block 2048 (0x1, 0x800)

LOG REC AT LSN cycle 1 block 2112 (0x1, 0x840)
+ mount /store
mount: /dev/sda1: can't read superblock

Same messages in dmesg at this point.
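
The by-eye check of the xfs_logprint summary line above could be scripted. This is only a minimal sketch: it assumes the summary line always has the "log tail: N head: M state: <...>" form shown in the output above, and log_state and log_span are hypothetical helper names, not part of xfsprogs.

```shell
#!/bin/sh
# Hypothetical helpers; both read 'xfs_logprint -t' output on stdin.

# Print whatever appears between the angle brackets of the summary line,
# for example DIRTY for the output shown in this thread.
log_state() {
    sed -n 's/.*log tail: [0-9]* head: [0-9]* state: <\([^>]*\)>.*/\1/p'
}

# Print the distance from tail to head in log blocks (head minus tail).
log_span() {
    sed -n 's/.*log tail: \([0-9]*\) head: \([0-9]*\) state:.*/\1 \2/p' |
        while read -r tail head; do
            echo $(( $head - $tail ))
        done
}

# Intended use, mirroring the sequence in the trace above (needs root):
#   sync
#   umount /store
#   echo 3 > /proc/sys/vm/drop_caches
#   xfs_logprint -t /dev/sda1 | log_state
```

For both captures in this thread the head is 128 blocks past the tail (859 - 731 and 2176 - 2048), which is the value log_span would print.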

> You might want to find out if your platform needs to (and does)
> implement these functions:
>
> flush_kernel_dcache_page()
> flush_kernel_vmap_range()
> invalidate_kernel_vmap_range()
>
> as these are what XFS relies on platforms to implement correctly to
> avoid cache aliasing issues on CPUs with virtually indexed caches.

Is this what /proc/sys/vm/drop_caches relies on as well?

On this platform flush_kernel_dcache_page() is empty. The other two are
not, but they are conditional on the type of cache that is present. I
wonder whether the cache type is somehow not being detected properly.
Wouldn't that cause other areas of the system to misbehave as well?
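
One way to repeat this kind of check is to grep the architecture directory of a kernel source tree for the three function names. The following is a rough sketch only: find_cache_hooks is a hypothetical helper name, and the tree path and architecture directory are placeholders, since the thread does not name the platform.

```shell
#!/bin/sh
# Hypothetical helper: list every line under one architecture directory
# that mentions any of the three cache maintenance hooks.
#   $1 = path to a kernel source tree (placeholder)
#   $2 = architecture directory name under arch/ (placeholder)
find_cache_hooks() {
    grep -rnE \
        'flush_kernel_dcache_page|flush_kernel_vmap_range|invalidate_kernel_vmap_range' \
        "$1/arch/$2"
}

# Intended use (both arguments are placeholders):
#   find_cache_hooks /path/to/linux some_arch
```

Each hit is printed as file:line:text, so definitions, empty stubs and any surrounding conditionals can then be inspected in the reported files.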

Nuno

>
>> I've tried adding a sync before each umount, as well as testing on a
>> plain old disk partition (i.e., without going through MD), but the
>> problem persists either way.
>
> The use of sync before unmount implies it is not an unmount problem,
> and ruling out MD is also a good thing to know.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
