
Re: can xfs_repair guarantee a complete clean filesystem?

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: can xfs_repair guarantee a complete clean filesystem?
From: hank peng <pengxihan@xxxxxxxxx>
Date: Wed, 2 Dec 2009 10:39:50 +0800
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <4B15BE1C.2030209@xxxxxxxxxxx>
References: <389deec70911301805j37df7397l1c3ddbbad7e91768@xxxxxxxxxxxxxx> <4B14936F.7040401@xxxxxxxxxxx> <389deec70911302037v19764c2cr7686b353c5e933fa@xxxxxxxxxxxxxx> <4B14B077.5090500@xxxxxxxxxxx> <389deec70911302234v2fc792ddt54bf88f5200500be@xxxxxxxxxxxxxx> <4B152BAD.1000004@xxxxxxxxxxx> <389deec70912010732o72edd3c4q196088a1c01b801e@xxxxxxxxxxxxxx> <4B1539C6.90000@xxxxxxxxxxx> <389deec70912011646y682a2cc2rd5d4ea8cfe78d2f5@xxxxxxxxxxxxxx> <4B15BE1C.2030209@xxxxxxxxxxx>
Hi, Eric:
I think I have reproduced the problem.

# uname -a
Linux 1234dahua 2.6.23 #747 Mon Nov 16 10:52:58 CST 2009 ppc unknown
#mdadm -C /dev/md1 -l5 -n3 /dev/sd{h,c,b}
# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md1 : active raid5 sdb[3] sdc[1] sdh[0]
      976772992 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
      [==>..................]  recovery = 13.0% (63884032/488386496) finish=103.8min speed=68124K/sec

unused devices: <none>
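Note that md1 was still resyncing ([3/2] [UU_]) when I went on to layer LVM and XFS on top of it. A minimal sketch (hypothetical helper, just polling /proc/mdstat) of waiting for the rebuild to finish first:

```shell
#!/bin/sh
# Poll /proc/mdstat until no resync/recovery is in progress,
# then it is safe to create the filesystem on top of the array.
while grep -qE 'recovery|resync' /proc/mdstat 2>/dev/null; do
    sleep 60
done
echo "md arrays idle; safe to proceed"
```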
#pvcreate /dev/md1
#vgcreate Pool_md1 /dev/md1
#lvcreate -L 931G -n testlv Pool_md1
# lvdisplay
  --- Logical volume ---
  LV Name                /dev/Pool_md1/testlv
  VG Name                Pool_md1
  LV UUID                jWTgk5-Q6tf-jSEU-m9VZ-K2Kb-1oRW-R7oP94
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                931.00 GB
  Current LE             238336
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

#mkfs.xfs -f -ssize=4k /dev/Pool_md1/testlv
#mount /dev/Pool_md1/testlv /mnt/Pool_md1/testlv
Everything looked OK; I mounted the filesystem and began writing files
into it through our application software. After a short while, a problem occurred.
# cd /mnt/Pool_md1/testlv
cd: error retrieving current directory: getcwd: cannot access parent directories: Input/output error
#dmesg | tail -n 30
--- rd:3 wd:2
 disk 0, o:1, dev:sdh
 disk 1, o:1, dev:sdc
RAID5 conf printout:
 --- rd:3 wd:2
 disk 0, o:1, dev:sdh
 disk 1, o:1, dev:sdc
 disk 2, o:1, dev:sdb
md: recovery of RAID array md1
md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
md: using 128k window, over a total of 488386496 blocks.
Filesystem "dm-0": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-0
Ending clean XFS mount for filesystem: dm-0
Filesystem "dm-0": XFS internal error xfs_trans_cancel at line 1169 of file fs/xfs/xfs_trans.c.  Caller 0xc019fbf0
Call Trace:
[e8e6dcb0] [c00091ec] show_stack+0x3c/0x1a0 (unreliable)
[e8e6dce0] [c017559c] xfs_error_report+0x50/0x60
[e8e6dcf0] [c0197058] xfs_trans_cancel+0x124/0x140
[e8e6dd10] [c019fbf0] xfs_create+0x1fc/0x63c
[e8e6dd90] [c01ad690] xfs_vn_mknod+0x1ac/0x20c
[e8e6de40] [c007ded4] vfs_create+0xa8/0xe4
[e8e6de60] [c0081370] open_namei+0x5f0/0x688
[e8e6deb0] [c00729b8] do_filp_open+0x2c/0x6c
[e8e6df20] [c0072a54] do_sys_open+0x5c/0xf8
[e8e6df40] [c0002320] ret_from_syscall+0x0/0x3c
xfs_force_shutdown(dm-0,0x8) called from line 1170 of file fs/xfs/xfs_trans.c.  Return address = 0xc01b0b74
Filesystem "dm-0": Corruption of in-memory data detected.  Shutting down filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)

What should I do now? Should I run xfs_repair, or move to a newer
kernel? Please let me know if you need any other information.
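For reference, the repair sequence I have in mind, sketched with a guard so it does nothing on a machine where this LV is absent (paths taken from the session above; xfs_repair -n only reports problems, it does not modify the filesystem):

```shell
#!/bin/sh
# Dry-run check first, then (optionally) repair.
DEV=/dev/Pool_md1/testlv
MNT=/mnt/Pool_md1/testlv

if [ -b "$DEV" ]; then
    umount "$MNT" 2>/dev/null   # xfs_repair requires an unmounted filesystem
    xfs_repair -n "$DEV"        # -n: inspect and report only, no modifications
    # xfs_repair "$DEV"         # uncomment to actually repair
else
    echo "$DEV not present; nothing to do"
fi
```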

2009/12/2 Eric Sandeen <sandeen@xxxxxxxxxxx>:
> hank peng wrote:
>> 2009/12/1 Eric Sandeen <sandeen@xxxxxxxxxxx>:
> ...
>>>> kernel version is 2.6.23, xfsprogs is 2.9.7, CPU is MPC8548, powerpc arch.
>>>> I am at home now, Maybe I can provide some detailed information tomorrow.
>>> If there's any possibility to test newer kernel & userspace, that'd
>>> be great.  Many bugs have been fixed since those versions.
>> We did have plan to upgrade kernel to latest 2.6.31.
> Well, I'm just suggesting testing it for now, not necessarily
> upgrading your product.  Would just be good to know if the bug you
> are seeing persists upstream on ppc.
>> BTW, is there somewhere I can check the list of bugs fixed across
>> versions?
> You can look at the changelogs on kernel.org, for instance:
> http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.32
> Or with git: git log --pretty=oneline fs/xfs
> There isn't a great bug <-> commit <-> kernel version mapping, I guess.
> -Eric
>>> -Eric

The simplest is not all best but the best is surely the simplest!
