Re: can xfs_repair guarantee a complete clean filesystem?

To: hank peng <pengxihan@xxxxxxxxx>
Subject: Re: can xfs_repair guarantee a complete clean filesystem?
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Tue, 01 Dec 2009 21:52:46 -0600
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <389deec70912011839y57cf5d84y2a50f8d2ea39ca29@xxxxxxxxxxxxxx>
References: <389deec70911301805j37df7397l1c3ddbbad7e91768@xxxxxxxxxxxxxx> <4B14936F.7040401@xxxxxxxxxxx> <389deec70911302037v19764c2cr7686b353c5e933fa@xxxxxxxxxxxxxx> <4B14B077.5090500@xxxxxxxxxxx> <389deec70911302234v2fc792ddt54bf88f5200500be@xxxxxxxxxxxxxx> <4B152BAD.1000004@xxxxxxxxxxx> <389deec70912010732o72edd3c4q196088a1c01b801e@xxxxxxxxxxxxxx> <4B1539C6.90000@xxxxxxxxxxx> <389deec70912011646y682a2cc2rd5d4ea8cfe78d2f5@xxxxxxxxxxxxxx> <4B15BE1C.2030209@xxxxxxxxxxx> <389deec70912011839y57cf5d84y2a50f8d2ea39ca29@xxxxxxxxxxxxxx>
User-agent: Thunderbird 2.0.0.23 (Macintosh/20090812)
hank peng wrote:
> Hi, Eric:
> I think I have reproduced the problem.
> 
> # uname -a
> Linux 1234dahua 2.6.23 #747 Mon Nov 16 10:52:58 CST 2009 ppc unknown
> #mdadm -C /dev/md1 -l5 -n3 /dev/sd{h,c,b}
> # cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5]
> [raid4] [multipath]
> md1 : active raid5 sdb[3] sdc[1] sdh[0]
>       976772992 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
>       [==>..................]  recovery = 13.0% (63884032/488386496)
> finish=103.8min speed=68124K/sec
> 
> unused devices: <none>
> #pvcreate /dev/md1
> #vgcreate Pool_md1 /dev/md1
> #lvcreate -L 931G -n testlv Pool_md1
> # lvdisplay
>   --- Logical volume ---
>   LV Name                /dev/Pool_md1/testlv
>   VG Name                Pool_md1
>   LV UUID                jWTgk5-Q6tf-jSEU-m9VZ-K2Kb-1oRW-R7oP94
>   LV Write Access        read/write
>   LV Status              available
>   # open                 1
>   LV Size                931.00 GB
>   Current LE             238336
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     256
>   Block device           253:0
> 
> #mkfs.xfs -f -ssize=4k /dev/Pool_md1/testlv
> #mount /dev/Pool_md1/testlv /mnt/Pool_md1/testlv
> Everything was OK; I mounted the filesystem and began to write files into
> it through our application software. After a short while, the problem occurred.
> # cd /mnt/Pool_md1/testlv
> cd: error retrieving current directory: getcwd: cannot access parent
> directories: Input/output error
> #dmesg | tail -n 30
> --- rd:3 wd:2
>  disk 0, o:1, dev:sdh
>  disk 1, o:1, dev:sdc
> RAID5 conf printout:
>  --- rd:3 wd:2
>  disk 0, o:1, dev:sdh
>  disk 1, o:1, dev:sdc
>  disk 2, o:1, dev:sdb
> md: recovery of RAID array md1
> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than
> 200000 KB/sec) for recovery.
> md: using 128k window, over a total of 488386496 blocks.
> Filesystem "dm-0": Disabling barriers, not supported by the underlying device
> XFS mounting filesystem dm-0
> Ending clean XFS mount for filesystem: dm-0
> Filesystem "dm-0": XFS internal error xfs_trans_cancel at line 1169 of
> file fs/xfs/xfs_trans.c.  Caller 0xc019fbf0
> Call Trace:
> [e8e6dcb0] [c00091ec] show_stack+0x3c/0x1a0 (unreliable)
> [e8e6dce0] [c017559c] xfs_error_report+0x50/0x60
> [e8e6dcf0] [c0197058] xfs_trans_cancel+0x124/0x140
> [e8e6dd10] [c019fbf0] xfs_create+0x1fc/0x63c
> [e8e6dd90] [c01ad690] xfs_vn_mknod+0x1ac/0x20c
> [e8e6de40] [c007ded4] vfs_create+0xa8/0xe4
> [e8e6de60] [c0081370] open_namei+0x5f0/0x688
> [e8e6deb0] [c00729b8] do_filp_open+0x2c/0x6c
> [e8e6df20] [c0072a54] do_sys_open+0x5c/0xf8
> [e8e6df40] [c0002320] ret_from_syscall+0x0/0x3c
> xfs_force_shutdown(dm-0,0x8) called from line 1170 of file
> fs/xfs/xfs_trans.c.  Return address = 0xc01b0b74
> Filesystem "dm-0": Corruption of in-memory data detected.  Shutting
> down filesystem: dm-0
> Please umount the filesystem, and rectify the problem(s)
> 
> What should I do now? Use xfs_repair, or a newer kernel? Please let
> me know if you need any other information.

Test an upstream kernel; if it passes, test kernels in between to find out
when it got fixed, and maybe you can backport the fix.
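
A rough sketch of that bisection (untested; the tree URL and the
"known-working" release are just assumptions, so substitute whatever upstream
kernel actually passes your reproducer):

    git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    cd linux-2.6
    # You are looking for the commit that *fixed* it, so invert the usual
    # terms: mark the broken kernel "good" and the working one "bad"; bisect
    # then converges on the first commit where the shutdown no longer happens.
    git bisect start
    git bisect bad  v2.6.31     # assumed: reproducer passes here
    git bisect good v2.6.23     # reproducer fails here
    # build and boot each kernel bisect checks out, re-run the reproducer:
    git bisect good             # if the XFS shutdown still occurs
    git bisect bad              # if it no longer occurs
    git bisect log > xfs-bisect.log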

If it fails upstream, we have an unfixed bug and we'll try to help you find it.
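
If you do end up reaching for xfs_repair, something like this (a sketch;
double-check the flags against your xfsprogs man pages) lets you see what it
would change before letting it write anything:

    umount /mnt/Pool_md1/testlv
    # dry run: report problems but modify nothing
    xfs_repair -n /dev/Pool_md1/testlv
    # optionally save the metadata (no file data) for later analysis
    xfs_metadump /dev/Pool_md1/testlv /tmp/testlv.metadump
    # only after reviewing the -n output, run the real repair
    xfs_repair /dev/Pool_md1/testlv

If it complains about a dirty log, try mounting and cleanly unmounting once
to replay the log rather than going straight to -L, which zeroes it.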

-Eric
