
Re: Corrupted files

To: Leslie Rhorer <lrhorer@xxxxxxxxxxxx>, Sean Caron <scaron@xxxxxxxxx>
Subject: Re: Corrupted files
From: Sean Caron <scaron@xxxxxxxxx>
Date: Tue, 9 Sep 2014 21:25:37 -0400
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <540FA586.9090308@xxxxxxxxxxxx>
References: <540F1B01.3020700@xxxxxxxxxxxx> <20140909220645.GH20518@dastard> <540FA586.9090308@xxxxxxxxxxxx>
Hi Leslie,

You really don't want to be running "green" anything in an array... that is a ticking time bomb just waiting to go off... let me tell you... At my installation, a predecessor had procured a large number of green drives because they were very inexpensive and regrets were had by all. Lousy performance, lots of spurious ejection/RAID gremlins and the failure rate on the WDC Greens is just appalling...

BBWC stands for Battery Backed Write Cache. It is a feature of hardware RAID cards and is just what it says on the tin: a bit of cache memory (usually half a gig, or a gig, or two...) kept alive by a battery, so that writes to the array that have not yet reached disk survive a power failure, etc. If you have BBWC enabled but your battery is dead, bad things can happen. Not applicable for JBOD software RAID.

I hold firm to my beliefs on xfs_repair :) As I say, you'll see a variety of opinions here.



On Tue, Sep 9, 2014 at 9:12 PM, Leslie Rhorer <lrhorer@xxxxxxxxxxxx> wrote:
On 9/9/2014 5:06 PM, Dave Chinner wrote:
Firstly, more information is required, namely versions and actual
error messages:


RAID-Server:/# xfs_repair -V
xfs_repair version 3.1.7
RAID-Server:/# uname -r

4.0 GHz FX-8350 eight core processor

RAID-Server:/# cat /proc/meminfo /proc/mounts /proc/partitions
MemTotal:        8099916 kB
MemFree:         5786420 kB
Buffers:          112684 kB
Cached:           457020 kB
SwapCached:            0 kB
Active:           521800 kB
Inactive:         457268 kB
Active(anon):     276648 kB
Inactive(anon):   140180 kB
Active(file):     245152 kB
Inactive(file):   317088 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      12623740 kB
SwapFree:       12623740 kB
Dirty:                20 kB
Writeback:             0 kB
AnonPages:        409488 kB
Mapped:            47576 kB
Shmem:              7464 kB
Slab:             197100 kB
SReclaimable:     112644 kB
SUnreclaim:        84456 kB
KernelStack:        2560 kB
PageTables:         8468 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    16673696 kB
Committed_AS:    1010172 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      339140 kB
VmallocChunk:   34359395308 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       65532 kB
DirectMap2M:     5120000 kB
DirectMap1G:     3145728 kB
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=10240k,nr_inodes=1002653,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=809992k,mode=755 0 0
/dev/disk/by-uuid/fa5c404a-bfcb-43de-87ed-e671fda1ba99 / ext4 rw,relatime,errors=remount-ro,user_xattr,barrier=1,data=ordered 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
tmpfs /run/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=4144720k 0 0
/dev/md1 /boot ext2 rw,relatime,errors=continue 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
Backup:/Backup /Backup nfs rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=,mountvers=3,mountport=39597,mountproto=tcp,local_lock=none,addr= 0 0
Backup:/var/www /var/www/backup nfs rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=,mountvers=3,mountport=39597,mountproto=tcp,local_lock=none,addr= 0 0
/dev/md0 /RAID xfs rw,relatime,attr2,delaylog,sunit=2048,swidth=12288,noquota 0 0
major minor  #blocks  name

   8        0  125034840 sda
   8        1      96256 sda1
   8        2  112305152 sda2
   8        3   12632064 sda3
   8       16  125034840 sdb
   8       17      96256 sdb1
   8       18  112305152 sdb2
   8       19   12632064 sdb3
   8       48 3907018584 sdd
   8       32 3907018584 sdc
   8       64 1465138584 sde
   8       80 1465138584 sdf
   8       96 1465138584 sdg
   8      112 3907018584 sdh
   8      128 3907018584 sdi
   8      144 3907018584 sdj
   8      160 3907018584 sdk
   9        1      96192 md1
   9        2  112239488 md2
   9        3   12623744 md3
   9        0 23441319936 md0
   9       10 4395021312 md10

RAID-Server:/# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1] [raid0]
md10 : active raid0 sdf[0] sde[2] sdg[1]
   4395021312 blocks super 1.2 512k chunks

md0 : active raid6 md10[12] sdc[13] sdk[10] sdj[11] sdi[15] sdh[8] sdd[9]
   23441319936 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [8/7] [UUU_UUUU]
   bitmap: 29/30 pages [116KB], 65536KB chunk

md3 : active (auto-read-only) raid1 sda3[0] sdb3[1]
   12623744 blocks super 1.2 [3/2] [UU_]
   bitmap: 1/1 pages [4KB], 65536KB chunk

md2 : active raid1 sda2[0] sdb2[1]
   112239488 blocks super 1.2 [3/2] [UU_]
   bitmap: 1/1 pages [4KB], 65536KB chunk

md1 : active raid1 sda1[0] sdb1[1]
   96192 blocks [3/2] [UU_]
   bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
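
The `[8/7] [UUU_UUUU]` field in the md0 entry above is what shows the array running with one member missing. As a rough illustration only (not something used in this thread), that field can be picked out of /proc/mdstat text with a small script. The function name `degraded_arrays` and the trimmed sample text are assumptions made for the sketch.

```python
import re

# Two entries copied from the /proc/mdstat output above (trimmed for brevity).
MDSTAT = """\
md0 : active raid6 md10[12] sdc[13] sdk[10] sdj[11] sdi[15] sdh[8] sdd[9]
      23441319936 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [8/7] [UUU_UUUU]
md2 : active raid1 sda2[0] sdb2[1]
      112239488 blocks super 1.2 [3/2] [UU_]
"""


def degraded_arrays(text):
    """Map array name -> (expected members, active members, empty slot indexes)."""
    result = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)  # remember which array the next lines describe
            continue
        # Status looks like "[8/7] [UUU_UUUU]": expected/active, then one flag per slot.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected = int(status.group(1))
            active = int(status.group(2))
            empty = [i for i, flag in enumerate(status.group(3)) if flag == "_"]
            if active < expected:
                result[current] = (expected, active, empty)
    return result


if __name__ == "__main__":
    for name, (expected, active, empty) in degraded_arrays(MDSTAT).items():
        print(f"{name}: {active}/{expected} members active, empty slots {empty}")
```

Note that md1, md2 and md3 also report `[3/2]`: they were created with three slots and only two members are present.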

    Six of the drives are 4T spindles (a mixture of makes and models). The three drives comprising md10 are WD 1.5T green drives, striped together as a RAID0. These are in place to take over the function of one of the kicked 4T drives. md1, md2, and md3 are not data arrays and are not suffering any issue.

    I'm not sure what is meant by "write cache status" in this context. The machine has been rebooted more than once during recovery and the FS has been umounted and xfs_repair run several times.

    I don't know for what the acronym BBWC stands.

RAID-Server:/# xfs_info /dev/md0
meta-data=/dev/md0               isize=256    agcount=43, agsize=137356288 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=5860329984, imaxpct=5
         =                       sunit=256    swidth=1536 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
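
The stripe geometry can be cross-checked against the other output in this message. xfs_info gives sunit/swidth in filesystem blocks, the mount options in /proc/mounts give them in 512-byte units, and /proc/partitions counts 1 KiB blocks. A small sketch of that arithmetic follows; the helper name `stripe_geometry` is made up for the illustration, and all numbers are copied from the output above.

```python
BLOCK_SIZE = 4096   # bsize from the xfs_info data section
SECTOR_SIZE = 512   # unit of the sunit/swidth mount options


def stripe_geometry(sunit_blks, swidth_blks, block_size=BLOCK_SIZE):
    """Convert xfs_info stripe values (in filesystem blocks) to other units."""
    sunit_bytes = sunit_blks * block_size
    swidth_bytes = swidth_blks * block_size
    return {
        "sunit_kib": sunit_bytes // 1024,             # compare with the md chunk size
        "swidth_kib": swidth_bytes // 1024,
        "data_disks": swidth_blks // sunit_blks,      # stripe units per full stripe
        "sunit_sectors": sunit_bytes // SECTOR_SIZE,  # compare with the mount options
        "swidth_sectors": swidth_bytes // SECTOR_SIZE,
    }


if __name__ == "__main__":
    geo = stripe_geometry(sunit_blks=256, swidth_blks=1536)
    print(geo)
    # Filesystem size in 1 KiB blocks, for comparison with md0 in /proc/partitions.
    print(5860329984 * BLOCK_SIZE // 1024)
```

The figures agree: sunit is 1024 KiB, the same as the 1024k md chunk; swidth covers 6 stripe units, which is the 8 members of the RAID6 minus 2 for parity; the sector values are the `sunit=2048,swidth=12288` shown for /RAID; and the block count works out to the 23441319936 listed for md0.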

    The system performs just fine, other than the aforementioned, with loads in excess of 3Gbps. That is internal only. The LAN link is only 1Gbps, so no external request exceeds about 950Mbps.


dmesg, in particular, should tell us what the corruption being
encountered is when stat fails.

RAID-Server:/# ls "/RAID/DVD/Big Sleep, The (1945)/VIDEO_TS/VTS_01_1.VOB"
ls: cannot access /RAID/DVD/Big Sleep, The (1945)/VIDEO_TS/VTS_01_1.VOB: Structure needs cleaning
RAID-Server:/# dmesg | tail -n 30
[192173.363981] XFS (md0): corrupt dinode 41006, extent total = 1, nblocks = 0.
[192173.363988] ffff8802338b8e00: 49 4e 81 b6 02 02 00 00 00 00 03 e8 00 00 03 e8  IN..............
[192173.363996] XFS (md0): Internal error xfs_iformat(1) at line 319 of file /build/linux-eKuxrT/linux-3.2.60/fs/xfs/xfs_inode.c. Caller 0xffffffffa0509318
[192173.364062] Pid: 10813, comm: ls Not tainted 3.2.0-4-amd64 #1 Debian 3.2.60-1+deb7u3
[192173.364065] Call Trace:
[192173.364097]  [<ffffffffa04d3731>] ? xfs_corruption_error+0x54/0x6f [xfs]
[192173.364134]  [<ffffffffa0509318>] ? xfs_iread+0x9f/0x177 [xfs]
[192173.364170]  [<ffffffffa0508efa>] ? xfs_iformat+0xe3/0x462 [xfs]
[192173.364204]  [<ffffffffa0509318>] ? xfs_iread+0x9f/0x177 [xfs]
[192173.364240]  [<ffffffffa0509318>] ? xfs_iread+0x9f/0x177 [xfs]
[192173.364268]  [<ffffffffa04d6ebe>] ? xfs_iget+0x37c/0x56c [xfs]
[192173.364300]  [<ffffffffa04e13b4>] ? xfs_lookup+0xa4/0xd3 [xfs]
[192173.364328]  [<ffffffffa04d9e5a>] ? xfs_vn_lookup+0x3f/0x7e [xfs]
[192173.364344]  [<ffffffff81102de9>] ? d_alloc_and_lookup+0x3a/0x60
[192173.364357]  [<ffffffff8110388d>] ? walk_component+0x219/0x406
[192173.364370]  [<ffffffff81104721>] ? path_lookupat+0x7c/0x2bd
[192173.364383]  [<ffffffff81036628>] ? should_resched+0x5/0x23
[192173.364396]  [<ffffffff8134f144>] ? _cond_resched+0x7/0x1c
[192173.364408]  [<ffffffff8110497e>] ? do_path_lookup+0x1c/0x87
[192173.364420]  [<ffffffff81106407>] ? user_path_at_empty+0x47/0x7b
[192173.364434]  [<ffffffff813533d8>] ? do_page_fault+0x30a/0x345
[192173.364448]  [<ffffffff810d6a04>] ? mmap_region+0x353/0x44a
[192173.364460]  [<ffffffff810fe45a>] ? vfs_fstatat+0x32/0x60
[192173.364471]  [<ffffffff810fe590>] ? sys_newstat+0x12/0x2b
[192173.364483]  [<ffffffff813509f5>] ? page_fault+0x25/0x30
[192173.364495]  [<ffffffff81355452>] ? system_call_fastpath+0x16/0x1b
[192173.364503] XFS (md0): Corruption detected. Unmount and run xfs_repair

    That last line, by the way, is why I ran umount and xfs_repair.
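
The 16 bytes dumped in the dmesg output are the start of the on-disk inode. As an illustration only, they can be decoded under the assumption that they follow the usual layout of the XFS inode core: big-endian magic, mode, version, format, onlink, uid, gid. The field names and the format table below are my reading of that layout, not something stated in this thread.

```python
import stat
import struct

# Bytes copied from the "ffff8802338b8e00:" line of the dmesg output above.
DUMP = "49 4e 81 b6 02 02 00 00 00 00 03 e8 00 00 03 e8"

# Assumed meaning of the data fork format byte.
FORMATS = {0: "dev", 1: "local", 2: "extents", 3: "btree", 4: "uuid"}


def decode_inode_head(hex_bytes):
    """Decode the first 16 bytes of an XFS on-disk inode (assumed layout)."""
    raw = bytes.fromhex(hex_bytes)
    # Big-endian: 2-byte magic, u16 mode, u8 version, u8 format,
    # u16 onlink, u32 uid, u32 gid.
    magic, mode, version, fmt, onlink, uid, gid = struct.unpack(">2sHBBHII", raw[:16])
    return {
        "magic": magic.decode("ascii"),
        "mode": stat.filemode(mode),
        "version": version,
        "format": FORMATS.get(fmt, f"unknown({fmt})"),
        "onlink": onlink,
        "uid": uid,
        "gid": gid,
    }


if __name__ == "__main__":
    for field, value in decode_inode_head(DUMP).items():
        print(f"{field:8} {value}")
```

Read this way, the header itself looks sane: the "IN" magic, a regular file with mode rw-rw-rw-, extents format, uid and gid 1000. That would fit the message text, which complains about the counters (one extent recorded but nblocks = 0) rather than about the header.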
