Re: xfs corrupted

To: Emmanuel Florac <eflorac@xxxxxxxxxxxxxx>
Subject: Re: xfs corrupted
From: Stefanita Rares Dumitrescu <katmai@xxxxxxxxxxxxxxx>
Date: Tue, 15 Oct 2013 20:45:59 +0200
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20131015203434.2f336fd8@xxxxxxxxxxxxxx>
References: <1381826507281-35009.post@xxxxxxxxxxxxx> <20131015203434.2f336fd8@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.0.1

That was the first thing I checked: the array was optimal, and I checked each drive with smartctl; they are all fine.

I left xfs_repair running overnight, and it showed no progress. I suspected the memory might be bad, so I took the server offline this morning and ran memtest for 3 hours, which showed nothing wrong with the sticks. However, there is good news:

I was able to mount the array, but I can only read from it. Whenever I try to write something, it just hangs.

I ran xfs_repair -n on the second array, which is 18 TB as opposed to the 14 TB of the first one, and that check completed in about 10 minutes.

I am now running xfs_repair -n on the bad 14 TB array, and it has been stuck here for about 5 hours:

[root@kp4 ~]# umount /home
[root@kp4 ~]# xfs_repair -n /dev/sdc
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0

What worries me is that I see 100% CPU usage and about 74% memory usage (I have 4 GB of RAM), but no disk activity at all. I would have expected at least some reads if xfs_repair were doing something.
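
One way to confirm the "no disk activity" observation for the xfs_repair process itself is to sample its per-process I/O counters twice and compare them. The sketch below is only an illustration, assuming a Linux kernel that exposes /proc/<pid>/io and a reader with permission to open it (typically root); the function names and the 10-second interval are my own choices, not anything from xfs_repair.

```python
import time
from pathlib import Path


def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def io_delta(before, after):
    """Return how much each counter present in both samples has changed."""
    return {key: after[key] - before[key] for key in after if key in before}


def watch(pid, interval=10.0):
    """Sample /proc/<pid>/io twice, `interval` seconds apart (hypothetical helper)."""
    path = Path(f"/proc/{pid}/io")
    before = parse_proc_io(path.read_text())
    time.sleep(interval)
    after = parse_proc_io(path.read_text())
    return io_delta(before, after)
```

Calling watch() with the PID of the running xfs_repair returns the change in each counter over the interval. If read_bytes stays at 0 while the CPU is pegged, the process is computing on data it already holds in memory rather than reading from the array.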

On 15/10/2013 20:34, Emmanuel Florac wrote:
On Tue, 15 Oct 2013 01:41:47 -0700 (PDT) you wrote:

Did I jump the gun by using the -L switch? :/

You should have checked that the RAID is optimal first! In case of
failing hardware, any write to the volume can exacerbate problems.

You should use arcconf to check the RAID state (arcconf getstatus
1) and possibly run a RAID repair (arcconf task start 1 logicaldrive
0 verify_fix).
