xfs corruption issue

To: xfs@xxxxxxxxxxx, Dave Chinner <david@xxxxxxxxxxxxx>
Subject: xfs corruption issue
From: Danny Shavit <danny@xxxxxxxxxxxxxxxxx>
Date: Wed, 1 Apr 2015 17:09:11 +0300
Cc: Alex Lyakas <alex@xxxxxxxxxxxxxxxxx>, Lev Vainblat <lev@xxxxxxxxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
Hello Dave,
My name is Danny Shavit and I am with Zadara Storage.
We would appreciate your feedback regarding an XFS corruption and xfs_repair issue.

We found a corrupted XFS volume in one of our systems. It is about 1 TB in size and holds about 12 million files.
We ran xfs_repair on the volume, and it completed successfully after 42 minutes.
We noticed that memory consumption rose to about 7.5 GB.
Since some customers have only 4 GB of RAM (and sometimes only 2 GB), we tried running "xfs_repair -m 3200" on a machine with 4 GB of RAM.
However, this time an OOM event occurred while processing AG 26 in phase 3.
The xfs_repair log from the successful run is included below, with a note at the point where the OOM occurred.
We would appreciate your feedback on how much memory xfs_repair needs in general, and specifically when the "-m" option is used.
The XFS metadata dump (taken before running xfs_repair) can be found here:
https://zadarastorage-public.s3.amazonaws.com/xfs/xfsdump-prod-ebs_2015-03-30_23-00-38.tgz
It is a 1.2 GB file (5.7 GB uncompressed).
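
To put the memory figures above in perspective, here is a rough back-of-envelope calculation in Python. It is only a sketch: it assumes the reported sizes are binary units (GiB and MiB) and that xfs_repair memory use grows roughly linearly with the number of files, neither of which is confirmed here. The names `bytes_per_file` and `files_within_limit` are made up for illustration.

```python
# Back-of-envelope estimate from the figures quoted in this message.
# ASSUMPTIONS (not confirmed): sizes are binary units, and xfs_repair
# memory use scales roughly linearly with the number of files.

GIB = 1024 ** 3
MIB = 1024 ** 2


def bytes_per_file(peak_bytes: float, file_count: int) -> float:
    """Average memory per file observed during the successful repair."""
    return peak_bytes / file_count


observed_peak = 7.5 * GIB    # peak memory seen during the unrestricted run
file_count = 12_000_000      # "about 12 million files"
per_file = bytes_per_file(observed_peak, file_count)

# The -m option takes a limit in megabytes; 3200 was the value we tried.
limit = 3200 * MIB
files_within_limit = round(limit / per_file)

print(f"approx. {per_file:.0f} bytes per file")
print(f"a {limit // MIB} MB budget would cover about {files_within_limit:,} files")
```

Under those assumptions the unrestricted run works out to several hundred bytes per file, which suggests that a 3200 MB cap may simply be too small for a volume with this many files. The real requirement may well differ from this linear estimate.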

We would also appreciate your feedback on the corruption pattern.
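
One detail of the pattern can be checked mechanically. The log below contains four "bad . entry" messages from phase 3, and the following Python sketch (the helper name `dot_entry_deltas` is only illustrative) computes the difference between each directory's inode number and the value that was stored in its "." entry.

```python
import re

# The four phase 3 messages, copied from the xfs_repair log in this message.
LOG_LINES = """\
bad . entry in directory inode 5691013154, was 5691013170: correcting
bad . entry in directory inode 5691013156, was 5691013172: correcting
bad . entry in directory inode 5691013157, was 5691013173: correcting
bad . entry in directory inode 5691013163, was 5691013179: correcting
""".splitlines()

PATTERN = re.compile(r"bad \. entry in directory inode (\d+), was (\d+)")


def dot_entry_deltas(lines):
    """Return (inode, stored_dot_value, difference) for each matching line."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            inode = int(match.group(1))
            stored = int(match.group(2))
            results.append((inode, stored, stored - inode))
    return results


deltas = dot_entry_deltas(LOG_LINES)
for inode, stored, delta in deltas:
    print(f"inode {inode}: '.' pointed to {stored} (difference {delta})")
```

For all four directories the stored value is exactly 16 higher than the directory's own inode number.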
--
Thank you,
Danny Shavit
Zadarastorage

---------- xfs_repair log ----------------
root@vsa-00000428-vc-1:/export/4xfsdump# date; xfs_repair -v /dev/dm-55; date
Tue Mar 31 02:28:04 PDT 2015
Phase 1 - find and verify superblock...
        - block cache size set to 735288 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1920 tail block 1920
        - scan filesystem freespace and inode maps...
agi_freecount 54, counted 55 in ag 7
sb_ifree 947, counted 948
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
bad . entry in directory inode 5691013154, was 5691013170: correcting
bad . entry in directory inode 5691013156, was 5691013172: correcting
bad . entry in directory inode 5691013157, was 5691013173: correcting
bad . entry in directory inode 5691013163, was 5691013179: correcting
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26   (Danny: OOM occurred here with -m 3200)
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
entry "SavedXML" in dir inode 2992927241 inconsistent with .. value (4324257659) in ino 5691013156
        will clear entry "SavedXML"
rebuilding directory inode 2992927241
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
entry "Out" in dir inode 4324257659 inconsistent with .. value (2992927241) in ino 5691013172
        will clear entry "Out"
rebuilding directory inode 4324257659
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
entry "tocs_file" in dir inode 5691012138 inconsistent with .. value (3520464676) in ino 5691013154
        will clear entry "tocs_file"
entry "trees.log" in dir inode 5691012138 inconsistent with .. value (3791956240) in ino 5691013155
        will clear entry "trees.log"
rebuilding directory inode 5691012138
entry "filelist.xml" in directory inode 5691012139 not consistent with .. value (1909707067) in inode 5691013157,
junking entry
fixing i8count in inode 5691012139
entry "image001.jpg" in directory inode 5691012140 not consistent with .. value (2450176033) in inode 5691013163,
junking entry
fixing i8count in inode 5691012140
entry "OCR" in dir inode 5691013154 inconsistent with .. value (5691013170) in ino 1909707065
        will clear entry "OCR"
entry "Tmp" in dir inode 5691013154 inconsistent with .. value (5691013170) in ino 2179087403
        will clear entry "Tmp"
entry "images" in dir inode 5691013154 inconsistent with .. value (5691013170) in ino 2450176007
        will clear entry "images"
rebuilding directory inode 5691013154
entry "286_Kellman_Hoffer_Master.pdf_files" in dir inode 5691013156 inconsistent with .. value (5691013172) in ino 834535727
        will clear entry "286_Kellman_Hoffer_Master.pdf_files"
rebuilding directory inode 5691013156
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 834535727, moving to lost+found
disconnected dir inode 1909707065, moving to lost+found
disconnected dir inode 2179087403, moving to lost+found
disconnected dir inode 2450176007, moving to lost+found
disconnected dir inode 5691013154, moving to lost+found
disconnected dir inode 5691013155, moving to lost+found
disconnected dir inode 5691013156, moving to lost+found
disconnected dir inode 5691013157, moving to lost+found
disconnected dir inode 5691013163, moving to lost+found
disconnected dir inode 5691013172, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 81777983 nlinks from 2 to 12
resetting inode 1909210410 nlinks from 1 to 2
resetting inode 1909707067 nlinks from 3 to 2
resetting inode 2450176033 nlinks from 18 to 17
resetting inode 2992927241 nlinks from 13 to 12
resetting inode 3520464676 nlinks from 13 to 12
resetting inode 3791956240 nlinks from 13 to 12
resetting inode 4324257659 nlinks from 13 to 12
resetting inode 5691013154 nlinks from 5 to 2
resetting inode 5691013156 nlinks from 3 to 2

        XFS_REPAIR Summary    Tue Mar 31 03:11:00 2015

Phase           Start           End             Duration
Phase 1:        03/31 02:28:04  03/31 02:28:05  1 second
Phase 2:        03/31 02:28:05  03/31 02:28:42  37 seconds
Phase 3:        03/31 02:28:42  03/31 02:48:29  19 minutes, 47 seconds
Phase 4:        03/31 02:48:29  03/31 02:55:40  7 minutes, 11 seconds
Phase 5:        03/31 02:55:40  03/31 02:55:43  3 seconds
Phase 6:        03/31 02:55:43  03/31 03:10:57  15 minutes, 14 seconds
Phase 7:        03/31 03:10:57  03/31 03:10:57

Total run time: 42 minutes, 53 seconds
done
Tue Mar 31 03:11:01 PDT 2015
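
As a cross-check of the summary table above, this small Python sketch recomputes each phase's duration from its start and end timestamps. The year 2015 is taken from the summary header; the dictionary layout and the function name `phase_seconds` are illustrative only.

```python
from datetime import datetime

YEAR = "2015"  # taken from the "XFS_REPAIR Summary" header line

# Start and end timestamps copied from the summary table in the log.
PHASES = {
    "Phase 1": ("03/31 02:28:04", "03/31 02:28:05"),
    "Phase 2": ("03/31 02:28:05", "03/31 02:28:42"),
    "Phase 3": ("03/31 02:28:42", "03/31 02:48:29"),
    "Phase 4": ("03/31 02:48:29", "03/31 02:55:40"),
    "Phase 5": ("03/31 02:55:40", "03/31 02:55:43"),
    "Phase 6": ("03/31 02:55:43", "03/31 03:10:57"),
    "Phase 7": ("03/31 03:10:57", "03/31 03:10:57"),
}


def phase_seconds(start: str, end: str) -> int:
    """Elapsed seconds between two 'MM/DD HH:MM:SS' timestamps."""
    fmt = "%Y %m/%d %H:%M:%S"
    t0 = datetime.strptime(f"{YEAR} {start}", fmt)
    t1 = datetime.strptime(f"{YEAR} {end}", fmt)
    return int((t1 - t0).total_seconds())


durations = {name: phase_seconds(s, e) for name, (s, e) in PHASES.items()}
total = sum(durations.values())

for name, secs in durations.items():
    print(f"{name}: {secs:5d} s  ({100 * secs / total:4.1f}% of total)")
print(f"Total: {total // 60} minutes, {total % 60} seconds")
```

The per-phase durations add up to the reported total of 42 minutes, 53 seconds, with phases 3 and 6 together accounting for roughly four fifths of the run.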
