http://oss.sgi.com/bugzilla/show_bug.cgi?id=803
Summary: xfs_repair 2.10.1 finds errors which it cannot repair
Product: Linux XFS
Version: unspecified
Platform: PC
OS/Version: Linux
Status: NEW
Severity: critical
Priority: P1
Component: xfsprogs
AssignedTo: xfs-masters@xxxxxxxxxxx
ReportedBy: chris@xxxxxxxxxxxxxxxxx
Even the latest xfs_repair (2.10.1) cannot repair the errors it finds!
This is a Linux system with a 2.4.31 vanilla kernel with the XFS code as it came
with that kernel - no changes, no patches. The xfsprogs are the latest: 2.10.1.
The problem: no errors are reported during normal operation. This is very bad,
because the corruption may by now have propagated into my backups! Note that I
can mount and unmount without errors.
The whole story started when I did an rsync with an external backup disc (ext3
filesystem) which showed repeated errors when checked with e2fsck. In case it
helps, the errors were mostly of the form:
Unconnected directory inode 2356270
(.../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../..)
Connect to /lost+found? yes
rsync copies at the file level rather than sector by sector, so it would not
normally carry filesystem-structure damage over to the target. Still, I
suspected that something *might* be wrong with the source filesystem, which was
an XFS, so I ran xfs_check on it.
I got tons of errors:
bad nblocks 0 for inode 3270672, counted 1
bad nblocks 0 for inode 20309208, counted 4
bad nblocks 0 for inode 20310266, counted 1
bad nblocks 0 for inode 45318873, counted 1
bad nblocks 0 for inode 46139128, counted 1
bad nblocks 0 for inode 46139431, counted 5
bad nblocks 0 for inode 52119415, counted 6
bad nblocks 0 for inode 68338719, counted 3
bad nblocks 0 for inode 79692586, counted 1
bad nblocks 0 for inode 95083015, counted 6
bad nblocks 0 for inode 112875036, counted 10
bad nblocks 0 for inode 112875081, counted 7
bad nblocks 0 for inode 112875085, counted 2
bad nblocks 0 for inode 136228262, counted 1
bad nblocks 0 for inode 146778058, counted 6
bad nblocks 0 for inode 146778061, counted 2
bad nblocks 0 for inode 163576511, counted 14
bad nblocks 0 for inode 176286446, counted 1
bad nblocks 0 for inode 180204507, counted 1
bad nblocks 0 for inode 180354599, counted 2
bad nblocks 0 for inode 210057277, counted 1
bad nblocks 0 for inode 227587850, counted 7
bad nblocks 0 for inode 239074245, counted 1
bad nblocks 0 for inode 255837575, counted 3
bad nblocks 0 for inode 262016912, counted 1
bad nblocks 0 for inode 264229578, counted 3
bad nblocks 0 for inode 288354699, counted 1
bad nblocks 0 for inode 293833163, counted 6
bad nblocks 0 for inode 312903924, counted 2
bad nblocks 0 for inode 321911858, counted 1
bad nblocks 0 for inode 322962031, counted 5
bad nblocks 0 for inode 331224174, counted 3
bad nblocks 0 for inode 347882402, counted 1
bad nblocks 0 for inode 448787758, counted 1
bad nblocks 0 for inode 489276729, counted 6
bad nblocks 0 for inode 489276811, counted 3
bad nblocks 0 for inode 489277580, counted 3
bad nblocks 0 for inode 489278058, counted 1
bad nblocks 301120 for inode 540821039, counted 717253
bad nblocks 0 for inode 557844345, counted 1
bad nblocks 0 for inode 675284973, counted 1
bad nblocks 834623 for inode 683093370, counted 1023413
bad nblocks 0 for inode 767558770, counted 2
bad nblocks 0 for inode 771849006, counted 1
bad nblocks 0 for inode 792723588, counted 2
bad nblocks 0 for inode 801112857, counted 4
...and many more, all of the same type. Most of the time the nblocks were 0
while the "counted nblocks" were some small number like 1, 2, 3 or 4. I have
included a few notable exceptions above (like the one for inode 683093370:
original: 834623, counted: 1023413).
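
For anyone triaging a long xfs_check listing like the one above, here is a minimal sketch that separates the common "recorded 0" entries from the exceptions. It assumes only the line format shown in this report; the function names are hypothetical and the sample is a short excerpt of the output above.

```python
import re

# Matches lines such as: "bad nblocks 0 for inode 3270672, counted 1"
BAD_NBLOCKS = re.compile(r"bad nblocks (\d+) for inode (\d+), counted (\d+)")

def parse_bad_nblocks(text):
    """Return a list of (inode, recorded, counted) tuples found in text."""
    return [(int(ino), int(rec), int(cnt))
            for rec, ino, cnt in BAD_NBLOCKS.findall(text)]

def summarize(entries):
    """Split entries into those recorded as 0 blocks and all others."""
    zero = [e for e in entries if e[1] == 0]
    nonzero = [e for e in entries if e[1] != 0]
    return zero, nonzero

# Excerpt copied from the xfs_check output in this report.
sample = """\
bad nblocks 0 for inode 3270672, counted 1
bad nblocks 301120 for inode 540821039, counted 717253
bad nblocks 834623 for inode 683093370, counted 1023413
bad nblocks 0 for inode 767558770, counted 2
"""

entries = parse_bad_nblocks(sample)
zero, nonzero = summarize(entries)
print(len(zero), "inodes recorded as 0 blocks")
for ino, rec, cnt in nonzero:
    print(f"inode {ino}: recorded {rec}, counted {cnt}")
```

Feeding the complete xfs_check output to parse_bad_nblocks() instead of the excerpt would list every exception, not just the two quoted here.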
Running xfs_repair on this filesystem produces something strange: xfs_repair
finds the same errors during its Phase 3 - however it does not correct them
during Phase 4!
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
correcting nblocks for inode 3270672, was 0 - counted 1
- agno = 1
- agno = 2
correcting nblocks for inode 20309208, was 0 - counted 4
correcting nblocks for inode 20310266, was 0 - counted 1
- agno = 3
- agno = 4
- agno = 5
correcting nblocks for inode 45318873, was 0 - counted 1
correcting nblocks for inode 46139128, was 0 - counted 1
correcting nblocks for inode 46139431, was 0 - counted 5
- agno = 6
correcting nblocks for inode 52119415, was 0 - counted 6
- agno = 7
- agno = 8
correcting nblocks for inode 68338719, was 0 - counted 3
- agno = 9
correcting nblocks for inode 79692586, was 0 - counted 1
- agno = 10
- agno = 11
correcting nblocks for inode 95083015, was 0 - counted 6
- agno = 12
- agno = 13
correcting nblocks for inode 112875036, was 0 - counted 10
correcting nblocks for inode 112875081, was 0 - counted 7
correcting nblocks for inode 112875085, was 0 - counted 2
- agno = 14
- agno = 15
- agno = 16
correcting nblocks for inode 136228262, was 0 - counted 1
- agno = 17
correcting nblocks for inode 146778058, was 0 - counted 6
correcting nblocks for inode 146778061, was 0 - counted 2
- agno = 18
- agno = 19
correcting nblocks for inode 163576511, was 0 - counted 14
- agno = 20
- agno = 21
correcting nblocks for inode 176286446, was 0 - counted 1
correcting nblocks for inode 180204507, was 0 - counted 1
correcting nblocks for inode 180354599, was 0 - counted 2
- agno = 22
- agno = 23
- agno = 24
- agno = 25
correcting nblocks for inode 210057277, was 0 - counted 1
- agno = 26
- agno = 27
correcting nblocks for inode 227587850, was 0 - counted 7
- agno = 28
correcting nblocks for inode 239074245, was 0 - counted 1
- agno = 29
- agno = 30
correcting nblocks for inode 255837575, was 0 - counted 3
- agno = 31
correcting nblocks for inode 262016912, was 0 - counted 1
correcting nblocks for inode 264229578, was 0 - counted 3
- agno = 32
- agno = 33
- agno = 34
correcting nblocks for inode 288354699, was 0 - counted 1
- agno = 35
correcting nblocks for inode 293833163, was 0 - counted 6
- agno = 36
- agno = 37
correcting nblocks for inode 312903924, was 0 - counted 2
- agno = 38
correcting nblocks for inode 321911858, was 0 - counted 1
correcting nblocks for inode 322962031, was 0 - counted 5
- agno = 39
correcting nblocks for inode 331224174, was 0 - counted 3
- agno = 40
- agno = 41
correcting nblocks for inode 347882402, was 0 - counted 1
- agno = 42
- agno = 43
- agno = 44
- agno = 45
- agno = 46
- agno = 47
- agno = 48
- agno = 49
- agno = 50
- agno = 51
- agno = 52
- agno = 53
correcting nblocks for inode 448787758, was 0 - counted 1
- agno = 54
- agno = 55
- agno = 56
- agno = 57
- agno = 58
correcting nblocks for inode 489276729, was 0 - counted 6
correcting nblocks for inode 489276811, was 0 - counted 3
correcting nblocks for inode 489277580, was 0 - counted 3
correcting nblocks for inode 489278058, was 0 - counted 1
- agno = 59
- agno = 60
- agno = 61
- agno = 62
- agno = 63
- agno = 64
correcting nblocks for inode 540821039, was 301120 - counted 717253
- agno = 65
- agno = 66
correcting nblocks for inode 557844345, was 0 - counted 1
- agno = 67
- agno = 68
- agno = 69
- agno = 70
- agno = 71
- agno = 72
- agno = 73
- agno = 74
- agno = 75
- agno = 76
- agno = 77
- agno = 78
- agno = 79
- agno = 80
correcting nblocks for inode 675284973, was 0 - counted 1
- agno = 81
correcting nblocks for inode 683093370, was 834623 - counted 1023413
- agno = 82
- agno = 83
- agno = 84
- agno = 85
- agno = 86
- agno = 87
- agno = 88
- agno = 89
- agno = 90
- agno = 91
correcting nblocks for inode 767558770, was 0 - counted 2
- agno = 92
correcting nblocks for inode 771849006, was 0 - counted 1
- agno = 93
- agno = 94
correcting nblocks for inode 792723588, was 0 - counted 2
- agno = 95
correcting nblocks for inode 801112857, was 0 - counted 4
- agno = 96
...and so on. The above is "Phase 3". From that output, I would expect the
nblocks to be corrected for those inodes. However, Phase 4 (shown below, after
the tail end of Phase 3) reports nothing at all:
- agno = 295
correcting nblocks for inode 2474797566, was 0 - counted 1
correcting nblocks for inode 2474902928, was 0 - counted 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
...all the way up to agno 295:
- agno = 287
- agno = 288
- agno = 289
- agno = 290
- agno = 291
- agno = 292
- agno = 293
- agno = 294
- agno = 295
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
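
As a side note on reading the Phase 3 output above: XFS encodes the allocation group in the high bits of the inode number, so each reported inode can be checked against the "agno" line it appears under. The sketch below assumes that the low 23 bits hold the AG-relative part on this filesystem. That width is inferred from the log, not read from the superblock, and it depends on the filesystem geometry.

```python
# Assumption: 23 low bits hold the AG-relative inode number on this
# filesystem. The value is inferred from the xfs_repair log, not taken
# from the superblock, and will differ on other filesystems.
AGINO_BITS = 23

def ino_to_agno(ino, agino_bits=AGINO_BITS):
    """Return the allocation group number encoded in an inode number."""
    return ino >> agino_bits

# (inode, AG under which xfs_repair reported it), copied from the log.
observed = [
    (3270672, 0),
    (20309208, 2),
    (45318873, 5),
    (540821039, 64),
    (683093370, 81),
    (801112857, 95),
    (2474797566, 295),
]

for ino, agno in observed:
    status = "ok" if ino_to_agno(ino) == agno else "MISMATCH"
    print(f"inode {ino}: computed AG {ino_to_agno(ino)}, logged AG {agno} ({status})")
```

If the computed and logged AG numbers agree, the inode numbers themselves are at least consistent with where xfs_repair found them.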
When I rerun xfs_repair or xfs_check, I get the exact same set of bad nblocks.
This means that xfs_repair fails to correct those errors.
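
The comparison between two runs can be made mechanical. This is a minimal sketch, assuming both runs were saved as text and use the "correcting nblocks" line format shown above. The function names are hypothetical, and the second run here is only a stand-in that reuses the first excerpt, matching the observation that both runs report the same set.

```python
import re

# Matches: "correcting nblocks for inode 3270672, was 0 - counted 1"
CORRECTING = re.compile(
    r"correcting nblocks for inode (\d+), was (\d+) - counted (\d+)")

def reported(text):
    """Map inode -> (recorded, counted) for every correction line in text."""
    return {int(i): (int(w), int(c)) for i, w, c in CORRECTING.findall(text)}

def still_bad(first_run, second_run):
    """Inodes reported with identical values in both runs, sorted."""
    a, b = reported(first_run), reported(second_run)
    return sorted(i for i in a if b.get(i) == a[i])

# Excerpt copied from the Phase 3 output in this report.
run1 = """\
- agno = 0
correcting nblocks for inode 3270672, was 0 - counted 1
- agno = 1
- agno = 2
correcting nblocks for inode 20309208, was 0 - counted 4
correcting nblocks for inode 20310266, was 0 - counted 1
"""
run2 = run1  # stand-in for the output of the second xfs_repair run

for ino in still_bad(run1, run2):
    print("not fixed between runs:", ino)
```

An empty result would mean the first run's corrections actually reached the disk; a non-empty one matches the behaviour described here.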
I said that I get no errors during operation of that filesystem. Indeed
everything seems fine (mounting, unmounting, remounting, writing and reading
files...), EXCEPT when I try to access (with, say, "ls") a file whose inode is
one of those with the bad nblocks:
I first have to find one, so I run:
find /my-mount-point -inum 1912658730
and get hundreds of filenames with "Error 990":
find: /my-mount-point/file-1: Unknown error 990
find: /my-mount-point/file-2: Unknown error 990
find: /my-mount-point/file-3: Unknown error 990
(/my-mount-point/file-XXX are just example names, not real ones).
When I try to list one of those files with "ls", then and only then(!) I get an
error in the syslog:
kernel: Filesystem "sd(8,17)": corrupt dinode 1019217568, extent total = 1,
nblocks = 0. Unmount and run xfs_repair.
kernel: 0x0: 49 4e 81 a4 01 02 00 01 00 00 01 f4 00 00 00 64
kernel: Filesystem "sd(8,17)": XFS internal error xfs_iformat(1) at line 475 of
file xfs_inode.c. Caller 0xf89118aa
kernel: e5ed1d58 f890fe54 f8946cb1 00000001 f6972800 f8946c8f 000001db f89118aa
kernel: f89118aa 00000000 f6972800 00005015 01000000 00000000 00000000
f6972800
kernel: ccdbea10 f89118aa ccdbea10 e4a5d400 00000001 00000000 e5ed1dc8
00000000
kernel: Call Trace:
[keybdev:__insmod_keybdev_O/lib/modules/2.4.31/kernel/drivers/input/+4293193300/96]
[keybdev:__insmod_keybdev_O/lib/modules/2.4.31/kernel/drivers/input/+4293418161/96]
[keybdev:__insmod_keybdev_O/lib/modules/2.4.31/kernel/drivers/input/+4293418127/96]
[keybdev:__insmod_keybdev_O/lib/modules/2.4.31/kernel/drivers/input/+4293200042/96]
[keybdev:__insmod_keybdev_O/lib/modules/2.4.31/kernel/drivers/input/+4293200042/96]
kernel: Call Trace: [<f890fe54>] [<f8946cb1>] [<f8946c8f>] [<f89118aa>]
[<f89118aa>]
kernel:
[keybdev:__insmod_keybdev_O/lib/modules/2.4.31/kernel/drivers/input/+4293200042/96]
[keybdev:__insmod_keybdev_O/lib/modules/2.4.31/kernel/drivers/input/+4293187380/96]
[keybdev:__insmod_keybdev_O/lib/modules/2.4.31/kernel/drivers/input/+4293188994/96]
[keybdev:__insmod_keybdev_O/lib/modules/2.4.31/kernel/drivers/input/+4293308220/96]
[keybdev:__insmod_keybdev_O/lib/modules/2.4.31/kernel/drivers/input/+4293331701/96]
[keybdev:__insmod_keybdev_O/lib/modules/2.4.31/kernel/drivers/input/+4293377916/96]
kernel: [<f89118aa>] [<f890e734>] [<f890ed82>] [<f892bf3c>] [<f8931af5>]
[<f893cf7c>]
kernel: [real_lookup+242/320] [link_path_walk+1401/2832] [path_lookup+57/80]
[__user_walk+73/128] [filldir64+0/336] [sys_lstat64+31/144]
kernel: [<c0151262>] [<c01519c9>] [<c0152169>] [<c0152519>] [<c01576b0>]
[<c014e03f>]
kernel: [system_call+51/56]
kernel: [<c0108fa7>]
As long as I don't touch these inodes, no errors are thrown anywhere.
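
The 16 bytes dumped in the "0x0:" syslog line above can be decoded by hand. The sketch below assumes they are the start of the big-endian on-disk XFS inode core (magic, mode, version, format, onlink, uid, gid) as laid out for inodes of that era. The field names follow the XFS headers, but the layout used here should be treated as an assumption and checked against the kernel source in use.

```python
import stat
import struct

# First 16 bytes of the inode, copied from the "0x0:" syslog line.
dump = "49 4e 81 a4 01 02 00 01 00 00 01 f4 00 00 00 64"
raw = bytes.fromhex(dump.replace(" ", ""))

# Assumed layout, big-endian:
# di_magic u16, di_mode u16, di_version i8, di_format i8,
# di_onlink u16, di_uid u32, di_gid u32
magic, mode, version, fmt, onlink, uid, gid = struct.unpack(">HHbbHII", raw)

# Data fork format codes as defined in the XFS headers.
FORMATS = {0: "dev", 1: "local", 2: "extents", 3: "btree", 4: "uuid"}

print("magic  :", raw[:2].decode("ascii"))
print("mode   :", oct(mode), "(regular file)" if stat.S_ISREG(mode) else "")
print("version:", version)
print("format :", FORMATS.get(fmt, "unknown"))
print("onlink :", onlink)
print("uid/gid:", uid, gid)
```

Under that assumption the dump describes a regular file in extents format, which fits the kernel's complaint: an inode that maps at least one extent cannot legitimately have nblocks = 0.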
There are some uneasy questions here:
1) How did those nblock errors happen at all?
2) Why didn't I get a big, fat error message for each one of those errors at the
time it occurred?
3) How serious is the situation?
Let's get this nasty bug out of the way! What information may I send you next?
Regards
Chris Karakas
http://www.karakas-online.de