Hello,
I've been using XFS for a few years; this is the first operational problem I've
run into. I'm running kernel 2.6.16-ck4 (which includes 2.6.16.2) on ix86. I'd be
extremely surprised if the -ck patch had anything to do with this, but I
understand that anything is up for suspicion.
I noticed this failure today in /var/log/syslog after hearing a lot of dropouts
from my media player. It seems to have begun on 16 April.
Apr 22 13:10:34 core kernel: Filesystem "hdc1": XFS internal error xfs_ialloc_read_agi at line 1357 of file fs/xfs/xfs_ialloc.c. Caller 0xf09f5e1b
Apr 22 13:10:34 core kernel: [<f09f7180>] xfs_ialloc_read_agi+0xaa/0xef [xfs]
Apr 22 13:10:34 core kernel: [<f09f5e1b>] xfs_ialloc_ag_select+0x105/0x25b [xfs]
Apr 22 13:10:34 core last message repeated 2 times
Apr 22 13:10:34 core kernel: [<f09f5fb9>] xfs_dialloc+0x48/0x891 [xfs]
Apr 22 13:10:34 core kernel: [<b013ee86>] kmem_getpages+0x75/0x8d
Apr 22 13:10:34 core kernel: [<f0a03b1e>] xlog_grant_log_space+0xe8/0x208 [xfs]
Apr 22 13:10:34 core kernel: [<f0a03c03>] xlog_grant_log_space+0x1cd/0x208 [xfs]
Apr 22 13:10:34 core kernel: [<f09fbbf1>] xfs_ialloc+0x44/0x444 [xfs]
Apr 22 13:10:34 core kernel: [<f0a02b58>] xlog_grant_push_ail+0x2b/0xf0 [xfs]
Apr 22 13:10:34 core kernel: [<f0a0f4cb>] xfs_dir_ialloc+0x5e/0x21b [xfs]
Apr 22 13:10:34 core kernel: [<f0a0d0e7>] xfs_trans_reserve+0xa9/0x16d [xfs]
Apr 22 13:10:34 core kernel: [<f0a14edd>] xfs_mkdir+0x2c5/0x554 [xfs]
Apr 22 13:10:34 core kernel: [<f0a197b7>] xfs_buf_free+0x79/0x7e [xfs]
Apr 22 13:10:34 core kernel: [<f09e71b4>] xfs_da_brelse+0x6e/0x90 [xfs]
Apr 22 13:10:34 core kernel: [<f0a1d4f9>] linvfs_mknod+0x154/0x278 [xfs]
Apr 22 13:10:34 core kernel: [<b0154814>] __d_lookup+0xa2/0xc5
Apr 22 13:10:34 core kernel: [<f09ecb4a>] xfs_dir2_leaf_lookup+0x1f/0xcf [xfs]
Apr 22 13:10:34 core kernel: [<f09e8f7a>] xfs_dir2_isleaf+0x1b/0x56 [xfs]
Apr 22 13:10:34 core kernel: [<f09e8825>] xfs_dir2_lookup+0xda/0x110 [xfs]
Apr 22 13:10:34 core kernel: [<b017d3f1>] __journal_file_buffer+0xe1/0x1c0
Apr 22 13:10:34 core kernel: [<f0a0f3ef>] xfs_dir_lookup_int+0x2d/0xab [xfs]
Apr 22 13:10:34 core kernel: [<f0a1d6a0>] linvfs_lookup+0x51/0x69 [xfs]
Apr 22 13:10:34 core kernel: [<f0a1d64b>] linvfs_mkdir+0x17/0x1b [xfs]
Apr 22 13:10:34 core kernel: [<b014e32f>] vfs_mkdir+0x58/0x8f
Apr 22 13:10:34 core kernel: [<b014e3ed>] sys_mkdirat+0x87/0xc1
Apr 22 13:10:34 core kernel: [<b014e436>] sys_mkdir+0xf/0x13
Apr 22 13:10:34 core kernel: [<b01029ad>] syscall_call+0x7/0xb
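For anyone wanting to check when entries like the one above first show up in their own logs, here is a rough sketch that counts "XFS internal error" lines per day. It assumes the traditional "Mon DD HH:MM:SS host ..." syslog prefix seen above; the function name count_xfs_errors_by_day is purely illustrative and not part of any XFS tool.

```python
import re
from collections import Counter

# Traditional syslog prefix, e.g. "Apr 22 13:10:34 core kernel: ..."
# (assumption: the log uses this format, as in the excerpt above).
PREFIX = re.compile(r"^(\w{3})\s+(\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+\S+\s+(.*)$")

def count_xfs_errors_by_day(lines):
    """Count lines mentioning 'XFS internal error', keyed by (month, day)."""
    counts = Counter()
    for line in lines:
        m = PREFIX.match(line)
        if m and "XFS internal error" in m.group(3):
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts

# Typical use (path is an assumption; adjust for your distribution):
#   with open("/var/log/syslog", errors="replace") as f:
#       for (month, day), n in count_xfs_errors_by_day(f).items():
#           print(month, day, n)
```

Only the first line of each report carries the "XFS internal error" text, so the stack-trace lines that follow it are not counted.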
It seems this happened every time XFS hit a file or directory with an
inconsistency, and there turned out to be a great many of them.
Interestingly, it did not affect the most recently written files. The bug
really hated Tom Waits.
An example of what xfs_repair did follows; I apologize that I didn't manage to
save the output of xfs_check.
The console buffer was set too low :-(
clearing inode number in entry at offset 1304...
entry "Inspiral_Carpets_-_L01_Real_Thing.flac" at block 0 offset 112 in directory inode 165 references non-existent inode 22864296
...
rebuilding directory inode 132
- traversal finished ...
- traversing all unattached subtrees ...
rebuilding directory inode 135759756
...
Phase 7 - verify and correct link counts...
resetting inode 128 nlinks from 140 to 120
XFS did an admirable job repairing the underlying file system, but the repair
emptied my hard drive pretty thoroughly. Fortunately I have redundant storage on
an external USB drive. It uses XFS as well, and checks out fine. Is this journal
corruption? Neither `badblocks` nor SMART found any media problems. During
r[e]sync, no errors were logged by SMART or XFS.
All things considered, XFS did a remarkable job of not causing system
interruptions! xfs_repair also did a decent job saving what it could in
lost+found (presumably any file that managed to be separated from its errant
parent directory). If I didn't have rsync and a backup medium, I'd have been
terribly happy with that. It saved 11 GB of data.
Anyway, hopefully the errors above might help to find a bug.
Thanks for all that you do,
Ryan Mikulovsky