On Wed, 19 Dec 2007 01:41:36 +1100, Yann Dupont
<Yann.Dupont@xxxxxxxxxxxxxx> wrote:
David Chinner wrote:
On Tue, Dec 18, 2007 at 10:20:21AM +0100, Yann Dupont wrote:
Hello, we got a kernel oops, probably in XFS, on a Debian kernel.
The volume is on a SAN with device mapper.
It is a 1 TB volume that has been in service for more than 2 or 3 years.
There is a high number of files on it, as this volume serves an rsyncd
to which 200+ servers sync their root filesystems every day.
Here is the oops:
Dec 16 23:27:32 inchgower kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1561 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff881857b7
Dec 16 23:27:32 inchgower kernel:
Dec 16 23:27:32 inchgower kernel: Call Trace:
Dec 16 23:27:32 inchgower kernel: [<ffffffff88183ec0>] :xfs:xfs_free_ag_extent+0x19f/0x67f
Corrupted freespace btree. What does xfs_check tell you about the
filesystem on dm-3?
xfs_check tells me to run xfs_repair -L, as every attempt to mount the FS
to clear the log ends in a kernel oops.
[snip]
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
And now the process seems stuck.
There is no activity on the SAN disk;
ps shows this:
root 7885 6466 7885 0 6 1447133 5660020 6 09:55 pts/0 00:00:19 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17190 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17191 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17192 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17193 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17194 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
and strace shows this:
inchgower:~# strace -fp 7885
Process 17194 attached with 6 threads - interrupt to quit
[pid 17191] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17192] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17193] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17194] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17190] futex(0x67e4f8, FUTEX_WAIT, 2, NULL
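In the trace above, every thread is parked in an unfinished futex(FUTEX_WAIT) call. A minimal Python sketch of how one might check that automatically from saved `strace -f` output follows. The function name and the sample text are illustrative, not part of any tool, and a futex wait alone does not prove a hang: it is only suggestive together with the lack of disk activity and CPU time reported above.

```python
import re

# Matches strace -f lines such as:
#   [pid 17191] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
LINE_RE = re.compile(r"^\[pid\s+(\d+)\]\s+(\w+)\((.*)$")


def blocked_in_futex_wait(strace_text):
    """Return (all_blocked, {pid: (syscall_name, rest_of_line)}).

    all_blocked is True only if at least one thread was seen and the last
    syscall recorded for every thread is an unfinished futex(FUTEX_WAIT).
    """
    last_call = {}
    for line in strace_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            pid, name, rest = m.groups()
            last_call[int(pid)] = (name, rest)

    def is_waiting(call):
        name, rest = call
        # A completed call would end in ") = <return value>".
        return name == "futex" and "FUTEX_WAIT" in rest and ") =" not in rest

    all_blocked = bool(last_call) and all(
        is_waiting(call) for call in last_call.values()
    )
    return all_blocked, last_call


# Sample input copied from the strace output in this thread.
SAMPLE = """\
[pid 17191] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17192] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17193] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17194] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17190] futex(0x67e4f8, FUTEX_WAIT, 2, NULL
"""

if __name__ == "__main__":
    blocked, calls = blocked_in_futex_wait(SAMPLE)
    print("threads seen:", sorted(calls))
    print("all blocked in futex wait:", blocked)
```

Four of the five threads wait on the same futex address, which is consistent with worker threads all waiting for the same event.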
Can I stop the process and start another version without risking
problems?
Yes, you can stop and restart. In your scenario, run xfs_repair -P to
disable prefetch, which is what is getting stuck.
Barry.
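
For reference, the restart suggested above could be scripted roughly as follows. This is a sketch only: the helper name `build_repair_command` is hypothetical, the device path is the one from the ps output earlier in the thread, and whether -L is still needed on the second run is not stated in the thread, so it is left optional. The code only builds the argument list and does not run xfs_repair.

```python
import shlex


def build_repair_command(device, zero_log=False, disable_prefetch=True):
    """Build an xfs_repair argument list; nothing is executed here."""
    cmd = ["xfs_repair"]
    if disable_prefetch:
        cmd.append("-P")  # disable prefetching of inode and directory blocks
    if zero_log:
        cmd.append("-L")  # force log zeroing, as in the original invocation
    cmd.append(device)
    return cmd


if __name__ == "__main__":
    cmd = build_repair_command("/dev/evms/DATAXFS2", zero_log=True)
    print(shlex.join(cmd))  # prints: xfs_repair -P -L /dev/evms/DATAXFS2
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)` as root, with the filesystem unmounted and the stuck xfs_repair process stopped first.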