Hi
We are using XFS on many production servers here. We have been running
everything from 2.4.9-13 with XFS 1.0.2 release up to 2.4.9-31 with XFS 1.1
release with no problems until now.
Today I found a strange message in the dmesg output. I ran ksymoops on it,
and this is the output:
ksymoops 2.4.5 on i686 2.4.9-31SGI_XFS_1.1. Options used
-v /usr/src/linux/vmlinux (specified)
-K (specified)
-l /proc/modules (default)
-o /lib/modules/2.4.9-31SGI_XFS_1.1/ (default)
-m /usr/src/linux/System.map (default)
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
invalid operand: 0000
CPU: 0
EIP: 0010:[<c01db07b>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010287
eax: 00000000 ebx: 00000008 ecx: f45cbcc0 edx: f45cbcc0
esi: 00000001 edi: f45cbcc0 ebp: 00001000 esp: f7da7f58
ds: 0018 es: 0018 ss: 0018
Process bdflush (pid: 7, stackpage=f7da7000)
Stack: 00000001 f45cbcc0 c01db2bb 00000001 f45cbcc0 f45cbcc0 0000000d f6b9e720
       f7da7fbc 00000001 f7da6000 00000286 c02732f8 f7da633a 00000010 c02708e0
       00000000 00000000 f7da7fd4 c013487d 00000001 00000001 f7da7fc4 f45cbcc0
Call Trace: [<c01db2bb>]
[<c013487d>]
[<c0137bca>]
[<c024e8ae>]
[<c0137deb>]
[<c01055b4>]
Code: 0f 0b b8 03 00 00 00 f0 0f ab 42 18 0f b7 42 0c 66 89 42 14
>>EIP; c01db07b <submit_bh+2b/6c> <=====
>>ecx; f45cbcc0 <END_OF_CODE+3429214c/????>
>>edx; f45cbcc0 <END_OF_CODE+3429214c/????>
>>edi; f45cbcc0 <END_OF_CODE+3429214c/????>
>>ebp; 00001000 Before first symbol
>>esp; f7da7f58 <END_OF_CODE+37a6e3e4/????>
Trace; c01db2bb <ll_rw_block+1ff/270>
Trace; c013487d <write_buffer+6d/7c>
Trace; c0137bca <flush_dirty_buffers+a2/e4>
Trace; c024e8ae <Unused_offset+a6a/451c>
Trace; c0137deb <bdflush+73/b0>
Trace; c01055b4 <kernel_thread+28/38>
Code; c01db07b <submit_bh+2b/6c>
0000000000000000 <_EIP>:
Code; c01db07b <submit_bh+2b/6c> <=====
0: 0f 0b ud2a <=====
Code; c01db07d <submit_bh+2d/6c>
2: b8 03 00 00 00 mov $0x3,%eax
Code; c01db082 <submit_bh+32/6c>
7: f0 0f ab 42 18 lock bts %eax,0x18(%edx)
Code; c01db087 <submit_bh+37/6c>
c: 0f b7 42 0c movzwl 0xc(%edx),%eax
Code; c01db08b <submit_bh+3b/6c>
10: 66 89 42 14 mov %ax,0x14(%edx)
More info:
Linux version 2.4.9-31SGI_XFS_1.1 (dizzy@us) (gcc version 2.95.3 20010315 (release)) #1 SMP
It's a custom-compiled version. We had it running for 33 days with no
problems at all (until we rebooted for a hardware upgrade) on the same
machine, doing the same thing.
Please tell me what to do, because I have a bdflush process that shows up
in the Z (zombie) state and I don't know whether this can corrupt my data
(although the system seems to be running OK).
ps ax| head
PID TTY STAT TIME COMMAND
1 ? S 0:07 init [3]
2 ? SW 0:00 [keventd]
3 ? RWN 0:12 [ksoftirqd_CPU0]
4 ? SWN 0:14 [ksoftirqd_CPU1]
5 ? SW 6:02 [kswapd]
6 ? SW 0:00 [kreclaimd]
7 ? Z 9:20 [bdflush <defunct>]
8 ? SW 83:42 [kupdated]
9 ? SW 0:15 [pagebuf_daemon]
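The bdflush state shown above can also be checked without ps, by reading
/proc directly. This is a minimal sketch, assuming the usual Linux
/proc/<pid>/stat layout (pid, then the command name in parentheses, then a
one-letter state such as R, S or Z); the helper names proc_state and
read_state are hypothetical, not part of any existing tool.

```python
def proc_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so we split after the *last* ')' instead of on whitespace.
    """
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def read_state(pid):
    """Read /proc/<pid>/stat and return the process state letter.

    Assumes a Linux system with /proc mounted; raises an OSError
    subclass if the pid does not exist.
    """
    with open("/proc/%d/stat" % pid) as f:
        return proc_state(f.read())
```

For example, read_state(7) would return "Z" for the defunct bdflush (pid 7)
listed above.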
However, recently it started having problems. The first was a strange
"crash": the machine responded to ping, and connecting to its services
returned "connection established", but nothing else worked, not even
opening a new SSH session. I even filtered all traffic to it at an upstream
router, allowing only my machine, but had no luck. Trying to get a console
on it didn't work either (the screen stayed blank). The second problem was
3 days ago, when one of our httpd servers died and I couldn't restart it
because the kernel said "address already in use", although netstat showed
no other process listening on that port (I even waited 2 hours). I have hit
this second problem with 2.4.9-13 and 2.4.9-21 too, but very rarely (about
once every 3 months).
Thanks
----------------------------
Mihai RUSU