Ok, false alarm.
The behavior the kernel exhibits is the same as it has always been.
I went back about 10 kernels with the same iproute2 code, up to around
2.6.8.
It's narrowed down to a user space problem. Investigating...
I also found that the kernel does send NLMSG_DONE; somehow
user space misses it.
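
For reference, here is a minimal sketch of how a user space dump reader usually detects the end of a multipart reply. It is not the iproute2 code: dump_chunk_done() and read_dump() are hypothetical names, and the sketch uses plain recv() with a fixed buffer rather than recvmsg(). It only uses the NLMSG_OK/NLMSG_NEXT macros from <linux/netlink.h>. If the NLMSG_DONE header is skipped while walking a chunk, the loop goes back into recv() and blocks, which would match the wait_for_packet trace quoted below.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Walk one chunk of a netlink dump reply (the bytes returned by a single
 * receive call). Returns 1 if the chunk contains NLMSG_DONE, 0 if the
 * dump continues, -1 on NLMSG_ERROR or on leftover/truncated bytes.
 * buf must be aligned for struct nlmsghdr.
 */
int dump_chunk_done(const void *buf, size_t len)
{
    const struct nlmsghdr *h = buf;
    int remaining = (int)len;

    while (NLMSG_OK(h, remaining)) {
        if (h->nlmsg_type == NLMSG_DONE)
            return 1;
        if (h->nlmsg_type == NLMSG_ERROR)
            return -1;
        /* NLMSG_NEXT also subtracts the aligned length from remaining. */
        h = NLMSG_NEXT(h, remaining);
    }
    /* Bytes left over that do not form a whole message: treat as an error. */
    return remaining ? -1 : 0;
}

/*
 * Read from an already-open netlink socket until the dump is complete.
 * Returns 0 once NLMSG_DONE is seen, -1 on error. If NLMSG_DONE is never
 * recognised, this blocks in recv() indefinitely.
 */
int read_dump(int fd)
{
    union {
        char bytes[16384];
        struct nlmsghdr align; /* forces suitable alignment */
    } buf;

    for (;;) {
        ssize_t n = recv(fd, buf.bytes, sizeof(buf.bytes), 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1; /* unexpected EOF on the netlink socket */

        int status = dump_chunk_done(buf.bytes, (size_t)n);
        if (status != 0)
            return status < 0 ? -1 : 0;
    }
}
```

Note that the sketch treats leftover bytes as an error rather than silently dropping them; a reader that ignores a short or truncated chunk could lose the NLMSG_DONE header in exactly this way.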
cheers,
jamal
On Fri, 2005-03-25 at 13:34, jamal wrote:
> On Fri, 2005-03-25 at 12:27, Patrick McHardy wrote:
>
> > What does ps -eo args,wchan show?
> >
>
> It shows tc stuck on wait_for_packet; dump is:
>
> ------
> tc S C06493A0 0 20153 20074 (NOTLB)
> c3e4fc1c 00000086 c4ea8d70 c06493a0 000005b4 00000000 00000000 00000000
> 00000000 00000000 00000000 00022e09 b5edbac0 000f48bb c4ea8d70 c4ea8ed8
> 00000000 7fffffff c3e4fca0 c3e4fc78 c04b28d4 c015a52d cffebc80 c3e4fc44
> Call Trace:
> [<c04b28d4>] schedule_timeout+0xd4/0xe0
> [<c03ae4f0>] wait_for_packet+0xb0/0x110
> [<c03ae6a3>] skb_recv_datagram+0x153/0x220
> [<c03eef68>] netlink_recvmsg+0x58/0x210
> [<c03a70ac>] sock_recvmsg+0xcc/0xf0
> [<c03a8c9b>] sys_recvmsg+0x13b/0x200
> [<c03a8f8d>] sys_socketcall+0x22d/0x240
> [<c0103c0d>] sysenter_past_esp+0x52/0x75
> ------
>
> user space is stuck in recvmsg(). It seems to be waiting for an
> NLMSG_DONE to complete the transaction - but that never comes.
>
> One thing I've verified so far is that it has nothing to do with the
> module replay code. I also doubt it has anything to do with locks in
> the kernel. It's also possible that something changed in iproute2
> that causes it to get stuck waiting for NLMSG_DONE.
>
> cheers,
> jamal
>
>
>