|
Dear Networking Teak,
I am a system engineer in a Chinese software
development company, and currently I am implementing IPSec on Linux version
2.2.16. Generally speaking, what I am doing is adding internet key exchange and
IPsec processing functions into the Linux ip_queue_xmit, ip_build_xmit and
ip_local_deliver procedures. After I have built up a new sk_buff variable, I
send it off using my own function send_skbuff(), this is where the problem
occurs. Following is the complete copy of the culprit:
void send_skbuff(struct sk_buff
*skb) { struct rtable *rt; struct iphdr
*iph;
iph = skb->nh.iph;
if(ip_route_output(&rt, iph->daddr,
iph->saddr, IPTOS_LOWDELAY, 0)) goto
drop; skb->dst = dst_clone(&rt->u.dst);
skb->priority = TC_PRIO_FILLER; if
(skb_cloned(skb)) skb = skb_copy(skb,
GFP_ATOMIC); else skb = skb_clone(skb,
GFP_ATOMIC); if (skb->len >
rt->u.dst.pmtu) goto fragment;
skb->dst->output(skb); return; fragment:
ip_fragment(skb, skb->dst->output); return; drop:
kfree_skb(skb); }
All functions within the above procedure are
kernel supplied and irrelevant to the problem, so I will skip its
explanation. I use the kgdb to debug my codes. Everything worked fine until I
had crossed the skb->dst->output(skb) statement and hit upon return. Then,
when I typed “next” on host machine, the system crashed with
“Scheduling in interrupt!” appearing on target machine and the
following message on my host machine:
Program received signal SIGSEGV, Segmentation
fault. Schedule () at sched.c: 875 875 *(int *)0 = 0
Leery of some low level codes within the
“skb->dst->output(skb)” provoked the crash, I step into that
statement which points to function ip_output. Alas, the system crashed with the
same reason upon the first statement of procedure ip_output, and that statement
does simple book keeping! The problem is patent: scheduling while within the
interrupt processing, the myth is how could it ever be scheduling within
interrupt since I am not doing anything within the interrupt context!? It is a
well known fact that within Linux there are three places to evoke scheduling,
either it is educed explicitly by the procedure itself, which won’t be
within interrupt context as long as the caller is not within it; or it can be
elicited when the current process’s time quantum is exhausted or when a
newly awakened process has a higher priority than the current one, during these
two situations, the system will mark the need_resched flag, and the actual
scheduling is performed when the system returns from a interrupt, again within
the context of the process that is to be scheduled away. Therefore normally
there is no chance to be scheduling in interrupt. Where is the hole? The problem
must be caused by my codes, shouldn’t it? By the way, I implemented the
send_skbuff months ago, and together with other codes they had been working
fine. This scheduling in interrupt problem only recently popped up when I was
debugging error processing for IPsec and the place it occurred is irrelevant to
the error processing. Any comments will be greatly appreciated.
Regards,
Lee Tong
|