Message-ID: <2D6CADE9B0C6D411A27500508BB3CBD0013C792F@eseis15nok>
From: Imran.Patel@nokia.com
To: netdev@oss.sgi.com
Subject: pmtu discovery [PATCH]
Date: Thu, 19 Apr 2001 22:01:29 +0300

Hello everybody,

While working on my NAT-PT module, I found that PMTU discovery in Linux
does not handle one peculiar case: an ICMPv6 "Packet Too Big" message that
reports a PMTU below 1280. Normally that should never happen, but it can
when an IPv6 host is talking to an IPv4 host across a translator and there
is a link with an MTU below 1280 on the IPv4 side. However, RFC 2460 has a
solution to this problem:

   In response to an IPv6 packet that is sent to an IPv4 destination
   (i.e., a packet that undergoes translation from IPv6 to IPv4), the
   originating IPv6 node may receive an ICMP Packet Too Big message
   reporting a Next-Hop MTU less than 1280.
   In that case, the IPv6 node is not required to reduce the size of
   subsequent packets to less than 1280, but must include a Fragment
   header in those packets so that the IPv6-to-IPv4 translating router
   can obtain a suitable Identification value to use in resulting IPv4
   fragments. Note that this means the payload may have to be reduced
   to 1232 octets (1280 minus 40 for the IPv6 header and 8 for the
   Fragment header), and smaller still if additional extension headers
   are used.

I've taken a shot at solving this, but I am not sure whether it is the
best way to do it. When I get a Packet Too Big message with an MTU below
1280, I set mtu = 1272 (1280 minus the Fragment header size; the RFC says
it should be 1280). The cleaner alternative would be to define a boolean
flag such as RTF_TO_IPV4 and set mtu = 1280.

In any case, I set the MTU to 1272 because 8 bytes are going to be added
to every unfragmented packet (except for the packets that will be
fragmented). This keeps the code simpler and lets me distinguish this
case without defining the flag. When the packet is going to be fragmented
anyway, I pass the MTU as 1280. So it is something like:

  for unfragmented packets: mtu = 1272, plus a Fragment header
      (so all packets from 0 to 1272 bytes are handled OK)
  for fragmented packets:   mtu = 1280
      (all packets larger than 1272 bytes are handled here)

I have attached the diffs of ip6_output.c and route.c (latest versions in
CVS) against the 2.4.3 versions. I am also not sure which other places
this may affect (maybe TCP!). The patch is completely untested (I don't
even know if it compiles), as I am not sure about the approach. Comments,
so that I can improve it?
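The 1272/1280 scheme described above can be sketched as a few standalone
C helpers. This is only an illustration of the arithmetic, not the kernel
code: the function names and the FRAG_HDR_LEN and IPV6_HDR_LEN constants
are made up for the example, and only IPV6_MIN_MTU corresponds to a name
used in the patch.

```c
/* Illustrative sketch only; helper names are hypothetical, not kernel API. */

#define IPV6_MIN_MTU 1280u /* minimum IPv6 link MTU (RFC 2460) */
#define IPV6_HDR_LEN 40u   /* fixed IPv6 header (assumed constant name) */
#define FRAG_HDR_LEN 8u    /* IPv6 Fragment header (assumed constant name) */

/* PMTU to store on the route when a Packet Too Big message arrives.
 * A reported value below 1280 is stored as 1272, which doubles as the
 * marker for "translator on the path". */
unsigned int clamp_reported_pmtu(unsigned int reported)
{
	if (reported < IPV6_MIN_MTU)
		return IPV6_MIN_MTU - FRAG_HDR_LEN;
	return reported;
}

/* Nonzero if an unfragmented packet on this route must carry a
 * Fragment header (i.e. the stored MTU is the 1272 marker). */
int needs_frag_hdr(unsigned int route_mtu)
{
	return route_mtu < IPV6_MIN_MTU;
}

/* MTU handed to the fragmentation path: fragments already carry a
 * Fragment header, so they may use the full 1280 bytes. */
unsigned int frag_path_mtu(unsigned int route_mtu)
{
	return route_mtu < IPV6_MIN_MTU ? IPV6_MIN_MTU : route_mtu;
}

/* Largest payload that still fits in one unfragmented packet. */
unsigned int max_unfragmented_payload(unsigned int route_mtu)
{
	return route_mtu - IPV6_HDR_LEN;
}
```

With a stored MTU of 1272, the largest unfragmented payload comes out as
1272 - 40 = 1232 octets, which matches the figure in the RFC text once the
8-byte Fragment header is added on top.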

imran

--- /home/pis/route.c Thu Apr 19 13:55:03 2001
+++ net/ipv6/route.c Thu Apr 19 14:08:02 2001
@@ -1002,10 +1002,9 @@
 	struct rt6_info *rt, *nrt;
 
 	if (pmtu < IPV6_MIN_MTU) {
-		if (net_ratelimit())
-			printk(KERN_DEBUG "rt6_pmtu_discovery: invalid MTU value %d\n",
+		pmtu = IPV6_MIN_MTU - sizeof(struct frag_hdr);
+		printk(KERN_DEBUG "rt6_pmtu_discovery: MTU = %u < IPV6_MIN_MTU",
 			       pmtu);
-		return;
 	}
 
 	rt = rt6_lookup(daddr, saddr, dev->ifindex, 0);
--- net/ipv6/ip6_output.c Thu Apr 19 18:10:19 2001
+++ /home/pis/ip6_output.c Thu Apr 19 13:54:43 2001
@@ -353,9 +353,6 @@
 		last_len += opt->opt_flen;
 	}
 
-	if(mtu < IPV6_MIN_MTU)
-		mtu = IPV6_MIN_MTU;
-
 	/*
 	 * Length of fragmented part on every packet but
 	 * the last must be an:
@@ -626,9 +623,6 @@
 		err = 0;
 		if (flags&MSG_PROBE)
 			goto out;
-
-		if(mtu < IPV6_MIN_MTU)
-			pktlength += sizeof(struct frag_hdr);
 
 		skb = sock_alloc_send_skb(sk, pktlength + 15 +
 					  dev->hard_header_len,
@@ -647,28 +641,16 @@
 		skb->nh.ipv6h = hdr;
 
 		if (!sk->protinfo.af_inet.hdrincl) {
-			u8 *prev_hdr = &hdr->nexthdr;
 			ip6_bld_1(sk, skb, fl, hlimit,
 				  jumbolen ? sizeof(struct ipv6hdr) : pktlength);
 
 			if (opt || jumbolen) {
+				u8 *prev_hdr = &hdr->nexthdr;
 				prev_hdr = ipv6_build_nfrag_opts(skb, prev_hdr, opt, final_dst, jumbolen);
-
-				if(mtu < IPV6_MIN_MTU)
-					prev_hdr = ipv6_build_fraghdr(skb, prev_hdr, 0);
-
 				if (opt && opt->opt_flen)
 					ipv6_build_frag_opts(skb, prev_hdr, opt);
 			}
-			else if(mtu < IPV6_MIN_MTU) {
-				prev_hdr = ipv6_build_fraghdr(skb, prev_hdr, 0);
-			}
 		}
-
-		/* drop it ??? */
-		else if(mtu < IPV6_MIN_MTU)
-			goto out;
-
 
 		skb_put(skb, length);
 		err = getfrag(data, &hdr->saddr,