Received: with ECARTIS (v1.0.0; list netdev); Sun, 15 May 2005 03:41:05 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j4FAevOv012735 for ; Sun, 15 May 2005 03:40:58 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DXGXi-0004VA-00; Sun, 15 May 2005 20:40:18 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DXGXg-0006L0-00; Sun, 15 May 2005 20:40:16 +1000 Date: Sun, 15 May 2005 20:40:16 +1000 To: Evgeniy Polyakov Cc: netdev@oss.sgi.com, davem@davemloft.net Subject: [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames Message-ID: <20050515104016.GA24344@gondor.apana.org.au> References: <20050514134834.GA2698@uganda.factory.vocord.ru> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="envbJBWh7q8WU6mo" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.6+20040907i From: Herbert Xu X-archive-position: 1136 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 2623 Lines: 75 --envbJBWh7q8WU6mo Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Sun, May 15, 2005 at 06:01:09PM +1000, Herbert Xu wrote: > > In fact you've uncovered a couple of bugs elsewhere too. IPv4's > ip_push_pending_frames attributes all the data to the first skb > which breaks when the packet is sent through ip_fragment since > the latter does not undo the truesize attribution. skb_linearize > is broken as well since it ignores truesize/owner altogether. Here is a patch that addresses the first issue. Since frag_list skb's constructed by ip*_push_pending_frames are usually going to be fragmented in ip*_fragment and sent separately, it makes no sense to account them all in the head skb. Doing so means that ip*_fragment would have to undo it all in order to keep the correct wmem accounting. In fact as it is IPv4 is attributing all the memory to the head skb in ip_push_pending_frames and not undoing the attribution correctly in ip_fragment. On the fast path it will not undo the work at all which means that as soon as the head skb is transmitted the quota which may still be used by the other skb's can be used by new data. On the slow path it double-charges the skb's which may unnecssarily delay new data from being sent. This patch simply leaves the wmem accounting with each skb. Signed-off-by: Herbert Xu Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --envbJBWh7q8WU6mo Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=p --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1138,10 +1138,6 @@ int ip_push_pending_frames(struct sock * tail_skb = &(tmp_skb->next); skb->len += tmp_skb->len; skb->data_len += tmp_skb->len; - skb->truesize += tmp_skb->truesize; - __sock_put(tmp_skb->sk); - tmp_skb->destructor = NULL; - tmp_skb->sk = NULL; } /* Unless user demanded real pmtu discovery (IP_PMTUDISC_DO), we allow --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1116,12 +1116,6 @@ int ip6_push_pending_frames(struct sock tail_skb = &(tmp_skb->next); skb->len += tmp_skb->len; skb->data_len += tmp_skb->len; -#if 0 /* Logically correct, but useless work, ip_fragment() will have to undo */ - skb->truesize += tmp_skb->truesize; - __sock_put(tmp_skb->sk); - tmp_skb->destructor = NULL; - tmp_skb->sk = NULL; -#endif } ipv6_addr_copy(final_dst, &fl->fl6_dst); --envbJBWh7q8WU6mo--