[RFC] Store inner MTU in dst_pmtu for IPsec

To: netdev@xxxxxxxxxxx
Subject: [RFC] Store inner MTU in dst_pmtu for IPsec
From: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 19 Apr 2004 22:56:29 +1000

WARNING: This patch is completely untested.

Various people have reported issues with TCP MTU handling in the native
IPsec stack.  I found that if the MTU of your interface or route takes
certain values, then xfrm4_tunnel_check_size() is guaranteed to fail
when given a full-sized packet.

The reason is that it uses trailer_len unconditionally, even though
trailer_len is calculated for a packet of one particular size (zero),
while the actual amount of ESP padding varies with the size of the
packet being encrypted.

For example, in the typical ESP/3DES/IPIP scenario, this all works if
the MTU is congruent to 3 or 4 modulo 8, but all other values are doomed
to fail.  Luckily, the most common MTUs, 1500 and 1492, happen to satisfy
this condition (both are 4 modulo 8).  But the MTU of 1480 (IPIP over
Ethernet), which is 0 modulo 8, fails.

The obvious fix is to use get_mss to calculate the correct size.  However,
since get_mss is rather slow, I realised that its value should be cached
rather than recomputed for every outgoing packet.

It soon dawned on me that the entire MTU handling for IPsec is broken.
In particular, we simply ignore ICMP need-to-frag packets for peers
that are across an IPsec tunnel.  This is because the dst_pmtu value
for IPsec dsts comes from the route to the IPsec gateway, not the
route to the peer across the IPsec tunnel, so an MTU learned for the
peer never affects packets sent through the tunnel.

This patch solves the problem in the obvious way by setting the path
to the peer's route instead of that of the IPsec gateway.

I haven't even compiled this patch, so there are bound to be problems
with it.  I have also ignored IPv6 for now.  But I'd like to know
whether I'm completely off my head before proceeding any further :)

