Version 2 of the rfc/patch is attached. It has been changed
as indicated in the commentary below.
Diffstat:
include/linux/sysctl.h | 1
net/ipv4/ip_fragment.c | 195 +++++++++++++++++++++++++++++++++++++++++++++
net/ipv4/sysctl_net_ipv4.c | 11 ++
Signed-off-by: Arthur Kepner <akepner@xxxxxxx>
On Tue, 28 Jun 2005, Arthur Kepner wrote:
>
> On Sun, 26 Jun 2005, Herbert Xu wrote:
>
> >
> > > +struct ipc {
> > > ......
> > > + struct rcu_head rcu;
> >
> > Is RCU worth it here? The only time we'd be taking the locks on this
> > is when the first fragment of a packet comes in. At that point we'll
> > be taking write_lock(&ipfrag_lock) anyway.
> >
Yes, I think RCU is worth it here. Not using RCU would mean
grabbing the (global) ipfrag_lock an additional time, when we
free an ipc. Adding an "ipc" to the hashtable can be done under
the ipfrag_lock, as you mention. But removing an ipc shouldn't
be done at the same time that fragments are destroyed, because
in the common case another fragment queue will soon be created
for the same (src,dst,proto). It's better to keep the ipc
around for a while than to free it and immediately recreate it.
Since freeing the ipc has to be deferred until well after the
last associated fragment queue has been freed, we can't take
advantage of the fact that the ipfrag_lock is held when the
fragment queue is freed. So when finally freeing the ipc, we
must either grab the global ipfrag_lock again, or use some
other, finer-grained lock to protect the ipc_hash entries. I'd
prefer to avoid introducing new uses of global locks.
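To illustrate (this is a sketch, not the patch itself — the names
ipc_rcu_free, ipc_unlink, and the "list" member are my shorthand;
only "struct ipc" and its rcu_head appear in the patch), the
deferred-free path with per-bucket locks plus RCU would look
roughly like:

```c
/* Sketch only: unlink under the per-bucket lock (not ipfrag_lock),
 * then let RCU defer the actual kfree() past a grace period. */
static void ipc_rcu_free(struct rcu_head *head)
{
	struct ipc *ipc = container_of(head, struct ipc, rcu);
	kfree(ipc);
}

static void ipc_unlink(struct ipc *ipc, unsigned int hash)
{
	spin_lock_bh(&ipc_hash[hash].lock);	/* finer-grained than ipfrag_lock */
	hlist_del_rcu(&ipc->list);
	spin_unlock_bh(&ipc_hash[hash].lock);

	/* Lockless readers inside rcu_read_lock() may still hold a
	 * reference to the entry; it is freed only after they finish. */
	call_rcu(&ipc->rcu, ipc_rcu_free);
}
```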
If we use the finer-grained ipc_hash[].lock locks, then RCU
lets us avoid taking any lock in ipc_find when we create a new
fragment chain and an ipc already exists for the associated
(src,dst,proto). (I suspect this is a fairly common case.)
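For that common case, the lockless lookup would look something
like the following sketch (the saddr/daddr/protocol fields,
ipc_hashfn, and the refcount handling are illustrative
assumptions, not lifted from the patch; a real version must
also close the race between lookup and the final put):

```c
/* Sketch of an RCU-protected lookup: no lock taken on a hit. */
static struct ipc *ipc_find(u32 saddr, u32 daddr, u8 protocol)
{
	unsigned int hash = ipc_hashfn(saddr, daddr, protocol);
	struct hlist_node *n;
	struct ipc *ipc;

	rcu_read_lock();
	hlist_for_each_entry_rcu(ipc, n, &ipc_hash[hash].head, list) {
		if (ipc->saddr == saddr && ipc->daddr == daddr &&
		    ipc->protocol == protocol) {
			atomic_inc(&ipc->refcnt);
			rcu_read_unlock();
			return ipc;	/* common case: existing ipc, no lock */
		}
	}
	rcu_read_unlock();
	return NULL;	/* caller inserts a new ipc under the bucket lock */
}
```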
> > The only other use of RCU in your patch is ip_count. That should be
> > changed to be done in ip_defrag instead. At that point you can simply
> > find the ipc by dereferencing ipq, so no need for __ipc_find and hence
> > RCU.
> >
> > The reason you need to change it in this way is because you can't make
> > assumptions about ip_rcv_finish being the first place where a packet
> > is defragmented. With connection tracking enabled conntrack is the first
> > place where defragmentation occurs.
> >
> .....
This has been fixed. ip_input.c isn't changed by this version
of the patch. But there's the caveat that I mentioned earlier:
>
> There is a (big) advantage to doing this in ip_defrag() - this
> becomes a no-op for non-fragmented datagrams. The disadvantage
> is that there could be a situation where you receive:
>
> 1) first fragment of datagram X [for a particular (src,dst,proto)]
> 2) a zillion non-fragmented datagrams [for the same (src,dst,proto)]
> 3) last fragment of datagram X [for (src,dst,proto)]
>
> and no "disorder" would be detected for the datagrams associated
> with (src,dst,proto), even though the ip id could have wrapped in the
> meantime. This seems like a very uncommon case, however.
>
>
> > > +#define IPC_HASHSZ IPQ_HASHSZ
> > > +static struct {
> > > + struct hlist_head head;
> > > + spinlock_t lock;
> > > +} ipc_hash[IPC_HASHSZ];
> >
> > I'd store ipc entries in the main ipq hash table since they can use
> > the same keys for lookup as ipq entries. You just need to set protocol
> > to zero and map the user to values specific to ipc for ipc entries.
> > One mapping would be to set the top bit of user for ipc entries, e.g.
> >
> > #define IP_DEFRAG_IPC 0x80000000
> > ipc->user = ipq->user | IP_DEFRAG_IPC;
> >
> > Of course you also need to make sure that the two structures share
> > the leading elements. You can then use the user field to distinguish
> > between ipc/ipq entries.
>
I thought about this point, but I dislike reusing the same
structure for such different purposes, so I left this unchanged.
Comments?
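(For reference, my reading of the alternative you suggested — not
adopted here — is roughly the following; the struct name and field
layout are hypothetical, only the IP_DEFRAG_IPC bit and the idea
of shared leading elements come from your mail:)

```c
/* Illustration of the rejected alternative: ipc and ipq entries
 * share one hash table and their leading fields; the top bit of
 * 'user' distinguishes the two kinds of entry. */
#define IP_DEFRAG_IPC 0x80000000

struct frag_common {		/* hypothetical shared leading elements */
	struct hlist_node list;
	u32 saddr;
	u32 daddr;
	u8  protocol;		/* zero for ipc entries */
	u32 user;		/* IP_DEFRAG_IPC set => this is an ipc */
};
```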
--
Arthur
[Attachment: ip_reasm.patch.2]