On Sun, Nov 14, 2004 at 09:01:14PM +0100, Florian Weimer wrote:
> * John Heffner:
>
> > of the TCP/UDP checksum is to detect errors occurring outside the
> > protection of the link layer checksums -- errors when data is reassembled
> > or copied across busses inside hosts and routers.
>
> The IP checksum is quite bad at catching those, though. Broken memory
> banks or busses tend to introduce bit errors in distances which are
> multiples of 16 bits (something like 64 or 256). Because of the way
> the IP checksum works, two such errors in the same packet cancel out
> and go undetected.
> I was once on the receiving end of such packets, and I can tell you
> it's not a fun thing to debug. 8-(
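To make the cancellation concrete, here is a minimal sketch of the 16-bit one's-complement sum (in the style of RFC 1071) applied to a made-up 16-byte payload. The payload, the helper names and the chosen bit positions are assumptions for illustration only. It shows one case of the effect: two flips at the same bit position of two 16-bit words, 64 bits apart and in opposite directions (0 -> 1 and 1 -> 0), leave the checksum unchanged, while a single flip is caught.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold the carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def flip_bit(buf: bytearray, bit_index: int) -> None:
    """Flip one bit; bit_index counts from the most significant bit of byte 0."""
    buf[bit_index // 8] ^= 0x80 >> (bit_index % 8)


# Hypothetical payload: four all-zero words followed by four all-one words.
packet = bytearray(b"\x00" * 8 + b"\xff" * 8)
good = internet_checksum(bytes(packet))

# One flipped bit changes the checksum, so it is detected.
single = bytearray(packet)
flip_bit(single, 3)
single_detected = internet_checksum(bytes(single)) != good

# Two flips 64 bits apart: 0 -> 1 in word 0 and 1 -> 0 in word 4.
# Both hit the same bit position within a 16-bit word, so the sum is unchanged.
double = bytearray(packet)
flip_bit(double, 3)
flip_bit(double, 3 + 64)
double_detected = internet_checksum(bytes(double)) != good

print("single flip detected:", single_detected)
print("double flip detected:", double_detected)
```

Two flips in the same direction at that distance would still change the sum, so this is not every double error, only the pairs that happen to offset each other.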
Btw., "When the CRC and TCP Checksum Disagree"
http://citeseer.ist.psu.edu/stone00when.html is well worth reading.
It doesn't go into the offload vs. host IP checksum case too heavily, though;
I'm not sure anyone really has data on that. The impression I have is
that the risk isn't that big. If you're having flipped bits in
your (non-ECC :-) ) memory, you lose. If your PCI bus flips bits,
you probably lose when the data is read off disk. If your NIC has a
bad checksum engine, well... Then the IP checksums end up bad on the remote
end, packets get dropped, people tend to notice and that chip gets host-based
checksums soon enough.
What definitely would make sense is using user-space checksums (or just
transmitting the output of a PRNG plus the seed and comparing the streams)
in driver/hardware stress testing, and testing all those corner cases which
the driver/NIC might have gotten wrong.
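
A rough sketch of the PRNG idea, assuming both ends run the same Python version so that random.Random produces the same stream for the same seed. The function names, the seed value and the stream length are made up for illustration, and the actual transport through the driver/NIC under test is left out: the receiver regenerates the stream from the seed and reports the first byte that differs.

```python
import random


def prng_stream(seed: int, length: int) -> bytes:
    """Deterministic pseudo-random byte stream for a given seed."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))


def first_mismatch(seed: int, received: bytes) -> int:
    """Offset of the first corrupted byte, or -1 if the stream is intact."""
    expected = prng_stream(seed, len(received))
    for offset, (want, got) in enumerate(zip(expected, received)):
        if want != got:
            return offset
    return -1


seed = 20041114                     # hypothetical seed, sent along with the data
sent = prng_stream(seed, 4096)      # what the sender would push through the NIC

# Intact transfer: the receiver sees no difference.
clean_result = first_mismatch(seed, sent)

# Simulated corruption: a single bit flipped somewhere on the path.
corrupted = bytearray(sent)
corrupted[1000] ^= 0x01
corrupt_result = first_mismatch(seed, bytes(corrupted))

print("clean stream mismatch offset:", clean_result)
print("corrupted stream mismatch offset:", corrupt_result)
```

Unlike a checksum, the byte-for-byte comparison cannot be fooled by errors that offset each other, and it gives the offset of the damage, which helps when hunting for a pattern.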