
Re: [IPX]: Fix checksum computation.

To: Arnaldo Carvalho de Melo <acme@xxxxxxxxxxxxxxxx>
Subject: Re: [IPX]: Fix checksum computation.
From: Stephen Hemminger <shemminger@xxxxxxxx>
Date: Fri, 31 Oct 2003 16:38:43 -0800
Cc: Joe Perches <joe@xxxxxxxxxxx>, netdev@xxxxxxxxxxx
In-reply-to: <20031031213159.GO3705@xxxxxxxxxxxxxxxx>
Organization: Open Source Development Lab
References: <200310312006.h9VK62Hh005910@xxxxxxxxxxxxxxx> <1067635446.11564.92.camel@xxxxxxxxxxxxxxxxxxxxx> <20031031213159.GO3705@xxxxxxxxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
Okay, here is the standard (from Inside AppleTalk):

> The DDP checksum is provided to detect errors caused by faulty operation
> (such as memory and data bus errors) within routers on the internet.
> Implementers of DDP should treat generation of the checksum as an
> optional feature. The 16-bit DDP checksum is computed as follows:
>
>   CkSum := 0 ;
>   FOR each datagram byte starting with the byte immediately following
>   the Checksum field
>   REPEAT the following algorithm:
>           CkSum := CkSum + byte; (unsigned addition)
>           Rotate CkSum left one bit, rotating the most significant bit
>           into the least significant bit;
>   IF, at the end, CkSum = 0 THEN
>           CkSum := $FFFF (all ones).
>
> Reception of a datagram with CkSum equal to 0 implies that a checksum is
> not performed.
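
For reference, here is a literal transcription of that pseudocode into C.
It is only a sketch for checking other loops against, not kernel code: the
names ddp_checksum_ref and rotl16_1 are made up for illustration, and it
assumes the caller passes a pointer to the byte right after the Checksum
field. Keeping the sum in a 16-bit type makes the "unsigned addition" drop
its carry automatically.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 16-bit value left by one bit: the old bit 15 becomes bit 0. */
static uint16_t rotl16_1(uint16_t v)
{
    return (uint16_t)((v << 1) | (v >> 15));
}

/*
 * Hypothetical reference routine following the quoted pseudocode.
 * data: first byte after the Checksum field (assumed), len: byte count.
 */
uint16_t ddp_checksum_ref(const uint8_t *data, size_t len)
{
    uint16_t cksum = 0;

    while (len--) {
        /* 16-bit unsigned addition: any carry out of bit 15 is dropped */
        cksum = (uint16_t)(cksum + *data++);
        cksum = rotl16_1(cksum);
    }
    /* A result of 0 is sent as all ones; 0 on the wire means "no checksum" */
    return cksum ? cksum : 0xFFFF;
}
```

A single byte 0x01 gives 0x0002 (add, then rotate), and an empty buffer
gives 0xFFFF because of the final zero check.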

Here is the old loop:

        while (len--) {
                sum += *data;
                sum <<= 1;
                if (sum & 0x10000) {
                        sum++;
                        sum &= 0xffff;
                }
                data++;
        }

My buggy loop is:

        while (len--) {
                sum += *data++;
                sum <<= 1;
                sum = ((sum >> 16) + sum) & 0xFFFF;
        }

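The problem is that the carry out of the first addition needs to be
dropped, not folded back in (as the IP checksum does). Only the bit that
the shift pushes out of bit 15 should wrap around into bit 0.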
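
A one-step trace makes the difference visible. The sketch below is
illustrative only: step_fold_all and step_drop_carry are hypothetical
names, the first wrapping the body of the buggy loop and the second doing
the add in 16 bits followed by a plain rotate, as the standard describes.

```c
#include <stdint.h>

/* One iteration of the buggy loop: everything above bit 15 is folded in. */
uint32_t step_fold_all(uint32_t sum, uint8_t byte)
{
    sum += byte;
    sum <<= 1;
    return ((sum >> 16) + sum) & 0xFFFF;
}

/* One iteration per the standard: drop the add carry, then rotate left. */
uint32_t step_drop_carry(uint32_t sum, uint8_t byte)
{
    sum = (sum + byte) & 0xFFFF;                 /* carry out of bit 15 dropped */
    return ((sum << 1) | (sum >> 15)) & 0xFFFF;  /* bit 15 wraps to bit 0 */
}
```

With sum = 0xFFFF and byte = 0x01 the addition yields 0x10000 and the shift
0x20000. Folding adds the 2 back in and returns 0x0002, while dropping the
carry returns 0x0000. When the addition does not overflow (for example
sum = 0x8000, byte = 0x00) both return 0x0001.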

Corrected fast code is:

        while (len--) {
                sum += *data++;
                sum <<= 1;
                sum = (((sum & 0x10000) >> 16) + sum) & 0xffff;
        }

At least it is correct on the standalone random data test, and the
new code is 30% faster for the cached memory case (13.7 clks/byte vs 18 
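
A standalone random-data check along these lines can be sketched as
follows. This is not the actual test program: sum_old, sum_fast,
next_random and count_mismatches are hypothetical names, the old loop is
written out in full with a sum++ in the carry branch (treat that full form
as an assumption), and a small fixed-seed generator stands in for whatever
random source was really used. It checks agreement only and measures no
timing.

```c
#include <stddef.h>
#include <stdint.h>

/* Old-style loop. ASSUMPTION: full form, with sum++ wrapping the
 * shifted-out bit back into bit 0. */
static uint16_t sum_old(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len--) {
        sum += *data++;
        sum <<= 1;
        if (sum & 0x10000) {
            sum++;
            sum &= 0xffff;
        }
    }
    return (uint16_t)sum;   /* only the low 16 bits are meaningful */
}

/* Branch-free loop using the corrected expression. */
static uint16_t sum_fast(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len--) {
        sum += *data++;
        sum <<= 1;
        sum = (((sum & 0x10000) >> 16) + sum) & 0xffff;
    }
    return (uint16_t)sum;
}

/* Hypothetical helper: tiny linear congruential generator, so runs repeat. */
static uint32_t next_random(uint32_t *state)
{
    *state = *state * 1103515245u + 12345u;
    return *state >> 16;
}

/* Feed both loops the same random buffers; return how often they differ. */
unsigned count_mismatches(unsigned trials, uint32_t seed)
{
    uint8_t buf[600];
    unsigned bad = 0;

    for (unsigned t = 0; t < trials; t++) {
        size_t len = 1 + next_random(&seed) % sizeof buf;

        for (size_t i = 0; i < len; i++)
            buf[i] = (uint8_t)next_random(&seed);
        if (sum_old(buf, len) != sum_fast(buf, len))
            bad++;
    }
    return bad;
}
```

Calling count_mismatches(1000, 1) should return 0 if the two loops agree
on every buffer.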
