I spotted this discussion on the main kernel list:
-------- Original Message --------
Subject: lockless poll() (was Re: namei() query)
Date: Mon, 24 Apr 2000 21:36:00 +0900
To: Linus Torvalds <torvalds@xxxxxxxxxxxxx>
CC: Manfred Spraul <manfreds@xxxxxxxxxxxxxxxx>,
In the heavy-duty case, csum_partial_copy_generic() becomes the new
most time-consuming function once the poll() optimization is applied.
We are putting together the overall figures now.
Though csum_partial_copy_generic() is highly optimized, hand-crafted
code, it eats lots of time. That may be inevitable, but it may also
be reducible. We are now investigating why it takes so long.
Has much thought been given to using hardware checksums on transmit?
If someone could sketch out how it should be architected I'll give it a