Let me slightly hijack this thread to throw out another topic that I
think is worth talking about at the kernel summit: handling remote DMA
(RDMA) network technologies.
As some of you might know, I'm one of the main authors of the
InfiniBand support in the kernel, and I think we have things fairly
well in hand there, although handling direct userspace access to RDMA
capabilities may raise some issues worth talking about.
However, there is also RDMA-over-TCP hardware beginning to be used,
based on the specs from the IETF rddp working group and the RDMA
Consortium. I would hope that we can abstract out the common pieces
for InfiniBand and RDMA NIC (RNIC) support and morph
drivers/infiniband into a more general drivers/rdma.
This is not _that_ offtopic, since RDMA NICs provide another way of
handling OOM for iSCSI. By having the NIC handle the network
transport through something like iSER, you avoid a lot of the issues
in this thread. Having to reconnect to a target while OOM is still a
problem, but it seems no worse in principle than the issues with a
dumb FC card that needs the host driver to handle fabric login.
I know that in the InfiniBand world, people have been able to run
stress tests of storage over SCSI RDMA Protocol (SRP) with very heavy
swapping going on and no deadlocks. SRP is in effect network storage
with the transport handled by the IB hardware.
However there are some sticky points that I would be interested in
discussing. For example, the IETF rddp drafts envisage what they call
a "dual stack" model: TCP connections are set up by the usual network
stack and run for a while in "streaming" mode until the application is
ready to start using RDMA. At that point there is an "MPA"
negotiation and then the socket is handed over to the RNIC. Clearly
moving the state from the kernel's stack to the RNIC is not trivial.
Other developers who have more direct experience with RNIC hardware or
perhaps just strong opinions may have other things in this area that
they'd like to talk about.