
Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics

To: Andi Kleen <ak@xxxxxx>
Subject: Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics
From: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 29 Mar 2005 09:56:48 -0600
Cc: Rik van Riel <riel@xxxxxxxxxx>, Dmitry Yusupov <dmitry_yus@xxxxxxxxx>, mpm@xxxxxxxxxxx, andrea@xxxxxxx, michaelc@xxxxxxxxxxx, open-iscsi@xxxxxxxxxxxxxxxx, ksummit-2005-discuss@xxxxxxxxx, netdev@xxxxxxxxxxx
In-reply-to: <20050329152008.GD63268@xxxxxx>
References: <424346FE.20704@xxxxxxxxxxx> <20050324233921.GZ14202@xxxxxxxxxxxxxx> <20050325034341.GV32638@xxxxxxxxx> <20050327035149.GD4053@xxxxxxxxx> <20050327054831.GA15453@xxxxxxxxx> <1111905181.4753.15.camel@mylaptop> <20050326224621.61f6d917.davem@xxxxxxxxxxxxx> <Pine.LNX.4.61.0503272245350.30885@xxxxxxxxxxxxxxxxxxxxxxxxxxx> <m1zmwn21hk.fsf@xxxxxx> <1112027284.5531.27.camel@mulgrave> <20050329152008.GD63268@xxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
On Tue, 2005-03-29 at 17:20 +0200, Andi Kleen wrote:
> > Actually, not in 2.6 ... we had the same issue in SCSI using mempools
> for sglist allocation.  All of the mempool allocation paths now take gfp_
> flags, so you can specify GFP_ATOMIC for interrupt context.
> 
> Just does not work when you are actually short of memory.
> 
> Just think for a second about how a mempool works: in the extreme
> case when it cannot allocate system memory anymore it has
> to wait for someone else to free a memory block into the mempool,
> then pass it on to the next allocator, etc. Basically
> it is a direct bypass pipeline that passes memory
> directly from one high-priority user to another. This only
> works with sleeping. Otherwise you could not handle an arbitrary
> number of users with a single mempool.
> 
> So to get a reliable mempool you have to sleep on allocation.

But that's not what we use them for.  You are confusing reliability with
forward progress.

In SCSI we use GFP_ATOMIC mempools in order to make forward progress.
All the paths are coded to expect a failure (in which case we requeue).
For forward progress what we need is the knowledge that there are n
resources out there dedicated to us.  When they return they get
reallocated straight to us and we can restart the queue processing
(there's actually a SCSI trigger that does this).

For receive mempools, the situation is much the same; if you have n
reserved buffers, then you have to drop the (n+1)th packet.  However,
the resources will free up and go back to your mempool, and eventually
you accept the packet on retransmit.

The killer scenario (and why we require a mempool) is that someone else
gets the memory before you but then becomes blocked on another
allocation, so now you have no more allocations to allow forward
progress.

James


> > The object isn't to make the queues *reliable* it's to ensure the system
> > can make forward progress.  So all we're trying to ensure is that the
> > sockets used to service storage have some probability of being able to
> > send and receive packets during low memory.
> 
> For that it is enough to make the sender reliable. Retransmit
> takes care of the rest.

No ... we cannot get into the situation where GFP_ATOMIC always
fails.  At that point we have no receive capacity at all and the system
deadlocks.

> > In your scenario, if we're out of memory and the system needs several
> > ACK's to the swap device for pages to be released to the system, I don't
> > see how we make forward progress since without a reserved resource to
> > allocate from how does the ack make it up the stack to the storage
> > driver layer?
> 
> Typically because the RX ring of the driver has some packets left.
> 
> Also since TCP is very persistent and there is some memory
> activity left you will have at least occasionally a time slot
> where a GFP_ATOMIC allocation can succeed.

That's what I think a mempool is required to guarantee.  Without it,
there are scenarios where GFP_ATOMIC always fails.

James


