Alex Bligh - linux-kernel <linux-kernel@xxxxxxxxxxx> writes:
> > It uses vmalloc only when __GFP_VMALLOC flag is given - and so it is
> > expected to not use __GFP_VMALLOC flag in IRQ.
> Ah OK. If your point is that people use GFP_ATOMIC when it's
> not needed, and demand physically contiguous memory when only
> virtually contiguous memory is needed, in several places in
> the kernel, then you are correct. [I am not convinced that
> vmalloc() is the best way to fix it though.]
> Most of the order>0 users of __get_free_pages() don't
> 'need' to do that. For instance I was convinced that networking
> code needed this for larger than 4k packets (pre-fragmentation
> or post-defragmentation) until someone pointed out that
> the kiovec stuff was there, waiting to be used, if someone
> made the code changes. But the code changes are non-trivial.
The zero copy stuff introduced in 2.4.4 allows for skb fragments.
I haven't seen any of the network drivers using it on their receive
path but it should be possible.
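
To make the idea concrete, here is a rough userspace sketch of a buffer
built from single-page fragments instead of one physically contiguous
order>0 block. Everything in it (struct frag, struct frag_buf,
frag_buf_fill(), FRAG_SIZE, MAX_FRAGS) is a made-up name for
illustration and is not the skb or kiovec API. It also copies the data,
so it only shows the layout, not zero copy itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAG_SIZE 4096   /* assumed page size */
#define MAX_FRAGS 16     /* assumed upper bound on fragments per buffer */

/* Hypothetical fragment descriptor: one order-0 page plus a length. */
struct frag {
    unsigned char *page;
    size_t len;
};

/* Hypothetical fragmented buffer: a list of single pages. */
struct frag_buf {
    struct frag frags[MAX_FRAGS];
    size_t nr_frags;
    size_t total_len;
};

/* Number of single-page fragments needed to hold len bytes. */
static size_t frags_needed(size_t len)
{
    return (len + FRAG_SIZE - 1) / FRAG_SIZE;
}

/*
 * Spread a linear payload across caller-supplied single pages.
 * Returns 0 on success, -1 if there are too few pages or too many
 * fragments would be required.
 */
static int frag_buf_fill(struct frag_buf *fb, unsigned char *pages[],
                         size_t nr_pages, const unsigned char *data,
                         size_t len)
{
    size_t need = frags_needed(len);

    if (need > nr_pages || need > MAX_FRAGS)
        return -1;

    fb->nr_frags = 0;
    fb->total_len = 0;
    for (size_t i = 0; i < need; i++) {
        size_t chunk = len - fb->total_len;

        if (chunk > FRAG_SIZE)
            chunk = FRAG_SIZE;
        memcpy(pages[i], data + fb->total_len, chunk);
        fb->frags[i].page = pages[i];
        fb->frags[i].len = chunk;
        fb->total_len += chunk;
        fb->nr_frags++;
    }
    return 0;
}
```

The point is only that a 9000 byte packet needs three independent 4k
pages here, none of which has to be adjacent to the others.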
> Note also that something (not sure what) has made fragmentation
> increasingly prevalent over the years since the buddy allocator
> was originally put in.
Actually it seems to be situations like the stack now being two pages
> (see my earlier patch for measuring
> fragmentation). There is currently /no/ intelligence in there
> to defragment stuff, and the 'light touch' patches (ideas I had
> and posted here) don't appear to work. If we want __get_free_pages
> to allocate order>0 this is possible to do reliably if we
> have some intelligent form of page out which attempts
> to defragment as it runs, or else run a defragmenter. It's also possible
> to allocate order>0 GFP_ATOMIC far more reliably than at
> present if we had a target for defragmentation under normal
> operation, just like we retain a target for pages reserved
> for atomic allocation.
> The very original buddy code (circa 94/95 which I wrote) maintained
> that there should be (from memory) at least one entry on a high
> order list (I think it was the 64k list), which gave you a few
> guaranteed 8k allocations (which was what I was interested in). It's
> trivial to patch this into __get_free_pages though I haven't
> tried this (i.e. rather than just look at total free pages,
> look at the existence of a page on either the order=4, 5, 6...
> queues). Note you will use memory less efficiently if you do
> this. In times of cheaper memory costs, it might be worth
> testing this approach again.
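
For what it's worth, the check described above could look roughly like
the following. This is a toy userspace model, not the actual
__get_free_pages() code: struct free_lists, reserve_ok(), MAX_ORDER
and the 4k page size are all assumptions made for the sketch. With 4k
pages the 64k list corresponds to order 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ORDER 10   /* assumed number of buddy orders in this model */

/* Hypothetical model: nr_free[o] counts free blocks of 2^o pages. */
struct free_lists {
    unsigned long nr_free[MAX_ORDER];
};

/* Total free pages, i.e. the quantity a plain watermark check looks at. */
static unsigned long total_free_pages(const struct free_lists *fl)
{
    unsigned long pages = 0;

    for (size_t o = 0; o < MAX_ORDER; o++)
        pages += fl->nr_free[o] << o;
    return pages;
}

/* True if at least one free block exists at min_order or above. */
static bool has_high_order_block(const struct free_lists *fl,
                                 unsigned int min_order)
{
    for (size_t o = min_order; o < MAX_ORDER; o++)
        if (fl->nr_free[o] > 0)
            return true;
    return false;
}

/*
 * Combined check: enough free pages in total AND at least one entry
 * on a high-order list, so a few small order>0 requests can still be
 * satisfied by splitting it.
 */
static bool reserve_ok(const struct free_lists *fl,
                       unsigned long min_pages, unsigned int min_order)
{
    return total_free_pages(fl) >= min_pages &&
           has_high_order_block(fl, min_order);
}
```

A system with 100 free order-0 pages and nothing else passes a
50 page watermark but fails reserve_ok() for order 4, which is exactly
the fragmented case the plain page count cannot see.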
> Alex Bligh