On Tue, Sep 03, 2013 at 03:46:24PM -0500, Mark Tinguely wrote:
> On 09/03/13 15:04, Dave Chinner wrote:
> >On Tue, Sep 03, 2013 at 08:07:19AM -0500, Mark Tinguely wrote:
> >>On 09/02/13 17:20, Dave Chinner wrote:
> >>>On Mon, Sep 02, 2013 at 12:03:37PM -0500, Mark Tinguely wrote:
> >>>>On 09/02/13 05:52, Dave Chinner wrote:
> >>>>>Hi folks,
> >>>>>These failures are a result of order-4 allocations being done on v5
> >>>>>filesystems to support the large ACL count xattrs. The first patch
> >>>>>puts our usual fallback to vmalloc workaround in place. The second
> >>>>>patch factors all the places we now have this fallback-to-vmalloc
> >>>>>and makes it transparent to the callers.
> >>>>Thanks for clean up. Broken record time: Do we really need order
> >>>>allocation in the filesystem? Esp in xfs_ioctl.c.
> >>>I don't understand your question. Are you asking why we need high
> >>>order allocation?
> >>In patch 2, why not drop the physically contiguous allocation
> >>attempt and just do the virtually contiguous allocation?
> > a) virtual memory space is extremely limited on some
> > platforms - we regularly get people reporting that they've
> > exhausted vmalloc space on 32 bit systems.
> > b) when there is free contiguous memory, allocating that
> > contiguous memory is much faster than allocating
> > virtual memory.
> > c) virtual memory access is slower than physical memory
> > access and it puts pressure on the page tables.
> >IOWs, we want to avoid allocating virtual memory if at all possible.
> Ummm, it is all virtual memory; it all runs through page tables. The
> MMU works on virtual addresses.
Sure, but there's a massive difference between kmalloc and
vmalloc memory...
> It appears Linux has a special range of kernel virtual memory for
> the physical contiguous allocations and range for sparse memory
... kernel memory is directly mapped as there is a 1:1 relationship
between the virtual address of kernel memory and the physical
location of the memory. vmalloc memory has an arbitrary virtual to
physical mapping that has to be looked up in a separate structure on
every page fault, same as for any userspace page fault. vmalloc
space can become fragmented, be exhausted, etc, just like kmalloc
memory can be. Indeed, there are situations where vmalloc can fail
yet kmalloc will succeed...
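
As a rough illustration of the direct-mapped versus arbitrary-mapping
distinction above, here is a toy userspace model. It is not kernel
code: DIRECT_MAP_BASE, struct toy_vmap_area and both helper functions
are hypothetical names invented for this sketch, and the flat frame
array only stands in for real page tables. It shows that one
translation is plain arithmetic while the other needs a per-page
lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u
#define PAGE_SIZE  (1ull << PAGE_SHIFT)

/* Hypothetical base of the direct-mapped region (illustrative value only). */
#define DIRECT_MAP_BASE 0xC0000000ull

/* Direct map: virtual -> physical is a single subtraction, no lookup. */
static uint64_t direct_virt_to_phys(uint64_t vaddr)
{
    assert(vaddr >= DIRECT_MAP_BASE);
    return vaddr - DIRECT_MAP_BASE;
}

/* Toy stand-in for the mapping structure behind a vmalloc-style area. */
struct toy_vmap_area {
    uint64_t base;          /* first virtual address of the area */
    size_t nr_pages;        /* number of pages mapped */
    const uint64_t *frames; /* physical frame number for each virtual page */
};

/* Arbitrary mapping: every page needs its own lookup in the table. */
static uint64_t vmap_virt_to_phys(const struct toy_vmap_area *area,
                                  uint64_t vaddr)
{
    assert(vaddr >= area->base);
    uint64_t off = vaddr - area->base;
    size_t idx = (size_t)(off >> PAGE_SHIFT);
    assert(idx < area->nr_pages);
    /* Combine the looked-up frame with the offset inside the page. */
    return (area->frames[idx] << PAGE_SHIFT) | (off & (PAGE_SIZE - 1));
}
```

In this model, neighbouring virtual pages of a toy_vmap_area can land
on unrelated physical frames, which is why physically scattered pages
are enough to back it.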
> XFS does not need the physical space that backs the kernel virtual
> address to be contiguous - other parts of the kernel do. Why put
> pressure on the drivers that need order allocations when we do not
> need it?
Let's just quote Linus from 2003, shall we:
| > I think it'd make more sense to only use vmalloc when it's explicitly
| > too big for kmalloc - or simply switch on num_online_cpus > 100 or
| > whatever a sensible cutoff is (ie nobody but you would ever see this ;-))
| No, please please please don't do these things.
| vmalloc() is NOT SOMETHING YOU SHOULD EVER USE! It's only valid for when
| you _need_ a big array, and you don't have any choice. It's slow, and it's
| a very restricted resource: it's a global resource that is literally
| restricted to a few tens of megabytes. It should be _very_ carefully used.
| There are basically no valid new uses of it. There's a few valid legacy
| users (I think the file descriptor array), and there are some drivers that
| use it (which is crap, but drivers are drivers), and it's _really_ valid
| only for modules. Nothing else.
| Basically: if you think you need more memory than a kmalloc() can give,
| you need to re-organize your data structures. To either not need a big
| area, or to be able to allocate it in chunks.
Linus will say exactly the same thing today....
And that doesn't take into account that vmalloc() or vm_map_ram()
inside an XFS transaction context can deadlock: the page table
entries are allocated with GFP_KERNEL, and this can happen while the
caller is in a GFP_NOFS context.
Seriously, there are excellent reasons for vmalloc being considered
a bad thing and hence its use is actively discouraged across
the entire kernel space. We only use it where absolutely necessary
to function correctly.
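
For reference, the "try contiguous first, fall back only on failure"
pattern discussed in this thread can be sketched as below. This is a
userspace model under stated assumptions, not the code from the
patches: toy_alloc_large(), toy_contig_alloc(), toy_virt_alloc() and
TOY_CONTIG_LIMIT are hypothetical names, malloc() stands in for both
kernel allocators, and the size limit merely simulates a fragmented
system where large contiguous requests fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: contiguous requests above 4 pages fail (simulated fragmentation). */
#define TOY_PAGE_SIZE    4096u
#define TOY_CONTIG_LIMIT (4u * TOY_PAGE_SIZE)

/* Hypothetical handle recording which path satisfied the request. */
struct toy_buf {
    void *ptr;
    bool from_fallback; /* true if the vmalloc-like path was used */
};

/* Allocator callback type; returns NULL on failure. */
typedef void *(*toy_alloc_fn)(size_t size);

/* Stand-in for a physically contiguous allocator that can fail. */
static void *toy_contig_alloc(size_t size)
{
    return size > TOY_CONTIG_LIMIT ? NULL : malloc(size);
}

/* Stand-in for a virtually contiguous allocator. */
static void *toy_virt_alloc(size_t size)
{
    return malloc(size);
}

/* Try the contiguous allocator first; use the fallback only if it fails. */
static struct toy_buf toy_alloc_large(size_t size, toy_alloc_fn contig,
                                      toy_alloc_fn fallback)
{
    struct toy_buf buf = { NULL, false };

    assert(size > 0);
    buf.ptr = contig(size);
    if (buf.ptr == NULL) {
        buf.ptr = fallback(size);
        buf.from_fallback = (buf.ptr != NULL);
    }
    return buf;
}

/* Both toy paths use malloc(), so one free() suffices in this model. */
static void toy_free_large(struct toy_buf *buf)
{
    free(buf->ptr);
    buf->ptr = NULL;
}
```

With these stand-ins, a one-page request is served by the contiguous
path, while an order-4 sized request (16 pages of 4 KiB) takes the
fallback. In the kernel the free side additionally has to pick the
free routine matching the allocator that was used, which this model
does not need to do.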