File: [Development] / linux-2.6-xfs / Documentation / ia64 / aliasing.txt (download)
Revision 1.3, Wed Sep 12 17:09:56 2007 UTC (10 years, 1 month ago) by tes.longdrop.melbourne.sgi.com
Branch: MAIN
CVS Tags: HEAD Changes since 1.2: +12 -0
lines
Update 2.6.x-xfs to 2.6.23-rc4.
Also update fs/xfs with external mainline changes.
There were 12 such missing commits that I detected:
--------
commit ad690ef9e690f6c31f7d310b09ef1314bcec9033
Author: Al Viro <viro@ftp.linux.org.uk>
xfs ioctl __user annotations
commit 20c2df83d25c6a95affe6157a4c9cac4cf5ffaac
Author: Paul Mundt <lethal@linux-sh.org>
mm: Remove slab destructors from kmem_cache_create().
commit d0217ac04ca6591841e5665f518e38064f4e65bd
Author: Nick Piggin <npiggin@suse.de>
mm: fault feedback #1
commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7
Author: Nick Piggin <npiggin@suse.de>
mm: merge populate and nopage into fault (fixes nonlinear)
commit d00806b183152af6d24f46f0c33f14162ca1262a
Author: Nick Piggin <npiggin@suse.de>
mm: fix fault vs invalidate race for linear mappings
commit a569425512253992cc64ebf8b6d00a62f986db3e
Author: Christoph Hellwig <hch@infradead.org>
knfsd: exportfs: add exportfs.h header
commit 831441862956fffa17b9801db37e6ea1650b0f69
Author: Rafael J. Wysocki <rjw@sisk.pl>
Freezer: make kernel threads nonfreezable by default
commit 8e1f936b73150f5095448a0fee6d4f30a1f9001d
Author: Rusty Russell <rusty@rustcorp.com.au>
mm: clean up and kernelify shrinker registration
commit 5ffc4ef45b3b0a57872f631b4e4ceb8ace0d7496
Author: Jens Axboe <jens.axboe@oracle.com>
sendfile: remove .sendfile from filesystems that use generic_file_sendfile()
commit 8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d
Author: Rafael J. Wysocki <rjw@sisk.pl>
Add suspend-related notifications for CPU hotplug
commit 59c51591a0ac7568824f541f57de967e88adaa07
Author: Michael Opdenacker <michael@free-electrons.com>
Fix occurrences of "the the "
commit 0ceb331433e8aad9c5f441a965d7c681f8b9046f
Author: Dmitriy Monakhov <dmonakhov@openvz.org>
mm: move common segment checks to separate helper function
--------
Merge of 2.6.x-xfs-melb:linux:29656b by kenmcd.
|
MEMORY ATTRIBUTE ALIASING ON IA-64
Bjorn Helgaas
<bjorn.helgaas@hp.com>
May 4, 2006
MEMORY ATTRIBUTES
Itanium supports several attributes for virtual memory references.
The attribute is part of the virtual translation, i.e., it is
contained in the TLB entry. The ones of most interest to the Linux
kernel are:
WB Write-back (cacheable)
UC Uncacheable
WC Write-coalescing
System memory typically uses the WB attribute. The UC attribute is
used for memory-mapped I/O devices. The WC attribute is uncacheable
like UC is, but writes may be delayed and combined to increase
performance for things like frame buffers.
The Itanium architecture requires that we avoid accessing the same
page with both a cacheable mapping and an uncacheable mapping[1].
The design of the chipset determines which attributes are supported
on which regions of the address space. For example, some chipsets
support either WB or UC access to main memory, while others support
only WB access.
MEMORY MAP
Platform firmware describes the physical memory map and the
supported attributes for each region. At boot-time, the kernel uses
the EFI GetMemoryMap() interface. ACPI can also describe memory
devices and the attributes they support, but Linux/ia64 currently
doesn't use this information.
The kernel uses the efi_memmap table returned from GetMemoryMap() to
learn the attributes supported by each region of physical address
space. Unfortunately, this table does not completely describe the
address space because some machines omit some or all of the MMIO
regions from the map.
The kernel maintains another table, kern_memmap, which describes the
memory Linux is actually using and the attribute for each region.
This contains only system memory; it does not contain MMIO space.
The kern_memmap table typically contains only a subset of the system
memory described by the efi_memmap. Linux/ia64 can't use all memory
in the system because of constraints imposed by the identity mapping
scheme.
The efi_memmap table is preserved unmodified because the original
boot-time information is required for kexec.
KERNEL IDENTITY MAPPINGS
Linux/ia64 identity mappings are done with large pages, currently
either 16MB or 64MB, referred to as "granules." Cacheable mappings
are speculative[2], so the processor can read any location in the
page at any time, independent of the programmer's intentions. This
means that to avoid attribute aliasing, Linux can create a cacheable
identity mapping only when the entire granule supports cacheable
access.
Therefore, kern_memmap contains only full granule-sized regions that
can referenced safely by an identity mapping.
Uncacheable mappings are not speculative, so the processor will
generate UC accesses only to locations explicitly referenced by
software. This allows UC identity mappings to cover granules that
are only partially populated, or populated with a combination of UC
and WB regions.
USER MAPPINGS
User mappings are typically done with 16K or 64K pages. The smaller
page size allows more flexibility because only 16K or 64K has to be
homogeneous with respect to memory attributes.
POTENTIAL ATTRIBUTE ALIASING CASES
There are several ways the kernel creates new mappings:
mmap of /dev/mem
This uses remap_pfn_range(), which creates user mappings. These
mappings may be either WB or UC. If the region being mapped
happens to be in kern_memmap, meaning that it may also be mapped
by a kernel identity mapping, the user mapping must use the same
attribute as the kernel mapping.
If the region is not in kern_memmap, the user mapping should use
an attribute reported as being supported in the EFI memory map.
Since the EFI memory map does not describe MMIO on some
machines, this should use an uncacheable mapping as a fallback.
mmap of /sys/class/pci_bus/.../legacy_mem
This is very similar to mmap of /dev/mem, except that legacy_mem
only allows mmap of the one megabyte "legacy MMIO" area for a
specific PCI bus. Typically this is the first megabyte of
physical address space, but it may be different on machines with
several VGA devices.
"X" uses this to access VGA frame buffers. Using legacy_mem
rather than /dev/mem allows multiple instances of X to talk to
different VGA cards.
The /dev/mem mmap constraints apply.
mmap of /proc/bus/pci/.../??.?
This is an MMIO mmap of PCI functions, which additionally may or
may not be requested as using the WC attribute.
If WC is requested, and the region in kern_memmap is either WC
or UC, and the EFI memory map designates the region as WC, then
the WC mapping is allowed.
Otherwise, the user mapping must use the same attribute as the
kernel mapping.
read/write of /dev/mem
This uses copy_from_user(), which implicitly uses a kernel
identity mapping. This is obviously safe for things in
kern_memmap.
There may be corner cases of things that are not in kern_memmap,
but could be accessed this way. For example, registers in MMIO
space are not in kern_memmap, but could be accessed with a UC
mapping. This would not cause attribute aliasing. But
registers typically can be accessed only with four-byte or
eight-byte accesses, and the copy_from_user() path doesn't allow
any control over the access size, so this would be dangerous.
ioremap()
This returns a mapping for use inside the kernel.
If the region is in kern_memmap, we should use the attribute
specified there.
If the EFI memory map reports that the entire granule supports
WB, we should use that (granules that are partially reserved
or occupied by firmware do not appear in kern_memmap).
If the granule contains non-WB memory, but we can cover the
region safely with kernel page table mappings, we can use
ioremap_page_range() as most other architectures do.
Failing all of the above, we have to fall back to a UC mapping.
PAST PROBLEM CASES
mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
The EFI memory map may not report these MMIO regions.
These must be allowed so that X will work. This means that
when the EFI memory map is incomplete, every /dev/mem mmap must
succeed. It may create either WB or UC user mappings, depending
on whether the region is in kern_memmap or the EFI memory map.
mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
See https://bugzilla.novell.com/show_bug.cgi?id=140858.
The EFI memory map reports the following attributes:
0x00000-0x9FFFF WB only
0xA0000-0xBFFFF UC only (VGA frame buffer)
0xC0000-0xFFFFF WB only
This mmap is done with user pages, not kernel identity mappings,
so it is safe to use WB mappings.
The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
which uses a granule-sized UC mapping. This granule will cover some
WB-only memory, but since UC is non-speculative, the processor will
never generate an uncacheable reference to the WB-only areas unless
the driver explicitly touches them.
mmap of 0x0-0xFFFFF legacy_mem by "X"
If the EFI memory map reports that the entire range supports the
same attributes, we can allow the mmap (and we will prefer WB if
supported, as is the case with HP sx[12]000 machines with VGA
disabled).
If EFI reports the range as partly WB and partly UC (as on sx[12]000
machines with VGA enabled), we must fail the mmap because there's no
safe attribute to use.
If EFI reports some of the range but not all (as on Intel firmware
that doesn't report the VGA frame buffer at all), we should fail the
mmap and force the user to map just the specific region of interest.
mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
The EFI memory map reports the following attributes:
0x00000-0xFFFFF WB only (no VGA MMIO hole)
This is a special case of the previous case, and the mmap should
fail for the same reason as above.
read of /sys/devices/.../rom
For VGA devices, this may cause an ioremap() of 0xC0000. This
used to be done with a UC mapping, because the VGA frame buffer
at 0xA0000 prevents use of a WB granule. The UC mapping causes
an MCA on HP sx[12]000 chipsets.
We should use WB page table mappings to avoid covering the VGA
frame buffer.
NOTES
[1] SDM rev 2.2, vol 2, sec 4.4.1.
[2] SDM rev 2.2, vol 2, sec 4.4.6.