File: [Development] / linux-2.6-xfs / Documentation / x86_64 / Attic / machinecheck

Revision 1.3, Wed Sep 12 17:09:56 2007 UTC (10 years, 1 month ago) by tes.longdrop.melbourne.sgi.com
Branch: MAIN
Changes since 1.2: +7 -5 lines

Update 2.6.x-xfs to 2.6.23-rc4.

Also update fs/xfs with external mainline changes.
There were 12 such missing commits that I detected:

--------
commit ad690ef9e690f6c31f7d310b09ef1314bcec9033
Author: Al Viro <viro@ftp.linux.org.uk>
    xfs ioctl __user annotations

commit 20c2df83d25c6a95affe6157a4c9cac4cf5ffaac
Author: Paul Mundt <lethal@linux-sh.org>
    mm: Remove slab destructors from kmem_cache_create().

commit d0217ac04ca6591841e5665f518e38064f4e65bd
Author: Nick Piggin <npiggin@suse.de>
    mm: fault feedback #1

commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7
Author: Nick Piggin <npiggin@suse.de>
    mm: merge populate and nopage into fault (fixes nonlinear)

commit d00806b183152af6d24f46f0c33f14162ca1262a
Author: Nick Piggin <npiggin@suse.de>
    mm: fix fault vs invalidate race for linear mappings

commit a569425512253992cc64ebf8b6d00a62f986db3e
Author: Christoph Hellwig <hch@infradead.org>
    knfsd: exportfs: add exportfs.h header

commit 831441862956fffa17b9801db37e6ea1650b0f69
Author: Rafael J. Wysocki <rjw@sisk.pl>
    Freezer: make kernel threads nonfreezable by default

commit 8e1f936b73150f5095448a0fee6d4f30a1f9001d
Author: Rusty Russell <rusty@rustcorp.com.au>
    mm: clean up and kernelify shrinker registration

commit 5ffc4ef45b3b0a57872f631b4e4ceb8ace0d7496
Author: Jens Axboe <jens.axboe@oracle.com>
    sendfile: remove .sendfile from filesystems that use generic_file_sendfile()

commit 8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d
Author: Rafael J. Wysocki <rjw@sisk.pl>
    Add suspend-related notifications for CPU hotplug

commit 59c51591a0ac7568824f541f57de967e88adaa07
Author: Michael Opdenacker <michael@free-electrons.com>
    Fix occurrences of "the the "

commit 0ceb331433e8aad9c5f441a965d7c681f8b9046f
Author: Dmitriy Monakhov <dmonakhov@openvz.org>
    mm: move common segment checks to separate helper function
--------
Merge of 2.6.x-xfs-melb:linux:29656b by kenmcd.

Configurable sysfs parameters for the x86-64 machine check code.

Machine checks report internal hardware error conditions detected
by the CPU. Uncorrected errors typically cause a machine check
exception (often with a panic); corrected ones cause a machine check
log entry.

Machine checks are organized in banks (normally associated with
a hardware subsystem) and subevents in a bank. The exact meaning
of the banks and subevents is CPU specific.

mcelog knows how to decode them.

When the "Machine check errors logged" message appears in the system
log, mcelog should be run to collect and decode the machine check entries
from /dev/mcelog. Normally mcelog should be run regularly from a cron job.

Each CPU has a directory /sys/devices/system/machinecheck/machinecheckN
(N = CPU number).

The directory contains the following configurable entries:

bankNctl
(N = bank number)
	64-bit hexadecimal bitmask enabling/disabling specific subevents for
	bank N. When a bit in the bitmask is zero, the respective subevent
	will not be reported.
	By default all events are enabled.
	Note that the BIOS maintains another mask to disable specific events
	per bank. That mask is not visible here.
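
To illustrate the bitmask semantics described above, here is a minimal
Python sketch that decodes a bankNctl value into the subevent bit numbers
that are enabled, and computes a new mask with one subevent disabled. The
function names are hypothetical, and the sketch assumes the value is read
as a hexadecimal string (with or without a leading 0x).

```python
MASK_BITS = 64  # bankNctl is a 64-bit mask


def parse_mask(text):
    """Parse the hex string read from a bankNctl file into an integer."""
    return int(text.strip(), 16) & ((1 << MASK_BITS) - 1)


def enabled_subevents(mask):
    """Return the bit numbers of the subevents that will be reported."""
    return [bit for bit in range(MASK_BITS) if mask & (1 << bit)]


def disable_subevent(mask, bit):
    """Return a copy of the mask with one subevent bit cleared."""
    if not 0 <= bit < MASK_BITS:
        raise ValueError("bit must be in the range 0..63")
    return mask & ~(1 << bit)


if __name__ == "__main__":
    mask = parse_mask("ffffffffffffffff")  # default: all events enabled
    mask = disable_subevent(mask, 3)       # stop reporting subevent bit 3
    print(format(mask, "x"))               # prints fffffffffffffff7
```

The resulting hex string is what would be written back to the bankNctl
entry. What a given bit means depends on the CPU, so bit 3 here is only a
placeholder.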

The following entries appear for each CPU, but they are truly shared
between all CPUs.

check_interval
	How often to poll for corrected machine check errors, in seconds
	(note that the output is hexadecimal). Default: 5 minutes. When the
	poller finds MCEs, it shortens the polling interval exponentially
	(polls more often). When the poller stops finding MCEs, it backs
	off exponentially (polls less often). The check_interval value is
	both the initial and the maximum polling interval.
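
The polling behaviour can be sketched in Python as follows. This is only
an illustration: the text says the speedup and backoff are exponential but
gives no factor or lower bound, so the factor of 2 and the floor of one
second used here are assumptions, and the function names are hypothetical.

```python
def parse_check_interval(text):
    """Convert the hexadecimal string read from check_interval to seconds."""
    return int(text.strip(), 16)


def next_interval(current, found_mce, check_interval, floor=1):
    """Return the next polling interval in seconds.

    The factor of 2 and the floor of 1 second are assumptions made for
    this sketch; the documentation only says the change is exponential.
    """
    if found_mce:
        return max(current // 2, floor)      # speed up: poll more often
    return min(current * 2, check_interval)  # back off, capped at the maximum


if __name__ == "__main__":
    maximum = parse_check_interval("12c")    # 0x12c = 300 s = 5 minutes
    interval = maximum                       # also the initial interval
    for found in (True, True, False, False, False):
        interval = next_interval(interval, found, maximum)
        print(interval)                      # 150, 75, 150, 300, 300
```

Because check_interval is both the starting point and the cap, the
interval never grows beyond the configured value, no matter how long the
poller goes without finding an error.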

tolerant
	Tolerance level. When a machine check exception occurs for an
	uncorrected error, the kernel can take different actions.
	Since machine check exceptions can happen at any time, it is
	sometimes risky for the kernel to kill a process because doing so
	defies normal kernel locking rules. The tolerance level configures
	how hard the kernel tries to recover even at some risk of
	deadlock. Higher tolerant values trade potentially better uptime
	for the risk of a crash or even corruption (for tolerant >= 3).

	0: always panic on uncorrected errors, log corrected errors
	1: panic or SIGBUS on uncorrected errors, log corrected errors
	2: SIGBUS or log uncorrected errors, log corrected errors
	3: never panic or SIGBUS, log all errors (for testing only)

	Default: 1

	Note that this only makes a difference if the CPU allows recovery
	from a machine check exception. Current x86 CPUs generally do not.
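
As a small illustration of the levels listed above, the following Python
sketch maps the content of the tolerant entry to its description. The
helper names are hypothetical. The value is parsed as hexadecimal on the
assumption that it is printed the same way as check_interval; for the
values 0 to 3 this makes no difference.

```python
import os

# Descriptions copied from the list of tolerance levels above.
TOLERANT_LEVELS = {
    0: "always panic on uncorrected errors, log corrected errors",
    1: "panic or SIGBUS on uncorrected errors, log corrected errors",
    2: "SIGBUS or log uncorrected errors, log corrected errors",
    3: "never panic or SIGBUS, log all errors (for testing only)",
}

# Path pattern from the text, with N = 0; the entry is shared by all CPUs.
TOLERANT_PATH = "/sys/devices/system/machinecheck/machinecheck0/tolerant"


def describe_tolerant(text):
    """Return (level, description) for the string read from tolerant."""
    level = int(text.strip(), 16)
    if level not in TOLERANT_LEVELS:
        raise ValueError("unknown tolerant level: %d" % level)
    return level, TOLERANT_LEVELS[level]


if __name__ == "__main__":
    # Only read the entry if this system actually provides it.
    if os.path.exists(TOLERANT_PATH):
        with open(TOLERANT_PATH) as f:
            print("%d: %s" % describe_tolerant(f.read()))
    else:
        print("%d: %s" % describe_tolerant("1"))  # the documented default
```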

trigger
	Program to run when a machine check event is detected.
	This is an alternative to running mcelog regularly from cron
	and allows events to be detected faster.
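
Setting the trigger amounts to writing a program path into the entry. The
Python sketch below shows the idea under a few assumptions: the helper
names and the program path /usr/local/sbin/mce-trigger are hypothetical,
the requirement for an absolute path is an assumption rather than
something stated here, and writing to the real entry needs root
privileges. The demonstration therefore runs against a scratch directory.

```python
import os
import tempfile

# Path pattern from the text, with N = 0; the entry is shared by all CPUs.
DEFAULT_DIR = "/sys/devices/system/machinecheck/machinecheck0"


def set_trigger(program, cpu_dir=DEFAULT_DIR):
    """Write the trigger program path into the trigger entry."""
    # Assumption: the kernel is given a full path, so reject relative ones.
    if not os.path.isabs(program):
        raise ValueError("trigger program should be an absolute path")
    with open(os.path.join(cpu_dir, "trigger"), "w") as f:
        f.write(program)


def get_trigger(cpu_dir=DEFAULT_DIR):
    """Return the currently configured trigger program."""
    with open(os.path.join(cpu_dir, "trigger")) as f:
        return f.read().strip()


if __name__ == "__main__":
    # Use a scratch directory instead of the real sysfs tree so that the
    # sketch can be run without root privileges.
    with tempfile.TemporaryDirectory() as scratch:
        set_trigger("/usr/local/sbin/mce-trigger", scratch)
        print(get_trigger(scratch))
```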

TBD: document entries for AMD threshold interrupt configuration.

For more details about the x86 machine check architecture
see the Intel and AMD architecture manuals from their developer websites.

For more details about the architecture, see
http://one.firstfloor.org/~andi/mce.pdf