Entry: Machine-check exception
Definition

  1. Problem types

  2. Possible causes

  3. Decoding MCEs

     Programs to decode Intel and AMD MCEs 

  4. See also

  5. References

  6. External links


A machine-check exception (MCE) is a type of computer hardware error that occurs when a computer's central processing unit detects an unrecoverable hardware error in the processor itself, the memory, the I/O devices, or on the system bus. It is not caused by software. The error usually results from component failure or from overheating or overclocking of hardware components. Most machine-check exceptions halt the operating system and require a restart before users can continue normal operation. Diagnosing the failure can be difficult because little information about the cause is captured.

Modern versions of Microsoft Windows on IA-32 and x86-64 processors handle machine-check exceptions through the Windows Hardware Error Architecture (WHEA). When WHEA detects a machine-check exception, it reports the error on a Blue Screen of Death as bug check 0x124 (WHEA_UNCORRECTABLE_ERROR). The bug check parameters vary, but the first parameter is always 0x0 for a machine-check exception.[1]

Older versions of Windows handle similar exceptions through the Machine Check Architecture. In this case the Blue Screen of Death shows stop code 0x0000009C (MACHINE_CHECK_EXCEPTION).[2]

On Linux, a process (such as klogd[3]) writes a message to the kernel log and/or the console screen (usually only to the console when the error is non-recoverable and the machine crashes as a result):

 CPU 0: Machine Check Exception: 0000000000000004
 Bank 2: f200200000000863
 Kernel panic: CPU context corrupt
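
As an illustration, the fields in a message like the one above can be pulled apart with a short script. This is a minimal Python sketch, not part of any tool named in this article: the function name parse_mce_log and the regular expression are hypothetical, and the pattern assumes the legacy message layout shown here (other kernel versions print machine checks differently). It also assumes that the value after "Machine Check Exception:" is the global machine-check status word and the value after "Bank N:" is that bank's status word.

```python
import re

# Hypothetical pattern for the legacy message layout shown above.
# \s+ also matches line breaks, so the fields may be on one line or several.
MCE_PATTERN = re.compile(
    r"CPU\s+(?P<cpu>\d+):\s+Machine Check Exception:\s+(?P<mcg_status>[0-9a-fA-F]+)"
    r"\s+Bank\s+(?P<bank>\d+):\s+(?P<bank_status>[0-9a-fA-F]+)"
)


def parse_mce_log(text):
    """Return CPU number, bank number and raw status words, or None if no match."""
    match = MCE_PATTERN.search(text)
    if match is None:
        return None
    return {
        "cpu": int(match["cpu"]),
        "mcg_status": int(match["mcg_status"], 16),   # assumed: global status word
        "bank": int(match["bank"]),
        "bank_status": int(match["bank_status"], 16),  # assumed: per-bank status word
    }


sample = (
    "CPU 0: Machine Check Exception: 0000000000000004\n"
    "Bank 2: f200200000000863\n"
    "Kernel panic: CPU context corrupt"
)
record = parse_mce_log(sample)
print(record)
```

Keeping the status words as integers rather than strings makes it straightforward to test individual bits later.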

Problem types

Most of these errors relate specifically to the Pentium processor family. Similar errors may occur on other processors and will cause similar problems.

Some of the main hardware problems that cause MCEs include:

  • System bus errors: errors in communication between the processor and the motherboard.
  • Memory errors: parity checking detects when a memory error has occurred. Error correction code (ECC) can correct limited memory errors so that processing can continue.
  • Cache errors in the processor.
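
The parity checking mentioned in the list above can be illustrated with a small Python sketch. It shows a single even-parity bit per byte, which is a simplification of what real memory hardware does; the function names are hypothetical. A single parity bit detects any odd number of flipped bits but cannot say which bit flipped, and it misses an even number of flips, which is why ECC is needed for correction.

```python
def parity_bit(byte):
    """Even-parity bit: 1 if the byte has an odd number of set bits, else 0."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte, stored_parity):
    """True if the byte still matches the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


data = 0b10110010                # four set bits, so the parity bit is 0
stored = parity_bit(data)

single_flip = data ^ 0b00001000  # one bit corrupted
double_flip = data ^ 0b00011000  # two bits corrupted

print(parity_ok(data, stored))         # intact data passes the check
print(parity_ok(single_flip, stored))  # single-bit error is detected
print(parity_ok(double_flip, stored))  # double-bit error goes unnoticed
```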

Possible causes

Machine checks are a hardware problem, not a software problem. They are often the result of overclocking or overheating, which either causes errors directly or drives the CPU to a thermal limit at which it must shut itself down to avoid permanent damage. They can also be caused by bus errors introduced by other failing components, including memory, I/O devices, and I/O controllers. Possible causes include:

  • Poor CPU cooling due to a CPU heatsink and fan that is clogged with dust or has come loose.
  • Overclocking beyond the highest clock rate at which the CPU is still reliable.
  • Failing motherboard.
  • Failing processor.
  • Failing memory.
  • Failing I/O controllers, on either the motherboard or separate cards.
  • Failing I/O devices.
  • Inadequate or failing power supply.
  • Poor case cooling due to inadequate or clogged case fans or filters.

Cooling problems are usually obvious upon inspection. A failing motherboard or processor can be identified by swapping in known-good parts. Memory can be checked by booting into a diagnostic tool, e.g., the Windows Memory Diagnostic utility on Windows. Non-essential I/O controllers and devices that may be failing can be identified by unplugging them where possible, or by disabling the devices and their drivers, to see whether the problem disappears. If failures typically occur either soon after the OS boots or not at all (or not for days), that may suggest a power supply problem: the failure occurs when power demand peaks as the OS wakes up physical drives and other devices.

Decoding MCEs

As noted previously, decoding MCEs can prove difficult. Normally the manufacturer (especially the processor manufacturer) can provide information about specific codes.

For IA-32 and Intel 64 processors, consult the Intel 64 and IA-32 Architectures Software Developer's Manual[4] Chapter 15 (Machine-Check Architecture), or the Microsoft KB Article on Windows Exceptions.[5]
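
As a rough illustration of what such decoding involves, the following Python sketch extracts the architectural flag bits from the two values in the Linux example earlier in this article. The bit positions follow the IA32_MCG_STATUS and IA32_MCi_STATUS layouts described in the Intel manual; the function names are hypothetical, and treating the two logged values as those registers is an assumption. The sketch stops at the top-level fields and does not interpret the error code itself, which is processor-specific and is what tools such as mcelog handle.

```python
# Flag bit positions in IA32_MCi_STATUS (per-bank status), Intel layout.
BANK_STATUS_FLAGS = {
    "VAL": 63,    # register holds valid error information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled for this bank
    "MISCV": 59,  # the bank's MISC register holds extra information
    "ADDRV": 58,  # the bank's ADDR register holds an error address
    "PCC": 57,    # processor context may be corrupt
}

# Flag bit positions in IA32_MCG_STATUS (global status).
MCG_STATUS_FLAGS = {
    "RIPV": 0,  # restart instruction pointer is valid
    "EIPV": 1,  # error instruction pointer is valid
    "MCIP": 2,  # a machine check is in progress
}


def extract_flags(value, layout):
    """Map each flag name in the layout to True/False for the given register value."""
    return {name: bool((value >> bit) & 1) for name, bit in layout.items()}


def decode_bank_status(status):
    """Split a per-bank status word into flags and its two 16-bit code fields."""
    return {
        "flags": extract_flags(status, BANK_STATUS_FLAGS),
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


bank = decode_bank_status(0xF200200000000863)
mcg = extract_flags(0x0000000000000004, MCG_STATUS_FLAGS)

print(bank["flags"])
print(hex(bank["mca_error_code"]))
print(mcg)
```

For the example value, the PCC flag is set, which appears consistent with the "CPU context corrupt" text in the kernel panic message.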

Programs to decode Intel and AMD MCEs

  • mcat: a Windows command-line program from AMD to decode MCEs from AMD K8, Family 0x10 and 0x11 processors.
  • mcelog:[6] a Linux daemon by Andi Kleen to handle MCEs for modern x86 processors. mcelog can also decode machine checks.
  • parsemce:[7] a Linux program by Dave Jones to decode MCEs from AMD K7 processors.
  • mced:[8] a Linux program by Tim Hockin to gather MCEs from the kernel and alert interested applications. It does not try to interpret the MCE data; it only alerts other programs.

See also

  • Machine check architecture

References

1. ^ "Bug Check 0x124: WHEA_UNCORRECTABLE_ERROR". MSDN. 2016-09-29. https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0x124---whea-uncorrectable-error. Retrieved 2017-07-13.
2. ^ "Bug Check 0x9C: MACHINE_CHECK_EXCEPTION". Microsoft Support. 2018-03-31. https://support.microsoft.com/en-au/help/162363/understanding-and-troubleshooting-the-stop-0x0000009c-screen. Retrieved 2018-03-31.
3. ^ Steve Lord, Greg Wettstein. "klogd(8) - Linux man page". https://linux.die.net/man/8/klogd. Retrieved 2017-07-13. "klogd is a system daemon which intercepts and logs Linux kernel messages."
4. ^ "Machine Check Architecture". Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide, Part 2. Intel Corporation. November 2018. https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-volume-3b-system-programming-guide-part-2.
5. ^ "Stop error message in Windows XP that you may receive: "0x0000009C (0x00000004, 0x00000000, 0xb2000000, 0x00020151)"". MSDN. 2015-12-07. https://support.microsoft.com/en-us/kb/329284. Retrieved 2017-07-13.
6. ^ "mcelog: Advanced hardware error handling for x86 Linux". 2015-04-20. http://www.mcelog.org/. Retrieved 2017-07-13.
7. ^ "parsemce: Linux Machine check exception handler parser". 2003-07-22. https://www.kernel.org/pub/linux/kernel/people/davej/tools/parsemce.c. Retrieved 2017-07-13.
8. ^ thockin/mcedaemon on GitHub. https://github.com/thockin/mcedaemon.

External links

  • mcelog: Advanced hardware error handling for x86 Linux (http://www.mcelog.org/)
  • parsemce: Linux Machine check exception handler parser (https://www.kernel.org/pub/linux/kernel/people/davej/tools/parsemce.c)

