Pentium F00F bug

Intel Pentium processor

The Pentium F00F bug is a design flaw in the majority of Intel Pentium, Pentium MMX, and Pentium OverDrive processors (all in the P5 microarchitecture). Discovered in 1997, it can result in the processor ceasing to function until the computer is physically rebooted. The bug has been circumvented through operating system updates.

The name is shorthand for F0 0F C7 C8, the hexadecimal encoding of one offending instruction.[1] More formally, the bug is called the invalid operand with locked CMPXCHG8B instruction bug.[2]

Description

In the x86 architecture, the byte sequence F0 0F C7 C8 represents the instruction lock cmpxchg8b eax (locked compare and exchange of 8 bytes in register EAX). The bug also applies to opcodes ending in C9 through CF, which specify register operands other than EAX. The F0 0F C7 C8 instruction does not require any special privileges.

This instruction encoding is invalid. The cmpxchg8b instruction compares the value in the EDX and EAX registers with an 8-byte value in a memory location. In this case, however, a register is specified instead of a memory location, which is not allowed.

Under normal circumstances, this would simply result in an exception; however, when used with the lock prefix (normally used to prevent two processors from interfering with the same memory location), the CPU erroneously uses locked bus cycles to read the illegal instruction exception-handler descriptor. Locked reads must be paired with locked writes, and the CPU's bus interface enforces this by forbidding other memory accesses until the corresponding writes occur. As none are forthcoming, after performing these bus cycles all CPU activity stops, and the CPU must be reset to recover.

Due to the proliferation of Intel microprocessors, the existence of this open-privilege instruction was considered a serious issue at the time. Operating system vendors responded by implementing workarounds that detected the condition and prevented the crash.[3] Information about the bug first appeared on the Internet on or around 8 November 1997.[4] Since the F00F bug has become common knowledge, the term is sometimes used to describe similar hardware design flaws such as the Cyrix coma bug.

No permanent hardware damage results from executing the F00F instruction on a vulnerable system; it simply locks up until rebooted. However, data loss of unsaved data is likely if the disk buffers have not been flushed, if drives were interrupted during a write operation, or if some other non-atomic operation was interrupted.

The B2 stepping solved this issue for Intel's Pentium processors.[2]

The F00F instruction can be considered an example of a Halt and Catch Fire (HCF) instruction.

Workarounds

Although a definite solution to this problem required some sort of hardware/firmware revision, there were proposed workarounds at the time[1] which prevented the exploitation of this issue in generating a denial-of-service attack on the affected machine. All of them were based on forcefully breaking up the pattern of faulty bus accesses responsible for the processor hang. Intel's proposed (therefore "official") solutions required setting up the table of interrupt descriptors in an unnatural way that forced the processor to issue an intervening page fault before it could access the memory containing the descriptor for the undefined-opcode exception. These extraneous memory accesses turned out to be sufficient for the bus interface to let go of the locking requirement that was the root cause for the bug.

Specifically, the table of interrupt descriptors, which normally resides on a single memory page, is instead split over two pages such that the descriptors for the first seven exception handlers reside on a page, and the remainder of the table on the following page. The handler for the undefined opcode exception is then the last descriptor on the first page, while the handler for the page-fault exception resides on the second page. The first page can now be made not-present (usually signifying a page that has been swapped out to disk to make room for some other data), which will force the processor to fetch the descriptor for the page-fault exception handler. This descriptor, residing on the second page of the table, is present in memory as usual (if it were not, the processor would double- and then triple-fault, leading to a shutdown). These extra memory cycles override the memory locking requirement issued by the original illegal instruction (since faulting instructions are supposed to be able to be restarted after the exception handler returns). The handler for the page-fault exception has to be modified, however, to cope with the necessity of providing the missing page for the first half of the interrupt descriptor table, a task it is not usually required to perform.

The second official workaround from Intel proposed keeping all pages present in memory, but marking the first page read-only. Since the originating illegal instruction was supposed to issue a memory write cycle, this is enough for again forcing the intervention of the page-fault handler. This variant has the advantage that the modifications required to the page-fault handler are very minor compared to the ones required for the first variant; it basically just needs to redirect to the undefined-exception handler when appropriate. However, this variant requires that the operating system itself be prevented from writing to read-only pages (through the setting of a global processor flag), and not all kernels are designed this way; more recent kernels in fact are, since this is the same basic mechanism used for implementing copy-on-write.

Additional workarounds other than the official ones from Intel have been proposed; in many cases these proved as effective and much easier to implement.[1] The simplest one involved merely marking the page containing interrupt descriptors as non-cacheable. Again, the extra memory cycles that the processor was forced to go through to fetch data from RAM every time it needed to invoke an exception handler appeared to be all that was needed to prevent the processor from locking up. In this case, no modification whatsoever to any exception handler was required. And, although not strictly necessary, the same split of the interrupt descriptor table was performed in this case, with only the first page marked non-cacheable. This was for performance reasons, as the page containing most of the descriptors (and the ones more often required, in fact) could stay in cache.

For unknown reasons, these additional, unofficial workarounds were never endorsed by Intel. It might be that it was suspected that they might not work with all affected processor versions.

See also

References

  1. ^ a b c Collins, Robert R. (1998-05-01). "The Pentium F00F Bug". Dr. Dobb's Journal. Retrieved 2015-07-27.
  2. ^ a b "81. Invalid Operand with Locked CMPXCHG8B Instruction". Pentium Processor Specification Update, Version-041 [Release Date January 1999] (PDF). Santa Clara, California, USA: Intel Corporation. 1998. pp. 51f. Order Number 242480-041. Archived from the original (PDF) on 2016-03-04. Retrieved 2015-07-27. PROBLEM: The CMPXCHG8B instruction compares an 8-byte value in EDX and EAX with an 8-byte value in memory (the destination operand). The only valid destination operands for this instruction are memory operands. If the destination operand is a register, the processor should generate an invalid opcode exception, execution of the CMPXCHG8B instruction should be halted and the processor should execute the invalid opcode exception handler. This erratum occurs if the LOCK prefix is used with the CMPXCHG8B instruction with an (invalid) register destination operand. In this case, the processor may not start execution of the invalid opcode exception handler because the bus is locked. This results in a system hang. IMPLICATION: If an (invalid) register destination operand is used with the CMPXCHG8B instruction and the LOCK prefix, the system may hang. No memory data is corrupted and the user can perform a system reset to return to normal operation. Note that the specific invalid code sequence necessary for this erratum to occur is not normally generated in the course of programming nor is such a sequence known by Intel to be generated by commercially available software. This erratum only applies to Pentium processors, Pentium processors with MMX technology, Pentium OverDrive processors and Pentium OverDrive processors with MMX technology. Pentium Pro processors, Pentium II processors and i486TM and earlier processors are not affected […]
  3. ^ "torvalds/linux". GitHub. Archived from the original on 2022-06-23. Retrieved 2021-07-09.
  4. ^ Hovers, Onno; et al. (1997-11-08). "Nieuwe Intel Pentium Bug" [New Intel…] (newsgroup thread, 38 posts by 22 authors) (in Dutch). Newsgroupnl.comp.hardware. Retrieved 2015-07-27. Als je er nog niet over gehoord hebt, er is een nieuwe Intel Pentium BUG. Daardoor is het vanuit userspace mogelijk om de Pentium helemaal te laten crashen met 1 instructie. De bug doet zich voor op de Intel Pentium en de Intel Pentium MMX. De bug doet zich niet voor op de Intel Pentium Pro, de Intel Pentium II, de chips van AMD, Cyrix e.d. Deze bug is alleen van belang voor sommige mensen die een multiuser (shell) systeem draaien op een Intel Pentium. Op zo'n systeem kan elke user het systeem crashen…

Further reading