Quickly accessible working storage available as part of a digital processor
A processor register is a quickly accessible location available to a computer's processor.[1] Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-only. In computer architecture, registers are typically addressed by mechanisms other than main memory, but may in some cases be assigned a memory address, e.g., on the DEC PDP-10 and ICT 1900.[2]
Almost all computers, whether load/store architecture or not, load items of data from a larger memory into registers where they are used for arithmetic operations, bitwise operations, and other operations, and are manipulated or tested by machine instructions. Manipulated items are then often stored back to main memory, either by the same instruction or by a subsequent one. Modern processors use either static or dynamic random-access memory (RAM) as main memory, with the latter usually accessed via one or more cache levels.
Processor registers are normally at the top of the memory hierarchy, and provide the fastest way to access data. The term normally refers only to the group of registers that are directly encoded as part of an instruction, as defined by the instruction set. However, modern high-performance CPUs often have duplicates of these "architectural registers" in order to improve performance via register renaming, allowing parallel and speculative execution. Modern x86 design acquired these techniques around 1995 with the releases of Pentium Pro, Cyrix 6x86, Nx586, and AMD K5.
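As a rough illustration of how renaming decouples architectural registers from physical ones, the following sketch models a rename table. The class name RenameTable, the register counts, and the simple free-list policy are assumptions made for this example and do not describe any particular CPU.

```python
class RenameTable:
    """Toy rename table mapping architectural registers to physical ones."""

    def __init__(self, num_arch, num_phys):
        if num_phys < num_arch:
            raise ValueError("need at least one physical register per architectural register")
        # Initially, architectural register i lives in physical register i.
        self.mapping = list(range(num_arch))
        # The remaining physical registers are free for renaming.
        self.free = list(range(num_arch, num_phys))

    def read(self, arch_reg):
        """Return the physical register currently holding arch_reg."""
        return self.mapping[arch_reg]

    def write(self, arch_reg):
        """Allocate a fresh physical register for a new value of arch_reg.

        This sketch never returns registers to the free list; real hardware
        would reclaim the old one once it is no longer needed.
        """
        if not self.free:
            raise RuntimeError("no free physical registers (a real core would stall)")
        phys = self.free.pop(0)
        self.mapping[arch_reg] = phys
        return phys


table = RenameTable(num_arch=4, num_phys=8)
first = table.write(1)   # first write to architectural register 1
second = table.write(1)  # second write gets a different physical register
```

Because the two writes to the same architectural register land in different physical registers, the instructions producing them no longer have to wait for each other.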
Registers are normally measured by the number of bits they can hold, for example, an 8-bit, 32-bit, 64-bit, or 128-bit register, or wider. In some instruction sets, a register can operate in various modes that divide its storage into smaller parts (a 32-bit register into four 8-bit parts, for instance), into which multiple data items (a vector, or one-dimensional array of data) can be loaded and operated upon at the same time. Typically this is implemented by adding extra registers that map their storage into a larger register. Processors that can execute single instructions on multiple data are called vector processors.
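The idea of treating one 32-bit register as four independent 8-bit parts can be sketched in software. The function name packed_add_u8 is hypothetical, and the loop only models the semantics; hardware would process all lanes at once.

```python
def packed_add_u8(a, b):
    """Add four unsigned 8-bit lanes packed into 32-bit words.

    Each lane wraps around on overflow and never carries into its neighbor.
    """
    result = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        result |= ((x + y) & 0xFF) << shift
    return result
```

For example, packed_add_u8(0x000000FF, 0x00000001) yields 0, whereas an ordinary 32-bit addition of the same operands would carry into the second byte and give 0x100.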
Types
A processor often contains several kinds of registers, which can be classified according to the types of values they can store or the instructions that operate on them:
User-accessible registers can be read or written by machine instructions. The most common division of user-accessible registers is a division into data registers and address registers.
Address registers hold addresses and are used by instructions that indirectly access primary memory.
Some processors contain registers that may only be used to hold an address or only to hold numeric values (in some cases used as an index register whose value is added as an offset from some address); others allow registers to hold either kind of quantity. A wide variety of possible addressing modes, used to specify the effective address of an operand, exist.
General-purpose registers (GPRs) can store both data and addresses, i.e., they are combined data/address registers; in some architectures, the register file is unified so that the GPRs can store floating-point numbers as well.
Status registers hold truth values often used to determine whether some instruction should or should not be executed.
Floating-point registers (FPRs) store floating-point numbers in many architectures.
Constant registers hold read-only values such as zero, one, or pi.
Vector registers hold data for vector processing done by SIMD instructions (Single Instruction, Multiple Data).
Special-purpose registers (SPRs) hold some elements of the program state; they usually include the program counter, also called the instruction pointer, and the status register; the program counter and status register might be combined in a program status word (PSW) register. The stack pointer is sometimes also included in this group. Embedded microprocessors, such as microcontrollers, can also have special function registers corresponding to specialized hardware elements.
Model-specific registers (also called machine-specific registers) store data and settings related to the processor itself. Because their meanings are attached to the design of a specific processor, they are not expected to remain standard between processor generations.
Architectural registers are the registers visible to software and are defined by an architecture. They may not correspond to the physical hardware if register renaming is being performed by the underlying hardware.
In some architectures (such as SPARC and MIPS), the first or last register in the integer register file is a pseudo-register in that it is hardwired to always return zero when read (mostly to simplify indexing modes), and it cannot be overwritten. In Alpha, this is also done for the floating-point register file. As a result, register files are commonly quoted as having one more register than is actually usable; for example, 32 registers are quoted when only 31 of them fit within the above definition of a register.
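A hardwired zero register can be modeled as follows. This is a minimal sketch that assumes register 0 is the hardwired one and uses a hypothetical RegisterFile class; it is not taken from any real simulator.

```python
class RegisterFile:
    """Toy integer register file whose register 0 always reads as zero."""

    def __init__(self, count=32, width=32):
        self.mask = (1 << width) - 1
        self.regs = [0] * count

    def read(self, index):
        # Register 0 is hardwired: it returns zero whatever was written to it.
        return 0 if index == 0 else self.regs[index]

    def write(self, index, value):
        # Writes to register 0 are silently discarded.
        if index != 0:
            self.regs[index] = value & self.mask
```

With the default of 32 registers, only registers 1 to 31 can actually hold a value, which is the "32 quoted, 31 usable" situation described above.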
Examples
The following table shows the number of registers in several mainstream CPU architectures. Note that in x86-compatible processors, the stack pointer (ESP) is counted as an integer register, even though there are a limited number of instructions that may be used to operate on its contents. Similar caveats apply to most architectures.
Although all of the below-listed architectures are different, almost all are in a basic arrangement known as the von Neumann architecture, first proposed by the Hungarian-American mathematician John von Neumann. It is also noteworthy that the number of registers on GPUs is much higher than that on CPUs.
The A register is an accumulator to which all arithmetic is done; the H and L registers can be used in combination as an address register; all registers can be used as operands in load/store/move/increment/decrement instructions and as the other operand in arithmetic instructions. There is no floating-point unit (FPU) available.
The A register is an accumulator to which all arithmetic is done; the register pairs B+C, D+E, and H+L can be used as address registers in some instructions; all registers can be used as operands in load/store/move/increment/decrement instructions and as the other operand in arithmetic instructions. Some instructions only use H+L; another instruction swaps H+L and D+E. Floating-point processors intended for the 8080 were Intel 8231, AMD Am9511, and Intel 8232. They were also readily usable with the Z80 and similar processors.
The 8086/8088, 80186/80188, and 80286 processors, if provided an 8087, 80187 or 80287 co-processor for floating-point operations, support an 80-bit wide, 8 deep register stack with some instructions able to use registers relative to the top of the stack as operands; without a co-processor, no floating-point registers are supported.
The 80386 processor requires an 80387 for floating-point operations; later processors had built-in floating-point. Both have an 80-bit wide, 8 deep register stack, with some instructions able to use registers relative to the top of the stack as operands. The Pentium III and later added SSE, with additional 128-bit XMM registers.
Geode GX/Media GX/4x86/5x86 is an emulation of a 486/Pentium-compatible processor made by Cyrix/National Semiconductor. Like Transmeta, the processor had a translation layer that translated x86 code to native code and executed it.[citation needed] It does not support 128-bit SSE registers, just the 80387 stack of eight 80-bit floating-point registers, and partially supports 3DNow! from AMD. The native processor contains only 1 data and 1 address register for all purposes, and these are translated into 4 paths of 32-bit naming registers r1 (base), r2 (data), r3 (back pointer), and r4 (stack pointer) within scratchpad SRAM for integer operations.[citation needed]
A 16-bit processor from the Taiwanese company Sunplus Technology, notably used in VTech's V.Smile line of educational video game consoles, in addition to many plug-in TV games and off-brand consoles starting from the mid-2000s.
A 32-bit stack machine processor developed by VM Labs and specialized for multimedia. It can be found on the company's own Nuon DVD player console line and the Game Wave Family Entertainment System from ZaPit games. The design was heavily influenced by Intel's MMX technology; it contained a 128-byte unified stack cache for both vector and scalar instructions. The unified cache can be divided as eight 128-bit vector registers or thirty-two 32-bit SIMD scalar registers through bank renaming; there is no integer register in this architecture.
Nios II is based on the MIPS IV instruction set[citation needed] and has 31 32-bit GPRs, with register 0 being hardwired to zero, and eight 64-bit floating-point registers.[citation needed]
Address register 8 (a7) is the stack pointer. 68000, 68010, 68012, 68020, and 68030 require an FPU for floating-point; 68040 had FPU built in. FP registers are 80-bit.
+ 2 × 32 Vector
(dedicated vector co-processor located nearby its GPU)
The Emotion Engine's main core (VU0) is a heavily modified DSP general core intended for general background tasks and it contains one 64-bit accumulator, two general data registers, and one 32-bit program counter. A modified MIPS III executable core (VU1) is for game data and protocol control, and it contains thirty-two 32-bit general-purpose registers for integer computation and thirty-two 128-bit SIMD registers for storing SIMD instructions, streaming data value and some integer calculation value, and one accumulator register for connecting general floating-point computation to the vector register file on the co-processor. The coprocessor is built via a 32-entry 128-bit vector register file (can only store vector values that pass from the accumulator in the CPU) and no integer registers are built in. Both the vector co-processor (VPU 0/1) and the Emotion Engine's entire main processor module (VU0 + VU1 + VPU0 + VPU1) are built based on a modified MIPS instruction set. The accumulator in this case is not general-purpose but control status.
Earlier generations allowed up to 127/63 registers per thread (Tesla/Fermi). The more registers are configured per thread, the fewer threads can run at the same time. Registers are 32 bits wide; double-precision floating-point numbers and 64-bit pointers therefore require two registers. It additionally has up to 8 predicate registers per thread.[18]
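The trade-off between registers per thread and concurrent threads, and the cost of 64-bit values in 32-bit registers, can be sketched numerically. The function names and the register file size of 65,536 entries are assumptions for illustration; real hardware also applies allocation granularity and other limits that this sketch ignores.

```python
def registers_needed(bit_width, register_bits=32):
    """Number of registers needed to hold one value of bit_width bits."""
    return -(-bit_width // register_bits)  # ceiling division


def max_resident_threads(register_file_size, registers_per_thread):
    """Upper bound on concurrent threads when registers are the only limit."""
    return register_file_size // registers_per_thread


REGISTER_FILE_SIZE = 65536  # hypothetical number of 32-bit registers

threads_at_32 = max_resident_threads(REGISTER_FILE_SIZE, 32)
threads_at_64 = max_resident_threads(REGISTER_FILE_SIZE, 64)
```

Under these assumptions, doubling the registers configured per thread halves the number of threads that can be resident at once.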
8 'A' registers, A0–A7, hold 18-bit addresses; 8 'B' registers, B0–B7, hold 18-bit integer values (with B0 permanently set to zero); 8 'X' registers, X0–X7, hold 60 bits of integer or floating-point data. Seven of the eight 18-bit A registers were coupled to their corresponding X registers: setting any of the A1–A5 registers to a value caused a memory load of the contents of that address into the corresponding X register. Likewise, setting an address into registers A6 or A7 caused a memory store into that location in memory from X6 or X7. (Registers A0 and X0 were not coupled like this).
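The coupling between the A and X registers described above can be modeled with a short sketch. The class and method names are hypothetical, memory is represented as a dictionary for simplicity, and the B registers are omitted.

```python
class CoupledRegisters:
    """Toy model of A/X register coupling: A1-A5 load, A6-A7 store."""

    ADDR_MASK = (1 << 18) - 1   # A registers hold 18-bit addresses
    WORD_MASK = (1 << 60) - 1   # X registers hold 60-bit words

    def __init__(self):
        self.a = [0] * 8
        self.x = [0] * 8
        self.memory = {}  # address -> 60-bit word; unset locations read as 0

    def set_x(self, i, value):
        self.x[i] = value & self.WORD_MASK

    def set_a(self, i, address):
        address &= self.ADDR_MASK
        self.a[i] = address
        if 1 <= i <= 5:
            # Setting A1-A5 loads the addressed word into the matching X register.
            self.x[i] = self.memory.get(address, 0)
        elif i in (6, 7):
            # Setting A6 or A7 stores the matching X register to memory.
            self.memory[address] = self.x[i]
        # A0 has no side effect on X0 or memory.
```

In this model, a load is expressed as set_a(1, address) followed by reading x[1], and a store as set_x(6, value) followed by set_a(6, address).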
16 in G5 and later S/390 models and z/Architecture
FP was optional in System/360, and always present in S/370 and later. In processors with the Vector Facility, there are 16 vector registers containing a machine-dependent number of 32-bit elements.[23] Some registers are assigned a fixed purpose by calling conventions; for example, register 14 is used for subroutine return addresses and, for ELF ABIs, register 15 is used as a stack pointer. The S/390 G5 processor increased the number of floating-point registers to 16.[24]
An eight-core 8/16-bit sliced stack machine controller with a simple logic circuit inside, it has 8 cog counters (cores), each containing three 8/16-bit special control registers with 32-bit × 512 stack RAM. However, it does not contain any general register for integer purposes. Unlike most shadow register files in modern processors and multi-core systems, all of the stack RAM in a cog can be accessed at the instruction level, which allows all of these cogs to act as a single general-purpose core if necessary. The floating-point unit is external and contains two 80-bit vector registers.
Also included are a stack pointer and a frame pointer. Additional registers are used to implement zero-overhead loops and circular buffer DAGs (data address generators).
All of the registers may be used generally (integer, float, stack pointer, jump, indexing, etc.). Every 36-bit memory (or register) word can also be manipulated as a half-word, which can be considered an (18-bit) address. Other word interpretations are used by certain instructions. In the original PDP-10 processors, these 16 GPRs also corresponded to main (i.e. core) memory locations 0–15; a hardware option called "fast memory" implemented the registers as separate ICs, and references to memory locations 0–15 referred to the IC registers. Later models implemented the registers as "fast memory" and continued to make memory locations 0–15 refer to them. Movement instructions take (register, memory) operands: MOVE 1,2 is register-register, and MOVE 1,1000 is memory-to-register.
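The mapping of the 16 registers onto memory locations 0 to 15 can be sketched as follows. The class name and the dictionary used for core memory are assumptions for illustration, and addresses are treated as plain integers.

```python
class RegisterMappedStorage:
    """Toy model in which memory addresses 0-15 refer to the 16 registers."""

    WORD_MASK = (1 << 36) - 1  # 36-bit words

    def __init__(self):
        self.registers = [0] * 16
        self.core = {}  # address -> word for locations above 15

    def load(self, address):
        if 0 <= address <= 15:
            return self.registers[address]
        return self.core.get(address, 0)

    def store(self, address, value):
        value &= self.WORD_MASK
        if 0 <= address <= 15:
            self.registers[address] = value
        else:
            self.core[address] = value

    def move(self, register, address):
        """Model of MOVE register,address: copy the addressed word into register."""
        self.registers[register] = self.load(address)
```

With this model, move(1, 2) copies register 2 into register 1 because address 2 names a register, while move(1, 1000) copies a word from core memory.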
The general purpose registers are used for floating-point values as well. Three of the registers have special uses: R12 (Argument Pointer), R13 (Frame Pointer), and R14 (Stack Pointer), while R15 refers to the Program Counter.
The 6502's A (accumulator) register is the main data store (8-bit data with 16-bit memory addresses); X and Y are the indirect and direct index registers, respectively; and the SP register is a specific index register only.
The 65c816 is the 16-bit successor of the 6502. X, Y, and D (Direct Page register) are condition registers, and the SP register is a specific index register only. The main accumulator is extended to 16 bits (C)[34] while keeping the 8-bit accumulator (A) for compatibility, and the main registers can now address up to 24 bits (16-bit wide data instructions with 24-bit memory addresses).
The Media-embedded processor was a 32-bit processor developed by Toshiba with a modified 8080 instruction set. Only the A, B, C, and D registers are available through all modes (8/16/32-bit). It is incompatible with x86; however, it contains an 80-bit floating-point unit that is x87-compatible.
r15 is the program counter, and not usable as a general purpose register; r13 is the stack pointer; r8–r13 can be switched out for others (banked) on a processor mode switch. Older versions had 26-bit addressing,[35] and used upper bits of the program counter (r15) for status flags, making that register 32-bit.
Each instruction controls whether registers are interpreted as integers or single precision floating point. Architecture is scalable to 4096 cores with 16 and 64 core implementations currently available.
Usage
The number of registers available on a processor and the operations that can be performed using those registers have a significant impact on the efficiency of code generated by optimizing compilers. The Strahler number of an expression tree gives the minimum number of registers required to evaluate that expression tree.
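A minimal sketch of computing the Strahler number of an expression tree follows. It assumes a tree encoding chosen for this example: a leaf is any non-tuple value, and an operator node is a tuple whose first element is the operator and whose remaining elements are its one or two operands. It also assumes every leaf must be loaded into a register before use.

```python
def strahler(node):
    """Return the Strahler number of an expression tree.

    A leaf counts as 1. An operator node takes the largest number among its
    children, plus one if that largest number occurs in at least two children.
    """
    if not isinstance(node, tuple):
        return 1
    numbers = [strahler(child) for child in node[1:]]
    top = max(numbers)
    return top + 1 if numbers.count(top) >= 2 else top


# (a * b) + (c * d): both subtrees need two registers, so the sum needs three.
balanced = ("+", ("*", "a", "b"), ("*", "c", "d"))
# (a * b) + c: evaluate the product first, then reuse a register for c.
skewed = ("+", ("*", "a", "b"), "c")
```

Under these assumptions, strahler(balanced) is 3 and strahler(skewed) is 2, which reflects that evaluating the more demanding subtree first lets its registers be reused.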
Jia, Zhe; Maggioni, Marco; Staiger, Benjamin; Scarpazza, Daniele P. (2018). "Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking". arXiv:1804.06826 [cs.DC].