Fujitsu collaborated with ARM to develop the processor; it is the first processor to use the ARMv8.2-A Scalable Vector Extension (SVE) SIMD instruction set, with a 512-bit vector implementation.[4]
It has "Four-operand FMA with Prefix Instruction",[1] i.e. a MOVPRFX instruction followed by a 3-operand FMA operation, with the pair packed into a single operation in the pipeline. The prefix is needed because ARM, like RISC architectures in general, is a 3-operand machine, with no space for four operands. For the processor, the designers claim ">90% execution efficiency in (D|S|H)GEMM and INT16/8 dot product".[1]
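The prefix mechanism can be illustrated with a small behavioral model. This is only a sketch: Python lists of eight floats stand in for 512-bit vectors of double-precision lanes, and the function names `movprfx`, `fmla` and `fma4` are hypothetical helpers chosen for illustration, not an emulation of the actual instruction encoding or pipeline.

```python
# Behavioral model only: a list of 8 floats stands in for one 512-bit
# vector register holding double-precision lanes. All names are hypothetical.
LANES = 512 // 64  # 8 double-precision lanes per 512-bit vector


def movprfx(src):
    """Model of the prefix: copy src into a fresh destination register."""
    return list(src)


def fmla(acc, n, m):
    """Destructive 3-operand FMA: acc <- acc + n * m, overwriting acc."""
    for i in range(len(acc)):
        acc[i] += n[i] * m[i]


def fma4(a, n, m):
    """Four-operand form d = a + n * m that leaves a, n and m unchanged."""
    d = movprfx(a)  # prefix: d <- a
    fmla(d, n, m)   # d <- d + n * m
    return d


a = [1.0] * LANES
n = [2.0] * LANES
m = [3.0] * LANES
d = fma4(a, n, m)
print(d)  # every lane holds 1.0 + 2.0 * 3.0
```

Without the prefix copy, the destructive form would overwrite the addend `a`; with it, the pair behaves like a single non-destructive four-operand operation.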
The processor uses 32 gigabytes of HBM2 memory with a bandwidth of 1 TB per second.[4] The processor contains 16 PCI Express generation 3 lanes[1] to connect to accelerators (hypothetically, GPUs and FPGAs, for example). The processor also integrates a TofuD fabric controller with 10 ports, implemented as 20 high-speed lanes of 28 Gbit/s each, to connect multiple nodes in a cluster.[1] The reported transistor count is about 8.8 billion.[4]
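As a rough worked example, the interconnect figures above imply the following raw signalling rates. The sketch assumes that the 20 lanes are divided evenly among the 10 ports and ignores encoding and protocol overhead, so the results are upper bounds rather than measured throughput.

```python
# Figures taken from the text above; the even lane-to-port split is an assumption.
TOFU_PORTS = 10
TOFU_LANES = 20
LANE_RATE_GBIT = 28  # raw rate per lane, in Gbit/s

lanes_per_port = TOFU_LANES // TOFU_PORTS         # lanes per port
port_rate_gbit = lanes_per_port * LANE_RATE_GBIT  # raw Gbit/s per port
total_rate_gbit = TOFU_LANES * LANE_RATE_GBIT     # raw Gbit/s over all lanes

print(f"{lanes_per_port} lanes per port")
print(f"{port_rate_gbit} Gbit/s raw per port")
print(f"{total_rate_gbit} Gbit/s raw in total")
```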
Each A64FX processor has four NUMA nodes, with each NUMA node having 12 compute cores, for a total of 48 cores per processor.[8][2][3] Each NUMA node has its own level 2 cache, HBM2 memory, and assistant cores for non-computational purposes.[8]
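The per-node resources follow from simple division. The sketch below assumes that the 32 GB of HBM2 and the 1 TB/s of bandwidth are split evenly across the four NUMA nodes, and takes 1 TB/s as 1000 GB/s; both are simplifying assumptions for illustration.

```python
# Figures from the text; the even split across NUMA nodes is an assumption.
NUMA_NODES = 4
CORES_PER_NODE = 12
HBM2_TOTAL_GB = 32
HBM2_BANDWIDTH_GB_S = 1000  # "1 TB per second", taken here as 1000 GB/s

total_cores = NUMA_NODES * CORES_PER_NODE                # compute cores per processor
hbm2_per_node_gb = HBM2_TOTAL_GB // NUMA_NODES           # GB of HBM2 per NUMA node
bandwidth_per_node = HBM2_BANDWIDTH_GB_S // NUMA_NODES   # GB/s per NUMA node

print(f"{total_cores} compute cores in total")
print(f"{hbm2_per_node_gb} GB of HBM2 per NUMA node")
print(f"{bandwidth_per_node} GB/s of memory bandwidth per NUMA node")
```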
Fujitsu intends to produce lower-specification machines with fewer assistant cores.[2][3] Reliability, availability and serviceability (RAS) capabilities are claimed, including about 128,400 error checkers in total.
In June 2020 the Fugaku supercomputer using this processor reached 442 petaFLOPS and became the fastest supercomputer in the world.
Implementations
Fujitsu designed the A64FX for the Fugaku. In both the June and November 2020 TOP500 rankings, the Fugaku was the fastest supercomputer in the world.[9] Fujitsu intends to sell smaller machines with A64FX processors.[2][3] AnandTech reported in June 2020 that the cost of a PRIMEHPC FX700 server, with two A64FX nodes, was ¥4,155,330 (c. US$39,000).[10]
Cray is developing supercomputers using the A64FX.[11][12] The Isambard 2 supercomputer, which uses the Fujitsu processors, is being built for a consortium in the United Kingdom led by the University of Bristol and also including the Met Office.[13][14] It is an upgrade to the Isambard supercomputer, which was built with the Marvell ThunderX2, another ARM architecture microprocessor.[14]