El Capitan (supercomputer)

El Capitan
Active
  • Deployment: 2H 2023
  • Completion: 2024
Sponsors: U.S. Department of Energy
Operators: Lawrence Livermore National Laboratory and U.S. Department of Energy
Location: Livermore Computing Complex
Architecture: HPE Cray Shasta
Power: 40 MW (projected)
Operating system: TOSS
Space: 7,500 sq ft (700 m²)
Memory: 128 GB HBM3 per MI300A
Storage: TBA
Speed: 1.742 exaFLOPS (Rmax) / 2.746 exaFLOPS (Rpeak)
Cost: US$600 million (estimated)
Purpose: Scientific research and development, stockpile stewardship[1]

Hewlett Packard Enterprise El Capitan is an exascale supercomputer hosted at Lawrence Livermore National Laboratory (LLNL) in Livermore, California, United States. It became operational in 2024 and is based on the Cray EX Shasta architecture. El Capitan displaced Frontier as the world's fastest supercomputer on the 64th edition of the TOP500 list (November 2024). It is the third exascale system deployed in the United States, after Frontier and Aurora.

Design

El Capitan uses a combined 11,039,616 CPU and GPU cores, provided by 43,808 AMD Instinct MI300A processors: 1,051,392 CPU cores (the 24-core, 1.8 GHz AMD 4th Gen EPYC "Genoa"-class CPU portion of each MI300A) and 9,988,224 GPU cores. The MI300A combines 24 Zen 4-based CPU cores and a CDNA 3-based GPU on a single organic package, along with 128 GB of HBM3 memory.[2]
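The core totals above are internally consistent, as a short worked check shows. It assumes that the GPU total counts 228 GPU cores per MI300A; that figure is derived here by dividing 9,988,224 by 43,808 (it matches the MI300A's 228 CDNA 3 compute units, although the cited source does not state the per-unit count). The constant names are illustrative only.

```python
# Worked check of El Capitan's reported core counts.
# Assumption: the GPU total counts 228 cores per MI300A (9,988,224 / 43,808).

MI300A_UNITS = 43_808        # MI300A processors counted in the totals
CPU_CORES_PER_UNIT = 24      # Zen 4 CPU cores per MI300A
GPU_CORES_PER_UNIT = 228     # derived: 9,988,224 / 43,808


def core_totals(units: int) -> tuple[int, int, int]:
    """Return (cpu_cores, gpu_cores, combined_cores) for `units` MI300A processors."""
    cpu = units * CPU_CORES_PER_UNIT
    gpu = units * GPU_CORES_PER_UNIT
    return cpu, gpu, cpu + gpu


if __name__ == "__main__":
    cpu, gpu, total = core_totals(MI300A_UNITS)
    print(f"CPU cores: {cpu:,}")    # CPU cores: 1,051,392
    print(f"GPU cores: {gpu:,}")    # GPU cores: 9,988,224
    print(f"Combined:  {total:,}")  # Combined:  11,039,616
```

Because every MI300A contributes the same number of CPU and GPU cores, the combined figure is simply the unit count multiplied by 252 (24 + 228).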

Blades are interconnected by HPE Slingshot 64-port switches, each providing 12.8 terabits per second of bandwidth. Groups of blades are linked in a dragonfly topology with at most three hops between any two nodes. Cabling is either optical or copper, with runs customized to minimize cable length; in total, the cabling runs 145 km (90 mi).
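The three-hop bound follows from the dragonfly structure: routers within a group are fully connected, and every pair of groups is joined by a global link, so a minimal route needs at most one local hop, one global hop and one more local hop. The sketch below builds a small, hypothetical balanced dragonfly, with `a` routers per group and `h` global links per router (in the style of the original dragonfly proposal, not Slingshot's actual port allocation), and measures the router-to-router diameter with breadth-first search. Hops are counted between switches only; the link from a node to its own switch is not counted.

```python
# Illustrative sketch of a small, hypothetical balanced dragonfly network.
# Assumptions (not El Capitan's real configuration): `a` routers per group,
# `h` global links per router, and g = a*h + 1 groups, so every pair of
# groups is joined by exactly one global link.
from collections import deque
from itertools import combinations

Router = tuple[int, int]  # (group index, router index within the group)


def build_dragonfly(a: int, h: int) -> dict[Router, set[Router]]:
    """Return an undirected adjacency map for a balanced dragonfly."""
    g = a * h + 1
    adj: dict[Router, set[Router]] = {(grp, r): set() for grp in range(g) for r in range(a)}

    # Local links: the routers inside each group form a complete graph.
    for grp in range(g):
        for r1, r2 in combinations(range(a), 2):
            adj[(grp, r1)].add((grp, r2))
            adj[(grp, r2)].add((grp, r1))

    # Global links: group `src` numbers its a*h global ports 0..a*h-1.
    # The port toward `dst` is dst (if dst < src) or dst - 1, owned by router port // h.
    def port_to(src: int, dst: int) -> int:
        return dst if dst < src else dst - 1

    for i, j in combinations(range(g), 2):
        ri = port_to(i, j) // h
        rj = port_to(j, i) // h
        adj[(i, ri)].add((j, rj))
        adj[(j, rj)].add((i, ri))
    return adj


def diameter(adj: dict[Router, set[Router]]) -> int:
    """Longest shortest path in router-to-router hops, via BFS from every router."""
    worst = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        if len(dist) < len(adj):
            raise ValueError("network is disconnected")
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    net = build_dragonfly(a=4, h=2)  # 9 groups of 4 routers
    print(len(net), "routers, diameter", diameter(net), "hops")  # prints: 36 routers, diameter 3 hops
```

Larger values of `a` and `h` add more groups and routers, but the diameter stays at three, which is the property the text describes.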

El Capitan uses an accelerated processing unit (APU) architecture, in which the CPU and GPU share an internal on-chip coherent interconnect.

El Capitan occupies 7,500 square feet (700 m²) of floor space, roughly the area of two tennis courts.[3] It comprises at least 87 compute racks, which house the compute nodes as well as the “Rabbit” NVM Express (NVMe) fast-storage arrays. According to The Next Platform: "El Capitan has a total of 11,136 nodes in liquid-cooled Cray EX racks, with four MI300A compute engines per node and a total of 44,544 devices across the system. Each device has 128 GB of HBM3 main memory that is shared across the CPU and GPU chiplets, which runs at 5.2 GHz and which delivers an aggregate 5.3 TB/sec of aggregate bandwidth into and out of the CPU and GPU chiplets."[4]
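The figures in the quotation can be cross-checked by multiplying them out. In the sketch below, the system-wide memory and bandwidth totals are derived by assuming that all 44,544 devices are counted; they are not figures stated by the cited source. The summed bandwidth is the sum of each device's local memory bandwidth, not a network figure. This device count also differs from the 43,808 MI300A units behind the core totals given earlier in this section.

```python
# Cross-check of the node and memory figures quoted from The Next Platform.
# Derived totals assume all 44,544 devices are counted; they are not quoted figures.

NODES = 11_136
DEVICES_PER_NODE = 4              # MI300A compute engines per node
HBM3_GB_PER_DEVICE = 128          # HBM3 shared by the CPU and GPU chiplets
BANDWIDTH_TBPS_PER_DEVICE = 5.3   # memory bandwidth per device, TB/s

devices = NODES * DEVICES_PER_NODE
total_hbm_gb = devices * HBM3_GB_PER_DEVICE
# Sum of per-device local memory bandwidth (not interconnect bandwidth).
total_bandwidth_tbps = devices * BANDWIDTH_TBPS_PER_DEVICE

print(f"Devices:          {devices:,}")                         # 44,544
print(f"Total HBM3:       {total_hbm_gb:,} GB")                 # 5,701,632 GB, about 5.7 PB
print(f"Summed bandwidth: {total_bandwidth_tbps:,.1f} TB/s")    # 236,083.2 TB/s
```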

History

El Capitan was ordered as part of the Department of Energy's CORAL-2 initiative to replace Sierra, an IBM/NVIDIA machine deployed at LLNL in 2018. The original design envisioned hundreds of thousands of GPUs and 40 MW of power.[citation needed] LLNL partnered with HPE Cray and AMD to build the system.[5]

Three El Capitan prototypes, named rzVernal, Tioga, and Tenaya, were themselves powerful enough to be listed on the TOP500 supercomputer list in June 2023.[6] rzVernal reached 4.1 petaflops.[7] In early July 2023, the first components of El Capitan were installed at Lawrence Livermore, with complete installation expected by mid-2024.[8]

By November 18, 2024, El Capitan was operational and verified as the world's fastest supercomputer, achieving 1.742 exaFLOPS (Rmax).[9]

References

  1. ^ "Fiscal Year 2023 Stockpile Stewardship and Management Plan – Biennial Plan Summary Report to Congress" (PDF). United States Department of Energy. pp. 3–17. Retrieved May 27, 2023.
  2. ^ Smith, Ryan (January 25, 2023). "CES 2023: AMD Instinct MI300 Data Center APU Silicon In Hand, 146B Transistors, Shipping H2'23". Anandtech. Retrieved February 13, 2023.
  3. ^ Redell, Bob (November 29, 2024). "Livermore lab's 'El Capitan' is world's fastest supercomputer". NBC Bay Area. Retrieved February 13, 2023.
  4. ^ Morgan, Timothy Prickett (November 18, 2024). "'El Capitan' Supercomputer Blazes The Trail for Converged CPU-GPU Compute". The Next Platform. Retrieved December 30, 2024.
  5. ^ Trader, Tiffany (August 13, 2019). "Cray Wins NNSA-Livermore 'El Capitan' Exascale Contract". HPCwire. Retrieved February 13, 2023.
  6. ^ "June 2023 list". TOP500.org. Retrieved October 10, 2023.
  7. ^ Klotz, Aaron (June 3, 2022). "Trio of Prototype AMD-Based El Capitan Supercomputers Already Rank in Top 200". Tom's Hardware.
  8. ^ Shilov, Anton (July 6, 2023). "El Capitan Installation Begins: First APU-based Exascale System Shaping Up For 2024". Anandtech.
  9. ^ Kan, Michael (November 18, 2024). "US' El Capitan Is Now the World's Fastest Supercomputer". PC Magazine.
Records
Preceded by: HPE Frontier (1.1 exaFLOPS)
World's most powerful supercomputer: November 2024 – present (1.7 exaFLOPS, incumbent)