NVIDIA RTX 4080/4090 Launch Approaches as TSMC/Partners Prep, Chiplet Based Hopper to Pack 36,864 Cores

NVIDIA’s next-gen GeForce RTX 40 collection graphics playing cards are slated to land someday within the second half of 2022, presumably even at Computex in late Could. On that entrance, the chipmaker’s Taiwanese backend companions, most notably TSMS, ASE Know-how, and different IC suppliers/packers are gearing up for a clean launch of probably the most highly effective graphics structure ever. The corporate plans to launch two completely different lineups, codenamed Hopper and Ada Lovelace. The previous will leverage a chiplet primarily based structure with two dies (primarily based on TMSC’s CoWoS 2.5D packaging expertise) and the latter will retain the monolithic design of its predecessors.

GPU TU102 GA102 AD102
Arch Turing Ampere Ada Lovelace
Course of TSMC 12nm Sam 8nm LPP 5nm
GPC 6 7 12
TPC 36 42 72
SMs 72 84 144
Shaders 4,608 10,752 18,432
TFLOPs 16.1 37.6 90 TFLOPs?
Reminiscence 11GB GDDR6 24GB GDDR6X 24GB GDDR6X
Bus Width 384-bit 384-bit 384-bit
TGP 250W 350W 600W?
Launch Sep 2018 Sep 20 H2 2022

As we’re heard, Hopper can be aimed toward information facilities, HPC, and AI-centric workloads. Succeeding the A100, it’ll reportedly function two AD102 (Ada) dies glued collectively utilizing TSMC’s CoWoS packaging expertise. Which means that we’re an enormous 36,864 cores for the massive fats Hopper “H100” GPU. For the reminiscence, you possibly can make sure that it’ll pair HBM2e or HBM3 with a 1,024-bit (or increased) bus leading to a bandwidth of three,000-4,000 GB/s. This can, after all, come at a a lot increased energy consumption of round 1000W (from simply 400W on the A100). Not precisely shocking contemplating the sheer improve in throughput. We’re speaking a few bounce from simply 19.5 TFLOPs (FP64/FP32) to over 150 TFLOPs, with combined precision compute modes providing even increased efficiency: A 7-8x improve in compute capabilities via a single technology!

This image has an empty alt attribute; its file name is E65gO0tVgAgIxGI.jpg
Hopper (Through: @Harukaze)

For Ada-based GeForce RTX 4080 and 4090, we’re twice as a lot efficiency because the up to date Ampere elements, with an FP32 core depend of as much as 18,432. The AD102 flagship is rumored to function 144 SMs distributed throughout 12 GPCs. That leads to a 71% acquire in uncooked compute efficiency (66 TFLOPs) over the GA102. Add to that the truth that Staff Inexperienced is leveraging TSMC’s superior N5 course of node for Lovelace, and the ensuing frequency enhance ought to web a ~2.2x acquire over the RTX 3090.

The bus width of the RTX 4080 and 4090 must be the identical as their predecessors (384-bit and 320-bit), paired with quicker GDDR6X chips, leading to even increased reminiscence bandwidth. The RTX 4090 ought to pack as much as 24GB of GDDR6X reminiscence, and clock speeds rivaling the Navi 31 elements (2.3-2.5GHz). As for the general efficiency throughput, we’re round 90 TFLOPs of FP32 efficiency, an enormous step up over the 3090’s 36 TFLOPs.

If the AD102 features a complete of 18,432 cores, we are able to anticipate roughly 16,000 cores on the RTX 4080 and 18,000 on the RTX 4090. Based on Greymon and Kopitekimi, the Lovelace-based RTX 4080/4090 will draw as a lot as 500W of energy below load. That is regardless of the usage of one of the superior and environment friendly course of nodes on the planet. Nevertheless, working the numbers sort of provides up.

The AD102 flagship is predicted to function 144 SMs/12 GPCs, a acquire of 71% when it comes to logic in comparison with the GA102. Even when TSMC’s N5 node is 30% extra power-efficient than Samsung’s 8nm LPP node, we’re a rise of at the very least 80% in {hardware} items. That ought to simply lead to an influence draw at the very least 30-50% greater than the top-end RTX 3080/3090 Ampere choices.

Supply: DigiTimes

Be the first to comment

Leave a Reply

Your email address will not be published.