Raja’s Chip Notes Lay Out Intel’s Path to Zettascale


A few hours ago, I ran into Raja Koduri at a bar. For those who do not know, Raja is the SVP and GM of Intel’s Accelerated Computing Systems and Graphics (AXG) Group. Given all of the Supercomputing conference activities we are covering this week, it was great to grab a beer (perhaps a few) with Raja and go into Intel’s HPC strategy. Specifically, Raja detailed for me how Intel plans to go from ExaFLOPS in 2022 to ZettaFLOPS in 2027-2028. For some context, that is Intel’s pathway to roughly 1000x the performance of today’s systems in only 5-6 years.


What you will see as the artifacts of our discussion are simply a few points on an Office Depot pad. Just for some context, we managed to grab a picture while having this discussion.

Patrick Raja Chance Beer Encounter

Raja explained Intel’s path to Zettascale to me as a massive improvement over today’s systems, including the Aurora supercomputer slated for 2022. I asked, and Raja let me snap a photo of his “chip notes” after the discussion. For those wondering, the bar provided each of us with small plates of Cool Ranch Doritos. It was a bit funny since we were there talking about chips. Hence, we are calling these “Raja’s Chip Notes.”

Raja’s Chip Notes Intel’s Path To Zetta Scale

What you can see above is a series of improvements Raja thinks Intel can achieve in order to get to Zettaflops, or roughly 500x Aurora’s performance of >=2 Exaflops (more on that in a bit). One of the constraints here was working within a power footprint similar to Aurora’s, since it would be much less of an achievement to reach Zettaflops with a corresponding 500x increase in power consumption.

  • One of the big ones, and the first on that list, is “Architecture” with a 16x improvement. That 16x involves adopting math execution similar to what some others in the market are doing. The number is 16x, but Raja told me that Intel knows the architectural changes needed to scale well beyond that. The 16x is used here because going well beyond it could turn future GPUs/accelerators into double-precision LINPACK-optimized chips that do not perform well on other workloads.
    Raja noted that while Intel could focus on simple DP execution, one of the bigger issues is keeping all of the execution units fed with data and enough memory bandwidth. His position is that Intel would focus not just on the DP execution that gets Intel to the Zettaflop era, but also on AI math operations and, perhaps most importantly, on ensuring that memory bandwidth is plentiful and well-utilized. That approach may not give a 1000x improvement for every application, but it should help the Zettascale architecture deliver big gains for a wider variety of applications.
  • The next one is labeled “power/thermals” and is scoped at a 2x improvement. Since the Zettaflops goal targets the same or similar power as Aurora, another way to get more performance is to do more with less power. Examples of this may be running chips at significantly lower voltages and introducing higher-end cooling. We are going to see the transition to liquid cooling happen, but more substantial cooling than rear door heat exchangers may be required.
  • “Data movement” is a 3x opportunity. This is an area where I gave Raja some feedback, asking for more detail to be shared. Intel, as one can imagine, has tooling to analyze where power is spent in systems. These days a large amount of power, sometimes the majority of it, is spent moving data around within a system and package. As a result, higher degrees of integration can make a meaningful difference in rebalancing the power dedicated to actually performing computations versus moving data. For those following silicon photonics, that is coming, and we will cover that a bit more later.
  • The one that I think many folks focus on is process technology. Intel announced a fairly aggressive schedule for new process introductions, which is why there is a “Process” 5x note. One key item here is that, especially on the HPC/GPU side, Intel is embracing multi-die or multi-tile designs along with advanced packaging. This is specifically designed to allow different types of silicon to be integrated, each using the right process technology, in order to limit risk as Intel moves forward to new generations.
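To make the bandwidth point concrete, here is a minimal roofline-style sketch. The roofline model is a standard way to reason about compute versus memory limits; it is not something from Raja’s notes, and the peak-FLOPS, bandwidth, and arithmetic-intensity values (`PEAK`, `BW`, `KERNELS`) are hypothetical numbers chosen only to show why a 16x math uplift without more bandwidth does little for memory-bound work.

```python
def attainable_flops(peak_flops: float, mem_bw_bytes: float, intensity: float) -> float:
    """Roofline model: throughput is capped by peak compute or by memory traffic.

    intensity is arithmetic intensity in FLOPs per byte moved from memory.
    """
    return min(peak_flops, mem_bw_bytes * intensity)


# Hypothetical accelerator: 50 TFLOPS peak, 2 TB/s memory bandwidth.
PEAK = 50e12
BW = 2e12

# Hypothetical kernels (FLOPs per byte): one memory-bound, one compute-bound.
KERNELS = {"stream-like": 0.25, "dense-matmul": 100.0}

for name, ai in KERNELS.items():
    base = attainable_flops(PEAK, BW, ai)
    more_math = attainable_flops(16 * PEAK, BW, ai)      # 16x math, same bandwidth
    balanced = attainable_flops(16 * PEAK, 16 * BW, ai)  # 16x math and 16x bandwidth
    print(f"{name}: 16x math only -> {more_math / base:.1f}x, "
          f"16x math + bandwidth -> {balanced / base:.1f}x")
```

With these made-up numbers, the memory-bound kernel gains nothing (1.0x) from extra math alone, and even the dense kernel tops out at 4.0x because it hits the bandwidth roof; only scaling both gives 16.0x for each.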

Now 2EF x 16 x 2 x 3 x 5 is only 960EF, just short of a full ZettaFLOPS (1,000EF). However, Aurora is listed as a >= 2EF peak system, and my sense is that it will land well above that. In turn, that would give the individual items above (rough estimates themselves) a bit more margin while still hitting a Zettascale system, or roughly 1000x a current-generation 1 Exaflop system.
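The arithmetic on the notepad is easy to check. This short sketch multiplies the four factors from Raja’s Chip Notes and also works out how large Aurora’s baseline would need to be for the same factors to reach a full ZettaFLOPS; treating Aurora at exactly its 2EF floor is an assumption for illustration.

```python
from math import prod

# Multipliers from Raja's Chip Notes.
FACTORS = {"architecture": 16, "power/thermals": 2, "data movement": 3, "process": 5}

AURORA_PEAK_EF = 2.0       # Aurora's listed ">= 2EF" peak, taken at the floor
ZETTAFLOPS_IN_EF = 1000.0  # 1 ZettaFLOPS = 1,000 ExaFLOPS

combined = prod(FACTORS.values())                      # product of the four factors
projected_ef = AURORA_PEAK_EF * combined               # projected system peak in EF
needed_multiplier = ZETTAFLOPS_IN_EF / AURORA_PEAK_EF  # gain needed from a 2EF start
needed_baseline_ef = ZETTAFLOPS_IN_EF / combined       # Aurora peak needed to hit 1 ZF

print(f"combined multiplier: {combined}x (needed from 2EF: {needed_multiplier:.0f}x)")
print(f"projected system: {projected_ef:.0f} EF")
print(f"Aurora peak needed for 1 ZF: {needed_baseline_ef:.3f} EF")
```

The factors multiply to 480x against the 500x needed from a 2EF start, so an Aurora peak of roughly 2.083EF or more would close the gap.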

Now let us get to Intel’s HPC Strategy page:

Raja’s Chip Notes Intel HPC Strategy 2022 2028

This is laid out in basically three phases. Each of these phases roughly matches the oneAPI versions you will see on the left side. Raja stressed that taking on architectures as a company was not just about building hardware. It is, perhaps more significantly, also about maintaining and investing in a hardware-software contract for consistent order-of-magnitude performance gains.

Phase 1: 2022 – 2023 – Exascale

  • Exascale for Intel really starts with its 2022 lineup. This includes Sapphire Rapids and Ponte Vecchio, and we will see both in Aurora. Although these are 2022 products, there is a lot of information out there on them. I have personally seen several Sapphire Rapids systems between OCP Summit 2021 and SC21, so the industry has transitioned from talking about Sapphire as a far-out product to discussing the line more definitively at this point.
Aurora Specs Accessed 2021 11 18
  • The next generation Raja calls “optimizing Exascale.” It was getting late (around midnight), and neither of us could remember whether Intel had disclosed Granite Rapids. I checked, and it was noted in the Intel Accelerated Manufacturing disclosure, so one can read Xeon-Next here as Granite Rapids. PVC-Next is another bridge that Intel has not publicly disclosed. The overall message I took away from our talk is that this next generation is about improving the 2022 architectures.

Phase 2: 2024 – 2025 – Pre-Zetta

  • In the Pre-Zetta era, we get Falcon. Falcon is the Xeon + Xe combination that will be more equivalent to NVIDIA Grace. Something that will become increasingly important is integration. Removing more SerDes from systems saves a ton of power, and higher levels of integration mean that less power is spent moving data around and more can instead be spent on compute.
  • “Lightbender” is what we have all been waiting for. This is silicon photonics integrated into chips. I have some rough idea of the target specs, but since they did not make it onto Raja’s Chip Notes, let me frame it differently. Intel has stated that it is moving to a chiplet/tile architecture with increasingly sophisticated packaging. My sense is that this may be a silicon photonics tile solution fast enough to do things like move HBM or other types of memory off of GPU/CPU packages. That opens up new system designs as well as the ability to easily vary capacity and potentially media types. A high-speed photonics interconnect also means that other devices, such as processors, can be physically more distant while keeping a high-speed link to the GPUs/accelerators. That would allow for better system design as well.

Phase 3: 2026 – 2028 – Zettascale

  • Since there is not a lot on the notepad for this one, this phase is the next step in refining all of the different components that Intel is building over the next 4-5 years. One way to think about it is a double-precision Zettaflop in something like a 50MW power envelope. The other way to think about it is that today’s 50MW Exaflop-class systems could scale down to only 50kW systems that fit in a rack or a few racks. One impact of this is truly democratizing large-scale supercomputing. That is what the line “Exascale at the Edge” on the first sheet refers to.
  • Another important note here is that this is the timeframe when the architecture, power and thermals, data movement, and process technologies would need to move up the maturity curve. I asked Raja, and he is realistic. Intel has line-of-sight to a lot of this technology, but not everything has been invented yet. He acknowledged that there are risks to the 1000x figure, but he walked me through the plan. My overarching sense, having spent some time with Raja, is that he sees some uncertainty and risk but also has a bit of buffer built into parts of the 1000x plan.
  • One will notice that the dates here are written a bit oddly, as 26 – 27 – 28. I may have suggested adding “28” to give a bit more margin for future technologies.
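The two framings in Phase 3 describe the same efficiency target. As a back-of-the-envelope check, this sketch uses only the round 50MW and 50kW figures from the discussion; they are approximations, not Intel specifications, and `flops_per_watt` is just an illustrative helper.

```python
def flops_per_watt(flops: float, watts: float) -> float:
    """Energy efficiency in FLOPS per watt."""
    return flops / watts


EXAFLOPS = 1e18
ZETTAFLOPS = 1e21
MW = 1e6
KW = 1e3

today = flops_per_watt(1 * EXAFLOPS, 50 * MW)         # Exaflop-class system at ~50MW
zetta_50mw = flops_per_watt(1 * ZETTAFLOPS, 50 * MW)  # Zettaflop in ~50MW
exa_50kw = flops_per_watt(1 * EXAFLOPS, 50 * KW)      # Exaflop in ~50kW

print(f"today:          {today / 1e9:.0f} GFLOPS/W")
print(f"zettaflop/50MW: {zetta_50mw / 1e12:.0f} TFLOPS/W")
print(f"exaflop/50kW:   {exa_50kw / 1e12:.0f} TFLOPS/W")
print(f"required efficiency gain: {zetta_50mw / today:.0f}x")
```

Both framings land at about 20 TFLOPS/W, roughly 1000x the ~20 GFLOPS/W implied by a 50MW Exaflop-class system, which is why the plan is framed around a roughly constant power envelope.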

Now to the final sheet:

Raja’s Chip Notes oneAPI And Phi Or Phi22

Recently, Raja posted this tweet:

For those who do not know, “Φ” is the 21st letter of the Greek alphabet. More importantly for context, it is also the name of the Xeon Phi line that Intel had in the HPC space for years. Raja noted that the Φ symbol looks somewhat like the O and I from oneAPI put together. My suggestion, given that achieving the 1000x will likely mean fewer piecemeal components and a higher level of integration, was to use years. So next year’s Sapphire Rapids plus Ponte Vecchio platform becomes Phi22, for Phi and 2022, the year it will debut. Most likely someone at Intel has already said this is a bad idea, but a few beers in, that was the suggestion.

Final Words

First off, I just wanted to say thank you to Raja for taking time out of your evening (and early morning) to have a few beers and walk through this. When the initial 1000x Zettascale claims came out, many were very skeptical. Aurora is still not installed, and Intel is betting a lot on Ponte Vecchio and its entirely new style of chipbuilding. Still, after speaking to Raja, it seems like Intel has what I would call its “Phi22” solution fairly well established and is now working out how to execute an aggressive plan for the future. Frankly, Intel needs an aggressive plan here, because if it does not have one, other companies will. Raja acknowledged the risks, but he at least has a plan in which the company already has solutions for many of the pieces needed to get to 1000x. Personally, I cannot wait until STH is reviewing 1 Exaflop solutions in our lab at only 50kW.
