After unveiling its second-generation Habana Gaudi2 AI processor last month with some preliminary performance figures, Intel has followed suit with internally run benchmarks showing its shiny new accelerator outpacing NVIDIA’s A100 GPU. According to Intel, its results demonstrate clear leadership in training performance compared to what the competition has to offer.
Gaudi2 is a big slab of silicon, as shown in the image above. The Gaudi2 processor in the center is flanked by six high bandwidth memory tiles on the outside, effectively tripling the in-package memory capacity from 32GB in the previous version to 96GB of HBM2E in the current iteration, serving up 2.45TB/s of bandwidth.
Intel’s latest benchmarks demonstrate what it sees as “dramatic advancements in time-to-train (TTT)” over the first-generation Gaudi chip. Even more importantly (for Intel), its latest MLPerf submission highlights several performance wins over NVIDIA’s A100-80G in eight-accelerator configurations on vision and language training models.
In case you missed it, Intel acquired Habana Labs for $2 billion in late 2019 to give itself a boost in a high-stakes race against NVIDIA in AI and ML training. The data center is where the big bucks come from, and to that point, NVIDIA’s data center revenue topped its gaming revenue for the first time last quarter ($3.75 billion compared to $3.62 billion).
Intel and NVIDIA both have huge vested interests in dominating the data center. MLPerf is an industry standard benchmark that NVIDIA often uses to trumpet its own performance wins, so it is a fair playground. It’s also easy to see why Intel is keen on flexing what it has been able to accomplish with its Gaudi2 processor.

Intel Habana Gaudi2 Versus NVIDIA A100 In MLPerf

Two of the highlights Intel showed off were ResNet-50 and BERT training times. ResNet is a vision/image recognition model, while BERT is for natural language processing. These are key areas in AI and ML, and both are industry standard models.
In an eight-accelerator server, Intel’s benchmarks show up to 45 percent better performance in ResNet and up to 35 percent in BERT compared to NVIDIA’s A100 GPU. And compared to the first-generation Gaudi chip, we’re looking at gains of up to 3X and 4.7X, respectively.
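To put those percentages in time-to-train terms, here is a rough back-of-the-envelope sketch. It assumes (our assumption, not Intel’s stated methodology) that “X percent better performance” means training throughput is (1 + X/100) times the baseline’s, and that time-to-train scales inversely with throughput. Under that assumption, a 45 percent advantage works out to roughly 69 percent of the A100’s training time. The helper name `relative_time_to_train` is hypothetical, and since Intel’s figures are “up to” numbers, these are best-case ratios.

```python
def relative_time_to_train(speedup: float) -> float:
    """Return the fraction of the baseline's training time.

    Assumes (hypothetically) that time-to-train scales inversely
    with throughput, so a 1.45x speedup needs 1/1.45 of the time.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


# Best-case speedups quoted in the article, written as throughput multipliers:
# "up to 45 percent better" -> 1.45x, "up to 3X" -> 3.0x, and so on.
vs_a100 = {"ResNet-50": 1.45, "BERT": 1.35}
vs_gaudi1 = {"ResNet-50": 3.0, "BERT": 4.7}

for label, table in (("vs. A100", vs_a100), ("vs. first-gen Gaudi", vs_gaudi1)):
    for model, speedup in table.items():
        frac = relative_time_to_train(speedup)
        print(f"{model} {label}: {frac:.0%} of baseline time-to-train")
```

Run as-is, this prints 69 and 74 percent for ResNet-50 and BERT against the A100, and 33 and 21 percent against the first-generation Gaudi.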
“These advances can be attributed to the transition to 7-nanometer process from 16nm, tripling the number of Tensor Processor Cores, increasing the GEMM engine compute capacity, tripling the in-package high bandwidth memory capacity, increasing bandwidth and doubling the SRAM size,” Intel says.
“For vision models, Gaudi2 has a new feature in the form of an integrated media engine, which operates independently and can handle the entire pre-processing pipe for compressed imaging, including data augmentation required for AI training,” Intel adds.
The Gaudi family’s compute architecture is heterogeneous and comprises two compute engines: a Matrix Multiplication Engine (MME) and a fully programmable Tensor Processor Core (TPC) cluster. The former handles operations that can be lowered to matrix multiplications, while the latter is tailored to accelerate the remaining deep learning operations.
What Intel is showing are some huge wins in MLPerf that can translate to real-world workloads. It’s impressive, though keep in mind that the A100 is technically going to be legacy silicon when Hopper H100 arrives. NVIDIA hasn’t submitted MLPerf results for its H100 accelerator yet, and it will be interesting to see how things shake out when it does.
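As a toy illustration of that division of labor, the sketch below routes operations to one engine or the other depending on whether they can be lowered to a matrix multiplication. The op names, the `MATMUL_LOWERABLE` set, and the `route` function are all hypothetical. Habana’s own software stack makes this decision internally, and its actual rules aren’t described here. Treating convolutions as MME work reflects the common practice of lowering them to matrix multiplications, which is an assumption in this sketch.

```python
from enum import Enum


class Engine(Enum):
    MME = "Matrix Multiplication Engine"
    TPC = "Tensor Processor Core cluster"


# Hypothetical set of op types assumed to be lowerable to matrix multiplications.
MATMUL_LOWERABLE = {"matmul", "batched_matmul", "linear", "conv2d"}


def route(op: str) -> Engine:
    """Send matmul-like ops to the MME; everything else goes to the TPC cluster."""
    return Engine.MME if op in MATMUL_LOWERABLE else Engine.TPC


# A toy slice of a vision/language-style graph.
graph = ["conv2d", "batch_norm", "relu", "linear", "softmax", "layer_norm"]
for op in graph:
    print(f"{op:>12} -> {route(op).value}")
```

The point of the split is that dense matrix math is handled by fixed-function hardware, while the programmable TPCs cover the long tail of elementwise, normalization, and other non-GEMM operations.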