Intel Gaudi 2 Accelerator Up To 55% Faster Than NVIDIA H100 In Stable Diffusion, 3x Faster Than A100 In AI Benchmark Showdown
Stability AI has published a new blog post that pits Intel's Gaudi 2 against NVIDIA's H100 and A100 GPU accelerators in an AI benchmark showdown. The benchmarks show that Intel's solution offers great value and can be seen as a respectable alternative to NVIDIA's offerings for customers who want something fast and readily available.
Intel vs NVIDIA AI Accelerator Showdown: Gaudi 2 Showcases Strong Performance Against H100 & A100 In Stable Diffusion & Llama 2 LLMs, Great Performance/$ Highlighted As Strong Reason To Go Team Blue
Stability AI has been making open models that can handle a diverse range of tasks efficiently. To put the hardware to the test, the company took two of its models, including Stable Diffusion 3, and benchmarked the most popular AI accelerators from NVIDIA and Intel against each other.
In Stable Diffusion 3, the next chapter in the highly popular text-to-image model, Intel's Gaudi 2 AI accelerator delivered some exceptional results. The model ranges from 800M to 8B parameters, and the 2B parameter version was used for testing. For the comparison, 2 nodes with a total of 16 accelerators were used for each platform, with a batch size of 16 per accelerator (256 in total). The end result was the Intel Gaudi 2 offering a 56% speedup versus the H100 80GB GPU and a 2.43x speedup versus the A100 80 GB GPU.
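The batch-size arithmetic behind this setup is easy to check. Below is a minimal Python sketch, assuming 8 accelerators per node (16 accelerators across 2 nodes); the function name is purely illustrative and does not come from Stability AI's benchmark code.

```python
def global_batch_size(nodes: int, accelerators_per_node: int, per_accelerator_batch: int) -> int:
    """Total samples processed per step across the whole cluster."""
    return nodes * accelerators_per_node * per_accelerator_batch


# 2 nodes x 8 accelerators, batch size of 16 per accelerator
print(global_batch_size(2, 8, 16))  # 256
```

With the same 16 accelerators, a per-accelerator batch of 32 would double the total to 512.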
The 96 GB of HBM capacity also allowed Intel's Gaudi 2 to fit a batch size of 32 per accelerator, for a total batch size of 512. This raised throughput to 1,254 images per second, a speedup of 35% over the Gaudi 2 run at a batch size of 16, 2.10x over the H100 80GB, and 3.26x over the A100 80 GB AI GPUs.
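For readers who want absolute numbers, the quoted ratios can be turned back into rough throughput figures for the other configurations. The Python sketch below simply divides the 1,254 images per second result by each quoted speedup. The helper name and labels are illustrative, and because the ratios are rounded, the outputs are estimates rather than Stability AI's measured values.

```python
def implied_baseline(throughput: float, speedup: float) -> float:
    """Baseline throughput implied by a faster result and its quoted speedup ratio."""
    return throughput / speedup


gaudi2_batch32 = 1254.0  # images per second, as quoted above

# Speedup of the batch-32 Gaudi 2 run over each configuration, as quoted above
quoted_speedups = {
    "Gaudi 2 (batch 16)": 1.35,
    "H100 80GB": 2.10,
    "A100 80GB": 3.26,
}

for name, ratio in quoted_speedups.items():
    estimate = implied_baseline(gaudi2_batch32, ratio)
    print(f"{name}: ~{estimate:.0f} images/s")
```

These estimates put the batch-16 Gaudi 2 run at roughly 1.56x the H100, which lines up with the 56% speedup figure.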
Further scaling up to 32 nodes (256 accelerators) for both the Gaudi 2 and A100 80 GB GPUs, you see an increase of 3.16x on the Intel solution which can output 49.4