NVIDIA Further Boosts AI Performance By 3x For GeForce RTX GPUs, RTX PC & RTX Workstations With Latest Driver
NVIDIA has further boosted the AI performance of its GeForce RTX GPUs & RTX AI PC platforms with the latest R555 driver release.
NVIDIA's GeForce RTX GPUs & RTX PCs Offer The Fastest AI Performance Across All Segments, Now Boosted By 3X With Latest Drivers
During today's Microsoft Build, NVIDIA announced a range of new AI performance optimizations that are now available on the RTX platform, which includes GeForce RTX GPUs, RTX workstations, and RTX PCs.
The new optimizations are specifically targeted at a range of LLMs (Large Language Models) that power the latest Generative AI experiences. Using the latest R555 drivers, NVIDIA's RTX GPUs and AI PC platforms now offer up to 3x faster AI performance with ONNX Runtime (ORT) and DirectML. These two tools are used to run AI models locally on Windows PCs.
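For readers who want to try this path, the following is a minimal sketch of how an application might ask ONNX Runtime to use DirectML and fall back to the CPU. `DmlExecutionProvider` and `CPUExecutionProvider` are the provider names ONNX Runtime registers; the helper `pick_providers` and the `model.onnx` path are hypothetical and only serve the illustration. It assumes the `onnxruntime-directml` package on Windows and does not reproduce any benchmark setup from NVIDIA or Microsoft.

```python
from typing import List, Sequence

# Provider names as ONNX Runtime registers them.
DML = "DmlExecutionProvider"
CPU = "CPUExecutionProvider"


def pick_providers(available: Sequence[str]) -> List[str]:
    """Prefer DirectML when it is present, keeping CPU as the fallback.

    Hypothetical helper for this sketch; it is not part of ONNX Runtime.
    """
    chosen = [p for p in (DML, CPU) if p in available]
    return chosen or [CPU]


if __name__ == "__main__":
    try:
        # Assumption: the onnxruntime-directml package is installed.
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed; skipping the session demo")
    else:
        providers = pick_providers(ort.get_available_providers())
        print("Providers that would be requested:", providers)
        # "model.onnx" is a placeholder path, not a file from the article:
        # session = ort.InferenceSession("model.onnx", providers=providers)
```

Listing the CPU provider last means the same code still runs on a machine without a DirectML-capable GPU, only more slowly.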
WebNN, an application programming interface that lets web developers deploy AI models, has also been accelerated on RTX via DirectML. Microsoft is working with NVIDIA to further accelerate RTX GPU performance while adding DirectML support to PyTorch. The following is the full list of capabilities that the new R555 drivers offer for GeForce RTX GPUs and RTX PCs:
- Support for DQ-GEMM metacommand to handle INT4 weight-only quantization for LLMs
- New RMSNorm normalization methods for Llama 2, Llama 3, Mistral and Phi-3 models
- Group and multi-query attention mechanisms, and sliding window attention to support Mistral
- In-place KV updates to improve attention performance
- Support for GEMM of non-multiple-of-8 tensors to improve context phase performance
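Two of the items above, RMSNorm and INT4 weight-only quantization, can be illustrated with a few lines of plain Python. This is a conceptual sketch only: it is not NVIDIA's driver code or the DQ-GEMM metacommand itself, whose internals the announcement does not describe. The function names are hypothetical, and the single symmetric scale per weight group is an assumption chosen for simplicity.

```python
import math
from typing import List, Tuple


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """Divide x by its root mean square, then apply a learned per-element weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Quantize one weight group to the signed 4-bit range [-8, 7].

    Assumption: one symmetric scale per group, with no zero point.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequant_dot(q: List[int], scale: float, activations: List[float]) -> float:
    """Dequantize weights on the fly and accumulate in floating point.

    This mirrors the idea behind a dequantize-then-GEMM kernel for a single
    output element: only the weights are stored in 4 bits, while the
    activations stay in floating point.
    """
    return sum((qi * scale) * a for qi, a in zip(q, activations))


if __name__ == "__main__":
    q, scale = quantize_int4([1.4, -0.6, 0.0, 0.2])
    print("quantized weights:", q, "scale:", scale)
    print("dot product:", dequant_dot(q, scale, [1.0, 1.0, 1.0, 1.0]))
    print("normalized:", rms_norm([3.0, 4.0], [1.0, 1.0]))
```

Storing only the weights in 4 bits cuts the memory needed to hold and read them, which is typically the limiting factor when an LLM generates tokens one at a time on a local GPU.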
In performance benchmarks of the generative AI extension for ORT, released by Microsoft, NVIDIA shows gains across the board for both INT4 and FP16 data types. The performance improvements are up to 3x thanks to the optimization techniques added within these