Details
- NVIDIA has joined forces with Stability AI to enhance Stable Diffusion 3.5 Large, leveraging TensorRT acceleration and FP8 quantization, which together double generation speeds and cut VRAM usage by 40%, dropping from 18GB to 11GB.
- The smaller memory footprint means SD3.5 Large now runs on five GeForce RTX 50 Series GPU models instead of just one, with Tensor Cores handling the accelerated AI processing.
- The new TensorRT for RTX SDK debuts as a standalone toolkit at one-eighth the previous size, featuring just-in-time (JIT) on-device engine compilation that removes the need for GPU-specific precompiled engines and integrates with Windows ML.
- Stability AI's fine-tuned and optimized models are accessible through Hugging Face, with an NVIDIA NIM microservice deployment option slated for release in July 2025.
- Support for FP4 quantization is added for Blackwell GPUs, while RTX 40/50 Series and PRO GPUs continue to utilize FP8, expanding performance benefits across more hardware tiers.
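The VRAM savings from lower-precision weights can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes a model of roughly 8 billion parameters (an illustrative figure for SD3.5 Large, not stated in the text) and counts weight storage only; real VRAM usage also includes activations, text encoders, and runtime buffers, which is why the reported figures (18GB at higher precision, 11GB with FP8) sit above these raw weight sizes.

```python
# Back-of-the-envelope weight memory at different precisions.
# Assumes ~8B parameters (illustrative; not from the article) and
# ignores activations, text encoders, and runtime buffers.

PARAMS = 8e9  # assumed parameter count

def weight_gb(bits_per_param: float) -> float:
    """Weight storage in gigabytes at the given bits per parameter."""
    return PARAMS * bits_per_param / 8 / 1e9

fp16 = weight_gb(16)  # 16.0 GB
fp8 = weight_gb(8)    # 8.0 GB  (half the weight memory)
fp4 = weight_gb(4)    # 4.0 GB  (Blackwell's FP4 halves it again)

print(f"FP16: {fp16:.1f} GB, FP8: {fp8:.1f} GB, FP4: {fp4:.1f} GB")
```

Halving bits per weight halves weight memory, which is the core reason FP8 (and FP4 on Blackwell) brings the model within reach of consumer-class VRAM budgets.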
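The JIT on-device compilation model described above can be thought of as a lazy, device-keyed cache: rather than shipping one precompiled engine per GPU model, the engine is built the first time a model runs on a given device and reused afterward. This is a hypothetical Python sketch of that pattern, with none of the names drawn from the actual TensorRT for RTX API.

```python
# Hypothetical sketch of JIT on-device engine compilation.
# All names here are illustrative, not real TensorRT for RTX APIs.

_engine_cache: dict[tuple[str, str], str] = {}

def compile_engine(model: str, device: str) -> str:
    """Stand-in for the expensive device-specific compilation step."""
    return f"engine({model}@{device})"

def get_engine(model: str, device: str) -> str:
    """Return a cached engine, compiling on first use (JIT)."""
    key = (model, device)
    if key not in _engine_cache:
        _engine_cache[key] = compile_engine(model, device)
    return _engine_cache[key]

# First call on a device compiles; later calls reuse the cached engine.
e1 = get_engine("sd3.5-large", "rtx-5090")
e2 = get_engine("sd3.5-large", "rtx-5090")
assert e1 is e2
```

The trade-off is a one-time compilation cost on first run in exchange for a much smaller SDK and no per-GPU artifact matrix to distribute.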
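To make the quantization idea concrete, here is a minimal generic absmax quantization round-trip: scale a tensor so its largest magnitude maps to the top of the low-precision range, round, and keep the scale for dequantization. This is illustrative only; TensorRT's FP8 (typically the E4M3 floating-point format) is not the uniform integer grid used here.

```python
# Minimal symmetric absmax quantization sketch (illustrative only;
# FP8/FP4 as used by TensorRT are floating-point formats, not the
# uniform integer grid shown here).

def quantize(xs: list[float], bits: int) -> tuple[list[int], float]:
    """Map values onto a signed integer grid; return codes and scale."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    scale = max(abs(v) for v in xs) / levels
    return [round(v / scale) for v in xs], scale

def dequantize(qs: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in qs]

weights = [0.5, -1.0, 0.25]
codes, scale = quantize(weights, bits=8)
approx = dequantize(codes, scale)
# The max-magnitude element maps exactly; the rest carry small
# rounding error, which shrinks as the bit width grows.
```

Fewer bits mean coarser grids and more rounding error, which is why FP4 is reserved for Blackwell hardware with native support while RTX 40/50 Series and PRO GPUs stay on FP8.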
Impact
By making high-end AI model deployment faster and less resource-intensive, NVIDIA is pushing real-time generative AI into reach for a wider range of developers and creators. This move could accelerate AI-powered creative workflows on consumer-grade PCs and reduce reliance on cloud-based inference. NVIDIA's JIT compilation approach and SDK integration set a new bar for rivals looking to keep pace in efficient edge AI deployment.