Details
- NVIDIA has joined forces with Stability AI to enhance Stable Diffusion 3.5 Large, leveraging TensorRT acceleration and FP8 quantization, which together double generation speeds and cut VRAM usage by 40%, dropping from 18GB to 11GB.
- The smaller memory footprint means SD3.5 Large now runs on five GeForce RTX 50 Series GPU models instead of just one, with Tensor Cores handling the accelerated AI processing.
- The new TensorRT for RTX SDK debuts as a standalone toolkit at one-eighth the previous size, featuring just-in-time (JIT) on-device engine compilation that removes the need for GPU-specific precompiled engines and integrates with Windows ML.
- Stability AI's fine-tuned and optimized models are accessible through Hugging Face, with an NVIDIA NIM microservice deployment option slated for release in July 2025.
- Support for FP4 quantization is added for Blackwell GPUs, while RTX 40/50 Series and PRO GPUs continue to utilize FP8, expanding performance benefits across more hardware tiers.
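The VRAM savings from lower-precision weights can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes a model of roughly 8 billion parameters (an illustrative figure for SD3.5 Large, not stated in the text) and counts weight storage only; real VRAM usage also includes activations, text encoders, and runtime buffers, which is why the reported figures (18GB at higher precision, 11GB with FP8) sit above these raw weight sizes.

```python
# Back-of-the-envelope weight memory at different precisions.
# Assumes ~8B parameters (illustrative; not from the article) and
# ignores activations, text encoders, and runtime buffers.

PARAMS = 8e9  # assumed parameter count

def weight_gb(bits_per_param: float) -> float:
    """Weight storage in gigabytes at the given bits per parameter."""
    return PARAMS * bits_per_param / 8 / 1e9

fp16 = weight_gb(16)  # 16.0 GB
fp8 = weight_gb(8)    # 8.0 GB  (half the weight memory)
fp4 = weight_gb(4)    # 4.0 GB  (Blackwell's FP4 halves it again)

print(f"FP16: {fp16:.1f} GB, FP8: {fp8:.1f} GB, FP4: {fp4:.1f} GB")
```

Halving bits per weight halves weight memory, which is the core reason FP8 (and FP4 on Blackwell) brings the model within reach of consumer-class VRAM budgets.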
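The JIT on-device compilation model described above can be thought of as a lazy, device-keyed cache: rather than shipping one precompiled engine per GPU model, the engine is built the first time a model runs on a given device and reused afterward. This is a hypothetical Python sketch of that pattern, with none of the names drawn from the actual TensorRT for RTX API.

```python
# Hypothetical sketch of JIT on-device engine compilation.
# All names here are illustrative, not real TensorRT for RTX APIs.

_engine_cache: dict[tuple[str, str], str] = {}

def compile_engine(model: str, device: str) -> str:
    """Stand-in for the expensive device-specific compilation step."""
    return f"engine({model}@{device})"

def get_engine(model: str, device: str) -> str:
    """Return a cached engine, compiling on first use (JIT)."""
    key = (model, device)
    if key not in _engine_cache:
        _engine_cache[key] = compile_engine(model, device)
    return _engine_cache[key]

# First call on a device compiles; later calls reuse the cached engine.
e1 = get_engine("sd3.5-large", "rtx-5090")
e2 = get_engine("sd3.5-large", "rtx-5090")
assert e1 is e2
```

The trade-off is a one-time compilation cost on first run in exchange for a much smaller SDK and no per-GPU artifact matrix to distribute.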
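To make the quantization idea concrete, here is a minimal generic absmax quantization round-trip: scale a tensor so its largest magnitude maps to the top of the low-precision range, round, and keep the scale for dequantization. This is illustrative only; TensorRT's FP8 (typically the E4M3 floating-point format) is not the uniform integer grid used here.

```python
# Minimal symmetric absmax quantization sketch (illustrative only;
# FP8/FP4 as used by TensorRT are floating-point formats, not the
# uniform integer grid shown here).

def quantize(xs: list[float], bits: int) -> tuple[list[int], float]:
    """Map values onto a signed integer grid; return codes and scale."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    scale = max(abs(v) for v in xs) / levels
    return [round(v / scale) for v in xs], scale

def dequantize(qs: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in qs]

weights = [0.5, -1.0, 0.25]
codes, scale = quantize(weights, bits=8)
approx = dequantize(codes, scale)
# The max-magnitude element maps exactly; the rest carry small
# rounding error, which shrinks as the bit width grows.
```

Fewer bits mean coarser grids and more rounding error, which is why FP4 is reserved for Blackwell hardware with native support while RTX 40/50 Series and PRO GPUs stay on FP8.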
Impact
By making high-end AI model deployment faster and less resource-intensive, NVIDIA is pushing real-time generative AI into reach for a wider range of developers and creators. This move could accelerate AI-powered creative workflows on consumer-grade PCs and reduce reliance on cloud-based inference. NVIDIA's JIT compilation approach and SDK integration set a new bar for rivals looking to keep pace in efficient edge AI deployment.