Details
- Baidu has open-sourced ERNIE-4.5-VL-28B-A3B-Thinking, a vision-language model introduced on November 11, 2025.
- The model has 28 billion total parameters but activates only about 3 billion per token through a sparse Mixture-of-Experts (MoE) router (the "A3B" in its name), significantly reducing memory and compute requirements at inference.
- Reported benchmarks show ERNIE-4.5-VL-28B-A3B-Thinking surpassing Google's Gemini-2.5-Pro on VQA, MMBench, and SEED-Bench, and matching or beating open models with 7 billion or more active parameters on multimodal reasoning tasks.
- It introduces a new semantic-alignment module that jointly embeds images with Chinese and English text, aiming to minimize hallucinations and boost detailed reasoning.
- All model assets—including weights, training code, and evaluation notebooks—are available under Apache-2.0, enabling the tech community to use and fine-tune the model via GitHub, Hugging Face, PaddlePaddle AI Studio, and ModelScope.
- This release extends Baidu’s ERNIE 4.5 line, delivering a lightweight alternative before the launch of the next-gen ERNIE-5 series.
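The sparse activation described above can be sketched with a toy top-k Mixture-of-Experts layer: a learned gate scores all experts, but only the top k actually run per token, so active parameters stay a small fraction of the total. This is an illustrative NumPy sketch of the general technique, not Baidu's actual architecture or routing code; all names and shapes here are made up for the example.

```python
import numpy as np

def topk_route(token, experts, gate_w, k=2):
    """Route one token through the top-k experts of a sparse MoE layer.

    Only k of len(experts) experts execute per token -- the idea behind
    "28B total / ~3B active" designs. Toy example, not ERNIE's router.
    """
    logits = token @ gate_w                 # gate score per expert
    chosen = np.argsort(logits)[-k:]        # indices of the k best experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                # softmax over selected experts only
    # Weighted sum of only the selected experts' outputs
    return sum(w * experts[i](token) for w, i in zip(weights, chosen))

rng = np.random.default_rng(0)
d, num_experts, k = 8, 16, 2
# Each "expert" is a tiny linear layer; only k of num_experts run per token
expert_mats = [rng.standard_normal((d, d)) for _ in range(num_experts)]
experts = [lambda x, M=M: x @ M for M in expert_mats]
gate_w = rng.standard_normal((d, num_experts))

out = topk_route(rng.standard_normal(d), experts, gate_w, k=k)
print(out.shape)        # output has the same dimension as the input token
print(k / num_experts)  # fraction of experts active per token: 0.125
```

The memory advantage comes from the same mechanism at scale: every expert's weights must be stored, but per-token compute and activation memory scale with the k selected experts, not with the full expert count.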
Impact
Baidu challenges industry giants like Google and OpenAI by offering an open-source model that outperforms some of their flagship VL models, which remain closed. By requiring only 3 billion active parameters at inference, the model becomes accessible on standard GPUs, opening doors for startups, academic researchers, and edge device makers. The move reflects a broader industry trend toward efficiency and transparency, and may accelerate real-world applications in AR, robotics, and on-device AI as Baidu builds toward its next major release.
