Details
- Baidu has open-sourced ERNIE-4.5-VL-28B-A3B-Thinking, a vision-language model introduced on November 11, 2025.
- The model has 28 billion total parameters but activates only about 3 billion per token through a sparse Mixture-of-Experts (MoE) router (the "A3B" in its name), significantly reducing memory and compute requirements at inference.
- Reported benchmarks show ERNIE-4.5-VL-28B-A3B-Thinking surpassing Google's Gemini-2.5-Pro on VQA, MMBench, and SEED-Bench, and matching or beating open models with 7 billion or more active parameters on multimodal reasoning tasks.
- It introduces a new semantic-alignment module that jointly embeds images with Chinese and English text, aiming to minimize hallucinations and boost detailed reasoning.
- All model assets—including weights, training code, and evaluation notebooks—are available under Apache-2.0, enabling the tech community to use and fine-tune the model via GitHub, Hugging Face, PaddlePaddle AI Studio, and ModelScope.
- This release extends Baidu’s ERNIE 4.5 line, delivering a lightweight alternative before the launch of the next-gen ERNIE-5 series.
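The sparse activation described above can be sketched with a toy top-k Mixture-of-Experts layer: a learned gate scores all experts, but only the top k actually run per token, so active parameters stay a small fraction of the total. This is an illustrative NumPy sketch of the general technique, not Baidu's actual architecture or routing code; all names and shapes here are made up for the example.

```python
import numpy as np

def topk_route(token, experts, gate_w, k=2):
    """Route one token through the top-k experts of a sparse MoE layer.

    Only k of len(experts) experts execute per token -- the idea behind
    "28B total / ~3B active" designs. Toy example, not ERNIE's router.
    """
    logits = token @ gate_w                 # gate score per expert
    chosen = np.argsort(logits)[-k:]        # indices of the k best experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                # softmax over selected experts only
    # Weighted sum of only the selected experts' outputs
    return sum(w * experts[i](token) for w, i in zip(weights, chosen))

rng = np.random.default_rng(0)
d, num_experts, k = 8, 16, 2
# Each "expert" is a tiny linear layer; only k of num_experts run per token
expert_mats = [rng.standard_normal((d, d)) for _ in range(num_experts)]
experts = [lambda x, M=M: x @ M for M in expert_mats]
gate_w = rng.standard_normal((d, num_experts))

out = topk_route(rng.standard_normal(d), experts, gate_w, k=k)
print(out.shape)        # output has the same dimension as the input token
print(k / num_experts)  # fraction of experts active per token: 0.125
```

The memory advantage comes from the same mechanism at scale: every expert's weights must be stored, but per-token compute and activation memory scale with the k selected experts, not with the full expert count.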
Impact
Baidu challenges industry giants like Google and OpenAI by offering an open-source model that outperforms some of their flagship VL models, which remain closed. By requiring only 3 billion active parameters at inference, the model becomes accessible on standard GPUs, opening doors for startups, academic researchers, and edge device makers. The move reflects a broader industry trend toward efficiency and transparency, and may accelerate real-world applications in AR, robotics, and on-device AI as Baidu builds toward its next major release.
