Baidu Releases Qianfan-VL Vision-Language Models as Open Source for Enterprise Use

Details

Baidu AI Cloud has open-sourced the Qianfan-VL series, a family of multimodal vision-language models, making both weights and code publicly available.
The models come in three parameter sizes, each tailored to balance latency, accuracy, and compute requirements for different deployment needs.
According to Baidu’s technical blog, Qianfan-VL achieves leading benchmark results in optical character recognition and mathematical reasoning tasks.
The APIs are designed for enterprise use cases, including document parsing, chart interpretation, and integration with the broader Qianfan AI platform.
Checkpoints can be freely downloaded from GitHub and Hugging Face, under a licence allowing commercial deployment.
This open release builds on Baidu’s recent initiatives with ERNIE Speed and ERNIE Lite, continuing a pattern of open-sourcing major AI models.
The models are optimized for mainstream hardware, running efficiently on NVIDIA A100 and H100 GPUs, as well as on China-developed Ascend accelerators.

Impact

Baidu’s open-source move intensifies competition with other open vision-language models, such as the LLaVA-NeXT project and Alibaba’s Qwen-VL, as companies race to strengthen homegrown multimodal AI stacks. The release could accelerate AI adoption in the finance and government sectors by offering cost-effective, OCR-focused solutions. By making these models available globally while aligning with China’s push for ‘secure and controllable AI,’ Baidu aims to attract a broad developer base and shift industry focus toward application-driven innovation.
