Details

  • Baidu released ERNIE 5.0, a native omni-modal large model with 2.4 trillion parameters. Its Mixture-of-Experts (MoE) architecture activates fewer than 3% of those parameters per inference, which keeps inference efficient.
  • Built on an end-to-end unified autoregressive architecture, it is trained jointly on text, image, video, and audio data, enabling seamless multimodal understanding and generation, unlike the late-fusion methods common in the industry.
  • Supports input and output across text, images, audio, video, and documents; available to individuals via the Ernie Bot app and website, and to enterprises and developers via the Qianfan platform.
  • Demonstrates breakthroughs in multimodal tasks such as generating code from video tutorials and creative writing that imitates classical literature such as Dream of the Red Chamber while working in modern business logic, along with strong reasoning.
  • Unveiled at Baidu's Wenxin Moment conference, it advances on prior versions such as ERNIE 4.0 and Ernie X1; try it at the official link provided in the announcement.
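
To put the sparsity figure in concrete terms, the sketch below works out the ceiling that "under 3%" places on active parameters and shows a generic top-k expert router of the kind MoE models commonly use. Only the 2.4 trillion total and the 3% bound come from the announcement; the function names, the 128-expert layer, and k = 2 are illustrative assumptions, not details of Baidu's design.

```python
import random

TOTAL_PARAMS = 2_400_000_000_000  # 2.4 trillion, from the announcement
MAX_ACTIVE_PERCENT = 3            # "under 3%", treated here as an upper bound


def active_param_ceiling(total_params: int, percent: int) -> int:
    """Upper bound on parameters used per inference (integer arithmetic)."""
    return total_params * percent // 100


def route_top_k(scores: list[float], k: int) -> list[int]:
    """Return the indices of the k highest-scoring experts, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    ceiling = active_param_ceiling(TOTAL_PARAMS, MAX_ACTIVE_PERCENT)
    print(f"Active parameters per inference: fewer than {ceiling:,}")

    # Hypothetical layer: 128 experts with 2 selected per token.
    # These numbers are assumptions chosen only to illustrate sparse routing.
    rng = random.Random(0)
    num_experts, k = 128, 2
    scores = [rng.random() for _ in range(num_experts)]
    chosen = route_top_k(scores, k)
    print(f"Experts used: {chosen} ({k / num_experts:.2%} of the layer)")
```

The ceiling works out to 72 billion parameters, so each inference touches a model-sized slice far smaller than the full 2.4 trillion, which is where the efficiency claim comes from.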

Impact

Baidu's ERNIE 5.0 positions the company as a frontrunner in native multimodal AI. Baidu reports that it matches or exceeds rivals such as Google's Gemini 2.5 Pro and GPT-5 variants on benchmarks for text reasoning, visual analysis, video comprehension, and generation, while its ultra-sparse MoE design, with under 3% of parameters activated, improves inference efficiency amid global GPU constraints. By integrating all modalities in a single autoregressive framework, this native full-modal approach lowers barriers for complex real-world applications such as video-to-code generation and context-aware creative tasks, potentially accelerating adoption in China and in other markets where data localization favors domestic models. Amid intensifying US-China AI rivalry and export controls on advanced chips, ERNIE 5.0 bolsters Baidu's self-reliance and steers R&D toward efficient, omni-modal systems that could narrow the gap with Western leaders and shift funding toward hybrid expert architectures over the next 12 to 24 months. Enterprises gain cost-effective access via Qianfan, widening AI deployment in sectors like e-commerce and content creation.