Details
- Baidu released ERNIE-Image, an open-source 8B text-to-image model integrated into ERNIE Bot, designed for precise text rendering, complex instruction following, and structured image generation.
- The model has achieved leading results among comparable open-source models on the GenEval and OneIG benchmarks (English and Chinese versions) and performs on par with top closed-source competitors.
- ERNIE-Image ranks #1 globally among open-source models on LongText-Bench with a score of 0.9733, indicating strong handling of text-heavy prompts.
- The model has already been adopted by 50+ integration partners, suggesting rapid enterprise uptake following the launch.
- It is designed to lower the hardware and cost barriers to advanced image generation, making high-quality text-to-image capabilities more accessible to developers and organizations.
Impact
ERNIE-Image's release is a significant competitive move in the open-source text-to-image space, particularly as proprietary models like Gemini 2.5 Flash Image continue to advance. Baidu's emphasis on text rendering and instruction following directly addresses known limitations in compositional image generation, a persistent challenge across the industry. The model's performance parity with closed-source alternatives at 8B parameters could democratize access to capable image generation tools and lower deployment costs for enterprises. However, GenEval itself has documented benchmark drift issues: recent research shows the benchmark can diverge from human judgment by up to 17.7% on state-of-the-art models, so ERNIE-Image's GenEval scores should be interpreted cautiously as a proxy for actual user satisfaction. Rapid adoption by 50+ partners signals market demand for open alternatives to proprietary solutions.
