Details

  • Google DeepMind and Google Research have released FACTS (the Factuality Assessment Comprehensive Test Suite), positioned as the first comprehensive benchmark for evaluating the factual reliability of large language models.
  • The FACTS suite measures model outputs along four axes: internal knowledge, live web-search response accuracy, citation and grounding, and performance on multimodal (text-plus-image) prompts.
  • Version 1.0 evaluated 15 leading AI models; Gemini 3 Pro achieved the top composite score of 68.8 percent.
  • While models have improved on search-augmented and encyclopedic tasks, they continue to struggle with image understanding and factual grounding.
  • The dataset, evaluation scripts, and live leaderboard have been open-sourced on Kaggle under the Apache 2.0 license to support community testing and reproducibility; a minimal access sketch follows this list.
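
For anyone who wants to exercise the suite locally, the sketch below shows one way to pull the published assets with Kaggle's kagglehub client and inspect what ships in the release. The dataset handle, file name, and CSV layout here are illustrative assumptions, not the published schema; substitute the handle listed on the official FACTS Kaggle page.

```python
# Minimal sketch: download the FACTS release from Kaggle and peek at its contents.
# NOTE: "google/facts-leaderboard" is a placeholder handle -- use the handle shown
# on the official FACTS Kaggle page. Requires: pip install kagglehub pandas
import os

import kagglehub
import pandas as pd

# Download (and cache) the dataset; returns the local directory path.
local_dir = kagglehub.dataset_download("google/facts-leaderboard")  # placeholder handle

# List whatever files ship with the release (prompts, eval scripts, etc.).
for name in sorted(os.listdir(local_dir)):
    print(name)

# If the release includes a CSV of prompts or scores, load and inspect it.
# The file name and columns are assumptions for illustration only.
csv_path = os.path.join(local_dir, "prompts.csv")
if os.path.exists(csv_path):
    df = pd.read_csv(csv_path)
    print(df.head())
```

kagglehub caches downloads locally, so repeated runs reuse the same files rather than re-fetching them.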

Impact

This public leaderboard crowns Gemini 3 Pro as the current leader, increasing pressure on competitors such as OpenAI (GPT-5) and Anthropic (Claude 4.5) to publish comparable factuality results. FACTS is positioned to become a critical industry benchmark, potentially influencing model selection and regulatory compliance as global scrutiny of AI transparency intensifies. Persistent gaps in multimodal performance are likely to shape R&D priorities and investment across the sector.