Details

  • OpenAI released GPT-5.4 on March 5, 2026. It is available now via the API and Codex and is rolling out to ChatGPT Plus, Team, and Pro users throughout the day.
  • Key upgrades include stronger knowledge work and web search, plus native computer use: the model can autonomously navigate desktops, browsers, and software by working from screenshots and issuing commands.
  • Supports a 1-million-token context window and mid-response steering, and ships in variants including GPT-5.4 Pro for higher performance and GPT-5.4 Thinking for advanced reasoning.
  • Posts record benchmark results: 75% on OSWorld-Verified (above the 72.4% human baseline), the top score on APEX-Agents for law and finance, and 83% on the GDPval knowledge-work evaluation.
  • Compared with GPT-5.2, reduces hallucinations by 33% on individual claims and by 18% on full responses. Also introduces Tool Search, which lets the model call tools efficiently without pre-loading tool definitions.
  • Consolidates the coding strengths of GPT-5.3-Codex with improved reasoning and agentic workflows in a single model, and excels at long-horizon tasks such as building slide decks, financial models, and legal analyses.
  • Improved token efficiency lets it solve problems using fewer tokens, and its visual understanding is stronger for high-resolution images, charts, and documents.

Impact

OpenAI's GPT-5.4 release intensifies competition in the frontier AI race. Its native computer use scores 75% on OSWorld-Verified, exceeding both the human baseline (72.4%) and GPT-5.2 (47.3%), outpacing rivals such as Anthropic's Claude and Google's Gemini and enabling autonomous agent workflows without custom infrastructure. By consolidating coding, reasoning, and desktop navigation into one model, the release lowers barriers to enterprise adoption. Per-token pricing is slightly higher, but efficiency gains can offset that cost on complex tasks, potentially shifting market dynamics toward more token-efficient systems. The 1M-token context window and Tool Search are geared toward long-horizon professional deliverables, pressuring competitors to catch up on knowledge-work benchmarks, where GPT-5.4 leads with 83% on GDPval and top APEX-Agents scores. Reduced hallucinations improve reliability for high-stakes finance and legal applications, and chain-of-thought monitoring in the Thinking variant aligns with broader AI safety trends. Over the next 12 to 24 months, this could accelerate R&D into agentic AI, drawing more funding to scalable on-device and enterprise tools while underscoring continued demand for GPUs.
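The claim that efficiency gains can offset a higher per-token price rests on simple arithmetic: the cost of a task is the per-token price multiplied by the number of tokens used, so a pricier model is cheaper per task whenever its token savings outweigh the price increase. The minimal Python sketch below illustrates this. The function names and the 20% price increase and 30% token reduction are hypothetical placeholders, since no actual pricing or token counts are given here.

```python
# Minimal sketch: can a higher per-token price still lower the cost of a task?
# All numbers below are hypothetical placeholders, not published pricing
# or measured token counts.


def relative_task_cost(price_ratio: float, token_ratio: float) -> float:
    """Cost of a task on the new model relative to the old one.

    price_ratio: new per-token price / old per-token price (1.2 = 20% pricier)
    token_ratio: tokens used by the new model / tokens used by the old model
    Task cost is price per token times tokens used, so the ratio is the product.
    """
    return price_ratio * token_ratio


def break_even_token_ratio(price_ratio: float) -> float:
    """Largest token ratio at which the new model is no more expensive per task."""
    return 1.0 / price_ratio


if __name__ == "__main__":
    # Hypothetical scenario: 20% higher per-token price, 30% fewer tokens per task.
    cost = relative_task_cost(1.2, 0.7)
    # A relative cost below 1.0 means the task got cheaper overall.
    print(f"relative cost: {cost:.2f}")
    print(f"break-even token ratio: {break_even_token_ratio(1.2):.3f}")
```

With these hypothetical inputs the relative cost is about 0.84, so the task would be roughly 16% cheaper; the pricier model breaks even only if it uses at most 1/1.2, or about 83%, of the old token count.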