Details

  • OpenAI is rolling out GPT-5.4, GPT-5.4 Thinking, and GPT-5.4 Pro across ChatGPT, API, and Codex, combining advances in reasoning, coding, and agentic workflows into one frontier model.
  • Key improvements include enhanced deep web research, better context retention during long thinking sessions, and the ability for users to interrupt the model and adjust its instructions mid-process.
  • The model is more factual and efficient, using fewer tokens and responding faster; compared to GPT-5.2, its individual claims are 33% less likely to contain factual errors and its responses are 18% less likely to contain errors overall.
  • Features native computer use for autonomous operation of desktops, browsers, and software via screenshots and commands; supports a context window of up to 1M tokens in the API and Codex.
  • Introduces Tool Search in the API to look up tool definitions dynamically, reducing token usage by 47% in tests with many tools while maintaining accuracy.
  • Sets new benchmark highs: 75% on OSWorld-Verified (surpassing the human baseline of 72.4%), plus record scores on WebArena Verified, GDPval (83%), and APEX-Agents for law and finance tasks.
  • Rolling out gradually to ChatGPT Plus, Team, and Pro subscribers; excels at long-horizon tasks like slide decks, financial models, legal analysis, and complex coding.
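
The Tool Search idea mentioned above (looking up tool definitions on demand instead of sending all of them with every request) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the tool names, the keyword matching, and the word-count token proxy are hypothetical, and none of it reflects OpenAI's actual API or lookup mechanism.

```python
# Hypothetical illustration only: these names are NOT part of OpenAI's API.
TOOLS = {
    "get_stock_price": "Return the latest stock price for a ticker symbol.",
    "convert_currency": "Convert an amount between two currency codes.",
    "search_case_law": "Search legal case law by keyword and jurisdiction.",
    "create_slide": "Create a presentation slide with a title and bullet points.",
}


def rough_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def search_tools(query: str, tools: dict[str, str]) -> dict[str, str]:
    """Keep only tools whose name or description shares a word with the query."""
    query_words = set(query.lower().split())
    selected = {}
    for name, desc in tools.items():
        text = (name.replace("_", " ") + " " + desc).lower().replace(".", "")
        if query_words & set(text.split()):
            selected[name] = desc
    return selected


def prompt_cost(tools: dict[str, str]) -> int:
    """Approximate cost of including these tool definitions in a prompt."""
    return sum(rough_tokens(name + " " + desc) for name, desc in tools.items())


query = "stock price"
selected = search_tools(query, TOOLS)
full_cost = prompt_cost(TOOLS)      # every definition sent up front
lazy_cost = prompt_cost(selected)   # only the definitions found by search

print(f"tools selected: {list(selected)}")
print(f"all definitions: {full_cost} words, searched subset: {lazy_cost} words")
```

The saving in this toy case depends entirely on the made-up tool list; the 47% figure above comes from OpenAI's own tests and is not reproduced by this sketch.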

Impact

OpenAI's GPT-5.4 release intensifies competition in enterprise AI by consolidating reasoning, coding, and native agentic capabilities into a single model. It directly challenges Anthropic's stronghold in professional workflows with benchmark results such as 75% on OSWorld-Verified computer navigation, which exceeds the human baseline, and record scores on knowledge-work tests. The 1M-token context window and the 47% token savings from Tool Search lower costs for developers building large-scale agents, potentially accelerating adoption in tools like spreadsheets and in integrations with FactSet or Moody's, while the 33% reduction in factual errors in individual claims improves reliability for finance and legal tasks. This positions OpenAI ahead in the race toward autonomous AI agents, pressuring rivals such as Microsoft's Copilot and Perplexity's offerings to match its native computer control and efficiency gains. Over the next 12-24 months, such advances could redirect funding toward agentic R&D and widen the gap between frontier models capable of multi-step execution and legacy systems, while aligning with trends in on-device inference and in AI safety through transparent chain-of-thought monitoring.