Details

  • Sam Altman announced GPT-5.3-Codex, OpenAI's new coding model achieving top scores: 57% on SWE-Bench Pro, 76% on TerminalBench 2.0, and 64% on OSWorld.
  • Features mid-task steerability with live progress updates, plus strong computer-use capabilities.
  • More efficient: uses less than half the tokens of GPT-5.2-Codex for similar tasks and is over 25% faster per token.
  • Available now to paid ChatGPT users for software development tasks such as writing, debugging, and testing code via Codex tools.
  • Released amid cybersecurity concerns; first OpenAI model rated 'high' risk, with gated access, safeguards, and no full API yet for sensitive uses.
  • Follows the recent Codex app launch and arrived minutes after Anthropic released a competing agentic coding model.
  • The model contributed to its own development: early versions helped debug and evaluate it.

Impact

OpenAI's GPT-5.3-Codex positions the company ahead in the AI coding race, surpassing prior OpenAI and Anthropic models on key benchmarks such as SWE-Bench Pro, where its 57% score marks a leap that could accelerate software development by enabling complex, multi-day tasks like building full games from prompts. This pressures rivals like Anthropic, whose similar tool launched just minutes earlier, and intensifies competition in agentic capabilities. By halving token usage and speeding up per-token generation by more than 25%, it lowers barriers for developers, potentially shifting market dynamics toward broader adoption in everyday coding and expanding non-expert access via ChatGPT subscriptions. However, the 'high' cybersecurity rating has prompted unprecedented safeguards, including trusted-access programs, aligning with growing regulatory scrutiny of AI-enabled harms and setting a precedent for cautious rollouts of powerful agents. Over the next 12-24 months, this could steer R&D toward safer multi-agent orchestration and on-device tools, while driving funding into inference optimization to match OpenAI's efficiency gains.