Details

  • OpenAI released GPT-5.3-Codex-Spark, a smaller, ultra-fast version of GPT-5.3-Codex, as a research preview for real-time coding assistance.
  • Available today to ChatGPT Pro users via the Codex app, CLI, and VS Code extension; API access is limited to select design partners.
  • Delivers over 1000 tokens per second on Cerebras Wafer Scale Engine 3 hardware, enabling near-instant feedback for edits, refactoring, and interface changes.
  • Offers a 128k-token, text-only context window; separate rate limits apply because it runs on specialized low-latency infrastructure.
  • Includes end-to-end optimizations such as an 80% reduction in client/server round-trip overhead and 50% faster time-to-first-token via a persistent WebSocket connection (see the sketch after this list).
  • Outperforms GPT-5.1-Codex-mini on the SWE-Bench Pro and Terminal-Bench 2.0 benchmarks, completing tasks in a fraction of the time GPT-5.3-Codex takes.
  • First milestone in the OpenAI-Cerebras partnership announced in January 2026; future plans include larger models, longer context windows, and multimodal support.
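
The round-trip and time-to-first-token gains come from connection reuse: the client keeps one WebSocket open across requests instead of paying a TLS and HTTP handshake for every completion. OpenAI has not published a wire protocol for Codex-Spark, so the endpoint URL, message schema, and field names below are illustrative assumptions; this is a minimal sketch of the persistent-connection pattern using the Python websockets library.

```python
import asyncio
import json
import os

import websockets  # pip install websockets (v14+, for additional_headers)

# Hypothetical endpoint and message schema: OpenAI has not published a
# WebSocket protocol for Codex-Spark. This only illustrates the
# persistent-connection pattern behind the latency claims.
WS_URL = "wss://api.example.com/v1/codex-spark/stream"


async def main() -> None:
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    # One long-lived connection: the TLS + HTTP upgrade handshake is paid
    # once, then every subsequent request reuses the open socket.
    async with websockets.connect(WS_URL, additional_headers=headers) as ws:
        for prompt in ("rename foo to bar", "extract this loop into a helper"):
            await ws.send(json.dumps({"model": "gpt-5.3-codex-spark",
                                      "prompt": prompt}))
            # Tokens stream back one message at a time; at ~1000 tokens/s,
            # a 200-token edit arrives in roughly 0.2 seconds.
            while True:
                event = json.loads(await ws.recv())
                if event.get("done"):
                    break
                print(event.get("token", ""), end="", flush=True)
            print()


asyncio.run(main())
```

Reusing the socket removes the handshake from the critical path of each edit, which is plausibly where the claimed 50% time-to-first-token improvement comes from; the per-token streaming loop is what makes 1000 tokens per second feel instantaneous in an editor.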

Impact

OpenAI's GPT-5.3-Codex-Spark introduces a latency-first tier to coding AI. Powered by Cerebras' Wafer Scale Engine and sustaining over 1000 tokens per second, it enables a kind of real-time collaboration that rival tools are not yet optimized for in interactive developer workflows. While competitors like Anthropic's Claude and Google's Gemini offer strong coding capabilities, they lag in sub-second response times for live editing, positioning Codex-Spark to pressure rivals and accelerate adoption in high-velocity development environments where speed matters as much as intelligence. This dual-mode approach, pairing long-horizon agents with instant iteration, aligns with the industry shift toward hybrid AI agents, potentially lowering barriers to rapid prototyping and widening access via Pro subscriptions. Over the next 12-24 months, it could steer R&D toward inference hardware innovation, intensify interest in GPU alternatives like Cerebras, and influence funding flows into low-latency AI infrastructure amid growing demand for on-device and edge computing.