Details

  • OpenAI released GPT-5.3-Codex-Spark, a lightweight version of its agentic coding model GPT-5.3-Codex, as a research preview exclusively for ChatGPT Pro users in the Codex app.
  • The model achieves over 1000 tokens per second, powered by Cerebras' Wafer Scale Engine 3 (WSE-3) chip with 4 trillion transistors for ultra-low latency inference.
  • Designed for real-time collaboration, rapid prototyping, and swift iteration, contrasting with the full GPT-5.3-Codex for deeper, longer tasks.
  • This marks the first milestone in OpenAI's multi-year, over $10 billion partnership with Cerebras, integrating custom hardware to accelerate AI responses.
  • The launch ships with known limitations that OpenAI plans to address rapidly; the model is available today for Pro users, aimed at day-to-day productivity in coding workflows.
  • Builds on the recent GPT-5.3-Codex release, which improved performance on coding benchmarks and was itself partly developed with the help of prior models.

Impact

OpenAI's rollout of GPT-5.3-Codex-Spark underscores its push into high-speed inference. Leveraging Cerebras' WSE-3 to deliver over 1000 tokens per second is a concrete leap that pressures rivals like Anthropic and Google in agentic coding tools, where low latency enables new real-time workflows. This hardware integration, following a $10 billion deal, addresses GPU bottlenecks and positions OpenAI to widen access for Pro users, potentially accelerating adoption in software development amid rising demand: Codex usage has doubled since December. While cybersecurity risks prompted safeguards such as trusted-access programs for the base model, Spark's focus on rapid prototyping sidesteps some high-risk automation for now, in line with OpenAI's preparedness framework. Over the next 12-24 months, this could steer R&D toward hybrid chip-model stacks, draw funding to inference specialists like Modal Labs, and shift market dynamics toward faster, cheaper AI deployment without sacrificing reasoning depth.