Details
- Cursor released a technical report detailing Composer 2's training process: starting from Moonshot AI's open-source Kimi K2.5 model, it combines continued pretraining, reinforcement learning (RL), and custom benchmark development to emulate real coding environments.
- Key phases include continued pretraining for consistent downstream coding gains, RL with deliberately simple algorithms for broad performance boosts, and CursorBench, a benchmark of realistic, complex software engineering problems.
- Infrastructure highlights include custom open-sourced kernels, distributed training, and RL environment scaling; the report credits Kimi K2.5, Ray, ThunderKittens, PyTorch, Fireworks, and Colfax.
- Composer 2 achieves frontier-level coding results: 61.3% on CursorBench, 61.7% on Terminal-Bench 2.0, and 73.7% on SWE-bench Multilingual, up from Composer 1.5's 44.2%, 47.9%, and 65.9% respectively.
- Priced at $0.50/M input and $2.50/M output tokens (fast variant: $1.50/M input, $7.50/M output); uses compaction-in-the-loop RL, which cuts error rates by 50% on project-scale refactors spanning hundreds of actions.
- Cursor notes that only about a quarter of the model's training compute comes from the base Kimi model; the rest is proprietary, yielding markedly different benchmark performance.
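The per-token pricing above can be made concrete with a small cost calculation. This is an illustrative sketch, not an official calculator: the token counts for the sample session are hypothetical, and only the four per-million-token rates come from the announcement.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Cost in USD; rates are dollars per million tokens."""
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# Composer 2 list prices (USD per million input/output tokens)
STANDARD = (0.50, 2.50)
FAST = (1.50, 7.50)

# Hypothetical agentic session: 400k input tokens, 60k output tokens
std = request_cost(400_000, 60_000, *STANDARD)   # $0.35
fast = request_cost(400_000, 60_000, *FAST)      # $1.05

print(f"standard: ${std:.2f}, fast: ${fast:.2f}")
```

At these rates the fast variant costs exactly 3x the standard one for any token mix, since both prices are scaled by the same factor.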
Impact
Cursor's Composer 2, built atop Moonshot's Kimi with heavy continued pretraining and RL, delivers benchmark scores rivaling leaders like Claude Opus, pressuring rivals such as OpenAI and Anthropic on coding-specific tasks with aggressive $0.50/M input pricing that undercuts many frontier models. It leverages a Chinese open-source foundation amid U.S.-China AI tensions, accelerating U.S. developer-tool progress while highlighting global model interdependence. Rapid iteration, with a third generation in five months, narrows the gap with generalist APIs, potentially shifting the market toward specialized coding agents and lowering barriers to enterprise adoption.
