Details

  • Qwen, a Chinese AI lab, introduced Jan-v2-VL, its second-generation vision-language agent designed to handle complex, multi-stage tasks.
  • In internal stress tests, Jan-v2-VL completed 49 consecutive sub-tasks—ranging from web searches and image analysis to code execution—without any failures.
  • According to engineering documentation, the model combines a hierarchical planner, a long-context memory buffer, and a vision-language transformer fine-tuned on Qwen-2 72B (see the sketch after this list).
  • The new version raises the autonomy ceiling by 58 percent over Jan-v1 (which managed 31 steps) and more than doubles the 20-step performance reported for Google’s Gemini 1.5 Pro.
  • Demonstrations include a travel-planning workflow in which the system parses screenshots, books flights through APIs, generates expense spreadsheets, and sends booking emails (mirrored in the sketch below).
  • Qwen plans to open-source the model weights and evaluation tools in December to support academic verification and safety research.
  • Jan-v2-VL will be offered on Alibaba Cloud’s PAI-EAS platform through a pay-as-you-go API priced at $0.006 per 1,000 tokens starting December 1, 2025 (a quick cost check follows the sketch).
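
The documentation does not include reference code, so here is a minimal sketch, assuming a simple sequential executor, of how a hierarchical planner might chain sub-tasks through a long-context memory buffer. Every name below (MemoryBuffer, run_plan, the task strings) is a hypothetical illustration, not Qwen's actual API; the usage lines mirror the travel-planning demo above.

    # Hypothetical planner/memory sketch; names are illustrative only.
    from dataclasses import dataclass, field

    @dataclass
    class MemoryBuffer:
        """Accumulates completed sub-task results so later steps can
        reference earlier context without re-prompting."""
        entries: list[str] = field(default_factory=list)

        def add(self, note: str) -> None:
            self.entries.append(note)

        def context(self) -> str:
            return "\n".join(self.entries)

    def run_plan(sub_tasks, execute):
        """Run sub-tasks in order, stopping at the first failure so a
        broken step cannot silently corrupt the ones after it."""
        memory = MemoryBuffer()
        for step, task in enumerate(sub_tasks, start=1):
            result = execute(task, memory.context())  # caller-supplied tool call
            if result is None:
                raise RuntimeError(f"step {step} failed: {task}")
            memory.add(f"[{step}] {task} -> {result}")
        return memory

    # Usage mirroring the travel-planning demo; each string stands in
    # for a real tool call (screenshot parser, flight API, email client).
    plan = [
        "parse itinerary screenshot",
        "search and book flights via API",
        "generate expense spreadsheet",
        "send booking confirmation email",
    ]
    memory = run_plan(plan, lambda task, ctx: f"ok ({len(ctx)} chars of prior context)")
    print(memory.context())

Halting on the first failed step is one plausible reading of "49 consecutive sub-tasks without any failures": long chains only hold up if errors surface immediately rather than propagating.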

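For scale, a quick back-of-the-envelope check of the announced rate; the token count here is an assumed workload, not a published figure.

    # Cost at the announced pay-as-you-go rate (USD per 1,000 tokens).
    RATE_PER_1K = 0.006
    tokens = 2_500_000  # hypothetical monthly usage, for illustration
    print(f"{tokens:,} tokens -> ${tokens / 1_000 * RATE_PER_1K:.2f}")
    # 2,500,000 tokens -> $15.00
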
Impact

Qwen’s ability to sustain 49 consecutive tasks puts pressure on rivals such as Anthropic’s Claude 4 and OpenAI’s GPT-4o, which currently reach about 35 steps. With pricing significantly below that of Western competitors, Qwen’s offering could drive wider adoption of multimodal AI in the Asia-Pacific region. The long-context memory buffer not only sets a new bar for reliability but could influence future model designs across the industry.