Details
- Perplexity released new research detailing its post-training pipeline, which applies supervised fine-tuning (SFT) followed by on-policy reinforcement learning (RL) to adapt AI models for search-augmented answers.
- The pipeline improves search accuracy, citation quality, instruction following, and efficiency; applied to Alibaba's Qwen models, it matches or exceeds GPT models on factuality at lower cost.
- The first stage fine-tunes models to follow instructions, adhere to guardrails, and maintain consistent language.
- The second stage uses RL with a reward design combining correctness, user preference, and efficiency; preferences only factor in for correct answers, so RL never optimizes flawed responses (see the sketch after this list).
- This explains why the same base model delivers more accurate, better-cited, and more efficient answers inside Perplexity than its unmodified counterpart.
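
To make the reward gating concrete, here is a minimal Python sketch under stated assumptions: the weights, the length-based efficiency penalty, and all function and parameter names are illustrative, since the research describes the signals (correctness, preference, efficiency) and the gating behavior, not an exact formula.

```python
# Hypothetical sketch of a gated reward for RL post-training.
# Weights, signal names, and the efficiency penalty are illustrative
# assumptions, not Perplexity's published implementation.

def combined_reward(
    correctness: float,        # 1.0 if the answer is judged factually correct, else 0.0
    preference: float,         # user/rater preference score in [0, 1]
    num_tokens: int,           # length of the generated answer
    w_pref: float = 0.5,       # assumed weight on preference
    w_eff: float = 0.1,        # assumed weight on the efficiency penalty
    target_tokens: int = 512,  # assumed budget beyond which length is penalized
) -> float:
    """Correctness is the base signal; preference only contributes when the
    answer is correct, so the RL stage never optimizes flawed responses."""
    reward = correctness
    if correctness > 0:  # gate: preference counts only for correct answers
        reward += w_pref * preference
    overage = max(0, num_tokens - target_tokens)
    reward -= w_eff * (overage / target_tokens)  # efficiency: penalize overlong answers
    return reward
```

The gate is the key design choice: a fluent but incorrect answer earns no preference credit, so RL cannot learn to polish wrong responses into preferred ones.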
Impact
Perplexity's SFT + RL pipeline positions it to challenge OpenAI and Anthropic by achieving GPT-level factuality on Qwen base models at reduced cost, potentially accelerating adoption of cost-efficient search-augmented AI in enterprise research tools. Reported benchmarks, such as 93.9% SimpleQA accuracy for the related Deep Research feature against ChatGPT's 87.6% citation validity, narrow the gap with frontier models while emphasizing verifiable citations over raw generation. This operational edge could pressure rivals to prioritize post-training for retrieval tasks, shifting market dynamics toward hybrid search-LLM systems amid rising demand for trustworthy AI outputs in professional settings.
