Work done by Weiqi Wang, Xin Liu


🔍 TL;DR

Problem

Idea: HeaPA. Turn a static dataset into a living curriculum using four components:

  1. Dual-heap query pool
  2. On-policy query augmentation
  3. Teacher-guided verification
  4. Reward propagation via a lineage graph

Results (Qwen2.5-7B, math RL)

Plug-and-play

Code and paper coming soon!


1. Motivation: Why data efficiency matters in RL for reasoning

LLMs are pretty good at many NLP tasks, but math reasoning is still hard: