Context and advantages of the position
Benchmarks for Language Models
Hosting team: Inria Lille, CRIStAL (UMR 9189)
Supervisor: Damien Sileo
Duration: 36 months
Project: TACTICS (PEPR)
Context
Language models have become surprisingly capable reasoners, but progress is bottlenecked by data. Both training (SFT, RLVR) and evaluation rely on problem sets with known answers, and the supply of high-quality, uncontaminated, difficulty-calibrated reasoning problems is running thin. Hand-curated benchmarks saturate quickly, leak into pretraining corpora, and cannot be regenerated at will. Web-scraped reasoning data carries licensing baggage and offers no correctness guarantees.
Procedural generation offers a way out. By coupling a problem generator with a symbolic solver, one can produce an effectively unbounded stream of fresh instances, each shipped with a certified solution and a tuna...
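To make the generator/solver coupling concrete, here is a toy sketch. Everything in it is a hypothetical illustration, not the project's actual tooling. The arithmetic-expression domain, the names `generate_expr`, `solve`, `render`, `make_instance` and `verify`, and the use of tree depth as the difficulty knob are all assumptions. The generator samples a problem from a seed. An exact solver (here, rational arithmetic via Python's `fractions` module) computes the certified answer and rejects ill-posed instances. A verifier then checks a model's answer against that certificate.

```python
import random
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Instance:
    prompt: str
    answer: Fraction  # certified by the exact solver, not by a model
    difficulty: int   # hypothetical knob: expression-tree depth


def generate_expr(rng: random.Random, depth: int) -> tuple:
    """Sample a random binary expression tree with the given depth."""
    if depth == 0:
        return ("num", rng.randint(1, 9))
    op = rng.choice("+-*/")
    return (op, generate_expr(rng, depth - 1), generate_expr(rng, depth - 1))


def solve(tree: tuple) -> Fraction:
    """Exact evaluation with rationals, so the answer has no rounding error."""
    if tree[0] == "num":
        return Fraction(tree[1])
    left, right = solve(tree[1]), solve(tree[2])
    if tree[0] == "+":
        return left + right
    if tree[0] == "-":
        return left - right
    if tree[0] == "*":
        return left * right
    return left / right  # raises ZeroDivisionError if the divisor is 0


def render(tree: tuple) -> str:
    """Turn the tree into a fully parenthesised problem statement."""
    if tree[0] == "num":
        return str(tree[1])
    return f"({render(tree[1])} {tree[0]} {render(tree[2])})"


def make_instance(seed: int, depth: int) -> Instance:
    """Same seed and depth always yield the same instance (reproducible)."""
    rng = random.Random(seed)
    while True:
        tree = generate_expr(rng, depth)
        try:
            answer = solve(tree)
        except ZeroDivisionError:
            continue  # reject ill-posed problems and resample
        return Instance(f"Compute exactly: {render(tree)}", answer, depth)


def verify(instance: Instance, model_output: str) -> bool:
    """Check a model's answer against the certified solution."""
    try:
        return Fraction(model_output.strip()) == instance.answer
    except (ValueError, ZeroDivisionError):
        return False  # unparseable answers count as wrong


if __name__ == "__main__":
    for seed in range(3):
        inst = make_instance(seed, depth=3)
        print(inst.prompt, "->", inst.answer)
```

Seeding makes every instance regenerable on demand, and the depth parameter gives a crude way to scale difficulty. A real generator would need a far richer problem space and a better-calibrated difficulty measure.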