Hosting team: Inria Lille, CRIStAL (UMR 9189) Supervisor: Damien Sileo Duration: 36 months Project: TACTICS (PEPR)
Language models have become surprisingly capable reasoners, but progress is bottlenecked by data. Both training (SFT, RLVR) and evaluation rely on problem sets with known answers, and the supply of high-quality, uncontaminated, difficulty-calibrated reasoning problems is running thin. Hand-curated benchmarks saturate quickly, leak into pretraining corpora, and cannot be regenerated at will. Web-scraped reasoning data carries licensing baggage and offers no correctness guarantees.
Procedural generation offers a way out. By coupling a problem generator with a symbolic solver, one can produce an effectively unbounded stream of fresh instances, each shipped with a certified solution and a tunable difficulty knob. The same pipeline serves three purposes at once: pretraining and SFT data with reasoning traces, verifiable rewards for RL, and contamination-free evaluation suites. Our group has been building this line of work through the Reasoning Core library, which already covers PDDL planning over randomized domains, full first-order logic, CFG parsing, causal inference over Bayesian networks, and equation solving.
The PhD will extend this infrastructure toward harder formalisms and richer reasoning regimes. The core technical question is how to turn a solver plus a formalism into a good procedural generator: one whose output distribution is broad rather than templated, whose difficulty scales smoothly, and whose problems remain well-posed and human-readable as complexity grows.
Several directions are open and will be shaped with the candidate:
- Advanced logics and planning. Hierarchical planning (HTN/HDDL), multi-agent and epistemic planning, temporal logics, and modal logics for reasoning about knowledge and belief. Each formalism has mature solvers but no scalable generator that exposes their full expressive range.
- From solver to generator. Most solvers are built to consume problems, not produce them. Sampling instances that are non-trivial, non-degenerate, solvable within a budget, and distributionally diverse is a research problem in itself, and it gets harder as the formalism grows.
- Verbalization and alignment with human priors. A formal instance is only useful if its natural-language rendering is faithful, varied, and not artificially easy or hard. This involves grammar-based verbalization, controlling surface form independently of logical content, and understanding how phrasing interacts with model inductive biases.
- Pushing scalability where solvers struggle. At high difficulty, exact solvers time out. The thesis will explore approximate verification, compositional generation, and incremental construction strategies that preserve correctness guarantees while extending the reachable difficulty range.
- Location: Inria Lille, France
- Funding: full PhD scholarship (PEPR TACTICS), plus compute budget for frontier model inference
- Start date: flexible, ideally October 2026