Context
PDE surrogates are neural networks trained on data generated by traditional numerical PDE solvers. Their goal is to approximate these solvers at a significantly lower computational and memory cost. These approaches have recently attracted strong interest within the emerging fields of Scientific Machine Learning (SciML) and AI for Science.
Model architectures have rapidly evolved, from CNN-based designs to advanced approaches combining attention mechanisms, neural operators, and generative models. Recent examples include PDE-Transformer, Poseidon, and Universal Physics Transformer. In weather forecasting, several teams have reported breakthrough results using PDE surrogates, achieving near state-of-the-art accuracy at a fraction of the computational cost.
Deterministic PDE surrogates are typically trained by minimizing a mean squared error (MSE) loss between predictions and ground truth. However, they often suffer from a “regression to the mean” effect, which limits their ability to capture complex or chaotic dynamics.
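A minimal toy sketch of the "regression to the mean" effect. It assumes a hypothetical setting where the same input can lead to two diverging outcomes, +1 or -1, with equal probability, as happens in chaotic dynamics. Among all constant predictions, the one that minimizes MSE is the sample mean (about 0). That value is never an actual outcome.

```python
import random

def mse(pred, targets):
    """Mean squared error of a single constant prediction against many targets."""
    return sum((pred - y) ** 2 for y in targets) / len(targets)

# Hypothetical toy setting: for the same input, the "true" future state is
# either +1 or -1 with equal probability (two diverging chaotic trajectories).
random.seed(0)
targets = [random.choice([-1.0, 1.0]) for _ in range(10_000)]

# Scan constant predictions on a grid; the MSE-optimal one is the grid point
# closest to the sample mean.
candidates = [i / 100 for i in range(-150, 151)]
best = min(candidates, key=lambda p: mse(p, targets))
mean = sum(targets) / len(targets)

# Fraction of actual outcomes that lie close to the MSE-optimal prediction.
near_best = sum(abs(y - best) < 0.5 for y in targets) / len(targets)

print(f"sample mean = {mean:.3f}, MSE-optimal prediction = {best:.2f}")
print(f"fraction of targets within 0.5 of that prediction: {near_best}")
```

The MSE-optimal prediction sits between the two modes, and no realized trajectory lies near it. A model trained with MSE is pushed toward this kind of blurred average.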
Stochastic PDE surrogates address this limitation by incorporating generative modeling techniques such as diffusion models (DDPM), score-based models, or flow matching. These methods learn to transform a simple known distribution (typically Gaussian) into a complex target distribution through an iterative process: progressive denoising for diffusion and score-based models, or integration of a learned velocity field for flow matching. At inference time, the model generates realistic samples conditioned on input data.
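A one-dimensional sketch of flow-matching sampling, under simplifying assumptions. It uses the linear path x_t = (1 - t) x0 + t x1, where x0 ~ N(0, 1) and x1 ~ N(mu, sigma^2) are drawn independently. In this Gaussian case the velocity field E[x1 - x0 | x_t = x] has a closed form. A real surrogate would instead train a network on high-dimensional fields to approximate that field. The conditioning input `c` and the rule `mu = 2c` are hypothetical placeholders for "input data". Samples are generated by integrating dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps.

```python
import math
import random

def velocity(x, t, mu, sigma):
    """Exact marginal velocity for N(0, 1) -> N(mu, sigma^2) on the linear path
    x_t = (1 - t) * x0 + t * x1 with independent x0, x1.
    In a trained surrogate, this closed form is replaced by a network v(x, t, condition)."""
    var_t = (1 - t) ** 2 + (t * sigma) ** 2
    return mu + (t * sigma ** 2 - (1 - t)) / var_t * (x - t * mu)

def sample(mu, sigma, n_steps=100):
    """Draw one sample by Euler-integrating dx/dt = v(x, t) from t=0 to t=1."""
    x = random.gauss(0.0, 1.0)  # start from the simple Gaussian prior
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x += dt * velocity(x, k * dt, mu, sigma)
    return x

# Hypothetical conditioning: the target distribution depends on an input c.
random.seed(0)
c = 1.5
mu, sigma = 2.0 * c, 0.5
samples = [sample(mu, sigma) for _ in range(5000)]

m = sum(samples) / len(samples)
s = math.sqrt(sum((x - m) ** 2 for x in samples) / (len(samples) - 1))
print(f"sample mean {m:.2f} (target {mu}), sample std {s:.2f} (target {sigma})")
```

Repeated calls to `sample` with the same condition give different outputs. Their spread is what gives generative surrogates access to uncertainty, unlike a single deterministic prediction.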
Compared to deterministic approaches, generative PDE surrogates better capture fine-scale structures and uncertainty, especially for chaotic systems. They naturally enable uncertainty quantification, making them well-suited for sensitivity analysis and inverse problems (e.g., parameter estimation via Bayesian inference).
NEMO (https://www.nemo-ocean.eu/) is a widely used ocean circulation model for research and operational forecasting in oceanography and climate science. It is based on the Navier–Stokes equations, coupled with a nonlinear equation of state linking temperature and salinity to fluid motion. Because ocean dynamics are turbulent and chaotic, uncertainty quantification is essential, which motivates the use of stochastic PDE surrogates. Similarly, Croco (https://www.croco-ocean.org) is another ocean model, specialized in coastal and regional simulations.
The objective of this position is to design, train, and validate a conditional generative PDE surrogate for the NEMO and Croco models.
Our research
This project is a collaboration between IGE and the DataMove team, combining complementary expertise in ocean modeling and large-scale machine learning.
IGE is one of the leading contributors to the NEMO model and has deep expertise in ocean model numerical implementations, parameterizations, and applications. This knowledge is essential for data generation, validation, and physical interpretation.
DataMove has extensive experience in training PDE surrogates on large-scale supercomputing infrastructures. The team develops and maintains Melissa, an in-house platform (https://hal.science/hal-04102400v1 - ICML 2023), which enables efficient online training by streaming data directly from simulation runs to distributed multi-GPU training pipelines.
Melissa also supports active learning strategies, allowing simulations to focus on challenging regimes and thereby improve both model quality and training efficiency (https://hal.science/hal-04712480v1).