The postdoc project is hosted jointly by INRIA Lille and CWI in Amsterdam. These national research centers have ongoing collaborations in the area of bandits and statistical testing. This postdoc project revolves around the question of sequential robust estimation. This includes, for example, mean estimation in the Huber model with adversarial corruptions (Mathieu et al., 2022; Agrawal et al., 2024) and A/B testing under mild model misspecification.
Robust statistics and robust estimation deal with inference and estimation when the data contain outliers, i.e. when a small proportion of the data is arbitrary and does not come from the distribution we want to learn (Ronchetti and Huber, 2009; Wilcox, 2012). Because such outliers can make most classical non-robust statistics arbitrarily bad, classical approaches such as least-squares regression and maximum likelihood estimation have to be significantly modified. The effect of outliers is even stronger when the sample size is small, which is precisely the regime where sequential methods are customary: data collection stops as soon as there is enough evidence to conclude. Depending on when the outliers arrive, a sequential test may fail completely: it may stop very early and reach a wrong conclusion with high confidence.
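To illustrate how a small fraction of outliers can make a classical statistic arbitrarily bad, here is a minimal sketch in the spirit of the Huber contamination model. It assumes, purely for illustration, that a fixed fraction eps of the points is replaced by a single far-away value while the inliers are standard Gaussian with true mean 0; the helper name `huber_contaminated_sample` is hypothetical. The sample mean follows the outliers, while the median barely moves.

```python
import random
import statistics


def huber_contaminated_sample(n, eps, outlier_value, seed=0):
    """Hypothetical helper: n points, of which round(eps * n) are set to
    outlier_value and the rest are i.i.d. N(0, 1) inliers (true mean 0).

    Assumption for illustration: the adversary places every corrupted
    point at the same far-away value.
    """
    rng = random.Random(seed)
    n_out = int(round(eps * n))
    inliers = [rng.gauss(0.0, 1.0) for _ in range(n - n_out)]
    data = inliers + [outlier_value] * n_out
    rng.shuffle(data)  # order does not matter for mean/median
    return data


data = huber_contaminated_sample(n=1000, eps=0.05, outlier_value=1e6)
print(f"sample mean:   {statistics.mean(data):.1f}")    # about 50000
print(f"sample median: {statistics.median(data):.3f}")  # close to 0
```

Pushing `outlier_value` further away moves the mean without bound, whereas the median stays near 0 as long as eps is below one half.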
More specifically, we are interested in studying two statistics in the sequential setting and under specific corruption models: the Generalized Likelihood Ratio Test (GLRT) statistic (Kaufmann and Koolen, 2021; Agrawal et al., 2024) and the closely related KLInf statistic (Degenne and Mathieu, 2024). Both statistics are popular tools in statistical testing because of their close relation to information-theoretic lower bounds on the minimal sample size needed to conclude. They have already been applied successfully to estimation, confidence intervals and testing in both parametric and non-parametric models. Although the GLRT and KLInf statistics are well defined in some corrupted settings (Agrawal et al., 2024), their concentration properties are not yet sufficiently understood for most practical distributions.
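As a concrete, simplified picture of a sequential GLRT, the sketch below tests H0: mu <= mu0 against H1: mu > mu0 for i.i.d. Gaussian observations with known variance. In this case the log generalized likelihood ratio after t samples is t * max(mean_hat - mu0, 0)^2 / (2 sigma^2). The stopping threshold used here, log(1/delta) + log(1 + log t), is a hypothetical placeholder chosen for readability, not the calibrated anytime-valid threshold of Kaufmann and Koolen (2021). The example also reproduces the failure mode described above: a few early outliers make the test stop at once and wrongly reject a true H0.

```python
import math
import random


def gaussian_glr(mean_hat, t, mu0=0.0, sigma=1.0):
    """Log GLR for H0: mu <= mu0 vs H1: mu > mu0 after t i.i.d.
    N(mu, sigma^2) observations with known sigma."""
    gap = max(mean_hat - mu0, 0.0)
    return t * gap * gap / (2.0 * sigma * sigma)


def threshold(t, delta):
    # Hypothetical placeholder threshold, for illustration only.
    return math.log(1.0 / delta) + math.log(1.0 + math.log(t))


def sequential_glrt(stream, delta=0.01, mu0=0.0, sigma=1.0):
    """Return the first time t at which the log GLR exceeds the threshold
    (H0 is rejected), or None if the stream ends first."""
    total = 0.0
    for t, x in enumerate(stream, start=1):
        total += x
        if gaussian_glr(total / t, t, mu0, sigma) > threshold(t, delta):
            return t
    return None


rng = random.Random(1)
clean = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # H0 (mu = 0) is true
corrupted = [50.0, 50.0, 50.0] + clean               # three early outliers
print(sequential_glrt(clean))
print(sequential_glrt(corrupted))  # 1: stops immediately, wrongly rejecting H0
```

The corrupted run stops at t = 1 because a single value of 50 already gives a log GLR of 1250, far above the threshold; robust variants aim to prevent exactly this.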
In this context, the postdoc will work with us to address the challenges of sequential robust estimation under specific distributional assumptions, structures and corruption styles. Specifically, we will work on one or more of the following problems:
- Behaviour of GLRTs under different corruption models and structures of data distributions,
- Tightness and coverage of anytime-valid and robust confidence intervals in terms of KLInf,
- Concentration of KLInf-driven robust mean estimation under corruptions beyond Gaussians.
The postdoc will study, employ and develop statistical frameworks including the GLRT, KLInf, GRO and E-values (Grünwald et al., 2024), martingales (Ruf et al., 2022; Kaufmann and Koolen, 2021), and online learning and universal prediction (Agrawal et al., 2021).
The candidate is expected to conduct the research activities, namely bibliographic research, proposing and developing original ideas related to the topic, and presenting the work at the Scool seminar, workshops and conferences. Due to the collaborative nature of the project, the candidate is also expected to spend part of their time at CWI Amsterdam. The candidate should aim to publish the research results in premier conferences and journals of our field (e.g. ICML, NeurIPS, COLT, IJCAI, AAAI, JMLR, Annals of Statistics). Since the work bears on responsible AI in general, the successful candidate should also contribute to writing scientific articles aimed at a broader audience.
References
Agrawal, S., Koolen, W. M., and Juneja, S. (2021). Optimal best-arm identification methods for tail-risk measures. Advances in Neural Information Processing Systems, 34:25578–25590.
Agrawal, S., Mathieu, T., Basu, D., and Maillard, O.-A. (2024). CRIMED: Lower and upper bounds on regret for bandits with unbounded stochastic corruption. In International Conference on Algorithmic Learning Theory, pages 74–124. PMLR.
Degenne, R. and Mathieu, T. (2024). Information lower bounds for robust mean estimation.
Grünwald, P., de Heide, R., and Koolen, W. (2024). Safe testing. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 86(5):1136–1137.
Kaufmann, E. and Koolen, W. M. (2021). Mixture martingales revisited with applications to sequential tests and confidence intervals. The Journal of Machine Learning Research, 22(1):11140–11183.
Mathieu, T., Basu, D., and Maillard, O.-A. (2022). Bandits corrupted by nature: Lower bounds on regret and robust optimistic algorithms. Transactions on Machine Learning Research.
Ronchetti, E. M. and Huber, P. J. (2009). Robust statistics. John Wiley & Sons, Hoboken, NJ, USA.
Ruf, J., Larsson, M., Koolen, W. M., and Ramdas, A. (2022). A composite generalization of Ville’s martingale theorem. arXiv preprint arXiv:2203.04485.
Wilcox, R. R. (2012). Introduction to robust estimation and hypothesis testing. Academic Press.