A striking feature of the current types of attacks on AI models is
their number and apparent independence (see [10], Fig. 3): each is
treated as a separate domain. In addition to this list of attacks, we
claim that an audit may be viewed as the leak of a feature from a
production model, and must therefore be considered a potential threat.
In that light, the relations between these attacks might be clarified
by a systematic study of how they compare with regard to the setup
they operate in versus the information gain they permit. We propose to
work on a hierarchy of attacks that will uncover the smallest attacks
(in terms of assumptions and scope) and show how they can be composed
into larger attacks, and so on. This hierarchy will reveal unexplored
configurations in which several simple attacks are combined to build
richer ones. It will also provide the missing link between audits and
AI security, bridging the two in a formal way. The postdoc candidate
will leverage an algorithmic background to devise this hierarchy, in
parallel to the Herlihy hierarchy in algorithms. We intend to use the
notion of "distinguishability" [14] as the backbone of the hierarchy
(to assess whether an attack leaks data permitting strong or weak
distinguishability of models). In particular, the field of "property
testing" will be related to this hierarchy.
# References
[1] Le Merrer, E., Pons, R., & Tredan, G. (2024). Algorithmic audits of algorithms, and the law. AI and Ethics, 4(4),
1365-1375.
[2] Godinot, A., Le Merrer, E., Penzo, C., Taïani, F., & Tredan, G. (2025). Queries, Representation & Detection: The
Next 100 Model Fingerprinting Schemes. In AAAI.
[3] Le Merrer, E., & Tredan, G. (2020). Remote explainability faces the bouncer problem. Nature Machine Intelligence,
2(9), 529-539.
[4] Maho, T., Furon, T., & Le Merrer, E. (2021). Surfree: a fast surrogate-free black-box attack. In CVPR.
[5] Godinot, A., Le Merrer, E., Tredan, G., Penzo, C., & Taïani, F. (2024). Under manipulations, are some AI models
harder to audit?. In IEEE Conference on Secure and Trustworthy Machine Learning.
[6] de Vos, M., Dhasade, A., Garcia Bourrée, J., Kermarrec, A. M., Le Merrer, E., Rottembourg, B., & Tredan, G. (2024).
Fairness auditing with multi-agent collaboration. In ECAI.
[7] Le Merrer, E., Perez, P., & Tredan, G. (2020). Adversarial frontier stitching for remote neural network
watermarking. Neural Computing and Applications, 32(13), 9233-9244.
[8] Le Merrer, E., Morgan, B., & Tredan, G. (2021). Setting the record straighter on shadow banning. In INFOCOM.
[9] Maho, T., Furon, T., & Le Merrer, E. (2022). Randomized smoothing under attack: How good is it in practice?. In
ICASSP.
[10] Ma et al. (2025). Safety at Scale: A Comprehensive Survey of Large Model Safety. arXiv:2502.05206v3.
[11] Yan, T., & Zhang, C. (2022). Active fairness auditing. In ICML.
[12] Apruzzese, G., Anderson, H. S., Dambra, S., Freeman, D., Pierazzi, F., & Roundy, K. (2023). "Real attackers don't
compute gradients": Bridging the gap between adversarial ML research and practice. In IEEE Conference on
Secure and Trustworthy Machine Learning.
[13] Fukuchi, K., Hara, S., & Maehara, T. (2020). Faking fairness via stealthily biased sampling. In AAAI.
[14] Attiya, H., & Rajsbaum, S. (2020). Indistinguishability. Communications of the ACM, 63(5), 90-99.
[15] ANSSI (2024). Security recommendations for a generative AI system. ANSSI-PA-102.