The Hivenet distributed platform allows participants to share computing and storage resources for limited periods of time. These resources may become available only at specific moments, or may be withdrawn unexpectedly. This creates a highly dynamic training environment, where the system must continuously decide which resources to use, when to use them, and how to exploit them efficiently.
This challenge is closely related to cross-device federated learning, where client participation often depends on external constraints such as battery level, Wi-Fi connectivity, device usage, or time-of-day patterns. In operational federated learning systems, volatility is often handled by recruiting more clients than needed at each training round and proceeding once enough responses have been received. While effective at large scale, this strategy wastes the computation of clients whose updates are discarded and may bias the model toward clients that respond quickly or are frequently available.
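The cost of this over-recruitment strategy can be illustrated with a small simulation. The sketch below is only a toy model under stated assumptions: each client responds independently with a fixed probability, the order of responses is taken to be the random invitation order, and all names (`run_round`, `participation_counts`, the two client groups) are hypothetical and not part of any Hivenet or federated learning API.

```python
import random

def run_round(invite, needed, response_prob, rng):
    """Invite `invite` clients and keep the first `needed` responders.

    Returns the selected client ids and the number of responders whose
    work is discarded because the quota was already met.
    """
    invited = rng.sample(range(len(response_prob)), invite)
    # Assumption: client c responds independently with probability response_prob[c].
    responders = [c for c in invited if rng.random() < response_prob[c]]
    return responders[:needed], max(0, len(responders) - needed)

def participation_counts(rounds, invite, needed, response_prob, seed=0):
    """Count how often each client is selected over many rounds."""
    rng = random.Random(seed)
    counts = [0] * len(response_prob)
    wasted = 0
    for _ in range(rounds):
        selected, extra = run_round(invite, needed, response_prob, rng)
        wasted += extra
        for c in selected:
            counts[c] += 1
    return counts, wasted

# Hypothetical population: 50 reliable clients and 50 unreliable ones.
probs = [0.9] * 50 + [0.3] * 50
counts, wasted = participation_counts(2000, invite=30, needed=10, response_prob=probs)
reliable_share = sum(counts[:50]) / sum(counts)
print(f"share of updates from reliable clients: {reliable_share:.2f}")
print(f"discarded responses: {wasted}")
```

In this toy setting the reliable half of the population contributes well over half of the aggregated updates, and a sizeable fraction of the responses is thrown away. These are the two effects (statistical bias and wasted work) that become costly when the pool of participants is small.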
The Hivenet setting differs from standard cross-device federated learning in several important ways. First, the number of available participants may be smaller, making it costly or impossible to rely on massive redundancy. Second, while federated learning usually assumes that data cannot be moved across clients, Hivenet may allow portions of data to be transferred to selected participants. Third, the availability of Hivenet resources may sometimes be known in advance or predicted with some accuracy, creating new opportunities for resource-aware training algorithms.
Beyond training time and model accuracy, such systems also raise questions of energy efficiency and environmental impact. Recent work on carbon-aware federated learning has shown that client and time-slot scheduling can exploit temporal and geographical variations in carbon intensity to reduce the carbon footprint of training [arputharaj25]. This perspective is particularly relevant in a distributed infrastructure such as Hivenet, where resource availability, energy characteristics, and communication conditions may vary significantly over time.
These specific features call for new distributed learning methods that jointly account for statistical efficiency, system constraints, resource availability, communication cost, and, when relevant, energy or carbon-related criteria.
The objective of this PhD is to develop new distributed training algorithms for volatile and resource-constrained environments such as Hivenet, taking into account not only training time and accuracy, but also communication cost and, when relevant, energy or carbon-related objectives.
A first objective will be to model the availability of participating resources. The candidate will study how to characterize resource availability patterns using real-world traces, measurements, or insights from Hivenet engineers. This modeling phase will aim to capture both predictable availability, such as daily or weekly patterns, and unpredictable events, such as sudden resource withdrawals.
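As one possible starting point for this modeling phase, the sketch below fits a two-state (available/unavailable) Markov chain to a binary availability trace. This is an illustrative assumption, not a model prescribed by the project: the function names and the per-step failure and recovery probabilities are hypothetical, and a stationary chain of this kind only captures unpredictable withdrawals. Daily or weekly patterns would require time-dependent transition probabilities.

```python
import random

def simulate_trace(p_fail, p_recover, steps, rng, start=True):
    """Generate a synthetic availability trace (True = resource available)."""
    state, trace = start, []
    for _ in range(steps):
        trace.append(state)
        if state:
            state = rng.random() >= p_fail    # stays available unless it fails
        else:
            state = rng.random() < p_recover  # becomes available again
    return trace

def estimate_rates(trace):
    """Maximum-likelihood estimates of the two transition probabilities."""
    on_steps = off_steps = fails = recoveries = 0
    for prev, cur in zip(trace, trace[1:]):
        if prev:
            on_steps += 1
            fails += not cur
        else:
            off_steps += 1
            recoveries += cur
    p_fail = fails / on_steps if on_steps else 0.0
    p_recover = recoveries / off_steps if off_steps else 0.0
    return p_fail, p_recover

def stationary_availability(p_fail, p_recover):
    """Long-run fraction of time the resource is available."""
    return p_recover / (p_fail + p_recover)

trace = simulate_trace(0.1, 0.3, 50_000, random.Random(0))
p_fail_hat, p_recover_hat = estimate_rates(trace)
print(p_fail_hat, p_recover_hat, stationary_availability(p_fail_hat, p_recover_hat))
```

On real traces, the same counting procedure could be applied per resource or per hour of the day, which would give a first indication of how heterogeneous and how predictable availability actually is.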
A second objective will be to design training algorithms that exploit these availability models. The algorithms will decide which resources should participate in training, when they should be activated, and how they should be used. In particular, the thesis will study questions such as how much data should be transferred to a participant, how many local model updates should be performed before communication, and how to balance computation, communication, and statistical progress.
A third objective will be to provide theoretical guarantees for the proposed methods. The candidate will analyze the convergence and training-time performance of distributed learning algorithms under intermittent, heterogeneous, and possibly predictable resource availability. A particular focus will be placed on understanding the trade-off between redundancy, communication cost, local computation, and bias induced by non-uniform participation.
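The bias induced by non-uniform participation can be made concrete with a simple calculation. The notation below is introduced only for illustration and is not taken from the cited works: $K$ clients with local objectives $F_k$ and weights $\alpha_k$, and participation indicators $\xi_k^t$ that are assumed to be independent Bernoulli variables with known probabilities $p_k > 0$.

```latex
% Global objective and participation model (illustrative assumptions)
F(w) = \sum_{k=1}^{K} \alpha_k F_k(w), \qquad
\xi_k^t \sim \mathrm{Bernoulli}(p_k) \ \text{independently across } k .

% Plain aggregation is biased toward frequently available clients:
\mathbb{E}\Big[ \sum_{k=1}^{K} \alpha_k \, \xi_k^t \, \nabla F_k(w) \Big]
  = \sum_{k=1}^{K} \alpha_k \, p_k \, \nabla F_k(w) .

% Inverse-probability weighting removes the bias:
g^t = \sum_{k=1}^{K} \frac{\alpha_k}{p_k} \, \xi_k^t \, \nabla F_k(w),
\qquad \mathbb{E}\big[ g^t \big] = \nabla F(w) .

% ... but at the price of a variance that grows as p_k decreases:
\mathbb{E}\big\| g^t - \nabla F(w) \big\|^2
  = \sum_{k=1}^{K} \alpha_k^2 \, \frac{1 - p_k}{p_k} \, \big\| \nabla F_k(w) \big\|^2 .
```

Unless all $p_k$ are equal, the plain aggregate follows the gradient of a reweighted objective rather than that of $F$. The corrected estimator is unbiased but noisy for rarely available clients, which is one instance of the trade-off mentioned above. Correlated or time-varying availability breaks the independence assumption used here and requires a finer analysis.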
A fourth objective will be to investigate frugal and sustainable training strategies. Building on recent work on client scheduling, intermittent availability, and carbon-aware federated learning [cho22, ribero23, rodio24, arputharaj25], the thesis will study how to schedule training across clients and time slots so as to reduce unnecessary computation and exploit favorable availability periods. Where possible, it will also seek to lower the carbon impact of training without significantly degrading accuracy or increasing completion time.
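To give a flavor of the scheduling problem, the sketch below solves a deliberately simplified version of it. It is not the method of [arputharaj25]: it assumes that availability and carbon intensity are known in advance for every client and time slot, that each training round needs a fixed number of clients, and that the only goal is to minimize the summed carbon intensity. The function name `schedule` and the example data are hypothetical.

```python
def schedule(carbon, available, rounds, clients_per_round):
    """Pick `rounds` time slots and `clients_per_round` clients per slot.

    carbon[k][t]    : carbon intensity of client k in slot t (any consistent unit)
    available[k][t] : True if client k can train in slot t
    Returns (plan, total) where plan is a list of (slot, client_ids) in time order.
    """
    num_clients, num_slots = len(carbon), len(carbon[0])
    slot_plans = []
    for t in range(num_slots):
        # Cheapest available clients in this slot.
        candidates = sorted(
            (carbon[k][t], k) for k in range(num_clients) if available[k][t]
        )
        if len(candidates) < clients_per_round:
            continue  # slot is infeasible: not enough available clients
        chosen = candidates[:clients_per_round]
        cost = sum(c for c, _ in chosen)
        slot_plans.append((cost, t, [k for _, k in chosen]))
    if len(slot_plans) < rounds:
        raise ValueError("not enough feasible time slots")
    slot_plans.sort()  # cheapest slots first
    picked = sorted(slot_plans[:rounds], key=lambda p: p[1])
    return [(t, ks) for _, t, ks in picked], sum(c for c, _, _ in picked)

# Hypothetical instance: 3 clients, 3 time slots.
carbon = [[100, 50, 300], [200, 60, 100], [150, 400, 80]]
available = [[True, True, False], [True, True, True], [False, True, True]]
plan, total = schedule(carbon, available, rounds=2, clients_per_round=2)
print(plan, total)
```

Because slots are independent in this simplified formulation, picking the cheapest clients per slot and then the cheapest slots is optimal for the stated cost. The sketch ignores the effect of the selection on accuracy and completion time, which is the harder part of the problem that the thesis intends to address.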
A fifth objective will be to evaluate the proposed algorithms experimentally. The methods will first be tested in simulation on standard machine learning datasets and controlled availability models. In a second phase, the most promising algorithms will be evaluated in conditions closer to the Hivenet system, with the goal of assessing their practical performance and deployment potential.
Overall, the thesis aims to contribute to a new generation of frugal distributed training algorithms: algorithms that avoid unnecessary redundancy, exploit predictable resource availability, and make efficient use of heterogeneous resources while preserving learning accuracy.
[mcmahan17] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. Agüera y Arcas. Communication-efficient learning of deep networks from decentralized data. In AISTATS, 2017.
[li20] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith. Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37(3), 2020.
[cho22] Y. J. Cho, J. Wang, and G. Joshi. Towards understanding biased client selection in federated learning. In AISTATS, 2022.
[ribero23] M. Ribero, H. Vikalo, and G. de Veciana. Federated learning under intermittent client availability and time-varying communication constraints. IEEE Journal of Selected Topics in Signal Processing, 17(1), 2023.
[rodio24] A. Rodio, F. Faticanti, O. Marfoq, G. Neglia, and E. Leonardi. Federated learning under heterogeneous and correlated client availability. IEEE/ACM Transactions on Networking, 32(2), 2024.
[eichner19] H. Eichner, T. Koren, B. McMahan, N. Srebro, and K. Talwar. Semi-cyclic stochastic gradient descent. In ICML, 2019.
[cho23] Y. J. Cho, P. Sharma, G. Joshi, Z. Xu, S. Kale, and T. Zhang. On the convergence of federated averaging with cyclic client participation. In ICML, 2023.
[arputharaj25] D. R. Arputharaj, C. Rodriguez, A. Rodio, and G. Neglia. Green federated learning via carbon-aware client and time slot scheduling. In 33rd International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), 2025.