Context
The ever-growing number of services and Internet of Things (IoT) devices has resulted in data being distributed across different locations (regions and countries). This data also exhibits different usage patterns: cold data (written once and never read), stream data (produced once and consumed by many), and hot data (written once and consumed by many). These data types in turn have different performance and dependability requirements (e.g., low latency for data streams).
Data caching is a widely used technique that improves application performance by storing data on high-speed devices close to end users. Most research on data caching has focused on the benefits of different data placement strategies (i.e., which data to place in the cache), data movement, cache partitioning, and cache eviction [1, 2, 3, 4, 5, 6, 7, 8], as well as on cost-efficient data redundancy techniques for caching systems [9]. However, few efforts have studied data management when caches are distributed across different platforms (Edge-to-Cloud), use heterogeneous storage devices (in terms of performance and cost), and serve multiple, diverse applications, including traditional data services, serverless workflows, and data streaming.
References:
[1] Asit Dan and Don Towsley. 1990. An Approximate Analysis of the LRU and FIFO Buffer Replacement Schemes. SIGMETRICS Perform. Eval. Rev. 18, 1 (Apr. 1990), 143–152. https://doi.org/10.1145/98460.98525
[2] Marek Chrobak and John Noga. 1999. LRU is better than FIFO. Algorithmica 23 (Feb. 1999), 180–185. https://doi.org/10.1007/PL00009255
[3] Aaron Blankstein, Siddhartha Sen, and Michael J. Freedman. 2017. Hyperbolic Caching: Flexible Caching for Web Applications. In 2017 USENIX Annual Technical Conference (USENIX ATC 17).
[4] Cristian Ungureanu, Biplob Debnath, Stephen Rago, and Akshat Aranya. 2013. TBF: A memory-efficient replacement policy for flash-based caches. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). 1117–1128. https://doi.org/10.1109/ICDE.2013.6544902
[5] Orcun Yildiz, Amelie Chi Zhou, and Shadi Ibrahim. 2018. Improving the Effectiveness of Burst Buffers for Big Data Processing in HPC Systems with Eley. Future Generation Computer Systems 86 (2018), 308–318. https://doi.org/10.1016/j.future.2018.03.029
[6] G. Aupy, O. Beaumont, and L. Eyraud-Dubois. 2019. Sizing and Partitioning Strategies for Burst-Buffers to Reduce IO Contention. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil.
[7] Yazhuo Zhang, Juncheng Yang, Yao Yue, et al. 2024. SIEVE is Simpler than LRU: an Efficient Turn-Key Eviction Algorithm for Web Caches. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). 1229–1246.
[8] Juncheng Yang, Ziming Mao, Yao Yue, and K. V. Rashmi. 2023. GL-Cache: Group-level learning for efficient and high-performance caching. In FAST '23. 115–134.
[9] K. V. Rashmi, Mosharaf Chowdhury, Jack Kosaian, et al. 2016. EC-Cache: Load-Balanced, Low-Latency Cluster Caching with Online Erasure Coding. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 401–417.