Abstract
Recently, the High Performance Computing, Big Data, and Cloud Computing worlds have been converging in how workloads are deployed, with containerization technology acting as an enabler. Given this application diversity and multi-tenancy, a universal scheduler is required that satisfies end-user needs for seamless yet efficient application deployment. While the Kubernetes container orchestrator appears to enable application-agnostic deployment, its scheduling policies still depend heavily on coarse system metrics and thus neglect the performance degradation caused by resource contention in the underlying system.
In this paper, we design and implement an interference-aware modular framework that balances incoming workloads based on monitoring of low-level metrics. We evaluate the proposed solution over different workload mixes and co-location scenarios, showing that, compared with the state-of-the-art but interference-unaware Kubernetes scheduler, the framework significantly improves the latency distribution of the converged cloud infrastructure, improving median latency by up to 27% and reducing standard deviation by up to 25%.
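The idea of interference-aware placement can be illustrated with a minimal sketch. It is not the framework described in the paper: the metric names, weights, node readings, and latency samples below are all hypothetical, and the weighted-sum score is only one plausible way to rank nodes by contention. The sketch picks the node with the lowest contention score and shows how a relative improvement in median latency or standard deviation could be computed.

```python
from statistics import median, pstdev


def pick_node(node_metrics, weights):
    """Return the name of the node with the lowest weighted contention score.

    node_metrics maps node name -> {metric name: normalized reading in [0, 1]}.
    The metric names and weights are illustrative assumptions.
    """
    def score(metrics):
        return sum(weights[name] * metrics[name] for name in weights)

    return min(node_metrics, key=lambda node: score(node_metrics[node]))


def relative_improvement(baseline, proposed):
    """Fractional reduction of `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline


# Hypothetical normalized low-level readings (0 = idle, 1 = saturated).
nodes = {
    "node-a": {"llc_miss_rate": 0.8, "mem_bandwidth": 0.75},
    "node-b": {"llc_miss_rate": 0.25, "mem_bandwidth": 0.5},
    "node-c": {"llc_miss_rate": 0.5, "mem_bandwidth": 0.5},
}
weights = {"llc_miss_rate": 0.5, "mem_bandwidth": 0.5}
chosen = pick_node(nodes, weights)

# Hypothetical latency samples (ms) under two schedulers.
baseline_latencies = [10.0, 12.0, 14.0, 20.0, 40.0]
proposed_latencies = [9.0, 10.0, 11.0, 14.0, 25.0]
median_gain = relative_improvement(median(baseline_latencies),
                                   median(proposed_latencies))
stdev_gain = relative_improvement(pstdev(baseline_latencies),
                                  pstdev(proposed_latencies))
print(chosen, round(median_gain, 3), round(stdev_gain, 3))
```

A scheduler that consults scores of this kind avoids nodes where shared resources are already contended, which coarse CPU and memory utilization figures alone would not reveal.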
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Tzenetopoulos, A., Masouros, D., Xydis, S., Soudris, D. (2022). Interference-Aware Workload Placement for Improving Latency Distribution of Converged HPC/Big Data Cloud Infrastructures. In: Orailoglu, A., Jung, M., Reichenbach, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2021. Lecture Notes in Computer Science, vol 13227. Springer, Cham. https://doi.org/10.1007/978-3-031-04580-6_8
DOI: https://doi.org/10.1007/978-3-031-04580-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04579-0
Online ISBN: 978-3-031-04580-6
eBook Packages: Computer Science, Computer Science (R0)