Skip to main content

Interference-Aware Workload Placement for Improving Latency Distribution of Converged HPC/Big Data Cloud Infrastructures

  • Conference paper
  • First Online:
Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS 2021)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13227))

Included in the following conference series:

Abstract

Recently, High Performance, Big Data, and Cloud Computing worlds tend to converge in terms of workload deployment with containerization technology acting as an enabler towards this direction. In such cases of application diversity and multi-tenancy, a universal scheduler able to satisfy the end-user needs for seamless, yet, efficient application deployment is required. While Kubernetes container orchestrator seems to be the answer that enables application-agnostic deployment, it still depends highly on coarse system metrics for its scheduling policies, thus, neglecting the performance degradation due to resource contention in the underlying system.

In this paper, we design and implement an interference-aware modular framework, able to balance incoming workload based on low-level metrics monitoring. We evaluate our proposed solution over different workload mixes and co-location scenarios showing that against the state-of-art, but interference unaware Kubernetes scheduler the proposed framework significantly improves the latency distribution of the converged cloud infrastructure, improving median latency up to 27% and reducing standard deviation up to 25%.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 69.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 89.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Google cloud platform. https://www.cloud.google.com. Accessed 02 Feb 2021

  2. grpc. https://grpc.io/. Accessed 02 Feb 2021

  3. Protocol buffers. https://developers.google.com/protocol-buffers. Accessed 02 Feb 2021

  4. Al Jawarneh, I.M., et al: Container orchestration engines: a thorough functional and performance comparison. In: ICC 2019–2019 IEEE International Conference on Communications (ICC), pp. 1–6. IEEE (2019)

    Google Scholar 

  5. Amazon, E.: Amazon web services (November 2012) (2015). http://aws.amazon.com/es/ec2/

  6. Authors, P.: Prometheus-monitoring system & time series database (2017)

    Google Scholar 

  7. Bauman, E., Ayoade, G., Lin, Z.: A survey on hypervisor-based monitoring: approaches, applications, and evolutions. ACM Comput. Surv. (CSUR) 48(1), 10 (2015)

    Article  Google Scholar 

  8. Blagodurov, S., Fedorova, A.: User-level scheduling on NUMA multicore systems under Linux. In: Linux Symposium, vol. 2011 (2011)

    Google Scholar 

  9. Burns, B., Grant, B., Oppenheimer, D., Brewer, E., Wilkes, J.: Borg, omega, and Kubernetes: lessons learned from three container-management systems over a decade. Queue 14(1), 70–93 (2016)

    Article  Google Scholar 

  10. Cassandra, A.: Apache Cassandra. Website 13 (2014). http://planetcassandra.org/what-is-apache-cassandra

  11. Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM Symposium on Cloud Computing, pp. 143–154 (2010)

    Google Scholar 

  12. Delimitrou, C., Kozyrakis, C.: ibench: quantifying interference for datacenter applications. In: 2013 IEEE International Symposium on Workload Characterization (IISWC), pp. 23–33. IEEE (2013)

    Google Scholar 

  13. Dongarra, J., Heroux, M.A., Luszczek, P.: HPCG benchmark: a new metric for ranking high performance computing systems. Knoxville, Tennessee, pp. 1–11 (2015)

    Google Scholar 

  14. Felter, W., Ferreira, A., Rajamony, R., Rubio, J.: An updated performance comparison of virtual machines and Linux containers. In: 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 171–172. IEEE (2015)

    Google Scholar 

  15. Ferdman, M., et al.: Clearing the clouds: a study of emerging scale-out workloads on modern hardware. In: Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems (2012)

    Google Scholar 

  16. Ferikoglou, A., Masouros, D., Tzenetopoulos, A., Xydis, S., Soudris, D.: Resource aware GPU scheduling in Kubernetes infrastructure. In: 12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021), pp. 4:1–4:12 (2021)

    Google Scholar 

  17. Gan, Y., Zhang, Y., Hu, K., Cheng, D., He, Y., Pancholi, M., Delimitrou, C.: Seer: Leveraging big data to navigate the complexity of performance debugging in cloud microservices. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 19–33 (2019)

    Google Scholar 

  18. Garefalakis, P., Karanasos, K., Pietzuch, P., Suresh, A., Rao, S.: Medea: scheduling of long running applications in shared production clusters. In: Proceedings of the Thirteenth EuroSys Conference, p. 4. ACM (2018)

    Google Scholar 

  19. Henning, J.L.: Spec cpu2006 benchmark descriptions. ACM SIGARCH Comput. Archit. News 34(4), 1–17 (2006)

    Article  Google Scholar 

  20. Kanev, S., et al.: Profiling a warehouse-scale computer. In: Proceedings of the 42nd Annual International Symposium on Computer Architecture, pp. 158–169 (2015)

    Google Scholar 

  21. Mars, J., Tang, L., Hundt, R., Skadron, K., Soffa, M.L.: Bubble-up: increasing utilization in modern warehouse scale computers via sensible co-locations. In: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 248–259. ACM (2011)

    Google Scholar 

  22. Mars, J., Vachharajani, N., Hundt, R., Soffa, M.L.: Contention aware execution: online contention detection and response. In: Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 257–265. ACM (2010)

    Google Scholar 

  23. Masouros, D., Xydis, S., Soudris, D.: Rusty: runtime interference-aware predictive monitoring for modern multi-tenant systems. IEEE Trans. Parallel Distrib. Syst. 32(1), 184–198 (2020)

    Article  Google Scholar 

  24. MySQL, A.: Mysql (2001)

    Google Scholar 

  25. Naqvi, S.N.Z., Yfantidou, S., Zimányi, E.: Time series databases and influxdb. Studienarbeit, Université Libre de Bruxelles p. 12 (2017)

    Google Scholar 

  26. Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

    MathSciNet  MATH  Google Scholar 

  27. Terpstra, D., Jagode, H., You, H., Dongarra, J.: Collecting performance data with PAPI-C. In: Müller, M., Resch, M., Schulz, A., Nagel, W. (eds.) Tools for High Performance Computing 2009, pp. 157–173. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-11261-4_11

    Chapter  Google Scholar 

  28. Thomas Willham, R.D.: Intel\(\text{\textregistered} \) performance counter monitor - a better way to measure CPU utilization. https://software.intel.com/content/www/us/en/develop/articles/intel-performance-counter-monitor.html

    Google Scholar 

  29. Tzenetopoulos, A., Masouros, D., Xydis, S., Soudris, D.: Interference-aware orchestration in Kubernetes. In: Jagode, H., Anzt, H., Juckeland, G., Ltaief, H. (eds.) ISC High Performance 2020. LNCS, vol. 12321, pp. 321–330. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59851-8_21

    Chapter  Google Scholar 

  30. Wang, L., et al.: Bigdatabench: a big data benchmark suite from internet services. In: 2014 IEEE 20th international symposium on high performance computer architecture (HPCA), pp. 488–499. IEEE (2014)

    Google Scholar 

  31. Wegrzynek, A.: Influxdb C++ client. https://github.com/awegrzyn/influxdb-cxx (2019)

  32. Yang, H., Breslow, A., Mars, J., Tang, L.: Bubble-flux: Precise online qos management for increased utilization in warehouse scale computers. ACM SIGARCH Comput. Archit. News 41(3), 607–618 (2013)

    Article  Google Scholar 

  33. Yasin, A., Ben-Asher, Y., Mendelson, A.: Deep-dive analysis of the data analytics workload in cloudsuite. In: 2014 IEEE International Symposium on Workload Characterization (IISWC), pp. 202–211. IEEE (2014)

    Google Scholar 

  34. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I., et al.: Spark: cluster computing with working sets. HotCloud 10(10–10), 95 (2010)

    Google Scholar 

  35. Zhuravlev, S., Blagodurov, S., Fedorova, A.: Addressing shared resource contention in multicore processors via scheduling. ACM SIGPLAN Notices 45(3), 129–142 (2010)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Achilleas Tzenetopoulos .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Tzenetopoulos, A., Masouros, D., Xydis, S., Soudris, D. (2022). Interference-Aware Workload Placement for Improving Latency Distribution of Converged HPC/Big Data Cloud Infrastructures. In: Orailoglu, A., Jung, M., Reichenbach, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2021. Lecture Notes in Computer Science, vol 13227. Springer, Cham. https://doi.org/10.1007/978-3-031-04580-6_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-04580-6_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-04579-0

  • Online ISBN: 978-3-031-04580-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics