Adaptive Application Scheduling Under Interference in Kubernetes


2016 IEEE/ACM 9th International Conference on Utility and Cloud Computing


Víctor Medel, Aragon Institute of Engineering Research (I3A), University of Zaragoza, Spain
Omer Rana, School of Computer Science & Informatics, Cardiff University, UK
José Ángel Bañares, Aragon Institute of Engineering Research (I3A), University of Zaragoza, Spain
Unai Arronategui, Aragon Institute of Engineering Research (I3A), University of Zaragoza, Spain

ABSTRACT

Containers are rapidly replacing Virtual Machines (VMs) as the compute instance in cloud-based deployments. The significantly lower overhead of deploying containers (compared to VMs) has often been cited as one reason for this. However, interference caused by the limited isolation of shared resources can impact the performance of hosted applications. We develop a Reference Net-based model of resource management within Kubernetes, primarily to better characterise such performance issues. Our model makes use of data obtained from a Kubernetes deployment, and can be used as a basis to design scalable (and potentially interference-tolerant) applications that make use of Kubernetes.

[Figure 1 diagram not reproduced. Recoverable labels: 1. QoS; Scheduler; 2. [App, R1] Resource1; 2. [App, Ri] Resourcei; 2. [App, Rk] Resourcek; 3. Interference; SoI; App; i) Contingence actions; ii) Adaptation; Controller; 4. ∆µ.]

Figure 1: Deployment and elasticity schema of an application in the proposed cloud architecture.

1. INTRODUCTION & CONTEXT

Kubernetes provides the means to support container-based deployment within Platform-as-a-Service (PaaS) clouds, focusing specifically on cluster-based systems. It allows multiple “pods” to be deployed across physical machines, enabling scale-out of an application with a dynamically changing workload. Each pod can support multiple Docker containers, which are able to make use of services (e.g. file system and I/O) associated with a pod. With significant interest in supporting cloud native applications (CNA), Kubernetes provides a useful approach to achieve this. One of the key requirements for CNA is support for scalability and resilience of the deployed application, making more effective use of on-demand provisioning and elasticity of cloud platforms [1].

We present a Reference Net (a kind of Petri net) based performance and management model for Kubernetes, identifying different operational states that may be associated with “pods” and containers, and the competition for shared resources in this system. These systems can be influenced by different sources of interference, such as background workload imposing varying degrees of contention on system resources (e.g. memory, CPU usage, network, etc.). Application execution is likely to be influenced differently based on the type and duration of interference observed over the lifetime of an application. Our Reference Net based model attempts to capture how such interference sources can be characterised, based on data acquired from micro-benchmarks which are executed alongside an application. In this way, an application developer can take account of such interference sources to improve their application's behaviour, providing application versions with different performance and degrees of resilience to interference. Our approach is influenced by the Paragon [2] system, which uses a classification algorithm to determine the influence of several Sources of Interference (SoI) on application execution. With this information, a scheduler attempts to balance application workloads, depending on application interference characteristics. Similarly, ARQ [3] introduces the “Quality of a Resource” needed by an application: low values of this metric identify workloads which can be scheduled across a wider range of potential resources.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
UCC ’16, December 06 - 09, 2016, Shanghai, China
ACM ISBN 978-1-4503-4616-0/16/12. . . $15.00
DOI: http://dx.doi.org/10.1145/2996890.3007889

2. CONTAINER MANAGEMENT

An application deployed inside a container is often not isolated from other tenants’ containers on the same machine. The resulting interference can lead to performance degradation and to inefficient scheduling decisions about resource allocation. In addition, due to the nature of the SoI, some applications may be more resilient to interference than others.

Figure 1 illustrates our architecture, suggesting two kinds of actions to mitigate the effects of interference: i) scheduler actions, such as container migration, and ii) developer actions, such as identifying the interference source and developing adaptation into the application. To support this, we model the Kubernetes resource manager to simulate the behaviour of an application under different deployment scenarios. We use micro-benchmark data to characterise the model, i.e. the parameter ranges to consider for the model.
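
As an illustration of the characterisation step described above, the following minimal Python sketch shows one way micro-benchmark timings could be summarised into parameter ranges for a model. It is only a sketch under stated assumptions: the paper does not prescribe a slowdown factor as the parameter, the names `slowdown_ranges`, `BASELINE_RUNS` and `INTERFERED_RUNS` are hypothetical, and every timing value is invented for the example rather than measured.

```python
from statistics import mean

# Hypothetical micro-benchmark results: run time in seconds of one application,
# measured alone and while a stressor for a single SoI runs on the same node.
# All numbers are invented for illustration; they are not taken from the paper.
BASELINE_RUNS = [10.0, 10.2, 9.8]
INTERFERED_RUNS = {
    "cpu": [12.0, 12.5, 11.5],
    "cache": [15.0, 14.0, 16.0],
    "network": [10.5, 10.0, 11.0],
    "io": [13.0, 12.0, 14.0],
}


def slowdown_ranges(baseline, interfered):
    """Return {soi: (min, mean, max)} slowdown factors relative to the mean baseline."""
    base = mean(baseline)
    ranges = {}
    for soi, runs in interfered.items():
        # Slowdown factor: interfered run time divided by the undisturbed run time.
        factors = [t / base for t in runs]
        ranges[soi] = (min(factors), mean(factors), max(factors))
    return ranges


if __name__ == "__main__":
    for soi, (low, avg, high) in slowdown_ranges(BASELINE_RUNS, INTERFERED_RUNS).items():
        print(f"{soi:8s} slowdown: min={low:.2f} mean={avg:.2f} max={high:.2f}")
```

Ranges of this kind could then bound the delay parameters explored for each SoI when simulating different deployment scenarios; a different summary statistic, or a fitted distribution, may suit a given model better.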

2.1 Characterising Interference

Petri nets (PNs) have been extensively used to model resource allocation systems (RAS) and to design solutions for resource allocation problems [4]. In such models, the degradation in the associated performance metrics is caused by the time spent waiting for a resource to become available (or by the presence of deadlocks). This formalism represents high-level resources, for example, the allocation of computing nodes by a scheduler. However, the resources involved in the execution of a container in a node are more complex to model. To understand how interference between containers works, we have identified the potential SoIs:

i) CPU usage: if there is no contention in the use of the CPU, the container uses the CPU it needs; if not, there is a reservation system to share the CPU proportionally;
ii) Cache memory (L1/L2/L3 and TLB) and memory bandwidth: the cache hierarchy in a node is not isolated between containers, so a container can continuously miss in the cache because another one is making aggressive use of it;
iii) Network usage: access to the network is shared between all containers at a physical node; in addition, if there is no contention, a container can use the entire bandwidth;
iv) I/O file system access: the file system is shared between all containers; moreover, it could be a distributed file system, in which case the network can be related to it.

2.2 Reference Net Model

Figure 2 shows part of our model, representing the behaviour of a Pod in Kubernetes through the lifecycle of its containers. We have used the Object Nets paradigm [5] with reference semantics. This abstraction allows us to model hierarchical systems. Bold transitions are timed transitions, related to the lifecycle of a pod. Their probability distributions are obtained by measuring a real Kubernetes cluster. For example, the T2 and T3 transitions model the execution time of an application and the time until the next container fails, respectively. Both are application dependent and can be influenced by the background workload. In addition, the deployment configuration of the application can affect this time. To model the behaviour of the container, we can refine these transitions to represent the influence caused by potential interference sources.

Figure 2: Behaviour of a Pod. Tokens are containers and places represent container lifecycle states.

Each SoI can be modelled with a place for the resource (shared by all containers) and a place which represents the sensitivity to that source for each container. The interference is modelled as the waiting time to acquire the needed resource. To estimate these values, micro-benchmarking techniques should be used.

3. POTENTIAL USAGE

Once the model has been built and tuned with micro-benchmark results, we can use it to estimate the overhead introduced by interference. A developer can test several application deployment configurations. This approach can be used to identify applications with low/high sensitivity to interference, enabling the scheduler to allocate containers per node. Additionally, the model can be used to undertake various what-if scenarios to investigate the behaviour of the SoI. We can assign a random probability distribution for the usage of a resource associated with each SoI and measure the associated overhead for each container.

Acknowledgments: This work was supported in part by the Industry and Innovation department of the Aragonese Government, European Social Funds (COSMOS group, ref. T93) and the Spanish Ministry of Economy (Programa de I+D+i Estatal de Investigación, Desarrollo e innovación Orientada a los Retos de la Sociedad TIN2013-40809-R). V. Medel was the recipient of a fellowship from the Spanish Ministry of Economy and from the Fundación Ibercaja-CAI.

4. REFERENCES

[1] S. Brunner, M. Blochlinger, G. Toffetti, J. Spillner, and T. M. Bohnert, “Experimental evaluation of the cloud-native application design,” IEEE/ACM 8th International Conference on Utility and Cloud Computing (UCC), pp. 488–493, 2015.
[2] C. Delimitrou and C. Kozyrakis, “Paragon: QoS-aware scheduling for heterogeneous datacenters,” in ACM SIGPLAN Notices, vol. 48, no. 4. ACM, 2013, pp. 77–88.
[3] C. Delimitrou, N. Bambos, and C. Kozyrakis, “QoS-aware admission control in heterogeneous datacenters,” in Proceedings of the 10th International Conference on Autonomic Computing (ICAC 13), 2013, pp. 291–296.
[4] J.-P. López-Grao and J.-M. Colom, Transactions on Petri Nets and Other Models of Concurrency V. Springer Berlin Heidelberg, 2012, ch. A Petri Net Perspective on the Resource Allocation Problem in Software Engineering, pp. 181–200.
[5] R. Valk, “Object Petri nets: Using the nets-within-nets paradigm,” Advanced Course on Petri Nets 2003 (J. Desel, W. Reisig, G. Rozenberg, eds.), 3098, 2003.
