
2019 IEEE 12th International Conference on Cloud Computing (CLOUD)

Karamel: A System for Timely Provisioning Large-Scale Software Across IaaS Clouds
Kamal Hakimzadeh (1) and Jim Dowling (1, 2)
(1) KTH - Royal Institute of Technology, Stockholm, Sweden
Email: [email protected]
(2) Logical Clocks AB, Stockholm, Sweden
Email: [email protected]

Abstract—Cloud-native systems and application software platforms are becoming increasingly complex, and, ideally, they are expected to be quick to launch, elastic, portable across different cloud environments, and easily managed. However, as cloud applications increase in complexity, so do the resultant challenges in configuring such applications and orchestrating the deployment of their constituent services on potentially different cloud operating systems and environments.

This paper presents a new orchestration system called Karamel that addresses these challenges by providing a cloud-independent orchestration service for deploying and configuring cloud applications and platforms across different environments. In Karamel, we model configuration routines with their dependencies in composable modules, and we achieve a high level of configuration/deployment parallelism by using techniques such as DAG traversal control logic, dataflow variable binding, and parallel actuation. In Karamel, complex distributed deployments are specified declaratively in a compact YAML syntax, and cluster definitions can be validated using an external artifact repository (GitHub).

Index Terms—cloud; software deployment; orchestration

I. INTRODUCTION

The migration of applications and platforms to the Cloud has increased the complexity of their software deployment processes to include infrastructure provisioning, orchestrated configuration and deployment [20], and support for runtime configuration changes to services.

Current solutions in the DevOps paradigm suffer from limited models, with container management systems, such as Kubernetes [17] and Docker-Swarm [3], using static container images that can be configured using environment variables. Static models force the configuration of software and the launching of services to happen at the same time. In contrast, legacy configuration management (CM) systems, such as Chef [2], Ansible [1], and Puppet [4], are host-centric, limiting the support for configuration steps that bounce between hosts (orchestration steps) before services are launched.

To this end, we present our cloud-independent software orchestration system for provisioning distributed systems, Karamel, which follows the classical feedback control loop mechanism for provisioning the software (see Figure 1). The controller generates an orchestration plan per deployment and it executes the plan by using parallel actuation of the target system on many machines residing on different cloud platforms. According to the feedback it receives from orchestration/configuration steps, it either proceeds with the plan or it only repeats the tasks that have failed, without restarting the plan from the beginning.

Fig. 1: Interaction of a cloud user with an IaaS cloud through Karamel as the engine transforms a cluster definition (CD) into the desired cluster state: (1) the user provides a CD; (2) the engine extracts the list of provisioning modules from the CD; (3) the engine downloads the metadata of the modules; (4) the engine generates an execution DAG and runs it; (5) along the execution, many feedback loops are generated between the target systems and the engine; the engine handles failures and signals the plan further inside the loops.

We model a configuration routine as a composable module such that the dependencies of the module on other modules are also embedded inside the definition of the module. As such, we design modules only once but we reuse them many times in deployments that have different orchestration plans. At runtime, a module is replicated in many similar actuation tasks and the actuator runs each task on a remote host. The controller uses dataflow variable binding, making downstream tasks wait for their required input data items to be built by upstream tasks.

We implement a domain specific language (DSL) for expressing software deployments (cluster definitions). Leveraging the properties provided by our composable design, our DSL is expressive but concise. The DSL allows fine-grained parameter passing to the modules, letting users customize the cluster definitions. Parameters defined in cluster definitions are

DOI 10.1109/CLOUD.2019.00069
Fig. 2: Visualization of different system classes by the complexity of configurations. Karamel supports distributed systems that
require more inter-component configuration and it introduces advanced configuration patterns that are not bound to hosts.
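The dataflow variable binding described in the introduction, where a downstream task waits until an upstream task has built the data item it needs, can be illustrated with a minimal Python sketch. This is not Karamel's implementation: the `DataflowVar` class, the `run_demo` helper, the task bodies, and the address value are hypothetical names chosen for illustration, and the sketch assumes a single producer per variable.

```python
import threading


class DataflowVar:
    """Single-assignment variable: readers block until it is bound."""

    def __init__(self):
        self._bound = threading.Event()
        self._value = None

    def bind(self, value):
        # Assumes one producer per variable, so no lock is taken here.
        if self._bound.is_set():
            raise ValueError("dataflow variable is already bound")
        self._value = value
        self._bound.set()

    def get(self, timeout=None):
        if not self._bound.wait(timeout):
            raise TimeoutError("dataflow variable was never bound")
        return self._value


def run_demo():
    namenode_addr = DataflowVar()  # hypothetical state-item
    log = []

    def downstream():
        # Blocks here until the upstream task binds the address.
        log.append("datanode configured with " + namenode_addr.get(timeout=5))

    def upstream():
        log.append("namenode installed")
        namenode_addr.bind("10.0.0.1:8020")

    consumer = threading.Thread(target=downstream)
    producer = threading.Thread(target=upstream)
    consumer.start()  # started first, yet it has to wait for the producer
    producer.start()
    consumer.join()
    producer.join()
    return log
```

Even though the consumer thread is started first, its log entry always follows the producer's, because `get` returns only after `bind` has run.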

validated against their interface definition, typically hosted in a GitHub repository, and specified using Chef [2], a popular configuration management platform. That is, we reuse Chef for software configuration, while Karamel provides orchestration capabilities.

The contributions of this work are:

(C1) An orchestration system for deploying software systems on cloud platforms at scale. Deployments in Karamel are portable between different cloud providers (it currently supports Amazon [6], Google Cloud [10], OpenStack [15], and bare-metal servers) and virtualization technologies [16]. The system is open source [13], [12] and it is used by organizations from academia and industry [7], [9], [11], [14].

(C2) A composable model for encapsulating orchestration routines along with their dependencies. The model allows the declarative design of many layers of software with complex orchestration dependencies at scale.

(C3) A level of parallelism higher than existing solutions. Karamel achieves this through its systematic solution for deployments based on a DAG traversal control logic. Our task abstraction allows us to support heterogeneous types of orchestration routines (e.g., forking machines, installing software, etc.) in a single coherent plan. Further, parallel actuation and dataflow variable binding increase support for

II. MOTIVATION AND RELATED WORK

We present related work in the DevOps paradigm to motivate Karamel. Firstly, we group systems into three classes: A, B, and C. We synthesize this information by analyzing the systems' documentation and, where necessary, their source code. Class A follows the microservice architecture [18], using container images for fast launch. Class B supports deployments on many hosts, but its systems can only configure hosts individually (host-centric). Class C supports orchestration for deploying distributed systems, but its systems are limited in supporting advanced configuration patterns.

Configuration Complexity. In Karamel, we push these boundaries further. As Figure 2 shows, we target distributed services, like class C (the stronger class), but we want to support advanced configuration patterns. The advanced patterns are not limited to a single host, like in class B, and they are not host-centric, like in class A and C. Besides, the model is portable and it supports the combinations of bare-metal servers, virtual technologies, and virtual machine images.

Composable Modules. Our modular design brings benefits such as logical separation of configuration boundaries, independent development, and building large-scale and robust configuration management (CM) systems from smaller and more stable modules. Table I shows the encapsulation methods used by systems versus their flexibility in composing the modules in different deployments.

TABLE I: Modular design of provisioning routines. 1-4 denote 'simple to advanced' and 0 denotes 'not supported'.

Encapsulation (definition) levels: 1 = Predefined Commands, 2 = Predefined Functions, 3 = Custom Scripts, 4 = Custom Functions.
Composition (at usage) levels: 0 = None, 1 = Imperative, 2 = Declarative Local Dep., 3 = Declarative Global Dep.

Class  System          Encapsulation  Composition
A      Kubernetes      1              0
A      Swarm           1              0
A      Marathon        1              3
B      Chef            3              2
B      Puppet          3              2
C      Ansible         3              1
C      CloudFormation  3              1
C      SaltStack       4              2
C      Brooklyn        4              1
-      Karamel         4              3

Containers are bound to pre-defined commands (e.g., start, stop, etc.); only Marathon supports distributed dependencies, but they have to be defined imperatively and per deployment. Chef and Puppet rely on scripting languages by making dependencies declarative inside modules: they are

Fig. 3: An example of a logical DAG (LDAG) that is generated from the dependencies defined between functions f, g, h, and i.
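The LDAG of Fig. 3 and its expansion into an execution DAG can be reproduced with a short Python sketch. It assumes that a function f precedes g whenever some result of f is an argument of g, and that every replica of a function waits for all replicas of its predecessors; the second assumption is ours, as are the state-item names `a` to `d` and the helper names `build_ldag` and `build_edag`. The host counts 1, 2, 3, and 2 follow the example of Fig. 4. The sketch needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical state-items: what each function consumes and produces.
FUNCTIONS = {
    "f": {"arguments": set(), "results": {"a"}},
    "g": {"arguments": {"a"}, "results": {"b"}},
    "h": {"arguments": {"a"}, "results": {"c"}},
    "i": {"arguments": {"b", "c"}, "results": {"d"}},
}


def build_ldag(functions):
    """Map each function g to the functions whose results feed g's arguments."""
    return {
        g: {f for f, fd in functions.items()
            if f != g and fd["results"] & gd["arguments"]}
        for g, gd in functions.items()
    }


def build_edag(ldag, hosts):
    """Replicate every function once per host of its target group.

    A task is a (function, host index) pair; it depends on every task of
    every predecessor function (an assumption of this sketch).
    """
    edag = {}
    for g, preds in ldag.items():
        for n in range(hosts[g]):
            edag[(g, n)] = {(f, m) for f in preds for m in range(hosts[f])}
    return edag


ldag = build_ldag(FUNCTIONS)
edag = build_edag(ldag, {"f": 1, "g": 2, "h": 3, "i": 2})
# One valid execution order of the logical functions.
order = list(TopologicalSorter(ldag).static_order())
```

With these inputs the LDAG has the four edges of the partial-order set R, and the EDAG contains 1 + 2 + 3 + 2 = 8 tasks.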
highly composable but they are single-host configuration systems. Class C systems support customized functions for encapsulation but they suffer from defining compositions imperatively. In Karamel, we seek to achieve high flexibility for composing pre-defined modules but without repeating the definitions of the modules and their dependencies for each deployment.

TABLE II: Shortcomings of systems in maximizing parallel execution. 1-4: 'simple to advanced' and 0: 'not supported'.

Parallel Paradigm levels: 1 = Parallel Launch Rep. Apps, 2 = Parallel Run Rep. Tasks, 3 = Task Parallel, 4 = Data Parallel.
Execution Plan levels: 0 = None, 1 = Imperative, 2 = Declarative.
Data Sharing levels: 1 = Host Limited, 2 = Global Store, 3 = Pull (from Host), 4 = Push (to Consumer).

Class  System          Parallel Paradigm  Execution Plan  Data Sharing
A      Kubernetes      1                  0               2
A      Swarm           1                  0               2
A      Marathon        1                  1               2
B      Chef            0                  1               1
B      Puppet          0                  1               1
C      Ansible         2                  1               3
C      CloudFormation  0                  1               2
C      SaltStack       3                  1               2
C      Brooklyn        4                  2               4
-      Karamel         4                  2               4

Parallel Execution. As quick launching is crucial in large-scale deployments, we investigate the pros and cons of the supported execution models from a parallelism standpoint. As Table II presents, class B systems are task-serial as they run their recipes in an imperatively defined order. Class A uses parallel replication of identical services as parallelism, but commands across different applications run serially. Marathon repeats the definition of execution plans per deployment. Ansible can run replications of the same tasks in parallel and it allows each host to proceed without synchronizing hosts for similar tasks. However, Ansible does not have parallelism between target hosts of different tasks. SaltStack has higher task parallelism compared to Ansible, as it handles task dependencies between all target-groups. Brooklyn only supports the data parallel model with its sensor-effector model. Learning from these limitations, we choose to capture execution plans in a declarative fashion, and we select data parallel execution through the dataflow variable binding model to maximize the parallelism.

III. MODEL

State. A collection of key-value pairs (state-items) forms the final state for a deployment. Provisioning modules produce state-items. For instance, machines, containers, storage, files, and services are typical state-items.

Functions. Provisioning modules behave like functions in Karamel (observational view over functions as in [19]). A function is responsible for building, controlling, and rolling back a subset of the state. The input arguments of functions are a mix of definition-items and state-items. The function's body transforms definition-items into new state-items. The results of the functions are new state-items. A function is targeted to run at a target-group of hosts. We apply each function once at each host in a target group. In our model, the functions are activated when all their input arguments (definition-items and state-items) are available, similar to dataflow variables from concurrent programming [19]. The production and consumption of state-items define a partial order ($\prec$) over the modules.

$$f.\mathit{results} \cap g.\mathit{arguments} \neq \emptyset \iff f \prec g$$

DAG. Given a set of functions, a logical acyclic graph (LDAG) is constructed based on the partial-order relationship between the functions. For example, for the function set $F = \{f, g, h, i\}$ with the partial-order set $R = \{f \prec g, f \prec h, g \prec i, h \prec i\}$ the LDAG in Figure 3 is drawn. An application of an LDAG is an execution DAG (EDAG). We form an execution DAG as we apply all the functions inside the LDAG at their corresponding target-groups (Figure 4) and transform the data between them.

Fig. 4: An execution DAG for an application of the logical DAG in Figure 3. In this example, the target-groups of the functions f, g, h, and i have respectively 1, 2, 3, and 2 hosts.

IV. KARAMEL - SYSTEM COMPONENTS

In this section, we give an overview of the main components of the system and its algorithms.

Cluster Definition DSL. Figure 5 shows an example of the DSL in YAML [5]. The DSL can be simply interpreted as a list of functions (e.g., hadoop::namenode) and definition-items. Algorithm 1 summarizes how we build the LDAG and the EDAG given a cluster definition (CD). First, it parses the CD and loads the set of referenced functions F with their definition items DI. Then, it loads the metadata meta and the body of the functions from our function repository. Next, it loads direct and transitive dependencies from the loaded meta in a loop. Finally, it builds the LDAG by using the dependencies, and it instantiates an EDAG by traversing the LDAG from the root and applying definition items to the functions.

Orchestration Controller. Our engine orchestrates deployments by using a control loop mechanism. As Algorithm 2 shows, a control loop starts when a new EDAG is given. The controller always submits ready tasks to our actuator and then waits until feedback arrives from the actuator. If the received task t was successful, the controller collects the set of state-items SI that are built by task t. The controller submits all the successors of the task t for actuation as it binds SI to the tasks' variables. Otherwise, if the task was unsuccessful,

Fig. 5: Minimal Cluster Definition for deploying Apache Spark [21] and HDFS [8] on Amazon EC2.

Algorithm 2: control_loop(EDAG)
    T = EDAG.roots
    for t ∈ T do
        callback(t, this)
        submit_to_actuator(t)
    repeat
        t ← wait_for_callback()
        if t.succeed then
            SI ← t.state_items
            T ← t.successors
            for s ∈ T do
                callback(s, SI, this)
                submit_to_actuator(s)
        else
            submit_for_debugging(t)
    until EDAG.isdone
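The control loop of Algorithm 2 can be sketched in Python as follows. This is an illustrative approximation rather than Karamel's code: a thread pool stands in for the actuator, a queue carries the feedback, a task is submitted once all of its upstream tasks have succeeded, and failed tasks are only recorded in a list instead of being escalated to a dashboard. The names `control_loop`, `run_task`, `successors`, and `actuate` are hypothetical.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def control_loop(successors, run_task, max_workers=4):
    """Traverse an execution DAG, submitting tasks whose inputs are all bound.

    successors maps every task name to the tasks that consume its state-items
    (tasks without successors map to an empty list). run_task(name, inputs)
    returns the state-items it built, or raises on failure.
    Returns (state, failed).
    """
    # Number of upstream tasks each task still waits for.
    pending = {t: 0 for t in successors}
    for succs in successors.values():
        for s in succs:
            pending[s] += 1

    feedback = queue.Queue()  # the actuator reports finished tasks here
    state = {}                # state-items collected so far
    failed = []               # stand-in for escalation to a human
    in_flight = 0

    def actuate(name, inputs):
        try:
            feedback.put((name, True, run_task(name, inputs)))
        except Exception as err:  # report the failure instead of crashing
            feedback.put((name, False, err))

    with ThreadPoolExecutor(max_workers=max_workers) as actuator:
        for t in [t for t, n in pending.items() if n == 0]:  # EDAG roots
            actuator.submit(actuate, t, dict(state))
            in_flight += 1
        while in_flight:
            name, ok, result = feedback.get()  # wait for a callback
            in_flight -= 1
            if not ok:
                failed.append(name)
                continue
            state.update(result)  # bind the new state-items
            for s in successors[name]:
                pending[s] -= 1
                if pending[s] == 0:  # all inputs of s are now bound
                    actuator.submit(actuate, s, dict(state))
                    in_flight += 1
    return state, failed
```

For the DAG of Fig. 3, `control_loop({"f": ["g", "h"], "g": ["i"], "h": ["i"], "i": []}, run_task)` runs g and h in parallel after f and starts i only once both have reported success.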

the controller allows human intervention by escalating the error to our dashboard. When a failed task is fixed, either automatically or by a human, the controller signals the DAG and continues the deployment.

Algorithm 1: dag_builder(CD)
    F, DI ← parse(CD)
    D = ⟨meta, body⟩_{i=1}^{k} ← load_detail(F)
    while ∃ f ∈ F | f.meta.dep ⊈ F do
        F = F ∪ f.meta.dep
        D = D ∪ load_detail(f)
    end
    LDAG = build_ldag(D)
    EDAG = apply(LDAG, DI)

Orchestration Actuator. Algorithm 3 shows the routine of the actuator. The actuator is launched for provisioning software on a set of hosts H (for simplicity, we assume that the hosts are already provisioned and only show the routine). The actuator creates a set of task queues Q (one queue per host) and a set of executor threads R (one dedicated thread per host) for parallel execution. The actuator waits until it receives a new task from the controller and places the new task at the tail of the queue of the task's host. A task runner r removes a task from its queue, or waits for a new task to arrive if the queue is empty, and runs it. The serial ordering of tasks at the host level avoids potential conflicts between software packages, as some packaging systems like apt take global (host-level) locks.

Algorithm 3: actuator(H, RE, BP, α)
    Q ← allocate_Q_per_host(H)
    R ← allocate_runner_per_host(H)
    ∀ r ∈ R | r.start()
    # Main thread runs:
    repeat
        t ← await_controller_submission()
        Q[t.host].enq(t)
    until TRUE
    # Runner thread r ∈ R runs:
    repeat
        t ← wait_dq(Q[r.host])
        run_over_ssh_stubborn(t, RE, BP, α)
        call_controller(t)
    until TRUE

Karamel uses agent-less communication over SSH for simplicity of usage (details are omitted from Algorithm 3 for brevity). The task runner uses a stubborn retry mechanism when a failure occurs; that is, it retries the execution a number of times RE with a back-off period BP between each attempt. The back-off period is exponentially increased by a factor α. The stubborn mechanism is designed to overcome temporary faults, due to networking or external services, such as the artifact repository.

V. EVALUATION

First, we present some statistics related to real deployments done by Karamel, then we present the results of an experiment measuring the latency of deployments.

Statistics. We present some statistics that we have collected over the course of a year in 2015-2016 related to the provisioning of 1660 clusters at a variable scale between 1-100 machines. The data shows that 47 different users used Karamel from 15 countries and 33 cities. Figure 6a shows that more

[Figure 6 panels: (a) distribution and CDF over lines of code [#]; (b) distribution and CDF over provisioning modules [#]; (c) share of clusters per provider (OpenStack Nova, Amazon EC2, Google Cloud, bare metal).]

Fig. 6: Statistics collected from real-world clusters run in Karamel over the course of one year in 2016. It shows distributions of: (i) lines of code (LOC), (ii) number of provisioning modules, and (iii) cloud provider used, over 1660 launched clusters.
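The stubborn retry step of Algorithm 3, with its parameters RE (number of retries), BP (back-off period), and α (back-off factor), can be sketched as below. The sketch is a hedged illustration and not the system's code: the function name `run_stubborn` and the injectable `sleep` argument are our own, and it assumes that RE counts the retries that follow the first attempt.

```python
import time


def run_stubborn(task, retries, backoff, factor, sleep=time.sleep):
    """Run task(); on failure retry up to `retries` more times.

    The wait before the first retry is `backoff` seconds and it is multiplied
    by `factor` after every failed attempt. Returns (result, waits), where
    waits lists the back-off periods that were actually slept.
    """
    waits = []
    delay = backoff
    for attempt in range(retries + 1):
        try:
            return task(), waits
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: let the caller escalate
            sleep(delay)
            waits.append(delay)
            delay *= factor
```

For example, with `retries=3`, `backoff=1`, and `factor=2`, a task that fails twice and then succeeds waits 1 and then 2 seconds before returning its result. Passing a stub as `sleep` keeps such a check instantaneous.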

than 88% of the clusters are defined in fewer than 100 lines of YAML code, whereas the referenced provisioning modules are coded in more than 500 lines of Ruby. In total, 68 modules are referenced in the definitions and at most 16 modules are used per cluster: 59% of the clusters use only 1 or 2 modules, and almost 23% of them use exactly 7 modules (Figure 6b).

Figure 6c shows that clusters are deployed portably on supported clouds: 56.1% are deployed on in-house premises, 39.2% on Amazon, 6.6% on OCCI, and 4.2% on Google.

[Figure 7 plot: latency [sec] versus machines [#] (axis ticks 2^1 to 2^7), with two series: Provisioning Machines and Provisioning Software.]

Fig. 7: Provisioning Latency at Scale.

Latency at Scale. Moreover, we perform an experiment to deploy the stack of Spark on HDFS in Figure 5 on Amazon Cloud. In each run, we change the number of machines by a factor of 2 and we measure the latency of deployment. Figure 7 shows that the latency of provisioning the infrastructure fluctuates slightly, which is related to the changing delays incurred from Amazon. However, the latency for the provisioning of software starts from 488 seconds for 2 machines and it increases to 522 for 16 machines. The slight increase is due to the fact that some machines finish some jobs slower than others and they increase the total time of the provisioning plan at larger scales.

VI. CONCLUSION

In this paper, we explained that the orchestration of modern systems should be time-efficient and repeatable as it deals with complex configuration scenarios. We showed that existing systems have limited models for the required agility. We demonstrated that Karamel overcomes these issues by designing orchestration as composable modules and employing data parallelism for actuation. Karamel is available as an open source project and is used in industry and academia.

REFERENCES

[1] Ansible v2.4. https://goo.gl/gYPjkW, [01-03-2019].
[2] Chef Client v12.0. https://goo.gl/ai6Za8, [01-03-2019].
[3] Docker v17.09. https://goo.gl/sKQKyQ, [01-03-2019].
[4] Puppet 5.3. https://goo.gl/HQdkRG, [01-03-2019].
[5] YAML Data Serialization Standard. http://yaml.org
[6] Amazon Web Services, Inc. Amazon Elastic Computing Cloud. https://aws.amazon.com/ec2/, 2017.
[7] A. Bessani, J. Brandt, M. Bux, V. Cogo, L. Dimitrova, J. Dowling, A. Gholami, K. Hakimzadeh, M. Hummel, M. Ismail, et al. Biobankcloud: a platform for the secure storage, sharing, and processing of large biomedical data sets. In the First International Workshop on Data Management and Analytics for Medicine and Healthcare (DMAH 2015), 2015.
[8] D. Borthakur. HDFS architecture guide. Hadoop Apache Project, http://hadoop.apache.org/common/docs/current/hdfs_design.pdf, page 39, 2008.
[9] M. Bux, J. Brandt, C. Lipka, K. Hakimzadeh, J. Dowling, and U. Leser. Saasfee: scalable scientific workflow execution engine. Proceedings of the VLDB Endowment, 8(12):1892–1895, 2015.
[10] Google. Google Compute Engine. https://cloud.google.com/compute/, 2017.
[11] K. Hakimzadeh, H. P. Sajjad, and J. Dowling. Scaling HDFS with a strongly consistent relational model for metadata. In IFIP International Conference on Distributed Applications and Interoperable Systems, pages 38–51. Springer, 2014.
[12] Kamal Hakimzadeh. One Click Installation for Clusters. http://www.karamel.io/, 2017.
[13] Kamal Hakimzadeh. Reproducing Distributed Systems on Cloud. https://github.com/karamelchef/karamel, 2017.
[14] S. Niazi, M. Ismail, S. Grohsschmiedt, M. Ronström, S. Haridi, and J. Dowling. HopsFS: Scaling hierarchical file system metadata using NewSQL databases. In FAST, pages 89–103, 2017.
[15] OpenStack Org. OpenStack Compute (Nova). https://github.com/openstack/nova, 2017.
[16] H. Peiro Sajjad, K. Hakimzadeh, and S. Perera. Reproducible distributed clusters with mutable containers: To minimize cost and provisioning time. In Proceedings of the 2017 Workshop on Hot Topics in Container Networking and Networked Systems. ACM, 2017.
[17] D. K. Rensin. Kubernetes - Scheduling the Future at Cloud Scale. 1005 Gravenstein Highway North Sebastopol, CA 95472, 2015.
[18] J. Thönes. Microservices. IEEE Software, 32(1):116–116, 2015.
[19] P. Van-Roy and S. Haridi. Concepts, techniques, and models of computer programming. MIT Press, 2004.
[20] D. Weerasiri, M. C. Barukh, B. Benatallah, Q. Z. Sheng, and R. Ranjan. A taxonomy and survey of cloud resource orchestration techniques. ACM Computing Surveys (CSUR), 50(2):26, 2017.
[21] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: cluster computing with working sets. HotCloud, 10:10–10, 2010.


