Karamel: A System For Timely Provisioning Large-Scale Software Across IaaS Clouds
validated against their interface definition, typically hosted in a GitHub repository, and specified using Chef [2], a popular configuration management platform. That is, we reuse Chef for software configuration, while Karamel provides orchestration capabilities.

The contributions of this work are:

(C1) An orchestration system for deploying software systems on cloud platforms at scale. Deployments in Karamel are portable between different cloud providers (it currently supports Amazon [6], Google Cloud [10], OpenStack [15], and bare-metal servers) and virtualization technologies [16]. The system is open source [13], [12] and it is used by organizations from academia and industry [7], [9], [11], [14].

(C2) A composable model for encapsulating orchestration routines along with their dependencies. The model allows the declarative design of many layers of software with complex orchestration dependencies at scale.

(C3) A level of parallelism higher than that of existing solutions. Karamel achieves this through a systematic solution for deployments based on DAG-traversal control logic. Our task abstraction allows us to support heterogeneous types of orchestration routines (e.g., forking machines, installing software) in a single coherent plan. Further, parallel actuation and dataflow variable binding increase support for concurrency.

II. MOTIVATION AND RELATED WORK

We present related work in the DevOps paradigm to motivate Karamel. First, we group systems into three classes: A, B, and C. We synthesize this information by analyzing the systems' documentation and, where necessary, their source code. Class A follows the microservice architecture [18], using container images for fast launch. Class B supports deployments on many hosts but can only configure hosts individually (host-centric). Class C supports orchestration for deploying distributed systems but is limited in supporting advanced configuration patterns.

Configuration Complexity. In Karamel, we push these boundaries further. As Figure 2 shows, we target distributed services, like class C (the strongest class), but we want to support advanced configuration patterns. The advanced patterns are not limited to a single host, as in class B, and they are not host-centric, as in classes A and C. Moreover, the model is portable and supports combinations of bare-metal servers, virtualization technologies, and virtual machine images.

Class | System         | Encapsulation (definition) | Composition (at usage)
------+----------------+----------------------------+-----------------------
A     | Kubernetes     | 1                          | 0
A     | Swarm          | 1                          | 0
A     | Marathon       | 1                          | 3
B     | Chef           | 3                          | 2
B     | Puppet         | 3                          | 2
C     | Ansible        | 3                          | 1
C     | CloudFormation | 3                          | 1
C     | SaltStack      | 4                          | 2
C     | Brooklyn       | 4                          | 1
C     | Karamel        | 4                          | 3

Encapsulation levels: 1 = Predefined Commands, 2 = Predefined Functions, 3 = Custom Scripts, 4 = Custom Functions.
Composition levels: 0 = None, 1 = Imperative, 2 = Declarative Local Dep., 3 = Declarative Global Dep.

TABLE I: Modular design of provisioning routines. 1-4 denote 'simple to advanced' and 0 denotes 'not supported'.

Composable Modules. Our modular design brings benefits such as logical separation of configuration boundaries, independent development, and the ability to build large-scale, robust configuration management (CM) systems from smaller, more stable modules. Table I shows the encapsulation methods used by the systems versus their flexibility in composing modules across different deployments.

Containers are bound to predefined commands (e.g., start, stop); only Marathon supports distributed dependencies, but they have to be defined imperatively and per deployment. Chef and Puppet rely on scripting languages and make dependencies declarative inside modules: they are highly composable, but they are single-host configuration systems. Class C supports custom functions for encapsulation but suffers from defining compositions imperatively. In Karamel, we seek to achieve high flexibility for composing predefined modules, but without repeating the definitions of the modules and their dependencies for each deployment.
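To make Table I's composition levels concrete, the following sketch illustrates level-3 composition ('Declarative Global Dep.'): each module declares its own dependencies once, at definition time, and any deployment composes modules without restating a plan. This is an illustration only; the metadata format and module names are hypothetical, not Karamel's actual module format.

```python
# Illustration only: a hypothetical metadata format in which each module
# encapsulates its own cross-host ("global") dependencies declaratively.
MODULES = {
    "hadoop::namenode": {"depends_on": []},
    "hadoop::datanode": {"depends_on": ["hadoop::namenode"]},
    "spark::master":    {"depends_on": ["hadoop::namenode"]},
    "spark::worker":    {"depends_on": ["spark::master", "hadoop::datanode"]},
}

def composition_order(selected):
    """Derive a valid provisioning order for a deployment purely from the
    dependencies encapsulated in the modules (no per-deployment plan)."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in MODULES[name]["depends_on"]:
            visit(dep)
        order.append(name)

    for name in selected:
        visit(name)
    return order

# Any subset composes without redefining its dependencies:
print(composition_order(["spark::worker"]))
# ['hadoop::namenode', 'spark::master', 'hadoop::datanode', 'spark::worker']
```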
Fig. 3: An example of a logical DAG generated from the dependencies defined between functions f, g, h, and i.

Fig. 4: An execution DAG for an application of the logical DAG in Figure 3. In this example, the target-groups of functions f, g, h, and i have 1, 2, 3, and 2 hosts, respectively.
Class | System         | Parallel Paradigm | Execution Plan | Data Sharing
------+----------------+-------------------+----------------+-------------
A     | Kubernetes     | 1                 | 0              | 2
A     | Swarm          | 1                 | 0              | 2
A     | Marathon       | 1                 | 1              | 2
B     | Chef           | 0                 | 1              | 1
B     | Puppet         | 0                 | 1              | 1
C     | Ansible        | 2                 | 1              | 3
C     | CloudFormation | 0                 | 1              | 2
C     | SaltStack      | 3                 | 1              | 2
C     | Brooklyn       | 4                 | 2              | 4
C     | Karamel        | 4                 | 2              | 4

Parallel Paradigm levels: 0 = None, 1 = Parallel Launch of Replicated Apps, 2 = Parallel Run of Replicated Tasks, 3 = Task Parallel, 4 = Data Parallel.
Execution Plan levels: 0 = None, 1 = Imperative, 2 = Declarative.
Data Sharing levels: 0 = None, 1 = Host Limited, 2 = Global Store, 3 = Pull (from Host), 4 = Push (to Consumer).

TABLE II: Shortcomings of systems in maximizing parallel execution. 1-4: 'simple to advanced' and 0: 'not supported'.
Parallel Execution. As quick launching is crucial in large-scale deployments, we investigate the pros and cons of the supported execution models from a parallelism standpoint. As Table II presents, class-B systems are task-serial, as they run their recipes in an imperatively defined order. Class A uses parallel replication of identical services as its form of parallelism, but across different applications commands run serially. Marathon repeats the definition of execution plans per deployment. Ansible can run replications of the same task in parallel, and it allows each host to proceed without synchronizing hosts on similar tasks. However, Ansible has no parallelism between target hosts of different tasks. SaltStack has higher task parallelism than Ansible, as it handles task dependencies across all target-groups. Brooklyn supports only a data-parallel model, through its sensor-effector mechanism. Learning from these limitations, we choose to capture execution plans declaratively, and we select data-parallel execution with a dataflow variable-binding model to maximize parallelism.
III. MODEL

State. A collection of key-value pairs (state-items) forms the final state of a deployment. Provisioning modules produce state-items; for instance, machines, containers, storage, files, and services are typical state-items.

Functions. Provisioning modules behave like functions in Karamel (an observational view over functions, as in [19]). A function is responsible for building, controlling, and rolling back a subset of the state. The input arguments of a function are a mix of definition-items and state-items. The function's body transforms definition-items into new state-items; the results of a function are new state-items. A function is targeted to run at a target-group of hosts, and we apply each function once at each host in its target-group. In our model, a function is activated when all of its input arguments (definition-items and state-items) are available, similar to dataflow variables in concurrent programming [19]. The production and consumption of state-items define a partial order (≺) over the modules:

f.results ∩ g.arguments ≠ ∅ ⟺ f ≺ g

DAG. Given a set of functions, a logical DAG (LDAG) is constructed from the partial-order relationship between the functions. For example, for the function set F = {f, g, h, i} with the partial-order set R = {f ≺ g, f ≺ h, g ≺ i, h ≺ i}, the LDAG in Figure 3 is obtained. An application of an LDAG is an execution DAG (EDAG). We form an execution DAG by applying all the functions inside the LDAG at their corresponding target-groups (Figure 4) and transforming the data between them.
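The sketch below (our Python rendering, not Karamel's code; the class and helper names are assumed) derives the partial order from the functions' arguments and results and expands an LDAG into EDAG tasks, one per host of each function's target-group, reproducing the example of Figures 3 and 4:

```python
# Sketch of the Section III model (names are ours, for illustration).
from itertools import product

class Function:
    def __init__(self, name, arguments, results, target_group):
        self.name = name
        self.arguments = set(arguments)   # consumed definition/state-items
        self.results = set(results)       # produced state-items
        self.target_group = target_group  # hosts the function runs on

def ldag_edges(functions):
    """LDAG: an edge f -> g whenever f.results ∩ g.arguments ≠ ∅."""
    return [(f, g) for f, g in product(functions, repeat=2)
            if f is not g and f.results & g.arguments]

def edag_tasks(functions):
    """EDAG: apply each function once at each host of its target-group."""
    return [(fn.name, host) for fn in functions for host in fn.target_group]

# The example of Figures 3 and 4: target-groups of sizes 1, 2, 3, and 2.
f = Function("f", [],         ["a"], ["h1"])
g = Function("g", ["a"],      ["b"], ["h2", "h3"])
h = Function("h", ["a"],      ["c"], ["h4", "h5", "h6"])
i = Function("i", ["b", "c"], ["d"], ["h7", "h8"])

print([(u.name, v.name) for u, v in ldag_edges([f, g, h, i])])
# [('f', 'g'), ('f', 'h'), ('g', 'i'), ('h', 'i')]  -- the set R from the text
print(len(edag_tasks([f, g, h, i])))  # 8 tasks = 1 + 2 + 3 + 2 hosts
```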
IV. KARAMEL - SYSTEM COMPONENTS

In this section, we give an overview of the main components of the system and its algorithms.

Cluster Definition DSL. Figure 5 shows an example of the DSL in YAML [5]. The DSL can simply be interpreted as a list of functions (e.g., hadoop::namenode) and definition-items. Algorithm 1 summarizes how we build the LDAG and EDAG given a cluster definition (CD). First, it parses the CD and loads the set of referenced functions F with their definition-items DI. Then, it loads the metadata meta and the bodies of the functions from our function repository. Next, it loads direct and transitive dependencies from the loaded meta in a loop. Finally, it builds the LDAG using the dependencies, and it instantiates an EDAG by traversing the LDAG from the root and applying the definition-items to the functions. A toy version of the parsing step is sketched below.
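The YAML layout in this sketch is modeled on the description of Figure 5, but the key names are ours, not necessarily Karamel's exact schema:

```python
# Toy interpretation of a cluster definition (CD) as a list of functions
# plus definition-items. Requires PyYAML; the schema is hypothetical.
import yaml

CD = """
name: spark-hdfs
ec2:
  type: m3.xlarge        # definition-item
  region: eu-west-1      # definition-item
groups:
  namenodes:
    size: 1
    recipes: [hadoop::namenode]
  workers:
    size: 3
    recipes: [hadoop::datanode, spark::worker]
"""

cd = yaml.safe_load(CD)
functions = [r for grp in cd["groups"].values() for r in grp["recipes"]]
definition_items = {"provider": "ec2", **cd["ec2"]}
print(functions)         # ['hadoop::namenode', 'hadoop::datanode', 'spark::worker']
print(definition_items)  # {'provider': 'ec2', 'type': 'm3.xlarge', 'region': 'eu-west-1'}
```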
Orchestration Controller. Our engine orchestrates deployments using a control-loop mechanism. As Algorithm 2 shows, a control loop starts when a new EDAG is given. The controller repeatedly submits ready tasks to our actuator and then waits until feedback arrives from the actuator. If the received task t was successful, the controller collects the set of state-items SI built by task t; it then submits all successors of task t for actuation, binding SI to those tasks' variables. Otherwise, if the task was unsuccessful, it is submitted for debugging.
Algorithm 2: control_loop(EDAG)
 1  T ← EDAG.roots
 2  for t ∈ T do
 3      callback(t, this)
 4      submit_to_actuator(t)
    end
 5  repeat
 6      t ← wait_for_callback()
 7      if t.succeed then
 8          SI ← t.state_items
 9          T ← t.successors
10          for s ∈ T do
11              callback(s, SI, this)
12              submit_to_actuator(s)
            end
        else
13          submit_for_debugging(t)
        end
    until EDAG.isdone

Fig. 5: Minimal cluster definition for deploying Apache Spark [21] and HDFS [8] on Amazon EC2.
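For concreteness, here is a minimal runnable rendering of Algorithm 2 (our sketch, not Karamel's implementation). It assumes an edag object exposing tasks(), roots(), and successors(t), task objects supporting dataflow binding via bind() and ready(), and a run_task(t) callable returning (success, state_items); a thread pool stands in for the actuator:

```python
# Sketch of Algorithm 2: actuate root tasks, then submit any successor
# whose dataflow inputs are fully bound. Interfaces are assumed.
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

def control_loop(edag, run_task, debug):
    feedback = Queue()                       # actuator -> controller callbacks
    pool = ThreadPoolExecutor(max_workers=8) # parallel actuation
    scheduled, remaining = set(), set(edag.tasks())

    def submit(task):                        # submit_to_actuator
        scheduled.add(task)
        pool.submit(lambda: feedback.put((task, *run_task(task))))

    for t in edag.roots():                   # lines 1-4: actuate the roots
        submit(t)
    while remaining:                         # repeat ... until EDAG.isdone
        t, ok, state_items = feedback.get()  # line 6: wait_for_callback
        if not ok:
            debug(t)                         # line 13: submit_for_debugging;
            continue                         # debug() must resubmit eventually
        remaining.discard(t)
        for s in edag.successors(t):         # lines 8-12: bind SI, then
            s.bind(state_items)              # actuate successors whose
            if s.ready() and s not in scheduled:  # inputs are all available
                submit(s)
    pool.shutdown()
```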
[Figure 6 comprises three panels: (a) CDF of lines of code per cluster definition, (b) CDF of provisioning modules per cluster, and (c) cloud-provider share: Baremetal 56.1%, Amazon EC2 32.9%, OCCI 6.6%, Google Cloud 4.2%, OpenStack Nova 0.2%.]

Fig. 6: Statistics collected from real-world clusters run in Karamel over the course of one year in 2016. It shows distributions of (i) lines of code (LOC), (ii) number of provisioning modules, and (iii) cloud provider used, over 1660 launched clusters.
More than 88% of the clusters are defined in fewer than 100 lines of YAML code, whereas the referenced provisioning modules are coded in more than 500 lines of Ruby. In total, 68 modules are referenced in the definitions, and at most 16 modules are used per cluster: 59% of the clusters use only 1 or 2 modules, and almost 23% of them use exactly 7 modules (Figure 6b). Figure 6c shows that clusters are deployed portably on the supported clouds: 56.1% are deployed on in-house premises, 32.9% on Amazon, 6.6% on OCCI, and 4.2% on Google.

... data parallelism for actuation. Karamel is available as an open source project and is used in industry and academia.

REFERENCES

[1] Ansible v2.4. https://goo.gl/gYPjkW, [01-03-2019].
[2] Chef Client v12.0. https://goo.gl/ai6Za8, [01-03-2019].
[3] Docker v17.09. https://goo.gl/sKQKyQ, [01-03-2019].
[4] Puppet 5.3. https://goo.gl/HQdkRG, [01-03-2019].
[5] YAML Data Serialization Standard. http://yaml.org.
[6] Amazon Web Services, Inc. Amazon Elastic Computing Cloud. https://aws.amazon.com/ec2/, 2017.
[7] A. Bessani, J. Brandt, M. Bux, V. Cogo, L. Dimitrova, J. Dowling, A. Gholami, K. Hakimzadeh, M. Hummel, M. Ismail, et al. BiobankCloud: a platform for the secure storage, sharing, and processing of large biomedical data sets. In the First International Workshop on Data Management and Analytics for Medicine and Healthcare (DMAH 2015).