Cloud Computing: Off-Premise Cloud Migration


19/10/2010

Cloud Computing
..computation may someday
be organized as a public utility..
[John McCarthy, 1960]
Virtualization and Cloud Computing
On-Premise Cloud Off-Premise Cloud
Off-premise Cloud migration
Elastic Capacity: Local Cloud to Off-premise Cloud migration
Cloud computing: Introduction
O Term coined in late 2007
O Currently emerges as a hot topic due to its abilities to offer
O flexible dynamic IT infrastructures
O QoS guaranteed computing environment
O configurable software services
A nutshell (1)
SaaS
Complex requirements
Data outside firewall
Cost vs usage
Convenience, simplicity
Flexibility
Virtualization
Utility computing
Pay As You Go
A nutshell (2)
SaaS = leasing a car
Pay as you go
Virtual engine..
Virtual tyres..
Cloud Computing Users and Business Models
O Acquisition Model (Service)
O "All that matters is results; I don't care how it's done"
O Business Model (Pay for usage)
O "I don't want to own assets - I want to pay for elastic usage, like a utility"
O Access Model (Internet)
O "I want accessibility from anywhere, from any device"
O Technical Model (Scalable, elastic, shareable)
O "It's about economies of scale, with effective and dynamic sharing"
[Figure: IT Cloud. Service consumers access services; component vendors / software publishers publish and update components and service templates; the cloud administrator monitors and manages services and resources. Elements: service catalog, component library, datacenter infrastructure. Source: Gartner, 2008]
Public Clouds
O Large scale infrastructure available on a rental basis
O Operating System virtualization (e.g. Xen) provides CPU
isolation
O Roll-your-own network provisioning provides network
isolation
O Locally specific storage abstractions
O Fully customer self-service
O Service Level Agreements (SLAs) are advertised
O Requests are accepted and resources granted via web services
O Customers access resources remotely via the Internet
O Accountability is e-commerce based
O Web-based transaction
O Pay-as-you-go and flat-rate subscription
O Customer service, refunds, etc.
Private Clouds
O Internally managed data centers
O The organization sets up a virtualization environment on
its own servers
O in its data center
O in the data center of a managed service provider
O Key benefits
O you have total control over every aspect of the
infrastructure
O you gain advantages of virtualization
O Issues
O It lacks
O freedom from capital investment
O the flexibility (almost infinite growth) of cloud computing
O Useful for companies that have significant existing IT
investments
Roberto Turrin
Politecnico di Milano
..As a Service
Cloud computing
http://www.ephinx.com/tvadverts/119/ibm-on-demand-business-help-desk-advert.html
..help desk on-demand
Maximilien Brice, CERN
Why do it yourself if you can pay someone to do it for you?
Cloud Computing: a compact view
Cloud TV
(Cloud Computing)
Video On Demand
(SaaS)
Electricity
On Demand
(PaaS)
Full Cloud-based
system
= SaaS + PaaS
(utility computing)
A variety of as-a-Service terms
to describe services offered in Clouds
O AaaS - Architecture as a Service
O BaaS - Business as a Service
O CaaS - Computing as a Service
O CRMaaS - CRM as a Service
O DaaS - Data as a Service
O DBaaS - Database as a Service
O EaaS - Ethernet as a Service
O FaaS - Frameworks as a Service
O GaaS - Globalization or Governance as a Service
O HaaS - Hardware as a Service
O IaaS - Infrastructure or Integration as a Service
O IDaaS - Identity as a Service
O ITaaS - IT as a Service
O LaaS - Lending as a Service
O MaaS - Mashups as a Service
O OaaS - Organization or Operations as a Service
O SaaS - Software as a Service
O StaaS - Storage as a Service
O PaaS - Platform as a Service
O TaaS - Technology or Testing as a Service
O VaaS - Voice as a Service
Ontology
Toward a Unified Ontology of Cloud Computing
[L. Youseff, M. Butrico, and D. Da Silva]
Cloud Application Layer
O Cloud Application Layer
O SaaS
O Users access the services provided by this layer through
web-portals, and are sometimes required to pay fees to use
them.
O Cloud applications can be developed on the cloud software
environments or infrastructure components
Software as a service (SaaS)
O In terms of maturity, Software in the cloud is much more
evolved than Hardware [G. Reese, 2008]
O Application is used as an on-demand service, often
provided via the Internet
O Think on-demand TV programs
O Example:
O Google Apps (online office)
O SalesForce.com (CRMaaS)
Software as a service (SaaS): examples
O An early example of SaaS is the Application Service
Provider (ASP). The ASP approach provides subscriptions to
software that is hosted or delivered over the Internet.
O Microsoft's Software + Services is another example: a
combination of local software and Internet services
interacting with one another.
O Google's Chrome browser suggests an interesting SaaS
scenario: a new desktop could be offered, through which
applications can be delivered (either locally or remotely) in
addition to the traditional Web browsing experience
Software as a service (SaaS): characteristics
O Characteristics:
O Availability via a web browser
O No installation required
O No proprietary desktop sw needed
O On-demand availability
O No sales process to gain access to SaaS-based sw
O Payment terms based on usage
O No massive setup fees
O When you no longer need the service, you simply stop paying
O Minimal IT demands
O E.g., DNS management
Software as a service (SaaS): benefits
O Benefits to users
O Reduce expenses: multiple computers, multiple users
O Alleviates sw maintenance, ongoing operation, support costs
O Export computational work from the user's terminal to the SaaS provider
O No special hardware is required
O Ease of usage: easy installation, access everywhere
O High performance with no huge investment
O Benefits to providers
O Easier to maintain, upgrade, and test (without disturbing users)
O roll small patches
O Add new features
O Protect intellectual property
O Control usage (no illegal copies)
Software as a service (SaaS): issues
O Security and availability (dependability)
O Possible network outage and system failures
O Up-time
O Performance
O Migration of users' data to the cloud
O Security, safety of confidential data
O User authentication and authorization
O Data backup, disaster recovery
O SLA for cloud application
Utility Computing (UC)
O Computing resources and platform provided on demand
O Think electricity service
Utility Computing (UC)
O Computing resources (CPU hours, memory, network) and
platform to run software are provided as an on-demand service
O Think electricity service
O The same evolution happened
O Hardware as a service (HaaS),
Infrastructure as a service (IaaS),
Platform as a Service (PaaS)
O Examples of UC providers: Amazon EC2, Google AppEngine

O Who will use UC? Is UC the end of the high-end PC?
O People who would otherwise have to build their own data center: SaaS
providers, analytics & batch processing
Cloud Software Environment Layer
O Cloud Software Environment Layer
O PaaS
O Users are application developers
O Providers supply developers with a programming-language-level
environment with a well-defined API
O Facilitate interaction between environment and apps
O Accelerate the deployment
O Support scalability
Platform as a Service (PaaS)
O You program using the vendor's specific application
development platform
O The vendor worries about all deployment details
Platform as a Service (PaaS): examples
O Google App Engine
O Python runtime environment and API
O SalesForce Apex (<-->CRMaaS)
O Language to design (along with application logic) page layout,
workflow, customer reports.
O Apache Hadoop
O Programming framework (MapReduce) for embarrassingly parallel apps
O Yahoo!/Apache Pig
O High-level language to enable processing of very large files on
Hadoop environment
O The highest abstraction layer in Pig is a query language
interface, whereby users express data analysis tasks as
queries, in the style of SQL or Relational Algebra
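The MapReduce style that Hadoop supports can be sketched in a few lines of plain Python. This is only an illustration of the programming model, not Hadoop's API: the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and everything runs in a single process.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (word, 1) pairs for one input line; lines are independent of each other."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (no parallelism in this sketch)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cloud", "the grid and the cloud"]))
```

Because each call to `map_phase` touches only its own line, the map step could be spread over many machines, which is what makes such jobs embarrassingly parallel.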
Platform as a Service (PaaS): benefits
O Benefits for developers
O Automatic scaling
O Automatic load balancing
O Integration with other services provided through PaaS Provider
O Authentication service
O Email service
O User interface
Again, such services can be integrated on demand
O It alleviates overhead of cloud apps development
O Reduces development time
O Minimizes logic faults in application
Platform as a Service (PaaS): issues
O The PaaS approach leads to vendor lock-in
O E.g., Google App Engine
O requires the application to be written in Python, using a
Google-specific API
O The application will work well only inside Google's
infrastructure
Cloud Software Infrastructure Layer
O Cloud Software Infrastructure Layer
O IaaS: computational
O DaaS: storage
O CaaS: communications
O Provides resources to the higher-level layers (i.e., Software
and Software Environment)
O Note that Cloud Apps and Cloud Sw might bypass
the Cloud Sw Infrastructure
O However, this would
O reduce simplicity
O increase development effort
Infrastructure as a Service (IaaS)
O Virtual Machines (VM) vs dedicated hardware
O VMs benefits
O Flexibility
O Super-user (root) access to the VM for fine-grained settings and
customization of installed sw
O VMs issues
O Regulatory requirements: certain functions must operate on
dedicated hw
O Performance requirements (especially I/O)
O Legacy systems:
lack of web integration strategy
Infrastructure as a Service (IaaS): examples (1)
O Commercial solutions
O Amazon Elastic Compute Cloud (EC2)
O Pure virtualization
O Based on Xen
O AppNexus "Netezza in the Cloud" Data Warehouse Service
O Dedicated Hardware (with virtualization on top)
O Your apps are not fighting with other users
O Enomaly's elastic computing infrastructure
O Based on Xen
O GoGrid (world's first multi-server control panel)
O Web-based control panel
O Wide variety of OS images (CentOS/Fedora, Windows Server)
O IIS, Microsoft SQL server, Ruby on Rails, PostgreSQL
O Rackspace
O Microsoft Azure
O it covers from private clouds to PaaS
O Based on .NET platform (portable across Microsoft environments)
Infrastructure as a Service (IaaS): examples (2)
O Academic open-source projects
O Eucalyptus Systems
O Delivers private cloud software
O Supports Xen, KVM, VMware (vSphere, ESX, and ESXi)
O Globus Science Clouds
O Based on Nimbus.
Open source toolkit, in turn, supporting Xen and KVM
O Goals:
to make it easy for scientific projects to experiment with IaaS-
style cloud computing
to enable projects developing infrastructure for such clouds to
learn from user requirements.
Data as a Service (DaaS)
O Allows users to
O Store their data on remote disks
O Access data anytime from any place
O Facilitates cloud applications to scale beyond their limited
servers
O Requirements (hard to reach all together):
O High dependability: availability, reliability, performance
(scalability)
O Replication
O Data consistency
Data as a Service (DaaS): examples (1)
O Taxonomy
O Distributed file systems
O E.g., GFS, HDFS
O Replicated relational database (RDBMS)
O Strict consistency model, less emphasis on availability
O E.g., Bayou
O Key-value stores
O Strong on availability, relaxed consistency
O E.g., Amazon Dynamo
Query Model
ACID Properties
Efficiency
Operation environment is assumed to be non-hostile
Data as a Service (DaaS): examples (2)
O Commercial systems
O Amazon
O Persistent cloud storage: S3
O Ephemeral instance storage
O Elastic block storage
O EMC, COS (Cloud Optimized Storage) solutions:
O EMC Atmos.
Multi-petabyte infrastructure for information distribution and
storage
Replication, versioning, compression, deduplication, and spin-down
Access via
Web Services (REST/SOAP)
File system protocols (CIFS/NFS/IFS)
O The DaaS could also be found at some popular IT services,
e.g., Google Docs
Communications as a Service (CaaS)
O Communication becomes a vital component in
guaranteeing QoS
O Requirements:
O Service-oriented
O Configurable
O Schedulable
O Predictable
O Dynamic provisioning of virtual
overlays for traffic isolation
or dedicated bandwidth
O Guaranteed message delay
O Reliable
O Network security
O Communication encryption
O Network monitoring
Communications as a Service (CaaS): examples
O Candidate cloud applications that can be composed of CaaS
O VoIP telephone systems, audio and video conferencing, and
instant messaging
O Microsoft Connected Service Framework (CSF)
O Developed on the .NET framework
O enables scalable, loosely coupled service-based solutions
O For telecommunications operators and service providers
O allows them to aggregate, provision and manage converged
communications services for their subscribers across multiple
networks and a range of device types.
O For media and entertainment organizations
O provides a service-oriented infrastructure to manage how disparate
applications work together to create, manipulate, share and
distribute digital content.
Software kernel layer
O Basic sw management for the physical servers
O OS kernel
O Hypervisor
O VMM
O Clustering middleware
O A step back (10 years ago):
O Grid computing applications were deployed and run on this
layer on several interconnected clusters of machines
Software kernel layer: grid computing
O Grid computing applications were deployed and run on this
layer on several interconnected clusters of machines
O Examples:
O Globus toolkit
O Condor
O Absence of virtualization abstraction
O Jobs closely tied to actual hw infrastructure
O It's complicated to provide: migration, checkpoint, and load
balancing
O Grid computing research can potentially be integrated into the
research area of the cloud.
Clouds Versus Grids
O Rich's assertion: Clouds and Grids are distinct
O Cloud
O Full private cluster is provisioned
O Individual user can only get a tiny fraction of the total
resource pool
O No support for cloud federation except through the client
interface
O Opaque with respect to resources
O Grid
O Built so that individual users can get most, if not all of
the resources in a single request
O Middleware approach takes federation as a first principle
O Resources are exposed, often as bare metal
O These differences mandate different architectures for each
Firmware/hardware layer
O Physical hw
O Servers, switches,
O Users of this layer are big enterprises with huge IT
requirements
O Hardware as a Service (HaaS)
O The provider operates, manages, and upgrades the
hw on behalf of its customers, for the lifetime of the
sublease
Hardware as a Service (HaaS)
O The old idea that hardware should be a capital expenditure
is now being challenged and IT service providers can now
rent hardware from HaaS providers.
O Hardware may only represent 20% of typical IT budgets,
but when equipment needs to be deployed or replaced, the
capital expenditure has undeniable impact on businesses. In
these difficult economic conditions, important projects have
been delayed or simply abandoned because cash is needed
elsewhere.
O The term Hardware as a Service was possibly coined in 2006. As the
result of rapid advances in hardware virtualization, IT
automation and usage metering and pricing, users could
buy IT hardware, or even an entire data center, as a pay-
as-you-go subscription service. The HaaS is flexible,
scalable and manageable to meet your needs.
Hardware as a Service (HaaS): benefits
O Benefits to users:
O Enterprise users do not need to invest in building and
managing data centers
O HaaS provider has the technical expertise.
O Benefits to providers
O The provider has the cost-effective infrastructure to host the
systems
O Benefit materializes from economy of scale of
O building huge data center infrastructures
O with gigantic floor space, power, cooling costs
O operation and management expertise
Hardware as a Service (HaaS): challenges
O Providers have to address a number of technical challenges
O Efficiency, ease, speed of provisioning
O Data center management
O Scheduling
O Power-consumption optimizations
O Solutions:
O Remote scriptable boot-loaders to remotely boot and deploy
complete sw stacks on the data centers
O PXE, U-Boot
O E.g., IBM Kittyhawk (based on U-Boot) scripts the boot sequence
of thousands of remote Blue Gene/P nodes
Hardware as a Service (HaaS): examples
O Five-year, $575 million sublease contract between Morgan
Stanley and IBM
O What: Individual Investor Group and Discover Financial
Services unit
O FROM:
O Morgan Stanley previously ran applications on mainframes.
O TO:
O Utility computing
Spectrum of Abstractions
O Different levels of abstraction
O Instruction Set VM: Amazon EC2
O Framework VM: Google AppEngine
O Similar to languages
O Higher level abstractions can be built on top of lower ones
[Figure: spectrum of abstractions, from EC2 through Azure to AppEngine and Force.com. EC2 end: lower-level, more flexibility, more management, not scalable by default. AppEngine / Force.com end: higher-level, less flexibility, less management, automatically scalable.]
Enabling technologies
Cloud computing
ENABLING TECHNOLOGIES (1)
O Virtualization technology
O Virtualization technologies partition hardware and
thus provide flexible and scalable computing platforms.
O Virtual machine techniques, such as VMware and Xen,
offer virtualized IT-infrastructures on demand.
O Virtual network advances, such as VPN, support users
with a customized network environment to access
Cloud resources.
O Virtualization techniques are the basis of Cloud
computing since they render flexible and scalable
hardware services.
ENABLING TECHNOLOGIES (2)
O Orchestration of Service flow and workflow
O Computing Clouds offer a complete set of service
templates on demand, which could be composed by
services inside the computing Cloud.
O Computing Clouds therefore should be able to
automatically orchestrate services from different
sources and of different types to form a service flow or
a workflow transparently and dynamically for users.
ENABLING TECHNOLOGIES (3)
O Web service and Service Oriented Architecture
(SOA)
O Computing Cloud services are normally exposed as Web
services, which follow the industry standards such as
WSDL and SOAP. The organization and orchestration of
services inside Clouds could be managed in a
Service Oriented Architecture (SOA).
O Furthermore, a set of Cloud services could be used in a
SOA application environment, making them
available on various distributed platforms and
accessible across the Internet.
ENABLING TECHNOLOGIES (4)
O Web 2.0
O Web 2.0 is an emerging technology describing the
innovative trends of using World Wide Web technology
and Web design that aims to enhance creativity,
information sharing, collaboration and functionality of
the Web.
O The essential idea behind Web 2.0 is to improve the
interconnectivity and interactivity of Web applications.
The new paradigm for developing and accessing Web
applications enables users to access the Web more easily
and efficiently.
O Cloud computing services are by nature Web
applications which render desirable computing services
on demand. It is thus a natural technical evolution that
Cloud computing adopts Web 2.0 techniques.
ENABLING TECHNOLOGIES (5)
O World-wide distributed storage system
O Cloud storage requirements:
O A network storage system, backed by distributed storage
providers, offers storage capacity for users to lease.
O Data storage can be migrated, merged, and managed
transparently to end users, whatever the data format.
O Examples:
O Google File System
O Amazon S3
O SmugMug (an example of Mashup).
Digital photo sharing Web site
Upload of an unlimited number of photos for all account types
providing a published API which allows programmers to create
new functionality, and supporting XML-based RSS and Atom
feeds.
ENABLING TECHNOLOGIES (6)
O A distributed data system
O Provides data sources accessed in a semantic way.
O Users could locate data sources in a large distributed
environment by the logical name instead of physical
locations.
O Example: Virtual Data System (VDS)
O Virtual Data Language (VDL)
O Specifies grid workflows to derive and create data
ENABLING TECHNOLOGIES (7)
O Programming Model
O Users drive into the computing Cloud with data and
applications.
O Some Cloud programming models should be proposed
for users to adapt to the Cloud infrastructure.
O For the simplicity and easy access of Cloud services, the
Cloud programming model, however, should not be too
complex or too innovative for end users.
O Example: Apache Hadoop
Peculiarities
Cloud Computing
Why Cloud Computing
O Large Scale Problems
O Definitely data-intensive
O May also be processing intensive
O Examples:
O Crawling, indexing, searching, mining the Web
O Post-genomics life sciences research
O Other scientific data (physics, astronomy, etc.)
O Sensor networks
O Batch processing
O Web 2.0 applications
O Large Data Centers
O Scale problems? Throw more machines at it!
O Clear trend: centralization of computing resources in large data centers
O Necessary ingredients: fiber, juice, and space
O Important Issues:
O Redundancy
O Efficiency
O Utilization
O Management
Cloud Killer Apps
O Mobile and web applications
O Mobile devices: low memory & computation power
O Mashup applications
O Extensions of compute-intensive desktop software
O Matlab, Mathematica
O Image rendering, 3D animation
O (Parallel) Batch processing (MapReduce)
O Examples:
O Peter Harkins at The Washington Post: 200 EC2 instances (1,407
server hours) to convert 17,481 pages of Hillary Clinton's travel
documents within 9 hours
O The New York Times used 100 Amazon EC2 instances + a Hadoop
application to convert 4 TB of raw TIFF images into 1.1 million
PDFs in 24 hours ($240)
O Cost associativity
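Cost associativity can be checked with simple arithmetic. The sketch below assumes a flat rate of $0.10 per instance-hour (an assumption, chosen because it reproduces the $240 figure for 100 instances over 24 hours); the function name `batch_cost` is hypothetical.

```python
def batch_cost(instances, hours, rate_per_instance_hour=0.10):
    """Bill for a batch job charged per instance-hour (the rate is an assumption)."""
    return round(instances * hours * rate_per_instance_hour, 2)


# New York Times example: 100 instances for 24 hours.
nyt_cost = batch_cost(100, 24)

# Cost associativity: the same number of instance-hours costs the same,
# whether spread wide for one hour or run on one machine for 2400 hours.
wide = batch_cost(instances=2400, hours=1)
narrow = batch_cost(instances=1, hours=2400)

if __name__ == "__main__":
    print(nyt_cost, wide, narrow)
```

Under this pricing assumption the only thing that changes between the wide and the narrow run is the wall-clock time, not the bill.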
Cloud Computing: several definitions.. (1)
O A computing cloud is a set of network-enabled services
providing scalable, QoS guaranteed, normally personalized,
inexpensive computing infrastructures on demand, which
could be accessed in a simple and pervasive way.
[G. Wang, Y. Wang, Q. Li]
O If you walk into any library or Internet café and sit down at
any computer without preference for OS or browser and
access a service, that service is cloud-based. [G. Reese]
O Access via web browser
O Zero capital to get started
O You pay only for what you use as you use it
Cloud Computing: several definitions.. (2)
O Cloud computing is a nascent business and technology
concept with different meanings for different people.
O for application and IT users, it's IT as a service (ITaaS), that
is, delivery of computing, storage, and applications over the
Internet from centralized data centers
O for Internet application developers, it's an Internet-scale
software development platform and runtime environment
O for infrastructure providers and administrators, it's the
massive, distributed data center infrastructure connected by IP
networks
[Microsoft, Cisco, IBM]
Cloud Mythologies
O Cloud computing infrastructure is just a web service
interface to operating system virtualization.
O I'm running Xen in my data center, so I'm running a private
cloud.
O Cloud computing imposes a significant performance
penalty over bare metal provisioning.
O I won't be able to run a private cloud because my users will
not tolerate the performance hit.
O Clouds and Grids are equivalent
O In the mid 1990s, the term grid was coined to describe
technologies that would allow consumers to obtain computing
power on demand.
Why Is Cloud Computing Distinct? (1)
O User-centric interfaces
O On-demand service provisioning
O QoS guaranteed offer
O Autonomous System
O Scalability and flexibility
..from other computing paradigms
(e.g., GRID, GLOBAL, INTERNET computing)
Why Is Cloud Computing Distinct? (2)
O User-centric interfaces
O Cloud services should be accessed with simple and
pervasive methods. In fact, Cloud computing adopts the
concept of Utility computing.
O Utility Computing: users obtain and employ computing
platforms in computing Clouds as easily as they access a
traditional public utility.
O In detail, the Cloud services enjoy the following features:
(1) The cloud interfaces do not force users to change their working
habits and environments.
(2) The cloud client software which is required to be installed locally
is lightweight
(3) Cloud interfaces are location independent and can be
accessed through well-established interfaces such as Web services
frameworks and Internet browsers
Why Is Cloud Computing Distinct? (3)
O On-demand service provisioning
O The computing Clouds provide resources and services for users
on demand. Users can customize and personalize their
computing environments later on (for example, software
installation and network configuration), as they usually have
administrative privileges.
O Illusion of infinite computing resources available on demand
O Elimination of an up-front commitment by Cloud users
O Ability to pay for use of computing resources on a short-term basis
as needed
O QoS guaranteed offer
O The computing environments provided by computing Clouds
can guarantee QoS for users.
O The computing Cloud generally delivers QoS by establishing a
Service Level Agreement (SLA) with users: a negotiation on
the levels of availability, serviceability, performance, operation,
or other attributes of the service, such as billing and even
penalties in case of SLA violations.
It's a challenge
Why Is Cloud Computing Distinct? (4)
O Autonomous System
O The computing Cloud is an autonomous system and it is
managed transparently to users. Hardware, software and
data inside clouds can be automatically reconfigured,
orchestrated and consolidated to present a single platform
image, finally rendered to users.
O Scalability and flexibility
O The scalability and flexibility are the most important
features driving the emergence of Cloud computing.
Cloud services and computing platforms offered by computing
Clouds could be scaled across various concerns, such as
geographical locations, hardware performance, software
configurations.
The computing platform should be flexible to adapt to various
requirements of a potentially large number of users.
Cloud computing activities
Percent of Internet users who do the following

Activity                               All    18-29
Use webmail services                   56%    77%
Store personal photos online           34%    50%
Use online applications                29%    39%
Store personal videos online            7%    14%
Pay to store computer files online      5%     9%
Back up hard drive to an online site    5%     7%

Why people use Cloud..

Reason                   Major   Minor   Not
Easy and convenient      51%     23%     23%
Ubiquitous access        41%     25%     32%
Easily shared            39%     28%     29%
Won't lose information   34%     23%     23%
Before the move into the cloud
Issues:
Sw licenses
SLA
Customer lock-in
Scalability
Security (confidentiality)
Cloud cost model
When cloud saves money
Software Licenses (1)
O A license that works in your data center might not carry over
to the cloud
O Cloud is based on a pay-as-you-go model
O From 0am to 9am: 2 app servers (just for redundancy)
O From 9am to 5pm: 2+6 app servers (peak hours)
O From 5pm to 0am: 2+2 app servers
You pay for 110 hours of computing time
O Traditional sw licenses do not usually match the cloud
pricing model
O You have to pay for 8 licenses, even if only 2 servers are active at
night
O Do sw licenses support usage-based costs?
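The 110-hour figure, and the gap between usage-based billing and per-server licensing, can be reproduced as follows. This is a small sketch using only the schedule on this slide; the names `SCHEDULE`, `server_hours`, and `peak_servers` are hypothetical.

```python
# (start_hour, end_hour, active_servers) for one day, taken from the schedule above.
SCHEDULE = [(0, 9, 2), (9, 17, 2 + 6), (17, 24, 2 + 2)]


def server_hours(schedule):
    """Total server-hours actually consumed in one day (what pay-as-you-go bills)."""
    return sum((end - start) * servers for start, end, servers in schedule)


def peak_servers(schedule):
    """Largest number of servers running at once (what per-server licenses must cover)."""
    return max(servers for _, _, servers in schedule)


usage_hours = server_hours(SCHEDULE)          # 18 + 64 + 28 = 110
licensed_hours = peak_servers(SCHEDULE) * 24  # 8 licenses held around the clock = 192
utilization = usage_hours / licensed_hours    # fraction of licensed capacity in use

if __name__ == "__main__":
    print(usage_hours, licensed_hours, round(utilization, 2))
```

With 8 licenses held all day, only 110 of the 192 licensed server-hours are used, which is the mismatch the slide points at.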
Software Licenses (2)
O Do the sw licenses allow operations in virtualized env.?
O In the cloud it is easy to launch new servers
O Do you have proper/enough licenses?
Are you violating agreements?
O The ideal licensing model is open source
O Most open source sw is free of charge and lets you do whatever
you want
O Some supported open source sw (e.g., Red Hat Enterprise, MySQL
Enterprise) might have slightly stricter licenses
O However, these licenses tend to be cloud-friendly
O Several non-open-source vendors are moving toward hourly license
charges (e.g., Microsoft, Sun)
O EC2: (base) VM with Microsoft Windows Server $0.15/hour
vs VM with open source OS $0.10/hour
Software Licenses (3)
O Software providers' cost model
O is based on one-time purchases
O effectiveness is measured on quarterly sales
O Challenges (against the cloud's flexibility)
O Per-user licenses. Can work in the cloud, but require attention
to the mechanism for auditing your licensing
O Licenses might be tied to a specific MAC/IP
O Software license management system might not support cloud
or virtual environments
O Per-CPU licenses
O You may have to create a custom install for each instance of
the sw.
O If they require a licensing server, it may not be smart enough
to recognize replacement virtual servers on the fly
Availability of the Service (1)
O Existing SaaS have set a high standard
O Google Search is the dial tone of the Internet
If Google Search is not available, you think the Internet is
down
O Why are customers reluctant to migrate to Cloud
Computing?
Service and outage              Duration   Date
S3: auth service overload       2h         Feb 2008
S3: error in a protocol         6-8h       Jul 2008
AppEngine: programming error    5h         Jun 2008
Gmail                           1.5h       Nov 2008
Above the Clouds: A Berkeley View of Cloud Computing, Michael Armbrust et al., Feb 2009
Availability of the Service (2)
O Why are customers reluctant to migrate to Cloud
Computing?
O It's rare to lose a physical server with no warning at all
O When a physical component fails (or warns that it is about to)
it is replaced by a redundant component with no downtime
O However, if you lose a physical server the damage is
significant
O EC2 instances, for example, are completely unreliable
compared to a low-end server with component redundancy
O No warning before losing a virtual server. It fails and it is no
longer available/reachable
O However, the loss of a virtual server is almost a nonevent
Availability of the Service: what about DDoS?
O With DDoS the service is made unavailable by overloading
the system from multiple sources
O Example: attackers rent bots on the black market for a few
cents ($0.03) per bot per week:
O 500,000 bots cost the attacker $15,000 and generate an extra
1 GB/second of network bandwidth
O A victim in EC2 is charged
O an extra $360/hour for extra bandwidth
O extra money for the increased workload (e.g., $100/hour for 1000
instances)
O It takes more than 32 hours to pass the break-even point,
i.e., when the victim's cost is higher than the attacker's cost
O A 32-hour attack is difficult to sustain
O Cloud computing shifts the attack target from the SaaS
provider to the Utility Computing (e.g., IaaS) provider
O Cloud is elastic and can dynamically scale
O Usually DDoS protection is a core competence
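The break-even point in the DDoS example follows directly from the figures on this slide. The sketch below only redoes that arithmetic; the function names are hypothetical, and the prices are the ones quoted above.

```python
def attacker_cost(bots, price_per_bot_week):
    """What the attacker pays to rent the botnet for a week."""
    return bots * price_per_bot_week


def victim_cost_per_hour(bandwidth_per_hour, compute_per_hour):
    """Extra hourly charges the victim runs up while absorbing the attack."""
    return bandwidth_per_hour + compute_per_hour


def break_even_hours(attack_cost, hourly_victim_cost):
    """Hours the attack must last before the victim has paid more than the attacker."""
    return attack_cost / hourly_victim_cost


attack = attacker_cost(500_000, 0.03)     # about $15,000
hourly = victim_cost_per_hour(360, 100)   # $460 per hour
hours = break_even_hours(attack, hourly)  # a little over 32 hours

if __name__ == "__main__":
    print(round(attack), hourly, round(hours, 1))
```

The result sits between 32 and 33 hours, which is why the slide treats a 32-hour attack as the threshold.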
PAXSON, V. private communication, December 2008
Performance: unpredictability
O Multiple VMs share physical CPUs and memory
O CPU/RAM performance is generally satisfactory
O Almost similar to what you expect from a physical server
O Network and I/O raise more concerns.
O For instance, EC2's local storage is completely unpredictable
O Best practice: spread processing across multiple servers.
O Options at app. layer: independent nodes vs clustering
O The former uses a load balancer to split sessions. Each VM is
unaware of the others.
O Massively scalable
O Application state must be managed via shared DB, message queue,
or centralized data storage
O The latter uses a load balancer to route requests to clustered
application servers. Servers communicate and share state info
O More complex, and limits scalability
O Many architectures rely on multicasting (not available, e.g., in EC2)
O Pro: state information is kept within application server tier
71
Data transfer: performance.. and more
O Data-intensive applications may be deployed across
cloud boundaries
O $100-150 per terabyte transferred!
O Data transfer might be the bottleneck
O Note: transferring 10 TB from Berkeley to Seattle
O via S3 at 20 Mbit/sec takes 45 days and costs $1000
O via overnight shipping takes 1 day and costs $400
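The transfer-time estimate can be reproduced with a short calculation. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Mbit = 10^6 bits) and a link that sustains its nominal rate; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes, megabits_per_second):
    """Days needed to move `terabytes` over a link of the given speed."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / SECONDS_PER_DAY


def transfer_cost(terabytes, dollars_per_terabyte):
    """Network transfer fee in dollars."""
    return terabytes * dollars_per_terabyte


if __name__ == "__main__":
    print(f"{transfer_days(10, 20):.1f} days")   # 10 TB at 20 Mbit/s
    print(f"${transfer_cost(10, 100)}")          # at $100 per TB
```

Under these assumptions 10 TB at 20 Mbit/s takes a little over 46 days, close to the 45 days quoted on the slide, and costs $1000 at the lower $100/TB rate.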
72
Data lock-in
O APIs for Cloud are proprietary
O No standardization exists
O This prevents interoperability
O Extracting data or programs from one cloud and using them in
another cloud is not easy
O Customers are locked in
O Vulnerable to prices/reliability/SLA
O Vulnerable to providers going out of business
O Standardization would lead to
O Fairer prices (more competition..) with higher QoS
O Surge computing
O Based on private Cloud
O Public cloud is (dynamically) accessed to capture extra capacity
73
Security (1)
O ..it's uncomfortable that
O someone else is in charge of your data protection
O There is no traditional perimeter security
O Laws in many nations
O require providers to keep customers' data and other
copyrighted material within national boundaries (or
continental boundaries)
O conversely, such information is often not allowed to be
stored in a specific country
74
Security (2)
O No big obstacles to create a secure environment in the cloud
O You never know where your data is in the cloud
O Except for specific exploits, data in the cloud
is as secure as data on a physical machine
O E.g., encrypt data before placing it in a cloud
O Public objects (e.g., in S3) are inherently insecure
and should be encrypted
75
Attitudes about policy of services
How concerned would you be if a company...
                                                   Very   Somewhat   Not too   Not at all
Sold your files to others                          90%    5%         2%        3%
Used your information in marketing campaigns       80%    10%        3%        6%
Analyzed your information to display custom ads    68%    19%        6%        7%
Kept a copy of your deleted files                  63%    20%        8%        8%
Gave law enforcement files when asked              49%    15%        11%       22%
Cloud computing economics
O Even if pay-as-you-go can be
more expensive than buying,
the economic benefits come from
O Elasticity
O Transference of risk
O Elasticity
O The cloud can add/remove resources
O at a fine grain
O within minutes (rather than weeks)
O Dynamicity
O Traditional datacenters tend to overprovision (to sustain peak
workloads)
O Virtualization/consolidation attenuates this issue
O Capacity prediction can be wrong or imprecise,
underestimating the demand
O Transference of risk
O The risk of a wrong prediction is shifted from the service operator
to the cloud
77
Usage-based
costs
Risk Elasticity
Cloud computing: benefit to users
O Mitigate the risks of over-provisioning and under-
provisioning
O No up-front cost; invest in other aspects (marketing,
technology)
O Less maintenance & operational cost
O Save time, time = money
In summary: Reduce cost
78
Cloud computing: mitigate risks
O Real world utilization 5%-20%
O Animoto demand surge:
from 50 servers to 3500
servers in 3 days
O Black Friday sales
79
[Figure: three plots (1, 2, 3) of Resources versus time t, each showing Demand and Capacity curves. Labels: Over-provisioning; Under-provisioning; On demand, scalable.]
Cloud Computing: benefit to providers (1)
O Make money
O Economies of scale
O Time diversity: different peaks for different services
O Geographical diversity: choice of best location
O Electricity price in Idaho = 1/5 of the price in Hawaii
O Existing infrastructure & expertise
O Google, Amazon: utilize off-peak capacity
80
Resource         Cost for medium scale    Cost for large scale    Ratio
Network          $95 / Mbps / month       $13 / Mbps / month      ~7x
Storage          $2.20 / GB / month       $0.40 / GB / month      ~6x
Administration   140 servers/admin        >1000 servers/admin     ~7x

Where        $/kWh   Why?
Idaho        3.6c    Hydroelectric power, no long-distance transfer
California   10c     No coal-fired electricity allowed. Power transmitted over long distances
Hawaii       18c     Fuel must be shipped
Cloud Computing: benefit to providers (2)
O Leverage existing investment
O Adding cloud services on top of existing infrastructure at a
low incremental cost
O Defend a franchise
O Leverage customer relationships
O Become a platform
O E.g., Facebook's initiative with Joyent (cloud provider) for plug-in
applications
81
Should I move to the Cloud? 82
Government security systems,
trade secrets knowledge
repositories (IPR)
Procurement portals
Internally accessible business
support systems (CRM, ERP)
Financial trading systems
E.g. London Stock Exchange
Time critical news publishing and
press releases
E.g. lottery results, disaster news
Internet sales systems for peak-
demand products
E.g. Concert tickets
Personal internet banking
General interest blogs, corporate
websites
Organisation internal
communication portals, systems
with a limited user community
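Cloud Infrastructure Adoption Matrix (Paul Rundle, PA Consulting Group)
[Figure: matrix placing the systems listed above along two axes. Vertical axis: data security**. Horizontal axis: application usage pattern (peakiness). Regions range from AVOID to ADOPT.]
** Security comprises: availability, sensitivity of data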
Roberto Turrin
Politecnico di Milano
Main Cloud Technologies
Amazon web services: EC2,S3,MapReduce
GoGrid
RackSpace
AppEngine
Azure
Amazon
Utility computing
84
Amazon
O Elastic Compute Cloud
O Rent virtual machine instances to run your software.
Monitor and increase / decrease the number of VMs as
demand changes
O How to use:
O Create an Amazon Machine Image (AMI): applications,
libraries, data and associated settings
O Upload AMI to Amazon S3 (simple storage service)
O Use Amazon EC2 web service to configure security and
network access
O Choose OS, start AMI instances
O Monitor & control via web interface or APIs
85
Amazon Web Services: overview
O Amazon EC2 (Elastic Compute Cloud)
O Amazon S3 (Simple Storage Service)
O Amazon SQS (Simple Queue Service)
O Amazon CloudFront
O Amazon SimpleDB
O Amazon Elastic MapReduce
86
Amazon EC2
O Heart of Amazon Cloud
O Web Services API for provisioning, managing and
deprovisioning virtual servers
O Any application anywhere on the Internet can launch a virtual
server in the Amazon Cloud
O EC2 and S3
O Machine images are stored in S3
O Instance volume snapshots are stored in S3
O Your virtual server can use S3 for any other storage need
87
Amazon EC2: concepts (1)
O Instance. Created starting from an AMI
O AMI (Amazon Machine Image). Similar to a ghost image
O Amazon prebuilt AMIs
O Third-party AMIs
O Your own AMIs
O Elastic IP address. Static IP address!
O Region. Group of availability zones.
O The SLA guarantees 99.95% availability of at least 2 availability
zones within a region.
O Availability zone. Analogous to a data center.
O 2 zones are guaranteed not to share any common point of
failure
88
Amazon EC2: concepts (2)
O Security group. Roughly analogous to a network
segment.
O Protect and define access policy to your instances
O Block storage volume. Analogous to a SAN
O Block-level storage that you can mount from EC2 instances
O Snapshot. A copy of the current state of your block storage
volume.
O Stored in S3
O Good backup mechanism
O However, they are not portable (you cannot use them outside the
Amazon cloud)
89
Amazon EC2: concepts (3)
[Figure: class diagram relating Region (e.g., USA/EastCoast), Availability zone (e.g., us-east-1a), Security group, AMI, Instance, Volume, Snapshot and Elastic IP, with multiplicities 1, 0..1 and 0..* on the associations]
Amazon EC2: characteristics (1)
O Customized Xen hypervisor
O Dynamic provisioning and deprovisioning
O Capabilities needed to provide isolated computing environment
O Storage:
O Two kinds of storage in EC2
O Ephemeral instance storage
O Elastic Block storage (EBS)
O Amazon S3
O Security
O Virtual firewall rules
O Traffic filtered to your nodes
O Routing rules
O Security groups
91
Amazon EC2: characteristics (2)
O Characteristics:
O Elastic: increase or decrease capacity within minutes
O Monitor and control via EC2 APIs
O Completely controlled: root access to each instance
O Flexible: choose your OS, software packages
O Red Hat, Ubuntu, openSUSE, Windows Server 2003, ...
O Small, large, extra large instances
O Reliable: Amazon datacenters, high availability and redundancies
O Secure: web interface to configure firewall settings
O Cost:
O CPU: small instance, $0.10 per hour for Linux, $0.125 per hour
for Windows (1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor)
O Bandwidth: in $0.10, out $0.17 per GB
O Storage: $0.10 per GB-month, $0.10 per 1 million I/O
requests
92
Amazon EC2: practical examples access (1)
O Access
O Through Web Services
O Amazon Web Services Console
93
Amazon EC2: practical examples access (2)
O Access
O Through Web Services
O Amazon Web Services Console
O ElasticFox browser plug-in
94
Amazon EC2: practical examples access (3)
O Access
O Through Web Services API
O Amazon Web Services Console
O ElasticFox Firefox plug-in
O Amazon Command Line tools
95
Amazon EC2: practical examples instance setup (1)
O Instance setup
O It dynamically sets up a server
O An instance is launched from an AMI (stored in S3)
O You can not launch a new EC2 instance when S3 is unavailable
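ec2-describe-images
IMAGE ami-225fba4b ec2-public-images/fedora-core4-apache-mysqlv1.07.manifest.xml amazon available public i386 machine

Output fields, left to right after IMAGE:
ID: ami-225fba4b
S3 object: ec2-public-images/fedora-core4-apache-mysqlv1.07.manifest.xml
Owner: amazon
State: available
Public/private: public
32/64 bit: i386
Machine, ramdisk, kernel: machine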
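To use this listing in a script, the IMAGE line can be split into named fields. The sketch below is based only on the sample line shown on the slide: the field names are hypothetical labels chosen for illustration, and the real tool's output may contain additional or differently ordered columns.

```python
# Hypothetical field names, following the slide's annotations in order.
FIELDS = [
    "image_id",      # ID
    "s3_manifest",   # S3 object
    "owner",         # owner
    "state",         # state
    "visibility",    # public/private
    "architecture",  # 32/64 bit
    "image_type",    # machine, ramdisk, kernel
]


def parse_image_line(line):
    """Turn one whitespace-separated IMAGE line into a dict of fields."""
    parts = line.split()
    if not parts or parts[0] != "IMAGE":
        raise ValueError("not an IMAGE line: %r" % line)
    return dict(zip(FIELDS, parts[1:]))


SAMPLE = (
    "IMAGE ami-225fba4b "
    "ec2-public-images/fedora-core4-apache-mysqlv1.07.manifest.xml "
    "amazon available public i386 machine"
)

if __name__ == "__main__":
    for name, value in parse_image_line(SAMPLE).items():
        print(f"{name}: {value}")
```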
Amazon EC2: practical examples instance setup (2)
O Instance setup
O Launching a virtual server instance based on a machine image
ec2-run-instances ami-1fd73376
RESERVATION r-3d01de54 1234567890123 default
INSTANCE i-b1a21bd8 ami-1fd73376 pending 0 m1.small 2008-10-22T16:10:38+0000 us-east-1a aki-a72cf9ce ari-a52cf9cc

Annotations on the output:
RESERVATION: you can reserve more than available instances
Instance ID: i-b1a21bd8
AMI ID: ami-1fd73376
Status (pending, running, shutting down, terminated): pending
Auth key: none given in this launch
Type of instance (less/more powerful machines): m1.small
Availability zone: us-east-1a
Amazon EC2: practical examples security (1)
O Remote access to instances
O You do not have any account on a new instance
ec2-add-keypair mykeypair
-----BEGIN RSA PRIVATE KEY-----
MIIEoQIBAAKCAQBuLFg5ujHrtm1jnutSuoO8
Xe56LlT+HM8v/xkaa39EstM3/aFxTHgElQiJLC
hpHungXQ29VTc8rc1bW0lkdi23OH5eqkMH
ec2-run-instances -k mykeypair ami-1fd73376
RESERVATION r-3d01de54 1234567890123 default
INSTANCE i-b1a21bd8 ami-1fd73376 pending mykeypair

Annotations: the key pair gives access to the root account; mykeypair in the INSTANCE line is the auth key.
Amazon EC2: practical examples security (2)
O Remote access to instances
O Another issue: security rules deny you access
ec2-authorize default -p 22
ec2-authorize default -P tcp -p 22 -s 10.0.0.1/32
RESERVATION r-3d01de54 1234567890123 default
O You can create additional groups, defining different rule sets
for different kinds of instances
O Opening HTTP/HTTPS for a LB
O Blocking any access to an application server
ec2-add-group mygroup -d groupdescription
O You cannot change the security group of a running instance
99
Amazon EC2: practical examples availability zones
O Availability zones
O 3 in USA, 2 in EU
O 2 availability zones have distinct physical infrastructure
O Failure of part/all of one zone does not impact the others
O Amazon SLA: 99.95% availability of at least 2 zones in a region
O When you launch one instance in one zone and another
instance in a second zone, you gain infrastructural redundancy
O You pay for traffic between any two zones. You do not pay for
traffic within the same zone
INSTANCE line excerpt: m1.small 2008-10-22T16:10:38+0000 us-east-1a aki-a72cf9ce ari-a52cf9cc (us-east-1a is the availability zone)
Amazon EC2: practical examples static IP
O New instances are dynamically assigned a public and a
private IP address
O You might need a static IP address (e.g., for a web server)
O Amazon account allows 5 elastic (i.e., static) IP addresses
O You pay for each elastic IP address you create (even if not
assigned*)
O The private IP address always remains dynamic
O Routing of IPv6 addresses is not currently supported
101
*note: you're charged even if you are not using the IP address
Amazon EC2: practical examples Data storage (1)
O Persistent cloud storage (S3)
O Ephemeral instance storage
O Its lifespan matches the instance it supports
O Unpredictable speed
O Elastic block storage (EBS)
O Think of it as a SAN (performance of a Gigabit Ethernet link)
O You can mount/unmount it
O You can mount any number of volumes (size: 1GB-1TB)
O It can be formatted with the filesystem you prefer
O However, a volume cannot be shared between 2 instances
102
Amazon EC2: practical examples Data storage (2)
O Persistent cloud storage (S3)
O Ephemeral instance storage
O Elastic block storage (EBS)
103
              S3                 Instance            Block
Speed                            ?
Reliability
Durability
Flexibility
Complexity    high               low                 high
Cost
Strength      DR management      Transient data      Operational data
Weakness      Operational data   Nontransient data   Lots of small I/O
Amazon S3
O Cloud-based persistent storage
O Independent from other Amazon services
O Apps on your own server can leverage Amazon S3 without being
in the cloud
O Do not think of S3 as a remote filesystem
O You store objects in buckets
O You do not store files
O You do not have directories
O Objects cannot be larger than 5 GB
O Buckets exist in a flat namespace shared among all S3 users
O You cannot create a tree of buckets
O If you want, you can make your buckets/objects publicly available
O You cannot mount S3
O There exist 3rd-party solutions.. however, S3 is not a filesystem
104
Amazon S3: access
O Web services (SOAP or REST)
O Find buckets and objects
O Discover their metadata
O Create new buckets
O Upload new objects
O Delete existing buckets and objects
O BitTorrent
O P2P
O In general, a transactional web app won't use BitTorrent
105
Amazon S3: reliability/availability vs durability 106
O Durability
O data reliability/availability
O No data loss, no data corruption
O Availability
O 99.5%
O 365*24*0.005 = 44h/year
O Weak track record in terms of availability
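The conversion from an availability percentage to yearly downtime is simple arithmetic. The sketch below reproduces the slide's 99.5% figure and, for comparison, applies the same formula to the 99.95% quoted for EC2 regions; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Maximum hours per year a service may be down at this availability."""
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


if __name__ == "__main__":
    for pct in (99.5, 99.95):
        print(f"{pct}% -> {downtime_hours_per_year(pct):.2f} h/year")
```

At 99.5% the allowance is 43.8 hours per year, which the slide rounds to 44; at 99.95% it drops to about 4.4 hours.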
GoGrid
"world's first multi-server control panel"
107
GoGrid: a cloudcenter
O Cloudcenter
O A data center in the clouds
O Cloudcenters share some
characteristics with
Service infrastructure (e.g., AWS)
O Scale on demand
O Pay-as-you-go
O Convert capital expenditures (capex) to
operational expenditures (opex)
O Programmatic (API) and graphical user interfaces (GUI)
O Basic infrastructure: storage, servers, network, power, and
cooling
O Other examples: FlexiScale, ElasticHosts, AppNexus
O Easier cloud-bridging
108
Cloudcenters vs Service infrastructures
O Service infrastructures
O Custom web services in the cloud
O Deliver a web application
O Do batch processing
O Every web service (storage, database, servers, ...) is a unique
and custom solution. E.g.,
O S3 uses the S3 protocol and proprietary storage mechanisms
O SQS uses its own non-standard custom protocol and message
format
O Services are designed in a custom manner so that they can scale
O Cloudcenters
O Most AWS competitors use this approach
O Standard data center services using standard technology and
protocols, but in the cloud. E.g.,
O Storage is available via SMB/CIFS and NFS
O Database uses standard SQL and RDBMS
O Firewalls and load balancers are based on HW appliances
(instead of custom distributed and configured software)
109
GoGrid vs Traditional data centers (1)
O Traditional data centers are composed of:
O Perimeter security (firewalls, IDSs)
O Load balancing (hw)
O Network segmentation (e.g., VLANs)
O A combination of physical and virtual hw
O Filesharing (NAS)
O Block storage (SAN)
O Support services: DNS, DHCP, server imaging, inventory
management, asset management and monitoring
O Power, cooling, bandwidth, and backup
O 24/7 on site support and staff
O Cloudcenters offer most of these services in a multitenant
fashion.
110
GoGrid vs Traditional data centers (2)
O Cloudcenters allow reusing your current in-house data
center expertise (e.g., forecast/managing capacity)
O Benefits of cloud computing: capacity on-demand,
automating workload elasticity, pay only for what you use
O Elastic workload can be moved to the cloud for cost
optimization
O Two directions of scaling
O Out (horizontal scaling). Like any Cloud
O Solution for stateless services (web/application servers, batch
processing), where adding new (virtual) servers requires little
additional configuration and brings additional capacity
O Up (vertical scaling). Like traditional data center
O Solution for stateful services (databases, fileservers).
O Significant reconfiguration is required.
O Rebalancing/synchronizing terabytes of data is non-trivial
O Adding servers does not directly translate into more capacity.
O In these cases it's better to use bigger servers
111
GoGrid: role of physical hosting 112
DMZ
Scale out
Scale up
GoGrid vs Data centers vs Service Infrastructure (1)
Functionality                        Data center        GoGrid              AWS
Firewall                             Hw firewall        Hw firewall         Distributed sw firewall
Load balancer (LB)                   Hw LB              Hw LB               Roll-your-own sw LB
Network isolation                    VLAN               VLAN                Distributed sw firewall
Private networks                     Yes (VLAN)         Yes (VLAN)          No
Network protocols                    Any                Any                 No multicast/broadcast..
OS                                   Any                Some limits         Some limits
DNS                                  Managed in house   Managed by GoGrid   No
Persistent local storage             Yes                Yes                 No
Persistent network storage           Yes                Yes                 Yes
Mixed virtual and physical servers   Yes                Yes                 No
113
Real-life usage: GoGrid vs AWS
O Infrastructure service
O Little of your current in-house expertise (networking,
storage, ...) is relevant.
O You need to learn new skills to manage S3 and EC2, to work around
the lack of multicast/broadcast, to configure the custom distributed
sw firewall, ...
O CloudCenters
O Very similar to using the console of a virtualization
management system (e.g., VMware VirtualCenter).
O In addition to servers, you can control network, DNS, storage,
load balancers and firewall through the same panel
O Friendlier for transactional web applications than for batch
processing applications (that do not need firewall,
load balancer, VLANs, ...)
114
Rackspace
115
Rackspace's Cloud Services
O Rackspace's roots are in managed hosting of physical
servers and its Fanatical Support
O In October 2008, Rackspace acquired Slicehost, a leader in
the Linux virtual server hosting market
O In the clouds, Rackspace's solutions can range
O from small startups in a completely virtual cloud environment
O to complex physical servers
O Customers have the advantage of
O a single vendor relationship,
O fully integrated technologies
O benefits of having their physical servers located within the
same data centers as their cloud services
116
Rackspace: Cloud Servers
O Custom images and image repositories
O Programmer API
O Static IP address
O Utility-based pricing
O Traditional Fanatical Support
O Rackspace's hybrid capability
O Managed hosting customers can tap into Cloud Servers
when they need extra computing power (scaling out)
O Customers can duplicate physical configuration almost
instantaneously
O Testing, Quality Assurance (QA), development, change reviews, ...
117
Rackspace: Cloud Files
O Storage service
O Concept of objects (up to 5GB) and
non-nested containers (buckets)
O Objects can be associated to metadata
O Access via web interface and programmer API (REST web
services, PHP, Python, Ruby, Java, C#/.NET)
O Pay-as-you-go services
O It can be used for backup and data archives in the cloud
O Possibility of easily publishing the content behind a CDN
(Content distribution network)
118
Rackspace: Cloud Sites
O Two stacks:
O LAMP (Linux, Apache, MySQL, PHP/Perl/Python)
O Microsoft (Windows, .NET, ASP, SQL Server)
O Hosting solution for traditional web site
O It handles the backend management
O It automatically scales the site as demand increases (or
decreases)
119
AppEngine
Google
120
Google AppEngine - introduction
O Write your web program in Python (recently, Java too) and
submit to Google. It will take care of the rest
O How to use
O Download AppEngine SDK
O Develop your program locally
O A set of python programs, input = requested url, output = return
message
O Debug locally
O Register for an application id
O Submit your application to Google
121
Google AppEngine Hello world
O Creating a Simple Request Handler
Create a file helloworld.py:
print 'Content-Type: text/plain'
print ''
print 'Hello, world!'
O Map url to handler
Edit configuration file app.yaml:
application: helloworld
version: 1
handlers:
- url: /.*
  script: helloworld.py
O Debug: http://localhost:8080/
O Data storage:
O Distributed file system
O Store using AppEngine API, retrieve using GQL
O Built on top of Bigtable
O designed to scale well
O Abstraction on top of Bigtable
O API influenced by scalability
O No joins
O Recommendations: denormalize schema; precompute joins
Google AppEngine
O Register for an application ID
O http://appengine.google.com
O Verification code sent to your mobile
O Uploading the Application
O appcfg.py update helloworld/
O Enter your Google username and password at the prompts
O http://application-id.appspot.com
O Manage using Administration Console
O Set up domain name
O Invite other people to be developers
O View error logs, traffic logs
O Switch between different versions
123
AppEngine: architecture 124
Automatic Scaling to Application Needs
O You don't need to configure your resource needs
O One CPU can handle many requests per second
O Apps are hashed (really mapped) onto CPUs:
O One process per app, many apps per CPU
O Creating a new process is a matter of cloning a generic
model process and then loading the application code (in fact
the clones are pre-created and sit in a queue)
O The process hangs around to handle more requests (reuse)
O Eventually old processes are killed (recycle)
O Busy apps (many QPS) get assigned to multiple CPUs
O This automatically adapts to the need
O as long as CPUs are available
125
Google AppEngine
O Characteristics
O Easy to start, little administration
O Scale automatically
O Reliable
O Integrated with Google user service: get user nickname, login
O Cost:
O Can set daily quota
O CPU hour: 1.2 GHz Intel x86 processor
O Free quotas are going to be reduced soon
126
Resource             Unit                  Unit cost   Free (daily)
Outgoing Bandwidth   gigabytes             $0.12       10GB
Incoming Bandwidth   gigabytes             $0.10       10GB
CPU Time             CPU hours             $0.10       46 hours
Stored Data          gigabytes per month   $0.15       1GB (all)
App Engine costs nothing to get started
Azure
Platform
Microsoft
127
Platform-as-a-Service
Software-as-a-service
Infrastructure-as-a-Service
Image from http://news.cnet.com/2300-1001_3-10001898-5.html?tag=mncol
Azure - Introduction
O It is an operating system for the cloud
O It is designed for utility computing
O It has four primary features:
O Service management
O Compute
O Storage
O Developer experience
O Facilities
O Abstract execution environment
O Shared file system
O Resource allocation
O Programming environments
O Supports building applications that scale
O Programming tools and interfaces are designed to be familiar to
traditional desktop programmers
128
Azure platform 129
http://www.microsoft.com/windowsazure/whitepapers/
Main components: Windows Azure
O Fabric: Microsoft data centers
accessible via Internet
O RESTful approach
O Azure compute and storage
services are built on top
O Apps written in C#, VB, Java, .NET
O Web apps written in ASP.NET, Windows Communication
Foundation, PHP, ...
O Each app has a config file (e.g., number of instances);
Windows Azure tries to maintain the desired state
O Access via Windows Live ID
130
[Figure: storage services accessed over HTTP/HTTPS: Blobs, Tables, Drives, Queues]
Blobs: file/container storage
Tables: non-schematized data
Queues: inter-role communication
Drives: durable storage (in beta)
Running applications: Web and Worker roles
[Figure: HTTP/HTTPS requests pass through a load balancer (LB) to Web role instances, each in a Windows VM with IIS and an agent; Worker role instances run in Windows VMs with an agent; all VMs run on the hypervisor. The agent interacts with the Fabric and exposes an API.]
Communication via:
O Azure storage queue
O Windows Communication Foundation (WCF)
O To be scalable, Web/worker instances must be stateless
O State in Azure SQL
O Apps developed much like on-premises (Visual Studio)
Accessing data: blobs, tables, and queues
Blobs
O Unstructured data
O Two-level: a container holds 1 or more blobs
O Blobs can have associated metadata
O A CDN is provided
Tables
O NO relational tables
O Store entities with properties
O Azure Storage partitions tables across multiple servers if necessary
Queues
O Mainly for Web-Worker role communication
Common features
O 3 replicas of each piece of data
O Fault-tolerance
O Consistency guaranteed
O REST technology to identify and expose data
O URI, HTTP(S) access
SQL Azure
O SQL Server in the cloud
O Core relational database capabilities
O REST, native, and ODBC accessibility
O Data Sync between cloud and
on-premise databases
O Automatic replication and failover
O 3 replicas
[Figure: Web Edition (1GB databases) and Business Edition (10GB databases), accessed via TDS (Tabular Data Stream)]
Azure Platform APPFABRIC
O Service Bus
O Communication across organizational and network boundaries
O Help your app to become kind of a SaaS
O You define endpoints
O They can be accessed by other apps (in the cloud or on-
premises)
O They are assigned a URI to locate the service
O Access Control Service
O Federated,
claims-based identity
134
[Developer Academy]
APPFABRIC Service BUS
Sample Scenarios
O An enterprise wishes to let software at its trading partners
access one of its applications
O Application running on Windows Azure might need to access
data stored in an on-premises database
O An enterprise that exposes application services to its
trading partners
O Relying on Access Control to authenticate and provide identity
info for each client app. This info is not maintained internally,
but stored in the Access Control service
135
[Developer Academy]
Azure Pricing Meters
SQL Azure
Per month
Web Edition (1GB) = $9.99
Business Edition (10GB) = $99.99
Compute
Per Service Hour
Small: $0.12
Medium: $0.24
Large: $0.48
X-Large: $0.96
Storage
Per GB stored and
transactions
Storage = $0.15 / GB
Transaction = $0.10 / 100K
Bandwidth
Per GB transfer in/out of a datacenter
US/EU =$0.10 in / $0.15 out
Asia Pacific = $0.30 in / $0.45 out
AppFabric
Per Message Operation
$0.015 per 10K messages
http://www.microsoft.com/windowsazure/tco/
Service Level Agreements
Instance
health
SQL Azure
availability
AppFabric
availability
Compute
connectivity
Storage
availability
Amazon Elastic MAPREDUCE
Hadoop in the Clouds
Web-Scale Problems?
O Don't hold your breath:
O Biocomputing
O Nanocomputing
O Quantum computing
O ...
O Batch jobs: e.g., invoice processing
O It all boils down to
O Divide-and-conquer
O Throwing more hardware at the problem
139
Different Workers
O Different threads in the same core
O Different cores in the same CPU
O Different CPUs in a multi-processor system
O Different machines in a distributed system
Choices
O Commodity vs. exotic hardware
O Number of machines vs. processor vs. cores
O Bandwidth of memory vs. disk vs. network
O Different programming models
140
Divide-and-conquer
Patterns for Parallelism
O Parallel computing has been around for decades
O Here are some design patterns
Producer/consumer flow
141
..toward Map Reduce
O From patterns to implementation:
O pthreads, OpenMP for multi-threaded programming
O MPI for cluster computing
O ...
O The reality:
O Lots of one-off solutions, custom code
O Write your own dedicated library, then program with it
O Burden on the programmer to explicitly manage everything
O MapReduce
142
MapReduce: Functional Programming
O MapReduce
O functional programming meets distributed processing on
steroids
O Not a new idea: it dates back to the 50s (or even 30s)
O What is functional programming?
O Computation as application of functions
O Theoretical foundation provided by lambda calculus
O How is it different?
O Traditional notions of data and instructions are not
applicable
O Data flows are implicit in program
O Different orders of execution are possible
O Exemplified by LISP
143
Functional programming: e.g., Lisp
O Lisp = Lost In Silly Parentheses
O Lists are primitive data types
O Functions written in prefix notation
O Two important concepts in functional programming
O Map: do something to everything in a list
O Fold: combine results of a list in some way
(+ 1 2) → 3
(* 3 4) → 12
(sqrt (+ (* 3 3) (* 4 4))) → 5
(define x 3) → x
(* x 5) → 15
'(1 2 3 4 5)
'((a 1) (b 2) (c 3))
144
Map
O Map is a higher-order function
O How map works:
O Function is applied to every element in a list
O Result is a new list
[Figure: the function f is applied independently to each element of the list]
(map (lambda (x) (* x x)) '(1 2 3 4 5)) → '(1 4 9 16 25)
Fold
O Fold is also a higher-order function
O How fold works:
O Accumulator set to initial value
O Function applied to list element and the accumulator
O Result stored in the accumulator
O Repeated for every item in the list
O Result is the final value in the accumulator
[Figure: f is applied to each list element in turn, threading the accumulator from the initial value to the final value]
(fold + 0 '(1 2 3 4 5)) → 15
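For readers more familiar with Python than Lisp, the same two examples can be written with the built-in `map` and with `functools.reduce`, which plays the role of fold. This is a direct translation of the Lisp expressions, not part of the original slides.

```python
from functools import reduce
from operator import add

numbers = [1, 2, 3, 4, 5]

# Map: apply a function to every element, producing a new list.
squares = list(map(lambda x: x * x, numbers))

# Fold: combine the elements with a function, starting from an
# initial accumulator value (0 here).
total = reduce(add, numbers, 0)

if __name__ == "__main__":
    print(squares)
    print(total)
```

Because each application of the mapped function is independent of the others, the map step can be spread over many workers, which is the property MapReduce builds on.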
Typical Problem
O Iterate over a large number of records
O Extract something of interest from each (map)
O Shuffle and sort intermediate results
O Aggregate intermediate results (reduce)
O Generate final output
Key idea: provide an abstraction at the point of
these two operations (map and reduce)
147
MapReduce
O Programmers specify two functions:
map (k, v) → <k', v'>*
reduce (k', v') → <k', v'>*
O All v' with the same k' are reduced together
O Usually, programmers also specify:
partition (k', number of partitions) → partition for k'
O Often a simple hash of the key, e.g. hash(k') mod n
O Allows reduce operations for different keys in parallel
O Implementations:
O Google has a proprietary implementation in C++
O Hadoop is an open source implementation in Java (led by
Yahoo)
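The three user-supplied functions can be illustrated with a single-process word count. This is only a sketch of the programming model: the tiny `run_mapreduce` driver is a hypothetical stand-in for a real runtime such as Hadoop, and all function names are chosen for illustration.

```python
import zlib
from collections import defaultdict


def map_fn(key, value):
    """key: document name, value: document text. Emits (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """key: a word, values: all counts emitted for it. Emits the total."""
    yield key, sum(values)


def partition(key, num_partitions):
    """Simple hash of the key, mod n. crc32 is used because Python's
    built-in hash() of strings changes between processes."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def run_mapreduce(records, num_partitions=2):
    """Toy driver: map, group by partition and key, then reduce."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in records:
        for k2, v2 in map_fn(key, value):
            partitions[partition(k2, num_partitions)][k2].append(v2)
    output = {}
    # Each partition could be reduced by a different worker in parallel.
    for part in partitions:
        for k2 in sorted(part):
            for k3, v3 in reduce_fn(k2, part[k2]):
                output[k3] = v3
    return output


if __name__ == "__main__":
    docs = [("d1", "the cat sat"), ("d2", "the dog sat")]
    print(run_mapreduce(docs))
```

All values for one key land in the same partition, so reducers for different partitions never need to exchange data.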
148
MapReduce
[Figure: the data store feeds initial k-v pairs to four map tasks; each map emits (k1, values), (k2, values) and (k3, values); a barrier aggregates values by keys; three reduce tasks receive (k1, values), (k2, values) and (k3, values) and produce the final k1, k2 and k3 values]
149
Typical Parallelization Problems
O How do we assign work units to workers?
O What if we have more work units than workers?
O What if workers need to share partial results?
O How do we aggregate partial results?
O How do we know all the workers have finished?
O What if workers die?
150
MapReduce Runtime
O Handles scheduling
O Assigns workers to map and reduce tasks
O Handles data distribution
O Moves the process to the data
O Handles synchronization
O Gathers, sorts, and shuffles intermediate data
O Handles faults
O Detects worker failures and restarts
O Everything happens on top of a distributed FS
151
From Theory to Practice
Hadoop Cluster
You
1. Scp data to cluster
2. Move data into HDFS
3. Develop code locally
4. Submit MapReduce job
4a. Go back to Step 3
5. Move data out of HDFS
6. Scp data from cluster
152
On Amazon: With EC2
You
0. Allocate Hadoop cluster (on EC2: your Hadoop cluster)
1. Scp data to cluster
2. Move data into HDFS
3. Develop code locally
4. Submit MapReduce job
4a. Go back to Step 3
5. Move data out of HDFS
6. Scp data from cluster
7. Clean up!
153
On Amazon: EC2 and S3
Your Hadoop Cluster
S3
(Persistent Store)
EC2
(The Cloud)
Copy from S3 to HDFS
Copy from HDFS to S3
154
MapReduce example: PageRank
Given page x with in-bound links t_1 ... t_n, where
O C(t) is the out-degree of t
O α is the probability of a random jump
O N is the total number of nodes in the graph

$$PR(x) = \alpha \left( \frac{1}{N} \right) + (1 - \alpha) \sum_{i=1}^{n} \frac{PR(t_i)}{C(t_i)}$$

[Figure: page X with in-bound links from t_1, t_2, ..., t_n]
PageRank: a sketch
O Properties of PageRank
O Can be computed iteratively
O Effects at each iteration are local
O Sketch of algorithm:
O Start with seed PR_i values
O Each page distributes PR_i credit to all pages it links to
O Each target page adds up credit from multiple in-bound links
to compute PR_{i+1}
O Iterate until values converge
PageRank in MapReduce
Map: distribute PageRank credit to link targets
...
Reduce: gather up PageRank credit from multiple
sources to compute new PageRank value
Iterate until
convergence
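The map and reduce steps for PageRank can be sketched in plain Python. The code runs in a single process and only mimics the MapReduce structure; it assumes every page has at least one out-link and every link target is a key of the graph (no dangling nodes), and the names `pagerank_map`, `pagerank_reduce` and the default random-jump probability of 0.15 are illustrative choices.

```python
from collections import defaultdict


def pagerank_map(page, rank, out_links):
    """Map: distribute this page's rank evenly over its link targets."""
    for target in out_links:
        yield target, rank / len(out_links)


def pagerank_reduce(page, credits, alpha, n):
    """Reduce: combine incoming credit with the random-jump term."""
    return alpha * (1.0 / n) + (1 - alpha) * sum(credits)


def pagerank(graph, alpha=0.15, tol=1e-9, max_iter=100):
    """graph maps each page to the list of pages it links to."""
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}      # seed values
    for _ in range(max_iter):
        credits = defaultdict(list)                # shuffle: group by target
        for page, links in graph.items():
            for target, credit in pagerank_map(page, ranks[page], links):
                credits[target].append(credit)
        new_ranks = {
            page: pagerank_reduce(page, credits[page], alpha, n)
            for page in graph
        }
        delta = sum(abs(new_ranks[p] - ranks[p]) for p in graph)
        ranks = new_ranks
        if delta < tol:                            # values have converged
            break
    return ranks


if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, rank in sorted(pagerank(web).items()):
        print(page, round(rank, 4))
```

Each pass of the outer loop corresponds to one MapReduce job; in a real deployment the graph structure would also have to be passed along from one iteration to the next.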
PageRank in MapReduce: speed-up 158
