AWS Certified Solutions Architect Professional Slides v2.0


NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com
Disclaimer: These slides are copyrighted and
strictly for personal use only
• This document is reserved for people enrolled in the
Ultimate AWS Certified Solutions Architect Professional course

• Please do not share this document, it is intended for personal use and exam
preparation only, thank you.

• If you’ve obtained these slides for free on a website that is not the course’s
website, please reach out to [email protected]. Thanks!

• Best of luck for the exam and happy learning!

© Stephane Maarek
AWS Certified Solutions
Architect Professional Course
SAP-C01

Setting the right expectations for this course
• This course is entirely slide-based
• I’m assuming you have experience using AWS
• No hands-on labs come with the course. You should know the basics
• It’s fast paced. Your time is valuable. Feel free to slow me down to 0.75x
• If you just passed the AWS Certified Solutions Architect Associate cert:
  • I recommend you go through AWS Certified Developer, SysOps & DevOps
  • I know you are eager to get the SAP certification, but take your time
• The AWS knowledge needed for the SA Pro exam:
  • Is extremely similar to the knowledge for SAA
  • The questions are more complex, and knowing details is very important
  • It’s possible that multiple answers are correct, but one is the most appropriate

The AWS Certified Solutions Architect
Professional Exam
• Is HARD
• Tests real AWS experience
• Will test you on some very subtle service features

• I have included quizzes for every single section BUT…


• The quizzes are not “scenario based” / “exam-like”
• They only help you extract some important notions out of what you’re learning
• This is my optimal way of teaching you about specific topics
• Please trust my teaching process

Practice Exams
• This course does not come with practice exams
• I recommend you look on Udemy for extra practice exams
• I really want to focus this course on the knowledge needed
• I may come up with a practice exam at some point (to be purchased separately)

• Warning:
• This course is on the NEW CERTIFICATION (SAP-C01)
• You may see outdated content in other practice exams, other courses, etc…
• This course is not incomplete: it is targeted at the knowledge you
actually need to pass the exam

Identity & Federation Section

IAM – What should you know by now
• Users: long term credentials
• Groups
• Roles: short-term credentials, uses STS
• EC2 Instance Roles: uses the EC2 metadata service. One role at a time per instance
• Service Roles: API Gateway, CodeDeploy, etc…
• Cross Account roles
• Policies
• AWS Managed
• Customer Managed
• Inline Policies
• Resource Based Policies (S3 bucket, SQS queue, etc…)

IAM Policies Deep Dive
• Anatomy of a policy: JSON doc with Effect, Action, Resource, Conditions,
Policy Variables
• Explicit DENY has precedence over ALLOW
• Best practice: use least privilege for maximum security
• Access Advisor: see permissions granted and when they were last accessed
• Access Analyzer: analyze resources that are shared with an external entity
• Navigate examples at:
https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_examples.html
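
The "explicit DENY has precedence over ALLOW" rule can be sketched in a few lines of Python. This is a simplified illustration only, not the full IAM evaluation logic: it ignores resources, conditions, and other policy types, and the helper name `is_allowed` and the sample statements are made up for the example.

```python
from fnmatch import fnmatchcase

def is_allowed(statements, action):
    """Simplified evaluation: explicit Deny wins, otherwise an Allow is needed."""
    decision = False  # implicit deny by default
    for stmt in statements:
        patterns = stmt["Action"]
        if isinstance(patterns, str):
            patterns = [patterns]
        # IAM action names are case-insensitive, so compare in lower case.
        if any(fnmatchcase(action.lower(), p.lower()) for p in patterns):
            if stmt["Effect"] == "Deny":
                return False  # explicit Deny overrides any Allow
            decision = True
    return decision

# Hypothetical statements: allow all of S3, but deny bucket deletion.
statements = [
    {"Effect": "Allow", "Action": "s3:*"},
    {"Effect": "Deny", "Action": "s3:DeleteBucket"},
]

print(is_allowed(statements, "s3:GetObject"))      # True
print(is_allowed(statements, "s3:DeleteBucket"))   # False
print(is_allowed(statements, "ec2:RunInstances"))  # False
```

The third call shows the implicit deny: nothing matches, so the request is refused even though no statement denies it.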

IAM AWS Managed Policies
AdministratorAccess
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}

IAM AWS Managed Policies
PowerUserAccess

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "NotAction": [
        "iam:*",
        "organizations:*",
        "account:*"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:CreateServiceLinkedRole",
        "iam:DeleteServiceLinkedRole",
        "iam:ListRoles",
        "organizations:DescribeOrganization",
        "account:ListRegions"
      ],
      "Resource": "*"
    }
  ]
}

Note how “NotAction” is used instead of Deny
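
To see what "NotAction" with "Allow" means in practice, here is a minimal Python sketch. It assumes the two statements of PowerUserAccess shown on this slide and ignores resources; `statement_allows` and `power_user_allows` are hypothetical helper names, not AWS APIs.

```python
from fnmatch import fnmatchcase

def statement_allows(stmt, action):
    """Return True if a single Allow statement covers the action (resources ignored)."""
    def matches(patterns):
        return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)

    if "NotAction" in stmt:
        # Allow + NotAction: everything EXCEPT the listed actions is allowed.
        return not matches(stmt["NotAction"])
    return matches(stmt["Action"])

power_user = [
    {"Effect": "Allow", "NotAction": ["iam:*", "organizations:*", "account:*"]},
    {"Effect": "Allow", "Action": [
        "iam:CreateServiceLinkedRole",
        "iam:DeleteServiceLinkedRole",
        "iam:ListRoles",
        "organizations:DescribeOrganization",
        "account:ListRegions",
    ]},
]

def power_user_allows(action):
    """An action is allowed if any of the Allow statements covers it."""
    return any(statement_allows(s, action) for s in power_user)

print(power_user_allows("ec2:RunInstances"))  # True
print(power_user_allows("iam:CreateUser"))    # False
print(power_user_allows("iam:ListRoles"))     # True
```

Because nothing here is an explicit Deny, another policy attached to the same principal could still allow `iam:CreateUser`. That is the difference from writing a Deny statement.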


IAM Policies Conditions
"Condition" : { "{condition-operator}" : { "{condition-key}" : "{condition-value}" }}

Operators:
• String (StringEquals, StringNotEquals, StringLike…)
• "Condition": {"StringEquals": {"aws:PrincipalTag/job-category": "iamuser-admin"}}
• "Condition": {"StringLike": {"s3:prefix": [ "", "home/", "home/${aws:username}/" ]}}
• Numeric (NumericEquals, NumericNotEquals, NumericLessThan…)
• Date (DateEquals, DateNotEquals, DateLessThan…)
• Boolean (Bool):
• "Condition": {"Bool": {"aws:SecureTransport": "true"}}
• "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
• (Not)IpAddress:
• "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}}
• ArnEquals, ArnLike
• Null: "Condition":{"Null":{"aws:TokenIssueTime":"true"}}
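
As a rough illustration of how these operators behave, the sketch below evaluates a small subset of them against a request context in Python. It is an assumption-laden toy: `check_condition` is a made-up helper, it only handles single values, and it covers just four operators. Multiple operators and keys in one Condition block are combined with AND, as in IAM.

```python
import ipaddress

def check_condition(condition, context):
    """Evaluate a tiny subset of IAM condition operators against a request context."""
    for operator, tests in condition.items():
        for key, expected in tests.items():
            actual = context.get(key)
            if operator == "StringEquals":
                ok = actual == expected
            elif operator == "Bool":
                ok = str(actual).lower() == expected.lower()
            elif operator == "IpAddress":
                ok = (actual is not None and
                      ipaddress.ip_address(actual) in ipaddress.ip_network(expected))
            elif operator == "Null":
                # "true" means the key must be absent from the request context.
                ok = (actual is None) == (expected.lower() == "true")
            else:
                raise ValueError(f"operator not covered by this sketch: {operator}")
            if not ok:
                return False  # every test must pass (logical AND)
    return True

condition = {
    "IpAddress": {"aws:SourceIp": "203.0.113.0/24"},
    "Bool": {"aws:SecureTransport": "true"},
}

inside = {"aws:SourceIp": "203.0.113.10", "aws:SecureTransport": True}
outside = {"aws:SourceIp": "198.51.100.7", "aws:SecureTransport": True}

print(check_condition(condition, inside))   # True
print(check_condition(condition, outside))  # False
```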

IAM Policies Variables and Tags
Example: ${aws:username}
• "Resource": ["arn:aws:s3:::mybucket/${aws:username}/*"]

AWS Specific:
• aws:CurrentTime, aws:TokenIssueTime, aws:principaltype, aws:SecureTransport,
aws:SourceIp, aws:userid, ec2:SourceInstanceARN

Service Specific:
• s3:prefix, s3:max-keys, s3:x-amz-acl, sns:Endpoint, sns:Protocol…

Tag Based:
• iam:ResourceTag/key-name, aws:PrincipalTag/key-name…
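
Policy variables are placeholders that are filled in from the request context at evaluation time. The Python sketch below mimics that substitution for illustration; `resolve_variables` is a hypothetical helper, and leaving unknown variables untouched is a choice made for this sketch, not a description of how IAM treats unresolved variables.

```python
import re

def resolve_variables(text, context):
    """Replace ${key} placeholders with values from the request context."""
    def substitute(match):
        key = match.group(1)
        # Leave unknown variables untouched in this sketch.
        return context.get(key, match.group(0))
    return re.sub(r"\$\{([^}]+)\}", substitute, text)

resource = "arn:aws:s3:::mybucket/${aws:username}/*"
print(resolve_variables(resource, {"aws:username": "alice"}))
# arn:aws:s3:::mybucket/alice/*
```

With one statement written this way, every user is limited to the prefix that carries their own name, so there is no need for one policy per user.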

IAM Roles vs Resource Based Policies
• Attach a policy to a resource (example: S3 bucket policy) versus
using a role as a proxy

Diagram 1: User (Account A) -> Role (Account B) -> Amazon S3 (Account B)

Diagram 2: User (Account A) -> S3 Bucket Policy -> Amazon S3 (Account B)

IAM Roles vs Resource Based Policies
• When you assume a role (user, application or service), you give up your
original permissions and take the permissions assigned to the role

• When using a resource-based policy, the principal doesn’t have to give up any
permissions

• Example: User in account A needs to scan a DynamoDB table in Account A
and dump it in an S3 bucket in Account B.

• Supported by: Amazon S3 buckets, SNS topics, SQS queues, Lambda
functions, ECR, Backup, EFS, Glacier, Cloud9, AWS Artifact, Secrets Manager,
ACM, KMS, CloudWatch Logs, API Gateway, EventBridge etc…
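
For the DynamoDB-to-S3 example above, a resource-based policy on the bucket in Account B could look like the sketch below. The bucket name, account ID, and user name are hypothetical placeholders, and `cross_account_bucket_policy` is just a helper for building the JSON document.

```python
import json

def cross_account_bucket_policy(bucket, principal_arn):
    """Build a bucket policy letting one external principal write objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }

policy = cross_account_bucket_policy(
    "account-b-dump-bucket",                         # hypothetical bucket in Account B
    "arn:aws:iam::111111111111:user/table-scanner",  # hypothetical user in Account A
)
print(json.dumps(policy, indent=2))
```

Because the user never assumes a role, it keeps its own permissions to scan the DynamoDB table in Account A while writing to the bucket. For cross-account access the user's identity-based policy in Account A must also allow `s3:PutObject` on that bucket.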

IAM Permission Boundaries
• IAM Permission Boundaries are supported for users and roles (not groups)
• Advanced feature to use a managed policy to set the maximum permissions
an IAM entity can get.

Example: IAM Permission Boundary + IAM Permissions Through IAM Policy = No Permissions
(when the IAM policy grants only actions that the boundary does not allow)

IAM Permission Boundaries
• Can be used in combination with AWS Organizations SCP

Use cases:
• Delegate responsibilities to non-administrators within their permission
boundaries, for example to create new IAM users
• Allow developers to self-assign policies and manage their own permissions,
while making sure they can’t “escalate” their privileges (= make themselves admin)
• Useful to restrict one specific user (instead of a whole account using
Organizations & SCP)
https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_boundaries.html
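
A permission boundary never grants anything by itself: the effective permissions are the overlap between the boundary and the identity policy. A minimal Python sketch of that intersection, assuming Allow-only policies and hypothetical action lists (`effective_allow` is a made-up helper):

```python
from fnmatch import fnmatchcase

def matches_any(action, patterns):
    """True if the action matches at least one (wildcard) pattern."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)

def effective_allow(action, identity_allows, boundary_allows):
    """An action is usable only if both the identity policy and the boundary allow it."""
    return matches_any(action, identity_allows) and matches_any(action, boundary_allows)

boundary = ["s3:*", "cloudwatch:*", "ec2:*"]   # hypothetical permission boundary
identity = ["iam:CreateUser", "s3:GetObject"]  # hypothetical identity policy

print(effective_allow("iam:CreateUser", identity, boundary))    # False: outside the boundary
print(effective_allow("s3:GetObject", identity, boundary))      # True: allowed by both
print(effective_allow("ec2:RunInstances", identity, boundary))  # False: never granted
```

The first call is the privilege-escalation case: even if a developer attaches `iam:CreateUser` to themselves, the boundary keeps it from taking effect.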

Using STS to Assume a Role
• Define an IAM Role within your account or cross-account
• Define which principals can access this IAM Role
• Use AWS STS (Security Token Service) to retrieve credentials and
impersonate the IAM Role you have access to (AssumeRole API)
• Temporary credentials can be valid between 15 minutes and 12 hours

Diagram: the user calls the AssumeRole API on AWS STS, which checks permissions
with IAM and returns temporary security credentials for the Role (same or other
account)
Assuming a Role with STS
• Provide access for an IAM user in one AWS account that you own to access
resources in another account that you own
• Provide access to IAM users in AWS accounts owned by third parties
• Provide access for services offered by AWS to AWS resources
• Provide access for externally authenticated users (identity federation)

• Ability to revoke active sessions and credentials for a role
(by adding a policy using a time statement – AWSRevokeOlderSessions)

When you assume a role (user, application or service), you give up your original
permissions and take the permissions assigned to the role
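
The session revocation mentioned above relies on a Deny statement with a time condition on `aws:TokenIssueTime`. The sketch below builds such a policy document in Python and mirrors the date comparison; the helper names are hypothetical and the cutoff is an arbitrary example value.

```python
from datetime import datetime, timezone

def revoke_older_sessions_policy(cutoff):
    """Deny everything for sessions whose token was issued before the cutoff (UTC)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "DateLessThan": {
                        "aws:TokenIssueTime": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")
                    }
                },
            }
        ],
    }

def is_revoked(token_issue_time, cutoff):
    """Mirror the DateLessThan test: tokens issued before the cutoff are denied."""
    return token_issue_time < cutoff

cutoff = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
policy = revoke_older_sessions_policy(cutoff)

old_token = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)
new_token = datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc)
print(is_revoked(old_token, cutoff))  # True
print(is_revoked(new_token, cutoff))  # False
```

Users who assume the role again after the cutoff receive a newer token, so they are unaffected.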

Providing Access to an IAM User in Your or
Another AWS Account That You Own
• You can grant your IAM users permission to switch to roles within your AWS
account or to roles defined in other AWS accounts that you own.

Diagram: User (Account A) -> Role (Account A*) -> Terminate EC2 Instance

• Benefits:
• You must explicitly grant your users permission to assume the role.
• Your users must actively switch to the role using the AWS Management Console or
assume the role using the AWS CLI or AWS API
• You can add multi-factor authentication (MFA) protection to the role so that only users
who sign in with an MFA device can assume the role
• Least privilege + auditing using CloudTrail

Cross account access with STS
Production Account: S3 bucket productionapp, Role: UpdateApp
Development Account: Group: Developers, Group: Testers

1. Admin creates role that grants Development account read/write
access to productionapp bucket
2. Admin grants members of the group Developers permission to
assume the UpdateApp Role
3. Users request access to role
4. STS returns Role credentials
5. User can access the S3 bucket by using the role credentials

Providing Access to AWS Accounts Owned by
Third Parties
• Zone of trust = accounts, organizations that you own
• Outside Zone of Trust = 3rd parties
• Use IAM Access Analyzer to find out which resources are exposed
• For granting access to a 3rd party:
• The 3rd party AWS account ID
• An External ID (secret between you and the 3rd party)
• To uniquely associate with the role between you and 3rd party
• Must be provided when defining the trust and when assuming the role
• Must be chosen by the 3rd party
• Define permissions in the IAM policy
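
The External ID is enforced in the role's trust policy through the `sts:ExternalId` condition key. A minimal sketch of such a trust policy, built in Python; the account ID and External ID are hypothetical placeholders and `third_party_trust_policy` is a made-up helper name.

```python
def third_party_trust_policy(third_party_account_id, external_id):
    """Trust policy: the 3rd party account may assume the role only with the External ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{third_party_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }

# Hypothetical values agreed with the 3rd party.
trust = third_party_trust_policy("999999999999", "example-external-id-1234")
print(trust["Statement"][0]["Condition"])
```

The 3rd party then passes the same value as the External ID when calling AssumeRole. A caller who knows the role ARN but not the External ID is rejected, which is what mitigates the confused deputy problem.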

The confused deputy

STS Important APIs
• AssumeRole: access a role within your account or cross-account
• AssumeRoleWithSAML: return credentials for users logged in with SAML
• AssumeRoleWithWebIdentity: return creds for users logged in with an IdP
• Example providers include Amazon Cognito, Login with Amazon, Facebook,
Google, or any OpenID Connect-compatible identity provider
• AWS recommends using Cognito instead
• GetSessionToken: for MFA, from a user or AWS account root user
• GetFederationToken: obtain temporary creds for a federated user,
usually a proxy app that will give the creds to a distributed app inside a
corporate network

Identity Federation in AWS
• Give users outside of AWS permissions to access AWS resources in your account
• You don’t need to create IAM Users (user management is outside AWS)
• Use cases:
  • A company has its own identity system (e.g., Active Directory)
  • Web/Mobile application that needs access to AWS resources
• Identity Federation can have many flavors:
  • SAML 2.0
  • Custom Identity Broker
  • Web Identity Federation With(out) Amazon Cognito
  • Single Sign-On (SSO)

Diagram: the User logs in to the Identity Provider and receives credentials; AWS
has a trust relationship with the Identity Provider; the User then accesses AWS

SAML 2.0 Federation
• Security Assertion Markup Language 2.0 (SAML 2.0)
• Open standard used by many identity providers (e.g., ADFS)
• Supports integration with Microsoft Active Directory Federation Services (ADFS)
• Or any SAML 2.0–compatible IdPs with AWS
• Access to AWS Console, AWS CLI, or AWS API using temporary credentials
• No need to create IAM Users for each of your employees
• Need to set up a trust between AWS IAM and SAML 2.0 Identity Provider (both ways)

• Under-the-hood: Uses the STS API AssumeRoleWithSAML

• SAML 2.0 Federation is the “old way”, AWS Single Sign-On (AWS SSO) Federation is
the new managed and simpler way
• https://aws.amazon.com/blogs/security/enabling-federation-to-aws-using-windows-active-directory-adfs-and-saml-2-0/

SAML 2.0 Federation – AWS API Access

Diagram: Corporate (Identity Provider) with User, Portal/Identity Provider (IdP), and
LDAP-based Identity Store; AWS Cloud with Security Token Service (STS) and S3 Bucket

1. App makes Auth. Request
2. Authenticate
3. SAML Assertion
4. AssumeRoleWithSAML API
5. Temporary Security Credentials
6. Access AWS APIs

SAML 2.0 Federation – AWS Console Access
Diagram: Corporate (Identity Provider) with User, Portal/Identity Provider (IdP), and
LDAP-based Identity Store; AWS Cloud with AWS Sign-in Endpoint for SAML
(https://signin.aws.amazon.com/saml), Security Token Service (STS), and IAM Role

1. User Logs into Portal
2. Authenticate
3. SAML Assertion
4. Post to AWS Sign-in
5. Request Temporary Security Credentials
6. Sign-in URL for AWS Console
7. Redirect

SAML 2.0 Federation –
Active Directory FS (ADFS)
Diagram: Corporate (Identity Provider) with User; AWS Cloud with AWS Sign-in Endpoint
for SAML (https://signin.aws.amazon.com/saml), Security Token Service (STS), and IAM Role

1. User Logs into Portal
2. Authenticate
3. SAML Assertion
4. Post to AWS Sign-in
5. Request Temporary Security Credentials
6. Sign-in URL for AWS Console
7. Redirect

Custom Identity Broker Application
• Use only if Identity Provider is NOT compatible with SAML 2.0
• The Identity Broker Authenticates users & requests temporary credentials from AWS
• The Identity Broker must determine the appropriate IAM Role
• Uses the STS API AssumeRole or GetFederationToken

Diagram: Corporate (Identity Provider) with User, Custom Identity Broker, and LDAP-based
Identity Store; AWS Cloud with Security Token Service (STS) and AWS Services

1. User browses to a URL
2. Authenticate
3. Request Temporary Security Credentials
4. Token or URL
5. Access AWS APIs or Redirect

Web Identity Federation – Without Cognito
• Not recommended by AWS – use Cognito instead

Diagram: Client, 3rd Party Identity Provider (OpenID Connect or Compatible IdP);
AWS Cloud with Security Token Service (STS) and AWS Services

1. Login
2. Web Identity Token
3. AssumeRoleWithWebIdentity API
4. Temporary Security Credentials
5. Access AWS resources

Web Identity Federation – With Cognito
• Preferred for Web Identity Federation
• Create IAM Roles using Cognito with the least privilege needed
• Build trust between the OIDC IdP and AWS
• Cognito benefits:
  • Supports anonymous users
  • Supports MFA
  • Data Synchronization
• Cognito replaces a Token Vending Machine (TVM)

Diagram: Client, 3rd Party Identity Provider (OpenID Connect or Compatible IdP);
AWS Cloud with Amazon Cognito, Security Token Service (STS), and AWS Services

1. Login (client to 3rd party IdP)
2. ID Token (IdP to client)
3. ID Token (client to Amazon Cognito)
4. Cognito Token (Amazon Cognito to client)
5. Cognito Token (client to STS)
6. Temporary Security Credentials (STS to client)
7. Access AWS resources

Web Identity Federation – IAM Policy
• After being authenticated with Web Identity Federation, you can identify
the user with an IAM policy variable

• Examples:
  • cognito-identity.amazonaws.com:sub
  • www.amazon.com:user_id
  • graph.facebook.com:id
  • accounts.google.com:sub

What is Microsoft Active Directory (AD)?
• Found on any Windows Server with AD Domain Services
• Database of objects: User Accounts, Computers, Printers, File Shares,
Security Groups
• Centralized security management, create account, assign permissions
• Objects are organized in trees
• A group of trees is a forest

Diagram: Domain Controller holding a user account (John, Password)

What is ADFS (AD Federation Services)?
• ADFS provides Single Sign-On across applications
• SAML across 3rd party: AWS Console, Dropbox, Office365, etc…

Diagram: Corporate (Identity Provider) with User; AWS Cloud with AWS Sign-in Endpoint
for SAML (https://signin.aws.amazon.com/saml)

1. User browses to a URL
2. Authenticate
3. SAML Token
4. POST to AWS Sign-in
5. Sign-in URL for AWS Management Console
6. Redirect

AWS Directory Services
• AWS Managed Microsoft AD
  • Create your own AD in AWS, manage users locally, supports MFA
  • Establish “trust” connections with your on-premises AD
  • Diagram: On-prem AD <-> AWS Managed AD (trust, auth on both sides)
• AD Connector
  • Directory Gateway (proxy) to redirect to on-premises AD
  • Users are managed on the on-premises AD
  • Diagram: AD Connector -> On-prem AD (proxy, auth on the On-prem AD)
• Simple AD
  • AD-compatible managed directory on AWS
  • Cannot be joined with on-premises AD
  • Diagram: Simple AD

AWS Directory Services
AWS Managed Microsoft AD
• Managed Service: Microsoft AD in your AWS VPC
• EC2 Windows Instances:
  • EC2 Windows instances can join the domain and run traditional AD
  applications (SharePoint, etc.)
  • Seamlessly Domain Join Amazon EC2 Instances from Multiple Accounts & VPCs
• Integrations:
  • RDS for SQL Server, AWS WorkSpaces, QuickSight…
  • AWS SSO to provide access to 3rd party applications
• Standalone repository in AWS or joined to on-premises AD
• Multi AZ deployment of AD in 2 AZ, # of DC (Domain Controllers) can be
increased for scaling
• Automated backups
• Automated Multi-Region replication of your directory

Diagram: VPC with two Availability Zones, each hosting Apps and an AD Domain
Controller (DC)

AWS Microsoft Managed AD - Integrations

• AWS integrations: RDS for SQL Server, Amazon WorkSpaces, Amazon QuickSight,
Amazon Connect, Amazon WorkDocs, AWS Single Sign-On (SAML Single Sign-On through
AWS SSO)
• Traditional AD Applications: .NET Apps, SharePoint, SQL Server
• Extend On-Premises AD: two-way forest trust between the on-premises AD and the
AWS Managed Microsoft AD DC

Connect to on-premises AD
• Ability to connect your on-premises Active Directory to AWS Managed Microsoft AD
• Must establish a Direct Connect (DX) or VPN connection
• Can set up three kinds of forest trust:
  • One-way trust: AWS => on-premises
  • One-way trust: on-premises => AWS
  • Two-way forest trust: AWS <=> on-premises
• Forest trust is different than synchronization

Diagram: on-premises Microsoft AD and AWS Managed Microsoft AD DC (in a VPC) linked by
a forest trust over Site-to-Site VPN or Direct Connect; an EC2 traditional AD app uses
seamless domain join; label: requesting domain awsAD.mycorp.com

Solution Architecture:
Active Directory Replication
• You may want to create a replica of your AD on EC2 in the cloud to
minimize latency or in case DX or VPN goes down
• Establish trust between the AWS Managed Microsoft AD and the AD on EC2

Diagram:
• on-premises Microsoft AD (Domain: onpremAD.example.com)
• Microsoft AD on EC2, Self Managed Replica, in the VPC (Domain: onpremAD.example.com)
• AWS Managed Microsoft AD DC (Domain: awsAD.example.com)
• Links: replication from the on-premises AD to the replica on EC2; trust
relationships with the AWS Managed Microsoft AD

AWS Directory Services
AD Connector
• AD Connector is a directory gateway to redirect directory requests to your
on-premises Microsoft Active Directory
• No caching capability
• Manage users solely on-premises, no possibility of setting up a trust
• VPN or Direct Connect
• Doesn’t work with SQL Server, doesn’t do seamless joining, can’t share directory

Diagram: Corporate Office (User) connected through VPN or Direct Connect to the AWS
Cloud Region, with a Custom Sign-in Page, AWS IAM, and an AD Connector in each of two
Availability Zones

1. User Credentials (over SSL)
2. (label not recoverable)
3. LDAP Authentication
4. STS (AssumeRole)
5. Temp. Credentials

AWS Directory Services
Simple AD
• Simple AD is an inexpensive Active Directory–compatible service with
the common directory features.
• Supports joining EC2 instances, manage users and groups
• Does not support MFA, RDS SQL server, AWS SSO
• Small: 500 users, large: 5000 users
• Powered by Samba 4, compatible with Microsoft AD
• Lower cost, low scale, basic AD compatibility, or LDAP compatibility
• No trust relationship

AWS Organizations
Root Organizational Unit (OU)

Management Account

OU (Dev) OU (Prod)

OU (HR) OU (Finance)
Member Accounts

AWS Organizations -
OrganizationAccountAccessRole
• IAM role which grants full administrator permissions in the Member account
to the Management account
• Used to perform admin tasks in the Member accounts (e.g., creating IAM users)
• Could be assumed by IAM users in the Management account
• Automatically added to all new Member accounts created with AWS Organizations
• Must be created manually if you invite an existing Member account

Diagram: within AWS Organizations, the Management Account creates the Member Account
and calls the AssumeRole API on the IAM Role OrganizationAccountAccessRole in the
Member Account

Multi Account Strategies
• Create accounts per department, per cost center, per dev / test / prod,
based on regulatory restrictions (using SCP), for better resource
isolation (ex: VPC), to have separate per-account service limits, or as an
isolated account for logging

• Multi Account vs. One Account Multi VPC


• Use tagging standards for billing purposes
• Enable CloudTrail on all accounts, send logs to central S3 account
• Send CloudWatch Logs to central logging account
• Strategy to create an account for security

Organizational Units (OU) - Examples
• Business Unit: Management Account with Sales OU (Sales Account 1, Sales Account 2),
Retail OU (Retail Account 1, Retail Account 2), Finance OU (Finance Account 1,
Finance Account 2)
• Environmental Lifecycle: Management Account with Prod OU (Prod Account 1, Prod
Account 2), Dev OU (Dev Account 1, Dev Account 2), Test OU (Test Account 1, Test
Account 2)
• Project-Based: Management Account with Project 1 OU (Project 1 Account 1, Project 1
Account 2), Project 2 OU (Project 2 Account 1, Project 2 Account 2), Project 3 OU
(Project 3 Account 1, Project 3 Account 2)

AWS Organization - Feature Modes
• Consolidated billing features:
• Consolidated Billing across all accounts - single payment method
• Pricing benefits from aggregated usage (volume discount for EC2, S3…)

• All Features (Default):


• Includes consolidated billing features, SCP
• Invited accounts must approve enabling all features
• Ability to apply an SCP to prevent member accounts from leaving the org
• Can’t switch back to Consolidated Billing Features only

AWS Organizations – Reserved Instances
• For billing purposes, the consolidated billing feature of AWS Organizations
treats all the accounts in the organization as one account.
• This means that all accounts in the organization can receive the hourly cost
benefit of Reserved Instances that are purchased by any other account.
• The payer account (Management account) of an organization can turn off
Reserved Instance (RI) discount and Savings Plans discount sharing for any
accounts in that organization, including the payer account
• This means that RIs and Savings Plans discounts aren't shared between any
accounts that have sharing turned off.
• To share an RI or Savings Plans discount with an account, both accounts must
have sharing turned on

Service Control Policies (SCP)
• Define allowlist or blocklist IAM actions
• Applied at the OU or Account level
• Does not apply to the Management Account
• SCP is applied to all the Users and Roles in the account, including Root user
• The SCP does not affect Service-linked roles
• Service-linked roles enable other AWS services to integrate with AWS Organizations
and can't be restricted by SCPs.
• SCP must have an explicit Allow (does not allow anything by default)
• Use cases:
• Restrict access to certain services (for example: can’t use EMR)
• Enforce PCI compliance by explicitly disabling services

SCP Hierarchy
Hierarchy (with the SCP attached at each level):
• Root OU: FullAWSAccess SCP
  • Management Account: DenyAccessAthena SCP
  • OU (Prod): DenyRedshift SCP
    • Account A: AuthorizedRedshift SCP
    • OU (HR): DenyAWSLambda SCP
      • Account B
    • OU (Finance)
      • Account C

Resulting permissions:
• Management Account
  • Can do anything (no SCP applies)
• Account A
  • Can do anything
  • EXCEPT access Redshift (explicit Deny from OU)
• Account B
  • Can do anything
  • EXCEPT access Redshift (explicit Deny from Prod OU)
  • EXCEPT access Lambda (explicit Deny from HR OU)
• Account C
  • Can do anything
  • EXCEPT access Redshift (explicit Deny from Prod OU)
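
The inheritance on this slide can be replayed with a small Python sketch. It models only the explicit Deny statements and assumes FullAWSAccess stays attached at every level (the AWS default), so an Allow such as AuthorizedRedshift on Account A changes nothing. The dictionaries and the `scp_allows` helper are illustrative, not an AWS API.

```python
from fnmatch import fnmatchcase

# Parent of each node in the slide's hierarchy (None marks the root).
PARENT = {
    "Root OU": None,
    "Management Account": "Root OU",
    "OU (Prod)": "Root OU",
    "Account A": "OU (Prod)",
    "OU (HR)": "OU (Prod)",
    "OU (Finance)": "OU (Prod)",
    "Account B": "OU (HR)",
    "Account C": "OU (Finance)",
}

# Explicit Deny patterns attached at each node.
DENIES = {
    "Management Account": ["athena:*"],
    "OU (Prod)": ["redshift:*"],
    "OU (HR)": ["lambda:*"],
}

def scp_allows(account, action):
    """Walk from the account up to the root; any explicit Deny on the path blocks the action."""
    if account == "Management Account":
        return True  # SCPs do not apply to the Management Account
    node = account
    while node is not None:
        if any(fnmatchcase(action.lower(), p) for p in DENIES.get(node, [])):
            return False
        node = PARENT[node]
    return True

print(scp_allows("Management Account", "athena:StartQueryExecution"))  # True
print(scp_allows("Account A", "redshift:DescribeClusters"))            # False
print(scp_allows("Account B", "lambda:InvokeFunction"))                # False
print(scp_allows("Account C", "lambda:InvokeFunction"))                # True
```

An SCP only sets the outer limit. Inside that limit, the users and roles of each account still need IAM policies that grant the action.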

SCP Examples
Blocklist and Allowlist strategies

More examples: https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_example-scps.html

IAM Policy Evaluation Logic

https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_evaluation-logic.html

Restricting Tags with IAM Policies
• You can restrict specific Tags on AWS resources
• Using the aws:TagKeys Condition Key
• Validate the Tag Keys attached to a resource against the Tag Keys in the
IAM Policy
• Example: allow IAM users to create EBS Volumes only if they have the “Env”
and “CostCenter” Tags
• Use either ForAllValues (every tag key in the request must be one of the
listed keys) or ForAnyValue (at least one tag key in the request must be one of
the listed keys)

Policy examples shown on the slide: Match All Keys, Match Any Keys
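
The two set operators are easiest to reason about as set comparisons between the tag keys in the request and the keys listed in the policy. A minimal Python sketch under that reading (the function names are made up, and case handling is ignored):

```python
def for_all_values(request_keys, policy_keys):
    """True when every tag key in the request appears in the policy list."""
    return set(request_keys) <= set(policy_keys)

def for_any_value(request_keys, policy_keys):
    """True when at least one tag key in the request appears in the policy list."""
    return bool(set(request_keys) & set(policy_keys))

policy_keys = ["Env", "CostCenter"]

print(for_all_values(["Env", "CostCenter"], policy_keys))  # True
print(for_all_values(["Env", "Owner"], policy_keys))       # False: Owner is not listed
print(for_any_value(["Env", "Owner"], policy_keys))        # True: Env is listed
print(for_all_values([], policy_keys))                     # True: an empty request passes
print(for_any_value([], policy_keys))                      # False
```

The last two lines show a common pitfall: ForAllValues is also satisfied by a request with no tags at all, so it is usually paired with a condition that requires the tags to be present.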


Using SCP to Restrict Creating Resources
without appropriate Tags
• Prevent IAM Users/Roles in the affected Member accounts from creating
resources if they don’t have specific Tags

• Example: restrict launching an EC2 instance if it doesn’t have the
“Project” and “CostCenter” Tags
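
One way to express this restriction is an SCP with one Deny statement per required tag, each using the Null operator on `aws:RequestTag/<key>`. The Python sketch below builds such a document; the helper name and Sid values are hypothetical, and the resource is limited to instances to keep the example short.

```python
def require_tags_scp(tag_keys):
    """Build an SCP with one Deny statement per required tag key."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": f"DenyRunInstancesWithout{key}",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # Null = "true" matches when the tag is missing from the request.
                "Condition": {"Null": {f"aws:RequestTag/{key}": "true"}},
            }
            for key in tag_keys
        ],
    }

scp = require_tags_scp(["Project", "CostCenter"])
for statement in scp["Statement"]:
    print(statement["Sid"], statement["Condition"])
```

Separate statements are used so that a request missing either tag is denied. Placing both keys in a single Condition block would AND them and only deny requests missing both.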

AWS Organizations – Tag Policies
• Helps you standardize tags across resources in an
AWS Organization
• Ensure consistent tags, audit tagged resources,
maintain proper resources categorization, …
• You define Tag keys and their allowed values
• Helps with AWS Cost Allocation Tags and
Attribute-based Access Control
• Prevent any non-compliant tagging operations on
specified services and resources
• Generate a report that lists all tagged/non-
compliant resources
• Use CloudWatch Events to monitor non-
compliant tags

AWS Organizations – AI Services Opt-out Policies
• Certain AWS AI services may use your content for continuous improvement of
Amazon AI/ML services
• Example: Amazon Lex, Amazon Comprehend, Amazon Polly, …
• You can opt out of having your content stored or used by AWS AI services
• Create an Opt-out Policy that enforces this setting across all Member
accounts and AWS Regions
• You can opt out all AI services or selected services
• Can be attached to Organization Root, specific OU, or individual Member account

Policy examples shown on the slide: All Services, ONLY Rekognition & Lex

AWS Organizations – Backup Policies
• AWS Backup enables you to create Backup Plans that define how to back up
your AWS resources
• JSON documents that define Backup Plans across an AWS Organization
• Gives you granular control over backing up your resources (e.g., backup
frequency, time window, backup region, …)
• Can be attached to Organization Root, specific OU, or individual Member account
• Immutable Backup Plans appear in Member accounts (view ONLY)

AWS Single Sign-On (SSO)
• Centrally manage SSO access to:
• Multiple AWS accounts
• Commonly used business applications (e.g.,
Salesforce, Box, Office 365, …)
• Custom SAML 2.0-based applications
• Integrated with AWS Organizations
• Identity source:
• SSO built-in: manage users & groups
• Active Directory through Directory Services
(AWS Managed Microsoft AD or AD Connector)
• External Identity Provider: any SAML 2.0 Identity
Provider (e.g., Azure AD, Okta Universal
Directory)
• Centralized permission management
• Centralized auditing with CloudTrail (e.g., user
sign-in activities)
https://aws.amazon.com/blogs/security/introducing-aws-single-sign-on/

SSO vs AssumeRoleWithSAML
• AWS Single Sign-On: the Client (Browser) signs in to AWS SSO, which has an
integration with the Identity Store (SAML 2.0)
• AssumeRoleWithSAML: the Client (Browser) signs in to the Portal/Identity Provider
(IdP), which authenticates against an LDAP-based Identity Store; the client then
uses SAML with the Security Token Service (STS)

AWS Single Sign-On (SSO) – With AD
Diagram: Users sign in to AWS SSO in the AWS Cloud (2-way Trust with the directory);
AWS SSO gives SSO Access to Business Apps (Office 365, …), Custom SAML 2.0-Based Apps,
and the accounts of the AWS Organization (OU (Prod), OU (Dev), …)

AWS SSO – Integration with MS AD
AWS Cloud
• AWS Managed Microsoft AD
  • Diagram: AWS Managed Microsoft AD <-> AWS SSO (both in the AWS Cloud)
• AWS Managed Microsoft AD with 2-way forest trust with on-premises AD
  • Diagram: Corporate Data Center <-> AWS Managed Microsoft AD (2-way trust)
  <-> AWS SSO (AWS Cloud)
• AD Connector to on-premises AD
  • Diagram: Corporate Data Center <- AD Connector (proxy) <-> AWS SSO (AWS Cloud)

AWS Control Tower
• Easy way to set up and govern a secure and compliant multi-account
AWS environment based on best practices
• Benefits:
• Automate the set up of your environment in a few clicks
• Automate ongoing policy management using guardrails
• Detect policy violations and remediate them
• Monitor compliance through an interactive dashboard
• AWS Control Tower runs on top of AWS Organizations:
• It automatically sets up AWS Organizations to organize accounts and implement
SCPs (Service Control Policies)

AWS Control Tower – Account Factory
• Automates account provisioning and deployments
• Enables you to create pre-approved baselines and configuration options for AWS accounts in your organization (e.g., VPC default configuration, subnets, region, …)
• Uses AWS Service Catalog to provision new AWS accounts
[Diagram: Corporate Datacenter connected to the AWS Cloud over VPN or Direct Connect, with a 2-way Trust. In the AWS Cloud, Control Tower – Landing Zone: Member Accounts (created through Account Factory) authenticate against the AWS SSO Directory, which uses AD Connector.]

AWS Control Tower – Detect and Remediate
Policy Violations
• Guardrail
• Provides ongoing governance for your Control Tower environment (AWS Accounts)
• Preventive – using SCPs (e.g., Disallow Creation of Access Keys for the Root User)
• Detective – using AWS Config (e.g., Detect Whether MFA for the Root User is Enabled)
• Example: identify non-compliant resources (e.g., untagged resources)
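As a minimal sketch of the detective example above (untagged resources), the check can be modeled as a pure function. The resource dictionaries and the `REQUIRED_TAGS` set are hypothetical; only the COMPLIANT / NON_COMPLIANT labels mirror AWS Config terminology.

```python
# Hypothetical required tag keys; a real rule would receive these as parameters.
REQUIRED_TAGS = {"Owner", "Environment"}

def evaluate_tags(resource, required=REQUIRED_TAGS):
    """Return a Config-style compliance result for one resource."""
    present = set(resource.get("tags", {}))
    missing = sorted(required - present)
    status = "COMPLIANT" if not missing else "NON_COMPLIANT"
    return {"resource_id": resource["id"], "status": status, "missing": missing}

# Invented sample inventory.
resources = [
    {"id": "i-0001", "tags": {"Owner": "team-a", "Environment": "prod"}},
    {"id": "i-0002", "tags": {"Owner": "team-b"}},
    {"id": "vol-0003"},
]
results = [evaluate_tags(r) for r in resources]
non_compliant = [r["resource_id"] for r in results if r["status"] == "NON_COMPLIANT"]
print(non_compliant)
```

A remediation step (the Lambda in the slide) would then add the `missing` tags to each non-compliant resource.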

[Diagram: in AWS Control Tower, a Guardrail (Detective) uses AWS Config to monitor un-tagged resources in Member Accounts. A NON_COMPLIANT result triggers SNS, which notifies the Admin and invokes a Lambda function that remediates (adds tags).]
AWS Resource Access Manager (RAM)
• Share AWS resources that you own with other AWS accounts
• Share with any account or within your Organization
• Avoid resource duplication!
• VPC Subnets
• Allows resources from several accounts to be launched in the same subnets
• Accounts must be in the same AWS Organization
• Cannot share security groups and default VPC
• Participants can manage their own resources in there
• Participants can't view, modify, delete resources that belong to other participants or the owner
• AWS Transit Gateway
• Route 53 (Resolver Rules, DNS Firewall Rule Groups)
• License Manager Configurations

AWS Resource Access Manager (RAM)
• Aurora DB Clusters
• ACM Private Certificate Authority
• CodeBuild Project
• EC2 (Dedicated Hosts, Capacity Reservation)
• AWS Glue (Catalog, Database, Table)
• AWS Network Firewall Policies
• AWS Resource Groups
• Systems Manager Incident Manager (Contacts, Response Plans)
• AWS Outposts (Outpost, Site)

Resource Access Manager – VPC example
• Each account…
• is responsible for its own resources
• cannot view, modify or delete other resources in other accounts
• Network is shared so…
• Anything deployed in the VPC can talk to other resources in the VPC
• Applications are accessed easily across accounts, using private IP!
• Security groups from other accounts can be referenced for maximum security
[Diagram: AWS Cloud – VPC Owner; a VPC with a private subnet shared with Account 1 and Account 2; EC2, ALB, EC2]

Summary of Identity & Federation
• Users and Accounts all in AWS
• AWS Organizations
• AWS Control Tower to set up a secure & compliant multi-account AWS environment (best practices)
• Federation with SAML
• Federation without SAML with a custom IdP (GetFederationToken)
• AWS Single Sign-On to connect to multiple AWS Accounts (Organization) and SAML apps
• Web Identity Federation (not recommended)
• Cognito for most web and mobile applications (has anonymous mode, MFA)
• AWS Directory Service:
• Managed Microsoft AD – standalone or set up a trust with on-premises AD, has MFA, seamless join, RDS integration
• AD Connector – proxy requests to on-premises
• Simple AD – standalone & cheap AD-compatible with no MFA, no advanced capabilities
• AWS RAM to share resources (example VPC subnets)

Security Section

AWS CloudTrail
• Provides governance, compliance and audit for your AWS Account
• CloudTrail is enabled by default!
• Get a history of events / API calls made within your AWS Account by:
• Console
• SDK
• CLI
• AWS Services
• Can put logs from CloudTrail into CloudWatch Logs or S3
• A trail can be applied to All Regions (default) or a single Region.
• If a resource is deleted in AWS, investigate CloudTrail first!

CloudTrail Diagram

[Diagram: activity from the SDK, CLI, Console, IAM Users & IAM Roles is recorded by CloudTrail; inspect & audit it in the CloudTrail Console, or send it to CloudWatch Logs or an S3 Bucket]

CloudTrail Events
• Management Events:
• Operations that are performed on resources in your AWS account
• Examples:
• Configuring security (IAM AttachRolePolicy)
• Configuring rules for routing data (Amazon EC2 CreateSubnet)
• Setting up logging (AWS CloudTrail CreateTrail)
• By default, trails are configured to log management events.
• Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)

• Data Events:
• By default, data events are not logged (because they are high-volume operations)
• Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject): can separate Read and Write Events
• AWS Lambda function execution activity (the Invoke API)

• CloudTrail Insights Events:
• See next slide

CloudTrail Insights
• Enable CloudTrail Insights to detect unusual activity in your account:
• Inaccurate resource provisioning
• Hitting service limits
• Bursts of AWS IAM actions
• Gaps in periodic maintenance activity
• CloudTrail Insights analyzes normal management events to create a baseline
• And then continuously analyzes write events to detect unusual patterns
• Anomalies appear in the CloudTrail console
• Event is sent to Amazon S3
• An EventBridge event is generated (for automation needs)
[Diagram: CloudTrail Insights performs continuous analysis of Management Events and generates Insights Events, which go to the CloudTrail Console, an S3 Bucket, and an EventBridge event]

CloudTrail Events Retention
• Events are stored for 90 days in CloudTrail
• To keep events beyond this period, log them to S3 and use Athena

[Diagram: Management Events, Data Events and Insights Events are kept in CloudTrail (90 days retention); log them to an S3 Bucket (long-term retention) and analyze them with Athena]

CloudTrail – Solution Architecture:
Delivery to S3
[Diagram: CloudTrail delivers log files to S3 every 5 minutes, encrypted with SSE-S3 (default) or SSE-KMS; a Lifecycle Policy moves them to Glacier; S3 Events notifications go to SQS, SNS, Lambda; CloudTrail delivery notifications go to SNS, then SQS]
S3 Enhancements:
• Enable Versioning
• MFA Delete Protection
• S3 Lifecycle Policy (S3 IA, Glacier…)
• S3 Object Lock
• SSE-S3 or SSE-KMS encryption
• Feature to perform CloudTrail Log File Integrity validation (SHA-256 for hashing, SHA-256 with RSA for signing)

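The log file integrity feature boils down to comparing a SHA-256 hash of the delivered log file with the hash recorded for it. The sketch below shows only that comparison. It assumes the expected hash is already known (in CloudTrail it comes from a signed digest file) and does not verify any signature; the sample log bytes are invented.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 of the given bytes."""
    return hashlib.sha256(data).hexdigest()

def log_is_unmodified(log_bytes: bytes, expected_hex: str) -> bool:
    # compare_digest avoids leaking where the two strings differ
    return hmac.compare_digest(sha256_hex(log_bytes), expected_hex.lower())

original = b'{"Records": [{"eventName": "CreateTrail"}]}'
recorded_hash = sha256_hex(original)          # stand-in for the digest entry
tampered = original.replace(b"CreateTrail", b"DeleteTrail")

print(log_is_unmodified(original, recorded_hash))
print(log_is_unmodified(tampered, recorded_hash))
```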
CloudTrail - Solution Architecture:
Multi Account, Multi Region Logging
[Diagram: CloudTrail in Account A and in Account B delivers to one S3 bucket (+ S3 Bucket Policy) in the Security Account, under the prefixes cloudtrail-bucket/account-A, cloudtrail-bucket/account-B, cloudtrail-bucket/account-C…]

Observations:
• The S3 bucket policy is necessary for cross-account delivery
• If Account A wants to access its CloudTrail files:
• Option 1: create a cross-account role and assume the role
• Option 2: edit the bucket policy
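A rough sketch of the bucket policy mentioned in the observations, built as a Python dictionary. The bucket name and account IDs are placeholders, and the `AWSLogs/<account-id>/` key layout is an assumption based on CloudTrail's usual default rather than the simplified prefixes in the diagram; check the current CloudTrail documentation before using it.

```python
import json

def cloudtrail_bucket_policy(bucket, account_ids):
    """Build a bucket policy letting CloudTrail write logs for several accounts."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AWSCloudTrailAclCheck",
                "Effect": "Allow",
                "Principal": {"Service": "cloudtrail.amazonaws.com"},
                "Action": "s3:GetBucketAcl",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "AWSCloudTrailWrite",
                "Effect": "Allow",
                "Principal": {"Service": "cloudtrail.amazonaws.com"},
                "Action": "s3:PutObject",
                # One object prefix per account that delivers into the bucket.
                "Resource": [
                    f"arn:aws:s3:::{bucket}/AWSLogs/{acct}/*" for acct in account_ids
                ],
                "Condition": {
                    "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
                },
            },
        ],
    }

policy = cloudtrail_bucket_policy("cloudtrail-bucket", ["111111111111", "222222222222"])
print(json.dumps(policy, indent=2))
```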

CloudTrail - Solution Architecture:
Alert for API calls
[Diagram: CloudTrail → stream → CW Logs → Metric Filters → CW Alarm → SNS]

• Metric filters on the logs can be used to detect a high volume of API calls
• Ex: Count occurrences of EC2 TerminateInstances API
• Ex: Count of API calls per user
• Ex: Detect high level of Denied API calls
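The three examples above amount to counting matching records in the CloudTrail log stream. Here is a minimal local sketch of that counting logic over invented, heavily trimmed sample records. Real metric filters are patterns evaluated by CloudWatch Logs, not Python, and the threshold is hypothetical.

```python
from collections import Counter

# Invented CloudTrail-style records (only the fields used below).
events = [
    {"eventName": "TerminateInstances",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/alice"}},
    {"eventName": "TerminateInstances",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/bob"}},
    {"eventName": "DescribeInstances", "errorCode": "AccessDenied",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/alice"}},
    {"eventName": "GetObject", "errorCode": "AccessDenied",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/alice"}},
]

# Ex 1: occurrences of one API call
terminate_count = sum(1 for e in events if e["eventName"] == "TerminateInstances")
# Ex 2: API calls per user (last ARN segment used as the user name)
calls_per_user = Counter(e["userIdentity"]["arn"].split("/")[-1] for e in events)
# Ex 3: denied API calls
denied_count = sum(1 for e in events if e.get("errorCode") == "AccessDenied")

THRESHOLD = 2  # hypothetical alarm threshold
alarm = denied_count >= THRESHOLD
print(terminate_count, dict(calls_per_user), denied_count, alarm)
```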

CloudTrail – Solution Architecture:
Organizational Trail
[Diagram: AWS Organizations (o-exampleorgid) with a Management Account (111111111111) and Member Accounts (222222222222, 333333333333, 444444444444, 555555555555) placed in OU (Prod) and OU (Dev). The Organizational Trail is created in the management account. The CloudTrail Trail (MyOrganizationTrail) delivers to the S3 Bucket (my-organization-bucket), one prefix per account:]
my-organization-bucket/Logs/o-exampleorgid/111111111111
my-organization-bucket/Logs/o-exampleorgid/222222222222
my-organization-bucket/Logs/o-exampleorgid/333333333333
my-organization-bucket/Logs/o-exampleorgid/444444444444
my-organization-bucket/Logs/o-exampleorgid/555555555555

CloudTrail: How to react to events the fastest?
Overall, CloudTrail may take up to 15 minutes to deliver events

• CloudWatch Events:
• Can be triggered for any API call in CloudTrail
• The fastest, most reactive way
• CloudTrail Delivery in CloudWatch Logs:
• Events are streamed
• Can perform a metric filter to analyze occurrences and detect anomalies
• CloudTrail Delivery in S3:
• Events are delivered every 5 minutes
• Can validate log file integrity, deliver cross-account, and keep logs for long-term storage

AWS KMS (Key Management Service)
• Anytime you hear “encryption” for an AWS service, it’s most likely KMS
• Easy way to control access to your data, AWS manages keys for us
• Fully integrated with IAM for authorization
• Seamlessly integrated into:
• Amazon EBS: encrypt volumes
• Amazon S3: Server-side encryption of objects
• Amazon Redshift: encryption of data
• Amazon RDS: encryption of data
• Amazon SSM: Parameter store
• Etc…
• But you can also use the CLI / SDK

KMS – KMS Key Types
• Symmetric (AES-256 keys)
• First offering of KMS, single encryption key that is used to Encrypt and Decrypt
• AWS services that are integrated with KMS use Symmetric KMS keys
• Necessary for envelope encryption
• You never get access to the KMS key unencrypted (must call KMS API to use)
• Asymmetric (RSA & ECC key pairs)
• Public (Encrypt) and Private Key (Decrypt) pair
• Used for Encrypt/Decrypt, or Sign/Verify operations
• The public key is downloadable, but you can’t access the Private Key unencrypted
• Use case: encryption outside of AWS by users who can’t call the KMS API

Types of KMS Keys
• Customer Managed Keys
• Create, manage and use, can enable or disable
• Possibility of rotation policy (new key generated every year, old key preserved)
• Can add a Key Policy (resource policy) & audit in CloudTrail
• Leverage for envelope encryption

• AWS Managed Keys


• Used by AWS service (aws/s3, aws/ebs, aws/redshift)
• Managed by AWS (automatically rotated every year; the interval was every 3 years before May 2022)
• View Key Policy & audit in CloudTrail

• AWS Owned Keys


• Created and managed by AWS, used by some AWS services to protect your resources
• Used in multiple AWS accounts, but they are not in your AWS account
• You can’t view, use, track, or audit

Types of KMS Keys

KMS Key | Customer Managed Key | AWS Managed Key | AWS Owned Key
Can view metadata? | Yes | Yes | No
Can manage? | Yes | No | No
Used only for my AWS account? | Yes | Yes | No
Automatic Rotation | Optional (every 1 year) | Required (every year; every 3 years before May 2022) | Varies

KMS Key Material Origin
• Identifies the source of the key material in the KMS key
• Can’t be changed after creation

• KMS (AWS_KMS) – default


• AWS KMS creates and manages the key material in its own key store

• External (EXTERNAL)
• You import the key material into the KMS key
• You’re responsible for securing and managing this key material outside of AWS

• Custom Key Store (AWS_CLOUDHSM)


• AWS KMS creates the key material in a custom key store (CloudHSM Cluster)

KMS Key Source – Custom Key Store (CloudHSM)
• Integrate KMS with CloudHSM cluster as a Custom Key Store
• Key materials are stored in a CloudHSM cluster that you own and manage
• The cryptographic operations are performed in the HSMs
• Use cases:
• You need direct control over the HSMs
• KMS keys need to be stored in dedicated HSMs
• HSMs must be validated at FIPS 140-2 Level 3 (KMS validated at FIPS 140-2 Level 2)
[Diagram: the User views & manages keys in AWS KMS; the Custom Key Store Connector links KMS to a CloudHSM Cluster in a VPC, with HSMs in AZ - A and AZ - B. At least 2 active HSMs.]
KMS Key Source - External
• Import your own key material into KMS key, Bring Your Own Key (BYOK)
• You’re responsible for key material’s security, availability, and durability outside of AWS
• Must be 256-bit Symmetric key (Asymmetric is NOT supported)
• Can’t be used with Custom Key Store (CloudHSM)
• Manually rotate your KMS key (Automatic Key Rotation is NOT supported)

Import steps (User ↔ AWS KMS):
1. Create KMS key (SYMMETRIC & EXTERNAL)
2. Download the Public Key & Import Token
3. Encrypt the Key Material (Key Material + Public Key = Encrypted Key Material)
4. Import the Encrypted Key Material & Import Token into the KMS key

KMS Multi-Region Keys
[Diagram: AWS KMS multi-Region keys; the primary key is kept in sync with its replicas]
us-east-1 – multi-Region Primary key: arn:aws:kms:us-east-1:111122223333:key/mrk-1234abcd12ab34cd56ef1234567890ab
us-west-2 – multi-Region Replica key: arn:aws:kms:us-west-2:111122223333:key/mrk-1234abcd12ab34cd56ef1234567890ab
eu-west-1 – multi-Region Replica key: arn:aws:kms:eu-west-1:111122223333:key/mrk-1234abcd12ab34cd56ef1234567890ab
ap-southeast-2 – multi-Region Replica key: arn:aws:kms:ap-southeast-2:111122223333:key/mrk-1234abcd12ab34cd56ef1234567890ab

KMS Multi-Region Keys
• A set of identical KMS keys in different AWS Regions that can be used
interchangeably (~ same KMS key in multiple Regions)
• Encrypt in one Region and decrypt in other Regions (no need to re-encrypt or make cross-Region API calls)
• Multi-Region keys have the same key ID, key material, automatic rotation, …
• KMS Multi-Region keys are NOT global (Primary + Replicas)
• Each Multi-Region key is managed independently
• Only one primary key at a time, can promote replicas into their own primary
• Use cases: Disaster Recovery, Global Data Management (e.g., DynamoDB
Global Tables), Active-Active Applications that span multiple Regions,
Distributed Signing applications, …
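Because related multi-Region keys share the same key ID (the `mrk-` prefix) and differ only by Region, the ARN of a replica can be derived from the primary's ARN by string manipulation. A small sketch using the sample ARN from these slides; the helper names are hypothetical, and nothing here checks that a replica actually exists in the target Region.

```python
def parse_kms_arn(arn):
    """Split a KMS key ARN into partition, region, account and key ID."""
    prefix, partition, service, region, account, resource = arn.split(":", 5)
    if (prefix, service) != ("arn", "kms") or not resource.startswith("key/"):
        raise ValueError(f"not a KMS key ARN: {arn}")
    return {"partition": partition, "region": region, "account": account,
            "key_id": resource[len("key/"):]}

def related_key_arn(arn, region):
    """ARN the same multi-Region key would have in another Region."""
    p = parse_kms_arn(arn)
    if not p["key_id"].startswith("mrk-"):
        raise ValueError("not a multi-Region key ID")
    return f"arn:{p['partition']}:kms:{region}:{p['account']}:key/{p['key_id']}"

primary = "arn:aws:kms:us-east-1:111122223333:key/mrk-1234abcd12ab34cd56ef1234567890ab"
print(related_key_arn(primary, "eu-west-1"))
```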

SSM Parameter Store
• Secure storage for configuration and secrets
• Optional Seamless Encryption using KMS
• Serverless, scalable, durable, easy SDK
• Version tracking of configurations / secrets
• Configuration management using path & IAM
• Notifications with CloudWatch Events
• Integration with CloudFormation
[Diagram: Applications read plaintext or encrypted configuration from SSM Parameter Store, which checks IAM permissions and uses AWS KMS as the decryption service]

SSM Parameter Store Hierarchy
• /my-department/
  • my-app/
    • dev/
      • db-url
      • db-password
    • prod/
      • db-url
      • db-password
  • other-app/
• /other-department/
• /aws/reference/secretsmanager/secret_ID_in_Secrets_Manager
• /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2
[Diagram: a Dev Lambda Function and a Prod Lambda Function read their own parameters with the GetParameters or GetParametersByPath API]

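To make the path-based lookup concrete, here is a sketch of how a GetParametersByPath-style query behaves over the hierarchy above. It uses an in-memory dictionary as a stand-in for the store, so the function below is a local imitation and not the AWS API; parameter values are invented.

```python
# In-memory stand-in for the store; names follow the hierarchy above.
PARAMETERS = {
    "/my-department/my-app/dev/db-url": "dev.example.internal",
    "/my-department/my-app/dev/db-password": "dev-secret",
    "/my-department/my-app/prod/db-url": "prod.example.internal",
    "/my-department/my-app/prod/db-password": "prod-secret",
}

def get_parameters_by_path(path, recursive=False, store=PARAMETERS):
    """Return parameters under `path`; direct children only unless recursive."""
    prefix = path.rstrip("/") + "/"
    found = {}
    for name, value in store.items():
        if not name.startswith(prefix):
            continue
        remainder = name[len(prefix):]
        if recursive or "/" not in remainder:
            found[name] = value
    return found

dev_config = get_parameters_by_path("/my-department/my-app/dev")
all_app = get_parameters_by_path("/my-department/my-app", recursive=True)
print(sorted(dev_config))
print(len(all_app))
```

Scoping each Lambda function's IAM permissions to its own path (dev or prod) is what keeps the two environments separated.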
Standard and advanced parameter tiers
Feature | Standard | Advanced
Total number of parameters allowed (per AWS account and Region) | 10,000 | 100,000
Maximum size of a parameter value | 4 KB | 8 KB
Parameter policies available | No | Yes
Cost | No additional charge | Charges apply
Storage Pricing | Free | $0.05 per advanced parameter per month
API Interaction Pricing, Standard Throughput | Free | $0.05 per 10,000 API interactions
API Interaction Pricing, Higher Throughput (up to 1000 Transactions per second) | $0.05 per 10,000 API interactions | $0.05 per 10,000 API interactions

Parameter Policies (for advanced parameters)
• Allow you to assign a TTL to a parameter (expiration date) to force updating or deleting sensitive data such as passwords
• Can assign multiple policies at a time

Expiration (to delete a parameter) ExpirationNotification (CW Events) NoChangeNotification (CW Events)
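The three policy types are expressed as small JSON documents. The sketch below builds one of each. The field names (`Type`, `Version`, `Attributes`, `Timestamp`, `Before`, `After`, `Unit`) follow the shape I recall from the Systems Manager documentation and should be treated as an assumption to verify; the dates and durations are invented.

```python
import json

def expiration(timestamp_iso):
    """Delete the parameter at the given time."""
    return {"Type": "Expiration", "Version": "1.0",
            "Attributes": {"Timestamp": timestamp_iso}}

def expiration_notification(before, unit="Days"):
    """Emit an event some time before the parameter expires."""
    return {"Type": "ExpirationNotification", "Version": "1.0",
            "Attributes": {"Before": str(before), "Unit": unit}}

def no_change_notification(after, unit="Days"):
    """Emit an event if the parameter has not changed for a while."""
    return {"Type": "NoChangeNotification", "Version": "1.0",
            "Attributes": {"After": str(after), "Unit": unit}}

# Several policies can be attached to one parameter at a time.
policies = [
    expiration("2030-01-01T00:00:00.000Z"),
    expiration_notification(15),
    no_change_notification(20),
]
policies_json = json.dumps(policies)
print(policies_json)
```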

AWS Secrets Manager
• Meant for storing secrets (e.g., passwords, API keys, …)
• Capability to force rotation of secrets every X days
• Automate generation of secrets on rotation (uses Lambda)
• Natively supports Amazon RDS (all supported DB engines), Redshift, DocumentDB
• Support other databases and services (custom Lambda function)
• Control access to secrets using Resource-based Policy
• Integration with other AWS services to natively pull secrets from Secrets Manager: CloudFormation, CodeBuild, ECS, EMR, Fargate, EKS, Parameter Store…
[Diagram: an ECS Task pulls the Secret (Database Password) from Secrets Manager at boot, ECS injects the secret as an environment variable, and the task uses it to access RDS]
Secrets Manager – with CloudFormation

[CloudFormation template screenshot with three callouts: secret is generated; reference secret in RDS DB instance; link the secret to RDS DB instance]

SSM Parameter Store vs Secrets Manager
• Secrets Manager ($$$):
• Automatic rotation of secrets with AWS Lambda
• Lambda function is provided for RDS, Redshift, DocumentDB
• KMS encryption is mandatory
• Can integrate with CloudFormation
• SSM Parameter Store ($):
• Simple API
• No secret rotation (can enable rotation using Lambda triggered by CW Events)
• KMS encryption is optional
• Can integrate with CloudFormation
• Can pull a Secrets Manager secret using the SSM Parameter Store API

SSM Parameter Store vs. Secrets Manager
Rotation
[Diagram, AWS Secrets Manager: every 30 days, Secrets Manager invokes a Lambda Function (can be provided) that changes the password in Amazon RDS]
[Diagram, SSM Parameter Store: every 30 days, CloudWatch Events invokes a Lambda Function that changes the password in Amazon RDS and changes the value in SSM Parameter Store]

RDS - Security
• KMS encryption at rest for underlying EBS volumes / snapshots
• Transparent Data Encryption (TDE) for Oracle and SQL Server
• SSL encryption to RDS is possible for all DB engines (in-flight)
• IAM authentication for MySQL and PostgreSQL
• Authorization still happens within RDS (not in IAM)
• Can copy an un-encrypted RDS snapshot into an encrypted one
• CloudTrail cannot be used to track queries made within RDS

SSL/TLS - Basics
• SSL refers to Secure Sockets Layer, used to encrypt connections
• TLS refers to Transport Layer Security, which is a newer version
• Nowadays, TLS certificates are mainly used, but people still refer to them as SSL

• Public SSL certificates are issued by Certificate Authorities (CA)


• Comodo, Symantec, GoDaddy, GlobalSign, Digicert, Letsencrypt, etc…

• SSL certificates have an expiration date (you set) and must be renewed

SSL Encryption – How it works
[Diagram: handshake between Client and Server]
1. Client sends hello, cipher suites & random
2. Server responds with server random & SSL certificate (Public Key)
3. Client verifies SSL certificate
4. Master key (symmetric) generated and sent encrypted using the Public Key
5. Server verifies Client SSL cert (optional)
6. Master key is decrypted using Private Key
7. Secure Symmetric Communication in Place
• Asymmetric Encryption is expensive (SSL)
• Symmetric encryption is cheaper
• Asymmetric handshake is used to exchange a per-client random symmetric key
• Possibility of client sending an SSL certificate as well (two-way certificate)

SSL – Server Name Indication (SNI)
• SNI solves the problem of loading multiple SSL certificates onto one web server (to serve multiple websites)
• It’s a “newer” protocol, and requires the client to indicate the hostname of the target server in the initial SSL handshake
• The server will then find the correct certificate, or return the default one
Note:
• Only works for ALB & NLB (newer generation), CloudFront
• Does not work for CLB (older gen)
[Diagram: the Client tells the ALB “I would like www.mycorp.com”; the ALB holds SSL certs for Domain1.example.com, www.mycorp.com, …, uses the correct SSL cert, and routes to the target group for www.mycorp.com or the target group for Domain1.example.com]

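The certificate choice described above can be sketched as a lookup keyed by the hostname the client sends in the handshake, with a default when nothing matches. This is only an illustration of the selection rule, not how a load balancer is implemented; the certificate labels and `DEFAULT_CERT` are hypothetical.

```python
# Hypothetical certificate labels keyed by lowercase hostname.
CERTS = {
    "www.mycorp.com": "cert-for-www.mycorp.com",
    "domain1.example.com": "cert-for-Domain1.example.com",
}
DEFAULT_CERT = "default-cert"

def select_certificate(sni_hostname, certs=CERTS, default=DEFAULT_CERT):
    """Pick the certificate matching the hostname sent in the handshake."""
    if not sni_hostname:  # client did not send SNI
        return default
    # Hostnames are case-insensitive; a trailing dot is also tolerated.
    return certs.get(sni_hostname.lower().rstrip("."), default)

print(select_certificate("www.mycorp.com"))
print(select_certificate("Domain1.example.com"))
print(select_certificate("unknown.example.org"))
```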
SSL – Man in the Middle Attacks
[Diagram, HTTP: User ↔ Pirate Server ↔ Good Server; the pirate server can intercept packets]
[Diagram, HTTPS: User ↔ Pirate Server ↔ Good Server; the pirate server sends a fake SSL cert to the User, then decrypts and re-encrypts packets. If infected, the user may trust the “pirate SSL certificate”]

SSL – Man in the Middle Attack
How to prevent
1. Don’t use public-facing HTTP, use HTTPS (meaning, use SSL/TLS
certificates)
2. Use a DNS that has DNSSEC
• To send a client to a pirate server, a DNS response needs to be “forged” by a
server which intercepts them
• It is possible to protect your domain name by configuring DNSSEC
• Amazon Route 53 supports DNSSEC for domain registration.
• Route 53 supports DNSSEC for DNS service as of December 2020 (using KMS)
• You could also run a custom DNS server on Amazon EC2 for example (Bind is
the most popular, dnsmasq, KnotDNS, PowerDNS).

AWS Certificate Manager (ACM)
• To host public SSL certificates in AWS, you can:
• Buy your own and upload them using the CLI
• Have ACM provision and renew public SSL certificates for you (free of cost)
• ACM loads SSL certificates on the following integrations:
• Load Balancers (including the ones created by EB)
• CloudFront distributions
• APIs on API Gateways
• SSL certificates are overall a pain to manage manually, so ACM is great to leverage in your AWS infrastructure!
[Diagram: HTTPS requests arrive from the public www at the load balancer, where SSL is terminated with a cert that ACM provisions and maintains; requests continue inside the private AWS network as HTTP. Less CPU cost in EC2 thanks to SSL termination on the ELB]
ACM – Good to know
• Possibility of creating public certificates
• Must verify public DNS
• Must be issued by a trusted public certificate authority (CA)
• Possibility of creating private certificates
• For your internal applications
• You create your own private CA
• Your applications must trust your private CA
• Certificate renewal:
• Automatically done if the certificate was provisioned by ACM
• Any manually uploaded certificates must be renewed manually and re-uploaded
• ACM is a regional service
• To use with a global application (multiple ALB for example), you need to issue an SSL certificate in each region where your application is deployed.
• You cannot copy certs across regions

CloudHSM
• KMS => AWS manages the software for encryption
• CloudHSM => AWS provisions encryption hardware
• Dedicated Hardware (HSM = Hardware Security Module)
• You manage your own encryption keys entirely (not AWS)
• HSM device is tamper resistant, FIPS 140-2 Level 3 compliance
• Supports both symmetric and asymmetric encryption (SSL/TLS keys)
• No free tier available
• Must use the CloudHSM Client Software
• Redshift supports CloudHSM for database encryption and key management
• Good option to use with SSE-C encryption

CloudHSM Diagram
[Diagram: the CloudHSM Client talks to AWS CloudHSM over an SSL Connection. AWS manages the Hardware; the User manages the Keys]
IAM permissions:
• CRUD an HSM Cluster
CloudHSM Software:
• Manage the Keys
• Manage the Users
CloudHSM – High Availability
• CloudHSM clusters are spread across Multi AZ (HA)
• Great for availability and durability
[Diagram: the CloudHSM Client connects to CloudHSM 1 in Availability Zone 1 and CloudHSM 2 in Availability Zone 2]

CloudHSM vs. KMS
Feature | AWS KMS | AWS CloudHSM
Tenancy | Multi-Tenant | Single-Tenant
Standard | FIPS 140-2 Level 2 | FIPS 140-2 Level 3
Master Keys | AWS Owned Keys; AWS Managed Keys; Customer Managed KMS Keys | Customer Managed CMK
Key Types | Symmetric; Asymmetric; Digital Signing | Symmetric; Asymmetric; Digital Signing & Hashing
Key Accessibility | Accessible in multiple AWS regions (can’t access keys outside the region it’s created in) | Deployed and managed in a VPC; can be shared across VPCs (VPC Peering)
Cryptographic Acceleration | None | SSL/TLS Acceleration; Oracle TDE Acceleration
Access & Authentication | AWS IAM | You create users and manage their permissions

CloudHSM vs. KMS
Feature | AWS KMS | AWS CloudHSM
High Availability | AWS Managed Service | Add multiple HSMs over different AZs
Audit Capability | CloudTrail; CloudWatch | CloudTrail; CloudWatch; MFA support
Free Tier | Yes | No

Solution Architecture:
SSL on ALB
[Diagram: clients → HTTPS → ALB with SSL cert from ACM → HTTP → instances in an Auto Scaling group]

Solution Architecture:
SSL on web server EC2 instances
[Diagram: clients → HTTPS → NLB (TCP) → HTTPS → EC2 instances in an Auto Scaling group. Each instance retrieves the SSL private key from SSM Parameter Store at EC2 boot time (user data), using IAM permissions, and installs the certs on EC2. Performing SSL encryption / decryption can use CPU resources]

Solution Architecture:
CloudHSM – SSL Offloading
• You can offload SSL to CloudHSM (SSL Acceleration)
• Supported by NGINX, Apache Web servers and IIS for Windows Server
• Extra security: the SSL private key never leaves the HSM device
• Must set up a cryptographic user (CU) on the CloudHSM device
[Diagram: clients → HTTPS → NLB (TCP) → HTTPS → web servers in an Auto Scaling group, which offload SSL to CloudHSM (multi-AZ)]

S3 Encryption for Objects
• There are 4 methods of encrypting objects in S3

• SSE-S3: encrypts S3 objects using keys handled & managed by AWS


• SSE-KMS: leverage AWS Key Management Service to manage encryption
keys
• SSE-C: when you want to manage your own encryption keys
• Client Side Encryption

• Glacier: all data is AES-256 encrypted, key under AWS control

Encryption in transit (SSL)
• AWS S3 exposes:
• HTTP endpoint: non encrypted
• HTTPS endpoint: encryption in flight

• You’re free to use the endpoint you want, but HTTPS is recommended
• HTTPS is mandatory for SSE-C
• Encryption in flight is also called SSL / TLS

Events in S3 Buckets
• S3 Access Logs:
• Detailed records for the requests that are made to a bucket
• Might take hours to deliver
• Might be incomplete (best effort)
• S3 Events Notifications:
• Receive notifications when certain events happen in your bucket
• E.g.: new objects created, object removal, restore objects, replication events
• Destinations: SNS, SQS queue, Lambda
• Typically delivered in seconds but can take minutes. Enable versioning to get a notification for every write; otherwise two simultaneous writes to the same object may produce only one notification
• Trusted Advisor:
• Check the bucket permission (is the bucket public?)
• CloudWatch Events:
• Need to enable CloudTrail object level logging on S3 first
• Target can be Lambda, SQS, SNS, etc…

S3 Security
• User based
• IAM policies - which API calls should be allowed for a specific user from IAM
console

• Resource Based
• Bucket Policies - bucket wide rules from the S3 console - allows cross account
• Object Access Control List (ACL) – finer grain
• Bucket Access Control List (ACL) – less common

S3 Bucket Policies
• Use S3 bucket policies to:
• Grant public access to the bucket
• Force objects to be encrypted at upload
• Grant access to another account (Cross Account)
• Optional Conditions on:
• Public IP or Elastic IP (not on Private IP)
• Source VPC or Source VPC Endpoint – only works with VPC Endpoints
• CloudFront Origin Access Identity
• MFA
• Examples here: https://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html

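As an illustration of "force objects to be encrypted at upload", here is a sketch of a bucket policy that denies `PutObject` requests that do not ask for SSE-KMS. The bucket name is a placeholder, and the policy is a common pattern rather than the only way to enforce encryption (default bucket encryption is another).

```python
import json

def require_sse_kms_policy(bucket):
    """Bucket policy that rejects PutObject calls not requesting SSE-KMS."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUploadsWithoutSseKms",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {
                # Matches when the encryption header is not "aws:kms".
                "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
            },
        }],
    }

encryption_policy = require_sse_kms_policy("my-example-bucket")
print(json.dumps(encryption_policy, indent=2))
```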
S3 pre-signed URLs
• Can generate pre-signed URLs using SDK or CLI
• For downloads (easy, can use the CLI)
• For uploads (harder, must use the SDK)
• Valid for a default of 3600 seconds, can change timeout with --expires-in
[TIME_BY_SECONDS] argument
• Users given a pre-signed URL inherit the permissions of the person who
generated the URL for GET / PUT

• Examples :
• Allow only logged-in users to download a premium video on your S3 bucket
• Allow an ever changing list of users to download files by generating URLs dynamically
• Temporarily allow a user to upload a file to a precise location in our bucket

VPC Endpoint Gateway for S3
[Diagram: inside a VPC, a Public Instance reaches the S3 Bucket through the Internet Gateway and the public www; the bucket policy filters by AWS:SourceIP (public IP). A Private Instance reaches the S3 Bucket privately through a VPC Endpoint Gateway; the bucket policy filters by AWS:SourceVpce (one or few endpoints) OR AWS:SourceVpc (encompasses all possible VPC endpoints)]

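A sketch of the private-access case: a bucket policy that denies every request not arriving through one of the listed VPC endpoints, using the `aws:SourceVpce` condition key. The bucket name and endpoint ID are placeholders. Note that a blanket Deny like this also blocks console access from outside the VPC, so it is easy to lock yourself out.

```python
import json

def restrict_to_vpc_endpoints(bucket, vpce_ids):
    """Deny all S3 actions unless the request came through one of `vpce_ids`."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnlessFromVpce",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"StringNotEquals": {"aws:SourceVpce": list(vpce_ids)}},
        }],
    }

vpce_policy = restrict_to_vpc_endpoints("my-example-bucket", ["vpce-1a2b3c4d"])
print(json.dumps(vpce_policy, indent=2))
```

Swapping the condition key for `aws:SourceVpc` (with a VPC ID) would cover every endpoint in that VPC instead of a specific list.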
S3 Object Lock & Glacier Vault Lock
• S3 Object Lock
• Adopt a WORM (Write Once Read Many) model
• Block an object version deletion for a specified amount of time
• Glacier Vault Lock
• Adopt a WORM (Write Once Read Many) model
• Lock the policy for future edits (can no longer be changed)
• Helpful for compliance and data retention
[Diagram: with a locked Vault Lock Policy, the object can’t be deleted]

S3 – Access Points
• Each Access Point gets its own DNS and policy to limit who can access it
• A specific IAM user / group
• One policy per Access Point => Easier to manage than complex bucket policies
• Can restrict to traffic from a specific VPC
• Access points are linked to a specific bucket (unique name per acct/region)
[Diagram: three access points in front of one bucket that keeps a simple bucket policy]
• Finance AP: policy to grant r/w access to the Finance Users / Group on the /finance prefix (Finance Data: “/finance/....”)
• Sales AP: policy to grant r/w access to the Sales Users / Group on the /sales prefix (Sales Data: “/sales/....”)
• Analytics AP: policy to grant read access to the Analytics Users / Group on the whole bucket

S3 – Access Points with Shared Bucket
Diagram: EC2 instances in VPC A, VPC B, VPC C and VPC D -> VPC Gateway Endpoint in each VPC (each with its own Endpoint Policy) -> VPC Access Point (with an Access Point Policy) -> S3 Bucket in the Central VPC

Network Security
• Security Groups
• Attached to ENI (Elastic Network Interfaces) – EC2, RDS, Lambda in VPC, etc
• Are stateful (any traffic in is allowed to go out, any traffic out can go back in)
• Can reference by CIDR and security group id
• Supports security group references for VPC peering
• Default: inbound denied, outbound all allowed
• NACL (Network ACL):
• Attached at the subnet level
• Are stateless (inbound and outbound rules apply for all traffic)
• Can only reference a CIDR range (no hostname)
• Default NACL: allows all inbound, allows all outbound
• New NACL: denies all inbound, denies all outbound
• Host Firewall
• Software based, highly customizable
Diagram: inside the VPC, traffic to a public subnet crosses the NACL, then the security group, then the host firewall

What’s a DDOS* Attack?
*Distributed Denial-of-Service
Diagram: an attacker controls masters, which command bots to flood the application server; the server becomes not responsive and not accessible to normal users
Type of Attacks on your infrastructure
• Distributed Denial of Service (DDoS):
• When your service is unavailable because it’s receiving too many requests
• SYN Flood (Layer 4): send too many TCP connection requests
• UDP Reflection (Layer 4): get other servers to send many big UDP requests
• DNS flood attack: overwhelm the DNS so legitimate users can’t find the site
• Slow Loris attack: a lot of HTTP connections are opened and maintained

• Application level attacks:
• more complex, more specific (HTTP level)
• Cache busting strategies: overload the backend database by invalidating cache

DDoS Protection on AWS
• AWS Shield Standard: protects against DDoS attacks for your website and
applications, for all customers at no additional cost
• AWS Shield Advanced: 24/7 premium DDoS protection
• AWS WAF: Filter specific requests based on rules
• CloudFront and Route 53:
• Availability protection using global edge network
• Combined with AWS Shield, provides DDoS attack mitigation at the edge
• Be ready to scale – leverage AWS Auto Scaling
• Separate static resources (S3 / CloudFront) from dynamic ones (EC2 / ALB)
• Read the whitepaper for details:
https://d1.awsstatic.com/whitepapers/Security/DDoS_White_Paper.pdf

Sample Reference Architecture

https://aws.amazon.com/answers/networking/aws-ddos-attack-mitigation/

AWS Shield
• AWS Shield Standard:
• Free service that is activated for every AWS customer
• Provides protection from attacks such as SYN/UDP Floods, Reflection attacks
and other layer 3/layer 4 attacks
• AWS Shield Advanced:
• Optional DDoS mitigation service ($3,000 per month per organization)
• Protect against more sophisticated attacks on Amazon EC2, Elastic Load
Balancing (ELB), Amazon CloudFront, AWS Global Accelerator, Route 53
• 24/7 access to AWS DDoS response team (DRT)
• Protect against higher fees during usage spikes due to DDoS

AWS WAF – Web Application Firewall
• Protects your web applications from common web exploits (Layer 7)
• Deploy on Application Load Balancer (localized rules)
• Deploy on API Gateway (rules running at the regional or edge level)
• Deploy on CloudFront (rules globally on edge locations)
• Used to front other solutions (CLB, EC2 instances, custom origins, S3 websites)
• Deploy on AppSync (protect your GraphQL APIs)
• WAF is not for DDoS protection
• Define Web ACL (Web Access Control List):
• Rules can include IP addresses, HTTP headers, HTTP body, or URI strings
• Protects from common attacks - SQL injection and Cross-Site Scripting (XSS)
• Size constraints, Geo match
• Rate-based rules (to count occurrences of events)
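To make the idea of a rate-based rule concrete, here is a conceptual sketch of counting requests per source IP over a sliding window. It is an illustration only, not how AWS WAF is implemented; the class name, the limit and the window length are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


class RateCounter:
    """Counts requests per source IP over a sliding time window (conceptual sketch)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str, now: float) -> bool:
        """Record one request and return False once the IP exceeds the limit."""
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        # Every request is counted, including the ones that end up blocked.
        recent.append(now)
        return len(recent) <= self.limit


counter = RateCounter(limit=3, window_seconds=300)
decisions = [counter.allow("198.51.100.7", t) for t in (0, 1, 2, 3)]
print(decisions)
```

Each IP has its own counter, so one noisy client does not affect the others.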

AWS WAF – Managed Rules
• Library of over 190 managed rules
• Ready-to-use rules that are managed by AWS and AWS Marketplace Sellers

• Baseline Rule Groups – general protection from common threats


• AWSManagedRulesCommonRuleSet, AWSManagedRulesAdminProtectionRuleSet, …
• Use-case Specific Rule Groups – protection for many AWS WAF use cases
• AWSManagedRulesSQLiRuleSet, AWSManagedRulesWindowsRuleSet,
AWSManagedRulesPHPRuleSet, AWSManagedRulesWordPressRuleSet, …
• IP Reputation Rule Groups – block requests based on source (e.g., malicious IPs)
• AWSManagedRulesAmazonIpReputationList, AWSManagedRulesAnonymousIpList
• Bot Control Managed Rule Group – block and manage requests from bots
• AWSManagedRulesBotControlRuleSet

WAF - Web ACL – Logging
• You can send your logs to:
• Amazon CloudWatch Logs log group – 5 MB per second
• Amazon Simple Storage Service (Amazon S3) bucket – 5 minutes interval
• Amazon Kinesis Data Firehose – limited by Firehose quotas
Diagram: AWS WAF -> CloudWatch Logs, S3 Bucket, or Kinesis Data Firehose -> Kinesis Firehose destinations (Amazon S3, Amazon Redshift, Amazon OpenSearch, …)
AWS Firewall Manager
• Manage rules in all accounts of an AWS Organization

• Common set of security rules


• WAF rules (Application Load Balancer, API Gateways, CloudFront)
• AWS Shield Advanced (ALB, CLB, Elastic IP, CloudFront)
• Security Groups for EC2 and ENI resources in VPC

Blocking an IP address

Diagram: Client (public IP) -> NACL -> Security group -> EC2 Instance in the VPC (+ optional firewall software in EC2)

Blocking an IP address – with an ALB

Diagram: Client -> NACL -> Application Load Balancer (ALB security group, connection termination) -> EC2 Instance (EC2 security group, private IP) in the VPC

Blocking an IP address – with an NLB

Diagram: Client -> NACL -> Network Load Balancer (passthrough: traffic goes through, no security group, sees client’s IP) -> EC2 Instance (EC2 security group, private IP, sees client’s IP) in the VPC

Blocking an IP address – ALB + WAF

Diagram: Client -> NACL -> ALB (ALB security group, WAF attached for IP address filtering) -> EC2 Instance (EC2 security group, private IP) in the VPC

Blocking an IP address – ALB, CloudFront WAF

Diagram: Client -> CloudFront (Geo Restriction, WAF attached for IP address filtering) -> NACL (not helpful: it only sees CloudFront public IPs) -> Public ALB (ALB security group) -> EC2 Instance (EC2 security group, private IP) in the VPC

Amazon Inspector
• Automated Security Assessments
• For EC2 instances
• Leveraging the AWS Systems Manager (SSM) agent
• Analyze against unintended network accessibility
• Analyze the running OS against known vulnerabilities
• For container images pushed to Amazon ECR
• Assessment of containers as they are pushed
• Reporting & integration with AWS Security Hub
• Send findings to Amazon EventBridge
Diagram: SSM Agent -> Inspector Service -> assessment run state & findings -> Security Hub, EventBridge

What does AWS Inspector evaluate?
• Remember: only for EC2 instances and container infrastructure

• Continuous scanning of the infrastructure, only when needed

• Package vulnerabilities (EC2 & ECR) – database of CVE


• Network reachability (EC2)

• A risk score is associated with all vulnerabilities for prioritization

AWS Config
• Helps with auditing and recording compliance of your AWS resources
• Helps record configurations and changes over time
• AWS Config Rules do not prevent actions from happening (no deny)
• Questions that can be solved by AWS Config:
• Is there unrestricted SSH access to my security groups?
• Do my buckets have any public access?
• How has my ALB configuration changed over time?
• You can receive alerts (SNS notifications) for any changes
• AWS Config is a per-region service
• Can be aggregated across regions and accounts

AWS Config Resource
• View compliance of a resource over time

• View configuration of a resource over time

• View CloudTrail API calls if enabled

AWS Config Rules
• Can use AWS managed config rules (over 75)
• Can make custom config rules (must be defined in AWS Lambda)
• Evaluate if each EBS disk is of type gp2
• Evaluate if each EC2 instance is t2.micro
• Rules can be evaluated / triggered:
• For each config change
• And / or: at regular time intervals
• Can trigger CloudWatch Events if the rule is non-compliant (and chain with Lambda)
• Rules can have auto remediations:
• If a resource is not compliant, you can trigger an auto remediation
• Define the remediation through SSM Automations
• Ex: remediate security group rules, stop instances with non-approved tags
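As an illustration of the "each EBS disk is of type gp2" example, the core of a custom rule could be a small pure function over the configuration item. This is a sketch under assumptions: the function name is made up, and the surrounding Lambda handler (which would parse the invoking event and report results back to AWS Config) is deliberately left out.

```python
def evaluate_volume(configuration_item: dict, desired_type: str = "gp2") -> str:
    """Return an AWS Config compliance type for one configuration item.

    Only EBS volumes are evaluated; any other resource type is not applicable.
    """
    if configuration_item.get("resourceType") != "AWS::EC2::Volume":
        return "NOT_APPLICABLE"
    volume_type = configuration_item.get("configuration", {}).get("volumeType")
    return "COMPLIANT" if volume_type == desired_type else "NON_COMPLIANT"


# Hypothetical configuration items, trimmed to the fields the function reads.
gp2_volume = {"resourceType": "AWS::EC2::Volume", "configuration": {"volumeType": "gp2"}}
io1_volume = {"resourceType": "AWS::EC2::Volume", "configuration": {"volumeType": "io1"}}
instance = {"resourceType": "AWS::EC2::Instance", "configuration": {"instanceType": "t2.micro"}}

for item in (gp2_volume, io1_volume, instance):
    print(item["resourceType"], evaluate_volume(item))
```

Keeping the decision in a pure function makes the rule easy to unit test before wiring it into Lambda.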

AWS Managed Logs
• Load Balancer Access Logs (ALB, NLB, CLB) => to S3
• Access logs for your Load Balancers
• CloudTrail Logs => to S3 and CloudWatch Logs
• Logs for API calls made within your account
• VPC Flow Logs => to S3 and CloudWatch Logs
• Information about IP traffic going to and from network interfaces in your VPC
• Route 53 Access Logs => to CloudWatch Logs
• Log information about the queries that Route 53 receives
• S3 Access Logs => to S3
• Server access logging provides detailed records for the requests that are made to a bucket
• CloudFront Access Logs => to S3
• Detailed information about every user request that CloudFront receives
• AWS Config => to S3

GuardDuty
• Intelligent Threat discovery to Protect AWS Account
• Uses Machine Learning algorithms, anomaly detection, 3rd party data
• One click to enable (30 days trial), no need to install software

• Input data includes:


• CloudTrail Logs: unusual API calls, unauthorized deployments
• VPC Flow Logs: unusual internal traffic, unusual IP address
• DNS Logs: compromised EC2 instances sending encoded data within DNS queries

• Can set up CloudWatch Event rules to be notified in case of findings


• CloudWatch Events rules can target AWS Lambda or SNS

GuardDuty

Diagram: VPC Flow Logs, CloudTrail Logs, DNS Logs (AWS DNS) -> GuardDuty -> CloudWatch Event -> SNS or Lambda

Compute and Load Balancing
Section

Solution Architecture on AWS
• DNS Layer: Route 53
• CDN Layer: CloudFront
• Web Layer: CLB, ALB, NLB, API Gateway, Elastic IP
• Compute Layer: EC2, ASG, Lambda, ECS, Fargate, Batch, EMR
• Caching / Session Layer: ElastiCache, DAX, DynamoDB, RDS
• Database Layer: RDS, Aurora, DynamoDB, ElasticSearch, S3, Redshift
• Decoupling Orchestration Layer: SQS, SNS, Kinesis, Amazon MQ, Step Functions
• Storage Layer: EBS, EFS, Instance Store
• Static Assets Layer (storage): S3, Glacier

EC2 Instance Types – Main ones
• R: applications that need a lot of RAM – in-memory caches
• C: applications that need good CPU – compute / databases
• M: applications that are balanced (think “medium”) – general / web app
• I: applications that need good local I/O (instance storage) – databases
• G: applications that need a GPU – video rendering / machine learning

• T2 / T3: burstable instances (up to a capacity)
• T2 / T3 - unlimited: unlimited burst

• Real-world tip: use https://www.ec2instances.info

EC2 - Placement Groups
• Control the EC2 Instance placement strategy using placement groups
• Group Strategies:
• Cluster—clusters instances into a low-latency group in a single Availability Zone
• Spread—spreads instances across underlying hardware (max 7 instances per group per
AZ) – critical applications
• Partition—spreads instances across many different partitions (which rely on different sets
of racks) within an AZ. Scales to 100s of EC2 instances per group (Hadoop, Cassandra,
Kafka)
• You can move an instance into or out of a placement group
• You first need to stop it
• You then need to use the CLI (modify-instance-placement)
• You can then start your instance
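The stop / modify / start sequence can be sketched as the list of AWS CLI commands you would run. The helper below only builds the commands (it does not execute anything); the instance ID and group name are hypothetical, and the explicit wait step is an added assumption so that the placement change is only attempted once the instance is fully stopped.

```python
def placement_move_commands(instance_id: str, group_name: str) -> list[list[str]]:
    """Build the AWS CLI commands to move one instance into a placement group."""
    return [
        ["aws", "ec2", "stop-instances", "--instance-ids", instance_id],
        # Assumption: wait for the stopped state before changing placement.
        ["aws", "ec2", "wait", "instance-stopped", "--instance-ids", instance_id],
        ["aws", "ec2", "modify-instance-placement",
         "--instance-id", instance_id, "--group-name", group_name],
        ["aws", "ec2", "start-instances", "--instance-ids", instance_id],
    ]


commands = placement_move_commands("i-0123456789abcdef0", "my-cluster-group")
for command in commands:
    print(" ".join(command))
```

Each inner list could be passed to `subprocess.run` on a machine with the AWS CLI configured.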

Placement Groups
Cluster
Diagram: EC2 instances in a cluster placement group – same rack, same AZ, low latency, 10 Gbps network

• Pros: Great network (10 Gbps bandwidth between instances with Enhanced
Networking enabled - recommended)
• Cons: If the rack fails, all instances fail at the same time
• Use case:
• Big Data job that needs to complete fast
• Application that needs extremely low latency and high network throughput

Placement Groups
Spread
• Pros:
• Can span across Availability Zones (AZ)
• Reduced risk of simultaneous failure
• EC2 Instances are on different physical hardware
• Cons:
• Limited to 7 instances per AZ per placement group
• Use case:
• Application that needs to maximize high availability
• Critical Applications where each instance must be isolated from failure from each other
Diagram: six EC2 instances, each on its own hardware (Hardware 1 to Hardware 6), spread across us-east-1a, us-east-1b and us-east-1c

Placement Groups
Partition
• Up to 7 partitions per AZ
• Up to 100s of EC2 instances
• The instances in a partition do not share racks with the instances in the other partitions
• A partition failure can affect many EC2 but won’t affect other partitions
• EC2 instances get access to the partition information as metadata
• Use cases: HDFS, HBase, Cassandra, Kafka
Diagram: EC2 instances in us-east-1a grouped into Partition 1, Partition 2 and Partition 3

EC2 Instance Launch Types
• On Demand Instances: short workload, predictable pricing, reliable
• Spot Instances: short workloads, for cheap, can lose instances (not reliable)
• Reserved: (MINIMUM 1 year)
• Reserved Instances: long workloads
• Convertible Reserved Instances: long workloads with flexible instances
• Dedicated Instances: no other customers will share your hardware
• Dedicated Hosts: book an entire physical server, control instance placement
• Great for software licenses that operate at the core, or CPU socket level
• Can define host affinity so that instance reboots are kept on the same host

EC2 included metrics
• CPU: CPU Utilization + Credit Usage / Balance
• Network: Network In / Out
• Status Check:
• Instance status = check the EC2 VM
• System status = check the underlying hardware
• Disk: Read / Write for Ops / Bytes (only for instance store)

• RAM is NOT included in the AWS EC2 metrics

EC2 Instance Recovery
• Status Check:
• Instance status = check the EC2 VM
• System status = check the underlying hardware

Diagram: a CloudWatch Alarm monitors the EC2 Instance on StatusCheckFailed_System, triggers EC2 Instance Recovery, and alerts an SNS Topic

• Recovery: Same Private, Public, Elastic IP, metadata, placement group

High Performance Computing (HPC)
• The cloud is the perfect place to perform HPC
• You can create a very high number of resources in no time
• You can speed up time to results by adding more resources
• You can pay only for the systems you have used

• Perform genomics, computational chemistry, financial risk modeling,
weather prediction, machine learning, deep learning, autonomous driving

• Which services help perform HPC?

Data Management & Transfer

• AWS Direct Connect:


• Move GB/s of data to the cloud, over a private secure network

• Snowball & Snowmobile


• Move PB of data to the cloud

• AWS DataSync
• Move large amounts of data between on-premises and S3, EFS, FSx for Windows

Compute and Networking
• EC2 Instances:
• CPU optimized, GPU optimized
• Spot Instances / Spot Fleets for cost savings + Auto Scaling

• EC2 Placement Groups: Cluster for good network performance

Diagram: EC2 instances in a cluster placement group – same rack, same AZ, low latency, 10 Gbps network

Compute and Networking
• EC2 Enhanced Networking (SR-IOV)
• Higher bandwidth, higher PPS (packets per second), lower latency
• Option 1: Elastic Network Adapter (ENA) up to 100 Gbps
• Option 2: Intel 82599 VF up to 10 Gbps – LEGACY

• Elastic Fabric Adapter (EFA)


• Improved ENA for HPC, only works for Linux
• Great for inter-node communications, tightly coupled workloads
• Leverages Message Passing Interface (MPI) standard
• Bypasses the underlying Linux OS to provide low-latency, reliable transport

Storage
• Instance-attached storage:
• EBS: scale up to 256,000 IOPS with io2 Block Express
• Instance Store: scale to millions of IOPS, linked to EC2 instance, low latency

• Network storage:
• Amazon S3: large blob, not a file system
• Amazon EFS: scale IOPS based on total size, or use provisioned IOPS
• Amazon FSx for Lustre:
• HPC optimized distributed file system, millions of IOPS
• Backed by S3

Automation and Orchestration
• AWS Batch
• AWS Batch supports multi-node parallel jobs, which enables you to run single
jobs that span multiple EC2 instances.
• Easily schedule jobs and launch EC2 instances accordingly

• AWS ParallelCluster
• Open source cluster management tool to deploy HPC on AWS
• Configure with text files
• Automate creation of VPC, Subnet, cluster type and instance types

Auto Scaling Groups – Dynamic Scaling Policies
• Target Tracking Scaling
• Most simple and easy to set-up
• Example: I want the average ASG CPU to stay at around 40%
• Simple / Step Scaling
• When a CloudWatch alarm is triggered (example CPU > 70%), then add 2 units
• When a CloudWatch alarm is triggered (example CPU < 30%), then remove 1
• Scheduled Actions
• Anticipate a scaling based on known usage patterns
• Example: increase the min capacity to 10 at 5 pm on Fridays
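For intuition on target tracking, the proportional idea behind "keep average CPU around 40%" can be sketched as simple arithmetic. This is an approximation for illustration, not the exact algorithm the service runs (which also accounts for warm-up, cooldowns and conservative scale-in); the function name and the min/max bounds are assumptions.

```python
import math


def target_tracking_capacity(current_capacity: int, metric_value: float,
                             target_value: float, min_size: int, max_size: int) -> int:
    """Estimate the capacity needed to bring the average metric back to the target.

    Assumes the load is spread evenly, so the metric scales inversely with capacity.
    """
    needed = math.ceil(current_capacity * metric_value / target_value)
    # Clamp to the group's configured bounds.
    return max(min_size, min(max_size, needed))


# 4 instances at 60% average CPU with a 40% target -> 6 instances
scale_out = target_tracking_capacity(4, 60.0, 40.0, min_size=2, max_size=20)
# 10 instances at 20% average CPU with a 40% target -> 5 instances
scale_in = target_tracking_capacity(10, 20.0, 40.0, min_size=2, max_size=20)
print(scale_out, scale_in)
```

Simple / step scaling differs in that you choose the adjustment yourself ("add 2 units") instead of letting the policy derive it from the target.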

Auto Scaling Groups – Predictive Scaling
• Predictive scaling: continuously forecast load and schedule scaling ahead

Good metrics to scale on
• CPUUtilization: Average CPU utilization across your instances
• RequestCountPerTarget: to make sure the number of requests per EC2 instance is stable
• Average Network In / Out (if your application is network bound)
• Any custom metric (that you push using CloudWatch)
Diagram: Users -> Application Load Balancer -> Auto Scaling group, scaling on RequestCountPerTarget with Target Value: 3

Auto Scaling – Good to know
• Spot Fleet support (mix of Spot and On-Demand instances)

• Lifecycle Hooks:
• Perform actions before an instance is in service, or before it is terminated
• Examples: cleanup, log extraction, special health checks

• To upgrade an AMI, must update the launch configuration / template


• Then terminate instances manually (CloudFormation can help)
• Or use EC2 Instance Refresh for Auto Scaling

Auto Scaling – Instance Refresh
• Goal: update the launch template and then re-create all EC2 instances
• For this we can use the native feature of Instance Refresh
• Setting of minimum healthy percentage
• Specify warm-up time (how long until the instance is ready to use)
Diagram: the User creates a New Launch Template (Updated AMI) and calls StartInstanceRefresh (Min. Healthy Percentage: 60 %); in the Auto Scaling Group, instances from the Old Launch Template are replaced by instances from the New Launch Template

Auto Scaling – Scaling Processes
• Launch: Add a new EC2 to the group, increasing the capacity
• Terminate: Removes an EC2 instance from the group, decreasing its capacity.
• HealthCheck: Checks the health of the instances
• ReplaceUnhealthy: Terminate unhealthy instances and re-create them
• AZRebalance: Balance the number of EC2 instances across AZ
• AlarmNotification: Accept notification from CloudWatch
• ScheduledActions: Performs scheduled actions that you create.
• AddToLoadBalancer: Adds instances to the load balancer or target group
• InstanceRefresh: Perform an instance refresh

• We can suspend these processes!

Auto Scaling – Health Checks
• Health checks available:
• EC2 Status Checks
• ELB Health Checks (HTTP)
• Custom Health Checks – send instance’s health to an ASG using AWS CLI or AWS SDK (set-instance-health)
• ASG will launch a new instance after terminating an unhealthy one
• Make sure the health check is simple and checks the correct thing
Diagram:
• GOOD HEALTH CHECK: ASG (Target Group) checks /health-server on the EC2 Instance
• BAD HEALTH CHECK: ASG (Target Group) checks /number-customers on the EC2 Instance, which makes a DB call to an RDS DB Instance
Auto Scaling – Updating an application
Diagram: Client -> ALB -> Auto Scaling Group (EC2 Instances created from a Launch Template)

Auto Scaling – Solution Architecture

Diagram:
• Same target group: ALB -> one target group -> one Auto Scaling Group containing EC2 Instances from Launch Template v1 and EC2 Instances from Launch Template v2
• Split traffic between TG: ALB -> Target group 1 -> Auto Scaling Group 1 (EC2 Instances, Launch Template v1) and Target group 2 -> Auto Scaling Group 2 (EC2 Instances, Launch Template v2)

Auto Scaling – Solution Architecture
Diagram:
• Client -> DNS Query -> Route 53 (CNAME weighted record, client based LB) -> ALB 1 -> Auto Scaling Group 1 (EC2 Instances, Launch Template v1) or ALB 2 -> Auto Scaling Group 2 (EC2 Instances, Launch Template v2)
• Test Client -> ALB 2 directly (separate manual testing, load testing)

EC2 Spot Instances
• Can get a discount of up to 90% compared to On-Demand
• Define max spot price and get the instance while current spot price < max
• The hourly spot price varies based on offer and capacity
• If the current spot price > your max price you can choose to stop or terminate your
instance with a 2 minutes grace period.

• Used for batch jobs, data analysis, or workloads that are resilient to failures.
• Not great for critical jobs or databases

EC2 Spot Instances

User-defined max price

https://console.aws.amazon.com/ec2sp/v1/spot/home?region=us-east-1#
Spot Fleets
• Spot Fleets = set of Spot Instances + (optional) On-Demand Instances
• The Spot Fleet will try to meet the target capacity with price constraints
• Define possible launch pools: instance type (m5.large), OS, Availability Zone
• Can have multiple launch pools, so that the fleet can choose
• Spot Fleet stops launching instances when reaching capacity or max cost
• Strategies to allocate Spot Instances:
• lowestPrice: from the pool with the lowest price (cost optimization, short workload)
• diversified: distributed across all pools (great for availability, long workloads)
• capacityOptimized: pool with the optimal capacity for the number of instances

• Spot Fleets allow us to automatically request Spot Instances with the lowest price
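The difference between the lowestPrice and diversified strategies can be illustrated with a toy allocator. This is a conceptual sketch, not the Spot Fleet service's actual logic; the pool names and prices are invented, and capacityOptimized is omitted because it depends on capacity data that only AWS can see.

```python
def lowest_price(pools: list[dict], target: int) -> dict[str, int]:
    """Place the whole target capacity in the cheapest launch pool."""
    cheapest = min(pools, key=lambda pool: pool["price"])
    return {cheapest["name"]: target}


def diversified(pools: list[dict], target: int) -> dict[str, int]:
    """Spread the target capacity as evenly as possible across all launch pools."""
    base, extra = divmod(target, len(pools))
    return {
        pool["name"]: base + (1 if index < extra else 0)
        for index, pool in enumerate(pools)
    }


# Hypothetical launch pools (instance type / AZ) with invented hourly prices.
pools = [
    {"name": "m5.large/us-east-1a", "price": 0.035},
    {"name": "m5.large/us-east-1b", "price": 0.031},
    {"name": "c5.large/us-east-1a", "price": 0.034},
]

cheap_plan = lowest_price(pools, target=10)
spread_plan = diversified(pools, target=10)
print(cheap_plan)
print(spread_plan)
```

With lowestPrice a single pool interruption can take out the whole fleet, which is why diversified is the one the slide associates with availability and long workloads.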

What is Docker?
• Docker is a software development platform to deploy apps
• Apps are packaged in containers that can be run on any OS
• Apps run the same, regardless of where they’re run
• Any machine (no compatibility issues, predictable behavior)
• Less work
• Easier to maintain and deploy
• Works with any language, any OS, any technology
• Control how much memory / CPU is allocated to your container
• Scale containers up and down very quickly (seconds)
• More efficient than Virtual machines

Docker Containers Management on AWS
• To manage containers, we need a container management platform

• Amazon Elastic Container Service (Amazon ECS)
• Amazon’s own container platform

• Amazon Elastic Kubernetes Service (Amazon EKS)
• Amazon’s managed Kubernetes (open source)

• AWS Fargate
• Amazon’s own Serverless container platform
• Works with ECS and with EKS

Amazon ECS – Use cases
• Run Microservices
• Run multiple Docker containers on the same machine
• Easy Service Discovery features to enhance communication
• Direct integration with Application Load Balancer and Network Load Balancer
• Auto Scaling capability

• Run Batch Processing / Scheduled Tasks


• Schedule ECS tasks to run on On-demand / Reserved / Spot instances

• Migrate Applications to the Cloud


• Dockerize legacy applications running on-premises
• Move Docker containers to run on Amazon ECS

Amazon ECS – Concepts
• ECS Cluster – logical grouping of EC2 instances
• ECS Service – defines how many tasks should run and how they should
be run
• Task Definitions – metadata in JSON form to tell ECS how to run a
Docker container (image name, CPU, RAM, …)
• ECS Task – an instance of a Task Definition, a running Docker container(s)
• ECS IAM Roles
• EC2 Instance Profile – used by the EC2 instance (e.g., make API calls to ECS, send
logs, …)
• ECS Task IAM Role – allow each task to have a specific role (e.g., make API calls to
S3, DynamoDB, …)
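To show what "metadata in JSON form" looks like, here is a minimal task definition built as a Python dict and serialized to JSON. The family name, image, account ID and role ARN are hypothetical placeholders; setting hostPort to 0 is an assumption made here to request a dynamically assigned host port in bridge mode.

```python
import json

# Hypothetical task definition for a single-container web service.
task_definition = {
    "family": "example-web-app",
    "networkMode": "bridge",
    # ECS Task IAM Role: what the containers in the task are allowed to call.
    "taskRoleArn": "arn:aws:iam::123456789012:role/example-task-role",
    "containerDefinitions": [
        {
            "name": "web",
            "image": "nginx:latest",
            "cpu": 256,       # CPU units reserved for the container
            "memory": 512,    # hard memory limit in MiB
            "essential": True,
            "portMappings": [
                # hostPort 0 asks for a dynamically assigned host port (assumption for this sketch)
                {"containerPort": 80, "hostPort": 0, "protocol": "tcp"}
            ],
        }
    ],
}

task_definition_json = json.dumps(task_definition, indent=2)
print(task_definition_json)
```

An ECS Service would then reference this family to decide how many copies of the task to keep running.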

Amazon ECS - Concepts
Diagram:
• The ECS Cluster runs on an Auto Scaling Group of EC2 Instances, each with an EC2 Instance Profile
• A Task Definition is instantiated as ECS Tasks on those instances, grouped into Service - A and Service - B
• ECS Tasks use a Task IAM Role
• Services shown outside the cluster: ECS, CloudWatch Logs, S3 Bucket

Amazon ECS – ALB Integration
• We get Dynamic Port Mapping
• Allows you to run multiple instances of the same application on the same EC2 instance
• The ALB finds the right port on your EC2 Instances
• Use cases:
• Increased resiliency even if running on one EC2 instance
• Maximize utilization of CPU / cores
• Ability to perform rolling upgrades without impacting app uptime
Diagram: Users -> Application Load Balancer (80/443) -> dynamic port mapping to ECS Tasks in the ECS Cluster (ports 36789 and 39586 on one EC2 Instance, ports 39748 and 39856 on another)

AWS Fargate
• Launch Docker containers on AWS
• You do not provision the infrastructure (no EC2 instances to manage)
• It’s all serverless!
• You create task definitions
• AWS runs containers for you based on the CPU / RAM you need
• To scale, just increase the number of tasks. Simple! No more EC2 instances
Diagram: New Docker Container -> AWS Fargate

Amazon ECS – Security & Networking
• You can inject secrets and configurations as Environment Variables into
running Docker containers
• Integration with SSM Parameter Store and Secrets Manager

• ECS Tasks Networking


• none – no network connectivity, no port mappings
• bridge – uses Docker’s virtual container-based network
• host – bypass Docker’s network, uses the underlying host network interface
• awsvpc
• Every task launched on the instance gets its own ENI and a private IP address
• Simplified networking, enhanced security, Security Groups, monitoring, VPC Flow Logs
• Default mode for Fargate tasks

Amazon ECS – Service Auto Scaling
• Automatically increase/decrease the desired number of tasks
• Amazon ECS leverages AWS Application Auto Scaling
• CPU and RAM are tracked in CloudWatch at the ECS Service level

• Target Tracking – scale based on target value for a specific CloudWatch metric
• Step Scaling – scale based on a specified CloudWatch Alarm
• Scheduled Scaling – scale based on a specified date/time (predictable changes)

• ECS Service Auto Scaling (task level) ≠ EC2 Auto Scaling (EC2 instance level)
• Fargate Auto Scaling is much easier to set up (because Serverless)

Amazon ECS – Spot Instances
• ECS Classic (EC2 Launch Type)
• Can have the underlying EC2 instances as Spot Instances (managed by an ASG)
• Instances may go into draining mode to remove running tasks
• Good for cost savings, but will impact reliability

• AWS Fargate
• Specify minimum of tasks for on-demand baseline workload
• Add tasks running on FARGATE_SPOT for cost-savings (can be reclaimed by AWS)
• Regardless of On-demand or Spot, Fargate scales well based on load

Amazon ECR - Elastic Container Registry
• Store and manage Docker images on AWS
• Private and Public repository (Amazon ECR Public Gallery https://gallery.ecr.aws)
• Fully integrated with ECS
• Access is controlled through IAM (permission errors => check policy)
• Supports image vulnerability scanning, versioning, image tags, image lifecycle, …
Diagram: an EC2 Instance in the ECS Cluster uses its IAM Role to pull Docker Image A and Docker Image B from the ECR Repository
AWS Lambda Integrations
Main ones
• API Gateway
• Kinesis
• DynamoDB
• AWS S3 – Simple Storage Service
• AWS IoT – Internet of Things
• CloudWatch Events
• CloudWatch Logs
• AWS SNS
• AWS Cognito
• Amazon SQS

Example: Serverless Thumbnail creation

Diagram: New image in S3 -> trigger -> AWS Lambda Function (creates a thumbnail) -> push -> New thumbnail in S3, and push -> Metadata in DynamoDB (image name, image size, creation date, etc…)

Example: Serverless CRON Job

[Diagram: CloudWatch Events → trigger every 1 hour → AWS Lambda Function performs a task]

AWS Lambda Language Support (runtimes)
• Node.js (JavaScript)
• Python
• Java
• C# (.NET Core)
• Golang
• PowerShell
• Ruby
• Custom Runtime API (community supported, example Rust)

• Lambda Container Image
• The container image must implement the Lambda Runtime API
• ECS / Fargate is preferred for running arbitrary Docker images

Lambda – Limits to know
• RAM – 128 MB to 10,240 MB (10 GB)
• CPU – is linked to RAM (cannot be set manually)
• The equivalent of 1 vCPU is allocated at 1,769 MB of RAM
• Up to 6 vCPUs are allocated at 10,240 MB of RAM
• Timeout – up to 15 minutes
• /tmp Storage – 512 MB by default, configurable up to 10,240 MB (BIG files need the extra space)
• Deployment Package – 50 MB (zipped), 250 MB (unzipped) including layers
• Concurrent Executions – 1000 (soft limit that can be increased)
• Container Image Size – 10 GB
• Invocation Payload (request/response) – 6 MB (sync), 256 KB (async)

Lambda – Latency Considerations
(approximate values)
• Lambda Latency:
• Cold Lambda Invocation: ~100 ms
• Warm Lambda Invocation: ~ms
• “Provisioned concurrency” feature (Dec 2019) reduces the number of cold starts
• API Gateway invocation: ~100 ms
• CloudFront invocation: ~100 ms
• If you chain with other services (API Gateway, CloudFront, ALB, Lambda, SQS, Step Functions…), add their latencies as well
• X-Ray can help visualize the end-to-end latency
[Diagram: Lambda invoked through API Gateway, CloudFront or ELB]

Lambda - Security
• IAM Roles for Lambda to grant write access to other AWS services
• Resource-based Policies for Lambda (similar to S3 bucket policies):
• Allow other accounts to invoke or manage Lambda
• Allow other services to invoke or manage Lambda (define through the CLI)

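Lambda in a VPC
• Default Lambda Deployment: the function runs outside your VPC
• Access to public resources (www, external API, DynamoDB) works
• Access to a private RDS in your VPC & private subnet is not working
• Lambda in VPC: the function runs in your VPC & private subnet
• Assign a security group; access to the private RDS is working
• Access to the public www / external API requires a NAT in a public subnet and an IGW
• Access to DynamoDB goes through the NAT or through a DynamoDB endpoint
• Note: Lambda - CloudWatch Logs works even without endpoint or NAT Gateway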
AWS Lambda Logging, Monitoring and Tracing
• CloudWatch:
• AWS Lambda execution logs are stored in AWS CloudWatch Logs
• AWS Lambda metrics are displayed in AWS CloudWatch Metrics (successful invocations, error rates, latency, timeouts, etc…)
• Make sure your AWS Lambda function has an execution role with an IAM policy that authorizes writes to CloudWatch Logs
• X-Ray:
• It’s possible to trace Lambda with X-Ray
• Enable in Lambda configuration (runs the X-Ray daemon for you)
• Use the AWS X-Ray SDK in code
• Ensure Lambda Function has correct IAM Execution Role

Lambda – Synchronous Invocations
• Synchronous: CLI, SDK, API Gateway
• Result is returned right away
• Error handling must happen client side (retries, exponential backoff, etc…)
[Diagram: SDK → invoke → Lambda (do something) → response; Client → invoke → API Gateway → proxy → Lambda (do something) → response → response]
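Client-side error handling for synchronous invocations can be sketched as a retry loop with exponential backoff. This is an illustrative sketch only: `invoke_with_retries` and its parameters are hypothetical names, and `invoke` stands for any callable that performs the synchronous call (SDK, CLI wrapper, HTTP request to API Gateway).

```python
import random
import time


def invoke_with_retries(invoke, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call `invoke()` and retry on failure with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
```

The `sleep` parameter is injectable so the backoff can be observed or skipped in tests.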

Lambda – Asynchronous Invocation
• S3, SNS, CloudWatch Events…
• Lambda attempts to retry on errors (3 tries total)
• Make sure the processing is idempotent (in case of retries)
• Can define a DLQ (dead-letter queue) – SNS or SQS – for failed processing
[Diagram: new file event in S3 → async invocation → Lambda (retries) → DLQ (SQS) for failed processing]
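Because an asynchronous event may be delivered more than once, the processing has to tolerate duplicates. One way to sketch idempotency is to remember which event IDs were already handled. The in-memory set below is only a stand-in: a real function would need a durable store (for example a DynamoDB table), since Lambda execution environments do not share memory. All names are hypothetical.

```python
processed_event_ids = set()  # stand-in for a durable store such as a DynamoDB table


def process_once(event_id, payload, side_effect):
    """Run `side_effect(payload)` at most once per event_id, even across retries."""
    if event_id in processed_event_ids:
        return False  # duplicate delivery: skip the work
    side_effect(payload)
    processed_event_ids.add(event_id)  # record only after the work succeeded
    return True
```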

Lambda – Event Source Mapping
• Sources:
• Kinesis Data Streams, SQS, SQS FIFO
• DynamoDB Streams, Amazon MQ, Apache Kafka (including Amazon MSK)
• Records need to be polled from the source (common denominator)
• All records respect ordering properties EXCEPT for SQS standard
• If your function returns an error, the entire batch is reprocessed until success
• Kinesis, DynamoDB Stream: stop shard processing
• SQS FIFO: stop, unless a SQS DLQ has been defined
• Need to make sure your Lambda function is idempotent
[Diagram: Lambda (Event Source Mapping) polls the source, receives a batch, then invokes the Lambda function with the batch]
Lambda – Destinations
• Nov 2019: Can configure to send result to a destination
• Asynchronous invocations - can define destinations for successful and failed events:
• Amazon SQS
• Amazon SNS
• AWS Lambda
• Amazon EventBridge bus
• Note: AWS recommends you use destinations instead of DLQ now (but both can be used at the same time)
https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html
• Event Source mapping: for discarded event batches
• Amazon SQS
• Amazon SNS
• Note: you can send events to a DLQ directly from SQS
https://docs.aws.amazon.com/lambda/latest/dg/invocation-eventsourcemapping.html

AWS Lambda Versions
• When we work on a Lambda function, we work on $LATEST (mutable)
• When we’re ready to publish a Lambda function, we create a version
• Versions are immutable
• Versions have increasing version numbers (V1, V2, …)
• Versions get their own ARN (Amazon Resource Name)
• Version = code + configuration (nothing can be changed - immutable)
• Each version of the Lambda function can be accessed

AWS Lambda Aliases
• Aliases are “pointers” to Lambda function versions
• We can define “dev”, “test”, “prod” aliases and have them point at different Lambda versions
• Aliases are mutable
• Aliases enable Blue / Green deployment by assigning weights to Lambda function versions
• Aliases enable stable configuration of our event triggers / destinations
• Aliases have their own ARNs
• Aliases cannot reference aliases
[Diagram: users call the DEV, PROD and TEST aliases (mutable); an alias can split traffic 95% / 5% between two versions; $LATEST is mutable, V1 and V2 are immutable]
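The effect of alias weights can be simulated locally. This is only a sketch of the idea, not how the Lambda service is implemented: the 95% / 5% split comes from the slides, while the mapping of those weights to V1 and V2 and the name `pick_version` are assumptions.

```python
import random


def pick_version(alias_weights, rng=random):
    """Choose a version for one invocation according to the alias weights."""
    versions = list(alias_weights)
    weights = [alias_weights[v] for v in versions]
    return rng.choices(versions, weights=weights, k=1)[0]


# Hypothetical PROD alias: most traffic stays on V1, a small share tries V2.
prod_alias = {"V1": 0.95, "V2": 0.05}
```

Shifting traffic then only means changing the weights on the alias; callers keep using the same alias ARN.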

AWS Lambda Aliases with API Gateway
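[Diagram: API Gateway stages point at Lambda aliases]
• Prod Stage → PROD Alias → 95% to V1, 5% to V2
• Test Stage → TEST Alias → 100% to V2
• Dev Stage → DEV Alias → 100% to $LATEST
• Shifting traffic only requires Lambda alias changes, no API Gateway changes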

Lambda & CodeDeploy
• CodeDeploy can help you automate traffic shift for Lambda aliases
• Feature is integrated within the SAM framework
• Linear: grow traffic every N minutes until 100%
• Linear10PercentEvery3Minutes
• Linear10PercentEvery10Minutes
• Canary: try X percent then 100%
• Canary10Percent5Minutes
• Canary10Percent30Minutes
• AllAtOnce: immediate
• Can create Pre & Post Traffic hooks to check the health of the Lambda function
[Diagram: CodeDeploy updates the PROD Alias so that (100 – X)% of traffic goes to V1 and X% goes to V2, making X vary over time until X = 100%]
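To make the deployment configuration names concrete, the sketch below computes how X (the share of traffic on the new version) evolves over time. It assumes the first increment is applied at minute 0; the function names are hypothetical and the code does not call CodeDeploy.

```python
def linear_schedule(step_percent, interval_minutes):
    """(minute, % on new version) pairs for a Linear<step>PercentEvery<interval>Minutes shift."""
    schedule = []
    percent, minute = 0, 0
    while percent < 100:
        percent = min(100, percent + step_percent)
        schedule.append((minute, percent))
        minute += interval_minutes
    return schedule


def canary_schedule(canary_percent, wait_minutes):
    """Canary: send X% first, then 100% after the wait."""
    return [(0, canary_percent), (wait_minutes, 100)]
```

For example, `linear_schedule(10, 3)` models Linear10PercentEvery3Minutes and `canary_schedule(10, 5)` models Canary10Percent5Minutes.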

Lambda Environment Variables
• Environment variable = key / value pair in “String” form
• Adjust the function behavior without updating code
• The environment variables are available to your code
• Lambda Service adds its own system environment variables as well

• Helpful to store secrets (encrypted by KMS)
• Secrets can be encrypted by the Lambda service key, or your own KMS Key

Lambda Concurrency and Throttling
• Concurrency limit: up to 1000 concurrent executions
• Can set a “reserved concurrency” at the function level (=limit)
• Each invocation over the concurrency limit will trigger a “Throttle”

Lambda Concurrency Issue
• If you don’t reserve (=limit) concurrency, the following can happen:

[Diagram: many users behind an Application Load Balancer consume all 1000 concurrent executions, so the few users behind API Gateway and the SDK / CLI get THROTTLE!]

Cold Starts & Provisioned Concurrency
• Cold Start:
• New instance => code is loaded and code outside the handler runs (init)
• If the init is large (code, dependencies, SDK…) this process can take some time.
• First request served by new instances has higher latency than the rest
• Provisioned Concurrency:
• Concurrency is allocated before the function is invoked (in advance)
• So, the cold start never happens, and all invocations have low latency
• Application Auto Scaling can manage concurrency (schedule or target utilization)

Reserved and Provisioned Concurrency

https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html
Lambda Integration with ALB
• To expose a Lambda function as an HTTP(S) endpoint…
• You can use the Application Load Balancer (or an API Gateway)
• The Lambda function must be registered in a target group

[Diagram: Client → HTTP/HTTPS → Application Load Balancer (ALB) → INVOKE SYNC → Lambda function in a Target Group]
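A Lambda function registered in an ALB target group receives the HTTP request as an event and must return a response object the ALB can turn back into HTTP. The handler below is a minimal sketch; the greeting logic and the `name` query parameter are made up for illustration.

```python
import json


def handler(event, context):
    """Lambda target for an ALB: the HTTP request arrives as a JSON-like event."""
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    # The ALB uses these fields to build the HTTP response.
    return {
        "statusCode": 200,
        "statusDescription": "200 OK",
        "isBase64Encoded": False,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "hello " + name}),
    }
```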

Types of load balancer on AWS
• AWS has 4 kinds of managed Load Balancers
• Classic Load Balancer (v1 - old generation) – 2009 – CLB
• HTTP, HTTPS, TCP, SSL (secure TCP)
• Application Load Balancer (v2 - new generation) – 2016 – ALB
• HTTP, HTTPS, WebSocket
• Network Load Balancer (v2 - new generation) – 2017 – NLB
• TCP, TLS (secure TCP), UDP
• Gateway Load Balancer – 2020 – GWLB
• Operates at layer 3 (Network layer) – IP Protocol

• Overall, it is recommended to use the newer generation load balancers as they provide more features
• Some load balancers can be set up as internal (private) or external (public) ELBs

Classic Load Balancers (v1)
• Health Checks can be HTTP (L7) or TCP (L4) based (including with SSL)
• Supports only one SSL certificate
• The SSL certificate can have many SAN (Subject Alternative Name), but the SSL certificate must be changed anytime a SAN is added / edited / removed
• Better to use ALB with SNI (Server Name Indication) if possible
• Can use multiple CLB if you want distinct SSL certificates
• TCP => TCP passes all the traffic to the EC2 instance
• Only way to use 2-way SSL authentication
[Diagram: Client → listener → CLB → internal → EC2]

Application Load Balancer (v2)
• Application Load Balancer operates at Layer 7 (HTTP)
• Load balancing to multiple HTTP applications across machines (target groups)
• Load balancing to multiple applications on the same machine (ex: containers) – great fit with ECS, has dynamic port mapping
• Support for HTTP/2 and WebSocket
• Support redirects (from HTTP to HTTPS for example)
• Routing Rules for path, headers, query string

Application Load Balancer (v2)
HTTP Based Traffic
[Diagram: an external Application Load Balancer (v2) routes /user over HTTP to the target group of the application for Users, and /search over HTTP to the target group of the application for Search; each target group has its own health check]

Application Load Balancer (v2)
Target Groups
• EC2 instances (can be managed by an Auto Scaling Group) – HTTP
• ECS tasks (managed by ECS itself) – HTTP
• Lambda functions – HTTP request is translated into a JSON event
• IP Addresses – must be private IPs

• ALB can route to multiple target groups
• Health checks are at the target group level

Network Load Balancer (v2)
• Network load balancers (Layer 4) allow you to:
• Forward TCP & UDP traffic to your instances
• Handle millions of requests per second
• Less latency ~100 ms (vs 400 ms for ALB)
• NLB has one static IP per AZ, and supports assigning Elastic IP (helpful for whitelisting specific IP)
• NLB are used for extreme performance, TCP or UDP traffic
• Not included in the AWS free tier

Network Load Balancer – Target Groups
• EC2 instances
• IP Addresses – must be private IPs
• Application Load Balancer
[Diagram: three Network Load Balancers, each with one kind of target group: EC2 Instances (e.g. i-1234567890abcdef0), IP Addresses (e.g. 192.168.1.118, 10.0.4.21), Application Load Balancer]

Gateway Load Balancer
• Deploy, scale, and manage a fleet of 3rd party network virtual appliances in AWS
• Example: Firewalls, Intrusion Detection and Prevention Systems, Deep Packet Inspection Systems, payload manipulation, …
• Operates at Layer 3 (Network Layer) – IP Packets
• Combines the following functions:
• Transparent Network Gateway – single entry/exit for all traffic
• Load Balancer – distributes traffic to your virtual appliances
• Uses the GENEVE protocol on port 6081
[Diagram: traffic from Users (source) is sent via the Route Table to the Gateway Load Balancer, which forwards it to a Target Group of 3rd party security virtual appliances before it reaches the Application (destination)]
Gateway Load Balancer – Target Groups
• EC2 instances
• IP Addresses – must be private IPs

[Diagram: two Gateway Load Balancers, each with one kind of target group: EC2 Instances (e.g. i-1234567890abcdef0), IP Addresses (e.g. 192.168.1.118, 10.0.4.21)]

Cross-Zone Load Balancing
• With Cross Zone Load Balancing: each load balancer instance distributes evenly across all registered instances in all AZ
• Without Cross Zone Load Balancing: requests are distributed among the instances of the node of the Elastic Load Balancer
[Diagram: Availability Zone 1 has 2 instances, Availability Zone 2 has 8; each load balancer node receives 50% of the traffic. With cross-zone: every instance gets 10%. Without cross-zone: each instance in AZ 1 gets 25%, each instance in AZ 2 gets 6.25%]
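The percentages in the diagram follow from simple arithmetic, sketched below. The function name and AZ labels are hypothetical; the only assumption is that traffic is split evenly across the load balancer nodes (one per AZ).

```python
def traffic_share(instances_per_az, cross_zone):
    """Percent of total traffic received by ONE instance in each AZ."""
    total_instances = sum(instances_per_az.values())
    az_share = 100 / len(instances_per_az)  # each LB node gets an equal share
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            # Every node spreads its traffic over all instances in all AZs.
            shares[az] = 100 / total_instances
        else:
            # Each node only spreads its share over the instances in its own AZ.
            shares[az] = az_share / count
    return shares
```

With 2 instances in one AZ and 8 in the other, this gives 10% per instance with cross-zone enabled, and 25% versus 6.25% without it.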

Cross-Zone Load Balancing
• Classic Load Balancer
• Disabled by default
• No charges for inter AZ data if enabled
• Application Load Balancer
• Enabled by default (can be disabled at the target group level)
• No charges for inter AZ data
• Network Load Balancer
• Disabled by default
• You pay charges ($) for inter AZ data if enabled
• Gateway Load Balancer
• Disabled by default
• You pay charges ($) for inter AZ data if enabled

Sticky Sessions (Session Affinity)
• It is possible to implement stickiness so that the same client is always redirected to the same instance behind a load balancer
• This works for Classic Load Balancers & Application Load Balancers
• The “cookie” used for stickiness has an expiration date you control
• Use case: make sure the user doesn’t lose their session data
• Enabling stickiness may bring imbalance to the load over the backend EC2 instances
[Diagram: Client 1, Client 2 and Client 3 each stick to one EC2 instance behind the load balancer]

Request Routing Algorithms – Least Outstanding Requests
• The next instance to receive the request is the instance that has the lowest number of pending/unfinished requests
• Works with Application Load Balancer and Classic Load Balancer (HTTP/HTTPS)
[Diagram: an ALB or a CLB (HTTP/HTTPS listener) in front of three registered EC2 instances]
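The selection rule can be sketched in a few lines. This is an illustration of the rule as stated on the slide, not the load balancer's actual implementation; the instance IDs are made up, and ties are broken by dictionary order, which is an arbitrary choice.

```python
def pick_least_outstanding(outstanding):
    """Return the target with the fewest pending/unfinished requests."""
    return min(outstanding, key=outstanding.get)


def route(outstanding):
    """Send one request to the least-loaded target and count it as pending."""
    target = pick_least_outstanding(outstanding)
    outstanding[target] += 1
    return target
```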
Request Routing Algorithms – Round Robin
• Equally choose the targets from the target group
• Works with Application Load Balancer and Classic Load Balancer (TCP)

[Diagram: an ALB with a target group, or a CLB (TCP listener) with registered instances, in front of three EC2 instances]
Request Routing Algorithms – Flow Hash
• Selects a target based on the protocol, source/destination IP address,
source/destination port, and TCP sequence number
• Each TCP/UDP connection is routed to a single target for the life of the connection
• Works with Network Load Balancer

[Diagram: protocol, source & destination IP, source & destination port and TCP sequence no. go into the Flow Hash Algorithm; the resulting hash (8743b…) lets the Network Load Balancer pick one EC2 instance in the target group]
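The idea of a flow hash can be illustrated by hashing the flow fields and using the result to index the target list. SHA-256 and the modulo step are choices made for this sketch only; the slides do not say which hash function the Network Load Balancer uses, and the addresses and instance IDs are made up.

```python
import hashlib


def pick_target(flow, targets):
    """Map a flow tuple to one target; the same flow always maps to the same target."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(targets)
    return targets[index]


# (protocol, source IP, source port, destination IP, destination port, TCP sequence no.)
example_flow = ("tcp", "203.0.113.7", 50123, "10.0.1.5", 443, 1234567)
```

Because the hash only depends on the flow fields, every packet of one connection lands on the same target for the life of the connection.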
API Gateway – Overview
[Diagram: client → REST API → Amazon API Gateway → proxy requests → AWS Lambda → CRUD → Amazon DynamoDB]
• Helps expose Lambda, HTTP & AWS Services as an API
• API versioning, authorization, traffic management (API keys, throttles), huge scale, serverless, req/resp transformations, OpenAPI spec, CORS

• Limits to know:
• 29 seconds timeout
• 10 MB max payload size

API Gateway – Deployment Stages
• API changes are deployed to “Stages” (as many as you want)
• Use the naming you like for stages (dev, test, prod)
• Stages can be rolled back as a history of deployments is kept
[Diagram: Prod Stage → PROD Alias (95% to V1, 5% to V2); Test Stage → TEST Alias (100% to V2)]

API Gateway – Integrations
• HTTP
• Expose HTTP endpoints in the backend
• Example: internal HTTP API on premises, Application Load Balancer…
• Why? Add rate limiting, caching, user authentications, API keys, etc…
• Lambda Function
• Invoke Lambda function
• Easy way to expose REST API backed by AWS Lambda
• AWS Service
• Expose any AWS API through the API Gateway
• Example: start an AWS Step Functions workflow, post a message to SQS
• Why? Add authentication, deploy publicly, rate control…

Solution Architecture Discussion:
API Gateway in front of S3
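• You will be impacted by the 10 MB payload size limit
[Diagram: Client Application → “I want to upload a file” → API Gateway → proxy → S3]
• Better architecture:
• The Client Application tells API Gateway “I want to upload a file”; API Gateway invokes a Lambda function
• The Lambda function generates a pre-signed URL, which is returned and forwarded to the client
• The client uploads to S3 using the pre-signed URL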

API Gateway - Endpoint Types
• Edge-Optimized (default): For global clients
• Requests are routed through the CloudFront Edge locations (improves latency)
• The API Gateway still lives in only one region
• Regional:
• For clients within the same region
• Could manually combine with CloudFront (more control over the caching
strategies and the distribution)
• Private:
• Can only be accessed from your VPC using an interface VPC endpoint (ENI)
• Use a resource policy to define access

Caching API responses
• Caching reduces the number of calls made to the backend
• Default TTL (time to live) is 300 seconds (min: 0s, max: 3600s)
• Caches are defined per stage
• Possible to override cache settings per method
• Clients can invalidate the cache with header: Cache-Control: max-age=0 (with proper IAM authorization)
• Able to flush the entire cache (invalidate it) immediately
• Cache encryption option
• Cache capacity between 0.5GB to 237GB
[Diagram: Client → API Gateway → check Gateway cache → backend if cache miss]

API Gateway - Errors
• 4xx means Client errors
• 400: Bad Request
• 403: Access Denied, WAF filtered
• 429: Quota exceeded, Throttle

• 5xx means Server errors
• 502: Bad Gateway Exception, usually for an incompatible output returned from a Lambda proxy integration backend and occasionally for out-of-order invocations due to heavy loads.
• 503: Service Unavailable Exception
• 504: Integration Failure – ex Endpoint Request Timed-out Exception
• API Gateway requests time out after 29 seconds maximum

API Gateway – Security
• Load SSL certificates and use Route53 to define a CNAME
• Resource Policy (~S3 Bucket Policy):
• control who can access the API
• Users from AWS accounts, IP or CIDR blocks, VPC or VPC Endpoints
• IAM Execution Roles for API Gateway at the API level
• To invoke a Lambda Function, an AWS service…
• CORS (Cross-origin resource sharing):
• Browser based security
• Control which domains can call your API

API Gateway – Authentication
• IAM based access (AWS_IAM)
• Good for providing access within your infrastructure
• Pass IAM credentials in headers through Sig V4
• Lambda Authorizer (formerly Custom Authorizer)
• Use Lambda to verify a custom OAuth / SAML / 3rd party authentication
• Cognito User Pools
• Client authenticates with Cognito
• Client passes the token to API Gateway
• API Gateway knows out-of-the-box how to verify the token
[Diagram: Client authenticates with Cognito User Pools and gets a token, passes the token to API Gateway, which passes the identity to the Backend]

API Gateway – Logging, Monitoring, Tracing
• CloudWatch Logs:
• Enable CloudWatch logging at the Stage level (with Log Level – ERROR, INFO)
• Can log full requests / responses data
• Can send API Gateway Access Logs (customizable)
• Can send logs directly into Kinesis Data Firehose (as an alternative to CW logs)
• CloudWatch Metrics:
• Metrics are by stage, possibility to enable detailed metrics
• IntegrationLatency, Latency, CacheHitCount, CacheMissCount
• X-Ray:
• Enable tracing to get extra information about requests in API Gateway
• X-Ray API Gateway + AWS Lambda gives you the full picture

API Gateway – Usage Plans & API Keys
• If you want to make an API available as an offering ($) to your customers
• Usage Plan:
• who can access one or more deployed API stages and methods
• how much and how fast they can access them
• uses API keys to identify API clients and meter access
• configure throttling limits and quota limits that are enforced on individual clients
• API Keys:
• alphanumeric string values to distribute to your customers
• Ex: WBjHxNtoAb4WPKBC7cGm64CBibIb24b4jt8jJHo9
• Can use with usage plans to control access
• Throttling limits are applied to the API keys
• Quota limits are the overall maximum number of requests
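The difference between a throttling limit (how fast) and a quota (how much) can be sketched with a toy token bucket. The class below is purely illustrative: `UsagePlan`, its parameters and the explicit `now` timestamps are hypothetical, and it does not call API Gateway.

```python
class UsagePlan:
    """Toy usage plan: token-bucket throttle (rate / burst) plus a total quota."""

    def __init__(self, rate_per_sec, burst, quota):
        self.rate, self.burst, self.quota = rate_per_sec, burst, quota
        self.tokens, self.used, self.last = float(burst), 0, 0.0

    def allow(self, now):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.used >= self.quota or self.tokens < 1:
            return False  # an API would answer 429 here
        self.tokens -= 1
        self.used += 1
        return True
```

A rejected request in this sketch corresponds to the 429 (Quota exceeded, Throttle) error listed for API Gateway.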

API Gateway – WebSocket API – Overview
• What’s WebSocket?
• Two-way interactive communication between a user’s browser and a server
• Server can push information to the client
• This enables stateful application use cases
• WebSocket APIs are often used in real-time applications such as chat applications, collaboration platforms, multiplayer games, and financial trading platforms.
• Works with AWS Services (Lambda, DynamoDB) or HTTP endpoints
[Diagram: chat application: Client 1 keeps a persistent connection to the API Gateway WebSocket API, which invokes Lambda functions (onConnect, sendMessage, onDisconnect) backed by Amazon DynamoDB]
Server to Client Messaging
• WebSocket URL: wss://abcdef.execute-api.us-west-1.amazonaws.com/dev
• Clients send a message over the WebSocket URL; the Amazon API Gateway WebSocket API invokes the Lambda function (sendMessage) with the connectionId, which can be stored in Amazon DynamoDB
• @connections is used for replies to clients: the Lambda function calls back with an HTTP POST (IAM Sig v4) to the Connection URL
• Connection URL: https://abcdef.execute-api.us-west-1.amazonaws.com/dev/@connections/connectionId

AWS AppSync - Overview
• AppSync is a managed service that uses GraphQL
• GraphQL makes it easy for applications to get exactly the data they
need.
• This includes combining data from one or more sources
• NoSQL data stores, Relational databases, HTTP APIs…
• Integrates with DynamoDB, Aurora, Elasticsearch & others
• Custom sources with AWS Lambda
• Retrieve data in real-time with WebSocket or MQTT on WebSocket
• For mobile apps: local data access & data synchronization
• It all starts with uploading one GraphQL schema

AppSync Diagram
[Diagram: web apps, mobile apps, real-time dashboards and offline sync clients call AppSync (GraphQL schema + resolvers); resolvers reach DynamoDB, Aurora, ElasticSearch Service, Lambda (anything) and public HTTP APIs; AppSync sends metrics & logs to CloudWatch]

Route 53 – Record Types
• A – maps a hostname to IPv4
• AAAA – maps a hostname to IPv6
• CNAME – maps a hostname to another hostname
• The target is a domain name which must have an A or AAAA record
• Can’t create a CNAME record for the top node of a DNS namespace (Zone Apex)
• Example: you can’t create for example.com, but you can create for www.example.com
• NS – Name Servers for the Hosted Zone
• Control how traffic is routed for a domain

Route 53 – Diagram for A record
[Diagram: Client asks Amazon Route 53 “example.com?”, receives 54.22.33.44, then connects to the EC2 instance with public IP 54.22.33.44 in the AWS Cloud]

Route 53 – CNAME vs. Alias
• AWS Resources (Load Balancer, CloudFront...) expose an AWS hostname:
• lb1-1234.us-east-2.elb.amazonaws.com and you want myapp.mydomain.com

• CNAME:
• Points a hostname to any other hostname. (app.mydomain.com => blabla.anything.com)
• ONLY FOR NON ROOT DOMAIN (aka. something.mydomain.com)
• Alias:
• Points a hostname to an AWS Resource (app.mydomain.com => blabla.amazonaws.com)
• Works for ROOT DOMAIN and NON ROOT DOMAIN (aka mydomain.com)
• Free of charge
• Native health check

Route 53 – Alias Records Targets
• Elastic Load Balancers
• CloudFront Distributions
• API Gateway
• Elastic Beanstalk environments
• S3 Websites
• VPC Interface Endpoints
• Global Accelerator accelerator
• Route 53 record in the same hosted zone
• You cannot set an ALIAS record for an EC2 DNS name


Route 53 – Records TTL (Time To Live)
• High TTL – e.g., 24 hr
• Less traffic on Route 53
• Possibly outdated records
• Low TTL – e.g., 60 sec.
• More traffic on Route 53 ($$)
• Records are outdated for less time
• Easy to change records
• Except for Alias records, TTL is mandatory for each DNS record
[Diagram: Client sends a DNS request “myapp.example.com?” to Amazon Route 53 and receives “A 12.34.56.78” (with TTL); the client will cache the result for the TTL of the record and sends HTTP requests to the web server, which returns HTTP responses]

Routing Policies – Simple
• Typically, route traffic to a single resource
• Can’t be associated with Health Checks
• Can specify multiple values in the same record
• If multiple values are returned, a random one is chosen by the client
[Diagram: Single Value: Client asks Amazon Route 53 for foo.example.com and receives A 11.22.33.44. Multiple Value: Client receives A 11.22.33.44, A 55.66.77.88 and A 99.11.22.33, and chooses a random value]

Routing Policies – Weighted
• Control the % of the requests that go to each specific resource
• Can be associated with Health Checks
• Use cases: load balancing between regions, testing new application versions…
[Diagram: Amazon Route 53 sends 70% of requests to the resource with Weight: 70, 20% to Weight: 20 and 10% to Weight: 10]
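The share of traffic for each weighted record is its weight divided by the sum of all weights, so the weights do not have to add up to 100. A small worked sketch follows; the record names are made up.

```python
def weight_percentages(weights):
    """Convert record weights into the percentage of requests each record receives."""
    total = sum(weights.values())
    return {name: 100 * weight / total for name, weight in weights.items()}
```

For the diagram's weights 70 / 20 / 10 the result is 70%, 20% and 10%; weights of 1 and 3 would give 25% and 75%.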
Routing Policies – Latency-based
• Redirect to the resource that has the lowest latency for the user
• Super helpful when latency for users is a priority
• Latency is based on traffic between users and AWS Regions
• Germany users may be directed to the US (if that’s the lowest latency)
• Can be associated with Health Checks (has a failover capability)
[Diagram: users are routed to the ALB in us-east-1 or the ALB in ap-southeast-1, whichever has the lowest latency]

Routing Policies – Failover (Active-Passive)

[Diagram: Client sends DNS requests to Amazon Route 53; a health check (mandatory) monitors the primary EC2 instance; on failure, Route 53 fails over to the secondary EC2 instance (disaster recovery)]

Routing Policies – Geolocation
• Different from Latency-based!
• This routing is based on user location
• Specify location by Continent, Country or by US State (if there’s overlapping, most precise location selected)
• Should create a “Default” record (in case there’s no match on location)
• Use cases: website localization, restrict content distribution, load balancing, …
• Can be associated with Health Checks
[Diagram: users in different locations receive A 11.22.33.44 or A 55.66.77.88; users with no matching location receive the Default record A 99.11.22.33]

Routing Policies – Geoproximity
• Route traffic to your resources based on the geographic location of users and
resources
• Ability to shift more traffic to resources based on the defined bias
• To change the size of the geographic region, specify bias values:
• To expand (1 to 99) – more traffic to the resource
• To shrink (-1 to -99) – less traffic to the resource

• Resources can be:
• AWS resources (specify AWS region)
• Non-AWS resources (specify Latitude and Longitude)
• You must use Route 53 Traffic Flow to use this feature

Routing Policies – Geoproximity

us-west-1 us-east-1
Bias: 0 Bias: 0

Routing Policies – Geoproximity

us-west-1 us-east-1
Bias: 0 Bias: 50

Higher bias in us-east-1

Route 53 – Traffic flow
• Simplify the process of creating and maintaining records in large and complex configurations
• Visual editor to manage complex routing decision trees
• Configurations can be saved as Traffic Flow Policy
• Can be applied to different Route 53 Hosted Zones (different domain names)
• Supports versioning

Routing Policies – Multi-Value
• Use when routing traffic to multiple resources
• Route 53 returns multiple values/resources
• Can be associated with Health Checks (return only values for healthy resources)
• Up to 8 healthy records are returned for each Multi-Value query
• Multi-Value is not a substitute for having an ELB
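The Multi-Value behaviour can be sketched as "filter out unhealthy records, then return at most 8". This is a local simulation under assumptions: the record structure, the function name and the random ordering are illustrative, and no Route 53 API is called.

```python
import random


def multi_value_answer(records, max_records=8, rng=random):
    """Return up to `max_records` values of healthy records, in random order."""
    healthy = [r["value"] for r in records if r["healthy"]]
    rng.shuffle(healthy)
    return healthy[:max_records]
```

The client still picks one of the returned values itself, which is why this is not a substitute for a load balancer.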

Route 53 – Hosted Zones
• A container for records that define how to route traffic to a domain and its subdomains
• Public Hosted Zones – contain records that specify how to route traffic on the Internet (public domain names), e.g. application1.mypublicdomain.com
• Private Hosted Zones – contain records that specify how you route traffic within one or more VPCs (private domain names), e.g. application1.company.internal

Route 53 – Public vs. Private Hosted Zones
[Diagram, Public Hosted Zone: a Client asks "example.com?" and the Public Hosted Zone answers 54.22.33.44. Public resources shown: S3 Bucket, Amazon CloudFront, EC2 Instance (Public IP), Application Load Balancer.]
[Diagram, Private Hosted Zone: inside a VPC, the Private Hosted Zone answers "api.example.internal?" with 10.0.0.10 and "db.example.internal?" with 10.0.0.35. Resources shown: EC2 Instance (webapp.example.internal, Private IP), EC2 Instance (api.example.internal, Private IP), DB Instance (db.example.internal, Private IP).]

Route 53 – Good to Know
• For internal private DNS (Private Hosted Zone), you must enable the VPC
settings enableDnsHostnames and enableDnsSupport

• DNS Security Extensions (DNSSEC)
• A protocol for securing DNS traffic, verifies DNS data integrity and origin
• Protects against Man in the Middle (MITM) attacks
• Route 53 supports both DNSSEC for Domain Registration and DNSSEC Signing
• Works only with Public Hosted Zones

• Route 53 with 3rd Party Registrar
• You can buy the domain outside of AWS and use Route 53 as the DNS provider
• Update the NS records on the 3rd party Registrar

Route 53 – Health Checks
• HTTP Health Checks are only for public resources
• Health Check => Automated DNS Failover:
1. Health checks that monitor an endpoint (application, server, other AWS resource)
2. Health checks that monitor other health checks (Calculated Health Checks)
3. Health checks that monitor CloudWatch Alarms (full control !!) – e.g., throttles of DynamoDB, alarms on RDS, custom metrics, … (helpful for private resources)
• Health Checks are integrated with CW metrics

[Diagram: an Amazon Route 53 DNS Record (latency, geoproximity, …) with one Health Check per region (us-east-1, eu-west-1); each region has an ALB in front of an Auto Scaling group of EC2 Instances.]

Route 53 – Calculated Health Checks
• Combine the results of multiple Health Checks into a single Health Check
• You can use OR, AND, or NOT
• Can monitor up to 256 Child Health Checks
• Specify how many of the health checks need to pass to make the parent pass
• Usage: perform maintenance to your website without causing all health checks to fail

[Diagram: Amazon Route 53 uses a Health Check (Parent) that aggregates three Health Checks (Child), each monitoring one EC2 Instance.]
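
The "how many children need to pass" rule can be sketched as a simple threshold. The function names are hypothetical, and this models only the aggregation logic, not the Route 53 API.

```python
MAX_CHILD_HEALTH_CHECKS = 256  # limit stated on the slide


def parent_is_healthy(children: list, required: int) -> bool:
    """Parent passes when at least `required` child checks are healthy."""
    if len(children) > MAX_CHILD_HEALTH_CHECKS:
        raise ValueError("a calculated health check monitors at most 256 children")
    return sum(1 for ok in children if ok) >= required


def health_or(children: list) -> bool:
    # OR: a single healthy child is enough.
    return parent_is_healthy(children, 1)


def health_and(children: list) -> bool:
    # AND: every child must be healthy.
    return parent_is_healthy(children, len(children))


# One of three instances is down for maintenance.
children = [True, False, True]
print(parent_is_healthy(children, 2))  # True
print(health_and(children))            # False
```

Setting the threshold to 2 of 3 is what lets one instance be taken down for maintenance without failing the parent check.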


Health Checks – Monitor an Endpoint
• About 15 global health checkers will check the endpoint health
• Health Checks pass only when the endpoint responds with the 2xx and 3xx status codes
• Health Checks can be setup to pass / fail based on the text in the first 5120 bytes of the response

[Diagram: Health Checkers in us-east-1, us-west-1 and sa-east-1 send an HTTP request to /health on an ALB in eu-west-1 (Auto Scaling group, EC2 Instance) and receive a 200 code. The endpoint must allow incoming requests from the Route 53 Health Checkers IP address range.]
https://ip-ranges.amazonaws.com/ip-ranges.json
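
The pass / fail rule above can be written as a small predicate. This is a local sketch of the rule as the slide states it (2xx or 3xx status, optional text match in the first 5120 bytes); the function and parameter names are assumptions.

```python
from typing import Optional

SEARCH_WINDOW_BYTES = 5120  # only this much of the body is inspected


def endpoint_passes(status_code: int, body: bytes,
                    search_string: Optional[str] = None) -> bool:
    """Evaluate one health checker's observation of an endpoint."""
    if not 200 <= status_code < 400:
        return False  # only 2xx and 3xx count as healthy
    if search_string is None:
        return True
    # Text matching only looks at the start of the response body.
    return search_string.encode() in body[:SEARCH_WINDOW_BYTES]


print(endpoint_passes(200, b"status: healthy", "healthy"))       # True
print(endpoint_passes(503, b"status: healthy", "healthy"))       # False
print(endpoint_passes(200, b"x" * 6000 + b"healthy", "healthy"))  # False
```

The last call fails because the expected text appears only after byte 5120.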
Health Checks – Private Hosted Zones
• Route 53 health checkers are outside the VPC
• They can’t access private endpoints (private VPC or on-premises resource)
• You can create a CloudWatch Metric and associate a CloudWatch Alarm, then create a Health Check that checks the alarm itself

[Diagram: a Health Checker (us-east-1) monitors a CloudWatch Alarm, which monitors a resource in a Private subnet of the VPC.]

Health Checks Solution Architecture
RDS multi-region failover
[Diagram: RDS Main (us-east-1) uses async replication to an RDS Read Replica (us-west-2).
Option 1: a Health check makes an HTTP call to a /health-db route.
Option 2: a CW Alarm is linked to the Health Check.
A CW Event linked to the CW Alarm (or an SNS topic) triggers the promotion of the Read Replica, followed by a DNS update.]

Route 53 – Hybrid DNS
• By default, Route 53 Resolver automatically answers DNS queries for:
• Local domain names for EC2 instances
• Records in Private Hosted Zones
• Records in public Name Servers
• Hybrid DNS – resolving DNS queries between VPC (Route 53 Resolver) and your networks (other DNS Resolvers)
• Networks can be:
• VPC itself / Peered VPC
• On-premises Network (connected through Direct Connect or AWS VPN)

[Diagram: within a Region, a VPC contains a Private Hosted Zone, the Route 53 Resolver and an EC2 Instance (ec2-192-0-2-44.compute-1.amazonaws.com); a Public Name Server sits outside the VPC.]

Route 53 – Resolver Endpoints
• Inbound Endpoint
• DNS Resolvers on your network can forward DNS queries to Route 53 Resolver
• Allows your DNS Resolvers to resolve domain names for AWS resources (e.g., EC2
instances) and records in Route 53 Private Hosted Zones

• Outbound Endpoint
• Route 53 Resolver conditionally forwards DNS queries to your DNS Resolvers
• Use Resolver Rules to forward DNS queries to your DNS Resolvers

• Associated with one or more VPCs in the same AWS Region
• Create in two AZs for high availability
• Each Endpoint supports 10,000 queries per second per IP address

Route 53 – Resolver Inbound Endpoints
[Diagram, us-east-1: a VPC with a Private Hosted Zone (aws.private), an EC2 Instance (app.aws.private), the Route 53 Resolver (IP: x.x.x.2) and a Resolver Inbound Endpoint with two ENIs (IP: 10.0.0.10 and IP: 10.0.1.10) spread across Private Subnet 1 and Private Subnet 2.
On-premises Data Center: a Server (web.onpremise.private) sends the DNS query "app.aws.private?" to the DNS Resolvers (onpremise.private). Their forwarding table maps Domain Name aws.private to Forward To 10.0.0.10 and 10.0.1.10, so the query travels over the VPN or DX connection to the Inbound Endpoint, where the Route 53 Resolver performs the lookup.]

Route 53 – Resolver Outbound Endpoints
[Diagram, us-east-1: a VPC with a Private Hosted Zone (aws.private). An EC2 Instance (app.aws.private) sends the DNS query "web.onprem.private?" to the Route 53 Resolver (IP: x.x.x.2). A Forwarding Rule (Domain Name: onprem.private, Target IP: 172.16.0.10) sends the query through the Resolver Outbound Endpoint ENIs (IP: 10.0.0.20 and IP: 10.0.1.20, in Private Subnet 1 and Private Subnet 2) over a VPN or DX connection or a NAT Gateway.
On-premises Data Center: the DNS Resolvers (onprem.private, IP: 172.16.0.10) resolve the name of the Server (web.onprem.private).]

Route 53 – Resolver Rules
• Control which DNS queries are forwarded to DNS Resolvers on your network
• Conditional Forwarding Rules (Forwarding Rules)
• Forward DNS queries for a specified domain and all its subdomains to target IP addresses
• System Rules
• Selectively override the behavior defined in Forwarding Rules (e.g., don’t forward DNS queries for a subdomain acme.example.com)
• Auto-defined System Rules
• Define how DNS queries for selected domains are resolved (e.g., AWS internal domain names, Private Hosted Zones)
• If multiple rules match, Route 53 Resolver chooses the most specific match

[Diagram: Resolver Outbound Endpoint with three rule tables.
Forwarding Rules (Domain Name, Target IP): example.com, 172.16.0.10; acme.example.com, 172.16.0.10
System Rules (Domain Name), which override Forwarding Rules: acme.example.com
Auto-defined System Rules (Domain Name): compute.amazonaws.com; ec2.internal]
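
"Most specific match" can be read as the longest matching domain suffix. The sketch below illustrates that reading with a hypothetical rule table modelled on the slide's example; it is not the Resolver's actual implementation.

```python
from typing import Optional, Tuple


def most_specific_rule(query: str, rules: dict) -> Optional[Tuple[str, tuple]]:
    """Return the rule whose domain is the longest suffix of the query name."""
    labels = query.rstrip(".").lower().split(".")
    # Try the full name first, then drop the leftmost label each time.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in rules:
            return candidate, rules[candidate]
    return None


# Hypothetical rules: forward example.com, but resolve acme.example.com locally.
rules = {
    "example.com": ("FORWARD", ["172.16.0.10"]),
    "acme.example.com": ("SYSTEM", []),
}

print(most_specific_rule("www.example.com", rules))
# ('example.com', ('FORWARD', ['172.16.0.10']))
print(most_specific_rule("api.acme.example.com", rules))
# ('acme.example.com', ('SYSTEM', []))
```

The second query matches both rules, and the more specific System Rule wins, so it is not forwarded.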

AWS Global Accelerator
• Leverage the AWS internal network to route to your application
• 2 Anycast IPs are created for your application
• The Anycast IPs send traffic directly to Edge Locations
• The Edge locations send the traffic to your application

[Diagram: users in America, Europe, Australia and India reach an Edge location, and traffic then travels over the Private AWS network to a Public ALB.]

AWS Global Accelerator
• Works with Elastic IP, EC2 instances, ALB, NLB, public or private
• Supports Client IP Address Preservation except for NLB and EIP endpoints
• Consistent Performance
• Intelligent routing to lowest latency and fast regional failover
• No issue with client cache (because the IP doesn’t change)
• Internal AWS network
• Health Checks
• Global Accelerator performs a health check of your applications
• Helps make your application global (failover less than 1 minute for unhealthy)
• Great for disaster recovery (thanks to the health checks)
• Security
• Only 2 external IPs need to be whitelisted
• DDoS protection thanks to AWS Shield

AWS Global Accelerator vs CloudFront
• They both use the AWS global network and its edge locations around the world
• Both services integrate with AWS Shield for DDoS protection.

• CloudFront
• Improves performance for both cacheable content (such as images and videos) and dynamic content (such as API acceleration and dynamic site delivery)
• Content is served at the edge

• Global Accelerator
• Improves performance for a wide range of applications over TCP or UDP
• Proxying packets at the edge to applications running in one or more AWS Regions.
• Good fit for non-HTTP use cases, such as gaming (UDP), IoT (MQTT), or Voice over IP
• Good for HTTP use cases that require static IP addresses
• Good for HTTP use cases that require deterministic, fast regional failover

Solution Architecture Comparisons
• EC2 on its own with Elastic IP
• EC2 with Route53
• ALB + ASG
• ALB + ECS on EC2
• ALB + ECS on Fargate
• ALB + Lambda
• API Gateway + Lambda
• API Gateway + AWS Service
• API Gateway + HTTP backend (ex: ALB)

EC2 with Elastic IP
• Quick failover
• The client should not see the change happen
• Helpful if the client needs to resolve by static Public IP address
• Does not scale
• Cheap

[Diagram: a User accesses the Public EC2 instance using its Public IP (Elastic IP Address); in case of DR, the Elastic IP is moved to a Standby Instance.]

Stateless web app - scaling horizontally
[Diagram: a DNS Query (A Record, TTL 1 hour) resolves to a Public EC2 instance with no Elastic IP.]

Stateless web app - scaling horizontally

• “DNS-based load balancing”
• Ability to use multiple instances
• Route53 TTL implies client may get outdated information
• Clients must have logic to deal with hostname resolution failures
• A newly added instance may not receive full traffic right away due to DNS TTL

[Diagram: DNS Query (A Record, TTL 1 hour).]
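
One way a client might "deal with hostname resolution failures" is to retry with a growing delay. This is only a sketch: the resolver is injected as a function so the example runs without network access, and every name here is hypothetical.

```python
import time
from typing import Callable, List


def resolve_with_retry(hostname: str,
                       resolve: Callable[[str], List[str]],
                       attempts: int = 3,
                       base_delay_s: float = 0.5) -> List[str]:
    """Call `resolve` until it returns addresses or attempts run out."""
    last_error = None
    for attempt in range(attempts):
        try:
            addresses = resolve(hostname)
            if addresses:
                return addresses
        except OSError as error:  # socket.gaierror is a subclass of OSError
            last_error = error
        if attempt < attempts - 1:
            # Exponential backoff between attempts.
            time.sleep(base_delay_s * 2 ** attempt)
    raise LookupError(f"could not resolve {hostname}") from last_error


def fake_resolver(hostname: str) -> List[str]:
    # Stand-in resolver returning a documentation-range address.
    return ["203.0.113.10"]


print(resolve_with_retry("app.example.com", fake_resolver))  # ['203.0.113.10']
```

In a real client the injected function could wrap `socket.getaddrinfo`; results may still be stale for up to the record's TTL.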

ALB + ASG
• Scales well, classic architecture
• New instances are in service right away.
• Users are not sent to instances that are out-of-service
• Time to scale is slow (EC2 instance startup + bootstrap) – AMI can help
• ALB is elastic but can’t handle sudden, huge peak of demand (pre-warm)
• Could lose a few requests if instances are overloaded
• CloudWatch used for scaling
• Cross-Zone balancing for even traffic distribution
• Target utilization should be between 40% and 70%

[Diagram: DNS Query (Alias Record, TTL 1 hour) to ALB + Health Checks + Multi AZ, in front of an Auto Scaling group spanning Availability zones 1 to 3.]

ALB + ECS on EC2 (backed by ASG)
• Same properties as ALB + ASG
• Application is run on Docker
• ASG + ECS allows to have dynamic port mappings
• Tough to orchestrate ECS service auto-scaling + ASG auto-scaling

[Diagram: DNS Query (Alias Record, TTL 1 hour) to ALB + Health Checks + Multi AZ, in front of an Auto Scaling group + ECS spanning Availability zones 1 to 3.]

ALB + ECS on Fargate
• Application is run on Docker
• Service Auto Scaling is easy
• Time to be in-service is quick (no need to launch an EC2 instance in advance)
• Still limited by the ALB in case of sudden peaks
• “serverless” application tier
• “managed” load balancer

[Diagram: DNS Query (Alias Record, TTL 1 hour) to Fargate + Service Auto Scaling across Availability zones 1 to 3.]

ALB + Lambda
• Limited to Lambda’s runtimes
• Seamless scaling thanks to Lambda
• Simple way to expose Lambda functions as HTTP/S without all the features from API Gateway
• Can combine with WAF (Web Application Firewall)
• Good for hybrid microservices
• Example: use ECS for some requests, use Lambda for others

[Diagram: DNS Query (Alias Record, TTL 1 hour) to ALB.]

API Gateway + Lambda
• Pay per request, seamless scaling, fully serverless
• Soft limits: 10000/s API Gateway, 1000 concurrent Lambda
• API Gateway features: authentication, rate limiting, caching, etc…
• Lambda Cold Start time may increase latency for some requests
• Fully integrated with X-Ray

[Diagram: client to Amazon API Gateway to AWS Lambda.]

API Gateway + AWS Service (as a proxy)
• Lower latency, cheaper
• Not using Lambda concurrent capacity, no custom code
• Expose AWS APIs securely through API Gateway
• SQS, SNS, Step Functions…
• Remember API Gateway has a payload limit of 10 MB (can be a problem for S3 proxy)

[Diagram, OK: client to Amazon API Gateway to AWS Lambda, which does a PUT to SQS.]
[Diagram, BETTER: client to Amazon API Gateway directly to SQS.]

API Gateway + HTTP backend (ex: ALB)
• Use API Gateway features on top of custom HTTP backend (authentication, rate control, API keys, caching…)
• Can connect to…
• on-premises service
• Application Load Balancer
• 3rd party HTTP service

[Diagram: client to Amazon API Gateway to HTTP Server (ex: ALB, on-prem).]

Storage Section

EBS
• Network drive you attach to ONE instance only
• Linked to a specific availability zone (transfer: snapshot => restore)
• Volumes can be resized
• Make sure you choose an instance type that is EBS optimized to enjoy maximum throughput

[Diagram: US-EAST-1A with three EBS volumes (10 GB, 100 GB, 50 GB).]

EBS Volume Types
• EBS Volumes come in 6 types
• gp2 / gp3 (SSD): General purpose SSD volume that balances price and performance for
a wide variety of workloads
• io1 / io2 (SSD) / io2 Block Express: Highest-performance SSD volume for mission-
critical low-latency or high-throughput workloads
• st1 (HDD): Low cost HDD volume designed for frequently accessed, throughput-
intensive workloads
• sc1 (HDD): Lowest cost HDD volume designed for less frequently accessed workloads

• EBS Volumes are characterized in Size | Throughput | IOPS (I/O Ops Per Sec)
• When in doubt always consult the AWS documentation – it’s good!
• Only gp2/gp3 and io1/io2 can be used as boot volumes

EBS Snapshots
• Incremental – only backup changed blocks
• EBS backups use IO, and you shouldn’t run them while your application is
handling a lot of traffic
• Snapshots will be stored in S3 (but you won’t directly see them)
• Not necessary to detach volume to do snapshot, but recommended
• Can copy snapshots across region (for DR)
• Can make Image (AMI) from Snapshot
• EBS volumes restored by snapshots need to be pre-warmed (use the Fast
Snapshot Restore FSR feature or fio/dd command to read the entire volume)
• Snapshots can be automated using Amazon Data Lifecycle Manager

EBS Multi-Attach – io1/io2 family
• Attach the same EBS volume to multiple EC2 instances in the same AZ
• Each instance has full read & write permissions to the volume
• Use case:
• Achieve higher application availability in clustered Linux applications (ex: Teradata)
• Applications must manage concurrent write operations
• Must use a file system that’s cluster-aware (not XFS, EXT4, etc…)

[Diagram: Availability Zone 1 with an io2 volume with Multi-Attach.]

Local EC2 Instance Store
• Physical disk attached to the physical server where your EC2 is
• Very High IOPS (because physical)
• Disks up to 7.5 TiB (can change over time), striped to reach 60 TiB (can change over time…)
• Block Storage (just like EBS)
• Cannot be increased in size
• Risk of data loss if hardware fails

Instance Store vs EBS
• Instance store is physically attached to the machine (ephemeral storage)
• EBS is a network drive (persistent)
• Pros:
• Better I/O performance (EBS gp2 has a max IOPS of 16000, io1 of 64000, io2
Block Express of 256000)
• Good for buffer / cache / scratch data / temporary content
• Data survives reboots
• Cons:
• On stop or termination, the instance store is lost
• You can’t resize the instance store
• Backups must be operated by the user

EFS – Elastic File System
• Managed NFS (network file system) that can be mounted on many EC2
• EFS works with EC2 instances in multi-AZ, & on–premises (DX & VPN)
• Highly available, scalable, expensive (3x gp2), pay per GB used
us-east-1a us-east-1b us-east-1c

EC2 Instances EC2 Instances EC2 Instances

Security Group

EFS FileSystem

EFS – Elastic File System
• Use cases: content management, web serving, data sharing, WordPress
• Compatible with Linux based AMI (not Windows), POSIX-compliant
• Uses NFSv4.1 protocol
• Uses security group to control access to EFS
• Encryption at rest using KMS
• Can only attach to one VPC, create one ENI (mount target) per AZ
• POSIX file system (~Linux) that has a standard file API
• File system scales automatically, pay-per-use, no capacity planning!

EFS – Performance & Storage Classes
• EFS Scale
• 1000s of concurrent NFS clients, 10 GB+ /s throughput
• Grow to Petabyte-scale network file system, automatically

• Performance mode (set at EFS creation time)
• General purpose (default): latency-sensitive use cases (web server, CMS, etc…)
• Max I/O: higher latency, higher throughput, highly parallel (big data, media processing)

• Throughput mode
• Bursting (1 TB = 50MiB/s + burst of up to 100MiB/s)
• Provisioned: set your throughput regardless of storage size, ex: 1 GiB/s for 1 TB storage

EFS – Storage Classes
• Storage Tiers (lifecycle management feature – move file after N days)
• Standard: for frequently accessed files
• Infrequent access (EFS-IA): cost to retrieve files, lower price to store. Enable EFS-IA with a Lifecycle Policy
• Availability and durability
• Regional: Multi-AZ, great for prod
• One Zone: One AZ, great for dev, backup enabled by default, compatible with IA (EFS One Zone-IA)

[Diagram: in an Amazon EFS File System, a Lifecycle Policy moves a file from EFS Standard to EFS IA after no access for 60 days.]

EFS - On-premises & VPC Peering
[Diagram: an On-premises Server connects through Direct Connect OR / AND Site-to-Site VPN (redundancy in DX / DX or DX / VPN) to the ENIs of Amazon EFS in a VPC in the AWS Cloud; an EC2 instance in another VPC connects through VPC peering.
NFS Mount Target by IPv4 (not DNS). Redundancy in mount target.]

EFS – Access Points
• Easily manage applications access to NFS environments
• Enforce a POSIX user and group to use when accessing the file system
• Restrict access to a directory within the file system and optionally specify a different root directory
• Can restrict access from NFS clients using IAM policies

[Diagram: an EFS File System whose root / contains /data, /secret and /config. Two sets of Users/Groups (Developers and Analytics) each go through their own access point with its own Permissions, so that / maps to /data for one and to /config for the other.
Access Point 1: UID: 1001, GID: 1001, Path: /config
Access Point 2: UID: 1002, GID: 1002, Path: /data]


EFS – File System Policies
• Resource-based policy to control access to EFS File Systems (same as S3
bucket policy)
• By default, it grants full access to all clients

Grant Read & Write Access to A specific IAM User
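
A policy of the kind this caption describes might look like the sketch below, built as a Python dictionary and printed as JSON. The account ID, user name and file system ID are placeholders, and the helper name is hypothetical; the two actions are the EFS client actions for mounting and writing.

```python
import json


def read_write_policy(user_arn: str, file_system_arn: str) -> dict:
    """Build a file system policy granting mount + write to one IAM user."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": [
                    "elasticfilesystem:ClientMount",  # read access
                    "elasticfilesystem:ClientWrite",  # write access
                ],
                "Resource": file_system_arn,
            }
        ],
    }


# Placeholder ARNs for illustration only.
policy = read_write_policy(
    "arn:aws:iam::111122223333:user/example-user",
    "arn:aws:elasticfilesystem:us-east-1:111122223333:file-system/fs-12345678",
)
print(json.dumps(policy, indent=2))
```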

S3 – Overview
• Object storage, serverless, unlimited storage, pay-as-you-go
• Good to store static content (image, video files)
• Access objects by key, no indexing facility
• Not a filesystem, cannot be mounted natively on EC2

• Anti patterns:
• Lots of small files
• POSIX file system (use EFS instead), file locks
• Search features, queries, rapidly changing data
• Website with dynamic content

S3 Storage Classes Comparison
                             | Standard | Intelligent-Tiering | Standard-IA      | One Zone-IA      | Glacier Instant Retrieval | Glacier Flexible Retrieval | Glacier Deep Archive
Durability                   | 99.999999999% == (11 9’s) for all storage classes
Availability                 | 99.99%   | 99.9%               | 99.9%            | 99.5%            | 99.9%                     | 99.99%                     | 99.99%
Availability Zones           | >= 3     | >= 3                | >= 3             | 1                | >= 3                      | >= 3                       | >= 3
Min. Storage Duration Charge | None     | None                | 30 Days          | 30 Days          | 90 Days                   | 90 Days                    | 180 Days
Min. Billable Object Size    | None     | None                | 128 KB           | 128 KB           | 128 KB                    | 40 KB                      | 40 KB
Retrieval Fee                | None     | None                | Per GB retrieved | Per GB retrieved | Per GB retrieved          | Per GB retrieved           | Per GB retrieved

• You can transition objects between tiers (or delete) using S3 Lifecycle Policies
https://aws.amazon.com/s3/storage-classes/
S3 – Replication
• Cross Region Replication (CRR)
• Same Region Replication (SRR)
• Combine with Lifecycle Policies
• Helpful to reduce latency
• Helpful for disaster recovery
• Helpful for security
• S3 bucket versioning must be enabled

[Diagram: Amazon S3 (us-east-1) replicates with CRR to Amazon S3 (us-west-2), where a Lifecycle Policy transitions objects to Glacier (us-west-2).]

S3 Event Notifications
• S3:ObjectCreated, S3:ObjectRemoved, S3:ObjectRestore, S3:Replication…
• Object name filtering possible (*.jpg)
• Use case: generate thumbnails of images uploaded to S3
• Can create as many “S3 events” as desired
• S3 event notifications typically deliver events in seconds but can sometimes take a minute or longer

[Diagram: Amazon S3 sends events to SNS, SQS or a Lambda Function.]

S3 Event Notifications with Amazon EventBridge
[Diagram: an Amazon S3 bucket sends all events to Amazon EventBridge, where rules route them to over 18 AWS services as destinations.]

• Advanced filtering options with JSON rules (metadata, object size, name...)
• Multiple Destinations – ex Step Functions, Kinesis Streams / Firehose…
• EventBridge Capabilities – Archive, Replay Events, Reliable delivery

S3 – Baseline Performance
• Amazon S3 automatically scales to high request rates, latency 100-200 ms
• Your application can achieve at least 3,500 PUT/COPY/POST/DELETE and
5,500 GET/HEAD requests per second per prefix in a bucket.
• There are no limits to the number of prefixes in a bucket.
• Example (object path => prefix):
• bucket/folder1/sub1/file => /folder1/sub1/
• bucket/folder1/sub2/file => /folder1/sub2/
• bucket/1/file => /1/
• bucket/2/file => /2/
• If you spread reads across all four prefixes evenly, you can achieve 22,000
requests per second for GET and HEAD
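
The 22,000 figure is simply 4 prefixes × 5,500 GET/HEAD requests per second. The short sketch below reproduces that arithmetic and the path-to-prefix mapping from the example; the helper names are hypothetical.

```python
GET_HEAD_PER_PREFIX = 5500          # requests per second, per prefix
PUT_COPY_POST_DELETE_PER_PREFIX = 3500


def prefix_of(object_path: str) -> str:
    """Map 'bucket/folder1/sub1/file' to the prefix '/folder1/sub1/'."""
    parts = object_path.split("/")
    middle = parts[1:-1]  # drop the bucket name and the object name
    return "/" + "".join(part + "/" for part in middle)


def aggregate_get_rate(object_paths: list) -> int:
    """Baseline GET/HEAD rate when reads are spread evenly over all prefixes."""
    prefixes = {prefix_of(path) for path in object_paths}
    return len(prefixes) * GET_HEAD_PER_PREFIX


paths = [
    "bucket/folder1/sub1/file",
    "bucket/folder1/sub2/file",
    "bucket/1/file",
    "bucket/2/file",
]
print(sorted(prefix_of(p) for p in paths))
# ['/1/', '/2/', '/folder1/sub1/', '/folder1/sub2/']
print(aggregate_get_rate(paths))  # 22000
```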

S3 Performance
• Multi-Part upload:
• recommended for files > 100MB, must use for files > 5GB
• Can help parallelize uploads (speed up transfers)
[Diagram: a BIG file is divided in parts, which are sent as parallel uploads to Amazon S3.]

• S3 Transfer Acceleration
• Increase transfer speed by transferring file to an AWS edge location which will forward the data to the S3 bucket in the target region
• Compatible with multi-part upload
[Diagram: File in USA, fast (public www) to Edge Location USA, fast (private AWS) to S3 Bucket Australia.]

S3 Performance – S3 Byte-Range Fetches
• Parallelize GETs by requesting specific byte ranges
• Better resilience in case of failures
• Can be used to speed up downloads
[Diagram: a File in S3 is split into Part 1, Part 2, … Part N, fetched with requests in parallel.]
• Can be used to retrieve only partial data (for example the head of a file)
[Diagram: a byte-range request fetches only the header (first XX bytes) of a File in S3.]
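
Splitting an object into byte ranges comes down to computing HTTP `Range` header values, which use inclusive start and end offsets. The sketch below only computes the header strings (the function name and sizes are illustrative); issuing the parallel GETs is left to whichever client is in use.

```python
def byte_ranges(object_size: int, part_size: int) -> list:
    """Return HTTP Range header values covering the whole object."""
    if object_size <= 0 or part_size <= 0:
        raise ValueError("sizes must be positive")
    ranges = []
    for start in range(0, object_size, part_size):
        end = min(start + part_size, object_size) - 1  # Range ends are inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges


# A 10-byte object fetched in 4-byte parts.
print(byte_ranges(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']

# Fetching only the first 100 bytes (for example a file header).
print(byte_ranges(100, 100)[0])  # bytes=0-99
```

If one range request fails, only that range needs to be retried, which is where the better resilience comes from.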

S3 Multi-Part Upload – Remove Incomplete Parts
[Diagram: a User sends a BIG file to an S3 Bucket with Parallel Uploads, which can leave an Incomplete Multi-Part Upload behind.]
• Use AWS CLI to list Incomplete Multi-part Uploads: aws s3api list-multipart-uploads
• Use an S3 Lifecycle Policy to abort & delete Incomplete Multi-part Uploads after X days
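
The lifecycle rule mentioned above could be expressed roughly as follows. This sketch only builds and prints the configuration document; the rule ID and the 7-day value are assumptions, and applying it to a bucket (for example through the S3 lifecycle configuration API) is not shown.

```python
import json


def abort_incomplete_uploads_rule(days: int) -> dict:
    """Lifecycle configuration that aborts multipart uploads left unfinished."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",  # hypothetical name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


configuration = abort_incomplete_uploads_rule(7)
print(json.dumps(configuration, indent=2))
```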

S3 Select & Glacier Select
• Retrieve less data using SQL by performing server side filtering
• Can filter by rows & columns (simple SQL statements)
• Less network transfer, less CPU cost client-side

[Diagram: the client asks Amazon S3 to "Get CSV with S3 Select"; Amazon S3 performs server-side filtering on the CSV file and sends back the filtered dataset.]
https://aws.amazon.com/blogs/aws/s3-glacier-select/

S3 Solution Architecture
Exposing Static Objects
[Diagram comparing ways to expose static objects behind CloudFront: an EC2 instance (with EBS or Instance Store), an ALB in front of an ASG (with EFS), and S3.]

S3 Solution Architecture
Indexing objects in DynamoDB

[Diagram: writes to Amazon S3 trigger a Lambda Function, which updates a DynamoDB Table that backs an API for object metadata.]
- Search by date
- Total storage used by a customer
- List of all objects with certain attributes
- Find all objects uploaded within a date range

Solution Architecture on AWS
Dynamic vs Static Content
[Diagram layers:
• DNS Layer: Route 53
• CDN Layer: CloudFront (dynamic and static content)
• Dynamic Content (REST, HTTP server): ALB + EC2, API Gateway + Lambda
• Caching / Session Layer: DAX & DynamoDB
• Database Layer: DynamoDB
• Static Assets Layer: S3 (upload with Pre-signed URL; S3 events trigger a Lambda Function that indexes objects)]
Amazon FSx – Overview
• Launch 3rd party high-performance file systems on AWS
• Fully managed service

FSx for Lustre
FSx for Windows File Server
FSx for NetApp ONTAP

Amazon FSx for Windows (File Server)
• EFS is a shared POSIX system for Linux systems.

• FSx for Windows is a fully managed Windows file system share drive
• Supports SMB protocol & Windows NTFS
• Microsoft Active Directory integration, ACLs, user quotas
• Built on SSD, scale up to 10s of GB/s, millions of IOPS, 100s PB of data
• Can be accessed from your on-premises infrastructure
• Can be configured to be Multi-AZ (high availability)
• Data is backed-up daily to S3

Amazon FSx for Lustre
• Lustre is a type of parallel distributed file system, for large-scale computing
• The name Lustre is derived from “Linux” and “cluster”

• Machine Learning, High Performance Computing (HPC)


• Video Processing, Financial Modeling, Electronic Design Automation
• Scales up to 100s GB/s, millions of IOPS, sub-ms latencies
• Seamless integration with S3
• Can “read S3” as a file system (through FSx)
• Can write the output of the computations back to S3 (through FSx)
• Can be used from on-premises servers

FSx File System Deployment Options
• Scratch File System
• Temporary storage
• Data is not replicated (doesn’t persist if file server fails)
• High burst (6x faster, 200MBps per TiB)
• Usage: short-term processing, optimize costs
[Diagram: in a Region, Compute instances in Availability Zone 1 and Availability Zone 2 reach FSx for Lustre (Scratch file system) through an ENI, with an S3 bucket as optional data repository.]

• Persistent File System
• Long-term storage
• Data is replicated within same AZ
• Replace failed files within minutes
• Usage: long-term processing, sensitive data
[Diagram: in a Region, Compute instances in Availability Zone 1 and Availability Zone 2 reach FSx for Lustre (Persistent file system) through an ENI, with an S3 bucket as optional data repository.]

AWS DataSync
• Move large amounts of data from on-premises to AWS
• Can synchronize to: Amazon S3 (any storage class, including Glacier), Amazon EFS, Amazon FSx for Windows
• Move data from your NAS or file system via NFS or SMB
• Replication tasks can be scheduled hourly, daily, weekly
• Leverage the DataSync agent to connect to your systems
• Can set up a bandwidth limit

AWS DataSync
NFS / SMB to AWS (S3, EFS, FSx for Windows)
[Diagram, on-premises: the AWS DataSync Agent reads from an NFS or SMB Server over NFS or SMB and sends the data over TLS to AWS DataSync in the Region.
AWS Storage Resources shown as destinations: S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, S3 One Zone-IA, S3 Glacier, S3 Glacier Deep Archive, AWS EFS, Amazon FSx for Windows File Server.]

AWS DataSync
EFS to EFS
[Diagram: Amazon EFS in the source Region is read by an EC2 instance with the DataSync Agent (in a VPC), which sends data to the AWS DataSync Service endpoint and on to Amazon EFS in the destination Region.]

AWS Transfer Family
• A fully-managed service for file transfers into and out of Amazon S3 or
Amazon EFS using the FTP protocol
• Supported Protocols
• AWS Transfer for FTP (File Transfer Protocol (FTP))
• AWS Transfer for FTPS (File Transfer Protocol over SSL (FTPS))
• AWS Transfer for SFTP (Secure File Transfer Protocol (SFTP))
• Managed infrastructure, Scalable, Reliable, Highly Available (multi-AZ)
• Pay per provisioned endpoint per hour + data transfers in GB
• Store and manage users’ credentials within the service
• Integrate with existing authentication systems (Microsoft Active Directory,
LDAP, Okta, Amazon Cognito, custom)
• Usage: sharing files, public datasets, CRM, ERP, …

AWS Transfer Family

[Diagram: Users (FTP client) connect, optionally through Route 53, to the AWS Transfer Family: AWS Transfer for SFTP, AWS Transfer for FTPS, or AWS Transfer for FTP (only within VPC). Users authenticate against MS Active Directory, LDAP, … The service uses an IAM Role to access Amazon S3 or Amazon EFS.]

AWS Transfer Family – Endpoint Types
• Public Endpoint
• IPs managed by AWS (subject to change, use DNS names)
• Can’t setup allow lists by source IP addresses
[Diagram: reached from the Internet, endpoint in the AWS Cloud.]

• VPC Endpoint with Internal Access
• Static private IPs
• Setup allow lists (SGs & NACLs)
[Diagram: reached from an EC2 Instance in the VPC, or from a Corporate Data Center over VPN or Direct Connect.]

• VPC Endpoint with Internet-facing Access
• Static private IPs
• Static public IPs (EIPs)
• Setup Security Groups
[Diagram: reached from the Internet through an EIP, from an EC2 Instance in the VPC, or from a Corporate Data Center over VPN or Direct Connect.]
Caching Section

AWS CloudFront
• Content Delivery Network (CDN)
• Improves read performance, content
is cached at the edge
• 300+ Points of Presence globally
(edge locations)
• DDoS protection, integration with
Shield, AWS Web Application
Firewall
• Can expose external HTTPS and
can talk to internal HTTPS backends
Source: https://aws.amazon.com/cloudfront/features/?nc=sn&loc=2

CloudFront – Origins
• S3 Bucket
• For distributing files
• Enhanced security with CloudFront Origin Access Identity (OAI)
• CloudFront can be used as an ingress (to upload files to S3)
• S3 Bucket configured as a website
• First, enable Static Website hosting on the bucket
• MediaStore Container & MediaPackage Endpoint
• To deliver Video On Demand (VOD) or live streaming video using AWS Media Services
• Custom Origin (HTTP)
• EC2 instance
• Elastic Load Balancer (CLB or ALB)
• API Gateway (for more control… otherwise use API Gateway Edge)
• Any HTTP backend you want

CloudFront – S3 as an Origin
[Diagram: clients on the public www reach Edge locations (Los Angeles, Mumbai, São Paulo, Melbourne); inside the AWS Cloud, the Edges reach the Origin (S3 bucket) over the private AWS network; access to the bucket is secured with an Origin Access Identity (OAI) + S3 bucket policy]

CloudFront – EC2 or ALB as an origin
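• EC2 instance as an origin: Edge Location (Public IPs) → EC2 Instance
• The EC2 instance must be Public
• Its Security group must allow the Public IP of Edge Locations
• ALB as an origin: Edge Location (Public IPs) → Application Load Balancer → EC2 Instances
• The Application Load Balancer must be Public
• Its Security group must allow the Public IP of Edge Locations
• The EC2 Instances can be Private
• Their Security group must allow the Security Group of the Load Balancer
• List of Edge Location Public IPs: http://d7uri8nf7uskq.cloudfront.net/tools/list-cloudfront-ips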
CloudFront vs S3 Cross Region Replication
• CloudFront:
• Global Edge network
• Files are cached for a TTL (e.g., a day)
• Great for static content that must be available everywhere

• S3 Cross Region Replication:


• Must be setup for each region you want replication to happen
• Files are updated in near real-time
• Read only
• Great for dynamic content that needs to be available at low latency in a few
regions

CloudFront Geo Restriction

• You can restrict who can access your distribution


• Allow list: Allow your users to access your content only if they're in one of the
countries on a list of approved countries.
• Block list: Prevent your users from accessing your content if they're in one of the
countries on a list of banned countries.

• The “country” is determined using a 3rd party Geo-IP database


• Use case: Copyright Laws to control access to content
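The allow list / block list decision can be sketched in a few lines. This is only an illustration of the rule, not how CloudFront implements it; the function name `is_allowed` and its parameters are hypothetical, and the country code is assumed to have been resolved already by the Geo-IP lookup.

```python
from typing import Iterable, Optional


def is_allowed(country: str,
               allow_list: Optional[Iterable[str]] = None,
               block_list: Optional[Iterable[str]] = None) -> bool:
    """Return True if a viewer located in `country` may access the content.

    Hypothetical helper: a distribution uses either an allow list or a
    block list, so the allow list is checked first when both are given.
    """
    code = country.upper()
    if allow_list is not None:
        # Allow list: only the approved countries get through.
        return code in {c.upper() for c in allow_list}
    if block_list is not None:
        # Block list: everyone gets through except the banned countries.
        return code not in {c.upper() for c in block_list}
    return True  # no geo restriction configured
```

For example, `is_allowed("fr", allow_list=["FR", "DE"])` is accepted, while `is_allowed("US", block_list=["US"])` is rejected.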

CloudFront Signed URL Diagram
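• Signed URL with expiration to control access to content in CloudFront
• Signed URLs are generated by an API call into CloudFront as a trusted signer
[Diagram: Client, Application Server, Amazon CloudFront (Edge Locations), OAI, Amazon S3]
1. Authentication + Authorization (Client → Application Server)
2. Generate Signed URL (Application Server → Amazon CloudFront, AWS SDK)
3. Signed URL (Application Server → Client)
4. Signed URL (Client → Edge Location; CloudFront reads from Amazon S3 through the OAI)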
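A signed URL carries a policy that states what may be accessed and until when. The sketch below builds such a custom policy and applies the URL-safe encoding CloudFront expects. It deliberately stops before the signing step, which needs the private key of the trusted signer and is normally left to the AWS SDK. The function names, the distribution domain, and the IP range are hypothetical.

```python
import base64
import json
from typing import Optional


def build_custom_policy(resource_url: str, expires_epoch: int,
                        source_ip_cidr: Optional[str] = None) -> str:
    """Build the JSON custom policy for a CloudFront signed URL."""
    condition = {"DateLessThan": {"AWS:EpochTime": expires_epoch}}
    if source_ip_cidr:
        # Optional filter on the viewer's IP range.
        condition["IpAddress"] = {"AWS:SourceIp": source_ip_cidr}
    policy = {"Statement": [{"Resource": resource_url, "Condition": condition}]}
    # Compact separators keep whitespace out of the policy document.
    return json.dumps(policy, separators=(",", ":"))


def cloudfront_b64(data: bytes) -> str:
    """Base64-encode, then swap the characters that are not URL-safe."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


policy = build_custom_policy(
    "https://d111111abcdef8.cloudfront.net/private/*",  # hypothetical distribution
    expires_epoch=1767225600,
    source_ip_cidr="203.0.113.0/24",
)
policy_param = cloudfront_b64(policy.encode("utf-8"))
```

The encoded policy would travel in the URL next to the signature and the key identifier; only the application server that holds the private key can produce a valid signature.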

CloudFront Signed URL vs
S3 Pre-Signed URL
• CloudFront Signed URL:
• Allow access to a path, no matter the origin
• Account wide key-pair, only the root can manage it
• Can filter by IP, path, date, expiration
• Can leverage caching features
• S3 Pre-Signed URL:
• Issue a request as the person who pre-signed the URL
• Uses the IAM key of the signing IAM principal
• Limited lifetime
[Diagram: with a Signed URL, the Client goes through the Edge location to the Origin; with a Pre-Signed URL, the Client accesses the origin directly]

CloudFront – Restrict Access to Application
Load Balancers and Custom Origins
• Prevent direct access to your ALB or Custom Origins (only access through CloudFront)
• First, configure CloudFront to add a Custom HTTP Header to requests it sends to the ALB
• Second, configure the ALB to only forward requests that contain that Custom HTTP
Header
• Keep the custom header name and value secret!
[Diagram: Users → Edge Location → Origins (Application Load Balancer → EC2 Instances, -- OR -- Custom Origin)]
Request from the Users to the Edge Location:
GET /index.html HTTP/1.1
Host: mywebsite.com
…
Request from the Edge Location to the origin (the ALB forwards it only if the header is present):
GET /index.html HTTP/1.1
Host: mywebsite.com
X-Custom-Header: djdfhsb12121
…
• You can also restrict access to CloudFront Public IP addresses ONLY:
https://d7uri8nf7uskq.cloudfront.net/tools/list-cloudfront-ips
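The check performed at the origin side can be sketched as follows. On an ALB this would be a listener rule rather than code; the Python below only illustrates the logic for a custom origin, reusing the header name and value from the slide. The function name `should_forward` is hypothetical.

```python
import hmac

# Shared secret configured both in CloudFront (custom origin header)
# and at the origin. The name and value are the ones shown on the slide.
SECRET_HEADER_NAME = "x-custom-header"
SECRET_HEADER_VALUE = "djdfhsb12121"


def should_forward(headers: dict) -> bool:
    """Accept a request only if it carries the secret custom header."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(SECRET_HEADER_NAME, "")
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(supplied.encode("utf-8"),
                               SECRET_HEADER_VALUE.encode("utf-8"))
```

A request that reaches the origin directly, without going through CloudFront, lacks the header and is rejected.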

CloudFront Caching
• Cache based on
• Headers
• Session Cookies
• Query String Parameters
• The cache lives at each CloudFront Edge Location
• You want to maximize the cache hit rate to minimize requests on the origin
• Control the TTL (0 seconds to 1 year), can be set by the origin using the Cache-Control
header, Expires header…
[Diagram: Client → Request → Edge Location (Check / Update cache, Based on Headers / Cookies, Expire based on TTL) → forwards → Origin]
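How an origin-supplied Cache-Control header turns into a TTL can be sketched as below. This is a simplification: it only looks at `s-maxage` and `max-age`, ignores directives such as `no-cache` or `no-store` and the Expires header, and the default values of the parameters are assumptions for the example. The function name `effective_ttl` is hypothetical.

```python
import re
from typing import Optional


def effective_ttl(cache_control: Optional[str],
                  min_ttl: int = 0,
                  default_ttl: int = 86400,
                  max_ttl: int = 31536000) -> int:
    """Return the number of seconds an object would stay cached.

    The origin's value is clamped between min_ttl and max_ttl; when the
    origin sends no usable directive, default_ttl applies.
    """
    if cache_control:
        value = cache_control.lower()
        # s-maxage targets shared caches, so it is checked before max-age.
        for directive in ("s-maxage", "max-age"):
            match = re.search(rf"(?:^|[\s,]){directive}\s*=\s*(\d+)", value)
            if match:
                return max(min_ttl, min(int(match.group(1)), max_ttl))
    return max(min_ttl, min(default_ttl, max_ttl))
```

With these assumed defaults, a header of `public, max-age=3600` gives a one-hour TTL, and a value above one year is capped at one year.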

CloudFront Caching – Whitelist Headers
Better caching because fewer header values
Request from the Client to the Edge Location:
GET /image/cat.jpg HTTP/1.1
Host: pics.mywebsite.com
User-Agent: Mozilla/5.0 (Mac OS X 10_15_2….)
Date: Tue, 28 Jan 2020 17:01:57 GMT
Authorization: SAPISIDHASH fdd00ecee39fe….
Keep-Alive: 300
Accept-Ranges: bytes
Request forwarded to the Origin after whitelisting:
GET /image/cat.jpg HTTP/1.1
Host: pics.mywebsite.com
Authorization: SAPISIDHASH fdd00ecee39fe….
Cache based on Host & Authorization
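Why whitelisting improves the hit rate can be shown with a toy cache key: only the whitelisted headers take part in the key, so requests that differ in any other header share one cached object. This is an illustrative sketch, not CloudFront's actual key format; `cache_key` and its parameters are hypothetical.

```python
import hashlib


def cache_key(method: str, path: str, headers: dict,
              whitelist=("host", "authorization")) -> str:
    """Derive a cache key from the request line and whitelisted headers only."""
    normalized = {name.lower(): value for name, value in headers.items()}
    parts = [method.upper(), path]
    for name in sorted(whitelist):
        # Headers outside the whitelist never reach the key.
        parts.append(f"{name}={normalized.get(name, '')}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


chrome = {"Host": "pics.mywebsite.com", "Authorization": "token-a",
          "User-Agent": "Mozilla/5.0", "Keep-Alive": "300"}
safari = {"Host": "pics.mywebsite.com", "Authorization": "token-a",
          "User-Agent": "Safari/604.1"}
same_object = cache_key("GET", "/image/cat.jpg", chrome) == \
    cache_key("GET", "/image/cat.jpg", safari)
```

Here `same_object` is true: the two browsers hit the same cache entry because User-Agent is not part of the key.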

CloudFront – Maximize cache hits by
separating static and dynamic distributions
[Diagram: CDN Layer (CloudFront) with separate paths for dynamic and static requests]
• Dynamic requests → Dynamic Content (REST, HTTP server): ALB + EC2, API Gateway + Lambda
• Cache based on correct headers and cookie
• Static requests → Static content
• No headers / session caching rules
• Required for maximizing cache hits

CloudFront Caching vs API Gateway Caching
[Diagram: three setups, with the User in us-east-2 and the API in eu-west-1]
• API Gateway Edge: User (us-east-2) → Edge Location (us-east-2) → API Gateway Edge (eu-west-1)
• Similar, with more control over the distribution: User (us-east-2) → CloudFront Edge (us-east-2) → API Gateway Regional (eu-west-1)
• With caching enabled at the edge: User (us-east-2) → CloudFront Edge (us-east-2) → API Gateway Regional (eu-west-1); Cache is optional

CloudFront – Customization At The Edge
• Many modern applications execute some form of logic at the edge
• Edge Function:
• Code that you write and attach to CloudFront distributions
• Runs close to your users to minimize latency
• Doesn’t cache anything, it is only used to change requests/responses
• CloudFront provides two types: CloudFront Functions & Lambda@Edge
• Use cases:
• Manipulate HTTP requests and responses
• Implement request filtering before reaching your application
• User authentication and authorization
• Generate HTTP responses at the edge
• A/B Testing
• Bot mitigation at the edge
• You don’t have to manage any servers, deployed globally

CloudFront Functions & Lambda@Edge
[Diagram: Client A, Client B, and Client C each connect to an Edge Location running CloudFront Functions; the Edge Locations connect to a Regional Edge Cache running Lambda@Edge Functions, which connects to the Origin]

CloudFront – CloudFront Functions
• Lightweight functions written in JavaScript
• For high-scale, latency-sensitive CDN customizations
• Sub-ms startup times, millions of requests/second
• Run at Edge Locations
• Process-based isolation
• Used to change Viewer requests and responses:
• Viewer Request: after CloudFront receives a request from a viewer
• Viewer Response: before CloudFront forwards the response to the viewer
• Native feature of CloudFront (manage code entirely within CloudFront)
[Diagram: Client ⇄ Viewer Request / Viewer Response ⇄ CloudFront ⇄ Origin Request / Origin Response ⇄ Origin]
CloudFront – Lambda@Edge
• Lambda functions written in NodeJS or Python
• Scales to 1000s of requests/second
• Runs at the nearest Regional Edge Cache
• VM-based isolation
• Used to change CloudFront requests and responses:
• Viewer Request – after CloudFront receives a request from a viewer
• Origin Request – before CloudFront forwards the request to the origin
• Origin Response – after CloudFront receives the response from the origin
• Viewer Response – before CloudFront forwards the response to the viewer
• Author your functions in one AWS Region (us-east-1), then CloudFront replicates to its locations
[Diagram: Client ⇄ Viewer Request / Viewer Response ⇄ CloudFront ⇄ Origin Request / Origin Response ⇄ Origin]
CloudFront Functions with Lambda@Edge
CloudFront Functions and Lambda@Edge can be used together

[Diagram: Client → Edge Location (Cache; CloudFront Functions on Viewer Request and Viewer Response) → Regional Edge Cache (Cache; Lambda@Edge Functions on Origin Request and Origin Response) → Origin]

NOTE: You can’t combine CloudFront Functions and Lambda@Edge in viewer events (viewer request & viewer response)

Using Lambda@Edge Only
Use when you need some of the capabilities of Lambda@Edge that aren’t available
with CloudFront Functions (e.g., longer execution time, network access, …)

[Diagram: Client → Edge Location (Lambda@Edge Functions on Viewer Request and Viewer Response) → Regional Edge Cache (Lambda@Edge Functions on Origin Request and Origin Response) → Origin]

CloudFront Functions vs. Lambda@Edge
Feature | CloudFront Functions | Lambda@Edge
Runtime Support | JavaScript | Node.js, Python
Execution Location | Edge Locations | Regional Edge Caches
CloudFront Triggers | Viewer Request/Response | Viewer Request/Response, Origin Request/Response
Isolation | Process-based | VM-based
Max. Execution Time | < 1 ms | 5 seconds (viewer triggers), 30 seconds (origin triggers)
Max. Memory | 2 MB | 128 MB (viewer triggers), 10 GB (origin triggers)
Total Package Size | 10 KB | 1 MB (viewer triggers), 50 MB (origin triggers)
Network Access, File System Access | No | Yes
Access to the Request Body | No | Yes
Pricing | Free tier available, 1/6th price of @Edge | No free tier, charged per request & duration

CloudFront Functions vs. Lambda@Edge –
Use Cases
CloudFront Functions
• Cache key normalization
• Transform request attributes (headers, cookies, query strings, URL) to create an
optimal Cache Key
• Header manipulation
• Insert/modify/delete HTTP headers in the request or response
• URL rewrites or redirects
• Request authentication & authorization
• Create and validate user-generated tokens (e.g., JWT) to allow/deny requests
Lambda@Edge
• Longer execution time (several ms)
• Adjustable CPU or memory
• Your code depends on 3rd-party libraries (e.g., AWS SDK to access
other AWS services)
• Network access to use external services for processing
• File system access or access to the body of HTTP requests
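The "create and validate tokens" use case boils down to a signature check plus an expiry check, with no network call. The sketch below shows that logic for an HS256 JWT using only the standard library. It is written in Python for illustration, whereas an actual CloudFront Function would be JavaScript; the function names and the shared secret are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop the base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Create an HS256 token (what the application would issue)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    mac = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256)
    return f"{header}.{payload}.{_b64url_encode(mac.digest())}"


def is_request_allowed(token: str, secret: bytes, now=None) -> bool:
    """Validate signature and expiry (what the edge function would check)."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
            return False
        expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return False
        claims = json.loads(_b64url_decode(payload_b64))
        current = time.time() if now is None else now
        return claims.get("exp", 0) > current
    except (ValueError, TypeError, AttributeError):
        return False  # malformed token: deny
```

Because everything is computed locally, this kind of check fits the CloudFront Functions limits; validating against Cognito or another external provider needs network access and therefore Lambda@Edge.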

CloudFront Functions vs. Lambda@Edge –
Authentication and Authorization
• CloudFront Functions
[Diagram: Client → request → Amazon CloudFront (Edge Location) → forward → Origin]
• The CloudFront Function intercepts the request to check authentication
and authorization (e.g., validate JWT tokens)
• Lambda@Edge
[Diagram: Client → request → Amazon CloudFront (Regional Edge Cache) → forward → Origin]
• The Lambda@Edge Function intercepts the request to check authentication
and authorization with Cognito or a 3rd party OIDC provider
Lambda@Edge: Loading content based on
User-Agent
[Diagram: requests → Amazon CloudFront (Regional Edge Cache, Lambda@Edge Function) → S3 Bucket (Origin)]
• The Lambda@Edge Function inspects the User-Agent HTTP Header and redirects based on Device Type
Desktop request, forwarded unchanged:
GET /images/cat-1920-1080.jpg
Host: pics.mywebsite.com
User-Agent: Mac OS Chrome/96.0.4664.110
…
Mobile request as received by CloudFront:
GET /images/cat-1920-1080.jpg
Host: pics.mywebsite.com
User-Agent: iPhone OS Safari/604.1
…
Mobile request as sent to the S3 Bucket (Origin):
GET /images/cat-640-320.jpg
Host: pics.mywebsite.com
User-Agent: iPhone OS Safari/604.1
…
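A minimal sketch of such a function in Python, assuming it receives the standard CloudFront request event (headers keyed by lowercase name, each holding a list of key/value pairs). The mobile detection pattern and the file naming convention are assumptions taken from the example requests, not a general device-detection method.

```python
import re

# Hypothetical, deliberately crude device detection for the example.
MOBILE_PATTERN = re.compile(r"iphone|android|mobile", re.IGNORECASE)


def lambda_handler(event, context):
    """Rewrite the image path for mobile devices before it reaches the origin."""
    request = event["Records"][0]["cf"]["request"]
    user_agent_headers = request["headers"].get("user-agent", [])
    user_agent = user_agent_headers[0]["value"] if user_agent_headers else ""
    if MOBILE_PATTERN.search(user_agent):
        # Assumed naming convention: <name>-<width>-<height>.<ext>
        request["uri"] = request["uri"].replace("-1920-1080.", "-640-320.")
    # Returning the request lets CloudFront continue processing it.
    return request
```

The desktop request passes through untouched, while the iPhone request for cat-1920-1080.jpg is served cat-640-320.jpg.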
Lambda@Edge – Global Application
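[Diagram: the Client loads the HTML website from an S3 Bucket (Static Website Hosting); dynamic API requests go to Amazon CloudFront, where a Lambda@Edge Function at the Regional Edge Cache queries data from DynamoDB; Cached Responses are returned from the Regional Edge Cache]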

CloudFront – HTTPS configuration and Host

[Diagram: viewer request (GET https://www.example.com HTTP/1.1, Host: www.example.com, [Headers…: Values…]) → CloudFront (HTTPS) → ALB (HTTPS)]
CloudFront (HTTPS):
- Hostname: www.example.com
- SSL Cert: www.example.com
- Origin: origin.example.com
ALB (HTTPS):
- Hostname: origin.example.com
- SSL Cert: origin.example.com
If Host header is forwarded:
- www.example.com won’t match origin.example.com
- CloudFront will refuse the request
If Host header is not forwarded:
- CloudFront will add a Host header value of the origin: origin.example.com
- Requests & Responses will work
Note: there are 2 SSL certificates to manage


CloudFront – HTTPS configuration and Host

Impossible configuration, as the CloudFront distribution will loop over itself
[Diagram: viewer request (GET https://www.example.com HTTP/1.1, Host: www.example.com, [Headers…: Values…]) → CloudFront (HTTPS) → ALB (HTTPS)]
CloudFront (HTTPS):
- Hostname: www.example.com
- SSL Cert: www.example.com
- Origin: www.example.com
ALB (HTTPS):
- Hostname: www.example.com
- SSL Cert: www.example.com

CloudFront – HTTPS configuration and Host
[Diagram: viewer request (GET https://www.example.com HTTP/1.1, Host: www.example.com, [Headers…: Values…]) → CloudFront (HTTPS) → ALB (HTTPS)]
CloudFront (HTTPS):
- Hostname: www.example.com
- SSL Cert: www.example.com
- Origin: origin.example.com
ALB (HTTPS):
- Hostname: origin.example.com
- SSL Cert: www.example.com
If Host header is forwarded:
- Host: www.example.com will match the www.example.com SSL certificate
- CloudFront will accept
- The correct Host value could be set by Lambda@Edge
If Host header is not forwarded:
- CloudFront will add a Host header value of the origin: origin.example.com
- origin.example.com will not match the SSL cert www.example.com
- The request will fail
Note: there is 1 SSL certificate to manage
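The rule behind these configurations can be reduced to one comparison: the Host value that reaches the origin must match the name on the origin's certificate. The sketch below encodes only that simplified rule (exact name match, no wildcard or alternative names); `origin_request_ok` and its parameters are hypothetical.

```python
def origin_request_ok(viewer_host: str, origin_domain: str,
                      origin_cert_name: str, forward_host_header: bool) -> bool:
    """Return True if the HTTPS request to the origin is expected to work."""
    # Forwarded: the origin sees the viewer's Host header.
    # Not forwarded: CloudFront sets Host to the origin domain name.
    host_sent_to_origin = viewer_host if forward_host_header else origin_domain
    return host_sent_to_origin == origin_cert_name


# Two certificates: the ALB certificate is issued for origin.example.com
two_certs_forwarded = origin_request_ok(
    "www.example.com", "origin.example.com", "origin.example.com", True)
two_certs_not_forwarded = origin_request_ok(
    "www.example.com", "origin.example.com", "origin.example.com", False)

# One certificate: the ALB also presents the www.example.com certificate
one_cert_forwarded = origin_request_ok(
    "www.example.com", "origin.example.com", "www.example.com", True)
one_cert_not_forwarded = origin_request_ok(
    "www.example.com", "origin.example.com", "www.example.com", False)
```

With two certificates only the non-forwarded case works; with a single shared certificate only the forwarded case works.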
Amazon ElastiCache Overview
• The same way RDS is to get managed Relational Databases…
• ElastiCache is to get managed Redis or Memcached
• Caches are in-memory databases with really high performance, low
latency
• Helps reduce load off of databases for read intensive workloads
• Helps make your application stateless
• AWS takes care of OS maintenance / patching, optimizations, setup,
configuration, monitoring, failure recovery and backups
• Using ElastiCache involves heavy application code changes

ElastiCache
Solution Architecture - DB Cache
• Applications query ElastiCache; if the data is not available, get it from RDS
and store it in ElastiCache.
• Helps relieve load in RDS
• Cache must have an invalidation strategy to make sure only the most
current data is used in there.
[Diagram: application → Amazon ElastiCache (Cache hit); on a Cache miss: Read from DB (Amazon RDS), then Write to cache]
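The cache hit / cache miss / write to cache flow is often called cache-aside. A minimal sketch, with plain dictionaries standing in for ElastiCache and RDS; the class name, the TTL value, and invalidating on update are assumptions chosen for the example, one possible invalidation strategy among several.

```python
import time


class CacheAside:
    """Cache-aside reads with TTL expiry and invalidation on write."""

    def __init__(self, database: dict, ttl_seconds: float = 60.0):
        self.database = database      # stands in for RDS
        self.cache = {}               # stands in for ElastiCache
        self.ttl_seconds = ttl_seconds
        self.db_reads = 0             # counts how often the DB is hit

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                                  # cache hit
        self.db_reads += 1                                   # cache miss
        value = self.database[key]                           # read from DB
        self.cache[key] = (value, now + self.ttl_seconds)    # write to cache
        return value

    def update(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)     # invalidate so stale data is not served
```

Repeated reads of the same key within the TTL reach the database only once, which is the load relief described above.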

ElastiCache
Solution Architecture – User Session Store
• User logs into any instance of the application
• The application writes the session data into ElastiCache
• The user hits another instance of our application
• The instance retrieves the data and the user is already logged in
[Diagram: User → application instances; one instance does Write session to Amazon ElastiCache, the other instances do Retrieve session]

ElastiCache – Redis vs Memcached
REDIS
• Multi AZ with Auto-Failover
• Read Replicas to scale reads and have high availability
• Persistent, Data Durability: Append Only File (AOF), backup and restore features
MEMCACHED
• Multi-node for partitioning of data (sharding)
• Non persistent
• No backup and restore
• Multi-threaded architecture
[Diagram: Redis: Replication; Memcached: sharding]

Handling Extreme Rates
[Diagram: Client → Route 53 (global) → CloudFront → ALB / API Gateway → Compute Layer → Cache and Database Layer]
• CloudFront – 100000 RPS
• API Gateway – 10000 RPS
• Compute Layer
• ASG, ECS – slow, bootstrap
• Fargate – faster
• Lambda – 1000 concurrent
• Cache
• Redis – < 200 nodes (replica+sharding)
• Memcached – 20 nodes (sharding)
• DAX – 10 nodes (primary + replicas)
• Database Layer
• RDS, Aurora, ElasticSearch – provisioned
• DynamoDB – auto scaling, on-demand
• SQS, SNS – unlimited
• SQS FIFO – 3000 RPS (with batching)
• Kinesis – 1 MB/s in, 2 MB/s out per shard
• EBS – 16k IOPS (gp2), 64k IOPS (io1)
• Instance Store – ~M IOPS
• EFS – General, Max IO
• S3 – 3500 PUT, 5500 GET per prefix /s
• CloudFront (edge)
• KMS limits if encrypted
• Considerations: Caching, TTL, Network, Computation, Cost, Latency
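Per-unit limits like these turn into simple sizing arithmetic. As a worked example, the sketch below estimates how many Kinesis shards a stream needs from the 1 MB/s in and 2 MB/s out per shard figures. It ignores any per-record limits, and the function name and sample throughputs are hypothetical.

```python
import math

SHARD_INGEST_MB_PER_S = 1.0   # per-shard write throughput from the slide
SHARD_EGRESS_MB_PER_S = 2.0   # per-shard read throughput from the slide


def kinesis_shards_needed(ingest_mb_per_s: float, egress_mb_per_s: float) -> int:
    """Smallest shard count that satisfies both the write and read throughput."""
    for_writes = math.ceil(ingest_mb_per_s / SHARD_INGEST_MB_PER_S)
    for_reads = math.ceil(egress_mb_per_s / SHARD_EGRESS_MB_PER_S)
    return max(1, for_writes, for_reads)


write_bound = kinesis_shards_needed(ingest_mb_per_s=5, egress_mb_per_s=4)
read_bound = kinesis_shards_needed(ingest_mb_per_s=1, egress_mb_per_s=10)
```

Writing 5 MB/s needs 5 shards even though reads alone would need 2; reading 10 MB/s needs 5 shards even though writes alone would need 1.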


Databases Section

DynamoDB – in short
• NoSQL database, fully managed, massive scale (1,000,000 rps)
• Similar to Apache Cassandra (can migrate to DynamoDB)
• No disk space to provision, max object size is 400 KB
• Capacity: provisioned (WCU, RCU, & Auto Scaling) or on-demand
• Supports CRUD (Create Read Update Delete)
• Read: eventually or strong consistency
• Supports transactions across multiple tables (ACID support)
• Backups available, point in time recovery
• Table classes: Standard and Infrequent Access (IA)

DynamoDB - Basics
• DynamoDB is made of tables
• Each table has a primary key (must be decided at creation time)
• Each table can have an infinite number of items (= rows)
• Each item has attributes (can be added over time – can be null)
• Maximum size of an item is 400KB
• Data types supported are:
• Scalar Types: String, Number, Binary, Boolean, Null
• Document Types: List, Map
• Set Types: String Set, Number Set, Binary Set

DynamoDB – Primary Keys

• Option 1: Partition key only (HASH)
• Partition key must be unique for each item
• Partition key must be “diverse” so that the data is distributed
• Example: user_id for a users table

user_id (Partition key, unique) | First Name (attribute) | Age (attribute)
12broiu45 | John | 46
dfi7503df | Katie | 31

DynamoDB – Primary Keys
• Option 2: Partition key + Sort Key
• The combination must be unique
• Data is grouped by partition key
• Sort key == range key
• Example: users-games table
• user_id for the partition key
• game_id for the sort key
• Example good sort key: timestamp

Primary key = Partition key + Sort Key
user_id (Partition key) | game_id (Sort Key) | Result (attribute)
12broiu45 | 1234 | win
12broiu45 | 3456 | lose
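The behavior of a partition key + sort key table can be modelled in memory: items are grouped by partition key, kept unique per (partition key, sort key) pair, and returned in sort key order when a partition is queried. This is a toy model for intuition only, not the DynamoDB API; the class and method names are hypothetical.

```python
from collections import defaultdict


class UsersGamesTable:
    """Toy model of a table keyed by user_id (partition) + game_id (sort)."""

    def __init__(self):
        # partition key -> {sort key -> item}
        self._partitions = defaultdict(dict)

    def put_item(self, user_id: str, game_id: str, **attributes):
        # Same (user_id, game_id) pair overwrites: the combination is unique.
        self._partitions[user_id][game_id] = {
            "user_id": user_id, "game_id": game_id, **attributes}

    def query(self, user_id: str) -> list:
        # A query targets one partition and returns items ordered by sort key.
        partition = self._partitions.get(user_id, {})
        return [partition[sort_key] for sort_key in sorted(partition)]


table = UsersGamesTable()
table.put_item("12broiu45", "3456", result="lose")
table.put_item("12broiu45", "1234", result="win")
games = [item["game_id"] for item in table.query("12broiu45")]
```

Here `games` comes back ordered by game_id regardless of insertion order, which is why a timestamp makes a good sort key.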

DynamoDB – Indexes
• Object = partition key + optional sort key + attributes
• LSI – Local Secondary Index
• Keep the same partition key
• Select an alternative sort key
• Must be defined at table creation time
• GSI – Global Secondary Index
• Change the partition key and optional sort key
• Can be defined after the table is created

• You can only query by PK + sort key on the main table & indexes (≠ RDS)
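The difference between the two index types shows up in the table definition. The dictionary below follows the parameter layout of the DynamoDB CreateTable call; the table name, the index names, and the `played_at` attribute are hypothetical, and nothing is sent to AWS.

```python
table_definition = {
    "TableName": "users-games",
    "AttributeDefinitions": [
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "game_id", "AttributeType": "S"},
        {"AttributeName": "played_at", "AttributeType": "N"},  # hypothetical attribute
    ],
    "KeySchema": [
        {"AttributeName": "user_id", "KeyType": "HASH"},    # partition key
        {"AttributeName": "game_id", "KeyType": "RANGE"},   # sort key
    ],
    # LSI: same partition key as the table, alternative sort key.
    "LocalSecondaryIndexes": [{
        "IndexName": "by-user-and-time",
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},
            {"AttributeName": "played_at", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    # GSI: different partition key, optional sort key.
    "GlobalSecondaryIndexes": [{
        "IndexName": "by-game",
        "KeySchema": [
            {"AttributeName": "game_id", "KeyType": "HASH"},
            {"AttributeName": "played_at", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    "BillingMode": "PAY_PER_REQUEST",
}


def partition_key(key_schema: list) -> str:
    """Return the attribute used as partition (HASH) key in a key schema."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")
```

With this layout, "all games of a user ordered by time" is served by the LSI, and "all players of a game" by the GSI, since queries must always name a partition key.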

DynamoDB – Important Features
• TTL: automatically expire row after a specified epoch date
• DynamoDB Streams:
• react to changes to DynamoDB tables in real time
• Can be read by AWS Lambda, EC2…
• 24 hours retention of data
• Global Tables: (cross region replication)
• Active Active replication, many regions
• Must enable DynamoDB Streams
• Useful for low latency, DR purposes
[Diagram: CRUD → Table → Streams → Lambda → Amazon ES (ElasticSearch) or Kinesis]
[Diagram: CRUD on Table (us-east-1) ⇄ replication ⇄ CRUD on Table (ap-southeast-2)]
Amazon Kinesis Data Streams for DynamoDB
• You can use Kinesis Data Streams to capture item-level changes in DynamoDB
• Custom and longer data retention period (vs. 24 hours in DynamoDB Streams)
[Diagram: DynamoDB Table → item-level changes → Kinesis Data Streams, which feeds:
- Kinesis Data Firehose → Store in Amazon S3, Redshift, OpenSearch, …
- Kinesis Data Analytics → Real-time computations (filter, aggregate, transform, …) → Kinesis Data Streams, Kinesis Data Firehose, Lambda]

DynamoDB Solution Architecture
Indexing objects in DynamoDB

[Diagram: writes → Amazon S3 → Lambda Function → DynamoDB Table]

API for object metadata:
- Search by date
- Total storage used by a customer
- List of all objects with certain attributes
- Find all objects uploaded within a date range
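What the Lambda function in this pattern extracts can be sketched without any AWS call. The code below turns an S3 event notification into metadata items and answers two of the listed questions on them. It assumes, for illustration only, that the first path segment of the object key identifies the customer; the function names are hypothetical, and writing the items to DynamoDB is left out.

```python
from urllib.parse import unquote_plus


def s3_event_to_items(event: dict) -> list:
    """Extract one metadata item per object from an S3 event notification."""
    items = []
    for record in event.get("Records", []):
        s3_object = record["s3"]["object"]
        key = unquote_plus(s3_object["key"])  # keys arrive URL-encoded
        items.append({
            "customer_id": key.split("/", 1)[0],  # assumed key layout
            "bucket": record["s3"]["bucket"]["name"],
            "object_key": key,
            "size_bytes": s3_object.get("size", 0),
            "uploaded_at": record["eventTime"],   # ISO 8601 timestamp
        })
    return items


def total_storage_by_customer(items: list) -> dict:
    totals = {}
    for item in items:
        totals[item["customer_id"]] = totals.get(item["customer_id"], 0) + item["size_bytes"]
    return totals


def uploaded_between(items: list, start: str, end: str) -> list:
    # ISO 8601 timestamps in the same timezone compare correctly as strings.
    return [item for item in items if start <= item["uploaded_at"] <= end]
```

In the real architecture these questions would be answered by queries on the DynamoDB table, with keys and indexes chosen to match them.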

DynamoDB - DAX
• DAX = DynamoDB Accelerator
• Seamless cache for DynamoDB, no application re-write
• Writes go through DAX to DynamoDB
• Microsecond latency for cached reads & queries
• Solves the Hot Key problem (too many reads)
• 5 minutes TTL for cache by default
• Up to 10 nodes in the cluster
• Multi AZ (3 nodes minimum recommended for
production)
• Secure (Encryption at rest with KMS, VPC, IAM,
CloudTrail…)
[Diagram: applications → DAX → Amazon DynamoDB tables]

DynamoDB – DAX vs ElastiCache

[Diagram: Client → DAX → DynamoDB (Individual objects cache, Query / Scan cache); Client → ElastiCache (Store Aggregation Result)]

Amazon OpenSearch (ex ElasticSearch)
• New name is Amazon OpenSearch
• ElasticSearch => OpenSearch
• Kibana => OpenSearch Dashboards

• Managed version of OpenSearch (open-source project, fork of ElasticSearch)


• Needs to run on servers (not a serverless offering)
• Use cases:
• Log Analytics
• Real Time application monitoring
• Security Analytics
• Full Text Search
• Clickstream Analytics
• Indexing

OpenSearch + OS Dashboards + Logstash
• OpenSearch (ex ElasticSearch): provide search and indexing capability
• You must specify instance types, multi-AZ, etc

• OpenSearch Dashboards (ex Kibana):


• Provide real-time dashboards on top of the data that sits in OpenSearch
• Alternative to CloudWatch dashboards (more advanced capabilities)

• Logstash:
• Log ingestion mechanism, use the “Logstash Agent”
• Alternative to CloudWatch Logs (you decide on retention and granularity)

OpenSearch patterns
DynamoDB
[Diagram: CRUD → DynamoDB Table → DynamoDB Stream → Lambda Function → Amazon OpenSearch]
• API to search items (Amazon OpenSearch)
• API to retrieve items (DynamoDB Table)

OpenSearch patterns
CloudWatch Logs
• Real time: CloudWatch Logs → Subscription Filter → Lambda Function (managed by AWS) → Amazon OpenSearch
• Near Real Time: CloudWatch Logs → Subscription Filter → Kinesis Data Firehose → Amazon OpenSearch

RDS
• Engines: PostgreSQL, MySQL, MariaDB, Oracle, Microsoft SQL Server
• Managed DB: provisioning, backups, patching, monitoring
• Launched within a VPC, usually in private subnet, control network access
using security groups (important when using Lambda)
• Storage by EBS (gp2 or io1), can increase volume size with auto-scaling
• Backups: automated with point-in-time recovery. Backups expire
• Snapshots: manual, can make copies of snapshots cross region
• RDS Events: get notified via SNS for events (operations, outages…)

RDS – Multi AZ & Read Replicas
• Multi-AZ: Standby instance for failover in case of outage
[Diagram: Application writes / reads → Master Instance (AZ – A) through One DNS name – automatic failover; SYNC replication → Standby instance (AZ – B)]
• Read Replicas: Increase read throughput. Eventual consistency.
Can be cross-region
[Diagram: Application writes / reads → RDS Instance; ASYNC replication → RDS Read Replica; Application reads from the Read Replica]

RDS – Security (reminder)
• KMS encryption at rest for underlying EBS volumes / snapshots
• Transparent Data Encryption (TDE) for Oracle and SQL Server
• SSL encryption to RDS is possible for all DB (in-flight)
• IAM authentication for MySQL and PostgreSQL
• Authorization still happens within RDS (not in IAM)
• Can copy an un-encrypted RDS snapshot into an encrypted one
• CloudTrail cannot be used to track queries made within RDS

About RDS for Oracle – Exam Tips
• Backups
• Use RDS Backups for backups & restore to Amazon RDS for Oracle
• Use Oracle RMAN (Recovery Manager) for backups & restore to non-RDS (restoring
an RMAN backup to RDS is not supported)
• Real Application Clusters (RAC)
• RDS for Oracle does NOT support RAC
• RAC is working on Oracle on EC2 Instances because you have full control
• RDS for Oracle supports Transparent Data Encryption (TDE) to encrypt
data before it’s written to storage
• DMS works on Oracle RDS
[Diagram: RDS DB Instance → backup → RDS Backup → restore → RDS DB Instance]
[Diagram: RDS DB Instance → backup → Oracle RMAN Backup → upload → S3 Bucket → restore → Oracle DB (external)]
[Diagram: Oracle DB (source, on-premises) → replicate/migrate → AWS DMS → RDS DB Instance (target, AWS Cloud)]

About RDS for MySQL
• You can use the native mysqldump to migrate a MySQL RDS DB to non-RDS
• The external MySQL database can run either on-premises in your data
center, or on an Amazon EC2 instance
[Diagram: DB Admin; RDS MySQL DB Instance (Source) in the AWS Cloud; MySQL DB Instance (Target) in the on-premises Data Center]
1. Export using mysqldump
2. Import using mysqldump
3. Start Replication
4. Stop Replication after completion

RDS Proxy for AWS Lambda
• When using Lambda functions with RDS, it
opens and maintains a database connection
• This can result in a “TooManyConnections”
exception
• With RDS Proxy, you no longer need code
that handles cleaning up idle connections
and managing connection pools
• Supports IAM authentication or DB
authentication, auto-scaling
• The Lambda function must have connectivity
to the Proxy (public proxy => public
Lambda, private proxy => Lambda in VPC)
[Diagram: Lambda functions → IAM Authentication → RDS Proxy (Public subnet, VPC) → Aurora DB Cluster (Private subnet, VPC)]

RDS Solution Architecture
Cross Region Failover
[Diagram: RDS Main (us-east-1) → Async replication → RDS Read Replica (us-west-2)]
• Health check on RDS Main
• Option 1: HTTP call to a /health-db route
• Option 2: CW Alarm
• CW Alarm linked to Health Check
• CW Event linked to CW Alarm (Or SNS topic) triggers:
• Promote Read Replicas
• Update DNS

Aurora
• DB Engines: PostgreSQL-compatible & MySQL-compatible
• Storage: automatically grows up to 128 TB, 6 copies of data, multi-AZ
• Read Replicas: up to 15 RR, reader endpoint to access them all
• Cross Region RR: entire database is copied (not select tables)
• Load / Offload data directly from / to S3: efficient use of resources
• Backup, Snapshots & Restore: same as RDS

Aurora High Availability and Read Scaling
• 6 copies of your data across 3 AZ:
• 4 copies out of 6 needed for writes
• 3 copies out of 6 needed for reads
• Self healing with peer-to-peer replication
• Storage is striped across 100s of volumes
• Automated failover for master in less than 30 seconds
• Master + up to 15 Aurora Read Replicas serve reads
• Support for Cross Region Replication
[Diagram: AZ 1, AZ 2, AZ 3 with one writer (W) and read replicas (R) on a Shared storage Volume: Replication + Self Healing + Auto Expanding]
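The quorum numbers explain what the cluster tolerates. A small worked example, assuming the 6 copies are spread evenly as 2 per AZ (an assumption for the arithmetic); the function name is hypothetical.

```python
TOTAL_COPIES = 6
COPIES_PER_AZ = 2      # assumed even spread over 3 AZ
WRITE_QUORUM = 4       # 4 copies out of 6 needed for writes
READ_QUORUM = 3        # 3 copies out of 6 needed for reads


def storage_availability(failed_copies: int) -> dict:
    """Tell whether writes and reads are still possible after losing copies."""
    healthy = TOTAL_COPIES - failed_copies
    return {"writes": healthy >= WRITE_QUORUM, "reads": healthy >= READ_QUORUM}


one_az_down = storage_availability(COPIES_PER_AZ)            # 4 copies left
one_az_plus_one = storage_availability(COPIES_PER_AZ + 1)    # 3 copies left
two_az_down = storage_availability(2 * COPIES_PER_AZ)        # 2 copies left
```

Losing a whole AZ leaves 4 copies, so reads and writes continue. Losing one more copy leaves 3, which still serves reads but not writes. With two AZ down, neither quorum is met.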

Aurora DB Cluster

[Diagram: client → Writer Endpoint (Pointing to the master, W) and Reader Endpoint (Connection Load Balancing across the read replicas R, with Auto Scaling); all instances use a Shared storage Volume, Auto Expanding from 10G to 128 TB]

Aurora Serverless
• Automated database instantiation and auto-scaling based on actual usage
• Good for infrequent, intermittent or unpredictable workloads
• No capacity planning needed
• Pay per second, can be more cost-effective
[Diagram: Client → Proxy Fleet (managed by Aurora) → Shared storage Volume]

Global Aurora
• Aurora Cross Region Read Replicas:
• Useful for disaster recovery
• Simple to put in place
• Aurora Global Database (recommended):
• 1 Primary Region (read / write)
• Up to 5 secondary (read-only) regions, replication lag is less than 1 second
• Up to 16 Read Replicas per secondary region
• Helps for decreasing latency
• Promoting another region (for disaster recovery) has an RTO of < 1 minute
[Diagram: us-east-1 - PRIMARY region (Applications: Read / Write) → replication → eu-west-1 - SECONDARY region (Applications: Read Only)]

Aurora Multi-Master
• In case you want immediate failover for write node (HA)
• Every node does R/W, vs promoting a RR as the new master
[Diagram: Client with Multiple DB Connections (for failover) to the nodes; the nodes Replicate to each other on a Shared Storage Volume]

Aurora Endpoints
• Endpoint = Host Address + Port

• Cluster Endpoint (Writer Endpoint)


• Connects to the current primary DB instance in the Aurora cluster
• Used for all write operations in the DB cluster (inserts, updates, deletes, and queries)
• Reader Endpoint
• Provides load-balancing for read only connections to all Aurora Replicas in the Aurora cluster
• Used only for read operations (queries)
• Custom Endpoint
• Represents a set of DB instances that you choose in the Aurora cluster
• Used when you want to connect to different subsets of DB instances with different capacities and
configurations (e.g., different DB parameter group)
• Instance Endpoint
• Connects to a specific DB instance in the Aurora cluster
• Used when you want to diagnose and fine-tune a specific DB instance

Aurora – Custom Endpoints
• Define a subset of Aurora Instances as a Custom Endpoint
• Example: Run analytical queries on specific replicas
• The Reader Endpoint is generally not used after defining Custom Endpoints

[Diagram: Client sends Queries to the Writer Endpoint (W) and Reader Endpoint, and Analytical Queries to a Custom Endpoint that covers a subset of the read replicas (R); instance sizes shown: db.r3.large, db.r5.2xlarge; all instances use a Shared Storage Volume]
Aurora Logs
• You can monitor the following types of Aurora MySQL log files:
• Error log
• Slow query log
• General log
• Audit log
• These log files are either downloaded or published to CloudWatch Logs
[Diagram: Amazon Aurora log files are either downloaded by the user or published to CloudWatch Logs]

Service Communications Section

AWS Step Functions
• Build serverless visual workflows to orchestrate your Lambda functions
• Represent flow as a JSON state machine
• Features: sequence, parallel, conditions, timeouts, error handling…
• Maximum execution time of 1 year
• Possibility to implement a human approval feature

• If you chain Lambda functions using Step Functions, be mindful of the added latency to pass the calls.
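To make "flow as a JSON state machine" concrete, here is a minimal sketch of a definition in Amazon States Language, built as a Python dict and serialized to JSON. The Lambda ARNs, state names and the dangling_targets helper are hypothetical; the definition chains two Lambda tasks and shows a timeout, a retry and a catch.

```python
import json

# Hypothetical Lambda ARNs, used only for illustration.
VALIDATE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:ValidateOrder"
CHARGE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:ChargeCard"

state_machine = {
    "Comment": "Two Lambda tasks in sequence, with timeout, retry and catch",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": VALIDATE_ARN,
            "TimeoutSeconds": 30,
            "Retry": [
                {
                    "ErrorEquals": ["States.Timeout"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "OrderFailed"}],
            "Next": "ChargeCard",
        },
        "ChargeCard": {"Type": "Task", "Resource": CHARGE_ARN, "End": True},
        "OrderFailed": {
            "Type": "Fail",
            "Error": "OrderError",
            "Cause": "Validation failed",
        },
    },
}


def dangling_targets(definition: dict) -> list:
    """Return transition targets (StartAt, Next, Catch.Next) that name no state."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    for state in states.values():
        if "Next" in state:
            targets.append(state["Next"])
        for catcher in state.get("Catch", []):
            targets.append(catcher["Next"])
    return sorted(t for t in set(targets) if t not in states)


# The JSON text is what would be supplied as the state machine definition.
definition_json = json.dumps(state_machine, indent=2)
```

The dangling_targets check is only a local sanity test of the sketch, not a replacement for validation by the service.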

Visual workflow in Step Functions

Step Function Integrations
• Optimized Integrations
• Can invoke a Lambda function
• Run an AWS Batch job
• Run an ECS task and wait for it to complete
• Insert an item into DynamoDB
• Publish a message to SNS, SQS
• Launch EMR, Glue, or SageMaker jobs
• Launch another Step Functions workflow…
• AWS SDK Integrations
• Access 200+ AWS services from your State Machine
• Works like a standard AWS SDK API call
[Diagram: Step Functions integrating with Lambda, Batch, ECS, DynamoDB, SNS, SQS]

Step Functions Workflow Triggers
• You can invoke a Step Functions Workflow (State Machine) using:
• AWS Management Console
• AWS SDK (StartExecution API call)
• AWS CLI (start-execution)
• AWS Lambda (StartExecution API call)
• API Gateway
• EventBridge trigger
• CodePipeline
• Step Functions
Step Functions – Sample Projects
• https://console.aws.amazon.com/states/home?region=us-east-1#/sampleProjects

Step Functions – Tasks
• Lambda Tasks:
• Invoke a Lambda function
• Activity Tasks:
• Activity worker (HTTP), EC2 Instances, mobile device, on-premises DC
• They poll the Step Functions service
• Service Tasks:
• Connect to a supported AWS service
• Lambda function, ECS Task, Fargate, DynamoDB, Batch job, SNS topic, SQS queue
• Wait Task:
• To wait for a duration or until a timestamp

• Note: Step Functions does not integrate natively with AWS Mechanical Turk

Step Functions – Solution Architecture
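[Diagram: Step Functions can be started by API Gateway (REST API call, service proxy), by a Lambda trigger, by a CloudWatch Events trigger, or through the AWS SDK / CLI; the diagram also shows DynamoDB and SQS next to Step Functions]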

AWS SWF – Simple Workflow Service
• Coordinate work amongst applications
• Code runs on EC2 (not serverless)
• 1 year max runtime
• Concept of “activity step” and “decision step”
• Has built-in “human intervention” step
• Example: order fulfilment from web to warehouse to delivery
• Step Functions is recommended to be used for new applications, except:
• If you need external signals to intervene in the processes
• If you need child processes that return values to parent processes
• If you need to use Amazon Mechanical Turk

SQS
• Serverless, managed queue, integrated with IAM
• Can handle extreme scale, no provisioning required
• Used to decouple services
• Message size of max 256 KB (use a pointer to S3 for large messages)
• Can be read from EC2 (optional ASG), Lambda
• SQS could be used as a write buffer for DynamoDB
• SQS FIFO:
• Receive messages in the order they were sent
• 300 messages/s without batching, 3,000 messages/s with batching

SQS – Solution Architecture
Idempotency
• Messages can be processed twice by consumer (in case of failures, timeouts, etc)
• To hedge against that problem, implement idempotency at the consumer level
• Means the same action done twice by the consumer won’t duplicate the effect
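The difference between a non-idempotent insert and an idempotent upsert can be shown with a small in-memory simulation. This is a sketch only: the list and dict stand in for a table, and the order_id key and message fields are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def insert_consumer(table: List[Message], message: Message) -> None:
    """Non-idempotent: every delivery appends a new row."""
    table.append(dict(message))


def upsert_consumer(table: Dict[str, Message], message: Message) -> None:
    """Idempotent: the row is keyed by order_id, so a redelivery overwrites it."""
    table[message["order_id"]] = dict(message)


message = {"order_id": "o-1", "status": "PAID"}
deliveries = [message, message]  # the same message delivered twice

rows: List[Message] = []
keyed_rows: Dict[str, Message] = {}
for delivered in deliveries:
    insert_consumer(rows, delivered)
    upsert_consumer(keyed_rows, delivered)

# rows now holds a duplicate, keyed_rows holds a single row for o-1.
```

Keying the write on a stable identifier carried by the message is what makes the second delivery harmless.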

[Diagram: an EC2 consumer long-polls a batch of messages from SQS, writes to DynamoDB, then deletes the message; an insert into DynamoDB is not idempotent, whereas an upsert into DynamoDB is idempotent]

Lambda – Event Source Mapping
SQS & SQS FIFO
• Event Source Mapping will poll SQS (Long Polling)
• Specify batch size (1-10 messages)
• Recommended: Set the queue visibility timeout to 6x the timeout of your Lambda function
• To use a DLQ
• set-up on the SQS queue, not Lambda (DLQ for Lambda is only for async invocations)
• Or use a Lambda destination for failures
[Diagram: the Lambda Event Source Mapping polls SQS, gets a batch returned, and invokes the Lambda function with the event batch]
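The two numeric settings above can be checked with a few lines of arithmetic. This is a sketch: the function names are hypothetical, and the limits are the ones quoted on the slide (batch size 1-10, visibility timeout 6x the function timeout).

```python
def recommended_visibility_timeout(function_timeout_s: int, factor: int = 6) -> int:
    """Queue visibility timeout suggested on the slide: 6x the Lambda timeout."""
    if function_timeout_s <= 0:
        raise ValueError("function timeout must be positive")
    return factor * function_timeout_s


def validate_batch_size(batch_size: int) -> int:
    """Accept only the 1-10 messages range quoted on the slide."""
    if not 1 <= batch_size <= 10:
        raise ValueError("batch size must be between 1 and 10")
    return batch_size
```

For a function with a 30 second timeout this gives a 180 second visibility timeout, which leaves room for retries before a message becomes visible again.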

SQS - Solution Architecture
Request / Response queue (async)
• Decoupling
• Fault-Tolerance
• Load Balancing
[Diagram: the client and the work processor communicate through an SQS Request Queue and an SQS Response Queue]

Amazon MQ
• SQS, SNS are “cloud-native” services, and they’re using proprietary protocols from AWS.
• Traditional applications running on-premises may use open protocols such as: MQTT, AMQP, STOMP, OpenWire, WSS
• When migrating to the cloud, instead of re-engineering the application to use SQS and SNS, we can use Amazon MQ
• Amazon MQ = managed Apache ActiveMQ
• Amazon MQ doesn’t “scale” as much as SQS / SNS
• Amazon MQ runs on a dedicated machine, can run in HA with failover
• Amazon MQ has both queue features (~SQS) and topic features (~SNS)

Amazon MQ – Re-platform
• IBM MQ, TIBCO EMS, RabbitMQ, and Apache ActiveMQ can be migrated to Amazon MQ

https://aws.amazon.com/blogs/compute/migrating-from-ibm-mq-to-amazon-mq-using-a-phased-approach/
AWS SNS
• What if you want to send one message to many receivers?
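[Diagram: Direct integration (the Buying Service calls email notification, the Fraud Service, the Shipping Service and an SQS Queue individually) vs Pub / Sub (the Buying Service publishes to an SNS Topic, which notifies email notification, the Fraud Service, the Shipping Service and an SQS Queue)]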

AWS SNS
• The “event producer” only sends messages to one SNS topic
• As many “event receivers” (subscriptions) as we want can listen to the SNS topic notifications
• Each subscriber to the topic will get all the messages (note: new feature to filter messages)
• Up to 10,000,000+ subscriptions per topic
• 100,000 topics limit
• Subscribers can be:
• SQS
• HTTP / HTTPS (with delivery retries – how many times)
• Lambda
• Emails
• SMS messages
• Mobile Notifications (SNS Mobile Push - Android, Apple, Fire OS, Windows…)
• Kinesis Data Firehose

SNS + SQS: Fan Out
• Push once in SNS, receive in many SQS queues
• Fully decoupled, no data loss, ability to add receivers of data later
• SQS allows for delayed processing, retries of work
• May have many workers on one queue and one worker on the other queue
[Diagram: the Buying Service publishes to an SNS Topic, which fans out to two SQS Queues read by the Fraud Service and the Shipping Service]

Amazon SNS – FIFO Topic
• FIFO = First In First Out (ordering of messages in the topic)

• Similar features as SQS FIFO:
• Ordering by Message Group ID (all messages in the same group are ordered)
• Deduplication using a Deduplication ID or Content Based Deduplication
• Can only have SQS FIFO queues as subscribers
• Limited throughput (same throughput as SQS FIFO)
[Diagram: a producer sends messages 1, 2, 3, 4 to the topic; SQS FIFO subscribers receive them in the same order]
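Ordering per Message Group ID and deduplication by Deduplication ID can be imitated locally. This is a simplified sketch with a hypothetical FifoTopicSketch class: it keeps deduplication IDs forever, whereas the real service only deduplicates within a limited time window.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class FifoTopicSketch:
    """In-memory imitation of FIFO ordering and deduplication (not the real service)."""

    def __init__(self) -> None:
        self._seen_dedup_ids: Set[str] = set()
        self._groups: DefaultDict[str, List[str]] = defaultdict(list)

    def publish(self, group_id: str, dedup_id: str, body: str) -> bool:
        """Accept the message unless its deduplication ID was already seen."""
        if dedup_id in self._seen_dedup_ids:
            return False
        self._seen_dedup_ids.add(dedup_id)
        # Messages of one group are kept in arrival order.
        self._groups[group_id].append(body)
        return True

    def receive(self, group_id: str) -> List[str]:
        """Return the messages of one group in the order they were accepted."""
        return list(self._groups.get(group_id, []))
```

Ordering is only guaranteed inside a group: two different group IDs are independent sequences.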

SNS FIFO + SQS FIFO: Fan Out
• In case you need fan out + ordering + deduplication

[Diagram: the Buying Service publishes to an SNS FIFO Topic, which fans out to two SQS FIFO Queues read by the Fraud Service and the Shipping Service]

Data Engineering Section

AWS Kinesis Overview
• Kinesis is a managed “data streaming” service
• Great for application logs, metrics, IoT, clickstreams
• Great for “real-time” big data
• Great for streaming processing frameworks (Spark, NiFi, etc…)
• Data is automatically replicated synchronously to 3 AZ

• Kinesis Streams: low latency streaming ingest at scale


• Kinesis Analytics: perform real-time analytics on streams using SQL
• Kinesis Firehose: load streams into S3, Redshift, ElasticSearch & Splunk

Kinesis
[Diagram: click streams, IoT devices, and metrics & logs flow into Amazon Kinesis Streams, then Amazon Kinesis Analytics, then Amazon Kinesis Firehose, which delivers to an Amazon S3 bucket and Amazon Redshift]

Kinesis Streams Overview
• Streams are divided in ordered Shards / Partitions
[Diagram: producers → Shard 1, Shard 2, Shard 3 → consumers]
• Data retention is 24 hours by default, can go up to 365 days


• Ability to reprocess / replay data
• Multiple applications can consume the same stream
• Real-time processing with scale of throughput
• Once data is inserted in Kinesis, it can’t be deleted (immutability)

Kinesis Streams Shards
• Two modes for capacity:
• On-demand: no capacity planning, Kinesis scales shards automatically
• Provisioned: you manage the shards over time
• Batching available or per message calls.
• The number of shards can evolve over time (reshard / merge)
• Records are ordered per shard
[Diagram: producers → Shard 1, Shard 2, Shard 3, Shard 4 → consumers]

Kinesis Producers & Consumers
KINESIS PRODUCERS
• AWS SDK: simple producer
• Kinesis Producer Library (KPL): batch, compression, retries, C++, Java
• Kinesis Agent:
• Monitors log files and sends them to Kinesis directly
• Can write to Kinesis Data Streams AND Kinesis Data Firehose
KINESIS CONSUMERS
• AWS SDK: simple consumer
• Lambda: (through Event source mapping)
• KCL: checkpointing, coordinated reads
[Diagram: KCL apps consume messages from Kinesis Data Streams (KDS) and checkpoint progress in Amazon DynamoDB]
Kinesis Data Streams Limits to know
• Producer:
• 1MB/s or 1000 messages/s at write PER SHARD
• “ProvisionedThroughputExceededException” otherwise
• Consumer Classic:
• 2MB/s at read PER SHARD across all consumers
• 5 API calls per second PER SHARD across all consumers
• Consumer Enhanced Fan-Out:
• 2MB/s at read PER SHARD, PER ENHANCED CONSUMER
• No API calls needed (push model)
• Data Retention:
• 24 hours data retention by default
• Can be extended up to 365 days
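The producer limits translate directly into a shard count estimate. This is a back-of-the-envelope sketch with a hypothetical shards_needed function; it treats 1 MB as 1000 KB, as the slides do, and only looks at the write side.

```python
import math

WRITE_MB_PER_SHARD = 1.0        # 1 MB/s at write per shard (from the slide)
WRITE_RECORDS_PER_SHARD = 1000  # 1000 messages/s at write per shard (from the slide)


def shards_needed(records_per_second: float, record_size_kb: float) -> int:
    """Smallest shard count that satisfies both per-shard write limits."""
    mb_per_second = records_per_second * record_size_kb / 1000
    by_bytes = math.ceil(mb_per_second / WRITE_MB_PER_SHARD)
    by_records = math.ceil(records_per_second / WRITE_RECORDS_PER_SHARD)
    return max(1, by_bytes, by_records)
```

With 3000 messages of 1 KB per second, both limits ask for 3 shards. With 500 messages of 4 KB per second, the byte limit dominates and asks for 2 shards.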

Kinesis Data Firehose
[Diagram: producers (applications, clients, SDK, KPL, Kinesis Agent, Kinesis Data Streams, Amazon CloudWatch (Logs & Events), AWS IoT) send records of up to 1 MB to Kinesis Data Firehose, with optional data transformation by a Lambda function; Firehose batch-writes to AWS destinations (Amazon S3, Amazon Redshift (COPY through S3), Amazon ElasticSearch), 3rd-party partner destinations (e.g., Datadog), and custom destinations (HTTP endpoint); all or failed data can go to an S3 backup bucket]

Kinesis Data Firehose
• Fully Managed Service, no administration, automatic scaling, serverless
• AWS: Redshift / Amazon S3 / ElasticSearch
• 3rd party partner: Splunk / MongoDB / DataDog / NewRelic / …
• Custom: send to any HTTP endpoint
• Pay for data going through Firehose
• Near Real Time
• 60 seconds latency minimum for non full batches
• Or minimum 1MB of data at a time
• Supports many data formats, conversions, transformations, compression
• Supports custom data transformations using AWS Lambda
• Can send failed or all data to a backup S3 bucket

Kinesis Data Firehose Delivery Diagram

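[Diagram: the source feeds the delivery stream; data transformation (several “blueprint” templates available) produces the output, which is delivered to an Amazon S3 output bucket and loaded into Amazon Redshift with COPY; source records, transformation failures, and delivery failures can be sent to another Amazon S3 bucket]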

Firehose Buffer Sizing
• Firehose accumulates records in a buffer
• The buffer is flushed based on time and size rules

• Buffer Size (ex: 32MB): if that buffer size is reached, it’s flushed
• Buffer Time (ex: 1 minute): if that time is reached, it’s flushed
• Firehose can automatically increase the buffer size to increase throughput

• High throughput => Buffer Size will be hit


• Low throughput => Buffer Time will be hit

• If real-time flush from Kinesis Data Streams to S3 is needed, use Lambda
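The size and time rules can be simulated to see which one fires for a given traffic pattern. This is a simplified sketch with a hypothetical simulate_flushes function; it is not how Firehose is implemented, and it ignores the automatic buffer size increase.

```python
from typing import List, Optional, Tuple


def simulate_flushes(
    records: List[Tuple[float, int]],
    buffer_size_bytes: int,
    buffer_time_s: float,
) -> List[Tuple[float, str, int]]:
    """records: (arrival_time_s, size_bytes), sorted by arrival time.

    Returns one (flush_time_s, reason, flushed_bytes) tuple per flush.
    """
    flushes: List[Tuple[float, str, int]] = []
    buffered = 0
    window_start: Optional[float] = None
    for arrival, size in records:
        # Time rule: the buffer expired before this record arrived.
        if buffered and arrival - window_start >= buffer_time_s:
            flushes.append((window_start + buffer_time_s, "time", buffered))
            buffered = 0
        if buffered == 0:
            window_start = arrival  # a new buffer window starts with this record
        buffered += size
        # Size rule: the buffer is full.
        if buffered >= buffer_size_bytes:
            flushes.append((arrival, "size", buffered))
            buffered = 0
    if buffered:
        # Whatever is left is flushed when its time window ends.
        flushes.append((window_start + buffer_time_s, "time", buffered))
    return flushes
```

Two 600-byte records one second apart against a 1000-byte buffer produce a "size" flush, while two 100-byte records 90 seconds apart against a 60-second buffer produce two "time" flushes.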

Kinesis Data Streams vs Firehose
Kinesis Data Streams
• Streaming service for ingest at scale
• Write custom code (producer / consumer)
• Real-time (~200 ms)
• Manage scaling (shard splitting / merging)
• Data storage for 1 to 365 days
• Supports replay capability

Kinesis Data Firehose
• Load streaming data into S3 / Redshift / ES / 3rd party / custom HTTP
• Fully managed
• Near real-time (buffer time min. 60 sec)
• Automatic scaling
• No data storage
• Doesn’t support replay capability

Kinesis Analytics, Conceptually…

[Diagram: Kinesis Data Streams and Kinesis Data Firehose feed Kinesis Data Analytics, which sends results to analytics tools and outputs]

Kinesis Analytics, In more depth…
[Diagram: input stream(s) from Kinesis Data Streams or Kinesis Data Firehose, plus a reference table from Amazon S3, enter Kinesis Data Analytics, which runs a query such as SELECT STREAM ItemID, count(*) FROM SourceStream GROUP BY ItemID; results go to output stream(s) (Kinesis Data Streams, then Kinesis consumers; or Kinesis Data Firehose, then Amazon S3 or Redshift), and errors go to an error stream]

Kinesis Data Analytics
• Use cases
• Streaming ETL: select columns, make simple transformations, on streaming data
• Continuous metric generation: live leaderboard for a mobile game
• Responsive analytics: look for certain criteria and build alerting (filtering)

• Features
• Pay only for resources consumed (but it’s not cheap)
• Serverless; scales automatically
• Use IAM permissions to access streaming source and destination(s)
• SQL or Flink to write the computation
• Schema discovery
• Lambda can be used for pre-processing

Full Data Engineering Pipeline
Real-Time Layer
[Diagram components: producers, Amazon Kinesis Data Streams, Amazon Kinesis Data Analytics, AWS Lambda, Amazon EC2, Amazon Kinesis Data Firehose, Amazon S3, Amazon Redshift, Amazon Elasticsearch Service]

Streaming Architectures
3000 messages of 1 KB per second
Kinesis (Kinesis Data Streams → AWS Lambda)
• 3 shards: 3MB/s in
• 3 * $0.015/hr = $32.4/mth
• Must use KDF for output to S3
DynamoDB + Streams (DynamoDB → DynamoDB Streams → AWS Lambda)
• 3000 WCU = 3 MB/s
• = $1,450.90 / month
• Storage in DynamoDB
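The Kinesis figure above is plain shard-hour arithmetic, reproduced here as a sketch. The $0.015 per shard-hour price is the one quoted on the slide, the 30-day month is an assumption, and only the shard-hour term shown on the slide is computed.

```python
SHARD_PRICE_PER_HOUR = 0.015  # USD, as quoted on the slide
HOURS_PER_MONTH = 24 * 30     # assumption: a 30-day month


def kinesis_monthly_shard_cost(shards: int) -> float:
    """Shard-hour cost per month in USD, rounded to cents."""
    return round(shards * SHARD_PRICE_PER_HOUR * HOURS_PER_MONTH, 2)


monthly_cost = kinesis_monthly_shard_cost(3)  # 3 shards for 3 MB/s in
```

3 shards * $0.015 * 720 hours gives the $32.4 per month shown on the slide, which is why Kinesis is far cheaper than 3000 WCU of DynamoDB for this ingest pattern.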
Comparison Charts
Attribute | Kinesis Data Streams | SQS | SQS FIFO | SNS | DynamoDB | S3
Data | Immutable | Immutable | Immutable | Immutable | Mutable | Mutable
Retention | 1-365 days, export to S3 using KDF | 1-14 days | 1-14 days | No retention | Infinite or can implement TTL | Infinite, can setup lifecycle policies
Ordering | Per shard | No ordering | Per group-id | No ordering | No ordering | No ordering
Scalability | Provision shards | Soft limit | 300 msg/s, or 3000 if batch | Soft limit | WCU & RCU, On-demand | Infinite, 3500 PUT / 5500 GET per prefix
Readers | EC2, Lambda, KDF, KDA, KCL (checkpoint) | EC2, Lambda | EC2, Lambda | HTTP, Lambda, Email, SQS… | DynamoDB Streams | SDK, S3 Events
Latency | KDS (200 ms), KDF (1 min) | Low (10-100ms) | Low (10-100ms) | Low (10-100ms) | Low (10-100ms) | Low (10-100ms)

AWS Batch
• Run batch jobs as Docker images
• Two options:
1. Run on AWS Fargate (fully serverless offering)
2. Dynamic provisioning of the instances (EC2 & Spot Instances) – in VPC
• Optimal quantity and type based on volume and requirements
• No need to manage clusters, fully serverless
• You just pay for the underlying resources used
• Example: batch process of images, running thousands of concurrent jobs
• Schedule Batch Jobs using CloudWatch Events
• Orchestrate Batch Jobs using AWS Step Functions

AWS Batch – Solution Architecture
[Diagram: a user uploads a file to Amazon S3; an event notification to AWS Lambda (API call), or an event trigger through CloudWatch Events / Amazon EventBridge, starts an AWS Batch job; Batch runs on Amazon ECS (EC2 Instance, Spot Instance (Spot Fleet)) or AWS Fargate, pulls Docker images from Amazon ECR, retrieves the file, writes the processed file to Amazon S3, and stores metadata in Amazon DynamoDB]
Batch vs. Lambda
• Lambda:
• Time limit
• Limited runtimes (built in runtimes, or Docker images built for Lambda)
• Limited temporary disk space
• Serverless

• Batch:
• No time limit
• Any runtime as long as it’s packaged as a Docker image
• Rely on EBS / instance store for disk space
• Relies on EC2 (can be managed by AWS) or AWS Fargate

AWS Batch – Compute Environments
• Managed Compute Environment:
• AWS Batch manages the capacity and instance types within the environment
• You can choose EC2 On-Demand or Spot Instances
• You can choose Fargate On-Demand or Fargate Spot Instances
• You can set a maximum price for Spot Instances
• Launched within your own VPC
• If you launch within your own private subnet, make sure it has access to the ECS service
• Either using a NAT gateway / instance or using VPC Endpoints for ECS

• Unmanaged Compute Environment


• You control and manage EC2 instance configuration, provisioning and scaling

AWS Batch – Managed Compute Environment
[Diagram: jobs are added through the SDK to an AWS Batch Job Queue; AWS Batch (set min & max vCPU) distributes jobs to Spot Instances (m5.large, c5.xlarge, r5.2xlarge) that are automatically created to respond to an increase / decrease in jobs]

AWS Batch – Multi Node Mode
• Multi Node: large scale, good for HPC (high performance computing)
• Leverages multiple EC2 / ECS instances at the same time
• Good for tightly coupled workloads
• Represents a single job, and specifies how many nodes to create for the job
• 1 main node, and many child nodes
• Does not work with Spot Instances
• Works better if your EC2 launch mode is a placement group “cluster”
[Diagram: a job is submitted to AWS Batch, which launches and manages EC2 instances (one main node) in a cluster placement group: same rack, same AZ]

Amazon EMR
• EMR stands for “Elastic MapReduce”
• EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
• The clusters can be made of hundreds of EC2 instances
• Also supports Apache Spark, HBase, Presto, Flink…
• EMR takes care of all the provisioning and configuration of EC2
• Auto-scaling with CloudWatch
• Use cases: data processing, machine learning, web indexing, big data…

EMR – Integrations
[Diagram: an EMR cluster of EC2 instances in a VPC (single AZ); EBS volumes (HDFS) provide temporary storage; S3 through EMRFS (native integration) provides permanent storage with server-side encryption; the cluster can also read from DynamoDB]

Amazon EMR – Node types & purchasing
• Master Node: Manage the cluster, coordinate, manage health
• Core Node: Run tasks and store data
• Task Node (optional): Just to run tasks
• Purchasing options:
• On-demand: reliable, predictable, won’t be terminated
• Reserved (min 1 year): cost savings (EMR will automatically use if available)
• Spot Instances: cheaper, can be terminated, less reliable

• Can have long-running cluster, or transient (temporary) cluster


• One big cluster vs many smaller ones? Long running vs transient?

Amazon EMR – Instance Configuration
• Uniform instance groups: select a single instance type and purchasing option for each node (has auto scaling)
• Instance fleet: select target capacity, mix instance types and purchasing options (no Auto Scaling)

Running Jobs on AWS
• Strategy 1: Provision EC2 instance (long running - CRON jobs)
• Strategy 2: CloudWatch Events + Lambda (cron schedule)
• Strategy 3: Reactive Workflow (CW Events, S3 Events, API Gateway, SQS, SNS, etc…)
• Strategy 4: use AWS Batch (CW Events → Batch)
• Strategy 5: use Fargate (CW Events → Fargate)
• Strategy 6: Use EMR (step execution or cluster)

AWS Glue
• Managed extract, transform, and load (ETL) service
• Useful to prepare and transform data for analytics
• Fully serverless service
[Diagram: Glue ETL extracts data from an S3 Bucket and Amazon RDS, transforms it, and loads it into a Redshift Data Warehouse]

Glue Data Catalog
• Glue Data Catalog: catalog of datasets

[Diagram: the AWS Glue Data Crawler crawls Amazon S3, Amazon RDS, Amazon DynamoDB, and JDBC sources and writes metadata (databases and tables) to the AWS Glue Data Catalog; the catalog is used by Glue Jobs (ETL) and for data discovery by Amazon Athena, Amazon Redshift Spectrum, and Amazon EMR]
Redshift Overview
• Redshift is based on PostgreSQL, but it’s not used for OLTP
• It’s OLAP – online analytical processing (analytics and data warehousing)
• 10x better performance than other data warehouses, scale to PBs of data
• Columnar storage of data (instead of row based)
• Massively Parallel Query Execution (MPP)
• Pay as you go based on the instances provisioned
• Has a SQL interface for performing the queries
• BI tools such as AWS Quicksight or Tableau integrate with it

Redshift Continued…
• Data is loaded from S3, Kinesis Firehose, DynamoDB, DMS…
• Based on node type: up to 100+ nodes, up to 16 TB of space per node
• Can provision multiple nodes, but it’s not Multi-AZ
• Leader node: for query planning, results aggregation
• Compute node: for performing the queries, send results to leader
• Backup & Restore, Security VPC / IAM / KMS, Monitoring
• Redshift Enhanced VPC Routing: COPY / UNLOAD goes through VPC
• Redshift is provisioned, so it’s worth it when you have a sustained usage
(use Athena if the queries are sporadic instead)

Redshift – Snapshots & DR
• Snapshots are point-in-time backups of a cluster, stored internally in S3
• Snapshots are incremental (only what has changed is saved)
• You can restore a snapshot into a new cluster
• Automated: every 8 hours, every 5 GB, or on a schedule. Set retention
• Manual: snapshot is retained until you delete it

• You can configure Amazon Redshift to automatically copy snapshots (automated or manual) of a cluster to another AWS Region

Cross-Region Snapshot Copy for a KMS-Encrypted Redshift
[Diagram: a Redshift snapshot encrypted with KMS Key A in the source Region (us-east-1) is copied to the destination Region (eu-west-2), where the snapshot is encrypted with KMS Key B; a snapshot copy grant enables Redshift to perform encryption operations in the destination Region]

Redshift Spectrum Query
• Query data that is already in S3 without loading it
• Must have a Redshift cluster available to start the query
• The query is then submitted to thousands of Redshift Spectrum nodes
[Diagram: a query such as SELECT COUNT (*), … FROM S3.EXT_TABLE GROUP BY … arrives over JDBC/ODBC at the Amazon Redshift cluster (leader node and compute nodes), which hands it to Redshift Spectrum nodes 1, 2, …, N that read from Amazon S3]

Redshift Workload Management (WLM)
• Enables you to flexibly manage queries’ priorities within workloads
• Example: prevent short, fast-running queries from getting stuck behind long-running queries
• Define multiple query queues (Superuser queue, User-defined queues)
• Route queries to the appropriate queues at runtime
• Automatic WLM – queues and resources managed by Redshift
• Manual WLM – queues and resources managed by you
[Diagram: system queries from an admin and short-running and long-running queries from users enter Amazon Redshift, where WLM routes them to the Superuser Queue, Short-running Queue, and Long-running Queue]
Redshift Concurrency Scaling
• Enables you to provide consistently fast performance with virtually unlimited concurrent users and queries
• Redshift automatically adds additional cluster capacity (Concurrency-Scaling Cluster) to process an increase in requests
• Ability to decide which queries are sent to the Concurrency-Scaling Cluster using WLM
• Charged per second
[Diagram: users query Amazon Redshift; requests are served by the nodes of the Redshift Cluster and of the Concurrency-Scaling Cluster]

DocumentDB
• Aurora is an “AWS-implementation” of PostgreSQL / MySQL …
• DocumentDB is the same for MongoDB (which is a NoSQL database)
• MongoDB is used to store, query, and index JSON data

• Similar “deployment concepts” as Aurora


• Fully Managed, highly available with replication across 3 AZ
• DocumentDB storage automatically grows in increments of 10GB

• Automatically scales to workloads with millions of requests per second

Athena & Quicksight
• Athena:
• Serverless SQL queries on top of your data in S3, pay per query, output to S3
• Supports CSV, JSON, Parquet, ORC, etc…
• Queries are logged in CloudTrail (which can be chained with CloudWatch logs)
• Great for sporadic queries
• Ready-to-use queries for VPC Flow Logs, CloudTrail, ALB Access Logs, Cost and
Usage reports (billing), etc…
• Quicksight:
• Business Intelligence tool for data visualization, creating dashboards
• Integrates with Athena, Redshift, EMR, RDS…

Athena
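[Diagram: Athena queries data in Amazon S3 using the Glue Data Catalog, writes results to Amazon S3, and feeds Quicksight; queries are logged in CloudTrail, streamed to CloudWatch Logs, and a Metric Filter can drive CloudWatch Alarms]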


Quicksight

[Diagram: Quicksight connects to RDS / JDBC, Redshift, Athena, and Amazon S3]

Full Data Engineering Pipeline
Analytics layer
• Hadoop / Spark / Hive…: Amazon EMR
• Data Warehousing: Redshift / Redshift Spectrum
• Serverless: Amazon Athena
• Visualization: Amazon QuickSight

Big Data Ingestion Pipeline
[Diagram: IoT devices send data for real-time ingestion into Amazon Kinesis Data Streams, then Amazon Kinesis Data Firehose (with AWS Lambda) delivers every 1 minute to an Amazon S3 bucket; an optional Amazon Simple Queue Service queue triggers AWS Lambda, which runs Amazon Athena; Athena pulls data from the ingestion bucket and writes to a reporting bucket in Amazon S3, which feeds Amazon QuickSight and Amazon Redshift (not serverless)]
Comparison of warehousing technologies
• EMR
• Need to use Big Data tools such as Apache Hive, Spark
• One long-running cluster, many jobs, with auto-scaling, or one cluster per job?
• Purchasing options – Spot, On Demand, Reserved Instances
• Can access data in DynamoDB and / or S3
• Scratch data on EBS disks (HDFS) and long term storage in S3 (EMRFS)
• Athena
• Simple queries and aggregations, data must live in S3
• Serverless, simple SQL queries, out-of-the-box queries for many services (cost & billing..)
• Audit queries through CloudTrail
• Redshift
• Advanced SQL queries, must provision servers
• Can leverage Redshift Spectrum for serverless queries on S3

Monitoring Section

CloudWatch
• CloudWatch Metrics
• Provided by many AWS services
• EC2 standard: 5 minutes, detailed monitoring: 1 minute
• EC2 RAM is not a built-in metric
• Can create custom metrics: standard resolution 1 minute, high resolution 1 sec
• CloudWatch Alarms
• Can trigger actions: EC2 action (reboot, stop, terminate, recover), Auto Scaling, SNS
• Alarm events can be intercepted by CloudWatch Events
• CloudWatch Dashboards
• Display metrics and alarms
• Can show metrics of multiple regions

CloudWatch Alarms integrations

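[Diagram: a CloudWatch Alarm can trigger an EC2 Action (Stop, Terminate, Reboot, Recover), Auto Scaling, or SNS (which can forward to SQS or Lambda); the alarm can also raise a CloudWatch Event whose targets include Kinesis, Step Functions, and Lambda]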
CloudWatch Events
• Intercept events from AWS services
• Example: EC2 Instance Start, CodeBuild Failure, S3, Trusted Advisor
• Can intercept any API call with CloudTrail integration

• Notable targets:
• Compute: Lambda, Batch, ECS task
• Orchestration: Step Functions, CodePipeline, CodeBuild
• Integration: SQS, SNS, Kinesis Data Streams, Kinesis Data Firehose
• Maintenance: SSM, EC2 Actions

AWS CloudWatch Logs - Sources
• SDK, CloudWatch Logs Agent, CloudWatch Unified Agent
• Elastic Beanstalk: collection of logs from application
• ECS: collection from containers
• AWS Lambda: collection from function logs
• VPC Flow Logs: VPC specific logs
• API Gateway
• CloudTrail based on filter
• CloudWatch log agents: for example on EC2 machines
• Route53: Log DNS queries

CloudWatch Logs
• Log groups: arbitrary name, usually representing an application
• Log stream: instances within application / log files / containers
• Can define log expiration policies (never expire, 30 days, etc..)
• Optional KMS encryption
• CloudWatch Logs can send logs to:
• Amazon S3 (exports)
• Kinesis Data Streams
• Kinesis Data Firehose
• AWS Lambda
• ElasticSearch

CloudWatch Logs Metric Filter & Insights
• CloudWatch Logs can use filter expressions
• For example, find a specific IP inside of a log
• Or count occurrences of “ERROR” in your logs
• Metric filters can be used to trigger alarms

• CloudWatch Logs Insights can be used to query logs and add queries to
CloudWatch Dashboards
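
The filter-then-alarm idea above can be sketched locally. This is only an in-memory imitation, not the CloudWatch API: the sample events, the one-minute period, and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample events: (epoch seconds, message). Not from a real log group.
SAMPLE_EVENTS = [
    (0, "INFO request served to 10.0.0.7"),
    (12, "ERROR upstream timeout"),
    (45, "ERROR upstream timeout"),
    (70, "INFO request served to 10.0.0.9"),
    (95, "ERROR disk full"),
]


def find_events(events, term):
    """Return the messages that contain `term` (for example an IP address)."""
    return [message for _, message in events if term in message]


def metric_datapoints(events, term, period_seconds=60):
    """Count matching events per period; each entry stands in for one metric datapoint."""
    counts = Counter()
    for timestamp, message in events:
        if term in message:
            counts[timestamp // period_seconds] += 1
    return dict(counts)


def alarm_state(datapoints, threshold):
    """Return "ALARM" when any period has more matches than `threshold`, else "OK"."""
    return "ALARM" if any(count > threshold for count in datapoints.values()) else "OK"


error_datapoints = metric_datapoints(SAMPLE_EVENTS, "ERROR")
print(find_events(SAMPLE_EVENTS, "10.0.0.7"))
print(error_datapoints, alarm_state(error_datapoints, threshold=1))
```

A real metric filter publishes the count as a CloudWatch metric, and the alarm is then defined on that metric rather than on the logs themselves.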

CloudWatch Logs – S3 Export
• S3 buckets must be encrypted with AES-256 (SSE-S3), not SSE-KMS
• Log data can take up to 12 hours to become available for export
• The API call is CreateExportTask
• Not near-real time or real-time… use Logs Subscriptions instead
[Diagram: CloudWatch Logs → Amazon S3]

CloudWatch Logs Subscriptions
[Diagram: CloudWatch Logs → Subscription Filter, with these destinations:]
• Lambda Function (managed by AWS) → Amazon ES (real time)
• Kinesis Data Firehose → Amazon S3 (near real time)
• Kinesis Data Streams → KDF, KDA, EC2, Lambda…
• Lambda Function (custom)

CloudWatch Logs Aggregation
Multi-Account & Multi Region
[Diagram: CloudWatch Logs in Account A / Region 1, Account B / Region 2, and Account B / Region 3 each pass through a Subscription Filter into a shared Kinesis Data Streams → Kinesis Data Firehose → Amazon S3 pipeline (near real time)]

CloudWatch Logs Agent & Unified Agent
• For virtual servers (EC2 instances, on-premises servers…)
• CloudWatch Logs Agent
• Old version of the agent
• Can only send to CloudWatch Logs
• CloudWatch Unified Agent
• Collect additional system-level metrics such as RAM, processes, etc…
• Collect logs to send to CloudWatch Logs
• Centralized configuration using SSM Parameter Store
• Batch Sends
• batch_count: number of log events to send (default 10000, min 1)
• batch_duration: duration of batching for log events (default & min is 5000ms)
• batch_size: max size of log events in a batch (default & max is 1 MB)
• Neither agent can send logs to Kinesis
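
To make the three batch settings concrete, here is a minimal sketch of a batcher that flushes when any one limit is reached. It is one plausible reading of how the limits interact, not the agent's actual implementation; the `LogBatcher` class is hypothetical, and the duration limit is only checked when a new event arrives (a real agent would presumably also flush on a timer).

```python
class LogBatcher:
    """Toy batcher: flush on event count, elapsed time, or total size."""

    def __init__(self, batch_count=10000, batch_duration_ms=5000, batch_size_bytes=1048576):
        self.batch_count = batch_count
        self.batch_duration_ms = batch_duration_ms
        self.batch_size_bytes = batch_size_bytes
        self.events = []        # events waiting in the current batch
        self.size = 0           # bytes in the current batch
        self.started_ms = None  # arrival time of the first event in the batch
        self.sent = []          # batches that were "sent" (kept for inspection)

    def add(self, message, now_ms):
        encoded_length = len(message.encode("utf-8"))
        # Size limit: send what we have before the batch would grow past it.
        if self.events and self.size + encoded_length > self.batch_size_bytes:
            self.flush()
        if not self.events:
            self.started_ms = now_ms
        self.events.append(message)
        self.size += encoded_length
        # Count limit or duration limit reached: send the batch now.
        too_many = len(self.events) >= self.batch_count
        too_old = now_ms - self.started_ms >= self.batch_duration_ms
        if too_many or too_old:
            self.flush()

    def flush(self):
        if self.events:
            self.sent.append(list(self.events))
            self.events = []
            self.size = 0
            self.started_ms = None


by_count = LogBatcher(batch_count=3)
for offset, text in enumerate(["a", "b", "c"]):
    by_count.add(text, now_ms=offset * 10)
print(by_count.sent)
```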

AWS X-Ray
Visual analysis of our applications

X-Ray
• Tracing requests across your microservices (distributed systems)
• Integrations with:
• EC2 – install the X-Ray agent
• ECS – install the X-Ray agent or Docker container
• Lambda
• Beanstalk - agent is automatically installed
• API Gateway – helpful to debug errors (such as 504)

• The X-Ray agent or services need IAM permissions to X-Ray

Deployment and Instance
Management Section

AWS Elastic Beanstalk Overview
• Elastic Beanstalk is a developer centric view of deploying an application
on AWS

• It uses all the components we’ve seen before: EC2, Auto Scaling Group, Elastic Load Balancers, RDS, etc…
• But it’s all in one view that’s easy to make sense of!
• We still have full control over the configuration of each component

• Beanstalk is free but you pay for the underlying instances

Elastic Beanstalk
• Support for many platforms:
• Go
• Java SE
• Java with Tomcat
• .NET on Windows Server with IIS
• Node.js
• PHP
• Python
• Ruby
• Packer Builder
• Single Container Docker
• Multicontainer Docker
• Preconfigured Docker
• If not supported, you can write your custom platform (advanced)
• Beanstalk is great to “Replatform” your application from on-premises to the cloud

Elastic Beanstalk
• Managed service
• Instance configuration / OS is handled by Beanstalk
• Deployment strategy is configurable but performed by Elastic Beanstalk

• Just the application code is the responsibility of the developer

• Three architecture models:


• Single Instance deployment: good for dev
• LB + ASG: great for production or pre-production web applications
• ASG only: great for non-web apps in production (workers, etc..)

Beanstalk Environments
[Diagram, three panels:]
• Single Instance (great for dev): one Availability Zone, Elastic IP, EC2 Instance, RDS Master
• High Availability with Load Balancer (great for prod): ALB across Availability Zone 1 and Availability Zone 2, Auto Scaling Group of EC2 Instances, RDS Master and RDS Standby
• Worker Tier = SQS + EC2: PUT into an SQS Queue, consumed by an Auto Scaling Group of EC2 Instances

Web Server vs Worker Environment
• If your application performs tasks that are long to complete, offload these tasks to a dedicated
worker environment
• Decoupling your application into two tiers is common
• Example: processing a video, generating a zip file, etc
• You can define periodic tasks in a file cron.yaml

[Diagram: requests → ALB → Auto Scaling Group (Web Tier = ELB + EC2) → PUT → SQS Queue → Auto Scaling Group (Worker Tier = SQS + EC2)]

Elastic Beanstalk Deployment
Blue / Green
• Not a “direct feature” of Elastic Beanstalk
• Zero downtime and release facility
• Create a new “stage” environment and deploy v2 there
• The new environment (green) can be validated independently and roll back if issues
• Route 53 can be setup using weighted policies to redirect a little bit of traffic to the stage environment
• Using Beanstalk, “swap URLs” (DNS swap) when done with the environment test
[Diagram: Web traffic → Amazon Route 53 → 90% to Environment “blue” (v1) and 10% to Environment “green” (v2)]

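
The 90/10 weighted split used for a blue/green test can be simulated in a few lines. This is a local sketch only and makes no Route 53 calls; the environment names, the seed, and the request count are assumptions chosen for illustration.

```python
import random


def pick_environment(weights, rng):
    """Choose one environment name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=1)[0]


def simulate(weights, requests, seed=0):
    """Route `requests` simulated requests and count how many reach each environment."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    counts = {name: 0 for name in weights}
    for _ in range(requests):
        counts[pick_environment(weights, rng)] += 1
    return counts


# During the test phase: a little traffic goes to the new (green) environment.
test_phase = simulate({"blue-v1": 90, "green-v2": 10}, requests=10_000)
# After validation: the equivalent of sending everything to green.
after_swap = simulate({"blue-v1": 0, "green-v2": 100}, requests=100)
print(test_phase, after_swap)
```

Weighted records only steer new DNS lookups, so real traffic follows the weights approximately and with some delay from cached answers.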
AWS OpsWorks
• Chef & Puppet help you perform server configuration automatically, or
repetitive actions
• They work great with EC2 & On Premise VM
• AWS OpsWorks = Managed Chef & Puppet
• It’s an alternative to AWS SSM

• If you’re already using cookbooks (Chef) on premise, OpsWorks is good
• Migrating from other tech to OpsWorks or vice versa is not easy

Quick word on Chef / Puppet
• They help with managing configuration as code
• Helps in having consistent deployments
• Works with Linux / Windows
• Can automate: user accounts, cron, ntp, packages, services…

• They leverage “Recipes”, “Cookbooks” or ”Manifests”

• Chef / Puppet have similarities with SSM / Beanstalk / CloudFormation but they’re open-source tools that work cross-cloud

OpsWorks Architecture
[Diagram: an OpsWorks Stack, fed by a Cookbook Repository and an App Repository (Applications), made of OpsWorks Layers:]
• Elastic Load Balancer Layer: ALB
• Application Server Layer: App Server Instances (EC2)
• Database Layer: Database Server (RDS)

AWS CodeDeploy
• We want to deploy our application automatically to many EC2 instances
• These instances are not managed by Elastic Beanstalk
• There are several ways to handle deployments using open source tools (Ansible, Terraform, Chef, Puppet, etc…)
• We can use the managed Service AWS CodeDeploy
• CodeDeploy can deploy to: EC2, ASG, ECS & Lambda
[Diagram: a fleet of instances upgraded from v1 to v2]

CodeDeploy to EC2
• Define how to deploy the application using appspec.yml + deployment strategy
• Will do in-place update to your fleet of EC2 instances
• Can use hooks to verify the deployment after each deployment phase
[Diagram: four instances move from v1 to v2 one half at a time (Half, then Other Half)]

CodeDeploy to ASG
• In place updates:
• Updates current existing EC2 instances
• Instances newly created by an ASG will also get automated deployments
• Blue / green deployment:
• A new auto-scaling group is created (settings are copied)
• Choose how long to keep the old instances
• Must be using an ELB
[Diagram: Blue/Green deployment run by CodeDeploy: the ALB fronts two Auto Scaling Groups of EC2 Instances, one on Launch Template v1 and one on Launch Template v2]

CodeDeploy to AWS Lambda
• Traffic Shifting feature
• Pre and Post traffic hooks features to validate deployment (before the traffic shift starts and after it ends)
• Easy & automated rollback using CloudWatch Alarms
• SAM framework natively uses CodeDeploy
[Diagram: a deployment (CICD) triggers CodeDeploy, which shifts traffic on the Lambda Alias from v1 to v2, runs tests in a Pre-Traffic Hook and a Post-Traffic Hook Lambda Function, and is monitored by a CloudWatch Alarm]

CodeDeploy to ECS
• Support for Blue/Green deployments for Amazon ECS and AWS Fargate
• Setup is done within the ECS service definition
• A new task set is created, and traffic is re-routed to the new task set
• Then if everything is stable for X minutes, the old task set is terminated (so you have time to notice issues)
[Diagram: Blue/Green deployment run by CodeDeploy: the ALB shifts traffic within the ECS Service from ECS Tasks on Definition 1 to ECS Tasks on Definition 2]

AWS CloudFormation
• Infrastructure as code (IaC) in AWS
• Portability of stacks across multiple accounts and regions

• Backbone of the Elastic Beanstalk service


• Backbone of the Service Catalog service
• Backbone of the SAM (Serverless Application Model) framework

• Must-know service as a developer / sysops / devops

CloudFormation & ASG
• CloudFormation manages the ASG, not the underlying EC2
• You can define “success conditions” for the launch of your EC2 instances using a CreationPolicy
• You can define “update strategies” for the update of your EC2 instances using an UpdatePolicy
• To update the underlying EC2 in an ASG, you have to create a new launch configuration / launch template & use an UpdatePolicy
[Diagram: CloudFormation applies an UpdatePolicy to the Auto Scaling group; a new launch config moves instances from V1 to V2]
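
As a rough illustration, an Auto Scaling group resource carrying both attributes could be assembled as below and printed as JSON. The subnet IDs, the logical ID `AppLaunchTemplate`, the capacities, and the timeouts are placeholders, and the rolling-update numbers are just one possible choice.

```python
import json


def asg_resource(launch_template_logical_id, desired=2):
    """Build an AWS::AutoScaling::AutoScalingGroup resource with both policies attached."""
    return {
        "Type": "AWS::AutoScaling::AutoScalingGroup",
        "Properties": {
            "MinSize": str(desired),
            "MaxSize": str(desired * 2),
            "DesiredCapacity": str(desired),
            # Placeholder subnet IDs: replace with real ones.
            "VPCZoneIdentifier": ["subnet-EXAMPLE1", "subnet-EXAMPLE2"],
            "LaunchTemplate": {
                "LaunchTemplateId": {"Ref": launch_template_logical_id},
                "Version": {"Fn::GetAtt": [launch_template_logical_id, "LatestVersionNumber"]},
            },
        },
        # "Success condition": wait for this many instance signals before CREATE_COMPLETE.
        "CreationPolicy": {"ResourceSignal": {"Count": desired, "Timeout": "PT15M"}},
        # "Update strategy": replace instances one at a time, keeping one in service.
        "UpdatePolicy": {
            "AutoScalingRollingUpdate": {
                "MinInstancesInService": 1,
                "MaxBatchSize": 1,
                "PauseTime": "PT15M",
                "WaitOnResourceSignals": True,
            }
        },
    }


asg = asg_resource("AppLaunchTemplate")
print(json.dumps({"Resources": {"AppAsg": asg}}, indent=2))
```

With `WaitOnResourceSignals` enabled, the instances are expected to signal success themselves (typically with `cfn-signal` from user data); otherwise the update times out.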

Retaining Data on Deletes
• You can put a DeletionPolicy on any resource to control what happens when
the CloudFormation template is deleted
• DeletionPolicy=Retain:
• Specify on resources to preserve / backup in case of CloudFormation deletes
• To keep a resource, specify Retain (works for any resource / nested stack)
• DeletionPolicy=Snapshot:
• EBS Volume, ElastiCache Cluster, ElastiCache ReplicationGroup
• RDS DBInstance, RDS DBCluster, Redshift Cluster
• DeletionPolicy=Delete (default behavior):
• Note: for AWS::RDS::DBCluster resources, the default policy is Snapshot
• Note: to delete an S3 bucket, you need to first empty the bucket of its content
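
The DeletionPolicy rules above can be written down as a small checker. This is a sketch, not CloudFormation's own validation: `SNAPSHOT_CAPABLE` contains only the resource types listed on this slide, and the function name is hypothetical.

```python
# Resource types listed above as supporting DeletionPolicy=Snapshot.
SNAPSHOT_CAPABLE = {
    "AWS::EC2::Volume",
    "AWS::ElastiCache::CacheCluster",
    "AWS::ElastiCache::ReplicationGroup",
    "AWS::RDS::DBInstance",
    "AWS::RDS::DBCluster",
    "AWS::Redshift::Cluster",
}


def effective_deletion_policy(resource):
    """Return the policy that applies to a template resource, or raise if it is invalid."""
    resource_type = resource["Type"]
    policy = resource.get("DeletionPolicy")
    if policy is None:
        # Default is Delete, except AWS::RDS::DBCluster, which defaults to Snapshot.
        return "Snapshot" if resource_type == "AWS::RDS::DBCluster" else "Delete"
    if policy not in ("Delete", "Retain", "Snapshot"):
        raise ValueError(f"unknown DeletionPolicy: {policy}")
    if policy == "Snapshot" and resource_type not in SNAPSHOT_CAPABLE:
        raise ValueError(f"{resource_type} is not in the Snapshot-capable list")
    return policy


bucket = {"Type": "AWS::S3::Bucket", "DeletionPolicy": "Retain"}
volume = {"Type": "AWS::EC2::Volume", "DeletionPolicy": "Snapshot"}
cluster = {"Type": "AWS::RDS::DBCluster"}
print([effective_deletion_policy(r) for r in (bucket, volume, cluster)])
```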

CloudFormation and IAM
• When deploying a CloudFormation stack
1. it uses the permissions of our own IAM principal
2. OR assign an IAM role to the stack that can perform the actions

• If you create IAM resources, you need to explicitly provide a “capability” to CloudFormation: CAPABILITY_IAM and CAPABILITY_NAMED_IAM

CloudFormation Custom Resources (Lambda)
• You can define a Custom Resource in CloudFormation to address any of these use cases:
• An AWS resource is not yet supported (new service for example)
• An on-premises resource
• Emptying an S3 bucket before being deleted
• Fetch an AMI id
• Anything you want…!
[Diagram: CloudFormation sends Create, update, delete requests for the Custom Resource to an AWS Lambda Function, which makes API calls to whatever you want]
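
A Lambda-backed custom resource has to report its result by sending a JSON document to the pre-signed `ResponseURL` found in the request event. The sketch below shows that round trip under stated assumptions: the sample event values, the default physical ID, and the injectable `send` parameter (added so the handler can be exercised without network access) are all illustrative, and the actual work is left as comments.

```python
import json
import urllib.request

# Hypothetical request event with the fields CloudFormation sends to the function.
SAMPLE_EVENT = {
    "RequestType": "Delete",
    "ResponseURL": "https://example.invalid/presigned-response-url",
    "StackId": "arn:aws:cloudformation:us-east-1:111122223333:stack/demo/EXAMPLE",
    "RequestId": "request-0001",
    "LogicalResourceId": "BucketCleaner",
    "ResourceProperties": {"BucketName": "example-bucket"},
}


def build_response(event, status, physical_id, data=None, reason=""):
    """Assemble the response document CloudFormation expects ("SUCCESS" or "FAILED")."""
    return {
        "Status": status,
        "Reason": reason or "See the function logs in CloudWatch Logs",
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_response(url, body):
    """PUT the response document to the pre-signed URL."""
    payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, method="PUT", headers={"Content-Type": ""}
    )
    with urllib.request.urlopen(request) as reply:
        return reply.status


def handler(event, context, send=send_response):
    physical_id = event.get("PhysicalResourceId", "example-custom-resource")
    try:
        if event["RequestType"] == "Delete":
            data = {}  # e.g. empty the S3 bucket here before it is deleted
        else:
            data = {"Message": "created or updated"}  # e.g. look up an AMI id here
        body = build_response(event, "SUCCESS", physical_id, data)
    except Exception as error:  # always answer, otherwise the stack waits for a timeout
        body = build_response(event, "FAILED", physical_id, reason=str(error))
    send(event["ResponseURL"], body)
    return body


# Dry run with a stand-in sender so nothing leaves the machine.
handler(SAMPLE_EVENT, None, send=lambda url, body: print(url, body["Status"]))
```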

CloudFormation – Cross vs Nested Stacks
• Cross Stacks
• Helpful when stacks have different lifecycles
• Use Outputs Export and Fn::ImportValue
• When you need to pass export values to many stacks (VPC Id, etc…)
• Nested Stacks
• Helpful when components must be re-used
• Ex: re-use how to properly configure an Application Load Balancer
• The nested stack only is important to the higher level stack (it’s not shared)
[Diagram: Cross Stacks: Stack 1, Stack 2, and Stack 3 all reference a shared VPC Stack. Nested Stacks: each App Stack contains its own RDS Stack, ASG Stack, and ELB Stack]
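
A cross-stack reference pairs an exported output in one template with `Fn::ImportValue` in another. The sketch below builds two tiny templates as Python dictionaries and checks that every import has a matching export; the export name `shared-vpc-id`, the logical IDs, and the helper functions are illustrative assumptions.

```python
# Producer: the VPC stack exports its VPC id under a region-unique name.
vpc_stack = {
    "Resources": {
        "Vpc": {"Type": "AWS::EC2::VPC", "Properties": {"CidrBlock": "10.0.0.0/16"}}
    },
    "Outputs": {
        "VpcId": {"Value": {"Ref": "Vpc"}, "Export": {"Name": "shared-vpc-id"}}
    },
}

# Consumer: an application stack imports the value by export name.
app_stack = {
    "Resources": {
        "AppSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "App instances",
                "VpcId": {"Fn::ImportValue": "shared-vpc-id"},
            },
        }
    }
}


def exported_names(template):
    """Collect the export names declared in a template's Outputs section."""
    outputs = template.get("Outputs", {}).values()
    return {output["Export"]["Name"] for output in outputs if "Export" in output}


def imported_names(node):
    """Recursively collect every literal name used with Fn::ImportValue."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Fn::ImportValue" and isinstance(value, str):
                found.add(value)
            else:
                found |= imported_names(value)
    elif isinstance(node, list):
        for item in node:
            found |= imported_names(item)
    return found


missing = imported_names(app_stack) - exported_names(vpc_stack)
print("unresolved imports:", missing)
```

A nested stack, by contrast, is declared inside the parent template as an `AWS::CloudFormation::Stack` resource, so it is created, updated, and deleted together with its parent.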

CloudFormation – Other Concepts
• CloudFormer
• Create an AWS CloudFormation template from existing AWS resources

• ChangeSets
• Generate & Preview the CloudFormation changes before they get applied

• StackSets
• Deploy a CloudFormation stack across multiple accounts and regions

• Stack Policies
• Prevent accidental updates / deletes to stack resources

CloudFormation – Integration with Secrets Manager
[Template screenshot with three annotations:]
• secret is generated
• reference secret in RDS DB instance
• link the secret to RDS DB instance
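
The three annotated steps might look like the template below, written here as a Python dictionary and printed as JSON. The logical IDs (`MyRDSSecret`, `MyDBInstance`, `SecretRDSAttachment`), the engine, the instance class, and the storage size are placeholders, not values taken from the slide.

```python
import json

template = {
    "Resources": {
        # 1. The secret is generated: Secrets Manager creates the password.
        "MyRDSSecret": {
            "Type": "AWS::SecretsManager::Secret",
            "Properties": {
                "GenerateSecretString": {
                    "SecretStringTemplate": json.dumps({"username": "admin"}),
                    "GenerateStringKey": "password",
                    "PasswordLength": 16,
                    "ExcludeCharacters": "\"@/\\",
                }
            },
        },
        # 2. The secret is referenced in the RDS DB instance via dynamic references.
        "MyDBInstance": {
            "Type": "AWS::RDS::DBInstance",
            "Properties": {
                "Engine": "mysql",
                "DBInstanceClass": "db.t3.micro",
                "AllocatedStorage": "20",
                "MasterUsername": {
                    "Fn::Sub": "{{resolve:secretsmanager:${MyRDSSecret}:SecretString:username}}"
                },
                "MasterUserPassword": {
                    "Fn::Sub": "{{resolve:secretsmanager:${MyRDSSecret}:SecretString:password}}"
                },
            },
        },
        # 3. The secret is linked to the RDS DB instance.
        "SecretRDSAttachment": {
            "Type": "AWS::SecretsManager::SecretTargetAttachment",
            "Properties": {
                "SecretId": {"Ref": "MyRDSSecret"},
                "TargetId": {"Ref": "MyDBInstance"},
                "TargetType": "AWS::RDS::DBInstance",
            },
        },
    }
}

print(json.dumps(template, indent=2))
```

The attachment resource is what adds the database connection details to the secret, which rotation later relies on.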

AWS Service Catalog
• Users that are new to AWS have too many options, and may create
stacks that are not compliant / in line with the rest of the organization

• Some users just want a quick self-service portal to launch a set of authorized products pre-defined by admins

• Includes: virtual machines, databases, storage options, etc…

• Enter AWS Service Catalog!

Service Catalog diagram
• ADMIN TASKS: Product (CloudFormation Templates) → Portfolio (Collection of Products) → Control (IAM Permissions to Access Portfolios)
• USER TASKS: Product List (Authorized by IAM) → launch → Provisioned Products (Ready to use, Properly Configured, Properly Tagged)

AWS Service Catalog
• Create and manage catalogs of IT services that are approved on AWS
• The “products” are CloudFormation templates
• Ex: Virtual machine images, Servers, Software, Databases, Regions, IP address ranges
• CloudFormation helps ensure consistency, and standardization by Admins
• They are assigned to Portfolios (teams)
• Teams are presented a self-service portal where they can launch the products
• All deployed products are centrally managed
• Helps with governance, compliance, and consistency
• Can give user access to launching products without requiring deep AWS knowledge
• Integrations with “self-service portals” such as ServiceNow

AWS SAM - Serverless Application Model
• SAM = Serverless Application Model
• Framework for developing and deploying serverless applications
• All the configuration is YAML code. Examples:
• Lambda Functions (AWS::Serverless::Function)
• DynamoDB tables (AWS::Serverless::SimpleTable)
• API Gateway (AWS::Serverless::Api)
• StepFunction - State Machine (AWS::Serverless::StateMachine)
• SAM can help you to run Lambda, API Gateway, DynamoDB locally
• SAM can use CodeDeploy to deploy Lambda functions (traffic shifting)
• Leverages CloudFormation in the backend
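
For a feel of how traffic shifting is requested in SAM, the sketch below builds a minimal template as a Python dictionary and prints it as JSON. The function name, handler, runtime, and code path are placeholders. The `linear_schedule` helper is only an illustration of what a preference such as `Linear10PercentEvery1Minute` implies, assuming the first step happens at minute 0; it is not part of SAM or CodeDeploy.

```python
import json

sam_template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Transform": "AWS::Serverless-2016-10-31",
    "Resources": {
        "HelloFunction": {
            "Type": "AWS::Serverless::Function",
            "Properties": {
                "Handler": "app.handler",   # placeholder module.function
                "Runtime": "python3.12",
                "CodeUri": "src/",          # placeholder source folder
                # Publishing to an alias is what lets traffic be shifted between versions.
                "AutoPublishAlias": "live",
                "DeploymentPreference": {"Type": "Linear10PercentEvery1Minute"},
            },
        },
        "ItemsTable": {"Type": "AWS::Serverless::SimpleTable"},
    },
}


def linear_schedule(step_percent, interval_minutes):
    """List (minute, percent of traffic on the new version) until 100% is reached."""
    schedule = []
    shifted = 0
    minute = 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule


print(json.dumps(sam_template, indent=2))
print(linear_schedule(10, 1))
```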

CICD Architecture for SAM
[Diagram: CodePipeline runs CodeCommit → CodeBuild (Build, test, package + SAM) → CloudFormation → CodeDeploy, which shifts traffic with an alias from v1 to v2; the deployed application includes API Gateway and DynamoDB]

AWS Cloud Development Kit (CDK)
• Define your cloud infrastructure using a familiar language:
• JavaScript/TypeScript, Python, Java, and .NET
• The code is “compiled” into a CloudFormation template (JSON/YAML)
• You can therefore deploy infrastructure and application runtime code together
• Great for Lambda functions
• Great for Docker containers in ECS / EKS

[Diagram: CDK Application (Programming Languages) → CDK CLI → CloudFormation Template → CloudFormation]

Deployment Options
• Vanilla EC2 with User Data (just for the first launch)
• Build an AMI for things that are slow to install (runtimes, updates, tools), and use EC2 user data for quick runtime setup
• Auto Scaling Group with launch template (AMI)
• CodeDeploy (no new AMI – application deployments)
• In-place on EC2
• In-place on ASG
• New instances on ASG
• Traffic shifting for AWS Lambda
• New task set for ECS + traffic shifting
• Elastic Beanstalk
• In-place all at once upgrades
• Rolling upgrades (with or without additional instances)
• Immutable upgrades (new instances)
• Blue / Green (entirely new stack)
• OpsWorks
• For chef / puppet stacks only
• Can manage ELB and EC2 instances
• Cannot manage an ASG
• SAM Framework
• Leverages CloudFormation & CodeDeploy
• CDK
• Manage infra with a programming language
• Leverages CloudFormation

AWS Systems Manager Overview
• Helps you manage your EC2 and on-premises systems at scale
• Get operational insights about the state of your infrastructure
• Easily detect problems
• Patching automation for enhanced compliance
• Works for both Windows and Linux OS
• Integrated with CloudWatch metrics / dashboards
• Integrated with AWS Config
• Free service

AWS Systems Manager Features
• Resource Groups
• Insights:
• Insights Dashboard
• Inventory: discover and audit the software installed
• Compliance
• Parameter Store
• Action:
• Automation (shut down EC2, create AMIs)
• Run Command
• Session Manager
• Patch Manager
• Maintenance Windows
• State Manager: define and maintain configuration of OS and applications

How Systems Manager works
• We need to install the SSM agent onto the systems we control
• Installed by default on Amazon Linux AMI & some Ubuntu AMI
• If an instance can’t be controlled with SSM, it’s probably an issue with the SSM agent!
• Make sure the EC2 instances have a proper IAM role to allow SSM actions
[Diagram: SSM communicates with the SSM Agent running on each EC2 Instance and On Premise VM]

AWS Systems Manager
Run Command
• Execute a document (= script) or just run a command
• Run command across multiple instances (using resource groups)
• Rate Control / Error Control
• Integrated with IAM & CloudTrail
• No need for SSH
• Results in the console
[Diagram: SSM Service → RunCommand → SSM Agent on the EC2 Instance, where the command is run; no SSH access needed]
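
The request behind a Run Command invocation can be sketched as a plain dictionary. The helper below only builds the arguments and makes no AWS call; the result could be passed to `boto3.client("ssm").send_command(**request)`. Targeting by an `Environment` tag, the shell commands, and the default limits are assumptions for illustration (the slide mentions resource groups as another way to pick targets).

```python
def build_send_command_request(tag_key, tag_value, commands,
                               max_concurrency="10%", max_errors="1"):
    """Build keyword arguments for the SSM SendCommand API."""
    return {
        # AWS-provided document that runs shell commands on Linux instances.
        "DocumentName": "AWS-RunShellScript",
        # Select instances by tag instead of listing instance IDs.
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "Parameters": {"commands": list(commands)},
        # Rate control: how many instances run the command at the same time.
        "MaxConcurrency": max_concurrency,
        # Error control: stop sending to new instances after this many failures.
        "MaxErrors": max_errors,
        "Comment": f"Run on {tag_key}={tag_value}",
    }


request = build_send_command_request(
    "Environment", "dev", ["uptime", "df -h"], max_concurrency="2", max_errors="0"
)
print(request)
```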

SSM Patch Manager –
Predefined Patch Baselines
• Defines which patches should or shouldn’t be installed on your instances
• Linux:
• AWS-AmazonLinux2DefaultPatchBaseline
• AWS-CentOSDefaultPatchBaseline
• AWS-RedHatDefaultPatchBaseline
• AWS-SuseDefaultPatchBaseline
• AWS-UbuntuDefaultPatchBaseline
• Windows: (patches are auto-approved 7 days after the release)
• AWS-DefaultPatchBaseline: install OS patch CriticalUpdates & SecurityUpdates
• AWS-WindowsPredefinedPatchBaseline-OS: same as “AWS-DefaultPatchBaseline”
• AWS-WindowsPredefinedPatchBaseline-OS-Applications: also updates Microsoft applications
• Can define your own custom patch baselines as well (OS, classification, severity…)

SSM Patch Manager – Steps
1. Define a patch baseline to use (or multiple if you have multiple environments)
2. Define patch groups: define based on tags, example different environments (dev, test, prod) – use tag Patch Group
3. Define Maintenance Windows (schedule, duration, registered targets/patch groups and registered tasks)
4. Add the AWS-RunPatchBaseline Run Command as part of the registered tasks of the Maintenance Window (works cross platform Linux & Windows)
5. Define Rate Control (concurrency & error threshold) for the task
6. Monitor Patch Compliance using SSM Inventory

https://aws.amazon.com/blogs/mt/patching-your-windows-ec2-instances-using-aws-systems-manager-patch-manager/

AWS Parameter Store
• Secure storage for configuration and secrets
• Optional Seamless Encryption using KMS
• Serverless, scalable, durable, easy SDK, free
• Version tracking of configurations / secrets
• Configuration management using path & IAM
• Notifications with CloudWatch Events
• Integration with CloudFormation
• Can retrieve secrets from Secrets Manager using the SSM Parameter Store API
[Diagram: Applications request Plaintext configuration or Encrypted configuration from SSM Parameter Store, which checks IAM permissions and uses AWS KMS as the Decryption Service]

AWS Parameter Store Hierarchy
• /my-department/
  • my-app/
    • dev/
      • db-url
      • db-password
    • prod/
      • db-url
      • db-password
  • other-app/
• /other-department/
• /aws/reference/secretsmanager/secret_ID_in_Secrets_Manager
• /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2
[Diagram: a Dev Lambda Function and a Prod Lambda Function read their parameters with the GetParameters or GetParametersByPath API]
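
Path-based lookups are easiest to see with a small in-memory stand-in. The function below imitates the behaviour of a by-path lookup over a dictionary; it is not the SSM API, and the parameter values are invented for the example.

```python
# Hypothetical parameters following the hierarchy above.
PARAMETERS = {
    "/my-department/my-app/dev/db-url": "dev-db.example.internal",
    "/my-department/my-app/dev/db-password": "dev-secret",
    "/my-department/my-app/prod/db-url": "prod-db.example.internal",
    "/my-department/my-app/prod/db-password": "prod-secret",
}


def get_parameters_by_path(store, path, recursive=False):
    """Return parameters under `path`; only direct children unless `recursive` is set."""
    prefix = path.rstrip("/") + "/"
    result = {}
    for name, value in store.items():
        if not name.startswith(prefix):
            continue
        remainder = name[len(prefix):]
        if recursive or "/" not in remainder:
            result[name] = value
    return result


# The dev function only reads its own branch of the tree.
dev_config = get_parameters_by_path(PARAMETERS, "/my-department/my-app/dev")
# An operator could read the whole application in one recursive call.
app_config = get_parameters_by_path(PARAMETERS, "/my-department/my-app", recursive=True)
print(sorted(dev_config))
print(len(app_config))
```

Because access is granted per path with IAM, the dev function's role can be limited to the `/my-department/my-app/dev/` branch while the prod function gets only the `prod/` branch.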

Cost Control Section

AWS Cost Allocation Tags
• With Tags we can track resources that relate to each other
• With Cost Allocation Tags we can enable detailed costing reports
• Just like Tags, but they show up as columns in Reports
• AWS Generated Cost Allocation Tags
• Automatically applied to the resource you create
• Starts with Prefix aws: (e.g. aws:createdBy)
• They’re not applied to resources created before the activation
• User tags
• Defined by the user
• Starts with Prefix user:
• Cost Allocation Tags just appear in the Billing Console
• Takes up to 24 hours for the tags to show up in the report

AWS Tag Editor
• Allows you to manage tags of multiple resources at once
• You can add/update/delete tags
• Search tagged/untagged resources in all AWS Regions

Trusted Advisor
• No need to install anything – high level AWS account assessment
• Analyzes your AWS accounts and provides recommendations:
• Cost Optimization & Recommendations
• Performance
• Security
• Fault Tolerance
• Service Limits
• Core Checks and recommendations – all customers
• Can enable weekly email notification from the console
• Full Trusted Advisor – Available for Business & Enterprise support plans
• Ability to set CloudWatch alarms when reaching limits
• Programmatic Access using AWS Support API

AWS Support Plans
Basic Support
• Included for all AWS customers and free
• AWS Trusted Advisor Best Practice Checks: 7 Core checks
• 24x7 access to customer service, documentation, whitepapers, and support forums
Developer
• Recommended if you are experimenting or testing in AWS
• AWS Trusted Advisor Best Practice Checks: 7 Core checks
• Enhanced Technical Support: business hours email access to Cloud Support Associates; unlimited cases / 1 primary contact
• Case Severity / Response Times: General guidance < 24 business hours**; System impaired < 12 business hours**
Business
• Recommended if you have production workloads in AWS
• AWS Trusted Advisor Best Practice Checks: Full set of checks + Programmatic Access using AWS Support API
• Enhanced Technical Support: 24x7 phone, email, and chat access to Cloud Support Engineers; unlimited cases / unlimited contacts (IAM supported)
• Case Severity / Response Times: General guidance < 24 hours; System impaired < 12 hours; Production system impaired < 4 hours; Production system down < 1 hour
Enterprise
• Recommended if you have business and/or mission critical workloads in AWS
• AWS Trusted Advisor Best Practice Checks: Full set of checks + Programmatic Access using AWS Support API
• Enhanced Technical Support: 24x7 phone, email, and chat access to Cloud Support Engineers; unlimited cases / unlimited contacts (IAM supported)
• Case Severity / Response Times: General guidance < 24 hours; System impaired < 12 hours; Production system impaired < 4 hours; Production system down < 1 hour; Business-critical system down < 15 minutes

Trusted Advisor – Good to know
• Can check if an S3 bucket is made public
• But cannot check for S3 objects that are public inside of your bucket!
• Use CloudWatch Events / S3 Events instead

• Service Limits
• Limits can only be monitored in Trusted Advisor (cannot be changed)
• Cases have to be created manually in AWS Support Centre to increase limits
• OR use the new AWS Service Quotas service (new service - has an API)

https://aws.amazon.com/solutions/limit-monitor/

EC2 Instance Launch Types
• On Demand Instances: short workload, predictable pricing, reliable
• Spot Instances: short workloads, for cheap, can lose instances (not reliable)
• Reserved: (MINIMUM 1 year)
• Reserved Instances: long workloads
• Convertible Reserved Instances: long workloads with flexible instances
• Dedicated Instances: no other customers will share your hardware
• Dedicated Hosts: book an entire physical server, control instance placement
• Great for software licenses that operate at the core, or socket level
• Can define host affinity so that instance reboots are kept on the same host

AWS Savings Plan
• New pricing model to get a discount based on long-term usage
• Commit to a certain type of usage: ex $10 per hour for 1 to 3 years
• Any usage beyond the savings plan is billed at the on-demand price

• EC2 Instance Savings plan (up to 72% - same discount as Standard RIs)
• Select instance family (e.g. M5, C5…), and locked to a specific region
• Flexible across size (m5.large to m5.4xlarge), OS (Windows to Linux), tenancy
(dedicated or default)
• Compute Savings plan (up to 66% - same discount as Convertible RIs)
• Ability to move between instance family (move from C5 to M5), region (Ireland to US),
compute type (EC2, Fargate, Lambda), OS & tenancy
• SageMaker Savings plan (up to 64% off)
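
The billing rule above (pay the commitment, then on-demand rates for anything beyond it) can be worked through with a short calculation. This is a simplified model: it assumes one flat discount applies to all covered usage, whereas real Savings Plans rates vary by instance type and region. The function name and the sample numbers are illustrative.

```python
def hourly_charge(on_demand_usage, commitment, discount):
    """Estimate the charge for one hour.

    on_demand_usage: what the hour's usage would cost at on-demand prices ($)
    commitment: the hourly Savings Plan commitment ($), paid even if unused
    discount: fractional discount on covered usage (0.5 means 50% off)
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    # On-demand-equivalent usage that the commitment pays for.
    covered_on_demand = commitment / (1 - discount)
    # Anything beyond the plan is billed at the on-demand price.
    overflow = max(0.0, on_demand_usage - covered_on_demand)
    return commitment + overflow


# $10/hour commitment at a 50% discount covers $20 of on-demand usage.
print(hourly_charge(30, commitment=10, discount=0.5))  # $10 commitment + $10 overflow
print(hourly_charge(5, commitment=10, discount=0.5))   # under-used: still $10
```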

S3 Storage Classes
• Amazon S3 Standard - General Purpose
• Amazon S3 Standard-Infrequent Access (IA)
• Amazon S3 One Zone-Infrequent Access
• Amazon S3 Glacier Instant Retrieval
• Amazon S3 Glacier Flexible Retrieval
• Amazon S3 Glacier Deep Archive
• Amazon S3 Intelligent Tiering

• Can move between classes manually or using S3 Lifecycle configurations

S3 – Other Cost Savings
• S3 Select & Glacier Select: save in network and CPU cost
• S3 Lifecycle Rules: transition objects between tiers
• Compress objects to save space
• S3 Requester Pays:
• In general, bucket owners pay for all Amazon S3 storage and data transfer costs
associated with their bucket
• With Requester Pays buckets, the requester instead of the bucket owner pays
the cost of the request and the data download from the bucket
• The bucket owner always pays the cost of storing data
• Helpful when you want to share large datasets with other accounts
• If an IAM role is assumed, the owner account of that role pays for the request

S3 Durability and Availability
• Durability:
• High durability (99.999999999%, 11 9’s) of objects across multiple AZ
• If you store 10,000,000 objects with Amazon S3, you can on average expect to
incur a loss of a single object once every 10,000 years
• Same for all storage classes

• Availability:
• Measures how readily available a service is
• Varies depending on storage class
• Example: S3 standard has 99.99% availability = not available 53 minutes a year
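
Both figures above follow from simple arithmetic, reproduced here with exact fractions to avoid floating-point noise on numbers such as 99.999999999. The function names are illustrative, and a 365-day year is assumed.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def years_per_lost_object(objects_stored, durability_percent):
    """Average number of years between single-object losses at a given durability."""
    annual_loss_rate = 1 - Fraction(durability_percent) / 100
    expected_losses_per_year = objects_stored * annual_loss_rate
    return 1 / expected_losses_per_year


def downtime_minutes_per_year(availability_percent):
    """Minutes per year a service may be unavailable at a given availability."""
    unavailable_share = 1 - Fraction(availability_percent) / 100
    return float(unavailable_share * MINUTES_PER_YEAR)


print(years_per_lost_object(10_000_000, "99.999999999"))  # 11 nines
print(downtime_minutes_per_year("99.99"))  # S3 Standard
print(downtime_minutes_per_year("99.9"))   # S3 Standard-IA
print(downtime_minutes_per_year("99.5"))   # S3 One Zone-IA
```

The percentages are passed as strings so that `Fraction` parses them exactly; the S3 Standard result of about 52.56 minutes is what the slide rounds to 53.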

S3 Standard – General Purpose
• 99.99% Availability
• Used for frequently accessed data
• Low latency and high throughput
• Sustain 2 concurrent facility failures

• Use Cases: Big Data analytics, mobile & gaming applications, content
distribution…

S3 Storage Classes – Infrequent Access
• For data that is less frequently accessed, but requires rapid access when needed
• Lower cost than S3 Standard

• Amazon S3 Standard-Infrequent Access (S3 Standard-IA)


• 99.9% Availability
• Use cases: Disaster Recovery, backups

• Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)


• High durability (99.999999999%) in a single AZ; data lost when AZ is destroyed
• 99.5% Availability
• Use Cases: Storing secondary backup copies of on-premise data, or data you can recreate

Amazon S3 Glacier Storage Classes
• Low-cost object storage meant for archiving / backup
• Pricing: price for storage + object retrieval cost

• Amazon S3 Glacier Instant Retrieval


• Millisecond retrieval, great for data accessed once a quarter
• Minimum storage duration of 90 days
• Amazon S3 Glacier Flexible Retrieval (formerly Amazon S3 Glacier):
• Expedited (1 to 5 minutes), Standard (3 to 5 hours), Bulk (5 to 12 hours) – free
• Minimum storage duration of 90 days
• Amazon S3 Glacier Deep Archive – for long term storage:
• Standard (12 hours), Bulk (48 hours)
• Minimum storage duration of 180 days

S3 Intelligent-Tiering
• Small monthly monitoring and auto-tiering fee
• Moves objects automatically between Access Tiers based on usage
• There are no retrieval charges in S3 Intelligent-Tiering

• Frequent Access tier (automatic): default tier


• Infrequent Access tier (automatic): objects not accessed for 30 days
• Archive Instant Access tier (automatic): objects not accessed for 90 days
• Archive Access tier (optional): configurable from 90 days to 700+ days
• Deep Archive Access tier (optional): config. from 180 days to 700+ days

S3 Storage Classes Comparison
Durability: 99.999999999% (11 9’s) for every storage class
Storage class               Availability  Availability SLA  Availability Zones  Min. Storage Duration Charge  Min. Billable Object Size  Retrieval Fee
Standard                    99.99%        99.9%             >= 3                None                          None                       None
Intelligent-Tiering         99.9%         99%               >= 3                None                          None                       None
Standard-IA                 99.9%         99%               >= 3                30 Days                       128 KB                     Per GB retrieved
One Zone-IA                 99.5%         99%               1                   30 Days                       128 KB                     Per GB retrieved
Glacier Instant Retrieval   99.9%         99%               >= 3                90 Days                       128 KB                     Per GB retrieved
Glacier Flexible Retrieval  99.99%        99.9%             >= 3                90 Days                       40 KB                      Per GB retrieved
Glacier Deep Archive        99.99%        99.9%             >= 3                180 Days                      40 KB                      Per GB retrieved

https://aws.amazon.com/s3/storage-classes/

S3 Storage Classes – Price Comparison
Example: us-east-1
Storage class               Storage cost        Retrieval cost (per 1000 requests)                                      Retrieval time
                            (per GB per month)
Standard                    $0.023              GET: $0.0004, POST: $0.005                                              Instantaneous
Intelligent-Tiering         $0.0025 - $0.023    GET: $0.0004, POST: $0.005                                              Instantaneous
Standard-IA                 $0.0125             GET: $0.001, POST: $0.01                                                Instantaneous
One Zone-IA                 $0.01               GET: $0.001, POST: $0.01                                                Instantaneous
Glacier Instant Retrieval   $0.004              GET: $0.01, POST: $0.02                                                 Instantaneous
Glacier Flexible Retrieval  $0.0036             GET: $0.0004, POST: $0.03, Expedited: $10, Standard: $0.05, Bulk: free  Expedited (1 – 5 mins), Standard (3 – 5 hours), Bulk (5 – 12 hours)
Glacier Deep Archive        $0.00099            GET: $0.0004, POST: $0.05, Standard: $0.10, Bulk: $0.025                Standard (12 hours), Bulk (48 hours)

Monitoring Cost (per 1000 objects): $0.0025 (Intelligent-Tiering)

https://aws.amazon.com/s3/pricing/
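
To get a feel for the storage prices above, here is a minimal sketch that multiplies a dataset size by the per-GB monthly price of each class. The prices are the us-east-1 figures copied from the slide and may be out of date; the function name is illustrative. It deliberately ignores request, retrieval, monitoring, and minimum-duration charges, and leaves out Intelligent-Tiering because its price is a range.

```python
# Per GB per month, us-east-1, as listed on the slide (may have changed since).
STORAGE_PRICE_PER_GB_MONTH = {
    "Standard": 0.023,
    "Standard-IA": 0.0125,
    "One Zone-IA": 0.01,
    "Glacier Instant Retrieval": 0.004,
    "Glacier Flexible Retrieval": 0.0036,
    "Glacier Deep Archive": 0.00099,
}


def monthly_storage_cost(size_gb: float, storage_class: str) -> float:
    """Storage-only monthly cost in USD for size_gb stored in storage_class."""
    return size_gb * STORAGE_PRICE_PER_GB_MONTH[storage_class]


if __name__ == "__main__":
    size_gb = 1024  # roughly 1 TB
    for name in STORAGE_PRICE_PER_GB_MONTH:
        print(f"{name:<28} ${monthly_storage_cost(size_gb, name):8.2f}")
```

The storage price alone is not the whole picture: the colder classes add per-GB retrieval fees and minimum storage durations, which matter for data that is read often or deleted early.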

AWS Budgets
• Create a budget and send alarms when costs exceed the budget
• 4 types of budgets: Usage, Cost, Reservation, Savings Plans
• For Reserved Instances (RI)
• Track utilization
• Supports EC2, ElastiCache, RDS, Redshift
• Up to 5 SNS notifications per budget
• Can filter by: Service, Linked Account, Tag, Purchase Option, Instance
Type, Region, Availability Zone, API Operation, etc…
• Same options as AWS Cost Explorer!
• 2 budgets are free, then $0.02/day/budget
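
As a quick worked example of the pricing rule above (2 free budgets, then $0.02 per day per budget), assuming the figures on the slide are still current; the function name is illustrative:

```python
def budgets_daily_cost(num_budgets: int, free_budgets: int = 2, price_per_day: float = 0.02) -> float:
    """Daily cost in USD: only budgets beyond the free allowance are billed."""
    billable = max(0, num_budgets - free_budgets)
    return billable * price_per_day


if __name__ == "__main__":
    for n in (1, 2, 5, 20):
        print(n, "budgets ->", f"${budgets_daily_cost(n):.2f}/day")
```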

Cost Explorer
• Visualize, understand, and manage your AWS costs and usage over time
• Create custom reports that analyze cost and usage data.
• Analyze your data at a high level: total costs and usage across all accounts
• Or at monthly, hourly, or resource-level granularity
• Choose an optimal Savings Plan (to lower prices on your bill)
• Forecast usage up to 12 months based on previous usage

Cost Explorer – Example

Migrations Section

Cloud Migration: The 6R
• From: https://aws.amazon.com/blogs/enterprise-strategy/6-strategies-for-migrating-applications-to-the-cloud/

Cloud Migration: The 6R
• Rehosting: “lift and shift”
• Simple migrations by re-hosting on AWS (applications, databases, data…)
• No cloud optimizations being done, application is migrated as is
• Could save as much as 30% on cost
• Example: Migrate using AWS VM Import/Export, AWS Server Migration Service

• Replatforming:
• Example: migrate your database to RDS
• Example: migrate your application to Elastic Beanstalk (Java with Tomcat)
• Not changing the core architecture, but leverage some cloud optimizations

Cloud Migration: The 6R
• Repurchase: “drop and shop”
• Moving to a different product while moving to the cloud
• Often you move to a SaaS platform
• Expensive in the short term, but quick to deploy
• Example: CRM to Salesforce.com, HR to Workday, CMS to Drupal

• Refactoring / Re-architecting:
• Reimagining how the application is architected using Cloud Native features
• Driven by the need of the business to add features, scale, performance
• Example: move an application to Serverless architectures, use Amazon S3

Cloud Migration: The 6R
• Retire
• Turn off things you don’t need (maybe as a result of Re-architecting)
• Helps with reducing the surface areas for attacks (more security)
• Save cost, maybe up to 10% to 20%
• Focus your attention on resources that must be maintained

• Retain
• Do nothing for now (for simplicity, cost reason, importance…)
• It’s still a decision to make in a Cloud Migration

AWS Storage Gateway
• Bridge between on-premises data and cloud data in S3
• Use cases: disaster recovery, backup & restore, tiered storage
• 4 types of Storage Gateway:
• File Gateway
• Volume Gateway
• Tape Gateway
• Amazon FSx File Gateway
• Exam Tip: You need to know the differences between all 4!
(Diagram: Files, Volumes, and Tapes go through AWS Storage Gateway to Amazon EBS, S3, and Glacier)

File Gateway
• File Gateway appliance is a virtual machine to bridge between your NFS and S3
• Metadata and directory structure are preserved
• Configured S3 buckets are accessible using the NFS and SMB protocols
• Each File Gateway should have an IAM role to access S3
• Most recently used data is cached in the file gateway
• Can be mounted on many servers
• Whitepaper: https://d0.awsstatic.com/whitepapers/aws-storage-gateway-file-gateway-for-hybrid-architectures.pdf

File Gateway: Extensions
(Diagram: File Gateway Appliances in the corporate data center and in a VPC expose NFS / SMB and store data in Amazon S3. Shown around S3: S3 Events to Lambda, Athena, Redshift Spectrum, EMR, and CRR to another Amazon S3 bucket)

File Gateway: Read Only Replicas

File Gateway: Backup and Lifecycle Policies

File Architectures: Other possibilities
• Amazon S3 Object Versioning
• Ability to store multiple object versions as they are modified
• Helpful to restore a file to a previous version
• Could restore an entire file system to a previous version
• Must use the “RefreshCache” API on the Gateway to be notified of restore

• Amazon S3 Object Lock
• Enables File Gateway to be used for Write Once Read Many (WORM) data
• If there are file modifications or renames in the file share clients, the file gateway creates a new version of the object without affecting prior versions, and the original locked version will remain unchanged

Volume Gateway
• Block storage using iSCSI protocol backed by S3
• Cached volumes: low latency access to most recent data, full data on S3
• Stored volumes: entire dataset is on-premises, scheduled backups to S3
• Can create EBS snapshots from the volumes and restore as EBS!
• Up to 32 volumes per gateway
• Each volume up to 32TB in cached mode (1PB per Gateway)
• Each volume up to 16 TB in stored mode (512TB per Gateway)
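
The per-gateway totals above follow directly from the volume count and the per-volume size. A tiny sketch of that arithmetic, using the limits stated on the slide (which may have changed) and assuming 1 PB = 1024 TB:

```python
MAX_VOLUMES_PER_GATEWAY = 32  # limit quoted on the slide


def max_gateway_capacity_tb(max_volume_size_tb: int) -> int:
    """Total capacity per gateway when every volume is at its maximum size."""
    return MAX_VOLUMES_PER_GATEWAY * max_volume_size_tb


if __name__ == "__main__":
    print("cached mode:", max_gateway_capacity_tb(32), "TB")  # 1024 TB, i.e. 1 PB
    print("stored mode:", max_gateway_capacity_tb(16), "TB")  # 512 TB
```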

(Diagram: in the corporate data center, an Application Server connects over iSCSI to the Volume Gateway, which sends data over HTTPS to an S3 Bucket in an AWS Region, stored as Amazon EBS Snapshots)

Tape Gateway
• Some companies have backup processes using physical tapes (!)
• With Tape Gateway, companies use the same processes but in the cloud
• Virtual Tape Library (VTL) backed by Amazon S3 and Glacier
• Back up data using existing tape-based processes (and iSCSI interface)
• Works with leading backup software vendors
• You can’t access a single file within a tape. You need to restore the entire tape
(Diagram: in the corporate data center, a Backup Server connects over iSCSI to the Tape Gateway (Media Changer and Tape Drive), which sends data over HTTPS to an AWS Region: Virtual Tapes stored in Amazon S3, Archived Tapes stored in Amazon Glacier)

Amazon FSx File Gateway
• Native access to Amazon FSx for Windows File Server
• Local cache for frequently accessed data
• Windows native compatibility (SMB, NTFS, Active Directory...)
• Useful for group file shares and home directories

(Diagram: SMB Clients in the corporate data center connect to the Amazon FSx File Gateway, which connects to Amazon FSx for Windows File Server file systems in the AWS Cloud)

AWS Snow Family
• Highly-secure, portable devices to collect and process data at the edge,
and migrate data into and out of AWS

• Data migration: Snowcone, Snowball Edge, Snowmobile
• Edge computing: Snowcone, Snowball Edge

Data Migrations with AWS Snow Family
Challenges:
• Limited connectivity
• Limited bandwidth
• High network cost
• Shared bandwidth (can’t maximize the line)
• Connection stability

Time to Transfer
          100 Mbps    1 Gbps      10 Gbps
10 TB     12 days     30 hours    3 hours
100 TB    124 days    12 days     30 hours
1 PB      3 years     124 days    12 days

AWS Snow Family: offline devices to perform data migrations


If it takes more than a week to transfer over the network, use Snowball devices!
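
The transfer times in the table can be approximated with a back-of-the-envelope calculation. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a `utilization` factor for how much of the link is really available; both are assumptions, as are the function names. At full line rate the results come out somewhat lower than the slide's figures, which is consistent with a link that is not fully utilized; a utilization of about 0.8 lands close to them.

```python
def transfer_days(size_tb: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Days needed to move size_tb terabytes over a link_mbps link.

    utilization is the fraction of the link usable for the transfer (0 < u <= 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400


def should_use_snow_device(size_tb: float, link_mbps: float, utilization: float = 1.0) -> bool:
    """Rule of thumb from the slide: more than a week over the network -> ship a device."""
    return transfer_days(size_tb, link_mbps, utilization) > 7


if __name__ == "__main__":
    for size_tb in (10, 100, 1000):
        for mbps in (100, 1_000, 10_000):
            days = transfer_days(size_tb, mbps, utilization=0.8)
            print(f"{size_tb:>5} TB @ {mbps:>6} Mbps: {days:8.1f} days")
```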

Diagrams
• Direct upload to S3:
(Diagram: client → www: 10Gbit/s → Amazon S3 bucket)
• With Snow Family:
(Diagram: client → AWS Snowball → ship → AWS Snowball → import/export → Amazon S3 bucket)

Snowball Edge (for data transfers)
• Physical data transport solution: move TBs or PBs of data in or out of AWS
• Alternative to moving data over the network (and paying network fees)
• Pay per data transfer job
• Provide block storage and Amazon S3-compatible object storage
• Snowball Edge Storage Optimized
• 80 TB of HDD capacity for block volume and S3 compatible object storage
• Snowball Edge Compute Optimized
• 42 TB of HDD capacity for block volume and S3 compatible object storage
• Use cases: large data cloud migrations, DC decommission, disaster recovery

AWS Snowcone
• Small, portable computing, anywhere, rugged & secure, withstands harsh environments
• Light (4.5 pounds, 2.1 kg)
• Device used for edge computing, storage, and data transfer
• 8 TBs of usable storage
• Use Snowcone where Snowball does not fit (space-constrained environment)
• Must provide your own battery / cables
• Can be sent back to AWS offline, or connect it to internet and use AWS DataSync to send data

AWS Snowmobile

• Transfer exabytes of data (1 EB = 1,000 PB = 1,000,000 TBs)


• Each Snowmobile has 100 PB of capacity (use multiple in parallel)
• High security: temperature controlled, GPS, 24/7 video surveillance
• Better than Snowball if you transfer more than 10 PB

AWS Snow Family for Data Migrations

                    Snowcone                          Snowball Edge Storage Optimized   Snowmobile
Storage Capacity    8 TB usable                       80 TB usable                      < 100 PB
Migration Size      Up to 24 TB, online and offline   Up to petabytes, offline          Up to exabytes, offline
DataSync agent      Pre-installed
Storage Clustering                                    Up to 15 nodes
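
As a rough decision aid, the migration sizes in this comparison and the "more than 10 PB" guideline from the Snowmobile slide can be folded into one function. This is a simplification for exam-style reasoning, not an AWS sizing tool; the function name and the exact cut-offs are assumptions taken from the slides.

```python
def choose_snow_device(migration_size_tb: float) -> str:
    """Pick a Snow Family device from the total migration size (decimal TB)."""
    if migration_size_tb <= 0:
        raise ValueError("migration size must be positive")
    if migration_size_tb <= 24:       # "Up to 24 TB" on the comparison slide
        return "Snowcone"
    if migration_size_tb > 10_000:    # "more than 10 PB" on the Snowmobile slide
        return "Snowmobile"
    return "Snowball Edge Storage Optimized"


if __name__ == "__main__":
    for size in (5, 500, 20_000):
        print(f"{size} TB -> {choose_snow_device(size)}")
```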

Snow Family – Usage Process
1. Request Snowball devices from the AWS console for delivery
2. Install the snowball client / AWS OpsHub on your servers
3. Connect the snowball to your servers and copy files using the client
4. Ship back the device when you’re done (goes to the right AWS
facility)
5. Data will be loaded into an S3 bucket
6. Snowball is completely wiped

What is Edge Computing?
• Process data while it’s being created on an edge location
• A truck on the road, a ship on the sea, a mining station underground...

• These locations may have
• Limited / no internet access
• Limited / no easy access to computing power
• We set up a Snowball Edge / Snowcone device to do edge computing
• Use cases of Edge Computing:
• Preprocess data
• Machine learning at the edge
• Transcoding media streams
• Eventually (if need be) we can ship back the device to AWS (for transferring data for example)

Snow Family – Edge Computing
• Snowcone (smaller)
• 2 CPUs, 4 GB of memory, wired or wireless access
• USB-C power using a cord or the optional battery
• Snowball Edge – Compute Optimized
• 52 vCPUs, 208 GiB of RAM
• Optional GPU (useful for video processing or machine learning)
• 42 TB usable storage
• Snowball Edge – Storage Optimized
• Up to 40 vCPUs, 80 GiB of RAM
• Object storage clustering available
• All: Can run EC2 Instances & AWS Lambda functions (using AWS IoT Greengrass)
• Long-term deployment options: 1 and 3 years discounted pricing

AWS OpsHub
• Historically, to use Snow Family devices, you needed a CLI (Command Line Interface tool)
• Today, you can use AWS OpsHub (software you install on your computer / laptop) to manage your Snow Family Device
• Unlocking and configuring single or clustered devices
• Transferring files
• Launching and managing instances running on Snow
Family Devices
• Monitor device metrics (storage capacity, active
instances on your device)
• Launch compatible AWS services on your devices
(ex: Amazon EC2 instances, AWS DataSync,
Network File System (NFS))

https://aws.amazon.com/blogs/aws/aws-snowball-edge-update/

Snow Family – Improving Transfer Performance
• Most impactful to least:
• Perform multiple write operations at one time - from multiple terminals
• Transfer small files in batches – zip up small files until at least 1MB
• Don't perform other operations on files during transfer
• Reduce local network use
• Eliminate unnecessary hops – directly connect to the computer

• The data transfer rate using the file interface is typically between 25
MB/s and 40 MB/s. If you need to transfer data faster than this, use the
Amazon S3 Adapter for Snowball, which has a data transfer rate
typically between 250 MB/s and 400 MB/s

DMS – Database Migration Service
• Quickly and securely migrate databases to AWS, resilient, self healing
• The source database remains available during the migration
• Supports:
• Homogeneous migrations: ex Oracle to Oracle
• Heterogeneous migrations: ex Microsoft SQL Server to Aurora
• Continuous Data Replication using CDC
• You must create an EC2 instance to perform the replication tasks
(Diagram: Source DB → EC2 instance running DMS → Target DB)

DMS Sources and Targets
SOURCES:
• On-premises and EC2 instances databases: Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, SAP, DB2
• Azure: Azure SQL Database
• Amazon RDS: all including Aurora
• Amazon S3

TARGETS:
• On-premises and EC2 instances databases: Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, SAP
• Amazon RDS including Aurora
• Amazon Redshift
• Amazon DynamoDB
• Amazon S3
• ElasticSearch Service
• Kinesis Data Streams
• DocumentDB

AWS Schema Conversion Tool (SCT)
• Convert your Database’s Schema from one engine to another
• Example OLTP: (SQL Server or Oracle) to MySQL, PostgreSQL, Aurora
• Example OLAP: (Teradata or Oracle) to Amazon Redshift

(Diagram: Source DB → DMS + SCT → Target DB (different engine))

• You do not need to use SCT if you are migrating the same DB engine
• Ex: on-premises PostgreSQL => RDS PostgreSQL
• The DB engine is still PostgreSQL (RDS is the platform)
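
The rule above boils down to comparing the source and target engines, not the platforms they run on. A deliberately simplified sketch (the function name is illustrative, and engine names are compared as plain strings, so variants such as "Aurora PostgreSQL" versus "PostgreSQL" would need their own mapping):

```python
def needs_sct(source_engine: str, target_engine: str) -> bool:
    """True when the schema has to be converted, i.e. the engines differ."""
    return source_engine.strip().lower() != target_engine.strip().lower()


if __name__ == "__main__":
    # On-premises PostgreSQL -> RDS PostgreSQL: same engine, RDS is only the platform.
    print(needs_sct("PostgreSQL", "postgresql"))
    # Oracle -> PostgreSQL: different engines, so SCT is part of the migration.
    print(needs_sct("Oracle", "PostgreSQL"))
```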

DMS – Good things to know
• Works over VPC Peering, VPN (site to site, software), Direct Connect
• Supports Full Load, Full Load + CDC, or CDC only
• Oracle:
• Source: Supports TDE for the source using “BinaryReader”
• Target: Supports BLOBs in tables that have a primary key, and TDE
• ElasticSearch:
• Source: does not exist
• Target: possible to migrate to ElasticSearch from a relational database using DMS
• Therefore DMS cannot be used to replicate ElasticSearch data

Snowball + Database Migration Service (DMS)
• Larger data migrations can include many terabytes of information.
• Can be limited due to network bandwidth or size of data
• AWS DMS can use Snowball Edge & Amazon S3 to speed up migration
• Following stages:
1. You use the AWS Schema Conversion Tool (AWS SCT) to extract the data
locally and move it to an Edge device.
2. You ship the Edge device or devices back to AWS.
3. After AWS receives your shipment, the Edge device automatically loads its
data into an Amazon S3 bucket.
4. AWS DMS takes the files and migrates the data to the target data store. If you
are using change data capture (CDC), those updates are written to the
Amazon S3 bucket and then applied to the target data store.

AWS Cloud Adoption Readiness Tool (CART)
• Helps organizations develop efficient and effective plans for cloud adoption and migrations
• Transforms your idea of moving to the cloud into a detailed plan that follows AWS best
practices
• Answer a set of questions across six perspectives (business, people, process, platform,
operations, security)
• Generates a custom report on your level of migration readiness

Disaster Recovery Overview
• Any event that has a negative impact on a company’s business continuity
or finances is a disaster
• Disaster recovery (DR) is about preparing for and recovering from a
disaster
• What kind of disaster recovery?
• on-premises => on-premises: traditional DR, and very expensive
• on-premises => AWS Cloud: hybrid recovery
• AWS Cloud Region A => AWS Cloud Region B
• Need to define two terms:
• RPO: Recovery Point Objective
• RTO: Recovery Time Objective

RPO and RTO
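(Timeline: RPO → Disaster → RTO. The time between the RPO and the disaster is data loss; the time between the disaster and the RTO is downtime)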
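
To make the two terms concrete, the sketch below measures the data-loss window and the downtime for one incident and checks them against target objectives. All names and timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(last_recovery_point: datetime, disaster: datetime) -> timedelta:
    """Data written after the last recovery point (e.g. last backup) is lost."""
    return disaster - last_recovery_point


def downtime(disaster: datetime, service_restored: datetime) -> timedelta:
    """Time the service is unavailable after the disaster."""
    return service_restored - disaster


def meets_objectives(
    last_recovery_point: datetime,
    disaster: datetime,
    service_restored: datetime,
    rpo: timedelta,
    rto: timedelta,
) -> bool:
    """True when the actual data loss and downtime stay within the RPO and RTO."""
    return (
        data_loss_window(last_recovery_point, disaster) <= rpo
        and downtime(disaster, service_restored) <= rto
    )


if __name__ == "__main__":
    backup = datetime(2024, 1, 1, 2, 0)      # nightly backup (hypothetical)
    disaster = datetime(2024, 1, 1, 9, 30)
    restored = datetime(2024, 1, 1, 11, 0)
    print("data loss:", data_loss_window(backup, disaster))
    print("downtime:", downtime(disaster, restored))
    print("ok:", meets_objectives(backup, disaster, restored, timedelta(hours=24), timedelta(hours=4)))
```

A tighter RPO means more frequent backups or continuous replication; a tighter RTO means more infrastructure already running, which is the trade-off behind the strategies that follow.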

Disaster Recovery Strategies
• Backup and Restore
• Pilot Light
• Warm Standby
• Hot Site / Multi Site Approach

Faster RTO (going down the list)

Backup and Restore (High RPO)
(Diagram: the corporate data center sends data to Amazon S3 through AWS Storage Gateway or AWS Snowball, with a lifecycle to Glacier. In the AWS Cloud, EBS, Redshift, and RDS take scheduled regular snapshots. Recovery uses an AMI for Amazon EC2 and a snapshot for Amazon RDS)

Disaster Recovery – Pilot Light
• A small version of the app is always running in the cloud
• Useful for the critical core (pilot light)
• Very similar to Backup and Restore
• Faster than Backup and Restore as critical systems are already up

(Diagram: Route 53 in front of the corporate data center and the AWS Cloud; Data Replication from the data center to RDS (running); EC2 (not running))

Warm Standby
• Full system is up and running, but at minimum size
• Upon disaster, we can scale to production load
(Diagram: Route 53 fails over from the corporate data center (Reverse proxy, App Server, Master DB) to the AWS Cloud (ELB, EC2 Auto Scaling at minimum, RDS Slave running), with Data Replication from the Master DB to the RDS Slave)

Multi Site / Hot Site Approach
• Very low RTO (minutes or seconds) – very expensive
• Full production scale is running on AWS and on-premises
(Diagram: active / active. Route 53 routes to both the corporate data center (Reverse proxy, App Server, Master DB) and the AWS Cloud (ELB, EC2 Auto Scaling at production size, RDS Slave running), with Data Replication from the Master DB to the RDS Slave and failover)

All AWS Multi Region

(Diagram: active / active across two AWS Cloud Regions. Route 53 routes to an ELB and EC2 Auto Scaling (production) in each Region; Data Replication from Aurora Global (master) to Aurora Global (slave); failover between Regions)

Disaster Recovery Tips
• Backup
• EBS Snapshots, RDS automated backups / Snapshots, etc…
• Regular pushes to S3 / S3 IA / Glacier, Lifecycle Policy, Cross Region Replication
• From on-premises: Snowball or Storage Gateway
• High Availability
• Use Route53 to migrate DNS over from Region to Region
• RDS Multi-AZ, ElastiCache Multi-AZ, EFS, S3
• Site to Site VPN as a recovery from Direct Connect
• Replication
• RDS Replication (Cross Region), AWS Aurora + Global Database
• Database replication from on-premises to RDS
• Storage Gateway
• Automation
• CloudFormation / Elastic Beanstalk to re-create a whole new environment
• Recover / Reboot EC2 instances with CloudWatch alarms if status checks fail
• AWS Lambda functions for customized automations
• Chaos
• Netflix has a “simian-army” randomly terminating EC2 instances

AWS Fault Injection Simulator (FIS)
• A fully managed service for running fault injection experiments on AWS workloads
• Based on Chaos Engineering – stressing an application by creating disruptive events
(e.g., sudden increase in CPU or memory), observing how the system responds, and
implementing improvements
• Helps you uncover hidden bugs and performance bottlenecks
• Supports the following AWS services: EC2, ECS, EKS, RDS…
• Use pre-built templates that generate the desired disruptions
(Diagram: create an Experiment Template in AWS Fault Injection Simulator → start the experiment on Resources (EC2, ECS, EKS, RDS) → Monitoring (CloudWatch, EventBridge) → Stop Experiment (stop if complete or an alarm is triggered) → View Results (identify performance, observability, or resiliency issues))

AWS Application Discovery Service
• Plan migration projects by gathering information about on-premises data centers
• Server utilization data and dependency mapping are important for migrations
• Agentless Discovery (AWS Agentless Discovery Connector)
• Open Virtual Appliance (OVA) package that can be deployed to a VMware host
• VM inventory, configuration, and performance history such as CPU, memory, and disk usage
• OS agnostic
• Agent-based Discovery (AWS Application Discovery Agent)
• System configuration, system performance, running processes, and details of the network
connections between systems
• Supports Microsoft Server, Amazon Linux, Ubuntu, RedHat, CentOS, SUSE…
• Resulting data can be exported as CSV or viewed within AWS Migration Hub
• Data can be explored using pre-defined queries in Amazon Athena

AWS Application Migration Service (MGN)
• The “AWS evolution” of CloudEndure Migration, replacing AWS Server Migration Service (SMS)

• Lift-and-shift (rehost) solution which simplifies migrating applications to AWS
• Converts your physical, virtual, and cloud-based servers to run natively on AWS
• Supports a wide range of platforms, Operating Systems, and databases
• Minimal downtime, reduced costs

(Diagram: in the corporate data center or any cloud, the AWS Replication Agent on the source servers (OS, Apps, DB, Disks) performs continuous replication to a Staging area of low-cost EC2 instances & EBS volumes in Application Migration Service; cutover moves to Production on target EC2 instances & EBS volumes)

AWS Elastic Disaster Recovery (DRS)
• Used to be named “CloudEndure Disaster Recovery”

• Quickly and easily recover your physical, virtual, and cloud-based servers into AWS
• Example: protect your most critical databases (including Oracle, MySQL, and SQL Server),
enterprise apps (SAP), protect your data from ransomware attacks, …
• Continuous block-level replication for your servers
(Diagram: in the corporate data center or any cloud, the AWS Replication Agent on the source servers (OS, Apps, DB, Disks) performs continuous replication (seconds) to a Staging area of low-cost EC2 instances & EBS volumes in Elastic Disaster Recovery; failover (minutes) moves to Production on target EC2 instances & EBS volumes; failback returns to the source)

On-premises strategy with AWS
• Ability to download Amazon Linux 2 AMI as a VM (.iso format)
• VMWare, KVM, VirtualBox (Oracle VM), Microsoft Hyper-V
• AWS Application Discovery Service
• Gather information about your on-premises servers to plan a migration
• Server utilization and dependency mappings
• Track with AWS Migration Hub
• AWS Application Migration Service (MGN)
• Replacing AWS Server Migration Services & CloudEndure Migration
• Incremental replication of on-premises live servers to AWS
• Migrates the entire VM into AWS
• AWS Elastic Disaster Recovery (DRS)
• Replacing CloudEndure Disaster Recovery
• Recover on-premises workloads onto AWS
• AWS Database Migration Service (DMS)
• replicate on-premises => AWS, AWS => AWS, AWS => on-premises
• Works with various database technologies (Oracle, MySQL, DynamoDB, etc..)

VPC Section

VPC Basics
• CIDR: Block of IP addresses
• Example: 192.168.0.0/26: 192.168.0.0 – 192.168.0.63 (64 IP)
• Used for security groups, route tables, VPC, subnets, etc…
• Private IP
• 10.0.0.0 – 10.255.255.255 (10.0.0.0/8) <= in big networks
• 172.16.0.0 – 172.31.255.255 (172.16.0.0/12) <= default AWS one
• 192.168.0.0 – 192.168.255.255 (192.168.0.0/16) <= example: home networks
• Public IP
• All the rest

VPC Basics
• VPC
• A VPC must have a defined list of CIDR blocks, that cannot be changed
• Each CIDR within VPC: min size is /28, max size is /16 (65536 IP addresses)
• VPC is private, so only Private IP CIDR ranges are allowed
• Subnets
• Within a VPC, defined as a CIDR that is a subset of the VPC CIDR
• All instances within subnets get a private IP
• The first 4 IPs and the last one in every subnet are reserved by AWS
• Route Tables
• Used to control where the network traffic is directed to
• Can be associated with specific subnets
• The “most specific” routing rule is always followed (192.168.0.0/24 beats 0.0.0.0/0)
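
The sizing rules above (VPC CIDR between /16 and /28, subnets carved out of the VPC CIDR, 5 reserved addresses per subnet) can be sketched with the standard `ipaddress` module. Function names are illustrative, and `subnet_of` needs Python 3.7 or later.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first 4 addresses plus the last one


def is_valid_vpc_cidr(cidr: str) -> bool:
    """VPC CIDR blocks must be between /16 (largest) and /28 (smallest)."""
    return 16 <= ipaddress.ip_network(cidr).prefixlen <= 28


def is_subnet_of_vpc(subnet_cidr: str, vpc_cidr: str) -> bool:
    """A subnet CIDR must be a subset of the VPC CIDR."""
    return ipaddress.ip_network(subnet_cidr).subnet_of(ipaddress.ip_network(vpc_cidr))


def usable_ips(subnet_cidr: str) -> int:
    """Addresses left for instances once the reserved ones are removed."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    print(is_valid_vpc_cidr("10.0.0.0/16"), is_valid_vpc_cidr("10.0.0.0/8"))
    print(is_subnet_of_vpc("10.0.1.0/24", "10.0.0.0/16"))
    print(usable_ips("10.0.1.0/24"), usable_ips("10.0.1.0/28"))
```

This is why a /28 subnet, the smallest allowed, only leaves 11 addresses for your instances.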

VPC Basics
• Internet Gateway (IGW)
• Helps our VPC connect to the internet, HA, scales horizontally
• Acts as a NAT for instances that have a public IPv4 or public IPv6
• Public Subnets
• Has a route table that sends 0.0.0.0/0 to an IGW
• Instances must have a public IPv4 to talk to the internet
• Private Subnets
• Access internet with a NAT Instance or NAT Gateway setup in a public subnet
• Must edit routes so that 0.0.0.0/0 routes traffic to the NAT

VPC Basics
• NAT Instance
• EC2 instance you deploy in a public subnet
• Edit the route in your private subnet to route 0.0.0.0/0 to your NAT instance
• Not resilient to failure, limited bandwidth based on instance type, cheap
• Must manage failover yourself
• NAT Gateway
• Managed NAT solution, bandwidth scales automatically
• Resilient to failure within a single AZ
• Must deploy multiple NAT Gateways in multiple AZ for HA
• Has an Elastic IP, external services see the IP of the NAT Gateway as the source
(Diagram: a Private EC2 (192.168.0.34) in a private subnet of the VPC → NAT Gateway (or NAT Instance) in a public subnet, with Elastic IP 54.25.43.122 → Public internet → 3rd party service, which must whitelist the Elastic IP)
VPC Basics
• Network ACL (NACL)
• Stateless firewall defined at the subnet level, applies to all instances within
• Support for allow and deny rules
• Stateless = return traffic must be explicitly allowed by rules
• Helpful to quickly and cheaply block specific IP addresses

• Security Groups
• Applied at the instance level, only support for allow rules, no deny rules
• Stateful = return traffic is automatically allowed, regardless of rules
• Can reference other security groups in the same region (peered VPC, cross-account)

VPC Basics
• VPC Flow Logs
• Log IP traffic going through your VPC
• Can be defined at the VPC level, Subnet level, or ENI-level
• Helpful to capture “denied internet traffic”
• Can be sent to CloudWatch Logs and Amazon S3

• Bastion Hosts
• SSH into private EC2 instances through a public EC2 instance (bastion host)
• You must manage these instances yourself (failover, recovery)
• SSM Session Manager is a more secure way to remote control without SSH

VPC Basics
• IPv6 in short
• All IPv6 addresses are public, total 3.4×10^38 addresses (vs 4.3 billion IPv4)
• Example CIDR: 2600:1f18:80c:a900::/56
• Addresses are “random” and can’t be scanned online (because too many)
• VPC support for IPv6
• Create an IPv6 CIDR for VPC & use an IGW (supports IPv6)
• Public subnet:
• Create an instance with IPv6 support
• Create a route table entry to ::/0 (IPv6 “all”) to the IGW
• Private subnet (instances cannot be reached by IPv6 but can reach IPv6):
• Create an Egress-Only Internet Gateway in the public subnet
• Add a route table entry for the private subnet from ::/0 to the Egress-Only IGW

VPC Peering
• Connect two VPC, privately using AWS’ network
• Make them behave as if they were in the same network
• Must not have overlapping CIDR
• VPC Peering connection is not transitive (must be established for each VPC that need to communicate with one another)
• You can do VPC peering with another AWS account
• You must update route tables in each VPC’s subnets to ensure instances can communicate
(Diagram: VPC A, VPC B, and VPC C, with VPC peering connections A ↔ B, A ↔ C, and B ↔ C)
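
The "no overlapping CIDR" requirement is easy to check before requesting a peering connection. A minimal sketch using the standard `ipaddress` module; the function name is illustrative and it only compares one CIDR block per VPC.

```python
import ipaddress


def can_peer(vpc_a_cidr: str, vpc_b_cidr: str) -> bool:
    """Two VPCs can only be peered when their CIDR blocks do not overlap."""
    a = ipaddress.ip_network(vpc_a_cidr)
    b = ipaddress.ip_network(vpc_b_cidr)
    return not a.overlaps(b)


if __name__ == "__main__":
    print(can_peer("10.0.0.0/16", "10.1.0.0/16"))    # distinct ranges
    print(can_peer("10.0.0.0/16", "10.0.128.0/17"))  # second range sits inside the first
```

Because peering is not transitive, the same check has to hold for every pair of VPCs that is peered (A and B, A and C, B and C).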

VPC Peering – Good to know
• VPC peering can work inter-region, cross-account
• You can reference a security group of a peered VPC (works cross account)

VPC Peering – Longest Prefix Match
• VPC uses the longest prefix match to select the most specific route

• Here the longest prefix for 10.0.0.77 is 10.0.0.77/32 (route table VPC A)
• (other way of saying it is “most specific route”)
https://docs.aws.amazon.com/vpc/latest/peering/peering-configurations-partial-access.html#one-to-two-vpcs-lpm
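
Longest prefix match can be sketched in a few lines: among all routes whose CIDR contains the destination, pick the one with the largest prefix length. The route table below is hypothetical (the target names are made up); it mirrors the slide's case where a /32 route for 10.0.0.77 wins over a broader route.

```python
import ipaddress
from typing import Dict, Optional


def select_route(destination: str, routes: Dict[str, str]) -> Optional[str]:
    """Return the target of the most specific route matching destination, or None."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in routes.items():
        network = ipaddress.ip_network(cidr)
        if dest in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None
    # The longest prefix (largest prefix length) is the most specific route.
    return max(matches, key=lambda m: m[0])[1]


if __name__ == "__main__":
    route_table_vpc_a = {             # hypothetical route table for VPC A
        "10.0.0.0/16": "peering-to-vpc-b",
        "10.0.0.77/32": "peering-to-vpc-c",
        "0.0.0.0/0": "internet-gateway",
    }
    for ip in ("10.0.0.77", "10.0.0.5", "8.8.8.8"):
        print(ip, "->", select_route(ip, route_table_vpc_a))
```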

VPC Peering – Invalid Configurations
• Overlapping CIDR for IPv4
• No Transitive VPC Peering
• No Edge to Edge Routing: VPN, Direct Connect, IGW, NAT, Gateway VPC Endpoint (S3 & DynamoDB)
https://docs.aws.amazon.com/vpc/latest/peering/invalid-peering-configurations.html
VPC Peering – Invalid Configuration
No edge to edge routing
• This is an invalid configuration
• VPC Peering does not support edge to edge routing for NAT devices
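(Diagram: two VPCs with private subnets are peered with a Central VPC whose public subnet holds a NAT Gateway and an Internet Gateway; traffic from the peered VPCs through the NAT Gateway is marked X, not allowed)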

Transit VPC (=Software VPN)
• Not an AWS offering, newer managed solution is Transit Gateway
• Uses the public internet with a software VPN solution
• Allows for transitive connectivity between VPC & locations
• More complex routing rules, overlapping CIDR ranges, network-level packet filtering

(Diagram: two VPCs connect over VPN to a Software VPN in the Transit VPC; the corporate data center connects to it over the public www)

Network topologies can become complicated

Transit Gateway
• For having transitive peering between thousands of VPCs and on-premises, hub-and-spoke (star) connection
• Regional resource, can work cross-region
• Share cross-account using Resource Access Manager (RAM)
• You can peer Transit Gateways across regions
• Route Tables: limit which VPC can talk with other VPCs
• Works with Direct Connect Gateway, VPN connections
• Supports IP Multicast (not supported by any other AWS service)
• Instances in a VPC can access a NAT Gateway, NLB, PrivateLink, and EFS in other VPCs attached to the AWS Transit Gateway.

Transit Gateway – Central NAT Gateway
• The NAT Gateway is shared in the Egress-VPC
• The private App VPC can access internet through the TGW
• In this example: the App VPCs cannot communicate with each other based on the TGW route table

https://aws.amazon.com/blogs/networking-and-content-delivery/creating-a-single-internet-exit-point-from-multiple-vpcs-using-aws-transit-gateway/
VPC Endpoints
• Endpoints allow you to connect to AWS Services using a private network instead of the public www network
• They scale horizontally and are redundant
• No more IGW, NAT, etc… to access AWS Services
• VPC Endpoint Gateway (S3 & DynamoDB)
• VPC Endpoint Interface (all incl. S3 & DDB)

• In case of issues:
• Check DNS Setting Resolution in your VPC
• Check Route Tables

[Diagram: from a private subnet in a VPC, a VPC Endpoint Interface (ENI) and a VPC Endpoint Gateway give access to S3, DynamoDB, and CloudWatch]

VPC Endpoint Gateway
• Only works for S3 and DynamoDB, must create one gateway per VPC
• Must update route table entries
• Gateway is defined at the VPC level

[Diagram: EC2 → VPC Endpoint Gateway → S3]
• DNS resolution must be enabled in the VPC
• The same public hostname for S3 can be used
• Gateway endpoint cannot be extended out of a VPC (VPN, DX, TGW, peering)

VPC Endpoints Interface
• Provision an ENI that will have a private endpoint interface hostname
• Leverage Security Groups for security
• Private DNS (setting when you create the endpoint)
• The public hostname of a service will resolve to the private Endpoint Interface hostname
• VPC Setting: “Enable DNS hostnames” and “Enable DNS Support” must be “true”
• Example for Athena:
• vpce-0b7d2995e9dfe5418-mwrths3x.athena.us-east-1.vpce.amazonaws.com
• vpce-0b7d2995e9dfe5418-mwrths3x-us-east-1a.athena.us-east-1.vpce.amazonaws.com
• vpce-0b7d2995e9dfe5418-mwrths3x-us-east-1b.athena.us-east-1.vpce.amazonaws.com
• athena.us-east-1.amazonaws.com (private DNS name)
• Interface can be accessed from Direct Connect and Site-to-Site VPN

VPC Endpoint Policies
• Endpoint Policies are JSON documents to control access to services
• They do not override or replace IAM user policies or service-specific policies (such as S3 bucket policies)
• Note: the IAM user can still use other SQS APIs from outside the VPC Endpoint
• You could add an SQS queue policy to deny any action not done through the VPC endpoint
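A minimal sketch of such an SQS queue policy, built as a Python dictionary and printed as JSON. The queue ARN, account ID, and endpoint ID are hypothetical placeholders; the `aws:sourceVpce` condition key is the one used elsewhere in these slides.

```python
import json

def deny_outside_endpoint_policy(queue_arn, vpce_id):
    """Build a queue policy that denies every SQS action not made through vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyIfNotThroughVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "sqs:*",
                "Resource": queue_arn,
                # Requests arriving through any other path do not match this endpoint ID.
                "Condition": {"StringNotEquals": {"aws:sourceVpce": vpce_id}},
            }
        ],
    }

# Hypothetical identifiers, for illustration only.
queue_policy = deny_outside_endpoint_policy(
    "arn:aws:sqs:us-east-1:123456789012:my-queue", "vpce-1a2b3c4d"
)
print(json.dumps(queue_policy, indent=2))
```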
VPC Endpoint Policy & S3 bucket policy
• VPC Endpoint Policy to restrict access to bucket “my_secure_bucket”
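One possible shape for this endpoint policy is sketched below as a Python dictionary. The list of allowed actions is an assumption chosen for illustration; only the bucket name comes from the slide.

```python
import json

def endpoint_policy_for_bucket(bucket_name):
    """Build a VPC endpoint policy that only allows access to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AccessToSpecificBucketOnly",
                "Effect": "Allow",
                "Principal": "*",
                # Assumed action list; adjust to what the workload really needs.
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }

endpoint_policy = endpoint_policy_for_bucket("my_secure_bucket")
print(json.dumps(endpoint_policy, indent=2))
```

Requests through the endpoint to any other bucket match no Allow statement and are therefore denied.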

VPC Endpoint Policy & S3 bucket policy
• VPC Endpoint Policy to allow access to Amazon Linux 2 repositories

VPC Endpoint Policy & S3 bucket policy
• S3 bucket policy may have
• Condition: "aws:sourceVpce": "vpce-1a2b3c4d" to Deny any traffic that doesn't
come from a specific VPC endpoint (more secure)
• Condition: "aws:sourceVpc": "vpc-111bbb22" for a specific VPC
• The aws:sourceVpc condition only works for VPC Endpoints, in case you
have multiple endpoints and want to manage access to your S3 buckets
for all your endpoints
• The S3 bucket policies can restrict access only from a specific public IP
address or an elastic IP address. You can’t restrict based on private IP
• Therefore aws:SourceIp condition doesn’t apply for VPC endpoints

Example S3 bucket policies
• S3 bucket policy to restrict to one specific VPC Endpoint
• S3 bucket policy to restrict to an entire VPC (multiple VPC Endpoints)
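The two bucket policies differ only in the condition key they test. A hedged sketch follows; the bucket name is assumed, and the endpoint and VPC IDs are the sample values from the previous slide.

```python
import json

def restrict_bucket_policy(bucket_name, condition_key, allowed_value):
    """Deny all S3 actions on the bucket unless condition_key equals allowed_value."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromAllowedSource",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Condition": {"StringNotEquals": {condition_key: allowed_value}},
            }
        ],
    }

# Restrict to one specific VPC Endpoint.
restrict_to_endpoint = restrict_bucket_policy(
    "my_secure_bucket", "aws:sourceVpce", "vpce-1a2b3c4d"
)
# Restrict to an entire VPC (covers all of its VPC Endpoints).
restrict_to_vpc = restrict_bucket_policy(
    "my_secure_bucket", "aws:sourceVpc", "vpc-111bbb22"
)
print(json.dumps(restrict_to_endpoint, indent=2))
print(json.dumps(restrict_to_vpc, indent=2))
```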

VPC Endpoint Policies for S3
Troubleshooting
• Check IAM permissions
• Check SG Outbound Rules (security group of the instance in the private subnet)
• Route table: must have route to S3 using gateway VPC Endpoint
• Check VPC Endpoint Policy
• Verify S3 bucket policy
• Check VPC DNS settings: DNS resolution must be enabled

AWS PrivateLink (VPC Endpoint Services)
• Most secure & scalable way to expose a service to 1000s of VPC (own or other accounts)
• Does not require VPC peering, internet gateway, NAT, route tables…
• Requires a network load balancer (Service VPC) and ENI (Customer VPC)
• If the NLB is in multiple AZ, and the ENI in multiple AZ, the solution is fault tolerant!

[Diagram: Service VPC (application service → Network Load Balancer) → AWS PrivateLink (AWS private) → Customer VPC (Elastic Network Interface (ENI) → consumer application)]

Secure and Scale Web Filtering
using Explicit Proxy

https://aws.amazon.com/blogs/networking-and-content-delivery/how-to-use-aws-privatelink-to-secure-and-scale-web-filtering-using-explicit-proxy/

Site to Site VPN (AWS Managed VPN)
• On-premises:
• Set up a software or hardware VPN appliance in your on-premises network.
• The on-premises VPN should be accessible using a public IP
• AWS-side:
• Set up a Virtual Private Gateway (VGW) and attach it to your VPC
• Set up a Customer Gateway to point to the on-premises VPN appliance
• Two VPN connections (tunnels) are created for redundancy, encrypted using IPSec
• Can optionally accelerate it using Global Accelerator (for worldwide networks)

[Diagram: the VPN appliance (Customer Gateway, public IP) in the corporate data center connects over the public internet through Site-to-Site VPN Tunnel 1 (IPSec) and Tunnel 2 (IPSec) to the Virtual Private Gateway (VGW) of a VPC with a private subnet]

Route Propagation in Site-to-Site VPN
[Diagram: Corporate data center (10.3.0.0/20, route table) → CGW (custom ASN) → Site-to-Site VPN → VGW (custom ASN) → VPC private subnet (10.0.0.0/24, route table)]

• Static Routing:
• Create static route in corporate data center for 10.0.0.0/24 through the CGW
• Create static route in AWS for 10.3.0.0/20 through the VGW
• Dynamic Routing (BGP):
• Uses BGP (Border Gateway Protocol) to share routes automatically (eBGP for internet)
• We don’t need to update the routing tables; it will be done for us dynamically
• Just need to specify the ASN (Autonomous System Number) of the CGW and VGW

Site to Site VPN and Internet Access
• NOT OKAY (blocked by NAT Gateway restrictions)
[Diagram: Corporate data center → CGW → VGW (or Direct Connect) → NAT Gateway in the VPC public subnet → IGW → google.com]

• OKAY (self managed NAT Instance – more control)
[Diagram: Corporate data center → CGW → VGW (or Direct Connect) → NAT Instance in the VPC public subnet → IGW → google.com]

Site to Site VPN and Internet Access

• OKAY (alternative to NAT Instances / Gateway)

[Diagram: VPC private subnet → VGW → CGW (or Direct Connect) → corporate data center → on-premises NAT → google.com]

AWS VPN CloudHub
• Can connect up to 10 Customer Gateways for each Virtual Private Gateway (VGW)
• Low cost hub-and-spoke model for
primary or secondary network
connectivity between locations
• Provide secure communication between
sites, if you have multiple VPN connections
• It’s a VPN connection so it goes over the
public internet
• Can be a failover connection between
your on-premises locations

AWS Client VPN
• Connect from your computer using OpenVPN to your private network
in AWS and on-premises
[Diagram: a computer with AWS Client VPN (OpenVPN) uses private IPs to reach the VPC private subnet (10.0.0.0/24) and, through VGW (custom ASN) → Site-to-Site VPN → CGW (custom ASN), the corporate data center (10.3.0.0/20)]

Software VPN (not AWS managed)
• You can set up your own software VPN, but you have to manage everything including bandwidth, redundancy, etc.

[Diagram: a software VPN in the corporate data center connects over the public www to a software VPN in the VPC public subnet, which gives access to the private subnet]

• You would have more control over the setup and routing options

VPN to multiple VPC
• For VPN-based customers, AWS recommends creating a separate VPN connection for each customer VPC.
• Direct Connect is recommended because it has a Direct Connect Gateway

[Diagram: the Customer Gateway in the corporate data center has one VPN connection to each of three VGWs, each attached to its own VPC]

Shared Services VPC
• Create a VPN connection between on-premises and the shared service VPC
• Replicate services, applications, databases between on-premises and the Shared Services VPC or deploy proxies in the shared service VPC
• Do VPC peering between the VPCs and the shared service VPC
• VPCs can directly access the Shared Service VPC services and do not need VPN connections to on-premises

[Diagram: VPC A, VPC B, and VPC C are peered with the Shared Services VPC (replicated services, application proxies), whose VGW connects through a Site-to-Site VPN to the CGW in the corporate data center]


Other Solutions
• Transit VPC (complicated)
• Good for resources that are hard to replicate on the cloud
• Transit Gateway (simple)
• Must use VPN as VPC peering does not support transitive routing

[Diagram: two VPCs, each with a VGW, connect over VPN to a software VPN in the Transit VPC, which connects to the corporate data center]

Direct Connect
• Provides a dedicated private connection from a remote network to
your VPC
• Dedicated connection must be set up between your DC and AWS Direct Connect locations
• More expensive than running a VPN solution
• Private access to AWS services through VIF
• Bypass ISP, reduce network cost, increase bandwidth and stability
• Not redundant by default (must set up a failover DX or VPN)

Direct Connect – Virtual Interfaces (VIF)
• Public VIF – connect to Public AWS Endpoints (S3 buckets, EC2 service,
anything AWS …)
• Private VIF – connect to resources in your VPC (EC2 instances, ALB, …)
• Transit Virtual Interface – connect to resources in a VPC using a Transit
Gateway

• Gateway VPC Endpoints can’t be accessed through a Private VIF (Interface VPC Endpoints can be reached over Direct Connect)

Direct Connect Diagram
[Diagram: in Region us-east-1, a private virtual interface reaches EC2 instances in a VPC private subnet through the Virtual Private Gateway, and a public virtual interface reaches Amazon Glacier and Amazon S3. Both run over VLANs (VLAN 1, VLAN 2) from the AWS Direct Connect Endpoint (AWS cage) to the customer or partner router (customer or partner cage) in the AWS Direct Connect Location, then to the customer router/firewall in the corporate data center (customer network)]

Direct Connect – Connection Types
• Dedicated Connections: 1 Gbps, 10 Gbps, 100 Gbps capacity
• Physical ethernet port dedicated to a customer
• Request made to AWS first, then completed by AWS Direct Connect Partners

• Hosted Connections: 50 Mbps, 500 Mbps, up to 10 Gbps
• Connection requests are made via AWS Direct Connect Partners
• Capacity can be added or removed on demand
• 1, 2, 5, 10 Gbps available at select AWS Direct Connect Partners

• Lead times are often longer than 1 month to establish a new connection

Direct Connect – Encryption
• Data in transit is not encrypted but is private
• AWS Direct Connect + VPN provides an IPsec-encrypted private connection
• VPN over Direct Connect connection uses Public VIF
• Good for an extra level of security, but slightly more complex to put in place

[Diagram: a client in the corporate data center → Customer Gateway → VPN connection through the AWS Direct Connect Endpoint in the AWS Direct Connect Location → Virtual Private Gateway → EC2 instances in Private Subnet 1 (us-east-1a) and Private Subnet 2 (us-east-1b) of a VPC in Region us-east-1]

Direct Connect – Link Aggregation Groups (LAG)
• Get increased speed and failover by summing up existing DX connections into a logical one
• Can aggregate up to 4 connections (active-active mode)
• Can add connections over time to the LAG
• All connections in the LAG:
• Must be dedicated connections
• Must have the same bandwidth
• Must terminate at the same AWS Direct Connect Endpoint
• Can set a minimum number of connections for the LAG to function

[Diagram: two corporate data centers, each with a Customer Gateway and a LAG of two connections (LAG 1 to AWS Direct Connect Location A, LAG 2 to AWS Direct Connect Location B), reach the Virtual Private Gateway of a VPC with private subnets in us-east-1a and us-east-1b, Region us-east-1]
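The LAG constraints listed above can be expressed as a small validation routine. This is a sketch for reasoning about the rules, not an AWS API: the `DxConnection` type, the endpoint names, and the function names are all hypothetical, and the limit of 4 is the figure stated on the slide.

```python
from dataclasses import dataclass

MAX_LAG_CONNECTIONS = 4  # limit as stated on the slide

@dataclass(frozen=True)
class DxConnection:
    name: str
    bandwidth_gbps: int
    endpoint: str          # AWS Direct Connect Endpoint the connection terminates at
    dedicated: bool = True

def lag_problems(connections):
    """Return the list of LAG rules that the given connections violate."""
    problems = []
    if len(connections) > MAX_LAG_CONNECTIONS:
        problems.append("too many connections")
    if any(not c.dedicated for c in connections):
        problems.append("all connections must be dedicated")
    if len({c.bandwidth_gbps for c in connections}) > 1:
        problems.append("all connections must have the same bandwidth")
    if len({c.endpoint for c in connections}) > 1:
        problems.append("all connections must terminate at the same endpoint")
    return problems

def lag_is_operational(connections_up, minimum_links):
    """The LAG functions only while at least minimum_links connections are up."""
    return connections_up >= minimum_links

good = [DxConnection("c1", 10, "endpoint-a"), DxConnection("c2", 10, "endpoint-a")]
bad = [DxConnection("c1", 10, "endpoint-a"), DxConnection("c2", 1, "endpoint-b")]
print(lag_problems(good))
print(lag_problems(bad))
```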

Direct Connect Gateway
• If you want to setup a Direct Connect to one or more VPC in many different
regions (same/cross account), you must use a Direct Connect Gateway
[Diagram: the customer network uses an AWS Direct Connect connection and a private virtual interface to reach a Direct Connect Gateway, which connects through private virtual interfaces to VPC 10.0.0.0/16 in Region us-east-1 and VPC 172.16.0.0/16 in Region us-west-1]

Direct Connect Gateway + Transit Gateway
[Diagram: a Transit Gateway attached to four VPCs, connected to a Direct Connect Gateway and, through a VPN connection, to a Customer Gateway]

Site-to-Site Active-Active Connection
Active-Active VPN Connection

[Diagram: in the AWS Region, the Virtual Private Gateway of VPC 172.16.0.0/16 has one VPN connection to the Customer Gateway of Corporate Data Center 1 (10.0.0.0/16) and one VPN connection to the Customer Gateway of Corporate Data Center 2 (10.1.0.0/16)]

Direct Connect – High Availability
Multiple connections at multiple AWS Direct Connect locations

[Diagram: two corporate data centers each connect to the Region through their own AWS Direct Connect location (Location 1 and Location 2)]

Direct Connect – High Availability
Backup VPN Connection

[Diagram: the Virtual Private Gateway of VPC 172.16.0.0/16 in the AWS Region connects to the Customer Gateways of Corporate Data Center 1 (10.0.0.0/16) and Corporate Data Center 2 (10.1.0.0/16) through AWS Direct Connect Location 1, with a VPN connection as backup]

Other Services Section

Continuous Integration
• Developers push the code to a code repository often (GitHub / CodeCommit / Bitbucket / etc…)
• A testing / build server checks the code as soon as it’s pushed (CodeBuild / Jenkins CI / etc…)
• The developer gets feedback about the tests and checks that have passed / failed
• Find bugs early, fix bugs
• Deliver faster as the code is tested
• Deploy often
• Happier developers, as they’re unblocked

[Diagram: the developer pushes code often to the code repository; the build server gets the code, builds & tests it, and tells the developer the results of the build]

Continuous Delivery
• Ensure that the software can be released reliably whenever needed.
• Ensures deployments happen often and are quick
• Shift away from “one release every 3 months” to “5 releases a day”
• That usually means automated deployment
• CodeDeploy
• Jenkins CD
• Spinnaker
• Etc…

[Diagram: push code often → code repository → build server (get code, build & test) → deployment server (deploy every passing build) → application servers move from v1 to v2]
Technology Stack for CICD
• Code: AWS CodeCommit, GitHub, or 3rd party code repository
• Build / Test: AWS CodeBuild (no time limit), Jenkins CI, or 3rd party CI servers
• Deploy / Provision: AWS Elastic Beanstalk, or AWS CodeDeploy with a user managed EC2 instances fleet (CloudFormation)
• Orchestrate: AWS CodePipeline

CICD Architecture
[Diagram: in CodeCommit, the DEV branch feeds the “DEV” CodePipeline (CodeBuild → CodeDeploy → Elastic Beanstalk DEV environment); a pull request & merge into the PROD branch feeds the “PROD” CodePipeline (CodeBuild → CodeDeploy → Elastic Beanstalk PROD environment)]

CodeCommit Trigger for AWS Lambda
• Every push to CodeCommit can trigger a Lambda function
• The Lambda function can scan for leaked AWS credentials on every code push, and disable them automatically to remedy the issue

[Diagram: Developer → push → AWS CodeCommit → trigger → AWS Lambda (scan credentials) → API call to IAM to disable access keys, lock repository, and notify the administrator through AWS SNS]
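The scanning step could look like the sketch below. It assumes that leaked long-term access key IDs follow the common `AKIA` + 16 uppercase alphanumeric characters shape; the function name and sample text are hypothetical, and the key shown is the placeholder used in AWS documentation. Disabling a found key (for example through the IAM `UpdateAccessKey` API) and notifying through SNS are left out.

```python
import re

# Assumed shape of a long-term access key ID: "AKIA" followed by 16 characters.
ACCESS_KEY_ID_PATTERN = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")

def find_access_key_ids(text):
    """Return the distinct access key IDs found in a pushed file's content."""
    return sorted(set(ACCESS_KEY_ID_PATTERN.findall(text)))

# Sample content of a pushed file; the key is the AWS documentation placeholder.
pushed_file = """
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
region = us-east-1
"""

leaked_keys = find_access_key_ids(pushed_file)
print(leaked_keys)
```

A real Lambda function would read the changed files from the commit referenced in the trigger event before running the scan.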

Good to know – CICD
• You can use a manual approval stage in CodePipeline
• Running unit tests: CodeCommit + CodeBuild + CodePipeline
[Diagram: CodePipeline: CodeCommit (Source) → CodeBuild (Run Tests)]

• Build and Store Docker Images: CodeBuild + ECR
[Diagram: CodeBuild (Build Image) → Docker image → Amazon ECR (Store Image)]

• Automated CloudFormation deployment: CodePipeline

CloudSearch
• Managed service to set up, manage and scale a search solution
• Managed alternative to ElasticSearch
• Free text, Boolean, autocomplete suggestions, geospatial search…

[Diagram: DynamoDB Table (API to retrieve items) → DynamoDB Stream → Lambda Function → Amazon CloudSearch (API to search items)]

Alexa for Business, Lex & Connect
• Alexa for Business:
• Use Alexa to help employees be more productive in meeting rooms and at their desks
• Measure and increase the utilization of meeting rooms in their workplace
• Amazon Lex: (same technology that powers Alexa)
• Automatic Speech Recognition (ASR) to convert speech to text
• Natural Language Understanding to recognize the intent of text, callers
• Helps build chatbots, call center bots
• Amazon Connect:
• Receive calls, create contact flows, cloud-based virtual contact center
• Can integrate with other CRM systems or AWS

[Diagram: Phone call (“Schedule an Appointment”) → call → Connect → stream → Lex (intent recognized) → invoke → Lambda → schedule → CRM]
AWS Rekognition
• Find objects, people, text, scenes in images and videos using ML
• Facial analysis and facial search to do user verification, people counting
• Create a database of “familiar faces” or compare against celebrities
• Use cases:
• Labeling
• Content Moderation
• Text Detection
• Face Detection and Analysis (gender, age range, emotions…)
• Face Search and Verification
• Celebrity Recognition
• Pathing (ex: for sports game analysis)

Rekognition: static images

[Diagram: file upload → Amazon S3 → S3 events → Lambda + Step Functions → analyze image with Rekognition; metadata is stored in a DynamoDB Table and indexed in Amazon CloudSearch (or ElasticSearch); a front end API is exposed through API Gateway]

Kinesis Video Streams
• One video stream per streaming device (producers)
• Security cameras, body worn camera, smartphone
• Can use a Kinesis Video Streams Producer library
• Underlying data is stored in S3 (but we don’t have access to it)
• Cannot output the stream data to S3 (must build custom solution)
• Consumers:
• Consumed by EC2 instances for real time analysis, or in batch
• Can leverage the Kinesis Video Stream Parser Library
• Integration with AWS Rekognition for facial detection

Video Streaming & Rekognition
[Diagram: video producers / DeepLens → Kinesis Video Stream → Rekognition Video (with an internal Rekognition face collection) → metadata stream → Kinesis Data Stream → consumers: EC2 with KCL, Kinesis Data Firehose, Kinesis Data Analytics, EC2]

AWS WorkSpaces
• Managed, Secure Cloud Desktop
• Great to eliminate management of on-premises VDI (Virtual Desktop Infrastructure)
• On Demand, pay by usage
• Secure, Encrypted, Network Isolation
• Integrated with Microsoft Active Directory

[Diagram: user → virtual desktop (Linux / Windows) in the AWS Cloud, with a secure connection to the corporate data center]

AWS WorkSpaces
• WorkSpaces Application Manager (WAM)
• Deploy and Manage applications as virtualized application containers
• Provision at scale, and keep the applications updated using WAM
• Windows Updates
• By default, Amazon WorkSpaces are configured to install software updates
• Amazon WorkSpaces with Windows will have Windows Update turned on
• You have full control over the Windows Update frequency
• Maintenance Windows
• Updates are installed during maintenance windows (you define them)
• Always On WorkSpaces: default is from 00h00 to 04h00 on Sunday morning
• AutoStop WorkSpaces: automatically starts once a month to install updates
• Manual maintenance: you define your windows and perform maintenance

Amazon AppStream 2.0
• Desktop Application Streaming Service
• Deliver to any computer, without acquiring, provisioning infrastructure
• The application is delivered from within a web browser

Amazon AppStream 2.0 vs WorkSpaces
• WorkSpaces
• Fully managed VDI and desktop available
• The users connect to the VDI and open native or WAM applications
• WorkSpaces are on-demand or always on

• AppStream 2.0
• Stream a desktop application to web browsers (no need to connect to a VDI)
• Works with any device (that has a web browser)
• Allows you to configure an instance type per application type (CPU, RAM, GPU)

Amazon Mechanical Turk
• Crowdsourcing marketplace to perform simple human tasks
• Distributed virtual workforce.
• Integrates with SWF natively, does not integrate with Step Functions

• Example:
• You have a list of 10,000 restaurant names in your area and you want to get the
telephone number, opening hours, address, etc…
• Assume the restaurant name is not perfect, therefore Google API cannot help
• You distribute the task on Mechanical Turk and humans will fill your database
• Other use cases: image classification, data collection, business processing

AWS Device Farm
• Application testing service for your mobile and web applications
• Test across real browsers and real mobile devices
• Fully automated using test frameworks
• Improve the quality of web and mobile apps
• Generates videos and logs to document the issues encountered
• Can remotely log-in to devices for debugging

AWS Macie
• Amazon Macie is a fully managed data security and data privacy service
that uses machine learning and pattern matching to discover and
protect your sensitive data in AWS.
• Macie helps identify and alert you to sensitive data, such as personally
identifiable information (PII)

[Diagram: S3 Buckets → analyze → Macie (discover sensitive data (PII)) → notify → CloudWatch Events / EventBridge → integrations]

Amazon Transcribe
• Automatically convert speech to text
• Uses a deep learning process called automatic speech recognition
(ASR) to convert speech to text quickly and accurately
• Use cases:
• transcribe customer service calls
• automate closed captioning and subtitling
• generate metadata for media assets to create a fully searchable archive

“Hello my name is Stéphane. I hope you’re enjoying the course!”

Amazon WorkDocs
• Alternative to Google Drive / OneDrive / etc
• A secure content management system that can be accessed through API calls from external customer apps
• Historical versions of files are kept for easy rollback
• Create, store, sync, and share files from any location
• Content encrypted at rest and in-transit
• Approval workflow, reminders, notifications
• WorkDocs Drive – native desktop application

[Diagram: WorkDocs use cases: secure and auditable content sharing; end-user and team file storage; team content collaboration and workflows; automation and extensibility; replace legacy network file shares]
Final Tips & Sample Questions

Analysis from the Practice Sample questions
• Exam Page: https://aws.amazon.com/certification/certified-solutions-architect-professional/

Question 1
• An enterprise has a large number of AWS accounts owned by separate
business groups. One of the accounts was recently compromised. The
attacker launched a large number of instances, resulting in a high bill for
that account.
• The security breach was addressed, but management has asked a
solutions architect to develop a solution to prevent excessive spending
in all accounts. Each business group wants to retain full control over its
AWS account.
• Which solution should the solutions architect recommend to meet
these requirements?

Question 1 – Architecture Diagram
[Diagram: three separate AWS accounts]

Option A
• Use AWS Organizations to add each AWS account to the master account.
Create a service control policy (SCP) that uses the `ec2:instanceType` condition
key to prevent the launch of high-cost instance types in each account.
[Diagram: three AWS accounts inside an AWS Organization, each with an SCP attached]
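For reference, the kind of SCP that Option A describes could look like the sketch below (note that the slides mark Option C as the correct answer, so this only illustrates the mechanism). The list of allowed instance types is a made-up example.

```python
import json

def instance_type_scp(allowed_types):
    """Build an SCP that denies launching EC2 instances outside allowed_types."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnapprovedInstanceTypes",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringNotEquals": {"ec2:InstanceType": allowed_types}
                },
            }
        ],
    }

# Hypothetical list of low-cost instance types.
scp = instance_type_scp(["t3.micro", "t3.small"])
print(json.dumps(scp, indent=2))
```

Such a policy limits what each business group can launch in its own account, which is at odds with the requirement that each group retains full control.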

Option B
• Attach a new customer-managed IAM policy to an IAM group in each
account that uses the ec2:instanceType condition key to prevent the launch
of high-cost instance types. Place all of the existing IAM users in each
group.
[Diagram: three AWS accounts, each with users in a group that has the policy attached]

CORRECT ANSWER
Option C
• Enable billing alerts on each AWS account. Create Amazon CloudWatch
alarms that send an Amazon SNS notification to the account
administrator whenever their account exceeds the spending budget.
[Diagram: three AWS accounts, each with a billing alarm that sends an SNS notification]

Option D
• Enable AWS Cost Explorer in each account. Regularly review the Cost
Explorer reports for each account to ensure spending does not
exceed the planned budget
[Diagram: three AWS accounts, each with Cost Explorer enabled]

Question 2
• A company has multiple AWS accounts. The company has integrated its
on-premises Active Directory (AD) with AWS SSO to grant AD users
least privilege abilities to manage infrastructure across all the accounts.
• A solutions architect must integrate a third-party monitoring solution
that requires read-only access across all AWS accounts. The monitoring
solution will run in its own AWS account.
• How can the monitoring solution be given the required permissions?

Question 2 - Architecture
[Diagram: two AWS accounts managed through AWS SSO (integrated with the on-premises AD), plus a separate AWS account running the 3rd party monitoring solution]

Option A
• Create a user in an AWS SSO directory and assign a read-only permissions set. Assign
all AWS accounts to be monitored to the new user. Provide the third-party
monitoring solution with the user name and password.
[Diagram: two AWS accounts managed through AWS SSO (integrated with the on-premises AD), with a new user created in AWS SSO for the 3rd party monitoring account]

Note on option A
• Currently, the sample question PDF says:
• “A is incorrect because credentials supplied by AWS SSO are temporary, so the application would lose permissions and have to re-login”
• That is wrong.
• Users created in AWS SSO have a password that doesn’t change and must respect the password policy defined.
• Here Option A is wrong because you can’t have both users defined in AWS SSO and in Active Directory. SSO only allows for one Identity source (SSO, AD or IdP).

Option B
• Create an AWS IAM role in the organization's master account. Allow the
AWS account of the third-party monitoring solution to assume the role.
[Diagram: two AWS accounts managed through AWS SSO (integrated with the on-premises AD); a role in the AWS master account is assumed by the 3rd party monitoring account]

Option C
• Invite the AWS account of the third-party monitoring solution to join
the organization. Enable all features
[Diagram: the 3rd party monitoring account joins the AWS Organization alongside the two AWS accounts, the AWS master account, and AWS SSO (integrated with the on-premises AD)]
CORRECT ANSWER
Option D
• Create an AWS CloudFormation template that defines a new AWS IAM role for
the third-party monitoring solution with the account of the third party listed in the
trust policy. Create the IAM role across all linked AWS accounts by using a stack set.
[Diagram: a CloudFormation StackSet deploys a stack with the role in all accounts; the 3rd party monitoring account assumes the role in each AWS account thanks to the trust policy; AWS SSO remains integrated with the on-premises AD]
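A minimal sketch of the template that Option D describes, generated as JSON from Python. The account ID and the logical resource name are hypothetical; `ReadOnlyAccess` is the AWS managed policy assumed here to provide the read-only permissions.

```python
import json

def monitoring_role_template(third_party_account_id):
    """Build a CloudFormation template for a cross-account read-only role."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "MonitoringReadOnlyRole": {
                "Type": "AWS::IAM::Role",
                "Properties": {
                    # Trust policy: lets the third party's account assume the role.
                    "AssumeRolePolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [
                            {
                                "Effect": "Allow",
                                "Principal": {
                                    "AWS": f"arn:aws:iam::{third_party_account_id}:root"
                                },
                                "Action": "sts:AssumeRole",
                            }
                        ],
                    },
                    "ManagedPolicyArns": ["arn:aws:iam::aws:policy/ReadOnlyAccess"],
                },
            }
        },
    }

# Hypothetical account ID of the third-party monitoring solution.
template = monitoring_role_template("111122223333")
print(json.dumps(template, indent=2))
```

Deploying this template through a stack set creates the same role in every linked account.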

Question 3
• A team is building an HTML form hosted in a public Amazon S3 bucket.
The form uses JavaScript to post data to an Amazon API Gateway
endpoint. The endpoint is integrated with AWS Lambda functions. The
team has tested each method in the API Gateway console and received
valid responses.
• Which combination of steps must be completed for the form to
successfully post to the API Gateway and receive a valid response?
(Select TWO.)

Question 3 – Architecture

[Diagram: the client GETs the form from the public S3 bucket and POSTs to API Gateway ([restapi-id].execute-api.amazonaws.com), which is integrated with Lambda]

CORRECT ANSWERS D,E
Options
• A) Configure the S3 bucket to allow cross-origin resource sharing (CORS).
• B) Host the form on Amazon EC2 rather than Amazon S3.
• C) Request a limit increase for API Gateway.
• D) Enable cross-origin resource sharing (CORS) in API Gateway.
• E) Configure the S3 bucket for web hosting.
[Diagram: Client, Public S3 Bucket, API Gateway ([restapi-id].execute-api.amazonaws.com), Lambda]

Question 3 – Final Architecture
• CORS is a browser-based security mechanism
• Enable CORS in API Gateway to allow calls with Origin [bucketname].s3-website-[region].amazonaws.com, using the header Access-Control-Allow-Origin
• The web browser visits [bucketname].s3-website-[region].amazonaws.com (GET from the public S3 bucket)
• It then makes API calls (POST) to [restapi-id].execute-api.amazonaws.com with Origin: [bucketname].s3-website-[region].amazonaws.com
[Diagram: Web Browser → Public S3 Bucket (GET); Web Browser → API Gateway → Lambda (POST)]
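As an illustration of the Access-Control-Allow-Origin header in practice, here is a minimal sketch of a Lambda handler behind a proxy integration that returns the header itself. The origin value is a hypothetical bucket website endpoint, and the sketch assumes the OPTIONS preflight is handled by the CORS configuration in API Gateway.

```python
import json

# Hypothetical S3 website origin; replace with the real bucket website endpoint.
ALLOWED_ORIGIN = "http://example-bucket.s3-website-us-east-1.amazonaws.com"


def handler(event, context):
    """Lambda proxy handler that adds the CORS header the browser checks."""
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json",
            # Without this header the browser blocks the cross-origin response.
            "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
        },
        # The body must be a string, so the payload is serialized explicitly.
        "body": json.dumps({"received": event.get("body")}),
    }


response = handler({"body": "name=alice"}, None)
print(response["headers"]["Access-Control-Allow-Origin"])
```

This also explains why the API Gateway console tests pass while the form fails: the console is not a browser enforcing the same-origin policy.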

Question 4
• A retail company runs a serverless mobile app built on Amazon API
Gateway, AWS Lambda, Amazon Cognito, and Amazon DynamoDB.
During heavy holiday traffic spikes, the company receives complaints of
intermittent system failures. Developers find that the API Gateway
endpoint is returning 502 Bad Gateway errors to seemingly valid
requests.
• Which method should address this issue?

Question 4 – Architecture
[Diagram: Client → Cognito User Pools (authentication + get token); Client → API Gateway (502 errors in case of spikes) → Lambda → DynamoDB]

API Gateway - Errors
• 4xx means Client errors
• 400: Bad Request
• 403: Access Denied, WAF filtered
• 429: Quota exceeded, Throttle

• 5xx means Server errors
• 502: Bad Gateway Exception, usually for an incompatible output returned from a
Lambda proxy integration backend and occasionally for out-of-order invocations due to
heavy loads.
• 503: Service Unavailable Exception
• 504: Integration Failure, e.g. Endpoint Request Timed-out Exception
(API Gateway requests time out after a maximum of 29 seconds)
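The "incompatible output" cause of a 502 can be illustrated with a small checker for the shape a Lambda proxy integration is expected to return. This is a simplified and deliberately conservative sketch; the function name is hypothetical and only statusCode, headers and body are inspected.

```python
def proxy_response_problems(response) -> list:
    """Return reasons a Lambda proxy integration response looks malformed.

    Simplified check: only statusCode, headers and body are inspected.
    """
    if not isinstance(response, dict):
        return ["response must be a JSON object"]
    problems = []
    if not isinstance(response.get("statusCode"), int):
        problems.append("statusCode must be an integer")
    if "body" in response and not isinstance(response["body"], str):
        problems.append("body must be a string (serialize JSON first)")
    if not isinstance(response.get("headers", {}), dict):
        problems.append("headers must be an object")
    return problems


# A handler that returns a raw dict as the body is a classic source of 502s.
print(proxy_response_problems({"statusCode": 200, "body": {"ok": True}}))
print(proxy_response_problems({"statusCode": 200, "body": "{\"ok\": true}"}))
```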

CORRECT ANSWER
Option A
• Increase the concurrency limit for Lambda functions and configure notification alerts to be sent by Amazon CloudWatch when the ConcurrentExecutions metric approaches the limit.
[Diagram: Client → Cognito User Pools (authentication + get token); Client → API Gateway (502 errors in case of spikes) → Lambda → DynamoDB; CW Alarm on the Lambda ConcurrentExecutions metric]
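A minimal sketch of the alarm described in Option A: it builds the parameters for an alarm that fires when ConcurrentExecutions reaches a fraction of the limit. The alarm name, the 80% fraction, the limit of 1000 and the SNS topic ARN are assumptions; the resulting dictionary is shaped so it could be passed as keyword arguments to the CloudWatch `put_metric_alarm` call.

```python
def concurrency_alarm_params(limit: int, fraction: float = 0.8) -> dict:
    """Build keyword arguments for a CloudWatch alarm on Lambda concurrency."""
    return {
        "AlarmName": "lambda-concurrency-near-limit",  # hypothetical name
        "Namespace": "AWS/Lambda",
        "MetricName": "ConcurrentExecutions",
        "Statistic": "Maximum",
        "Period": 60,  # seconds
        "EvaluationPeriods": 1,
        # Alert before the limit is hit, not once requests are already failing.
        "Threshold": limit * fraction,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Hypothetical SNS topic that delivers the notification.
        "AlarmActions": ["arn:aws:sns:us-east-1:111122223333:ops-alerts"],
    }


params = concurrency_alarm_params(limit=1000)
print(params["MetricName"], params["Threshold"])
```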

Option B
• Configure notification alerts for the limit of transactions per second on the API Gateway endpoint and create a Lambda function that will increase this limit, as needed.
• Option B would work if we were receiving 429 errors (429: Quota exceeded, Throttle)
[Diagram: CW Alarm on the API Gateway TPS metric → SNS → Lambda → AWS Service Quotas API; Client → Cognito User Pools (authentication + get token); Client → API Gateway → Lambda → DynamoDB]

Option C
• Shard users to Amazon Cognito user pools in multiple regions to reduce user authentication latency.
• Option C would be valid if we had performance issues at Cognito User Pools, but that’s not the question
[Diagram: Client → Cognito User Pools in multiple regions, multiple pools (authentication + get token); Client → API Gateway (502 errors in case of spikes) → Lambda → DynamoDB]

Option D
• Use DynamoDB strongly consistent reads to ensure the latest data is always returned to the client application.
[Diagram: Client → Cognito User Pools (authentication + get token); Client → API Gateway (502 errors in case of spikes) → Lambda → DynamoDB (strongly consistent reads)]

Question 5
• A web hosting company has enabled Amazon GuardDuty in every AWS
Region for all of its accounts. A system administrator must create an
automated response to high-severity events.
• How should this be accomplished?

CORRECT ANSWER B
Options
A) Create rules through VPC Flow Logs that trigger an AWS Lambda function that programmatically addresses the issue.
B) Create an AWS CloudWatch Events rule that triggers an AWS Lambda function that programmatically addresses the issue.
C) Configure AWS Trusted Advisor to trigger an AWS Lambda function that programmatically addresses the issue.
D) Configure AWS CloudTrail to trigger an AWS Lambda function that programmatically addresses the issue.

[Diagram: VPC Flow Logs, CloudTrail Logs and DNS Logs (AWS DNS) feed GuardDuty; GuardDuty findings go to a CloudWatch Event rule that targets SNS or Lambda]
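A minimal sketch of answer B: an event pattern that matches only high-severity GuardDuty findings, plus a Lambda handler that decides what to do with them. The 7.0 threshold follows GuardDuty's severity scale, where values of 7.0 and above are high; the remediation itself is left as a placeholder because the slides do not describe it.

```python
HIGH_SEVERITY_THRESHOLD = 7.0  # GuardDuty rates findings of 7.0 and above as high

# Event pattern for the CloudWatch Events rule; numeric matching keeps
# low- and medium-severity findings from invoking the function at all.
EVENT_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"severity": [{"numeric": [">=", HIGH_SEVERITY_THRESHOLD]}]},
}


def handler(event, context):
    """Lambda target of the rule: re-check severity, then remediate."""
    detail = event.get("detail", {})
    severity = float(detail.get("severity", 0))
    if severity < HIGH_SEVERITY_THRESHOLD:
        return {"action": "ignored", "severity": severity}
    # Placeholder: a real handler would isolate the instance, revoke keys, etc.
    return {
        "action": "remediate",
        "finding_type": detail.get("type"),
        "severity": severity,
    }


sample = {"source": "aws.guardduty", "detail": {"severity": 8, "type": "ExampleFinding"}}
print(handler(sample, None))
```

Since GuardDuty is regional, a rule like this would be created in each Region where findings should trigger the response.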

Question 6
• A company is launching a new web service on an Amazon ECS cluster.
Company policy requires that the security group on the cluster
instances block all inbound traffic but HTTPS (port 443). The cluster
consists of 100 Amazon EC2 instances. Security engineers are
responsible for managing and updating the cluster instances. The security
engineering team is small, so any management efforts must be
minimized.
• How can the service be designed to meet these operational
requirements?

Question 6 – Architecture
[Diagram: Client → port 443 → security group → ECS Cluster]

Option A
• Change the SSH port to 2222 on the cluster instances with a user data script. Log in to each instance using SSH over port 2222
[Diagram: Client → port 443 → security group → ECS Cluster; SSH Client → port 2222]

Option B
• Change the SSH port to 2222 on the cluster instances with a user data script. Use AWS Trusted Advisor to remotely manage the cluster instances over port 2222
[Diagram: Client → port 443 → security group → ECS Cluster; Trusted Advisor → port 2222]

CORRECT ANSWER
Option C
• Launch the cluster instances with no SSH key pairs. Use the Amazon EC2 Systems Manager Run Command to remotely manage the cluster instances
[Diagram: Client → port 443 → security group → ECS Cluster; Run Command reaches the SSM Agent on the instances]
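To illustrate why Run Command scales to 100 instances with little effort, here is a minimal sketch of a request that targets the whole cluster by tag instead of listing instances. The tag key `Cluster`, its value and the shell command are assumptions; the dictionary is shaped so it could be passed as keyword arguments to the Systems Manager `send_command` call.

```python
def run_command_request(cluster_tag_value: str, commands: list) -> dict:
    """Build keyword arguments for a Systems Manager Run Command invocation."""
    return {
        # AWS-managed document that runs shell commands on Linux instances.
        "DocumentName": "AWS-RunShellScript",
        # Target every instance carrying the (hypothetical) Cluster tag.
        "Targets": [{"Key": "tag:Cluster", "Values": [cluster_tag_value]}],
        "Parameters": {"commands": commands},
        # Roll out gradually and stop early if too many instances fail.
        "MaxConcurrency": "10%",
        "MaxErrors": "5%",
    }


request = run_command_request("web-service", ["sudo yum update -y"])
print(request["Targets"])
```

No inbound port is involved: the SSM Agent on each instance polls the service over outbound HTTPS, so the security group can keep allowing only port 443 inbound.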

Option D
• Launch the cluster instances with no SSH key pairs. Use AWS Trusted Advisor to remotely manage the cluster instances.
[Diagram: Client → port 443 → security group → ECS Cluster; Trusted Advisor]

Question 7
• A company has two AWS accounts: one for production workloads and
one for development workloads. Creating and managing these
workloads are a development team and an operations team. The
company needs a security strategy that meets the following
requirements:
• Developers need to create and delete development application infrastructure.
• Operators need to create and delete both development and production
application infrastructure.
• Developers should have no access to production infrastructure.
• All users should have a single set of AWS credentials.
• What strategy meets these requirements?

Option A
• In the development account:
• Create a development IAM group with the ability to create and delete
application infrastructure.
• Create an IAM user for each operator and developer and assign them to the
development group.
• In the production account:
• Create an operations IAM group with the ability to create and delete application
infrastructure.
• Create an IAM user for each operator and assign them to the operations group.

Option B
• In the development account:
• Create a development IAM group with the ability to
create and delete application infrastructure.
• Create an IAM user for each developer and assign
them to the development group.
• Create an IAM user for each operator and assign
them to the development group and the operations
group in the production account.
• In the production account:
• Create an operations IAM group with the ability to
create and delete application infrastructure.

Option C
• In the development account:
• Create a shared IAM role with the ability to create and delete application
infrastructure in the production account.
• Create a development IAM group with the ability to create and delete
application infrastructure.
• Create an operations IAM group with the ability to assume the shared role.
• Create an IAM user for each developer and assign them to the development
group.
• Create an IAM user for each operator and assign them to the development
group and the operations group.

CORRECT ANSWER
Option D
• In the development account:
• Create a development IAM group with the ability to create and delete application
infrastructure.
• Create an operations IAM group with the ability to assume the shared role in the
production account.
• Create an IAM user for each developer and assign them to the development group.
• Create an IAM user for each operator and assign them to the development group and
the operations group.
• In the production account:
• Create a shared IAM role with the ability to create and delete application infrastructure.
• Add the development account to the trust policy for the shared role.
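The two halves of Option D can be sketched as a pair of policy documents: the trust policy on the shared role in the production account, and the identity policy on the operations group in the development account. Account IDs and the role name are hypothetical placeholders.

```python
import json

# Hypothetical account IDs and role name, used only for illustration.
DEV_ACCOUNT_ID = "111111111111"
PROD_ACCOUNT_ID = "222222222222"
SHARED_ROLE_NAME = "SharedInfraRole"


def shared_role_trust_policy(dev_account_id: str) -> dict:
    """Trust policy attached to the shared role in the production account."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            # Trusting the account delegates the decision to its IAM policies.
            "Principal": {"AWS": f"arn:aws:iam::{dev_account_id}:root"},
            "Action": "sts:AssumeRole",
        }],
    }


def operations_group_policy(prod_account_id: str, role_name: str) -> dict:
    """Identity policy attached to the operations group in the development account."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": f"arn:aws:iam::{prod_account_id}:role/{role_name}",
        }],
    }


print(json.dumps(shared_role_trust_policy(DEV_ACCOUNT_ID), indent=2))
print(json.dumps(operations_group_policy(PROD_ACCOUNT_ID, SHARED_ROLE_NAME), indent=2))
```

Developers are in the development group only, so nothing grants them `sts:AssumeRole` on the production role, while every user still has a single IAM user in the development account.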

Question 8 – Architecture
• A company is migrating an Apache Hadoop cluster from its data center
to AWS. The cluster consists of 60 VMware Linux virtual machines
(VMs). During the migration, cluster downtime should be minimized.
• Which process will minimize downtime?

[Diagram: VMware Linux VMs running Hadoop in the corporate data center, to be migrated to the AWS Cloud]

Option C
• Create OVA files of the VMs. Upload the OVA files to Amazon S3. Use
VM Import/Export to create AMIs from the OVA files. Launch the
cluster on AWS as Amazon EC2 instances from the AMIs

[Diagram: corporate data center (VMware Linux running Hadoop) → S3 → VM Import/Export → EC2 in the AWS Cloud]

Option D
• Export the HDFS data from the VMs to a new Amazon Aurora
database. Launch a new Hadoop cluster on Amazon EC2 instances.
Import the data from the Aurora database to HDFS on the new cluster.

[Diagram: corporate data center (VMware Linux running Hadoop) → export HDFS → Aurora → import HDFS → EC2 in the AWS Cloud]

Option A
• Use the AWS Management Portal for vCenter to migrate the VMs to
AWS as Amazon EC2 instances

We recommend using AWS Server Migration Service (SMS) to migrate VMs from a vCenter
environment to AWS. SMS automates the migration process by replicating on-premises
VMs incrementally and converting them to Amazon machine images (AMIs). You can
continue using your on-premises VMs while migration is in progress. For more information
about AWS SMS, see AWS Server Migration Service.

If any of the following are true, you should consider using AWS SMS:
• You are using vCenter 6.5 Server.
• You want to specify BYOL licenses during migration.
• You are interested in migrating VMs to Amazon EC2.
• You want to use incremental migration.

CORRECT ANSWER
Option B
• Use AWS SMS to migrate the VMs to AWS as AMIs. Launch the cluster
on AWS as Amazon EC2 instances from the migrated AMIs

[Diagram: SMS replicates the VMware Linux VMs running Hadoop from the corporate data center to the AWS Cloud]

Question 9
• A solutions architect needs to reduce costs for a big data application. The application
environment consists of hundreds of devices that send events to Amazon Kinesis
Data Streams. The device ID is used as the partition key, so each device gets a
separate shard. Each device sends between 50 KB and 450 KB of data per second.
The shards are polled by an AWS Lambda function that processes the data and
stores the result on Amazon S3.
• Every hour, an AWS Lambda function runs an Amazon Athena query against the
result data that identifies any outliers and places them in an Amazon SQS queue. An
Amazon EC2 Auto Scaling group of two EC2 instances monitors the queue and
runs a short (approximately 30-second) process to address the outliers. The devices
submit an average of 10 outlying values every hour.
• Which combination of changes to the application would MOST reduce costs?
(Select TWO.)

Question 9 – Architecture
[Diagram: devices (50-450KB /s each) → Kinesis (Shard 1, Shard 2, Shard 50) → Lambda → S3; a second Lambda queries Athena and writes to SQS (10 msgs / hour); the Auto Scaling group polls SQS (30s processing)]

Options (choose 2)
A) Change the Auto Scaling group launch configuration to use smaller
instance types in the same instance family.
B) Replace the Auto Scaling group with an AWS Lambda function
triggered by messages arriving in the Amazon SQS queue.
C) Reconfigure the devices and data stream to set a ratio of 10 devices
to 1 data stream shard.
D) Reconfigure the devices and data stream to set a ratio of 2 devices to
1 data stream shard.
E) Change the desired capacity of the Auto Scaling group to a single EC2
instance.

CORRECT ANSWER B
Option Group 1
A) Change the Auto Scaling group launch configuration to use smaller instance types in the same instance family.
B) Replace the Auto Scaling group with an AWS Lambda function triggered by messages arriving in the Amazon SQS queue.
E) Change the desired capacity of the Auto Scaling group to a single EC2 instance.
[Diagram: A) Auto Scaling group of two small instances polling SQS (> 30s processing, 10 msgs / hour); B) SQS (10 msgs / hour) triggering the function; E) Auto Scaling group with a single instance polling SQS (30s processing, 10 msgs / hour)]

CORRECT ANSWER D
Option Group 2
C) Reconfigure the devices and data stream to set a ratio of 10 devices to 1 data stream shard.
• 50-450 KB/s x 10 = 500-4500 KB/s per shard
D) Reconfigure the devices and data stream to set a ratio of 2 devices to 1 data stream shard.
• 50-450 KB/s x 2 = 100-900 KB/s per shard
Each shard has a limit of 1 MB/s = 1000 KB/s, so only the ratio of 2 devices per shard stays within it
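The shard arithmetic can be checked with a short script. It uses only the numbers from the slide (50 to 450 KB/s per device, 1000 KB/s per shard); the function names are illustrative.

```python
SHARD_WRITE_LIMIT_KB_PER_S = 1000  # 1 MB/s per shard, as stated on the slide


def shard_load_range(devices_per_shard: int, min_kb: int = 50, max_kb: int = 450):
    """Return the (lowest, highest) KB/s a shard receives for a given ratio."""
    return devices_per_shard * min_kb, devices_per_shard * max_kb


def fits_in_shard(devices_per_shard: int) -> bool:
    """True when even the peak load stays within the shard write limit."""
    _, peak = shard_load_range(devices_per_shard)
    return peak <= SHARD_WRITE_LIMIT_KB_PER_S


for ratio in (1, 2, 10):
    low, high = shard_load_range(ratio)
    print(f"{ratio} device(s) per shard: {low}-{high} KB/s, fits: {fits_in_shard(ratio)}")
```

Two devices per shard is the largest ratio among the options that never exceeds the limit, and it halves the shard count compared with one device per shard.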
Question 10
• A company operates an ecommerce application on Amazon EC2 instances
behind an ELB Application Load Balancer. The instances run in an Amazon
EC2 Auto Scaling group across multiple Availability Zones. After an order is
successfully processed, the application immediately posts order data to an
external third-party affiliate tracking system that pays sales commissions for
order referrals. During a highly successful marketing promotion, the number
of EC2 instances increased from 2 to 20. The application continued to work
correctly, but the increased request rate overwhelmed the third-party affiliate
and resulted in failed requests.

• Which combination of architectural changes could ensure that the entire
process functions correctly under load? (Select TWO.)

Question 10 – Architecture

[Diagram: ALB → Auto Scaling group → External API]

CORRECT ANSWER B
Option Group 1
• A) Move the code that calls the affiliate to a new AWS Lambda function. Modify the application to invoke the Lambda function asynchronously.
• B) Move the code that calls the affiliate to a new AWS Lambda function. Modify the application to place the order data in an Amazon SQS queue. Trigger the Lambda function from the queue.
[Diagram: A) the function calls the External API directly; B) SQS sits between the application and the function that calls the External API]

CORRECT ANSWER D
Option Group 2
• C) Increase the timeout of the new AWS Lambda function.
• D) Adjust the concurrency limit of the new AWS Lambda function.
• E) Increase the memory of the new AWS Lambda function.
[Diagram: SQS → Lambda → External API]
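A rough way to see why the concurrency limit is the right knob: with the queue as a buffer, the concurrency cap bounds how fast the affiliate is called. The sketch below assumes one message per invocation and a steady average call duration; the figures in the usage lines are made up.

```python
def max_affiliate_request_rate(concurrency: int, avg_call_seconds: float) -> float:
    """Upper bound on requests per second sent to the external API."""
    return concurrency / avg_call_seconds


def drain_time_seconds(backlog: int, concurrency: int, avg_call_seconds: float) -> float:
    """Time needed to work through a queue backlog at that bounded rate."""
    return backlog / max_affiliate_request_rate(concurrency, avg_call_seconds)


# Hypothetical numbers: 5 concurrent executions, 0.5 s per affiliate call.
print(max_affiliate_request_rate(5, 0.5))
print(drain_time_seconds(6000, 5, 0.5))
```

Orders are delayed rather than lost: the backlog waits in SQS until the function catches up, which timeout or memory changes would not achieve.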

Analysis from the Practice Sample questions

• Note: the questions are from 2015 – the certification has definitely
evolved since then, but the questions are still interesting!

• Download them from the PDF resource attached to this lecture

Question 1
• Your company’s on-premises content management system has the
following architecture:
• Application Tier – Java code on a JBoss application server
• Database Tier – Oracle database regularly backed up to Amazon Simple
Storage Service (S3) using the Oracle RMAN backup utility
• Static Content – stored on a 512GB gateway stored Storage Gateway volume
attached to the application server via the iSCSI interface
• Which AWS based disaster recovery strategy will give you
the best RTO?
[Diagram: timeline around a disaster; RPO = data loss before it, RTO = downtime after it]

Question 1 – Architecture Diagram
[Diagram: on-premises Oracle Database backed up to S3 using RMAN; Java code on JBoss; Storage Gateway volume (iSCSI) with async backups stored as EBS snapshots in the AWS Cloud]

CORRECT ANSWER
Option A
• Deploy the Oracle database and the JBoss app server on EC2. Restore the RMAN Oracle backups from Amazon S3. Generate an EBS volume of static content from the Storage Gateway and attach it to the JBoss EC2 server.
[Diagram: RMAN backups in S3 are restored to Oracle on EC2; the Storage Gateway EBS snapshots are restored to an EBS volume that is attached to the Java (JBoss) EC2 server]

Option B
• Deploy the Oracle database on RDS. Deploy the JBoss app server on EC2. Restore the RMAN Oracle backups from Amazon Glacier. Generate an EBS volume of static content from the Storage Gateway and attach it to the JBoss EC2 server.
[Diagram: RMAN backups in S3 would have to reach Glacier (lifecycle policy?) before being restored; the Storage Gateway EBS snapshots are restored to an EBS volume that is attached to the Java (JBoss) EC2 server]

Option C
• Deploy the Oracle database and the JBoss app server on EC2. Restore the RMAN Oracle backups from Amazon S3. Restore the static content by attaching an AWS Storage Gateway running on Amazon EC2 as an iSCSI volume to the JBoss EC2 server.
[Diagram: RMAN backups in S3 are restored to Oracle on EC2; the EBS snapshots are restored to a Storage Gateway volume that is attached over iSCSI to the Java (JBoss) EC2 server]

Option D
• Deploy the Oracle database and the JBoss app server on EC2. Restore the RMAN Oracle backups from Amazon S3. Restore the static content from an AWS Storage Gateway-VTL running on Amazon EC2
[Diagram: RMAN backups in S3 are restored to Oracle on EC2; the static content is restored from a Storage Gateway tape to the Java (JBoss) EC2 server]

Question 2
• An ERP application is deployed in multiple Availability Zones in a single region. In the event of failure, the RTO must be less than 3 hours, and the RPO is 15 minutes. The customer realizes that data corruption occurred roughly 1.5 hours ago. Which DR strategy can be used to achieve this RTO and RPO in the event of this kind of failure?
[Diagram: one Region with two Availability Zones; timeline around a disaster with RPO = data loss and RTO = downtime]


Option A
• Take 15-minute DB backups stored in Amazon Glacier, with transaction logs stored in Amazon S3 every 5 minutes.
[Diagram: Region with two Availability Zones; 15-minute DB backup → Glacier (slow RTO); 5-minute transaction logs backup → S3]

Option B
• Use synchronous database master-slave replication between two Availability Zones.
[Diagram: synchronous master-slave replication between two Availability Zones; the data corruption is present in both]

CORRECT ANSWER
Option C
• Take hourly DB backups to Amazon S3, with transaction logs stored in S3 every 5 minutes
[Diagram: Region with two Availability Zones; hourly DB backup → S3; 5-minute transaction logs backup → S3]
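A minimal sketch of why hourly backups plus 5-minute transaction logs meet a 15-minute RPO: restore the last full backup taken before the corruption, then replay logs up to the last one shipped before it. The schedule start and the corruption time are hypothetical.

```python
from datetime import datetime, timedelta


def latest_at_or_before(moment: datetime, interval: timedelta, anchor: datetime) -> datetime:
    """Latest scheduled time (anchor + k * interval) that is not after moment."""
    steps = (moment - anchor) // interval  # timedelta // timedelta gives an int
    return anchor + steps * interval


anchor = datetime(2024, 1, 1, 0, 0)        # hypothetical start of both schedules
corruption = datetime(2024, 1, 1, 10, 37)  # hypothetical time of the corruption

last_backup = latest_at_or_before(corruption, timedelta(hours=1), anchor)
last_log = latest_at_or_before(corruption, timedelta(minutes=5), anchor)
data_loss = corruption - last_log

print("restore backup from:", last_backup)
print("replay logs up to:  ", last_log)
print("data loss:          ", data_loss)
```

In the worst case the loss equals the log interval (5 minutes), which is within the 15-minute RPO, and restoring from S3 avoids the retrieval delay of Glacier.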

Option D
• Take hourly DB backups to an Amazon EC2 instance store volume, with transaction logs stored in Amazon S3 every 5 minutes.
[Diagram: Region with two Availability Zones; hourly backup → EC2 instance store volume in one Availability Zone; 5-minute transaction logs backup → S3]

Question 3
• The Marketing Director in your company asked you to create a mobile
app that lets users post sightings of good deeds known as random acts
of kindness in 80-character summaries. You decided to write the
application in JavaScript so that it would run on the broadest range of
phones, browsers, and tablets. Your application should provide access to
Amazon DynamoDB to store the good deed summaries. Initial testing
of a prototype shows that there aren’t large spikes in usage. Which
option provides the most cost-effective and scalable architecture for
this application?

Question 3 – Architecture Diagram

[Diagram: client running on tablets, web browsers and mobile applications → DynamoDB]

Option A
• Provide the JavaScript client with temporary credentials from the Security Token Service using a Token Vending Machine (TVM) on an EC2 instance to provide signed credentials mapped to an Amazon Identity and Access Management (IAM) user allowing DynamoDB puts and S3 gets. You serve your mobile application out of an S3 bucket enabled as a web site. Your client updates DynamoDB.
[Diagram: the client requests credentials from the TVM on EC2, which gets them from STS and sends temp creds back; the client gets the static JS content for the application from an Amazon S3 bucket enabled as a website; Client → DynamoDB]

CORRECT ANSWER
Option B
• Register the application with a Web Identity Provider like Amazon, Google, or Facebook, create an IAM role for that provider, and set up permissions for the IAM role to allow S3 gets and DynamoDB puts. You serve your mobile application out of an S3 bucket enabled as a web site. Your client updates DynamoDB.
[Diagram: the client logs in with the provider and calls STS AssumeRoleWithWebIdentity; the client gets the static JS content for the application from an Amazon S3 bucket enabled as a website; Client → DynamoDB]

Option C
• Provide the JavaScript client with temporary credentials from the Security Token Service using a Token Vending Machine (TVM) to provide signed credentials mapped to an IAM user allowing DynamoDB puts. You serve your mobile application out of Apache EC2 instances that are load-balanced and autoscaled. Your EC2 instances are configured with an IAM role that allows DynamoDB puts. Your server updates DynamoDB.
[Diagram: the client requests credentials from the TVM on EC2, which gets them from STS and sends temp creds back; Client → ELB → EC2 + ASG → DynamoDB]

Option D
• Register the JavaScript application with a Web Identity Provider like Amazon, Google, or Facebook, create an IAM role for that provider, and set up permissions for the IAM role to allow DynamoDB puts. You serve your mobile application out of Apache EC2 instances that are load-balanced and autoscaled. Your EC2 instances are configured with an IAM role that allows DynamoDB puts. Your server updates DynamoDB.
[Diagram: the client logs in with the provider and calls STS AssumeRoleWithWebIdentity; Client → ELB → EC2 + ASG → DynamoDB]

Question 4
• You are building a website that will retrieve and display highly sensitive
information to users. The amount of traffic the site will receive is known and
not expected to fluctuate. The site will leverage SSL to protect the
communication between the clients and the web servers. Due to the nature
of the site you are very concerned about the security of your SSL private
key and want to ensure that the key cannot be accidentally or intentionally
moved outside your environment. Additionally, while the data the site will
display is stored on an encrypted EBS volume, you are also concerned that
the web servers’ logs might contain some sensitive information; therefore,
the logs must be stored so that they can only be decrypted by employees of
your company. Which of these architectures meets all of the requirements?

Question 4 – Architecture Diagram

[Diagram: web server holding the SSL certificate and SSL private key, with an EBS volume encrypted with KMS]

Option A
• Use Elastic Load Balancing to distribute traffic to a set of web servers. To protect the SSL private key, upload the key to the load balancer and configure the load balancer to offload the SSL traffic. Write your web server logs to an ephemeral volume that has been encrypted using a randomly generated AES key.
[Diagram: HTTPS to the load balancer (SSL certificate, private key), HTTP to EC2; EC2 writes to an ephemeral volume encrypted with a random AES key]

Option B
• Use Elastic Load Balancing to distribute traffic to a set of web servers. Use TCP load balancing on the load balancer and configure your web servers to retrieve the private key from a private Amazon S3 bucket on boot. Write your web server logs to a private Amazon S3 bucket using Amazon S3 server-side encryption.
[Diagram: TCP / Secure TCP through the load balancer to EC2; EC2 retrieves the SSL certificate and private key on boot using EC2 user data; logs go to S3 with SSE]

CORRECT ANSWER
Option C
• Use Elastic Load Balancing to distribute traffic to a set of web servers, configure the load balancer to perform TCP load balancing, use an AWS CloudHSM to perform the SSL transactions, and write your web server logs to a private Amazon S3 bucket using Amazon S3 server-side encryption.
[Diagram: TCP / Secure TCP through the load balancer to EC2; SSL offloading to CloudHSM, which holds the SSL certificate and private key; logs go to S3 with SSE]

Option D
• Use Elastic Load Balancing to distribute traffic to a set of web servers. Configure the load balancer to perform TCP load balancing, use an AWS CloudHSM to perform the SSL transactions, and write your web server logs to an ephemeral volume that has been encrypted using a randomly generated AES key.
[Diagram: TCP / Secure TCP through the load balancer to EC2; SSL offloading to CloudHSM, which holds the SSL certificate and private key; EC2 sends logs to an ephemeral volume encrypted with a random AES key]

Question 5
• You are designing network connectivity for your fat client application.
The application is designed for business travelers who must be able to
connect to it from their hotel rooms, cafes, public Wi-Fi hotspots, and
elsewhere on the Internet. You do not want to publish the application
on the Internet.
• Which network design meets the above requirements while minimizing
deployment and operational costs?

Option A
• Implement AWS Direct Connect, and create a private interface to your VPC. Create a public subnet and place your application servers in it.
[Diagram: Direct Connect → VPC → public subnet → App Server]

Option B
• Implement Elastic Load Balancing with an SSL listener that terminates the back-end connection to the application.
[Diagram: SSL → Public ELB → HTTP → EC2]

Option C
• Configure an IPsec VPN connection, and provide the users with the configuration details. Create a public subnet in your VPC, and place your application servers in it.
[Diagram: VPN over www → VPC → public subnet → App Server]

CORRECT ANSWER
Option D
• Configure an SSL VPN solution in a public subnet of your VPC, then install and configure SSL VPN client software on all user computers. Create a private subnet in your VPC and place your application servers in it.
[Diagram: VPN over www → VPN solution in the public subnet → App Server in the private subnet]

Question 6
• Your company hosts an on-premises legacy engineering application
with 900GB of data shared via a central file server. The engineering
data consists of thousands of individual files ranging in size from
megabytes to multiple gigabytes. Engineers typically modify 5-10
percent of the files a day. Your CTO would like to migrate this
application to AWS, but only if the application can be migrated over
the weekend to minimize user downtime. You calculate that it will take
a minimum of 48 hours to transfer 900GB of data using your
company’s existing 45-Mbps Internet connection.
• After replicating the application’s environment in AWS, which option
will allow you to move the application’s data to AWS without losing
any data and within the given timeframe?
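The 48-hour figure and the size of the daily delta can be reproduced with a short calculation. This sketch assumes binary gigabytes and a fully used 45-Mbps link with no protocol overhead, so real transfers would take longer.

```python
def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours needed to move size_gb over a link of link_mbps at full utilization."""
    bits = size_gb * 1024**3 * 8           # binary gigabytes to bits
    seconds = bits / (link_mbps * 1_000_000)
    return seconds / 3600


full_copy = transfer_hours(900, 45)
daily_low = transfer_hours(900 * 0.05, 45)   # 5% of the files change per day
daily_high = transfer_hours(900 * 0.10, 45)  # 10% of the files change per day

print(f"full copy:   {full_copy:.1f} h")
print(f"daily delta: {daily_low:.1f}-{daily_high:.1f} h")
```

A full copy alone nearly fills the weekend, whereas a final incremental sync of one day's changes needs only a few hours, which is what makes syncing ahead of time attractive.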

Question 6 – Architecture Diagram
[Diagram: on-premises legacy application and central file server; replicated application in the AWS Cloud; 45 Mbps link, > 48 hours for 900 GB, 5-10% changing every day]

Option A
• Copy the data to Amazon S3 using multiple threads and multi-part upload for large files over the weekend, and work in parallel with your developers to reconfigure the replicated application environment to leverage Amazon S3 to serve the engineering files.
[Diagram: central file server → copy over the weekend (network is maxed out) → S3; the replicated application is reconfigured to use S3]

CORRECT ANSWER
Option B
• Sync the application data to Amazon S3 starting a week before the
migration, on Friday morning perform a final sync, and copy the
entire data set to your AWS file server after the sync completes.

[Diagram: the Central File Server syncs to S3 starting a week before, with a final sync on Friday morning; the data is then copied to EFS, which the replicated application mounts]
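
The idea behind Option B is that only the first pass moves the full 900 GB; every later pass, including the final one on Friday morning, only has to send files that changed since the previous pass. Below is a minimal sketch of that selection step, assuming a hypothetical manifest that maps each file path to its size and modification time. The names `FileInfo` and `files_to_sync` and the sample paths are invented for illustration; a real migration would more likely rely on an existing tool such as the AWS CLI's `aws s3 sync`.

```python
from typing import Dict, List, NamedTuple


class FileInfo(NamedTuple):
    size_bytes: int
    mtime: float  # last-modified time, seconds since the epoch


def files_to_sync(local: Dict[str, FileInfo], synced: Dict[str, FileInfo]) -> List[str]:
    """Return paths that are new or whose size/mtime differ from the last sync."""
    return sorted(
        path for path, info in local.items()
        if path not in synced or synced[path] != info
    )


# State recorded after the previous sync pass (hypothetical sample data).
previously_synced = {
    "designs/engine.cad": FileInfo(2_000_000_000, 1_000.0),
    "designs/wing.cad": FileInfo(500_000_000, 1_000.0),
    "notes/specs.txt": FileInfo(4_000_000, 1_000.0),
}

# Current state of the file server: one file modified, one file added.
current = {
    "designs/engine.cad": FileInfo(2_100_000_000, 2_000.0),
    "designs/wing.cad": FileInfo(500_000_000, 1_000.0),
    "notes/specs.txt": FileInfo(4_000_000, 1_000.0),
    "notes/todo.txt": FileInfo(1_000, 2_000.0),
}

pending = files_to_sync(current, previously_synced)
pending_bytes = sum(current[path].size_bytes for path in pending)
print(pending, pending_bytes)
```

Only the modified and the new file are selected, so the final pass is proportional to what changed rather than to the whole data set. The sketch deliberately ignores deleted files and upload failures.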

Option C
• Copy the application data to a 1-TB USB drive on Friday and
immediately send overnight, with Saturday delivery, the USB drive to
AWS Import/Export to be imported as an EBS volume, mount the
resulting EBS volume to your AWS file server on Sunday.

[Diagram: a USB drive with the Central File Server data is shipped to AWS (AWS Import/Export) and becomes an EBS Volume, which the replicated application mounts]

Option D
• Leverage the AWS Storage Gateway to create a Gateway-Stored
volume. On Friday copy the application data to the Storage Gateway
volume. After the data has been copied, perform a snapshot of the
volume and restore the volume as an EBS volume to be attached to
your AWS file server on Sunday.

[Diagram: the Central File Server data is copied to a Storage Gateway Stored Volume; a snapshot of that volume is restored as an EBS Volume, which the replicated application mounts]

Next steps
• Congratulations, you have covered all the domains!
• Make sure you revisit the lectures as much as possible

• A good extra resource is the AWS Exam Readiness course at:


• https://www.aws.training/Details/eLearning?id=34737

• The AWS Certified SA Pro exam is hard, and tests experience…


• Make sure you master every single concept outlined in this course
