
Qubole Open Data Lake Platform on AWS

Run ad hoc, streaming, and machine learning workloads on a data lake using cost-effective Amazon EC2 Spot Instances infrastructure hosting Apache Spark, Presto, Hive, and Airflow engines.

1. Install and configure requirements for Amazon EC2 On-Demand and Spot Instances in Qubole Open Data Lake Platform. Configure Identity and Access Management (IAM) Roles and AWS Accounts to ensure that the platform can access the customer’s compute and storage.
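
The IAM configuration in step 1 can be illustrated with a minimal sketch of a cross-account trust policy, which is one common way to let an external platform assume a role in a customer account. The source does not say which mechanism Qubole requires, so the cross-account pattern, the helper name `build_trust_policy`, the account ID, and the external ID below are all assumptions for illustration; only the policy document layout (`Version`, `Statement`, `sts:AssumeRole`, `sts:ExternalId`) follows the standard IAM policy grammar.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Build an IAM trust policy letting another AWS account assume a role.

    Hypothetical helper: the account ID and external ID are placeholders,
    not values taken from the reference architecture.
    """
    # AWS account IDs are 12-digit numeric strings.
    if not (trusted_account_id.isdigit() and len(trusted_account_id) == 12):
        raise ValueError("AWS account IDs are 12-digit numbers")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID condition restricts who may assume the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values, assumed for illustration only.
policy = build_trust_policy("123456789012", "example-external-id")
print(json.dumps(policy, indent=2))
```

The resulting JSON document is what would be attached as the trust relationship of the role; the permissions that grant access to compute and storage are defined separately in the role's permission policies.
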
2. Qubole’s open data lake platform manages AWS infrastructure according to workload-driven service level agreements (SLAs) and performance without user involvement. Administrators can set up cluster management and configuration for On-Demand, Spot, and Spot Blocks on AWS in the customer’s virtual private cloud (VPC).
3. Qubole’s Platform Runtime services include Workload-Aware Autoscaling, Intelligent Spot Management, Automated Cluster Lifecycle Management, and Heterogeneous Cluster Management. These services manage AWS compute automatically to optimize total cost of ownership (TCO) according to workload and SLA requirements.
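
As a toy illustration of the kind of decision a spot-aware autoscaler makes, the sketch below splits a target cluster size between On-Demand and Spot nodes. This is not Qubole's algorithm: the function name `split_capacity`, the `spot_percent` parameter, and the rule of keeping a minimum number of On-Demand nodes are assumptions chosen only to make the cost-versus-stability trade-off concrete.

```python
def split_capacity(total_nodes: int, spot_percent: int, min_on_demand: int = 1) -> dict:
    """Split a target node count between On-Demand and Spot capacity.

    Hypothetical helper for illustration; the parameters do not correspond
    to any documented Qubole setting.
    """
    if total_nodes < 0:
        raise ValueError("total_nodes must be non-negative")
    if not 0 <= spot_percent <= 100:
        raise ValueError("spot_percent must be between 0 and 100")

    # Integer arithmetic avoids floating-point rounding surprises.
    spot = total_nodes * spot_percent // 100
    on_demand = total_nodes - spot

    # Keep a stable On-Demand floor so the cluster survives spot interruptions.
    if on_demand < min_on_demand:
        on_demand = min(min_on_demand, total_nodes)
        spot = total_nodes - on_demand

    return {"on_demand": on_demand, "spot": spot}


print(split_capacity(10, 80))   # 8 Spot nodes and 2 On-Demand nodes
print(split_capacity(4, 100))   # the On-Demand floor keeps 1 stable node
```

A real autoscaler would also weigh workload demand, SLA targets, and current spot availability, none of which is modeled here.
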
4. Qubole uses AWS Glue Data Catalog as an external Hive metastore, ensuring a single source of truth for all metadata related to the customer data in Amazon S3. Using the AWS Glue sync agent, QDS clusters can synchronize metadata changes from their Hive metastore to AWS Glue Data Catalog. Syncing metadata to the AWS Glue Data Catalog allows users to query their data using AWS analytics services such as Amazon Athena and Amazon Redshift. Users can also retrieve data from Amazon S3 using native s3n and s3a client connectors.
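
Once table metadata is available in the AWS Glue Data Catalog, a query from Amazon Athena might be prepared roughly as follows. This is a sketch under stated assumptions: the helper `athena_query_params`, the database `sales_db`, the table `orders`, and the results bucket are hypothetical names. The dictionary keys match the parameters of the boto3 Athena client's `start_query_execution` call, but the sketch only builds the request and does not contact AWS.

```python
def athena_query_params(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for boto3's Athena start_query_execution.

    Hypothetical helper; the database, table, and bucket names used below
    are placeholders, not names from the reference architecture.
    """
    # Athena writes query results to an S3 location.
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        # The database is resolved through the AWS Glue Data Catalog.
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


params = athena_query_params(
    "SELECT COUNT(*) FROM orders",        # assumed table name
    "sales_db",                           # assumed Glue database name
    "s3://example-athena-results/",       # assumed results bucket
)
print(params)
```

With AWS credentials configured, the request could then be submitted with `boto3.client("athena").start_query_execution(**params)`.
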
5. To accommodate spot interruptions, Qubole’s Intelligent Spot Management uses Amazon FSx for Lustre to manage intermediate shuffle data created by ML/AI, SQL analytics, and ETL jobs. Using Amazon FSx for Lustre allows the Qubole platform to reduce costs, tolerate the loss of spot nodes, and improve workload performance.
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS Reference Architecture
