
Trino(Presto)DB: Zero Copy Lakehouse
Artem Aliev, Huawei
Artem Aliev
• Huawei Cloud Hybrid Integration Platform
• Expert and solution architect
• 20+ years in software development
• Big data platform integrations
• Apache Hadoop, Spark, Cassandra, TinkerPop
• Storage optimizations
• JVM development
• Teacher at SPbU (St. Petersburg State University)

[email protected]
Application scenarios
• Data enrichment and composition services
• Multi-datasource, multi-cloud, micro service environment
• Exploratory analytics
• What else do we have to analyze?
• Fraud/Security breach detection and prevention
• ML model inference
Requirements
• Interactive queries (join queries)
• Seconds for analytics
• Sub-seconds for user services
• Different Data Sources
• SQL/NoSQL databases
• S3 files and Hadoop Systems
• REST Services
• Consistent up-to-date results
• Open Source
Example (TPC-C)
Show user history for the given warehouse.

[TPC-C schema diagram: Warehouse 100 rows, District 1,000 rows, Customer 3,000,000 rows, History 3,000,000 rows, New-Order 900,000 rows, Order 3,000,000 rows, Order-Line 30,000,000 rows, Stock 10,000,000 rows, Item 100,000 rows]

select distinct i_name, i_price
from warehouse
join district on (w_id = d_w_id)
join customer on (d_w_id = c_w_id and d_id = c_d_id)
join orders on (o_w_id = w_id and o_d_id = d_id and o_c_id = c_id)
join order_line on (o_w_id = ol_w_id and o_d_id = ol_d_id and o_id = ol_o_id)
join stock on (ol_supply_w_id = s_w_id and ol_i_id = s_i_id)
join item on (s_i_id = i_id)
where w_id = 50 and c_id = 101;

Query time (seconds):
MPP DB: 20-80
Tuned Trino: 4
Postgres: 0.7
Traditional Stack
• Data Lake
• Hive, Spark, Impala, Trino, Drill, Dremio*
• Data warehouse
• ClickHouse, Greenplum, Vertica*
• Data marts
• Postgres, Mysql, ClickHouse
ETL/ELT from sources to data marts
• Nightly batches
• Streaming
• Fast
• Needs a special database to enrich and join data in the stream
• Redis, Cassandra, etc.
• Eager enrichments
• Both approaches struggle with:
• Data source model changes
• Loading failures
• Inconsistent loading
Databricks Solution: Lakehouse
NO ETL!
• Big Data behaves like a usual database
• Direct requests to the data sources
Microservice architecture support
• A lot of small exotic databases
• “Agile” development with a lot of schema changes
• REST API data access only
• Pay per request
• Google API, etc
Feature requirements summary
• Schema-change tolerance
• Advanced pushdowns to data sources and optimizations
• Legacy databases are still better at indexing
• No ETL
• Extreme: no caches, local materialized views, reflections, etc.
• Avoid full scans
• REST endpoint support
• Open Source
Candidates tested
• Postgres with FDW
• Very old and unsupported plugins
• Pushdown works only with other Postgres instances
• Drill – schema-free for Hadoop
• Not in active development
• The optimizer is not good
• TrinoDB
• Very easy REST connector development
• Dremio – not really Open Source
• Hive, Spark – files and manual JDBC only
The winner is: Presto
• Facebook developed Presto in 2012 and open-sourced it in 2013
• 2019
• PrestoDB supported by Facebook in the Linux Foundation
• https://github.com/prestodb/presto
• PrestoSQL supported by Starburst
• 2020: PrestoSQL renamed to TrinoDB
• https://github.com/trinodb/trino

• 2020: OpenLooKeng from Huawei
• https://gitee.com/openlookeng/hetu-core
• Cloud Services
TrinoDB/PrestoDB
• SQL
• 30+ connectors
• Easy to develop new connectors
• Dynamic Catalog
• Represents data as tables
• In schemas, in catalogs
• Common type system
• Type conversions for columns
• The query planner is type-aware
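Because every source appears as catalog.schema.table with one type system, a single statement can join across systems. A minimal sketch; the postgresql and hive catalog names and table layout are illustrative:

select c.c_name, count(*) as order_count
from postgresql.public.customer c
join hive.tpcc.orders o
  on c.c_w_id = o.o_w_id and c.c_d_id = o.o_d_id and c.c_id = o.o_c_id
group by c.c_name;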
Classical Distributed Architecture
Adding a Data Source
• Just drop a property file into the etc/catalog directory
• The file name becomes the catalog name
• Schemas and tables are loaded from the connector

connector.name=postgresql
connection-url=jdbc:postgresql://localhost:5432/tpcc
connection-user=postgres
connection-password=password
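Once the file is in place (here etc/catalog/postgresql.properties), the catalog is immediately queryable; the table below assumes the TPC-C schema from the earlier example:

SHOW CATALOGS;
SHOW SCHEMAS FROM postgresql;
SELECT * FROM postgresql.public.warehouse LIMIT 10;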
Great Optimization Engine
• Cost based optimizations (CBO)
• Hive connector only
• Pushdowns
• Predicate
• The optimizer propagates constants through joins
• Dynamic filtering support for joins (based on CBO)
• Projection
• Aggregation!
• JOIN*
• TOP-N and LIMITs
• ORDER BY ... LIMIT N or ORDER BY ... FETCH FIRST N ROWS
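EXPLAIN is a quick way to see which pushdowns fire: when an aggregation is fully pushed down, the plan shows only a table scan against the connector instead of a separate aggregation step. A sketch, reusing the illustrative postgresql catalog from above:

EXPLAIN
SELECT d_w_id, count(*)
FROM postgresql.public.district
GROUP BY d_w_id;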
Highly-Selective Join
Show user history for the given warehouse (same TPC-C schema as the earlier example).

select distinct i_name, i_price
from warehouse
join district on (w_id = d_w_id)
join customer on (d_w_id = c_w_id and d_id = c_d_id)
join orders on (o_w_id = w_id and o_d_id = d_id and o_c_id = c_id)
join order_line on (o_w_id = ol_w_id and o_d_id = ol_d_id and o_id = ol_o_id)
join stock on (ol_supply_w_id = s_w_id and ol_i_id = s_i_id)
join item on (s_i_id = i_id)
where w_id = 50 and c_id = 101;

Query time (seconds):
MPP DB: 20-80
Tuned Trino: 4
Postgres: 0.7
Nested Loop Join
First Attempt: Dynamic Filtering
• Collect ids from the right side
• Push the ids into the left side of the join
• CBO is recommended
• Supported by the Hive and Memory connectors
• JDBC: PR #7968
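Dynamic filtering is toggled per session; a minimal sketch (the enable_dynamic_filtering session property is from the Trino docs, catalog and table names are illustrative):

SET SESSION enable_dynamic_filtering = true;

EXPLAIN
SELECT count(*)
FROM hive.tpcc.order_line ol
JOIN hive.tpcc.item i ON ol.ol_i_id = i.i_id
WHERE i.i_price > 90;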
Secret Index Joins for the Thrift Connector
• Used to integrate an external storage system without writing a full connector
• Just wrap your service with a Thrift server
• Works for REST APIs!
• Wrapping JDBC is inconvenient
Apache Thrift overview
• Thrift is a Remote Procedure Call (RPC) server development framework
• Development:
• Describe the interface in a .thrift file
• Generate the service interface and client code:
thrift --gen java TrinoThriftService.thrift
• Implement the interfaces for the server
• Trino example: ThriftTpchServer
Adding an Index to the JDBC connector
• Just add ;)
• Not in open source yet

Fixed:
• From 80 seconds to 4
REST API and microservices
• Facebook use(d) ThriftService
• Create a Thrift server for your microservices
• trino-example-http connector
• Modify it for your needs
• Don't forget about the Index Provider
• We developed a simple configurable connector for our internal services
Zero Copy Done!
• No need to build a huge data lake with a lot of servers ahead of time
• A single-node TrinoDB can do data exploration

Let's look at the other features:


Security
• HTTPS with TLS 1.2, 1.3
• User auth: password, LDAP, OAuth, Kerberos, JWT, certificate
• Access Control
• Down to table-level operations
• System operations
Administration
• Web UI for monitoring
• JMX monitoring
• Resource groups
• Memory, CPU limits
• Queues
• Spill to disk support
Dynamic data source reconfiguration
• Static property files by default
• PR: #12605
• OpenLooKeng fork
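This line of work eventually landed as SQL-level catalog management in later Trino releases; a sketch, assuming a version with CREATE CATALOG support and a dynamic catalog store enabled:

CREATE CATALOG tpcc USING postgresql
WITH (
  "connection-url" = 'jdbc:postgresql://localhost:5432/tpcc',
  "connection-user" = 'postgres',
  "connection-password" = 'password'
);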
Caching
• Alluxio FS cache for Hive
• Memory connector
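The Memory connector can act as a crude materialized cache: copy a hot remote table in once, then query the in-memory copy. A sketch with illustrative names:

CREATE TABLE memory.default.hot_items AS
SELECT i_id, i_name, i_price
FROM postgresql.public.item
WHERE i_price > 50;

SELECT count(*) FROM memory.default.hot_items;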
Indexing for Hive
• OpenLooKeng exclusive feature
• Bloom, BTree, MinMax, Bitmap indexes
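In openLooKeng these heuristic indexes are created with a CREATE INDEX statement; a sketch of its general shape (index name, catalog, and table are illustrative, check the openLooKeng docs for exact syntax):

CREATE INDEX idx_orders_cid USING bloom
ON hive.tpcc.orders (o_c_id);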
High Availability
• OpenLooKeng
• Active-active, based on a distributed cache
• Use standard approaches for microservices
• K8s
Try it: Lakehouse microservice
#> docker run -p 8080:8080 --name trino trinodb/trino
Connect the CLI:
#> docker exec -ti trino trino

For "production" usage, just store the catalog in git and mount it into the container
#> docker run --rm -p 8080:8080 \
-v /opt/trino_catalog_git:/etc/trino/catalog \
--name trino trinodb/trino
Run some commands
Sample data the right way
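The tpch catalog bundled in the default Trino docker image gives instant test data, and TABLESAMPLE keeps exploratory scans cheap; a small sketch:

SELECT * FROM tpch.tiny.nation LIMIT 5;

SELECT count(*)
FROM tpch.sf1.orders TABLESAMPLE BERNOULLI (1);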
Web UI
System catalog
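The system catalog exposes cluster metadata as regular tables; for example:

SELECT * FROM system.runtime.nodes;

SELECT query_id, state, query
FROM system.runtime.queries
LIMIT 10;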
JMX support
• A lot of system MBeans
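The JMX connector turns those MBeans into tables too; a sketch querying the JVM runtime bean (attribute names are lowercased by the connector):

SELECT node, vmname, uptime
FROM jmx.current."java.lang:type=runtime";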
And so on and so forth
