Trino (Presto) DB: Zero Copy Lakehouse
Artem Aliev, Huawei
Application scenarios
• Data enrichment and composition services
• Multi-datasource, multi-cloud, microservice environments
• Exploratory analytics
  • What else do we have to analyze?
• Fraud/Security breach detection and prevention
• ML model inference
Requirements
• Interactive queries (join queries)
  • Seconds for analytics
  • Sub-second for user services
• Different data sources
  • SQL/NoSQL databases
  • S3 files and Hadoop systems
  • REST services
• Consistent, up-to-date results
• Open source
Example: show user history for the given warehouse.
[Figure: TPC-C tables Warehouse, District, Customer (3 000 000 rows), Stock (10 000 000 rows), New-Order (900 000 rows)]
connector.name=postgresql
connection-url=jdbc:postgresql://localhost:5432/tpcc
connection-user=postgres
connection-password=password
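In Trino, each `.properties` file in the catalog directory defines one catalog, and the file name without the extension becomes the catalog name used in queries. A minimal sketch of rendering the settings above into such a file follows. The helper `write_catalog`, the catalog name `tpcc` and the temporary directory are illustrative assumptions, not part of the talk.

```python
import tempfile
from pathlib import Path


def write_catalog(catalog_dir: Path, name: str, props: dict[str, str]) -> Path:
    """Write <name>.properties into catalog_dir and return the file path.

    Hypothetical helper: Trino itself only needs the resulting file.
    """
    catalog_dir.mkdir(parents=True, exist_ok=True)
    path = catalog_dir / f"{name}.properties"
    lines = [f"{key}={value}" for key, value in props.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# The same connector settings as shown on the slide.
POSTGRES_TPCC = {
    "connector.name": "postgresql",
    "connection-url": "jdbc:postgresql://localhost:5432/tpcc",
    "connection-user": "postgres",
    "connection-password": "password",
}

if __name__ == "__main__":
    # A temporary directory stands in for the real catalog directory.
    with tempfile.TemporaryDirectory() as tmp:
        created = write_catalog(Path(tmp), "tpcc", POSTGRES_TPCC)
        print(created.name)
        print(created.read_text(encoding="utf-8"))
```

With the file named `tpcc.properties`, tables in that PostgreSQL database would be addressed as `tpcc.<schema>.<table>`.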
Great Optimization Engine
• Cost-based optimizations (CBO)
  • Hive connector only
• Pushdowns
  • Predicate
    • Optimizer propagates constants through joins
    • Dynamic filtering support for joins (based on CBO)
  • Projection
  • Aggregation!
  • JOIN*
  • TOP-N and LIMITs
    • ORDER BY ... LIMIT N or ORDER BY ... FETCH FIRST N ROWS
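Dynamic filtering can be pictured roughly as follows: the engine reads the smaller (build) side of a join first, collects the join-key values it actually saw, and passes them to the scan of the larger (probe) side as an extra predicate, so fewer rows leave the data source. The sketch below is a toy model in plain Python, not Trino's implementation. The table contents and the names `scan` and `join_with_dynamic_filter` are invented for illustration; column names loosely follow TPC-C.

```python
from typing import Callable, Iterable

Row = dict[str, int]

# Toy "remote" tables (hypothetical data, not from the talk).
WAREHOUSE: list[Row] = [{"w_id": 1, "w_zip": 111}, {"w_id": 2, "w_zip": 222}]
CUSTOMER: list[Row] = [{"c_id": i, "c_w_id": i % 5 + 1} for i in range(1000)]


def scan(table: Iterable[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """Simulate a connector scan with a pushed-down predicate:
    only matching rows are transferred to the engine."""
    return [row for row in table if predicate(row)]


def join_with_dynamic_filter(zip_code: int) -> tuple[list[Row], int]:
    """Join CUSTOMER to WAREHOUSE on warehouse id.

    Returns the joined rows and the number of probe-side rows transferred.
    """
    # 1. Build side: small table, static predicate pushed down.
    build = scan(WAREHOUSE, lambda r: r["w_zip"] == zip_code)
    # 2. Collect the join keys that survived on the build side.
    keys = {r["w_id"] for r in build}
    # 3. Probe side: the collected keys act as a dynamic filter in the scan.
    probe = scan(CUSTOMER, lambda r: r["c_w_id"] in keys)
    # 4. Ordinary hash join on the already reduced inputs.
    by_key = {r["w_id"]: r for r in build}
    joined = [{**c, **by_key[c["c_w_id"]]} for c in probe]
    return joined, len(probe)


if __name__ == "__main__":
    rows, transferred = join_with_dynamic_filter(111)
    print(f"joined={len(rows)} transferred={transferred} of {len(CUSTOMER)}")
```

In this toy data set only 200 of the 1 000 customer rows are transferred for warehouse zip 111. Without the dynamic filter the probe scan would return all of them and the join would discard the rest.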
For “production” use, keep the catalog directory in Git and mount it into the Docker container:
#> docker run --rm -p 8080:8080 \
-v /opt/trino_catalog_git:/etc/trino/catalog \
--name trino trinodb/trino
Run some commands
Sample data the right way
Web UI
System catalog
JMX support
• A lot of system MBeans
And so on and so forth