Anderson Chaves’ Post

Lead Data Scientist (Light Management & Coatings) at EssilorLuxottica

Let's set the Snowflake vs Databricks debate aside for a moment. Please tell me more in the comments about your experiences using Dask and Spark. #dataengineering #bigdata #distributedcomputing

Coiled

How does Dask compare to Spark? We’ve re-run the TPC-H benchmarks and noticed some interesting results. Dask is often faster than Spark, both locally at 10 GB scale and on a 10 TB dataset in the cloud. Dask also completed the queries more reliably, especially at large scale: PySpark failed or hit the one-hour timeout on several queries that Dask completed successfully. We’re not Spark experts, though, and would welcome critique here. We talk through these results, and other ways in which Spark and Dask differ, in more detail in our blog post: https://lnkd.in/gTXynzrT
