Let's set the Snowflake vs Databricks debate aside for a moment. Please share your experiences with Dask and Spark in the comments. #dataengineering #bigdata #distributedcomputing
How does Dask compare to Spark? We’ve re-run the TPC-H benchmarks and found some interesting results. Dask is often faster than Spark, both locally at 10 GB scale and on a 10 TB dataset in the cloud. Dask also completed queries more reliably, especially at large scale: PySpark failed or hit an hour-long timeout on several queries that Dask completed successfully. We’re not Spark experts, though, and would welcome critique. We discuss these results, and other ways in which Spark and Dask differ, in more detail in our blog post: https://lnkd.in/gTXynzrT