Ankit Goyal’s Post

CTO at Retape (YC W23)

8mo

Snowflake Optimization Tip of the Day #2: Enable Automatic Clustering

Snowflake's automatic clustering can help reduce compute cost. The principle is simple: the less data a query has to process, the lower the compute bill.

By default, Snowflake clusters data according to its insertion pattern, commonly called natural clustering. Take events data as an example: if events arrive every day, Snowflake naturally organizes the data by event date. But analysts often filter on both event date and event type. The distribution of event types can vary throughout the day, so Snowflake will not organize the data by event type by default.

You’ll need to guide it to cluster on both event date and event type by explicitly defining clustering keys. Snowflake then does the heavy lifting of re-clustering the data in the background through automatic clustering. Once that is done, queries that filter on event date and event type run faster because they process only a subset of the data, and therefore cost less.

However, automatic clustering itself consumes credits, so it should not be enabled on every table. But like every Snowflake feature, it’s very easy to switch on and off. The best way to decide whether to enable it on a table is to try it and measure whether there's a net cost advantage. I’ve personally seen it work well on tables larger than 250 GB.
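The setup described in the post can be sketched as a handful of SQL statements. The Python helper below only builds the statement strings and does not connect to Snowflake. The table name `events` and the columns `event_date` and `event_type` are assumptions taken from the post's example, not a real schema, and the helper itself is hypothetical.

```python
def clustering_statements(table, keys):
    """Build the Snowflake SQL for defining, pausing, resuming and inspecting clustering.

    `table` and `keys` are illustrative names; adjust them to your own schema.
    """
    key_list = ", ".join(keys)
    return {
        # Define clustering keys; automatic clustering then maintains them in the background.
        "define_keys": f"ALTER TABLE {table} CLUSTER BY ({key_list});",
        # Switch background re-clustering off (stops it from consuming credits).
        "suspend": f"ALTER TABLE {table} SUSPEND RECLUSTER;",
        # Switch background re-clustering back on.
        "resume": f"ALTER TABLE {table} RESUME RECLUSTER;",
        # Inspect how well the table is clustered on the chosen keys.
        "inspect": f"SELECT SYSTEM$CLUSTERING_INFORMATION('{table}', '({key_list})');",
    }


statements = clustering_statements("events", ["event_date", "event_type"])
for name, sql in statements.items():
    print(f"{name}: {sql}")
```

Running the `inspect` statement before and after enabling clustering, alongside the warehouse bill, is one way to carry out the "try it and measure" advice above.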

3 Comments

Ankit Goyal

CTO at Retape (YC W23)

8mo

Official documentation: https://docs.snowflake.com/en/user-guide/tables-auto-reclustering

Alexis Chicoine

Senior BI Data Developper (Analytics Engineer) at Coveo

8mo

It’s not so much about the table size but about how you filter the data. Also, if your first clustering key isn’t time based it gets a lot more expensive.


More Relevant Posts

Abey Antony

Azure Data Engineer | Database Administrator
1mo
Optimizing Data Performance with Clustering in Snowflake

In Snowflake, Clustering and Automatic Clustering can make a significant impact on query performance, especially for large datasets. Let’s dive into what clustering is and how Snowflake’s automatic clustering feature makes life easier for data engineers and analysts!

What is Clustering in Snowflake?
• Clustering in Snowflake is a way of organizing data within a table based on certain columns (clustering keys), which helps improve query performance by reducing the amount of data scanned.
• Why It Matters: When you frequently query data based on specific columns, clustering that data can make queries faster and more efficient by minimizing disk I/O and processing.

Manual Clustering vs. Automatic Clustering
1. Manual Clustering:
• Traditionally, you would need to manually maintain clustering by periodically reorganizing data based on the chosen clustering keys. This process requires monitoring and management to keep clusters optimized, which can be time-consuming.
2. Automatic Clustering:
• Snowflake’s Automatic Clustering takes the manual work out of maintaining clustering. When you define clustering keys, Snowflake automatically reorganizes and optimizes the data as it’s loaded or modified.
• Why This is a Game-Changer: Automatic Clustering continuously optimizes data without manual intervention, so you can focus on analysis rather than maintenance.

Snowflake’s clustering and automatic clustering capabilities ensure that even as your data grows, query performance stays optimized with minimal manual effort!

#Snowflake #DataOptimization #AutomaticClustering #DataEngineering #BigData #DataWarehousing #CloudData #DataPerformance #TechInnovation #DataOps
Data Engineer Things

37,217 followers
5mo
❄ Data Modeling with Snowflake: A concise critical review
🖋️ Author: Chad Isenberg
🔗 Read the article here: https://lnkd.in/eWbD85pk
-------------------------------------------
✅ Follow Data Engineer Things for more insights and updates.
💬 Hit the 'Like' button if you enjoyed the article.
-------------------------------------------
#dataengineering #datamodeling #snowflake #data

Data Modeling with Snowflake: A concise critical review

blog.det.life
Emmanuel DUBOIS

Co-founder
5mo
📢 Exciting News for Data Enthusiasts! Snowflake has just rolled out a game-changing feature: Dynamic Iceberg Tables. But what does this mean for you if you're not a tech wizard? Let me break it down:

1️⃣ Easier Data Lake Integration
Dynamic Iceberg Tables allow you to store massive amounts of data cost-effectively while still being able to analyze and transform it within Snowflake. It's like having a huge, organized warehouse for your data that you can easily access and work with.

2️⃣ Always Up-to-Date Information
Imagine having a magic notebook that automatically updates itself with the latest information. That's what Dynamic Iceberg Tables do for your data. No more manual updates or outdated reports!

3️⃣ Flexible Data Processing
Whether you need to process data in large batches or handle it as it comes in real-time, Dynamic Iceberg Tables have got you covered. You can switch between these modes with a simple command.

4️⃣ Cost-Efficient Updates
Instead of refreshing all your data every time there's a change, Dynamic Iceberg Tables only process what's new or different. This means faster updates and lower costs.

5️⃣ Play Nice with Others
These tables use a format that's compatible with other data tools, making it easier to share and collaborate across different platforms.

In simple terms, Dynamic Iceberg Tables make handling big data easier, faster, and more cost-effective. It's a big step forward in making data more accessible and useful for businesses of all sizes.

Are you excited about this new feature? How do you think it could benefit your organization?

#Snowflake #DataInnovation #BusinessIntelligence #TechMadeSimple
Shraddha Shetty

Senior Consultant - Data Engineer @ EY | Data & Analytics | Certified in Azure, Databricks, Snowflake, AWS, PowerBI | Python, SQL, Scala, Spark, PySpark, dbt
5mo
Behind the Scenes: How Data is Stored in Snowflake

Ever wondered what makes Snowflake's data storage so powerful and efficient? Here's a peek behind the curtain.

The Magic of Micro-partitions
Snowflake stores all data in micro-partitions, which are compact units ranging from 50 to 500 MB before compression. This structure offers several key benefits:
1. Automatic Partitioning: Data is divided into micro-partitions automatically, eliminating manual intervention.
2. Columnar Storage: Data is stored in columns, allowing for efficient compression and fast queries.
3. Granular Metadata: Detailed metadata for each micro-partition enhances query pruning and performance.

Data Clustering for Optimized Performance
Snowflake enhances query performance with clustering metadata:
1. Pruning Power: Only relevant micro-partitions are scanned, thanks to detailed metadata.
2. Efficient Sorting: Data is naturally sorted during insertion, avoiding unnecessary scans.
3. Clustering Depth: Snowflake monitors clustering health to maintain optimal performance over time.

Simplifying Data Management
Snowflake makes data management seamless:
1. Dynamic Scaling: Micro-partitions adapt to data changes, reducing maintenance needs.
2. Automatic Optimization: Snowflake chooses the compression scheme for each column automatically. Clustering keys, when needed, are defined by the user and then maintained by Snowflake.

Real-world Impact
Imagine querying a year's worth of data in seconds, with sub-second response times for time-series data. Snowflake's architecture delivers unprecedented speed and efficiency, processing millions of rows in mere seconds.

Within each micro-partition, data is stored in a columnar data structure, allowing better compression and efficient access only to those columns required by a query. In the accompanying picture, 24 rows from the table are stored and sorted in 4 micro-partitions by columns. Repeated values are stored only once.

Let’s imagine that you need data from two different tables for your SQL query to be executed. Instead of copying both tables fully to the compute cluster, Snowflake retrieves only the relevant micro-partitions. As a result, the query takes less time to complete.

Unlock the potential of your data with Snowflake. Experience the future of data storage today!

#Snowflake #DataStorage #MicroPartitions #DataClustering
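The pruning idea above can be illustrated with a small simulation. This is a simplified sketch, not Snowflake's actual implementation: partitions here hold a fixed number of rows rather than 50 to 500 MB, the metadata is reduced to per-column min/max values, and the event data, function names and partition size are all made up for illustration.

```python
from datetime import date, timedelta

EVENT_TYPES = ["click", "purchase", "signup", "view"]


def make_rows(days=10, events_per_day=1000):
    """Rows in insertion order: grouped by day, event types interleaved within a day."""
    start = date(2024, 1, 1)
    return [
        (start + timedelta(days=d), EVENT_TYPES[i % len(EVENT_TYPES)])
        for d in range(days)
        for i in range(events_per_day)
    ]


def build_partitions(rows, rows_per_partition=100):
    """Split rows into fixed-size chunks and keep only min/max metadata per column."""
    partitions = []
    for i in range(0, len(rows), rows_per_partition):
        chunk = rows[i:i + rows_per_partition]
        partitions.append({
            "date_min": min(r[0] for r in chunk),
            "date_max": max(r[0] for r in chunk),
            "type_min": min(r[1] for r in chunk),
            "type_max": max(r[1] for r in chunk),
        })
    return partitions


def partitions_scanned(partitions, day, event_type):
    """Count partitions whose min/max ranges could contain the filtered values."""
    return sum(
        1
        for p in partitions
        if p["date_min"] <= day <= p["date_max"]
        and p["type_min"] <= event_type <= p["type_max"]
    )


rows = make_rows()
natural = build_partitions(rows)            # insertion order: ordered by date only
clustered = build_partitions(sorted(rows))  # ordered by (event date, event type)

target_day = date(2024, 1, 4)
natural_scanned = partitions_scanned(natural, target_day, "purchase")
clustered_scanned = partitions_scanned(clustered, target_day, "purchase")
print(natural_scanned, clustered_scanned)  # prints "10 3"
```

In this toy setup both layouts prune by date equally well, but only the layout ordered on both columns lets the event type filter skip partitions, which is why the filter columns matter when choosing clustering keys.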
5minsnowflake Newsletter

42 followers
2mo
Data loading methods in Snowflake are crucial for optimizing performance and efficiency. One bold assertion made in a recent article is that the choice of loading strategy can significantly impact overall data warehousing success. This is an important consideration for developers and data scientists alike. Are some methods really more beneficial than others, or does it all depend on the specific use case? I invite everyone to share their thoughts and experiences on this topic. Let’s spark a conversation about best practices in data loading. What methods have you found most effective? #DataScience #Snowflake #CloudComputing #DataLoading #DataEngineering https://lnkd.in/gKKY4Enp

What are the different data loading methods in Snowflake? - Snowflake Solutions

https://snowflakesolutions.net
Prakhar Tolambia

Data Engineer @ ZS , AWS || PySpark || Python || Airflow
7mo
🔍 Exploring the Power of Dynamic Tables in Snowflake 🔍

Are you looking to elevate your data management game? Dive into the world of dynamic tables in Snowflake! 💡 Dynamic tables offer unparalleled flexibility and efficiency in handling data, enabling seamless adaptability to changing business needs. Whether you're dealing with evolving schemas, fluctuating data volumes, or dynamic data transformations, Snowflake's dynamic tables have got you covered.

Here's why dynamic tables are a game-changer:

1️⃣ Adaptive Schema Evolution: Say goodbye to rigid schemas. With dynamic tables, you can effortlessly accommodate schema changes without the hassle of altering existing structures. This agility ensures smooth data ingestion and processing, empowering your analytics initiatives.

2️⃣ Scalability at its Finest: In today's data-driven landscape, scalability is non-negotiable. Dynamic tables in Snowflake scale effortlessly, allowing you to handle massive volumes of data with ease. Whether you're dealing with terabytes or petabytes, Snowflake's dynamic tables ensure optimal performance without compromising on speed.

3️⃣ Real-time Insights: Time is of the essence in the world of analytics. With dynamic tables, you can unlock real-time insights into your data, enabling quick decision-making and actionable intelligence. Say hello to agility and goodbye to latency.

4️⃣ Efficient Data Transformations: Transforming data shouldn't be a headache. Dynamic tables streamline the process, enabling seamless data transformations on the fly. From simple manipulations to complex operations, Snowflake's dynamic tables empower you to transform your data with ease.

Ready to harness the power of dynamic tables in Snowflake? Unlock the full potential of your data and supercharge your analytics initiatives like never before. 💥

#Snowflake #DataManagement #Analytics #DynamicTables
Sarang Ravate

Senior Software Engineer at CGI
3mo Edited
🚀 New Blog: Databricks Query Optimization – 2024 Guide 🚀 Just published a comprehensive guide on boosting Databricks query performance for 2024. Whether you're a data pro or just getting started, this blog has actionable insights to help you optimize your workflows. Read more: https://lnkd.in/dxH9UQKe #Databricks #QueryOptimization Hevo Data

Databricks Query Optimization: 10 Techniques for Faster, Efficient Queries

hevodata.com
Guilherme Sepe

Strategic B2B Sales - Brazil @ Databricks | On a mission to help enterprises unify data, analytics, and AI
1mo
Databricks delivered a load of new data warehousing features such as:
- intelligent experiences
- predictive optimizations
- world-class price/performance.
Check out the blog for more

What's new with Databricks SQL, October 2024

databricks.com
BluePi

19,540 followers
9mo
◼️ 5 Signs Your Data Needs a Quality Check-Up. And How Snowflake Can Help!

Does your data feel more like a jumbled mess than a valuable asset? If you're experiencing these signs, it's time to focus on data quality:
❌ Inconsistent reports and conflicting information
❌ Wasted time fixing errors and discrepancies
❌ Difficulty making confident business decisions

Fortunately, Snowflake offers a solution. This article explores:
✅ How Snowflake helps you identify and address data quality issues
✅ Key features like access history, data quality queries, and object tagging
✅ Real-world examples of how Snowflake improves data quality

Click the link and learn how Snowflake can be your data quality hero! https://lnkd.in/gnxK2Edm

#Snowflake #DataQuality #Analytics #CloudDataWarehouse #DataIntegrity #bluepi

Data Quality in Snowflake

bluepiit.com
MaheswarNaidu Sasapu

Data Engineer | Big Data Engineer | Big Data Developer | @ Cognizant | Hadoop | HDFS | SQL | Hive | Scala | Python | Spark | AWS | AWS Glue | AWS EMR | AWS Redshift | S3 | Lambda
2mo Edited
🎯 ETL vs ELT: What’s the Best Choice for Your Data Strategy? 🤔

🚀 ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are two common data integration methods. But how do you decide which one to use for your business needs? Let's dive into the key differences and get your thoughts!

🔍 Quick Overview:

ETL: Data is transformed before loading into the target system (e.g., a data warehouse).
• ✅ Good for complex transformations before data is loaded.
• 🚫 Slower with large data volumes.

ELT: Data is first loaded into the target system and then transformed.
• ✅ Leverages the processing power of modern data warehouses (like Snowflake, BigQuery).
• 🚫 May not work well if you need early data cleansing.

#DataEngineering #BigData #ETL #ELT #DataPipelines #CloudData #DataStrategy #DataArchitecture Karthik K.