Snowflake Optimization Tip of the Day #2 ➡ Enable Automatic Clustering Snowflake has a powerful concept of automatic clustering that can help in reducing compute cost. The way it works is - less data processed leads to a lower compute bill. By default, Snowflake clusters the data based on the insertion pattern of the data, commonly called natural clustering. Taking an example of events data, let’s say certain events come in every day, so Snowflake naturally organizes data by event date. But while querying, analysts might tend to use both the event date as well as the event type as filters. The distribution of event types can vary throughout the day, so Snowflake will not organize the data based on event type by default. You’ll need to guide it to cluster the data on both event date and event type, by explicitly mentioning clustering keys. Snowflake will do the heavy lifting of re-clustering the data in the background through the process of automatic clustering. Once this gets done, queries using event date and event type as filters would run faster - as they have to process only a subset of the data, and therefore will have a lower cost. However, automatic clustering also takes up credits, so it should not be enabled on all tables. But like every Snowflake feature, it’s very easy to switch on and off. The best way to determine whether to enable automatic clustering on a table is to actually do it and see if there's a cost advantage. I’ve personally seen it work well on tables >250GB in size.
It’s not so much about the table size but about how you filter the data. Also, if your first clustering key isn’t time based it gets a lot more expensive.
CTO at Retape (YC W23)
8moOfficial documentation - https://docs.snowflake.com/en/user-guide/tables-auto-reclustering