Snowflake cost and efficiency optimisation checklist
Steven Wallace
Introduction
If you are looking to save a bit of money on your Snowflake bill, look no further. I have compiled a list of short and sweet considerations for your business to work through in order to increase the efficiency of your Snowflake environment. Each item is a quick summary with links and code snippets to get you on your way.
Storage
1. Unused storage: Free up unused storage and save on cost. Use select.dev's dbt-snowflake-monitoring package to identify unused tables in dbt. Alternatively, query the TABLE_STORAGE_METRICS view and order by total billable bytes to find the tables costing you the most:
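A minimal sketch, assuming access to the SNOWFLAKE.ACCOUNT_USAGE schema; total billable bytes is computed here as the sum of the active, Time Travel, Fail-safe, and clone-retained byte columns:

    -- Rank non-deleted tables by total billable storage
    select
        table_catalog,
        table_schema,
        table_name,
        active_bytes + time_travel_bytes + failsafe_bytes
            + retained_for_clone_bytes as total_billable_bytes
    from snowflake.account_usage.table_storage_metrics
    where deleted = false
    order by total_billable_bytes desc
    limit 20;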
2. Use transient tables: Do you have tables that exist only for transformation purposes? Avoid Time Travel and Fail-safe costs by making them transient.
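For example (table names are placeholders), a transient staging table with Time Travel disabled as well:

    -- Transient tables skip Fail-safe; retention 0 also disables Time Travel
    create transient table staging_orders
        data_retention_time_in_days = 0
    as
    select * from raw.orders;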
3. Remember to remove temp/transient tables: The SELECT below generates a list of DROP commands for all transient tables older than 1 day (true temporary tables are dropped automatically when the session ends):
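A sketch using INFORMATION_SCHEMA, which is scoped per database, so run it in each database; adjust the age filter to taste:

    -- Generate DROP statements for transient tables older than 1 day
    select 'drop table if exists '
           || table_catalog || '.' || table_schema || '.' || table_name || ';'
    from information_schema.tables
    where table_type = 'BASE TABLE'
      and is_transient = 'YES'
      and created < dateadd('day', -1, current_timestamp());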
Environment
4. Set auto-suspend: How long a warehouse stays running after its last query before it suspends. Don't set it too low: the warehouse cache is cleared on suspend and you are billed a minimum of one minute each time it resumes. The default is 10 minutes:
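For example ("my_wh" is a placeholder warehouse name; AUTO_SUSPEND is in seconds):

    -- Suspend after 5 minutes of inactivity
    alter warehouse my_wh set auto_suspend = 300;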
5. Set query timeout: The time a statement/query can run before Snowflake cancels it. The default is 2 days:
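For example, capping runaway queries at one hour ("my_wh" is a placeholder; the parameter can also be set at account, user, or session level):

    -- Cancel any statement that runs longer than 1 hour
    alter warehouse my_wh set statement_timeout_in_seconds = 3600;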
6. Set minimum clusters: By default the minimum cluster count is 1. Ensure it is no higher than 1 so a multi-cluster warehouse can scale down fully and you don't pay for idle clusters.
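For example ("my_wh" is a placeholder multi-cluster warehouse):

    -- Scale between 1 and 3 clusters on demand rather than pinning extras on
    alter warehouse my_wh set min_cluster_count = 1 max_cluster_count = 3;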
7. Client disconnects: If a client running a query disconnects, Snowflake will continue running that query unless you set:
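The relevant parameter, set here at account level (it can also be set per session):

    -- Abort queries roughly 5 minutes after the client disconnects
    alter account set abort_detached_query = true;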
8. Consider a Snowpark-optimised warehouse: For complex, memory-hungry Snowpark workloads it may be beneficial; see Snowpark-optimized Warehouses | Snowflake Documentation
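Creating one is a single statement (the warehouse name is a placeholder; these warehouses start at MEDIUM size):

    -- Snowpark-optimised warehouses provide much more memory per node
    create warehouse snowpark_wh with
        warehouse_size = 'MEDIUM'
        warehouse_type = 'SNOWPARK-OPTIMIZED';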
9. Consider enabling query acceleration: If you have workloads such as ad hoc analytics, unpredictable data volume per query, or queries with large scans and selective filters, then look into Using the Query Acceleration Service | Snowflake Documentation
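It is enabled per warehouse ("my_wh" is a placeholder; the scale factor caps how much extra serverless compute can be leased):

    -- Offload eligible scan-heavy work to the acceleration service
    alter warehouse my_wh set
        enable_query_acceleration = true
        query_acceleration_max_scale_factor = 8;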
10. Consider using search optimisation: Some queries, particularly highly selective point lookups on large tables, can benefit from search optimisation; check whether yours qualify via Identifying Queries That Can Benefit from Search Optimization | Snowflake Documentation
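A sketch ("my_db.my_schema.events" is a placeholder): estimate the cost first, then enable it if the numbers stack up:

    -- Estimate the cost of adding search optimization to a table
    select system$estimate_search_optimization_costs('my_db.my_schema.events');
    -- Enable it if the estimate looks worthwhile
    alter table my_db.my_schema.events add search optimization;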
Transformation
11. Consider whether you need to run jobs on weekends: It's easy to just run jobs on a daily basis, but what if there is no requirement for data to be up to date on weekends? You could save ~104 days of compute a year; a scheduling sketch follows.
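If you schedule with Snowflake tasks, a weekday-only cron does it (task name, warehouse, and procedure are placeholders):

    -- Run Monday to Friday at 06:00 UTC only
    create or replace task refresh_daily
        warehouse = my_wh
        schedule = 'USING CRON 0 6 * * 1-5 UTC'
    as
        call refresh_marts();
    -- Tasks are created suspended; remember to resume
    alter task refresh_daily resume;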
12. Change warehouse size dynamically: Expand and shrink your Snowflake warehouse around heavy workloads with:
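For example ("my_wh" is a placeholder), size up for a heavy batch and back down afterwards:

    -- Scale up before the heavy transformation...
    alter warehouse my_wh set warehouse_size = 'XLARGE';
    -- ...run the workload, then scale back down
    alter warehouse my_wh set warehouse_size = 'XSMALL';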
13. Take advantage of micro-partitions: Cluster tables by columns that are regularly filtered on so queries can make use of partition pruning:
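For example (table and column names are placeholders):

    -- Queries filtering on event_date can now prune micro-partitions
    alter table my_db.my_schema.events cluster by (event_date);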
14. Use incremental or materialised: Do you have a lot of data to load? Try incremental loads. Do you have a lot of data to query? Try materialised views. Working with Materialized Views | Snowflake Documentation
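A sketch of a materialised view over an expensive aggregate (names are placeholders; note Snowflake materialised views require Enterprise Edition and are limited to a single table and a restricted set of functions):

    -- Precompute daily revenue; Snowflake keeps it up to date automatically
    create materialized view daily_revenue as
    select order_date, sum(amount) as revenue
    from my_db.my_schema.orders
    group by order_date;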
15. Use ANSI joins: If you have older code, ensure joins are not written in the WHERE clause. Snowflake accepts these for backward compatibility, but may not optimise them as well as explicit joins.
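For example (table names are placeholders):

    -- Old style: join condition buried in the WHERE clause
    select o.id, c.name
    from orders o, customers c
    where o.customer_id = c.id;

    -- ANSI style: the join intent is explicit to the optimiser and the reader
    select o.id, c.name
    from orders o
    join customers c on o.customer_id = c.id;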
16. Avoid using ORDER BY: Only use an ORDER BY clause if you really need it. Sorting large result sets can be very expensive.
17. Take advantage of the metadata cache: Snowflake stores a lot of metadata about objects. For some queries, for example SELECT DISTINCT or SELECT COUNT(*), Snowflake will avoid warehouse usage entirely if the answer is in the metadata. This is a good article: Caching in Snowflake - ThinkETL
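For example (table name is a placeholder), these typically resolve from metadata without spinning up a warehouse:

    -- Row counts and column min/max are kept in micro-partition metadata
    select count(*) from my_db.my_schema.orders;
    select min(order_date), max(order_date) from my_db.my_schema.orders;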
Conclusion
These are only a few considerations when setting up or auditing your environment. One key thing missing here is monitoring your usage: without measuring, you cannot benchmark and therefore cannot prove your savings. If you have any more cost-saving or efficiency ideas, please let me know by leaving a comment.