
Snowflake cost and efficiency optimisation checklist

An easy-to-navigate checklist of things to consider to optimise your Snowflake environment, reduce costs, and increase efficiency.

Steven Wallace


Introduction

If you are looking to save some money on your Snowflake bill, look no further. I have compiled a list of short and sweet considerations to help your business increase efficiency in your Snowflake environment. Each item is a quick summary, with links and code snippets to get you on your way.

Storage

1. Unused storage: Free up unused storage and save on cost. Use select.dev's dbt-snowflake-monitoring package to identify unused tables in dbt. Alternatively, query the TABLE_STORAGE_METRICS view and order by total billable bytes (the sum of active, Time Travel, Fail-safe, and clone-retained bytes) to find the tables costing you the most:


   select table_catalog as database_name,
          table_schema as schema_name,
          table_name,
          (active_bytes + time_travel_bytes + failsafe_bytes + retained_for_clone_bytes) as total_billable_bytes
     from snowflake.account_usage.table_storage_metrics
 order by total_billable_bytes desc
    limit 10

2. Use transient tables: Do you have tables that exist only for transformation purposes? Avoid Time Travel and Fail-safe storage costs by making them transient.


   create or replace transient table my_table as ...

3. Remember to remove temp/transient tables: This select query generates a list of drop commands for all temporary tables older than one day:


   select 'drop table ' || table_catalog || '.' || table_schema || '.' || table_name || ';' as sql_command
     from information_schema.tables
    where table_type = 'LOCAL TEMPORARY'
      and created <= dateadd(day, -1, current_timestamp());

Environment

4. Set auto-suspend: How long a warehouse stays running while idle before it suspends. Don't set it too low: the warehouse cache is lost on suspend and you are billed a minimum of one minute each time it resumes. The default is 10 minutes:


   alter warehouse compute_wh set auto_suspend = 300; -- seconds; 300 = 5 minutes

5. Set query timeout: The maximum time a statement can run before Snowflake cancels it. The default is two days (172,800 seconds):


   alter warehouse compute_wh set statement_timeout_in_seconds = 600; -- 10 minutes

6. Set minimum clusters: By default the minimum cluster count is 1. Ensure this is no higher than 1 to avoid paying for idle clusters; a multi-cluster warehouse will scale out on demand anyway.
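
A reset looks like this (compute_wh is a placeholder name, and min_cluster_count only applies to multi-cluster warehouses, an Enterprise Edition feature):


   alter warehouse compute_wh set min_cluster_count = 1;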

7. Client disconnects: If a client running a query disconnects, Snowflake will continue running the query unless you set:


   alter account set abort_detached_query = true;

8. Consider Snowpark-optimised warehouses: For complex, memory-heavy Snowpark workloads it may be beneficial to use one; see Snowpark-optimized Warehouses | Snowflake Documentation
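
Creating one is the same as a standard warehouse plus a warehouse type (snowpark_wh is a placeholder name):


   create warehouse snowpark_wh
     warehouse_size = 'MEDIUM'
     warehouse_type = 'SNOWPARK-OPTIMIZED';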

9. Consider enabling query acceleration: If you have workloads such as ad hoc analytics, unpredictable data volumes per query, or queries with large scans and selective filters, then look into Using the Query Acceleration Service | Snowflake Documentation
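
Query acceleration is enabled per warehouse; a minimal example, reusing compute_wh from above:


   alter warehouse compute_wh set enable_query_acceleration = true;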

10. Consider using search optimisation: Some queries, such as highly selective point lookups, can benefit from search optimisation. See whether yours qualify via Identifying Queries That Can Benefit from Search Optimization | Snowflake Documentation
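
Once you have identified a candidate, enabling it is a one-liner (my_table is a placeholder; note that search optimisation adds its own storage and maintenance costs):


   alter table my_table add search optimization;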

Transformation

11. Consider whether you need to run jobs on weekends: It's easy to just run jobs daily, but what if there is no requirement for data to be up to date at weekends? Skipping Saturday and Sunday saves roughly 104 run-days of compute a year (52 weekends x 2 days).
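
If you schedule with Snowflake tasks, a weekday-only cron expression does the trick (my_task and my_job are placeholders):


   create or replace task my_task
     warehouse = compute_wh
     schedule = 'USING CRON 0 6 * * 1-5 UTC'  -- 06:00 UTC, Monday to Friday only
   as
     call my_job();

Remember that tasks are created suspended, so run alter task my_task resume; to activate it.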

12. Change warehouse size dynamically: Expand your warehouse before heavy workloads and shrink it afterwards to match demand:


   alter warehouse compute_wh set warehouse_size = medium; -- each size up doubles the per-second credit rate

13. Take advantage of micro-partitions: Cluster tables by columns that are filtered on regularly, so queries can make use of partition pruning (but note that automatic reclustering consumes credits, so reserve this for large, frequently queried tables):


   alter table my_table cluster by (business_key);

14. Use incremental or materialised: Do you have a lot of data to load? Try incremental models. A lot of data to query repeatedly? Try materialised views. Working with Materialized Views | Snowflake Documentation
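
A minimal materialised view sketch (my_table and the aggregation are placeholders; materialised views require Enterprise Edition and incur background maintenance costs):


   create materialized view daily_sales as
   select order_date,
          sum(amount) as total_amount
     from my_table
    group by order_date;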

15. Use ANSI joins: If you have older code, ensure joins are not expressed in the where clause. Snowflake accepts the old comma-separated syntax for backward compatibility, but does not optimise it.
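
A sketch of the rewrite (orders and customers are placeholder tables):


   -- old style: join condition buried in the where clause
   select o.order_id, c.name
     from orders o, customers c
    where o.customer_id = c.customer_id;

   -- ANSI style: explicit join
   select o.order_id, c.name
     from orders o
     join customers c on o.customer_id = c.customer_id;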

16. Avoid using order by: Only use order by if you really need it; sorting large result sets can be very expensive.

17. Take advantage of the metadata cache: Snowflake stores a lot of metadata about objects, and for some queries it can answer from that metadata alone and avoid warehouse usage entirely, for example select count(*). This is a good article: Caching in Snowflake - ThinkETL
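
For example, a plain row count can be answered from metadata even while the warehouse is suspended (my_table is a placeholder):


   select count(*) from my_table;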

Conclusion

These are only a few of the things to consider when setting up or auditing your environment. One key thing missing here is monitoring your usage: without measuring, you cannot benchmark your environment or prove that a change saved anything. If you have any more cost-saving or efficiency ideas, please let me know by leaving a comment.
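
As a starting point, a query like this against ACCOUNT_USAGE shows which warehouses consumed the most credits over the last 30 days (a sketch, not a full monitoring setup):


   select warehouse_name,
          sum(credits_used) as total_credits
     from snowflake.account_usage.warehouse_metering_history
    where start_time >= dateadd(day, -30, current_timestamp())
    group by warehouse_name
    order by total_credits desc;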
