dbt (Data Build Tool) - Frequently Asked Questions
Steven Wallace
Introduction
There are plenty of articles about what dbt is, but many of them focus on what it can do for you or your business. I want to explain, in very basic terms, what dbt is and answer some frequently asked questions.
What is dbt?
dbt is an abbreviation for Data Build Tool. It is essentially a wrapper around SQL statements that lets you transform data within data platforms. It comes in two forms: dbt Core, which is free and open source, and dbt Cloud, which is a fully managed, paid-for solution.
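To make the "wrapper" idea concrete, think of a dbt model as essentially a SELECT statement. dbt wraps it in the DDL needed to materialize it as a view or a table. The Python sketch below imitates that step. The `wrap_model` helper and the exact DDL text are illustrative assumptions, because the SQL dbt actually generates varies by adapter and materialization.

```python
def wrap_model(schema: str, name: str, select_sql: str, materialized: str = "view") -> str:
    """Wrap a model's SELECT in DDL, roughly as dbt does (simplified illustration)."""
    # A model file holds a bare SELECT, so drop surrounding whitespace and any trailing semicolon.
    body = select_sql.strip().rstrip(";").strip()
    relation = f"{schema}.{name}"
    if materialized == "view":
        return f"create or replace view {relation} as (\n{body}\n)"
    if materialized == "table":
        return f"create or replace table {relation} as (\n{body}\n)"
    raise ValueError(f"unsupported materialization in this sketch: {materialized}")


print(wrap_model("analytics", "customers", "select id, name from raw.customers;", "table"))
```

Real dbt also supports other materializations, such as incremental and ephemeral models. This sketch deliberately rejects them to stay short.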
Is dbt an ETL tool?
No. It is better described as part of an ELT process, and even then it provides only the T at the end. Let's say you have a Snowflake data warehouse holding lots of data. dbt can read that data, ‘Transform’ it, and write the results into new tables (or views) within Snowflake.
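One rough way to picture "only the T" is this. The raw data is already inside the warehouse, and the transformation is a SQL statement that the warehouse itself executes to build a new table. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Snowflake. The table names `raw_orders` and `customer_revenue` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "E" and "L" happened earlier, outside dbt: the raw data already sits in the database.
conn.execute("create table raw_orders (order_id integer, customer_id integer, amount real)")
conn.executemany(
    "insert into raw_orders values (?, ?, ?)",
    [(1, 10, 25.0), (2, 10, 15.0), (3, 20, 40.0)],
)

# "T": a dbt model would contain just this SELECT ...
model_sql = """
select customer_id, sum(amount) as total_revenue
from raw_orders
group by customer_id
"""

# ... which gets wrapped in DDL and executed by the database itself,
# so the rows are never fetched into Python.
conn.execute(f"create table customer_revenue as {model_sql}")

rows = conn.execute("select * from customer_revenue order by customer_id").fetchall()
print(rows)  # [(10, 40.0), (20, 40.0)]
```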
Is dbt a Data Catalogue?
No. However, it does help a lot with cataloguing data. As you develop in dbt, you write documentation and metadata alongside your models, and dbt works out lineage from how the models reference each other. dbt can then compile all of this into a static documentation website (with the dbt docs generate command), or you can integrate dbt with popular data catalogue tools like Atlan.
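Lineage comes largely from dbt's `ref()` function. When one model selects from `{{ ref('other_model') }}`, dbt records a dependency and builds a directed acyclic graph (DAG) of models. That graph drives both the documentation site and the run order. The sketch below reproduces the idea with a regular expression and Python's standard-library `graphlib`. The model names are hypothetical, and the regex only handles the simple one-argument form of `ref()`.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical project: model name -> model SQL containing Jinja ref() calls.
models = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "customer_orders": """
        select c.customer_id, count(o.order_id) as order_count
        from {{ ref('stg_customers') }} c
        left join {{ ref('stg_orders') }} o on o.customer_id = c.customer_id
        group by c.customer_id
    """,
}

# Matches only the simple form {{ ref('name') }} or {{ ref("name") }}.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")


def build_lineage(models: dict) -> dict:
    """Map each model to the set of models it selects from (its upstream parents)."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


lineage = build_lineage(models)
print(lineage["customer_orders"])  # the two staging models, in arbitrary set order

# TopologicalSorter expects node -> predecessors, so every model comes after its parents.
order = list(TopologicalSorter(lineage).static_order())
print(order)
```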
Does dbt move any data?
No. dbt can only transform data that has already been loaded into a target data platform. It can execute code against databases, warehouses, data lakes or query engines, and it currently supports over 20 verified, trusted and community data platform adapters.
Where is dbt installed?
dbt is a command-line tool written in Python, which makes it fairly flexible to deploy. It is often packaged into a container, which an orchestration tool then runs on a schedule. Many tools have integrated dbt so that dbt jobs can be executed or scheduled through them. A common example is combining dbt with Airflow.
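Whatever does the scheduling, whether a container entrypoint, Airflow or a CI job, it usually ends up invoking the same dbt CLI commands. The sketch below assembles such a command in Python and runs it with `subprocess`. `dbt build`, `--select`, `--target` and `--project-dir` are real dbt CLI options. The helper names `dbt_command` and `run_dbt`, the model `customer_orders` and the target name `prod` are assumptions for illustration.

```python
import subprocess
from typing import List, Optional


def dbt_command(subcommand: str, select: Optional[str] = None,
                target: Optional[str] = None, project_dir: Optional[str] = None) -> List[str]:
    """Assemble a dbt CLI invocation as an argument list (no shell quoting needed)."""
    cmd = ["dbt", subcommand]
    if select:
        cmd += ["--select", select]  # node selection, e.g. a model plus its descendants
    if target:
        cmd += ["--target", target]  # which target (connection) from profiles.yml to use
    if project_dir:
        cmd += ["--project-dir", project_dir]  # directory containing dbt_project.yml
    return cmd


def run_dbt(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run dbt; check=True raises CalledProcessError on a non-zero exit, failing the task."""
    return subprocess.run(cmd, check=True)


cmd = dbt_command("build", select="customer_orders+", target="prod")
print(" ".join(cmd))  # dbt build --select customer_orders+ --target prod
```

An orchestrator task would then call `run_dbt(cmd)`. Passing a list rather than a single shell string avoids quoting problems, and a failed dbt run surfaces as a failed task.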
How is dbt development done?
dbt Core is installed as a Python package distributed on PyPI, usually together with an adapter package for your data platform. Development is typically done in your favourite code editor, such as VS Code, while dbt Cloud also provides its own browser-based IDE. A dbt project is made up of a combination of SQL, YAML and, optionally, Python (.py) files.
Is dbt SQL only?
dbt is SQL-first, but it builds on SQL with Jinja, a templating language used inside SQL statements. It can also run Python transformations, as long as your data platform supports Python. YAML is used for configuration and schema definitions.
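dbt renders the Jinja first and sends only the resulting plain SQL to the data platform. For example, `{{ ref('stg_orders') }}` compiles to a relation name such as database.schema.stg_orders, with exact quoting depending on the adapter. The sketch below mimics that rendering step with a regular expression rather than a real Jinja engine. The `compile_refs` helper and the database and schema names are hypothetical. Real dbt resolves these names from your profile and project configuration.

```python
import re

# Matches only the simple form {{ ref('name') }} or {{ ref("name") }}.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")


def compile_refs(sql: str, database: str, schema: str) -> str:
    """Replace each {{ ref('model') }} with database.schema.model (simplified, unquoted)."""
    return REF_PATTERN.sub(lambda m: f"{database}.{schema}.{m.group(1)}", sql)


model_sql = "select * from {{ ref('stg_orders') }} where amount > 0"
print(compile_refs(model_sql, database="analytics_db", schema="dbt_prod"))
# select * from analytics_db.dbt_prod.stg_orders where amount > 0
```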