How To Build An End-To-End Testing Pipeline With DBT On Databricks - by Databricks SQL SME - DBSQL SME Engineering - Medium
https://medium.com/dbsql-sme-engineering/how-to-build-an-end-to-end-testing-pipeline-with-dbt-on-databricks-cb6e179e646c
Introduction
In previous articles, the DBSQL SME group showed how to build basic,
performant ETL with dbt on Databricks (here and here). Now
we dive into the next stage: data quality & pipeline testing. Data quality is
essential in any analytics pipeline. This blog post outlines a robust approach
to ensuring data integrity throughout your dbt workflow. We will explore a
series of tests such as anomaly detection, unit tests, and data contracts that
will help you maintain high-quality data from the source to the final output.
Databricks provides a unified platform for data processing and analytics that
allows users to build, test, deploy, and monitor data products all in one
place. We will use dbt (data build tool), an open-source command-line
tool that helps analysts and engineers transform data in their warehouse
more effectively, to apply robust testing techniques to our data
pipeline.
Databricks
used
• On your local machine, create a virtual environment and install the
dbt-databricks adapter.
• The best practice for the profiles.yml file is to gitignore it, as it
holds an individual’s configuration and is created automatically
when you set up dbt locally. This dbt YAML file lives in the .dbt/ directory
of your user/home directory. Update your profiles.yml file from the repo
to point to the Databricks SQL warehouse you created above.
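For reference, a profiles.yml entry for the dbt-databricks adapter might look like the sketch below. The profile name, catalog, schema, and the placeholder host and warehouse ID are assumptions for illustration, and the token is read from an environment variable (here assumed to be called DATABRICKS_TOKEN) rather than stored in the file.

```yaml
# ~/.dbt/profiles.yml (kept out of version control)
dbt_testing_demo:                 # hypothetical profile name; must match `profile:` in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: main               # hypothetical Unity Catalog catalog
      schema: dbt_dev             # hypothetical development schema
      host: "<your-workspace-host>"                     # placeholder, without the https:// prefix
      http_path: "/sql/1.0/warehouses/<warehouse-id>"   # placeholder, from the warehouse's connection details
      token: "{{ env_var('DATABRICKS_TOKEN') }}"        # assumed environment variable name
      threads: 4
```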
Now test that you can connect to your Databricks SQL warehouse from your
terminal with dbt debug. If all goes well, you should see the output “All
checks passed!”. If not, troubleshoot using the error messages (I find
them very helpful).
Once you enable this, all your metadata is synchronized with Unity Catalog
in Databricks.
Threads help parallelize node execution in dbt’s directed acyclic graph
(DAG). The default number of threads is currently 4, but there is no
maximum, and dbt will allow you to go up to your Databricks SQL maximum
limit. As a starting point, increase this number to 10 with a medium SQL
warehouse, but check out this in-depth analysis for more details on the best
combination. To see how Databricks SQL manages multiple queries, click here.
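As a sketch of that starting point, the thread count is set per target in profiles.yml. The profile and target names below are hypothetical, and the other connection keys are omitted for brevity.

```yaml
dbt_testing_demo:        # hypothetical profile name
  outputs:
    dev:                 # hypothetical target; connection keys (type, host, http_path, ...) omitted
      threads: 10        # suggested starting point for a medium SQL warehouse
```

The same value can also be overridden for a single invocation with the --threads flag, for example dbt run --threads 10.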
Freshness Checks
One of the first things to check is the freshness of your data. To check the
freshness of the source data in your pipeline, you can use dbt’s freshness
block. It is defined within your models/sources.yml file and sets the
acceptable amount of time between the most recent record in the source and
now for a table to be considered “fresh”.
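A minimal sketch of such a freshness block is shown below. The source, schema, and table names, the _loaded_at timestamp column, and the thresholds are all assumptions chosen for illustration.

```yaml
# models/sources.yml
version: 2

sources:
  - name: raw_sales                 # hypothetical source name
    schema: raw                     # hypothetical schema holding the raw tables
    loaded_at_field: _loaded_at     # assumed timestamp column recording when each row was loaded
    freshness:
      warn_after: {count: 12, period: hour}    # warn if the newest row is older than 12 hours
      error_after: {count: 24, period: hour}   # error if the newest row is older than 24 hours
    tables:
      - name: orders                # inherits the freshness settings above
```

Running dbt source freshness evaluates these thresholds against the most recent value of loaded_at_field. Depending on your dbt version, these keys may also be accepted (or preferred) under a config: block.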
When we run
Freshness checks are very helpful: they notify you when your data
becomes stale so you can address the problem before proceeding further in
your pipeline.
DBT tests
dbt test
The test passes if both tables have equal row counts and fails otherwise.
Easy!
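A row-count comparison of this kind can be sketched with the equal_rowcount test from the dbt-utils package. Whether this is the exact test used here is an assumption, as are the model and source names. The sketch also assumes dbt-utils is listed in packages.yml and installed with dbt deps.

```yaml
# models/staging/schema.yml (hypothetical path)
version: 2

models:
  - name: stg_orders                                     # hypothetical staging model
    tests:
      - dbt_utils.equal_rowcount:
          compare_model: source('raw_sales', 'orders')   # hypothetical source to compare against
```

dbt 1.8 and later also accept data_tests: in place of tests:.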
dbt init
Feel free to add other tests at this stage that may be relevant to your project.
Let us create a simple singular test that asserts that there are no future order
dates in the orders table, as that would be strange.
Create this file and save it in the tests directory. Now run
dbt comes with 4 generic tests out of the box (unique, not_null,
accepted_values, and relationships), and you can also build your own
custom generic tests. Before you build yours, however, check the
open-source packages dbt-utils and dbt-expectations (which we looked at
already) to see whether the test you have in mind already exists.
Below is an example of the 4 built-in generic tests being used for the orders
model in a schema.yml file:
• not_null: the order_id column in the orders model should not contain
null values
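A sketch of how all four built-in generic tests (unique, not_null, accepted_values, and relationships) could be declared for the orders model follows. Only order_id comes from the text above; the status and customer_id columns, the accepted values, and the customers model are hypothetical.

```yaml
# models/schema.yml
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique                 # no two rows share an order_id
          - not_null               # order_id is never null
      - name: status               # hypothetical column
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']   # hypothetical values
      - name: customer_id          # hypothetical column
        tests:
          - relationships:         # every customer_id must exist in customers.id
              to: ref('customers') # hypothetical model
              field: id
```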
model contract:
DBT contracts
You may notice that constraints are very similar to data tests, and in some
cases, you can replace data tests with their equivalent constraint. Data tests
validate the content of your model after it is built while constraints check
this at build time. See here and here for more details.
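As an illustration, a model contract with a column-level constraint might be declared as in the sketch below. The column names and data types are assumptions. Note that once a contract is enforced, dbt expects every column of the model to be listed with its data_type.

```yaml
# models/schema.yml
version: 2

models:
  - name: orders
    config:
      contract:
        enforced: true             # dbt compares the model's columns and types to this spec
    columns:
      - name: order_id
        data_type: bigint          # assumed type
        constraints:
          - type: not_null         # constraint counterpart of the not_null data test
      - name: order_date           # hypothetical column
        data_type: date
      - name: amount               # hypothetical column
        data_type: decimal(18,2)
```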
At the moment, with dbt-databricks, a violated constraint produces an
error message; however, dbt only validates the data after it has already
been inserted into the table. This means that if the constraints and/or
constraint_check fail, the table with the failing data will still exist in Unity
Unit Tests
Unit tests examine the smallest testable parts of your models. We implement
unit tests to validate specific transformations or calculations in our models,
especially when there is complex logic. These can be written as singular
tests or with dbt’s unit test framework, available since dbt 1.8.
To ensure the customer lifetime value (CLV) calculation is correct, you can
create unit tests that pass in sample data (“given”) and the expected output
(“expect”):
Inputs & Expected outputs in DBT for Unit Testing of pipeline logic
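A unit test in dbt’s YAML format could look like the sketch below. It assumes a hypothetical model customer_lifetime_value that selects from ref('orders') and defines clv as the sum of amount per customer_id. The article’s actual CLV formula is not reproduced here, so the model, column names, and values are purely illustrative.

```yaml
# models/marts/unit_tests.yml (hypothetical path)
unit_tests:
  - name: test_clv_sums_order_amounts
    model: customer_lifetime_value      # hypothetical model under test
    given:
      - input: ref('orders')            # mocked input; only the columns the model needs are listed
        rows:
          - {customer_id: 1, amount: 100}
          - {customer_id: 1, amount: 50}
          - {customer_id: 2, amount: 30}
    expect:
      rows:
        - {customer_id: 1, clv: 150}    # 100 + 50 under the assumed sum-of-amounts definition
        - {customer_id: 2, clv: 30}
```

Unit tests can be run on their own with dbt test --select test_type:unit.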
Unit tests like this are invaluable when dealing with complex calculations,
ensuring that your dbt models produce accurate and reliable results. Since
the inputs of unit tests are static, it is recommended that they be run only in
development or CI environments.
Additional Tips
To enhance your testing strategy, consider implementing these additional
features:
Test Severity
Another feature that I particularly like is test severity. This allows you to
configure your tests to return a warning instead of an error. By default, a
test whose query returns at least one row fails with an error.
• Where: in the test configuration, you can include a filter that limits the
rows the test applies to. In this case, it was only important that “closed
won” opportunities had a customer ID assigned to them.
• Config: with severity set to error (which is the default), dbt checks the
error_if condition first. If the error condition is met, the test returns
an error; if not, dbt checks the warn_if condition and “warns” if that
condition is met. The test passes if neither the “error” nor the “warn”
condition is met.
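Put together, the where filter and the severity thresholds described above might be configured as in the following sketch. The opportunities model, the stage column, and the threshold values are assumptions for illustration.

```yaml
# models/schema.yml
version: 2

models:
  - name: opportunities                         # hypothetical model
    columns:
      - name: customer_id
        tests:
          - not_null:
              config:
                where: "stage = 'closed won'"   # hypothetical column; only these rows are tested
                severity: error                 # default; error_if is evaluated first, then warn_if
                error_if: ">100"                # error when more than 100 rows fail
                warn_if: ">10"                  # otherwise warn when more than 10 rows fail
```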
dbt test
in your terminal.
We will see how you can add alerts using Databricks Workflows in the
Monitoring and Alerting section of the next part of this article.
Failing Fast
Sometimes, during development, if there is a failure during your build or
test, you may want to exit dbt immediately instead of waiting for all the
models to complete. This saves time and money on your warehouse,
especially when you have a lot of models. dbt has a little-known flag
called --fail-fast (or -x) that immediately exits dbt if any resource fails
to build. You can find out more about it here.
Conclusion
We looked at many approaches to implementing data quality checks in your
data pipeline in Databricks. Remember that data quality is not a one-time
effort but an ongoing process. It is essential to regularly review and update
your tests as your data models evolve. In summary, you should now be
familiar with the following:
• How to set up a basic dbt project for Databricks including a rule of thumb
for compute sizing,
• How to implement unit tests and custom one-off tests on dbt models,
In the next article, we will look at how to automate monitoring and alerting
using Databricks Workflows and Databricks SQL in a CI/CD pipeline.