What Are DBT Sources
dbt Cloud is the fastest and most reliable way to deploy dbt. Develop, test,
schedule, document, and investigate data models all in one browser-based UI.
dbt Cloud's flexible plans and features make it well-suited for data teams of any
size — sign up for your free 14-day trial!
Use the dbt Cloud CLI to develop, test, run, and version control dbt projects and
commands in your dbt Cloud development environment. Collaborate with team
members, directly from the command line.
The IDE is the easiest and most efficient way to develop dbt models, allowing
you to build, test, run, and version control your dbt projects directly from your
browser.
Manage environments
Create custom schedules to run your production jobs. Schedule jobs by day of
the week, time of day, or a recurring interval. Decrease operating costs by using
webhooks to trigger CI jobs and the API to start jobs.
Notifications
Set up and customize job notifications in dbt Cloud to receive email or Slack alerts when a job run succeeds, fails, or is cancelled. Notifications alert the right
people when something goes wrong instead of waiting for a user to report it.
Run visibility
View the history of your runs and the model timing dashboard to help identify
where improvements can be made to the scheduled jobs.
dbt Cloud hosts and authorizes access to dbt project documentation, allowing
you to generate data documentation on a schedule for your project. Invite
teammates to dbt Cloud to collaborate and share your project's documentation.
Seamlessly connect your git account to dbt Cloud and provide another layer of
security to dbt Cloud. Import new repositories, trigger continuous integration,
clone repos using HTTPS, and more!
Enable Continuous Integration
Configure dbt Cloud to run your dbt projects in a temporary schema when new
commits are pushed to open pull requests. This build-on-PR functionality is a
great way to catch bugs before deploying to production, and an essential tool in
any analyst's belt.
Security
Manage risk with SOC-2 compliance, CI/CD deployment, RBAC, and ELT
architecture.
Use the dbt Semantic Layer to define metrics alongside your dbt models and
query them from any integrated analytics tool. Get the same answers
everywhere, every time.
Discovery API*
Enhance your workflow and run ad-hoc queries, browse schema, or query the
dbt Semantic Layer. dbt Cloud serves a GraphQL API, which supports arbitrary
queries.
dbt Explorer*
Learn about dbt Explorer and how to interact with it to understand, improve,
and leverage your data pipelines.
Both the dbt Cloud IDE and the dbt Cloud CLI enable users to natively defer to
production metadata directly in their development workflows.
For a clean slate, it's a good practice to drop the development schema at the
start and end of your development cycle.
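For example, a minimal sketch of that cleanup, run directly against your warehouse (the schema name dbt_jsmith is hypothetical; substitute your own development schema):
-- remove your personal development schema for a clean slate
drop schema if exists dbt_jsmith cascade;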
Required setup
When using defer, dbt compares your project against artifacts from the most recent successful production job, excluding CI jobs.
To enable defer in the dbt Cloud IDE, toggle the Defer to production button on
the command bar. Once enabled, dbt Cloud will:
1. Pull down the most recent manifest from the Production environment for
comparison
2. Pass the --defer flag to the command (for any command that accepts the
flag)
For example, if you were to start developing on a new branch with nothing in
your development schema, edit a single model, and run dbt build -s
state:modified — only the edited model would run. Any {{ ref() }} functions
will point to the production location of the referenced models.
Select the 'Defer to production' toggle on the bottom right of the command bar to enable defer in
the dbt Cloud IDE.
One key difference between using --defer in the dbt Cloud CLI and the dbt Cloud IDE is that --defer is automatically enabled in the dbt Cloud CLI for all invocations, comparing against production artifacts. You can disable it with the --no-defer flag.
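For example, a quick sketch of how these flags behave from the dbt Cloud CLI (the selection and model name are illustrative):
# defer is on by default, so this builds only modified models and resolves
# {{ ref() }} for everything else against production artifacts
dbt build --select state:modified+

# opt out of deferral for a single invocation
dbt run --select my_model --no-defer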
The dbt Cloud CLI offers additional flexibility by letting you choose the source environment for deferral artifacts. You can set a defer-env-id key in either your dbt_project.yml or dbt_cloud.yml file. If you do not provide a defer-env-id setting, the dbt Cloud CLI will use artifacts from your dbt Cloud environment marked "Production".
dbt_cloud.yml
defer-env-id: '123456'
dbt_project.yml
dbt-cloud:
  defer-env-id: '123456'
The dbt Cloud CLI is currently in public preview. Share feedback or request
features you'd like to see on the dbt community Slack.
dbt commands are run against dbt Cloud's infrastructure and benefit from features such as automatic deferral of build artifacts and speedier, lower-cost builds.
Prerequisites
The dbt Cloud CLI is available in all deployment regions and for both multi-
tenant and single-tenant accounts (Azure single-tenant not supported at this
time).
Ensure you are using dbt version 1.5 or higher. Refer to dbt Cloud
versions to upgrade.
Note that the dbt Cloud CLI doesn't support SSH tunneling for Postgres and Redshift connections yet.
You can install the dbt Cloud CLI on the command line by using one of these
methods.
macOS (brew)
Windows (native executable)
Linux (native executable)
Existing dbt Core users (pip)
Before you begin, make sure you have Homebrew installed. Refer to the FAQs if your operating system runs into path conflicts.
o If you see a "dbt not found" message, you're good to go. If the dbt help text appears, use pip uninstall dbt to remove dbt Core from your system.
3. Verify your installation by running dbt --help in the command line. If you
see the following output, your installation is correct:
The dbt Cloud CLI - an ELT tool for running SQL transformations and
data models in dbt Cloud...
If you don't see this output, check that you've deactivated pyenv or venv
and don't have a global dbt version installed.
o Note that you no longer need to run the dbt deps command when
your environment starts. This step was previously required during
initialization. However, you should still run dbt deps if you make
any changes to your packages.yml file.
4. Clone your repository to your local computer using git clone. For
example, to clone a GitHub repo using HTTPS format, run git clone
https://github.com/YOUR-USERNAME/YOUR-REPOSITORY.
5. After cloning your repo, configure the dbt Cloud CLI for your dbt Cloud
project. This lets you run dbt commands like dbt environment show to
view your dbt Cloud configuration or dbt compile to compile your project
and validate models and tests. You can also add, edit, and synchronize
files with your repo.
The following instructions explain how to update the dbt Cloud CLI to the latest
version depending on your operating system.
During the public preview period, we recommend updating before filing a bug
report. This is because the API is subject to breaking changes.
macOS (brew)
Windows (executable)
Linux (executable)
Existing dbt Core users (pip)
To update the dbt Cloud CLI, run brew update and then brew upgrade dbt.
Using VS Code extensions
Visual Studio (VS) Code extensions enhance command line tools by adding extra functionalities. The dbt Cloud CLI is fully compatible with dbt Core; however, it doesn't support some dbt Core APIs required by certain tools, for example, VS Code extensions.
You can use extensions like dbt-power-user with the dbt Cloud CLI by following
these steps:
This setup allows dbt-power-user to continue to work with dbt Core in the
background, alongside the dbt Cloud CLI. For more, check the dbt Power
User documentation.
FAQs
What's the difference between the dbt Cloud CLI and dbt Core?
How do I run both the dbt Cloud CLI and dbt Core?
How to create an alias?
Why am I receiving a `Session occupied` error?
The dbt Cloud CLI is currently in public preview. Share feedback or request
features you'd like to see on the dbt community Slack.
Prerequisites
Once you install the dbt Cloud CLI, you need to configure it to connect to a dbt
Cloud project.
3. Follow the banner instructions and download the config file to:
o Mac or Linux: ~/.dbt/dbt_cloud.yml
o Windows: C:\Users\yourusername\.dbt\dbt_cloud.yml
The config file includes an entry for each project, for example:
- project-id: "<project-id>"
  account-host: "<account-host>"
  api-key: "<user-api-key>"
In your dbt_project.yml file, add a dbt-cloud section with your project ID:
dbt-cloud:
  project-id: PROJECT_ID
o To find your project ID, select Develop in the dbt Cloud navigation
menu. You can use the URL to find the project ID. For example,
in https://cloud.getdbt.com/develop/26228/projects/123456, the
project ID is 123456.
6. You should now be able to use the dbt Cloud CLI and run dbt
commands like dbt environment show to view your dbt Cloud
configuration details or dbt compile to compile models in your dbt
project.
With your repo recloned, you can add, edit, and sync files with your repo.
Set environment variables
To set environment variables in the dbt Cloud CLI for your dbt project:
The dbt Cloud CLI uses the same set of dbt commands and MetricFlow
commands as dbt Core to execute the commands you provide. For
example, use the dbt environment command to view your dbt Cloud
configuration details.
It allows you to automatically defer build artifacts to your Cloud project's
production environment.
It also supports project dependencies, which allows you to depend on
another project using the metadata service in dbt Cloud.
o Project dependencies instantly connect to and reference (or ref)
public models defined in other projects. You don't need to execute
or analyze these upstream models yourself. Instead, you treat them
as an API that returns a dataset.
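For illustration, once an upstream project is declared as a dependency, a downstream model can reference one of its public models with a two-argument ref (the project and model names here are hypothetical):
select *
from {{ ref('upstream_project', 'fct_orders') }}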
These features create a powerful editing environment for efficient SQL coding,
suitable for both experienced and beginner developers.
The dbt Cloud IDE includes version control, files/folders, an editor, a command/console, and more.
Enable dark mode for a great viewing experience in low-light environments.
DISABLE AD BLOCKERS
To improve your experience using dbt Cloud, we suggest that you turn off ad
blockers. This is because some project file names, such as google_adwords.sql,
might resemble ad traffic and trigger ad blockers.
Prerequisites
The dbt Cloud IDE comes with features that make it easier for you to develop,
build, compile, run, and test data models.
To understand how to navigate the IDE and its user interface elements, refer to
the IDE user interface page.
Keyboard shortcuts — You can access a variety of commands and actions in the IDE by choosing the appropriate keyboard shortcut. Use the shortcuts for common tasks like building modified models or resuming builds from the last failure.
File state indicators — See when changes or actions have been made to a file. The indicators M, D, A, and • appear to the right of your file or folder name and indicate the actions performed.
IDE version control — The IDE version control section and git button let you apply the concept of version control to your project directly in the IDE.
Project documentation — Generate and view your project documentation for your dbt project in real time. You can inspect and verify what your project's documentation will look like before you deploy your changes to production.
Preview and Compile button — You can compile or preview code, a snippet of dbt code, or one of your dbt models after editing and saving.
Build, test, and run button — Build, test, and run your project with a button click or by using the command bar.
Command bar — You can enter and run commands from the command bar at the bottom of the IDE. Use the rich model selection syntax to execute dbt commands directly within dbt Cloud. You can also view the history, status, and logs of previous runs by clicking History on the left of the bar.
Drag and drop — Drag and drop files located in the file explorer, and use the file breadcrumb on the top of the IDE for quick, linear navigation. Access adjacent files in the same folder by right-clicking on the breadcrumb file.
Organize tabs and files — Move your tabs around to reorganize your work in the IDE, right-click a tab to view and select a list of actions (including duplicating files), close multiple unsaved tabs to batch save your work, and double-click files to rename them.
Multiple selections — You can make multiple selections for small and simultaneous edits, with commands that let you insert cursors below or above with ease.
Lint and Format — Lint and format your files with a click of a button, powered by SQLFluff and formatters like sqlfmt, Prettier, and Black.
Git diff view — See what has changed in a file before you make a pull request.
DAG in the IDE — See how models are used as building blocks from left to right to transform your data from raw sources into cleaned-up modular derived pieces and final outputs on the far right of the DAG. The default view is 2+model+2 (it displays 2 nodes away), however, you can change it to +model+ (full DAG). Note the --exclude flag isn't supported.
Status bar — This area provides useful information about your IDE and project status. You also have additional options like enabling light or dark mode, restarting the IDE, or recloning your repo.
Dark mode — From the status bar in the Cloud IDE, enable dark mode for a great viewing experience in low-light environments.
Start-up process
There are three start-up states when using or launching the Cloud IDE:
Creation start — This is the state where you are starting the IDE for the
first time. You can also view this as a cold start (see below), and you can
expect this state to take longer because the git repository is being cloned.
Cold start — This is the process of starting a new develop session, which
will be available for you for three hours. The environment automatically
turns off three hours after the last activity. This includes compile, preview,
or any dbt invocation, however, it does not include editing and saving a
file.
Hot start — This is the state of resuming an existing or active develop
session within three hours of the last activity.
Work retention
The Cloud IDE needs explicit action to save your changes. There are three ways
your work is stored:
Unsaved, local code — The browser stores your code only in its local
storage. In this state, you might need to commit any unsaved changes in
order to switch branches or browsers. If you have saved and committed
changes, you can access the "Change branch" option even if there are
unsaved changes. But if you attempt to switch branches without saving
changes, a warning message will appear, notifying you that you will lose
any unsaved changes.
If you attempt to switch branches without saving changes, a warning message will appear,
telling you that you will lose your changes.
Saved but uncommitted code — When you save a file, the data gets
stored in durable, long-term storage, but isn't synced back to git. To
switch branches using the Change branch option, you must "Commit and
sync" or "Revert" changes. Changing branches isn't available for saved-
but-uncommitted code. This is to ensure your uncommitted changes
don't get lost.
Committed code — This is stored in the branch with your git provider and
you can check out other (remote) branches.
In order to start experiencing the great features of the Cloud IDE, you need to
first set up a dbt Cloud development environment. In the following steps, we
outline how to set up developer credentials and access the IDE. If you're
creating a new project, you will automatically configure this during the project
setup.
The IDE uses developer credentials to connect to your data platform. These
developer credentials should be specific to your user and they should not be
super user credentials or the same credentials that you use for your production
deployment of dbt.
1. Navigate to your Credentials under Your Profile settings, which you can
access at https://YOUR_ACCESS_URL/settings/profile#credentials,
replacing YOUR_ACCESS_URL with the appropriate Access URL for your
region and plan.
2. Select the relevant project in the list.
3. Click Edit on the bottom right of the page.
4. Enter the details under Development Credentials.
5. Click Save.
Configure developer credentials in your Profile
6. Access the Cloud IDE by clicking Develop at the top of the page.
7. Initialize your project and familiarize yourself with the IDE and its
delightful features.
You can build, compile, run, and test dbt projects using the command bar
or Build button. Use the Build button to quickly build, run, or test the model
you're working on. The Cloud IDE will update in real-time when you run models,
tests, seeds, and operations.
If a model or test fails, dbt Cloud makes it easy for you to view and download
the run logs for your dbt invocations to fix the issue.
Use dbt's rich model selection syntax to run dbt commands directly within dbt
Cloud.
Preview, compile, or build your dbt project. Use the lineage tab to see your DAG.
The dbt Cloud IDE makes it possible to build and view documentation for your
dbt project while your code is still in development. With this workflow, you can
inspect and verify what your project's generated documentation will look like
before your changes are released to production.
Related docs
Related questions
How can I fix my .gitignore file?
A .gitignore file specifies which files git should intentionally ignore or 'untrack'.
dbt Cloud indicates untracked files in the project file explorer pane by putting
the file or folder name in italics.
Note that adding the correct entries to your gitignore file won't automatically remove (or 'untrack') files or folders that git is already tracking. The updated gitignore only prevents new files or folders from being tracked. So you'll need to first fix the gitignore file, then perform some additional git operations to untrack any incorrect files or folders.
1. Launch the Cloud IDE into the project that is being fixed, by
selecting Develop on the menu bar.
2. In your File Explorer, check to see if a .gitignore file exists at the root of
your dbt project folder. If it doesn't exist, create a new file.
3. Open the new or existing gitignore file, and add the following:
# ✅ Correct
target/
dbt_packages/
logs/
# legacy -- renamed to dbt_packages in dbt v1
dbt_modules/
Note — You can place these lines anywhere in the file, as long as they're
on separate lines. The lines shown are wildcards that will include all
nested files and folders. Avoid adding a trailing '*' to the lines, such
as target/*.
6. Once the IDE restarts, go to the File Explorer to delete the following files
or folders (if they exist). No data will be lost:
9. Once the IDE restarts, use the Create a pull request (PR) button under
the Version Control menu to start the process of integrating the changes.
10. When the git provider's website opens to a page with the new PR, follow
the necessary steps to complete and merge the PR into the main branch
of that repository.
o Note — The 'main' branch might also be called 'master', 'dev', 'qa',
'prod', or something else depending on the organizational naming
conventions. The goal is to merge these changes into the root
branch that all other development branches are created from.
11. Return to the dbt Cloud IDE and use the Change Branch button to switch
to the main branch of the project.
12. Once the branch has changed, click the Pull from remote button to pull
in all the changes.
13. Verify the changes by making sure the files/folders in the .gitignore file
are in italics.
A dbt project on the main branch that has properly configured gitignore folders (highlighted in
italics).
There are two options for this approach: editing the main branch directly if
allowed, or creating a pull request to implement the changes if required:
When permissions allow it, it's possible to edit the `.gitignore` directly on the main branch of your repo.
For more info, refer to this detailed video for additional guidance.
Is there a cost to using the Cloud IDE?
Not at all! You can use dbt Cloud when you sign up for the Free Developer plan,
which comes with one developer seat. If you’d like to access more features or
have more developer seats, you can upgrade your account to the Team or
Enterprise plan.
dbt Cloud IDE: dbt Cloud is a web-based application that allows you to
develop dbt projects with the IDE, includes a purpose-built scheduler,
and provides an easier way to share your dbt documentation with your
team. The IDE is a faster and more reliable way to deploy your dbt models
and provides a real-time editing and execution environment for your dbt
project.
dbt Cloud CLI: The dbt Cloud CLI allows you to run dbt commands against
your dbt Cloud development environment from your local command line
or code editor. It supports cross-project ref, speedier, lower-cost builds,
automatic deferral of build artifacts, and more.
dbt Core: dbt Core is open-source software that's freely available. You can build your dbt project in a code editor and run dbt commands from the command line.
This page offers comprehensive definitions and terminology of user interface elements,
allowing you to navigate the IDE landscape with ease.
The Cloud IDE layout includes version control on the upper left, files/folders on the left, editor on the right, and command/console at the bottom.
Basic layout
The IDE streamlines your workflow, and features a popular user interface layout with files
and folders on the left, editor on the right, and command and console information at the
bottom.
The Git repo link, documentation site
button, Version Control menu, and File Explorer
1. Git repository link — Clicking the Git repository link, located on the upper left of
the IDE, takes you to your repository on the same active branch.
o Note: This feature is only available for GitHub or GitLab repositories on multi-
tenant dbt Cloud accounts.
2. Documentation site button — Clicking the Documentation site book icon, located
next to the Git repository link, leads to the dbt Documentation site. The site is
powered by the latest dbt artifacts generated in the IDE using the dbt docs
generate command from the Command bar.
3. Version Control — The IDE's powerful Version Control section contains all git-
related elements, including the Git actions button and the Changes section.
4. File Explorer — The File Explorer shows the filetree of your repository. You can:
o Click on any file in the filetree to open the file in the File Editor.
o Click and drag files between directories to move files.
o Right-click a file to access the sub-menu options like duplicate file, copy file name,
copy as ref, rename, delete.
o Note: To perform these actions, the user must not be in read-only mode, which
generally happens when the user is viewing the default branch.
o Use file indicators, located to the right of your files or folder name, to see when
changes or actions were made:
Unsaved (•) — The IDE detects unsaved changes to your file/folder
Modification (M) — The IDE detects a modification of existing files/folders
Added (A) — The IDE detects added files
Deleted (D) — The IDE detects deleted files.
Use the Command bar to write dbt commands, toggle 'Defer', and view the current IDE status
5. Command bar — The Command bar, located in the lower left of the IDE, is used to
invoke dbt commands. When a command is invoked, the associated logs are
shown in the Invocation History Drawer.
6. Defer to production — The Defer to production toggle allows developers to only
build and run and test models they've edited without having to first run and build
all the models that come before them (upstream parents). Refer to Using defer in
dbt Cloud for more info.
7. Status button — The IDE Status button, located on the lower right of the IDE,
displays the current IDE status. If there is an error in the status or in the dbt code
that stops the project from parsing, the button will turn red and display "Error". If
there aren't any errors, the button will display a green "Ready" status. To access
the IDE Status modal, simply click on this button.
Editing features
The IDE features some delightful tools and layouts to make it easier for you to write dbt
code and collaborate with teammates.
Use the file editor, version control section, and save button during your development workflow
1. File Editor — The File Editor is where users edit code. Tabs break out the region for
each opened file, and unsaved files are marked with a blue dot icon in the tab view.
o Use intuitive keyboard shortcuts to make development easier for you and your team.
2. Save button — The editor has a Save button that saves editable files. Pressing the
button or using the Command-S or Control-S shortcut saves the file contents. You
don't need to save to preview code results in the Console section, but it's
necessary before changes appear in a dbt invocation. The File Editor tab shows a
blue icon for unsaved changes.
3. Version Control — This menu contains all git-related elements, including the Git
actions button. The button updates relevant actions based on your editor's state,
such as prompting to pull remote changes, commit and sync when reverted
commit changes are present, or creating a merge/pull request when appropriate.
o The dropdown menu on the Git actions button allows users to revert changes,
refresh Git state, create merge/pull requests, and change branches.
Keep in mind that although you can't delete local branches in the IDE using
this menu, you can reclone your repository, which deletes your local
branches and refreshes with the current remote branches, effectively
removing the deleted ones.
o You can also resolve merge conflicts and for more info on git, refer to Version
control basics.
o Version Control Options menu — The Changes section, under the Git actions
button, lists all file changes since the last commit. You can click on a change to
open the Git Diff View to see the inline changes. You can also right-click any file and
use the file-specific options in the Version Control Options menu.
dbt Editor Command Palette — The dbt Editor Command Palette displays text
editing actions and their associated keyboard shortcuts. This can be accessed by
pressing F1 or right-clicking in the text editing area and selecting Command
Palette.
Click F1 to access the dbt Editor Command Palette menu for editor shortcuts
Git Diff View — Clicking on a file in the Changes section of the Version Control
Menu will open the changed file with Git Diff view. The editor will show the
previous version on the left and the in-line changes made on the right.
The Git Diff View displays the previous version on the left and the changes made on the
right of the Editor
Markdown Preview console tab — The Markdown Preview console tab shows a
preview of your .md file's markdown code in your repository and updates it
automatically as you edit your code.
The Markdown Preview console tab renders markdown code below the Editor tab.
CSV Preview console tab — The CSV Preview console tab displays the data from
your CSV file in a table, which updates automatically as you edit the file in your
seed directory.
View csv code in the CSV Preview console tab below the Editor tab.
Console section
The console section, located below the File editor, includes various console tabs and
buttons to help you with tasks such as previewing, compiling, building, and viewing
the DAG. Refer to the following sub-bullets for more details on the console tabs and
buttons.
The Console section is located below the File editor and has various tabs and buttons to help
execute tasks
1. Preview button — When you click on the Preview button, it runs the SQL in the active file
editor regardless of whether you have saved it or not and sends the results to
the Results console tab. You can preview a selected portion of saved or unsaved code by
highlighting it and then clicking the Preview button.
2. Compile button — The Compile button compiles the saved or unsaved SQL code and
displays it in the Compiled Code tab.
Starting from dbt v1.6 or higher, when you save changes to a model, you can compile its code with the model's specific context. This context is similar to what you'd have when building the model and involves useful context variables like {{ this }} or {{ is_incremental() }} (see the sketch after this list).
3. Build button — The build button allows users to quickly access dbt commands related to the active model in the File Editor. The available commands include dbt build, dbt test, and dbt run, with options to include only the current resource, the resource and its upstream dependencies, the resource and its downstream dependencies, or the resource with all dependencies. This menu is available for all executable nodes.
4. Format button — The editor has a Format button that can reformat the contents
of your files. For SQL files, it uses either sqlfmt or sqlfluff, and for Python files, it
uses black.
5. Results tab — The Results console tab displays the most recent Preview results in
tabular format.
6. Compiled Code tab — The Compile button triggers a compile invocation that
generates compiled code, which is displayed in the Compiled Code tab.
Compile results show up in the Compiled Code tab
7. Lineage tab — The Lineage tab in the File Editor displays the active model's
lineage or DAG. By default, it shows two degrees of lineage in both directions
(2+model_name+2), however, you can change it to +model+ (full DAG).
o Double-click a node in the DAG to open that file in a new tab
o Expand or shrink the DAG using node selection syntax.
o Note, the --exclude flag isn't supported.
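As a sketch of where the {{ this }} and {{ is_incremental() }} context variables mentioned under the Compile button typically appear, consider an incremental model (the model and column names are illustrative):
models/orders_incremental.sql
{{ config(materialized='incremental', unique_key='order_id') }}

select
  order_id,
  status,
  updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
-- on incremental runs, only process rows newer than what's already in this table
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}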
The Invocation History Drawer stores information on dbt invocations in the IDE. When you
invoke a command, like executing a dbt command such as dbt run, the associated logs
are displayed in the Invocation History Drawer.
You can open the Invocation History Drawer in multiple ways:
Clicking the ^ icon next to the Command bar on the lower left of the page
Typing a dbt command and pressing enter
Pressing Control-backtick (or Ctrl + `)
The Invocation History Drawer returns a log and detail of all your dbt Cloud invocations.
1. Invocation History list — The left-hand panel of the Invocation History Drawer
displays a list of previous invocations in the IDE, including the command, branch
name, command status, and elapsed time.
2. Invocation Summary — The Invocation Summary, located above System Logs,
displays information about a selected command from the Invocation History list,
such as the command, its status (Running if it's still running), the git branch that
was active during the command, and the time the command was invoked.
3. System Logs toggle — The System Logs toggle, located under the Invocation
Summary, allows the user to see the full stdout and debug logs for the entirety of
the invoked command.
4. Command Control button — Use the Command Control button, located on the
right side, to control your invocation and cancel or rerun a selected run.
The Invocation History list displays a list of previous invocations in the IDE
5. Node Summary tab — Clicking on the Results Status Tabs will filter the Node
Status List based on their corresponding status. The available statuses are Pass
(successful invocation of a node), Warn (test executed with a warning), Error
(database error or test failure), Skip (nodes not run due to upstream error), and
Queued (nodes that have not executed yet).
6. Node result toggle — After running a dbt command, information about each
executed node can be found in a Node Result toggle, which includes a summary
and debug logs. The Node Results List lists every node that was invoked during the
command.
7. Node result list — The Node result list shows all the Node Results used in the dbt
run, and you can filter it by clicking on a Result Status tab.
Use menus and modals to interact with IDE and access useful options to help your
development workflow.
Editor tab menu — To interact with open editor tabs, right-click any tab to access
the helpful options in the file tab menu.
File Search — You can easily search for and navigate between files using the File Navigation menu, which can be accessed by pressing Command-O or Control-O.
IDE Status modal — The IDE Status modal shows the current error message and
debug logs for the server. This also contains an option to restart the IDE. Open this
by clicking on the IDE Status button.
Commit Changes modal — The Commit Changes modal is accessible via the Git
Actions button to commit all changes or via the Version Control Options menu to
commit individual changes. Once you enter a commit message, you can use the
modal to commit and sync the selected changes.
The Commit Changes modal is how users commit changes to their branch.
Change Branch modal — The Change Branch modal allows users to switch git
branches in the IDE. It can be accessed through the Change Branch link or the Git
Actions button in the Version Control menu.
The Change Branch modal is how users change their branch.
IDE Options menu — The IDE Options menu can be accessed by clicking on the
three-dot menu located at the bottom right corner of the IDE. This menu contains
global options such as:
o Toggling between dark or light mode for a better viewing experience
o Restarting the IDE
o Fully recloning your repository to refresh your git state and view status details
o Viewing status details, including the IDE Status modal.
Access the IDE Options menu to switch to dark or light mode, restart the IDE, reclone your repo, or
view the IDE status
In the dbt Cloud IDE, you can perform linting, auto-fix, and formatting on five
different file types:
SQL — Lint and fix with SQLFluff, and format with sqlfmt
YAML, Markdown, and JSON — Format with Prettier
Python — Format with Black
Each file type has its own unique linting and formatting rules. You
can customize the linting process to add more flexibility and enhance problem
and style detection.
By default, the IDE uses sqlfmt rules to format your code, making it convenient
to use right away. However, if you have a file named .sqlfluff in the root
directory of your dbt project, the IDE will default to SQLFluff rules instead.
Use SQLFluff to lint/format your SQL code, and view code errors in the Code Quality tab.
Lint
With the dbt Cloud IDE, you can seamlessly use SQLFluff, a configurable SQL
linter, to warn you of complex functions, syntax, formatting, and compilation
errors. This integration allows you to run checks, fix, and display any code errors
directly within the Cloud IDE:
Linting doesn't support ephemeral models in dbt v1.5 and lower. Refer to
the FAQs for more info.
Enable linting
Use the Lint or Fix button in the console section to lint or auto-fix your code.
Customize linting
SQLFluff is a configurable SQL linter, which means you can configure your own
linting rules instead of using the default linting settings in the IDE. You can
exclude files and directories by using a standard .sqlfluffignore file. Learn
more about the syntax in the .sqlfluffignore syntax docs.
1. Create a new file in the root project directory (the parent or top-level
directory for your files). Note: The root project directory is the directory
where your dbt_project.yml file resides.
2. Name the file .sqlfluff (make sure you add the . before sqlfluff).
3. Create and add your custom config code.
4. Save and commit your changes.
5. Restart the IDE.
6. Test it out and happy linting!
Refer to the SQLFluff config file to add the dbt code (or dbtonic) rules we use for
our own projects:
For more info on styling best practices, refer to How we style our SQL.
Customize linting by configuring your own linting code rules, including dbtonic linting/styling.
Format
In the dbt Cloud IDE, you can format your code to match style guides with a click
of a button. The IDE integrates with formatters like sqlfmt, Prettier, and Black to
automatically format code on five different file types — SQL, YAML, Markdown,
Python, and JSON:
SQL — Format with sqlfmt, which provides one way to format your dbt
SQL and Jinja.
YAML, Markdown, and JSON — Format with Prettier.
Python — Format with Black.
The Cloud IDE formatting integrations take care of manual tasks like code
formatting, enabling you to focus on creating quality data models,
collaborating, and driving impactful results.
Format SQL
To format your SQL code, dbt Cloud integrates with sqlfmt, which is an
uncompromising SQL query formatter that provides one way to format the SQL
query and Jinja.
By default, the IDE uses sqlfmt rules to format your code, making
the Format button available and convenient to use immediately. However, if
you have a file named .sqlfluff in the root directory of your dbt project, the IDE
will default to SQLFluff rules instead.
To enable sqlfmt:
You can add a configuration file to customize formatting rules for YAML,
Markdown, or JSON files using Prettier. The IDE looks for the configuration file
based on an order of precedence. For example, it first checks for a "prettier" key
in your package.json file.
For more info on the order of precedence and how to configure files, refer
to Prettier's documentation. Please note, .prettierrc.json5, .prettierrc.js,
and .prettierrc.toml files aren't currently supported.
Format Python
To format your Python code, dbt Cloud integrates with Black, which is an
uncompromising Python code formatter.
models — Each model lives in a single file and contains logic that either transforms raw data into a dataset that is ready for analytics or, more often, is an intermediate step in such a transformation.
snapshots — A way to capture the state of your mutable tables so you can refer to it later.
seeds — CSV files with static data that you can load into your data platform with dbt.
data tests — SQL queries that you can write to test the models and resources in your project.
sources — A way to name and describe the data loaded into your warehouse by your Extract and Load tools.
analysis — A way to organize analytical SQL queries in your project, such as the general ledger from your QuickBooks.
When building out the structure of your project, you should consider these
impacts on your organization's workflow:
require-dbt-version — Restrict your project to only work with a range of dbt Core versions
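As a minimal sketch, these resource paths and the dbt version restriction are declared in your dbt_project.yml (the values shown are illustrative):
dbt_project.yml
name: 'my_project'
version: '1.0.0'
require-dbt-version: [">=1.5.0", "<2.0.0"]

model-paths: ["models"]
seed-paths: ["seeds"]
snapshot-paths: ["snapshots"]
test-paths: ["tests"]
analysis-paths: ["analyses"]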
Project subdirectories
You can use the Project subdirectory option in dbt Cloud to specify a
subdirectory in your git repository that dbt should use as the root directory for
your project. This is helpful when you have multiple dbt projects in one
repository or when you want to organize your dbt project files into
subdirectories for easier management.
To use the Project subdirectory option in dbt Cloud, follow these steps:
1. Click on the cog icon on the upper right side of the page and click
on Account Settings.
2. Under Projects, select the project you want to configure as a project
subdirectory.
3. Select Edit on the lower right-hand corner of the page.
4. In the Project subdirectory field, add the name of the subdirectory. For
example, if your dbt project files are located in a subdirectory
called <repository>/finance, you would enter finance as the subdirectory.
o You can also reference nested subdirectories. For example, if your
dbt project files are located in <repository>/teams/finance, you
would enter teams/finance as the subdirectory. Note: You do not
need a leading or trailing / in the Project subdirectory field.
After configuring the Project subdirectory option, dbt Cloud will use it as the
root directory for your dbt project. This means that dbt commands, such as dbt
run or dbt test, will operate on files within the specified subdirectory. If there is
no dbt_project.yml file in the Project subdirectory, you will be prompted to
initialize the dbt project.
New projects
You can create new projects and share them with other people by making them
available on a hosted git repository like GitHub, GitLab, and BitBucket.
After you set up a connection with your data platform, you can initialize your
new project in dbt Cloud and start developing. Or, run dbt init from the
command line to set up your new project.
During project initialization, dbt creates sample model files in your project
directory to help you start developing quickly.
Sample projects
If you want to explore dbt projects more in-depth, you can clone dbt Labs' Jaffle Shop on GitHub. It's a runnable project that contains sample configurations and helpful notes.
If you want to see what a mature, production project looks like, check out
the GitLab Data Team public repo.
Models are where your developers spend most of their time within a dbt
environment. Models are primarily written as a select statement and saved as
a .sql file. While the definition is straightforward, the complexity of the
execution will vary from environment to environment. Models will be written
and rewritten as needs evolve and your organization finds new ways to
maximize efficiency.
SQL is the language most dbt users will utilize, but it is not the only one for
building models. Starting in version 1.3, dbt Core and dbt Cloud support Python
models. Python models are useful for training or deploying data science models,
complex transformations, or where a specific Python package meets a
need — such as using the dateutil library to parse dates.
Your organization may need only a few models, but more likely you’ll need a
complex structure of nested models to transform the required data. A model is a single file containing a final select statement, a project can have multiple models, and models can even reference each other. Add numerous projects on top of that, and the effort required to transform complex data sets can drop drastically compared to older methods.
Learn more about models in SQL models and Python models pages. If you'd like
to begin with a bit of practice, visit our Getting Started Guide for instructions on
setting up the Jaffle_Shop sample data so you can get hands-on with the power
of dbt.
Snapshot configurations
Snapshot properties
snapshot command
Analysts often need to "look back in time" at previous data states in their
mutable tables. While some source data systems are built in a way that makes
accessing historical data possible, this is not always the case. dbt provides a
mechanism, snapshots, which records changes to a mutable table over time.
id | status | updated_at
1 | pending | 2019-01-01
Now, imagine that the order goes from "pending" to "shipped". That same record will now look like:
id | status | updated_at
1 | shipped | 2019-01-02
This order is now in the "shipped" state, but we've lost the information about
when the order was last in the "pending" state. This makes it difficult (or
impossible) to analyze how long it took for an order to ship. dbt can "snapshot"
these changes to help you understand how values in a row change over time.
Here's an example of a snapshot table for the previous example:
{% snapshot orders_snapshot %}
{{
config(
target_database='analytics',
target_schema='snapshots',
unique_key='id',
strategy='timestamp',
updated_at='updated_at',
)
}}
{% endsnapshot %}
It is not possible to "preview data" or "compile sql" for snapshots in dbt Cloud.
Instead, run the dbt snapshot command in the IDE by completing the following
steps.
On the first run: dbt will create the initial snapshot table — this will be
the result set of your select statement, with additional columns
including dbt_valid_from and dbt_valid_to. All records will have
a dbt_valid_to = null.
On subsequent runs: dbt will check which records have changed or if any
new records have been created:
o The dbt_valid_to column will be updated for any existing records
that have changed
o The updated record and any new records will be inserted into the
snapshot table. These records will now have dbt_valid_to = null
Snapshots can be referenced in downstream models the same way as
referencing models — by using the ref function.
Example
1. Create a file in your snapshots directory with a .sql file extension, e.g. snapshots/orders_snapshot.sql.
2. Use a snapshot block to define the start and end of a snapshot:
snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}

{% endsnapshot %}
3. Write a select statement within the snapshot block (tips for writing a
good snapshot query are below). This select statement defines the results
that you want to snapshot over time. You can use sources and refs here.
snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}

-- the source name below is illustrative; select whatever you want to snapshot
select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
4. Check whether the result set of your query includes a reliable timestamp
column that indicates when a record was last updated. For our example,
the updated_at column reliably indicates record changes, so we can use
the timestamp strategy. If your query result set does not have a reliable
timestamp, you'll need to instead use the check strategy — more details
on this below.
5. Add configurations to your snapshot using a config block (more details
below). You can also configure your snapshot from
your dbt_project.yml file (docs).
snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}
{{
config(
target_database='analytics',
target_schema='snapshots',
unique_key='id',
strategy='timestamp',
updated_at='updated_at',
)
}}
{% endsnapshot %}
6. Run the dbt snapshot command — for our example, a new table will be created at analytics.snapshots.orders_snapshot. Changing the target_database configuration, the target_schema configuration, or the name of the snapshot (as defined in {% snapshot ... %}) will change how dbt names this table.
$ dbt snapshot
Running with dbt=0.16.0
Completed successfully
7. Inspect the results by selecting from the table dbt created. After the first
run, you should see the results of your query, plus the snapshot meta
fields as described below.
8. Run the snapshot command again, and inspect the results. If any records
have been updated, the snapshot should reflect this.
9. Select from the snapshot in downstream models using the ref function.
models/changed_orders.sql
select * from {{ ref('orders_snapshot') }}
Snapshot "strategies" define how dbt knows if a row has changed. There are
two strategies built-in to dbt — timestamp and check.
Timestamp strategy (recommended)
The timestamp strategy uses an updated_at field to determine if a row has changed. It requires the following config:
updated_at — A column which represents when the source row was last updated (for example, an updated_at column).
Example usage:
snapshots/orders_snapshot_timestamp.sql
{% snapshot orders_snapshot_timestamp %}
{{
config(
target_schema='snapshots',
strategy='timestamp',
unique_key='id',
updated_at='updated_at',
)
}}
{% endsnapshot %}
Check strategy
Example Usage
snapshots/orders_snapshot_check.sql
{% snapshot orders_snapshot_check %}
{{
config(
target_schema='snapshots',
strategy='check',
unique_key='id',
check_cols=['status', 'is_cancelled'],
)
}}
{% endsnapshot %}
Rows that are deleted from the source query are not invalidated by default. With
the config option invalidate_hard_deletes, dbt can track rows that no longer
exist. This is done by left joining the snapshot table with the source table, and
filtering the rows that are still valid at that point, but no longer can be found in
the source table. dbt_valid_to will be set to the current snapshot time.
Example Usage
snapshots/orders_snapshot_hard_delete.sql
{% snapshot orders_snapshot_hard_delete %}
{{
config(
target_schema='snapshots',
strategy='timestamp',
unique_key='id',
updated_at='updated_at',
invalidate_hard_deletes=True,
)
}}
{% endsnapshot %}
Configuring snapshots
Snapshot configurations
A number of other configurations are also supported (e.g. tags and post-hook),
check out the full list here.
The unique key is used by dbt to match rows up, so it's extremely important to make sure this key is actually unique! If you're snapshotting a source, we'd recommend adding a uniqueness test to your source, as sketched below.
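A minimal sketch of such a source test, assuming a source named jaffle_shop with an orders table (the source, table, and file names are illustrative):
models/staging/sources.yml
version: 2
sources:
  - name: jaffle_shop
    tables:
      - name: orders
        columns:
          - name: id
            tests:
              - unique
              - not_null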
Your models should then select from these snapshots, treating them like regular
data sources. As much as possible, snapshot your source data in its raw form
and use downstream models to clean up the data.
If you apply business logic in a snapshot query, and this logic changes in the
future, it can be impossible (or, at least, very difficult) to apply the change in
logic to your snapshots.
Snapshot meta-fields
Snapshot tables will be created as a clone of your source dataset, plus some additional meta-fields*:
dbt_valid_from — The timestamp when this snapshot row was first inserted. This column can be used to order the different "versions" of a record.
dbt_valid_to — The timestamp when this row became invalidated. The most recent snapshot record will have dbt_valid_to set to null.
*The timestamps used for each column are subtly different depending on the strategy you use:
For the timestamp strategy, the configured updated_at column is used to populate each column.
For the check strategy, the current timestamp is used to populate each column. If an updated_at column is configured, the check strategy uses it instead, as with the timestamp strategy.
FAQs
How do I run one snapshot at a time?
How often should I run the snapshot command?
What happens if I add new columns to my snapshot query?
Do hooks run with snapshots?
Why is there only one `target_schema` for snapshots?
Can I store my snapshots in a directory other than the `snapshot` directory in my project?
By default, dbt expects your snapshot files to be located in the snapshots subdirectory of your project. To change this, update the snapshot-paths configuration in your dbt_project.yml file:
snapshot-paths: ["snapshots"]
Note that you cannot co-locate snapshots and models in the same directory.
Debug "Snapshot target is not a snapshot table" errors
Test command
Data test properties
Data test configurations
Test selection examples
Overview
Data tests are assertions you make about your models and other resources in your dbt
project (e.g. sources, seeds and snapshots). When you run dbt test, dbt will tell you if
each test in your project passes or fails.
You can use data tests to improve the integrity of the SQL in each model by making
assertions about the results generated. Out of the box, you can test whether a specified
column in a model only contains non-null values, unique values, or values that have a
corresponding value in another model (for example, a customer_id for
an order corresponds to an id in the customers model), and values from a specified list.
You can extend data tests to suit business logic specific to your organization – any
assertion that you can make about your model in the form of a select query can be turned
into a data test.
Data tests return a set of failing records. Generic data tests (f.k.a. schema tests) are
defined using test blocks.
Like almost everything in dbt, data tests are SQL queries. In particular, they
are select statements that seek to grab "failing" records, ones that disprove your
assertion. If you assert that a column is unique in a model, the test query selects for
duplicates; if you assert that a column is never null, the test seeks after nulls. If the data
test returns zero failing rows, it passes, and your assertion has been validated.
There are two ways of defining data tests in dbt:
A singular data test is testing in its simplest form: If you can write a SQL query that returns
failing rows, you can save that query in a .sql file within your test directory. It's now a
data test, and it will be executed by the dbt test command.
A generic data test is a parameterized query that accepts arguments. The test query is
defined in a special test block (like a macro). Once defined, you can reference the generic
test by name throughout your .yml files—define it on models, columns, sources,
snapshots, and seeds. dbt ships with four generic data tests built in, and we think you
should use them!
Defining data tests is a great way to confirm that your outputs and inputs are as expected,
and helps prevent regressions when your code changes. Because you can use them over
and over again, making similar assertions with minor variations, generic data tests tend to
be much more common—they should make up the bulk of your dbt data testing suite.
That said, both ways of defining data tests have their time and place.
If you're new to dbt, we recommend that you check out our quickstart guide to build your
first dbt project with models and tests.
Singular data tests
The simplest way to define a data test is by writing the exact SQL that will return failing
records. We call these "singular" data tests, because they're one-off assertions usable for a
single purpose.
These tests are defined in .sql files, typically in your tests directory (as defined by
your test-paths config). You can use Jinja (including ref and source) in the test
definition, just like you can when creating models. Each .sql file contains
one select statement, and it defines one data test:
tests/assert_total_payment_amount_is_positive.sql
-- Refunds have a negative amount, so the total amount should always be >= 0.
-- Therefore return records where this isn't true to make the test fail
select
order_id,
sum(amount) as total_amount
from {{ ref('fct_payments' )}}
group by 1
having not(total_amount >= 0)
Certain data tests are generic: they can be reused over and over again. A generic data test is defined in a test block, which contains a parametrized query and accepts arguments. It might look like:
{% test not_null(model, column_name) %}

select *
from {{ model }}
where {{ column_name }} is null

{% endtest %}
You'll notice that there are two arguments, model and column_name, which are then
templated into the query. This is what makes the test "generic": it can be defined on as
many columns as you like, across as many models as you like, and dbt will pass the values
of model and column_name accordingly. Once that generic test has been defined, it can be
added as a property on any existing model (or source, seed, or snapshot). These properties
are added in .yml files in the same directory as your resource.
INFO
If this is your first time working with adding properties to a resource, check out the docs
on declaring properties.
Out of the box, dbt ships with four generic data tests already
defined: unique, not_null, accepted_values and relationships. Here's a full example
using those tests on an orders model:
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id
Behind the scenes, dbt constructs a select query for each data test, using the
parametrized query from the generic test block. These queries return the rows where your
assertion is not true; if the test returns zero rows, your assertion passes.
You can find more information about these data tests, and additional configurations
(including severity and tags) in the reference section.
Those four tests are enough to get you started. You'll quickly find you want to use a wider
variety of tests—a good thing! You can also install generic data tests from a package, or
write your own, to use (and reuse) across your dbt project. Check out the guide on custom
generic tests for more information.
INFO
There are generic tests defined in some open source packages, such as dbt-utils and dbt-
expectations — skip ahead to the docs on packages to learn more!
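Installing such a package is quick: add it to your packages.yml and run dbt deps (the version shown is illustrative; check the package's docs for the current release):
packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1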
Example
1. Add a .yml file to your models directory, e.g. models/schema.yml, with the following
content (you may need to adjust the name: values for an existing model)
models/schema.yml
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
2. Run the dbt test command:
$ dbt test
Completed successfully
Unique test
Compiled SQL:
select *
from (
select
order_id
from analytics.orders
where order_id is not null
group by order_id
having count(*) > 1
) validation_errors
Not_null test
Compiled SQL:
select *
from analytics.orders
where order_id is null
Normally, a data test query will calculate failures as part of its execution. If you set the
optional --store-failures flag, the store_failures, or the store_failures_as configs,
dbt will first save the results of a test query to a table in the database, and then query that
table to calculate the number of failures.
This workflow allows you to query and examine failing records much more quickly in
development:
A test's results will always replace previous failures for the same test.
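As a sketch, store_failures can also be set on an individual test in your .yml file (the model and column names are illustrative):
models/schema.yml
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique:
              config:
                store_failures: true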
FAQs
How do I test one model at a time?
Can I store my tests in a directory other than the `tests` directory in my project?
How do I run tests on just my sources?
As of v0.20.0, you can use the error_if and warn_if configs to set custom failure thresholds in your tests. For more details, refer to the reference section.
For dbt v0.19.0 and earlier, you could try these possible solutions:
Consider an orders table that contains records from multiple countries, and the combination of ID and country code is unique:
order_id | country_code
1 | AU
2 | AU
... | ...
1 | US
2 | US
... | ...
1. Create a unique key in the model by concatenating the columns:
select
country_code || '-' || order_id as surrogate_key,
...
models/orders.yml
version: 2
models:
  - name: orders
    columns:
      - name: surrogate_key
        tests:
          - unique
2. Test an expression
models/orders.yml
version: 2
models:
- name: orders
tests:
- unique:
column_name: "(country_code || '-' || order_id)"
3. Use the dbt_utils.unique_combination_of_columns test from a package. This is especially
useful for large datasets since it is more performant. Check out the docs on packages for
more information.
models/orders.yml
version: 2
models:
- name: orders
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- country_code
- order_id
Overview
Using Jinja turns your dbt project into a programming environment for SQL,
giving you the ability to do things that aren't normally possible in SQL alone. For
example, with Jinja you can use control structures (such as if statements and for
loops) in SQL, use environment variables in your dbt project, and change the way
your project builds based on the current target.
In fact, if you've used the {{ ref() }} function, you're already using Jinja!
Jinja can be used in any SQL in a dbt project, including models, analyses, tests,
and even hooks.
Check out the tutorial on using Jinja for a step-by-step example of using Jinja in
a model, and turning it into a macro!
Getting started
Jinja
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}

select
    order_id,
    {% for payment_method in payment_methods %}
    sum(case when payment_method = '{{ payment_method }}' then amount end) as {{ payment_method }}_amount,
    {% endfor %}
    sum(amount) as total_amount
from app_data.payments
group by 1
Compiled SQL
select
    order_id,
    sum(case when payment_method = 'bank_transfer' then amount end) as bank_transfer_amount,
    sum(case when payment_method = 'credit_card' then amount end) as credit_card_amount,
    sum(case when payment_method = 'gift_card' then amount end) as gift_card_amount,
    sum(amount) as total_amount
from app_data.payments
group by 1
You can recognize Jinja based on the delimiters the language uses, which we
refer to as "curlies":
Expressions {{ ... }}: Expressions are used when you want to output a
string. You can use expressions to reference variables and call macros.
Statements {% ... %}: Statements don't output a string. They are used
for control flow, for example, to set up for loops and if statements,
to set or modify variables, or to define macros.
Comments {# ... #}: Jinja comments are used to prevent the text within
the comment from executing or outputting a string.
When used in a dbt model, your Jinja needs to compile to a valid query. To
check what SQL your Jinja compiles to:
Using dbt Cloud: Click the compile button to see the compiled SQL in the
Compiled SQL pane
Using dbt Core: Run dbt compile from the command line. Then open the
compiled SQL file in the target/compiled/{project name}/ directory. Use a
split screen in your code editor to keep both files open at once.
Macros
Macros in Jinja are pieces of code that can be reused multiple times – they are
analogous to "functions" in other programming languages, and are extremely
useful if you find yourself repeating code across multiple models. Macros are
defined in .sql files, typically in your macros directory (docs).
select
id as payment_id,
{{ cents_to_dollars('amount') }} as amount_usd,
...
from app_data.payments
select
id as payment_id,
(amount / 100)::numeric(16, 2) as amount_usd,
...
from app_data.payments
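The macro definition itself is not shown above; a sketch consistent with the compiled output might look like the following (the default scale argument is an assumption):
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, scale=2) %}
    ({{ column_name }} / 100)::numeric(16, {{ scale }})
{% endmacro %}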
A number of useful macros have also been grouped together into packages —
our most popular package is dbt-utils.
After installing a package into your project, you can use any of the macros in
your own project — make sure you qualify the macro by prefixing it with
the package name:
select
field_1,
field_2,
field_3,
field_4,
field_5,
count(*)
from my_table
{{ dbt_utils.group_by(5) }}
You can also qualify a macro in your own project by prefixing it with
your package name (this is mainly useful for package authors).
FAQs
What parts of Jinja are dbt-specific?
Which docs should I use when writing Jinja or creating a macro?
Why do I need to quote column names in Jinja?
My compiled SQL has a lot of spaces and new lines, how can I get rid of it?
Once you learn the power of Jinja, it's common to want to abstract every
repeated line into a macro! Remember that using Jinja can make your models
harder for other users to interpret — we recommend favoring readability when
mixing Jinja with SQL, even if it means repeating some lines of SQL in a few
places. If all your models are macros, it might be worth re-assessing.
Writing a macro for the first time? Check whether we've open sourced one
in dbt-utils that you can use, and save yourself some time!
{% set ... %} can be used to create a new variable, or update an existing one.
We recommend setting variables at the top of a model, rather than hardcoding
values inline. This is a practice borrowed from many other coding languages, since it
helps with readability, and comes in handy if you need to reference the variable
in two places:
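A short sketch of this pattern, reusing the payments example from earlier; the list contents are illustrative, and the variable is referenced twice (in the loop and in the length expression):
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}

select
    order_id,
    {% for payment_method in payment_methods %}
    sum(case when payment_method = '{{ payment_method }}' then amount end) as {{ payment_method }}_amount,
    {% endfor %}
    {{ payment_methods | length }} as number_of_payment_methods
from app_data.payments
group by 1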
Source properties
Source configurations
{{ source() }} jinja function
source freshness command
Using sources
Sources make it possible to name and describe the data loaded into your
warehouse by your Extract and Load tools. By declaring these tables as sources
in dbt, you can then select from source tables in your models using the
{{ source() }} function (helping define the lineage of your data), test your
assumptions about your source data, and calculate the freshness of your source data.
Declaring a source
version: 2
sources:
- name: jaffle_shop
database: raw
schema: jaffle_shop
tables:
- name: orders
- name: customers
- name: stripe
tables:
- name: payments
*By default, schema will be the same as name. Add schema only if you want to use a
source name that differs from the existing schema.
If you're not already familiar with these files, be sure to check out the
documentation on schema.yml files before proceeding.
Once a source has been defined, it can be referenced from a model using
the {{ source()}} function.
models/orders.sql
select
...
from {{ source('jaffle_shop', 'orders') }}
This compiles to:
select
...
from raw.jaffle_shop.orders
version: 2
sources:
- name: jaffle_shop
description: This is a replica of the Postgres database used by our app
tables:
- name: orders
description: >
One record per order. Includes cancelled and deleted orders.
columns:
- name: id
description: Primary key of the orders table
tests:
- unique
- not_null
- name: status
description: Note that the status can change over time
- name: ...
- name: ...
You can find more details on the available properties for sources in
the reference section.
FAQs
Source data freshness
With a couple of extra configs, dbt can optionally snapshot the "freshness" of
the data in your source tables. This is useful for understanding if your data
pipelines are in a healthy state, and is a critical component of defining SLAs for
your warehouse.
version: 2
sources:
- name: jaffle_shop
database: raw
freshness: # default freshness
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}
loaded_at_field: _etl_loaded_at
tables:
- name: orders
freshness: # make this a little more strict
warn_after: {count: 6, period: hour}
error_after: {count: 12, period: hour}
- name: product_skus
freshness: null # do not check freshness for this table
To snapshot freshness information for your sources, use the dbt source
freshness command (reference docs):
$ dbt source freshness
For each source table, dbt queries the configured loaded_at_field, and the
results of that query are used to determine whether the source is fresh or not:
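Based on the loaded_at_field configured above, the query dbt runs for the orders source (without a filter) looks roughly like this:
select
  max(_etl_loaded_at) as max_loaded_at,
  convert_timezone('UTC', current_timestamp()) as snapshotted_at
from raw.jaffle_shop.orders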
Filter
Some databases can have tables where a filter over certain columns are
required, in order prevent a full scan of the table, which could be costly. In order
to do a freshness check on such tables a filter argument can be added to the
configuration, e.g. filter: _etl_loaded_at >= date_sub(current_date(),
interval 1 day). For the example above, the resulting query would look like
select
max(_etl_loaded_at) as max_loaded_at,
convert_timezone('UTC', current_timestamp()) as snapshotted_at
from raw.jaffle_shop.orders
where _etl_loaded_at >= date_sub(current_date(), interval 1 day)
FAQs
Are the results of source freshness stored anywhere?
Yes!
The dbt source freshness command will output a pass/warning/error status for
each table selected in the freshness snapshot.
Additionally, dbt will write the freshness results to a file in the target/ directory
called sources.json by default. You can override this destination by using the -o
flag with the dbt source freshness command.
After enabling source freshness within a job, configure Artifacts in your Project
Details page, which you can find by clicking the gear icon and then
selecting Account settings. You can see the current status for source freshness
by clicking View Sources in the job page.
Exposures make it possible to define and describe a downstream use of your dbt
project, such as a dashboard, application, or data science pipeline. By defining
exposures, you can then:
run, test, and list resources that feed into your exposure
populate a dedicated page in the auto-generated documentation site
with context relevant to data consumers
Declaring an exposure
exposures:
- name: weekly_jaffle_metrics
label: Jaffles by the Week
type: dashboard
maturity: high
url: https://bi.tool/dashboards/1
description: >
Did someone say "exponential growth"?
depends_on:
- ref('fct_orders')
- ref('dim_customers')
- source('gsheets', 'goals')
- metric('count_orders')
owner:
name: Callum McData
email: [email protected]
Available properties
Required: name, type (one of dashboard, notebook, analysis, ml, application), owner (name or email is required)
Expected: depends_on (a list of refable nodes, e.g. ref, source, metric)
Optional:
label
url
maturity
description
tags
meta
Once an exposure is defined, you can run commands that reference it:
dbt run -s +exposure:weekly_jaffle_metrics
dbt test -s +exposure:weekly_jaffle_metrics
When we generate our documentation site, you'll see the exposure appear:
Group members may include models, tests, seeds, snapshots, analyses, and
metrics. (Not included: sources and exposures.) Each node may belong to only
one group.
Declaring a group
groups:
- name: finance
owner:
# 'name' or 'email' is required; additional properties allowed
email: [email protected]
slack: finance-data
github: finance-data-team
Project-level
Model-level
In-file
dbt_project.yml
models:
marts:
finance:
+group: finance
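For the Model-level and In-file tabs, a sketch of assigning the same group from the model's own config block (the file path is illustrative):
-- models/marts/finance/finance_model.sql
{{ config(group='finance') }}

select ...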
By default, all models within a group have the protected access modifier. This
means they can be referenced by downstream resources in any group in the
same project, using the ref function. If a grouped model's access property is set
to private, only resources within its group can reference it.
models/schema.yml
models:
- name: finance_private_model
access: private
config:
group: finance
# in a different group!
- name: marketing_model
config:
group: marketing
models/marketing_model.sql
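The contents of marketing_model.sql are not shown above; presumably it references the private model, which dbt rejects because finance_private_model is private to the finance group. A sketch:
select * from {{ ref('finance_private_model') }}
With this reference in place, dbt raises an error at parse time explaining that the referenced node is restricted to its group.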
Related docs
Analyses
Overview
dbt's notion of models makes it easy for data teams to version control and
collaborate on data transformations. Sometimes though, a certain SQL
statement doesn't quite fit into the mold of a dbt model. These more
"analytical" SQL files can be versioned inside of your dbt project using
the analysis functionality of dbt.
Any .sql files found in the analyses/ directory of a dbt project will be compiled,
but not executed. This means that analysts can use dbt functionality
like {{ ref(...) }} to select from models in an environment-agnostic way.
In practice, an analysis file might look like this (via the open source Quickbooks
models):
analyses/running_total_by_account.sql
-- analyses/running_total_by_account.sql
with journal_entries as (
select *
from {{ ref('quickbooks_adjusted_journal_entries') }}
), accounts as (
select *
from {{ ref('quickbooks_accounts_transformed') }}
)
select
txn_date,
account_id,
adjusted_amount,
description,
account_name,
sum(adjusted_amount) over (partition by account_id order by id rows unbounded preceding)
from journal_entries
order by account_id, id
Data Build Tool (DBT) is a popular open-source tool used in the data
analytics and data engineering fields. DBT helps data professionals
transform, model, and prepare data for analysis. If you’re preparing
for an interview related to DBT, it’s important to be well-versed in
its concepts and functionalities. To help you prepare, here’s a list of
common interview questions and answers about DBT.
1. What is DBT?
Answer: DBT (data build tool) is an open-source transformation tool that lets data
teams transform, test, and document data in their warehouse using SQL-based
models, managed with software engineering practices such as version control.
What is a DAG in DBT?
Answer: DAG stands for Directed Acyclic Graph, and in the context
of DBT, it represents the dependencies between models. DBT uses a
DAG to determine the order in which models are built.
What are DBT macros?
Answer: DBT macros are reusable SQL code snippets that can
simplify and standardize common operations in your DBT models,
such as filtering, aggregating, or renaming columns.
10. How can you perform testing and validation of DBT models?
15. Can DBT work with different data sources and data warehouses?
16. How does DBT handle incremental loading of data from source
systems?
17. What security measures does DBT support for data access and
transformation?
1)View (Default):
Purpose: Views are virtual tables that are not materialized. They
are essentially saved queries that are executed at runtime.
Use Case: Useful for simple transformations or when you want to
reference a SQL query in multiple models.
{{ config(
materialized='view'
) }}
SELECT
...
FROM ...
2)Table:
Purpose: The model is rebuilt as a table in the warehouse on each run.
Use Case: Useful for models that are queried often or are expensive to compute at query time.
3)Incremental:
{{ config(
materialized='incremental'
) }}
SELECT
...
FROM ...
Use Case: Useful when dbt needs a way to identify new or changed rows in the
data, typically via a unique_key and an is_incremental() filter:
{{ config(
materialized='incremental',
unique_key='id'
) }}
SELECT
...
FROM ...
{% if is_incremental() %}
-- assumes the source data has an updated_at timestamp column
WHERE updated_at > (SELECT max(updated_at) FROM {{ this }})
{% endif %}
4)Ephemeral:
Purpose: The model is not built in the warehouse; its SQL is inlined as a
common table expression into the models that reference it.
Use Case: Useful for lightweight intermediate transformations that don't need
to be queried directly.
5)Snapshot:
Purpose: Captures changes to a mutable source table over time (type-2 slowly
changing dimensions). Snapshots are defined in snapshot blocks rather than
with a materialized config in a regular model:
{% snapshot my_snapshot %}
{{ config(
target_schema='snapshots',
unique_key='id',
strategy='timestamp',
updated_at='updated_at'
) }}
SELECT
...
FROM ...
{% endsnapshot %}
Answer: Dbt provides several types of tests that you can use to
validate your data. Here are some common test types in dbt:
1)Unique Test (unique):
version: 2
models:
  - name: my_model
    columns:
      - name: id
        tests:
          - unique
2)Not Null Test (not_null):
version: 2
models:
  - name: my_model
    columns:
      - name: name
        tests:
          - not_null
      - name: age
        tests:
          - not_null
3)Accepted Values Test (accepted_values):
version: 2
models:
  - name: my_model
    columns:
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'inactive']
4)Relationship Test (relationships):
Verifies that the values in a foreign key column match primary key
values in the referenced table (dbt's built-in referential integrity check).
version: 2
models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id
You can also write singular (custom SQL) tests as .sql files in the tests/
directory; a test passes when its query returns zero rows. For example:
tests/assert_column_is_positive.sql
select *
from {{ ref('my_model') }}
where column_name <= 0
21. What is a seed?
Answer: Seeds are CSV files in your project's seed directory that dbt loads into
your warehouse when you run dbt seed. They are best suited to small, static
reference data, such as country codes or status mappings, and can be configured
in dbt_project.yml:
seeds:
  my_project:
    my_seed_table:
      +quote_columns: false
1)Pre-hooks:
SQL statements that run before the model is built. Example of a pre-hook:
-- models/my_model.sql
{{ config(
pre_hook = "CREATE TEMP TABLE my_temp_table AS SELECT * FROM my_source_table"
) }}
SELECT
column1,
column2
FROM
my_temp_table
2)Post-hooks:
SQL statements that run after the model is built. Example of a post-hook:
-- models/my_model.sql
SELECT
column1,
column2
FROM
my_source_table
{{ config(
post_hook = "UPDATE metadata_table SET last_run_timestamp = CURRENT_TIMESTAMP"
) }}
23. What are snapshots?
Answer: Snapshots capture how rows in a mutable source table change over time,
implementing type-2 slowly changing dimensions. They are defined in snapshot blocks:
-- snapshots/customer_snapshot.sql
{% snapshot customer_snapshot %}
{{ config(
target_database='analytics',
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at'
) }}
SELECT
customer_id,
name,
email,
address,
updated_at -- assumes the source table has an updated_at column
FROM
source.customer
{% endsnapshot %}
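Snapshots defined this way are executed separately from models, with the dbt snapshot command:
$ dbt snapshot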
24. What are macros?
Answer: Macros are reusable pieces of Jinja/SQL code, analogous to functions
in other programming languages.
1. Definition: Macros are defined in .sql files, typically in the macros/ directory:
-- macros/my_macro.sql
{% macro my_macro(parameter1, parameter2) %}
SELECT
column1,
column2
FROM
my_table
WHERE
condition1 = {{ parameter1 }}
AND condition2 = {{ parameter2 }}
{% endmacro %}
2. Invocation: You can then use the macro in your dbt models by
referencing it.
-- my_model.sql
{{ my_macro(parameter1=1, parameter2='value') }}
When you run the dbt project, dbt replaces the macro invocation
with the actual SQL code defined in the macro.
-- my_model.sql
{{ my_macro(parameter1=1, parameter2='value') }}
-- another_model.sql
{{ my_macro(parameter1=2, parameter2='another_value') }}
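For instance, the first invocation above compiles to roughly the following sketch. Note that string arguments are rendered without quotes unless the macro itself quotes them, so in practice you would pass "'value'" or add quoting inside the macro:
-- compiled my_model.sql (sketch)
SELECT
    column1,
    column2
FROM
    my_table
WHERE
    condition1 = 1
    AND condition2 = value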
1. Models Directory:
This is where you store your SQL files containing dbt models. Each
model represents a logical transformation or aggregation of your
raw data. Models are defined using SQL syntax and are typically
organized into subdirectories based on the data source or business
logic.
2. Data Directory:
The data directory is used to store any data files that are required for
your dbt transformations. This might include lookup tables,
reference data, or any other supplemental data needed for your
analytics.
3. Analysis Directory:
This directory contains SQL files that are used for ad-hoc querying
or exploratory analysis. These files are separate from the main
models and are not intended to be part of the core data
transformation process.
4. Tests Directory:
dbt allows you to write tests to ensure the quality of your data
transformations. The tests directory is where you store YAML files
defining the tests for your models. Tests can include checks on the
data types, uniqueness, and other criteria.
5. Snapshots Directory:
This is where you store snapshot definitions, which dbt uses to capture the state
of mutable source tables over time so you can track how records change.
6. Macros Directory:
Macros in dbt are reusable pieces of SQL code. The macros directory
is where you store these macros, and they can be included in your
models for better modularity and maintainability.
7. Docs Directory:
This is where you can keep markdown files with doc blocks and other long-form
documentation that dbt pulls into the generated documentation site.
8. dbt_project.yml:
This YAML file is the configuration file for your dbt project. It
includes settings such as the target warehouse, database connection
details, and other project-specific configurations.
9. profiles.yml:
This file contains the connection details for your data warehouse and typically
lives outside the project, in your ~/.dbt/ directory. It specifies how to connect to
your database, including the type of database, host, username, and password.
my_project/
|-- analysis/
| |-- my_analysis_file.sql
|-- data/
| |-- my_seed_file.csv
|-- macros/
| |-- my_macro_file.sql
|-- models/
| |-- my_model_file.sql
|-- snapshots/
| |-- my_snapshot_file.sql
|-- tests/
| |-- my_test_file.sql
|-- dbt_project.yml
By using dbt for data refresh, you can streamline and automate the
process of transforming raw data into a clean, structured format for
analysis. This approach promotes repeatability, maintainability, and
collaboration in the data transformation process.
1. What is a model in dbt (data build tool)?
A model is a select statement. Models are defined in .sql files (typically in
your models directory):
Each .sql file contains one model / select statement
The name of the file is used as the model name
Models can be nested in subdirectories within the models directory
When you execute the dbt run command, dbt will build this model in your
data warehouse by wrapping it in a create view as or create table as
statement.
2. What are the configurations in a model?
Configurations are "model settings" that can be set in your dbt_project.yml
file, and in your model file using a config block. Some example
configurations include:
Changing the materialization that a model uses (for example, view, table, or incremental)
Building models into separate schemas
Applying tags to a model
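A sketch of setting some of these in a config block at the top of a model (the schema name and tag are illustrative):
{{ config(
    materialized='table',
    schema='marketing',
    tags=['nightly']
) }}
SELECT
...
FROM ...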
What is dbt?
In short, dbt (data build tool) turns your data analysts into engineers
and allows them to own the entire analytics engineering workflow.
dbt (data build tool) is easy to use for anyone who knows SQL—you
don’t need to have a high-powered data engineering skillset to build
data pipelines anymore.
Hear why dbt is the iFit engineering team’s favorite tool and how it
helped them drive triple-digit growth for the company.
The dbt Cloud UI offers an approachable interface in which people at all levels of
experience can comfortably develop.
While configuring a continuous integration job in the dbt Cloud UI, you can take
advantage of dbt’s Slim CI feature and even use webhooks to run jobs
automatically when a pull request is opened.
Lineage is automatically generated for all your models in dbt. This has saved teams
numerous hours in manual documentation time.
Scheduling is simplified in the dbt Cloud UI. Just give it directions on what time you
want a production job to run, and it will take it from there.
Simple example of applying tests on the primary key for a table in a project.
A data model organizes different data elements and standardizes how they relate to one
another and real-world entity properties. So logically then, data modeling is the process of
creating those data models.
Data models are composed of entities, and entities are the objects and concepts whose data
we want to track. They, in turn, become tables found in a database. Customers, products,
manufacturers, and sellers are potential entities.
Each entity has attributes—details that the users want to track. For instance, a customer’s
name is an attribute.
With that out of the way, let’s check out those data modeling interview questions!
Basic Data Modeling Interview Questions
1. What Are the Types of Data Models?
Physical data model - This is where the framework or schema describes how data is
physically stored in the database.
Conceptual data model - This model focuses on the high-level, user’s view of the data in
question.
Logical data models - They straddle between physical and conceptual data models,
allowing the logical representation of data to exist apart from the physical storage.
2. What is a Table?
A table consists of data stored in rows and columns. Columns, also known as fields, show
data in vertical alignment. Rows, also called records or tuples, represent data’s horizontal
alignment.
3. What is Normalization?
Database normalization is the process of designing the database in such a way that it reduces
data redundancy without sacrificing integrity, while preserving the relationships between the
tables and the data residing in them.
ERD stands for Entity Relationship Diagram and is a logical entity representation, defining
the relationships between the entities. Entities reside in boxes, and arrows symbolize
relationships.
A surrogate key is a system-generated key with numerical attributes that stands in for natural
keys and is typically used as the primary key. Instead of relying on natural or composite
primary keys, data modelers create the surrogate key, which is a valuable tool for identifying
records, building SQL queries, and enhancing performance.
8. What Are the Critical Relationship Types Found in a Data Model? Describe
Them.
Identifying. A relationship line normally connects parent and child tables. But if a child
table’s reference column is part of the table’s primary key, the tables are connected by a
thick line, signifying an identifying relationship.
Non-identifying. If a child table’s reference column is NOT a part of the table’s primary
key, the tables are connected by a dotted line, signifying a non-identifying relationship.
9. What Is an Enterprise Data Model?
This is a data model that consists of all the entities required by an enterprise.
10. What Are the Most Common Errors You Can Potentially Face in Data
Modeling?
These are the errors most likely encountered during data modeling.
Building overly broad data models: If the number of tables runs higher than 200, the data
model becomes increasingly complex, increasing the likelihood of failure
Unnecessary surrogate keys: Surrogate keys must only be used when the natural key
cannot fulfill the role of a primary key
The purpose is missing: Situations may arise where the user has no clue about the
business’s mission or goal. It’s difficult, if not impossible, to create a specific business
model if the data modeler doesn’t have a workable understanding of the company’s
business model
Inappropriate denormalization: Users shouldn’t use this tactic unless there is an excellent
reason to do so. Denormalization improves read performance, but it creates redundant
data, which is a challenge to maintain.
The two design schemas are called Star schema and Snowflake schema. The Star schema has a
fact table centered with multiple dimension tables surrounding it. A Snowflake schema is
similar, except that the level of normalization is higher, which results in the schema looking
like a snowflake.
These are dimensions used to manage both historical data and current data in data
warehousing. There are four different types of slowly changing dimensions: SCD Type 0
through SCD Type 3.
A data mart is the simplest form of data warehousing and is used to focus on one
functional area of any given business. Data marts are a subset of data warehouses oriented to
a specific line of business or functional area of an organization (e.g., marketing, finance,
sales). Data enters data marts by an assortment of transactional systems, other data
warehouses, or even external sources.
Data sparsity defines how much data we have for a model’s specified dimension or entity. If
there is insufficient information stored in the dimensions, then more space is needed to store
these aggregations, resulting in an oversized, cumbersome database.
Entities can be broken down into several sub-entities or grouped by specific features. Each
sub-entity has relevant attributes and is called a subtype entity. Attributes common to every
entity are placed in a higher or super level entity, which is why they are called supertype
entities.
Metadata is defined as “data about data.” In the context of data modeling, it’s the data that
covers what types of data are in the system, what it’s used for, and who uses it.
No, it’s not an absolute requirement. However, denormalized databases are easily accessible,
easier to maintain, and less redundant.
19. What’s the Difference Between forwarding and Reverse Engineering, in the
Context of Data Models?
Forward engineering is a process where Data Definition Language (DDL) scripts are
generated from the data model itself. DDL scripts can be used to create databases. Reverse
Engineering creates data models from a database or scripts. Some data modeling tools have
options that connect with the database, allowing the user to engineer a database into a data
model.
20. What Are Recursive Relationships, and How Do You Rectify Them?
Recursive relationships happen when a relationship exists between an entity and itself. For
instance, a doctor could be in a health center’s database as a care provider, but if the doctor is
sick and goes in as a patient, this results in a recursive relationship. You would need to add a
foreign key to the health center’s number in each patient’s record.
22. Why Are NoSQL Databases More Useful than Relational Databases?
They have a dynamic schema, which means they can evolve and change as quickly as
needed
NoSQL databases have sharding, the process of splitting up and distributing data to
smaller databases for faster access
They offer failover and better recovery options thanks to replication
What Is a Junk Dimension?
This is a grouping of low-cardinality attributes like indicators and flags, removed from other
tables, and subsequently “junked” into an abstract dimension table. They are often used to
handle rapidly changing dimensions within data warehouses.
I hope these data modeling interview questions have given you an idea of the kind of
questions that can be asked in an interview. If you’re intrigued by what you’ve read about
data modeling, the natural next step is to look into how to become a data modeler.
But if you’re ready to accelerate your career in data science, then sign up for
Simplilearn’s Data Scientist Course. You will gain hands-on exposure to key technologies,
including R, SAS, Python, Tableau, Hadoop, and Spark. Experience world-class training by
an industry leader on the most in-demand Data Science and Machine learning skills.
The program boasts a half dozen courses, over 30 in-demand skills and tools, and more than
15 real-life projects. So check out Simplilearn’s resources and get that new data modeling
career off to a great start!