What Is Snowflake Data Governance
Snowflake provides industry-leading features that ensure the highest levels of governance for
your account and users, as well as all the data you store and access in Snowflake.
Column-level Security
Allows the application of a masking policy to a column within a table or view.
Row-level Security
Allows the application of a row access policy to a table or view to determine which
rows are visible in the query result.
Object Tagging
Allows the tracking of sensitive data for compliance, discovery, protection, and
resource usage.
Data Classification
Allows categorizing potentially personal and/or sensitive data to support compliance
and privacy regulations.
Access History
Allows the auditing of the user access history through the Account
Usage ACCESS_HISTORY View.
Object Dependencies
Allows the auditing of how one object references another object by its metadata (e.g.
creating a view depends on a table name and column names) through the Account
Usage OBJECT_DEPENDENCIES view.
Why have a data governance strategy in place?
Data reliability: Good practices for data governance help
make sure data is correct, consistent, and reliable, allowing
businesses to make better-informed decisions based on
reliable data.
Data compliance with regulations: Businesses are bound by
regulations/standards that govern how data should be kept
and secured. Data governance ensures these requirements
are followed, which can help prevent legal complications
and massive penalties/fines.
Data security: Data breaches can be costly and detrimental
to a business's reputation. Effective data governance
practices can improve data security by controlling access to
sensitive information and protecting it from unauthorized
disclosure or data misuse.
Data efficiency and productivity: Data governance helps
ensure that data is available when and where it is needed,
which cuts down on wasted work and might boost
productivity.
Data decision-making process: Data governance helps
businesses make better decisions and achieve their
objectives more effectively by providing them with reliable
and accurate data.
Now, let's jump back to understanding the concept of Snowflake
data governance.
1) Column-level security
The column-level security feature in Snowflake is available only in
the Enterprise edition or higher. It provides enhanced measures
to safeguard sensitive data in tables or views through two
distinct features:
Dynamic Data Masking hides plain-text data in table and
view columns based on masking policies at query runtime.
These schema-level policies prevent unauthorized access to
sensitive data while letting authorized users access
sensitive data at query runtime. The policies use conditions
and functions to transform the data when conditions are
met.
External Tokenization is a feature that enables accounts to
tokenize data before loading it into Snowflake and
detokenize the data at query runtime. Tokenization is the
process of removing sensitive data by replacing it with an
undecipherable token. External tokenization makes use of
masking policies with external functions. Before data can
be loaded into Snowflake, it must be tokenized by a
third-party tokenization service. At query execution, Snowflake
uses the external function to make an API call to the
tokenization provider, which then evaluates an externally
created tokenization policy before returning tokenized or
detokenized data depending on the masking policy
conditions.
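As a rough sketch of the query-time half of external tokenization, the masking policy below calls an external function to detokenize values for one role. The function name detokenize_email, the role ANALYST, and the table and column names are all hypothetical. The external function itself would have to be created separately against your tokenization provider's API.

```sql
-- Assumption: detokenize_email is an external function you have already
-- created that calls the tokenization provider; it is not a Snowflake built-in.
CREATE OR REPLACE MASKING POLICY email_detokenize AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST') THEN detokenize_email(val)  -- authorized role: detokenize
    ELSE val                                                        -- everyone else sees the stored token
  END;

-- Attach the policy to a column that stores tokenized values (hypothetical table).
ALTER TABLE customers_tokenized MODIFY COLUMN email SET MASKING POLICY email_detokenize;
```

Note that the column holds tokens at rest, so the "unauthorized" branch simply returns the stored value unchanged.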
What is Masking Policy?
Masking policies are schema-level objects that protect sensitive
data from unwanted access while allowing authorized users to
view the sensitive data during query execution. These masking
policies are made up of conditions and functions that change
data during query execution when the given criteria are met.
Masking policies can be applied to one or more columns in a table
or view that have the same data type. Masking policy conditions
can be expressed using Conditional Expression Functions and
Context Functions or by querying on a custom table.
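To make this concrete, here is a minimal sketch of a masking policy built from a conditional expression and a context function. The policy name, the roles PII_ADMIN and SUPPORT, and the customers.email column are assumptions chosen for illustration.

```sql
-- Hypothetical policy: full value for PII_ADMIN, domain only for SUPPORT,
-- and a fixed mask for every other role.
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'PII_ADMIN' THEN val
    WHEN CURRENT_ROLE() = 'SUPPORT'   THEN CONCAT('*****@', SPLIT_PART(val, '@', 2))
    ELSE '**********'
  END;

-- Apply the policy to a STRING column; the same policy can be reused on
-- other columns of the same data type.
ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;
```

Because the policy is evaluated at query time, the stored data is never changed. Only what each role sees in query results differs.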
3) Object tagging
Object-tagging feature in Snowflake is also only available in
the Enterprise edition or higher. To define what "object tags"
are, they are simply labels that allow you to assign metadata to
Snowflake objects, such as tables, views, and schemas, by using
tags. Tags are essentially labels that consist of key-value pairs.
These tags can be used to categorize and describe Snowflake
objects, making them easier to manage and organize.
Check out this official Snowflake documentation to learn more about the in-
depth process of Object tagging and its benefits.
Snowflake object tagging offers several benefits, with one of the
main benefits being the ability to inherit tags based on where
they are applied (a tag set on a table, for example, is inherited
by its columns). Other advantages include tracking and finding
sensitive data, classifying data and objects, tracking resource
consumption, and supporting row-level security and tag-based masking.
TL;DR: Object tagging in Snowflake enables efficient data
categorization and organization using labels called "tags,"
providing benefits such as tracking sensitive data, implementing
access policies, and simplifying Snowflake governance.
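The basic tagging workflow can be sketched in a few statements. The tag name cost_center, its allowed values, and the customers table are hypothetical.

```sql
-- Create a tag whose values are restricted to a known list.
CREATE TAG IF NOT EXISTS cost_center
  ALLOWED_VALUES 'sales', 'finance', 'engineering'
  COMMENT = 'Team that owns the object';

-- Set the tag on a table as a key-value pair.
ALTER TABLE customers SET TAG cost_center = 'sales';

-- Read back the value assigned to the table.
SELECT SYSTEM$GET_TAG('cost_center', 'customers', 'table');
```

Restricting the allowed values is optional, but it keeps tag values consistent when many people apply the same tag.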
5) Data classification
The data classification feature in Snowflake is also available only in
the Enterprise edition or higher. It allows users to automatically
identify and classify columns in their tables that contain personal or
sensitive data.
The classification process involves three main
steps: analyze, review, and apply. The first step, analyze,
involves calling the EXTRACT_SEMANTIC_CATEGORIES function
to analyze the columns and output possible categories and
associated probabilities. The second step, 'review,' involves
validating the results, while the third step, 'apply,' involves
assigning system tags to columns containing personal or
sensitive data.
Check out the official Snowflake documentation to learn more about data
classification.
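The analyze, review, and apply steps described above can be sketched as follows. The table name my_db.my_schema.customers is hypothetical, and because Snowflake's classification interface has evolved over time, the exact output format should be checked against the current documentation.

```sql
-- Step 1 (analyze): returns a JSON object describing the likely
-- categories for each column, based on a sample of the data.
SELECT EXTRACT_SEMANTIC_CATEGORIES('my_db.my_schema.customers');

-- Step 2 (review): flatten the result to one row per column so a person
-- can inspect the suggestions before anything is tagged.
SELECT f.key::VARCHAR AS column_name,
       f.value        AS classification_details
FROM TABLE(FLATTEN(EXTRACT_SEMANTIC_CATEGORIES('my_db.my_schema.customers')::VARIANT)) AS f;

-- Step 3 (apply): assign the system tags to the columns.
CALL ASSOCIATE_SEMANTIC_CATEGORY_TAGS(
  'my_db.my_schema.customers',
  EXTRACT_SEMANTIC_CATEGORIES('my_db.my_schema.customers')
);
```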
6) Object dependencies
Object Dependencies is a built-in Snowflake governance feature
that allows users to identify dependencies among Snowflake
objects.
In Snowflake, an object dependency exists whenever operating on an
object requires referencing metadata for that object itself or
for at least one other object. A dependency can be triggered by
an object's name, its ID value, or both.
Object Dependencies enables users to view and track these
dependencies between Snowflake objects, which is particularly
useful for impact analysis, data integrity assurance, and
compliance purposes.
Learn more in the official Snowflake documentation.
Object Dependencies is an important feature for
compliance officers and auditors who need to trace data from a
given object to its original data source to meet regulatory
requirements.
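For impact analysis, a query along these lines lists everything that depends on a given table. The database, schema, and table names are placeholders. Account Usage views are populated with some latency, so very recent changes may not appear yet.

```sql
-- Which objects reference MY_DB.MY_SCHEMA.CUSTOMERS, and how?
SELECT referencing_database,
       referencing_schema,
       referencing_object_name,
       referencing_object_domain,
       dependency_type          -- BY_NAME, BY_ID, or BY_NAME_AND_ID
FROM snowflake.account_usage.object_dependencies
WHERE referenced_database      = 'MY_DB'
  AND referenced_schema        = 'MY_SCHEMA'
  AND referenced_object_name   = 'CUSTOMERS'
  AND referenced_object_domain = 'TABLE';
```

Running this before altering or dropping a table shows which views and other objects would be affected.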
7) Access History
The Access History feature in Snowflake is also available only in
the Enterprise edition or higher. Access History is a built-in
Snowflake governance feature that provides a record of all user
activity related to data access and modification within a
Snowflake account. Essentially, it tracks user queries that read
column data and SQL statements that write data (INSERT,
UPDATE, DELETE). The Access History feature is particularly
useful for regulatory compliance auditing and also provides
insights into frequently accessed tables and columns.
The Access History feature in Snowflake is available through the
Account Usage ACCESS_HISTORY view.
Check out the official Snowflake documentation to learn more about Access
History.
TL;DR: The Access History feature helps users maintain a
detailed record of all data access and modification events within
their Snowflake accounts.
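As an illustration, the query below lists which users read which columns over the past week. It is a sketch that assumes your role can read the SNOWFLAKE.ACCOUNT_USAGE schema; the seven-day window is an arbitrary choice.

```sql
-- One row per (query, object, column) read directly by a user.
SELECT ah.user_name,
       ah.query_start_time,
       obj.value:"objectName"::STRING AS object_name,
       col.value:"columnName"::STRING AS column_name
FROM snowflake.account_usage.access_history AS ah,
     LATERAL FLATTEN(input => ah.direct_objects_accessed) AS obj,
     LATERAL FLATTEN(input => obj.value:"columns", outer => TRUE) AS col
WHERE ah.query_start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
ORDER BY ah.query_start_time DESC;
```

The accessed objects are stored as JSON arrays, which is why the query flattens them before selecting names.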
1) Collibra
Collibra is an enterprise-oriented data governance tool that helps
businesses and organizations understand and manage their data
assets. It enables businesses and organizations to create an
inventory of data assets, capture metadata about them, and govern
these assets to ensure regulatory compliance. The tool is
primarily used by IT, data owners, and administrators in charge of
data protection and compliance to inventory and track how data
is used. Collibra's aim is to protect data, ensure it is
appropriately governed and used, and eliminate potential fines
and risks from a lack of regulatory compliance.
By integrating Collibra with Snowflake,
enterprises can effectively manage their data assets within
Snowflake by leveraging Collibra's governance capabilities. This
combination enables data democratization and enterprise-wide
collaboration, while also enabling businesses to easily discover
and scale access to reliable data. The unique features and
complementary capabilities of both platforms empower
businesses to increase data usage, collaboration, and ultimately
deliver faster insights and innovation, all while ensuring proper
governance of their data within Snowflake.
Collibra (Source: collibra.com)
Collibra offers six key functional areas to aid in data governance:
Collibra Data Quality & Observability: Monitors data quality
and pipeline reliability to aid in remedying anomalies.
Collibra Data Catalog: A single solution for finding and
understanding data from various sources.
Data Governance: A location for finding, understanding, and
creating a shared language around data for all individuals
within an organization.
Data Lineage: Automatically maps relationships between
systems, applications, and reports to provide a
comprehensive view of data across the enterprise.
Collibra Protect: Allows for the discovery, definition, and
protection of data from a unified platform.
Data Privacy: Centralizes, automates, and guides workflows
to encourage collaboration and address global regulatory
requirements for data privacy.
2) Alation
Alation is a sophisticated data catalog solution designed for
enterprise-level organizations, acting as a unified reference for all
their data needs. It automatically scans and indexes over 60
distinct data sources, encompassing on-premises databases,
cloud storage, file systems, and business intelligence tools.
Utilizing query log ingestion, Alation analyzes queries to pinpoint
the most frequently accessed data and its primary users. This
information forms the foundation of the catalog, which allows
users to collaborate and contextualize the data. With the catalog
established, data analysts and scientists can swiftly locate,
scrutinize, validate, and repurpose data, enhancing their
productivity.
However, Alation's capabilities extend beyond a mere data
catalog solution. It also serves as a data governance platform,
enabling analytics teams to effectively manage and enforce
policies for data consumers. Through Alation's comprehensive
metadata management, organizations can establish and enforce
policies, monitor usage, and maintain compliance with data
privacy regulations. Its adaptable workflows and dashboards
empower governance teams to effortlessly create, modify, and
disseminate policies, ensuring responsible data usage across the
enterprise.
Alation is an optimal solution for Snowflake data governance, as
it centralizes data, fosters collaboration, and enforces adherence
to data access and usage policies. This leads to heightened
productivity and innovation, making Alation an invaluable
resource for organizations seeking efficient Snowflake data
governance.
Alation (Source: Alation)
Key benefits of using Alation:
Boost analyst productivity
Improve data comprehension
Foster collaboration
Minimize the risk of data misuse
Eliminate IT bottlenecks
Easily expose and interpret data policies
Alation offers various solutions to improve productivity, accuracy
and data-driven decision-making. These include:
Alation Data Catalog: Improves the efficiency of analysts
and the accuracy of analytics, empowering all members of
an organization to find, understand, and govern data
efficiently.
Alation Connectors: A wide range of native data sources
that speed up the process of gaining insights and enable
data intelligence throughout the enterprise. (Additional data
sources can also be connected with the Open Connector
Framework SDK.)
Alation Platform: An open and intelligent solution for various
metadata management applications, including search and
discovery, data governance, and digital transformation.
Alation Data Governance App: Simplifies secure access to
the best data in hybrid and multi-cloud environments.
Alation Cloud Service: Offers businesses and organizations
the option to manage their data catalog on their own or have
it managed for them in the cloud.
Conclusion
Snowflake data governance is essential for ensuring data quality,
security, and accuracy. Snowflake provides a comprehensive set
of features to help businesses implement data governance, but
these features must be combined with an effective strategy. In
this article, we defined Snowflake data governance, discussed
best practices for implementation, and provided an overview of
the built-in and third-party tools available to support Snowflake
data governance.
You can think of Snowflake Governance as a fence protecting
your data garden from any trespassers. Use it to your full
advantage to create reliable data security measures and data
access controls, safeguarding the privacy of your sensitive data
stored in Snowflake.
FAQs
What is Snowflake data governance?
Snowflake data governance refers to the policies, procedures,
and practices implemented to manage and control data stored on
the Snowflake Platform. It ensures data integrity, security, and
management.
What are the advantages of having a data governance strategy?
The advantages of a data governance strategy include improved data
reliability, compliance with regulations, enhanced data security,
increased data efficiency and productivity, and better decision-
making based on accurate data.
What are the key features of Snowflake's built-in data
governance?
Snowflake's built-in data governance features include column-
level security, row-level access policies/security, object tagging,
object tag-based masking policies, data classification, object
dependencies, and access history.
What are some best practices for implementing Snowflake data
governance?
Best practices include effectively using Snowflake's built-in
governance features, establishing data policies and procedures,
forming a governance team, developing a data governance
framework, implementing security measures, maintaining data
quality standards, and leveraging automation and monitoring
tools.
Which tools can be used for Snowflake data governance?
Snowflake can be integrated with a range of security and
governance tools, such as Collibra, Alation, and others.
==
Quick answer:
TL;DR? Here are the highlights of this article and what to expect from it:
Snowflake offers data governance capabilities such as:
Column-level security
Row-level access
Object tag-based masking
Data classification
OAuth
Data governance in Snowflake can be improved with a Snowflake-validated
data governance solution. Such a solution would:
Handle governance for data from multiple sources (non-Snowflake)
Enable data lineage
Enhance data discovery
Embed collaboration
Empower cross-functional teams
This article delves into the specifics of data governance for Snowflake assets
and improving it so that you can manage your entire data estate.
Table of contents
1. What is Snowflake?
2. What is data governance in Snowflake?
3. Benefits of governing Snowflake data assets
4. Data governance capabilities in Snowflake
5. 5 key challenges of implementing data governance in Snowflake
6. Snowflake data governance with Atlan
7. Atlan: A Snowflake validated data governance solution
What is Snowflake?
Snowflake’s platform enables a wide variety of workloads and applications on any
cloud, including data warehouses, data lakes, data pipelines, and collaboration as
well as business intelligence, data science, and data analytics applications.
Quoting Snowflake’s website:
Snowflake is a fully managed service that’s simple to use but can power a near-
unlimited number of concurrent workloads. Snowflake is your solution for data
warehousing, data lakes, data engineering, data science, data application
development, and securely sharing and consuming shared data.
Snowflake stands out because it decouples storage from compute. This means
you can spin machines up and down on demand based on the analytics workload.
Quoting Snowflake from their S-1 form,
Our platform solves the decades-old problem of data silos and data
governance. Leveraging the elasticity and performance of the public cloud, our
platform enables customers to unify and query data to support a wide variety
of use cases. It also provides frictionless and governed data access so users can
securely share data inside and outside of their organizations, generally without
copying or moving the underlying data.
Snowflake is a cloud-agnostic platform that can distribute data across regions as well
as across cloud providers such as AWS, Azure, and GCP for data storage. Some of
the customers of Snowflake include Dropbox, Doordash, Hubspot, Adobe, and Fitbit.
Leading firms have eliminated millions of dollars in cost from their data
ecosystems and enabled digital and analytics use cases worth millions or even
billions of dollars. Data governance is one of the top three differences between
firms that capture this value and firms that don’t.
Data governance ensures that data throughout its lifecycle is accurate, consistent,
fresh, and complete.
Data access
Data governance helps provide flexible and scalable access policies to data users.
Security and compliance
Data governance monitors and reduces the risk of exposing private data. It makes
data assets auditable across their life cycle, which helps businesses comply
with regulations like GDPR and HIPAA.
Life cycle management
Governance helps set policies for data creation, data retention, and data deletion.
Trust
Governance improves trust and confidence in data and thus helps increase the
ROI of your data assets.
Data quality and data availability: Key benefits of having a robust data governance
1. Column-level security
2. Row-level access policies
3. Object tag-based masking policies
4. Data classification
5. Object dependencies
6. OAuth
1. Column-level governance with data masking
Column-level governance lets you apply a data masking policy to a column within a
table or view through Dynamic Data Masking and External Tokenization.
Learn more: Column-level data governance for Snowflake tables and views
2. Row-level governance
Snowflake uses row access policies to control which rows are returned in the query
result for SELECT, UPDATE, DELETE, and MERGE statements.
Learn more: Row access policies for Snowflake data
3. Object tagging
Data stewards use tags to track sensitive data for security, privacy, compliance,
discovery, and resource usage. Tags become powerful when you attach them to
access policies. This makes data governance in Snowflake easier to manage and scale.
Learn more: Object tag-based access policies
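A minimal sketch of tag-based masking, where the policy is attached to the tag rather than to individual columns. The tag pii_type, the policy mask_pii_string, the role PII_ADMIN, and the contacts table are all hypothetical names.

```sql
-- A tag that marks columns holding personal data.
CREATE TAG IF NOT EXISTS pii_type;

-- A masking policy for STRING columns.
CREATE OR REPLACE MASKING POLICY mask_pii_string AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'PII_ADMIN' THEN val
    ELSE '***MASKED***'
  END;

-- Attach the policy to the tag once...
ALTER TAG pii_type SET MASKING POLICY mask_pii_string;

-- ...then any STRING column carrying the tag is protected by it.
ALTER TABLE contacts MODIFY COLUMN email SET TAG pii_type = 'email';
ALTER TABLE contacts MODIFY COLUMN phone SET TAG pii_type = 'phone';
```

This is what makes the approach scale: protecting a new column only requires tagging it, not writing or assigning another policy.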
4. Data classification
Snowflake samples the data in your columns and records the resulting classifications
as tags. The broad classification use cases include PII, data access, policies, and
anonymization.
Learn more: Introduction to Snowflake data classification
5. Object dependencies
Object dependencies enable you to track dependencies within Snowflake data assets.
Dependencies are useful for performing an impact analysis, being compliant, and
maintaining data integrity.
Learn more: Object dependencies and their use-cases
6. OAuth
Snowflake supports both built-in and third-party OAuth for the authentication and
authorization of users.
Learn more: Introduction to OAuth in Snowflake
Data governance, especially on the cloud, spans multiple tools and processes,
including ingestion, ETL, data quality, and business intelligence (BI). A
centralized metadata and data governance platform is therefore a must-have to
get a complete hold on governance.
3. Lack of data lineage
Data governance in essence ensures that high-quality data exists for analysis
throughout the life-cycle of the data. Lineage helps track this by helping visualize the
journey of the data from the source to the dashboard.
4. Data discovery challenges
The two core components of a good data governance system are “availability” and
“usability” of data. The lack of a data catalog and a business glossary in Snowflake
might make it difficult to find, use, and collaborate on data.
5. Governance management for non-technical/business users
Data governance setup and management on Snowflake are entirely done through
writing SQL queries. This makes it harder for non-technical/business users to use
Snowflake for governance.
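-- Create the tag, then document its intended values in a comment.
CREATE OR REPLACE TAG Classification;
-- String literals take single quotes in Snowflake (double quotes denote
-- identifiers); embedded single quotes are escaped by doubling them.
ALTER TAG Classification SET COMMENT =
  'Tag tables or views with one of the following classification values: ''Confidential'', ''Restricted'', ''Internal'', ''Public''';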
With Atlan as the central metadata management system, there is no need for end-
users to log into different systems to find and understand the data. This reduces the
time to value of any data project.
Atlan’s data dictionary and a business glossary help crowdsource the tribal
knowledge and create a unified taxonomy of data assets across your Snowflake
warehouse. The business glossary maps physical data elements like databases, tables,
columns, and SQL queries to business terms, definitions, metrics, KPIs, calculations,
and reports.
Learn more: How to create and manage a business glossary in Atlan
Snowflake Data Governance: Atlan Data lineage helps you understand the journey
Atlan auto classifies PII data like email, name, phone number, and credit card
information.
As the number of regulations (GDPR, HIPAA, CCPA, etc.) companies must adhere
to grows, protecting sensitive data is becoming ever more challenging. Atlan
auto-propagates all classifications downstream so that every table derived
from the column is tagged with the same classification. Atlan also helps mask
sensitive data through hashing, nullifying, and redacting.
Learn more: How to add the classification to a Snowflake data asset
Snowflake Data Governance: Scale access control with Personas and Purposes.
Source: Atlan
Embedded collaboration: Context on demand without switching costs
Atlan’s reporting center gives you a quick snapshot of important governance metrics
like:
Total data assets crawled and cataloged — the breakdown of assets by data
sources
A breakdown of asset categories — SQL assets, BI assets, process assets, etc.
Asset drill down by Persona, Owners, and Groups.
Total assets that have been verified and certified.
Learn more: Track Snowflake data governance metrics on Atlan’s Reporting Center
“Atlan’s unique, collaboration-first approach for the modern data stack helps to
break down organizational silos and empower cross-functional teams to work
together to make better business decisions.”
Atlan is more than a metadata management and data cataloging tool. Atlan is built
by data engineers for solving the evolving needs of modern data teams. Atlan’s
capabilities include faster discovery, transparent data flow, robust governance, and
collaboration built on open infrastructure and an easy-to-use user interface.
The deep integration and the open API enable Atlan to solve other modern data
governance use cases across DataOps, workflow management, and pipeline
automation.
Atlan has been named a leader in The Forrester Wave™: Enterprise Data Catalogs for
DataOps, Q2 2022.
The report states,
“Atlan is the tool of choice for DataOps and data product deployment. Atlan’s
vision is to create frictionless data product deployment through a single
metadata and data automation platform.”
While this might seem self-explanatory, if your data is performant but not available, usable,
validated, or secure, then the data could be adding risk to your enterprise. This could take
the form of an inability to trust the quality and consistency of your data, or errors in your data
being made public through misreported revenue. Similarly, if you’re sacrificing
security in order to provide availability to your users or customers, you could be putting
your enterprise at risk.
Within this article, we cover each of these elements of data governance in depth — along
with the controls that the Snowflake Data Cloud provides.
What Is Data Governance?
In data engineering, the term data governance can take on a lot of meanings and is heavily
overloaded.
Data governance describes how enterprises put in place a collection of processes and
practices that effectively manage data, data assets, and metadata within the platform.
This includes controls for data availability, usability, integrity, and security.
While data governance has a much broader scope in totality and complexity, this post will
cover three of the many layers of data governance:
For this post, we are focusing on the yellow boxes in this process.
Data governance is iterative, not static.
You must regularly re-evaluate and update your controls as your enterprise changes.
Oftentimes, data stewards review existing practices, meet with business stakeholders to
determine the controls that are necessary, and audit the controls that are being enforced
within the enterprise.
Why Does It Matter?
Data governance controls ensure that data is consistent and dependable within the data’s
lifecycle. This includes everything from initial creation and ingestion from a source to
complex use cases such as a machine learning model result.
By enforcing specific standards for data governance, you ensure that quality data is being
used to drive key business decisions, inform customers, and empower users.
Without data governance processes and practices, you run the risk of data inconsistencies.
For example, if you’re pulling addresses from multiple sources that don’t follow a particular
standard, you will have to determine how to resolve these addresses. Are they duplicates?
Are they accurate? Without effective data governance in place, these questions frequently
require time and cost your enterprise money.
These same concerns apply to all kinds of data, from time zones on dates stored in your
database to missing data points between systems.
How Do I Build Data Controls in Snowflake?
While many data governance controls require manual auditing and
validation, Snowflake provides functionality to reduce the number of controls required. In
some instances, Snowflake can also automate data governance controls.
Let’s take a look at each of our previously mentioned data governance concerns and how
they’re managed within Snowflake.
Availability
It’s critical that your data be available when your business needs it.
Snowflake is available on AWS, Azure, and Google Cloud Platform, and it inherits many of
the high availability features that these cloud providers offer.
Snowflake builds on top of the availability of these cloud provider offerings with:
Fault tolerance of data nodes
Automatically spreading data across multiple availability zones
Time travel for accessing deleted data and performing data backups
Separation of compute and storage resources
Data replication and fail-over (cross-region and cross-cloud)
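Of the items above, Time Travel is the one you interact with most directly in SQL. The statements below are a minimal sketch, assuming a hypothetical orders table that is still within its data retention period. They are independent examples, not a script to run in sequence.

```sql
-- Query the table as it looked one hour ago (the offset is in seconds).
SELECT * FROM orders AT(OFFSET => -60*60);

-- Recover a table that was dropped by mistake (only valid after a DROP).
UNDROP TABLE orders;

-- Materialize a point-in-time copy as a separate table.
CREATE TABLE orders_restored CLONE orders AT(OFFSET => -60*60);
```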
Amidst a spree of data leaks over the last few years, data privacy and security have
This, topped up with a solid identity and access management framework, does well
with the application and infrastructure layer, but do all of these measures suffice for
While the data infrastructure can be secured using the tools mentioned above, data
itself needs several other layers of protection within the organization to prevent,
among other things, unauthorized access and accidental sharing of data. To do that,
data teams usually had to rely on an external data governance tool before some data
warehouse and data lake platform companies started offering an integrated data
SQL is something you can, more or less, take for granted in teams that directly deal
with data, which is why this makes using Snowflake a desirable proposition for
governance tool, but it does some of the heavy lifting in certain areas. Snowflake’s
governance features help you answer the following questions and more:
Which business categories do your data objects belong to? This is done
security features.
Which users are reading data from and writing data? Snowflake logs all
such activity in the ACCESS_HISTORY view.
How has a data asset (table or column) transformed in a data pipeline after
This article will introduce you to the data governance themes above with examples to
help you navigate your data governance journey on Snowflake. Let’s dive right in!
Tagging objects in Snowflake
Tags are general-purpose labeling constructs and provide a clean and highly flexible
like accounts and warehouses to fine-grained schema-level objects like tables, views,
and columns. Tags can be used in several ways, some prescribed by Snowflake as
listed below, while others are left for you to discover for your business use case:
In addition to governance, tags help with many other use cases, such as search and
discovery in data catalogs, business intelligence tools, etc. Many of these tools, such
as Select Star, Immuta, and Collibra, have a two-way tag sync functionality, where you
can apply tags from your data catalogs and they’ll get updated in Snowflake. This
deserves mention because, in most cases, data catalogs only extract metadata from
data sources and don’t write back to them. Think of this as out-of-the-box reverse ETL
for metadata, i.e., the data catalog enriching the data source back.
Identifying sensitive data
While tags allow you to arbitrarily create labels for your data assets, in many cases,
you would also need to classify your data assets based on their content. To do that,
you can use Snowflake’s data classification feature. The classification process requires
running to use this feature. You can use this feature to:
The core step of the process is extracting semantic categories that represent personal
attributes, such as name, address, age, salary, and so on. To do that, you can call
the EXTRACT_SEMANTIC_CATEGORIES function, which uses a sample of the column
data to give you the probability of each column containing sensitive data.
Aside from that, it provides attributes like confidence and coverage, and two
system-defined tags, SEMANTIC_CATEGORY and PRIVACY_CATEGORY.
Data classification isn’t a fully automatic process; it goes through a cycle of analysis,
review, and application, where Snowflake takes care of steps one and three. However,
it falls upon the data engineers to review whether Snowflake’s correct interpretation
parameters to automatically associate the tags to the table columns. You can use these
While RBAC (role-based access control) and DAC (discretionary access control) allow
you to manage object-level access in Snowflake, there are two other methods that you
can use to restrict access to data in tabular objects, such as tables, views, and
materialized views. Some examples of where granular access to data would make
Restrict access to sales records for one region by members of other regions
requirements
Protect PII and PHI data by masking fields partly or in full based on
various factors
With row access policies and masking policies, you can restrict access to rows and
columns. Both these methods allow for on-the-fly evaluation of policies. Based on
these evaluations, you’ll be granted access to specific rows or columns. Row access
policies use both Conditional Expression functions and Context Functions for policy
enforcement. For instance, you can check for the role a user is using to access the data
by using the CURRENT_ROLE() context function.
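Here is a sketch of a row access policy for the regional sales case, combining CURRENT_ROLE() with a lookup against a mapping table. The tables sales and region_role_map, the role SALES_ADMIN, and the policy name are assumptions made for illustration.

```sql
-- Hypothetical mapping of roles to the regions they may see.
CREATE TABLE IF NOT EXISTS region_role_map (
  role_name STRING,
  region    STRING
);

-- The policy returns TRUE for rows the current role is allowed to see.
CREATE OR REPLACE ROW ACCESS POLICY sales_region_policy
  AS (sales_region STRING) RETURNS BOOLEAN ->
    CURRENT_ROLE() = 'SALES_ADMIN'          -- admins see every region
    OR EXISTS (
      SELECT 1
      FROM region_role_map AS m
      WHERE m.role_name = CURRENT_ROLE()
        AND m.region    = sales_region
    );

-- Bind the policy to the table's region column.
ALTER TABLE sales ADD ROW ACCESS POLICY sales_region_policy ON (region);
```

Keeping the role-to-region mapping in a table means access can be changed with ordinary INSERT and DELETE statements, without touching the policy itself.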
Mapping data lineage
While implementing the modern data stack, it’s usual for a data cataloging or lineage
tool to be set up to allow business and technical users to understand the flow of data
and how it’s been transformed from source to target. Snowflake’s data lineage
Visualize how the data loads from sources, integrates with other sources,
Debug ETL issues that end up in stale, erroneous, and incomplete data
Snowflake’s data lineage capabilities rest upon two distinct features; the metadata for
both is captured in the ACCOUNT_USAGE schema. The OBJECT_DEPENDENCIES view
allows you to look at the dependency graph between objects, i.e., it shows you which
objects need which other objects for proper functioning. This is helpful when
planning to handle the changes to an upstream object and the impact those changes
which tables, views, materialized views, and columns were accessed. It also provides
the text of the query in full. Using access history, you can also use external tools with
their native SQL parsers to build or enrich data lineage as and when required.
Conclusion
This article walked you through Snowflake’s native data governance features. It also
talked about identifying and protecting sensitive data, restricting access to data using
various control measures, and using the metadata to build the lineage graph. For
more, check out Snowflake’s guide to data governance, which talks about these
capabilities in much more detail. Also, check out the latest data governance-related