
Data Governance in Snowflake

Snowflake provides industry-leading features that ensure the highest levels of governance for
your account and users, as well as all the data you store and access in Snowflake.

Column-level Security
Allows the application of a masking policy to a column within a table or view.

Row-level Security
Allows the application of a row access policy to a table or view to determine which
rows are visible in the query result.

Object Tagging
Allows the tracking of sensitive data for compliance, discovery, protection, and
resource usage.

Tag-based masking policies


Allows protecting column data by assigning a masking policy to a tag and then setting
the tag on a database object or the Snowflake account.

Data Classification
Allows categorizing potentially personal and/or sensitive data to support compliance
and privacy regulations.

Access History
Allows the auditing of the user access history through the Account
Usage ACCESS_HISTORY View.

Object Dependencies
Allows the auditing of how one object references another object by its metadata (e.g.
creating a view depends on a table name and column names) through the Account
Usage OBJECT_DEPENDENCIES view.

Data Governance area in Snowsight


Allows using the Monitoring » Governance area to monitor and report on the usage
of policies and tags with tables, views, and columns using two different
interfaces: Dashboard and Tagged Objects. For details, see:

 Create and assign tags


 Monitor tags with Snowsight
 Monitor masking policies with Snowsight
 Monitor row access policies with Snowsight
What is Snowflake Data Governance?
To understand Snowflake Data Governance, it is necessary first
to understand the concept of Data Governance in general.

What is Data Governance?


Data governance is the set of practices, processes, and policies
that control how data is gathered, stored, used, and shared. It
includes setting rules around who can access data and under what
conditions they are allowed to do so. Data governance also
establishes processes to ensure that all of an organization's data
is accurate and safe, so that users can make decisions based on
reliable information.
Data governance is critical for safeguarding sensitive data and
ensuring it is secure, compliant, and of high quality. It lowers
the risk of data breaches and misuse while improving data quality,
making it easier to discover relevant insights. Good data
governance is essential in regulated industries such as
healthcare, banking, and finance because it ensures data
traceability and prevents unauthorized access or deletion.

What are the benefits of having a data governance strategy in place?
 Data reliability: Good practices for data governance help
make sure data is correct, consistent, and reliable, allowing
businesses to make better-informed decisions based on
reliable data.
 Data compliance with regulations: Businesses are bound by
regulations/standards that govern how data should be kept
and secured. Data governance ensures these requirements
are followed, which can help prevent legal complications
and massive penalties/fines.
 Data security: Data breaches can be costly and detrimental
to a business's reputation. Effective data governance
practices can improve data security by controlling access to
sensitive information and protecting it from unauthorized
disclosure or data misuse.
 Data efficiency and productivity: Data governance helps
ensure that data is available when and where it is needed,
which cuts down on wasted work and might boost
productivity.
 Better decision-making: Data governance helps
businesses make better decisions and achieve their
objectives more effectively by providing them with reliable
and accurate data.
Now, let's jump back to understanding the concept of Snowflake
data governance.

What is Snowflake data governance?


Snowflake data governance refers to the policies, procedures,
and practices that can be implemented to guarantee proper
management and control of data stored on the Snowflake
Platform. To keep the integrity and value of data, Snowflake data
governance needs a full-scale approach that includes data
security, data quality, and data management.
Snowflake data governance is fundamentally about creating and
following rules about accessing, protecting, and using data. This
includes establishing roles and permissions to manage who can
access and update data on the Snowflake environment. Users can
also leverage the powerful features provided by Snowflake, such
as Virtual Private Snowflake (VPS), and third-party services,
such as PrivateLink (not affiliated with Snowflake), to safeguard
their data better and make sure only authorized users are allowed
to access it.

Overview of built-in Snowflake governance features:

1) Column-level security
The column-level security feature in Snowflake is only available in
the Enterprise edition or higher. It provides enhanced measures
to safeguard sensitive data in tables and views, and it offers two
distinct capabilities:
 Dynamic Data Masking hides plain-text data in table and
view columns based on masking policies applied at query
runtime. These schema-level policies prevent unauthorized
access to sensitive data while still letting authorized users
see it. The policies use conditions and functions to transform
the data when the conditions are met.
 External Tokenization is a feature that enables accounts to
tokenize data before loading it into Snowflake and
detokenize the data at query runtime. Tokenization is the
process of removing sensitive data by replacing it with an
undecipherable token. External tokenization makes use of
masking policies with external functions. Before data can
be loaded into Snowflake, it must be tokenized by a third-
party tokenization service. At query execution, Snowflake
uses the external function to make an API call to the
tokenization provider, which then analyzes an externally-
created tokenization policy before returning tokenized or
detokenized data depending on the masking policy
conditions.
What is a Masking Policy?
Masking policies are schema-level objects that protect sensitive
data from unwanted access while allowing authorized users to
view the sensitive data during query execution. These masking
policies are made up of conditions and functions that change
data during query execution when the given criteria are met.
Masking policies can be applied to one or more columns in a table
or view that have the same data type. Masking policy conditions
can be expressed using Conditional Expression Functions and
Context Functions or by querying on a custom table.

In short, Snowflake's column-level security enables users to apply
masking policies to protect sensitive data in tables or views. This
feature grants access and visibility only to authorized users who
need it, through a flexible policy-driven approach that allows
secure control over the data.
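As a concrete illustration, a minimal dynamic data masking setup
might look like the sketch below. The policy, table, and column
names are hypothetical, and the roles you check will depend on
your own role hierarchy.

-- Masking policy that reveals email addresses only to authorized roles
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST_FULL', 'SYSADMIN') THEN val
    ELSE '*** MASKED ***'
  END;

-- Attach the policy to a column; other roles now see the masked value at query time
ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;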

2) Row-level access policies/security


Row-level security is a feature in Snowflake that enables
administrators to limit access to particular rows in tables/views
based on a set of policies defined in the schema. These policies
can be basic or sophisticated, depending on the specific security
requirements.
Note: The row-level security feature in Snowflake is also only available in
the Enterprise edition or higher.
A row access policy is also a schema-level object that controls
whether a given row in a table or view is accessible through
SELECT operations or by UPDATE, DELETE, and MERGE
operations. The policy can include conditions and functions to
transform the data at query execution time if the conditions are
satisfied. This policy-driven approach is intended to support
segregation of duties, enabling teams (especially Snowflake
governance teams) to define rules that limit the exposure
of sensitive data. Typically, the object owner or a role with
the OWNERSHIP privilege on the object has complete access to
the underlying data. However, row access policies can override this
access and limit the visibility of specific rows in the query result.
You can add a row access policy to a table or view either when
the object is created or after the object is created. The policy
admin can easily apply row access policies to tables and views.
Check out this official Snowflake documentation to learn more about the Row
level policy and how it works.
TLDR; Snowflake's row-level security is a powerful way to control
access to sensitive data at a granular level. It ensures that only
authorized users or roles can see or access specific rows of data
in a table or view.
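For illustration only, a simple row access policy could be sketched
as follows; the table, policy, and role names are made up, and
production policies often look up entitlements in a separate
mapping table instead of hard-coding values.

-- Only the SALES_MANAGER role sees every row; other roles see only EMEA rows
CREATE OR REPLACE ROW ACCESS POLICY region_policy AS (region STRING) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'SALES_MANAGER'
  OR region = 'EMEA';

-- Attach the policy to an existing table on its region column
ALTER TABLE sales ADD ROW ACCESS POLICY region_policy ON (region);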

3) Object tagging
The object tagging feature in Snowflake is also only available in
the Enterprise edition or higher. Object tags are simply labels,
made up of key-value pairs, that allow you to assign metadata to
Snowflake objects such as tables, views, and schemas.
These tags can be used to categorize and describe Snowflake
objects, making them easier to manage and organize.
Check out this official Snowflake documentation to learn more about the in-
depth process of Object tagging and its benefits.
Snowflake object tagging offers several benefits, one of the
main ones being tag inheritance: a tag applied to a database or
schema is inherited by the objects within it. Beyond that, tags
help with tracking and finding sensitive data, classifying data and
objects, tracking resource consumption, and supporting row access
and tag-based masking policies.
TLDR; Object tagging in Snowflake enables efficient data
categorization and organization using labels called "tags,"
providing benefits such as tracking sensitive data, implementing
access policies, and simplifying Snowflake governance.
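A small, hypothetical example of creating and attaching a tag (the
tag, table, and column names are placeholders):

-- Tag with a constrained set of allowed values
CREATE OR REPLACE TAG cost_center ALLOWED_VALUES 'finance', 'marketing', 'engineering';

-- Attach the tag at the table level and at the column level
ALTER TABLE orders SET TAG cost_center = 'finance';
ALTER TABLE customers MODIFY COLUMN email SET TAG cost_center = 'marketing';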

4) Object tag-based masking policies.


Tag-based masking policies in Snowflake make it possible to
apply a masking policy automatically to all columns with a
specific tag. This feature makes protecting data easier because it
eliminates the need to apply a masking policy to each column
manually. A tag-based masking policy is created using
the ALTER TAG command, which allows you to associate a
masking policy with a specific tag.
Whenever a column is tagged with the tag associated with a
masking policy, the policy is automatically applied to that
particular column. The masking policy will only get applied if the
column's datatype matches the datatype specified in the masking
policy signature. If a column has both a directly assigned
masking policy and a tag-based masking policy, the directly
assigned policy takes precedence. Also, it is recommended to
create a generic masking policy for each data type supported by
Snowflake, such as STRING, NUMBER, and TIMESTAMP; this
policy should specify how authorized roles can see the raw data
while unauthorized roles can see a fixed masked value. This
simplifies the initial process of column data protection.
Learn more about it from here: Snowflake official documentation
TLDR; Tag-based masking policies make protecting data easier by
applying a masking policy automatically to all columns that have
a certain tag; this feature ensures consistent data protection
across all columns that share the same tag.
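Building on the earlier sketches, a tag-based masking policy might
be wired up as follows. The tag name is hypothetical, and
email_mask stands in for any masking policy whose signature
matches the tagged columns' data type.

-- Hypothetical tag for PII columns
CREATE OR REPLACE TAG pii_tag;

-- Associate a masking policy with the tag
ALTER TAG pii_tag SET MASKING POLICY email_mask;

-- Setting the tag on a column now applies the masking policy automatically
ALTER TABLE customers MODIFY COLUMN email SET TAG pii_tag = 'email';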

5) Data classification
The data classification feature in Snowflake is also only available in
the Enterprise edition or higher. It allows users to automatically
identify and classify columns in their tables that contain personal
or sensitive data.
The classification process involves three main
steps: analyze, review, and apply. The first step, analyze,
involves calling the EXTRACT_SEMANTIC_CATEGORIES function
to analyze the columns and output possible categories and
associated probabilities. The second step, 'review,' involves
validating the results, while the third step, 'apply,' involves
assigning system tags to columns containing personal or
sensitive data.
Check out the official Snowflake documentation to learn more about data
classification.
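The analyze and apply steps can be scripted roughly as follows; the
fully qualified table name is a placeholder, and the classification
output should be reviewed before the system tags are applied.

-- Analyze: return candidate categories and probabilities for each column (JSON output)
SELECT EXTRACT_SEMANTIC_CATEGORIES('mydb.public.customers');

-- Apply: assign the corresponding system tags based on the classification result
CALL ASSOCIATE_SEMANTIC_CATEGORY_TAGS(
  'mydb.public.customers',
  EXTRACT_SEMANTIC_CATEGORIES('mydb.public.customers'));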

6) Object dependencies
Object Dependencies is a built-in Snowflake governance feature
that allows users to identify dependencies among Snowflake
objects.
In Snowflake, an object dependency is established whenever an
existing object must reference metadata for itself or for at
least one other object. A dependency can be triggered by
an object's name, its ID value, or both.
Object Dependencies enables users to view and track these
dependencies between Snowflake objects, which is particularly
useful for impact analysis, data integrity assurance, and
compliance purposes.
Learn more about it from here: Snowflake official documentation
Object Dependencies are a really important feature for
compliance officers and auditors who need to trace data from a
given object to its original data source to meet regulatory
requirements.
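As a rough sketch, a query along these lines lists what a given
view depends on; the view name is a placeholder, and some latency
applies because Account Usage views are not real time.

-- Which objects does the view SALES_REPORT reference?
SELECT referenced_database,
       referenced_schema,
       referenced_object_name,
       referenced_object_domain
FROM snowflake.account_usage.object_dependencies
WHERE referencing_object_name = 'SALES_REPORT'
  AND referencing_object_domain = 'VIEW';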

7) Access History
The Access History feature in Snowflake is also only available in
the Enterprise edition or higher. Access History is a built-in
Snowflake governance feature that provides a record of all user
activity related to data access and modification within a
Snowflake account. Essentially, it tracks user queries that read
column data and SQL statements that write data (INSERT,
UPDATE, DELETE). The Access History feature is particularly
useful for regulatory compliance auditing and also provides
insights into frequently accessed tables and columns.
The Access History feature in Snowflake is available through the
Account Usage ACCESS_HISTORY view.
Check out the official Snowflake documentation to learn more about Access
History.
TLDR; Access history features help users easily maintain a
detailed record of all data access and modification events within
their Snowflake accounts.
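For example, a query like the sketch below surfaces who read a
given table over the last 30 days by flattening the
BASE_OBJECTS_ACCESSED column; the table name is a placeholder.

-- Who read from MYDB.PUBLIC.CUSTOMERS in the last 30 days?
SELECT ah.user_name,
       ah.query_start_time,
       obj.value:objectName::STRING AS object_name
FROM snowflake.account_usage.access_history ah,
     LATERAL FLATTEN(input => ah.base_objects_accessed) obj
WHERE obj.value:objectName::STRING = 'MYDB.PUBLIC.CUSTOMERS'
  AND ah.query_start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP());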

Best Practices for Implementing Snowflake Data Governance

1) Use Snowflake's built-in governance features effectively


Snowflake offers a range of built-in governance features that can
be used to ensure that data is properly classified, secured, and
audited. These features include object tagging, dynamic data
masking, row access policies, and object dependencies. It is
crucial to understand and use these features effectively to ensure
that data is appropriately governed.

2) Data policies and procedures


Data policies and procedures are essential for ensuring data is
managed and governed effectively. These policies and procedures
should cover various areas such as data quality, data privacy,
data security, data retention, and data access. The policies and
procedures should be reviewed and updated regularly to ensure
that they remain relevant and effective.

3) Establishing an Effective Snowflake Data Governance Team


To establish effective Snowflake data governance, it is crucial to
create a dedicated Governance Council/Committee that will serve
as the governance team. This team will develop and enforce
cross-functional rules and procedures to ensure data is managed
effectively. It is important that each team member has a clearly
defined role and responsibility.
Here are some essential roles to consider:
 Stakeholders
 Data stewards
 Data managers
 Data custodians
 Compliance officers
 Data Architects
 Information Security Officers
 Data Quality Analysts
So, by forming a Snowflake governance team with these key
roles, businesses/organizations can ensure that their Snowflake
data governance program is effective and aligned with the needs
of the business.

4) Develop a data governance framework


A data governance framework should be developed to ensure that
data is managed and governed in a consistent and structured
manner. The framework should include policies, procedures,
guidelines, and standards used to manage and govern data across
the organization. The framework should also include roles and
responsibilities for data governance and a process for managing
data governance issues and escalations.

5) Implement Security measures


Security measures are essential for protecting data from
unauthorized access or breaches. Organizations/businesses
should implement various security measures such as access
controls, encryption, data masking, and more! It is also crucial to
establish a security monitoring and incident response process to
ensure that any security incidents are detected and responded to
in a timely manner.

6) Maintain Data Quality standards


Maintaining data quality standards is important for ensuring that
data is accurate, consistent and reliable. Organizations should
establish data quality standards and implement processes to
monitor and maintain data quality. This includes processes for
data validation, data cleansing and data enrichment.

7) Implementing automation and monitoring tools


Automation and monitoring tools can improve the efficiency and
effectiveness of governance processes. For example, automated
processes can be used to apply data classification tags to
objects based on specific criteria or to enforce row-level access
policies, whereas monitoring tools can be used to track access to
data, detect security incidents, and monitor data quality.

Tools Used for Effective Snowflake Governance

1) Collibra
Collibra is an enterprise-oriented data governance tool that helps
businesses and organizations understand and manage their data
assets. It enables businesses and organizations to create an
inventory of data assets, capture metadata about them, and govern
these assets to ensure regulatory compliance. The tool is
primarily used by IT, data owners, and administrators in charge of
data protection and compliance to inventory and track how data
is used. Collibra's aim is to protect data, ensure it is
appropriately governed and used, and eliminate potential fines
and risks from a lack of regulatory compliance.
By integrating Collibra with Snowflake,
enterprises can effectively manage their data assets within
Snowflake by leveraging Collibra's governance capabilities. This
combination enables data democratization and enterprise-wide
collaboration, while also enabling businesses to easily discover
and scale access to reliable data. The unique features and
complementary capabilities of both platforms empower
businesses to increase data usage, collaboration, and ultimately
deliver faster insights and innovation, all while ensuring proper
governance of their data within Snowflake.
Collibra (Source: collibra.com)
Collibra offers six key functional areas to aid in data governance:
 Collibra Data Quality & Observability: Monitors data quality
and pipeline reliability to aid in remedying anomalies.
 Collibra Data Catalog: A single solution for finding and
understanding data from various sources.
 Data Governance: A location for finding, understanding, and
creating a shared language around data for all individuals
within an organization.
 Data Lineage: Automatically maps relationships between
systems, applications, and reports to provide a
comprehensive view of data across the enterprise.
 Collibra Protect: Allows for the discovery, definition, and
protection of data from a unified platform.
 Data Privacy: Centralizes, automates, and guides workflows
to encourage collaboration and address global regulatory
requirements for data privacy.

2) Alation
Alation is a sophisticated data catalog solution designed for
enterprise-level organizations, acting as a unified reference for all
their data needs. It automatically scans and indexes over 60
distinct data sources, encompassing on-premises databases,
cloud storage, file systems, and business intelligence tools.
Utilizing query log ingestion, Alation analyzes queries to pinpoint
the most frequently accessed data and its primary users. This
information forms the foundation of the catalog, which allows
users to collaborate and contextualize the data. With the catalog
established, data analysts and scientists can swiftly locate,
scrutinize, validate, and repurpose data, enhancing their
productivity.
However, Alation's capabilities extend beyond a mere data
catalog solution. It also serves as a data governance platform,
enabling analytics teams to effectively manage and enforce
policies for data consumers. Through Alation's comprehensive
metadata management, organizations can establish and enforce
policies, monitor usage, and maintain compliance with data
privacy regulations. Its adaptable workflows and dashboards
empower governance teams to effortlessly create, modify, and
disseminate policies, ensuring responsible data usage across the
enterprise.
Alation is an optimal solution for Snowflake data governance, as
it centralizes data, fosters collaboration, and enforces adherence
to data access and usage policies. This leads to heightened
productivity and innovation, making Alation an invaluable
resource for organizations seeking efficient Snowflake data
governance.
Alation (Source: Alation)
Key benefits of using Alation:
 Boost analyst productivity
 Improve data comprehension
 Foster collaboration
 Minimize the risk of data misuse
 Eliminate IT bottlenecks
 Easily expose and interpret data policies
Alation offers various solutions to improve productivity, accuracy
and data-driven decision-making. These include:
 Alation Data Catalog: Improves the efficiency of analysts
and the accuracy of analytics, empowering all members of
an organization to find, understand, and govern data
efficiently.
 Alation Connectors: A wide range of native data source
connectors that speed up the process of gaining insights and
enable data intelligence throughout the enterprise. (Additional
data sources can also be connected with the Open Connector
Framework SDK.)
 Alation Platform: An open and intelligent solution for various
metadata management applications, including search and
discovery, data governance, and digital transformation.
 Alation Data Governance App: Simplifies secure access to
the best data in hybrid and multi-cloud environments.
 Alation Cloud Service: Offers businesses and organizations
the option to manage their data catalog on their own or have
it managed for them in the cloud.

Conclusion
Snowflake data governance is essential for ensuring data quality,
security, and accuracy. Snowflake provides a comprehensive set
of features to help businesses implement data governance, but
these features must be combined with an effective strategy. In
this article, we defined Snowflake data governance, discussed
best practices for implementation, and provided an overview of
the built-in and third-party tools available to support Snowflake
data governance.
You can think of Snowflake Governance as a fence protecting
your data garden from any trespassers. Use it to your full
advantage to create reliable data security measures and data
access controls, safeguarding the privacy of your sensitive data
stored in Snowflake.

FAQs
What is Snowflake data governance?
Snowflake data governance refers to the policies, procedures,
and practices implemented to manage and control data stored on
the Snowflake Platform. It ensures data integrity, security, and
management.
What are the advantages of having a data governance strategy?
Advantages of a data governance strategy include improved data
reliability, compliance with regulations, enhanced data security,
increased data efficiency and productivity, and better decision-
making based on accurate data.
What are the key features of Snowflake's built-in data
governance?
Snowflake's built-in data governance features include column-
level security, row-level access policies/security, object tagging,
object tag-based masking policies, data classification, object
dependencies, and access history.
What are some best practices for implementing Snowflake data
governance?
Best practices include effectively using Snowflake's built-in
governance features, establishing data policies and procedures,
forming a governance team, developing a data governance
framework, implementing security measures, maintaining data
quality standards, and leveraging automation and monitoring
tools.
Which tools can be used for Snowflake data governance?
Snowflake can be integrated with a range of security and
governance tools, such as Collibra, Alation, and others.
==

Snowflake Data Governance: Features, Frameworks & Best practices


Updated August 01st, 2023


Quick answer:
TL;DR? Here are the highlights of this article and what to expect from it:
 Snowflake offers data governance capabilities such as:
 Column-level security
 Row-level access
 Object tag-based masking
 Data classification
 OAuth
 Data governance in Snowflake can be improved with a Snowflake-validated
data governance solution. Such a solution would:
 Handle governance for data from multiple sources (non-Snowflake)
 Enable data lineage
 Enhance data discovery
 Embed collaboration
 Empower cross-functional teams
 This article delves into the specifics of data governance for Snowflake assets
and improving it so that you can manage your entire data estate.
 Looking for a data catalog? Make sure to check out Atlan — approved by the
Snowflake Ready Validation Program. Book a demo or take a guided product
tour.

Atlan: A modern snowflake data governance workspace


Atlan is a single data governance plane for all your Snowflake data assets.
Atlan helps build a robust data governance system by:
 Protecting sensitive data at scale.
 Automating consistent data access policies across your entire data ecosystem.
 Providing transparency into data lifecycle through lineage.
 Establishing trust in data through context and collaboration.

Table of contents
1. What is Snowflake?
2. What is data governance in Snowflake?
3. Benefits of governing Snowflake data assets
4. Data governance capabilities in Snowflake
5. 5 key challenges of implementing data governance in Snowflake
6. Snowflake data governance with Atlan
7. Atlan: A Snowflake validated data governance solution

What is Snowflake?
Snowflake’s platform enables a wide variety of workloads and applications on any
cloud, including data warehouses, data lakes, data pipelines, and collaboration as
well as business intelligence, data science, and data analytics applications.
Quoting Snowflake’s website:

Snowflake is a fully managed service that’s simple to use but can power a near-
unlimited number of concurrent workloads. Snowflake is your solution for data
warehousing, data lakes, data engineering, data science, data application
development, and securely sharing and consuming shared data.

Snowflake stands out because it decouples both storage and compute. This means
you can spin up and down machines on demand based on the analytics workload.
Quoting Snowflake from their S-1 form,

Our platform solves the decades-old problem of data silos and data
governance. Leveraging the elasticity and performance of the public cloud, our
platform enables customers to unify and query data to support a wide variety
of use cases. It also provides frictionless and governed data access so users can
securely share data inside and outside of their organizations, generally without
copying or moving the underlying data.

Snowflake is a cloud-agnostic platform that can distribute data across regions as well
as across cloud providers such as AWS, Azure, and GCP for data storage. Some of
the customers of Snowflake include Dropbox, Doordash, Hubspot, Adobe, and Fitbit.

Snowflake Architecture diagram. Source: Snowflake


What is data governance in Snowflake?
Data governance is a set of standards, procedures, models, and guidelines around
people, processes, and technologies that detail how data is to be properly managed,
accessed, and used. Good data governance helps ensure the availability, quality,
integrity, and security of organizational data.
Quoting The Data Governance Institute (DGI):

Data Governance is a system of decision rights and accountabilities for
information-related processes, executed according to agreed-upon models
which describe who can take what actions with what information, and when,
under what circumstances, using what methods.

Benefits of governing Snowflake data assets


Quoting McKinsey & Company:

Leading firms have eliminated millions of dollars in cost from their data
ecosystems and enabled digital and analytics use cases worth millions or even
billions of dollars. Data governance is one of the top three differences between
firms that capture this value and firms that don’t.

Here are the benefits of governing Snowflake data assets:


Data quality

Data governance ensures that data throughout its lifecycle is accurate, consistent,
fresh, and complete.
Data access

Data governance helps provide flexible and scalable access policies to data users.
Security and compliance

Data governance monitors and reduces the risk of exposing private data. It enables
data assets to be auditable across their life cycle, thus helping businesses comply
with regulations like GDPR, HIPAA, etc.
Life cycle management

Governance helps set policies for data creation, data retention, and data deletion.
Trust
Governance improves the trust and confidence in data and thus helps increase the
ROI of your data assets.

Data quality and data availability: Key benefits of having a robust data governance
program. Source: McKinsey Digital


Data governance capabilities in Snowflake


Snowflake provides the following data governance features out of the box:

1. Column-level security
2. Row-level access policies
3. Object tag-based masking policies
4. Data classification
5. Object dependencies
6. OAuth
1. Column-level governance with data masking

Column-level governance lets you apply a data masking policy to columns within a table
or a view through Dynamic Data Masking and External Tokenization.
Learn more: Column-level data governance for Snowflake tables and views
2. Row-level governance

Snowflake uses row-level policies to control what rows are returned in the query
result — SELECT, UPDATE, DELETE, and MERGE statements.
Learn more: Row access policies for Snowflake data
3. Object tagging

Data stewards use tags to track sensitive data for security, privacy, compliance,
discovery, and resource usage. Tags become powerful when you attach access policies
to them. This makes data governance in Snowflake easier to manage and scale.
Learn more: Object tag-based access policies
4. Data classification
Snowflake samples your data assets and classifies them by assigning tags. Broad
classification use cases include identifying PII, informing data access policies, and
anonymization.
Learn more: Introduction to Snowflake data classification
5. Object dependencies

Object dependencies enable you to track dependencies within Snowflake data assets.
Dependencies are useful for performing an impact analysis, being compliant, and
maintaining data integrity.
Learn more: Object dependencies and their use-cases
6. OAuth

Snowflake supports both built-in and third-party OAuth for the authentication and
authorization of users.
Learn more: Introduction to OAuth in Snowflake
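For instance, a custom OAuth client is registered through a security integration; the
statement below is only a sketch, and the integration name, client type, and redirect URI
are placeholders you would replace with your own values.

-- Minimal custom OAuth security integration (parameter values are placeholders)
CREATE SECURITY INTEGRATION my_oauth_client
  TYPE = OAUTH
  ENABLED = TRUE
  OAUTH_CLIENT = CUSTOM
  OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
  OAUTH_REDIRECT_URI = 'https://myapp.example.com/oauth/callback';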

5 key challenges of implementing data governance in Snowflake


The following are the 5 key challenges of implementing effective
governance across Snowflake data assets:
1. Metadata scope limitations
2. Governance for data from multiple sources (non-Snowflake)
3. Lack of data lineage
4. Data discovery challenges
5. Governance management for non-technical / business users
1. Metadata scope limitations

The scope of what is considered and tracked as metadata is expanding, especially
with the adoption of the modern data stack. Teams can now use ETL logs, data
quality metrics, workflow errors, etc. as metadata to improve and automate data
governance. So you’ll need a centralized metadata management tool to track and
take action on metadata.
2. Governance for data from multiple sources (non-Snowflake)

Data governance, especially on the cloud, proliferates across multiple tools and
processes ranging from ingestion, ETL, data quality, and business intelligence (BI). So
the need for a centralized metadata and data governance platform is a must-have to
get a complete hold on governance.
3. Lack of data lineage
Data governance in essence ensures that high-quality data exists for analysis
throughout the life-cycle of the data. Lineage helps track this by helping visualize the
journey of the data from the source to the dashboard.
4. Data discovery challenges

The two core components of a good data governance system are “availability” and
“usability” of data. The lack of a data catalog and a business glossary in Snowflake
might make it difficult to find, use, and collaborate on data.
5. Governance management for non-technical/business users

Data governance setup and management on Snowflake are entirely done through
writing SQL queries. This makes it harder for non-technical/business users to use
Snowflake for governance.
CREATE OR REPLACE TAG Classification;
ALTER TAG Classification SET COMMENT =
  'Tag Tables or Views with one of the following classification values:
   ''Confidential'', ''Restricted'', ''Internal'', ''Public''';

CREATE OR REPLACE TAG PII;
ALTER TAG PII SET COMMENT =
  'Tag Tables or Views with PII with one or more of the following values:
   ''Phone'', ''Email'', ''Address''';

CREATE OR REPLACE TAG SENSITIVE_PII;
ALTER TAG SENSITIVE_PII SET COMMENT =
  'Tag Tables or Views with Sensitive PII with one or more of the following values:
   ''SSN'', ''DL'', ''Passport'', ''Financial'', ''Medical''';
Creating a PII tag classification on Snowflake by writing SQL queries.
Source: The Snowflake Definitive Guide, by Joyce Kay Avila.

Snowflake data governance with Atlan: Flexible and scalable


Atlan helps transform data governance from a complex, bureaucratic process into a
simple, community-driven approach. With custom classification, programmable PII
bots, and automated classification propagation, Atlan makes data governance
management easy at scale.
Listed below are some of the key data governance use cases that Atlan helps
solve:
Data discovery and search: Catalog
Atlan crawls and catalogs all your data assets in your Snowflake data warehouse and
drives self-service data discovery — Atlan also crawls other data sources like
BigQuery, Redshift, Databricks, etc.
Atlan’s Google-like search lets you find relevant data assets, documents, BI
dashboards, and queries and understand the associated context between different
data assets in a business-friendly user interface.
The powerful search filters let you slice and dice your search results based on data
sources, tables, columns, business glossary, owners, and classification tags.
Learn more: How to search and discover assets in Atlan

Snowflake Data Governance: Atlan Data Catalog facilitates metadata discovery
across your Snowflake warehouse. Source: Atlan


Single source of truth: Centralized documentation

With Atlan as the central metadata management system, there is no need for end-
users to log into different systems to find and understand the data. This reduces the
time to value of any data project.
Atlan’s data dictionary and a business glossary help crowdsource the tribal
knowledge and create a unified taxonomy of data assets across your Snowflake
warehouse. The business glossary maps physical data elements like databases, tables,
columns, and SQL queries to business terms, definitions, metrics, KPIs, calculations,
and reports.
Learn more: How to create and manage a business glossary in Atlan

Snowflake Data Governance: A centralized knowledge bank that explains key
business terms and concepts. Source: Atlan


Transparency and Traceability: Data Lineage

Atlan's automated data lineage helps businesses meet regulatory requirements
with ease. Key compliance needs like data provenance, data usage, data access, data
transformation, and data archival/deletion can be tracked end-to-end from the
source to the BI dashboard using data lineage.
Atlan actively parses SQL queries and builds column-level lineage. This granular
visualization helps identify both upstream and downstream dependencies of a data
asset. Data users can visually backtrack and identify the source and logic of a data
asset, thereby reducing the total volume of data requests and increasing
analyst productivity.
Learn more: End-to-end life cycle visualization with Atlan’s data lineage

Snowflake Data Governance: Atlan data lineage helps you understand the journey
of the data from its data source to dashboards. Source: Atlan


DataOps: Agile data science for scale
Data lineage helps DataOps engineers understand the data dependencies better,
thus helping with better pipeline workflow design. Tracing data across its lifecycle
helps identify and fix issues with root cause analysis (RCA) and impact analysis.
Technical metadata like workflow errors and data quality logs help alert downstream
users of data reliability.
Learn more: Solve DataOps use cases with Atlan’s custom metadata


Classification of sensitive data: Be compliant-ready

Atlan auto classifies PII data like email, name, phone number, and credit card
information.
As the number of regulations (GDPR, HIPAA, CCPA, etc.) companies must adhere
to keeps growing, protecting sensitive data is becoming ever more challenging. Atlan
auto-propagates all classifications downstream so that every table derived
from the column is tagged with the same classification. Atlan also helps mask
sensitive data through hashing, nullifying, and redacting.
Learn more: How to add the classification to a Snowflake data asset

Snowflake Data Governance: Automate classification and access control through
data governance. Source: Atlan


Scale governance better: Granular access control
Atlan lets you manage data access governance through role-based access —
Administrators, members, and guests — and access policies that control access to
certain data assets. Access policies let you define granular access via:
 Metadata policies: Control who can view, update, add and delete metadata.
 Data policies: Control what users can do with the data — querying data,
hiding data, etc.
 Glossary policies: Create, update, delete, and add classifications to business
terms.
To scale data governance better, Atlan also lets you define access policies and user
experience based on:
 Personas: Personas define policies to control which users can (or cannot) take
certain actions on specific assets — examples include marketing, sales, data
engineering, etc.
 Purposes: Purposes control permissions based on which users and groups can
view, edit, and query assets tagged with that classification. For example, a
finance team can tag all the assets related to their analysis as “finance” and
create policies around the tag.
Atlan integrates with your existing single sign-on and user management systems like
Okta, JumpCloud, OneLogin, Google, and Azure SSO.
Learn more: Identity and access management in Atlan for Snowflake assets
Learn more: Seamlessly scale governance for Snowflake data
with Personas and Purposes.

Snowflake Data Governance: Scale access control with Personas and Purposes.
Source: Atlan
Embedded collaboration: Context on demand without switching costs

Atlan provides you with a decentralized and community-driven approach to data
governance; instead of being an afterthought, governance is now part of your daily
workflows.
With Atlan, you can learn more about any data asset without ever switching the
application and breaking your flow. For example, information about when the
column was updated last on your BI dashboard is just a click away.
Atlan also helps democratize governance by integrating with your collaboration and
project management tools like Slack and Jira. With Atlan's Slack bot, you can look up
any data asset right within your chat interface. Data users can initiate Slack
conversations around any data asset right within Atlan.
Learn more: Seamlessly collaborate around your Snowflake data assets
with Slack and Jira integration.
Snowflake Data Governance: Embedded collaboration with data assets and team
members in the tools you are familiar with. Source: Atlan


Governance Reporting

Atlan’s reporting center gives you a quick snapshot of important governance metrics
like:
 Total data assets crawled and cataloged — the breakdown of assets by data
sources
 A breakdown of asset categories — SQL assets, BI assets, process assets, etc.
 Asset drill down by Persona, Owners, and Groups.
 Total assets that have been verified and certified.
Learn more: Track Snowflake data governance metrics on Atlan’s Reporting Center

Snowflake Data Governance: A snapshot of data governance status through the
Reporting Center. Source: Atlan

Atlan: A Snowflake-validated data governance solution


If you are evaluating and looking to deploy best-in-class data access governance for
your Snowflake data warehouse - give Atlan a spin.
Atlan is the first data catalog and metadata management solution approved by
the Snowflake Ready Validation Program.
Quoting Bob Muglia, former CEO of Snowflake:

“Atlan’s unique, collaboration-first approach for the modern data stack helps to
break down organizational silos and empower cross-functional teams to work
together to make better business decisions.”

Atlan is more than a metadata management and data cataloging tool. Atlan is built
by data engineers for solving the evolving needs of modern data teams. Atlan’s
capabilities include faster discovery, transparent data flow, robust governance, and
collaboration built on open infrastructure and an easy-to-use user interface.
The deep integration and the open API enable Atlan to solve other modern data
governance use cases across DataOps, workflow management, and pipeline
automation.
Atlan has been named a leader in The Forrester Wave™: Enterprise Data Catalogs for
DataOps, Q2 2022.
The report states,
“Atlan is the tool of choice for DataOps and data product deployment. Atlan’s
vision is to create frictionless data product deployment through a single
metadata and data automation platform.”

Getting started with Snowflake data governance with Atlan:

 How to crawl Snowflake metadata


 How to mine Snowflake metadata
 What does Atlan crawl from Snowflake?
 How to attach a classification for Snowflake data assets?
 How do I control access to Snowflake metadata and data?

Snowflake data governance: Related reads


 What is data governance & why does it matter?
 Data Governance Framework: Examples, Standards & Templates
 7 Best Practices for Data Governance to Follow in 2024
 Benefits of Data Governance: 4 Ways It Helps Build Great Data Teams
 Data Governance Roles and Responsibilities: A Quick Round-Up
 Data Governance Policy: Examples, Templates & How to Write One
 Key Objectives of Data Governance: How Should You Think About Them?
 5 Popular Data Governance Certifications & Trainings in 2024
 8 Best Data Governance Books Every Data Practitioner Should Read in 2024
 Automated Data Governance: How Does It Help You Manage Access, Security
& More at Scale?
==

The world will have created and stored 200 Zettabytes of data by 2025, which is the
equivalent of every person on the planet carrying around 400 iPhones. Half of that will be
stored in cloud environments.
As more and more data is created by systems, devices, and transactions, the complex
challenges surrounding that data haven’t magically disappeared. That’s why data
governance is more critical than ever.
In this first article in our series on data governance within Snowflake, we focus on building
foundations.
Enterprises historically have been focused on data performance, whether that be the
frequency of updates or query execution time. However, getting data ASAP isn’t exactly a
North Star metric when it comes to enterprise data governance. You likely have many other
concerns related to your data’s:
 Availability
 Usability
 Integrity
 Security

While this might seem self-explanatory, if your data is performant but not available, usable,
validated, or secure, then the data could be adding risk to your enterprise. That risk could take
the form of being unable to trust the quality and consistency of your data, or of errors in your
data being made public through misreported revenue. Likewise, if you're sacrificing
security in order to provide availability to your users or customers, you could be putting
your enterprise at risk.
Within this article, we cover each of these elements of data governance in depth — along
with the controls that the Snowflake Data Cloud provides.
What Is Data Governance?
In data engineering, the term data governance can take on a lot of meanings and is heavily
overloaded.
Data governance describes how enterprises put in place a collection of processes and
practices that effectively manage data, data assets, and metadata within the platform.
This includes controls for data availability, usability, integrity, and security.
While data governance has a much broader scope in totality and complexity, this post will
cover only a few of the many layers of data governance.
Data governance is iterative, not static.
You must regularly re-evaluate and update your controls as your enterprise changes.
Oftentimes, data stewards review existing practices, meet with business stakeholders to
determine the controls that are necessary, and audit the controls that are being enforced
within the enterprise.
Why Does It Matter?
Data governance controls ensure that data is consistent and dependable within the data’s
lifecycle. This includes everything from initial creation and ingestion from a source to
complex use cases such as a machine learning model result.
By enforcing specific standards for data governance, you ensure that quality data is being
used to drive key business decisions, inform customers, and empower users.
Without data governance processes and practices, you run the risk of data inconsistencies.
For example, if you’re pulling addresses from multiple sources that don’t follow a particular
standard, you will have to determine how to resolve these addresses. Are they duplicates?
Are they accurate? Without effective data governance in place, these questions frequently
require time and cost your enterprise money.
These same concerns apply to all kinds of data, from time zones on dates stored in your
database to missing data points between systems.
How Do I Build Data Controls in Snowflake?
While many data governance controls require manual auditing and
validation, Snowflake provides functionality to reduce the number of controls required. In
some instances, Snowflake can also automate data governance controls.
Let’s take a look at each of our previously mentioned data governance concerns and how
they’re managed within Snowflake.
Availability
It’s critical that your data be available when your business needs it.
Snowflake is available on AWS, Azure, and Google Cloud Platform, and it inherits many of
the high availability features that these cloud providers offer.
Snowflake builds on top of the availability of these cloud provider offerings with:
 Fault tolerance of data nodes
 Automatically spreading data across multiple availability zones
 Time travel for accessing deleted data and performing data backups
 Separation of compute and storage resources
 Data replication and fail-over (cross-region and cross-cloud)

These additional features are critical for maintaining data availability.


For example, an availability zone within AWS goes down temporarily. Since not only your
data but also your compute resources have been distributed to multiple availability zones,
Snowflake will automatically handle failure cases and reroute your request away from the
unavailable zone.
Image by Snowflake
But software and hardware are never perfect. Inevitably, there’s going to be an outage that
impacts your enterprise. In the event that Snowflake itself has a service outage, Snowflake
will issue account credits (depending on your level of support). While this doesn’t change the
impact on your enterprise, it does help offset any costs incurred as a result of the outage.
Another key aspect of availability is having access to data when you need it. In many
traditional systems, your data is spread across multiple databases or multiple clusters. In
order to aggregate that data together, you need to create a data pipeline to move data from
one system to another.
This creates opportunities for data to become unavailable if that pipeline goes down.
Snowflake on the other hand is built on a centralized data model, which removes the need for
data pipelines to move data from one location to another, and further increases the
availability of your data throughout your enterprise.
Usability
It’s important to define processes and practices that ensure your data is usable, documented,
labeled, and can be easily found by consumers. It’s critical that data ingestion implements
controls for consistent, usable data. Services provide data in many different ways, and it’s
critical that your tooling be able to enforce the controls your enterprise has put in place.
What Snowflake Tooling Does
Snowflake has a few different processes for data ingestion (like stages, Snowpipe,
and the Kafka connector). These processes can ensure that data conforms to particular standards. When
defining these ingestion processes, you define a schema that the data is expected to adhere to.
This schema definition enforces your data types and the precision of those data types.
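As a simplified sketch (the table, stage, and pipe names are hypothetical), it is the target
table's schema that pins down the expected types and precision, while a pipe continuously
loads files that must conform to it:

-- The table schema fixes the expected types and precision for ingested data
CREATE OR REPLACE TABLE raw_orders (
  order_id   NUMBER(38,0),
  amount     NUMBER(12,2),
  created_at TIMESTAMP_NTZ
);

-- A Snowpipe definition that loads files arriving on an external stage
CREATE OR REPLACE PIPE raw_orders_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw_orders
  FROM @my_ext_stage/orders/
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);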
What Snowflake’s Account Access Does
Snowflake’s centralized data model also promotes usability. While many enterprises may
only have one Snowflake account, Snowflake allows for multiple accounts to segregate out
business costs and organize data by business unit. With Snowflake’s data
sharing functionality, a single data point can be referenced within multiple Snowflake
accounts.
Snowflake also gives you the ability to share data outside of your Snowflake instance
with reader accounts. This allows consumers to have read-only access to specific data without
needing to unload or copy your data elsewhere.
The sharing approach accelerates the value that your data provides the
enterprise and increases the usability of your data without increasing risk. This reduces the
need to apply further data governance practices for your data sharing and unload processes,
ultimately increasing the velocity of your data.
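A hedged sketch of that sharing flow might look like the following; the share, database,
and account names are placeholders, and the consumer account identifier depends on your
organization.

-- Create a share and expose a database, schema, and table through it
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

-- Add an existing consumer account, or create a managed reader account for external consumers
ALTER SHARE sales_share ADD ACCOUNTS = myorg.partner_account;  -- placeholder identifier
CREATE MANAGED ACCOUNT partner_reader
  ADMIN_NAME = reader_admin, ADMIN_PASSWORD = 'ChooseAStrongPassword1!', TYPE = READER;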
Integrity
Once data has been ingested and stored, it’s critical that your data be resilient, accurate,
complete, and consistent.
You should put controls in place to ensure that your enterprise can rely on your data. There
are two main types of data integrity that an enterprise needs to be concerned with within
Snowflake: physical and logical.
Let’s take a look at each of them.
Physical Integrity
It’s critical that data can be read and stored consistently without data loss.
In the event of the loss of an availability zone or data center, the data should retain its
integrity both in a replicated environment and when that availability zone or data center
recovers. As previously mentioned, Snowflake automatically replicates your data across
multiple availability zones to prevent data loss and maintain physical integrity.
Logical Integrity
Logical integrity in data governance refers to ensuring that your data is logically accurate and
complete with regard to your enterprise and individual domains.
This requires building and evolving controls within your data ingestion and storage to ensure
that you have up-to-date data and that the data continues to conform to the expected types,
values, and constraints.
In order to validate data completeness or accuracy, you will need to create controls and
processes separate from Snowflake. These controls will need to compare your ingested data
against the source system that it was ingested from, and will require resources outside of
Snowflake to perform this comparison. However, since Snowflake separates out your read
and write compute resources, reading data out for validation doesn’t impact data ingestion
performance.
Security
Security should always be front of mind in any digital system. Users should be assigned the
bare minimum set of permissions that allows them to perform their job or task. Your data
should also be encrypted in-transit and at-rest to ensure that intercepted traffic or data leaks
do not provide attackers with your data.
Data security takes shape within Snowflake in a few different ways.
Resource Access
When a query is submitted to Snowflake, it will validate that the current session’s role is
allowed to perform the action in the issued query. Snowflake provides many layers of access
control, and each of these has a particular use case. Let’s take a look at each of them.
Role-Based Access Control (RBAC)
Snowflake allows for you to create a role hierarchy, where roles can inherit from each other.
Each role contains a set of permission grants that allow an assigned user to perform actions
against your Snowflake resources. These actions might include reading from a database,
creating integrations, or assigning permissions to other roles. As a basic example, your role
hierarchy might look like this:
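A minimal SQL sketch of such a hierarchy (the role, database, and schema names are
hypothetical) could be:

-- Hypothetical hierarchy: SYSADMIN -> DATA_ENGINEER -> ANALYST
CREATE ROLE IF NOT EXISTS analyst;
CREATE ROLE IF NOT EXISTS data_engineer;

GRANT ROLE analyst TO ROLE data_engineer;   -- data_engineer inherits analyst's grants
GRANT ROLE data_engineer TO ROLE sysadmin;  -- custom roles roll up to SYSADMIN

-- Permission grants attached to the lowest-level role
GRANT USAGE ON DATABASE sales_db TO ROLE analyst;
GRANT USAGE ON SCHEMA sales_db.public TO ROLE analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA sales_db.public TO ROLE analyst;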
Discretionary Access Control (DAC)
Within Snowflake, each resource has a defined owner. The owner of a resource is a role
defined at the time of creation. The ownership of a resource can also be updated by either the
owning role, a parent of that role, or the account administrator. It is generally recommended
that all custom roles (that don’t involve user management) ultimately roll up to the
SYSADMIN role within Snowflake, as this role is responsible for owning all custom
resources.
RBAC and DAC within Snowflake are combined into one security model, allowing for
flexibility depending on your use case. Every resource has an owner (a role), and roles can
be assigned to users which allows them to perform actions against resources.
Encryption
Snowflake supports end-to-end encryption for all ingestion methods and encrypts all data it
ingests by default.
For data outside of Snowflake that you wish to ingest, it is recommended that you encrypt
your data client-side and then start the ingestion process into Snowflake. If this data isn’t
encrypted prior to ingestion, Snowflake will encrypt it before storing it. This is the best
practice as it ensures that your data is secure — it’s only readable by those who have the
correct permissions granted to them via roles.
In the case of client-side encryption, Snowflake requires the following:
1. The Snowflake customer creates a secret master key, which remains with the
customer.
2. The client generates a random encryption key and encrypts the file before it is
ingested into Snowflake. The random encryption key is then encrypted with the
customer’s master key.
3. The encrypted file and the encrypted random key are uploaded to the cloud storage
service. The encrypted random key is stored with the file’s metadata.
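As a hedged sketch of how this looks in practice for an external stage on S3 (the bucket URL, credentials, and key are placeholders), the customer supplies the master key when creating the stage so Snowflake can decrypt the files during loading:

-- Stage pointing at client-side encrypted files; all values below are placeholders
CREATE STAGE encrypted_stage
  URL = 's3://my-bucket/path/'
  CREDENTIALS = (AWS_KEY_ID = '<key-id>' AWS_SECRET_KEY = '<secret>')
  ENCRYPTION = (TYPE = 'AWS_CSE' MASTER_KEY = '<base64-encoded-master-key>');

COPY INTO my_table FROM @encrypted_stage FILE_FORMAT = (TYPE = 'CSV');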
Authorization and Authentication
In order to control access to your data, it’s recommended to integrate Snowflake with your
enterprise’s authorization and authentication strategy.
We also recommend that you use federated authentication and the System for Cross-domain
Identity Management (SCIM) specification in order to sync your users and their roles
between your identity provider and Snowflake. This allows you to automate the provisioning
and de-provisioning of users and creates a single source of truth for managing user access.
For example, if you are using Okta as an identity provider, you would assign groups to a user. When setting up your SCIM integration between Okta and Snowflake, you define which Okta groups map to which Snowflake roles. This way, you can manage access inside your existing identity provider, thereby simplifying the data access controls needed for your data governance strategy.
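A minimal sketch of the Snowflake side of an Okta SCIM setup might look like the following (the integration and provisioner role names are arbitrary; your identity provider configuration will differ):

-- Role the SCIM integration runs as, with the privileges it needs to manage users and roles
CREATE ROLE IF NOT EXISTS okta_provisioner;
GRANT CREATE USER ON ACCOUNT TO ROLE okta_provisioner;
GRANT CREATE ROLE ON ACCOUNT TO ROLE okta_provisioner;
GRANT ROLE okta_provisioner TO ROLE accountadmin;

-- Integration that Okta calls to sync users and groups (groups arrive as Snowflake roles)
CREATE SECURITY INTEGRATION okta_scim
  TYPE = SCIM
  SCIM_CLIENT = 'OKTA'
  RUN_AS_ROLE = 'OKTA_PROVISIONER';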
Once this setup is complete, it’s important to assign the correct grants or permissions to your
roles within Snowflake. You will want to restrict the permissions in these roles to be the
minimum permissions necessary to perform the role’s designated tasks.
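For instance, a read-only reporting role might receive nothing beyond usage and select privileges (the database, schema, and role names are illustrative):

GRANT USAGE ON DATABASE analytics TO ROLE reporting_reader;
GRANT USAGE ON SCHEMA analytics.public TO ROLE reporting_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.public TO ROLE reporting_reader;
-- Also cover tables created later
GRANT SELECT ON FUTURE TABLES IN SCHEMA analytics.public TO ROLE reporting_reader;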
Access Policies
Another layer of data security available to you within Snowflake is access policies. These can be applied at the row level or the column level via row access policies and dynamic data masking, respectively. These features allow you to conditionally return data from a table based on the user's role.
Row Access Policies
These policies allow you to control which rows in your table are returned based on the user's current role, giving you a direct way to enforce your data governance rules. A single policy can be reused across multiple tables or views.
A good example would be a table that contains all employees and their salaries. You wouldn't want everybody to be able to see every record in that table, but you do want individuals to see their own salary and managers to see their direct reports' salaries. You would apply a row-level filter that returns the additional rows only when the user's current role is the manager role.
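A simplified sketch of such a policy (it checks only the current role and the current user, not a full reporting chain, and the table, column, and role names are hypothetical):

-- Return a row only to HR managers or to the employee the row belongs to
CREATE ROW ACCESS POLICY salary_row_policy AS (employee_username VARCHAR) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'HR_MANAGER'
  OR employee_username = CURRENT_USER();

ALTER TABLE hr.employees ADD ROW ACCESS POLICY salary_row_policy ON (employee_username);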
Columnar Access Policies
Also known as dynamic data masking, columnar access policies are for when you want to conditionally include column data in the result of a query. If a user doesn't have access to a column, the result has a mask applied to hide the sensitive information. A single policy can be created and applied to multiple columns across your Snowflake schemas.
Reusing our row access policy example of salaries, perhaps you want people to be able to see
the full list of people within the company, but that table has sensitive information such as
salary, address, phone number, etc. Instead of filtering at the row level, we could simply
mask all the personally identifiable information (PII) and salary data using dynamic masking.
If the current role doesn’t have access to see that information, you can replace the data value
with a mask like “********” or whatever fits your business preferences and requirements.
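A minimal sketch along those lines (the role, table, and column names are placeholders):

-- Unmask only for HR managers; everyone else sees a fixed mask
CREATE MASKING POLICY mask_pii AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('HR_MANAGER') THEN val
    ELSE '********'
  END;

ALTER TABLE hr.employees MODIFY COLUMN phone_number SET MASKING POLICY mask_pii;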
Auditing Access History
When applying various methods for protecting sensitive data, it's also important to have an audit process in place to ensure that your data is being accessed according to the governance practices you've defined. This is where Snowflake's access history functionality is critical. It records the read operations for each query and identifies which columns the query accessed directly or indirectly. Data stewards can use these records to demonstrate regulatory and data governance compliance.
Access history also gives you visibility into what data is being accessed and how frequently, and lets you validate how your data is actually used. It's common for data stewards to have to identify who is using which data, and how often, to drive decisions such as archiving or deleting unused datasets. The same information can also highlight high-volume datasets that may warrant extra attention or stricter handling.
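Assuming your role can read the SNOWFLAKE.ACCOUNT_USAGE share, a starting point for such an audit might be a query like this:

-- Who read which objects over the last 30 days?
SELECT query_start_time,
       user_name,
       direct_objects_accessed
FROM snowflake.account_usage.access_history
WHERE query_start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
ORDER BY query_start_time DESC;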
Putting It All Together
Your data should always have controls in place to make sure that it’s providing value to your
business, only accessible by the allowed entities, secure from data leaks and hackers, and
highly available.
If sensitive data gets into the wrong hands or is inaccessible, there’s an immediate
impact upon your business.
Snowflake has many built-in controls to alleviate the custom processes and engineering costs
typically associated with data warehouses. When implementing a data warehouse or pipeline,
be sure to evaluate the necessary controls for data governance, and regularly re-evaluate and
update these controls.
Next up in the Series
Part 2: How Do I Manage Data Governance Access Controls In Snowflake?
Amidst a spree of data leaks over the last few years, data privacy and security have taken center stage in technology-related conversations. Securing the perimeter within and outside an organization has become extremely important. Networking constructs like VPCs, NACLs, and security groups provide a multi-layered approach to security. This, topped with a solid identity and access management framework, serves the application and infrastructure layers well, but do all of these measures suffice for the data layer, too? I'd say they don't.
While the data infrastructure can be secured using the tools mentioned above, data itself needs several other layers of protection within the organization to prevent, among other things, unauthorized access and accidental sharing of data. To do that, data teams usually had to rely on an external data governance tool before some data warehouse and data lake platform companies started offering an integrated data governance solution, Snowflake being one of the prominent ones.
Snowflake is unique in how it approaches data governance: the same way it approaches infrastructure, with everything done using SQL. Basic knowledge of SQL is something you can, more or less, take for granted in teams that deal directly with data, which makes Snowflake a desirable proposition for businesses. Snowflake's governance features are not a replacement for a full-fledged governance tool, but they do some of the heavy lifting in certain areas. Snowflake's governance features help you answer the following questions and more:
 Which business categories do your data objects belong to? This is done using object tagging.
 Which columns contain sensitive data, especially PII and PHI? Snowflake's built-in data classification engine can help you identify them.
 Which data objects should be visible to a user? Granular control of data objects is enabled by applying row access policies and column-level security features.
 Which users are reading data from, and writing data to, your objects? Snowflake logs all such activity in the ACCESS_HISTORY view.
 How has a data asset (a table or column) been transformed in a data pipeline after cleansing, wrangling, and transformation? Both ACCESS_HISTORY and OBJECT_DEPENDENCIES give you a detailed answer to that question.
This article will introduce you to the above data governance themes, with examples to help you navigate your data governance journey on Snowflake. Let's dive right in!
Tagging objects in Snowflake
Tags are general-purpose labeling constructs that provide a clean and highly flexible solution for categorizing data assets, from coarse-grained organization-level objects like accounts and warehouses to fine-grained schema-level objects like tables, views, and columns. Tags can be used in several ways, some prescribed by Snowflake as listed below, while others are left for you to discover for your business use case:
 Track cost and usage of org-wide data assets
 Classify and group data assets based on custom requirements
 Apply data masking policies on columns with certain tags
 Protect sensitive data using system-defined tags
In addition to governance, tags help with many other use cases, such as search and discovery in data catalogs, business intelligence tools, etc. Many of these tools, such as Select Star, Immuta, and Collibra, have two-way tag sync functionality, where you can apply tags from your data catalog and they'll get updated in Snowflake. This deserves mention because, in most cases, data catalogs only extract metadata from data sources and don't write back to them. Think of this as out-of-the-box reverse ETL for metadata, i.e., the data catalog enriching the data source in return.
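A brief sketch of tagging in practice (the tag names, values, and objects are illustrative, not prescribed):

-- Create tags and attach them at different levels of granularity
CREATE TAG cost_center COMMENT = 'Business unit that owns the object';
CREATE TAG pii_type COMMENT = 'Kind of personal data stored in the column';

ALTER WAREHOUSE analytics_wh SET TAG cost_center = 'finance';
ALTER TABLE sales.public.orders SET TAG cost_center = 'sales_ops';
ALTER TABLE sales.public.orders MODIFY COLUMN customer_email SET TAG pii_type = 'email';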
Identifying sensitive data
While tags allow you to create arbitrary labels for your data assets, in many cases you will also need to classify your data assets based on their content. To do that, you can use Snowflake's data classification feature. The classification process requires a certain amount of compute power, so you'll need a warehouse up and running to use it. You can use this feature to:
 Identify sensitive data using native Snowflake features
 Maintain oversight of sensitive data
 Invoke external cloud-based functions for classification
The core step of the process is extracting semantic categories that represent personal attributes, such as name, address, age, salary, and so on. To do that, you call the EXTRACT_SEMANTIC_CATEGORIES function, which uses a sample of the column data to give you the probability of each column containing sensitive data. Aside from that, it provides attributes like confidence and coverage, along with two system-defined tags, SEMANTIC_CATEGORY and PRIVACY_CATEGORY.
Data classification isn't a fully automatic process; it goes through a cycle of analysis, review, and application, where Snowflake takes care of steps one and three. It falls upon the data engineers, however, to review whether Snowflake's interpretation of the column data is correct. After the review, you can use the ASSOCIATE_SEMANTIC_CATEGORY_TAGS stored procedure, with the name of the table and the output of the EXTRACT_SEMANTIC_CATEGORIES function as parameters, to automatically associate the tags with the table columns. You can then use these tags to protect sensitive data with Snowflake's data masking capabilities.
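Under that workflow, the analyze and apply steps look roughly like this (the table name is a placeholder, and the classification output should be reviewed before the second call is run):

-- Step 1: analyze a sample of the table and return candidate semantic categories
SELECT EXTRACT_SEMANTIC_CATEGORIES('mydb.hr.employees');

-- Step 3: after review, write the SEMANTIC_CATEGORY and PRIVACY_CATEGORY tags onto the columns
CALL ASSOCIATE_SEMANTIC_CATEGORY_TAGS(
  'mydb.hr.employees',
  EXTRACT_SEMANTIC_CATEGORIES('mydb.hr.employees'));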
Granular access to data
While RBAC (role-based access control) and DAC (discretionary access control) allow you to manage object-level access in Snowflake, there are two other methods you can use to restrict access to data in tabular objects, such as tables, views, and materialized views. Some examples of where granular access to data makes sense are when you want to:
 Restrict access to sales records for one region by members of other regions
 Create an information barrier between teams to comply with regulatory requirements
 Protect PII and PHI data by masking fields partly or in full based on various factors
With row access policies and masking policies, you can restrict access to rows and columns. Both methods allow for on-the-fly evaluation of policies; based on these evaluations, you are granted access to specific rows or columns. Row access policies use both Conditional Expression functions and Context Functions for policy enforcement. For instance, you can check which role a user is using to access the data with the CURRENT_ROLE() context function.
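For example, a region-based row access policy could consult a hypothetical mapping table that pairs roles with the regions they are allowed to see (all object names below are placeholders):

-- Mapping table: which role may see which sales region
CREATE TABLE security.region_role_map (role_name STRING, region STRING);

CREATE ROW ACCESS POLICY region_policy AS (sales_region STRING) RETURNS BOOLEAN ->
  EXISTS (
    SELECT 1
    FROM security.region_role_map m
    WHERE m.role_name = CURRENT_ROLE()
      AND m.region = sales_region
  );

ALTER TABLE sales.orders ADD ROW ACCESS POLICY region_policy ON (region);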
Mapping data lineage
When implementing the modern data stack, it's usual for a data cataloging or lineage tool to be set up so that business and technical users can understand the flow of data and how it has been transformed from source to target. Snowflake's data lineage metadata comes in handy when you have to:
 Visualize how the data loads from sources, integrates with other sources, and becomes usable for the business
 Perform impact analysis when making changes to data objects
 Debug ETL issues that end up producing stale, erroneous, or incomplete data
Snowflake's data lineage capabilities rest upon two distinct features; the metadata for both is captured in the ACCOUNT_USAGE schema. The OBJECT_DEPENDENCIES view lets you look at the dependency graph between objects, i.e., it shows you which objects need which other objects to function properly. This is helpful when planning changes to an upstream object and assessing the impact those changes would have on downstream objects.
The other feature is Snowflake's logging of a year's worth of query history in the ACCESS_HISTORY view. This view provides details about each query, such as which tables, views, materialized views, and columns were accessed, and it also provides the full text of the query. Using access history, you can also bring in external tools with their own SQL parsers to build or enrich data lineage as and when required.
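For instance, assuming your role can query the ACCOUNT_USAGE schema, a hedged impact-analysis query against OBJECT_DEPENDENCIES might look like this (the object names are placeholders):

-- Which objects would be affected if SALES.PUBLIC.ORDERS changes?
SELECT referencing_database,
       referencing_schema,
       referencing_object_name,
       referencing_object_domain
FROM snowflake.account_usage.object_dependencies
WHERE referenced_database = 'SALES'
  AND referenced_schema = 'PUBLIC'
  AND referenced_object_name = 'ORDERS';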
Conclusion
This article walked you through Snowflake's native data governance features. It also talked about identifying and protecting sensitive data, restricting access to data using various control measures, and using the metadata to build the lineage graph. For more, check out Snowflake's guide to data governance, which talks about these capabilities in much more detail. Also, check out the latest data governance-related blog posts in the community-driven Snowflake blog on Medium.
Optimizing performance in Snowflake
The following topics help guide efforts to improve the performance of Snowflake.
Exploring execution times
Gain insights into the historical performance of queries using the web interface or by writing queries against data in the ACCOUNT_USAGE schema.
Optimizing warehouses for performance
Learn about strategies to fine-tune computing power in order to improve the performance of a query or set of queries running on a warehouse, including enabling the Query Acceleration Service.
Optimizing storage for performance
Learn how storing similar data together, creating optimized data structures, and defining specialized data sets can improve the performance of queries. Helpful when choosing between Automatic Clustering, Search Optimization Service, and materialized views.