SnowPro™ Advanced: Data Engineer: Exam Study Guide
This study guide highlights concepts that may be covered on Snowflake’s SnowPro™
Advanced: Data Engineer Certification exam. It serves as an introduction to the knowledge
and skills required, and is intended to guide your preparation. The material contained within
this study guide is not intended to guarantee a passing score on any Snowflake certification
exam.
For an overview and more information on the SnowPro™ Core Certification exam or
SnowPro™ Advanced Certification series, please navigate here.
SNOWPRO™ ADVANCED: DATA ENGINEER CERTIFICATION OVERVIEW
The SnowPro™ Advanced: Data Engineer Certification exam tests the advanced knowledge
and skills used to apply comprehensive data engineering principles using Snowflake.
Target Audience:
2+ years of data engineering experience, including practical experience using Snowflake for
data engineering tasks. Candidates should have a working knowledge of RESTful APIs, SQL,
semi-structured datasets, and cloud-native concepts. Programming experience is a plus.
This exam guide includes test domains, weightings, and objectives. It is not a comprehensive
listing of all the content that will be presented on this examination. The table below lists the main
content domains and their weighting ranges.
SNOWPRO™ ADVANCED: DATA ENGINEER PREREQUISITE
Eligible individuals must hold an active SnowPro Core Certified credential. If you feel you need
more guidance on the fundamentals, please see the SnowPro Core Study Guide.
Snowflake recommends that examinees have at least 2 years of hands-on practical Snowflake
implementation experience prior to attempting any of the SnowPro Advanced exams.
For the SnowPro Advanced: Data Engineer Certification exam, we recommend individuals
have at least 2+ years of hands-on Snowflake practitioner experience in a data
engineering role prior to attempting this exam. The exam will assess skills through
scenario-based questions and real-world examples.
RECOMMENDATIONS AND USING THE GUIDE
This guide will show the Snowflake topics and subtopics covered on the exam. Following the
topics will be additional resources consisting of videos, documentation, blogs, and/or exercises
to help you understand data engineering with Snowflake.
STEPS TO SUCCESS
SNOWPRO ADVANCED: DATA ENGINEER DOMAINS & OBJECTIVES
1.0 Domain: Data Movement
1.8 Outline when to use External Tables and define how they work.
● Partitioning external tables
● Materialized views
● Partitioned data unloading
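As a minimal sketch of these concepts (all stage, table, and column names here are hypothetical, and the partition expression assumes a storage layout like `2024-01-01/file.parquet`), the following statements create a partitioned external table, a materialized view over it, and a partitioned unload:

```sql
-- Hypothetical external stage pointing at cloud storage
CREATE STAGE sales_ext_stage
  URL = 's3://my-bucket/sales/'
  FILE_FORMAT = (TYPE = PARQUET);

-- External table with a partition column derived from the file path
CREATE EXTERNAL TABLE sales_ext (
  sale_date DATE AS TO_DATE(SPLIT_PART(METADATA$FILENAME, '/', 1)),
  amount NUMBER AS (VALUE:amount::NUMBER)
)
PARTITION BY (sale_date)
LOCATION = @sales_ext_stage
AUTO_REFRESH = TRUE
FILE_FORMAT = (TYPE = PARQUET);

-- Materialized view to speed up frequent aggregations over the external table
CREATE MATERIALIZED VIEW sales_daily AS
  SELECT sale_date, SUM(amount) AS total
  FROM sales_ext
  GROUP BY sale_date;

-- Partitioned data unloading: one folder per date value
COPY INTO @sales_ext_stage/unload/
FROM sales_internal
PARTITION BY ('date=' || TO_VARCHAR(sale_date))
FILE_FORMAT = (TYPE = PARQUET);
```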
Lab Guides
Accelerating Data Engineering with Snowflake & dbt (lab guide)
Auto-Ingest Twitter Data into Snowflake (lab guide)
Automating Data Pipelines to Drive Marketing Analytics with Snowflake & Fivetran (lab guide)
Reading Assets
Support for Calling External functions via Google Cloud API Gateway Now in Public
Preview (blog)
Snowflake and Spark, Part 2: Pushing Spark Query (Blog)
Fetching Query Results From Snowflake (Blog)
Moving from On-Premises ETL to Cloud-Driven ELT (White paper)
Snowflake Documentation
COPY INTO (Documentation)
Loading Data into Snowflake (Documentation)
DESCRIBE STAGE (Documentation)
Data Loading Tutorials (Documentation)
CREATE FILE FORMAT (Documentation)
Continuous Data Pipelines (Documentation)
VALIDATE_PIPE_LOAD (Documentation)
COPY_HISTORY (Documentation)
Databases, Tables & Views (Documentation)
CREATE STREAM (Documentation)
CREATE TASK (Documentation)
Connectors & Drivers (Documentation)
Sharing Data Securely in Snowflake (Documentation)
CREATE EXTERNAL TABLE (Documentation)
● Snowpipe
● Stages
● Tasks
● Streams
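These four objects commonly combine into a continuous pipeline. A minimal sketch, using hypothetical object names and assuming a landing table `raw_orders` with a single VARIANT column `v`:

```sql
-- Snowpipe: continuously load files as they land in the stage
CREATE PIPE raw_orders_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw_orders FROM @orders_stage
  FILE_FORMAT = (TYPE = JSON);

-- Stream: track inserts on the landing table (change data capture)
CREATE STREAM raw_orders_stream ON TABLE raw_orders;

-- Task: run only when the stream has unconsumed changes
CREATE TASK transform_orders
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('raw_orders_stream')
AS
  INSERT INTO orders_clean
    SELECT v:id::NUMBER, v:amount::NUMBER
    FROM raw_orders_stream;

-- Tasks are created suspended; resume to start the schedule
ALTER TASK transform_orders RESUME;
```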
Lab Guides
Resource Optimization: Performance (lab guide)
Resource Optimization: Usage Monitoring (lab guide)
Building a Data Application (lab guide)
Reading Assets
Performance Impact from Local and Remote Disk Spilling (Blog)
Snowflake: Visualizing Warehouse Performance (Blog)
Caching in Snowflake Data Warehouse (Blog)
Snowflake Documentation
Queries (Documentation)
System Functions (Documentation)
Account Usage (Documentation)
QUERY_HISTORY, QUERY_HISTORY_BY_* (Documentation)
Analyzing Queries Using Query Profile (Documentation)
Databases, Tables & Views (Documentation)
Virtual Warehouses (Documentation)
COPY_HISTORY (Documentation)
LOAD_HISTORY View (Documentation)
TASK_HISTORY (Documentation)
COPY_HISTORY View (Documentation)
SHOW STREAMS (Documentation)
PIPE_USAGE_HISTORY View (Documentation)
3.4 Use Time Travel and Cloning to create new development environments.
● Backup databases
● Test changes before deployment
● Rollback
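The following is a sketch of these patterns using hypothetical database and table names; clones are zero-copy, and the `AT` clause uses Time Travel (subject to the object's retention period):

```sql
-- Zero-copy clone of production as a development environment
CREATE DATABASE dev_db CLONE prod_db;

-- Back up a table as it existed one hour ago (OFFSET is in seconds)
CREATE TABLE orders_backup CLONE orders AT (OFFSET => -3600);

-- Test changes in the clone, then roll back by swapping if needed
ALTER TABLE orders SWAP WITH orders_backup;

-- Recover an accidentally dropped table within the retention period
UNDROP TABLE orders;
```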
Lab Guides
Getting Started with Time Travel (lab guide)
Snowflake Documentation
Snowflake Time Travel & Fail-safe (Documentation)
Databases, Tables & Views (Documentation)
Parameter Hierarchy and Types (Documentation)
Database Replication and Failover/Failback (Documentation)
Continuous Data Pipelines (Documentation)
SYSTEM$CLUSTERING_INFORMATION (Documentation)
SYSTEM$CLUSTERING_DEPTH (Documentation)
4.2 Outline the system defined roles and when they should be applied.
● The purpose of each of the System Defined Roles including best practices
usage in each case
● The primary differences between SECURITYADMIN and USERADMIN
roles
● The differences in purpose and usage between the USERADMIN/SECURITYADMIN
roles and the SYSADMIN role
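A sketch of the typical separation of duties among these roles (user and role names are hypothetical):

```sql
-- USERADMIN: creates and manages users and roles
USE ROLE USERADMIN;
CREATE USER analyst_1;
CREATE ROLE analyst;

-- SECURITYADMIN: manages grants globally (holds MANAGE GRANTS)
USE ROLE SECURITYADMIN;
GRANT ROLE analyst TO USER analyst_1;
-- Best practice: roll custom roles up to SYSADMIN in the hierarchy
GRANT ROLE analyst TO ROLE SYSADMIN;

-- SYSADMIN: creates and owns objects (warehouses, databases, etc.)
USE ROLE SYSADMIN;
GRANT USAGE ON DATABASE prod_db TO ROLE analyst;
```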
Security Study Resources:
Reading Assets
Snowflake RBAC Security Prefers Role Inheritance to Role Composition (Blog)
Snowflake Documentation
Managing Security in Snowflake (Documentation)
Managing Your User Preferences (Documentation)
Managing Governance in Snowflake (Documentation)
Stored Procedures (Documentation)
GRANT <privileges>…TO ROLE (Documentation)
CREATE MATERIALIZED VIEW (Documentation)
5.1 Define User-Defined Functions (UDFs) and outline how to use them.
● Secure UDFs
● SQL UDFs
● JavaScript UDFs
● Returning table value as compared to scalar value
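As a sketch of these UDF variants (function, table, and column names are hypothetical; the scalar example follows the pattern in Snowflake's documentation):

```sql
-- Scalar SQL UDF: returns one value per row
CREATE FUNCTION area_of_circle(radius FLOAT)
  RETURNS FLOAT
  AS 'pi() * radius * radius';

-- Secure JavaScript UDF: definition hidden from non-owners;
-- note that argument names are uppercased inside the JavaScript body
CREATE SECURE FUNCTION double_it(x FLOAT)
  RETURNS FLOAT
  LANGUAGE JAVASCRIPT
  AS 'return X * 2;';

-- Table UDF (UDTF): returns a set of rows rather than a scalar
CREATE FUNCTION orders_for(cust NUMBER)
  RETURNS TABLE (order_id NUMBER, amount NUMBER)
  AS 'SELECT order_id, amount FROM orders WHERE customer_id = cust';

-- A UDTF is called in the FROM clause via TABLE(...)
SELECT * FROM TABLE(orders_for(42));
```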
Reading Assets
Snowflake For Data Engineering – Easily Ingest, Transform and Deliver Data for
Up-To-The Moment Insight (white paper)
Bringing Extensibility to Data Pipelines: What’s New with Snowflake External Functions (blog)
Generating a JSON Dataset Using Relational Data in Snowflake (blog)
Best Practices for Managing Unstructured Data (White paper)
Snowflake Documentation
UDFs (User-Defined Functions) (Documentation)
External Functions (Documentation)
CREATE EXTERNAL FUNCTION (Documentation)
CREATE API INTEGRATION (Documentation)
CREATE EXTERNAL FUNCTION (Documentation)
Transactions (Documentation)
Stored Procedures (Documentation)
TRY_PARSE_JSON (Documentation)
Queries (Documentation)
Semi-Structured Data (Documentation)
Databases, Tables & Views (Documentation)
Snowpark (Documentation)
SNOWPRO™ ADVANCED: DATA ENGINEER SAMPLE QUESTIONS
b. Clustering information on all tables: this function clusters all tables by default.
d. An error: this function does not accept lists of columns as a second parameter.
2. A Data Engineer has inherited a database and is monitoring a table with the below query
every 30 days:
The Engineer gets the first two results (i.e., Day 0 and Day 30):
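The monitoring query itself did not survive in this extract; given the shape of the JSON output below, it was presumably a call to SYSTEM$CLUSTERING_INFORMATION along these lines (the table name is hypothetical):

```sql
-- Presumed monitoring query: reports clustering quality for a column
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(o_orderdate)');
```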
-- DAY 0 -------
{
"cluster_by_keys" : "LINEAR(o_orderdate)",
"total_partition_count" : 3218,
"total_constant_partition_count" : 0,
"average_overlaps" : 20.4133,
"average_depth" : 11.4326,
"partition_depth_histogram" : {
"00000" : 0,
"00001" : 0,
"00002" : 0,
"00003" : 0,
"00004" : 0,
"00005" : 0,
"00006" : 0,
"00007" : 0,
"00008" : 0,
"00009" : 0,
"00010" : 993,
"00011" : 841,
"00012" : 748,
"00013" : 413,
"00014" : 121,
"00015" : 74,
"00016" : 16,
"00032" : 12
}
}
-- DAY 30 -------
{
"cluster_by_keys" : "LINEAR(o_orderdate)",
"total_partition_count" : 3240,
"total_constant_partition_count" : 0,
"average_overlaps" : 64.1185,
"average_depth" : 33.4704,
"partition_depth_histogram" : {
"00000" : 0,
"00001" : 0,
"00002" : 0,
"00003" : 0,
"00004" : 0,
"00005" : 0,
"00006" : 0,
"00007" : 0,
"00008" : 0,
"00009" : 0,
"00010" : 0,
"00011" : 0,
"00012" : 0,
"00013" : 0,
"00014" : 0,
"00015" : 0,
"00016" : 0,
"00032" : 993,
"00064" : 2247
}
}
a. The table is well organized for queries that range over column o_orderdate.
Over time, this organization is degrading.
b. The table was initially well organized for queries that range over column
o_orderdate. Over time, this organization has improved further.
c. The table was initially not organized for queries that range over column
o_orderdate. Over time, this organization has changed.
d. The table was initially poorly organized for queries that range over column
o_orderdate. Over time, this organization has improved.
3. A Data Engineer is preparing to load staged data from an external stage using a
task object.
Which of the following practices will provide the MOST efficient load performance?
4. A Data Engineer is working on a project that requires data to be moved directly from an
internal stage to an external stage.
5. The S1 schema contains two permanent tables that were created as shown below:
a. The retention time on table_a does not change; table_b is set to 20 days.
b. An error will be generated; a data retention time on a schema cannot be set.
c. The retention time on both tables will be set to 20 days.
d. The retention time will not change on either table.
Keys: 1) B
2) A
3) D
4) A
5) A
The information provided in this study guide is provided for your purposes only and may not be
provided to third parties.
IN ADDITION, THIS STUDY GUIDE IS PROVIDED “AS IS”. NEITHER SNOWFLAKE NOR ITS
SUPPLIERS MAKES ANY OTHER WARRANTIES, EXPRESS OR IMPLIED, STATUTORY OR
OTHERWISE, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, TITLE,
FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT.