
These materials are © 2024 John Wiley & Sons, Inc.

Any dissemination, distribution, or unauthorized use is strictly prohibited.


Building Applications
with Snowpark


Snowflake Special Edition

by Monica Mehta

Building Applications with Snowpark For Dummies®,
Snowflake Special Edition
Published by
John Wiley & Sons, Inc.
111 River St.
Hoboken, NJ 07030-5774
www.wiley.com
Copyright © 2024 by John Wiley & Sons, Inc., Hoboken, New Jersey

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without
the prior written permission of the Publisher. Requests to the Publisher for permission should
be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ
07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, The Dummies Way, Dummies.com,
Making Everything Easier, and related trade dress are trademarks or registered trademarks of
John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries, and may not be
used without written permission. Snowflake and the Snowflake logo are trademarks or registered
trademarks of Snowflake Inc. All other trademarks are the property of their respective owners.
John Wiley & Sons, Inc., is not associated with any product or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: WHILE THE PUBLISHER AND AUTHORS HAVE
USED THEIR BEST EFFORTS IN PREPARING THIS WORK, THEY MAKE NO REPRESENTATIONS
OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF
THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION
ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES REPRESENTATIVES, WRITTEN
SALES MATERIALS OR PROMOTIONAL STATEMENTS FOR THIS WORK. THE FACT THAT AN
ORGANIZATION, WEBSITE, OR PRODUCT IS REFERRED TO IN THIS WORK AS A CITATION AND/
OR POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE PUBLISHER
AND AUTHORS ENDORSE THE INFORMATION OR SERVICES THE ORGANIZATION, WEBSITE, OR
PRODUCT MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. THIS WORK IS SOLD WITH
THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING PROFESSIONAL
SERVICES. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR
YOUR SITUATION. YOU SHOULD CONSULT WITH A SPECIALIST WHERE APPROPRIATE. FURTHER,
READERS SHOULD BE AWARE THAT WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED
OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.
NEITHER THE PUBLISHER NOR AUTHORS SHALL BE LIABLE FOR ANY LOSS OF PROFIT OR ANY
OTHER COMMERCIAL DAMAGES, INCLUDING BUT NOT LIMITED TO SPECIAL, INCIDENTAL,
CONSEQUENTIAL, OR OTHER DAMAGES.

For general information on our other products and services, or how to create a custom
For Dummies book for your business or organization, please contact our Business Development
Department in the U.S. at 877-409-4177, contact [email protected], or visit www.wiley.com/go/
custompub. For information about licensing the For Dummies brand for products or services,
contact BrandedRights&[email protected].
ISBN 978-1-394-23841-5 (pbk); ISBN 978-1-394-23840-8 (ebk)

Publisher’s Acknowledgments
Some of the people who helped bring this book to market include the following:
Development Editor: Nicole Sholly
Project Manager: Jennifer Bingham
Acquisitions Editor: Traci Martin
Editorial Manager: Rev Mengle
Sales Manager: Molly Daugherty
Content Refinement Specialist: Saikarthick Kumarasamy

Table of Contents
INTRODUCTION
    About This Book
    Icons Used in This Book
    Beyond the Book

CHAPTER 1: Introduction to Modern Applications and Snowpark
    What Are Modern Applications?
    Exploring Use Cases for Apps
    Architecture of Applications
        Data tier
        Processing tier
        Presentation tier
    Looking at the Benefits of a Cloud Data Platform for Applications
    Introducing Snowflake’s Snowpark
        Libraries
        Runtimes
        Processing of all data types

CHAPTER 2: Welcome to the Snowflake Data Cloud
    Compute Challenges for Applications
        Multiple programming languages
        Unstructured data
        Security and governance
    Introducing the Snowflake Data Cloud
        Data tier: Flexible architecture patterns
        Processing tier: Language flexibility
        UI tier: Streamlit in Snowflake
    The Benefits of the Data Cloud
        Near-limitless scale and performance
        Exceptional economic value
        Inherent ease of use
        Multicloud and cross-cloud flexibility
        Baked-in security and governance
        Unique collaboration options

CHAPTER 3: What You Can Build with Snowpark
    Building Data Pipelines
        Data transformations
        Custom business logic
        Data science and ML pipelines
    Building AI/ML Models
        Model development
        Model deployment and management

CHAPTER 4: Getting Started with Snowpark
    The Snowpark API
    Code Editors and IDEs
    Notebook Solutions for Data Science
        Snowflake Notebooks
        Hex notebooks
        Open-source notebooks
    Snowflake Python Worksheets

CHAPTER 5: Six Benefits of Building Apps with Snowpark
    One Platform for All Languages
    Bringing the Compute to the Data
    Built-In Governance and Security
    Increased Performance and Reduced Price
    Building with Streamlit in Snowflake
    Distribute and Monetize with Snowflake Native Apps and Snowflake Marketplace

Introduction
Modern applications are revolutionizing the world. These
applications are not only transforming the way we inter-
act with technology but also reshaping various sectors of
society. They’re harnessing the power of vast amounts of data
and leveraging advanced artificial intelligence (AI) algorithms to
provide personalized experiences, make accurate predictions,
automate processes, and drive innovation. From healthcare to
finance, and education to entertainment, modern applications are
opening up new possibilities and changing our lives in ways we
could never have imagined.

In healthcare, modern applications are being used to predict
disease outbreaks, improve patient care, and accelerate drug discovery.
In education, applications provide personalized learning
experiences and early identification of students who need more
help. Financial firms are using applications to assess risks, detect
fraud, and optimize investor strategies. Entertainment companies
are using applications to recommend content, analyze audiences,
and create virtual reality experiences. These are just a few exam-
ples of how modern applications are changing the world — the
opportunities are endless.

However, your organization may have experienced just how complex
and time-consuming it can be to build data-intensive, AI-enriched
applications. Data is often fragmented across many systems and
stored in varied formats, which forces teams to build intricate
data pipelines and difficult integrations. Constructing accurate AI
and machine learning (ML) models requires storing, processing,
and analyzing large volumes of clean, accurate data. Developers
often need to set up separate compute environments for the multiple
programming languages needed to build robust applications.
They also need to ensure that the data they collect and share is
secured and governed so it doesn’t get into the wrong hands. Rest
assured, this book can help.

About This Book
The Snowflake Data Cloud, including Snowpark, helps develop-
ers overcome the challenges of building and deploying applica-
tions. The Snowflake Data Cloud is the best choice for application
architecture because it simplifies development and operations by
providing a unified, secure, and fully governed cloud environment
for data storage, integration, analysis, and other computing tasks.

Snowpark was designed to make building data pipelines and AI/ML
models in Snowflake a breeze using programming languages such
as Python, Java, and Scala without any data movement. If you’re
looking for faster, easier, and more cost-efficient application
development, look no further than the Snowflake Data Cloud and
Snowpark, which this book introduces to you in a straightforward
and easygoing manner.

Whether you’re a data scientist, data engineer, or developer, this
book provides you with information about the following topics:

»» Modern applications: What are they? What are the use
cases? What’s the best architecture for building and
deploying them?
»» The Snowflake Data Cloud: What is it? How does it support
applications? What are the benefits of using it, and what can
organizations achieve by building applications on it?
»» Snowpark: What is it? How and why should you build data
pipelines with it? What are the best practices for and benefits
of developing and deploying AI/ML models on it? How can
you get started with Snowpark?

Icons Used in This Book


Throughout this book, the following icons highlight tips, impor-
tant points to remember, and more:

Tips guide you to easier ways to perform a task or offer helpful
advice as you build, deploy, and monetize apps.

This icon highlights concepts worth remembering as you immerse
yourself in all things Snowpark.

The jargon beneath the jargon, explained.

Beyond the Book


Once you’re ready to build applications quickly and cost-
effectively with Snowpark, getting started is easy. You can begin
development anywhere that can run a Python kernel. Simply
install the Snowpark API and establish a connection with your
Snowflake account. The Snowflake website offers a number of
resources to help you get started with Snowpark quickly and
easily. To learn more, visit snowflake.com/snowpark.
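
The two steps above (install the Snowpark API, then connect) can be sketched as follows. This is a minimal illustration rather than an official setup guide: the SNOWFLAKE_* environment variable names and the helper names connection_parameters and create_session are assumptions made for this example. The only Snowpark call used is Session.builder.configs(...).create(), and the library is imported inside the function so the rest of the sketch runs even before the package is installed.

```python
# Install first (one time):  pip install snowflake-snowpark-python
import os

REQUIRED = ("account", "user", "password")
OPTIONAL = ("role", "warehouse", "database", "schema")


def connection_parameters(env=None):
    """Build a Snowpark connection dict from SNOWFLAKE_* variables.

    The variable names are a convention assumed for this sketch.
    """
    env = os.environ if env is None else env
    params = {}
    for key in REQUIRED:
        # A missing required value raises KeyError, which fails fast.
        params[key] = env[f"SNOWFLAKE_{key.upper()}"]
    for key in OPTIONAL:
        value = env.get(f"SNOWFLAKE_{key.upper()}")
        if value:
            params[key] = value
    return params


def create_session(params):
    """Open a Snowpark session; needs the Snowpark package and a real account."""
    from snowflake.snowpark import Session

    return Session.builder.configs(params).create()
```

With real credentials exported in the environment, create_session(connection_parameters()) would return a session object to use for DataFrame work. Keeping credentials out of source code is a design choice of this sketch, not a Snowpark requirement.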

Disclaimer: Snowpark is the set of libraries and runtimes in
Snowflake that securely deploy and process non-SQL code.
Snowpark is an ever-evolving product. Features and capabilities
described in this book may not be generally available, may differ
from what is described, or may no longer exist at the time of reading.

IN THIS CHAPTER
»» Defining modern applications

»» Checking out modern app use cases

»» Describing the three-tier approach to application architecture

»» Revealing the benefits of using a cloud data platform for applications

»» Introducing Snowpark for the app processing tier

Chapter 1
Introduction to
Modern Applications
and Snowpark

The increasing market opportunities for data-intensive,
artificial intelligence (AI)-enriched applications are driven
by the explosion of data fueled by advances in wireless con-
nectivity, compute capacity, and the proliferation of Internet of
Things (IoT) devices. This is an exciting time for developers to
deliver innovative, data-driven apps to customers in every indus-
try, from healthcare to entertainment. But too many of them
spend most of their efforts on unproductive tasks. They wrestle
with reconciling different technologies, waste time managing
complex infrastructures, and struggle to hire in-demand opera-
tions specialists.

THE ERA OF DATA
According to a 2023 report by Grand View Research, the global
software market size was valued at $583.47 billion in 2022 and is
expected to grow at a compound annual growth rate (CAGR) of
11.5 percent from 2023 to 2030. The report is titled “Software
Market Size, Share & Trends Analysis Report By Enterprise Size, By
Vertical (BFSI, Retail), By Deployment (On-premises, Cloud), By Type
(Productivity Software, Application Software), By Region, And Segment
Forecasts, 2023 - 2030,” in case you want to read more about it.

Additionally, market intelligence firm IDC predicted in its “Data Age
2025” report that the volume of data created each year will top
160 zettabytes (a zettabyte is a trillion gigabytes) by 2025, a tenfold increase
over the amount of data created in 2016. That report’s full title is
“Data Age 2025: The Evolution of Data to Life-Critical, Don’t Focus
on Big Data; Focus on the Data That’s Big.”

The good news is that Snowflake offers developers an easier,
faster way to build, monetize, and deploy modern applications.
But before we get into the solution, let’s start with the basics. This
chapter defines applications (or apps in shorthand), describes the
three-tier approach to application architecture, reveals the bene-
fits of using Snowflake to build apps, and introduces Snowpark —
Snowflake’s set of libraries and runtimes that securely deploy and
process non-SQL code, such as Python, Java, and Scala.

What Are Modern Applications?


Chances are, you’ve already interacted with a modern application
today. Modern apps are data-intensive and AI-enriched. They
process large volumes of complex and fast-changing data from
different sources, analyze it with AI/machine learning (ML) mod-
els, and present those insights so that customers or employees
can make better decisions and perform smarter actions.

In other words, modern apps handle data collection, processing,
and representation to provide value in the form of AI-enriched
insights. Here are some examples:

»» Marketing apps can forecast customer behavior and inform
product strategies.
»» Health apps can identify and predict disease, and
personalize health recommendations.
»» Transportation apps can optimize shipping routes in real
time.
»» Security apps help teams monitor, detect, and respond to
events quickly.
»» Climate apps predict extreme weather events and enable
municipalities to plan and respond quickly.

The possibilities are endless.

Modern apps have four characteristics:

»» High data volume: We’re talking billions of images or
millions of customer interactions.
»» Complex data formats: Data can be structured,
semistructured, or unstructured, such as text files,
social media posts, or images and videos.
»» Dynamic, ever-changing data: The content and ingestion
of data from external sources (such as web data via APIs,
IoT devices, and other applications) can change and
fluctuate wildly.
»» AI-enriched: Modern apps use AI/ML models for faster time
to insights.

Exploring Use Cases for Apps


Here are five of the most common types of use cases for building
applications that are data-intensive and AI-enriched:

»» Customer 360: With Customer 360, organizations can get a
complete view of their customers by integrating data from
various touchpoints and applying it in areas such as personalized
marketing and customer service.

»» IoT: IoT apps ingest large volumes of time-series data from
devices and sensors for near real-time analysis in areas such
as security and personal health.
»» Application health and security analytics: Analytics are
crucial to maintaining the performance, availability, and
security of applications through use cases such as
performance, availability, and compliance monitoring.
»» ML and data science: Apps can use ML algorithms for
predictive analytics, image and speech recognition, fraud
detection, and customer service, to name just a few
examples.
»» Embedded analytics: Embedded analytics is the integration
of data analytics capabilities within business applications,
allowing end users to access data insights.

Regardless of the type of application you’re designing, building,
or supporting, the best way to differentiate your offering is
to select the right underlying architecture. Modern applications
that deliver real-time value at massive scale require a modern
data platform that’s designed for highly performant customer
experiences.

Architecture of Applications
The architecture of a modern application includes three physical
tiers. The chief advantage of three-tier application architecture is
that developers can develop, modify, and scale the tiers separately
instead of changing the entire application.

Data tier
At the bottom is the data tier, sometimes called the database tier
or persistence layer. This is where the data that feeds the process-
ing tier is stored and managed. It includes the data storage and
access mechanisms. Ideally, this tier can be scaled easily to allow
for changes in the amount of data the application needs to handle
at specific times.

Processing tier
In the middle is the processing tier, also known as the business logic
or compute layer. This is where business logic for the application
is defined — for example, through a specific set of business rules.
This tier processes information that’s collected in the presenta-
tion tier as well as information stored in the data tier through
data transformations and ML models. The processing tier can also
add, change, or delete data in the data tier. The most popular pro-
gramming languages used in this tier are Python, Java, and Scala.

Presentation tier
On the top is the presentation tier, which houses the user interface.
This is where the user interacts with the data. Its main purpose
is to display information, communicate, and collect data from the
end user. For example, the presentation tier of a retail application
might include the mechanism for browsing merchandise, inter-
acting with the shopping cart, and purchasing.

A major advantage of three-tier application architecture is an
increased level of application security. Because the presentation
tier never talks to the data tier directly, many cyberattacks can be
blocked by adding security features, such as input validation and
access checks, in the processing tier.
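
To make the separation of tiers concrete, here is a deliberately tiny sketch in plain Python. It is not how a production app or Snowflake itself is built: the class and function names (DataTier, ProcessingTier, render_order) and the in-memory dictionary standing in for a database are all assumptions made for illustration. It shows the presentation code calling only the processing tier, which validates input before anything reaches the data tier.

```python
class DataTier:
    """Stores and retrieves records (an in-memory stand-in for a database)."""

    def __init__(self):
        self._orders = {}

    def save_order(self, order_id, order):
        self._orders[order_id] = dict(order)

    def get_order(self, order_id):
        return self._orders.get(order_id)


class ProcessingTier:
    """Holds the business rules and is the only tier that touches DataTier."""

    def __init__(self, data):
        self._data = data
        self._next_id = 1

    def place_order(self, item, quantity):
        # Validation lives here, between the user and the stored data.
        if not isinstance(item, str) or not item.strip():
            raise ValueError("item is required")
        if not isinstance(quantity, int) or quantity <= 0:
            raise ValueError("quantity must be a positive integer")
        order_id = self._next_id
        self._next_id += 1
        self._data.save_order(order_id, {"item": item.strip(), "quantity": quantity})
        return order_id

    def describe_order(self, order_id):
        return self._data.get_order(order_id)


def render_order(processing, order_id):
    """Presentation tier: formats output and never reads the data tier directly."""
    order = processing.describe_order(order_id)
    if order is None:
        return "Order not found"
    return f"Order {order_id}: {order['quantity']} x {order['item']}"
```

Because each tier only knows about the one beneath it, the storage class could be swapped for a real database, or the rendering function for a web page, without touching the business rules.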

Looking at the Benefits of a Cloud
Data Platform for Applications
A cloud data platform provides a unified, secure, and fully governed
cloud environment for data storage, integration, analysis, and other
computing tasks. It is the best choice for application architecture
because it simplifies development and operations (DevOps) and
lets you bring innovative applications to market faster and more
efficiently.

The ideal modern cloud data platform for applications should
include the five capabilities highlighted in Table 1-1.

TABLE 1-1 The Ideal Modern Cloud Data Platform

Capability                  What It Does                             Benefit

The capacity for            Separates compute from storage           Enables users to scale up and down
near-unlimited resources                                             automatically and work concurrently
                                                                     without impacting performance

Governed and secure         Eliminates the need to move or copy      Enables seamless and secure
collaboration               data internally or externally            collaboration

A fully managed solution    Handles maintenance, administration,     Decreases costs and valuable worker
                            and other automated services             time otherwise spent on manual tasks

Built-in distribution       Publishes and offers apps to             Increases app visibility and ease
capabilities                organizations that can easily find,      of monetization
                            try, and buy your apps

Flexibility of deployment   Runs the app in the provider’s or        Eliminates data exposure outside of
models                      consumer’s account                       secure environments

Introducing Snowflake’s Snowpark


Now that you understand why it’s critical to build applications on
a cloud data platform, it’s time to reveal why the best solution for
processing data that feeds into applications through data trans-
formations or AI/ML modeling is Snowpark on Snowflake.

Snowpark is the set of libraries and runtimes in Snowflake that
securely enable developers to deploy and process non-SQL code,
including Python, Java, and Scala. Thousands of organizations are
accelerating development and performance of their data engi-
neering and ML workloads with Snowpark for Python.

Libraries
On the client side, Snowpark consists of libraries that include
the Snowpark API and Snowpark ML API. These are open-source
libraries that work with any Python environment.

SNOWPARK CLIENT-SIDE
LIBRARIES
Client-side libraries include:

• Snowpark API: The Snowpark API library includes the DataFrame
API. The Snowpark DataFrame API brings deeply integrated,
DataFrame-style programming and OSS-compatible APIs to the
languages data practitioners like to use.
• Snowpark ML API: Snowpark ML includes the Python library
and underlying infrastructure for end-to-end ML workflows in
Snowflake. Snowpark ML unifies data pre-processing, feature
engineering, model training, and integrated deployment into a
single, easy-to-use Python library.

Runtimes
On the server side, runtimes include Python, Java, and Scala in
the warehouse model, as well as Snowpark Container Services.
Snowflake’s virtual warehouses are compute clusters that host
and run server-side contracts, such as user-defined functions
(UDFs) and stored procedures, for processing custom logic.

For Python developers, Snowpark’s Python runtime makes it
possible to write custom Python code through UDFs and stored
procedures, which are deployed into Snowflake’s secure Python
sandbox. UDFs and stored procedures are two key components
of Snowpark that allow developers to bring custom Python logic
to Snowflake’s compute engine, while taking advantage of open-
source packages preinstalled from Anaconda in Snowpark.

For workloads that require the use of GPUs or other specialized
hardware, custom runtimes and libraries, or hosting of long-running
full-stack applications, Snowpark Container Services
offers the ideal solution.

SNOWPARK SERVER-SIDE
RUNTIMES
Server-side runtimes include:

• Python, Java, and Scala in the warehouse model: Runtime contracts
include UDFs and Stored Procedures for Python that bring
custom logic to Snowflake’s compute engine while taking advantage
of open-source Python packages pre-installed in Snowpark.
• Snowpark Container Services: This allows developers to register,
deploy, and run container images in Snowflake-managed
infrastructure.

UDFs
Custom logic written in Python runs directly in Snowflake using
UDFs. These functions can stand alone or be called as part of a
DataFrame operation to process the data. Snowpark takes care of
serializing the custom code into Python bytecode and pushing all
of the logic to Snowflake, so it runs next to the data.

To host the code, Snowpark has a secure, sandboxed Python runtime
built right into the Snowflake engine. Python UDFs scale out
the processing associated with the underlying Python code, which
occurs in parallel across all threads and nodes comprising
the virtual warehouse on which the function is executing.
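
As a rough illustration of the flow described above, the sketch below writes an ordinary Python function, registers it as a UDF, and calls it as part of a DataFrame operation. The function add_tax, the 8 percent rate, the ADD_TAX name, and the ORDERS table with its PRICE column are all hypothetical. The session argument is assumed to be an existing Snowpark session; session.udf.register is the Snowpark registration call, and here it relies on the Python type hints to infer the SQL types.

```python
def add_tax(price: float) -> float:
    """Plain Python business logic; the 8 percent rate is made up for the example."""
    return round(price * 1.08, 2)


def register_add_tax(session):
    """Register add_tax as a UDF for the current session.

    Snowpark serializes the function and ships it to Snowflake, so it runs
    next to the data rather than on the client.
    """
    return session.udf.register(add_tax, name="ADD_TAX", replace=True)


def prices_with_tax(session, add_tax_udf):
    """Use the registered UDF inside a DataFrame operation (hypothetical ORDERS table)."""
    orders = session.table("ORDERS")
    return orders.select(
        orders["PRICE"],
        add_tax_udf(orders["PRICE"]).alias("PRICE_WITH_TAX"),
    )
```

Keeping the logic in a normal function means it can be unit tested locally, as add_tax(100.0) can be, before it is ever deployed to a warehouse.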

There are several types of UDFs that can be used in Snowpark,
including:

»» Scalar UDFs: Operate on each row in isolation and produce
a single result
»» Vectorized UDFs: Receive batches of input rows as pandas
DataFrames and return batches of results as pandas arrays
or Series
»» User-Defined Table Functions: Return multiple rows for
each input row, return a single result for a group of rows,
or maintain state across multiple rows
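
The difference between the three UDF types is easiest to see side by side. The sketch below runs locally in plain Python and only imitates the calling patterns: a list stands in for the pandas batch a vectorized UDF would receive, and run_partition is a hypothetical driver written for this example, not part of Snowpark. The RunningTotal class follows the handler shape used for Python table functions, with a process method called per row and an end_partition method called once per group.

```python
def to_celsius(fahrenheit: float) -> float:
    """Scalar style: one row in, one value out."""
    return (fahrenheit - 32.0) * 5.0 / 9.0


def to_celsius_batch(batch):
    """Vectorized style: a whole batch in, a same-length batch out.

    In Snowflake the batch arrives as a pandas DataFrame; a list is used
    here so the sketch has no dependencies.
    """
    return [(f - 32.0) * 5.0 / 9.0 for f in batch]


class RunningTotal:
    """Table-function style: keeps state across rows and can emit extra rows."""

    def __init__(self):
        self._total = 0.0

    def process(self, amount):
        # One output row per input row, carrying the running total.
        self._total += amount
        yield (amount, self._total)

    def end_partition(self):
        # One summary row for the whole group of rows.
        yield (None, self._total)


def run_partition(handler_cls, rows):
    """Hypothetical local driver that mimics how a partition is fed to a handler."""
    handler = handler_cls()
    out = []
    for row in rows:
        out.extend(handler.process(*row))
    out.extend(handler.end_partition())
    return out
```

For instance, run_partition(RunningTotal, [(10.0,), (5.0,)]) returns two per-row results followed by one summary row, which is something neither a scalar nor a vectorized UDF can express.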

Stored Procedures
Snowpark Stored Procedures help developers operationalize their
Python code and run, orchestrate, and schedule their pipelines.
A stored procedure is created once and can be executed many
times with a simple CALL statement in your orchestration or
automation tools. Snowflake supports stored procedures in SQL,
Python, Java, JavaScript, and Scala, so developers can easily create
polyglot pipelines feeding applications.

Developers can use stored procedures in Snowpark to bundle the
Python function and have Snowpark deploy it on the server side.
Snowpark will serialize the Python code and dependencies into
bytecode and store them in a Snowflake stage automatically. Stored
procedures can be created as either temporary (session-level) or
permanent objects in Snowflake.

Stored procedures are single-node bound, which means transformations
or analysis of data at scale inside a stored procedure
should leverage the Snowpark DataFrame API or other deployed
UDFs to scale compute across all nodes of a compute cluster.
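
A stored procedure that follows this advice might look like the sketch below. The handler does no row-by-row work itself; it only describes DataFrame operations, which Snowflake executes across the warehouse. The table names, the REGION and AMOUNT columns, and the SUMMARIZE_ORDERS procedure name are hypothetical, and the registration helper assumes an existing Snowpark session with the Snowpark package available.

```python
def summarize_orders(session, source_table: str, target_table: str) -> str:
    """Stored-procedure handler; Snowflake passes the session as the first argument."""
    orders = session.table(source_table)
    # DataFrame operations are pushed down and run across the warehouse nodes.
    summary = orders.filter("AMOUNT > 0").group_by("REGION").count()
    summary.write.mode("overwrite").save_as_table(target_table)
    return f"Wrote {target_table} from {source_table}"


def register_summarize_orders(session):
    """Register the handler as a temporary (session-level) stored procedure."""
    from snowflake.snowpark.types import StringType

    return session.sproc.register(
        summarize_orders,
        return_type=StringType(),
        input_types=[StringType(), StringType()],
        name="SUMMARIZE_ORDERS",
        packages=["snowflake-snowpark-python"],
        replace=True,
    )


def run_summary(session):
    """Equivalent to CALL SUMMARIZE_ORDERS('ORDERS', 'ORDERS_BY_REGION') in SQL."""
    return session.call("SUMMARIZE_ORDERS", "ORDERS", "ORDERS_BY_REGION")
```

An orchestration tool would typically issue the CALL statement on a schedule, while the heavy lifting stays in the DataFrame operations inside the handler.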

Anaconda packages
One of the benefits of Python is its rich ecosystem of open-source
packages and libraries. In recent years, open-source packages
have been one of the biggest enablers for faster and easier development.
To leverage open-source innovation, Snowflake has
partnered with Anaconda for a product integration without any
additional cost to the user beyond warehouse usage. Developers in
Snowflake are now able to speed up their Python-based pipelines
by taking advantage of the seamless dependency management
and comprehensive set of curated open-source packages provided
by Anaconda — all without moving or copying the data.

All Snowpark users can benefit from thousands of the most popu-
lar packages that are preinstalled from the Anaconda repository,
including FuzzyWuzzy for string matching, H3 for geospatial
analysis, and scikit-learn for ML and predictive data analysis.
Additionally, Snowpark is integrated with the Conda package
manager so users can avoid dealing with broken Python environ-
ments because of missing dependencies.
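
As a hedged sketch of what that dependency management can look like in code, the UDF below declares that it needs FuzzyWuzzy, one of the packages named above. It assumes the package is listed under the name fuzzywuzzy in the Snowflake Anaconda channel available to your account, and the NAME_SIMILARITY name is hypothetical. The import sits inside the handler so the module still loads on a machine where the package is not installed.

```python
def name_similarity(a: str, b: str) -> int:
    """Return a 0 to 100 similarity score for two strings."""
    # Imported here so the package is resolved where the UDF actually runs.
    from fuzzywuzzy import fuzz

    return fuzz.ratio(a, b)


def register_name_similarity(session):
    """Register the UDF and declare its third-party dependency.

    The packages list asks Snowflake to provide the dependency on the
    server side; nothing is uploaded from the client.
    """
    return session.udf.register(
        name_similarity,
        name="NAME_SIMILARITY",
        packages=["fuzzywuzzy"],
        replace=True,
    )
```

Declaring packages per function keeps each UDF's requirements explicit; a session-wide alternative is to call session.add_packages once before registering several functions.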

Snowpark also fully supports dbt, one of the most popular
solutions for data transformation today. dbt supports a SQL-first
transformation workflow, and in 2022, dbt introduced support
for Python. With dbt’s support for both SQL and Python, users
can write transformations in the language they find most familiar
and fit for purpose. And dbt on Snowpark allows analyses using
tools available in the open-source Python ecosystem, including
state-of-the-art packages for data engineering and data science,
all within the dbt framework familiar to many SQL users.

Snowpark-optimized warehouses
Snowpark-optimized warehouses have compute nodes with 16x
the memory and 10x the local cache compared with standard
warehouses. The larger memory helps unlock memory-intensive
use cases on large data sets such as ML training, ML inference,
data exports from object storage, and other memory-intensive
analytics that could not previously be accommodated in standard
warehouses.
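
To make this concrete, here is a minimal sketch of creating a Snowpark-optimized warehouse from Python by sending a `CREATE WAREHOUSE` statement through an open Snowpark session. The warehouse name and size are placeholders, and the helper functions are illustrative rather than part of any Snowflake library; check the Snowflake documentation for the sizes available in your account.

```python
import re


def snowpark_optimized_warehouse_ddl(name: str, size: str = "MEDIUM") -> str:
    """Build the DDL for a Snowpark-optimized warehouse (names are placeholders)."""
    # Basic checks so arbitrary text is never spliced into the statement.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_$]*", name):
        raise ValueError(f"unexpected warehouse name: {name!r}")
    if not re.fullmatch(r"[A-Za-z0-9-]+", size):
        raise ValueError(f"unexpected warehouse size: {size!r}")
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{size}' "
        "WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'"
    )


def create_snowpark_optimized_warehouse(session, name: str, size: str = "MEDIUM"):
    """Run the DDL through an existing Snowpark session."""
    return session.sql(snowpark_optimized_warehouse_ddl(name, size)).collect()
```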

Snowpark-optimized warehouses also inherit all the benefits of
Snowflake virtual warehouses:

»» Fully managed: Snowflake oversees the maintenance,
security patching, tuning, and delivery of the latest perfor-
mance enhancements transparently.
»» Elastic: Elastic scaling of compute supports virtually any
number of users, jobs, or data with multitenant security and
resource isolation.
»» Reliable: Industry-leading SLA is consistently upheld.
»» Secure: Governance controls are applied across all work-
loads without trade-offs.

Snowpark Container Services
Snowpark Container Services is a new Snowpark runtime option
that enables developers to effortlessly deploy, manage, and scale
containerized workloads (jobs, services, service functions) using
secure, Snowflake-managed infrastructure with configurable
hardware options, such as GPUs. This new runtime eliminates the need
for users to deal with complex operations of managing and main-
taining compute and clusters for containers.

Moving governed data outside of Snowflake would expose it to
security risks. With containers running in Snowflake, there is no
need to move governed data outside of Snowflake to use it as part
of the most sophisticated AI/ML models and apps (whether devel-
oped for internal or for commercial uses). Snowpark Container
Services is available for workloads that require use of GPUs, cus-
tom runtimes/libraries, or the hosting of long-running full-stack
applications.

The containers, built and packaged by developers using their
tools of choice, can include code in any programming language
(for example, C/C++, Node.js, Python, R, React, etc.). In addition,
Snowpark Container Services provides a simple, unified experi-
ence for the end-to-end lifecycle of containerized applications
and AI/ML models. Other solutions require you to manually stitch
together a container registry, container management service, and
compute service, plus require you to manage your own separate
tools for observability, data connectivity, security, and so on.

Snowflake Native Apps provide the building blocks for app devel-
opment, distribution, operation, and monetization all within
Snowflake’s platform. Snowpark Container Services can be used
as part of a Snowflake Native App to allow developers to build and
distribute sophisticated full-stack apps that run entirely in their
end customer’s Snowflake account.

Processing of all data types
Developers can ingest all types of data using Snowflake’s
platform, including streaming or batch ingestion of structured,
semistructured, or unstructured data. The potential of unstruc-
tured data for analytics and applications has remained largely
untapped.

Snowpark allows users to easily process and derive insights
from unstructured data from files such as images, videos, and
audio. Python developers can easily take advantage of the Python
ecosystem of open-source packages, such as PyPDF2 for PDF
processing, or OpenPyXL to read Excel files.

IN THIS CHAPTER
»» Detailing the challenges of application
development
»» Identifying the unique attributes of the
Data Cloud
»» Exploring how the Data Cloud supports
applications
»» Revealing what organizations can
achieve from building apps with
Snowflake

Chapter 2
Welcome to the
Snowflake Data Cloud

Application development is a complex and time-consuming
process. The good news is that new tools are making the
task of building and deploying applications much easier.
According to industry reports, generative artificial intelligence
(AI) can help developers complete coding tasks in 30 to 50 percent
of the time. But legacy systems can’t handle the large amounts of
data that are required to build AI applications, making it difficult
to gather, analyze, and share data across entities.

To use advanced tools that make it easier to build applications,
you need a scalable, flexible data platform and application archi-
tecture. Here’s where Snowflake comes in. The Snowflake Data
Cloud provides the platform to help developers build and deploy
applications quickly, easily, and cost-effectively.

With this technology foundation, organizations can overcome
compute challenges for applications and even build innovative
new types of applications.

Compute Challenges for Applications
One prediction for how data will be used in 2025 is that it will be
“embedded in every interaction and process.” That’s according
to Snowflake’s Data Trends 2023 Report. As the amount of data
grows exponentially, it becomes increasingly difficult to manage
and process in a timely manner. This requires the development of
new AI and machine learning (ML) algorithms that can scale to
search and process massive amounts of data.

However, modern applications often have complex workflows
that can be difficult to manage, scale, and secure, and they require
advanced infrastructure to support them. Modern applications
also tend to have intensive resource demands, requiring large
amounts of compute power, networking bandwidth, and graphics
capabilities.

In this section, we describe three specific compute challenges
for developers building and deploying applications — handling
multiple programming languages, processing unstructured data,
and managing complex data governance — and then in the next
section show how these challenges can be overcome.

Multiple programming languages
Development teams often need to use multiple programming
languages to build robust applications. SQL and Python are among
the most popular languages in the modern data stack process-
ing tier, where data is processed and transformed, and where ML
models are trained and deployed. While SQL is the long-standing
database language for querying and transforming data, Python
has emerged as the preferred programming language for devel-
opers as more advanced logic such as AI and ML becomes part of
the application.

The challenge arises when developers need to set up and manage
separate compute environments for each language. Moving data
across compute environments often requires converting, egress-
ing, scanning, and loading data from its source. And develop-
ers often need to stitch multiple tools together to finish a single
analysis when using multiple languages. Even for those confident
in both languages, running separate clusters for each environ-
ment can be frustrating and time-consuming.

Unstructured data
Unstructured data can provide a rich source of information for
application developers, leading to deeper insights about user
behavior and user experience. Unstructured data doesn’t have a
predefined structure and can include call logs, chat transcripts,
contracts, and sensor data. It could be in any file format, such as
doc, pdf, jpeg, mp3, or mp4. It could also be in industry-specific
formats, such as raster data for geospatial, .las files for well log
data, or DICOM for medical.

Because the limitations of legacy technology made processing
solutions very complex and expensive, the potential of unstruc-
tured data for analytics and applications has remained largely
untapped. Historically, organizations have relied on structured
tables and semistructured data such as JSON and XML for ana-
lytical workloads and applications. For developers, unstructured
data is often hard to manage and not easy to collect, search, or
query. The sheer size and complexity of information without a
predefined schema also make it challenging to extract actionable
insights.

Security and governance
Developers must focus on security and governance to ensure their
application aligns with business goals and complies with exter-
nal regulations. Strong application security is critical because it
protects software application code and data against cyberthreats.
Governance provides a formal framework for achieving measura-
ble progress toward strategic objectives, maintaining compliance
standards, and protecting data privacy and security.

But keeping up with the evolving threat landscape and the
adoption of nimbler, more incremental software development
frameworks can be difficult and expensive. In addition, data is
often siloed and fragmented throughout an organization, but
strong security and governance requires a centralized repository
for security-relevant data.

Finally, manual processes of reviewing alerts and validating
controls slow down reaction time to incidents. Organizations need a
data platform that can ensure multilevel protections, extending
the security supplied by cloud storage vendors.

DATA DIFFICULTIES FOR
DEVELOPERS
Data presents a number of challenges for application developers,
who must:

• Break through organizational silos that prevent access to a single
view of data.
• Unify various formats of structured and unstructured data from
multiple sources.
• Set up and manage separate compute environments for each
language.
• Store and process massive volumes of data quickly and easily
to construct accurate AI models.
• Ensure that the data they collect and share complies with
regulatory guidelines.
• Safeguard the security of the data and make sure it doesn’t
get into the wrong hands.

(And they must do it all while managing costs!)

Regardless of the programming language they use or the type of
data they collect, application developers need to ensure the appli-
cation architecture is reliable, can scale to meet demand, and has
built-in security and governance.

Introducing the Snowflake Data Cloud
To solve these challenges, more and more organizations are
turning to the Snowflake Data Cloud. That’s because the Data
Cloud is a global network where thousands of organizations
mobilize data with near-unlimited scale, concurrency, and per-
formance. Inside the Data Cloud, organizations can unite their
siloed data, easily discover and securely share governed data, and
execute diverse analytic workloads.

Wherever data or users live, the Data Cloud delivers a unified and
seamless experience — even when data and workloads span mul-
tiple public clouds. The Data Cloud enables discovering, managing,
and sharing data among business units, suppliers, other business
partners, and customers. It also offers live access to ready-to-use
data and applications that power innovative business solutions
from hundreds of providers in Snowflake Marketplace.

Snowflake’s flexible architecture lets you develop and scale
applications without operational burden. Snowflake’s platform
supports three robust tiers or layers (introduced in Chapter 1) that
work together to make the Data Cloud an ideal platform for mod-
ern applications.

For your application storage or data tier, Snowflake can handle
all types of data with automatic scalability and reliability. For the
processing tier, Snowflake’s elastic compute infrastructure allows
developers to write code in their preferred language for pipelines
and AI/ML models and run it directly on Snowflake. For the pres-
entation tier or user interface layer, Snowflake offers multiple
options to build and host front-end applications using the lan-
guage of your choice.

Snowflake’s platform is a unified architecture that allows you
to integrate any type of data — structured, semistructured, or
unstructured — from a wide range of data sources and use it to
power your applications, from search components to AI features.
Snowflake’s platform, delivered as a fully managed service, fea-
tures storage, compute, and global services layers that are physi-
cally separated but logically integrated. Data workloads scale
independently from one another, making it an ideal platform for
scaling applications without resource contention or performance
degradation.

Data tier: Flexible architecture patterns
Snowflake’s single platform enables customers to work with more
than SQL — it powers a full spectrum of use cases against many
architecture patterns. That means architectures can be built based
on what works best for an organization, even if that changes over
time. With this flexibility, customers can power any use case while
staying true to the core tenets of the Snowflake platform: strong
security and governance, excellent performance, and simplicity.

Snowflake supports a wide range of data formats, architecture
patterns, and use cases — all on a single, secure, and governed
platform — including any of the following:

»» A data warehouse for highly optimized SQL and business
intelligence use cases
»» A data lake for structured, semistructured, or unstructured data
»» A data mesh for a team to manage its own data and adhere
to standards (rather than a central data team)
»» A data lakehouse for SQL analytics and unstructured data
processing together

With Snowflake, you have two data storage options:

»» Use Snowflake as your central data repository and
supercharge performance, querying, security, and governance
with the Snowflake Data Cloud.
»» Store your data in AWS S3, Azure Data Lake, Google Cloud
Storage, or S3-compatible storage and use Snowflake to add
a governance layer on top and speed up data transformation
and analytics using Snowflake compute.

Processing tier: Language flexibility
Snowflake natively supports SQL. For development and processing
in other languages, Snowpark enables Snowflake to meet the
programmability needs of all developers, regardless of which
languages they use to code their applications. Snowpark’s set of
libraries and runtimes in Snowflake securely deploys and processes
code written in languages such as Python, Java, and Scala.

To transform data or build features for AI/ML models, you can
use DataFrame-style APIs that help you interact with Snowflake’s
SQL engine with familiar programming. To process custom code
inside warehouses, which consist of Snowflake’s elastic compute
clusters, developers can package their code with User-Defined
Functions (UDFs) and Stored Procedures.
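
For a feel of what a DataFrame-style API looks like, here is a minimal sketch that builds per-customer features with Snowpark for Python. It assumes the Snowpark client is installed and a session is open; the table name `TRANSACTIONS` and its columns are made up for illustration.

```python
def build_customer_features(session, table_name: str = "TRANSACTIONS"):
    """Return a lazy Snowpark DataFrame of per-customer features."""
    # Imported inside the function so this module loads without Snowpark installed.
    from snowflake.snowpark.functions import avg, col, count

    transactions = session.table(table_name)  # hypothetical table
    features = (
        transactions
        .filter(col("AMOUNT") > 0)            # hypothetical column
        .group_by("CUSTOMER_ID")              # hypothetical column
        .agg(
            count("TXN_ID").alias("TXN_COUNT"),
            avg("AMOUNT").alias("AVG_AMOUNT"),
        )
    )
    # Nothing has run yet: the query executes in Snowflake only when an
    # action such as features.show(), features.collect(), or a write is called.
    return features
```

Because the DataFrame is lazy, you can keep chaining steps in Python and let Snowflake's SQL engine execute the combined query in one go.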

In the case of Python code, third-party open-source libraries can
be securely and efficiently executed using the integrated Anaconda
package repository and manager. All compute is automatically
pushed to the Snowflake engine for the heavy lifting to enable
use cases such as data transformations, ELT/ETL pipelines, ML
development, and ML deployment. With Snowflake, it all happens
in a single, easy-to-use platform with superior price performance
and near-zero maintenance.

Snowpark supports two types of compute: warehouses and
containers. Most data processing logic can be efficiently executed in
Snowflake using the warehouse compute that supports Python,
Java, and Scala with standard compute or high-memory compute
(Snowpark-optimized warehouse).

Developers who require programming language or hardware
flexibility can build and deploy using Snowpark Container Services,
Snowflake’s fully managed container offering. Containers can be
built and packaged by developers using their tools of choice, can
include code in any programming language (for example, C/C++,
Node.js, Python, R, React, etc.), and can be executed using a wider
range of configurable hardware options, including GPUs.

UI tier: Streamlit in Snowflake
Building user interfaces does not require advanced front-end
development expertise. Snowflake makes it easy for developers to
build and deploy interactive interfaces for their applications using
the native integration of the Streamlit Python library. For teams
that need ultimate flexibility and employ experienced front-end
developers, Snowpark Container Services supports the hosting
and execution of any programming language.

The Benefits of the Data Cloud
The Snowflake Data Cloud is the natural platform for applications.
Snowflake’s unique architecture solves many of the challenges
that developers face when they build and deploy applications, and
gives them the power to create modern applications that are data-
intensive and AI-enriched.

Snowflake customers are creating full-stack modern apps,
including apps that create advanced visualizations for near real-
time risk awareness, feature visual self-service analytics, and
have natural language chat interfaces that run in the end user’s
data platform. In this section, we highlight some of the benefits
of building applications on Snowflake.

Near-limitless scale and performance
Snowflake is architected for cloud-scale computing in terms
of data volume, computational performance, and concurrent
workload execution. This unique cloud-built architecture enables
organizations in the Data Cloud to spin up new workloads and
dynamically provision those workloads with as little or as much
compute and storage capacity as they need. There are no hidden
capacity limits, and there is no fear of resource contention with
other concurrent data workloads.

Exceptional economic value
Snowflake’s cost-effective utility model ensures organizations in
the Data Cloud pay only for the capacity they use, in one-second
increments. Capacity is provisioned automatically, based on user-
defined thresholds, allowing customers to focus on consumption
while the platform manages the resources they need. Because
Snowflake contracts with the public cloud vendors for immense
capacity at a considerable scale, customers receive these cloud
services at exceptional economic value via Snowflake.
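
To see what per-second billing means in practice, the sketch below works through the arithmetic. The credit rate and price per credit are made-up numbers for illustration only, not Snowflake pricing, and the calculation ignores any minimums or discounts that may apply.

```python
def usage_cost(seconds_running: int, credits_per_hour: float,
               price_per_credit: float) -> float:
    """Cost of compute billed by the second (all rates are hypothetical)."""
    credits_used = credits_per_hour * seconds_running / 3600
    return round(credits_used * price_per_credit, 4)


# A job that runs for 90 seconds on compute rated at 4 credits per hour,
# priced at a hypothetical $3.00 per credit:
per_second_bill = usage_cost(90, 4, 3.00)    # 0.1 credits -> 0.3

# The same job if a full hour had to be paid for instead:
full_hour_bill = usage_cost(3600, 4, 3.00)   # 4 credits -> 12.0
```

Under these assumed rates, the 90-second job costs a small fraction of what an hourly billing model would charge for the same work.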

Inherent ease of use
Snowflake’s platform is dramatically more straightforward to use
than previous generations of cloud data platforms and database-
as-a-service offerings. Because the platform doesn’t require
expert knowledge to run, your database administrators can move
beyond manual tuning and configuration chores, and your devel-
opers can focus their expertise on creating and refining data
models, extracting new insights, and deriving business value
from a wide range of data.

Multicloud and cross-cloud flexibility
The Data Cloud spans the world’s most popular cloud service pro-
viders (CSPs): Amazon Web Services (AWS), Microsoft Azure, and
Google Cloud Platform (GCP). You no longer have to standard-
ize on one of them. You can leverage multiple regions and clouds
according to your business needs and seamlessly share data from
one region or cloud to another. For example, a large multinational
organization with a multicloud strategy can use the Data Cloud
to operate seamlessly across multiple cloud provider regions to
serve its worldwide customer base and enhance its disaster recov-
ery strategy. Similarly, a data provider can leverage Snowflake’s
global infrastructure to share data with other organizations in
many geographies and across multiple clouds.

Because Snowflake completely abstracts the underlying public
cloud platform, users and administrators don’t need any cloud-
specific expertise. They never have to work with third-party cloud
services directly, nor do they receive a separate bill for public
cloud usage.

Snowflake is provided as a fully managed service that runs
completely on the CSP’s infrastructure. Snowflake designed the
platform to be cloud agnostic, and a Snowflake account can be
hosted on any of the following cloud platforms: AWS, Microsoft
Azure, and GCP. One unified code base spans all three public cloud
services, which means customers can seamlessly move data and
workloads among them yet interact with one cohesive platform
interface. Being cloud agnostic is a decisive advantage when
serving a worldwide user base or formulating a global disaster
recovery strategy.

Baked-in security and governance
A strong data governance and compliance framework is critical
to extracting the analytical value of data. The Data Cloud allows
you to store all your data while providing fast, governed, and
secure access to that data. Industry-leading cybersecurity prac-
tices extend across all data and all workloads. Multilevel security
includes data encryption and associated key management ser-
vices, role-based access controls, object-level permissions, and
robust database security. Furthermore, the Data Cloud adheres to
regulatory and data privacy policies to ensure the correct han-
dling of sensitive data.

Although new privacy requirements are still emerging in many
areas, Snowflake is continually evolving to help customers com-
ply with national and industry-specific regulations, and to avoid
costly violations of those regulations.

Unique collaboration options
Snowflake’s unique architecture provides the near-unlimited
scale, concurrency, and performance for organizations to collab-
orate with data across their enterprises and ecosystems. Whether
sharing data or full-stack applications, the Data Cloud raises the
bar of what is possible.

You can also establish entirely new revenue streams and collab-
orate with business partners in new and productive ways with
capabilities such as the Snowflake Native App Framework, which
allows developers to build, distribute, and monetize full-stack
apps that run within the end customer’s Snowflake account.

The Snowflake Native App Framework provides the building blocks
for app development, distribution, operation, and monetization
all within Snowflake’s platform. It gives providers the capability
to build a new type of application — a Snowflake Native App —
that’s distributed and monetized on Snowflake Marketplace and
runs inside the end customer’s Snowflake account.

Developers can use the flexibility and programmability of
Snowpark’s libraries and runtimes when building Snowflake Native
Apps. This includes Snowpark Container Services, an additional
Snowpark runtime that enables developers to manage and deploy
container images within Snowflake-managed infrastructure using
configurable hardware options including NVIDIA AI GPUs.

IN THIS CHAPTER
»» Using Snowpark for data engineering
use cases
»» Revisiting Snowpark components
»» Revealing what makes building AI/ML
models successful
»» Detailing model development and model
deployment with Snowpark

Chapter 3
What You Can Build
with Snowpark

Snowpark, the set of libraries and runtimes in Snowflake that
securely deploy and process Python and other programming
languages, enables all data users to bring their work to the
Snowflake Data Cloud. Snowpark allows data engineers, data
scientists, and data developers to code with their language of
choice to execute pipeline, machine learning (ML) workflow, and
app logic faster and more securely, in a single platform. With
Snowpark, users coding any language, including Python, Java,
and Scala, will be able to benefit from the performance, elasticity,
and governance of the Snowflake platform.

This chapter examines how you can build better data pipelines
and ML models with Snowpark.

Building Data Pipelines
Data pipelines feeding applications can be complex for two main
reasons:

»» Multiple tools and infrastructure for each programming
language
One of the primary reasons to support multiple languages
is that data engineering is a collaborative effort across
teams. Data analysts may prefer SQL and GUI-based tools
like business dashboards. In contrast, data scientists
generally prefer to prepare data using notebooks and
Python, and data engineers and developers need additional
tools to tackle complex code and programming constructs.
Data often needs to travel across these different systems to
make pipelines work, leading to complex architectures that
can jeopardize security and impair data governance.
»» Significant manual effort and maintenance overhead
Working with complex data processing infrastructures
typically requires a lot of management and troubleshooting.
As a result, data engineers are often spread thin, spending
most of their time maintaining and fixing pipelines.

Snowflake natively supports SQL processing, and with Snowpark’s
libraries and runtimes, data engineers can also effortlessly build
simple, governed, and fast pipelines with other popular program-
ming languages, including Python, Java, and Scala. Snowpark
offers data engineers three crucial benefits:

»» A single platform that supports multiple languages without
any external compute
»» Superior governance
»» Faster, cheaper, and more resilient pipelines

Snowflake’s intelligent infrastructure eliminates the complexity
that data engineers often face — it just works, so they can focus
on what matters.

Data engineers commonly use Snowpark for three use cases: data
transformation/feature engineering, custom business logic, and
data science and ML pipelines.

Customers see a median of 3.5x faster performance and 34 percent
cost savings with Snowpark over managed Spark.

Data transformations
Snowpark can be used to transform data stored in Snowflake or
S3-compatible data sources using popular programming lan-
guages such as Python, Java, and Scala. This can include data
engineering tasks such as data cleaning, data normalization, and
data aggregation. All data transformations can then be packaged
as Snowpark stored procedures to operate and schedule jobs with
Snowflake Tasks or other orchestration tools.

Data engineering teams can use Snowpark to transform raw data
into modeled formats regardless of type, including unstructured
data such as images, videos, and audio.

Custom business logic
Users can leverage Snowpark’s User-Defined Functions (UDFs) to
streamline architecture with complex data processing and
custom business logic written in Python or Java in the same platform
running SQL queries and transformations. There are no separate
clusters to manage, scale, or operate.
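
Here is a small sketch of what custom business logic in a Python UDF might look like. The loyalty-tier rule, its thresholds, and the UDF name are invented for illustration, and the registration step assumes the Snowpark Python client is installed and a session is open.

```python
from typing import Optional


def loyalty_tier(total_spend: Optional[float]) -> Optional[str]:
    """Hypothetical business rule: map a customer's spend to a tier."""
    if total_spend is None:      # SQL NULL arrives as None
        return None
    if total_spend >= 10_000:
        return "GOLD"
    if total_spend >= 1_000:
        return "SILVER"
    return "BRONZE"


def register_loyalty_udf(session):
    """Register the rule as a session-level (temporary) UDF."""
    # Imported inside the function so this module loads without Snowpark installed.
    from snowflake.snowpark.types import FloatType, StringType

    return session.udf.register(
        loyalty_tier,
        return_type=StringType(),
        input_types=[FloatType()],
        name="LOYALTY_TIER",     # hypothetical UDF name
        replace=True,
    )
```

Once registered, the function can be called from SQL in the same session, for example `SELECT LOYALTY_TIER(TOTAL_SPEND) FROM CUSTOMERS` (a hypothetical table and column), so the rule runs right next to the queries that use it.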

Data science and ML pipelines
Data teams can use the integrated Anaconda repository and
package manager to collaborate in bringing ML data pipelines to
production. Trained ML models can also be packaged as a UDF to
run the model inference close to data, enabling faster paths from
model development to production.

Snowpark enables pipelines with better price performance,
transparent costs, and less operational overhead thanks to Snowflake’s
unique multicluster shared data architecture. Snowflake is a
single integrated platform that delivers the performance, scale,
elasticity, and concurrency that today’s organizations require.

SNOWPARK COMPONENTS
As a refresher, developers interact with Snowpark using libraries and
runtimes:

• Snowpark libraries: On the client side, Snowpark consists of
libraries including the Snowpark API for DataFrame transforma-
tions and the Snowpark ML API for end-to-end artificial intelligence
(AI)/ML development and operations.
• Snowpark runtimes: On the server side, runtimes include Python,
Java, and Scala in warehouse compute and support for any lan-
guage in Snowpark Container Services. In warehouses, developers
can leverage UDFs and stored procedures (sprocs) to bring in and
run custom logic. Snowpark Container Services is available for
workloads that require use of GPUs, custom runtimes/libraries,
or the hosting of long-running full-stack applications.

Building AI/ML Models
Many applications are built on AI/ML models, but taking these
models to production requires multiple teams. Doing so in a scal-
able and repeatable way isn’t easy when each team manages its
own copies of data spread out over its own processing infrastruc-
ture. Snowflake solves these challenges by providing these three
key benefits:

»» Centralized, governed data: Teams can access a single
source of truth for data in your organization and, using
Snowflake Marketplace, easily access structured and
unstructured data from external sources (another business
unit, a partner, or a third party) without manual transfers.
»» Flexible, elastic compute engine: Multiple teams can
access Snowflake’s compute engine concurrently to process
data for features, models, and applications. They can also
bring advanced analytics such as AI and ML to the data
without having to move it to external, ungoverned
environments.
»» Choice of tools: Teams can interact with Snowflake directly
from the Snowflake UI or from another tool. Snowflake
provides developers with the flexibility to use their tool of
choice while ensuring the processing and data stay within
Snowflake’s secure and governed boundaries.

To access, visualize, and process data as part of their ML workflows,
data scientists can leverage their favorite programming
languages, such as Python, using Snowpark in Snowflake.

For Python users, Snowpark ML includes a client-side library
and underlying infrastructure for end-to-end ML workflows
in Snowflake. Snowpark ML unifies data preprocessing, feature
engineering, model training, and integrated deployment into a
single, easy-to-use Python library. Snowpark ML includes the
Modeling API for data preprocessing and model training, as well
as the Snowpark ML Operations capabilities, including the Snow-
park Model Registry, for model deployment and management.

Snowflake’s infrastructure includes Snowpark-optimized
warehouses, which have 16x the memory and 10x the local
cache of standard warehouses and are well suited to
memory-intensive operations, including ML model training and
inference.

Model development
Model development refers to the process of designing, training, and
evaluating an ML model. This involves selecting the appropriate
algorithms, preparing the data, and tuning the model’s param-
eters to achieve the best performance.

The Snowpark ML Modeling API includes a collection of Python APIs
for preprocessing data and training models. By using Snowpark
ML Modeling to perform these tasks within Snowflake, you can:

»» Transform your data and train your models without moving
your data out of Snowflake.
»» Work directly in Snowflake with APIs similar to those of
popular ML frameworks you’re already familiar with, such as
scikit-learn and XGBoost.
»» Keep your ML pipeline running within Snowflake’s security
and governance frameworks.
»» Benefit from the performance and scalability of Snowflake’s
compute infrastructure.
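
To make the preprocessing and training flow concrete, here is a minimal sketch that chains a scaler and an XGBoost classifier with the Snowpark ML Modeling classes. It assumes the `snowflake-ml-python` package is installed and that `train_df` is a Snowpark DataFrame; the `_SCALED` naming convention and the `PREDICTION` output column are choices made for this example only.

```python
def scaled_names(feature_cols):
    """Name the scaled copy of each feature column, e.g. AGE -> AGE_SCALED."""
    return [f"{name}_SCALED" for name in feature_cols]


def train_classifier(train_df, feature_cols, label_col):
    """Fit a scaling + XGBoost pipeline on a Snowpark DataFrame.

    The column names passed in are whatever your own training table uses.
    """
    # Imported lazily so the naming helper works without the ML package.
    from snowflake.ml.modeling.pipeline import Pipeline
    from snowflake.ml.modeling.preprocessing import StandardScaler
    from snowflake.ml.modeling.xgboost import XGBClassifier

    scaled = scaled_names(feature_cols)
    pipeline = Pipeline(steps=[
        ("scale", StandardScaler(input_cols=feature_cols, output_cols=scaled)),
        ("classify", XGBClassifier(
            input_cols=scaled,
            label_cols=[label_col],
            output_cols=["PREDICTION"],
        )),
    ])
    pipeline.fit(train_df)  # training runs in Snowflake, not on the client
    return pipeline
```

Calling `pipeline.predict(test_df)` on the fitted pipeline returns a DataFrame with the `PREDICTION` column appended, again without moving the data out of Snowflake.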

Model deployment and management
Model deployment refers to the process of integrating a trained ML
model into a production environment so that it can be used to
make predictions on new data.

Snowpark ML has capabilities for ML Operations (MLOps), including
the Snowpark Model Registry, which enables scalable, secure
deployment and management of models in Snowflake. This reg-
istry provides centralized publishing and discovery of models to
streamline collaboration as part of the process where data sci-
entists hand off successful experiments to ML engineers. Models
managed in the Snowpark Model Registry can be deployed to pro-
duction securely on elastic Snowflake infrastructure, including
Snowflake warehouses and Snowpark Container Services.

Central management and governance of ML artifacts and metadata
is critical to scalable and robust MLOps. By having a Snowflake
native solution that can manage any ML model, whether trained
inside or outside of Snowflake, Snowflake customers can stream-
line the process of running multiple models in production — all
while having scalable access controls that democratize access to
the models for SQL users and not just ML experts.

Snowflake helps organizations maximize the value of data,
including unstructured data and third-party data. Regardless of
the language used, Snowflake’s intelligent infrastructure handles
scaling and performance tuning, allowing data scientists to spend
more of their time building models.

With Snowflake, you can:

»» Publish, discover, and share models with a central and
governed view of model artifacts and metadata.
»» Effortlessly deploy registered models for inference using
scalable Snowflake infrastructure.
»» Democratize access with the ability to expose models and
results to coders and SQL users.
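
The publish step might look like the sketch below, which logs a trained model to the Snowpark Model Registry. It assumes the `snowflake-ml-python` package; the default model name `CHURN_MODEL` and version `V1` are placeholders, and the registry is passed in as a parameter so the helper can be exercised without a live connection.

```python
def open_registry(session):
    """Open the model registry in the session's current database and schema."""
    from snowflake.ml.registry import Registry  # needs snowflake-ml-python

    return Registry(session=session)


def publish_model(registry, model, sample_input,
                  name="CHURN_MODEL", version="V1"):
    """Log a trained model so other users can discover and run it.

    `name` and `version` are placeholder values for this example.
    """
    return registry.log_model(
        model,
        model_name=name,
        version_name=version,
        sample_input_data=sample_input,  # used to infer the model signature
    )
```

The object returned by `log_model` represents the registered version; inference can then be run on a DataFrame with a call such as `model_version.run(new_data, function_name="predict")`.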

IN THIS CHAPTER
»» Minimizing the learning curve with the
Snowpark API

»» Building code using code editors and IDEs

»» Building ML models using notebooks

»» Test-driving Snowpark with Snowflake
Python worksheets

»» Exploring Snowpark online resources

Chapter 4
Getting Started
with Snowpark

Are you ready to build applications quickly and cost-effectively
with Snowpark? It’s easy to get started because
Snowflake minimizes the learning curve by eliminating
the need for a separate tool. You can begin development anywhere
that can run a Python kernel. Simply install the Snowpark API and
establish a connection with your Snowflake account.
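
As one possible way to do this, the Snowpark API for Python can be installed with `pip install snowflake-snowpark-python`, and a session opened as sketched below. The `SNOWFLAKE_*` environment variable names and both helper functions are conventions assumed for this example, not something Snowpark requires.

```python
import os

# Environment variable names are an assumption made for this sketch.
REQUIRED = {
    "account": "SNOWFLAKE_ACCOUNT",
    "user": "SNOWFLAKE_USER",
    "password": "SNOWFLAKE_PASSWORD",
}
OPTIONAL = {
    "role": "SNOWFLAKE_ROLE",
    "warehouse": "SNOWFLAKE_WAREHOUSE",
    "database": "SNOWFLAKE_DATABASE",
    "schema": "SNOWFLAKE_SCHEMA",
}


def connection_parameters(env=None):
    """Collect Snowflake connection settings from environment variables."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    params = {key: env[var] for key, var in REQUIRED.items()}
    params.update({key: env[var] for key, var in OPTIONAL.items() if env.get(var)})
    return params


def create_session(env=None):
    """Open a Snowpark session from the collected settings."""
    # Imported lazily so the helper above runs even without Snowpark installed.
    from snowflake.snowpark import Session

    return Session.builder.configs(connection_parameters(env)).create()
```

With the variables set, `session = create_session()` returns a session that the rest of your code can use to query and transform data. Reading credentials from the environment keeps them out of source control.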

Snowpark gives developers the flexibility to tailor their workflow
to their preferences. It supports many development interfaces,
including code editors and integrated development environments
(IDEs), Hex and open-source notebook solutions, and Snowflake
Notebooks and Snowflake Python worksheets in the Snowflake
web interface.

The Snowpark API
The Snowpark API is a library, including a DataFrame API, that
allows developers to query and process data at scale in Snowflake.
The Snowpark DataFrame API enables developers to write queries
and data transformations using familiar DataFrames while ben-
efiting from Snowflake’s processing performance, security, and
scalability.
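
A short sketch of what a DataFrame query can look like in Snowpark for Python follows. The `ORDERS` table and its `CUSTOMER_ID` and `AMOUNT` columns are hypothetical, and `session` is assumed to be an existing Snowpark session.

```python
def top_customers(session, min_total=100):
    """Return a lazily evaluated DataFrame of high-spend customers.

    ORDERS, CUSTOMER_ID, and AMOUNT are hypothetical names for this example.
    """
    # Imported lazily so this module loads without Snowpark installed.
    from snowflake.snowpark.functions import col, sum as sum_

    return (
        session.table("ORDERS")
        .group_by("CUSTOMER_ID")
        .agg(sum_(col("AMOUNT")).alias("TOTAL"))
        .filter(col("TOTAL") >= min_total)
        .sort(col("TOTAL").desc())
    )
```

Nothing is executed when the DataFrame is built. The work is pushed down to Snowflake only when an action such as `show()`, `collect()`, or `to_pandas()` is called on the result.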

Code Editors and IDEs


Many developers prefer to build using code editors and IDEs.
These offer capabilities such as local debugging, autocomplete,
and integration with source control. Snowpark works well in VS
Code, IntelliJ, PyCharm, and other tools. Code editors and IDEs are
a great choice for a rich development and testing experience for
building pipelines.

VS Code also works with a Jupyter extension that provides a notebook
experience within the editor. This brings breakpoints and
debugging into the notebook experience without requiring sepa-
rate management of the Jupyter container or runtime.

Notebook Solutions for Data Science


Notebooks are a popular choice for building machine learning
(ML) models in Snowpark. They enable rapid experimentation
using cells, which means you can write code in one cell, run it,
and see the results immediately without having to run the entire
application. This makes it easier to test small parts of your code
and debug issues. You can run a variety of notebook solutions
with Snowpark.

Snowflake Notebooks
Snowflake Notebooks is a new development interface that offers
an interactive, cell-based programming environment for Python
and SQL users to explore, process, and experiment with data in
Snowpark. Snowflake’s built-in notebooks allow developers to
write and execute code, train and deploy models using Snowpark
ML, visualize results with Streamlit chart elements, and much
more — all within Snowflake’s unified, secure platform.

And, because the Notebook is natively integrated into Snowflake’s
role-based access controls (RBAC), it’s easy to securely share
and collaborate on your code and results without compromising
the security of any enterprise data. For data science and ML, the
cell-based layout in Snowflake Notebooks unlocks experimenta-
tion and exploration tasks, as developers can write and execute
code, visualize results, capture notes, and share insights all in
one place.

Hex notebooks
Snowpark is integrated with Hex’s notebooks, allowing for a
powerful and flexible coding experience. Hex was built to work
with Snowflake’s Data Cloud. Its unique architecture and tight
integrations let users move beyond the limits of traditional note-
books, and modernize their workflows for the cloud. Most if not
all data processing can be pushed down to the Data Cloud, tak-
ing advantage of Snowflake’s scalable compute and advanced
caching. Discovering insights and building apps on data stored
in Snowflake using Hex’s analytics workspace takes just minutes.

Hex provides first-class support for Snowpark. The integration with
Hex’s notebook UI provides an elegant front end for data scientists
and analysts to connect to data in the Data Cloud, explore
and analyze it with SQL and Python, build interactive data apps, and
share them broadly.

Open-source notebooks
Open-source notebooks such as Jupyter can be run locally while
connected securely to Snowflake to execute data operations. This
lets developers interact with their data directly and see their que-
ries in real time. Any machine running containers or Python can
build data pipelines or ML models feeding apps using Snowpark.

Snowflake Python Worksheets


Snowflake Python worksheets are a quick and simple way to try
Snowpark directly in Snowsight, the Snowflake web interface,
for data manipulations and transformations. These worksheets
provide autocomplete for the Snowpark session and can run
directly from the browser as a stored procedure. You can use
third-party packages listed in the Snowflake Anaconda channel
or import your own Python files from stages to use in scripts. If
you’re interested in trying Snowpark for free, the easiest way to
get started is through the 30-day free trial using the Snowflake
Python worksheets here:

https://signup.snowflake.com/?lab=getStartedWithSnowparkInPythonWorksheets
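
For a sense of what goes into a Python worksheet, the sketch below follows the handler pattern, in which the worksheet calls a function with the active Snowpark session. The query against `INFORMATION_SCHEMA.PACKAGES` is just an example choice, and the handler name `main` is assumed to match the worksheet's default setting.

```python
def main(session):
    """Handler that a Python worksheet calls with the active Snowpark session."""
    # Imported inside the handler so the definition loads anywhere.
    from snowflake.snowpark.functions import col

    # List a few Python packages available to the account.
    packages = (
        session.table("INFORMATION_SCHEMA.PACKAGES")
        .filter(col("LANGUAGE") == "python")
        .select("PACKAGE_NAME", "VERSION")
        .limit(10)
    )
    packages.show()   # printed output
    return packages   # returned value is displayed as the worksheet result
```

Because the worksheet runs as a stored procedure, the same handler can later be deployed as one with little or no change.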

SNOWPARK ONLINE RESOURCES


The Snowflake website offers a number of resources to help you get
started with Snowpark quickly and easily:

• Snowpark Quickstarts offer several tutorials, such as
“Getting Started with Snowpark for Python and Streamlit” or
“Intro to Machine Learning with Snowpark for Python.” Visit
http://quickstarts.snowflake.com.
• Snowpark Developer Guides are available for each of the three
supported programming languages: Java, Python, and Scala. Visit
https://docs.snowflake.com/en/developer-guide/snowpark/index
for more information.
• The Snowpark homepage offers a wealth of information on all
things Snowpark: snowflake.com/snowpark.

To get started with Snowpark, you need a Snowflake account and
read/write access to a database. If you do not have a Snowflake
account, you can sign up for a free trial, which doesn’t even require
a credit card!

https://signup.snowflake.com/?lab=getStartedWithSnowparkInPythonWorksheets

IN THIS CHAPTER
»» Streamlining architecture complexity
with one platform

»» Processing code next to the data

»» Running apps in a governed, secure
environment

»» Enjoying improved performance while
keeping costs down

»» Optimizing app distribution

»» Maximizing monetization

Chapter 5
Six Benefits of Building
Apps with Snowpark

Imagine you were building an application on a cloud platform
that lets you run multiple programming languages without
needing different processing engines. You could process your
code next to the data in one governed, secure environment rather
than exporting it to other locations. And you could build, deploy,
distribute, and monetize your application natively on one scalable
platform. What price and performance gains would you be able to
achieve? How much more quickly would you be able to bring your
app to market and serve your customers?

Snowpark, the set of libraries and runtimes that securely deploy
and process programming code such as Python, Java, and Scala
in Snowflake, opens data programmability to all developers,
regardless of which languages you use to code your applications.
You can code in your familiar constructs and run directly inside
Snowflake’s elastic compute engine, with no other processing
system needed. And you can do everything from build to mone-
tize apps on Snowflake’s secure, governed platform. This chapter
details the many ways organizations can benefit from building the
underlying application processing tier — pipelines and machine
learning (ML) models — with Snowpark.

One Platform for All Languages


Developers often need to use multiple programming languages to
build robust applications. Architecture complexity increases sig-
nificantly when different teams use different languages across
multiple processing engines. Snowpark streamlines architec-
tures by natively supporting programming languages of choice —
including Python, Java, and Scala — all in Snowflake without the
need for separate processing engines. Instead, Snowpark brings
all teams together to collaborate on the same data in Snowflake’s
single platform. This can lead to more efficient collaboration and
development productivity.

Snowpark includes runtimes for Python, Java, and Scala in addition
to Snowpark Container Services, which can accommodate
code in any programming language of choice (for more on this,
see Chapter 1).

Python use continues to surge with the rapid adoption
of advanced analytics and ML in the enterprise. According to
Snowflake’s Data Trends 2023 report, in February 2023, Python
accounted for nearly 88 percent of all jobs run on Snowpark.

Bringing the Compute to the Data


One of the top four trends identified in Snowflake’s Data Trends
2023 report was that state-of-the-art companies are bring-
ing their work and artificial intelligence (AI) to the data, not
vice versa.

Snowpark lets developers write and execute code where the data
is securely stored and governed. With Snowpark as the process-
ing layer, developers can build and run applications directly on
data already in Snowflake. In addition, Snowflake can automati-
cally scale compute resources up and down for virtually unlim-
ited concurrency without impacting performance or having to
reshuffle data.

As a result, developers can build and scale applications powered
by Snowflake without worrying about capacity planning or costly
overprovisioning. Snowflake also provides a set of tools to write
code, check that it works correctly, and put it into production —
which can help accelerate the release of new applications or
features.

As we’ve mentioned elsewhere in the book, developers can use
Snowpark to write code in their preferred language to process and
analyze structured, semistructured, and unstructured data gov-
erned in Snowflake.

Built-In Governance and Security


Another trend highlighted in the Data Trends 2023 report was
that organizations are paying increased attention to governance.
One way to see how much attention is being paid to data protec-
tion is to look at how often data masking policies are set to limit
access to the right users at the right times. Across the Snowflake
Data Cloud in FY 2022, the number of applied masking poli-
cies grew more than 205 percent according to Snowflake’s Data
Trends 2023 report.

Enterprise-grade governance controls and security are built into
Snowpark. For example, Snowpark’s secure design isolates data
to protect the network and host from malicious workloads, and
gives administrators control over the libraries that developers
execute. Developers can build confidently, knowing data security
and compliance measures are incorporated and consistent.

Increased Performance
and Reduced Price
Snowpark customers see significant price and performance ben-
efits over other managed and open-source solutions, thanks
to Snowflake’s engine and the elimination of data movement.
According to Snowflake’s Snowpark Price Performance: Customer
Results report, as of June 2023, customers saw a median of 3.5x
faster performance and a 34 percent cost savings with Snowpark
over managed Spark. The performance and price benefits are
largely due to these reasons:

»» Snowflake’s distributed compute engine is logically
integrated but physically separated from storage. It was built
using a multi-cluster, shared data architecture that plans
and optimizes the execution of concurrent workloads, with
many built-in optimizations such as autoclustering and
micro-partitioning.
»» Snowflake’s data platform enables developers working
in Snowpark to bring their code to the data rather than
exporting data to other environments. The elimination of
external data processing engines for pipelines built in
multiple languages — which often requires converting,
egressing, scanning, and loading data from its source —
helps to reduce costs and improve performance.

Building with Streamlit in Snowflake


Developers need to have as much flexibility in distributing their
apps as they do in building them. By using Snowpark as the pro-
cessing tier for apps, developers also benefit from all the other
capabilities in the Snowflake platform. Whether you want to
turn ML models into apps, or do everything from build to mon-
etize an app in one environment, Snowflake has options based
on your preferences. (See the following section for more about
monetization.)

Streamlit in Snowflake enables Python developers to turn data
and ML models into interactive web apps, with no front-end
development required. Streamlit is a pure-Python open-source
application framework that enables developers to quickly and easily
write, share, and deploy applications using Snowflake as a data
source.

Data scientists and developers can

»» Build apps using Streamlit’s component-rich, open-source
Python library.
»» Modify code and see changes go live with side-by-side editor
and app preview screens in Snowflake.
»» Share Streamlit apps via URLs that leverage existing
role-based access controls and run on Snowflake’s scalable,
secure, and performant infrastructure.
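
A Streamlit in Snowflake app can be as small as the sketch below. It assumes the app runs inside Snowflake, where `get_active_session()` supplies the session; the `DAILY_ORDERS` table, its `ORDER_DATE` and `TOTAL` columns, and the function name `render_app` are hypothetical.

```python
def render_app():
    """Draw a small dashboard; call this at the end of the app script."""
    # Imported inside the function so the definition loads anywhere.
    import streamlit as st
    from snowflake.snowpark.context import get_active_session

    session = get_active_session()  # provided when running inside Snowflake
    st.title("Daily order totals")

    # DAILY_ORDERS and its ORDER_DATE / TOTAL columns are hypothetical.
    days = st.slider("Days to show", min_value=7, max_value=90, value=30)
    data = (
        session.table("DAILY_ORDERS")
        .sort("ORDER_DATE", ascending=False)
        .limit(days)
        .to_pandas()
    )
    st.line_chart(data, x="ORDER_DATE", y="TOTAL")
    st.dataframe(data)
```

In the app script, a final line calling `render_app()` draws the page. Moving the slider reruns the script, so the chart and table refresh with the newly selected range.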

Distribute and Monetize with Snowflake
Native Apps and Snowflake Marketplace
With Snowflake, developers can build, deploy, distribute, and
monetize applications inside the Data Cloud. They can use the
Snowflake Native App Framework to bring together all the com-
ponents of their application, using Snowflake first-party capa-
bilities such as Snowpark and Streamlit in Snowflake. The app
can then be distributed and monetized in Snowflake Marketplace
and deployed directly inside a customer’s Snowflake account.
This provides a convenient way for developers to reach a large
audience of potential customers and generate revenue from their
applications.

Snowflake Marketplace allows developers to distribute and
sell their Snowflake-powered applications to other Snowflake
customers. From the consumer side, Snowflake customers can
discover and purchase Snowflake Native Apps on Snowflake
Marketplace, and then install and run them within their own
Snowflake account. Bringing the apps close to the data opens up
a whole new world of possibilities to enrich, activate, enhance,
visualize, and transform data — without data ever leaving the
consumer’s account.

Because Snowflake Native Apps run in the customer’s account and
there is no need to move data, security and procurement hurdles
are reduced, which also accelerates sales cycles for providers
and time to value for customers. Risk to providers’ intellectual
property is reduced as the code remains private from end users.
Customers can access only the interface visible to them without
ever accessing proprietary data sets or logic from providers.

Snowflake offers a range of monetization options, from
subscription-based models to usage-based models priced per
month or query. Custom Event Billing capabilities enable you to
build your own pricing strategy. You can charge customers based
on their usage and specify billing events based on your prefer-
ences, such as consumed rows, ingested rows, unique ingested
users, locations monthly, and more.

Customers see benefits as well: With on-platform billing, customers
can pay via credit card or ACH, or even use their Snowflake
Capacity commitment. And because these customer-facing apps
are Powered by Snowflake, users know they’ll receive the benefits
of Snowflake’s scalability, speed, and reliability.

BUILD, DISTRIBUTE, MONETIZE,
AND DEPLOY APPS — ALL IN
THE DATA CLOUD
How it works for app providers:

1. Build the app’s processing tier using Snowpark.
2. Build the app’s user interface with Streamlit in Snowflake.
3. Deploy and test the app locally.
4. Distribute and monetize apps on Snowflake Marketplace.

WILEY END USER LICENSE AGREEMENT
Go to www.wiley.com/go/eula to access Wiley’s ebook EULA.
