
7 Snowflake Reference Architectures For Application Builders

This document describes a serverless data stack reference architecture that uses serverless functions, databases, and workflows. The architecture includes an API gateway, serverless compute functions, a NoSQL/OLTP database for transactions, serverless ETL to load data into Snowflake for querying, and Snowflake to provide scalable data warehousing. The serverless components scale automatically without provisioning servers, and Snowflake scales its warehouses to keep data fresh and support concurrent queries.


For every data app use case, there is a modern data architecture. Discover yours.
Why your data platform matters
Serverless data stack reference architecture
Streaming data stack reference architecture
Machine learning and data science reference architecture
Application health and security analytics
IoT reference architecture
Customer 360 reference architecture
Embedded analytics reference architecture
Future-proof your applications
About Snowflake

WHY YOUR DATA PLATFORM MATTERS

It’s safe to say data application builders will never worry about a
lack of data. Approximately 40 zettabytes (ZB) of new data was
generated in 2019, and IDC predicts that with a steady growth
trajectory, 175 ZB will be generated in 2025. Although these ever-
increasing amounts of data present immeasurable opportunities for
delivering data-driven insights to customers, there are three crucial
questions every startup and established ISV provider should ask:
• CAN OUR UNDERLYING ARCHITECTURE SCALE TO MEET THE NEEDS OF OUR FAST-GROWTH BUSINESS?
• CAN OUR PRODUCT INGEST AND ANALYZE LARGE AMOUNTS OF STRUCTURED AND SEMI-STRUCTURED DATA TOGETHER?
• MOST IMPORTANTLY, CAN WE ACCOMPLISH THESE GOALS WHILE REMAINING OPERATIONALLY EFFICIENT AND COST-EFFECTIVE?
CHAMPION GUIDES
Today, too many organizations are burdened by infrastructure costs that arise from traditional architectures. When companies can achieve scalability only by throwing more resources at the problem, companies face an expensive and never-ending problem. Traditional architectures are also riddled with operational overhead in the form of maintenance and tuning, which wastes valuable engineering time and slows growth.

The questions above highlight the intrinsic need for a data stack architecture that has scalability, connectivity, and support for all data types built into its design. That means selecting cloud-built infrastructure components, the most important of which is your data platform.

As the central hub for all-things data, only a cloud data platform can deliver the performance and nearly infinite autoscaling needed to launch and scale applications quickly and cost-effectively. Here’s what the Snowflake Cloud Data Platform provides:

• High performance and unlimited concurrency
Through a multi-cluster, shared data architecture, Snowflake spins up dedicated compute clusters that support a nearly unlimited number of concurrent workloads on shared tables. There’s never contention for resources or an unhappy user.

• Scalability with true elasticity
Snowflake compute resources scale up and down automatically to deliver on-demand high performance that’s cost-effective.

• SQL for all data
Snowflake ingests JSON, Avro, Parquet, and other data without transformations or requiring pipeline fixes every time the schema changes. With ANSI SQL, Snowflake enables your teams to query semi-structured data just as easily as structured data.

• No Site Reliability Engineering/DevOps burden
As a near-zero management platform, Snowflake automatically handles provisioning, availability, tuning, data protection, and other operations, which enables you to focus on your own application rather than maintenance.

Snowflake also ensures seamless connections to third-party platforms and APIs, easily fitting in with your existing environment.

This ebook provides detailed reference architectures for seven use cases and design patterns, and it demonstrates the importance of a cloud-built data platform that matches scalability and connectivity expectations, both today and in the future.

SERVERLESS DATA STACK REFERENCE ARCHITECTURE

OBJECTIVE
Build data-intensive applications that run on serverless infrastructures.

[Figure: Serverless data stack. Client-side apps and backend apps and services (1) call an API gateway service (Amazon API Gateway, Azure API Management, Google Cloud Endpoints, Apigee API Platform), which invokes serverless compute (2: AWS Lambda, Azure Functions, Google Cloud Functions). A NoSQL/OLTP DB (3: Amazon Aurora Serverless, Azure Cosmos DB, Google Cloud Datastore) handles transactions, and serverless ETL (4: AWS Step Functions, Azure Data Factory, Google Cloud Composer, Google Cloud Dataflow) loads data into Snowflake (5: Native JSON Support, Zero Management, Workload Isolation).]

DESCRIPTION
1. The client-side app, running on mobile or web devices, invokes the application logic on the serverless compute via an API gateway service. The gateway authenticates the API calls and throttles them, based on SLAs.

2. Serverless compute runs the application logic and scales on demand, without the need to provision or manage servers. The application queries Snowflake data (5) for runtime decisions, such as delivering product recommendations or powering a dashboard for analysis.

3. An OLTP or NoSQL database provides the application with high-capacity transaction processing. This NoSQL/OLTP database can also be a serverless service.

4. An ETL serverless stack orchestrates the workflow and loads transaction data into Snowflake.

5. Snowflake ingests data in batches or in streams and makes it available to the application for queries. Snowflake scales automatically to keep pace with the data pipeline and ensure data is always fresh. Workloads are isolated in virtual warehouses where they can run and scale concurrently without resource contention. Native JSON support enables easy ingestion and querying of flexible schema data alongside structured data.
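
The description says the gateway throttles API calls based on SLAs. One common way to implement that kind of throttling is a token bucket. The sketch below is a minimal illustration of the idea, not a description of how any particular gateway product works; the class name `TokenBucket`, the `make_bucket` helper, and the per-tier limits are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Hypothetical per-client throttle: `rate` tokens per second, bursts up to `capacity`."""
    rate: float
    capacity: float
    tokens: float = field(init=False)
    last_seen: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start with a full bucket

    def allow(self, now: float) -> bool:
        """Return True if a call arriving at time `now` (in seconds) may proceed."""
        elapsed = max(0.0, now - self.last_seen)
        # Refill in proportion to elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_seen = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Assumed SLA tiers as (requests per second, burst size); the numbers are made up.
SLA_TIERS = {"free": (1.0, 2.0), "premium": (10.0, 20.0)}


def make_bucket(tier: str) -> TokenBucket:
    rate, capacity = SLA_TIERS[tier]
    return TokenBucket(rate=rate, capacity=capacity)


bucket = make_bucket("free")
# Three calls at t=0: the burst of two passes and the third is throttled.
print([bucket.allow(0.0) for _ in range(3)])
# One second later a single token has been refilled.
print(bucket.allow(1.0))
```

Passing the time in as an argument, rather than reading a clock inside `allow`, keeps the sketch deterministic and easy to test; a real gateway would track one bucket per API key or client.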
STREAMING DATA STACK REFERENCE ARCHITECTURE

OBJECTIVE
Build data-intensive applications that rely on streaming data ingestion and analysis.

[Figure: Streaming data stack. A producer app (1) sends data to streaming services (Amazon Kinesis, Cloud Pub/Sub, Azure Event Hub) and, optionally, to cloud object storage (2: Google Cloud Storage, Amazon S3, Azure Blob Storage). Snowflake (3: Auto Ingestion Using Snowpipe, Transformation Using Streams & Tasks, Instant Scalability) serves in-app analytics.]

DESCRIPTION
1. The producer application generates continuous data that the streaming service ingests and buffers to account for data rate differences between the producer and consumers. Depending on the application’s needs, Snowflake ingests data directly from the streaming service or via cloud object storage (2).

2. In cases where the application requires raw data to persist in cloud object storage, the streaming service processes the raw data and batches it into larger chunks, thus lowering storage API expenses. When Amazon Kinesis is used as the streaming service, data is staged in cloud object storage before ingestion.

3. Snowflake ingests data from the streaming service into a staging table and stores the streamed data for analysis. Its Streams and Tasks features detect data changes and schedule tasks to perform any required transformations. Multiple streams and tasks can be chained to implement a complex data pipeline. Snowpipe with Auto-Ingest automates the data ingestion from cloud object storage.
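
The description notes that the streaming service batches raw data into larger chunks before writing to cloud object storage, which lowers the number of storage API requests. The sketch below illustrates that batching step in isolation. The function name `batch_records`, the record sizes, and the 10,000-byte limit are assumptions for this example and are not taken from any streaming service.

```python
from typing import Iterable, Iterator, List


def batch_records(records: Iterable[bytes], max_bytes: int) -> Iterator[List[bytes]]:
    """Group records into chunks whose total size stays within `max_bytes`.

    A single record larger than `max_bytes` is emitted as a chunk of its own.
    """
    chunk: List[bytes] = []
    size = 0
    for record in records:
        if chunk and size + len(record) > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(record)
        size += len(record)
    if chunk:
        yield chunk  # flush whatever is left at the end


# 1,000 hypothetical 100-byte events.
events = [b"x" * 100 for _ in range(1000)]

# Writing each event as its own object costs one storage request per event.
unbatched_writes = len(events)
# With a 10,000-byte limit, 100 events fit into each object.
batched_writes = sum(1 for _ in batch_records(events, 10_000))

print(unbatched_writes, batched_writes)
```

A production pipeline would usually also flush on a time limit so that a slow stream does not hold data back indefinitely; that is left out here to keep the example short.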

MACHINE LEARNING AND DATA SCIENCE REFERENCE ARCHITECTURE

OBJECTIVE
Train machine learning (ML) models to build predictive applications, such as recommendation engines.

[Figure: Machine learning and data science. Apps and microservices (1) send data through streaming services (Amazon Kinesis) and cloud object storage (2: Google Cloud Storage, Amazon S3, Azure Blob Storage) into Snowflake (3: Transformation/Cleanup Using Streams & Tasks, Zero-copy Clones for feature engineering and experimentation, Query Data in Object Storage via External Tables). Model training (4) uses automated training platforms, custom training platforms, and machine learning libraries; the services shown are Amazon SageMaker, Google ML Engine, and Azure ML Service. Model deployment for batch/real-time prediction (5) serves the application.]

DESCRIPTION
1. The application produces training data, which Snowflake (3) ingests via the streaming service or via cloud object storage (2). The streaming service buffers the training data to ensure reliable and continuous ingestion.

2. When cloud object storage is used, the streaming service batches training data into larger chunks to lower storage API expenses.

3. Snowflake ingests data into a staging table. When new data is detected, the Streams and Tasks features schedule required transformations. Multiple streams and tasks can be chained to implement a complex data pipeline. External Tables support queries of data in cloud object storage without ingestion. Data scientists can create zero-copy clones of the training data to support feature engineering and experimentation.

4. Using the data stored in Snowflake, data scientists train models with ML platforms and available libraries. Once the model artifacts are trained, they are deployed on the training platforms or on a separate process (5) to support predictions.

5. The application performs predictions in real time or schedules batch predictions using the deployed models. For batch predictions, data is read from an input table in Snowflake, and the results are stored in an output table where they are available to the application. In cases where subsecond response time is required, predictions can also be performed using input data from the streaming service.
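
The description says that streams detect new data and that multiple streams and tasks can be chained into a pipeline. The sketch below models that data flow in plain Python so it can be run anywhere. It is a toy model under stated assumptions, not Snowflake's implementation or API: the `Table`, `Stream`, and `Task` classes, the two transformations, and the sample rows are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class Table:
    """Append-only list of rows standing in for a database table."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self, rows: List[Row]) -> None:
        self.rows.extend(rows)


class Stream:
    """Tracks which rows of a table have not been consumed yet."""

    def __init__(self, table: Table) -> None:
        self.table = table
        self.offset = 0

    def has_data(self) -> bool:
        return self.offset < len(self.table.rows)

    def consume(self) -> List[Row]:
        new_rows = self.table.rows[self.offset:]
        self.offset = len(self.table.rows)  # advance past what was just read
        return new_rows


class Task:
    """Applies `transform` to new rows from `source` and appends the result to `target`."""

    def __init__(self, source: Stream, target: Table,
                 transform: Callable[[List[Row]], List[Row]]) -> None:
        self.source, self.target, self.transform = source, target, transform

    def run(self) -> None:
        if self.source.has_data():  # skip the run when nothing has changed
            self.target.insert(self.transform(self.source.consume()))


def drop_incomplete(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r.get("rating") is not None]


def add_label(rows: List[Row]) -> List[Row]:
    return [{**r, "liked": r["rating"] >= 4} for r in rows]


staging, cleaned, features = Table(), Table(), Table()
pipeline = [
    Task(Stream(staging), cleaned, drop_incomplete),  # staging -> cleaned
    Task(Stream(cleaned), features, add_label),       # cleaned -> features
]

staging.insert([{"user": "a", "rating": 5}, {"user": "b", "rating": None}])
for task in pipeline:
    task.run()

staging.insert([{"user": "c", "rating": 2}])
for task in pipeline:
    task.run()

print(features.rows)
```

Each task only sees rows added since its stream was last consumed, so running the pipeline again with no new data leaves the downstream tables unchanged.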

APPLICATION HEALTH AND SECURITY ANALYTICS REFERENCE ARCHITECTURE

OBJECTIVE
Analyze large volumes of log data to identify security threats and monitor application health.

[Figure: Application health and security analytics. App/infrastructure logs and AWS CloudTrail feed log collection and aggregation systems (1), which deliver data to streaming services (2: Amazon Kinesis, Cloud Pub/Sub, Azure Event Hub) or cloud object storage (3: Google Cloud Storage, Amazon S3, Azure Blob Storage). Snowflake (4: Snowpipe w/Auto Ingest, Cost Effective log storage, Scheduled Tasks invoke SQL-based checks) feeds a rule/ML based alerting system (e.g. SnowAlert) and monitoring dashboards/ad hoc querying (5), a message service (6: Email, Amazon SNS, SMS/Push), and SIEM (7).]

DESCRIPTION
1. The application and its infrastructure log large volumes of event data that can be used to monitor application health and detect malicious behavior. Log collection and aggregation systems centralize log data from multiple sources and deliver it to a streaming service (2) or to cloud object storage (3).

2. The streaming service buffers log data to ensure reliable and continuous ingestion.

3. Depending on which log collector and aggregation system is used, data can be staged in cloud object storage without the need for a streaming service.

4. Snowflake stores and analyzes the log data, which can be saved for long periods at commodity storage prices. Snowpipe with Auto-Ingest automates the ingestion from cloud object storage. Scheduled tasks invoke SQL-based queries to detect suspicious behavior or application health concerns.

5. External rule-based alerting systems, such as SnowAlert, can detect suspicious activity or health concerns. Operations teams can monitor the application via dashboards or ad hoc queries.

6. A messaging service uses email, SMS, or push notifications to notify operations teams of events that require attention.

7. SIEM systems can leverage data in Snowflake for advanced searching and alerting capabilities.
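
The description mentions scheduled tasks that invoke SQL-based queries to detect suspicious behavior. The sketch below shows what one such check might look like. It uses SQLite from the Python standard library purely as a local stand-in for Snowflake so the example runs anywhere; the `auth_events` table, its columns, the sample rows, and the threshold of three failed logins are all assumptions made for this illustration.

```python
import sqlite3

# SQLite is only a local stand-in here; the table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auth_events (ts TEXT, source_ip TEXT, username TEXT, outcome TEXT)"
)
conn.executemany(
    "INSERT INTO auth_events VALUES (?, ?, ?, ?)",
    [
        ("2020-01-01T10:00:00", "203.0.113.7", "admin", "FAILURE"),
        ("2020-01-01T10:00:05", "203.0.113.7", "admin", "FAILURE"),
        ("2020-01-01T10:00:09", "203.0.113.7", "root", "FAILURE"),
        ("2020-01-01T10:01:00", "198.51.100.4", "alice", "SUCCESS"),
        ("2020-01-01T10:02:00", "198.51.100.4", "alice", "FAILURE"),
    ],
)

# The check itself: source IPs with at least a threshold of failed logins in a time window.
FAILED_LOGIN_CHECK = """
    SELECT source_ip, COUNT(*) AS failures
    FROM auth_events
    WHERE outcome = 'FAILURE' AND ts >= ? AND ts < ?
    GROUP BY source_ip
    HAVING COUNT(*) >= ?
    ORDER BY failures DESC
"""


def run_check(window_start: str, window_end: str, threshold: int = 3):
    """Return (source_ip, failures) rows that should raise an alert."""
    return conn.execute(
        FAILED_LOGIN_CHECK, (window_start, window_end, threshold)
    ).fetchall()


alerts = run_check("2020-01-01T10:00:00", "2020-01-01T11:00:00")
print(alerts)
```

In the architecture above, a query of this kind would run on a schedule, and any rows it returns would be handed to the alerting system and messaging service.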

IOT REFERENCE ARCHITECTURE

OBJECTIVE
Build applications that analyze large volumes of time-series data from IoT devices and respond in real time.

[Figure: IoT. IoT devices (1) communicate over MQTT with an IoT message broker (2: AWS IoT Core, Azure IoT Hub, Cloud IoT Core, HiveMQ). Data flows through streaming services (3: Amazon Kinesis, Cloud Pub/Sub, Azure Event Hub) or cloud object storage (4: Google Cloud Storage, Amazon S3, Azure Blob Storage) into Snowflake (5: Native JSON Support, Aggregation Using Streams & Tasks, Time-series Optimized Data Ingestion with Snowpipe, Query Data in Object Storage via External Tables), which supports IoT analytics. An IoT rules engine (6) sends messages back to devices.]

DESCRIPTION
1. Smart devices, sensors, and other IoT devices generate continuous data.

2. Due to frequently unreliable internet connectivity, IoT devices communicate using the MQTT protocol and an IoT message broker. The message broker uses a publish and subscribe mechanism to interact with other services, which subscribe to specific topics within the broker to access device data.

3. A streaming service is used to ingest and buffer real-time device data, thus ensuring reliable ingestion and delivery to a staging table in Snowflake (5).

4. In cases where the application requires it, cloud object storage is used to stage batch data prior to ingestion. For example, minute-by-minute data may be stored in cloud object storage, whereas aggregated data over a longer period may be stored in Snowflake (5).

5. Snowflake offers native support for JSON and other semi-structured data formats for easy ingestion of device data. Snowpipe automatically optimizes time-series queries by ingesting data chronologically. Snowflake’s Streams and Tasks features automate the workflows required to ingest and aggregate incoming data.

6. An IoT rules engine hosts the business logic required by the application and operates on data available in Snowflake and in the message broker. The rules engine sends messages back to control devices.
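
The description says that services subscribe to specific topics within the message broker to access device data. The sketch below illustrates that publish and subscribe mechanism with MQTT-style topic filters, where `+` matches exactly one topic level and a trailing `#` matches all remaining levels. The `Broker` class is a hypothetical in-memory stand-in, not a real MQTT client or broker, and the topic names and payloads are made up. The matcher does not validate filters or apply the special rules for topics that start with `$`.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style matching: '+' matches one level, a trailing '#' matches the rest."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # matches zero or more remaining levels
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class Broker:
    """Hypothetical in-memory stand-in for an IoT message broker."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[str, str], None]]] = defaultdict(list)

    def subscribe(self, topic_filter: str, callback: Callable[[str, str], None]) -> None:
        self._subs[topic_filter].append(callback)

    def publish(self, topic: str, payload: str) -> None:
        for topic_filter, callbacks in self._subs.items():
            if topic_matches(topic_filter, topic):
                for callback in callbacks:
                    callback(topic, payload)


broker = Broker()
to_stream: List[str] = []  # what a streaming-service subscriber would receive
to_rules: List[str] = []   # what a rules-engine subscriber would receive

broker.subscribe("devices/+/telemetry", lambda topic, payload: to_stream.append(payload))
broker.subscribe("devices/thermostat-1/#", lambda topic, payload: to_rules.append(payload))

broker.publish("devices/thermostat-1/telemetry", '{"temp_c": 21.5}')
broker.publish("devices/door-7/telemetry", '{"open": true}')
broker.publish("devices/thermostat-1/status", '{"online": true}')

print(to_stream)
print(to_rules)
```

Here the streaming subscriber receives telemetry from every device, while the rules-engine subscriber receives every message from a single device, which mirrors how different downstream services can read different slices of the same device traffic.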

CUSTOMER 360 REFERENCE ARCHITECTURE

OBJECTIVE
Build sales and marketing applications that use historical and real-time data to accomplish “360-degree view” customer goals, such as finding new segments and sending personalized offers.

[Figure: Customer 360. Apps and services write product data, audience data, purchase attribution data, and user activity data to cloud object storage (1: Amazon S3, Google Cloud Storage, Azure Blob Storage) and send events to a streaming service (2: Azure Event Hub, Amazon Kinesis, Cloud Pub/Sub). ETL (3: AWS Step Functions, Google Cloud Dataflow, Azure Data Factory) loads data into Snowflake, and 3rd party data arrives via Snowflake Secure Data Sharing (4). Snowflake (5: Native JSON Support, Data Enrichment using Streams & Tasks, Query Data in Object Storage via External Tables, Data Monetization via Secure Data Sharing) supports ML models (6).]

DESCRIPTION
1. Cloud object storage stages application data, such as data on products, audiences, purchase attributions, and user activity, for ingestion.

2. A streaming service ensures reliable and continuous ingestion by buffering event data, such as clickstreams.

3. ETL services orchestrate the workflow to load data from cloud object storage into Snowflake.

4. Snowflake Secure Data Sharing enables data from third-party sources to be used without copying or moving the data.

5. Snowflake supports all the analytics workloads within the application. External Tables support queries of data in cloud object storage without ingestion. The Streams and Tasks features automate the ingestion and data enrichment process. Native support for JSON and other semi-structured formats simplifies the ingestion of event data. Secure Data Sharing enables monetization of fresh data without copying or moving the data.

6. ML models are trained to optimize offers based on historical data stored in Snowflake. The application makes real-time predictions via an API and uses Snowflake tables to store input data and batch prediction results.

EMBEDDED ANALYTICS REFERENCE ARCHITECTURE

OBJECTIVE
Build analytics-heavy applications that deliver in-app visualizations.

[Figure: Embedded analytics. The app calls an API/web tier (1), backed by an in-memory cache (2: Memcached) and an OLTP/NoSQL DB (3). ETL loads data into Snowflake (4: Workload Isolation, Fast Analytical Queries, Native JSON Support), which serves in-app embedded business intelligence (5).]

DESCRIPTION
1. The application makes requests via an API or web tier, depending on whether API management is required to enforce an SLA.

2. An in-memory cache serves in-session read requests to ensure millisecond response time.

3. An OLTP or NoSQL database supports the transaction workloads of the application. Snowflake (4) ingests historical transaction data via ETL infrastructure to support analytical workloads.

4. Snowflake stores all historical data and supports queries by the application and business intelligence tools (5). Virtual warehouses isolate workloads and autoscale compute resources to deliver high-performance queries and unlimited concurrency.

5. Embedded business intelligence tools or open-source charting libraries support analytics from within the application.
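
The description places an in-memory cache in front of the data stores so that repeated in-session reads return in milliseconds. A common way to use such a cache is the cache-aside pattern: look in the cache first and run the query only on a miss. The sketch below illustrates that pattern with an in-process dictionary as a stand-in for a cache such as Memcached. The `SessionCache` class, the 30-second TTL, the key format, and the `load_dashboard` function are assumptions made for this example.

```python
from typing import Any, Callable, Dict, List, Tuple


class SessionCache:
    """Hypothetical in-process stand-in for an in-memory cache such as Memcached."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float]) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry time, value)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, or call `loader` and cache its result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = loader()  # the slow path, for example a database or warehouse query
        self._entries[key] = (now + self._ttl, value)
        return value


# A fake clock keeps the example deterministic.
fake_now = [0.0]
cache = SessionCache(ttl_seconds=30.0, clock=lambda: fake_now[0])

queries_run: List[str] = []


def load_dashboard() -> dict:
    queries_run.append("dashboard")  # stands in for running a real query
    return {"orders_today": 42}


cache.get_or_load("dashboard:user-1", load_dashboard)  # miss: runs the query
cache.get_or_load("dashboard:user-1", load_dashboard)  # hit: served from memory
fake_now[0] = 31.0
cache.get_or_load("dashboard:user-1", load_dashboard)  # expired: runs the query again

print(cache.hits, cache.misses, len(queries_run))
```

The TTL bounds how stale a cached result can be; choosing it is a trade-off between response time and data freshness that depends on the application.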

FUTURE-PROOF YOUR APPLICATIONS

Regardless of the type of applications you build or what architectural design pattern you select, you must meet the core data platform requirements for scalability and connectivity if you want to attract and keep customers to grow your business. With Snowflake, you can meet customer expectations with a modern foundation for your data stack that delivers a highly performant service, both now and in the future.

Rather than spend valuable development time rearchitecting your data stack over and over again to chase ever-evolving scalability needs, a cloud data platform lets you focus on what you do best: building and improving your application to entice new customers.

And that’s something you can hang your app on.

ABOUT SNOWFLAKE
Snowflake’s cloud data platform shatters the barriers that have prevented organizations of all sizes from unleashing the true value from their data.
More than 2,000 customers deploy Snowflake to advance their businesses beyond what was once possible by deriving all the insights from all
their data by all their business users. Snowflake equips organizations with a single, integrated platform that offers the only data warehouse built for
the cloud; instant, secure, and governed access to their entire network of data; and a core architecture to enable many types of data workloads,
including a single platform for developing modern data applications. Snowflake: Data without limits. Find out more at snowflake.com

© 2020 Snowflake. All rights reserved.

CITATIONS
1. “The Digitization of the World From Edge to Core.” IDC. bit.ly/2QuFiKk
