
A Detailed View

Inside Snowflake
THE DATA WAREHOUSE BUILT FOR THE CLOUD

WHITEPAPER
THE NEED FOR CHANGE

Legacy data warehouses are based on technology that is, at its core, decades old. They were designed in a time when data was simpler and the number of people in an organization who needed or wanted to access the database was small. As analytics has become a company-wide practice, and a larger volume of more diverse data is collected, the data warehouse has become the biggest roadblock on the path to insight. To meet the demands and opportunities of today and tomorrow, data warehouses will need to fundamentally change.

Data is becoming more diverse. It used to be that data came primarily from internal sources (e.g. transactional, ERP, and CRM systems) in structured forms at a predictable rate and volume. Today, in addition to traditional sources, data is being generated by a diverse and rapidly changing set of sources, including application logs, web interactions, mobile devices, and more. That data frequently arrives in flexible semi-structured formats such as JSON or Avro, at highly variable rates and volumes.

Data is being used differently. Data used to flow through complex ETL pipelines into a data warehouse, where reporting queries ran periodically to update fixed dashboards and reports. That process often took days. Today, a wide array of analysts need to explore and experiment with data as quickly as possible, without knowing in advance where they might find value in it. A growing number of applications need immediate access to data in order to support new and existing business processes.

Technology has evolved. There are technologies available today, like the cloud, that were not even conceived of when conventional data warehouses were designed. As such, those warehouses weren't designed to take advantage of the unlimited scale and convenience of the cloud.

Purchasing has evolved. With the diverse and ever-changing workload of the modern data warehouse, many organizations would prefer to pay for their data infrastructure and software as a subscription, instead of a permanent (and large) one-time capital outlay.

“Today’s data warehouses are based on technology that is decades old. To meet the demands and opportunities of today, data warehouses have to fundamentally change.”
— Jeff Shukis, VP Engineering and Tech Ops, VoiceBase

Traditional data warehouses have been adequate for years, but their architectural baggage is becoming more evident as they fail to evolve to changing needs. They are often quite expensive, as well.

At the same time, newer “big data” offerings and noSQL systems such as Hadoop are failing to provide
a better alternative. They can be useful tools for data transformation and data science, but they
weren’t designed for data warehousing. They require difficult-to-find skillsets, are not fully compatible
with the existing ecosystem of SQL-based tools, and fail to deliver interactive performance. What's more, to deliver even part of the capabilities required of a data warehouse, they need to be paired with additional compute and processing tools.

IMAGINING A FRESH APPROACH TO DATA WAREHOUSING

These limitations can't be fixed with haphazard feature updates; they are fundamental to the inadequate architecture of traditional data warehouses and big data solutions. To address their shortcomings, a complete redesign and reimagining of data warehouse architecture and technology is necessary.

If we were to start over, unencumbered by the accumulated baggage of data warehousing history, what would we build? The ideal data warehouse would combine the strengths of data warehousing (performance, security, and a broad ecosystem) with the flexibility and scalability of "big data" systems.

Fig. 1: Snowflake can store any scale of diverse data at a low cost.

Such a data warehouse would be:

• Able to store any type of business data: Natively handle diverse types of data without requiring complex transformations before loading that data into the data warehouse.

• Instantly scalable for flexible performance and concurrency: Able to infinitely scale up and instantly scale down at any time without disruption. It would also be able to scale out to as many different use cases as needed without disruption. It goes without saying that complete elasticity is difficult to accomplish without the unlimited compute resources that the cloud affords.

• A true service: Management and infrastructure would be handled automatically by the warehouse so that users could focus on getting value from their data.

• A seamless fit with existing skills and tools: The data community has been myopically focused on supporting tools for a small number of data scientists, without addressing the huge community of people and tools that understand standard SQL. Full support for standard SQL makes it possible to offer a better engine for those users without the need for new expertise, programming paradigms, and training.

• A flexible subscription and service: Businesses should be able to pay for all of their services and infrastructure as a service, and data warehouses are no different. The flexibility of the subscription model allows for the ebb and flow of business needs, and more elegantly supports the rapid growth and capital expenditure models of modern organizations.

• Able to facilitate seamless data sharing: With organizations looking to share data both inside and outside of their walls, the data warehouse of the future would enable support for seamless data sharing.

Unfortunately, traditional data warehouses, and the noSQL systems that are frequently promoted as their complement or even replacement, are fundamentally unable to fulfill all these requirements.

THE LIMITS OF TRADITIONAL DATA WAREHOUSES AND NOSQL ALTERNATIVES

Traditional data warehouses are fundamentally unable to deliver this vision. Data warehouse appliances with fixed configurations are certainly the most limited, but even software-only products cannot be truly elastic. Those limitations are driven by fundamental flaws in the two dominant scalable database architectures used by traditional databases: shared-disk and shared-nothing.

The Shared-Disk Architecture and its Limitations

The shared-disk architecture was the first approach to emerge for scaling beyond the single-node SMP architectures of early systems. It is designed to scale processing beyond a single server while keeping data in a central location. In a shared-disk system, all of the data is stored on a storage device that is accessible from all of the nodes in the database cluster. Any change in the data is updated and reflected in the single storage location. Shared-disk architectures are attractive for their simplicity of data management: all processing nodes in the database cluster have direct access to all data, and that data is consistent because all modifications to the data are written to the shared disk. However, the scalability of this architecture is severely limited because
even a modest number of concurrent queries will overwhelm the storage device and the network to it, forcing processing to slow down for data to be returned from the shared disk. Additional compute nodes only exacerbate the overloaded shared disk. Further, complicated on-disk locking mechanisms are needed to ensure data consistency across the cluster.

Fig. 2: Shared-disk architecture is limited by the performance of the disk

Shared-Nothing Architecture and its Limitations

Shared-nothing databases arose as a solution to the bottlenecks of the shared-disk architecture. The shared-nothing architecture scales processing and compute together by distributing different subsets of data across all of the processing nodes in the system, eliminating the bottleneck of communication with a shared disk. Designed in an era where bandwidth and network latency to storage was a key bottleneck, the shared-nothing architecture took advantage of inexpensive local disk, moving data storage close to compute.

However, the shared-nothing architecture has its own limitations, which have become increasingly apparent as technology and data analytics have advanced.

For one, the shared-nothing architecture has performance bottlenecks of its own. As the cluster is scaled to more and more nodes, the fact that data is distributed across the cluster requires shuffling data between nodes. That shuffling adds overhead that reduces performance and makes performance heavily dependent on how data is distributed across the nodes in the system.

The challenges of optimizing data distribution in a shared-nothing system have only grown as workloads have become more dynamic and varied. Distribution of data across compute nodes is typically done through static assignment: data is distributed at the time it is loaded by either a user-specified distribution key or by a default algorithm. Changing the data distribution typically requires completely redistributing data across the cluster or even unloading and reloading data. This is a slow and disruptive operation, often requiring queries to pause and blocking queries that modify data.

Further, shared-nothing architectures make it very difficult to select the right balance of storage and compute. Because the cluster must be sized to house all data, compute resources may be more than needed for actual queries or may be insufficient for the queries run on the system. Because of the time required to resize the cluster (if even possible to do so), organizations frequently overprovision these clusters, resulting in wasted resources and spend.

Fig. 3: Shared-nothing architecture is limited by the need to distribute and query data across nodes

Limitations of noSQL

The limited flexibility of traditional data warehouse architectures and their inability to scale cost-effectively to handle the massive data volumes of the modern business helped lead to interest in emerging noSQL alternatives like Hadoop. The ability of noSQL solutions to store non-relational data without first requiring transformation, leverage inexpensive commodity servers for scaling to large data volumes, and support diverse custom programming led organizations to experiment with noSQL solutions
in a variety of use cases. Many wondered whether noSQL solutions could even replace the data warehouse.

However, as organizations have looked more closely at these solutions, it has become clear that they have limitations of their own that make them unable to replace the data warehouse. Most noSQL solutions, including Hadoop, rely on the same shared-nothing architecture that underlies traditional data warehouses. As a result, key limitations of shared-nothing architectures also hinder these solutions: data frequently needs to be shuffled among nodes, compute cannot be sized independently of storage, and clusters often need to be overprovisioned.

Not only that, but noSQL systems generally don't fully support ANSI SQL and are extremely complex to manage. As a result of their inefficiency, they also suffer from poor performance and struggle to support higher levels of concurrency. In short, Hadoop and noSQL tools are fundamentally poor at analytics.

SNOWFLAKE: DATA WAREHOUSE BUILT FOR THE CLOUD

At Snowflake, as we considered the limitations of existing systems, we realized that the cloud is the perfect foundation to build this ideal data warehouse. The cloud offers near-infinite resources in a wide array of configurations, available at any time, and you only pay for what you use. Public cloud offerings have matured such that they can now support a large and growing set of enterprise computing needs, often delivering higher data durability and overall availability than private datacenters, all without the upfront capital costs.

Although a small number of data warehouses are marketing themselves as "cloud" solutions, they weren't designed for the cloud. These offerings are either managed service offerings of existing on-premises products, or simply an installation of existing software in a public cloud infrastructure. Conversely, there are cloud vendors offering "cloud data warehouses" that were never intended to be data warehouses in the first place, and lack full support for basic features like ANSI SQL compatibility.

Snowflake was founded by a team with deep experience in data warehousing. Guided by their experiences and frustrations with existing systems, that team built a completely new data warehouse designed to deliver dynamic infrastructure, performance, and flexibility at a fraction of the cost. Most importantly, they built Snowflake from scratch for the cloud rather than starting with existing software like Postgres or Hadoop.

The Snowflake solution? First of all, Snowflake was built in the cloud and for the cloud to offer completely unlimited storage and compute. Snowflake is a massively parallel processing (MPP) database that is fully relational, ACID compliant, and processes standard SQL natively without translation or simulation. It was designed as a software service that can take full advantage of cloud infrastructure, while retaining the positive attributes of existing solutions.

“With Snowflake’s speed, we can explore this information map at the speed of thought, and move from data, to information, to a decision, 10 times faster.”
— Chris Frederick, Business Intelligence Manager, University of Notre Dame

A new architecture: Multi-cluster, shared data

Snowflake's novel design physically separates but logically integrates storage, compute and services like security and metadata; we call it multi-cluster, shared data, and it consists of three components:

• Storage: the persistent storage layer for data stored in Snowflake

• Compute: a collection of independent compute resources that execute data processing tasks required for queries

• Services: a collection of system services that handle infrastructure, security, metadata, and optimization across the entire Snowflake system

Fig. 4: Built from the ground up for the cloud, Snowflake's unique architecture physically separates and logically integrates compute, storage and services

In a traditional data warehouse, storage, compute, and database services are tightly coupled. This can stem from either the configuration of the physical nodes (even in the cloud), or the architecture of the data warehouse appliance. Even "big data" platforms tie storage, compute and services tightly together within the same nodes. Big data platforms can scale compute and storage to some degree, but they still suffer from the same predictable performance limitations as the number of workloads and users increases.

Snowflake dynamically brings together the storage, compute and services layers, delivering exactly the resources needed exactly when they are needed. The database storage layer resides in a scalable cloud storage service, such as Amazon S3, which ensures data replication, scaling and availability without any management by customers. Snowflake optimizes and stores data in a columnar format within the storage layer, organized into databases as specified by the user.

To allocate compute resources for tasks like loading, transformation and querying, users create "virtual warehouses," which are essentially MPP compute clusters. These virtual warehouses can access any of the databases in the database storage layer to which they have been granted access, and they can be created, resized and deleted dynamically as resource needs change. When virtual warehouses execute queries, they transparently and automatically cache data from the database storage layer. This hybrid architecture combines the unified storage of a shared-disk architecture with the performance benefits of a shared-nothing architecture.
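As a brief sketch of what this looks like in practice (the warehouse name and settings below are illustrative assumptions, not taken from this paper), a virtual warehouse is created, resized, and dropped with ordinary SQL:

```sql
-- Create an independent compute cluster (virtual warehouse). AUTO_SUSPEND
-- pauses the warehouse when it sits idle so no compute is consumed, and
-- AUTO_RESUME restarts it when the next query arrives.
CREATE WAREHOUSE report_wh
  WITH WAREHOUSE_SIZE = 'XSMALL'
       AUTO_SUSPEND   = 300   -- seconds of inactivity before suspending
       AUTO_RESUME    = TRUE;

-- Resize the warehouse on the fly as workload intensity changes.
ALTER WAREHOUSE report_wh SET WAREHOUSE_SIZE = 'LARGE';

-- Remove the warehouse when it is no longer needed; the data itself lives
-- in the separate storage layer and is unaffected.
DROP WAREHOUSE report_wh;
```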

The cloud services layer consists of a set of services


that manage the Snowflake system—metadata,
security, access control, and infrastructure. The
services in this layer seamlessly communicate with
Fig. 3: Built from the ground up for the cloud, Snowflake’s unique architecture
physically separates and logically integrates compute, storage and services client applications (including the Snowflake web user
interface, JDBC, and ODBC clients) to coordinate
In a traditional data warehouse, storage, compute, and
query processing and return results. The services
database services are tightly coupled. This can stem
layer retains metadata about the data stored in
from either the configuration of the physical nodes
Snowflake and how that data has been used, making
(even in the cloud), or the architecture of the data
it possible for new virtual warehouses to immediately
warehouse appliance. Even “big data” platforms tie
use that data.
storage, compute and services tightly together within
the same nodes. Big data platforms can scale compute Unlike a traditional data warehouse, Snowflake
and storage to some degree, but they still suffer from can dynamically bring together the optimal set of
the same predictable performance limitations as the resources to handle a multitude of different usage
number of workloads and users increase. scenarios, with the right balance of IO, memory,
CPU, etc. This flexibility is what makes it possible to

support data warehouse workloads with different query and data access patterns in a single service. Snowflake's architecture enables the following key capabilities:

• Support for all of your data in one system

• Support for all of your use cases with dynamic elasticity

• True ease of use with a self-managing service and automatic adaptation

HOW SNOWFLAKE DELIVERS ON THE PROMISE OF THE CLOUD DATA WAREHOUSE

Support all of your data in one system

Snowflake designed a data warehouse that allows you to store all of your business data in a single system. That is a sharp contrast from current products, which are typically optimized for a single type of data, forcing you to create silos for different data or use cases.

Native Support for Semi-Structured Data

Traditional database architectures were designed to store and process data in strictly relational rows and columns. These architectures built their processing models and optimizations around the assumption that this data consistently contained the set of columns defined by the database schema. This assumption made performance and storage optimizations like indices and pruning possible, but at the cost of a static, costly-to-change data model.

Structured, relational data will always be critical for reporting and analysis. But a significant share of data today is machine-generated and delivered in semi-structured data formats such as JSON, Avro, and XML.

Semi-structured data like this is commonly hierarchical and rarely adheres to a fixed schema. Data elements may exist in some records but not others, while new elements may appear at any time in any record. Correlating the information in this semi-structured data with structured data is important to extract and analyze the information within it.

Using semi-structured data in a traditional relational database requires compromising flexibility or performance. One approach is to transform that data into a relational format by extracting fields and flattening hierarchies so that it can be loaded into a relational database schema. This approach effectively puts the constraints of a fixed schema on that semi-structured data, sacrificing information and flexibility. Fields not specified for extraction are lost, including new fields that appear in the data. Adding fields requires a redesign of the data pipeline and updating all of the data that was previously loaded to include the new fields.

The alternative to this approach, which some databases have implemented, is a special datatype for storing semi-structured data as a complex object or simply as a string type. Although this approach preserves the information and flexibility in the semi-structured data, it sacrifices performance because the relational database engine can't optimize processing for semi-structured data types. For example, accessing a single element in an object commonly requires a full scan of the entire object in order to locate the element.

Because traditional data warehouses do not support the capabilities needed to effectively store and process semi-structured data, many customers have turned to alternative approaches, such as Hadoop, for processing this type of information. While Hadoop systems can load semi-structured data without requiring a defined schema, they require
specialized skills and are inefficient at processing structured data.

With either approach, there are massive sacrifices. In desperation, many organizations are adopting both a traditional data warehouse and Hadoop alongside one another. This creates additional complexity and subjects them to the negative aspects of both systems.

Storing all types of data in Snowflake

Snowflake took a novel, different approach, designing a data warehouse that can store and process diverse types of data in a single system without compromising flexibility or performance. Snowflake's patented approach provides native storage of semi-structured data together with native support for the relational model and the optimizations it can provide.

“I can’t say enough about how fantastic the native JSON support is. Snowflake lets us load our JSON data as is, flatten it all out, load it into the event tables, and then parse that into views. My analysts are really happy about this.”
— Josh McDonald, Director of Analytics Engineering, KIXEYE

Snowflake started by making it possible to flexibly store semi-structured records inside a relational table in native form. This is accomplished through a custom datatype (Snowflake's VARIANT datatype) that allows schema-less storage of hierarchical data, including JSON, Avro, XML and Parquet. This makes it possible to load semi-structured data directly into Snowflake without pre-processing, losing information, or defining a schema. You simply create a table containing a column with Snowflake's VARIANT datatype and then load files containing semi-structured data into that table.
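As a minimal sketch, with hypothetical table and stage names, that workflow can look like this:

```sql
-- A single VARIANT column holds each semi-structured record in native form;
-- no schema needs to be defined for the JSON up front.
CREATE TABLE event_log (
  raw VARIANT
);

-- Load JSON files from a (hypothetical) stage directly into the table,
-- with no flattening or transformation pipeline in between.
COPY INTO event_log
  FROM @event_stage
  FILE_FORMAT = (TYPE = 'JSON');
```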
When Snowflake loads semi-structured data, it automatically discovers the attributes and structure that exist in the data and uses that information to optimize how the data is stored. It looks for repeated attributes across records, and then organizes and stores those repeated attributes separately, enabling better compression and fast access, similar to the way that a columnar database optimizes column storage. Statistics about these pseudo-columns are also calculated and stored in Snowflake's metadata repository for use in optimizing queries. This storage optimization is completely transparent to the user.

Snowflake also enables you to query that data through extensions to SQL, making it simple to write relational queries that combine access to structured and semi-structured data in a single query. Because of Snowflake's approach to storing semi-structured data, the Snowflake query optimizer has metadata about the semi-structured data that allows it to optimize access to it. For example, statistics in the metadata allow the optimizer to apply pruning to reduce the amount of data that needs to be read from the storage layer.
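The sketch below illustrates those SQL extensions against the hypothetical event_log table from the previous example; the attribute names and the customers table are assumptions made for illustration, not details from this paper.

```sql
-- Path notation reaches into the VARIANT column, and ::string casts the
-- extracted value so it can be joined directly against a structured table.
SELECT
    e.raw:customer.id::string  AS customer_id,
    e.raw:device.os::string    AS device_os,
    c.signup_date
FROM event_log e
JOIN customers c
  ON c.customer_id = e.raw:customer.id::string;

-- LATERAL FLATTEN expands a nested array into one row per element, so
-- hierarchical records can be queried relationally.
SELECT
    e.raw:customer.id::string  AS customer_id,
    a.value:action::string     AS action_name
FROM event_log e,
     LATERAL FLATTEN(input => e.raw:actions) a;
```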

Single System for All Business Data

Traditional architectures create isolated silos of data. Structured data is processed in a data warehouse. Semi-structured data is processed with Hadoop. Complex, multi-step operations are required to bring this data together. Scalability limits force organizations to separate workloads and data into separate data warehouses and data marts, essentially creating islands of data that have limited visibility and access to data in other database clusters.

All of these silos make it possible to configure a data warehouse, datamart, or Hadoop cluster that is tuned for a particular workload, but at greater cost and overhead. Even with a significant amount of infrastructure, it is often difficult to actually find insights in the data because each silo of data only contains a part of the relevant data.

Support all of your use cases elastically

The ideal data warehouse would be able to size up and down on demand to provide exactly the capacity and performance needed, exactly when it is needed. However, traditional products are difficult and costly to scale up, and almost impossible to scale down. That forces an upfront capacity planning exercise that typically results in an oversized data warehouse, optimized for the peak workload but running underutilized at all other times.

Fig. 5: Traditional data warehouses must be manually sized to the highest workload, if they are configurable at all. Cloud warehouses could be more elastic.

Cloud infrastructure uniquely enables full elasticity because resources can be added and discarded at any time. That makes it possible to have exactly the resources you need for all users and workloads, but only with an architecture designed to take full advantage of the cloud.

Fig. 6: Snowflake's unique architecture enables it to elastically support any scale of data, processing, and workloads

Snowflake's separation of storage, compute, and system services makes it possible to dynamically modify the configuration of the system. Resources can be sized and scaled independently and transparently, on the fly. This makes it possible for Snowflake to deliver full elasticity across multiple dimensions:

• Data: The amount of data stored can be increased or decreased at any time. Unlike shared-nothing architectures where the ratio of storage to compute is fixed, the compute configuration is determined independently of the volume of data in the system. This architecture also makes it possible to store data at a very low cost because no compute resources are required to store data in the database.

• Compute: The compute resources being used for query processing can also be scaled up or down at any time as the intensity of the workload on the system changes. Because storage and compute are decoupled, and the data is dynamically distributed, changing compute resources does not require reshuffling the data. Compute resources can be changed on the fly, without disruption.

• Users: With most data warehouses, there's a fundamental limit to scaling concurrency because all of the queries are competing for the same resources. As more users and workloads are added, the system gets slower and slower. Regardless of how large the cluster becomes, eventually the system cannot support additional concurrency and the only option is to create a new datamart. This brings with it the extra management burden of replicating or migrating data across systems. Snowflake can scale to support more users and workloads without performance impact because multiple virtual warehouses can be deployed on demand, all with access to the same data.
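As a sketch of that last point (warehouse, database, and table names here are illustrative assumptions), separate warehouses can be stood up for separate workloads while reading the same underlying data:

```sql
-- One warehouse for the loading/ETL workload, another for analyst queries.
-- They run and scale independently, but both operate on the same centrally
-- stored database; no data is copied between them.
CREATE WAREHOUSE load_wh    WITH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
CREATE WAREHOUSE analyst_wh WITH WAREHOUSE_SIZE = 'SMALL'  AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

-- A session simply chooses which compute cluster to use for its queries.
USE WAREHOUSE analyst_wh;
SELECT COUNT(*) FROM sales.public.orders;
```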

ENABLE EASE OF USE

The need for a self-managing system

Conventional data warehouses and "big data" platforms require significant care and feeding. They rely on skilled administrators constantly exerting themselves to maintain the data platform: choosing data distribution schemes, creating and maintaining indices, updating metadata, cleaning up files, and more.

Manual optimization was feasible in an environment where queries were predictable and workloads were few, but it doesn't scale when there are a large number of ever-changing workloads. The time and effort required to optimize the system for all those different workloads quickly gets in the way of actually analyzing data.

In contrast, Snowflake set out to build a data warehouse as a service where users focus on analyzing data rather than spending time managing and tuning. That required Snowflake to design a data warehouse that would:

• Eliminate the management of hardware and software infrastructure. The data warehouse should not require users to think about how to deploy and configure physical hardware. Similarly, users should not need to worry about installing, configuring, patching, and updating software.

• Enable the system to learn and adapt. Rather than requiring users to invest time configuring and tuning (and retuning) a wide array of parameters, Snowflake designed a data warehouse that sees how it is being used and dynamically adapts based on that information.

Eliminating Software and Infrastructure Management

The Snowflake data warehouse was designed to completely eliminate the management of infrastructure. It is built on cloud infrastructure, which it transparently manages for the user. Users simply log in to the Snowflake service and it is immediately available, without complex setup.

Ongoing management of the software infrastructure is also handled by Snowflake. Users do not need to manage patches, upgrades, and system security; the Snowflake service automatically manages the system.

Capacity planning, a painful requirement during the deployment of a conventional on-premises data warehouse, is all but eliminated because Snowflake makes it possible to add and subtract resources on the fly. Because it is easy to scale up and down based on need, you are not forced into a huge upfront cost in order to ensure sufficient capacity for future needs.

Other manual actions within traditional data warehouses that Snowflake automates include the following (several are illustrated in the sketch after this list):

• Continuous data protection: Time Travel enables you to immediately revert any table, database or schema to a previous state. It is enabled automatically and retains data as it changes for up to 24 hours, or up to 90 days in enterprise versions.

• Copying to clone: Most data warehouses require you to copy data in order to clone it, forcing a large amount of manual effort and a significant time investment. Snowflake's multi-cluster, shared data architecture ensures that you never need to copy any data, because any warehouse or database automatically references the same centralized data store.

• Data distribution is managed automatically by Snowflake based on usage. Rather than relying on a static partitioning scheme based on a distribution algorithm or key chosen by the user at load time, Snowflake automatically manages how data is distributed in the virtual warehouse. Data is automatically redistributed based on usage to minimize data shuffling and maximize performance.

• Loading data is dramatically simpler because complex ETL data pipelines are no longer needed to prepare data for loading. Snowflake natively supports and optimizes diverse data, both structured and semi-structured, while making that data accessible via SQL.

• Dynamic query optimization ensures that Snowflake operates as efficiently as possible by looking at the state of the system when a query is dispatched for execution, not just when it is first compiled. That adaptability is a crucial component of Snowflake's ability to scale up and down.

• Scaling compute: Autoscaling can be enabled on any Snowflake multi-cluster data warehouse to match the number of compute clusters to the query and loading workload, without manual intervention or input.
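The sketch below illustrates three of the automated capabilities in this list: Time Travel, zero-copy cloning, and autoscaling. The object names and the specific settings shown are illustrative assumptions, not prescriptions from this paper.

```sql
-- Time Travel: query a table as it existed one hour ago, or bring back a
-- dropped table, without restoring from a separate backup.
SELECT * FROM orders AT (OFFSET => -3600);
UNDROP TABLE orders;

-- Zero-copy cloning: the clone references the same underlying stored data
-- rather than copying it.
CREATE TABLE orders_dev CLONE orders;

-- Autoscaling: a multi-cluster warehouse (where available) adds and removes
-- clusters between the configured bounds as concurrency rises and falls.
CREATE WAREHOUSE bi_wh
  WITH WAREHOUSE_SIZE    = 'MEDIUM'
       MIN_CLUSTER_COUNT = 1
       MAX_CLUSTER_COUNT = 4
       AUTO_SUSPEND      = 300
       AUTO_RESUME       = TRUE;
```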
you use for compute. Each Credit costs around
$2, and one Credit provides enough usage for
an XS data warehouse for one hour. A Small data
warehouse -the next size up- costs 2 credits per
“Snowflake is faster, more flexible, and
hour and delivers approximately twice the compute
more scalable than the alternatives on the horsepower. Each successive size of data warehouse
market. The fact that we don’t need to do continues to double both the compute horsepower

any configuration or tuning is great because and price in credits. This linear model makes it easy
to plan for your expenditures, and keep them low in
we can focus on analyzing data instead of on
the first place.
managing and tuning a data warehouse.”
—Craig Lancaster, CTO, Jana
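As a rough worked example under the pricing just described (the usage pattern is an assumption for illustration): a Medium warehouse, two sizes up from XS, costs 4 credits per hour, so running it 8 hours a day for 20 working days consumes 4 × 8 × 20 = 640 credits, or roughly $1,280 of compute for the month at about $2 per credit, plus the storage charge. Suspend that warehouse for half of those hours and the compute portion of the bill falls by half.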

Snowflake also addresses the limitations of the query-based pricing model. Since you pay for each warehouse by the hour, costs are always known and understood. What's more, your query will never fail due to cost limits; it'll just take longer. The basic premise is that you have ultimate control over every piece of the warehouse, so if you want your query to move faster you can choose to move to a larger warehouse. Again, these are choices that you aren't given with inflexible query-based models.

Seamless sharing of data

Snowflake's architecture vastly simplifies the process of sharing data, particularly between different organizations. Instead of needing to manually create copies of data and send them over FTP, EDI, or cloud file services, Snowflake Data Sharing allows any Snowflake customer to share access to their data with any other Snowflake customer.

Instead of sending a file, you send access to the underlying data. The schema and database structure can be imported automatically by the consumer, so there's very little manual effort involved in using the shared data. What's more, when the data updates in the provider account, it's automatically and immediately visible in the consumer account. Detailed permissions and role-based access can be applied to that data, ensuring that information is only shared with the people it is meant to be shared with.
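A minimal sketch of what that can look like in SQL follows; the share, database, and account names are hypothetical. The provider publishes a share, and the consumer attaches it as a read-only database:

```sql
-- Provider account: create a share and grant access to specific objects.
CREATE SHARE sales_share;
GRANT USAGE  ON DATABASE sales               TO SHARE sales_share;
GRANT USAGE  ON SCHEMA   sales.public        TO SHARE sales_share;
GRANT SELECT ON TABLE    sales.public.orders TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = partner_account;

-- Consumer account: the shared data appears as a database, with no copy made.
CREATE DATABASE sales_from_partner FROM SHARE provider_account.sales_share;
```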
Use the SQL that you already know

The last benefit of Snowflake's architecture is the simplest, but in many ways the most important: you can use the SQL that your team already knows. noSQL systems and query-based data stores have become more common recently, but they both fail to fully support standard ANSI SQL. This not only limits the way you interact with your data and transform it, but it requires you to hire people familiar with those systems, or train your existing people to use these new systems.

Snowflake allows your team to use the SQL they already know and love to transform and query all of your data. This simplicity pays endless dividends over time as you save time and resources you would otherwise devote to supporting bespoke and "oddball" systems.

THE IMPACT OF REINVENTION

By reimagining and reinventing the data warehouse, Snowflake has addressed all of the key limitations of today's technology. Doing so required a new architecture that is completely different from the data warehouses of the past. As a result, you can easily store all your data, enable all your users with zero management, pay the way you want to, and use the SQL you already rely on. Rather than being bottlenecked waiting for the availability of overstretched IT and data science resources, analysts get rapid access to data in a service that can operate at any scale of data, users, and workloads.

To learn more about Snowflake, join us for a live demo at https://www.snowflake.net/webinar/snowflake-livedemo/

Snowflake Computing, the cloud data warehousing company, has
reinvented the data warehouse for the cloud and today’s data.
Snowflake is built from the cloud up with a patent-pending new
architecture that delivers the power of data warehousing, the flexibility
of big data platforms and the elasticity of the cloud – at a fraction of
the cost of traditional solutions. Snowflake is headquartered in Silicon
Valley and can be found online at snowflake.net.

Copyright © 2017 Snowflake Computing, Inc. All rights reserved. SNOWFLAKE COMPUTING, the Snowflake
Logo, and SNOWFLAKE ELASTIC DATA WAREHOUSE are trademarks of Snowflake Computing, Inc.
