Clickhouse vs. Snowflake Overview


Clickhouse vs. Snowflake
Snowflake is an ANSI SQL cloud data platform that comprises three components: data storage, compute and cloud services. Snowflake was built
from scratch to take advantage of elastic cloud infrastructure and is available on AWS, Google Cloud and Azure. Because it is delivered as a
service, Snowflake spares teams the significant time and resources otherwise spent on database maintenance and management.

Required Capability: Platform must easily handle workload variety with fast performance and reliability (data engineering, data science, analytics)

Clickhouse:
● Cannot isolate workloads on different compute resources while accessing the same data
● Scaling up or out requires manual resource provisioning
● Queries can fail when computation exceeds memory
● Good for simple, uniform queries where sub-second response times are required; not good for complex analytics
● Real time updates or deletes are not allowed

Snowflake:
● Able to act as a single source of truth for data with full consistency and reliability
● Compute clusters are independent from one another and all access the same data, with full DML operations
● Isolating compute means no chance of performance bottlenecks or resource contention caused by heavy workloads
● Warehouses can be provisioned and warehouse sizes can be adjusted up/down in less than one second; teams get exactly the resources they need
● Automatic horizontal scaling when set to “multi-cluster”; dynamically processes higher query throughput as concurrency increases
● Updates, deletes and merge transactions can all be accomplished in real time or on a scheduled basis
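As a rough illustration of the warehouse provisioning, resizing and “multi-cluster” settings listed for Snowflake above, the sketch below builds the corresponding SQL statements as strings. The warehouse name `analytics_wh`, the helper names, the size list and the cluster counts are assumptions made for this example and do not come from the comparison itself; the statements would still have to be run through a Snowflake client.

```python
import re

# Assumption: a subset of Snowflake warehouse sizes, enough for this example.
WAREHOUSE_SIZES = ("XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE")
# Accept only plain unquoted identifiers so names can be interpolated safely.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def _check(name, size):
    """Validate the (hypothetical) warehouse name and requested size."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a plain identifier: {name!r}")
    if size not in WAREHOUSE_SIZES:
        raise ValueError(f"unsupported warehouse size: {size!r}")


def create_warehouse_sql(name, size="XSMALL", min_clusters=1, max_clusters=3):
    """Build a CREATE WAREHOUSE statement with multi-cluster scaling bounds."""
    _check(name, size)
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("expected 1 <= min_clusters <= max_clusters")
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} WITH "
        f"WAREHOUSE_SIZE = '{size}' "
        f"MIN_CLUSTER_COUNT = {min_clusters} "
        f"MAX_CLUSTER_COUNT = {max_clusters} "
        "AUTO_SUSPEND = 60 AUTO_RESUME = TRUE"
    )


def resize_warehouse_sql(name, size):
    """Build an ALTER WAREHOUSE statement that changes the warehouse size."""
    _check(name, size)
    return f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = '{size}'"


# Example usage with illustrative values.
print(create_warehouse_sql("analytics_wh", max_clusters=4))
print(resize_warehouse_sql("analytics_wh", "LARGE"))
```

Giving each team its own warehouse name in calls like these is one way to get the workload isolation described above, since every warehouse reads the same underlying data.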

Required Capability: Platform does not introduce significant maintenance and management overhead for engineering

Clickhouse:
● Significant maintenance and management required to administer:
  ○ Determining shards
  ○ Creating replicas for high availability
  ○ System configuration and workload management to avoid resource contention
  ○ Manual scaling and resource provisioning
  ○ Security

Snowflake:
● Cloud Services layer manages resource provisioning, metadata storage and management, governance and role-based access controls, automatic query optimization and ACID transactionality
● Software upgrades implemented in the background, without degrading system performance
● Allows engineers and data scientists to focus on building and launching new products, features, innovations to drive customer retention and satisfaction vs. managing infrastructure
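To give a feel for what “determining shards” involves when it is done by hand, here is a minimal, generic sketch of weighted hash routing: each row's sharding key is hashed and mapped to one of several shards in proportion to its weight. This is an assumption-laden illustration rather than Clickhouse's own implementation; the function name `pick_shard`, the use of CRC32 and the example weights are all hypothetical.

```python
import zlib
from typing import Sequence


def pick_shard(sharding_key: str, shard_weights: Sequence[int]) -> int:
    """Return the index of the shard that should store the given key.

    Shards with a larger weight receive a proportionally larger share of keys.
    """
    if not shard_weights or any(w <= 0 for w in shard_weights):
        raise ValueError("shard_weights must be non-empty positive integers")
    total_weight = sum(shard_weights)
    # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
    slot = zlib.crc32(sharding_key.encode("utf-8")) % total_weight
    upper_bound = 0
    for index, weight in enumerate(shard_weights):
        upper_bound += weight
        if slot < upper_bound:
            return index
    raise AssertionError("unreachable: slot is always below total_weight")


# Example usage: three shards, the last one twice as large as the others.
weights = [1, 1, 2]
for key in ("customer-1", "customer-2", "customer-3"):
    print(key, "-> shard", pick_shard(key, weights))
```

Changing the number of shards or their weights changes where keys land, which is part of why resharding is listed as ongoing manual work in the Clickhouse column.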

Required Capability: Transactionality across workloads and user groups

Clickhouse:
● Not transactional: does not assure data consistency, which could be disastrous for building and training predictive models on dirty data

Snowflake:
● Fully transactional and ACID compliant
● As soon as data change is committed, every virtual warehouse sees the new data

Required Capability: Complex, cross-customer analytics

Clickhouse:
● Cannot handle complex joins and high cardinality GROUP BY

Snowflake:
● Handles complex joins easily, with fast performance

Required Capability: Sandboxes and dev/test

Clickhouse:
● Requires completely separate environments for dev/test

Snowflake:
● Instant to create clones of production databases for dev/test; no limit to number of clones that can be created
● Dev/test workloads run in same environment as production workloads, without competing for resources; changes to prod can be easily committed
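The production clones mentioned for Snowflake can be sketched as a single SQL statement per sandbox. The helper below only builds that statement as a string; the database names `prod_db` and `dev_db` and the function name are hypothetical, chosen for illustration, and running the statement would still require a Snowflake client.

```python
import re

# Accept only plain unquoted identifiers so names can be interpolated safely.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def clone_database_sql(source_db: str, clone_db: str) -> str:
    """Build a CREATE DATABASE ... CLONE statement for a dev/test sandbox."""
    for name in (source_db, clone_db):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    # Unquoted identifiers are case-insensitive, so compare case-insensitively.
    if source_db.upper() == clone_db.upper():
        raise ValueError("clone must have a different name than its source")
    return f"CREATE DATABASE {clone_db} CLONE {source_db}"


# Example usage: one sandbox per (hypothetical) developer.
for developer in ("alice", "bob"):
    print(clone_database_sql("prod_db", f"dev_{developer}"))
```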

Architecture Comparison

Clickhouse

In Clickhouse, components are manually set up and must be continually managed/monitored. These include a load balancer, proxy layer, cluster, and ZooKeeper to coordinate replicas. The cluster must be designed for high availability and to avoid resource contention, and the underlying infrastructure must be provisioned and scaled manually.

Snowflake

Snowflake’s architecture is delivered entirely as a service and removes the need to allocate engineering resources to manual and ongoing database maintenance/management.

Storage is based in AWS S3. It scales automatically and infinitely. Upon ingestion, data is encrypted, compressed, columnarized, partitioned and reorganized for optimal performance.

Compute runs on “virtual warehouses” (MPP compute clusters). Warehouses can scale up/down/out and be created instantly for full workload isolation and immediate performance improvements. Customers choose the number of warehouses and assign their size/scaling parameters to meet SLAs.

Cloud Services manages all functions that are delivered as a service, such as metadata, query optimization, enterprise-grade security, ACID transactionality, resource provisioning and software upgrades.
