
Quick Tour of ClickHouse Internals

Aleksey Zatelepin, Yandex

v1.1
ClickHouse use cases
A stream of events
› Actions of website visitors
› Ad impressions
› Financial transactions
› DNS queries
› …
We want to save information about these events and then glean insights from it

ClickHouse philosophy
› Interactive queries on data updated in real time
› Clean, structured data is needed
› Try hard not to pre-aggregate anything
› Query language: a dialect of SQL + lots of extensions

Sample query in a web analytics system
Top-10 referers for a counter for the last week.

SELECT Referer, count(*) AS count
FROM hits
WHERE CounterID = 1234 AND Date >= today() - 7
GROUP BY Referer
ORDER BY count DESC
LIMIT 10

How to execute a query fast?
Read data fast
› Only needed columns: CounterID, Date, Referer
› Locality of reads (an index is needed!)
› Data compression
Process data fast
› Vectorized execution (block-based processing)
› Parallelize to all available cores and machines
› Specialization and low-level optimizations
Index needed!
The principle is the same as in classic DBMSes
› A majority of queries will contain conditions on
  CounterID and (possibly) Date
› (CounterID, Date) fits the bill
› Check this by mentally sorting the table by the primary key
Differences from classic DBMSes
› The table will be physically sorted on disk
› The primary key is not a unique constraint
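As a sketch, using the legacy MergeTree syntax from the time of this
talk (the column set is illustrative), such a table could be declared
like this:

CREATE TABLE hits
(
    Date Date,
    CounterID UInt32,
    Referer String
)
ENGINE = MergeTree(Date, (CounterID, Date), 8192)

Here Date is the partitioning date column, (CounterID, Date) is the
primary key, and 8192 is the index granularity discussed below.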
Index internals
Primary key: (CounterID, Date)
Files on disk: primary.idx, plus a .mrk (marks) and a .bin (compressed
data) file for each column: CounterID, Date, Referer

primary.idx contents (one entry per 8192 rows):

  row        CounterID  Date
  …          …          …
             1234       2017-09-11
  N          1234       2017-09-21
  N+8192     1234       2017-09-27
  N+16384    1235       2013-02-16
             1235       2013-03-12
  …          …          …
Things to remember about indexes
Index is sparse
› Must fit into memory
› Default value of granularity (8192) is good enough
› Does not create a unique constraint
› Poor performance for point queries
Table is sorted according to the index
› There can be only one
› Using the index is always beneficial
How to keep the table sorted
Inserted events are (almost) sorted by time,
but we need them sorted by the primary key!
MergeTree: maintain a small set of sorted parts
(an idea similar to an LSM tree)
How to keep the table sorted
(Diagram: parts shown as intervals of insertion numbers, each part
sorted internally by primary key)

› An INSERT creates a new sorted part [N+1] next to the existing
  part [M, N] on disk
› In the background, parts [M, N] and [N+1] are merged into a single
  sorted part [M, N+1]
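To inspect the current set of parts, you can query the system.parts
table (a sketch; the table name is illustrative):

SELECT partition, name, rows
FROM system.parts
WHERE table = 'hits' AND active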
Things to do while merging
Replace/update records
› ReplacingMergeTree
› CollapsingMergeTree
Pre-aggregate data
› AggregatingMergeTree
Metrics rollup
› GraphiteMergeTree

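For example, a ReplacingMergeTree table (a sketch in the legacy
syntax; table, column, and version names are illustrative) keeps only
the latest version of each row once its parts have been merged:

CREATE TABLE user_profiles
(
    Date Date,
    UserID UInt64,
    Name String,
    Version UInt32
)
ENGINE = ReplacingMergeTree(Date, (UserID, Date), 8192, Version)

Replacement happens only at merge time, so duplicates may still be
visible until the parts are merged.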
MergeTree partitioning
ENGINE = MergeTree(Date, …)
› The table is partitioned by month or (soon) by any expression
› Parts from different partitions are never merged
› Easy manipulation of partitions:
    ALTER TABLE DROP PARTITION
    ALTER TABLE DETACH/ATTACH PARTITION
› MinMax index on the partition columns
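With monthly partitioning, partitions are named YYYYMM, so a sketch of
partition manipulation (the table name is illustrative) looks like:

ALTER TABLE hits DROP PARTITION 201709
ALTER TABLE hits DETACH PARTITION 201708
ALTER TABLE hits ATTACH PARTITION 201708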
Things to remember about MergeTree
Merging runs in the background
› Even when there are no queries!
Control the total number of parts
› Rate of INSERTs
› The MaxPartsCountForPartition and DelayedInserts
  metrics are your friends (see the sketch below)
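A sketch of how to watch them (which system table holds each metric,
and its exact spelling, may vary between versions):

SELECT value FROM system.asynchronous_metrics
WHERE metric LIKE '%CountForPartition%'

SELECT value FROM system.events
WHERE event = 'DelayedInserts'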
When one server is not enough
› The data won’t fit on a single server…
› You want to increase performance by adding more servers…
› Multiple simultaneous queries are competing for resources…

ClickHouse: Sharding + Distributed tables!

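A sketch of a Distributed table definition (the cluster name comes
from the server config; other names are illustrative):

CREATE TABLE distributed_hits AS hits
ENGINE = Distributed(my_cluster, default, hits, rand())

rand() spreads rows uniformly; a key such as intHash64(CounterID)
would keep each counter's data on a single shard.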
Reading from a Distributed table

SELECT FROM distributed_table GROUP BY column

is forwarded to every shard (Shard 1, Shard 2, Shard 3) as

SELECT FROM local_table GROUP BY column

Each shard sends back a partially aggregated result; the initiating
server merges them into the full result.
NYC taxi benchmark
CSV 227 GB, ~1.3 billion rows

SELECT passenger_count, avg(total_amount)
FROM trips GROUP BY passenger_count

Shards     1      3      140
Time, s    1.224  0.438  0.043
Speedup           x2.8   x28.5
Inserting into a Distributed table

INSERT INTO distributed_table

By default, the rows are split by sharding_key % 3 (with 3 shards) and
forwarded to each shard asynchronously as

INSERT INTO local_table

With SETTINGS insert_distributed_sync=1, the rows are split by
sharding_key and inserted into all shards synchronously.
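A sketch of a synchronous insert through the Distributed table (the
table name and values are illustrative):

SET insert_distributed_sync = 1

INSERT INTO distributed_hits (CounterID, Date, Referer)
VALUES (1234, today(), 'https://example.com/page')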
Things to remember about Distributed tables
It is just a view
› Doesn't store any data by itself
› Always queries all shards
Ensure that the data is divided between shards uniformly
› either by inserting directly into the local tables
› or by letting the Distributed table do it
  (but beware: inserts are asynchronous by default)
When failure is not an option
› Protection against hardware failure
› Data must always be available for reading and writing

ClickHouse: the ReplicatedMergeTree engine!
› Async master-master replication
› Works on a per-table basis
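A sketch of a replicated table in the legacy syntax (the ZooKeeper
path and the {shard} and {replica} macros come from each server's
config; other names are illustrative):

CREATE TABLE hits_replicated
(
    Date Date,
    CounterID UInt32,
    Referer String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/hits',
                             '{replica}',
                             Date, (CounterID, Date), 8192)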
Replication internals
(Diagram: an INSERT arrives at Replica 1; an entry with the inserted
block number is added to the replication queue in ZooKeeper; Replicas
2 and 3 fetch the new part from Replica 1; merges are likewise
announced in the queue and performed by every replica)
Replication and the CAP theorem
What happens in case of a network failure (partition)?
› Not consistent
  As is any system with async replication
  (but you can turn sequential consistency on; see the sketch below)
› Highly available (almost)
  Tolerates the failure of one datacenter, if the ClickHouse replicas
  are in at least 2 DCs and the ZooKeeper replicas are in 3 DCs
  A server partitioned from the ZK quorum is unavailable for writes
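A sketch of the settings involved (the quorum size depends on your
replica count):

SET insert_quorum = 2
SET select_sequential_consistency = 1

With these, SELECTs see all quorum-written data, at the price of
INSERTs failing when the quorum cannot be assembled.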
Putting it all together

SELECT FROM distributed_table

fans out to

SELECT FROM replicated_table

on one replica of each shard.

(Diagram: 3 shards x 2 replicas; the Distributed table sends the
query to a single replica of each of Shard 1, Shard 2, Shard 3)
Things to remember about replication
Use it!
› Replicas check each other
› Unsure whether an INSERT went through?
  Simply retry: the blocks will be deduplicated
› ZooKeeper is needed, but only for INSERTs
  (no added latency for SELECTs)
Monitor replica lag
› the system.replicas and system.replication_queue
  tables are your friends
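A sketch of a lag check (the threshold is illustrative):

SELECT database, table, is_readonly, absolute_delay, queue_size
FROM system.replicas
WHERE absolute_delay > 60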
Brief recap
› Column-oriented
› Fast interactive queries on real-time data
› SQL dialect + extensions
› A bad fit for OLTP, key-value, or blob storage workloads
› Scales linearly
› Fault tolerant
› Open source!
Thank you
Start using ClickHouse today!
Questions? Or reach us at:
[email protected]
› Telegram: https://t.me/clickhouse_en
› GitHub: https://github.com/yandex/ClickHouse/
› Google group: https://groups.google.com/group/clickhouse