StarRocks Intro
Uploaded by George Zhu

StarRocks

Real-time Analytics Made Easy


Company Profile

StarRocks—A Brief History

• Founded: May 2020, HQ in Silicon Valley
• Source code: on GitHub; a Doris spin-off with 80% new code
• Growth: already used by over 500 companies
Challenges Facing
Real-time Analytics

Why is real-time analytics at scale so hard?

Challenges Facing Today's Real-time Analytics

The bad case of the de-normalized table

[Diagram: a star schema, with a fact table joined to dimension tables D1–D4 (and D1.1), converted into a single flat table]

• Adds complexity to the data pipeline
• Adds delay to data ingestion
• Extra hardware, development, and maintenance cost
• Does NOT accommodate business changes easily

The conventional advice is "denormalize if possible": convert the "star schema" into a denormalized "flat schema".
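To make the trade-off concrete, here is a minimal, hypothetical sketch in plain Python (the `fact` rows and `dim_customer` table are made up for illustration) of what converting a star schema into a flat table means: every dimension attribute is copied into every fact row, which is why a single dimension change forces the wide table to be rebuilt.

```python
# Hypothetical star schema: a fact table plus one dimension table.
fact = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.0},
]
dim_customer = {
    10: {"name": "Alice", "region": "US"},
    11: {"name": "Bob", "region": "EU"},
}

def denormalize(fact_rows, dim):
    """Join dimension attributes into every fact row (a 'flat' table)."""
    return [{**row, **dim[row["customer_id"]]} for row in fact_rows]

flat = denormalize(fact, dim_customer)
# Each fact row now carries a copy of the dimension attributes, so a
# change to one customer's region means rewriting all of that customer's rows.
print(flat[0])
# {'order_id': 1, 'customer_id': 10, 'amount': 25.0, 'name': 'Alice', 'region': 'US'}
```

The copies are what make flat tables fast to scan but expensive to keep current, which is the pain the bullets above describe.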
Challenges Facing Today's Real-time Analytics

Real-time analytics engines do
NOT handle updates well

• Updates are forced into the DB asynchronously. In ClickHouse, update/delete is implemented as "ALTER TABLE ... UPDATE" (mutations):
  https://clickhouse.Yandex/docs/en/query_language/alter/#alter-mutations
• Storage engines use either Merge-on-Read or segment replacement
• Query performance struggles when processing updates/deletes
• Many use cases CANNOT be supported
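As an illustration of why reads slow down, here is a simplified, hypothetical Merge-on-Read sketch in Python (the general technique, not ClickHouse's or StarRocks' actual implementation): writes only append new versions, so updates and deletes are cheap, but every read must merge the version history to find the latest value.

```python
# Simplified Merge-on-Read sketch: each write appends a (version, value)
# pair; reads must merge the versions to find the newest one per key.
from collections import defaultdict

class MergeOnReadTable:
    def __init__(self):
        self._log = defaultdict(list)  # key -> list of (version, value)
        self._version = 0

    def upsert(self, key, value):
        # Writes are cheap: just append, no in-place rewrite.
        self._version += 1
        self._log[key].append((self._version, value))

    def delete(self, key):
        # Deletes are tombstone records, not physical removal.
        self.upsert(key, None)

    def read(self, key):
        # Reads pay the merge cost: scan all versions, keep the newest.
        versions = self._log.get(key)
        if not versions:
            return None
        _, value = max(versions)  # highest version number wins
        return value

t = MergeOnReadTable()
t.upsert("user1", {"clicks": 1})
t.upsert("user1", {"clicks": 2})   # update appends a new version
t.delete("user1")                  # delete appends a tombstone
print(t.read("user1"))             # None: the tombstone is the latest version
```

The more frequently rows are updated, the longer the version lists grow and the more work each query does, which is the performance struggle the slide points at.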
Challenges Facing Today's Real-time Analytics

High concurrency or
real-time? Pick one!

• ClickHouse's CPU-intensive architecture does NOT support high concurrency well
• Defaults to ONLY 100 concurrent queries
• Not suitable for a large user base or external-facing applications
Challenges Facing Today's Real-time Analytics

Extremely difficult to maintain

[Diagram: resharding from 3 shards (placement by shard_key % 3) to 5 shards (shard_key % 5); each shard serves a distributed table over local tables, and all data must be rebalanced]

• Difficult to scale out: heavy data re-balancing
• Relies on many 3rd-party components
• Complex data pipeline
• Increased Total Cost of Ownership

"The issue we faced was that ClickHouse doesn't automatically rebalance data in the cluster when we add new shards."
(on ClickHouse scale-out issues)
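The rebalancing cost in the diagram can be quantified with a small sketch (plain Python; the key count is chosen for illustration): with modulo placement, going from shard_key % 3 to shard_key % 5 changes the home shard of most keys, so most rows have to move.

```python
# Sketch of modulo-based shard placement, as in the
# shard_key % 3 -> shard_key % 5 resharding diagram above.

def shard_for(key: int, num_shards: int) -> int:
    # The shard that owns this key under modulo placement.
    return key % num_shards

keys = range(10_000)  # pretend these are shard keys of existing rows
moved = sum(1 for k in keys if shard_for(k, 3) != shard_for(k, 5))
print(f"{moved / len(keys):.0%} of rows must move when going from 3 to 5 shards")
# prints: 80% of rows must move when going from 3 to 5 shards
```

Roughly 80% of rows relocate under this scheme; that is the "heavy data re-balancing" the slide calls out, and why it has to be done manually in ClickHouse.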
StarRocks, Real-time
Analytics Made Easy
StarRocks Key Capabilities

01 Blazing Fast Queries
• OLAP or ad-hoc analytics
• Sub-second query latency
• Flat table or multi-table joins
• Query billions of rows

02 Real-time Insight
• Second-level data freshness
• High-speed data ingestion
• Real-time update and delete

03 Analytics for Everyone
• Supports 1000s of concurrent users
• Up to 10,000 QPS (Queries Per Second)

04 Simple Operations
• Reduced data pipeline complexity
• Linear scalability
• Reduced TCO
StarRocks—Real-time Queries Made Easy

2x to 6x faster in standard benchmark testing

• Blazing fast queries on star schemas and flat tables
• De-normalized tables are NOT required
• Greatly simplifies the data pipeline
• Opens doors to more use cases
StarRocks—Real-time Processing Made Easy

[Diagram: with Druid / Pinot / ClickHouse and others, Kafka feeds an online ETL job that builds a single wide data table (with batch replace from a data lake) before serving the application; with StarRocks, streaming or Change Data Capture ingests online data from Kafka directly, and the application queries StarRocks]

• Blazing fast queries with frequent data updates
• Complete update/delete functions
• Sub-second query latency even when data is frequently updated
StarRocks—Real-time Operations Made Easy

High concurrency and high throughput

• Supports 10,000s of concurrent users
• Resource isolation based on queries
• Linear scalability for better concurrency
• Brings the power of data-driven analytics to everyone!
StarRocks—Real-time Operations Made Easy

Simple and Elegant Architecture

[Diagram: Client Application → MySQL Protocol → FE layer (FE–Leader, FE–Leader, FE–Observer, each with a Catalog Manager and Query Optimizer) → BE layer (BE nodes, each with an Execution Engine and Storage Engine)]

• No dependencies on external components
• Auto scaling without human intervention
• Linear, predictable scaling model
• Reduced operational costs
Summary: StarRocks Makes Real-time Analytics Easy

Other Products                                  | StarRocks
De-normalized tables are a necessary evil       | Superior query performance without de-normalization
Struggle with updates and deletes               | Maintains performance while data is frequently updated/deleted
Low concurrency with only 10–100 users          | High concurrency with 10,000s of users
Complex architecture, 3rd-party dependencies,   | Simplified architecture, easy to scale,
hard to maintain and scale                      | and reduced TCO
World-class Engineering Features

• Cost-Based Optimizer: the cornerstone of distributed joins in query execution
• Fully vectorized query engine: the only query engine with vectorized execution across the CPU, memory, and storage layers
• Pipeline execution: fully leverages CPU cores for parallel processing
• Intelligent materialized views: transparent query acceleration
• Resource management: no single runaway query can bring down the cluster
• 100% SQL compatible with the MySQL client protocol: out-of-the-box support for all major BI tools
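As a rough, generic illustration of what "vectorized execution" means (a sketch of the general technique, not StarRocks internals): the engine processes whole columns in batches rather than one row at a time, which amortizes per-row overhead and, in a real engine, enables SIMD.

```python
# Generic illustration of row-at-a-time vs. vectorized (columnar, batch)
# execution -- not StarRocks internals, just the general idea.

rows = [{"price": p, "qty": q} for p, q in [(2.0, 3), (5.0, 1), (1.5, 4)]]

def revenue_row_at_a_time(rows):
    # Row-at-a-time: per-row dict lookups and dispatch overhead.
    total = 0.0
    for r in rows:
        total += r["price"] * r["qty"]
    return total

def revenue_vectorized(price_col, qty_col):
    # Vectorized: operate on whole columns per batch; a real engine
    # would run this as tight SIMD loops over contiguous arrays.
    return sum(p * q for p, q in zip(price_col, qty_col))

# Columnar layout: one array per column instead of one dict per row.
price_col = [r["price"] for r in rows]
qty_col = [r["qty"] for r in rows]
assert revenue_row_at_a_time(rows) == revenue_vectorized(price_col, qty_col) == 17.0
```

Both paths compute the same answer; the columnar one is the shape that lets a native engine keep CPU caches and vector units busy.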
Announcing StarRocks Cloud
(Embargoed until 8am EDT, July 14th, 2022)

• Cloud-native deployment of StarRocks real-time analytics
• Automated elastic cloud resource management
• Separation of storage and compute in the cloud
• Reduced system administration effort
• Lower and more transparent infrastructure cost
• Initially on AWS, with Azure and GCP available soon


Case Studies

Use Case: Minerva, the Metrics Store at Airbnb

[Diagram: DB exports, logging, and 3rd-party data feed dimension and fact tables, which are joined into denormalized wide tables (row data) for the Minerva query layer. Denormalization compute cost is very high.]

Background and Pain Points
• Minerva is Airbnb's internal unified metrics platform
• Vision: "Define metrics once, use them everywhere"
• Over 12,000 metrics and 4,000 dimensions
• Used for various data consumption scenarios, such as A/B testing, data exploration, and data analysis
• In Minerva v1, multiple flat tables feed into Druid
• Any change at the source requires refreshing the wide tables, which can take hours
Use Case: Minerva, the Enterprise Metrics Store at Airbnb

Minerva on Demand, Powered by StarRocks

[Diagram: the same sources feed Minerva on StarRocks, but only some of the fact and dimension tables are denormalized into wide tables first]

• No need to pre-aggregate the way Druid does
• Handles high-cardinality dimensions much better
• Less than 20% of the data needs to be de-normalized
• The rest is queried in star schema on the fly
• Improved data freshness and reduced TCO
Case: Real-time Analytics at a Social Media App
with 200MM+ Active Users

2017 → 2018 → 2019 → 2020 → 2021
From batch to real-time analytics: Redshift ➾ Hive/Presto ➾ ClickHouse ➾ StarRocks

• Built a real-time advertisement data platform on ClickHouse in 2019
• Had stability, concurrency, and update issues as data volume and the number of users grew
• StarRocks replaced ClickHouse in 2021 as the new advertisement data platform
Thank You
