Kudu

The document discusses Apache Kudu, an open-source, column-oriented data store. It provides an overview of Kudu's architecture, history, use cases, and how it compares to Apache HBase. Kudu is designed to enable both fast analytics and real-time queries on large datasets by combining fast inserts and updates with efficient columnar scans, filling a gap that was previously addressed with complex hybrid architectures.

Uploaded by

Aman Raturi


Apache Kudu
Introduction

❑ Introduction
❑ Architecture
❑ History
❑ Why Kudu
❑ Use Case
❑ Kudu vs HBase
Apache Kudu
Introduction

❑ Apache Kudu is an open-source, column-oriented data store in the Apache Hadoop
ecosystem.
❑ Kudu is storage for fast analytics on fast data.
❑ Kudu provides fast inserts and updates alongside efficient columnar scans, enabling
multiple real-time analytic workloads on a single storage layer.
❑ Kudu fills the gap between HDFS and Apache HBase, a gap that formerly required complex
hybrid architectures.
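To make "fast inserts and updates alongside efficient columnar scans" concrete, here is a minimal, hypothetical in-memory sketch; it is not Kudu's storage engine or client API. The class name `ToyColumnarTable` and its methods are invented for illustration. Values of each column are kept together in one list (the columnar layout), while a primary-key index makes point lookups and in-place updates cheap.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyColumnarTable:
    """Hypothetical in-memory sketch; not Kudu's actual storage engine."""

    key_column: str
    columns: list[str]
    # One Python list per column: values of a column sit next to each other
    data: dict[str, list[Any]] = field(default_factory=dict)
    # Primary key -> row position, used for point lookups and updates
    index: dict[Any, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.key_column not in self.columns:
            raise ValueError("key column must be one of the columns")
        for name in self.columns:
            self.data[name] = []

    def upsert(self, row: dict[str, Any]) -> None:
        """Insert a new row, or update the existing row with the same key."""
        key = row[self.key_column]
        pos = self.index.get(key)
        if pos is None:
            # New key: append one value to every column list
            self.index[key] = len(self.data[self.key_column])
            for name in self.columns:
                self.data[name].append(row.get(name))
        else:
            # Existing key: overwrite only the supplied values in place
            for name, value in row.items():
                self.data[name][pos] = value

    def get(self, key: Any) -> dict[str, Any] | None:
        """Point query: reassemble one row from every column."""
        pos = self.index.get(key)
        if pos is None:
            return None
        return {name: self.data[name][pos] for name in self.columns}

    def scan_sum(self, column: str) -> float:
        """Analytic scan: read a single column, ignoring the others."""
        return sum(v for v in self.data[column] if v is not None)


table = ToyColumnarTable(key_column="id", columns=["id", "metric"])
table.upsert({"id": 1, "metric": 10.0})
table.upsert({"id": 2, "metric": 5.0})
table.upsert({"id": 1, "metric": 12.0})  # update of an existing key
print(table.get(1))               # {'id': 1, 'metric': 12.0}
print(table.scan_sum("metric"))   # 17.0
```

The scan in `scan_sum` never touches the `id` list, which is the point of a columnar layout for analytics; the dictionary index is what keeps single-row updates cheap in this toy version.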
Apache Kudu
Architecture

The diagram shows a Kudu cluster with three masters and multiple tablet servers, each
serving multiple tablets. It illustrates how Raft consensus gives the masters, and each
tablet hosted on the tablet servers, one leader and a set of followers. In addition, a
tablet server can be a leader for some tablets and a follower for others. Leaders are
shown in gold, while followers are shown in grey.
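The leader/follower layout described above can be modelled with a few lines of code. This is a hypothetical sketch: the server and tablet names, the round-robin placement, and the rule that the first replica is the leader are all assumptions made for illustration. Real Kudu elects leaders through Raft rather than assigning them at placement time. The `majority` helper reflects the general Raft rule that a write commits once a majority of replicas acknowledge it, so three replicas tolerate one failure.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    tablet: str
    server: str
    role: str  # "LEADER" or "FOLLOWER"


def majority(replication_factor: int) -> int:
    """Acknowledgements a Raft group needs before a write is committed."""
    return replication_factor // 2 + 1


def place_replicas(tablets: list[str], servers: list[str],
                   replication_factor: int = 3) -> list[Replica]:
    """Hypothetical round-robin placement with a fixed leader per tablet.

    Leader election is not modelled: the first replica of each tablet is
    simply labelled LEADER so the resulting layout can be inspected.
    """
    if replication_factor > len(servers):
        raise ValueError("need at least as many servers as replicas")
    replicas = []
    for i, tablet in enumerate(tablets):
        for r in range(replication_factor):
            # Shift the starting server per tablet so leaders spread out
            server = servers[(i + r) % len(servers)]
            role = "LEADER" if r == 0 else "FOLLOWER"
            replicas.append(Replica(tablet, server, role))
    return replicas


def roles_by_server(replicas: list[Replica]) -> dict[str, Counter]:
    """Count how many LEADER and FOLLOWER replicas each server hosts."""
    counts: dict[str, Counter] = {}
    for rep in replicas:
        counts.setdefault(rep.server, Counter())[rep.role] += 1
    return counts


layout = place_replicas(["tablet-1", "tablet-2", "tablet-3"],
                        ["ts-1", "ts-2", "ts-3"])
for server, roles in sorted(roles_by_server(layout).items()):
    print(server, dict(roles))
print("acks needed with 3 replicas:", majority(3))
```

With three tablets on three servers, every server ends up leading one tablet and following the other two, which matches the point in the slide that one tablet server can be a leader for some tablets and a follower for others.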
Apache Kudu
Architecture

[Figure: Kudu cluster with replicas of the Master tablet and of Tablet 1, Tablet 2, ..., Tablet n spread across servers, each replica labelled LEADER or FOLLOWER.]
Apache Kudu
History

❑ Apache Kudu is a free and open-source, column-oriented data store in the Apache Hadoop
ecosystem. It is compatible with most of the data processing frameworks in the Hadoop
environment.
❑ The open-source project to build Apache Kudu began as an internal project at Cloudera.
The first version, Apache Kudu 1.0, was released on 19 September 2016.
Apache Kudu
Why Kudu

❑ Apache Kudu is the disruptive technology for real-time analytics on fast data that
we have all been waiting for.
❑ Kudu is completely different from other big data analytics solutions: it combines fast
inserts and updates with efficient columnar scans in a single storage layer.
❑ Kudu takes advantage of next-generation hardware.
❑ Kudu supports SQL through Spark or Impala.
❑ Kudu enables killer “Big Data” apps.
❑ Kudu should be part of your big data strategy.
Apache Kudu
Use case

Until one to three years ago, the big data landscape was dominated by several storage
systems, most notably Hadoop HDFS and, later, Apache HBase, a NoSQL database. HDFS is
great for high-speed writes and scans, while HBase is well suited for random-access
queries. A new storage engine, Apache Kudu, tries to bridge the gap between these two
use cases. Apache Kudu is a distributed, columnar database for structured, real-time data.
Because Kudu has a schema, it is suited only for structured data, unlike HBase, which is
schemaless.
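What "having a schema" means for writes can be illustrated with a small validation function. This is a hypothetical sketch, not Kudu's client API: the column names, the `SCHEMA` and `PRIMARY_KEY` constants, and `validate_row` are invented here. It assumes, as Kudu does, that a table declares typed columns and a primary key whose columns cannot be null; a schemaless, HBase-style store would instead accept whatever column names a row happens to carry.

```python
from __future__ import annotations

from typing import Any

# Hypothetical table definition: column name -> expected Python type
SCHEMA: dict[str, type] = {"host": str, "ts": int, "cpu": float}
# Hypothetical composite primary key; key columns may not be null
PRIMARY_KEY: tuple[str, ...] = ("host", "ts")


def validate_row(row: dict[str, Any],
                 schema: dict[str, type] = SCHEMA,
                 key: tuple[str, ...] = PRIMARY_KEY) -> None:
    """Reject rows that do not match the declared schema."""
    unknown = set(row) - set(schema)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    for col in key:
        if row.get(col) is None:
            raise ValueError(f"primary key column {col!r} must not be null")
    for col, value in row.items():
        # Non-key columns may be null; non-null values must match the type
        if value is not None and not isinstance(value, schema[col]):
            raise TypeError(f"column {col!r} expects {schema[col].__name__}")


validate_row({"host": "web-1", "ts": 1700000000, "cpu": 0.42})  # accepted

# A schemaless store would accept the extra "mem" column; here it is rejected
try:
    validate_row({"host": "web-1", "ts": 1700000000, "mem": 0.8})
except ValueError as err:
    print(err)  # unknown columns: ['mem']
```

Enforcing structure on write is the trade-off the slide describes: rows become predictable enough to store column by column, at the cost of the flexibility a schemaless store offers.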
Apache Kudu
Kudu vs HBase

❑ Apache HBase is an open-source, distributed, versioned, column-oriented store modeled
after Google's Bigtable, described in “Bigtable: A Distributed Storage System for
Structured Data”. Just as Bigtable leverages the distributed data storage provided by the
Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.
❑ HBase performance:
● OLTP
● Fast point queries
❑ HBase is fast for updates and inserts but comparatively slow for analytics.
❑ A new addition to the open-source Apache Hadoop ecosystem, Kudu completes Hadoop's
storage layer to enable fast analytics on fast data.
❑ Kudu performance:
● Real-time analytics
❑ Kudu is meant to do both well: fast inserts and updates as well as fast analytic scans.
