
Kudu

The document discusses Apache Kudu, an open source column-oriented data store. It provides an overview of Kudu's architecture, history, use cases, and how it compares to Apache HBase. Kudu is designed to enable both fast analytics and real-time queries on large datasets by combining fast inserts/updates with efficient columnar scans, filling a gap previously filled by complex hybrid architectures.

Uploaded by Aman Raturi
Copyright © All Rights Reserved

Apache Kudu
Introduction

❑ Introduction
❑ Architecture
❑ History
❑ Why Kudu
❑ Use Case
❑ Kudu vs HBase
Introduction

❑ Apache Kudu is an open-source, column-oriented data store of the Apache Hadoop
ecosystem.
❑ Kudu is storage for fast analytics on fast data.
❑ Kudu provides a combination of fast inserts and updates alongside efficient columnar
scans, enabling multiple real-time analytic workloads across a single storage layer.
❑ Kudu fills the gap between HDFS and Apache HBase, a gap formerly addressed with complex
hybrid architectures.
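To make the columnar-scan point concrete, here is a toy Python comparison of a row layout and a column layout. It is a minimal sketch under assumed data (the `host`, `metric`, and `value` columns and the helper names are hypothetical) and does not model Kudu's actual storage engine. Summing one column in the row layout touches every field of every row, while the column layout reads only that column's values; an insert appends one value to each column list.

```python
from __future__ import annotations

from typing import Any

# Hypothetical sample data; rows and column names are illustrative only.
rows: list[dict[str, Any]] = [
    {"host": "web01", "metric": "cpu", "value": 0.50},
    {"host": "web02", "metric": "cpu", "value": 0.75},
    {"host": "web03", "metric": "cpu", "value": 0.25},
]


def to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row records into one list of values per column."""
    columns: dict[str, list[Any]] = {name: [] for name in records[0]}
    for record in records:
        for name, val in record.items():
            columns[name].append(val)
    return columns


def row_scan_sum(records: list[dict[str, Any]], column: str) -> tuple[float, int]:
    """Row layout: each whole row is read, although only one field is summed."""
    total, touched = 0.0, 0
    for record in records:
        touched += len(record)  # every field of the row is read
        total += record[column]
    return total, touched


def column_scan_sum(columns: dict[str, list[Any]], column: str) -> tuple[float, int]:
    """Column layout: only the requested column's values are read."""
    values = columns[column]
    return sum(values), len(values)


def insert(columns: dict[str, list[Any]], record: dict[str, Any]) -> None:
    """An insert appends one value to the end of every column list."""
    for name in columns:
        columns[name].append(record[name])


if __name__ == "__main__":
    cols = to_columns(rows)
    print(row_scan_sum(rows, "value"))     # (1.5, 9)
    print(column_scan_sum(cols, "value"))  # (1.5, 3)
```

Both layouts return the same sum; the difference is how many stored values each scan has to read (9 versus 3 here).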
Architecture

The diagram shows a Kudu cluster with three masters and multiple tablet servers, each
serving multiple tablets. It illustrates how Raft consensus elects one leader, with the
remaining replicas acting as followers, both among the masters and among the replicas of
each tablet. A tablet server can be the leader for some tablets and a follower for others.
Leaders are shown in gold, while followers are shown in grey.

[Figure: Kudu cluster with the master tablet and Tablets 1 through n replicated across several servers; each replica is labelled LEADER or FOLLOWER.]
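The leader/follower arrangement described above can be modeled in a few lines of Python. This is a hypothetical sketch: the round-robin placement, the server and tablet names, and the replication factor of 3 are assumptions for illustration, and the code implements neither Raft elections nor Kudu's real replica placement. It shows one tablet server leading some tablets while following others, and the majority rule Raft uses to commit writes, which is why a group of three (such as the three masters) keeps working after one member fails.

```python
from __future__ import annotations

from collections import defaultdict

# Assumed replication factor for this sketch (3 replicas per tablet).
REPLICATION_FACTOR = 3


def place_replicas(
    tablets: list[str], servers: list[str], rf: int = REPLICATION_FACTOR
) -> dict[str, tuple[str, list[str]]]:
    """Hypothetical round-robin placement: tablet i is replicated on rf
    consecutive servers starting at index i; the first is marked leader.
    Not Kudu's placement or leader-election algorithm."""
    if rf > len(servers):
        raise ValueError("replication factor exceeds the number of servers")
    placement: dict[str, tuple[str, list[str]]] = {}
    for i, tablet in enumerate(tablets):
        chosen = [servers[(i + k) % len(servers)] for k in range(rf)]
        placement[tablet] = (chosen[0], chosen[1:])  # (leader, followers)
    return placement


def roles_by_server(placement: dict[str, tuple[str, list[str]]]) -> dict[str, dict[str, str]]:
    """Invert the placement: for each server, its role for each tablet it hosts."""
    roles: dict[str, dict[str, str]] = defaultdict(dict)
    for tablet, (leader, followers) in placement.items():
        roles[leader][tablet] = "LEADER"
        for server in followers:
            roles[server][tablet] = "FOLLOWER"
    return dict(roles)


def majority(voters: int) -> int:
    """Raft commits a write once a majority of the voting replicas has it."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Number of replicas that can fail while a majority is still available."""
    return (voters - 1) // 2


if __name__ == "__main__":
    placement = place_replicas(["tablet-1", "tablet-2", "tablet-3"],
                               ["ts-a", "ts-b", "ts-c", "ts-d"])
    for server, server_roles in sorted(roles_by_server(placement).items()):
        print(server, server_roles)
    print(majority(3), tolerated_failures(3))  # 2 1
```

In this toy layout `ts-a` leads `tablet-1` but follows for `tablet-3`, mirroring the gold and grey replicas in the diagram.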
History

❑ Apache Kudu is a free and open-source column-oriented data store of the Apache Hadoop
ecosystem. It is compatible with most of the data processing frameworks in the Hadoop
environment.
❑ The open-source project to build Apache Kudu began as an internal project at Cloudera.
The first version, Apache Kudu 1.0, was released on 19 September 2016.
Why Kudu

❑ Apache Kudu is the disruptive technology we have all been waiting for: it enables
real-time analytics on fast data.
❑ Kudu is completely different from other big data analytics solutions: a single storage
layer serves both fast inserts/updates and efficient columnar scans.
❑ Kudu takes advantage of next-generation hardware.
❑ Kudu supports SQL through Apache Impala or Spark SQL.
❑ Kudu enables killer “big data” apps.
❑ Kudu should be part of your Big Data strategy.
Use case

Until one to three years ago, the big data landscape was dominated by a few storage systems:
first Hadoop HDFS and later Apache HBase, a NoSQL database. HDFS is well suited to
high-speed writes and scans, while HBase is well suited to random-access queries. Apache Kudu,
a new storage engine, tries to bridge the gap between those two use cases. Apache Kudu is a
distributed, columnar database for structured, real-time data. Because Kudu enforces a schema,
it is suited only to structured data, unlike HBase, which is schemaless.
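The schema distinction can be illustrated with a small Python sketch. Kudu tables declare typed columns and a primary key whose columns cannot be null; the `Schema`, `Column`, and `schemaless_put` names below are hypothetical stand-ins, not Kudu's or HBase's client APIs. The typed schema rejects unknown columns, missing key values, and values of the wrong type, while the schemaless store accepts any cells under any row key.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Column:
    name: str
    type_: type
    nullable: bool = True


@dataclass(frozen=True)
class Schema:
    """Hypothetical typed schema with a primary key (not the Kudu client API)."""
    columns: tuple[Column, ...]
    primary_key: tuple[str, ...]

    def validate(self, row: dict[str, Any]) -> None:
        """Raise if the row has unknown columns, nulls in key or non-nullable
        columns, or values of the wrong type."""
        unknown = set(row) - {c.name for c in self.columns}
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        for col in self.columns:
            value = row.get(col.name)
            if value is None:
                # Primary-key columns are never allowed to be null.
                if col.name in self.primary_key or not col.nullable:
                    raise ValueError(f"column {col.name!r} must not be null")
                continue
            if not isinstance(value, col.type_):
                raise TypeError(
                    f"column {col.name!r} expects {col.type_.__name__}, "
                    f"got {type(value).__name__}"
                )


def schemaless_put(store: dict[str, dict[str, Any]], row_key: str,
                   cells: dict[str, Any]) -> None:
    """Schemaless store: any row key may hold any set of cells of any type."""
    store.setdefault(row_key, {}).update(cells)


if __name__ == "__main__":
    metrics = Schema(
        columns=(Column("host", str), Column("ts", int), Column("value", float)),
        primary_key=("host", "ts"),
    )
    metrics.validate({"host": "web01", "ts": 1, "value": 0.5})  # accepted
```

The trade-off the slide describes shows up directly: the typed table catches malformed rows on write, while the schemaless store leaves interpretation to the reader.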
Kudu vs HBase

❑ Apache HBase is an open-source, distributed, versioned, column-oriented store modeled
after Google's Bigtable, described in the paper “Bigtable: A Distributed Storage System for
Structured Data”. Just as Bigtable leverages the distributed data storage provided by the
Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.
❑ Performance
● OLTP
● Fast Point Queries
❑ HBase is fast for inserts and updates, but not for analytics.
❑ A new addition to the open source Apache Hadoop ecosystem, Kudu completes Hadoop's
storage layer to enable fast analytics on fast data.
❑ Real-time analytics
❑ Kudu is designed to do both well: fast inserts and updates, and efficient analytic scans.
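The “do both well” idea can be sketched as a toy Python table that serves fast point queries through a binary search on a sorted primary key and analytic scans by reading a single column over a key range. The class and method names are hypothetical, and the in-place update is a simplification; this is not Kudu's storage design.

```python
from __future__ import annotations

import bisect
from typing import Any


class SortedColumnarTable:
    """Toy table kept sorted by primary key, with values stored per column.
    Illustrative only; it does not reflect how Kudu stores or updates data."""

    def __init__(self, column_names: list[str]) -> None:
        self.keys: list[Any] = []
        self.columns: dict[str, list[Any]] = {name: [] for name in column_names}

    def _find(self, key: Any) -> tuple[int, bool]:
        """Binary-search the sorted keys; return (position, found)."""
        i = bisect.bisect_left(self.keys, key)
        return i, i < len(self.keys) and self.keys[i] == key

    def upsert(self, key: Any, row: dict[str, Any]) -> None:
        """Insert a new key at its sorted position, or update an existing row."""
        unknown = set(row) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        i, found = self._find(key)
        if found:
            for name, val in row.items():
                self.columns[name][i] = val  # simplified in-place update
        else:
            self.keys.insert(i, key)
            for name, values in self.columns.items():
                values.insert(i, row.get(name))

    def get(self, key: Any) -> dict[str, Any] | None:
        """Point query: O(log n) binary search on the primary key."""
        i, found = self._find(key)
        if not found:
            return None
        return {name: values[i] for name, values in self.columns.items()}

    def scan_sum(self, column: str, lo: Any, hi: Any) -> float:
        """Analytic range scan: sum one column over keys in [lo, hi)."""
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_left(self.keys, hi)
        return sum(v for v in self.columns[column][start:end] if v is not None)
```

A point query locates one row by key, while the range scan touches only the one column it aggregates.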
