Getting Started With Apache Kafka

Apache Kafka is a distributed messaging system designed to move data at high volumes. It addresses shortcomings of traditional data movement tools and approaches. Invented at LinkedIn to address data growth issues, it was open-sourced in 2011 and graduated from the Apache Incubator in 2012. Hundreds of enterprises and internet-scale companies have since adopted Kafka as their first choice for data movement.

Getting Started with Apache Kafka

WHY APACHE KAFKA

Ryan Plant
COURSE AUTHOR

@ryan_plant blog.ryanplant.com

What Is Apache Kafka?

Microsoft SQL Server, Elasticsearch, MongoDB, Oracle, MySQL, Hadoop, Apache Kafka

“A high-throughput distributed messaging system.”

What a Typical Enterprise Looks Like

Sources: RDBMS, LOGS, NOSQL, QUEUES, BLOBS
Destinations: DW, HADOOP, SEARCH, ANALYSIS

Database replication
Log shipping
Extract, Transform, and Load (ETL)
Messaging
Custom middleware magic

Database Replication and Log Shipping

RDBMS to RDBMS only

Database-specific
Tight coupling (schema)
Performance challenges (log shipping)
Cumbersome (subscriptions)

Extract, Transform, and Load (ETL)

Typically proprietary and costly

Lots of custom development
Scalability challenged
Performance challenged
Often requires multiple instances

Messaging

Limited scalability
Smaller messages
Requires rapid consumption
Not fault-tolerant (application)

Perils of Messaging Under High Volume

Publishers: High volume? Message size? No throttle?
BROKER: Single host? Local storage? Message buildup?
Consumers: No consumption? Slow consumption?
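
The questions on this slide can be made concrete with a small simulation. The sketch below is a toy model, not any real broker's API: `BoundedBroker`, `simulate`, and all the rates and capacities are hypothetical names and numbers chosen for illustration. It assumes a single host with fixed local storage, publishers with no throttle, and a consumer that is slower than the publishers.

```python
from collections import deque


class BoundedBroker:
    """Toy single-host broker with fixed local storage (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity  # maximum messages the host can hold
        self.queue = deque()      # messages waiting to be consumed
        self.rejected = 0         # messages refused because storage was full

    def publish(self, message):
        # With no throttle on publishers, the broker can only refuse
        # messages once its local storage is exhausted.
        if len(self.queue) >= self.capacity:
            self.rejected += 1
            return False
        self.queue.append(message)
        return True

    def consume(self, max_messages):
        # Remove and return up to max_messages from the front of the queue.
        taken = []
        while self.queue and len(taken) < max_messages:
            taken.append(self.queue.popleft())
        return taken


def simulate(ticks, publish_rate, consume_rate, capacity):
    """Return (backlog after each tick, total rejected messages)."""
    broker = BoundedBroker(capacity)
    backlog_per_tick = []
    for tick in range(ticks):
        for i in range(publish_rate):
            broker.publish((tick, i))
        broker.consume(consume_rate)
        backlog_per_tick.append(len(broker.queue))
    return backlog_per_tick, broker.rejected


# Publishers send 10 messages per tick, the consumer handles only 4.
backlog, rejected = simulate(ticks=5, publish_rate=10, consume_rate=4, capacity=20)
print(backlog)   # [6, 12, 16, 16, 16]
print(rejected)  # 14
```

In this model the backlog grows until local storage is full, after which new messages are refused. Raising `consume_rate` to match `publish_rate` keeps the backlog at zero, which is the "requires rapid consumption" constraint listed on the Messaging slide.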

Perils of Messaging With Application Faults

Publishers
BROKER
Consumers: message processing bug

Middleware Magic

Increasingly complex
Deceiving
Consistency concerns
Potentially expensive

Middleware Challenges

Multi-write pattern: atomic transaction, coordination logic
Message broker pattern: competing consumers, non-consuming consumer
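
One way to read the multi-write half of this slide is sketched below. It is an illustration under stated assumptions, not the course author's example: `Store` and `multi_write` are hypothetical names, and the failure is injected by hand. The point is that when an application writes the same record to several targets without a transaction spanning them, a failure partway through leaves the targets inconsistent.

```python
class Store:
    """Hypothetical data store that can be told to fail on one record."""

    def __init__(self, name, fail_on=None):
        self.name = name
        self.rows = []
        self.fail_on = fail_on  # record that triggers a simulated outage

    def write(self, record):
        if record == self.fail_on:
            raise OSError(f"{self.name} unavailable")
        self.rows.append(record)


def multi_write(record, stores):
    """Write the record to each store in turn, with no shared transaction."""
    for store in stores:
        store.write(record)


database = Store("database")
search_index = Store("search", fail_on="order-2")

multi_write("order-1", [database, search_index])
try:
    multi_write("order-2", [database, search_index])
except OSError:
    pass  # the database write already happened and nothing rolls it back

print(database.rows)      # ['order-1', 'order-2']
print(search_index.rows)  # ['order-1']
```

Avoiding this outcome requires either an atomic transaction across the stores or extra coordination logic in the application, which are the two costs the slide names for the multi-write pattern.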

Isn’t There a Better Way?

To move data around:

- Cleanly
- Reliably
- Quickly
- Autonomously

That’s What LinkedIn Asked in 2010…

High Volume:
- Over 1.4 trillion messages per day
- 175 terabytes per day
- 650 terabytes of messages consumed per day
- Over 433 million users

High Velocity:
- Peak 13 million messages per second
- 2.75 gigabytes per second

High Variety:
- Multiple RDBMS (Oracle, MySQL, etc.)
- Multiple NoSQL (Espresso, Voldemort)
- Hadoop, Spark, etc.

Pre-2010 LinkedIn Data Architecture

skills, recommendations, comments, jobs, network updates, ads, mail, search, groups, people you may know, profile, news, stats, …

kaf·ka·esque /’káf, kə, ɛsk/ | adjective
Basically it describes a nightmarish situation which most people can somehow relate to, although strongly surreal.
synonyms: surreal, lucid, spoilsbury toast boy
Usage: “Whoa! This flick is way kafkaesque…”
Franz Kafka

Source: Urban Dictionary

Next-generation Messaging Goals

High throughput
Horizontally scalable
Reliable and durable
Loosely coupled Producers and Consumers
Flexible publish-subscribe semantics
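
The goals of loose coupling and flexible publish-subscribe semantics can be illustrated with a toy in-memory model. This is a minimal sketch and not Kafka's API: `TopicLog`, `publish`, `poll`, and the topic and consumer names are all hypothetical. It assumes messages are kept in an append-only list per topic and that each consumer tracks its own read position, so producers never need to know who is reading or how fast.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log per topic with per-consumer offsets (hypothetical)."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic -> ordered list of messages
        self._offsets = defaultdict(int)  # (consumer, topic) -> next index to read

    def publish(self, topic, message):
        # Producers only append; they hold no reference to any consumer.
        self._topics[topic].append(message)
        return len(self._topics[topic]) - 1  # position of the new message

    def poll(self, consumer, topic, max_messages=10):
        # Each consumer advances its own offset; reading removes nothing.
        start = self._offsets[(consumer, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(consumer, topic)] = start + len(batch)
        return batch


log = TopicLog()
for event in ["view:1", "view:2", "view:3"]:
    log.publish("page-views", event)

search_batch = log.poll("search-indexer", "page-views")
warehouse_first = log.poll("warehouse", "page-views", max_messages=2)
warehouse_rest = log.poll("warehouse", "page-views")

print(search_batch)     # ['view:1', 'view:2', 'view:3']
print(warehouse_first)  # ['view:1', 'view:2']
print(warehouse_rest)   # ['view:3']
```

Both consumers see every message even though they read at different paces, and a slow consumer does not block the producer or the other consumer. A real deployment adds the remaining goals (throughput, horizontal scaling, durability), which this sketch does not attempt.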

Post-2010 LinkedIn Data Architecture

LOB APPS, APPS, DBs, LOGS: publish to and consume from topics
topic topic topic …
DW, MARTS, HDP, SEARCH, OPS, …: consume from and publish to topics

Timeline of Events

2003: LinkedIn launch
2009: Kafka inception; development begins
2010: Initial Kafka deployment @ LinkedIn
2011: Kafka open sourced (Apache Software Foundation)
Today: 1.1 trillion messages per day @ LinkedIn

Apache Kafka Adoption
7X since 2015

Yahoo Uber Square Airbnb

Etsy Oracle Coursera Spotify
Microsoft Goldman Sachs IBM Ancestry
Bing Netflix Pinterest LinkedIn
Mailchimp PayPal Twitter Hotels.com

Summary

Kafka is a distributed messaging system
Designed to move data at high volumes
Addresses shortcomings of traditional data movement tools and approaches
Invented by LinkedIn to address data growth issues common to many enterprises
Open-sourced in 2011; graduated from the Apache Incubator in 2012
First-choice adoption for data movement for hundreds of enterprise and internet-scale companies
