Getting Started With Apache Kafka
Ryan Plant
COURSE AUTHOR
@ryan_plant blog.ryanplant.com
What Is Apache Kafka?
Microsoft SQL Server
Elasticsearch
MongoDB
Oracle
MySQL
Apache Kafka
Hadoop
Limited scalability
Smaller messages
Requires rapid consumption
Not fault-tolerant (application)
Perils of Messaging Under High Volume
Publishers:
- High volume?
- Message size?
- No throttle?
Broker:
- Single host?
- Local storage?
- Message buildup?
Consumers:
- No consumption?
- Slow consumption?
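The buildup problem above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `ToyBroker`, `simulate`, and all rates and capacities are hypothetical names and numbers chosen for illustration, and the broker is modeled as a single host with bounded in-memory storage and no throttling of publishers.

```python
from collections import deque


class ToyBroker:
    """Hypothetical single-host broker with bounded local storage."""

    def __init__(self, capacity):
        self.capacity = capacity  # maximum messages the host can hold
        self.queue = deque()      # messages waiting to be consumed
        self.dropped = 0          # messages rejected because storage was full

    def publish(self, message):
        # No throttle: publishers are never slowed down, so once storage
        # is full the only option left is to reject the message.
        if len(self.queue) >= self.capacity:
            self.dropped += 1
            return False
        self.queue.append(message)
        return True

    def consume(self, max_messages):
        # Consuming removes messages from the broker's storage.
        taken = []
        while self.queue and len(taken) < max_messages:
            taken.append(self.queue.popleft())
        return taken


def simulate(ticks, publish_rate, consume_rate, capacity):
    """Return (backlog, dropped) after running the given number of ticks."""
    broker = ToyBroker(capacity)
    for tick in range(ticks):
        for i in range(publish_rate):
            broker.publish((tick, i))
        broker.consume(consume_rate)
    return len(broker.queue), broker.dropped


if __name__ == "__main__":
    # Publishers outpace consumers: the backlog fills storage, then messages drop.
    print(simulate(ticks=10, publish_rate=5, consume_rate=2, capacity=20))
    # Consumers keep up: no backlog and no drops.
    print(simulate(ticks=10, publish_rate=2, consume_rate=5, capacity=20))
```

In this model, slow consumption and no consumption differ only in how quickly storage fills; either way the broker ends up rejecting messages once publishers outpace consumers for long enough.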
Perils of Messaging With Application Faults
Publishers -> BROKER -> Consumers (message processing bug)
Middleware Magic
Increasingly complex
Deceiving
Consistency concerns
Potentially expensive
Middleware Challenges
Multi-write pattern:
- Atomic transaction
- Coordination logic
Message broker pattern:
- Competing consumers
- Non-consuming consumer
Isn’t There a Better Way?
High Velocity:
- Peak 13 million messages per second
- 2.75 gigabytes per second
High Variety:
- Multiple RDBMS (Oracle, MySQL, etc.)
- Multiple NoSQL (Espresso, Voldemort)
- Hadoop, Spark, etc.
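As a rough back-of-the-envelope check, the two velocity figures above imply an average message size. This sketch assumes both peaks were measured at the same moment and that "gigabytes" means decimal gigabytes (10^9 bytes); neither assumption is stated on the slide, and the function name is hypothetical.

```python
def average_message_bytes(bytes_per_second, messages_per_second):
    """Average message size implied by a byte rate and a message rate."""
    return bytes_per_second / messages_per_second


PEAK_MESSAGES_PER_SECOND = 13_000_000  # from the slide
PEAK_BYTES_PER_SECOND = 2.75e9         # 2.75 GB/s, assuming 1 GB = 10^9 bytes

if __name__ == "__main__":
    size = average_message_bytes(PEAK_BYTES_PER_SECOND, PEAK_MESSAGES_PER_SECOND)
    print(f"Implied average message size: {size:.0f} bytes")
```

Under those assumptions the average message is a little over 200 bytes.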
Pre-2010 LinkedIn Data Architecture
skills, recommendations, comments, jobs, network updates, ads, mail, search, groups, people you may know, profile, news, stats, …
kaf·ka·esque /’káf, kə, ɛsk/ | adjective
Basically it describes a nightmarish situation which most
people can somehow relate to, although strongly surreal.
synonyms: surreal, lucid, spoilsbury toast boy
Usage: “Whoa! This flick is way kafkaesque…”
Franz Kafka
High throughput
Horizontally scalable
Reliable and durable
Loosely coupled Producers and Consumers
Flexible publish-subscribe semantics
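One way to picture loosely coupled producers and consumers is an append-only log in which each subscriber tracks its own read position. The sketch below is a toy model, not Kafka's client API: `ToyLog`, `append`, `poll`, and the subscriber names are hypothetical, and it ignores partitioning, persistence, and replication.

```python
class ToyLog:
    """Hypothetical append-only log with one read offset per subscriber."""

    def __init__(self):
        self._records = []  # records are only ever appended, never removed
        self._offsets = {}  # subscriber name -> index of next record to read

    def append(self, record):
        # Producers only append; they know nothing about who will read.
        self._records.append(record)
        return len(self._records) - 1  # position of the new record

    def poll(self, subscriber, max_records=10):
        # Each subscriber advances its own offset, so a slow or absent
        # subscriber does not affect any other subscriber.
        start = self._offsets.get(subscriber, 0)
        batch = self._records[start:start + max_records]
        self._offsets[subscriber] = start + len(batch)
        return batch


if __name__ == "__main__":
    log = ToyLog()
    for event in ["a", "b", "c"]:
        log.append(event)

    print(log.poll("search", max_records=2))  # ['a', 'b']
    print(log.poll("stats"))                  # ['a', 'b', 'c']
    print(log.poll("search"))                 # ['c']
```

Because reading does not remove records, any number of subscribers can consume the same stream independently and at their own pace.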
Post-2010 LinkedIn Data Architecture
[Diagram: LOB apps, apps, DBs, and logs as data sources; six downstream systems, each labeled "consume"]
- 2003: LinkedIn Launch
- 2009: Kafka Inception (Development begins)
- 2010: Initial Kafka Deployment @ LinkedIn
- 2011: Kafka Open Sourced (Apache Software Foundation)
- Today: 1.1 Trillion messages per day @ LinkedIn
Apache Kafka Adoption
7X since 2015