Kafka Netdb 06 2011 PDF

This document discusses Kafka, a distributed messaging system for log processing used at LinkedIn. It describes how LinkedIn uses Kafka to handle billions of logging events per day from various sources. The key design principles of Kafka include having a simple API, being efficient through approaches like batching and relying on file system caching, and being distributed. The document outlines the producer and consumer APIs and how Kafka achieves efficiency and scalability through its use of logs, stateless brokers, and Zookeeper-driven consumer load balancing. It also shares initial performance results and outlines future roadmap items like compression and stream processing.

Kafka: a Distributed Messaging System for Log Processing

Jay Kreps, Neha Narkhede, Jun Rao


LinkedIn

AGENDA

Kafka usage at LinkedIn

Kafka design

Kafka roadmap

ABOUT LINKEDIN

Professional social network platform

top 50th largest site in the world (traffic)

100M+ members

LOGGING OVERVIEW

Many types of events

user activity events: impression, search, ads, etc

operational events: call stack, service metrics, etc

High volume: billions of events per day

Both online and offline use cases

reporting, batch analysis

security, news feeds, performance dashboard, ...

DEPLOYMENT

[Diagram. Labels: Main site; Analysis site; Frontend (x3); VIP; Kafka (two groups of three); Realtime service (x2); Asterdata; Oracle; Hadoop]

KAFKA DESIGN PRINCIPLES


Simple API
Efficient
Distributed

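PRODUCER API
void send(String topic, ByteBufferMessageSet messages)

producer = new KafkaProducer();

// wrap the payload bytes in a message, put it in a message set,
// and publish the set to the topic "test"
message = new Message("test message str".getBytes());
set = new ByteBufferMessageSet(message);
producer.send("test", set);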

CONSUMER API
// create 1 message stream for the topic "test"
streams[] = Consumer.createMessageStreams("test", 1);
for (message : streams[0]) {
    bytes = message.payload();
    // do something with bytes
}

EFFICIENCY #1: SIMPLE STORAGE

Each topic has an ever-growing log

log == a list of files

message is addressed by a log offset
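
The storage model above can be illustrated with a small in-memory sketch. It is not Kafka's code: the class name SegmentedLog, the 4-byte length prefix, and the size-based roll to a new segment are all assumptions made for illustration, and each "file" is modeled as a byte buffer. What it shows is the idea from the slide: a log is a list of segments, and a message is found from its offset alone by locating the segment whose base offset is the largest one not exceeding it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical model of one topic log; names and layout are illustrative only.
final class SegmentedLog {
    // One "file" of the log, holding length-prefixed messages back to back.
    private static final class Segment {
        final long baseOffset; // log offset of the first byte in this segment
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Segment(long baseOffset) { this.baseOffset = baseOffset; }
    }

    // Segments keyed by base offset, so lookup by offset is a floor search.
    private final TreeMap<Long, Segment> segments = new TreeMap<>();
    private final int maxSegmentBytes;
    private long endOffset = 0; // offset at which the next message will be written

    SegmentedLog(int maxSegmentBytes) {
        this.maxSegmentBytes = maxSegmentBytes;
        segments.put(0L, new Segment(0L));
    }

    // Appends one message and returns the log offset that addresses it.
    long append(byte[] payload) {
        Segment active = segments.lastEntry().getValue();
        int needed = 4 + payload.length;
        // Roll to a new segment when the active one would exceed the size limit.
        if (active.bytes.size() > 0 && active.bytes.size() + needed > maxSegmentBytes) {
            active = new Segment(endOffset);
            segments.put(endOffset, active);
        }
        long offset = endOffset;
        byte[] header = ByteBuffer.allocate(4).putInt(payload.length).array();
        active.bytes.write(header, 0, header.length);
        active.bytes.write(payload, 0, payload.length);
        endOffset += needed;
        return offset;
    }

    // Reads the message at the given offset (must be an offset returned by append).
    byte[] read(long offset) {
        if (offset < 0 || offset >= endOffset) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        Segment segment = segments.floorEntry(offset).getValue();
        byte[] data = segment.bytes.toByteArray();
        int position = (int) (offset - segment.baseOffset);
        int length = ByteBuffer.wrap(data, position, 4).getInt();
        return Arrays.copyOfRange(data, position + 4, position + 4 + length);
    }

    // Offset of the message that follows the one at the given offset.
    long nextOffset(long offset) {
        return offset + 4 + read(offset).length;
    }

    int segmentCount() {
        return segments.size();
    }
}
```

In this sketch offsets are byte positions rather than message counters, so a reader computes the next offset by adding the length of the message it just read, and no per-message index is needed.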

EFFICIENCY #2: CAREFUL TRANSFER

Batch send and fetch

No message caching in Kafka layer

Rely on file system page cache

mostly, sequential access patterns

Zero-copy transfer: file -> socket
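
As a rough illustration of the file -> socket transfer, the sketch below uses the standard Java NIO call FileChannel.transferTo, which lets the operating system move bytes from a file to a channel without copying them through an application buffer when the platform and the target channel support it (typically a socket). The class name ZeroCopySend and the method send are hypothetical, and the slides do not say how the broker invokes the transfer, so this is only one plausible shape.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; not part of Kafka.
final class ZeroCopySend {
    // Sends up to `count` bytes of `logFile`, starting at `position`, to `target`.
    // Returns the number of bytes actually transferred.
    static long send(Path logFile, long position, long count, WritableByteChannel target)
            throws IOException {
        try (FileChannel file = FileChannel.open(logFile, StandardOpenOption.READ)) {
            // Never ask for more bytes than the file holds past `position`.
            long limit = Math.min(count, file.size() - position);
            long sent = 0;
            while (sent < limit) {
                // transferTo may move fewer bytes than requested, so loop.
                long n = file.transferTo(position + sent, limit - sent, target);
                if (n <= 0) {
                    break; // target cannot accept more right now; a real server would retry
                }
                sent += n;
            }
            return sent;
        }
    }
}
```

With a SocketChannel as the target, the bytes served this way come straight from the file system page cache, which is why the slide pairs zero-copy transfer with not caching messages inside the Kafka layer. With other channel types the JDK falls back to an ordinary copy.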

EFFICIENCY #3: STATELESS BROKER

Each consumer maintains its own state

Message deletion driven by retention policy, not by tracking consumption

acceptable in practice

rewindable consumer
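
The division of responsibility on this slide can be sketched as follows. Everything here is an assumption for illustration (the class names, the time-based retention rule, and the use of a segment's last-modified time); the slides only state that deletion follows a retention policy and that each consumer keeps its own position. The point of the sketch is that the broker never asks who has read what, and that rewinding is just the consumer choosing a smaller offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model; not Kafka's classes.
final class RetentionSketch {
    // A log segment as the broker sees it: where it starts and when it was last written.
    static final class Segment {
        final long baseOffset;
        final long lastModifiedMs;
        Segment(long baseOffset, long lastModifiedMs) {
            this.baseOffset = baseOffset;
            this.lastModifiedMs = lastModifiedMs;
        }
    }

    // Broker side: holds segments and no per-consumer state at all.
    static final class Broker {
        private final List<Segment> segments = new ArrayList<>();

        void addSegment(long baseOffset, long lastModifiedMs) {
            segments.add(new Segment(baseOffset, lastModifiedMs));
        }

        // Deletes segments older than retentionMs, whether or not anyone consumed them.
        // Returns how many segments were removed.
        int enforceRetention(long nowMs, long retentionMs) {
            int before = segments.size();
            segments.removeIf(s -> nowMs - s.lastModifiedMs > retentionMs);
            return before - segments.size();
        }

        // Smallest offset still available, or -1 if the log is empty.
        long earliestOffset() {
            return segments.isEmpty() ? -1L : segments.get(0).baseOffset;
        }
    }

    // Consumer side: the position is a plain number owned by the consumer.
    static final class ConsumerPosition {
        private long offset;

        void advanceTo(long newOffset) {
            offset = newOffset;
        }

        // Rewinding re-reads old messages; the broker does not need to be told.
        void rewindTo(long earlierOffset) {
            if (earlierOffset > offset) {
                throw new IllegalArgumentException("not a rewind: " + earlierOffset);
            }
            offset = earlierOffset;
        }

        long offset() {
            return offset;
        }
    }
}
```

A rewind only succeeds in practice if the target offset is still at or after the broker's earliest retained offset, which is the trade-off the slide calls acceptable in practice.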

AUTO CONSUMER LOAD BALANCING

[Diagram. Labels: broker (x4); zookeeper; consumer (x2)]

brokers and consumers register in zookeeper

consumers listen to broker and consumer changes

each change triggers consumer rebalancing
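
One way a rebalance could work without a central coordinator is for every consumer to run the same deterministic function over the same registry contents. The sketch below assumes exactly that: the partition and consumer name lists stand in for what would be read from zookeeper, and the class name RebalanceSketch, the string identifiers, and the contiguous-range split are all illustrative choices, not details given in the slides.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical assignment rule; not taken from Kafka's source.
final class RebalanceSketch {
    // Returns the partitions that consumer `self` should own.
    // Every consumer calls this with the same two lists and its own name,
    // so the results are disjoint and together cover all partitions.
    static List<String> assign(List<String> partitions, List<String> consumers, String self) {
        // Sort copies so that all consumers see the same order.
        List<String> sortedPartitions = new ArrayList<>(partitions);
        List<String> sortedConsumers = new ArrayList<>(consumers);
        Collections.sort(sortedPartitions);
        Collections.sort(sortedConsumers);

        int index = sortedConsumers.indexOf(self);
        if (index < 0) {
            throw new IllegalArgumentException("unknown consumer: " + self);
        }

        int n = sortedConsumers.size();
        int base = sortedPartitions.size() / n;  // every consumer gets at least this many
        int extra = sortedPartitions.size() % n; // the first `extra` consumers get one more
        int start = index * base + Math.min(index, extra);
        int count = base + (index < extra ? 1 : 0);
        return new ArrayList<>(sortedPartitions.subList(start, start + count));
    }
}
```

When a broker or consumer joins or leaves, each remaining consumer would re-read the lists and call assign again; because the inputs and the function are identical everywhere, the consumers converge on the same split without talking to each other.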

PRODUCER PERFORMANCE

CONSUMER PERFORMANCE

ROADMAP

New Kafka features

compression

replication

stream processing (online M/R)

http://sna-projects.com/kafka/
