Kafka


Apache Kafka (Kafka) is an open source, distributed streaming platform that enables (among other things) the development of real-time, event-driven applications.

Kafka has three primary capabilities:


1- It enables applications to publish or subscribe to data or event streams.
2- It stores records accurately (i.e., in the order in which they occurred) in a fault-tolerant and durable way.
3- It processes records in real time (as they occur).

Kafka is a stream processing platform that enables applications to publish, consume, and process high volumes of record streams in a fast and durable way.
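These capabilities can be illustrated with a minimal in-memory model. This is only a sketch of the publish/subscribe and ordered-storage ideas, not the real Kafka client API; all names here are invented for illustration:

```python
# Minimal in-memory model of publish/subscribe with ordered, append-only
# storage. Illustrative sketch only -- not the actual Kafka API.

class MiniTopic:
    def __init__(self):
        self.log = []          # append-only record log (durable store in Kafka)
        self.subscribers = []  # callbacks invoked as records arrive

    def publish(self, record):
        offset = len(self.log)
        self.log.append(record)        # records are stored in arrival order
        for callback in self.subscribers:
            callback(offset, record)   # processed "in real time", as they occur
        return offset

    def subscribe(self, callback):
        self.subscribers.append(callback)

topic = MiniTopic()
seen = []
topic.subscribe(lambda offset, record: seen.append((offset, record)))
topic.publish("order-created")
topic.publish("order-shipped")
print(seen)  # [(0, 'order-created'), (1, 'order-shipped')]
```

The subscriber receives each record together with its offset, in the exact order the records were published.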
Kafka is used primarily for creating two kinds of applications:
 Real-time streaming data pipelines: applications designed specifically to move millions of data or event records between enterprise systems, at scale and in real time, and to move them reliably, without risk of corruption, duplication, or the other problems that typically occur when moving huge volumes of data at high speed.
 Real-time streaming applications: applications that are driven by record or event streams and that generate streams of their own. If you spend any time online, you encounter scores of these applications every day, from the retail site that continually updates the quantity of a product at your local store to sites that display personalized recommendations or advertising based on clickstream analysis.

Low-level consumers
There are two types of consumers in Kafka. The first is the low-level
consumer, where you specify the topic and partition to read from, as well
as the offset at which to start: a fixed position, the beginning, or the
end of the partition. This can, of course, be cumbersome: you must keep
track of which offsets have been consumed so that the same records aren't
read more than once. To make consumption easier, Kafka added a second option:
High-level consumer
The high-level consumer (better known as a consumer group) consists of
one or more consumers. A consumer group is created by setting the
"group.id" property on a consumer; giving the same group id to
another consumer means it will join the same group.
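Within a group, the topic's partitions are divided among the members, so each record is consumed by only one consumer in the group. The sketch below models that assignment with a simple round-robin; in real Kafka the group coordinator and the configured assignor handle this, so treat the code as an illustration only:

```python
# Sketch of consumer-group partition assignment: consumers sharing the same
# group.id divide a topic's partitions among themselves (round-robin here).
# Illustrative model; real assignment is done by Kafka's group coordinator.

def assign_partitions(partitions, consumers):
    """Spread partitions across the consumers in one group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        consumer = consumers[i % len(consumers)]
        assignment[consumer].append(partition)
    return assignment

partitions = list(range(8))           # a topic with 8 partitions
group = ["consumer-a", "consumer-b"]  # both use the same group.id
print(assign_partitions(partitions, group))
# {'consumer-a': [0, 2, 4, 6], 'consumer-b': [1, 3, 5, 7]}

# A third consumer joining the same group triggers a reassignment
# (a "rebalance" in Kafka terms), spreading the partitions three ways:
group.append("consumer-c")
print(assign_partitions(partitions, group))
```

Because each partition belongs to exactly one consumer in the group, records are consumed without duplication inside the group.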
Now that we have looked at the producer and the consumer, let's examine how
the broker receives and stores incoming records.

As an example, take a broker with three topics, where each topic has 8
partitions.

The producer sends a record to partition 1 in topic 1, and since the partition is empty the
record ends up at offset 0.

The next record added to partition 1 will end up at offset 1, the next record at offset 2,
and so on.
This is what is referred to as a commit log: each record is appended to the log, and there
is no way to change the existing records in the log. This is also the same offset that the
consumer uses to specify where to start reading.
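The commit-log behavior described above can be sketched in a few lines. This is a toy model, not Kafka's actual storage engine; the class and method names are invented for illustration:

```python
# Sketch of a partition as a commit log: each record is appended and receives
# the next offset, and existing entries are never modified. A consumer (like
# the low-level consumer above) picks the offset at which to start reading.

class CommitLog:
    def __init__(self):
        self._records = []

    def append(self, record):
        offset = len(self._records)   # first record in an empty partition -> offset 0
        self._records.append(record)
        return offset

    def read(self, offset):
        """Return all records from a caller-chosen starting offset."""
        return self._records[offset:]

log = CommitLog()
print(log.append("a"))  # 0 -- the partition was empty
print(log.append("b"))  # 1
print(log.append("c"))  # 2
print(log.read(1))      # ['b', 'c'] -- reading from offset 1 onwards
```

Note that `read` never removes anything: consuming is just reading from an offset, which is why multiple consumers can read the same partition independently.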

Resources:
https://www.youtube.com/watch?v=X79IjgIUDzU
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013 (slideshare.net)
