Effective Kafka: A Hands-On Guide to Building Robust and Scalable Event-Driven Applications with Code Examples in Java (1st Edition)
Emil Koutanov
This book is for sale at http://leanpub.com/effectivekafka
This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing
process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and
many iterations to get reader feedback, pivot until you have the right book and build traction once
you do.
and far less prescriptive. However fashionable and blogged-about these paradigms might be, they
only offer partial solutions to common problems. One must be mindful of the context at all times,
and apply the model judiciously. And crucially, one must understand the deficiencies of the proposed
approach, being able to reason about the implications of its adoption — both immediate and long-
term.
The principal inconvenience of a distributed system is that it shifts the complexity from the innards
of a service implementation to the notional fabric that spans across services. Some might say, it
lifts the complexity from the micro level to the macro level. In doing so, it does not reduce the net
complexity; on the contrary, it increases the aggregate complexity of the combined solution. An
astute engineering leader is well-aware of this. The reason why a distributed architecture is often
chosen — assuming it is chosen correctly — is to enable the compartmentalisation of the problem
domain. It can, if necessary, be decomposed into smaller chunks and solved in partial isolation,
typically by different teams — then progressively integrated into a complete whole. In some cases,
this decomposition is deliberate, where teams are organised around the problem. In other, less-
than-ideal cases, the breakdown of the problem is a reflection of Conway’s Law, conforming to
organisational structures. Presumed is the role of an architect, or a senior engineering figure, who
orchestrates the decomposition, assuming responsibility for ensuring the conceptual integrity
and efficacy of the overall solution. Centralised coordination may not always be present — some
organisations have opted for a more democratic style, whereby teams act in concert to decompose
the problem organically, with little outside influence.
Coupling
Whatever the style of decomposition, the notion of macro complexity cannot be escaped. Funda-
mentally, components must communicate in one manner or another, and therein lies the problem:
components are often inadvertently made aware of each other. This is called coupling — the degree of
interdependence between software components. The lower the coupling, the greater the propensity
of the system to evolve to meet new requirements, performance demands, and operational challenges.
Conversely, tight coupling shackles the components of the system, increasing their mutual reliance
and impeding their evolution.
There are known ways of alleviating the problem of coupling, such as the use of an asynchronous
communication style and message-oriented middleware to segregate components. These techniques
have been used to varying degrees of success; there are times where message-based communication
has created a false economy — collaborating components may still be transitively dependent upon
one another in spite of their designers’ best efforts to forge opaque conduits between them.
Resilience
It would be rather nice if computers never failed and networks were reliable; as it happens, reality
differs. The problem is exacerbated in a distributed context: the likelihood of any one component
experiencing an isolated failure increases with the total number of components, which carries
negative ramifications if components are interdependent.
Distributed systems typically require a different approach to resilience compared to their centralised
counterparts. The quantity and makeup of failure scenarios are often much more daunting in dis-
tributed systems. Failures in centralised systems are mostly characterised as fail-stop scenarios —
where a process fails totally and permanently, or a network partition occurs, which separates the
entirety of the system from one or more clients, or the system from its dependencies. In either case,
the failure modes are trivially understood. By contrast, distributed systems introduce the concept of
partial failures, intermittent failures, and, in the more extreme cases, Byzantine failures. The latter
represents a special class of failures where processes submit incorrect or misleading information to
unsuspecting peers.
Consistency
Ensuring state consistency in a distributed system is perhaps the most difficult aspect to get right.
One can think of a distributed system as a vast state machine, with some elements of it being updated
independently of others. There are varying levels of consistency, and different applications may
demand specific forms of consistency to satisfy their requirements. The stronger the consistency
level, the more synchronisation is necessary to maintain it. Synchronisation is generally regarded
as a difficult problem; it is also expensive — requiring additional resources and impacting the
performance of the system.
Cost being held constant, the greater the requirement for consistency, the less distributed a system
will be. There is also a natural counterbalance between consistency and availability, identified by
Eric Brewer in 1998. The essence of it is as follows: distributed systems must be tolerant of
network partitions, but in achieving this tolerance, they will have to give up either consistency
or availability guarantees. Note, this conjecture does not claim that a consistent system cannot
simultaneously be highly available, only that it must give up availability if a network partition
does occur.
By comparison, centralised systems are not bound by the same laws, as they don’t have to contend
with network partitions. They can also take advantage of the underlying hardware, such as CPU
cache lines and atomic operations, to ensure that individual threads within a process maintain
consistency of shared data. When they do fail, they typically fail as a unit — losing any ephemeral
state and leaving the persistent state as it was just before failure.
Event-Driven Architecture
Event-Driven Architecture (EDA) is a paradigm promoting the production, detection, consumption
of, and reaction to events. An event is a significant change in state that may be of interest within
the domain where this state change occurred, or outside of that domain. Interested parties can be
notified of an event by having the originating domain publish some canonical depiction of the event
to a well-known conduit — a message broker, a ledger, or a shared datastore of some sort. Note, the
event itself does not travel — only its notification; however, we often metonymically refer to the
notification of the event as the event. (While formally incorrect, it is convenient.)
An event-driven system formally consists of emitters (also known as producers and agents), con-
sumers (also known as subscribers and sinks), and channels (also known as brokers). We also use
the term upstream — to refer to the elements prior to a given element in the emitter-consumer
relation, and downstream — to refer to the subsequent elements.
An emitter of an event is not aware of any of the event’s downstream consumers. This statement
captures the essence of an event-driven architecture. An emitter does not even know whether
a consumer exists; every transmission of an event is effectively a ‘blind’ broadcast. Likewise,
consumers react to specific events without the knowledge of the particular emitter that published
the event. A consumer need not be the final destination of the event; the event notification may
be persisted or transformed by the consumer before being broadcast to the next stage in a notional
pipeline. In other words, an event may spawn other events; elements in an event-driven architecture
may combine the roles of emitters and consumers, simultaneously acting as both.
Event notifications are immutable. An element cannot modify an event’s representation once it has
been emitted, not even if it is the original emitter. At most, it can emit new notifications relating
to that event — enriching, refining, or superseding the original notification.
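To make the notion of an immutable event notification concrete, the following is a minimal sketch in Java. The CustomerRegistered type and its fields are hypothetical, invented purely for this illustration; the essential point is that once constructed, the notification offers no way to alter its contents.

```java
import java.time.Instant;
import java.util.UUID;

/**
 * A hypothetical event notification. Once constructed it cannot be altered;
 * corrections or refinements are conveyed by emitting new notifications that
 * reference the original via its identifier.
 */
public final class CustomerRegistered {
  private final UUID eventId;
  private final Instant occurredAt;
  private final String customerId;
  private final String email;

  public CustomerRegistered(UUID eventId, Instant occurredAt, String customerId, String email) {
    this.eventId = eventId;
    this.occurredAt = occurredAt;
    this.customerId = customerId;
    this.email = email;
  }

  public UUID getEventId() { return eventId; }
  public Instant getOccurredAt() { return occurredAt; }
  public String getCustomerId() { return customerId; }
  public String getEmail() { return email; }
  // Deliberately no setters: the notification is immutable by construction.
}
```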
Coupling
Elements within EDA are exceedingly loosely coupled, to the point that they are largely unaware
of one another. Emitters and consumers are only coupled to the intermediate channels, as well as to
the representations of events — schemas. While some coupling invariably remains, in practice, EDA
offers the lowest degree of coupling of any practical system. The collaborating components become
largely autonomous, standalone systems that operate in their own right — each with their individual
set of stakeholders, operational teams, and governance mechanisms.
By way of an example, an e-commerce system might emit events for each product purchase, detailing
the time, product type, quantity, the identity of the customer, and so on. Downstream of the
emitter, two systems — a business intelligence (BI) platform and an enterprise resource planning
(ERP) platform — might react to the sales events and build their own sets of materialised views.
(In effect, view-only projections of the emitter’s state.) Each of these platforms is a completely
independent system with its own stakeholders: the BI system satisfies the business reporting
and analytics requirements for the marketing business unit, while the ERP system supports supply
chain management and capacity planning — the remit of an entirely different business unit.
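As an illustration of what such a downstream projection might look like, the following plain-Java sketch reduces a stream of purchase events into a per-product sales tally. The ProductPurchased event shape and the projection class are assumptions made for this example, not a depiction of any real platform.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical shape of the sale event emitted by the e-commerce platform. */
final class ProductPurchased {
  final String productType;
  final int quantity;
  final String customerId;

  ProductPurchased(String productType, int quantity, String customerId) {
    this.productType = productType;
    this.quantity = quantity;
    this.customerId = customerId;
  }
}

/**
 * A read-only projection maintained by a downstream consumer (for example,
 * the BI platform). It reacts to each event as it arrives, never contacting
 * the emitter directly.
 */
final class SalesByProductProjection {
  private final Map<String, Integer> unitsSold = new HashMap<>();

  void apply(ProductPurchased event) {
    unitsSold.merge(event.productType, event.quantity, Integer::sum);
  }

  int unitsSoldFor(String productType) {
    return unitsSold.getOrDefault(productType, 0);
  }
}
```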
To put things into perspective, we shall consider the potential solutions to this problem in the absence
of EDA. There are several ways one could have approached the solution, each of which is commonly
found in the industry to this day:
1. Build a monolith. Conceptually, the simplest approach, requiring a system to fulfill all
requirements and cater to all stakeholders as an indivisible unit.
2. Integration. Allow the systems to invoke one another via some form of an API. Either the
e-commerce platform could invoke the BI and ERP platforms at the point of sale, or the BI and
ERP platforms could invoke the e-commerce platform APIs just before generating a business
report or supplier request. Some variations of this model use message queues for systems to
send commands and queries to one another.
3. Data decapsulation. If system integrators were cowboys, this would be their prairie. Data
decapsulation (a coined term, if one were to ask) sees systems ‘reaching over’ into each other’s
‘backyard’, so to speak, to retrieve data directly from the source (for example, from an SQL
database) — without asking the owner of the data, and oftentimes without their awareness.
4. Shared data. Build separate applications that share the same datastore. Each application is
aware of all data, and can both read and modify any data element. Some variations of this
scheme use database-level permissions to restrict access to the data based on an application’s
role, thereby binding the scope of each application.
Once laid out, the drawbacks of each model become apparent. The first approach — the proverbial
monolith — suffers from uncontrolled complexity growth. In effect, it has to satisfy everyone and
everything. This also makes it very difficult to change. From a reliability standpoint, it is the
equivalent of putting all of one’s eggs in one basket — if the monolith were to fail, it would impact all
stakeholders simultaneously.
The second approach — integrate everything — is what these days is becoming more commonly
known as the ‘distributed monolith’, especially when it is being discussed in the context of mi-
croservices. While the systems (or services, as the case may be) appear to be standalone — they
might even be independently sourced and maintained — they are by no means autonomous, as they
cannot change freely without impacting their peers.
The third approach — read others’ data — is the architectural equivalent of a ‘get rich quick scheme’
that always ends in tears. It takes the path of least resistance, making it highly alluring. However,
the model creates the tightest possible level of coupling, making it very difficult to change the
parties down the track. It is also brittle — a minor and seemingly benign change to the internal
data representation in one system could have a catastrophic effect on another system.
The final model — the use of a shared datastore — is a more civilised variation of the third approach.
While it may be easier to govern, especially with the aid of database-level access control — the
negative attributes are largely the same.
Now imagine that the business operates multiple disparate e-commerce platforms, located in dif-
ferent geographic regions or selling different sorts of products. And to top it off, the business now
needs a separate data warehouse for long-term data collection and analysis. The addition of each
new component significantly increases the complexity of the above solutions; in other words, they
do not scale. By comparison, EDA scales perfectly linearly. Systems are unaware of one another and
react to discrete events — the origin of an event is largely circumstantial. This level of autonomy
permits the components to evolve rapidly in isolation, meeting new functional and non-functional
requirements as necessary.
Resilience
The autonomy created by the use of EDA ensures that, as a whole, the system is less prone to outage
if any of its individual components suffer a catastrophic failure. How is this achieved?
Integrated systems, and generally any topological arrangement that exhibits a high degree of
component coupling, are prone to correlated failure — whereby the failure of one component can
take down an entire system. In a tightly coupled system, components directly rely on one another
to jointly achieve some goal. If one of these components fails, then the remaining components that
depend on it may also cease to function; at minimum, they will not be able to carry out those
operations that depend on the failed component.
In the case of a monolith, the failure assertion is trivial — if a fail-stop scenario occurs, the entire
process is affected.
Under EDA, enduring a component failure implies the inability to either emit events or consume
them. In the event of emitter failure, consumers may still operate freely, albeit without a facility
for reacting to new events. Using our earlier example, if the e-commerce engine fails, none of the
downstream processes will be affected — the business can still run analytical queries and attend to
resource planning concerns. Conversely, if the ERP system fails, the business will still make sales;
however, some products might not be placed on back-order in time, potentially leading to low stock
levels. Furthermore, provided the event channel is durable, the e-commerce engine will continue to
publish sales events, which will eventually be processed by the ERP system when it is restored. The
failure of an event channel can be countered by implementing a local, stateful buffer on the emitter,
so that any backlogged events can be published when the channel has been restored. In other words,
not only is an event-driven system more resilient by retaining limited operational status during
component failure, it is also capable of self-healing when failed components are replaced.
In practice, systems may suffer from soft failures, where components are saturated beyond their
capacity to process requests, creating a cascading effect. In networking, this phenomenon is called
‘congestive collapse’. In effect, components appear to be online, but are stressed — unable to turn
around some fraction of requests within acceptable time frames. In turn, the requesting components
— having detected a timeout — retransmit requests, hoping to eventually get a response. This in-
creases pressure on the stressed components, exacerbating the situation. Often, the missed response
is merely an acknowledgement that the request was received — in effect, the requester is simply piling on duplicate
work.
Under EDA, requesters do not require a confirmation from downstream consumers — a simple
acknowledgement from the event channel is sufficient to assume that the event has been stably
enqueued and that the consumer(s) will get to it at some future point in time.
Consistency
EDA ameliorates the problem of distributed consistency by attributing explicit mastership to state,
such that any stateful element can only be manipulated by at most one system — its designated
owner. This is also referred to as the originating domain of the event. Other domains may only react
to the event; for example, they may reduce the event stream to a local projection of the emitter’s
state.
Under this model, consistency within the originating domain is trivially maintained by enforcing
the single writer principle. External to the domain, the events can be replayed in the exact order
they were observed on the emitter, creating sequential consistency — a model of consistency where
updates do not have to be seen instantaneously, but must be presented in the same order to all
observers, which is also the order they were observed on the emitter. Alternatively, events may
be emitted in causal order, categorising them into multiple related sequences, where events within
any sequence are related amongst themselves, but unrelated to events in another sequence. This is
a slight relaxation of sequential consistency to allow for safe parallelism, and is sufficient in the
overwhelming majority of use cases.
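The distinction can be illustrated with a toy sketch: events are appended to per-entity sequences, preserving order within each sequence while leaving events of different entities mutually unordered. The class and event names below are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Causal ordering in miniature: order is preserved within each per-entity
 * sequence; sequences for different entities are mutually unordered, which
 * permits safe parallel consumption.
 */
public final class CausallyOrderedLog {
  private final Map<String, List<String>> sequences = new LinkedHashMap<>();

  public void emit(String entityId, String event) {
    sequences.computeIfAbsent(entityId, id -> new ArrayList<>()).add(event);
  }

  public List<String> sequenceFor(String entityId) {
    return List.copyOf(sequences.getOrDefault(entityId, List.of()));
  }

  public static void main(String[] args) {
    CausallyOrderedLog log = new CausallyOrderedLog();
    log.emit("order-1", "created");
    log.emit("order-2", "created");   // unrelated to order-1; may be consumed in parallel
    log.emit("order-1", "paid");      // ordered strictly after "created" for order-1
    System.out.println(log.sequenceFor("order-1")); // [created, paid]
  }
}
```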
Applicability
For all its outstanding benefits, EDA is not a panacea and cannot supplant integrated or monolithic
systems in all cases. For instance, EDA is not well-suited to synchronous interactions, as mutual
or unilateral awareness among collaborating parties runs contrary to the grain of EDA and negates
most of its benefits.
EDA is not a general-purpose architectural paradigm. It is designed to be used in conjunction with
other paradigms and design patterns, such as synchronous request-response style messaging, to
solve more general problems. In the areas where it can be applied, it ordinarily leads to significant
improvements in the system’s non-functional characteristics. Therefore, one should seek to maximise
opportunities for event-driven compositions, refactoring the architecture to that end.
Among the aspects addressed by the event streaming model are the following:

• Interface between the emitter and the channel, and the consumer and the channel;
• Cardinality of the emitter and consumer elements that interact with a common channel;
• Delivery semantics;
• Enabling parallelism in the handling of event notifications;
• Persistence, durability, and retention of event records; and
• Ordering of events and associated consistency models.
The focal point of event streaming is, unsurprisingly, an event stream. At minimum, an event stream
is a durable, totally-ordered, unbounded sequence of immutable event records, delivered at least once
to its subscriber(s). An event streaming platform is a concrete technology that implements the event
streaming model, addressing the points enumerated above. It interfaces with emitter and consumer
ecosystems, hosts event streams, and may provide additional functionality beyond the essential set
of event streaming capabilities. For example, an event streaming platform may offer end-to-end
compression and encryption of event records, which is not essential in the construction of event-
driven systems, but is convenient nonetheless.
It is worth noting that event streaming is not required to implement the event channel element of
EDA. Other transports, such as message queues, may be used to fulfill similar objectives. In fact,
there is nothing to say that EDA is exclusive to distributed systems; the earliest forms of EDA were
realised within the confines of a single process, using purely in-memory data structures. It may seem
banal in comparison, but even UI frameworks of a bygone era, such as Java Swing, draw on the
foundations of EDA, as do their more contemporary counterparts, such as React.
When operating in the context of a distributed system, the primary reason for choosing event
streaming over the competing alternatives is that the former was designed specifically for use in
EDA, and its various implementations — event streaming platforms — offer a host of capabilities
that streamline their adoption in EDA. A well-designed event streaming platform provides direct
correspondence with native EDA concepts. For example, it takes care of event immutability, record
ordering, and supports multiple independent consumers — concepts that might not necessarily be
endemic to alternate solutions, such as message queues.
This chapter has furnished an overview of the challenges of engineering distributed systems,
contrasted with the building of monolithic business applications. The numerous drawbacks of dis-
tributed systems increase their cost and complicate their upkeep. Generally speaking, the components
of a complex system are distributed out of necessity — namely, the requirement to scale in both the
performance plane and in the engineering capacity to deliver change.
We looked at how the state of the art has progressed since the mass adoption of the principles of
distributed computing in mainstream software engineering. Specifically, we explored Event-Driven
Architecture as a highly effective paradigm for reducing coupling, bolstering resilience, and avoiding
the complexities of maintaining a globally consistent state.
Finally, we touched upon event streaming, which is a rendition of the event channel element of
EDA. We also learned why event streaming is the preferred approach for persisting and transporting
event notifications. In no uncertain terms, event streaming is the most straightforward path for the
construction of event-driven systems.
Chapter 2: Introducing Apache Kafka
Apache Kafka (or simply Kafka) is an event streaming platform. But it is also more than that. It is
an entire ecosystem of technologies designed to assist in the construction of complete event-driven
systems. Kafka goes above and beyond the essential set of event streaming capabilities, providing
rich event persistence, transformation, and processing semantics.
Event streaming platforms are a comparatively recent paradigm within the broader message-oriented
middleware class. There are only a handful of mainstream implementations available, compared to
hundreds of MQ-style brokers, some going back to the 1980s (for example, Tuxedo). Compared to
established messaging standards such as AMQP, MQTT, XMPP, and JMS, there are no equivalent
standards in the streaming space. Kafka is a leader in the area of event streaming, and more broadly,
event-driven architecture. While there is no de jure standard in event streaming, Kafka is the
benchmark to which most competing products orient themselves. To this effect, several competitors
— such as Azure Event Hubs and Apache Pulsar — offer APIs that mimic Kafka.
Event streaming platforms are an active area of continuous research and experimentation.
In spite of this, event streaming platforms aren’t just a niche concept or an academic idea
with few esoteric use cases; they can be applied effectively to a broad range of messaging
and eventing scenarios, routinely displacing their more traditional counterparts.
Kafka is written in Java, meaning it can run comfortably on most operating systems and hardware
configurations. It can equally be deployed on bare metal, in the Cloud, or on a Kubernetes cluster.
And finally, Kafka has libraries written for just about every programming language, meaning that
virtually every developer can start taking advantage of event streaming and push their application
architecture to the next level of resilience and scalability.
When Kafka was initially deployed at LinkedIn in 2011, it was processing approximately one billion
events per day. By 2012, this number had risen to 20 billion.
By July 2013, Kafka was carrying 200 billion events per day. Two years later, in 2015, Kafka was
turning over one trillion events per day, with peaks of up to 4.5 million events per second.
Over the four years from 2011 to 2015, the volume of records grew by three orders of magnitude.
By the end of this period, LinkedIn was moving well over a petabyte of event data per week. By all
means, this level of growth could not be attributed to Kafka alone; however, Kafka was undoubtedly
a key enabler from an infrastructure perspective.
As of October 2019, LinkedIn maintains over 100 Kafka clusters, comprising more than 4,000 brokers.
These collectively serve more than 100,000 topics and 7 million partitions. The total number of
records handled by Kafka has surpassed 7 trillion per day.
• Yahoo uses Kafka for real-time analytics, handling up to 20 gigabits of uncompressed event
data per second in 2015. Yahoo is also a major contributor to the Kafka ecosystem, having
open-sourced its in-house Cluster Manager for Apache Kafka (CMAK) product.
• Twitter heavily relies on Kafka for its mobile application performance management and
analytics product, which has been clocked at five billion sessions per day in February 2015.
Twitter processes this stream using a combination of Apache Storm, Hadoop, and AWS Elastic
MapReduce.
• Netflix uses Kafka as the messaging backbone for its Keystone pipeline — a unified event
publishing, collection, and routing infrastructure for both batch and stream processing. As of
2016, Keystone comprises over 4,000 brokers deployed entirely in the Cloud, which collectively
handle more than 700 billion events per day.
• Tumblr relies on Kafka as an integral part of its event processing pipeline, capturing up to 500
million page views a day back in 2012.
• Square uses Kafka as the underlying bus to facilitate stream processing, website activity
tracking, metrics collection and monitoring, log aggregation, real-time analytics, and complex
event processing.
• Pinterest employs Kafka for its real-time advertising platform, with 100 clusters comprising
over 2,000 brokers deployed in AWS. Pinterest is turning over in excess of 800 billion events
per day, peaking at 15 million per second.
• Uber is among the most prominent of Kafka adopters, processing in excess of a trillion
events per day — mostly for data ingestion, event stream processing, database changelogs, log
aggregation, and general-purpose publish-subscribe message exchanges. In addition, Uber is
an avid open-source contributor — having released its in-house cluster replication solution
uReplicator into the wild.
And it’s not just the engineering-focused organisations that have adopted Kafka — by some esti-
mates, up to a third of Fortune 500 companies use Kafka to fulfill their event streaming and processing
needs.
There are good reasons for this level of industry adoption. As it happens, Kafka is one of the
most well-supported and well-regarded event streaming platforms, boasting an impressive number
of open-source projects that integrate with Kafka. Some of the big names include Apache Storm,
Apache Flink, Apache Hadoop, LogStash and the Elasticsearch Stack, to name a few. There are also
Kafka Connect integrations with every major SQL database, and most NoSQL ones too. At the time
of writing, there are circa one hundred supported off-the-shelf connectors, which does not include
custom connectors that have been independently developed.
Uses of Kafka
Chapter 1: Event Streaming Fundamentals has provided the necessary background, fitting Kafka as
an event streaming platform within a larger event-driven system.
There are several use cases falling within the scope of EDA that are well-served by Apache Kafka.
This section covers some of these scenarios, illustrating how Kafka may be used to address them.
Publish-subscribe
Any messaging scenario where producers are generally unaware of consumers, and instead publish
messages to well-known aggregations called topics. Conversely, consumers are generally unaware of
the producers but are instead concerned with specific content categories. The producer and consumer
ecosystems are loosely-coupled, being aware of only the common topic(s) and messaging schema(s).
This pattern is commonly used in the construction of loosely-coupled microservices.
When Kafka is used for general-purpose publish-subscribe messaging, it will be competing with its
‘enterprise’ counterparts, such as message brokers and service buses. Admittedly, Kafka might not
have all the features of some of these middleware platforms — such as message deletion, priority
levels, producer flow control, distributed transactions, or dead-letter queues. On the other hand,
these features are mostly representative of traditional messaging paradigms — intrinsic to how
these platforms are commonly used. Kafka works in its own idiomatic way — optimised around
unbounded sequences of immutable events. As long as a publish-subscribe relationship can be
represented as such, then Kafka is fit for the task.
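As a rough sketch of the publishing side of this pattern, the snippet below uses Kafka's Java producer client to emit a record to a topic, remaining oblivious to whoever might consume it. The broker address, topic name, and JSON payload are illustrative assumptions; producer configuration and serialization are covered properly in later chapters.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public final class PurchasePublisherSample {
  public static void main(String[] args) {
    final Properties config = new Properties();
    config.put("bootstrap.servers", "localhost:9092"); // assumed local broker
    config.put("key.serializer", StringSerializer.class.getName());
    config.put("value.serializer", StringSerializer.class.getName());

    try (Producer<String, String> producer = new KafkaProducer<>(config)) {
      // The producer knows only the topic and the record schema, never the consumers.
      final ProducerRecord<String, String> record = new ProducerRecord<>(
          "sales.purchases", "customer-123", "{\"productType\": \"book\", \"quantity\": 1}");
      producer.send(record, (metadata, exception) -> {
        if (exception != null) {
          exception.printStackTrace();
        } else {
          System.out.printf("Published to %s-%d at offset %d%n",
              metadata.topic(), metadata.partition(), metadata.offset());
        }
      });
    } // close() flushes any outstanding records
  }
}
```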
Log aggregation
Dealing with large volumes of log-structured events, typically emitted by application or infrastruc-
ture components. Logs may be generated at burst rates that significantly outstrip the ability of query-
centric datastores to keep up with log ingestion and indexing, which are regarded as ‘expensive’
operations. Kafka can act as a buffer, offering an intermediate, durable datastore. The ingestion
process will act as a sink, eventually collating the logs into a read-optimised database (for example,
Elasticsearch or HBase).
A log aggregation pipeline may also contain intermediate steps, each adding value en route to the
final destination; for example, to compress log data, encrypt log content, normalise the logs into a
canonical form, or sanitise the log entries — scrubbing them of personally-identifiable information.
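A minimal sketch of the ingestion sink might look like the following, using Kafka's Java consumer client. The topic name, consumer group, and the stand-in indexing method are assumptions for illustration; a real sink would write into a read-optimised store such as Elasticsearch.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public final class LogSinkSample {
  public static void main(String[] args) {
    final Properties config = new Properties();
    config.put("bootstrap.servers", "localhost:9092"); // assumed local broker
    config.put("group.id", "log-indexer");             // illustrative consumer group
    config.put("key.deserializer", StringDeserializer.class.getName());
    config.put("value.deserializer", StringDeserializer.class.getName());
    config.put("auto.offset.reset", "earliest");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(config)) {
      consumer.subscribe(Collections.singletonList("app.logs")); // hypothetical topic
      while (true) {
        final ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
          // Kafka absorbs the burst; indexing proceeds at the datastore's own pace.
          indexLogEntry(record.value());
        }
      }
    }
  }

  /** Stand-in for writing to a read-optimised store such as Elasticsearch. */
  private static void indexLogEntry(String logLine) {
    System.out.println("Indexing: " + logLine);
  }
}
```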
Log shipping
While sounding vaguely similar to log aggregation, the shipping of logs is a vastly different concept.
Essentially, this involves the real-time copying of journal entries from a master data-centric system
to one or more read-only replicas. Assuming state changes are fully captured as journal records,
replaying those records allows the replicas to accurately mimic the state of the master, albeit with
some lag.
Kafka’s optional ability to partition records within a topic to create independent, causally ordered
sequences of events allows replicas to operate under either a sequential or causal consistency model —
depending on the chosen partitioning scheme. The various consistency models were briefly covered
in Chapter 1: Event Streaming Fundamentals. Both consistency models are sufficient for creating
read-only copies of the original data.
Log shipping is a key enabler for another related architectural pattern — event sourcing. Kafka will
act as a durable event store, allowing any number of consumers to rebuild a point-in-time snapshot
of their application state by replaying all records up to that point in time. Loss of state information
in any of the downstream consumers can be recovered by replaying the events from the last stable
checkpoint, thereby reducing the need to take frequent backups.
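To sketch how a consumer might rebuild its state by replaying the journal, the example below assigns a partition explicitly, rewinds to the beginning, and folds every record into a local map. The topic name, the single-partition assumption, and the trivial last-write-wins projection are simplifications made for this illustration.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public final class ReplaySample {
  public static void main(String[] args) {
    final Properties config = new Properties();
    config.put("bootstrap.servers", "localhost:9092"); // assumed local broker
    config.put("key.deserializer", StringDeserializer.class.getName());
    config.put("value.deserializer", StringDeserializer.class.getName());

    final Map<String, String> state = new HashMap<>(); // key -> latest value
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(config)) {
      // Assign the partition explicitly (no group management) and rewind to the start.
      final TopicPartition partition = new TopicPartition("account.changelog", 0); // hypothetical topic
      consumer.assign(List.of(partition));
      consumer.seekToBeginning(List.of(partition));

      // Replay until the end of the journal; records sharing a key arrive in order.
      final long endOffset = consumer.endOffsets(List.of(partition)).get(partition);
      while (consumer.position(partition) < endOffset) {
        final ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
          state.put(record.key(), record.value());
        }
      }
    }
    System.out.println("Rebuilt state with " + state.size() + " entries");
  }
}
```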
SEDA pipelines