Chapter 6 - Big Data Architecture Part 1
Introduction
The non-stop growth of data, the frantic release of
new electronic devices, and the trend toward
data-driven decision-making in companies are fueling
a constant demand for more efficient Big Data
processing systems.
• Lambda Architecture
• The lambda architecture is an approach to big data
processing that aims to achieve low-latency updates
while maintaining the highest possible accuracy.
• It is divided into three layers. The first, the “batch
layer”, is composed of a distributed file system that
stores the entirety of the collected data.
• The same layer also holds a set of predefined
functions that are run over the dataset to produce
what are called batch views. These views are stored
in a database that constitutes the “serving layer”,
from which the user can query them interactively.
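The relationship between the batch layer and the serving layer can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the in-memory list `master_dataset` stands in for the distributed file system, `page_view_counts` is a hypothetical predefined function, and the dictionary `serving_layer` stands in for the serving database.

```python
from collections import Counter

# Hypothetical master dataset: the complete, append-only set of collected events.
# In a real system this would live in a distributed file system.
master_dataset = [
    {"user": "alice", "page": "/home"},
    {"user": "bob", "page": "/home"},
    {"user": "alice", "page": "/docs"},
]


def page_view_counts(events):
    """Hypothetical predefined batch function: recompute counts from scratch."""
    return Counter(event["page"] for event in events)


# Stand-in for the serving layer: a store holding the precomputed batch view.
serving_layer = {"batch_view": dict(page_view_counts(master_dataset))}


def query_batch_view(page):
    """Interactive query against the stored batch view."""
    return serving_layer["batch_view"].get(page, 0)
```

Because the batch function always runs over the entire dataset, its result is as accurate as the stored data allows, but it is only as fresh as the last batch run.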
Types of Big Data Architecture
• Lambda Architecture
• The third layer, called the “speed layer”, computes
incremental functions on the new data as it
arrives in the system.
• It processes only the data generated between two
consecutive batch view recomputations, and it
produces real-time views which are also stored
in the serving layer. The different views are
queried together to obtain the most accurate
possible results.
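As a rough sketch of how the speed layer complements the batch view, the example below updates a real-time view incrementally and merges both views at query time. The names `batch_view`, `realtime_view`, `on_new_event`, `query`, and `on_batch_recomputed` are hypothetical, and the sketch assumes a simple additive metric (page-view counts) so that merging is a plain sum.

```python
from collections import Counter

# Assumed state: the batch view produced by the most recent batch run.
batch_view = {"/home": 2, "/docs": 1}

# Real-time view: covers only events that arrived after that batch run.
realtime_view = Counter()


def on_new_event(event):
    """Speed layer: apply an incremental update for one newly arrived event."""
    realtime_view[event["page"]] += 1


def query(page):
    """Query both views together; for counts, merging is a simple sum."""
    return batch_view.get(page, 0) + realtime_view[page]


def on_batch_recomputed(new_batch_view):
    """Swap in a fresh batch view and drop the real-time data it now covers."""
    batch_view.clear()
    batch_view.update(new_batch_view)
    realtime_view.clear()


on_new_event({"page": "/home"})
on_new_event({"page": "/pricing"})
```

Discarding the real-time view after each recomputation assumes the new batch view already includes every event the speed layer had seen; a real system would need to track that boundary explicitly.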
• Speed layer. The speed layer can be implemented using real-time
processing tools such as Storm or S4. Spark Streaming can also be used,
although it treats data in micro-batches rather than as true streams. The
advantage is that the Spark code can be reused in the batch layer.
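The code-reuse advantage of micro-batching can be sketched in plain Python, without Spark itself. This is only an illustration of the idea: `count_pages` and `micro_batches` are hypothetical helpers, and the stream is cut by element count for simplicity, whereas Spark Streaming groups records by time interval.

```python
from collections import Counter
from itertools import islice


def count_pages(events):
    """Shared logic: works on the full dataset or on a single micro-batch."""
    return Counter(event["page"] for event in events)


def micro_batches(stream, batch_size):
    """Cut a stream into small lists (by count here, by time in Spark)."""
    iterator = iter(stream)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


# Hypothetical incoming events for the speed layer.
incoming = [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}]

realtime_view = Counter()
for chunk in micro_batches(incoming, batch_size=2):
    # The same function used for batch processing is applied per micro-batch.
    realtime_view.update(count_pages(chunk))
```

Since each micro-batch is just a small dataset, the function written for the batch layer applies unchanged, at the cost of a latency bounded below by the batch interval.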
• Kappa Architecture