
Batch Layer

Stores the raw, immutable data (e.g., in a data lake or a distributed file system
such as HDFS).
Processes the data in bulk at regular intervals using batch jobs.
Produces a batch view, which contains precomputed results for accurate querying.
Tools: Hadoop, Apache Spark, Azure Data Lake, etc.
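As a rough illustration of the batch layer, the sketch below keeps an append-only master dataset and recomputes a batch view from all of it. The Event schema and the compute_batch_view function are hypothetical names chosen for this example; a real batch job would run on an engine such as Spark over files in a data lake.

```python
from collections import Counter
from typing import NamedTuple


class Event(NamedTuple):
    """One immutable raw record (hypothetical schema for illustration)."""
    timestamp: int  # seconds since some epoch
    user_id: str
    action: str     # e.g. "like", "share", "comment"


def compute_batch_view(master_dataset):
    """Recompute the view from ALL raw events, as a periodic batch job would."""
    view = Counter()
    for event in master_dataset:
        view[event.action] += 1
    return dict(view)


# The master dataset is append-only: events are added, never updated in place.
master_dataset = (
    Event(100, "u1", "like"),
    Event(160, "u2", "share"),
    Event(220, "u1", "like"),
)

batch_view = compute_batch_view(master_dataset)
print(batch_view)  # {'like': 2, 'share': 1}
```

Because the view is always derived from the raw events, it can be thrown away and rebuilt whenever the processing logic changes.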
Speed Layer

Processes data in real time as it arrives (e.g., events, transactions).
Provides low-latency results that may be approximate.
Complements the batch layer by covering only the most recent data.
Tools: Apache Kafka, Apache Flink, Azure Event Hubs, etc.
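A minimal sketch of the speed layer, assuming it only needs to count events that arrived after the last completed batch job. The SpeedLayer class, its batch_cutoff parameter, and the advance_cutoff method are hypothetical names for this example, not part of any of the tools listed above.

```python
from collections import Counter


class SpeedLayer:
    """Keeps an incrementally updated view of events newer than the batch cutoff."""

    def __init__(self, batch_cutoff):
        self.batch_cutoff = batch_cutoff  # timestamp covered by the last batch job
        self.realtime_view = Counter()
        self._recent = []                 # (timestamp, action), kept for trimming

    def on_event(self, timestamp, action):
        """Update the view as each event arrives, with no full recomputation."""
        if timestamp <= self.batch_cutoff:
            return  # already covered by the batch view
        self._recent.append((timestamp, action))
        self.realtime_view[action] += 1

    def advance_cutoff(self, new_cutoff):
        """After a batch job finishes, drop the events it has absorbed."""
        self.batch_cutoff = new_cutoff
        self._recent = [(t, a) for t, a in self._recent if t > new_cutoff]
        self.realtime_view = Counter(a for _, a in self._recent)


speed = SpeedLayer(batch_cutoff=220)
speed.on_event(200, "like")     # ignored: the batch view already has it
speed.on_event(230, "like")
speed.on_event(240, "comment")
print(dict(speed.realtime_view))  # {'like': 1, 'comment': 1}

speed.advance_cutoff(235)         # a newer batch job now covers up to t=235
print(dict(speed.realtime_view))  # {'comment': 1}
```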
Serving Layer

Combines the batch and real-time outputs to provide a unified, queryable view of
the data.
Delivers results to end-users or applications via APIs or dashboards.
Tools: Databases (e.g., Cassandra, Elasticsearch), Power BI, etc.
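One way to picture the serving layer is as a function that merges the two views at query time. The sketch below assumes the metric is additive (a count) and that the batch and real-time views cover disjoint time ranges; query and merged_view are hypothetical helper names.

```python
def query(batch_view, realtime_view, key):
    """Answer one lookup by adding the batch result and the real-time delta."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


def merged_view(batch_view, realtime_view):
    """Build the unified view over every key present in either layer."""
    keys = set(batch_view) | set(realtime_view)
    return {k: query(batch_view, realtime_view, k) for k in sorted(keys)}


batch_view = {"like": 2, "share": 1}        # accurate, but as of the last batch run
realtime_view = {"like": 1, "comment": 1}   # events that arrived since then

print(query(batch_view, realtime_view, "like"))  # 3
print(merged_view(batch_view, realtime_view))    # {'comment': 1, 'like': 3, 'share': 1}
```

Simple addition only works for metrics that can be summed; something like a distinct-user count would need a different merge step.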
How it Works:
Data Ingestion: Raw data flows into both the batch and speed layers simultaneously.
Processing:
The batch layer processes the entire dataset at regular intervals to ensure
accuracy.
The speed layer processes incoming data in real time for low-latency responses.
Serving:
The serving layer answers queries by merging both outputs: the batch view supplies
accurate results up to the last completed batch job, and the real-time view fills in
the data that has arrived since then.
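The flow above can be sketched end to end in a few lines. This is a toy, single-process model under simplifying assumptions: LambdaPipeline is a hypothetical class, events are plain action strings, and no events arrive while the batch job is running (a real system has to handle that overlap).

```python
from collections import Counter


class LambdaPipeline:
    """Toy model of ingestion, batch processing, speed processing, and serving."""

    def __init__(self):
        self.master_dataset = []        # batch layer storage, append-only
        self.batch_view = Counter()     # result of the last batch run
        self.realtime_view = Counter()  # speed layer: events since the last run

    def ingest(self, action):
        """Ingestion: every event goes to both layers."""
        self.master_dataset.append(action)
        self.realtime_view[action] += 1

    def run_batch_job(self):
        """Batch: recompute over the entire dataset, then reset the speed layer."""
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view.clear()

    def query(self, action):
        """Serving: batch result plus whatever arrived after the last batch run."""
        return self.batch_view[action] + self.realtime_view[action]


pipeline = LambdaPipeline()
for action in ["like", "like", "share"]:
    pipeline.ingest(action)

before = pipeline.query("like")  # served entirely from the speed layer
pipeline.run_batch_job()
after = pipeline.query("like")   # now served entirely from the batch view
print(before, after)  # 2 2
```

The query result is the same before and after the batch run; only the layer that supplies it changes.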
Example: Social Media Analytics
Imagine a social media platform tracking user interactions like likes, shares, and
comments.

Batch Layer:
Historical data of all user interactions is stored in a data lake and processed
nightly to generate accurate metrics like monthly active users (MAU) or engagement
trends.

Speed Layer:
Real-time interactions are processed as they happen to display the latest trending
topics or live user counts.

Serving Layer:
A dashboard shows a combination of real-time stats (current active users, live
trends) and historical data (engagement over the last month).
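The two kinds of metric in this example can be sketched as follows. The function monthly_active_users and the class LiveActiveUsers are hypothetical names; events are assumed to be (timestamp, user_id, action) tuples that arrive in timestamp order, and the 60-second window is an arbitrary choice for illustration.

```python
from collections import deque


def monthly_active_users(events, month_start, month_end):
    """Batch-style metric: distinct users with any interaction in [start, end)."""
    return len({user for ts, user, _ in events if month_start <= ts < month_end})


class LiveActiveUsers:
    """Speed-style metric: distinct users seen in the last `window` seconds."""

    def __init__(self, window):
        self.window = window
        self._events = deque()  # (timestamp, user_id), in arrival order

    def on_event(self, ts, user):
        self._events.append((ts, user))

    def count(self, now):
        # Evict events that have fallen out of the sliding window.
        while self._events and self._events[0][0] <= now - self.window:
            self._events.popleft()
        return len({user for _, user in self._events})


events = [
    (0, "u1", "like"),
    (50, "u2", "share"),
    (400, "u1", "comment"),
    (420, "u3", "like"),
]

mau = monthly_active_users(events, month_start=0, month_end=1000)
print(mau)  # 3

live = LiveActiveUsers(window=60)
for ts, user, _ in events:
    live.on_event(ts, user)
print(live.count(now=430))  # 2
```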

Underlying Architecture
Data Sources: Events, logs, sensors, transactions, etc.
Ingestion Layer: Tools like Apache Kafka, Azure Event Hubs, or Amazon Kinesis bring
data into the system.
Batch Layer Storage: Data is stored in distributed file systems (HDFS, Azure Data
Lake) for processing.
Batch Layer Processing: Engines like Apache Spark or Hadoop process the data in
large-scale jobs.
Speed Layer Processing: Stream processing tools (Flink, Storm) handle real-time
events.
Serving Layer: Combines and serves data using databases or visualization tools
(e.g., Power BI, Tableau).
Benefits of Lambda Architecture
Scalability: Handles vast amounts of data.
Fault Tolerance: Because the raw data is immutable and retained, batch views can be
recomputed from scratch after a failure or a bug in the processing logic.
Flexibility: Can process both real-time and historical data.
Limitations
Complexity: Maintaining separate batch and speed layers requires more effort.
Data Duplication: Raw data is processed in both layers, so compute work and often the
processing logic itself are duplicated.
Latency in Batch Layer: Accurate batch results are delayed until the job completes.