
Big Data Ecosystem & Architecture

Lecture 4
Outline

▪ The Big Data Ecosystem and the different components that exist in this ecosystem.

▪ A very generic Big Data architecture.

▪ Big Data architecture patterns | Lambda vs Kappa architecture.
Big Data Ecosystem

▪ The Big Data Ecosystem refers to the entire environment of tools, technologies, frameworks, and processes that work together to manage and analyse big data.

▪ It is a broad concept that includes not only the architecture but also the people, processes, and business use cases surrounding big data.
Big Data Ecosystem Overview
Big Data Architecture

“Big Data architecture is the logical and/or physical layout/structure of how Big Data will be stored, accessed and managed within a Big Data or IT environment” … Techopedia

▪ Logically defines how the Big Data solution will work, the core components (hardware, database, software, storage) used, the flow of information, security and more.

▪ Designing a big data architecture is a complex and challenging process due to the following:
• The characteristics of big data.
• The rapid pace of new technological innovations.
• Competing products at lower costs in the market.

“Big Data Analytics”, Ch.01 L03: Introduction To ... Big Data Analytics, Raj Kamal and Preeti Saxena, © McGraw-Hill Higher Edu. India
A Very Generic Big Data Architecture

[Figure: a very generic big data architecture - see https://www.youtube.com/watch?v=rvqCqK2Lpjg and “Big Data Analytics”, Ch.01 L03, Raj Kamal and Preeti Saxena, © McGraw-Hill Higher Edu. India]

▪ The processing workflow can be divided into three layers:
• Data sources;
• Data management (integration, storage and consumption);
• Data analytics, business intelligence (BI) and knowledge discovery (KD).

▪ This division allows us to discuss big data topics from different perspectives:
• For computer scientists and engineers - data storage and management, communication, and computation.
• For data scientists and statisticians - developing machine learning models to extract usable information from datasets that are too large and complex for traditional methods.
• From an organizational viewpoint - business analysts are expected to select and deploy analytics services.
Data Source

▪ The real power of big data is the ability to ingest huge volumes of data from different sources.

▪ Any type of data can be acquired and stored.

▪ The data sources layer is composed of both private (internal) and public (external) data sources.

▪ The most challenging task is to capture the heterogeneous datasets coming from various service providers.

• XML and JSON are the de facto formats for web and mobile applications due to their ease of integration into browser and server technologies that support JavaScript (see the sketch after this list).

• Linked Data and Semantic Web technologies are also used for publishing and interlinking structured data.
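As a minimal illustration of ingesting JSON from a public (external) source, the sketch below pulls records over HTTP; the endpoint URL and field names are hypothetical placeholders, not part of the lecture material.

```python
import requests  # HTTP client for pulling data from external sources

# Hypothetical external JSON API endpoint (placeholder URL).
SOURCE_URL = "https://api.example.com/v1/events"

def fetch_events():
    """Pull a batch of JSON records from an external data source."""
    response = requests.get(SOURCE_URL, params={"limit": 100}, timeout=10)
    response.raise_for_status()   # fail fast on HTTP errors
    return response.json()        # parse the JSON payload into Python objects

if __name__ == "__main__":
    for event in fetch_events():
        print(event)              # downstream: hand off to the ingestion layer
```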


Ingestion Layer

▪ Data ingestion is the process of obtaining and importing data from data sources for immediate use or storage.

▪ Data ingestion can be classified into two types (a small sketch contrasting the two follows below):

• Batch - Large sets of data are acquired and supplied in batches. Data collection might be conditionally triggered, scheduled, or ad hoc.

• Streaming - The continuous flow of data. This is required for real-time data analysis. It finds and retrieves data as it is generated. Because it is always watching for changes in data pools, it requires more resources.
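A minimal sketch of the two ingestion styles, assuming a local CSV file as the batch source and a generator standing in for a continuous stream; the file name and record shape are illustrative only.

```python
import csv
import random
import time

def batch_ingest(path):
    """Batch: read a complete, already-collected dataset in one pass."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))   # the whole batch is loaded at once

def stream_ingest():
    """Streaming: yield records continuously as they are generated."""
    while True:
        yield {"sensor": "s1", "value": random.random()}  # stand-in for a live feed
        time.sleep(1)                                     # new data arrives over time

# Batch: process everything, then stop.
# records = batch_ingest("events.csv")

# Streaming: process each record as it arrives.
# for record in stream_ingest():
#     handle(record)
```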
Data Ingestion - Data Access Connectors

▪ Data access connectors are tools and frameworks for extracting and ingesting data from various sources into the big data storage.

▪ These connectors can include both wired and wireless connections.

▪ The data ingestion mechanism can be either a push or a pull mechanism.

▪ The choice of the specific tool or framework for data ingestion is driven by the data consumer:
• If the consumer has the capability (or requirement) to pull data, publish-subscribe messaging frameworks or messaging queues, which allow the consumers to pull the data, can be used. The data producers push data to a messaging framework or a queue from which the consumers can pull the data.
• An alternative design is the push approach, where the data sources first push data to the framework and the framework then pushes the data to the data sinks.
Data Ingestion - Data Access Connectors

▪ Messaging Queues: useful for push-pull messaging, where the producers push data to the queues and the consumers pull the data from the queues. The producers and consumers do not need to be aware of each other.
▪ e.g., RabbitMQ, ZeroMQ, RestMQ and Amazon SQS.
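A minimal push-pull sketch using RabbitMQ via the pika client, assuming a broker running on localhost; the queue name is illustrative.

```python
import pika  # RabbitMQ client (pip install pika)

# Assumes a RabbitMQ broker on localhost; "ingest" is an illustrative queue name.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="ingest")

# Producer pushes a message to the queue...
channel.basic_publish(exchange="", routing_key="ingest", body=b'{"id": 1}')

# ...and a consumer pulls it, without either side knowing the other.
method, header, body = channel.basic_get(queue="ingest", auto_ack=True)
if method is not None:
    print("pulled:", body)

connection.close()
```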
Data Ingestion - Data Access Connectors

▪ Publish-Subscribe Messaging: a communication model that involves publishers, brokers and consumers. Publishers are the source of data; they send the data to topics which are managed by the broker.
▪ Examples of publish-subscribe messaging frameworks include Apache Kafka and Amazon Kinesis.
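A minimal publish-subscribe sketch with Apache Kafka using the kafka-python client, assuming a broker at localhost:9092; the topic name is illustrative.

```python
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

# Publisher: sends records to a topic managed by the Kafka broker.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("sensor-readings", b'{"sensor": "s1", "value": 0.42}')
producer.flush()  # make sure the message actually reaches the broker

# Subscriber: pulls records from the same topic, decoupled from the publisher.
consumer = KafkaConsumer(
    "sensor-readings",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",   # start from the beginning of the topic
    consumer_timeout_ms=5000,       # stop iterating if no new messages arrive
)
for message in consumer:
    print("received:", message.value)
```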
Data Ingestion - Data Access Connectors

• Source-Sink Connectors: allow collecting, aggregating and moving data from various sources (such as server logs, databases, social media, streaming sensor data from Internet of Things devices and other sources) into a centralized data store (such as a distributed file system).

• Database Connectors: used for importing data from RDBMS into big data storage and analytics frameworks, e.g., Apache Sqoop.

• Custom Connectors: can be built based on the source of the data and the data collection requirements. Examples include custom connectors for collecting data from social networks, for NoSQL databases, and for the Internet of Things (IoT), e.g., REST, WebSocket and MQTT, as well as AWS IoT and Azure IoT Hub. A small sketch of an MQTT-based IoT connector follows below.
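As one hedged example of a custom IoT connector, the sketch below subscribes to sensor topics over MQTT using the paho-mqtt client (1.x callback API assumed); the broker host and topic filter are illustrative.

```python
import paho.mqtt.client as mqtt  # pip install paho-mqtt (1.x API assumed here)

def on_message(client, userdata, msg):
    """Forward each arriving sensor reading to the big data storage layer."""
    print(f"{msg.topic}: {msg.payload.decode()}")  # stand-in for a write to storage

client = mqtt.Client()
client.on_message = on_message

# Illustrative broker host and topic filter.
client.connect("mqtt.example.com", 1883, keepalive=60)
client.subscribe("sensors/#")   # '#' matches every sensor topic under 'sensors/'

client.loop_forever()           # block and ingest messages as they are published
```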
Storage Layer

▪ Data storage includes distributed filesystems (e.g., HDFS) and non-relational (NoSQL) databases, which store the data collected from the raw data sources using the data access connectors.

▪ The Hadoop Distributed File System (HDFS) is a distributed file system that runs on large clusters and provides high-throughput access to data.

▪ Data stored in HDFS can be analysed with various big data analytics frameworks built on top of HDFS.

▪ For certain analytics applications, it is preferable to store data in a NoSQL database such as HBase.
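A minimal sketch of landing ingested records in HDFS over WebHDFS, using the third-party hdfs Python package; the NameNode address, user, and paths are assumptions (port 9870 is the Hadoop 3 default WebHDFS HTTP port).

```python
from hdfs import InsecureClient  # pip install hdfs (WebHDFS client)

# Assumed NameNode WebHDFS endpoint and user.
client = InsecureClient("http://namenode.example.com:9870", user="hadoop")

# Write a newline-delimited JSON batch into the raw zone of the data lake.
records = b'{"sensor": "s1", "value": 0.42}\n{"sensor": "s2", "value": 0.17}\n'
client.write("/data/raw/sensors/batch-0001.json", data=records, overwrite=True)

# Read it back, e.g., for a downstream analytics framework to consume.
with client.read("/data/raw/sensors/batch-0001.json") as reader:
    print(reader.read().decode())
```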
Analysis Layer

▪ Data analytics refers to technologies that are grounded mostly in data mining and statistical analysis for business insights.
▪ To draw insights from the data, this layer pulls data from the data storage layer or directly from the data source.
▪ The selection of an appropriate processing model and analytical solution is a challenging problem and depends on the business issues of the targeted domain.

[Figure: Predict the future, understand the past - the four types of data analysis]
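A minimal descriptive-analytics sketch using PySpark, assuming the HDFS path and record fields from the storage example above; all names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sensor-analytics").getOrCreate()

# Pull raw records from the storage layer (illustrative HDFS path and schema).
df = spark.read.json("hdfs://namenode.example.com/data/raw/sensors/")

# Descriptive analytics: summarize past behaviour per sensor.
summary = df.groupBy("sensor").agg(
    F.count("*").alias("readings"),
    F.avg("value").alias("avg_value"),
)
summary.show()

spark.stop()
```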
Data Consumption Layer

▪ The data consumption layer is where the processed and analysed data is made available to end-users, applications, or systems for decision-making, reporting, or further action.

▪ This layer is the final stage in the data pipeline, where the value of the data is realized by stakeholders.

▪ Tools such as dashboards, reporting tools, and business intelligence (BI) platforms (e.g., Tableau, Power BI, Qlik) are used to visualize and interact with the data. In addition, custom applications or APIs may also consume data for specific business needs.
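As a hedged sketch of the "custom applications or APIs" option, the minimal Flask service below exposes precomputed results to consumers; the endpoint and the in-memory results dictionary are stand-ins for a real serving store.

```python
from flask import Flask, jsonify  # pip install flask

app = Flask(__name__)

# Stand-in for a serving store populated by the analysis layer.
ANALYTICS_RESULTS = {
    "s1": {"readings": 1440, "avg_value": 0.42},
    "s2": {"readings": 1391, "avg_value": 0.17},
}

@app.route("/api/sensors/<sensor_id>/summary")
def sensor_summary(sensor_id):
    """Serve an analysed summary to dashboards or downstream applications."""
    result = ANALYTICS_RESULTS.get(sensor_id)
    if result is None:
        return jsonify({"error": "unknown sensor"}), 404
    return jsonify(result)

if __name__ == "__main__":
    app.run(port=5000)
```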
Governance Layer

▪ Strong guidelines and processes are required to monitor, structure, store, and secure the data from the time it enters the enterprise until it is processed, stored, analysed, and purged or archived.

▪ Governance for big data includes:
• Managing high volumes of data in a variety of formats.
• Continuously training and managing the statistical models required to pre-process unstructured data and analytics.
• Setting policy and compliance regulations for external data regarding its retention and usage.
• Defining the data archiving and purging policies.
• Creating the policy for how data can be replicated across various systems.
• Setting data encryption policies.
[Figure: processing patterns - batch processing and real-time processing, each shown with and without a dedicated serving layer]
Big Data Architecture Patterns | Lambda Architecture

▪ Lambda architecture is a data processing framework that aims to provide a unified and fault-tolerant approach to big data processing.

▪ The architecture is designed to handle both batch and real-time data processing, providing a comprehensive solution for large-scale data analysis:
• The ability to process data at high speed in a streaming context is necessary for operational needs, such as transaction processing and real-time reporting.
• Batch processing, involving massive amounts of data and the related correlation and aggregation, is important for business reporting.

“What is Lambda Architecture”, Asim Zahid, Mar 25, 2023
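A toy sketch of the Lambda idea, assuming a periodically rebuilt batch view plus a real-time speed view whose counts are merged at query time; all names and numbers are illustrative.

```python
# Batch layer: a precomputed view over the complete (historical) dataset.
batch_view = {"page_a": 10_000, "page_b": 4_200}   # e.g., rebuilt hourly

# Speed layer: incremental counts for events that arrived after the last batch run.
speed_view = {"page_a": 37, "page_c": 5}

def query(page: str) -> int:
    """Serving layer: merge the batch view with the real-time delta."""
    return batch_view.get(page, 0) + speed_view.get(page, 0)

print(query("page_a"))  # 10037 - historical total plus fresh events
print(query("page_c"))  # 5     - seen only by the speed layer so far
```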
Pattern 1(a) – Lambda Architecture | Batch-Only Serving Layer

[Figure - see https://www.youtube.com/watch?v=waDJcSCXz_Y]
Pattern 1(b) – Lambda Architecture | Dedicated Serving Layer

[Figure - see https://www.youtube.com/watch?v=waDJcSCXz_Y]
Pattern 1(c) – Lambda Architecture | Common Serving Layer

[Figure - see https://www.youtube.com/watch?v=waDJcSCXz_Y]
Big Data Architecture Patterns | Kappa Architecture

▪ A data processing architecture used for streaming data.

▪ It is based on a streaming architecture in which an incoming series of data is first stored in a messaging engine like Apache Kafka.

▪ From there, a stream processing engine reads the data and transforms it into an analysable format.

▪ The Kappa architecture supports (near) real-time analytics, since the data is read and transformed immediately after it is inserted into the messaging engine.

▪ The main difference in the Kappa architecture is that all data is treated as if it were a stream, so the stream processing engine acts as the sole data transformation engine. A minimal sketch of this single-path design follows below.
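A minimal single-path sketch of the Kappa idea with kafka-python, assuming a broker at localhost:9092: every record enters as a stream event, is transformed by one stream processor, and is re-published for consumers; topic names and the transformation are illustrative.

```python
import json

from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

consumer = KafkaConsumer("raw-events", bootstrap_servers="localhost:9092")
producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Single transformation path: every record is handled as a stream event,
# whether it is historical (replayed from the log) or arriving right now.
for message in consumer:
    event = json.loads(message.value)
    event["value_scaled"] = event["value"] * 100  # illustrative transformation
    producer.send("analysable-events", json.dumps(event).encode())
```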
Pattern 2 – Kappa Architecture

[Figure - see https://www.youtube.com/watch?v=waDJcSCXz_Y]
Lambda Architecture vs Kappa Architecture

[Figure - see https://www.youtube.com/watch?v=waDJcSCXz_Y]
