Unit - IV Data Analytics Frameworks: Centralized and Distributed Functional Architectures of Relational Systems
Big Data refers to datasets that are so large or complex that traditional data processing
applications are inadequate to deal with them. The challenges include data capture, storage,
analysis, data curation, search, sharing, transfer, visualization, querying, updating, and
information privacy.
To manage such extensive data volumes effectively, specialized Big Data Architectures are
developed. These architectures address the three V's of big data—Volume (the amount of
data), Velocity (the speed at which data is generated), and Variety (the different types of
data)—and sometimes also include Veracity (data accuracy) and Value (data usefulness).
Big Data Architectures can be broadly divided into Centralized and Distributed systems,
especially when dealing with relational data management.
Relational Database Management Systems (RDBMS) are databases that store data in a
structured format, using rows and columns (i.e., tables). They use Structured Query Language
(SQL) to perform queries and manage data. Examples of traditional RDBMS include Oracle
Database, MySQL, Microsoft SQL Server, and PostgreSQL.
Key Components of an RDBMS:
1. Tables (Relations): The fundamental structure in which data is stored. Each table is a
collection of related data entries organized by rows and columns.
2. Schema: Defines the structure of the database, including the tables, fields, data types,
and relationships between tables.
3. Primary Key: A unique identifier for each record in a table. Ensures that each entry can
be uniquely distinguished.
4. Foreign Key: A field in one table that refers to the Primary Key of another table,
establishing a relationship between the two tables.
5. Indexes: Data structures that improve the speed of data retrieval operations at the cost
of additional space and processing time for writes.
Example Scenario:
● Students Table: Contains student ID (Primary Key), name, date of birth, and other
personal information.
● Courses Table: Contains course ID (Primary Key), course name, and description.
● Enrollments Table: Manages the many-to-many relationship between students and
courses, using student ID and course ID as Foreign Keys.
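As a minimal sketch of this scenario, the snippet below builds the three tables with Python's built-in sqlite3 module; the column names and the inserted rows are illustrative assumptions, not part of any particular system:

import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign-key constraints

conn.executescript("""
CREATE TABLE Students (
    student_id    INTEGER PRIMARY KEY,   -- Primary Key
    name          TEXT NOT NULL,
    date_of_birth TEXT
);
CREATE TABLE Courses (
    course_id   INTEGER PRIMARY KEY,     -- Primary Key
    course_name TEXT NOT NULL,
    description TEXT
);
CREATE TABLE Enrollments (               -- many-to-many relationship
    student_id INTEGER REFERENCES Students(student_id),  -- Foreign Key
    course_id  INTEGER REFERENCES Courses(course_id),    -- Foreign Key
    PRIMARY KEY (student_id, course_id)
);
-- Index to speed up lookups by course (extra space and write cost)
CREATE INDEX idx_enroll_course ON Enrollments(course_id);
""")

conn.execute("INSERT INTO Students VALUES (1, 'Asha', '2004-05-01')")
conn.execute("INSERT INTO Courses VALUES (101, 'Databases', 'Intro to RDBMS')")
conn.execute("INSERT INTO Enrollments VALUES (1, 101)")

# SQL query joining the three tables
for row in conn.execute("""
    SELECT s.name, c.course_name
    FROM Enrollments e
    JOIN Students s ON s.student_id = e.student_id
    JOIN Courses  c ON c.course_id  = e.course_id
"""):
    print(row)  # ('Asha', 'Databases')

The composite primary key on Enrollments prevents duplicate enrollments, and the index speeds up lookups by course at the cost of extra work on inserts.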
1. Centralized Database:
A centralized database is stored and maintained at a single location, and all users access it from that one site.
Advantages:
● Since all data is stored at a single location, it is easier to access and coordinate the data.
● A centralized database has very minimal data redundancy, since all data is kept in one place.
● It is generally cheaper to set up and maintain than a distributed database.
Disadvantages:
● The single site is a single point of failure: if it becomes unavailable, no data can be accessed.
● As the number of users and the volume of queries grow, the central system can become a performance bottleneck.
2. Distributed Database:
A distributed database consists of multiple databases that are connected to each other and
spread across different physical locations. The data stored at each location can be managed
independently of the other locations, and the databases at different sites communicate with
one another over a computer network.
Advantages:
● This database can be easily expanded as data is already spread across different
physical locations.
● The distributed database can easily be accessed from different networks.
● This database is more secure in comparison to a centralized database.
Disadvantages:
● This database is very costly and is difficult to maintain because of its complexity.
Distributed Functional Architecture
Definition
A Distributed Functional Architecture refers to a system where data storage, processing, and
management are spread across multiple interconnected systems or nodes. This architecture is
designed to handle large-scale data processing by leveraging distributed computing and
storage.
Components
Detailed Workflow
1. Data Ingestion:
○ Data is ingested in parallel across multiple nodes. Data pipelines are designed to
feed into distributed storage and processing systems.
○ Example: Sensor data from thousands of IoT devices is ingested simultaneously
into a distributed database.
2. Data Storage:
○ Data is partitioned across multiple nodes using sharding strategies, and replication ensures that data remains available even if one node fails (a short sharding sketch follows this list).
○ Example: A large-scale e-commerce platform stores customer and transaction
data across different data centers globally.
3. Data Processing:
○ Distributed computing frameworks like Spark process data in parallel, using the
power of multiple nodes to handle large datasets.
○ Example: A real-time recommendation system processes user behavior data
across a distributed cluster to generate recommendations.
4. Data Access:
○ Users and applications access data through distributed queries that retrieve and
aggregate data from multiple nodes.
○ Example: A BI tool queries a distributed database to generate a report combining
data from multiple sources.
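To make the sharding and replication idea in step 2 concrete, here is a small Python sketch; the node names, the hash-based placement rule, and the replication factor are assumptions chosen only for illustration:

import hashlib

NODES = ["node-a", "node-b", "node-c"]   # hypothetical storage nodes
REPLICATION_FACTOR = 2                    # each record is kept on 2 nodes

def shard_for(key: str) -> int:
    """Hash the record key to pick a primary shard (hash partitioning)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % len(NODES)

def placements(key: str) -> list:
    """Primary node plus the next node(s) in the ring as replicas."""
    start = shard_for(key)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

# Example: customer records are spread across nodes; if one node fails,
# each record is still available on its replica node.
for customer_id in ["C1001", "C1002", "C1003"]:
    print(customer_id, "->", placements(customer_id))

In a real distributed database the placement logic is far more sophisticated (consistent hashing, rebalancing, consensus on writes), but the basic effect is the same: each record has a home shard plus replicas on other nodes.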
Advantages
● Scalability: Easily scales horizontally by adding more nodes, making it suitable for very
large datasets and high transaction volumes.
● Fault Tolerance: If one node fails, others can take over, minimizing downtime.
● Performance: Distributed processing allows for handling large workloads more
efficiently, reducing processing time.
● Global Accessibility: Data can be stored in geographically distributed nodes, ensuring
faster access for users worldwide.
Challenges
Examples
● Google Spanner: A globally distributed database system that supports both SQL
queries and strong consistency.
● Apache Cassandra: A distributed NoSQL database designed for high availability and
scalability, often used by large-scale enterprises like Netflix.
● Amazon DynamoDB: A key-value and document database that delivers single-digit
millisecond performance at any scale.
Data Warehousing Architectures
Data Warehousing is the process of collecting, storing, and managing large volumes of data
from various sources within an organization to support business intelligence (BI) activities,
including reporting, data analysis, and decision-making. A Data Warehouse is a central
repository of integrated data from one or more disparate sources.
Data Warehousing systems are designed to support queries and analysis rather than
transaction processing, providing a historical view of data. The architecture of a Data
Warehouse is crucial as it defines how data is stored, managed, and accessed.
Key Components of a Data Warehouse Architecture:
1. Data Sources:
○ The origin points from where data is collected. These can include databases, flat
files, online transaction processing (OLTP) systems, enterprise resource planning
(ERP) systems, and external data sources.
2. ETL (Extract, Transform, Load) Process:
○ Extract: Collecting data from various source systems.
○ Transform: Cleaning, filtering, and reformatting the data into a suitable structure
for analysis.
○ Load: Loading the transformed data into the Data Warehouse.
3. Staging Area:
○ A temporary storage area where data is processed during the ETL process. It is
used to clean, transform, and prepare data before it is loaded into the Data
Warehouse.
4. Data Warehouse:
○ The central repository where integrated, historical data is stored. It is optimized
for query performance and analysis rather than transaction processing.
5. Data Marts:
○ Subsets of the Data Warehouse that are designed for specific business lines or
departments. They contain data tailored to the needs of a particular group or
function within the organization.
6. OLAP (Online Analytical Processing) Cubes:
○ Multidimensional data structures that allow users to perform complex queries and analysis quickly. OLAP cubes pre-aggregate data for rapid querying (a short aggregation sketch follows this list).
7. Metadata:
○ Data about the data stored in the Data Warehouse. It includes information on
data sources, transformations, data structures, and relationships. Metadata is
critical for managing and navigating the Data Warehouse.
8. Query Tools:
○ Software applications that allow users to query the Data Warehouse, generate
reports, and perform data analysis. These tools often include SQL interfaces,
reporting software, and data visualization tools.
9. Business Intelligence (BI) Tools:
○ Applications used to analyze data and generate insights. BI tools include
dashboards, data mining tools, and machine learning platforms.
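As a toy illustration of the pre-aggregation behind OLAP cubes (item 6 above), the snippet below summarizes a small fact table along two dimensions with pandas; the column names and revenue figures are made up for the example:

import pandas as pd

# Toy fact table (values are illustrative)
sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "year":    [2023, 2023, 2023, 2024, 2024],
    "revenue": [100, 150, 200, 120, 80],
})

# Pre-aggregate revenue along two dimensions (region x product),
# the same idea an OLAP cube applies across many dimensions.
cube = pd.pivot_table(sales, values="revenue",
                      index="region", columns="product",
                      aggfunc="sum", fill_value=0)
print(cube)

# "Slicing" the cube: revenue for product A by region
print(cube["A"])

A real OLAP cube performs the same kind of aggregation across many dimensions and stores the results so that queries do not have to recompute them.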
Types of Data Warehouse Architectures
1. Single-Tier Architecture
Definition:
● A simple architecture where the Data Warehouse, ETL process, and BI tools are all
housed on a single system.
Characteristics:
Limitations:
Use Case:
● Small businesses or departments with limited data volume and straightforward analytics
needs.
2. Two-Tier Architecture
Definition:
● A more common architecture where the Data Warehouse and data marts are on one
layer, and the BI tools are on another.
Components:
Characteristics:
Limitations:
Use Case:
3. Three-Tier Architecture
Definition:
● The most widely used architecture, consisting of three layers: the data source layer, the
Data Warehouse layer, and the BI tool layer.
Components:
Characteristics:
Limitations:
Use Case:
4. Hybrid Architecture
Definition:
Components:
Characteristics:
Limitations:
Use Case:
● Organizations transitioning to the cloud or with a mix of legacy systems and modern BI
needs.
Data Warehousing Workflow
1. Data Extraction:
○ Data is extracted from various sources such as OLTP systems, flat files, and
external sources. This process may occur in real-time or in batch mode.
2. Data Transformation:
○ The extracted data is cleaned, normalized, and transformed to fit the schema of
the Data Warehouse. This step includes data validation, deduplication, and
conversion of data types.
3. Data Loading:
○ Transformed data is loaded into the staging area first for temporary storage and
further processing, and then into the Data Warehouse.
4. Data Storage:
○ Data is stored in the Data Warehouse in a structured format. Data marts may be
created to cater to specific departments or functions.
5. Data Aggregation and Cubes:
○ Data is aggregated and summarized into OLAP cubes to support fast,
multidimensional analysis.
6. Query and Analysis:
○ Users access the data through BI tools, querying the Data Warehouse or OLAP
cubes to generate reports, dashboards, and insights.
7. Metadata Management:
○ Metadata is maintained to track the origin, transformation, and location of data
within the system.
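A self-contained sketch of the extract-transform-load steps described in this workflow, using only the Python standard library; the inline CSV data, column names, and the in-memory SQLite "warehouse" table are assumptions made purely for illustration:

import csv, io, sqlite3

# --- Extract: read raw records from a source (a CSV string stands in
# for an OLTP export or flat file)
raw = io.StringIO("order_id,amount,country\n1, 250 ,in\n2,100,IN\n2,100,IN\n")
rows = list(csv.DictReader(raw))

# --- Transform: clean, validate, deduplicate, and convert data types
seen, cleaned = set(), []
for r in rows:
    key = r["order_id"]
    if key in seen:            # deduplication
        continue
    seen.add(key)
    cleaned.append((int(key), float(r["amount"].strip()), r["country"].strip().upper()))

# --- Load: write the transformed rows into the warehouse table
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE fact_orders (order_id INTEGER, amount REAL, country TEXT)")
dw.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", cleaned)

print(dw.execute("SELECT country, SUM(amount) FROM fact_orders GROUP BY country").fetchall())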
ETL Tools:
● Informatica PowerCenter
● Talend
● Microsoft SQL Server Integration Services (SSIS)
● Apache NiFi
Data Warehouse Databases:
● Oracle Exadata
● Amazon Redshift
● Google BigQuery
● Microsoft Azure Synapse Analytics
● SAP BW
BI Tools:
● IBM Cognos
● Microsoft Power BI
● Tableau
Data Warehousing Platforms:
● Snowflake
● Teradata
● Cloudera Data Platform
Service-Oriented Architecture (SOA)
Service-Oriented Architecture (SOA) is an architectural style in which application functionality is delivered as a collection of loosely coupled, reusable services that communicate with each other over a network.
Key Principles of SOA
1. Loose Coupling:
○ Services are designed to minimize dependencies on each other. Changes to one
service should not impact other services.
2. Interoperability:
○ Services can communicate across different platforms and technologies using
standardized protocols.
3. Reusability:
○ Services are designed to be reused across different applications and business
processes.
4. Abstraction:
○ Services encapsulate the underlying logic, exposing only the necessary interface
to consumers.
5. Autonomy:
○ Each service operates independently, with control over its own logic and data.
6. Discoverability:
○ Services are published in a registry or repository, allowing them to be easily
discovered and invoked by consumers.
7. Composability:
○ Services can be combined or orchestrated to create complex business processes
or workflows.
Components of SOA
1. Services:
○ Self-contained units of functionality that can be independently developed,
deployed, and managed. Each service provides a specific business function,
such as customer management or order processing.
2. Enterprise Service Bus (ESB):
○ A middleware component that facilitates communication between services. The
ESB handles message routing, transformation, and protocol mediation, enabling
seamless integration between services.
3. Service Registry and Repository:
○ A centralized directory where services are registered and stored. The registry
contains metadata about each service, including its location, interface, and usage
policies, enabling service discovery and governance.
4. Service Consumers:
○ Applications or systems that invoke services to perform specific tasks.
Consumers can be web applications, desktop applications, or other services.
5. Service Providers:
○ The entities responsible for implementing and hosting services. Providers define
the service interface, business logic, and data processing.
6. Service Contracts:
○ The formal agreements between service consumers and providers, specifying the
service's interface, input/output data formats, communication protocols, and
quality of service (QoS) requirements.
7. Business Process Management (BPM):
○ Tools and technologies that enable the orchestration and management of
services to create end-to-end business processes.
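The publish-discover-invoke relationship among providers, the registry, and consumers can be sketched with plain Python objects; the service name and the in-memory registry below are hypothetical stand-ins for a real registry and ESB:

from typing import Callable, Dict

# Service registry: maps a service name to its callable endpoint (stand-in
# for a real registry that would hold location, interface, and policies).
registry: Dict[str, Callable] = {}

def publish(name: str, service: Callable) -> None:
    registry[name] = service          # service provider publishes its service

def discover(name: str) -> Callable:
    return registry[name]             # service consumer discovers it by name

# Provider: a self-contained "customer management" service
def get_customer(customer_id: int) -> dict:
    return {"id": customer_id, "name": "Asha"}   # illustrative data

publish("CustomerService", get_customer)

# Consumer: knows only the contract (name in, dict out), not the implementation
customer = discover("CustomerService")(42)
print(customer)

The consumer depends only on the service name and its contract, not on how or where the provider is implemented, which is what keeps the services loosely coupled.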
SOA Architecture
1. Enterprise Service Bus (ESB)
Definition:
● The ESB acts as a communication layer that connects and integrates services within an
SOA. It provides routing, mediation, and transformation services, allowing different
services to communicate seamlessly.
Functions:
● Routes messages between services.
● Transforms message and data formats.
● Mediates between different communication protocols.
Examples of ESB:
● Apache Camel
● Mule ESB
● IBM WebSphere ESB
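As a toy sketch (not modeled on any of the products above) of the routing and transformation role an ESB plays, consider the following Python snippet; the topic name, message fields, and target service are assumptions:

# Toy message bus: routes a message to a destination service and
# transforms its payload on the way (the core mediation idea of an ESB).

def to_order_service(msg: dict) -> dict:
    # Transformation: rename fields to match the target service's contract
    return {"orderId": msg["id"], "total": msg["amount"]}

ROUTES = {"order.created": ("OrderService", to_order_service)}

def send(topic: str, msg: dict) -> None:
    service, transform = ROUTES[topic]          # routing by topic
    print(f"deliver to {service}:", transform(msg))

send("order.created", {"id": 7, "amount": 99.5})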
2. Service Registry and Repository
Definition:
● A centralized directory where services are published and stored. It includes metadata
about each service, such as its location, interface, and policies.
Functions:
Examples:
3. Service Providers and Consumers
Service Providers:
Service Consumers:
Interaction:
● Consumers discover services via the registry, invoke them through the ESB, and receive
the required functionality.
4. Service Contracts
Definition:
● A service contract is a formal specification that defines the interaction between a service
provider and a consumer.
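As a small illustration of what a contract pins down (the interface plus the input and output data formats), here is a sketch using Python type hints; the service name and fields are hypothetical:

from typing import Protocol, TypedDict

class OrderRequest(TypedDict):
    order_id: int            # input data format agreed in the contract
    customer_id: int

class OrderStatus(TypedDict):
    order_id: int            # output data format agreed in the contract
    status: str

class OrderService(Protocol):
    """Contract: any provider must expose this interface."""
    def get_status(self, request: OrderRequest) -> OrderStatus: ...

# A provider that fulfils the contract
class SimpleOrderService:
    def get_status(self, request: OrderRequest) -> OrderStatus:
        return {"order_id": request["order_id"], "status": "SHIPPED"}

svc: OrderService = SimpleOrderService()
print(svc.get_status({"order_id": 7, "customer_id": 42}))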
Components:
Importance:
Lambda Architecture
Lambda Architecture is designed to handle large-scale data processing in a way that balances
latency, throughput, and fault-tolerance. It combines batch processing for historical data and
real-time stream processing for fresh data, providing a unified view of both historical and
real-time data.
The architecture is particularly effective for scenarios where real-time data processing is
required, but the results need to be eventually consistent with batch-processed data to ensure
accuracy and completeness.
Layers of Lambda Architecture
1. Batch Layer:
○ Purpose: Handles large-scale historical data processing and generates batch
views that are accurate and complete.
○ Data Storage: Stores the master dataset, which is an immutable and
append-only dataset containing all the raw data.
○ Processing: Performs computation on the entire dataset (e.g., aggregation,
filtering) to generate batch views. This layer typically has higher latency but
guarantees accuracy.
2. Speed Layer:
○ Purpose: Handles real-time data processing to provide low-latency updates to
the system.
○ Data Storage: Stores the incoming data temporarily until it is processed.
○ Processing: Performs real-time computation on the data as it arrives, generating
real-time views or updates. This layer is designed for low latency but may
sacrifice some accuracy due to the approximate nature of real-time
computations.
3. Serving Layer:
○ Purpose: Combines the results from both the batch and speed layers to serve
queries in a unified manner.
○ Data Storage: Stores the precomputed views (from both batch and speed layers)
and indexes them for fast query performance.
○ Query Handling: Ensures that queries return the most up-to-date and accurate
data by combining batch views and real-time updates.
Workflow of Lambda Architecture
1. Data Ingestion:
○ Raw data is ingested from various sources (e.g., sensors, logs, transactions) and
stored in both the batch layer (as the master dataset) and the speed layer.
2. Batch Processing:
○ The batch layer periodically processes the entire master dataset to generate
comprehensive batch views. This process may take minutes or hours, depending
on the data volume and complexity.
3. Real-Time Processing:
○ Simultaneously, the speed layer processes incoming data in real-time, generating
real-time views that reflect the latest data. This processing is typically done in
seconds or milliseconds.
4. Serving Layer:
○ The serving layer merges the batch views with the real-time views to provide a
complete and up-to-date result for any query. The batch views offer accuracy,
while the real-time views ensure low-latency updates.
5. Query Execution:
○ When a query is made, the serving layer accesses both the batch and speed
layers to retrieve the required data, ensuring that the response is both accurate
and timely.
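A compact sketch of how the serving layer can merge a precomputed batch view with a real-time view; the page names and counts are illustrative, and in a real deployment the batch view would come from a batch framework and the real-time view from a stream processor:

from collections import Counter

# Batch view: accurate counts computed periodically over the full master dataset
batch_view = Counter({"page_home": 10_000, "page_cart": 2_500})

# Real-time view: counts for events that arrived after the last batch run
realtime_view = Counter()
for event in ["page_home", "page_cart", "page_home"]:   # fresh stream events
    realtime_view[event] += 1

def query(page: str) -> int:
    """Serving layer: combine batch accuracy with real-time freshness."""
    return batch_view[page] + realtime_view[page]

print(query("page_home"))   # 10002: batch result plus streamed updates

When the next batch run completes, the batch view absorbs the recent events and the real-time view is reset, keeping the combined answer both accurate and fresh.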
Advantages of Lambda Architecture
1. Scalability:
○ Can handle massive volumes of data due to the separation of batch and real-time
processing.
2. Fault-Tolerance:
○ The architecture is resilient to system failures, as data is stored in an immutable
format in the batch layer, allowing for recomputation if necessary.
3. Flexibility:
○ Supports various data processing use cases, from real-time analytics to
long-term data warehousing.
4. Accuracy and Completeness:
○ Batch processing ensures that historical data is processed accurately, while the
speed layer allows for real-time updates, providing a balance between accuracy
and latency.
5. Separation of Concerns:
○ By dividing processing into batch and real-time layers, each can be optimized
independently for its specific requirements.
Challenges of Lambda Architecture
1. Complexity:
○ The architecture is inherently complex, requiring the development and
maintenance of two parallel processing pipelines (batch and speed).
2. Data Consistency:
○ Ensuring consistency between the batch and speed layers can be challenging,
especially when dealing with late-arriving data or updates.
3. Latency in Batch Processing:
○ The batch layer introduces inherent latency, as it processes large volumes of
data over longer periods.
4. Resource Intensive:
○ Requires significant computational resources to manage and maintain both
processing layers and the serving layer.
5. Data Duplication:
○ Data is often stored and processed in multiple layers, leading to potential
duplication of storage and processing efforts.
Technologies Commonly Used
Batch Layer:
● Hadoop (HDFS and MapReduce): For storing the immutable master dataset and computing batch views.
● Apache Spark: For large-scale batch computation over the master dataset.
Speed Layer:
● Apache Storm, Spark Streaming, or Apache Flink: For low-latency processing of incoming data streams.
Serving Layer:
● HBase: For storing and serving batch views with low-latency read access.
● Cassandra: For distributed and scalable data storage.
● Elasticsearch: For indexing and searching across the batch and real-time views.
Use Cases of Lambda Architecture
1. Real-Time Analytics:
○ Monitoring and analyzing streaming data, such as financial transactions, social
media feeds, or sensor data, to provide real-time insights.
2. Fraud Detection:
○ Identifying fraudulent activities by combining historical data analysis with
real-time monitoring to detect anomalies.
3. Personalization Engines:
○ Delivering personalized recommendations by processing user behavior data in
real-time, while refining models with batch processing.
4. IoT Applications:
○ Managing data from a vast network of IoT devices, where real-time data is
crucial, but long-term trends and patterns also need to be analyzed.
Lambda Architecture vs. Kappa Architecture
● Lambda Architecture:
○ Uses both batch and speed layers to process data.
○ Suitable for scenarios where both historical accuracy and real-time updates are
required.
● Kappa Architecture:
○ Relies solely on stream processing, eliminating the batch layer.
○ Simplifies the architecture but may compromise on the accuracy of long-term
data processing.
Lambda Architecture vs. Traditional Data Warehousing
● Lambda Architecture:
○ Combines real-time and batch processing to offer low-latency insights.
○ Better suited for modern, large-scale data environments.
● Traditional Data Warehousing:
○ Focuses primarily on batch processing of historical data.
○ May not provide the real-time capabilities needed for modern applications.