U2 - Hub Spoke

The document discusses hub-and-spoke and bus architectures in data warehousing, highlighting their roles in managing data flow and integration. It explains the ETL (Extract, Transform, Load) process, detailing its stages and the advantages and disadvantages of using ETL in data warehousing. Overall, it emphasizes the importance of these architectures and processes for efficient data management and analysis.


Big Data Analytics Unit 2

Hub and Spoke Architecture


● A hub-and-spoke data warehouse architecture is a type of data warehouse
architecture that is composed of a central hub, which is typically a relational
database, and a number of spokes, which are usually OLAP cubes or data
marts.
● The name comes from a bicycle wheel, in which spokes radiate outward from a central hub. In the logistics industry, a hub-and-spoke distribution model is used to move inventory from a large central distribution center out to multiple fulfillment centers.
● In the hub-and-spoke architecture, the hub serves as the centralized broker, while each spoke serves as an adapter that connects an application to the hub.
● A spoke establishes a connection with an application and converts the application's data into a format that the hub understands (see the sketch after this list).
● A hub-and-spoke data model organizes data in a way that is easy to understand and use; this type of data model is often used in databases and software applications.
● The model consists of a central hub surrounded by a number of spokes, where each spoke represents a different piece of data. Because the arrangement is simple and uniform, it is easy to see how the data is organized.
● This type of architecture is often used in organizations that have a large
amount of data to warehouse.
● The hub-and-spoke architecture allows the organization to keep the data in
one central location, while still providing access to the data for reporting and
analysis.
● In cloud deployments, hub-and-spoke topologies use virtual networks to manage external connectivity and to host services shared by multiple workloads. Workloads are hosted on their own (spoke) virtual networks and linked to the central hub through virtual network peering.
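
To make the broker/adapter idea concrete, here is a minimal Python sketch of spokes acting as adapters that convert application-specific records into a common format the hub understands. All class and field names (Hub, CrmSpoke, ErpSpoke, "customer_id", etc.) are hypothetical, invented for illustration; real integration hubs are far more involved.

# Minimal hub-and-spoke sketch: each spoke adapts one application's
# records into the hub's canonical format. All names are hypothetical.

class Hub:
    """Central broker that stores records in one canonical format."""
    def __init__(self):
        self.records = []

    def receive(self, record):
        # The hub only accepts records already in canonical form:
        # {"customer_id": ..., "amount": ...}
        self.records.append(record)

class CrmSpoke:
    """Adapter for a CRM system that uses its own field names."""
    def push(self, hub, crm_row):
        hub.receive({"customer_id": crm_row["ClientRef"],
                     "amount": float(crm_row["DealValue"])})

class ErpSpoke:
    """Adapter for an ERP system with a different native layout."""
    def push(self, hub, erp_row):
        hub.receive({"customer_id": erp_row["cust_no"],
                     "amount": erp_row["total_cents"] / 100.0})

hub = Hub()
CrmSpoke().push(hub, {"ClientRef": "C-17", "DealValue": "250.00"})
ErpSpoke().push(hub, {"cust_no": "C-17", "total_cents": 9900})
print(hub.records)  # both records now share the hub's canonical schema

The point of the design is that the hub never needs to know each application's native layout; adding a new application means writing one new spoke, not changing the hub.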
Bus Architecture
• In the context of data warehousing, the term "bus architecture" typically refers to the concept of a "data bus."
• The data bus architecture is used to manage the flow of data from source systems to the data warehouse.
Here's how the data bus architecture works in a data warehouse context:
• Central Integration Point: The data bus serves as a central integration point where data from various source systems is collected and transformed before being loaded into the data warehouse. It acts as a staging area for the data used in the dimensional model.
• Hub-and-Spoke Model: The data bus architecture resembles a hub-and-spoke
model. The "hub" represents the central data integration point (the data bus), and the
"spokes" represent the source systems that feed data into the hub.
• Data Staging: Data from source systems is first extracted and loaded into the data bus. This allows for standardization, transformation, and cleansing of the data before it is further integrated into the data warehouse.
• Decoupling Data Sources: The data bus architecture decouples the data warehouse from individual source systems. This means that changes in source systems do not directly impact the data warehouse's structure. Instead, changes are managed within the data bus, and the data warehouse is updated with consistent, transformed data.
• Dimensional Modeling: Once the data is transformed within the data bus, it is loaded into the data warehouse using a dimensional modeling approach, such as a star schema or snowflake schema. This allows for efficient querying and reporting.
• Scalability and Flexibility: The data bus architecture can accommodate new source systems easily, making the data warehouse architecture scalable and flexible. New data sources can be integrated into the data bus without disrupting the existing data flow.
• Data Consistency: By applying transformations and data quality checks within the data bus, data consistency and integrity are maintained before data is loaded into the data warehouse. This helps ensure accurate reporting and analysis.
• Incremental Loading: The data bus architecture supports incremental loading of data. Only changed or new data needs to be processed and loaded into the data warehouse, reducing the processing load and improving efficiency (a minimal sketch of this follows below).
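
Here is a minimal, hypothetical sketch of incremental loading in Python: only rows whose update timestamps are newer than the last successful load get processed. The row layout, column names, and the dictionary standing in for the warehouse are all invented for illustration.

# Hypothetical incremental-load sketch: process only rows changed since
# the last successful load.

from datetime import datetime

warehouse = {}          # toy stand-in for the target table, keyed by id
last_load_time = datetime(2024, 1, 1)

source_rows = [
    {"id": 1, "amount": 100, "updated_at": datetime(2023, 12, 30)},
    {"id": 2, "amount": 250, "updated_at": datetime(2024, 2, 5)},
    {"id": 3, "amount": 75,  "updated_at": datetime(2024, 3, 1)},
]

# Select only new or changed rows (updated after the last load).
delta = [r for r in source_rows if r["updated_at"] > last_load_time]

for row in delta:
    warehouse[row["id"]] = row   # upsert: insert new rows, overwrite changed ones

last_load_time = max(r["updated_at"] for r in delta)
print(f"Loaded {len(delta)} of {len(source_rows)} rows")  # Loaded 2 of 3 rows

Processing two rows instead of three matters little here, but the same selection logic is what keeps nightly loads tractable when the source holds millions of rows.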
ETL Process in Data Warehouse

ETL stands for Extract, Transform, Load. It is a process used in data warehousing to extract data from various sources, transform it into a format suitable for loading into a data warehouse, and then load it into the warehouse.
The ETL process can be broken down into the following three stages:
• Extract data from legacy and source systems
• Cleanse the data to improve data quality and establish consistency
• Load data into a target database

1. Extract: The first stage in the ETL process is to extract data from various sources such as transactional systems, spreadsheets, and flat files. This step involves reading data from the source systems and storing it in a staging area.

2. Transform: In this stage, the extracted data is transformed into a format that is suitable for loading into the data warehouse. This may involve cleaning and validating the data, converting data types, combining data from multiple sources, and creating new data fields.

3. Load: After the data is transformed, it is loaded into the data warehouse. This step involves creating the physical data structures and loading the data into the warehouse. (A compact end-to-end sketch of all three stages follows.)
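
As a rough illustration of the three stages chained together, here is a small, self-contained Python sketch. The CSV contents, field names, and the in-memory list standing in for the warehouse are all hypothetical; a real pipeline would target an actual database, typically driven by an ETL tool or scheduler.

# Toy end-to-end ETL sketch: extract from a CSV source, transform
# (standardize, default, validate), load into a target store.

import csv, io

raw_csv = "name,country,amount\nAlice,U.S.A,100\nBob,,not_a_number\n"

def extract(text):
    # Extract: read rows from a CSV source into a staging list.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Transform: standardize country names, default missing values,
    # and drop rows whose amount fails validation.
    country_map = {"U.S.A": "USA", "United States": "USA", "America": "USA"}
    clean = []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # reject rows that fail validation
        clean.append({"name": r["name"],
                      "country": country_map.get(r["country"], r["country"]) or "UNKNOWN",
                      "amount": amount})
    return clean

def load(rows, warehouse):
    # Load: append the transformed rows to the target store.
    warehouse.extend(rows)

warehouse = []
load(transform(extract(raw_csv)), warehouse)
print(warehouse)  # [{'name': 'Alice', 'country': 'USA', 'amount': 100.0}]

Note how Bob's row is rejected during transform, before it can pollute the warehouse; this is the data-quality guarantee the stages exist to provide.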

The ETL process is an iterative process that is repeated as new data is added to the warehouse. The process is important because it ensures that the data in the data warehouse is accurate, complete, and up-to-date. It also helps to ensure that the data is in the format required for data mining and reporting.
How ETL works

Extract
During data extraction, raw data is copied or exported from source locations to a staging area. Data management teams can extract data from a variety of data sources, which can be structured or unstructured. Those sources include, but are not limited to (a minimal extraction sketch follows this list):
• SQL or NoSQL servers
• CRM and ERP systems
• Flat files
• Email
• Web pages
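
As a small illustration of pulling from heterogeneous sources into one staging area, the sketch below reads from a SQL database (an in-memory SQLite table standing in for a SQL server) and from a flat file (an in-memory CSV string). The table, columns, and file contents are hypothetical.

# Hypothetical extraction sketch: copy raw rows from two source types
# (a SQL database and a flat file) into a common staging list.

import csv, io, sqlite3

# Source 1: a SQL server (stand-in: in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, 99.5), (2, 150.0)")
sql_rows = [{"id": i, "amount": a}
            for i, a in conn.execute("SELECT id, amount FROM orders")]

# Source 2: a flat file (stand-in: an in-memory CSV string).
flat_file = "id,amount\n3,42.0\n"
file_rows = [{"id": int(r["id"]), "amount": float(r["amount"])}
             for r in csv.DictReader(io.StringIO(flat_file))]

# Staging area: raw rows from all sources, kept together for the
# transform step that follows.
staging = sql_rows + file_rows
print(len(staging))  # 3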

Transform
The second step of the ETL process is transformation. In this step, a set of rules or functions is applied to the extracted data to convert it into a single standard format. It may involve the following processes/tasks (a small sketch of these follows the list):
• Filtering – loading only certain attributes into the data warehouse.
• Cleaning – filling in NULL values with defaults, mapping U.S.A, United States, and America to USA, etc.
• Joining – joining multiple attributes into one.
• Splitting – splitting a single attribute into multiple attributes.
• Sorting – sorting tuples on the basis of some attribute (generally a key attribute).
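
The sketch below runs each of these five tasks, in order, over a couple of hypothetical rows; the field names and the country-mapping table are invented for illustration.

# Hypothetical transform sketch covering the five tasks above.

rows = [
    {"id": 2, "first": "Bob", "last": "Lee", "country": "U.S.A", "extra": "x"},
    {"id": 1, "first": "Amy", "last": None,  "country": "America", "extra": "y"},
]

country_map = {"U.S.A": "USA", "United States": "USA", "America": "USA"}

out = []
for r in rows:
    # Filtering: keep only the attributes the warehouse needs (drop "extra").
    r = {k: r[k] for k in ("id", "first", "last", "country")}
    # Cleaning: fill NULLs with a default and standardize country names.
    r["last"] = r["last"] or "UNKNOWN"
    r["country"] = country_map.get(r["country"], r["country"])
    # Joining: combine multiple attributes into one.
    r["full_name"] = f"{r['first']} {r['last']}"
    # Splitting: derive multiple attributes from one (here, initials).
    r["initials"] = r["first"][0] + r["last"][0]
    out.append(r)

# Sorting: order tuples by the key attribute.
out.sort(key=lambda r: r["id"])
print([r["full_name"] for r in out])  # ['Amy UNKNOWN', 'Bob Lee']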

Loading
• The third and final step of the ETL process is loading. In this step, the transformed data is finally loaded into the data warehouse.
• Sometimes the warehouse is refreshed very frequently; in other systems it is loaded at longer but regular intervals.
• The rate and period of loading depend solely on the requirements and vary from system to system. A minimal loading sketch follows.
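
To illustrate the load step (creating the physical data structure, then inserting the transformed rows), here is a short sketch against an in-memory SQLite database; the table name and rows are hypothetical.

# Hypothetical load sketch: create the target structure, then insert
# the transformed rows in one batch.

import sqlite3

transformed = [(1, "Amy", "USA"), (2, "Bob", "USA")]

conn = sqlite3.connect(":memory:")
# Create the physical data structure in the warehouse.
conn.execute("CREATE TABLE IF NOT EXISTS customer_dim "
             "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
# Load the transformed data; executemany batches the inserts.
conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?)", transformed)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM customer_dim").fetchone()[0])  # 2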
ETL Tools
The most commonly used ETL tools are:
1. Hevo
2. Sybase
3. Oracle Warehouse Builder
4. CloverETL
5. MarkLogic
Advantages of the ETL process in data warehousing:

1. Improved data quality: The ETL process ensures that the data in the data warehouse is accurate, complete, and up-to-date.
2. Better data integration: The ETL process helps to integrate data from multiple sources and systems, making it more accessible and usable.
3. Increased data security: The ETL process can help to improve data security by controlling access to the data warehouse and ensuring that only authorized users can access the data.
4. Improved scalability: The ETL process can help to improve scalability by providing a way to manage and analyze large amounts of data.
5. Increased automation: ETL tools and technologies can automate and simplify the ETL process, reducing the time and effort required to load and update data in the warehouse.
Disadvantages of the ETL process in data warehousing:

1. High cost: The ETL process can be expensive to implement and maintain, especially for organizations with limited resources.
2. Complexity: The ETL process can be complex and difficult to implement, especially for organizations that lack the necessary expertise or resources.
3. Limited flexibility: The ETL process can be limited in terms of flexibility, as it may not be able to handle unstructured data or real-time data streams.
4. Limited scalability: The ETL process can be limited in terms of scalability, as it may not be able to handle very large amounts of data.
5. Data privacy concerns: The ETL process can raise concerns about data privacy, as large amounts of data are collected, stored, and analyzed.
