Data Warehousing
Features:
– Data Integration: Combining data from different sources.
– Data Consolidation: Aggregating data for comprehensive analysis.
– Historical Data Storage: Maintaining historical data for trend analysis.
• Presentation Layer (Top Tier)
• Purpose: This layer provides users with access to the data
and tools needed for analysis and reporting.
Components:
– Business Intelligence (BI) Tools: Tools like dashboards,
reporting tools, and query tools that help users analyze the data.
– Data Mining Tools: Tools used to uncover patterns and insights
from the data.
Features:
– User Access: Providing different levels of access and
capabilities depending on user roles.
– Data Visualization: Representing data in charts, graphs, and
other visual formats for easier interpretation.
Summary
• Bottom Tier (Data Source Layer): Collects and
prepares raw data from various sources.
• Middle Tier (Data Warehouse Layer):
Centralized storage and processing of
integrated data.
• Top Tier (Presentation Layer): Provides tools
and interfaces for data analysis and reporting.
Enterprise Data Warehouse (EDW)
1. Improved Decision-Making
• Centralized Data: By consolidating data from multiple sources, an EDW
provides a unified view of the organization's information, making it easier
for decision-makers to access accurate and comprehensive data.
• Advanced Analytics: Facilitates complex queries and advanced analytics,
leading to better insights and informed decision-making.
2. Enhanced Data Quality and Consistency
• Data Integration: Ensures that data from different sources is integrated
and standardized, reducing inconsistencies and errors.
• Data Cleaning: ETL processes improve data quality by cleaning and
transforming data before it is loaded into the warehouse.
3. Historical Intelligence
• Time-Series Analysis: Stores historical data, allowing for trend analysis
and the ability to track changes over time.
• Long-Term Data Storage: Enables organizations to analyze long-term
performance and historical patterns.
4. Increased Efficiency
• Faster Query Performance: Optimized for querying and reporting, which
speeds up the retrieval of data compared to querying operational systems.
• Resource Optimization: Offloads reporting and analytical workloads
from operational systems, improving their performance and efficiency.
5. Scalability
• Handling Large Volumes of Data: Designed to scale with the growing
volume of data, supporting large datasets and complex queries.
• Future Growth: Can accommodate expanding data needs and integrate
new data sources as the organization grows.
6. Enhanced Business Intelligence (BI) Capabilities
• Integrated Reporting: Facilitates comprehensive reporting and
analysis across different business units and functions.
• Data Visualization: Supports advanced visualization tools for
better data interpretation and communication.
7. Improved Data Security
• Centralized Control: Provides a centralized platform for
implementing security measures and access controls.
• Data Governance: Ensures consistent data governance policies and
practices are applied across the organization.
8. Better Collaboration
• Shared Data Access: Enables different departments and teams to
access the same data, fostering collaboration and alignment.
• Consistent Information: Provides a single source of truth, reducing
discrepancies and enhancing communication across the
organization.
9. Compliance and Reporting
• Regulatory Compliance: Facilitates compliance with industry regulations
by maintaining accurate and complete records.
• Audit Trails: Provides detailed audit trails for data access and
modifications, supporting transparency and accountability.
10. Strategic Advantage
• Competitive Insights: Allows organizations to analyze market trends,
customer behavior, and operational performance, leading to strategic
advantages in the market.
• Innovation Support: Provides a solid foundation for data-driven
innovation and strategic initiatives.
DATA MINING TOOLS
• Data mining tools are software applications designed to analyze large datasets and
extract useful information or patterns. Widely used tools include:
• 1. RapidMiner
• Features:
– User-friendly interface with drag-and-drop functionality.
– Comprehensive suite for data preparation, modeling, evaluation, and deployment.
– Supports various algorithms for classification, regression, clustering, and
association rules.
– Integration with various data sources and formats.
• 2. KNIME
• Features:
– Open-source data analytics platform with a visual workflow interface.
– Supports data mining, machine learning, and data visualization.
– Extensive library of nodes for different data processing tasks.
– Integrates with R, Python, and other statistical tools.
• 3. SAS Enterprise Miner
• Features:
– Advanced analytics platform for data mining, predictive modeling, and
machine learning.
– Robust tools for data preparation, modeling, and evaluation.
– Supports a wide range of algorithms and techniques.
– Integration with SAS's other analytics and business intelligence tools.
• 4. IBM SPSS Modeler
• Features:
– Data mining and predictive analytics software with a visual interface.
– Supports a variety of data mining techniques, including clustering,
classification, and regression.
– Offers integration with IBM Watson for enhanced analytics capabilities.
– Capabilities for handling text mining and sentiment analysis.
• 5. Tableau
• Features:
– Primarily a data visualization tool with powerful analytics capabilities.
– Allows for interactive data exploration and dashboard creation.
– Provides integration with various data sources and supports complex calculations.
– Capable of performing basic data mining tasks such as clustering and trend analysis.
• 6. Microsoft SQL Server Analysis Services (SSAS)
• Features:
– Part of the Microsoft SQL Server suite, used for online analytical processing (OLAP) and data
mining.
– Provides data mining models for classification, clustering, and regression.
– Integration with other Microsoft products and BI tools.
• 7. Weka
• Features:
– Open-source software for data mining and machine learning.
– Offers a collection of algorithms for data preprocessing, classification, clustering, and association.
– Provides a user-friendly graphical interface for experimenting with different algorithms.
• 8. H2O.ai
• Features:
– Open-source platform for advanced machine learning and data mining.
– Supports various algorithms, including generalized linear models, gradient boosting machines,
and deep learning.
– Scalable and capable of handling big data.
– Integration with other data science tools and languages, such as R and Python.
• 9. Orange
• Features:
– Open-source data visualization and analysis tool with a visual programming interface.
– Provides widgets for data mining, machine learning, and data visualization.
– Suitable for educational purposes and rapid prototyping.
• 10. Google Cloud AI and BigQuery
• Features:
– Cloud-based tools for big data analytics and machine learning.
– BigQuery: Managed data warehouse for running SQL queries on large datasets.
– Google Cloud AI: Offers tools for building and deploying machine learning models.
• 11. Alteryx
• Features:
– Data preparation and analytics platform with a drag-and-drop interface.
– Provides tools for data blending, cleansing, and advanced analytics.
– Supports integration with various data sources and BI tools.
• 12. Domo
• Features:
– Cloud-based platform for business intelligence and data mining.
– Offers tools for data integration, visualization, and advanced analytics.
– Includes features for real-time data monitoring and reporting.
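The clustering task that many of the tools above automate can be illustrated with a minimal k-means sketch in pure Python; the one-dimensional data, k = 2, and the iteration count are assumptions chosen for illustration, not taken from any listed product.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 1-D k-means; illustrative sketch only."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.8, 10.1, 10.3]  # assumed sample values
print(kmeans(data, 2))  # two centers: one near 1, one near 10
```

Real tools scale the same idea to many dimensions and millions of rows.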
Basic Statistical Descriptions of Data
Basic statistical descriptions of data provide a summary of the key characteristics of a
dataset.
• 1. Measures of Central Tendency
• These measures describe the center or typical value of a dataset.
• Mean (Average):
– Definition: The sum of all data values divided by the number of values.
– Formula: mean = (x1 + x2 + … + xn) / n, where n is the number of values.
– Usage: Provides the arithmetic average of the data.
• Median:
– Definition: The middle value when the data is sorted in ascending or descending
order.
– Formula: For an odd number of observations, it is the middle value. For an even
number, it is the average of the two middle values.
– Usage: Useful for understanding the central value, especially when data is skewed.
• Mode:
– Definition: The value that appears most frequently in the dataset.
– Usage: Identifies the most common value or values in the data.
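The three measures above can be sketched with Python's standard statistics module; the sample values are assumed for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # assumed sample values

mean = statistics.mean(data)      # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
median = statistics.median(data)  # even count: average of 3 and 5 = 4.0
mode = statistics.mode(data)      # 3 appears most often

print(mean, median, mode)
```

Note how the median (4.0) sits below the mean (5) here: the large value 10 pulls the mean upward, which is why the median is preferred for skewed data.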
2. Measures of Variability
• Variability refers to how spread out or dispersed the values in a dataset are from
the average or central value.
Importance in Data Science
• Model Evaluation: Understanding variability helps in evaluating the
performance of models. For example, high variability in model predictions
may indicate overfitting.
• Data Quality: Identifying high variability can signal issues such as
outliers or inconsistencies in the data.
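As a minimal sketch (with assumed sample data), Python's statistics module makes the contrast between low and high variability concrete:

```python
import statistics

low_spread = [9, 10, 10, 11]   # assumed: values cluster near the mean of 10
high_spread = [1, 5, 10, 24]   # assumed: same mean of 10, widely dispersed

for name, data in [("low", low_spread), ("high", high_spread)]:
    spread = max(data) - min(data)      # range
    var = statistics.pvariance(data)    # population variance
    sd = statistics.pstdev(data)        # population standard deviation
    print(name, spread, round(var, 2), round(sd, 2))
```

Both datasets share a mean of 10, yet their standard deviations differ by an order of magnitude, which is exactly the distinction variability measures capture.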
3. Correlation
Correlation measures the strength and direction of the relationship between
two variables. It quantifies how changes in one variable are associated with
changes in another.
• In data science, correlation helps to understand relationships
between features.
Types of Correlation
• Positive Correlation:
– Definition: When one variable increases, the other variable also tends to
increase.
– Example: Height and weight. Generally, as height increases, weight also
increases.
• Negative Correlation:
– Definition: When one variable increases, the other variable tends to
decrease.
– Example: Exercise frequency and body fat percentage. More exercise
might correlate with lower body fat.
• No Correlation:
– Definition: No discernible relationship between the two variables.
– Example: Shoe size and intelligence. There’s no expected relationship
between these variables.
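The positive-correlation case above (height and weight) can be sketched by computing Pearson's r directly; the height and weight figures are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

heights = [150, 160, 170, 180, 190]  # cm (assumed data)
weights = [52, 58, 67, 74, 82]       # kg (assumed data)

print(round(pearson(heights, weights), 3))  # close to +1
```

A value near +1 indicates a strong positive relationship; values near -1 indicate a negative one, and values near 0 indicate no linear relationship.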
Regression
• In regression, low bias means the model's predictions are close to the true
values on average, with minimal systematic error.
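A minimal ordinary-least-squares sketch makes this concrete: the data points below are assumed, generated roughly from the line y = 2x + 1, and a low-bias fit recovers a slope and intercept close to those true values.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 7.0, 8.8]  # assumed: roughly y = 2x + 1 plus noise

slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # near 2 and 1
```

The small gap between the fitted parameters and the true ones (2 and 1) is what "minimal systematic error" means in practice.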