Lecture 1


Data Warehouse and Data Mining – First Lecture

Overview of Data Warehouse

The term "Data Warehouse" was first coined by Bill Inmon in 1990.
According to Inmon, a data warehouse is a subject-oriented, integrated,
time-variant, and non-volatile collection of data. This data helps analysts
make informed decisions in an organization. An operational database
undergoes frequent changes on a daily basis on account of the transactions
that take place. Suppose a business executive wants to analyze previous
feedback on any data such as a product, a supplier, or any consumer data.
With only the operational database, the executive will have no data
available to analyze, because the previous data has been overwritten by
later transactions. A data warehouse provides generalized and consolidated
data in a multidimensional view. Along with this view of the data, a data
warehouse also provides Online Analytical Processing (OLAP) tools. These
tools support interactive and effective analysis of data in a
multidimensional space. This analysis results in data generalization and
data mining. Data mining functions such as association, clustering,
classification, and prediction can be integrated with OLAP operations to
enhance the interactive mining of knowledge at multiple levels of
abstraction. That is why the data warehouse has become an important
platform for data analysis and online analytical processing.
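
To make the idea of a multidimensional view and multiple levels of
abstraction more concrete, here is a minimal sketch of an OLAP-style
roll-up in Python. The sales rows, the dimension names (year, region,
product), and the roll_up helper are all hypothetical and only serve as an
illustration; real OLAP tools work on much larger cubes.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales_amount)
sales = [
    (2022, "North", "Laptop", 1200.0),
    (2022, "North", "Phone", 800.0),
    (2022, "South", "Laptop", 500.0),
    (2023, "North", "Laptop", 1500.0),
    (2023, "South", "Phone", 700.0),
]

# Assumed order of the dimension columns in each row
DIMENSIONS = ("year", "region", "product")


def roll_up(rows, keep):
    """Sum sales over the dimensions in `keep`, aggregating away the rest."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # last column is the sales amount
    return dict(totals)


# Two levels of abstraction over the same data
by_year_region = roll_up(sales, ("year", "region"))
by_year = roll_up(sales, ("year",))
```

Here by_year is a coarser view of the same facts than by_year_region; moving
from one to the other is what OLAP tools call rolling up and drilling down.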
Data Warehouse and Data Mining 2023-2024

Data Warehousing (DW)

Data Warehousing (DW) is a process for collecting and managing data from
varied sources to provide meaningful business insights. A data warehouse is
typically used to connect and analyze business data from heterogeneous
sources. The data warehouse is the core of the business intelligence (BI)
system, which is built for data analysis and reporting.

It is a blend of technologies and components that aids the strategic use of
data. It is the electronic storage of a large amount of information by a
business, designed for query and analysis instead of transaction processing.
It is a process of transforming data into information and making it
available to users in a timely manner to make a difference.

The decision support database (data warehouse) is maintained separately
from the organization’s operational database. However, the data warehouse is
not a product but an environment. It is an architectural construct of an
information system that provides users with current and historical decision
support information that is difficult to access or present in the
traditional operational data store.

You may know that a 3NF-designed database for an inventory system may have
many tables related to each other. For example, a report on current
inventory information can include more than 12 join conditions. This can
quickly slow down the response time of the query and report. A data
warehouse provides a new design that helps to reduce the response time and
enhance the performance of queries for reports and analytics.
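
The lecture does not name the design, so the following is only a sketch
under the assumption that it is a star schema, one common warehouse layout:
a central fact table joined directly to a few dimension tables. The table
and column names (fact_inventory, dim_product, dim_store) and the sample
rows are hypothetical. The point is that the inventory report needs two
joins rather than a long chain through normalized tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, deliberately denormalized
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);

-- Fact table: one row per product per store, holding the measure
CREATE TABLE fact_inventory (
    product_id       INTEGER REFERENCES dim_product(product_id),
    store_id         INTEGER REFERENCES dim_store(store_id),
    quantity_on_hand INTEGER
);

INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_store   VALUES (1, 'East'), (2, 'West');
INSERT INTO fact_inventory VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7);
""")

# Current inventory by category and city: the fact table joins
# directly to each dimension, so the query stays short.
report = conn.execute("""
    SELECT p.category, s.city, SUM(f.quantity_on_hand)
    FROM fact_inventory AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store   AS s ON s.store_id   = f.store_id
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
""").fetchall()
```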

Prepared by Dr. Dunia H. Hameed Page 2



A data warehouse system is also known by the following names:

• Decision Support System (DSS)
• Executive Information System
• Management Information System
• Business Intelligence Solution
• Analytic Application
• Data Warehouse


History of Data warehouse

A data warehouse helps users understand and enhance their organization’s
performance. The need to warehouse data evolved as computer systems became
more complex and needed to handle increasing amounts of information.
However, data warehousing is not a new thing.

Here are some key events in the evolution of the data warehouse:

• 1960s: Dartmouth and General Mills, in a joint research project,
develop the terms dimensions and facts.
• 1970s: ACNielsen and IRI introduce dimensional data marts for retail
sales.
• 1983: Teradata Corporation introduces a database management system
that is specifically designed for decision support.
• Data warehousing started in the late 1980s, when IBM workers Paul
Murphy and Barry Devlin developed the Business Data Warehouse.
• However, the real concept was given by Bill Inmon, who is regarded as
the father of the data warehouse. He has written about a variety of
topics on building, using, and maintaining the warehouse and the
Corporate Information Factory.

Working Method of Data warehouse

A data warehouse works as a central repository where information arrives
from one or more data sources. Data flows into a data warehouse from the
transactional system and other relational databases.


Data may be:

• Structured
• Semi-structured
• Unstructured

The data is processed, transformed, and ingested so that users can access
the processed data in the Data Warehouse through Business Intelligence
tools, SQL clients, and spreadsheets. A data warehouse merges
information coming from different sources into one comprehensive
database.
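
The flow described above (process, transform, ingest, merge) can be
sketched in a few lines of Python. This is only an illustration: the two
source lists, their field names, and the helper functions
transform_customer and load are hypothetical, and a real warehouse would
load into database tables rather than a dictionary.

```python
from datetime import date
from decimal import Decimal

# Hypothetical source 1: customer rows from a transactional system
crm_rows = [
    {"customer_id": 1, "name": " alice ", "signup": "2023-01-15"},
    {"customer_id": 2, "name": "BOB", "signup": "2023-02-01"},
]

# Hypothetical source 2: an order export where every value is text
order_rows = [
    {"cust": "1", "total": "19.90"},
    {"cust": "1", "total": "5.10"},
    {"cust": "2", "total": "42.00"},
]


def transform_customer(row):
    """Clean one customer row: tidy the name and parse the signup date."""
    return {
        "customer_id": int(row["customer_id"]),
        "name": row["name"].strip().title(),
        "signup": date.fromisoformat(row["signup"]),
    }


def load(customers, orders):
    """Merge both sources into one record per customer."""
    warehouse = {}
    for raw in customers:
        record = transform_customer(raw)
        record["order_total"] = Decimal("0")
        record["order_count"] = 0
        warehouse[record["customer_id"]] = record
    for order in orders:
        record = warehouse[int(order["cust"])]
        record["order_total"] += Decimal(order["total"])
        record["order_count"] += 1
    return warehouse


warehouse = load(crm_rows, order_rows)
```

After loading, each customer record combines information that previously
lived in two separate systems, which is what allows the more holistic
analysis of customers.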

By merging all of this information in one place, an organization can
analyze its customers more holistically. This helps to ensure that it has
considered all the information available. Data warehousing makes data
mining possible. Data mining is the search for patterns in the data that
may lead to higher sales and profits.

Types of Data Warehouse

The three main types of data warehouse (DWH) are:

1. Enterprise Data Warehouse (EDW):

An Enterprise Data Warehouse (EDW) is a centralized warehouse. It provides
decision support services across the enterprise. It offers a unified
approach for organizing and representing data. It also provides the ability
to classify data according to subject and to give access according to those
divisions.


2. Operational Data Store:

An Operational Data Store (ODS) is a data store required when neither the
data warehouse nor the OLTP systems support an organization’s reporting
needs. In an ODS, the data is refreshed in real time. Hence, it is widely
preferred for routine activities like storing employee records.

3. Data Mart:

A data mart is a subset of the data warehouse. It is specially designed for
a particular line of business, such as sales or finance. In an independent
data mart, data can be collected directly from the sources.
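
As a rough illustration of a data mart as a subset of the warehouse, the
sketch below defines a sales mart as a view over a warehouse table. The
table name warehouse_facts, the view name sales_mart, and the sample rows
are hypothetical. A view is only one possible way to build a mart that
depends on the warehouse; an independent data mart would instead be loaded
directly from the sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical warehouse table covering several lines of business
CREATE TABLE warehouse_facts (line_of_business TEXT, metric TEXT, amount REAL);
INSERT INTO warehouse_facts VALUES
    ('sales',   'revenue', 1000.0),
    ('sales',   'returns',   50.0),
    ('finance', 'payroll',  700.0);

-- The sales data mart exposes only the rows for one line of business
CREATE VIEW sales_mart AS
    SELECT metric, amount
    FROM warehouse_facts
    WHERE line_of_business = 'sales';
""")

sales_mart_rows = conn.execute(
    "SELECT metric, amount FROM sales_mart ORDER BY metric"
).fetchall()
```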

General stages of Data Warehouse

Earlier, organizations started with relatively simple use of data
warehousing. However, over time, more sophisticated use of data warehousing
began.

The following are general stages of use of the data warehouse (DWH):

Offline Operational Database:

In this stage, data is just copied from an operational system to another
server. In this way, loading, processing, and reporting of the copied data
do not impact the operational system’s performance.


Offline Data Warehouse:

Data in the data warehouse is regularly updated from the operational
database. The data in the data warehouse is mapped and transformed to meet
the data warehouse objectives.

Real-time Data Warehouse:

In this stage, data warehouses are updated whenever any transaction takes
place in the operational database. An airline or railway booking system is
an example.

Integrated Data Warehouse:

In this stage, data warehouses are updated continuously when the
operational system performs a transaction. The data warehouse then
generates transactions which are passed back to the operational system.
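
The difference between the offline and real-time stages can be sketched in
Python as follows. All class names here (OperationalDatabase,
OfflineWarehouse, RealTimeWarehouse) are hypothetical, and the sketch
leaves out the integrated stage, where the warehouse also passes
transactions back to the operational system.

```python
class OperationalDatabase:
    """Hypothetical stand-in for an operational (OLTP) system."""

    def __init__(self):
        self.transactions = []
        self.subscribers = []  # callbacks notified on every transaction

    def record(self, transaction):
        self.transactions.append(transaction)
        for notify in self.subscribers:
            notify(transaction)


class OfflineWarehouse:
    """Refreshed periodically: copies whatever is new since the last refresh."""

    def __init__(self, source):
        self.source = source
        self.rows = []

    def refresh(self):
        self.rows.extend(self.source.transactions[len(self.rows):])


class RealTimeWarehouse:
    """Updated whenever a transaction takes place in the operational system."""

    def __init__(self, source):
        self.rows = []
        source.subscribers.append(self.rows.append)


oltp = OperationalDatabase()
offline = OfflineWarehouse(oltp)
realtime = RealTimeWarehouse(oltp)

oltp.record({"booking": 1, "seat": "12A"})
oltp.record({"booking": 2, "seat": "14C"})

rows_before_refresh = len(offline.rows)  # offline copy is stale until refreshed
offline.refresh()
rows_after_refresh = len(offline.rows)
```

The real-time warehouse sees each booking as soon as it is recorded, while
the offline warehouse only catches up when refresh is called.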

Components of Data warehouse

The four components of a data warehouse are:

1. Load manager: The load manager is also called the front component. It
performs all the operations associated with the extraction and loading
of data into the warehouse. These operations include transformations to
prepare the data for entering into the data warehouse.
2. Warehouse manager: The warehouse manager performs operations
associated with the management of the data in the warehouse. It
performs operations like analysis of data to ensure consistency, creation
of indexes and views, generation of denormalizations and aggregations,
transformation and merging of source data, and archiving and backing up
data.
3. Query manager: The query manager is also known as the backend component.
It performs all the operations related to the management of user
queries. Its operations include directing queries to the appropriate
tables and scheduling the execution of queries.
4. End-user access tools:

These are categorized into five different groups:

1. Data Reporting
2. Query Tools
3. Application development tools
4. EIS tools
5. OLAP tools and data mining tools.
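
As a rough sketch of how the first three components divide the work, the
example below uses an in-memory SQLite database. The table names
(sales_detail, sales_by_product), the sample rows, and the total_for
function are hypothetical; real load, warehouse, and query managers are
separate and far more elaborate pieces of software.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load manager: extract rows from a (hypothetical) source and load them
source_rows = [
    ("2024-01-01", "Laptop", 2),
    ("2024-01-01", "Phone", 1),
    ("2024-01-02", "Laptop", 3),
]
conn.execute("CREATE TABLE sales_detail (sale_date TEXT, product TEXT, qty INTEGER)")
conn.executemany("INSERT INTO sales_detail VALUES (?, ?, ?)", source_rows)

# Warehouse manager: create an index and generate an aggregation
conn.execute("CREATE INDEX idx_sales_product ON sales_detail (product)")
conn.execute("""
    CREATE TABLE sales_by_product AS
    SELECT product, SUM(qty) AS total_qty
    FROM sales_detail
    GROUP BY product
""")


# Query manager: direct the user query to the appropriate table,
# here the small aggregate table instead of the detail table
def total_for(product):
    row = conn.execute(
        "SELECT total_qty FROM sales_by_product WHERE product = ?", (product,)
    ).fetchone()
    return row[0] if row else 0
```

For example, total_for("Laptop") is answered from the aggregate table
without scanning the detail rows.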

Who needs Data warehouse?

A data warehouse (DWH) is needed by all types of users, such as:

• Decision makers who rely on massive amounts of data.
• Users who use customized, complex processes to obtain information
from multiple data sources.
• People who want simple technology to access the data.
• People who want a systematic approach for making decisions.
• Users who want fast performance on a huge amount of data, which is
a necessity for reports, grids, or charts.
• Users who want to discover ‘hidden patterns’ of data flows and
groupings, for which a data warehouse is a first step.

What Is a Data Warehouse Used For?

Here are the most common sectors where a data warehouse is used:

Airline:

In the airline system, it is used for operational purposes like crew
assignment, analysis of route profitability, frequent flyer program
promotions, etc.

Banking:

It is widely used in the banking sector to manage the resources available
on desk effectively. A few banks also use it for market research and for
performance analysis of products and operations.

Healthcare:

The healthcare sector also uses a data warehouse to strategize and predict
outcomes, generate patients’ treatment reports, and share data with tie-in
insurance companies, medical aid services, etc.


Public sector:

In the public sector, a data warehouse is used for intelligence gathering.
It helps government agencies to maintain and analyze tax records and health
policy records for every individual.

Investment and Insurance sector:

In this sector, the warehouses are primarily used to analyze data patterns,
customer trends, and to track market movements.

Retail chain:

In retail chains, a data warehouse is widely used for distribution and
marketing. It also helps to track items, customer buying patterns, and
promotions, and it is used for determining pricing policy.

Telecommunication:

A data warehouse is used in this sector for product promotions, sales
decisions, and distribution decisions.

Hospitality Industry:

This industry uses warehouse services to design as well as estimate its
advertising and promotion campaigns, where it wants to target clients
based on their feedback and travel patterns.


Steps to Implement Data Warehouse

The best way to address the business risk associated with a data warehouse
implementation is to employ a three-pronged strategy, as below:

Enterprise strategy: Here we identify the technical aspects, including the
current architecture and tools. We also identify facts, dimensions, and
attributes. Data mapping and transformation are also addressed.

Phased delivery: Data warehouse implementation should be phased based on
subject areas. Related business entities like booking and billing should
be implemented first and then integrated with each other.

Iterative prototyping: Rather than a big bang approach to implementation,
the data warehouse should be developed and tested iteratively.

Advantages of Data Warehouse (DWH):

1. A data warehouse allows business users to quickly access critical data
from several sources all in one place.
2. A data warehouse provides consistent information on various
cross-functional activities. It also supports ad hoc reporting and
querying.
3. A data warehouse helps to integrate many sources of data to reduce
stress on the production system.
4. A data warehouse helps to reduce total turnaround time for analysis and
reporting.
5. Restructuring and integration make the data easier for the user to use
for reporting and analysis.
6. A data warehouse allows users to access critical data from a number
of sources in a single place. Therefore, it saves the user’s time in
retrieving data from multiple sources.
7. A data warehouse stores a large amount of historical data. This helps
users to analyze different time periods and trends to make future
predictions.

Disadvantages of Data Warehouse:

1. Not an ideal option for unstructured data.
2. Creation and implementation of a data warehouse is a time-consuming
affair.
3. A data warehouse can become outdated relatively quickly.
4. It is difficult to make changes in data types and ranges, data source
schema, indexes, and queries.
5. The data warehouse may seem easy, but it is actually too complex for
average users.
6. Despite best efforts at project management, the scope of a data
warehousing project tends to keep increasing.
7. Sometimes warehouse users will develop different business rules.
8. Organizations need to spend a lot of their resources on training and
implementation.
