Chapter 1&2
Chapter One
Data Warehousing and Data Mining
November 13, 2021
Chapter-1
• Overview and brief description of data mining
• Data warehousing, data mining and database technology
• Online transaction processing and data mining
• Data Warehousing – Overview
• The term "Data Warehouse" was coined by Bill Inmon in 1990.
According to Inmon, a data warehouse is a subject-oriented,
integrated, time-variant, and non-volatile collection of data.
• This data helps analysts make informed decisions in an organization.
• An operational database undergoes frequent changes on a daily
basis because of the transactions that take place.
• Suppose a business executive wants to analyze previous feedback on
some data, such as a product, a supplier, or consumer data. In an
operational database, the executive will have no such data available
to analyze, because the previous values have been overwritten by later
transactions.
• A data warehouse provides generalized and consolidated data in a
multidimensional view. Along with this generalized and consolidated
view of data, a data warehouse also provides Online Analytical
Processing (OLAP) tools.
• These tools help us in interactive and effective analysis of data in a
multidimensional space.
• This analysis results in data generalization and data mining.
• Data mining functions such as association, clustering, classification,
and prediction can be integrated with OLAP operations to enhance the
interactive mining of knowledge at multiple levels of abstraction.
• That is why the data warehouse has become an important platform
for data analysis and online analytical processing.
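The difference between an operational database that overwrites values and a non-volatile, time-variant warehouse can be sketched in Python. This is a minimal illustration under stated assumptions: an in-memory dictionary stands in for the operational table, an append-only list of dated rows stands in for the warehouse, and the names `operational_update`, `warehouse_load`, and `price_history` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PriceSnapshot:
    """One warehouse row: a product price as it was on a given date."""
    product: str
    price: float
    as_of: date


# Operational store (hypothetical): keeps only the current state, keyed by product.
operational: dict[str, float] = {}

# Warehouse (hypothetical): append-only history, i.e. non-volatile and time-variant.
warehouse: list[PriceSnapshot] = []


def operational_update(product: str, price: float) -> None:
    """A transaction overwrites the previous value in place."""
    operational[product] = price


def warehouse_load(product: str, price: float, as_of: date) -> None:
    """A load appends a new dated row; earlier rows are never changed."""
    warehouse.append(PriceSnapshot(product, price, as_of))


def price_history(product: str) -> list[tuple[date, float]]:
    """Historical analysis is only possible against the warehouse."""
    return sorted((r.as_of, r.price) for r in warehouse if r.product == product)


for day, price in [(date(2021, 1, 1), 10.0), (date(2021, 2, 1), 12.5), (date(2021, 3, 1), 11.0)]:
    operational_update("widget", price)
    warehouse_load("widget", price, day)

print(operational["widget"])    # 11.0 (only the latest value survives)
print(price_history("widget"))  # all three dated prices are still available
```

The executive's question about earlier feedback or prices can be answered from `price_history`, while the operational store only knows the latest value.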
• Understanding a Data Warehouse
• Data mining is the procedure of extracting knowledge from data.
• The information or knowledge extracted in this way can be used for
any of the following applications −
• Market Analysis
• Fraud Detection
• Customer Retention
• Production Control
• Science Exploration
• Data Mining - Systems
• There is a large variety of data mining systems available. Data mining
systems may integrate techniques from the following −
• 4. Association Rules:
• 6. Sequential Patterns:
• Sequential pattern mining is a data mining technique specialized
for evaluating sequential data to discover sequential patterns, that
is, items or events that frequently occur in a particular order.
• 7. Prediction:
• Prediction uses a combination of other data mining techniques, such
as trend analysis, clustering, and classification.
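The notion of a sequential pattern can be illustrated by computing its support: the fraction of sequences that contain the pattern's items in the same order, not necessarily next to each other. This is a simplified sketch that assumes each element of a sequence is a single item (full sequential pattern mining also allows sets of items per element); the purchase data and the names `contains_subsequence` and `sequence_support` are hypothetical.

```python
def contains_subsequence(sequence: list[str], pattern: list[str]) -> bool:
    """True if the pattern's items appear in the sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator past the match, so each later
    # pattern item must occur after the earlier ones.
    return all(item in it for item in pattern)


def sequence_support(sequences: list[list[str]], pattern: list[str]) -> float:
    """Fraction of sequences that contain the pattern as an ordered subsequence."""
    if not sequences:
        return 0.0
    hits = sum(contains_subsequence(s, pattern) for s in sequences)
    return hits / len(sequences)


# Hypothetical purchase histories, one ordered list per customer.
customer_sequences = [
    ["laptop", "mouse", "laptop_bag"],
    ["laptop", "laptop_bag"],
    ["mouse", "laptop"],
    ["phone", "case"],
]

print(sequence_support(customer_sequences, ["laptop", "laptop_bag"]))  # 0.5
print(sequence_support(customer_sequences, ["mouse", "laptop"]))       # 0.25
```

Order matters: the third customer bought a mouse and then a laptop, so that sequence supports the pattern "mouse, then laptop" but not "laptop, then laptop bag".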
• Data Mining Architecture
• The main components of a data mining system are the data source,
the data mining engine, the data warehouse server, the pattern
evaluation module, the graphical user interface, and the knowledge base.
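How these components pass data to one another can be sketched as a small pipeline. This is a hypothetical illustration rather than a reference architecture: Python lists stand in for the data sources and the warehouse server's store, a single-item frequency count stands in for the data mining engine, a dictionary with one minimum-support threshold plays the role of the knowledge base, and `print` stands in for the user interface. All function names are illustrative.

```python
from collections import Counter

# Hypothetical data sources: two branch databases with an overlapping record.
branch_a = [{"tid": 1, "items": ["milk", "bread"]}, {"tid": 2, "items": ["milk", None]}]
branch_b = [{"tid": 3, "items": ["bread", "butter"]}, {"tid": 1, "items": ["milk", "bread"]}]

# Knowledge base (assumed): domain knowledge that guides pattern evaluation.
knowledge_base = {"min_support": 0.5}


def clean(records):
    """Data cleaning: drop missing values inside each record."""
    return [{"tid": r["tid"], "items": [i for i in r["items"] if i is not None]} for r in records]


def integrate(*sources):
    """Data integration: merge sources into one store, de-duplicating by transaction id."""
    merged = {}
    for source in sources:
        for r in source:
            merged[r["tid"]] = r
    return list(merged.values())


def mine(transactions):
    """Data mining engine: support of each single item."""
    counts = Counter(item for t in transactions for item in set(t["items"]))
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


def evaluate(patterns, kb):
    """Pattern evaluation: keep only patterns meeting the knowledge-base threshold."""
    return {p: s for p, s in patterns.items() if s >= kb["min_support"]}


warehouse = integrate(clean(branch_a), clean(branch_b))
interesting = evaluate(mine(warehouse), knowledge_base)
print(sorted(interesting))  # presented to the user through the interface layer
```

Keeping each component a separate function mirrors the idea that the engine, the evaluation module, and the knowledge base can be changed independently.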
[Figure: data flow from Databases through Data Cleaning and Data Integration to Task-relevant Data and Data Mining]
January 14, 2022
[Figure: Data Mining in Business Intelligence. Layers include Data Exploration (Statistical Summary, Querying, and Reporting) and Decision Making (End User), with increasing potential to support business decisions toward the decision-making layer]
• Apart from these, data mining can also be used in the areas of
production control, customer retention, science exploration,
sports, astrology, and Internet Web Technology.
• Market Analysis and Management
• Listed below are the various areas of market analysis where data
mining is used −
• Customer Profiling − Data mining helps determine what kind of
people buy what kind of products.
• Identifying Customer Requirements − Data mining helps in
identifying the best products for different customers. It uses
prediction to find the factors that may attract new customers.
• Cross Market Analysis − Data mining finds associations and
correlations between product sales.
• Determining Customer Purchasing Patterns − Data mining helps
determine customers' purchasing patterns.
• Providing Summary Information − Data mining provides various
multidimensional summary reports.
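Cross-market analysis is commonly expressed as association rules of the form A ⇒ B, scored by support (how often A and B are bought together) and confidence (how often B is bought when A is bought). The sketch below assumes a small hypothetical set of market baskets; `count`, `support`, and `confidence` are illustrative helper names.

```python
def count(transactions: list[set[str]], itemset: set[str]) -> int:
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions)  # `<=` is the subset test for sets


def support(transactions: list[set[str]], itemset: set[str]) -> float:
    """Fraction of all transactions that contain the itemset."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions: list[set[str]], antecedent: set[str], consequent: set[str]) -> float:
    """Estimated P(consequent | antecedent) = count(A and B) / count(A)."""
    base = count(transactions, antecedent)
    return count(transactions, antecedent | consequent) / base if base else 0.0


# Hypothetical market baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

rule_support = support(baskets, {"bread", "butter"})          # 3 of 5 baskets
rule_confidence = confidence(baskets, {"bread"}, {"butter"})  # 3 of the 4 bread baskets
print(f"bread => butter: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
# prints: bread => butter: support=0.60, confidence=0.75
```

Computing confidence from counts rather than dividing two supports avoids small floating-point rounding differences.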
• Corporate Analysis and Risk Management
• Data mining is also used in the fields of credit card services and
telecommunications to detect fraud.
• For fraudulent telephone calls, it helps identify the destination of
the call, the duration of the call, the time of day or week, etc.
• It also analyzes the patterns that deviate from expected norms.
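Finding calls that "deviate from expected norms" can be sketched with a simple statistical rule: compare each new call with the customer's historical mean and standard deviation, and flag it when its z-score exceeds a threshold. This is only one illustrative technique among many; the call durations, the threshold of 3, and the name `flag_unusual_calls` are assumptions.

```python
from statistics import mean, stdev


def flag_unusual_calls(history: list[float], new_calls: list[float], threshold: float = 3.0) -> list[float]:
    """Return new call durations whose z-score against the history exceeds the threshold."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least two values
    if sigma == 0:
        # No variation in the history: anything different from the norm is unusual.
        return [d for d in new_calls if d != mu]
    return [d for d in new_calls if abs(d - mu) / sigma > threshold]


# Hypothetical call durations (minutes) for one customer.
usual_durations = [4.0, 5.0, 6.0, 5.0, 4.5, 5.5]
incoming = [5.2, 48.0, 3.9]

print(flag_unusual_calls(usual_durations, incoming))  # [48.0]
```

In practice the same idea would be applied per feature (destination, duration, time of day) and per customer profile.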
• Classification and Prediction
• Classification is the process of finding a model that describes the
data classes or concepts.
• The purpose is to be able to use this model to predict the class of
objects whose class label is unknown.
• This derived model is based on the analysis of sets of training
data.
• The derived model can be presented in the following forms −
• Classification (IF-THEN) Rules
• Decision Trees
• Mathematical Formulae
• Neural Networks
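To make the IF-THEN rule form concrete, the sketch below derives a rule set from labeled training data and then uses it to predict the class of an object whose label is unknown. It uses the simple 1R ("one rule") method, chosen here only for brevity: for each attribute, every value is mapped to its majority class, and the attribute whose rules make the fewest training errors is kept. The training records, attribute names, and `learn_one_rule` are hypothetical.

```python
from collections import Counter, defaultdict


def learn_one_rule(records: list[dict], target: str) -> tuple[str, dict]:
    """1R: choose the attribute whose value -> majority-class rules make the fewest errors."""
    best_attr, best_rules, best_errors = None, {}, None
    attributes = [a for a in records[0] if a != target]
    for attr in attributes:
        by_value = defaultdict(Counter)
        for r in records:
            by_value[r[attr]][r[target]] += 1
        # IF attr = value THEN class = most common class among training records with that value.
        rules = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in by_value.items())
        if best_errors is None or errors < best_errors:
            best_attr, best_rules, best_errors = attr, rules, errors
    return best_attr, best_rules


# Hypothetical training data whose class labels are known.
training = [
    {"income": "high", "student": "no", "buys": "yes"},
    {"income": "high", "student": "yes", "buys": "yes"},
    {"income": "low", "student": "yes", "buys": "yes"},
    {"income": "low", "student": "no", "buys": "no"},
    {"income": "medium", "student": "no", "buys": "no"},
    {"income": "medium", "student": "yes", "buys": "yes"},
]

attr, rules = learn_one_rule(training, target="buys")
for value, label in rules.items():
    print(f"IF {attr} = {value} THEN buys = {label}")

# Classify an object whose class label is unknown.
new_customer = {"income": "low", "student": "yes"}
print(rules.get(new_customer[attr]))  # yes
```

Here `student` is chosen because its rules misclassify one training record, while `income` misclassifies two.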
• The list of functions involved in these processes is as follows −
• Classification − It predicts the class of objects whose class label is
unknown.
• Its objective is to find a derived model that describes and
distinguishes data classes or concepts.
• The derived model is based on training data objects whose class
labels are known.
• Prediction − It is used to predict missing or unavailable numerical
data values rather than class labels. Regression Analysis is
generally used for prediction.
• Outlier Analysis − Outliers may be defined as the data objects
that do not comply with the general model of the data available.
• Evolution Analysis − Evolution analysis describes and models
regularities or trends for objects whose behavior changes over time.
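Predicting a missing numeric value with regression analysis can be sketched with ordinary least squares on one input variable, fitting y = a + b·x. The advertising and sales figures are hypothetical, and `fit_line` is an illustrative name; a real application would typically use a statistics library and more than one predictor.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


# Hypothetical training data: advertising spend vs. units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [12.0, 14.0, 16.0, 18.0, 20.0]

a, b = fit_line(spend, units)
predicted = a + b * 6.0  # predict the unknown numeric value for spend = 6
print(a, b, predicted)   # 10.0 2.0 22.0
```

Unlike classification, the output is a number on a continuous scale rather than a class label.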
• Challenges in Data Mining
• Efficiency and scalability of data mining algorithms
• Handling high dimensionality
No. | OLAP | OLTP
9 | Provides a summarized and multidimensional view of data. | Provides a detailed and flat relational view of data.
10 | The number of users is in the hundreds. | The number of users is in the thousands.
11 | The number of records accessed is in the millions. | The number of records accessed is in the tens.
12 | The database size ranges from 100 GB to 100 TB. | The database size ranges from 100 MB to 100 GB.
13 | Highly flexible. | Provides high performance.
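The row contrasting summarized, multidimensional views with detailed, flat views can be illustrated by rolling up OLTP-style sale records into totals by region and quarter. This is a hypothetical in-memory sketch; `roll_up` and the sales rows are illustrative, and a real OLAP server would usually precompute and store such aggregates in data cubes.

```python
from collections import defaultdict

# Hypothetical OLTP-style rows: one detailed record per individual sale.
sales = [
    {"region": "East", "quarter": "Q1", "amount": 120.0},
    {"region": "East", "quarter": "Q1", "amount": 80.0},
    {"region": "East", "quarter": "Q2", "amount": 50.0},
    {"region": "West", "quarter": "Q1", "amount": 200.0},
    {"region": "West", "quarter": "Q2", "amount": 75.0},
]


def roll_up(rows, dimensions):
    """Sum the amount measure over the given dimensions (a simple roll-up)."""
    cube = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cube[key] += row["amount"]
    return dict(cube)


by_region_quarter = roll_up(sales, ["region", "quarter"])
by_region = roll_up(sales, ["region"])  # rolling up further drops the quarter dimension

print(by_region_quarter)  # {('East', 'Q1'): 200.0, ('East', 'Q2'): 50.0, ('West', 'Q1'): 200.0, ('West', 'Q2'): 75.0}
print(by_region)          # {('East',): 250.0, ('West',): 275.0}
```

An OLTP query typically touches a few of the detailed rows, while an OLAP query scans many rows to produce a handful of summary cells.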
• Advantages of OLTP:
• It allows more than one user to access and change the same data
simultaneously.
• Therefore, it requires concurrency control and recovery techniques
to avoid anomalies such as lost updates or inconsistent data after a
failure.
• OLTP system data are not suitable for decision making. You have to
use data from OLAP systems for “what-if” analysis or decision
making.
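Why simultaneous changes to the same data need concurrency control can be sketched with Python threads: several "transactions" deposit into one shared balance, and a lock makes each read-modify-write step atomic so that no update is lost. This is a simplified illustration with a hypothetical `Account` class and `run_transactions` helper; database systems use locking protocols and logging for recovery rather than a single in-process lock.

```python
import threading


class Account:
    """A shared record updated by concurrent transactions (hypothetical)."""

    def __init__(self, balance: int = 0) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount: int) -> None:
        # Without the lock, two threads could read the same old balance
        # and one deposit would overwrite the other (a lost update).
        with self._lock:
            current = self.balance
            self.balance = current + amount


def run_transactions(account: Account, workers: int, deposits_each: int) -> None:
    """Start several threads that each perform many deposits, then wait for all of them."""

    def worker() -> None:
        for _ in range(deposits_each):
            account.deposit(1)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


acct = Account()
run_transactions(acct, workers=4, deposits_each=10_000)
print(acct.balance)  # 40000
```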
Wollega University
Chapter Two
Data Warehouse Concepts
• Data Mart
• Data marts contain a subset of organization-wide data that is
valuable to specific groups of people in an organization.
• In other words, a data mart contains only the data that is
specific to a particular group.
• For example, the marketing data mart may contain only data
related to items, customers, and sales. Data marts are confined to
specific subjects.
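A data mart can be pictured as a subject-specific extract of the organization-wide warehouse. The sketch below uses an in-memory SQLite database with hypothetical table and column names: it builds a marketing mart that keeps only item, customer, and sales data and leaves out other subjects such as payroll.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical organization-wide warehouse tables covering several subjects.
cur.executescript("""
CREATE TABLE items     (item_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE sales     (sale_id INTEGER PRIMARY KEY, item_id INTEGER, customer_id INTEGER, amount REAL);
CREATE TABLE payroll   (employee_id INTEGER PRIMARY KEY, salary REAL);

INSERT INTO items VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO customers VALUES (10, 'Alice', 'North'), (11, 'Bob', 'South');
INSERT INTO sales VALUES (100, 1, 10, 900.0), (101, 2, 11, 150.0), (102, 1, 11, 950.0);
INSERT INTO payroll VALUES (500, 3000.0);
""")

# The marketing data mart: only the item, customer, and sales subjects,
# joined into one table the marketing group can query directly.
cur.executescript("""
CREATE TABLE marketing_mart AS
SELECT s.sale_id, i.name AS item, i.category, c.name AS customer, c.region, s.amount
FROM sales AS s
JOIN items AS i ON i.item_id = s.item_id
JOIN customers AS c ON c.customer_id = s.customer_id;
""")

mart_rows = cur.execute(
    "SELECT category, SUM(amount) FROM marketing_mart GROUP BY category ORDER BY category"
).fetchall()
print(mart_rows)  # [('Electronics', 1850.0), ('Furniture', 150.0)]
```

Payroll data stays in the warehouse but never reaches the marketing mart, which is what confining a mart to its subjects means in practice.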
• Points to Remember About Data Marts