An Introduction To Data Warehousing and Data Mining

This document provides an overview of data warehousing and data mining concepts. It begins with definitions of data, information, and knowledge. It then discusses the evolution of database systems from unordered records to relational databases. Key aspects of data warehousing are introduced, including the differences between operational and informational systems. Common uses of data mining such as marketing, fraud detection, and text analysis are also outlined. The document concludes with a case study on how Saurashtra University implemented a global data warehouse to integrate data from various local databases using different technologies.

Uploaded by

Agnivesh Pandey


Before turning to data warehousing, you should know:
What data is.
How database systems evolved.
What Are Data, Information and Knowledge?

As the world enters the 21st century, we face new and challenging
problems. More than ever before, governments, industry and the wider
community need information to help them make decisions to tackle these
problems.
Before one can present and interpret information there has to be a process of
gathering and sorting data. Just as crude oil is the raw material from which petrol
is distilled, so too, data can be viewed as the raw material from which
information is obtained. Therefore, a good definition of data is:
Data
Data are observations or facts which, when collected, organized and evaluated,
become information or knowledge.
Information
Information is data that has been organized to serve a useful purpose.
Knowledge
Knowledge is informal: it involves culture and, more generally, the know-how
that a person acquires through life experience.
Evolution in Database Management
Ancient to modern:
All records were stored in an unordered format, which guaranteed neither the
quality of the data nor an efficient way to search it. Then the concept of
design arrived, which led to better reliability and performance.
1960s:
Computers became cost-effective for private companies as their storage
capacity increased. Two main data models were developed: the network model
(CODASYL) and the hierarchical model (IMS).
1970-1972:
E.F. Codd proposed the relational model for databases in a landmark paper on
how to think about databases.
1976:
P. Chen proposed the Entity-Relationship (ER) model for database design,
giving yet another important insight into conceptual data models.
1980s:
SQL (Structured Query Language) became the "intergalactic standard".
Evolution in Database Management (Continued)

Early 1990s:
ODBC and the beginning of Object Database Management Systems (ODBMS).
Late 1990s:
OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing).
Future trends:
Huge (terabyte) systems are appearing and will require novel means of handling
and analyzing data. Successors to SQL (and perhaps to the RDBMS) will emerge in
the future. Most likely, SQL will be overtaken by XML and other emerging
techniques.
What a Data Warehouse Is

The data warehouse is the center of the architecture for the information
systems of the future. It supports informational processing by providing
a solid platform of integrated, historical data from which to do analysis.
It provides the facility for integration in a world of unintegrated
application systems. A data warehouse is achieved in an evolutionary,
step-at-a-time fashion. It organizes and stores the data needed for
informational, analytical processing over a long historical time perspective.
There is indeed a world of promise in building and maintaining a data warehouse.
A data warehouse is a:
      subject-oriented,
      integrated,
      time-variant,
      non-volatile
collection of data in support of management's decision-making process.
Subject-oriented:
Data that gives information about a particular subject instead
of about a company's ongoing operations.
Integrated:
Data that is gathered into the data warehouse from a variety
of sources and merged into a coherent whole.
Time-variant:
All data in the data warehouse is identified with a particular
time period.
Non-volatile:
Data is stable in a data warehouse. More data is added, but
data is never removed. This enables management to gain a
consistent picture of the business.
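
The four properties can be illustrated with a minimal Python sketch. The names `WarehouseFact` and `Warehouse`, the site labels and the figures are hypothetical and chosen only for illustration; this is not how any particular warehouse product is implemented. Each fact records its subject, its source and the period it describes, and the store only supports loading and reading.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)  # frozen: a loaded fact cannot be modified (non-volatile)
class WarehouseFact:
    subject: str   # subject area, e.g. "enrolment" (subject-oriented)
    source: str    # originating system (integrated from several sources)
    period: str    # time period the value describes (time-variant)
    value: float


class Warehouse:
    """Append-only store: facts are added, never updated or deleted."""

    def __init__(self) -> None:
        self._facts: List[WarehouseFact] = []

    def load(self, fact: WarehouseFact) -> None:
        self._facts.append(fact)

    def history(self, subject: str) -> List[WarehouseFact]:
        # Return every stored period for one subject, oldest first.
        return sorted(
            (f for f in self._facts if f.subject == subject),
            key=lambda f: f.period,
        )


# Hypothetical facts arriving from two different source systems.
wh = Warehouse()
wh.load(WarehouseFact("enrolment", "site_a", "2003", 1200.0))
wh.load(WarehouseFact("enrolment", "site_b", "2002", 1100.0))
wh.load(WarehouseFact("fees", "site_a", "2003", 5.5))

periods = [f.period for f in wh.history("enrolment")]
print(periods)
```

Because the class exposes no update or delete method, a query over a past period returns the same answer every time it is run, which is the "consistent picture" the definition refers to.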
Operational vs. Informational Systems

Perhaps the most important concept that has come out of the Data Warehouse
movement is the recognition that there are two fundamentally different types
of information systems in all organizations: operational systems and
informational systems.
"Operational systems" are just what their name implies; they are the systems
that help us run the enterprise operation day-to-day. These are the backbone
systems of any enterprise: our "order entry", "inventory", "manufacturing",
"payroll" and "accounting" systems. Because of their importance to the
organization, operational systems were almost always the first parts of the
enterprise to be computerized. Over the years, these operational systems have
been extended and rewritten, enhanced and maintained to the point that they
are completely integrated into the organization. Indeed, most large
organizations around the world today couldn't operate without their
operational systems and the data that these systems maintain.
On the other hand, there are other functions that go on within the enterprise that
have to do with planning, forecasting and managing the organization. These
functions are also critical to the survival of the organization, especially in our
current fast-paced world. Functions like "marketing planning", "engineering
planning" and "financial analysis" also require information systems to support
them. But these functions are different from operational ones, and the types of
systems and information required are also different. These knowledge-based
functions are served by informational systems.

"Informational systems" have to do with analyzing data and making decisions,
often major decisions, about how the enterprise will operate, now and in the
future. And not only do informational systems have a different focus from
operational ones, they often have a different scope. Where operational data
needs are normally focused upon a single area, informational data needs often
span a number of different areas and need large amounts of related operational
data.
Why Data Warehousing
• Quality decisions come from quality data.
• Problems with real-life data:
– Data needs to be integrated from different sources
– Missing values
– Noisy and inconsistent values
– Data is not at the right level of aggregation
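
A small Python sketch can make these problems concrete. The records, the field names and the 0 to 100 validity range below are assumptions invented for illustration. The sketch drops missing and out-of-range values, then aggregates from record level up to one average per site.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records from two sources; None marks a missing value.
raw = [
    {"site": "A", "month": "2004-01", "marks": 62.0},
    {"site": "A", "month": "2004-01", "marks": None},    # missing value
    {"site": "A", "month": "2004-02", "marks": 640.0},   # noisy: out of range
    {"site": "B", "month": "2004-01", "marks": 71.0},
    {"site": "B", "month": "2004-02", "marks": 58.0},
]


def is_valid(mark):
    # Assumption: valid marks lie between 0 and 100.
    return mark is not None and 0.0 <= mark <= 100.0


# Step 1: drop missing and out-of-range values.
clean = [r for r in raw if is_valid(r["marks"])]

# Step 2: aggregate from record level up to one average per site.
by_site = defaultdict(list)
for r in clean:
    by_site[r["site"]].append(r["marks"])

site_average = {site: mean(vals) for site, vals in by_site.items()}
print(site_average)
```

Dropping bad rows is only one possible policy; a real load process might instead correct or impute them, and that choice should be recorded alongside the data.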
Why Use Data Mining Today
Human analysis skills are inadequate:
– Volume and dimensionality of the data
– High data growth rate
Availability of:
– Data
– Storage
– Computational power
– Off-the-shelf software
Why Use Data Mining Today (Continued)
Competition on service, not only on price (banks, phone companies, hotel
chains, rental car companies)
Personalization
What is Data Mining
Data Mining, or Knowledge Discovery in Databases (KDD) as it is
also known, is the nontrivial extraction of implicit, previously
unknown, and potentially useful information from data. This
encompasses a number of different technical approaches, such as
clustering, data summarization, learning classification rules, finding
dependency networks, analysing changes, and detecting anomalies.
In other words, data mining is the search for relationships and global
patterns that exist in large databases but are 'hidden' among the vast
amount of data, such as a relationship between student data and their
progress reports. These relationships represent valuable knowledge
about the database and the objects in it and, if the database is a
faithful mirror of the real world, about that world itself.
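
As a rough illustration of searching for a relationship in student data, the Python sketch below computes a Pearson correlation between two columns. The attendance and marks figures are invented, and a correlation coefficient is only the simplest kind of relationship; the techniques listed above (clustering, classification rules, anomaly detection) go well beyond it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical student data: attendance (%) and final marks (%).
attendance = [55, 60, 72, 80, 91, 95]
marks = [41, 48, 57, 66, 78, 80]

r = pearson(attendance, marks)
print(round(r, 3))
```

A value of `r` close to 1 suggests that the two columns rise together in this sample; it does not by itself show that one causes the other.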
Preprocessing and Mining
[Figure: the knowledge discovery pipeline. Original Data → Data Integration and Selection → Target Data → Preprocessing → Preprocessed Data → Model Construction → Patterns → Interpretation → Knowledge]
Convergence of Three Key Technologies
Common Uses of Data Mining
• Direct mail marketing
• Web site personalization
• Credit card fraud detection
• Bioinformatics
• Cheminformatics
• Text mining & analysis
• Market basket analysis
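
To give a feel for the last item, here is a minimal Python sketch of the two measures usually computed in market basket analysis, support and confidence. The transactions are invented, and the helper names `support` and `confidence` are chosen for this example only.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction also containing consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


s = support({"bread", "butter"}, transactions)
c = confidence({"bread"}, {"butter"}, transactions)
print(s, round(c, 3))
```

Here bread and butter appear together in 2 of 4 baskets, and 2 of the 3 baskets with bread also contain butter. A rule is normally kept only if both measures exceed thresholds chosen by the analyst.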
CASE STUDY OF SAURASHTRA UNIVERSITY, RAJKOT, GUJARAT
TECHNOLOGICAL HETEROGENEITY
[Figure: database technology in use at each site]
Rajkot: Oracle
Surendranagar: DEC Sybase
Junagadh: IBM DB2
Porbandar: FoxPro
Amreli: Oracle
Jamnagar: Sql Ser
The technological environment typically is heterogeneous.
[Figure: Rajkot, Junagadh, Porbandar, Amreli, Jamnagar, Surendranagar]
Detailed data is refreshed into the global warehouse from the outlying sites.
[Figure: Rajkot, Surendranagar, Junagadh, Jamnagar, Porbandar, Amreli]
The global data model is used to identify and define the system of record
at the outlying sites.
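
One way to picture the role of the global data model is as a mapping from each site's local field names to shared global names. The Python sketch below assumes this reading. The local column names, the mapping table `GLOBAL_MODEL` and the sample rows are all hypothetical and do not describe the university's actual schemas.

```python
# Hypothetical mapping from each site's local column names to the global model.
GLOBAL_MODEL = {
    "Junagadh": {"stud_no": "student_id", "nm": "name"},
    "Porbandar": {"roll": "student_id", "student_name": "name"},
}


def to_global(site, local_row):
    """Rename a site's local fields to global names and tag the system of record."""
    mapping = GLOBAL_MODEL[site]
    row = {
        global_name: local_row[local_name]
        for local_name, global_name in mapping.items()
    }
    row["system_of_record"] = site  # the authoritative copy stays at the site
    return row


a = to_global("Junagadh", {"stud_no": 17, "nm": "Student One"})
b = to_global("Porbandar", {"roll": 42, "student_name": "Student Two"})
print(a)
print(b)
```

After the mapping, rows from differently structured sources share one set of field names, while the `system_of_record` tag still says which site owns each row.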
LEVELS OF GRANULARITY
Detailed data from outlying sites is added and aggregated upward until the
global Data Warehouse is populated.
The system of record remains at the outlying site level.
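
Aggregating upward can be sketched in a few lines of Python. The department names and student counts are invented for illustration: detail rows at department level are rolled up to one total per site, and the site totals are rolled up to a single global figure.

```python
from collections import Counter

# Hypothetical detailed rows held at the outlying sites:
# (site, department, number of students).
detail = [
    ("Junagadh", "Physics", 120),
    ("Junagadh", "Chemistry", 95),
    ("Amreli", "Physics", 80),
    ("Jamnagar", "Chemistry", 110),
]

# Level 1: roll department rows up to one total per site.
site_totals = Counter()
for site, _department, students in detail:
    site_totals[site] += students

# Level 2: roll site totals up to the single global figure.
global_total = sum(site_totals.values())

print(dict(site_totals))
print(global_total)
```

Each step upward loses detail, which is why the finest-grained rows, the system of record, remain at the sites.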
LOCAL WAREHOUSES
Each of the outlying sites can have its own local Data Warehouse.
The local Data Warehouses at the outlying sites can feed the global Data
Warehouse.
DRILL DOWN
Drill down starts at the global Data Warehouse and goes to the outlying sites.
Metadata is the glue that holds the global data environment together.
Distributed metadata is required across the globe.
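
The Python sketch below shows, under simplifying assumptions, how metadata could route a drill-down from a global summary row to the site that holds the detail. The dictionaries `SITE_DETAIL` and `METADATA`, the key format and the figures are hypothetical and are not taken from the case study.

```python
# Hypothetical detail rows held at two outlying sites: (department, students).
SITE_DETAIL = {
    "Junagadh": [("Physics", 120), ("Chemistry", 95)],
    "Amreli": [("Physics", 80)],
}

# Hypothetical metadata: for each summary row in the global warehouse,
# which site holds the detail behind it and at what grain.
METADATA = {
    "enrolment/Junagadh": {"site": "Junagadh", "grain": "department"},
    "enrolment/Amreli": {"site": "Amreli", "grain": "department"},
}

# The global warehouse keeps only the summarized figure for each key.
GLOBAL_SUMMARY = {
    key: sum(students for _dept, students in SITE_DETAIL[meta["site"]])
    for key, meta in METADATA.items()
}


def drill_down(summary_key):
    """Follow the metadata from a global summary row to the site-level detail."""
    site = METADATA[summary_key]["site"]
    return SITE_DETAIL[site]


rows = drill_down("enrolment/Junagadh")
print(GLOBAL_SUMMARY["enrolment/Junagadh"], rows)
```

Without the metadata entry, the summary figure could not be traced back to its source, which is the sense in which metadata holds the environment together.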
BUILDING THE GLOBAL DATA WAREHOUSE ITERATIVELY
The global Data Warehouse is built and populated iteratively, in phases.
STAGING AREAS
Staging areas can be created for the detailed refreshment data as it moves
to the global Data Warehouse.
SUPPORTING MORE THAN ONE GLOBAL DATA WAREHOUSE
The outlying sites can support more than one global Data Warehouse.
Outcomes from the System
• Discovering the stages and status of teaching and research work undertaken
by faculty members.
• Accessing information about the learning environment for students and
optimizing it for specific needs.
• Tracking the allotment of work, monitoring allotted work, and following up
on work to increase effectiveness and productivity.
• Optimizing the time needed to conduct university examinations.
• Monitoring the teaching, research and administrative activities of
departments.
• Promoting collaborative work and establishing communication to increase
the effectiveness of that work.
• Data keeping and data mining will generate data that assists quality
maintenance and quality improvement, leading to assessment for ISO 9000 and
accreditation by NAAC and NBA (AICTE).
• Account intelligence.
– Budgeting analysis and setting budget targets.
– Budget monitoring on a time scale.
• Warehousing student feedback data and generating intelligent conclusions.
• Creating syllabus information for each subject in a course program and,
based on analysis, drawing conclusions about which required subjects to
accommodate and, to a certain extent, about detailed course content. This
approach will help keep the curriculum up to date with emerging technology
and its rapid adoption by industry.