
An Introduction To Data Warehousing and Data Mining

This document provides an overview of data warehousing and data mining concepts. It begins with definitions of data, information, and knowledge. It then discusses the evolution of database systems from unordered records to relational databases. Key aspects of data warehousing are introduced, including the differences between operational and informational systems. Common uses of data mining such as marketing, fraud detection, and text analysis are also outlined. The document concludes with a case study on how Saurashtra University implemented a global data warehouse to integrate data from various local databases using different technologies.

Uploaded by

Agnivesh Pandey
Copyright
© Attribution Non-Commercial (BY-NC)

An Introduction to Data Warehousing and Data Mining


Before turning to the data warehouse, you should know:
• What data is
• How database systems evolved
What Are Data, Information and Knowledge?

As the world enters the 21st century, we face new and challenging
problems. More than ever before, governments, industry and the wider
community need information to help them make decisions to tackle these
problems.
Before one can present and interpret information there has to be a process of
gathering and sorting data. Just as crude oil is the raw material from which petrol
is distilled, so too, data can be viewed as the raw material from which
information is obtained. Therefore, a good definition of data is:
Data
Data are observations or facts which when collected, organized and evaluated
become information or knowledge.
Information
Information is data that has been organized to serve a useful purpose.
Knowledge
Knowledge is informal; it involves culture and, more generally, the know-how
a person acquires through life experience.
Evolution in Database Management
Ancient to modern:
All records were stored in an unordered format, which guaranteed neither the
quality of the data nor an efficient search technique. Then the concept of
design arrived, which led to better reliability and performance.
1960s:
Computers became cost-effective for private companies as their storage
capacity increased. Two main data models were developed: the network
model (CODASYL) and the hierarchical model (IMS).
1970-1972:
E.F. Codd proposed the relational model for databases in a landmark paper on
how to think about databases.
1976:
P. Chen proposed the Entity-Relationship (ER) model for database design
giving yet another important insight into conceptual data models.
1980s:
SQL (Structured Query Language) became the "intergalactic standard".
Evolution in Database Management (Continued)

1990:
ODBC and the beginning of Object Database Management Systems (ODBMS).
Late 1990s:
OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing).
Future trends:
Huge (terabyte) systems are appearing and will require novel means of handling
and analyzing data. Successors to SQL (and perhaps to the RDBMS) are expected
to emerge. Most likely SQL will be overtaken by XML and other emerging
techniques.
What a Data Warehouse Is

The data warehouse is the center of the architecture for the information
systems of the future. It supports informational processing by providing
a solid platform of integrated, historical data from which to do analysis.
It provides the facility for integration in a world of unintegrated
application systems. It is built in an evolutionary, step-at-a-time
fashion. It organizes and stores the data needed for informational,
analytical processing over a long historical time perspective. There is
indeed a world of promise in building and maintaining a data warehouse.
A data warehouse is a:
      subject-oriented,
      integrated,
      time-variant,
      non-volatile
collection of data in support of management's decision-making process.
Subject-oriented:
Data that gives information about a particular subject instead
of about a company's ongoing operations.
Integrated:
Data that is gathered into the data warehouse from a variety
of sources and merged into a coherent whole.
Time-variant:
All data in the data warehouse is identified with a particular
time period.
Non-volatile:
Data is stable in a data warehouse. More data is added, but
data is never removed. This enables management to gain a
consistent picture of the business.
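
The four properties above can be illustrated with a minimal Python sketch. The `Fact` and `Warehouse` names, the fields and the sample values are hypothetical, chosen only for illustration; a real warehouse would sit on a database rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a stored fact cannot be modified afterwards
class Fact:
    subject: str   # subject-oriented: facts are grouped by subject
    source: str    # integrated: facts from many sources share one shape
    period: date   # time-variant: every fact carries its time period
    value: float


class Warehouse:
    """Append-only collection of facts (hypothetical illustration)."""

    def __init__(self):
        self._facts = []

    def load(self, fact):
        # Non-volatile: data is only added; there is no update or delete.
        self._facts.append(fact)

    def history(self, subject):
        # All facts for one subject, oldest period first.
        return sorted(
            (f for f in self._facts if f.subject == subject),
            key=lambda f: f.period,
        )


wh = Warehouse()
wh.load(Fact("enrolment", "site_a", date(2009, 6, 1), 1200.0))
wh.load(Fact("enrolment", "site_b", date(2008, 6, 1), 1100.0))
wh.load(Fact("fees", "site_a", date(2009, 6, 1), 5.4))

for fact in wh.history("enrolment"):
    print(fact.period.year, fact.source, fact.value)
```

Leaving out any update or delete method is the design choice that models non-volatility here; history is queried, never rewritten.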
Operational vs. Informational Systems

Perhaps the most important concept that has come out of the Data Warehouse
movement is the recognition that there are two fundamentally different types
of information systems in all organizations: operational systems and
informational systems.
"Operational systems" are just what their name implies; they are the systems
that help us run the enterprise operation day-to-day. These are the backbone
systems of any enterprise, our "order entry", "inventory", "manufacturing",
"payroll" and "accounting" systems. Because of their importance to the
organization, operational systems were almost always the first parts of the
enterprise to be computerized. Over the years, these operational systems have
been extended and rewritten, enhanced and maintained to the point that they
are completely integrated into the organization. Indeed, most large
organizations around the world today couldn't operate without their
operational systems and the data that these systems maintain.
On the other hand, there are other functions that go on within the enterprise that
have to do with planning, forecasting and managing the organization. These
functions are also critical to the survival of the organization, especially in our
current fast-paced world. Functions like "marketing planning", "engineering
planning" and "financial analysis" also require information systems to support
them. But these functions are different from operational ones, and the types of
systems and information required are also different. These knowledge-based
functions are served by informational systems.

"Informational systems" have to do with analyzing data and making decisions,
often major decisions, about how the enterprise will operate, now and in the
future. And not only do informational systems have a different focus from
operational ones, they often have a different scope. Where operational data
needs are normally focused upon a single area, informational data needs often
span a number of different areas and need large amounts of related operational
data.
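
The difference in focus can be sketched with Python's built-in `sqlite3` module. The `orders` table, its columns and the sample rows are assumptions made for this illustration; in practice the two kinds of workload usually run on separate systems rather than one in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)",
    [
        ("north", 2008, 100.0),
        ("north", 2009, 150.0),
        ("south", 2008, 80.0),
        ("south", 2009, 120.0),
    ],
)

# Operational-style access: read or change one record at a time.
conn.execute("UPDATE orders SET amount = 155.0 WHERE id = 2")
single_order = conn.execute(
    "SELECT region, year, amount FROM orders WHERE id = 2"
).fetchone()

# Informational-style access: scan many records and summarize them.
totals_by_year = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()

print(single_order)
print(totals_by_year)
```

The first query touches a single area (one order); the second spans every region and year, which is the wider scope informational systems typically need.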
Why Data Warehousing
• Quality decisions come from quality data.
• Problems with real life data:
– Data needs to be integrated from different sources
– Missing values
– Noisy and inconsistent values
– Data is not at the right level of aggregation
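
A minimal sketch of how these four problems might be handled when loading data, using only the Python standard library. The record layout, the site codes and the choice of mean imputation are assumptions for illustration; other ways of handling missing or noisy values are equally valid.

```python
from statistics import mean

# Hypothetical daily sales records merged from different sources.
raw = [
    {"site": "A", "day": "2009-01-01", "sales": 120.0},
    {"site": "a ", "day": "2009-01-01", "sales": None},  # missing value, inconsistent code
    {"site": "B", "day": "2009-01-02", "sales": 95.0},
    {"site": "B", "day": "2009-01-15", "sales": -1.0},   # noisy: sales cannot be negative
    {"site": "A", "day": "2009-02-03", "sales": 130.0},
]


def clean(records):
    """Normalize site codes, drop noisy rows, fill missing values with the mean."""
    normalized = [dict(r, site=r["site"].strip().upper()) for r in records]
    valid = [r for r in normalized if r["sales"] is None or r["sales"] >= 0]
    known = [r["sales"] for r in valid if r["sales"] is not None]
    fill = mean(known)
    return [dict(r, sales=fill if r["sales"] is None else r["sales"]) for r in valid]


def monthly_totals(records):
    """Aggregate daily records up to (site, month) level."""
    totals = {}
    for r in records:
        key = (r["site"], r["day"][:7])  # "YYYY-MM" prefix of the date
        totals[key] = totals.get(key, 0.0) + r["sales"]
    return totals


cleaned = clean(raw)
print(monthly_totals(cleaned))
```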
Why Use Data Mining Today

• Human analysis skills are inadequate:
– Volume and dimensionality of the data
– High data growth rate
• Availability of:
– Data
– Storage
– Computational power
– Off-the-shelf software
Why Use Data Mining Today (Continued)

• Competition on service, not only on price (banks, phone companies, hotel chains, rental car companies)
• Personalization
What is Data Mining
Data Mining, or Knowledge Discovery in Databases (KDD) as it is
also known, is the nontrivial extraction of implicit, previously
unknown, and potentially useful information from data. This
encompasses a number of different technical approaches, such as
clustering, data summarization, learning classification rules, finding
dependency networks, analysing changes, and detecting anomalies.
In other words, data mining is the search for relationships and global
patterns that exist in large databases but are 'hidden' among the vast
amount of data, such as a relationship between student data and their
progress report. These relationships represent valuable knowledge
about the database and the objects in the database and, if the
database is a faithful mirror, of the real world registered by the
database.
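
As a very small illustration of searching for a relationship in student data, the sketch below computes the Pearson correlation between two attributes. The attendance and marks figures are invented, and a single correlation is only a simple stand-in for the richer techniques listed above (clustering, classification rules, anomaly detection).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical student records: attendance (%) and final marks (%).
attendance = [60, 70, 80, 90, 95]
marks = [52, 61, 70, 78, 88]

r = pearson(attendance, marks)
print(round(r, 3))
```

A value of `r` close to 1 suggests that the two attributes rise together; it does not by itself show that one causes the other.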
Preprocessing and Mining
[Figure: the knowledge discovery process. Original Data → Data Integration and Selection → Target Data → Preprocessing → Preprocessed Data → Model Construction → Patterns → Interpretation → Knowledge]
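
The stages named in this diagram can be sketched as a chain of small Python functions. The function names, the record fields and the "average marks per department" model are hypothetical stand-ins; a real model-construction step would use a proper mining algorithm.

```python
def integrate_and_select(sources, fields):
    """Original data -> target data: merge sources, keep only chosen fields."""
    merged = [row for source in sources for row in source]
    return [{f: row.get(f) for f in fields} for row in merged]


def preprocess(rows):
    """Target data -> preprocessed data: drop rows with missing values."""
    return [row for row in rows if all(v is not None for v in row.values())]


def build_model(rows):
    """Preprocessed data -> patterns: average marks per department."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["dept"], []).append(row["marks"])
    return {dept: sum(m) / len(m) for dept, m in grouped.items()}


def interpret(patterns, threshold):
    """Patterns -> knowledge: departments whose average is below a threshold."""
    return sorted(dept for dept, avg in patterns.items() if avg < threshold)


# Hypothetical records from two sources with slightly different layouts.
site_a = [
    {"dept": "physics", "marks": 55, "name": "x"},
    {"dept": "maths", "marks": 72, "name": "y"},
]
site_b = [{"dept": "physics", "marks": 45}, {"dept": "maths", "marks": None}]

target = integrate_and_select([site_a, site_b], fields=["dept", "marks"])
patterns = build_model(preprocess(target))
knowledge = interpret(patterns, threshold=60)
print(patterns, knowledge)
```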
Convergence of Three Key Technologies
Common Uses of Data Mining

• Direct mail marketing
• Web site personalization
• Credit card fraud detection
• Bioinformatics
• Cheminformatics
• Text mining & analysis
• Market basket analysis
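
To make the last item concrete, here is a minimal sketch of market basket analysis that computes the support of an item pair and the confidence of a rule. The baskets are invented, and the helper names `support` and `confidence` are chosen for this illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
]

# How many baskets contain each item, and each (alphabetically sorted) pair.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def support(item_a, item_b):
    """Fraction of all baskets that contain both items."""
    pair = tuple(sorted((item_a, item_b)))
    return pair_counts[pair] / len(baskets)


def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction with the consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


print(support("bread", "milk"))
print(confidence("butter", "bread"))
```

High confidence for "butter implies bread" means that, in this sample, baskets with butter almost always also contain bread.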
CASE STUDY OF SAURASHTRA UNIVERSITY
RAJKOT - GUJARAT
TECHNOLOGICAL HETEROGENEITY

[Figure: database technology at each site]
• Rajkot: Oracle
• Surendranagar: DEC Sybase
• Junagadh: IBM DB2
• Porbandar: FoxPro
• Amreli: Oracle
• Jamnagar: Sql Ser

The technological environment typically is heterogeneous.
[Figure: Rajkot and the sites Junagadh, Porbandar, Amreli, Jamnagar and Surendranagar]
Detailed data is refreshed into the global warehouse
from the outlying sites.
[Figure: Rajkot and the sites Surendranagar, Junagadh, Jamnagar, Porbandar and Amreli]
The global data model is used to identify and define the
system of record at the outlying sites.
LEVELS OF GRANULARITY

Detailed data from outlying sites is added and aggregated upward
until the global Data Warehouse is populated.
The system of record remains at the outlying site level.
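
Aggregating upward can be sketched as two roll-up steps: detailed rows to site-level totals, then site totals to global totals. The site names come from the case study, but the monthly counts and the function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical detailed records held at outlying sites (the system of record):
# each entry is (month, number of registrations).
site_records = {
    "Junagadh": [("2009-01", 40), ("2009-01", 35), ("2009-02", 50)],
    "Amreli": [("2009-01", 20), ("2009-02", 25)],
}


def summarize_site(records):
    """Detailed rows -> one total per month for a single site."""
    totals = defaultdict(int)
    for month, count in records:
        totals[month] += count
    return dict(totals)


def populate_global(sites):
    """Site-level monthly totals -> global monthly totals."""
    global_totals = defaultdict(int)
    for records in sites.values():
        for month, total in summarize_site(records).items():
            global_totals[month] += total
    return dict(global_totals)


global_warehouse = populate_global(site_records)
print(global_warehouse)
```

The detailed rows are never copied or changed at the global level in this sketch; only their totals move upward, so the outlying sites stay the system of record.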
LOCAL WAREHOUSES

Each of the outlying sites can have its own local Data Warehouse.
The local Data Warehouses at the outlying sites can feed the global
Data Warehouse.
DRILL DOWN

Drill down starts at the global Data Warehouse and goes to the outlying sites.
Metadata is the glue that holds the global data environment together.
Distributed metadata is required across the globe.
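
A minimal sketch of drill down, assuming the metadata records which outlying sites hold the detail behind each global summary. The dictionaries, the subject name `exam_results` and the student rows are hypothetical.

```python
# Hypothetical metadata: which sites hold the detail behind each summary.
metadata = {"exam_results": ["Rajkot", "Jamnagar"]}

# Hypothetical detailed data kept at the outlying sites.
site_detail = {
    "Rajkot": {"exam_results": [("student_1", 71), ("student_2", 64)]},
    "Jamnagar": {"exam_results": [("student_3", 58)]},
}

# Summary row as it might appear in the global Data Warehouse.
global_summary = {"exam_results": {"students": 3}}


def drill_down(subject):
    """Follow the metadata from a global summary to the site-level rows."""
    detail = {}
    for site in metadata[subject]:
        detail[site] = site_detail[site][subject]
    return detail


detail = drill_down("exam_results")

# The detail reached through the metadata should match the global summary.
row_count = sum(len(rows) for rows in detail.values())
assert row_count == global_summary["exam_results"]["students"]
print(detail)
```

Without the metadata lookup, the global level would have no way of knowing where the detail behind a summary lives, which is why the slides call it the glue.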
BUILDING THE GLOBAL DATA WAREHOUSE ITERATIVELY

The global Data Warehouse is built and populated iteratively, in phases.
STAGING AREAS

Staging areas can be created for the detailed refreshment
data as it moves to the global Data Warehouse.
SUPPORTING MORE THAN ONE GLOBAL DATA WAREHOUSE

The outlying sites can support more than one global Data Warehouse.
Outcomes of the System

• Discover the stages and status of teaching and research work undertaken by faculty members.
• Access information about the learning environment for students and optimize it for specific needs.
• Track the allotment of work, monitor allotted work and follow up on it to increase effectiveness and productivity.
• Optimize the time needed to conduct university examinations.
• Monitor the teaching, research and administrative activities of departments.
• Promote collaborative work and establish communication to make collaboration more effective.
• Data keeping and data mining will generate data that assists quality maintenance and quality improvement, leading to assessment for ISO 9000 and accreditation by NAAC and NBA (AICTE).
• Account intelligence.
• Budget analysis and setting of budget targets.
• Budget monitoring on a time scale.
• Warehousing of student feedback data and generation of intelligent conclusions.
• Creating syllabus information for each subject of a course program and deriving conclusions about accommodating required subjects, based on analysis and, to a certain extent, on course content detail. This approach will help update the curriculum to keep pace with emerging technology and its quick implementation by industry.
