An Introduction To Data Warehousing and Data Mining
As the world enters the 21st century, we face new and challenging
problems. More than ever before, governments, industry and the wider
community need information to help them make decisions to tackle these
problems.
Before one can present and interpret information there has to be a process of
gathering and sorting data. Just as crude oil is the raw material from which petrol
is distilled, so too, data can be viewed as the raw material from which
information is obtained. Therefore, a good definition of data is:
Data
Data are observations or facts which when collected, organized and evaluated
become information or knowledge.
Information
Information is data that has been organized to serve a useful purpose.
Knowledge
Knowledge is informal: it involves culture and, more generally, the know-how
that a person acquires through life experience.
Evolution of Database Management
Ancient to modern:
Records were originally stored in an unordered format, which guaranteed
neither data quality nor an efficient way to search. The concept of design
came later and led to better reliability and performance.
1960’s:
Computers became cost-effective for private companies as their storage
capacity increased. Two main data models were developed: the network model
(CODASYL) and the hierarchical model (IMS).
1970-1972:
E.F. Codd proposed the relational model for databases in a landmark paper on
how to think about databases.
1976:
P. Chen proposed the Entity-Relationship (ER) model for database design
giving yet another important insight into conceptual data models.
1980:
SQL (Structured Query Language) became the “intergalactic standard”.
1990:
ODBC and the beginning of Object Database Management Systems (ODBMS).
Late-1990’s:
OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing).
Future trends:
Huge (terabyte-scale) systems are appearing and will require novel means of
handling and analyzing data. Successors to SQL (and perhaps to the RDBMS) will
emerge in the future. Most likely SQL will be overtaken by XML and other
emerging techniques.
What a Data Warehouse Is
Perhaps the most important concept that has come out of the Data Warehouse
movement is the recognition that there are two fundamentally different types
of information systems in all organizations: operational systems and
informational systems.
"Operational systems" are just what their name implies; they are the systems
that help us run the enterprise operation day-to-day. These are the backbone
systems of any enterprise: our "order entry", "inventory", "manufacturing",
"payroll" and "accounting" systems. Because of their importance to the
organization, operational systems were almost always the first parts of the
enterprise to be computerized. Over the years, these operational systems have
been extended and rewritten, enhanced and maintained to the point that they
are completely integrated into the organization. Indeed, most large
organizations around the world today couldn't operate without their
operational systems and the data that these systems maintain.
On the other hand, there are other functions that go on within the enterprise that
have to do with planning, forecasting and managing the organization. These
functions are also critical to the survival of the organization, especially in our
current fast-paced world. Functions like "marketing planning", "engineering
planning" and "financial analysis" also require information systems to support
them. But these functions are different from operational ones, and the types of
systems and information required are also different. The systems that support
these knowledge-based functions are informational systems.
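The contrast between the two kinds of system can be sketched in a few lines of Python using the standard-library sqlite3 module. This is only an illustration: the `orders` table, its columns and the sample figures are assumptions, not part of the original material. The operational side writes one small transaction at a time, while the informational side summarizes many rows to support planning.

```python
import sqlite3

# Hypothetical order-entry table; names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

def record_order(region, amount):
    """Operational work: record a single order as one small transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)",
            (region, amount),
        )

def sales_by_region():
    """Informational work: aggregate all orders to support planning."""
    return conn.execute(
        "SELECT region, COUNT(*), SUM(amount) FROM orders "
        "GROUP BY region ORDER BY region"
    ).fetchall()

for region, amount in [("North", 120.0), ("South", 80.0), ("North", 30.0)]:
    record_order(region, amount)

summary = sales_by_region()
print(summary)
```

In practice the two workloads usually run on separate systems, so that long analytical queries do not slow down day-to-day transactions.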
Why Use Data Mining Today
Availability of:
– Data
– Storage
– Computational power
– Off-the-shelf software
Personalization
What is Data Mining
Data Mining, or Knowledge Discovery in Databases (KDD) as it is
also known, is the nontrivial extraction of implicit, previously
unknown, and potentially useful information from data. This
encompasses a number of different technical approaches, such as
clustering, data summarization, learning classification rules, finding
dependency networks, analysing changes, and detecting anomalies.
In other words, data mining is the search for relationships and global
patterns that exist in large databases but are 'hidden' among the vast
amount of data, such as a relationship between student data and their
progress reports. These relationships represent valuable knowledge
about the database and the objects in it and, if the database is a
faithful mirror, about the real world that the database records.
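As a small illustration of searching for a hidden relationship, the sketch below measures how strongly two columns of student data move together using the Pearson correlation coefficient. The records are invented for illustration, and correlation is only one simple technique among the many that data mining covers.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical student records: (attendance in percent, final mark).
students = [(95, 88), (60, 52), (80, 75), (70, 61), (90, 85)]
attendance = [a for a, _ in students]
marks = [m for _, m in students]

r = pearson(attendance, marks)
print(f"correlation between attendance and marks: {r:.2f}")
```

A value of r close to 1 suggests that higher attendance goes together with higher marks in this sample; it does not by itself show that one causes the other.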
Preprocessing and Mining
[Figure: stages of the knowledge discovery process. Original Data -> (Data Integration and Selection) -> Target Data -> (Preprocessing) -> Preprocessed Data -> (Model Construction) -> Patterns -> (Interpretation) -> Knowledge]
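The stages named in the figure can be sketched as a chain of small Python functions. This is a toy walk-through under stated assumptions: the student records, the field names and the attendance threshold of 75 are all hypothetical, and the "model" is just a comparison of two averages rather than a real mining algorithm.

```python
# Hypothetical raw records; None marks a missing value.
original_data = [
    {"student": "A", "attendance": 95, "mark": 88, "hostel": "yes"},
    {"student": "B", "attendance": 60, "mark": 52, "hostel": "no"},
    {"student": "C", "attendance": None, "mark": 75, "hostel": "no"},
    {"student": "D", "attendance": 70, "mark": 61, "hostel": "yes"},
    {"student": "E", "attendance": 90, "mark": 85, "hostel": "no"},
]

def select(records, fields):
    """Integration and selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def preprocess(records):
    """Preprocessing: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

def build_model(records, threshold=75):
    """Model construction: average mark above and below an attendance threshold.

    Assumes both groups are non-empty.
    """
    high = [r["mark"] for r in records if r["attendance"] >= threshold]
    low = [r["mark"] for r in records if r["attendance"] < threshold]
    return {"high": sum(high) / len(high), "low": sum(low) / len(low)}

def interpret(pattern):
    """Interpretation: turn the pattern into a statement a person can act on."""
    gap = pattern["high"] - pattern["low"]
    return f"Students with high attendance score {gap:.1f} marks more on average."

target = select(original_data, ["attendance", "mark"])   # target data
clean = preprocess(target)                               # preprocessed data
pattern = build_model(clean)                             # patterns
knowledge = interpret(pattern)                           # knowledge
print(knowledge)
```

Each function consumes the output of the previous one, mirroring how each stage of the process hands a smaller, cleaner artifact to the next.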
Convergence of Three Key Technologies
Common Uses of Data Mining
Rajkot (Oracle)
Surendranagar (DEC Sybase)
Junagadh (IBM DB2)
Porbandar (FoxPro)
Amreli (Oracle)
Jamnagar (Sql Ser)
The technological environment typically is heterogeneous.
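When each site runs a different product, the same fact tends to arrive in different shapes, and a warehouse load has to map every source onto one common layout. The sketch below shows that idea for three of the sites. The field names, formats and sample rows are invented for illustration and do not describe the real systems.

```python
# Hypothetical rows as three sites might export them; all field names,
# formats and values are assumptions made for this sketch.
rajkot_rows = [{"STUD_ID": 101, "NAME": "Asha", "FEES": "1,200.00"}]
junagadh_rows = [{"student_no": "J-7", "full_name": "ravi", "fee_paid": 950}]
porbandar_rows = [("P12", "Meena", "700")]

def from_rajkot(row):
    """Map a Rajkot record: numeric id, fees stored as formatted text."""
    return {
        "site": "Rajkot",
        "student_id": str(row["STUD_ID"]),
        "name": row["NAME"].title(),
        "fees": float(row["FEES"].replace(",", "")),
    }

def from_junagadh(row):
    """Map a Junagadh record: different column names, lowercase names."""
    return {
        "site": "Junagadh",
        "student_id": row["student_no"],
        "name": row["full_name"].title(),
        "fees": float(row["fee_paid"]),
    }

def from_porbandar(row):
    """Map a Porbandar record: positional fields with no column names."""
    student_id, name, fees = row
    return {
        "site": "Porbandar",
        "student_id": student_id,
        "name": name.title(),
        "fees": float(fees),
    }

# Load every source into one list that shares a single layout.
warehouse = (
    [from_rajkot(r) for r in rajkot_rows]
    + [from_junagadh(r) for r in junagadh_rows]
    + [from_porbandar(r) for r in porbandar_rows]
)

total_fees = sum(r["fees"] for r in warehouse)
print(total_fees)
```

Once the records share one layout, a question that spans all sites, such as total fees collected, becomes a single pass over the combined data.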
[Figure: the sites Rajkot, Surendranagar, Junagadh, Jamnagar, Porbandar and Amreli]
Discovering the stages and status of teaching and research work undertaken
by faculty members.
•http://www.billinmon.com
•http://www.pcc.qub.ac.uk
•http://db.cs.sfu.ca