Chapter-1 (Introduction)

This document provides an introduction to data mining. It discusses why data mining is necessary given the explosive growth of data from various sources. It defines data mining as the process of discovering interesting patterns and knowledge from large amounts of data. The document outlines the major components of data mining, including what kinds of data can be mined, what patterns can be discovered, and common technologies used. It aims to give the reader a high-level overview of the key concepts in data mining.

Uploaded by

Monis Khan

Data Mining - Introduction

Pramod Kumar Singh

Professor (Computer Science and Engineering)
ABV – Indian Institute of Information Technology Management Gwalior
Gwalior – 474015, MP, India
Introduction

◼ Why Data Mining?

◼ What Is Data Mining?

◼ What Kind of Data Can Be Mined?

◼ What Kinds of Patterns Can Be Mined?

◼ What Technologies Are Used?

◼ What Kind of Applications Are Targeted?

◼ Major Issues in Data Mining

Data Mining – Why?

We say we live in the information age, but in fact we live in the data age. Digitalization has increased the
volume of data many times over compared with earlier eras, and this growth is explosive. The World Wide Web
(WWW), social networks, supermarkets, businesses, industries, and other sources generate petabytes of data and more.
◼ Major sources of explosive growth of abundant data
  ◼ Business: Web, e-commerce, transactions, stocks, …
  ◼ Science: Remote sensing, bioinformatics, scientific simulation, …
  ◼ Society and everyone: news, digital cameras, YouTube, …
It is not possible to uncover the knowledge or information hidden in this heap of data without automated
tools.
◼ We are drowning in data, but starving for knowledge!
◼ “Necessity is the mother of invention”: data mining is the automated analysis of massive data sets.

This explosively growing, widely available, and gigantic body of data makes our time truly the data age.

Powerful and versatile tools are badly needed to automatically uncover valuable information from the
tremendous amounts of data and to transform such data into organized knowledge.

This necessity has led to the birth of data mining.

Figure: The world is data rich but information poor.

Data mining can be viewed as a result of the natural evolution of Information Technology (IT).

Since the 1960s, database and information technology has evolved systematically from primitive file
processing systems to sophisticated and powerful database systems.

After the establishment of database management systems, database technology moved toward the
development of advanced database systems, data warehousing, and data mining for advanced data
analysis and web-based databases.

Advanced data analysis emerged from the late 1980s onward, driven by steady progress in computer
hardware technology, which made powerful and affordable computers, data collection equipment, and
storage media available. This progress boosted information retrieval and data analysis.
One emerging data repository architecture is the data warehouse. This is a repository of multiple
heterogeneous data sources organized under a unified schema at a single site to facilitate management decision
making.

Data warehouse technology includes data cleaning, data integration, and online analytical processing (OLAP) —
that is, analysis techniques with functionalities such as summarization, consolidation, and aggregation, as well as
the ability to view information from different angles.
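
As a rough illustration of the summarization and aggregation that OLAP offers, the sketch below rolls up a tiny, made-up sales table along different dimensions. The records, the field names, and the `roll_up` helper are all hypothetical; a real data warehouse would use an OLAP engine rather than in-memory Python lists.

```python
from collections import defaultdict

# Hypothetical fact table: one row per (region, quarter, product) with a sales amount.
SALES = [
    {"region": "North", "quarter": "Q1", "product": "laptop", "amount": 120},
    {"region": "North", "quarter": "Q2", "product": "laptop", "amount": 80},
    {"region": "South", "quarter": "Q1", "product": "phone", "amount": 50},
    {"region": "South", "quarter": "Q1", "product": "laptop", "amount": 70},
    {"region": "South", "quarter": "Q2", "product": "phone", "amount": 30},
]


def roll_up(rows, dimensions):
    """Sum `amount` over all rows that share the same values for `dimensions`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)


# The same data viewed from different angles (coarser or finer groupings).
by_region = roll_up(SALES, ["region"])
by_region_quarter = roll_up(SALES, ["region", "quarter"])
grand_total = roll_up(SALES, [])
```

Grouping by fewer dimensions gives a more consolidated view, and grouping by more dimensions gives a more detailed one.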

Although OLAP tools support multidimensional analysis and decision making, additional data analysis tools are
required for in-depth analysis, for example data mining tools that provide data classification, clustering,
outlier/anomaly detection, and the characterization of changes in data over time.

Since the 1990s, huge volumes of data have accumulated beyond databases and data warehouses, e.g., in the World
Wide Web, web-based databases, and Internet-based, globally interconnected, heterogeneous databases and
information bases. These play a vital role in the information industry.
Analyzing such diverse forms of data effectively and efficiently, by integrating information retrieval, data
mining, and information network analysis technologies, is a challenging task.
The fast-growing, tremendous amount of data, collected and stored in large and numerous data
repositories, has far exceeded our human ability for comprehension without powerful tools. This situation
(the abundance of data and the need for powerful data analysis tools) is described as a data rich but
information poor situation.

Consequently, important decisions are often made based NOT on the information-rich data stored in data
repositories but rather on a decision maker’s intuition, simply because the decision maker does not have the
tools to extract the valuable knowledge embedded in the vast amounts of data.

The widening gap between data and information calls for the systematic development of data mining tools
that can turn data tombs into golden nuggets of knowledge.

Figure: The world is data rich but information poor.
What is Data Mining

Data mining, a truly interdisciplinary subject, is essentially knowledge mining from data, as the adjacent
figure illustrates.

Other popular names for data mining are knowledge mining from data, knowledge extraction, data/pattern
analysis, data archaeology, and data dredging.

Data mining is also popularly treated as a synonym for knowledge discovery from data, or KDD.

Figure: Data mining – searching the knowledge (interesting patterns) in data.
The KDD process consists of the following steps.
1. Data cleaning: noise and inconsistent data are removed.
2. Data integration: data from multiple sources are combined.
3. Data selection: data relevant to the analysis task are retrieved from the database.
4. Data transformation: data are transformed and consolidated into forms appropriate for mining, usually by
performing summary or aggregation operations.
5. Data mining: an essential process in which intelligent methods are applied to extract data patterns.
6. Pattern evaluation: the truly interesting patterns representing knowledge are identified based on
interestingness measures.
7. Knowledge presentation: visualization and knowledge representation techniques are used to present the mined
knowledge to users.

Figure: Data mining as a step in the process of knowledge discovery.
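
The seven steps can be sketched as a chain of small functions. This is a toy illustration only: the two record sources, the cleaning rules, the choice of item counting as the "mining" step, and the 0.5 support threshold are all assumptions made for the example, not part of the KDD definition.

```python
from collections import Counter

# Two hypothetical sources of purchase records: (customer, item, amount).
STORE_A = [("ann", "milk", 3.0), ("bob", "bread", 2.0), ("ann", "bread", 2.5), ("cat", None, 1.0)]
STORE_B = [("bob", "milk", 3.0), ("cat", "milk", -1.0), ("dan", "milk", 3.5)]


def clean(records):
    # Step 1: drop records with a missing item or a nonsensical negative amount.
    return [r for r in records if r[1] is not None and r[2] >= 0]


def integrate(*sources):
    # Step 2: combine several sources into one list.
    return [r for source in sources for r in source]


def select(records):
    # Step 3: keep only the fields relevant to the task (customer and item).
    return [(customer, item) for customer, item, _ in records]


def transform(pairs):
    # Step 4: consolidate the pairs into one set of items per customer.
    baskets = {}
    for customer, item in pairs:
        baskets.setdefault(customer, set()).add(item)
    return baskets


def mine(baskets):
    # Step 5: count in how many baskets each item occurs.
    return Counter(item for items in baskets.values() for item in items)


def evaluate(counts, n_baskets, min_support=0.5):
    # Step 6: keep items whose support meets the (assumed) threshold.
    return {item: c / n_baskets for item, c in counts.items() if c / n_baskets >= min_support}


def present(patterns):
    # Step 7: render the surviving patterns as readable lines.
    return [f"{item}: support {s:.2f}" for item, s in sorted(patterns.items())]


baskets = transform(select(integrate(clean(STORE_A), clean(STORE_B))))
report = present(evaluate(mine(baskets), len(baskets)))
```

Here `report` holds one line per item that passed the evaluation step. In practice each step is far more involved, and the mining step may interact with the user or a knowledge base.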

The first four steps (Steps 1 – 4) are different forms of data preprocessing, where data are prepared for mining.
The data mining step may interact with the user or a knowledge base. The interesting patterns are presented to
the user and may be stored as new knowledge in the knowledge base.

Although data mining is shown as one step in the knowledge discovery process, in industry, in media, and in
the research milieu, the term data mining is often used to refer to the entire knowledge discovery process.

Therefore, we adopt a broad view of data mining functionality: Data mining is the process of discovering
interesting patterns and knowledge from large amounts of data. The data sources can include databases, data
warehouses, the Web, other information repositories, or data that are streamed into the system dynamically.
What Kinds of Data can be Mined?

As a general technology, data mining can be applied to any kind of data as long as the data are meaningful for a
target application.

However, the most basic forms of data for mining applications are database data, data warehouse data, and
transactional data.

Data mining can also be applied to other forms of data, e.g., data streams, ordered/sequence data, graph or
networked data, spatial data, text data, multimedia data, and the WWW.

Data mining continues to embrace new data types as they emerge.

What Kinds of Patterns can be Mined?
There are a number of data mining functionalities. Primarily, they are:
Characterization and discrimination: Data characterization is a summarization of the general characteristics or features
of a target class of data. Data discrimination is a comparison of the general features of the target class data objects against
the general features of objects from one or multiple contrasting classes.
Mining of frequent patterns, associations, and correlations: Frequent patterns are patterns that occur
frequently in data. Associations are strong relationships among the items of frequent patterns. Correlations are
interesting statistical correlations between associated attribute–value pairs.
Classification and regression: Classification is the process of finding a model (or function) that describes and
distinguishes data classes or concepts. Whereas classification predicts categorical (discrete, unordered) labels,
regression models continuous-valued functions.
Clustering analysis: Unlike classification and regression, which analyze class-labeled (training) data sets,
clustering analyzes data objects without consulting class labels.
Outlier analysis: A data set may contain objects that do not comply with the general behavior or model of the
data. These data objects are outliers.
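
To make the notions of frequent patterns and associations above more concrete, the sketch below computes support and confidence, two commonly used measures, on a handful of made-up market-basket transactions. The data, the function names, and the brute-force pair search are illustrative assumptions; practical frequent-pattern miners use far more efficient algorithms.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C together) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_pairs(transactions, min_support):
    """All 2-item sets whose support is at least `min_support` (brute force)."""
    items = sorted(set().union(*transactions))
    return [pair for pair in combinations(items, 2) if support(pair, transactions) >= min_support]


pairs = frequent_pairs(TRANSACTIONS, min_support=0.5)
conf_bread_milk = confidence({"bread"}, {"milk"}, TRANSACTIONS)
```

A rule such as "bread implies milk" would typically be reported only if both its support and its confidence exceed user-chosen thresholds.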
Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks. In general,
such tasks can be classified into two categories: descriptive and predictive.
✓ Descriptive mining tasks characterize properties of the data in a target data set.
✓ Predictive mining tasks perform induction on the current data in order to make predictions.
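
The difference between the two categories can be shown on a tiny labeled data set. In this sketch the data, the attribute meanings, and both helper functions are hypothetical; the descriptive task summarizes each class by its attribute means, and the predictive task uses a 1-nearest-neighbor rule, which is only one simple choice among many predictive methods.

```python
from statistics import mean

# Hypothetical labeled objects: (height_cm, weight_kg) -> class label.
TRAINING = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]


def describe(data):
    """Descriptive task: summarize each class by the mean of every attribute."""
    by_class = {}
    for features, label in data:
        by_class.setdefault(label, []).append(features)
    return {label: tuple(mean(col) for col in zip(*rows)) for label, rows in by_class.items()}


def predict(data, query):
    """Predictive task: give a new object the label of its nearest training object."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(data, key=lambda pair: sq_dist(pair[0], query))
    return label


summary = describe(TRAINING)
label = predict(TRAINING, (178.0, 80.0))
```

`summary` characterizes the data already at hand, whereas `label` is an inference about an object that was not in the data.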
What Technologies are Used?
As a highly application-driven domain, data mining has incorporated many techniques from other domains such
as statistics, machine learning, pattern recognition, database and data warehouse systems, information retrieval,
visualization, algorithms, high performance computing, and many application domains.

Figure: Data mining adopts techniques from many domains.

The interdisciplinary nature of data mining research and development contributes significantly to the success of
data mining and its extensive applications.
What Kinds of Applications are Targeted?

As a highly application-driven discipline, data mining has seen great successes in many applications. Its
applications are limited only by the human imagination.

Two highly successful and popular application examples of data mining are business intelligence and
web search engines.

Business Intelligence (BI): BI technologies provide historical, current, and predictive views of business
operations. Examples include reporting, online analytical processing, business performance management,
competitive intelligence, benchmarking, and predictive analytics. Data mining is the core of business
intelligence.

Web Search Engines: A web search engine is a specialized computer server that searches for information on the
Web. The search results of a user query are often returned as a list (sometimes called hits). The hits may consist
of web pages, images, and other types of files. Web search engines are essentially very large data mining
applications. Various data mining techniques are used in all aspects of search engines, ranging from crawling
and indexing to searching.
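
The indexing and searching stages can be sketched with an inverted index, a data structure commonly used for this purpose. The page names and texts below are invented, crawling is replaced by a hard-coded dictionary, and there is no ranking, so this is only a minimal sketch of the idea and not a description of how any real search engine works.

```python
from collections import defaultdict

# Hypothetical result of crawling: page name -> page text.
PAGES = {
    "page-a": "data mining discovers patterns in data",
    "page-b": "olap supports multidimensional analysis",
    "page-c": "knowledge discovery uses data mining",
}


def build_index(pages):
    """Indexing: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, query):
    """Searching: return the pages that contain every query word, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


index = build_index(PAGES)
hits = search(index, "data mining")
```

Building the index once makes each query a matter of intersecting a few sets instead of rescanning every page.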
Major Issues in Data Mining
Data mining is a dynamic and fast-expanding field with great strengths. The major issues in data mining research
can be partitioned into the following five groups.

Mining methodology
✓ Mining various and new kinds of knowledge: Due to the diversity of applications, new mining tasks
continue to emerge, making data mining a dynamic and fast-growing field.
✓ Mining knowledge in multidimensional space: When searching for knowledge in large data sets, we can
explore the data in multidimensional space.
✓ Data mining – an interdisciplinary effort: The power of data mining can be substantially enhanced by
integrating new methods from multiple disciplines.
✓ Boosting the power of discovery in a networked environment: Most data objects reside in a linked or
interconnected environment, whether it be the Web, database relations, files, or documents.
✓ Handling uncertainty, noise, or incompleteness of data: Data often contain noise, errors, exceptions, or
uncertainty, or are incomplete. Errors and noise may confuse the data mining process, leading to the
derivation of erroneous patterns.
✓ Pattern evaluation and pattern- or constraint-guided mining: Not all the patterns generated by data
mining processes are interesting. Pattern interestingness is user dependent. Therefore, techniques are
needed to assess the interestingness of discovered patterns based on subjective measures.

User interaction
✓ Interactive mining: Interactive mining should allow users to dynamically change the focus of a search and to
refine mining requests based on returned results.
✓ Incorporation of background knowledge: Background knowledge, constraints, rules, and other
information regarding the domain under study should be incorporated into the knowledge discovery
process.
✓ Ad hoc data mining and data mining query languages: There should be high-level data mining query
languages or other high-level flexible user interfaces that give users the freedom to define ad hoc data
mining tasks.
✓ Presentation and visualization of data mining results: How can a data mining system present data mining
results, vividly and flexibly, so that the discovered knowledge can be easily understood and directly
usable by humans?
Efficiency and scalability
✓ Efficiency and scalability of data mining algorithms: The algorithms must be efficient and scalable to
effectively extract information from huge amounts of data in many data repositories or in dynamic data
streams.
✓ Parallel, distributed, and incremental mining algorithms: The humongous size of many data sets, the wide
distribution of data, and the computational complexity of some data mining methods are prime factors
motivating the development of such algorithms.
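
The idea behind incremental mining can be sketched as follows: when new data arrive, the existing result is updated from the new batch alone instead of being recomputed from scratch. The `IncrementalItemCounter` class, its method names, and the transactions are hypothetical, and counting single items is a deliberately simple stand-in for a real mining task.

```python
from collections import Counter


class IncrementalItemCounter:
    """Keeps item frequencies up to date as new batches of transactions arrive."""

    def __init__(self):
        self.counts = Counter()
        self.n_transactions = 0

    def update(self, batch):
        # Only the new batch is scanned; earlier transactions are never revisited.
        for transaction in batch:
            self.counts.update(set(transaction))
            self.n_transactions += 1

    def frequent_items(self, min_support):
        # Items whose relative frequency so far meets the threshold.
        return sorted(
            item for item, c in self.counts.items()
            if c / self.n_transactions >= min_support
        )


counter = IncrementalItemCounter()
counter.update([["milk", "bread"], ["milk"]])
after_first = counter.frequent_items(0.6)
counter.update([["bread", "butter"], ["bread"]])
after_second = counter.frequent_items(0.6)
```

The set of frequent items can change between batches, which is why the counts, not just the final answer, are kept between updates.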
Diversity of data types
✓ Handling complex types of data: Diverse applications generate a wide spectrum of new and complex data
types.
✓ Mining dynamic, networked, and global data repositories: Mining gigantic, interconnected information
networks may help disclose many more patterns and knowledge in heterogeneous data sets than can be
discovered from a small set of isolated data repositories.
Data mining and society
✓ Social impacts of data mining: With data mining penetrating our everyday lives, it is important to study
the impact of data mining on society.
✓ Privacy-preserving data mining: Data mining poses the risk of disclosing an individual’s personal
information.
✓ Invisible data mining: People should be able to perform data mining or use data mining results simply by
mouse clicking, without any knowledge of data mining algorithms.
