1.1 - Intro DM
UNIT – I:
Introduction to Data Mining: Introduction, What is Data Mining, Definition, KDD, Challenges, Data Mining
Tasks, Data Preprocessing, Data Cleaning, Missing data, Dimensionality Reduction, Feature Subset Selection,
Discretization and Binarization, Data Transformation; Measures of Similarity and Dissimilarity: Basics.
1.1 Introduction – What is Data Mining, Definition
In general terms, “Mining” is the process of extracting valuable material from the earth, e.g. coal
mining, diamond mining, etc. In the context of computer science, “Data Mining” refers to the extraction of
useful information from large volumes of data, such as databases or data warehouses.
In the case of coal or diamond mining, the result of the extraction process is coal or diamond. In the case of
Data Mining, however, the result of the extraction process is not data! Instead, the result of data mining is the
patterns and knowledge that we gain at the end of the extraction process.
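Thus, Data Mining is also known as Knowledge Discovery or Knowledge Extraction.
Today, the terms Data Mining and Knowledge Discovery are often used interchangeably, although, strictly
speaking, data mining is one step within the larger Knowledge Discovery in Databases (KDD) process.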
Data Mining refers to the nontrivial extraction of implicit, previously unknown and potentially useful
information from data in databases.
Nowadays, data mining is used almost everywhere that large amounts of data are stored and
processed.
For example, banks typically use data mining to identify prospective customers who may be
interested in credit cards, personal loans or insurance. Since banks hold the transaction details and
detailed profiles of their customers, they analyze this data to find patterns that help them predict
which customers are likely to be interested in personal loans and similar products.
Main Purpose of Data Mining
Basically, the information gathered from Data Mining helps to uncover hidden patterns and to predict future
trends and behaviors, allowing businesses to make informed decisions.
Technically, data mining is the computational process of analyzing data from different perspectives, dimensions
and angles, and categorizing or summarizing it into meaningful information.
Data Mining can be applied to any type of data, e.g. Data Warehouses, Transactional Databases, Relational
Databases, Multimedia Databases, Spatial Databases, Time-series Databases and the World Wide Web.
Real life example of Data Mining – Market Basket Analysis
Market Basket Analysis is a technique that carefully studies the purchases made by customers in a
supermarket. It is used to identify items that customers buy together. Say, if a person buys bread, what are
the chances that he/she will also purchase butter? This analysis helps companies design promotional offers
and deals, and it is carried out with the help of data mining.
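As a minimal sketch, assuming a handful of made-up baskets (the data and the helper names `count`, `support` and `confidence` are illustrative, not taken from any library), the bread-and-butter question can be answered by measuring how often the items occur together. Support is the fraction of all baskets containing an itemset; confidence is the fraction of bread baskets that also contain butter.

```python
# Hypothetical toy transactions: each basket is the set of items in one purchase.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def count(baskets, items):
    """Number of baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if set(items) <= basket)

def support(baskets, items):
    """Fraction of all baskets that contain the itemset."""
    return count(baskets, items) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Estimated P(consequent | antecedent): of the baskets holding the
    antecedent, the fraction that also hold the consequent."""
    both = count(baskets, set(antecedent) | set(consequent))
    return both / count(baskets, antecedent)

if __name__ == "__main__":
    print(support(BASKETS, {"bread", "butter"}))       # 0.6
    print(confidence(BASKETS, {"bread"}, {"butter"}))  # 0.75
```

Read as: 60% of all baskets contain both items, and 75% of the customers who bought bread also bought butter. A real system would search over all itemsets (for example with the Apriori algorithm) rather than test a single hand-picked pair.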
DATA MINING TECHNIQUES
a. Artificial Neural Networks
Non-linear predictive models that learn through training and resemble biological neural networks in
structure.
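The sketch below shows what "learning through training" means for a single artificial neuron, the building block of a neural network. It is a minimal illustration, assuming a sigmoid activation, the logistic (cross-entropy) loss and the logical OR function as toy training data; the function names are hypothetical. A lone neuron can only separate classes with a straight line; stacking many neurons in hidden layers is what gives a full ANN its non-linear modeling power.

```python
import math

def sigmoid(z):
    """Squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, epochs=5000, lr=0.5):
    """Full-batch gradient descent on the logistic loss for one neuron.
    `samples` is a list of (input_vector, target) pairs with targets 0 or 1."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in samples:
            output = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = output - y  # gradient of the loss w.r.t. the pre-activation
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        # Step the weights and bias against the averaged gradient.
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias

def predict(weights, bias, x):
    """Class 1 if the neuron's output is at least 0.5, else class 0."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0

# Toy training data: the logical OR function.
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

if __name__ == "__main__":
    w, b = train_neuron(OR_DATA)
    print([predict(w, b, x) for x, _ in OR_DATA])
```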
b. Decision Trees
Tree-shaped structures that represent sets of decisions. These decisions generate rules for the
classification of a dataset.
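To make the link between decisions and rules concrete, here is a minimal sketch using a small, hypothetical loan-approval tree written by hand (in practice the tree would be learned from training data by an algorithm such as ID3 or CART). The tuple layout for nodes and the names `classify` and `extract_rules` are assumptions for illustration. Each root-to-leaf path becomes one IF-THEN classification rule.

```python
# Hypothetical hand-built tree. A leaf is a class label (str); an internal
# node is a tuple (feature, threshold, left_subtree, right_subtree), where
# the left branch is taken when record[feature] <= threshold.
LOAN_TREE = ("income", 30000,
             "reject",
             ("age", 25, "review", "approve"))

def classify(tree, record):
    """Follow the decisions from the root down to a leaf and return its label."""
    while not isinstance(tree, str):
        feature, threshold, left, right = tree
        tree = left if record[feature] <= threshold else right
    return tree

def extract_rules(tree, conditions=()):
    """Turn every root-to-leaf path into one IF-THEN classification rule."""
    if isinstance(tree, str):
        return [f"IF {' AND '.join(conditions) or 'TRUE'} THEN {tree}"]
    feature, threshold, left, right = tree
    return (extract_rules(left, conditions + (f"{feature} <= {threshold}",))
            + extract_rules(right, conditions + (f"{feature} > {threshold}",)))

if __name__ == "__main__":
    for rule in extract_rules(LOAN_TREE):
        print(rule)
    print(classify(LOAN_TREE, {"income": 50000, "age": 40}))  # approve
```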
c. Genetic Algorithms
Optimization techniques that use processes such as genetic combination, mutation and natural selection, modeled on biological evolution.
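A minimal sketch of the idea, assuming the toy "OneMax" objective (maximize the number of 1 bits in a fixed-length bit string) in place of a real optimization problem. The function names and parameter values are illustrative choices, not a standard API. Tournament selection plays the role of natural selection, single-point crossover the role of genetic combination, and random bit flips the role of mutation; the best individual is copied unchanged into each new generation (elitism), so the best fitness never decreases.

```python
import random

def fitness(bits):
    """Toy objective (OneMax): the number of 1 bits."""
    return sum(bits)

def tournament(pop, rng, k=3):
    """Natural selection: the fittest of k randomly chosen individuals wins."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    """Genetic combination: single-point crossover of two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rng, rate):
    """Mutation: flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def run_ga(length=20, pop_size=30, generations=60, rate=0.05, seed=0):
    """Returns (best fitness at start, best fitness at end, best individual)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(map(fitness, pop))
    for _ in range(generations):
        elite = max(pop, key=fitness)  # elitism: the best survives unchanged
        children = [elite]
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng, rate))
        pop = children
    best = max(pop, key=fitness)
    return initial_best, fitness(best), best

if __name__ == "__main__":
    start, end, best = run_ga()
    print(f"best fitness: {start} -> {end}")
```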
d. Nearest Neighbor Method
A technique that classifies each record in a dataset based on a combination of the classes of the k record(s)
most similar to it in a historical dataset (where k ≥ 1). It is sometimes called the k-nearest neighbour technique.
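A minimal sketch of the k-nearest neighbour technique, assuming Euclidean distance as the similarity measure and a simple majority vote as the way of combining the neighbours' classes. The customer records (age, income in thousands) and the `knn_classify` name are hypothetical.

```python
import math
from collections import Counter

# Hypothetical historical records: ((age, income_in_thousands), class_label).
CUSTOMERS = [
    ((25, 40), "no"), ((30, 45), "no"), ((28, 42), "no"),
    ((50, 90), "yes"), ((55, 95), "yes"), ((48, 85), "yes"),
]

def knn_classify(train, query, k=3):
    """Label `query` with the majority class among its k nearest records."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    print(knn_classify(CUSTOMERS, (52, 88)))  # yes
    print(knn_classify(CUSTOMERS, (27, 41)))  # no
```

An odd k avoids ties when there are two classes; in practice the features are usually scaled first so that one attribute (here income) does not dominate the distance.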
e. Rule Induction
The extraction of useful if-then rules from data based on statistical significance.
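A minimal sketch of rule induction, assuming simple support and confidence thresholds as a stand-in for a formal statistical-significance test. For every attribute value it proposes the rule "IF attribute = value THEN class = most common class" and keeps the rule only if it covers enough records and is right often enough. The weather-style records, the thresholds and the `induce_rules` name are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training records; "play" is the class attribute.
RECORDS = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
]

def induce_rules(records, target, min_support=2, min_confidence=0.8):
    """Return (rule, support_count, confidence) for every single-condition
    rule that passes both thresholds."""
    rules = []
    attributes = [a for a in records[0] if a != target]
    for attr in attributes:
        by_value = defaultdict(Counter)  # attribute value -> class counts
        for r in records:
            by_value[r[attr]][r[target]] += 1
        for value, classes in by_value.items():
            label, hits = classes.most_common(1)[0]
            conf = hits / sum(classes.values())
            if hits >= min_support and conf >= min_confidence:
                rules.append((f"IF {attr} = {value} THEN {target} = {label}", hits, conf))
    return rules

if __name__ == "__main__":
    for rule, hits, conf in induce_rules(RECORDS, "play"):
        print(f"{rule}  (support={hits}, confidence={conf:.2f})")
```

Here "rainy" and "windy = yes" are rejected because their best rule is right only two times out of three.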
APPLICATIONS OF DATA MINING
1. Financial Analysis
2. Biological Analysis
3. Scientific Analysis
4. Intrusion Detection
5. Fraud Detection
6. Research Analysis
7. Weather Forecasting
8. E-commerce
9. Self-driving Cars
10. Hazards of New Medicines
11. Space Research
12. Stock Trade Analysis
13. Business Forecasting
14. Social Networks
15. Customer Likelihood Prediction
AREAS WHERE DATA MINING HAS GOOD AND BAD EFFECTS:
a. Good Effects
Predict future trends, customer purchase habits
Help with decision making
Improve company revenue and lower costs
Market basket analysis
Fraud detection
b. Bad Effects
User privacy/security
Amount of data is overwhelming
High cost at the implementation stage
Possible misuse of information
Possible inaccuracy of data