
Data Mining

Data mining is the process of discovering patterns in large data sets. It is a core component of knowledge discovery in databases and involves using algorithms to explore data, develop models, and discover previously unknown patterns. The overall goal of data mining is to extract meaningful information from data and transform it into an understandable structure to aid analysis and decision making. Some common data mining techniques include classification, regression, clustering, and dependency modeling.

Uploaded by ASHA MAHINI
Copyright © All Rights Reserved

WHAT IS DATA MINING

Data mining (the analysis step of the “Knowledge Discovery in Databases” process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets using methods at the intersection of artificial intelligence, machine learning, statistics, and database management systems.


BASIC DEFINITIONS OF DATA MINING
• The discovery of new, non-obvious, valuable information from a large collection of raw data

• Data Mining (DM) is the core of the KDD (Knowledge Discovery in Databases) process, involving algorithms that explore the data, develop a model, and discover previously unknown patterns.

• The set of activities used to find new, hidden, or unexpected patterns in data
DEFINITIONS OF DATA MINING
The detection of patterns from existing data.

Pattern
1. A consistent trait, feature, or method.

2. Any combination of values that carries meaning within the context or domain in which it is being reviewed
DATA MINING - CONTINUED

The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
(Predictive analytics) Discovering meaningful new correlations, patterns, and trends.
Example: Forecasting


DATA ANALYTICS/PREDICTIVE ANALYTICS

Data analytics (DA) is the science of examining raw data with the purpose of drawing conclusions about that information. Data analytics is used in many industries to allow companies and organizations to make better business decisions, and in the sciences to verify or disprove existing models or theories.
PREDICTIVE ANALYTICS - FOCUS

Data analytics is distinguished from data mining by the scope, purpose, and focus of the analysis. Data miners sort through huge data sets using sophisticated software to identify undiscovered patterns and establish hidden relationships. Data analytics focuses on inference, the process of deriving a conclusion based solely on what is already known by the researcher.

Predictive analytics uses a lower level of granularity, meaning it looks at the individual level. Instead of forecasting which candidate will win the presidential election in the state of Ohio, it looks at individuals: which person is voting for or against a candidate.
It predicts which individuals can be persuaded, which ones will not change, and so on. With this information, a campaign can change the outcome of the race.
The Obama campaign used this technique very well.


EMERGING TECHNOLOGY

Data mining is one of the “10 emerging technologies that will change the world” listed by the MIT Technology Review (Larose).

It is no surprise that many firms embrace data mining in their operations. An article in Information Systems Management points out that “data mining has become a widely accepted process for organizations to enhance their organizational performance and gain a competitive advantage”.
DATA MINING: BUSINESS

• What is it used for?
 Decision making
 Marketing
 Detecting fraud

This technology is popular with many businesses because it allows them to learn more about their customers, prevent fraud and identity theft, and make smart marketing decisions.
Keys to a Successful Data Mining Project

• Credible source of data

• Knowledgeable personnel

• Appropriate algorithms 
PRIMARY TASKS OF DATA MINING
• Classification: classify a data item into one of several predefined classes
• Regression: map a data item to a real-valued prediction variable
• Clustering: identify a finite set of categories or clusters to describe the data
• Summarization: find a compact description for a set (or subset) of data
• Dependency Modeling: describe significant dependencies between variables or between the values of a feature
• Change and Deviation Detection: discover the most significant changes
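The first two tasks can be illustrated with a minimal Python sketch on hypothetical toy data. Classification is shown with a nearest-centroid classifier and regression with an ordinary least-squares line. The slides do not prescribe these methods; they are simple stand-ins chosen for illustration, and every function name and data value below is hypothetical.

```python
from statistics import mean


def nearest_centroid_fit(points, labels):
    """Classification, training step: compute one centroid (mean point) per class."""
    centroids = {}
    for label in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == label]
        centroids[label] = tuple(mean(coord) for coord in zip(*members))
    return centroids


def nearest_centroid_predict(centroids, point):
    """Assign the point to the predefined class whose centroid is closest."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(point, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def least_squares_line(xs, ys):
    """Regression: fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar


# Hypothetical customers: (income in $1000s, store visits per month).
points = [(30, 2), (35, 3), (80, 8), (90, 9)]
labels = ["low-value", "low-value", "high-value", "high-value"]
centroids = nearest_centroid_fit(points, labels)
print(nearest_centroid_predict(centroids, (85, 7)))  # high-value

# Hypothetical ad spend (x) vs. sales (y); these points lie exactly on y = 2x + 1.
slope, intercept = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```

The classifier can only output one of the classes it was trained on, which matches "predefined classes". The regression line outputs any real number.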
SOME OF THE COMMONLY USED DATA MINING METHODS ARE:

• Statistical Data Analysis
• Cluster Analysis
• Decision Trees and Decision Rules
• Association Rules
• Artificial Neural Networks
• Genetic Algorithms
• Fuzzy Sets and Fuzzy Logic
DATA MINING APPLICATIONS

In direct marketing, a company saves much time by marketing to the prospects with the highest expected response rate. Instead of randomly selecting which customers to contact for its surveys, a company can use data mining to find the “correct” customers to ask.


DATA MINING APPLICATIONS

DIRECT MARKETING USING DATA MINING GIVES US 3% CONVERSION

 Identifies a smaller group (for example, ¼ of the population) and gets a higher conversion rate: 3%
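The slide's claim can be turned into simple arithmetic. In this Python sketch, the ¼ targeting fraction and the 3% conversion rate come from the slide. The population size and the 1% conversion rate for a random mailing are hypothetical values chosen only to illustrate the comparison.

```python
# From the slide: data mining selects about 1/4 of the population,
# and that targeted group converts at 3%.
TARGET_FRACTION = 1 / 4
TARGET_RATE = 0.03

# Hypothetical values (not from the slides).
POPULATION = 100_000   # number of prospects
BASELINE_RATE = 0.01   # conversion rate of a randomly chosen group

contacted = round(POPULATION * TARGET_FRACTION)   # mail only the targeted quarter
targeted_conversions = contacted * TARGET_RATE     # expected buyers when targeted
random_conversions = contacted * BASELINE_RATE     # same mailing size, picked at random
lift = TARGET_RATE / BASELINE_RATE                 # how many times better targeting does

print(contacted, round(targeted_conversions), round(random_conversions), round(lift, 2))
# 25000 750 250 3.0
```

Under these assumptions, the company mails only a quarter of its prospects and gets three times as many buyers as a random mailing of the same size.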
DATA MINING APPLICATIONS

Market segmentation is used in data mining to identify the common characteristics of customers who buy products from one’s company.

With market segmentation, you can find behaviors that are common among your customers. Identifying customer trends helps a company find the needs it must address to improve its business.
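Market segmentation is often done with a clustering algorithm. The slides do not name one, so this Python sketch assumes plain k-means (Lloyd's algorithm) on hypothetical two-attribute customer data. The function names, the seed, and the data are illustrative assumptions.

```python
import random
from statistics import mean


def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means on 2-D points; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct customers
    for _ in range(iterations):
        # Assignment step: each customer joins the segment with the nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members (keep it if empty).
        centers = [
            (mean(x for x, _ in members), mean(y for _, y in members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical customers: (annual spend in $100s, visits per month).
customers = [(2, 1), (3, 2), (2, 2), (20, 9), (22, 10), (21, 8)]
centers, segments = kmeans(customers, k=2)
for center, members in zip(centers, segments):
    print(center, sorted(members))
```

k-means needs the number of segments `k` up front, and its result can depend on the starting centers. On well-separated data like this, it settles on the two obvious groups.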

CUSTOMER CHURN

Customer churn analysis predicts which customers will have a change of heart towards your company and leave for another company (a competitor). Although customer churn is negative for one’s business, predicting it allows the corporation to seek out the problem it is facing and create solutions.
 Example: Magazine subscriber
 Ideas to keep customers:
 Discounts, coupons, etc.
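Churn prediction is usually framed as classification: estimate, for each customer, the probability of leaving. This Python sketch assumes a logistic-regression model trained by batch gradient descent. The subscriber features, labels, learning rate, and epoch count are hypothetical and are chosen only to make the idea concrete.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def churn_probability(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical magazine subscribers:
# (share of recent digital issues not opened, complaints scaled to 0..1); label 1 = cancelled.
X = [(0.9, 0.8), (0.8, 0.6), (0.7, 0.9), (0.1, 0.0), (0.2, 0.2), (0.0, 0.1)]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)

# Score a current subscriber; a high score might trigger a retention offer (discount, coupon).
print(round(churn_probability(w, b, (0.85, 0.7)), 3))
```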
DATA MINING APPLICATIONS

Market basket analysis involves researching customer characteristics with respect to their purchase patterns.

Example: Ralphs Club Card
Cereal and Milk

MARKET BASKET

 Beer and diapers
 Merchandising
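Market basket analysis is commonly expressed as association rules scored by two measures. Support is how often items appear together; confidence is how often the right-hand item appears given the left-hand one. This Python sketch is a simplified, pairs-only version of that idea on hypothetical baskets. Full algorithms such as Apriori also handle larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (for example, from loyalty-card transactions).
baskets = [
    {"cereal", "milk", "bread"},
    {"cereal", "milk"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"beer", "diapers"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (lhs, rhs, support, confidence) for every frequent item pair."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, c), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, c), (c, a)):  # each frequent pair gives two rules
            confidence = count / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            rules.append((lhs, rhs, support, confidence))
    return rules


for lhs, rhs, support, confidence in sorted(pair_rules(baskets), key=lambda r: -r[3]):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Confidence is not symmetric. Here every cereal basket contains milk, but only half of the milk baskets contain cereal.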
PREDICTION BASED ON DATA MINING/PREDICTIVE ANALYSIS

 Examples from real life:

 Target can predict which customers are pregnant
 Hospitals can predict which patients may need to be admitted
 Credit card companies can predict which customers may miss their payments based on where the card is used. Example: bar/alcohol purchases = missed payments
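At its simplest, the credit-card example estimates a conditional rate: how often payments are missed, given where the card was used. This Python sketch assumes hypothetical transaction records and simply counts. A real model would combine many signals, and rates estimated from a handful of records are not reliable.

```python
from collections import defaultdict

# Hypothetical history: (merchant category where the card was used, missed next payment?).
records = [
    ("bar", True), ("bar", True), ("bar", False),
    ("grocery", False), ("grocery", False), ("grocery", True), ("grocery", False),
]


def missed_rate_by_category(records):
    """Estimate P(missed payment | merchant category) from counts."""
    totals = defaultdict(int)
    missed = defaultdict(int)
    for category, was_missed in records:
        totals[category] += 1
        missed[category] += int(was_missed)
    return {category: missed[category] / totals[category] for category in totals}


rates = missed_rate_by_category(records)
overall = sum(int(m) for _, m in records) / len(records)
for category, rate in rates.items():
    print(category, round(rate, 2), "lift vs. overall:", round(rate / overall, 2))
```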
DATA MINING APPLICATIONS

Class Identification
• Mathematical taxonomy
• Concept clustering

Class identification consists of mathematical taxonomy and concept clustering. Mathematical taxonomy focuses on what makes the members of a certain class similar, as opposed to differentiating one class from another.

For example, Ralphs can classify its customers based on their income or past purchases.
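Mathematical (numerical) taxonomy is commonly carried out with hierarchical clustering. The slides do not specify a method, so this Python sketch assumes single-linkage agglomerative clustering on hypothetical (income in $1000s, past purchases in $100s) values.

```python
from itertools import combinations
from math import dist


def single_linkage(points, k):
    """Agglomerative clustering: repeatedly merge the two closest clusters until k remain."""
    clusters = [[p] for p in points]

    def cluster_distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so popping j leaves index i intact
    return clusters


# Hypothetical customers: (income in $1000s, past purchases in $100s).
customers = [(25, 3), (28, 4), (30, 2), (70, 15), (75, 18), (120, 40)]
for group in single_linkage(customers, k=3):
    print(sorted(group))
```

Each merge joins customers who are most alike, which fits the slide's emphasis on what makes members of a class similar. Stopping at different `k` values gives coarser or finer taxonomies.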


DATA MINING APPLICATIONS

Concept clustering determines clusters according to attribute similarity.

Consider the following pattern, discovered by data mining: among high-income customers, a purchase of toys for the 3–5 age group is followed by the purchase of a kid’s bicycle within 6 months about 90% of the time. The company can identify prospective customers for kids’ bicycles based on toy purchase details and adjust the mail catalog accordingly.
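The toy-to-bicycle pattern is a sequential rule, and its "90% of the time" figure is the rule's confidence. This Python sketch shows how such a confidence could be computed. The purchase histories, the category names, and the 183-day reading of "6 months" are assumptions, so the result does not reproduce the slide's 90%.

```python
from datetime import date

# Hypothetical purchase histories: customer -> (income band, [(purchase date, category), ...]).
histories = {
    "c1": ("high", [(date(2023, 1, 10), "toys_3_5"), (date(2023, 4, 2), "kids_bicycle")]),
    "c2": ("high", [(date(2023, 2, 1), "toys_3_5"), (date(2023, 6, 20), "kids_bicycle")]),
    "c3": ("high", [(date(2023, 3, 5), "toys_3_5")]),
    "c4": ("low", [(date(2023, 1, 15), "toys_3_5"), (date(2023, 3, 1), "kids_bicycle")]),
}
WINDOW_DAYS = 183  # assumption: "within 6 months" approximated as 183 days


def sequential_confidence(histories, antecedent, consequent, band, window_days):
    """Share of `band` customers who bought `antecedent` and then `consequent` within the window."""
    bought_antecedent = followed_by_consequent = 0
    for income_band, purchases in histories.values():
        if income_band != band:
            continue
        # Use the customer's first antecedent purchase as the starting point.
        start = min((d for d, item in purchases if item == antecedent), default=None)
        if start is None:
            continue
        bought_antecedent += 1
        if any(item == consequent and 0 < (d - start).days <= window_days
               for d, item in purchases):
            followed_by_consequent += 1
    return followed_by_consequent / bought_antecedent if bought_antecedent else 0.0


confidence = sequential_confidence(histories, "toys_3_5", "kids_bicycle", "high", WINDOW_DAYS)
print(round(confidence, 2))  # 0.67
```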


DATA MINING APPLICATIONS

Deviation analysis: a deviation can be fraud or a genuine change. In the past, such deviations were difficult to detect in time to take corrective action; data mining tools help identify them.
For example, a higher-than-normal credit purchase on a credit card can be fraud, or a genuine purchase by the customer. Once a deviation has been confirmed as fraud, the company takes steps to prevent similar fraud and initiates corrective action.
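One simple way to flag a higher-than-normal purchase is a z-score against the customer's own history. This Python sketch assumes that approach, a threshold of 3 standard deviations, and hypothetical purchase amounts. As the slide notes, a flagged purchase still needs review because it may be genuine.

```python
from statistics import mean, stdev


def flag_deviation(history, amount, threshold=3.0):
    """Flag a purchase whose z-score against the customer's history exceeds `threshold`.

    A flag only means "unusual, review it": the purchase may be fraud or genuine.
    """
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least two purchases
    z = (amount - mu) / sigma
    return z > threshold, z


# Hypothetical past credit-card purchase amounts for one customer (in dollars).
history = [42.0, 55.5, 38.2, 61.0, 47.3, 50.1, 44.8]
for amount in (52.0, 900.0):
    flagged, z = flag_deviation(history, amount)
    print(amount, flagged, round(z, 1))
```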


MAKING BETTER DECISIONS

• Patterns and trends

• What to produce?

• Equal Success
PRIVACY
• Sensitive information
• Data mining increases incentives to get more sensitive data
• Seeing into a private future: Target
• Do we have the right?
• Employers try to predict churn
RESPONSE BIAS ISSUES

• Types
o Coverage or frame error
o Sampling error
o Nonresponse error
o Measurement error

• Flawed data
DATA MINING IN MEDICAL

The most recent and most promising use of data mining has been the development of data mining tools for the medical sector. Using data mining to extract patterns from medical data provides nearly endless opportunities for symptom trend detection, earlier detection of illness, DNA trend analysis, and improved patient reactions to medicines. These advantages allow doctors and hospitals to be more effective and more efficient.


ADVANTAGES OF DATA MINING: MEDICINE

• Earlier detection of illness

• Symptom trends

• Data analysis

• Improved drug reactions


DISADVANTAGES OF DATA MINING: MEDICINE

• No uniform medical language
• Incomplete records
• Privacy
DATA MINING - MEDICAL

How data mining is actually used to analyze individual data can become quite complex because of the nature of the data. The goal of the process is to take medical data, which contain many attributes, and determine which attributes are actually relevant to the diagnosis, symptom, or result.

Two methods used in medical data mining are clustering, discussed previously, and biclustering.
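The slides do not say how relevant attributes are chosen. One simple screening approach, assumed here, ranks each attribute by the absolute Pearson correlation between that attribute and a 0/1 diagnosis (the point-biserial correlation). This Python sketch uses hypothetical patient records. It only detects linear association and does not illustrate clustering or biclustering.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical patient records: attribute columns plus a 0/1 diagnosis column.
attributes = {
    "fasting_glucose": [85, 90, 150, 160, 95, 170],
    "age": [40, 45, 50, 42, 60, 48],
    "shoe_size": [8, 10, 9, 11, 12, 10],
}
diagnosis = [0, 0, 1, 1, 0, 1]

# Rank attributes by the strength (absolute value) of their correlation with the diagnosis.
ranking = sorted(
    attributes, key=lambda name: abs(pearson(attributes[name], diagnosis)), reverse=True
)
for name in ranking:
    print(name, round(pearson(attributes[name], diagnosis), 2))
```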
