Data Mining

Uploaded by

GOURAV GHOSH
Copyright © All Rights Reserved

DATA WAREHOUSING AND MINING

TOPIC: INTRODUCTION TO DATA MINING

BY – GOURAV GHOSH
ROLL NO. - 31154322014
What is Data Mining/KDD

Data mining (knowledge discovery from data)
 • Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
What is Data Mining

 • By definition, the process of extracting previously unknown information from large databases and using it to make organisational decisions.
 • Is concerned with the discovery of hidden knowledge.
 • Usually works on large volumes of data.
 • Is useful in making critical organisational decisions, particularly those of a strategic nature.
Data Mining
Data mining has been referred to by a number of names:
Data Fishing, Data Dredging (1960…):
 • Used by statisticians (as a pejorative term)
Knowledge Discovery in Databases (1989…):
 • Used by the AI and machine learning community
Business Intelligence (1990…):
 • Business management term
Also data archaeology, information harvesting, information discovery, knowledge extraction, data/pattern analysis, etc.
Data Mining: On What Kinds Of Data?
Relational database
Data warehouse
Transactional database
Advanced database and information repository
 • Object-relational database
 • Spatial and temporal data
 • Time-series data
 • Stream data
 • Multimedia database
 • Text databases & WWW
Data Mining Functionalities

Concept description
 • Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
Association (correlation and causality)
 • e.g., nappies and beer frequently bought together
Classification and Prediction
 • Construct models that describe and distinguish classes or concepts for future prediction
 • Predict unknown or missing numerical values
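The association example above can be made concrete with the two measures usually attached to a rule: support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The following is a minimal sketch in Python; the baskets and the helper names `support` and `confidence` are hypothetical and chosen only for illustration.

```python
# Hypothetical market baskets (illustrative data, not taken from the slides).
transactions = [
    {"nappies", "beer", "crisps"},
    {"nappies", "beer"},
    {"nappies", "milk"},
    {"beer", "crisps"},
    {"nappies", "beer", "milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) over the baskets."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Rule under test: nappies => beer
rule_support = support({"nappies", "beer"}, transactions)
rule_confidence = confidence({"nappies"}, {"beer"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

In this toy data the rule holds in three of five baskets, and in three of the four baskets that contain nappies. A real miner would enumerate candidate itemsets and keep only rules above chosen support and confidence thresholds.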
Data Mining Functionalities

Cluster analysis
 • Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
Outlier analysis
 • Outlier: a data object that does not comply with the general behavior of the data
 • Not merely noise or an exception: useful in fraud detection and rare event analysis
Other pattern-directed or statistical analyses
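As a rough illustration of cluster analysis when no class label is available, the sketch below groups house prices with a plain k-means loop on a single attribute. The prices, the starting centroids, and the function name `kmeans_1d` are assumptions made for this example; the slides do not prescribe a particular clustering algorithm.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on one numeric attribute, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: put each value in the cluster of its nearest centroid.
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical house prices in thousands (illustrative only).
house_prices = [110, 120, 130, 300, 310, 320]
centroids, clusters = kmeans_1d(house_prices, centroids=[100.0, 200.0])
print(centroids, clusters)
```

The two groups that emerge were never labelled in advance, which is the distinction the slide draws between clustering and classification.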
Data Mining is Multidisciplinary
[Figure: Data Mining at the centre, drawing on Statistics, Pattern Recognition, Neurocomputing, Machine Learning, AI, Databases, and KDD]
Why We Need Data Mining

Data explosion problem
 • Automated data collection tools and mature database technology lead to huge amounts of accumulated data
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
 • Data warehousing and on-line analytical processing
 • Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
Potential Applications

Data analysis and decision support
 • Market analysis and management
 • Risk analysis and management
 • Fraud detection and detection of unusual patterns
Other applications
 • Text mining (email, documents) and Web mining
 • Stream data mining
 • DNA and bio-data analysis
Stages of KDD
[Figure: the KDD process, read from source data up to the result]
Databases → Cleaning & Integration → Data Warehouse → Selection & Transformation → Data Mining → Evaluation & Presentation → Knowledge
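The stages of KDD can be sketched as a chain of small functions, one per stage. This is only an illustration under stated assumptions: the two source "databases", the function names, and the `min_count` threshold are all hypothetical, and the mining step is a simple frequency count standing in for a real algorithm.

```python
from collections import Counter

# Two hypothetical source databases with a duplicate and an incomplete record.
sales_db = [{"customer": "a", "item": "beer"}, {"customer": "b", "item": None}]
web_db = [
    {"customer": "a", "item": "beer"},
    {"customer": "c", "item": "nappies"},
    {"customer": "d", "item": "beer"},
]

def clean_and_integrate(*sources):
    """Cleaning & integration: merge sources, drop incomplete and duplicate records."""
    seen, warehouse = set(), []
    for source in sources:
        for record in source:
            key = (record["customer"], record["item"])
            if record["item"] is not None and key not in seen:
                seen.add(key)
                warehouse.append(record)
    return warehouse

def select_and_transform(warehouse):
    """Selection & transformation: keep only the attribute relevant to the task."""
    return [record["item"] for record in warehouse]

def mine(items):
    """Data mining (stand-in): count how often each item occurs."""
    return Counter(items)

def evaluate_and_present(patterns, min_count=2):
    """Evaluation & presentation: keep patterns above an assumed threshold."""
    return {item: n for item, n in patterns.items() if n >= min_count}

warehouse = clean_and_integrate(sales_db, web_db)
knowledge = evaluate_and_present(mine(select_and_transform(warehouse)))
print(knowledge)
```

Each function consumes the output of the previous stage, mirroring the flow from databases through the warehouse to knowledge.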
Issues and Challenges of Data Mining
Data mining methodology
 • Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web
 • Performance: efficiency, effectiveness, and scalability
 • Pattern evaluation: the interestingness problem
 • Incorporation of background knowledge
 • Handling noise and incomplete data
 • Parallel, distributed and incremental mining methods
 • Integration of the discovered knowledge with existing knowledge: knowledge fusion
Issues and Challenges of Data Mining
User interaction
 • Data mining query languages and ad-hoc mining
 • Expression and visualization of resultant knowledge
 • Interactive mining of knowledge at multiple levels of abstraction
Applications and social impacts
 • Domain-specific data mining & invisible data mining
 • Protection of data security, integrity, and privacy
Market Analysis And Management
Where does the data come from?
 • Credit card transactions, loyalty cards, discount coupons, customer complaint calls, etc.
Target marketing
 • Find clusters of “model” customers who share the same characteristics
 • Determine customer purchasing patterns over time
Cross-market analysis
 • Associations/correlations between product sales, and prediction based on such associations
Market Analysis And Management (cont…)
Customer profiling
 • What types of customers buy what products (clustering or classification)
Customer requirement analysis
 • Identify the best products for different customers
 • Predict what factors will attract new customers
Provision of summary information
 • Multidimensional summary reports
 • Statistical summary information (data central tendency and variation)
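The two kinds of summary information above can be illustrated with Python's standard `statistics` module: overall measures of central tendency and variation, plus totals broken down by two dimensions. The sales records and the choice of region and product as dimensions are hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical sales records: (region, product, amount). Illustrative only.
sales = [
    ("north", "beer", 20), ("north", "nappies", 30),
    ("south", "beer", 25), ("south", "nappies", 35), ("south", "beer", 40),
]

# Statistical summary: central tendency and variation of the amounts.
amounts = [amount for _, _, amount in sales]
overall = {
    "mean": statistics.mean(amounts),
    "median": statistics.median(amounts),
    "pstdev": statistics.pstdev(amounts),  # population standard deviation
    "range": max(amounts) - min(amounts),
}

# Multidimensional summary: total amount per (region, product) cell.
totals = defaultdict(int)
for region, product, amount in sales:
    totals[(region, product)] += amount

print(overall)
print(dict(totals))
```

A data warehouse would normally compute such cell totals with OLAP roll-up and drill-down operations; the dictionary keyed by `(region, product)` is only a small stand-in for that idea.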
Corporate Analysis & Risk Management
Finance planning and asset evaluation
 • Cash flow analysis and prediction
 • Contingent claim analysis to evaluate assets
 • Cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
Resource planning
 • Summarize and compare the resources and spending
Competition
 • Monitor competitors and market directions
 • Group customers into classes and set a class-based pricing procedure
 • Set pricing strategy in a highly competitive market
Fraud Detection & Mining Unusual Patterns
Applications: health care, retail, credit card service, telecommunications
 • Auto insurance: ring of collisions
 • Money laundering: suspicious monetary transactions
 • Medical insurance
   • Professional patients, ring of doctors, and ring of references
   • Unnecessary or correlated screening tests
 • Telecommunications: phone-call fraud
   • Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
 • Retail industry
   • Analysts estimate that 38% of retail shrink is due to dishonest employees
 • Anti-terrorism
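The phone-call model above amounts to comparing new behavior against an expected norm. One simple way to sketch that, assuming call duration is the only attribute and a z-score threshold of 3 is an acceptable cut-off, is shown below. The durations, the threshold, and the function name `is_unusual` are hypothetical; production fraud systems combine many attributes and more robust models.

```python
import statistics

# Hypothetical call durations in minutes for one subscriber (baseline history).
history = [3, 4, 5, 4, 3, 5, 4, 4]
new_calls = [4, 5, 45]

# The "expected norm": mean and sample standard deviation of past durations.
mean = statistics.mean(history)
stdev = statistics.stdev(history)

def is_unusual(duration, threshold=3.0):
    """Flag a call whose duration lies more than `threshold` deviations from the norm."""
    return abs(duration - mean) / stdev > threshold

flagged = [d for d in new_calls if is_unusual(d)]
print(flagged)
```

Flagged calls are candidates for review rather than proof of fraud, which is why outlier analysis is usually paired with a human or rule-based follow-up step.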
THANK YOU
