
1 - 1 Intro To Data Mining - ch1

This document provides an introduction and overview of a data mining course. It discusses what data mining is, how it has evolved from database and data warehouse technologies, its applications in domains like business, science, and healthcare, and the key steps in the knowledge discovery process including data cleaning, transformation, algorithm selection, mining, and evaluation. The goal of the course is to teach students basic principles, techniques, and applications of data mining using both theoretical and practical approaches.

Uploaded by

Rana Hafeez
Copyright
© All Rights Reserved

Data Mining

Introduction to the Course and the Field

There is an inherent meaning in everything. “Signs for people who can see.”
Agenda
Course Introduction

Course Details

What is Data Mining?


Course Details
• Course Description: The Data Mining course
teaches students
• Basic principles, techniques, tools and
applications of data mining.
• The science of data mining as the automatic
extraction of patterns representing knowledge
stored in large databases, data warehouses, and
other massive information repositories.
• The overlap between data mining and areas such as
machine learning and pattern recognition.
• The concepts of data pre-processing, cluster
analysis, classification and prediction, frequent
pattern mining and data warehousing.
Course Resources
• Textbook:
• Data Mining: Concepts and Techniques (3rd
Edition) by Jiawei Han, Micheline Kamber and Jian Pei

• Reference book:
• The Elements of Statistical Learning by Hastie,
Tibshirani and Friedman
• Freely available online (search for it)
Course Requirement
• You should have some knowledge of the
concepts and terminology associated with
• database systems,
• statistics,
• machine learning.

• You should have some programming
experience. In particular, you should be able to
read pseudo-code and understand simple data
structures such as multidimensional arrays.
WHAT IS DATA MINING?
How Did Data Mining Emerge?
• It came into being as the science of
databases evolved

• Databases → Data warehouses → Data mining

• The process is similar to how science itself evolved

• Data mining and data analytics are among the
fastest-growing disciplines worldwide, with plenty of
jobs
Evolution of Sciences
• Before 1600, empirical science
• 1600-1950s, theoretical science
• Each discipline has grown a theoretical component. Theoretical models often
motivate experiments and generalize our understanding.
• 1950s-1990s, computational science
• Over the last 50 years, most disciplines have grown a third, computational branch
(e.g. empirical, theoretical, and computational ecology, or physics, or linguistics.)
• Computational Science traditionally meant simulation. It grew out of our inability
to find closed-form solutions for complex mathematical models.
• 1990-now, data science
• The flood of data from new scientific instruments and simulations
• The ability to economically store and manage petabytes of data online
• The Internet and computing Grid that makes all these archives universally
accessible
• Scientific info. management, acquisition, organization, query, and visualization
tasks scale almost linearly with data volumes. Data mining is a major new
challenge!
• Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science,
Comm. ACM, 45(11): 50-54, Nov. 2002
Evolution of Database Technology
• 1960s:
• Data collection, database creation, IMS and network DBMS
• 1970s:
• Relational data model, relational DBMS implementation
• 1980s:
• RDBMS, advanced data models (extended-relational, OO, deductive,
etc.)
• Application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s:
• Data mining, data warehousing, multimedia databases, and Web
databases
• 2000s
• Stream data management and mining
• Data mining and its applications
• Web technology (XML, data integration) and global information systems
Data Mining Definition
• Data mining (knowledge discovery from data)
• Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge
from huge amounts of data
• Data mining: a misnomer?
• Alternative names
• Knowledge discovery (mining) in databases (KDD),
knowledge extraction, data/pattern analysis, data
archeology, data dredging, information harvesting,
business intelligence, etc.
• Watch out: Is everything “data mining”?
• No: simple search and query processing are not data mining
• Neither are (deductive) expert systems
Why Data Mining?
• Huge volumes of Data available: from terabytes to petabytes
• Data collection and data availability
• Automated data collection tools, database systems, Web, computerized
society
• Cameras, publication tools, scanned text and images
• Major sources of abundant data
• Business: Web, e-commerce, transactions, stocks, …
• Science: Remote sensing, bioinformatics, scientific simulation, …
• Society and everyone: news, digital cameras, YouTube
• Medical data, demographic data, financial data and marketing data
• We are drowning in data, but starving for knowledge!
• “Necessity is the mother of invention”—Data mining—Automated
analysis of massive data sets
Why Data Mining?—Potential Applications
• Data analysis and decision support
• Market analysis and management
• Target marketing, customer relationship management (CRM),
market basket analysis, cross selling, market segmentation
• Risk analysis and management
• Forecasting, customer retention, quality control, competitive
analysis
• Fraud detection and detection of unusual patterns (outliers)
• Other Applications
• Text mining (news group, email, documents) and Web
mining
• Stream data mining
• Bioinformatics and bio-data analysis
Ex. 1: Market Analysis and Management
• Where does the data come from?—Credit card transactions,
loyalty cards, discount coupons, customer complaint calls, plus
(public) lifestyle studies
• Target marketing
• Find clusters of “model” customers who share the same
characteristics: interest, income level, spending habits, etc.
• Determine customer purchasing patterns over time
• Cross-market analysis: find associations/correlations between
product sales, and predict based on such associations
• Customer profiling—What types of customers buy what
products (clustering or classification)
• Customer requirement analysis
• Identify the best products for different groups of customers
• Predict what factors will attract new customers
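
The cross-market analysis bullet above can be illustrated with a small sketch. It counts how often pairs of products appear in the same basket and reports support and confidence, two standard association measures that the slides themselves do not name. The baskets, the `min_support` threshold and the function name `pair_rules` are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        # One rule per direction: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)


# Hypothetical toy baskets, not real sales data.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]
for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Only pairs are considered here to keep the sketch short; a full frequent-pattern miner would also handle larger itemsets.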
Ex. 2: Fraud Detection & Mining Unusual Patterns
• Approaches: Clustering & model construction for frauds, outlier
analysis
• Applications: Health care, retail, credit card service,
telecommunications
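
As one possible reading of "outlier analysis", the sketch below flags transaction amounts that sit far from the median, using the modified z-score based on the median absolute deviation. The slides do not prescribe a method; the constant 0.6745, the 3.5 cutoff (a common convention), the function name `flag_outliers` and the amounts are assumptions chosen for illustration.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # no spread to compare against
    return [x for x in amounts if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical card transaction amounts, not real data.
amounts = [20, 22, 19, 21, 23, 20, 950]
print(flag_outliers(amounts))
```

A flagged value is only a candidate for review: an unusual amount is not necessarily fraudulent.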
Ex. 3: Biomedical Applications
• Approaches: Clustering &
Classification
• Applications:
• Automated diagnosis
• Discovery of disease trends
• Prediction of epidemics
• Discovering causes for certain conditions
• Patient data retrieval
Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, and Artificial Intelligence]
Knowledge Discovery (KDD) Process

• Data mining is the core of the knowledge discovery process
[Figure: KDD pipeline. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
KDD Process: Several Key Steps
• Learning the application domain
• relevant prior knowledge and goals of application
• Creating a target data set: data selection
• Data cleaning and preprocessing: (may take 60% of effort!)
• Data reduction and transformation
• Find useful features, dimensionality/variable reduction,
invariant representation
• Choosing functions of data mining
• summarization, classification, regression, association,
clustering
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Pattern evaluation and knowledge presentation
• visualization, transformation, removing redundant patterns, etc.
• Use of discovered knowledge
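
The steps above can be sketched as a chain of small functions, one per stage. This is a toy illustration under stated assumptions, not a prescribed implementation: the record layout, the function names and the `min_gap` threshold are all hypothetical, and the "mining" step is a simple summarization (a mean per group).

```python
def select(records, fields):
    """Data selection: keep only the task-relevant fields."""
    return [{f: r.get(f) for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records, field):
    """Transformation: min-max scale one numeric field to [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values match
    return [{**r, field: (r[field] - lo) / span} for r in records]


def mine(records, group_field, value_field):
    """Mining (summarization): mean of value_field per group."""
    totals = {}
    for r in records:
        total, count = totals.get(r[group_field], (0.0, 0))
        totals[r[group_field]] = (total + r[value_field], count + 1)
    return {g: total / count for g, (total, count) in totals.items()}


def evaluate(patterns, min_gap=0.2):
    """Pattern evaluation: keep the result only if the groups differ enough."""
    if not patterns:
        return {}
    gap = max(patterns.values()) - min(patterns.values())
    return patterns if gap >= min_gap else {}


# Hypothetical customer records, not real data.
records = [
    {"id": 1, "segment": "student", "spend": 20},
    {"id": 2, "segment": "student", "spend": None},
    {"id": 3, "segment": "professional", "spend": 100},
    {"id": 4, "segment": "professional", "spend": 80},
    {"id": 5, "segment": "student", "spend": 40},
]
data = transform(clean(select(records, ["segment", "spend"])), "spend")
patterns = evaluate(mine(data, "segment", "spend"))
print(patterns)
```

Keeping each stage as its own function mirrors the slide: any stage can be swapped (for example, a clustering step in place of `mine`) without touching the others.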
