DM - Lecture 1

The document discusses data mining concepts presented by Lecturer Aamana. It covers data mining techniques like classification, clustering, regression, and association rule mining. It also addresses challenges in data mining implementation and potential applications in domains like market analysis, risk analysis, and bioinformatics. Key data mining tasks are discussed including characterization, discrimination, and trend analysis from large datasets.

Uploaded by

Amna Arooj

Data Mining: Concepts and Techniques


Lecturer Aamana
Dept of Software Engineering

Research interests
• Recommender systems
• Machine learning
• Natural language processing

March 4, 2024 Data Mining 1


J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Why Mine Data? Commercial Viewpoint

March 4, 2024 Data Mining - Dr Naima Iltaf 3


Why Mine Data? Scientific Viewpoint



Mining Large Data Sets - Motivation



Evolution of Database Technology
 1960s:
 Data collection, database creation, IMS and network DBMS
 1970s:
 Relational data model, relational DBMS implementation
 1980s:
 RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
 Application-oriented DBMS (spatial, scientific, engineering, etc.)
 1990s:
 Data mining, data warehousing, multimedia databases, and Web databases
 2000s:
 Stream data management and mining
 Data mining and its applications
 Web technology (XML, data integration) and global information systems



What Is Data Mining?



What Is Data Mining?
 Given lots of data
 Discover patterns and models that are:
 Valid: hold on new data with some certainty
 Useful: should be possible to act on the item
 Unexpected: non-obvious to the system
 Understandable: humans should be able to interpret the pattern



Predicting through Scores



Find patterns in historical data



Data Mining is not …
 Generating multidimensional cubes of a relational table



Data Mining is not …
 Searching for a phone number in a phone book
 Searching for keywords on Google



Data Mining is not …
 Generating a histogram of salaries for different age groups
 Issuing an SQL query to a database and reading the reply



What Is Data Mining?

 Data mining (knowledge discovery from data)
 Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data

 Alternative names
 Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.



Data Mining tasks



Data Mining: Confluence of Multiple Disciplines

Data mining sits at the confluence of:
 Database technology
 Statistics
 Machine learning
 Visualization
 Pattern recognition
 Algorithms
 Other disciplines



Why Not Traditional Data Analysis?
 Tremendous amounts of data
 Algorithms must be highly scalable to handle data volumes such as terabytes
 High dimensionality of data
 A micro-array may have tens of thousands of dimensions
 High complexity of data
 Data streams and sensor data
 Time-series data, temporal data, sequence data
 Structured data, graphs, social networks, and multi-linked data
 Heterogeneous databases and legacy databases
 Spatial, spatiotemporal, multimedia, text, and Web data
 Software programs, scientific simulations
 New and sophisticated applications

Data Mining Techniques



Data Mining Techniques
Classification
• used to retrieve important and relevant information about data and metadata
• helps to assign data items to different classes
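
As a rough illustration of classification, the sketch below assigns a new record to the class of its closest labeled example (a 1-nearest-neighbor rule). The slide does not prescribe a particular classifier; the nearest-neighbor choice, the function name, and the customer records are assumptions made for this example.

```python
import math


def nearest_neighbor_label(train, query):
    """Return the label of the training point closest to `query`."""
    _, closest_label = min(
        train, key=lambda item: math.dist(item[0], query)
    )
    return closest_label


# Hypothetical labeled records: (age, income in thousands) -> class label.
train = [
    ((25, 30), "low_spender"),
    ((30, 35), "low_spender"),
    ((45, 90), "high_spender"),
    ((50, 100), "high_spender"),
]

print(nearest_neighbor_label(train, (48, 95)))
print(nearest_neighbor_label(train, (27, 32)))
```

The labeled examples play the role of the training data, and the query is a new, unlabeled record.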

Clustering
• identifies data items that are similar to each other
• helps to understand the differences and similarities between the data
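
One common way to group similar items is k-means. The sketch below is a minimal one-dimensional version, assuming the caller supplies the starting centers; the function name and the sample values are illustrative only and are not taken from the lecture.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny k-means for 1-D data with caller-supplied starting centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

Unlike classification, no labels are given: the groups emerge from the distances between the values themselves.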

Regression
• identifies and analyzes the relationship between variables
• used to estimate the value of a specific variable, given the values of other variables
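
A minimal sketch of regression, assuming a single input variable and an ordinary least-squares straight-line fit (the slide does not name a specific regression method). The data points are made up so that they lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that follow y = 2x + 1 exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # estimate y for an unseen x
print(slope, intercept, prediction)
```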

Association Rules
• help to find associations between two or more items
• discover hidden patterns in the data set
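
Association rules are usually scored with support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both for a made-up set of shopping baskets; the transactions and helper names are assumptions for illustration.

```python
# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)


def support(itemset, transactions):
    """Fraction of transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions with the antecedent, the fraction that also
    contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


# Rule under test: {diapers} -> {beer}
print(support({"diapers", "beer"}, transactions))
print(confidence({"diapers"}, {"beer"}, transactions))
```

Here three of the five baskets contain both items, and three of the four baskets with diapers also contain beer.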

Data Mining Techniques
Outlier detection
 identifies data items in the dataset that do not match an expected pattern or expected behavior
 used in a variety of domains, such as intrusion detection, fraud detection, and fault detection
Sequential Patterns
 helps to discover similar patterns or trends in transaction data over a certain period
Prediction
 uses a combination of other data mining techniques, such as trend analysis, sequential patterns, clustering, and classification
 analyzes past events or instances in the right sequence to predict a future event
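
As one possible illustration of outlier detection, the sketch below flags values whose z-score (distance from the mean in standard deviations) exceeds a threshold. The z-score rule, the threshold of 2.0, and the transaction amounts are assumptions for this example; the slides do not commit to a specific method.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` population standard
    deviations away from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical transaction amounts with one suspicious entry.
amounts = [20, 22, 19, 21, 23, 20, 250]
print(zscore_outliers(amounts))
```

A single extreme value inflates both the mean and the standard deviation, so on small samples a robust alternative (for example, one based on the median) may be preferable.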



Challenges of Implementation of Data Mining
 Skilled experts are needed to formulate the data mining queries.
 Overfitting: with a small training database, a model may fit the training data closely yet fail to fit future data.
 Data mining needs large databases, which are sometimes difficult to manage.
 Business practices may need to be modified to make use of the information uncovered.
 If the data set is not diverse, data mining results may not be accurate.
 Integrating the information needed from heterogeneous databases and global information systems can be complex.
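
The overfitting point above can be made concrete with an extreme case: a "model" that simply memorizes its small training set. The sketch is an illustration under made-up data, with hypothetical function names; it compares accuracy on the training examples with accuracy on held-out examples.

```python
def train_memorizer(examples):
    """'Model' that memorizes exact inputs: an extreme case of overfitting."""
    table = dict(examples)
    return lambda x: table.get(x, "unknown")


def parity_rule(x):
    """A model that captures the real pattern instead of memorizing."""
    return "even" if x % 2 == 0 else "odd"


def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)


# Hypothetical data: the label says whether the number is even or odd.
train = [(2, "even"), (3, "odd"), (8, "even")]
held_out = [(4, "even"), (7, "odd"), (10, "even"), (5, "odd")]

memorizer = train_memorizer(train)
print(accuracy(memorizer, train), accuracy(memorizer, held_out))
print(accuracy(parity_rule, train), accuracy(parity_rule, held_out))
```

The memorizer is perfect on the data it has seen and useless on new data, which is why models are evaluated on examples that were not used for training.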

Data Mining: Potential Applications

 Data analysis and decision support
 Market analysis and management
 Target marketing, customer relationship management (CRM), market basket analysis, cross selling, market segmentation
 Risk analysis and management
 Forecasting, customer retention, improved underwriting, quality control, competitive analysis
 Fraud detection and detection of unusual patterns (outliers)
 Other Applications
 Text mining (newsgroups, email, documents) and Web mining
 Stream data mining
 Bioinformatics and bio-data analysis



Challenges in Data Mining



Summary

 Data mining: Discovering interesting patterns from large amounts of data
 A natural evolution of database technology, in great demand, with
wide applications
 A KDD process includes data cleaning, data integration, data
selection, transformation, data mining, pattern evaluation, and
knowledge presentation
 Mining can be performed in a variety of information repositories
 Data mining functionalities: characterization, discrimination,
association, classification, clustering, outlier and trend analysis, etc.
 Data mining systems and architectures
 Major issues in data mining
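
The KDD steps listed in the summary can be sketched as a chain of small functions, one per step. Everything below (record layout, field names, the frequency threshold) is hypothetical and only meant to show how the stages feed into one another.

```python
from collections import Counter

# Hypothetical raw records from two sources.
store_a = [{"customer": "ann", "item": "Milk "}, {"customer": "bob", "item": None}]
store_b = [{"customer": "cy", "item": "milk"}, {"customer": "dee", "item": "Bread"}]


def clean(records):
    """Data cleaning: drop records with a missing item."""
    return [r for r in records if r["item"]]


def integrate(*sources):
    """Data integration: merge several sources into one list."""
    return [r for source in sources for r in source]


def select(records):
    """Data selection: keep only the field relevant to the task."""
    return [r["item"] for r in records]


def transform(items):
    """Transformation: normalize case and whitespace."""
    return [i.strip().lower() for i in items]


def mine(items):
    """Data mining: count how often each item occurs."""
    return Counter(items)


def evaluate(counts, min_count=2):
    """Pattern evaluation: keep only sufficiently frequent items."""
    return {item: n for item, n in counts.items() if n >= min_count}


patterns = evaluate(
    mine(transform(select(integrate(clean(store_a), clean(store_b)))))
)

# Knowledge presentation: report the surviving patterns.
for item, n in sorted(patterns.items()):
    print(f"{item}: bought {n} times")
```

In practice each stage is far more involved, and the process is usually iterative rather than a single pass.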



Evaluation

Based on: quizzes, assignments, project, midterm and final examinations

Distribution of marks

Component        Marks
Quizzes          10
Assignment(s)    10
Project          5
Midterm          25
Final Exam       50
Total            100

Assignments and Reports

All assignments must be submitted by the due date/time.

Marks will be deducted for late submissions: 15% per late day. No submissions are accepted more than 3 days after the due date.

Assignments must be submitted as both soft copies and printed copies.

Accepted formats: MS Office documents or PDFs (recommended).

Multiple pages must be bound together in an appropriate manner (for example, stapled).

Course Resources


Jiawei Han, "Data Mining: Concepts and Techniques", Second Edition or later

Anything that you can find to help you learn.

Topics to Be Covered
 Introduction to Data Mining
 Data Preprocessing
 Data Reduction
 Association Analysis
 Sequence mining
 Clustering
 Classification
 Link analysis
 Outlier mining
 Text Mining
 Web mining
 Recommender System