L1 - Introduction

This document provides an introduction to data mining. It defines data mining as extracting useful patterns from large amounts of data. The main steps in data mining are presented, including data preparation, model building, evaluation, and deployment. Classification, regression, clustering, association rule mining, and sequential pattern mining are described as common data mining tasks. The differences between DBMS, OLAP, and data mining are outlined.

Uploaded by

Veena Tella
Copyright © All Rights Reserved

BITS Pilani, Hyderabad Campus

Dr. Aruna Malapati
Asst Professor, Department of CSIS

Data Mining - Introduction


Today’s Learning Objectives

• Define what Data Mining is

• List the steps/phases involved in Data Mining

• Compare DBMS, OLAP and Data Mining

• List the Predictive and Descriptive Data Mining Tasks

BITS Pilani, Hyderabad Campus


DIKW Pyramid (Data, Information, Knowledge, Wisdom)



Data generated from several sources



Dreaded with data



What is Data Mining?

Searching for knowledge from your data.



What is Data Mining?
• Data mining – Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data.

• Alternative names – Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, etc.



Data Mining Process



DBMS, OLAP and Data Mining
• Task
– DBMS: Extract data
– OLAP: Summaries, trends and forecasts
– Data Mining: Knowledge discovery of hidden patterns
• Type of Result
– DBMS: Information
– OLAP: Analysis
– Data Mining: Insight and prediction
• Method
– DBMS: Deduction
– OLAP: Multidimensional data modelling, aggregation, statistics
– Data Mining: Induction
• Example
– DBMS: List all customers who purchased computers in the last year.
– OLAP: What is the average income of customers across regions?
– Data Mining: Who will buy printers along with computers?
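To make the three example questions concrete, here is a minimal Python sketch over a small invented purchase log. The customer IDs, regions, incomes, purchases and function names are all hypothetical; a real system would issue SQL to a DBMS, roll up an OLAP cube, and run a proper mining algorithm rather than compute one ratio. The point is only the difference in kind: retrieval, aggregation, and an inductive estimate.

```python
from collections import defaultdict

# Hypothetical toy data (invented for illustration).
customers = {
    "c1": {"region": "North", "income": 50},
    "c2": {"region": "North", "income": 70},
    "c3": {"region": "South", "income": 40},
    "c4": {"region": "South", "income": 100},
}
# (customer, item, year) purchase records
purchases = [
    ("c1", "computer", 2023), ("c1", "printer", 2023),
    ("c2", "computer", 2023), ("c2", "printer", 2023),
    ("c3", "computer", 2022),
    ("c4", "computer", 2023),
]

def dbms_query(year):
    """DBMS-style retrieval: customers who purchased computers in a given year."""
    return sorted({c for c, item, y in purchases if item == "computer" and y == year})

def olap_query():
    """OLAP-style aggregation: average customer income per region."""
    totals = defaultdict(lambda: [0, 0])  # region -> [income sum, customer count]
    for info in customers.values():
        totals[info["region"]][0] += info["income"]
        totals[info["region"]][1] += 1
    return {region: s / n for region, (s, n) in totals.items()}

def mining_query():
    """Induction from data: estimated P(buys printer | buys computer) over customer baskets."""
    baskets = defaultdict(set)  # customer -> items bought (all years pooled)
    for c, item, _ in purchases:
        baskets[c].add(item)
    with_computer = [b for b in baskets.values() if "computer" in b]
    with_both = [b for b in with_computer if "printer" in b]
    return len(with_both) / len(with_computer)

print(dbms_query(2023))  # ['c1', 'c2', 'c4']
print(olap_query())      # {'North': 60.0, 'South': 70.0}
print(mining_query())    # 0.5
```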
Data Mining Tasks

• Predictive – the objective is to predict the value of a particular attribute based on the values of other attributes.
– Classification
– Regression
– Outlier Detection
• Descriptive – the objective is to derive patterns.
– Clustering
– Association
– Sequential Pattern Mining
Classification Example
Attribute types: Home Owner (categorical), Marital Status (categorical), Taxable Income (continuous), Default (class)

Training Set
Tid  Home Owner  Marital Status  Taxable Income  Default
1    Yes         Single          125K            No
2    No          Married         100K            No
3    No          Single          70K             No
4    Yes         Married         120K            No
5    No          Divorced        95K             Yes
6    No          Married         60K             No
7    Yes         Divorced        220K            No
8    No          Single          85K             Yes
9    No          Married         75K             No
10   No          Single          90K             Yes

Test Set
Home Owner  Marital Status  Taxable Income  Default
No          Single          75K             ?
Yes         Married         50K             ?
No          Married         150K            ?
Yes         Divorced        90K             ?
No          Single          40K             ?
No          Married         80K             ?

The training set is used to learn a model, which is then applied to the test set to predict the unknown Default values.
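The "apply model" step can be sketched in Python with the two tables encoded as tuples. The `model` function below is a hypothetical, hand-written decision tree (the 80K threshold is an assumption); it is just one of many models that fit this training set perfectly, not a model produced by any particular learning algorithm from the course.

```python
# Training set from the slide: (Tid, Home Owner, Marital Status, Taxable Income in K, Default)
training_set = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]
# Test set from the slide: (Home Owner, Marital Status, Taxable Income in K); class unknown
test_set = [
    ("No", "Single", 75),
    ("Yes", "Married", 50),
    ("No", "Married", 150),
    ("Yes", "Divorced", 90),
    ("No", "Single", 40),
    ("No", "Married", 80),
]

def model(home_owner, marital_status, income):
    """Hypothetical hand-written decision tree that classifies every training record correctly."""
    if home_owner == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    return "No" if income < 80 else "Yes"  # assumed income threshold (in K)

train_accuracy = sum(model(h, m, i) == y for _, h, m, i, y in training_set) / len(training_set)
predictions = [model(h, m, i) for h, m, i in test_set]
print(train_accuracy)  # 1.0
print(predictions)
```

A perfect score on the training set says little on its own; how well the model generalizes is judged on held-out records, as the definition slide explains.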



Classification: Definition
• Given a collection of records (training set)
– Each record contains a set of attributes; one of the attributes is the class.

• Find a model for the class attribute as a function of the values of the other attributes.

• Goal: previously unseen records should be assigned a class as accurately as possible.
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
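A minimal Python sketch of the hold-out procedure just described. The helper names, the 30% test fraction and the fixed random seed are assumptions for illustration, and a majority-class baseline stands in for a real classifier so the example stays short.

```python
import random
from collections import Counter

def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and hold out a fraction of them as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

def fit_majority_class(train_labels):
    """Baseline 'model': always predict the most frequent class seen in training."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda record: majority

def accuracy(true_labels, predicted_labels):
    """Fraction of records whose predicted class matches the true class."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Usage: the Default column from the Classification Example slide, keyed by Tid.
labels = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]
records = list(enumerate(labels, start=1))  # (Tid, class) pairs
train, test = train_test_split(records)
model = fit_majority_class([y for _, y in train])
acc = accuracy([y for _, y in test], [model(tid) for tid, _ in test])
print(len(train), len(test), acc)
```

Any real classifier can replace `fit_majority_class`; the split and the accuracy computation stay the same, and the baseline gives a floor that a useful model should beat.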



Regression

• In classification, the output(s) are nominal (class labels)

• In regression, the output is continuous

– Function approximation

• Many models could be used; the simplest is linear regression

– Fit the data with the best hyperplane that "goes through" the points

[Figure: y – dependent variable (output) plotted against x – independent variable (input)]
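With a single input x, the "best hyperplane" is a line y = w0 + w1·x. Reading "best" as least squares (the usual choice, assumed here), the closed form is w1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and w0 = ȳ − w1·x̄. A minimal Python sketch; the function name and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = w0 + w1 * x for one input variable."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w1 = sxy / sxx             # slope
    w0 = y_mean - w1 * x_mean  # intercept
    return w0, w1

w0, w1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(w0, w1)       # 1.0 2.0
print(w0 + w1 * 5)  # predicted output for input x = 5: 11.0
```

With several inputs the same least-squares idea fits a hyperplane, usually by solving the normal equations with a linear-algebra library instead of these scalar formulas.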


Clustering
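Clustering is a descriptive task: it groups records so that records in the same cluster are similar to each other and dissimilar to those in other clusters. As one widely used illustration (not necessarily the algorithm covered in this course), here is a minimal k-means (Lloyd's algorithm) sketch in Python. Taking the first k points as initial centroids is a simplifying assumption; practical implementations use smarter or repeated random initialization.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means on coordinate tuples; the first k points serve as initial centroids."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return assignment, centroids

points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
assignment, centroids = kmeans(points, k=2)
print(assignment)  # [0, 1, 0, 0, 1, 1]
print(centroids)
```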



Association Rule Mining
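Association rule mining finds itemsets that occur together in at least min_sup transactions and turns them into rules X → Y, scored by confidence = support(X ∪ Y) / support(X). Below is a minimal Apriori-style sketch in Python on a small hypothetical market-basket dataset. It counts support by scanning every transaction, which is fine for illustration but not how a scalable implementation would do it.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_sup):
    """Level-wise (Apriori-style) search: map each itemset with support count >= min_sup to its count."""
    level = [frozenset([item]) for item in sorted({i for t in transactions for i in t})]
    frequent, k = {}, 1
    while level:
        current = {}
        for candidate in level:
            count = support_count(candidate, transactions)
            if count >= min_sup:
                current[candidate] = count
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        joined = {a | b for a in current for b in current if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent

def confidence(antecedent, consequent, transactions):
    """conf(X -> Y) = support_count(X ∪ Y) / support_count(X); X must occur at least once."""
    return (support_count(antecedent | consequent, transactions)
            / support_count(antecedent, transactions))

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent = frequent_itemsets(transactions, min_sup=3)
print(len(frequent))  # 8
print(confidence(frozenset({"diaper"}), frozenset({"beer"}), transactions))  # 0.75
```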



Sequential Pattern Mining
• Given a set of sequences and a support threshold, find the complete set of frequent subsequences.

A sequence: < (ef) (ab) (df) c b >
An element may contain a set of items. Items within an element are unordered, and we list them alphabetically.

A sequence database:
SID  Sequence
10   <a(abc)(ac)d(cf)>
20   <(ad)c(bc)(ae)>
30   <(ef)(ab)(df)cb>
40   <eg(af)cbc>

<a(bc)dc> is a subsequence of <a(abc)(ac)d(cf)>.

Given support threshold min_sup = 2, <(ab)c> is a sequential pattern.
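The two claims on this slide can be checked mechanically. In this minimal Python sketch (function names are hypothetical), a pattern is contained in a sequence if each of its elements is a subset of a distinct element of the sequence, in the same order. Greedily matching each pattern element to the earliest possible sequence element is enough to decide containment. The support of a pattern is the number of database sequences that contain it.

```python
def parse_sequence(text):
    """Parse '<a(abc)(ac)d(cf)>' into a list of elements, each a frozenset of single-character items."""
    body = text.strip().strip("<>").replace(" ", "")
    elements, i = [], 0
    while i < len(body):
        if body[i] == "(":
            j = body.index(")", i)
            elements.append(frozenset(body[i + 1:j]))
            i = j + 1
        else:
            elements.append(frozenset(body[i]))
            i += 1
    return elements

def is_subsequence(sub, seq):
    """True if each element of sub is a subset of a distinct element of seq, in the same order."""
    pos = 0
    for element in sub:
        while pos < len(seq) and not element <= seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1  # the next element of sub must match a strictly later element
    return True

def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for seq in database.values() if is_subsequence(pattern, seq))

raw = {10: "<a(abc)(ac)d(cf)>", 20: "<(ad)c(bc)(ae)>",
       30: "<(ef)(ab)(df)cb>", 40: "<eg(af)cbc>"}
database = {sid: parse_sequence(s) for sid, s in raw.items()}

print(is_subsequence(parse_sequence("<a(bc)dc>"), database[10]))  # True
print(support(parse_sequence("<(ab)c>"), database))               # 2 (SIDs 10 and 30)
```

Since the support of <(ab)c> is 2, which meets min_sup = 2, it qualifies as a sequential pattern. Finding *all* such patterns efficiently is the job of dedicated sequential pattern mining algorithms; this sketch only verifies a given candidate.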
Challenges in Data Mining

• Tremendous amount of data
– Algorithms must be highly scalable to handle data on the scale of terabytes
• High dimensionality of data
– Microarray data may have tens of thousands of dimensions
• High complexity of data
• Noisy and unreliable
• Dynamically evolving
• High dimensionality
• Multiple heterogeneous sources
• New and sophisticated applications



Teaching and Evaluation for BITS F415 (L-P-U: 3-0-3)
Evaluation Scheme:
Component         Duration   Weightage (%)   Nature of Component
Mid Term Exam     90 Mins    25              Closed Book
Quizzes (Three)   30 Mins    15              Closed Book
Assignments       --         25              Open Book
Comprehensive     3 Hours    35              Closed Book

Chamber Consultation Hour: Mon 8th hour

Notices: All notices pertaining to this course will be displayed on the CMS/CSIS Notice Board.

Make-up Policy: Prior permission is a must, and make-up shall be granted only in genuine cases, based on the individual’s need and circumstances. A recommendation from the Chief Warden is necessary to request a make-up.
Books

Text Book
Reference Books



Take home message

• Data Mining refers to the non-trivial extraction of implicit, previously unknown and potentially useful knowledge from data.
• Data Mining covers topics including warehousing, association analysis, clustering, classification, anomaly detection, etc. (based on the type of mined knowledge), as well as transaction data mining, stream data mining, sequence data mining, graph data mining, etc. (based on the type of data).
• Data Mining has wide applications in many different fields, including business, science, engineering, education and many more.

