Week 15 Lecture Notes


MACHINE LEARNING

A JOURNEY FROM DATA TO DECISIONS

DEPARTMENT OF COMPUTER SCIENCE


Unsupervised Learning

Types of Unsupervised Learning Algorithms

Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities stay in the same group and have few or no similarities with objects in other groups.

Association: An association rule is an unsupervised learning method used to find relationships between variables in large databases.

K-Means Clustering

What is K-Means Clustering?

It is an iterative algorithm that divides an unlabeled dataset into k different clusters, such that each data point belongs to only one group and the points within a group share similar properties.
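The assignment/update loop behind this can be sketched in a few lines of plain Python; the 2-D points and the two starting centroids below are made-up values for illustration:

```python
# A minimal K-Means sketch (illustrative data and starting centroids).
def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 7.5)]
final_centroids, final_clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(final_centroids)
```

In practice a library routine such as scikit-learn's KMeans would be used; this sketch only shows the two alternating steps.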
How does the K-Means Algorithm
Work?

Final Clusters

Example K-Means Clustering

Types of Clustering Methods

▰Partitioning Clustering

▰Density-Based Clustering

▰Distribution Model-Based Clustering

▰Hierarchical Clustering
Partitioning Clustering

In this type, the dataset is divided into k groups, where k is the pre-defined number of groups.

The cluster centers are chosen so that the distance from the data points of a cluster to their own centroid is smaller than their distance to any other cluster's centroid.
Density-Based Clustering

It connects highly dense areas into clusters, so arbitrarily shaped clusters can be formed as long as the dense regions stay connected.

The algorithm does this by identifying the different dense regions in the dataset and joining connected areas of high density into clusters.

The dense areas in the data space are separated from each other by sparser areas.
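A toy version of this idea, in the spirit of DBSCAN, can be sketched in plain Python on made-up 1-D values; eps is the neighbourhood radius and min_pts the density threshold, and both values are chosen only for illustration:

```python
# A toy density-based clustering sketch in the spirit of DBSCAN, on made-up
# 1-D values. eps is the neighbourhood radius, min_pts the density threshold.
def density_cluster(points, eps=1.5, min_pts=2):
    labels = [None] * len(points)       # None = unvisited, -1 = noise
    cluster_id = -1

    def neighbours(i):
        return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # too sparse: mark as noise
            continue
        cluster_id += 1
        labels[i] = cluster_id
        while seeds:                    # grow through connected dense areas
            j = seeds.pop()
            if labels[j] in (None, -1):
                labels[j] = cluster_id
                more = neighbours(j)
                if len(more) >= min_pts:
                    seeds.extend(m for m in more if labels[m] is None)
    return labels

labels = density_cluster([1.0, 1.5, 2.0, 10.0, 10.5, 25.0])
print(labels)  # [0, 0, 0, 1, 1, -1]
```

The isolated point 25.0 ends up labeled -1 (noise), which is exactly the "separated by sparser areas" behaviour described above.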
Distribution Model-Based
Clustering

The data is divided based on the probability that each data point belongs to a particular distribution.

The grouping is done by assuming the data follows certain distributions, most commonly the Gaussian distribution.
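The probabilistic "soft" assignment behind this can be sketched with two assumed 1-D Gaussian components whose weights, means, and standard deviations are made up; a full Gaussian mixture model would also re-estimate these parameters (the EM algorithm), which is omitted here:

```python
import math

# Toy soft assignment under two assumed 1-D Gaussian components.
# The weights, means, and standard deviations are made up for illustration.
def gaussian_pdf(x, mean, std):
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def responsibilities(x, components):
    # components: list of (weight, mean, std) triples
    likelihoods = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(likelihoods)
    # Posterior probability that x came from each component.
    return [l / total for l in likelihoods]

components = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
r = responsibilities(1.0, components)
print(r)  # the point at x = 1 is far more likely under the first Gaussian
```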
Hierarchical Clustering

In this technique, the dataset is divided into clusters that form a tree-like structure, also called a dendrogram.

Any desired number of clusters can then be obtained by cutting the tree at the appropriate level.
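A toy bottom-up (agglomerative) sketch of this on made-up 1-D values, using single linkage; stopping the merging when k clusters remain plays the role of cutting the dendrogram at a level:

```python
# A toy agglomerative (bottom-up) sketch on made-up 1-D values, using
# single linkage; stopping when k clusters remain is the "cut" of the tree.
def agglomerate(values, k):
    clusters = [[v] for v in values]    # start: every point is its own cluster
    while len(clusters) > k:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return [sorted(c) for c in clusters]

clusters3 = agglomerate([1.0, 1.2, 5.0, 5.3, 9.9], k=3)
print(clusters3)  # [[1.0, 1.2], [5.0, 5.3], [9.9]]
```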

Machine Learning Process

Association

Apriori Algorithm

Steps for Apriori Algorithm

▰Step-1: Determine the support of the itemsets in the transactional database, and choose the minimum support and minimum confidence.
▰Step-2: Keep all itemsets in the transactions whose support is higher than the minimum (selected) support value.
▰Step-3: Find all rules over these subsets whose confidence is higher than the minimum confidence threshold.
▰Step-4: Sort the rules in decreasing order of lift.
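Steps 1 and 2 can be sketched in plain Python; the transaction list and minimum support below are made up for illustration and are not the dataset used on the following slides:

```python
# Apriori sketch: count itemset supports level by level, keep the itemsets
# meeting the minimum support, and grow candidates from the survivors.
# The transactions and minimum support are made up for illustration.
transactions = [
    {"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"},
    {"A", "B", "C"}, {"A"}, {"B"},
]
min_support = 2

def frequent_itemsets(transactions, min_support):
    items = sorted(set().union(*transactions))
    frequent, size = {}, 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        # Support of a candidate = number of transactions that contain it.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Next candidates: unions of surviving itemsets, one item larger.
        candidates = list({a | b for a in level for b in level
                           if len(a | b) == size})
    return frequent

freq = frequent_itemsets(transactions, min_support)
print(freq[frozenset({"A", "B", "C"})])  # 2
```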
Apriori Algorithm Working

Suppose we have the following dataset containing various transactions; from this dataset, we need to find the frequent itemsets and generate the association rules using the Apriori algorithm.

Step-1: Calculating C1 and L1

Candidate set C1 and frequent itemset L1

Step-2: Candidate Generation C2 and L2

Candidate set C2 and frequent itemset L2

Step-3: Candidate Generation C3 and L3

Candidate set C3
As the C3 table shows, only one itemset combination has a support count equal to the minimum support count.
So L3 will contain only one combination, i.e., {A, B, C}.

Step-4: Finding the association
rules for the subsets

Rule       Support  Confidence
A^B → C    2        sup{(A^B)^C}/sup(A^B) = 2/4 = 0.5 = 50%
B^C → A    2        sup{(B^C)^A}/sup(B^C) = 2/4 = 0.5 = 50%
A^C → B    2        sup{(A^C)^B}/sup(A^C) = 2/4 = 0.5 = 50%
C → A^B    2        sup{C^(A^B)}/sup(C) = 2/5 = 0.4 = 40%
A → B^C    2        sup{A^(B^C)}/sup(A) = 2/6 = 0.33 = 33.33%
B → A^C    2        sup{B^(A^C)}/sup(B) = 2/7 = 0.29 = 28.57%

As the given threshold (minimum confidence) is 50%, the first three rules (A^B → C, B^C → A, and A^C → B) can be considered strong association rules for the given problem.
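The confidence column can be reproduced directly from the support counts stated in the table (sup(A^B) = sup(B^C) = sup(A^C) = 4, sup(C) = 5, sup(A) = 6, sup(B) = 7, and support 2 for each full rule):

```python
# Rules as (support of the full itemset, support of the antecedent),
# using the counts stated in the table above.
rules = {
    "A^B -> C": (2, 4),
    "B^C -> A": (2, 4),
    "A^C -> B": (2, 4),
    "C -> A^B": (2, 5),
    "A -> B^C": (2, 6),
    "B -> A^C": (2, 7),
}
min_confidence = 0.5

# Confidence = sup(antecedent AND consequent) / sup(antecedent).
strong = [rule for rule, (sup, sup_ante) in rules.items()
          if sup / sup_ante >= min_confidence]
print(strong)  # the three 50%-confidence rules survive the threshold
```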

Splitting the Dataset - Holdout
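The holdout idea (reserve part of the data for testing, train on the rest) can be sketched in plain Python; the dataset, the 25% test fraction, and the fixed seed below are made-up choices for illustration:

```python
import random

# A minimal holdout split: shuffle the indices and reserve a fraction
# (here 25%) of a made-up dataset for testing.
def holdout_split(data, test_fraction=0.25, seed=0):
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)         # fixed seed: reproducible
    cut = int(len(data) * (1 - test_fraction))
    train = [data[i] for i in indices[:cut]]
    test = [data[i] for i in indices[cut:]]
    return train, test

train, test = holdout_split(list(range(8)))
print(len(train), len(test))  # 6 2
```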

Stratified Sampling
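A sketch of stratified splitting on made-up samples and labels: each class is split separately, so the train/test label proportions match the full dataset:

```python
from collections import defaultdict

# Stratified holdout sketch: split each class separately so the train/test
# label proportions match the full dataset. Samples and labels are made up.
def stratified_split(samples, labels, test_fraction=0.25):
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    train, test = [], []
    for y, group in by_class.items():
        cut = int(len(group) * (1 - test_fraction))  # per-class cut point
        train += [(s, y) for s in group[:cut]]
        test += [(s, y) for s in group[cut:]]
    return train, test

samples = list(range(8))
labels = ["pos"] * 4 + ["neg"] * 4
train, test = stratified_split(samples, labels)
print([y for _, y in test])  # ['pos', 'neg'] -- one from each class
```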

Underfitting and Overfitting

Bias vs Variance

• Bias is the error caused by overly simple model assumptions: the difference between the model's average predicted value and the true value it is trying to predict.
• Variance is the model's sensitivity to the particular training set, commonly seen as the difference in performance on the training set versus the test set.
Bias vs Variance

We generally want to minimize both bias and variance, i.e., build a model that not only fits the training data well but also generalizes well to test/validation data.
Enrich the Dataset

Improve Model Efficiency –
K-Fold Testing
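A minimal sketch of how k-fold splitting assigns indices (sizes made up, no shuffling); each sample lands in the test fold exactly once across the k rounds:

```python
# A minimal sketch of k-fold index splitting (no shuffling): the data is
# cut into k folds, and each fold serves as the test set exactly once.
def k_fold_indices(n, k):
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))          # held-out fold
        train = [i for i in range(n) if i not in test]   # everything else
        folds.append((train, test))
        start += size
    return folds

for train, test in k_fold_indices(n=6, k=3):
    print(test)  # [0, 1] then [2, 3] then [4, 5]
```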

Model Selection

Anaconda Environment

Value Addition

Sample Dataset - Iris

Dataset Types

Facets of data

■ Structured
■ Unstructured
■ Natural language
■ Machine-generated
■ Graph-based
■ Audio, video, and images
■ Streaming
Data Preprocessing
Techniques - Missing Data
Two common ways to deal with missing data:

1. Delete the rows that contain the missing values.
2. Impute the missing values, e.g., by calculating the mean of the column.
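Both options can be shown on a made-up numeric column where None marks the missing entry:

```python
# A made-up numeric column where None marks the missing entry.
column = [10.0, 12.0, None, 14.0]

# Option 1: delete the rows with missing values.
dropped = [v for v in column if v is not None]

# Option 2: impute the missing values with the column mean.
present = [v for v in column if v is not None]
mean = sum(present) / len(present)               # (10 + 12 + 14) / 3 = 12.0
imputed = [v if v is not None else mean for v in column]

print(dropped)   # [10.0, 12.0, 14.0]
print(imputed)   # [10.0, 12.0, 12.0, 14.0]
```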

Encoding Categorical Data
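A common way to encode categorical data is one-hot encoding, where each category becomes its own 0/1 indicator column; a sketch on made-up colour names:

```python
# One-hot encode a categorical column: each category becomes a 0/1 column.
# The colour names below are made up for illustration.
def one_hot(values):
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

categories, encoded = one_hot(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```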

Feature Scaling

• Scaling data means transforming it so that the values fit within some range or
scale, such as 0–100 or 0–1.

• Imagine you have an image represented as a set of RGB values ranging from 0 to
255. We can scale the range of the values from 0–255 down to a range of 0–1.

• This scaling process will not affect the algorithm output since every value is scaled
in the same way.

• But it can speed up the training process, because now the algorithm only needs to
handle numbers less than or equal to 1.
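The 0–255 to 0–1 transformation described above is min-max scaling, which can be sketched in one line:

```python
# Min-max scaling: map values from [low, high] down to [0, 1].
def min_max_scale(values, low=0.0, high=255.0):
    return [(v - low) / (high - low) for v in values]

print(min_max_scale([0, 51, 255]))  # [0.0, 0.2, 1.0]
```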

Example Dataset

Machine Learning with R

Dataset Resources

Open Data Resources

Technologies
Tools for Data Science

Applications
Image Processing

Banking and Finance

Sports

Digital Advertisements

Health Care

Speech Recognition

Internet Search

Recommender
System

Gaming

Augmented Reality

Self-Driving Cars

Robots

Questions & Answers Session

