Week 15 Lecture Notes
Types of Unsupervised Learning Algorithms
Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarity remain in the same group and have little or no similarity with the objects of other groups.
K-Means Clustering
What is K-Means Clustering?
How does the K-Means Algorithm Work?
Final Clusters
Example K-Means Clustering
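The assign/update loop from the slides above can be sketched in plain Python; the points and initial centroids below are made-up toy values for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Minimal K-Means sketch: repeatedly assign each point to its
    nearest centroid, then move each centroid to its cluster's mean."""
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: recompute each centroid as its cluster's mean
        # (an empty cluster keeps its old centroid).
        centroids = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Two well-separated toy groups; initial centroids chosen by hand.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```

In practice K-Means is sensitive to the initial centroids; libraries typically rerun the algorithm with several random initializations and keep the best result.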
Types of Clustering Methods
▰Partitioning Clustering
▰Density-Based Clustering
▰Hierarchical Clustering
Partitioning Clustering
Machine Learning Process
Association
Apriori Algorithm
Steps for Apriori Algorithm
Step-1: Calculating C1 and L1
Step-2: Candidate Generation C2 and L2
Step-3: Candidate Generation C3 and L3
Candidate Generation C3
As the C3 table shows, only one itemset has a support count that meets the minimum support count.
So L3 contains only one itemset: {A, B, C}.
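The level-wise procedure behind Steps 1-3 (count candidates Ck, keep those meeting minimum support as Lk, join Lk into the next candidate set) can be sketched as follows; the transactions and the minimum support of 2 are made-up illustration values.

```python
def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori sketch: count candidate k-itemsets (Ck),
    keep those meeting min_support (Lk), join Lk to build C(k+1)."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]  # C1
    k, frequent = 1, {}
    while candidates:
        # Support count of each candidate over all transactions.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = [c for c, n in counts.items() if n >= min_support]  # Lk
        for c in level:
            frequent[c] = counts[c]
        # Candidate generation: unions of Lk members with size k+1.
        k += 1
        candidates = list({a | b for a in level for b in level
                           if len(a | b) == k})
    return frequent

# Toy transactions; with min_support=2 the only frequent 3-itemset
# that survives is {A, B, C}.
tx = [frozenset(t) for t in ('ABC', 'ABC', 'AB', 'AC', 'BCD')]
freq = frequent_itemsets(tx, min_support=2)
```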
Step-4: Finding the association rules for the subsets
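Once a frequent itemset such as {A, B, C} is found, rules are formed from its subsets and kept when their confidence (support of the whole itemset divided by support of the left-hand side) is high enough. A sketch, using hypothetical support counts and an arbitrary confidence threshold of 0.6:

```python
from itertools import combinations

def association_rules(itemset, support, min_confidence):
    """For each non-empty proper subset S of a frequent itemset I,
    form the rule S -> (I minus S) and keep it when
    confidence = support(I) / support(S) >= min_confidence."""
    full = frozenset(itemset)
    rules = []
    for r in range(1, len(full)):
        for lhs in combinations(sorted(full), r):
            lhs = frozenset(lhs)
            confidence = support[full] / support[lhs]
            if confidence >= min_confidence:
                rules.append((set(lhs), set(full - lhs), confidence))
    return rules

# Hypothetical support counts for the frequent itemset {A, B, C}.
support = {frozenset('A'): 4, frozenset('B'): 4, frozenset('C'): 4,
           frozenset('AB'): 3, frozenset('AC'): 3, frozenset('BC'): 3,
           frozenset('ABC'): 2}
rules = association_rules('ABC', support, min_confidence=0.6)
```

With these counts, the single-item antecedents (e.g. {A} -> {B, C}) have confidence 2/4 = 0.5 and are rejected, while the two-item antecedents (e.g. {A, B} -> {C}) have confidence 2/3 and are kept.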
Splitting the Dataset - Holdout
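The holdout method can be sketched in a few lines; the 80/20 ratio and the fixed seed below are arbitrary choices for the example.

```python
import random

def holdout_split(data, test_ratio=0.2, seed=0):
    """Shuffle the indices, then hold out the last test_ratio fraction
    as the test set; the remainder is the training set."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    cut = int(len(data) * (1 - test_ratio))
    return ([data[i] for i in indices[:cut]],
            [data[i] for i in indices[cut:]])

train, test = holdout_split(list(range(10)), test_ratio=0.2)
```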
Stratified Sampling
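Stratified sampling splits each class separately so the test set keeps the dataset's class proportions. A sketch with a made-up 2:1 class ratio:

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, test_ratio=0.25, seed=0):
    """Split each class (stratum) separately so both halves keep
    roughly the same class proportions as the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    train, test = [], []
    for y, xs in by_class.items():
        rng.shuffle(xs)
        cut = round(len(xs) * test_ratio)
        test += [(x, y) for x in xs[:cut]]
        train += [(x, y) for x in xs[cut:]]
    return train, test

# 8 samples of class 'a' and 4 of class 'b' (a 2:1 ratio); the split
# should preserve that ratio in both the training and test halves.
X, y = list(range(12)), ['a'] * 8 + ['b'] * 4
train, test = stratified_split(X, y, test_ratio=0.25)
```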
Underfitting and Overfitting
Bias vs Variance
• Bias is the error from overly simplistic model assumptions: the gap between the model's average prediction and the true value. High bias leads to underfitting.
• Variance is the model's sensitivity to the particular training set, often visible as the gap between performance on the training set and on the test set. High variance leads to overfitting.
Improve Model Efficiency – K-Fold Testing
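K-fold cross-validation can be sketched by generating index folds; n=10 samples and k=5 folds below are arbitrary example values.

```python
def kfold_indices(n, k):
    """Yield (train_idx, test_idx) pairs: each of the k folds serves
    once as the test set while the other k-1 folds form the
    training set."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop

folds = list(kfold_indices(10, 5))
```

A model would then be trained and scored k times, once per fold, and the k scores averaged to estimate generalization performance.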
Model Selection
Anaconda Environment
Value Addition
Sample Dataset - Iris
Dataset Types
Facets of data
■ Structured
■ Unstructured
■ Natural language
■ Machine-generated
■ Graph-based
■ Audio, video, and images
■ Streaming
Data Preprocessing Techniques - Missing Data
Two common ways to deal with missing data:
1. Delete the rows that contain missing values.
2. Impute the missing values, for example with the mean of the column.
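Both options can be sketched in plain Python; the toy records below use None to mark a missing value.

```python
def impute_mean(rows, col):
    """Option 2: replace missing values (None) in one column with
    the mean of the observed values in that column."""
    observed = [r[col] for r in rows if r[col] is not None]
    mean = sum(observed) / len(observed)
    return [{**r, col: mean if r[col] is None else r[col]} for r in rows]

data = [{'age': 20}, {'age': None}, {'age': 40}]

# Option 1: simply drop the rows with a missing value.
kept = [r for r in data if r['age'] is not None]

# Option 2: fill the gap with the column mean.
filled = impute_mean(data, 'age')
```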
Encoding Categorical Data
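A common way to encode categorical data is one-hot encoding: each category becomes a binary indicator column, so the model sees no spurious ordering between categories. A minimal sketch with made-up color labels:

```python
def one_hot(values):
    """Map each category to a binary indicator vector (one-hot):
    a 1 in the column of its own category, 0 everywhere else."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

categories, encoded = one_hot(['red', 'green', 'red', 'blue'])
```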
Feature Scaling
• Scaling data means transforming it so that the values fit within some range, such as 0–100 or 0–1.
• Imagine you have an image represented as a set of RGB values ranging from 0 to 255. We can scale that 0–255 range down to 0–1.
• For many algorithms this scaling does not change the output, since every value is transformed in the same way.
• It can, however, speed up training, because the algorithm now works with small numbers on a comparable scale.
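The RGB example above is min-max scaling; a minimal sketch, with the 0–255 pixel range as the assumed input bounds:

```python
def min_max_scale(values, lo=0.0, hi=255.0):
    """Rescale values from the range [lo, hi] into [0, 1], as in the
    RGB pixel example."""
    return [(v - lo) / (hi - lo) for v in values]

pixels = [0, 51, 102, 255]
scaled = min_max_scale(pixels)
```

In general lo and hi would be taken from the training data (the column's minimum and maximum) and reused unchanged on the test data.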
Example Dataset
Machine Learning with R
Dataset Resources
Open Data Resources
Technologies
Tools for Data Science
Applications
Image Processing
Banking and Finance
Sports
Digital Advertisements
Health Care
Speech Recognition
Internet Search
Recommender System
Gaming
Augmented Reality
Self-Driving Cars
Robots
Questions & Answers Session