DA-Unit V
Clustering Algorithms
● Clustering Algorithms:
○ K-Means, Hierarchical Clustering,
○ Time-series analysis.
● Introduction to Text Analysis:
○ Text-preprocessing, Bag of words, TF-IDF and topics.
● Need and Introduction to social network analysis
● Introduction to business analysis.
● Model Evaluation and Selection:
○ Metrics for Evaluating Classifier Performance, Holdout Method and Random Subsampling
○ Parameter Tuning
○ Clustering and Time-series analysis using scikit-learn.
○ Metrics: Confusion matrix, AUC-ROC curves, Elbow plot.
Clustering
● Trying to determine the appropriate audience for the product
● Using clustering algorithms on the customer base
● Selling the products to the targeted audience
Clustering Algorithms
❏ K-Means
Unsupervised learning algorithm
Here K defines the number of predefined clusters to be created in the process: if K=2, there will be two clusters; for K=3, there will be three clusters; and so on.
Clustering Algorithms
❏ K-Means
● It is an iterative algorithm that divides the unlabeled dataset into K different clusters
● in such a way that each data point belongs to only one group with similar properties.
● It allows us to cluster the data into different groups, and it is a convenient way to discover the categories of groups in an unlabeled dataset on its own, without the need for any training.
● It is a centroid-based algorithm, where each cluster is associated with a centroid.
● Determines the best value for K center points or centroids by an iterative process.
● Assigns each data point to its closest k-center; the data points near a particular k-center form a cluster.
● Basically, K-Means runs on distance calculations, using the "Euclidean Distance" to calculate the distance between two given instances.
● For given instances (X1, Y1) and (X2, Y2), the formula is d = √((X2 − X1)² + (Y2 − Y1)²).
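In code, this distance can be computed directly; a minimal plain-Python sketch:

```python
import math

def euclidean(p, q):
    # Euclidean distance between two instances, e.g. (X1, Y1) and (X2, Y2).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((1, 2), (4, 6)))  # sqrt(3^2 + 4^2) = 5.0
```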
How does the K-Means Algorithm Work?
Step 1: Decide the number of clusters, K.
Step 2: Select K random points as the initial centroids (these need not be from the input dataset).
Step 3: Assign each data point to its closest centroid, forming K clusters.
Step 4: Recompute the centroid (the mean) of each cluster.
Step 5: Repeat the third step, i.e., reassign each data point to the new closest centroid of each cluster.
Elbow Method
● Plot the within-cluster sum of squares (WCSS) against the number of clusters K; the K at the "elbow" of the curve, where the decrease sharply slows, is chosen as the optimal number of clusters.
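The steps above, together with the WCSS quantity used in the elbow plot, can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation; for simplicity it initializes the centroids with the first K points rather than random ones.

```python
import math

def kmeans(points, k, iters=100):
    # Naive initialization: take the first k points as centroids.
    centroids = list(points[:k])
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters

def wcss(centroids, clusters):
    # Within-cluster sum of squared distances: the elbow-plot quantity.
    return sum(math.dist(p, c) ** 2
               for c, cl in zip(centroids, clusters) for p in cl)

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
for k in (1, 2, 3):
    cents, cls = kmeans(points, k)
    print(k, round(wcss(cents, cls), 2))
```

On this toy data the WCSS drops steeply from K=1 to K=2 and barely changes afterwards, so the elbow is at K=2.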
❏ Hierarchical Clustering
● Agglomerative (bottom-up): the algorithm considers each data point as a single cluster at the beginning, and then starts combining the closest pairs of clusters.
● It does this until all the clusters are merged into a single cluster that contains all the data points.
Divisive Hierarchical clustering
● This top-down strategy does the reverse of agglomerative hierarchical clustering by starting with all objects in one cluster.
● It subdivides the cluster into smaller and smaller pieces, until each object forms a cluster on its own or until certain termination conditions are satisfied,
e.g., a desired number of clusters, or the diameter of each cluster being within a certain threshold.
● The distance between two clusters is crucial for hierarchical clustering.
● There are various ways to calculate the distance between two clusters, and these ways (the linkage methods) decide the rule for clustering.
Agglomerative Hierarchical clustering
Linkage Methods
Single Linkage: the shortest distance between the closest points of the two clusters.
Complete Linkage: the farthest distance between two points of two different clusters.
● It is one of the popular linkage methods, as it forms tighter clusters than single linkage.
Average Linkage: the distance between each pair of data points (one from each cluster) is added up and then divided by the total number of pairs to calculate the average distance between two clusters.
Centroid Linkage: the distance between the centroids of the two clusters is calculated.
How does Agglomerative Hierarchical Clustering work?
Step 1: Treat each data point as a single cluster and compute the proximity (distance) matrix.
Step 2: Merge the two closest clusters into one cluster.
Step 3: Update the proximity matrix, recomputing the distance from the merged cluster to every other cluster, and again merge the closest pair.
Example: to decide the distance between the merged cluster (1,2) and cluster 3, check the proximity matrix; with single linkage we take the minimum:
min(d(1,3), d(2,3)) = min(18, 21) = 18
Step 4: Repeat Steps 2–3 until only a single cluster (or the desired number of clusters) remains.
Dendrogram Representation
Number of Clusters
● Decide a threshold on the dendrogram; consider threshold = 12.
● The number of clusters will be the number of vertical lines which are intersected by the horizontal line drawn at the threshold.
● Here the red line intersects 2 vertical lines, so we will have 2 clusters.
● One cluster will have samples (1, 2, 4) and the other will have samples (3, 5).
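The merge loop described above can be sketched in plain Python using single linkage (minimum pairwise distance), stopping at a chosen number of clusters. This is an illustrative sketch, not a library implementation; SciPy's `linkage`/`dendrogram` are the usual tools in practice.

```python
import math

def single_linkage(points, num_clusters):
    # Agglomerative, bottom-up: start with each point in its own cluster.
    clusters = [[p] for p in points]

    def dist(a, b):
        # Single linkage: shortest distance between the closest members.
        return min(math.dist(x, y) for x in a for y in b)

    while len(clusters) > num_clusters:
        # Scan the proximity matrix for the closest pair of clusters...
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        # ...and merge them.
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

pts = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8)]
print(single_linkage(pts, 2))
```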
Agglomerative vs. Divisive
● Agglomerative: initially each item is in its own cluster; clusters are then merged bottom-up.
● Divisive: initially all items are in one cluster, which is split top-down.
Clustering Algorithms
❏ Time-series analysis
● Time series is a sequence of data points in chronological order, most often gathered at regular intervals.
● It can be applied to any variable that changes over time; generally speaking, data points that are closer together in time are more similar than those further apart.
● It is a way of studying the characteristics of the response variable with respect to time, with time as the independent variable.
● To estimate the target variable when predicting or forecasting, the time variable is used as the point of reference.
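As a small illustration of working with such data, a moving average smooths a series so the underlying trend is easier to see (the sales numbers below are made up):

```python
def moving_average(series, window):
    # Each output value is the mean of the previous `window` observations,
    # which smooths short-term irregularity and exposes the trend.
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

# Hypothetical monthly sales with an upward trend plus noise.
sales = [100, 104, 98, 110, 108, 115, 112, 120, 118, 125]
print(moving_average(sales, 3))
```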
Example: stock price
Timestamp            Stock Price
2015-10-11 09:00:00  100
Components of time series
● Trend
● Seasonality
● Cyclical
● Irregularity
Trend: in which there is no fixed interval, and any divergence within the given dataset is over a continuous timeline.
Seasonality: in which regular or fixed-interval shifts occur within the dataset over a continuous timeline.
● Identifying seasonality in time series data is important for the development of a useful time series model.
● Tools that are useful for detecting seasonality in time series data:
○ Time series plots
○ Statistical analysis and tests
Cyclical: in which there is no fixed interval; there is uncertainty in the movement and its pattern.
Time series analysis can be classified as:
❏ Time-series analysis
Techniques used for time series analysis
ARIMA Models (AutoRegressive Integrated Moving Average): combine autoregressive (AR) terms, differencing (I), and moving-average (MA) terms to model and forecast a series.
Text Analysis
● Text data examples: keywords, names, survey responses.
● Core operations: Tokenization, POS Tagging.
Tokenization
● Tokenization splits text into smaller units (tokens), such as sentences or words.
● Text may contain stop words such as is, am, are, this, a, an, the, etc., which carry little meaning and are usually removed during preprocessing.
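A minimal sketch of tokenization plus stop-word removal in plain Python (NLTK ships a much fuller stop-word list; the tiny set here is hand-made for illustration):

```python
STOP_WORDS = {"is", "am", "are", "this", "a", "an", "the"}

def preprocess(text):
    # Lowercase, tokenize on whitespace, strip punctuation, drop stop words.
    tokens = [w.strip(".,!?").lower() for w in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]

print(preprocess("This is an example of the text preprocessing step."))
```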
Text Analysis Operations using Natural Language Toolkit (NLTK)
POS Tagging: assigning a part-of-speech label (noun, verb, adjective, etc.) to each token.
Term Frequency
Formula: TF(t, d) = (number of occurrences of term t in document d) / (total number of terms in d)
Inverse Document Frequency (IDF)
● Some words, such as 'of', 'and', etc., can be the most frequently present but are of little significance.
● IDF provides a weight for each word based on its frequency in the corpus D.
Formula: IDF(t) = log(N / df(t)), where N is the number of documents in the corpus D and df(t) is the number of documents containing the term t.
TF-IDF
● TF-IDF gives more weight to a word that is rare in the corpus (across all the documents).
● TF-IDF gives more importance to a word that is more frequent in a particular document.
Formula: TF-IDF(t, d) = TF(t, d) × IDF(t)
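Putting TF and IDF together in plain Python, using the common log(N/df) variant of IDF (scikit-learn's `TfidfVectorizer` uses a slightly smoothed formula):

```python
import math

def tf(term, doc):
    # Term frequency: occurrences of the term / total terms in the document.
    return doc.count(term) / len(doc)

def idf(term, corpus):
    # Inverse document frequency over corpus D (log(N / df) variant).
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df)

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "and", "the", "dog"],
]
# "the" occurs in every document, so its IDF (hence TF-IDF) is 0;
# rarer words such as "cat" receive positive weight.
print(tf_idf("the", corpus[0], corpus), tf_idf("cat", corpus[0], corpus))
```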
Disadvantage of TF-IDF: it is based on raw counts, so it captures neither word order nor the semantic similarity between words.
Social Network Analysis
Reference: https://www.latentview.com/blog/a-guide-to-social-network-analysis-and-its-use-cases/
A graph is made up of vertices (also called nodes) that are connected by edges (also called links or relationships).
Edge relationships can be:
● Symmetric and Asymmetric (Directionality)
● Binary and Valued (Weight)
Examples of edge relationships:
● If the relationship between nodes is 'child of', then the relationship is asymmetric: if A is the child of B, then B is not the child of A.
● The same holds if someone follows someone else on Twitter: following is not necessarily mutual.
Network Density
● 5 nodes
● Potential edges = 5(5−1)/2 = 20/2 = 10
● Actual edges = 9
● Density = 9/10 = 90%
● Hence it is a high-density network.
● 5 nodes
● Potential edges = 5(5−1)/2 = 20/2 = 10
● Actual edges = 4
● Density = 4/10 = 40%
● Hence it is a low-density network.
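The density calculation above is one line of plain Python (the edge list below is illustrative, matching the 5-node, 4-edge case):

```python
def density(num_nodes, edges):
    # Undirected graph: density = actual edges / potential edges,
    # where potential edges = n(n - 1) / 2.
    potential = num_nodes * (num_nodes - 1) / 2
    return len(edges) / potential

# A 5-node network with 4 edges.
edges = [(1, 2), (1, 3), (3, 4), (4, 5)]
print(density(5, edges))  # 4 / 10 = 0.4
```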
● Closeness Centrality: closeness measures how close a node is to the rest of the network. It is the ability of the node to reach the other nodes in the network.
● It is calculated as the inverse of the sum of the distances between a node and the other nodes in the network.
● Example: the sum of distances from node 1 to all other nodes is 16, so the closeness score for node 1 will be 1/16.
● The standardized score is calculated by multiplying the score by (n − 1).
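A minimal sketch in plain Python, using breadth-first search for shortest-path distances on an unweighted graph. The chain graph here is hypothetical (not the slide's network, whose node 1 had a distance sum of 16); NetworkX provides `closeness_centrality` for real use.

```python
from collections import deque

def closeness(graph, node):
    # BFS gives shortest-path distances on an unweighted graph;
    # closeness = 1 / (sum of distances to all other nodes).
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return 1 / sum(dist.values())

# Hypothetical 5-node chain: 1-2-3-4-5.
graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
score = closeness(graph, 3)
print(score, score * (len(graph) - 1))  # standardized: multiply by (n - 1)
```

The middle node 3 scores higher than the end node 1, as expected: it can reach every other node in fewer steps.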
Business analysis
● Validation data: used for validating the performance of the model.
Model Evaluation & Selection
Cross Validation
● It is a resampling procedure used to evaluate machine learning models and assess how the model will perform on an independent test dataset.
● In the case of holdout cross-validation, the dataset is randomly split into training and validation data.
● The more data that is used to train the model, the better the model tends to be.
Cons: for the holdout cross-validation method, a good amount of data is isolated from training.
❏ In the example, the dataset is split to create test data with a size of 30% and train data with a size of 70%. The random_state number ensures the split is deterministic in every run.
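The described 70/30 split can be sketched in plain Python as a stand-in for scikit-learn's `train_test_split` (which takes the same `test_size`/`random_state` idea; the function name here is our own):

```python
import random

def holdout_split(data, test_size=0.3, random_state=42):
    # Shuffle deterministically (random_state fixes the split across runs),
    # then hold out the last `test_size` fraction for testing.
    rng = random.Random(random_state)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]  # train, test

train, test = holdout_split(range(10), test_size=0.3)
print(len(train), len(test))  # 7 3
```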
● Random subsampling: unlike k-fold cross-validation, the dataset is not split into groups or folds; each split is made at random.
Pros
1. The proportion of train and validation splits is not dependent on the number of iterations or partitions.
Cons
1. Some samples may not be selected for either training or validation.
2. Not suitable for an imbalanced dataset.
Model Evaluation & Selection
● Models all differ in some way or the other, but what makes them different is nothing but the input parameters (hyperparameters) of the model.
● The best part about these is that you get a choice to select them for your model.
Parameter Tuning
● We are not aware of the optimal values for the hyperparameters that would generate the best model output.
● So we tell the model to explore and select the optimal model architecture automatically.
❏ Questions that hyperparameter tuning will answer for us:
● What should be the value for the maximum depth of the Decision Tree?
● How many trees should I select in a Random Forest model?
● Should we use a single-layer or multi-layer Neural Network? If multiple layers, how many layers should there be?
● How many neurons should I include in the Neural Network?
● What should be the minimum sample split value for Decision Tree?
● What value should I select for the minimum sample leaf for my Decision Tree?
❏ Common strategies for hyperparameter tuning:
● Manual Search
● Random Search
● Grid Search
● Manual Search
❏ We select some hyperparameters for a model based on our gut feeling and experience.
❏ Based on these parameters, the model is trained, and its performance measures are checked.
❏ This process is repeated with another set of values for the same hyperparameters until optimal accuracy is achieved, or the model reaches minimal error.
❏ This might not be of much help, as human judgment is biased and human experience plays a significant role here.
● Random Search
❏ Instead of doing multiple rounds of this process manually, it is better to give multiple values for all the hyperparameters in one go, sample random combinations, and let the search decide which one best suits the model.
● Grid Search
❏ Every combination of the supplied hyperparameter values is evaluated exhaustively, and the best-performing combination is selected.
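The exhaustive search can be sketched in plain Python (scikit-learn's `GridSearchCV` is the practical tool). The scoring function below is a made-up stand-in that pretends accuracy peaks at `max_depth=5`, `n_trees=100`:

```python
from itertools import product

def grid_search(param_grid, score_fn):
    # Evaluate every combination of hyperparameter values and
    # return the best-scoring combination.
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical scoring function: highest (0) at depth 5 and 100 trees.
def score_fn(p):
    return -abs(p["max_depth"] - 5) - abs(p["n_trees"] - 100) / 100

grid = {"max_depth": [3, 5, 10], "n_trees": [50, 100, 200]}
print(grid_search(grid, score_fn))
```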
Confusion Matrix
● A table that summarizes a classifier's predictions against the actual labels, with counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).
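The four counts, and the metrics derived from them, can be computed directly in plain Python (the labels below are made up):

```python
def confusion_matrix(y_true, y_pred):
    # Counts for a binary classifier: TP, FP, FN, TN.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, tn, accuracy, precision, recall)
```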
ROC-AUC Curve
● The Receiver Operator Characteristic (ROC) curve is an evaluation metric for binary
classification problems.
● It is a probability curve that plots the TPR against FPR at various threshold values
and essentially separates the ‘signal’ from the ‘noise’.
● The Area Under the Curve (AUC) is the measure of the ability of a classifier to
distinguish between classes and is used as a summary of the ROC curve.
● The higher the AUC, the better the performance of the model at distinguishing
between the positive and negative classes.
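One way to see what AUC measures: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counting as half). A plain-Python sketch with made-up scores:

```python
def roc_auc(y_true, scores):
    # AUC via the pairwise-ranking interpretation: fraction of
    # positive/negative pairs the classifier orders correctly.
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(roc_auc(y_true, scores))
```

A perfect classifier scores every positive above every negative and gets AUC = 1.0; random scoring hovers around 0.5.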