DA-Unit V

Outcome

Implement data visualization using visualization tools in Python programming.
Outline

● Clustering Algorithms
● Text Analysis
● Model Evaluation and Selection
Outline

● Clustering Algorithms:
○ K-Means, Hierarchical Clustering,
○ Time-series analysis.
● Introduction to Text Analysis:
○ Text preprocessing, bag of words, TF-IDF and topics.
● Need for and introduction to social network analysis.
● Introduction to business analysis.
● Model Evaluation and Selection:
○ Metrics for evaluating classifier performance, holdout method and random subsampling.
○ Parameter tuning.
○ Clustering and time-series analysis using scikit-learn (sklearn).
○ Metrics: confusion matrix, AUC-ROC curves, elbow plot.
Clustering

What is Clustering | Cluster Analysis

Cluster analysis is a statistical classification technique in which a set of objects or points with similar characteristics are grouped together in clusters.
Clustering

Need for Clustering Algorithms

1. Trying to determine the appropriate audience for the product
2. Using clustering algorithms on the customer base
3. Selling the products to the targeted audience
Clustering Algorithms

❏ K-Means
❏ Hierarchical Clustering
❏ Time-series analysis
Clustering Algorithms

❏ K-Means

● Unsupervised learning algorithm used to solve clustering problems.
● It groups an unlabeled dataset into different clusters.
● Here K defines the number of predefined clusters to be created in the process: if K=2, there will be two clusters; for K=3, there will be three clusters; and so on.
Clustering Algorithms

❏ K-Means
Clustering Algorithms

❏ K-Means

● It is an iterative algorithm that divides the unlabeled dataset into k different clusters, in such a way that each data point belongs to only one group of points with similar properties.
● It allows us to cluster the data into different groups and is a convenient way to discover the categories of groups in an unlabeled dataset on its own, without the need for any training.
Clustering Algorithms

❏ K-Means
● It is a centroid-based algorithm, where each cluster is associated with a centroid.
● The main aim of the algorithm is to minimize the sum of distances between each data point and its corresponding cluster centroid.
Clustering Algorithms

❏ K-Means

The k-means clustering algorithm mainly performs two tasks:

● Determines the best value for the K center points or centroids by an iterative process.
● Assigns each data point to its closest k-center. The data points nearest to a particular k-center form a cluster.
Clustering Algorithms

❏ K-Means
Clustering Algorithms

❏ K-Means
● K-Means runs on distance calculations; it uses the Euclidean distance to calculate the distance between two given instances.
● For given instances (X1, Y1) and (X2, Y2), the formula is:

d = √((X1 − X2)² + (Y1 − Y2)²)

● Link for Solved Example

● Link for Python Code
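Below is a minimal sketch of K-Means in Python with scikit-learn (sklearn, as listed in the outline); the 2-D points and the choice K=2 are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

# Euclidean distance between two instances (X1, Y1) and (X2, Y2)
def euclidean(p, q):
    return np.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# Hypothetical 2-D points to be grouped into K=2 clusters
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)   # cluster index assigned to each point
print(labels)                    # e.g. [1 1 1 0 0 0]
print(kmeans.cluster_centers_)   # final centroids of the two clusters
```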


Clustering Algorithms
❏ K-Means
Clustering Algorithms
❏ K-Means
How does the K-Means Algorithm Work?

Step 1

Select the number K to decide the number of clusters.

● Let's take the number of clusters k, i.e., K=2, to identify the dataset and put the points into different clusters.
● It means here we will try to group the dataset into two different clusters.
Clustering Algorithms
❏ K-Means
How does the K-Means Algorithm Work?

Step 2

Select K random points or centroids. (They may be points other than those from the input dataset.)
Clustering Algorithms
❏ K-Means
How does the K-Means Algorithm Work?

Step 3

Assign each data point to its closest centroid, which will form the predefined K clusters.
Clustering Algorithms
❏ K-Means
How does the K-Means Algorithm Work?

Step 4

Calculate the variance and place a new centroid for each cluster.
Clustering Algorithms
❏ K-Means
How does the K-Means Algorithm Work?

Step 5

Repeat the third step: reassign each data point to the new closest centroid of each cluster.
Clustering Algorithms
❏ K-Means
How does the K-Means Algorithm Work?

Step 6

If any reassignment occurs, then go to Step 4; else go to FINISH.


Clustering Algorithms
❏ K-Means
How does the K-Means Algorithm Work?

Step 7 The model is ready.


Clustering Algorithms

How does the K-Means Algorithm Work?


Clustering Algorithms

Effect of the Number of Clusters K


Model Evaluation & Selection

Elbow Method

Click here for demonstration
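A short sketch of producing an elbow plot with sklearn and matplotlib; the synthetic blobs dataset is a placeholder for real data:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with 3 natural groups (placeholder for a real dataset)
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Fit K-Means for a range of K and record the inertia of each fit
inertias = []
ks = range(1, 9)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squared distances

# The "elbow" (here around K=3) suggests the appropriate number of clusters
plt.plot(list(ks), inertias, marker='o')
plt.xlabel('Number of clusters K')
plt.ylabel('Inertia (within-cluster SSE)')
plt.title('Elbow plot')
plt.show()
```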


Clustering Algorithms

❏ Hierarchical Clustering

● We develop the hierarchy of clusters in the form of a tree.
● This tree-shaped structure is known as a dendrogram.


Clustering Algorithms

❏ Hierarchical Clustering | Hierarchical Cluster Analysis (HCA)

● An unsupervised machine learning algorithm
● Used to group unlabeled datasets into clusters


Clustering Algorithms

❏ Hierarchical Clustering

● The hierarchical clustering technique has two approaches:

1. Agglomerative: a bottom-up approach, in which the algorithm starts by taking all data points as single clusters and merges them until one cluster is left.

2. Divisive: the reverse of the agglomerative algorithm, as it is a top-down approach.
Clustering Algorithms

❏ Hierarchical Clustering

Agglomerative Hierarchical clustering

● The agglomerative hierarchical clustering algorithm is a popular example of HCA.
● To group the datasets into clusters, it follows the bottom-up approach.
● This means the algorithm considers each data point as a single cluster at the beginning, and then starts combining the closest pairs of clusters.
● It does this until all the clusters are merged into a single cluster that contains all the data points.
Clustering Algorithms
❏ Hierarchical Clustering
Divisive Hierarchical clustering

● This top-down strategy does the reverse of agglomerative hierarchical clustering, starting with all objects in one cluster.
● It subdivides the cluster into smaller and smaller pieces, until each object forms a cluster on its own or until certain termination conditions are satisfied, e.g., a desired number of clusters is reached or the diameter of each cluster is within a certain threshold.
Clustering Algorithms

❏ Hierarchical Clustering

Why hierarchical clustering?

● We can opt for the hierarchical clustering algorithm because, in this algorithm, we don't need to know the number of clusters in advance.
Clustering Algorithms

❏ Hierarchical Clustering

Agglomerative Hierarchical clustering


Measures for the distance between two clusters

● How we measure the distance between two clusters is crucial for hierarchical clustering.
● There are various ways to calculate the distance between two clusters, and these ways decide the rule for clustering.
● These measures are called linkage methods.


Clustering Algorithms
❏ Hierarchical Clustering
Agglomerative Hierarchical clustering
Linkage Methods:

● Single Linkage
● Complete Linkage
● Average Linkage
● Centroid Linkage
Clustering Algorithms
❏ Hierarchical Clustering
Agglomerative Hierarchical clustering
Linkage Methods

Single Linkage ● It is the shortest distance between the closest points of the two clusters.
Clustering Algorithms
❏ Hierarchical Clustering
Agglomerative Hierarchical clustering
Linkage Methods

Complete Linkage ● It is the farthest distance between two points of two different clusters.
● It is one of the popular linkage methods, as it forms tighter clusters than single linkage.
Clustering Algorithms
❏ Hierarchical Clustering
Agglomerative Hierarchical clustering
Linkage Methods

Average Linkage ● It is the linkage method in which the distance between each pair of data points is added up and then divided by the total number of pairs, to calculate the average distance between two clusters.
Clustering Algorithms
❏ Hierarchical Clustering
Agglomerative Hierarchical clustering
Linkage Methods

Centroid Linkage ● It is the linkage method in which the distance between the centroids of the two clusters is calculated.

Reference
Clustering Algorithms

❏ Hierarchical Clustering Example

How does Agglomerative Hierarchical Clustering work?
● The task is to divide the students into different groups.
● We have each student's marks in an assignment, and groups are to be formed based on these marks.
● There's no fixed target here as to how many groups to have.
● Since there is no clear idea of what type of students should be assigned to which group, it cannot be solved as a supervised learning problem.
● So, we will try to apply hierarchical clustering here and segment the students into different groups.
Clustering Algorithms

❏ Hierarchical Clustering Example


Agglomerative Hierarchical clustering
How does Agglomerative Hierarchical Clustering work?

Step 1

Create a proximity matrix. For example, the distance between marks 10 and 7 is √((10 − 7)²) = √9 = 3.


Clustering Algorithms

❏ Hierarchical Clustering
Agglomerative Hierarchical clustering
How does Agglomerative Hierarchical Clustering work?

Step 2

Assign all the points to individual clusters.


Clustering Algorithms

❏ Hierarchical Clustering
Agglomerative Hierarchical clustering
How does Agglomerative Hierarchical Clustering work?

Step 3

Look at the smallest distance in the proximity matrix and merge the points with the smallest distance.
Clustering Algorithms

❏ Hierarchical Clustering
Agglomerative Hierarchical clustering
How does Agglomerative Hierarchical Clustering work?

Step 3

Here the smallest distance is 3, and hence we will merge points 1 and 2.
Clustering Algorithms

❏ Hierarchical Clustering
Agglomerative Hierarchical clustering
How does Agglomerative Hierarchical Clustering work?

Step 3

Look at the updated clusters and update the proximity matrix accordingly, as per the selected linkage function.
Clustering Algorithms

❏ Hierarchical Clustering
Agglomerative Hierarchical clustering
How does Agglomerative Hierarchical Clustering work?

Step 3

To decide the distance between the merged cluster (1,2) and point 3, check the proximity matrix. With single linkage:

d((1,2), 3) = min(d(1,3), d(2,3)) = min(18, 21) = 18
Clustering Algorithms

❏ Hierarchical Clustering
Agglomerative Hierarchical clustering
How does Agglomerative Hierarchical Clustering work?

Step 4

Repeat step 3 until only a single cluster is left.


Clustering Algorithms

❏ Hierarchical Clustering
Agglomerative Hierarchical clustering

Dendrogram Representation
Clustering Algorithms

❏ Hierarchical Clustering
Agglomerative Hierarchical clustering

Dendrogram Representation

The greater the height of the vertical lines in the dendrogram, the greater the distance between those clusters.
Clustering Algorithms

❏ Hierarchical Clustering
Agglomerative Hierarchical clustering

Number of Clusters
● Decide a threshold; consider threshold = 12.
● The number of clusters is the number of vertical lines intersected by the horizontal line drawn at the threshold.
● The red line intersects 2 vertical lines, so we will have 2 clusters.
● One cluster will have samples (1, 2, 4) and the other will have samples (3, 5).
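A sketch of the same workflow in code, using SciPy's hierarchy module (an assumption; the slides do not name a library). The marks array mirrors the five-student example, and the threshold of 12 is the one discussed above:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Hypothetical 1-D marks for the 5 students in the example
marks = np.array([[10], [7], [28], [20], [35]])

# Single linkage: distance between clusters = minimum pairwise distance
Z = linkage(marks, method='single')

dendrogram(Z, labels=[1, 2, 3, 4, 5])
plt.axhline(y=12, color='red', linestyle='--')  # threshold = 12
plt.ylabel('Distance')
plt.show()

# Cut the tree at the threshold to obtain flat cluster labels
print(fcluster(Z, t=12, criterion='distance'))
```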
Clustering Algorithms
❏ Hierarchical Clustering

Agglomerative Hierarchical clustering
● Initially, each item is in its own cluster
● Iteratively, clusters are merged together
● Bottom-up

Divisive Hierarchical clustering
● Initially, all items are in one cluster
● Large clusters are successively divided
● Top-down
Clustering Algorithms

❏ Time-series analysis

● A time series is a sequence of data points in chronological order, most often gathered at regular intervals.
● Time-series analysis can be applied to any variable that changes over time; generally speaking, data points that are closer together in time are more similar than those further apart.
● It is the way of studying the characteristics of the response variable with respect to time as the independent variable.
● To estimate the target variable when predicting or forecasting, the time variable is used as the point of reference.
Clustering Algorithms

❏ Time-series analysis
Example: stock price. Observations are recorded every hour.

Basic structure of time series data:

Timestamp              Stock Price
2015-10-11 09:00:00    100
2015-10-11 10:00:00    110
2015-10-11 11:00:00    105
2015-10-11 12:00:00    90
2015-10-11 13:00:00    120


Clustering Algorithms

❏ Time-series analysis
Components of time series:

● Trend
● Seasonality
● Cyclical
● Irregularity
Clustering Algorithms

❏ Time-series analysis
Components of time series

Trend: there is no fixed interval, and any divergence within the given dataset follows a continuous timeline. The trend can be negative, positive, or null.


Clustering Algorithms

❏ Time-series analysis
Components of time series

Seasonality: regular or fixed-interval shifts within the dataset on a continuous timeline. Its shape can be a bell curve or a sawtooth.

● Identifying seasonality in time series data is important for the development of a useful time series model.
Clustering Algorithms

❏ Time-series analysis
Components of time series

Seasonality

● Identifying seasonality in time series data is important for the development of a useful time series model.

❏ Tools that are useful for detecting seasonality in time series data:
● Time series plots
● Statistical analysis and tests
Clustering Algorithms

❏ Time-series analysis
Components of time series

Seasonality
Clustering Algorithms

❏ Time-series analysis
Components of time series

Cyclical: there is no fixed interval, and there is uncertainty in the movement and its pattern.

Source
Clustering Algorithms

❏ Time-series analysis
Components of time series

Irregularity: unexpected situations, events, or scenarios, and spikes over a short time span.


Clustering Algorithms
❏ Time-series analysis
Components of time series
Clustering Algorithms

❏ Time-series analysis
Time series analysis can be classified as:

● Parametric & Non-parametric
● Linear & Non-linear
● Univariate & Multivariate


Clustering Algorithms

❏ Time-series analysis
Techniques used for time series analysis:

● ARIMA models
● Box-Jenkins multivariate models
● Holt-Winters exponential smoothing


Clustering Algorithms

❏ Time-series analysis
Techniques used for time series analysis

ARIMA Models

ARIMA stands for AutoRegressive Integrated Moving Average.
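As a hedged illustration, an ARIMA model can be fit with the statsmodels library (an assumption; the slides do not name an implementation). The random-walk price series below is a placeholder:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA  # statsmodels assumed installed

# Placeholder hourly price series (a random walk around 100)
rng = np.random.default_rng(0)
prices = pd.Series(
    100 + np.cumsum(rng.normal(0, 1, 100)),
    index=pd.date_range('2015-10-11', periods=100, freq='h'),
)

# order=(p, d, q): autoregressive lags, differencing, moving-average lags
model = ARIMA(prices, order=(1, 1, 1)).fit()
print(model.forecast(steps=5))  # forecast the next five hours
```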


Text Analysis

Text analysis is a machine learning technique used to automatically extract valuable insights from unstructured text data.
Text Analysis

Text analysis can extract specific information, such as:

● Keywords
● Names
● Company information
● Survey responses
Text Analysis

Text Analysis Operations using the Natural Language Toolkit (NLTK)

● Tokenization
● Stop Words Removal
● Stemming and Lemmatization
● POS Tagging
Text Analysis

Text Analysis Operations using natural language toolkit

Tokenization

● Tokenization is the first step in text analytics.
● The process of breaking down a text paragraph into smaller chunks, such as words or sentences, is called tokenization.
● A token is a single entity that is a building block of a sentence or paragraph.
Text Analysis

Text Analysis Operations using natural language toolkit

Tokenization

Sentence Tokenization ● Split a paragraph into a list of sentences using the sent_tokenize() method.

Word Tokenization ● Split a sentence into a list of words using the word_tokenize() method.
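A minimal NLTK sketch of both methods (the sample text is arbitrary; 'punkt' is the tokenizer model that NLTK downloads once):

```python
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download('punkt')  # tokenizer models (one-time download)

text = "Text analytics is fun. NLTK makes it easier."
print(sent_tokenize(text))  # ['Text analytics is fun.', 'NLTK makes it easier.']
print(word_tokenize(text))  # ['Text', 'analytics', 'is', 'fun', '.', ...]
```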
Text Analysis

Text Analysis Operations using natural language toolkit

Tokenization
Text Analysis

Text Analysis Operations using natural language toolkit

Stop Words Removal

● Stopwords are considered noise in the text.
● Text may contain stop words such as is, am, are, this, a, an, the, etc.
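A small sketch of stop word removal using NLTK's built-in English stop word list (the sample sentence is arbitrary):

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')

tokens = word_tokenize("This is an example showing stop word removal")
stop_words = set(stopwords.words('english'))

# Keep only the tokens that are not in the stop word list
filtered = [w for w in tokens if w.lower() not in stop_words]
print(filtered)  # e.g. ['example', 'showing', 'stop', 'word', 'removal']
```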
Text Analysis
Text Analysis Operations using natural language toolkit

Stop Words Removal


Text Analysis
Text Analysis Operations using natural language toolkit

Stemming and Lemmatization

● Stemming is a normalization technique in which lists of tokenized words are converted into shortened root words to remove redundancy.
● Lemmatization in NLTK (Natural Language Toolkit) is the algorithmic process of finding the lemma of a word depending on its meaning and context.
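A quick sketch contrasting the two using NLTK's PorterStemmer and WordNetLemmatizer ('flying' is an arbitrary example word):

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download('wordnet')  # lexical database used by the lemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem('flying'))                   # 'fli' (crude chopped root)
print(lemmatizer.lemmatize('flying', pos='v'))  # 'fly' (dictionary lemma)
```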
Text Analysis
Text Analysis Operations using natural language toolkit

Stemming and Lemmatization


Text Analysis
Text Analysis Operations using natural language toolkit

Stemming and Lemmatization


Example
Text Analysis
Text Analysis Operations using natural language toolkit

POS Tagging

● POS (Part of Speech) tags give us grammatical information about the words of a sentence by assigning a specific tag to each word.
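A minimal sketch using NLTK's default POS tagger (the sentence is arbitrary):

```python
import nltk
from nltk.tokenize import word_tokenize

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')  # the default POS tagger model

tokens = word_tokenize("NLTK tags each word with its part of speech")
print(nltk.pos_tag(tokens))
# e.g. [('NLTK', 'NNP'), ('tags', 'VBZ'), ('each', 'DT'), ...]
```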
Text Analysis

Text Analysis Model using TF-IDF

● Term frequency–inverse document frequency (TF-IDF) is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.
Text Analysis

Term Frequency

● It is a measure of the frequency of a word (w) in a document (d).
● TF is defined as the ratio of a word's occurrences in a document to the total number of words in the document.
Text Analysis

Term Frequency

Formula

TF(w, d) = (number of occurrences of w in d) / (total number of words in d)
Text Analysis

Term Frequency

Example
Text Analysis

Inverse Document Frequency

● It is a measure of the importance of a word.
● Term frequency (TF) does not consider the importance of words.
● Some words, such as 'of', 'and', etc., can be very frequent but are of little significance.
● IDF provides a weight for each word based on its frequency in the corpus D.
Text Analysis

Inverse Document Frequency

Formula

IDF(w) = log(N / f(w)), where N is the total number of documents in the corpus and f(w) is the number of documents containing the word w.
Text Analysis

Inverse Document Frequency

Example

In our example, since we have two documents in the corpus, N = 2.
Text Analysis

Term Frequency — Inverse Document Frequency (TFIDF)

● It is the product of TF and IDF.

● TFIDF gives more weightage to the word that is rare in the corpus (all the
documents).

● TFIDF provides more importance to the word that is more frequent in the
document.
Text Analysis

Term Frequency — Inverse Document Frequency (TFIDF)

Formula

TF-IDF(w, d) = TF(w, d) × IDF(w)
Text Analysis

Term Frequency — Inverse Document Frequency (TFIDF)

Example
Text Analysis

Term Frequency — Inverse Document Frequency (TFIDF)

Disadvantage of TF IDF

● It is unable to capture the semantics.
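A hedged sketch of computing TF-IDF with scikit-learn's TfidfVectorizer; note that sklearn uses a smoothed variant of the IDF formula above, so the weights differ slightly from a hand calculation. The two-document corpus is hypothetical:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the car is driven on the road",
    "the truck is driven on the highway",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)   # documents x vocabulary matrix

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(tfidf.toarray().round(2))            # TF-IDF weights per document
```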


Text Analysis

Introduction to social network analysis

Social network analysis (SNA)

● is the process of investigating social structures in terms of nodes and the edges that connect them, through the use of networks and graph theory.

Source
Text Analysis

Application of Social Network Analysis

https://www.latentview.com/blog/a-guide-to-social-network-analysis-and-its-use-cases/
Text Analysis

Introduction to social network analysis

Common tasks in social network analysis:
● Link prediction
● Classification
● Community detection
● Influence propagation


Text Analysis

Graph Theory for social network analysis

Graph: A graph is made up of vertices (also called nodes) that are connected by edges (also called links or relationships).
Text Analysis

Graph Theory for social network analysis

Edges: edge relationships differ along two dimensions:
● Symmetric and Asymmetric (Directionality)
● Binary and Valued (Weight)

● The relationship “working together” is a symmetric relationship: if A is related to B, B is also related to A.
Text Analysis

Graph Theory for social network analysis

Edges: asymmetric relationships
● If the relationship between nodes is ‘child of’, then the relationship is asymmetric: if A is the child of B, then B is not the child of A.
● This is also the case when someone follows someone else on Twitter.
Text Analysis

Graph Theory for social network analysis

Edges: binary and valued relationships
● Relationships can be binary or valued.
● “Priya follows Teena on Twitter” is a binary relationship.
● “Priya retweeted 4 tweets from Teena” is a valued relationship.
● In the Twitter world, such relationships are easily quantified; in the “softer” social world it is very hard to determine and quantify the quality of an interpersonal relationship.
Text Analysis

Graph Theory for social network analysis

Density
● The relation between the number of existing connections in a network and all possible connections.
Text Analysis

Graph Theory for social network analysis

Density
● 5 nodes
● Potential edges = 5(5 − 1)/2 = 20/2 = 10
● Actual edges = 9
● Density = 9/10 = 90%
● Hence it is a high-density network.
Text Analysis

Graph Theory for social network analysis

Density
● 5 nodes
● Potential edges = 5(5 − 1)/2 = 20/2 = 10
● Actual edges = 4
● Density = 4/10 = 40%
● Hence it is a low-density network.
Text Analysis

Graph Theory for social network analysis Density


Text Analysis

Graph Theory for social network analysis

Centrality Measures

Degree Centrality ● Measures the number of direct ties to a node; this indicates the most connected node in the group.
● The standardized score is calculated by dividing the score by (n − 1), where n is the number of nodes in the network.
● Nodes 3 and 5 have a high degree centrality of 0.5, i.e., they are the most well-connected nodes in the network.
Text Analysis

Graph Theory for social network analysis

Closeness Centrality
● Closeness measures how close a node is to the rest of the network: the ability of the node to reach the other nodes in the network.
● It is calculated as the inverse of the sum of the distances between a node and the other nodes in the network.
Text Analysis

Graph Theory for social network analysis

Centrality Measures
● Closeness Centrality
● Hence the closeness score for node 1 will be 1/16.
● The standardized score is calculated by multiplying the score by (n − 1).
Text Analysis

Graph Theory for social network analysis

Centrality Measures
● Closeness Centrality
● Node 4 is the closest/most central node in the network, with the highest closeness score of 0.6.
Text Analysis

Graph Theory for social network analysis

Centrality Measures
● Betweenness Centrality is a measure of how often a node appears on the shortest path connecting two other nodes.
Text Analysis

Graph Theory for social network analysis

Centrality Measures
● Let us take node 5 in Figure 4: node 5 occurs in 9 shortest paths between pairs of nodes.
● Nodes with high betweenness centrality are critical in controlling and maintaining flow in the network; hence these are critical nodes in the network.
Text Analysis

Graph Theory for social network analysis

Centrality Measures
● Eigenvector Centrality is a relative measure of the importance of a node in the network.
● Each node is assigned a value or score depending upon the number of other prominent/high-scoring nodes it is connected to.
Text Analysis

Graph Theory for social network analysis

Why Centrality Measures?
● Here ‘d’ represents the degree centrality score.
● Nodes A and B are connected to 4 nodes each, and hence both have a degree centrality score of 4.
● But when we look at their neighbors, we can see that node B is connected to nodes with a high degree.
● Hence, node B can be preferred over node A when we have to choose based on connectivity.
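As a sketch, the density and centrality measures above can be computed with the networkx library (an assumption; the slides do not name a library). The small undirected graph is hypothetical:

```python
import networkx as nx

# A small hypothetical undirected network
G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (3, 5)])

print(nx.density(G))                 # existing edges / possible edges
print(nx.degree_centrality(G))       # degree score divided by (n - 1)
print(nx.closeness_centrality(G))    # inverse of summed distances (standardized)
print(nx.betweenness_centrality(G))  # share of shortest paths through a node
print(nx.eigenvector_centrality(G))  # importance derived from important neighbors
```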
Text Analysis

Introduction to business analysis

Business analysis

● “Business analysis is the practice of enabling change in an enterprise by defining needs and recommending solutions that deliver value to stakeholders.
● It enables an enterprise to articulate needs and the rationale for change, and to design and describe solutions that can deliver value.”
Model Evaluation & Selection

Why Cross Validation is important

● Data needs to be split into:
● Training data: used for model development
● Validation data: used for validating the performance of the same model
● In simple terms, cross-validation allows us to utilize our data better.


Model Evaluation & Selection

Why Cross Validation is important


Model Evaluation & Selection

Cross Validation
Model Evaluation & Selection

Cross Validation

● Cross-validation, also referred to as an out-of-sample technique, is an essential element of a data science project.
● It is a resampling procedure used to evaluate machine learning models and assess how the model will perform on an independent test dataset.
Model Evaluation & Selection

Cross Validation

8 different cross-validation techniques


1. Leave p out cross-validation
2. Leave one out cross-validation
3. Holdout cross-validation
4. Repeated random subsampling validation
5. k-fold cross-validation
6. Stratified k-fold cross-validation
7. Time Series cross-validation
8. Nested cross-validation
Model Evaluation & Selection

Cross Validation Hold Out Cross Validation


Model Evaluation & Selection

Cross Validation | Hold Out Cross Validation

● The holdout technique is a non-exhaustive cross-validation method that randomly splits the dataset into train and test data, depending on the data analysis, e.g., a 70:30 split of the data into training and validation data respectively.


Model Evaluation & Selection

Cross Validation | Hold Out Cross Validation

● In the case of holdout cross-validation, the dataset is randomly split into training and validation data.
● Generally, the split gives more data to training than to testing.
● The training data is used to induce the model, and the validation data evaluates the performance of the model.
● The more data used to train the model, the better the model is.
Model Evaluation & Selection

Cross Validation | Hold Out Cross Validation

● For the holdout cross-validation method, a good amount of data is isolated from training.

Pros:
1. Simple, easy to understand, and easy to implement.

Cons:
1. Not suitable for an imbalanced dataset.
2. A lot of data is isolated from training the model.
Model Evaluation & Selection

Cross Validation | Hold Out Cross Validation

● Hold-out approach in sklearn

❏ The hold-out approach can be applied by using the train_test_split module of sklearn.model_selection.

❏ In the below example we split the dataset to create test data with a size of 30% and train data with a size of 70%. The random_state number ensures the split is deterministic in every run.
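A sketch of that split (the iris dataset stands in for the unspecified dataset):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # placeholder dataset

# 70% train / 30% test; random_state makes the split deterministic
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
```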
Model Evaluation & Selection

Cross Validation

Random subsampling Cross Validation

● Repeated random subsampling validation, also referred to as Monte Carlo cross-validation, splits the dataset randomly into training and validation sets.
● Unlike k-fold cross-validation, the dataset is not split into groups or folds; the splits are made at random.
● The number of iterations is not fixed and is decided by the analysis.
● The results are then averaged over the splits.
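As a hedged sketch, scikit-learn's ShuffleSplit implements this repeated random subsampling; the iris data and the logistic regression model are placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = load_iris(return_X_y=True)

# 5 random 70/30 splits; the scores are then averaged over the splits
cv = ShuffleSplit(n_splits=5, test_size=0.3, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores.mean())
```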


Model Evaluation & Selection

Cross Validation

Random subsampling Cross Validation

Repeated random subsampling validation


Model Evaluation & Selection

Cross Validation | Random subsampling Cross Validation

Pros:
1. The proportion of train and validation splits does not depend on the number of iterations or partitions.

Cons:
1. Some samples may never be selected for either training or validation.
2. Not suitable for an imbalanced dataset.
Model Evaluation & Selection

Parameter Tuning and Optimization

● There are many different machine learning models.
● They all differ in some way or the other, but what makes them different is nothing but the input parameters of the model.
● These input parameters are called hyperparameters.
● These hyperparameters define the architecture of the model.
● The best part is that you get to choose these for your model.
Model Evaluation & Selection

Parameter Tuning

● We are usually not aware of the optimal values for hyperparameters that would generate the best model output.
● So what we tell the model is to explore and select the optimal model architecture automatically.
● This selection procedure for hyperparameters is known as hyperparameter tuning.


Model Evaluation & Selection
Parameter Tuning

Number of Iterations in K-Means Clustering


Model Evaluation & Selection
Parameter Tuning

Parameter Tuning for Decision tree


Model Evaluation & Selection
Parameter Tuning

Effect of K on KNN Classification


Model Evaluation & Selection
Parameter Tuning

Initial Support and Threshold for Association Rule Mining
Model Evaluation & Selection

Why we need Parameter Tuning

❏ Here are some of the questions that hyperparameter tuning will answer for us:

● What should be the value for the maximum depth of the Decision Tree?
● How many trees should I select in a Random Forest model?
● Should I use a single-layer or multi-layer Neural Network? If multiple layers, how many layers should there be?
● How many neurons should I include in the Neural Network?
● What should be the minimum sample split value for the Decision Tree?
● What value should I select for the minimum sample leaf for my Decision Tree?
Model Evaluation & Selection

Why we need Parameter Tuning

❏ Here are some of the questions that hyperparameter tuning will answer for us:

● How many iterations should I select for the Neural Network?
● What should be the value of the learning rate for gradient descent?
● Which solver method is best suited for my Neural Network?
● What is the K in K-Nearest Neighbors?
● What should be the values for C and sigma in a Support Vector Machine?

Note: these are a few of the questions that can be answered by hyperparameter tuning.


Model Evaluation & Selection

Approaches to Hyperparameter tuning

● Manual Search
● Random Search
● Grid Search
Model Evaluation & Selection

Approaches to Hyperparameter tuning

● Manual Search

❏ We select some hyperparameters for a model based on our gut feeling and experience.
❏ Based on these parameters, the model is trained, and model performance measures are checked.
❏ This process is repeated with another set of values for the same hyperparameters until optimal accuracy is achieved, or the model has attained optimal error.
❏ This might not be of much help, as human judgment is biased and human experience plays a significant role here.
Model Evaluation & Selection

Approaches to Hyperparameter tuning

● Random Search

❏ Instead of doing multiple rounds of the manual process, it is better to give multiple values for all the hyperparameters in one go and let the search decide which combination suits best.
Model Evaluation & Selection

Approaches to Hyperparameter tuning

● Grid Search

❏ Grid search tries every combination of the supplied hyperparameter values. This method is quite expensive in terms of computation power and time, but it is the most thorough, as there is the least possibility of missing out on an optimal solution for the model.
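A hedged sketch of grid search with scikit-learn's GridSearchCV; the decision tree estimator, the grid values, and the iris data are placeholder choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Every combination in the grid is evaluated with 5-fold cross-validation
param_grid = {'max_depth': [2, 3, 4, 5], 'min_samples_split': [2, 5, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```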
Model Evaluation & Selection

Confusion Matrix
Model Evaluation & Selection

Confusion Matrix
Model Evaluation & Selection

Confusion Matrix
Model Evaluation & Selection

Confusion Matrix

Click Here for Solved Example:
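A minimal sketch of computing a confusion matrix with scikit-learn (the true and predicted labels are hypothetical):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted binary labels
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows = actual class, columns = predicted class: [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))  # [[3 1] [1 3]]
```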


Model Evaluation & Selection

ROC-AUC Curve

● The Receiver Operator Characteristic (ROC) curve is an evaluation metric for binary
classification problems.
● It is a probability curve that plots the TPR against FPR at various threshold values
and essentially separates the ‘signal’ from the ‘noise’.
● The Area Under the Curve (AUC) is the measure of the ability of a classifier to
distinguish between classes and is used as a summary of the ROC curve.
● The higher the AUC, the better the performance of the model at distinguishing
between the positive and negative classes.
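A hedged sketch of plotting the ROC curve and computing the AUC with scikit-learn (the labels and predicted probabilities are hypothetical):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted positive-class probabilities
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))  # area under the ROC curve

plt.plot(fpr, tpr, label='classifier')
plt.plot([0, 1], [0, 1], linestyle='--', label='chance (AUC = 0.5)')
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.legend()
plt.show()
```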
Model Evaluation & Selection

ROC-AUC Curve
Model Evaluation & Selection

ROC-AUC Curve
Model Evaluation & Selection

ROC-AUC Curve
Model Evaluation & Selection

ROC-AUC Curve
Model Evaluation & Selection

ROC-AUC Curve

● When AUC = 1, the classifier is able to perfectly distinguish between all the Positive and Negative class points.
● If, however, the AUC had been 0, the classifier would be predicting all Negatives as Positives, and all Positives as Negatives.
Model Evaluation & Selection

ROC-AUC Curve

● When 0.5 < AUC < 1, there is a high chance that the classifier will be able to distinguish the positive class values from the negative class values.
● This is because the classifier detects more True Positives and True Negatives than False Negatives and False Positives.
Model Evaluation & Selection

ROC-AUC Curve

● When AUC = 0.5, the classifier is not able to distinguish between Positive and Negative class points, meaning the classifier is predicting either a random class or a constant class for all the data points.
Model Evaluation & Selection

ROC-AUC Curve
