Unit 5

Machine Learning Algorithm Analytics: Evaluating Machine Learning algorithms, Model Selection,

Ensemble Methods (Boosting, Bagging, and Random Forests).


Modeling Sequence/Time-Series Data and Deep Learning: Deep generative models, Deep
Boltzmann Machines, Deep auto-encoders, Applications of Deep Networks.

Evaluating Machine Learning algorithms


Evaluating machine learning algorithms is crucial to assess their performance and choose the best
model for a particular task. Here are some key steps and metrics commonly used for evaluating
ML algorithms:

1. Splitting Data: Start by splitting your dataset into training and testing sets. The typical split is 70-80%
for training and 20-30% for testing. You can also use techniques like cross-validation for more
robust evaluations (a code sketch illustrating this, together with the metrics and selection techniques below, follows this list).
2. Metrics for Classification:
Accuracy: Measures the ratio of correct predictions to the total number of predictions. It's suitable for
balanced datasets but can be misleading for imbalanced ones.

Precision and Recall: Precision measures the proportion of true positive predictions out of all positive
predictions, while recall measures the proportion of true positive predictions out of all actual
positives.

F1 Score: The harmonic mean of precision and recall, balancing both metrics.

ROC Curve and AUC: Receiver Operating Characteristic (ROC) curves plot the true positive rate
against the false positive rate, and the Area Under the Curve (AUC) summarizes the curve's
performance.

3. Metrics for Regression:


Mean Absolute Error (MAE): Average of the absolute differences between predicted and actual values.

Mean Squared Error (MSE): Average of the squared differences between predicted and actual values.

Root Mean Squared Error (RMSE): Square root of the MSE, providing a measure in the same units as
the target variable.

R-squared (R2): Measures the proportion of the variance in the dependent variable that is predictable
from the independent variables.

4. Model Selection Techniques:


Grid Search: Exhaustive search over specified hyperparameter values to find the best model.

Random Search: Randomized search over hyperparameter values, often more efficient than grid
search.
Cross-Validation: Technique for assessing how the results of a statistical analysis will generalize to an
independent dataset.

5. Bias-Variance Tradeoff:
Bias: Error due to overly simplistic assumptions in the model; high bias can lead to underfitting.

Variance: Error due to too much complexity in the model; high variance can lead to overfitting.

Model Complexity Analysis: Evaluate how model complexity affects bias and variance to find the right
balance.

6. Ensemble Methods:
Bagging (Bootstrap Aggregating): Training multiple models on different subsets of the data and
averaging their predictions to reduce variance.
Boosting: Sequentially training models, with each subsequent model focusing on the errors made by
the previous ones to reduce bias.
Random Forests, Gradient Boosting Machines (GBM), XGBoost, LightGBM: Popular ensemble
methods with different strategies for combining multiple models.
7. Domain-Specific Metrics: Depending on the application, you might need to define custom evaluation
metrics. For example, in medical diagnostics, sensitivity and specificity are critical.
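
The following is a minimal code sketch of steps 1, 2, and 4 above, assuming Python with scikit-learn and its bundled breast-cancer dataset (illustrative choices, not requirements); the regression metrics in step 3 (MAE, MSE, RMSE, R-squared) are available in the same way from sklearn.metrics.

# Minimal sketch: splitting data, classification metrics, and grid search
# with scikit-learn (library and dataset choice are assumptions, not prescribed by the notes).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

X, y = load_breast_cancer(return_X_y=True)

# Step 1: split into 80% training and 20% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Step 2: fit a simple classifier and compute the classification metrics above.
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("f1       :", f1_score(y_test, y_pred))
print("roc auc  :", roc_auc_score(y_test, y_prob))

# Step 4: cross-validation and an exhaustive grid search over one hyperparameter.
print("5-fold CV accuracy:", cross_val_score(clf, X_train, y_train, cv=5).mean())
grid = GridSearchCV(LogisticRegression(max_iter=5000),
                    param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)
print("best C:", grid.best_params_)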
Model Selection
Model selection is a critical aspect of machine learning that involves choosing the best algorithm
or model for a given task. Here's a structured approach to model selection:
1. Define Objectives and Constraints: Clearly define your project's objectives, constraints (e.g., time,
resources), and the problem you aim to solve (classification, regression, clustering, etc.).
2. Understand the Data: Thoroughly analyze your dataset to understand its characteristics, including
the number of features, data types, distributions, missing values, outliers, and potential biases.
3. Select Candidate Models: Based on your problem type (supervised, unsupervised, or semi-
supervised learning) and data characteristics, choose a set of candidate models that are suitable
for experimentation. For example:
o Classification: Logistic Regression, Decision Trees, Random Forests, Support Vector Machines
(SVM), Neural Networks, etc.
o Regression: Linear Regression, Ridge Regression, Lasso Regression, Decision Trees, Gradient
Boosting Machines (GBM), etc.
o Clustering: K-means, DBSCAN, Hierarchical Clustering, Gaussian Mixture Models (GMM), etc.
4. Preprocess Data: Preprocess your data by handling missing values, encoding categorical variables,
scaling/normalizing features, and performing feature selection/engineering as needed.
5. Split Data: Split your dataset into training, validation, and test sets. The training set is used to
train models, the validation set helps tune hyperparameters, and the test set evaluates the final
model's performance.
6. Choose Evaluation Metrics: Select appropriate evaluation metrics based on your problem type.
For classification, consider accuracy, precision, recall, F1 score, ROC-AUC, etc. For regression, use
metrics like MAE, MSE, RMSE, R-squared, etc.
7. Baseline Models: Start with simple baseline models to establish a performance benchmark. For
example, a simple linear model can serve as a baseline for regression tasks, while a majority class
classifier can be a baseline for classification.
8. Hyperparameter Tuning: Use techniques like grid search, random search, or Bayesian
optimization to tune hyperparameters for each model. Hyperparameters are settings that control
the learning process (e.g., learning rate, regularization strength, tree depth).
9. Evaluate Models: Train each model on the training set, tune hyperparameters using the
validation set, and evaluate performance on the test set using the chosen evaluation metrics.
Compare models based on their test set performance.
10. Select Final Model: Select the best-performing model based on your evaluation criteria,
considering factors like accuracy, interpretability, computational efficiency, and suitability for
deployment.
11. Validate and Refine: Validate the chosen model's performance on new, unseen data (if available).
Refine the model further if necessary, such as by collecting more data, fine-tuning
hyperparameters, or exploring advanced techniques like ensemble learning.
12. Document and Communicate: Document your model selection process, including the rationale
behind choosing certain models, hyperparameter settings, and evaluation results. Communicate
your findings and decisions effectively to stakeholders. (A minimal end-to-end code sketch of this workflow follows.)
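
Below is a minimal end-to-end sketch of steps 4-10 above, assuming scikit-learn and a built-in dataset; the candidate models and hyperparameter grids are illustrative assumptions, not a prescribed choice.

# Minimal sketch of the selection workflow above, using scikit-learn;
# the dataset and candidate models are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Baseline: a majority-class classifier sets the benchmark to beat.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("baseline accuracy:", baseline.score(X_test, y_test))

# Candidate models with small hyperparameter grids, tuned by cross-validation
# (GridSearchCV plays the role of the validation split here).
candidates = {
    "logreg": (Pipeline([("scale", StandardScaler()),
                         ("clf", LogisticRegression(max_iter=5000))]),
               {"clf__C": [0.1, 1, 10]}),
    "rf": (RandomForestClassifier(random_state=0),
           {"n_estimators": [100, 300], "max_depth": [None, 10]}),
}

best_name, best_model, best_cv = None, None, -1.0
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5).fit(X_train, y_train)
    print(name, "best CV accuracy:", search.best_score_)
    if search.best_score_ > best_cv:
        best_name, best_model, best_cv = name, search.best_estimator_, search.best_score_

# Final model is evaluated once on the held-out test set.
print("selected:", best_name,
      "test accuracy:", accuracy_score(y_test, best_model.predict(X_test)))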

List of Popular Machine Learning Algorithms


Here is a list of the top 10 most popular machine learning algorithms.
1. Linear Regression
Linear regression is a simple algorithm used to map the linear relationship between input features
and a continuous target variable. It works by fitting a line to the data and then using the line to
predict new values.
2. Logistic Regression
Logistic regression is an extension of linear regression that is used for classification tasks to estimate
the likelihood that an instance belongs to a specific class.
3. SVM (Support Vector Machine)
SVMs are supervised learning algorithms that can perform both classification and regression tasks. They
find the hyperplane that best separates the classes in feature space.
4. KNN (K-nearest Neighbour)
KNN is a non-parametric technique that can be used for classification as well as regression. It works
by identifying the k most similar data points to a new data point and then predicting the label of
the new data point using the labels of those data points.
5. Decision Tree
Decision trees are a supervised learning technique that can be used for classification as well
as regression. They operate by segmenting the data into smaller and smaller groups until each group
can be classified or predicted with a high degree of accuracy.
6. Random Forest
Random forests are a type of ensemble learning method that employs a set of decision trees to
make predictions by aggregating predictions from individual trees. It improves the precision and
resilience of single decision trees. It can be used for both classification and regression tasks.
7. Naive Bayes
Naive Bayes is a probabilistic classifier based on Bayes' theorem that is used for classification tasks.
It works by assuming that the features of a data point are conditionally independent of each other given the class.
8. PCA (Principal Component Analysis)
PCA is a dimensionality reduction technique used to transform data into a lower-dimensional space
while retaining as much variance as possible. It works by finding the directions in the data that
contain the most variation, and then projecting the data onto those directions.
9. Apriori algorithms
The Apriori algorithm is a traditional data mining technique for mining association rules in transactional
databases or datasets. It is designed to uncover links and patterns between items that regularly co-
occur in transactions. Apriori detects frequent itemsets, which are groups of items that appear
together in transactions with a given minimum support level.
10. K-Means Clustering
K-Means clustering is an unsupervised learning approach used to group data points together. It works
by finding k clusters in the data such that the data points in each cluster are as similar to each other
as possible while remaining as distinct as possible from the data points in other clusters. (A short sketch instantiating several of the algorithms above follows this list.)
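
The short sketch below instantiates several of the algorithms listed above with scikit-learn defaults on the built-in Iris dataset (an illustrative setup, not a tuned comparison); Linear Regression and Apriori are omitted because Iris is a classification dataset and Apriori is not part of scikit-learn.

# Quick illustrative pass over several of the algorithms above using
# scikit-learn defaults (a sketch, not a tuned comparison).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised classifiers (items 2-7 in the list), scored by 5-fold CV accuracy.
classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}
for name, model in classifiers.items():
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.3f}")

# Unsupervised: PCA (item 8) projects to 2 components; K-Means (item 10) finds 3 clusters.
X2 = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X2)
print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])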

Ensemble Methods (Boosting, Bagging, and Random Forests)

As we know, ensemble learning helps improve machine learning results by combining several
models. This approach produces better predictive performance than a single model. The basic
idea is to learn a set of classifiers (experts) and to allow them to vote. Bagging and Boosting are
two types of ensemble learning. Both decrease the variance of a single estimate because they
combine several estimates from different models, so the result may be a model with higher
stability. Let's understand these two terms at a glance.
1. Bagging: A homogeneous ensemble of weak learners, trained independently of one another in
parallel, whose outputs are combined (for example, by averaging) to determine the final model.
2. Boosting: Also a homogeneous ensemble of weak learners, but it works differently from Bagging:
the learners are trained sequentially and adaptively, each one trying to improve on the predictions
of the previous ones.
Let’s look at both of them in detail and understand the Difference between Bagging and Boosting.

Bagging
Bootstrap Aggregating, also known as bagging, is a machine learning ensemble meta-algorithm
designed to improve the stability and accuracy of machine learning algorithms used in statistical
classification and regression. It decreases the variance and helps to avoid overfitting. It is usually
applied to decision tree methods. Bagging is a special case of the model averaging approach.

Suppose we have a set D of d tuples. At each iteration i, a training set D_i of d tuples is sampled
from D with replacement (bootstrap sampling), so D_i can contain repeated tuples. A classifier
model M_i is then learned from each training set D_i. Each classifier M_i returns its class
prediction, and the bagged classifier M* counts the votes and assigns the class with the most
votes to the unknown sample X.
Implementation Steps of Bagging
 Step 1: Multiple subsets are created from the original data set, each with an equal number of
tuples, selecting observations with replacement.
 Step 2: A base model is created on each of these subsets.
 Step 3: Each model is learned in parallel with each training set and independent of each other.
 Step 4: The final predictions are determined by combining the predictions from all the models.
Example of Bagging
The Random Forest model uses Bagging, where decision tree models with higher variance are
present. It makes random feature selection to grow trees. Several random trees make a Random
Forest.
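
A minimal sketch of the steps above using scikit-learn's BaggingClassifier (whose default base model is a decision tree) on a built-in dataset; the comparison with a single tree and a Random Forest is illustrative.

# Minimal bagging sketch: bootstrap subsets of the rows, one decision tree per
# subset (BaggingClassifier's default base model), majority vote at the end.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Steps 1-4: 50 trees, each trained on a bootstrap sample, predictions combined by voting.
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X_train, y_train)
print("single tree :", DecisionTreeClassifier(random_state=0).fit(X_train, y_train).score(X_test, y_test))
print("bagged trees:", bag.score(X_test, y_test))

# Random Forest = bagging + random feature selection at each split.
print("random forest:", RandomForestClassifier(random_state=0).fit(X_train, y_train).score(X_test, y_test))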
Boosting
Boosting is an ensemble modeling technique that attempts to build a strong classifier from a
number of weak classifiers. It does so by building models in series: first, a model is built from the
training data; then a second model is built that tries to correct the errors of the first. This
procedure continues, adding models until either the complete training data set is predicted
correctly or the maximum number of models has been added.
Boosting Algorithms
There are several boosting algorithms. The original ones, proposed by Robert Schapire and Yoav
Freund, were not adaptive and could not take full advantage of the weak learners. Schapire and
Freund then developed AdaBoost, an adaptive boosting algorithm that won the prestigious
Gödel Prize. AdaBoost was the first really successful boosting algorithm developed for
binary classification. AdaBoost is short for Adaptive Boosting; it is a very popular
boosting technique that combines multiple "weak classifiers" into a single "strong classifier".
Algorithm:
1. Initialise the dataset and assign an equal weight to each data point.
2. Provide this as input to the model and identify the wrongly classified data points.
3. Increase the weights of the wrongly classified data points, decrease the weights of the correctly
classified data points, and then normalize the weights of all data points.
4. If the required results have been obtained, go to step 5; otherwise go to step 2.
5. End
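
A minimal sketch of AdaBoost with scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 decision tree (a decision stump); the dataset and hyperparameters are illustrative assumptions.

# Minimal AdaBoost sketch: decision stumps trained sequentially, each round
# reweighting the misclassified points as described in the steps above.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# The default base learner is a depth-1 decision tree (a "weak classifier");
# n_estimators bounds the "maximum number of models" in step 4.
boost = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
boost.fit(X_train, y_train)
print("AdaBoost test accuracy:", boost.score(X_test, y_test))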

Similarities Between Bagging and Boosting


Bagging and Boosting are both commonly used methods and share the fundamental similarity of
being ensemble methods. Here we will explain the similarities between them.
1. Both are ensemble methods that obtain N learners from 1 learner.
2. Both generate several training data sets by random sampling.
3. Both make the final decision by averaging the N learners (or by taking the majority of them, i.e.,
majority voting).
4. Both are good at reducing variance and provide higher stability.

Deep Generative Models


Deep generative models are a class of machine learning models that learn to generate new data
samples that are similar to the training data they were trained on. These models are particularly
powerful for tasks such as generating realistic images, text, audio, and other types of data. Here
are some key types of deep generative models:
1. Variational Autoencoders (VAEs):
o VAEs are a type of autoencoder where the encoder learns a probabilistic distribution (usually
Gaussian) of latent variables from the input data.
o The decoder then generates new samples by sampling from this latent space and reconstructing
them into the original data domain.
o VAEs are trained using variational inference techniques to optimize the reconstruction loss and a
regularization term to encourage a smooth latent space.
2. Generative Adversarial Networks (GANs):
o GANs consist of two neural networks: a generator and a discriminator, trained simultaneously in
a competitive manner.
o The generator learns to generate realistic samples to fool the discriminator, while the
discriminator learns to distinguish between real and generated samples.
o GANs have been highly successful in generating high-quality images, videos, and other complex
data types (a minimal sketch appears after this list).

3. Autoregressive Models:
o Autoregressive models, such as PixelCNN and PixelRNN, generate data sequentially by modeling
the conditional probability of each element given previous elements.
o These models are often used for generating images and text, where the order of elements (pixels,
words) matters.

4. Normalizing Flows:
o Normalizing flow models transform a simple base distribution (e.g., Gaussian) into a more
complex distribution through a series of invertible transformations.
o By chaining multiple invertible transformations, these models can capture complex data
distributions and generate new samples.

5. Flow-Based Models:
o Flow-based models, such as Real NVP (real-valued non-volume preserving) and Glow, use invertible
transformations to map data between a simple base distribution and the target distribution.
o These models can generate high-quality samples and are particularly efficient during both training
and generation.

6. Deep Belief Networks (DBNs):


o DBNs are a type of generative model that combines multiple layers of restricted Boltzmann
machines (RBMs) or autoencoders.
o They can learn hierarchical representations of data and generate new samples by sampling from
the learned distributions.
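
As a concrete illustration of item 2, here is a minimal GAN sketch assuming PyTorch and a toy two-dimensional data distribution; real image GANs use convolutional generators and discriminators and more careful training schemes.

# Minimal GAN sketch (assumed PyTorch, toy 2-D data): a generator maps noise to
# samples, a discriminator scores real vs. generated, and the two are trained adversarially.
import torch
import torch.nn as nn

latent_dim, data_dim = 8, 2

generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, data_dim) * 0.5 + 2.0      # samples from the "true" distribution
    noise = torch.randn(64, latent_dim)
    fake = generator(noise)

    # Discriminator: label real samples 1, generated samples 0.
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator output 1 for generated samples.
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print("final losses  D: %.3f  G: %.3f" % (d_loss.item(), g_loss.item()))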

What are Deep Boltzmann Machines (DBMs)?

Deep Boltzmann Machines (DBMs) are a kind of artificial neural network that belongs to the family
of generative models. They are designed to discover intricate structures within large datasets by
learning to recreate the input data they’re given. Think of a DBM as an artist who, after studying
a collection of paintings, learns to create new artworks that could belong to the same collection.
Similarly, a DBM analyzes data and learns how to produce new examples that are similar to the
original data.

DBMs consist of multiple layers of hidden units, which are like the neurons in our brains. These
units work together to capture the probabilities of various patterns within the data. Unlike some
other neural networks, the units in a DBM are connected across adjacent layers but not within the
same layer, which allows them to build a web of relationships between different features in the data.
This structure helps DBMs to be good at understanding complex data like images, text, or sound.
The ‘deep’ in the Deep Boltzmann Machine refers to the multiple layers in the network, which
allow it to build a deep understanding of the data. Each layer captures increasingly abstract
representations of the data. The first layer might detect edges in an image, the second layer might
detect shapes, and the third layer might detect whole objects like cars or trees.

Types of Boltzmann Machines

Deep Learning models are broadly classified into supervised and unsupervised models.

Supervised DL models:

 Artificial Neural Networks (ANNs)


 Recurrent Neural Networks (RNNs)
 Convolutional Neural Networks (CNNs)

Unsupervised DL models:

 Self Organizing Maps (SOMs)


 Boltzmann Machines
 Autoencoders

Boltzmann Machines are an unsupervised DL model in which every node is connected to every other
node. That is, unlike ANNs, CNNs, RNNs and SOMs, Boltzmann Machines are undirected
(the connections are bidirectional). A Boltzmann Machine is not a deterministic DL model but a
stochastic, or generative, DL model: it is a representation of a certain system. There are two
types of nodes in a Boltzmann Machine — visible nodes, which we can and do measure, and
hidden nodes, which we cannot or do not measure. Although the node types are different, the
Boltzmann machine treats them as the same, and everything works as one single system. The
training data is fed into the Boltzmann Machine and the weights of the system are adjusted
accordingly. Boltzmann machines can help us detect abnormalities by learning how the system
works under normal conditions.
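
scikit-learn does not ship a full Deep Boltzmann Machine, but it does provide the single-layer building block, the restricted Boltzmann machine (BernoulliRBM); the hedged sketch below uses it to learn hidden features of binarized digit images, the kind of layer that DBNs and DBMs stack.

# Hedged sketch: a single restricted Boltzmann machine (BernoulliRBM) learning
# hidden features of binarized digit images; deep models stack such layers.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

X, _ = load_digits(return_X_y=True)
X = (X / 16.0 > 0.5).astype(np.float64)   # binarize pixels to {0, 1} visible units

rbm = BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(X)

# Hidden-unit activation probabilities: the learned representation of each image.
hidden = rbm.transform(X)
print("hidden representation shape:", hidden.shape)   # (1797, 64)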
Deep Autoencoders

Deep autoencoders are a type of artificial neural network used for unsupervised learning and data
compression. They consist of an encoder and a decoder, which work together to learn a
compressed representation (latent space) of the input data. Here's a breakdown of deep
autoencoders:

1. Architecture:
o Encoder: The encoder network takes the input data and maps it to a lower-
dimensional latent space representation. It typically consists of multiple hidden
layers, allowing for a hierarchical and nonlinear transformation of the input data.
o Decoder: The decoder network takes the latent representation produced by the
encoder and reconstructs the original input data. Like the encoder, it also has multiple
hidden layers.
2. Training:
o Objective: The goal of training a deep autoencoder is to minimize the reconstruction
error between the input data and the output from the decoder.
o Loss Function: The loss function commonly used is the mean squared error (MSE)
between the input and output data. Other reconstruction loss functions, such as
binary cross-entropy for binary data, can also be used.
o Optimization: Gradient-based optimization techniques like stochastic gradient
descent (SGD) or its variants (e.g., Adam, RMSProp) are used to minimize the loss
function during training.
3. Benefits:
o Dimensionality Reduction: Deep autoencoders can learn meaningful low-
dimensional representations of high-dimensional data, enabling dimensionality
reduction and feature learning.
o Feature Extraction: The hidden layers of the encoder can capture important features
and patterns in the data, which can be useful for downstream tasks such as
classification or clustering.
o Unsupervised Learning: Autoencoders are trained in an unsupervised manner,
meaning they do not require labeled data for training.
4. Variants:
o Sparse Autoencoders: Introduce sparsity constraints in the hidden layers to encourage
the model to learn sparse representations, useful for feature selection and noise
reduction.
o Denoising Autoencoders: Trained to reconstruct clean data from noisy input, helping
in learning robust representations and reducing overfitting.
o Variational Autoencoders (VAEs): Incorporate probabilistic modeling to learn a latent
space that follows a specific distribution, enabling generation of new data samples.
5. Applications:
o Image Denoising: Deep autoencoders can be used to remove noise from images by
learning to reconstruct clean images from noisy ones.
o Anomaly Detection: They can detect anomalies or outliers in data by comparing the
reconstruction error of input samples.
o Feature Learning: Autoencoders are used for learning meaningful representations in
various domains such as computer vision, natural language processing, and signal
processing.
6. Challenges:
o Overfitting: Deep autoencoders can be prone to overfitting, especially when the
model capacity is high relative to the size of the training data.
o Hyperparameter Tuning: Choosing the right architecture, number of layers, hidden
units, and regularization techniques requires experimentation and tuning. (A minimal
autoencoder sketch follows this list.)
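
A minimal deep autoencoder sketch, assuming PyTorch (Keras would look similar): a multi-layer encoder maps 64-pixel digit images to a 2-D latent space, a mirrored decoder reconstructs them, and training minimizes the MSE reconstruction error described above.

# Minimal deep autoencoder sketch (assumed PyTorch): encoder -> 2-D latent space
# -> decoder, trained to minimize MSE reconstruction error on digit images.
import torch
import torch.nn as nn
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)
X = torch.tensor(X / 16.0, dtype=torch.float32)       # 64-pixel images scaled to [0, 1]

encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
decoder = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 64), nn.Sigmoid())

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    recon = decoder(encoder(X))                       # reconstruct the inputs
    loss = loss_fn(recon, X)
    opt.zero_grad(); loss.backward(); opt.step()

print("final reconstruction MSE:", loss.item())
latent = encoder(X).detach()                          # 2-D codes usable for visualization/clustering
print("latent shape:", latent.shape)                  # (1797, 2)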

Applications of Deep Networks.

Deep neural networks (DNNs) have found applications across various domains due to their
ability to learn complex patterns and representations from data. Here are some notable
applications of deep networks:

1. Computer Vision:
o Image Classification: DNNs like Convolutional Neural Networks (CNNs) are widely
used for image classification tasks, such as identifying objects in images (a
minimal CNN sketch appears after this list).
o Object Detection and Localization: DNNs can detect and localize objects within
images, enabling applications like autonomous driving, surveillance, and
augmented reality.
o Image Segmentation: DNNs segment images into different regions or objects, useful
in medical imaging for identifying tumors or in satellite imagery for land cover
classification.
2. Natural Language Processing (NLP):
o Text Classification: DNNs are used for sentiment analysis, spam detection, and
topic classification in textual data.
o Machine Translation: Sequence-to-sequence models based on recurrent neural
networks (RNNs) or transformers are used for machine translation tasks.
o Named Entity Recognition (NER): DNNs can identify and classify named entities
like person names, locations, and organizations in text.
3. Speech Recognition:
o Automatic Speech Recognition (ASR): DNNs, especially deep recurrent networks
and transformer-based models, have significantly improved speech recognition
accuracy in applications like virtual assistants and voice-controlled devices.
o Speaker Identification: DNNs can identify speakers based on their voice
characteristics, used in security systems and personalized services.
4. Healthcare:
o Medical Imaging Analysis: DNNs analyze medical images (MRI, CT scans, X-rays)
for diagnosis, disease detection, and treatment planning.
o Drug Discovery: DNNs assist in drug discovery processes by predicting molecular
properties, identifying potential drug candidates, and analyzing biological data.
5. Finance:
o Algorithmic Trading: DNNs are used for analyzing financial data, predicting stock
prices, and developing algorithmic trading strategies.
o Fraud Detection: DNNs help detect fraudulent activities in banking, insurance, and
online transactions by learning patterns of fraudulent behavior.
6. Autonomous Systems:
o Autonomous Vehicles: DNNs are crucial for perception tasks in autonomous
vehicles, such as object detection, lane detection, and pedestrian tracking.
o Robotics: DNNs enable robots to perceive and interact with the environment,
including tasks like object manipulation and navigation.
7. Recommendation Systems:
o Content Recommendation: DNN-based recommendation systems analyze user
behavior and preferences to provide personalized recommendations in platforms
like e-commerce, streaming services, and social media.
8. Game Playing:
o Deep Reinforcement Learning: DNNs combined with reinforcement learning
techniques have achieved remarkable success in playing complex games like chess,
Go, and video games.
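
As a small illustration of the computer-vision use case in item 1, here is a sketch of a compact CNN classifier assuming PyTorch; the architecture and the 28x28 grayscale input size are assumptions for illustration only.

# Illustrative sketch of a small CNN image classifier (assumed PyTorch,
# assumed 28x28 grayscale inputs such as handwritten digits).
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)               # (N, 32, 7, 7) for 28x28 inputs
        return self.classifier(x.flatten(1))

model = SmallCNN()
dummy_batch = torch.randn(8, 1, 28, 28)    # stand-in for a batch of images
logits = model(dummy_batch)
print("logits shape:", logits.shape)       # (8, 10), one score per class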
