Unit III
Subset Selection:
The goal is to find a hyperplane that best separates data points of different classes (e.g.,
separating cats and dogs in a feature space).
Algorithms like SVM (Support Vector Machines) often focus on finding the optimal
hyperplane.
A separating hyperplane is a plane that separates two classes of data points in a multi-dimensional
space. The hyperplane separation theorem states that if two classes of data points are linearly
separable, then there exists a hyperplane that perfectly separates the two classes.
In a binary classification problem, given a linearly separable data set, the optimal separating
hyperplane is the one that correctly classifies all the data while being farthest away from the data
points. In this respect, it is said to be the hyperplane that maximizes the margin, defined as the
distance from the hyperplane to the closest data point.
The idea behind the optimality of this classifier can be illustrated as follows. New test points are
drawn according to the same distribution as the training data. Thus, if the separating hyperplane is far
away from the data points, previously unseen test points will most likely fall far away from the
hyperplane or in the margin. As a consequence, the larger the margin is, the less likely the points are
to fall on the wrong side of the hyperplane.
Finding the optimal separating hyperplane can be formulated as a convex quadratic
programming problem, which can be solved with well-known techniques.
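As an illustration, the standard hard-margin statement of this problem (given here for reference, not taken from a particular source) is: for training points (xi, yi) with labels yi in {-1, +1}, minimise (1/2)||w||^2 over w and b, subject to yi (w·xi + b) >= 1 for every i. Under the margin definition used above (distance from the hyperplane to the closest point), the resulting margin equals 1/||w||, so minimising ||w|| is the same as maximising the margin.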
The optimal separating hyperplane should not be confused with the optimal classifier known as
the Bayes classifier: the Bayes classifier is the best classifier for a given problem, independently of
the available data but unattainable in practice, whereas the optimal separating hyperplane is only the
best linear classifier one can produce given a particular data set.
The optimal separating hyperplane is one of the core ideas behind the support vector machines. In
particular, it gives rise to the so-called support vectors which are the data points lying on the margin
boundary of the hyperplane. These points support the hyperplane in the sense that they contain all the
required information to compute the hyperplane: removing other points does not change the optimal
separating hyperplane. Elaborating on this fact, one can actually add points to the data set without
influencing the hyperplane, as long as these points lie outside of the margin.
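A minimal sketch of this idea using scikit-learn (assuming scikit-learn and NumPy are available; the toy data below are purely illustrative): a linear SVC with a very large C behaves like the hard-margin optimal separating hyperplane, and the fitted model exposes the support vectors directly.

import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes in two dimensions (illustrative data)
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates the hard-margin optimal separating hyperplane
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]   # hyperplane: w . x + b = 0
margin = 1.0 / np.linalg.norm(w)         # distance from hyperplane to the closest point
print("w =", w, ", b =", b, ", margin =", margin)
print("support vectors:")
print(clf.support_vectors_)              # the points lying on the margin boundary

Refitting after removing any point other than the support vectors leaves w and b unchanged, which is exactly the property described above.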
The plot below shows the optimal separating hyperplane and its margin for a data
set in 2 dimensions. The support vectors are the highlighted points lying on the margin boundary.
ANN
Elements of a Neural Network
Input Layer: This layer accepts the input features. It provides information from the outside
world to the network; no computation is performed at this layer, and the nodes simply pass the
information (features) on to the hidden layer.
Hidden Layer: Nodes of this layer are not exposed to the outer world; they are part of the
abstraction provided by any neural network. The hidden layer performs computation on the
features entered through the input layer and transfers the result to the output layer.
Output Layer: This layer brings the information learned by the network up to the outer
world.
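The flow of information through these three layers can be sketched in a few lines of NumPy (a minimal, hypothetical network with 3 inputs, 4 hidden nodes and 2 output nodes; the weights are random and purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])                  # input layer: just passes the features on

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer parameters
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # output layer parameters

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

h = sigmoid(W1 @ x + b1)                        # hidden layer: computes on the features
out = sigmoid(W2 @ h + b2)                      # output layer: exposes the learned result
print(out)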
Classification in Artificial Neural Networks
Artificial Neural Networks (ANNs) are used for classification by learning decision
boundaries (like hyperplanes) that divide classes based on the input data. They can handle
non-linear decision boundaries using hidden layers and activation functions.
Fig. Perceptron
Multilayer Perceptron artificial neural networks add complexity and density,
with the capacity for many hidden layers between the input and output layers. Each
node on a given layer is connected to every node on the next layer.
This means Multilayer Perceptron models are fully connected networks and can
be leveraged for deep learning.
They are used for more complex problems and tasks such as complex
classification or voice recognition. Because of the model's depth and complexity,
processing and model maintenance can be resource- and time-consuming.
2. Multi-Layer Perceptron (MLP):
Added hidden layers to overcome limitations of the perceptron.
Capable of solving complex, non-linear problems.
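A small sketch of such a fully connected network used for non-linear classification (assuming scikit-learn is available; the two-moons data set is chosen only because a single hyperplane cannot separate it):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A data set that no single hyperplane can separate
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer with a non-linear activation gives a non-linear decision boundary
mlp = MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                    max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))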
The bias acts as an adjustable constant in the neuron, allowing the activation function to shift, which
helps the model better fit the data.
Without bias, the output of the neuron is entirely dependent on the weighted sum of
inputs. This constrains the neuron to pass through the origin (0,0) for certain
activation functions.
Bias allows the activation function to shift up or down, enabling the neuron to fit data
that doesn't pass through the origin.
The bias term adjusts the decision boundary by shifting the activation function.
For instance, in a linear model y = w·x + b, the bias b shifts the line up or down,
providing more flexibility in separating data points.
Biases are essential in learning non-linear patterns when combined with activation
functions like ReLU, sigmoid, or tanh.
Without bias, the neural network might struggle to approximate functions where the
outputs are not symmetrical around the origin.
Mathematical Representation
A single neuron computes its output as
y = f( Σi wi·xi + b )
Here:
xi: Inputs
wi: Weights
b: Bias
f: Activation function
The bias b adjusts the input to the activation function, allowing the output y to take on values that fit
the data distribution better.
Without the bias term, the decision boundaries are restricted (they must pass through the origin), making the model less capable of learning complex
relationships.
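A tiny illustration of this point (the numbers are hypothetical, chosen only to show the shift): without a bias, a linear neuron is forced to output 0 when all of its inputs are 0, so its decision boundary must pass through the origin; adding b moves the boundary away from it.

import numpy as np

w = np.array([1.0, 1.0])
x = np.array([0.0, 0.0])

print(w @ x)          # 0.0 – without bias the origin always lies on the boundary
print(w @ x + 2.0)    # 2.0 – the bias b = 2.0 shifts the boundary away from the origin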
Backpropagation is a powerful algorithm in deep learning, primarily used to train artificial neural
networks, particularly feed-forward networks. It works iteratively, minimizing the cost function by
adjusting weights and biases.
In each epoch, the model adapts these parameters, reducing loss by following the error gradient.
Backpropagation often utilizes optimization algorithms like gradient descent or stochastic gradient
descent. The algorithm computes the gradient using the chain rule from calculus, allowing it to
effectively navigate complex layers in the neural network to minimize the cost function.
Backpropagation plays a critical role in how neural networks improve over time. Here's why:
1. Efficient Weight Update: It computes the gradient of the loss function with respect to each
weight using the chain rule, making it possible to update weights efficiently.
2. Scalability: The backpropagation algorithm scales well to networks with multiple layers and
complex architectures, making deep learning feasible.
3. Automated Learning: With backpropagation, the learning process becomes automated, and
the model can adjust itself to optimize its performance.
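A compact sketch of one backpropagation step for a one-hidden-layer network (plain NumPy, sigmoid activations, mean squared error loss; the shapes and learning rate are illustrative assumptions, not a production recipe):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))          # 8 samples, 3 features
y = rng.integers(0, 2, size=(8, 1))  # binary targets

W1, b1 = rng.normal(size=(3, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
lr = 0.1

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Forward pass
h = sigmoid(X @ W1 + b1)
out = sigmoid(h @ W2 + b2)
loss = np.mean((out - y) ** 2)

# Backward pass: chain rule applied layer by layer
d_out = 2 * (out - y) / len(X) * out * (1 - out)   # dLoss / d(pre-activation of output)
dW2, db2 = h.T @ d_out, d_out.sum(axis=0, keepdims=True)
d_h = d_out @ W2.T * h * (1 - h)                   # propagate the error back to the hidden layer
dW1, db1 = X.T @ d_h, d_h.sum(axis=0, keepdims=True)

# Gradient-descent update of weights and biases
W1, b1 = W1 - lr * dW1, b1 - lr * db1
W2, b2 = W2 - lr * dW2, b2 - lr * db2
print("loss before update:", loss)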
Probability Distribution
In statistics, probability distribution functions depict the probability of the different outcomes of a
random variable. They can be divided into 2 types:
Discrete Probability Distribution – In this probability distribution, the random variable takes
a discrete, distinct set of values, each with its respective probability.
For example: a die rolled once can take only 6 values, from 1 to 6, and each of these
outcomes has a probability of ⅙.
Continuous Probability Distribution – In this probability distribution, the random variable can
take an infinite number of values, so the probability of any single exact value is essentially zero.
The probability is instead given for a range of values.
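A short illustration of the two cases using SciPy (assuming scipy is installed; the distributions chosen are just examples):

from scipy import stats

# Discrete: a fair six-sided die, P(X = k) = 1/6 for k = 1..6
die = stats.randint(1, 7)
print(die.pmf(3))            # 0.1666... – probability of one distinct outcome

# Continuous: a standard normal variable; a single exact value has probability 0,
# so probabilities are quoted for ranges of values.
z = stats.norm(0, 1)
print(z.cdf(1) - z.cdf(-1))  # probability that the value falls in the range [-1, 1]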
Parameter Estimation:
Parameter estimation involves finding the values of parameters in a statistical model that best
explain or fit a given dataset. Two common approaches are Maximum Likelihood Estimation
(MLE) and Bayesian Parameter Estimation.
MLE: In MLE, the objective is to maximize the likelihood of observing the data given a specific
probability distribution and its parameters. We estimate the parameters that maximize the
likelihood of observing the data.
Likelihood function
The objective is to maximise the probability of observing the data points under the joint
probability distribution for an assumed form of distribution. This is formally stated as
P(X | theta)
Here, theta is the unknown parameter. This may also be written as
P(X ; theta), i.e. P(x1, x2, x3, ..., xn ; theta)
This is the likelihood function and is commonly denoted by L:
L(X ; theta)
Since the aim is to find the parameters that maximise the likelihood function, we solve
max over theta of L(X ; theta)
Assuming the observations are independent and identically distributed, the joint probability is restated as a product of the probability of each observation
given the distribution parameters:
L(X ; theta) = Π (i = 1 to n) P(xi ; theta)
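As a concrete sketch (assuming NumPy and SciPy are available; a normal distribution is chosen only as an example), the MLE of a Gaussian's mean and standard deviation is the sample mean and the (biased) sample standard deviation, which is also what SciPy's fit routine maximises:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)   # illustrative sample

# Analytic maximisers of L(X ; theta) for a Gaussian
mu_mle = data.mean()
sigma_mle = data.std()          # note: divides by n, the MLE form

# SciPy maximises the same likelihood
mu_fit, sigma_fit = stats.norm.fit(data)
print(mu_mle, sigma_mle)
print(mu_fit, sigma_fit)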
Bayesian Estimation
Bayes' Theorem
Most of you might already be aware of Bayes' theorem; it was proposed by Thomas Bayes.
The theorem puts forth a formula for conditional probability, given as:
P(A | B) = P(B | A) · P(A) / P(B)
Here, we find the probability of event A given that B is true, and P(A) and P(B) are the marginal
(unconditional) probabilities of events A and B.
Or, you may come across websites referring to these terms in pure statistical terminology:
P(A) = Prior probability. This is the probability of the event before we take into
consideration any new piece of information.
P(B) = Evidence. How likely the observation B is overall, averaged over our prior beliefs
about A.
P(B | A) = Likelihood. It tells how likely the observation B is for a fixed A.
P(A | B) = Posterior probability. This is the updated probability of A after the evidence B has
been observed.
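A small numerical illustration of the formula (the numbers below are made up purely for the arithmetic: a rare condition A and a positive test result B):

# Hypothetical numbers, chosen only to illustrate Bayes' theorem
p_A = 0.01            # prior: P(A)
p_B_given_A = 0.90    # likelihood: P(B | A)
p_B_given_notA = 0.05 # P(B | not A)

# Evidence: total probability of observing B
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Posterior: P(A | B) = P(B | A) * P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)    # about 0.154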
Hypothesis Testing in Ensemble Methods
What is Hypothesis Testing?
Any data science project starts with exploring the data. When we perform an analysis on
a sample through exploratory data analysis and inferential statistics, we get information about
the sample. We then want to use this information to draw conclusions about the entire population.
In ensemble methods, hypothesis testing involves using statistical principles to combine predictions from multiple models (the
ensemble) and to evaluate whether the combined predictions significantly improve performance,
or achieve better results, compared to the individual models.
Fig. Types of Errors
Hypothesis testing is done to confirm our observation about the population using sample
data, within the desired error level. Through hypothesis testing, we can determine whether we
have enough statistical evidence to conclude if the hypothesis about the population is true or
not.
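One way to test whether an ensemble significantly improves on an individual model (a sketch assuming scikit-learn and SciPy; the models and data set are illustrative choices): score both on the same cross-validation folds, then apply a paired test to the per-fold scores.

from scipy import stats
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

single = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
ensemble = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=10)

# Paired t-test on the per-fold scores: H0 = "the ensemble is no better"
t_stat, p_value = stats.ttest_rel(ensemble, single)
print(ensemble.mean(), single.mean(), p_value)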
How to perform hypothesis testing in machine learning?
To trust your model and make predictions, we utilize hypothesis testing. When we use
sample data to train our model, we make assumptions about the population. By performing
hypothesis testing, we validate these assumptions for a desired significance level.
Let’s take the case of regression models: When we fit a straight line through a linear
regression model, we get the slope and intercept for the line. Hypothesis testing is used to
confirm if our beta coefficients are significant in a linear regression model. Every time we
run the linear regression model, we test if the line is significant or not by checking if the
coefficient is significant.
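A sketch of this check using statsmodels (assumed to be installed; the data are synthetic and only for illustration): the fitted model reports a p-value for each beta coefficient, which is the hypothesis test on whether that coefficient is significantly different from zero.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)   # true slope 2, intercept 1

X = sm.add_constant(x)          # adds the intercept term
model = sm.OLS(y, X).fit()
print(model.params)             # estimated intercept and slope
print(model.pvalues)            # p-values: are the coefficients significant?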
Key steps to perform hypothesis test are as follows:
1. Formulate a Hypothesis
2. Determine the significance level
3. Determine the type of test
4. Calculate the Test Statistic values and the p values
5. Make Decision
Types of Hypothesis Testing
Hypothesis tests are divided into two categories:
1) Parametric tests – are used when the samples follow an assumed distribution, typically the
normal distribution (a standard normal distribution has a mean of 0 and a variance of 1).
2) Non-Parametric tests – if the samples do not follow a normal distribution, non-parametric
tests are used.
Two types of Hypothesis Testing can be created depending on the number of samples to
be compared:
• One Sample – If there is only one sample that must be compared to a specific value, it is
called a single sample.
• Two Samples – if you're comparing two or more samples. Correlation and tests of the
difference between samples are two kinds of tests that could be used in this situation. In both
cases the samples can be paired or not: dependent samples are known as paired samples, while
independent samples are known as unpaired samples. Natural or matched pairings occur in
paired samples.
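A brief sketch of the one-sample and two-sample cases with SciPy (illustrative data; a parametric t-test is used, which presumes roughly normal samples):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# One sample: compare a single sample against a specific value (here 5.0)
sample = rng.normal(loc=5.3, scale=1.0, size=30)
print(stats.ttest_1samp(sample, popmean=5.0))

# Two independent (unpaired) samples
a = rng.normal(loc=5.0, scale=1.0, size=30)
b = rng.normal(loc=5.8, scale=1.0, size=30)
print(stats.ttest_ind(a, b))

# Two dependent (paired) samples, e.g. the same subjects measured twice
before = rng.normal(loc=5.0, scale=1.0, size=30)
after = before + rng.normal(loc=0.4, scale=0.3, size=30)
print(stats.ttest_rel(before, after))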