Unit 2


Bayesian Belief Networks (BBNs):

Bayesian belief networks, also known as Bayesian networks or probabilistic graphical models,
are powerful tools for representing and reasoning under uncertainty. They provide a structured
way to model complex relationships among a set of random variables using a directed acyclic
graph (DAG).
Key Components:
Nodes: Each node represents a random variable in the domain of interest. Nodes can represent
observable variables, latent variables, or parameters.
Edges: Directed edges between nodes indicate probabilistic dependencies or causal
relationships. An edge from node A to node B suggests that the value of node B depends
probabilistically on the value of node A.
Conditional Probability Tables (CPTs): Each node has associated conditional probability
tables that specify the probability distribution of the node given its parents in the graph. These
tables encode the probabilistic dependencies among variables.
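
To make these components concrete, here is a minimal sketch in Python of a tiny two-node network, using the familiar (hypothetical) Rain -> WetGrass example; the CPT values are made up for illustration, and inference is done by brute-force enumeration:

# Prior: P(Rain)
P_rain = {True: 0.2, False: 0.8}

# CPT: P(WetGrass | Rain), indexed by the parent's value
P_wet_given_rain = {True: {True: 0.9, False: 0.1},
                    False: {True: 0.1, False: 0.9}}

def joint(rain, wet):
    # Chain rule on the DAG: P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain)
    return P_rain[rain] * P_wet_given_rain[rain][wet]

# Posterior by Bayes' rule: P(Rain = True | WetGrass = True)
evidence = sum(joint(r, True) for r in (True, False))  # P(WetGrass = True) = 0.26
posterior = joint(True, True) / evidence               # 0.18 / 0.26 ~ 0.692
print(posterior)

For larger networks, dedicated libraries provide exact and approximate inference, but the idea is the same: the DAG plus the CPTs fully determine the joint distribution.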
Benefits:
Probabilistic Reasoning: BBNs enable probabilistic reasoning, allowing us to compute
probabilities of events given observed evidence and prior knowledge.
Modularity and Interpretability: BBNs provide a modular and interpretable representation
of complex systems, making it easier to understand and analyze causal relationships.
Decision Support: BBNs can be used for decision support, enabling optimal decision-making
under uncertainty by considering the probabilities of different outcomes.
Applications:
Medical Diagnosis: BBNs are used for medical diagnosis by modeling the probabilistic
relationships between symptoms and diseases.
Risk Assessment: BBNs are employed in risk assessment and decision analysis for evaluating
the likelihood and consequences of various events.
Fault Diagnosis: BBNs are applied in fault diagnosis and troubleshooting systems to identify
the root causes of failures in complex systems.

Expectation-Maximization (EM) Algorithm:


The expectation-maximization algorithm is an iterative optimization technique used to estimate
the parameters of probabilistic models when some of the data is missing or incomplete. It is
particularly useful in the context of unsupervised learning and probabilistic modeling.
Key Steps:
Initialization: Start with initial guesses for the parameters of the model.
Expectation (E) Step: In the E-step, compute the expected value of the latent variables given
the observed data and current parameter estimates. This step involves computing the posterior
distribution of the latent variables using Bayes' rule.
Maximization (M) Step: In the M-step, update the parameter estimates to maximize the
likelihood of the observed data. This step involves maximizing the expected complete-data log-
likelihood computed in the E-step.
Iterate: Repeat the E and M steps until convergence, where the parameter estimates no longer
change significantly.
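
For concreteness, here is what the two steps look like for a Gaussian mixture model with K components, a standard instantiation of EM (the mixture-model setting is an assumption here; the text returns to it below):

E-step (responsibilities of each component for each point, via Bayes' rule):

\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}

M-step (closed-form updates that maximize the expected complete-data log-likelihood):

N_k = \sum_{i=1}^{N} \gamma_{ik}, \quad \pi_k = \frac{N_k}{N}, \quad \mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} x_i, \quad \Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} (x_i - \mu_k)(x_i - \mu_k)^T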

What is Convergence in the EM algorithm?


Convergence describes the point at which further iterations no longer change the result in any
meaningful way. Intuitively, if two successive estimates of a probability (or of the model
parameters) differ by only a negligible amount, the estimates are said to have converged. In
other words, whenever the values of the variables stabilize from one iteration to the next, the
algorithm has reached convergence.
Steps in EM Algorithm
The EM algorithm is completed mainly in 4 steps, which include the Initialization Step,
Expectation Step, Maximization Step, and Convergence Step. These steps are explained as
follows:
1st Step: The very first step is to initialize the parameter values. The system is provided
with incomplete observed data, with the assumption that the data are obtained from a specific model.
2nd Step: This step is known as the Expectation or E-step. It uses the observed data and the
current parameter estimates to estimate (or "guess") the values of the missing or incomplete
data; the E-step primarily updates the estimates of the latent variables.
3rd Step: This step is known as the Maximization or M-step. It uses the completed data obtained
from the 2nd step to update the parameter values; the M-step primarily updates the hypothesis
(the model parameters).
4th Step: The last step is to check whether the values of the latent variables are converging.
If they are, stop the process; otherwise, repeat from the 2nd step until convergence occurs.
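
The following is a minimal sketch of these four steps in Python for a one-dimensional, two-component Gaussian mixture; the synthetic data and all variable names are illustrative:

import numpy as np
from scipy.stats import norm

# Synthetic incomplete data: we observe x but not which component generated it
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

# 1st step: initialize weights, means, and standard deviations
pi, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(200):
    # 2nd step (E-step): responsibilities = posterior P(component | x)
    dens = pi * np.stack([norm.pdf(x, m, s) for m, s in zip(mu, sigma)], axis=1)
    gamma = dens / dens.sum(axis=1, keepdims=True)

    # 3rd step (M-step): re-estimate parameters from the responsibilities
    Nk = gamma.sum(axis=0)
    new_mu = (gamma * x[:, None]).sum(axis=0) / Nk
    new_sigma = np.sqrt((gamma * (x[:, None] - new_mu) ** 2).sum(axis=0) / Nk)
    new_pi = Nk / len(x)

    # 4th step: stop once the estimates no longer change significantly
    converged = np.allclose(mu, new_mu, atol=1e-6)
    pi, mu, sigma = new_pi, new_mu, new_sigma
    if converged:
        break

print(pi, mu, sigma)  # weights, means, and spreads recovered from the data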

Benefits:
Handling Missing Data: EM algorithm can handle missing or incomplete data effectively by
iteratively imputing the missing values and updating the parameter estimates.
Unsupervised Learning: EM algorithm is widely used for unsupervised learning tasks such as
clustering and density estimation, where the underlying data distribution is unknown.
Mixture Models: EM algorithm is commonly used to estimate the parameters of mixture
models, such as Gaussian mixture models (GMMs), which are widely used for modeling
complex data distributions.

Applications:
Clustering: EM algorithm is used for clustering data into groups or clusters, where the data
distribution of each cluster is modeled by a separate component in a mixture model.
Density Estimation: EM algorithm is employed for estimating the probability density function
of data when the underlying distribution is unknown or complex.
Image Segmentation: EM algorithm is applied in image segmentation tasks to partition images
into regions with similar characteristics based on intensity or color distributions.
In summary, Bayesian belief networks provide a structured framework for representing and
reasoning under uncertainty, while the expectation-maximization algorithm is a powerful tool
for estimating the parameters of probabilistic models, especially in the presence of missing or
incomplete data.

Disadvantages of EM algorithm
• The convergence of the EM algorithm is very slow.
• It is guaranteed to converge only to a local optimum, not necessarily the global one.
• It requires computing both forward and backward probabilities, whereas direct numerical
optimization requires only forward probabilities.
Introduction to Support Vector Machines (SVM)
Support Vector Machines (SVMs) are a type of supervised learning algorithm that can be used
for classification or regression tasks. The main idea behind SVMs is to find a hyperplane that
maximally separates the different classes in the training data. This is done by finding the
hyperplane that has the largest margin, which is defined as the distance between the hyperplane
and the closest data points from each class. Once the hyperplane is determined, new data can
be classified by determining on which side of the hyperplane it falls. SVMs are particularly
useful when the data has many features, and/or when there is a clear margin of separation in
the data.
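
As a minimal sketch, here is how a linear SVM might be fit with scikit-learn (the synthetic blob data is illustrative):

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two linearly separable clusters of points
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# Fit a maximum-margin linear SVM
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# New data is classified by which side of the hyperplane it falls on
print(clf.predict([[0.0, 2.0]]))

# Only the support vectors (the closest points from each class) define the margin
print(clf.support_vectors_)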

Support Vector Machines (SVMs) are broadly classified into two types: Simple or Linear SVM
and Kernel or Non-linear SVM.
Simple or Linear SVM:
• A Linear SVM is used for classifying linearly separable data, where the dataset can be
divided into distinct categories with a straight line.
• It is suitable for datasets with linearly separable classes, where a single straight line can
effectively separate the classes.
• The classifier used for such data is termed a Linear SVM classifier.
• Simple SVMs are commonly used for classification and regression analysis problems.
Kernel or Non-linear SVM:
• Non-linear data, which cannot be separated into distinct categories with a straight line,
is classified using Kernel or Non-linear SVM.
• In this type, the classifier is referred to as a non-linear classifier.
• To handle non-linear data, features are mapped into higher dimensions using kernel
functions, such as polynomial, radial basis function (RBF), or sigmoid kernels.
• By transforming the data into higher dimensions, a hyperplane is constructed to separate
the classes or categories effectively.
• Kernel SVMs are especially useful for optimization problems with multiple
variables and for complex data relationships that cannot be captured in lower
dimensions, as the sketch below illustrates.
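
A minimal sketch contrasting the two types, using scikit-learn and a synthetic concentric-circles dataset (which no straight line can separate):

from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM cannot separate the rings; accuracy stays near chance
print(SVC(kernel="linear").fit(X, y).score(X, y))

# An RBF-kernel SVM maps the data to higher dimensions implicitly and
# separates the rings almost perfectly
print(SVC(kernel="rbf", gamma="scale").fit(X, y).score(X, y))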
What is Kernel Method?
Kernel methods are a set of techniques used in machine learning to address classification,
regression, and other prediction problems. They are built around the idea of kernels:
functions that gauge how similar two data points are to one another in a high-dimensional
feature space. The fundamental idea of kernel methods is to convert the input data into a
high-dimensional feature space, which makes it simpler to distinguish between classes or
generate predictions. Rather than computing the coordinates of the feature space explicitly,
kernel methods employ a kernel function to map the data into the feature space implicitly.
Here are some most commonly used kernel functions in SVMs:

Linear Kernel
A linear kernel is a type of kernel function used in machine learning, including in SVMs
(Support Vector Machines). It is the simplest and most commonly used kernel function, and it
defines the dot product between the input vectors in the original feature space.
The linear kernel can be defined as:
K(x, y) = x . y
Where x and y are the input feature vectors. The dot product of the input vectors is a measure
of their similarity or distance in the original feature space.
Polynomial Kernel
The polynomial kernel, utilized in machine learning, including SVMs, is a nonlinear function
that transforms input data into a higher-dimensional space using polynomial functions.

Defined as:
K(x, y) = (x . y + c)^d
Here, x and y are input feature vectors, c is a constant, and d is the polynomial degree. The
decision boundary of an SVM with a polynomial kernel captures intricate correlations between
input features as a nonlinear hyperplane. The degree of nonlinearity is determined by the
polynomial degree.
Advantages:
Detects both linear and nonlinear correlations.
Captures complex data relationships.
Challenges:
Difficulty in selecting the appropriate polynomial degree.
Higher degrees may lead to overfitting, while lower degrees may not adequately represent data
relationships.
Gaussian (RBF) Kernel
The Gaussian kernel, also known as the radial basis function (RBF) kernel, is a popular
nonlinear kernel function used in SVMs. It maps input data into a higher-dimensional feature
space using a Gaussian function:
K(x, y) = exp(-gamma ||x - y||^2)
Here, x and y are input feature vectors, gamma controls the width of the Gaussian function,
and ||x - y||^2 is the squared Euclidean distance between input vectors.
In SVMs, the Gaussian kernel yields a nonlinear decision boundary, capturing complex
relationships between input features. It doesn't require explicit feature engineering but selecting
the gamma parameter is crucial, as smaller values may lead to underfitting and larger values to
overfitting.
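
The three kernel functions above can be written directly from their formulas; here is a minimal NumPy sketch (x and y are feature vectors, and the parameter values are illustrative):

import numpy as np

def linear_kernel(x, y):
    return np.dot(x, y)                            # K(x, y) = x . y

def polynomial_kernel(x, y, c=1.0, d=3):
    return (np.dot(x, y) + c) ** d                 # K(x, y) = (x . y + c)^d

def rbf_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))   # K(x, y) = exp(-gamma ||x - y||^2)

x, y = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear_kernel(x, y), polynomial_kernel(x, y), rbf_kernel(x, y))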
What is a Hyperplane?
The hyperplane, often mentioned in discussions about SVMs, serves as a boundary that
separates different classes or categories in a dataset. It's a function used to distinguish between
features based on their characteristics. In lower dimensions like 2D, this function appears as a
line, while in 3D, it manifests as a plane. In higher dimensions, it's referred to as a hyperplane.
Mathematically, the equation of a hyperplane in an m-dimensional space can be represented
as:
W^T X + b = 0

Where:
- W is the weight vector (W_1, W_2, W_3, ..., W_m)
- b is the bias term (equivalent to W_0)
- X represents the variables or features.
This equation essentially defines the decision surface in SVM, which helps classify data points
into different categories.
Understanding the concept of the hyperplane is crucial as it forms the basis of how SVMs work
to classify data points effectively across multiple dimensions.
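
A minimal sketch of using the hyperplane as a decision surface (W and b are made-up values for illustration):

import numpy as np

W = np.array([2.0, -1.0])   # weight vector
b = -1.0                    # bias term

def classify(X):
    # Sign of W^T X + b: +1 on one side of the hyperplane, -1 on the other
    return np.sign(X @ W + b)

points = np.array([[2.0, 1.0], [0.0, 3.0]])
print(classify(points))  # [ 1. -1.]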
Properties of SVM, and Issues in SVM
Properties of Support Vector Machines (SVM):

1. Effective in High-Dimensional Spaces: SVMs perform well even in high-dimensional
spaces, making them suitable for tasks with a large number of features.

2. Memory Efficient: SVMs only use a subset of training points (support vectors) to define the
decision boundary, making them memory efficient for large datasets.

3. Versatility: SVMs can handle various types of data, including linear and nonlinear data,
through the use of different kernel functions.

4. Margin Maximization: SVMs seek to maximize the margin between different classes,
leading to better generalization and improved resistance to noise.
5. Flexibility in Kernel Selection: SVMs offer flexibility in kernel selection, allowing users to
choose different kernel functions based on the nature of the data.

6. Interpretability: SVMs provide interpretable results, making it easier to understand and
analyze the decision boundaries.

Issues in Support Vector Machines (SVM):

1. Sensitivity to Parameters: SVM performance can be sensitive to the choice of parameters
such as the regularization parameter (C) and kernel parameters (e.g., gamma in the RBF
kernel), requiring careful tuning; a tuning sketch follows this list.

2. Scalability: SVMs may have scalability issues with very large datasets, as the training time
and memory requirements can increase significantly with dataset size.

3. Difficulty in Interpretation: While SVMs provide interpretable results, the decision
boundaries in high-dimensional spaces can be difficult to visualize and interpret.

4. Handling Imbalanced Data: SVMs may struggle with imbalanced datasets, where one class
is significantly larger than the other, leading to biased decision boundaries.

5. Kernel Selection: Choosing the appropriate kernel function and its parameters can be
challenging, and poor choices can lead to suboptimal performance.

6. Binary Classification: SVMs inherently support binary classification and may require
additional techniques for multi-class classification tasks.
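
To address issue 1 above, C and gamma are commonly tuned by cross-validated grid search; a minimal sketch with scikit-learn (the dataset and grid values are illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# Try every (C, gamma) pair with 5-fold cross-validation
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)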

For more details:
https://towardsdatascience.com/support-vector-machines-svm-c9ef22815589
