
https://www.geeksforgeeks.org/feature-selection-techniques-in-machine-learning/

Feature Extraction

Feature extraction techniques in machine learning involve transforming raw data into a set of features that can be effectively used by a learning algorithm. These techniques aim to capture the most important information from the data, reducing dimensionality while preserving relevant information to improve model performance. Here are some common feature extraction techniques:

1. Principal Component Analysis (PCA)


Objective: Reduce the dimensionality of data by transforming it into a
new set of variables (principal components) that are orthogonal and
account for most of the variance in the data.

Process:

- Standardize the data.
- Compute the covariance matrix.
- Calculate the eigenvalues and eigenvectors of the covariance matrix.
- Select the top k eigenvectors corresponding to the largest eigenvalues.
- Transform the data using these top k eigenvectors.

Applications: Image compression, noise reduction, visualization.
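
The steps above can be sketched with scikit-learn, which is assumed to be available here; the data matrix X and the choice of k = 2 components are purely illustrative, not part of the original notes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical data: 100 samples with 5 features
X = np.random.rand(100, 5)

# Standardize the data, then project onto the top k = 2 principal components
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_std)

# Fraction of the total variance captured by each retained component
print(pca.explained_variance_ratio_)
```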

2. Linear Discriminant Analysis (LDA)


Linear Discriminant Analysis (LDA) is a supervised learning technique commonly
used for classification and dimensionality reduction. The main idea behind LDA is
to find a linear combination of features that best separates two or more classes. It
achieves this by maximizing the ratio of between-class variance to within-class
variance in any particular dataset, thereby ensuring maximum separability.

Objective: Find a linear combination of features that best separates two or more classes.

Process:

- Compute the within-class and between-class scatter matrices.
- Calculate the eigenvalues and eigenvectors for the scatter matrices.
- Select the top eigenvectors to form a new feature space.
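
As a rough illustration of these steps, here is a minimal sketch assuming scikit-learn; the Iris dataset and the choice of two components are only examples.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy labelled data: LDA is supervised, so class labels y are required
X, y = load_iris(return_X_y=True)

# At most (number of classes - 1) discriminant components can be kept
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X_lda.shape)  # (150, 2): data projected onto the discriminant axes
```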

3. Kernel PCA
Kernel Principal Component Analysis (Kernel PCA) is an extension of
Principal Component Analysis (PCA) that enables the technique to handle
non-linear data. While traditional PCA identifies the principal components
(the directions of maximum variance) in the original feature space, Kernel
PCA first maps the data into a higher-dimensional space using a non-linear
function and then performs PCA in this new space. This allows Kernel PCA
to capture the complex structures and relationships within the data that
linear PCA might miss.

Key Concepts

Objective: Extend PCA to non-linear data by using kernel functions to project data into a higher-dimensional space where it becomes linearly separable.

Process:

- Choose a kernel function (e.g., polynomial, Gaussian).
- Compute the kernel matrix.
- Perform eigenvalue decomposition on the kernel matrix.
- Select the top eigenvectors to transform the data.

Applications: Non-linear data problems, image processing.
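
A minimal sketch of this process, assuming scikit-learn; the concentric-circles toy data and the gamma value are illustrative choices, not prescriptions.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Toy non-linear data: two concentric circles that linear PCA cannot separate
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Gaussian (RBF) kernel; gamma = 10 is a hypothetical setting for this toy data
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)  # the classes become (close to) linearly separable here
```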

4. Independent Component Analysis (ICA)


Independent Component Analysis (ICA) is a computational technique for
separating a multivariate signal into additive, independent non-Gaussian
components. It is primarily used for blind source separation, where the goal is to
decompose a mixed signal into its original sources. ICA assumes that the
components are statistically independent and that the observed data are linear
mixtures of these independent components.

Objective: Decompose multivariate signals into additive, independent components.

Process:

- Center and whiten the data.
- Apply an algorithm (e.g., FastICA) to find independent components.

Applications: Blind source separation, signal processing, image processing.
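
The blind-source-separation idea can be sketched as follows, assuming scikit-learn and NumPy; the two synthetic source signals and the mixing matrix are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent, non-Gaussian source signals
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]

# Observed data = linear mixtures of the sources (hypothetical mixing matrix A)
A = np.array([[1.0, 0.5], [0.5, 2.0]])
X = S @ A.T

# FastICA centers and whitens the data internally, then estimates the components
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)  # recovered independent components
```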

5. Singular Value Decomposition (SVD)


Singular Value Decomposition (SVD) is a fundamental matrix factorization
technique in linear algebra and is widely used in various fields such as signal
processing, statistics, and machine learning. SVD decomposes a given matrix
into three simpler matrices, providing insights into the properties of the original
matrix, such as its rank, range, and null space.

Objective: Factorize a matrix into three matrices to capture essential information.

Process:

- Decompose the data matrix A into U, Σ, and Vᵀ such that A = UΣVᵀ.
- Use the top k singular values and corresponding vectors to approximate the original matrix.

Applications: Latent semantic analysis, image compression.
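
A minimal sketch using NumPy; the matrix A and the choice of k = 2 are arbitrary examples.

```python
import numpy as np

A = np.random.rand(6, 4)

# Decompose A = U @ diag(s) @ Vt (singular values in s are sorted in descending order)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k approximation using the top k singular values and vectors
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.linalg.norm(A - A_k))  # reconstruction error of the low-rank approximation
```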

6. Bag of Words (BoW) / TF-IDF


Objective: Convert text data into numerical feature vectors.

Process:

- Tokenize the text.
- Build a vocabulary of all unique words.
- Create feature vectors based on word counts (BoW) or term frequency-inverse document frequency (TF-IDF).

Applications: Text classification, information retrieval.
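
A minimal sketch, assuming scikit-learn; the two toy documents are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]

# Bag of Words: tokenize, build a vocabulary, and count word occurrences
bow = CountVectorizer()
X_bow = bow.fit_transform(docs)

# TF-IDF: the same counts, reweighted by how rare each term is across documents
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)

print(bow.get_feature_names_out())  # the learned vocabulary
print(X_tfidf.toarray())            # dense TF-IDF feature vectors
```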

7. Factor Analysis (FA)

Factor Analysis (FA) is a statistical method used for identifying the underlying
relationships between observed variables. It is commonly used for dimensionality
reduction and feature extraction in machine learning. FA assumes that the
observed variables are influenced by a smaller number of unobserved variables
called factors. These factors can be thought of as the underlying structure that
explains the patterns of correlations within the observed data.
Purpose:

- Identify underlying relationships between observed variables.
- Reduce dimensionality and extract features.

Key Concepts:

- Latent Variables (Factors): Unobserved variables influencing observed data.
- Observed Variables: Measured data influenced by factors.
- Factor Loadings: Coefficients representing relationships between observed variables and factors.
- Specific Variance: Variance of observed variables not explained by factors.
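
A minimal sketch, assuming scikit-learn; the data matrix X and the choice of two latent factors are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical observations of 6 observed variables
X = np.random.rand(200, 6)

# Assume the observed variables are driven by 2 latent factors
fa = FactorAnalysis(n_components=2)
X_factors = fa.fit_transform(X)  # factor scores for each observation

print(fa.components_)       # factor loadings (factors x observed variables)
print(fa.noise_variance_)   # specific variance of each observed variable
```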

Principal component analysis algorithm


Recommendation system

https://www.geeksforgeeks.org/ml-content-based-recommender-system/
https://www.geeksforgeeks.org/collaborative-filtering-ml/
In machine learning, particularly in Support Vector Machines (SVMs), a hyperplane is a crucial
concept used for classification tasks. To understand it better, let's break it down:

### What is a Hyperplane?

A hyperplane is a subspace whose dimension is one less than that of its ambient space. In simpler
terms:
- In a 2-dimensional space (like a piece of paper), a hyperplane is a line.

- In a 3-dimensional space (like the real world), a hyperplane is a plane.

- In higher dimensions, it's a bit harder to visualize, but the idea extends
similarly: in an n-dimensional space, a hyperplane is an (n−1)-dimensional flat
affine subspace.

### Hyperplane in the Context of SVM

Support Vector Machines are supervised learning models used for classification and regression
analysis. In the context of binary classification, the goal of an SVM is to find the hyperplane that best
separates the data into two classes.

#### Key Characteristics of the Hyperplane in SVM:

1. **Maximal Margin**:

- SVM aims to find the hyperplane that maximizes the margin between the two classes. The margin
is defined as the distance between the hyperplane and the nearest points from either class, which
are called support vectors.

- By maximizing this margin, the SVM ensures that the classifier is not only accurate but also robust
to new data points, reducing the risk of overfitting.

2. **Support Vectors**:

- Support vectors are the data points that lie closest to the hyperplane and are most difficult to
classify. These points are crucial because they determine the position and orientation of the
hyperplane.

- The optimal hyperplane is the one that is equidistant from the support vectors of each class.

3. **Mathematical Formulation**:

- The equation of a hyperplane in an n-dimensional space can be written as:

  w · x + b = 0

  Here, w is the weight vector (normal to the hyperplane), x is the input feature vector, and b is the bias term.

- For classification, the decision rule is:

  Class = sign(w · x + b)

  This means that a point is classified based on which side of the hyperplane it falls on.
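
As a quick illustration of this decision rule (the weight vector w, bias b, and point x below are made-up values, not taken from the notes):

```python
import numpy as np

w = np.array([2.0, -1.0])  # weight vector, normal to the hyperplane
b = -0.5                   # bias term
x = np.array([1.0, 0.5])   # a point to classify

score = np.dot(w, x) + b   # which side of the hyperplane w . x + b = 0 the point lies on
label = np.sign(score)     # decision rule: Class = sign(w . x + b)
print(label)               # +1 or -1 depending on the side
```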

### Visual Example

In a 2-dimensional space (2D), imagine plotting the data points on a graph. If you have two classes,
say Class A and Class B, the SVM will find a line (hyperplane) that best separates these two classes.
Ideally, this line will be placed in such a way that the distance to the nearest points (support vectors)
from both classes is maximized.

### Extending to Non-linear Boundaries

For more complex datasets where a linear hyperplane cannot separate the classes effectively, SVMs
use a technique called the **kernel trick**. Kernels transform the data into a higher-dimensional
space where a hyperplane can then separate the classes. Common kernels include polynomial
kernels, radial basis function (RBF) kernels, and others.
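
A minimal sketch of the kernel trick in practice, assuming scikit-learn; the two-moons toy dataset and the hyperparameters are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy data that no straight line (linear hyperplane) can separate well
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# RBF kernel: the data is implicitly mapped to a higher-dimensional space
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

print(clf.support_vectors_.shape)  # the support vectors that define the boundary
print(clf.score(X, y))             # training accuracy
```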

### Summary

In summary, the hyperplane in SVMs is a critical concept that defines the decision boundary
between classes in a classification problem. The primary goal is to find the hyperplane that
maximizes the margin, ensuring a robust and accurate classification model. This is achieved by
focusing on the support vectors and potentially using kernel methods for non-linear separation.
Convergence refers to two random variables whose difference in probability becomes very small; when the values of the two variables effectively match each other in this sense, we say they converge.
Logistic regression:
https://www.geeksforgeeks.org/understanding-logistic-regression/
Clustering:
https://www.javatpoint.com/clustering-in-machine-learning
Agglomerative
