Dimensionality Reduction Algorithms
Submitted By:
P.V.S.S.K KASHYAP
18891A05A5
VIGNAN INSTITUTE OF TECHNOLOGY & SCIENCE
(NBA Accredited & Affiliated to Jawaharlal Nehru Technological University, Hyderabad)
Deshmukhi (V), Pochampally (M), Yadadri-Bhuvanagiri District, Telangana - 508284
Vision
To emerge as a premier center for education and research in computer science and engineering
and in transforming students into innovative professionals of contemporary and future technologies to
cater to the global needs of human resources for IT and ITES companies.
Mission
To produce excellent computer science professionals by imparting quality training,
hands-on-experience and value based education.
To strengthen links with industry through collaborative partnerships in research &
product development and student internships.
To promote research based projects and activities among the students in the emerging
areas of technology.
To explore opportunities for skill development in the application of computer science
among rural and underprivileged population.
Program Educational Objectives
● To create and sustain a community of learning in which students acquire knowledge and
apply in their concerned fields with due consideration for ethical, ecological, and
economic issues.
● To provide knowledge based services so as to meet the needs of the society and industry.
● To make the students understand, design and implement the concepts in multiple arenas.
● To educate the students in disseminating the research findings with good soft skills so as
to become successful entrepreneurs.
PREFACE
I have tried my best to elucidate all the details relevant to the topic of this report. In the beginning, I have tried to give a general view of the topic.
Table of Contents
I. ACKNOWLEDGEMENT
II. CERTIFICATE
1. MACHINE LEARNING
2. TYPES OF MACHINE LEARNING
3. WORKING
4. USAGE
6. PRINCIPAL COMPONENT ANALYSIS
7. INDEPENDENT COMPONENT ANALYSIS
8. METHODS USED ON PROJECTIONS
9. t-DISTRIBUTED STOCHASTIC NEIGHBOR EMBEDDING (t-SNE)
10. APPLICATIONS
11. CONCLUSION
12. FUTURE SCOPE
13. REFERENCES
ACKNOWLEDGEMENT
I would like to thank the respected Head of the Department for giving me such a wonderful opportunity to expand my knowledge in my own branch and for giving me guidelines to present a seminar report. It helped me a lot to realize what we study for.
I would like to thank the respected Technical Seminar Coordinator for organizing the seminars with continuous support.
Secondly, I would like to thank my parents who patiently helped me as I went through my work
and helped to modify and eliminate some of the irrelevant or unnecessary stuff.
Thirdly, I would like to thank my friends who helped me to make my work more organized and
well-stacked till the end.
Last but not least, I thank the Almighty for giving me the strength to complete my report on time.
CERTIFICATE
Technical Seminar Coordinator          Head of the Department
1. MACHINE LEARNING
Machine learning is a branch of artificial intelligence in which systems learn from data to improve their predictions without being explicitly programmed. Recommendation engines are a common use case for machine learning. Other popular uses include fraud detection, spam filtering, malware threat detection, business process automation (BPA) and predictive maintenance.
2. TYPES OF MACHINE LEARNING
Classical machine learning is often categorized by how an algorithm learns to become more
accurate in its predictions. There are four basic approaches: supervised learning, unsupervised
learning, semi-supervised learning and reinforcement learning. The type of algorithm data
scientists choose to use depends on what type of data they want to predict.
Supervised learning: In this type of machine learning, data scientists supply algorithms with labelled training data and define the variables they want the algorithm to assess for correlations. Both the input and the output of the algorithm are specified.
Unsupervised learning: This type of machine learning involves algorithms that train on unlabeled data. The algorithm scans through data sets looking for any meaningful connections; neither the groupings it finds nor the predictions or recommendations it outputs are predetermined.
Semi-supervised learning: This approach to machine learning involves a mix of the two
preceding types. Data scientists may feed an algorithm mostly labeled training data, but the
model is free to explore the data on its own and develop its own understanding of the data set.
Reinforcement learning: Data scientists typically use reinforcement learning to teach a machine
to complete a multi-step process for which there are clearly defined rules. Data scientists
program an algorithm to complete a task and give it positive or negative cues as it works out how to do so. But for the most part, the algorithm decides on its own what steps to take along the way.
3. WORKING
Semi-supervised learning works by feeding a small amount of labelled training data to an algorithm. From this, the algorithm learns patterns in the data set, which it can then apply to new, unlabeled data. The performance of algorithms typically improves when they
train on labelled data sets. But labelling data can be time consuming and expensive. Semi-
supervised learning strikes a middle ground between the performance of supervised learning and
the efficiency of unsupervised learning. Some areas where semi-supervised learning is used
include:
Machine translation: Teaching algorithms to translate language based on less than a full
dictionary of words.
Fraud detection: Identifying cases of fraud when you only have a few positive examples.
Labelling data: Algorithms trained on small data sets can learn to apply data labels to larger sets
automatically.
4. USAGE
Facebook uses machine learning to personalize how each member's feed is delivered. If a
member frequently stops to read a particular group's posts, the recommendation engine will start
to show more of that group's activity earlier in the feed.
Behind the scenes, the engine is attempting to reinforce known patterns in the member's online
behaviors. Should the member change patterns and fail to read posts from that group in the
coming weeks, the news feed will adjust accordingly.
In addition to recommendation engines, other uses for machine learning include the following:
Customer relationship management. CRM software can use machine learning models to analyse
email and prompt sales team members to respond to the most important messages first. More
advanced systems can even recommend potentially effective responses.
Business intelligence. BI and analytics vendors use machine learning in their software to identify
potentially important data points, patterns of data points and anomalies.
Human resource information systems. HRIS systems can use machine learning models to filter
through applications and identify the best candidates for an open position.
Self-driving cars. Machine learning algorithms can even make it possible for a semi-autonomous
car to recognize a partially visible object and alert the driver.
Virtual assistants. Smart assistants typically combine supervised and unsupervised machine
learning models to interpret natural speech and supply context.
When it comes to advantages, machine learning can help enterprises understand their customers
at a deeper level. By collecting customer data and correlating it with behaviors over time,
machine learning algorithms can learn associations and help teams tailor product development
and marketing initiatives to customer demand.
Some companies use machine learning as a primary driver in their business models. Uber, for
example, uses algorithms to match drivers with riders. Google uses machine learning to surface
the right advertisements in searches.
But machine learning comes with disadvantages. First and foremost, it can be expensive.
Machine learning projects are typically driven by data scientists, who command high salaries.
These projects also require software infrastructure that can be expensive.
There is also the problem of machine learning bias. Algorithms trained on data sets that exclude
certain populations or contain errors can lead to inaccurate models of the world that, at best, fail
and, at worst, are discriminatory. When an enterprise bases core business processes on biased models, it can suffer regulatory and reputational harm.
How to choose the right machine learning model
The process of choosing the right machine learning model to solve a problem can be time
consuming if not approached strategically.
Step 1: Align the problem with potential data inputs that should be considered for the solution.
This step requires help from data scientists and experts who have a deep understanding of the
problem.
Step 2: Collect data, format it and label the data if necessary. This step is typically led by data
scientists, with help from data wranglers.
Step 3: Choose which algorithm(s) to use and test to see how well they perform. This step is
usually carried out by data scientists.
Step 4: Continue to fine-tune outputs until they reach an acceptable level of accuracy. This step
is usually carried out by data scientists with feedback from experts who have a deep
understanding of the problem.
Complex models can produce accurate predictions, but explaining to a lay person how an output
was determined can be difficult.
While machine learning algorithms have been around for decades, they've attained new
popularity as artificial intelligence has grown in prominence. Deep learning models, in
particular, power today's most advanced AI applications.
Machine learning platforms are among enterprise technology's most competitive realms, with
most major vendors, including Amazon, Google, Microsoft, IBM and others, racing to sign
customers up for platform services that cover the spectrum of machine learning activities,
including data collection, data preparation, data classification, model building, training and
application deployment.
Continued research into deep learning and AI is increasingly focused on developing more
general applications. Today's AI models require extensive training in order to produce an
algorithm that is highly optimized to perform one task. But some researchers are exploring ways
to make models more flexible and are seeking techniques that allow a machine to apply context
learned from one task to future, different tasks.
6. Principal Component Analysis
Principal Component Analysis is an unsupervised learning algorithm that is used for dimensionality reduction in machine learning. It is one of the popular tools used for exploratory data analysis and predictive modeling. It is a technique for extracting the strong patterns in a dataset by reducing a large set of variables to a smaller set of components that still capture most of the variance.
Correlation: It signifies how strongly two variables are related to each other.
Orthogonal: It means that the variables are not correlated with each other, and hence the correlation between such a pair of variables is zero.
Covariance Matrix: A matrix containing the covariance between the pair of variables is called
the Covariance Matrix.
1. Getting the dataset: we need to take the input dataset and divide it into two subparts X
and Y, where X is the training set, and Y is the validation set.
2. Representing data into a structure: In the 2nd step we will represent our dataset into a
structure.
3. Standardizing the data: In this step, we will standardize our dataset so that each column is on a comparable scale; otherwise, the features with high variance would dominate the features with lower variance. The standardized matrix is denoted Z.
4. Calculating the covariance of Z: To calculate the covariance of Z, we take the matrix Z, transpose it, and multiply the transpose by Z. The output matrix is the covariance matrix of Z.
5. Calculating the eigenvalues and eigenvectors: We then calculate the eigenvalues and eigenvectors of the resultant covariance matrix of Z.
6. Sorting the eigenvectors: We sort the eigenvalues in decreasing order and arrange the corresponding eigenvectors as the columns of a matrix P*.
7. Calculating the new features, or principal components: Here we calculate the new features. To do this, we multiply the P* matrix by Z. In the resultant matrix Z*, each observation is a linear combination of the original features, and the columns of Z* are independent of each other.
8. Removing less important features from the new dataset: Once the new feature set is obtained, we decide what to keep and what to remove; only the relevant or important features are kept in the new dataset, and the unimportant features are dropped. A minimal code sketch of these steps is given below.
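The steps above can be illustrated with a short NumPy sketch. This is a minimal example with a randomly generated dataset and two retained components, both of which are illustrative assumptions rather than part of the method itself; in practice the library routine sklearn.decomposition.PCA performs the same computation.

import numpy as np

X = np.random.rand(100, 5)                    # example dataset: 100 observations, 5 features
Z = (X - X.mean(axis=0)) / X.std(axis=0)      # step 3: standardize the data
cov = Z.T @ Z / (len(Z) - 1)                  # step 4: covariance matrix of Z
eig_vals, eig_vecs = np.linalg.eigh(cov)      # step 5: eigenvalues and eigenvectors
order = np.argsort(eig_vals)[::-1]            # step 6: sort by decreasing eigenvalue
P_star = eig_vecs[:, order[:2]]               # keep the top two eigenvectors as P*
Z_star = Z @ P_star                           # step 7: project the data onto the principal components
print(Z_star.shape)                           # (100, 2): the reduced dataset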
Applications of PCA
PCA is commonly used for exploratory visualization of high-dimensional data, image compression, noise reduction, and as a preprocessing step that speeds up other machine learning algorithms.
7. Independent Component Analysis
Working of ICA
The standard problem used to describe ICA is the “Cocktail Party Problem”. In its simplest form, imagine two people having a conversation at a cocktail party. For whatever reason, you have two microphones placed near the two party-goers. Both voices are heard by both microphones at different volumes based on the distance between each person and each microphone. In other words, we record
two files that include audio from the two party-goers mixed together. The problem then is, how
can we separate them?
This problem is solved easily with Independent Component Analysis (ICA), which transforms a set of vectors into a maximally independent set. Returning to our “Cocktail Party Problem”, ICA will convert the two mixed audio recordings into two unmixed recordings, one for each individual speaker. Notice that the number of inputs and outputs is the same, and since the outputs are mutually independent, there is no obvious way to drop components as in Principal Component Analysis (PCA).
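As a rough illustration of this idea, the following sketch mixes two synthetic signals and then separates them with scikit-learn's FastICA. The signals, the mixing matrix and the parameter values are made-up assumptions used only for demonstration.

import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                             # first "speaker"
s2 = np.sign(np.sin(3 * t))                    # second "speaker"
S = np.c_[s1, s2]
A = np.array([[1.0, 0.5], [0.5, 1.0]])         # assumed mixing at the two microphones
X = S @ A.T                                    # the two mixed recordings

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                   # recovered, maximally independent sources
print(S_est.shape)                             # (2000, 2): one column per separated speaker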
Applications of ICA
Image processing: As a method that recognizes and separates the hidden factors in multivariate signals, ICA has significantly influenced the field of image processing.
Image de-noising: Using different methods, ICA can remove much of the noise that an image accumulates while being captured, thereby enhancing the image quality.
Handling incomplete data: Missing data limits how effective PCA can be, whereas ICA can still be applied in such cases; it can also be seen as one of the data mining tools for handling incomplete data.
8. Methods used on projections
Long Short-Term Memory (LSTM)
ARIMA
Comparing Models
Long Short-Term Memory (LSTM): LSTM is a type of recurrent neural network that is
particularly useful for making predictions with sequential data.
ARIMA: The ARIMA model looks slightly different from the models above. We use the statsmodels SARIMAX class to train the model and generate dynamic predictions. The SARIMA model breaks down into a few parts: seasonal and non-seasonal autoregressive, differencing and moving-average terms. A hedged sketch is shown below.
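A minimal sketch of this workflow follows, assuming statsmodels is available; the synthetic monthly series and the (1, 1, 1)(1, 1, 1, 12) orders are illustrative assumptions, not values from the report.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# example monthly series with a yearly seasonal pattern
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
y = pd.Series(10 + np.sin(np.arange(96) * 2 * np.pi / 12) + np.random.normal(scale=0.3, size=96), index=idx)

results = sm.tsa.SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
pred = results.get_prediction(start=idx[-24], dynamic=True)   # dynamic predictions for the last two years
print(pred.predicted_mean.head())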
Comparing Models: To compare model performance, we will look at root mean squared error
(RMSE) and mean absolute error (MAE). These measurements are both commonly used for
comparing model performance, but they have slightly different intuition and mathematical
meaning.
MAE: the mean absolute error tells us, on average, how far our predictions are from the true values.
RMSE: we calculate RMSE by taking the square root of the mean of all of the squared errors. Both measures are illustrated below.
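As a quick illustration, the two metrics can be computed directly with NumPy; the arrays below are made-up example values.

import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae = np.mean(np.abs(y_true - y_pred))             # average absolute distance from the true values
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))    # square root of the mean squared error
print(mae, rmse)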
9. t-Distributed Stochastic Neighbor Embedding (t-SNE)
Applies a non-linear dimensionality reduction technique where the focus is on keeping the
very similar data points close together in lower-dimensional space.
Preserves the local structure of the data, using a Student's t-distribution to compute the similarity between two points in the lower-dimensional space.
Working:
Step 1: Find the pairwise similarity between nearby points in a high dimensional space.
Step 2: Map each point in high dimensional space to a low dimensional map based on the
pairwise similarity of points in the high dimensional space.
Step 3: Find a low-dimensional data representation that minimizes the mismatch between the high-dimensional similarities pᵢⱼ and the low-dimensional similarities qᵢⱼ, using gradient descent on the Kullback-Leibler divergence (KL divergence). A short usage sketch is given below.
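A minimal usage sketch with scikit-learn's TSNE follows; the digits dataset and the perplexity value are illustrative choices, not part of the algorithm's definition.

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)                 # 64-dimensional inputs
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X_2d.shape)                                   # (1797, 2): the low-dimensional map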
UMAP: UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based on Riemannian geometry and algebraic topology. The result is a practical, scalable algorithm that applies to real-world data.
FACTS:
In the comparison summarized here, UMAP outperformed t-SNE and PCA: in the 2D and 3D plots, the mini-clusters are separated well. UMAP is very effective for visualizing clusters or groups of data points and their relative proximities.
UMAP is faster than t-SNE when there is:
a large number of data points,
a number of embedding dimensions greater than 2 or 3,
a large number of ambient dimensions in the data set.
A short usage sketch with the umap-learn package is given below.
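The sketch below assumes the umap-learn package is installed (imported as umap); the digits dataset and the n_neighbors and min_dist values are illustrative assumptions.

import umap
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
embedding = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1).fit_transform(X)
print(embedding.shape)                              # (1797, 2)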
10. APPLICATIONS
The purpose of this section is to provide a complete and simplified explanation of Principal Component Analysis (PCA) and of the related techniques that follow. We'll cover how PCA works step by step, so everyone can understand it and make use of it, even those without a strong mathematical background. PCA is a widely covered method on the web, and there are some great articles about it, but many spend too much time in the weeds, when most of us just want to know how it works in a simplified way. Principal component analysis can be broken down into a handful of steps, with logical explanations of what PCA is doing and with mathematical concepts such as standardization, covariance, eigenvectors and eigenvalues kept simple, without focusing on how to compute them. Principal Component Analysis, or PCA, is a dimensionality reduction method that is often used to reduce the dimensionality of large data sets by transforming a large set of variables into a smaller one that still contains most of the information in the large set. Reducing the number of variables of a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity: smaller data sets are easier to explore and visualize, and they make analyzing data much easier and faster for machine learning algorithms, without extraneous variables to process. So, to sum up, the idea of PCA is simple: reduce the number of variables of a data set while preserving as much information as possible.
The following dimensionality reduction and feature selection techniques are applied widely in practice:
Backward Elimination
Backward elimination is a feature selection technique used while building a machine learning model. It removes those features that do not have a significant effect on the dependent variable or on the prediction of the output. There are various ways to build a model in machine learning, including All-in, Backward Elimination, Forward Selection, Bidirectional Elimination and Score Comparison.
Backward elimination is a stepwise regression approach that begins with a full (saturated) model and at each step gradually eliminates variables from the regression model to find a reduced model that best explains the data. It is also known as backward elimination regression.
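A minimal sketch of backward elimination based on OLS p-values is given below, assuming a pandas DataFrame X of features, a Series y of targets and a 0.05 significance level; these names and the threshold are assumptions for illustration.

import statsmodels.api as sm

def backward_elimination(X, y, significance=0.05):
    features = list(X.columns)
    while features:
        model = sm.OLS(y, sm.add_constant(X[features])).fit()
        pvalues = model.pvalues.drop("const")       # p-value of each remaining feature
        worst = pvalues.idxmax()
        if pvalues[worst] > significance:
            features.remove(worst)                  # drop the least significant feature
        else:
            break
    return features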
Missing Value Ratio
The overall percentage of data that is missing is important. Generally, if less than 5% of values are missing, then it is acceptable to ignore them (REF). However, the overall percentage missing alone is not enough; you also need to pay attention to which data is missing.
Missing data is defined as values that are not stored (or not present) for some variables in the given dataset. For example, in the Titanic dataset, the columns 'Age' and 'Cabin' have some missing values.
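The per-column missing-value ratio can be checked with pandas, as in the sketch below; the file name titanic.csv and the 50% threshold for dropping a column are illustrative assumptions.

import pandas as pd

df = pd.read_csv("titanic.csv")                          # assumed file
missing_ratio = df.isnull().mean() * 100                 # percentage of missing values per column
print(missing_ratio.sort_values(ascending=False))
cols_to_drop = missing_ratio[missing_ratio > 50].index   # e.g. drop columns that are more than half empty
df_reduced = df.drop(columns=cols_to_drop)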
What is variance in machine learning? Variance refers to the changes in the model when using
different portions of the training data set. Simply stated, variance is the variability in the model
prediction—how much the ML function can adjust depending on the given data set.
The Low Variance Filter is a useful dimensionality reduction technique. Variance is a statistical measure of the amount of variation in a given variable. If the variance of a variable is too low, it means that the variable does not change much and hence it can usually be ignored.
Why do we use a low variance filter?
It filters out numeric columns whose variance is below a user-defined threshold. Columns with low variance are likely to distract certain learning algorithms (in particular those which are distance based) and are therefore better removed.
A small variance indicates that the data points tend to be very close to the mean, and to each other, while a large variance indicates that the data points are spread far from the mean and from one another. A variance of zero indicates that all values within a set of numbers are identical. Variance is the average of the squared distances from each point to the mean.
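The filter can be sketched with scikit-learn's VarianceThreshold, as below; the random data, the constant column and the 0.01 threshold are illustrative assumptions.

import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.random.rand(100, 10)
X[:, 0] = 0.5                                   # a constant column: variance zero, no information
print(X.var(axis=0))                            # variance = mean of squared distances from the column mean
selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)           # columns below the variance threshold are removed
print(X.shape, "->", X_reduced.shape)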
High Correlation Filter: This dimensionality reduction technique tries to discard inputs that are very similar to others. In simple words, if your opinion is the same as your boss's, one of you is not required. If the values of two input parameters are always the same, they represent the same entity, so we do not need both parameters; just one is enough. In technical terms, if there is a very high correlation between two input variables, we can safely drop one of them. A pair of variables having high correlation increases multicollinearity in the dataset, so we can use this technique to find highly correlated features and drop them accordingly.
The corr() method in pandas can be used to identify the correlation between fields. Before we start, we have to choose only the numeric fields, as the corr() method works only with numeric fields; non-numeric fields can also be highly correlated, but this method cannot measure that.
High correlation between two variables means they have similar trends and are likely to carry similar information. This can bring down the performance of some models drastically (linear and logistic regression models, for instance). We can calculate the correlation between the independent numerical variables, and if the correlation coefficient crosses a certain threshold value, we can drop one of the variables (dropping a variable is highly subjective and should always be done keeping the domain in mind). A small sketch of this filter is given below.
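The following pandas sketch drops one variable from each highly correlated pair; the synthetic DataFrame and the 0.9 threshold are assumptions made for illustration.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({"a": a,
                   "b": 2 * a + rng.normal(scale=0.01, size=200),    # nearly a copy of "a"
                   "c": rng.normal(size=200)})

corr = df.corr().abs()                                               # corr() works only on numeric fields
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))    # upper triangle, excluding the diagonal
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()] # here, "b" is dropped
df_reduced = df.drop(columns=to_drop)
print(df_reduced.columns.tolist())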
Random Forest
Random Forest is one of the most widely used algorithms for feature selection. It comes packaged with built-in feature importance, so you do not need to program that separately, and this helps us select a smaller subset of features. We need to convert the data into numeric form by applying one-hot encoding, as the scikit-learn implementation of Random Forest takes only numeric inputs. In the example dataset, we also drop the ID variables (Item_Identifier and Outlet_Identifier), as these are just unique identifiers and hold no significant importance for us currently.
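A hedged sketch of this procedure follows. The file name train.csv, the target column Item_Outlet_Sales and the decision to drop rows with missing values are assumptions standing in for the example dataset described above.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("train.csv")                                      # assumed file
df = df.drop(columns=["Item_Identifier", "Outlet_Identifier"])     # drop the ID variables
df = df.dropna()                                                   # for simplicity, drop rows with missing values
X = pd.get_dummies(df.drop(columns=["Item_Outlet_Sales"]))         # one-hot encode the categorical inputs
y = df["Item_Outlet_Sales"]

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))           # candidates for the reduced feature set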
Random forest is a supervised machine learning algorithm that is widely used for classification and regression problems, for example classifying whether an email is "spam" or "not spam". It builds decision trees on different samples of the data and takes their majority vote for classification, or their average in the case of regression. A random forest addresses the limitations of a single decision tree: it reduces overfitting and increases precision. It uses bagging and feature randomness when building each individual tree to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.
Factor Analysis
Factor analysis is an unsupervised machine learning algorithm used for dimensionality reduction. This algorithm creates factors from the observed variables to represent the common variance, i.e. the variance due to correlation among the observed variables.
Factor analysis is a powerful data reduction technique that enables researchers to investigate
concepts that cannot easily be measured directly. By boiling down a large number of variables
into a handful of comprehensible underlying factors, factor analysis results in easy-to-
understand, actionable data.
PCA, short for Principal Component Analysis, and Factor Analysis are two statistical methods that are often covered together in classes on multivariate statistics.
There are two types of factor analysis: exploratory and confirmatory. Exploratory factor analysis (EFA) is a method to explore the underlying structure of a set of observed variables, and is a crucial step in the scale development process. The purpose of factor analysis is to reduce many
individual items into a fewer number of dimensions. Factor analysis can be used to simplify data,
such as reducing the number of variables in regression models. The overall objective of factor
analysis is data summarization and data reduction. A central aim of factor analysis is the orderly
simplification of a number of interrelated measures. Factor analysis describes the data using
many fewer dimensions than original variables.
Factor analysis is used to identify "factors" that explain a variety of results on different tests. For
example, intelligence research found that people who get a high score on a test of verbal ability
are also good on other tests that require verbal abilities.
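A minimal factor-analysis sketch with scikit-learn is shown below; the iris data and the choice of two latent factors are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X, _ = load_iris(return_X_y=True)
fa = FactorAnalysis(n_components=2, random_state=0)
X_factors = fa.fit_transform(X)               # each observation expressed in terms of two latent factors
print(fa.components_.shape)                   # (2, 4): loadings of each original variable on each factor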
Auto-Encoder
An autoencoder is an unsupervised learning technique for neural networks that learns efficient
data representations (encoding) by training the network to ignore signal “noise.” Autoencoders
can be used for image denoising, image compression, and, in some cases, even generation of
image data.
An autoencoder is a type of neural network that can be used to learn a compressed representation of raw data. The encoder compresses the input and the decoder attempts to recreate the input from the compressed version provided by the encoder. After training, the encoder model is saved and the decoder is discarded.
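A small dense autoencoder can be sketched in Keras as below; the layer sizes, training settings and the random input array are arbitrary assumptions rather than a prescribed architecture.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(1000, 20)                                   # example data with 20 input features

inputs = keras.Input(shape=(20,))
encoded = layers.Dense(8, activation="relu")(inputs)           # compressed 8-dimensional representation
decoded = layers.Dense(20, activation="sigmoid")(encoded)      # attempt to reconstruct the input

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)                         # kept after training; the decoder is discarded
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)     # the network learns to reproduce its input

X_reduced = encoder.predict(X)                                 # 8-dimensional encoding of the data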
Forward Selection
Forward selection is a type of stepwise regression which begins with an empty model and adds variables one by one. In each forward step, you add the one variable that gives the single best improvement to your model. It is one of two commonly used methods of stepwise regression; the other is backward elimination, which is almost its opposite: there, you start with a model that includes every possible variable and eliminate the extraneous variables one by one.
Forward selection typically begins with only an intercept. One tests the various variables that may be relevant, and the 'best' variable, where 'best' is determined by some pre-determined criterion, is added to the model. As the model continues to improve (per that same criterion), we continue the process, adding one variable at a time and testing at each step. Once the model no longer improves with the addition of more variables, the process stops. The criteria used to decide which variable goes in next vary: you could be looking for the lowest score under cross-validation, the lowest p-value, or any of a number of other tests or measures of accuracy. Since stepwise regression tends toward over-fitting, which happens when we put in more variables than is actually good for the model, it typically shows a very close, neat fit of the data used in regression, but the model will be far off from additional data points and not good for interpolation. Therefore, it is usually good to have strict criteria for adding in any variables.
Forward selection is an iterative method in which we start with no features in the model. In each iteration, we keep adding the feature that best improves the model, until the addition of a new variable no longer improves its performance.
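Forward selection is available in scikit-learn as SequentialFeatureSelector with direction="forward" (scikit-learn 0.24 or later); the estimator, the dataset and the number of features to select in the sketch below are illustrative assumptions.

from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)
selector = SequentialFeatureSelector(LinearRegression(), n_features_to_select=5,
                                     direction="forward", cv=5)
selector.fit(X, y)
print(selector.get_support())                  # boolean mask of the selected features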
Score comparison
In machine learning, scoring is the process of applying an algorithmic model built from a historical dataset to a new dataset in order to uncover practical insights that will help solve a business problem.
The primary objective of model comparison and selection is better performance of the machine learning software or solution. The goal is to narrow down the best algorithms that suit both the data and the business requirements. A hedged comparison sketch is given below.
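Two candidate models can be compared by cross-validated RMSE, as in the sketch below; the dataset, the pair of models and the scoring choice are illustrative assumptions.

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
    scores = -cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
    print(type(model).__name__, scores.mean())   # lower average RMSE is better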
11. CONCLUSION
Dimensionality reduction reduces the number of input variables in a dataset while preserving as much information as possible. Techniques such as PCA, ICA, t-SNE, UMAP, factor analysis and autoencoders, together with feature selection methods such as backward elimination, forward selection, the missing value ratio, the low variance filter, the high correlation filter and Random Forest feature importance, make data sets easier to explore, visualize and model, at the cost of a small loss of accuracy. The right technique depends on the data, the model and the business requirements.
12. FUTURE SCOPE
For applications that aim to be more efficient and more accurate, a wide range of methodologies is available among these algorithms. As the basic drawbacks of the algorithms, such as the curse of dimensionality, are addressed, dimensionality reduction will deliver better accuracy and better performance, and its applications will be viewed in a much broader way.
13. REFERENCES
• https://www.geeksforgeeks.org/
• https://www.guru99.com/
• https://www.edureka.co/
• https://www.javatpoint.com/
• https://www.sciencedirect.com/
• https://www.tutorialspoint.com/