
Principal Component Analysis

Often a dataset has many columns: tens, hundreds, thousands, or more. Modeling data with many features is challenging, and models built from data that include irrelevant features are often less skillful than models trained on only the most relevant data. It is hard to know which features of the data are relevant and which are not. Methods for automatically reducing the number of columns of a dataset are called dimensionality reduction, and perhaps the most popular is principal component analysis, or PCA for short. This method is used in machine learning to create projections of high-dimensional data, both for visualization and for training models. The core of the PCA method is a matrix factorization method from linear algebra. The eigendecomposition can be used, and more robust implementations may use the singular-value decomposition, or SVD.
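
As a concrete illustration, the short listing below projects a tiny, made-up dataset onto its two most important components using scikit-learn's PCA class; the toy data and the choice of two components are assumptions made only for the example.

# A minimal sketch of PCA with scikit-learn; the toy 3-feature data
# and the choice of 2 components are illustrative assumptions.
from numpy import array
from sklearn.decomposition import PCA

# toy dataset: 4 samples, 3 features
data = array([[2.0, 8.0, 4.5],
              [1.0, 9.0, 5.0],
              [3.0, 7.5, 4.0],
              [2.5, 8.5, 4.8]])

# project the data onto its 2 most important components
pca = PCA(n_components=2)
projected = pca.fit_transform(data)
print(projected.shape)                 # (4, 2)
print(pca.explained_variance_ratio_)   # variance captured by each component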

Singular-Value Decomposition
Another popular dimensionality reduction method is the singular-value decomposition, or SVD for short. As mentioned, and as the name of the method suggests, it is a matrix factorization method from the field of linear algebra. It has wide use in linear algebra and can be applied directly in applications such as feature selection, visualization, noise reduction and more. Two more cases of using the SVD in machine learning follow below.
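
The listing below sketches the factorization itself with NumPy, decomposing a small, made-up matrix into the three component matrices (U, the singular values, and V transpose); the 3x2 matrix is an assumption made only for the example.

# A minimal sketch of the SVD with NumPy; the 3x2 matrix is an
# illustrative assumption.
from numpy import array
from numpy.linalg import svd

A = array([[1, 2],
           [3, 4],
           [5, 6]])

# factorize A into U, the singular values s, and V transpose
U, s, VT = svd(A)
print(U.shape, s.shape, VT.shape)   # (3, 3) (2,) (2, 2)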

Latent Semantic Analysis


In the sub-field of machine learning for working with text data, called natural language processing, it is common to represent documents as large matrices of word occurrences. For example, the columns of the matrix may be the known words in the vocabulary and the rows may be sentences, paragraphs, pages or documents of text, with each cell holding the count or frequency of the times that word occurred. This is a sparse matrix representation of the text. Matrix factorization methods such as the singular-value decomposition can be applied to this sparse matrix, which has the effect of distilling the representation down to its most relevant essence. Documents processed in this way are much easier to compare, query and use as the basis for a supervised machine learning model. This form of data preparation is called Latent Semantic Analysis, or LSA for short, and is also known by the name Latent Semantic Indexing, or LSI.
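
The listing below sketches this idea: a handful of made-up documents are turned into a sparse word-count matrix and then reduced with scikit-learn's TruncatedSVD; the documents and the choice of two components are assumptions made only for the example.

# A minimal sketch of LSA: a sparse document-word count matrix
# reduced with TruncatedSVD. The documents and the choice of
# 2 components are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs make good pets"]

# rows are documents, columns are vocabulary words, cells are counts
counts = CountVectorizer().fit_transform(docs)

# distill each document down to 2 latent semantic components
lsa = TruncatedSVD(n_components=2)
reduced = lsa.fit_transform(counts)
print(reduced.shape)   # (3, 2)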

Recommender Systems
Predictive modeling problems that involve the recommendation of products are called recommender systems, a sub-field of machine learning. Examples include the recommendation of books based on your previous purchases and the purchases of customers like you on Amazon, and the recommendation of movies and TV shows based on your viewing history and the viewing history of subscribers like you on Netflix. The development of recommender systems is primarily concerned with linear algebra methods. A simple example is the calculation of the similarity between sparse customer behavior vectors using distance measures such as Euclidean distance or dot products. Matrix factorization methods like the singular-value decomposition are used widely in recommender systems to distill item and user data to their essence for querying, searching and comparison.
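
The listing below sketches that similarity calculation with NumPy, comparing two made-up customer purchase vectors with a dot product and a cosine similarity; the 0/1 purchase vectors are assumptions made only for the example.

# A minimal sketch of comparing sparse customer behavior vectors;
# the 0/1 purchase vectors are illustrative assumptions.
from numpy import array, dot
from numpy.linalg import norm

# each position marks whether a customer bought one of 6 products
customer_a = array([1, 0, 1, 1, 0, 0])
customer_b = array([1, 0, 1, 0, 0, 1])

# raw overlap of purchases via the dot product
overlap = dot(customer_a, customer_b)

# cosine similarity normalizes for how much each customer buys
cosine = overlap / (norm(customer_a) * norm(customer_b))
print(overlap, cosine)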

Deep Learning
Artificial neural networks are nonlinear machine learning algorithms that are inspired by elements of the information processing in the brain, and they have proven effective at a range of problems, not least predictive modeling. Deep learning is the recent resurgence of artificial neural networks, built on newer methods and faster hardware that allow for the development and training of larger and deeper (more layers) networks on very large datasets. Deep learning methods routinely achieve state-of-the-art results on a range of challenging problems such as machine translation, photo captioning, speech recognition and much more.
At their core, the execution of neural networks involves linear algebra data structures multiplied and added together. Scaled up to multiple dimensions, deep learning methods work with vectors, matrices and even tensors of inputs and coefficients, where a tensor is an array with more than two dimensions. Linear algebra is central to deep learning, from the description of methods via matrix notation to their implementation in libraries such as Google's TensorFlow Python library, which has the word "tensor" in its name.
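
The listing below sketches the core linear algebra inside a single fully connected layer with NumPy: a batch of inputs multiplied by a weight matrix, plus a bias, passed through a nonlinearity; the shapes and random weights are assumptions made only for the example.

# A minimal sketch of the linear algebra inside one dense layer;
# the shapes and random weights are illustrative assumptions.
from numpy import array, maximum
from numpy.random import rand, seed

seed(1)
X = array([[0.5, 0.1, 0.4],
           [0.9, 0.2, 0.7]])   # batch of 2 samples, 3 features
W = rand(3, 4)                  # weight matrix: 3 inputs to 4 units
b = rand(4)                     # one bias per unit

# matrix multiplication, addition, then a ReLU nonlinearity
activations = maximum(0, X.dot(W) + b)
print(activations.shape)        # (2, 4)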

Summary
In this chapter, you discovered 10 common examples of machine learning that you may be familiar with that use and require linear algebra. Specifically, you learned:

- The use of linear algebra structures when working with data such as tabular datasets and images.
- Linear algebra concepts when working with data preparation such as one hot encoding and dimensionality reduction.
- The ingrained use of linear algebra notation and methods in sub-fields such as deep learning, natural language processing and recommender systems.

3.12.1 Next
This is the end of the first part. In the next part, you will discover how to manipulate arrays of data in Python using NumPy.
