

RAJIV GANDHI COLLEGE OF ENGINEERING, RESEARCH & TECHNOLOGY, CHANDRAPUR

(DR. BABASAHEB AMBEDKAR TECHNOLOGICAL UNIVERSITY, LONERE)

(2023-24)

SEMINAR REPORT
On
MACHINE LEARNING
(PRINCIPAL COMPONENT ANALYSIS)

SEMESTER III, SECOND YEAR


Seminar I

Submitted
By
NAME: LAHU FULSING AJMERA
ROLL NO: CSEC333

Guided By: Prof. R. V. Lichode
Seminar In-charge: Prof. R. V. Lichode

Dr. Nitin Janwe
HOD, CSE & IT

TABLE OF CONTENTS

1 Introduction
1.1 Overview of Machine Learning
1.2 Importance of Dimensionality Reduction
2 Principal Component Analysis (PCA)
2.1 What is PCA?
2.2 The Mathematical Foundation of PCA
2.2.1 Step 1: Standardize the Data
2.2.2 Step 2: Compute the Covariance Matrix
2.2.3 Step 3: Eigenvalue Decomposition
2.2.4 Step 4: Sort and Select Principal Components
2.2.5 Step 5: Transform the Data
2.3 How PCA Works (Example)
3 Applications of PCA in Machine Learning
3.1 Data Preprocessing and Noise Reduction
3.2 Feature Extraction
3.3 Visualization of High-Dimensional Data
3.4 Face Recognition
3.5 Principal Component Regression (PCR)
4 Advantages and Limitations of PCA
4.1 Advantages
4.2 Limitations
5 Conclusion
6 References

ABSTRACT

Machine learning (ML) has become one of the most transformative technologies of the 21st century,
revolutionizing industries and improving systems in diverse
fields such as healthcare, finance, marketing, and autonomous
vehicles. This report explores the basics of machine learning,
its types, and the role of Principal Component Analysis (PCA)
as a dimensionality reduction technique in machine learning
pipelines. PCA plays a crucial role in simplifying complex
datasets while retaining essential information, making it an
indispensable tool for data preprocessing, feature extraction,
and noise reduction. We discuss how PCA works, its
mathematical foundation, and its applications within the
broader context of machine learning.

1. INTRODUCTION

1.1 Overview of Machine Learning:



Machine learning is a subset of artificial intelligence (AI) that involves building algorithms that allow computers to learn from data and make predictions or
decisions without being explicitly programmed. The core idea behind machine
learning is to use statistical techniques to learn patterns in data, which can then
be applied to predict future outcomes or classify new data.
Machine learning can be broadly categorized into three types:
Supervised Learning: Algorithms learn from labelled data and make predictions
based on that learning (e.g., classification, regression).
Unsupervised Learning: Algorithms find patterns or groupings in unlabelled
data (e.g., clustering, anomaly detection).
Reinforcement Learning: An agent learns by interacting with an environment
and receiving feedback in the form of rewards or penalties.
With the increasing availability of large datasets and advancements in
computational power, machine learning has become an essential tool for solving
complex real-world problems.
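As a minimal illustration of the supervised case described above, the following scikit-learn sketch trains a classifier on labelled data and then predicts labels for unseen samples; the Iris dataset and logistic regression model are assumed here purely for demonstration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# labelled data: feature measurements and the species label for each sample
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# learn a mapping from features to labels on the training split
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# apply the learned mapping to previously unseen data
print(clf.score(X_test, y_test))
```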

1.2 Importance of Dimensionality Reduction:


In many machine learning problems, especially those involving large datasets,
the number of features (dimensions) can be extremely high. High-dimensional
data often leads to several challenges:
Curse of Dimensionality: As the number of features increases, the amount of
data needed to train a model effectively grows exponentially.
Overfitting: More features increase the likelihood of a model learning noise
rather than useful patterns.
Computational Complexity: High-dimensional data require more computational
resources for processing and analysis.
Dimensionality reduction techniques, such as Principal Component Analysis
(PCA), address these challenges by reducing the number of features while
preserving the essential information in the data.
2. PRINCIPAL COMPONENT ANALYSIS (PCA)

2.1 What is PCA?



Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction. It transforms a dataset of possibly correlated variables
into a smaller set of uncorrelated variables called principal components. These
components capture the maximum variance in the data, allowing for effective
data compression and feature extraction.
PCA is widely used in exploratory data analysis, data preprocessing, image
compression, and noise reduction, particularly in high-dimensional data such as
images or genomics.
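A minimal scikit-learn sketch of this idea, using a small synthetic dataset with correlated columns (the data is invented only for illustration): the transformed components come out uncorrelated, and explained_variance_ratio_ reports the share of variance each component captures.

```python
import numpy as np
from sklearn.decomposition import PCA

# synthetic data in which the first two columns are strongly correlated
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([x, 0.8 * x + 0.2 * rng.normal(size=500), rng.normal(size=500)])

pca = PCA(n_components=2)
Z = pca.fit_transform(X)

print(pca.explained_variance_ratio_)            # share of total variance per component
print(np.corrcoef(Z, rowvar=False).round(3))    # off-diagonals ~0: components are uncorrelated
```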

2.2 The Mathematical Foundation of PCA


PCA works by identifying the directions (principal components) in which the
data varies the most. The main steps of PCA are as follows:

Step 1: Standardize the Data


PCA is sensitive to the scales of the features, so it is crucial to standardize the
data before applying PCA. Standardization transforms the data so that each
feature has a mean of 0 and a standard deviation of 1.
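A minimal NumPy sketch of this step, using a small hypothetical data matrix with samples as rows and features as columns:

```python
import numpy as np

# hypothetical dataset: 5 samples, 3 features on very different scales
X = np.array([[170.0, 65.0, 0.30],
              [160.0, 55.0, 0.25],
              [180.0, 80.0, 0.35],
              [175.0, 72.0, 0.28],
              [165.0, 60.0, 0.32]])

# standardize: each feature ends up with mean (approximately) 0 and standard deviation 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0).round(6), X_std.std(axis=0).round(6))
```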

Step 2: Compute the Covariance Matrix


The covariance matrix describes the variances of the features and the covariances between them. If the data has \(n\) features, the covariance matrix is an \(n \times n\) matrix where each element \(\mathrm{cov}(X_i, X_j)\) represents the covariance between features \(X_i\) and \(X_j\).
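In NumPy the covariance matrix of a standardized data matrix can be sketched as follows (the data here is hypothetical; rowvar=False treats columns as features):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                  # hypothetical data: 100 samples, n = 4 features
X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # Step 1: standardize

# Step 2: the n x n covariance matrix of the standardized features
cov = np.cov(X_std, rowvar=False)
print(cov.shape)                               # (4, 4)
```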

Step 3: Eigenvalue Decomposition


Eigenvalue decomposition is performed on the covariance matrix. This yields a
set of eigenvectors and eigenvalues. The eigenvectors represent the directions of
the new axes (principal components), and the eigenvalues indicate the amount
of variance captured by each component.
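A sketch of this step with numpy.linalg.eigh, which is suited to symmetric matrices such as a covariance matrix (the data is again hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
cov = np.cov(X_std, rowvar=False)

# Step 3: eigendecomposition of the symmetric covariance matrix;
# eigh returns eigenvalues in ascending order and the matching eigenvectors as columns
eigenvalues, eigenvectors = np.linalg.eigh(cov)
print(eigenvalues)                             # variance captured along each eigenvector direction
```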
Step 4: Sort and Select Principal Components

The eigenvectors are sorted in descending order based on their corresponding eigenvalues. The first few eigenvectors, with the largest eigenvalues, are selected to form the new feature space. These components capture the most significant variance in the data.
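Continuing the same hypothetical NumPy sketch, sorting by eigenvalue and keeping the top k eigenvectors gives the projection matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))

# Step 4: sort components by decreasing eigenvalue and keep the top k (here k = 2)
order = np.argsort(eigenvalues)[::-1]
k = 2
W = eigenvectors[:, order[:k]]                        # projection matrix, one column per component
print(eigenvalues[order[:k]] / eigenvalues.sum())     # fraction of variance each kept component explains
```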

Step 5: Transform the Data


The original dataset is projected onto the selected principal components to form
a new, lower-dimensional dataset.
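The projection itself is a single matrix product; a sketch completing the same hypothetical example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
W = eigenvectors[:, order[:2]]

# Step 5: project the standardized data onto the selected principal components
X_reduced = X_std @ W
print(X_reduced.shape)                        # (100, 2): same samples, fewer dimensions
```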

2.3 How PCA Works (Example)

Consider a dataset with two features: height and weight. These features may be
correlated, meaning that an increase in height could be associated with an
increase in weight. PCA will identify a new axis (the first principal component)
that captures the greatest variance, often a linear combination of
height and weight. The second principal component will capture the remaining
variance, orthogonal to the first component.

This transformation allows us to reduce the data from two dimensions (height
and weight) to one dimension, while preserving as much of the original variance
as possible.
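A minimal sketch of this two-feature example with scikit-learn; the correlated height and weight values below are synthetic, invented only to illustrate the reduction from two dimensions to one.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# synthetic, correlated height (cm) and weight (kg) measurements
rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=200)
weight = 0.9 * (height - 100) + rng.normal(0, 5, size=200)
X = StandardScaler().fit_transform(np.column_stack([height, weight]))

pca = PCA(n_components=1)
X_1d = pca.fit_transform(X)                   # two features reduced to one principal component

print(pca.explained_variance_ratio_)          # most of the original variance is retained
print(X_1d.shape)                             # (200, 1)
```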

3. APPLICATIONS OF PCA IN MACHINE LEARNING



3.1 Data Preprocessing and Noise Reduction


In many machine learning tasks, datasets can have a high level of noise or
irrelevant features. By applying PCA, we can reduce the dimensionality of the
data and filter out noise. This often leads to improved model performance, as
the algorithm focuses on the most important features.
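A minimal sketch of PCA-based noise filtering, assuming synthetic low-rank data with added noise; reconstructing from only the leading components discards much of the noise.

```python
import numpy as np
from sklearn.decomposition import PCA

# synthetic data: a 2-dimensional signal embedded in 20 noisy features
rng = np.random.default_rng(0)
signal = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 20))
noisy = signal + 0.3 * rng.normal(size=signal.shape)

# keep only the 2 dominant components and map back to the original feature space
pca = PCA(n_components=2)
denoised = pca.inverse_transform(pca.fit_transform(noisy))

# mean absolute deviation from the clean signal drops after reconstruction
print(np.abs(noisy - signal).mean(), np.abs(denoised - signal).mean())
```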

3.2 Feature Extraction


PCA can be used for feature extraction when the original features are
highly correlated or redundant. By projecting the data onto the principal
components, we can create a smaller, uncorrelated set of features that still retain
the essential information of the original dataset.
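A minimal sketch of PCA as a feature-extraction step in front of a classifier; the digits dataset and logistic regression model are assumed only for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

# 64 correlated pixel features are compressed into 20 uncorrelated components before classification
model = make_pipeline(StandardScaler(), PCA(n_components=20), LogisticRegression(max_iter=2000))
print(cross_val_score(model, X, y).mean())
```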

3.3 Visualization of High-Dimensional Data


One of the most common uses of PCA is in the visualization of high-dimensional
data. By reducing the data to two or three principal components,
we can visualize the structure and patterns within the data that might be hidden
in higher dimensions.
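A minimal visualization sketch; the Iris dataset is assumed here simply as an example of data with more than two dimensions.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

# project the 4-dimensional measurements onto the first two principal components
Z = PCA(n_components=2).fit_transform(X)

plt.scatter(Z[:, 0], Z[:, 1], c=y)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```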

3.4 Face Recognition


PCA has been widely used in image processing for facial recognition tasks.
In these applications, each image is considered a point in a high-dimensional
space (with the number of dimensions corresponding to the number of pixels in
the image). PCA helps reduce the dimensionality while preserving key features
like facial structure, enabling efficient face recognition algorithms.
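A minimal eigenfaces-style sketch, assuming a hypothetical gallery of flattened grayscale face images (random values stand in for real pixel data):

```python
import numpy as np
from sklearn.decomposition import PCA

# hypothetical gallery: 100 face images of 64x64 pixels, each flattened into one row
rng = np.random.default_rng(0)
faces = rng.random(size=(100, 64 * 64))

# compress 4096 pixel dimensions into 50 "eigenface" coordinates per image
pca = PCA(n_components=50)
codes = pca.fit_transform(faces)

# the principal components themselves can be reshaped back into face-sized images
eigenfaces = pca.components_.reshape(50, 64, 64)
print(codes.shape, eigenfaces.shape)           # (100, 50) (50, 64, 64)
```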

3.5 Principal Component Regression (PCR)


Principal Component Regression combines PCA with regression analysis. By
using the principal components as predictors in a regression model, PCR can
help avoid multicollinearity issues in datasets with highly correlated features,
leading to more stable and reliable predictions.
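A minimal Principal Component Regression sketch using a scikit-learn pipeline on synthetic, deliberately collinear data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# synthetic predictors with two nearly identical (collinear) columns
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=200)

# PCR: regress on a handful of uncorrelated principal components instead of the raw features
pcr = make_pipeline(StandardScaler(), PCA(n_components=4), LinearRegression())
pcr.fit(X, y)
print(pcr.score(X, y))
```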
4. ADVANTAGES AND LIMITATIONS OF PCA

4.1 Advantages
Reduces Dimensionality: PCA reduces the number of features, making it easier
and faster to process large datasets.
Improves Model Performance: By removing irrelevant or redundant features,
PCA can enhance the performance of machine learning models, especially in
terms of generalization.
Noise Reduction: PCA can filter out noise from the data, resulting in more
robust models.
Uncorrelated Features: After PCA, the transformed features (principal
components) are uncorrelated, which simplifies subsequent modelling
techniques.

4.2 Limitations
Linear Assumption: PCA assumes that the relationships between features are
linear, which may not be the case in all datasets. Non-linear dimensionality
reduction techniques (e.g., t-SNE, autoencoders) may be more appropriate in
such cases.
Interpretability: While PCA reduces the dimensionality of the data, the new
features (principal components) may not have an intuitive interpretation,
making it harder to understand what the components represent.
Loss of Information: Although PCA preserves the most important information,
some data variation is inevitably lost, particularly when reducing to a very low
number of dimensions.

5. CONCLUSION

Machine learning has transformed the way we approach data analysis, enabling
more accurate predictions, better decision-making, and automated systems.
Principal Component Analysis (PCA) plays a vital role in machine learning,
particularly in tasks involving high-dimensional data. By reducing the number
of features, PCA helps mitigate the curse of dimensionality, enhances model
performance, and facilitates data visualization. While PCA is an effective tool
for dimensionality reduction, it is important to consider its assumptions and
limitations, and to choose the appropriate technique based on the nature of the
data and the problem at hand.
As machine learning continues to evolve, techniques like PCA will remain
essential for dealing with complex, high-dimensional datasets, ensuring that
machine learning models are both efficient and accurate.

6. REFERENCES

1. Jolliffe, I. T. (2002). Principal Component Analysis. Springer-Verlag New York.
2. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
3. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical
Learning. Springer.
4. Shlens, J. (2014). A Tutorial on Principal Component Analysis.
arXiv:1404.1100.
5. Scikit-learn Documentation. (n.d.). Principal Component Analysis. Retrieved from https://scikit-learn.org/stable/modules/decomposition.html#pca

This seminar report provides a comprehensive understanding of machine learning and the crucial role of PCA in reducing the complexity of data while
maintaining its essential features. Through this exploration, we see how
dimensionality reduction techniques can significantly enhance the efficiency
and performance of machine learning algorithms.
