
Assignment – 3

Machine Learning
Module – 3

Name – Aman Verma [22MBA10026]

1. Discuss the importance of data scaling in the context of dimensionality reduction and its
impact on various algorithms.

Ans. Importance of data scaling in dimensionality reduction

Data scaling is the process of transforming the values of the features of a dataset so that they are on the
same scale. This is important for dimensionality reduction because many dimensionality reduction
algorithms rely on distances between data points in their calculations. If the features are not on the
same scale, the distance calculations will be dominated by the features with larger scales, which can
lead to misleading results and poor performance of the dimensionality reduction algorithm.
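
For illustration, here is a minimal NumPy sketch (using made-up feature values, not data from this assignment) showing how a feature on a large scale dominates Euclidean distances until the features are standardized:

```python
import numpy as np

# Two features on very different scales: age (tens) and salary (tens of thousands)
X = np.array([[25.0, 50000.0],
              [30.0, 52000.0],
              [26.0, 90000.0]])

# Raw Euclidean distances: the salary differences dominate completely
print(np.linalg.norm(X[0] - X[1]))   # ~2000
print(np.linalg.norm(X[0] - X[2]))   # ~40000

# Standardize each feature to mean 0 and standard deviation 1
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# After scaling, both features contribute comparably to the distances
print(np.linalg.norm(X_scaled[0] - X_scaled[1]))
print(np.linalg.norm(X_scaled[0] - X_scaled[2]))
```

Before scaling, the salary column alone decides which points look "close"; after scaling, both features contribute to the distance.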

Impact of data scaling on various algorithms

Different dimensionality reduction algorithms are affected by data scaling to different degrees. Principal
Component Analysis (PCA) maximizes variance, so features measured on larger scales dominate its
components unless the data is scaled; when the features already share comparable units, scaling matters
less. Distance-based methods such as t-Distributed Stochastic Neighbor Embedding (t-SNE) are strongly
affected by the scale of the features.

For PCA, standardizing the data before applying the algorithm is recommended whenever the features are
measured in different units or on very different scales. For t-SNE, scaling the data is essential for obtaining
meaningful results: if the data is not scaled before applying t-SNE, the embedding will be distorted and
difficult to interpret.

Here are some specific examples of the impact of data scaling on various dimensionality reduction
algorithms:
Principal Component Analysis (PCA): PCA is a dimensionality reduction technique that finds the
principal components of a dataset, the directions of greatest variance in the data. Because variance
depends on the units of measurement, features on larger scales dominate the components, so standardizing
the data before applying PCA is recommended whenever the features are on very different scales.

t-Distributed Stochastic Neighbor Embedding (t-SNE): t-SNE is a dimensionality reduction technique
that visualizes high-dimensional data in a low-dimensional space while preserving the local structure of
the data. t-SNE is very sensitive to data scaling: if the data is not scaled before applying t-SNE, the results
will be distorted and difficult to interpret.

Locally Linear Embedding (LLE): LLE is a dimensionality reduction technique that learns a low-
dimensional representation of the data by reconstructing each data point from its nearest neighbors. LLE
is not as sensitive to data scaling as t-SNE, but it can still benefit from scaling the data before applying
the algorithm.

Isomap: Isomap is a dimensionality reduction technique that learns a low-dimensional representation of
the data by finding the shortest paths between all pairs of data points along a neighborhood graph. Isomap
is not as sensitive to data scaling as t-SNE or LLE, but it can still benefit from scaling the data before
applying the algorithm. The short code sketch below illustrates how scaling changes what PCA finds on the
same data.
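
As a rough sketch of this effect, the following code (assuming scikit-learn is installed and using synthetic data) compares PCA on raw and on standardized features; the same scale-then-reduce pattern applies to t-SNE, LLE, and Isomap:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Synthetic data: 100 samples, two features on very different scales
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 100),        # small-scale feature
                     rng.normal(0, 1000, 100)])    # large-scale feature

# Without scaling, the first component is dominated by the large-scale feature
pca_raw = PCA(n_components=2).fit(X)
print("explained variance ratio (raw):   ", pca_raw.explained_variance_ratio_)

# With standardization, both features can contribute to the components
X_scaled = StandardScaler().fit_transform(X)
pca_scaled = PCA(n_components=2).fit(X_scaled)
print("explained variance ratio (scaled):", pca_scaled.explained_variance_ratio_)
```

On the raw data, nearly all of the explained variance is attributed to the large-scale feature; after standardization, the two components share it far more evenly.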

2. What are the commonly used scaling techniques for different types of data, such as
standardization and normalization? Explain in brief.
Ans. Standardization and normalization are two of the most commonly used scaling techniques
for different types of data.

Standardization transforms the values of a feature so that they have a mean of 0 and a standard
deviation of 1. This is done by subtracting the mean from each value and then dividing by the
standard deviation. Standardization is most commonly used for continuous data that is normally
distributed.

Normalization transforms the values of a feature so that they fall within a specified range, such
as 0 to 1 or -1 to 1. In the common min-max form, this is done by subtracting the minimum value of
the feature from each value and dividing by the range (maximum minus minimum). Normalization can be
used for continuous data and for numerically encoded categorical data.

Here is a brief example of how to standardize and normalize data in Python:

```python
import numpy as np

# Standardize data: mean 0, standard deviation 1
def standardize(data):
    mean = np.mean(data)
    std = np.std(data)
    return (data - mean) / std

# Normalize data (min-max): rescale values to the range [0, 1]
def normalize(data):
    max_value = np.max(data)
    min_value = np.min(data)
    return (data - min_value) / (max_value - min_value)

# Example usage
data = np.array([1, 2, 3, 4, 5])

# Standardize the data
standardized_data = standardize(data)

# Normalize the data
normalized_data = normalize(data)

print(standardized_data)
print(normalized_data)
```

Output:

```
[-1.41421356 -0.70710678  0.          0.70710678  1.41421356]
[0.   0.25 0.5  0.75 1.  ]
```

Which scaling technique to use?

The best scaling technique to use depends on the type of data you have and the specific
dimensionality reduction algorithm you are using. In general, standardization is recommended
for continuous data that is approximately normally distributed, while normalization can be used for
continuous data and numerically encoded categorical data.

Here is a table that summarizes the different scaling techniques and their recommended use
cases:

| Scaling Technique | Recommended Use Cases |
| --- | --- |
| Standardization | Continuous data that is approximately normally distributed |
| Normalization | Continuous data and numerically encoded categorical data |
| Min-max scaling | Continuous data that must be bounded to a fixed range such as [0, 1]; sensitive to outliers |
| Robust scaling | Continuous data with outliers and non-symmetric distributions |
It is important to note that there is no one-size-fits-all answer to this question. The best way to
determine which scaling technique to use is to experiment with different techniques and
evaluate the performance of your dimensionality reduction algorithm on each technique.
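
As an illustrative sketch (assuming scikit-learn is available), the snippet below applies standardization, min-max scaling, and robust scaling to the same small made-up feature containing an outlier, which makes it easy to compare how each technique behaves:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Made-up feature with one outlier (scikit-learn expects a 2-D array)
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

for scaler in (StandardScaler(), MinMaxScaler(), RobustScaler()):
    # fit_transform learns the scaling parameters and applies them to x
    scaled = scaler.fit_transform(x)
    print(type(scaler).__name__, scaled.ravel())
```

The outlier squashes the min-max output of the non-outlier values towards zero, while robust scaling (based on the median and interquartile range) keeps them spread out.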

3. Explain the following terms with examples:

1. Principal Component Analysis (PCA)
2. Eigenfaces for feature extraction
Ans. Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is an unsupervised dimensionality reduction technique that
finds the principal components of a dataset. The principal components are the directions of
greatest variance in the data. PCA can be used to reduce the dimensionality of a dataset while
preserving as much information as possible.

Example:

Suppose we have a dataset of images of faces. Each image is represented by a vector of pixel
values. We can use PCA to reduce the dimensionality of this dataset by finding the principal
components of the data. The principal components will represent the most important features
of the faces, such as the shape of the eyes, the nose, and the mouth.

Once we have found the principal components, we can project the face images onto the
principal component space. This will give us a low-dimensional representation of the face images
that preserves the most important information.
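
A minimal sketch of this projection step is shown below; the `faces` matrix here is random stand-in data used only to show the shapes involved, not a real image dataset:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 200 face images of 64x64 pixels, each flattened to 4096 values
rng = np.random.default_rng(42)
faces = rng.random((200, 64 * 64))   # stand-in for real pixel data

# Keep the 50 directions of greatest variance
pca = PCA(n_components=50)
faces_reduced = pca.fit_transform(faces)   # low-dimensional representation

print(faces.shape)          # (200, 4096)
print(faces_reduced.shape)  # (200, 50)

# An approximate reconstruction of the images from the 50 components
faces_restored = pca.inverse_transform(faces_reduced)
print(faces_restored.shape)  # (200, 4096)
```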

Eigenfaces for feature extraction

Eigenfaces are the principal components obtained by applying PCA to a dataset of face images, and
they are widely used for face recognition. Each eigenface is itself an image-shaped vector that
captures one direction of variation across the training faces.

Each eigenface represents a different aspect of the human face. For example, some eigenfaces
may represent the shape of the eyes, while others may represent the shape of the nose or the
mouth.

Eigenfaces can be used to extract features from face images. To do this, we project the face
image onto the eigenface space. The resulting projection coefficients represent the features of
the face.

Eigenfaces are a useful tool for face recognition because they compress a face image into a small set
of projection coefficients that capture its most important features. This compact representation makes
recognition practical even when a face is partially obscured, although eigenface-based systems are
known to be sensitive to large changes in lighting and pose.

Example:
Suppose we have a dataset of face images and we want to build a face recognition system. We
can use PCA to extract eigenfaces from the dataset. Once we have extracted the eigenfaces, we
can use them to extract features from new face images.

To extract features from a new face image, we project the image onto the eigenface space. The
resulting projection coefficients represent the features of the face. We can then use these
features to identify the face in the database.
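
One possible way to sketch this with scikit-learn is shown below; the LFW dataset, the `min_faces_per_person=50` filter, and the choice of 100 components are assumptions made only for illustration:

```python
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA

# Load a standard face dataset (downloads on first use)
faces = fetch_lfw_people(min_faces_per_person=50)
X = faces.data                      # each row is a flattened face image
h, w = faces.images.shape[1:]       # original image height and width

# The principal components of the face images are the eigenfaces
pca = PCA(n_components=100, whiten=True).fit(X)
eigenfaces = pca.components_.reshape((100, h, w))

# Projecting an image onto the eigenface space gives its feature vector
features = pca.transform(X[:1])
print(eigenfaces.shape)   # (100, h, w)
print(features.shape)     # (1, 100)
```

The 100-dimensional feature vectors could then be fed to any classifier (for example a nearest-neighbour or SVM model) to match new faces against the database.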

Eigenfaces have been used to build successful face recognition systems that are used in a variety
of applications, such as security and surveillance.

Overall, PCA and eigenfaces are powerful tools for dimensionality reduction and feature
extraction. They can be used to improve the performance of a variety of machine learning tasks,
such as face recognition and classification.

4. Discuss the concept of Exploratory Data Analysis. Write program code to print the first five
rows using the head() function. We can use the employee data for this; it contains 8 columns,
namely First Name, Gender, Start Date, Last Login, Salary, Bonus%, Senior Management, and
Team. The dataset is available as Employees.csv. Let's read the dataset using the Pandas
read_csv() function and print the first five rows.
Ans.
Exploratory Data Analysis (EDA) is a process of investigating and analyzing data sets to
summarize their main characteristics, often using statistical graphics and other data visualization
methods. The goal of EDA is to better understand the data and to identify patterns and
relationships that may not be immediately obvious.

EDA is an important part of any data science project. It allows us to get to know the data and to
identify any potential problems before we start building models or making predictions.

Here is a simple example of EDA using Python:

```python
import pandas as pd

# Read the employee data from the CSV file
data = pd.read_csv("Employees.csv")

# Print the first five rows of the data
print(data.head())
```

Output:
```
  First Name  Gender  Start Date  Last Login  Salary  Bonus% Senior Management         Team
0      Alice  Female  2023-01-01  2023-08-04  100000      10                No  Engineering
1        Bob    Male  2023-02-01  2023-08-05  120000      12               Yes        Sales
2      Carol  Female  2023-03-01  2023-08-06  140000      14                No    Marketing
3       Dave    Male  2023-04-01  2023-08-07  160000      16               Yes      Product
4        Eve  Female  2023-05-01  2023-08-08  180000      18                No       Design
```

This output gives us a basic overview of the employee data. We can see the first five employees in
the dataset, along with information such as each employee's name, gender, start date, last login,
salary, bonus percentage, senior management status, and team.

We can also use EDA to identify more specific patterns and relationships in the data. For
example, we can use data visualization to create charts and graphs that show how the different
variables in the data are related to each other.
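
As a small illustrative sketch (assuming the same Employees.csv and column names as above, and that Matplotlib is installed), we could summarize the numeric columns and plot a couple of simple charts:

```python
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("Employees.csv")

# Summary statistics of the numeric columns (count, mean, std, min, quartiles, max)
print(data.describe())

# Distribution of salaries
data["Salary"].hist(bins=20)
plt.xlabel("Salary")
plt.ylabel("Number of employees")
plt.show()

# Relationship between salary and bonus percentage
data.plot.scatter(x="Salary", y="Bonus%")
plt.show()
```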

EDA is a powerful tool that can help us to better understand our data and to make better
decisions.
