
BHARATIYA VIDYA BHAVAN’S

SARDAR PATEL INSTITUTE OF TECHNOLOGY


MUNSHI NAGAR, ANDHERI (WEST), MUMBAI – 400 058.
Computer Engineering
Python Programming for Data Science
A. Y. 2024-25

Experiment 7

Aim: Implementation of feature reduction techniques.

Theory:

Introduction to Feature Reduction Techniques:

Feature reduction techniques are essential tools in machine learning and data science, used to
simplify models by reducing the number of input variables while preserving the most relevant
information. These methods enhance computational efficiency, minimize the risk of overfitting,
and improve the interpretability of models. By focusing on the most significant features, these
techniques allow models to perform better, especially when dealing with high-dimensional
datasets. Common approaches include Principal Component Analysis (PCA), Linear
Discriminant Analysis (LDA), and feature selection methods like Recursive Feature
Elimination (RFE). These techniques are widely used across various domains to enhance model
performance and reduce complexity.

Key Features of Feature Reduction Techniques Include:

- Dimensionality Reduction: Reduces the number of features in high-dimensional datasets while preserving the most informative ones, which improves model accuracy and efficiency.

- Mitigating Overfitting: Simplifying models by reducing features can prevent overfitting, especially in cases where the model has more features than data points.

- Improved Interpretability: With fewer features, it becomes easier to understand and interpret
the patterns in the data.

- Faster Computation: Fewer features lead to quicker training and prediction times, which is
critical for large datasets or complex models.

Key Feature Reduction Techniques:

1. Principal Component Analysis (PCA):

- PCA is a linear technique that reduces the dimensionality of the data by finding a new set of
orthogonal features (principal components) that capture the most variance in the data.

- Use case: Dimensionality reduction for visualization, noise reduction, and improving model
efficiency.



2. Linear Discriminant Analysis (LDA):

- LDA reduces the feature space by focusing on maximizing the separation between different
classes. It finds a lower-dimensional space that best separates the target classes.

- Use case: Commonly used for supervised classification tasks with multiple classes.

3. Singular Value Decomposition (SVD):

- It is a powerful technique in linear algebra that is used for dimensionality reduction, data
compression, and feature extraction. It decomposes a matrix into three other matrices, revealing
its fundamental structure. In machine learning, SVD is often applied to reduce the number of
features in a dataset while preserving important information.

How SVD Works:

Given a matrix A of shape m×n, SVD decomposes it into three matrices:

A = UΣVᵀ

Where:

● U: An m×m orthogonal matrix. Each column of U is a left singular vector.
● Σ: An m×n diagonal matrix whose diagonal entries, called singular values, indicate the importance (or magnitude) of the corresponding singular vectors.
● Vᵀ: The transpose of an n×n orthogonal matrix V. Each row of Vᵀ (each column of V) is a right singular vector.
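
As an illustration of this decomposition, the short sketch below applies NumPy's np.linalg.svd to a small random matrix; the matrix values and shapes are arbitrary choices for demonstration, not part of the lab dataset.

# Illustrative sketch: decomposing a small matrix with NumPy's SVD
import numpy as np

A = np.random.rand(4, 3)  # a 4x3 matrix (m=4, n=3), arbitrary data

# full_matrices=False returns the compact form (U is 4x3, S has 3 values, Vt is 3x3);
# full_matrices=True would return the full m×m U and n×n V described above
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstructing A from the three factors: A = U @ diag(S) @ Vt
A_reconstructed = U @ np.diag(S) @ Vt
print(np.allclose(A, A_reconstructed))  # True: the factorization recovers A

# Keeping only the largest singular value gives a rank-1 approximation of A
A_rank1 = U[:, :1] @ np.diag(S[:1]) @ Vt[:1, :]
print(S)  # singular values, in descending order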

Basic Syntax and Operations for Feature Reduction:

1. Principal Component Analysis (PCA):

# Importing necessary libraries
from sklearn.decomposition import PCA  # PCA for feature reduction
import numpy as np  # NumPy for handling data arrays

# Sample data: 100 samples, each with 5 features
# Generating random data with 100 rows and 5 columns
data = np.random.rand(100, 5)

# Applying PCA to reduce the number of features to 2
# PCA will find the two principal components that capture the most variance in the data
pca = PCA(n_components=2)

# Transforming the data by projecting it onto the two principal components
reduced_data = pca.fit_transform(data)

# Output the explained variance ratio of the two principal components
# This shows how much variance in the data is explained by each of the two components
print(pca.explained_variance_ratio_)



2. Linear Discriminant Analysis (LDA):

# Importing necessary libraries
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA  # LDA for dimensionality reduction
from sklearn.datasets import load_iris  # Loading a sample dataset (Iris dataset)

# Loading the Iris dataset
# X represents the features (sepal length, sepal width, petal length, petal width)
# y represents the target labels (species of iris flowers)
iris = load_iris()
X, y = iris.data, iris.target

# Applying LDA to reduce the data to 2 components
# LDA will find two linear combinations of the features that best separate the classes
lda = LDA(n_components=2)

# Transforming the data by projecting it onto the two linear discriminants
lda_data = lda.fit_transform(X, y)

# Displaying the reduced data
# This will print the transformed data with two components for each sample
print(lda_data)

3. Singular Value Decomposition (SVD):

# Importing necessary libraries
from sklearn.decomposition import TruncatedSVD  # SVD for dimensionality reduction
import numpy as np  # NumPy for handling data arrays

# Sample data: 100 samples, each with 5 features
# Generating random data with 100 rows and 5 columns
data = np.random.rand(100, 5)

# Applying SVD to reduce the number of features to 2
# TruncatedSVD is used here to compute a low-rank approximation of the data
svd = TruncatedSVD(n_components=2)

# Transforming the data by projecting it onto the two singular vectors
reduced_data = svd.fit_transform(data)

# Output the explained variance ratio
# This shows how much variance in the data is captured by each of the selected components
print(svd.explained_variance_ratio_)

Overfitting and Underfitting: A Detailed Explanation

In machine learning, the goal is to build models that generalize well to unseen data. However, if
a model is too complex or too simple, it may suffer from overfitting or underfitting, which are
two common pitfalls in model training. Understanding these concepts is key to achieving a
balance between a model’s performance on the training data and its ability to generalize to new
data.



1. Overfitting

Definition:

Overfitting occurs when a model is too complex and learns not only the underlying patterns in
the training data but also the noise and irrelevant details. As a result, the model performs very
well on the training data but poorly on unseen test data. This happens because the model fits
the data too closely, capturing random fluctuations that do not represent the actual data
distribution.

Causes of Overfitting:

- Too many features: If there are more features than necessary, the model may pick up on
irrelevant relationships in the training data.

- High model complexity: Complex models (like deep neural networks or very deep decision
trees) have a large number of parameters, which can cause them to memorize the training data.

- Insufficient training data: When the dataset is too small, the model has fewer examples to
learn from, leading it to capture noise or spurious patterns.

- Excessive training: Training a model for too many epochs can lead to overfitting, as the
model starts learning minor variations in the data instead of general patterns.

Symptoms of Overfitting:

- High training accuracy, low test accuracy: The model shows high performance on training
data but performs poorly on validation/test data.

- Large gap between training and validation error: The training error keeps decreasing, but the
validation error stops decreasing (or starts increasing).

Example of Overfitting:

Imagine fitting a highly complex curve (with many polynomial terms) to data points that follow
a simpler underlying trend. The model fits every data point perfectly, including noise or
outliers, but fails to predict new data points accurately.
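
The sketch below illustrates this behaviour with scikit-learn; the synthetic data, noise level, and polynomial degrees are arbitrary choices for illustration, not part of the lab exercise.

# Illustrative sketch: a high-degree polynomial overfitting noisy data from a simple trend
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * X.ravel() + rng.normal(scale=0.5, size=60)  # simple linear trend plus noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compare a simple model (degree 1) with a very flexible one (degree 15);
# the flexible model typically reaches a lower training error but a higher test error
for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(X_train)),  # training MSE
          mean_squared_error(y_test, model.predict(X_test)))    # test MSE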

How to Prevent Overfitting:

- Cross-validation: Use techniques like k-fold cross-validation to ensure that the model
generalizes well.

- Regularization: Techniques like L1 (Lasso) and L2 (Ridge) regularization add penalties to the model's complexity to prevent overfitting (illustrated in the sketch after this list).

- Reduce model complexity: Simplify the model by using fewer features or parameters (e.g.,
pruning a decision tree).

- Early stopping: Stop the training process when the validation error starts increasing, even if
the training error keeps decreasing.

- Ensemble methods: Use techniques like bagging (e.g., Random Forests) or boosting (e.g.,
XGBoost) to combine multiple models and reduce overfitting.

- Data augmentation: In cases where data is limited, artificially increase the training set by
creating variations of the existing data (commonly used in image recognition tasks).
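
A minimal sketch of the cross-validation and L2-regularization ideas mentioned above; the synthetic data and the Ridge alpha value are arbitrary illustrative choices.

# Illustrative sketch: 5-fold cross-validation comparing an unregularized linear model
# with an L2-regularized (Ridge) model on overfitting-prone data
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.rand(50, 40)                     # few samples, many features: prone to overfitting
y = X[:, 0] + 0.1 * rng.normal(size=50)  # target depends mainly on the first feature

# Mean R^2 score over 5 folds for each model
print(cross_val_score(LinearRegression(), X, y, cv=5).mean())
print(cross_val_score(Ridge(alpha=1.0), X, y, cv=5).mean())

On data like this, with far more features than informative signal, the penalized model usually obtains the higher cross-validated score, showing how regularization improves generalization.
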
2. Underfitting

Definition:

Underfitting occurs when a model is too simple and fails to capture the underlying patterns in
the training data. The model performs poorly on both training and test data because it cannot
represent the data well. This typically happens when the model is not complex enough to
account for the variability in the data.

Causes of Underfitting:

- Too few features: The model doesn’t have enough features to capture the relationships in the
data.

- High bias: Simple models, like linear regression for complex data, make strong assumptions
about the data (e.g., linearity) and may not capture more intricate patterns.

- Undertrained model: The model hasn’t been trained long enough to learn from the data, or the
algorithm used may not be appropriate for the complexity of the problem.

- Wrong choice of model: Using a model that is inherently too simple for the task, such as
applying linear regression to data that requires a non-linear model.

Symptoms of Underfitting:

- High training error and high test error: Both the training and test errors are large because the
model is too simple to learn the data’s structure.

- Low variance, high bias: The model has little flexibility to adapt to the data and hence has a
strong bias (predefined assumption about the data).

Example of Underfitting:

Suppose you have data that follows a non-linear pattern (like a curve), but you fit a straight line
(linear regression) to it. The model will not capture the underlying trend and will perform
poorly even on the training data.

How to Prevent Underfitting:

- Increase model complexity: Use a more complex model or add more features to the model (e.g., use polynomial regression instead of linear regression for non-linear data), as shown in the sketch after this list.

- Remove bias: Try using models with lower bias, such as decision trees or neural networks,
which have more flexibility in fitting the data.

- Train longer: Ensure that the model is trained sufficiently, allowing it to learn the patterns in
the data.

- Feature engineering: Create new features or transformations of existing ones to better represent the underlying data patterns.
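
A minimal sketch of the polynomial-regression remedy mentioned above; the quadratic data and the degree used are arbitrary illustrative choices.

# Illustrative sketch: adding polynomial features to fix an underfit linear model
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.3, size=100)  # non-linear (quadratic) pattern

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

# R^2 on the training data itself: the straight line cannot capture the curve,
# while the degree-2 model fits it well
print(linear.score(X, y))
print(poly.score(X, y))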

3. Balancing Overfitting and Underfitting (Bias-Variance Tradeoff)

To achieve good model performance, we need to find the right balance between overfitting and
underfitting. This balance is often referred to as the bias-variance tradeoff:

- High bias (underfitting) occurs when the model is too simplistic and makes strong assumptions about the data (e.g., assuming a linear relationship when the data is non-linear).

- High variance (overfitting) occurs when the model is too flexible and captures noise in the
data, leading to poor generalization to new data.

The goal is to build a model that has low bias and low variance by tuning the complexity of the
model and using techniques like regularization, cross-validation, or ensemble learning to ensure
good generalization.

Visualization of Overfitting and Underfitting

Imagine plotting a model's performance as it grows in complexity:

1. Underfitting: The model is too simple and cannot capture patterns in the data.

2. Good fit: The model is just complex enough to capture the patterns without fitting noise.

3. Overfitting: The model is too complex and fits the training data perfectly, but performs
poorly on new data.

The ideal model is somewhere in between, where it is complex enough to learn the data but not
so complex that it learns noise.
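
This progression can be traced numerically with scikit-learn's validation_curve, as in the sketch below; the sine-shaped data and the range of polynomial degrees are arbitrary illustrative choices.

# Illustrative sketch: training vs cross-validation score as model complexity
# (polynomial degree) grows
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.2, size=80)

degrees = np.arange(1, 13)
model = make_pipeline(PolynomialFeatures(), LinearRegression())
train_scores, val_scores = validation_curve(
    model, X, y,
    param_name="polynomialfeatures__degree",  # parameter of the PolynomialFeatures step
    param_range=degrees, cv=5)

# Mean scores per degree: low degrees tend to underfit (both scores low), very high
# degrees tend to overfit (training score high, validation score falls behind)
for d, tr, va in zip(degrees, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(d, round(tr, 3), round(va, 3))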

Conclusion: This lab manual provides a thorough introduction to key techniques in feature reduction, including methods like PCA, LDA, and SVD. It covers fundamental
concepts, such as reducing dimensionality while preserving essential data patterns,
optimizing model performance, and addressing the curse of dimensionality. Through
practical examples and exercises, you will gain a strong understanding of how to
effectively apply these techniques to simplify datasets without losing important
information. This knowledge will equip you with the skills to enhance model efficiency,
improve interpretability, and tackle complex problems in data science, machine
learning, and business analytics.

** In conclusion, students are expected to write their own individual learnings from the said
experiments.
