
Unit No. 02 - Feature Extraction and Selection

What is Dimensionality Reduction?


The number of input features, variables, or columns present in a given dataset is known as its dimensionality, and the process of reducing these features is called dimensionality reduction.

In many cases a dataset contains a huge number of input features, which makes the predictive modeling task more complicated. Because it is very difficult to visualize or make predictions for a training dataset with a high number of features, dimensionality reduction techniques are required in such cases.

A dimensionality reduction technique can be defined as "a way of converting a higher-dimensional dataset into a lower-dimensional dataset while ensuring that it provides similar information." These techniques are widely used in machine learning to obtain a better-fitting predictive model when solving classification and regression problems.

Dimensionality reduction is commonly used in fields that deal with high-dimensional data, such as speech recognition, signal processing, and bioinformatics. It can also be used for data visualization, noise reduction, cluster analysis, etc.

The Curse of Dimensionality


Handling high-dimensional data is very difficult in practice; this is commonly known as the curse of dimensionality. If the dimensionality of the input dataset increases, any machine learning algorithm or model becomes more complex. As the number of features increases, the number of samples required to generalize well also increases, and the chance of overfitting grows. If a machine learning model is trained on high-dimensional data with relatively few samples, it becomes overfitted and performs poorly.


Hence, it is often required to reduce the number of features, which can be done with

dimensionality reduction.

Benefits of applying Dimensionality Reduction

Some benefits of applying a dimensionality reduction technique to a given dataset are listed below:

o By reducing the dimensions of the features, the space required to store the dataset is also reduced.
o Less computation/training time is required with a reduced number of feature dimensions.
o Reduced feature dimensions help in visualizing the data quickly.
o It removes redundant features (if present) by taking care of multicollinearity.

Disadvantages of Dimensionality Reduction

There are also some disadvantages of applying dimensionality reduction, which are given below:

o Some information may be lost due to dimensionality reduction.
o In the PCA technique, the number of principal components to retain is sometimes not known in advance.

Methods of Dimensionality Reduction (Feature Reduction)

There are two ways to apply dimensionality reduction:

Feature Extraction:
o Principal Component Analysis
o Linear Discriminant Analysis
o Kernel PCA
o Quadratic Discriminant Analysis

Feature Selection:
o Correlation
o Chi-Square Test
o Forward Selection
o Backward Selection
o Information Gain
o LASSO & Ridge Regression

Feature Extraction-
Feature extraction is usually used when the original raw data is very different from what a machine learning model can consume directly, so we transform the raw data into the desired form. Feature extraction is the method of creating a new, smaller set of features that captures most of the useful information in the raw data.
When we work on real-world machine learning problems, we rarely get data in the shape of a CSV file, so we have to extract useful features from the raw data. Some of the popular types of raw data from which new features can be extracted are listed below (a small text-feature sketch follows the list):

o Texts
o Images
o Geospatial data
o Date and time
o Web data
o Sensor data
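As an illustration of feature extraction from raw text, the sketch below uses scikit-learn's CountVectorizer to turn a few made-up sentences into a bag-of-words feature matrix; the sample texts are assumptions for illustration, not data from the notes.

```python
# A minimal sketch of feature extraction from raw text, assuming scikit-learn
# is available. The sample sentences are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer

raw_texts = [
    "the machine vibrates at high speed",
    "the bearing temperature is high",
    "vibration and temperature are normal",
]

# Each document becomes a row of word counts (a bag-of-words feature vector).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(raw_texts)

print(vectorizer.get_feature_names_out())  # the extracted vocabulary (feature names)
print(X.toarray())                         # the new, numeric feature matrix
```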

Statistical Features-
Statistics is a branch of mathematics that deals with collecting, analyzing,

interpreting, and visualizing empirical data. Descriptive statistics and inferential statistics
are the two major areas of statistics. Descriptive statistics are for describing the properties

of sample and population data (what has happened). Inferential statistics use those
properties to test hypotheses, reach conclusions, and make predictions (what can you

expect).

Important statistical features are as follows:

o Median
o Mode
o Mean
o Percentile
o Bias
o Variance
o Standard Deviation (S.D.)
o Mean Absolute Deviation (M.A.D.)
o Z-Score
o Skewness
o Kurtosis

o Mean - The mean is the most commonly used measure of central tendency. It represents the average of the given collection of data and is equal to the sum of all the values in the collection divided by the total number of values. Suppose we have n values in a data set, namely x1, x2, x3, ..., xn; then the mean of the data is given by:

Mean = (x1 + x2 + x3 + ... + xn) / n

o Median - The median represents the middle value of the given set of data when the values are arranged in a particular order.

o Mode - The most frequently occurring value in the data set is known as the mode.

o Percentile - A percentile describes how a score compares to the other scores from the same set. For example, for the ordered data 100, 300, 400, 500, 800, 900, 1000, the 0th percentile is 100, the 50th percentile (the median) is 500, and the 100th percentile is 1000; the interquartile range (IQR) is the spread between the 25th and 75th percentiles.
o Bias - When there is a systematic difference between the true parameters and the results we estimate, it is called bias.

o Variance - Variance gives us a measure of how much the data points differ from the mean. It can also be treated as a measure that tells us how far the data points are spread out.

o Standard Deviation (S.D.) - The standard deviation is a measure of the amount of variation or dispersion of a set of values.

o Z-Score - The z-score tells us how many standard deviations a data point is away from the mean.

o Skewness - Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.

o Kurtosis - Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.
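The following sketch shows how these statistical features can be computed in Python, assuming NumPy and SciPy are available; the small data sample is made up purely for illustration.

```python
# A minimal sketch of computing the statistical features above.
# NumPy and SciPy are assumed; the sample data is hypothetical.
import numpy as np
from scipy import stats

data = np.array([100, 300, 400, 400, 500, 800, 900, 1000])

print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Mode:", stats.mode(data).mode)
print("25th/75th percentiles:", np.percentile(data, [25, 75]))
print("Variance:", np.var(data, ddof=1))            # sample variance
print("Standard deviation:", np.std(data, ddof=1))  # sample S.D.
print("Mean absolute deviation:", np.mean(np.abs(data - np.mean(data))))
print("Z-scores:", stats.zscore(data))
print("Skewness:", stats.skew(data))
print("Kurtosis:", stats.kurtosis(data))            # excess kurtosis
```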

Principal Component Analysis-


PCA (Principal Component Analysis) is a statistical technique that is widely used for data
reduction and dimensionality reduction in various fields such as machine learning, data
science, and signal processing. PCA is a linear transformation technique that finds the
directions of the maximum variance in a dataset and projects the data onto a lower-
dimensional space.

PCA can be divided into the following steps:

1. Standardize the data: The first step in PCA is to standardize the data so that each
variable has zero mean and unit variance. This is done to ensure that all variables
are on the same scale.
2. Calculate the covariance matrix: The next step is to calculate the covariance matrix, which measures the linear relationship between each pair of variables. For standardized data, the covariance matrix is obtained by multiplying the transpose of the data matrix by the data matrix itself and dividing by the number of samples minus one.
3. Calculate the eigenvectors and eigenvalues of the covariance matrix: The eigenvectors and eigenvalues of the covariance matrix are calculated to find the directions of maximum variance in the dataset. Eigenvectors are the directions and eigenvalues are the magnitudes of the variance.
4. Select the principal components: The eigenvectors are sorted by their
corresponding eigenvalues in descending order. The eigenvectors with the largest
eigenvalues are called the principal components. These principal components
represent the directions of maximum variance in the dataset.
5. Project the data onto the principal components: Finally, the data is projected onto
the principal components to reduce the dimensionality of the dataset. The number
of principal components to be selected depends on the desired amount of variance
to be retained in the data.

PCA is useful in several applications such as image compression, feature extraction, data
visualization, and data analysis. PCA can also be used in combination with other machine
learning algorithms to improve their performance by reducing the dimensionality of the
dataset.
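A minimal from-scratch sketch of these five steps using NumPy is shown below; the synthetic data and the choice of two retained components are illustrative assumptions.

```python
# A minimal PCA sketch following the steps above. Rows of X are samples,
# columns are features; the random data is synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # 100 samples, 5 features (hypothetical)

# 1. Standardize: zero mean and unit variance per feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data.
cov = (X_std.T @ X_std) / (X_std.shape[0] - 1)

# 3. Eigenvectors (directions) and eigenvalues (variance magnitudes).
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort by eigenvalue in descending order and keep the top k components.
order = np.argsort(eigvals)[::-1]
k = 2
components = eigvecs[:, order[:k]]

# 5. Project the data onto the principal components.
X_reduced = X_std @ components
print(X_reduced.shape)                     # (100, 2)
```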

Feature Selection
Feature selection is a way of selecting a subset of the most relevant features from the original feature set by removing redundant, irrelevant, or noisy features.

Need for Feature Selection

o While developing a machine learning model, only a few variables in the dataset are useful for building the model; the rest of the features are either redundant or irrelevant.

o If we input the dataset with all these redundant and irrelevant features, it may negatively impact and reduce the overall performance and accuracy of the model.

o Hence it is very important to identify and select the most appropriate features from the data and remove the irrelevant or less important features, which is done with the help of feature selection in machine learning.

o Feature selection is one of the important concepts of machine learning, and it highly impacts the performance of the model.

Feature Selection Techniques-


There are mainly two types of Feature Selection techniques, which are:

Supervised Feature Selection technique

Supervised feature selection techniques consider the target variable and can be used for labelled datasets.

Unsupervised Feature Selection technique

Unsupervised feature selection techniques ignore the target variable and can be used for unlabelled datasets.
There are mainly three techniques under supervised feature selection:

1. Wrapper Methods
In the wrapper methodology, feature selection is treated as a search problem in which different combinations of features are made, evaluated, and compared with other combinations. The algorithm is trained iteratively on a subset of features.

On the basis of the model's output, features are added or removed, and the model is trained again with the updated feature set.

Some techniques of wrapper methods are:

o Forward selection - Forward selection is an iterative process that begins with an empty set of features. In each iteration it adds a feature and evaluates the performance to check whether it improves. The process continues until the addition of a new variable/feature no longer improves the performance of the model.
o Backward elimination - Backward elimination is also an iterative approach, but it is the opposite of forward selection. This technique begins by considering all the features and removes the least significant feature at each step. The elimination process continues until removing features no longer improves the performance of the model.
o Exhaustive Feature Selection - Exhaustive feature selection evaluates every feature set by brute force. It tries each possible combination of features and returns the best-performing feature set.
o Recursive Feature Elimination - Recursive feature elimination is a recursive greedy optimization approach in which features are selected by recursively considering smaller and smaller subsets of features. An estimator is trained on each set of features, and the importance of each feature is determined using a coef_ attribute or a feature_importances_ attribute (see the sketch after this list).
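As an illustration of a wrapper method, the sketch below applies scikit-learn's RFE (recursive feature elimination) with a logistic regression estimator; the synthetic dataset and the number of features to keep are illustrative assumptions, not part of the original notes.

```python
# A minimal recursive feature elimination (RFE) sketch using scikit-learn.
# The synthetic dataset and the choice of estimator are illustrative.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

estimator = LogisticRegression(max_iter=1000)
selector = RFE(estimator, n_features_to_select=4)   # keep the 4 highest-ranked features
selector.fit(X, y)

print("Selected feature mask:", selector.support_)
print("Feature ranking (1 = selected):", selector.ranking_)
```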

2. Filter Methods
In filter methods, features are selected on the basis of statistical measures. These methods do not depend on the learning algorithm and choose the features as a pre-processing step.

The filter method filters out irrelevant features and redundant columns from the model by ranking them with different metrics.

The advantage of filter methods is that they need little computational time and do not overfit the data.

Some common techniques of filter methods are as follows:

o Information Gain
o Chi-square Test
o Fisher's Score
o Missing Value Ratio

Information Gain: Information gain measures the reduction in entropy achieved by transforming the dataset. It can be used as a feature selection technique by calculating the information gain of each variable with respect to the target variable.

Chi-square Test: The chi-square test is a technique for determining the relationship between categorical variables. The chi-square value is calculated between each feature and the target variable, and the desired number of features with the best chi-square values is selected.

Fisher's Score:

Fisher's score is one of the popular supervised techniques of feature selection. It ranks the variables by Fisher's criterion in descending order; we can then select the variables with a large Fisher's score.

Missing Value Ratio:

The missing value ratio can be used to evaluate a feature against a threshold value. It is obtained by dividing the number of missing values in a column by the total number of observations. Variables whose ratio exceeds the threshold can be dropped.
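A minimal filter-method sketch using scikit-learn's SelectKBest is shown below, scoring features with the chi-square test and with mutual information (an information-gain-style measure); the synthetic data and the choice of k are assumptions made for illustration.

```python
# A minimal filter-method sketch: rank features with a statistical score and
# keep the top k. Synthetic non-negative data (required by chi2) is used.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(200, 6)).astype(float)   # hypothetical features
y = (X[:, 0] + X[:, 3] > 9).astype(int)                # target depends on two features

chi2_selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
print("Chi-square scores:", chi2_selector.scores_.round(2))
print("Selected columns:", chi2_selector.get_support(indices=True))

mi_selector = SelectKBest(score_func=mutual_info_classif, k=2).fit(X, y)
print("Mutual information scores:", mi_selector.scores_.round(3))
```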

3. Embedded Methods
Embedded methods combine the advantages of both filter and wrapper methods by considering the interaction of features along with low computational cost. They are fast, like filter methods, but more accurate.

These methods are also iterative: each iteration is evaluated, and the features that contribute the most to training in that iteration are identified. Some techniques of embedded methods are listed below, followed by a short sketch.

o Regularization - Regularization adds a penalty term to the parameters of the machine learning model to avoid overfitting. The penalty is applied to the coefficients and shrinks them; with an L1 penalty, some coefficients are driven exactly to zero, and those features can be removed from the dataset. Common regularization techniques are L1 regularization (Lasso), L2 regularization (Ridge), and Elastic Net (a combination of L1 and L2).
o Random Forest Importance - Tree-based methods provide feature importances that offer a way of selecting features. Feature importance specifies which features matter most in model building or have the greatest impact on the target variable. Random Forest is such a tree-based method: a bagging algorithm that aggregates a number of decision trees. It automatically ranks the nodes by their decrease in impurity (Gini impurity) over all the trees. Nodes are arranged according to the impurity values, which allows pruning of the trees below a specific node; the remaining nodes correspond to a subset of the most important features.
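A short embedded-method sketch is shown below, assuming scikit-learn: a Lasso model's zeroed coefficients and a random forest's impurity-based importances are each used to pick features via SelectFromModel. The synthetic regression data and the alpha value are illustrative choices.

```python
# A minimal embedded-method sketch: features are selected from model
# coefficients (Lasso, L1 penalty) and from tree-based importances (Random Forest).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10,
                       n_informative=3, noise=5.0, random_state=0)

# L1 regularization drives the coefficients of uninformative features to zero.
lasso_selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
print("Features kept by Lasso:", lasso_selector.get_support())

# A random forest ranks features by their mean decrease in impurity.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest_selector = SelectFromModel(forest).fit(X, y)
print("Feature importances:", forest_selector.estimator_.feature_importances_.round(3))
print("Features kept by the forest:", forest_selector.get_support())
```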

Decision Tree Classification Algorithm-

o Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
o In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the output of those decisions and do not contain any further branches.
o The decisions or tests are performed on the basis of the features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which expands into further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification And Regression Tree.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.

o The general structure of a decision tree: the root node splits through decision nodes (with a branch for each decision rule) and ends in leaf nodes that represent the outcomes.

Why use Decision Trees?


There are various algorithms in machine learning, so choosing the best algorithm for the given dataset and problem is an important consideration when creating a machine learning model. Below are two reasons for using a decision tree:

o Decision trees usually mimic human thinking while making a decision, so they are easy to understand.
o The logic behind a decision tree can be easily understood because it shows a tree-like structure.

Decision Tree Terminologies


o Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
o Leaf Node: Leaf nodes are the final output nodes; the tree cannot be segregated further after reaching a leaf node.
o Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions.
o Branch/Sub-Tree: A tree formed by splitting the tree.
o Pruning: Pruning is the process of removing unwanted branches from the tree.
o Parent/Child Node: The root node of the tree is called the parent node, and the other nodes are called the child nodes.
Example - Decision Tree
Let's say we have a dataset of weather conditions and a corresponding label indicating whether people go outside or not.

o In this case, the goal is to predict whether someone will go outside given the weather conditions. A decision tree can be used to model this problem; a small sketch follows below.
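A minimal sketch of such a classifier using scikit-learn is shown below; the tiny weather dataset and its integer encoding are hypothetical illustrations, not the table from the original notes.

```python
# A minimal decision tree sketch for the "go outside?" example. The tiny
# weather dataset and its integer encoding are hypothetical illustrations.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [outlook (0=sunny, 1=overcast, 2=rainy), temperature (deg C)]
X = [[0, 30], [0, 25], [1, 22], [2, 18], [2, 15], [1, 27], [0, 20], [2, 24]]
# Target: 1 = goes outside, 0 = stays in
y = [0, 1, 1, 0, 0, 1, 1, 0]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=["outlook", "temperature"]))
print(tree.predict([[1, 20]]))   # predict for an overcast, 20 deg C day
```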

Entropy-
Entropy is a measure of the impurity or randomness of a dataset in decision tree classification. It
is used to determine which feature is the best to split the data on. Entropy is calculated by
considering the proportion of each class in the dataset, and the more evenly the classes are
distributed, the higher the entropy.
The formula for entropy is:

H(S) = -∑(p_i)log2(p_i)
where p_i is the proportion of samples in the dataset that belong to the ith class.
Let's take an example to understand entropy better. Suppose we have a dataset of 100 samples,
where 70 samples belong to class A and 30 samples belong to class B. We want to calculate the
entropy of this dataset.
Step 1: Calculate the probability of each class
p_A = 70 / 100 = 0.7 p_B = 30 / 100 = 0.3
Step 2: Calculate the entropy
H(S) = -[0.7log2(0.7) + 0.3log2(0.3)] = 0.881
This means that the dataset has an entropy of 0.881. If the entropy is closer to 0, it means that the
dataset is more pure and easier to classify, while if the entropy is closer to 1, it means that the
dataset is more impure and harder to classify. Therefore, when building a decision tree, we want
to choose the feature that gives the lowest entropy, as this will result in the best split of the data
into the different classes.
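A small sketch, assuming NumPy, that reproduces the entropy calculation above:

```python
# A minimal entropy sketch reproducing the 70/30 worked example (NumPy assumed).
import numpy as np

def entropy(class_counts):
    """Shannon entropy H(S) = -sum(p_i * log2(p_i)) of a label distribution."""
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                         # 0 * log2(0) is treated as 0
    return float(-(p * np.log2(p)).sum())

print(round(entropy([70, 30]), 3))   # 0.881, matching the worked example
print(entropy([50, 50]))             # 1.0: maximally impure for two classes
print(entropy([100, 0]))             # 0.0: a pure dataset
```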

Information Gain-
Information gain is a measure of the amount of information gained by splitting a dataset on a
particular feature in decision tree classification. It is calculated by comparing the entropy or Gini
index of the dataset before and after the split. The feature that gives the highest information gain
is chosen as the feature to split on.

Information Gain = Entropy(Parent) - [Weighted Average] × Entropy(Children)

Let's take an example to understand information gain better. Suppose we have a dataset of 100 samples, where 40 samples belong to class A and 60 samples belong to class B. We want to split the dataset on a feature called "color", which has two possible values: "red" and "blue". The color distribution for each class is as follows:
• Class A: 20 red, 20 blue
• Class B: 30 red, 30 blue
Step 1: Calculate the entropy of the original dataset
H(S) = -[0.4log2(0.4) + 0.6log2(0.6)] = 0.971
Step 2: Calculate the entropy of each subset after splitting on "color"
The "red" subset contains 50 samples (20 from class A, 30 from class B):
H(S_red) = -[0.4log2(0.4) + 0.6log2(0.6)] = 0.971
The "blue" subset also contains 50 samples (20 from class A, 30 from class B):
H(S_blue) = -[0.4log2(0.4) + 0.6log2(0.6)] = 0.971
Step 3: Calculate the information gain
IG(color) = H(S) - [(50/100)*H(S_red) + (50/100)*H(S_blue)]
= 0.971 - [0.5*0.971 + 0.5*0.971]
= 0
This means that the information gain for splitting on the "color" feature is 0: because both classes have the same color distribution, knowing the color tells us nothing about the class. If we compare this with the information gain for other features, we can determine which feature is the best to split on. The higher the information gain, the better the feature is for splitting the dataset.
Therefore, in summary, information gain measures the reduction in entropy or Gini index achieved
by splitting a dataset on a particular feature. The feature that provides the highest information gain
is chosen as the splitting feature for a decision tree.
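The sketch below, assuming NumPy, computes the information gain for the "color" split described above:

```python
# A minimal information-gain sketch for the "color" split above (NumPy assumed).
import numpy as np

def entropy(class_counts):
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def information_gain(parent_counts, child_counts_list):
    """IG = entropy(parent) - weighted average of the children's entropies."""
    total = sum(sum(c) for c in child_counts_list)
    weighted = sum((sum(c) / total) * entropy(c) for c in child_counts_list)
    return entropy(parent_counts) - weighted

# Parent: 40 class A, 60 class B; red subset: 20 A, 30 B; blue subset: 20 A, 30 B.
print(round(information_gain([40, 60], [[20, 30], [20, 30]]), 3))   # ~0.0: uninformative split
```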

Gini Index-
Gini index is a metric used to measure the impurity of a dataset in decision tree classification. A
dataset is considered pure if all of its samples belong to the same class. If a dataset is impure, it
means that there is more than one class present in the dataset. The Gini index measures the
probability of misclassifying a randomly chosen sample in the dataset.
The Gini index can be calculated using the following formula:

Gini index = 1 - ∑(p_i)^2


where p_i is the proportion of samples in the dataset that belong to the ith class.
Let's take an example to understand the Gini index better. Suppose we have a dataset of 100
samples, where 60 samples belong to class A and 40 samples belong to class B. We want to
calculate the Gini index for this dataset.
Step 1: Calculate the probability of each class
p_A = 60 / 100 = 0.6 p_B = 40 / 100 = 0.4
Step 2: Calculate the Gini index
Gini index = 1 - ((0.6)^2 + (0.4)^2) = 0.48
This means that if we randomly choose a sample from the dataset and label it according to the class proportions, there is a 48% chance of misclassifying it. If the Gini index is closer to 0, the dataset is more pure and easier to classify, while larger values (up to 0.5 for a two-class dataset) mean the dataset is more impure and harder to classify. Therefore, when building a decision tree, we want to choose the feature that gives the lowest Gini index, as this will result in the best split of the data into the different classes. A short sketch of the calculation follows.
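A one-function sketch of the Gini index calculation above, in plain Python:

```python
# A minimal Gini index sketch reproducing the 60/40 worked example.
def gini_index(class_counts):
    """Gini = 1 - sum(p_i^2) over the class proportions p_i."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

print(round(gini_index([60, 40]), 2))   # 0.48, matching the worked example
print(gini_index([50, 50]))             # 0.5: maximum impurity for two classes
print(gini_index([100, 0]))             # 0.0: a pure dataset
```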
Greedy forward & backward-
Greedy forward and backward are two popular wrapper methods for feature selection in machine
learning. These methods involve selecting a subset of features and evaluating the performance of
a machine learning model on the subset.

Greedy forward selection:


In greedy forward selection, the algorithm starts with an empty set of features and adds one feature
at a time to the set. At each step, the algorithm selects the feature that gives the best performance
improvement when added to the set. The algorithm continues until a predetermined number of
features is selected or until adding new features no longer improves performance.
The steps for the greedy forward selection method are as follows:
1. Initialize the selected set of features to an empty set.
2. For each feature that is not in the selected set:
   a. Train a machine learning model on the selected set of features plus the current feature.
   b. Evaluate the performance of the model using a performance metric such as accuracy or AUC.
   c. If the performance of the model improves when the current feature is added to the selected set, add the feature to the selected set.
3. Repeat steps 2a to 2c until the desired number of features is selected or until adding new features no longer improves performance.
The advantages of greedy forward selection include its simplicity and the fact that it may converge
to a better feature subset faster than other methods. However, it may be prone to local optima and
may not find the globally optimal feature subset.

Greedy backward selection:


In greedy backward selection, the algorithm starts with all features and removes one feature at a
time from the set. At each step, the algorithm removes the feature that gives the best performance
improvement when removed from the set. The algorithm continues until a predetermined number
of features is selected or until removing features no longer improves performance.
The steps for the greedy backward selection method are as follows:
1. Initialize the selected set of features to all features.
2. For each feature in the selected set:
   a. Train a machine learning model on the selected set of features minus the current feature.
   b. Evaluate the performance of the model using a performance metric such as accuracy or AUC.
   c. If the performance of the model improves when the current feature is removed from the selected set, remove the feature from the selected set.
3. Repeat steps 2a to 2c until the desired number of features is selected or until removing features no longer improves performance.

The advantages of greedy backward selection include that it can be faster than greedy forward
selection and may find a better feature subset by removing irrelevant or redundant features.
However, like greedy forward selection, it may be prone to local optima and may not find the
globally optimal feature subset.
Overall, both greedy forward and backward selection methods are simple and effective methods
for feature selection, and their choice depends on the specific problem and dataset. It is
recommended to use cross-validation to evaluate the performance of the selected feature subset
and to avoid overfitting.
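A minimal greedy forward selection sketch is given below, assuming scikit-learn and using cross-validated accuracy as the performance metric; the synthetic dataset, the estimator, and the stopping rule are illustrative assumptions. Greedy backward selection would simply start from the full feature set and remove features instead.

```python
# A minimal greedy forward selection sketch with cross-validated accuracy.
# The synthetic dataset, the estimator, and the stopping rule are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=3, random_state=0)

selected, best_score = [], 0.0
remaining = list(range(X.shape[1]))

while remaining:
    # Try adding each remaining feature and keep the one that helps the most.
    scores = {f: cross_val_score(LogisticRegression(max_iter=1000),
                                 X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    best_feature, score = max(scores.items(), key=lambda kv: kv[1])
    if score <= best_score:          # stop when no feature improves performance
        break
    selected.append(best_feature)
    remaining.remove(best_feature)
    best_score = score

print("Selected features:", selected, "CV accuracy:", round(best_score, 3))
```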

Applications of Feature Extraction and Selection Algorithms in Mechanical Engineering

Feature extraction and selection algorithms have numerous applications in Mechanical Engineering. Some of the key applications of these algorithms in this field are discussed below.

Structural Health Monitoring:


Structural health monitoring is the process of monitoring the health of structures such as bridges,
dams, and buildings to ensure their safety and integrity. Feature extraction and selection algorithms
are used in this field to extract relevant features from sensors such as accelerometers, strain gauges,
and temperature sensors. These features can be used to detect and predict structural damage,
corrosion, and fatigue, which can help prevent catastrophic failures.

Quality Control in Manufacturing:


Feature extraction and selection algorithms are used in quality control to detect defects and
anomalies in manufactured products. For example, image processing techniques can be used to
extract features from images of manufactured parts, such as their dimensions and surface
roughness. These features can be used to classify parts as either defective or non-defective, which
can help reduce scrap and improve product quality.

Predictive Maintenance:
Predictive maintenance is the process of predicting when machines or equipment are likely to fail,
so that maintenance can be scheduled before a failure occurs. Feature extraction and selection
algorithms can be used to extract relevant features from sensors such as accelerometers,
thermocouples, and vibration sensors. These features can be used to train machine learning models
that can predict when equipment is likely to fail, and schedule maintenance accordingly. This can
help reduce downtime and maintenance costs.

Energy Efficiency:
Feature extraction and selection algorithms are also used in the field of energy efficiency. For
example, in heating, ventilation, and air conditioning (HVAC) systems, features such as
temperature, humidity, and airflow can be extracted from sensors to predict energy consumption
and optimize system performance. These features can be used to train machine learning models
that can adjust system parameters to minimize energy consumption while maintaining comfort
levels.
Robotics:
Feature extraction and selection algorithms are also used in robotics to extract relevant features
from sensor data such as lidar and camera images. These features can be used to enable
autonomous navigation, object recognition, and grasping. For example, machine learning models
can be trained to recognize objects in a cluttered environment, which can help robots pick and
place objects accurately.
In conclusion, feature extraction and selection algorithms have numerous applications in
Mechanical Engineering, ranging from structural health monitoring to robotics. These algorithms
can help extract relevant features from sensor data, and enable the development of machine
learning models that can improve safety, quality, efficiency, and productivity in various
mechanical engineering applications.

All numerical examples are available in the PDF provided in the video description.

