
ANOVA For Feature Selection in Machine Learning by Sampath Kumar Gajawada Towards Data Science

1. ANOVA can be used for feature selection in machine learning when the response variable is continuous and the predictor is categorical. 2. One-way ANOVA compares the variance between groups to the variance within groups using an F-test. 3. In an example using student data, a one-way ANOVA showed that the variance between final grades based on a student's guardian was significant, indicating the guardian feature impacts final grade.


Published in Towards Data Science

sampath kumar gajawada

Oct 19, 2019 · 7 min read

ANOVA for Feature Selection in Machine Learning
Applications of ANOVA in Feature selection

Photo by Fahrul Azmi

One of the biggest challenges in machine learning is selecting the best features to train
the model. We need only the features on which the response variable strongly depends.
But what if the response variable is continuous and the predictor is categorical?
ANOVA (Analysis of Variance) helps us select the best features in that case.

In this article, I will take you through

a. Impact of Variance

b. F-Distribution

c. ANOVA

d. One Way ANOVA with example

Impact of Variance
Variance is a measure of the spread of the values in a variable. It is the average of the
squared distances of each value from the mean.

The variance of a feature tells us how much information it can carry about the response
variable. If the variance is very low, the feature is nearly constant, so it is unlikely to
have much impact on the response. A high variance alone does not prove an impact, which is
why ANOVA compares the variance between groups with the variance within groups.
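
As a small illustration of the definition above, the sketch below computes the variance by hand and with Python's `statistics` module. The grade values and the variable names are made up for this example and are not taken from the article's dataset.

```python
from statistics import mean, pvariance

# Hypothetical final grades, made up for illustration only.
grades = [10, 12, 9, 15, 14]
# A nearly useless feature: every value is the same.
constant_feature = [12, 12, 12, 12, 12]

grand_mean = mean(grades)
# Variance as the average squared distance from the mean (population form).
manual_variance = sum((g - grand_mean) ** 2 for g in grades) / len(grades)

print(manual_variance)              # 5.2
print(pvariance(grades))            # same value from the standard library
print(pvariance(constant_feature))  # 0, a constant feature has no spread
```

The sample variance (`statistics.variance`) divides by N - 1 instead of N, which connects to the degrees of freedom discussed later.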

F-Distribution
The F-distribution is a probability distribution generally used in the analysis of
variance. The F-test built on it assumes the hypotheses

H0: Two variances are equal

H1: Two variances are not equal

Degrees of Freedom
Degrees of freedom refers to the maximum number of logically independent values,
which have the freedom to vary. In simple words, it can be defined as the total number
of observations minus the number of independent constraints imposed on the
observations.

Df = N - 1, where N is the sample size


F-Value
It is the ratio of two chi-square distributed variables, each divided by its degrees of
freedom.

F value

Let’s solve the above equation and check how it can be useful to analyze the variance.

F ratio

In the real world, we always deal with samples, so in practice we compare sample variances
(the squared sample standard deviations) as estimates of the population variances.
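
The F value and F ratio described above can be written out explicitly. This is a reconstruction from the standard definition, not a copy of the original figures. It assumes two independent samples of sizes n_1 and n_2 drawn from normal populations with variances \sigma_1^2 and \sigma_2^2; these symbols are introduced here for illustration.

```latex
% F value: ratio of two independent chi-square variables,
% each divided by its degrees of freedom
F = \frac{\chi_1^2 / df_1}{\chi_2^2 / df_2}

% For a sample of size n_i with sample variance s_i^2:
\chi_i^2 = \frac{(n_i - 1)\, s_i^2}{\sigma_i^2}, \qquad df_i = n_i - 1

% Substituting, the degrees of freedom cancel:
F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}

% Under H0 (\sigma_1^2 = \sigma_2^2) this reduces to the ratio of sample variances:
F = \frac{s_1^2}{s_2^2}
```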

image from https://newonlinecourses.science.psu.edu

In the above figure, we can observe that the shape of the F-distribution depends on the
degrees of freedom.
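
One way to see this dependence without a plot is to simulate F values for two different pairs of degrees of freedom and compare their upper tails. The sketch below is an illustration only: the function names, the seed, the number of draws and the choice of (2, 51) versus (10, 51) are assumptions made for this example. It uses the fact that a chi-square variable with k degrees of freedom is a Gamma variable with shape k/2 and scale 2.

```python
import random

def simulate_f(df1, df2, n_draws, rng):
    """Draw F values as a ratio of chi-square variables scaled by their df."""
    draws = []
    for _ in range(n_draws):
        # Chi-square with k degrees of freedom == Gamma(shape=k/2, scale=2).
        chi1 = rng.gammavariate(df1 / 2, 2)
        chi2 = rng.gammavariate(df2 / 2, 2)
        draws.append((chi1 / df1) / (chi2 / df2))
    return draws

def percentile_95(values):
    """Empirical 95th percentile of a list of numbers."""
    ordered = sorted(values)
    return ordered[int(0.95 * len(ordered))]

rng = random.Random(0)  # fixed seed so the run is repeatable
p95_2_51 = percentile_95(simulate_f(2, 51, 50_000, rng))
p95_10_51 = percentile_95(simulate_f(10, 51, 50_000, rng))

# The two tails differ: the cut-off moves when the degrees of freedom change.
print(round(p95_2_51, 2), round(p95_10_51, 2))
```

The simulated cut-off for (2, 51) should land close to the tabulated value used in the worked example later in the article, up to simulation noise.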

ANOVA
Analysis of Variance is a statistical method used to check whether the means of two or more
groups are significantly different from each other. It assumes the hypotheses

H0: Means of all groups are equal.

H1: At least one group mean is different.

How is the comparison of means transformed into a comparison of variances?

Consider two distributions and their behavior in the figure below.

Behavior of distributions

From the above figure, we can say that if the distributions overlap or lie close together,
the grand mean will be similar to the individual means, whereas if the distributions are far
apart, the grand mean and the individual means differ by a larger distance.

This distance reflects the variation between the groups, while the spread of values inside
each group is the variation within the groups. So in ANOVA, we compare Between-group
variability to Within-group variability.

ANOVA uses the F-test to check if there is any significant difference between the groups.
If there is no real difference between the groups, the between-group and within-group
variance estimates are about equal, and ANOVA's F-ratio will be close to 1 or below.
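
To make the link between comparing means and comparing variances concrete, here is a minimal sketch for groups of equal size. The helper name `f_ratio_equal_sizes` and both toy datasets are hypothetical. For equal group sizes n, the between-group estimate is n times the sample variance of the group means, and the within-group estimate is the average of the per-group sample variances.

```python
from statistics import mean, variance

def f_ratio_equal_sizes(groups):
    """F-ratio for groups of equal size n (shortcut valid only in that case)."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this shortcut assumes equal group sizes")
    # Between-group estimate: how much the group means themselves vary.
    between = n * variance([mean(g) for g in groups])
    # Within-group estimate: average spread inside the groups.
    within = mean([variance(g) for g in groups])
    return between / within

# Hypothetical data, made up for illustration.
overlapping = [[10, 12, 11, 13], [11, 13, 12, 12], [12, 10, 13, 12]]
separated = [[10, 12, 11, 13], [15, 17, 16, 18], [20, 22, 21, 23]]

print(f_ratio_equal_sizes(overlapping))  # small: group means are almost the same
print(f_ratio_equal_sizes(separated))    # large: group means are far apart
```

With overlapping groups the ratio stays small, and with well separated groups it grows large, which is the behavior the F-test relies on.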

One Way ANOVA with example


1. One Way ANOVA tests the relationship between a categorical predictor and a continuous
response.

2. Here we check whether the groups of the categorical feature differ in their mean
response, by comparing the variance between groups with the variance within groups.

3. If the variance between groups is no larger than expected from the variance within
groups, the feature shows no measurable impact on the response and need not be considered
for model training.

Let’s consider a school dataset containing data about students' performance. We have to
predict the final grade of a student based on features like age, guardian, study time,
failures, activities, etc.

Using One Way ANOVA, let us determine whether the guardian has any impact on the final
grade. Below is the data.

Student final grades by the guardian

We can see the guardian (mother, father, other) as columns and the student final grades in
rows.

Steps to perform One Way ANOVA


1. Define Hypothesis

2. Calculate the Sum of Squares

3. Determine degrees of freedom


4. F-value

5. Accept or Reject the Null Hypothesis

Define Hypothesis

H0: All levels (groups) of guardian have the same mean final grade

H1: At least one group is different.

Calculate the Sum of Squares


The sum of squares is a statistical measure used to determine the dispersion in data
points. It is the sum of squared deviations from a mean and can be written as

Sum of Squares

As stated in the ANOVA section, we have to do an F-test to check whether the groups differ,
by comparing the variance between the groups with the variance within the groups. This
can be done by using the sum of squares, and the definitions are as follows.

Total Sum of Squares

The distance of each observed point x from the grand mean xbar is x-xbar. If you
calculate this distance for each data point, square each distance and add up all the
squared distances, you get

Total Sum of Squares

Between Sum of Squares

The distance of each group mean g from the grand mean xbar is g-xbar. Squaring this
distance, weighting it by the number of observations in the group and adding up over all
groups, we get

Between the Sum of Squares

Within Sum of Squares

The distance of each observed value x within a group from its group mean g is x-g. Squaring
these distances and adding them up over all groups, we get

Within the Sum of Squares

The total sum of squares = Between Sum of Squares + Within Sum of Squares
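
The three sums of squares and the identity above can be checked numerically. The sketch below is a minimal illustration: the function name `sums_of_squares` is hypothetical and the grades are made up (the article's data table is not reproduced here). Note that the between sum of squares weights each group by its number of observations.

```python
from statistics import mean

def sums_of_squares(groups):
    """Return (total, between, within) sums of squares for a dict of groups."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    # Total: every observation against the grand mean.
    total = sum((x - grand_mean) ** 2 for x in all_values)
    # Between: every group mean against the grand mean, weighted by group size.
    between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within: every observation against its own group mean.
    within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    return total, between, within

# Hypothetical grades per guardian, made up for illustration.
grades = {
    "mother": [12, 14, 11, 15],
    "father": [10, 9, 13, 12],
    "other": [8, 7, 10, 9],
}
total, between, within = sums_of_squares(grades)
print(round(total, 4), round(between, 4), round(within, 4))
print(abs(total - (between + within)) < 1e-9)  # True: the identity holds
```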

Determine degrees of freedom

We already discussed the definition of degrees of freedom. Now we will calculate them for
between groups and within groups.

1. Since we have 3 groups (mother, father, other), the degrees of freedom for between
groups are (3–1) = 2.

2. With 18 samples in each group, the degrees of freedom for within groups are the sum of
the degrees of freedom of all groups, that is (18–1) + (18–1) + (18–1) = 51.
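
The same bookkeeping can be written as a few lines of code. The group sizes come from the text above; the variable names are chosen for this sketch.

```python
# Group sizes as described in the text: three guardian groups, 18 students each.
group_sizes = {"mother": 18, "father": 18, "other": 18}

k = len(group_sizes)                 # number of groups
n_total = sum(group_sizes.values())  # total number of observations

df_between = k - 1
df_within = sum(n - 1 for n in group_sizes.values())  # same as n_total - k
df_total = n_total - 1

print(df_between, df_within, df_total)  # 2 51 53
```

The between and within degrees of freedom add up to the total, mirroring the way the sums of squares add up.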

F-value
Since we are comparing the variance between the groups with the variance within the
groups, the F value is given as

F value
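
Written out, the F value for one-way ANOVA takes the following standard form. This is a reconstruction rather than the original figure; the symbols k (number of groups) and N (total number of observations) are introduced here for illustration.

```latex
% Mean squares: each sum of squares divided by its degrees of freedom
MS_{between} = \frac{SS_{between}}{k - 1}, \qquad
MS_{within}  = \frac{SS_{within}}{N - k}

% F value: ratio of the two mean squares
F = \frac{MS_{between}}{MS_{within}}

% In this example k = 3 and N = 54, so the degrees of freedom are 2 and 51.
```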

After calculating the sums of squares and the F value, here is the summary.


ANOVA table

Accept or reject the Null Hypothesis


With 95% confidence (alpha = 0.05), df1 = 2 and df2 = 51, the critical F value from the F
table is 3.179, and the calculated F value is 18.49.

F test

In the above figure, we see that the calculated F value falls in the rejection region,
beyond the critical value for our confidence level. So we reject the Null Hypothesis.
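
The decision step can be sketched in code. When the first degrees of freedom equal 2, as here, the tail probability of the F distribution has a simple closed form, P(F > x) = (1 + 2x/df2)^(-df2/2), so no statistics library is needed. This shortcut is an assumption of the sketch and does not hold for other values of df1; for the general case a library routine such as `scipy.stats.f.ppf` would be the usual choice. The function names are hypothetical.

```python
def f_survival_df1_2(f_value, df2):
    """P(F > f_value) for an F distribution with df1 = 2 and df2 degrees of freedom."""
    return (1 + 2 * f_value / df2) ** (-df2 / 2)

def f_critical_df1_2(alpha, df2):
    """Value exceeded with probability alpha, found by inverting the formula above."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

alpha, df2 = 0.05, 51
f_calculated = 18.49  # value reported in the text

f_critical = f_critical_df1_2(alpha, df2)
p_value = f_survival_df1_2(f_calculated, df2)
reject_null = f_calculated > f_critical

print(round(f_critical, 3))  # 3.179, matching the F table value quoted above
print(reject_null)           # True
```

Comparing the p-value with alpha gives the same decision as comparing the calculated F value with the critical value.
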
To conclude, as the null hypothesis is rejected, that means there is significant variance
between the groups, which indicates that the guardian has an impact on the student's final
score. So we will include this feature for model training.

Using One Way ANOVA we can check only a single predictor against the response and determine
the relationship. But what if you have two predictors? Then we use Two Way ANOVA, and if
there are more than two features we go for multi-factor ANOVA.

Using two-way or multi-factor ANOVA we can check relationships with the response such as

1. Does the guardian impact the final student grade?

2. Do the student activities impact the final student grade?

3. Do the guardian and student activities together impact the final grade?

Drawing all of the above conclusions from one test is always interesting, right? I am
working on an article on two-way and multi-factor ANOVA, which will make this more
interesting.

Here we dealt with having the response as continuous and predictor as categorical. If the
response is categorical and the predictor is categorical, please check on my article Chi-
Square test for Feature Selection in machine learning.

Chi-Square Test for Feature Selection in Machine learning


We always wonder where the Chi-Square test is useful in machine
learning and how this test makes difference.Feature…
towardsdatascience.com

Hope you enjoyed it! Stay tuned! Please do comment with any queries or suggestions.
