PSAI Unit 5

Degrees of Freedom

Degrees of freedom are the number of independent values that a
statistical analysis can estimate.
Degrees of freedom refers to the maximum number of logically
independent values — values that have the freedom to vary — in the data.
In statistics, degrees of freedom represent the number of values or
quantities in a statistical distribution that are free to vary. It is an
essential concept when working with many statistical tests,
including t-tests, chi-square tests, and analysis of variance (ANOVA).
Degrees of freedom, often denoted by ν or df, are the number of
independent data points used to calculate a statistic.
In inferential statistics, you estimate a population parameter by
calculating a sample statistic. The number of independent data points
used to calculate the statistic is its degrees of freedom. The degrees of
freedom of a statistic depend on the sample size.
If the sample size is small, there are only a few independent data points
and therefore only a few degrees of freedom.
When the sample is large, there are many independent data points and
therefore many degrees of freedom.
Although degrees of freedom are closely related to sample size, they
are not the same: there are always fewer degrees of freedom than
observations.

Example:
1. The total number of shoe pair options is 7, one pair to be worn on each
day of the week.
2. Each day's choice is free to vary, but it is restricted by the
choices made on previous days.
3. The degrees of freedom can be thought of as the number of
choices one can make independently.
On Monday, she can choose any of the seven pairs.
(7 options)
On Tuesday, she can choose any of the six remaining pairs.
(6 options)
On Wednesday, she can choose any of the five remaining pairs.
(5 options)
On Thursday, she can choose any of the four remaining pairs.
(4 options)
On Friday, she can choose any of the three remaining pairs.
(3 options)
On Saturday, she can choose either of the two remaining pairs.
(2 options)
On Sunday, she has nothing left to choose.
(No option: she must take the one remaining pair.)
So with 7 items, only 6 choices are truly free — the degrees of freedom
are 7 − 1 = 6.

Example:
You have a data set of 10 values, and if the mean of the 10 values is 3.5,
the constraint requires that the sum of the 10 values must equal
10 × 3.5 = 35.
The first nine values are free to vary (each can take any value), but the
10th is then fixed by the constraint:

0.1+0.2+0.3+0.4+0.5+0.6+0.7+0.8+0.9+30.5 = 35
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

Up to the 9th value we are free to choose anything; the 10th value is then
forced. Here the first nine values sum to 4.5, so the 10th value must be
30.5 to make the total 35 and the mean 3.5.
Another way:

3+(−8.2)+(−37)+(−92)+(−1)+0+1+(−22)+99+92.2 = 35
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

Again the first nine values are free; they sum to −57.2, so the 10th
value must be 92.2 to make the total 35 and the mean 3.5.
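The constraint can be demonstrated directly in plain Python (a minimal sketch; the nine "free" values are the ones from the first example above):

```python
# With a fixed mean, only n - 1 values are free to vary:
# the last value is forced by the constraint sum = n * mean.
n = 10
target_mean = 3.5
free_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # 9 freely chosen values

forced = n * target_mean - sum(free_values)  # the 10th value has no freedom
print(forced)                                # 30.5

data = free_values + [forced]
print(sum(data) / len(data))                 # 3.5, as required
```

Changing any of the nine free values simply forces a different 10th value, which is exactly why df = n − 1 here.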

Degrees of freedom formula

df = n − 1 (one-sample t-test and paired t-test)

e.g. df = 10 − 1 = 9

df = n1 + n2 − 2 (two-sample t-test)


Student’s T - Distribution

The t-distribution, also called Student's t-distribution, is used to make
inferences about the mean when we do not know the population standard
deviation. In probability and statistics, the normal distribution is a
bell-shaped distribution with mean μ and standard deviation σ. The
t-distribution is similar to the normal distribution, but flatter and
shorter. Here we discuss the t-distribution's formula, table, properties
and applications.
Shape:
The t-distribution is symmetric and bell-shaped, much like the
normal distribution.
However, it has thicker tails than the normal distribution.
Parameters:
The t-distribution is characterized by its degrees of freedom (df),
which determine its shape.
As df increases, the t-distribution approaches the standard normal
distribution.

Properties of the distribution


1. The t-distribution variable ranges from −∞ to +∞.
2. The t-distribution is symmetric about 0, like the normal
distribution.
3. For large values of ν (i.e., increased sample size n), the
t-distribution tends to the standard normal distribution. This
means that the shape of the t-distribution varies with ν.
4. The t-distribution is less peaked than the normal distribution in
the middle and higher in the tails.
5. The density (peak height) reaches its maximum at t = 0.
6. The mean of the distribution is 0 for ν > 1, where ν = degrees of
freedom.
When to apply a t-test
A t-test is a statistical test that is used to compare the means of two
groups.

Assumptions of the t-test:


1) n < 30 (small sample)
2) the population standard deviation is unknown
3) the population data are normally distributed

The t-test is one of the most common hypothesis tests in statistics.


The t-test determines either whether the sample mean and the mean
of the population differ, or whether two sample means differ
statistically. We distinguish between:
One-sample t-test
Two-sample t-test (independent samples t-test)
Paired-samples t-test (dependent samples t-test)

One sample t - Test


The one-sample t-test is a statistical test used to determine whether the
mean of a single sample differs significantly from a known or hypothesized
population mean. It is particularly useful when the sample size is small
and the population standard deviation is unknown.
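A minimal sketch with SciPy's `ttest_1samp` (the sample values and the hypothesized mean of 50 are made up for illustration):

```python
from scipy import stats

# Hypothetical sample of 8 measurements; H0: population mean = 50.
sample = [51.2, 49.8, 50.5, 52.1, 48.9, 50.7, 51.5, 49.4]

t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
df = len(sample) - 1                       # df = n - 1 for a one-sample test
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}")
# Reject H0 at alpha = 0.05 only if p < 0.05.
```

Because the sample mean here is above 50, the t statistic comes out positive; whether it is significant depends on the p-value against the chosen alpha.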

Two Sample t- test


The two-sample t-test is a statistical test used to determine if the
means of two independent samples are significantly different from
each other. It is commonly employed when comparing the means of
two groups to assess if there is a statistically significant difference
between them.
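A sketch using SciPy's `ttest_ind` (both groups are hypothetical; `equal_var=True` gives the classic pooled two-sample test described here):

```python
from scipy import stats

# Hypothetical scores from two independent groups.
group_a = [23, 25, 28, 30, 22, 27, 26]
group_b = [31, 33, 29, 35, 32, 34, 30]

t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
df = len(group_a) + len(group_b) - 2       # df = n1 + n2 - 2
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.4f}")
```

With `equal_var=False` the same call performs Welch's test instead, which does not assume equal variances.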

Paired sample t-test

The paired samples t-test compares the means of two measurements

taken from the same individual, object, or related units. These
"paired" measurements can represent things like a measurement
taken at two different times (e.g., a pre-test and post-test score with an
intervention administered between the two time points).
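A minimal sketch with SciPy's `ttest_rel` (the pre/post scores for six students are hypothetical):

```python
from scipy import stats

# Hypothetical pre-test and post-test scores for the same 6 students.
pre  = [62, 70, 58, 65, 74, 60]
post = [68, 75, 61, 70, 79, 66]

t_stat, p_value = stats.ttest_rel(pre, post)
df = len(pre) - 1                          # df = n - 1 pairs
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.4f}")
```

Internally this is just a one-sample t-test on the pairwise differences, which is why the df is n − 1 pairs rather than n1 + n2 − 2.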
Equality of Variances

The F Test for Equality of Variances between two groups is a statistical


test used to compare the variances of two samples to determine
whether they are equal. It is based on the assumption that the samples
are drawn from normally distributed populations.

Steps in the F Test for Equality of Two Variances:

Specify the null and alternative hypotheses. The null hypothesis is


usually that the variances of the two samples are equal, while the
alternative hypothesis is that the variances of the two samples are not
equal.

Select two samples from the populations and calculate the sample
variances and sizes.

Calculate the test statistic, which is the ratio of the larger sample
variance to the smaller sample variance.

Determine the critical value of the test statistic based on the


significance level (alpha) of the test and the degrees of freedom for
the numerator and denominator.

The degrees of freedom for the numerator and denominator are


calculated as the respective sample sizes minus one.

Compare the calculated test statistic to the critical value to determine


whether to reject or fail to reject the null hypothesis. If the calculated
test statistic exceeds the critical value, the null hypothesis is rejected,
and the alternative hypothesis is accepted.

Null hypothesis: σ1² = σ2²

Alternative hypothesis: σ1² ≠ σ2²

Calculate the sample means and sample variances:

x̄ = Σx / n1

ȳ = Σy / n2

s1² = Σ(x − x̄)² / (n1 − 1)

s2² = Σ(y − ȳ)² / (n2 − 1)

F statistic: F = s1² / s2²

Degrees of freedom: n1 − 1 (numerator) and n2 − 1 (denominator)

Set the level of significance.

Find the critical value.

Compare the critical value, at the given degrees of freedom and level of
significance, with the calculated F value.
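The steps above can be sketched in Python (SciPy assumed available; the two samples are hypothetical):

```python
import statistics
from scipy import stats

# Hypothetical measurements from two groups.
x = [12.1, 11.8, 13.0, 12.5, 11.2, 12.9, 13.4, 12.2]
y = [12.0, 12.3, 11.9, 12.1, 12.2, 12.0, 11.8, 12.4]

s1_sq = statistics.variance(x)          # sample variance, divisor n - 1
s2_sq = statistics.variance(y)

# Put the larger variance in the numerator so that F >= 1
f = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)
df1 = len(x) - 1                        # numerator degrees of freedom
df2 = len(y) - 1                        # denominator degrees of freedom

p = 2 * stats.f.sf(f, df1, df2)         # two-sided p-value from the F distribution
print(f"F = {f:.3f}, df = ({df1}, {df2}), p = {p:.4f}")
```

If p falls below the chosen alpha, the null hypothesis of equal variances is rejected.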
Chi - Square Test

The Chi-Square test is a statistical procedure for determining the difference


between observed and expected data. It can also be used to determine
whether two categorical variables in our data are related. It helps to find out
whether a difference between two categorical variables is due to chance or to a
relationship between them.

A chi-square test is a statistical test that is used to compare observed and


expected results. The goal of this test is to identify whether a disparity between
actual and predicted data is due to chance or to a link between the variables
under consideration. As a result, the chi-square test is an ideal choice for
aiding in our understanding and interpretation of the connection between our
two categorical variables.

Chi-Squared Tests are most commonly used in hypothesis testing. A


hypothesis is an assumption that any given condition might be true, which can
be tested afterwards. The Chi-Square test estimates the size of inconsistency
between the expected results and the actual results when the size of the sample
and the number of variables in the relationship is mentioned.

These tests use degrees of freedom to determine whether a particular null


hypothesis can be rejected based on the total number of observations made in
the experiments. The larger the sample size, the more reliable the result.

There are two main types of Chi-Square tests, namely:

Independence
Goodness-of-Fit

Goodness of fit

The Chi-square goodness of fit test checks whether your sample data is
likely to be from a specific theoretical distribution. We have a set of data
values, and an idea about how the data values are distributed. The test gives
us a way to decide if the data values have a “good enough” fit to our idea, or
if our idea is questionable.
For the goodness of fit test, we need one variable. We also need an idea, or
hypothesis, about how that variable is distributed.

We have bags of candy with five flavors in each bag. The bags should
contain an equal number of pieces of each flavor. The idea we'd like to test
is that the proportions of the five flavors in each bag are the same.

For a group of children’s sports teams, we want children with a lot of


experience, some experience and no experience shared evenly across the
teams. Suppose we know that 20 percent of the players in the league have a
lot of experience, 65 percent have some experience and 15 percent are new
players with no experience. The idea we'd like to test is that each team has
the same proportion of children with a lot, some or no experience as the
league as a whole.

Chi square statistic: χ² = Σ (O − E)² / E, where O is the observed
frequency and E is the expected frequency.

How to perform a chi-square test


The exact procedure for performing a Pearson’s chi-square test
depends on which test you’re using, but it generally follows these
steps:

Create a table of the observed and expected frequencies. This can


sometimes be the most difficult step because you will need to carefully
consider which expected values are most appropriate for your null
hypothesis.

Calculate the chi-square value from your observed and expected


frequencies using the chi-square formula.
Find the critical chi-square value in a chi-square critical value table or
using statistical software.

Compare the chi-square value to the critical value to determine which


is larger.

Decide whether to reject the null hypothesis. You should reject the
null hypothesis if the chi-square value is greater than the critical
value. If you reject the null hypothesis, you can conclude that your
data are significantly different from what you expected.
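These steps can be sketched for the candy-bag example above, using SciPy's `chisquare` (the observed counts per flavor are hypothetical; equal proportions give equal expected counts):

```python
from scipy import stats

# Hypothetical observed counts for the 5 flavors in a bag of 100 pieces.
observed = [18, 22, 21, 17, 22]
# H0: all flavors equally likely, so each expected count is 100 / 5 = 20.
expected = [20, 20, 20, 20, 20]

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
df = len(observed) - 1                     # df = k - 1 categories
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p:.3f}")
# A large p here means the observed counts fit the equal-proportions idea.
```

Here χ² = Σ(O − E)²/E = (4 + 4 + 1 + 9 + 4)/20 = 1.1, well below the critical value, so the equal-proportions hypothesis is not rejected.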

Independence

The Chi-Square Test of Independence is an inferential statistical test


which examines whether two sets of categorical variables are likely to be
related to each other or not. This test is used when we have counts of
values for two nominal or categorical variables, and it is considered a
non-parametric test. A relatively large sample size and independence of
observations are the required criteria for conducting this test.

Chi square statistic: χ² = Σ (O − E)² / E, where O is the observed
frequency and E is the expected frequency.


Goodness-of-fit test and independence test are two different types of
statistical tests used to analyze categorical data.

1. Goodness-of-Fit Test:

○ A goodness-of-fit test assesses how well observed categorical


data fit an expected distribution or model.
○ It is used when you have one categorical variable and you want
to test whether the observed frequency distribution of that
variable matches a theoretical or expected distribution.
○ Common examples include the chi-square goodness-of-fit test,
which compares observed frequencies to expected frequencies
under a specified distribution.

2. Independence Test:
○ An independence test examines whether two categorical
variables are independent of each other or if there's an
association between them.
○ It is used when you have two categorical variables and you want
to determine whether there's a relationship between them.
○ The most common test for independence is the chi-square test
of independence, which compares observed frequencies in a
contingency table to expected frequencies if the variables were
independent.
Chi Square Test for Independence

The chi-square test for independence is a statistical method used to determine


whether there is a significant association between two categorical variables. It
is commonly used in research to test hypotheses about the relationship
between two variables.

Finding the expected values from a contingency table

              C1      C2      Row Total
R1            a       b       a + b
R2            c       d       c + d
Column total  a + c   b + d   N = a + b + c + d

Expected value (E) = (row total × column total) / N

E(a) = (a + b)(a + c) / N
E(b) = (a + b)(b + d) / N
E(c) = (c + d)(a + c) / N
E(d) = (c + d)(b + d) / N

Degrees of freedom = (r − 1)(c − 1)

Compare the calculated chi-square value with the tabulated critical
value.
Example:

In an antimalarial campaign in India, quinine was administered to 812
people out of a total population of 3248. The number of fever cases is
shown below:

Treatment     Fever      No Fever     Row Total
Quinine       20 (a1)    792 (a2)     812
No Quinine    220 (b1)   2216 (b2)    2436
Column Total  240        3008         N = 3248

Null hypothesis H0: quinine was not effective in checking malaria.
Alternative hypothesis H1: quinine is effective in checking malaria.

Now calculate the expected values from the given data:

Expected value = (row total × column total) / total population

E(20)   = 812 × 240 / 3248   = 60
E(792)  = 812 × 3008 / 3248  = 752
E(220)  = 2436 × 240 / 3248  = 180
E(2216) = 2436 × 3008 / 3248 = 2256

Expected values table:

60 (a1)     752 (a2)
180 (b1)    2256 (b2)
Calculation of chi square:

O       E       O − E    (O − E)²    (O − E)²/E
20      60      −40      1600        1600/60 = 26.66
792     752     40       1600        1600/752 = 2.12
220     180     40       1600        1600/180 = 8.88
2216    2256    −40      1600        1600/2256 = 0.709

● χ² is the chi-square test statistic
● Σ is the summation operator (it means "take the sum of")
● O is the observed frequency
● E is the expected frequency

χ² = Σ (O − E)²/E = 26.66 + 2.12 + 8.88 + 0.709 = 38.369

Calculate the degrees of freedom:
df = (c − 1)(r − 1) = (2 − 1)(2 − 1) = 1

Level of significance:
α = 0.05 (5%)

The calculated chi-square value is 38.369, and the tabulated chi-square
value is 3.841 (from the chi-square table at df = 1, α = 0.05).

Since 38.369 > 3.841, we reject the null hypothesis.
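The whole calculation can be reproduced with SciPy's `chi2_contingency` (a sketch; `correction=False` disables the Yates continuity correction so the result matches the hand calculation above):

```python
from scipy import stats

# Quinine contingency table from the worked example above.
table = [[20, 792],      # quinine:    fever, no fever
         [220, 2216]]    # no quinine: fever, no fever

chi2, p, df, expected = stats.chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p:.2e}")
print(expected)   # expected counts: 60, 752, 180, 2256, as computed by hand
```

Since the p-value is far below 0.05 (equivalently, χ² ≈ 38.4 exceeds the critical value 3.841 at df = 1), the null hypothesis is rejected, matching the hand calculation.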
