How To Perform T-Test in Pandas

The document discusses three types of t-tests that can be performed using pandas: independent two sample t-test, Welch's two sample t-test, and paired sample t-test. It provides examples of how to set up sample data in a pandas DataFrame and use functions from the SciPy library to conduct each t-test. The examples test whether two studying methods lead to different exam score means. The t-test output includes the test statistic and p-value, allowing a determination of whether the null hypothesis can be rejected.

Uploaded by: Garuma Abdisa
Copyright: © All Rights Reserved

How to Perform t-Tests in Pandas (3 Examples)

The following examples show how to perform three different t-tests using a pandas DataFrame:

• Independent Two Sample t-Test
• Welch’s Two Sample t-Test
• Paired Samples t-Test

Example 1: Independent Two Sample t-Test in Pandas


An independent two sample t-test is used to determine whether two population means are equal.

For example, suppose a professor wants to know whether two different studying methods lead to different mean exam scores.

To test this, he recruits 10 students to use method A and 10 students to use method B.

The following code shows how to enter the scores of each student in a pandas DataFrame and then use the ttest_ind() function from the SciPy library to perform an independent two sample t-test:

import pandas as pd
from scipy.stats import ttest_ind

#create pandas DataFrame
df = pd.DataFrame({'method': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                              'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
                   'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
                             80, 81, 81, 84, 88, 88, 89, 90, 90, 91]})

#view first five rows of DataFrame
df.head()

  method  score
0      A     71
1      A     72
2      A     72
3      A     75
4      A     78

#define samples
group1 = df[df['method']=='A']
group2 = df[df['method']=='B']

#perform independent two sample t-test
ttest_ind(group1['score'], group2['score'])

Ttest_indResult(statistic=-2.6034304605397938, pvalue=0.017969284594810425)

From the output we can see:

• t test statistic: -2.6034
• p-value: 0.0180

Since the p-value is less than 0.05, we reject the null hypothesis of the t-test (that the two population means are equal) and conclude that there is sufficient evidence to say that the two methods lead to different mean exam scores.
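
As a cross-check, the test statistic above can be reproduced by hand from the pooled-variance formula. The following is a minimal sketch, not part of the original example; the names a, b, pooled_var, t_stat and p_value are illustrative, and the DataFrame is rebuilt in a shorter but equivalent way:

```python
import math

import pandas as pd
from scipy import stats

# Same exam scores as in the example above
df = pd.DataFrame({'method': ['A'] * 10 + ['B'] * 10,
                   'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
                             80, 81, 81, 84, 88, 88, 89, 90, 90, 91]})

a = df.loc[df['method'] == 'A', 'score']
b = df.loc[df['method'] == 'B', 'score']
n1, n2 = len(a), len(b)

# Pooled variance: weighted average of the two sample variances (pandas uses ddof=1)
pooled_var = ((n1 - 1) * a.var() + (n2 - 1) * b.var()) / (n1 + n2 - 2)

# t statistic: difference in sample means divided by its standard error
t_stat = (a.mean() - b.mean()) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n1 + n2 - 2)

print(t_stat, p_value)
```

The printed values should agree with the statistic and pvalue returned by ttest_ind() up to floating-point error.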

Example 2: Welch’s t-Test in Pandas


Welch’s t-test is similar to the independent two
sample t-test, except it does not assume that the
two populations that the samples came from
have equal variance.
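
One informal way to see whether the equal-variance assumption looks plausible is to compare the sample variances of the two groups. The sketch below is an illustration added here, not a formal test of equal variances, and how large a ratio should be before preferring Welch’s t-test is a judgment call; the names variances and ratio are illustrative:

```python
import pandas as pd

# Same exam scores as in the previous example
df = pd.DataFrame({'method': ['A'] * 10 + ['B'] * 10,
                   'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
                             80, 81, 81, 84, 88, 88, 89, 90, 90, 91]})

# Sample variance (ddof=1) of the scores within each method
variances = df.groupby('method')['score'].var()
print(variances)

# Ratio of the larger sample variance to the smaller one
ratio = variances.max() / variances.min()
print(round(ratio, 2))
```
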
To perform Welch’s t-test on the exact same dataset as the previous example, we simply need to specify equal_var=False within the ttest_ind() function as follows:

import pandas as pd
from scipy.stats import ttest_ind

#create pandas DataFrame
df = pd.DataFrame({'method': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                              'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
                   'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
                             80, 81, 81, 84, 88, 88, 89, 90, 90, 91]})

#define samples
group1 = df[df['method']=='A']
group2 = df[df['method']=='B']

#perform Welch's t-test
ttest_ind(group1['score'], group2['score'], equal_var=False)

Ttest_indResult(statistic=-2.603430460539794, pvalue=0.02014688617423973)

From the output we can see:

• t test statistic: -2.6034
• p-value: 0.0201

Since the p-value is less than 0.05, we reject the null hypothesis of Welch’s t-test and conclude that there is sufficient evidence to say that the two methods lead to different mean exam scores.
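
The Welch result can also be reproduced by hand, which shows where it differs from the pooled test. The following is a minimal sketch under the assumption that the degrees of freedom come from the Welch-Satterthwaite approximation; the names se1_sq, se2_sq and dof are illustrative:

```python
import math

import pandas as pd
from scipy import stats

# Same exam scores as in the example above
df = pd.DataFrame({'method': ['A'] * 10 + ['B'] * 10,
                   'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
                             80, 81, 81, 84, 88, 88, 89, 90, 90, 91]})

a = df.loc[df['method'] == 'A', 'score']
b = df.loc[df['method'] == 'B', 'score']
n1, n2 = len(a), len(b)

# Squared standard error contributed by each sample (no pooling of variances)
se1_sq = a.var() / n1
se2_sq = b.var() / n2

# Welch t statistic
t_stat = (a.mean() - b.mean()) / math.sqrt(se1_sq + se2_sq)

# Welch-Satterthwaite approximation to the degrees of freedom
dof = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))

# Two-sided p-value from the t distribution with the approximate degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=dof)

print(t_stat, dof, p_value)
```

Because both groups have the same size here, the Welch statistic equals the pooled statistic from Example 1. Only the degrees of freedom change (about 14.8 instead of 18), which is why the p-value is slightly larger.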

Example 3: Paired Samples t-Test in Pandas


A paired samples t-test is used to determine whether two population means are equal when each observation in one sample can be paired with an observation in the other sample.

For example, suppose a professor wants to know whether two different studying methods lead to different mean exam scores.

To test this, he recruits 10 students to use method A and then take a test. Then, he has the same 10 students use method B to prepare for and take another test of similar difficulty.

Since all of the students appear in both samples, we can perform a paired samples t-test in this scenario.

The following code shows how to enter the scores of each student in a pandas DataFrame and then use the ttest_rel() function from the SciPy library to perform a paired samples t-test:

import pandas as pd
from scipy.stats import ttest_rel

#create pandas DataFrame
df = pd.DataFrame({'method': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                              'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
                   'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
                             80, 81, 81, 84, 88, 88, 89, 90, 90, 91]})

#view first five rows of DataFrame
df.head()

  method  score
0      A     71
1      A     72
2      A     72
3      A     75
4      A     78

#define samples (scores are paired by position, so the students must be
#listed in the same order within each method)
group1 = df[df['method']=='A']
group2 = df[df['method']=='B']

#perform paired samples t-test
ttest_rel(group1['score'], group2['score'])

Ttest_relResult(statistic=-6.162045351967805, pvalue=0.0001662872100210469)

From the output we can see:

• t test statistic: -6.1620
• p-value: 0.0002

Since the p-value is less than 0.05, we reject the null hypothesis of the paired samples t-test and conclude that there is sufficient evidence to say that the two methods lead to different mean exam scores.
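
A paired samples t-test is equivalent to a one sample t-test on the within-pair differences, tested against a mean of zero. The sketch below illustrates this with the same data; it is an added illustration, and the names diffs, one_sample and paired are illustrative. Note the conversion to NumPy arrays: subtracting the two pandas Series directly would align them by index label (0 to 9 versus 10 to 19) rather than by position.

```python
import pandas as pd
from scipy.stats import ttest_1samp, ttest_rel

# Same exam scores as in the example above
df = pd.DataFrame({'method': ['A'] * 10 + ['B'] * 10,
                   'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
                             80, 81, 81, 84, 88, 88, 89, 90, 90, 91]})

# Convert to NumPy arrays so the subtraction pairs scores by position
a = df.loc[df['method'] == 'A', 'score'].to_numpy()
b = df.loc[df['method'] == 'B', 'score'].to_numpy()

# Within-student differences (method A minus method B)
diffs = a - b

# One sample t-test of the differences against a mean of zero
one_sample = ttest_1samp(diffs, popmean=0)

# Paired samples t-test on the original scores
paired = ttest_rel(a, b)

print(one_sample.statistic, one_sample.pvalue)
print(paired.statistic, paired.pvalue)
```

Both calls should print the same statistic and p-value up to floating-point error.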
