
PSYCH STATS

Basic Concepts and Categories of Statistics & Statistical Packages/Software

 Statistics (singular sense) is the science that deals with the collection, organization, presentation, analysis, and interpretation of data; it is also described as the study of variation. In the plural sense, statistics are actual numbers derived from data: a collection of facts and figures, i.e., processed data (e.g., population statistics, statistics on births, statistics on enrollment).

Data (facts and figures)

Types of data

 Primary data are acquired directly from the source. Ex: data obtained by measuring the weight of 500 one-day-old chicks from Farm XYZ.
 Secondary data are data not acquired directly from the source (i.e., non-primary data). Ex: Philippine rice production (tons/ha) data by province from 1990-2014, taken from publications of the Philippine Bureau of Agricultural Statistics.

Categories of Statistics

 Descriptive statistics – methods of organizing, summarizing, and presenting data and their interpretation.
 Inferential statistics – methods concerned with making generalizations about a larger set of data where only a part is examined.


Role of Statistics

 A tool for data analysis (e.g., standard drug vs. new drug: which is more effective?)
 Opinion poll surveys (Do you think the Philippines is ready for ASEAN integration 2015?)

Some Basic Terms:

 Universe – set of all entities or individuals under consideration/subject of the study.

2 types:

 Finite – when the elements of the universe can be counted for a given time period.
 Infinite – when the number of elements of the universe is unlimited.

Variable – a characteristic of interest that is measurable or observable on each and every individual of the universe.

 Population – set of all possible values of the variable.
 Sample – subset of the universe or the population.
 Distribution – pattern of variation of a variable.

The Variables and Levels of Measurement

 The measurement of a variable determines the amount of information that can be processed to answer the research objectives of a study. The scale of measurement of the variable determines the algebraic operations that can be performed and the statistical tools that can be applied to analyze the data. There are four scales or levels of measurement:
o Nominal - data collected are simply labels, names, or categories without any implicit or explicit ordering of the labels; observations with the same label belong to the same category; the lowest level of measurement; the frequencies or counts of observations belonging to the same category can be obtained.
o Ordinal - data collected are labels or classes with an implied ordering in these labels; the difference between two labels cannot be quantified; a level of measurement higher than nominal; only ordering or ranking can be done on the data.
o Interval - data collected can be ordered or ranked, added and subtracted, but not divided nor multiplied; differences between any two data values can be determined; the unit of measurement is constant (but arbitrary), and the zero point is arbitrary; a level of measurement higher than ordinal.
o Ratio - data collected have all the properties of the interval scale and, in addition, can be multiplied and divided; has a true zero point; the highest level of measurement.

Statistical Software is a specialized computer program used for data management and statistical analysis

 CS Pro - a software package for editing, tabulating, and disseminating data from censuses and
surveys; a public domain software
 SAS - proprietary software that enables users to implement data management, statistical analysis, data mining, forecasting, etc.; a popular statistical software for medical research and the pharmaceutical industry.
 Stata - proprietary software widely used in the fields of economics, sociology, and medicine; executes data management and transformation, parameter estimation, graphics, statistical measure computations, and other related mathematical calculations; in executing the program, time series, statistics, and graphics modules are loaded.
 Minitab - a statistical software package originally intended for teaching statistics; suitable for moderate-size datasets.
 R - a free programming language based on the S programming language; a software environment for statistical computing and graphics.
 ITSM 2000 - permits easy execution of data processing, graphical display, estimation, and diagnostic testing for univariate and multivariate time series models in the time and frequency domains; provides easy-to-use estimation and forecasting tools for spectral analysis; in particular, the dynamic graphics allow the user to instantly see the effect of data transformations and model changes on a wide variety of features such as the sample, residual, and model autocorrelation functions and spectra.
 EViews - offers an extensive array of powerful features for data handling, statistics and econometric analysis, forecasting and simulation, data presentation, and programming.
 IRRISTAT - a set of microcomputer programs designed to assist agricultural researchers in developing experimental layouts and undertaking plot sampling, data collection, data and file management, statistical analysis of data, and presentation of results.
 STAR - freeware developed specifically by the Biometrics and Breeding Informatics group of the Plant Breeding, Genetics and Biotechnology Division of the International Rice Research Institute (IRRI); a computer program for data management and basic statistical analysis of experimental data.
 SPSS (Statistical Package for the Social Sciences) - one of the most widely used programs for statistical analysis in the social sciences.

Advantages:

 User-friendly interface
 Wide array of statistical procedures

Disadvantages:

 Expensive
 License is time limited
 Graphics are less impressive

IBM-SPSS Introduction

Quick Facts about SPSS

 It was invented by Norman H. Nie, C. Hadlai "Tex" Hull, and Dale H. Bent during the 1960s.
 In the 1980s, a version of the software was moved to the personal computer.
 In 2008, the name SPSS was changed to Predictive Analytics Software (PASW).
 A year after, SPSS was acquired by IBM, which renamed the software IBM SPSS Statistics.

Statistical Analysis and procedures we can do with SPSS

 Calculate Descriptive Statistics
 Compute Frequencies
 Compare Means
 Do Tests of Association and Independence
 Create Different Graphs and Charts
 Run Correlation and Regression
 Conduct Analysis of Variance (ANOVA) and many other Statistical Procedures

SPSS Windows

SPSS is divided into 4 main windows:

1. Data Editor Window - this is where you enter the data - divided into 2 views:
o Data View - a spreadsheet-like interface where you enter the data. This is the default
view when opening SPSS.
o Variable View - this is where you define your variables.
2. Output Window - this is where the result is being displayed.
3. Syntax Editor Window - used to run and store SPSS commands; a text editor for syntax composition.
4. Script Window – provides the opportunity to write full-blown programs in a BASIC-like language. The extension of the saved file will be ".sbs".

Entering SPSS data

 Define the variable names. Click the Variable View tab at the bottom of the Data Editor window.
 In the first row of the first column, type origin. Then press the ENTER key. In the second row, type age. Then press ENTER. In the third row, type num_sib. Press ENTER.
 New variables are automatically given a Numeric data type.
 Type – the type of variable.
o Internal formats: Numeric, String (alphanumeric), Date
o Output formats: Comma, Dot, Scientific notation, Dollar, Custom currency
 Width – the number of characters or numerical digits you will be able to enter for a particular variable.
 Decimals – the desired number of decimal places.
 Label – the full name of the variable.
 Values – used to assign value labels, e.g., 1 – Male, 2 – Female.
 Missing – allows you to assign missing values.
 Columns – determines the size of the column display.
 Align – alignment of data in the column.
 Measure – level of measurement of the data/variable.

Columns and Align

 Columns sets the amount of space reserved to display the contents of the variable in Data View;
generally the default value is adequate.
 Align sets whether the contents of the variable appear on the left, centre or right of the cell in
Data View.
 Numeric variables are right-hand justified by default and string variables left-hand justified by
default; the defaults are generally adequate.
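These attributes can also be set with syntax rather than through Variable View. A minimal sketch using the variables above; the sample values, labels, and the missing-value code 999 are hypothetical, added only for illustration:

* Read three variables in free format (hypothetical sample values).
DATA LIST FREE / origin (F1.0) age (F3.0) num_sib (F2.0).
BEGIN DATA
1 21 3
2 19 1
END DATA.
* Attach variable labels, value labels, and a user-missing code.
VARIABLE LABELS origin 'Place of origin' age 'Age in years' num_sib 'Number of siblings'.
VALUE LABELS origin 1 'Urban' 2 'Rural'.
MISSING VALUES age (999).
EXECUTE.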

Interpretation: Measure of Central Tendency

 Mean – the average value
 Median – the middle value of the data set when it is arranged in ascending or descending order
 Mode – the most frequently occurring value(s) in the data set

Measure of Location

 Minimum – smallest observed value in the data
 Maximum – largest observed value in the data

Measure of Dispersion

 Standard deviation – a measure of variability of the data points from the mean value
 Variance – average squared differences of the data points from the mean value
 Range – the simplest measure of variation computed as the difference between the highest and
lowest value of the data set.
Recoding (Transforming) Variables

Sometimes you will want to transform a variable by combining some of its categories or values together.
For example, you may want to change a continuous variable into an ordinal categorical variable, or you
may want to merge the categories of a nominal variable. In SPSS, this type of transform is called
recoding.

In SPSS, there are two basic options for recoding variables:

1. Recode into Different Variables
2. Recode into Same Variables

Each of these options allows you to re-categorize an existing variable. Recode into Different Variables creates a new variable without modifying the original variable, while Recode into Same Variables permanently overwrites the original variable. In general, it is best to recode a variable into a different variable so that you never alter the original data and can easily access the original data if you need to make different changes later on.

Recode into Different Variables

 Recoding into a different variable transforms an original variable into a new variable. That is, the changes do not overwrite the original variable; they are instead applied to a copy of the original variable under a new name.
 To recode into different variables, click Transform > Recode into Different Variables.
 The left column lists all of the variables in your dataset. Select the variable you wish to recode by clicking it. Click the arrow in the center to move the selected variable to the center text box.
Old And New Values

Once you click Old and New Values, a new window where you will specify how to transform the values
will appear.

 Value: Enter a specific numeric code representing an existing category.
 System-missing: Applies to any system-missing values (.).
 System- or user-missing: Applies to any system-missing values (.) or special missing value codes defined by the user in the Variable View window.
 Range: For use with ordered categories or continuous measurements. Enter the lower and upper boundaries that should be coded. The recoded category will include both endpoints, so data values that are exactly equal to the boundaries will be included in that category.
 Range, LOWEST through value: For use with ordered categories or continuous measurements. Recodes all values less than or equal to some number.
 Range, value through HIGHEST: For use with ordered categories or continuous measurements. Recodes all values greater than or equal to some number.
 All other values: Applies to any value not explicitly accounted for by the previous recoding rules. If using this setting, it should be applied last.
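The same recoding rules can be written in syntax. A minimal sketch, assuming a hypothetical continuous variable age recoded into a new three-category variable age_group:

* Recode a continuous variable into a new categorical variable.
RECODE age (LOWEST THRU 29=1) (30 THRU 49=2) (50 THRU HIGHEST=3) INTO age_group.
VALUE LABELS age_group 1 'Under 30' 2 '30-49' 3 '50 and over'.
EXECUTE.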

Recode into Same Variables

 Recoding into the same variable (Transform > Recode into Same Variables) works the same way
as described above, except for that any changes made will permanently alter the original
variable. That is, the original values will be replaced by the recoded values. In general, it is good
practice not to recode into the same variable because it overwrites the original variable. If you
ever needed to use the variable in its original form (or wanted to double-check your steps), that
information would be lost.
Computing Variables

 Sometimes you may need to compute a new variable based on existing information (from other variables) in your data. For example, you may want to:
o Convert the units of a variable from feet to meters
o Use a subject's height and weight to compute their BMI
o Compute a subscale score from items on a survey
o Apply a computation conditionally, so that a new variable is only computed for cases where certain conditions are met

When writing an expression in the Compute Variables dialog window:

 SPSS is not case-sensitive with respect to variable names.
 When specifying the formula for a new variable, you have the option to include or not include spaces after the commas that go between arguments in a function.
 Do not put a period at the end of the expression you enter into the Numeric Expression box.
Computing Variables using Syntax

 You do not necessarily need to use the Compute Variables dialog window in order to compute variables or generate syntax. You can write your own syntax expressions to compute variables (and it is often faster and more convenient to do so!). Syntax expressions can be executed by opening a new Syntax file (File > New > Syntax), entering the commands in the Syntax window, and then pressing the Run button. The general form of the syntax for computing a new (numeric) variable is shown below.
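A minimal sketch of the general form, followed by a hypothetical example (the variable names weight, in kilograms, and height, in centimeters, are assumptions for illustration):

COMPUTE new_variable = numeric expression.
EXECUTE.

* Hypothetical example: BMI from weight (kg) and height (cm).
COMPUTE bmi = weight / (height/100)**2.
EXECUTE.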

Generating Descriptive Statistics

 Consider the data on screening exam scores of 20 freshman applicants each in Science High School and Rural High School.

Null Hypothesis – the conjecture being tested, denoted by Ho. Generally, this is a statement of equality, status quo, or no difference.

Alternative Hypothesis – the complementary statement that will be accepted in the event that the null
hypothesis is rejected. It is denoted by Ha or H1.

2 Types of error:

 Type I error – error in rejecting a true Ho. The probability of committing a Type I error is denoted by α; i.e., α = P[Type I error] = P[reject Ho | Ho is true] = level of significance of a statistical test.
 Type II error – error in accepting a false Ho. The probability of committing a Type II error is denoted by β; i.e., β = P[Type II error] = P[accept Ho | Ho is false].

Test of statistical hypothesis

 Test statistic - a statistic which provides a basis for determining whether to reject Ho in favor of Ha.
 Decision Rule - a rule which specifies the region for which the test statistic leads to the rejection of Ho in favor of Ha.
 Critical Region - the region of values of the test statistic, specified by the decision rule, for which Ho is rejected in favor of Ha.
Test on assumptions

 In most situations, the satisfaction of assumptions for certain parametric methods ensures the validity of the results and the appropriateness of the test employed. It is for this reason that a number of methods have been designed to test certain assumptions of parametric methods.
1. Tests on Equality of Variances
o The assumption of homoskedasticity (equality of variances) is used in ANOVA techniques and regression analysis.
o The assumption of homoskedasticity is necessary for some tests to be valid.
o Bartlett's test makes use of the χ² test.
o It tests whether p populations have equal variances, based on the samples obtained from the p populations.
o Equality of variances is one of the many assumptions in the analysis of experimental data.
o If this assumption does not hold, the F-tests in the analysis of variance are not valid.

Procedure: Analyze > Compare Means > Oneway ANOVA > Options > Homogeneity of Variance Test
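Equivalent syntax, as a minimal sketch (the variable names score and group are hypothetical):

* Levene's homogeneity-of-variances test via One-Way ANOVA.
ONEWAY score BY group
  /STATISTICS HOMOGENEITY.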

2. The Runs Test for Randomness
o Inferential statistics will only be valid if random samples are taken from the population(s) of interest, i.e., successive observations must be independent of each other.
o Tests for randomness are usually based on the sequence or order in which observations were obtained.

Procedure: Analyze > Nonparametric Tests > Legacy Dialogs > RUNS
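Equivalent syntax, as a minimal sketch (the variable name score is hypothetical; the median is the default cut point):

* Runs test for randomness, dichotomizing at the median.
NPAR TESTS /RUNS(MEDIAN)=score.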
3. The One-Sample Test for Normality
o The Shapiro-Wilk test (for N ≤ 2000) or the Kolmogorov-Smirnov (K-S) test (for N > 2000) is used to determine whether the sample data came from a normal distribution or not.
o It compares the sample data against a normal distribution as the basis for saying whether a certain distribution is normal or not.

Procedure: Analyze > Descriptive Statistics > Explore > Plots > Normality plots with tests.
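Equivalent syntax, as a minimal sketch (the variable name score is hypothetical):

* Produces Kolmogorov-Smirnov and Shapiro-Wilk normality tests.
EXAMINE VARIABLES=score
  /PLOT NPPLOT.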

In instances wherein certain assumptions are not satisfied, appropriate transformations and adjustments to the data must be done before parametric methods (e.g., t, Z, or F tests) are employed. Another alternative in such instances is to employ the nonparametric counterpart of the appropriate parametric test.
Nonparametric Statistical tests

 Also called distribution-free statistics.
 No assumptions are made about the precise form of the sampled population.
 Easier to apply.
 Applicable to rank data.
 Usable when two sets of observations come from different populations.
 The only alternative when the sample size is small (n < 25).
 Usable at a specified significance level as stated (whatever the shape of the distribution from which the sample was drawn).
 Lower statistical efficiency.

Parametric statistical tests (e.g., Z, t, F tests) are more powerful than nonparametric tests.
Kruskal-Wallis H Test

 The Kruskal-Wallis H test (sometimes also called the "one-way ANOVA on ranks") is a rank-
based nonparametric test that can be used to determine if there are statistically significant
differences between two or more groups of an independent variable on a continuous or ordinal
dependent variable. It is considered the nonparametric alternative to the one-way ANOVA, and
an extension of the Mann-Whitney U test to allow the comparison of more than two
independent groups.

Assumptions in a Kruskal Wallis Test

There are certain assumptions in the Kruskal-Wallis test:

 It is assumed that the observations in the data set are independent of each other.
 The distribution of the population need not be normal and the variances need not be equal.
 It is assumed that the observations are drawn from the population by the process of random sampling.

Output of the Kruskal-Wallis H Test

You will be presented with the following output (assuming you did not select the Descriptive checkbox in
the "Several Independent Samples: Options" dialogue box):

 The mean rank (i.e., the "Mean Rank" column in the Ranks table) of the Pain Score for each drug
treatment group can be used to compare the effect of the different drug treatments. Whether
these drug treatment groups have different pain scores can be assessed using the Test
Statistics table which presents the result of the Kruskal-Wallis H test. That is, the chi-squared
statistic (the "Chi-Square" row), the degrees of freedom (the "df" row) of the test and the
statistical significance of the test (the "Asymp. Sig." row).

Example: A shoe company wants to know if three groups of workers have different salaries:
Women: 23000, 41000, 54000, 66000, 78000
Men: 45000, 55000, 60000, 70000, 72000
Minorities: 20000, 30000, 34000, 40000, 44000
Test the difference among three groups, using 𝛼 = 0.05.
STEP 1: Identify the null and alternative hypotheses.
Ho: There is no significant difference among the salaries of the three groups of workers.
Ha: There is a significant difference among the salaries of the three groups of workers.
STEP 2: Identify the test procedure

Test Procedure: Kruskal-Wallis Test

STEP 3: Identify the level of significance

𝛼 = 0.05 𝑜𝑟 5%
STEP 4: Write the decision rule.

Decision Rule: Reject Ho if sig < 𝛼; otherwise, fail to reject Ho

Step 5:

sig = 0.035

𝛼 = 0.05

STEP 6: Write the decision.

Decision: Since sig = 0.035 < 𝛼 = 0.05; we reject the Ho.

STEP 7: Write conclusion.

Conclusion: At 𝛼 = 5%, there is a significant difference among the salaries of the three groups of
workers.
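This example can also be run in syntax. A minimal sketch, assuming the salaries are stored in a variable salary and the groups in a variable group coded 1 = Women, 2 = Men, 3 = Minorities (hypothetical names and codes):

* Kruskal-Wallis H test across groups coded 1 through 3.
NPAR TESTS /K-W=salary BY group(1 3).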

Test for Two Related Samples

 The paired sample t-test, sometimes called the dependent sample t-test, is a statistical
procedure used to determine whether the mean difference between two sets of observations is
zero.
 In a paired sample t-test, each subject or entity is measured twice, resulting in pairs of
observations.
 Common applications of the paired sample t-test include case-control studies or repeated-
measures designs.

The Paired Samples t Test compares the means of two measurements taken from the same
individual, object, or related units. These "paired" measurements can represent things like:

 A measurement taken at two different times (e.g., pre-test and post-test score with an
intervention administered between the two time points)
 A measurement taken under two different conditions (e.g., completing a test under a
"control" condition and an "experimental" condition)
 Measurements taken from two halves or sides of a subject or experimental unit (e.g.,
measuring hearing loss in a subject's left and right ears).

The purpose of the test is to determine whether there is statistical evidence that the mean difference
between paired observations is significantly different from zero. The Paired Samples t Test is a
parametric test.
This test is also known as:

 Dependent t Test
 Paired t Test
 Repeated Measures t Test

The variable used in this test is known as:

 Dependent variable, or test variable (continuous), measured at two different times or for
two related conditions or units.

Common Uses

The Paired Samples t Test is commonly used to test the following:

 Statistical difference between two time points


 Statistical difference between two conditions
 Statistical difference between two measurements
 Statistical difference between a matched pair

Data Requirements

Your data must meet the following requirements:

 Dependent variable that is continuous (i.e., interval or ratio level)
o Note: The paired measurements must be recorded in two separate variables.
 Related samples/groups (i.e., dependent observations)
o The subjects in each sample, or group, are the same. This means that the subjects in the first group are also in the second group.
 Random sample of data from the population
 Normal distribution (approximately) of the difference between the paired values
 No outliers in the difference between the two related groups

Hypothesis

The hypotheses can be expressed in two different ways that express the same idea and are
mathematically equivalent:
 H0: µ1 = µ2 ("the paired population means are equal")
H1: µ1 ≠ µ2 ("the paired population means are not equal")
OR
 H0: µ1 - µ2 = 0 ("the difference between the paired population means is equal to 0")
H1: µ1 - µ2 ≠ 0 ("the difference between the paired population means is not 0")
where
 µ1 is the population mean of variable 1, and
 µ2 is the population mean of variable 2.
Test Statistics

The test statistic for the Paired Samples t Test, denoted t, follows the same formula as the one-sample t test:

t = x̄diff / (sdiff / √n)

where:

 x̄diff = sample mean of the differences
 n = sample size (i.e., number of observations)
 sdiff = sample standard deviation of the differences
 sx̄ = estimated standard error of the mean (sdiff / √n)

The calculated t value is then compared to the critical t value with df = n - 1 from the t distribution table for a chosen confidence level. If the calculated t value is greater than the critical t value, then we reject the null hypothesis (and conclude that the means are significantly different).
Data Setup
 Data should include two continuous numeric variables (represented in columns) that will be
used in the analysis.
 The two variables should represent the paired variables for each subject (row). If your data are
arranged differently (e.g., cases represent repeated units/subjects), simply restructure the data
to reflect this format.

How to run a paired samples t test in SPSS?

To run a Paired Samples t Test in SPSS, click Analyze > Compare Means > Paired-Samples T Test.

The Paired-Samples T Test window opens where you will specify the variables to be used in the analysis.
All of the variables in your dataset appear in the list on the left side. Move variables to the right by
selecting them in the list and clicking the blue arrow buttons. You will specify the paired variables in
the Paired Variables area.

 Pair: The “Pair” column represents the number of Paired Samples t Tests to run. You may choose
to run multiple Paired Samples t Tests simultaneously by selecting multiple sets of matched
variables. Each new pair will appear on a new line.
 Variable1: The first variable, representing the first group of matched values. Move the variable
that represents the first group to the right where it will be listed beneath the
“Variable1” column.
 Variable2: The second variable, representing the second group of matched values. Move the
variable that represents the second group to the right where it will be listed beneath
the “Variable2” column.
 Options: Clicking Options will open a window where you can specify the Confidence Interval
Percentage and how the analysis will address Missing Values (i.e., Exclude cases analysis by
analysis or Exclude cases listwise). Click Continue when you are finished making specifications.
Setting the confidence interval percentage does not have any impact on the calculation of the p-value. If
you are only running one paired samples t test, the two "missing values" settings will produce the same
results. There will only be differences if you are running 2 or more paired samples t tests. (This would
look like having two or more rows in the main Paired Samples T Test dialog window.)
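Equivalent syntax, as a minimal sketch (the variable names pre and post are hypothetical):

* Paired samples t test with a 95% confidence interval.
T-TEST PAIRS=pre WITH post (PAIRED)
  /CRITERIA=CI(.95).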
Example:

Research question: Is there a difference in marks following a teaching intervention?

The marks for a group of students before (pre) and after (post) a teaching intervention are recorded below:

Step 1: State the null and Alternative Hypotheses

Ho: There is no significant difference in mean pre- and post-marks

Ha: There is a significant difference in mean pre- and post-marks

Step 2: Identify the test procedure

Test Procedure: Paired Sample T-test

Step 3: Identify the level of significance

a= 0.05

Step 4: Write the Decision Rule

Reject Ho if sig < a; otherwise, fail to reject Ho.

Decision: The null hypothesis is rejected, since p < 0.05 (in fact, p = 0.004).

Conclusion: There is strong evidence (t = 3.231, p = 0.004) that the teaching intervention improves marks. In this data set, it improved marks, on average, by approximately 2 points. Of course, if we were to take other samples of marks, we could get a 'mean paired difference' in marks different from 2.05.

Mann-Whitney U-Test

 The Wilcoxon-Mann-Whitney (WMW) test was proposed by Frank Wilcoxon in 1945 ("Wilcoxon rank sum test") and by Henry Mann and Donald Whitney in 1947 ("Mann-Whitney U test").
 Other names for the Mann-Whitney U Test are the Wilcoxon Rank Sum Test, Mann-Whitney-Wilcoxon (MWW) test, or Wilcoxon-Mann-Whitney test.
 The Mann-Whitney U Test is a non-parametric test used to compare differences between two
independent groups when the dependent variable is either ordinal or continuous, but not
normally distributed.
 Is a non-parametric test of the null hypothesis that, for randomly selected values X and Y from
two populations, the probability of X being greater than Y is equal to the probability of Y being
greater than X.
 For example, you could use the Mann-Whitney U test to understand whether attitudes towards
pay discrimination, where attitudes are measured on an ordinal scale, differ based on gender
(i.e., your dependent variable would be "attitudes towards pay discrimination" and your
independent variable would be "gender", which has two groups: "male" and "female").
 Alternately, you could use the Mann-Whitney U test to understand whether salaries, measured
on a continuous scale, differed based on educational level (i.e., your dependent variable would
be "salary" and your independent variable would be "educational level", which has two groups:
"high school" and "university").
 The Mann-Whitney U test is often considered the nonparametric alternative to the independent
t-test.

Assumptions:

 Assumption #1: Your dependent variable should be measured at the ordinal or continuous level.
o Examples of ordinal variables include Likert items (e.g., a 7-point scale from "strongly agree" through to "strongly disagree"), amongst other ways of ranking categories (e.g., a 5-point scale explaining how much a customer liked a product, ranging from "Not very much" to "Yes, a lot").
o Examples of continuous variables include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.
 Assumption #2: Your independent variable should consist of two categorical, independent groups.
o Examples of independent variables that meet this criterion include gender (2 groups: male or female), employment status (2 groups: employed or unemployed), smoker (2 groups: yes or no), and so forth.
 Assumption #3: You should have independence of observations, which means that there is no
relationship between the observations in each group or between the groups themselves.
o For example, there must be different participants in each group with no participant
being in more than one group. This is more of a study design issue than something you
can test for, but it is an important assumption of the Mann-Whitney U test. If your study
fails this assumption, you will need to use another statistical test instead of the Mann-
Whitney U test (e.g., a Wilcoxon signed-rank test).
 Assumption # 4: A Mann-Whitney U test can be used when your two variables are not normally
distributed.
o However, in order to know how to interpret the results from a Mann-Whitney U test,
you have to determine whether your two distributions (i.e., the distribution of scores
for both groups of the independent variable; for example, 'males' and 'females' for the
independent variable, 'gender') have the same shape.
Test Procedure in SPSS Statistics

1. Click Analyze -> Nonparametric Tests -> Legacy Dialogs -> 2 Independent Samples.
2. Drag and drop the dependent variable into the Test Variable(s) box, and the grouping variable
into the Grouping Variable box.
3. Tick Mann-Whitney U under Test Type.
4. Click on Define Groups, and input the values that define each of the groups that make up the
grouping variable (i.e., the coded value for Group 1 and the coded value for Group 2).
5. Press Continue, and then click on OK to run the test.
6. The result will appear in the SPSS Output window.
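Equivalent syntax, as a minimal sketch (the variable names score and group, and the group codes 1 and 2, are hypothetical):

* Mann-Whitney U test for two independent groups.
NPAR TESTS /M-W=score BY group(1 2).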

Assumption of Normality

 Given this setup, it would be usual to conduct an independent samples t test. One assumption of
this parametric test is that data is normally distributed. The trouble is if we test our data for
normality, we get this result.
 Both Kolmogorov-Smirnov and Shapiro-Wilk suggest that our dependent variable is not
distributed normally. This is confirmed by the histogram, which has a long left tail.
 This means we’re better off using a non-parametric test to determine whether there is a
relationship between our independent and dependent variables (though, actually, since we have
a large number of observations, we’d probably get away with the t test). The obvious choice
here is the Mann-Whitney U test.

Example:

Consider a Phase II clinical trial designed to investigate the effectiveness of a new drug to reduce
symptoms of asthma in children. A total of n=10 participants are randomized to receive either the new
drug or a placebo. Participants are asked to record the number of episodes of shortness of breath over a
1 week period following receipt of the assigned treatment. The data are shown below. Is there a
difference in the number of episodes of shortness of breath over a 1 week period in participants
receiving the new drug as compared to those receiving the placebo?

Solution Exercise 1.

1. State the null and alternative hypotheses.

Ho: There is no difference in the number of episodes of shortness of breath over a 1-week period in participants receiving the new drug as compared to those receiving the placebo.

H1: There is a difference in the number of episodes of shortness of breath over a 1-week period in participants receiving the new drug as compared to those receiving the placebo.
In symbols,

Ho: μ1 = μ2

H1: μ1 ≠ μ2

2. Determine if the hypotheses are one- or two-tailed.

These hypotheses are two-tailed, as the null is written with an equals sign.
3. Select the appropriate test statistic.

Test Statistic: Mann-Whitney U Test

4. Specify the level of significance

𝛼 = 0.01 or 1%

5. Write the Decision Rule

Reject Ho if the sig < α; otherwise, fail to reject Ho.

The final section of the output gives the values of the Mann-Whitney U test (and several other tests as
well.) In this example, the Mann-Whitney U value is 3.000. There are two p values given -- one on the
row labeled Asymp. Sig (2-Tailed) and the other on the row labeled Exact Sig. [2*(1- tailed Sig.)].
Typically, we will use the exact significance, although if the sample size is large, the asymptotic
significance value can be used to gain a little statistical power.

Sig = .056

𝛼 = 0.01

7. Decision:

Since Sig = .056 > α = 0.01; we fail to reject Ho.

8. Conclusion:

At α = 0.01, there is no difference in the number of episodes of shortness of breath over a 1-week period between participants receiving the new drug and those receiving the placebo.

In other words, the result of the Mann-Whitney U test supports the proposition that the effect of using the new drug is the same as the effect of using the placebo in reducing symptoms of asthma among children.

Pearson Product Moment Correlation Coefficient

What is Pearson Correlation?

The bivariate Pearson Correlation produces a sample correlation coefficient, r, which measures
the strength and direction of linear relationships between pairs of continuous variables. By extension,
the Pearson Correlation evaluates whether there is statistical evidence for a linear relationship among
the same pairs of variables in the population, represented by a population correlation coefficient, ρ
(“rho”). The Pearson Correlation is a parametric measure.

This measure is also known as:

 Pearson’s correlation
 Pearson Product-Moment Correlation (PPMC)
Common Uses

Common Applications: Exploring the (linear) relationship between 2 variables; e.g., as variable A increases, does variable B increase or decrease? The relationship is measured by a quantity called correlation.

The bivariate Pearson correlation indicates the following:

 Whether a statistically significant linear relationship exists between two continuous variables
 The strength of a linear relationship (i.e., how close the relationship is to being a perfectly
straight line)
 The direction of a linear relationship (increasing or decreasing)

Direction

 The sign of the correlation coefficient indicates the direction of the relationship.
 + (direct relationship) - when one variable increases the other one also increases, or when one variable decreases the other one also decreases
 - (inverse relationship) - when one variable increases the other one decreases, and vice versa
 -1: perfectly negative linear relationship
 0: no relationship
 +1: perfectly positive linear relationship

A coefficient close to 1 means a strong positive association between the two variables, and a coefficient close to -1 means a strong negative association between the two variables.

You have to be careful with the following matters:

 Association does not mean necessarily a causal relation between both variables. For example,
there might be a third variable you have not considered and this third variable might be the
explanation for the behavior of the other two.
 Even if there is a causal relationship between the variables, the correlation coefficient does not
tell you which variable is the cause and which is the effect.
 If the coefficient is close to 0, it does not necessarily mean that there is no relation between the
two variables. It means there is not a LINEAR relationship, but there might be another type of
functional relationship (for example, quadratic or exponential).

Degree

Magnitude/strength is determined from the correlation coefficient (absolute value):

 0.00 - 0.19 = very weak correlation/relationship
 0.20 - 0.39 = weak correlation
 0.40 - 0.59 = moderate correlation
 0.60 - 0.79 = strong correlation
 0.80 - 1.00 = very strong correlation
Significance (determined by the sig. value)

 The scatterplots below show correlations that are r = +0.90, r = 0.00, and r = -0.90, respectively. The strength of the nonzero correlations is the same, but the direction of the correlations is different: a negative correlation corresponds to a decreasing relationship, while a positive correlation corresponds to an increasing relationship.

Note: The bivariate Pearson Correlation cannot address non-linear relationships or relationships among
categorical variables. If you wish to understand relationships that involve categorical variables and/or
non-linear relationships, you will need to choose another measure of association.

The bivariate Pearson Correlation only reveals associations among continuous variables. The
bivariate Pearson Correlation does not provide any inferences about causation, no matter how large the
correlation coefficient is.

Data Requirements:

To use Pearson correlation, your data must meet the following requirements:

1. Two or more continuous variables (i.e., interval or ratio level)
2. Cases must have non-missing values on both variables
3. Linear relationship between the variables
3. Linear relationship between the variables
4. Independent cases (i.e., independence of observations)
i. the values for all variables across cases are unrelated
ii. for any case, the value for any variable cannot influence the value of any variable for
other cases
5. Bivariate normality
i. Each pair of variables is bivariately normally distributed
ii. Each pair of variables is bivariately normally distributed at all levels of the other
variable(s)
iii. This assumption ensures that the variables are linearly related; violations of this
assumption may indicate that non-linear relationships among variables exist. Linearity
can be assessed visually using a scatterplot of the data.
6. Random sample of data from the population
7. No outliers

 To run the bivariate Pearson Correlation in SPSS, click Analyze > Correlate > Bivariate. The Bivariate Correlations window opens, where you will specify the variables to be used in the analysis. All of the variables in your dataset appear in the list on the left side. To select variables for the analysis, select the variables in the list on the left and click the blue arrow button to move them to the right, into the Variables field.

Example:

Calcium_Intake.sav shows the calcium intake of Special Program in Sports (SPS) students and their knowledge about calcium. Is there a relationship between calcium intake and knowledge about calcium of SPS students at α = 0.05?

Steps:

Ho: There is no correlation/association between calcium intake and knowledge about calcium of SPS students.
Ho: ρ = 0
Ha: There is a correlation/association between calcium intake and knowledge about calcium of SPS students.
Ha: ρ ≠ 0
Test Procedure: Pearson Product-Moment Correlation

a = 5%

Computations: ANALYZE>CORRELATE>BIVARIATE

Sig=0.002

r=.533 (.40 - .59 = moderate correlation)

Decision: Since sig = 0.002 < α = 0.05, we reject Ho.

Conclusion: There is a correlation/association between calcium intake and knowledge about calcium of SPS students.

Based on the results, we can state the following:

 Calcium intake and knowledge about calcium were found to be moderately positively
correlated, r(28) = .533, p = .002.
 Among the SPS students, the calcium intake and knowledge about calcium were moderately
positively correlated, r(28) = .533, p < .05.

Conclusion:

 We can conclude that for SPS students there is evidence that knowledge about calcium is related
to calcium intake. In particular, it seems that the more the SPS students know about calcium, the
greater their calcium intake is (r = 0.53, p = .002).
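The analysis in this example can also be run in syntax. A minimal sketch (the variable names calcium_intake and knowledge are hypothetical):

* Bivariate Pearson correlation with two-tailed significance.
CORRELATIONS
  /VARIABLES=calcium_intake knowledge
  /PRINT=TWOTAIL NOSIG.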

One-Way Analysis of Variance

 ANOVA was developed by Ronald Fisher in 1918 and is an extension of the t and z tests. Before ANOVA, the t-test and z-test were commonly used, but the problem with the t-test is that it cannot be applied to more than two groups. To address this, Fisher developed the test called the analysis of variance.

What is a One-Way ANOVA?

 One-Way ANOVA (Analysis of Variance) compares the means of two or more independent
groups in order to determine whether there is statistical evidence that the associated population
means are significantly different. One-Way ANOVA is a parametric test.

This test is also known as:

 One-Factor ANOVA
 One-Way Analysis of Variance
 Between Subjects ANOVA

The variables used in this test are known as:

 Dependent variable
 Independent variable (also known as the grouping variable, or factor)
 This variable divides cases into two or more mutually exclusive levels, or groups
The One-Way ANOVA is often used to analyze data from the following types of studies:

 Field studies
 Experiments
 Quasi-experiments

The One-Way ANOVA is commonly used to test the statistical differences among the means of two or
more groups or interventions

When to use a One-Way ANOVA?

 Use a one-way ANOVA when you have collected data about one categorical independent
variable and one quantitative dependent variable. The independent variable should have at least
three levels (i.e. at least three different groups or categories). ANOVA tells you if the dependent
variable changes according to the level of the independent variable. If you only want to compare
two groups, use a t-test instead.

Your data must satisfy the following assumptions:

1. Dependent variable that is continuous (i.e., interval or ratio level)
2. Independent variable that is categorical (nominal or ordinal), with three or more groups
3. Independent samples/groups (i.e., independence of observations). There is no relationship between the subjects in each sample. This means that:
o subjects in the first group cannot also be in the second group
o no subject in either group can influence subjects in the other group
o no group can influence the other group
4. Random sample of data from the population
PROCEDURE: Analyze > Nonparametric Tests > Legacy Dialogs > RUNS
5. Normal distribution (approximately) of the dependent variable for each group (i.e., for each level of the factor)
PROCEDURE: Analyze > Descriptive Statistics > Explore > Plots > Normality plots with tests
6. Homogeneity of variances (i.e., variances approximately equal across groups; group variances are homogeneous)
PROCEDURE: Analyze > Compare Means > One-Way ANOVA > Options > Homogeneity of Variance Test
7. No outliers

Note: When the normality, homogeneity of variances, or outliers assumptions for One-Way ANOVA are
not met, you may want to run the nonparametric Kruskal-Wallis test instead.
How to Run a One-Way ANOVA

 To run a One-Way ANOVA in SPSS, click Analyze > Compare Means > One-Way ANOVA.
 The One-Way ANOVA window opens, where you will specify the variables to be used in the
analysis. All of the variables in your dataset appear in the list on the left side. Move variables to
the right by selecting them in the list and clicking the blue arrow buttons. You can move a
variable(s) to either of two areas: Dependent List or Factor.

 When the initial F test indicates that significant differences exist between group means,
contrasts are useful for determining which specific means are significantly different when you
have specific hypotheses that you wish to test. Contrasts are decided before analyzing the data
(i.e., a priori). Contrasts break down the variance into component parts. They may involve using
weights, non-orthogonal comparisons, standard contrasts, and polynomial contrasts (trend
analysis).
When the initial F test indicates that significant differences exist between group means, post hoc tests
are useful for determining which specific means are significantly different when you do not have specific
hypotheses that you wish to test. Post hoc tests compare each pair of means (like t-tests), but unlike t-
tests, they correct the significance estimate to account for the multiple comparisons.
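A minimal syntax sketch tying these pieces together (the variable names math_score and grade_level are hypothetical; Tukey is shown as one common post hoc choice):

* One-Way ANOVA with Levene's test and Tukey post hoc comparisons.
ONEWAY math_score BY grade_level
  /STATISTICS HOMOGENEITY
  /POSTHOC=TUKEY ALPHA(0.05).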

Example:

Math_Performance.sav data show the Mathematics performance of the Junior High School students.
Test the differences among the four grade levels, using a = 0.05.

Before proceeding to the parametric One-Way ANOVA test, perform the tests on the assumptions of normality, randomness, and homoscedasticity.

TEST ON NORMALITY

STEPS:

Test of Hypothesis:

Ho: The distribution of data is normal.

Ha: The distribution of data is not normal.

Test Procedure: SHAPIRO-WILK TEST FOR NORMALITY

a = 5%

Computations:

SPSS PROCEDURE: ANALYZE > DESCRIPTIVE STATISTICS > EXPLORE > PLOTS > NORMALITY PLOTS WITH TESTS

Sig = 0.521 (Grade 7)

Sig = 0.592 (Grade 8)

Sig = 0.350 (Grade 9)

Sig = 0.514 (Grade 10)


a = 0.05

Decision: Since sig > a = 0.05; we fail to reject Ho.

Conclusion: At a = 5%, the distribution of data is normal.

TEST ON RANDOMNESS

STEPS:

Test of Hypothesis:

Ho: The sequence of observations is random.

Ha: The sequence of observations is not random.

Test Procedure:

Runs Test for Randomness

a = 5%

Computations:

SPSS PROCEDURE: ANALYZE > NONPARAMETRIC TESTS > LEGACY DIALOGS > RUNS

Sig = 0.208

a = 0.05

Decision: Since sig = 0.208 > a = 0.05; we fail to reject Ho.

Conclusion: At a = 5%, the sequence of observations is random.

TEST ON HOMOSCEDASTICITY (the Homogeneity of Variance Test is incorporated in the One-Way ANOVA Test Options)

STEPS:

Test of Hypothesis:

Ho: The variances in Mathematics performance among grade levels are equal.

Ha: The variances in Mathematics performance among grade levels are not equal.

Test Procedure: HOMOGENEITY OF VARIANCE TEST (Levene's Test)

a = 5%

Computations:

SPSS PROCEDURE: ANALYZE > COMPARE MEANS > ONEWAY ANOVA > OPTIONS > HOMOGENEITY OF VARIANCE TEST

Sig = 0.271

a = 0.05

Decision: Since sig = 0.271 > a = 0.05; we fail to reject Ho.

Conclusion: At a = 5%, the variances in Mathematics performance among grade levels are equal.

Since the data have met the tests on assumptions, proceed to the One-Way ANOVA table.

Steps:

1. Ho: There is no significant difference in the Mathematics performance of students among grade levels.

Ha: There is a significant difference in the Mathematics performance of students among grade levels.

2. Test Procedure: F-test (One-Way ANOVA)

3. a = 5%

4. Computations: ANALYZE > COMPARE MEANS > ONE-WAY ANOVA

Sig = 0.228

a = 0.05

5. Decision: Since sig = 0.228 > a = 0.05; we fail to reject Ho.

6. Conclusion: There is no significant difference in the Mathematics performance of students among grade levels.

Note: The ANOVA alone does not tell us specifically which means were different from one another. To
determine that, we would need to follow up with multiple comparisons (or post-hoc) tests. They are
typically only conducted (interpreted) after a significant ANOVA. Post hoc tests are used to dive in and
look for differences between groups, testing each possible pair of groups.
TEST FOR ONE-SAMPLE CASE

One Sample Z-Test

 One-Sample Z test is performed when we want to compare a sample mean with the population
mean.

One Sample T-Test

 One-Sample t-test is performed when we want to compare a sample mean with the population
mean. The difference from the Z Test is that we do not have the information on Population
Variance here. We use the sample standard deviation instead of population standard deviation
in this case.

HOW TO RUN A ONE SAMPLE TEST IN SPSS?

 To run a One Sample Test in SPSS, click Analyze > Compare Means > One-Sample T Test.
 The One-Sample T Test window opens where you will specify the variables to be used in the
analysis. All of the variables in your dataset appear in the list on the left side. Move variables to
the Test Variable(s) area by selecting them in the list and clicking the arrow button.
 Test Variable(s): The variable whose mean will be compared to the hypothesized population mean (i.e., Test Value). You may run multiple One-Sample t Tests simultaneously by selecting more than one test variable. Each variable will be compared to the same Test Value.
 Test Value: The hypothesized population mean against which your test variable(s) will be
compared.
 Options: Clicking Options will open a window where you can specify the Confidence Interval
Percentage and how the analysis will address Missing Values (i.e., Exclude cases analysis by
analysis or Exclude cases listwise). Click Continue when you are finished making specifications.

THE OUTPUT OF ONE SAMPLE TEST IN SPSS

 The first section, One-Sample Statistics, provides basic information about the selected variable, including the valid (non-missing) sample size (n), mean, standard deviation, and standard error. In this example, the mean height of the sample is 68.03 inches, which is based on 408 non-missing observations.
 The second section, One-Sample Test, displays the results most relevant to the One-Sample t Test.
 Test Value: The number we entered as the test value in the One-Sample T Test window.
 t Statistic: The test statistic of the one-sample t test, denoted t. In this example, t = 5.810. Note that t is calculated by dividing the mean difference by the standard error of the mean (from the One-Sample Statistics box).
 df: The degrees of freedom for the test. For a one-sample t test, df = n - 1; so here, df = 408 - 1 = 407.
 Sig. (2-tailed): The two-tailed p-value corresponding to the test statistic.
 Mean Difference: The difference between the "observed" sample mean (from the One-Sample Statistics box) and the "expected" mean (the specified test value). The sign of the mean difference corresponds to the sign of the t value. The positive t value in this example indicates that the mean height of the sample is greater than the hypothesized value (66.5).
 Confidence Interval for the Difference: The confidence interval for the difference between the specified test value and the sample mean.

Example:

Problem: A certain brand of powdered milk is advertised as having a net weight of 250 grams. If the net weights of a random sample of cans are 256, 248, 242, 245, 246, 248, 250, 255, 243, and 249 grams, can it be concluded that the average net weight of the cans is not equal to the advertised amount? Use α = 0.05 and assume that the net weight of this brand of powdered milk is normally distributed.

Step 1: State the null and Alternative Hypotheses

Ho: 𝜇 = 250 𝑔𝑟𝑎𝑚𝑠


Ha: 𝜇 ≠ 250 𝑔𝑟𝑎𝑚𝑠
Step 2: Identify the test procedure

Test Procedure: One Sample t-test

Step 3: Identify the level of significance

𝛼 = 0.05 𝑜𝑟 5%
Step 4: Write the Decision Rule

Reject Ho if the sig < 𝛼; Otherwise, fail to reject Ho

Step 5: Solve

Procedure: Analyze → Compare Means → One Sample T-test

Decision: Since sig = 0.249 > α = 0.05, we fail to reject Ho.

Conclusion: At α = 5%, the data provide evidence to say that the average net weight of the cans is equal to the advertised amount, which is 250 grams.
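Equivalent syntax for this example, as a minimal sketch (the variable name net_weight is hypothetical):

* One-sample t test against the advertised value of 250 grams.
T-TEST
  /TESTVAL=250
  /VARIABLES=net_weight
  /CRITERIA=CI(.95).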

WAYS TO DETERMINE WHETHER A MEAN IS STATISTICALLY DIFFERENT FROM ANOTHER MEAN

1. The t-value is larger than the critical value.
2. The p-value is less than .05.
3. The 95% Confidence Interval of the Difference does not include 0.

In each of these cases, we reject the null hypothesis.

Mean
Mean = ΣX / N

Where:

 ΣX = sum of all the values in the dataset
 N = total number of values in the dataset

For example, if you have the numbers 5, 10, 15, the mean is:

Mean = (5 + 10 + 15) / 3 = 30 / 3 = 10
Mode

The mode is the value that appears most frequently in a data set. It represents the most common or
repeated number.

Examples:

1. For Ungrouped Data:

o Data set: {2, 3, 3, 5, 7, 8, 3}

o The mode is 3 because it appears the most times.

Median

The median is the middle value of a data set when it is arranged in ascending order (from smallest to
largest). It divides the data into two equal halves.

How to Find the Median:

1. For an Odd Number of Values:

o The median is the middle number.
o Example: {3, 7, 9, 12, 15}
 Middle value = 9 (Median = 9).

2. For an Even Number of Values:

o The median is the average (mean) of the two middle numbers.
o Example: {2, 5, 7, 10, 12, 15}
 Two middle values = 7 and 10
 Median = (7 + 10) / 2 = 8.5
Standard Deviation

 The standard deviation (SD) is a measure of how spread out the numbers in a data set are. A low
standard deviation means the values are close to the mean, while a high standard deviation
means the values are more spread out.

Formula for Standard Deviation

Population Standard Deviation (σ):

 If you are considering the entire population, the formula is:

σ = √[ Σ(X − μ)² / N ]

Where:

 σ = population standard deviation
 X = each data point
 μ = population mean
 N = total number of values
 Σ = summation symbol

Sample Standard Deviation (s):

 If you are working with a sample from a population, the formula is:

s = √[ Σ(X − X̄)² / (n − 1) ]

Where:

 s = sample standard deviation
 X = each data point
 X̄ = sample mean
 n = number of values in the sample
 Σ = summation symbol

The key difference is that for a sample, we divide by n − 1 instead of N to account for bias in estimating the population standard deviation.

Range

The range is the difference between the highest and lowest values in a dataset. It measures the spread
of the data.

Variance

Variance measures how spread out the data is from the mean. It is the average of the squared
differences from the mean.
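A quick worked example (illustrative numbers, not from the original data): for the data set {2, 4, 6}, the range is 6 − 2 = 4, the mean is 4, and the population variance is ((2 − 4)² + (4 − 4)² + (6 − 4)²) / 3 = 8 / 3 ≈ 2.67.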

Skewness

Interpretation of Skewness:

 Skewness = 0 → Data is symmetrical (e.g., normal distribution).
 Skewness > 0 → Data is positively skewed (right-skewed, tail on the right).
 Skewness < 0 → Data is negatively skewed (left-skewed, tail on the left).

Types of Kurtosis:

 Mesokurtic (K = 3) → Normal distribution (e.g., bell curve).
 Leptokurtic (K > 3) → High kurtosis, more peaked, with heavy tails (more outliers).
 Platykurtic (K < 3) → Low kurtosis, flatter peak, with light tails (fewer outliers).
| Tool | Number of Variables | Parametric/Non-parametric | Assumptions | When to Use | Hint Phrases |
| --- | --- | --- | --- | --- | --- |
| One-Sample T-Test | 1 (continuous) | Parametric | NO | Compare the mean of one group to a known value or known population mean. | "Is this group different from a known value?" |
| Paired T-Test | 2 (related samples) | Parametric | YES | Compare the means of two related groups (e.g., before and after treatment). | "Pre-test and post-test" or "Same group comparison" |
| Independent T-Test | 2 (independent groups) | Parametric | YES | Compare the means of two independent groups. | "Two groups, compare their means" |
| One-Way ANOVA | 1 (continuous) & 1 (categorical with >2 levels) | Parametric | YES | Compare means across more than two independent groups. | "Multiple groups, compare their means" |
| Kruskal-Wallis Test | 1 (continuous) & 1 (categorical with >2 levels) | Non-Parametric | NO | Compare medians across more than two independent groups when the assumptions of ANOVA are not met, or the data is ordinal. | "Non-parametric version of ANOVA" |
| Mann-Whitney U-Test | 1 (continuous) & 1 (categorical with 2 levels) | Non-Parametric | NO | Compare medians of two independent groups when the assumptions of the T-Test are not met. | "Non-parametric version of Independent T-Test" |
 NON-PARAMETRIC = sample sizes of 25 and below
 PARAMETRIC = must meet the assumptions; sample sizes of 25 and above
 Homogeneity of Variances = Analyze > Compare Means > One-way ANOVA > Options >
Homogeneity of Variance Test
 Run’s Test for Randomness = Analyze > Nonparametric Tests > Legacy Dialogs > RUNS
 Normality = Analyze > Descriptive Statistics > Explore > Plots > Normality plots with tests.
 One Sample T-Test = Analyze > Compare Means > One-Sample T-Test
 Paired T-Test = Analyze > Compare Means > Paired-Sample T-Test
 Independent T-Test = Analyze > Compare Means > Independent-Sample T-Test
 One-Way ANOVA = Analyze > Compare Means > One-Way ANOVA
 Kruskal-Wallis Test = Analyze > Nonparametric Tests > Legacy Dialogs > K-Independent Samples,
Kruskal-Wallis H
 Mann-Whitney U-Test = Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples, Mann-Whitney U
 Shapiro-Wilk Test = N ≤ 2000
 Kolmogorov-Smirnov Test = N > 2000
 Reporting format: F(2, 21) = 14.518, p = 0.00, wherein 2 = between-groups df, 21 = within-groups df, 14.518 = F value, and p = 0.00 (two-tailed significance value)
1. One-Sample T-Test
A researcher wants to test if the average height of a group of 30 college students is equal to 170
cm. The sample of students has the following heights (in cm): [list of heights].
2. Paired-T-Test
A researcher wants to determine whether a new study method improves students' test scores. The
scores of 25 students were recorded before and after they used the new study method.

3. Independent T-Test
A teacher wants to test whether male students perform better than female students in a particular
subject. The test scores of 50 male and 50 female students are recorded.

4. One-Way ANOVA
A researcher wants to compare the test scores of students in three different teaching methods:
traditional lecture, online learning, and blended learning. The test scores of 40 students are
recorded for each method.

5. Kruskal-Wallis Test
A researcher wants to compare customer satisfaction ratings across three different stores.
Satisfaction ratings are given on a 1-5 scale by 30 customers per store.

6. Mann-Whitney U-Test
A researcher wants to compare the income levels of two cities: City A and City B. The incomes
of 100 residents from each city are recorded, but the data is not normally distributed.

If sig is greater than 0.05, we fail to reject the Ho: there is no significant difference. If sig is less than 0.05, we reject the Ho: there is a significant difference.
