Psych Stats Reviewer
Statistics in the singular sense is a science which deals with the collection, organization, presentation,
analysis, and interpretation of data; it is the study of variation. In the plural sense, statistics are actual
numbers derived from data: a collection of facts and figures, i.e., processed data (e.g., population
statistics, statistics on births, statistics on enrollment).
Types of data
Primary data are acquired directly from the source. Ex: data obtained by measuring the weight of 500
one-day-old chicks from Farm XYZ.
Secondary data are data not taken directly from the source (non-primary data). Ex: Phil. rice production
(tons/ha) data by province from 1990-2014 taken from publications of the Phil. Bureau of Agricultural Statistics.
Categories of Statistics
Scope of Statistics
Role of Statistics
A tool for data analysis (e.g. standard drug vs. new drug…. which is more effective?)
Opinion poll survey (Do you think Philippines is ready for ASEAN integration 2015?)
A universe may be of 2 types:
Finite – when the elements of the universe can be counted for a given time period.
Infinite – when the number of elements of the universe is unlimited.
Variable – a characteristic of interest that is measurable or observable on each and every individual of the
universe.
The measurement of a variable determines the amount of information that can be processed to
answer research objectives of a study. The scale of measurement of the variable determines the
algebraic operations that can be performed and the statistical tools that can be applied to
analyze the data. There are four scales or levels of measurement:
o Nominal - data collected are simply labels, names, or categories without any implicit
or explicit ordering of the labels. Observations with the same label belong to the same
category; this is the lowest level of measurement, and the frequencies or counts of
observations belonging to the same category can be obtained.
o Ordinal - data collected are labels or classes with an implied ordering in these labels;
the difference between two labels cannot be quantified; a level of measure higher than
nominal; only ordering or ranking can be done on the data;
o Interval - data collected can be ordered or ranked, added and subtracted, but not
divided nor multiplied; differences between any two data values can be determined; the
unit of measurement is constant (but arbitrary), and the zero point is arbitrary; a level of
measurement higher than ordinal.
o Ratio - data collected has all the properties of the interval scale and in addition, can be
multiplied and divided; has a true zero point; is the highest level of measurement.
Statistical Software is a specialized computer program used for data management and statistical analysis
CS Pro - a software package for editing, tabulating, and disseminating data from censuses and
surveys; a public domain software
SAS - a proprietary software that enables users to implement data management, statistical
analysis, data mining, forecasting, etc.; a popular statistical software for medical research and
the pharmaceutical industry.
Stata - a proprietary software widely used in the fields of economics, sociology, and medicine;
executes data management and transformation, parameter estimation, graphics, statistical
measure computations, and other related mathematical calculations; when the program is
executed, the time series, statistics, and graphics modules are loaded.
Minitab - a statistical software package originally intended for teaching statistics; suitable for
moderate-size datasets.
R - a free software programming language based on the S programming language; a software
environment for statistical computing and graphics.
ITSM 2000 - permits easy execution of data processing, graphical display, estimation, and
diagnostic testing for univariate and multivariate time series models in the time and frequency
domains; provides easy-to-use estimation and forecasting tools for spectral analysis; in particular,
the dynamic graphics allow the user to instantly see the effect of data transformations and
model changes on a wide variety of features such as the sample, residual, and model
autocorrelation functions and spectra.
E-views offers an extensive array of powerful features for data handling, statistics and
econometric analysis, forecasting and simulation, data presentation, and programming.
IRRISTAT - a set of microcomputer programs designed to assist agricultural researchers in
developing experimental lay-outs and undertaking plot sampling, data collection, data and file
management, statistical analysis of data and presentation of results.
STAR - freeware developed by the Biometrics and Breeding Informatics group of the Plant Breeding,
Genetics and Biotechnology Division of the International Rice Research Institute (IRRI); a computer
program for data management and basic statistical analysis of experimental data.
SPSS (Statistical Package for the Social Sciences) - one of the most widely used programs for statistical
analysis in the social sciences.
Advantages:
User-friendly interface
Wide array of statistical procedures
Disadvantages:
Expensive
License is time limited
Graphics are less impressive
IBM-SPSS Introduction
It was developed by Norman H. Nie, C. Hadlai "Tex" Hull, and Dale H. Bent during the 1960s.
In the 1980s, the software was moved to the personal computer.
In 2008, the name SPSS was changed to PASW (Predictive Analytics Software).
A year later, SPSS was acquired by IBM, which renamed the software IBM SPSS Statistics.
SPSS Windows
1. Data Editor Window - this is where you enter the data - divided into 2 views:
o Data View - a spreadsheet-like interface where you enter the data. This is the default
view when opening SPSS.
o Variable View - this is where you define your variables.
2. Output Window - this is where the results are displayed.
3. Syntax Editor Window - used to compose, run, and store SPSS commands.
4. Script Window – provides the opportunity to write full-blown programs in a BASIC-like
language; a text editor for script composition. The extension of the saved file is ".sbs".
Define the variable names. Click the Variable View tab at the bottom of the Data editor window.
In the first row of the first column, type origin. Then press ENTER key. In the second row, type
age. Then ENTER. In the third row, type num_sib. Press ENTER.
New variables are automatically given a Numeric data type.
Type – the type of variable.
o Internal formats: Numeric, String (alphanumeric), Date
o Output formats: Comma, Dot, Scientific notation, Dollar, Custom currency
Width – the number of characters or digits you will be able to enter for a particular
variable.
Decimals – the desired number of decimal places.
Label – the full, descriptive name of the variable.
Values – used to assign value labels to codes, e.g. 1 – Male, 2 – Female.
Missing – allows you to assign user-defined missing values.
Columns – determines the width of the column display.
Align – the alignment of data in the column.
Measure – the level of measurement of the data/variable.
Columns sets the amount of space reserved to display the contents of the variable in Data View;
generally the default value is adequate.
Align sets whether the contents of the variable appear on the left, centre or right of the cell in
Data View.
Numeric variables are right-hand justified by default and string variables left-hand justified by
default; the defaults are generally adequate.
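The same variable properties can also be set from the Syntax Editor window. The sketch below is illustrative only; the variable names, labels, codes, and missing-value code are assumptions rather than values from any particular dataset.
* Illustrative sketch: setting variable properties in syntax (names, labels, and codes are assumptions).
VARIABLE LABELS origin 'Place of origin' age 'Age in years' num_sib 'Number of siblings'.
VALUE LABELS origin 1 'Rural' 2 'Urban'.
MISSING VALUES age (999).
VARIABLE LEVEL origin (NOMINAL) age (SCALE) num_sib (SCALE).
FORMATS age (F3.0).
EXECUTE.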
Measure of Location
Measure of Dispersion
Standard deviation – a measure of variability of the data points from the mean value
Variance – average squared differences of the data points from the mean value
Range – the simplest measure of variation computed as the difference between the highest and
lowest value of the data set.
Recoding (Transforming) Variables
Sometimes you will want to transform a variable by combining some of its categories or values together.
For example, you may want to change a continuous variable into an ordinal categorical variable, or you
may want to merge the categories of a nominal variable. In SPSS, this type of transform is called
recoding.
Each of these options allows you to re-categorize an existing variable. Recode into Different Variables
create a new variable without modifying the original variable, while Recode into Same Variables will
permanently overwrite the original variable. In general, it is best to recode a variable into a different
variable so that you never alter the original data and can easily access the original data if you need to
make different changes later on.
Recoding into a different variable transforms an original variable into a new variable. That is, the
changes do not overwrite the original variable; they are instead applied to a copy of the original
variable under a
new name. To recode into different variables, click Transform > Recode into Different Variables.
The left column lists all of the variables in your dataset. Select the variable you wish to recode by
clicking it. Click the arrow in the center to move the selected variable to the center text box, (B).
Old And New Values
Once you click Old and New Values, a new window where you will specify how to transform the values
will appear.
Value: Enter a specific numeric code representing an existing category. System-missing: Applies
to any system-missing values (.)
System- or user-missing: Applies to any system-missing values (.) or special missing value codes
defined by the user in the Variable View window
Range: For use with ordered categories or continuous measurements. Enter the lower and upper
boundaries that should be coded. The recoded category will include both endpoints, so data
values that are exactly equal to the boundaries will be included in that category.
Range, LOWEST through value: For use with ordered categories or continuous measurements.
Recode all values less than or equal to some number.
Range, value through HIGHEST: For use with ordered categories or continuous measurements.
Recode all values greater than or equal to some number.
All other values: Applies to any value not explicitly accounted for by the previous recoding rules.
If using this setting, it should be applied last.
Recoding into the same variable (Transform > Recode into Same Variables) works the same way
as described above, except that any changes made will permanently alter the original
variable. That is, the original values will be replaced by the recoded values. In general, it is good
practice not to recode into the same variable because it overwrites the original variable. If you
ever needed to use the variable in its original form (or wanted to double-check your steps), that
information would be lost.
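As a sketch of what Recode into Different Variables looks like in syntax (the variable names, cut-off values, and labels below are assumptions chosen purely for illustration):
* Illustrative sketch: recode a continuous age variable into an ordinal age_group variable.
RECODE age (LOWEST THRU 29=1) (30 THRU 59=2) (60 THRU HIGHEST=3) INTO age_group.
VALUE LABELS age_group 1 'Young' 2 'Middle-aged' 3 'Older'.
EXECUTE.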
Computing Variables
Sometimes you may need to compute a new variable based on existing information (from other
variables) in your data. For example, you may want to: convert the units of a variable from feet
to meters; use a subject's height and weight to compute their BMI; compute a subscale score
from items on a survey; or apply a computation conditionally, so that a new variable is only
computed for cases where certain conditions are met.
You do not necessarily need to use the Compute Variables dialog window in order to compute
variables or generate syntax. You can write your own syntax expressions to compute variables
(and it is often faster and more convenient to do so!) Syntax expressions can be executed by
opening a new Syntax file (File > New > Syntax), entering the commands in the Syntax window,
and then pressing the Run button. The general form of the syntax for computing a new (numeric)
variable is shown in the sketch below.
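In the sketch, new_variable and the expression are placeholders, and the example variable names (height_ft, weight_kg, height_m, age) are assumptions, not variables from any particular dataset:
* General form.
COMPUTE new_variable = numeric expression.
EXECUTE.
* Illustrative examples, assuming variables named height_ft, weight_kg, and age exist.
COMPUTE height_m = height_ft * 0.3048.
COMPUTE bmi = weight_kg / (height_m ** 2).
* Conditional computation: assign a value only for cases meeting a condition.
IF (age >= 18) adult_flag = 1.
EXECUTE.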
Consider the data on screening exam scores of 20 freshman applicants each in Science High
school and Rural High School.
Null Hypothesis – the conjecture which is being tested, denoted by Ho. - Generally, this is a statement of
equality or status quo or no difference.
Alternative Hypothesis – the complementary statement that will be accepted in the event that the null
hypothesis is rejected. It is denoted by Ha or H1.
2 Types of error:
Type I error – the error of rejecting a true Ho. The probability of committing a Type I error is denoted
by α; i.e., α = P[Type I error] = P[reject Ho | Ho is true] = the level of significance of a statistical test.
Type II error – the error of accepting a false Ho. The probability of committing a Type II error is
denoted by β; i.e., β = P[Type II error] = P[accept Ho | Ho is false].
Test statistic - Statistic which provides a basis for determining whether to reject Ho in favor of
Ha.
Decision Rule - Rule which specifies that region for which the test statistic leads to the rejection
of Ho in favor of Ha.
Critical Region - the set of values of the test statistic for which Ho is rejected in favor of Ha.
Test on assumptions
In most situations, satisfying the assumptions of a parametric method ensures the
validity of the results and the appropriateness of the test employed. For this reason, a
number of methods have been designed to test certain assumptions of parametric methods.
1. Tests on Equality of Variances
o The assumption of homoskedasticity (equality of variances) is used in ANOVA techniques
and regression analysis.
o The assumption of homoskedasticity is necessary for some tests to be valid.
o Bartlett's test makes use of the χ² (chi-square) test.
o It tests whether the p populations from which the samples were obtained have equal
variances.
o Homogeneity of variances is one of the many assumptions in the analysis of experimental data.
o If this assumption does not hold, the F-tests in the analysis of variance are not valid.
Procedure: Analyze > Compare Means > Oneway ANOVA > Options > Homogeneity of Variance Test
2. Test on Randomness
Procedure: Analyze > Nonparametric Tests > Legacy Dialogs > RUNS
3. The One-Sample Test for Normality
o The Shapiro-Wilk test (for N < 2000) or the Kolmogorov-Smirnov (K-S) test (for N > 2000) is
used to determine whether the sample data came from a normal distribution or not.
o The sample data are compared against the normal distribution as the basis for saying
whether a certain distribution is normal or not.
Procedure: Analyze > Descriptive Statistics > Explore > Plots > Normality plots with tests.
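The menu procedures above can also be run from the Syntax Editor. The following is a rough sketch; the variable names score and group are assumptions used only for illustration.
* Homogeneity of variances (the homogeneity-of-variance option in One-Way ANOVA).
ONEWAY score BY group
  /STATISTICS HOMOGENEITY.
* Randomness (runs test, cut point at the median).
NPAR TESTS /RUNS(MEDIAN)=score.
* Normality (Kolmogorov-Smirnov and Shapiro-Wilk tests from Explore).
EXAMINE VARIABLES=score
  /PLOT NPPLOT.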
In instances wherein certain assumptions are not satisfied, appropriate transformations and adjustments
to the data must be done before parametric methods (e.g., t, Z, or F tests) are employed. Another
alternative in such instances is to employ the nonparametric counterpart of the appropriate
parametric test.
Nonparametric Statistical tests
Parametric statistical tests (e.g., the Z, t, and F tests) are more powerful than nonparametric tests when their assumptions are satisfied.
Kruskal-Wallis H Test
The Kruskal-Wallis H test (sometimes also called the "one-way ANOVA on ranks") is a rank-
based nonparametric test that can be used to determine if there are statistically significant
differences between two or more groups of an independent variable on a continuous or ordinal
dependent variable. It is considered the nonparametric alternative to the one-way ANOVA, and
an extension of the Mann-Whitney U test to allow the comparison of more than two
independent groups.
It is assumed that the observations in the data set are independent of each other.
The distributions of the populations are not required to be normal, and the variances are not
required to be equal.
It is assumed that the observations must be drawn from the population by the process of
random sampling.
You will be presented with the following output (assuming you did not select the Descriptive checkbox in
the "Several Independent Samples: Options" dialogue box):
The mean rank (i.e., the "Mean Rank" column in the Ranks table) of the Pain Score for each drug
treatment group can be used to compare the effect of the different drug treatments. Whether
these drug treatment groups have different pain scores can be assessed using the Test
Statistics table which presents the result of the Kruskal-Wallis H test. That is, the chi-squared
statistic (the "Chi-Square" row), the degrees of freedom (the "df" row) of the test and the
statistical significance of the test (the "Asymp. Sig." row).
Example: A shoe company wants to know if three groups of workers have different salaries:
Women: 23000, 41000, 54000, 66000, 78000
Men: 45000, 55000, 60000, 70000, 72000
Minorities: 20000, 30000, 34000, 40000, 44000
Test the difference among three groups, using 𝛼 = 0.05.
STEP 1: Identify the null and alternative hypothesis.
Ho: There is no significant difference among the salaries of the three groups of workers.
Ha: There is a significant difference among the salaries of the three groups of workers.
STEP 2: Identify the test procedure: the Kruskal-Wallis H test.
STEP 3: Set the level of significance: 𝛼 = 0.05 or 5%.
STEP 4: Write the decision rule.
Step 5:
sig = 0.035
𝛼 = 0.05
Decision: Since sig = 0.035 < 𝛼 = 0.05, reject Ho.
Conclusion: At 𝛼 = 5%, there is a significant difference among the salaries of the three groups of
workers.
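A syntax sketch for this example is given below; the group codes (1 = Women, 2 = Men, 3 = Minorities) and the variable names are assumptions made for illustration.
* Illustrative sketch: Kruskal-Wallis H test on the salary data.
DATA LIST FREE / group salary.
BEGIN DATA
1 23000 1 41000 1 54000 1 66000 1 78000
2 45000 2 55000 2 60000 2 70000 2 72000
3 20000 3 30000 3 34000 3 40000 3 44000
END DATA.
VALUE LABELS group 1 'Women' 2 'Men' 3 'Minorities'.
NPAR TESTS /K-W=salary BY group(1 3)
  /MISSING ANALYSIS.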
The paired sample t-test, sometimes called the dependent sample t-test, is a statistical
procedure used to determine whether the mean difference between two sets of observations is
zero.
In a paired sample t-test, each subject or entity is measured twice, resulting in pairs of
observations.
Common applications of the paired sample t-test include case-control studies or repeated-
measures designs.
The Paired Samples t Test compares the means of two measurements taken from the same
individual, object, or related units. These "paired" measurements can represent things like:
A measurement taken at two different times (e.g., pre-test and post-test score with an
intervention administered between the two time points)
A measurement taken under two different conditions (e.g., completing a test under a
"control" condition and an "experimental" condition)
Measurements taken from two halves or sides of a subject or experimental unit (e.g.,
measuring hearing loss in a subject's left and right ears).
The purpose of the test is to determine whether there is statistical evidence that the mean difference
between paired observations is significantly different from zero. The Paired Samples t Test is a
parametric test.
This test is also known as:
Dependent t Test
Paired t Test
Repeated Measures t Test
Dependent variable, or test variable (continuous), measured at two different times or for
two related conditions or units.
Common Uses
Data Requirements
Hypothesis
The hypotheses can be expressed in two different ways that express the same idea and are
mathematically equivalent:
H0: µ1 = µ2 ("the paired population means are equal")
H1: µ1 ≠ µ2 ("the paired population means are not equal")
OR
H0: µ1 - µ2 = 0 ("the difference between the paired population means is equal to 0")
H1: µ1 - µ2 ≠ 0 ("the difference between the paired population means is not 0")
where
µ1 is the population mean of variable 1, and
µ2 is the population mean of variable 2.
Test Statistic
The test statistic for the Paired Samples t Test, denoted t, follows the same formula as the one-sample t test:
t = x̄diff / sx̄ , where sx̄ = sdiff / √n
x̄diff = sample mean of the differences
n = sample size (i.e., number of pairs of observations)
sdiff = sample standard deviation of the differences
sx̄ = estimated standard error of the mean of the differences (sdiff / √n)
The calculated t value is then compared to the critical t value with df = n - 1 from the t distribution table
for a chosen confidence level. If the calculated t value is greater than the critical t value, then we reject
the null hypothesis (and conclude that the means are significantly different).
Data Setup
Data should include two continuous numeric variables (represented in columns) that will be
used in the analysis.
The two variables should represent the paired variables for each subject (row). If your data are
arranged differently (e.g., cases represent repeated units/subjects), simply restructure the data
to reflect this format.
To run a Paired Samples t Test in SPSS, click Analyze > Compare Means > Paired-Samples T Test.
The Paired-Samples T Test window opens where you will specify the variables to be used in the analysis.
All of the variables in your dataset appear in the list on the left side. Move variables to the right by
selecting them in the list and clicking the blue arrow buttons. You will specify the paired variables in
the Paired Variables area.
Pair: The “Pair” column represents the number of Paired Samples t Tests to run. You may choose
to run multiple Paired Samples t Tests simultaneously by selecting multiple sets of matched
variables. Each new pair will appear on a new line.
Variable1: The first variable, representing the first group of matched values. Move the variable
that represents the first group to the right where it will be listed beneath the
“Variable1” column.
Variable2: The second variable, representing the second group of matched values. Move the
variable that represents the second group to the right where it will be listed beneath
the “Variable2” column.
Options: Clicking Options will open a window where you can specify the Confidence Interval
Percentage and how the analysis will address Missing Values (i.e., Exclude cases analysis by
analysis or Exclude cases listwise). Click Continue when you are finished making specifications.
Setting the confidence interval percentage does not have any impact on the calculation of the p-value. If
you are only running one paired samples t test, the two "missing values" settings will produce the same
results. There will only be differences if you are running 2 or more paired samples t tests. (This would
look like having two or more rows in the main Paired Samples T Test dialog window.)
Example:
The marks for a group of students before(pre) and after(post) a teaching intervention are recorded
below:
a= 0.05
Decision: The null hypothesis is rejected, since p < 0.05 (in fact, p = 0.004).
Conclusion: There is strong evidence (t = 3.231, p = 0.004) that the teaching intervention improves marks.
In this data set, it improved marks, on average, by approximately 2 points. Of course, if we were to
take other samples of marks, we could get a 'mean paired difference' in marks different from 2.05.
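A syntax sketch for a paired comparison like this one, assuming the two columns are named pre and post as in the example:
* Illustrative sketch: paired samples t test comparing pre and post marks.
T-TEST PAIRS=pre WITH post (PAIRED)
  /CRITERIA=CI(.95)
  /MISSING=ANALYSIS.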
Mann-Whitney U-Test
Assumptions:
Assumption #1: Your dependent variable should be measured at the ordinal or continuous level.
o Examples of ordinal variables include Likert items (e.g., a 7-point scale from "strongly
agree" through to "strongly disagree"), amongst other ways of ranking categories (e.g., a
5-point scale explaining how much a customer liked a product, ranging from "Not very
much" to "Yes, a lot").
o Examples of continuous variables include revision time (measured in hours), intelligence
(measured using IQ score), exam performance (measured from 0 to 100), weight
(measured in kg), and so forth.
Assumption #2: Your independent variable should consist of two categorical, independent groups.
o Examples of independent variables that meet this criterion include gender (2 groups:
male or female), employment status (2 groups: employed or unemployed), smoker (2
groups: yes or no), and so forth.
Assumption #3: You should have independence of observations, which means that there is no
relationship between the observations in each group or between the groups themselves.
o For example, there must be different participants in each group with no participant
being in more than one group. This is more of a study design issue than something you
can test for, but it is an important assumption of the Mann-Whitney U test. If your study
fails this assumption, you will need to use another statistical test instead of the Mann-
Whitney U test (e.g., a Wilcoxon signed-rank test).
Assumption # 4: A Mann-Whitney U test can be used when your two variables are not normally
distributed.
o However, in order to know how to interpret the results from a Mann-Whitney U test,
you have to determine whether your two distributions (i.e., the distribution of scores
for both groups of the independent variable; for example, 'males' and 'females' for the
independent variable, 'gender') have the same shape.
Test Procedure in SPSS Statistics
1. Click Analyze -> Nonparametric Tests -> Legacy Dialogs -> 2 Independent Samples.
2. Drag and drop the dependent variable into the Test Variable(s) box, and the grouping variable
into the Grouping Variable box.
3. Tick Mann-Whitney U under Test Type.
4. Click on Define Groups, and input the values that define each of the groups that make up the
grouping variable (i.e., the coded value for Group 1 and the coded value for Group 2).
5. Press Continue, and then click on OK to run the test.
6. The results will appear in the SPSS Output Viewer.
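Equivalently, the test can be run from the Syntax Editor; in the sketch below the variable names and group codes are assumptions for illustration.
* Illustrative sketch: Mann-Whitney U test for a dependent variable across two coded groups.
NPAR TESTS
  /M-W= score BY group(1 2)
  /MISSING ANALYSIS.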
Assumption of Normality
Given this setup, it would be usual to conduct an independent samples t test. One assumption of
this parametric test is that the data are normally distributed. The trouble is that if we test our data
for normality, we get the following result.
Both Kolmogorov-Smirnov and Shapiro-Wilk suggest that our dependent variable is not
distributed normally. This is confirmed by the histogram, which has a long left tail.
This means we’re better off using a non-parametric test to determine whether there is a
relationship between our independent and dependent variables (though, actually, since we have
a large number of observations, we’d probably get away with the t test). The obvious choice
here is the Mann-Whitney U test.
Example:
Consider a Phase II clinical trial designed to investigate the effectiveness of a new drug to reduce
symptoms of asthma in children. A total of n=10 participants are randomized to receive either the new
drug or a placebo. Participants are asked to record the number of episodes of shortness of breath over a
1 week period following receipt of the assigned treatment. The data are shown below. Is there a
difference in the number of episodes of shortness of breath over a 1 week period in participants
receiving the new drug as compared to those receiving the placebo?
Solution Exercise 1.
𝐻𝑂 : There is no difference in the number of episodes of shortness of breath over a 1 week period in
participants receiving the new drug as compared to those receiving the placebo.
𝐻1 : There is a difference in the number of episodes of shortness of breath over a 1 week period in
participants receiving the new drug as compared to those receiving the placebo.
In symbols,
𝐻𝑂 : µ1 = µ2
𝐻1 : µ1 ≠ µ2
These hypotheses are two-tailed, as the null is written with an equal sign.
3. Select the appropriate test statistic: the Mann-Whitney U test.
𝛼 = 0.01 or 1%
The final section of the output gives the values of the Mann-Whitney U test (and several other tests as
well.) In this example, the Mann-Whitney U value is 3.000. There are two p values given -- one on the
row labeled Asymp. Sig (2-Tailed) and the other on the row labeled Exact Sig. [2*(1- tailed Sig.)].
Typically, we will use the exact significance, although if the sample size is large, the asymptotic
significance value can be used to gain a little statistical power.
Sig = .056
𝛼 = 0.01
7. Decision: Since Sig = .056 > 𝛼 = 0.01, we fail to reject Ho.
8. Conclusion:
At α = 0.01, there is no difference in the number of episodes of shortness of breath over a 1 week
period in participants receiving the new drug and for those receiving the placebo.
Equivalently, the result of the Mann-Whitney U test supports the proposition that the effect of using the
new drug is the same as the effect of using the placebo in reducing symptoms of asthma among children.
The bivariate Pearson Correlation produces a sample correlation coefficient, r, which measures
the strength and direction of linear relationships between pairs of continuous variables. By extension,
the Pearson Correlation evaluates whether there is statistical evidence for a linear relationship among
the same pairs of variables in the population, represented by a population correlation coefficient, ρ
(“rho”). The Pearson Correlation is a parametric measure.
Pearson’s correlation
Pearson Product-Moment Correlation (PPMC)
Common Uses
Common Applications: Exploring the relationship (linear) between 2 variables; eg, as variable A
increases, does variable B increase or decrease? The relationship is measured by a quantity called
correlation.
Whether a statistically significant linear relationship exists between two continuous variables
The strength of a linear relationship (i.e., how close the relationship is to being a perfectly
straight line)
The direction of a linear relationship (increasing or decreasing)
Direction
The sign of the correlation coefficient indicates the direction of the relationship.
+ (direct relationship) - when one variable increases the other one also increases, or when one
variable decreases the other one also decreases
- (inverse relationship) - when one variable increases the other one decreases, and vice versa
A coefficient close to 1 means a strong and positive association between the two variables, and a
coefficient close to -1 means strong negative association between the two variables.
Association does not mean necessarily a causal relation between both variables. For example,
there might be a third variable you have not considered and this third variable might be the
explanation for the behavior of the other two.
Even if there is a causal relationship between the variables, the correlation coefficient does not
tell you which variable is the cause and which is the effect.
If the coefficient is close to 0, it does not necessarily mean that there is no relation between the
two variables. It means there is not a LINEAR relationship, but there might be another type of
functional relationship (for example, quadratic or exponential).
Degree
The scatterplots below show correlations of r = +0.90, r = 0.00, and r = -0.90, respectively.
The strength of the nonzero correlations is the same, but the direction of the correlations is
different: a negative correlation corresponds to a decreasing relationship, while a positive
correlation corresponds to an increasing relationship.
Note: The bivariate Pearson Correlation cannot address non-linear relationships or relationships among
categorical variables. If you wish to understand relationships that involve categorical variables and/or
non-linear relationships, you will need to choose another measure of association.
The bivariate Pearson Correlation only reveals associations among continuous variables. The
bivariate Pearson Correlation does not provide any inferences about causation, no matter how large the
correlation coefficient is.
Data Requirements:
To use Pearson correlation, your data must meet the following requirements: the two variables must be
continuous (interval or ratio level), the observations must come in pairs (each case has a value on both
variables), and the relationship being examined must be linear.
The Bivariate Correlations window opens, where you will specify the variables to be used in the
analysis. All of the variables in your dataset appear in the list on the left side. To select variables
for the analysis, select the variables in the list on the left and click the blue arrow button to
move them to the right, in the Variables field.
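As a sketch, the equivalent syntax for a bivariate Pearson correlation might look like the following; the variable names are assumptions loosely based on the example that follows.
* Illustrative sketch: Pearson correlation between two continuous variables.
CORRELATIONS
  /VARIABLES=calcium_intake knowledge
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.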
Example:
Calcium_Intake.sav show the calcium intake of Special Program in Sports students and their knowledge
about calcium. Is there a relationship between calcium intake and knowledge about calcium of SPS
students at a = 0.05?
Steps:
Ho: There is no correlation/association between calcium intake and knowledge about calcium of SPS
students.
Ho : r = 0
Ha: There is a correlation/association between calcium intake and knowledge about calcium of SPS
students.
Ha : r ≠ 0
Test Procedure: Pearson Product-Moment Correlation
a = 5%
Computations: ANALYZE>CORRELATE>BIVARIATE
Sig = 0.002
Decision: Since Sig = 0.002 < a = 0.05, reject Ho. There is a correlation/association between calcium
intake and knowledge about calcium of SPS students.
Calcium intake and knowledge about calcium were found to be moderately positively
correlated, r(28) = .533, p = .002.
Among the SPS students, the calcium intake and knowledge about calcium were moderately
positively correlated, r(28) = .533, p < .05.
Conclusion:
We can conclude that for SPS students there is evidence that knowledge about calcium is related
to calcium intake. In particular, it seems that the more the SPS students know about calcium, the
greater their calcium intake is (r = 0.53, p = .002).
ANOVA was developed by Ronald Fisher in 1918 as an extension of the t and z tests. Before
ANOVA, the t-test and z-test were commonly used, but the t-test cannot be applied to more
than two groups; the analysis of variance overcomes this limitation.
One-Way ANOVA (Analysis of Variance) compares the means of two or more independent
groups in order to determine whether there is statistical evidence that the associated population
means are significantly different. One-Way ANOVA is a parametric test.
One-Factor ANOVA
One-Way Analysis of Variance
Between Subjects ANOVA
Dependent variable
Independent variable (also known as the grouping variable, or factor)
This variable divides cases into two or more mutually exclusive levels, or groups
The One-Way ANOVA is often used to analyze data from the following types of studies:
Field studies
Experiments
Quasi-experiments
The One-Way ANOVA is commonly used to test the statistical differences among the means of two or
more groups or interventions
Use a one-way ANOVA when you have collected data about one categorical independent
variable and one quantitative dependent variable. The independent variable should have at least
three levels (i.e. at least three different groups or categories). ANOVA tells you if the dependent
variable changes according to the level of the independent variable. If you only want to compare
two groups, use a t-test instead.
There is no relationship between the subjects in each sample; the observations are independent and drawn at random.
PROCEDURE: Analyze > Nonparametric Tests > Legacy Dialogs > RUNS
Normal distribution (approximately) of the dependent variable for each group (i.e., for each
level of the factor)
PROCEDURE: Analyze > Descriptive Statistics > Explore > Plots > Normality plots with tests.
Homogeneity of variances (i.e., variances approximately equal across groups; group variances
are homogenous)
PROCEDURE: Analyze > Compare Means > Oneway ANOVA > Options > Homogeneity of Variance
Test
No outliers
Note: When the normality, homogeneity of variances, or outliers assumptions for One-Way ANOVA are
not met, you may want to run the nonparametric Kruskal-Wallis test instead.
How to Run a One-Way ANOVA
To run a One-Way ANOVA in SPSS, click Analyze > Compare Means > One-Way ANOVA.
The One-Way ANOVA window opens, where you will specify the variables to be used in the
analysis. All of the variables in your dataset appear in the list on the left side. Move variables to
the right by selecting them in the list and clicking the blue arrow buttons. You can move a
variable(s) to either of two areas: Dependent List or Factor.
When the initial F test indicates that significant differences exist between group means,
contrasts are useful for determining which specific means are significantly different when you
have specific hypotheses that you wish to test. Contrasts are decided before analyzing the data
(i.e., a priori). Contrasts break down the variance into component parts. They may involve using
weights, non-orthogonal comparisons, standard contrasts, and polynomial contrasts (trend
analysis).
When the initial F test indicates that significant differences exist between group means, post hoc tests
are useful for determining which specific means are significantly different when you do not have specific
hypotheses that you wish to test. Post hoc tests compare each pair of means (like t-tests), but unlike t-
tests, they correct the significance estimate to account for the multiple comparisons.
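As a sketch, a One-Way ANOVA with the homogeneity-of-variance test and Tukey post hoc comparisons could be requested in syntax as follows; the variable names are assumptions for illustration.
* Illustrative sketch: one-way ANOVA of a score across groups, with homogeneity test and Tukey post hoc.
ONEWAY score BY group
  /STATISTICS DESCRIPTIVES HOMOGENEITY
  /POSTHOC=TUKEY ALPHA(0.05)
  /MISSING ANALYSIS.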
Example:
Math_Performance.sav data show the Mathematics performance of the Junior High School students.
Test the differences among the four grade levels, using a = 0.05.
Before proceeding to the parametric One-Way ANOVA, perform the tests on the assumptions of
normality, randomness, and homoscedasticity.
TEST ON NORMALITY
STEPS:
Test of Hypothesis:
a = 5%
Computations:
TEST ON RANDOMNESS
STEPS:
Test of Hypothesis:
Test Procedure:
a = 5%
Computations:
Sig = 0.208
a = 0.05
Decision: Since Sig = 0.208 > a = 0.05, we fail to reject Ho; the sample may be treated as random.
TEST ON HOMOGENEITY OF VARIANCES
STEPS:
Test of Hypothesis:
Ho: The variances in Mathematics performance among grade levels are equal.
Ha: The variances in Mathematics performance among grade levels are not equal.
a = 5%
Computations:
Sig = 0.271
a = 0.05
Decision: Since Sig = 0.271 > a = 0.05, we fail to reject Ho.
Conclusion: At a = 5%, the variances in Mathematics performance among grade levels are equal.
Since the data have met the tests on assumptions, proceed to the One-Way ANOVA.
Steps:
1.
Ho: There is no significant difference in the Mathematics performance of students among grade
levels.
Ha: There is a significant difference in the Mathematics performance of students among grade
levels.
2. Test Procedure: One-Way ANOVA (F-test)
3. a = 5%
4. Computations: ANALYZE>COMPARE MEANS>ONE-WAY ANOVA
Sig=0.228
a=0.05
5. Decision: Since sig = 0.228 > a = 0.05, we fail to reject Ho.
6. Conclusion: There is no significant difference in the Mathematics performance of students among
grade levels.
Note: The ANOVA alone does not tell us specifically which means were different from one another. To
determine that, we would need to follow up with multiple comparisons (or post-hoc) tests. They are
typically only conducted (interpreted) after a significant ANOVA. Post hoc tests are used to dive in and
look for differences between groups, testing each possible pair of groups.
TEST FOR ONE-SAMPLE CASE
One-Sample Z test is performed when we want to compare a sample mean with the population
mean.
One-Sample t-test is performed when we want to compare a sample mean with the population
mean. The difference from the Z Test is that we do not have the information on Population
Variance here. We use the sample standard deviation instead of population standard deviation
in this case.
To run a One Sample Test in SPSS, click Analyze > Compare Means > One-Sample T Test.
The One-Sample T Test window opens where you will specify the variables to be used in the
analysis. All of the variables in your dataset appear in the list on the left side. Move variables to
the Test Variable(s) area by selecting them in the list and clicking the arrow button.
A. Test Variable(s): The variable whose mean will be compared to the hypothesized population
mean (i.e., Test Value). You may run multiple One Sample t Tests simultaneously by selecting
more than one test variable. Each variable will be compared to the same Test Value.
Test Value: The hypothesized population mean against which your test variable(s) will be
compared.
Options: Clicking Options will open a window where you can specify the Confidence Interval
Percentage and how the analysis will address Missing Values (i.e., Exclude cases analysis by
analysis or Exclude cases listwise). Click Continue when you are finished making specifications.
The first section, One-Sample Statistics, provides basic information about the selected variable,
including the valid (non-missing) sample size (n), mean, standard deviation, and standard error.
In this example, the mean height of the sample is 68.03 inches, which is based on 408 non-
missing observations.
The second section, One-Sample Test, displays the results most relevant to the One
Sample t Test.
Test Value: The number we entered as the test value in the One-Sample T Test window.
t Statistic: The test statistic of the one-sample t test, denoted t. In this example, t = 5.810. Note
that t is calculated by dividing the mean difference (E) by the standard error mean (from the
One-Sample Statistics box).
df: The degrees of freedom for the test. For a one-sample t test, df = n - 1; so here, df = 408 - 1 =
407.
Sig. (2-tailed): The two-tailed p-value corresponding to the test statistic.
Mean Difference: The difference between the "observed" sample mean (from the One Sample
Statistics box) and the "expected" mean (the specified test value (A)). The sign of the mean
difference corresponds to the sign of the t value (B). The positive t value in this example
indicates that the mean height of the sample is greater than the hypothesized value (66.5).
Confidence Interval for the Difference: The confidence interval for the difference between the
specified test value and the sample mean.
Example:
Problem: A certain brand of milk is advertised as having a net weight of 250 grams. If the net weights of
a random sample of cans are 256, 248, 242, 245, 246, 248, 250, 255, 243 and 249 grams, can it be
concluded that the average net weight of the cans is not equal to the advertised amount? Use 𝛼 = 0.05
and assume that the net weight of this brand of powdered milk is normally distributed.
𝛼 = 0.05 𝑜𝑟 5%
Step 4: Write the Decision Rule
Step 5: Solve
Conclusion: At α = 5%, the data do not provide sufficient evidence to say that the average net weight of
the cans differs from the advertised amount of 250 grams.
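For reference, a syntax sketch of this one-sample test, entering the ten net weights directly; the variable name netweight is an assumption.
* Illustrative sketch: one-sample t test of net weight against the advertised 250 grams.
DATA LIST FREE / netweight.
BEGIN DATA
256 248 242 245 246 248 250 255 243 249
END DATA.
T-TEST /TESTVAL=250
  /VARIABLES=netweight
  /CRITERIA=CI(.95).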
Mean
The mean is the sum of all the values divided by the number of values:
Mean = ∑X / N
Where:
∑X = the sum of all the values, and N = the number of values.
For example, if you have the numbers 5, 10, 15, the mean is:
(5 + 10 + 15) / 3 = 30 / 3 = 10
Mode
The mode is the value that appears most frequently in a data set. It represents the most common or
repeated number.
Examples:
Median
The median is the middle value of a data set when it is arranged in ascending order (from smallest to
largest). It divides the data into two equal halves.
The standard deviation (SD) is a measure of how spread out the numbers in a data set are. A low
standard deviation means the values are close to the mean, while a high standard deviation
means the values are more spread out.
For a population, the formula is:
σ = √( ∑(X − µ)² / N )
Where:
µ = the population mean and N = the number of values in the population.
If you are working with a sample from a population, the formula is:
s = √( ∑(X − x̄)² / (n − 1) )
Where:
x̄ = the sample mean and n = the sample size.
The key difference is that for a sample, we divide by n − 1 instead of N to account for bias in
estimating the population standard deviation.
Example:
Range
The range is the difference between the highest and lowest values in a dataset. It measures the spread
of the data.
Variance
Variance measures how spread out the data is from the mean. It is the average of the squared
differences from the mean.
Example:
Skewness
Skewness measures the asymmetry of a distribution around its mean.
Interpretation of Skewness:
A skewness near 0 indicates an approximately symmetric distribution; a positive skewness indicates a
longer right tail (positively skewed); a negative skewness indicates a longer left tail (negatively skewed).
Types of Kurtosis:
Mesokurtic (peakedness similar to the normal distribution), leptokurtic (more peaked, with heavier tails
than the normal), and platykurtic (flatter, with lighter tails than the normal).
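The descriptive measures above can all be obtained in SPSS from the Frequencies procedure. A small sketch follows, using a made-up data set purely for illustration; the variable name score and the values are assumptions.
* Illustrative sketch: descriptive statistics for a small made-up data set.
DATA LIST FREE / score.
BEGIN DATA
4 5 5 6 7 8
END DATA.
FREQUENCIES VARIABLES=score
  /STATISTICS=MEAN MEDIAN MODE STDDEV VARIANCE RANGE SKEWNESS KURTOSIS
  /FORMAT=NOTABLE.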
3. Independent T-Test
A teacher wants to test whether male students perform better than female students in a particular
subject. The test scores of 50 male and 50 female students are recorded.
4. One-Way ANOVA
A researcher wants to compare the test scores of students in three different teaching methods:
traditional lecture, online learning, and blended learning. The test scores of 40 students are
recorded for each method.
5. Kruskal-Wallis Test
A researcher wants to compare customer satisfaction ratings across three different stores.
Satisfaction ratings are given on a 1-5 scale by 30 customers per store.
6. Mann-Whitney U-Test
A researcher wants to compare the income levels of two cities: City A and City B. The incomes
of 100 residents from each city are recorded, but the data is not normally distributed.
If the p-value is greater than 0.05, we fail to reject Ho: there is no significant difference. If the p-value is
less than 0.05, we reject Ho: there is a significant difference.