Data Analysis Using Statistics

Uploaded by

Mael

PLANNING DATA ANALYSES USING STATISTICS
• When the necessary data have already been collected,
the next step is to organize the raw data for data
analysis.
• It is important that the researcher is assured of the
quality of the data for accuracy, consistency,
completeness and systematic arrangement to facilitate
coding and tabulation.
• Every research methodology requires a data analysis
plan. The plan specifies the statistical measures to
use to address the research questions.
• The appropriate methods of data analysis are
determined by the type of data, the variables to be
used, the number of cases and the distribution of the
variables.
PURPOSE OF DATA ANALYSIS PLAN
The purpose of a data analysis plan is to gather
useful information to find solutions to research
questions of interest. It may be used to:
• Describe data sets;
• Determine the degree of relationship of variables;
• Determine differences between variables;
• Predict outcomes; and
• Compare variables
All of these purposes can be served by using any one
or a combination of the following data analysis
strategies:
• EXPLORATORY DATA ANALYSIS
This type of data analysis is used when it is not
clear what to expect from the data. This strategy
uses numerical and visual presentations such as
graphs. Since the research of interest is new, it is
possible to find some irregularities, such as
missing values, an unexpected distribution of the
data, unusually small or large values, or invalid data.
• DESCRIPTIVE DATA ANALYSIS
This type of data analysis is used to describe, show
or summarize data in a meaningful way, leading to
a simple interpretation of data. Descriptive data
analyses do not allow you to formulate conclusions
beyond the data that you have described. The
commonly used descriptive statistics are those that
analyze the distribution of data such as frequency,
percentage, measures of central tendency and
measures of dispersion.
• INFERENTIAL DATA ANALYSIS
Inferential statistics test hypotheses about a set of
data to reach conclusions or make generalizations
beyond merely describing the data. Inferential
statistics include tests of significance of difference,
such as the t-test and Analysis of Variance (ANOVA), and
tests of relationship, such as the Product-Moment
Coefficient of Correlation (Pearson r), Spearman
rho, linear regression and the Chi-square test.
Descriptive Data Analysis
Quantitative Data Analysis
Quantitative data analysis is a systematic approach to
investigation in which numerical data are collected and
transformed into meaningful information (Prieto,
Naval, and Carey 2017). Data analysis refers to how a
researcher gains meaningful insight from a large
amount of data. The main purpose of data analysis in
research is to find meaning in data so that the derived
knowledge can be used to make informed decisions.
QUANTITATIVE ANALYSIS IN EVALUATION
• NOMINAL SCALE
A nominal scale of measurement is used for
labelling variables. It is sometimes called
categorical data. Basketball players wear sports
shirts with numbers, but that is just a way to
identify the players. Likewise, if you want to
categorize respondents based on gender, you could
use 1 for male, and 2 for female. No order or
distance is observed. The Yes or No scale is an
example of nominal data. The numbers assigned to
the variables have no quantitative value. Some
examples of variables measured on a nominal scale
are gender, religious affiliation, race or ethnic
group.
• ORDINAL SCALE
An ordinal scale of measurement assigns order to
items on the characteristic being measured. It
involves the ranking of individuals, attitudes and
characteristics. The order in the honor roll (first
honor, second honor, third honor); order of
agreement (strongly agree, agree, strongly
disagree) or economic status (low, average, high)
are some examples.
Numerical ranks such as first, second, third and so
on are assigned, but the numbers carry no
quantitative value beyond establishing a ranking
among a set of data.
• INTERVAL SCALE
The interval scale has equal units of measurement,
thereby making it possible to interpret the order of
scale scores and the distance between them.
However, interval scales do not have a “true zero”.
With interval data, addition and subtraction are
meaningful, but multiplication and division (ratios)
are not.
• RATIO SCALE
The ratio scale is considered the highest level of
measurement. It has the characteristics of an
interval scale, but it also has a true zero point.
Because of this property, all statistical operations
can be performed on ratio scales. All descriptive and
inferential statistics may be applied. Values can be
added, subtracted, multiplied and divided.
Descriptive Data Analysis
Descriptive data analysis provides simple summaries
about the sample and the measures. It is used to
simply describe what is or what the data show.
Different statistical measures are used to analyze data
and draw conclusions under descriptive data analysis
(Trochim, 2020). This type of data analysis does not
attempt to test hypotheses. The following statistical
measures of descriptive analysis also serve as inputs
to further statistical testing (Prieto, Naval, and Carey
2017; Florida State University 2005):
Frequency

Frequency refers to the number of times each data
value occurs. A frequency table is used to record the
occurrences of each value. The table lists the
collected values in the left column and their number
of occurrences in the right column. Frequency just
helps you organize your data. It does not provide a
great deal of descriptive information about the data,
but it is the starting point for many other
statistical methods.
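A frequency table as described above can be sketched in a few lines of Python. This is only an illustration: the data are borrowed from the internet café example used later for the mode, and the variable names are assumptions of this sketch.

```python
from collections import Counter

# Hours spent in an internet café by 10 students
# (data borrowed from the mode example in this document)
hours = [2, 2, 2, 3, 3, 4, 4, 4, 5, 5]

# Counter maps each distinct value to the number of times it occurs
frequency = Counter(hours)

# Left column: the value; right column: its number of occurrences
print("Value  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")
```

The counts in the right column sum to the number of observations, which is a quick check that no value was missed.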
MEASURES OF CENTRAL TENDENCY

Suppose senior high school students were asked
how many hours they spent on the computer, and
for what subject they often used the computer.
Results of the survey could indicate that, on
average, the senior high school students spent two
(2) or more hours, with a range of one (1) to four (4)
hours. A typical senior high school student spent
more than two hours studying his/her research
subject using the computer.
1.1 MEAN
Often called the arithmetic average of a set of data,
the mean is the sum of the observed values in the
distribution divided by the number of observations.
It is frequently used for interval or ratio data. The
symbol $\bar{X}$ (X bar) is used to denote the
arithmetic mean.
The mean is calculated by summing up the
observations (items, height, scores or responses)
and dividing by the number of observations.
$$\bar{X} = \frac{\text{sum of observations}}{\text{number of observations}}$$
The formula is:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \quad \text{or} \quad \bar{X} = \frac{\Sigma x}{n}$$
A. For Ungrouped Data
Ex. 1. Find the mean of the measurements
18, 26, 27, 29, 30
Ex. 2. Find the mean of the following:
Scores in the National Achievement Test (NAT)
90 95 96 87 110
102 95 98 87 117
115 96 91 95 95
93 105 86 103 106
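As a check on the two exercises above, the mean can be computed directly from its definition. The helper name `mean` and the variable names are assumptions of this sketch.

```python
def mean(values):
    """Arithmetic mean: sum of observations / number of observations."""
    return sum(values) / len(values)

# Ex. 1: five measurements
ex1 = [18, 26, 27, 29, 30]

# Ex. 2: the twenty NAT scores, read row by row
nat_scores = [
    90, 95, 96, 87, 110,
    102, 95, 98, 87, 117,
    115, 96, 91, 95, 95,
    93, 105, 86, 103, 106,
]

mean_ex1 = mean(ex1)         # 130 / 5
mean_nat = mean(nat_scores)  # 1962 / 20

print(mean_ex1)  # 26.0
print(mean_nat)  # 98.1
```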
B. For Grouped Data
When the observations are grouped into classes,
the formula for grouped data is as follows:
$$\bar{X} = \frac{\Sigma(\text{frequency of each class} \times \text{class midpoint})}{\text{total number of observations}}$$
1.2 The Weighted Mean
The weighted average or weighted mean is
necessary in some situations. Suppose that you are
given the means of two or more measurements
and you wish to find the mean of all measures
combined into one group. The formula for the
weighted mean is given by
$$\bar{X}_w = \frac{\Sigma f x}{n}$$
Where:
f = frequency
x = numerical value or item in a set of data
n = number of observations in the data set (n = Σf)
Ex. 1. Find the mean of the heights of 50 senior
high school students summarized as follows:
Height (in inches) | Frequency | Height × Frequency
56 | 6 | 336
57 | 15 | 855
58 | 12 | 696
59 | 8 | 472
60 | 5 | 300
61 | 2 | 122
62 | 2 | 124
Total | Σf = 50 | Σfx = 2905
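The weighted mean for the height table can be reproduced with a short sketch that follows the formula Σfx / n. The list names are assumptions of this example; the numbers are those in the table.

```python
# Heights (inches) and how many students have each height
heights = [56, 57, 58, 59, 60, 61, 62]
freqs = [6, 15, 12, 8, 5, 2, 2]

n = sum(freqs)                                       # Σf
sum_fx = sum(f * x for f, x in zip(freqs, heights))  # Σfx

weighted_mean = sum_fx / n

print(n)              # 50
print(sum_fx)         # 2905
print(weighted_mean)  # 58.1
```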
Ex. 2. Solve for the mean of the data.
Class | Frequency (f) | Class Midpoint (x) | fx
76-80 | 3 | 78 | 234
71-75 | 5 | 73 | 365
66-70 | 6 | 68 | 408
61-65 | 8 | 63 | 504
56-60 | 10 | 58 | 580
51-55 | 7 | 53 | 371
46-50 | 7 | 48 | 336
41-45 | 3 | 43 | 129
36-40 | 1 | 38 | 38
TOTAL | 50 | | 2965
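For grouped data, the same calculation uses class midpoints in place of individual values. The sketch below derives each midpoint from the class limits in the table; the tuple layout `(lower, upper, frequency)` is an assumption of this example.

```python
# (lower limit, upper limit, frequency) for each class in the table
classes = [
    (76, 80, 3), (71, 75, 5), (66, 70, 6),
    (61, 65, 8), (56, 60, 10), (51, 55, 7),
    (46, 50, 7), (41, 45, 3), (36, 40, 1),
]

total_f = 0
total_fx = 0.0
for lower, upper, f in classes:
    midpoint = (lower + upper) / 2  # class midpoint x
    total_f += f
    total_fx += f * midpoint        # fx for this class

grouped_mean = total_fx / total_f

print(total_f)       # 50
print(total_fx)      # 2965.0
print(grouped_mean)  # 59.3
```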
1.3 Median
The median is the midpoint of the distribution. It
represents the point in the data where 50% of the
values fall below that point and 50% fall above it.
When the distribution has an even number of
observations, the median is the average of the two
middle scores. The median is the most appropriate
measure of central tendency for ordinal data.
A. For Ungrouped Data
1. Arrange the items (scores, responses,
observations) from lowest to highest.
2. Count to the middle value. For an odd number
of values arranged from lowest to highest, the
median is the middle value. If the array
contains an even number of observations, the
median is the average of the two middle values.
Ex. 1. Consider this odd number of numerical
values:
7, 8, 8, 9, 10, 12, 23
Ex. 2. Consider this even number of numerical
values:
12, 15, 18, 22, 30, 32
Ex. 3. Find the median for the set of measurements.
15, 20, 12, 26, 3, 30, 14
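The three exercises above can be checked with Python's standard `statistics.median`, which sorts the data and then applies the same odd/even rule. The variable names are assumptions of this sketch.

```python
import statistics

ex1 = [7, 8, 8, 9, 10, 12, 23]      # odd count: middle value
ex2 = [12, 15, 18, 22, 30, 32]      # even count: average of 18 and 22
ex3 = [15, 20, 12, 26, 3, 30, 14]   # unsorted: median() sorts first

median_ex1 = statistics.median(ex1)
median_ex2 = statistics.median(ex2)
median_ex3 = statistics.median(ex3)

print(median_ex1)  # 9
print(median_ex2)  # 20.0
print(median_ex3)  # 15
```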
B. For Grouped Data
If the data are grouped into classes, the median will
fall into one of the classes as the $(n/2)$th value. The
process involves several steps and has for its general
formula the following:
$$\text{Median} = L + i\left(\frac{\frac{n}{2} - F}{f}\right)$$
Where:
L = exact lower limit of the class containing the median (median class)
i = interval size
n = total number of observations
F = cumulative frequency in the class preceding the median class
f = frequency of the median class
Ex. 4. The following data show the distribution of
the ages of people interviewed for a survey on a
topic about climate change.
Class Interval (x) | Frequency (f) | Cumulative Frequency (F)
11-20 | 20 | 20
21-30 | 14 | 34
31-40 | 22 | 56
41-50 | 18 | 74
51-60 | 14 | 88
61-70 | 12 | 100
Σf = 100
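The grouped-median formula can be applied to the age distribution above as follows. This is a sketch under one stated assumption: the "exact lower limit" L is taken as the printed lower limit minus 0.5, which is the usual convention for integer class limits. The function name `grouped_median` is hypothetical.

```python
# (lower limit, upper limit, frequency) for each age class, lowest first
classes = [
    (11, 20, 20), (21, 30, 14), (31, 40, 22),
    (41, 50, 18), (51, 60, 14), (61, 70, 12),
]

def grouped_median(classes):
    """Median = L + i * ((n/2 - F) / f) for a grouped distribution."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= half:
            # This is the median class.
            L = lower - 0.5        # exact lower limit (assumed convention)
            i = upper - lower + 1  # interval size
            return L + i * (half - cumulative) / f
        cumulative += f
    raise ValueError("empty distribution")

median_age = grouped_median(classes)
print(round(median_age, 2))  # 37.77
```

Here n/2 = 50 falls in the 31-40 class (cumulative frequency 56), so L = 30.5, i = 10, F = 34 and f = 22.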
1.4 Mode
The mode is the most frequently occurring value in
a set of observations. When two or more values
share the highest frequency, the distribution is
bimodal (two such values) or multimodal (more
than two). When every value occurs the same
number of times, there is no mode. The mode is
appropriate for nominal data.
Ex. 1. The ages of ten (10) persons assembled in
a room are as follows:
16, 18, 18, 25, 25, 25, 30, 34, 36 and 38.
Ex. 2. The number of hours spent by 10 students in
an internet café was as follows:
2, 2, 2, 3, 3, 4, 4, 4, 5, 5
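Both examples can be checked with `statistics.multimode` (Python 3.8 or later), which returns every value that shares the highest frequency. Note that when all values occur equally often it returns all of them, so the "no mode" case has to be recognized separately. The variable names are assumptions of this sketch.

```python
import statistics

ages = [16, 18, 18, 25, 25, 25, 30, 34, 36, 38]
hours = [2, 2, 2, 3, 3, 4, 4, 4, 5, 5]

# multimode returns a list of all most-frequent values
modes_ages = statistics.multimode(ages)
modes_hours = statistics.multimode(hours)

print(modes_ages)   # [25]     -> one mode
print(modes_hours)  # [2, 4]   -> bimodal
```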
3. Measures of Dispersion
Suppose you ask a group of senior high school
students to rate the quality of food at the school
canteen and you find out that the average rating is
3.5 using the following scale: 5 (Excellent); 4 (Very
Satisfactory); 3 (Satisfactory); 2 (Fair); and 1 (Poor).
How close are the ratings given by the students?
Do their ratings cluster around the middle point of
3, or are their ratings spread out or dispersed, with
some students giving ratings of 1 and the rest
giving ratings of 5?
Dispersion is a way of describing how spread
out a set of data is. It is important for
describing the spread of the data, or their
variation around a central value. It is also
called a measure of variability.
The measures to be considered are the range,
the standard deviation and the variance.
3.1 The Range
The range is the difference between the largest and
the smallest values in a set of data.
Consider the following scores obtained by ten (10)
students participating in a mathematics contest:
6, 10, 12, 15, 18, 18, 20, 23, 25, 28
The scores range from 6 to 28. Thus, the range is
28 − 6 = 22.
For example, using the scores obtained by Ms. Static
from her students, we could calculate the range as
R = highest score − lowest score.
50 50 50 42 42 42 42 42 33 33 30 30 30 30 27 27 27 21 19 19
Finding the Range: In this distribution of scores, 50 is
the highest score and 19 is the lowest score. Taking the
difference of the two scores (50 − 19 = 31), the
range of the set of scores is 31.
3.2 Standard Deviation
The standard deviation (SD) is a measure of the
spread or variation of data about the mean.
The SD is, roughly, the typical distance of the
values from the mean.
A. For Ungrouped Data
The formula for calculating the standard deviation
for ungrouped data is given by
$$SD = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n - 1}}$$
Ex. 1. Consider the same data used in computing the range. The
values are 6, 10, 12, 15, 18, 18, 20, 23, 25, 28.
Solution:
1. Compute the mean.
2. Subtract the mean $\bar{x}$ from each score (x), or $x - \bar{x}$.
3. Square each difference from Step 2, or $(x - \bar{x})^2$.
4. Sum all the squares from Step 3, or $\Sigma(x - \bar{x})^2$.
5. Divide the number in Step 4 by $n - 1$. The number of
items or scores is denoted by n. The quantity $n - 1$ is
called the degrees of freedom, a statistical concept that
gives a less biased estimate of the population variance.
6. Compute the standard deviation using the formula, that
is, take the square root of the result in Step 5.
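The six steps above can be traced one by one in code. This is a sketch that follows the n − 1 formula given for ungrouped data; the variable names are assumptions. For these ten scores the computed SD comes out at about 6.90.

```python
import math

scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]
n = len(scores)

mean = sum(scores) / n                        # Step 1: mean
deviations = [x - mean for x in scores]       # Step 2: x - mean
squared = [d ** 2 for d in deviations]        # Step 3: (x - mean)^2
sum_sq = sum(squared)                         # Step 4: sum of squares
variance = sum_sq / (n - 1)                   # Step 5: divide by n - 1
sd = math.sqrt(variance)                      # Step 6: square root

print(mean)                # 17.5
print(sum_sq)              # 428.5
print(round(variance, 2))  # 47.61
print(round(sd, 2))        # 6.9
```

The standard library's `statistics.stdev(scores)` uses the same n − 1 divisor and can serve as a cross-check.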
For data that are approximately normally distributed:
1. Approximately 68% of the scores in the sample fall within one
standard deviation of the mean.
2. Approximately 95% of the scores in the sample fall
within two standard deviations of the mean.
3. Approximately 99.7% of the scores in the sample fall
within three standard deviations of the mean.
4. In our example, with an $\bar{x}$ of 17.5 and an SD of 6.90, we can
say that about 68% of the scores will fall in the range
= (17.5 − 6.90) to (17.5 + 6.90)
= 10.6 to 24.4
5. Likewise, about 95% of the scores will fall in the range
= 17.5 − (2)(6.90) to 17.5 + (2)(6.90)
= (17.5 − 13.8) to (17.5 + 13.8)
= 3.7 to 31.3
For example, Ms. Static wants to know how widely
dispersed the scores of her 10 students are from the mean.
50 50 42 42 42 33 30 30 27 19
3.3 Variance
Variance is the average of the squared differences
from the mean. It can easily be calculated along
with the standard deviation, because the variance is
the square of the standard deviation.
Let us take the example given in the standard
deviation. In Step 5 (refer to the example in SD),
the calculated value is 94.9. This value is what we
call the variance of the data. The same value
results if you compute the square of the
standard deviation: $9.74^2 = 94.9$.
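The variance of Ms. Static's ten scores can be checked with the standard `statistics` module. One observation from running the numbers: the quoted values of 94.9 and 9.74 appear to correspond to dividing by n (the population form, `pvariance`), whereas dividing by n − 1 as in the SD formula gives a somewhat larger value. Both are shown below; the variable names are assumptions of this sketch.

```python
import statistics

scores = [50, 50, 42, 42, 42, 33, 30, 30, 27, 19]

# Population form: sum of squared deviations divided by n
pop_variance = statistics.pvariance(scores)
pop_sd = statistics.pstdev(scores)

# Sample form: sum of squared deviations divided by n - 1
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

print(f"{pop_variance:.2f} {pop_sd:.2f}")        # 94.85 9.74
print(f"{sample_variance:.2f} {sample_sd:.2f}")  # 105.39 10.27
```

In either form, the variance is the square of the corresponding standard deviation.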
INFERENTIAL DATA ANALYSIS
Inferential statistics refers to statistical measures
and techniques that allow us to use samples to
make generalizations about the population from
which the samples were drawn.
1. Test of Significance of Difference (T-test)
• Between Means
❖For independent samples (i.e., when the
respondents consist of two different groups, such as
boys and girls, working mothers and non-working
mothers, healthy and malnourished children, and
the like)
Inferential Data Analysis
While descriptive statistics simply give us a summary of
the data presented, inferential statistics allow
us to make inferences and generalizations about the
population using the selected samples.
Inferential data analysis is used to draw conclusions
about a population from a sample. Just make sure that
the sample accurately reflects the population
(Frost 2020).
Some of the inferential statistics that are used to test
significant differences and relationships are discussed in
this topic.
1. Test of Significance of Difference (T-test)
A t-test is a type of inferential statistic used
to determine if there is a significant
difference between the means of two groups,
or of the same group measured twice, which
may be related in certain features (Kenton, 2019).
As a rule of thumb, this test is used when each of
your two groups has fewer than 30 participants.
Analytics Vidhya (2019) provides some assumptions
we need to check before performing the t-test.
1. The data should follow a continuous or ordinal
scale (for example, the IQ test scores of students).
2. The observations in the data should be randomly
selected.
3. The sample size should be large enough for the data
to approach a normal distribution (although the t-test
is essential for small samples, as their distributions
are non-normal).
4. Variances among the groups should be equal (for
the independent two-sample t-test).
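Case 1: $\sigma_1$ and $\sigma_2$ known, or $n_1 \ge 30$ and $n_2 \ge 30$
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Case 2: $\sigma_1 \ne \sigma_2$ and $n_1 < 30$ and $n_2 < 30$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad df = \text{smaller of } n_1 - 1 \text{ or } n_2 - 1$$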
Case 3: $\sigma_1 = \sigma_2$ and $n_1 < 30$ and $n_2 < 30$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}, \qquad df = n_1 + n_2 - 2$$
where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
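To make Case 3 (equal variances, small samples) concrete, here is a minimal sketch of the pooled-variance t statistic. The two samples are hypothetical values invented for illustration, the null hypothesis is taken as μ1 − μ2 = 0, and all names are assumptions of this example. Looking up the critical value for the resulting df is left to a t table.

```python
import math
import statistics

# Hypothetical scores for two independent groups (illustration only)
group_a = [12, 15, 14, 10, 13]
group_b = [9, 11, 10, 8, 12]

n1, n2 = len(group_a), len(group_b)
mean1, mean2 = statistics.mean(group_a), statistics.mean(group_b)
var1 = statistics.variance(group_a)  # s1^2, n - 1 divisor
var2 = statistics.variance(group_b)  # s2^2, n - 1 divisor

# Pooled variance s_p^2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# t statistic under H0: mu1 - mu2 = 0
hypothesized_diff = 0
std_error = math.sqrt(pooled_var / n1 + pooled_var / n2)
t_stat = ((mean1 - mean2) - hypothesized_diff) / std_error
df = n1 + n2 - 2

print(round(pooled_var, 2))  # 3.1
print(round(t_stat, 3))      # 2.514
print(df)                    # 8
```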
❖For correlated/dependent samples (i.e., when the same
set of respondents or paired sets of respondents are
involved)
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}, \qquad df = n - 1$$
• Between Proportions or Percentages
❖For independent samples
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{pq}{n_1} + \dfrac{pq}{n_2}}}$$
where $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions, $p$ is the
pooled proportion and $q = 1 - p$.
❖For correlated/dependent samples
$$z = \frac{D - A}{\sqrt{A + D}} \quad \text{or} \quad z = \frac{p_1 - p_2}{\sqrt{\dfrac{a + d}{N}}}$$
2. Z-test
A z-test is also a type of inferential statistic
used to determine if there is a significant
difference between the means of two
groups being compared. The difference between
the z-test and the t-test is the number of sample
participants. If you are testing for a significant
difference between the means of two groups
and the sample in each group is larger than 30,
then the z-test is the appropriate test to use.
According to Glen (2020), the following should be
checked before performing the z-test.
1. Your sample size must be greater than 30.
Otherwise, use a t-test.
2. Your data should be normally distributed.
However, for large sample sizes (over 30) this
doesn’t always matter.
3. Your data should be randomly selected from a
population, where each item has an equal chance
of being selected.
4. The sample sizes should be equal if possible.
• ANALYSIS OF VARIANCE (ANOVA)
This is used when the significance of the
difference among the means of three or more
groups is to be determined at one time. ANOVA
relies on the F-ratio to test the hypothesis that
the two variance estimates (between groups and
within groups) are equal; that is, the subgroups are
from the same population. If no true variance
exists between the groups, the ANOVA's
F-ratio should be close to 1.
• An ANOVA test is a way to find out if survey or experiment
results are significant. In other words, it helps you figure
out whether you need to reject the null hypothesis in favor
of the alternative hypothesis.
• Basically, you’re testing groups to see if there’s a difference
between them. Examples of when you might want to test
different groups:
• A group of psychiatric patients is trying three different
therapies: counseling, medication and biofeedback. You want
to see if one therapy is better than the others.
• A manufacturer has two different processes to make light
bulbs. They want to know if one process is better than the
other.
• Students from different colleges take the same exam. You
want to see if one college outperforms the others.
What Does “One-Way” or “Two-Way” Mean?
One-way or two-way refers to the number
of independent variables (IVs) in your Analysis of
Variance test.
One-way has one independent variable (with
two or more levels). For example: brand of cereal.
Two-way has two independent variables (each can
have multiple levels). For example: brand of cereal
and calories.
ANOVA is used when the significance of the difference among the
means of two or more groups is to be determined at one time.
❖One-Way Analysis of Variance
A typical ANOVA Table
Source of variation | Degrees of Freedom | Sum of Squares | Mean Square | F-ratio | p
Between groups | | | | |
Within groups | | | | |
Total | | | | |
ANOVA relies on the F-ratio to test the hypothesis that
the two variances are equal; that is, the subgroups are
from the same population. “Between groups” refers to the
variation between each group mean and the grand or
overall mean. “Within groups” refers to the variation of
individual scores around their own group mean.
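The between-groups and within-groups quantities in an ANOVA table can be computed by hand as in the sketch below. The three groups echo the therapy example, but the scores are hypothetical values invented for illustration, and all names are assumptions of this example. The p value is not computed here; it would come from an F table or a statistics package.

```python
# Hypothetical outcome scores for three therapy groups (illustration only)
groups = {
    "counseling": [4, 5, 6],
    "medication": [6, 7, 8],
    "biofeedback": [8, 9, 10],
}

all_scores = [x for g in groups.values() for x in g]
grand_mean = sum(all_scores) / len(all_scores)

# Between groups: each group mean versus the grand mean
ss_between = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
)
# Within groups: each score versus its own group mean
ss_within = sum(
    (x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g
)

df_between = len(groups) - 1               # k - 1
df_within = len(all_scores) - len(groups)  # N - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

print(ss_between, df_between, ms_between)  # 24.0 2 12.0
print(ss_within, df_within, ms_within)     # 6.0 6 1.0
print(f_ratio)                             # 12.0
```

An F-ratio this far above 1 suggests the group means differ by more than the within-group variation would explain.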
2. TESTS OF SIGNIFICANT RELATIONSHIP
• Spearman Rank-Order Correlation or
Spearman rho
It is used when data available are expressed in
ranks (ordinal variables). Use Spearman rho
when you have two ranked variables, and you
want to test whether the two variables covary;
whether, as one variable increases, the other
variable tends to increase or decrease. You also
use this if you have one measurement variable
and one ranked variable; in this case, you
convert the measurement variable to ranks and
use Spearman rank correlation on the two sets
of variables (McDonald 2014).
This is used when the data available are
expressed in terms of ranks (ordinal
variables).
$$\rho = 1 - \frac{6\,\Sigma D^2}{N(N^2 - 1)}$$
where D is the difference between the paired ranks
and N is the number of pairs.
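The Spearman formula can be applied directly once both variables are expressed as ranks. The ranks below are hypothetical values for six subjects, invented for illustration, and the sketch assumes there are no tied ranks (the simple formula is exact only in that case). The function name `spearman_rho` is an assumption.

```python
def spearman_rho(rank_x, rank_y):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), assuming no tied ranks."""
    n = len(rank_x)
    sum_d_sq = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_sq) / (n * (n ** 2 - 1))

# Hypothetical ranks of six subjects on two variables (illustration only)
rank_x = [1, 2, 3, 4, 5, 6]
rank_y = [2, 1, 4, 3, 6, 5]

rho = spearman_rho(rank_x, rank_y)
print(round(rho, 3))  # 0.829
```

Each D is ±1 here, so ΣD² = 6 and ρ = 1 − 36/210.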
For example, Melfi and Poyser (2007) observed
the dominance behavior of 6 male colobus
monkeys in a zoo. They ranked the monkeys
based on their dominance over others. After
determining the dominance rankings, they
counted eggs of Trichuris nematodes per gram
of monkey feces, a measurement variable. They
wanted to know whether social dominance was
associated with the number of nematode eggs,
so they converted eggs per gram of feces to
ranks and used Spearman rank correlation.
• Chi-Square Test for Independence
It is used when data are expressed in terms of
frequencies or percentages (nominal
variables). A chi-square test measures how
expectations are related to actual observed
data. The data used in calculating a chi-square
test must be random, raw, mutually exclusive,
drawn from independent variables, and
drawn from a large enough sample (Hayes
2020).
This is used when data are expressed in terms
of frequencies or percentages (nominal
variables).
Contingency Table
$$\chi^2 = \Sigma \frac{(O - E)^2}{E}, \qquad df = (r - 1)(c - 1)$$
where O is the observed frequency, E is the expected
frequency, r is the number of rows, c is the number of
columns, and
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
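The chi-square computation for a contingency table can be sketched as follows. The 2 × 2 table of observed counts is hypothetical, invented for illustration, and the names are assumptions of this example. Comparing the statistic with a critical value for the resulting df is left to a chi-square table.

```python
# Hypothetical 2 x 2 contingency table of observed counts (illustration only)
observed = [
    [20, 30],
    [30, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count: (row total)(column total) / grand total
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)

print(chi_square)  # 4.0
print(df)          # 1
```

Every expected count is 25 in this table, so each cell contributes 25/25 = 1 to the statistic.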
• Product-Moment Coefficient of
Correlation or Pearson r
This is used when data are expressed in terms
of ratio and interval variables. It is used to
evaluate the linear relationship between two
continuous variables. A relationship is linear
when a change in one variable is associated
with a proportional change in the other
variable.
For example, you want to determine if there
is an association between students’ anxiety in
mathematics (continuous variable) and their
performance in their mathematics class
(continuous variable).
This is used when data are expressed in terms
of scores such as weights and heights or
scores in a test (ratio or interval).
Case 1: When deviations from the mean are used
$$r = \frac{\Sigma(x - \bar{x})(y - \bar{y})}{\sqrt{\Sigma(x - \bar{x})^2\,\Sigma(y - \bar{y})^2}}$$
Case 2: When raw scores on the original
observations are used
$$r = \frac{n\Sigma xy - (\Sigma x)(\Sigma y)}{\sqrt{\left[n\Sigma x^2 - (\Sigma x)^2\right]\left[n\Sigma y^2 - (\Sigma y)^2\right]}}$$
• T-test to test the Significance of Pearson r
The t-test for the significance of Pearson r is used
to determine if the value of the computed coefficient of
correlation is significant. That is, does it represent a
real correlation, or is the obtained coefficient of
correlation merely brought about by chance?
The formula
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
where:
r = correlation coefficient
n = sample size (number of paired observations)
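The significance formula is a one-line computation. The values r = 0.6 and n = 27 below are hypothetical, chosen only because they give a round result; the function name is an assumption. The resulting t is conventionally compared against a critical value with n − 2 degrees of freedom, which is standard practice rather than something stated on the slide.

```python
import math

def t_for_pearson_r(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2))."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical correlation and sample size (illustration only)
r = 0.6
n = 27

t_value = t_for_pearson_r(r, n)
df = n - 2  # degrees of freedom conventionally used with this test

print(round(t_value, 2))  # 3.75
print(df)                 # 25
```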
The coefficient of determination ($r^2$) can also be
used to indicate what proportion of the total
variation in the dependent variable is explained by
the linear relationship with the independent
variable. You can multiply by 100 to convert the
coefficient of determination to a percentage.