Data Analysis Using Statistics
• When the necessary data have already been collected,
the next step is to organize the raw data for data
analysis.
• It is important that the researcher is assured of the
quality of the data for accuracy, consistency,
completeness and systematic arrangement to facilitate
coding and tabulation.
• Every research methodology requires a data analysis
plan. The plan specifies the statistical measures to
use in addressing the research questions.
• The appropriate methods of data analysis are
determined by the type of data, the variables to be
used, the number of cases and the distribution of the
variables.
PURPOSE OF DATA ANALYSIS PLAN
The purpose of a data analysis plan is to gather
useful information to find solutions to research
questions of interest. It may be used to:
• Describe data sets;
• Determine the degree of relationship of variables;
• Determine differences between variables;
• Predict outcomes; and
• Compare variables
All of these purposes can be served by using any one
or a combination of the following data analysis
strategies:
• EXPLORATORY DATA ANALYSIS
This type of data analysis is used when it is not
clear what to expect from the data. This strategy
uses numerical and visual presentations such as
graphs. Since the research of interest is new, the
data may contain inconsistencies, such as missing
values, unexpected distributions, unusually small
or large values, or invalid data.
• DESCRIPTIVE DATA ANALYSIS
This type of data analysis is used to describe, show
or summarize data in a meaningful way, leading to
a simple interpretation of data. Descriptive data
analyses do not allow you to formulate conclusions
beyond the data that you have described. The
commonly used descriptive statistics are those that
analyze the distribution of data such as frequency,
percentage, measures of central tendency and
measures of dispersion.
• INFERENTIAL DATA ANALYSIS
Inferential statistics test hypotheses about a set of
data to reach conclusions or make generalizations
beyond merely describing the data. Inferential
statistics include tests of significance of difference,
such as the t-test and Analysis of Variance (ANOVA), and
tests of relationship, such as the Product-Moment
Coefficient of Correlation (Pearson r), Spearman
rho, linear regression and the Chi-square test.
Quantitative Data Analysis
Quantitative data analysis is a systematic approach to
investigation in which numerical data are collected and
transformed into meaningful information (Prieto,
Naval, and Carey 2017). Data analysis refers to how a
researcher gains meaningful insight from a large
amount of data. The main purpose of data analysis in
research is to find meaning in data so that the derived
knowledge can be used to make informed decisions.
QUANTITATIVE ANALYSIS IN EVALUATION
• NOMINAL SCALE
A nominal scale of measurement is used for
labelling variables. It is sometimes called
categorical data. Basketball players wear sports
shirts with numbers, but that is just a way to
identify the players. Likewise, if you want to
categorize respondents based on gender, you could
use 1 for male, and 2 for female. No order or
distance is observed. The Yes or No scale is an
example of nominal data. The numbers assigned to
the variables have no quantitative value. Some
examples of variables measured on a nominal scale
are gender, religious affiliation, race or ethnic
group.
• ORDINAL SCALE
An ordinal scale of measurement assigns an order to
items according to the characteristic being measured. It
involves the ranking of individuals, attitudes and
characteristics. The order in the honor roll (first
honor, second honor, third honor), order of
agreement (strongly agree, agree, strongly
disagree) and economic status (low, average, high)
are some examples.
Numerical scores such as first, second, third and so
on are assigned, but the numbers have no meaning
beyond establishing the ranking among a set of data.
• INTERVAL SCALE
The interval scale has equal units of measurement,
thereby making it possible to interpret the order of
scale scores and the distance between them.
However, interval scales do not have a “true zero”.
With interval data, addition and subtraction are
meaningful, but multiplication and division are not,
because ratios of values cannot be interpreted
without a true zero.
• RATIO SCALE
The ratio scale is considered the highest level of
measurement. It has the characteristics of an
interval scale and also has a true zero point. Because
of this property, all statistical operations can be
performed on ratio scales. All descriptive and inferential
statistics may be applied. Values can be
added, subtracted, multiplied and divided.
Descriptive Data Analysis
Descriptive data analysis provides simple summaries
about the sample and the measures. It is used to
simply describe what is or what the data show.
Different statistical measures are used to analyze data
and draw conclusions under descriptive data analysis
(Trochim, 2020). This type of data analysis does not
attempt to test hypotheses. The following statistical
measures of descriptive analysis also serve as inputs
to further statistical testing (Prieto, Naval, and Carey
2017; Florida State University 2005):
Frequency
$$\bar{X} = \frac{\sum x}{n}$$
DESCRIPTIVE DATA ANALYSIS
1.1 MEAN
A. For Ungrouped Data
Ex. 1. Find the mean of the measurements:
18, 26, 27, 29, 30
Ex. 2. Find the mean of the following
scores in the National Achievement Test (NAT):
90 95 96 87 110
102 95 98 87 117
115 96 91 95 95
93 105 86 103 106
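As a quick check on the two exercises above, the ungrouped mean can be computed directly as the sum of the values divided by their count. This is a minimal Python sketch; the function and variable names are illustrative and do not come from the source.

```python
# Ex. 1: five measurements
measurements = [18, 26, 27, 29, 30]

# Ex. 2: the 20 NAT scores, read row by row from the table
nat_scores = [
    90, 95, 96, 87, 110,
    102, 95, 98, 87, 117,
    115, 96, 91, 95, 95,
    93, 105, 86, 103, 106,
]


def ungrouped_mean(values):
    """Mean of ungrouped data: sum of the values divided by their count."""
    return sum(values) / len(values)


mean_ex1 = ungrouped_mean(measurements)
mean_ex2 = ungrouped_mean(nat_scores)
print(mean_ex1, mean_ex2)
```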
B. For Grouped Data
When the observations are grouped into classes,
the formula for grouped data is as follows:
$$\bar{X} = \frac{\sum (\text{frequency of each class} \times \text{class midpoint})}{\text{total number of observations}}$$
1.2 The Weighted Mean
The weighted average or weighted mean is
necessary in some situations. Suppose that you are
given the means of two or more measurements
and you wish to find the mean of all measures
combined into one group. The formula for
weighted mean is given by,
$$\bar{X}_w = \frac{\sum f x}{n}$$
Where:
f = frequency
x = numerical value or item in a set of data
n = number of observations in the data set
Ex. 1. Find the mean of the heights of 50 senior
high school students summarized as follows:
Heights (in inches)   Frequency (f)   Height x Frequency (fx)
56                    6               336
57                    15              855
58                    12              696
59                    8               472
60                    5               300
61                    2               122
62                    2               124
Total                 Σf = 50         Σfx = 2905
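The height table above can be worked through in a few lines of Python. This is only a sketch of the weighted mean (sum of f times x, divided by n) applied to the tabulated heights and frequencies; the names `weighted_mean`, `heights` and `frequencies` are illustrative.

```python
# Heights (in inches) and their frequencies, taken from the table
heights = [56, 57, 58, 59, 60, 61, 62]
frequencies = [6, 15, 12, 8, 5, 2, 2]


def weighted_mean(values, freqs):
    """Weighted mean: sum of f*x divided by n, where n is the sum of f."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, values))
    return sum_fx / n


mean_height = weighted_mean(heights, frequencies)
print(mean_height)
```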
Ex. 2. Solve for the mean of the data.
Class    Frequency (f)   Class Midpoint (x)   fx
76-80    3               78                   234
71-75    5               73                   365
66-70    6               68                   408
61-65    8               63                   504
56-60    10              58                   580
51-55    7               53                   371
46-50    7               48                   336
41-45    3               43                   129
36-40    1               38                   38
TOTAL    50                                   2965
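For grouped data, the only extra step is deriving each class midpoint from the class limits. The sketch below assumes the midpoint is the average of the stated lower and upper limits, which matches the midpoints in the table; the tuple layout and function name are illustrative.

```python
# Each class as (lower limit, upper limit, frequency), from the table
classes = [
    (76, 80, 3), (71, 75, 5), (66, 70, 6),
    (61, 65, 8), (56, 60, 10), (51, 55, 7),
    (46, 50, 7), (41, 45, 3), (36, 40, 1),
]


def grouped_mean(class_rows):
    """Mean of grouped data: sum of f * midpoint over total frequency."""
    n = sum(f for _, _, f in class_rows)
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in class_rows)
    return sum_fx / n


mean_grouped = grouped_mean(classes)
print(mean_grouped)
```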
1.3 Median
The median is the midpoint of the distribution. It
represents the point in the data where 50% of the
values fall below that point and 50% fall above it.
When the distribution has an even number of
observations, the median is the average of the two
middle scores. The median is the most appropriate
measure of central tendency for ordinal data.
A. For Ungrouped Data
1. Arrange the items (scores, responses,
observations) from lowest to highest.
2. Count to the middle value. For an odd number
of values arranged from lowest to highest, the
median is the middle value. If the array
contains an even number of observations, the
median is the average of the two middle values.
Ex. 1. Consider this set with an odd number of
values:
7, 8, 8, 9, 10, 12, 23
Ex. 2. Consider this set with an even number of
values:
12, 15, 18, 22, 30, 32
Ex. 3. Find the median of this set of measurements:
15, 20, 12, 26, 3, 30, 14
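The two-step procedure for ungrouped data (sort, then take the middle value or the average of the two middle values) can be sketched as follows. The function name is illustrative; the three data sets are the exercises above.

```python
def median_ungrouped(values):
    """Median of ungrouped data: sort, then pick or average the middle."""
    ordered = sorted(values)          # step 1: lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


median_ex1 = median_ungrouped([7, 8, 8, 9, 10, 12, 23])
median_ex2 = median_ungrouped([12, 15, 18, 22, 30, 32])
median_ex3 = median_ungrouped([15, 20, 12, 26, 3, 30, 14])
print(median_ex1, median_ex2, median_ex3)
```

Note that Ex. 3 is not given in sorted order, so the sorting step matters there.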
B. For Grouped Data
If the data are grouped into classes, the median will fall
into one of the classes as the $(n/2)$th value. The process
involves several steps and has for its general formula
the following:
$$\text{Median} = L + i\left(\frac{\frac{n}{2} - F}{f}\right)$$
Where:
L = exact lower limit of the class containing the median (median class)
i = interval size
n = total number of observations
F = cumulative frequency of the class preceding the median class
f = frequency of the median class
Ex. 4. The following data show the distribution of
the ages of people interviewed for a survey on a
topic about climate change.
Class Interval (x)   Frequency (f)   Cumulative Frequency (F)
11-20                20              20
21-30                14              34
31-40                22              56
41-50                18              74
51-60                14              88
61-70                12              100
                     Σf = 100
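A sketch of the grouped-median computation for the age distribution above. It assumes integer class limits, so that the exact lower limit L is the stated lower limit minus 0.5 and the interval size i is the upper limit minus the lower limit plus 1. These conventions and the function name are assumptions, not taken from the source.

```python
# Each class as (lower limit, upper limit, frequency), in ascending order
age_classes = [
    (11, 20, 20), (21, 30, 14), (31, 40, 22),
    (41, 50, 18), (51, 60, 14), (61, 70, 12),
]


def grouped_median(class_rows):
    """Median = L + i * ((n/2 - F) / f), using the median class."""
    n = sum(f for _, _, f in class_rows)
    half = n / 2
    cumulative = 0                       # F: cumulative frequency so far
    for lower, upper, f in class_rows:
        if cumulative + f >= half:       # first class reaching n/2
            exact_lower = lower - 0.5    # L (assumed: integer limits)
            interval = upper - lower + 1  # i
            return exact_lower + interval * (half - cumulative) / f
        cumulative += f
    raise ValueError("no classes given")


median_age = grouped_median(age_classes)
print(round(median_age, 2))
```

Here n/2 = 50, so the median class is 31-40, with L = 30.5, i = 10, F = 34 and f = 22.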
1.4 Mode
The mode is the most frequently occurring value in
a set of observations. When two or more values
share the highest frequency, the distribution is
bimodal (two such values) or multimodal (more
than two such values). When every value occurs
equally often, there is no mode. The mode is
appropriate for nominal data.
Ex. 1. The ages of ten (10) persons assembled in
a room are as follows:
16, 18, 18, 25, 25, 25, 30, 34, 36 and 38.
Ex. 2. The number of hours spent by 10 students in
an internet café was as follows:
2, 2, 2, 3, 3, 4, 4, 4, 5, 5
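The two exercises illustrate a single mode and a bimodal set. A minimal sketch using a frequency count is shown below; returning an empty list when every value occurs equally often is a choice made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    # Every distinct value occurs equally often: no mode
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


ages = [16, 18, 18, 25, 25, 25, 30, 34, 36, 38]
hours = [2, 2, 2, 3, 3, 4, 4, 4, 5, 5]

mode_ages = modes(ages)     # one mode
mode_hours = modes(hours)   # two modes (bimodal)
print(mode_ages, mode_hours)
```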
3. Measures of Dispersion
Suppose you ask a group of senior high school
students to rate the quality of food at the school
canteen and you find out that the average rating is
3.5 using the following scale: 5 (Excellent); 4 (Very
Satisfactory); 3 (Satisfactory); 2 (Fair); and 1 (Poor).
How close are the ratings given by the students?
Do their ratings cluster around the middle point of
3, or are their ratings spread out or dispersed, with
some students giving ratings of 1 and the rest
giving ratings of 5?
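To make the question concrete, the sketch below compares two hypothetical sets of canteen ratings that share the same average of 3.5 but differ in spread. Both data sets are invented for illustration, and the population standard deviation is used here as one possible measure of dispersion.

```python
from statistics import mean, pstdev

# Hypothetical ratings from eight students (not from the source)
clustered = [3, 4, 3, 4, 4, 3, 4, 3]   # ratings close to the average
polarized = [1, 5, 1, 5, 5, 5, 1, 5]   # ratings at the extremes

mean_clustered = mean(clustered)
mean_polarized = mean(polarized)

# Same mean, very different dispersion
spread_clustered = pstdev(clustered)
spread_polarized = pstdev(polarized)

print(mean_clustered, mean_polarized)
print(round(spread_clustered, 3), round(spread_polarized, 3))
```

The mean alone cannot tell these two groups apart; the measure of dispersion does.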
where the pooled variance is
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
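The pooled variance combines two sample variances, weighting each by its degrees of freedom. A small sketch with hypothetical inputs follows; the sample sizes and variances are made up for illustration.

```python
def pooled_variance(n1, var1, n2, var2):
    """Pooled variance: ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


# Hypothetical groups: n1 = 10 with variance 4.0, n2 = 12 with variance 6.0
sp2 = pooled_variance(10, 4.0, 12, 6.0)
print(sp2)
```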
INFERENTIAL DATA ANALYSIS
1. Test of Significance of Difference (T-test)
❖For correlated/dependent samples (i.e., when the same
set of respondents or paired sets of respondents are
involved)
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} \qquad (df = n - 1)$$
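The dependent-samples t statistic is the mean of the paired differences, minus the hypothesized mean difference, divided by the standard error of those differences (their sample standard deviation over the square root of n), with n - 1 degrees of freedom. A minimal sketch follows, assuming a hypothesized mean difference of 0 and invented before/after scores for five respondents.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after, mu_d=0.0):
    """t for dependent samples; returns (t, degrees of freedom)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = mean(diffs)          # mean of the differences
    s_d = stdev(diffs)           # sample standard deviation of differences
    t = (d_bar - mu_d) / (s_d / sqrt(n))
    return t, n - 1


# Hypothetical scores of the same five respondents (not from the source)
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 14]

t_stat, df = paired_t(before, after)
print(round(t_stat, 3), df)
```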
• Between Proportions or Percentages
❖For independent samples
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{pq}{n_1} + \dfrac{pq}{n_2}}}$$
• Between Proportions or Percentages
❖For correlated/dependent samples
$$z = \frac{D - A}{\sqrt{A + D}} \quad \text{or} \quad z = \frac{p_1 - p_2}{\sqrt{\dfrac{a + d}{N}}}$$
2. Z-test
A z-test is also a type of inferential statistic
used to determine if there is a significant
difference between the means of two
groups being compared. The difference between the
z-test and the t-test lies in the sample size. If you
are testing for a significant difference between the
means of two groups and the sample in each group
is larger than 30, then the z-test is the
appropriate test to use.
According to Glen (2020), the following conditions
should be checked before performing the z-test.
1. Your sample size must be greater than 30.
Otherwise, use a t-test.
2. Your data should be normally distributed.
However, for large sample sizes (over 30) this
doesn’t always matter.
3. Your data should be randomly selected from a
population, where each item has an equal chance
of being selected.
4. The sample sizes should be equal if possible.
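The slides do not give the z formula itself, so the sketch below assumes the common large-sample form: the difference between the two sample means divided by the square root of (s1²/n1 + s2²/n2). It also enforces the sample-size condition from the checklist above. All input numbers are hypothetical.

```python
from math import sqrt


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample z for the difference between two group means."""
    # Condition 1 from the checklist: each sample must exceed 30
    if n1 <= 30 or n2 <= 30:
        raise ValueError("sample size of 30 or fewer: use a t-test instead")
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical summary statistics for two groups (not from the source)
z_stat = two_sample_z(52.0, 8.0, 40, 48.0, 6.0, 36)
print(round(z_stat, 3))
```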
• ANALYSIS OF VARIANCE (ANOVA)
❖One-Way Analysis of Variance
ANOVA relies on the F-ratio to test the hypothesis that
the two variance estimates (between group and within
group) are equal; that is, the subgroups are
from the same population. “Between group” refers to the
variation between each group mean and the grand or
overall mean.
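A compact sketch of how the F-ratio is formed for a one-way ANOVA. The between-group part follows the description above; the within-group part (variation of each value around its own group mean) is the standard counterpart and is assumed here, since the slides do not spell it out. The three groups are invented data.

```python
def one_way_f(groups):
    """F-ratio = between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between group: each group mean against the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within group: each value against its own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical scores for three groups (not from the source)
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_ratio = one_way_f(groups)
print(f_ratio)
```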
2. TESTS OF SIGNIFICANT RELATIONSHIP
• Spearman Rank-Order Correlation or
Spearman rho
It is used when data available are expressed in
ranks (ordinal variables). Use Spearman rho
when you have two ranked variables, and you
want to test whether the two variables covary;
whether, as one variable increases, the other
variable tends to increase or decrease. You also
use this if you have one measurement variable
and one ranked variable; in this case, you
convert the measurement variable to ranks and
use Spearman rank correlation on the two sets
of variables (McDonald 2014).
$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
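Spearman rho is computed as 1 minus 6 times the sum of squared rank differences, divided by N(N² - 1), where D is the difference between the two ranks of each case. The sketch below uses two hypothetical sets of ranks for six cases; they are not real study data, and the formula assumes there are no tied ranks.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rho from two lists of ranks (no ties assumed)."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Hypothetical ranks of six cases on two variables
ranks_x = [1, 2, 3, 4, 5, 6]
ranks_y = [2, 1, 4, 3, 6, 5]

rho = spearman_rho(ranks_x, ranks_y)
print(round(rho, 4))
```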
For example, Melfi and Poyser (2007) observed
the dominance behavior of 6 male colobus
monkeys in a zoo. They ranked the monkeys
by their dominance over the others. After
determining the dominance rankings, they
counted eggs of Trichuris nematodes per gram
of monkey feces, a measurement variable. They
wanted to know whether social dominance was
associated with the number of nematode eggs,
so they converted eggs per gram of feces to
ranks and used Spearman rank correlation.
• Chi-Square Test for Independence
It is used when data are expressed in terms of
frequencies or percentages (nominal
variables). A chi-square test measures how
expectations compare with actual observed
data. The data used in calculating a chi-square
test must be random, raw, mutually exclusive,
drawn from independent variables, and
drawn from a large enough sample (Hayes
2020).
Contingency Table
$$\chi^2 = \sum \frac{(O - E)^2}{E} \qquad df = (r - 1)(c - 1)$$
where
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
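The chi-square statistic sums (O - E)² / E over every cell of the contingency table, where O is the observed frequency, each expected frequency E is (row total)(column total) / (grand total), and df = (r - 1)(c - 1). The sketch below applies this to a hypothetical 2 x 2 table of observed frequencies; the counts are invented for illustration.

```python
def chi_square_independence(observed):
    """Return (chi-square, df) for a table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency for this cell
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 contingency table (not from the source)
observed = [[10, 20], [20, 10]]
chi2_stat, chi2_df = chi_square_independence(observed)
print(round(chi2_stat, 4), chi2_df)
```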
• Product – Moment Coefficient of
Correlation or Pearson r
This is used when data are expressed in terms
of ratio and interval variables. It is used to
evaluate the linear relationship between two
continuous variables. A relationship is linear
when a change in one variable is associated
with a proportional change in the other
variable.
For example, you want to determine if there
is an association between students’ anxiety in
mathematics (continuous variable) and their
performance in their mathematics class
(continuous variable).
This is used when data are expressed in terms
of scores such as weights and heights or
scores in a test (ratio or interval).
Case 1: When deviations from the mean are used
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
Case 2: When raw scores on the original
observations are used
$$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{\left[n \sum x^2 - (\sum x)^2\right]\left[n \sum y^2 - (\sum y)^2\right]}}$$
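The deviation-score and raw-score formulas for Pearson r are algebraically equivalent, which the sketch below checks on a small hypothetical data set. The five (x, y) pairs are invented for illustration.

```python
from math import sqrt


def pearson_r_deviation(x, y):
    """Pearson r using deviations from the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_dxdy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_dx2 = sum((a - mean_x) ** 2 for a in x)
    sum_dy2 = sum((b - mean_y) ** 2 for b in y)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)


def pearson_r_raw(x, y):
    """Pearson r using raw scores."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical paired scores (not from the source)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_dev = pearson_r_deviation(x, y)
r_raw = pearson_r_raw(x, y)
print(round(r_dev, 4), round(r_raw, 4))
```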
• T-test to test the Significance of Pearson r
The t-test for the significance of Pearson r is used
to determine if the computed coefficient of
correlation is significant. That is, does it represent a
real correlation, or is the obtained coefficient of
correlation merely brought about by chance?
The formula
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
where:
r = correlation coefficient
n = sample size (number of paired observations)
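A short sketch of the significance test for Pearson r, using the t statistic equal to r times the square root of (n - 2) / (1 - r²). The values r = 0.6 and n = 27 are hypothetical, and the degrees of freedom n - 2 are the usual convention for this test rather than something stated on the slide.

```python
from math import sqrt


def t_for_pearson_r(r, n):
    """t statistic for testing whether Pearson r differs from zero."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return t, n - 2  # (t, degrees of freedom; n - 2 is assumed)


# Hypothetical correlation from 27 paired observations
t_r, df_r = t_for_pearson_r(0.6, 27)
print(round(t_r, 2), df_r)
```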
The coefficient of determination ($r^2$) can also be
used to indicate what proportion of the total
variation in the dependent variable is explained by
the linear relationship with the independent
variable. You can multiply it by 100 to convert the
coefficient of determination to a percentage.