Statistical Treatment (Part of Module 4)
LESSON 3B
A Taxonomy of Statistics
Planning Data Analyses using Statistics
◦ Exploratory Data Analysis. This type of data analysis is used when it is not clear what to expect from
the data. This strategy uses numerical and visual presentations such as graphs. Since the research topic of
interest is new, it is possible to find some inconsistencies in the data, such as missing values, unusually
small or large values, or invalid data.
◦ Descriptive Data Analysis. This type of data analysis is used to describe, show, or summarize data in a
meaningful way, leading to a simple interpretation of the data. It does not allow conclusions to be drawn
beyond the data that have been described. The commonly used descriptive statistics are those that
analyze the distribution of data, such as frequency, percentage, measures of central tendency, and measures
of dispersion.
Any of the data analysis plans mentioned above can be carried out using one or a
combination of the following data analysis strategies:
◦ Inferential Data Analysis. This type of analysis tests hypotheses about a set of data to reach conclusions or
make generalizations beyond merely describing the data. Inferential statistics include tests of significance of
difference, such as the t-test and Analysis of Variance (ANOVA), and tests of relationship, such as the
Product-Moment Coefficient of Correlation (Pearson r), Spearman rho, linear regression, and the chi-square test.
Quantitative Analysis of Evaluation
◦ Determining the level of measurement of the quantitative data is important before proceeding with
the analysis, because the choice of statistical measure/s depends on the level of measurement of the
data. The following are the levels of measurement:
Nominal scale
Ordinal scale
Interval scale
Ratio scale
◦ The nominal scale of measurement is used for labelling variables. Nominal data are sometimes called categorical data.
◦ The ordinal scale of measurement assigns an order to items on the characteristic being measured. It involves
the ranking of individuals, attitudes, and characteristics.
◦ The interval scale has equal units of measurement, making it possible to interpret both the order of the
scale scores and the distance between them. However, interval scales do not have a “true zero”. With
interval data, addition and subtraction are meaningful, but multiplication and division are not.
◦ The ratio scale is considered the highest level of measurement. It has the characteristics of an interval scale
but also has a true zero point. Because of this property, all statistical operations can be performed on ratio
scales: all descriptive and inferential statistics may be applied, and values can be added, subtracted,
multiplied, and divided.
Descriptive Data Analysis
1. Measures of Central Tendency. The common measures of central
tendency, sometimes called measures of location or center, include the
mean, median, and mode.
1.1 Mean. Often called the arithmetic average of a set of data, the mean is the
sum of the observed values in the distribution divided by the number of
observations. It is commonly used for interval or ratio data. The symbol x̅
(“x-bar”) is used to denote the arithmetic mean.
Mean
$$\bar{x} = \frac{\text{sum of observations}}{\text{number of observations}}$$
Example: Find the mean of the measurements 18, 26, 27, 29, 30.
Solution:
$$\bar{x} = \frac{18 + 26 + 27 + 29 + 30}{5} = \frac{130}{5} = 26$$
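As a quick check of the arithmetic, here is a minimal Python sketch of the mean for ungrouped data; the function name `arithmetic_mean` is only illustrative (Python's standard `statistics.mean` gives the same result).

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


measurements = [18, 26, 27, 29, 30]
print(arithmetic_mean(measurements))  # 26.0
```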
(Mean) For grouped data
When data are grouped into classes, each class is represented by its class midpoint, and the mean is computed as follows:
$$\bar{x} = \sum \left( \frac{\text{frequency of each class}}{\text{total number of observations}} \times \text{class midpoint} \right)$$
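The grouped-data rule (weight each class midpoint by its class frequency, then divide by the total number of observations) can be sketched as below. The class limits and frequencies are hypothetical, not taken from the module, and `grouped_mean` is an illustrative name.

```python
def grouped_mean(classes):
    """Mean of grouped data.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    Each class is represented by its midpoint (lower + upper) / 2.
    """
    n = sum(freq for _, _, freq in classes)
    weighted_sum = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes)
    return weighted_sum / n


# Hypothetical frequency distribution (not from the module)
classes = [(10, 19, 3), (20, 29, 5), (30, 39, 2)]
print(grouped_mean(classes))  # 23.5
```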
1.2 Weighted Mean
$$\bar{x}_w = \frac{\sum fx}{n}$$
Where: f – frequency
x – numerical value or item in a set of data
n – number of observations in the data set
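A minimal sketch of the weighted-mean formula, where n is the sum of the frequencies. The values and frequencies below are hypothetical (the module's height table is not reproduced here), and `weighted_mean` is an illustrative name.

```python
def weighted_mean(values, frequencies):
    """x̅w = Σfx / n, where n = Σf is the number of observations."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    n = sum(frequencies)
    return sum(f * x for x, f in zip(values, frequencies)) / n


# Hypothetical data: value 2 occurs 4 times, 3 occurs 3 times, 5 occurs 3 times
print(weighted_mean([2, 3, 5], [4, 3, 3]))  # 3.2
```

For grouped data, the same function applies with the class midpoints passed as `values`.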
(Weighted Mean) For ungrouped data
Example: Find the mean of the heights of 50 senior high school students summarized as follows:
$$\bar{x}_w = \frac{\sum fx}{\sum f} = \frac{2905}{50} = 58.1 \text{ inches}$$
(Weighted Mean) For grouped data
When the data are grouped into classes, the class midpoint represents the “x” in the formula. Example: Compute the mean of the data
below.
Solution:
$$\bar{x}_w = \frac{\sum fx}{n} = \frac{2965}{50} = 59.3$$
Example (median of an even number of values): Consider these six values: 12, 15, 18, 22, 30, 32.
The two middle values are 18 and 22. The median is the average of the two middle values:
(18 + 22) / 2 = 40 / 2 = 20, so the median is 20.
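The rule for finding the median (take the middle value, or average the two middle values when the number of values is even) can be sketched as follows, using the six values 12, 15, 18, 22, 30, 32 whose middle pair is 18 and 22. `median` is an illustrative helper; Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([12, 15, 18, 22, 30, 32]))  # 20.0
```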
If the data are grouped into classes, the median falls in the class that contains the (n/2)th
value. The process involves several steps, and its general formula is:
$$\text{Median} = L + i\left(\frac{n/2 - F}{f}\right)$$
Where:
L = exact lower limit of the class containing the median (median class)
i = interval size
n = total number of items or observations
F = cumulative frequency in the class preceding the median class
f = frequency of median class
(Median) For grouped data
Example: The following data show the distribution of the ages of people interviewed for a
survey on the topic of climate change.
Solution:
Since there are 100 values in the data set, the median is the (n/2)th or (100/2)th
item, that is, the 50th value.
Determine in which class the 50th value falls. The first two classes have a cumulative
frequency of 34.
Another 16 values are needed to reach 50. Thus, the 50th value falls in the next class, which contains
22 values. The median class is therefore 31–40.
Thus,
L = 30.5   n = 100   F = 34   f = 22   i = 10
Solution:
$$\text{Median} = L + i\left(\frac{n/2 - F}{f}\right) = 30.5 + 10\left(\frac{100/2 - 34}{22}\right) = 30.5 + 7.27 = 37.77$$
This means that 50%, or 50, of the 100 ages fall below 37.77 and 50%, or 50, fall
above it.
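The grouped-median formula can be checked with the values from the climate-change survey example (median class 31–40, class width 10). This is a minimal sketch; `grouped_median` and its parameter names simply mirror the symbols in the formula.

```python
def grouped_median(L, i, n, F, f):
    """Median = L + i * (n/2 - F) / f

    L: exact lower limit of the median class
    i: class interval size
    n: total number of observations
    F: cumulative frequency of the classes preceding the median class
    f: frequency of the median class
    """
    return L + i * (n / 2 - F) / f


# Values from the climate-change survey example
result = grouped_median(L=30.5, i=10, n=100, F=34, f=22)
print(round(result, 2))  # 37.77
```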
Example (mode): The ages of ten (10) persons assembled in a room are as follows:
16, 18, 18, 19, 25, 25, 25, 35, 35, and 36.
Solution: An age of 25 is the mode because it was recorded three times in the sample, more
than any other age.
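Counting how often each value occurs identifies the mode. A minimal sketch using the ages listed in the example; `statistics.multimode` (Python 3.8+) returns every value tied for the highest frequency, so it also handles data with more than one mode.

```python
from statistics import multimode

ages = [16, 18, 18, 19, 25, 25, 25, 35, 35, 36]
# multimode returns every value that ties for the highest frequency
print(multimode(ages))  # [25]
```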
The range is the difference between the largest and the smallest values in a set of data.
Consider the following scores obtained by ten (10) students participating in a mathematics
contest:
6, 10, 12, 15, 18, 18, 20, 23, 25, 28
Thus, the range is 28 − 6 = 22; the scores range from 6 to 28.
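The range computation for the contest scores, as a one-line sketch (`data_range` is an illustrative name):

```python
def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]
print(data_range(scores))  # 22
```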
Average (Mean) Deviation
This measure of spread is defined as the sum of the absolute differences (deviations) between each value
in a set of data and the mean, divided by the total number of values in the set of data:
$$MD = \frac{\sum \lvert x - \bar{x} \rvert}{n}$$
In mathematics, the term “absolute”, represented by the sign “| |”, simply means taking the
value of a number without regard to its positive or negative sign.
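A minimal sketch of the average (mean) deviation for ungrouped data. Applying it to the contest scores from the range example is only an illustration; the module does not work out the mean deviation for these scores.

```python
def mean_deviation(values):
    """Average of the absolute deviations |x - x̅| from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]
print(round(mean_deviation(scores), 2))  # 5.4
```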
The standard deviation (SD) is a measure of the spread or variation of data about the mean.
It can be thought of as the typical distance of the values from the mean.
The formula for calculating the standard deviation for ungrouped data is given by
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
(Standard Deviation) Example for ungrouped data
Example: Consider the same data used to illustrate the range: 6, 10,
12, 15, 18, 18, 20, 23, 25, 28.
Solution:
Step 1. Compute the mean.
$$\bar{x} = \frac{6 + 10 + 12 + 15 + 18 + 18 + 20 + 23 + 25 + 28}{10} = \frac{175}{10} = 17.5$$
Step 2. Subtract the mean (x̅) from each score (x), that is, compute x − x̅.
Step 3. Square each difference from Step 2, that is, compute (x − x̅)².
Step 4. Sum all the squares from Step 3: ∑(x − x̅)² = 428.5.
Step 5. Divide the sum from Step 4 by n − 1: 428.5 / 9 ≈ 47.61. The quantity n − 1 is called the degrees of
freedom; dividing by n − 1 instead of n gives a better (unbiased) estimate of the population variance from a sample.
Step 6. Take the square root of the result of Step 5: SD = √47.61 ≈ 6.90.
Interpretation of the Standard Deviation
The standard deviation allows us to reach conclusions about the scores in a distribution. If the
distribution of scores is normal, the following conclusions can be reached:
1. Approximately 68% of the scores in the sample fall within one standard deviation of the
mean.
2. Approximately 95% of the scores in the sample fall within two standard deviations of the
mean.
3. Approximately 99.7% of the scores in the sample fall within three standard deviations of the
mean.
4. In the sample, with a mean of 17.5 and a standard deviation of 6.90, about
68% of the scores will fall in the range
= (17.5 − 6.90) to (17.5 + 6.90)
= 10.60 to 24.40
5. Likewise, about 95% of the scores will fall in the range
= 17.5 − (2)(6.90) to 17.5 + (2)(6.90)
= 3.70 to 31.30
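The six steps for the sample standard deviation, followed by the one- and two-SD ranges, can be traced in a short Python sketch on the contest scores. The result is cross-checked against `statistics.stdev`, which also divides by n − 1.

```python
import math
import statistics

scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]

mean = sum(scores) / len(scores)                  # Step 1: 17.5
squared_devs = [(x - mean) ** 2 for x in scores]  # Steps 2-3
ss = sum(squared_devs)                            # Step 4: 428.5
variance = ss / (len(scores) - 1)                 # Step 5: divide by n - 1
sd = math.sqrt(variance)                          # Step 6

# statistics.stdev uses the same n - 1 (sample) formula
assert math.isclose(sd, statistics.stdev(scores))

print(round(sd, 2))                                      # 6.9
print(round(mean - sd, 2), round(mean + sd, 2))          # about 68% if normal
print(round(mean - 2 * sd, 2), round(mean + 2 * sd, 2))  # about 95% if normal
```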
Inferential Data Analysis
Inferential statistics refers to statistical measures and techniques
that allow us to use samples to make generalizations about the
population from which the samples were drawn.
Here is a list of common statistical measures for testing
significant differences and relationships between variables.
TEST OF SIGNIFICANCE OF DIFFERENCE (T-TEST)
Between Means
For independent samples (i.e., when the respondents consist of two different
groups, such as boys and girls, working and non-working mothers,
healthy and malnourished children, and the like)
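Case 1: σ1 and σ2 known, or n1 ≥ 30 and n2 ≥ 30 (when σ1 and σ2 are unknown but both samples are large,
the sample standard deviations s1 and s2 may be used in their place)
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$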
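A minimal sketch of the Case 1 z statistic from summary statistics, with a two-tailed p-value from the standard normal distribution (`statistics.NormalDist`, Python 3.8+). The means, standard deviations, and sample sizes are hypothetical, and `two_sample_z` is an illustrative name.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, hyp_diff=0.0):
    """z for the difference between two independent means (Case 1).

    sd1, sd2: population standard deviations, or sample ones when both n >= 30.
    hyp_diff: hypothesized difference mu1 - mu2 (usually 0).
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - hyp_diff) / se


# Hypothetical summary statistics, not from the module
z = two_sample_z(85, 10, 50, 80, 12, 60)
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_two_tailed, 3))
```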
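Case 2: σ1 and σ2 unknown and assumed unequal
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
df = smaller of n1 − 1 or n2 − 1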
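Case 3: σ1 = σ2 (unknown), using the pooled variance
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
df = n1 + n2 − 2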
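The two small-sample cases for independent means differ only in the standard error and the degrees of freedom. The sketch below assumes the usual textbook forms: an unpooled standard error with df = smaller of n1 − 1 and n2 − 1 for unequal variances (Case 2), and a pooled variance with df = n1 + n2 − 2 for equal variances (Case 3). Function names and data are hypothetical; the t value is compared against a t table at the returned df.

```python
import math


def t_unpooled(mean1, s1, n1, mean2, s2, n2, hyp_diff=0.0):
    """Case 2: unequal variances; conservative df = smaller of n1 - 1 and n2 - 1."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = ((mean1 - mean2) - hyp_diff) / se
    return t, min(n1 - 1, n2 - 1)


def t_pooled(mean1, s1, n1, mean2, s2, n2, hyp_diff=0.0):
    """Case 3: equal variances; pooled variance, df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - hyp_diff) / se
    return t, df


# Hypothetical summary statistics (mean, SD, n) for two groups
print(t_unpooled(20, 4, 10, 17, 5, 12))
print(t_pooled(20, 4, 10, 17, 5, 12))
```

With equal sample sizes and equal SDs the two t values coincide; only the degrees of freedom differ.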
For correlated/dependent samples (i.e., when the same set of respondents or
paired sets of respondents are involved)
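$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} \qquad (df = n - 1)$$
where d̄ is the mean of the paired differences, μd is the hypothesized mean difference, s_d is the
standard deviation of the differences, and n is the number of pairs.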
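A minimal sketch of the t-test for correlated samples: form the paired differences, then compare their mean with the hypothesized mean difference. The pre-test/post-test scores are hypothetical and `paired_t` is an illustrative name.

```python
import math
import statistics


def paired_t(before, after, hyp_mean_diff=0.0):
    """t for correlated samples: (d̄ - μd) / (s_d / √n), df = n - 1."""
    if len(before) != len(after):
        raise ValueError("samples must be paired")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample SD of the differences (n - 1)
    t = (d_bar - hyp_mean_diff) / (s_d / math.sqrt(n))
    return t, n - 1


# Hypothetical pre-test / post-test scores of the same five respondents
pre = [10, 12, 9, 11, 13]
post = [12, 14, 10, 13, 15]
t, df = paired_t(pre, post)
print(round(t, 2), df)  # 9.0 4
```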
Between Proportions and Percentages
For independent samples
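$$z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\dfrac{pq}{n_1} + \dfrac{pq}{n_2}}}$$
where p1 and p2 are the sample proportions, P1 − P2 is the hypothesized difference between the
population proportions (0 under the null hypothesis), p is the pooled proportion of the two samples,
and q = 1 − p.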
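A minimal sketch of the z-test for two independent proportions, assuming the null hypothesis P1 − P2 = 0 so that the pooled proportion p = (x1 + x2)/(n1 + n2) is used in the standard error. The counts are hypothetical and `two_proportion_z` is an illustrative name.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z for the difference between two independent proportions (H0: P1 = P2)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion
    q = 1 - p
    se = math.sqrt(p * q / n1 + p * q / n2)
    return (p1 - p2) / se


# Hypothetical: 60 of 100 in group 1 and 45 of 100 in group 2 answered "yes"
print(round(two_proportion_z(60, 100, 45, 100), 2))
```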
For correlated/dependent samples
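$$z = \frac{D - A}{\sqrt{A + D}} = \frac{p_1 - p_2}{\sqrt{\dfrac{a + d}{N}}}$$
where A and D are the frequencies in the two cells of the 2 × 2 table in which the paired responses
change, a and d are the same frequencies expressed as proportions of N, and N is the number of pairs.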
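A minimal sketch showing that the count form and the proportion form of the correlated-proportions z agree, assuming a = A/N and d = D/N and that p1 − p2 equals d − a (the unchanged cells cancel in the difference). The cell counts are hypothetical and the function name is illustrative.

```python
import math


def correlated_proportions_z(A, D, N):
    """z for two correlated proportions from the changed cells A and D of a 2 x 2 table.

    Returns the count form (D - A) / sqrt(A + D) and the proportion form
    (d - a) / sqrt((a + d) / N) with a = A/N, d = D/N; the two are equal.
    """
    z_counts = (D - A) / math.sqrt(A + D)
    a, d = A / N, D / N
    z_props = (d - a) / math.sqrt((a + d) / N)
    return z_counts, z_props


# Hypothetical: 50 respondents; 5 changed one way (A), 15 changed the other way (D)
print(correlated_proportions_z(A=5, D=15, N=50))
```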
TEST OF SIGNIFICANCE OF DIFFERENCE (ANOVA)
Analysis of Variance (ANOVA) is used when the significance of the difference among the
means of two or more groups is to be determined at one time.
One-Way Analysis of Variance
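A typical ANOVA table (k = number of groups, N = total number of observations):
Source of Variance   Degrees of Freedom   Sum of Squares    Mean Square          F-ratio    p
Between groups       k − 1                SSB               MSB = SSB/(k − 1)    MSB/MSW
Within groups        N − k                SSW               MSW = SSW/(N − k)
TOTAL                N − 1                SST = SSB + SSW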
One-way analysis of variance relies on the F-ratio to test the hypothesis
that the group means are equal, that is, that the subgroups come from the same
population. “Between groups” refers to the variation between each group
mean and the grand (overall) mean; “within groups” refers to the variation of the
individual scores about their own group mean.
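The entries of the ANOVA table can be computed directly from the group scores. This is a minimal sketch with hypothetical data; it stops at the F-ratio, which is then compared against an F table at (k − 1, N − k) degrees of freedom. If SciPy is available, `scipy.stats.f_oneway` returns the F statistic together with its p-value.

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of groups."""
    all_values = [x for g in groups for x in g]
    N, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / N
    means = [sum(g) / len(g) for g in groups]
    # Between groups: spread of each group mean around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: spread of each score around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, N - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df_between": df_between, "df_within": df_within,
        "ss_between": ss_between, "ss_within": ss_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical scores of three groups
table = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(table["F"])  # 12.0
```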
TESTS OF RELATIONSHIP
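Spearman Rank-Order Correlation or Spearman rho. This is used when the available data
are expressed in terms of ranks (ordinal variables).
$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
where D is the difference between each pair of ranks and N is the number of pairs.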
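A minimal sketch of Spearman rho from two sets of ranks, assuming no tied ranks (the formula is exact only in that case). The judges' ranks are hypothetical and `spearman_rho` is an illustrative name.

```python
def spearman_rho(ranks_x, ranks_y):
    """ρ = 1 - 6ΣD² / (N(N² - 1)); assumes ranks without ties."""
    if len(ranks_x) != len(ranks_y):
        raise ValueError("need the same number of ranks")
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to five contestants
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
```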
Chi-Square Test for Independence. This is used when data are expressed in
terms of frequencies or percentages (nominal variables).
Case 1. Multinomial
Case 2. Contingency Table
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency and E is the expected frequency in each category or cell.
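For a contingency table, the expected frequency of each cell is usually obtained as (row total × column total) / grand total, with df = (rows − 1)(columns − 1); the module does not spell this step out, so treat it as an assumption of this sketch. The 2 × 2 table is hypothetical, and the resulting χ² is compared against a chi-square table at the returned df.

```python
def chi_square_independence(table):
    """χ² = Σ (O - E)² / E for an r x c contingency table of observed counts.

    E for each cell = (row total x column total) / grand total; df = (r - 1)(c - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 table: rows = two groups, columns = agree / disagree
print(chi_square_independence([[20, 30], [30, 20]]))  # (4.0, 1)
```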
Product-Moment Coefficient of Correlation or Pearson r. This is used
when data are expressed in terms of scores, such as weights and heights or
scores in a test (ratio or interval).
Case 1. When deviations from the mean are used
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
Case 2. When raw scores on the original observations are used
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$
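The deviation form and the raw-score form of Pearson r are algebraically equivalent; the sketch below computes both on hypothetical paired scores so the agreement can be checked. Function names are illustrative.

```python
import math


def pearson_r_deviation(x, y):
    """Case 1: r from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_r_raw(x, y):
    """Case 2: r from raw scores."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical paired scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r_deviation(x, y), 4), round(pearson_r_raw(x, y), 4))
```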
A t-test for the significance of Pearson r is used to determine whether the value
of the computed coefficient of correlation is significant.
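The module does not show the formula for this test; a commonly used form is t = r√(n − 2) / √(1 − r²) with df = n − 2, which is the assumption in this minimal sketch. The r and n values are hypothetical, and the resulting t is compared against a t table at the returned df.

```python
import math


def t_for_pearson_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2), with df = n - 2."""
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2


# Hypothetical: r = 0.6 computed from 27 pairs of scores
t, df = t_for_pearson_r(0.6, 27)
print(round(t, 2), df)  # 3.75 25
```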