Statistical Treatment (Part of Module 4


Uploaded by

Ruby Liza Capate
Copyright
© All Rights Reserved

STATISTICAL TREATMENT
LESSON 3B
A Taxonomy of Statistics
Planning Data Analyses using Statistics

Every research methodology requires a data analysis plan. This
plan specifies the statistical measures to use to address
the research questions. The appropriate methods of data analysis are
determined by the type of data, the variables to be used, the number
of cases, and the distribution of the variables.
Purpose of Data Analysis Plan
The purpose of a data analysis plan is to gather useful information
to find solutions to the research questions of interest. It may be used to:
◦ Describe data sets;
◦ Determine the degree of relationship of variables;
◦ Determine differences between variables;
◦ Predict outcomes; and
◦ Compare variables.
All of the above-mentioned uses of a data analysis plan can be carried out by
using one or any combination of the following data analysis strategies:

◦ Exploratory Data Analysis. This type of data analysis is used when it is not clear what to expect from
the data. This strategy uses numerical and visual presentations such as graphs. Since the research of
interest is new, it is possible to find some inconsistencies, such as missing values, unusually small or
large values in the distribution of the data, or invalid data.
◦ Descriptive Data Analysis. This type of data analysis is used to describe, show, or summarize data in a
meaningful way, leading to a simple interpretation of the data. It does not allow conclusions to be
formulated beyond the data that have been described. The commonly used descriptive statistics are those
that analyze the distribution of data, such as frequency, percentage, measures of central tendency, and
measures of dispersion.
◦ Inferential Data Analysis. This tests hypotheses about a set of data to reach conclusions or make
generalizations beyond merely describing the data. Inferential statistics include tests of significance of
difference, such as the t-test and Analysis of Variance (ANOVA), and tests of relationship, such as the
Product-Moment Coefficient of Correlation (Pearson r), Spearman rho, linear regression, and the Chi-square test.
Quantitative Analysis of Evaluation
◦ Determining the level of measurement of the quantitative data is important before proceeding with
the analysis of data. The choice of statistical measure/s depends on the level of measurement of the
data. The following are the levels of measurement scales:
  ▪ Nominal scale
  ▪ Ordinal scale
  ▪ Interval scale
  ▪ Ratio scale
◦ Nominal Scale of measurement is used for labelling variables. It is sometimes called categorical data.
◦ Ordinal Scale of measurement assigns an order to items on the characteristic being measured. It involves
the ranking of individuals, attitudes, and characteristics.
◦ Interval Scale has equal units of measurement, thereby making it possible to interpret the order of the
scale scores and the distance between them. However, interval scales do not have a “true zero”. With
interval data, addition and subtraction are possible, but multiplication and division are not.
◦ Ratio Scale is considered the highest level of measurement. It has the characteristics of an interval scale
but it also has a true zero point. Because of this property, all statistical operations can be performed on
ratio scales. All descriptive and inferential statistics may be applied. All values can be added, subtracted,
multiplied, and divided.
Descriptive Data Analysis
1. Measures of Central Tendency. The common measures of central
tendency, sometimes called measures of location or center, include the
mean, median, and mode.
1.1 Mean. Often called the arithmetic average of a set of data, the mean is the
sum of the observed values in the distribution divided by the number of
observations. It is often used for interval or ratio data. The symbol x̅
(“x-bar”) is used to denote the arithmetic mean.
Mean

$$\text{Mean} = \frac{\text{sum of observations}}{\text{number of observations}}$$

The formula is:

$$\bar{x} = \frac{\sum x}{n}$$

where x is each observed value and n is the number of observations.

(Mean) For ungrouped data

Example: Find the mean of the measurements: 18, 26, 27, 29, 30

Solution:

$$\bar{x} = \frac{18 + 26 + 27 + 29 + 30}{5} = \frac{130}{5} = 26$$
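
The calculation above can be checked with a short Python sketch. The function name `arithmetic_mean` is an illustrative choice, not something the lesson defines.

```python
def arithmetic_mean(values):
    """Return the sum of the observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


measurements = [18, 26, 27, 29, 30]
mean_value = arithmetic_mean(measurements)
print(mean_value)  # 26.0
```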
(Mean) For grouped data

When the data are grouped into classes, the formula for grouped data is as follows:

$$\bar{x} = \sum \left( \frac{\text{frequency of each class}}{\text{total number of observations}} \times \text{class midpoint} \right)$$
1.2 Weighted Mean

$$\bar{x}_w = \frac{\sum fx}{n}$$

Where: f = frequency
x = numerical value or item in a set of data
n = number of observations in the data set (n = ∑f)
(Weighted Mean) For ungrouped data

Example: Find the mean of the heights of 50 senior high school students summarized as follows:

| Height in inches (x) | Frequency (f) | Height × frequency (fx) |
|---|---|---|
| 56 | 6 | 336 |
| 57 | 15 | 855 |
| 58 | 12 | 696 |
| 59 | 8 | 472 |
| 60 | 5 | 300 |
| 61 | 2 | 122 |
| 62 | 2 | 124 |
| TOTAL | ∑f = 50 | ∑fx = 2905 |
Solution:

$$\bar{x}_w = \frac{\sum fx}{\sum f} = \frac{2905}{50} = 58.1 \text{ inches}$$
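
As a rough check of the weighted mean, the sketch below recomputes ∑fx / ∑f from the height table. The names `weighted_mean`, `heights`, and `frequencies` are illustrative assumptions.

```python
def weighted_mean(values, frequencies):
    """Return sum(f * x) / sum(f) for paired values and frequencies."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    total_f = sum(frequencies)
    if total_f == 0:
        raise ValueError("total frequency must be greater than zero")
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    return total_fx / total_f


heights = [56, 57, 58, 59, 60, 61, 62]   # inches
frequencies = [6, 15, 12, 8, 5, 2, 2]    # number of students at each height
mean_height = weighted_mean(heights, frequencies)
print(round(mean_height, 1))  # 58.1
```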
(Weighted Mean) For grouped data
When the data are grouped into classes, the class midpoint represents the “x” in the formula. Example: Solve for the mean of the
data below.

| Class | Frequency (f) | Class Midpoint (x) | fx |
|---|---|---|---|
| 76-80 | 3 | 78 | 234 |
| 71-75 | 5 | 73 | 365 |
| 66-70 | 6 | 68 | 408 |
| 61-65 | 8 | 63 | 504 |
| 56-60 | 10 | 58 | 580 |
| 51-55 | 7 | 53 | 371 |
| 46-50 | 7 | 48 | 336 |
| 41-45 | 3 | 43 | 129 |
| 36-40 | 1 | 38 | 38 |
| TOTAL | 50 | | 2965 |
Solution:

$$\bar{x}_w = \frac{\sum fx}{n} = \frac{2965}{50} = 59.3$$
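
The grouped-data calculation can be sketched in Python as follows. The function `grouped_mean` and the tuple layout (lower limit, upper limit, frequency) are assumptions made for illustration. The midpoint of each class is computed as the average of its two limits.

```python
def grouped_mean(classes):
    """classes: iterable of (lower_limit, upper_limit, frequency) tuples."""
    total_f = 0
    total_fx = 0.0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2  # the class midpoint stands in for x
        total_f += f
        total_fx += f * midpoint
    if total_f == 0:
        raise ValueError("total frequency must be greater than zero")
    return total_fx / total_f


score_classes = [
    (76, 80, 3), (71, 75, 5), (66, 70, 6), (61, 65, 8), (56, 60, 10),
    (51, 55, 7), (46, 50, 7), (41, 45, 3), (36, 40, 1),
]
grouped_mean_value = grouped_mean(score_classes)
print(round(grouped_mean_value, 1))  # 59.3
```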
1.3 Median. The median is the midpoint of the distribution. It represents the
point in the data where 50% of the values fall below that point
and 50% fall above it. When the distribution has an even number
of observations, the median is the average of the two middle
scores. The median is the most appropriate measure of central
tendency for ordinal data.
(Median) For ungrouped data

The median may be calculated from ungrouped data by doing the
following steps.
1. Arrange the items (scores, responses, observations) from lowest
to highest.
2. Count to the middle value. For an odd number of values arranged
from lowest to highest, the median corresponds to the middle value. If the
array contains an even number of observations, the median is the
average of the two middle values.
(Median) For ungrouped data

Example: Consider this odd number of numerical values: 7, 8, 8, 9, 10, 12, 23

By inspection, the median is 9 because three of the other values (7, 8, 8) are below 9 and three (10,
12, 23) are above 9. Since n = 7 is odd, the median has rank

$$\frac{n + 1}{2} = \frac{7 + 1}{2} = 4$$

that is, the median is the 4th item, which is equal to 9.

Answer: The median is 9


(Median) For ungrouped data

Example: Consider this even number of numerical values: 12, 15, 18, 22, 30, 32
The two middle values are 18 and 22. Taking the average of the two middle numbers,
that is, (18 + 22) / 2 = 40 / 2, the median is 20.

Answer: The median is 20
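
The two steps (sort, then take the middle value or the average of the two middle values) can be sketched as below. The function name `median_ungrouped` is an illustrative assumption, and the even-sized list is taken as 12, 15, 18, 22, 30, 32, consistent with the two middle values 18 and 22 named in the example.

```python
def median_ungrouped(values):
    """Return the middle value, or the average of the two middle values."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)          # Step 1: arrange from lowest to highest
    n = len(ordered)
    middle = n // 2                   # Step 2: count to the middle
    if n % 2 == 1:
        return ordered[middle]        # odd n: the ((n + 1) / 2)th item
    return (ordered[middle - 1] + ordered[middle]) / 2  # even n: average


odd_median = median_ungrouped([7, 8, 8, 9, 10, 12, 23])
even_median = median_ungrouped([12, 15, 18, 22, 30, 32])
print(odd_median)   # 9
print(even_median)  # 20.0
```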


(Median) For grouped data

If the data are grouped into classes, the median will fall into one of the classes as the (n/2)th
value. The process involves several steps and has for its general formula the following:

$$\text{Median} = L + i \left( \frac{n/2 - F}{f} \right)$$

Where:
L = exact lower limit of the class containing the median (median class)
i = interval size
n = total number of items or observations
F = cumulative frequency in the class preceding the median class
f = frequency of the median class
(Median) For grouped data

Example: The following data show the distribution of the ages of people interviewed for a
survey on the topic of climate change.

| Class Interval (x) | Frequency (f) | Cumulative Frequency (F) |
|---|---|---|
| 11-20 | 20 | 20 |
| 21-30 | 14 | 34 |
| 31-40 | 22 | 56 |
| 41-50 | 18 | 74 |
| 51-60 | 14 | 88 |
| 61-70 | 12 | 100 |
| TOTAL | ∑f = 100 | |
Solution:
Since there are 100 values in the data set, the median is the (n/2)th or (100/2)th
item, that is, the 50th value when the values are arranged in order.

Determine in which class the 50th value falls. The first two classes have a cumulative
frequency of 34.
Another 16 values are needed to reach 50. Thus, the 50th value falls in the next class, which contains
22 values. The median class is therefore 31-40.
Thus,
L = 30.5   n = 100   F = 34   f = 22   i = 10
Substituting in the formula:

$$\text{Median} = L + i \left( \frac{n/2 - F}{f} \right) = 30.5 + 10 \left( \frac{100/2 - 34}{22} \right) = 30.5 + 10 \left( \frac{16}{22} \right) \approx 37.77$$

Answer: The median is 37.77

This means that 50%, or 50 of the 100 ages, fall below 37.77 and 50%, or 50, fall
above it.
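
A minimal sketch of the grouped-median procedure is shown below. It assumes the classes are given in ascending order as (exact lower limit, frequency) pairs and that all classes share the same interval size. The names `grouped_median` and `age_classes` are illustrative.

```python
def grouped_median(classes, interval_size):
    """classes: list of (exact_lower_limit, frequency) in ascending order."""
    n = sum(f for _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be greater than zero")
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for lower_limit, f in classes:
        if f > 0 and cumulative + f >= half:
            # This is the median class: apply L + i * (n/2 - F) / f
            return lower_limit + interval_size * (half - cumulative) / f
        cumulative += f
    raise ValueError("median class not found")


age_classes = [(10.5, 20), (20.5, 14), (30.5, 22), (40.5, 18), (50.5, 14), (60.5, 12)]
median_age = grouped_median(age_classes, interval_size=10)
print(round(median_age, 2))  # 37.77
```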
1.4 Mode. The mode is the most frequently occurring value in a set of
observations. In cases where more than one value shares the highest
frequency, the distribution is
bimodal (with two such values) or multimodal (with more than two
such values). In the case where every item occurs an equal
number of times, there is no mode. The mode is
appropriate for nominal data.
Mode

Example: The ages of ten (10) persons assembled in a room are as follows:
16, 18, 18, 19, 25, 25, 25, 35, 35, and 36.

Solution: An age of 25 is the mode because it has been recorded three times in the sample, more
than any other age.

Answer: The mode is 25
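
The rules for the mode (single mode, bimodal or multimodal, or no mode) can be sketched with the standard-library `collections.Counter`. The function name `modes` and the choice to return an empty list when there is no mode are assumptions made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    # Every distinct item occurs equally often: the lesson treats this as no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


ages = [16, 18, 18, 19, 25, 25, 25, 35, 35, 36]
print(modes(ages))             # [25]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (bimodal)
print(modes([1, 2, 3]))        # []      (no mode)
```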


2. Measures of Dispersion.
Suppose the senior high school students were asked to rate the quality of food
at the school canteen, and the average rating is found to be 3.5 using the scale 5
(excellent), 4 (very satisfactory), 3 (satisfactory), 2 (fair), and 1 (poor).
How close are the ratings given by the students? Do their ratings cluster around
the middle point of 3, or are their ratings spread or dispersed, with some students
giving ratings of 1 and the rest giving ratings of 5?
The extent of the spread, or dispersion, of the data is described by a group of
measures called measures of dispersion, also called measures of variability. The
measures to be considered are the range, the average or mean deviation, the standard
deviation, and the variance.
Range

The range is the difference between the largest and the smallest values in a set of data.
Consider the following scores obtained by ten (10) students participating in a mathematics
contest:
6, 10, 12, 15, 18, 18, 20, 23, 25, 28
The scores range from 6 to 28. Thus, the range is 28 − 6 = 22.
Average (Mean) Deviation

This measure of spread is defined as the sum of the absolute differences, or deviations, between each
value in a set of data and the mean, divided by the total number of values in the set of data.
In mathematics, the term “absolute”, represented by the sign “| |”, simply means taking the
value of a number without regard to its positive or negative sign.
(Average [mean] deviation) For ungrouped data

The formula based on the definition is:

$$AD = \frac{\sum |x - \bar{x}|}{n}$$

Example: Consider a set of values which consists of 20, 25, 35, 40, 45. Find the average deviation.
Solving for the mean,

$$\bar{x} = \frac{20 + 25 + 35 + 40 + 45}{5} = \frac{165}{5} = 33$$

$$AD = \frac{|20 - 33| + |25 - 33| + |35 - 33| + |40 - 33| + |45 - 33|}{5} = \frac{13 + 8 + 2 + 7 + 12}{5} = \frac{42}{5} = 8.4$$

Thus, on average, each value is 8.4 units from the mean.
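
The range and the average deviation can both be checked with a few lines of Python. The function names `value_range` and `average_deviation` are illustrative assumptions. The first list is the contest scores used for the range, and the second is the data set used for the average deviation.

```python
def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


def average_deviation(values):
    """Return the mean of the absolute deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


contest_scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]
score_range = value_range(contest_scores)
ad = average_deviation([20, 25, 35, 40, 45])
print(score_range)   # 22
print(round(ad, 1))  # 8.4
```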
(Standard Deviation) For Ungrouped Data

The standard deviation (SD) is a measure of the spread or variation of the data about the mean.
It can be read as the typical distance of a value from the mean.
The formula for calculating the standard deviation for ungrouped data is given by

$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
(Standard Deviation) For ungrouped data

Example: Let us consider the same data used in the illustration of the range. The values are 6, 10,
12, 15, 18, 18, 20, 23, 25, 28.
Solution:
Step 1. Compute the mean.

$$\bar{x} = \frac{6 + 10 + 12 + 15 + 18 + 18 + 20 + 23 + 25 + 28}{10} = \frac{175}{10} = 17.5$$

Step 2. Subtract the mean (x̅) from each score (x), or x − x̅.
Step 3. Square each difference from Step 2, or (x − x̅)².
Step 4. Sum all the squares from Step 3. For these scores the sum is 428.5.
Step 5. Divide the number in Step 4 by n − 1. The quantity n − 1 is called the degrees of freedom, a
statistical concept that produces a more accurate estimate of the spread in the population.
Step 6. Substitute the values in the formula and take the square root: SD = √(428.5 / 9) ≈ 6.90
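
The six steps can be traced in a short Python sketch. The function name `sample_standard_deviation` is an illustrative assumption. Run on the ten contest scores, it gives a sum of squared deviations of 428.5 and a standard deviation of about 6.90.

```python
import math


def sample_standard_deviation(values):
    """Follow the steps: mean, deviations, squares, sum, divide by n - 1, root."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = sum(values) / n                           # Step 1
    deviations = [x - mean for x in values]          # Step 2
    squared = [d ** 2 for d in deviations]           # Step 3
    sum_of_squares = sum(squared)                    # Step 4
    variance = sum_of_squares / (n - 1)              # Step 5
    return math.sqrt(variance)                       # Step 6


contest_scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]
sd = sample_standard_deviation(contest_scores)
print(round(sd, 2))  # 6.9
```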
Interpretation of the Standard Deviation

The standard deviation allows conclusions to be reached about the scores in a distribution. The
following conclusions can be reached if the distribution of scores is normal.
1. Approximately 68% of the scores in the sample fall within one standard deviation of the
mean.
2. Approximately 95% of the scores in the sample fall within two standard deviations of the
mean.
3. Approximately 99.7% of the scores in the sample fall within three standard deviations of the
mean.
4. In the sample with a mean of 17.5 and a standard deviation of about 6.90 (SD = √(428.5 / 9)),
about 68% of the scores are expected to fall in the range
= (17.5 − 6.90) to (17.5 + 6.90)
= 10.6 to 24.4
5. Likewise, about 95% of the scores are expected to fall in the range
= 17.5 − (2)(6.90) to 17.5 + (2)(6.90)
= 3.7 to 31.3
Inferential Data Analysis
Inferential statistics refers to statistical measures and techniques
that allow us to use samples to make generalizations about the
population from which the samples were drawn.
Here is a list of common statistical measures used to test for
significant differences and relationships between variables.
TEST OF SIGNIFICANT DIFFERENCE (T-TEST)
Between Means
For independent samples (i.e., when the respondents consist of two different
groups, such as boys and girls, working mothers and non-working mothers,
healthy and malnourished children, and the like)
Case 1: σ1 and σ2 known, or n1 ≥ 30 and n2 ≥ 30

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

With large samples, the sample standard deviations s1 and s2 may stand in for σ1 and σ2.
Case 2: σ1 ≠ σ2, and n1 < 30 and n2 < 30

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

df = smaller of n1 − 1 or n2 − 1
Case 3: σ1 = σ2, and n1 < 30 and n2 < 30

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$$

Where

$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$

df = n1 + n2 − 2
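
The small-sample formulas for independent samples can be sketched from summary statistics alone. The function names and the numbers below (means of 85 and 81, standard deviations of 4, ten cases per group) are hypothetical values chosen only to make the arithmetic easy to follow. They do not come from the lesson.

```python
import math


def t_unequal_variances(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Case with unequal variances: separate variances, df = smaller of n1-1, n2-1."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = ((mean1 - mean2) - hypothesized_diff) / standard_error
    return t, min(n1 - 1, n2 - 1)


def t_pooled(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Case with equal variances: pooled variance, df = n1 + n2 - 2."""
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance / n1 + pooled_variance / n2)
    t = ((mean1 - mean2) - hypothesized_diff) / standard_error
    return t, n1 + n2 - 2


# Hypothetical summary statistics for two independent groups.
t_pool, df_pool = t_pooled(85, 4, 10, 81, 4, 10)
t_sep, df_sep = t_unequal_variances(85, 4, 10, 81, 4, 10)
print(round(t_pool, 3), df_pool)  # 2.236 18
print(round(t_sep, 3), df_sep)    # 2.236 9
```

With equal group sizes and equal standard deviations the two statistics coincide. Only the degrees of freedom differ. The computed t is then compared with the critical value from a t table at the chosen significance level.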
For correlated/dependent samples (i.e., when the same set of respondents or
paired sets of respondents are involved)

$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} \quad (df = n - 1)$$
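
For paired data, the statistic is built from the differences within each pair. The sketch below assumes d = after − before and uses hypothetical before/after scores for five respondents. The function name `paired_t` is an illustrative choice.

```python
import math


def paired_t(before, after, mu_d=0.0):
    """Return (t, df) for the mean of the paired differences after - before."""
    if len(before) != len(after):
        raise ValueError("both lists must have the same length")
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    mean_d = sum(differences) / n
    # Sample standard deviation of the differences (divisor n - 1).
    s_d = math.sqrt(sum((d - mean_d) ** 2 for d in differences) / (n - 1))
    t = (mean_d - mu_d) / (s_d / math.sqrt(n))
    return t, n - 1


# Hypothetical scores of the same five respondents, measured twice.
before_scores = [10, 12, 9, 11, 13]
after_scores = [12, 14, 10, 13, 16]
t_paired, df_paired = paired_t(before_scores, after_scores)
print(round(t_paired, 3), df_paired)  # 6.325 4
```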
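Between Proportions and Percentages
For independent samples

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{pq}{n_1} + \dfrac{pq}{n_2}}}$$

Here p̂1 and p̂2 are the sample proportions and p1 − p2 is the hypothesized difference. The p in the
denominator is usually taken as the pooled sample proportion, with q = 1 − p.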
For correlated/dependent samples

$$z = \frac{D - A}{\sqrt{A + D}} = \frac{p_1 - p_2}{\sqrt{\dfrac{a + d}{N}}}$$
Analysis of Variance (ANOVA) is used when the significance of the difference among the
means of two or more groups is to be determined at one time.
One-Way Analysis of Variance
A typical ANOVA table:

| Source of Variance | Degrees of Freedom | Sum of Squares | Mean Square | F-ratio | p |
|---|---|---|---|---|---|
| Between groups | | | | | |
| Within groups | | | | | |
| TOTAL | | | | | |
One-Way Analysis of Variance relies on the F-ratio to test the hypothesis
that the group means are equal; that is, that the subgroups are from the same
population. “Between groups” refers to the variation between each group
mean and the grand or overall mean.
TESTS OF RELATIONSHIP
Spearman Rank-Order Correlation or Spearman rho. This is used when the data
available are expressed in terms of ranks (ordinal variables).

$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

Here D is the difference between the two ranks of each item and N is the number of ranked pairs.
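
The Spearman formula is simple enough to compute by hand, and the sketch below does the same thing in Python. It assumes the data are already ranked and that there are no tied ranks. The two rank lists are hypothetical.

```python
def spearman_rho(ranks_x, ranks_y):
    """Return 1 - 6 * sum(D^2) / (N * (N^2 - 1)) for two lists of ranks."""
    if len(ranks_x) != len(ranks_y):
        raise ValueError("both rank lists must have the same length")
    n = len(ranks_x)
    if n < 2:
        raise ValueError("at least two ranked pairs are needed")
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks given to five items by two judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_1, judge_2)
print(round(rho, 2))  # 0.8
```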
Chi-Square Test for Independence. This is used when data are expressed in
terms of frequencies or percentages (nominal variables).
Case 1. Multinomial
Case 2. Contingency Table

$$\chi^2 = \sum \frac{(O - E)^2}{E} \quad [df = (r - 1)(c - 1)]$$

Where O is the observed frequency and E is the expected frequency:

$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
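
For the contingency-table case, the expected frequencies and the chi-square statistic can be sketched as follows. The 2 × 2 table of observed counts is hypothetical, and the sketch assumes every row and column total is greater than zero.

```python
def chi_square_independence(observed):
    """observed: list of rows of counts. Return (chi_square, df)."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # E = (row total)(column total) / (grand total)
            e = row_totals[i] * column_totals[j] / grand_total
            chi_square += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_square, df


# Hypothetical counts: rows are two groups, columns are two response categories.
table = [[30, 20], [20, 30]]
chi2, chi2_df = chi_square_independence(table)
print(chi2, chi2_df)  # 4.0 1
```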
Product-Moment Coefficient of Correlation or Pearson r. This is used
when data are expressed in terms of scores such as weights and heights or
scores in a test (ratio or interval).
Case 1. When deviations from the mean are used

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$$
Case 2. When raw scores on the original observations are used

$$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{\left[n \sum x^2 - (\sum x)^2\right]\left[n \sum y^2 - (\sum y)^2\right]}}$$
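
The deviation form and the raw-score form of Pearson r are algebraically the same, and a quick sketch can confirm that they agree. The paired scores below are hypothetical, and the function names are illustrative.

```python
import math


def pearson_r_deviations(xs, ys):
    """Pearson r computed from deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    sum_dy2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


def pearson_r_raw(xs, ys):
    """Pearson r computed from raw-score sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical paired scores.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]
r_dev = pearson_r_deviations(x_scores, y_scores)
r_raw = pearson_r_raw(x_scores, y_scores)
print(round(r_dev, 4), round(r_raw, 4))  # 0.7746 0.7746
```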
T-test to test the Significance of Pearson r. This is used to determine whether the value
of the computed coefficient of correlation is significant.

$$t = r \sqrt{\frac{n - 2}{1 - r^2}}$$

Where: r = correlation coefficient; n = number of samples

Chi-square, multinomial case:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \quad (df = k - 1)$$
Coefficient of Determination. The coefficient of
determination (r²) can also be used to indicate what proportion of the total
variation in the dependent variable is explained by the linear relationship
with the independent variable. It can be multiplied by 100 to convert the
coefficient of determination to a percent.
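
A small sketch of the significance test for r and of the coefficient of determination follows. The values r = 0.6 and n = 27 are hypothetical, and the degrees of freedom n − 2 used here are the standard choice for this test rather than something stated in the lesson.

```python
import math


def t_for_pearson_r(r, n):
    """Return (t, df) for testing whether a Pearson r differs from zero."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return t, n - 2


def percent_variation_explained(r):
    """Coefficient of determination r^2 expressed as a percent."""
    return r ** 2 * 100


t_r, df_r = t_for_pearson_r(0.6, 27)
explained = percent_variation_explained(0.6)
print(round(t_r, 2), df_r)  # 3.75 25
print(round(explained, 1))  # 36.0
```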
Percentile Rank
◦ The formula for calculating percentile ranks is relatively simple and straightforward. Knowing only the
distribution of scores, you can easily calculate the percentile rank for any of the scores in the
distribution. The percentile rank formula is:

$$R = \frac{P}{100} (N + 1)$$

R represents the rank order of the score, P the percentile rank, and N the number of scores in the
distribution.
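
As a quick illustration of the formula, the sketch below finds the rank position that corresponds to a given percentile. The inputs (the 75th percentile in a distribution of 19 scores) are hypothetical, and the function name is an illustrative choice.

```python
def rank_for_percentile(percentile, number_of_scores):
    """Return R = (P / 100) * (N + 1), the rank order for percentile P."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    return percentile / 100 * (number_of_scores + 1)


rank = rank_for_percentile(75, 19)
print(rank)  # 15.0
```

Here the 75th percentile corresponds to the 15th score when the 19 scores are arranged from lowest to highest.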
