Statistics
QUALITATIVE DATA
measurements for which there is no natural
numerical scale but which consist of attributes,
labels, or other nonnumerical characteristics
QUANTITATIVE DATA
numerical measurements that arise from a
natural numerical scale
CLASSIFICATION OF
QUANTITATIVE DATA
DISCRETE DATA
data that can be counted; they take only
distinct, separate values (such as whole numbers)
CONTINUOUS DATA
data that can assume an infinite number of
values between any two specific values
often include fractions and decimals
EXERCISES: Identify the following
measures as qualitative or quantitative.
1. The genders of the first 40 newborns in a hospital one
year.
2. The natural hair color of 20 randomly selected fashion
models.
3. The ages of 20 randomly selected fashion models.
4. The fuel economy in miles per gallon of 20 new cars
purchased last month.
5. The political affiliation of 500 randomly selected voters.
Collection of Data
Simple Mean:
To compute the mean of ungrouped
data, we use the formula:
$\bar{X} = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$
Example 1: Mean for Ungrouped Data
15, 22, 22, 22, 25, 25, 27, 28, 30,
33, 34, 39, 43, 43, 44, 48, 49
Since there is an odd number of
values (17) in the data, we take the
middlemost value, the 9th, which is
30, as the median of the data set.
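As a quick check of the simple mean formula and the median above, here is a minimal Python sketch that uses the 17 values listed in the example. The variable names are illustrative only and are not part of the original slides.

```python
from statistics import median

# The 17 values listed in the example, already in ascending order.
data = [15, 22, 22, 22, 25, 25, 27, 28, 30,
        33, 34, 39, 43, 43, 44, 48, 49]

n = len(data)                 # number of observations, N
simple_mean = sum(data) / n   # (x1 + x2 + ... + xN) / N
middle_value = median(data)   # middlemost value, because N is odd

print(f"N = {n}")
print(f"mean = {simple_mean:.2f}")
print(f"median = {middle_value}")
```

Because N is odd, `median` returns the single middle value instead of averaging two values.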
Example 3: Mode
Solution:
First, arrange the data set in
ascending or descending order:
15, 22, 22, 22, 25, 25, 27, 28, 30, 33, 34, 39, 43,
43, 44, 48, 49
Next, determine the number that appears
the greatest number of times.
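The two steps above can be sketched in Python by counting how often each value occurs. This is only an illustration using the data set listed in the example; a list of modes is returned because a data set may have more than one.

```python
from collections import Counter

# Data set from the example, arranged in ascending order.
data = [15, 22, 22, 22, 25, 25, 27, 28, 30,
        33, 34, 39, 43, 43, 44, 48, 49]

counts = Counter(data)              # value -> number of appearances
highest = max(counts.values())      # the largest number of appearances
modes = sorted(value for value, count in counts.items() if count == highest)

print(f"mode(s) = {modes}, each appearing {highest} times")
```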
$WM = \frac{\sum wx}{\sum w}$
$= \frac{(3 \times 10) + (5 \times 5) + (3 \times 15) + (4 \times 25)}{3 + 5 + 3 + 4}$
$= \frac{30 + 25 + 45 + 100}{15}$
$= \frac{200}{15}$
$= 13.33$
Thus, the average price of each fruit
bought by Xandra is 13.33 pesos.
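The weighted mean above can be reproduced with a short Python sketch. The pairing of weights and values is read off the substitution shown; treating each weight as the quantity bought and each value as the price per piece is an assumption, since the problem statement is not included here.

```python
# Weights (w) and values (x) as they appear in the substitution above.
# Assumption: w is the quantity bought and x is the price per piece in pesos.
weights = [3, 5, 3, 4]
values = [10, 5, 15, 25]

weighted_sum = sum(w * x for w, x in zip(weights, values))  # sum of w*x
total_weight = sum(weights)                                  # sum of w
weighted_mean = weighted_sum / total_weight

print(f"WM = {weighted_sum} / {total_weight} = {weighted_mean:.2f}")
```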
Grouped Data:
Class Mark or Midpoint Method:
In this method, the class mark of
each interval must be known; it is
then multiplied by the
corresponding frequency of each
class interval.
The formula for the mean using this method
is:
$\bar{X} = \frac{\sum cf \cdot x}{n}$
Where: cf = frequency
x = class mark
n = total number of
observations
Examples:
Solution:
First, get the midpoint or class mark of
each class interval.
Second, multiply the frequency of
each class by the corresponding
midpoint or class mark.
Last, get the sum of the products.
Table
cl        cf      x      cf·x
75-79      5     77      385
70-74      7     72      504
65-69      8     67      536
60-64     10     62      620
55-59      7     57      399
50-54      9     52      468
45-49      4     47      188
        n = 50          Σ cf·x = 3,100
From the values in the table, we can now compute
the value of the mean by substituting the computed values:
$\bar{X} = \frac{\sum cf \cdot x}{n}$
$= \frac{3{,}100}{50}$
$= 62$
Therefore, the mean of the data is 62.
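The class mark method can be sketched in Python as follows. The class limits and frequencies are taken from the table above; the tuple layout and variable names are illustrative assumptions, not part of the original material.

```python
# (lower limit, upper limit, frequency) for each class interval in the table.
# The last interval is taken as 45-49, which is what its class mark of 47 implies.
classes = [
    (75, 79, 5), (70, 74, 7), (65, 69, 8), (60, 64, 10),
    (55, 59, 7), (50, 54, 9), (45, 49, 4),
]

n = sum(cf for _, _, cf in classes)      # total number of observations
sum_cfx = 0.0
for lower, upper, cf in classes:
    class_mark = (lower + upper) / 2     # midpoint of the interval
    sum_cfx += cf * class_mark           # frequency times class mark

grouped_mean = sum_cfx / n
print(f"n = {n}, sum of cf*x = {sum_cfx:.0f}, mean = {grouped_mean:.0f}")
```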
Grouped Data :
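The formula for the median of grouped data:
$\text{Median} = LB_{MC} + \left( \frac{\frac{N}{2} - {<}CF_b}{cf_{MC}} \right) i$
Where: LB_MC = lower boundary of the median class
N = total number of observations
<CF_b = < CF (less-than cumulative frequency) before the median class
cf_MC = frequency of the median class
i = size of the class interval
Solution:
a. Compute the < CF of the data.
Time (in seconds)    Frequency    < CF
51-55                    2           2
56-60                    7           9
61-65                    8          17
66-70                    4          21
                    N = 21
Therefore:
The median class is the interval 61-65, the first class whose < CF (17) reaches N/2 = 10.5.
d. Look at the < CF corresponding to the median
class. Then get the < CF before the median class.
Substituting into the formula:
$\text{Median} = 60.5 + \left( \frac{\frac{21}{2} - 9}{8} \right) 5$
$= 60.5 + \left( \frac{10.5 - 9}{8} \right) 5$
$= 60.5 + \left( \frac{1.5}{8} \right) 5$
$= 60.5 + (0.1875)(5)$
$= 60.5 + 0.9375$
$= 61.4375$ or $61.44$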
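The grouped-median computation can be sketched in Python as below. The function name `grouped_median` and the tuple layout are hypothetical. The sketch assumes whole-number class limits, so the lower boundary is the lower limit minus 0.5, and it assumes the classes are listed in ascending order.

```python
def grouped_median(classes):
    """Median of grouped data from (lower limit, upper limit, frequency) tuples."""
    n = sum(cf for _, _, cf in classes)
    half = n / 2
    cum_before = 0                          # < CF before the current class
    for lower, upper, cf in classes:
        if cf > 0 and cum_before + cf >= half:
            lower_boundary = lower - 0.5    # assumes whole-number class limits
            width = upper - lower + 1       # size of the class interval
            return lower_boundary + ((half - cum_before) / cf) * width
        cum_before += cf
    raise ValueError("classes must contain at least one observation")


# Time (in seconds) data from the table, in ascending order.
times = [(51, 55, 2), (56, 60, 7), (61, 65, 8), (66, 70, 4)]
print(f"median = {grouped_median(times):.4f}")
```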
Measures of Dispersion
Measure of Kurtosis
$K = \frac{\sum (x - \bar{x})^4}{n s^4}$
i. When K > 3, the distribution is leptokurtic.
ii. When K = 3, the distribution is mesokurtic.
iii. When K < 3, the distribution is platykurtic.
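A minimal Python sketch of the kurtosis measure and its three-way classification follows. Two points are assumptions: the slides do not say whether s is the population or the sample standard deviation, so the population form (dividing by n) is used here, and the data values are hypothetical.

```python
import math


def kurtosis(values):
    """K = sum((x - mean)^4) / (n * s^4), with s taken as the population SD."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n   # s^2 (assumed: divide by n)
    fourth_power_sum = sum((x - mean) ** 4 for x in values)
    return fourth_power_sum / (n * variance ** 2)         # s^4 = (s^2)^2


def classify(k):
    """Apply the three rules for K relative to 3."""
    if math.isclose(k, 3.0):
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"


sample = [1, 2, 3, 4, 5]   # hypothetical data for illustration
k = kurtosis(sample)
print(f"K = {k:.2f} -> {classify(k)}")
```

The comparison with 3 uses `math.isclose` because a computed K will rarely equal 3 exactly in floating-point arithmetic.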
• A normal distribution with a mean of 0 and a standard
deviation of 1 is called the standard normal distribution.
• The z-score measures how many standard deviations an
observed value is above or below the mean.
• The sample z-score is given by the formula
$z = \frac{x - \bar{x}}{s}$
• The standard score is useful when we want to compare two
or more observed values from different data sets.
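To illustrate how standard scores allow values from different data sets to be compared, here is a small Python sketch. Both score lists are hypothetical, and `z_score` is an illustrative helper that uses the sample standard deviation from the standard-library `statistics` module.

```python
from statistics import mean, stdev


def z_score(x, data):
    """Sample z-score: (x - sample mean) / sample standard deviation."""
    return (x - mean(data)) / stdev(data)   # stdev divides by n - 1


# Hypothetical scores in two subjects; the same raw score of 90 is compared.
math_scores = [70, 75, 80, 85, 90]
english_scores = [60, 70, 80, 90, 100]

z_math = z_score(90, math_scores)
z_english = z_score(90, english_scores)

print(f"z in math = {z_math:.2f}, z in English = {z_english:.2f}")
```

Both lists have the same mean, but the math scores are less spread out, so the same raw score of 90 lies further above the mean in standard-deviation units.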
Area under the Standard Normal Curve
Given                                               Steps
Between zero and any number                         Look up the area in the table.
Between two positives, or between two negatives     Look up both areas in the table and subtract the smaller from the larger.
Between a negative and a positive                   Look up both areas in the table and add them together.
Less than a negative, or greater than a positive    Look up the area in the table and subtract from 0.5000.
Greater than a negative, or less than a positive    Look up the area in the table and add to 0.5000.
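The table rules can be sketched in Python. Instead of a printed table, the helper `table_area` computes the area between 0 and z from the error function `math.erf`; that substitution and all function names are assumptions made for illustration.

```python
import math


def table_area(z):
    """Area under the standard normal curve between 0 and z (the table entry)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def area_between(a, b):
    """Area between two z values."""
    if a * b >= 0:
        # Same side of zero (or one end is zero): subtract smaller from larger.
        return abs(table_area(a) - table_area(b))
    # Opposite sides of zero: add the two areas.
    return table_area(a) + table_area(b)


def area_less_than(z):
    """Less than a negative: 0.5 - area. Less than a positive: 0.5 + area."""
    return 0.5 - table_area(z) if z < 0 else 0.5 + table_area(z)


def area_greater_than(z):
    """Greater than a positive: 0.5 - area. Greater than a negative: 0.5 + area."""
    return 0.5 - table_area(z) if z > 0 else 0.5 + table_area(z)


print(f"between -1 and 1: {area_between(-1, 1):.4f}")
print(f"less than -1.96:  {area_less_than(-1.96):.4f}")
```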
Test of Hypothesis
• The level of significance, denoted by α, is the maximum
probability of committing a Type I error that the
researcher is willing to accept.
• Very frequently used are the .05 and .01 levels of
significance.
• Reject H0 if the value of the test statistic falls in the
region of rejection (that is, the test statistic is more extreme
than the critical value).
• Reject H0 if the p-value is less than or equal to the level
of significance.
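The two rejection rules can be expressed as a short Python sketch. The function names and the numbers used are hypothetical, and the critical-value rule is written for a right-tailed test only, which is an assumption.

```python
def decide_by_p_value(p_value, alpha=0.05):
    """Reject H0 when the p-value is less than or equal to alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"


def decide_by_critical_value(test_statistic, critical_value):
    """Right-tailed test (assumed): reject H0 when the statistic exceeds the critical value."""
    return "reject H0" if test_statistic > critical_value else "fail to reject H0"


# Hypothetical p-value judged at the two commonly used levels of significance.
print(decide_by_p_value(0.03, alpha=0.05))
print(decide_by_p_value(0.03, alpha=0.01))
```

The same p-value can lead to different decisions at the .05 and .01 levels, which is why the level of significance should be fixed before the test is run.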
Test Statistic
FREQUENTLY USED INFERENTIAL STATISTICAL TOOLS
PARAMETRIC
LEVEL OF MEASUREMENT: INTERVAL/RATIO
  Single Sample: t test for single sample; Z test
  Two Related Samples: Paired t test
  Two Independent Samples: t test for independent samples
  More than Two Related Samples: ANOVA for repeated measures
  More than Two Independent Samples: ANOVA F-test
  CORRELATIONAL MEASURES: Pearson r
NON-PARAMETRIC
LEVEL OF MEASUREMENT: ORDINAL
  Single Sample: Kolmogorov-Smirnov one-sample test
  Two Related Samples: Sign test; Wilcoxon matched-pairs signed-ranks test
  Two Independent Samples: Mann-Whitney U test; Wald-Wolfowitz runs test
  More than Two Related Samples: Friedman Rank Test
  More than Two Independent Samples: Kruskal-Wallis H Test
  CORRELATIONAL MEASURES: Spearman rank order correlation
LEVEL OF MEASUREMENT: NOMINAL
  Single Sample: Chi-square one-sample test
  Two Related Samples: McNemar test
  Two Independent Samples: Chi-square test for independent samples with two subclasses
  More than Two Related Samples: (none listed)
  More than Two Independent Samples: Chi-square test for more than two subclasses
  CORRELATIONAL MEASURES: Phi Coefficient; Yule's Q
Parametric Test