Lesson 5 - Quantitative Analysis and Interpretation of Data

Uploaded by

ojs99784

One does not need to be a statistical wizard to grasp the basic mathematical concepts needed to understand major measurement issues.
Statistics

plural sense: a set of numerical data

singular sense: the branch of science which deals with the

• collection
• presentation
• analysis
• interpretation

of data
Population vs. Sample

Population: a collection of all the elements under consideration in any statistical study

Sample: a part (or subset) of the population from which information is collected
Parameter vs. Statistic

Parameter: a numerical characteristic of the population

Statistic: a numerical characteristic of the sample
Areas of Statistics

Descriptive Statistics: comprises those methods concerned with collecting, describing, and analyzing a set of data without drawing conclusions or inferences about a larger group

Inferential Statistics: comprises those methods concerned with the analysis of sample data leading to predictions or inferences about the population
Data

refers to the information collected, organized, analyzed, and interpreted by researchers

Classification of Data

Qualitative: data that have labels or names assigned to their respective categories
Examples:
Color - red, blue, yellow, green
Sex - male, female

Quantitative: any attribute that we measure in numbers
Examples:
weight - 160 lbs, 25 kg, 77 mg, etc.
height - 34 in., 5 cm, 5 ft. 6 in., etc.
Raw Data vs. Array

Raw Data: data in its original form

Array: data arranged either from highest to lowest or from lowest to highest

Example:

Upon examining the monthly billing records of a book company, the auditor takes a sample of 20 of its unpaid accounts. The amounts (in thousand pesos) owed the company were
Measurement

• Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors.
Scales of Measurement
• Nominal Scales:
  • qualitative system for categorizing objects or people
  • numbers or symbols are used simply to classify an object, person, or characteristic into categories
  • the categories must be distinct, non-overlapping, and exhaustive
  • weakest level of measurement
  • Examples: Gender - Female = 1, Male = 2; Eye Color - Brown = 1, Blue = 2, Green = 3
• Ordinal Scales:
  • allow you to rank people or objects according to the quantity of a characteristic
  • contain the properties of the nominal level, but the numbers assigned to categories of any variable may be ranked or ordered in some low-to-high manner
  • Example: Graduation Class Rank - 1 = Valedictorian, 2 = Salutatorian, 3 = 3rd Rank, etc.
• Interval Scales:
  • contain the properties of the ordinal level, but the distances between any two numbers on the scale are of known sizes
  • characterized by a common and constant unit of measurement
  • units of measurement are arbitrary
  • the number zero does not imply the absence of the characteristic under consideration (thus, the zero point is arbitrary)
  • Examples: IQs, GRE scores
• Ratio Scales:
  • contain the properties of the interval level, but have a true zero point; that is, the number zero indicates the absence of the characteristic under consideration
  • strongest level of measurement
  • Examples: height in meters, feet, etc.; weight in kilograms, pounds, etc.
Why “Scale” matters
• There is a hierarchy among the scales, with nominal scales being the least sophisticated and providing the least information and ratio scales being the most sophisticated and providing the most information.

• Interval and ratio level data allow the use of the more powerful parametric statistical procedures.
Distributions
• Distribution: a set of scores.
• Raw Score Distributions
• Frequency Distributions
Ungrouped Frequency Distribution
Grouped Frequency Distribution

• Frequency Graphs
Frequency Distribution Table

refers to the tabular arrangement of data by classes or categories together with the number of observations falling within each class

1. Single-value grouping
• a form of frequency distribution where the
distinct values are used as classes
Example:

The following data represent the number of school-age children from a sample of 30 families in a certain residential area.

0 0 3 2 0 2

0 1 4 4 1 1

0 0 3 3 0 0

2 1 1 0 2 0

2 0 0 2 1 2
Frequency Distribution for Number of School-Age Children

Number of School-Age Children   Number of Families   Relative Frequency
0                               12                   0.40
1                               6                    0.20
2                               7                    0.23
3                               3                    0.10
4                               2                    0.07

Figure 3.2. Single-Value Frequency Distribution for the School-Age Data
Remark: Single-value grouping is commonly used for
qualitative type of data.
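
The single-value table above can be reproduced with a short sketch. The function name `single_value_table` is illustrative, not from the slides; the data are the 30 family counts listed in the example.

```python
from collections import Counter

# Number of school-age children in each of the 30 sampled families
children = [
    0, 0, 3, 2, 0, 2,
    0, 1, 4, 4, 1, 1,
    0, 0, 3, 3, 0, 0,
    2, 1, 1, 0, 2, 0,
    2, 0, 0, 2, 1, 2,
]

def single_value_table(data):
    """Return (value, frequency, relative frequency) rows, one per distinct value."""
    counts = Counter(data)
    n = len(data)
    return [(v, counts[v], round(counts[v] / n, 2)) for v in sorted(counts)]

table = single_value_table(children)
for value, freq, rel in table:
    print(f"{value:>5} {freq:>8} {rel:>8.2f}")
```

Each distinct value acts as its own class, so no class limits or boundaries are needed.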
Copyright © Allyn & Bacon 2006
2. Grouping by class intervals

Definitions:

Class interval: the numbers defining a class

Class limits: the smallest and the largest values that can fall in a given class

Class boundaries: numbers that are halfway between the upper limit of a class and the lower limit of the next class
Class size: length of the class interval; computed by taking the difference between two successive upper (or lower) class limits or class boundaries

Class mark: midpoint of an interval; computed by taking the average of the lower and upper class limits of a given class interval

Relative frequency: obtained by dividing the class frequency by the total number of observations

Relative percentage: obtained by multiplying the relative frequency by 100%
Steps in Constructing a Frequency Distribution Table

1. Determine an adequate number of classes (K).

• not too many, not too few


• usually between 6 and 16
• classes should be non-overlapping

2. Determine the range (R).

R = highest – lowest

3. Compute the ratio of R and K (C*).

C* = R/K
4. Determine the class size (C) by rounding-off C* to a number that
is easy to work with.

5. List the class intervals.

6. Tally the frequency for each class.

7. Sum the frequency column and check against the total number
of observations.

Remark: No hard-and-fast rule can be given as to the number of classes into which the frequency distribution should be divided. This is done somewhat arbitrarily, although we are guided by the size of our sample. Usually, we choose between 5 and 20. The smaller the number of data available, the smaller is our choice for the number of classes.
Example:

The following are the scores of 4th year high school students in a certain achievement test in Mathematics:
11 14 14 14 16 17 20 24
25 25 28 30 30 31 31 33
34 34 35 35 37 37 37 38
39 41 41 42 44 44 44 45
47 47 47 47 51 53 53 54
54 55 55 56 56 56 57 57
58 58 58 58 59 60 60 60
61 62 62 62 65 66 66 66
66 67 68 68 74 75 76 76
81 87 92 92 97

Let K=9
R = 97-11 = 86
C* = 86/9 = 9.56
C = 10
Frequency Distribution Table for the achievement test scores of 4th year high school students

Scores    Class Boundaries   Class Mark   Number of Students   Percentage
10 – 19   9.5 – 19.5         14.5         6                    7.79
20 – 29   19.5 – 29.5        24.5         5                    6.49
30 – 39   29.5 – 39.5        34.5         14                   18.18
40 – 49   39.5 – 49.5        44.5         11                   14.29
50 – 59   49.5 – 59.5        54.5         17                   22.08
60 – 69   59.5 – 69.5        64.5         15                   19.48
70 – 79   69.5 – 79.5        74.5         4                    5.19
80 – 89   79.5 – 89.5        84.5         2                    2.60
90 – 99   89.5 – 99.5        94.5         3                    3.90
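
The construction steps can be sketched in code as follows. Two choices here are assumptions, not rules stated in the slides: C* is rounded up to the next whole number, and the first lower class limit is the largest multiple of C that does not exceed the lowest score. The sketch also assumes whole-number data.

```python
import math

scores = [
    11, 14, 14, 14, 16, 17, 20, 24, 25, 25, 28, 30, 30, 31, 31, 33,
    34, 34, 35, 35, 37, 37, 37, 38, 39, 41, 41, 42, 44, 44, 44, 45,
    47, 47, 47, 47, 51, 53, 53, 54, 54, 55, 55, 56, 56, 56, 57, 57,
    58, 58, 58, 58, 59, 60, 60, 60, 61, 62, 62, 62, 65, 66, 66, 66,
    66, 67, 68, 68, 74, 75, 76, 76, 81, 87, 92, 92, 97,
]

def grouped_frequency_table(data, k):
    """Build class limits, boundaries, marks, frequencies and percentages."""
    r = max(data) - min(data)        # step 2: range R
    c = math.ceil(r / k)             # steps 3-4: C* = R/K, rounded up (assumption)
    lower = (min(data) // c) * c     # first lower limit (assumption)
    rows = []
    while lower <= max(data):
        upper = lower + c - 1
        freq = sum(lower <= x <= upper for x in data)   # step 6: tally
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "mark": (lower + upper) / 2,
            "frequency": freq,
            "percentage": round(100 * freq / len(data), 2),
        })
        lower += c
    return rows

rows = grouped_frequency_table(scores, k=9)
for row in rows:
    print(row)
```

With K = 9 this yields the nine classes 10 – 19 through 90 – 99 shown in the table; step 7 corresponds to checking that the frequencies sum to 77.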
[Figure: frequency graph; horizontal axis: Homework Score]
MEASURES OF CENTRAL TENDENCY

any single value which is used to identify the “center” of the data or the typical value; it is oftentimes referred to as the average
The Mean

• sum of all values of the observations divided by the number of observations in the data set

Population Mean (for a finite population):

$\mu = \frac{\text{sum of the observations}}{\text{size of the population } (N)}$

Sample Mean:

$\bar{X} = \frac{\text{sum of the observations}}{\text{size of the sample } (n)}$
Example:

The achievement test scores in Math of all 50 freshmen students from a certain college are as follows:

43 51 53 55 57 58 58 59 61 61
61 62 63 64 65 65 66 66 67 68
68 69 69 69 69 70 70 70 71 71
72 73 73 74 74 75 76 76 77 78
79 79 81 82 82 85 87 89 91 96

The mean of this population is:

$\mu = \frac{43 + 51 + \cdots + 91 + 96}{50} = \frac{3498}{50} = 69.96$
Suppose that a sample of seven students from this college yielded the following observations:

70 , 82 , 77 , 96 , 55 , 85 , 64

The corresponding sample mean is

$\bar{X} = \frac{70 + 82 + 77 + 96 + 55 + 85 + 64}{7} = \frac{529}{7} = 75.57$

Suppose another sample of students of the same size was taken and resulted in the following scores:

58 , 72 , 77 , 89 , 63 , 85 , 51

The sample mean is given by

$\bar{X} = \frac{58 + 72 + 77 + 89 + 63 + 85 + 51}{7} = \frac{495}{7} = 70.714$
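
As a quick check of the three means above, here is a minimal sketch. The helper `mean` is illustrative; the data are copied from the example.

```python
population = [
    43, 51, 53, 55, 57, 58, 58, 59, 61, 61,
    61, 62, 63, 64, 65, 65, 66, 66, 67, 68,
    68, 69, 69, 69, 69, 70, 70, 70, 71, 71,
    72, 73, 73, 74, 74, 75, 76, 76, 77, 78,
    79, 79, 81, 82, 82, 85, 87, 89, 91, 96,
]
sample_1 = [70, 82, 77, 96, 55, 85, 64]
sample_2 = [58, 72, 77, 89, 63, 85, 51]

def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

print(mean(population))          # parameter: population mean
print(round(mean(sample_1), 2))  # statistic from the first sample
print(round(mean(sample_2), 3))  # statistic from the second sample
```

The two sample means differ from each other and from the population mean, which illustrates that a statistic varies from sample to sample while the parameter is fixed.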

For data that are presented in a frequency distribution table, the mean can be approximated as follows:

i. multiply the class marks by their corresponding class frequency
ii. sum the obtained products
iii. divide the sum by the number of observations
Example:

Using the frequency distribution table for the achievement test scores of the 77 high school students,

Frequency Distribution Table for the Achievement Test Scores

Scores    Number of Students (fi)   Class Mark (Xi)   fiXi
10 – 19   6                         14.5              87.0
20 – 29   5                         24.5              122.5
30 – 39   14                        34.5              483.0
40 – 49   11                        44.5              489.5
50 – 59   17                        54.5              926.5
60 – 69   15                        64.5              967.5
70 – 79   4                         74.5              298.0
80 – 89   2                         84.5              169.0
90 – 99   3                         94.5              283.5

The approximated sample mean is

$\bar{X} = \frac{87 + 122.5 + \cdots + 169 + 283.5}{77} = \frac{3826.5}{77} \approx 49.69$

Verify that the actual sample mean is 49.8052.
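
The three-step approximation can be sketched as below. The function name `grouped_mean` is illustrative; the class marks and frequencies are those of the table for the 77 test scores.

```python
class_marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [6, 5, 14, 11, 17, 15, 4, 2, 3]

def grouped_mean(marks, freqs):
    """Approximate the mean from class marks weighted by class frequencies."""
    total = sum(f * x for f, x in zip(freqs, marks))  # steps i and ii
    return total / sum(freqs)                         # step iii

approx_mean = grouped_mean(class_marks, frequencies)
print(round(approx_mean, 2))
```

The result is only an approximation because every score in a class is treated as if it were equal to the class mark.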


Characteristics of the Mean

• It is the most familiar measure of central tendency used, and it employs all available information.

• It is strongly influenced by extreme values.

• Since the mean is a calculated number, it may not be an actual number in the data set.

• It can be applied to data that are measured at the interval level or higher.
The Median

• a value that divides an ordered set of data (array) into two equal parts and is commonly denoted by Md

• a value below which one-half of the data must fall

To get the median:

• when the number of observations is odd:

Md = middle value in the array
   = ((n + 1)/2)th observation in the array

• when the number of observations is even:

Md = mean of the two middle values in the array
   = mean of the (n/2)th and (n/2 + 1)th observations in the array
Examples:

a. The following are the total receipts of 7 companies (in million pesos):

1.2 , 7.2 , 12.5 , 6.5 , 50.6 , 4.5 , 10.4

The array corresponding to the above data is given by

1.2 , 4.5 , 6.5 , 7.2 , 10.4 , 12.5 , 50.6

Thus, the median is 7.2, the 4th observation in the array.

b. The following are the number of years of operation of 8 manufacturing companies:

8 , 10 , 17 , 18 , 11 , 16 , 17 , 10

The array is given by

8 , 10 , 10 , 11 , 16 , 17 , 17 , 18

The median is

$Md = \frac{11 + 16}{2} = 13.5$
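
The odd and even cases of the median rule can be sketched as follows. The function name `median` is illustrative; the data are the two examples above.

```python
receipts = [1.2, 7.2, 12.5, 6.5, 50.6, 4.5, 10.4]   # n = 7 (odd)
years = [8, 10, 17, 18, 11, 16, 17, 10]             # n = 8 (even)

def median(values):
    """Middle value of the array, or the mean of the two middle values."""
    array = sorted(values)      # arrange the raw data into an array
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        # ((n + 1)/2)th observation; list indices start at 0
        return array[mid]
    # mean of the (n/2)th and (n/2 + 1)th observations
    return (array[mid - 1] + array[mid]) / 2

print(median(receipts))
print(median(years))
```

Replacing 50.6 with a much larger receipt would leave the first median unchanged, which is what it means for the median to be a positional measure.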

Characteristics of the Median


• It is a positional measure.
• It is not influenced by extreme values.
• It can be applied to data that are measured at the ordinal level or higher.
The Mode

• the value in the data set that occurs with the greatest frequency

Example:

A psychologist has developed a new technique intended to improve rote memory. To test the method against other standard methods, 30 high school students representing three sections are selected at random, and each is taught the new technique. The students are then asked to memorize a list of 100 word phrases using the technique. The following are the number of word phrases memorized correctly by the students from each section:
Section 1: 83 64 98 66 83 87 83 93 86 80 93 83 75

Section 2: 87 76 96 77 94 92 88 85 66 89

Section 3: 68 84 79 79 84 75 80

Determine the mode for each set in the context of this problem.

Section   Mode
1         83
2         does not exist
3         84 and 79
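
The three cases in the table (one mode, no mode, two modes) can be reproduced with a small sketch. The function `modes` is illustrative, and treating a data set in which every value occurs exactly once as having no mode is the convention used in the example above.

```python
from collections import Counter

section_1 = [83, 64, 98, 66, 83, 87, 83, 93, 86, 80, 93, 83, 75]
section_2 = [87, 76, 96, 77, 94, 92, 88, 85, 66, 89]
section_3 = [68, 84, 79, 79, 84, 75, 80]

def modes(values):
    """Return every value that occurs with the greatest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []   # no value repeats, so the mode does not exist
    return sorted(v for v, c in counts.items() if c == top)

for name, data in [("1", section_1), ("2", section_2), ("3", section_3)]:
    print(name, modes(data))
```

An empty list stands for "does not exist", and a list with two entries marks a bimodal data set.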
Characteristics of the Mode

• It is the easiest to interpret among measures of central tendency.

• It is not affected by extreme values.

• It does not always exist; if it does, it may not be unique. If a data set has two modes, we call it bimodal; if there are three modes, we call it trimodal; and so on.

• One advantage of the mode is that it can be applied to observations that are measured at the nominal level.
MEASURES OF LOCATION

numbers below which a specified amount or percentage of data must lie and are oftentimes used to find the position of a specific piece of data in relation to the entire set of data

Percentiles

• values that divide an ordered set of data into 100 equal parts

• the ith percentile (i = 1, 2, ..., 99), denoted by Pi, is a value below which i% of the data must lie
To determine Pi, we have the following steps:

i. Arrange the data from lowest to highest.

ii. If ni/100 is a whole number, Pi is the mean of the (ni/100)th and (ni/100 + 1)th ordered values.

iii. If ni/100 is not a whole number, Pi is the kth ordered value, where k is the closest whole number greater than ni/100.
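
The percentile rule can be sketched as follows. The function name `percentile` is illustrative; the two data sets are the receipts and years-of-operation examples from the median section, reused here because the 50th percentile should agree with the median.

```python
receipts = [1.2, 7.2, 12.5, 6.5, 50.6, 4.5, 10.4]
years = [8, 10, 17, 18, 11, 16, 17, 10]

def percentile(values, i):
    """Return P_i for i = 1, ..., 99 using steps i to iii."""
    if not 1 <= i <= 99:
        raise ValueError("i must be between 1 and 99")
    array = sorted(values)               # step i
    n = len(array)
    if (n * i) % 100 == 0:               # step ii: n*i/100 is a whole number
        k = n * i // 100
        return (array[k - 1] + array[k]) / 2
    k = n * i // 100 + 1                 # step iii: next whole number above n*i/100
    return array[k - 1]

print(percentile(years, 50))     # same as the median of the 8 values
print(percentile(receipts, 50))  # same as the median of the 7 values
print(percentile(years, 25))     # first quartile, Q1 = P25
```

Because Di = P(10i) and Qi = P(25i), the same function also gives deciles and quartiles.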
Deciles

• values that divide an ordered set of data into 10 equal parts
• the ith decile (i = 1, 2, ..., 9), denoted by Di, is a value below which 10i% of the data must lie

Quartiles

• values that divide an ordered set of data into 4 equal parts
• the ith quartile (i = 1, 2, 3), denoted by Qi, is a value below which 25i% of the data must lie
MEASURES OF DISPERSION
numerical descriptive measures which indicate the extent to
which individual observations in a set of data are scattered
about an average

Some Uses of Measuring Dispersion


• to determine the extent of scatter so that steps may be
taken to control the existing variation

• used as a measure of reliability of an average

Classification of Measures of Dispersion

• Measures of Absolute Dispersion


• Measures of Relative Dispersion
MEASURES OF ABSOLUTE DISPERSION

measures of dispersion that are expressed in the units of the original observations; cannot be used to compare variation of two or more data sets when the observations differ in the units of measurement or when the values of the averages differ in magnitude

The Range

the difference between the largest and smallest values in a data set

Characteristics of the Range

• It is the simplest measure of dispersion.

• It is sensitive to extreme values.

• It fails to communicate any information about the clustering or the lack of clustering of the values between the extremes.
The Standard Deviation

• a measure of dispersion which indicates the extent of scattering of the observations from the mean

• the square root of the average squared deviation of the observations from the mean

Population standard deviation:

$\sigma = \sqrt{\frac{(\text{sum of squared observations}) - N\mu^2}{N}}$

Sample standard deviation:

$s = \sqrt{\frac{(\text{sum of squared observations}) - n\bar{X}^2}{n - 1}}$

Remark: The square of the standard deviation is called the variance. The population variance is commonly denoted by $\sigma^2$ whereas the sample variance is denoted by $s^2$.
Example:

Refer to the data on Math scores of 50 freshmen students. The population variance is given by

$\sigma^2 = \frac{(43^2 + 51^2 + 53^2 + \cdots + 89^2 + 91^2 + 96^2) - 50(69.96^2)}{50} = \frac{250226 - 244720.08}{50} = 110.1184$

so that $\sigma = \sqrt{110.1184} \approx 10.49$.

For the sample of seven students, the standard deviation is computed as

$s = \sqrt{\frac{(70^2 + 82^2 + 77^2 + 96^2 + 55^2 + 85^2 + 64^2) - 7(75.57^2)}{6}} = \sqrt{\frac{41115 - 39975.77}{6}} = 13.7794$
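
The two computing formulas can be checked with the sketch below. The function names are illustrative; the data are the 50 Math scores and the first sample of seven students. The code uses the unrounded sample mean 529/7, so its sample standard deviation differs slightly in the second decimal place from a hand computation that uses the rounded mean 75.57.

```python
import math

population = [
    43, 51, 53, 55, 57, 58, 58, 59, 61, 61,
    61, 62, 63, 64, 65, 65, 66, 66, 67, 68,
    68, 69, 69, 69, 69, 70, 70, 70, 71, 71,
    72, 73, 73, 74, 74, 75, 76, 76, 77, 78,
    79, 79, 81, 82, 82, 85, 87, 89, 91, 96,
]
sample = [70, 82, 77, 96, 55, 85, 64]

def population_variance(values):
    """(sum of squared observations - N * mu^2) / N"""
    n = len(values)
    mu = sum(values) / n
    return (sum(x * x for x in values) - n * mu ** 2) / n

def sample_std(values):
    """sqrt((sum of squared observations - n * xbar^2) / (n - 1))"""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt((sum(x * x for x in values) - n * xbar ** 2) / (n - 1))

print(round(population_variance(population), 4))
print(round(math.sqrt(population_variance(population)), 4))
print(round(sample_std(sample), 4))
```

Note the different divisors: N for the population and n - 1 for the sample.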
If the data are presented in a frequency distribution table, the sample standard deviation can be approximated as follows:

i. Compute the mean.
ii. Subtract the mean from each of the class marks in the FDT and square these deviations.
iii. Multiply the squared deviations by the corresponding class frequency.
iv. Sum the products obtained in (iii) and divide the resulting value by (n - 1).
v. Take the square root of the obtained quotient.
Example

Frequency Distribution Table for Scores of High School Students

The deviations below are taken from the approximated mean of the grouped data, 3826.5/77 ≈ 49.69.

Scores   Number of Students (fi)   Class Mark (Xi)   Xi – 49.69   (Xi – 49.69)^2   fi (Xi – 49.69)^2
10-19    6                         14.5              -35.19       1238.3361        7430.0166
20-29    5                         24.5              -25.19       634.5361         3172.6805
30-39    14                        34.5              -15.19       230.7361         3230.3054
40-49    11                        44.5              -5.19        26.9361          296.2971
50-59    17                        54.5              4.81         23.1361          393.3137
60-69    15                        64.5              14.81        219.3361         3290.0415
70-79    4                         74.5              24.81        615.5361         2462.1444
80-89    2                         84.5              34.81        1211.7361        2423.4722
90-99    3                         94.5              44.81        2007.9361        6023.8083
Total    77                                                                        28722.0797

The approximated sample standard deviation is given by

$s = \sqrt{\frac{28722.0797}{76}} \approx 19.440$

Verify that the actual sample standard deviation is 19.60266.
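
Steps i to v for grouped data can be sketched as below. The function name `grouped_sample_std` is illustrative, and the class marks and frequencies are taken from the frequency distribution table for the 77 achievement test scores (6, 5, 14, 11, 17, 15, 4, 2, 3). The code keeps the mean unrounded, so a hand computation with a rounded mean can differ in the last digits.

```python
import math

class_marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [6, 5, 14, 11, 17, 15, 4, 2, 3]

def grouped_sample_std(marks, freqs):
    """Approximate the sample standard deviation from a frequency table."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, marks)) / n               # step i
    weighted = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) # steps ii-iv (sum)
    return math.sqrt(weighted / (n - 1))                              # steps iv-v

approx_std = grouped_sample_std(class_marks, frequencies)
print(round(approx_std, 3))
```

As with the grouped mean, the value is an approximation because each score is replaced by its class mark.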
MEASURES OF SKEWNESS

refer to the degree of asymmetry, or departure from symmetry, of a distribution; they indicate not only the amount of skewness but also the direction

Examples of Symmetric Distributions


Figure 3.22. Examples of symmetric distributions.
Two Types of Skewness

1. Positive Skewness or Skewness to the Right

• distribution tapers more to the right than to the left
• longer tail to the right
• more concentration of values below than above the mean
• order of the averages along the horizontal axis: Mo, Md, $\bar{X}$

Figure 3.23. Frequency distribution of a positively skewed data set

2. Negative Skewness or Skewness to the Left

• distribution tapers more to the left than to the right
• longer tail to the left
• more concentration of values above than below the mean
• order of the averages along the horizontal axis: $\bar{X}$, Md, Mo

Figure 3.24. Frequency distribution of a negatively skewed data set


Some Common Measures of Skewness

1. Pearson’s First Coefficient of Skewness

$Sk = \frac{\bar{X} - Mo}{s}$

where $\bar{X}$ = sample mean
      Mo = mode
      s = sample standard deviation

2. Pearson’s Second Coefficient of Skewness

$Sk = \frac{3(\bar{X} - Md)}{s}$

where $\bar{X}$ = sample mean
      Md = median
      s = sample standard deviation

Interpretation: Sk < 0 Negatively skewed
                Sk = 0 Symmetric
                Sk > 0 Positively skewed
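
Both coefficients can be computed with Python's standard `statistics` module, as in the sketch below. The function names are illustrative, and the data are the Section 1 scores from the mode example, chosen because that set has a single mode; the first coefficient is not well defined when the mode does not exist or is not unique.

```python
import statistics

section_1 = [83, 64, 98, 66, 83, 87, 83, 93, 86, 80, 93, 83, 75]

def pearson_first(values):
    """Sk = (mean - mode) / s; assumes the data have a unique mode."""
    s = statistics.stdev(values)   # sample standard deviation (n - 1 divisor)
    return (statistics.mean(values) - statistics.mode(values)) / s

def pearson_second(values):
    """Sk = 3 * (mean - median) / s"""
    s = statistics.stdev(values)
    return 3 * (statistics.mean(values) - statistics.median(values)) / s

print(pearson_first(section_1))
print(pearson_second(section_1))
```

For this section the mean lies slightly below both the mode and the median (each equal to 83), so both coefficients come out negative, indicating a slight skew to the left.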
Correlation Coefficients
• A correlation coefficient is a mathematical
measure of the relationship between two
variables.

• The correlation coefficient was developed by Karl Pearson and is designated by the letter r.

Correlation (r)
• Correlations range from -1.0 to +1.0
• Correlations differ in two respects: size and sign.
• Sign - can be positive or negative. Indicates the direction of the relationship.
• Size - a correlation of 0.0 indicates the absence of a linear relationship; the closer the correlation gets to -1.0 or +1.0, the stronger the relationship; a correlation of -1.0 or +1.0 indicates a perfect relationship.
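
The slides do not state a formula for r; the sketch below uses the standard product-moment definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name and the small data sets are hypothetical, chosen only to show the sign and the limits of -1.0 and +1.0.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired scores, for illustration only
x = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]     # increases exactly in step with x
falling = [10, 8, 6, 4, 2]    # decreases exactly in step with x
mixed = [2, 1, 4, 3, 5]       # increases with x, but not perfectly

print(pearson_r(x, rising))
print(pearson_r(x, falling))
print(pearson_r(x, mixed))
```

The first two pairs fall exactly on a straight line, giving perfect positive and perfect negative correlations; the third gives a positive value below 1.0.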

Scatterplots
• Scatterplots: graph depicting the relationship
between two variables (X & Y). Each mark in
the scatterplot actually represents two scores,
an individual’s scores on the X and the Y
variable.

Major Types of Correlations
• Pearson Product-Moment Correlation: both
variables continuous and on an Interval or
Ratio scale.

• Spearman Rank-Difference Correlation: both variables on an Ordinal scale.

Major Types of Correlations
• Point-Biserial Correlation: one variable
continuous and on Interval/Ratio scale, the
other a genuine dichotomy (e.g., true/false).

• Biserial Correlation: both variables continuous and on Interval/Ratio scale, but one is reduced to two categories (i.e., dichotomized).

Factors that Affect Correlations
• Most correlations assume a linear relationship (falling on a straight line). If another type of relationship exists, traditional correlations may underestimate the strength of the relationship.
• If there is a restriction of range in either variable, the magnitude of the correlation will be reduced.

Qualitative Interpretation of
Correlations
• General Guidelines:
• < 0.30 Weak
• 0.30 - 0.70 Moderate
• > 0.70 Strong

• These are not universally accepted and you might see other guidelines.

Statistical Significance of
Correlations
• Statistical significance is determined both by
the size of the correlation coefficient and the
size of the sample.

• This and related topics are covered in most introductory statistics texts and courses.

Quantitative Interpretation of
Correlations
• Coefficient of Determination ($r^2$): the proportion of variance in one variable that is determined or predictable from the other variable.
• Coefficient of Nondetermination ($1 - r^2$): the proportion of variance in one variable that is not determined or predictable from the other variable.
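
A short sketch of the two coefficients follows. The function name and the value r = 0.70 are hypothetical, chosen only for illustration.

```python
def determination(r):
    """Return (r^2, 1 - r^2) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.0 and +1.0")
    r_squared = r * r
    return r_squared, 1 - r_squared

# Hypothetical correlation of 0.70
explained, unexplained = determination(0.70)
print(round(explained, 2), round(unexplained, 2))
```

With r = 0.70, about 49% of the variance in one variable is predictable from the other and about 51% is not, even though 0.70 sits at the boundary of "strong" in the general guidelines.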

Correlation & Prediction
• When variables are correlated, particularly
when there is a strong correlation, knowledge
about performance on one variable provides
information that can help predict performance
on the other variable.

Linear Regression
• A statistical technique for predicting scores on
one variable (criterion or Y) given a score on
another (predictor or X).
• Predicts criterion scores based on a perfect
linear relationship.
• Strong correlations result in accurate
predictions; weak correlations result in less
accurate predictions.
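
The slides do not say how the prediction line is obtained; a common choice, assumed here, is ordinary least squares. The function names and the data below are hypothetical and serve only to show a predictor X being used to predict a criterion Y.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x_new, slope, intercept):
    """Predicted criterion score for a given predictor score."""
    return intercept + slope * x_new

# Hypothetical predictor (X) and criterion (Y) scores
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

slope, intercept = fit_line(x, y)
print(slope, intercept)
print(predict(5, slope, intercept))
```

These hypothetical points lie exactly on a line, so the prediction is exact; with real scores the points scatter around the line, and the weaker the correlation, the larger the prediction errors.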

Correlation & Causality
• It is a common misconception that if two
variables are correlated one is causing the
other.

• This is not the case!
