BStats 1
Today’s Highlights
• Introduction
• Classification of Data
• Method of Data Collection
• Primary and Secondary Data
• Measures of Central Tendency
• Measures of Variation
What is Statistics?
It is the SCIENCE of
collecting,
organizing,
presenting,
analyzing, and
interpreting data (quantitative or qualitative)
for the purpose of assisting in making a more
effective decision.
Applications
• For empirical inquiry
• Financial Decisions
• How is the economy doing?
• The impact of technology at work
• Compensation survey
• Performance management
• Employee Satisfaction Survey
• Training feedback evaluation
• Human Resource Accounting
• HR Budgeting
Statistical Methods
[Diagram of statistical methods. Labels: Descriptive, Statistics, Analysis, Univariate, Multivariate]
Statistical Methods Contd…
Descriptive Analysis/Inferential
Data Classification
Data
• Quantitative (or Numerical)
• Qualitative (or Attribute)
Data Collection Method
Types of Data
• Primary Data: data collected afresh and for the
first time, and therefore original in character.
Typical sources:
• Surveys
• Focus groups
• Questionnaires
• Personal interviews
• Experiments and observational study
Primary Data - Limitations
• Do you have the time and money for:
– Designing your collection instrument?
– Selecting your population or sample?
– Pretesting/piloting the instrument to work out
sources of bias?
– Administration of the instrument?
– Entry/collation of data?
Primary Data - Limitations
• Researcher error
– Sample bias
– Other confounding factors
• Uniqueness
– May not be able to compare to other populations
Data Collection Choice
• What you must ask yourself:
– Will the data answer my research question?
• To answer that
– You must first decide what your research question
is
– Then you need to decide what data/variables are
needed to scientifically answer the question
Measures of Central Tendency
Central Tendencies
Mean: (Average)
The sum of a set of numbers divided by the number of values in
the set.
Median: (Middle Value)
Mode: (Most frequent value)
Central Tendencies
Data: 18, 21, 8, 12, 26
Mean = (8 + 12 + 18 + 21 + 26) / 5 = 85 / 5 = 17
Median: the sorted values are 8, 12, 18, 21, 26, so the median is the middle value, 18
Central Tendencies
Data (sorted): 60, 60, 60, 70, 76, 80, 80, 90
Mean = 576 / 8 = 72
Median = (70 + 76) / 2 = 73
Mode = 60
Range = 90 - 60 = 30
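The two worked examples can be checked with a short Python sketch. It uses the standard-library `statistics` module; the variable names are illustrative and not part of the slides.

```python
from statistics import mean, median, multimode

# Data sets taken from the two worked examples
scores_a = [18, 21, 8, 12, 26]
scores_b = [60, 60, 60, 70, 76, 80, 80, 90]

mean_a = mean(scores_a)        # sum of the values divided by how many there are
median_a = median(scores_a)    # middle value once the data are sorted

mean_b = mean(scores_b)
median_b = median(scores_b)    # even count: average of the two middle values
modes_b = multimode(scores_b)  # list of the most frequent value(s)
range_b = max(scores_b) - min(scores_b)

print(mean_a, median_a)
print(mean_b, median_b, modes_b, range_b)
```

`multimode` returns a list because a data set can have more than one mode; here it holds a single value.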
Mean for grouped data
Calculating the Mean: If there are large amounts of data, it is
easier if it is displayed in a frequency table.
Example 1.
The number of goals scored by Premier League teams over a weekend was
recorded in a table. Calculate the mean and the mode.
Goals, x    Frequency, f    fx
0           2               0
1           4               4
2           8               16
3           3               9
4           2               8
5           1               5
Total       ∑f = 20         ∑fx = 42

Mean = ∑fx / ∑f = 42 / 20 = 2.1
Mode = 2 (the number of goals with the highest frequency, 8)
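The frequency-table calculation can be sketched in Python as follows. The dictionary maps each number of goals to its frequency, using the figures from the table; the names are illustrative.

```python
# Goals scored (x) mapped to frequency (f), from the Premier League example
goals_frequency = {0: 2, 1: 4, 2: 8, 3: 3, 4: 2, 5: 1}

total_f = sum(goals_frequency.values())                     # sum of f
total_fx = sum(x * f for x, f in goals_frequency.items())   # sum of f * x

mean_goals = total_fx / total_f
# The mode is the x value with the largest frequency
mode_goals = max(goals_frequency, key=goals_frequency.get)

print(total_f, total_fx, mean_goals, mode_goals)
```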
Grouped Data
Large quantities of data can be much more easily viewed and managed if
placed in groups in a frequency table. Because the individual values are
lost when data are grouped, exact values for the mean, median and mode
cannot be calculated. They have to be estimated by alternative methods.
Example 1.
During 3 hours at Heathrow airport 55 aircraft arrived late. The number of
minutes they were late is shown in the grouped frequency table below.
Grouped Data
The Median Class Interval
Grouped Data
Example 2.
A group of University students took part in a sponsored race. The number of
laps completed is given in the table below. Use the information to:
(a) Calculate an estimate for the mean number of laps.
(b) Determine the modal class.
(c) Determine the class interval containing the median.
Mode (grouped data):
$$\text{Mode} = L + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$$
where:
L is the lower class boundary of the modal class
fm is the frequency of the modal class
f1 is the frequency of the class before the modal class
f2 is the frequency of the class after the modal class
h is the size of the modal class, i.e. the difference between
the upper and lower class boundaries of the modal class.
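A minimal Python sketch of the grouped-data mode follows. It assumes the symbols defined here belong to the standard interpolation formula Mode = L + (fm - f1) / ((fm - f1) + (fm - f2)) × h. The function name and the class figures in the example call are hypothetical, chosen only for illustration.

```python
def grouped_mode(lower_boundary, f_modal, f_prev, f_next, class_width):
    """Estimate the mode of grouped data by interpolating inside the modal class.

    lower_boundary: L, lower class boundary of the modal class
    f_modal:        fm, frequency of the modal class
    f_prev:         f1, frequency of the class before the modal class
    f_next:         f2, frequency of the class after the modal class
    class_width:    h, size of the modal class
    """
    numerator = f_modal - f_prev
    denominator = (f_modal - f_prev) + (f_modal - f_next)
    return lower_boundary + numerator / denominator * class_width


# Hypothetical modal class 20-30 with fm = 12, f1 = 8, f2 = 6
estimated_mode = grouped_mode(20, 12, 8, 6, 10)
print(estimated_mode)
```

The sketch does not guard against a zero denominator, which occurs when the modal class and both neighbours have the same frequency.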
As for Median (grouped data):
$$\text{Median} = L + \frac{\frac{n}{2} - c}{f} \times h$$
where:
L is the lower class boundary of the median class
h is the size of the median class, i.e. the difference between
the upper and lower class boundaries of the median class
f is the frequency of the median class
c is the cumulative frequency of the class before the median
class
n/2 is the total number of observations divided by 2,
i.e. the sum of the frequencies (∑f) divided by 2
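The grouped-data median can be sketched in the same way. This assumes the symbols refer to the standard interpolation formula Median = L + ((n/2 - c) / f) × h. The function name and the numbers in the example call are hypothetical.

```python
def grouped_median(lower_boundary, n, cum_freq_before, f_median, class_width):
    """Estimate the median of grouped data by interpolating inside the median class.

    lower_boundary:  L, lower class boundary of the median class
    n:               total number of observations (sum of all frequencies)
    cum_freq_before: c, cumulative frequency up to the class before the median class
    f_median:        f, frequency of the median class
    class_width:     h, size of the median class
    """
    return lower_boundary + (n / 2 - cum_freq_before) / f_median * class_width


# Hypothetical table: 50 observations, median class 20-30 with f = 14 and c = 18
estimated_median = grouped_median(20, 50, 18, 14, 10)
print(estimated_median)
```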
Definition
• Measures of dispersion are descriptive
statistics that describe how similar a set of
scores are to each other
– The more similar the scores are to each other, the
lower the measure of dispersion will be
– The less similar the scores are to each other, the
higher the measure of dispersion will be
– In general, the more spread out a distribution is,
the larger the measure of dispersion will be
Measures of Dispersion
• Which of the distributions of scores has the larger dispersion?
[Figure: two frequency distributions plotted over the values 1 to 10]
The upper distribution has more dispersion
When To Use the Range
• The range is used when
– you have ordinal data or
– you are presenting your results to people with
little or no knowledge of statistics
• The range is rarely used in scientific work as it
is fairly insensitive
– It depends on only two scores in the set of data, the
largest (XL) and the smallest (XS)
– Two very different sets of data can have the same
range:
1 1 1 1 9 vs 1 3 5 7 9
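A short Python check of this point, using the two data sets from the slide (the function name is illustrative):

```python
set_a = [1, 1, 1, 1, 9]
set_b = [1, 3, 5, 7, 9]


def value_range(scores):
    """Range = largest score minus smallest score."""
    return max(scores) - min(scores)


range_a = value_range(set_a)
range_b = value_range(set_b)
print(range_a, range_b)  # both ranges are the same, although the sets differ
```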
The Semi-Interquartile Range
• The semi-interquartile range (or SIR) is defined
as the difference between the third and first
quartiles divided by two
– The first quartile is the 25th percentile
– The third quartile is the 75th percentile
• SIR = (Q3 - Q1) / 2
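A minimal sketch of the SIR in Python. Several conventions exist for locating quartiles; this example assumes the "inclusive" method of the standard-library `statistics.quantiles`, and the scores are hypothetical.

```python
from statistics import quantiles

# Hypothetical scores, used only for illustration
scores = [2, 4, 4, 5, 6, 7, 8, 9, 10]

# n=4 splits the data into quarters and returns the three cut points Q1, Q2, Q3
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

sir = (q3 - q1) / 2
print(q1, q3, sir)
```

A different quartile convention (for example `method="exclusive"`) can give slightly different values for Q1 and Q3 on small data sets.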
Quartiles
The two values which are a quarter of
the way into the data from either end:
Variance
• Variance is defined as the average of the
squared deviations:
$$\sigma^{2} = \frac{\sum (X - \mu)^{2}}{N}$$
Standard Deviation
Computational Formula
• When calculating variance, it is often easier to use
a computational formula which is algebraically
equivalent to the definitional formula:
$$\sigma^{2} = \frac{\sum X^{2} - \frac{(\sum X)^{2}}{N}}{N} = \frac{\sum (X - \mu)^{2}}{N}$$
$\sigma^{2}$ is the population variance, $X$ is a score, $\mu$ is the
population mean, and $N$ is the number of scores
Computational Formula Example
X         X²        X - μ     (X - μ)²
9         81        2         4
8         64        1         1
6         36        -1        1
5         25        -2        4
8         64        1         1
6         36        -1        1
Σ = 42    Σ = 306   Σ = 0     Σ = 12
Computational Formula Example
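Computational formula:
$$\sigma^{2} = \frac{\sum X^{2} - \frac{(\sum X)^{2}}{N}}{N} = \frac{306 - \frac{42^{2}}{6}}{6} = \frac{306 - 294}{6} = \frac{12}{6} = 2$$
Definitional formula:
$$\sigma^{2} = \frac{\sum (X - \mu)^{2}}{N} = \frac{12}{6} = 2$$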
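Both routes to the variance can be checked in Python with the six scores from the example (9, 8, 6, 5, 8, 6). The variable names are illustrative; `statistics.pvariance` is included as an independent cross-check of the population variance.

```python
from statistics import pvariance

scores = [9, 8, 6, 5, 8, 6]
n = len(scores)

sum_x = sum(scores)                   # sum of X
sum_x2 = sum(x * x for x in scores)   # sum of X squared
mu = sum_x / n                        # population mean

# Definitional formula: average of the squared deviations from the mean
var_definitional = sum((x - mu) ** 2 for x in scores) / n

# Computational formula: (sum of X^2 - (sum of X)^2 / N) / N
var_computational = (sum_x2 - sum_x ** 2 / n) / n

print(sum_x, sum_x2, var_definitional, var_computational, pvariance(scores))
```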
Calculation of the standard
deviation of grouped data
Ages       Mid-pt x    f     x - mean    (x - mean)²    (x - mean)² f
30 - 34    32          4     -9          81             324
35 - 39    37          5     -4          16             80
40 - 44    42          2     1           1              2
45 - 49    47          9     6           36             324
Total                  ∑f = 20                          730

$$s = \sqrt{\frac{\sum (x - \bar{x})^{2} f}{n - 1}} = \sqrt{\frac{730}{20 - 1}} = \sqrt{38.42} = 6.20$$
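The grouped calculation can be reproduced in Python from the class mid-points and frequencies in the table. This is a sketch with illustrative names; it uses the n - 1 divisor shown in the slide.

```python
from math import sqrt

# Class mid-points and frequencies from the ages table
midpoints = [32, 37, 42, 47]
frequencies = [4, 5, 2, 9]

n = sum(frequencies)
mean_age = sum(x * f for x, f in zip(midpoints, frequencies)) / n

# Sum of f * (x - mean)^2 over all classes
sum_sq_dev = sum(f * (x - mean_age) ** 2 for x, f in zip(midpoints, frequencies))

# Sample standard deviation with the n - 1 divisor
s = sqrt(sum_sq_dev / (n - 1))
print(n, mean_age, sum_sq_dev, round(s, 2))
```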
Measure of Skew
• Skew is a measure of the asymmetry of the
distribution of scores
[Figure: a positively skewed and a negatively skewed distribution]
Measure of Skew
• The following formula can be used to
determine skew:
$$s^{3} = \frac{\dfrac{\sum (X - \bar{X})^{3}}{N}}{\left(\sqrt{\dfrac{\sum (X - \bar{X})^{2}}{N}}\right)^{3}}$$
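A minimal Python sketch of a skew calculation. It assumes the formula is the moment-based measure: the average cubed deviation divided by the cube of the standard deviation, both with divisor N. The function name is illustrative, and the two data sets are borrowed from the range slide.

```python
def skewness(scores):
    """Moment-based skew: mean cubed deviation / (mean squared deviation) ** 1.5."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n   # average squared deviation
    m3 = sum((x - mean) ** 3 for x in scores) / n   # average cubed deviation
    return m3 / m2 ** 1.5


symmetric_skew = skewness([1, 3, 5, 7, 9])   # symmetric data: skew of zero
positive_skew = skewness([1, 1, 1, 1, 9])    # long right tail: positive skew
print(symmetric_skew, positive_skew)
```

A value above zero indicates positive skew and a value below zero indicates negative skew.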
Kurtosis
• When the scores are normally distributed, the
kurtosis equals 3 and the distribution is said to be
mesokurtic
• When the distribution is less spread out than
normal, its kurtosis is greater than 3 and it is
said to be leptokurtic
• When the distribution is more spread out than
normal, its kurtosis is less than 3 and it is said
to be platykurtic
Measure of Kurtosis
$$s^{4} = \frac{\dfrac{\sum (X - \bar{X})^{4}}{N}}{\left(\dfrac{\sum (X - \bar{X})^{2}}{N}\right)^{2}}$$
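A minimal Python sketch of a kurtosis calculation. It assumes the moment-based measure: the average fourth-power deviation divided by the squared variance, both with divisor N, so that a normal distribution scores 3. The function name is illustrative, and the two data sets are borrowed from the range slide.

```python
def kurtosis(scores):
    """Moment-based kurtosis: mean fourth-power deviation / (mean squared deviation) ** 2."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n   # average squared deviation
    m4 = sum((x - mean) ** 4 for x in scores) / n   # average fourth-power deviation
    return m4 / m2 ** 2


flat_kurtosis = kurtosis([1, 3, 5, 7, 9])     # evenly spread scores: below 3
peaked_kurtosis = kurtosis([1, 1, 1, 1, 9])   # scores piled on one value: above 3
print(flat_kurtosis, peaked_kurtosis)
```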
Coefficient of Variation
• The Coefficient of Variation (CV) is the standard
deviation (SD) expressed as a percentage of
the mean
– Also known as the relative standard deviation
(RSD)
• CV % = (SD ÷ mean) × 100
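A short Python sketch of the CV, reusing the six scores from the variance example. The slide does not say whether the population or sample SD is meant; the population SD (`statistics.pstdev`) is assumed here.

```python
from statistics import mean, pstdev

scores = [9, 8, 6, 5, 8, 6]

sd = pstdev(scores)     # population standard deviation (assumption, see above)
average = mean(scores)

cv_percent = sd / average * 100
print(round(cv_percent, 1))
```

Because the CV is a ratio, it has no units, which makes it useful for comparing the variability of data sets measured on different scales.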
Measures of Central Tendency &
Dispersion
• Central Tendency
– Mean
– Median
– Mode
• Dispersion
– Range
– Semi-Interquartile Range
– Variance / Standard Deviation