
Descriptive Statistics

Statistics

Uploaded by

HAPPY PHIRI
Copyright
© All Rights Reserved

Descriptive Statistics

Mrs. M. Mushabati
Objectives
By the end of this lecture the student should understand:
•measures of central tendency (mean, median, mode) and how to calculate them.
•quartiles Q1, Q2 and Q3.
•measures of variation (range, inter-quartile range, variance and standard deviation) and how to calculate them.
Introduction
In statistics we usually deal with large volumes of data that need to be organized, summarized and described.
We need some form of summary to:
 permit us to deal with data in a manageable form
 be able to share our findings with others in scientific talks and publications.
 A histogram or bar diagram of the frequency distribution is one type of summary.
 However, for most purposes, a numerical summary is needed to describe concisely the properties of the observed frequency distribution.
 Quantities providing such a summary are called descriptive statistics.
Introduction
 Frequency distributions from continuous data are described using measures of central tendency and measures of dispersion.
 Measures of central tendency locate the centre of the observations on a measurement scale.
 Measures of dispersion describe how widely the observations are spread out.
Measures of Central Tendency
 Measures of central tendency are single numeric values that describe the centre of the distribution of a numeric variable.
 The main numerical measures of central tendency used in statistics are the mean and the median.
 The other measure of central tendency is the mode;
◦ it is not often used in statistical computations.
The Mean
 The mean is the average value: the sum (∑) of all the observed values (xi) divided by the total number of observations (N):
$$\text{Mean} = \frac{\sum_{i=1}^{N} x_i}{N}$$
The Mean
 A researcher wants to know the mean number of decayed teeth among 10 children aged 5 years. The numbers of decayed teeth in the children are:
 2, 7, 3, 11, 4, 6, 5, 9, 0, 1.
 The mean number of decayed teeth is:
(2+7+3+11+4+6+5+9+0+1)/10
= 48/10
= 4.8 teeth.
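As a minimal sketch, the calculation above can be written in Python. The function name `mean` and the variable `decayed_teeth` are illustrative choices, not part of the lecture; Python's standard library also provides `statistics.mean` for the same purpose.

```python
def mean(values):
    """Arithmetic mean: the sum of all observations divided by their number."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)

# Number of decayed teeth in each of the 10 children from the example
decayed_teeth = [2, 7, 3, 11, 4, 6, 5, 9, 0, 1]
print(mean(decayed_teeth))  # 4.8
```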
The mean
 The international domestic workers association wishes to know the average minimum monthly wage of domestic workers in Zambia. Assume that the workers reported the following:
K200, K230, K300, K320, K350
 Their mean would be:
(200+230+300+320+350)/5
= 1400/5
= K280
The mean is affected by extreme values
 Considering example 2, if one worker earned considerably more than the others, the mean would change markedly:
K200, K230, K300, K320, K1500
 The mean would then be (200+230+300+320+1500)/5 = 2550/5 = K510, which is higher than four of the five wages and so is not representative of the data set as a whole.
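The effect of the extreme value can be checked with a short Python sketch. The lists reuse the wages from example 2, with K350 replaced by K1500 as above; the helper name `mean` is illustrative.

```python
def mean(values):
    # Sum of the observations divided by the number of observations
    return sum(values) / len(values)

wages = [200, 230, 300, 320, 350]                # monthly wages in Kwacha (K)
wages_with_outlier = [200, 230, 300, 320, 1500]  # K350 replaced by K1500

print(mean(wages))               # 280.0
print(mean(wages_with_outlier))  # 510.0
```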
Properties of the mean
1. It is unique: for a given set of data, there is only
one arithmetic mean
2. It is easy to compute and understand
3. All values in a set of data contribute to the computation of the mean.
 The mean is affected by extreme values.
The median
 The median is the midpoint of the distribution.
 It is the value that divides an ordered data set into two equal parts, such that half of the observations fall above it and half fall below it.
 To calculate the median:
◦ Order the data from smallest to largest.
◦ Consider whether the number of observations (n) is even or odd.
The Median
 If n is even, the median is the mean of the two centre observations in the ordered list.
 These are the values at positions n/2 and (n/2)+1.
 Considering example 1 regarding tooth decay in children, n=10:
2, 7, 3, 11, 4, 6, 5, 9, 0, 1
◦ Order the observations in the data set:
0, 1, 2, 3, 4, 5, 6, 7, 9, 11
◦ The centre observations are at positions 10/2 = 5 and 6, i.e. 4 and 5.
Median = (4+5)/2
= 4.5
The Median
 If n is odd, the median is the centre observation in the ordered list.
 It is the value at position (n+1)/2.
 Considering the minimum wage of domestic workers in Zambia with an outlier:
K200, K230, K300, K320, K1500
 The median wage is the value at position (5+1)/2 = 3:
K300 (the same as the median without the outlier, whereas the mean rose from K280 to K510)
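The even and odd rules for the median can be combined into one function. This is a sketch assuming a non-empty list of numbers; the positions on the slides are 1-based while Python indexes from 0, hence the shift by one. Python's `statistics.median` applies the same rules.

```python
def median(values):
    """Median following the even/odd rules above."""
    if not values:
        raise ValueError("median requires at least one observation")
    ordered = sorted(values)  # order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the centre value at 1-based position (n+1)/2
        return ordered[mid]
    # n even: mean of the values at 1-based positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 7, 3, 11, 4, 6, 5, 9, 0, 1]))  # 4.5
print(median([200, 230, 300, 320, 1500]))       # 300
```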
Properties of the median
 It is unique: there is only one median for a set of
data
 It is easy to compute and understand
 It is not as drastically affected by extreme values
as is the mean
◦ It gives a more representative measure in a
data set that has a skewed distribution
The Mode
 This is the value in the data set which occurs
most frequently
◦ The mode is the value that is most frequently
occurring, not the frequency itself
 Mostly used with qualitative (categorical) data.
The Mode
 The following are the ages (in years) of patients
visiting a hospital with chemical poisoning:
13, 21, 15, 16, 13, 19, 14, 16, 16, 13.
 The most frequently occurring ages are 13 and 16, each with a frequency of three.
◦ The data set has 2 modal values (it is bimodal).
 A small sample consisting of the observations
13, 21, 15, 16, 19
has no modal value, since all values are different.
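A small sketch that returns every modal value, following the slides' convention that a data set in which all values are different has no mode. The function name `modes` is hypothetical. Note that Python's `statistics.multimode` uses a different convention: when every value occurs once, it returns all of them rather than an empty list.

```python
from collections import Counter

def modes(values):
    """Return all modal values in ascending order; [] if every value is distinct."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # all values different: no mode, as on the slide
    return sorted(v for v, c in counts.items() if c == highest)

ages = [13, 21, 15, 16, 13, 19, 14, 16, 16, 13]
print(modes(ages))                  # [13, 16]
print(modes([13, 21, 15, 16, 19]))  # []
```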
Properties of the Mode
 The mode is easy to calculate and understand.
 Its computation is not based on all values, as is the case for the mean.
◦ It is not affected by extremely large or small values.
 It may not be well defined if the data consist of a small number of values:
◦ there can be more than one modal value;
◦ sometimes the data may not have a mode at all.
 It is not capable of further mathematical treatment.
Measures of dispersion
 Measures of variation give information on the spread or variability of the data values.
 This can be done by calculating measures based on percentiles or measures based on the mean.
[Figure: two distributions with the same centre but different variation.]
Quartiles
 Percentiles are one type of quantile.
 The pth percentile is the value below which p% of the observations fall when all the observations are ranked in ascending order.
 After the data are ranked from lowest to highest, they can be divided into quarters (quartiles), each subset containing the same number of observations.
Quartiles
Quartiles are values which divide a series of observations, arranged in ascending order, into 4 equal parts (thus the 2nd quartile is the median).
 The data are ranked and then split into 4 segments with an equal number of values per segment.
 The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger.
 Q2 is the same as the median (50% are smaller, 50% are larger).
 The third quartile, Q3, is the value for which 75% of the observations are smaller and only 25% are larger.
Calculating quartiles
Find a quartile by determining the value in the appropriate position in the ranked data, where:
 First quartile position: Q1 = (n+1)/4
 Second quartile position: Q2 = (n+1)/2 (the median position)
 Third quartile position: Q3 = 3(n+1)/4
 where n is the number of observed values.
Example 1
Find the first, second and third quartiles for the data:
11 16 12 16 17 22 18 13 21
Sample data in ordered array: 11 12 13 16 16 17 18 21 22 (n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values (12 and 13).
Therefore, Q1 = 12.5
 Q1 and Q3 are measures of non-central location.
 Q2 = median, a measure of central location.
Example cont.
 Dataset: 11 12 13 16 16 17 18 21 22
 Q2 is in the (9+1)/2 = 5th position of the ranked data.
Therefore, Q2 = median = 16
 Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data, halfway between the 7th and 8th values (18 and 21).
Therefore, Q3 = 19.5
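The position rule (n+1)/4, (n+1)/2, 3(n+1)/4 can be sketched as below. The slides only show positions ending in .5 (take the value halfway between the neighbours); for other fractional positions this sketch assumes linear interpolation between the two neighbouring values. The function name `quantile_by_position` is hypothetical. For this data set, `statistics.quantiles(data, n=4)` with its default `method='exclusive'` should give the same three values.

```python
def quantile_by_position(values, p):
    """Value at the 1-based position p*(n+1) of the ranked data.
    Assumption: fractional positions are interpolated linearly."""
    ordered = sorted(values)
    n = len(ordered)
    pos = p * (n + 1)  # p = 0.25 gives (n+1)/4, p = 0.75 gives 3(n+1)/4
    # Clamp to the smallest/largest value if the position falls outside 1..n
    if pos <= 1:
        return ordered[0]
    if pos >= n:
        return ordered[-1]
    whole = int(pos)       # whole part of the position, e.g. 2 for 2.5
    frac = pos - whole     # fractional part, e.g. 0.5 for 2.5
    below = ordered[whole - 1]
    above = ordered[whole]
    return below + frac * (above - below)

data = [11, 16, 12, 16, 17, 22, 18, 13, 21]
q1 = quantile_by_position(data, 0.25)
q2 = quantile_by_position(data, 0.50)
q3 = quantile_by_position(data, 0.75)
print(q1, q2, q3)  # 12.5 16.0 19.5
```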
Inter-quartile range
Formula: IQR = Q3 − Q1

 STEP 1
 Find the median (Q2).

 STEP 2
 Divide the data set into two halves: the 50% of values below the median (Q2) and the 50% above it.
 The median of the values below the median (Q2) is Q1.
 The median of the values above the median (Q2) is Q3.

 STEP 3
 Subtract Q1 from Q3.
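The median-of-halves procedure can also be sketched in code. This assumes that, when n is odd, the median itself is left out of both halves (conventions differ on this point), and the function names are illustrative. For the Example 1 data it gives the same Q1 and Q3 as the position method, but the two methods do not always agree on other data sets.

```python
def median_of_sorted(sorted_values):
    # Median of an already ordered, non-empty list
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def iqr_by_halves(values):
    """Return (Q1, Q3, IQR) using the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]             # values below the median (Q2)
    upper = ordered[half + (n % 2):]   # values above it; skip the median if n is odd
    q1 = median_of_sorted(lower)
    q3 = median_of_sorted(upper)
    return q1, q3, q3 - q1

print(iqr_by_halves([11, 16, 12, 16, 17, 22, 18, 13, 21]))  # (12.5, 19.5, 7.0)
```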
Inter-quartile range

• In the example shown, Q1 = 64 and Q3 = 77, so the interquartile range = 77 − 64 = 13; it can also be expressed simply as (IQR 64, 77).
The Range
 The range is the simplest measure of variation: it is the difference between the greatest and least data values.
 Range = Xlargest – Xsmallest
 However, it is more informative to provide the minimum and maximum values rather than the range alone.
 Example: the scores of 6 students in a test are 50, 70, 64, 94, 78, 88. Find the range of values.
 Answer: 94 − 50 = 44. The range can also be expressed as 50–94.
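The range example in Python; this small sketch prints both the single number and the minimum-maximum form.

```python
scores = [50, 70, 64, 94, 78, 88]        # test scores of the 6 students
lowest, highest = min(scores), max(scores)

print(highest - lowest)                  # 44
print(f"{lowest}-{highest}")             # 50-94
```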
Standard Deviation
 Standard deviation (SD) is the most commonly used measure of dispersion.
 It is a measure of the spread of data about the mean.
 The SD is the square root of the sum of squared deviations from the mean divided by the number of observations (for a sample, by n − 1, as explained below).
 It has the same units as the original data.


Standard Deviation (population)
 Steps in calculating the standard deviation

STEP 1
 Calculate the mean as a measure of central location (MEAN).

STEP 2
 Calculate the difference between each observation and the mean (x − MEAN).
STEP 3
 Next, square each difference to obtain the SQUARED DEVIATIONS (x − MEAN)².
What is the effect of this?
 Negative and positive deviations do not cancel each other out.
Standard Deviation (population)
STEP 4
 Sum up these squared deviations to obtain the SUM OF THE SQUARED DEVIATIONS:
 Σ(x − MEAN)²

STEP 5
 Divide this SUM OF THE SQUARED DEVIATIONS by the total number of observations (N) to give the VARIANCE.
 This is a measure of the variability of the data.
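Variance (population)
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
where μ is the population mean (MEAN above) and N is the number of observations.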

Standard Deviation (population)
 STEP 6
• Take the square root of the variance to obtain the standard deviation for the population:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
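Steps 1 to 6 map directly onto code. The sketch below applies them to the tooth decay data from the mean example, assuming (purely for illustration) that those 10 children are the whole population of interest. The function name is hypothetical; `statistics.pvariance` and `statistics.pstdev` compute the same population quantities.

```python
import math

def population_variance_and_sd(values):
    n = len(values)
    mean = sum(values) / n                   # STEP 1: the mean
    deviations = [x - mean for x in values]  # STEP 2: x - MEAN
    squared = [d ** 2 for d in deviations]   # STEP 3: (x - MEAN)^2
    sum_sq = sum(squared)                    # STEP 4: sum of squared deviations
    variance = sum_sq / n                    # STEP 5: divide by N
    sd = math.sqrt(variance)                 # STEP 6: square root
    return variance, sd

var, sd = population_variance_and_sd([2, 7, 3, 11, 4, 6, 5, 9, 0, 1])
print(round(var, 2), round(sd, 2))  # 11.16 3.34
```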
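Variance (sample)
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
where x̄ is the sample mean and n is the number of observations in the sample.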

Standard deviation (sample)
The sample standard deviation is obtained by taking the square root of the sample variance. Therefore, the sample standard deviation is given by:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
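 Why divide by n − 1?
• The sample mean is only an estimate of the true population mean, and the observations lie, on average, closer to their own sample mean than to the true population mean. Dividing by n − 1 instead of n adjusts for this: it makes the sample variance slightly bigger, so that it does not systematically underestimate the population variance.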
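The sample versions differ only in dividing by n − 1. Applied to the tooth decay data, now treated as a sample, this sketch gives a larger variance than dividing by N would (111.6/9 = 12.4 versus 111.6/10 = 11.16). The function name is hypothetical; in Python's standard library, `statistics.variance` and `statistics.stdev` also divide by n − 1.

```python
import math

def sample_variance_and_sd(values):
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    variance = sum_sq / (n - 1)                    # divide by n - 1, not n
    return variance, math.sqrt(variance)

var, sd = sample_variance_and_sd([2, 7, 3, 11, 4, 6, 5, 9, 0, 1])
print(round(var, 2), round(sd, 2))  # 12.4 3.52
```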
Calculating the variance and standard deviation
Features of the standard deviation
• It is always zero or positive; it is NEVER negative.

• It is 0 only when all data values are the same number.

• The larger the SD, the more the data vary.

• It can increase dramatically with the inclusion of outliers.

• Its units (minutes, feet, etc.) are the same as the units of the original values.
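Two of these features can be checked quickly with `statistics.pstdev` (the population SD, which divides by N): identical values give an SD of 0, and replacing K350 by K1500 in the wage data from the mean examples inflates the SD nearly ninefold. Treating the five wages as a population is an assumption made for illustration.

```python
from statistics import pstdev  # population standard deviation (divides by N)

print(pstdev([5, 5, 5, 5]))  # 0.0, since all values are the same

wages = [200, 230, 300, 320, 350]
wages_with_outlier = [200, 230, 300, 320, 1500]
print(round(pstdev(wages), 2))               # 56.21
print(round(pstdev(wages_with_outlier), 2))  # 496.95
```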
Lecture Summary
We have described:
•measures of central tendency: mean, median, mode;
•quartiles Q1, Q2 and Q3;
•measures of variation: range, inter-quartile range, variance and standard deviation.
Any questions?
