Lecture 3-Descriptive Statistics

This lecture covers descriptive statistics, focusing on measures of central tendency (mean, median, mode, midrange) and dispersion (range, variance, standard deviation). It explains the difference between statistics and parameters, and provides examples for calculating these measures using various data sets. Additionally, it introduces the concept of weighted mean and the coefficient of variation, along with practical applications in real-world scenarios.
Copyright © All Rights Reserved
Joao Saldanha University/Inst

Elementary Statistics
(STATS 1)
Lecture 3: Descriptive
Statistics

Lecturer: Januario da Costa, PhD Cand.


Email: [email protected]
Introduction
This section of the lecture presents statistical methods that can be used to summarize data.

You often hear the word average, for example, the average height of people or the median household income. In this context, average refers to the central tendency of the data values. There are various ways to describe central tendency: the mean, median, mode, and midrange are all terms used to describe the middle, or center, of a group of data values.

Besides measuring the center of a set of numbers, we also need to measure its dispersion. Dispersion refers to how far individual data points vary from the middle (e.g., the mean).

Different data sets have different dispersion. The smaller the dispersion, the closer the data are to the mean; in other words, the data are more concentrated in a particular region of the distribution.
Measures of Central Tendency
Statistic vs Parameter
A statistic is a characteristic or measure obtained by using the data values from a sample.
A parameter is a characteristic or measure obtained by using all the data values from a specific population.

For example, "the average height of a sample of 1503 men is 165.2 cm" is a statistic: it describes a sample.

By contrast, "the average height of men in Timor-Leste is 165.2 cm" is a parameter: it describes the whole population.


Intuition of central tendency
Consider the following sets of numbers and determine the center of each:
a. 2 5
b. 6 3 -5
c. 6 8 4 7
d. -1 4 7 5 8

[Figure: number line from -8 to 8]
The Mean
The mean, also known as the arithmetic average, is found by adding the
values of the data and dividing by the total number of values.

The sample mean, denoted by $\bar{x}$, is calculated by the formula:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

where $n$ represents the total number of values in the sample.


Example 1: Calculate the mean of the following:

a. 1 -4 7 -5 8

[Figure: number line from -8 to 8]

b. The numbers of calls that a Timor Telecom customer service representative responded to over a sample of 9 months are shown. Find the mean.
475, 447, 440, 761, 993, 1052, 783, 671, 621
c. During the COVID-19 pandemic, the numbers of patients with symptoms admitted to six different hospitals were as follows. Find the mean.

110 76 29 38 105 31

d. If I ask only 5 people for their height, what is their mean height?

170.5 cm, 162.3 cm, 150 cm, 157 cm, 160.1 cm
e. Find the mean of the following baby birth weights (kg):
3.1 5.3 4.5 3.6 5.6
4.7 3.8 6.1 5.9 4.2
5.6 7.1 5.3 4.8 6.3
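As a cross-check on hand calculations, the mean formula can be sketched in a few lines of Python. The function name `sample_mean` is an illustrative choice, not something defined in the lecture; the data are parts (a), (b), and (e) of Example 1.

```python
def sample_mean(values):
    """Arithmetic mean: the sum of the values divided by n."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Example 1(a)
print(sample_mean([1, -4, 7, -5, 8]))  # 1.4

# Example 1(b): calls answered over a sample of 9 months
calls = [475, 447, 440, 761, 993, 1052, 783, 671, 621]
print(round(sample_mean(calls), 2))  # 693.67

# Example 1(e): baby birth weights (kg)
weights = [3.1, 5.3, 4.5, 3.6, 5.6,
           4.7, 3.8, 6.1, 5.9, 4.2,
           5.6, 7.1, 5.3, 4.8, 6.3]
print(round(sample_mean(weights), 2))  # 5.06
```

The same function works unchanged for the hospital and height data in parts (c) and (d).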
The Median
The median is the midpoint of the data array. The symbol for the median is
MD.

Procedure for finding the median:
Step 1: Arrange the data values in ascending order.
Step 2: Determine the number of values, n, in the data set.
Step 3: a. If n is odd, select the middle data value as the median.
b. If n is even, find the mean of the two middle values. That is, add them and divide the sum by 2.
Example 2: Calculate the median of the following:

a. 1 -4 7 -5 8 3

[Figure: number line from -8 to 8]

b. The numbers of calls that a Timor Telecom customer service representative responded to over a sample of 9 months are shown. Find the median.
475, 447, 440, 761, 993, 1052, 783, 671, 621
c. During the COVID-19 pandemic, the numbers of patients with symptoms admitted to six different hospitals were as follows. Find the median.

110 76 29 38 105 31

d. If I ask only 5 people for their height, what is their median height?

170.5 cm, 162.3 cm, 150 cm, 157 cm, 160.1 cm
e. Find the median of the following baby birth weights (kg):
3.1 5.3 4.5 3.6 5.6
4.7 3.8 6.1 5.9 4.2
5.6 7.1 5.3 4.8 6.3
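The three-step procedure for the median translates directly into code. This is a minimal sketch; the function name `median` is illustrative, and Python's standard `statistics.median` applies the same odd/even rule.

```python
def median(values):
    """Median following the lecture's three-step procedure."""
    ordered = sorted(values)   # Step 1: arrange in ascending order
    n = len(ordered)           # Step 2: count the values
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:             # Step 3a: n odd -> the middle value
        return ordered[mid]
    # Step 3b: n even -> mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, -4, 7, -5, 8, 3]))                            # 2.0
print(median([475, 447, 440, 761, 993, 1052, 783, 671, 621]))  # 671
print(median([110, 76, 29, 38, 105, 31]))                      # 57.0
print(median([170.5, 162.3, 150, 157, 160.1]))                 # 160.1
```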
The Mode
The value that occurs most often in a data set is called the mode.

• A data set that has only one value that occurs with the greatest frequency
is said to be unimodal.
• If a data set has two values that occur with the same greatest frequency,
both values are the mode, and the data set is said to be bimodal.
• If a data set has more than two values that occur with the same greatest
frequency, each value is used as the mode, and the data set is said to be
multimodal.
• When no data value occurs more than once, the data set is said to have no
mode.
Example 3: Calculate the mode of the following:

a. 1 -4 7 -5 8 3 5 7

[Figure: number line from -8 to 8]

b. The numbers of calls that a Timor Telecom customer service representative responded to over a sample of 9 months are shown. Find the mode.
475, 447, 440, 761, 993, 1052, 783, 671, 621
c. During the COVID-19 pandemic, the numbers of patients with symptoms admitted to eight different hospitals were as follows. Find the mode.

110 76 29 38 105 31 29 110

d. If I ask only 8 people for their height, what is the mode?

170.5 cm, 162.3 cm, 150 cm, 157 cm, 160.1 cm, 162.3 cm, 170.5 cm, 157 cm
e. Find the mode of the following baby birth weights (kg):
3.1 5.3 4.5 3.6 5.6
4.7 3.8 6.1 5.9 4.2
5.6 7.1 5.3 4.8 6.3
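The mode definitions above (unimodal, bimodal, multimodal, no mode) can be sketched with a frequency count. The names `modes` and `describe_modality` are illustrative; `collections.Counter` is the standard-library counting class.

```python
from collections import Counter

def modes(values):
    """Every value tied for the highest count, or [] if no value repeats."""
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value occurs more than once: no mode
    return sorted(v for v, c in counts.items() if c == top)

def describe_modality(values):
    """Classify a data set using the lecture's terms."""
    found = modes(values)
    if not found:
        return "no mode"
    if len(found) == 1:
        return "unimodal"
    if len(found) == 2:
        return "bimodal"
    return "multimodal"

examples = {
    "3(a)": [1, -4, 7, -5, 8, 3, 5, 7],
    "3(b)": [475, 447, 440, 761, 993, 1052, 783, 671, 621],
    "3(c)": [110, 76, 29, 38, 105, 31, 29, 110],
    "3(d)": [170.5, 162.3, 150, 157, 160.1, 162.3, 170.5, 157],
}
for label, data in examples.items():
    print(label, modes(data), describe_modality(data))
```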
The Midrange
The midrange is defined as the sum of the lowest and highest values in the
data set, divided by 2. The symbol MR is used for the midrange.
Example 4: Calculate the midrange of the following:

a. 1 -4 7 -5 8 3

[Figure: number line from -8 to 8]

b. The numbers of calls that a Timor Telecom customer service representative responded to over a sample of 9 months are shown. Find the midrange.

475, 447, 440, 761, 993, 1052, 783, 671, 621


c. During the COVID-19 pandemic, the numbers of patients with symptoms admitted to six different hospitals were as follows. Find the midrange.

110 76 29 38 105 31

d. If I ask only 5 people for their height, what is the midrange of their heights?
170.5 cm, 162.3 cm, 150 cm, 157 cm, 160.1 cm
e. Find the midrange of the following baby birth weights (kg):
3.1 5.3 4.5 3.6 5.6
4.7 3.8 6.1 5.9 4.2
5.6 7.1 5.3 4.8 6.3
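The midrange needs only the lowest and highest values. A minimal sketch, with `midrange` as an illustrative name, applied to parts (a) to (d) of Example 4:

```python
def midrange(values):
    """(lowest value + highest value) / 2."""
    if not values:
        raise ValueError("the midrange of an empty data set is undefined")
    return (min(values) + max(values)) / 2

print(midrange([1, -4, 7, -5, 8, 3]))                            # 1.5
print(midrange([475, 447, 440, 761, 993, 1052, 783, 671, 621]))  # 746.0
print(midrange([110, 76, 29, 38, 105, 31]))                      # 69.5
print(midrange([170.5, 162.3, 150, 157, 160.1]))                 # 160.25
```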
5. Calculate the mean, median, mode, and midrange of the following data values.
25 25 28 25 21
28 28 25 26 21
21 27 25 29 29
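Python's standard `statistics` module can be used to check the answers to Exercise 5. `statistics.multimode` (Python 3.8 or later) returns every value tied for the highest count; the standard library has no midrange function, so this sketch computes it directly from `min` and `max`.

```python
import statistics

data = [25, 25, 28, 25, 21,
        28, 28, 25, 26, 21,
        21, 27, 25, 29, 29]

mean = statistics.mean(data)          # sum / n
median = statistics.median(data)      # middle of the sorted values (n = 15 is odd)
modes = statistics.multimode(data)    # all values tied for the highest count
midrange = (min(data) + max(data)) / 2

print(round(mean, 2), median, modes, midrange)
```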
The Weighted Mean
Sometimes you must find the mean of a data set in which not all values are equally represented.
When data values are assigned different weights, we can compute a weighted mean:

$$\bar{X} = \frac{w_1X_1 + w_2X_2 + w_3X_3 + \cdots + w_nX_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{\sum wX}{\sum w}$$
6. Taxi 1

Taxi 2

Taxi 3
7. Computing Grade Point Average. In her first semester of college, a student took five courses. Her final grades, along with the number of credits for each course, were: A (4 credits), A (4 credits), B (4 credits), C (4 credits), and B (4 credits). The grading system assigns quality points to letter grades as follows: A = 4, B = 3, C = 2, D = 1. Compute her grade point average (GPA).
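The weighted-mean formula, applied to Exercise 7, can be sketched as follows. The helper name `weighted_mean` is illustrative; the quality points and credits come from the exercise statement.

```python
def weighted_mean(values, weights):
    """Sum of w*X divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Exercise 7: quality points per letter grade
quality_points = {"A": 4, "B": 3, "C": 2, "D": 1}
grades = ["A", "A", "B", "C", "B"]
credits = [4, 4, 4, 4, 4]   # the weights

gpa = weighted_mean([quality_points[g] for g in grades], credits)
print(gpa)  # 3.2
```

Because every course in this exercise carries 4 credits, the weighted mean equals the ordinary mean of the quality points; the weights only change the answer when the credits differ.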
Distribution Shapes
Review of the Basics
Measures of Variation
Basics of variation
Consider the following groups of numbers and determine which group varies more:
a. 6 8
b. 5 3 9
c. 1 2 1 3

[Figure: number line from -8 to 8]
Intuition of variation
[Figure: two plots, Data set 1 and Data set 2, each with a horizontal axis from 0 to 1200]

Which one is more dispersed?
Data set 2
The Range
The range is the highest value minus the lowest value. The symbol R is
used for the range.

Consider the following sets of numbers and determine the range of each:

a. 6 8 4 7
b. 5 3 9 8

[Figure: number line from -8 to 8]
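A one-line sketch of the range for the two sets above; `data_range` is an illustrative name chosen to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """R = highest value - lowest value."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)

print(data_range([6, 8, 4, 7]))  # 4
print(data_range([5, 3, 9, 8]))  # 6
```

Because it uses only the two extreme values, the range ignores how the values in between are spread, which is why the variance and standard deviation are also needed.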
The Variance
SAMPLE VARIANCE
The sample variance measures the spread of the data in a sample. It is denoted by $s^2$ and calculated as

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

9. Consider the following sets of numbers and determine the variance of each:
a. 6 8 4 7
b. 5 3 9 8

[Figure: number line from -8 to 8]
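The sample-variance formula can be followed step by step in code: find the mean, square each deviation from it, add them up, and divide by n - 1. A minimal sketch with the illustrative name `sample_variance`, applied to Exercise 9:

```python
def sample_variance(values):
    """s^2 = sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

print(round(sample_variance([6, 8, 4, 7]), 4))  # 2.9167
print(round(sample_variance([5, 3, 9, 8]), 4))  # 7.5833
```

Both sets have the same mean (6.25), but set (b) has the larger variance, so it is the more spread out of the two.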
POPULATION VARIANCE
The population variance measures the spread of the data in a population. It is denoted by $\sigma^2$ and calculated as

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

10. Calculate the variance of the ages of the 65 members of parliament.
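The ages of the 65 members of parliament are not reproduced in this text, so the sketch below uses the values from part (a) of Exercise 9 purely as a stand-in population. It computes both versions side by side to show that the only differences are the divisor (N instead of n - 1) and the use of $\mu$ in place of $\bar{x}$. Function names are illustrative.

```python
def population_variance(values):
    """sigma^2 = sum of squared deviations from mu, divided by N."""
    N = len(values)
    if N == 0:
        raise ValueError("population variance needs at least one value")
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

def sample_variance(values):
    """s^2 = sum of squared deviations from x-bar, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

stand_in = [6, 8, 4, 7]   # placeholder data, NOT the parliament ages
print(population_variance(stand_in))        # 2.1875
print(round(sample_variance(stand_in), 4))  # 2.9167
```

With real data, the 65 ages would be passed as one list to `population_variance`, since all members of parliament form the whole population.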
The Standard Deviation
SAMPLE STANDARD DEVIATION
The sample standard deviation is the square root of the sample variance, so it expresses the spread of the sample data in the same units as the data. It is denoted by $s$:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
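POPULATION STANDARD DEVIATION
The population standard deviation is the square root of the population variance, so it expresses the spread of the population data in the same units as the data. It is denoted by $\sigma$:

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

12. Calculate the standard deviation of the ages of the 65 members of parliament.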
13. Calculate the range, standard deviation, and variance of the following data set.
25 25 28 25 21
28 28 25 26 21
21 27 25 29 29
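A sketch of Exercise 13, treating the 15 values as a sample (an assumption; the exercise does not say whether they are a sample or a population). It uses only `math.sqrt` from the standard library.

```python
import math

data = [25, 25, 28, 25, 21,
        28, 28, 25, 26, 21,
        21, 27, 25, 29, 29]

n = len(data)
x_bar = sum(data) / n

data_range = max(data) - min(data)                        # R = highest - lowest
variance = sum((x - x_bar) ** 2 for x in data) / (n - 1)  # s^2, divisor n - 1
std_dev = math.sqrt(variance)                             # s = square root of s^2

print(data_range, round(variance, 4), round(std_dev, 4))  # 8 7.6952 2.774
```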
Measures of Position
Coefficient of Variation

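The coefficient of variation (or CV) for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean, and is given by the following:

$$\text{Sample: } CV = \frac{s}{\bar{x}} \times 100\% \qquad \text{Population: } CV = \frac{\sigma}{\mu} \times 100\%$$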
Example 14:
A survey was done to record the heights and weights of adults. The mean height was and the standard deviation was . The mean weight was and the standard deviation was
a. Calculate the coefficient of variation (C.V).
b. Compare the two using their respective C.V. values.

a. Coefficient of Variation (C.V):
Heights:
Weights:

b. When we compare the C.V. of the two, we can see that the answer makes sense. If we go outside and observe people, it is obvious that weights vary much more than heights.
Z-Scores
Z-Score: the number of standard deviations that a given value is above or below the mean. When we standardize a distribution in this way, its mean becomes zero.

Sample:
$$z = \frac{x - \bar{x}}{s}$$
Population:
$$z = \frac{x - \mu}{\sigma}$$

Sales ($):
47.60 60.70 54.77 52.26 52.57
56.43 61.74 57.09 56.45 57.56
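A sketch that standardizes the sales figures above, assuming they are a sample (so $\bar{x}$ and $s$ are used). It relies on the standard-library `statistics.mean` and `statistics.stdev`; the final line illustrates that the standardized values average to zero.

```python
import statistics

sales = [47.60, 60.70, 54.77, 52.26, 52.57,
         56.43, 61.74, 57.09, 56.45, 57.56]

x_bar = statistics.mean(sales)   # sample mean
s = statistics.stdev(sales)      # sample standard deviation (n - 1 divisor)

# Sample z-score of each value: (x - x_bar) / s
z_scores = [(x - x_bar) / s for x in sales]

for x, z in zip(sales, z_scores):
    print(f"{x:6.2f}  z = {z:+.2f}")

# The standardized values have mean 0 (up to floating-point rounding)
print(round(sum(z_scores) / len(z_scores), 10))
```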
Empirical Rule for Bell-Shaped Distributions
Example 15:
Isabel and her brother Mario are very competitive when it comes to grades. Both of them are enrolled at JSU. In semester 2, Isabel took a CALCULUS class, and her grade was 80%; the mean grade for that class was with a standard deviation of . Her brother Mario took a STATISTICS class, and his grade was 91%; the mean grade for the class was % with a standard deviation of . Their parents have prepared a present of $100 for whoever has the higher grade. Who should win the money?
z Scores, Unusual Values, and Outliers

Example 16:
A researcher administered an IQ test to 1000 adults in Timor-Leste. The mean IQ was 90 and the standard deviation was 10.1. On the next day, he tested 4 random people, and the results were as follows:
a. Antonio IQ = 94
b. Maria IQ = 86
c. Jorge IQ = 130
d. Angelina IQ = 60

Describe each case


Mean = 90; Std. Dev = 10.1
a. Antonio IQ = 94
b. Maria IQ = 86
c. Jorge IQ = 130
d. Angelina IQ = 60
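A sketch that computes each person's z-score from the stated mean (90) and standard deviation (10.1). The cut-off used to call a value "unusual" is an ASSUMPTION: |z| ≥ 2 is a common textbook convention but is not stated in this section.

```python
MEAN_IQ = 90
SD_IQ = 10.1

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

people = {"Antonio": 94, "Maria": 86, "Jorge": 130, "Angelina": 60}

for name, iq in people.items():
    z = z_score(iq, MEAN_IQ, SD_IQ)
    # ASSUMPTION: |z| >= 2 is treated as "unusual" (common convention,
    # not stated in this section of the lecture).
    label = "unusual" if abs(z) >= 2 else "not unusual"
    print(f"{name:<9} IQ = {iq:3d}  z = {z:+.2f}  ({label})")
```

Under that convention, Antonio (z ≈ +0.40) and Maria (z ≈ -0.40) are ordinary, while Jorge (z ≈ +3.96) and Angelina (z ≈ -2.97) stand out.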
Percentiles
Percentiles are measures of location, denoted $P_1, P_2, \ldots, P_{99}$, which divide a set of data into 100 groups with about 1% of the values in each group.

For example, the 50th percentile, denoted $P_{50}$, has about 50% of the data values below it and about 50% of the data values above it. So the 50th percentile is the same as the median.

$$\text{Percentile of value } x = \frac{\text{number of values less than } x}{\text{total number of values}} \times 100\%$$
Example 17:
Consider the following data set:
1 2 2 3 4
5 7 9 9 10
12 14 16 17 18
a. What percentile is 15?
b. What percentile is 4?
c. Find the
d. Find
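The percentile-of-a-value formula counts how many values lie strictly below x. A minimal sketch with the illustrative name `percentile_of_value`, using the Example 17 data:

```python
def percentile_of_value(data, x):
    """(number of values less than x) / (total number of values) * 100."""
    if not data:
        raise ValueError("data must not be empty")
    below = sum(1 for v in data if v < x)
    return below / len(data) * 100

data = [1, 2, 2, 3, 4, 5, 7, 9, 9, 10, 12, 14, 16, 17, 18]

print(round(percentile_of_value(data, 4)))   # 27 (4 of 15 values are below 4)
print(round(percentile_of_value(data, 10)))  # 60 (9 of 15 values are below 10)
```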
Example 18:
The frequency distribution for the systolic blood pressure readings (in millimeters of mercury, mm Hg) of 200 randomly selected college students is shown here.
a. Find the percentile rank of a blood pressure reading of 130.
b. Find the value that corresponds to the 40th percentile.

Class Boundaries   Frequency   Relative Frequency   Cumulative Relative Frequency
89.5-104.5         24          12%                  12%
104.5-119.5        62          31%                  43%
119.5-134.5        72          36%                  79%
134.5-149.5        26          13%                  92%
149.5-164.5        12          6%                   98%
164.5-179.5        4           2%                   100%
TOTAL              200
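For grouped data, individual readings are unknown, so a percentile has to be estimated. The sketch below ASSUMES values are spread evenly within each class and interpolates linearly, which is equivalent to reading off a cumulative-frequency graph (ogive) drawn with straight segments; the lecture does not state which method it expects. Function names are illustrative.

```python
# (lower boundary, upper boundary, frequency) for each class in Example 18
classes = [
    (89.5, 104.5, 24),
    (104.5, 119.5, 62),
    (119.5, 134.5, 72),
    (134.5, 149.5, 26),
    (149.5, 164.5, 12),
    (164.5, 179.5, 4),
]
TOTAL = sum(f for _, _, f in classes)  # 200 students

def percentile_rank(x):
    """Percent of readings below x, interpolating linearly within x's class."""
    below = 0.0
    for lower, upper, freq in classes:
        if x >= upper:            # whole class lies below x
            below += freq
        elif x > lower:           # x falls inside this class
            below += freq * (x - lower) / (upper - lower)
    return below / TOTAL * 100

def value_at_percentile(p):
    """Reading below which about p percent of the students fall."""
    target = p / 100 * TOTAL
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= target:   # target count is reached in this class
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    return classes[-1][1]

print(round(percentile_rank(130), 1))     # 68.2
print(round(value_at_percentile(40), 2))  # 118.05
```

As a sanity check, `percentile_rank(119.5)` returns 43.0, matching the cumulative column of the table.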
Quartiles
Quartiles are measures of location, denoted $Q_1$, $Q_2$, and $Q_3$, which divide a set of data into four groups with about 25% of the values in each group.

$Q_1$ (First quartile): Separates the bottom 25% of the sorted values from the top 75%. (To be more precise, at least 25% of the sorted values are less than or equal to $Q_1$, and at least 75% of the values are greater than or equal to $Q_1$.)
$Q_2$ (Second quartile): Same as the median; separates the bottom 50% of the sorted values from the top 50%.
$Q_3$ (Third quartile): Separates the bottom 75% of the sorted values from the top 25%. (To be more precise, at least 75% of the sorted values are less than or equal to $Q_3$, and at least 25% of the values are greater than or equal to $Q_3$.)
EXAMPLE 19:
Consider the following data set:
1 2 2 3 4
5 7 9 9 10
12 14 16 17 18

a. Find the
b. Find the IQR
EXAMPLE 19:
Consider the following data set:
1 2 2 3 4
5 7 9 9 10
12 14 16 17 18 20

a. Find the
b. Find the IQR
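Several conventions exist for computing quartiles. The sketch below ASSUMES the "median of halves" method: $Q_2$ is the median, and $Q_1$ and $Q_3$ are the medians of the lower and upper halves, leaving out the middle value when n is odd. It is applied to both versions of Example 19. Function names are illustrative.

```python
def median_of_sorted(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """(Q1, Q2, Q3) by the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    lower = data[: n // 2]         # values below the median position
    upper = data[(n + 1) // 2 :]   # values above the median position
    return median_of_sorted(lower), median_of_sorted(data), median_of_sorted(upper)

first = [1, 2, 2, 3, 4, 5, 7, 9, 9, 10, 12, 14, 16, 17, 18]
second = first + [20]

for values in (first, second):
    q1, q2, q3 = quartiles(values)
    print(q1, q2, q3, "IQR =", q3 - q1)
```

Other methods can give slightly different answers: for the 16-value set, Python's `statistics.quantiles(second, n=4)` (default `"exclusive"` method) returns 3.25 for $Q_1$ rather than 3.5.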
5-Number Summary and Box Plot
The values of the three quartiles are used for the 5-number summary and the construction of boxplot graphs.

For a set of data, the 5-number summary consists of the minimum value, the first quartile $Q_1$, the median (or second quartile $Q_2$), the third quartile $Q_3$, and the maximum value.

A boxplot (or box-and-whisker diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile $Q_1$, the median, and the third quartile $Q_3$.
EXAMPLE 20:
Consider the following data set:
1 2 2 3 4
5 7 9 9 10
12 14 16 17 18
a. Find the 5-number summary
b. Construct a box plot
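A sketch of the 5-number summary for Example 20, ASSUMING the same median-of-halves quartile convention (the lecture does not name one). The function names are illustrative.

```python
def median_of_sorted(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    lower = data[: n // 2]         # lower half (median excluded when n is odd)
    upper = data[(n + 1) // 2 :]   # upper half
    return (data[0], median_of_sorted(lower), median_of_sorted(data),
            median_of_sorted(upper), data[-1])

data = [1, 2, 2, 3, 4, 5, 7, 9, 9, 10, 12, 14, 16, 17, 18]
print(five_number_summary(data))  # (1, 3, 9, 14, 18)
```

These five numbers are exactly what a box plot draws. Plotting libraries may compute quartiles with a different interpolation rule, so their box edges can differ slightly from hand-calculated values.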
Outliers
We can consider outliers to be data values meeting specific criteria based on the quartiles and the interquartile range (IQR = Q3 - Q1).
A data value is considered an outlier if it is:
Above Q3 by an amount greater than 1.5 x IQR, or
Below Q1 by an amount greater than 1.5 x IQR.


EXAMPLE 21:
Consider the following data set:
1 2 2 3 4
5 7 9 9 10
12 14 16 17 18
Find if there is an outlier
EXAMPLE 22:
Consider the following data set:
10 12 11 15
11 14 13 13
12 22 14 14
Find if there is an outlier
EXAMPLE 23:
Consider the following data set:
1 13 14 15 18 19
1 13 14 16 18
6 14 15 18 18
a. Develop a five-number summary and a box plot.
b. Determine if there is an outlier.
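The 1.5 x IQR rule can be checked for Examples 21 to 23 with a short sketch. It ASSUMES the median-of-halves quartile method; a different quartile convention can move the fences slightly. The function names are illustrative.

```python
def median_of_sorted(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def find_outliers(values):
    """Values more than 1.5 * IQR below Q1 or more than 1.5 * IQR above Q3."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    q1 = median_of_sorted(data[: n // 2])
    q3 = median_of_sorted(data[(n + 1) // 2 :])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

example_21 = [1, 2, 2, 3, 4, 5, 7, 9, 9, 10, 12, 14, 16, 17, 18]
example_22 = [10, 12, 11, 15, 11, 14, 13, 13, 12, 22, 14, 14]
example_23 = [1, 13, 14, 15, 18, 19, 1, 13, 14, 16, 18, 6, 14, 15, 18, 18]

print(find_outliers(example_21))  # []
print(find_outliers(example_22))  # [22]
print(find_outliers(example_23))  # [1, 1]
```

For Example 22, Q1 = 11.5 and Q3 = 14, so the upper fence is 14 + 1.5 x 2.5 = 17.75 and the value 22 is flagged.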
Descriptive Statistics with
Excel
EXAMPLE 24: Living Expenses
A study was conducted to determine the living expenses of single adults in Dili. A sample of 300 was collected, as shown in the Excel file “Lecture 3”, sheet “ex24”.
a. Use Excel to present the mean, median, mode, standard deviation, and variance.
b. Find the 5-number summary.
c. Construct a box plot.
d. Determine whether there is an outlier. Yes, there are outliers, as shown in the box plot.
5-Number Summary
Min 70
Q1 139.54
Q2 153.19
Q3 167.53
Max 209.69

Mean 153.33
Median 152.90
Mode #N/A
Std. Dev 22.05
Variance 485.99
Example 25: Teacher’s Salary
The following data are the salaries of workers in the education sector, at all levels of job position, in 2011 (table on the left). Suppose the average living expenses for a typical family of four are as shown in the table on the right.
Spending on Monthly ($) Yearly ($)
Housing 80 960
Food 60 720
Childcare 30 360
Transportation 85 1020
Healthcare 20 240
Savings and Investment 50 600
Other Necessities 100 1200
TOTAL 425 5100
First, assume you work for the high-level education council, and you wish to increase salaries so that all workers will have enough money to spend in their daily lives. Your research team conducted a study and collected the salaries of 200 teachers, as presented in the Excel file “Lecture 3”, sheet “ex25”.

Use descriptive statistics techniques to decide the minimum salary a teacher should get.
