Topic 1 Descriptive Statistics SV
Topic 1
Descriptive Statistics
Content
1.1 What is Statistics?
1.2 Population Versus Sample
1.3 Basic Terms
1.4 Types of Variables
1.5 Raw Data
1.6 Organizing and Graphing Qualitative Data
1.7 Organizing and Graphing Quantitative Data
1.8 Shapes of Histograms
1.9 Cumulative Frequency Distributions
1.10 Stem-and-Leaf Displays
1.1
What is Statistics?
1st Meaning of Statistics
The word ‘statistics’ has 2 meanings.
1. Statistics refers to numerical facts.
2nd Meaning of Statistics
2. Statistics refers to the field or
discipline of study.
Statistics is a group of methods used to
collect, analyze, present, and interpret
data and to make decisions.
1.2
Population Versus Sample
Population or Target Population
Consists of all elements (individuals, items,
or objects) whose characteristics are being
studied.
Sample
A portion of the population selected for
study.
Illustration
1.3
Basic Terms
Definition
Element or Member
An element or member of a sample or
population is a specific subject or
object (e.g. a person, firm, item, state, or
country) about which the information is
collected.
Variable
A variable is a characteristic under study that
assumes different values for different elements.
Definition
Observation or Measurement
The value of a variable for an element.
Data Set
A data set is a collection of observations on
one or more variables.
SUMMARY
Population or Target Population
Consists of all elements (individuals, items, or objects) whose characteristics are
being studied.
Sample
A portion of the population selected for study.
Element or Member
An element or member of a sample or population is a specific subject or object
(e.g. a person, firm, item, state, or country) about which the information is
collected.
Variable
A variable is a characteristic under study that assumes different values for
different elements.
Observation or Measurement
The value of a variable for an element.
Data Set
A data set is a collection of observations on one or more variables.
Example
Problem
The following table gives the scores of five
students on a statistics test.
Student    Score
Kevin      83
Susan      91
David      78
Jeff       69
Johan      87
i) What is the variable for this data set?
ii) How many observations does this data set contain?
iii) How many elements does this data set contain?
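The three questions can be checked with a short Python sketch. The names `scores`, `variable`, `n_elements`, and `n_observations` are illustrative assumptions; the data come from the table above. Each student is an element, the score is the variable, and each recorded score is one observation.

```python
# Each key is an element (a student); each value is one observation
# of the variable "score" for that element.
scores = {"Kevin": 83, "Susan": 91, "David": 78, "Jeff": 69, "Johan": 87}

variable = "score"                      # the characteristic under study
n_elements = len(scores)                # number of students
n_observations = len(scores.values())   # one recorded score per student

print(variable, n_elements, n_observations)
```

With one variable recorded per student, the number of observations equals the number of elements.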
Solution
1.4
Types of Variables
Quantitative Variables
Definition
Quantitative Variables
a) Discrete Variable
A variable whose values are countable is
called a discrete variable. In other words,
a discrete variable can assume only
certain values with no intermediate values.
Quantitative Variables
b) Continuous Variable
A variable that can assume any numerical
value over a certain interval is called a
continuous variable.
Example:
The height of a person.
The time taken to complete an examination.
The yield of potatoes (in pounds) per acre.
Qualitative / Categorical Variables
Definition
• A variable that cannot assume a numerical value
but can be classified into two or more nonnumeric
categories.
• The data collected on such a variable are called
qualitative data.
Exercise
Determine whether each of the following is a population or a
sample, and then identify each variable as qualitative,
quantitative discrete, or quantitative continuous.
Solution
Illustration
1.5
Raw Data
Definition
RAW DATA
Data recorded in the sequence in which
they are collected and before they are
processed or ranked are called raw data.
Raw Data (quantitative data)
Raw Data (qualitative data)
1.6
Organizing &
Graphing Qualitative
Data
Example 1
A sample of 30 employees were asked how stressful their
jobs were. Their responses are recorded below.
Example 1 (Solution)
Relative Frequency &
Percentage Distributions
Tabular arrangement that lists the
relative frequencies and percentages
for all categories.
\[ \text{Relative frequency of a category} = \frac{\text{frequency of that category}}{\text{sum of all frequencies}} = \frac{f}{\sum f} \]
\[ \text{Percentage} = (\text{relative frequency}) \times 100 \]
Example 1 (Solution)
f
10
14
6
Sum = 30
Exercise
The following data give the grades of 20
students on a Mathematics test.
A C A B F
B A B C B
A B C F A
B B C C B
a) Construct a frequency distribution table.
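One way to check a hand-built table is a short Python sketch like the one below. It tallies the 20 grades listed above and computes the relative frequency (f divided by the sum of all frequencies) and the percentage (relative frequency times 100) for each grade. The variable names `grades`, `freq`, and `table` are illustrative assumptions.

```python
from collections import Counter

# The 20 grades from the exercise, row by row.
grades = ("A C A B F "
          "B A B C B "
          "A B C F A "
          "B B C C B").split()

freq = Counter(grades)          # frequency f of each category
total = sum(freq.values())      # sum of all frequencies

table = {}
for grade in sorted(freq):
    f = freq[grade]
    rel = f / total             # relative frequency
    table[grade] = (f, rel, rel * 100)
    print(f"{grade}  {f:2d}  {rel:.2f}  {rel * 100:.0f}%")
print("Sum", total)
```

The relative frequencies always add up to 1 and the percentages to 100, which is a quick consistency check on any table of this kind.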
Exercise
Frequency, Relative Frequency, Percentage
Distributions Table of Students’ Status
Status    Frequency    Relative Frequency    Percentage
F
SO
J
SE
sum
Revision exercise
In a survey, 120 Malaysian adults were asked to rate their health.
The table below summarizes their responses.
State of Health Percentage of Response
Excellent 17.5
Very good 37.5
Good 32.5
Fair 10.0
Poor 2.5
Single-Valued Classes
Single-valued classes are used if the observations in a data
set assume only a few distinct (integer) values
(i.e., classes are made of single values and not of
intervals).
Example 2
The Number of Vehicles Owned by 40 Households
from a City
5 1 1 2 0 1 1 2 1 1
1 3 3 0 2 5 1 2 3 4
2 1 2 2 1 2 2 1 1 1
4 2 1 1 2 1 1 4 1 3
Construct a frequency distribution table for these data.
Example 2 (Solution)
Vehicles Owned    Number of Households (f)
0 2
1 18
2 11
3 4
4 3
5 2
Sum 40
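The single-valued frequency table above can be reproduced with a brief Python sketch. The names `vehicles` and `freq` are illustrative assumptions; the 40 values are the ones listed in Example 2.

```python
from collections import Counter

# Number of vehicles owned by each of the 40 households (Example 2).
vehicles = [
    5, 1, 1, 2, 0, 1, 1, 2, 1, 1,
    1, 3, 3, 0, 2, 5, 1, 2, 3, 4,
    2, 1, 2, 2, 1, 2, 2, 1, 1, 1,
    4, 2, 1, 1, 2, 1, 1, 4, 1, 3,
]

# Each distinct value forms its own class.
freq = Counter(vehicles)
for value in sorted(freq):
    print(value, freq[value])
print("Sum", sum(freq.values()))
```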
Bar Graph
Grouped Frequency Distribution
Example 3
Weekly Earnings of 100 Employees of a Company
401 410 448 450 490 505 521 555 600 601
605 610 620 625 630 650 678 680 685 690
700 725 750 760 770 780 785 790 795 798
800 801 805 809 810 810 814 815 820 825
828 830 835 840 845 850 855 860 865 870
880 888 890 895 900 910 920 930 935 940
950 956 959 960 965 967 970 980 995 1000
1010 1020 1030 1055 1068 1070 1079 1090 1100 1110
1120 1130 1155 1167 1180 1230 1250 1259 1270 1290
1300 1320 1350 1400 1410 1460 1500 1541 1560 1600
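As a sketch of how these 100 values can be grouped, the Python snippet below counts how many earnings fall in each class. The class limits 401–600, 601–800, ..., 1401–1600 are taken from the frequency table that appears later in this topic; the names `earnings`, `lower_limits`, and `freq` are illustrative assumptions.

```python
# Weekly earnings of the 100 employees (Example 3).
earnings = [
    401, 410, 448, 450, 490, 505, 521, 555, 600, 601,
    605, 610, 620, 625, 630, 650, 678, 680, 685, 690,
    700, 725, 750, 760, 770, 780, 785, 790, 795, 798,
    800, 801, 805, 809, 810, 810, 814, 815, 820, 825,
    828, 830, 835, 840, 845, 850, 855, 860, 865, 870,
    880, 888, 890, 895, 900, 910, 920, 930, 935, 940,
    950, 956, 959, 960, 965, 967, 970, 980, 995, 1000,
    1010, 1020, 1030, 1055, 1068, 1070, 1079, 1090, 1100, 1110,
    1120, 1130, 1155, 1167, 1180, 1230, 1250, 1259, 1270, 1290,
    1300, 1320, 1350, 1400, 1410, 1460, 1500, 1541, 1560, 1600,
]

lower_limits = range(401, 1601, 200)   # 401, 601, ..., 1401

freq = {}
for lower in lower_limits:
    upper = lower + 199                # upper class limit, e.g. 600
    # Count the values lying between the two class limits, inclusive.
    freq[(lower, upper)] = sum(lower <= x <= upper for x in earnings)

for (lower, upper), f in freq.items():
    print(f"{lower} - {upper}: {f}")
print("Sum", sum(freq.values()))
```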
Relative Frequency &
Percentage Distributions
Illustration 1
Illustration 2
Class Boundaries f
134.5 – 156.5 10
156.5 – 178.5 3
178.5 – 200.5 7
200.5 – 222.5 6
222.5 – 244.5 4
Sum = 30
Example 4
Example 4 (Solution)
Find the frequency, relative frequency and
percentage for all classes.
Age    Frequency    Relative Frequency    Percentage
18 – 21
22 – 25
26 – 29
30 – 33
34 – 37
sum
Definition
Class
An interval that includes all the values that fall within
two numbers, the lower and upper limits
Class limits
Endpoints of each interval
Class Boundary
The dividing line between two classes. It is given
by the midpoint of the upper limit of one class and
the lower limit of the next higher class.
Definition
Class width / class size
The difference between the upper and lower
class boundaries.
Example 5
Example 5 (Solution)
Class Boundaries
400.5 – 600.5
600.5 – 800.5
800.5 – 1000.5
1000.5 – 1200.5
1200.5 – 1400.5
1400.5 – 1600.5
Example 6
Class interval    Lower boundary         Upper boundary         Midpoint              Class width
11 – 15           (10 + 11)/2 = 10.5     (15 + 16)/2 = 15.5     (11 + 15)/2 = 13      15.5 – 10.5 = 5
16 – 20           (15 + 16)/2 = 15.5     (20 + 21)/2 = 20.5     (16 + 20)/2 = 18      20.5 – 15.5 = 5
21 – 25           (20 + 21)/2 = 20.5     (25 + 26)/2 = 25.5     (21 + 25)/2 = 23      25.5 – 20.5 = 5
26 – 30           (25 + 26)/2 = 25.5     (30 + 31)/2 = 30.5     (26 + 30)/2 = 28      30.5 – 25.5 = 5
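The calculations in Example 6 can be sketched in Python as follows. The function name `class_summary` is hypothetical, and the sketch assumes that the classes are listed in increasing order and that the gap between consecutive classes is the same throughout (1 in this example), so that half the gap is subtracted from each lower limit and added to each upper limit.

```python
def class_summary(limits):
    """Return (lower boundary, upper boundary, midpoint, width) per class.

    `limits` is a list of (lower limit, upper limit) pairs in increasing
    order; a constant gap between consecutive classes is assumed.
    """
    gap = limits[1][0] - limits[0][1]      # e.g. 16 - 15 = 1
    rows = []
    for lower, upper in limits:
        lower_boundary = lower - gap / 2   # midpoint with the previous upper limit
        upper_boundary = upper + gap / 2   # midpoint with the next lower limit
        midpoint = (lower + upper) / 2
        width = upper_boundary - lower_boundary
        rows.append((lower_boundary, upper_boundary, midpoint, width))
    return rows


rows = class_summary([(11, 15), (16, 20), (21, 25), (26, 30)])
for row in rows:
    print(row)
```

The first row printed is `(10.5, 15.5, 13.0, 5.0)`, matching the first line of the Example 6 table.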
Exercise
Class interval    Lower boundary    Upper boundary    Midpoint    Class size
0–9
10 – 19
20 – 29
30 – 39
Solution
Class interval    Lower boundary    Upper boundary    Midpoint    Class size
0 – 9             -0.5              9.5               4.5         10
Exercise
Find the class boundaries and class limits.
a) Number of books 2–3 4–5 6–7 8 – 9 10 – 11
Frequency 10 12 8 4 2
Solution
(a)
Number of books    Frequency    Class boundaries    Class limits
2–3 10
4–5 12
6–7 8
8–9 4
10 – 11 2
Solution
(b)
Weight (kg) frequency Class boundaries Class limit
40 – <50 10
50 – <60 12
60 – <70 8
70 – <80 4
80 – <90 2
Revision exercise
The following table gives the frequency
distribution of ages for all 50 employees of a
company.
Age No. of Employees
18 to 30 12
31 to 43 19
44 to 56 14
57 to 69 5
Revision exercise
a) Find the class boundaries and class midpoints.
Solution
Age    Class boundaries    Class width    Midpoint    Frequency    Relative frequency    Percentage (%)
18 – 30 12
31 – 43 19
44 – 56 14
57 – 69 5
Sum = 50
Graphing Grouped Data
Grouped (quantitative) data can be
displayed in a histogram or a polygon.
Histogram
Three types of histogram
1. Frequency histogram
2. Relative frequency histogram
3. Percentage histogram
Histogram
• A histogram is a graph in which class boundaries
are marked on the horizontal (x) axis & the
frequencies, relative frequencies, or
percentages are marked on the vertical (y) axis.
Percentage Histogram
Polygon
• A graph formed by joining the midpoints of
the tops of successive bars in a histogram.
Polygon
Marks        20 – 29    30 – 39    40 – 49    50 – 59    60 – 69    70 – 79    80 – 89
Frequency    22         18         22         24         14         14         20
Example 7 (Solution)
Marks    Class boundaries    Frequency
20 – 29 19.5 – 29.5 22
30 – 39 29.5 – 39.5 18
40 – 49 39.5 – 49.5 22
50 – 59 49.5 – 59.5 24
60 – 69 59.5 – 69.5 14
70 – 79 69.5 – 79.5 14
80 – 89 79.5 – 89.5 20
Example 7 (Solution)
Exercise
The table below shows the ages distribution for 30
participants in a game. Draw a histogram for frequency
distribution.
Solution
Age    Class boundaries    Frequency
6 – 10 5.5 – 10.5 2
11 – 15 10.5 – 15.5 7
16 – 20 15.5 – 20.5 8
21 – 25 20.5 – 25.5 6
26 – 30 25.5 – 30.5 3
31 – 35 30.5 – 35.5 4
[Figure: Histogram for the frequency distribution for the age (years) of 30
participants in a game. Horizontal axis: Age, marked at the class boundaries
5.5, 10.5, 15.5, 20.5, 25.5, 30.5, 35.5. Vertical axis: Frequency.]
Revision exercise
Weekly Earnings of 100 Employees of a Company
Weekly Earnings (dollars)    Number of Employees (f)
401 – 600 9
601 – 800 22
801 – 1000 39
1001 – 1200 15
1201 – 1400 9
1401 – 1600 6
Sum 100
Construct a histogram and a polygon for the frequency distribution.
Revision exercise
The marks obtained by 120 students in an
examination are recorded in the following
table.
Marks 20-29 30-39 40-49 50-59 60-69 70-79 80-89
Frequency 12 18 24 20 18 16 12
1.8
Shapes of Histograms
Symmetric Histogram
Skewed Histogram
It is asymmetric and the tail on one side is
longer than the tail on the other side.
1.9
Cumulative Frequency
Distributions
Definition
A cumulative frequency distribution
gives the total number of values that fall
below the upper boundary of each class.
Example 8
Prepare a cumulative frequency distribution for the
following frequency distribution.
Example 8 (Solution)
f
10
3
7
6
4
Sum = 30
Cumulative Relative Frequency &
Cumulative Percentage
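\[ \text{Cumulative relative frequency} = \frac{\text{cumulative frequency of a class}}{\text{total observations in the data set}} \]
\[ \text{Cumulative percentage} = (\text{cumulative relative frequency}) \times 100 \]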
Example 8 (Solution)
c.f.
10
13
20
26
30
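The cumulative frequencies above are running totals of the class frequencies 10, 3, 7, 6, 4. A minimal Python sketch, with the illustrative names `f`, `cf`, `cum_rel`, and `cum_pct`, also derives the cumulative relative frequencies and, by multiplying each by 100, the cumulative percentages.

```python
from itertools import accumulate

f = [10, 3, 7, 6, 4]              # class frequencies from Example 8

cf = list(accumulate(f))          # running totals: cumulative frequencies
total = cf[-1]                    # total observations in the data set

cum_rel = [c / total for c in cf]      # cumulative relative frequencies
cum_pct = [100 * r for r in cum_rel]   # cumulative percentages

for c, r, p in zip(cf, cum_rel, cum_pct):
    print(f"{c:3d}  {r:.3f}  {p:.1f}")
```

The last cumulative frequency always equals the total number of observations, so the last cumulative relative frequency is 1.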
Ogive
An ogive is a curve drawn for the cumulative
frequency distribution by joining with straight
lines the dots marked above the upper
boundaries of classes at heights equal to the
cumulative frequencies of respective classes.
Cumulative Frequency Curve
(Ogive)
There are two types of cumulative frequency
curves:
1) ‘less than’ cumulative frequency curve
2) ‘more than’ cumulative frequency curve
Example 9
Construct a ‘less than’ ogive for the data below.
Example 9 (Solution)
Example 10
Using the data given below, construct a ‘less than’ cumulative
frequency distribution and draw the ogive.
Marks 1 – 10 11 – 20 21 – 30 31 – 40 41 – 50 51 – 60 61 – 70 71 – 80
Number of
3 8 12 14 10 6 5 2
Students ( f )
‘Less than’ Cumulative Frequency Distribution
Marks      Frequency    Upper boundary    ‘Less than’ cumulative frequency
                        Less than 0.5     0
1 – 10     3            < 10.5            3
11 – 20    8            < 20.5            11
21 – 30    12           < 30.5            23
31 – 40    14           < 40.5            37
41 – 50    10           < 50.5            47
51 – 60    6            < 60.5            53
61 – 70    5            < 70.5            58
71 – 80    2            < 80.5            60
Sum        60
“Less than” ogive for the cumulative frequency distribution for the
marks scored by 60 students
Example 10 (Solution)
(i) Approximately 52 students score less than 60 marks.
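Reading a value off the ogive amounts to linear interpolation between the plotted points, because the points are joined with straight lines. The sketch below is one way to do this in Python; the function name `read_ogive` is hypothetical, and the upper boundaries and cumulative frequencies are those of Example 10. The interpolated result at 60 marks is about 52.7, close to the roughly 52 students read from the graph.

```python
# Points of the 'less than' ogive: (upper boundary, cumulative frequency).
boundaries = [0.5, 10.5, 20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5]
cum_freq = [0, 3, 11, 23, 37, 47, 53, 58, 60]


def read_ogive(x):
    """Estimate how many values are less than x by linear interpolation."""
    for i in range(len(boundaries) - 1):
        x0, x1 = boundaries[i], boundaries[i + 1]
        y0, y1 = cum_freq[i], cum_freq[i + 1]
        if x0 <= x <= x1:
            # Straight-line segment joining two neighbouring ogive points.
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the range of the ogive")


estimate = read_ogive(60)
print(round(estimate, 1))
```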
1.10
Stem-and-Leaf Displays
Definition
In a stem-and-leaf display of quantitative
data, each value is divided into two portions
– a stem and a leaf.
Example 11
The following are the scores of 30 college students
on a statistics test.
75 52 80 96 65 79 71 87 93 95
69 72 81 61 76 86 79 68 50 92
83 84 77 64 71 87 72 92 57 98
Stem-and-leaf display for two-digit
numbers
5 | 2 0 7
6 | 9 1 4 5 8
7 | 5 2 7 6 1 9 1 9 2
8 | 3 4 0 1 6 7 7
9 | 6 2 3 5 2 8
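For two-digit numbers, the stem is the tens digit and the leaf is the units digit. The Python sketch below builds the display for the 30 scores of Example 11; unlike the display above, it sorts the leaves within each stem, which gives a ranked version of the same display. The names `scores`, `stems`, and `display` are illustrative assumptions.

```python
from collections import defaultdict

# Scores of the 30 college students (Example 11).
scores = [
    75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
    69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
    83, 84, 77, 64, 71, 87, 72, 92, 57, 98,
]

stems = defaultdict(list)
for score in scores:
    stem, leaf = divmod(score, 10)   # tens digit, units digit
    stems[stem].append(leaf)

# Rank the leaves within each stem and list the stems in increasing order.
display = {stem: sorted(leaves) for stem, leaves in sorted(stems.items())}
for stem, leaves in display.items():
    print(stem, "|", " ".join(map(str, leaves)))
```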
Example 11 (Solution)
65 40 63 67 75 79 85 45 90
60 55 67 86 55 49 78 76 54
67 98 56 45 50 85 67 72 83
Example 12 (Solution)
Example 13
The following data give the monthly rents paid by a
sample of 27 households selected from a small city.
Example 13 (Solution)
Example 14
The following stem-and-leaf display is prepared for the number of hours
that 25 students spent working on computers during the past month.
Stem | Leaf
0 | 6
1 | 1 7 9
2 | 2 6
3 | 2 4 7 8
4 | 1 5 6 9 9
5 | 3 6 8
6 | 2 4 4 5 7
7 |
8 | 5 6
Key: 0 | 6 means 6
Prepare a new stem-and-leaf display by grouping the stems with class
interval 0 – 2, 3 – 5, 6 – 8.
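A possible way to carry out the regrouping is sketched below in Python. The 25 values are entered stem by stem from the display in Example 14. Separating the leaves of different original stems with an asterisk is an assumption about the layout, following a common textbook convention, and the names `display`, `groups`, and `grouped` are illustrative.

```python
# Leaves for each stem, taken from the display in Example 14.
display = {
    0: [6],
    1: [1, 7, 9],
    2: [2, 6],
    3: [2, 4, 7, 8],
    4: [1, 5, 6, 9, 9],
    5: [3, 6, 8],
    6: [2, 4, 4, 5, 7],
    7: [],
    8: [5, 6],
}

groups = [(0, 2), (3, 5), (6, 8)]   # grouped stems requested in the problem

grouped = {}
for low, high in groups:
    # Keep the leaves of each original stem together, separated by "*".
    parts = [" ".join(map(str, display[stem])) for stem in range(low, high + 1)]
    grouped[f"{low} - {high}"] = " * ".join(parts)

for stem_range, leaves in grouped.items():
    print(stem_range, "|", leaves)
```

Stem 7 has no leaves, so the last row shows an empty slot between its two asterisks.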
Example 14 (Solution)
The End
of
Topic 1