Lect 1 Descriptive Statistics
By
Dr. Jupiter Simbeye
Preliminary issues
STA121: Descriptive Statistics
• Course outline uploaded on the classroom
• Classroom code:
[Figure: bar chart of percent by grade (A+, A, B+, B, C+, C, C-, D, E, F)]
Module Aims and Learning Outcomes
Aim:
• To introduce students to basic descriptive statistical analysis
Learning outcomes:
On successful completion of this module, students should be able to:
• Summarise data in the form of central measures, frequencies, tables and
graphs,
• Interpret summary statistics,
• Apply descriptive statistics to answer practical questions.
Indicative Content
• Tables and graphs for frequencies and other statistics: use and
interpretation of multi-way tables.
Example
Below are percentage point grades obtained by 10 students in STA121
67, 70, 55, 62, 40, 81, 90, 60, 69, 56
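As a minimal sketch of how such data can be summarised with central measures, the ten grades above can be passed to Python's standard `statistics` module. The variable names are illustrative only and are not part of the course material.

```python
import statistics

# Percentage point grades of the 10 students, copied from the example above
grades = [67, 70, 55, 62, 40, 81, 90, 60, 69, 56]

mean_grade = statistics.mean(grades)      # arithmetic mean
median_grade = statistics.median(grades)  # middle value of the sorted grades
lowest, highest = min(grades), max(grades)

print("Mean:", mean_grade)
print("Median:", median_grade)
print("Lowest and highest:", lowest, highest)
```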
Observation
More terms and definitions
Examples
a) Age in years (25, 15, 74, etc)
b) Birth-weight of babies in kg (3.1, 2.5, 2.9, 3.5, 4.2, etc).
c) Number of antenatal care (ANC) visits by a pregnant mother (0, 1, 4,
7, etc)
More terms and definitions
Thus age and birth-weight are continuous, because they can take any
value, such as 25.5237873244 years or 2.93927634529 kg,
respectively, even if we may not have instruments that can measure them
this accurately!
Examples
a) Names of countries (Malawi, Zambia, Egypt, Mozambique)
b) Answer to opinion question (strongly disagree, disagree, agree, strongly
agree)
c) Sex of an individual (male, female)
Note: In most analyses, qualitative variables that take a limited number of
values are discretized by assigning them codes (e.g. 1=strongly disagree, …,
4=strongly agree)
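The coding described in the note can be sketched as a simple lookup table. The source states only the codes 1 and 4 explicitly; the codes 2 and 3 used here are an assumption based on the order in which the answers are listed, and the sample responses are made up for illustration.

```python
# Assumed coding scheme: only 1 and 4 are given in the note;
# 2 and 3 are inferred from the listed order of the answers.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "agree": 3,
    "strongly agree": 4,
}

def encode_responses(responses):
    """Replace each text answer with its numeric code (case-insensitive)."""
    return [LIKERT_CODES[answer.strip().lower()] for answer in responses]

# Hypothetical answers, for illustration only
sample_responses = ["Agree", "strongly disagree", "Strongly agree", "disagree"]
coded = encode_responses(sample_responses)
print(coded)
```

The codes only label the categories; for a variable such as country names the numbers would carry no order at all.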
More terms and definitions
7. Frequency Distribution: A frequency distribution is an overview of all
distinct values of a variable and the number of times each occurs.
Example:
• A sample of 183 students was asked to state which study major they are
following. The table below shows part of these data.
Study majors
SN    Name of student    Sex       Major
1     Andrew Gondwe      Male      Biology
2     John Samale        Male      Mathematics
3     Pempho Yasini      Female    Other
4     Felix Wadabwa      Male      Mathematics
:     :                  :         :
:     :                  :         :
182   Maren Dickson      Female    Physics
183   Jack Filipo        Male      Chemistry
Observations
• Just looking at our 183 values cannot provide any important
information about majoring subjects.
• A more viable approach is to simply tabulate each distinct study
major in our data and its frequency, that is, the number of times it occurs.
• The resulting table (below) shows how frequencies are
distributed over values (majoring subjects in this example) and
hence is a frequency distribution.
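The tabulation described above can be sketched with `collections.Counter` from the Python standard library. Only the six rows visible in the excerpt are used here, so the counts below are not the frequencies for the full sample of 183 students.

```python
from collections import Counter

# Majors from the six rows shown in the excerpt (rows 1-4, 182 and 183);
# the remaining 177 rows are not shown in the source and are left out.
visible_majors = [
    "Biology", "Mathematics", "Other", "Mathematics", "Physics", "Chemistry",
]

# Count how many times each distinct major occurs
frequency = Counter(visible_majors)

# Print a small frequency distribution table, most frequent major first
print(f"{'Major':<12} Frequency")
for major, count in frequency.most_common():
    print(f"{major:<12} {count}")
```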
Frequency distribution table
• One important message we observe is heaping of birth-weights at 2000, 2500, 3000,
3200, 3500, 4000, 4500 and 5000 grams. This could be due to recording errors by birth
attendants or to mothers rounding the figures when recalling birth-weights.
• Since the summary is not very informative, it is a good idea to group the birth-weights
into some sensible groups before tabulating, say: 1-1000, 1001-2000, 2001-3000,
3001-4000, 4001-5000, 5001-6000, 6001-7000, 7001-8000, 8001-9000, 9001-10000
grams.
• The table below provides a frequency table for the ten groups that we have created.
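The grouping step can be sketched as follows, assuming birth-weights are recorded in whole grams between 1 and 10000. The function name and the sample weights are hypothetical and are not taken from the survey.

```python
from collections import Counter

def weight_class(weight_g):
    """Return the label of the 1000-gram group that contains weight_g."""
    if not 1 <= weight_g <= 10000:
        raise ValueError("weight outside the 1-10000 gram range")
    lower = ((weight_g - 1) // 1000) * 1000 + 1  # 1, 1001, 2001, ...
    upper = lower + 999                          # 1000, 2000, 3000, ...
    return f"{lower}-{upper}"

# Made-up weights in grams, for illustration only
sample_weights = [3200, 2500, 4000, 1000, 3001]

grouped = Counter(weight_class(w) for w in sample_weights)
print(grouped)
```

Subtracting 1 before the integer division keeps boundary values such as 4000 grams in the group that ends at that value, which matches the group limits listed above.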
Grouped birth-weight
in grams        Frequency   Percent   Cumulative percent
1-1000          65          0.5       0.5
1001-2000       787         6.02      6.51
2001-3000       5,026       38.43     44.94
3001-4000       5,638       43.11     88.05
4001-5000       1,305       9.98      98.03
5001-6000       209         1.6       99.63
6001-7000       37          0.28      99.91
7001-8000       6           0.05      99.95
8001-9000       5           0.04      99.99
9001-10000      1           0.01      100
Total           13,079      100
Observations
• The largest group of babies (5,638) is born weighing between 3001
and 4000 grams. This represents 43.11% of all 13,079 babies whose
birth-weight was recorded in the survey.
• The second largest group of babies is born weighing between 2001 and 3000
grams.
• Overall, over 80% of the babies are born weighing between 2001 and
4000 grams.
• The least likely birth-weights are birth-weights over 6000 grams.
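The percent and cumulative percent columns of the grouped birth-weight table can be reproduced from the frequencies alone. The sketch below uses the frequencies reported in that table; the variable names are illustrative.

```python
# Frequencies of the ten birth-weight groups, copied from the table
groups = ["1-1000", "1001-2000", "2001-3000", "3001-4000", "4001-5000",
          "5001-6000", "6001-7000", "7001-8000", "8001-9000", "9001-10000"]
frequencies = [65, 787, 5026, 5638, 1305, 209, 37, 6, 5, 1]

total = sum(frequencies)

percents = []
cumulative_percents = []
running_count = 0
for count in frequencies:
    running_count += count
    percents.append(100 * count / total)
    # Cumulative percent is based on the running count, not on rounded percents
    cumulative_percents.append(100 * running_count / total)

for group, count, pct, cum in zip(groups, frequencies, percents, cumulative_percents):
    print(f"{group:<12} {count:>6} {pct:>7.2f} {cum:>7.2f}")
print(f"{'Total':<12} {total:>6}")
```

Working from the running count avoids the small rounding drift that appears when already rounded percentages are added together.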
More terms and definitions
8. Class-interval: A class interval is a subdivision of the total range of
values which a (continuous) variable may take.
• Since the total frequencies in the two groups differ, it becomes difficult to
make direct comparisons. In this case, relative frequencies become
useful.
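A minimal sketch of why relative frequencies help when group totals differ is given below. The counts are made up for illustration and are not the survey's rural and urban figures, and the coarse weight groups are likewise hypothetical.

```python
def relative_frequencies(counts):
    """Convert a dict of counts into percentages of the group total."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}

# Hypothetical counts: the two groups have very different totals
rural = {"1-2000": 60, "2001-4000": 480, "4001-6000": 60}   # total 600
urban = {"1-2000": 10, "2001-4000": 85, "4001-6000": 5}     # total 100

rural_pct = relative_frequencies(rural)
urban_pct = relative_frequencies(urban)

for label in rural:
    print(f"{label:<10} rural {rural_pct[label]:5.1f}%   urban {urban_pct[label]:5.1f}%")
```

In these made-up data the raw count for 2001-4000 grams is far larger in the rural group, yet the percentage is higher in the urban group, which is the kind of comparison raw frequencies hide.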
Frequency table of birth-weights distributed by rural and urban residence
[Figure: percent of babies by birth-weight group (midpoints 500 to 9,500 grams), shown separately for Rural and Urban]
Observations
• The two distributions have similar shapes and largely overlap. It appears that
urban and rural babies are about equally likely to be heavier or lighter.
• A big gap is observed around weights of 3,500 grams, where the percentage of
babies is higher in the urban than in the rural group.