Descriptive Statistics, Tables and Graphs 20
Outline
• Concept of Statistics
Statistics
• Statistics is a discipline which is
concerned with:
– designing experiments and data collection,
– processing and summarizing information
to aid understanding,
– drawing conclusions from data, and
– estimating the present or predicting the future
Descriptive versus inferential statistics
• Descriptive statistics are used to
summarize and describe patterns
through the analysis of numeric data
Applications
• Statisticians may apply their knowledge
of statistical methods to a variety of
subject areas, such as biology,
economics, engineering, medicine, public
health, psychology, marketing, education,
and sports.
Variable
• Data:
– Refers to observations made on individuals.
• Primary data:
– Collected and recorded systematically by the
investigator himself/herself for some defined
purposes.
• Secondary data:
– Collected by somebody else or for other
purposes. E.g. Information derived from
hospital records
• Raw data:
– Collected data before any cleaning, editing, and
statistical manipulations
Common examples
• Mean, median, mode, range, and
standard deviation are some of the
main descriptive statistics.
Mean
Median
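Order data from smallest to largest!
If odd number of data points, the median is the
middle value
Data: 4 5 6 3 3
Ordered Data: 3 3 4 5 6
Median: 4
If even number of data points, the median is the
mean of the two middle values
Data: 4 5 6 5 3 3
Ordered Data: 3 3 4 5 5 6
Median: (4 + 5) / 2 = 4.5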
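The median rule can be sketched in Python using the two data sets from this slide. The helper name median_by_hand is made up for illustration; statistics.median is the standard-library function and is shown only as a cross-check.

```python
from statistics import median

def median_by_hand(values):
    """Return the median by sorting and picking the middle value(s)."""
    ordered = sorted(values)       # step 1: order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                 # odd count: the single middle value
        return ordered[mid]
    # even count: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [4, 5, 6, 3, 3]
even_data = [4, 5, 6, 5, 3, 3]

print(median_by_hand(odd_data))    # 4
print(median_by_hand(even_data))   # 4.5
print(median(odd_data), median(even_data))  # standard-library cross-check
```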
Mode
Data: 4 7 6 5 3 3
Mode: 3
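A short sketch of finding the mode in Python, assuming the slide's data are 4, 7, 6, 5, 3, 3. The second data set is invented to show what happens when two values tie for most frequent.

```python
from collections import Counter
from statistics import multimode

data = [4, 7, 6, 5, 3, 3]

# Count how often each value occurs; the mode is the most frequent one.
counts = Counter(data)
print(counts.most_common(1))   # [(3, 2)]: the value 3 occurs twice

# multimode returns every value that shares the highest frequency.
print(multimode(data))         # [3]

tied = [1, 1, 2, 2, 5]         # hypothetical data with two modes
print(multimode(tied))         # [1, 2]
```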
Small test about measures of central tendency
• The overall mean score based on 3
tests for students A and B is 70 out of
100.
Who is better: A or B?
Standard Deviation
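The A-versus-B question can be illustrated with a small sketch. The individual test scores below are invented (the slides give only the shared mean of 70); they are chosen so that both students have the same mean but very different spread.

```python
from statistics import mean, stdev

# Hypothetical scores on 3 tests; only the mean of 70 comes from the slides.
student_a = [68, 70, 72]
student_b = [40, 70, 100]

for name, scores in (("A", student_a), ("B", student_b)):
    # stdev is the sample standard deviation (divides by n - 1).
    print(name, "mean:", mean(scores), "SD:", stdev(scores))
# A has SD 2.0 and B has SD 30.0, although both means are 70.
```

The mean alone cannot separate the two students; the standard deviation shows that A is consistent while B's results vary widely.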
Application of Median and Mode
• Median is used for data with extreme
values (outliers)
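A minimal sketch of why the median suits data with outliers. The values are invented for illustration: one extreme observation is appended to an otherwise tight data set.

```python
from statistics import mean, median

# Hypothetical values, invented for illustration.
without_outlier = [20, 22, 25, 27, 30]
with_outlier = without_outlier + [500]   # add one extreme value

print(mean(without_outlier), median(without_outlier))  # 24.8 25
print(mean(with_outlier), median(with_outlier))        # 104 26.0
```

The single outlier moves the mean from 24.8 to 104, while the median only shifts from 25 to 26.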
Range
Boxplot (Box-and-Whisker diagram)
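The numbers behind the range and a boxplot can be sketched as follows. The data are invented, and the quartiles use the "inclusive" method of Python's statistics.quantiles; other software may use a slightly different quartile convention and give slightly different values.

```python
from statistics import quantiles

# Hypothetical data, invented for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartiles split the ordered data into four parts; q2 is the median.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1   # interquartile range: the length of the box

print("range:", data_range)                                   # 8
print("five-number summary:", min(data), q1, q2, q3, max(data))
print("IQR:", iqr)                                            # 4.0
```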
Standard deviation and normal distribution
Shape of distribution: Skewness
Normal Distribution
• All values are symmetrically distributed
around the mean (mean=median=mode)
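As a sketch of how the standard deviation relates to the normal distribution, the snippet below uses the standard library's NormalDist to compute the share of values lying within 1, 2 and 3 standard deviations of the mean. The helper name share_within is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# roughly 68%, 95% and 99.7%

# For a normal distribution the mean, median and mode coincide.
print(standard_normal.mean, standard_normal.median, standard_normal.mode)
```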
Choosing appropriate measures
• For symmetric distributions (with no
outliers), it is better to use the mean and
standard deviation
• For skewed distributions, it is better to use the
median and interquartile range
• For nominal variables, use
frequencies and percentages
• For continuous variables, use the mean
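The guidelines above can be sketched as a small helper. Everything here is illustrative: the function names are made up, the data are invented, and the skewness cut-off of 0.5 is an assumed rule of thumb, not a value from the slides.

```python
from collections import Counter
from statistics import mean, median, pstdev, quantiles, stdev

def skewness(values):
    """Moment coefficient of skewness (0 for perfectly symmetric data)."""
    m = mean(values)
    s = pstdev(values)
    return sum((x - m) ** 3 for x in values) / (len(values) * s ** 3)

def summarize_numeric(values, threshold=0.5):
    """Pick summary measures from the shape of the data.

    threshold is an assumed rule-of-thumb cut-off for "roughly symmetric".
    """
    if abs(skewness(values)) < threshold:
        return {"center": "mean", "value": mean(values),
                "spread": "standard deviation", "spread_value": stdev(values)}
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"center": "median", "value": median(values),
            "spread": "interquartile range", "spread_value": q3 - q1}

def summarize_nominal(values):
    """Frequency and percentage for each category of a nominal variable."""
    counts = Counter(values)
    return {k: (c, 100 * c / len(values)) for k, c in counts.items()}

symmetric = [2, 4, 6, 8, 10]          # hypothetical symmetric data
skewed = [1, 2, 2, 3, 3, 4, 40]       # hypothetical right-skewed data

print(summarize_numeric(symmetric))   # uses mean and standard deviation
print(summarize_numeric(skewed))      # uses median and interquartile range
print(summarize_nominal(["A", "B", "A", "O"]))
```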
How to display/present data?
Power of graphs
• Why use graphs?
– Gives reader a compact and structured synthesis
– Many details can be shown in a small area
– Gives an immediate depiction of the
differences and patterns in a set of data
– Reader can see immediately major
similarities and differences without having to
compare and interpret figures
Line graph
• Line graphs show the progression of
values over time
• Easier for the eye to follow curves for
different series
• Easier to get a clearer picture of the
development over time
• Good for answering the following questions:
– In what periods were the changes large?
– When were the turning points?
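The two questions above can also be answered numerically from the series behind a line graph. This is a minimal sketch; the yearly values are invented for illustration.

```python
# Hypothetical yearly counts, invented for illustration only.
years = [2000, 2001, 2002, 2003, 2004, 2005, 2006]
values = [10, 12, 18, 17, 15, 16, 21]

# Change between consecutive time points.
changes = [later - earlier for earlier, later in zip(values, values[1:])]

# Period with the largest absolute change.
i = max(range(len(changes)), key=lambda k: abs(changes[k]))
largest_period = (years[i], years[i + 1])

# Turning points: years where the direction of change flips.
turning_points = [
    years[k + 1]
    for k in range(len(changes) - 1)
    if changes[k] * changes[k + 1] < 0
]

print("changes:", changes)                 # [2, 6, -1, -2, 1, 5]
print("largest change:", largest_period)   # (2001, 2002)
print("turning points:", turning_points)   # [2002, 2004]
```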
Example: Scale Line Graph
Bar Chart
• Bar graphs compare the values of different
items in specific categories or at discrete
points in time
• Vertical or horizontal
• Simple to create and easy to interpret
• Used to illustrate variables whose values
are distinct (i.e. qualitative variables)
• Y-axis represents frequency
• X-axis may represent time or different classes
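A minimal sketch of the data behind a bar chart: count the frequency of each category, then draw one bar per category. The observations are invented, and the bars are drawn as text so the example needs no plotting library.

```python
from collections import Counter

# Hypothetical blood groups of 10 people, invented for illustration.
observations = ["A", "O", "B", "O", "A", "O", "AB", "O", "A", "B"]

# Frequency of each category (the bar heights).
frequencies = Counter(observations)

# One horizontal bar per category; bar length equals the frequency.
for category, count in sorted(frequencies.items()):
    print(f"{category:<3}{'#' * count} {count}")
```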
Clustered Bar chart
• Bars can be presented as clusters of
subgroups in clustered bar charts.
Stacked bar chart
(The total value of each category is easily visible)
100% Stacked Bar Chart
Histogram
• A representation of a frequency
distribution by means of rectangles
• Width of bars represents class intervals
and height represents corresponding
frequency
– Area proportional to number
– No space between columns
– One population
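The frequency distribution behind a histogram can be sketched by sorting values into adjacent class intervals. The ages, the class width of 10 and the starting point of 20 are all assumptions made for illustration.

```python
# Hypothetical ages in years, invented for illustration.
ages = [21, 23, 25, 28, 31, 32, 34, 35, 37, 38, 41, 44, 47, 52, 58]

width = 10       # class interval width (assumed)
start = 20       # lower limit of the first class (assumed)
n_classes = 4

# Count how many values fall into each class interval.
counts = [0] * n_classes
for age in ages:
    index = (age - start) // width
    counts[index] += 1

# Classes are adjacent ([20, 30), [30, 40), ...), so there are no gaps.
for k, count in enumerate(counts):
    low = start + k * width
    print(f"[{low}, {low + width}) {'#' * count} {count}")
```

With equal class widths, the bar height alone reflects the frequency; with unequal widths, the area of the bar is what should be proportional to the frequency.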
Pie Chart
• A circular (360 degree) graphic representation
• Compares subclasses or categories
to the whole class or category using
differently coloured or patterned
segments
• Suitable for illustrating
percentage distributions of
qualitative variables
• Displays the contribution of each value
to a total
• Best suited for overviews
• Should not have too many sectors –
maximum 5 or 6
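A short sketch of how pie chart sectors follow from the data: each category's share of the total determines its percentage and its angle out of 360 degrees. The counts are invented, and the limit of 6 sectors mirrors the guideline above.

```python
# Hypothetical category counts, invented for illustration.
counts = {"A": 3, "B": 2, "AB": 1, "O": 4}
total = sum(counts.values())

# Each sector's percentage of the whole and its angle in degrees.
shares = {k: 100 * v / total for k, v in counts.items()}
angles = {k: 360 * v / total for k, v in counts.items()}

for category in counts:
    print(f"{category:<3}{shares[category]:5.1f}% {angles[category]:6.1f} degrees")

# Guideline check: a pie chart should not have too many sectors.
too_many_sectors = len(counts) > 6
print("too many sectors:", too_many_sectors)   # False
```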
Tables
• A rectangular arrangement of data in which
the data are positioned in rows and
columns.
• Each row and column should be labelled
• Row and column totals should be
shown in the last row or in the right-hand
column
• State the units of measurement
• Use at most five variables
• Horizontal lines are acceptable; avoid vertical lines
Commonly used tables
• Single variable tables
– Frequency distribution
• Multivariable tables
– Contingency tables
• 2x2 tables
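A 2x2 contingency table can be sketched by cross-tabulating two yes/no variables and adding row and column totals. The records and the variable names (exposed, ill) are invented for illustration.

```python
from collections import Counter

# Hypothetical (exposed, ill) observations, invented for illustration.
records = [
    ("yes", "yes"), ("yes", "yes"), ("yes", "no"),
    ("no", "yes"), ("no", "no"), ("no", "no"), ("no", "no"),
]

cells = Counter(records)          # count of each (exposed, ill) combination
levels = ["yes", "no"]

# Totals go in the right-hand column and the last row.
row_totals = {r: sum(cells[(r, c)] for c in levels) for r in levels}
col_totals = {c: sum(cells[(r, c)] for r in levels) for c in levels}

print(f"{'Exposed':<8}{'Ill':>5}{'Not ill':>9}{'Total':>7}")
for r in levels:
    ill, not_ill = cells[(r, "yes")], cells[(r, "no")]
    print(f"{r:<8}{ill:>5}{not_ill:>9}{row_totals[r]:>7}")
print(f"{'Total':<8}{col_totals['yes']:>5}{col_totals['no']:>9}{len(records):>7}")
```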
Table 2. Gonorrhoea by age-group and sex, Norway, 2005
In Summary
• Depending on your data, you can choose
from a variety of chart and graph formats,
including pie charts, histograms, tables
etc.