Presentation and Summary of Data

This document discusses presenting and summarizing numeric data. It defines key terms like variables, scales of measurement, and frequency distributions. Graphical methods for presenting data include histograms and box plots. Numerical summaries include measures of central tendency (mean, median, mode) and dispersion (range, interquartile range, variance, standard deviation, coefficient of variation). These statistical techniques help analyze and communicate patterns in data in a clear and understandable way.

Uploaded by

Dr P

Chapter 2 PRESENTATION AND SUMMARY OF DATA

Objectives
a) To be able to recognise the scale of measurement of any given variable.
b) To know how to present numeric data graphically using histograms and boxplots.
c) To know how the various measures of location and dispersion are defined and to be able to select
appropriate measures for a given set of data.
d) To be able to produce both graphs and summary measures using SPSS.

2.1 Introduction
In a physiology experiment 30 students had their heart rates (in beats per minute) measured after
completing a standard exercise test. The results were as follows:
71 68 76 73 71 72 74 73 74 73
70 76 72 70 73 69 77 71 75 74
72 69 73 78 72 77 75 77 70 75
Clearly there is a degree of variation in these results. As is typical of many measurements made in
medicine and dentistry, this variation may arise from different sources, e.g. biological variation
(differences in resting heart rate and fitness between students) and measurement error.
In their present state, the figures convey little information. The aim is to present the data in a compact
and understandable form. There are two ways of doing this.
i. In graphical form
ii. By using summary statistics

2.2 Some Definitions

A variable is a characteristic, of a given subject, which may take any one of a set of values. There are
two types of variables:
• qualitative variables, e.g. sex, blood group, country of birth
• quantitative or measurable variables, subdivided into:
    - continuous variables, taking one of an infinite number of possible values,
      e.g. height, weight, temperature, haemoglobin
    - discrete variables, taking a number of (usually integer) values,
      e.g. parity, radioactive counts, number of days in hospital

Any variable may also be assigned to one of three scales of measurement.

i. nominal categorical scale variables, e.g. sex, blood group, country of birth
ii. ordinal categorical and ranked scale variables, e.g. pain on a three point scale, position in the
class in an assessment
iii. interval scale, e.g. height (cm), temperature (°C), blood pressure (mmHg)

A frequency distribution shows the frequency of occurrence of the different values of a variable, and
may be represented either as a table or as a graph. For nominal or ordinal scale variables the graph is
called a bar chart with frequency on the vertical axis and values of the variable on the horizontal axis
(see section 2.3). For interval scale variables the range of the variable is first divided into classes and
the figure is called a histogram (see section 2.6). Relative frequency is the frequency expressed as a
proportion (or percentage) of the total frequency and can be particularly useful for comparing two or
more frequency distributions.

2.3 Graphical presentation of categorical data

Although much used in the popular press, pie charts are not favoured in scientific work and bar charts
are preferred. The following data show the distribution of delay time for almost 2,000 middle-aged
men who had a heart attack in the greater Belfast area during the period 1983-85.

Delay time    Frequency    Relative frequency
Short            746           39%
Medium           459           24%
Long             408           21%
Very long        301           16%
Total           1914          100%

Source: Rev.Epidem.et Sante Publ. 1990; 38: 419-427

[Figure: bar chart of frequency (0 to 800) against delay time (Short, Medium, Long, Very long), with the bars labelled 39%, 24%, 21% and 16%.]

The simple bar chart shown above may usefully be extended to a multiple bar chart or a composite bar
chart to assist in the comparison of subgroups. For example, careful examination of the multiple bar
chart shown opposite reveals a slight difference in the distribution of delay time between single,
married and divorced/widowed/separated men.

[Figure: multiple bar chart of relative frequency (%) (0 to 40) against delay time for Single, Married and Div/Wid/Sep men.]
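
As a rough illustration of relative frequency, the following Python sketch recomputes the percentages in the delay-time table of this section from the raw frequencies. The notes themselves use SPSS; the variable names here are illustrative only.

```python
# Frequencies of delay time, copied from the table in section 2.3.
delay_counts = {"Short": 746, "Medium": 459, "Long": 408, "Very long": 301}

total = sum(delay_counts.values())

# Relative frequency = frequency expressed as a percentage of the total frequency.
relative_pct = {name: 100 * count / total for name, count in delay_counts.items()}

for name, count in delay_counts.items():
    print(f"{name:<10} {count:>5} {relative_pct[name]:>5.0f}%")
print(f"{'Total':<10} {total:>5}")
```

Rounded to whole percentages, the results agree with the 39%, 24%, 21% and 16% quoted in the table.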

2.4 Measures of Location

A measure of location is the value at which the sample is ‘centred’.

• The arithmetic mean (usually shortened to mean) is the sum of all the observations in
the sample divided by the total number of observations:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

• The median is the middle value if the sample is arranged in increasing order. The median therefore
cuts the sample in half with 50% less than the median and 50% greater than the median.

(a) for n odd, median is the middle observation

(b) for n even, median is the arithmetic mean of the two middle observations

• The mode is the most commonly occurring value in the sample.

• The quantiles (quartiles, deciles, percentiles etc.) are the (k - 1) values of the variable which divide
the sample into k equal parts when the sample values are arranged in increasing order. They
identify locations other than the centre of the sample.
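
The measures of location defined above can be sketched in Python using the heart rate data from section 2.1. This is only an illustration with the standard `statistics` module (Python 3.8 or later for `quantiles`), not the SPSS procedure used in the course. Statistical packages use slightly different conventions for interpolating quartiles, so other software may give marginally different values on other data sets.

```python
import statistics

# Heart rates (beats per minute) of the 30 students in section 2.1.
heart_rates = [
    71, 68, 76, 73, 71, 72, 74, 73, 74, 73,
    70, 76, 72, 70, 73, 69, 77, 71, 75, 74,
    72, 69, 73, 78, 72, 77, 75, 77, 70, 75,
]

mean = statistics.mean(heart_rates)      # sum of the observations divided by n
median = statistics.median(heart_rates)  # n is even: mean of the two middle values
mode = statistics.mode(heart_rates)      # most commonly occurring value

# Quartiles: the k - 1 = 3 values dividing the ordered sample into k = 4 equal parts.
q1, q2, q3 = statistics.quantiles(heart_rates, n=4)

print("mean:", mean, "median:", median, "mode:", mode)
print("quartiles:", q1, q2, q3)
```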

2.5 Measures of Dispersion
A measure of dispersion is a quantity which describes the degree of variation, spread, or scatter of the
observations in the sample about their central value.

• The range is the difference between the largest and smallest values in the sample.
Range = xmax - xmin. Unfortunately it is severely affected by outliers (rogue results).

• The interquartile range is the difference between the third and first quartiles.
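• The variance is approximately the arithmetic mean of the squared deviations of the values
from their mean:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

[Figure: the observations x1, x2, ..., xi, ..., xn marked on a line together with their mean, showing the deviations x1 - x̄, xi - x̄ and xn - x̄.]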

• The square root of the variance is the standard deviation. It has an advantage of being
in the original scale of measurement, and is therefore used in preference to the variance.

• The coefficient of variation is the standard deviation as a percentage of the mean.

$$c = \frac{s}{\bar{x}} \times 100\%$$

This expresses the standard deviation relative to the mean and provides a measure of variation
which is independent of the units of measurement.

The coefficient of variation is particularly useful for comparing dispersions between two
variables with different units of measurement. Because it is a measure of relative variation (i.e.
standard deviation relative to mean) it can also be useful for comparing dispersions between
two sets of data with the same units of measurement but with very different means.
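
A minimal Python sketch of these measures of dispersion, again using the heart rate data from section 2.1, is given below. To illustrate that the coefficient of variation does not depend on the units of measurement, the same data are also converted to beats per second; that conversion is an illustrative assumption and is not part of the original example.

```python
import statistics

# Heart rates (beats per minute) of the 30 students in section 2.1.
heart_rates = [
    71, 68, 76, 73, 71, 72, 74, 73, 74, 73,
    70, 76, 72, 70, 73, 69, 77, 71, 75, 74,
    72, 69, 73, 78, 72, 77, 75, 77, 70, 75,
]

data_range = max(heart_rates) - min(heart_rates)  # largest minus smallest value

q1, _, q3 = statistics.quantiles(heart_rates, n=4)
iqr = q3 - q1                                     # third quartile minus first quartile

variance = statistics.variance(heart_rates)       # sample variance, divisor n - 1
sd = statistics.stdev(heart_rates)                # square root of the variance
cv = 100 * sd / statistics.mean(heart_rates)      # standard deviation as % of the mean

# The same measurements expressed in beats per second (illustrative change of units).
per_second = [x / 60 for x in heart_rates]
cv_per_second = 100 * statistics.stdev(per_second) / statistics.mean(per_second)

print(f"range = {data_range}, IQR = {iqr}")
print(f"variance = {variance:.2f}, sd = {sd:.2f}")
print(f"CV = {cv:.2f}% (per minute), {cv_per_second:.2f}% (per second)")
```

The standard deviation changes with the units, but the two coefficients of variation agree apart from floating-point rounding.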

2.6 Graphical presentation of measurement data
The histogram is formed by dividing the range of the variable into a number of classes of equal width.
The frequency distribution is then plotted as a series of contiguous bars, the height of the bar being
proportional to the frequency in the class. The examples opposite show histograms for a variable
with a symmetric distribution (diastolic blood pressure) and a variable whose distribution shows
positive skewness (total triglyceride) with a long tail to the right.

[Figure: histograms of frequency against DBP (mmHg) and against total triglyceride (mg/100ml), each with the positions of the mean, median and mode marked.]


The boxplot is a five point summary of the data consisting of the minimum, maximum, median and
first and third quartiles. Sometimes outliers (rogue results) are identified separately by stars. Note
the difference in appearance of the plot for the symmetric and skewed distributions shown opposite.

[Figure: boxplots of DBP (mmHg) and total triglyceride (mg/100ml), each labelled with the minimum, first quartile, median, third quartile and maximum.]
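
The five values on which a boxplot is based can be computed directly. The sketch below does this for the heart rate data of section 2.1; the function name `five_number_summary` is hypothetical, and the quartiles follow the default convention of Python's `statistics.quantiles`, which may differ slightly from the one used by SPSS.

```python
import statistics

# Heart rates (beats per minute) of the 30 students in section 2.1.
heart_rates = [
    71, 68, 76, 73, 71, 72, 74, 73, 74, 73,
    70, 76, 72, 70, 73, 69, 77, 71, 75, 74,
    72, 69, 73, 78, 72, 77, 75, 77, 70, 75,
]


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


summary = five_number_summary(heart_rates)
print(summary)
```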

These graphical procedures are important in determining which measures of location and
dispersion are most appropriate for summarising any given set of data. If a distribution is heavily
skewed then the median and interquartile range are preferred as the summary measures rather than
the mean and standard deviation. Sometimes variables which are heavily positively skewed are
logarithmically transformed in order to obtain a more symmetric distribution.
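
The effect of a logarithmic transformation can be sketched as follows. The sample is hypothetical (invented purely for illustration, not taken from any study), and the indicator used, (mean - median) / standard deviation, is an assumption chosen here as a rough check of asymmetry; it is not a measure defined in these notes. In a positively skewed sample the mean is pulled above the median, so the indicator is positive, and it should move towards zero after the transformation.

```python
import math
import statistics

# Hypothetical, positively skewed sample (illustrative values only, not study data).
values = [0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.4, 1.7, 2.3, 3.5, 6.0]


def mean_median_gap(xs):
    """(mean - median) / standard deviation: close to 0 for a symmetric sample."""
    return (statistics.mean(xs) - statistics.median(xs)) / statistics.stdev(xs)


# Natural logarithm of each observation (all values must be positive).
logged = [math.log(x) for x in values]

gap_raw = mean_median_gap(values)
gap_log = mean_median_gap(logged)

print(f"raw data:        {gap_raw:.2f}")
print(f"log-transformed: {gap_log:.2f}")
```

For this sample the gap is smaller after the transformation, although it does not vanish; a histogram or boxplot of the transformed values remains the more reliable check.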
2.7 Heart rate example
Summarise the heart rate data in section 2.1 by constructing the frequency distribution, and present the
results in both tabular and graphical (histogram) form. Calculate appropriate measures of location and
dispersion.

Heart rate          Tally         Frequency    Relative
(beats per min)                                frequency
_______________     _________     _________    _________
67-68               |                 1          .033
69-70               |||||             5          .167
71-72               ||||| ||          7          .233
73-74               ||||| |||         8          .267
75-76               |||||             5          .167
77-78               ||||              4          .133
Total                                30         1.000

[Figure: histogram of frequency (0 to 10) against heart rate (beats per min.), with the horizontal axis labelled from 68 to 78.]

Since the frequency distribution is nearly symmetric the mean and standard deviation are the most
appropriate measures of location and dispersion.
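$$\text{mean } \bar{x} = \frac{\sum x_i}{n} = \frac{2190}{30} = 73 \text{ beats per min}$$

$$\text{variance } s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
= \frac{1}{30-1}\left[(71-73)^2 + (68-73)^2 + \dots + (75-73)^2\right]
= 7.10 \text{ (beats per min)}^2$$

$$\text{standard deviation } s = \sqrt{7.10} = 2.67 \text{ beats per min}$$

(The square root is taken of the unrounded variance, 206/29 = 7.1034.)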

Had the distribution been skewed then the median and interquartile range would have been preferred.
median = (73 + 73)/2 = 73 beats per min
first quartile, Q1 = 71 beats per min
third quartile, Q3 = 75 beats per min
interquartile range = Q3 - Q1 = 75 - 71 = 4 beats per min

Throughout this course we will use the computer to perform these tasks, but it is nevertheless important to
appreciate how they are performed.
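
Although the course uses SPSS, the construction of the frequency distribution in this section can also be sketched in a few lines of Python. The class width of 2 and the lower limit of 67 are taken from the table above; the constant names and the text histogram drawn with `#` characters are illustrative choices.

```python
from collections import Counter

# Heart rates (beats per minute) of the 30 students in section 2.1.
heart_rates = [
    71, 68, 76, 73, 71, 72, 74, 73, 74, 73,
    70, 76, 72, 70, 73, 69, 77, 71, 75, 74,
    72, 69, 73, 78, 72, 77, 75, 77, 70, 75,
]

CLASS_WIDTH = 2  # each class covers two integer values, e.g. 67-68
LOWEST = 67      # lower limit of the first class, as in the table

# Assign every observation to a class index (0 for 67-68, 1 for 69-70, ...) and count.
counts = Counter((x - LOWEST) // CLASS_WIDTH for x in heart_rates)
n = len(heart_rates)

table = []
for k in range(max(counts) + 1):
    lower = LOWEST + k * CLASS_WIDTH
    upper = lower + CLASS_WIDTH - 1
    freq = counts.get(k, 0)
    table.append((f"{lower}-{upper}", freq, freq / n))

for label, freq, rel in table:
    # The run of '#' characters acts as a crude text histogram.
    print(f"{label}  {'#' * freq:<8}  {freq:>2}  {rel:.3f}")
```

The frequencies printed match the tally table (1, 5, 7, 8, 5, 4), and the relative frequencies sum to 1.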

2.8 Obtaining graphical output and summary measures in SPSS
To obtain graphical output (histogram, boxplot, pie chart or bar chart) in SPSS follow the relevant
menu options below and then click on the variables and press the arrow button to move them into the
relevant boxes. Then press OK.

Graphs → Histogram... (click on Display normal curve box for a superimposed Normal distribution)
Graphs → Boxplot... (optionally enter a variable in the Category Axis box for side-by-side boxplots)
Graphs → Pie...
Graphs → Bar...

To obtain summary measures in SPSS follow the Analyze → Descriptive Statistics → Frequencies...
menu options and then click on the variables and press the arrow button to move them into the
Variable(s): box. Press the Statistics... button and click all the required options. Press the Continue
and OK buttons.

The Analyze → Descriptive Statistics → Descriptives… option offers a less comprehensive range of
statistics in a more compact output which may be useful for screening large numbers of variables
quickly.

2.9 Further Reading

Bland Sections 4.1-4.8, 5.3-5.8

2.10 Practical

2.1) The following results for haemoglobin concentration (g/dl) were obtained from blood samples
from 11 individuals.
14.7 15.2 16.2 15.9 13.4 11.6 12.0 13.4 13.3 12.5 10.6
Use SPSS to calculate the following measures:
(i) mean
(ii) median
(iii) range
(iv) standard deviation
(v) coefficient of variation

The mean corpuscular haemoglobin (pg) was estimated for the same 11 people. The results
are given below.
23.8 20.0 21.7 22.0 23.7 24.0 23.7 27.7 30.3 27.4 22.4
Which of these two measurements, (i.e. haemoglobin or mean corpuscular haemoglobin), is
the more variable?

2.2) Now open the worksheet j:\medstats\caer.sav which contains selected data from a study of
ischaemic heart disease in a cohort of approximately 2,500 middle-aged men from the Welsh
town of Caerphilly.
Examine the distribution of each variable in the table by a suitable graphical method.
Depending on the shape of the distribution, select an appropriate summary measure of location
and of dispersion, and record the values of these measures in the table.

Variable                                   Symmetric     Most appropriate       Most appropriate
                                           or skewed?    measure of location    measure of dispersion
SBP      Systolic blood pressure (mmHg)
HT       Height (cm)
WT       Weight (kg)
TOTTRIG  Total triglyceride (mmol/l)

For any variable that is heavily positively skewed, re-examine the shape of the distribution
after applying a logarithmic transformation to check that the distribution is more symmetric.

2.3) The following table shows the numbers of fatal and non-fatal road accidents reported to the
police in Northern Ireland in 1981 by day of the week.

Day of week    Fatal accidents    Non-fatal accidents
Sunday               27                  585
Monday               18                  677
Tuesday              20                  657
Wednesday            26                  722
Thursday             28                  784
Friday               38                  842
Saturday             47                  774
Total               204                 5041

Source: Death and Injury Road Accidents in Northern Ireland, Royal Ulster Constabulary, 1981.

Present the data using a multiple bar chart in such a way as to make it easy to compare the
distributions of each type of accident throughout the week.

Comment on possible reasons for any differences in distribution you observe.
