Simple Statistics

This document provides an introduction to descriptive statistics. It defines key statistical concepts such as populations, samples, parameters, statistics, qualitative and quantitative variables, and measures of central tendency and dispersion. Measures of central tendency discussed are the mean, median, and mode. Measures of dispersion examined are range, variance and standard deviation. Examples are provided using sample data on student height and age.

Uploaded by

harsman

Fall 2020 – Business Maths & Statistics, A1

 
Simple Statistics - univariate series. Measures of central
tendency and measures of dispersion.

What is statistics?

Statistics is
1. the science and art of collecting, organising, describing, presenting and analysing
data, which may be quantitative or qualitative (descriptive statistics);
2. the use of samples drawn from a previously defined population in order to establish
properties of the full population, as well as to predict possible
future developments (inferential statistics).

The goal of descriptive statistics is the study of a population (a set of individuals or
entities) or some characteristics of this population via the collection and analysis of
data concerning (all) the individuals, or those in a subset of the full population (such a
subset is called a sample).

Population – the set of things that we set out to investigate. The elements of a
population are called the individuals.

Sample – a subset of a previously defined population. 

 
A parameter – some (usually numerical) characteristic of a property of a population.
Examples: a mean, a proportion, a variance …

A statistic – a (usually numerical) characteristic of a property of a sample, generally
used to estimate a corresponding population parameter. For example: the mean of a
sample is used to estimate the mean of the population from which it was drawn.

The characteristic that is studied (for example the weight of a sample of the students
at this school, their grades or their eye-colour) is called a random variable.

Qualitative variable – expresses a non-numeric characteristic, like gender, eye colour,
nationality, operating system …

Quantitative variable – the characteristic has a numeric value; it can be expressed
numerically. We distinguish discrete quantitative variables, like age (in years), the
number of students in a class or the number of bedrooms in a house (the value of a
discrete variable is generally obtained by counting), and continuous quantitative
variables, like air pressure in a tire, the room temperature at noon, or the weight of
students (the value of a continuous variable is generally obtained by measuring).

In this first introduction to statistics we will concentrate on quantitative variables:
properties of individuals the values of which are found by measuring or counting.

In the majority of practical situations it will be unfeasible or impossible to collect and
analyse data on all individuals of a given population. A statistical study therefore
usually starts with the collection of data from a random sample of a population.

In class we started our study of statistics by taking a random sample of size 10 of the
population of all students in group 101. On that sample we determined the values of
two random variables, the variable height H and the variable age A.

Here are the results:

Height H (cm)    161  172  170  164  168  165  163  160  160  177
Age A (years)     17   18   18   19   18   17   18   18   18   18

[ Height H is a continuous random variable. We measure height, and (in principle)
we can do that with as much precision as we like; all real numbers within a certain
range can occur as values for H.

Age A is a discrete random variable. A value for A is obtained by counting the number
of full years that have passed since an individual was born. Only integer numbers
(between 0 and, say, 130) can occur as values for A. ]

In a second step, we provide a numerical summary of the data that we collected. Such
a summary consists of two types of measures. Measures of central tendency
(also called measures of location) indicate whereabouts on the number line the data
can be found, around which values they are located. Measures of dispersion indicate
the degree of variation among the data.

Measures of central tendency  

The 3 M’s 

1. The mean of a set of observed values of the random variable X is the arithmetical
average of those values. If the size of our data set (population or sample) is n and the
individual values are $x_1, x_2, \ldots, x_n$, then the mean is equal to $\frac{1}{n}\sum_{i=1}^{n} x_i$. (In case our
data are obtained from a full population, the mean is usually indicated by the Greek
letter $\mu$ (or, if we want to specify the random variable, by $\mu_X$); in case the values are
obtained from a sample, we will indicate the mean by $\bar{X}$.)

For the observed values of age and height in our sample: $\bar{A} = 17.9$ and $\bar{H} = 166$.
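
The two sample means can be checked with a few lines of Python. This is only a minimal sketch: the helper name `mean` and the variable names `heights` and `ages` are our own choices for illustration, and the data are the ten observations from the table above.

```python
# Sample data from the class example (heights in cm, ages in full years).
heights = [161, 172, 170, 164, 168, 165, 163, 160, 160, 177]
ages = [17, 18, 18, 19, 18, 17, 18, 18, 18, 18]


def mean(values):
    """Arithmetic mean: the sum of the values divided by their number."""
    return sum(values) / len(values)


print(mean(ages))     # 17.9
print(mean(heights))  # 166.0
```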

2. The median (or second quartile) of a set of observed values of the random variable
X is a number, denoted $Q_2(X)$, such that half of the values are smaller than or equal to
$Q_2(X)$, and half of the values are greater than or equal to $Q_2(X)$. The median divides
the data set into two equal parts.

To find the median of a given set of quantitative data, proceed as follows:

a. First order the values, in ascending or descending order.
b. If the size n of the set is odd, then the median is the value at the midpoint of
the list, i.e. it is the (n + 1)/2-th number in the list. For example, if our
data set has size 39, order the data, and find the 20th number in the list. That is the
median.
c. If the size n of the set is even, then we take the average of the n/2-th and
the (n/2 + 1)-th value in the list as the median.

For the observed values of age and height in our sample: $Q_2(A) = 18$ and
$Q_2(H) = 164.5$.
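
Steps a. to c. translate almost literally into code. The sketch below is one possible way to write them down; the function name `median` is our own, and the only subtlety is that Python lists are indexed from 0, so the k-th number in the list sits at index k - 1.

```python
heights = [161, 172, 170, 164, 168, 165, 163, 160, 160, 177]
ages = [17, 18, 18, 19, 18, 17, 18, 18, 18, 18]


def median(values):
    """Median following steps a. to c.: sort, then pick or average the middle."""
    ordered = sorted(values)          # step a: order the values
    n = len(ordered)
    if n % 2 == 1:                    # step b: odd size, take the (n + 1)/2-th value
        return ordered[(n + 1) // 2 - 1]
    # step c: even size, average the n/2-th and the (n/2 + 1)-th value
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median(ages))     # 18.0
print(median(heights))  # 164.5
```

For a data set of size 39, such as the numbers 1 to 39, the function returns the 20th value of the ordered list, in line with the example in step b.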

3. The mode is a measure that indicates which value(s) occur(s) most often in the data
set. In case of a discrete variable this is a simple matter of counting: the mode for Age
is obviously 18. (In case of Height, we might say that the mode is 160, even though in
case of a continuous variable —for obvious reasons— it is almost always more
informative to speak of a modal class, meaning a certain range that contains most of
the values in the set.)
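
Counting occurrences is easy to automate. The following sketch uses `collections.Counter` from the Python standard library; the helper name `modes` is our own, and it returns a list because a data set can have several values that share the highest count.

```python
from collections import Counter

heights = [161, 172, 170, 164, 168, 165, 163, 160, 160, 177]
ages = [17, 18, 18, 19, 18, 17, 18, 18, 18, 18]


def modes(values):
    """Return all values that occur most often, in ascending order."""
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes(ages))      # [18]
print(modes(heights))   # [160]
print(Counter(ages)[17], Counter(ages)[18], Counter(ages)[19])  # 2 7 1
```

The three counts in the last line are the numbers of occurrences of the ages 17, 18 and 19 in the sample.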

Here is a visualisation of the Age values in a so-called bar graph:

There is a bar for each of the values, the length of which corresponds to the number of
occurrences of each of the three age values in the sample.

A different visualisation that we can use is the so-called pie chart.

Measures of dispersion

The range of the values, i.e. the difference between the max(imum) (the biggest
value) and the min(imum) (the smallest value) in a data set is an obvious first indicator
of how spread out the observed values are on the number line. In our example, the
range of Age is 19-17 = 2, and that of Height is 177-160 = 17.
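
As a quick check of these two numbers, here is a short sketch; the name `value_range` is our own (chosen to avoid clashing with Python's built-in `range`).

```python
heights = [161, 172, 170, 164, 168, 165, 163, 160, 160, 177]
ages = [17, 18, 18, 19, 18, 17, 18, 18, 18, 18]


def value_range(values):
    """Range: the difference between the largest and the smallest value."""
    return max(values) - min(values)


print(value_range(ages))     # 2
print(value_range(heights))  # 17
```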

However the range of course does not tell us much about the degree of variation that
is found in the data.

As an indicator of this variation, we use what is, basically, the average of the
squared distances between each of the values and their mean.
This is the essence of what is determined in the so-called variance and its square root,
the standard deviation.

Attention! The calculation of these measures for a sample is a little different from
that same calculation in case of a full population.

Here are the formulas:

variance for a population: $\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$ ; standard deviation: $\sigma_X = \sqrt{\sigma_X^2}$.

variance for a sample: $S_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X})^2$ ; standard deviation: $S_X = \sqrt{S_X^2}$.

So, for our sample, we calculate the variance of Age as:

$S_A^2 = \frac{2 \times (17 - 17.9)^2 + 7 \times (18 - 17.9)^2 + (19 - 17.9)^2}{10 - 1} \approx 0.32$, and the standard
deviation $S_A = \sqrt{0.32} \approx 0.57$.
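
The difference between the population formula and the sample formula is only the divisor, n versus n - 1. The sketch below makes that explicit with a single function; the name `variance` and its `sample` flag are our own illustrative choices.

```python
import math

ages = [17, 18, 18, 19, 18, 17, 18, 18, 18, 18]


def variance(values, sample=True):
    """Variance: divide by n - 1 for a sample, by n for a full population."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


s2 = variance(ages)                           # sample variance
print(round(s2, 2))                           # 0.32
print(round(math.sqrt(s2), 2))                # 0.57
print(round(variance(ages, sample=False), 2)) # 0.29
```

The last line shows what we would get if the ten students were treated as the full population: the divisor 10 instead of 9 gives a slightly smaller value.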

The calculation ‘by hand’¹ of the variance of the Height data is somewhat
lengthier. The following table shows how to proceed, step by step.

height h_i    height - mean    (height - mean)²

   161             -5                 25
   172              6                 36
   170              4                 16
   164             -2                  4
   168              2                  4
   165             -1                  1
   163             -3                  9
   160             -6                 36
   160             -6                 36
   177             11                121
                               Sum = 288

So the sample variance of Height is $\frac{1}{10-1}\sum_{i=1}^{10}(h_i - \text{mean})^2 = 288/9 = 32$, and the
sample standard deviation is therefore $\sqrt{32} \approx 5.66$.
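
Software tools do this work for us. As one example (our choice, not prescribed by the course), the `statistics` module of the Python standard library offers `variance` and `stdev` for samples and `pvariance` and `pstdev` for full populations. The sketch below rebuilds the sum of squares from the table and compares it with the library results.

```python
import statistics

heights = [161, 172, 170, 164, 168, 165, 163, 160, 160, 177]

# Rebuild the last column of the table: squared deviations from the mean.
mean_height = statistics.mean(heights)
sum_of_squares = sum((h - mean_height) ** 2 for h in heights)

print(sum_of_squares)                 # the table total, 288
print(statistics.variance(heights))   # sample variance, 288 / 9
print(statistics.stdev(heights))      # sample standard deviation, about 5.66
print(statistics.pvariance(heights))  # population variance, 288 / 10
```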

¹ The statistical functions on most scientific calculators and software tools like Excel allow for more
efficient and less time-consuming ways to calculate the variance of data sets.
Note that because we squared the differences between the values of Height and their
mean, the unit of the variance (in this example) is cm². The square root of the
variance, the standard deviation, brings us back to the original unit, i.e. cm.

Frequency distributions
The values of a continuous quantitative random variable are often summarised in a so-
called frequency distribution. We divide the range of the values into a certain number
of disjoint (but adjacent) intervals (the ‘classes’ or ‘bins’ of the distribution), and then
count how many values are contained in each of the intervals, i.e. we determine the
frequency of values in the respective classes.
As an example, to make a frequency distribution for the values in the sample of
students’ heights, we can choose intervals with a width of 5 cm, closed to the left,
and open to the right. We choose four of them to cover the range of values that we
found.

class                          [160,165[   [165,170[   [170,175[   [175,180[

frequency                          5           2           2           1
freq. percentage                  50 %        20 %        20 %        10 %
cumulative freq. percentage       50 %        70 %        90 %       100 %

The cumulative frequency percentages indicate e.g. that 70% of the students in the
sample had a height of less than 170 centimetres. And we can similarly read in
the frequency percentages row that 40% of the students in the sample had a
height between 165 and 175 centimetres.
We visualise a frequency distribution in a so-called histogram. 
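
The table for the heights can be reproduced by counting, for each class, the values v with low ≤ v < high (closed to the left, open to the right). The following is a minimal sketch; the function name `frequency_table` and the list `edges` of class boundaries are our own.

```python
heights = [161, 172, 170, 164, 168, 165, 163, 160, 160, 177]
edges = [160, 165, 170, 175, 180]   # boundaries of the four 5 cm classes


def frequency_table(values, edges):
    """Return rows (low, high, frequency, percentage, cumulative percentage)."""
    n = len(values)
    rows = []
    cumulative = 0
    for low, high in zip(edges, edges[1:]):
        # Closed to the left, open to the right: low <= v < high.
        count = sum(1 for v in values if low <= v < high)
        cumulative += count
        rows.append((low, high, count, 100 * count / n, 100 * cumulative / n))
    return rows


for low, high, count, pct, cum_pct in frequency_table(heights, edges):
    print(f"[{low},{high}[  {count}  {pct:.0f} %  {cum_pct:.0f} %")
```

The frequencies printed per class are the heights of the bars one would draw in the histogram.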

Here is another example. For a small shop in Belleville the daily turnover (in thousands of
euros) on 20 random days in the past 6 months is given in the following table.
class (k euros)                 [0,5[     [5,10[    [10,15[   [15,20[
frequency fi                      1         7         9         3
freq. percentage pi               5 %      35 %      45 %      15 %
cumulative freq. percentage       5 %      40 %      85 %     100 %

In this case we do not know the individual values in the sample of the daily turnovers
of the shop. We only know how they are distributed over four adjacent ‘slots’ with a
‘width’ of 5000 €.
But we still can use the information thus provided to approximate the measures of
central tendency and of dispersion that numerically summarise the sample data.  
In each of the classes our ‘best guess’ can be no other than that each of the values will
be around the average value in that class. These average values are called the
‘midpoints’, mi. In the shop’s example these are 2.5, 7.5, 12.5 and 17.5k euros. So our
approximation of the mean will be :
$\frac{1}{20} \times (1 \times 2.5 + 7 \times 7.5 + 9 \times 12.5 + 3 \times 17.5) = \frac{1}{20} \times 220 = 11$ k euros.

It is the average of the midpoints weighted by frequencies.  

Written as a formula that will look like mean $\approx \frac{\sum f_i \times m_i}{\sum f_i}$, or (writing $p_i$ for the
frequency percentages, taken as fractions of 1), equivalently: mean $\approx \sum p_i \times m_i$.
The mode in case of a frequency distribution, rather than a specific value, is a range of
values: the modal class is the class with the highest frequency of values. In this
example that would be the interval [10, 15[ .
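
Both the approximated mean and the modal class of the shop example follow directly from the class boundaries and the frequencies. The sketch below is one way to compute them; all variable names are our own, and the proportions are the frequency percentages written as fractions of 1.

```python
# Class boundaries (in k euros) and frequencies from the shop example.
classes = [(0, 5), (5, 10), (10, 15), (15, 20)]
frequencies = [1, 7, 9, 3]

midpoints = [(low + high) / 2 for low, high in classes]   # 2.5, 7.5, 12.5, 17.5
n = sum(frequencies)

# Mean approximated as the frequency-weighted average of the midpoints.
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Equivalent form with the relative frequencies p_i = f_i / n.
proportions = [f / n for f in frequencies]
grouped_mean_p = sum(p * m for p, m in zip(proportions, midpoints))

# Modal class: the class with the highest frequency.
modal_class = classes[frequencies.index(max(frequencies))]

print(grouped_mean)   # 11.0
print(modal_class)    # (10, 15)
```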

The cumulative frequency percentages guide us in approximating the median and the
other quartiles, for which we will assume that the increase of the values within each
of the classes will be —approximately— linear, allowing us to proceed by linear
interpolation (Thales theorem).

We will use the fact that the median is a value Q2 such that 50% of all values are
below and 50% of all values are above it. We can similarly determine the first and the
third quartile, Q1 and Q3, the first one being a value such that 25% of all values are
below and 75% are above it, the second being a value such that 75% of all values are
below, and 25% are above it.

Here is how we find the three quartiles by means of linear interpolation:

In case inside each class the increase of the values (from 5 to 10, from 10 to 15 …) is
considered to be approximately linear, we can use Thales theorem to write the
following equations … and solve them to find approximate values for the quartiles:

$\frac{Q_1 - 5}{10 - 5} = \frac{25 - 5}{40 - 5} \implies Q_1 \approx 7.86$

$\frac{Q_2 - 10}{15 - 10} = \frac{50 - 40}{85 - 40} \implies Q_2 \approx 11.11$

$\frac{Q_3 - 10}{15 - 10} = \frac{75 - 40}{85 - 40} \implies Q_3 \approx 13.89$
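
Each of the three equations has the same shape: find the class in which the cumulative percentage first reaches the target (25, 50 or 75), then interpolate linearly between the boundaries of that class. A possible sketch, in which the function name `interpolate_quantile` and the lists `edges` and `cumulative` are our own, and the target is assumed to lie strictly between 0 and 100:

```python
def interpolate_quantile(target, edges, cumulative):
    """Linear interpolation of the value below which `target` percent lies.

    edges[i] is a class boundary and cumulative[i] the cumulative
    percentage reached at that boundary (so cumulative[0] is 0).
    """
    for i in range(1, len(edges)):
        if cumulative[i] >= target:
            low, high = edges[i - 1], edges[i]
            c_low, c_high = cumulative[i - 1], cumulative[i]
            # Same proportion as in the equations above, solved for the quantile.
            return low + (high - low) * (target - c_low) / (c_high - c_low)
    raise ValueError("target exceeds the last cumulative percentage")


edges = [0, 5, 10, 15, 20]          # class boundaries in k euros
cumulative = [0, 5, 40, 85, 100]    # cumulative frequency percentages

q1 = interpolate_quantile(25, edges, cumulative)
q2 = interpolate_quantile(50, edges, cumulative)
q3 = interpolate_quantile(75, edges, cumulative)
print(round(q1, 2), round(q2, 2), round(q3, 2))   # 7.86 11.11 13.89
```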

Finally, to approximate the variance and standard deviation, we use (as for the
mean) the midpoints mi and the frequencies fi. Basically, we approximate the
variance as the average of the squares of the differences between the midpoints and the
approximated mean, weighted by the frequencies or frequency percentages. Only when
the sample size is known can we apply the usual ‘sample correction’. The
formulas to use therefore are the following:

in case of a sample with known sample size: variance $\approx \frac{\sum f_i \times (m_i - \text{mean})^2}{(\sum f_i) - 1}$

in case of a population, or unknown sample size: variance $\approx \sum p_i \times (m_i - \text{mean})^2$
(where the $p_i$ are the frequency percentages).

So for the variance in the example we find:

$\frac{1}{20 - 1} \times (1 \times (2.5 - 11)^2 + 7 \times (7.5 - 11)^2 + 9 \times (12.5 - 11)^2 + 3 \times (17.5 - 11)^2) = \frac{1}{19} \times 305 \approx 16.053$

The standard deviation therefore is about 4.
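
The same calculation in code, as a minimal sketch with variable names of our own choosing. It also shows the value obtained without the sample correction (dividing by 20 instead of 19), which is what the second formula gives.

```python
import math

frequencies = [1, 7, 9, 3]
midpoints = [2.5, 7.5, 12.5, 17.5]   # class midpoints in k euros

n = sum(frequencies)
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Frequency-weighted sum of squared differences between midpoints and mean.
sum_of_squares = sum(f * (m - grouped_mean) ** 2
                     for f, m in zip(frequencies, midpoints))

sample_variance = sum_of_squares / (n - 1)    # with the sample correction
uncorrected_variance = sum_of_squares / n     # population or unknown sample size

print(sum_of_squares)                         # 305.0
print(round(sample_variance, 3))              # 16.053
print(round(math.sqrt(sample_variance), 2))   # 4.01
print(uncorrected_variance)                   # 15.25
```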
