
Descriptive Statistics: Organizing, Summarizing, Describing, and Presenting Data

This document presents essential methods of Descriptive Statistics for biomedical science students and professionals, covering techniques such as mean, median, mode, variance, standard deviation, and data visualization through graphs. It emphasizes the importance of correctly identifying variable types and provides practical examples to enhance understanding and communication of data in the biomedical field. The paper aims to empower users to effectively analyze and present data, which is crucial for research and decision-making.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/375770059

Descriptive statistics: organizing, summarizing, describing, and presenting data

Method · November 2023

DOI: 10.13140/RG.2.2.31782.91203


1 author:

André Moreno Morcillo

State University of Campinas (UNICAMP)

All content following this page was uploaded by André Moreno Morcillo on 21 November 2023.


Descriptive statistics: organizing, summarizing, describing, and presenting data

André Moreno Morcillo1

Abstract: In this paper, we present essential methods of Descriptive Statistics for biomedical science
students and professionals. We explore data summary techniques such as the mean, median, and
mode, measures of dispersion such as variance and standard deviation, and position measures like
quartiles and z-scores. Furthermore, we emphasize the importance of data visualization through
graphs, including pie charts, bar charts, and box plots. We demonstrate how to calculate these
statistics practically and provide examples from the biomedical sciences. This paper aims to empower
students and professionals to understand and effectively communicate data, which is crucial for
research, diagnosis, and decision-making in the biomedical field.

The presentation of scientific research results requires the use of standard techniques or
methods, so that articles and reports can be evaluated by researchers in different countries.
This part of statistics, whose objective is to synthesize, organize, and present data, is called
Descriptive Statistics. Among other techniques, it uses measures of central tendency, variability
(dispersion), and position, as well as tables and graphs.
Currently, we have excellent software for statistical analysis, and we rarely perform calculations
by hand. However, knowing how these calculations are done can enhance the understanding of the results
obtained with software. Another very important point is knowing which descriptive statistical methods
should be used for different types of variables. Considering these aspects, we present below the
calculation using elementary mathematics and interpretation of the main tools of descriptive statistics.

Working with information or data

The results of quantitative research are translated into information or data, which can express
either quantity or quality. Data that express a quantity are called quantitative data or variables,
while those that express a quality are called qualitative (categorical) data or variables. Weight, height,
body mass index, and hemoglobin values are examples of quantitative variables. Classification according to
gender (male/female), family income (low/middle/high), and education level (low/middle/high) are
examples of qualitative or categorical variables. We have two types of categorical data: nominal and
ordinal. In the nominal categorical type, all categories have the same degree of importance. As an
example, we can mention gender, where male and female are categories with the same degree of
importance. On the other hand, in the ordinal categorical type, the categories have different degrees of
importance. For example, when we talk about high income, we know that these are families with higher
incomes than families with middle and low incomes. We also know that low income means lower income
than middle- and high-income groups.

1
André Moreno Morcillo, PhD, MD from the State University of Campinas, São Paulo, Brazil
ResearchGate: https://www.researchgate.net/profile/Andre-Morcillo/publications
Descriptive statistics: organizing, summarizing … Morcillo AM

Identifying the variable type correctly is very important, because descriptive statistics methods
and data analysis techniques are specific to each type of variable.

Descriptive statistics methods for quantitative data

When the data set is small, it is enough to present it in a simple way. There is no need to use
sophisticated techniques or resources. Consider the ages (in years) of 8 children: [7, 6, 4, 7, 7, 8, 7, 12].
A simple way to describe them would be: the youngest is 4 years old, the oldest is 12 years old, and the
most common age is 7 years.
Try repeating the same process with a slightly larger group. Below, we present the ages (in years)
of 60 patients.
20 48 30 44 97 76 24 48 20 68
89 60 33 53 64 5 24 54 82 67
8 76 65 7 33 37 31 70 10 84
1 60 89 63 22 58 35 45 44 72
3 34 27 2 66 66 33 4 48 20
91 98 58 43 63 96 43 7 92 81

The techniques or methods that will be presented were developed to facilitate the presentation
of large sets of data, enabling their reading and interpretation in a systematic and quick way.
To present quantitative data, some numerical methods are used, with the aim of describing what
occurs in the center of the distribution and how the data is dispersed (variability). These methods, known
as summary measures, can be divided into:

• Central tendency measures:

arithmetic mean, geometric mean, median and mode.
• Variability (dispersion) measures:
variance, standard deviation, range, interquartile range, interquartile interval and coefficient of
variation.
• Position measures:
quantiles and z-scores
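
As a rough illustration of how these summary measures condense the 60 ages listed above, the following Python sketch uses only the standard library `statistics` module. The names `ages` and `summary` are illustrative choices and do not come from the original text.

```python
import statistics

# Ages (in years) of the 60 patients listed in the text
ages = [
    20, 48, 30, 44, 97, 76, 24, 48, 20, 68,
    89, 60, 33, 53, 64, 5, 24, 54, 82, 67,
    8, 76, 65, 7, 33, 37, 31, 70, 10, 84,
    1, 60, 89, 63, 22, 58, 35, 45, 44, 72,
    3, 34, 27, 2, 66, 66, 33, 4, 48, 20,
    91, 98, 58, 43, 63, 96, 43, 7, 92, 81,
]

summary = {
    "n": len(ages),
    "minimum": min(ages),
    "maximum": max(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "std_dev": statistics.stdev(ages),  # sample standard deviation (divides by n - 1)
}

for name, value in summary.items():
    if isinstance(value, float):
        print(f"{name}: {value:.1f}")
    else:
        print(f"{name}: {value}")
```

A handful of numbers replaces a table of 60 values, which is the purpose of the summary measures described in the following sections.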

Measures of central tendency

1. Arithmetic Mean

The arithmetic mean (mean) is one of the most used measures to describe central tendency. Its calculation
is very easy: we sum the measured values and then divide the result by the number of cases evaluated.

We indicate the mean of a population by $\mu$ and the mean of a sample by $\bar{x}$.

$$\mu = \frac{\sum X}{N} \quad \text{(see footnote 2)}$$

$\sum X$ = sum of population values; $N$ = number of cases in the population

$$\bar{x} = \frac{\sum x}{n}$$

$\sum x$ = sum of sample values; $n$ = number of elements in the sample

Example: given the set of numbers [99, 100, 101, 102, 105], its arithmetic mean will be:

$$\bar{x} = \frac{99 + 100 + 101 + 102 + 105}{5} = \frac{507}{5} = 101.4$$

The arithmetic mean has a disadvantage: it is greatly influenced by extreme values (very large or
very small) in relation to the data set. In the example above, if we change the value 100 to 60, the mean
becomes:

$$\bar{x} = \frac{60 + 99 + 101 + 102 + 105}{5} = \frac{467}{5} = 93.4$$

Changing a single element caused a decrease of 8 units in the group mean. Thus, the arithmetic
mean is a good measure of central tendency when the data have a symmetric distribution (see footnote 3).
If the data are positively or negatively skewed, the mean is not a good indicator of the center of the
distribution. When the data distribution is skewed, we should use the geometric mean or the median.
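
The effect of a single extreme value on the mean can be checked with a short Python sketch. It reuses the two sets from the example above; the variable names are illustrative.

```python
from statistics import mean, median

original = [99, 100, 101, 102, 105]
modified = [60, 99, 101, 102, 105]  # the value 100 replaced by 60

# The mean drops by 8 units, while the median does not move
print(mean(original), median(original))  # 101.4 101
print(mean(modified), median(modified))  # 93.4 101
```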

2. Geometric Mean

The geometric mean (gm) is a good measure of central tendency for data greater than zero and
positively skewed, as occurs with the results of antibody titers, weight, body mass index, etc. Its
calculation is given by the formula:

$$gm = \sqrt[N]{x_1 \cdot x_2 \cdot x_3 \cdots x_N}$$

It can also be calculated in a much more practical way. To do this, we work with the logarithms
(logs) of the data (see footnote 4). We determine the arithmetic mean of the logarithms and then
calculate the antilogarithm of the mean of the logs. The antilogarithm of the mean of the logs is equal
to the geometric mean. Let's look at a simple example: consider the five values
[10, 100, 1000, 10000, 100000]. Initially we calculate the mean of the logarithms ($\bar{x}_{\log}$).

$$\bar{x}_{\log} = \frac{\log(10) + \log(100) + \log(1000) + \log(10000) + \log(100000)}{5} = \frac{1 + 2 + 3 + 4 + 5}{5} = 3$$

Next, we determine the antilogarithm of the mean of the logarithms ($\bar{x}_{\log}$):

$$gm = 10^{\bar{x}_{\log}} = 10^{3} = 1000$$

2
The complete formula is $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$. We write $\sum X$ in place of $\sum_{i=1}^{N} x_i$ for convenience and ease.
3
An efficient way to assess the symmetry of a distribution is through a histogram.
4
In this text we use logarithms in base 10 ($\log_{10} x$).
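
The logarithm-based calculation of the geometric mean described above can be sketched in Python as follows. The function name `geometric_mean_log10` is a hypothetical helper written for this illustration; it follows the text in using base-10 logarithms.

```python
import math


def geometric_mean_log10(values):
    """Geometric mean computed as the antilog of the mean of base-10 logs."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean requires values greater than zero")
    mean_of_logs = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_of_logs


titers = [10, 100, 1000, 10000, 100000]
print(geometric_mean_log10(titers))  # approximately 1000
```

Python 3.8 and later also provide `statistics.geometric_mean`, which should give the same result up to floating-point rounding.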

3. Median

If we sort the data in ascending order, the median (md) is the value of the variable observed
in the center of the distribution. The median divides the ordered data into two groups that have the same
number of cases. Half of the cases have values lower than the median and the other half have values
greater than it. The median is equivalent to the 50th percentile and the 2nd quartile. To determine it, the
sample must first be sorted in ascending order, and then the element that occupies the central position
must be located. The value of the variable for this element is the median. In the previous example, given
the set of numbers 99, 100, 101, 102, 105:

Order 1st 2nd 3rd 4th 5th

Value 99 100 101 102 105

The center of the distribution is occupied by the 3rd element, whose value is 101. The median
of this group is 101 (md=101). Note that two elements of the distribution are smaller than the median (99
and 100) and two elements are larger than the median (102 and 105).
The most time-consuming step is to identify the element that is in the center of the distribution
of the data. Excel has a routine that automatically sorts data, which greatly simplifies the work. However,
identifying the central element is still a problem when we want to determine the median manually. We
can employ the following procedures to facilitate the work.

a) When the number of cases is odd, there is always an element in the center of the distribution, whose
position is given by:

$$\text{Position} = \frac{N + 1}{2}$$

$N$ = number of cases

b) When the number of cases is even, we have two elements in the center of the distribution, and the
median will be the mean of them. The positions of the two elements can be determined by:

$$\text{Position of first element} = \frac{N}{2} \qquad \text{Position of second element} = \frac{N}{2} + 1$$

$N$ = number of cases

For example, consider the 10 values: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20. Applying the formulas above we will
have:

$$\text{Position of first element} = \frac{N}{2} = \frac{10}{2} = 5 \qquad \text{Position of second element} = \frac{N}{2} + 1 = 6$$

Position   1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th

Value        2    4    6    8   10   12   14   16   18    20

The median will be the arithmetic mean of the values of the 5th and 6th elements.

$$Md = \frac{10 + 12}{2} = 11$$

The value 11, which was estimated by interpolation based on the values of the two central
elements of the distribution, does not belong to the data. In this other example with 6 elements 100, 105,
101, 98, 99, 103:

1. Initially we sort the data: 98, 99, 100, 101, 103, 105
2. Next, we determine the two central elements:

$$\text{Position of first element} = \frac{N}{2} = \frac{6}{2} = 3 \qquad \text{Position of second element} = \frac{N}{2} + 1 = 4$$

Position   1st  2nd  3rd  4th  5th  6th

Value       98   99  100  101  103  105

3. Now, we can calculate the median:

$$Md = \frac{100 + 101}{2} = 100.5$$

The median is not influenced by extreme values, unlike the arithmetic mean; therefore, it can be
used with both symmetrical and asymmetrical distributions. In the example above, if the sixth element
were changed to 105,000, the median of the distribution would be the same.

Position   1st  2nd  3rd  4th  5th  6th

Value       98   99  100  101  103  105,000

$$Md = \frac{100 + 101}{2} = 100.5$$
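
The two position rules for the median, one for an odd and one for an even number of cases, can be written as a short Python function. This is a minimal sketch; the function name `median_by_position` is hypothetical, and positions are 1-based as in the text.

```python
def median_by_position(values):
    """Median found with the position rules for odd and even N."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one value is required")
    if n % 2 == 1:
        # Odd N: one central element at position (N + 1) / 2
        position = (n + 1) // 2
        return ordered[position - 1]
    # Even N: mean of the elements at positions N / 2 and N / 2 + 1
    first = n // 2
    second = first + 1
    return (ordered[first - 1] + ordered[second - 1]) / 2


print(median_by_position([99, 100, 101, 102, 105]))            # 101
print(median_by_position([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # 11.0
print(median_by_position([100, 105, 101, 98, 99, 103]))        # 100.5
print(median_by_position([98, 99, 100, 101, 103, 105000]))     # 100.5
```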


4. Mode

The mode (mo) is the most frequent value in a data distribution. We can have data distributions
with no mode (amodal), with one mode (unimodal), with two modes (bimodal), or with more than two
modes (multimodal). In the previous example [100,105,101,98,99,103], all values occur once, therefore,
the distribution has no mode (amodal). But with a group of 15 children whose ages are [4, 5, 6, 7, 7, 7, 7,
7, 7, 7, 7, 7, 8, 8, 9], the mode is 7 because 7 is the most frequent age.
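
A possible way to find the mode or modes in Python is sketched below. The function `modes` is a hypothetical helper; it adopts the convention used in the text, assumed here for illustration, that a distribution in which every value occurs equally often has no mode.

```python
from collections import Counter


def modes(values):
    """Return the list of modes; an empty list means the data are amodal."""
    counts = Counter(values)
    if not counts:
        return []
    # Convention assumed here: if several distinct values all occur
    # equally often, no value stands out and there is no mode.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([100, 105, 101, 98, 99, 103]))                       # []
print(modes([4, 5, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 9]))      # [7]
```

A result with two elements corresponds to a bimodal distribution, and more than two to a multimodal one.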

Measures of variability (dispersion)

1. Range
The range is the difference between the largest and smallest observed values. It is a measure of
dispersion calculated from only these two values, ignoring all the others. Therefore, it is a
limited measure of the dispersion of the data set.
Considering the ages (years) of a group of 10 children [4, 5, 5, 6, 6, 6, 7, 7, 8, 8], the lowest
observed value is 4 and the highest value is 8. The range is 4 years.

$$\text{Range} = 8 - 4 = 4 \text{ years}$$

Now, consider two datasets [10, 11, 12, 13, 14, 15, 60] and [10, 55, 56, 57, 58, 59, 60]. In both,
the range is equal to 50; however, this value does not reflect the real variability of the groups: in each
set, six of the seven values span only 5 units, and a single extreme value (60 in the first set, 10 in the
second) is responsible for the range of 50.
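
The limitation of the range can be made concrete with a small Python sketch that compares it with the interquartile range (presented later in the text) for the two datasets above. It uses `statistics.quantiles` from the standard library, whose default "exclusive" method places quartiles at positions based on N + 1; the variable and function names are illustrative.

```python
from statistics import quantiles

group_a = [10, 11, 12, 13, 14, 15, 60]
group_b = [10, 55, 56, 57, 58, 59, 60]


def value_range(values):
    """Difference between the largest and smallest values."""
    return max(values) - min(values)


def interquartile_range(values):
    """3rd quartile minus 1st quartile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


for name, group in (("A", group_a), ("B", group_b)):
    print(name, value_range(group), interquartile_range(group))
```

Both groups have a range of 50, yet the central half of each group spans only a few units, which shows how a single extreme value dominates the range.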

2. Variance

Variance is a measure of variability (dispersion) that takes into account all values in the group.
We represent the variance of a population by $\sigma^2$ and of a sample by $s^2$.
To determine the variance, we calculate the deviation of each element from the group mean,
$(x - \mu)$. Next, we square each deviation, $(x - \mu)^2$. Finally, we divide the sum of the squared
deviations by the number of elements ($N$). The formula is:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

When working with samples, we want the variance $s^2$ to be a good estimate of the population
variance $\sigma^2$. Considering this fact, we divide by $(n - 1)$ instead of $n$. The variance is calculated as follows:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Example: considering the ages (years) of a group of 10 children [7, 5, 6, 7, 8, 6, 6, 8, 5, 4], initially we
calculate the mean:

$$\bar{x} = \frac{7 + 5 + 6 + 7 + 8 + 6 + 6 + 8 + 5 + 4}{10} = \frac{62}{10} = 6.2$$

Next, we create a table with three columns to facilitate the calculations. In the first column we
put the ages. In the second, the differences between each age and the mean of the group, $(x - \bar{x})$, and,
in the third, the values in the second column squared, $(x - \bar{x})^2$.

Ages    $(x - \bar{x})$    $(x - \bar{x})^2$

7          0.8        0.64
5         -1.2        1.44
6         -0.2        0.04
7          0.8        0.64
8          1.8        3.24
6         -0.2        0.04
6         -0.2        0.04
8          1.8        3.24
5         -1.2        1.44
4         -2.2        4.84

Total                15.6

Now, we calculate the variance.

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{15.6}{9} = 1.7 \text{ years}^2$$
With some simple algebraic transformations, we can develop the numerator of the variance
formula, $\sum (x - \bar{x})^2$, arriving at an equivalent expression that has the advantage of not using the mean.

$$\sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$

Thus, we now have a practical way to calculate the variance:

$$s^2 = \frac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n - 1}$$
Returning to the previous example and applying this new formula, we have:

Ages ($x$)    $x^2$

7           49
5           25
6           36
7           49
8           64
6           36
6           36
8           64
5           25
4           16

Total 62   400

$$s^2 = \frac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n - 1} = \frac{400 - \dfrac{62^2}{10}}{9} = \frac{400 - 384.4}{9} = \frac{15.6}{9} = 1.7 \text{ years}^2$$
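
Both ways of calculating the sample variance can be checked against each other with a few lines of Python. This is a minimal sketch using the ages from the example; the variable names are illustrative.

```python
from statistics import variance

ages = [7, 5, 6, 7, 8, 6, 6, 8, 5, 4]
n = len(ages)

# Definition: sum of squared deviations from the mean, divided by n - 1
mean_age = sum(ages) / n
sum_sq_dev = sum((x - mean_age) ** 2 for x in ages)
var_definition = sum_sq_dev / (n - 1)

# Practical formula: uses only the sum of x and the sum of x squared
sum_x = sum(ages)
sum_x2 = sum(x * x for x in ages)
var_practical = (sum_x2 - sum_x ** 2 / n) / (n - 1)

print(sum_x, sum_x2)             # 62 400
print(round(var_definition, 1))  # 1.7
print(round(var_practical, 1))   # 1.7
print(round(variance(ages), 1))  # 1.7 (standard library, also divides by n - 1)
```

The practical formula is convenient for hand calculation. With very large values or many observations, floating-point implementations usually prefer the definition-based form, because subtracting two large, nearly equal sums can lose precision.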

3. Standard deviation

Variance is an excellent measure of variability; however, it is rarely used in publications. As we
squared the deviations, we also squared the units of measurement. Thus, the unit of variance for weight
will be kg², for height will be cm², etc. The interpretation of these dispersion units becomes very confusing
for the reader. Considering these facts, the square root of the variance is used, which is called the standard
deviation. We indicate the standard deviation of a population by $\sigma$ and of a sample by $s$.

$$\sigma = \sqrt{\sigma^2} \quad \text{and} \quad s = \sqrt{s^2}$$

The standard deviation from the previous example is: $s = \sqrt{s^2} = \sqrt{1.7} = 1.3$ years.
Because the standard deviation is the square root of the variance, it has the original unit in which
the data were measured. The standard deviation represents how far, on “average”, each observation is
from the mean of the group. The closer the values are to the mean, the smaller the standard deviation
will be, and the further they are from the mean, the larger it will be.
Now, we present a new group of 10 children, to calculate the standard deviation of age and
compare it with the previous example: 4, 8, 9, 5, 12, 13, 14, 6, 5, 5.
The arithmetic mean age of this group is:

$$\bar{x} = \frac{81}{10} = 8.1 \text{ years}$$

The standard deviation is:

$$s = \sqrt{\frac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{781 - \dfrac{81^2}{10}}{9}} = 3.7 \text{ years}$$

Note that in the first group we had a mean of 6.2 years and a standard deviation of 1.3 years. In the
latter, the mean is 8.1 years and the standard deviation is 3.7 years.
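
The comparison between the two groups of children can be reproduced with the standard library, as in the sketch below. `statistics.stdev` computes the sample standard deviation (dividing by n − 1); the names `group_1` and `group_2` are illustrative.

```python
from statistics import mean, stdev

group_1 = [7, 5, 6, 7, 8, 6, 6, 8, 5, 4]
group_2 = [4, 8, 9, 5, 12, 13, 14, 6, 5, 5]

for label, ages in (("first group", group_1), ("second group", group_2)):
    # Mean and sample standard deviation, rounded to one decimal place
    print(label, round(mean(ages), 1), round(stdev(ages), 1))
# first group 6.2 1.3
# second group 8.1 3.7
```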

The variance and standard deviation are good parameters of variability when the data has a
symmetric distribution. If the data is positively or negatively skewed, the variance and standard deviation
are not good indicators of the variability of the distribution. When the data distribution is skewed, we
should use the interquartile range or interquartile interval.

4. Coefficient of variation

The coefficient of variation (CV) is the ratio between the standard deviation and the sample
mean. The coefficient of variation, expressed as a percentage, is a measure used to compare the
dispersions of two or more groups.

$$CV = \frac{s}{\bar{x}} \cdot 100$$

Considering the two previous examples:

In the first group of children, we have $\bar{x} = 6.2$ and $s = 1.3$

$$CV = \frac{s}{\bar{x}} \cdot 100 = \frac{1.3}{6.2} \cdot 100 = 21.0\%$$

In the second group of children, we have $\bar{x} = 8.1$ and $s = 3.7$

$$CV = \frac{s}{\bar{x}} \cdot 100 = \frac{3.7}{8.1} \cdot 100 = 45.7\%$$

The relative variability (dispersion) of the second group is about 2.2 times that of the first.
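
A short Python sketch of the coefficient of variation follows. The helper name `coefficient_of_variation` is hypothetical. Note that the code works with the unrounded standard deviation, so it gives roughly 21.2% and 46.0% instead of the 21.0% and 45.7% obtained above from the rounded values 1.3 and 3.7; the ratio between the groups is still about 2.2.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


group_1 = [7, 5, 6, 7, 8, 6, 6, 8, 5, 4]
group_2 = [4, 8, 9, 5, 12, 13, 14, 6, 5, 5]

cv_1 = coefficient_of_variation(group_1)
cv_2 = coefficient_of_variation(group_2)

print(f"CV first group:  {cv_1:.1f}%")
print(f"CV second group: {cv_2:.1f}%")
print(f"ratio: {cv_2 / cv_1:.1f}")
```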

Measures of position

1. Quartiles
A quartile is any of the three values that divide the ordered set of data into four groups, each
containing 25% of the cases. The 1st quartile separates the 25% of cases with the lowest values from
the remaining 75%. The 2nd quartile divides the group into two subgroups with an equal number of
cases, with half of the cases having values lower and the other half having values higher than the 2nd
quartile. The 3rd quartile separates the group with the highest values, also with 25% of cases, from the
remaining 75% that have lower values.
The 1st quartile is equivalent to the 25th percentile, the 2nd quartile is equivalent to the 50th percentile
and the median, while the 3rd quartile is equivalent to the 75th percentile.

            25%            25%            25%            25%

Minimum      1st Quartile   2nd Quartile   3rd Quartile   Maximum

We call the difference between the 3rd and 1st quartiles the interquartile range (IQR). It expresses
the variability (dispersion) of the cases that occupy the center of the distribution, excluding the smallest
25% and the largest 25%. The interquartile interval is defined by the values of the 1st and 3rd quartiles.

$$IQR = \text{3rd quartile} - \text{1st quartile}$$

How to determine the quartiles?

We initially sort the data and then identify the three values that divide the group into four
subgroups, each with an equal number of cases. To find the position of the element that corresponds to
the 1st quartile ($P_{Q1}$), we use the formula $P_{Q1} = (N+1)/4$; for the 2nd quartile, $P_{Q2} = 2(N+1)/4$;
and for the 3rd quartile, $P_{Q3} = 3(N+1)/4$.
When the position (P) of a quartile is an integer, there is an element in this position in the
researcher's data. Therefore, locate it and check the value of the variable under study. Its value is the
quartile.
When the quartile position P is a decimal number, the quartile is determined by interpolation
between the two elements of the data set whose positions bracket P. For example, if $P_{Q1}$ is 8.3, we use
the values of the 8th and 9th elements in the interpolation. The decimal part, 0.3, is the weighting factor.
The formula is:
Quartile = x(8th element) + 0.3 × [x(9th element) − x(8th element)]
The quartile value is higher than the value of the 8th element and lower than the value of the 9th
element. For example, if the values of the 8th and 9th elements are 90 and 100, respectively, the quartile
will be: Quartile = 90 + 0.3 × [100 − 90] = 93.
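
The position rule and the interpolation step can be combined in a short Python function. This is a sketch under the assumptions of the text (1-based positions, k(N+1)/4); the function name `quartile` and the clamping of positions that fall outside the data in very small samples are choices made for this illustration. The example reuses the ten values 2, 4, ..., 20 from the median section.

```python
from statistics import quantiles


def quartile(values, k):
    """k-th quartile (k = 1, 2 or 3) at position k * (N + 1) / 4."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / 4   # 1-based position, possibly fractional
    lower = int(position)        # position of the element just below P
    fraction = position - lower  # decimal part used as weighting factor
    if lower < 1:                # position falls before the first element
        return ordered[0]
    if lower >= n:               # position falls at or after the last element
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print([quartile(data, k) for k in (1, 2, 3)])  # [5.5, 11.0, 16.5]

# The standard library's default ("exclusive") method uses the same rule
print(quantiles(data, n=4))                    # [5.5, 11.0, 16.5]
```

Statistical packages implement several different quartile definitions, so results from other software may differ slightly from this rule, especially in small samples.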

2. Z-scores
The z-score represents the relative position of the elements in a group in relation to their mean.
The z-score expresses, in standard deviation units, the distance between a given value and the
mean. To calculate the z-score, we use the formula:

$$z = \frac{x - \bar{x}}{s}$$

$x$: variable value; $\bar{x}$: sample mean; $s$: sample standard deviation

For example, given the set of numbers [100, 101, 105.2, 99.2, 100.5], we first calculate the
mean and the standard deviation: $\bar{x} = 101.18$ and $s = 2.34$. To determine the z-score of 105.2, we do the
following:

$$z = \frac{x - \bar{x}}{s} = \frac{105.2 - 101.18}{2.34} = \frac{4.02}{2.34} = +1.72$$

The z-score of 105.2 is +1.72, which means that 105.2 is 1.72 standard deviation units above the
mean of the data group.
The z-score is commonly used in the assessment of the growth of children and adolescents, as
well as in the standardization of variables for machine learning processing.
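
A minimal Python sketch of the z-score calculation, using the five numbers from the example, is shown below. The function name `z_scores` is illustrative; the sample standard deviation (n − 1) is used, as in the text.

```python
from statistics import mean, stdev


def z_scores(values):
    """z-score of every value: distance from the mean in standard deviations."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    return [(x - m) / s for x in values]


data = [100, 101, 105.2, 99.2, 100.5]
for x, z in zip(data, z_scores(data)):
    print(f"{x:6.1f}  z = {z:+.2f}")
```

Values below the mean receive negative z-scores, and the z-scores of a data set always sum to zero (up to floating-point rounding).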

Data quality assessment

Initially, we should perform a careful assessment of the data, looking for potential problems. This
important step precedes the final analysis. For this evaluation, the most important thing is the experience
of the person who will carry out the analysis. It is essential to know the nature and distribution of each of
the variables under study, as well as to evaluate the “quality” of the data that will be analyzed.
When we talk about “quality”, we are referring to the methodological rigor used during
measurements, typing errors, outliers, etc. After this preliminary evaluation, including the assessment of
the distribution of the data, descriptive analysis and the application of statistical tests can begin.
Special care should be taken with outliers. These atypical data are those that are very far
from the center of the distribution. They can be genuine observations, although sometimes they result
from errors in measurement, notation, or even typing. Outliers are values that are greater than
3rd quartile + 1.5 × IQR or less than 1st quartile − 1.5 × IQR, where IQR is the interquartile range.

For example, in a study on the height of school-age children, we found cases with a value of
220cm and 240cm. Most likely, there was an error at the time of the anthropometric examination, when
taking notes or even when typing, as it is impossible for there to be school-age children so tall. If these
cases are not removed from the group, there will be serious distortion in the mean and standard
deviation, compromising the statistical tests.
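
The outlier rule based on 1.5 × IQR can be sketched in Python as follows. The heights below are hypothetical values invented for this illustration (they are not data from the study mentioned above), and the function name `find_outliers` is also an assumption. Quartiles come from `statistics.quantiles`, whose default method uses positions based on N + 1.

```python
from statistics import quantiles


def find_outliers(values):
    """Values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]


# Hypothetical heights (cm) of school-age children, with two suspicious entries
heights_cm = [118, 121, 123, 125, 126, 128, 130, 131, 133, 135, 220, 240]
print(find_outliers(heights_cm))  # [220, 240]
```

Flagged values should be checked against the original records before any decision to correct or exclude them.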
The box plot graph is a very useful and practical tool for conducting this preliminary analysis of
quantitative data. This graph is constructed from five points: the minimum, the first quartile, the second
quartile, the third quartile, and the maximum.
In a Cartesian coordinate system, we begin by marking the minimum and maximum. Next, we
draw a rectangle that passes through the first quartile and the third quartile. Then, we mark the median
inside the rectangle. Finally, we draw two straight-line segments with length equal to 1.5 times the
interquartile range (IQR). The first straight segment is drawn above the upper edge of the rectangle, and
the other is drawn below the lower edge. Cases whose values fall outside of the two extremes of the
straight-line segments are considered outliers and must be reevaluated before proceeding with data
analysis. The figure below shows a box plot.

Descriptive Statistics of categorical or qualitative data

To present qualitative data, we determine frequency distributions and present them in tables
and graphs.


1. Simple frequency distribution

To obtain a frequency distribution of categorical data, we simply count how many cases there
are in each category. The frequencies of the categories can be expressed as their absolute number or as
a percentage of the total. Calculating the percentage of a given category is very simple: divide the absolute
frequency by the total and multiply by 100. In the next example, the percentage for the eutrophy group
would be:
Eutrophy (%) = 412 / 521 × 100 = 79.07869

We generally round to one decimal place which, in the example above, results in 79.1%.

Nutritional assessment of 521 preschool children using Gomez's criteria.

                          (N)     (%)

Eutrophy                  412    79.1
Mild Malnutrition         104    20.0
Moderate Malnutrition       5     1.0
Severe Malnutrition         0     0.0

Total                     521   100.0

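Sometimes, it may be of interest to the researcher to also present the cumulative frequency. Each
cumulative percentage is obtained from the cumulative count of cases up to and including that category,
divided by the total, which avoids accumulating rounding errors. See the next table.
Nutritional assessment of 521 preschool children using Gomez's criteria.

                          (N)     (%)    Cumulative (%)

Eutrophy                  412    79.1     79.1
Mild Malnutrition         104    20.0     99.0
Moderate Malnutrition       5     1.0    100.0
Severe Malnutrition         0     0.0    100.0

Total                     521   100.0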
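
The frequency table can be reproduced from the raw counts with a few lines of Python. This is a minimal sketch; the dictionary `counts` holds the absolute frequencies from the table, and the other names are illustrative. Cumulative percentages are computed from cumulative counts, not by adding rounded percentages.

```python
counts = {
    "Eutrophy": 412,
    "Mild Malnutrition": 104,
    "Moderate Malnutrition": 5,
    "Severe Malnutrition": 0,
}

total = sum(counts.values())
rows = []
running = 0  # cumulative number of cases
for category, n in counts.items():
    running += n
    percent = round(100 * n / total, 1)
    cumulative_percent = round(100 * running / total, 1)
    rows.append((category, n, percent, cumulative_percent))

for category, n, percent, cumulative_percent in rows:
    print(f"{category:<22} {n:>4} {percent:>6.1f} {cumulative_percent:>6.1f}")
print(f"{'Total':<22} {total:>4}")
```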

When working with quantitative variables, it becomes necessary to group the data into
categories to present them in the form of a frequency distribution. The data is grouped into class intervals,
the number of which should not be small or very large, and it is recommended that it range from 5 to 20.
There are some formulas to determine the number of classes, but logic and common sense seem to be
more useful. It is necessary to keep in mind that class intervals must be established in such a way that all
data can be included in only one of the classes. Below we have a frequency distribution of a quantitative
variable (age in months) grouped into class intervals.


Age distribution (months) of 521 preschool children.

Age (months)      (N)     (%)

36.0 –| 48.0       35     6.7
48.0 –| 60.0       70    13.4
60.0 –| 72.0      168    32.2
72.0 –| 84.0      204    39.2
84.0 –| 96.0       44     8.4
Total             521   100.0

2. Distribution in relation to two qualitative variables – contingency tables

In this case, the objective is to build a table containing information about two or more variables
of a population or sample.
Distribution of 521 preschool children by age group and sex.

Age (months)      Female         Male           Total

36.0 – 47.9        15 (42.9)      20 (57.1)      35 (100.0)
48.0 – 59.9        41 (58.6)      29 (41.4)      70 (100.0)
60.0 – 71.9        81 (48.2)      87 (51.8)     168 (100.0)
72.0 – 83.9        99 (48.5)     105 (51.5)     204 (100.0)
84.0 – 95.9        24 (54.5)      20 (45.5)      44 (100.0)

Total             260 (49.9)     261 (50.1)     521 (100.0)

Values are N (%); percentages are calculated within each row.
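
The row percentages and marginal totals of the contingency table can be recomputed from the cell counts, as in the following Python sketch. The nested dictionary `table` and the helper `row_percentages` are illustrative names; only the counts come from the table above.

```python
# Cell counts from the table: age group (months) by sex
table = {
    "36.0-47.9": {"Female": 15, "Male": 20},
    "48.0-59.9": {"Female": 41, "Male": 29},
    "60.0-71.9": {"Female": 81, "Male": 87},
    "72.0-83.9": {"Female": 99, "Male": 105},
    "84.0-95.9": {"Female": 24, "Male": 20},
}


def row_percentages(row):
    """Percentage of each cell relative to its row total."""
    row_total = sum(row.values())
    return {key: round(100 * value / row_total, 1) for key, value in row.items()}


column_totals = {
    sex: sum(row[sex] for row in table.values()) for sex in ("Female", "Male")
}
grand_total = sum(column_totals.values())

for age_group, row in table.items():
    print(age_group, row, row_percentages(row))
print("Total", column_totals, row_percentages(column_totals), grand_total)
```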

3. Graphical presentation

a) Pie charts
Pie charts are recommended to present frequency distributions. The area of the circle assigned
to each category is proportional to its frequency. The most practical way to determine it, knowing that
the total (100%) corresponds to an angle of 360º, is: Desired angle = (% x 360)/100. For example, for a
frequency of 45% we must take an angle of 162º: Desired angle = (45 x 360)/100 = 162º.
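
The angle rule for pie chart sectors is a one-line calculation. The sketch below, with the hypothetical helper `sector_angle`, applies it to the 45% example.

```python
def sector_angle(percent):
    """Angle in degrees of the sector for a category with the given percentage."""
    return percent * 360 / 100


print(sector_angle(45))   # 162.0
print(sector_angle(100))  # 360.0
```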

Below we present an example of a pie chart.


b) Bar Charts
In the same way as the previous one, this type of graph is recommended for presenting frequency
distributions. In this case, the frequency is related to the height of the bar, and the bars must have the
same width. Below we present a bar graph expressing the distribution of frequencies in relation to family
per capita income.

How to select the appropriate technique for publishing the results?

Guidelines for authors of major medical journals (JAMA, NEJM, BMJ, etc.) are an excellent source
of information. Spriestersbach et al. (2009), Lang & Altman (2015), and Ou et al. (2020) provide general
guidance on the proper presentation of results in articles.
The choice of the best technique should be guided by the type of variable. Additionally, in the
case of quantitative variables, the distribution shape (symmetrical, positively skewed, or negatively
skewed) should be considered. See the examples presented below.
Amorin et al. (2021) conducted a cross-sectional study with 26 children (6-12 years old) from
Londrina, Brazil, with the aim of evaluating eosinophil counts in relation to vitamin D levels. The patients
were stratified into two groups based on the median of vitamin D. Note that for some quantitative
variables, the mean and standard deviation were used, while for others, the median and interquartile
interval were employed. The criterion for choosing the technique to be used was the shape of the
variable’s distribution (symmetrical, positively skewed, or negatively skewed).

Shakti et al. (2014) selected 543 patients with idiopathic pericarditis and pericardial effusion
registered in the Pediatric Health Information System database (PHIS) – USA, with the aim of
characterizing the patients and hospitalization data. Table 1 presents the demographic data and clinical
characteristics of the patients. Please note that the authors chose to present the results of quantitative
variables in the form of median and interquartile range.


Bibliography

Altman DG. Practical statistics for medical research. 1st ed. London: Chapman & Hall, 1991.
Amorin CLC, Oliveira JM, Rodrigues A, Furlanetto KC, Pitta F. J Bras Pneumol. 2021;47(1):e20200279.
doi.org/10.36416/1806-3756/e20200279.
Bland M. An introduction to medical statistics. 2nd ed. New York: Oxford University Press, 1995.
Daniel WW. Biostatistics – A foundation for analysis in the health sciences. 6th. Edition. New York: John
Wiley & Sons, Inc., 1995.
Devore JL. Probability and Statistics for Engineering and the Sciences. 8th Ed. Boston: Brooks/Cole,
Cengage Learning, 2012.
Hazra A, Gogtay N. Biostatistics Series Module 1: Basics of Biostatistics. Indian J Dermatol. 2016; 61(1):
10–20.
Lang TA, Altman DG. Basic statistical reporting for articles published in Biomedical Journals: The
‘‘Statistical Analyses and Methods in the Published Literature’’ or the SAMPL Guidelines. International
Journal of Nursing Studies. 2015; 52:5–9.
Lowry L. Concepts and Applications of Inferential Statistics. URL: http://vassarstats.net/textbook/.
Accessed: 28/10/2023.
Ou F-S, Le-Rademacher JG, Ballman KV, Adjei AA, Mandrekar SJ. Guidelines for Statistical Reporting in
Medical Journals. J Thorac Oncol. 2020 Nov;15(11):1722-1726. doi: 10.1016/j.jtho.2020.08.019.
Shakti D, Hehn R, Gauvreau K, Sundel RP, Newburger JW. Idiopathic Pericarditis and Pericardial Effusion
in Children: Contemporary Epidemiology and Management. J Am Heart Assoc. 2014; 3(6): e001483.
Spriestersbach A, Röhrig B, du Prel J-B, Gerhold-Ay A, Blettner M. Descriptive statistics: the specification
of statistical measures and their presentation in tables and graphs. Part 7 of a series on evaluation of
scientific publications. Dtsch Arztebl Int. 2009; 106(36):578-83. doi: 10.3238/arztebl.2009.0578.
Tukey JW. Exploratory data analysis. London: Addison-Wesley Publishing Company, 1977.
Zar J. Biostatistical analysis. 2nd ed. Englewood Cliffs: Prentice-Hall Inc., 1984.

o Ψo

View publication stats
