Statistics Corner

Types of biological variables


Shreemathi S. Mayya, Ashma D Monteiro, Sachit Ganapathy

Department of Statistics, Manipal University, Manipal-576104, Karnataka, India


Correspondence to: Dr. Shreemathi S. Mayya, Associate Professor (Sr. Scale). Department of Statistics. Manipal University, Manipal-576104,
Karnataka, India. Email: [email protected].

Abstract: Identification and description of the variables used in a study is a necessary component of biomedical research. Statistical analyses depend on the types of variables involved in the study. In this short article, we introduce the different types of biological variables. Researchers have to be familiar with the types of variables they are dealing with in order to decide on appropriate graphs/diagrams, summary measures and statistical analyses.

Keywords: Biological variables; discrete variables; continuous variables; categorical variables

Submitted Apr 19, 2017. Accepted for publication May 09, 2017.
doi: 10.21037/jtd.2017.05.75
View this article at: http://dx.doi.org/10.21037/jtd.2017.05.75

Introduction

The research question is the initial and integral step in any research work. Depending on the research questions to be answered and the data available, researchers decide on the statistical methods to be used for analysis. Researchers have to be acquainted with the variety of variables involved in their study to choose appropriate diagrams/graphs and summary measures for presentation, and valid statistical tests for the analysis of data.

Information collected about a sample of subjects (often patients) comprises characteristics which vary among the subjects. Any characteristic which varies from individual to individual is called a variable (1). Characteristics such as age, sex, height, weight, body mass index (BMI), blood group, body temperature, blood glucose level, blood pressure, heart rate, number of teeth and severity of disease (mild, moderate, severe) are some examples of biological variables in research. A basic distinction among these variables is whether they are measured quantitatively or qualitatively (categorically) (1,2).

Quantitative variables

Quantitative variables are characteristics which can be counted or measured numerically. They can be continuous or discrete. A continuous variable can, in theory, take infinitely many values in a given range. This means that we can always find an intermediate value between any two values, however close they are. For example, in the range 5–10 cm, one can write infinitely many length values such as 5, 5.1, 5.12, 5.01 and 5.003 cm, the number of decimal places depending on the accuracy chosen by the researcher. Height, weight, age, arm length, blood pressure, temperature and glucose level are some examples of continuous variables; the measurements obtained can take any value in a given range.

A discrete (discontinuous) variable can take only a specified number of values in a given range. For example, the number of children per family in the range 0–5 can be 0, 1, 2, 3, 4 or 5; no other values in this range are possible. Number of visits to hospital in a year, number of children in a family, number of patients admitted to a hospital ward and number of missing teeth are some examples of discrete variables. Discrete variables are usually counts.

Qualitative variables

Qualitative (categorical) variables are characteristics which are not numerically measurable. These variables are either nominal (no natural ordering) or ordinal (ordered categories). Usually, for the purpose of data entry and analysis using software, categories are coded by assigning numerical values.
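As a minimal sketch of this coding step (assuming Python; the blood-group codes, the `Stage` enumeration and the helper functions are hypothetical illustrations, not part of the article), the block below codes one nominal and one ordinal variable. For the nominal variable the numbers act only as labels, so the sketch only counts them; for the ordinal variable the codes preserve the order of the categories, so comparisons are meaningful but differences between codes are not.

```python
from collections import Counter
from enum import IntEnum

# Nominal variable: the numeric codes are labels only (hypothetical coding).
BLOOD_GROUP_CODES = {"A": 1, "B": 2, "AB": 3, "O": 4}


# Ordinal variable: IntEnum keeps the natural order of the categories,
# so comparisons are meaningful, but the gap between codes is not.
class Stage(IntEnum):
    I = 1
    II = 2
    III = 3
    IV = 4


def encode_blood_groups(labels):
    """Map blood-group labels to their identification codes."""
    return [BLOOD_GROUP_CODES[label] for label in labels]


def count_categories(codes):
    """Frequencies are the appropriate summary for coded nominal data."""
    return dict(sorted(Counter(codes).items()))


if __name__ == "__main__":
    coded = encode_blood_groups(["O", "A", "O", "B", "AB", "O"])
    print(coded)                    # [4, 1, 4, 2, 3, 4]
    print(count_categories(coded))  # {1: 1, 2: 1, 3: 1, 4: 3}
    # Ranking is valid for ordinal codes ...
    print(Stage.I < Stage.III)      # True
    # ... but Stage.III - Stage.I == 2 does not mean "two units" of
    # severity; only the order of the codes carries information.
```

Storing ordinal categories in an ordered type such as `IntEnum` is just one option; a plain mapping plus an explicit list giving the category order would serve equally well.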

© Journal of Thoracic Disease. All rights reserved. jtd.amegroups.com J Thorac Dis 2017;9(6):1730-1733
Journal of Thoracic Disease, Vol 9, No 6 June 2017 1731

Nominal variables allow only classification or categorization based on some distinctively different characteristic; the categories cannot be rank ordered. Typical examples of nominal variables are sex, religion, blood group, symptoms of disease and cause of death. Numerical values assigned to different categories are useful for the purpose of identification only (e.g., 1= male, 2= female). When a qualitative variable has only two categories (alive/dead, male/female, diabetic/non-diabetic), it is called a binary or dichotomous variable. Nominal variables are summarized by counting (frequency) and expressing the proportion of each category (percentage).

Ordinal variables allow us to rank order the categories in terms of which category has less and which has more of the quality represented by the variable, but the distances between categories are not known. A typical example of an ordinal variable in medicine is the stage of a disease (stage I to stage IV). For example, we know that “stage I” is less severe than “stage II” of a disease, but we cannot tell the exact difference between the two stages. Socioeconomic status of families (low, middle and high socio-economic status), BMI category (underweight, normal, overweight, obese), disease condition (deteriorated, same, improved) and pain score are a few examples of ordinal variables. Numerical values assigned to the categories are useful for identification as well as rank ordering (e.g., 1= low, 2= middle and 3= high income group). Ordinal variables are summarized by counting (frequency) and expressing the proportion of each category (percentage).

Categorizing a continuous variable

Quantitative variables are often converted to categorical ones using “cut-points”. Instead of presenting the mean fasting glucose level of male and female subjects, one may prefer to present the proportion of diabetics in the male and female populations, using a fasting glucose level of 110 mg/dL as the cut-point to categorize the subjects as diabetic/non-diabetic. However, categorizing a continuous variable leads to loss of information (3). For example, subjects with fasting glucose levels of 85 and 109 mg/dL are treated as equal and both classified as non-diabetic. Similarly, subjects with glucose levels of 111 and 150 mg/dL are both classified as diabetic. These differences in values are not visible when only the numbers of diabetic and non-diabetic cases are presented.

Dealing with Likert type data

The Likert scale was developed to measure attitudes by asking people to respond to a series of statements about a topic, in terms of the extent to which they agree with them (4). A statement (Likert item) such as “It’s important for all biologists to learn statistics” can be rated as 1= strongly disagree, 2= disagree, 3= neither agree nor disagree, 4= agree or 5= strongly agree, or sometimes on seven values instead of five, including “very strongly disagree” and “very strongly agree”. Variables measured on a Likert item are ordinal variables. A Likert scale score is the result of adding together the scores on several Likert items, and it may be treated as a continuous variable. The choice of descriptive and inferential statistics depends on whether the distribution of scores is symmetric or skewed.

Presentation of data

Qualitative variables

Qualitative data (nominal or ordinal variables) may be presented in the form of frequency tables. We count the number of subjects/units in each category of the variable, compute the corresponding percentage, and present the numbers and percentages in a table. For example, the blood group distribution of 100 subjects can be summarized in a table showing each blood group with its frequency and percentage. If we have data on two categorical variables, they may be presented in the form of a contingency table showing frequencies and percentages.

As ordinal variables are categorical variables with a pre-determined order, descriptive measures such as frequency and percentage have to be reported when the number of categories is small. In addition, the median and inter-quartile range, along with the maximum and minimum values, are considered appropriate for summarizing ordinal variables.

Nominal data and ordinal data with a limited number of categories can also be presented in diagrammatic form, such as a bar chart or pie chart. In a bar chart, the length of each bar represents the frequency or relative frequency of a category of the variable. Usually the bars are of equal width and there is a space between them. A pie chart is essentially a circle divided into segments, with the area of each segment proportional to the observed frequency in the corresponding category of the variable. The total area represents the total frequency.
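A frequency table of the kind described above can be produced with the Python standard library alone. The sketch below is a hypothetical illustration: the helper names (`frequency_table`, `print_table`) and the blood-group data for 20 subjects are made up, not taken from the article. Each row gives a category, its frequency and its percentage of the total, with percentages rounded to one decimal place.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, frequency, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (category, n, round(100 * n / total, 1))
        for category, n in counts.most_common()
    ]


def print_table(rows):
    """Print a plain-text frequency table followed by a total row."""
    print(f"{'Category':<10}{'Frequency':>10}{'Percent':>10}")
    for category, n, pct in rows:
        print(f"{category:<10}{n:>10}{pct:>10.1f}")
    total = sum(n for _, n, _ in rows)
    print(f"{'Total':<10}{total:>10}{100.0:>10.1f}")


if __name__ == "__main__":
    # Hypothetical blood groups for 20 subjects (illustrative only).
    groups = ["O"] * 8 + ["A"] * 6 + ["B"] * 4 + ["AB"] * 2
    print_table(frequency_table(groups))
```

Because of rounding, the individual percentages in such a table may not add up to exactly 100 in general, which is why the total row is reported separately.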

Quantitative variables

Mean and standard deviation are appropriate summary measures for continuous variables with symmetrical distributions. Median and inter-quartile range should be computed to summarize quantitative variables with skewed distributions. The range is informative when used as a supplement to the standard deviation or inter-quartile range. Discrete variables may be summarized and analyzed either as continuous or as ordinal variables, depending on the number of distinct values.

Quantitative data can be represented graphically by means of a histogram, which is useful for judging the shape of the distribution (symmetrical or skewed). With small samples, however, a histogram may not reveal the shape. As a rule of thumb, for a variable that can take only positive values, the data are likely to be skewed if the mean is smaller than twice the standard deviation (5). Quantitative data can also be displayed as stem-and-leaf plots, dot plots, box-and-whisker plots and scatterplots, depending on the situation (6).

Analysis of data

The type of variable determines the type of statistical analysis to be performed, parametric or non-parametric. Parametric methods, such as t-tests, ANOVA, Pearson’s correlation and regression, require the assumption that the data follow a normal distribution and that the variances of the distributions are equal. Frequently used non-parametric methods are the Mann-Whitney (Wilcoxon rank sum) test, the Wilcoxon signed rank test and rank correlation. Non-parametric methods make no assumption that the data follow a particular distribution; they use the rank order of the observations rather than the actual measurements (7). The chi-square test (or Fisher’s exact test if the numbers are very small) is the method most often used to compare categorical data. Failure to pay attention to assumptions and their implications can lead to an increase in type I or type II errors.

Data from similar studies may be analyzed quite differently depending on the type of variable involved. For example, suppose our target population is people aged 50 years and over, we have measured systolic blood pressure in a sample of 40 male and 40 female subjects, and our null hypothesis is “Male and female populations have the same systolic blood pressure”. We would compare the mean blood pressure in males and females with a two-sample t-test (parametric test). If the variable is converted to hypertension status (hypertensive/normal), it becomes a nominal variable, and we would compare the frequencies of hypertension in males and females with a chi-square test (non-parametric test). We would generally expect a smaller P value for the t-test than for the chi-square test. An important message that we try to convey here is that statistical tests have more power for a continuous variable than for the corresponding nominal or ordinal variable (2). In other words, non-parametric tests require a larger sample size than a parametric test to achieve the same power. Therefore, one may categorize the data for the purpose of presentation (e.g., hypertensive/normal), but not for statistical analysis (3).

Detailed discussion of the various tests is beyond the scope of this article. Campbell & Swinscow (2) have summarized the tests suitable for various types of variables in a single table. For computational procedures and more details on various parametric tests, researchers may refer to standard textbooks (1,3,8). For a good discussion of a number of non-parametric tests, readers may refer to Siegel and Castellan (9) and Conover (10).

Conclusions

The descriptive and analytical measures to be used in data summarization and analysis all depend on the type of variable. Therefore, to obtain measures relevant to the dataset at hand, we recommend that researchers study the characteristics of the data (categorical, quantitative) and the shape of the frequency distribution (symmetrical bell-shaped, skewed) before deciding on the descriptive measures, graphs and diagrams, and statistical tests suitable for the presentation and analysis of data.

Acknowledgements

None.

Footnote

Conflicts of Interest: The authors have no conflicts of interest to declare.

References

1. Daniel WW. editor. Biostatistics: A foundation for analysis in the health sciences. 6th ed. New York: John Wiley & Sons, 1995.
2. Campbell MJ, Swinscow TD. editors. Statistics at Square One. 11th ed. Oxford: Wiley-Blackwell, 2009.
3. Altman DG, Bland JM. The cost of dichotomizing continuous variables. BMJ 2006;332:1080.
4. McDonald JH. editor. Handbook of biological statistics. Baltimore, MD: Sparky House Publishing, 2009.
5. Altman DG, Bland JM. Detecting skewness from summary information. BMJ 1996;313:1200.
6. Freeman JV, Walters SJ, Campbell MJ. editors. How to display data. Oxford: Blackwell, 2008.
7. Altman DG, Bland JM. Parametric v non-parametric methods for data analysis. BMJ 2009;338:a3167.
8. Bland M. editor. An Introduction to Medical Statistics. 3rd ed. Oxford: Oxford University Press, 2000.
9. Siegel S, Castellan NJ. editors. Nonparametric statistics for the behavioral sciences. 2nd ed. New York: McGraw-Hill, 1988.
10. Conover WJ. editor. Practical nonparametric statistics. 3rd ed. New York: John Wiley, 1998.

Cite this article as: Mayya SS, Monteiro AD, Ganapathy S. Types of biological variables. J Thorac Dis 2017;9(6):1730-1733. doi: 10.21037/jtd.2017.05.75
