
Biostatistics series module 1: Basics of biostatistics
Hazra, Avijit; Gogtay, Nithya. Indian Journal of Dermatology; Kolkata, Vol. 61, No. 1 (2016).

ProQuest document link

ABSTRACT
 
Although application of statistical methods to biomedical research began only some 150 years ago, statistics is
now an integral part of medical research. A knowledge of statistics is also becoming mandatory to understand
most medical literature. Data constitute the raw material for statistical work. They are records of measurement or
observations or simply counts. A variable refers to a particular character on which a set of data are recorded. Data
are thus the values of a variable. It is important to understand the different types of data and their mutual
interconversion. Biostatistics begins with descriptive statistics that implies summarizing a collection of data from
a sample or population. Categorical data are described in terms of percentages or proportions. With numerical
data, individual observations within a sample or population tend to cluster about a central location, with more
extreme observations being less frequent. The extent to which observations cluster is summarized by measures of
central tendency while the spread can be described by measures of dispersion. The confidence interval (CI) is an
increasingly important measure of precision. When we observe samples, there is no way of assessing true
population parameters. We can, however, obtain a standard error and use it to define a range in which the true
population value is likely to lie with a certain acceptable level of uncertainty. This range is the CI while its two
terminal values are the confidence limits. Conventionally, the 95% CI is used. Patterns in data sets or data
distributions are an important, albeit not so obvious, component of descriptive statistics. The most common
distribution is the normal distribution which is depicted as the well-known symmetrical bell-shaped Gaussian
curve. Familiarity with other distributions such as the binomial and Poisson distributions is also helpful. Various
graphs and plots have been devised to summarize data and trends visually. Some plots, such as the box-and-
whiskers plot and the stem-and-leaf plot are used less often but provide useful summaries in select situations.

FULL TEXT
 
Basics of Biostatistics
Application of statistical methods in biomedical research began more than 150 years ago. One of the early
pioneers, Florence Nightingale, the icon of nursing, worked during the Crimean war of the 1850s to improve the
methods of constructing mortality tables. The conclusions from her tables helped to change the practices in Army
hospitals around the world. At the same time, John Snow in England applied simple statistical methods to support
his theory that contaminated water from a single hand pump was the source of the London cholera epidemic in
1854. Today, statistics is an integral part of conducting biomedical research. In addition, knowledge of statistics is
becoming mandatory to read and understand most biomedical literature.
But why is this so? Broadly speaking, statistics is the science of analyzing data and drawing conclusions from them
in the face of variability and uncertainty. Biomedical researchers carry out studies in various settings: in the
laboratory, in the clinic, in the field or simply with data already archived in databases. Whatever the source, data
tend to exhibit substantial variability. For instance, patients given the same antimicrobial drug may respond
somewhat differently, laboratory rats maintained under identical condition may develop behavioral variations,
individuals residing as neighbors in the same locality may differ greatly in their perception of stigma associated

PDF GENERATED BY PROQUEST.COM Page 1 of 27


with a common skin disease like vitiligo. Often the degree of variability is substantial even when observational or
interventional conditions are held as uniform and constant as possible. The challenge for the biomedical
researcher is to unearth the patterns that are being obscured by the variability of responses in living systems.
Further, the researcher is often interested in small differences or changes. For instance, if we give you two
antibiotics and say that drug A has 10% cure rate in folliculitis with 7 days of treatment while drug B has 90% cure
rate in the same situation, and ask you to choose one for your patient; the choice would be obvious. However, if we
were to say that the cure rates for drugs A and B are 95% and 97% respectively, then your choice will not be so
obvious. Very likely, you will be wondering whether the difference of 2% is worth changing practice if you are
accustomed to using drug A or maybe you will look at other factors such as the toxicity profile, cost or ease of use.
Statistics gives us the tools, albeit mathematical ones, to make an appropriate choice by judging the "significance" of
such small observed differences or changes.
Furthermore, it is important to remember that statistics is the science of generalization. We are generally not in a
position to carry out "census" type of studies that cover entire populations. Therefore, we usually study subsets or
samples of a population and hope that the conclusions drawn from studying such a subset can be generalized to
the population as a whole. This process is fraught with errors, and we require statistical techniques to make the
generalizations tenable.
Before the advent of computers and statistical software, researchers and others dealing with statistics had to do
most of their analysis by hand, taking recourse to books of statistical formulas and statistical tables. This required
one to be proficient in the mathematics underlying statistics. This is no longer mandatory since increasingly user-
friendly software takes the drudgery out of calculations and obviates the need for looking up statistical tables.
Therefore, today, understanding the applied aspects of statistics suffices for the majority of researchers, and we
seldom need to dig into the mathematical depths of statistics to make sense of the data that we generate or
scrutinize.
The applications of biostatistics broadly cover three domains - description of patterns in data sets through
various descriptive measures (descriptive statistics), drawing conclusions regarding populations through various
statistical tests applied to sample data (inferential statistics) and application of modeling techniques to
understand relationship between variables (statistical modeling), sometimes with the goal of prediction. In this
series, we will look at the applied uses of statistics without delving into mathematical depths. This is not to deny
the mathematical underpinnings of statistics - these can be found in statistics textbooks. Our goal here is to
present the concepts and look at the applications from the point of view of the applied user of biostatistics.
Data and Variables
Data constitute the raw material for statistical work. They are records of measurement or observations or simply
counts. A variable refers to a particular character on which a set of data are recorded. Data are thus the values of a
variable. Before a study is undertaken it is important to consider the nature of the variables that are to be recorded.
This will influence the manner in which observations are undertaken, the way in which they are summarized and
the choice of statistical tests that will be used.
At the most basic level, it is important to distinguish between two types of data or variables. The first type includes
those measured on a suitable scale using an appropriate measuring device and is called quantitative variable.
Since quantitative variables always have values expressed as numbers, and the differences between values have
numerical meaning, they are also referred to as numerical variables. The second type includes those which are
defined by some characteristic, or quality, and is referred to as qualitative variable. Because qualitative data are
best summarized by grouping the observations into categories and counting the numbers in each, they are often
referred to as categorical variables.
A quantitative variable can be continuous or discrete. A continuous variable can, in theory at least, take on any
value within a given range, including fractional values. A discrete variable can take on only certain discrete values
within a given range - often these values are integers. Sometimes variables (e.g., age of adults) are treated as
discrete variables although strictly speaking they are continuous. A qualitative variable can be a nominal variable
or an ordinal variable. A nominal variable covers categories that cannot be ranked, and no category is more
important than another. The data is generated simply by naming, on the basis of a qualitative attribute, the
appropriate category to which the observation belongs. An ordinal variable has categories that follow a logical
hierarchy and hence can be ranked. We can assign numbers (scores) to nominal and ordinal categories; although,
the differences among those numbers do not have numerical meaning. However, category counts do have
numerical significance. A special case may exist for both categorical and numerical variables when the variable in
question can take on only one of two numerical values or belong to only one of two categories; these are known as
binary or dichotomous variables [Table 1].{Table 1}
Numerical data can be recorded on an interval scale or a ratio scale. On an interval scale, the differences between
two consecutive numbers carry equal significance in any part of the scale, unlike the scoring of an ordinal variable
("ordinal scale"). For example, when measuring height, the difference between 100 and 102 cm is the same as the
difference between 176 and 178 cm. Ratio scale is a special case of recording interval data. With interval scale
data the 0 value can be arbitrary, such as the position of 0 on some temperature scales - the Fahrenheit 0 is at a
different position from that of the Celsius scale. With ratio scale, 0 actually indicates the point where nothing is
scored on the scale ("true 0"), such as 0 on the absolute or Kelvin scale of temperature. Thus, we can say that an
interval scale of measurement has the properties of identity, magnitude, and equal intervals while the ratio scale
has the additional property of a true 0. Only on a ratio scale can differences be judged in the form of ratios: 0°C is
not zero heat, nor is 26°C twice as hot as 13°C, whereas these value judgments do hold on the Kelvin scale.
In practice, this distinction is not tremendously important so far as the handling of numerical data in statistical
tests is concerned.
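The Celsius/Kelvin point can be made concrete with a quick check; a minimal sketch (the temperatures are the ones used above, and the standard 273.15 offset converts Celsius to Kelvin):

```python
# Celsius is an interval scale: its zero is arbitrary, so ratios are meaningless
c_low, c_high = 13, 26
print(c_high / c_low)  # 2.0 numerically, but 26°C is not "twice as hot"

# Kelvin is a ratio scale with a true zero, so ratios are meaningful
k_low, k_high = c_low + 273.15, c_high + 273.15
print(round(k_high / k_low, 3))  # only a ~4.5% increase in absolute temperature
```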
Changing data scales is possible so that numerical data may become ordinal, and ordinal data may become
nominal (even dichotomous). This may be done when the researcher is not confident about the accuracy of the
measuring instrument, is unconcerned about the loss of fine detail, or where group numbers are not large enough
to adequately represent a variable of interest. It may also make clinical interpretation easier. For example, the
Dermatology Life Quality Index (DLQI) is used to assess how much of an adult subject's skin problem is affecting
his or her quality of life. A DLQI score <6 indicates that the skin problem is hardly affecting the quality of life, score
of 6-20 indicates moderate to large effect on quality while score >20 indicates that the problem is severely
degrading the quality of life. This categorization may be more relevant to the clinician than the actual DLQI score
achieved. In contrast, converting from categorical to numerical will not be feasible without having actual
measurements.
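The DLQI banding described above amounts to a simple numerical-to-ordinal conversion. A minimal sketch (the function name is ours, and the cut-offs follow the banding described in the text rather than the full published DLQI band set):

```python
def dlqi_band(score):
    """Convert a numerical DLQI score to an ordinal category,
    using the banding described in the text."""
    if score < 6:
        return "hardly affecting quality of life"
    elif score <= 20:
        return "moderate to large effect"
    else:
        return "severely degrading quality of life"

print(dlqi_band(3), "|", dlqi_band(12), "|", dlqi_band(25))
```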
When exploring the relationship between variables, some can be considered as dependent (dependent variable) on
others (independent variables). For instance, when exploring the relationship between height and age, it is obvious
that height depends on age, at least until a certain age. Thus, age is the independent variable, which influences the
value of the dependent variable height. When exploring the relationship between multiple variables, usually in a
modeling situation, the value of the outcome (response) variable depends on the value of one or more predictor
(explanatory) variables. In this situation, some variables may be identified that cannot be accurately measured or
controlled and only serve to confuse the results. They are called confounding variables or confounders. Thus, in a
study of the protective effect of a sunscreen in preventing skin cancer, the amount of time spent in outdoor activity
could be a major confounder. The extent of skin pigmentation would be another confounder. There could even be
confounders whose existence is unknown or effects unsuspected, for instance, undeclared consumption of
antioxidants by the subjects which is quite possible because the study would go on for a long time. Such
unsuspected confounders have been called lurking variables.
Numerical or categorical variables may sometimes need to be ranked, that is arranged in ascending order and new
values assigned to them serially. Values that tie are each assigned the average of the ranks they encompass. Thus, a
data series 2, 3, 3, 10, 23, 35, 37, 39, 45 can be ranked as 1, 2.5, 2.5, 4, 5, 6, 7, 8, 9, since the two 3s encompass ranks 2
and 3, giving an average rank value of 2.5. Note that when a numerical variable is ranked, it gets converted to an
ordinal variable. Ranking obviously does not apply to nominal variables because their values do not follow any
order.
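The ranking rule, including the averaging of tied ranks, can be sketched in a few lines (a naive illustration, not an optimized routine):

```python
def rank_with_ties(values):
    """Rank values in ascending order, assigning tied values
    the average of the ranks they encompass."""
    sorted_vals = sorted(values)
    ranks = []
    for v in values:
        first = sorted_vals.index(v) + 1   # first 1-based rank v occupies
        count = sorted_vals.count(v)       # how many ties share that value
        ranks.append(first + (count - 1) / 2)  # average of the tied ranks
    return ranks

print(rank_with_ties([2, 3, 3, 10, 23, 35, 37, 39, 45]))
# [1.0, 2.5, 2.5, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```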
Descriptive Statistics
Descriptive statistics implies summarizing a raw data set obtained from a sample or population. Traditionally,
summaries of sample data ("statistics") have been denoted by Roman letters (e.g., x̄ for mean, standard
deviation [SD], etc.) while summaries of population data ("parameters") have been denoted by Greek letters (e.g.,
μ for mean, σ for SD, etc.). The description serves to identify patterns or distributions in data sets from
which important conclusions may be drawn.
Categorical data are described in terms of percentages or proportions. With numerical data, individual
observations within a sample or population tend to cluster about a central location, with more extreme
observations being less frequent. The extent to which observations cluster is summarized by measures of central
tendency while the spread can be described by measures of dispersion.
Measures of Central Tendency
The mean (or more correctly, the arithmetic mean) is calculated as the sum of the individual values in a data
series, divided by the number of observations. The mean is the most commonly used measure of central tendency
to summarize a set of numerical observations. It is usually reliable unless there are extreme values (outliers) that
can distort the mean. It should not ordinarily be used in describing categorical variables because of the arbitrary
nature of category scoring. It may, however, be used to summarize category counts.
The geometric mean of a series of n observations is the nth root of the product of all the observations. It is always
equal to or less than the arithmetic mean. It is not often used but is a more appropriate measure of central location
when data recorded span several orders of magnitude, e.g. bacterial colony counts from a culture of clinical
specimens. Interestingly, the logarithm of the geometric mean is the arithmetic mean of the logarithms of the
observations. As such, the geometric mean may be calculated by taking the antilog of the arithmetic mean of the
log values of the observations. The harmonic mean of a set of non-zero positive numbers is obtained as the
reciprocal of the arithmetic mean of the reciprocals of these numbers. It is seldom used in biostatistics. Unlike the
arithmetic mean, neither geometric nor harmonic mean can be applied to negative numbers.
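These means can be computed directly; a sketch in which the colony-count figures are invented for illustration:

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # antilog of the arithmetic mean of the logs of the observations
    return math.exp(arithmetic_mean([math.log(x) for x in xs]))

def harmonic_mean(xs):
    # reciprocal of the arithmetic mean of the reciprocals
    return 1 / arithmetic_mean([1 / x for x in xs])

counts = [10, 100, 1000]  # hypothetical colony counts spanning orders of magnitude
print(arithmetic_mean(counts))        # 370.0 - dominated by the largest value
print(round(geometric_mean(counts)))  # 100 - a more central summary here
```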
Often data are presented as a frequency table. If the original data values are not available, a weighted average can
be estimated from the frequency table by multiplying each data value by the number of cases in which that value
occurs, summing up the products and dividing the sum by the total number of observations. A frequency table of
numerical data may report the frequencies for class intervals (the entire range covered being broken up into a
convenient number of classes) rather than for individual data values. In such cases, we can calculate the weighted
average by using the mid-point of the class intervals. However, in this instance, the weighted mean may vary
slightly from the arithmetic mean of all the raw observations. Apart from counts, there may be other ways of
ascribing weights to observations before calculating a weighted average.
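The weighted-average calculation from a frequency table looks like this (the class mid-points and counts are made up for illustration):

```python
def weighted_mean(values, weights):
    # multiply each value by its frequency, sum the products,
    # and divide by the total number of observations
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# hypothetical frequency table grouped in class intervals
midpoints = [5, 15, 25, 35]   # mid-points of the class intervals
counts    = [3, 10, 6, 1]     # number of observations in each class
print(weighted_mean(midpoints, counts))  # 17.5
```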
For data sets with extreme values, the median is a more appropriate measure of central tendency. If the values in a
data series are arranged in order, the median denotes the middle value (for an odd number of observations) or the
average of the two middle values (for an even number of observations). The median denotes the point in a data
series at which half the observations are larger and half are smaller. As such, it is identical to the 50th percentile
value. If the distribution of the data is perfectly symmetrical (as in the case of a normal distribution that we
discuss later), the values of the median and mean coincide. If the distribution has a long tail to the right (a positive
skew), the mean exceeds the median; if the long tail is to the left
(a negative skew), the median exceeds the mean. Thus, the relationship of the two gives an idea of the symmetry
or asymmetry (skewness) of the distribution of data.
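The mean-median relationship under skew is easy to demonstrate with a small invented series containing one outlier:

```python
import statistics

skewed = [1, 2, 2, 3, 3, 3, 4, 20]  # long tail to the right (positive skew)
print(statistics.mean(skewed))      # 4.75 - pulled toward the outlier
print(statistics.median(skewed))    # 3.0  - unaffected by the extreme value
```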
Mode is the most frequently occurring value in a data series. It is not often used, for the simple reason that it is
difficult to pinpoint a mode if no value occurs with a frequency markedly greater than the rest. Furthermore, two or
more values may occur with equal frequency, making the data series bimodal or multimodal [Box 1].
[INLINE:1]
Measures of Dispersion
The spread, or variability, of a data series can be readily described by the range, that is the interval between
minimum and maximum values. However, the range does not provide much information about the overall
distribution of observations and is obviously affected by extreme values.
A more useful estimate of the spread can be obtained by arranging the values in ascending order and then
grouping them into 100 equal parts (in terms of the number of values) that are called centiles or percentiles. It is
then possible to state the value at any given percentile, such as the 5th or the 95th percentile, and to calculate the
range of values between any two percentiles, such as the 10th and 90th or the 25th and the 75th percentiles.
The median represents the 50th percentile. Quartiles divide an ordered data set into four equal parts, with the upper
boundaries of the first, second, and third quartiles often denoted as Q1, Q2, and Q3, respectively. Note the
relationship between quartiles and percentiles: Q1 corresponds to the 25th percentile while Q3 corresponds to the 75th
percentile. Q2 is the median value in the set. If we estimate the range of the middle 50% of the observations about
the median (i.e., Q1-Q3), we have the interquartile range. If the dispersion in the data series is less, we can use the
10th to 90th percentile value to denote spread.
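Quartile boundaries and the interquartile range can be obtained with Python's standard library (note that different percentile-interpolation methods give slightly different answers for small samples; `statistics.quantiles` uses the "exclusive" method by default):

```python
import statistics

data = [2, 3, 3, 10, 23, 35, 37, 39, 45]
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile boundaries Q1, Q2, Q3
print(q1, q2, q3)  # Q2 equals the median
print(q3 - q1)     # interquartile range
```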
A still better method of measuring variability about the central location is to estimate how closely the individual
observations cluster about it. This leads to the mean square deviation or variance, which is calculated as the sum
of the squares of individual deviations from the mean, divided by one less than the number of observations. The
SD of a data series is simply the square root of the variance. Note that the variance is expressed in squared units,
which is difficult to comprehend, but the SD retains the basic unit of observation.
The formula for the variance (and SD) of a population has n as the denominator. However, the
expression (n - 1) is used when calculating the variance (and SD) of a sample. The quantity (n - 1) denotes the
degrees of freedom, which is the number of independent observations or choices available. For instance, if a series
of four numbers is to add up to 100, we can assign different values to the first three, but the value of the last is
fixed by the first three choices and the condition imposed that the total must be 100. Thus, in this example, the
degrees of freedom can be stated to be 3. The degrees of freedom is used when calculating the variance (and SD)
of a sample because the sample mean is a predetermined estimate of the population mean, and, in the sample,
each observation is free to vary except the last one that must be a defined value.
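The sample variance and SD, using (n - 1) in the denominator as described, can be written out explicitly (the five data values are invented):

```python
import math

def sample_variance(xs):
    n = len(xs)
    mean = sum(xs) / n
    # sum of squared deviations from the mean, divided by the
    # degrees of freedom (n - 1) because this is a sample
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

data = [4, 8, 6, 5, 7]
print(sample_variance(data))             # 2.5 (in squared units)
print(math.sqrt(sample_variance(data)))  # the SD, back in the original units
```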
The coefficient of variation (CV) of a data series denotes the SD expressed as a percentage of the mean. Thus, it
denotes the relative size of the SD with respect to the mean. CV can be conveniently used to compare variability
between studies, since, unlike SD, its magnitude is independent of the units employed.
Measures of Precision
An important source of variability in biological observations is measurement imprecision and CV is often used to
quantify this imprecision. It is thus commonly used to describe variability of measuring instruments and laboratory
assays, and it is generally taken that a CV of <5% is acceptable reproducibility.
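The CV calculation for, say, repeated readings from an assay is straightforward (the readings are invented; `statistics.stdev` uses the n - 1 denominator):

```python
import statistics

def coefficient_of_variation(xs):
    # SD expressed as a percentage of the mean - unit-free, so it
    # can be compared across instruments and studies
    return 100 * statistics.stdev(xs) / statistics.mean(xs)

readings = [9.8, 10.1, 10.0, 9.9, 10.2]  # hypothetical repeat assay readings
cv = coefficient_of_variation(readings)
print(round(cv, 2))  # well under the 5% reproducibility benchmark noted above
```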
Another measure of precision for a data series is the standard error of the mean (SEM), which is simply calculated
as the SD divided by the square root of the number of observations. Since SEM is a much smaller numerical value
than SD, it is often presented in place of SD as a measure of the spread of data. However, this is erroneous since
SD is meant to summarize the spread of data, while SEM is a measure of precision and is meant to provide an
estimate of a population parameter from a sample statistic in terms of the confidence interval (CI).
It is self-evident that when we make observations on a sample and calculate the sample mean, this will not be
identical to the population ("true") mean. However, if our sample is sufficiently large and representative of the
population, and we have made our observations or measurements carefully, then the sample mean would be
close to the true mean. If we keep taking repeated samples and calculate a sample mean in each case, the
different sample means would have their own distribution, and this would be expected to have less dispersion than
that of all the individual observations in the samples. In fact, it can be shown that the different sample means
would have a symmetrical distribution, with the true population mean at its central location, and the SD of this
distribution would be nearly identical to the SEM calculated from individual samples.
In general, however, we are not interested in drawing multiple samples, but rather how reliable our one sample is in
describing the population. We use standard error to define a range in which the true population value is likely to lie,
and this range is the CI while its two terminal values are the confidence limits. The width of the CI depends on the
standard error and the degree of confidence required. Conventionally, the 95% CI is most commonly used.
From the properties of a normal distribution curve (see below) it can be shown that the 95% CI of the mean would
cover a range 1.96 standard errors either side of the sample mean, and will have a 95% probability of including the
population mean; while 99% CI will span 2.58 standard errors either side of the sample mean and will have 99%
probability of including the population mean. Thus, a fundamental relation that needs to be remembered is:
95% CI of mean = Sample mean ± 1.96 × SEM.
It is evident that the CI would be narrower if SEM is smaller. Thus if a sample is larger, SEM would be smaller and
the CI would be correspondingly narrower and thus more "focused" on the true mean. Large samples therefore
increase precision. It is interesting to note that although increasing sample size improves precision, it is a
somewhat costly approach to increasing precision, since halving of SEM requires a 4-fold increase in sample size.
CIs can be used to estimate most population parameters from sample statistics (means, proportions, correlation
coefficients, regression coefficients, odds ratios, relative risks, etc.). In all cases, the principles and the general
pattern of estimating the CI remains the same, that is:
95% CI of a parameter = Sample statistic ± 1.96 × standard error for that statistic.
The formula for estimating the standard error, however, varies for different statistics, and in some instances is quite
elaborate. Fortunately, we generally rely on computer software to do the calculations.
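Putting the SEM and CI formulas together (the measurements are invented; 1.96 is the large-sample normal multiplier from the text, and for small samples a t-distribution multiplier would give a slightly wider interval):

```python
import math
import statistics

def ci95_of_mean(xs):
    mean = statistics.mean(xs)
    sem = statistics.stdev(xs) / math.sqrt(len(xs))  # standard error of the mean
    # 95% CI = sample mean +/- 1.96 x SEM
    return mean - 1.96 * sem, mean + 1.96 * sem

weights = [70, 72, 68, 74, 71, 69, 73, 75, 67, 71]  # hypothetical measurements, n = 10
low, high = ci95_of_mean(weights)
print(round(low, 1), round(high, 1))  # confidence limits around the sample mean of 71
```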
Frequency Distributions
It is useful to summarize a set of raw numbers with a frequency distribution. The summary may be in the form of a
table or a graph (plot). Many frequency distributions are encountered in medical literature [Figure 1] and it is
important to be familiar with commonly encountered ones.{Figure 1}
The majority of distributions that quantitative clinical data follow are unimodal, that is the data have a single peak
(mode) with a tail on either side. The most common of these unimodal distributions is the bell-shaped symmetrical
distribution called the normal distribution or the Gaussian distribution [Figure 2]. In this distribution, the values of
mean, median and mode will coincide. However, some distributions are skewed with a substantially longer tail on
one side. The type of skew is determined by the direction of the longer tail. A positively skewed distribution has a
longer tail to the right. In this case, the mean will be greater than the median because the mean is strongly
influenced by the extreme values in the right-hand tail. On the other hand, a negatively skewed distribution has a
longer tail to the left; in this instance, the mean will be smaller than the median. Thus, the relationship between
mean and median gives an idea of the distribution of numerical data.{Figure 2}
It is possible that datasets may have more than one peak (mode). Such data can be difficult to manage and it may
be the case that neither the mean nor the median is a representative measure. However, it is important to
remember that bimodal or multimodal distributions are rare and may even be artifactual. A distribution with two
peaks may actually be reflecting a combination of two unimodal distributions, for instance, one for each gender or
different age groups. In such cases, appropriate subdivision, categorization, or even recollection of the data may
be required to eliminate multiple peaks.
Probability Distributions
A random variable is a numerical quantity whose values are determined by the outcomes of a random experiment.
The possible values of a random variable and the associated probabilities constitute a statistical probability
distribution. The concept of probability distributions and frequency distributions are similar in that each
associates a number with the possible values of a variable. However, for a frequency distribution, the number is a
frequency, while for a probability distribution, this number is a probability. A frequency distribution describes a set
of data that has been observed; it is thus empirical. A probability distribution describes data that might be
observed under certain specified conditions; hence it is theoretical. Probability distributions are part of descriptive
statistics, and they can be used to predict how random variables are expected to behave under certain conditions.
If the empirical data deviate considerably from the predictions of a probability distribution model, the correctness
of the model or its assumptions can be questioned, and we may look for alternative models to fit the empirical
data. [Table 2] provides examples of statistical probability distributions. Note that, they are broadly classified as
continuous or discrete probability distributions depending on whether the random variable in question is a
continuous or a discrete variable.{Table 2}
Of the many probability distributions that can be used to model biological events or observations, the most
common is the normal distribution. In such a distribution, the values of the random variable tend to cluster around
a central value, with a symmetrical positive and negative dispersion about this point. The more extreme values
become less frequent the further they lie from the central point [Figure 3]. The term "normal" relates to the sense of
'standard' against which other distributions may be compared. It is also referred to as a Gaussian distribution after
the German mathematician, Carl Friedrich Gauss (1777-1855), although Gauss was not the first person to describe
such a distribution. The bell curve was named 'normal curve' by the great Karl Pearson. Important properties of a
normal distribution are:{Figure 3}
- Unimodal bell-shaped distribution
- Symmetric about the mean
- Flattens symmetrically as the variance is increased
- Excess kurtosis is 0 ("kurtosis" refers to how peaked a distribution is)
- The tails may extend toward infinity, but the total area is taken as 1.
In a normal distribution curve, the mean, median, and mode coincide. The area delimited by one SD either side of
the mean includes 68% of the total area, two SDs 95.4%, and three SDs 99.7%; 95% of the values lie within 1.96 SDs
on either side of the mean. It is for this reason that the interval denoted by mean ± 1.96 × SD is often taken as the
normal range or reference range for many physiological variables.
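The area figures quoted above can be checked with Python's built-in normal distribution (`statistics.NormalDist`, available from Python 3.8):

```python
import statistics

nd = statistics.NormalDist(mu=0, sigma=1)

def area_within(k):
    # probability mass within k SDs either side of the mean
    return nd.cdf(k) - nd.cdf(-k)

print(round(area_within(1), 3))     # ~0.683, i.e. 68% within 1 SD
print(round(area_within(1.96), 3))  # ~0.95, the basis of the reference range
print(round(area_within(3), 4))     # ~0.9973, i.e. 99.7% within 3 SDs
```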
If we look at the equation for the normal distribution, it is evident that there are two parameters that define the
curve, namely μ (the mean) and σ (the SD):
[INLINE:2]
The standard normal distribution curve is a special case of the normal distribution for which probabilities have
been calculated. It is a symmetrical bell-shaped curve with a mean of 0 and a variance (or SD) of 1. The random
variable of a standard normal distribution is the Z-score of the corresponding value of the variable for the normal
distribution. A standard normal distribution table shows cumulative probability associated with particular Z-scores
and can be used to estimate probabilities of particular values of a normally distributed variable.
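The Z-score and the standard normal table lookup described above translate directly into code (the mean and SD of 120 and 10 are illustrative values only):

```python
import statistics

def z_score(x, mu, sigma):
    # how many SDs the value x lies above (or below) the mean
    return (x - mu) / sigma

std_normal = statistics.NormalDist()  # mean 0, SD 1

# hypothetical: a normally distributed variable with mean 120 and SD 10;
# what fraction of values would lie below 140?
z = z_score(140, 120, 10)
print(z)                            # 2.0
print(round(std_normal.cdf(z), 3))  # cumulative probability for Z = 2.0
```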
In all biomedical research where samples are used to learn about populations, some random procedure is essential
for subject selection to avoid many kinds of bias. This takes the form of random sampling from a population or
randomized allocation of participants to interventional groups. Randomness is a property of the procedure rather
than of the sample and ensures that every potential subject has a fair and equal chance of getting selected. The
resulting sample is called a random sample. As the number of observations increases (say, n > 100), the sampling distribution of a statistic such as the sample mean approximates a normal distribution curve, even if the distribution of the variable in question is not normal. This is explained by the central limit theorem and is one reason why the normal distribution
is so important in biomedical research.
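A minimal simulation illustrates the central limit theorem. The skewed exponential distribution and the sample sizes below are arbitrary illustrative choices:

```python
import random
import statistics

# Sketch of the central limit theorem: means of samples drawn from a markedly
# skewed (exponential) distribution still pile up symmetrically around the
# true mean. Distribution and parameters are arbitrary illustrations.
random.seed(7)
true_mean = 10.0
n_per_sample = 100

sample_means = [
    statistics.fmean(random.expovariate(1 / true_mean) for _ in range(n_per_sample))
    for _ in range(5000)
]

m = statistics.fmean(sample_means)
sd = statistics.stdev(sample_means)
within = sum(m - 1.96 * sd <= x <= m + 1.96 * sd for x in sample_means) / len(sample_means)

print(round(m, 1))       # close to the true mean of 10
print(round(within, 2))  # close to 0.95, as expected for a normal curve
```

Even though individual exponential observations are strongly right-skewed, about 95% of the sample means fall within 1.96 SDs of their overall mean, as the normal curve predicts.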
Many statistical techniques require the assumption of normality of the dataset. It is not mandatory for the sample
data to be normally distributed, but it should represent a population that is normally distributed.
Presenting Data
Once summary measures of data have been calculated, they need to be presented in tables and graphs.
Appropriate data presentation summarizes the data in a compact and meaningful manner without burdening the
reader with a surfeit of information, enables conclusions to be drawn simply by looking at the summarized data
and, of course, helps in further statistical analysis where necessary.
Regarding data presentation in tables, it is helpful to remember the following:
- Tables should be numbered
- Each table must have a concise and self-explanatory title
- Tables must be formatted with an appropriate number of rows and columns but should not be too large; larger tables can usually be split into multiple simpler tables
- Column headings and row classifiers must be clear and concise
- For tables showing frequency distributions, it must be clear whether the frequencies depicted in each class or class interval represent absolute frequency, relative frequency (i.e., the percentage of the total) or cumulative frequency

PDF GENERADO POR PROQUEST.COM Page 7 of 27

- For tables depicting percentages, it must be clear whether the percentages represent percentages with respect to the row (row percentage) or the column (column percentage) in which the cell is located
- The mean is to be used for numerical data and symmetric (nonskewed) distributions
- The median should be used for ordinal data or for numerical data if the distribution is skewed
- The mode is generally used only for examining bimodal or multimodal distributions
- The range may be used for numerical data to emphasize extreme values
- The SD is to be used along with the mean
- The interquartile range or percentiles should be used along with the median
- SDs and percentiles may also be used when the objective is to depict a set of norms ("normative data")
- The CV may be used if the intent is to compare variability between datasets measured on different numerical scales
- 95% CIs should be used whenever the intent is to draw inferences about populations from samples
- Additional information required to interpret the table (e.g., explanation of column headings, other abbreviations, explanatory remarks) can be appended as footnotes.
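The rationale for pairing the median with the interquartile range for skewed data can be sketched in code. With a hypothetical, right-skewed dataset (lengths of hospital stay, in days), a single outlier distorts the mean and SD but leaves the median and quartiles essentially unaffected:

```python
import statistics

# Illustrative sketch with hypothetical, right-skewed data (hospital stay, days)
stay = [2, 3, 3, 4, 4, 5, 5, 6, 7, 30]  # one extreme value skews the data

mean, sd = statistics.fmean(stay), statistics.stdev(stay)
median = statistics.median(stay)
q1, _, q3 = statistics.quantiles(stay, n=4)  # quartiles

# The mean is pulled toward the outlier; the median is not
print(f"mean +/- SD : {mean:.1f} +/- {sd:.1f}")
print(f"median [IQR]: {median} [{q1}-{q3}]")
```

Here the mean (6.9 days) exceeds all but two of the ten observations, whereas the median (4.5 days) with IQR 3-6.25 summarizes the bulk of the data faithfully.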
For presenting data graphically, it is usually necessary to obtain the summary measures, counts or percentages of
the data. These can then be utilized to draw different types of graphs (or charts or plots or diagrams). The more
common types with some of their variants are summarized in [Table 3] and [Figure 4]. Although charts are visually
appealing, they should not replace tabulation of important summary data. Further, if not constructed or scaled
appropriately, charts can be misleading.{Figure 4}{Table 3}
A pictogram represents quantity by presenting stylized pictures or icons of the variable being depicted - the
number or size of the icon being proportional to the frequency. When comparing between groups using a
pictogram, it is preferable that same-sized icons be used across groups (with their numbers varying) - otherwise
the picture may be misleading. Pictograms are more often used in mass media presentations than in serious
biomedical literature.
Pie chart depicts frequency distribution of categorical data in a circle (the "pie"), with the sectors of the circle
proportional in size to the frequencies in the respective categories. A particular category can be emphasized by
pulling out that sector. All sectors are pulled out in an "exploded" pie chart. Pie charts can be made highly
attractive, by using color and three-dimensional design enhancements, but become cumbersome if there are too
many categories.
Bar chart (also called column chart) depicts categorical or numerical data as a series of vertical (or horizontal)
bars, with the bar heights (lengths) being proportional to the frequencies or the means. The bar widths and
separation between bars should be uniform but are of little significance other than to indicate that the bars denote
separate series or categories. Bars depicting subcategories can be stacked one on top of another (stacked or
segmented or component bar chart). The frequencies can be converted to percentages so that the totals in each category add up to 100%, giving a 100% stacked bar chart in which all the bars are of equal height. Two or more
data series or subcategories can be depicted on the same bar chart by placing corresponding bars side by side -
different patterns or colors are used to distinguish the different series or subcategories (compound or multiple or
cluster bar chart).
The histogram is similar to a bar chart in appearance but is used for summarizing continuous numerical data, and hence there should be no gaps between the bars. The bar widths correspond to the class intervals. The bars are usually aligned vertically, with the class intervals along the horizontal axis and the frequencies along the vertical axis. A histogram is popularly used to depict the frequency distribution in a large data series.
Accordingly, the class intervals should be so chosen that the bars are narrow enough to illustrate patterns in the
data but not so narrow that they become too large in number. A histogram must be labeled carefully to depict
clearly where the boundaries lie.
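The tabulation underlying a histogram can be sketched as follows; the class width of 20 and the hypothetical serum cholesterol values are arbitrary illustrations:

```python
from collections import Counter

# Minimal sketch: group continuous values (hypothetical serum cholesterol,
# mg/dL) into class intervals of width 20 and count frequencies in each.
values = [178, 182, 195, 201, 204, 210, 215, 221, 238, 242, 259, 263]
width = 20

bins = Counter((v // width) * width for v in values)
for lower in sorted(bins):
    # A crude text histogram: one '#' per observation in the interval
    print(f"{lower}-{lower + width - 1}: {'#' * bins[lower]} ({bins[lower]})")
```

The bin frequencies sum to the number of observations, and the chosen width controls how many bars the histogram will have.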
A frequency polygon is a line diagram representation of the frequency distribution depicted by the histogram and
is obtained by joining the midpoints of the upper boundary of the histogram blocks. As such it depicts the
frequency distribution of numerical data as a curve.
Dot plot [Figure 5] depicts frequency distribution of numerical variables like histograms but with the advantage of
depicting individual values as well. Instead of bars, it has a series of dots for each value or class interval - each dot
representing one observation. The alignment can be vertical or horizontal. They are useful in highlighting clusters
and gaps in data sets as well as outliers. Dot plots are conceptually simple but become cumbersome for large data
sets. Scatter plots (sometimes erroneously called dot plots) are used for depicting association between two
variables with the X and Y coordinates of each dot representing the corresponding values of the two variables. A
bubble plot is an extension of the scatter plot to depict the relation between three variables - here each dot is
expanded into a bubble with the diameter of the bubble being proportional to the value of the third variable. This is
preferable to depicting the third variable on a Z axis since it is difficult to comprehend depth on a two-dimensional
surface.{Figure 5}
Stem-and-leaf plot or stem plot [Figure 6] is a sort of mixture of a diagram and a table. It has been devised to
depict frequency distribution, as well as individual values for numerical data. The data values are examined to
determine their last significant digit (the "leaf" item), and this is attached to the previous digits (the "stem" item).
The stem items are usually arranged in ascending or descending order vertically, and a vertical line is usually
drawn to separate the stem from the leaf. The number of leaf items should total up to the number of observations.
However, it becomes cumbersome with large data sets.{Figure 6}
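Constructing such a display is straightforward. A sketch with hypothetical two-digit patient ages, taking the tens digit as the stem and the units digit as the leaf:

```python
from collections import defaultdict

# Sketch of a stem-and-leaf display for hypothetical patient ages:
# the tens digit is the "stem", the units digit the "leaf".
ages = [23, 25, 27, 31, 31, 34, 38, 42, 45, 45, 46, 51, 57, 63, 68]

stems = defaultdict(list)
for a in sorted(ages):
    stems[a // 10].append(a % 10)

leaf_count = 0
for stem in sorted(stems):
    leaf_count += len(stems[stem])
    print(f"{stem} | {' '.join(str(leaf) for leaf in stems[stem])}")

print(leaf_count)  # must equal the number of observations (15)
```

Note the built-in check: the total number of leaf items must equal the number of observations.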
Box-and-whiskers plot (or box plot) is a graphical representation of numerical data based on the five-number
summary - minimum value, 25 th percentile, median (50 th percentile), 75 th percentile and maximum value [Figure
7]. A rectangle is drawn extending from the lower quartile to the upper quartile, with the median dividing this "box"
but not necessarily equally. Lines ("whiskers") are drawn from the ends of the box to the extreme values. Outliers
may be indicated beyond the extreme values by dots or asterisks - in such "modified" or "refined" box plots, the
whiskers have lengths not exceeding 1.5 times the box length. The whole plot may be aligned vertically or
horizontally. Box plots are ideal for summarizing large samples and are being increasingly used. Multiple box plots,
arranged side by side, allow ready comparison of data sets.{Figure 7}
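The five-number summary and the refined-whisker rule can be computed directly; the data values below are hypothetical:

```python
import statistics

# Sketch of the quantities behind a box plot, using the refined-whisker rule
# (whiskers extend at most 1.5 x IQR beyond the quartiles). Data are hypothetical.
data = [12, 14, 15, 15, 16, 17, 18, 18, 19, 20, 21, 35]  # 35 is an outlier

q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers reach to the most extreme observations still inside the fences
lower_whisker = min(v for v in data if v >= low_fence)
upper_whisker = max(v for v in data if v <= high_fence)
outliers = [v for v in data if v < low_fence or v > high_fence]

print(min(data), q1, median, q3, max(data))  # five-number summary
print(lower_whisker, upper_whisker, outliers)
```

Here the value 35 lies beyond the upper fence (q3 + 1.5 x IQR), so it would be drawn as a separate dot or asterisk while the upper whisker stops at 21.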
We have looked at the commonly used plots used for summarizing data and depicting underlying patterns. Many
other plots are used in biostatistics for depicting data distributions, time trends in observations, relationships
between two or more variables, exploring goodness-of-fit to hypothesized data distributions and drawing
inferences by comparing data sets. We will introduce selected other plots in subsequent modules of this series.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
Further Reading
Samuels ML, Witmer JA, Schaffner AA, editors. Description of samples and populations. In: Statistics for the Life Sciences. 4th ed. Boston: Pearson Education; 2012. p. 26-80.
Kirk RE, editor. Random variables and probability distributions. In: Statistics: An Introduction. 5th ed. Belmont: Thomson Wadsworth; 2008. p. 207-27.
Kirk RE, editor. Normal distribution and sampling distributions. In: Statistics: An Introduction. 5th ed. Belmont: Thomson Wadsworth; 2008. p. 229-55.
Dawson B, Trapp RG, editors. Summarizing data & presenting data in tables and graphs. In: Basic & Clinical Biostatistics. 4th ed. New York: McGraw-Hill; 2004. p. 23-60.
AuthorAffiliation
Avijit Hazra:  Department of Pharmacology, Institute of Postgraduate Medical Education and Research, Kolkata,
West Bengal
Nithya Gogtay:  Department of Clinical Pharmacology, Seth GS Medical College and KEM Hospital, Parel, Mumbai,
Maharashtra

TEXTO COMPLETO TRADUCIDO


Conceptos básicos de la bioestadística

La aplicación de métodos estadísticos en la investigación biomédica comenzó hace más de 150 años. Una de las
primeras pioneras, Florence Nightingale, icono de la enfermería, trabajó durante la guerra de Crimea de la década
de 1850 para mejorar los métodos de construcción de tablas de mortalidad. Las conclusiones de sus mesas

PDF GENERADO POR PROQUEST.COM Page 13 of 27


ayudaron a cambiar las prácticas en los hospitales del Ejército de todo el mundo. Al mismo tiempo, John Snow en
Inglaterra aplicó métodos estadísticos simples para apoyar su teoría de que el agua contaminada de una sola
mano fue la fuente de la epidemia de cólera de Londres en 1854. Hoy en día, las estadísticas son parte integrante
de la realización de investigaciones biomédicas. Además, el conocimiento de las estadísticas es obligatorio para
leer y entender la mayoría de la literatura biomédica.
Pero, ¿por qué es así? En términos generales, la estadística es la ciencia del análisis de datos y de sacar
conclusiones de este modo ante la variabilidad y la incertidumbre. Los investigadores biomédicos realizan
estudios en diversos entornos: en el laboratorio, en la clínica, en el campo o simplemente con datos ya archivados
en bases de datos. Sea cual sea la fuente, los datos tienden a mostrar una variabilidad sustancial. Por ejemplo, los
pacientes que reciben el mismo medicamento antimicrobiano pueden responder de forma algo diferente, las ratas
de laboratorio mantenidas en idénticas condiciones pueden desarrollar variaciones de comportamiento, los
individuos que residen como vecinos en la misma localidad pueden diferir mucho en su percepción del estigma
asociado a una piel común. enfermedad como el vitiligo. A menudo, el grado de variabilidad es sustancial incluso
cuando las condiciones observacionales o intervencionistas se mantienen lo más uniformes y constantes posible.
El desafío para el investigador biomédico es desenterrar los patrones que se están oscureciendo por la
variabilidad de las respuestas en los sistemas vivos. Además, el investigador suele estar interesado en pequeñas
diferencias o cambios. Por ejemplo, si le damos dos antibióticos y decimos que el medicamento A tiene una tasa
de curación del 10% en la foliculitis con 7 días de tratamiento, mientras que el medicamento B tiene una tasa de
curación del 90% en la misma situación, y le pedimos que elija uno para su paciente; la elección sería obvia. Sin
embargo, si dijéramos que las tasas de curación de los medicamentos A y B son del 95% y el 97% respectivamente,
entonces su elección no será tan obvia. Es muy probable que se pregunte si la diferencia del 2% vale la pena
cambiar la práctica si está acostumbrado a usar el medicamento A o tal vez observará otros factores como el
perfil de toxicidad, el costo o la facilidad de uso. Las estadísticas nos proporcionan las herramientas, aunque
matemáticas, para tomar una decisión apropiada juzgando la «importancia» de tan pequeñas diferencias o
cambios observados.
Además, es importante recordar que las estadísticas son la ciencia de la generalización. Por lo general, no
estamos en condiciones de llevar a cabo estudios de tipo «censo» que abarcan poblaciones enteras. Por lo tanto,
solemos estudiar subconjuntos o muestras de una población y esperamos que las conclusiones extraídas del
estudio de dicho subconjunto se generalicen a la población en su conjunto. Este proceso está plagado de errores y
necesitamos técnicas estadísticas para que las generalizaciones sean sostenible.
Antes del advenimiento de las computadoras y del software estadístico, los investigadores y otros que se
ocupaban de las estadísticas tenían que hacer la mayor parte de su análisis a mano, recurriendo a libros de
fórmulas estadísticas y tablas estadísticas. Esto requería ser competente en las matemáticas subyacentes a las
estadísticas. Esto ya no es obligatorio, ya que un software cada vez más fácil de usar elimina la carga de los
cálculos y evita la necesidad de buscar tablas estadísticas. Por lo tanto, hoy en día, la comprensión de los
aspectos aplicados de la estadística basta para la mayoría de los investigadores y rara vez necesitamos
profundizar en las profundidades matemáticas de las estadísticas, para dar sentido a los datos que generamos o
analizamos.
Las aplicaciones de la bioestadística abarcan en términos generales tres ámbitos: descripción de los patrones en
los conjuntos de datos mediante diversas medidas descriptivas (estadísticas descriptivas), sacar conclusiones
sobre las poblaciones mediante diversas pruebas estadísticas aplicadas a los datos de muestra (estadísticas
inferenciales) y aplicación de técnicas de modelado para comprender la relación entre variables (modelado
estadístico), a veces con el objetivo de la predicción. En esta serie, analizaremos los usos aplicados de las
estadísticas sin profundizar en las profundidades matemáticas. Esto no es para negar los fundamentos
matemáticos de las estadísticas; estos se pueden encontrar en los libros de texto estadísticos. Nuestro objetivo
es presentar los conceptos y examinar las aplicaciones desde el punto de vista del usuario aplicado de
bioestadística.

PDF GENERADO POR PROQUEST.COM Page 14 of 27


Datos y variables

Data constitute the raw material of statistical work. They are records of measurements or observations, or simply counts. A variable refers to a particular character on which a set of data is recorded. Data are thus the values of a variable. Before undertaking a study, it is important to consider the nature of the variables to be recorded, since this will influence how observations are made, how they are summarized, and the choice of statistical tests to be used.
At the most basic level, it is important to distinguish between two types of data or variables. The first type comprises those measured on a suitable scale using an appropriate measuring device and is called a quantitative variable. Since quantitative variables always have values expressed as numbers, and differences between values have numerical meaning, they are also called numerical variables. The second type comprises those defined by some characteristic or quality, and is called a qualitative variable. Since qualitative data are best summarized by grouping observations into categories and counting the numbers in each, they are often referred to as categorical variables.
A quantitative variable may be continuous or discrete. A continuous variable can, at least in theory, take any value within a given range, including fractional values. A discrete variable can take only certain discrete values within a given range; often these values are integers. Sometimes variables (e.g., age of adults) are treated as discrete although, strictly speaking, they are continuous. A qualitative variable may be a nominal variable or an ordinal variable. A nominal variable covers categories that cannot be ranked, no category being more important than another. The data are generated simply by naming, on the basis of a qualitative attribute, the appropriate category to which the observation belongs. An ordinal variable has categories that follow a logical hierarchy and can therefore be ranked. We can assign numbers (scores) to nominal and ordinal categories, although the differences between those numbers have no numerical meaning. Counts of the categories, however, do have numerical significance. A special case exists for categorical or numerical variables when the variable in question can take only one of two numerical values or belong to only one of two categories; these are known as binary or dichotomous variables [Table 1]. {Table 1}

Numerical data may be recorded on an interval scale or a ratio scale. On an interval scale, the difference between two consecutive numbers carries the same significance anywhere along the scale, unlike the scoring of an ordinal variable ("ordinal scale"). For example, when height is measured, the difference between 100 and 102 cm is the same as the difference between 176 and 178 cm. The ratio scale is a special case of interval data recording. With interval scale data, the value 0 may be arbitrary, such as the 0 position on some temperature scales; 0 on the Fahrenheit scale lies at a different position from 0 on the Celsius scale. With the ratio scale, 0 actually indicates the point at which nothing is scored on the scale ("true 0"), such as 0 on the absolute or Kelvin temperature scale. Thus, we can say that an interval scale of measurement has the properties of identity, magnitude, and equal intervals, while the ratio scale has the additional property of a true 0. Only on a ratio scale can differences be judged in the form of ratios. 0°C is not zero heat, nor is 26°C twice as hot as 13°C, whereas such value judgments do hold on the Kelvin scale. In practice, this distinction is not tremendously important as regards the handling of numerical data in statistical tests.
It is possible to change the scales of data, so that numerical data become ordinal and ordinal data become nominal (even dichotomous). This may be done when the researcher is not confident about the accuracy of the measuring instrument, is not concerned about the loss of fine detail, or when group numbers are not large enough to represent a variable of interest adequately. It may also make clinical interpretation easier. For example, the Dermatology Life Quality Index (DLQI) is used to assess how much the skin problem of an adult subject is affecting his or her quality of life. A DLQI score <6 indicates that the skin problem is hardly affecting the quality of life, a score of 6-20 indicates a moderate to large effect on quality of life, while a score >20 indicates that the problem is severely degrading the quality of life. This categorization may be more relevant to the clinician than the actual DLQI score obtained. In contrast, conversion from categorical to numerical data is not feasible without actual measurements.
When exploring relationships between variables, some may be considered dependent (dependent variables) on others (independent variables). For example, in exploring the relationship between height and age, it is obvious that height depends on age, at least up to a certain age. Age is thus the independent variable, influencing the value of the dependent variable, height. When exploring the relationship between multiple variables, usually in
a modeling situation, the value of the outcome (response) variable depends on the value of one or more predictor
(explanatory) variables. In this situation, some variables may be identified that cannot be accurately measured or
controlled and only serve to confuse the results. They are called confounding variables or confounders. Thus, in a
study of the protective effect of a sunscreen in preventing skin cancer, the amount of time spent in outdoor activity
could be a major confounder. The extent of skin pigmentation would be another confounder. There could even be
confounders whose existence is unknown or effects unsuspected, for instance, undeclared consumption of
antioxidants by the subjects which is quite possible because the study would go on for a long time. Such
unsuspected confounders have been called lurking variables.
Numerical or categorical variables may sometimes need to be ranked, that is arranged in ascending order and new
values assigned to them serially. Values that tie are each assigned average of the ranks they encompass. Thus, a
data series 2, 3, 3, 10, 23, 35, 37, 39, 45 can be ranked as 1, 2.5, 2.5, 4, 5, 6, 7, 8, 9 since the two 3s encompass ranks 2
and 3, giving an average rank value of 2.5. Note that when a numerical variable is ranked, it gets converted to an
ordinal variable. Ranking obviously does not apply to nominal variables because their values do not follow any
order.
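The tie-averaging rule can be sketched in a few lines of Python (our illustration, not part of the original module), reproducing the worked example above:

```python
# Sketch of ranking with tie-averaging: tied values each receive the
# average of the ranks they encompass.

def rank_with_ties(values):
    """Return average 1-based ranks for a list of numbers."""
    order = sorted(values)
    ranks = []
    for v in values:
        # first and last 1-based positions of v in the sorted list
        first = order.index(v) + 1
        last = len(order) - order[::-1].index(v)
        ranks.append((first + last) / 2)
    return ranks

data = [2, 3, 3, 10, 23, 35, 37, 39, 45]
print(rank_with_ties(data))
# [1.0, 2.5, 2.5, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```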
Descriptive Statistics
Descriptive statistics implies summarizing a raw data set obtained from a sample or population. Traditionally,
summaries of sample data ("statistics") have been denoted by Roman letters (e.g., x̄ for mean, standard
deviation [SD], etc.) while summaries of population data ("parameters") have been denoted by Greek letters (e.g.,
μ for mean, σ for SD, etc.). The description serves to identify patterns or distributions in data sets from
which important conclusions may be drawn.
Categorical data are described in terms of percentages or proportions. With numerical data, individual
observations within a sample or population tend to cluster about a central location, with more extreme
observations being less frequent. The extent to which observations cluster is summarized by measures of central
tendency while the spread can be described by measures of dispersion.
Measures of Central Tendency
The mean (or more correctly, the arithmetic mean) is calculated as the sum of the individual values in a data
series, divided by the number of observations. The mean is the most commonly used measure of central tendency
to summarize a set of numerical observations. It is usually reliable unless there are extreme values (outliers) that
can distort the mean. It should not, ordinarily, be used in describing categorical variables because of the arbitrary
nature of category scoring. It may, however, be used to summarize category counts.
The geometric mean of a series of n observations is the nth root of the product of all the observations. It is always
equal to or less than the arithmetic mean. It is not often used but is a more appropriate measure of central location
when data recorded span several orders of magnitude, e.g. bacterial colony counts from a culture of clinical
specimens. Interestingly, the logarithm of the geometric mean is the arithmetic mean of the logarithms of the
observations. As such, the geometric mean may be calculated by taking the antilog of the arithmetic mean of the
log values of the observations. The harmonic mean of a set of non-zero positive numbers is obtained as the
reciprocal of the arithmetic mean of the reciprocals of these numbers. It is seldom used in biostatistics. Unlike the
arithmetic mean, neither geometric nor harmonic mean can be applied to negative numbers.
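The three means can be computed with Python's standard library, as in the sketch below (the colony-count figures are invented for illustration). Note the identity mentioned above: the logarithm of the geometric mean is the arithmetic mean of the logarithms.

```python
import math
from statistics import mean, geometric_mean, harmonic_mean

# Invented colony counts spanning several orders of magnitude
counts = [10, 100, 1000, 10000]

print(mean(counts))            # arithmetic mean: 2777.5
print(geometric_mean(counts))  # ~316.2, always <= the arithmetic mean
print(harmonic_mean(counts))   # ~36.0, seldom used in biostatistics

# Geometric mean as the antilog of the mean of the logs
gm = math.exp(mean(math.log(x) for x in counts))
print(round(gm, 1))
```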



Often data are presented as a frequency table. If the original data values are not available, a weighted average can
be estimated from the frequency table by multiplying each data value by the number of cases in which that value
occurs, summing up the products and dividing the sum by the total number of observations. A frequency table of
numerical data may report the frequencies for class intervals (the entire range covered being broken up into a
convenient number of classes) rather than for individual data values. In such cases, we can calculate the weighted
average by using the mid-point of the class intervals. However, in this instance, the weighted mean may vary
slightly from the arithmetic mean of all the raw observations. Apart from counts, there may be other ways of
ascribing weights to observations before calculating a weighted average.
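For instance, a weighted average from a frequency table of class intervals might be computed as follows (a sketch with invented class boundaries and counts):

```python
# Frequency table: class interval (lower, upper) -> count.
# The figures are invented for illustration.
freq_table = {(0, 10): 5, (10, 20): 12, (20, 30): 8}

total = sum(freq_table.values())
# Weight each class mid-point by its count, then divide by total n
weighted_mean = sum(((lo + hi) / 2) * n
                    for (lo, hi), n in freq_table.items()) / total
print(weighted_mean)  # 16.2
```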
For data sets with extreme values, the median is a more appropriate measure of central tendency. If the values in a
data series are arranged in order, the median denotes the middle value (for an odd number of observations) or the
average of the two middle values (for an even number of observations). The median denotes the point in a data
series at which half the observations are larger and half are smaller. As such it is identical to the 50 th percentile
value. If the distribution of the data is perfectly symmetrical (as in the case of a normal distribution that we
discuss later), the values of the median and mean coincide. If the distribution has a long tail to the right (a positive
skew), the mean exceeds the median; if the long tail is to the left
(a negative skew), the median exceeds the mean. Thus, the relationship of the two gives an idea of the symmetry
or asymmetry (skewness) of the distribution of data.
Mode is the most frequently occurring value in a data series. It is not often used, for the simple reason that it is
difficult to pinpoint a mode if no value occurs with a frequency markedly greater than the rest. Furthermore, two or
more values may occur with equal frequency, making the data series bimodal or multimodal [Box 1].
[INLINE:1]
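A small sketch (ours, with invented data) shows how a single outlier pulls the mean above the median, while the mode remains the most frequent value:

```python
from statistics import mean, median, mode

data = [2, 3, 3, 10, 23, 35, 37, 39, 450]  # one extreme value

print(mean(data))    # ~66.9, distorted by the outlier
print(median(data))  # 23, the middle of the 9 ordered values
print(mode(data))    # 3, the most frequent value
```

Here mean > median, which is consistent with a positively skewed series.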
Measures of Dispersion
The spread, or variability, of a data series can be readily described by the range, that is the interval between
minimum and maximum values. However, the range does not provide much information about the overall
distribution of observations and is obviously affected by extreme values.
A more useful estimate of the spread can be obtained by arranging the values in ascending order and then
grouping them into 100 equal parts (in terms of the number of values) that are called centiles or percentiles. It is
then possible to state the value at any given percentile, such as the 5 th or the 95 th percentile and to calculate the
range of values between any two percentiles, such as the 10 th and 90 th or the 25 th and the 75 th percentiles.
The median represents the 50 th percentile. Quartiles divide an ordered data set into four equal parts, with the upper
boundaries of the first, second, and third quartiles often denoted as Q1, Q2, and Q3, respectively. Note the
relationship between quartiles and percentiles. Q1 corresponds to 25 th percentile while Q3 corresponds to 75 th
percentile. Q2 is the median value in the set. If we estimate the range of the middle 50% of the observations about
the median (i.e., Q1-Q3), we have the interquartile range. If the dispersion in the data series is less, we can use the
10 th to 90 th percentile value to denote spread.
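With the standard library, quartiles and the interquartile range can be obtained as below (a sketch; note that different percentile conventions give slightly different cut points for small samples):

```python
from statistics import quantiles

data = [2, 3, 3, 10, 23, 35, 37, 39, 45]

# n=4 returns the three cut points Q1, Q2 (the median), and Q3
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)  # 3.0 23.0 37.0
print(q3 - q1)     # interquartile range: 34.0
```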
A still better method of measuring variability about the central location is to estimate how closely the individual
observations cluster about it. This leads to the mean square deviation or variance, which is calculated as the sum
of the squares of individual deviations from the mean, divided by one less than the number of observations. The
SD of a data series is simply the square root of the variance. Note that the variance is expressed in squared units,
which is difficult to comprehend, but the SD retains the basic unit of observation.
The formula for the variance (and SD) of a population has "n" as the denominator. However, the
expression (n - 1) is used when calculating the variance (and SD) of a sample. The quantity (n - 1) denotes the
degrees of freedom, which is the number of independent observations or choices available. For instance if a series
of four numbers is to add up to 100, we can assign different values to the first three, but the value of the last is
fixed by the first three choices and the condition imposed that the total must be 100. Thus, in this example, the
degrees of freedom can be stated to be 3. The degrees of freedom is used when calculating the variance (and SD)
of a sample because the sample mean is a predetermined estimate of the population mean, and, in the sample,



each observation is free to vary except the last one that must be a defined value.
The coefficient of variation (CV) of a data series denotes the SD expressed as a percentage of the mean. Thus, it
denotes the relative size of the SD with respect to the mean. CV can be conveniently used to compare variability
between studies, since, unlike SD, its magnitude is independent of the units employed.
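These measures can be computed with the standard library, as in the sketch below (invented data). Note that statistics.variance and statistics.stdev use the (n - 1) denominator described above, while pvariance and pstdev use n, for whole populations.

```python
from statistics import mean, stdev, variance

data = [4, 8, 6, 5, 3, 7]  # invented sample

s2 = variance(data)        # sum of squared deviations / (n - 1)
s = stdev(data)            # square root of the variance
cv = 100 * s / mean(data)  # SD as a percentage of the mean

print(s2)            # 3.5
print(round(s, 3))   # 1.871
print(round(cv, 1))  # 34.0
```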
Measures of Precision
An important source of variability in biological observations is measurement imprecision and CV is often used to
quantify this imprecision. It is thus commonly used to describe variability of measuring instruments and laboratory
assays, and it is generally taken that a CV of <5% indicates acceptable reproducibility.
Another measure of precision for a data series is the standard error of the mean (SEM), which is simply calculated
as the SD divided by the square root of the number of observations. Since SEM is a much smaller numerical value
than SD, it is often presented in place of SD as a measure of the spread of data. However, this is erroneous since
SD is meant to summarize the spread of data, while SEM is a measure of precision and is meant to provide an
estimate of a population parameter from a sample statistic in terms of the confidence interval (CI).
It is self-evident that when we make observations on a sample, and calculate the sample mean, this will not be
identical to the population ("true") mean. However, if our sample is sufficiently large and representative of the
population, and we have made our observations or measurements carefully, then the sample mean will be
close to the true mean. If we keep taking repeated samples and calculate a sample mean in each case, the
different sample means would have their own distribution, and this would be expected to have less dispersion than
that of all the individual observations in the samples. In fact, it can be shown that the different sample means
would have a symmetrical distribution, with the true population mean at its central location, and the SD of this
distribution would be nearly identical to the SEM calculated from individual samples.
In general, however, we are not interested in drawing multiple samples, but rather how reliable our one sample is in
describing the population. We use standard error to define a range in which the true population value is likely to lie,
and this range is the CI while its two terminal values are the confidence limits. The width of the CI depends on the
standard error and the degree of confidence required. Conventionally, the 95% CI is most commonly used.
From the properties of a normal distribution curve (see below) it can be shown that the 95% CI of the mean would
cover a range 1.96 standard errors either side of the sample mean, and will have a 95% probability of including the
population mean; while 99% CI will span 2.58 standard errors either side of the sample mean and will have 99%
probability of including the population mean. Thus, a fundamental relation that needs to be remembered is:
95% CI of mean = Sample mean +/- 1.96 x SEM.
It is evident that the CI would be narrower if SEM is smaller. Thus if a sample is larger, SEM would be smaller and
the CI would be correspondingly narrower and thus more "focused" on the true mean. Large samples therefore
increase precision. It is interesting to note that although increasing sample size improves precision, it is a
somewhat costly approach to increasing precision, since halving of SEM requires a 4-fold increase in sample size.
CIs can be used to estimate most population parameters from sample statistics (means, proportions, correlation
coefficients, regression coefficients, odds ratios, relative risks, etc.). In all cases, the principles and the general
pattern of estimating the CI remains the same, that is:
95% CI of a parameter = Sample statistic +/- 1.96 x standard error for that statistic.
The formula for estimating the standard error, however, varies for different statistics and in some instances is quite
elaborate. Fortunately, we generally rely on computer software to do the calculations.
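The fundamental relation above can be sketched directly in code (our illustration, with invented data):

```python
from statistics import mean, stdev

data = [4, 8, 6, 5, 3, 7, 6, 5]  # invented sample
n = len(data)

m = mean(data)
sem = stdev(data) / n ** 0.5  # SEM = SD / sqrt(n)
ci_low = m - 1.96 * sem       # lower 95% confidence limit
ci_high = m + 1.96 * sem      # upper 95% confidence limit

print(m)                                    # 5.5
print(round(sem, 3))                        # 0.567
print(round(ci_low, 2), round(ci_high, 2))  # 4.39 6.61
```

Quadrupling the sample size would halve the SEM and so halve the width of this interval, illustrating why precision is costly to buy with sample size alone.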
Frequency Distributions
It is useful to summarize a set of raw numbers with a frequency distribution. The summary may be in the form of a
table or a graph (plot). Many frequency distributions are encountered in medical literature [Figure 1] and it is
important to be familiar with commonly encountered ones.{Figure 1}
Majority of distributions that quantitative clinical data follow are unimodal, that is the data have a single peak
(mode) with a tail on either side. The most common of these unimodal distributions is the bell-shaped symmetrical
distribution called the normal distribution or the Gaussian distribution [Figure 2]. In this distribution, the values of



mean, median and mode will coincide. However, some distributions are skewed with a substantially longer tail on
one side. The type of skew is determined by the direction of the longer tail. A positively skewed distribution has a
longer tail to the right. In this case, the mean will be greater than the median because the mean is strongly
influenced by the extreme values in the right-hand tail. On the other hand, a negatively skewed distribution has a
longer tail to the left; in this instance, the mean will be smaller than the median. Thus, the relationship between
mean and median gives an idea of the distribution of numerical data.{Figure 2}
It is possible that datasets may have more than one peak (mode). Such data can be difficult to manage and it may
be the case that neither the mean nor the median is a representative measure. However, it is important to
remember that bimodal or multimodal distributions are rare and may even be artifactual. A distribution with two
peaks may actually be reflecting a combination of two unimodal distributions, for instance, one for each gender or
different age groups. In such cases, appropriate subdivision, categorization, or even recollection of the data may
be required to eliminate multiple peaks.
Probability Distributions
A random variable is a numerical quantity whose values are determined by the outcomes of a random experiment.
The possible values of a random variable and the associated probabilities constitute a statistical probability
distribution. The concepts of probability distributions and frequency distributions are similar in that each
associates a number with the possible values of a variable. However, for a frequency distribution, the number is a
frequency, while for a probability distribution, this number is a probability. A frequency distribution describes a set
of data that has been observed; it is thus empirical. A probability distribution describes data that might be
observed under certain specified conditions; hence it is theoretical. Probability distributions are part of descriptive
statistics, and they can be used to predict how random variables are expected to behave under certain conditions.
If the empirical data deviate considerably from the predictions of a probability distribution model, the correctness
of the model or its assumptions can be questioned, and we may look for alternative models to fit the empirical
data. [Table 2] provides examples of statistical probability distributions. Note that, they are broadly classified as
continuous or discrete probability distributions depending on whether the random variable in question is a
continuous or a discrete variable.{Table 2}
Of the many probability distributions that can be used to model biological events or observations, the most
common is the normal distribution. In such a distribution, the values of the random variable tend to cluster around
a central value, with a symmetrical positive and negative dispersion about this point. The more extreme values
become less frequent the further they lie from the central point [Figure 3]. The term "normal" relates to the sense of
'standard' against which other distributions may be compared. It is also referred to as a Gaussian distribution after
the German mathematician Carl Friedrich Gauss (1777-1855), although Gauss was not the first person to describe
such a distribution. The bell curve was named 'normal curve' by the great Karl Pearson. Important properties of a
normal distribution are:{Figure 3}
- Unimodal, bell-shaped distribution
- Symmetric about the mean
- Flattens symmetrically as the variance is increased
- Excess kurtosis is 0 ("kurtosis" refers to how peaked a distribution is)
- The tails may extend toward infinity, but the total area under the curve is taken as 1.
In a normal distribution curve, the mean, median, and mode coincide. The area delimited by one SD either side of
the mean includes 68% of the total area, two SDs 95.4%, and three SDs 99.7%; 95% of the values lie within 1.96 SDs
on either side of the mean. It is for this reason that the interval denoted by mean +/- 1.96 x SD is often taken as the
normal range or reference range for many physiological variables.
If we look at the equation for the normal distribution, it is evident that there are two parameters that define the
curve, namely μ (the mean) and σ (the SD):
f(x) = (1/(σ√(2π))) exp(-(x - μ)²/(2σ²))
The standard normal distribution curve is a special case of the normal distribution for which probabilities have
been calculated. It is a symmetrical bell-shaped curve with a mean of 0 and a variance (or SD) of 1. The random
variable of a standard normal distribution is the Z-score of the corresponding value of the variable for the normal



distribution. A standard normal distribution table shows cumulative probability associated with particular Z-scores
and can be used to estimate probabilities of particular values of a normally distributed variable.
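In software, statistics.NormalDist can stand in for the printed table; a brief sketch:

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # the standard normal distribution

print(round(z.cdf(1.96), 4))                 # ~0.975
print(round(z.cdf(1.96) - z.cdf(-1.96), 4))  # ~0.95: area within 1.96 SDs

# Z-score of an observation from a (hypothetical) normal variable
height = NormalDist(mu=170, sigma=10)
print(height.zscore(184))                    # 1.4
```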
In all biomedical research where samples are used to learn about populations, some random procedure is essential
for subject selection to avoid many kinds of bias. This takes the form of random sampling from a population or
randomized allocation of participants to interventional groups. Randomness is a property of the procedure rather
than of the sample and ensures that every potential subject has a fair and equal chance of getting selected. The
resulting sample is called a random sample. As the number of observations increases (say, n >100), the shape of a
random sampling distribution will approximate a normal distribution curve even if the distribution of the variable in
question is not normal. This is explained by the central limit theorem and is one reason why the normal distribution
is so important in biomedical research.
Many statistical techniques require the assumption of normality of the dataset. It is not mandatory for the sample
data to be normally distributed, but it should represent a population that is normally distributed.
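The central limit theorem can be seen in a small simulation sketch (our own): means of repeated samples drawn from a markedly skewed (exponential) distribution nonetheless cluster symmetrically around the true mean, with a spread close to the SEM.

```python
import random
from statistics import mean, stdev

random.seed(42)  # reproducible illustration
n = 100          # observations per sample

# 2000 sample means from an exponential distribution (true mean = 1)
sample_means = [mean(random.expovariate(1.0) for _ in range(n))
                for _ in range(2000)]

print(round(mean(sample_means), 2))   # close to 1.0 (the true mean)
print(round(stdev(sample_means), 2))  # close to 1/sqrt(100) = 0.1
```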
Presenting Data
Once summary measures of data have been calculated, they need to be presented in tables and graphs.
Appropriate data presentation summarizes the data in a compact and meaningful manner without burdening the
reader with a surfeit of information, enables conclusions to be drawn simply by looking at the summarized data
and, of course, helps in further statistical analysis where necessary.
Regarding data presentation in tables, it is helpful to remember the following:
- Tables should be numbered
- Each table must have a concise and self-explanatory title
- Tables must be formatted with an appropriate number of rows and columns but should not be too large. Larger tables can usually be split into multiple simpler tables
- Column headings and row classifiers must be clear and concise
- For tables showing frequency distributions, it must be clear whether the frequencies depicted in each class or class interval represent absolute frequency, relative frequency (i.e., the percentage of the total) or the cumulative frequency
- For tables depicting percentages, it must be clear whether the percentages represent percentages with respect to the row (row percentage) or the column (column percentage) in which the cell is located
- The mean is to be used for numerical data and symmetric (nonskewed) distributions
- The median should be used for ordinal data or for numerical data if the distribution is skewed
- The mode is generally used only for examining bimodal or multimodal distributions
- The range may be used for numerical data to emphasize extreme values
- The SD is to be used along with the mean
- Interquartile range or percentiles should be used along with the median
- SDs and percentiles may also be used when the objective is to depict a set of norms ("normative data")
- The CV may be used if the intent is to compare variability between datasets measured on different numerical scales
- 95% CIs should be used whenever the intent is to draw inferences about populations from samples
- Additional information required to interpret the table (e.g., explanation of column headings, other abbreviations, explanatory remarks) can be appended as footnotes.
For presenting data graphically, it is usually necessary to obtain the summary measures, counts or percentages of
the data. These can then be utilized to draw different types of graphs (or charts or plots or diagrams). The more
common types with some of their variants are summarized in [Table 3] and [Figure 4]. Although charts are visually
appealing, they should not replace tabulation of important summary data. Further, if not constructed or scaled
appropriately, charts can be misleading.{Figure 4}{Table 3}
A pictogram represents quantity by presenting stylized pictures or icons of the variable being depicted - the
number or size of the icon being proportional to the frequency. When comparing between groups using a
pictogram, it is preferable that same-sized icons be used across groups (with their numbers varying) - otherwise
the picture may be misleading. Pictograms are more often used in mass media presentations than in serious
biomedical literature.
Pie chart depicts frequency distribution of categorical data in a circle (the "pie"), with the sectors of the circle
proportional in size to the frequencies in the respective categories. A particular category can be emphasized by
pulling out that sector. All sectors are pulled out in an "exploded" pie chart. Pie charts can be made highly



attractive, by using color and three-dimensional design enhancements, but become cumbersome if there are too
many categories.
Bar chart (also called column chart) depicts categorical or numerical data as a series of vertical (or horizontal)
bars, with the bar heights (lengths) being proportional to the frequencies or the means. The bar widths and
separation between bars should be uniform but are of little significance other than to indicate that the bars denote
separate series or categories. Bars depicting subcategories can be stacked one on top of another (stacked or
segmented or component bar chart). The frequencies can be converted to percentages so that the total numbers
in each category add up to 100%, giving a 100% stacked bar chart where all the bars are of equal height. Two or more
data series or subcategories can be depicted on the same bar chart by placing corresponding bars side by side -
different patterns or colors are used to distinguish the different series or subcategories (compound or multiple or
cluster bar chart).
The histogram is similar to a bar chart in appearance but is used for summarizing continuous numerical data and
hence there should not be any gaps between the bars. The bar widths correspond to the class intervals. The
bars are usually drawn vertically, with the class intervals along the horizontal axis and the frequencies
along the vertical axis. A histogram is popularly used to depict the frequency distribution in a large data series.
Accordingly, the class intervals should be so chosen that the bars are narrow enough to illustrate patterns in the
data but not so narrow that they become too large in number. A histogram must be labeled carefully to depict
clearly where the boundaries lie.
A frequency polygon is a line diagram representation of the frequency distribution depicted by the histogram and
is obtained by joining the midpoints of the upper boundary of the histogram blocks. As such it depicts the
frequency distribution of numerical data as a curve.
Dot plot [Figure 5] depicts frequency distribution of numerical variables like histograms but with the advantage of
depicting individual values as well. Instead of bars, it has a series of dots for each value or class interval - each dot
representing one observation. The alignment can be vertical or horizontal. They are useful in highlighting clusters
and gaps in data sets as well as outliers. Dot plots are conceptually simple but become cumbersome for large data
sets. Scatter plots (sometimes erroneously called dot plots) are used for depicting association between two
variables with the X and Y coordinates of each dot representing the corresponding values of the two variables. A
bubble plot is an extension of the scatter plot to depict the relation between three variables - here each dot is
expanded into a bubble with the diameter of the bubble being proportional to the value of the third variable. This is
preferable to depicting the third variable on a Z axis since it is difficult to comprehend depth on a two-dimensional
surface.{Figure 5}
Stem-and-leaf plot or stem plot [Figure 6] is a sort of mixture of a diagram and a table. It has been devised to
depict frequency distribution, as well as individual values for numerical data. The data values are examined to
determine their last significant digit (the "leaf" item), and this is attached to the previous digits (the "stem" item).
The stem items are usually arranged in ascending or descending order vertically, and a vertical line is usually
drawn to separate the stem from the leaf. The number of leaf items should total up to the number of observations.
However, it becomes cumbersome with large data sets.{Figure 6}
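A minimal text rendering of a stem-and-leaf plot can be sketched as follows (our illustration for small integer data sets, with tens digits as stems and units digits as leaves):

```python
# Sketch: text stem-and-leaf plot for small non-negative integers.

def stem_and_leaf(values):
    """Return a stem-and-leaf plot as a multi-line string."""
    lines = []
    for stem in sorted({v // 10 for v in values}):
        leaves = sorted(v % 10 for v in values if v // 10 == stem)
        lines.append(f"{stem} | " + " ".join(str(l) for l in leaves))
    return "\n".join(lines)

print(stem_and_leaf([12, 15, 21, 23, 23, 34, 41, 45, 47]))
# 1 | 2 5
# 2 | 1 3 3
# 3 | 4
# 4 | 1 5 7
```

The number of leaf items totals the number of observations, as the text requires.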
The box-and-whiskers plot (or box plot) is a graphical representation of numerical data based on the five-number summary: minimum value, 25th percentile, median (50th percentile), 75th percentile and maximum value [Figure 7]. A rectangle is drawn extending from the lower quartile to the upper quartile, with the median dividing this "box", though not necessarily equally. Lines ("whiskers") are drawn from the ends of the box to the extreme values. Outliers may be indicated beyond the extreme values by dots or asterisks; in such "modified" or "refined" box plots, the whiskers have lengths not exceeding 1.5 times the box length. The whole plot may be aligned vertically or horizontally. Box plots are ideal for summarizing large samples and are being increasingly used. Multiple box plots, arranged side by side, allow ready comparison of data sets.{Figure 7}
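The five-number summary and the 1.5 times box-length fences of a modified box plot are straightforward to compute. Note that quartile conventions differ between texts; the sketch below uses the "inclusive" method of Python's standard `statistics` module, which is one common choice:

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

def whisker_fences(data, k=1.5):
    """Fences k box-lengths (interquartile ranges) beyond the quartiles;
    observations outside them are flagged as outliers in a modified box plot."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]   # 30 is a deliberate outlier
print(five_number_summary(data))      # (2, 4.0, 6.0, 8.0, 30)
print(whisker_fences(data))           # (-2.0, 14.0); 30 lies beyond the upper fence
```

In the modified box plot for these data, the upper whisker would stop at the largest observation within the fence (9), with 30 marked separately as an outlier.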
We have looked at the plots commonly used for summarizing data and depicting underlying patterns. Many other plots are used in biostatistics for depicting data distributions, time trends in observations and relationships between two or more variables, exploring goodness-of-fit to hypothesized data distributions and drawing inferences by comparing data sets. We will be introduced to selected other plots in subsequent modules in this series.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
Further Reading
Samuels ML, Witmer JA, Schaffner AA, editors. Description of samples and populations. In: Statistics for the Life Sciences. 4th ed. Boston: Pearson Education; 2012. p. 26-80.
Kirk RE, editor. Random variables and probability distributions. In: Statistics: An Introduction. 5th ed. Belmont: Thomson Wadsworth; 2008. p. 207-27.
Kirk RE, editor. Normal distribution and sampling distributions. In: Statistics: An Introduction. 5th ed. Belmont: Thomson Wadsworth; 2008. p. 229-55.
Dawson B, Trapp RG, editors. Summarizing data & presenting data in tables and graphs. In: Basic & Clinical Biostatistics. 4th ed. New York: McGraw-Hill; 2004. p. 23-60.
AuthorAffiliation
Avijit Hazra: Department of Pharmacology, Institute of Postgraduate Medical Education and Research, Kolkata, West Bengal
Nithya Gogtay: Department of Clinical Pharmacology, Seth GS Medical College and KEM Hospital, Parel, Mumbai, Maharashtra

DETAILS

Title: Biostatistics series module 1: Basics of biostatistics
Authors: Hazra, Avijit; Gogtay, Nithya
Publication: Indian Journal of Dermatology; Kolkata. Vol. 61, No. 1 (2016)
Publisher: Medknow Publications & Media Pvt. Ltd., Kolkata, India
ISSN: 0019-5154; e-ISSN: 1998-3611
Language: English
Document type: Journal article
DOI: http://dx.doi.org/10.4103/0019-5154.173988
Copyright: Medknow Publications & Media Pvt Ltd 2016