Biostatistics Notes-Numbered
Statistics is the science and art of dealing with variation in such a way as to obtain reliable results.
Biostatistics: The application of statistics to a wide range of topics in biology. Biostatistics is the
science that deals with the development and application of the most appropriate methods for:
• Collection of data.
• Presentation of the collected data.
• Analysis and interpretation of the results.
• Making decisions on the basis of such analysis.
Importance of statistics: A basic understanding of statistics is useful in conducting the
investigations for a research project and in presenting the results effectively. Statistics plays a
vital role in nursing, enabling evidence-based practice, quality improvement, and informed
decision-making. It helps nurses analyze patient data and predict outcomes, ultimately improving
patient care and safety.
Inferential Statistics: Uses a subset of the data (a sample) to make estimates, decisions, or
predictions about a larger set of data (the population). It consists of estimation and hypothesis
testing.
Population: The set of all measurements of interest to the investigator, e.g., the monthly income
of households in Pakistan or the number of TB patients in Pakistan.
Quantitative Data (Measurement): Those characteristics for which the measurements convey
information regarding an amount or quantity. A variable that is numerical in nature and that can
be ordered or ranked.
Example: Age, Blood pressure
Measurement scales: A type of classification that tells how variables are categorized, counted,
or measured. The four types of scales are nominal, ordinal, interval, and ratio.
Nominal scale: A measurement level that classifies data into mutually exclusive
(non-overlapping) categories on which no order or ranking can be imposed; a qualitative
variable that categorizes an element of a population. Examples: telephone number, zip code,
smoking status.
Ordinal scale: A measurement level that classifies data into categories that can be ranked;
however, precise differences between the ranks do not exist. It is a qualitative variable that
incorporates an ordered position or ranking. Examples: educational level, socio-economic status,
etc.
Interval scale: A scale having equal units but an arbitrary zero point. Can add and subtract.
Example: Temperature in °C or °F.
Ratio scale: A measurement level that possesses all the characteristics of interval measurement
and a true zero. Zero is the absence of the characteristic being measured. Can add, subtract,
multiply, and divide.
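A small Python sketch (an illustration, not part of the original notes) of why the true zero matters. It assumes two temperatures, 10 °C and 20 °C, and a hypothetical helper `celsius_to_kelvin`; Kelvin has a true zero and behaves as a ratio scale, while Celsius is an interval scale.

```python
import math

def celsius_to_kelvin(c):
    """Hypothetical helper: convert Celsius (interval) to Kelvin (ratio)."""
    return c + 273.15

low_c, high_c = 10.0, 20.0

# Differences are meaningful on both scales: 10 degrees apart either way
diff_c = high_c - low_c
diff_k = celsius_to_kelvin(high_c) - celsius_to_kelvin(low_c)

# Ratios are only meaningful on the ratio (Kelvin) scale
naive_ratio = high_c / low_c  # 2.0, but "twice as hot" is misleading
true_ratio = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

print(math.isclose(diff_c, diff_k))       # True
print(naive_ratio, round(true_ratio, 3))  # 2.0 1.035
```

Subtraction gives the same answer on both scales, but division only makes sense once zero means "none of the characteristic".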
Discrete scale: Can assume a countable number of values. There is a gap between any two
values. Example: number of children, number of missing teeth etc.
Continuous scale: A variable that can assume all values between any two specific values; a
variable obtained by measuring. Examples: height, weight, etc.
Frequency distribution: A tabular summary of a set of data showing the frequency (number) of
items in each of several non-overlapping groups, with each data value belonging to one and only
one group.
Class Frequency: Number of observations in a data set falling into a particular class.
Cumulative Frequency: Number of observations in a data set falling below or above a particular
class, inclusive of that particular class.
Class relative frequency: Class frequency divided by the total number of observations in the
data set.
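The three frequencies defined above can be computed together. This is a minimal Python sketch assuming hypothetical data (number of children in 10 households, not from the notes) and treating each distinct value as its own class; the cumulative column uses the "falling below, inclusive of the class" form.

```python
from collections import Counter

# Hypothetical data: number of children in 10 households (illustrative only)
data = [0, 1, 2, 2, 3, 1, 2, 4, 0, 2]

def frequency_table(values):
    """Return rows of (class, frequency, cumulative frequency, relative frequency)."""
    counts = Counter(values)  # class frequency for each distinct value
    n = len(values)
    rows = []
    cumulative = 0
    for cls in sorted(counts):
        freq = counts[cls]
        cumulative += freq                        # this class plus all classes below it
        rows.append((cls, freq, cumulative, freq / n))  # relative = freq / total
    return rows

for cls, freq, cum, rel in frequency_table(data):
    print(f"{cls:>5} {freq:>4} {cum:>4} {rel:>6.2f}")
```

The last cumulative frequency always equals the number of observations, and the relative frequencies sum to 1.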
GRAPHS: Graphs are geometric displays that convey information at a glance and are
mathematically less sophisticated.
DOT PLOT: One of the simplest graphical summaries of data is a dot plot. A horizontal axis
shows the range of data values. Then each data value is represented by a dot placed above the
axis.
PIE CHART: The pie chart is a commonly used graphical device for presenting relative
frequency distributions for qualitative data. First draw a circle; then use the relative frequencies
to subdivide the circle into sectors that correspond to the relative frequency for each class.
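Each sector's angle is the class relative frequency multiplied by 360°. A minimal Python sketch, assuming hypothetical smoking-status counts that are not from the notes:

```python
# Hypothetical smoking-status counts (illustrative only)
counts = {"Never": 50, "Former": 30, "Current": 20}

def pie_angles(freqs):
    """Convert class frequencies to pie-chart sector angles in degrees."""
    total = sum(freqs.values())
    # relative frequency times 360 degrees gives each sector's angle
    return {cls: 360 * f / total for cls, f in freqs.items()}

angles = pie_angles(counts)
print(angles)  # {'Never': 180.0, 'Former': 108.0, 'Current': 72.0}
```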
Mean: It is the arithmetic average of a set of numbers. Applicable for interval and ratio data and
not applicable for nominal or ordinal data. Computed by summing all values in the data set and
dividing the sum by the number of values in the data set.
E.g: Sample mean of the ages of patients coming to a clinic: 8, 9, 10, 11, 12, 9
$\bar{X} = \frac{\sum X}{n} = \frac{8 + 9 + 10 + 11 + 12 + 9}{6} = \frac{59}{6} \approx 9.83$
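The same computation in Python, as a quick check. The ages are the ones in the example; `statistics.mean` is from Python's standard library and should agree with the hand formula.

```python
import statistics

ages = [8, 9, 10, 11, 12, 9]  # ages from the example above

mean_manual = sum(ages) / len(ages)  # sum of all values / number of values
mean_lib = statistics.mean(ages)     # standard-library equivalent

print(round(mean_manual, 2))  # 9.83
```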
Median: The median is the middle value in an ordered array of numbers. Applicable for ordinal,
interval, and ratio data. Not applicable for nominal data.
Median (Computational Procedure): Arrange the observations in an ordered array. If there is
an odd number of terms, the median is the middle term of the ordered array. If there is an even
number of terms, the median is the average of the middle two terms.
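The computational procedure above translates directly into a short Python function. This is a sketch; the function name `median` is our own, and the first call reuses the clinic ages from the mean example.

```python
def median(values):
    """Median following the procedure above."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)  # step 1: arrange in an ordered array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:            # odd number of terms: the middle term
        return ordered[mid]
    # even number of terms: average of the two middle terms
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([8, 9, 10, 11, 12, 9]))  # 9.5 (ordered: 8, 9, 9, 10, 11, 12)
print(median([8, 9, 10, 11, 12]))     # 10
```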
Appropriate Measures of Central Tendency
Symmetric Distribution: When the data values are evenly distributed about the mean, the
distribution is said to be symmetric.
Skewed Distribution:
Negatively or Left Skewed: When the majority of data values fall to the right of the mean, the
distribution is said to be negatively (left) skewed.
Positively or Right Skewed: When the majority of data values fall to the left of the mean, the
distribution is said to be positively (right) skewed.
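The rule above compares how many values lie on each side of the mean. A minimal Python sketch of that rule, assuming hypothetical data sets; it is a rough rule of thumb rather than a formal skewness coefficient.

```python
def skew_direction(values):
    """Classify skew by counting values above and below the mean (rule of thumb)."""
    mean = sum(values) / len(values)
    above = sum(v > mean for v in values)  # values to the right of the mean
    below = sum(v < mean for v in values)  # values to the left of the mean
    if above > below:
        return "negatively (left) skewed"
    if below > above:
        return "positively (right) skewed"
    return "approximately symmetric"

# Hypothetical lengths of hospital stay (days): one long stay pulls the mean right
print(skew_direction([2, 2, 3, 3, 3, 4, 20]))  # positively (right) skewed
```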
What is the Normal Distribution (Curve)? It’s a theoretical model. The normal distribution
plays a very important role in statistical inference. It is a frequency polygon or histogram that
is unimodal, smooth, and symmetrical (no empirical distribution has a shape that perfectly
matches this ideal model). Because it is unimodal and symmetrical, it is bell-shaped.
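The properties of the theoretical model can be checked numerically. A sketch using Python's standard-library `statistics.NormalDist`, assuming a hypothetical normal curve with mean 120 and SD 15; the areas within 1, 2, and 3 SDs are the familiar 68-95-99.7 values.

```python
import math
from statistics import NormalDist

# Hypothetical normal model: mean 120, SD 15 (illustrative values only)
mu, sd = 120, 15
curve = NormalDist(mu=mu, sigma=sd)

# Symmetry: equal heights at equal distances below and above the mean
print(math.isclose(curve.pdf(mu - 10), curve.pdf(mu + 10)))  # True

# Unimodal: the curve is highest at the mean
print(curve.pdf(mu) > curve.pdf(mu - 10))  # True

# Area under the curve within k standard deviations of the mean
for k in (1, 2, 3):
    area = curve.cdf(mu + k * sd) - curve.cdf(mu - k * sd)
    print(k, round(area, 4))  # 1 0.6827, 2 0.9545, 3 0.9973
```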
Standard Error: The standard deviation of the sampling distribution is called the standard error.
Sampling Error: The difference between a sample measure and the corresponding population
measure that arises because the sample is not a perfect representation of the population.
Sampling distribution of the sample mean: The distribution of the means computed from all
possible random samples of a specific size taken from a population.
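Taking every possible sample is impractical, but repeating random sampling many times approximates the sampling distribution. This Python sketch assumes a hypothetical, simulated population of 10,000 blood-pressure values. It compares the standard deviation of the sample means (the standard error) with the standard result σ/√n, which the notes do not state but which holds for random samples from a large population. It also shows the sampling error of one sample mean.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 systolic BP values (simulated, illustrative only)
population = [random.gauss(120, 15) for _ in range(10_000)]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 25        # sample size
reps = 2_000  # repeated samples, approximating "all possible samples"

# Sampling distribution of the sample mean
sample_means = [statistics.fmean(random.sample(population, n)) for _ in range(reps)]

empirical_se = statistics.stdev(sample_means)  # SD of the sampling distribution
theoretical_se = sigma / n ** 0.5              # sigma / sqrt(n)

# Sampling error of one sample mean: sample measure minus population measure
one_sample_error = sample_means[0] - mu

print(f"empirical SE   = {empirical_se:.2f}")
print(f"theoretical SE = {theoretical_se:.2f}")
print(f"sampling error of first sample mean = {one_sample_error:.2f}")
```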
Methods of testing hypotheses: The three methods used to test hypotheses are:
Ex: The mean balance score used to assess muscle function among rheumatoid arthritis patients is
lower than that of osteoarthritis patients.
Types of hypotheses:
NULL HYPOTHESIS (H0): A claim that there is no difference between the population
parameter and the hypothesized value.
Ex: The mean balance score used to assess muscle function among rheumatoid arthritis (RA)
patients is greater than or equal to that of osteoarthritis (OA) patients (4).
Directional and Non-directional Hypotheses: One-tailed hypotheses are directional; two-tailed
hypotheses are non-directional.
• Null Hypothesis
• Alternative Hypothesis (Researcher Hypothesis)
• Choice of appropriate level of significance (α)
• Assumptions & Test Statistic (Formula)
• Rejection Region (Critical Region)
• Conclusion
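The steps listed above can be sketched as a one-sample z-test in Python. All numbers are hypothetical, the population SD is assumed known (the usual z-test assumption), and the test is left-tailed (directional). The decision uses the rejection-region approach with `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

# All numbers below are hypothetical, chosen only to illustrate the steps.
mu0 = 4.0    # hypothesized population mean
sigma = 1.2  # population SD, assumed known (z-test assumption)
n = 36       # sample size
xbar = 3.6   # observed sample mean

# Steps 1-2: H0: mu >= 4 versus H1: mu < 4 (directional, left-tailed)
# Step 3: level of significance
alpha = 0.05

# Step 4: test statistic z = (xbar - mu0) / (sigma / sqrt(n))
z = (xbar - mu0) / (sigma / n ** 0.5)

# Step 5: rejection region for a left-tailed test is z <= z_critical
z_critical = NormalDist().inv_cdf(alpha)  # about -1.645

# Step 6: conclusion
reject = z <= z_critical
print(f"z = {z:.2f}, critical value = {z_critical:.3f}, reject H0: {reject}")
```

Here z = -2.00 falls in the rejection region, so H0 is rejected at α = 0.05.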
Unit-8 Type I and Type II errors, power of the test, and p-value
p-value approach: Another approach, and nowadays the most common one, is to report the extent
to which the test statistic disagrees with the null hypothesis and to compare it with the value of
α when deciding whether to reject the null hypothesis. This measure of disagreement is called the
p-value.
P-Value: In hypothesis testing, statistical software commonly reports the p-value.
The p-value measures the strength of the evidence against H0. The p-value is compared with the
value of alpha when deciding whether to reject the null hypothesis.
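A minimal Python sketch of the p-value approach for a z statistic, assuming the usual rule "reject H0 when p ≤ α". The function names `p_value_z` and `decide` are hypothetical, and the z value of -2.0 is made up for illustration.

```python
from statistics import NormalDist

def p_value_z(z, tail):
    """p-value for a z statistic; tail is 'left', 'right' or 'two'."""
    std = NormalDist()
    if tail == "left":
        return std.cdf(z)
    if tail == "right":
        return 1 - std.cdf(z)
    if tail == "two":
        return 2 * (1 - std.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")

def decide(p, alpha=0.05):
    """Reject H0 when the p-value is less than or equal to alpha."""
    return "reject H0" if p <= alpha else "do not reject H0"

p = p_value_z(-2.0, "left")    # hypothetical z from a left-tailed test
print(round(p, 4), decide(p))  # 0.0228 reject H0
```

The smaller the p-value, the stronger the evidence against H0. The same z of -2.0 tested two-tailed gives p ≈ 0.0455.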
Type II Error (Non-rejection error or Beta (β) Error): It is the decision not to reject H0 when
H0 is false.
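β and the power of the test (1 − β) can be computed for a specific true mean. This Python sketch assumes a left-tailed one-sample z-test with known σ and hypothetical values (μ0 = 4, true mean 3.5, σ = 1.2, n = 36). The function name `type2_error_left_tailed` is our own.

```python
from statistics import NormalDist

def type2_error_left_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """Beta and power of a left-tailed z-test (H1: mu < mu0) when the
    true mean is mu1 < mu0. Assumes sigma is known."""
    se = sigma / n ** 0.5
    z_alpha = NormalDist().inv_cdf(alpha)  # left-tail critical value
    cutoff = mu0 + z_alpha * se            # reject H0 if sample mean <= cutoff
    # P(reject H0 | true mean mu1): sample mean ~ Normal(mu1, se)
    power = NormalDist(mu1, se).cdf(cutoff)
    beta = 1 - power                       # P(do not reject H0 | H0 false)
    return beta, power

beta, power = type2_error_left_tailed(mu0=4.0, mu1=3.5, sigma=1.2, n=36)
print(f"beta = {beta:.3f}, power = {power:.3f}")  # about 0.196 and 0.804
```

Increasing n shrinks the standard error, which lowers β and raises the power.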
Regression: Regression is a statistical method used to describe the nature of the relationship
between variables, that is, whether it is positive or negative, linear or non-linear. Regression is a
numerical measure that is used to answer the following question:
Independent variable: Independent variable is the variable in regression that can be controlled
or manipulated.
Dependent variable: The dependent variable is the variable in regression that cannot be
controlled or manipulated.
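For one independent variable, a common way to estimate a linear relationship is least-squares simple linear regression, y = a + b·x. The notes do not specify a fitting method, so least squares is an assumption here. This Python sketch uses hypothetical data and a hypothetical function name `fit_line`. Dose is the independent (controlled) variable and drop in blood pressure is the dependent one; the sign of the slope b gives the direction of the relationship.

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b*x for one independent variable x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope: positive or negative relationship
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical data: drug dose (independent) vs. drop in blood pressure (dependent)
dose = [1, 2, 3, 4, 5]
bp_drop = [3, 5, 7, 9, 11]
a, b = fit_line(dose, bp_drop)
print(a, b)  # 1.0 2.0
```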