Topic 7-9 Notes
Unit objectives
1. Explain the types of events
2. Explain the laws of probability
3. Describe tree diagrams
4. Explain the mathematical calculation of probability
7.1 Probability
Probability is the chance that an event will occur.
The theory of probability provides the foundation for statistical inference.
The concept of probability is not foreign to health workers and is frequently encountered
in everyday communication, e.g. a physician may say that she is 95% certain that a
patient has a particular disease, or a public health nurse may say that she is certain a
patient will default treatment nine times out of ten.
The theory of probability can be explained as follows: if a coin is tossed, it is equally
likely to come down heads or tails. This does not mean that if the coin is tossed twice
it will necessarily come down heads once and tails once, but only that about one half
of a large series of tosses will be heads.
When we perform an experiment a large number of times (n) and observe a particular
outcome m times, the probability (p) of this outcome is estimated to be m/n, i.e. the
ratio of successes to the total number of events. For example, by observing a
large series of newly live-born children, we might estimate the probability of the
birth of a male child to be 0.53 by computing the ratio of live male births to total
live births.
An event is an outcome of interest. If there are many events, they are referred
to as a set of events/set of outcomes. Probability relies on the concept of a set.
We use operations to compute probability. The set operations used in probability
include:
1. The union
2. The intersection
3. Complement
The union
The union of events A and B is written as A∪B; it represents the set that contains
the elements of A, the elements of B, or both, where A is one set of events and B
is another.
Note that when we speak of elements found in either one event or the other, or in
both of them, we are talking about the union of the two events.
For example, in a single roll of a die (possible rolls 1,2,3,4,5,6), let A be the event
of even-numbered rolls (2,4,6) and B the event of rolls less than 5 (1,2,3,4). The
union of A and B (A∪B) is the subset that contains the elements found in either set
or in both sets, namely 1,2,3,4,6.
P(A∪B)
This can be illustrated in a Venn diagram, where we consider A to be one set of
events and B another:
A – even-numbered rolls: 2,4,6
B – rolls less than 5: 1,2,3,4
(the two sets overlap at their common elements, 2 and 4)
Intersection
For any 2 events, say A and B, the intersection of A and B is written as A∩B or "A and B".
This represents the event that both A and B occur. Therefore the probability of the
intersection of A and B is the chance of observing events A and B simultaneously, i.e. both
occurring.
P(A∩B)
Note that if two events in the same outcome set have some elements in common,
the two events are said to intersect. The intersection of the 2 events is that subset
composed of those common elements.
Complement
Once a subset has been defined, all other remaining elements in the outcome set are said
to be the complement of that subset, e.g. if an event is defined as even-numbered rolls
(2,4,6), the complementary subset consists of the odd-numbered rolls (1,3,5).
If the subset is rolls less than 5 (1,2,3,4), then the complement is the subset
consisting of rolls of 5 or greater (5,6).
The complement of event A is written as Ã (or A~).
This refers to the negation of event A. Therefore the probability of the complement of A,
P(Ã), is the chance that event A does not occur.
P(Ã) + P(A) = 1
P(Ã) – probability of event A not occurring
P(A) – probability of event A occurring
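These operations can be tried directly in Python, whose built-in set type mirrors them. The following is a minimal sketch (not part of the original notes); the names outcomes, A, B and p are chosen here for illustration:

# Outcome set for one roll of a die, with A and B as in the examples above
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}       # even-numbered rolls
B = {1, 2, 3, 4}    # rolls less than 5

union = A | B                # {1, 2, 3, 4, 6}
intersection = A & B         # {2, 4}
complement_A = outcomes - A  # {1, 3, 5}

def p(event):
    # With equally likely outcomes, P(event) = size of event / size of outcome set
    return len(event) / len(outcomes)

print(p(union), p(intersection), p(complement_A))  # 0.833..., 0.333..., 0.5
print(p(A) + p(complement_A))                      # 1.0, since P(A) + P(Ã) = 1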
7.2 Concepts in Probability
1. Relative frequency of an event/ probability of an event:
Relative frequency of an event is the proportion of the total observations of outcomes
that the event represents, e.g. in tossing a coin (H = head, T = tail), if n is the number of
tosses and f the total number of heads observed, then the relative frequency of heads
is:
= f/n
If the number of heads observed in 100 coin tosses is 52, then the relative frequency is:
52/100 = 0.52
R.F. = frequency of that event / total no. of all observations = f/n
The value of f may obviously range from 0 to n.
Probability is the likelihood of that event, expressed either by the relative frequency
observed from a large amount of data or by knowledge of the system under study.
Example 1
A sample of 852 vertebrate animals was taken randomly from a forest. The sampling was
done with replacement and the data were as below:
Vertebrates   Number/frequency   Relative frequency
Amphibians    53                 0.06
Turtles       41                 0.05
Snakes        204                0.24
Birds         418                0.49
Mammals       136                0.16
Total         852                1.00
In the above example, the relative frequencies of the vertebrate groups have been
observed from randomly sampling forest animals. If each animal has an equal chance of
being caught, we may estimate the probability (p) that the next animal captured will be
a snake as p = 0.24, which is equal to the relative frequency.
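The relative-frequency column in Example 1 can be reproduced with a few lines of Python; this is an illustrative sketch (the variable names are our own, not from the notes):

counts = {"Amphibians": 53, "Turtles": 41, "Snakes": 204,
          "Birds": 418, "Mammals": 136}
n = sum(counts.values())  # total number of observations, 852

for group, f in counts.items():
    # Relative frequency f/n, which also estimates the probability
    # that the next animal captured belongs to this group
    print(group, round(f / n, 2))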
Probability may sometimes be predicted on the basis of knowledge about the system,
e.g. the structure of a coin (head or tail) or the Mendelian principle of heredity, i.e.
human gender being either male or female but not both.
Probability of an outcome A is denoted by P(A). Thus: P(A) = p which means that the
probability of event A is p.
Probability, like relative frequency, can range from 0 to 1.
0≤p≤1
A probability of 0 means that the event is impossible, i.e. it cannot happen, e.g. in
tossing a coin, p(neither head nor tail) = 0.
A probability of 1 means that the event is certain, e.g. in tossing a coin, p(H or T) = 1, or
the probability that the sun will rise from the East tomorrow is p(sun rising from the
East tomorrow) = 1.
Mutually Exclusive
If two events have no elements in common, they are said to be mutually exclusive.
Mutually exclusive events are those events that cannot occur at the same time e.g.
light and darkness
How can someone know that events are mutually exclusive?
1. When the probability of the intersection of A and B is equal to 0:
P(A∩B) = 0
The intersection of A and B is the chance of observing events A and B simultaneously,
so where P(A∩B) = 0, this implies that the events A and B are mutually exclusive.
2. The addition rule of probability can be used to show that events are
mutually exclusive.
The addition rule of probability states that the probability of event A occurring, plus the
probability of event B occurring, minus the probability of both A and B occurring, is
equal to the probability of either A or B or both occurring (P(A∪B)).
P(A∪B) = P(A) + P(B) - P(A∩B)
Since P(A∩B) = 0 for mutually exclusive events:
P(A∪B) = P(A) + P(B)
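A worked illustration (using the die-roll events from section 7.1, where A = even rolls {2,4,6} and B = rolls less than 5 {1,2,3,4}; these are not mutually exclusive, since they share 2 and 4):
P(A∪B) = P(A) + P(B) - P(A∩B) = 3/6 + 4/6 - 2/6 = 5/6
This agrees with counting the union {1,2,3,4,6} directly: 5 of the 6 equally likely rolls.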
Independent Events
Events are said to be independent if the occurrence of one does not affect the
occurrence of the other. The multiplication rule of probability can be used to show that
events are independent.
Events A and B are said to be independent if the probability of the intersection of A and
B is equal to the product of the probability of A, P(A), and the probability of B, P(B):
P(A∩B) = P(A)·P(B)
P(A∩B) also means P(A and B).
Therefore, independent events can be mutually exclusive only if the individual
probability of one of the events, P(A) or P(B), is equal to zero.
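A worked illustration: in two tosses of a fair coin, let A be heads on the first toss and B heads on the second. The tosses do not affect each other, so
P(A∩B) = P(A)·P(B) = 0.5 × 0.5 = 0.25
which agrees with counting: HH is 1 of the 4 equally likely outcomes HH, HT, TH, TT.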
Conditional Probability
Conditional probability refers to the chance of an event occurring given that another
event has already occurred.
This concept of conditional probability is important in scientific research in the sense
that it helps researchers account for events that have already occurred. For example,
in the use of ARVs in HIV and AIDS, you may recruit patients and start them on drug B
and, because of side effects, change to drug A. Of interest then is the probability of A
being effective given that B had already been given.
The probability of A happening given that B has occurred is represented as P(A|B).
This can be calculated by dividing the probability of both A and B occurring by the
probability of the event that has already occurred, P(B):
P(A|B) = P(A∩B)/P(B)
Example 1
In an STD test, the probability that doctor A will make a positive diagnosis is 0.1 and the
probability that doctor B will make a positive diagnosis is 0.17. The probability of
both doctors (A and B) making a positive diagnosis is 0.08. What is the probability that
doctor A makes a positive diagnosis given that doctor B has made one?
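Applying the formula above:
P(A|B) = P(A∩B)/P(B) = 0.08/0.17 ≈ 0.47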
7.3 Application of Probability Laws in Screening Tests (Bayes' Theorem)
In the health sciences field a widely used application of probability law and concepts is
found in the evaluation of screening tests and diagnostic criteria.
Of interest to health workers is an enhanced ability to correctly predict the presence
or absence of a disease from the knowledge of test results and the status of
presenting symptoms
In our consideration of screening tests, we must be aware of the fact that a screening
test is not infallible, i.e. a testing procedure may yield a false positive or a
false negative.
A false positive: results when a test indicates a positive status when the true status is
negative
A false negative: results when a test indicates a negative status when the true status
is positive
Results of a screening test may be presented in a two-by-two table (2x2 table) as
follows:

                     Disease status
Test result      Present (D)   Absent (D-)   Total
Positive (T)     a             b             a+b
Negative (T-)    c             d             c+d
Total            a+c           b+d           a+b+c+d = n
Using a 2x2 table, a variety of probability estimates may be computed from the
information displayed. E.g. we may compute the conditional probability estimate
P(T|D) = a/(a+c). This ratio is an estimate of the sensitivity of the screening test.
Sensitivity of a test or symptom is the probability of a positive test result (or
presence of the symptom) given the presence of the disease.
We may also compute the conditional probability estimate P(T-|D-) = d/(b+d). This
ratio is an estimate of the specificity of the screening test.
The specificity of a test or symptom is the probability of a negative result (or
absence of the symptom) given the absence of the disease.
We can also use the data to estimate the conditional probability P(D|T) = a/(a+b). This
ratio is an estimate of a probability called the predictive value positive of a screening
test/symptom.
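These three estimates are simple ratios of the 2x2 cell counts, as the following illustrative Python sketch shows (the function name and the example counts are hypothetical, not from the notes):

def screening_estimates(a, b, c, d):
    # a, b, c, d are the cell counts defined in the 2x2 table above
    sensitivity = a / (a + c)  # P(T|D)
    specificity = d / (b + d)  # P(T-|D-)
    ppv = a / (a + b)          # P(D|T), predictive value positive
    return sensitivity, specificity, ppv

# Hypothetical counts: 45 true positives, 5 false positives,
# 15 false negatives, 935 true negatives
print(screening_estimates(45, 5, 15, 935))  # (0.75, 0.994..., 0.9)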
TOPIC 8: PROBABILITY DISTRIBUTION
Unit objectives
1. Define probability
2. Explain terms related to probability distribution
3. Describe the construction of probability distribution
4. Explain types of probability distribution
8.1 Definition of probability
Probability is the measure of the likelihood that an event will occur.
Terms related to probability distribution
A variable is a symbol (A, B, x, y, etc.) that can take on any of a specified set of
values.
When the value of a variable is the outcome of a statistical experiment, that variable
is a random variable.
Random variable is a variable whose value is determined by the outcome of a
random experiment.
A discrete random variable is one whose set of assumed values is countable
(arises from counting).
A continuous random variable is one whose set of assumed values is uncountable
(arises from measurement.).
A discrete probability distribution is a table (or a formula) listing all possible
values that a discrete variable can take on, together with the associated
probabilities.
The function f(x) is called a probability density function for the continuous
random variable X, where the total area under the curve bounded by the x-axis is
equal to 1. The area under the curve between any two ordinates x = a and x = b is
the probability that X lies between a and b.
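As an illustration (not in the original notes), the distribution of a single roll of a fair die is a discrete probability distribution; its table lists each possible value with its probability, and the probabilities sum to 1:

x      1    2    3    4    5    6
f(x)   1/6  1/6  1/6  1/6  1/6  1/6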
8.2 Probability Distributions
Values of random variables and the probabilities of their occurrence may be
summarized by means of a probability distribution.
Probability distributions may be expressed in the form of tables, graphs or formulae.
Knowledge of the probability distributions of random variables provides health workers
and researchers with a powerful tool for summarizing and describing a set of data, and
for reaching conclusions about a population of data on the basis of a sample of data
drawn from the population.
Binomial Distribution
– Is one of the most commonly encountered forms of probability distribution
– Binomial distribution is commonly used in situations where events of interest
lead to only two possible outcomes. In this case one is treated as a success and
the other a failure.
– This is concerned with events which may have two possibilities e.g. head or tail,
male or female, dead or alive, cured or not cured.
– It is used in giving the distribution of categorical data. It basically deals with
proportions (using percentages %).
For example: in research on the prevalence of violence by a marital partner among
residents of Nambale division, Busia district, you may ask your respondents:
Have you ever experienced any violence by your partner?
1] Yes 2] No
In this kind of study, the sample size n is usually determined before data collection.
The proportion (p̂) of those who have experienced violence from their partners can be
generated by this study.
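The notes do not state the binomial formula itself, but under the standard form P(X = k) = C(n,k) p^k (1-p)^(n-k) such probabilities are easy to compute. The following Python sketch is illustrative, and the survey numbers in it are hypothetical:

from math import comb

def binomial_pmf(k, n, p):
    # P(X = k): probability of exactly k "successes" in n trials
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical: if the true prevalence of partner violence were p = 0.3,
# the probability of exactly 3 "Yes" answers among n = 10 respondents:
print(binomial_pmf(3, 10, 0.3))  # ~0.267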
Normal Distribution
Unlike the binomial distribution, the normal distribution deals with quantitative
variables that are continuous in nature, i.e. variables that can take any value within a
range. Quantitative variables can range from very small to very large values and from
negative to positive values.
The normal distribution is applied to continuous variables. It is used to represent
population characteristics that may be observed on a continuous scale, e.g. length,
age, weight, height etc.
The normal distribution is the most important distribution in statistics. It is also called
the Gaussian distribution after the German mathematician Carl Friedrich Gauss (1777-
1855), who first used it in astronomy. It is represented by a bell-shaped curve which is
fully described by the mean and the variance/standard deviation of a distribution.
The importance of Mean and Variance /Standard Deviations in Defining a
Normal Curve
1. When the mean changes (increases or decreases), the whole distribution is
shifted with it.
If the mean increases, the distribution shifts to the right; if the mean decreases,
the distribution moves to the left.
For this reason, the mean is also called a measure of location, i.e. a measure of central
tendency of the distribution (like the median and mode) which locates its centre.
When the mean changes, the location of the distribution also changes.
2. When the standard deviation changes (increases or decreases), the spread of
the distribution changes.
If the SD increases, the spread increases and the height of the distribution decreases.
If the SD decreases, the spread is reduced and the height of the distribution increases.
For this reason, SD is said to define the spread of values around the mean.
The mean and the variance/standard deviation can be used in making quantitative
statements about the distribution if the distribution is normally distributed. For
example, it is estimated that 1 SD on either side of the mean contains 68% of the
population, 2 SD contains 95%, and 3 SD on either side of the mean contains 99.7% of
the population.
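These coverage figures can be checked numerically from the standard normal CDF, Φ(z) = (1 + erf(z/√2))/2; the following Python sketch is illustrative:

from math import erf, sqrt

def coverage(k):
    # P(-k < Z < k) for a standard normal Z, which equals erf(k/sqrt(2))
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(coverage(k), 3))  # 1: 0.683, 2: 0.954, 3: 0.997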
Sampling and Sampling Distribution
– The sampling method is applied to collect the data in most studies.
– According to this method, a few units from the whole population are selected,
and the results obtained on this basis are generalized to the entire population.
The advantages of the sampling method are:
1. It is cheaper to collect the data, since only a sample is studied
2. Sampling saves time, since only a sample is studied
3. Since only a part of the whole population is studied, better-quality work with
better supervision can be provided
4. An investigation of a small part of the population gives us more detailed
information
NB: If the sample size is too small and sample units are not representative
then this method will not be reliable
Sampling errors and non-sampling errors
A sample, being only a part of a population, cannot perfectly represent the population,
no matter how carefully the sample is collected.
This gives rise to a difference between the value of a sample statistic and the true
value of the corresponding population parameter. Such a difference is called the
sampling error for that sample.
For example, if a sample mean (x̄) is obtained from a sample of size n and the
population mean is µ, then the difference between x̄ and µ is the sampling error,
that is:
Sampling error = x̄ - µ
Sampling Distribution and the distribution of the sample mean
Different samples drawn from the same population could have different means. When
the means are plotted, they produce a sampling distribution of the means. The
sampling variability of the different means is measured by the standard error.
Standard Error: the standard deviation of the sample means, denoted SE.
If the standard deviation of the observations is s, then the standard error of the mean,
from a sample of size n, will be s/√n.
If the distribution of the population is normal, so will be the distribution of the means
of samples of the same size drawn from that population.
The table of the normal distribution can therefore be used in making statements
about the distribution of the means.
In statistics it’s stated that if the original population had a normal distribution with
mean µ and the standard deviation σ then the distribution of means of sample of size
n is also normal with mean Xbar and the standard error σ/√n.
This holds true even if the original distribution is not normal. That is for any
distribution with mean µ and standard deviation σ of the distribution of means of
samples of size n (provided n is not very small, will be normal with mean and standard
error σ/√n. This is why the normal distribution is so useful in statistics – no matter
what the original distribution is, the distribution of means will always be normal. This
is the central limit theorem.
Central Limit Theorem: states that given a population of any (even non-normal)
functional form with mean µ and finite standard deviation σ, the sampling
distribution of x̄ computed from samples of size n from this population will have
mean µ and standard error σ/√n, and will be approximately normally distributed
when the sample size is large.
NB: The sampling distribution of sample means is the distribution of means obtained
from repeated sampling.
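The theorem can be seen by simulation; this illustrative Python sketch draws repeated samples from a skewed (exponential) population with µ = σ = 1 and checks that the sample means have mean near µ and spread near σ/√n:

import random
from statistics import mean, stdev

random.seed(42)
n = 30  # sample size; sigma/sqrt(n) = 1/sqrt(30) ~ 0.183

# Means of 5000 samples of size n from an exponential(1) population
sample_means = [mean(random.expovariate(1.0) for _ in range(n))
                for _ in range(5000)]

print(round(mean(sample_means), 3))   # close to mu = 1.0
print(round(stdev(sample_means), 3))  # close to sigma/sqrt(n) ~ 0.183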
TOPIC 9: HYPOTHESIS TESTING
Unit objectives
1. Define hypothesis testing
2. Explain methods of hypothesis testing
3. Describe the formulation of hypothesis
9.1 Definition of terms
– A hypothesis is a guess or an assumption. It is a tentative explanation for certain
behaviour patterns, phenomena or events that have occurred or will occur. For
example, it may be believed that a given drug cures 95% of the patients taking it.
This is only a guess. It may or may not be true.
– In research, a hypothesis is a statement that describes an unknown but tentatively
reasonable outcome for an existing phenomenon. It is a tentative answer to what
the researcher considers to be the possible outcome of an existing problem or
phenomenon. There are 3 types of hypotheses: conceptual, research and statistical
hypotheses.
– A statistical hypothesis is used in quantitative research. It states a relationship
between numbers representing statistical properties of data such as the mean,
variance, proportions and correlation. This hypothesis is a guess about the value of
a population parameter, or about the relationship between the values of two or more
parameters.
– A statistical hypothesis consists of the null hypothesis (H0) and the alternative
hypothesis (H1). In research, the hypothesis is stated in the null form, i.e. a hypothesis
of no difference, no relationship, no association etc. The stated hypothesis is what is
tested in order either to reject or not to reject the null hypothesis. The process of
making a decision on the hypothesis after data analysis is termed hypothesis testing.
– Hypothesis testing is therefore the process by which this belief or opinion is tested
by statistical means. It helps one to decide on the basis of the information obtained
from sample data whether to accept or reject a statement or an assumption about
the value of the population parameter.
– Hypothesis testing is also called test of significance. It is one of the most important
techniques of statistical inferencing. Hypothesis testing involves the process of
decision making using sample data. Its purpose is to aid in reaching a decision
concerning a population by examining a sample from the population.
– In hypothesis testing, the null hypothesis is specified; this is the hypothesis of
no difference and it is denoted by H0. The problem is then to find whether the
sample data throw any light on the plausibility of this hypothesis. That is, we
proceed to evaluate the probability of having obtained the observed results, or
equally or even more extreme results, if H0 were true. The decision we take
depends on whether this probability is high or low.
– To avoid philosophical argument about what is a high or low probability, we fix a
probability level against which we compare the probability obtained from our data
under H0. This fixed probability level is called the level of significance (α). It is
decided upon before the data are analyzed and depends on the nature of the
problem being evaluated. In medical research, the levels of 0.05 (5%) and 0.01 (1%)
are most commonly used.
– When H0 is stated, the alternative hypothesis H1 is also stated. The procedure
is then to reject H0 in favor of H1 if a statistical test yields a probability
associated with H0 which is less than or equal to α.
NB: To reject a hypothesis is to declare it false. To accept a hypothesis is to conclude
that there is insufficient evidence to reject it; acceptance does not necessarily mean
that the hypothesis is true.
9.2 Types of Errors in Hypothesis Testing
There are two types of error which could be made when a decision about H0 is made.
The 1st is the Type I error, which is to reject H0 when in fact it is true; its probability
is given by α.
The Type II error is to accept H0 when it is actually false; its probability is given by β.
It therefore follows that the higher the value of α, the higher the chance of falsely
rejecting H0 when it is actually true.
Decision         Accept H0                        Reject H0
H0 is true       Correct decision                 Wrong decision (Type I error)
H0 is false      Wrong decision (Type II error)   Correct decision
Decision on H0
When the probability of obtaining the observed values, or even more extreme results,
under H0 has been computed, one is in a position to make a decision on H0:
If p ≥ α, accept H0
If p < α, reject H0 in favor of H1
(Figure: acceptance and rejection regions on the normal curve)
P-value is the probability of obtaining our results or something more
extreme, if the null hypothesis is true
The null hypothesis relates to the population of interest rather than the sample.
Therefore the null hypothesis is either true or false, and we cannot interpret the p-
value as the probability that the null hypothesis is true.
Using the P-value
We must make a decision about how much evidence we require to enable us to decide
to reject the null hypothesis in favor of the alternative. The smaller the p-value, the
greater the evidence against the null hypothesis.
Conventionally, we consider that if the p-value is less than 0.05, there is sufficient
evidence to reject H0, as there is only a small chance of the results occurring if the
null hypothesis were true. We then reject the null hypothesis and we say that the
results are significant at the 0.05 (5%) level.
In contrast, if the p-value is equal to or greater than 0.05, we usually conclude that
there is insufficient evidence to reject the null hypothesis. In such a case we do not
reject the null hypothesis, and we say that the results are not significant at the 0.05
(5%) level.
9.3 Normal Distribution and Significance Testing
A test of significance is a rule or procedure by which sample results are used to decide
whether to accept or reject a null hypothesis. The use of confidence intervals (CIs) can
achieve 2 major things. First, it can enable us to make statistical statements about the
distribution of a variable in a population. Second, a decision can also be made
regarding H0.
Alternatively, one may compute the number of standard deviations that the
hypothesized value (stated or implied in H0) is away from the observed value.
The test of significance used depends on the sample size. It has generally been
agreed that a sample size that exceeds 30 should be regarded as a large sample.
Therefore, to test a hypothesis, one may use the z-score test or the t-score test
depending on the sample size.
z = (x - µ)/σ (for a single observation) or
z = (x̄ - µ)/(s/√n) (for a sample mean)
9.4 Procedure in Hypothesis testing:
Testing of a hypothesis can be done using 2 main approaches: using CI
determination, or using the normal deviates (z-score or t-score).
Using CI:
1. State the null hypothesis that there is no difference between the sample mean
and the population mean
2. Find the standard error of the mean using the formula s/√n
3. Compute the limits within which the population mean should fall
4. Find out whether the population mean does lie within those limits or not. If the
population mean lies within those limits then H0 is accepted; otherwise it is
rejected (a worked illustration follows below)
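A worked illustration (using the height data from the z-score example further below: a sample of 100 men with x̄ = 67.1 in., s = 2.1 in., and a hypothesized population mean of 68 in.):
SE = s/√n = 2.1/√100 = 0.21
95% limits: x̄ ± 1.96 × SE = 67.1 ± 1.96 × 0.21 = 66.69 to 67.51
Since the hypothesized mean of 68 in. lies outside these limits, H0 would be rejected at the 5% level.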
Alternative approach, using the z-score:
1. You need to calculate the z-value then refer to the normal table to see where
the value falls
2. Then you make a decision, either to reject or to accept the H0
For example, in a sample of 100 adult men the mean height is found to be 67.1 in.
with a standard deviation of 2.1 in. A man is picked at random from the group of 100
men and is found to be 75 in. tall. Should we consider him exceptionally tall for a
group or population with a mean height of 68 in.?
State H0, which says that the man's height is compatible with a population with an
average height of 68 in.:
H0: x = µ
H1: x ≠ µ
z = (x - µ)/σ
= (75 - 68)/2.1
= 3.3
This means that the man is 3.3 normal deviates above the average height of this
group. This value is referred to the table of the normal distribution.
The value of p corresponding to z = 3.3 is approximately 0.0005. Since this p is less
than 0.05, which was our level below which we reject H0, we therefore reject our H0
of no difference. The man is therefore exceptionally tall for the population.
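This calculation can be reproduced in Python; the sketch below is illustrative, with the one-sided p-value obtained from the standard normal CDF via erf:

from math import erf, sqrt

def z_test(x, mu, sigma):
    # z-score of the observation and the upper-tail probability P(Z > z)
    z = (x - mu) / sigma
    p = 1 - 0.5 * (1 + erf(z / sqrt(2)))
    return z, p

z, p = z_test(75, 68, 2.1)
print(round(z, 2), round(p, 4))  # 3.33, 0.0004 -- reject H0 at the 5% level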