Unit-I Probability

FUNDAMENTALS OF HEALTHCARE ANALYTICS REGULATION 2021

Uploaded by

suhagaja
Copyright: © All Rights Reserved

1. Computers and Biostatistical Analysis
2. Biostatistics
3. Measures of Location
4. Measures of Spread
5. Experiments and Events
6. Probability
7. Conditional Probability
8. Bayes' Theorem
9. Likelihood & Odds
10. Distribution Variability

• The widespread use of computers has had a tremendous impact on health
sciences research in general and on biostatistical analysis in particular.
• The need to perform long and tedious arithmetic computations as part of the
statistical analysis of data lives on only in the memory of researchers who
worked before computers were widely available.
• Computers can perform more calculations faster and far more accurately than
human technicians can. This leaves more time for improving the quality of raw
data and for interpreting the results.
• The current prevalence of microcomputers and the abundance of available
statistical software programs have further revolutionized statistical computing.
• Computers currently on the market are equipped with random number generating
capabilities. As an alternative to using printed tables of random numbers,
investigators may use computers to generate the random numbers they need.
• Actually, the “random” numbers generated by most computers are pseudorandom
numbers, because they are produced by a deterministic formula.
• The usefulness of the computer in the health sciences is not limited to
statistical analysis.
• Current developments in the use of computers in biology, medicine, and related
fields are reported in several periodicals devoted to the subject.
• A few such periodicals are Computers in Biology and Medicine, Computers and
Biomedical Research, Computer Methods and Programs in Biomedicine,
Computer Applications in the Biosciences, and Computers in Nursing.
• The MINITAB, SPSS, R, and SAS® statistical software packages for the
personal computer have been used for biostatistical analysis.
• Statistics is the science whereby inferences are made about specific
random phenomena on the basis of relatively limited sample
material.
• The field of statistics has two main areas:
➢ Mathematical statistics - the development of new methods of
statistical inference, which requires detailed knowledge of abstract
mathematics for its implementation.
➢ Applied statistics - the application of the methods of mathematical
statistics to specific subject areas, such as economics, psychology, and
public health.
➢ Biostatistics is the branch of applied statistics that applies statistical
methods to medical and biological problems. In some instances, the
standard methods do not apply to a given biostatistical application
and must be modified. In this circumstance, biostatisticians are
involved in developing new methods.
[Table: Mean blood pressures and differences between machine and human
readings at four locations]
Example: Cancer, Nutrition. Some investigators have proposed that
consumption of vitamin A prevents cancer. To test this, data are collected on
vitamin-A consumption among 200 hospitalized cancer patients (cases) and
200 controls. The controls would be matched with regard to age and sex
with the cancer cases and would be in the hospital at the same time for an
unrelated disease.
In Figure 2.1, the bar graphs show that the controls consume more vitamin A
than the cases do, particularly at consumption levels exceeding the
Recommended Daily Allowance (RDA).
Mean = 3166.9 g

There is no mode for the infant birth weights, because all the values occur
exactly once.
The range is the difference between the largest and smallest observations in a
sample.
E.g., the range in the sample of birth weights is 4146 − 2069 = 2077 g.
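The mean, mode, and range described above can be sketched in a few lines of Python. The six weights below are hypothetical values chosen for illustration; they are not the birth weight sample used in these notes, which is not reproduced here.

```python
from statistics import mean, multimode

# Hypothetical birth weights in grams (illustrative only).
weights = [2069, 2581, 2759, 3245, 3609, 4146]

sample_mean = mean(weights)
sample_range = max(weights) - min(weights)   # largest minus smallest

# multimode returns every most-frequent value. When all values occur
# exactly once it returns all of them, i.e. there is no distinct mode.
modes = multimode(weights)
has_mode = len(modes) < len(set(weights))

print(sample_mean, sample_range, has_mode)
```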

The pth percentile (a quantile) of a sample of n points is defined as follows,
with the sample points ordered from smallest to largest:
• The (k + 1)th sample point if np/100 is not an integer (where k is the
largest integer less than np/100).
• The average of the (np/100)th and (np/100 + 1)th observations if
np/100 is an integer.
E.g., compute the 10th and 90th percentiles for the birth weight data.

Because 20 × 0.1 = 2 and 20 × 0.9 = 18 are both integers, each percentile is
the average of two adjacent ordered values.

10th percentile:
average of the second and third ordered values = (2581 + 2759)/2 = 2670 g
90th percentile:
average of the 18th and 19th ordered values = (3609 + 3649)/2 = 3629 g

We would estimate that 80% of birth weights will fall between 2670 g and
3629 g, which gives an overall impression of the spread of the distribution.
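The percentile rule above can be sketched as a small Python function. This is a minimal sketch that assumes p is a whole number strictly between 0 and 100 and that positions are counted from the smallest value. The 20 data values are hypothetical, since the birth weight sample itself is not listed in these notes.

```python
def percentile(sample, p):
    """pth percentile by the rule above (p an integer with 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    xs = sorted(sample)          # order from smallest to largest
    n = len(xs)
    k = (n * p) // 100           # integer part of np/100
    if (n * p) % 100 == 0:
        # np/100 is an integer: average the k-th and (k + 1)-th values
        return (xs[k - 1] + xs[k]) / 2
    # np/100 is not an integer: take the (k + 1)-th value
    return xs[k]

# Hypothetical sample of n = 20 weights in grams: 2000, 2100, ..., 3900.
weights = list(range(2000, 4000, 100))

p10 = percentile(weights, 10)    # average of the 2nd and 3rd values
p90 = percentile(weights, 90)    # average of the 18th and 19th values
p33 = percentile(weights, 33)    # np/100 = 6.6, so the 7th value
print(p10, p90, p33)
```

Other textbooks and software packages use slightly different interpolation rules, so results from library percentile functions may differ from this definition.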
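The coefficient of variation (CV) expresses the standard deviation as a
percentage of the mean: CV = 100% × (s / mean) = 100 × (445.3/3166.9) = 14.06%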
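The CV arithmetic can be checked with a short Python sketch. It assumes that 445.3 g is the sample standard deviation and 3166.9 g the sample mean of the birth weights. The function name is only illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Coefficient of variation as a percentage: 100 * sd / mean."""
    return 100 * sd / mean

# Assumed inputs: sd = 445.3 g, mean = 3166.9 g.
cv = coefficient_of_variation(445.3, 3166.9)
print(f"{cv:.2f}%")
```

Because the CV is a ratio of two quantities in the same units, it is unit-free, which makes it convenient for comparing the spread of variables measured on different scales.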
The field of “probability theory” is a branch of mathematics that is
concerned with describing the likelihood (chance) of different
outcomes from uncertain processes.

A simple experiment is some action that leads to the occurrence of a
single outcome s from a set of possible outcomes S.
• The single outcome s is referred to as a sample point
• The set of possible outcomes S is referred to as the sample space
Example. Suppose that you flip a coin n ≥ 2 times and record the
number of times you observe a “heads”. The sample space is
S = {0, 1, . . . , n }, where s = 0 corresponds to observing no heads and
s = n corresponds to observing only heads.

Example. Suppose that you pick a card at random from a standard
deck of 52 playing cards. The sample points are the individual cards in
the deck (e.g., the Queen of Spades is one possible sample point), and the
sample space is the collection of all 52 cards.

Example. Suppose that you roll two standard (six-sided) dice and sum
the obtained numbers. The sample space is S = {2, 3, . . . , 11, 12},
where s = 2 corresponds to rolling “snake eyes” (i.e., two 1’s) and
s = 12 corresponds to rolling “boxcars” (i.e., two 6’s).
An event A refers to any subset of the sample space S, i.e.,
A ⊆ S, and an elementary event is an event that contains a single sample
point s.

For the coin flipping example, we could define the events
• A = {0} (we observe no heads)
• B = {1, 2} (we observe 1 or 2 heads)
• C = {c | c is an even number} (we observe an even number of
heads)

Here event A is an elementary event.


A sure event is an event that always occurs, and an impossible event
(or null event) is an event that never occurs.

Example. For the playing card example,
E = {e | e is a Club, Diamond, Heart, or Spade} is a sure event and
I = {Joker} is an impossible event.
Two events A and B are said to be mutually exclusive if A ∩ B = ∅,
i.e., if one event occurs, then the other event cannot occur. Two events
A and B are said to be exhaustive if A ∪ B = S, i.e., if at least one of the
two events must occur.

Example. For the coin flipping example, the two events A = {0} and
B = {n} are mutually exclusive events, whereas
A = {a | a is an even number between 0 and n} and
B = {b | b is an odd number between 1 and n} are exhaustive events.

Note that 0 is an even number, so the outcome s = 0 belongs to A.
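Events are subsets of the sample space, so the two definitions above can be checked directly with Python sets. The sketch below assumes n = 4 flips purely for illustration, and the helper names are hypothetical.

```python
n = 4                                  # hypothetical number of coin flips
S = set(range(n + 1))                  # sample space {0, 1, ..., n}

def mutually_exclusive(a, b):
    """True when A ∩ B is empty."""
    return a.isdisjoint(b)

def exhaustive(a, b, space):
    """True when A ∪ B equals the sample space."""
    return (a | b) == space

no_heads, all_heads = {0}, {n}
evens = {s for s in S if s % 2 == 0}   # 0 counts as even
odds = {s for s in S if s % 2 == 1}

print(mutually_exclusive(no_heads, all_heads))   # no overlap
print(exhaustive(no_heads, all_heads, S))        # does not cover S
print(exhaustive(evens, odds, S))                # covers S
```

In this example the even and odd events are both exhaustive and mutually exclusive, so together they partition the sample space.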


A probability is a real number (between 0 and 1) that we assign to
events in a sample space to represent their likelihood of occurrence.

The notation P(A) denotes the probability of the event A ⊆ S.

The three probability axioms:
1. P(A) ≥ 0 (non-negativity)
2. P(S) = 1 (unit measure)
3. P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅ (additivity)
A probability distribution F(·) is a mathematical function that assigns
probabilities to outcomes of a simple experiment.

Note that a probability distribution is a function from the sample
space S to the interval [0, 1], which can be denoted as F : S → [0, 1].

Since F : S → [0, 1], we have that F(s) ≥ 0 and F(s) ≤ 1 for any s ∈ S.
Consider the dice rolling example where we sum the numbers of dots
on two rolled dice. The sample space is S = {2, 3, . . . , 11, 12}.

Assume that the dice are fair, i.e., equal chance of observing each
outcome {1, . . . , 6} on a single roll, and that the two rolls are
independent, i.e., unrelated to one another.

Although there are only 11 elements in the sample space, i.e., |S| = 11,
there are a total of 6² = 36 possible sequences that we could observe
when rolling two dice.
Then the probability of each elementary event is as follows:

Sequences                                          Sum s   P({s})
(1, 1)                                               2      1/36
(1, 2), (2, 1)                                       3      2/36
(1, 3), (2, 2), (3, 1)                               4      3/36
(1, 4), (2, 3), (3, 2), (4, 1)                       5      4/36
(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)               6      5/36
(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)       7      6/36
(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)               8      5/36
(3, 6), (4, 5), (5, 4), (6, 3)                       9      4/36
(4, 6), (5, 5), (6, 4)                              10      3/36
(5, 6), (6, 5)                                      11      2/36
(6, 6)                                              12      1/36
Bayes’ theorem states that
P(A | B) = P(B | A) P(A) / P(B) and P(B | A) = P(A | B) P(B) / P(A)
➢ The sensitivity of a symptom is the probability that the symptom is present
given that the person has a disease.
➢ The specificity of a symptom is the probability that the symptom is not
present given that the person does not have a disease.
➢ A false negative is defined as a negative test result when the disease or
condition being tested for is actually present.
➢ A false positive is defined as a positive test result when the disease or
condition being tested for is not actually present.
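Bayes' theorem combines sensitivity, specificity, and the prior probability of disease (prevalence) to give the probability of disease when the symptom is present. The sketch below is a minimal illustration: the function name is hypothetical, the numbers 0.90, 0.95, and 0.01 are made-up values that do not come from these notes, and P(symptom) is expanded using the law of total probability.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | symptom present), computed with Bayes' theorem."""
    # P(symptom and disease) = P(symptom | disease) * P(disease)
    true_pos = sensitivity * prevalence
    # P(symptom and no disease) = P(symptom | no disease) * P(no disease)
    false_pos = (1 - specificity) * (1 - prevalence)
    # The denominator is P(symptom) by the law of total probability.
    return true_pos / (true_pos + false_pos)

# Hypothetical values: sensitivity 0.90, specificity 0.95, prevalence 0.01.
ppv = positive_predictive_value(0.90, 0.95, 0.01)
print(round(ppv, 4))
```

With a rare disease, most people who show the symptom are false positives even when sensitivity and specificity are both high, which is why the prevalence matters as much as the test characteristics.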
What are the probabilities Pr(Bi | A) of the three disease states given the previous symptoms?

Although the unconditional probability of sarcoidosis is very low (.009), the conditional probability of the
disease given these symptoms and this age-sex-smoking group is .811. Also, although the symptoms and
diagnostic tests are consistent with both lung cancer and sarcoidosis, the latter is much more likely among
patients in this age-sex-smoking group.
No. of programs (x):   1       2       3       4       5       6       7       8
Frequency:             62      47      39      39      58      37      4       11
P(X = x):              .2088   .1582   .1313   .1313   .1953   .1246   .0135   .0370
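The probabilities in the table are relative frequencies: each frequency divided by the total count. A minimal Python sketch of that calculation, using the frequencies listed above, is:

```python
programs = [1, 2, 3, 4, 5, 6, 7, 8]
frequency = [62, 47, 39, 39, 58, 37, 4, 11]

total = sum(frequency)   # total number of observations

# P(X = x) is estimated by the relative frequency of x.
prob = {x: f / total for x, f in zip(programs, frequency)}

for x, p in prob.items():
    print(x, f"{p:.4f}")
```

The relative frequencies are non-negative and sum to 1, so they form a valid probability distribution for X.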
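p = 0.858; q = 1 − p = 1 − 0.858 = 0.142
The answer to the question is P(X = x) = nCx p^x q^(n−x) = 5C3 (0.858)^3 (0.142)^2
= 10 × (0.142)^2 × (0.858)^3 ≈ 0.1274 (0.1276 if the powers are first rounded to four decimal places)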
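The binomial calculation can be checked in Python. This sketch assumes n = 5 trials and x = 3 successes, values inferred from the coefficient 10 and the exponents in the expression above rather than stated in the notes. The function name is only illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Assumed from the worked expression: n = 5, x = 3, p = 0.858.
prob = binomial_pmf(3, 5, 0.858)
print(round(prob, 4))
```

The unrounded result is about 0.1274. The quoted 0.1276 arises when (0.142)^2 and (0.858)^3 are rounded to four decimal places before multiplying.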
Characteristics of the Normal Distribution
1. It is symmetrical about its mean, μ: the curve on either side of μ is a
mirror image of the other side.
2. The mean, the median, and the mode are all equal.
3. The total area under the curve above the x-axis is one square unit.
4. The normal distribution is completely determined by the parameters μ
and σ.
The Uptimer is a custom-made lightweight battery-operated activity monitor
that records the amount of time an individual spends in the upright position.
In a study of children ages 8 to 15 years, 529 normally
developing children each wore the Uptimer continuously for a 24-hour
period that included a typical school day. The researchers found that the
amount of time children spent in the upright position followed a normal
distribution with a mean of 5.4 hours and standard deviation of 1.3 hours.
Assume that this finding applies to all children 8 to 15 years of age. Find the
probability that a child selected at random spends less than 3 hours in the
upright position in a 24-hour period.
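One way to answer the question is to standardize 3 hours to a z-score and evaluate the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library with the stated mean of 5.4 hours and standard deviation of 1.3 hours. The variable names are illustrative.

```python
from statistics import NormalDist

# Upright time in hours, modeled as Normal(mean = 5.4, sd = 1.3).
upright = NormalDist(mu=5.4, sigma=1.3)

# z-score of 3 hours: how many standard deviations below the mean it lies.
z = (3 - 5.4) / 1.3

# P(X < 3) is the area under the normal curve to the left of 3.
p_less_than_3 = upright.cdf(3)

print(round(z, 2), round(p_less_than_3, 4))
```

The z-score is about −1.85, so the probability is roughly 0.03: only about 3% of children would be expected to spend less than 3 hours upright. A printed normal table read at z = −1.85 gives essentially the same value.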
