
02-03-2023 - 23 - Probabilty

Uploaded by

Sweta Singh

Biostatistics

Final Year B Pharm. Sem VIII


Unit 2: Probability
Dr. Tina Saldanha
UNIT I:
● Introduction: Statistics, Biostatistics, Frequency distribution
● Measures of central tendency:
● Mean, Median, Mode- Pharmaceutical examples
● Measures of dispersion:
● Dispersion, Range, standard deviation, Pharmaceutical problems
● Correlation: Definition, Karl Pearson's coefficient of correlation, Multiple correlation - Pharmaceutical examples
UNIT II:
● Regression: Curve fitting by the method of least squares, fitting the lines y = a + bx and x = a + by, Multiple regression, standard error of regression - Pharmaceutical examples
● Probability: Definition of probability, Binomial distribution, Normal distribution, Poisson's distribution, properties - problems, Sample, Population, large sample, small sample, Null hypothesis, alternative hypothesis, sampling, essence of sampling, types of sampling, Type I error, Type II error, Standard error of mean (SEM) - Pharmaceutical examples
● Parametric test: t-test (Sample, Pooled or Unpaired and Paired), ANOVA (One way and Two way), Least Significant difference
Probability
● The earliest mathematical analysis of the theory of probability dates to the 18th century
● Abraham De Moivre, a French mathematician, discovered that a mathematical relationship explained the probability associated with various games of chance
● He developed the equation and the graphic pattern that describes it
What is Probability
https://www.khanacademy.org/math/statistics-probability/probability-library/basic-theoretical-probability/a/probability-the-basics

● Probability is the likelihood of an event occurring

● Whenever we're unsure about the outcome of an event, we can talk about the
probabilities of certain outcomes—how likely they are.
● The best example for understanding probability is flipping a coin:
● There are two possible outcomes—heads or tails.
● What's the probability of the coin landing on heads? We can find out using the equation: probability of an event = (number of ways it can happen) / (total number of outcomes)
● In this case, one of the two possible outcomes is heads, so P(H) = 1/2, or 50%
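The equation above can be tried out in a few lines of Python. This is only a minimal sketch: the function name `probability` and its arguments are made up for illustration, and it assumes every outcome is equally likely (a fair coin, a fair die).

```python
from fractions import Fraction


def probability(ways_it_can_happen: int, total_outcomes: int) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    if total_outcomes <= 0 or not 0 <= ways_it_can_happen <= total_outcomes:
        raise ValueError("need 0 <= ways_it_can_happen <= total_outcomes, total_outcomes > 0")
    return Fraction(ways_it_can_happen, total_outcomes)


# Fair coin: 1 way to get heads out of 2 possible outcomes
p_heads = probability(1, 2)

# Fair die: 1 way to get a four out of 6 possible outcomes
p_four = probability(1, 6)

print(p_heads, float(p_heads))  # 1/2 0.5
print(p_four)                   # 1/6
```

Using `Fraction` keeps the answer exact (1/2 rather than 0.5), which matches the way the slide writes P(H).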
Some aspects about Probability…..

● The probability of an event can only be between 0 and 1 and can also be written
as a percentage.

● The probability of event A is often written as P(A)


● If P(A) > P(B), then event A has a higher chance of occurring than event B
● If P(A) = P(B), then events A and B are equally likely to occur

For those of you who like brain teasers/television games of luck, click on this problem from
Khan Academy https://youtu.be/Xp6V_lO1ZKA
Probability distributions

● In statistics, data is generally collected, classified and tabulated into a form of frequency distribution
● The values of the variables are distributed according to a definite law or pattern which is usually expressed in a mathematical form
● These are called probability distributions
Biostatistics by Sai Subramanian (Career Publications)
Distributions
● Discrete: Binomial, Multinomial, Poisson's
● Continuous: Normal, Z, t, χ², F
Let's first understand different types of data…..
Types of data
● Qualitative or Categorical Data: Nominal data, Ordinal data
● Quantitative or Numerical Data: Discrete data, Continuous data
Types of Data: Categorical (Qualitative)
● Categorical data is a data type that is not quantitative, i.e. it does not have a numerical value
● Nominal data is defined as data that is used for naming or labelling variables, without any quantitative value
● Nominal data collection techniques are mainly question-based
● It is collected via questions that require the respondent either to give an open-ended answer or to choose from a given list of options
● Which country are you from? What is your gender?

*Eg. Data obtained from Survey forms/google forms/MCQ


Types of Data: Categorical (Qualitative)

● Ordinal data is a type of categorical data with an order


● The variables in ordinal data are listed in an ordered manner….. usually
numbered, so as to indicate the order of the list
● However, the numbers are not mathematically measured or determined but
are merely assigned as labels for opinions.
Example

1] How was the webinar? (Nominal data)

_______

2] How was the webinar? (Ordinal data)

1. Very Good
2. Good
3. Neutral
4. Bad
5. Very Bad
What are the Statistical Tests used for
Categorical data analysis?

● Both ordinal and nominal data are evaluated using nonparametric statistics
● The mean and standard deviation cannot be evaluated for these data
types

Tests for nominal data:
➢ Chi-Square test
➢ McNemar test
➢ Cochran's Q test
➢ Fisher's Exact test

Tests for ordinal data:
➢ Wilcoxon signed-rank test
➢ Friedman 2-way ANOVA
➢ Wilcoxon rank-sum test
➢ Kruskal-Wallis 1-way test
Types of Data: Numerical (Quantitative)

● Numeric variables have values that describe a measurable quantity as a number, like 'how many' or 'how much'
● Therefore numeric variables are quantitative variables
● A continuous variable: Observations can take any value within a certain range of real numbers
● The value given to an observation for a continuous variable can include values as small as the instrument of measurement allows
● Examples: height, time, age, and temperature.
Types of Data: Numerical (Quantitative)

● A discrete variable: Values are distinct and separate
● This data can only take on certain values; it can't be measured, but it can be counted
● A discrete variable cannot take a fractional value between one value and the next closest value
● Examples: the number of heads in 100 coin flips, the number of business locations, the number of children in a family, all of which are measured in whole units (i.e. 1, 2, 3 children)
What are the Statistical Tests used for
Numerical data analysis?

● Parametric tests are those that make assumptions about the parameters of
the population distribution from which the sample is drawn
● Eg. Assumption that the population data are normally distributed
Tests for Numerical data:

➢ Unpaired t test
➢ Paired t test
➢ One way Analysis of Variance (ANOVA)
➢ The above followed by various post tests, e.g. Dunnett's, Bonferroni, Tukey etc.
Ok back to our Probability Distributions…..
Biostatistics by Sai Subramanian (Career Publications)
Distributions
● Discrete: Binomial, Multinomial, Poisson's
● Continuous: Normal, Z, t, χ², F
Binomial Distribution
"Bi" means "two" ... so this is about things with two results
https://www.mathsisfun.com/

Ex: Tossing a Coin:

● Did we get Heads (H) or


● Tails (T)

We say the probability of the coin landing H is ½


And the probability of the coin landing T is ½
Binomial Distribution
"Bi" means "two" ... so this is about things with two results
https://www.mathsisfun.com/

Ex: Throwing a Die:

● Did we get a four ... ?


● ... or not?

We say the probability of a four is 1/6 (one of the six faces is a four)

And the probability of not four is 5/6 (five of the six faces are not a four)
Note that a die has 6 sides but here we look at only two cases: "four: yes" or
"four: no"
Formulas for Binomial distribution
https://www.mathsisfun.com/

$$P(k \text{ out of } n) = \frac{n!}{k!\,(n-k)!}\; p^{k} (1-p)^{n-k}$$

What is "!"?
It is called factorial
n! for n = 5 is 5 x 4 x 3 x 2 x 1
● p is the probability of each choice we want
● k is the number of choices we want
● n is the total number of choices
Ex.: You are helping out at the canteen during AISSMS sports meet to sell
various snacks. 70% of people choose samosas, the rest choose something
else. What is the probability of selling 2 samosas to the next 3 customers?

Soln: Here, let p be the probability of selling a samosa = 0.7 (converted from %)

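We want to sell 2 samosas (k = 2) to the next 3 customers (n = 3, the total number of choices)

$$P = \frac{3!}{2!\,(3-2)!} \times 0.7^{2} \times (1-0.7)^{3-2}$$

$$P = \frac{3 \times 2 \times 1}{2 \times 1 \times 1} \times 0.7 \times 0.7 \times 0.3$$

P = 3 x 0.147 = 0.441, i.e. a 44.1% probability that exactly 2 of our next 3 customers will choose samosas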
Ex. for practice: A and B play a game in which their chance of winning is in the
ratio 2:3
Find A’s chance of winning exactly 3 games out of 5

Let p be the probability that A wins a game = 2/5 = 0.4
n = total games played = 5
We want the probability of k = 3

$$P(3 \text{ out of } 5) = \frac{5!}{3!\,2!} \times 0.4^{3} \times 0.6^{2} = 10 \times 0.02304 = 0.2304 \approx 0.23$$
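Both binomial examples above (the samosa problem and the A-versus-B game) can be checked with a short Python sketch. The function name `binomial_pmf` is chosen here for illustration only; `math.comb(n, k)` computes n! / (k! (n-k)!) directly.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Samosa example: p = 0.7, exactly 2 samosas sold to the next 3 customers
p_samosa = binomial_pmf(2, 3, 0.7)

# Practice example: p = 0.4, A wins exactly 3 games out of 5
p_game = binomial_pmf(3, 5, 0.4)

print(round(p_samosa, 3))  # 0.441
print(round(p_game, 4))    # 0.2304
```

Adding up `binomial_pmf(k, 3, 0.7)` for k = 0, 1, 2, 3 gives 1, which is a quick way to confirm that all possible outcomes have been counted.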


Points to remember for binomial distribution

● The trials are independent,


● There are only two possible outcomes at each trial*
● The probability of "success" at each trial is constant

* (refer to the die example above for clarification)


Poisson’s Distribution

Poisson's distribution is a limiting case of the binomial distribution under the following conditions:

● The number of trials n is very large (n→∞)
● The probability of success in one trial is indefinitely small (p→0), while the average number of successes np stays finite

A Poisson distribution is a tool that helps to predict the probability of certain events
happening when you know how often the event has occurred. It gives us the
probability of a given number of events happening in a fixed interval of time.
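The limiting behaviour can be seen numerically. The sketch below is an illustration rather than part of the original slides: it keeps the average number of events fixed at an assumed value of 2, lets n grow while p = 2/n shrinks, and compares the binomial probability with the standard Poisson formula e^(-μ) μ^x / x!. The function names are made up for this example.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(x: int, mu: float) -> float:
    """Exactly x events when the expected number of events is mu."""
    return exp(-mu) * mu**x / factorial(x)


mu = 2.0  # assumed average number of events (n * p is held at this value)
x = 3     # number of events we ask about

errors = []
for n in (10, 100, 10_000):
    p = mu / n  # p gets smaller as n gets larger
    b = binomial_pmf(x, n, p)
    errors.append(abs(b - poisson_pmf(x, mu)))
    print(f"n={n:>6}  p={p:.4f}  binomial={b:.5f}  poisson={poisson_pmf(x, mu):.5f}")
```

As n increases the two columns move closer together, which is what "limiting case" means in practice.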
Practical Uses of the Poisson Distribution
https://www.statisticshowto.com

● Poisson distributions are used by businessmen to make forecasts about the number of customers or sales on certain days or seasons of the year
● In business, overstocking will sometimes mean losses if the goods are not sold.
● Likewise, having too little stock would still mean a lost business opportunity because you were not able to maximize your sales due to a shortage of stock.
● By using this tool, businessmen are able to estimate the times when demand is unusually high, so they can purchase more stock.
Practical Uses of the Poisson Distribution
https://www.statisticshowto.com

● Hotels and restaurants could prepare for an influx of customers: they could hire extra temporary workers in advance, purchase more supplies, or make contingency plans
● With the Poisson distribution, companies can adjust supply to demand in order
to keep their business earning good profit.
● In addition, waste of resources is prevented.
Poisson’s Distribution

The Poisson distribution is:

$$P(x; \mu) = \frac{e^{-\mu}\, \mu^{x}}{x!}$$

Where:

● The symbol “!” is a factorial.


● μ (the expected number of occurrences) is sometimes written as λ.
Sometimes called the event rate or rate parameter.
● e = 2.71828 (e is Euler’s number, a constant)
Ex. Poisson Distribution: The average number of major storms in your city is 2
per year. What is the probability that exactly 3 storms will hit your city next
year?

https://www.statisticshowto.com

Average number of storms per year, historically= μ = 2

Number of storms we think might hit next year= x = 3 ; e = 2.71828

$$P(x; \mu) = \frac{e^{-\mu}\, \mu^{x}}{x!}$$

Therefore,

$$P(3; 2) = \frac{2.71828^{-2} \times 2^{3}}{3 \times 2 \times 1} = \frac{0.13534 \times 8}{6} = 0.180$$

The probability of 3 storms happening next year is 0.180, or 18%

Calculating the Poisson distribution for a simple set of data can be done manually

However, the usual way to calculate a Poisson distribution in real life situations is with
software like IBM SPSS.
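For a quick check without a statistics package, the storm example can also be computed in plain Python. This is a minimal sketch using only the standard library; the function name `poisson_pmf` is an illustrative choice, not something from the slides.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Probability of exactly x events when the average number of events is mu."""
    return exp(-mu) * mu**x / factorial(x)


mu = 2  # average number of major storms per year
p_three_storms = poisson_pmf(3, mu)
print(f"{p_three_storms:.3f}")  # 0.180

# Probabilities for 0 to 5 storms in the same year, for comparison
for x in range(6):
    print(x, f"{poisson_pmf(x, mu):.3f}")
```

The loop at the end shows how the probability is spread over different storm counts, with the largest values near the average of 2.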
Poisson distribution vs. Binomial
https://www.statisticshowto.com

When to use Poisson and when to use binomial

● If your question gives an average rate of an event happening per unit (i.e. per unit of time, cycle, event) and you want to find the probability of a certain number of events happening in a period of time (or number of events), then use the Poisson Distribution.
● If you are given an exact probability and you want to find the probability of the event happening a certain number of times out of n (i.e. 10 times out of 100, or 99 times out of 1000), use the Binomial Distribution formula.
Normal Distribution

● It is a mathematical model used by researchers to represent the data collected by them
● It is based on the law of probability or chance of certain events occurring
● If a set of observations conforms to this mathematical model, then it can be represented by a bell-shaped curve
● A bell curve has a small percentage of the points on both tails and the bigger percentage on the inner part of the curve
Properties of a normal distribution

● The mean, mode and median are all equal.

● The curve is symmetric about the center (i.e. around the mean, μ).
● Exactly half of the values are to the left of center and exactly half the
values are to the right.
● The total area under the curve is 1.
Normal Distribution: Bell shaped curve

● About 68% of the data falls within one standard deviation of the mean.
● About 95% of the data falls within two standard deviations of the mean.
● About 99.7% of the data falls within three standard deviations of the mean.

μ = population mean; σ = population std deviation
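The three percentages above can be reproduced with Python's standard library. This sketch relies on a standard result that the slides do not state: for a normal distribution, the fraction of values within k standard deviations of the mean equals erf(k / √2). The function name `fraction_within` is illustrative.

```python
from math import erf, sqrt


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, f"{fraction_within(k):.4f}")
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The result does not depend on the particular mean or standard deviation, which is why the same 68%, 95% and 99.7% figures apply to every normal distribution.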


Normal Distribution: Bell shaped curve

● The standard deviation controls the spread of the distribution
● A smaller standard deviation indicates that the data is tightly clustered around the mean; the normal distribution will be taller.
● A larger standard deviation indicates that the data is spread out around the mean; the normal distribution will be flatter and wider.

μ = population mean; σ = population std deviation
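Why a smaller standard deviation gives a taller curve can be read from the standard formula for the normal curve. The slides do not write this formula out, so it is added here only as a reference, using the same symbols μ and σ:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right),
\qquad
f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}
```

The height of the peak at x = μ is 1/(σ√(2π)), so halving σ doubles the height. Since the total area under the curve must stay equal to 1, a taller curve is necessarily narrower.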
Ex.

● IQ is assumed to be normally
distributed
● The Wechsler Intelligence scale for
children-revised (WISC-R) has a
mean of 100 and a Std deviation of
15
● Thus 68.2% of the children have a
WISC-R score in the range of ? - ?
Solution:

68.2% of the area under the normal distribution curve falls within one std deviation of the mean, i.e. mean ± std dev

Here the mean is 100 and the std deviation is 15

Thus 68.2% of the children have a WISC-R score in the range of 100 - 15 to 100 + 15, i.e. 85 to 115
Practical Applications of the Standard
Normal Model

● The standard normal distribution could help you figure out which subject you are
getting good grades in and which subjects you have to exert more effort into due
to low scoring percentages

● Once you get a score in one subject that is higher than your score in another
subject, you might think that you are better in the subject where you got the higher
score.
● This is not always true.
● You can only say that you are better in a particular subject if your score there lies a greater number of standard deviations above the mean
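A small Python sketch makes the point concrete. All the marks, class means and standard deviations below are hypothetical numbers invented for illustration; the z-score formula itself, (score - mean) / std dev, is the usual one for the standard normal model.

```python
def z_score(score: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / std_dev


# Hypothetical marks: the raw score is higher in chemistry...
z_pharmacology = z_score(score=70, mean=60, std_dev=5)
z_chemistry = z_score(score=80, mean=75, std_dev=10)

print(z_pharmacology)  # 2.0
print(z_chemistry)     # 0.5

# ...but relative to the class, the pharmacology result is the stronger one
better_subject = "pharmacology" if z_pharmacology > z_chemistry else "chemistry"
print(better_subject)  # pharmacology
```

Here the student scored 80 in chemistry and only 70 in pharmacology, yet the pharmacology score is 2 standard deviations above the class mean while the chemistry score is only 0.5 above it.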
Another type of normal distribution problem
Ex. Diameters of cylindrical syringes are normally distributed with mean 0.498 cm and SD of 0.002 cm. If the tolerance limit is given by 0.5 ± 0.004 cm, find out how many syringes will be rejected from 10,000 sent for testing?
Soln. Here the mean is 0.498 and the std dev is 0.002
The tolerance limits set by QC are 0.500 ± 0.004; this indicates the lower limit is 0.496 and the upper limit is 0.504.
Let us calculate the z value for these two limits by the formula
z = (individual value of interest - mean value) / std dev
z = (0.496 - 0.498)/0.002 = -1 and also z = (0.504 - 0.498)/0.002 = 3 … let's check these in the table
The area between z = -1 and the mean is A = 0.3413, and the area between the mean and z = 3 is A = 0.4987; together 0.3413 + 0.4987 = 0.84, which means 84% will be accepted
The remaining 16% will be rejected, so out of 10,000 syringes, 1600 will be rejected
Here, we are checking under the
level of P<0.001.
For other problems if not specified
the acceptable P level can be
taken as P<0.05
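The syringe problem can be checked without a z-table using `statistics.NormalDist` from the Python standard library (available from Python 3.8). This is a sketch of the same calculation, not a replacement for the table method; the variable names are illustrative.

```python
from statistics import NormalDist

# Syringe diameters: mean 0.498 cm, standard deviation 0.002 cm
diameters = NormalDist(mu=0.498, sigma=0.002)

lower_limit = 0.496  # 0.500 - 0.004
upper_limit = 0.504  # 0.500 + 0.004

# Fraction of syringes whose diameter falls inside the tolerance limits
accepted_fraction = diameters.cdf(upper_limit) - diameters.cdf(lower_limit)

# Expected number rejected out of 10,000 sent for testing
rejected = round((1 - accepted_fraction) * 10_000)

print(f"{accepted_fraction:.2f}")  # 0.84
print(rejected)                    # 1600
```

The `cdf` method gives the area under the curve to the left of a value, so subtracting the two values gives the area between the lower and upper limits, the same 0.84 obtained from the table.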
Check out below site for more problems

https://www.statisticshowto.com/probability-and-statistics/normal-distributions/
How to use excel to plot a normal
distribution curve

https://www.youtube.com/watch?v=UASCe-3Y1to
