02-03-2023 - 23 - Probability
● Parametric tests: t-test (one-sample, pooled or unpaired, and paired), ANOVA (one-way and two-way), least significant difference (LSD)
Probability
● Mathematical analysis of probability dates back at least to the 17th century
● Whenever we're unsure about the outcome of an event, we can talk about the
probabilities of certain outcomes—how likely they are.
● A classic example for understanding probability is flipping a coin:
● There are two possible outcomes—heads or tails.
● What is the probability of the coin landing on heads? Probability of an event = (number of ways it can happen) / (total number of outcomes), so P(H) = 1/2, or 50%.
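The counting formula above can be sketched in a few lines of Python, assuming every outcome is equally likely. `classical_probability` is a hypothetical helper written only for this illustration; `Fraction` keeps the answer exact instead of rounding it.

```python
from fractions import Fraction


def classical_probability(favourable, outcomes):
    """P(event) = (number of ways it can happen) / (total number of outcomes).

    Hypothetical helper: assumes every outcome is equally likely.
    """
    outcomes = list(outcomes)
    ways = sum(1 for outcome in outcomes if outcome in favourable)
    return Fraction(ways, len(outcomes))


coin = ["H", "T"]
p_heads = classical_probability({"H"}, coin)
print(p_heads, f"= {float(p_heads):.0%}")  # 1/2 = 50%

die = [1, 2, 3, 4, 5, 6]
print(classical_probability({4}, die))              # 1/6 ("four: yes")
print(classical_probability({1, 2, 3, 5, 6}, die))  # 5/6 ("four: no")
```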
Some properties of probability
● The probability of an event always lies between 0 and 1 (inclusive) and can also be written as a percentage (0% to 100%).
For those of you who like brain teasers and TV games of luck, try this problem from Khan Academy: https://youtu.be/Xp6V_lO1ZKA
Probability distributions: discrete and continuous
Types of data: qualitative (categorical) data and quantitative (numerical) data; numerical data can be discrete or continuous
Types of Data: Categorical (Qualitative)
● Categorical data is not quantitative, i.e. its values are labels or categories rather than numbers
● Nominal data is defined as data that is used for naming or labelling
variables, without any quantitative value
● Nominal data collection techniques are mainly question-based
● It is collected via questions that require the respondent either to give an open-ended answer or to choose from a given list of options
● Examples: Which country are you from? What is your gender?
● Ordinal data, by contrast, has categories with a natural order, e.g. a rating scale:
1. Very Good
2. Good
3. Neutral
4. Bad
5. Very Bad
What are the statistical tests used for categorical data analysis?
● Both ordinal and nominal data are evaluated using nonparametric statistics
● The mean and standard deviation are not meaningful for these data types
● Parametric tests are those that make assumptions about the parameters of
the population distribution from which the sample is drawn
● E.g., the assumption that the population data are normally distributed
Tests for Numerical data:
➢ Unpaired t test
➢ Paired t test
➢ One way Analysis of Variance (ANOVA)
➢ ANOVA followed by post hoc tests, e.g. Dunnett's, Bonferroni, Tukey's
OK, back to our probability distributions.
Biostatistics by Sai Subramanian
(Career Publications)
Distributions: discrete and continuous
We say the probability of a four is 1/6 (one of the six faces is a four)
And the probability of not four is 5/6 (five of the six faces are not a four)
Note that a die has 6 sides but here we look at only two cases: "four: yes" or
"four: no"
Formula for the binomial distribution (source: https://www.mathsisfun.com/):
$$P(k \text{ out of } n) = \frac{n!}{k!\,(n-k)!}\,p^{k}\,(1-p)^{n-k}$$
What is "!"?
It is called the factorial: n! = n x (n-1) x ... x 2 x 1
For n = 5, 5! = 5x4x3x2x1 = 120
● p is the probability of each choice we want
● k is the number of choices we want
● n is the total number of choices
Ex.: You are helping out at the canteen during the AISSMS sports meet, selling various snacks. 70% of people choose samosas; the rest choose something else. What is the probability that exactly 2 of the next 3 customers choose samosas?
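Soln: Here p, the probability of selling a samosa, is 0.7 (converted from 70%), with n = 3 and k = 2:
$$P = \frac{3!}{2!\,(3-2)!}\,(0.7)^{2}\,(1-0.7)^{3-2} = \frac{3\times 2\times 1}{2\times 1\times 1}\times 0.49\times 0.3 = 3\times 0.147 = 0.441$$
i.e. a 44.1% probability that exactly 2 of our next 3 customers will want a samosa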
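As a check on the arithmetic, here is a minimal Python sketch of the binomial formula, assuming independent customers who each choose a samosa with probability p = 0.7. `binomial_probability` is a hypothetical helper; the standard-library `math.comb(n, k)` would give the same n!/(k!(n-k)!) term.

```python
from math import factorial


def binomial_probability(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p:
        n! / (k! (n-k)!) * p**k * (1-p)**(n-k)
    Hypothetical helper for illustration.
    """
    ways = factorial(n) // (factorial(k) * factorial(n - k))  # n!/(k!(n-k)!)
    return ways * p**k * (1 - p) ** (n - k)


print(factorial(5))  # 120, i.e. 5 x 4 x 3 x 2 x 1

# Samosa example: n = 3 customers, k = 2 samosas, p = 0.7
samosa = binomial_probability(k=2, n=3, p=0.7)
print(f"{samosa:.3f}")  # 0.441
```

The same function can be reused for the practice problem below by choosing suitable values of k, n and p.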
Ex. for practice: A and B play a game in which their chance of winning is in the
ratio 2:3
Find A’s chance of winning exactly 3 games out of 5
A Poisson distribution is a tool that helps to predict the probability of certain events happening when you know how often the event occurs on average. It gives the probability of a given number of events happening in a fixed interval of time.
Practical Uses of the Poisson Distribution
https://www.statisticshowto.com
● In business, overstocking will sometimes mean losses if the goods are not
sold.
● Likewise, having too little stock means a lost business opportunity, because sales cannot be maximized when stock runs short.
● Using this tool, businesses can estimate when demand is likely to be unusually high, so they can purchase more stock.
● Hotels and restaurants can prepare for an influx of customers: they can hire extra temporary workers in advance, purchase more supplies or make contingency plans
● With the Poisson distribution, companies can adjust supply to demand and keep the business profitable.
● In addition, waste of resources is prevented.
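Poisson's Distribution
$$P(x;\mu) = \frac{e^{-\mu}\,\mu^{x}}{x!}$$
where e ≈ 2.71828 is Euler's number, μ is the mean number of events in the interval and x is the number of events whose probability we want (source: https://www.statisticshowto.com).
For example, with μ = 2 and x = 3:
$$P(3;2) = \frac{2.71828^{-2}\times 2^{3}}{3\times 2\times 1} = \frac{(0.13534)(8)}{6} = 0.180$$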
Calculating the Poisson distribution for a simple set of data can be done manually. However, in real-life situations it is usually calculated with software such as IBM SPSS.
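Besides SPSS, the formula is short enough to evaluate directly. A minimal Python sketch using only the standard library; `poisson_probability` is a hypothetical name chosen for this illustration, and it reproduces the P(3; 2) example above.

```python
from math import exp, factorial


def poisson_probability(x, mu):
    """P(x; mu) = e**(-mu) * mu**x / x!

    Probability of exactly x events in an interval whose mean count is mu.
    Hypothetical helper for illustration.
    """
    return exp(-mu) * mu**x / factorial(x)


p = poisson_probability(3, 2)
print(f"{p:.3f}")  # 0.180
```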
Poisson distribution vs. Binomial
https://www.statisticshowto.com
● If your question has an average probability of an event happening per unit (i.e.
per unit of time, cycle, event) and you want to find probability of a certain
number of events happening in a period of time (or number of events), then use
the Poisson Distribution.
● If you are given an exact probability and want the probability of the event happening a certain number of times out of n trials (e.g. 10 times out of 100, or 99 times out of 1000), use the binomial distribution formula.
Normal Distribution
● The curve is symmetric about its center (i.e. around the mean, μ).
● Exactly half of the values are to the left of center and exactly half the
values are to the right.
● The total area under the curve is 1.
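The three properties listed above can be checked numerically. This sketch uses Python's standard-library `statistics.NormalDist` (Python 3.8+) with an assumed standard normal (mean 0, SD 1); the total area is approximated with a midpoint Riemann sum over ±10 SD, an illustrative choice rather than an exact integral.

```python
import math
from statistics import NormalDist

curve = NormalDist(mu=0, sigma=1)  # assumed standard normal; any mean/SD behaves the same

# Half of the area lies to the left of the mean
left_half = curve.cdf(curve.mean)
print(left_half)  # 0.5

# Symmetry: the density at mean - d equals the density at mean + d
d = 1.3
symmetric = math.isclose(curve.pdf(curve.mean - d), curve.pdf(curve.mean + d))
print(symmetric)  # True

# Total area ~ 1: midpoint Riemann sum of the density over mean +/- 10 SD
step = 0.001
n_steps = round(20 * curve.stdev / step)
start = curve.mean - 10 * curve.stdev
area = sum(curve.pdf(start + (i + 0.5) * step) * step for i in range(n_steps))
print(f"{area:.6f}")  # 1.000000
```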
Normal Distribution: Bell shaped curve
● IQ is assumed to be normally distributed
● The Wechsler Intelligence Scale for Children-Revised (WISC-R) has a mean of 100 and a standard deviation of 15
● Thus 68.2% of the children have a WISC-R score in the range of ? - ?
Solution:
68.2% of the area under the normal distribution curve falls within one standard deviation of the mean, i.e. mean ± SD.
Thus 68.2% of the children have a WISC-R score in the range 100 - 15 to 100 + 15, i.e. 85 to 115.
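The same range, and the share of children inside it, can be computed with the standard-library `statistics.NormalDist` (Python 3.8+). The exact area within one SD is about 68.27%; the 68.2% quoted above is the rounded z-table value (2 x 0.3413).

```python
from statistics import NormalDist

wisc_r = NormalDist(mu=100, sigma=15)

# One standard deviation either side of the mean
low = wisc_r.mean - wisc_r.stdev
high = wisc_r.mean + wisc_r.stdev
share = wisc_r.cdf(high) - wisc_r.cdf(low)

print(low, high)       # 85.0 115.0
print(f"{share:.2%}")  # 68.27%
```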
Practical Applications of the Standard Normal Model
● The standard normal distribution can help you work out which subjects you are doing well in and which subjects need more effort because of low scores
● If your score in one subject is higher than your score in another subject, you might think that you are better in the subject with the higher score.
● This is not always true.
● You can only say that you are better in a particular subject if your score there lies more standard deviations above the mean (i.e. has a higher z-score) than your score in the other subject
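A small Python sketch of this comparison. The class means, SDs and marks are hypothetical numbers invented for illustration: a 70 in maths beats an 80 in biology here because it lies further above its class mean in SD units.

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd


# Hypothetical class results, invented purely for illustration
maths_z = z_score(score=70, mean=60, sd=5)     # 2.0 SD above the class mean
biology_z = z_score(score=80, mean=75, sd=10)  # 0.5 SD above the class mean

# The higher raw mark (biology) is not the stronger relative performance
stronger = "maths" if maths_z > biology_z else "biology"
print(maths_z, biology_z, stronger)  # 2.0 0.5 maths
```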
Another type of normal distribution problems
Ex. Diameters of cylindrical syringes are normally distributed with mean 0.498 cm and SD 0.002 cm. If the tolerance limits are 0.500 ± 0.004 cm, how many syringes will be rejected out of 10,000 sent for testing?
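Soln. Here the mean is 0.498 cm and the standard deviation is 0.002 cm.
The QC tolerance limits are 0.500 ± 0.004 cm, so the lower limit is 0.496 cm and the upper limit is 0.504 cm.
Let us calculate the z value for these two limits using
$$z = \frac{\text{individual value of interest} - \text{mean}}{\text{standard deviation}}$$
$$z_{\text{lower}} = \frac{0.496 - 0.498}{0.002} = -1, \qquad z_{\text{upper}} = \frac{0.504 - 0.498}{0.002} = 3$$
From the z table, the area between the mean and z = 1 is 0.3413 (the same as between z = -1 and the mean, by symmetry) and the area between the mean and z = 3 is 0.4987. The area between z = -1 and z = 3 is therefore 0.3413 + 0.4987 = 0.84, which means 84% will be accepted.
Out of 10,000 syringes, 10,000 x (1 - 0.84) = 1,600 will be rejected.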
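The table lookup can be cross-checked with the standard-library `statistics.NormalDist` (Python 3.8+), which computes the normal CDF directly instead of reading areas from a z table. This is a sketch under the problem's assumption that diameters are exactly normal; it should agree with the hand calculation to about four decimal places.

```python
from statistics import NormalDist

diameter = NormalDist(mu=0.498, sigma=0.002)  # syringe diameters in cm
lower, upper = 0.500 - 0.004, 0.500 + 0.004   # QC tolerance limits in cm

# z values of the two limits (about -1 and 3; tiny floating-point noise is expected)
z_lower = (lower - diameter.mean) / diameter.stdev
z_upper = (upper - diameter.mean) / diameter.stdev

# Fraction inside the limits, from the normal CDF instead of a z table
accepted = diameter.cdf(upper) - diameter.cdf(lower)
rejected = round(10_000 * (1 - accepted))

print(f"z = {z_lower:.2f} and {z_upper:.2f}")             # z = -1.00 and 3.00
print(f"accepted: {accepted:.4f}, rejected: {rejected}")  # accepted: 0.8400, rejected: 1600
```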
Here, we are checking at the level of P < 0.001. For other problems, if not specified, the acceptable P level can be taken as P < 0.05.
Check out the site below for more problems:
https://www.statisticshowto.com/probability-and-statistics/normal-distributions/
How to use Excel to plot a normal distribution curve:
https://www.youtube.com/watch?v=UASCe-3Y1to