
Concept of Probability

The document discusses the concept of probability, emphasizing its importance in statistical reasoning and decision-making. It highlights common misconceptions about probability, such as overconfidence and the tendency to see patterns in random data. Additionally, it contrasts frequentist and Bayesian statistics, explaining how probabilities can be derived from models and subjective beliefs.

Uploaded by

Jumping Kills

CONCEPT OF

PROBABILITY
When it is not in our power to
determine what is true, we ought to
follow what is most probable.

Rene Descartes
“Statistics Means Never Having To
Say You’re Certain!”

MYLES HOLLANDER
(STATISTICIAN)
We expect statistical calculations to yield definite conclusions.
But in fact, every statistical conclusion is stated in terms of
probability. Statistics can be very difficult to learn if you keep looking
for definitive conclusions.
Why Study
PROBABILITY?
Why Study Probability
1. We tend to jump to conclusions
“Girls don’t drive trucks, only boys do.” “Boys don’t wear pink.”
◦ The ability to generalize from a sample to a population is hardwired
into our brains.
◦ To avoid our natural inclination to draw overly strong conclusions
from limited data, scientists need to use statistics.
Why Study Probability

2. We tend to be overconfident

Test devised by Russo and Schoemaker
Answer each of these questions with a range.
Pick a range that you think has a 90% chance of containing the correct answer.
Don’t use Google.
The goal is not to provide precise answers but to correctly quantify your uncertainty and
come up with ranges of values that you think are 90% likely to include the true answer.
If you have no idea, answer with a very wide interval.
For example, if you truly have no idea at all about the answer to the first question, answer with
the range zero to 120 years old, which you can be 100% sure includes the true answer.
But try to narrow your responses to each of these questions to a range that you are 90% sure
contains the right answer:
Why Study Probability And Statistics
1. Martin Luther King Jr.’s age at death
2. Length of the Nile River, in miles or kilometers
3. Number of countries in OPEC
4. Number of books in the Old Testament
5. Diameter of the moon, in miles or kilometers
6. Weight of an empty Boeing 747, in pounds or kilograms
7. Year Mozart was born
8. Gestation period of an Asian elephant, in days
9. Distance from London to Tokyo, in miles or kilometers
10. Deepest known point in the ocean, in miles or kilometers
Why Study Probability And Statistics
1. Martin Luther King Jr.’s age at death: 39
2. Length of the Nile River: 4,187 miles or 6,738 kilometers
3. Number of countries in OPEC: 13
4. Number of books in the Old Testament: 39
5. Diameter of the moon: 2,160 miles or 3,476 kilometers
6. Weight of an empty Boeing 747: 390,000 pounds or 176,901 kilograms
7. Year Mozart was born: 1756
8. Gestation period of an Asian elephant: 645 days
9. Distance from London to Tokyo: 5,989 miles or 9,638 kilometers
10. Deepest known point in the ocean: 6.9 miles or 11.0 kilometers
Why Study Probability

2. We tend to be overconfident

Test devised by Russo and Schoemaker

Russo and Schoemaker (1989) tested more than 1,000 people and reported that 99% of them
were overconfident.

Most people created narrow ranges that included only 30% to 60% of the correct answers.

Since we tend to be too sure of ourselves, scientists must use statistical
methods to quantify confidence properly.
Why Study Probability And Statistics
3. We see patterns in random data

Simulated data from 10 basketball players (1 per row) shooting 30 baskets each. An “X”
represents a successful shot and a “–” represents a miss.
Why Study Probability
3. We see patterns in random data
Most people see streaks of successful shots and conclude this is not random

Each spot had a 50% chance of being “X” (a successful shot) and a 50% chance of being
“–” (an unsuccessful shot), without taking into consideration previous shots.

We see clusters perhaps because our brains have evolved to find patterns and do so very
well.
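The claim that streaks show up in purely random data can be checked with a short simulation. The sketch below is not the code behind the slide's figure; it assumes independent 50:50 shots as the slide describes, and the seed and the helper names (`simulate_player`, `longest_run`) are illustrative choices.

```python
import random


def longest_run(shots: str, symbol: str = "X") -> int:
    """Return the length of the longest unbroken run of `symbol` in `shots`."""
    best = current = 0
    for shot in shots:
        current = current + 1 if shot == symbol else 0
        best = max(best, current)
    return best


def simulate_player(n_shots: int, rng: random.Random) -> str:
    """Each shot is an independent 50:50 draw; earlier shots are ignored."""
    return "".join("X" if rng.random() < 0.5 else "-" for _ in range(n_shots))


rng = random.Random(1)  # arbitrary seed, only so the demo is repeatable
players = [simulate_player(30, rng) for _ in range(10)]
for row in players:
    print(row, "longest streak of hits:", longest_run(row))
```

Runs of several hits in a row typically appear in most rows even though no player is "hot" by construction.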
Why Study Probability And Statistics
4. We don’t realise coincidences are common
While it is highly unlikely that any particular coincidence will occur, it
is almost certain that some seemingly astonishing set of unspecified
events will happen often, since we notice so many things each day.

Remarkable coincidences are always noted in hindsight and never
predicted with foresight.
Why Study Probability
5. We don’t expect variability to depend on sample size
◦ Gelman (1998) looked at the relationship between the populations of counties and the age-adjusted,
per-capita incidence of kidney cancer (about 15 cases per 100,000 adults in the United States).
◦ First, he focused on the counties with the lowest per-capita incidence of kidney cancer. Most of
these counties had small populations. Why? One might imagine that something about the
environment in these rural counties leads to lower rates of kidney cancer.
◦ Then he focused on the counties with the highest rates. Those were also small counties. Why? One might
imagine that something about medical care in these tiny counties leads to higher rates of kidney cancer.
◦ But it seems strange that both the highest and the lowest incidences of kidney cancer should be found in
counties with small populations.
◦ Random variation can have a bigger effect on averages within small groups than within large
groups. This simple principle is logical, yet it is not intuitive to many people.
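How strongly group size controls random variation can be put in numbers. The sketch below assumes a simple binomial model in which every county shares the same true rate of 15 per 100,000; the county sizes and the function name `sd_of_observed_rate` are hypothetical and are not taken from Gelman's data.

```python
import math

TRUE_RATE = 15 / 100_000  # incidence quoted in the slide


def sd_of_observed_rate(population: int, p: float = TRUE_RATE) -> float:
    """Standard deviation of the observed per-capita rate under a binomial model."""
    return math.sqrt(p * (1 - p) / population)


# Hypothetical county sizes, chosen only to span small to large.
for population in (1_000, 10_000, 100_000, 1_000_000):
    sd = sd_of_observed_rate(population)
    print(
        f"population {population:>9,}: "
        f"expected cases {population * TRUE_RATE:7.2f}, "
        f"SD of observed rate = {sd * 100_000:6.2f} per 100,000"
    )
```

Under this model the spread of the observed rate shrinks with the square root of the population, so the smallest counties are the ones most likely to land at either extreme by chance alone.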
Why Study Probability And Statistics

6. We don’t do Bayesian calculations intuitively
Screening blood donors for HIV:
0.1% of blood donors are HIV positive.
The antibody test is highly accurate but not quite perfect. It correctly identifies 99% of
infected blood samples but also incorrectly concludes that 1% of noninfected samples have
HIV.

When this test identifies a blood sample as having HIV present, what is
the chance that the donor does, in fact, have HIV, and what is the chance
the test result is an error (a false positive)?
Why Study Probability And Statistics

6. We don’t do Bayesian calculations intuitively

Scenario 1: blood donors, 0.1% have HIV
Suppose 100,000 donors are tested. Of these, 100 (0.1%) have HIV, and 99 (99%) of them will have a
positive result.
The other 99,900 do not have HIV, but the test will still label 1% of them (999) as positive. These are false positives.
So in total there will be 99 + 999 = 1,098 positive results, of which only 99/1,098 = 9% will
be true positives; the other 91% will be false positives.
Thus, there is only a 9% chance that there is actually HIV in a sample that tests positive.

Most people, including most physicians, intuitively think that a positive test almost
certainly means that HIV is present.
Our brains are not adept at combining what we already know (the prevalence of HIV)
with new knowledge (the test is positive).
Why Study Probability And Statistics

6. We don’t do Bayesian calculations intuitively

Scenario 2: IV drug users, 10% have HIV
◦ Suppose 100,000 people are tested. Of these, 10,000 (10%) have HIV, and the test will
be positive for 9,900 (99%) of them.
◦ The other 90,000 people do not have HIV, but the test will incorrectly
return a positive result in 1% of cases.
◦ So there will be 900 false positive tests (1% of 90,000).
◦ Altogether, there will be 9,900 + 900 = 10,800 positive tests, of which
9,900/10,800 = 92% will be true positives.
◦ The other 8% of the positive tests will be false positives. So if a test is
positive, there is a 92% chance that there is HIV in that sample.
Why Study Probability And Statistics

6. We don’t do Bayesian calculations intuitively

The interpretation of the test result depends greatly on what fraction of
the population has the disease.
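The two screening scenarios differ only in prevalence, so a single small function can reproduce both. This is a minimal sketch using the figures from the slides (99% of infected samples detected, 1% of noninfected samples flagged); the function and parameter names are illustrative.

```python
def prob_hiv_given_positive(
    prevalence: float,
    sensitivity: float = 0.99,          # fraction of infected samples detected
    false_positive_rate: float = 0.01,  # fraction of noninfected samples flagged
) -> float:
    """Probability that a positive sample really contains HIV."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


donors = prob_hiv_given_positive(0.001)   # scenario 1: 0.1% prevalence
iv_users = prob_hiv_given_positive(0.10)  # scenario 2: 10% prevalence
print(f"blood donors:  {donors:.1%}")
print(f"IV drug users: {iv_users:.1%}")
```

The test itself is identical in both calls; only the prior prevalence changes, which is the whole point of the example.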
Why Study Probability And Statistics

7. We are fooled by regression towards the mean

Regression to the mean (RTM) is a statistical phenomenon that occurs when repeated
measurements are made on the same subject or unit of observation.
It happens because values are observed with random error.
Random error - a non-systematic variation in the observed values
around a true mean (e.g. random measurement error, or random
fluctuations in a subject).
Systematic error, where the observed values are consistently biased, is
not the cause of RTM.
It is rare to observe data without random error, which makes RTM a
common phenomenon.
Why Study Probability And Statistics

7. We are fooled by regression towards the mean

All data in (A) were drawn from random distributions (Gaussian; mean = 120, SD = 15)
without regard to the designations “before” and “after” and without regard to any pairing.
(A) shows 48 random values, divided arbitrarily into 24 before–after pairs (which overlap
enough that you can’t count them all).
Why Study Probability And Statistics

7. We are fooled by regression towards the mean

This is an example of regression to the mean: the more
extreme a variable is upon its first measurement, the more
likely it is to be closer to the average the second time it is
measured.
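Regression to the mean can be reproduced with purely random numbers. The sketch below follows the slide's setup (Gaussian values, mean 120, SD 15, with "before" and "after" drawn independently); the number of pairs, the cutoff of 140 and the seed are assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(0)  # arbitrary seed so the demo is repeatable
N_PAIRS = 10_000        # far more pairs than the slide's 24, to get a stable average

before = [rng.gauss(120, 15) for _ in range(N_PAIRS)]
after = [rng.gauss(120, 15) for _ in range(N_PAIRS)]

# Select only the pairs whose first measurement was extreme (hypothetical cutoff).
high = [(b, a) for b, a in zip(before, after) if b > 140]
mean_before_high = statistics.mean([b for b, _ in high])
mean_after_high = statistics.mean([a for _, a in high])

print(f"pairs selected:       {len(high)}")
print(f"mean 'before' value:  {mean_before_high:.1f}")
print(f"mean 'after' value:   {mean_after_high:.1f}")
```

The selected "before" values average well above 140, while their "after" partners fall back to roughly 120, even though nothing was done between the two measurements.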
PROBABILITY
BASICS
Probability Basics
◦Probabilities range from 0.0 to 1.0 (or 100%) and are used to quantify
a prediction about future events or the certainty of a belief.
◦A probability of 0.0 means either that an event can’t happen or that
someone is absolutely sure that a statement is wrong.
◦A probability of 1.0 (or 100%) means that an event is certain to
happen or that someone is absolutely certain a statement is correct.
◦A probability of 0.50 (or 50%) means that an event is equally likely to
happen or not happen, or that someone believes that a statement is
equally likely to be true or false
Probability Basics
◦ Probability that is “out there,” or outside your head. This is probability as
long-term frequency. The probability that a certain event will happen has a
definite value, but we rarely have enough information to know that value
with certainty.

◦ Probability that is inside your head. This is probability as strength of
subjective beliefs, so it may vary among people and even among different
assessments by the same person.
PROBABILITY AS LONG-TERM FREQUENCY
A woman plans to get pregnant and wants to know the chance that her baby will be a
boy.
Probabilities can be viewed as predictions of future events that are derived from a model.
A model is a simplified description of a mechanism.
For this example, we can create the following simple model:
◦ Each ovum has an X chromosome, and none have a Y chromosome.
◦ Half the sperm have an X chromosome (but no Y) and half have a Y chromosome (and no X).
◦ Only one sperm will fertilize the ovum.
◦ Each sperm has an equal chance of fertilizing the ovum.
◦ If the winning sperm has a Y chromosome, the fetus will have both an X and a Y chromosome
and so will be male. If the winning sperm has an X chromosome, the fetus will have two X
chromosomes and so will be female.
◦ Any miscarriage or abortion is equally likely to happen to male and female fetuses.
PROBABILITY AS LONG-TERM
FREQUENCY
◦ Thus, our model predicts that the chance that the fetus will be male is 0.50, or 50%.
◦ You can make predictions about the occurrence of future events from any model, even if
the model doesn’t reflect reality. This one is not perfect, but it is close to reality.

Probabilities based on data


◦ Of all the babies born in the world in 2011, 51.7% were boys (Central Intelligence Agency
[CIA], 2012).
◦ Based on these data, we can answer the question, what is the chance that my baby will
be a boy?
◦ The answer is 51.7%.
PROBABILITY AS STRENGTH OF BELIEF

Subjective probabilities:
◦ Someone badly wants a boy.
◦ They search the Internet and read about an interesting book:
◦ How to Choose the Sex of Your Baby explains the simple, at-home, noninvasive Shettles
method and presents detailed steps to take to conceive a child of a specific gender. The
properly applied Shettles method gives couples a 75 percent or better chance of having a
child of the desired sex. (Shettles, 1996)
◦ What is the chance that you’ll have a boy?
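◦ If someone has complete faith that the method is correct, then they believe that the
probability, as stated on the book jacket, is 75%. This is a subjective probability.
◦ With less than complete faith, the belief becomes a weighted average of the book's 75% and the 51.7% observed in birth data. The weights in the two calculations below (85% and 1%) read as the degree of belief that the method works:

(0.850 × 0.750) + (0.150 × 0.517) = 0.715 = 71.5% (theirs)

(0.010 × 0.750) + (0.990 × 0.517) = 0.519 = 51.9% (yours)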
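The two calculations above share one pattern: a weighted average of 75% and 51.7%. The sketch below assumes that the weights (0.850 and 0.010) are a person's degree of belief that the method works, which is an interpretation read off the formulas; the function and parameter names are illustrative.

```python
def chance_of_boy(
    belief_method_works: float,
    p_if_method_works: float = 0.750,  # figure quoted from the book jacket
    p_baseline: float = 0.517,         # fraction of boys in the birth data
) -> float:
    """Weighted average of the two probabilities, weighted by degree of belief."""
    if not 0.0 <= belief_method_works <= 1.0:
        raise ValueError("belief must be between 0 and 1")
    return (
        belief_method_works * p_if_method_works
        + (1 - belief_method_works) * p_baseline
    )


print(f"85% belief in the method: {chance_of_boy(0.850):.3f}")
print(f" 1% belief in the method: {chance_of_boy(0.010):.3f}")
```

Setting the belief to 1.0 returns the book's 75%, and setting it to 0.0 returns the 51.7% baseline.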
Probability vs. odds
Odds and probability are two alternative ways of expressing precisely the same concept.
◦ The sex ratio at birth in many countries is about 1.07. Another way to say this is that the
odds of having a boy versus a girl are 1.07 to 1.00, or 107 to 100.
◦ If there are 107 boys born for every 100 girls born, the chance that any particular baby will be male is
107/ (107 + 100) = 0.517, or 51.7%

The odds are defined as the probability that the event will occur divided by the
probability that the event will not occur.

◦ If the probability of having a boy is 51.7%, then you expect 517 boys to be born for every 1,000 births.
◦ Of these 1,000 births, you expect 517 boys and 483 girls (which is 1,000 − 517) to be born.
◦ So the odds of having a boy versus a girl are 517/483 = 1.07 to 1.00.
Probability vs. odds
◦ Odds can be any positive value or zero, but they cannot be negative.
◦ A probability must be between zero and 1 if expressed as a fraction, or between
zero and 100 if expressed as a percentage.
◦ A probability of 0.5 is the same as odds of 1.0. The probability of flipping a coin to
heads is 50%. The odds are 50:50, which equals 1.0.
◦ As the probability goes from 0.5 to 1.0, the odds increase from 1.0 to approach infinity.
◦ For example, if the probability is 0.75, then the odds are 75:25, 3 to 1, or 3.0.
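The conversion between probability and odds follows directly from the definition given earlier (probability of the event divided by probability of no event). A minimal sketch, with illustrative function names:

```python
def probability_to_odds(p: float) -> float:
    """Odds = p / (1 - p). Undefined at p = 1, where the odds are infinite."""
    if not 0.0 <= p < 1.0:
        raise ValueError("probability must satisfy 0 <= p < 1")
    return p / (1 - p)


def odds_to_probability(odds: float) -> float:
    """Probability = odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds cannot be negative")
    return odds / (1 + odds)


print(probability_to_odds(0.75))            # the 75:25 example
print(round(odds_to_probability(1.07), 3))  # sex ratio at birth of 1.07
```

The two functions are inverses of each other, so converting a probability to odds and back returns the starting value up to floating-point rounding.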
Probability vs.
statistics
◦ Probability calculations go from general to
specific, from population to sample, and
from model to data

◦ Statistical calculations work in the


opposite direction . You start with one set
of data (the sample) and make inferences
about the overall population or model. The
logic goes from specific to general, from
sample to population, and from data to
model.
PROBABILITY IN STATISTICS
Probability can be “out there” or “in your head.”
The confidence intervals (CIs) and P values we compute are based on probability that is “out there.”
This style of analyzing data is called FREQUENTIST STATISTICS.
Only the data from the current data set are used as inputs when
calculating P values or CIs. Scientists often account for
prior data and theory when interpreting these results.
Bayesian statistics
◦Many statisticians prefer an alternative approach called Bayesian statistics, in which
prior beliefs are quantified and used as part of the calculations.
◦These prior probabilities can be subjective (based on informed opinion), objective (based
on solid data or well-established theory), or uninformative (based on the belief that all
possibilities are equally likely).
◦Bayesian calculations combine these prior probabilities with the current data to compute
probabilities and Bayesian CIs called credible intervals
Sample space and Events

◦ The sex of all newborns in a hospital has two possible outcomes: male and female.
◦ The set containing all the possible outcomes is the SAMPLE SPACE.
◦ Outcomes can be discrete or continuous.
◦ E.g. birth weight in grams: sample space (1000, 4500) g
Mutually and Nonmutually Exclusive Events

When the occurrence of one event excludes the possibility of
another event, the events are MUTUALLY EXCLUSIVE, e.g. blood groups (A, B,
AB, O) and gender.

NONMUTUALLY EXCLUSIVE events can occur together, e.g. symptoms such as headache, nausea
and fever, or a family history of diabetes, hypertension and ischaemic
heart disease.
Independent and Dependent Events
When the occurrence of one event does not influence the occurrence of another, the events are
INDEPENDENT EVENTS.
◦ E.g. gender does not influence blood group, or vice versa.
◦ The gender of the first child does not influence the gender of future children.
If the occurrence of one event affects the occurrence of others, the events are
DEPENDENT EVENTS.
◦ E.g. presence of diabetes and retinopathy; presence of diabetes and a family history of
diabetes.
Additive Rule of Probability
Used when computing the probability
that any one of several mutually exclusive
events of interest occurs,
e.g. the probability that a person selected
from a sample has either
blood type “A” OR “B”:

P(A or B) = P(A) + P(B)
= 0.22 + 0.33 = 0.55
Multiplication Rule
The multiplication rule gives the
probability that two (or more) events
happen together.
For two independent events, the probability
that both occur is found by multiplying
the probabilities of the two events.
P(Male and blood type O) =
P(Male) × P(blood type O)
= 0.50 × 0.42 = 0.21
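Both simple rules fit in a few lines of code. The sketch below uses the blood-type and gender figures from the slides; the function names `p_any_of` and `p_all_of` are illustrative, and each rule only holds under the condition stated in its docstring.

```python
import math


def p_any_of(*probabilities: float) -> float:
    """Addition rule: valid only when the events are mutually exclusive."""
    total = sum(probabilities)
    if total > 1 + 1e-12:
        raise ValueError("mutually exclusive probabilities cannot sum above 1")
    return total


def p_all_of(*probabilities: float) -> float:
    """Multiplication rule: valid only when the events are independent."""
    return math.prod(probabilities)


print(round(p_any_of(0.22, 0.33), 2))  # blood type A or B
print(round(p_all_of(0.50, 0.42), 2))  # male and blood type O
```

Applying `p_any_of` to events that can occur together, or `p_all_of` to events that influence each other, gives a wrong answer; those cases need the modified rules.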
Nonindependent Events & the Modified Multiplication Rule
Let A stand for the event “known meningitis” and B for the event
“recent epidemic” (where known meningitis means having either
meningitis alone or meningitis with sepsis).

We want to know the probability of event A given event B, written P(A | B),
where the vertical line, |, is read as “given.”

In other words, we want to know the probability of event A, assuming
that event B has happened.
Nonindependent Events & the Modified Multiplication Rule

CONDITIONAL PROBABILITY is the probability of one event given that another
event has occurred.

Put another way, the probability of a patient having known meningitis is
conditional on the period of the epidemic.
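The slides name the modified multiplication rule without writing it out. In standard textbook notation (not taken from the slides), conditional probability and the rule it implies are:

```latex
$$ P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}
   \qquad\Longrightarrow\qquad
   P(A \text{ and } B) = P(B)\, P(A \mid B) $$
```

When A and B are independent, P(A | B) = P(A), and the rule reduces to the ordinary multiplication rule.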
Nonmutually Exclusive Events & the Modified Addition Rule

◦ In a community hospital setting, it is known that patients with a particular
clinical condition present with three symptoms: fever,
headache and nausea (F, H, N). Total: 100 patients.

P(F or H) = P(F) + P(H) - P(F and H)
= (62/100) + (61/100) - (35/100)
= (62 + 61 - 35)/100 = 88/100 = 0.88
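The subtraction in the modified addition rule removes the patients who would otherwise be counted twice. A minimal sketch with the counts from the slide (the function name `p_either` is illustrative):

```python
def p_either(n_a: int, n_b: int, n_both: int, n_total: int) -> float:
    """P(A or B) from counts, subtracting the overlap that is counted twice."""
    return (n_a + n_b - n_both) / n_total


n_fever, n_headache, n_fever_and_headache, n_patients = 62, 61, 35, 100

naive = (n_fever + n_headache) / n_patients  # ignores the overlap
corrected = p_either(n_fever, n_headache, n_fever_and_headache, n_patients)

print(naive)      # exceeds 1, so it cannot be a probability
print(corrected)
```

The naive sum is above 1 because the 35 patients with both fever and headache appear in both counts.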


COMMON MISTAKES:
PROBABILITY
COMMON MISTAKES: PROBABILITY
Mistake: Ignoring the assumptions. The question “What is the chance that a fetus will be male?” assumes:
◦ We are asking about human babies. Sex ratios may be different in another species.
◦ There is only one fetus. “Will the baby be a boy or girl?” is ambiguous, or needs elaboration,
if you allow for the possibility of twins or triplets.
◦ There is only a tiny probability (which we ignore) that the baby is neither completely female
nor completely male.
◦ The sex ratio is the same for all countries and all ethnic groups.
◦ The sex ratio does not change from year to year, or between seasons.
◦ There will be (and have been) no sex-selective abortions or miscarriages, so the sex ratio at
conception is the same as the sex ratio at birth.

Probabilities are always contingent on a set of assumptions. To think clearly about
probability in any situation, you must know what those assumptions are.
COMMON MISTAKES: PROBABILITY
Mistake: Trying to understand a probability without clearly defining both the numerator
and the denominator
Numerator: baby boys. Denominator: all babies.
E.g. a probability quoted for in vitro fertilization is ambiguous unless both are defined.

Mistake: Reversing probability statements


◦ The probability that a baby is a boy is obviously very different from the probability that a boy is a baby.
◦ The probability that a heroin addict first used marijuana is not the same as the probability that a marijuana user will
later become addicted to heroin.
◦ The probability that someone with abdominal pain has appendicitis is not the same as the probability that someone
diagnosed with appendicitis will have had abdominal pain.
◦ The probability that a statistics book will be boring is not the same as the probability that a boring book is about
statistics.
COMMON MISTAKES: PROBABILITY
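Mistake: Believing that probability has a memory
If a couple has four children, all boys, what is the chance that the next child will be a boy?
If births are independent, as assumed earlier for the gender of successive children, the chance is the same as for any other birth; the four earlier boys do not change it.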
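Whether four boys in a row change the chance for the fifth child can be checked by simulation. The sketch below assumes independent births with the 51.7% figure quoted earlier in the slides; the number of simulated families and the seed are arbitrary choices.

```python
import random

P_BOY = 0.517            # fraction of boys in the birth data quoted earlier
rng = random.Random(42)  # arbitrary seed so the demo is repeatable

fifth_births = []
for _ in range(200_000):
    # True means boy; each of the five births is an independent draw.
    children = [rng.random() < P_BOY for _ in range(5)]
    if all(children[:4]):  # keep only families whose first four are boys
        fifth_births.append(children[4])

fraction_boys = sum(fifth_births) / len(fifth_births)
print(f"families with four boys first: {len(fifth_births)}")
print(f"fraction whose fifth child is a boy: {fraction_boys:.3f}")
```

Within simulation noise, the fraction matches the overall 51.7%: in this model the fifth birth carries no memory of the first four.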
Frequency Distribution
The set of all possible outcomes of a variable, along with the
frequency (the number of occurrences) of each outcome or group of
outcomes, is called the “frequency distribution.”
Probability Distribution
◦ The collection of probabilities of all possible outcomes, as computed from
the observed or sampled data, is known as the observed
or empirical probability distribution.
Probability Distributions

They are characterized on the basis of different aspects of the variables.

1. Binomial and Poisson distributions: discrete variables

2. Exponential and normal distributions: continuous variables
NORMAL / Gaussian Distribution
◦ The Gaussian bell-shaped distribution, also called the normal distribution, is the basis for much of
statistics.
◦ It arises when many random factors create variability
◦ Most values are near the mean, some values are farther from the mean, and very few are quite far from
the mean.
◦ When you plot the data on a frequency distribution, the result is a symmetrical, bell-shaped distribution,
idealized as the Gaussian distribution.
◦ For example, in a laboratory experiment, variation between experiments might be caused by several
factors: imprecise weighing of reagents, imprecise pipetting, the random nature of radioactive decay,
nonhomogeneous suspensions of cells or membranes, and so on.
◦ Variation in a clinical value might be caused by many genetic and environmental factors.
◦ When scatter is the result of many independent additive causes, the distribution will tend to follow a bell-
shaped Gaussian distribution
The Normal Distribution:

The Normal curve is a mathematical abstraction
which conveniently describes ("models") many
frequency distributions of scores in real life.
Francis Galton (1876) 'On the height and weight of boys aged 14, in town and
country public schools.' Journal of the Anthropological Institute, 5, 174-180:

[Figure: Height of 14-year-old children, town vs. country. x-axis: height (inches), 51 to 70; y-axis: frequency (%).]
Properties of the Normal Distribution:
1. It is bell-shaped and asymptotic at the extremes.
2. It's symmetrical around the mean.
3. The mean, median and mode all have same value.
4. It can be specified completely, once mean and SD
are known.
5. The area under the curve is directly proportional
to the relative frequency of observations.
e.g. here, 50% of scores fall below the mean, as
does 50% of the area under the curve.
e.g. here, 85% of scores fall below score X,
corresponding to 85% of the area under the curve.
Relationship between the normal curve and the
standard deviation:
All normal curves share this property: the SD cuts off a
constant proportion of the distribution of scores:

[Figure: normal curve. x-axis: number of standard deviations either side of the mean (-3 to +3); y-axis: frequency. Bands marked 68%, 95% and 99.7%.]
About 68% of scores fall in the range of the mean plus and minus 1 SD;
95% in the range of the mean +/- 2 SDs;
99.7% in the range of the mean +/- 3 SDs.
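These three percentages can be computed rather than memorized. For a normal distribution, the fraction of values within k SDs of the mean is erf(k / sqrt(2)); the sketch below uses only Python's standard `math.erf`, and the function name is illustrative.

```python
import math


def fraction_within(k_sd: float) -> float:
    """Fraction of a normal distribution lying within k_sd SDs of the mean."""
    return math.erf(k_sd / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {fraction_within(k):.4f}")
```

The value for 2 SDs is slightly above the rounded 95% quoted on the slide; exactly 95% of values lie within about 1.96 SDs.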

e.g. IQ is normally distributed (mean = 100, SD = 15).

68% of people have IQs between 85 and 115 (100 +/- 15).
95% have IQs between 70 and 130 (100 +/- (2*15)).
99.7% have IQs between 55 and 145 (100 +/- (3*15)).

[Figure: normal curve with 68% of the area between 85 (mean - 1 SD) and 115 (mean + 1 SD).]

We can tell a lot about a population just from knowing
the mean, SD, and that scores are normally distributed.
If we encounter someone with a particular score, we can
assess how they stand in relation to the rest of their
group.
e.g. someone with an IQ of 145 is quite unusual (3 SDs
above the mean).
IQs of 3 SDs or above occur in only 0.15% of the
population [ (100-99.7) / 2 ].
z-scores:
z-scores are "standard scores".
A z-score states the position of a raw score in relation to
the mean of the distribution, using the standard
deviation as the unit of measurement.

$$ z = \frac{\text{raw score} - \text{mean}}{\text{standard deviation}} $$

1. Find the difference between a score
and the mean of the set of scores.
2. Divide this difference by the SD (in
order to assess how big it really is).

for a population:
$$ z = \frac{X - \mu}{\sigma} $$

for a sample:
$$ z = \frac{X - \bar{X}}{s} $$
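Raw score distributions:
A score, X, is expressed in the original units of measurement:

[Figure: two raw-score distributions. First: X = 65, with mean 50 and s = 10. Second: X = 236, with mean 200 and s = 24.]

z-score distribution:
X is expressed in terms of its deviation from the mean (in SDs).

[Figure: z-score distribution with mean 0 and s = 1; both raw scores above correspond to z = 1.5.]

z-scores transform our original scores into scores with a
mean of 0 and an SD of 1.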
Raw IQ scores (mean = 100, SD = 15)
z for 100 = (100-100) / 15 = 0, z for 115 = (115-100) / 15 = 1,
z for 70 = (70-100) / 15 = -2, etc.

raw: 55 70 85 100 115 130 145


z-score: -3 -2 -1 0 +1 +2 +3
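The raw-to-z conversion is a one-line function. The sketch below reproduces the IQ row above and the two raw scores (65 and 236) from the earlier figure; the function name `z_score` is illustrative.

```python
def z_score(raw: float, mean: float, sd: float) -> float:
    """Position of a raw score relative to the mean, in units of SD."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd


# IQ scale: mean = 100, SD = 15
for raw in (55, 70, 85, 100, 115, 130, 145):
    print(f"raw {raw:>3} -> z = {z_score(raw, 100, 15):+.0f}")

# Two scores on different scales that share the same z-score
print(z_score(65, mean=50, sd=10), z_score(236, mean=200, sd=24))
```

Scores measured on very different raw scales become directly comparable once they are expressed as z-scores.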
Why use z-scores?
1. z-scores make it easier to compare scores from
distributions using different scales.

e.g. two tests:


Test A: Fred scores 78. Mean score = 70, SD = 8.
Test B: Fred scores 78. Mean score = 66, SD = 6.

Did Fred do better or worse on the second test?


Test A: as a z-score, z = (78-70) / 8 = 1.00
Test B: as a z-score , z = (78 - 66) / 6 = 2.00

Conclusion: Fred did much better on Test B.


2. z-scores enable us to determine the relationship
between one score and the rest of the scores, using just
one table for all normal distributions.
e.g. If we have 480 scores, normally distributed with a
mean of 60 and an SD of 8, how many would be 76 or
above?
(a) Graph the problem:
(b) Work out the z-score for 76:
$$ z = \frac{X - \bar{X}}{s} = \frac{76 - 60}{8} = \frac{16}{8} = 2.00 $$

(c) We need to know the size of the area beyond z
(remember: the area under the Normal curve corresponds
directly to the proportion of scores).
Many statistics books (and my website!) have z-score
tables, giving us this information:

z       (a) Area between    (b) Area
        mean and z          beyond z
0.00    0.0000              0.5000
0.01    0.0040              0.4960
0.02    0.0080              0.4920
:       :                   :
1.00    0.3413 *            0.1587
:       :                   :
2.00    0.4772 +            0.0228
:       :                   :
3.00    0.4987 #            0.0013

* x 2 = 68% of scores
+ x 2 = 95% of scores
# x 2 = 99.7% of scores (roughly!)

[Figure: normal curve showing (a) the area between the mean and z and (b) the area beyond z; for z = 2.00 the area beyond z is 0.0228.]

(d) So: as a proportion of 1, 0.0228 of scores are likely to
be 76 or more.
As a percentage, = 2.28%

As a number, 0.0228 * 480 = 10.94 scores, i.e. about 11.


How many scores would be 54 or less?
Graph the problem:

$$ z = \frac{X - \bar{X}}{s} = \frac{54 - 60}{8} = \frac{-6}{8} = -0.75 $$

Use the table by ignoring the sign of z: “area beyond z” for
0.75 = 0.2266. Thus 22.7% of scores (109 scores) are 54
or less.
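The table look-ups for the 480-score example can be cross-checked in code. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+) in place of the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=60, sigma=8)  # mean and SD from the example
n_scores = 480

p_76_or_more = 1 - scores.cdf(76)  # area beyond z = +2.00
p_54_or_less = scores.cdf(54)      # area beyond z = -0.75

print(f"76 or more: {p_76_or_more:.4f} -> about {n_scores * p_76_or_more:.0f} scores")
print(f"54 or less: {p_54_or_less:.4f} -> about {n_scores * p_54_or_less:.0f} scores")
```

The results agree with the table values 0.0228 and 0.2266 to the four decimals the table provides, with no need to ignore the sign of z by hand.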
Word comprehension test scores:
Normal no. correct: mean = 92, SD = 6 out of 100
Brain-damaged person's no. correct: 89 out of 100.
Is this person's comprehension significantly impaired?

Step 1: graph the problem:

[Figure: normal curve with mean 92; the area of interest (marked "?") lies below 89.]

Step 2: convert 89 into a z-score:

z = (89 - 92) / 6 = - 3 / 6 = - 0.5

Step 3: use the table to find the "area beyond z" for our z
of - 0.5:

Area beyond z = 0.3085

z-score value:   Area between the   Area beyond z:
                 mean and z:
0.44             0.1700             0.3300
0.45             0.1736             0.3264
0.46             0.1772             0.3228
0.47             0.1808             0.3192
0.48             0.1844             0.3156
0.49             0.1879             0.3121
0.50             0.1915             0.3085
0.51             0.1950             0.3050
0.52             0.1985             0.3015
0.53             0.2019             0.2981
0.54             0.2054             0.2946
0.55             0.2088             0.2912
0.56             0.2123             0.2877
0.57             0.2157             0.2843
0.58             0.2190             0.2810
0.59             0.2224             0.2776
0.60             0.2257             0.2743
0.61             0.2291             0.2709

Conclusion: .31 (31%) of
normal people are likely to
have a comprehension score
this low or lower.
Conclusions:
Many psychological/biological properties are
normally distributed.

This is very important for statistical inference
(extrapolating from samples to populations - more
on this in later lectures...).

z-scores provide a way of
(a) comparing scores on different raw-score
scales;
(b) showing how a given score stands in relation to
the overall set of scores.
Conclusions:

The logic of z-scores underlies many statistical tests.

1. Scores are normally distributed around their mean.

2. Sample means are normally distributed around the
population mean.

3. Differences between sample means are normally
distributed around zero ("no difference").

We can exploit these phenomena in devising tests to
help us decide whether or not an observed difference
between sample means is due to chance.
THANKS
