Random Sampling & Probability

Random sampling and probability are central to inferential statistics. There are two types of random sampling - sampling with replacement, where selected members are returned before the next selection, and sampling without replacement, where members are not returned. Probability can be approached from an a priori viewpoint, using theoretical probabilities, or an a posteriori viewpoint, calculating probabilities based on experimental data. Bias in sampling can occur if sampling rules are not followed or if response rates are low.

Uploaded by

Shaila Lopez

RANDOM SAMPLING & PROBABILITY
Inferential Statistics
• The basic aim of inferential statistics is to use the sample scores
to make a statement about a characteristic of the
population.
• There are two kinds of statement: hypothesis testing and parameter
estimation.
• In hypothesis testing, the experimenter is collecting data in
an experiment on a sample set of subjects in an attempt to
validate some hypothesis involving a population.
“the improvement in final exam scores was due to the new teaching
method and not chance factors. Furthermore, the improvement does
not apply just to the particular sample tested. Rather, the
improvement would be found in the whole population of third
graders if they were taught by the new method.”
• In parameter estimation experiments, the
experimenter is interested in determining the
magnitude of a population characteristic.
“The probability is 0.95 that the interval of $300-$400
contains the population mean.”
• Random sampling and probability are central to
the methodology of inferential statistics.
• Sampling is the process of drawing a sample from
the population.
• The sampling frame is the list of all individuals from
which the sample is drawn.
RANDOM SAMPLING
• Both in hypothesis testing and in parameter
estimation experiments, the sample cannot be
just any subset of the population. Rather, it is
crucial that the sample is a random sample.
• A random sample is defined as a sample selected
from the population by a process that ensures that (1)
each possible sample of a given size has an equal
chance of being selected and (2) all the members
of the population have an equal chance of being
selected into the sample.
The sample should be a random sample
for two reasons:
First, to generalize from a sample to a
population, it is necessary to apply
the laws of probability to the sample.
Second, to generalize from a sample
to a population, it is necessary that
the sample be representative of the
population.
Techniques for Random Sampling
• Fishbowl
• Table of Random Numbers
Sampling with replacement is defined as a
method of sampling in which each
member of the population selected for the
sample is returned to the population
before the next member is selected. When
sample size is small relative to population
size, the differences between sampling with
and without replacement are negligible, and
with-replacement techniques are much
easier to use in providing the mathematical
basis for inference.
Sampling without replacement
is defined as a method of
sampling in which the members
of the sample are not returned
to the population before
subsequent members are
selected. It is often used in
experiments.
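The difference between the two methods can be sketched in a few lines of Python using only the standard library. The population of ten numbered members and the fixed seed are illustrative assumptions, not part of the slides.

```python
import random

# Hypothetical population: ten members labeled 1..10 (an assumption for illustration).
population = list(range(1, 11))
rng = random.Random(42)  # fixed seed so the run is repeatable

# Sampling with replacement: each member is returned before the next draw,
# so the same member can appear more than once.
with_replacement = rng.choices(population, k=5)

# Sampling without replacement: a drawn member is not returned,
# so every member appears at most once.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

The second list can never contain a repeated member, while the first one may.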
SAMPLING METHOD
Sampling
• It is a method that allows researchers to infer
information about a population based on results
from a subset of the population, without having to
investigate every individual.
• Sampling reduces cost and workload, and may
make it easier to obtain high quality information.
• Whatever method is chosen, it is important that
the individuals selected are representative of the
whole population.
• Sampling can be subdivided into two groups:
probability sampling and non-probability sampling.
Probability Sampling
• Probability sampling refers to sampling techniques for which a
person’s (or event’s) likelihood of being selected for membership in
the sample is known.
• Researchers who use probability sampling techniques are aiming to
identify a representative sample from which to collect data.
• You start with a complete sampling frame of all eligible individuals.
• Random sampling gives every individual in the population a
chance of being chosen for the sample, so you are better able to
generalize the results from your study. Generalizability refers to the
idea that a study’s results will tell us something about a group larger
than the sample from which the findings were generated.
• It is more time-consuming and expensive than non-probability sampling.
Non-probability Sampling
• You do not start with a complete sampling frame, so
some of the individuals have no chance of being
selected.
• Consequently you cannot estimate the effect of
sampling error and there is a significant risk of
ending up with a non-representative sample
which produces non-generalizable results
• Cheaper and more convenient
• Useful for exploratory research and hypothesis
generation
Probability Sampling Methods
1. Simple random sampling - each individual is
chosen entirely by chance and each member of the
population has an equal chance, or probability, of
being selected.
-Using a table of random numbers
-Fishbowl method
• Advantage: it is a straightforward method.
• Disadvantage: you may not select enough
individuals with your characteristic of interest,
especially if that characteristic is uncommon.
2. Systematic Sampling - individuals are selected
at regular intervals from the sampling frame.
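-The sampling interval is the population size divided by the
sample size: for example, 1000/100 = 10, so every 10th
individual is selected.
-More convenient than simple random sampling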
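A minimal Python sketch of systematic sampling with the 1000/100 interval from the example. The function name `systematic_sample`, the numbered frame, and the random starting point within the first interval are assumptions made for illustration; the slides only specify the regular interval.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Select n individuals at regular intervals from the sampling frame."""
    k = len(frame) // n        # sampling interval, e.g. 1000 // 100 = 10
    start = rng.randrange(k)   # assumed: random starting point in the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical sampling frame of 1000 numbered individuals.
frame = list(range(1, 1001))
sample = systematic_sample(frame, 100, random.Random(0))

print(len(sample), sample[:5])
```

Every selected individual is exactly 10 positions after the previous one in the frame.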
3. Stratified Sampling - the population is first divided
into subgroups (or strata) whose members share a similar
characteristic.
-It is used when we might reasonably expect the
measurement of interest to vary between the
different subgroups, and we want to ensure
representation from all the subgroups.
-It may also be appropriate to choose non-equal
sample sizes from each stratum.
-It improves the accuracy and representativeness of
the results by reducing sampling bias.
• Disadvantage: it can be difficult to decide which
characteristics to stratify by.
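A possible sketch of stratified sampling in Python. The helper `stratified_sample`, the proportional 10% allocation, and the population of 40 children, 60 teenagers, and 30 adults are illustrative assumptions; as noted above, unequal sampling fractions per stratum may also be appropriate.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(individuals, stratum_of, fraction, rng):
    """Draw the same fraction from every stratum by simple random sampling."""
    strata = defaultdict(list)
    for person in individuals:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical population: (stratum, id) pairs.
population = (
    [("child", i) for i in range(40)]
    + [("teenager", i) for i in range(60)]
    + [("adult", i) for i in range(30)]
)
sample = stratified_sample(population, lambda p: p[0], 0.10, random.Random(1))
counts = Counter(stratum for stratum, _ in sample)

print(counts)
```

Each stratum is guaranteed to be represented, which simple random sampling alone does not ensure.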
4. Cluster Sampling - subgroups of the
population are used as the sampling unit,
rather than individuals. The population is
divided into subgroups, known as clusters.
-Single-stage cluster sampling
-Two-stage cluster sampling
-Can be more efficient than simple random
sampling, especially where a study takes place
over a wide geographical region.
-Disadvantages include an increased risk of
bias if the chosen clusters are not
representative of the population
Non-probability Sampling Methods
1. Convenience Sampling - participants are
selected based on availability and willingness
to take part.
-Prone to significant bias, because those who
volunteer to take part may be different from
those who choose not to (volunteer bias), and
the sample may not be representative with
respect to other characteristics.
-Researcher gathers data from whatever cases
happen to be convenient.
2. Quota Sampling - often used by
market researchers. Interviewers are
given a quota of subjects of a specified
type to attempt to recruit.
-Advantage of being straightforward
and potentially representative
-Researcher selects cases from within
several different subgroups.
3. Purposive (Judgment) Sampling - also known as
selective or subjective sampling.
-This sampling relies on the judgment of the
researcher when choosing who to ask to
participate
-This approach is often used by the media when
canvassing the public for opinions and in qualitative
research.
-Advantage of being time- and cost-effective to
perform while resulting in a range of responses.
-Researcher seeks out elements that meet specific
criteria.
4. Snowball Sampling - commonly used in
social sciences when investigating hard-to-
reach groups.
-Existing subjects are asked to nominate
further subjects known to them, so the
sample increases in size like a rolling
snowball.
-Advantage: it can be effective when a
sampling frame is difficult to identify.
-Researcher relies on participant referrals to
recruit new participants.
Bias in sampling
There are five important potential sources of bias that
should be considered when selecting a sample,
irrespective of the method used. Sampling bias may be
introduced when:
1. Any pre-agreed sampling rules are deviated from
2. People in hard-to-reach groups are omitted
3. Selected individuals are replaced with others, for
example if they are difficult to contact
4. There are low response rates
5. An out-of-date list is used as the sampling frame (for
example, if it excludes people who have recently
moved to an area)
PROBABILITY
May be approached in two ways:
1. From an a priori/classical viewpoint
2. From an a posteriori/empirical
viewpoint
A priori means that which can be deduced from reason
alone, without experience (without recourse to any data
collection).

1. From the a priori/classical viewpoint, probability is
defined as:
$$p(A) = \frac{\text{Number of events classifiable as } A}{\text{Total number of possible events}}$$
p(A) is read as “the probability of occurrence of
event A.”
Example using a die: each die has six sides, with a different
number of spots painted on each side (varying from 1 to 6).
Suppose we are going to roll a die once. What is the
probability it will come to rest with a 2 (the side with two
spots) facing upward? Since there are six possible
numbers that might occur and only one of these is 2, the
probability of a 2, in one roll of one die, is:
$$p(A) = p(2) = \frac{\text{Number of events classifiable as } 2}{\text{Total number of possible events}} = \frac{1}{6} = 0.1667$$
What is the probability of getting a number greater than
4 in one roll of one die? This time there are two events
classifiable as A (rolling 5 or 6). Thus,
$$p(A) = p(5 \text{ or } 6) = \frac{\text{Number of events classifiable as } 5 \text{ or } 6}{\text{Total number of possible events}} = \frac{2}{6} = 0.3333$$
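The a priori definition amounts to counting, which a short Python sketch can make concrete. The helper name `a_priori_probability` is hypothetical, and the sketch assumes, as the definition does, that all listed outcomes are equally likely.

```python
from fractions import Fraction

def a_priori_probability(event, outcomes):
    """p(A) = (number of outcomes classifiable as A) / (total number of possible outcomes)."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

die = [1, 2, 3, 4, 5, 6]  # the six equally likely faces

p_two = a_priori_probability(lambda face: face == 2, die)
p_gt4 = a_priori_probability(lambda face: face > 4, die)

print(p_two, round(float(p_two), 4))  # 1/6 0.1667
print(p_gt4, round(float(p_gt4), 4))  # 1/3 0.3333
```

No die is rolled here: the probabilities follow from reasoning about the possible outcomes alone.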
A posteriori means “after the fact,” and in the context of
probability, it means after some data have been collected.

2. From the a posteriori/empirical viewpoint, probability is
defined as:
$$p(A) = \frac{\text{Number of times } A \text{ has occurred}}{\text{Total number of occurrences}}$$

To determine the probability of a 2 in one roll of one die by
using the empirical approach, we would have to take the
actual die, roll it many times, and count the number of times
a 2 has occurred. The more times we roll the die, the better.
• Assume that we roll the die 100,000 times and that a
2 occurs 16,000 times. The probability of a 2 in one roll of the
die is found by:
$$p(A) = \frac{\text{Number of times } A \text{ has occurred}}{\text{Total number of occurrences}} = \frac{16000}{100000} = 0.1600$$
A. Note that, with this approach, it is necessary to have
the actual die and to collect some data before determining
the probability. The interesting thing is that if the die is
evenly balanced (all numbers are equally likely), then when
we roll the die many, many times, the a posteriori
probability approaches the a priori probability. If we roll an
infinite number of times, the two probabilities will be equal to
each other.
B. Also, if the die is loaded (weighted so that one side
comes up more often than the others), then the a posteriori
probability will differ from the a priori determination. For example, if
the die is heavily weighted for a 6 to come up, a 2 might
never appear.
We can see now that the a priori equation assumes that each
possible outcome has an equal chance of occurrence.
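Both points can be explored with a small simulation. The sketch below stands in for rolling an actual die: the pseudo-random generator, the seed, the roll counts, and the weights chosen for the loaded die are all assumptions for illustration.

```python
import random

def empirical_probability(n_rolls, target, weights=None, seed=0):
    """Estimate p(target) as (times target occurred) / (total number of rolls)."""
    rng = random.Random(seed)
    faces = [1, 2, 3, 4, 5, 6]
    # weights=None simulates a balanced die; otherwise faces are weighted.
    rolls = rng.choices(faces, weights=weights, k=n_rolls)
    return rolls.count(target) / n_rolls

# Balanced die: the estimate should settle near the a priori value 1/6.
for n in (100, 10_000, 100_000):
    print(n, empirical_probability(n, target=2))

# Loaded die (hypothetical weights strongly favoring a 6).
loaded = [1, 1, 1, 1, 1, 20]
print("loaded:", empirical_probability(100_000, target=2, weights=loaded))
```

With the loaded weights the estimate for a 2 stays well below 1/6, which is the gap between the a posteriori and a priori values described in point B.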
Basic Points to Probability
• Because probability is fundamentally a proportion, it ranges in value from 0.00 to 1.00.
• If the probability of an event occurring is equal to 1.00, then the event is
certain to occur (for example, the probability that a number from 1 to 6 will
occur in one roll of a die equals 1.00; one of these numbers is certain to occur).
• If the probability equals 0.00, then the event is certain not to occur (for example,
an ordinary die does not have a side with 7 dots on it, so rolling a 7 is certain not
to occur).
• The probability of occurrence of an event is expressed as a fraction or a decimal
number (the answer may be left as a fraction but is usually converted to a decimal).
• Sometimes probability is expressed as “chances in 100.” For example, someone
might say the probability that event A will occur is 5 chances in 100. What this
really means is p(A) = 0.05.
• Occasionally, probability is also expressed as the odds for or against an event
occurring. For example, a betting person might say that the odds are 3 to 1
favoring Mich to win the race. In probability terms, p(Mich's winning) = 3/4 =
0.75. If the odds were 3 to 1 against Mich's winning, then p(Mich's winning) =
1/4 = 0.25.
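The conversions in the last two points are simple enough to check in Python. The function name `odds_to_probability` is hypothetical; it treats odds of "a to b" in favor as a favorable chances out of a + b in total.

```python
from fractions import Fraction

def odds_to_probability(in_favor, against):
    """Odds of `in_favor` to `against` for an event give p = in_favor / (in_favor + against)."""
    return Fraction(in_favor, in_favor + against)

p_favored = odds_to_probability(3, 1)    # odds 3 to 1 favoring the event
p_unfavored = odds_to_probability(1, 3)  # odds 3 to 1 against the event
p_chances = Fraction(5, 100)             # "5 chances in 100"

print(float(p_favored))    # 0.75
print(float(p_unfavored))  # 0.25
print(float(p_chances))    # 0.05
```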
Two Major Probability Rules
1. Addition Rule
2. Multiplication Rule
The Addition Rule
• It is concerned with determining the probability of
occurrence of any one of several possible events.
• Let's assume there are only two possible events, A
and B. When there are two events, the addition rule
states the following.
• The probability of occurrence of A or B is equal to the
probability of occurrence of A plus the probability of
occurrence of B minus the probability of occurrence of
both A and B.
• Addition rule for two events - general equation
p(A or B) = p(A) + p(B) - p(A and B)
• Example: what is the probability of getting an ace or a club
in one draw from a deck of 52 ordinary playing cards?
• First method: there are 16 ways to get an ace or a club
(the 13 clubs plus the 3 aces that are not clubs),
so the probability of getting an ace or a club = 16/52 =
0.3077
• Second method: use the addition rule. The probability of
getting an ace = 4/52, and the probability of getting a
club = 13/52. The probability of getting both an ace and
a club = 1/52. By the addition rule, the probability of
getting an ace or a club =
$$\frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = 0.3077$$
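Both methods can be verified by enumerating the deck. In this sketch the deck representation and the helper `p` are assumptions for illustration; `p` simply counts cards, treating each card as equally likely.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def p(event):
    """A priori probability of an event for one draw from the deck."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_ace(card):
    return card[0] == "A"

def is_club(card):
    return card[1] == "clubs"

# First method: count the cards that are an ace or a club directly.
direct = p(lambda c: is_ace(c) or is_club(c))

# Second method: p(A or B) = p(A) + p(B) - p(A and B).
by_rule = p(is_ace) + p(is_club) - p(lambda c: is_ace(c) and is_club(c))

print(direct, by_rule, round(float(direct), 4))  # 4/13 4/13 0.3077
```

Subtracting p(A and B) keeps the ace of clubs from being counted twice.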
Conditions:
-Events are mutually exclusive
(two or more events)
-Events are exhaustive
and mutually exclusive (two or
more events)
• Two events are mutually exclusive if both cannot occur
together. Another way of saying this is that two events are
mutually exclusive if the occurrence of one precludes the
occurrence of the other.
• The events of rolling a 1 and of rolling a 2 in one roll of a die
are mutually exclusive. If the roll ends with a 1, it cannot also be
a 2. The events of picking an ace and a king in one draw from a
deck of ordinary playing cards are mutually exclusive. If the
card is an ace, it precludes the card also being a king. (Ace and
club are not mutually exclusive because there is an ace of
clubs.)
• When the events are mutually exclusive, the probability of
both events occurring together is zero. Thus, p(A and B) = 0
when A and B are mutually exclusive.
• The addition rule then simplifies to: p(A or B) = p(A) + p(B)
Suppose you are going to randomly sample 1
individual from a population of 130 people.
In the population, there are 40 children
younger than 12, 60 teenagers, and 30
adults. What is the probability the individual
you select will be a teenager or an adult?
• p(teenager or adult)= p(teenager)+p(adult)
• 60/130 + 30/130 = 90/130 = 0.6923
What is the probability of randomly
picking a 10 or a 4 in one draw from a
deck of ordinary playing cards? (There
are four 10s and four 4s.)
• p(10 or 4) = p(10) + p(4)
• p(10 or 4) = 4/52 + 4/52 = 8/52 =
0.1538
• Addition rule with more than two mutually exclusive events.
p(A or B or C... or Z) = p(A)+p(B)+p(C)+...+p(Z)
• A set of events is exhaustive if the set includes all of the
possible events.
• For example, in rolling a die once, the set of events of getting a
1, 2, 3, 4, 5, or 6 is exhaustive because the set includes all of the
possible events.
• When a set of events is both exhaustive and mutually
exclusive, a very useful relationship exists:
p(A)+p(B)+p(C)+...+p(Z) = 1.00
• where A, B, C, ..., Z are the events
• Since the events are exhaustive and mutually exclusive, the sum
of their probabilities must equal 1.00. Thus, for one roll of a die:
p(1)+p(2)+p(3)+p(4)+p(5)+p(6) = 1.00
1/6+1/6+1/6+1/6+1/6+1/6=1.00
• When there are only two events and the events
are mutually exclusive, it is customary to assign the
symbol P to the probability of occurrence of one
event and Q to the probability of occurrence of
the other event.
• For example, if we were flipping a penny and only
allowed it to come up heads or tails, this would be
a situation in which there are only two possible
events with each flip (a head or a tail), and the
events are mutually exclusive (if it is a head, it cannot
be a tail, and vice versa). Let P be the probability of
a head and Q the probability of a tail.
• In flipping coins, fair vs biased coins must be
distinguished.
• A fair or unbiased coin is one for which, when it is flipped once,
the probability of a head = the probability of a tail = 1/2.
• If the coin is biased, the probability of a head is not equal
to the probability of a tail, so neither equals 1/2.
• Thus, if we are flipping a coin, if we let P equal the
probability of a head and Q equal the probability of a
tail, and the coin is a fair coin, then P = 1/2 or 0.50 and Q
= 1/2 or 0.50.
• Since the two events of getting a head or a tail in a single
flip of a coin are exhaustive and mutually exclusive,
their probabilities must sum to 1.
• Thus: P+Q = 0.50+0.50 = 1.00
The Multiplication Rule
• It is concerned with the joint or successive occurrence of several
events.
• It deals with what happens on more than one roll or draw, while the
addition rule covers just one roll or draw.
• It states the following: the probability of occurrence of both A
and B is equal to the probability of occurrence of A times the
probability of occurrence of B, given A has occurred.
• Multiplication rule with two events - equation form
p(A and B) = p(A) p(B|A)
• p(B|A) = probability of occurrence of B given A has occurred (it does
not mean B divided by A)
• The multiplication rule is concerned with the occurrence of both A and
B (the addition rule applies to the occurrence of either A or B)
3 Conditions:
-Events are mutually
exclusive
-Events are independent
-Events are dependent
Multiplication Rule: Mutually Exclusive Events
• If A and B are mutually exclusive,
then p(A and B) = 0
• Because when events are mutually
exclusive, the occurrence of one
precludes the occurrence of the other.
The probability of their joint
occurrence is zero.
Multiplication Rule: Independent Events
• Two events are independent if the occurrence of one has no effect
on the probability of occurrence of the other.
• Sampling with replacement illustrates this condition well. For
example, suppose we are going to draw two cards, one at a time,
with replacement, from a deck of ordinary playing cards.
• We can let A be the card drawn first and B be the card drawn
second. Since A is replaced before drawing B, the occurrence of A
on the first draw has no effect on the probability of occurrence of B.
• If A and B are independent, then the probability of B occurring is
unaffected by A. Therefore p(B|A) = p(B), and
p(A and B) = p(A) p(B|A) = p(A) p(B)
Suppose we are going to randomly draw two cards, one at a time,
with replacement, from a deck of ordinary playing cards. What is
the probability both cards will be aces?
• Since the problem requires an ace on the first draw and an ace on
the second draw, the multiplication rule is appropriate. We can
let A be an ace on the first draw and B be an ace on the second
draw. Since sampling is with replacement, A and B are independent.
• Thus, p(an ace on first draw and an ace on second draw) = p(an
ace on first draw) p(an ace on second draw).
• There are four aces possible on the first draw, four aces possible
on the second draw (sampling is with replacement), and 52 cards
in the deck, so
• p(an ace on first draw) = 4/52
• p(an ace on second draw) = 4/52
• Thus, p(an ace on first draw and an ace on second draw) =
(4/52)(4/52) = 16/2704 = 0.0059
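The result can be double-checked by listing every ordered pair of draws with replacement. The simplified deck below (4 cards labeled "ace", 48 labeled "other") is an assumption made to keep the sketch short.

```python
from fractions import Fraction
from itertools import product

# Simplified deck: only whether a card is an ace matters here.
deck = ["ace"] * 4 + ["other"] * 48

# With replacement, the second draw sees the full deck again,
# so every ordered pair (first, second) is possible: 52 * 52 of them.
pairs = list(product(deck, repeat=2))
both_aces = sum(1 for first, second in pairs if first == "ace" and second == "ace")

by_counting = Fraction(both_aces, len(pairs))
by_rule = Fraction(4, 52) * Fraction(4, 52)  # p(A) p(B) for independent events

print(both_aces, len(pairs))     # 16 2704
print(round(float(by_rule), 4))  # 0.0059
```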
• The multiplication rule with independent
events also applies to situations in which there
are more than two events. In such cases, the
probability of the joint occurrence of the events
is equal to the product of the individual
probabilities of each event.
• p(A and B and C and ... Z) = p(A) p(B) p(C) ... p(Z)
- multiplication rule with more than two
independent events.
Multiplication Rule: Dependent Events
• When A and B are dependent, the
probability of occurrence of B is affected
by the occurrence of A. We must use the
multiplication rule in its original form.
• Thus, if A and B are dependent: p(A and
B) = p(A) p(B|A)
• Sampling without replacement provides
a good illustration for dependent events.
Suppose you are going to draw two cards, one at a time, without
replacement, from a deck of ordinary playing cards. What is the
probability both cards will be aces?
• We can let A be an ace on the first draw and B be an ace on the
second draw. Since sampling is without replacement (whatever card is
picked the first time is kept out of the deck), the occurrence of A does
affect the probability of B. A and B are dependent.
• Since the problem asks for an ace on the first draw and an ace on the
second draw, and these events are dependent, the multiplication rule
with dependent events is appropriate.
• Thus, p(an ace on first draw and an ace on second draw) = p(an ace on
first draw) p(an ace on second draw, given an ace was obtained on
first draw).
• For the first draw, p(an ace on first draw) = 4/52.
• Since sampling is without replacement, p(an ace on second draw
given an ace on first draw) = 3/51.
• Thus, p(an ace on first draw and an ace on second draw) =
(4/52)(3/51) = 12/2652 = 0.0045
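As a check on the dependent case, the ordered pairs of distinct cards can be enumerated directly. In this sketch cards are numbered 0 to 51 and cards 0 to 3 are taken to be the aces, which is purely an illustrative convention.

```python
from fractions import Fraction
from itertools import permutations

cards = range(52)
ACES = {0, 1, 2, 3}  # assumed labeling: cards 0..3 are the four aces

# Without replacement, the second card must differ from the first,
# so the possible outcomes are the 52 * 51 ordered pairs of distinct cards.
pairs = list(permutations(cards, 2))
both_aces = sum(1 for first, second in pairs if first in ACES and second in ACES)

by_counting = Fraction(both_aces, len(pairs))
by_rule = Fraction(4, 52) * Fraction(3, 51)  # p(A) p(B|A)

print(both_aces, len(pairs))     # 12 2652
print(round(float(by_rule), 4))  # 0.0045
```

The only change from the with-replacement case is the second factor: 3/51 instead of 4/52.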
• Like the multiplication rule with independent
events, the multiplication rule with dependent
events also applies to situations in which there
are more than two events. In such cases, the
equation becomes:
• p(A and B and C and ... Z) = p(A)p(B|A)p(C|AB)... p(Z|ABC...)
- multiplication rule with more
than two dependent events.
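The chained form can be sketched as a running product in which each factor is conditioned on all earlier draws. The function `p_all_aces` and the three-ace case are illustrations added here, not examples from the slides.

```python
from fractions import Fraction

def p_all_aces(k, aces=4, deck_size=52):
    """p(A and B and C ...) = p(A) p(B|A) p(C|AB) ... for k successive aces
    drawn without replacement."""
    prob = Fraction(1)
    for drawn in range(k):
        # After `drawn` aces are removed, both the aces left and the deck shrink.
        prob *= Fraction(aces - drawn, deck_size - drawn)
    return prob

print(p_all_aces(2))  # 1/221, the same as 12/2652
print(p_all_aces(3))  # 1/5525, i.e. (4/52)(3/51)(2/50)
```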
• Combination of Multiplication and Addition Rules
Probability and Continuous Variables
• The variables considered so far have been discrete, such as sampling
from a deck of cards or rolling a pair of dice.
• Many dependent variables that are evaluated in
experiments are continuous, not discrete.
• Probability of A with a continuous variable:
$$p(A) = \frac{\text{Area under the curve corresponding to } A}{\text{Total area under the curve}}$$
Suppose we have measured the weights of all the freshmen
women at this university. Let's assume this is a population set
of scores that is normally distributed, with a mean of 120
pounds and a standard deviation of 8 pounds. If we randomly
sampled one score from the population, what is the probability
it would be equal to or greater than a score of 134?
• The scores are normally distributed, so we can find this
proportion by converting the raw score to its z-transformed
value and then looking up the area in the z table.
$$z = \frac{X - \mu}{\sigma} = \frac{134 - 120}{8} = \frac{14}{8} = 1.75$$
$$p(X \geq 134) = 0.0401$$
• The main difference is that the problem has been cast in
terms of probability rather than asking for the proportion or
percentage of scores, as was done previously.
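The z-table lookup can be reproduced with Python's standard library, shown here as a sketch rather than a replacement for the table method. `statistics.NormalDist` supplies the normal cumulative distribution; the variable names are illustrative.

```python
from statistics import NormalDist

# Population of weights: normal with mean 120 and standard deviation 8.
weights = NormalDist(mu=120, sigma=8)

# z-transformed value of the raw score 134.
z = (134 - weights.mean) / weights.stdev

# Area under the curve at or above 134, divided by the total area (which is 1).
p_at_least_134 = 1 - weights.cdf(134)

print(z)                         # 1.75
print(round(p_at_least_134, 4))  # 0.0401
```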
The addition rule: probability of A or B
p(A or B) = p(A) + p(B) - p(A and B)
*If the events are mutually exclusive:
p(A or B) = p(A) + p(B)
*If the events are mutually exclusive and exhaustive:
p(A) + p(B) = 1.00
The multiplication rule: probability of A and B
p(A and B) = p(A)p(B|A)
*If the events are mutually exclusive:
p(A and B) = 0
*If the events are independent:
p(A and B) = p(A)p(B)
*If the events are dependent, we must use the general equation:
p(A and B) = p(A)p(B|A)
