Unit III - Update
Independent Events
The occurrence of one event has no effect on the probability that
the other event will occur.
Multiplication Rule
Multiply together the separate probabilities of several independent
events to find the probability that these events will occur together.
7. Narrate the symbols used for the mean and standard deviation of
three types of Distributions.
19. What is the use of one-tailed and two-tailed tests in hypothesis testing? When should each be used?
One and Two-Tailed Tests are ways to identify the relationship between the statistical
variables.
For checking the relationship between variables in a single direction (Left or Right
direction), use a one-tailed test.
A two-tailed test is used to check whether the relations between variables are in any
direction or not.
Example
Type I error (false positive): the test result says you have
coronavirus, but you actually don’t.
Type II error (false negative): the test result says you don’t
have coronavirus, but you actually do.
For one-tailed, we use either > or < sign for the alternative hypothesis. For two-tailed, we use ≠
sign for the alternative hypothesis.
When the alternative hypothesis specifies a direction then we use a one-tailed test. If no
direction is given then we will use a two-tailed test
If we require a 100(1−α)% confidence interval we have to make some adjustments when using
a two-tailed test.
At the .05 level of significance, the critical value for a one-tailed test is 1.645. So the critical region is Z<−1.645 for a
left-tailed test and Z>1.645 for a right-tailed test. For a two-tailed test at the same level, the critical value is 1.96. So
the retention region is |Z|<1.96 and the critical regions are where |Z|>1.96.
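The critical values quoted above can be checked numerically. The following is a minimal sketch, assuming a significance level of α = .05 and using the standard normal distribution from Python's standard library; the helper name `critical_z` is chosen only for this illustration.

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed):
    """Return the positive critical z value for significance level alpha."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail_area)

one_tailed = critical_z(0.05, two_tailed=False)
two_tailed = critical_z(0.05, two_tailed=True)
print(round(one_tailed, 3), round(two_tailed, 2))
```

For a left-tailed test the same value is used with a negative sign.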
The central limit theorem says that the sampling distribution of the mean will be
approximately normal, as long as the sample size is large enough.
Stated more fully: irrespective of a random variable's distribution, if large
enough samples are drawn from the population, then the sampling distribution of the mean for
that random variable will approximate a normal distribution.
As a rule of thumb, a sample size of 30 or more is usually treated as large enough.
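A small simulation can make the theorem concrete. The sketch below is only an illustration under stated assumptions: the population is a hypothetical, strongly skewed (exponential) one, the sample size is 30, and the seed is fixed so the run is repeatable. It compares the mean and spread of many sample means with the values the theory predicts.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population (exponential), not taken from the notes.
population = [rng.expovariate(1.0) for _ in range(100_000)]

n = 30  # sample size
# Draw 5,000 random samples of size n and record each sample mean.
sample_means = [mean(rng.choices(population, k=n)) for _ in range(5_000)]

pop_mean = mean(population)
pop_sd = pstdev(population)
print("population mean:      ", round(pop_mean, 3))
print("mean of sample means: ", round(mean(sample_means), 3))
print("sigma / sqrt(n):      ", round(pop_sd / n ** 0.5, 3))
print("sd of sample means:   ", round(pstdev(sample_means), 3))
```

The mean of the sample means should sit close to the population mean, and their standard deviation close to the population standard deviation divided by the square root of n, even though the population itself is far from normal.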
PART B
Sample
Any subset of observations from a population.
The sample size is small relative to the population size.
Example 3.1
For each of the following pairs, indicate with a Yes or No
whether the relationship between the first and second
expressions could describe that between a sample and its
population, respectively.
(a) students in the last row; students in class
(b) citizens of Wyoming; citizens of New York
(c) 20 lab rats in an experiment; all lab rats, similar to
those used, that could undergo the same experiment
(d) all U.S. presidents; all registered Republicans
(e) two tosses of a coin; all possible tosses of a coin
Solution
(a) Yes
(b) No. Citizens of Wyoming aren’t a subset of citizens of New York.
(c) Yes
(d) No. All U.S. presidents aren’t a subset of all registered Republicans.
(e) Yes
Example 3.2
Identify all of the expressions from Example 3.1 that involve
a hypothetical population.
Solution
Expressions in 3.1(c) and 3.1(e) involve hypothetical populations.
Random Sampling
A selection process that guarantees all potential observations in
the population have an equal chance of being selected.
Inferential statistics requires that samples be random.
Example 3.3
Indicate whether each of the following statements is True or False.
A random selection of 10 playing cards from a deck of 52 cards implies that
(a) the random sample of 10 cards accurately represents the
important features of the whole deck.
(b) each card in the deck has an equal chance of being selected.
(c) it is impossible to get 10 cards from the same suit (for
example, 10 hearts).
(d) any outcome, however unlikely, is possible.
Solution
a. False. Sometimes, just by chance, a random sample of 10 cards
fails to represent the important features of the whole deck.
b. True
c. False. Although unlikely, 10 hearts could appear in a random
sample of 10 cards.
d. True
Example 3.4
Describe how you would use the table of random numbers to take
a. a random sample of five statistics students in a classroom
where each of nine rows consists of nine seats.
b. a random sample of size 40 from a large directory consisting
of 3041 pages, with 480 lines per page.
Solution
a. There are many ways. For instance, consult the tables of random numbers,
using the first digit of each 5-digit random number to identify the row
(previously labelled 1, 2, 3, and so on), and the second digit of the same
random number to locate a particular student’s seat within that row.
Repeat this process until five students have been identified. (If the
classroom is larger, use additional digits so that every student can be
sampled.)
b. Once again, there are many ways. For instance, use the initial 4 digits of each random
number (between 0001 and 3041) to identify the page number of the telephone
directory and the next 3 digits (between 001 and 480) to identify the particular line on
that page, skipping any random number whose digits fall outside these ranges. Repeat this
process, using 7-digit numbers, until 40 telephone numbers have been identified.
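The procedure in part (b) can be sketched in code, with a pseudo-random number generator standing in for the printed table of random numbers. The function name `draw_directory_lines` and the seed are assumptions made for this illustration only.

```python
import random

def draw_directory_lines(sample_size, pages=3041, lines_per_page=480, seed=None):
    """Pick (page, line) pairs by reading 7-digit random numbers and
    skipping any whose page or line part falls outside the valid range."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        seven_digits = f"{rng.randrange(10_000_000):07d}"
        page = int(seven_digits[:4])   # first 4 digits: page number
        line = int(seven_digits[4:])   # next 3 digits: line on that page
        if 1 <= page <= pages and 1 <= line <= lines_per_page:
            chosen.append((page, line))
    return chosen

sample = draw_directory_lines(40, seed=1)
print(sample[:3])
```

Most 7-digit numbers are rejected because they point past the last page or the last line, which is expected with this scheme.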
Probability
The proportion or fraction of times that a particular event is
likely to occur.
Independent Events
The occurrence of one event has no effect on the probability that
the other event will occur.
Multiplication Rule
Multiply together the separate probabilities of several independent events to find the
probability that these events will occur together.
Dependent Events
When the occurrence of one event affects the probability of the
other event, these events are dependent.
Although the heights of randomly selected pairs of men are
independent, the heights of brothers are dependent.
Conditional Probability
The probability of one event, given the occurrence of another event.
Itemize all possible random samples, each of size two, that could
be taken from this population.
There are four possibilities on the first draw from the population and also four
possibilities on the second draw from the population, as indicated in Table 3.1.
Table 3.1 - All possible samples of size two from a miniature population
FIGURE 3.3
Emergence of the sampling distribution of the mean from all possible
samples
Example 3.8
Without peeking, list the special symbols for the
(a) mean of the population
(b) mean of the sampling distribution of the mean
(c) mean of the sample
(d) standard error of the mean
(e) standard deviation of the sample
(f) standard deviation of the population
Example 3.9
Imagine a very simple population consisting of only five
observations: 2, 4, 6, 8, 10.
(a) List all possible samples of size two.
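One way to carry out the listing is by enumeration. The sketch below assumes sampling with replacement and treats the order of the two draws as distinct, which follows the convention described for Table 3.1; it is an illustration, not the worked solution from the notes.

```python
from itertools import product
from statistics import mean, pstdev

population = [2, 4, 6, 8, 10]

# Ordered samples of size two, drawn with replacement: 5 * 5 = 25 samples.
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

print(len(samples))                   # number of possible samples
print(mean(sample_means))             # mean of all sample means
print(pstdev(sample_means))           # standard deviation of the sample means
print(pstdev(population) / 2 ** 0.5)  # sigma / sqrt(n), with n = 2
```

The mean of all sample means equals the population mean, and the standard deviation of the sample means matches the population standard deviation divided by the square root of the sample size.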
Example 3.10
Indicate whether the following statements are True or False. The
Example 3.11
Indicate whether the following statements are True or False.
The central limit theorem
a. states that, with sufficiently large sample sizes, the shape of
the population is normal.
b. states that, regardless of sample size, the shape of the
sampling distribution of the mean is normal.
c. ensures that the shape of the sampling distribution of the
mean equals the shape of the population.
d. applies to the shape of the sampling distribution, not to the shape of the
population and not to the shape of the sample.
Defining Hypotheses
Null hypothesis (H0):
In statistics, the null hypothesis is a general statement or default
position that there is no relationship between two measured cases
or no relationship among groups. In other words, it is a basic
assumption made on the basis of knowledge of the problem.
Example:
A company’s mean production is 50 units per
day. H0: μ = 50.
Alternative hypothesis (H1):
The alternative hypothesis is the hypothesis used in
hypothesis testing that is contrary to the
null hypothesis. Example: company’s production is
Figure 3.6 - One possible set of common and rare outcomes (values of X).
Figure 3.6 shows one possible set of boundaries for common and rare
outcomes, expressed in values of X.
If the one observed sample mean is located between 478 and 522, it
will qualify as a common outcome, and the null hypothesis will be
retained.
If, however, the one observed sample mean is greater than 522 or less
than 478, it will qualify as a rare outcome, and the null hypothesis
will be rejected.
Critical z Score
A z score that separates common from rare outcomes and
hence dictates whether H0 should be retained or rejected.
Example 3.14
First using words, then symbols, identify the null hypothesis
for each of the following situations.
a. A school administrator wishes to determine whether sixth-grade
boys in her school district differ, on average, from the national
norms of 10.2 pushups for sixth-grade boys.
b. A consumer group investigates whether, on average, the true
weights of packages of ground beef sold by a large supermarket
chain differ from the specified 16 ounces.
c. A marriage counselor wishes to determine whether, during
a standard conflict-resolution session, his clients differ, on
average, from the 11 verbal interruptions reported for
“well-adjusted couples.”
Example 3.15
For each of the following situations, indicate whether H0 should
be retained or rejected and justify your answer by specifying the
precise relationship between observed and critical z scores. Should
H0 be retained or rejected, given a hypothesis test with critical z
scores of ±1.96 and
One and Two-Tailed Tests are ways to identify the relationship between
the statistical variables.
For checking the relationship between variables in a single direction
(Left or Right direction), use a one-tailed test.
A two-tailed test is used to check whether the relations between variables
are in any direction or not.
Figure 3.8 a. One-Tailed or Directional Test (Lower Tail Critical)
Figure 3.8 b. One-Tailed or Directional Test (Upper Tail Critical)
Figure 3.10 shows rejection regions that are associated with both
tails of the hypothesized sampling distribution.
The corresponding decision rule, with its pair of critical z scores of
±1.96, is referred to as a two-tailed or nondirectional test.
One-tailed test
Critical region lies entirely on either the right side or the left side of the sampling distribution.
Rejection region is either on the left side or on the right side of the sampling distribution.
It checks the relation between the variables in a single direction.
It is used to check whether one mean is greater than (or less than) another mean.
Two-tailed test
Critical region is given by the portion of the area lying in both tails of the probability curve of the test statistic.
Rejection region is on both sides, i.e. left and right, of the sampling distribution.
It checks the relation between the variables in any direction.
It is used to check whether two means differ from one another, in either direction.
Example 3.17
For each of the following situations, indicate whether H0 should
be retained or rejected.
Given a one-tailed test, lower tail critical with α = .01, and
(a) z = – 2.34 (b) z = – 5.13 (c) z = 4.04
Given a one-tailed test, upper tail critical with α = .05, and
(d) z = 2.00 (e) z = – 1.80 (f) z = 1.61
a. Reject H0 at the .01 level of significance because z = –2.34 is more
negative than –2.33.
b. Reject H0 at the .01 level of significance because z = –5.13 is more
negative than –2.33.
c. Retain H0 at the .01 level of significance because z = 4.04 is less negative
than –2.33. (The value of the observed z is in the direction of no concern.)
d. Reject H0 at the .05 level of significance because z = 2.00 is more
positive than 1.65.
e. Retain H0 at the .05 level of significance because z = –1.80 is less positive
than 1.65. (The value of the observed z is in the direction of no concern.)
f. Retain H0 at the .05 level of significance because z = 1.61 is less
positive than 1.65.
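The decision rule applied in these answers can be written as a short function. This is a minimal sketch: the function name `decide` and the tail labels "lower", "upper", and "two" are chosen for this illustration, and the critical values come from the standard normal distribution rather than a printed table.

```python
from statistics import NormalDist

def decide(z, alpha, tail):
    """Return 'reject' or 'retain' for H0.

    tail is 'lower', 'upper', or 'two' (labels chosen for this sketch).
    """
    std_normal = NormalDist()
    if tail == "lower":
        return "reject" if z <= std_normal.inv_cdf(alpha) else "retain"
    if tail == "upper":
        return "reject" if z >= std_normal.inv_cdf(1 - alpha) else "retain"
    if tail == "two":
        return "reject" if abs(z) >= std_normal.inv_cdf(1 - alpha / 2) else "retain"
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

# The six cases (a) to (f) from Example 3.17.
cases = [(-2.34, 0.01, "lower"), (-5.13, 0.01, "lower"), (4.04, 0.01, "lower"),
         (2.00, 0.05, "upper"), (-1.80, 0.05, "upper"), (1.61, 0.05, "upper")]
decisions = [decide(z, alpha, tail) for z, alpha, tail in cases]
print(decisions)
```

An observed z in the direction of no concern, such as case (c), is retained however extreme it is, because a one-tailed test places the whole rejection region in the other tail.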
Example 3.18
Specify the decision rule for each of the following situations
(referring to Table to find critical z values):
(a) a two-tailed test with α = .05
(b) a one-tailed test, upper tail critical, with α = .01
(c) a one-tailed test, lower tail critical, with α = .05
(d) a two-tailed test with α = .01
a. Reject H0 at the .05 level of significance if z equals or is more positive than
1.96 or if z equals or is more negative than –1.96.
b. Reject H0 at the .01 level of significance if z equals or is more
positive than 2.33.
c. Reject H0 at the .05 level of significance if z equals or is more negative than
–1.65.
d. Reject H0 at the .01 level of significance if z equals or is more positive
than 2.58 or if z equals or is more negative than –2.58.
The best single point estimate for the unknown population mean
is simply the observed value of the sample mean.
Example 3.19
A random sample of 200 graduates of U.S. colleges reveals a
mean annual income of $62,600. What is the best estimate of the
unknown mean annual income for all graduates of U.S.
colleges?
$62,600
Example 3.20
Reading achievement scores are obtained for a group of fourth graders. A score of 4.0
indicates a level of achievement appropriate for fourth grade, a score
below 4.0 indicates underachievement, and a score above
4.0 indicates overachievement. Assume that the population
standard deviation equals 0.4. A random sample of 64 fourth graders reveals a mean
achievement score of 3.82.
a. Construct a 95 percent confidence interval for the unknown population mean.
(Remember to convert the standard deviation to a standard error.)
b. Interpret this confidence interval; that is, do you find any consistent evidence either of
overachievement or of underachievement?
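Solution
a. The standard error equals 0.4/√64 = 0.05, so the 95 percent confidence interval is
3.82 ± (1.96)(0.05), that is, 3.72 to 3.92.
b. We can claim, with 95 percent confidence, that the interval between 3.72 and 3.92 includes the true
population mean reading score for the fourth graders. All of these values suggest that, on average, the fourth
graders are underachieving.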
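The interval for this example can be reproduced with a few lines of code. This is a sketch under the example's own assumptions (known population standard deviation, so a z-based interval); the function name `z_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    standard_error = sigma / sqrt(n)
    # z value that leaves (1 - confidence) / 2 in each tail
    z_conf = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_conf * standard_error
    return sample_mean - margin, sample_mean + margin

lower, upper = z_confidence_interval(3.82, 0.4, 64)
print(round(lower, 2), round(upper, 2))
```

Because the whole interval lies below 4.0, it supports the reading of underachievement given in the solution.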
Example 3.21
Before taking the GRE, a random sample of college seniors received special training on how to take
the test. After analysing their scores on the GRE, the investigator reported a dramatic gain, relative to
the national average of 500, as indicated by a 95 percent confidence interval of 507 to 527. Are the
following interpretations true or false?
a. About 95 percent of all subjects scored between 507 and 527.
b. The interval from 507 to 527 refers to possible values of the population mean for all students who
undergo special training.
c. The true population mean definitely is between 507 and 527.
d. This particular interval describes the population mean about 95 percent of the time.
e. In practice, we never really know whether the interval from 507 to 527 is true or false.
f. We can be reasonably confident that the population mean is between 507 and 527.
Solution:
(a) False. We can be 95 percent confident that the mean for all subjects will be between 507 and 527.
(b) True
(c) False. We can be reasonably confident, but not absolutely confident, that the true population mean
lies between 507 and 527.
(d) False. This particular interval either describes the one true population mean or fails to describe the
one true population mean.
(e) True
(f) True
LEVEL OF CONFIDENCE
The level of confidence indicates the percent of time that a series
of confidence intervals includes the unknown population
characteristic, such as the population mean.
Any level of confidence may be assigned to a confidence interval merely
by substituting an appropriate value for zconf in Formula
Example 3.22
On the basis of a random sample of 120 adults, a pollster
reports, with 95 percent confidence, that between 58 and 72
percent of all Americans believe in life after death.
a. If this interval is too wide, what, if anything, can be
done with the existing data to obtain a narrower
confidence interval?
b. What can be done to obtain a narrower 95 percent
confidence interval if another similar investigation is
being planned?
a. Switch to an interval having a lesser degree of confidence,
such as 90 percent or 75 percent.
b. Increase the sample size.
Example 3.23
In a recent scientific sample of about 900 adult Americans, 70 percent favour stricter gun control of
assault weapons, with a margin of error of ±4 percent for a 95 percent confidence interval. Therefore, the
95 percent confidence interval equals 66 to 74 percent. Indicate whether the following interpretations are
true or false:
a. The interval from 66 to 74 percent refers to possible values of the sample percent.
c. In the long run, a series of intervals similar to this one would fail to include the
population percent about 5 percent of the time.
d. We can be reasonably confident that the population percent is between 66 and 74
percent.
Solution
(a) False. The interval from 66 to 74 percent refers to possible values of the population
proportion.
(b) False. We can be reasonably confident, but not absolutely confident, that the true
population proportion is between 66 and 74 percent.
(c) True
(d) True
Example 3.23
For the population at large, the Wechsler Adult Intelligence Scale
is designed to yield a normal distribution of test scores with a mean
of 100 and a standard deviation of 15. School district officials
wonder whether, on the average, an IQ score different from 100
describes the intellectual aptitudes of all students in their district.
Wechsler IQ scores are obtained for a random sample of 25 of their
students, and the mean IQ is found to equal 105. Using the step-by-
step procedure, test the null hypothesis at the .05 level of
significance.
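The calculation behind this test can be sketched as follows. The numbers come from the example; the choice of a two-tailed test follows from the wording "different from 100", and the variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu_hyp, sigma, n, sample_mean, alpha = 100, 15, 25, 105, 0.05

standard_error = sigma / sqrt(n)                 # 15 / 5 = 3
z = (sample_mean - mu_hyp) / standard_error      # (105 - 100) / 3
critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical z
decision = "reject H0" if abs(z) >= critical else "retain H0"
print(round(z, 2), round(critical, 2), decision)
```

Since the observed z of about 1.67 is less positive than the critical z of 1.96, H0 is retained at the .05 level: the sample gives no evidence that the district mean differs from 100.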
Example 3.24
Consult the power curves in Figure 11.7 to estimate the approximate
detection rates, rounded to the nearest tenth, for the
following situations:
(a) a three-point effect, with a sample size of 29
(b) a six-point effect, with a sample size of 13
(c) a twelve-point effect, with a sample size of 13
(a) .3
(b) .4
(c) .9
Example 3.25
An investigator consults a chart to determine the sample size
required to detect an eight-point effect with a probability of .80.
What happens to this detection rate of .80—will it actually be smaller,
the same, or larger—if, unknown to the investigator, the true effect
actually equals
(a) twelve points?
(b) five points?
a. The power for the 12-point effect is larger than .80 because the true
sampling distribution is shifted further into the rejection region for
the false H0.
b. The power for the 5-point effect is smaller than .80 because the true
sampling distribution is shifted further into the retention region for
the false H0.
Example 3.26
(b) We can claim, with 99 percent confidence, that the interval between
$78,552 and $81,648 includes the true population mean salary for all
female members of the American Psychological Association. All of
these values suggest that, on average, females’ salaries are less than
males’ salaries.
Example 3.27
Imagine that one of the following 95 percent confidence intervals estimates the
effect of vitamin C on IQ scores:
(a) Which one most strongly supports the conclusion that
vitamin C increases IQ scores?
(b) Which one implies the largest sample size?
(c) Which one most strongly supports the conclusion that
vitamin C decreases IQ scores?
(d) Which one would most likely stimulate the investigator to
conduct an additional experiment using larger sample
sizes?
Solution:
(a) 3 (b) 1 (c) 5 (d) 4
Step 5:
Conditional Probability of an improved relationship, given a couple has children
Out of the 50 couples who had children, 45 couples described their relationships as improved. So, the
conditional probability of an improved relationship, given a couple has children, is calculated as:
45/50=0.9 or 90%.
8. The probability of a boy being born equals 0.50, or 1/2, as does the
probability of a girl being born. For a randomly selected family with two
children, what's the probability of
Two boys, that is, a boy and a boy? (3)
Two girls? (2)
Either two boys or two girls? (2) (Nov/Dec 2023)
Step 1: Analyze the probability of each event
According to the problem, the probability for a child to be a boy or a girl is
0.50 or 1/2. Since having a boy or a girl are independent events, the probability
of having two boys or two girls is the product of their individual probabilities
(multiplication rule).
Step 2: Compute the probability of two boys
The probability of having a boy is 1/2 and since the births are independent, the
probability of having two boys is (1/2)∗(1/2)=1/4.
Step 3: Compute the probability of two girls
Similar to Step 2, the probability of having two girls is
(1/2)∗(1/2)=1/4.
Step 4: Compute the probability of either two boys or two girls
Two boys and two girls are mutually exclusive events, meaning they can't occur
simultaneously. Hence, the probability of having either two boys or two girls is the sum
of their individual probabilities (addition rule). So, the probability is 1/4+1/4=1/2.
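The four steps can be checked with exact fractions. This is a small sketch of the multiplication and addition rules as used in this solution; the variable names are chosen for illustration.

```python
from fractions import Fraction

p_boy = Fraction(1, 2)
p_girl = Fraction(1, 2)

# Multiplication rule for independent events.
p_two_boys = p_boy * p_boy
p_two_girls = p_girl * p_girl

# Addition rule for mutually exclusive events.
p_same_sex = p_two_boys + p_two_girls
print(p_two_boys, p_two_girls, p_same_sex)
```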
9. The normal range for a widely accepted measure of body size, the Body Mass
Index (BMI), ranges from 18.5 to 25. Using the midrange BMI score of 21.75
as the null hypothesized value for the population mean, test this
hypothesis at the .01 level of significance given a random sample of 30
weight-watcher participants who show a mean BMI = 22. and a
standard deviation of 3.1. (6) (Nov/Dec 2023)
Solution
Do not reject the null hypothesis.
At the .01 level of significance, the p-value is 0.433, which is
greater than the level of significance of .01.
No difference between the BMI and body size of the 30 weight -
10. State any two reasons why the research hypothesis is not tested directly.
Explain them in brief. (7) (Nov/Dec 2023)
The research hypothesis is not tested directly because it is
difficult to prove that a specific effect or relationship exists.
Instead, researchers test the null hypothesis, which states that
there is no effect or relationship.
11. Imagine that one of the following 95 percent confidence intervals estimates the effect of
vitamin C on IQ score.
(Apr/May 2024)
95% Confidence Interval    Lower Limit    Upper Limit
1                          100            102
2                          95             99
3                          102            106
4                          90             111
5                          91             98
i) Which one most strongly supports the conclusion that vitamin C increases IQ scores? (4)
ii) Which one implies the largest sample size? (3)
iii) Which one most strongly supports the conclusion that vitamin C decreases IQ scores?
(3)
iv) Which one would most likely stimulate the investigator to conduct an additional
experiment using a larger sample size? (3)
Solution :
12. Exemplify in detail the significance of the z-test, its procedure, and its decision rule with an
example. (Apr/May 2024) (6)
Z-Test
A z-test is a statistical test used to determine whether two population means are different
when the variances are known and the sample size is large.
It can also be used to compare one mean to a hypothesized value.
A z-test is a hypothesis test for data that follows a normal distribution.
A z-statistic, or z-score, is a number representing the result from the z-test.
Z-tests are closely related to t-tests, but t-tests are best performed when an experiment has
a small sample size.
Z-tests assume the standard deviation is known, while t-tests assume it is unknown.
z-test Example
A gym trainer claimed that all the new boys in the gym are above average weight. A random
sample of thirty boys has a mean weight of 112.5 kg; the population mean weight is
100 kg and the standard deviation is 15 kg.
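A possible working of this example is sketched below. Two things are assumptions rather than part of the stated problem: the significance level of .05, which the example does not give, and the use of a one-tailed, upper tail critical test, chosen because the claim is that the boys are above average weight.

```python
from math import sqrt
from statistics import NormalDist

mu_hyp, sigma, n, sample_mean = 100, 15, 30, 112.5
alpha = 0.05  # assumed; the example does not state a significance level

standard_error = sigma / sqrt(n)
z = (sample_mean - mu_hyp) / standard_error
critical = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical z
decision = "reject H0" if z >= critical else "retain H0"
print(round(z, 2), round(critical, 3), decision)
```

Under these assumptions the observed z is far more positive than the critical z, so H0 (mean weight equals 100 kg) would be rejected in favour of the trainer's claim.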
13. A study finds that racism in cricket more often takes place when the game is played in
England or Australia or New Zealand (say EAN countries). Given that
Racism takes place or Game is played in EAN is 9/13.
Racism takes place or Game is played in EAN is 5/7.
Game is played in EAN given that racism takes place is 4/5
Solution:
GPAs for all students at some college have a mean of 3.01 and a
median of 3.20.
15. Assume that we have a stream of items of large and unknown length
that we can only iterate over once. Devise an effective sampling
algorithm that randomly chooses an item from this stream such that
each item is equally likely to be selected. (Apr/May2024)
Reservoir Sampling
Let us assume we have to sample 5 objects out of an infinite stream
such that each element has an equal probability of getting selected.
import random

def generator(max):
    # Yield the numbers 2, 3, ..., max as a simple stand-in for a stream
    number = 1
    while number < max:
        number += 1
        yield number

# Create a stream generator
stream = generator(10000)

# Doing Reservoir Sampling from the stream
k = 5
reservoir = []
for i, element in enumerate(stream):
    if i + 1 <= k:
        # Fill the reservoir with the first k items
        reservoir.append(element)
    else:
        probability = k / (i + 1)
        if random.random() < probability:
            # Select item in stream and remove one of the k items already selected
            reservoir[random.choice(range(0, k))] = element

print(reservoir)
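The question itself asks for a single item, which is the special case k = 1. The following is a minimal sketch of that case, not the listing above: the function name `pick_one` is hypothetical, and the short repeated trial at the end is only an informal check that each item is chosen about equally often.

```python
import random

def pick_one(stream, rng=random):
    """Return one item chosen uniformly at random from a single pass over stream."""
    chosen = None
    for count, item in enumerate(stream, start=1):
        # Replace the current choice with probability 1/count.
        if rng.randrange(count) == 0:
            chosen = item
    return chosen

# Informal check: over many runs, each of the four items should be
# picked roughly a quarter of the time.
rng = random.Random(42)
counts = {item: 0 for item in "abcd"}
for _ in range(20_000):
    counts[pick_one(iter("abcd"), rng)] += 1
print(counts)
```

The first item is always taken, the second replaces it with probability 1/2, the third with probability 1/3, and so on, so after n items each one has survived with probability 1/n. Only the current choice and a counter are stored, so the length of the stream never needs to be known.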