Statistics and Probability2021 - Quarter 3 2


STATISTICS AND PROBABILITY
Level: SENIOR HIGH SCHOOL Semester: SECOND
Subject Group: CORE SUBJECT Quarter: THIRD

Course Description:
At the end of the course, the students must know how to find the mean and variance of a
random variable, to apply sampling techniques and distributions, to estimate population mean
and proportion, to perform hypothesis testing on population mean and proportion, and to
perform correlation and regression analyses on real-life problems.

Course Requirements:
Below is the list of activities that must be completed and submitted with their corresponding
percentage.
WEEK | ACTIVITIES | Date of Completion | Final Grade
1 | Enabling Assessment Activity No.1 | January 14, 2022 | 10%
2 | Mini Performance Task 1 | January 21, 2022 | 15%
3 | Enabling Assessment Activity No.2 | January 28, 2022 | 10%
4 | Mini Performance Task 2 | February 4, 2022 | 15%
5 | Enabling Assessment Activity No.3 | February 11, 2022 | 10%
6 | Mini Performance Task 3 | February 18, 2022 | 15%
7 | Enabling Assessment Activity No.4 | February 25, 2022 | 10%
8 | Final Performance Task | March 4, 2022 | 15%
TOTAL | | | 100%

QUARTER 3 CULMINATING PERFORMANCE TASK


GOAL – Prepare a normal distribution graph that will help in your decision making.
ROLE - Statistician
AUDIENCE – A client who wants to have a beach vacation.
SITUATION – A client of your travel company is asking which month he should take his beach
vacation. As much as possible, your client does not want his vacation to coincide with the
peak season, when the beaches are crowded.
PRODUCT – Survey at least 50 students in your strand and identify the month of the year in which
they usually go to the beach. Based on the results, create a normal distribution graph and
identify the month in which your client should take his vacation.
STANDARDS – The recommendation will be assessed based on the following criteria:
Colegio de Los Baños – STATISTICS AND PROBABILITY 1

CRITERIA | PERCENTAGE
Relevance (The output contains timely information and reasonable type of vacation options) | 40%
Clarity of plan & process (The output shows clear data regarding the result of the survey) | 30%
Presentation of data (Data should be presented accurately and precisely based on formula) | 30%
Total | 100%

Week 1 and 2
DISCRETE RANDOM VARIABLE
INTRODUCTION
In this lesson, you will learn the difference between a discrete and a continuous random variable.

LEARNING MATERIALS: Module, pen, paper, internet (if applicable), scientific calculator

PRAYER: Father God, please guide me in the lesson today and help me grow in love and
kindness more like Jesus every day. AMEN

MELC: At the end of the lesson, you should be able to:

▪ Illustrate a random variable (discrete and continuous).
▪ Distinguish between a discrete and a continuous random variable.
▪ Find the possible values of a random variable.
▪ Illustrate a probability distribution for a discrete random variable and its properties.
▪ Compute probabilities corresponding to a given random variable.
▪ Illustrate the mean and variance of a discrete random variable.
▪ Calculate the mean and the variance of a discrete random variable.
▪ Interpret the mean and the variance of a discrete random variable.
▪ Solve problems involving mean and variance of probability distributions.

INSTITUTIONAL VALUES: Mastery of competencies, Problem solving

REVIEW: Basic Probability and Event


DEVELOPMENT
MOTIVATION - PROCESS QUESTIONS:
1. What is the difference between a discrete and a continuous variable?
2. What is the significance of mean and variance of a discrete random variable?

LESSON PROPER
Discrete and Continuous Random Variables:
A variable is a quantity whose value changes.
A discrete variable is a variable whose value is obtained by counting.

Examples: number of students present


number of red marbles in a jar
number of heads when flipping three coins
students’ grade level

A continuous variable is a variable whose value is obtained by measuring.

Examples: height of students in class


weight of students in class
time it takes to get to school
distance traveled between classes

A random variable is a variable whose value is a numerical outcome of a random phenomenon.

▪A random variable is denoted with a capital letter


▪The probability distribution of a random variable X tells what the possible values of X are
and how probabilities are assigned to those values
▪ A random variable can be discrete or continuous

A discrete random variable X has a countable number of possible values.

Example: A fair coin is tossed twice. Let X be the number of heads that are observed.
a. Construct the probability distribution of X.
b. Find the probability that at least one head is observed.

Solution:
a. The possible values that X can take are 0, 1, and 2. Each of these numbers corresponds
to an event in the sample space
S = {hh,ht,th,tt}

of equally likely outcomes for this experiment:

X = 0 to {tt}, X = 1 to {ht,th} and X = 2 to {hh}.

The probability of each of these events, hence of the corresponding value of X, can be
found simply by counting, to give

x 0 1 2
P(x) 0.25 0.5 0.25

This table is the probability distribution of X.

b. “At least one head” is the event X≥1, which is the union of the mutually exclusive
events X=1 and X=2. Thus

P(X≥1) = P(1) + P(2) = 0.50 + 0.25 = 0.75
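The counting in this example can be checked with a short program. The following is a minimal Python sketch, not part of the module's required work; the names outcomes, distribution, and p_at_least_one are chosen here for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: hh, ht, th, tt
outcomes = ["".join(toss) for toss in product("ht", repeat=2)]

# X = number of heads in each outcome; count how many outcomes give each value
counts = Counter(outcome.count("h") for outcome in outcomes)

# The outcomes are equally likely, so P(x) = (outcomes with x heads) / (all outcomes)
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

# "At least one head" is the event X >= 1
p_at_least_one = sum(p for x, p in distribution.items() if x >= 1)

for x, p in distribution.items():
    print(x, float(p))
print("P(X >= 1) =", float(p_at_least_one))
```

The printed values should agree with the table for P(x) and with the answer to part b.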



A histogram that graphically illustrates the probability distribution is given in Figure 1.

Figure 1. Probability Distribution for Tossing a Fair Coin Twice


Example 2: A pair of fair dice is rolled. Let X denote the sum of the number of dots on
the top faces.
a. Construct the probability distribution of X for a pair of fair dice.
b. Find P(X≥9).
c. Find the probability that X takes an even value.

The sample space of equally likely outcomes is

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

a. The possible values for X are the numbers 2 through 12.

X=2 is the event {(1,1)}, so P(2)=1/36.

X=3 is the event {(1,2), (2,1)}, so P(3)=2/36.

Continuing this way we obtain the following table

x 2 3 4 5 6 7 8 9 10 11 12
P(x) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
b. The event X≥9 is the union of the mutually exclusive events X=9, X=10, X=11,
and X=12. Thus

P(X≥9) = P(9) + P(10) + P(11) + P(12)
= 4/36 + 3/36 + 2/36 + 1/36 = 10/36
≈ 0.28

c. Before we immediately jump to the conclusion that the probability that X takes an even
value must be 0.5, note that X takes six different even values but only five different odd
values. We compute

P(X is even) = P(2) + P(4) + P(6) + P(8) + P(10) + P(12)


= 1/36 + 3/36 + 5/36 + 5/36 + 3/36 + 1/36
= 18/36 = 0.5
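The whole dice example can be reproduced by listing the 36 outcomes and counting. This is a minimal Python sketch for checking the answers; the variable names (rolls, dist, p_at_least_9, p_even) are illustrative and not taken from the module.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die)
rolls = list(product(range(1, 7), repeat=2))

# X = sum of the two top faces; count how many outcomes give each sum
counts = Counter(a + b for a, b in rolls)
dist = {s: Fraction(c, len(rolls)) for s, c in sorted(counts.items())}

# b. P(X >= 9) and c. P(X is even)
p_at_least_9 = sum(p for s, p in dist.items() if s >= 9)
p_even = sum(p for s, p in dist.items() if s % 2 == 0)

for s, p in dist.items():
    print(s, f"{p.numerator * (36 // p.denominator)}/36")
print("P(X >= 9) =", round(float(p_at_least_9), 4))
print("P(X even) =", float(p_even))
```

Working with exact fractions avoids rounding until the last step, which is useful when the answer is to be reported as both a fraction and a decimal.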

A continuous random variable X takes all values in a given interval of numbers.

▪The probability distribution of a continuous random variable is shown by a density curve.


▪The probability that X falls within an interval of numbers is the area under the density curve
between the interval endpoints.
▪The probability that a continuous random variable X is exactly equal to any single number is zero.

Mean, Variance and Standard Deviation

Example: Tossing a single unfair die


For fun, imagine a weighted die (cheating!) so we have these probabilities:
x 1 2 3 4 5 6
p 0.1 0.1 0.1 0.1 0.1 0.5

Mean or Expected Value: μ

When we know the probability p of every value x we can calculate the Expected Value
(Mean) of X:

μ = Σxp

To calculate the Expected Value:


1. multiply each value by its probability
2. sum them up

Example continued:
x 1 2 3 4 5 6
p 0.1 0.1 0.1 0.1 0.1 0.5
xp 0.1 0.2 0.3 0.4 0.5 3
μ = Σxp
= 0.1+0.2+0.3+0.4+0.5+3
= 4.5

The expected value is 4.5
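The two steps for the expected value translate directly into code. Below is a minimal Python sketch using the weighted-die probabilities from this example; the names values, probs, and mean are illustrative.

```python
# Values of the weighted die and their probabilities (from the example)
values = [1, 2, 3, 4, 5, 6]
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]

# A valid probability distribution must sum to 1
assert abs(sum(probs) - 1) < 1e-9

# Step 1: multiply each value by its probability; Step 2: sum them up
mean = sum(x * p for x, p in zip(values, probs))

print(round(mean, 4))
```

Note that the expected value does not have to be a value the die can actually show; it is the long-run average of many tosses.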

Variance: Var(X)

The Variance is:


Var(X) = Σx²p − μ²
1. square each value and multiply by its probability
2. sum them up and we get Σx²p
3. then subtract the square of the Expected Value μ²

Example continued:
x 1 2 3 4 5 6
p 0.1 0.1 0.1 0.1 0.1 0.5
x²p 0.1 0.4 0.9 1.6 2.5 18

Σx²p = 0.1 + 0.4 + 0.9 + 1.6 + 2.5 + 18 = 23.5

Var(X) = Σx²p − μ²
= 23.5 − (4.5)² = 3.25
The variance is 3.25

Standard Deviation: σ

The Standard Deviation is the square root of the Variance:

σ = √Var(X)

Example continued:
x 1 2 3 4 5 6
p 0.1 0.1 0.1 0.1 0.1 0.5
x²p 0.1 0.4 0.9 1.6 2.5 18

σ = √Var(X) = √3.25 = 1.803

The Standard Deviation is 1.803
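The variance and standard deviation of the weighted die can be verified the same way. The sketch below is a minimal Python check, assuming the same values and probabilities as the example; it also computes the variance from the definition Σ(x − μ)²p to show that the shortcut formula gives the same result. All variable names are illustrative.

```python
import math

values = [1, 2, 3, 4, 5, 6]
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]

# Expected value: sum of x * p
mean = sum(x * p for x, p in zip(values, probs))

# Shortcut formula: Var(X) = sum of x^2 * p, minus the square of the mean
second_moment = sum(x**2 * p for x, p in zip(values, probs))
variance = second_moment - mean**2

# Definition: Var(X) = sum of (x - mean)^2 * p; should agree with the shortcut
variance_by_definition = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# Standard deviation is the square root of the variance
sd = math.sqrt(variance)

print(round(second_moment, 4), round(variance, 4), round(sd, 3))
```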

In some ways, the standard deviation is the more tangible of the two measures, since it is in
the same units as X. For example, if X is a random variable measuring lengths in meters,
then the standard deviation is in meters (m), while the variance is in square meters (m²).
Unlike the mean, there is no simple direct interpretation of the variance or standard
deviation. The variance is analogous to the moment of inertia in physics, but that is not
necessarily widely understood by students. What is important to understand is that, in
relative terms:
▪ a small standard deviation (or variance) means that the distribution of the random variable is
narrowly concentrated around the mean
▪ a large standard deviation (or variance) means that the distribution is spread out, with some
chance of observing values at some distance from the mean.
Note that the variance cannot be negative, because it is an average of squared quantities.
This is appropriate, as a negative spread for a distribution does not make sense.
Hence, var(X) ≥ 0 and sd(X) ≥ 0 always.

ANSWER SHEET (Please submit only the answers. Do not return the entire module.)
Name:_______________________________ Section: _______________________
LAST NAME, FIRST NAME, MIDDLE INITIAL

ENGAGEMENT

Enabling Assessment Activity No.1. Discrete Random Variable

The number of days in the winter months that a construction crew cannot
work because of the weather has the following probability distribution

X 3 4 5 6 7 8 9 10
P(X) 0.05 0.10 0.20 0.25 0.15 0.10 0.08 0.07

a. Find the probability that the crew cannot work on no more than 5 days next winter. (5 pts)
b. Find the probability that the crew cannot work on 4 to 8 days next winter. (5 pts)
c. Find the probability that the crew cannot work on at most 7 days next winter. (5 pts)
d. Compute the mean and standard deviation of X. Interpret the mean in
the context of the problem. (10 pts)

ASSIMILATION
Answer in 3-5 sentences.
Is it possible for the probability of an event to be 0? Justify
your answer by giving an example. (10 pts)

___________________________________________________________________
SIGNATURE OVER PRINTED NAME OF PARENT/GUARDIAN
DATE: ___________________

ANSWER SHEET (Please submit only the answers. Do not return the entire module.)
Name: ________________________________ Section: _______________________
LAST NAME, FIRST NAME MIDDLE INITIAL

Mini-Performance Task No.1. VARIABLE ON TOSSING COINS (50 pts)


1. Get three coins of the same denomination (e.g., three 1-peso coins). Toss them all at once.
2. Record the outcome based on the number of heads and tails that appear.
3. Repeat steps 1 and 2 nineteen more times. Tabulate your data.

Event | 3 heads | 2 heads, 1 tail | 1 head, 2 tails | 3 tails
Frequency | | | |

Find the probability of each number of tails.

Event | 0 tails | 1 tail | 2 tails | 3 tails
Probability | | | |

Find the mean, variance, and standard deviation of the number of tails.

___________________________________________________________________
SIGNATURE OVER PRINTED NAME OF PARENT/GUARDIAN
DATE: ______________

WEEK 3 and 4 – NORMAL RANDOM VARIABLE and SAMPLING DISTRIBUTIONS


In this lesson, you will learn how to find probabilities under the normal curve and how to
construct sampling distributions.

LEARNING MATERIALS: Module, pen, paper, internet (if applicable), scientific calculator

PRAYER: Father God, please guide me in the lesson today and help me grow in love and
kindness more like Jesus every day. AMEN

INTRODUCTION:
MELC: At the end of the lesson, you should be able to:
▪ Illustrate a normal random variable and its characteristics.
▪ Identify regions under the normal curve corresponding to different standard normal values.
▪ Convert a normal random variable to a standard normal variable and vice versa.
▪ Compute probabilities and percentiles using the standard normal table.
▪ Illustrate random sampling.
▪ Distinguish between parameter and statistic.
▪ Identify sampling distributions of statistics (sample mean).
▪ Find the mean and variance of the sampling distribution of the sample mean.

INSTITUTIONAL VALUES: Mastery of competencies, Problem solving, Critical Thinking

REVIEW: Random Variables

DEVELOPMENT

MOTIVATION - PROCESS QUESTIONS:


1. What is the significance of mean and standard deviation to a normal curve?
2. What is the significance of sampling distributions?

LESSON PROPER
LESSON 3: NORMAL RANDOM VARIABLES

In Summarizing Data Graphically and Numerically, we encountered data sets, such as
height and weight, with distributions that are fairly symmetric with a central peak. We call
these bell-shaped.

Many variables, such as weight, shoe sizes, foot lengths, and other human physical
characteristics, exhibit these properties. The symmetry indicates that the variable is just as
likely to take a value a certain distance below its mean as it is to take a value that same
distance above its mean. The bell shape indicates that values closer to the mean are more
likely, and it becomes increasingly unlikely to take values far from the mean in either
direction.

We use a mathematical model with a smooth bell-shaped curve to describe these bell-
shaped data distributions. These models are called normal curves or normal
distributions. They were first called “normal” because the pattern occurred in many
different types of common measurements.

The general shape of the mathematical model used to generate a normal curve looks like
this:

Because normal curves are mathematical models, we use Greek letters to represent the
mean and standard deviation of a normal curve. The mean of a normal distribution locates
its center. We use the Greek letter μ (pronounced “mu” ) to represent the mean. We use the
Greek letter σ (pronounced “sigma”) to represent the standard deviation of a normal
distribution. The standard deviation determines the spread of the distribution. In fact, the
shape of a normal curve is completely determined by specifying its standard deviation. As
we will see, if two normal distributions have the same standard deviation, then the shapes of
their normal curves will be identical.

Following are some observations we can make as we look at the figure:

▪ The black and the red normal curves have means or centers at μ = 10. However, the red curve
is more spread out and thus has a larger standard deviation. Notice that the red normal curve
is also shorter. This makes sense because these curves are probability density curves, so the
area under each curve has to be 1.
▪ The black and the green normal curves have the same standard deviation or spread.
▪ We use x̄ to represent the mean of data in a sample. We use μ to represent the mean of
a density curve defined by a mathematical model.
▪ We use SD or s_x to represent the standard deviation of data in a sample. We use σ to
represent the standard deviation of a density curve defined by a mathematical model.

Example on Foot Length


How many standard deviations below or above the mean male foot length is 13 inches?
Since the mean is 11 inches, 13 inches is 2 inches above the mean. Since a standard
deviation is 1.5 inches, this would be 2 / 1.5 = 1.33 standard deviations above the mean.
Combining these two steps, we could write:
(13 in. − 11 in.) / (1.5 in. per standard deviation) = (13 − 11) / 1.5 standard deviations =
+1.33 standard deviations

In the language of statistics, we have just found the z-score for a male foot length of 13
inches to be z = +1.33. Or, to put it another way, we have standardized the value of 13. In
general, the standardized value z tells how many standard deviations below or above the
mean the original value is. It is calculated as follows:

z-score = (value – mean) / standard deviation.

The convention is to denote a value of our normal random variable X with the letter x. Since
the mean is written μ and the standard deviation σ, we may write the standardized value as

z = (x − μ)/σ

Notice that since σ is always positive, for values of x above the mean (μ), z will be positive;
for values of x below μ, z will be negative.
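The standardization formula is a one-line function. The following minimal Python sketch applies it to the foot length example (mean 11 inches, standard deviation 1.5 inches); the function name z_score is chosen here for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd

# Foot length example: 13 inches, with mean 11 and standard deviation 1.5
z = z_score(13, 11, 1.5)
print(round(z, 2))

# A value below the mean gives a negative z-score
print(z_score(9.5, 11, 1.5))
```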

STANDARD NORMAL RANDOM VARIABLE

A standard normal random variable is a normally distributed random variable with
mean μ=0 and standard deviation σ=1. It will always be denoted by the letter Z.
To compute probabilities for Z we will not work with its density function directly but instead
read probabilities out of the Cumulative Normal Probability Table (see the table at the end of
this module). The table lists cumulative probabilities; its entries are probabilities of
the form P(Z<z). The use of the table will be explained by the following series of examples.

Determining probabilities
The following mathematical notations on probabilities are used to simplify lengthy
expressions.
P(a < z < b) denotes the probability that the z-score is between a and b.
P(z > a) denotes the probability that the z-score is greater than a.
P(z < a) denotes the probability that the z-score is less than a.
Thus, P(1 < z < 2) = 0.1359 is read as “the probability that the z-score falls between z = 1
and z = 2 is 0.1359.”
Case 1. The required area is: greater than a, at least a, more than a, to the right of a, or
above a.
P(z > a)

Case 2. The required area is: less than a, at most a, no more than a, to the left of a, or
below a.
P(z < a)
Case 3. The required area is between a and b: P(a < z < b)

Find the probabilities indicated, where, as always, Z denotes a standard normal random
variable.

Sample 1: P(Z < 1.48)

The Cumulative Normal Probability Table gives this probability directly, with no computation
required. The digits in the ones and tenths places of 1.48, namely 1.4, select the row of the
table; the hundredths part of 1.48, namely 0.08, selects the column. The four-decimal-place
number in the interior of the table at the intersection of that row and column, 0.9306, is the
probability sought:

P(Z<1.48) = 0.9306
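A table lookup like this one can be cross-checked by computing the cumulative probability directly. The sketch below is a minimal Python example that uses the standard relationship between the normal cumulative probability and the error function, P(Z < z) = 0.5 × (1 + erf(z/√2)); the function name phi is an illustrative choice and does not come from the module.

```python
import math

def phi(z):
    """Cumulative probability P(Z < z) for a standard normal random variable Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p = phi(1.48)
print(round(p, 4))  # compare with the table entry in row 1.4, column 0.08
```

Rounded to four decimal places, the computed value should match the table entry.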

Sample 2: P(Z > 1.60)


Solution:
P(Z > 1.60) = 1 - P(Z < 1.60)
Since P(Z < 1.60) = 0.9452 (using Probability Distribution Table)
Then 1 - P(Z < 1.60) = 1 – 0.9452 = 0.0548

Sample 3: P(Z > -1.02)

Because the standard normal curve is symmetric about 0, for any number n,
P(Z > -n) = P(Z < n)

So P(Z > -1.02) is equal to P(Z < 1.02) = 0.8461

Sample 4: P(0.5 < Z < 1.57)

For the area between two values, the rule is

P(a < Z < b) = P(Z < b) – P(Z < a)

So P(0.5 < Z < 1.57) = P(Z < 1.57) - P(Z < 0.5)

For P(Z < 1.57) = 0.9418 and P(Z < 0.5) = 0.6915

Then P(Z < 1.57) - P(Z < 0.5) = 0.9418 – 0.6915 = 0.2503
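Samples 2 to 4 all reduce to combinations of cumulative probabilities P(Z < z). The following is a minimal Python sketch that checks the three answers using NormalDist from Python's standard statistics module (available in Python 3.8 and later); the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Sample 2: area to the right of 1.60 is 1 minus the area to the left
p_right = 1 - Z.cdf(1.60)

# Sample 3: area to the right of -1.02 equals the area to the left of +1.02
p_right_of_negative = 1 - Z.cdf(-1.02)
p_left_of_positive = Z.cdf(1.02)

# Sample 4: area between 0.5 and 1.57 is the difference of two left areas
p_between = Z.cdf(1.57) - Z.cdf(0.5)

print(round(p_right, 4))
print(round(p_right_of_negative, 4), round(p_left_of_positive, 4))
print(round(p_between, 4))
```

Small differences in the last decimal place are possible because the table entries are already rounded before they are subtracted.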

Sample 5: Find the z-score that corresponds to the 90th percentile of a normal distribution.

When we go to the table, we find that the value 0.90 is not there exactly; however, the values
0.8997 and 0.9015 are there and correspond to Z values of 1.28 and 1.29, respectively. Hence
we take the midpoint of the two z values as an estimate:

Z(0.90) = (1.28 + 1.29)/2 = 1.285

Sample 6: The mean BMI for men aged 60 is 29 with a standard deviation of 6. What is 90th
percentile of BMI for men?

Given
μ = 29 σ=6
Find: X90 = value at 90th percentile

Remember that Z(0.90) = 1.285 and z-score = (value – mean) / standard deviation.

Rearranging z = (X − μ)/σ gives X = μ + Zσ. Therefore

X = 29 + (1.285 * 6) = 36.71
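The percentile calculation can be checked in two ways: with the table-based z value used above, and with an inverse cumulative probability computed directly. This is a minimal Python sketch using NormalDist.inv_cdf from the standard statistics module; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 29, 6  # mean and standard deviation of BMI from the problem

# Table-based estimate: midpoint of the z values that bracket 0.90 in the table
z_table = (1.28 + 1.29) / 2
x_table = mu + z_table * sigma

# Direct computation: the inverse of the cumulative probability at 0.90
z_exact = NormalDist().inv_cdf(0.90)
x_exact = NormalDist(mu=mu, sigma=sigma).inv_cdf(0.90)

print(round(z_table, 3), round(x_table, 2))
print(round(z_exact, 4), round(x_exact, 2))
```

The two answers differ only slightly; the gap comes from estimating z as the midpoint of two table entries.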

ANSWER SHEET (Please submit only the answers. Do not return the entire module.)
Name:_______________________________ Section: _______________________

ENGAGEMENT
Enabling Assessment Activity No.2. Normal Random Variable

Answer the following problem. Show your solutions.


1. In a certain population of the herring Pomolobus aestevalis, the length of
the individual fish follows a normal distribution. The mean length of individual
fish is 75 mm and the standard deviation is 7.5 mm.
a) What percentage of the fish is between 70 and 80 mm long? (11 pts)
b) What percentage of the fish is less than 73 mm long? (7 pts)
c) What percentage of the fish is more than 78 mm long? (7 pts)

ASSIMILATION
Answer in 3-5 sentences.
Statistics teachers sometimes use the normal distribution curve to check whether cheating
occurred during an exam. How can they tell that cheating may have happened just by looking at
the students' scores? (10 pts)

___________________________________________________________________

SIGNATURE OVER PRINTED NAME OF PARENT/GUARDIAN

DATE: ____________________

LESSON 4: SAMPLING DISTRIBUTIONS

Random sampling means that every element in a population has an equal chance of being chosen
for the sample.
Sampling Techniques

Simple random sampling requires using randomly generated numbers to choose a sample. More
specifically, it initially requires a sampling frame, a list or database of all members of a
population. You can then randomly generate a number for each element, using RAN# on your
scientific calculator.

Stratified random sampling starts off by dividing a population into groups with similar
attributes. Then a random sample is taken from each group.

Cluster sampling starts by dividing a population into groups, or clusters. What makes this
different from stratified sampling is that each cluster must be representative of the
population. Then, you randomly select entire clusters to sample.

Systematic random sampling is a very common technique in which you sample every k-th
element. For example, if you were conducting surveys at a mall, you might survey every 100th
person that walks in.
If you have a sampling frame, then you would divide the size of the frame, N, by the desired
sample size, n, to get the index number, k. You would then choose every k-th element in the
frame to create your sample.
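The sampling techniques described above can be sketched in a few lines of code. The example below is a minimal Python illustration under stated assumptions: the sampling frame of 200 ID numbers, the two strata "A" and "B", and the fixed seed are all hypothetical and chosen only to make the example repeatable.

```python
import random

rng = random.Random(2022)        # fixed seed so the example is repeatable
frame = list(range(1, 201))      # hypothetical sampling frame: IDs 1 to 200 (N = 200)
n = 20                           # desired sample size

# Simple random sampling: every element has the same chance of selection
simple = rng.sample(frame, n)

# Systematic sampling: k = N / n, random starting point, then every k-th element
k = len(frame) // n
start = rng.randrange(k)
systematic = frame[start::k]

# Stratified sampling: split the frame into groups, then sample within each group
# in proportion to the group's size (the two strata here are hypothetical)
strata = {"A": frame[:120], "B": frame[120:]}
stratified = [
    unit
    for members in strata.values()
    for unit in rng.sample(members, n * len(members) // len(frame))
]

print(sorted(simple))
print(systematic)
print(sorted(stratified))
```

With N = 200 and n = 20 the index number is k = 10, so the systematic sample takes every 10th ID after the random start.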

Proportions from Random Samples Vary


Imagine a small college with only 200 students, and suppose that 60% of these students are
eligible for financial aid.
In this simplified situation, we can identify the population, the variable, and the population
proportion.
▪ Population: 200 students at the college.
▪ Variable: Eligibility for financial aid is a categorical variable, so we use a proportion as a
summary.
▪ Population proportion: 0.60 of the population is eligible for financial aid.
Note: Populations are usually much larger than 200 people. Also, in real situations, we do not
know the population proportion. We are using a simplified situation to investigate how random
samples relate to the population. This is the first step in creating a probability model that will be
useful in inference.
How accurate are random samples at predicting this population proportion of 0.60?
To answer this question, we randomly select 8 students and determine the proportion who are
eligible for financial aid. We repeat this process several times. Here are the results for 3 random
samples:

A parameter is a number that describes a population. A statistic is a number that we calculate
from a sample.
▪ When we do inference, the parameter is not known because it is impossible or impractical
to gather data from everyone in the population. (Note: In each example on this page, we
assumed we knew the parameter so that we could investigate how statistics relate to the
parameter. This is the first step in creating a probability model. However, when we do
inference, we use a statistic to draw a conclusion about an unknown parameter.)
▪ We make an inference about the population parameter on the basis of a sample statistic.
▪ Statistics from samples vary.
In this course, if the variable is categorical, the parameter and the statistic are both
proportions. If the variable is quantitative, the parameter and statistic are both means.
Using the situation in sample proportions:
▪ Parameter: A population proportion. For this population of students at a small college,
0.60 are eligible for financial aid.
▪ Statistics: Sample proportions that vary. In the example, 0.75, 0.625, and 0.375 are all
statistics that describe the proportion eligible for financial aid in a sample of 8 students.
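A short simulation shows how the statistic changes from sample to sample while the parameter stays fixed. This is a minimal Python sketch, assuming the college's 200 students are coded as 120 ones (eligible) and 80 zeros (not eligible); the seed is arbitrary, so the three proportions printed will generally differ from the 0.75, 0.625, and 0.375 quoted in the example.

```python
import random

rng = random.Random(1)  # arbitrary seed, used only to make the run repeatable

# Population of 200 students: 1 = eligible for financial aid, 0 = not eligible
population = [1] * 120 + [0] * 80

# Parameter: the population proportion (fixed)
parameter = sum(population) / len(population)

# Statistics: proportions from three random samples of 8 students (these vary)
sample_props = [sum(rng.sample(population, 8)) / 8 for _ in range(3)]

print("parameter:", parameter)
print("sample proportions:", sample_props)
```

Every sample proportion here is a multiple of 1/8, because each sample contains only 8 students.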

The total set of observations that can be made is called the population; its size is normally
denoted by N.
A sample is a set of observations drawn from a population. The sample size is normally
denoted by n.
A parameter is a measurable characteristic of a population, such as a mean or standard
deviation.
A statistic is a measurable characteristic of a sample, such as a mean or standard
deviation, which in effect can be used as an estimate of the population parameter.

Mean and Standard Deviation of the Sample Mean

Sample | Mean
1, 2, 3 | 2.00
1, 2, 4 | 2.33
1, 2, 5 | 2.67
1, 3, 4 | 2.67
1, 3, 5 | 3.00
1, 4, 5 | 3.33
2, 3, 4 | 3.00
2, 3, 5 | 3.33
2, 4, 5 | 3.67
3, 4, 5 | 4.00

μ_x̄ = Σ x̄ P(x̄)

σ_x̄ = σ / √n

σ_x̄ = √( Σ[x̄² P(x̄)] − (μ_x̄)² )

Mean x̄ | Frequency | Probability P(x̄) | x̄ · P(x̄) | x̄² · P(x̄) | P(x̄) · (x̄ − μ)²
2.00 | 1 | 1 / 10 = 0.10 | 0.2 | 0.4 | 0.1
2.33 | 1 | 1 / 10 = 0.10 | 0.233 | 0.54289 | 0.04489
2.67 | 2 | 2 / 10 = 0.20 | 0.534 | 1.42578 | 0.02178
3.00 | 2 | 2 / 10 = 0.20 | 0.6 | 1.8 | 0
3.33 | 2 | 2 / 10 = 0.20 | 0.666 | 2.21778 | 0.02178
3.67 | 1 | 1 / 10 = 0.10 | 0.367 | 1.34689 | 0.04489
4.00 | 1 | 1 / 10 = 0.10 | 0.4 | 1.6 | 0.1
Total Σ | 10 | 1.0 | μ_x̄ = 3 | 9.33334 | 0.33334

μ_x̄ = mean of the sample mean
σ_x̄ = standard deviation of the sample mean

σ_x̄ = √( Σ[x̄² P(x̄)] − (μ_x̄)² )

σ_x̄ = √( 9.33334 − (3)² )

σ_x̄ = 0.5774
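The table above can be rebuilt by listing every possible sample and computing its mean. The following minimal Python sketch assumes, based on the ten samples listed, that the population is the set of values 1, 2, 3, 4, 5 and that samples of size 3 are drawn without replacement; that assumption and all variable names are for illustration only.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5]   # assumed population, inferred from the listed samples
n = 3                          # sample size

# Every possible sample of size 3 (without replacement) and its mean
samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]

# Frequency and probability of each distinct sample mean
freq = Counter(means)
prob = {m: Fraction(f, len(samples)) for m, f in freq.items()}

# Mean of the sample means: sum of x_bar * P(x_bar)
mu_xbar = sum(m * p for m, p in prob.items())

# Variance: sum of x_bar^2 * P(x_bar), minus the square of the mean
var_xbar = sum(m**2 * p for m, p in prob.items()) - mu_xbar**2
sd_xbar = math.sqrt(var_xbar)

for m in sorted(prob):
    print(round(float(m), 2), freq[m], float(prob[m]))
print("mean:", float(mu_xbar), "sd:", round(sd_xbar, 4))
```

Using exact fractions for the sample means (7/3 instead of 2.33) avoids the small rounding differences that appear in the hand-computed columns.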

ANSWER SHEET (Please submit only the answers. Do not return the entire module.)
Name: ________________________________ Section: _______________________
LAST NAME, FIRST NAME MIDDLE INITIAL

ENGAGEMENT
Mini-Performance Task No.2. VARIABLE ON ROLLING A DIE (50 pts)
1. Get a six-sided die. Roll it three times.
2. Record the three outcomes in the table. (The first row of the table is filled in as an
example.)
3. Calculate the mean of the three outcomes.
4. Repeat steps 1-3 four more times.
5. Tabulate the results.

Outcomes of the die | Mean
1, 4, 3 | 2.67

Complete the table below.

Mean x̄ | Frequency | Probability P(x̄) | x̄ · P(x̄) | x̄² · P(x̄) | P(x̄) · (x̄ − μ)²
2.67 | | | | |
Total Σ | | | | |

Find the mean and standard deviation of the sample mean.

___________________________________________________________________
SIGNATURE OVER PRINTED NAME OF PARENT/GUARDIAN
DATE: ______________

WEEK 5 and 6 – CENTRAL LIMIT THEOREM and T-TABLE


In this lesson, you will learn how to work with sampling distributions and how to read the t-table.

LEARNING MATERIALS: Module, pen, paper, internet (if applicable), scientific calculator, T-
table

PRAYER: Father God, please guide me in the lesson today and help me grow in love and
kindness more like Jesus every day. AMEN

INTRODUCTION:
MELC: At the end of the lesson, you should be able to:
Define the sampling distribution of the sample mean for normal population when the
variance is: (a) known (b) unknown
Illustrate the Central Limit Theorem
Define the sampling distribution of the sample mean using the Central Limit Theorem
Solve problems involving sampling distributions of the sample mean
Illustrate the t-distribution
Identify percentiles using the t-table

INSTITUTIONAL VALUES: Mastery of competencies, Problem solving, Excellence, Integrity

REVIEW: Sampling Distribution and Probability Distribution Table

DEVELOPMENT
MOTIVATION - PROCESS QUESTIONS:
1. What is the significance of the Central Limit Theorem?
2. What is the purpose of the t-table?

LESSON PROPER
LESSON 5: SAMPLING DISTRIBUTION OF THE SAMPLE MEAN

If the population is normally distributed with mean μ and standard deviation σ, then the sampling
distribution of the sample mean is also normally distributed no matter what the sample size is.
When the sampling is done with replacement or if the population size is large compared to the
sample size, then x̄ has mean μ and standard deviation σ/√n. We use the term standard error for
the standard deviation of a statistic, and since the sample average x̄ is a statistic, the standard
deviation of x̄ is also called the standard error of x̄.

Standard Deviation of x̄ [Standard Error]

SD(x̄) = SE(x̄) = σ/√n

When we know the sample mean is Normal or approximately Normal, then we can calculate a
z-score for the sample mean and determine probabilities for it using

z = (x̄ − μ) / (σ/√n)

Sample Problem 1
The engines made by Ford for speedboats have an average power of 220 horsepower (HP)
and standard deviation of 15 HP. You can assume the distribution of power follows a normal
distribution.
Consumer reports are testing the engines and will dispute the company's claim if the sample
mean is less than 215 HP. If they take a sample of 4 engines, what is the probability the
mean is less than 215?

Answer
We want to find P(𝑥̅ <215).

Since the population follows a normal distribution, we can conclude that x̄ has a normal
distribution with mean 220 HP (μ = 220) and a standard deviation of σ/√n = 15/√4 = 7.5 HP.

P(x̄ < 215) = P(Z < (215 − 220)/7.5) = P(Z < −0.67)
= 0.2514 (Z table)

If Consumer Reports samples four engines, the probability that the mean is less than 215 HP
is 25.14%.

TRY THIS
Using the speedboat engines example above, answer the following question.
If consumer reports samples 100 engines, what is the probability that the sample mean will
be less than 215?
(Answer should be about 0.0004, or 0.04%: the standard error is 15/√100 = 1.5, so z = (215 − 220)/1.5 = −3.33.)
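The speedboat calculation can be repeated for any sample size with a small helper. This is a minimal Python sketch using NormalDist from the standard statistics module; the function name prob_mean_below is hypothetical. Because the code does not round z to two decimal places, its result for n = 4 is about 0.2525 rather than the table value 0.2514, and for n = 100 it gives roughly 0.0004.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_below(x_bar, mu, sigma, n):
    """P(sample mean < x_bar) when the sample mean is normal with SE = sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    return NormalDist(mu, standard_error).cdf(x_bar)

# Engine power: population mean 220 HP, standard deviation 15 HP
p4 = prob_mean_below(215, 220, 15, 4)      # sample of 4 engines
p100 = prob_mean_below(215, 220, 15, 100)  # sample of 100 engines

print(round(p4, 4))
print(round(p100, 4))
```

The larger sample has a much smaller standard error, so a sample mean 5 HP below the population mean becomes far less likely.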

CENTRAL LIMIT THEOREM


What happens when the sample comes from a population that is not normally
distributed? This is where the Central Limit Theorem comes in.
The Central Limit Theorem states that if random samples of size n are drawn from a
population, then as n becomes larger, the sampling distribution of the mean approaches the
normal distribution, regardless of the shape of the population distribution.

z = (X̄ − μ) / (σ/√n)

where X̄ = sample mean, μ = population mean,
σ = population standard deviation, n = sample size.
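The theorem can be illustrated with a simulation on a population that is clearly not normal. The sketch below is a minimal Python example, assuming the population is the outcome of one roll of a fair six-sided die (a flat distribution); the sample size of 30, the 5000 repetitions, and the seed are arbitrary choices made for illustration.

```python
import random
import statistics

rng = random.Random(0)          # arbitrary seed for a repeatable run
faces = [1, 2, 3, 4, 5, 6]      # non-normal (uniform) population: one roll of a fair die

mu = statistics.mean(faces)     # population mean
sigma = statistics.pstdev(faces)  # population standard deviation

n = 30        # sample size
reps = 5000   # number of repeated samples

# Draw many samples of size n (with replacement) and record each sample mean
sample_means = [statistics.mean(rng.choices(faces, k=n)) for _ in range(reps)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print("population mean:", mu, " mean of sample means:", round(mean_of_means, 3))
print("sigma / sqrt(n):", round(sigma / n**0.5, 3), " sd of sample means:", round(sd_of_means, 3))
```

The mean of the sample means should land close to μ and their standard deviation close to σ/√n; a histogram of sample_means would look approximately bell-shaped even though a single die roll does not.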

Example: The average number of milligrams (mg) of cholesterol in a cup of a certain brand of
ice cream is 660 mg, and the standard deviation is 35 mg. Assume the variable is normally
distributed.
a) If a cup of ice cream is selected, what is the probability that the cholesterol content will be
more than 670 mg?
Solution:
Given: μ = 660, σ = 35, X = 670;
find P(X > 670);
since this is an individual value, the regular z-score formula is used:
z = (X − μ)/σ = (670 − 660)/35 = 0.29.
Thus P(X > 670) = P(z > 0.29) = 1 − P(z < 0.29) = 1 − 0.6141 = 0.3859.
So the probability that the cholesterol content will be more than 670 mg is 38.59%.

Example. If a sample of 10 cups of ice cream is selected, what is the probability that the mean
of the sample will be larger than 670 mg? Assume that population mean = 660 and population
standard deviation = 35.

Solution :
Given: μ = 660, σ = 35, $\bar{X}$ = 670, n = 10;
find P($\bar{X}$ > 670).
Since this involves a sample mean, the z formula to use is
$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{670 - 660}{35 / \sqrt{10}} = 0.90$.
Thus P($\bar{X}$ > 670) = P(z > 0.90) = 0.5 – 0.3159 = 0.1841.
So the probability that the mean cholesterol content of 10 randomly selected cups of ice cream
will be more than 670 mg is 18.41%.
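The two ice cream examples differ only in the denominator of the z-score. A minimal Python sketch of both, assuming the same μ = 660 and σ = 35 and rounding z to two decimals as a Z table would (the helper name `upper_tail` is illustrative):

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 660, 35  # population mean and standard deviation (mg)

def upper_tail(x, n=1):
    """Return (z, P(value > x)) for an individual value (n = 1)
    or for the mean of a sample of size n."""
    z = round((x - MU) / (SIGMA / sqrt(n)), 2)  # two decimals, like a Z table
    return z, 1 - NormalDist().cdf(z)

z1, p1 = upper_tail(670)          # one cup
z10, p10 = upper_tail(670, n=10)  # mean of 10 cups

print(f"one cup:      z = {z1:.2f}, P = {p1:.4f}")
print(f"mean of ten:  z = {z10:.2f}, P = {p10:.4f}")
```

Because the standard error 35/√10 is smaller than 35, a sample mean above 670 mg is less likely than a single cup above 670 mg.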

T-DISTRIBUTION TABLE

What is the T Distribution?


The T distribution (also called Student’s T Distribution) is a family of distributions that look
almost identical to the normal distribution curve, only a bit shorter and fatter. The t distribution is
used instead of the normal distribution when you have small samples and the population
standard deviation is unknown. The larger the sample size, the more the t distribution looks like
the normal distribution. In fact, for sample sizes larger than 20 (i.e. more degrees of freedom),
the distribution is very close to the normal distribution.

T-Distribution Table

When you conduct a t-test, you can compare the test statistic from the t-test to the critical
value from the t-Distribution table. If the test statistic is more extreme than the critical value
found in the table (for a two-tailed test, if its absolute value is greater than the critical value),
then you can reject the null hypothesis of the t-test and conclude that the results of the test are
statistically significant.

Example #1: One-tailed t-test for a mean

A researcher recruits 20 subjects for a study and conducts a one-tailed t-test for a mean using
an alpha level of 0.05.

Question: Once she conducts her one-tailed t-test and obtains a test statistic t, what critical
value should she compare t to?

Answer: For a t-test with one sample, the degrees of freedom is equal to n-1, which is 20-1 =
19 in this case. The problem also tells us that she is conducting a one-tailed test and that she is
using an alpha level of 0.05, so the corresponding critical value in the t-distribution table
is 1.729.

Example #2: Two-tailed t-test for a mean

A researcher recruits 18 subjects for a study and conducts a two-tailed t-test for a mean using
an alpha level of 0.10.

Question: Once she conducts her two-tailed t-test and obtains a test statistic t, what critical
value should she compare t to?

Answer: For a t-test with one sample, the degrees of freedom is equal to n-1, which is 18-1 =
17 in this case. The problem also tells us that she is conducting a two-tailed test and that she is
using an alpha level of 0.10, so the corresponding critical value in the t-distribution table is 1.74.

Example #3: Determining the critical value

A researcher conducts a two-tailed t-test for a mean using a sample size of 14 and an alpha
level of 0.05.

Question: What would the absolute value of her test statistic t need to be in order for her to
reject the null hypothesis?

Answer: For a t-test with one sample, the degrees of freedom is equal to n-1, which is 14-1 =
13 in this case. The problem also tells us that she is conducting a two-tailed test and that she is
using an alpha level of 0.05, so the corresponding critical value in the t-distribution table
is 2.16. This means that she can reject the null hypothesis if the test statistic t is less than -2.16
or greater than 2.16.

Example #4: Comparing a critical value to a test statistic

A researcher conducts a right-tailed t-test for a mean using a sample size of 19 and an alpha
level of 0.10.

Question: The test statistic t turns out to be 1.48. Can she reject the null hypothesis?

Answer: For a t-test with one sample, the degrees of freedom is equal to n-1, which is 19-1 =
18 in this case. The problem also tells us that she is conducting a right-tailed test (which is a
one-tailed test) and that she is using an alpha level of 0.10, so the corresponding critical value
in the t-distribution table is 1.33. Since her test statistic t is greater than 1.33, she can reject the
null hypothesis.
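The critical values in Examples #1 to #4 can also be reproduced without a printed table. The sketch below is one possible approach, not the method used to build the table: it integrates the t density numerically (Simpson's rule) and searches for the critical value by bisection. All function names are illustrative, and the use of `math.gamma` limits it to moderate degrees of freedom (a few hundred at most).

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus the left half (0.5)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df, tails=1):
    """Positive critical value for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    target = 1 - alpha / tails
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 19, tails=1), 3))  # Example #1
print(round(t_critical(0.10, 17, tails=2), 3))  # Example #2
print(round(t_critical(0.05, 13, tails=2), 3))  # Example #3
crit = t_critical(0.10, 18, tails=1)            # Example #4
print(round(crit, 3), "reject H0" if 1.48 > crit else "do not reject H0")
```

The printed values should agree with the table entries quoted in the examples up to rounding.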

Should You Use the t Table or the z Table?

One problem that students frequently encounter is determining if they should use the t-
distribution table or the z table to find the critical values for a particular problem. If you’re stuck
on this decision, you can use the following flow chart to determine which table you should use:

Example #5: Locating percentiles in T-Distribution Table


Find the 95th percentile of the t(df=3) distribution.
Solution: Using one tail, α = 1.00 – 0.95 = 0.05.
Go to the row labeled 3 [this is the row that contains quantiles of the t(df=3) distribution] and
then over to the column labeled 0.05. The table entry is 2.353.
Answer: 2.353

ONE-TAILED VS TWO-TAILED
A two-tailed test is appropriate if you want to determine if there is any difference between the
groups you are comparing. For instance, if you want to see if Group A scored higher or lower
than Group B, then you would want to use a two-tailed test. This is because a two-tailed test
uses both the positive and negative tails of the distribution. In other words, it tests for the
possibility of positive or negative differences.

A one-tailed test is appropriate if you only want to determine if there is a difference between
groups in a specific direction. So, if you are only interested in determining if Group A scored
higher than Group B, and you are completely uninterested in the possibility of Group A scoring
lower than Group B, then you may want to use a one-tailed test. The main advantage of using a
one-tailed test is that it has more statistical power than a two-tailed test at the same significance
(alpha) level. In other words, your results are more likely to be significant for a one-tailed test if
there truly is a difference between the groups in the direction that you have predicted. This is
because only one tail of the distribution is used for the test.
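The difference in power can be illustrated with p-values. The sketch below is a simplified illustration that uses the standard normal distribution rather than the t distribution, with a hypothetical test statistic z = 1.80 in the predicted direction and α = 0.05. These numbers are assumptions chosen for the illustration only.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one-tailed, two-tailed) p-values for a z statistic
    that falls in the predicted direction."""
    upper = 1 - NormalDist().cdf(abs(z))  # area in one tail
    return upper, 2 * upper               # two tails double the area

alpha = 0.05
one_tailed, two_tailed = p_values(1.80)   # hypothetical test statistic

print(f"one-tailed p = {one_tailed:.4f}, significant: {one_tailed < alpha}")
print(f"two-tailed p = {two_tailed:.4f}, significant: {two_tailed < alpha}")
```

The same statistic is significant in the one-tailed test but not in the two-tailed test, because the two-tailed test splits α between both tails.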

ANSWER SHEET (Please submit only the answers. Do not return the entire module.)
Name:_______________________________ Section: _______________________
LAST NAME, FIRST NAME, MIDDLE INITIAL

ENGAGEMENT
Enabling Assessment Activity No.3. CENTRAL LIMIT THEOREM

Answer the following problem. Show your solution. (25 points)

A certain machine produces components having a mean length of 15 centimeters. As a result of
measuring samples of these components, it is found that the standard deviation is 0.5 cm. A test
is carried out on a sample to check whether the data on the lengths of the components is
normally distributed and it is found that it is. Determine the number of components likely to have
a length of less than 14.75 cm in a batch of 30 components.

Using the problem above at 95% confidence level, justify if the null hypothesis should be
rejected.

ASSIMILATION
Answer in 3-5 sentences.
How can the knowledge on Central Limit Theorem help you in your PR1? (10pts)

___________________________________________________________________

SIGNATURE OVER PRINTED NAME OF PARENT/GUARDIAN

DATE: _______________

ANSWER SHEET (Please submit only the answers. Do not return the entire module.)
Name: ________________________________ Section: _______________________
LAST NAME, FIRST NAME MIDDLE INITIAL

Mini-Performance Task No.3. PERCENTILE LOCATION (50 pts)


For a two-tailed graph, locate in the T-distribution table the entries for the following
percentiles. (5 pts each)

Percentile Degrees of freedom T-distribution value


80th 5
90th 10
95th 15
98th 20
99th 25
80th 30
90th 35
95th 60
98th 120
99th 1000

___________________________________________________________________
SIGNATURE OVER PRINTED NAME OF PARENT/GUARDIAN
DATE: ______________

MODULE 7 and 8 – CONFIDENCE INTERVAL


In this module, you will learn how to estimate appropriate parameters in population sampling.

LEARNING MATERIALS: Module, pen, paper, internet (if applicable), scientific calculator

PRAYER: Father God, please guide me in the lesson today and help me grow in love and
kindness more like Jesus every day. AMEN

INTRODUCTION:
MELC: At the end of the lesson, you should be able to:
Identify the length of a confidence interval
Compute for the length of the confidence interval
Compute for an appropriate sample size using the length of the interval.
Solve problems involving sample size determination

INSTITUTIONAL DEVELOPMENT: Mastery of competencies, Integrity, Excellence, Problem
Solving, Critical Thinking

REVIEW: Basic Probability and Event

DEVELOPMENT

MOTIVATION - PROCESS QUESTIONS:


1. What is the significance of confidence interval?
2. What formula would you use when sampling from a large population?

LESSON PROPER
Confidence Intervals
Statisticians use a confidence interval to express the precision and uncertainty associated
with a particular sampling method. A confidence interval consists of three parts: confidence
level, statistic, and margin of error.
The confidence level describes the uncertainty of a sampling method. The statistic and the
margin of error define an interval estimate that describes the precision of the method. The
interval estimate of a confidence interval is defined by the sample statistic ± margin of error.

Confidence Level
The probability part of a confidence interval is called a confidence level. The confidence
level describes the likelihood that a particular sampling method will produce a confidence
interval that includes the true population parameter.
Suppose all possible samples from a given population were collected and the confidence
intervals for each sample computed. Some confidence intervals would include the true
population parameter; others would not. A 95% confidence level means that 95% of the
intervals contain the true population parameter; a 90% confidence level means that 90% of the
intervals contain the population parameter; and so on.

Margin of Error
A margin of error tells you how many percentage points your results will differ from the
real population value. For example, a 95% confidence interval with a 4 percent margin of error
means that your statistic will be within 4 percentage points of the real population value 95% of
the time.

More technically, the margin of error is the range of values below and above the sample
statistic in a confidence interval. The confidence interval is a way to show what
the uncertainty is with a certain statistic (i.e. from a poll or survey).
For example, a poll might report a 98% confidence interval of 4.88 to 5.26. That means that if
the poll is repeated using the same techniques, the interval estimates will contain the true
population parameter 98% of the time.

Confidence Intervals vs. Confidence Levels


Confidence levels are expressed as a percentage (for example, a 95% confidence level). It
means that should you repeat an experiment or survey over and over again, 95 percent of the
time your results will match the results you get from a population (in other words, your statistics
would be sound!). Confidence intervals are your results and they are usually numbers. For
example, you survey a group of pet owners to see how many cans of dog food they purchase a
year. You test your statistic at the 99 percent confidence level and get a confidence interval of
(200,300). That means you think they buy between 200 and 300 cans a year. You’re super
confident (99% is a very high level!) that your results are sound, statistically.

Calculating Confidence Interval


A group of 10 foot surgery patients had a mean weight of 240 pounds. The sample standard
deviation was 25 pounds. Find a 95% confidence interval for the true mean weight of all
foot surgery patients.

Step 1: Subtract 1 from your sample size.


10 – 1 = 9.
This gives you degrees of freedom, which you’ll need in step 3.
Step 2: Subtract the confidence level from 1, then divide by two.

(1 – .95) / 2 = .025 (one tail)

Step 3: Look up your answers to steps 1 and 2 in the t-distribution table.

Since the significance level α = 1 – 0.95 = 0.05 was divided by two, use the one-tail column
for 0.025, which is the same as the two-tails column for 0.05.

For 9 degrees of freedom (df) and α/2 = 0.025, the result is 2.262.



Step 4: Use the formula below and substitute the values.

$\bar{X} \pm t \cdot \frac{s}{\sqrt{n}}$

where
$\bar{X}$ = sample mean
t = value from the t-distribution table
s = sample standard deviation
n = sample size

$240 \pm 2.262 \cdot \frac{25}{\sqrt{10}}$

Upper end is
$240 + 2.262 \cdot \frac{25}{\sqrt{10}} = 257.883$

Lower end is
$240 - 2.262 \cdot \frac{25}{\sqrt{10}} = 222.117$

The 95% confidence interval is from 222.117 to 257.883.
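The four steps can be sketched in a few lines of Python. This is a minimal sketch that takes the t value (2.262) as an input read from the t-distribution table rather than computing it; the function name `t_confidence_interval` is illustrative.

```python
from math import sqrt

def t_confidence_interval(x_bar, s, n, t_value):
    """Confidence interval for a mean: x_bar ± t * s / sqrt(n)."""
    margin = t_value * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Foot surgery example: mean 240 lb, s = 25 lb, n = 10, t = 2.262 (df = 9).
lower, upper = t_confidence_interval(240, 25, 10, 2.262)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```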

Suppose the local newspaper conducts an election survey and reports that the independent
candidate will receive 30% of the vote. The newspaper states that the survey had a 5% margin
of error and a confidence level of 95%. These findings result in the following confidence interval:
The survey is 95% confident that the independent candidate will receive between 25% and 35%
of the vote.
Thus, at a 95% confidence level, the interval
$\bar{X} - 1.96\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{X} + 1.96\left(\frac{\sigma}{\sqrt{n}}\right)$
can be used to estimate the population mean μ. Also, a 95% confidence level corresponds to a
5% level of significance. The level of significance is normally referred to as α. The general
formula for finding the confidence interval would be:
$\bar{X} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{X} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$.

For ready reference, a 90% confidence interval gives $z_{\alpha/2} = \pm 1.65$; a 95% confidence
interval gives $z_{\alpha/2} = \pm 1.96$; and a 99% confidence interval gives $z_{\alpha/2} = \pm 2.58$.

The term $z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$ is called the margin of error in statistics. This is simply defined as the
maximum likely difference between the observed sample mean and the true value of the
population mean. The above formula should only be used for sample sizes greater than 30.

Example: A researcher wants to estimate the number of hours that 5-year old children spend
watching television. A sample of 50 five-year old children was observed to have a mean viewing
time of 3 hours. The population is assumed to be normally distributed with a population standard
deviation of σ = 0.5 hours. Find the best point estimate of the population mean and the 95%
confidence interval.

Solution: Since the sample size is more than 30, the central limit theorem holds and therefore
the best point estimate of the population mean would be equal to the sample mean of 3 hours.

Since the confidence level is 95%, the confidence coefficient would be 1.96 and the margin
of error can be computed as
$E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) = 1.96\left(\frac{0.5}{\sqrt{50}}\right) = 0.14$.

Thus the lower and upper limits would be 𝑋̅ ± 𝐸 .


So

3 + 0.14 = 3.14 would be the upper limit of the confidence interval and

3 – 0.14 = 2.86 is the lower limit.
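The television example can be reproduced with Python's standard library. In this sketch the confidence coefficient is computed from the normal distribution instead of being read from a table, so it is very slightly different from 1.96 before rounding; the function name `z_confidence_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence):
    """Return (z, margin of error, (lower, upper)) for a mean
    when the population standard deviation sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # confidence coefficient z_(alpha/2)
    margin = z * sigma / sqrt(n)
    return z, margin, (x_bar - margin, x_bar + margin)

# Sample mean 3 hours, sigma = 0.5 hours, n = 50, 95% confidence.
z, margin, (lower, upper) = z_confidence_interval(3, 0.5, 50, 0.95)
print(f"z = {z:.2f}, E = {margin:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```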

Determining sample size

$n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2$
where n = sample size,
$z_{\alpha/2}$ = confidence coefficient,
σ = standard deviation,
and E = margin of error.

Feeding Program: In a certain village, Mrs. Ramos wants to estimate the mean weight μ, in
kilograms, of all six-year old children to be included in a feeding program. She wants to be 99%
confident that the estimate of μ is accurate to within 0.06 kg. Suppose from a previous study,
the standard deviation of the weights of the target population was 0.5 kg, what should the
sample size be?

Solution:
Given a confidence level of 99%,
α = 1 – 0.99 = 0.01,
and $z_{\alpha/2}$ = 2.58.
The desired margin of error E = 0.06 kg and σ = 0.5 kg.
$n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2 = \left(\frac{2.58 \cdot 0.5}{0.06}\right)^2 = (21.5)^2 = 462.25$,
round up to 463 six-year old children.
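The sample size formula is easy to turn into a small helper. The sketch below uses the table value 2.58 for 99% confidence, as in the solution, and always rounds up, since rounding down would give a margin of error larger than the one desired. The function name `required_sample_size` is illustrative.

```python
from math import ceil

def required_sample_size(z, sigma, margin_of_error):
    """n = (z * sigma / E)^2, rounded up to the next whole respondent."""
    return ceil((z * sigma / margin_of_error) ** 2)

# Feeding program example: z = 2.58 (99%), sigma = 0.5 kg, E = 0.06 kg.
n = required_sample_size(2.58, 0.5, 0.06)
print(n)
```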

Another convenient formula to use in computing the sample size is Slovin’s formula.
$n = \frac{N}{1 + Ne^2}$
where:
n = sample size
N = population size
e = margin of error (between 1% – 10%)
A student wanted to conduct a survey regarding the perspective of the GAS Grade 11
students on LGU’s protocols for COVID-19. There are 135 enrolled Grade 11 GAS students
in CDLB. For his data to be statistically valid, a 5% margin of error should be used in
sampling. How many Grade 11 GAS students should be his respondents?
Given
Population size, N = 135
Margin of error, e = 5% or 0.05
$n = \frac{N}{1 + Ne^2}$

$n = \frac{135}{1 + 135(0.05)^2} = \frac{135}{1.3375}$
n = 100.93, rounded up to 101 Grade 11 GAS students
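Slovin's formula can be written as a short Python function. This is a minimal sketch using the worked example's values (N = 135, e = 0.05); the function name `slovin` is illustrative, and the result is rounded up because a fraction of a respondent cannot be surveyed.

```python
from math import ceil

def slovin(population_size, margin_of_error):
    """Slovin's formula: n = N / (1 + N * e^2), before rounding."""
    return population_size / (1 + population_size * margin_of_error ** 2)

n_raw = slovin(135, 0.05)   # 135 Grade 11 GAS students, 5% margin of error
n_respondents = ceil(n_raw)  # round up to a whole respondent
print(f"n = {n_raw:.2f}, rounded up to {n_respondents}")
```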

ANSWER SHEET (Please submit only the answers. Do not return the entire module.)
Name:_______________________________ Section: _______________________

ENGAGEMENT
Enabling Assessment Activity No.4. Sampling Size Determination

You are to conduct a survey using CDLB SHS students as your respondents.
Using Slovin’s formula, calculate the sample size for each strand if you are given a 10%
allowable error. Show your solutions. (5 pts per strand)

Strand Population Size Sampling Size


ABM 162
GAS 371
HUMSS 190
ICT 53
STEM 145

ASSIMILATION
Answer in 3-5 sentences.
First-time researchers, like students, are allowed to use 10% as allowable error, while in
actual field research, 5% error is allowed. Why is this so? (10pts)

___________________________________________________________________
SIGNATURE OVER PRINTED NAME OF PARENT/GUARDIAN

QUARTER 3 CULMINATING PERFORMANCE TASK (100 pts)

GOAL – Prepare a normal distribution graph that will help in your decision making.
ROLE - Statistician
AUDIENCE – A client who wants to have a beach vacation.
SITUATION – A client of your travel company is asking for information on which month he
should take his beach vacation. As much as possible, your client does not want his vacation to
coincide with the peak season, when the beach gets crowded.
PRODUCT – Survey at least 50 students in your strand and identify in which month of the year
they usually go to the beach. Based on the results, create a normal distribution graph and
identify in which month your client should take his vacation.
STANDARDS – The recommendation would be assessed based on the following criteria

CRITERIA                                                                        PERCENTAGE
Relevance
(The output contains timely information and reasonable type of vacation
options)                                                                        40%
Clarity of plan & process
(The output shows clear data regarding the result of the survey)               30%
Presentation of data
(Data should be presented accurately and precisely based on formula)           30%
Total                                                                           100%
