
ECE-069_Engineering-Data-Analysis_WM


ECE 069

Engineering Data Analysis

This document and the information thereon is the property of PHINMA Education
ECE 069 - ENGINEERING DATA ANALYSIS
SYLLABUS

PEN CODE: ECE 069 Credit: 3 units


PEN Subject Title: ENGINEERING DATA ANALYSIS Prerequisite: MAT 171

I. Course Description:
This course is designed for undergraduate engineering students with emphasis on problem
solving related to societal issues that engineers and scientists are called upon to solve. It introduces
different methods of data collection and the suitability of using a particular method for a given
situation. The relationship of probability to statistics is also discussed, providing students with the
tools they need to understand how "chance" plays a role in statistical analysis. Probability
distributions of random variables and their uses are also considered. The course also includes
estimation techniques for unknown parameters, hypothesis testing used in making inferences
from sample to population, inference for regression parameters, and building models for estimating
means and predicting future values of key variables under study.
II. Course Objectives:
At the end of the course, the students should be able to:
1. Apply statistical methods in the analysis of data.
2. Design experiments involving several factors.

III. Course Topics and Time Allotment

Lesson No. (Week No.): Topics

Lesson 1 (Week 1): PERMUTATION AND COMBINATION
• Definition of Terms: Fundamental Principle of Counting, Permutation, Combination

Lesson 2 (Week 1): PROBABILITY
• Definition of Terms: Sample Space, Event, Probability
• Properties of Probability
• Additive Laws of Probability
• Conditional Probability
• Independent Events

Lesson 3 (Week 2): SAMPLING METHODS & DESCRIPTIVE STATISTICS FOR SAMPLES

Lesson 4 (Week 2): DATA COLLECTION
• Sources of Primary and Secondary Data

QUIZ #1

Lesson 5 (Week 3): PRESENTATION OF DATA
• Textual Presentation
• Tabular Presentation
• Graphical Presentation

Lesson 6 (Week 4): RANDOM VARIABLES
• Types of Random Variables: Discrete & Continuous
• Probability Distribution of Random Variables

Lesson 7 (Week 4): PLANNING AND CONDUCTING SURVEYS
• Steps in Collecting Data Through Surveys

FIRST PERIODICAL EXAM (Week 5)

Lesson 8 (Week 6): PLANNING AND CONDUCTING EXPERIMENTS
• Steps in Collecting Data Through Experiments

Lesson 9 (Week 6): DISCRETE PROBABILITY DISTRIBUTION – BINOMIAL DISTRIBUTION
• Definition of a Binomial Probability Distribution
• Descriptive Statistics of Binomial Distribution

Lesson 10 (Week 7): DISCRETE PROBABILITY DISTRIBUTION – POISSON DISTRIBUTION
• Definition of a Poisson Probability Distribution
• Descriptive Statistics of Poisson Distribution

QUIZ #2

Lesson 11 (Week 8): CONTINUOUS PROBABILITY DISTRIBUTION – NORMAL DISTRIBUTION
• Definition of a Normal Probability Distribution
• Descriptive Statistics of Normal Distribution

Lesson 12 (Week 8): CONTINUOUS PROBABILITY DISTRIBUTION – EXPONENTIAL DISTRIBUTION
• Definition of an Exponential Probability Distribution
• Descriptive Statistics of Exponential Distribution

SECOND PERIODICAL EXAM (Week 9)

Lesson 13 (Week 10): SAMPLING DISTRIBUTION AND POINT ESTIMATION
• Definition of Point Estimate and Point Estimator
• Distribution of Point Estimator
• Properties of Point Estimator
• Definition & Interpretation of Standard Error

Lesson 14 (Week 10): STATISTICAL INTERVALS
• Forms of Interval Estimation: Confidence Intervals, Prediction Intervals, Tolerance Intervals
• Statistical Intervals for Normal Distribution

Lesson 15 (Week 11): INTRODUCTION TO HYPOTHESIS TESTING – ONE SAMPLE Z-TEST
• Definition of Terms: Hypothesis Testing, Null Hypothesis, Alternative Hypothesis, Level of Significance, Test Statistic
• Types of Errors in Hypothesis Testing
• Definition and Interpretation of P-value
• Steps in Hypothesis Testing
• Definition and Assumptions for One-Sample Z-test

QUIZ #3

Lesson 16 (Week 12): HYPOTHESIS TESTING: t-TEST
• Types of t-Test:
  - One Sample t-Test
  - Independent Sample t-Test
  - Paired Sample t-Test

Lesson 17 (Week 12): SIMPLE LINEAR REGRESSION & CORRELATION COEFFICIENT
• Simple Linear Regression Model
• Assumptions of Simple Linear Regression
• Correlation Coefficient, r
• Coefficient of Determination, r²

FINAL EXAMINATION (Week 13)

IV. Textbook:
1. Probability (Schaum’s Outline Series) By: Seymour Lipschutz
2. Probability and Statistics (Sixth Edition) By: R. E. Walpole, R. H. Myers & S. L. Myers
3. Introduction to Probability & Statistics (10th Edition) By: Mendenhall/Beaver/Beaver

V. Grading System:
Passing score is 60%
Final Grade = (0.17 x P1) + (0.17 x P2) + (0.16 x P3) + (0.50 x FE)
P1, P2, P3 = First, Second, and Third Periodical Grades, respectively
Periodical Grade = (0.50 x Class Standing) + (0.50 x Periodical Exam)
ECE 069: Engineering Data Analysis
Module #8 Student Activity Sheet

Name: _________________________________________________ Class number: _______


Section: ____________ Schedule: __________________________ Date: _______________

Lesson Title: PLANNING AND CONDUCTING EXPERIMENTS

Materials: Board & marker

Lesson Objectives: At the end of this session the students will be able to:
1. In a statistical experiment, define and identify the following:
a. dependent and independent variables.
b. control group and the experimental group.

References:
https://explorable.com/
https://www.questionpro.com/
https://www.formpl.us/
https://www.wisdomjobs.com/
https://www.reference.com/
https://www.engineeringcivil.com/

Productivity Tip: Get a productive environment.

A. LESSON PREVIEW/REVIEW
1) Introduction (2 mins)

Before conducting a research experiment, researchers come up with a research design. Experimental
research design serves as an instruction manual on how the experiment is conducted. The design helps
the researcher stay on track and makes sure all bases are being properly covered to ensure the experiment's
validity. Designed Experiments achieve manufacturing cost savings by minimizing process variation and
reducing rework, scrap, and the need for inspection.

2) Activity 1: What I Know Chart, part 1 (3 mins)


In the first column of the table below, write what you already know about each question in the second column.

What I Know QUESTION What I Learned

What is an experimental group?

What is a control group?

This document is a property of PHINMA EDUCATION



B. MAIN LESSON

1. Activity 2: Content Notes (13 mins)

Before conducting an experiment to collect research data, the following must already be defined:
• The research problem and the research objectives. Formulate the research question or problem
statement.
• The responses and the factors. Identify the variables of interest in relation to your research problem or
objectives, and indicate which are independent and which are dependent. State a prediction or
hypothesis about the possible outcome (the dependent variable, or response) when the
independent variables (the factors) are manipulated. A combination of factor levels is termed a treatment.
For example, if you design an experiment to determine how quickly a cup of hot chocolate cools,
the independent variable is time and the measured dependent variable is temperature.
• The experimental research design. This is the process of planning an experiment to test the researcher’s
hypothesis. The relationship between the dependent and the independent variables is
determined. Data collected in experimental research are usually quantitative in nature.

Experimental Design Terminology


Treatment Group. The group in an experimental design for which the independent variable is manipulated. This
is also termed the experimental group.

Control Group. The group in an experimental design that is not exposed to the treatment. The difference between the
performance of the control group and that of the treatment group measures the effect of the treatment on the
treatment group.

Important Experimental Designs - Informal experimental designs

(Source: https://www.wisdomjobs.com)

• Before-and-after without control design: In this design a single test group or area is selected and
the dependent variable is measured before the introduction of the treatment. The treatment is then
introduced and the dependent variable is measured again afterwards. The
effect of the treatment equals the level of the phenomenon after the treatment minus the
level of the phenomenon before the treatment.


The main difficulty of such a design is that, with the passage of time, considerable extraneous variation
may enter into the measured treatment effect.

• After-only with control design: In this design two groups or areas (test area and control area) are
selected and the treatment is introduced into the test area only. The dependent variable is then
measured in both areas at the same time. Treatment impact is assessed by subtracting the value
of the dependent variable in the control area from its value in the test area.

The basic assumption in such a design is that the two areas are identical with respect to their behavior
towards the phenomenon considered. If this assumption is not true, extraneous
variation may enter into the treatment effect. However, data can be collected in such a design without
the problems introduced by the passage of time. In this respect the design is superior to the
before-and-after without control design.

• Before-and-after with control design: In this design two areas are selected and the dependent
variable is measured in both areas for an identical time period before the treatment. The treatment
is then introduced into the test area only, and the dependent variable is measured in both areas for an
identical time period after the introduction of the treatment. The treatment effect is determined by
subtracting the change in the dependent variable in the control area from the change in the dependent
variable in the test area.

This design is superior to the other two designs because it avoids extraneous
variation resulting both from the passage of time and from non-comparability of the test and control
areas. At times, however, a lack of historical data, time, or a comparable control area may force us
to select one of the first two informal designs stated above.
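
The treatment-effect arithmetic of the three informal designs can be sketched in a few lines of Python. This is only an illustration: the function names and all of the readings are hypothetical and do not come from any real experiment.

```python
# Treatment-effect arithmetic for the three informal designs.
# All readings below are hypothetical values of the dependent variable.

def before_after_without_control(test_before, test_after):
    """Effect = level after the treatment minus level before the treatment."""
    return test_after - test_before

def after_only_with_control(test_after, control_after):
    """Effect = value in the test area minus value in the control area."""
    return test_after - control_after

def before_after_with_control(test_before, test_after,
                              control_before, control_after):
    """Effect = change in the test area minus change in the control area."""
    return (test_after - test_before) - (control_after - control_before)

test_before, test_after = 20.0, 26.0        # hypothetical test-area readings
control_before, control_after = 19.0, 21.0  # hypothetical control-area readings

print(before_after_without_control(test_before, test_after))   # 6.0
print(after_only_with_control(test_after, control_after))      # 5.0
print(before_after_with_control(test_before, test_after,
                                control_before, control_after))  # 4.0
```

With these made-up numbers the three designs give different estimates (6, 5, and 4) because only the third removes both the drift over time and the initial difference between the two areas.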

Data Collection Methods in Experimental Research

• Observational study: Data are collected by observing and recording what happens in the experiment.
• Simulation: This procedure uses mathematical, physical, or computer models to replicate a real-life
process or situation. It is frequently used when the actual situation is too expensive, dangerous, or
impractical to replicate in real life. This method is commonly used in engineering and operations research
for learning purposes and sometimes as a tool to estimate possible outcomes of real research.

2. Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking).

Read the abstract of the research paper below and answer the following questions.
1. What is the research problem of the paper being presented?

2. What are the dependent (response) variable and the independent variables (factors)?

3. What is the control group and the experimental or the treatment group?

Experimental Study of Human Hair in Concrete as Fibre Reinforcement


By: G. Ajaya Kumar O. Ganesh Kumar K. Damodar C. Jayasree Simpa Karmakar
Sai Ganapathi Engineering College, Visakhapatnam, Andhra Pradesh, India

Abstract— Since the ancient times, many researches and advancements were carried to enhance the physical and
mechanical properties of concrete. Fiber reinforced concrete is one among those advancements which offers a convenient,
practical and economical method for overcoming micro cracks and similar type of deficiencies. Since concrete is weak in
tension hence some measures must be adopted to overcome this deficiency. Human hair is generally strong in tension; hence
it can be used as a fiber reinforcement material. Human hair Fiber is an alternative non-degradable matter available in
abundance and at cheap cost. It also reduces environmental problems. Also addition of human hair fibers enhances the
binding properties, micro cracking control, Imparts ductility and also increases swelling resistance. The experimental
findings in our studies would encourage future research in the direction for long term performance to extending this cost
of effective type of fibers for use in structural applications. Experiments were conducted on concrete cubes, cylinders and
beams of standard sizes with addition of various percentages of human hair fiber i.e., 0%, 0.5%, 1% and 1.5% by weight of
cement, fine & coarse aggregate and results were compared with those of plain cement concrete of M-20 grade. For each
percentage of human hair added in concrete, four cubes, three cylinders and three beams were tested for their respective
mechanical properties at curing periods of 3 , 7 and 28 days. Optimum hair fiber content was obtained as 1.5% by weight
of cement.


3) Activity 4: What I Know Chart, Part 2 (2 mins).


You may now fill in the third column of the table in Activity 1 based on what you have learned.
What I Know QUESTION What I Learned

What is an experimental group?

What is a control group?

4) Activity 5: Check for Understanding (5 mins).


Underline the correct answer.
In the experiment that you have conducted in your research:
a. The (response, factors) represent your independent variable/s.
b. The data collected should answer the (research problem, research design).
c. (Qualitative data, Quantitative data) are usually collected.
d. The process of planning an experiment to test a research hypothesis is termed (experimental design,
experimental research problem).

C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Do you have any doubts about today’s lesson that need to be clarified?


Think of a possible research problem for which you would conduct an experiment to collect the data. Share
it with your classmate and keep this research problem for our next session.

KEY TO CORRECTION
Activity #3
1) Usage of human hair as a fiber reinforcement material in concrete.
2) The mechanical properties of concrete are the dependent variable, or the response, in the experiment.
The percentage of human hair added to the concrete is the independent variable, or the factor.
3) The plain cement concrete specimens of M-20 grade (0% human hair) are the control group. The
concrete specimens with human hair added (0.5%, 1% and 1.5% by weight) are the
experimental group, or treatment group. The mechanical properties (the response) are measured
for both groups and compared.



ECE 069: Engineering Data Analysis
Module #9 Student Activity Sheet

Name: _________________________________________________ Class number: _______


Section: ____________ Schedule: __________________________ Date: _______________

Lesson Title: DISCRETE PROBABILITY DISTRIBUTION – BINOMIAL DISTRIBUTION

Materials: Board, marker and calculator (Casio fx-350EX model)

Lesson Objectives: At the end of this session the students will be able to:
1. define a binomial experiment.
2. solve probability problems as an application of the binomial distribution.

References:
https://en.wikipedia.org/wiki/Binomial_distribution/
https://mathworld.wolfram.com/BinomialDistribution.html/
Walpole, R. E., et al. (2007). Probability & Statistics for Engineers & Scientists, eBook. Pearson
Prentice Hall, Pearson Education: New Jersey.

Productivity Tip: “Tomorrow becomes never. No matter how small the task, take the first step now! “ – T. Ferriss

A. LESSON PREVIEW/REVIEW
1) Introduction (3 mins)

Let us perform a binomial experiment. Toss a one-peso coin 20 times and tally
whether a head or a tail appears on each toss. Record the results of your
statistical experiment in the table below.

OUTCOMES TALLY TOTAL


TAIL
HEAD
TOTAL

2) Activity 1: What I Know Chart, part 1 (2 mins)

In the first column of the table below, write what you already know about each question in the second column.

What I Know QUESTION What I Learned

What is a Binomial Experiment?

What is the formula for the Binomial Probability Distribution?


B. MAIN LESSON

1. Activity 2: Content Notes (13 mins)

Statistical experiments are experiments that have three things in common: the experiment has more
than one possible outcome, each possible outcome can be specified ahead of time, and the outcome
depends on chance. An example is flipping a coin: there are two possible outcomes (more than one
outcome), the outcomes can be specified in advance as either a head or a tail, and the outcome is uncertain
(depends on chance).

Probability Distribution. A probability distribution is a table or an equation that links each
outcome of a statistical experiment with its probability of occurrence. In some cases, the probability
distribution is represented as a graph. The outcomes of the experiment are represented by a random
variable.

For example, consider flipping a coin two times. An outcome of the experiment might be the number of heads that
we see in the two flips. If we let the variable X be the number of heads that come up, then X is termed
the random variable. It can take the value X = 1 (only one head
appears, so a tail appears on the other flip), X = 2 (heads appear on both flips, so
no tails appear), or X = 0 (no head appears, so tails appear on both flips).

Let the outcome be the number of heads that you see in flipping two coins, represented by the random
variable X. Note that the possible outcomes of this experiment are {HH, HT, TH, TT}. Below is the
probability distribution of this statistical experiment.
Probability Distribution of tossing a coin 2 times

a. Tabular presentation of Probability Distribution (PROBABILITY DISTRIBUTION TABLE)

Number of heads, x      Probability of X, P(X = x) = f(x)
x = 2                   P(X = 2) = 1/4
x = 1                   P(X = 1) = 1/2
x = 0                   P(X = 0) = 1/4

b. Graphical presentation of Probability Distribution (HISTOGRAM)


c. An equation representing the Probability Distribution (probability mass function).

$$f(x) = \frac{2!}{x!\,(2-x)!}\,(0.5)^{x}(0.5)^{2-x}, \quad x = 0, 1, 2$$

• The binomial distribution describes the probability of a particular number of successes in a series of trials where
each trial has two distinct possible outcomes, success or failure. The prefix bi means two. Binomial
distributions are discrete (that is, the possible values are separate, countable numbers) and can be used to model the total number
of successes in repeated trials as long as each trial is independent (the outcome of one trial does
not affect the outcome of another trial) and the probability of success remains constant.

The binomial distribution is the distribution of the number of successes in a series of independent and identically distributed Bernoulli trials. In a Bernoulli trial,
the experiment is random and has only two possible outcomes: success or failure. For
example, flipping a coin is a Bernoulli trial: each trial can only take one of two values (heads
or tails), each success has the same probability (the probability of flipping a head is 0.5), and the result of
one trial does not influence the result of another. The Bernoulli distribution is the special case of the binomial
distribution where the number of trials n = 1. Repeated flipping of a coin is therefore a binomial
experiment.

BINOMIAL PROBABILITY DISTRIBUTION.

A random variable X which has a binomial probability distribution is written as

• X ~ B(n, p), where: n = the total number of trials
p = the probability of success in each trial

• $P(x: n, p) = {}_{n}C_{x}\, p^{x} (1-p)^{n-x}$ or $P(x: n, p) = {}_{n}C_{x}\, p^{x} q^{n-x}$

where: n = the number of trials
x = the number of successes = 0, 1, 2, ..., n
p = probability of success in a single trial
q = probability of failure in a single trial = 1 – p
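
As a quick check of the formula, the following minimal Python sketch evaluates it directly. The function name binomial_pmf is only illustrative; math.comb from the standard library supplies nCx. The loop reproduces the two-toss table given earlier in this lesson.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Two tosses of a fair coin: n = 2, p = 0.5
for x in range(3):
    print(x, binomial_pmf(x, 2, 0.5))
# prints 0 0.25, then 1 0.5, then 2 0.25
```

Summing binomial_pmf(x, n, p) over x = 0, 1, ..., n should give 1, which is a useful sanity check when solving problems by hand.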

Descriptive Statistics for a Binomial Distribution

Mean / Median / Mode

The mean is also termed the expected value or the average of the outcomes. Mean = n p, where n = the
total number of trials and p = the probability of success. For example, the expected number of heads in 100 tosses
of a fair coin is 100 × 0.5 = 50.

The median is the middle value of the outcomes when they are sorted (in increasing or decreasing order). There is no
single formula for the median of a binomial distribution. However, several special results have been
established: if np is an integer, then the mean, median, and mode coincide and equal np.

Standard Deviation / Variance

Standard deviation is a measure of the dispersion of a data set about its mean. Dispersion helps you interpret
the variability of the data, i.e. how homogeneous or heterogeneous the data are. In simple terms, it
shows how far the data lie from the mean. The greater the standard deviation, the greater the typical deviation of
each data value from the mean. For a binomial experiment the standard deviation, σ, is

$$\sigma = \sqrt{n \, p \, (1 - p)}$$

The variance is the square of the standard deviation, σ² = n p (1 − p).
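
The closed-form mean and standard deviation can be cross-checked numerically against the definition of expected value. The sketch below is illustrative only (the function names are hypothetical) and uses the 100-toss fair-coin case mentioned above.

```python
from math import comb, sqrt

def binomial_mean_sd(n, p):
    """Closed-form mean n*p and standard deviation sqrt(n*p*(1-p))."""
    return n * p, sqrt(n * p * (1 - p))

def mean_sd_from_pmf(n, p):
    """Same quantities computed directly from the probability distribution."""
    pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    mean = sum(x * f for x, f in enumerate(pmf))
    variance = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))
    return mean, sqrt(variance)

print(binomial_mean_sd(100, 0.5))   # (50.0, 5.0)
print(mean_sd_from_pmf(100, 0.5))   # agrees up to floating-point rounding
```

The second function is slower but shows where the formulas come from: the mean is the probability-weighted average of x, and the variance is the probability-weighted average of the squared deviations from that mean.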

2. Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking) .

A. A coin is tossed 5 times.


a. What is the probability distribution of this binomial experiment?

b. Find the probability that exactly 2 heads will appear.

c. Find the probability that no head will appear.

d. Find the probability that at most 2 heads will appear.


B. In the automobile spare part production of your company, 90% of parts pass final inspection (and 10% fail and
need to be fixed). What are the mean and the standard deviation of the number of parts that will pass in the next 5 inspections?

3) Activity 4: What I Know Chart, Part 2 (2 mins)


You may now fill in the third column of the table in Activity 1 based on what you have learned.
What I Know QUESTION What I Learned

What is a Binomial Experiment?

What is the formula for the Binomial Probability Distribution?

4) Activity 5: Check for Understanding (5 mins).

a. From your binomial experiment game (page 1, introduction) and from the data of 5 of your classmates, fill
in the table below:

P( head comes up)


From your game
From your classmates’ game
A
B
C
D
E
Average


b. From the binomial graphs below, what is P(X = 2)?

i) [Figure: binomial graph i]    ii) [Figure: binomial graph ii]

c. In tossing a die, what is the probability that a non-zero number will appear?

C. LESSON WRAP-UP

1) Activity 6: Thinking about Learning (5 mins)


You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

What are other examples of binomial experiments?


• Throwing a die, wherein you are interested in the probability that an even number will appear.
• Finding the probability of obtaining exactly one defective item from a production batch.



ECE 069: Engineering Data Analysis
Module #10 Student Activity Sheet

Name: _________________________________________________ Class number: _______


Section: ____________ Schedule: __________________________ Date: _______________

Lesson Title: DISCRETE PROBABILITY DISTRIBUTION – POISSON DISTRIBUTION

Materials: Board, marker, and calculator (Casio fx-350EX model)

Lesson Objectives: At the end of this session the students will be able to:
1. define Poisson distribution.
2. solve probability problems on events following Poisson processes.

References:
https://probabilityformula.org/
https://towardsdatascience.com/
https://www.toppr.com/

Productivity Tip:
“In life, people tend to wait for good things to come to them, and by waiting, they miss out.” - Neil Strauss

A. LESSON PREVIEW/REVIEW

1) Introduction (2 mins)

The Poisson distribution is a discrete distribution. It is named after Simeon-Denis Poisson (1781-1840), a
French mathematician, who published its essentials in a paper in 1837.

The Poisson distribution is a limiting case of the binomial distribution: as the number of trials n approaches
infinity and the probability of success p approaches zero, with the mean np = λ held fixed, the
binomial distribution approaches the Poisson distribution. The Poisson distribution models rare events and is asymmetric,
meaning it is always skewed toward the right.
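
The link between the two distributions can be seen numerically. The sketch below is an illustration under stated assumptions: it uses the standard binomial and Poisson probability formulas (the Poisson formula is given in the content notes of this lesson), an arbitrary mean λ = 2, and hypothetical function names. It keeps np = λ fixed while n grows and prints the largest gap between the two sets of probabilities for x = 0 to 10.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Binomial probability nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """Poisson probability lam^x * e^(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)

lam = 2.0  # arbitrary mean chosen for illustration
gaps = []
for n in (10, 100, 10000):
    p = lam / n  # keep the mean n*p fixed at lam
    gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
              for x in range(11))
    gaps.append(gap)
    print(n, gap)  # the gap shrinks as n increases
```

The shrinking gap is why the Poisson formula is often used as a convenient approximation to the binomial when n is large and p is small.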

2) Activity 1: What I Know Chart, part 1 (3 mins)


In the first column of the table below, write what you already know about each question in the second column.

What I Know QUESTION What I Learned

What is a Poisson process?

What is a Poisson distribution?


B. MAIN LESSON

1. Activity 2: Content Notes (13 mins)

An experiment is a Poisson experiment when it possesses the following properties:

• The outcomes of the experiment can be classified as either success or failure.
• The average number of successes within a specified region is known.
• The probability that a success occurs is proportional to the size of the specified region. The
region may be in the form of a volume, area, length, time period, etc.

The Poisson distribution has the following characteristics:

• It is a discrete distribution.
• Each occurrence is independent of the other occurrences.
• It describes discrete occurrences over an interval.
• The number of occurrences in each interval can range from zero to infinity.
• The mean number of occurrences must be constant throughout the experiment.

The Poisson distribution gives the probability of a given number of events occurring in an interval, where the
events are generated by a Poisson process. It is defined by the rate parameter, λ, which is the expected number of
events in the interval and is close to the most probable number of events (the mode is the largest integer not exceeding λ).
Applications of the Poisson distribution can be found in many fields, including:
• Asymptotic Poisson model of seismic risk for large earthquakes.
• The number of decays in a given time interval in a radioactive sample.
• The number of photons emitted in a single laser pulse.
• The number of yeast cells used when brewing Guinness beer. This example was used by William Sealy
Gosset (1876–1937).
• The number of phone calls arriving at a call centre within a minute. This example was described by A.K.
Erlang (1878–1929).
• The number of failures of a machine in one month.

The Poisson Distribution Formula

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

where: x = 0, 1, 2, 3, ...
λ = mean number of occurrences in the interval
e = Euler’s number ≈ 2.71828

Example. A particular river overflows once every 25 years on average. Find the probability that there are
x = 2 overflows in a 25-year interval.
Here, λ = 1 and x = 2, hence,

$$P(X = 2) = \frac{\lambda^{x} e^{-\lambda}}{x!} = \frac{1^{2} e^{-1}}{2!} = 0.1839$$
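
The river example can be verified with a short Python sketch. The function name poisson_pmf is only illustrative; exp and factorial come from the standard math module.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x! for a Poisson mean of lam."""
    return lam**x * exp(-lam) / factorial(x)

# River example: on average 1 overflow per 25-year interval, so lam = 1
p_two_overflows = poisson_pmf(2, 1.0)
print(round(p_two_overflows, 4))  # 0.1839
```

The same function applies to any of the applications listed above once the mean λ for the interval of interest is known.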


Descriptive Statistics for a Poisson Distribution

The mean and the variance of the Poisson distribution are both equal to λ.

Example. Vehicles pass through a junction on a busy road at an average rate of 300 per hour.
a. What is the average number of vehicles passing per minute?
b. What is the probability that no vehicle will pass in a given minute?
c. What is the expected number of vehicles passing in three minutes?
Solution.
a. The average number of vehicles passing per minute is λ = 300 / 60 = 5.
b. Using the formula $P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$ with λ = 5 and x = 0,

$$P(X = 0) = \frac{5^{0} e^{-5}}{0!} = 0.00674$$

c. The expected number of vehicles passing in three minutes = 3·λ = 3·5 = 15.
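
The key step in this example is rescaling the rate to the interval of interest before applying the formula. A minimal sketch of that step, with hypothetical variable names:

```python
from math import exp, factorial

rate_per_hour = 300                      # given: 300 vehicles per hour
lam_per_minute = rate_per_hour / 60      # a. mean per minute = 5.0

# b. probability that no vehicle passes in a given minute (x = 0)
p_none = lam_per_minute**0 * exp(-lam_per_minute) / factorial(0)

# c. the Poisson mean scales with the length of the interval
expected_in_3_minutes = 3 * lam_per_minute

print(lam_per_minute)          # 5.0
print(round(p_none, 5))        # 0.00674
print(expected_in_3_minutes)   # 15.0
```

Using the hourly mean of 300 directly in part b would answer a different question, namely the probability of no vehicles in a whole hour.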

Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking)

1. Twenty cars were examined for defective surface coating. The frequency of cars with a
given number of defective surface coatings per car was as follows:

Number of Defective Surface Coatings, n              0    1    2    3    4    5    6
Frequency, f                                         4    3    5    2    4    1    1
Total Number of Defective Surface Coatings, (n·f)    0    3   10    6   16    5    6

If a car is chosen at random, what is the probability that it has 3 or more defective surface coatings?


2. If electric power failures occur according to a Poisson distribution with an average of 5 failures every 20 weeks, calculate the probability that there will not be more than one failure during a particular week.

3) Activity 4: What I Know Chart, Part 2 (2 mins)


You may now answer the third column of table in activity 1 based on what you know now.
What I Know QUESTION What I Learned

What is a Poisson process?

What is a Poisson distribution?


4) Activity 5: Check for Understanding (5 mins)

A company makes electric motors. The defects of the motors follow a Poisson distribution. The probability
an electric motor is defective is 0.01. What is the probability that a sample of 100 electric motors will contain
exactly 3 defective motors?

C. LESSON WRAP-UP

1) Activity 6: Thinking about Learning (5 mins)


You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

What are some characteristics of the Poisson distribution?


Poisson distributions are unimodal (one peak); exhibit positive skew (that decreases as λ increases); are centred roughly on λ; and have variance (spread) that increases as λ increases.

KEY TO CORRECTION
Activity #3
1)
Total number of defective surface coatings = 0 + 3 + 10 + 6 + 16 + 5 + 6 = 46; hence, λ = 46 / 20 = 2.3.
Using the complement rule,
P (3 or more defective surface coatings) = 1 − P (fewer than 3 defective surface coatings)
where

$$P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)$$

and each term is obtained from the formula $P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$.
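
The complement calculation for the surface coating problem can be sketched in Python as follows. This is an illustrative check under the assumption used in the key (a Poisson model with λ estimated from the frequency table); the variable names are hypothetical.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Return P(X = x) for a Poisson random variable with mean lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


# Frequency table from the problem: number of defects -> number of cars.
frequencies = {0: 4, 1: 3, 2: 5, 3: 2, 4: 4, 5: 1, 6: 1}

total_cars = sum(frequencies.values())                      # 20 cars
total_defects = sum(n * f for n, f in frequencies.items())  # 46 defects
lam = total_defects / total_cars                            # 2.3 defects per car

# Complement rule: P(X >= 3) = 1 - [P(0) + P(1) + P(2)].
p_less_than_3 = sum(poisson_pmf(x, lam) for x in range(3))
p_3_or_more = 1 - p_less_than_3
print(lam, round(p_3_or_more, 3))
```

Under these assumptions the probability of 3 or more defective surface coatings comes out to about 0.404.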



ECE 069: Engineering Data Analysis
Module #11 Student Activity Sheet

Name: _________________________________________________ Class number: _______


Section: ____________ Schedule: __________________________ Date: _______________

Lesson Title: CONTINUOUS PROBABILITY DISTRIBUTION – NORMAL DISTRIBUTION

Materials: Calculator (Casio fx 350EX model), board and pen

Lesson Objectives: At the end of this session the students will be able to:
1. solve probability problems for normally distributed data.

References:
https://www3.nd.edu/~rwilliam/stats1/x21.pdf
https://www.stat.colostate.edu/
https://www.mathsisfun.com/l
https://en.wikipedia.org/wiki/

Productivity Tip: “Hard work keeps the wrinkles out of the mind and spirit.”– Helena Rubinstein

A. LESSON PREVIEW/REVIEW
1) Introduction (2 mins)
The normal distribution, also known as the Gaussian distribution, is the most commonly used probability distribution. A random variable that follows a normal distribution is said to be normally distributed. If you know that a random variable is normally distributed, then you can use the known properties of the normal distribution to calculate the probability that the variable falls within a given range of values. Random variables representing height and intelligence are approximately normally distributed.

2) Activity 1: What I Know Chart, part 1 (3 mins)


Fill in the first column of what you know to answer the questions on the second column of the table below.

What I Know QUESTION What I Learned

What is normal distribution?


What are the properties of a normal


distribution graph?

B. MAIN LESSON

1. Activity 2: Content Notes (13 mins)

The random variable X which has a normal distribution can be represented as

$$X \sim N(\mu, \sigma^{2})$$

where: µ = mean of the distribution
       σ² = variance of the distribution

Characteristics of the Normal Distribution.

• It is bell shaped.
• It is defined by its mean (µ) and its variance (σ²) or its standard deviation (σ).
• The mean, median, and mode coincide at the same point.

The Normal Distribution Curve

The Normal Probability Density Function (PDF)

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$$

where:
x = random variable
µ = mean
σ = standard deviation
e = 2.71828…, a constant
π = 3.1416…, a constant


The Standard Normal Distribution

The standard normal distribution is the normal distribution with mean µ = 0 and standard deviation σ = 1. Substituting these values into the normal probability density function gives the pdf of the standard normal distribution:

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^{2}}{2}}$$

The number of standard deviations from the mean is called the standard score or the z-score. A positive z-score indicates that the raw score is above the mean. For example, a z-score of +1 is 1 standard deviation above the mean. A negative z-score indicates that the raw score is below the mean. For example, a z-score of −2 is 2 standard deviations below the mean.

In terms of the standard score, z,

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^{2}}{2}} \qquad \text{where: } z = \frac{x - \mu}{\sigma}$$

Example 1. The heights of male adults are normally distributed with a mean of 1.7 meter and a standard deviation of 0.20 meter. What are the corresponding standard scores for heights of x1 = 1.4 meter and x2 = 1.6 meter?

Solution: For x1 = 1.4 meter, the corresponding z-score is

$$z = \frac{x - \mu}{\sigma} = \frac{1.4 - 1.7}{0.2} = -1.5$$

For x2 = 1.6 meter, the corresponding z-score is

$$z = \frac{x - \mu}{\sigma} = \frac{1.6 - 1.7}{0.2} = -0.5$$
The z-score allows you to calculate the probability of a score occurring within a standard normal distribution and to compare two scores that come from different samples (which may have different means and standard deviations).
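
The z-score conversion in Example 1 can be sketched in Python. This is a minimal illustration; the helper name `z_score` is an assumption and not part of the module.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Return the number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


# Heights from Example 1: mean 1.7 meter, standard deviation 0.20 meter.
for height in (1.4, 1.6):
    print(height, round(z_score(height, 1.7, 0.20), 2))
```

Both heights give negative z-scores because they lie below the mean.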

The Standard Normal Probability Distribution Curve ( mean , µ = 0 and standard deviation, σ = 1.0 ).
Note that the total area under the probability distribution curve is equal to 1.


From Example 1, suppose there are 400 male adults in the population.

a. What is the probability that a height will be between 1.4 meter and 1.6 meter?
b. How many male adults have heights between 1.4 meter and 1.6 meter?

Solution: Referring to the Areas Under the Normal Curve (Statistical Table),

at z = −1.5, the area is 0.06681 and at z = −0.5, the area is 0.30854.

a) The area between z = −1.5 and z = −0.5 is 0.30854 − 0.06681 = 0.24173. This is also the probability that a male adult has a height between x = 1.4 meter and x = 1.6 meter. Mathematically,

$$P(1.4 \le x \le 1.6) = P(-1.5 \le z \le -0.5) = 0.24173$$

b) The number of male adults having a height between x = 1.4 meter and x = 1.6 meter is 400 × 0.24173 ≈ 97.
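
When a statistical table is not at hand, the areas under the standard normal curve can be computed from the error function. The sketch below is an illustration only; the helper name `standard_normal_cdf` is an assumption, and `math.erf` is part of the Python standard library.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Return the area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mean, std_dev, population = 1.7, 0.20, 400

# Convert the two heights to z-scores, then subtract the left-tail areas.
z_low = (1.4 - mean) / std_dev
z_high = (1.6 - mean) / std_dev
probability = standard_normal_cdf(z_high) - standard_normal_cdf(z_low)
expected_count = population * probability

print(round(probability, 4), round(expected_count))
```

The computed probability is about 0.2417, which corresponds to roughly 97 of the 400 male adults.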

The normal curve below shows the probability distribution of z (in percentage) given its standard deviation.

Source: https://www.mathsisfun.com/data/standard-normal-distribution.html

Example 2: A machine produces electrical components. 99.7 % of the components have lengths between 1.176 cm and 1.224 cm. Assuming the data are normally distributed, what are the mean and the standard deviation?
Solution:

At 99.7 %, z = ±2.97. Since

$$z = \frac{x - \mu}{\sigma} \quad\Rightarrow\quad \mu = x - z\sigma$$

At x = 1.176 and z = −2.97: µ = 1.176 + 2.97σ → eqn. 1

At x = 1.224 and z = 2.97: µ = 1.224 − 2.97σ → eqn. 2

Solve eqn. 1 and eqn. 2 simultaneously for the mean, µ, and the standard deviation, σ. Adding the two equations gives 2µ = 2.400, and subtracting them gives 5.94σ = 0.048.
These equations give the following solution: µ = 1.20 cm; σ = 0.008 cm
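
The same two-equation solution can be sketched in Python. The z-value is obtained here from `statistics.NormalDist().inv_cdf` (available in Python 3.8 and later) instead of the printed table; the variable names are illustrative assumptions.

```python
from statistics import NormalDist

lower, upper = 1.176, 1.224   # limits containing the central 99.7 % (cm)
coverage = 0.997

# The central 99.7 % leaves 0.15 % in each tail, so look up the 99.85th percentile.
z = NormalDist().inv_cdf(0.5 + coverage / 2)

# The limits are symmetric about the mean: lower = mean - z*sigma, upper = mean + z*sigma.
mean = (lower + upper) / 2
std_dev = (upper - lower) / (2 * z)

print(round(z, 2), round(mean, 2), round(std_dev, 3))
```

The result agrees with the worked solution: z is close to 2.97, the mean is 1.20 cm, and the standard deviation is about 0.008 cm.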


2) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking).

Solve the following.


a. A coffee company packs coffee marked as 250 g. A large number of coffee packs were weighed and the
mean and the standard deviation were calculated as 255 grams and 2.5 grams respectively. Assuming
that the data follows a normal distribution, what percentage of the coffee is underweight?

b. A company makes parts for a machine. The lengths of the parts must be within certain limits or they will
be rejected. A large number of parts were measured and the mean and standard deviation were
calculated as 3.1 and 0.005 m respectively. Assuming this data is normally distributed and 99.7 % were
accepted, what are the limits ?


3) Activity 4: What I Know Chart, Part 2 (2 mins)


You may now answer the third column of table in activity 1 based on what you know now.
What I Know QUESTION What I Learned

What is normal distribution ?

What are the properties of a


normal distribution graph?

4) Activity 5: Check for Understanding (5 mins)

Solve the following.


The ages of a population are normally distributed with mean 43 and standard deviation 14. If a town has a population of 5,000, how many people would you expect to be aged 22 to 57?


C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Enumerate at least 2 sets of data which are normally distributed.

FAQs
Many events are normally distributed, or very close to it. For a large sample size, N, the distribution of the sample mean of non-normal random variables approaches the normal distribution.

KEY TO CORRECTION
Activity #3
a)
Given: µ = 255 grams; σ = 2.5 grams; x = 250 grams
Solution:
Solving for the standard score,

$$z = \frac{x - \mu}{\sigma} = \frac{250 - 255}{2.5} = -2.0$$

From the table at z = −2.0, area = 0.02275. Mathematically, P(x < 250) = 0.02275.
Thus, the percentage of coffee packs that are underweight is 2.275 % (0.02275 × 100).

b)
Given: µ = 3.1 meter; σ = 0.005 meter; P(a ≤ x ≤ b) = 0.997

Solution:
At 99.7 % probability of acceptance, z = ±2.97, and

$$z = \frac{x - \mu}{\sigma} \quad\Rightarrow\quad x = \mu + z\sigma \quad \rightarrow \text{eqn. 3}$$

Using eqn. 3, the limits are
x = 3.1 − 2.97(0.005) = 3.085 meter
x = 3.1 + 2.97(0.005) = 3.115 meter


Areas Under the Normal Curve (Statistical Table).



ECE 069: Engineering Data Analysis
Module #12 Student Activity Sheet

Name: _________________________________________________ Class number: _______


Section: ____________ Schedule: __________________________ Date: _______________

Lesson Title: CONTINUOUS PROBABILITY DISTRIBUTION – EXPONENTIAL DISTRIBUTION

Materials: Board, marker, and calculator (Casio fx 350EX model)

Lesson Objectives: At the end of this session the students will be able to:
1. write the formula and plot the exponential density function.
2. solve probability problems on events having an exponential distribution.

References:
https://courses.lumenlearning.com/
https://en.wikipedia.org/wiki/
https://www.researchgate.net/

Productivity Tip: Stay Healthy.

A. LESSON PREVIEW / REVIEW

1) Introduction (2 mins)

When a question concerns the time you need to wait before a given event occurs, and this waiting time is unknown, it is often appropriate to treat it as a random variable having an exponential distribution. The waiting time before an event occurs has an exponential distribution if the probability that the event occurs during a short time interval is proportional to the length of that interval. For example, you may ask, how long will a piece of machinery work without breaking down?

2) Activity 1: What I Know Chart, part 1 (3 mins)


Fill in the first column of what you know to answer the questions on the second column of the table below.

What I Know QUESTION What I Learned

Draw a sketch of an exponential


function with y as the dependent
variable and x as the independent
variable.

What is an exponential distribution?


B. MAIN LESSON

1. Activity 2 : Content Notes (13 mins)

The random variable X which has an exponential distribution can be represented as,

X ~ Exponential (λ) where: λ - rate parameter, a constant

The exponential distribution refers to the probability distribution that is used to define the time between two
successive events that occur independently and continuously at a constant average rate. Here, the
exponential random variable has fewer large values and more small values.

The assumption of a constant rate is rarely satisfied in real-world scenarios. However, if the time interval is selected in such a way that the rate is roughly constant, then you can approximate the random variable as following an exponential distribution.

The Exponential Probability Density Function

The exponential probability density function (pdf) is

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{for } x \ge 0 \\ 0 & \text{for } x < 0 \end{cases}$$

where: X is a non-negative continuous random variable
       λ = the rate parameter (a constant)

and the probability that X is at most x is

$$P(X \le x) = \int_{0}^{x} \lambda e^{-\lambda t}\, dt = 1 - e^{-\lambda x}$$

Sample plot of the exponential probability density function, f (x) vs. x.


The mean, median and the variance of the exponential random variable, X.

The mean or the expected value of X is

$$E(X) = \text{mean} = \frac{1}{\lambda}; \quad \text{hence, } \lambda = \frac{1}{\text{mean}}$$

the median is

$$\text{median}(X) = \frac{\ln(2)}{\lambda}$$

and the variance of X is

$$\text{var}(X) = \frac{1}{\lambda^{2}}$$

Sample Problem. On the average, a certain computer has a lifetime of 10 years. Assume that the life of the computer is exponentially distributed.

a. Plot the probability density function, f (x) versus x.

You may use Excel to plot $f(x) = \lambda e^{-\lambda x}$ versus x, with $\lambda = \frac{1}{\text{mean}} = \frac{1}{10} = 0.10$.

b. What is the probability that a computer has a life of less than 7 years?

Let X be the random variable representing the life of the computer. Here, $\lambda = \frac{1}{10} = 0.10$.

$$P(X < 7) = \int_{0}^{7} 0.10\, e^{-0.10x}\, dx = \left[-e^{-0.10x}\right]_{0}^{7} = 1 - e^{-0.7} = 1 - 0.497 = 0.503$$

This probability is the shaded area under the pdf from x = 0 to x = 7.


c. What is the probability that a computer has a life of more than 10 years?

Using the complement, P (X > 10) = 1 − P (X ≤ 10), where

$$P(X \le 10) = \int_{0}^{10} 0.10\, e^{-0.10x}\, dx = \left[-e^{-0.10x}\right]_{0}^{10} = 1 - e^{-1.0} = 1 - 0.368 = 0.632$$

hence, P (X > 10) = 1 − 0.632 = 0.368

d. What is the probability that a computer has a life of more than 7 years but less than 10 years?

Note that from b) P (X < 7) = 0.503 and from c) P (X < 10) = 0.632, so

$$P(7 < X < 10) = P(X < 10) - P(X < 7) = 0.632 - 0.503 = 0.129$$

e. What is the median of X?

$$\text{median}(X) = \frac{\ln(2)}{\lambda} = \frac{\ln(2)}{0.10} = 6.93$$

f. What is the variance of X?

$$\text{var}(X) = \frac{1}{\lambda^{2}} = \frac{1}{(0.10)^{2}} = 100$$
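
Parts b to f of the sample problem can be checked with a short Python sketch. It is an illustration only; the helper name `exponential_cdf` and the variable names are assumptions.

```python
import math


def exponential_cdf(x: float, rate: float) -> float:
    """Return P(X <= x) for an exponential random variable with the given rate."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


rate = 1 / 10  # mean lifetime of 10 years, so rate = 1 / mean

p_less_than_7 = exponential_cdf(7, rate)                           # part b
p_more_than_10 = 1 - exponential_cdf(10, rate)                     # part c
p_between = exponential_cdf(10, rate) - exponential_cdf(7, rate)   # part d
median = math.log(2) / rate                                        # part e
variance = 1 / rate ** 2                                           # part f

print(round(p_less_than_7, 3), round(p_more_than_10, 3), round(p_between, 3))
print(round(median, 2), round(variance, 1))
```

Because the exponential distribution is continuous, P(X < x) and P(X ≤ x) are equal, so the same function serves both cases.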

2) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking)
Problem. Suppose that the lifetime (x) of a certain model of car battery follows an exponential distribution with a mean life of 5 years.
a. What is the probability distribution of the life of the car battery?
b. Plot the probability density function, f(x) versus the lifetime of the car battery (x).
c. What is the probability that the life of the battery will be greater than 2 years?
d. What is the probability that the life of the battery is greater than 2 years but less than 4 years?
e. What is var(x)?


3) Activity 4: What I Know Chart, Part 2 (2 mins)


You may now answer the third column of table in activity 1 based on what you know now.
What I Know QUESTION What I Learned

Draw a sketch of an exponential


function with y as the dependent
variable and x as the independent
variable.

What is an exponential distribution?

4) Activity 5: Check for Understanding (5 mins)

MULTIPLE CHOICE. Encircle the correct answer.

1. The mean of the exponential distribution is . . .

a. $\frac{1}{\lambda}$    b. $\frac{1}{\lambda^{2}}$    c. $\lambda$    d. $\lambda^{2}$

2. The variance of the exponential distribution is . . .

a. $\frac{1}{\lambda}$    b. $\frac{1}{\lambda^{2}}$    c. $\lambda$    d. $\lambda^{2}$

3. A conversation follows an exponential distribution, $f(x) = \lambda e^{-\lambda x}$, with a mean time of 3 minutes.
a) Find the probability that the conversation will be more than 5 minutes.

e 5
5
1 3 
5

a.
e b. e 15 c. e 3
d.
3 3


b) Find the probability that the conversation will be less than 5 minutes.

e 5
5
1  
5

a.
1 e 3 b. 1  e 15 c. 1 e 3
d.
1
3 3

C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

What are some events having an exponential distribution?

Answer. The amount of time (beginning now) until an earthquake occurs; the amount of time, in months,
a car battery lasts. The exponential distribution is widely used in the study of the amount of time a
product lasts (field of reliability).

KEY TO CORRECTION
Activity #3
Solution:

a. From the probability density function for exponential random variables, $f(x) = \lambda e^{-\lambda x}$, and for this model, $\lambda = \frac{1}{\text{mean}} = \frac{1}{5} = 0.20$;

hence, the probability density function of the battery life is $f(x) = 0.20\, e^{-0.2x}$.

b. Plot of $f(x) = 0.20\, e^{-0.2x}$ versus x.



ECE 069: Engineering Data Analysis
Module #13 Student Activity Sheet

Name: _________________________________________________ Class number: _______


Section: ____________ Schedule: __________________________ Date: _______________

Lesson Title: SAMPLING DISTRIBUTION AND POINT ESTIMATION

Materials: Board, marker, and calculator (Casio fx 350EX model)

Lesson Objectives: At the end of this session the students will be able to:
1. define point estimation, discuss the properties of good estimators and give examples of point estimates.
2. state and discuss the Central Limit Theorem.

References:
http://www.buders.com/
https://www.statisticshowto.com/
https://corporatefinanceinstitute.com/

Productivity Tip: “Make each day your masterpiece.” – John Wooden

A. LESSON PREVIEW / REVIEW

1) Introduction (2 mins)

When a parameter is being estimated, the estimate can be either a single number or it can be a range of
scores. When the estimate is a single number, the estimate is called a "point estimate"; when the estimate is
a range of scores, the estimate is called an interval estimate. Confidence intervals are used for interval
estimates.

2) Activity 1: What I Know Chart, part 1 (3 mins)

Fill in the first column of what you know to answer the questions on the second column of the table below.

What I Know QUESTION What I Learned

What is sampling distribution?

What is a point estimate ?


B. MAIN LESSON

1. Activity 2: Content Notes (13 mins)

• ESTIMATION refers to the process by which one makes inferences about a population based on information obtained from a sample.
• STATISTIC refers to any measurable quantity calculated from the sample. A statistic could be the sample mean, x̄; the sample standard deviation, s; the sample variance, s²; . . .
• PARAMETER refers to a descriptive measure of the population. For example, the population mean, µ; the population standard deviation, σ; the population variance, σ²; . . .
• ESTIMATOR is a quantity calculated from the sample data that is used to give information about an unknown quantity in the population. For example, the sample mean, x̄, is an estimator of the population mean, µ.
• ESTIMATE is the particular value of an estimator that is obtained from a particular sample of data and used to estimate the value of the parameter. In the preceding example, if the sample mean is x̄ = 3.5, then 3.5 is the estimate of the parameter, the population mean, µ.

Point Estimator / Point Estimate

A point estimate of a population parameter θ is a single numerical value θ̂ of a statistic. The statistic is called the point estimator and the value of the point estimator is the point estimate.

For Normal Distribution
                       Population Parameter    Point Estimator (Sample Statistic)
Mean                   µ                       x̄
Variance               σ²                      s²
Standard Deviation     σ                       s

Sampling Distribution

The distribution of the point estimator (statistic) is termed the sampling distribution.

Let each of the independent random variables X1, X2, ..., Xn be normally distributed with mean µ and variance σ².

• The sample mean: $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
• The mean of the sampling distribution: $\mu_{\bar{X}} = \frac{\mu + \mu + \cdots + \mu}{n} = \frac{n\mu}{n} = \mu$
• The variance of the sampling distribution: $\sigma^{2}_{\bar{X}} = \frac{\sigma^{2} + \sigma^{2} + \cdots + \sigma^{2}}{n^{2}} = \frac{n\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n}$
• The sampling distribution of $\bar{X}$ is $\bar{X} \sim N\left(\mu, \frac{\sigma^{2}}{n}\right)$.
Example. An electronics company manufactures resistors that have a mean resistance of 120 ohms and a standard deviation of 12 ohms. If the distribution of the resistance is normal, find the mean, the variance and the standard deviation of the sampling distribution for n = 25 resistors.

The mean of the sampling distribution: $\mu_{\bar{X}} = \mu = 120$

The variance of the sampling distribution: $\sigma^{2}_{\bar{X}} = \frac{\sigma^{2}}{n} = \frac{12^{2}}{25} = 5.76$

The standard deviation of the sampling distribution: $\sigma_{\bar{X}} = \sqrt{\sigma^{2}_{\bar{X}}} = 2.4$
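
The resistor example can be reproduced with a few lines of Python. This is a minimal sketch; the function name `sampling_distribution_of_mean` is a hypothetical helper introduced for illustration.

```python
import math


def sampling_distribution_of_mean(mu: float, sigma: float, n: int):
    """Return (mean, variance, standard deviation) of the sample mean for sample size n."""
    variance = sigma ** 2 / n
    return mu, variance, math.sqrt(variance)


# Resistors: population mean 120 ohms, standard deviation 12 ohms, n = 25.
mean, variance, std_dev = sampling_distribution_of_mean(120, 12, 25)
print(mean, round(variance, 2), round(std_dev, 2))
```

Increasing n in the call shows how the spread of the sample mean shrinks as the sample size grows.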

The Central Limit Theorem

The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger, even when the distribution of the population is unknown or not normal. As a rule of thumb, the approximation is usually considered adequate for sample sizes over 30.
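
A small simulation can illustrate the theorem. The sketch below is an illustration under stated assumptions: the population is taken to be exponential with mean 10 (clearly non-normal), the sample size is 40, and 5,000 samples are drawn with a fixed seed. None of these choices come from the module.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

rate = 0.10          # assumed exponential population: mean 10, standard deviation 10
n = 40               # size of each sample
num_samples = 5000   # number of samples drawn

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean(random.expovariate(rate) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Theory predicts a mean near 10 and a standard deviation near 10 / sqrt(40).
print(round(mean_of_means, 2), round(sd_of_means, 2))
```

A histogram of `sample_means` would look roughly bell shaped even though the individual lifetimes are strongly skewed.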

Properties of Point Estimators


The following are the main characteristics of point estimators:
1. Bias
Bias is the difference between the expected value (the average or mean value) of a point estimator, θ̂, and the value of the parameter, θ, being estimated.

$$\text{Bias} = E(\hat{\theta}) - \theta$$

A good estimator has a small bias. When the bias is zero, the point estimator is said to be unbiased.
2. Consistency
Consistency describes how close the point estimator gets to the value of the parameter as the sample size increases. A consistent estimator satisfies

$$E(\hat{\theta}) \to \theta \quad \text{and} \quad \text{Var}(\hat{\theta}) \to 0 \quad \text{as } n \to \infty$$


3. Relative Efficiency

The absolute efficiency of an estimator is the ratio between the minimum variance and the actual
variance.

An unbiased estimator is called efficient if its variance coincides with the minimum variance for all
values of the population parameter.

If two competing estimators are both unbiased, the one with the smaller variance (for a given sample size) is said to be relatively more efficient. That is, an estimator θ̂1 is said to be more efficient than another estimator θ̂2 of θ if the variance of the first is less than the variance of the second.

Standard Error

Standard error is a measure of accuracy of a statistic. This is equal to the standard deviation of the
sampling distribution of this statistic.

The standard error tells you how accurate the mean of any given sample from that population is likely to be compared to the true population mean. When the standard error increases, i.e. the sample means are more spread out, it becomes more likely that any given sample mean is an inaccurate representation of the true population mean.

$$SE = \frac{\sigma}{\sqrt{n}}$$

where: SE = standard error of the sample mean
       σ = standard deviation of the population
       n = sample size

Example. In a certain property investment company with an international presence, workers have a mean hourly wage of 125 pesos with a population standard deviation of 5 pesos. Given a sample size of 30, estimate and interpret the SE of the sample mean.

$$SE = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{30}} = 0.91$$

Interpretation. If we draw several samples of size 30 from the population, the sample means will center on a mean hourly wage of 125 pesos with a standard error of about 0.91 pesos.
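
The standard error for the wage example can be evaluated directly. The following is a minimal Python sketch; the function name `standard_error` is an illustrative assumption.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Return the standard error of the sample mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# Wage example: population standard deviation of 5 pesos, sample size of 30.
se = standard_error(5, 30)
print(round(se, 3))  # 0.913
```

Calling the function with a larger n, for example `standard_error(5, 120)`, shows that quadrupling the sample size halves the standard error.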


2) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking).
Multiple Choice. Encircle the best answer.
1. A sampling distribution is the probability distribution of which of the following?
a. sample b. sample statistic c. population d. population parameter
2. What is the best description of a point estimate?
a. any value from the sample to estimate a parameter.
b. a sample statistic used to estimate a parameter.
c. the margin of error to estimate the parameter.
d. the population mean.
3. What does the central limit theorem state?
a. If the sample size increases, then the sampling distribution must approach a normal distribution.
b. If the sample size decreases, then the sampling distribution must approach a normal distribution.
c. If the sample size increases, then the sampling distribution must approach an exponential distribution.
d. If the sample size decreases, then the sampling distribution must approach an exponential distribution.
4. The difference between the expected value of the estimator and the value of the parameter being estimated is the . . .
a. bias b. error c. contradiction d. difference
5. A random sample of 100 engineering students is asked how much they spend on a meal during weekdays. The average amount spent is found to be P70. What is the point estimate of the population mean?
a. P 100 b. P 90 c. P 80 d. P 70
6. Which of the following statements applies to a point estimate?
a. The point estimate is a parameter.
b. The point estimate will tend to be accurate if the sample size exceeds 30 for non-normal
populations.
c. The point estimate is subject to sampling error and will almost always be different than the
population value.
d. all of the above
7. In an application to estimate the mean number of kilometers students commute to school each day, the
following are given: n = 20; x  4.33 ; s  3.50
The point estimate for the true population mean is:
a. 1.638 b. 4.33 ± 1.638 c. 4.33 d. 3.50
8. s is the point estimate for the . . .
a. population variance c. sample variance
b. population standard deviation d. sample standard deviation
9. A random sample of 340 people in Carmen showed that 66 listened to an FM radio Station A. Based on
this sample information, what is the point estimate for the proportion of people in Carmen that listen to
Station A?
a. 340 b. 0.194 c. 66 d. 0.66
10. According to the Central Limit Theorem:
a. A sampling distribution is normally distributed even if the population is not.
b. A sampling distribution can be normally distributed only if the population is normally distributed also.
c. The population mean measures the sample mean.
d. The population mean and the sample mean of the distribution are equal.


3) Activity 4: What I Know Chart, Part 2 (2 mins)


You may now answer the third column of table in activity 1 based on what you know now.
What I Know QUESTION What I Learned

What is sampling distribution?

What is a point estimate?

4) Activity 5: Check for Understanding (5 mins)


A study was recently conducted to estimate the mean cholesterol for adult males over the age of 50 years.
The following random sample data were observed:

245 304 135 300 202


196 210 188 390 256

Given the information above, what is the point estimate for the population mean?


C. LESSON WRAP-UP

1) Activity 6: Thinking about Learning (5 mins)


You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

The point estimate of a population parameter is obtained from the sample. The smaller the bias of the point
estimator and the larger the sample size, the closer the estimate tends to be to the parameter being estimated.

KEY TO CORRECTION
Activity #3
1) b
2) b
3) a
4) a
5) d
6) c
7) c
8) b
9) b
10) a



ECE 069: Engineering Data Analysis
Module #14 Student Activity Sheet

Name: _________________________________________________ Class number: _______


Section: ____________ Schedule: __________________________ Date: _______________

Lesson Title: STATISTICAL INTERVALS

Materials:
Board, marker, and calculator (Casio fx-350EX model)

Lesson Objectives: At the end of this session the students will be able to:
1. solve for the confidence interval from normally distributed data.

References:
http://www.buders.com/
https://www.statisticshowto.com/
http://www.bo.astro.it/
https://statisticsbyjim.com/

Productivity Tip: Stay Healthy.

A. LESSON PREVIEW / REVIEW

1) Introduction (2 mins)

When a parameter is being estimated, the estimate can be either a single number or a range of
values. When the estimate is a single number, it is called a point estimate; when the estimate
is a range of values, it is called an interval estimate.

2) Activity 1: What I Know Chart, part 1 (3 mins)


Fill in the first column of what you know to answer the questions on the second column of the table below.

What I Know QUESTION What I Learned

What is interval estimation?

What is a confidence interval?


B. MAIN LESSON

1. Activity 2: Content Notes (13 mins)

Forms of Interval Estimation

⮚ Confidence Intervals
⮚ Prediction Intervals
⮚ Tolerance Intervals

Confidence Interval

● A confidence interval is a range of values that is likely to contain the population parameter (here, the
population mean).
● Confidence intervals are the best known and most often used statistical intervals. They are used to express the
uncertainty associated with the estimate of a population parameter. The confidence level describes how the
procedure behaves under repetition: if the sampling and the interval estimation were repeated again and
again, the stated percentage of the resulting intervals (for example, 95%) would contain the true
parameter. Confidence levels are expressed as percentages.

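For Normal Distribution

● If σ and the sample mean are known:

Confidence Interval: $\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

Note: Margin of Error, MOE = $z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

Standard Error = $\frac{\sigma}{\sqrt{n}}$

Lower boundary of the confidence interval: $\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

Upper boundary of the confidence interval: $\bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$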


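● If s and the sample mean are known:

Confidence Interval: $\bar{x} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

Note: Margin of Error, MOE = $t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

Lower boundary of the confidence interval: $\bar{x} - t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

Upper boundary of the confidence interval: $\bar{x} + t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$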

Example 1. We have a sample of 20 observations from a Normal distribution with a standard deviation of
σ = 0.20 and a sample mean of 4.5. We want a 95% confidence level. What are the lower and upper
boundaries of the confidence interval?
● From the z – score table, at the 95% confidence level, α = 0.05 and the corresponding value of $z_{\alpha/2}$ is 1.96.

● Lower boundary of the confidence interval:

$\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 4.5 - 1.96 \cdot \frac{0.20}{\sqrt{20}} \approx 4.41$

● Upper boundary of the confidence interval:

$\bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 4.5 + 1.96 \cdot \frac{0.20}{\sqrt{20}} \approx 4.59$
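The calculation in Example 1 can be checked with a short Python sketch. The function name `z_confidence_interval` is made up for illustration; it assumes that σ is known and that the data are normally distributed, and it takes the z value from Python's standard library instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for a z-based confidence interval of the mean."""
    alpha = 1 - confidence
    # z_{alpha/2}: the value that leaves alpha/2 in the upper tail
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin_of_error = z * sigma / sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error


# Values from Example 1: n = 20, sigma = 0.20, sample mean = 4.5
lower, upper = z_confidence_interval(4.5, 0.20, 20)
print(round(lower, 2), round(upper, 2))  # 4.41 4.59
```

Rounded to one decimal place, the boundaries are 4.4 and 4.6.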


Example 2. The sample mean result is 25%. Calculate the confidence interval for this estimate if the margin
of error is 3.2%.
● Lower boundary of the confidence interval:
$\bar{x} - \text{MOE} = 25\% - 3.2\% = 21.8\%$
● Upper boundary of the confidence interval:
$\bar{x} + \text{MOE} = 25\% + 3.2\% = 28.2\%$

Therefore, the confidence interval is (21.8% to 28.2%).

Prediction Intervals
A prediction interval is an estimate of an interval in which a future observation will fall, with a certain probability,
given what has already been observed. For example, for a 95% prediction interval of [10, 15], you are 95%
confident that the next new observation will fall within this range.

Tolerance Intervals
A tolerance interval covers a specified proportion of the population for a given confidence level.
For example, we can state with 95% confidence that 85% of the batteries have lifetimes within the interval
100 to 120 hours.

2. Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking).

Problem Solving. The quality assurance (QA) manager of a light bulb factory needs to estimate the average
lifetime of a large shipment of bulbs made at the factory. The lifetime of these light bulbs is normally
distributed with a standard deviation of 100 hours. A random sample of 64 bulbs from the shipment results
in a sample mean lifetime of 350 hours.
Given: σ = 100 hrs.; $\bar{x}$ = 350 hrs.; n = 64

(a) Find a 95% confidence interval for the mean lifetime (µ) for the entire shipment.


(b) Suppose that the standard deviation was 80 rather than 100 hours. Recalculate your confidence interval
from Part (a). Is it narrower or wider than your solution to (a)?

3) Activity 4: What I Know Chart, Part 2 (2 mins)

You may now answer the third column of table in activity 1 based on what you know now.
What I Know QUESTION What I Learned

What is interval estimation?

What is a confidence interval?


4) Activity 5: Check for Understanding (5 mins)

Problem Solving. A bottling company fills thousands of 12-ounce bottles with soda drink at the same level.
A random sample of bottles was taken from the processing line, containing the following amounts of soda
drink (in ounces): 11.8; 12.1; 11.2; 12.0; 11.8; 11.7; 11.9. Assuming the content is
normally distributed with a standard deviation of 0.01, find the 95% confidence interval for the mean amount of
soda drink in the bottles.

C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

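● Increasing the sample size decreases the width of confidence intervals, because it decreases the
standard error.
● The statement "the 95% confidence interval for the population mean is (250, 300)" means that the interval
was produced by a method that captures the population mean in 95% of repeated samples. It is often loosely
restated as "there is a 95% probability that the population mean is between 250 and 300," but the population
mean is a fixed value, and the 95% describes the method.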

FAQs
Which is more accurate, a 95% confidence interval or a 99% confidence interval?
A 99% confidence interval is more likely to contain the population parameter than a 95% confidence interval,
but it is also wider, so it is less precise.
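The two effects on interval width, sample size and confidence level, can be seen side by side in a small Python sketch. The value σ = 10 and the sample sizes below are hypothetical numbers chosen only for illustration, and `ci_width` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sigma, n, confidence):
    """Full width of a z-based confidence interval: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)


SIGMA = 10.0  # hypothetical population standard deviation

# A larger n gives a narrower interval; a higher confidence level gives a wider one.
for n in (25, 100, 400):
    width_95 = ci_width(SIGMA, n, 0.95)
    width_99 = ci_width(SIGMA, n, 0.99)
    print(n, round(width_95, 2), round(width_99, 2))
```

Each time the sample size is multiplied by four, the width is cut in half, because the standard error is proportional to 1/√n.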



ECE 069: Engineering Data Analysis
Module #15 Student Activity Sheet

Name: _________________________________________________ Class number: _______


Section: ____________ Schedule: __________________________ Date: _______________

Lesson Title: INTRODUCTION TO HYPOTHESIS TESTING – ONE SAMPLE Z – TEST

Materials:
Board, marker, and calculator (Casio fx-350EX model)

Lesson Objective: At the end of this session the students will be able to:
1. apply the one-sample z – test to sample data.

References:
https://opentextbc.ca/
https://statisticsbyjim.com/
https://blog.minitab.com/
https://support.minitab.com/
https://www.superprof.co.uk/

Productivity Tip: Stay Healthy.

A. LESSON PREVIEW / REVIEW

1) Introduction (2 mins)

When you conduct research, you are trying to discover something new: an improved process, perhaps, or a
more accessible raw material. Along the way, several questions will come up, and you will form
hypotheses to answer them. This session will guide you on how to test your
hypotheses.

2) Activity 1: What I Know Chart, part 1 (3 mins)

Fill in the first column of what you know to answer the questions on the second column of the table below.

What I Know QUESTION What I Learned

What is hypothesis testing?

What is a z – test?


B. MAIN LESSON

1. Activity 2: Content Notes (13 mins)

▪ HYPOTHESIS TESTING
● Hypothesis testing is a statistical procedure used to determine whether a hypothesis suggested by the
sample data holds true for the entire population or not.
● Hypothesis testing is also used when you are comparing two or more groups.
● The purpose of hypothesis testing is to determine whether there is enough statistical evidence in
favor of a hypothesis about a parameter.
● A hypothesis should be simple and specific. There are two types of statistical hypotheses: the null
hypothesis and the alternative hypothesis. The null and alternative hypotheses are contradictory.
Since they are contradictory, you must examine the evidence to decide whether you have enough of it to
reject the null hypothesis or not.

▪ NULL HYPOTHESIS
● Denoted as Ho. Ho always contains a symbol with an equality in it (=, ≤, or ≥).
● A statement that there is no relationship between two measured phenomena or no association
among groups.
● A null hypothesis is a hypothesis that says there is no statistically significant relationship between the two
variables in the hypothesis. It is the hypothesis that the researcher is trying to disprove.
● It is a statement of no difference between sample means or proportions. It may also be a statement
of no difference between a sample mean and a population mean. In other words, the difference
equals 0.

▪ ALTERNATIVE HYPOTHESIS
● Denoted as H1.
● It is a claim about the population that is contradictory to Ho.
● H1 never contains a symbol with an equality in it (it uses ≠, >, or <).
● The hypothesis that one is trying to establish; it can be “statistically proved” by a rejection of
the null hypothesis.

Example. Write the null hypothesis and the alternative hypothesis in the following statements.

1. Employees have a mean paid vacation of 4 weeks per year.

Null Hypothesis, Ho: µ = 4 weeks per year
Alternative Hypothesis, H1: µ ≠ 4 weeks per year

2. The mean number of cars a person owns in his lifetime is not more than 5.
Null Hypothesis, Ho: µ ≤ 5 cars
Alternative Hypothesis, H1: µ > 5 cars

3. Seventy percent of the first year engineering students have no failing grades this school
year.
Null Hypothesis, Ho: p = 0.70
Alternative Hypothesis, H1: p ≠ 0.70

4. At most 85% of the people voted in the latest local elections.

Null Hypothesis, Ho: p ≤ 0.85
Alternative Hypothesis, H1: p > 0.85

▪ LEVEL OF SIGNIFICANCE
● Denoted by α.
● Measures the strength of the evidence that must be present in your sample before rejecting the null
hypothesis.
● It is the probability of rejecting the null hypothesis when in fact it is true, that is, P(Type I error) = α.
● Usual values of α are 0.05, 0.02, or 0.01.

▪ TEST STATISTIC
● A test statistic is a random variable that is calculated from sample data and used in a hypothesis
test.
● The test statistic is used to determine whether to reject the null hypothesis. It compares
your data with what is expected under the null hypothesis.
● The test statistic is used to calculate the p-value.
● Examples of test statistics are the z – statistic for the z – test and the t – statistic for the t – test.

▪ TYPES OF ERRORS IN HYPOTHESIS TESTING

● Type I Error
o Occurs when the null hypothesis is true but it is rejected.
o Its probability is denoted by α.
● Type II Error
o Occurs when the null hypothesis is false but it is not rejected.
o Its probability is denoted by β.

                              TRUTH ABOUT THE POPULATION
Decision Based on Sample      Ho is true            H1 is true
Reject Ho                     Type I Error          Correct Decision
Do not reject Ho              Correct Decision      Type II Error

▪ P-VALUE
● The probability of obtaining a sample result at least as extreme as the one observed, given
that the null hypothesis is true.
● A p-value of 0.05 indicates that you have only a 5% chance of drawing a sample at least as extreme as
the one tested if the null hypothesis were actually true.
● If the p-value is less than the significance level, we reject the null hypothesis.
● The p – value is the area under the curve in the tail beyond the observed value of the test statistic.
Example. If the observed value of z = 1.51 (calculated value), then from the statistical table the area to the
left of z = 1.51 is 0.93448, so the p – value of the sample (right-tailed test) is 1 − 0.93448 = 0.06552.
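The table lookup in this example can be reproduced with Python's standard library. This is a minimal sketch; the helper name `p_value_from_z` and its `tail` argument are illustrative choices, not part of the lesson.

```python
from statistics import NormalDist


def p_value_from_z(z, tail="right"):
    """p-value of an observed z statistic under the standard normal curve."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return 1 - std_normal.cdf(z)      # area to the right of z
    if tail == "left":
        return std_normal.cdf(z)          # area to the left of z
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))  # both tails
    raise ValueError("tail must be 'right', 'left', or 'two'")


# Observed z = 1.51, right-tailed test
print(round(NormalDist().cdf(1.51), 4))   # 0.9345 (area to the left)
print(round(p_value_from_z(1.51), 4))     # 0.0655
```

At α = 0.05 this p-value is larger than α, so the null hypothesis would not be rejected in a right-tailed test.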


Statistical significance plays a pivotal role in statistical hypothesis testing. It is used to determine
whether the null hypothesis should be rejected or retained. The null hypothesis is the default
assumption that nothing happened or changed.

For the null hypothesis to be rejected, an observed result has to be statistically significant, i.e. the
observed p-value is less than the pre-specified significance level α.

Source : https://www.google.com/

▪ STEPS IN HYPOTHESIS TESTING

Step 1: Specify the null hypothesis and the alternative hypothesis.
Step 2: Set the level of significance (α). Indicate the critical value of the test statistic at this level
of significance.
Step 3: Calculate the test statistic and the corresponding p – value.
● Critical Value Approach
o If H1 contains “>”, conduct a right-tailed test. Compare the calculated test statistic
with the critical value of the test statistic at the given α. If the calculated test statistic < critical
value of the test statistic, then you do not reject the null hypothesis; otherwise, you reject it.

o If H1 contains “<”, conduct a left-tailed test. Compare the calculated test statistic
with the (negative) critical value of the test statistic at the given α. If the calculated test statistic >
critical value of the test statistic, then you do not reject the null hypothesis; otherwise, you reject it.

o If H1 contains “≠”, conduct a two-tailed test. Compare the calculated test statistic with
the negative and positive critical values of the test statistic at α/2. If the calculated test statistic
lies between the two critical values, then you do not reject the null hypothesis. If it is less than the
negative critical value or greater than the positive critical value, then you reject the null
hypothesis.

● P – value Approach
o Compare the p – value with α. If p – value < α, then you reject the null hypothesis; otherwise, you do
not reject it.

Step 4: Draw a conclusion: state whether the null hypothesis is rejected or not rejected.
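The critical value approach in Step 3 can be sketched in Python for a z – test. The function names `z_critical` and `reject_null` are hypothetical, and the critical values come from the standard normal distribution in Python's standard library, not from a printed table.

```python
from statistics import NormalDist


def z_critical(alpha, tail):
    """Critical z value for a right-, left-, or two-tailed test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha)          # a negative number
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)  # use as +/- critical value
    raise ValueError("tail must be 'right', 'left', or 'two'")


def reject_null(z, alpha, tail):
    """Return True when the calculated z falls in the rejection region."""
    crit = z_critical(alpha, tail)
    if tail == "right":
        return z > crit
    if tail == "left":
        return z < crit
    return abs(z) > crit  # two-tailed: beyond either critical value


print(round(z_critical(0.05, "right"), 3))  # 1.645
print(round(z_critical(0.05, "two"), 2))    # 1.96
print(reject_null(1.5, 0.05, "right"))      # False
```

A calculated statistic that falls outside the rejection region leads to "do not reject Ho", which is not the same as proving that Ho is true.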

One Sample Z – test

● The z – test is a statistical test in which the normal distribution is applied. It is basically used for
problems involving large samples (n ≥ 30).

Assumptions of One – Sample z – test

● The data are continuous.
● The data follow the normal probability distribution.
● The sample is a simple random sample from its population. Each individual in the population has an
equal probability of being selected in the sample.
● The population standard deviation is known.

Example of One Sample z – test.


Source : https://www.superprof.co.uk/

Problem. A manufacturer of electric lamps is testing a new production method that will be considered
acceptable if the lamps produced by this method result in a normal population with an average life of 2,400
hours and a standard deviation equal to 300. A sample of 100 lamps produced by this method has an average
life of 2,320 hours. Can the hypothesis of validity for the new manufacturing process be accepted with a risk
equal to or less than 5%?
Step 1. State the Null Hypothesis and the Alternative Hypothesis
Null Hypothesis: Ho: µ = 2,400 hours
Alternative Hypothesis: H1: µ ≠ 2,400 hours
Step 2. Level of Significance, α = 0.05
Step 3. Calculate the test statistic, z – statistic.


$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{2{,}320 - 2{,}400}{300 / \sqrt{100}} = -2.67$ (note: $\sigma / \sqrt{n}$ is called the standard error of the statistic)

Step 4. The calculated z – statistic (−2.67) is less than the lower critical value of the z statistic at α/2 (−1.96), so
there’s enough evidence to reject the null hypothesis in favor of the alternative hypothesis. The test
is statistically significant, suggesting that the mean life of these electric lamps is not equal to the 2,400
hours required for the method to be considered acceptable.

Further, at z = −2.67, from the statistical table, the tail area is 0.0038, so the two-tailed p – value of the
sample is 2 × 0.0038 = 0.0076. Since the p – value is less than α (0.05),
we reject the null hypothesis.
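The lamp problem can be worked through in Python as a check on the hand calculation. This is a sketch under the assumptions of the one-sample z – test listed earlier (known σ, normal population); the function name `one_sample_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return the z statistic and the two-tailed p-value."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-tailed p-value: area in both tails beyond |z|
    p_two_tailed = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_tailed


# Values from the problem: sample mean 2,320; mu0 = 2,400; sigma = 300; n = 100
z, p = one_sample_z_test(2320, 2400, 300, 100)
ALPHA = 0.05

print(round(z, 2))  # -2.67
print(p < ALPHA)    # True, so Ho is rejected at the 5% level
```

The computed p-value is slightly different from a table-based value because the table is read at z rounded to two decimal places.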

2. Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking) .

A rental car company claims the mean time to rent a car on their website is 60 seconds with a standard
deviation of 30 seconds. A random sample of 36 customers attempted to rent a car on the website. The
sample mean time to rent was 70 seconds. Is there enough evidence that the mean time to rent is more
than 60 seconds at the 0.05 level of significance?


3. Activity 4: What I Know Chart, Part 2 (2 mins)


You may now answer the third column of table in activity 1 based on what you know now.
What I Know QUESTION What I Learned

What is hypothesis testing?

What is a z – test ?

4. Activity 5: Check for Understanding (5 mins)

MULTIPLE CHOICE. Encircle the correct answer.


1. A test is conducted; Ho: µ = 20, the standard deviation is σ = 3. A sample of 36 has $\bar{x} = 21$. What is
the value of the z - test statistic?
a. 2.1 b. 2.0 c. 1/3 d. 4

2. A test is conducted; Ho: µ = 40, H1 : µ > 40 . The z- test statistic is 1.5. What is the p- value of
this test?
a. 0.9332 b. 0.0667 c. 0.05 d. 0.01

3. A test is conducted; Ho: µ = 40, H1 : µ < 40 . The z- test statistic is - 1.5. The correct decision is:
a. Reject H0 both α = 0.05 and α = 0.01.
b. Reject H0 at α = 0.05 but do not reject Ho at α = 0.01.
c. Reject H0 both α = 0.05 and α = 0.10.
d. Reject H0 at α = 0.05 but do not reject Ho at α = 0.10.

4. Which is a null hypothesis?


a. Ho: µ = 40 b. Ho: µ < 40 c. Ho: µ > 40 d. Ho: µ ≠ 40

5. The z – test is used to test the sample mean in the following case. . .
a. sample standard deviation is known.
b. sample size is less than 30.
c. sample size is more than 30.
d. data are not normally distributed.

C. LESSON WRAP-UP

1) Activity 6: Thinking about Learning (5 mins).


You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Statistics is about data and it is the interpretation of the data that we are interested in. In hypothesis testing
we are trying to interpret or draw conclusions about the population using data coming from the sample.
Further, hypothesis testing evaluates statements about a population to determine which statement is
supported by the sample data.

KEY TO CORRECTION
Activity #3
Conduct a one-sample z-test.

Step 1. State the Null Hypothesis and the Alternative Hypothesis


Null Hypothesis : Ho : µ = 60 seconds
Alternative Hypothesis : H1 : µ > 60 seconds
Step 2. Level of Significance, α = 0.05; for a right-tailed test at α, z crit = 1.645

Step 3. Calculate the test statistic, z – statistic.

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{70 - 60}{30 / \sqrt{36}} = 2.0$

Step 4. The calculated z – statistic (2.0) is greater than the critical value of the z statistic at α (1.645), so
there’s enough evidence to reject the null hypothesis in favor of the alternative hypothesis,
suggesting that the mean time to rent a car is more than 60 seconds.

Further, if z = 2.0, the corresponding p – value of the sample is 0.02275, and since the p – value is
less than α (0.05), we reject the null hypothesis in favor of the alternative hypothesis.


Areas Under the Normal Curve (Statistical Table).



ECE 069: Engineering Data Analysis
Module #16 Student Activity Sheet

Name: _________________________________________________ Class number: _______


Section: ____________ Schedule: __________________________ Date: _______________

Lesson Title: HYPOTHESIS TESTING : t - TEST

Materials:
Board, marker, and calculator (Casio fx-350EX model)

Lesson Objectives: At the end of this session the students will be able to:
1. conduct hypothesis testing using the t – test.

References:
https://www.investopedia.com/
https://www.statisticshowto.com/
https://blog.minitab.com /

Productivity Tip: Stay Healthy.

A. LESSON PREVIEW / REVIEW

1) Introduction (2 mins)

⮚ A t-test is a statistical test used to determine if there is a significant difference between the means of
two groups.

2) Activity 1: What I Know Chart, part 1 (3 mins)


Fill in the first column of what you know to answer the questions on the second column of the table below.

What I Know QUESTION What I Learned

What is a one sample t – test?

What is a two sample t – test?


B. MAIN LESSON

1. Activity 2: Content Notes (13 mins)

⮚ A t-test looks at the following to determine statistical significance:
● the t-statistic.
● the t-distribution values.
● the degrees of freedom.

⮚ t -Test Assumptions
● The sample is a representative, randomly selected portion of the total population.
● The data are normally distributed.
● The population mean to test against is known or hypothesized.

Types of t – test
⮚ One Sample t – test. This tests the mean of a single group against a known population mean.
⮚ Independent Sample t – test. This test compares the means of two groups of samples.
⮚ Paired Sample t – test. This test compares the means of the same group at different times.

One Sample t – test

⮚ The One Sample t - Test is commonly used to test the statistical difference between a sample mean
and a known or hypothesized value of the mean in the population.

⮚ t-statistic.

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

where $\bar{x}$ = sample mean
s = sample standard deviation
µ = population mean
n = sample size

Example. Test the hypothesis at α = 0.05 that taking a vitamin capsule makes an individual smarter. The average
IQ of an individual is 100. To test the hypothesis, 12 engineering students take the same vitamin capsule
for one year, and then an IQ test is given to these students. The results are 116, 111, 101, 120, 99, 94,
106, 115, 107, 101, 110, and 92.

Answer. Follow the steps in hypothesis testing.

Step 1. State the Null Hypothesis and the Alternative Hypothesis


Null Hypothesis : Ho : µ = 100

Alternative Hypothesis : H1 : µ > 100 ( One tailed test )


Step 2. At level of significance, α = 0.05, and degrees of freedom ( n – 1 ) equals 11, the critical value of
t, crit = 1.796.

Step 3. Calculate the test statistic, t – statistic.

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{106 - 100}{8.83 / \sqrt{12}} = 2.35$

From the sample, $\bar{x} = 106$ and $s = 8.83$.

Step 4. The calculated or observed value of the t – statistic (2.35) is greater than the critical value of
the t – statistic (1.796 at α = 0.05 and degrees of freedom = 11), or we may say that the observed value
of the t – statistic is in the rejection region. Hence, we reject the null hypothesis in favor of the
alternative hypothesis. This suggests that the mean IQ of individuals taking the vitamin capsule is above the
average IQ of 100.

Further, using the p – value approach, at t = 2.35 and degrees of freedom = 11, the p – value is between
1% and 2.5%, hence less than the level of significance, α = 0.05 (or 5%), so we reject the
null hypothesis.
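The sample mean, sample standard deviation, and t – statistic of this example can be recomputed with Python's standard library. In this sketch the critical value 1.796 is simply copied from the t – table, as in the lesson, because the standard library has no t – distribution; the function name `one_sample_t` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# IQ results of the 12 students in the example
iq_scores = [116, 111, 101, 120, 99, 94, 106, 115, 107, 101, 110, 92]


def one_sample_t(data, mu0):
    """t statistic for testing the sample mean against mu0."""
    n = len(data)
    # stdev() uses the n - 1 divisor, i.e. the sample standard deviation s
    return (mean(data) - mu0) / (stdev(data) / sqrt(n))


t_stat = one_sample_t(iq_scores, 100)
T_CRIT = 1.796  # table value for alpha = 0.05 (one tail), df = 11

print(mean(iq_scores))             # 106
print(round(stdev(iq_scores), 2))  # 8.83
print(round(t_stat, 2))            # 2.35
print(t_stat > T_CRIT)             # True, so Ho is rejected
```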


Independent Sample t – test.

⮚ This test compares the means of two groups of samples that are selected independently of each
other.
⮚ There are two types of independent sample t - test.
● Equal Variance (Pooled variance t – test) with degrees of freedom, df = n1 + n2 – 2.
● Unequal Variance (Separate variance t – test) with degrees of freedom,

$\text{Degrees of freedom} = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{\left( s_1^2 / n_1 \right)^2}{n_1 - 1} + \frac{\left( s_2^2 / n_2 \right)^2}{n_2 - 1}}$

⮚ The t - statistic

$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$

where :
$\bar{x}_1$ and $\bar{x}_2$ = mean of sample 1 and sample 2
$s_1^2$ and $s_2^2$ = variance of sample 1 and sample 2
$n_1$ and $n_2$ = size of sample 1 and sample 2
$D_0 = \mu_1 - \mu_2$ under the null hypothesis (a number that is deduced
from the statement of the situation).

Equal Variance (or Pooled) t -Test

Example. An experiment was performed to compare the abrasive wear of two materials. Ten pieces of
material 1 ( group 1) and ten pieces of material 2 (group 2) were tested. The test on material 1 gave an
average wear of 85 units with a sample standard deviation of 4, and the test on material 2 gave an average
wear of 81 with a sample standard deviation of 5. Can we conclude at 0.05 level of significance that abrasive
wear of material 1 is greater than that of material 2 ? Assume the populations are normally distributed and
with equal variances.

Answer. Follow the steps in hypothesis testing.


Step 1. State the Null Hypothesis and the Alternative Hypothesis


Null Hypothesis : Ho : µ1 - µ2 = 0
Alternative Hypothesis : H1 : µ1 - µ2 > 0 ( One tail test )
Step 2. At level of significance, α = 0.05, and degrees of freedom (n1 + n2 - 2) equals 18, the critical
value of t, crit = 1.734.

Step 3. Calculate the test statistic, t – statistic.

$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{(85 - 81) - 0}{\sqrt{\frac{(10 - 1) \cdot 4^2 + (10 - 1) \cdot 5^2}{10 + 10 - 2} \left( \frac{1}{10} + \frac{1}{10} \right)}} = 1.98$

Step 4. The calculated or observed value of the t – statistic (1.98) is greater than the critical value of
the t – statistic (1.734 at α = 0.05 and degrees of freedom = 18), or we may say that the observed value
of the t – statistic is in the rejection region. Hence, we reject the null hypothesis in favor of the
alternative hypothesis. This suggests that the abrasive wear of material 1 is greater than the
abrasive wear of material 2.

Further, at t = 1.98 and degrees of freedom = 18, the p – value is between 2.5% and 5%, hence
less than the level of significance, α = 0.05 (or 5%), so we reject the null hypothesis.
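The pooled t – statistic for the abrasive wear example can be recomputed from the summary statistics. The sketch below assumes equal population variances, as the problem states; `pooled_t` is an illustrative function name, and it takes the sample standard deviations (4 and 5) and squares them inside.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2, d0=0.0):
    """Equal-variance (pooled) two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - d0) / standard_error
    return t, df


# Material 1: mean 85, s = 4, n = 10; Material 2: mean 81, s = 5, n = 10
t_stat, df = pooled_t(85, 4, 10, 81, 5, 10)

print(df)                # 18
print(round(t_stat, 2))  # 1.98
```

The pooled variance here is (9·16 + 9·25)/18 = 20.5, so the standard error is √(20.5 × 0.2) = √4.1 ≈ 2.025.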

Unequal Variance t -Test

Example. Assume that we are taking diagonal measurements of billboards purchased by a company.
Group 1 of samples includes 20 billboards, while group 2 includes 10 billboards.
Statistical Data : Group 1 : mean diagonal measurement = 21.6 inches ; variance = 17.1
Group 2 : mean diagonal measurement = 19.4 inches ; variance = 1.4
Can we conclude that the mean of group 1 is greater than that of group 2?
Step 1. State the Null Hypothesis and the Alternative Hypothesis
Null Hypothesis : Ho : µ1 = µ2
Alternative Hypothesis : H1 : µ1 > µ2 ( One tail test )
Step 2. At level of significance, α = 0.05, and degrees of freedom equal to 24, the critical value of t is t crit = 1.711.

$$ \text{Degrees of freedom} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{\left(\dfrac{17.1}{20} + \dfrac{1.4}{10}\right)^2}{\dfrac{\left(17.1/20\right)^2}{20 - 1} + \dfrac{\left(1.4/10\right)^2}{10 - 1}} \approx 24 $$

Step 3. Calculate the test statistic, t – statistic.

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{21.6 - 19.4}{\sqrt{\dfrac{17.1}{20} + \dfrac{1.4}{10}}} \approx 2.21 $$

Step 4. The calculated (observed) value of the t – statistic (2.21) is greater than the critical value of the t – statistic (1.711 at α = 0.05 and degrees of freedom = 24), so the observed value of the t – statistic is in the rejection region. Hence, we reject the null hypothesis in favor of the alternative hypothesis. This suggests that the mean diagonal length of the billboards in group 1 is greater than that of group 2.

Further, using the p-value approach, at t = 2.21 and degrees of freedom = 24, the p value is between 2.5% and 1.0%, hence less than the level of significance, α = 0.05 (or 5 %), so we reject the null hypothesis.
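The unequal-variance statistic and its degrees of freedom can be computed together from the group summaries. The sketch below assumes only the figures given in the billboard example (means 21.6 and 19.4, variances 17.1 and 1.4, sample sizes 20 and 10); the function name `welch_t` is illustrative.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Unequal-variance t statistic and its approximate degrees of freedom."""
    v1 = var1 / n1  # variance of the first sample mean
    v2 = var2 / n2  # variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Degrees-of-freedom formula from Step 2 of the example
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


t_stat, df = welch_t(21.6, 17.1, 20, 19.4, 1.4, 10)
print(f"t = {t_stat:.3f}, df = {df:.2f} (use {math.floor(df)} in the t table)")
```

The degrees of freedom are generally not a whole number; rounding down before using the t table is the conservative choice.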

Paired t - Test

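⮚ A paired t-test is used when we are interested in the difference between two measurements taken on the same subject. The two measurements are often separated by time (for example, before and after a treatment) or taken under two different conditions.
⮚ It compares the means of two related groups of samples.
⮚ The t – statistic, with degrees of freedom df = n - 1, is

$$ t = \frac{\sum D}{\sqrt{\dfrac{n \sum D^2 - \left(\sum D\right)^2}{n - 1}}} $$

where: D = difference between the two observations in each pair, and n = number of pairs.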


Paired t – Test Example.

Compare the fuel economy of two cars, where the cars in each pair are operated using different types of gasoline (Type 1 gasoline & Type 2 gasoline).

Model   Car 1                       Car 2                       D,           D^2,
        (Using Type 1 gasoline)     (Using Type 2 gasoline)     Difference   (Difference)^2
1       17                          17                           0           0
2       13.2                        12.9                         0.3         0.09
3       35.3                        35.4                        -0.1         0.01
4       13.6                        13.2                         0.4         0.16
5       32.7                        32.5                         0.2         0.04
6       18.4                        18.1                         0.3         0.09
7       22.5                        22.5                         0           0
8       26.8                        26.7                         0.1         0.01
9       15.1                        15                           0.1         0.01
Sum                                                              1.3         0.41

Follow the steps in hypothesis testing.

Step 1. State the Null Hypothesis and the Alternative Hypothesis


Null Hypothesis : Ho : µ1 = µ2
Alternative Hypothesis : H1 : µ1 > µ2 ( One tail test )
Step 2. At level of significance, α = 0.05, and degrees of freedom (n - 1 ) equals 8, the critical value of t, t
crit = 1.86.

Step 3. Calculate the test statistic, t – statistic.

$$ t = \frac{\sum D}{\sqrt{\dfrac{n \sum D^2 - \left(\sum D\right)^2}{n - 1}}} = \frac{1.3}{\sqrt{\dfrac{9\,(0.41) - (1.3)^2}{9 - 1}}} = 2.6 $$

Step 4. The calculated (observed) value of the t – statistic (2.6) is greater than the critical value of the t – statistic (1.86 at α = 0.05 and degrees of freedom = 8), so the observed value of the t – statistic is in the rejection region. Hence, we reject the null hypothesis in favor of the alternative hypothesis. This suggests that Type 1 gasoline is a more economical fuel than Type 2 gasoline.

Further (using the p-value approach), at t = 2.6 and degrees of freedom = 8, the p value is between 2.5% and 1.0%, hence less than the level of significance, α = 0.05 (or 5 %), so we reject the null hypothesis.
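The paired computation can be reproduced directly from the two columns of readings. This is a minimal sketch of the sum-based formula used in Step 3, with the fuel economy figures copied from the table; the function and variable names are illustrative.

```python
import math


def paired_t_from_sums(differences):
    """Paired t statistic computed from sum(D) and sum(D^2)."""
    n = len(differences)
    sum_d = sum(differences)
    sum_d2 = sum(d * d for d in differences)
    t = sum_d / math.sqrt((n * sum_d2 - sum_d**2) / (n - 1))
    return t, n - 1  # statistic and degrees of freedom


type1 = [17, 13.2, 35.3, 13.6, 32.7, 18.4, 22.5, 26.8, 15.1]
type2 = [17, 12.9, 35.4, 13.2, 32.5, 18.1, 22.5, 26.7, 15]
diffs = [a - b for a, b in zip(type1, type2)]  # D = Type 1 minus Type 2

t_stat, df = paired_t_from_sums(diffs)
print(f"sum D = {sum(diffs):.2f}, t = {t_stat:.2f}, df = {df}")
```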

2. Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking) .

The table gives the observations of the control group and the treatment group. Use a paired t-test at the 0.05 level of significance to determine if there is a significant difference between the means of the two groups.

Sample No.   Control Group   Treatment Group
1            3               20
2            3               13
3            3               13
4            12              20
5            15              29
6            16              32
7            17              23
8            19              20
9            23              25
10           24              15
11           32              30

Answer Sheet:


3) Activity 4: What I Know Chart, Part 2 (2 mins)


You may now answer the third column of table in activity 1 based on what you know now.
What I Know QUESTION What I Learned

What is a one sample t – test?

What is a two sample t – test?

4) Activity 5: Check for Understanding (5 mins)


Multiple Choice. Encircle the letter of the correct answer.
1. A one sample t – test uses the following statistics: n = 5; x̄ = 5.5 and s = 0.70.
The null hypothesis µ = 5.0 is . . .
a. accepted at the 10% level of significance; accepted at the 5 % level of significance.
b. accepted at the10% level of significance; rejected at the 5 % level of significance.


c. rejected at the 10% level of significance; accepted at the 5 % level of significance.


d. rejected at the 10% level of significance; rejected at the 5 % level of significance.

2. In testing the differences between the means of two independent populations, the null hypothesis
is . . .
a. H o : 1   2  1 b. H o : 1   2  0 c. H o : 1   2  0 d. H o : 1   2  1

3. The result is statistically significant when. . .


a. the null hypothesis is true. c. when the p value is less than or equal to the α.
b. the alternative hypothesis is true. d. when the p value is larger than α.

4. A one sample t – test was conducted to test the IQ of engineering students. The observed
t – statistic in the study with 15 samples at 0.05 level of significance is 2.0. What is the p – value
of this study?
a. between 0.05 and 0.025.
b. greater than 0.05.
c. less than 0.025.
d. none of the above

5. Two different alloys are being considered for making lead-free solder used in the wave soldering
process for printed circuit boards. A crucial characteristic of solder is its melting point, which is
known to follow a Normal distribution. A study was conducted using a random sample of 21 pieces
of solder made from each of the two alloys. In each sample, the temperature at which each of the
21 pieces melted was determined. The mean and standard deviation of the sample for Alloy 1 were
x̄1 = 218.9ºC and s1 = 2.7ºC; for Alloy 2 the results were x̄2 = 215.5ºC and s2 = 3.6ºC. If we
test H0: µ1 = µ2 against Ha: µ1 ≠ µ2, what is the number of degrees of freedom in this study?

a. 21 b. 20 c. 40 d. 42

C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

⮚ The t test is a commonly used statistical test for comparing the means of two groups of samples.
⮚ You can also use the Data menu of Microsoft Excel to find the critical value of t and the observed
value of t.

⮚ Let us use Microsoft Excel to find the critical value of t and the observed t for the data in Activity 3
above.


The table gives the observations of the control group and the treatment group. Use a paired t-test
at the 0.05 level of significance to determine whether the means of the two groups differ significantly.

Sample No.   Control Group   Treatment Group
1            3               20
2            3               13
3            3               13
4            12              20
5            15              29
6            16              32
7            17              23
8            19              20
9            23              25
10           24              15
11           32              30

Steps:

● Enter the control group and the treatment group columns in excel.


● Click Data, then Data Analysis, then t-Test: Paired Two Sample for Means, then press OK.

● Input Variable 1 Range, then Variable 2 Range, enter Alpha, click New Worksheet Ply, then OK.


The observed / calculated value of t

The critical value of t


KEY TO CORRECTION
Activity #3
Follow the steps in hypothesis testing.

Step 1. State the Null Hypothesis and the Alternative Hypothesis


Null Hypothesis: Ho: µ1 = µ2
Alternative Hypothesis: H1: µ1 ≠ µ2 ( Two tailed test )
Step 2. At level of significance, α/2 = 0.025, and degrees of freedom (n - 1) equals 10, the critical value of
t, t crit = 2.228

Step 3. Calculate the test statistic, t – statistic.

$$ t = \frac{\sum D}{\sqrt{\dfrac{n \sum D^2 - \left(\sum D\right)^2}{n - 1}}} = \frac{-73}{\sqrt{\dfrac{11\,(1{,}131) - (-73)^2}{11 - 1}}} = -2.73 $$

Sample No.   Control Group   Treatment Group   D      D^2
1            3               20                -17    289
2            3               13                -10    100
3            3               13                -10    100
4            12              20                -8     64
5            15              29                -14    196
6            16              32                -16    256
7            17              23                -6     36
8            19              20                -1     1
9            23              25                -2     4
10           24              15                9      81
11           32              30                2      4
Sum                                            -73    1,131

Step 4. The absolute value of the calculated (observed) t – statistic (|-2.73| = 2.73, since a two-tailed test is
conducted) is greater than the critical value of the t – statistic (2.228 at α = 0.05 and degrees of freedom =
10), so the observed value of the t – statistic is in the rejection region.
Hence, we reject the null hypothesis in favor of the alternative hypothesis. This suggests that the
control group and the treatment group do not have equal means.


Further (using the p-value approach), at |t| = 2.73 and degrees of freedom = 10, the one-tail area is
between 2.5% and 1.0%, so the two-tailed p value is between 5% and 2%. This is less than the level of
significance, α = 0.05 (or 5 %), so we reject the null hypothesis.
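For readers without Excel, the same observed t can be obtained with the Python standard library. The sketch below uses the equivalent mean-and-standard-deviation form of the paired statistic, t = D̄ / (s_D / √n), which is algebraically the same as the sum-based formula in Step 3; the variable names are illustrative and the critical value is typed in from a t table.

```python
import math
import statistics

control = [3, 3, 3, 12, 15, 16, 17, 19, 23, 24, 32]
treatment = [20, 13, 13, 20, 29, 32, 23, 20, 25, 15, 30]

# D = control minus treatment, matching the sign convention of the key
diffs = [c - t for c, t in zip(control, treatment)]
n = len(diffs)

mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
t_stat = mean_d / (sd_d / math.sqrt(n))

t_crit = 2.228  # two-tailed, alpha = 0.05, df = 10 (value read from a t table)
print(f"mean D = {mean_d:.3f}, t = {t_stat:.3f}, reject H0: {abs(t_stat) > t_crit}")
```

Comparing `abs(t_stat)` with the critical value is what makes the decision rule two-tailed.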



ECE 069: Engineering Data Analysis
Module #17 Student Activity Sheet

Name: _________________________________________________ Class number: _______


Section: ____________ Schedule: __________________________ Date: _______________

Lesson Title: SIMPLE LINEAR REGRESSION & CORRELATION COEFFICIENT

Materials: Board, marker, and calculator (Casio fx 350EX model)

Lesson Objectives: At the end of this session the students will be able to:
1. solve for the regression equation and the correlation coefficient of two variables, then interpret
the extent of the relationship between the response variable and the predictor variable.

References:
http://onlinestatbook.com/
https://www.scribbr.com/
https://www.thebalancesmb.com/
https://wps.prenhall.com/

Productivity Tip: Stay Healthy.

A. LESSON PREVIEW / REVIEW

1) Introduction (2 mins)

⮚ Simple linear regression is a statistical method for obtaining a formula to predict the scores on one
variable from the scores on a second variable. The variable we are predicting is called the criterion
variable and is referred to as Y. The variable we are basing our predictions on is called the predictor
variable and is referred to as X. When there is only one predictor variable, the prediction method is
called simple regression.
⮚ In simple linear regression, the predictions of Y when plotted as a function of X form a straight line.
⮚ Linear regression consists of finding the best-fitting straight line through the points. The best-fitting line
is called a regression line.

2) Activity 1: What I Know Chart, part 1 (3 mins)


Fill in the first column of what you know to answer the questions on the second column of the table below.

What I Know QUESTION What I Learned

What is simple linear regression?

What is correlation coefficient?


B. MAIN LESSON

1. Activity 2: Content Notes (13 mins)

Simple Linear Regression Model


$$ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i $$

where : Yi = the response variable
Xi = the predictor (or explanatory) variable
β0 and β1 = the regression coefficients
εi = the residual error

The residual error is εi = Yi - Ŷi, where Yi is the observed value and Ŷi = β0 + β1 Xi is the predicted
value. The error term accounts for the variability in y that cannot be explained by the linear
relationship between x and y. If ε were not present, knowing x would provide
enough information to determine the value of y exactly.

β0 (the intercept of the regression line) and β1 (the coefficient of Xi, or the slope of the
regression line) are estimated by minimizing the sum of the squares of the residual errors. This
procedure is known as the Method of Least Squares.

$$ \text{minimize } \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 $$

From calculus, we arrive at the following values of β0 and β1.

$$ \beta_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \qquad \text{(Equation 1)} $$

$$ \beta_0 = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i\right) \qquad \text{(Equation 2)} $$

We then substitute the values of β0 and β1 into the model to obtain the regression line
equation.

$$ Y = \beta_0 + \beta_1 X \qquad \text{(Equation 3)} $$
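The calculus step can be sketched as follows. This outline assumes the usual least-squares setup, in which the sum of squared residuals S is minimized by setting its partial derivatives with respect to β0 and β1 to zero; the symbol S is introduced here only for illustration.

```latex
% Sum of squared residuals as a function of the two coefficients
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

% Set both partial derivatives to zero
\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
\qquad
\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - \beta_0 - \beta_1 x_i \right) = 0

% Rearranged, these are the two normal equations
n \beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i
\qquad
\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i
```

Solving the first normal equation for β0 gives Equation 2, and substituting that expression into the second normal equation and solving for β1 gives Equation 1.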

Assumptions of Simple Linear Regression


Simple linear regression is a parametric test, meaning that it makes certain assumptions about the data.
These assumptions are . . .
⮚ Homogeneity of variance, that is, the size of the error in our prediction doesn't change
significantly across the values of the independent variable.
⮚ Independence of observations.
⮚ Normality, that is, the residual errors follow a normal distribution.
⮚ The relationship between the independent and dependent variable is linear, that is, the line of
best fit through the data points is a straight line (rather than a curve).

Correlation Coefficient, r

⮚ One of the most commonly used correlation coefficients is Pearson's correlation coefficient, r.
⮚ The correlation coefficient, r, measures the strength of the linear relationship between the response
variable and the explanatory variable.

Formula of the Correlation Coefficient, r

$$ r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}} \qquad \text{(Equation 4)} $$

Coefficient of Determination, r²
⮚ The square of the correlation coefficient.
⮚ It is the proportion of variation in the response variable explained by the regression model.
⮚ The most common interpretation of the coefficient of determination is how well the regression model fits
the observed data. For example, a coefficient of determination of 60% means that 60% of the variation in
the response variable is explained by the regression model. Generally, a higher coefficient indicates a better fit for the model.
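The "proportion of variation explained" reading can be made concrete by computing r² as 1 - SSE/SST, where SSE is the sum of squared residuals about the fitted line and SST is the sum of squared deviations of y about its mean. The sketch below uses a small hypothetical data set chosen only for illustration; the function name and the data are assumptions, not part of the lesson.

```python
def r_squared_from_residuals(x, y):
    """Coefficient of determination computed as 1 - SSE/SST."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)

    # Least-squares slope and intercept (Equations 1 and 2)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    intercept = (sum_y - slope * sum_x) / n

    y_mean = sum_y / n
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - y_mean) ** 2 for b in y)
    return 1 - sse / sst


# Hypothetical data, for illustration only
x = [1, 2, 3, 4]
y = [2, 4, 5, 7]
r2 = r_squared_from_residuals(x, y)
print(f"r^2 = {r2:.4f}")
```

For simple linear regression this value equals the square of the Pearson correlation coefficient, so both routes give the same number.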

Interpretation of the correlation coefficient, r.

Graph of Data Points on the Regression line at Various Value of r.


Example 1. A study was conducted on the effect of ambient temperature, x, on the electric power
consumed, y, by an industrial plant. Other factors were held constant. Below are the data collected from
the experiment. Find the equation of the regression line and estimate the electric power consumption
when x = 70 °F.

Trials   y (BTU)   x (°F)
1        250       27
2        285       45
3        320       72
4        295       58
5        265       31
6        298       60
7        267       31
8        321       74

We extend the columns of the above table to solve for β0 and β1.

Trials   y       x      x*y       x^2      y^2
1        250     27     6750      729      62500
2        285     45     12825     2025     81225
3        320     72     23040     5184     102400
4        295     58     17110     3364     87025
5        265     31     8215      961      70225
6        298     60     17880     3600     88804
7        267     31     8277      961      71289
8        321     74     23754     5476     103041
sum      2301    398    117851    22300    666509


From this table, we have n = 8; Σ yi = 2,301; Σ xi = 398; Σ xi yi = 117,851; Σ xi² = 22,300 and
Σ yi² = 666,509. We then substitute these values into Equation 1, then into Equation 2, to solve for β1 and
β0.

$$ \beta_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{8\,(117{,}851) - (398)(2{,}301)}{8\,(22{,}300) - (398)^2} = 1.35 $$

$$ \beta_0 = \frac{1}{n}\left(\sum y_i - \beta_1 \sum x_i\right) = \frac{1}{8}\left[2{,}301 - 1.35\,(398)\right] = 220.5 $$

⮚ Substituting the values of β0 and β1 into Equation 3, the regression line equation is . . .
y = 220.5 + 1.35 x.

⮚ To predict the power consumption at x = 70 °F, we substitute this value into the regression line
equation.

y = 220.5 + 1.35 (70) = 315 BTU
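The hand calculation can be checked by applying Equations 1 and 2 to the raw data. This is a minimal sketch with illustrative function and variable names; because it does not round the slope before computing the intercept, the last digit of each result may differ slightly from the rounded values used in the hand calculation.

```python
def fit_line(x, y):
    """Least-squares intercept and slope from the sums in Equations 1 and 2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


temp_f = [27, 45, 72, 58, 31, 60, 31, 74]          # x, ambient temperature
power = [250, 285, 320, 295, 265, 298, 267, 321]   # y, power consumed

b0, b1 = fit_line(temp_f, power)
predicted = b0 + b1 * 70  # estimate at x = 70 degrees F
print(f"y = {b0:.2f} + {b1:.3f} x, prediction at 70: {predicted:.1f}")
```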

Example 2. What is the correlation coefficient of Example 1? Interpret your result.

From the table in Example 1, we have n = 8; Σ yi = 2,301; Σ xi = 398; Σ xi yi = 117,851; Σ xi² = 22,300;
Σ yi² = 666,509. Substitute the values into Equation 4 to solve for r.

$$ r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}} = \frac{8\,(117{,}851) - (398)(2{,}301)}{\sqrt{\left[8\,(22{,}300) - (398)^2\right]\left[8\,(666{,}509) - (2{,}301)^2\right]}} = 0.99 $$

The value r = 0.99 indicates a very strong positive linear relationship between the electric power
consumption and the ambient temperature: electric power consumption increases as ambient temperature
increases. Furthermore, the coefficient of determination of 0.98 (r² = 0.99²)
indicates that about 98 % of the variation in electric power consumption is explained by the regression line.
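Equation 4 can be evaluated on the same data with a few lines of code. The sketch below is an illustrative implementation with assumed names; it keeps full precision, so r² computed from the unrounded r can differ in the second decimal place from the square of r rounded to 0.99.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient from the sums in Equation 4."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator


temp_f = [27, 45, 72, 58, 31, 60, 31, 74]          # x, ambient temperature
power = [250, 285, 320, 295, 265, 298, 267, 321]   # y, power consumed

r = pearson_r(temp_f, power)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```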


2) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking)
Given below is a data set on y and x. Let y be the response variable and x be the predictor variable.
Find the equation of the regression line and the value of the correlation coefficient, r. Interpret
your result.

x   y
0   2
1   3
2   5
3   4
4   6


3) Activity 4: What I Know Chart, Part 2 (2 mins)


You may now answer the third column of table in activity 1 based on what you know now.
What I Know QUESTION What I Learned

What is simple linear regression?

What is correlation coefficient?

4) Activity 5: Check for Understanding (5 mins)


Multiple Choice. Encircle the best answer.
1. Regression analysis . . .
a. estimates the mean of two variables. c. establishes a relation between two variables.
b. establishes cause and effect. d. measures confidence.

2. If r 2 = 0.99, how confident are you in using the regression line to estimate the response variable given
the predictor variable?
a. not confident c. the relationship is weak to predict
b. very confident d. the relationship cannot be predicted
3. If the correlation coefficient is 0.90, the percentage of variation in the response variable explained by
the variation in the predictor variable is . . .
a. 0.90 % b. 90% c. 81% d. 0.81%

4. The correlation coefficient is used to determine . . .


a. a value of the y-variable given a specific value of the x-variable.
b. a value of the x-variable given a specific value of the y-variable.
c. the strength of the relationship between the x and y variables.
d. none of the above.

5. Larger values of r² give us an idea that the observations are more closely grouped about the . . .
a. average value of the independent variables.
b. average value of the dependent variable
c. least squares line.
d. none of the above.


C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Form groups of three. Search for a problem (with given data points) related to your profession
that uses regression analysis. Solve for the regression line and the correlation coefficient, then interpret
your result.

KEY TO CORRECTION
Activity #3
Extending the columns of the preceding table.

       x     y     x*y    x^2    y^2
       0     2     0      0      4
       1     3     3      1      9
       2     5     10     4      25
       3     4     12     9      16
       4     6     24     16     36
SUM    10    20    49     30     90

From this table, we have n = 5; Σ yi = 20; Σ xi = 10; Σ xi yi = 49; Σ xi² = 30; Σ yi² = 90.

We then substitute these values into Equation 1, then into Equation 2, to solve for β1 and β0.

$$ \beta_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{5\,(49) - (10)(20)}{5\,(30) - (10)^2} = 0.9 $$

$$ \beta_0 = \frac{1}{n}\left(\sum y_i - \beta_1 \sum x_i\right) = \frac{1}{5}\left[20 - 0.9\,(10)\right] = 2.2 $$
