
MTH281 Final Samples and Notes

This document provides a preview of an online test for a Probability and Statistics course. The test contains 8 multiple choice questions about confidence intervals. It informs students that partial credit will only be given if work is shown, answers should be rounded according to instructions, and online calculators and a formula sheet are available. The test can be saved and resumed later, and answers are automatically saved.

Uploaded by

narandar kumar
Copyright
© © All Rights Reserved



Preview Test: Assignment 4

MTH-281-502-FALL2022-1--ERS Probability and Statistics-CRN:40722

Test Information
Description
Instructions
For Essay questions, you need to show ALL your work to get the full mark.
Partial credit will only be given where sufficient understanding of the problem
has been demonstrated and work is shown; no work=0 points.
In fill in blank questions, please round your answer according to the instructions
given in the question.
Online calculators:
http://www.statdistributions.com/t/
http://www.statdistributions.com/normal/
https://home.ubalt.edu/ntsbarsh/Business-stat/otherapplets/Descriptive.htm
Formula Sheet: Chapter 6 - Formula Sheet

Multiple Attempts: Not allowed. This test can only be taken once.
Force Completion: This test can be saved and resumed later.
Your answers are saved automatically.

QUESTION 1 3 points

Find the confidence level related to the critical value zα/2 = 1.81.

96.5%
93%
89%
7%
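
The link between a critical value and its confidence level can be checked numerically. The sketch below assumes a two-sided interval, so the confidence level is the area between -z and z under the standard normal curve; the function name is illustrative and not part of the course material.

```python
from statistics import NormalDist

def confidence_level_from_z(z_critical: float) -> float:
    """Area between -z and +z under the standard normal curve."""
    return 2 * NormalDist().cdf(z_critical) - 1

level = confidence_level_from_z(1.81)
print(f"{level:.4f}")  # close to 0.93
```

Rounded to two decimals, the result matches the 93% option.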

QUESTION 2 4 points

A research study reported a 95% confidence interval for the mean time of hip surgeries based on a sample of 91 patients as (116 minutes, 138 minutes). Which of the following statements is correct about the given interval?

The mean time of hip surgeries of the 91 patients in the selected sample falls between 116 and 138 minutes.
The mean time of hip surgeries of any sample of 91 patients will fall between 116 and 138 minutes. (marked incorrect)
95% of hip surgeries take between 116 and 138 minutes. (marked incorrect)
We are 95% confident that the mean time of hip surgeries will be between 116 and 138 minutes. (marked correct)

QUESTION 3 3 points

A survey of 36 popular restaurants is used to estimate the mean number of kitchen employees. Assume that the number of kitchen employees is normally distributed. Which distribution should be used to construct the confidence interval?

t distribution with 35 degrees of freedom
Neither standard normal distribution nor t distribution
t distribution with 36 degrees of freedom
Standard normal distribution

Handwritten note: df = n - 1 = 36 - 1 = 35.

QUESTION 4 3 points

An advertisement agent computed 90%, 95%, 97%, and 99% confidence intervals for the mean lifetime of a smartwatch battery. Which interval is the one with 95% confidence level?

(129.2, 144.8)
(127.7, 146.3)
(131.1, 142.9)
(129.9, 144.1)

Handwritten note (compare widths; a higher confidence level gives a wider interval):
(129.2, 144.8): width = 144.8 - 129.2 = 15.6, so 97%
(127.7, 146.3): width = 18.6, so 99%
(131.1, 142.9): width = 11.8, so 90%
(129.9, 144.1): width = 14.2, so 95%
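
The width comparison can be sketched in a few lines. The code assumes all four intervals come from the same sample, so a higher confidence level always gives a wider interval; the variable names are illustrative.

```python
# Intervals from the question, all assumed to come from the same sample.
intervals = [(129.2, 144.8), (127.7, 146.3), (131.1, 142.9), (129.9, 144.1)]
levels = [0.90, 0.95, 0.97, 0.99]

# Narrowest interval pairs with the lowest level, widest with the highest.
by_width = sorted(intervals, key=lambda iv: iv[1] - iv[0])
matched = dict(zip(sorted(levels), by_width))

for level, (low, high) in matched.items():
    print(f"{level:.0%}: ({low}, {high}), width {high - low:.1f}")
```

Here `matched[0.95]` is the interval with the second-smallest width.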

QUESTION 5 3 points

The health department is estimating the percentage of smokers among adults aged between 18 and 22. A random sample of 50 students was used and 18 are reported as smokers. Find the corresponding critical value for calculating a 94.5% confidence interval. Round your answer to three decimal places.

Handwritten note: n = 50, x = 18, c = 0.945, p̂ = x/n = 18/50 = 0.36.

QUESTION 6 3 points

A company owns 520 cars and the manager wants to estimate the mean distance traveled during the past 20 working days. So she gathered the information for 16 cars. Assuming the distances traveled are normally distributed, find the corresponding critical value for calculating a 90% confidence interval. Round your answer to three decimal places if needed.

Handwritten note: normal population, so use the t distribution; df = 16 - 1 = 15; c = 0.90.

QUESTION 7 3 points

The daily electrical energy consumption (in Watts) during July for a random sample of 64 households was recorded. The mean was 4789 W and the standard deviation was 274 W. Assume the underlying distribution of the consumption is normal. Find the upper confidence limit (UCL) of the 99% confidence interval for the mean daily electrical energy consumption during that month. Round your answer to the nearest whole number (integer).

UCL =

Handwritten note: n = 64, x̄ = 4789 W, s = 274 W, c = 0.99; normal population, so use the t distribution with df = 63.
QUESTION 8 3 points

The daily electrical energy consumption (in Watts) during July for a random sample of 49 households was recorded. The mean was 4267 W and the standard deviation was 334 W. Find the upper confidence limit (UCL) of the 95% confidence interval for the mean daily electrical energy consumption during that month. Round your answer to the nearest whole number (integer).

UCL =

Handwritten note: n = 49, x̄ = 4267 W, s = 334 W, c = 0.95; not normal, but the sample is large.
QUESTION 9 3 points

The time spent, in minutes, on physical activities per day was recorded for a sample of college students. The 95% confidence interval for the mean time spent on physical activities is (56, 60.9). Which of the following would be a 99% confidence interval calculated from the sample data used to create the 95% confidence interval above?

(56.26, 60.64)
(56.39, 60.51)
(56.65, 60.25)
(55.23, 61.67)

Handwritten note (check width): the width of the 95% interval is 60.9 - 56 = 4.9. The widths of the options are 4.38, 4.12, 3.6, and 6.44.

QUESTION 10 3 points

The t distribution approaches the standard normal distribution as the

population size increases
sample mean increases
sample size decreases
degrees of freedom increases
QUESTION 11 3 points

Which one of the following does NOT affect confidence interval width?

Population Mean (marked *)
Sample Size
Confidence Level
Sample Standard Deviation

Handwritten note: the width depends on (1) c, the level of confidence, (2) the sample size, and the standard deviation.

QUESTION 12 3 points

In a random sample, 49 bottles of orange juice were checked and the mean amount of sugar added was 7.7 g/L and the standard deviation was 0.84 g/L. Find a 95% confidence interval for the amount of sugar added to the juice.

(7.421, 7.979)
(7.503, 7.897)
(7.465, 7.935)
(7.391, 8.009)

Handwritten work: c = 0.95, x̄ = 7.7 g/L, s = 0.84; not normal, but the sample size is large.
95% CI = x̄ ± 1.96 × s/√n = 7.7 ± 1.96 × 0.84/√49
= (7.465, 7.935)
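
The handwritten calculation can be reproduced with a short sketch. It assumes the large-sample z-interval x̄ ± z × s/√n used in the notes; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean: float, sd: float, n: int, confidence: float):
    """Large-sample confidence interval for a mean: mean ± z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

low, high = z_interval(7.7, 0.84, 49, 0.95)
print(round(low, 3), round(high, 3))
```

Rounded to three decimals, the limits agree with the interval written in the notes.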
QUESTION 13 3 points

An IT technician recorded the time in minutes per user in a random sample of 49 students on social media on the campus WIFI network. The resulting 95% confidence interval for the population mean was (153.5 min, 182.0 min). What is the sample mean used to calculate the given interval? Round your answer to the nearest whole number (integer).

Handwritten work: the sample mean is the midpoint of the interval, x̄ = (182.0 + 153.5)/2 = 167.75 min.

QUESTION 14 3 points

An IT technician recorded the time in minutes per user in a random sample of 49 students on social media on the campus WIFI network. The resulting 95% confidence interval for the population mean was (149.4 min, 188.7 min). What is the sample standard deviation used to calculate the given interval? Round your answer to the nearest whole number (integer).

Handwritten work: not normal, but the sample size is large.
E = (UCL - LCL)/2 = (188.7 - 149.4)/2 = 19.65
E = zα/2 × s/√n, so s = E × √n / zα/2 = 19.65 × 7 / 1.96 = 70.18
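
Questions 13 and 14 both work backwards from a reported interval. The sketch below assumes the interval was built as x̄ ± z × s/√n (a large-sample z-interval, as the handwritten note suggests); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def unpack_z_interval(low: float, high: float, n: int, confidence: float):
    """Recover the sample mean, margin of error, and sample sd from a z-interval."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    mean = (low + high) / 2      # the midpoint is the sample mean
    margin = (high - low) / 2    # half the width is the margin of error E
    sd = margin * sqrt(n) / z    # solve E = z * s / sqrt(n) for s
    return mean, margin, sd

mean_q13, _, _ = unpack_z_interval(153.5, 182.0, 49, 0.95)
_, margin_q14, sd_q14 = unpack_z_interval(149.4, 188.7, 49, 0.95)
print(mean_q13, round(margin_q14, 2), round(sd_q14, 2))
```

If the interval had been built with a t critical value instead, `z` would need to be replaced by that value.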
QUESTION 15 6 points

To find out how much fat is in a common hot dog, a nutritionist recorded the amount of fat in 100 grams of a random sample of 9 brands as follows:
4 5 5.5 6 6.25 6.75 7 7.75 8
Assume that the underlying distribution of fat in grams is normal.

a. Find the mean and the standard deviation of the amount of fat. Round your answer to two decimal places.

b. Find the 95% confidence interval, ( , ), for the average amount of fat in the hot dog. Round your answer to two decimal places.

Handwritten note: leave this question.

QUESTION 16 6 points

The confidence interval for the mean of all MTH281 scores in the midterm was (68.542, 75.058). This interval was calculated based on a sample of 25 randomly selected students. Assume the grades are normally distributed and the sample standard deviation is 7.5. What is the confidence level that has been used in constructing the interval? Show ALL your work.

Handwritten work: n = 25 and s = 7.5; normal population, so use the t distribution with df = 24.
E = (UCL - LCL)/2 = (75.058 - 68.542)/2 = 3.258
E = t × s/√n, so 3.258 = t × 7.5/√25 = t × 1.5
t = 3.258/1.5 = 2.172; look up the confidence level in the t calculator with df = 24.
QUESTION 17 3 points

A quality engineer of a manufacturer of electric plugs took a sample of 400 plugs, of which only 81 are not working. Find the lower confidence limit (LCL) of the 99% confidence interval for the proportion of plugs that are not working from all plugs produced by that manufacturer. Round your answer to three decimal places.

Handwritten work: n = 400, x = 81, p̂ = 81/400 = 0.2025
LCL = p̂ - zα/2 × √(p̂(1 - p̂)/n) = 0.2025 - 2.576 × √(0.2025 × (1 - 0.2025)/400)
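
The handwritten set-up can be evaluated with the sketch below. It assumes the usual large-sample interval for a proportion, p̂ ± z × √(p̂(1 - p̂)/n); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x: int, n: int, confidence: float):
    """Large-sample confidence interval for a population proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 2.576 for 99%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lcl, ucl = proportion_interval(81, 400, 0.99)
print(round(lcl, 3), round(ucl, 3))
```

The same function applies to the other proportion questions once the conditions n p̂ ≥ 5 and n(1 - p̂) ≥ 5 have been checked.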
QUESTION 18 3 points

A recent research study of 213 female migrants in UAE showed that 66.7% of them have Vitamin D deficiency. Find a 99% confidence interval for the proportion of female migrants in UAE who have Vitamin D deficiency.

(0.5838, 0.7502)
(0.6037, 0.7303)
(0.6007, 0.7333)
(0.6139, 0.7201)

Handwritten note: n = 213, p̂ = 0.667, c = 0.99; the conditions n p̂ ≥ 5 and n(1 - p̂) ≥ 5 are met.
QUESTION 19 3 points

A research study showed that a 99% confidence interval for the proportion of German adults who drink mineral water is 0.48 ± 0.04. What does this mean?

We are 99% confident that between 44% and 52% of the German adults in the selected sample drink mineral water. (marked X)
99% of all random samples of German adults will show that 48% of them drink mineral water. (marked X)
We are 99% sure that 44% to 52% of all German adults drink mineral water. (marked correct)
Between 44% and 52% of all German adults drink mineral water.

Handwritten note: 0.48 - 0.04 = 0.44 and 0.48 + 0.04 = 0.52, so the interval is 44% to 52%.

QUESTION 20 3 points

The Arab Youth Survey showed that a 95% confidence interval of the proportion of young Arabs who would most like to live in UAE is (0.4869, 0.5786). Find the sample proportion of young Arabs who would most like to live in UAE. Round your answer to four decimal places.

Handwritten work: p̂ = (UCL + LCL)/2 = (0.5786 + 0.4869)/2 = 0.5328

QUESTION 21 3 points

In which of the following situations can we NOT construct a confidence interval for the population proportion?

n=300, x=294
n=50, x=48
n=20, x=8
n=500, x=250

Handwritten note: the conditions are n p̂ ≥ 5 and n(1 - p̂) ≥ 5. For n=300, x=294: p̂ = 294/300 = 0.98. For n=50, x=48: p̂ = 48/50.
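
The conditions in the note can be checked for every option at once. The sketch uses the fact that n p̂ equals the number of successes x and n(1 - p̂) equals the number of failures n - x, which avoids floating-point rounding; the threshold of 5 follows the handwritten conditions, and the function name is illustrative.

```python
def conditions_met(x: int, n: int, threshold: int = 5) -> bool:
    """Check n*p_hat >= threshold and n*(1 - p_hat) >= threshold.

    n*p_hat is the count of successes (x) and n*(1 - p_hat) is the
    count of failures (n - x), so integer counts are compared directly.
    """
    return x >= threshold and (n - x) >= threshold

options = [(300, 294), (50, 48), (20, 8), (500, 250)]
results = {(n, x): conditions_met(x, n) for n, x in options}
for (n, x), ok in results.items():
    print(f"n={n}, x={x}: {'conditions met' if ok else 'conditions NOT met'}")
```

Only the option with fewer than 5 failures fails the check.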

QUESTION 22 3 points

Suppose that the average number of grams of fat consumed in a day for the sample of 25 adults is 77 grams with a standard deviation of 35.9 grams. If the fat consumption by adults is normally distributed, find the margin of error (E) of the 99% confidence interval for the mean number of grams of fat consumed daily by adults. Round your answer to one decimal place.

Handwritten note: n = 25, x̄ = 77, s = 35.9, c = 0.99; normal population, so use the t distribution with df = 24; E = tα/2 × s/√n.

QUESTION 23 3 points

A quality control inspector checks a sample of chips produced by a new machine. A sample of 75 chips was checked and 2 were defective. What is the 90% confidence interval for the proportion of defective chips produced by this machine?

Can't calculate the confidence interval
(-0.0043, 0.0577)
(0, 0.0573)
(-0.0039, 0.0573)

Handwritten note: n = 75, x = 2. The conditions are n p̂ ≥ 5 and n(1 - p̂) ≥ 5. Here n p̂ = 75 × (2/75) = 2, which is less than 5, so the condition is not met.

QUESTION 24 3 points

In a sample of 25 water specimens taken from a construction site, five contained detectable levels of lead. What is the 95% confidence interval for the proportion of water specimens that contain detectable levels of lead?

(0.0432, 0.3568)
(0.0684, 0.3316)
(0, 0.4061)
Can't calculate the confidence interval

Handwritten note: n = 25, x = 5, c = 0.95, z = 1.96, p̂ = 5/25 = 0.2; n p̂ = 5 ≥ 5, so the conditions are met.

QUESTION 25 3 points

A survey of 9 popular restaurants is used to estimate the mean number of kitchen employees. Which interval should be used?

z-interval
either z- or t- interval
t-interval
Neither z- nor t- interval

Handwritten note: not normal; sample size < 30 (small).

QUESTION 26 3 points

In a survey of 800 parents in UAE, 509 said that moral education has a positive effect on attitude towards others. Find the margin of error (E) for a 99% confidence interval for the proportion of parents who believe that moral education has a positive effect. Round your answer to four decimal places.

Handwritten answer: E = 0.0438
QUESTION 27 3 points

Shamma computed a confidence interval for the mean weight of ZU female students based on a sample of size 40. The weights are not normally distributed and Shamma used the Student’s t distribution for the confidence interval instead of a normal distribution. Her interval is ............... one obtained by using an appropriate normal distribution.

longer than
shorter than
Not enough information
same as

Handwritten note: the t-dist is wider than the z-dist, so the CI width is larger.

QUESTION 28 3 points

A 95% confidence interval for the mean number of hours per week spent on social media by college students in UAE is (17.1, 19.5). Based on this interval, which of the following is NOT a reasonable conclusion?

College students in UAE spend, on average, more than 17 hours on SM per week.
College students in UAE spend, on average, less than 3 hours on SM per day.
College students in UAE spend, on average, 15 hours on SM per week. (marked not reasonable)
College students in UAE spend, on average, more than 16 hours on SM per week.

QUESTION 29 3 points

A social study showed that a 95% confidence interval for the proportion of employees in computer-related jobs who have changed jobs in the past year is (0.178, 0.246). Find the sample size (n) used in this study. Round your answer to the next integer.

Handwritten work: p̂ = (0.178 + 0.246)/2 = 0.212 and E = (0.246 - 0.178)/2 = 0.034
E = zα/2 × √(p̂(1 - p̂)/n), so 0.034 = 1.96 × √(0.212 × 0.788/n)
n = (1.96/0.034)² × 0.212 × 0.788
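
The rearranged formula can be evaluated as follows. The sketch assumes the interval is the large-sample proportion interval p̂ ± z × √(p̂(1 - p̂)/n) and rounds up to the next integer, as the question asks; the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_from_interval(low: float, high: float, confidence: float) -> int:
    """Solve E = z * sqrt(p(1-p)/n) for n, given the interval limits."""
    p_hat = (low + high) / 2
    margin = (high - low) / 2
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return ceil((z / margin) ** 2 * p_hat * (1 - p_hat))

n_used = sample_size_from_interval(0.178, 0.246, 0.95)
print(n_used)
```

Using the rounded table value z = 1.96 instead of the exact quantile leads to the same integer here.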

QUESTION 30 6 points

In a random sample of 60 children, the mean age at which they first began to combine words was 16 months, with a standard deviation of 9.6 months. Construct a 96% confidence interval for the mean age at which children first begin to combine words. Show ALL your work and round your answer to two decimal places.

Handwritten work: n = 60, x̄ = 16 months, s = 9.6, c = 0.96; not normal, but the sample is large, so use the z distribution.
96% CI = 16 ± 2.054 × 9.6/√60 ≈ (13.45, 18.55), to 2 decimal places


Zayed University
College of Natural and Health Sciences
Department of Mathematics and Statistics

MTH213: Business Statistics


CHAPTER 9 - LECTURE 1 (ANSWERS)
Chapter 12

Outline

1 Introduction

2 Correlation

3 Simple Linear Regression


In addition to hypothesis testing and confidence intervals, inferential statistics involves determining whether a relationship between two or more numerical or quantitative variables exists.

• Number of hours of study and grades


• Distance travelled and taxi cost
• Personality type and choice of movies
• Exercise hours and major subjects in university

How would we know if any relationship exists?

In this section, you will study how to describe what type of relationship, or correlation, exists between two quantitative variables and how to determine whether the correlation is significant.
DEFINITION

A correlation is a relationship between two variables x and y.
The data are represented as ordered pairs (x, y),
where x is an independent variable and y is a dependent variable.

Identify the X and Y variables

• Number of hours of study and grades
• Distance travelled and taxi cost
• Personality type and choice of movies
• Fat consumption and weight
How do we show the relationship between X and Y?
• We use a scatter plot to show the relationship between X and Y.
• A scatterplot is a type of data display that shows the relationship between two numerical variables. The values of the x variable appear on the horizontal axis, and the values of the y variable appear on the vertical axis.
Ex: The scatterplot shows the money spent (in $) and the fuel bought (in gallons).
Each data point is represented as an ordered pair (x, y):
x coordinate: dollars
y coordinate: amount of fuel (gallons)

[Figure: scatter plot of fuel bought (y, gallons) against money spent (x, dollars), with labelled points (5, 1.3) and (10, 2.5).]
Types of Correlation between X and Y

• Types of Correlation
There are three types of correlations between the two variables.
•Positive Correlation
•Negative Correlation
•No correlation

• Nature of Correlation
Strong
Weak /Moderate
Positive Correlation
• A positive correlation occurs when the values of two variables move in the same direction. When the variable x increases, the variable y increases. The trend is upward.
Examples:
•When income falls, consumption also falls.
•The sale of ice cream increases as the temperature increases.

• If the points are close to a line, the correlation is strong.
• If the scatter points are widely dispersed around the line, the correlation is weak.
• If all the points are on a straight line, the correlation is perfect.

[Figure: scatter plots showing moderate positive correlation and perfect positive correlation.]
Negative Correlation
• A negative correlation occurs when the values of two variables move in opposite directions. In other words, an increase in one variable is accompanied by a decrease in the other variable, and vice versa. When the variable x increases, the variable y decreases. The trend is downward.
Examples:
•When the price of property increases, the sales of property decrease.
•As the price of property drops, the sales of property increase.

[Figure: scatter plot showing perfect negative correlation.]
•No correlation

•When two variables have no correlation, they do not appear to be statistically related. The value of one variable does not change in relation to the value of the other variable.

Examples:
•There is no relationship between the amount of tea drunk and level of intelligence.
•Weight and annual income
•FORM OF RELATIONSHIP
• Linear correlation
Correlation is said to be linear if the ratio of change is constant.
In other words, when all the points on the scatter diagram tend to lie near a line which looks like a straight line, the correlation is said to be linear.

• Non-Linear (Curvilinear) Correlation
Correlation is said to be nonlinear if the ratio of change is not constant.
In other words, when all the points on the scatter diagram tend to lie near a smooth curve, the correlation is said to be nonlinear (curvilinear). This is shown in the figure on the right below.
•CORRELATION COEFFICIENT “r”
Interpreting correlation using a scatter plot can be subjective: it is difficult to know the strength of the relationship just by looking at the graph. A more precise way to measure the type and strength of a linear correlation between two variables is to calculate the correlation coefficient “r”.
• r is called the correlation coefficient.
• The range of r is from -1 to 1, both included.
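
The course computes r with a calculator or an online tool, but the quantity itself is easy to sketch. The code below uses the standard Pearson formula, r = Sxy / √(Sxx × Syy), which is assumed here because the notes do not spell out a formula; the data set is hypothetical and only illustrates the calculation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours of study (x) and test scores (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 78]
r = pearson_r(hours, scores)
print(round(r, 3))
```

Points that lie exactly on an upward line give r = 1, and points on a downward line give r = -1, matching the range stated above.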
-

[Figure: example scatter plots with r = 0 (no relation), r = 0.75, r = 0.37, r = -0.93, and r = -0.5 (moderate negative).]

Interpret the relationship between X and Y based on the correlation coefficient.
r = 0.81: strong positive
r = 0.45: weak positive
r = -0.92: strong negative
r = 0: no correlation
r = -1: perfect negative

Interpret the relationship between X and Y based on the correlation coefficient. (Solve in class, show the answer.)

[Figure: scatter plots to classify. The handwritten answers include weak negative, no correlation, moderate positive, strong negative, and strong positive.]
HOW TO CALCULATE CORRELATION “r”
1. Using calculator
2. Using ONLINE link (in your course)

WATCH THIS VIDEO: HOW TO USE CALCULATOR TO FIND CORRELATION COEFFICIENT r
https://www.youtube.com/watch?v=I7F260zL3zo
Use online calculator (in your course)
http://vassarstats.net/corr_stats.html
Q2
Online calculator
http://vassarstats.net/corr_stats.html

Find the correlation coefficient and interpret the relationship.

Handwritten answer: strong positive correlation. As students study for more hours, they get higher scores (x ↑, y ↑).
Q3 A student conducts a study to determine whether there is a linear relationship between the number of hours a student exercises each week and the student’s grade point average (GPA). The data are shown in the table below.
Find the correlation coefficient and interpret the relationship. (Solve in class and show the answers.)
Online calculator
http://vassarstats.net/corr_stats.html

Handwritten answer: r = -0.164, weak negative. As hours of exercise (x) increase, GPA (y) will decrease.
Q1 A director of alumni affairs at a small college wants to determine whether there is a linear relationship between the number of years alumni have been out of school and their annual contributions (in thousands of dollars). The data are shown in the table below.
Calculate the correlation and describe the type of correlation. (HW)
Online calculator
http://vassarstats.net/corr_stats.html
Q4
Online calculator
http://vassarstats.net/corr_stats.html

Handwritten answer: r = -0.954, strong negative correlation. As x (hours of safety training) increases, y (work hours lost in accidents) decreases.
Hypothesis testing-Correlation Coefficient (r)

The correlation coefficient r measures the linear correlation between x values and y values in the sample, which is used to estimate the population correlation coefficient ρ.

When there is no linear relationship between the two variables, then ρ = 0.

Once the sample correlation coefficient r has been calculated, we need to test whether the population correlation coefficient ρ is significant, i.e.

H0 : ρ = 0 (no significant correlation)
Ha : ρ ≠ 0 (there is significant correlation)

A hypothesis test for ρ can also be one-tailed: (NOT IN YOUR COURSE)

► Right-tailed (Ha : ρ > 0; i.e., Positive correlation)
► Left-tailed (Ha : ρ < 0; i.e., Negative correlation)

In this course, we are interested only in the two-tailed hypothesis test.

The test is based on the t distribution with n − 2 degrees of freedom.
Hypothesis testing-Correlation Coefficient

H0: ρ = 0 (no correlation)
Ha: ρ ≠ 0 (yes, there is correlation)

Solve for the p-value (two tails) using the online link:
if p ≤ α, reject H0
if p > α, don't reject H0

Handwritten note for the worked example: n = 7 and α = 0.05.
SOLUTION (handwritten)
(1) r = -0.954, strong negative correlation: as x (hours of safety training) increases, y (work hours lost in accidents) decreases.
(2) H0: ρ = 0 (no correlation); Ha: ρ ≠ 0 (significant correlation)
(3) α = 0.05; df = n - 2 = 7 - 2 = 5; t = -7.11; two-tail test, use the online t link; p-value = 0.001
(4) p < α, so reject H0. We have enough evidence to conclude that there is a significant correlation between hours of safety training and number of work hours lost due to accidents.
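
The test statistic in step (3) can be sketched as below. The formula t = r × √(n - 2) / √(1 - r²) is the standard one for testing ρ = 0 and is assumed here, since the handwritten version is only partly legible; the p-value still comes from the course's online t calculator, because the Python standard library has no t distribution.

```python
from math import sqrt

def correlation_t_statistic(r: float, n: int):
    """t statistic and degrees of freedom for testing H0: rho = 0."""
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r ** 2)
    return t, df

t_stat, df = correlation_t_statistic(-0.954, 7)
print(round(t_stat, 2), df)
```

The two-tailed p-value is then read from a t distribution with `df` degrees of freedom and compared with α.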
Try yourself: The weights (in pounds) of eight vehicles and the variabilities of their braking distances (in feet) when stopping on a dry surface are shown in the table. At α = 0.01, is there enough evidence to conclude that there is a significant linear correlation between vehicle weight and variability in braking distance on a dry surface?

Handwritten solution:
(1) r = 0.623, strong positive correlation: as x (weight of the vehicles) increases, y (braking distance) increases.
(2) H0: ρ = 0 (no correlation); Ha: ρ ≠ 0 (significant correlation)
(3) α = 0.01; df = n - 2 = 8 - 2 = 6; t = 1.95; two-tail test, use the online t link; p-value = 0.099
(4) p > α, so don't reject H0. This means there is no significant correlation between the weight of the vehicles (x) and the braking distance (y).
PRACTICE- PAST PAPER Questions
Q1 Which value of the coefficient of correlation r indicates the strongest linear relationship?
a) 0.55
b) 0.60
c) -0.75
d) -0.45
Handwritten note: c) is the highest value; it doesn't matter whether it is positive or negative.

Q2 Which of the following statements does NOT contain a mistake?
a) There is a correlation of r = 0.64 between the college of the student and his/her GPA.
b) The correlation between age of a car and its price was found to be r = -0.76.
c) The correlation between the experience of an IT technician and his/her salary is r = 0.71 Dhs/year.
d) The correlation between fat and calories in burgers was found to be r = 1.1.
Handwritten notes: in a), college is qualitative, and r is a relation between quantitative variables; in d), r cannot be more than 1.

Q3 If all the points of a scatterplot lie on the least squares regression line, then the correlation coefficient for these variables based on these data is
a) 1
b) 0
c) -1
d) 1 or -1 depending on whether relation is positive or negative
e) Anything between -1 and 1
Q4 Match the value of the correlation to the data in the scatterplot.

0.89, 0.51, -0.11, -0.71

[Figure: four scatter plots, annotated by hand with r = -0.11, r = -0.71, r = 0.89, and r = 0.51.]

Q4 Which scatter plot shows a strong association between two variables even though the correlation is probably near zero?

[Figure: four scatter plots, (a) to (d).]

Handwritten notes: in (b), r will be a negative value; in (c), r will be a positive value; in (d), r = 0 and there is no association. The remaining plot is a curve: r is near 0 because the association is non-linear, but the points are close to the curve.
CHAPTER 9 - LECTURE 2 (ANSWERS)

Outline

1 Introduction

2 Correlation

3 Simple Linear Regression

REGRESSION
• After verifying the linear correlation between two variables, the next step is to determine the equation of the line that best models the data.

• This line is called a regression line, and its equation can be used to predict the value of y for a given value of x.

• Although many lines can be drawn through a set of points, a regression line (line of best fit) is determined by specific criteria.

REGRESSION LINE
• A regression line, also called the line of best fit, is the line for which the sum of the squares of the residuals is a minimum.

Residual = actual value - predicted value (handwritten example: 80 - 72 = 8)

The equation of the regression line is given by

ŷ = b0 + b1 x

ŷ: predicted value
b0 and b1: regression coefficients
Here b0 represents the y-intercept and b1 represents the slope.
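
The least-squares criterion can be illustrated with a small sketch. It uses the standard closed-form solution b1 = Sxy / Sxx and b0 = ȳ - b1 × x̄, which is assumed here because the notes rely on an online calculator; the data set is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # y-intercept
    return b0, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of (actual - predicted)^2 for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
print(round(b0, 2), round(b1, 2))
print(round(sum_squared_residuals(xs, ys, b0, b1), 2))
```

Nudging either coefficient away from the fitted values increases the sum of squared residuals, which is what "line of best fit" means here.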

(Self Study) WATCH THIS VIDEO:
https://www.youtube.com/watch?v=zPG4NjIkCjc
LET’S RECAP: Do you remember the Y-intercept and slope of a straight line?
Sara received $30 from her favorite aunt for her birthday. She wants to save her money to purchase an e-reader. She decided to save this $30 and add from her own money each week. The amount of money saved over weeks is shown in the graph below.

1. Find the Y-intercept (c)

The Y-intercept (c) is the point where the line cuts the Y-axis.
At the y-intercept, x (weeks) = 0 and y (money) = $30.

Interpretation: At the beginning, Sara has $30.

2. Find the slope (m)

Slope (m) is the change in y for every one-unit increase in x.

m = (y2 - y1)/(x2 - x1) = (change in y)/(change in x) = $5/week

She saves $5 every week.

Interpretation: For every additional week, her savings increase by $5.

3. Write the equation y = mx + c
y = 5x + 30
LET’S RECAP: Do you remember the Y-intercept and slope of a straight line?
John took a loan of $1000 from his friend. He decided to return the loan money by paying some amount every week. The remaining loan over weeks is shown in the graph below. (show the answers)

1. Find the Y-intercept (c)

The Y-intercept (c) is the point where the line cuts the Y-axis.
y-intercept: (0 weeks, $1000)

Interpretation: At the beginning, his loan is $1000.

2. Find the slope (m)

Slope (m) is the change in y for every one-unit increase in x.
Using the points (0 weeks, $1000) and (2 weeks, $800):

m = (800 - 1000)/(2 - 0) = -$100/week

Interpretation: For every additional week, the loan amount goes down by $100.

3. Write the equation y = mx + c
y = -100x + 1000
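
The slope and intercept steps above can be written as a short sketch. It takes the two points read from John's graph, (0, 1000) and (2, 800); the function name is illustrative.

```python
def line_through(p1, p2):
    """Slope m and y-intercept c of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)  # change in y per one-unit increase in x
    c = y1 - m * x1            # value of y when x = 0
    return m, c

m, c = line_through((0, 1000), (2, 800))
print(f"y = {m:g}x + {c:g}")
```

Any two distinct points on the same line give the same m and c, so other points read from the graph could be used instead.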
• We will use the online link to compute the regression line.

Link for online calculator
http://vassarstats.net/corr_stats.html

The equation of the regression line is given by

ŷ = b0 + b1 x

b0 and b1: regression coefficients
Here b0 represents the y-intercept (c) and b1 represents the slope (m).
Q1 The following data represent the number of calories per serving (y) and the number of grams of sugar per serving (x) for a random sample of high-fiber cereals.

ŷ = b0 + b1 x

Find the correlation coefficient (r) and interpret the relationship.
Handwritten answer: r = 0.753, strong positive relation. When sugar (x) increases, calories (y) will increase in high-fibre cereals.

Find the intercept of the linear regression model. Round your answer to two decimal places.
y-intercept (b0) =

Find the slope of the linear regression model. Round your answer to two decimal places.
slope (b1) =
Handwritten note: r is positive, so the slope is positive.

Write the regression equation ŷ = b0 + b1 x
Q2 A student conducts a study to determine whether there is a linear relationship between the number of hours a student exercises each week and the student’s grade point average (GPA). The data are shown in the table below. (Show the answers)

Find the correlation coefficient (r) and interpret the relationship.
Handwritten answer: r = -0.164, weak negative. As hours of exercise (x) increase, GPA (y) will decrease.

Find the intercept of the linear regression model. Round your answer to two decimal places.
Handwritten answer: b0 = 3.19

Find the slope of the linear regression model. Round your answer to two decimal places.
Handwritten answer: b1 = -0.021

Write the regression equation ŷ = b0 + b1 x
Handwritten answer: ŷ = 3.19 - 0.021x, that is, GPA = 3.19 - 0.021 × Hours
Q3 Try yourself: (show the answers)
The following table provides information on the monthly incomes (x) and food expenditures (y) of seven households.

Find the correlation coefficient (r) and interpret the relationship.

Find the intercept of the linear regression model. Round your answer to two decimal places.

Find the slope of the linear regression model. Round your answer to two decimal places.

Write the regression equation ŷ = b0 + b1 x

NOTES:
• Slope of a regression line can be positive , negative or 0
• Positive slope means upwards trend, correlation coefficient r should be positive.
• Negative slope means downward trend, correlation coefficient r should be negative.

SOLVE: The coefficient of correlation between calories and fat content in burgers is found to be 0.88. Which of the
following values cannot be the slope of the line of the linear regression?

a) 0.88
b) 1.2
c) -3
d) Any of these options can be a regression slope.

SOLVE: If the coefficient of correlation is -0.4, then the slope of the regression line
a) Must be -0.4
b) Must be 0.4
c) Must be negative
d) Can be either positive or negative
INTERPRETATION OF REGRESSION COEFFICIENTS b0 and b1:
• The Y-intercept b0 is the predicted value of y when x = 0.

• The slope b1 is the predicted change in y as a result of a one-unit increase in x.

• Slopes are always expressed in y-units per x-unit.

WATCH THIS VIDEO: https://www.khanacademy.org/math/ap-statistics/bivariate-data-ap/least-squares-regression/v/interpreting-slope-of-regression-line
Q1 A researcher took a sample of 25 electronics companies and found the following relationship
between the amount of money (in millions of Dhs) spent on advertising (x) by a company and total
gross sales (in millions of Dhs) of that company.

ŷ = 3.6 + 11.75x
sales = 3.6 + 11.75 × adv

What does the Y-intercept mean in this context?
With no advertising (adv = 0), sales = 3.6 million Dhs.

What does the slope mean in this context?
For every additional 1 million Dhs spent on advertising (x), sales (y) increase by 11.75 million Dhs.
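
The two interpretations can be checked numerically: the intercept is the prediction at x = 0, and the slope is the change in the prediction for one extra unit of x. A short sketch using the fitted line ŷ = 3.6 + 11.75x from Q1 (the function name `predict_sales` is illustrative only):

```python
def predict_sales(adv_millions, b0=3.6, b1=11.75):
    """Predicted gross sales (millions of Dhs) from the line y-hat = b0 + b1 * x."""
    return b0 + b1 * adv_millions

# Intercept: predicted sales when nothing is spent on advertising.
intercept_meaning = predict_sales(0)

# Slope: change in predicted sales for one extra million Dhs of advertising.
slope_meaning = predict_sales(5) - predict_sales(4)

print(intercept_meaning, slope_meaning)
```

Any pair of x values one unit apart gives the same difference, because the model is a straight line.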
Q2 It is believed that the final grade of a college student (y) depends on the number of days absent (x) during
the semester. A regression has been conducted on a sample of college students in a statistics course. The
regression equation was found to be
ŷ = 90 − 3x
grade = 90 − 3 × absences

I) Which of the following is the correct interpretation of the Y-intercept (b0, the value when x = 0)?

a. For every additional 3 absences, the student’s final grade drops below 90, on average.
b. The Y-intercept is meaningless in this case.
c. The expected final grade of students with no absences is 90.
d. For every additional absence, the student’s final grade drops by 3 points, on average.

II) Which of the following is the correct interpretation of the slope (b1 = −3 points per absence)?

a. For every additional 3 absences, the student’s final grade drops below 90, on average.
b. The slope is meaningless in this case.
c. The expected final grade of students with no absences is 90.
d. For every additional absence, the student’s final grade drops by 3 points, on average.
Q3 Try yourself: ( show the answers)
The regression equation of monthly salary (1000s Dhs) on experience (years) for computer programmers is

predicted salary = 14.5 + 0.6 × experience

I) Which of the following is the correct interpretation of the Y- intercept?

(a)The expected monthly salary of a programmer with no experience is 14,500 Dhs.


(b) There is no practical interpretation of the y-intercept.
(c) Estimated monthly salary increases by 14.5 Dhs with each 0.6-year increase in experience.
(d) Estimated monthly salary increases by 600 Dhs with each 1-year increase in experience.

II) Which of the following is the correct interpretation of the slope?

(a) The expected monthly salary of a programmer with no experience is 14,500 Dhs.
(b) There is no practical interpretation of the y-intercept.
(c) Estimated monthly salary increases by 14.5 Dhs with each 0.6-year increase in experience.
(d) Estimated monthly salary increases by 600 Dhs with each 1-year increase in experience.
Q4 In a recent study of soy milk, a random sample of 1-cup servings from various brands was
obtained and the number of calories was measured in each (y). In addition, sugar (x, in
grams) was also measured. The following regression equation describes the effect of sugar on
the number of calories.
ŷ = 81 + 3.5x

What does the Y-intercept mean in this context?

A sugar-free soy milk has 81 calories.

What does the slope mean in this context?

For every additional gram of sugar, calories increase by 3.5.
SPECIAL CASE:
• When the scope of the model does not cover x = 0, b0 does not have any meaning as a separate term
in the regression model.
• The interpretation of the intercept sometimes doesn’t make sense in the real world.
• It is, however, acceptable (even required) to interpret this as a coefficient in the model.

The graph shows the distance travelled and amount of gas used.

Interpret the regression coefficients b0 and b1.

Which regression coefficient makes sense?

SOLUTION:
Y-intercept (b0) = 3.1
This means when distance (x) = 0,
gas used (y) = 3.1 gallons.

Is it possible? NO
The Y-intercept is meaningless in this situation.

Slope (b1) = 0.35 gallon/km.

For every additional km travelled, 0.35 gallon is used.
Try yourself: (show the answers)

The regression equation of the amount of energy used in electrical components (in kW) on the number of
residents in the house is

predicted energy = 460 + 64 × occupants
(b0 = 460, b1 = 64; when occupants = 0, predicted energy = 460 kW)

I) Based on the model, the interpretation of the Intercept:

(a) All houses will be charged based on 460 kW.
(b) There is no practical interpretation since no occupants in the house is a nonsensical value.
(c) For each additional 64 occupants the electrical use increases on average by 460 kW.
(d) For each additional occupant, the electrical use increases on average by 64 kW.

II) Based on the model, the interpretation of the Slope (in kW per occupant):

(a) All houses will be charged based on 460 kW.
(b) There is no practical interpretation since no occupants in the house is a nonsensical value.
(c) For each additional 64 occupants the electrical use increases on average by 460 kW.
(d) For each additional occupant, the electrical use increases on average by 64 kW.
Outline

ANSWERS
CHAPTER 9 - LECTURE 3

1 Introduction

2 Correlation

3 Simple Linear Regression


Ø Predictions using Regression equation
We can predict Y values using the regression equation or the graph.

How many cats are owned at the age of 40 years?

1) Graph
At x = 40 years, y ≈ 2.8 to 2.9 cats

2) Regression equation
ŷ = 0.0629x + 0.2934
ŷ = 0.0629 × 40 + 0.2934 = 2.81 cats
Types of Predictions

Ø When we predict values that fall within the range of data points taken, it is called
interpolation.
Ø When we predict values for points outside the range of data taken, it is called extrapolation.

How many gallons are used when the distance travelled is 10 km?

ŷ = 3.1 + 0.35x
ŷ = 3.1 + 0.35 × 10 = 6.6 gallons (interpolation)

How many gallons are used when the distance travelled is 32 km?

ŷ = 3.1 + 0.35x
ŷ = 3.1 + 0.35 × 32 = 14.3 gallons (extrapolation)
YOUTUBE VIDEOS:
https://www.youtube.com/watch?v=3wLIe4NS0IM
https://www.youtube.com/watch?v=IwAHeJY-INc
Ø Why is extrapolation a bad idea in regression analysis?

These predictions are considered unreliable because they use x values outside the range of the data, and we
cannot know whether the linear trend will continue, so extrapolation generally should not be trusted.
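
Whether a prediction is an interpolation or an extrapolation depends only on where the new x value sits relative to the observed x values. A small sketch of that check, assuming a hypothetical list of observed advertising amounts between 1 and 10 million Dhs (the function name and data are illustrative):

```python
def classify_prediction(x_new, x_observed):
    """Label a prediction by comparing x_new with the observed x range."""
    lowest, highest = min(x_observed), max(x_observed)
    if lowest <= x_new <= highest:
        return "interpolation"
    return "extrapolation"

# Hypothetical observed advertising budgets, ranging from 1 to 10 (millions of Dhs).
observed_adv = [1, 2.5, 4, 6, 8, 10]

print(classify_prediction(9, observed_adv))
print(classify_prediction(20, observed_adv))
```

A value flagged as extrapolation can still be plugged into the equation, but the result should be treated as risky for the reason given above.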

Which of the following are interpolations? Solve for y.

ŷ = 37.449 + 7.4514x

a) x = 3: ŷ = 37.449 + 7.4514 × 3 = 59.80
b) x = 6.5: ŷ ≈ 86
c) x = 13: extrapolation
d) x = 45
r = 0.535
a) Find the regression equation. (ŷ = b0 + b1x)

ŷ = 231.69 + 125.32x (ŷ is the predicted value)

b) What would you predict for the price of a 3.5 TB disk?

ŷ = 231.69 + 125.32 × 3.5 = 670.31 Dhs ≈ 670 Dhs

c) You have found a 3.5 TB drive for 525 Dhs. Based on the above answer, is this a good buy? How much would
you save?
Yes, it is a good buy because the actual value is less than the predicted
value. We save 670 − 525 = 145 Dhs.
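
The "good buy" reasoning compares the actual price with the predicted one. A minimal sketch of that comparison using the equation ŷ = 231.69 + 125.32x from this example (the function and variable names are illustrative):

```python
def predicted_price(capacity_tb, b0=231.69, b1=125.32):
    """Predicted disk price in Dhs from the line y-hat = b0 + b1 * x."""
    return b0 + b1 * capacity_tb

actual_price = 525.0                 # price found for the 3.5 TB drive
predicted = predicted_price(3.5)     # what the model expects for 3.5 TB
saving = predicted - actual_price    # positive means cheaper than predicted

print(f"predicted = {predicted:.2f} Dhs, saving = {saving:.2f} Dhs")
```

The unrounded saving is slightly larger than the hand calculation because the prediction was rounded to 670 Dhs before subtracting.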
A) Find and interpret the correlation coefficient.
r = 0.994

B) Find the regression equation. (ŷ = b0 + b1x)
ŷ = 4.16 + 15.51x

C) Predict the service time when 5 components are repaired.
x = 5: ŷ = 4.16 + 15.51 × 5 = 81.71 minutes

D) Should you use the model to predict the service time when 20 components are repaired? Why or why not?
No, we should not use this model to predict for 20 components, because a
prediction for 20 components is extrapolation.
Ø Try yourself
Q1 Consider the following scatterplot of the length of a service call (y) and the number of
electronic components in the computer that must be repaired (x) for a sample of 14 service calls.
The regression line has been drawn in the plot.

Based on the regression line, the best prediction of the length of a service call when 10
components are repaired would be

160 minutes

170 minutes

150 minutes

165 minutes
Ø Try yourself
Q2 A researcher took a sample of 25 electronics companies and found the following relationship
between the amount of money (in millions of Dhs) spent on advertising (x) by a company
(ranging from 1 to 10) and total gross sales (in millions of Dhs) of that company.
ŷ = 3.6 + 11.75x
Predict the total gross sales (in million Dhs) of an electronics company which spent 9 million Dhs
on advertising.
ŷ = 3.6 + 11.75 × 9 = 109.35
a) 235 million Dhs
b) 109.35 million Dhs
c) At least 300 million Dhs
d) This is extrapolation; the prediction is risky.
Ø Try yourself
Q3 A researcher took a sample of 25 electronics companies and found the following relationship
between the amount of money (in millions of Dhs) spent on advertising (x) by a company
(Ranging from 1 to 10) and total gross sales (in millions of Dhs) of that company.
ŷ = 3.6 + 11.75x

Predict the total gross sales (in million Dhs) of an electronics company which spent 20 million Dhs
on advertising (x = 20 million).
a) 235 million Dhs
b) 238.6 million Dhs
c) At least 300 million Dhs
d) This is extrapolation; the prediction is risky.
Ø Try yourself
Q4 Consider the following scatterplot of the length of a service call (y) and the number of
electronic components in the computer that must be repaired (x) for a sample of 14 service calls.
The regression line has been drawn in the plot.
Based on the regression line, the best estimate of the slope would be

a) 4
b) 15.6
c) Not enough information
d) 14

Slope = (y2 − y1) / (x2 − x1), using any two points on the line (you can also take any other points).
Ø Try yourself
Q5 Consider the following scatterplot of the length of a service call (y) and the number of
electronic components in the computer that must be repaired (x) for a sample of 14 service calls.
The regression line has been drawn in the plot.
Based on the regression line, the best estimate of the Y-intercept would be

a) 4
b) 20
c) Not enough information
d) We can’t find, it is extrapolation

Note: with 0 components repaired, 4 minutes are used; the y-intercept does not make sense in this context.
ANSWERS LECTURE1-CHAPTER 6
Objectives

6.1 Introduction to Confidence intervals

6.2 Confidence Intervals for the Mean


Basic Properties of Confidence Intervals
Confidence Intervals for the Mean

6.3 Confidence Interval for One Proportion

In this chapter, you will learn

► To construct and interpret confidence interval estimates for the population mean.
► To construct and interpret confidence interval estimates for the population
proportion.

VIDEO LINKS:

What is a confidence interval?


https://www.youtube.com/watch?v=tFWsuO9f74o&ab_channel=DrNic%27sMathsandStats

Meaning of confidence Interval


https://www.youtube.com/watch?v=JYP6gc--sGQ
In this chapter, you will begin your study of inferential statistics: using sample statistics
to estimate population parameters.

Statistical Inference is a body of techniques which use probability theory to help us to


draw conclusions about a population on the basis of a random sample.

There are two common types of formal statistical inference:

Estimation: to use sample statistics to estimate the value of an unknown population
parameter.

Hypothesis Testing: used when we are making a decision about a population parameter
based on the value of the sample statistic.

In this section you will learn how to construct confidence intervals to estimate the
population mean μ using the sample mean x̄.

We will discuss two cases:

• When the population standard deviation σ is known
or
• When the population standard deviation σ is unknown
DEFINITION

What is Confidence Interval ?

• The confidence interval (CI) is the range of values constructed
from the sample within which we may expect to find the population
mean μ (true mean) with a certain degree of confidence.

• The degree of confidence is called the
level of confidence “c”, for example 90%, 95%, 98%, 99%.

Example : We are 95% confident that average GPA of ZU students is


between 2.92 and 5.62

Here confidence interval is (2.92, 5.62) and level of confidence is 95%.

Ø WATCH VIDEO:
https://www.youtube.com/watch?v=tFWsuO9f74o&ab_channel=DrNic%27sMathsandStats
Confidence Intervals when Population SD σ is known

The general formula for confidence intervals is:

Point estimate ± Margin of error

x̄ ± E
x̄ ± z_{α/2} · σ/√n        (z_{α/2} is the critical value)

CI is written as: (x̄ − z_{α/2} · σ/√n , x̄ + z_{α/2} · σ/√n)

Point estimate: the single value of a sample statistic that is used to
estimate a population parameter. In this case, the sample mean x̄ is a point
estimator for the population mean μ.

Margin of error E:
• The margin of error measures the variation of your estimate. It arises from sampling
variability (different samples of the same size from the same population will have
different means).

• It estimates the greatest possible distance between the sample mean and
the actual mean, that is, how close the sample mean lies to the population mean.

z_{α/2} (Critical Values)

The critical values −z_{α/2} and z_{α/2} are the values that enclose the area that
represents the level of confidence c.
α (alpha) is the proportion in the tails of the sampling distribution that is outside the
confidence interval (a proportion of α/2 is located in each of the lower and
upper tails of the distribution).

Example 1: Write the critical values z_{α/2} for the following confidence intervals.

Online Normal distribution link to find the corresponding values of z_{α/2}:
http://www.statdistributions.com/normal/

a) 80% confidence, c = 0.80
z_{α/2} = ±1.282
b) 90% confidence, c = 0.90
z_{α/2} = ±1.645
c) 95% confidence, c = 0.95
z_{α/2} = ±1.96
d) 98% confidence, c = 0.98
z_{α/2} = ±2.326

Note: The values of z_{α/2} for commonly used confidence levels are:
80% → 1.282, 90% → 1.645, 95% → 1.96, 98% → 2.326, 99% → 2.576
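
As an alternative to the online calculator, the critical value can be computed from the confidence level: with α = 1 − c, z_{α/2} is the point that leaves an area of α/2 in the upper tail of the standard normal curve. A minimal sketch using Python's standard library (the function name `z_critical` is illustrative):

```python
from statistics import NormalDist

def z_critical(confidence):
    """Positive critical value z_{alpha/2} for a given confidence level c."""
    alpha = 1 - confidence
    # Area to the left of z_{alpha/2} is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

for c in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"c = {c:.2f}: z = {z_critical(c):.3f}")
```

The negative critical value is the mirror image, −z_{α/2}, because the standard normal curve is symmetric about 0.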
Example 1: The life in hours of a 75-watt light bulb is known to be normally
distributed with a standard deviation σ = 20 hours. A random sample of
25 bulbs has a mean life of 1020 hours.
Construct a 95% confidence interval for the mean life of the 75-watt light
bulbs. Interpret your results.
Online link: http://www.statdistributions.com/normal/

FORMULA for CI: Sample mean ± Margin of error
x̄ ± E
x̄ ± z_{α/2} · σ/√n

We have n = 25, σ = 20, x̄ = 1020 and c = 0.95, so z_{α/2} = 1.96.

The 95% CI will be (x̄ − z_{α/2} · σ/√n , x̄ + z_{α/2} · σ/√n)

(1020 − 1.96 × 20/√25 , 1020 + 1.96 × 20/√25)

CI → (1012.2, 1027.8)

Lower limit (LCL) = 1012.2, sample mean = 1020, upper limit (UCL) = 1027.8
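
The light-bulb calculation can be reproduced in a few lines. This is a sketch under the same assumptions as the example (normal population, known σ); the function name `z_confidence_interval` is illustrative:

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence):
    """CI for the mean when the population SD sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    margin = z * sigma / sqrt(n)                        # margin of error E
    return sample_mean - margin, sample_mean + margin

# Light-bulb example: n = 25, sigma = 20, sample mean = 1020, c = 0.95
lower, upper = z_confidence_interval(1020, 20, 25, 0.95)
print(f"({lower:.1f}, {upper:.1f})")
```

The code uses the unrounded critical value rather than 1.96, so the limits can differ from the hand calculation in the later decimal places.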
Interpretations of CI:
► We are 95% confident that the population mean lifetime μ is
between 1012.16 and 1027.84 hours.
This means that if we take different samples of the same size, 95% of the intervals will
capture the population mean.

Ex: if we take 20 samples, about 19 of them (95%) will capture the population mean μ.
Of 100 samples, about 95 will capture it and about 5 will not.

NOTE: It is not correct to say that the probability is 95% that the true mean is
between 1012.16 and 1027.84. The population mean is a fixed number: it is either
inside a given interval or not.
Example 2: Scores of an exam are normally distributed with a population
standard deviation of 5.6. A random sample of 40 scores on the exam has
a mean of 32.
A) Estimate the population mean with 90% and 95% confidence. Interpret
your results.
B) Which confidence interval is wider?

FORMULA for CI: Sample mean ± Margin of error
x̄ ± E
x̄ ± z_{α/2} · σ/√n

x̄ = 32, σ = 5.6, n = 40

90% CI: z_{α/2} = 1.645
CI: 32 ± 1.645 × 5.6/√40
90% CI → (30.5, 33.5)

95% CI: z_{α/2} = 1.96
CI: 32 ± 1.96 × 5.6/√40
95% CI → (30.3, 33.7)

The 95% interval is wider than the 90% interval.
NOTE: As we increase the level of confidence, the width will increase.

Finding Limits of Confidence Intervals (CI) [1 question]


Confidence limits are the numbers at the upper and lower end of a
confidence interval.
Example: if your sample mean is 7.4 with confidence limits of 5.4 and 9.4, your
confidence interval is 5.4 (LCL) to 9.4 (UCL).
Practice
A quality control technician is checking the weights of a product which is
normally distributed with a standard deviation of 2.54. She took a random
sample of 47 units and found an average weight of 50.71. Find the upper
confidence limit (UCL) for a 95% confidence interval for the mean.

σ = 2.54, n = 47, x̄ = 50.71, c = 0.95, z_{α/2} = 1.96

Sample mean ± Margin of error
x̄ ± E
x̄ ± z_{α/2} · σ/√n

UCL = x̄ + z_{α/2} · σ/√n = 50.71 + 1.96 × 2.54/√47 ≈ 51.44
Practice

The ages of cars running in the streets of Dubai are normally distributed. A
random sample of 25 cars yields a standard deviation of 7.7 years.

What is the margin of error to find a 95% confidence interval? Round your
answer to four decimal places.

n = 25, σ = 7.7, c = 0.95, z_{α/2} = 1.96

Sample mean ± Margin of error
x̄ ± E
x̄ ± z_{α/2} · σ/√n

E = z_{α/2} · σ/√n = 1.96 × 7.7/√25
E = 3.0184
Finding Sample mean and Margin of Error from the limits

Formulas
x̄ = (UCL + LCL)/2 ,  E = (UCL − LCL)/2

Example: Suppose that the 95% confidence interval for the average number of
COVID-19 new cases in UAE during the last three months is given by (1014.6,
1253.4) based on a random sample of 16 days. It has been noticed that the
number of new cases is approximately normal.
What is the sample mean and margin of error used when computing the
interval?

95% CI → (1014.6, 1253.4)
x̄ = (1014.6 + 1253.4)/2 = 1134 cases
E = (1253.4 − 1014.6)/2 = 119.4
Check: 1134 − 119.4 = 1014.6 and 1134 + 119.4 = 1253.4
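
Because the interval is symmetric around the sample mean, both quantities come straight from the two limits. A minimal sketch of the two formulas (the function name `mean_and_margin` is illustrative):

```python
def mean_and_margin(lcl, ucl):
    """Recover the sample mean and margin of error from the CI limits."""
    sample_mean = (ucl + lcl) / 2   # midpoint of the interval
    margin = (ucl - lcl) / 2        # half of the interval width
    return sample_mean, margin

# COVID-19 example: 95% CI = (1014.6, 1253.4)
x_bar, E = mean_and_margin(1014.6, 1253.4)
print(f"mean = {x_bar:.1f}, margin = {E:.1f}")
```

The same two formulas work for a proportion interval, with p̂ in place of x̄.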

Factors that affect width of Confidence Intervals (CI)

Three factors that affect the width of a CI are as follows:

1. Confidence level c
As we increase the confidence level, the CI becomes wider (c ↑, width ↑).

2. Size of sample n
A larger sample size or lower variability will result in a tighter confidence
interval with a smaller margin of error.
As we increase the sample size n, the CI becomes narrower (n ↑, width ↓).

3. Sample standard deviation s (will discuss in next lecture)
Large n: less spread. Small n: variation is high, so the SD is high.
Meaning of Confidence Intervals (CI)
SELF STUDY: (two questions in quiz/Assignment)
WATCH VIDEO: https://www.youtube.com/watch?v=JYP6gc--sGQ

TRY THIS ONLINE LINK to see how confidence intervals work

http://visualize.tlok.org/elem-stat/mean_confidence_intervals.php

Q: Ali constructed a 90% confidence interval for the mean amount spent per day in a
shopping mall based on 50 customers. The confidence interval was found to be (1000 AED,
2000 AED). Which of the following statement(s) is (are) always correct?

This question has multiple correct options. (Past Paper)

[Answer options were shown as an image. Marked in order: correct, incorrect, incorrect, correct.]
ANSWERS
LECTURE 2-CHAPTER 6
VIDEO LINKS:

What is a t- distribution?
https://www.youtube.com/watch?v=7SS4oHXZw74

In lecture 1, we discussed section 6.1
• (Section 6.1) - Construct CI when we know population SD σ

CI: x̄ ± z_{α/2} · σ/√n
In this lecture,
• (Section 6.2) - Construct CI when we don’t know population SD σ

SECTION 6.2 : CI WHEN POPULATION STANDARD DEVIATION σ IS UNKNOWN

In real-life situations, we don’t know the population SD (σ) because:

• If σ were known, then μ would also be known (since to calculate σ you need to know μ).

• If you truly knew μ, there would be no need to gather a sample to estimate it.

• Our earlier formula will not work.

So, when the population SD σ is unknown, we work with the sample standard deviation s.
There are two cases


CASE 1 :
• Population SD # is unknown
• When Population is normal and any size ( small or Large)
• We use t- distribution.

CASE 2 :
• Population SD # is unknown
• Population is not normal, and sample is large. According to Central
limit theorem, the distribution will be approximately normal.
• We use z- distribution.

→ Imp

We use this table when population SD # is


unknown and we work with Sample SD s
0
Central limit theorem
tribute
.

~P°P Ndis
sample
Pop n aPP
What is a T-distribution?
Ø Watch this video: https://www.youtube.com/watch?v=7SS4oHXZw74

The t distribution (also called Student’s t distribution) is a family of distributions that

• look almost identical to the normal distribution curve,

• are a bit shorter and fatter, because small samples have more variability.

• The t distribution can only be used when the population is normal.

► ONLINE LINKS for t-distribution and z-distribution

• Online link to solve t_{α/2}:
http://www.statdistributions.com/t/
• Online link to solve z_{α/2}:
http://www.statdistributions.com/normal/

PRACTICE: Find the critical t for a 95% confidence interval (c = 0.95) with sample size

a. n = 10, df = 10 − 1 = 9, t_{α/2} = ±2.262
b. n = 18, df = 18 − 1 = 17, t_{α/2} = ±2.110
c. n = 40, df = 39, t_{α/2} = ±2.023
d. n = 70, df = 69, t_{α/2} = ±1.995
EXAMPLE 1: Finding CI

The waiting time of customers at a fast food outlet is normally distributed. A random
sample of 11 customers showed a mean of 2.20 minutes and standard deviation of 0.35
minutes. Find a 95% confidence interval for the mean waiting time at this outlet.

n = 11, x̄ = 2.20 minutes, s = 0.35 minutes, c = 0.95
Normal population → t-distribution, df = 11 − 1 = 10, t_{α/2} = 2.228

CI: x̄ ± t_{α/2} · s/√n
2.20 ± 2.228 × 0.35/√11
CI → (1.96 min, 2.44 min)

We are 95% confident that the mean waiting time of all customers is between
1.96 min and 2.44 min.
Example 2: FINDING CI

n = 100, x̄ = 4.53 days, s = 3.68 days, c = 0.95
Population not normal, large sample size → z-distribution, z_{α/2} = 1.96

CI: x̄ ± z_{α/2} · s/√n
4.53 ± 1.96 × 3.68/√100
CI → (3.81 days, 5.25 days)

We are 95% confident that the average stay of all patients is between
3.81 and 5.25 days.
Example 3: (x → temperature of the coffee)

n = 16, df = 15, x̄ = 162°F, s = 10°F, c = 0.95
Normal population → t-distribution, t_{α/2} = 2.131

CI: x̄ ± t_{α/2} · s/√n
162 ± 2.131 × 10/√16
CI → (156.7°F, 167.3°F)
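
The t-interval follows the same pattern as the z-interval, with s in place of σ and a t critical value looked up for df = n − 1. A minimal sketch, assuming the critical value is read from a t table or the online calculator and passed in by hand (the function name `t_confidence_interval` is illustrative):

```python
from math import sqrt

def t_confidence_interval(sample_mean, s, n, t_crit):
    """CI for the mean when sigma is unknown; t_crit is looked up for df = n - 1."""
    margin = t_crit * s / sqrt(n)   # margin of error E
    return sample_mean - margin, sample_mean + margin

# Coffee example: n = 16 (df = 15), sample mean = 162, s = 10, t = 2.131 for c = 0.95
lower, upper = t_confidence_interval(162, 10, 16, 2.131)
print(f"({lower:.1f}, {upper:.1f})")
```

Passing the critical value in keeps the sketch free of third-party libraries; Python's standard library has no t-distribution function.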
MIXED PRACTICE
Example 4 : FINDING SAMPLE MEAN AND SD

Suppose the 90% confidence interval for the mean SAT scores of applicants at a business
college is given by (1690, 1810). This confidence interval uses the sample mean and the
sample standard deviation based on 40 observations.

What are the sample mean and the sample standard deviation used when computing
the interval?

CI → (1690, 1810), n = 40, c = 0.90, x̄ = ?, s = ?

x̄ = (UCL + LCL)/2 = (1810 + 1690)/2 = 1750

E = (UCL − LCL)/2 = (1810 − 1690)/2 = 60

Population not normal, sample size > 30 → z-distribution, z_{α/2} = 1.645

E = z_{α/2} · s/√n
60 = 1.645 × s/√40
s = 60 × √40 / 1.645 ≈ 230.68
Example 5 : FINDING confidence level c
A statistician selected a sample of 16 IT managers from UAE companies. He reported that
the sample information indicated that the confidence interval for the average salary (in
AED) of all IT managers in UAE is (28698.8, 31301) and the sample standard deviation is 2000
Dhs. Assume that salary is normally distributed. He neglected to report what confidence
level he had used. Based on the above information, what is the confidence level?

n = 16, s = 2000 AED, CI → (28698.8, 31301)
Population is normal → t-distribution, df = 15

E = (UCL − LCL)/2 = (31301 − 28698.8)/2 = 1301.1

E = t_{α/2} · s/√n
1301.1 = t_{α/2} × 2000/√16 = t_{α/2} × 500
t_{α/2} = 1301.1/500 = 2.602

With df = 15, t = 2.602 corresponds to α/2 = 0.01, so c = 0.98 (98%).
Example 6 : Find CI from other CIs
Each of the following is a confidence interval for the true average time to respond to an
editing command with a new operating system:
(84.70, 105.30) and (88.42, 101.58)
Both intervals were calculated from the same sample data (very large sample).
A) The confidence level for one of these intervals is 90% and for the other is 99%. Which
of the intervals has the 90% confidence level, and why?

90% CI → (88.42, 101.58)
99% CI → (84.70, 105.30)
A higher confidence level gives a wider interval, so the narrower interval is the 90% one.
You can calculate the width by doing UCL − LCL.

B) Find the 95% confidence interval.

c = 0.95; very large sample size → z-distribution
x̄ = (UCL + LCL)/2 = (101.58 + 88.42)/2 = 95
From the 90% CI: E = (UCL − LCL)/2 = (101.58 − 88.42)/2 = 6.58
E = z_{α/2} · s/√n → 6.58 = 1.645 × s/√n → s/√n = 4
95% CI: x̄ ± 1.96 × s/√n = 95 ± 1.96 × 4 = 95 ± 7.84
95% CI → (87.16, 102.84)
SUMMARY

• SECTION 6.1
Construct CI when we know population SD σ

CI: x̄ ± z_{α/2} · σ/√n

• SECTION 6.2
Construct CI when we DON’T know population SD σ

CI: x̄ ± t_{α/2} · s/√n
(use z_{α/2} in place of t_{α/2} for a large sample from a non-normal population)
ANSWERS LECTURE 3-CHAPTER 6
Ø SECTION 6.3 : CONFIDENCE INTERVAL FOR POPULATION PROPORTION

A population proportion is a fraction of the population that has a certain characteristic.

Example:
Let's say there are 1,000 people in the population and 237 of those people have blue eyes.

The fraction of people who have blue eyes is 237 out of 1,000, or 237/1000.
• The letter p is used for the population proportion. Here, p = 0.237 = 23.7%

How to estimate the population proportion “p”? (p → population proportion, p̂ → sample proportion)
Ø It is impossible to measure the entire population in the real world.
Ø We use p̂ (the sample proportion) to estimate p.
Ø This sample proportion is written as p̂, pronounced p-hat.

• The sample proportion “p̂” is the point estimate for the population proportion “p”.

FORMULA : Sample proportion

p̂ = x/n = (number of people with that characteristic) / (sample size)
Ø CONFIDENCE INTERVALS (CI) for Population Proportion “p”

CI : p̂ ± E, where E = z_{α/2} · √(p̂(1 − p̂)/n)

p̂ ± z_{α/2} · √(p̂(1 − p̂)/n)

CONDITIONS to write CI for Population proportions:

We can solve CI if and only if n p̂ ≥ 5 and n(1 − p̂) ≥ 5
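
The formula and its two conditions can be combined into one small routine. This is a sketch only: the function name `proportion_ci` is illustrative, and the call below treats the 237-out-of-1,000 figure as a hypothetical sample rather than a population.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """CI for a population proportion, after checking n*p-hat and n*(1 - p-hat)."""
    p_hat = successes / n
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("conditions n*p_hat >= 5 and n*(1 - p_hat) >= 5 not met")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)           # margin of error E
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 237 of 1,000 people have blue eyes, c = 0.95
lower, upper = proportion_ci(237, 1000, 0.95)
print(f"({lower:.4f}, {upper:.4f})")
```

Raising an error when a condition fails mirrors the "if and only if" rule above: the interval is simply not computed in that case.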
Example 1 :

Online link for Normal distribution for z_{α/2}:
http://www.statdistributions.com/normal/

n = 200, c = 0.90
p → proportion of defective disks
① p̂ = 14/200 = 0.07 (NOTE: keep p̂ as a fraction if the decimal is too long.)
② Check conditions:
n p̂ ≥ 5: 200 × 0.07 = 14
n(1 − p̂) ≥ 5: 200 × (1 − 0.07) = 186
③ CI: p̂ ± z_{α/2} · √(p̂(1 − p̂)/n)
0.07 ± 1.645 × √(0.07 × 0.93/200)
CI → (0.0403, 0.0997)
④ Meaning: We are 90% confident that the proportion of all defective disks is between
4% and 10%.
Example 2 :

Online link for Normal distribution for z_{α/2}:
http://www.statdistributions.com/normal/

n = 540, c = 0.90
1) p̂ = 75/540 ≈ 0.14 (NOTE: keep p̂ as a fraction if the decimal is too long.)
2) Check conditions:
n p̂ ≥ 5: 540 × 0.14 = 75.6
n(1 − p̂) ≥ 5: 540 × (1 − 0.14) = 464.4
3) CI: p̂ ± z_{α/2} · √(p̂(1 − p̂)/n), with p̂ = 75/540
CI → (0.1144, 0.1634)
4) Meaning: We are 90% confident that 11% to 16% of customers had service
interruptions.
Example 3 :

>
check this CI
to labeled

Ø Questions from Past assignments


Ø ( These questions are also posted in Practice file)

&

Example 4 (Past Assignment)

A research study of 388 patients with diabetes who chose to fast during Ramadan showed that 23.71%
of them reported hypoglycaemia (low blood sugar) during fasting. Find a 95% confidence interval for
the proportion of fasting diabetic patients who report hypoglycaemia.

n = 388, c = 0.95, p̂ = 23.71% = 0.2371

Check conditions:
n p̂ ≥ 5: 388 × 0.2371 = 91.99
n(1 − p̂) ≥ 5: 388 × (1 − 0.2371) = 296.01

CI: p̂ ± z_{α/2} · √(p̂(1 − p̂)/n)
= 0.2371 ± 1.96 × √(0.2371 × 0.7629/388)
CI → (0.19, 0.28)

Meaning: We are 95% confident that the proportion of patients who reported
hypoglycaemia is between 19% and 28%. Or we can say 19% to 28% of patients
reported hypoglycaemia.
Example 5
A college administrator collects data on a nationwide random sample of 1550 students enrolled in
M.B.A. programs and finds that 375 of these students have undergraduate degrees in business. Find
the upper confidence limit (UCL) of the 95% confidence interval for the proportion of such students in
the nationwide population who have undergraduate degrees in business. Round your answer to four
decimal places.

n = 1550, x = 375, c = 0.95
p̂ = 375/1550 = 15/62 (the decimal is long, so keep it as a fraction)

UCL = p̂ + z_{α/2} · √(p̂(1 − p̂)/n) = 15/62 + 1.96 × √((15/62)(47/62)/1550) ≈ 0.2633
Example 6
A random sample of 280 household electricity bills in Abu Dhabi revealed that 19% of these bills
exceeded the ideal average consumption recommended by ADDC. Find the margin of error (E) of a
90% confidence interval for the proportion of household electricity bills in Abu Dhabi which exceed the
ideal average consumption. Round your answer to four decimal places.

n = 280, p̂ = 19% = 0.19, c = 0.90
p → proportion of houses that exceed the consumption limit

E = z_{α/2} · √(p̂(1 − p̂)/n) = 1.645 × √(0.19 × 0.81/280) ≈ 0.0386
Example 7:
A recent study of over 1000 parents on cyberbullying showed that a 95% confidence interval
for the proportion of kids who faced bullying in social media sites and apps is (0.1327, 0.19).

a) Find the proportion of kids who faced bullying in social media sites and apps.
b) Find the margin of error. Round your answer to four decimal places.

CI: (0.1327, 0.19)

A) p̂ = (UCL + LCL)/2 = (0.19 + 0.1327)/2 ≈ 0.1614
B) E = (UCL − LCL)/2 = (0.19 − 0.1327)/2 ≈ 0.0287
Check: 0.1614 − 0.0287 = 0.1327 and 0.1614 + 0.0287 ≈ 0.19
Question (Past Assignment) → Very Imp
A researcher reported that a confidence interval for the proportion of adults whose weight
decreased by the end of Ramadan is (12.16%, 27.84%) based on a random sample of 100
adults. However, the researcher neglected to report what confidence level has been used
for constructing this interval. Based on the above information, what was the confidence
level? Show ALL your work.

n = 100 adults, CI: (0.1216, 0.2784)
p → proportion of adults who had weight loss

p̂ = (UCL + LCL)/2 = (0.2784 + 0.1216)/2 = 0.20
E = (UCL − LCL)/2 = (0.2784 − 0.1216)/2 = 0.0784

E = z_{α/2} · √(p̂(1 − p̂)/n)
0.0784 = z_{α/2} × √(0.20 × 0.80/100) = z_{α/2} × 0.04
z_{α/2} = 0.0784/0.04 = 1.96

z_{α/2} = 1.96 corresponds to c = 0.95, so the confidence level is 95%.
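
The same back-calculation can be written as a short routine: recover p̂ and E from the limits, solve E = z · √(p̂(1 − p̂)/n) for z, then convert z to a confidence level with c = 2·Φ(z) − 1. A sketch under those assumptions (the function name `confidence_level_from_ci` is illustrative):

```python
from math import sqrt
from statistics import NormalDist

def confidence_level_from_ci(lcl, ucl, n):
    """Back out the confidence level used for a proportion CI."""
    p_hat = (ucl + lcl) / 2                      # midpoint = sample proportion
    margin = (ucl - lcl) / 2                     # half-width = margin of error
    z = margin / sqrt(p_hat * (1 - p_hat) / n)   # critical value that was used
    return 2 * NormalDist().cdf(z) - 1           # central area between -z and z

# Ramadan weight-loss question: CI = (0.1216, 0.2784), n = 100
level = confidence_level_from_ci(0.1216, 0.2784, 100)
print(f"confidence level = {level:.2%}")
```

The central area between −z and z is the confidence level, which is why the tail area on each side is (1 − c)/2.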
ANSWERS
LECTURE-1-CHAPTER 7
Objectives

7.1 Introduction to Hypothesis Testing

7.2 Hypothesis Testing for Population Mean


• One-Sample z test
• One-Sample t test

7.3 Hypothesis Testing for Population Proportion

VIDEO LINKS:

Writing Null and Alternative Hypotheses?

https://www.youtube.com/watch?v=uvwNc5dVwNM

What is significance level and Type 1 error?


https://www.youtube.com/watch?v=dDOKxRnCovo
DEFINITION

What is a Hypothesis ?

A statement about a population parameter is called a hypothesis.


It’s basically a claim that can be tested.

Examples: All Dubai residents are vaccinated.


Average (mean) age of marriage in Dubai is 29 years.
Proportion (percentage) of unemployed males in the UK is 0.18 (18%).

A hypothesis test is a process that uses sample statistics to test a claim about
the value of a population parameter.

HYPOTHESIS Testing
• Gathering enough evidence to either REJECT or NOT REJECT the claim.
4.2.1 Steps for Hypothesis Testing

Ø Steps for Hypothesis testing:


1. Write the hypotheses (null and alternative)
2. Calculate the test statistic value (one-sample test)
3. Set up the significance level
4. Decide the test: right-tail, left-tail or two-tail test
5. Compute the p-value
6. Make a decision and write the conclusion.

Step 1: Write the Hypotheses

There are two types of hypotheses: the null hypothesis (H0) and the alternative
hypothesis (Ha or H1).

Null hypothesis (H0):

► A null hypothesis is a statement about a population that is assumed to be
true until it is declared false.
Example: The average weight of ZU students is 62 kg.
Alternative hypothesis (Ha or H1):
► An alternative hypothesis is the opposite (complement) of the null
hypothesis.
► The alternative hypothesis is accepted as true when the null hypothesis is rejected.
Example: The average weight of ZU students is not 62 kg.
Important points:

► The null hypothesis is the one that contains a statement of equality (≤, ≥ or =);
the alternative hypothesis contains <, > or ≠.
► Either hypothesis, the null hypothesis or the alternative hypothesis,
may represent the original claim.

The most important step is setting up the null (H0) and alternative (Ha) hypotheses.

Ø (Self study) Watch this video:

https://www.youtube.com/watch?v=uvwNc5dVwNM

WRITE NULL (H0) AND ALTERNATIVE HYPOTHESIS (Ha).

Identify which one represents the claim.

Ex 1: Mike claims that the average (mean) price for houses in Castle Hill is more
than $500,000.

μ: average price of houses in Castle Hill
H0: μ ≤ 500,000
Ha: μ > 500,000 (claim)

Ex 2: A testing center advertises that the average (mean) waiting time for a COVID
test is less than 30 minutes.

μ: average waiting time for testing
H0: μ ≥ 30
Ha: μ < 30 (claim)

Ex 3: A company claims that the mean drying time of its paints is 45
minutes.

μ: average drying time of its paints
H0: μ = 45 (claim)
Ha: μ ≠ 45

Ex 4: A consumer analyst reports that the mean (average) life of a certain
type of automobile battery is not 74 months.

μ: average battery life
H0: μ = 74
Ha: μ ≠ 74 (claim)

Ex 5: A real estate company publicizes that the proportion of homeowners
who feel their house is too small for their family is more than 24%.

p: proportion of owners who feel their house is too small
H0: p ≤ 0.24
Ha: p > 0.24 (claim)
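The rule used in these examples (the hypothesis with the equality sign is H0, and its complement is Ha) can be sketched in Python. This is only an illustration; the function name and the way the claim is passed in are assumptions made for these notes.

```python
# Complement of each comparison sign.
COMPLEMENT = {"=": "≠", "≠": "=", "≤": ">", ">": "≤", "≥": "<", "<": "≥"}
# Signs that contain equality and therefore belong to H0.
EQUALITY_SIGNS = {"=", "≤", "≥"}

def write_hypotheses(parameter, claim_sign, value):
    """Return (H0, Ha, name of the hypothesis that carries the claim)."""
    other_sign = COMPLEMENT[claim_sign]
    if claim_sign in EQUALITY_SIGNS:
        # The claim contains equality, so the claim itself is H0.
        h0_sign, ha_sign, claim = claim_sign, other_sign, "H0"
    else:
        # The claim has no equality, so the claim is Ha.
        h0_sign, ha_sign, claim = other_sign, claim_sign, "Ha"
    h0 = f"H0: {parameter} {h0_sign} {value}"
    ha = f"Ha: {parameter} {ha_sign} {value}"
    return h0, ha, claim

print(write_hypotheses("μ", ">", 500000))  # Ex 1: claim is Ha
print(write_hypotheses("μ", "=", 45))      # Ex 3: claim is H0
```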

CHOOSE THE RIGHT OPTION for Null and Alternative (self practice)

Step 2: Calculate test statistic value (Collecting Evidence)

In step 2, we collect evidence. That evidence is the test statistic value.

The test statistic value is computed from the sample and used as
evidence to make the decision.

To find the test statistic value, we need to consider two cases.

Case I: Population is normal; the sample size does not matter.
We use the one-sample t-test.
Test statistic value: t = (x̄ - μ0)/(s/√n)

Case II: Large sample (n ≥ 30) and population is not normal.
We use the one-sample z-test.
Test statistic value: z = (x̄ - μ0)/(s/√n)

Both t and z have the same formula, but we will put them in different online
calculators.


Calculate test statistic value (Past Paper) [2 marks]

Ex 1: Find the standardized test statistic t for a sample with
n = 20, x̄ = 14.5, s = 1.25, if the null hypothesis is H0: µ = 16.

t = (x̄ - µ0)/(s/√n) = (14.5 - 16)/(1.25/√20) = -5.367

Ex 2: Find the standardized test statistic z for a sample with n = 40, x̄ = 17.5, s =
6.25, if the null hypothesis is H0: µ = 15. Round your answer to three decimal places.

z = (x̄ - µ0)/(s/√n) = (17.5 - 15)/(6.25/√40) = 2.530
Calculate test statistic value:
Ex 3: A famous property consultant announces that the average price of an
outlet is 300,000 AED. A sample of 25 outlets for sale is investigated and it
revealed an average selling price of 265,000 AED. Assume that the outlet
costs are normally distributed with a standard deviation of 14,000 AED. What is
the value of the test statistic? (Past paper)

In this question σ is given, so use σ in place of s.
μ0 = 300,000 AED, n = 25, x̄ = 265,000 AED, σ = 14,000 AED, population is normal

t = (x̄ - μ0)/(σ/√n) = (265,000 - 300,000)/(14,000/√25) = -35,000/2,800 = -12.5
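The three test statistic values above can be reproduced with a small Python sketch. The function name is only illustrative; the formula is the one from Step 2, and the same function serves for t and z because both use the same formula.

```python
from math import sqrt

def standardized_statistic(sample_mean, mu0, std_dev, n):
    """t or z = (x-bar - mu0) / (std_dev / sqrt(n)).
    std_dev is s, or sigma when the population SD is given."""
    return (sample_mean - mu0) / (std_dev / sqrt(n))

print(round(standardized_statistic(14.5, 16, 1.25, 20), 3))        # Ex 1
print(round(standardized_statistic(17.5, 15, 6.25, 40), 3))        # Ex 2
print(round(standardized_statistic(265000, 300000, 14000, 25), 3)) # Ex 3
```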

So far: hypotheses H0 and Ha, then the test statistic value (evidence).

Step 3: Set up significance level (∝-value)
• Sometimes the null hypothesis is true, but you still reject it. This means you made an
error. This is called a Type 1 error: rejecting a TRUE null hypothesis.
• The level of significance (∝) is the probability (risk) of making such an error.

• ∝ is the area of the rejection region for the null hypothesis.

If the test statistic value (evidence) falls in this area, then the null hypothesis is
rejected. If it falls in the acceptance area, don't reject H0.
• Three commonly used levels of significance are ∝ = 0.10, ∝ = 0.05, and ∝ =
0.01. (∝ is given in the question.)

Interpretation:
Ex: If ∝ = 0.05, there is a 5% chance of making a Type 1 error.
If ∝ = 0.1, there is a 10% chance of making a Type 1 error.

As we increase ∝, the rejection region becomes bigger, so there is more
probability that H0 will be rejected.

• NOTE: The smaller the ∝, the lower the chance of making a Type 1 error.

Ø Once you find your test statistic value, check whether this value falls in
the rejection area: if the value of the test statistic (evidence) falls in the
rejection region, the null hypothesis is rejected.


Step 4: Decide the Test ( right tail, left tail or two tail)

Ø There are three types of tests to see whether the test statistic value falls in the
rejection area:
• These tests are based on the sign of the alternative hypothesis Ha.

Left-tailed test: Ha has the < (less than) sign. The rejection area ∝ is in the left tail.
Right-tailed test: Ha has the > (greater than) sign. The rejection area ∝ is in the right tail.
Two-tailed test: Ha has the ≠ sign. The rejection area is split, ∝/2 in each tail.


Practice: Identify the Nature of a Hypothesis Test

H0: μ ≤ 8.0
Ha: μ > 8.0
Right-tail test: rejection region on the right side.

H0: p ≥ 0.25
Ha: p < 0.25
Left-tail test: rejection region on the left side.

H0: μ = 9.2
Ha: μ ≠ 9.2
Two-tail test: rejection regions on both sides.
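The link between the sign of Ha and the type of test can be written as a small lookup in Python. This is only a sketch; the function name and the returned labels are assumptions made for these notes.

```python
def rejection_areas(ha_sign, alpha):
    """Return the tail type and the rejection area in each tail,
    based on the sign that appears in Ha."""
    if ha_sign == "<":
        return {"test": "left tail", "left": alpha, "right": 0.0}
    if ha_sign == ">":
        return {"test": "right tail", "left": 0.0, "right": alpha}
    if ha_sign == "≠":
        # Two-tailed: the rejection area is split equally between the tails.
        return {"test": "two tail", "left": alpha / 2, "right": alpha / 2}
    raise ValueError("Ha must contain <, > or ≠")

print(rejection_areas(">", 0.05))   # Ha: μ > 8.0
print(rejection_areas("<", 0.05))   # Ha: p < 0.25
print(rejection_areas("≠", 0.05))   # Ha: μ ≠ 9.2
```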

SOLVE (Past Paper questions)
Q1 A Type 1 error is
When you reject a false null hypothesis
When you reject a true null hypothesis
When you reject a false alternative hypothesis
When you reject a true alternative hypothesis
Answer: When you reject a true null hypothesis

Q2 When the value of ∝ is increased, the probability of making a Type 1 error is

A) Increased
B) Decreased
C) Does not change
D) None of the above
Answer: A) Increased

Q3 Which value of ∝ carries the highest risk of making a Type 1 error?

a) ∝ = 0.01
b) ∝ = 0.05
c) ∝ = 0.1
Answer: c) ∝ = 0.1

Q4 The average amount spent for breakfast by IT professionals was 17 AED. It
was claimed that breakfasts in one of the cafés cost more than 17 AED. A
sample of 25 IT professionals at that café had an average breakfast cost of
18.5 AED with a standard deviation of 3.5 AED. Assuming that the breakfast
cost is normally distributed,

a) Write the null and alternative hypotheses.
b) Calculate the test statistic value.
c) Which test would you use?

a) H0: μ ≤ 17 AED
Ha: μ > 17 AED (claim)

b) n = 25, x̄ = 18.5 AED, s = 3.5 AED
Population is normal, so use the t-distribution.
t = (x̄ - μ0)/(s/√n) = (18.5 - 17)/(3.5/√25) = 1.5/0.7 = 2.143

c) The sign of Ha is >, so use a right-tail test.
LECTURE 2-CHAPTER 7
Objectives

7.1 Introduction to Hypothesis Testing

7.2 Hypothesis Testing for Population Mean


• One-Sample z test
• One-Sample t test

7.3 Hypothesis Testing for Population Proportion

Ø Steps for hypothesis testing:

1. Write the hypotheses (null and alternative)
2. Calculate the test statistic value (one-sample test)
3. Decide the test: right tail, left tail or two tail (from the sign of Ha)
4. Compute the p-value
5. Based on the significance level, make a decision and write
the conclusion.

Ø t or z = (x̄ - μ0)/(s/√n)

Ø If σ is given, then the test statistic value is t or z = (x̄ - μ0)/(σ/√n)
Step 5: Calculating p-value (to know the strength of the evidence)

• The main idea of the p-value is to decide whether there is enough evidence to
reject the null hypothesis.

• The p-value tells us the strength of the evidence against H0.

• The smaller the p-value, the stronger the evidence against H0.

• If the p-value is low (less than or equal to the level of significance, the
∝-value), we can state that there is enough evidence to reject the null
hypothesis. Otherwise, we should not reject the null hypothesis.

How to make a decision based on the p-value

p-value ≤ ∝: REJECT H0
p-value > ∝: DON'T REJECT H0

How to compute the p-value?

We will use an online calculator to compute the p-value.
http://www.statdistributions.com/normal/ z-based online calculator
http://www.statdistributions.com/t/ t-based online calculator

In either calculator, choose the tail from the sign of Ha: right tail (>), left
tail (<) or two tails (≠).



Finding a P-Value

1) For a left-tailed test with the test statistic z = -2.35, what is the associated
p-value? Decide whether to reject H0 when the level of significance is ∝ = 0.10.

Left tail, z = -2.35 (z-calculator)
(1) p-value = 0.009
(2) p-value < ∝ (0.009 < 0.10), so REJECT H0.

2) Find the p-value for a two-tailed hypothesis test with a standardized test
statistic of z = 1.64. Decide whether to reject H0 when the level of
significance is ∝ = 0.10.

Two-tail test, z = 1.64 (z-calculator)
(1) p-value = 0.101
(2) p-value > ∝ (0.101 > 0.10), so DON'T REJECT H0.
The p-value only looks equal to ∝ when it is rounded to 0.10; keep three decimal
places in borderline cases like this one.

3) For a right-tailed test based on a sample of 14 observations with the test
statistic t = 2.185, what is the associated p-value? Decide whether to reject
H0 when the level of significance is ∝ = 0.05.

n = 14, t = 2.185, right-tail test, df = 14 - 1 = 13 (use the t-calculator)
(1) p-value = 0.024
(2) p-value < ∝ (0.024 < 0.05), so REJECT H0.

4) For a two-tailed test based on a sample of 14 observations with the test
statistic t = 1.974, what is the associated p-value? Decide whether to reject
H0 when the level of significance is ∝ = 0.05.

n = 14, t = 1.974, two-tail test, df = 14 - 1 = 13 (use the t-calculator)
(1) p-value = 0.07
(2) p-value > ∝ (0.07 > 0.05), so DON'T REJECT H0.
5) The mean potassium content of a popular sports drink is listed as 50 mg in a
500-ml bottle. Mariam is interested in testing whether the actual mean
potassium content is less than the listed value based on a sample of 40
bottles. She was not sure about the distribution of the potassium content.
She computed the test statistic as -1.97. What is the corresponding p-value
of the test?

(1) H0: μ ≥ 50
Ha: μ < 50 (claim)
(2) n = 40: the population is not known to be normal, but the sample size is
large, so use the z-test. z = -1.97
(3) Ha has <, so this is a left-tail test.
(4) p-value = 0.024 (z-calculator)

Steps to follow:
1. Hypotheses (null and alternative)
2. Test statistic value: t or z = (x̄ - μ0)/(s/√n)
3. Right tail, left tail or two tail test (Ha has >, < or ≠)
4. p-value
5. Make a decision based on the p-value and ∝, and write the conclusion.
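The p-values in these five exercises come from the online calculators. As a cross-check, here is a minimal Python sketch using only the standard library. The function names are made up for these notes, and the t-distribution area is approximated with Simpson's rule rather than taken from a statistics package, so treat it as an illustration and not as the calculators' own method.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for Student's t with df degrees of freedom.
    Integrates the density from 0 to |t| with Simpson's rule (steps is even).
    Suitable for the small df used in these notes."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    width = abs(t) / steps
    total = pdf(0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * width)
    area = total * width / 3          # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def p_value(statistic, tail, df=None):
    """tail is 'left', 'right' or 'two'. Use df=None for a z statistic,
    or pass df = n - 1 for a t statistic."""
    cdf = NormalDist().cdf(statistic) if df is None else t_cdf(statistic, df)
    if tail == "left":
        return cdf
    if tail == "right":
        return 1 - cdf
    return 2 * min(cdf, 1 - cdf)      # two-tailed

print(round(p_value(-2.35, "left"), 3))          # exercise 1
print(round(p_value(1.64, "two"), 3))            # exercise 2
print(round(p_value(2.185, "right", df=13), 3))  # exercise 3
print(round(p_value(1.974, "two", df=13), 3))    # exercise 4
print(round(p_value(-1.97, "left"), 3))          # exercise 5
```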
Ø HYPOTHESIS TESTING WITH FULL STEPS (7-8 marks)

Ex 1: An environmental researcher claims that the mean wind speed in Abu
Dhabi is less than 15 km per hour. A sample of 16 days has a mean wind speed
of 14.5 km per hour and a standard deviation of 1 km per hour. Assume that the
wind speed in Abu Dhabi is normally distributed. At 1% significance level, is
there enough evidence to support the researcher’s claim?

(1) H0: μ ≥ 15 km/hr
Ha: μ < 15 km/hr (claim)

(2) n = 16, x̄ = 14.5 km/hr, s = 1 km/hr
Population is normal, so use the t-distribution.
t = (x̄ - μ0)/(s/√n) = (14.5 - 15)/(1/√16) = -2, df = 16 - 1 = 15

(3) Ha has <, so this is a left-tail test.

(4) p-value = 0.032 (t-calculator)
p-value > ∝ (0.032 > 0.01), so DON'T REJECT H0.

(5) Conclusion: At the 1% significance level, we do not have enough evidence to
support the claim that the average wind speed in Abu Dhabi is less than 15 km
per hour.
Ex 2: A researcher claims that the mean annual cost of raising a child (age 2
and under) by husband-wife families in the U.S. is $13,960. In a random sample
of husband-wife families in the U.S., the mean annual cost of raising a child
is $13,725 and the sample standard deviation is $2345. The sample consists of 500
children. At ∝ = 0.10, is there enough evidence to reject the claim?

(1) H0: μ = $13,960 (claim)
Ha: μ ≠ $13,960

(2) n = 500, x̄ = $13,725, s = $2345
Population is not known to be normal, but the sample is large, so use the z-test.
z = (x̄ - μ0)/(s/√n) = (13,725 - 13,960)/(2345/√500) = -2.2408

(3) Ha has ≠, so use a two-tail test.

(4) p-value = 0.025 (z-calculator)
p-value < ∝ (0.025 < 0.10), so REJECT H0.

(5) Conclusion: We have enough evidence to reject the claim (H0) and conclude
that the mean annual cost is not equal to $13,960.
Ex 3: A used car dealer says that the mean price of a two-year-old sedan (in
good condition) is at least $20,500. You suspect this claim is incorrect and find
that a random sample of 14 similar vehicles has a mean price of $19,850 and a
standard deviation of $1084. Is there enough evidence to reject the dealer’s
claim at ∝ = 0.05? Assume the population is normally distributed.

(1) H0: μ ≥ $20,500 (claim)
Ha: μ < $20,500

(2) n = 14, x̄ = $19,850, s = $1084
Population is normal, so use the t-distribution.
t = (x̄ - μ0)/(s/√n) = (19,850 - 20,500)/(1084/√14) = -2.2436

(3) t = -2.2436, left-tail test, df = 14 - 1 = 13
p-value = 0.022 (t-calculator)
p-value < ∝ (0.022 < 0.05), so REJECT H0.

(4) Conclusion: We have enough evidence to reject the claim (H0) and conclude
that the mean price of a two-year-old sedan is less than $20,500.

Ex 4: A carpet company advertises that it will deliver your carpet within at most 6
days (6 days or less) of purchase. A sample of 36 past customers is taken. The
average delivery time in the sample was 6.8 days. The population standard
deviation is known to be 2.7 days. At ∝ = 0.05, is there enough evidence to
reject the claim?

(1) H0: μ ≤ 6 (claim)
Ha: μ > 6

(2) n = 36, x̄ = 6.8 days, σ = 2.7 days
Population is not known to be normal, but the sample is large, so we will use
the z-score.
z = (x̄ - μ0)/(σ/√n) = (6.8 - 6)/(2.7/√36) = 0.8/0.45 = 1.778

(3) z = 1.778, right-tail test
p-value = 0.038 (z-calculator)
p-value < ∝ (0.038 < 0.05), so REJECT H0.

(4) Conclusion: We have enough evidence to reject the claim (H0) and conclude
that the mean delivery time is more than 6 days.
Ex 5: According to a study, the mean cost of weight loss surgery is $21,500. You
think this information is incorrect. You randomly select 25 surgery patients and
find that the mean cost for their surgeries is $20,695. From past studies, the
population standard deviation is known to be $2250, and the population is
normally distributed. Is there enough evidence to support your claim at ∝ =
0.05? Use a P-value.

(1) H0: μ = $21,500
Ha: μ ≠ $21,500 (claim)

(2) n = 25, x̄ = $20,695, σ = $2250
Population is normal, so use the t-distribution.
t = (x̄ - μ0)/(σ/√n) = (20,695 - 21,500)/(2250/√25) = -805/450 = -1.789

(3) t = -1.789, two-tail test, df = 25 - 1 = 24
p-value ≈ 0.086 (t-calculator)
p-value > ∝ (0.086 > 0.05), so DON'T REJECT H0.

(4) Conclusion: We do not have enough evidence to support the claim (Ha) that
the mean cost of the surgery is different from $21,500.
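For the large-sample (z-based) examples, the five steps can be strung together in one short Python sketch. It is only an illustration under stated assumptions: the function name is made up, the p-value comes from the standard normal distribution in Python's standard library instead of the online calculator, and H0 is rejected when p-value ≤ ∝. The t-based examples need a t-calculator and are not covered here.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, std_dev, n, ha_sign, alpha):
    """Return (z, p-value, decision) for a one-sample z-test.
    ha_sign is the sign in Ha: '<', '>' or '≠'."""
    z = (sample_mean - mu0) / (std_dev / sqrt(n))   # step 2: test statistic
    cdf = NormalDist().cdf(z)
    if ha_sign == "<":                               # step 3: left tail
        p = cdf
    elif ha_sign == ">":                             # step 3: right tail
        p = 1 - cdf
    else:                                            # step 3: two tails
        p = 2 * min(cdf, 1 - cdf)
    # Step 5: compare the p-value with the significance level.
    decision = "reject H0" if p <= alpha else "don't reject H0"
    return z, p, decision

# Ex 2: child-raising cost, Ha: μ ≠ 13,960, ∝ = 0.10
print(one_sample_z_test(13725, 13960, 2345, 500, "≠", 0.10))
# Ex 4: carpet delivery, Ha: μ > 6, ∝ = 0.05
print(one_sample_z_test(6.8, 6, 2.7, 36, ">", 0.05))
```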
EXTRA PRACTICE:
COMPUTE p-VALUE (Past assignments)

1) The Eagle Ridge Contractors Association claims the average price of a home in their
subdivision is $130,000 with a standard deviation of $7,000. A sample of 16 homes for sale
in this subdivision had an average selling price of $125,550. The Eagle Ridge Homeowners
Association is interested in knowing if the costs of homes for sale in this subdivision are
actually lower than claimed. Assuming that the costs of homes are normally
distributed, what is the p-value for this test?

H0: μ ≥ 130,000
Ha: μ < 130,000 (claim)
n = 16, x̄ = 125,550, σ = 7,000; population is normal, so use the t-calculator with df = 15.
t = (x̄ - μ0)/(σ/√n) = (125,550 - 130,000)/(7,000/√16) = -4,450/1,750 = -2.543
Left tail, p-value ≈ 0.011

2) A recent survey indicated that the average amount spent for breakfast by business
managers was $7.58 with a standard deviation of $0.42. It was felt that breakfasts on the
West Coast were higher than $7.58. A sample of 81 business managers on the West Coast
had an average breakfast cost of $7.65. Find the P-value for the test.

(1) H0: μ ≤ $7.58
Ha: μ > $7.58 (claim)
(2) x̄ = $7.65, σ = $0.42, n = 81
Population is not known to be normal, but the sample is large, so use the z-calculator.
z = (x̄ - μ0)/(σ/√n) = (7.65 - 7.58)/(0.42/√81) = 1.5
(3) z = 1.5, right tail: p-value = 0.067

3) The mean potassium content of a popular sports drink is listed as 50 mg in a 500-ml
bottle. Mariam is interested in testing whether the actual mean potassium content is less
than the listed value based on a sample of 16 bottles. She believes that the distribution of
the potassium content is normal. She computed the test statistic as -1.77. What is the
corresponding p-value of the test?

(1) H0: μ ≥ 50
Ha: μ < 50 (claim)
(2) Population is normal, so use the t-calculator: t = -1.77, df = 16 - 1 = 15
(3) Left-tail test: the p-value is just below 0.05, because |t| = 1.77 is slightly
larger than 1.753, the t-value that leaves 5% in one tail when df = 15.
4) Ahmed is interested in testing whether the pH level of the drinking water is 8 using 64
water samples. The corresponding hypotheses are H0: μ = 8 versus Ha: μ ≠ 8. Which of
the following possible sample results gives the strongest evidence to reject H0 in favor
of Ha?

The smaller the p-value, the stronger the evidence, so check the p-value for each option.

(a) z = (8.3 - 8)/(0.75/√64) = 3.2, p-value = 0.001
(b) z = 1.78 (s = 0.45)
(c) z = -2.29 (s = 0.7)
(d) z = 1.145 (s = 0.7)

Option (a) has the largest |z| and therefore the smallest p-value (0.001).
Option (a) is the correct answer.
5) Abu Dhabi fire department aims to respond to fire calls in 4 minutes
or less, on average. A sample of 18 recent fire calls showed a mean response time of 4
minutes 30 seconds with a standard deviation of 1 minute. Assume that response times are
normally distributed. Would this information provide sufficient evidence to show that the
goal is being met at 5% significance level?

x̄ = 4 min 30 sec = 4.5 minutes

(1) H0: μ ≤ 4 (claim)
Ha: μ > 4
(2) Population is normal, so use the t-distribution.
t = (x̄ - μ0)/(s/√n) = (4.5 - 4)/(1/√18) = 2.121, df = 18 - 1 = 17
(3) Right tail, ∝ = 0.05
p-value = 0.024 (t-calculator)
p-value < ∝ (0.024 < 0.05), so REJECT H0.
(4) Conclusion: We have enough evidence to reject the claim (H0) and conclude
that the mean response time is more than 4 minutes, so the goal is not being met.
LECTURE 3-CHAPTER 7
Objectives

7.1 Introduction to Hypothesis Testing

7.2 Hypothesis Testing for Population Mean


• One-Sample z test
• One-Sample t test

7.3 Hypothesis Testing for Population Proportion

RECAP OF LECTURE 1 and LECTURE 2

Ø Steps for hypothesis testing for mean μ:
1. Write the hypotheses (null and alternative)
2. Calculate the test statistic value (one-sample test)
3. Decide the test: right tail, left tail or two tail
4. Compute the p-value
5. Based on the significance level, make a decision and
write the conclusion.

Ø t or z = (x̄ - μ0)/(s/√n)
Ø If σ is given, then the test statistic value is t or z = (x̄ - μ0)/(σ/√n)
Ø 7.3 HYPOTHESIS TESTING FOR POPULATION PROPORTION

Do you remember?

In section 6.3, we learnt about population proportion.

A population proportion is a fraction of the population that has a certain
characteristic (for example, the fraction of people who are unemployed). The
letter “p” is used for the population proportion.

We use the sample proportion p̂ (pronounced p-hat) to estimate p (the population
proportion):

p̂ = x/n, where x is the number in the sample with the characteristic and n is
the sample size.

In Sections 7.1 and 7.2, you learned how to perform a hypothesis test for a
population mean μ. In this section, you will learn how to test a population
proportion p.

We use the z-test for hypothesis testing for a population proportion.

Check these conditions: n·p0 ≥ 5 and n(1 - p0) ≥ 5

Test statistic value: z = (p̂ - p0)/√(p0(1 - p0)/n)
Q1 Write the appropriate null and alternative hypotheses.

A healthcare researcher claims that 30% of fasting adults lose weight by the
end of Ramadan.

H0: p = 0.30 (claim)
Ha: p ≠ 0.30

A researcher claims that less than 40% of U.S. cell phone owners use their
phone for most of their online browsing.

H0: p ≥ 0.40
Ha: p < 0.40 (claim)

A researcher claims that more than 40% of U.S. cell phone owners use their
phone for most of their online browsing.

H0: p ≤ 0.40
Ha: p > 0.40 (claim)

The quality control manager at a manufacturing company claims that the
proportion of produced carbon monoxide detectors that are defective
exceeds 3%.

H0: p ≤ 0.03
Ha: p > 0.03 (claim)

A researcher claims that fewer than 40% of U.S. cell phone owners use their
phone for most of their online browsing.

H0: p ≥ 0.40
Ha: p < 0.40 (claim)
Q2 Find the test statistic value and p-value for proportion.

A research center claims that 30% of U.S. adults have not purchased a certain
brand because they found the advertisements distasteful. You decide to test
this claim and ask a random sample of 250 U.S. adults whether they have not purchased a
certain brand because they found the advertisements distasteful. Of those surveyed, 90
reply yes.

a) Find the test statistic value.
b) Find the p-value.

n = 250, x = 90, p̂ = 90/250 = 0.36

(1) H0: p = 0.30 (claim)
Ha: p ≠ 0.30
(2) Conditions: n·p0 = 250 × 0.30 = 75 ≥ 5 and n(1 - p0) = 250 × (1 - 0.30) = 175 ≥ 5
Both conditions are met.

a) z = (p̂ - p0)/√(p0(1 - p0)/n) = (0.36 - 0.30)/√(0.30 × 0.70/250) = 2.070
b) Two tail: p-value = 0.038

Q3 A researcher is testing whether more than 60% of college students use passwords that
are less secure because complicated ones are too hard to remember. In a random sample
of 400 college students, 260 say they use passwords that are less secure
because complicated ones are too hard to remember.

a) Find the test statistic value.
b) Find the p-value.

(1) H0: p ≤ 0.60
Ha: p > 0.60 (claim)
(2) Check conditions: n·p0 = 400 × 0.60 = 240 ≥ 5 and n(1 - p0) = 400 × 0.40 = 160 ≥ 5

p̂ = x/n = 260/400 = 0.65

a) z = (p̂ - p0)/√(p0(1 - p0)/n) = (0.65 - 0.60)/√(0.60 × (1 - 0.60)/400) = 2.041
b) Use the z-calculator, right tail: p-value ≈ 0.021
► Hypothesis testing for proportions (Full steps) (7-8 marks)

Steps:
1. Hypotheses
2. Check the conditions
3. Calculate the test statistic
4. Find the p-value based on the test: right tailed, left tailed or two tailed
5. Make a decision
6. Write the conclusion

Q1 A researcher claims that 86% of college graduates say their college degree
has been a good investment. In a random sample of 1000 graduates, 845 say
their college degree has been a good investment. At ∝ = 0.10, is there enough
evidence to reject the researcher’s claim?

n = 1000, x = 845, p̂ = x/n = 845/1000 = 0.845

(1) H0: p = 0.86 (claim)
Ha: p ≠ 0.86

(2) Check conditions: n·p0 = 1000 × 0.86 = 860 ≥ 5 and
n(1 - p0) = 1000 × (1 - 0.86) = 140 ≥ 5
Both conditions are met.

(3) Test statistic:
z = (p̂ - p0)/√(p0(1 - p0)/n) = (0.845 - 0.86)/√(0.86 × 0.14/1000) = -1.367

(4) Two-tail test, z = -1.367: p-value ≈ 0.172
p-value > ∝ (0.172 > 0.10), so DON'T REJECT H0.

(5) Conclusion: We do not have enough evidence to reject the claim that 86% of
graduates feel that their degree is a good investment.
Q2 A researcher claims that less than 40% of U.S. cell phone owners use their
phone for most of their online browsing. In a random sample of 100 adults,
31% say they use their phone for most of their online browsing. At ∝ = 0.01,
is there enough evidence to support the researcher’s claim?

n = 100, p̂ = 0.31

(1) H0: p ≥ 0.40
Ha: p < 0.40 (claim)

(2) Check the conditions: n·p0 = 100 × 0.40 = 40 ≥ 5 and
n(1 - p0) = 100 × 0.60 = 60 ≥ 5

(3) Test statistic:
z = (p̂ - p0)/√(p0(1 - p0)/n) = (0.31 - 0.40)/√(0.40 × 0.60/100) = -1.837

(4) z = -1.837, left-tail test: p-value = 0.033
p-value > ∝ (0.033 > 0.01), so DON'T REJECT H0.

(5) Conclusion: We do not have enough evidence to support the claim that less
than 40% of U.S. cell phone owners use their phone for most of their online
browsing.
Q3 In a sample of 150 households in a certain city, 99 had high-speed
internet access. Can you conclude that at least 70% of the households in this
city have high-speed internet access? Use ∝ = 0.1.

n = 150, x = 99, p̂ = 99/150 = 0.66

(1) H0: p ≥ 0.70 (claim)
Ha: p < 0.70

(2) Check conditions: n·p0 = 150 × 0.70 = 105 ≥ 5 and n(1 - p0) = 150 × 0.30 = 45 ≥ 5

(3) Test statistic value:
z = (p̂ - p0)/√(p0(1 - p0)/n) = (0.66 - 0.70)/√(0.70 × 0.30/150) = -1.069

(4) z = -1.069, left-tail test: p-value ≈ 0.143
p-value > ∝ (0.143 > 0.1), so DON'T REJECT H0.

(5) Conclusion: We do not have enough evidence to reject the claim that at
least 70% of the households have high-speed internet.
Q4 In which of the following situations will the one-proportion z-test NOT be
applicable?

Conditions: n·p0 ≥ 5 and n(1 - p0) ≥ 5

n=200, x=160, p0=0.5: 200 × 0.5 = 100 and 200 × (1 - 0.5) = 100, conditions met

n=200, x=10, p0=0.97

n=200, x=4, p0=0.96

n=400, x=15, p0=0.01: n·p0 = 400 × 0.01 = 4 < 5, condition not met

Answer: n=400, x=15, p0=0.01
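The condition check and the test statistic for a proportion can be sketched in Python. The function names are made up for these notes; the formulas are the ones used in this section.

```python
from math import sqrt

def conditions_met(n, p0):
    """The one-proportion z-test needs n*p0 >= 5 and n*(1 - p0) >= 5."""
    return n * p0 >= 5 and n * (1 - p0) >= 5

def one_proportion_z(x, n, p0):
    """z = (p-hat - p0) / sqrt(p0 * (1 - p0) / n), with p-hat = x / n."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Q4: only the last option fails the conditions.
for n, p0 in [(200, 0.5), (200, 0.97), (200, 0.96), (400, 0.01)]:
    print(n, p0, conditions_met(n, p0))

# Full-steps Q1 (graduates) and Q3 (households).
print(round(one_proportion_z(845, 1000, 0.86), 3))
print(round(one_proportion_z(99, 150, 0.70), 3))
```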
TYPES OF ERROR

No matter which hypothesis represents the claim, you always begin a hypothesis test by
assuming that the equality condition in the null hypothesis is true. So, when you
perform a hypothesis test, you make one of two decisions:

Reject the null hypothesis H0
or
Fail to reject the null hypothesis H0

Because your decision is based on a sample rather than the entire population,
there is always the possibility you will make the wrong decision.

• Generally, convicting an innocent person (Type 1) is considered a more serious error
than setting a criminal free (Type 2).

• You can decrease the probability of a Type I error by lowering the level of
significance ∝.

The table shows the four possible outcomes of a hypothesis test. We use this table to
analyze the decisions in hypothesis testing.

Decision            | H0 is true       | H0 is false
Do not reject H0    | Correct decision | Type II error
Reject H0           | Type I error     | Correct decision

► PRACTICE: ANALYZE THE ERRORS

Q1 Shamsa claims that the mean weight of ZU students is more than 59 kg.

H0: μ ≤ 59
Ha: μ > 59 (claim)

Using a significance level of 0.01, the p-value of the test is 0.033.
If the true (actual) value of μ is found to be 57 kg, the conclusion results in a

I. Type I error
II. Type II error
III. Correct decision
IV. Both Type I and Type II errors

p-value > ∝ (0.033 > 0.01), so don't reject H0. Since 57 ≤ 59, H0 is true.
Not rejecting a true H0 is a correct decision. Answer: III

Q2 Shamsa claims that the average weight of ZU students is more than 59 kg.

H0: μ ≤ 59
Ha: μ > 59 (claim)

Using a significance level of 0.05, the p-value of the test is 0.033.
If the true (actual) value of μ is found to be 57 kg, the conclusion results in a

I. Type I error
II. Type II error
III. Correct decision
IV. Both Type I and Type II errors

p-value < ∝ (0.033 < 0.05), so reject H0. Since 57 ≤ 59, H0 is true.
Rejecting a true H0 is a Type I error. Answer: I

Q3 Shamsa claims that the average age of ZU students is 22 years.

H0: μ = 22
Ha: μ ≠ 22

Using a significance level of 0.05, the p-value of the test is 0.03.
If the true value of μ is found to be 25 years, the conclusion results in a

I. Type I error
II. Type II error
III. Correct decision
IV. Both Type I and Type II errors

p-value < ∝ (0.03 < 0.05), so reject H0. Since 25 ≠ 22, H0 is false.
Rejecting a false H0 is a correct decision. Answer: III

Q4 Shamsa claims that the average age of ZU students is less than 26 years.

H0: μ ≥ 26
Ha: μ < 26

Using a significance level of 0.05, the p-value of the test is 0.08.
If the true value of μ is found to be 24 years, the conclusion results in a

I. Type I error
II. Type II error
III. Correct decision
IV. Both Type I and Type II errors

p-value > ∝ (0.08 > 0.05), so don't reject H0. Since 24 < 26, H0 is false.
Not rejecting a false H0 is a Type II error. Answer: II
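The reasoning in these four questions (first make the decision from the p-value, then compare it with the truth about H0) can be sketched in Python. The function name is an assumption made for these notes, and H0 is rejected when p-value ≤ ∝.

```python
def classify_outcome(h0_is_true, p_value, alpha):
    """Classify the result of a test once the truth about H0 is known."""
    reject = p_value <= alpha
    if reject and h0_is_true:
        return "Type I error"        # rejected a true H0
    if not reject and not h0_is_true:
        return "Type II error"       # failed to reject a false H0
    return "Correct decision"

print(classify_outcome(True, 0.033, 0.01))   # Q1
print(classify_outcome(True, 0.033, 0.05))   # Q2
print(classify_outcome(False, 0.03, 0.05))   # Q3
print(classify_outcome(False, 0.08, 0.05))   # Q4
```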
ANSWERS PRACTICE - CHAPTER 6
• SECTION 6.1 Construct CI when we know population SD σ

CI: x̄ ± zα/2 × σ/√n

• SECTION 6.2 Construct CI when we DON’T know population SD σ

CI: x̄ ± tα/2 × s/√n with df = n - 1 (normal population)
CI: x̄ ± zα/2 × s/√n (population not normal, but large sample, n ≥ 30)

x̄ = (UCL + LCL)/2
E = (UCL - LCL)/2

• SECTION 6.3 Construct CI for Population Proportions

CI: p̂ ± E, where p̂ = x/n and E = zα/2 × √(p̂(1 - p̂)/n)

CONDITIONS to write CI for population proportions:

We can construct the CI if and only if n·p̂ ≥ 5 and n(1 - p̂) ≥ 5

p̂ = (UCL + LCL)/2
E = (UCL - LCL)/2

Q1 Meera wants to estimate the mean waiting time at a bank in Abu Dhabi. A random sample of 35
customers yielded a mean of 5.5 minutes and a standard deviation of 2.5 minutes. Which distribution
should be used to construct the confidence interval?

Population is not known to be normal, but the sample is large (n ≥ 30).
Use the z-distribution (standard normal distribution).

Q2 Meera wants to estimate the mean waiting time at a bank in Abu Dhabi. A random sample of 15
customers yielded a mean of 5.5 minutes and a standard deviation of 2.5 minutes. Which distribution
should be used to construct the confidence interval?

Population is not known to be normal and the sample size is small (n < 30).
The sample does not qualify for the z-distribution, and since normality is not given
the t-distribution cannot be used either.

Q3 Jalal is interested in estimating the mean waiting time at a fast food outlet. He selected a random
sample of 40 customers which yielded a mean of 3.5 minutes and a standard deviation of 0.9 minutes.
Given that the waiting time is normally distributed, which distribution should be used to construct the
confidence interval?

Normal population, so use the t-distribution.
df = n - 1 = 40 - 1 = 39
Q4 A computer supply store sells a wide variety of generic replacement ink cartridges for printers. A
quality inspector is checking the cartridges that should contain 30 ml of ink. A random sample of 16
black replacement cartridges revealed a mean amount of 29.53 ml with a standard deviation of 0.52 ml.
Assume that the underlying distribution of the amount of ink in these cartridges is normal.
Find the 90% confidence interval for the mean amount of ink in such cartridges.

a) (29.30, 29.76)
b) (29.25, 29.81)
c) (29.15, 29.91)
d) (29.32, 29.74)

Normal distribution, so use the t-distribution.
c = 0.90, n = 16, df = 15, x̄ = 29.53, s = 0.52, tα/2 = 1.753
CI: x̄ ± tα/2 × s/√n = 29.53 ± 1.753 × 0.52/√16 = 29.53 ± 0.23
90% CI: (29.30, 29.76)
Answer: a)

Q5 A research study of 30 households in Baniyas showed that the mean pH level in drinking water
was 6.92 with a standard deviation of 0.36. Construct a 99% confidence interval for the mean pH level
in household drinking water in Baniyas.

a) (6.751, 7.089)
b) (6.812, 7.028)
c) (6.791, 7.049)
d) (6.739, 7.101)

Population is not known to be normal, but the sample is large (n ≥ 30), so use the z-distribution.
x: pH level of drinking water in Baniyas houses
n = 30, x̄ = 6.92, s = 0.36, c = 0.99, zα/2 = 2.576
99% CI: x̄ ± zα/2 × s/√n = 6.92 ± 2.576 × 0.36/√30 = 6.92 ± 0.169
99% CI: (6.751, 7.089)
Answer: a)

We are 99% confident that the average pH level in drinking water in Baniyas
houses is between 6.751 and 7.089.
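The z-based interval in Q5 can be reproduced with a short Python sketch. The function names are made up for these notes, and the critical value comes from the inverse of the standard normal distribution in Python's standard library instead of a table. The t-based questions need a t critical value from a table or calculator and are not covered by this sketch.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(level):
    """Critical value z(alpha/2) for a given confidence level c."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def z_confidence_interval(sample_mean, std_dev, n, level):
    """Return (LCL, UCL, E) for x-bar ± z(alpha/2) * std_dev / sqrt(n)."""
    margin = z_critical(level) * std_dev / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin

lcl, ucl, margin = z_confidence_interval(6.92, 0.36, 30, 0.99)
print(round(lcl, 3), round(ucl, 3))   # Q5 interval
print(round(z_critical(0.87), 3))     # critical value for an 87% level
```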
Q6 Fatema wants to determine the mean play time of media clips on her smartphone. A random sample
of 25 media clips revealed a mean of 86 minutes and a standard deviation of 27 minutes. Assume that
the underlying distribution of the play time is normal. Find the lower confidence limit (LCL) of the 90%
confidence interval estimate for the mean play time of media clips on Fatema's smartphone. Round
your answer to four decimal places.

Normal population, so use the t-distribution.
c = 0.90, x̄ = 86 min, s = 27 min, n = 25, df = 25 - 1 = 24, tα/2 = 1.711
LCL = x̄ - tα/2 × s/√n = 86 - 1.711 × 27/√25 = 86 - 9.2394 = 76.7606

Q7 For certain types of computers, the time, in hours, between charges of the battery is normally
distributed. We found that a sample of 16 randomly selected computers has a mean of 3.4 and a
standard deviation of 0.5. Find the Upper Confidence limit (UCL) of the 90% confidence interval
estimate for the average time period of all those computers. Round your answer to four decimal
places.

Normal population, so use the t-distribution.
c = 0.90, n = 16, x̄ = 3.4, s = 0.5, df = 16 - 1 = 15, tα/2 = 1.753
UCL = x̄ + tα/2 × s/√n = 3.4 + 1.753 × 0.5/√16 = 3.4 + 0.2191 = 3.6191
Q8 A healthcare analyst wants to determine the mean daily number of new COVID-19 cases in UAE in
2021. A random sample of 33 days revealed a mean of 2674 cases with a standard deviation of 664 cases.
Find the margin of error (E) of the 99% confidence interval estimate for the mean daily number of new
COVID-19 cases in UAE in 2021. Round your answer to the nearest integer (whole number).

Population is not known to be normal, but the sample is large (n ≥ 30), so use the z-distribution.
n = 33, x̄ = 2674, s = 664, c = 0.99, zα/2 = 2.576
E = zα/2 × s/√n = 2.576 × 664/√33 = 297.75 ≈ 298 cases

Q9 Suppose that the average number of grams of fat consumed in a day for the sample of 25 adults is
77 grams with a standard deviation of 30.3 grams. If the fat consumption by adults is normally
distributed, find the margin of error (E) of the 90% confidence interval for the mean number of grams of
fat consumed daily by adults. Round your answer to one decimal place.

Normal population, so use the t-distribution.
n = 25, x̄ = 77 g, s = 30.3 g, c = 0.90, df = 25 - 1 = 24, tα/2 = 1.711
E = tα/2 × s/√n = 1.711 × 30.3/√25 = 10.37 ≈ 10.4 grams

Q10 Find the confidence level related to the critical value zα/2=1.476

B
Q11 Find the confidence level related to the critical value tα/2= 1.92 where the sample size n =20.

D
Q12 Rana is interested in estimating the average salary of new business graduate students based on a
random sample of 60 students. Find the corresponding critical value for calculating an 87% confidence
interval. Round your answer to three decimal places.

n = 60: population not normal, but large size (n > 30). Use the z-distribution.
c = 0.87, zα/2 = 1.514
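Q10 and Q12 go in opposite directions between a confidence level and a z critical value. A minimal sketch of both conversions, assuming the standard normal distribution from the Python standard library; the function names are illustrative only.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Critical value z_(alpha/2) for a given confidence level c."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def confidence_from_z(z):
    """Confidence level c whose critical value is z: c = 2 * Phi(z) - 1."""
    return 2 * NormalDist().cdf(z) - 1

print(round(z_critical(0.87), 3))         # Q12: critical value for c = 0.87
print(round(confidence_from_z(1.476), 2)) # Q10: confidence level for z = 1.476
```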

Q13 Ibrahim is interested in estimating the average salary of new business graduate students based on
a random sample of 22 students. It is known that the salaries of new business graduate students are
normally distributed. Find the corresponding critical value for constructing a 96% confidence
interval. Round your answer to two decimal places.

n = 22, population is normal: use the t-distribution.
df = 22 - 1 = 21, c = 0.96, tα/2 = 2.19

Q14 A journal article reports that a random sample of 36 antivirus software packages was used as a
basis for calculating a 95% confidence interval for the true mean annual cost of an antivirus software in
UAE. The resulting interval was (220 Dhs, 261 Dhs). What is the sample mean used when computing
the interval? Round your answer to the nearest dirham (Whole number).

(LCL, UCL) = (220, 261), c = 0.95
The sample mean is the midpoint of the 95% CI for μ:
x̄ = (UCL + LCL)/2 = (261 + 220)/2 = 240.5
Q15 A journal article reports that a random sample of 16 antivirus software packages was used as a
basis for calculating a 99% t-based confidence interval for the true mean annual cost of an antivirus
software in UAE. The resulting interval was (246 Dhs, 290 Dhs). What is the sample standard
deviation used when computing the interval? Round your answer to two decimal places.

n = 16, df = 15, c = 0.99, tα/2 = 2.947, CI: (246, 290), s = ?
E = (UCL - LCL)/2 = (290 - 246)/2 = 22
t-based interval: E = tα/2 × s/√n, so s = E × √n / tα/2 = 22 × √16 / 2.947 = 29.86

Q16 Past experience indicates that the standard deviation of the time it takes for a dentist to perform
dental scaling for patients is 4 minutes. A researcher wishes to estimate the mean time to perform
dental scaling with 99 percent confidence and a margin of error of ± 0.5 minutes. Given this, what must
the sample size be? Round up your answer to the next whole number.

In this question, the population SD is given: σ = 4 minutes.
c = 0.99, zα/2 = 2.576, E = 0.5, n = ?
E = zα/2 × σ/√n, so n = (zα/2 × σ / E)² = (2.576 × 4 / 0.5)² = 424.69, rounded up to 425 patients
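The sample-size formula from Q16 can be checked with a few lines of Python. The sketch below assumes σ is known and uses the exact normal quantile rather than the rounded table value 2.576, so the intermediate value differs slightly (about 424.6 instead of 424.69) while the rounded-up answer is the same; `sample_size_for_mean` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with z_(alpha/2) * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Q16 values: sigma = 4 minutes, E = 0.5 minutes, c = 0.99.
n_required = sample_size_for_mean(sigma=4, margin=0.5, confidence=0.99)
print(n_required)
```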
Q17 Ibrahim kept a careful record of the fuel efficiency of his car. After the first 25 times he filled up the
tank, he found the mean was 23.3 miles per gallon (mpg) with a sample standard deviation of 1.1 mpg.
Assuming the mpg is approximately normally distributed, Ibrahim estimated the true mean mpg of his car
and found that it lies between the bounds 22.85 and 23.75. Find the confidence level he used to
estimate the true mean mpg. Indicate all steps.

n = 25, x̄ = 23.3 miles/gallon, s = 1.1 miles/gallon, CI: (22.85, 23.75), c = ?
Normal population: use the t-distribution, df = 25 - 1 = 24.
E = (UCL - LCL)/2 = (23.75 - 22.85)/2 = 0.45
E = tα/2 × s/√n, so tα/2 = E × √n / s = 0.45 × √25 / 1.1 = 2.045
Looking up tα/2 = 2.045 with df = 24 gives a confidence level of approximately 95%.

Q18 The following data give the amount of time (in minutes) in one day spent on Facebook by each of
10 college students.
78 87 83 100 84 47 79 80 91 71
Assume that the underlying distribution of the times is normal.
The 95% confidence interval for the mean amount of time (in minutes) in one day spent on Facebook by
college students is: (_____________,______________) Round answer to nearest minute (Whole number).

n = 10, normal population: use the t-distribution, df = 9, c = 0.95, tα/2 = 2.262
x̄ = Σx/n = 800/10 = 80, s = 14.02
CI: x̄ ± tα/2 × s/√n = 80 ± 2.262 × 14.02/√10 = (69.97, 90.03), so LCL = 70 and UCL = 90
For this, you can also solve on a calculator. I used an online website.
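For Q18 the mean and standard deviation come from raw data, which is easy to reproduce in Python. This is a sketch under one assumption stated in the code: the critical value 2.262 (df = 9, c = 0.95) is copied from the t-table, since the standard library does not provide t quantiles.

```python
import math
from statistics import mean, stdev

times = [78, 87, 83, 100, 84, 47, 79, 80, 91, 71]  # Q18 data, in minutes
t_crit = 2.262  # table value for df = 9, c = 0.95 (entered by hand)

n = len(times)
x_bar = mean(times)
s = stdev(times)                      # sample standard deviation (divides by n - 1)
margin = t_crit * s / math.sqrt(n)
lcl, ucl = x_bar - margin, x_bar + margin
print(round(lcl), round(ucl))
```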
Q19 A researcher computed 90%, 95%, 97%, and 99% confidence intervals for the mean length of
hospital stay at NMC. Which of the following intervals is the 97% confidence interval?

(10.37, 13.63) -> width = 13.63 - 10.37 = 3.26 -> 97%
(10.53, 13.47) -> width = 13.47 - 10.53 = 2.94 -> 95%
(10.77, 13.23) -> width = 13.23 - 10.77 = 2.46 -> 90%
(10.07, 13.93) -> width = 13.93 - 10.07 = 3.86 -> 99%

As c increases, the width increases, so we can find out by comparing the widths.
The 97% interval is the second widest: (10.37, 13.63)
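The width comparison in Q19 amounts to sorting the intervals by width and pairing them with the sorted confidence levels. A small sketch of that idea follows; the four intervals are read off the width calculations in the notes, and the lower limit 10.77 of the narrowest interval is an assumption recovered from its stated width of 2.46.

```python
# Intervals as read from the notes (10.77 is inferred from the width 2.46).
intervals = [(10.37, 13.63), (10.53, 13.47), (10.77, 13.23), (10.07, 13.93)]
levels = [90, 95, 97, 99]

# A higher confidence level gives a wider interval, so sort both and pair them up.
by_width = sorted(intervals, key=lambda iv: iv[1] - iv[0])
level_to_interval = dict(zip(sorted(levels), by_width))
print(level_to_interval[97])
```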

Q20 A random sample of 40 observations was used to estimate the mean SAT scores of applicants at a
business college. The 99% confidence interval was found to be (1690 , 1810). If we are to increase the
sample size to 45 (other factors remain unchanged), the 99% confidence interval for the mean SAT
scores would become: narrower

Three factors that affect the width of CI:
* c -> As c increases, width increases
* n -> As n increases, width decreases
* s -> As s increases, width increases
Q21 If the underlying population is normally distributed, which of the following would produce the widest
confidence interval for μ?
I. 90% confidence interval with n = 180
II. 90% confidence interval with n = 18
III. 95% confidence interval with n = 180
IV. 95% confidence interval with n = 18

For the widest interval, c should be highest (95%) and n should be smallest (18): option IV.

Three factors that affect the width of CI:
* c -> As c increases, width increases
* n -> As n increases, width decreases
* s -> As s increases, width increases

Q22 Ahmed constructed a 95% confidence interval for the mean amount spent per day in a shopping
mall based on 40 customers. The confidence interval was found to be (1500 AED, 2000 AED). Which of
the following statement(s) is (are) always correct? This question has multiple correct options.

True
False

1
TrUe
also
F

mean amount of 40 customer


F is

(
->
1500
don
is
hstomer
*

- Yes, this same mean amount set


to
between 1800

- We are 95% confident that the mean amount of all customers is between 1500 and 2000 AED.
Q23 State true or false about t-distribution.
I. Student t distribution is less spread out than z distribution. False
II. Student t distribution is more spread out (fatter) than z distribution. True
III. The degree of freedom does not depend on sample size. False
IV. t-distribution is bell shaped. True
V. As df increases, t-distribution looks like z-distribution (standard normal distribution). True
VI. Population for t-dist can be non normal. False

II) t-dist is more spread out than z-dist because t-dist applies to small samples.
III) df depends on sample size: df = n - 1.
V) As n increases, t-dist becomes narrower; for a sample with size > 30 it starts to look like z-dist.
VI) Population should be normal for t-dist.

Questions on CI for Population Proportions( Section 6.3)

Q24 In which of the following situations can we NOT construct a confidence interval for the population
proportion?
Check the conditions np̂ ≥ 5 and n(1 - p̂) ≥ 5 for each option.

n=400, x=396: p̂ = 396/400 = 0.99, np̂ = 400 × 0.99 = 396, n(1 - p̂) = 400 × (1 - 0.99) = 4 (condition not satisfied)
n=400, x=200: p̂ = 200/400 = 0.5
n=20, x=6
n=550, x=70

In the first option (n=400, x=396) we cannot construct the interval, as the condition n(1 - p̂) ≥ 5
is not satisfied.
Q25 A random sample of 400 adults in UAE revealed that 265 of them took the COVID-19 vaccination.
The 99% confidence interval for the proportion of all adults in UAE who took the COVID-19 vaccination
is
a) (0.6236, 0.7014)
b) (0.6162, 0.7088)
c) (0.6075, 0.7175)
d) (0.6016, 0.7234)

1) n = 400, x = 265, p̂ = x/n = 265/400 = 0.6625, c = 0.99
2) Check conditions np̂ ≥ 5 and n(1 - p̂) ≥ 5: 400 × 0.6625 = 265 ≥ 5 and 400 × (1 - 0.6625) = 135 ≥ 5,
conditions met.
3) 99% CI: p̂ ± zα/2 × √(p̂(1 - p̂)/n) = 0.6625 ± 2.576 × √(0.6625 × (1 - 0.6625)/400) = (0.6016, 0.7234), option d)
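The condition check from Q24 and the interval from Q25 fit naturally into one small function. The sketch below is an illustration only: `proportion_ci` is a made-up name, and the check uses the same rule as the notes (both np̂ and n(1 - p̂) must be at least 5).

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, confidence):
    """z-based confidence interval for a population proportion."""
    p_hat = x / n
    # Conditions from the notes: n*p_hat >= 5 and n*(1 - p_hat) >= 5.
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("normal approximation conditions are not met")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Q25 values: 265 of 400 adults, 99% confidence.
lcl, ucl = proportion_ci(x=265, n=400, confidence=0.99)
print(round(lcl, 4), round(ucl, 4))
```

Calling `proportion_ci(396, 400, 0.99)` raises the error, which mirrors the first option of Q24.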

Q26 A research study of 388 patients with diabetes who chose to fast during Ramadan showed that
23.71% of them reported Hypoglycaemia (Low Blood Sugar) during fasting. Find a 95% confidence
interval for the proportion of fasting diabetic patients who report Hypoglycaemia.

This question is solved in Lecture 3. Refer to Lecture 3 for answers.
Q27 It is highly recommended that adults should monitor a specific health indicator, for example weight,
diet, or exercise. Suppose a random sample of 250 ZU students is obtained, and all were asked if they
monitor their health in some way. If 139 of them indicated that they do, find the upper confidence limit
(UCL) of the 99% confidence interval for the proportion of ZU students who monitor their health in some
way. Round your answer to four decimal places.

p̂ = 139/250 = 0.556
UCL = p̂ + zα/2 × √(p̂(1 - p̂)/n) = 0.556 + 2.576 × √(0.556 × (1 - 0.556)/250) = 0.6369

Q28 A random sample of 280 household electricity bills in Abu Dhabi revealed that 19% of these bills
exceeded the ideal average consumption recommended by ADDC. Find the margin of error (E) of a
90% confidence interval for the proportion of household electricity bills in Abu Dhabi which exceed the
ideal average consumption. Round your answer to four decimal places.

This question is solved in Lecture 3. Refer to Lecture 3 for answers.
Q29 A recent study of over 1000 parents on cyberbullying showed that a 95% confidence interval of
proportion of kids who faced bullying in social media sites and apps is (0.1470, 0.1989).
a) Find the proportion of kids who faced bullying in social media sites and apps.
b) Find the margin of error. Round your answer to four decimal places.

This question is solved in Lecture 3. Refer to Lecture 3 for answers.

Q30 A researcher reported that a confidence interval for the proportion of adults whose weight
decreased by the end of Ramadan is (12.16%, 27.84%) based on a random sample of 100 adults.
However, the researcher neglected to report what confidence level has been used for constructing this
interval. Based on the above information, what was the confidence level? Show ALL your work.

This question is solved in Lecture 3. Refer to Lecture 3 for answers.
ANSWERS - PRACTICE - CHAPTER 7

• SECTION 7.1 and 7.2 Hypothesis Testing for Population Mean μ

Test statistic value is t or z = (x̄ - μ0)/(s/√n)
If σ is given, then the test statistic value is z = (x̄ - μ0)/(σ/√n)

• STEPS FOR HYPOTHESIS TESTING
1. Hypotheses (Null and Alternative)
2. Test statistic value: t or z = (x̄ - μ0)/(s/√n)
3. Right tail, left tail or two tail test: Ha has > , < , ≠
4. p-value
5. Make a decision based on p-value and α, and write the conclusion.
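The five steps above can be sketched for the large-sample (z) case as follows. This is a minimal illustration, not the course's official procedure: the function names and the `tail` argument are invented here, the decision rule "reject H0 when p ≤ α" is the usual textbook convention, and the example numbers are those of the power-supply question (n = 64, x̄ = 5.06, s = 0.25, H0: μ ≤ 5).

```python
import math
from statistics import NormalDist

def z_test_mean(x_bar, mu0, s, n, tail):
    """Return (z, p-value) for a one-sample z-test; tail is 'left', 'right' or 'two'."""
    z = (x_bar - mu0) / (s / math.sqrt(n))
    cdf = NormalDist().cdf(z)
    if tail == "left":
        p = cdf
    elif tail == "right":
        p = 1 - cdf
    elif tail == "two":
        p = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, p

def decide(p_value, alpha):
    """Step 5: compare the p-value with the significance level."""
    return "reject H0" if p_value <= alpha else "do not reject H0"

z, p = z_test_mean(x_bar=5.06, mu0=5, s=0.25, n=64, tail="right")
print(round(z, 2), round(p, 3), decide(p, alpha=0.05))
```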

SECTION 7.3 Hypothesis Testing for Population Proportions p


Q1 Al Marai Orange Juice labeling indicates that the total Carbohydrates is no more than 11 grams per
serving. A nutritionist is interested in testing whether the actual mean amount of Carbohydrates exceeds
the announced amount. The appropriate null and alternative hypotheses are

H0: μ ≤ 11 versus Ha: μ > 11

Q2 Salem is interested in testing whether ZU male students spend on average no more than 7 hours
per week watching sports on Internet. The appropriate null and alternative hypotheses are

H0: μ ≤ 7 (claim) versus Ha: μ > 7
Q3 If you rejected H0 at the 0.05 level of significance, then

· If H0 is rejected at a lower α, it must be rejected at a higher α. This means, if H0 is rejected
at α = 0.01, we must reject it at α = 0.05, because if the p-value is less than α = 0.01, it will
obviously be less than α = 0.05.
· But if H0 is rejected at a higher α, it may or may not get rejected at a lower α. This means, if H0
is rejected at α = 0.05, it may or may not get rejected at α = 0.01.
Example: p-value = 0.03. H0 is rejected for α = 0.05 but not for α = 0.01.

Q4 A health researcher wants to know whether the mean weight of newborn girls in UAE is less than 3
kg (H0: μ ≥ 3 versus Ha: μ < 3) using a significance level of 0.01. The p-value of the test is 0.003. If the
true value of μ is 3.1 kg, the conclusion results in a

H0: μ ≥ 3, Ha: μ < 3
p-value = 0.003 < α = 0.01. Decision: Reject H0.
True value of μ = 3.1 kg, which means H0: μ ≥ 3 is true and you rejected H0. That means
you rejected a true null H0: Type I error.
Q5 An environmental researcher claims that the mean amount of sulfur dioxide in the air in UAE cities is
less than 1.15 parts per billion (ppb). A random sample of 15 cities is randomly selected to test the
researcher's claim. Assume that the amount of sulfur dioxide is normally distributed. Which test would
be appropriate for this problem?

Normally distributed: we will use a one-sample t-test.
Q6 A company claimed to have developed a new AAA battery that lasts longer than its regular AAA
batteries. Based on years of experience, the company knows that its regular AAA batteries last for 30
hours of continuous use, on average. The production manager collected a simple random sample of 15
new batteries and computed the test statistic as 1.65. He believes that the distribution of the lifetime of
the new batteries is normal. What is the corresponding p-value of the test? Round your answer to
three decimal places.

H0: μ ≤ 30 hours, Ha: μ > 30 hours (claim): right tail test
Distribution is normal, so we will use the t-test.
n = 15, df = 15 - 1 = 14
t = 1.65, right tail test (use online calculator): p-value = 0.061
Q7 A company claimed to have developed a new AAA battery that lasts longer than its regular AAA
batteries. Based on years of experience, the company knows that its regular AAA batteries last for 30
hours of continuous use, on average. The production manager collected a simple random sample of 60
new batteries and computed the test statistic as 1.43. He was not sure about the distribution of the
lifetime of the new batteries. What is the corresponding p-value of the test? Round your answer to
three decimal places.

H0: μ ≤ 30 hours, Ha: μ > 30 hours (claim): right tail test
n = 60: distribution is not normal but size is large (n = 60), so we will use the z-distribution.
z = 1.43, right tail test (use online calculator): p-value = 0.076
Q8 A medical research team claims that the mean recovery time for patients after the new surgical
treatment is less than 96 hours. The team tested the hypothesis H0: μ ≥ 96 versus Ha: μ < 96. The null
hypothesis is not rejected. The appropriate conclusion would be

H0: μ ≥ 96, Ha: μ < 96 (claim)
Decision: DON'T REJECT H0.
This means there is not enough evidence to support the claim (Ha) that recovery time is less than 96 hrs.
Q9 A manufacturer is interested in the output voltage of a power supply used in a PC. The manufacturer
wishes to test H0: μ ≤ 5 Volts versus Ha: μ > 5 Volts, using n = 64 units. If the sample yielded a mean of
5.06 Volts and a standard deviation 0.25 Volts, then the corresponding p-value is

H0: μ ≤ 5, Ha: μ > 5: right tail test
Large size (n = 64): we will use the z-distribution.
n = 64, x̄ = 5.06, s = 0.25
z = (x̄ - μ0)/(s/√n) = (5.06 - 5)/(0.25/√64) = 1.92
Right tail test (use the z-link): p-value = 0.027
Q10 A manufacturer is interested in the output voltage of a power supply used in a PC. The
manufacturer wishes to test H0: μ = 5 Volts versus Ha: μ ≠ 5 Volts, using n = 64 units. Assume that the
output voltage is normally distributed. If the sample yielded a mean of 5.06 Volts and a standard
deviation 0.25 Volts, then the corresponding p-value is

H0: μ = 5, Ha: μ ≠ 5: two tail test
Distribution is normal, so we will use the t-distribution, df = 64 - 1 = 63.
n = 64, x̄ = 5.06, s = 0.25
t = (x̄ - μ0)/(s/√n) = (5.06 - 5)/(0.25/√64) = 1.92
Two tail test (use the t-link): p-value = 0.059
Q11Classify each of the following statements about the p-value as either true or false:
A. True the p-value indicates strenght of the evidence against Null Hypothesis.
FALSE,

smaller p-values stronger


evidence against NUUUypoto.
True,
B.
need test statistic value (torz)
2. False. For p-value we

about the Hato decide


and the info
tail test can be
thought
P-value
is probability of chance,
that results of
D. False.
as the probability by random.
of by chance or
experiment are

the evidence
smaller p-value, stronger
is
E True No.
against
Q12 For a left-tailed test with the test statistic z = -2.35, what is the associated p-value?

p-value = 0.009
Q13 For a right-tailed test with a test statistic of t=2.35, what is the associated p-value?

For a t-test, we need df. For df, we need n, and it is not given in the question.

Q14 For a two-tailed test based on a sample of 16 observations with the test statistic t = 2.35, what is
the associated p-value?

n = 16, df = 15, t = 2.35, two tail test: p-value = 0.033


Q15 Ahmed is interested in testing whether the pH level of the drinking water is 8 using 64 water
samples. The corresponding hypotheses are H0: μ = 8 versus Ha: μ ≠ 8. Which of the following possible
sample results gives the strongest evidence to reject H0 in favor of Ha?

I have solved this question in the practice of Lecture 2. Refer to Lecture 2.

Q16 A social media analyst claims that the mean time spent on social media platforms by college
students exceeds 100 minutes per day. A random sample of 36 college students revealed a mean of
108 minutes and a standard deviation of 6 minutes. What is the test statistic? Round your answer
to two decimal places.

1) H0: μ ≤ 100 min, Ha: μ > 100 min (claim)
2) n = 36: distribution not normal but size is large (n = 36). We will use the z-test.
x̄ = 108 min, s = 6 min
z = (x̄ - μ0)/(s/√n) = (108 - 100)/(6/√36) = 8.00
Q17 Salma conducted a left-tail test to determine whether ZU students sleep, on average, less than 7
hours during Ramadan. The p-value is found to be 0.024. If the test had been two-tailed, the p-value
would have been:

Left tail test: H0: μ ≥ 7, Ha: μ < 7 (claim)
If it was a two tail test, then there would be 0.024 on both sides:
p = 0.024 + 0.024 = 0.048
Q18 International health organizations recommend that people should exercise more than 60 minutes
per week to lower various health risks such as high blood pressure, heart disease, and high cholesterol.
Forty randomly selected women have been selected and asked to report their own physical exercise per
week in minutes. The mean and the standard deviation of weekly exercise time were 63 and 12
minutes, respectively. Test to determine if women meet the recommended weekly exercise time. Use
10% significance level.
State the hypotheses, compute the test statistic (Round to two decimal places), find the p-value
(Round to three decimal places) and state the conclusion.

n = 40, x̄ = 63 min, s = 12 min, α = 0.10
1) H0: μ ≤ 60 min, Ha: μ > 60 min (claim): right tail test
2) Distribution not normal but large size (n = 40). We will use the z-test.
z = (x̄ - μ0)/(s/√n) = (63 - 60)/(12/√40) = 1.581
3) p-value = ? Use the z-link with z = 1.581. The area to the left of z is 0.943, so the right tail
p-value = 1 - 0.943 = 0.057
4) p = 0.057 < α = 0.10: Reject H0
5) We have enough evidence to conclude that women meet the recommended weekly exercise time.
Q19 Regulatory and Supervisory Bureau (RSB) standards require that the amount of lead in drinking
water be less than 10 μg/L. Twenty-two samples of water from a particular source yielded a mean lead
content of 9.85 μg/L and a standard deviation of 0.4 μg/L. Assume that the amount of lead in drinking
water is normally distributed. At 1% significance level, can you conclude that the water from this source
meets the RSB standard? State the hypotheses, compute the test statistic (Round to two decimal
places), find the p-value (Round to three decimal places) and state the conclusion.

n = 22, x̄ = 9.85, s = 0.4
1) Hypotheses: H0: μ ≥ 10, Ha: μ < 10 (claim)
2) Test statistic value: normal distribution, so use the t-test, df = 22 - 1 = 21
t = (x̄ - μ0)/(s/√n) = (9.85 - 10)/(0.4/√22) = -1.76
3) p-value: t = -1.76, left tail test, p-value = 0.046
4) p = 0.046 > α = 0.01: DON'T REJECT H0
5) Conclusion: We don't reject H0, so there is not enough evidence to conclude that water from this
source meets the RSB standard.
Q20 A) The error of rejecting null hypothesis when it is true is

B) The probability of rejecting null hypothesis when it is true is

C) The error of not rejecting a false null hypothesis is

D) The probability of not rejecting a false null hypothesis is

QUESTIONS ON HYPOTHESIS TESTING FOR PROPORTIONS (SECTION7.3)
Q21 A healthcare researcher claims that no more than 30% of fasting adults lose weight by the end of
Ramadan. The appropriate null and alternative hypotheses are

H0: p ≤ 0.30 (claim) versus Ha: p > 0.30

Q22 Aysha is interested in testing whether at least 80% of fasting adults change their lifestyle in
Ramadan. She tested the hypothesis H0: p ≥ 0.8 versus Ha: p < 0.8. If the null hypothesis is not
rejected, the appropriate conclusion would be

H0: p ≥ 0.80 (claim), Ha: p < 0.80
Decision: H0 is not rejected.
This means there is not enough evidence to reject the claim (H0).

Q23 A researcher is testing whether more than 60% of college students use passwords that are less
secure because complicated ones are too hard to remember. In a random sample of 400 college
students, 55% say they use passwords that are less secure because complicated ones are too hard to
remember. The p-value of the test is

1) H0: p ≤ 0.60, Ha: p > 0.60 (claim), n = 400, p̂ = 0.55
2) z = (p̂ - p0)/√(p0(1 - p0)/n) = (0.55 - 0.60)/√(0.60 × 0.40/400) = -2.041
3) p-value = ? Right tail test, use z = -2.041: p-value = 0.979
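The one-proportion z-test in Q23 can be reproduced with a short Python sketch. The helper name `z_test_proportion` and its `tail` argument are assumptions made for this illustration; the normal CDF comes from the standard library rather than the online z-link mentioned in the notes.

```python
import math
from statistics import NormalDist

def z_test_proportion(p_hat, p0, n, tail):
    """Return (z, p-value) for a one-proportion z-test; tail is 'left', 'right' or 'two'."""
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf(z)
    if tail == "left":
        p = cdf
    elif tail == "right":
        p = 1 - cdf
    elif tail == "two":
        p = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, p

# Q23 values: p_hat = 0.55, p0 = 0.60, n = 400, right tail (Ha: p > 0.60).
z, p = z_test_proportion(p_hat=0.55, p0=0.60, n=400, tail="right")
print(round(z, 3), round(p, 3))
```

A negative z in a right-tail test gives a p-value above 0.5, which is why the sample here provides no support for Ha.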
Q24 According to Security Magazine, 32% of ransomware victims paid to restore access to their data.
Assume that a random sample of 400 ransomware victims in UAE showed that 108 paid to restore their
data. We are interested in testing whether the percentage of ransomware victims who paid to restore
their data in UAE is lower than the global rate. What is the value of the test statistic? Round your
answer to two decimal places.

p̂ = 108/400 = 0.27
H0: p ≥ 0.32, Ha: p < 0.32 (claim)
z = (p̂ - p0)/√(p0(1 - p0)/n) = (0.27 - 0.32)/√(0.32 × 0.68/400) = -2.14

Q25 A public health researcher is interested in testing whether less than 75% of residents in UAE rely
on social media as a source of information for COVID-19 related health updates. If the resulting test
statistic was -2.27, what would be the corresponding p-value?

1) H0: p ≥ 0.75, Ha: p < 0.75 (claim)
2) z = -2.27
3) p-value = ? Left tail test, z = -2.27: p-value = 0.012

Q26 In which of the following situations will the one-proportion z-test NOT be applicable?

check conditionsfor option


>3
and n(1-Pr)
MPo IS *

(1-0.95) 2.45 x
and 120 x
=

120x0.95=117.675 (not met)


-> -
(met)
-
Q27 Alia is testing whether more than 20% of those who took Pfizer-BioNTech COVID-19 vaccine
experience headache after taking the second dose (H0: p ≤ 0.20 versus Ha: p > 0.20) using a
significance level of 0.01. The p-value of the test is 0.02. If the true value of p is 0.22, the conclusion
results in a

H0: p ≤ 0.20, Ha: p > 0.20
α = 0.01, p-value = 0.02. Since p-value > α, the decision is: do not reject H0.
If the true value of p = 0.22, then H0: p ≤ 0.20 is false.
This means you have not rejected a false H0, which is a Type II error.
Q28 The test statistic for testing whether 20% of teens in UAE are obese (H0: p = 0.2 versus Ha: p ≠ 0.2)
is z = 2.5. This test is

H0: p = 0.20 (claim), Ha: p ≠ 0.20
Two-tail test and z = 2.5: p-value = 0.012
-> If α = 0.01: p > α, don't reject H0
-> If α = 0.05: p < α, reject H0
Note: A hypothesis test is significant if our evidence (test statistic value) from one sample is
unusual enough to reject H0. Here we are rejecting H0 at α = 0.05, so this hypothesis test is
significant at α = 0.05.
Q29 Suppose that SSMC Pharmacy receives a large shipment of Bayer aspirin, in which each tablet
should contain 80 mg of the active ingredient in aspirin. A random sample of tablets will be obtained and
chemically analyzed to test whether the mean amount of active ingredient is 80 mg using the following
hypotheses

H0: μ = 80 versus Ha: μ ≠ 80

If H0 is rejected, the entire aspirin shipment will be returned to the Bayer corporation. What would be a
Type II Error in this case?

Type II error is when we don't reject H0 when it is false.
Here H0: μ = 80. If this is false, it means the aspirin doesn't have the specified amount of active
ingredient (80 mg). SSMC should have returned the shipment to Bayer, but they decided to take the
shipment, and this becomes a Type II error.

Q30 A global survey found that 30% of people have invested in cryptocurrencies. A recent survey of
500 UAE residents showed that 172 said they have invested in cryptocurrencies. At 1% significance
level, do these data provide sufficient evidence that the inclination to invest in cryptocurrencies in UAE
differs from the global level?
State the hypotheses, compute the test statistic (Round to two decimal places), find the p-value
(Round to three decimal places) and state the conclusion.

α = 0.01, p0 = 0.30, p̂ = 172/500 = 0.344
1) H0: p = 0.30, Ha: p ≠ 0.30 (claim)
2) Check conditions np0 ≥ 5 and n(1 - p0) ≥ 5: 500 × 0.3 = 150 and 500 × (1 - 0.3) = 350, conditions met.
3) Test statistic value: z = (p̂ - p0)/√(p0(1 - p0)/n) = (0.344 - 0.30)/√(0.30 × 0.70/500) = 2.147
4) p-value = ? Two tail test, z = 2.147: p-value = 0.032
p = 0.032 > α = 0.01: Don't Reject H0
5) The data do not provide sufficient evidence that the inclination to invest in crypto in UAE is
different from the global level.

Q31 A researcher claims that less than 40% of U.S. cell phone owners use their phone for most of their
online browsing. In a random sample of 100 adults, 31% say they use their phone for most of their online
browsing. At α = 0.01, is there enough evidence to support the researcher's claim?

1) H0: p ≥ 0.40, Ha: p < 0.40 (claim), p̂ = 0.31, n = 100
2) Check the conditions np0 ≥ 5 and n(1 - p0) ≥ 5: 100 × 0.40 = 40 and 100 × (1 - 0.40) = 60, conditions met.
3) Test statistic: z = (p̂ - p0)/√(p0(1 - p0)/n) = (0.31 - 0.40)/√(0.40 × 0.60/100) = -1.837
4) p-value: z = -1.837, left tail test, p-value = 0.033
p = 0.033 > α = 0.01: Don't Reject H0
5) Conclusion: We don't have enough evidence to reject H0 or to support the claim that less than 40%
use the phone for most of their online browsing.
MTH281- Probability and Statistics Fall 2022

ANSWERS - Practice Problems - Chapter 9

• Online Calculator: Linear Correlation and Regression

– http://vassarstats.net/corr_stats.html

• Hypothesis Testing for The Correlation Coefficient, ρ:

t = r √((n - 2)/(1 - r²))

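The test statistic for the correlation coefficient can be evaluated directly. The sketch below uses an illustrative function name, `correlation_t`, and plugs in the values from two of the multiple-choice questions that follow (r = -0.73 with n = 10, and r = 0.468 with n = 30); the p-value still has to come from a t-table or the online calculator with df = n - 2.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

print(round(correlation_t(-0.73, 10), 2))  # programming experience question
print(round(correlation_t(0.468, 30), 2))  # GPA and daily study question
```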
Section 1: Multiple-Choice: For each question, circle the correct answer.

1. For which of the following would the correlation coefficient not be applicable?

(a) Personal incomes and marital status.
(b) Height and weight of football players.
(c) IQ and height of elementary school children.
(d) Daily temperatures and daily consumptions of electricity.

Answer: (a). The correlation coefficient is between two quantitative variables; marital status is a
qualitative variable.

2. Which value of the following values of r indicates the strongest correlation?

(a) -0.40
(b) -0.60
(c) +0.40
(d) +0.53

Answer: (b), the highest absolute value.

3. The data below consist of the number of lifetime surgeries and age of a patient.
Number of Surgeries 9 2 3 4 2 5 9 10
Age (years) 85 52 55 68 67 86 83 73
The correlation coefficient is

(a) 0.708
(b) 0.246
(c) 0.235
(d) -0.071

Answer: (a), r = 0.708
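The correlation coefficient for the surgeries data, and the least-squares line asked for in the next question, can be verified by hand-coding the usual sums of squares. This is a sketch only; it takes the number of surgeries as x and age as y, which is an assumption about how the regression was set up.

```python
import math

surgeries = [9, 2, 3, 4, 2, 5, 9, 10]     # x
ages = [85, 52, 55, 68, 67, 86, 83, 73]   # y

n = len(surgeries)
mean_x = sum(surgeries) / n
mean_y = sum(ages) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in surgeries)
syy = sum((y - mean_y) ** 2 for y in ages)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(surgeries, ages))

r = sxy / math.sqrt(sxx * syy)            # Pearson correlation coefficient
slope = sxy / sxx                         # regression slope
intercept = mean_y - slope * mean_x       # regression intercept
print(round(r, 3), round(intercept, 3), round(slope, 3))
```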

4. The estimated regression equation in the previous question is

(a) ŷ = 0.180 - 7.276 x
(b) ŷ = 55.788 + 2.788 x
(c) ŷ = 7.276 + 0.180 x
(d) ŷ = 2.788 + 55.788 x

Answer: (b)

5. A researcher conducted a study to investigate the relationship between programming experience
and the completion time for a certain programming task for 10 programmers. The
correlation coefficient was -0.73. Which of the following best describes the conclusion about
the significance of the relationship at 5% level?

(a) p-value=0.017. We conclude that programming experience and the completion time are
correlated.
(b) p-value=0.017. We conclude that programming experience and the completion time are
not correlated.
(c) p-value=0.008. We conclude that programming experience and the completion time are
correlated.
(d) We can't judge based on the given information.

Answer: (a). α = 0.05
H0: ρ = 0 (no correlation), Ha: ρ ≠ 0 (significant correlation)
t = r √((n - 2)/(1 - r²)) = -0.73 × √((10 - 2)/(1 - (-0.73)²)) = -3.02
p-value = 0.017 < α: Reject H0

6. If a sample of 30 students is selected and the sample correlation between their GPA and the
daily study is r = 0.468, what is the test statistic value for testing whether the true population
correlation coefficient is equal to zero?

(a) t = 2.80
(b) t = 3.01
(c) t = 2.0484
(d) Can't be determined without knowing the level of significance for the test.

Answer: (a). We use the test statistic t = r √((n - 2)/(1 - r²)) = 0.468 × √(28/(1 - 0.468²)) = 2.80

7. A recent study of 15 shoppers showed that the correlation between the time spent in the store
and the dollars spent was 0.235. Using 5% level, which of the following is true?

Leave this question.

(a) The null hypothesis that the population mean is equal to zero should be rejected and we
should conclude that the true correlation is not equal to zero.
(b) Based on the sample data there is not enough evidence to conclude that the true
correlation is different from zero.
(c) The sample correlation coefficient could be zero since the test statistic does not fall in
the rejection region.
(d) The null hypothesis should be rejected because the test statistic exceeds the critical
value.

8. If the coefficient of correlation is -0.4, then the slope of the regression line

(a) must also be -0.4
(b) can be either negative or positive
(c) must be negative
(d) must be 0.16

Answer: (c). Negative correlation, negative slope.

9. If all the points in a scatterplot lie on the regression line, then the correlation coefficient

(a) must be 1.0
(b) must be -1.0
(c) must be either 1.0 or -1.0
(d) must be 0

Answer: (c)

10. A research study has reported that there is a correlation of r = 0.59 between the eye color
(brown, green, blue) of an experimental animal and the amount of nicotine that is fatal to
the animal when consumed. This indicates:

(a) Nicotine is less harmful to one eye color than the others.
(b) The lethal dose of nicotine goes down as the eye color of the animal changes.
(c) The researchers need to do further study to explain the causes of this negative correlation.
(d) The correlation is nonsensical.

Answer: (d). Eye color is a qualitative variable.

11. A medical research study found a correlation of -0.73 between consumption of vitamin A and
the cancer rate of a particular type of cancer. This could be interpreted to mean:

(a) the more vitamin A consumed, the lower chances are of getting this type of cancer.
(b) the less vitamin A consumed, the lower chances are of getting this type of cancer.
(c) the more vitamin A consumed, the higher chances are of getting this type of cancer.
(d) vitamin A causes this type of cancer.

Answer: (a). Negative correlation: as x increases, y decreases.

12. The regression equation of the amount of energy used (in kW) on the number of occupants
in the house (for houses with 1 to 8 occupants) is

predicted Energy = 460 + 64 occupants

The predicted amount of energy used by a house with 15 occupants is:

(a) 1420 kW
(b) 460 kW
(c) 960 kW
(d) This is extrapolation.

Answer: (d). The range of x is 1 to 8 occupants; x = 15 is out of range. This is extrapolation.
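The extrapolation point in question 12 can be made concrete with a guard around the prediction. The sketch below is hypothetical: the function name and the choice to return `None` outside the observed range of 1 to 8 occupants are illustrative design decisions, not part of the course material.

```python
def predict_energy(occupants, valid_range=(1, 8)):
    """Predicted energy (kW) from the fitted line; None when x is outside the data range."""
    low, high = valid_range
    if not low <= occupants <= high:
        return None  # extrapolation: the line was fitted only for 1 to 8 occupants
    return 460 + 64 * occupants

print(predict_energy(4))   # inside the range
print(predict_energy(15))  # outside the range
```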

13. The regression equation of the amount of energy used (in kW) on the number of occupants
in the house is

predicted Energy = 460 + 64 occupants

Based on the model, the interpretation of the y-intercept:

(a) All houses will be charged based on 460 kW.
(b) There is no practical interpretation since no occupants in the house is nonsensical value.
(c) For each additional 64 occupants the electrical use increases on average by 460 kW.
(d) For each additional occupant the electrical use increases on average by 64 kW.

Answer: (b). At x = 0 (occupants = 0), ŷ = 460, so the energy consumed is 460. This doesn't make sense.

14. There is an approximate linear relationship between the height of females and their age (from
5 to 18 years) described by:

predicted Height = 50.3 + 6.01 Age

where height is measured in cm and age in years. Which of the following is not correct?

(a) The estimated slope is 6.01 which implies that children increase by about 6 cm for each
year they grow older.
(b) The estimated height of a child who is 10 years old is about 110 cm.
(c) The estimated intercept is 50.3 cm which implies that children reach this height when
they are 50.3/6.01=8.4 years old.
(d) The average height of children when they are 5 years old is about 50% of the average
height when they are 18 years old.

15. The regression equation of monthly salary (1000s Dhs) on experience (years) for computer
programmers is

predicted Salary = 14.5 + 0.6 Experience

Based on the model, the interpretation of the y-intercept:

(a) The expected monthly salary of a programmer with no experience is 14,500 Dhs.
(b) There is no practical interpretation of the y-intercept.
(c) Estimated monthly salary increases by 14.5 Dhs with each 0.6 year increase in experience.
(d) Estimated monthly salary increases by 600 Dhs with each 1-year increase in experience.

Answer: (a). When x (Experience) = 0, y (Salary) = 14.5 thousands = 14500 Dhs.

16. The regression equation of monthly salary (1000s Dhs) on experience (years) for computer
programmers is
\ = 14.5 + 0.6 Experience
Salary
one unit
slope: Increase
Based on the model, the interpretation of the-
in for every
increase in X-

(a) The expected monthly salary of a programmer with no experience is 14,500 Dhs.
(b) There is no practical interpretation of the slope.
(c) Estimated monthly salary increases by 14.5 Dhs with each 0.6 year increase in experience.
(d) Estimated monthly salary increases by 600 Dhs with each 1-year increase in experience.
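
The intercept and slope readings in questions 15 and 16 can be confirmed by evaluating the line directly. This is a small sketch under the stated equation only; `predicted_salary` is a hypothetical helper name, and the choice of 3 and 4 years for the slope check is arbitrary.

```python
def predicted_salary(experience_years: float) -> float:
    """Predicted monthly salary in thousands of Dhs (questions 15 and 16)."""
    return 14.5 + 0.6 * experience_years

# Intercept: predicted salary at zero experience, converted to Dhs
salary_at_zero_dhs = predicted_salary(0) * 1000

# Slope: change in predicted salary for one extra year, converted to Dhs
increase_per_year_dhs = (predicted_salary(4) - predicted_salary(3)) * 1000

print(round(salary_at_zero_dhs), round(increase_per_year_dhs))
```

Because the response is recorded in thousands of Dhs, both coefficients must be multiplied by 1000 before they are read as amounts in Dhs.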

17. A regression analysis between weight (in kg) and height (in cm) resulted in the following
regression line:
$\widehat{\text{Weight}} = 55 + 0.92\,\text{Height}$
This implies that if the height is increased by 1 cm, the weight is expected to:

(a) increase by 1 kg.


(b) decrease by 1 kg.
(c) -> increase by 0.92 kg. (slope)
(d) decrease by 55 kg.

18. A study reported the correlation coefficient between x and y is -0.45. Which of the following
could be the estimated regression equation using the same data?
Note: the correlation is negative, so the regression slope should be negative.
(a) -> ŷ = 0.180 - 7.276 x
(b) ŷ = 55.788 + 2.788 x
(c) ŷ = 7.276 + 0.180 x
(d) ŷ = 2.788 + 55.788 x
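
The reasoning in question 18 rests on the standard least-squares identity linking the slope to the correlation coefficient. In the sketch below, $s_x$ and $s_y$ denote the sample standard deviations of x and y; these symbols are not used in the original and are introduced here only for illustration.

```latex
b_1 = r \, \frac{s_y}{s_x}, \qquad s_x > 0,\ s_y > 0
\quad\Longrightarrow\quad \operatorname{sign}(b_1) = \operatorname{sign}(r)
```

With r = -0.45 the slope must therefore be negative, which rules out every candidate equation with a positive coefficient on x.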

Use the following information to answer the next four questions:
Consider the following scatterplot of amounts of CO (carbon monoxide) and NOX (nitrogen
oxide) in grams per km driven in the exhausts of cars. The regression line has been
drawn in the plot.

[Scatterplot of NOX against CO with the fitted regression line. Handwritten note: the intercept is at about 1.8.]

19. What would be your best estimate of the correlation coefficient for the amounts of CO (carbon
monoxide) and NOX (nitrogen oxide) in grams per km?
Note: moderate relation.
(a) -0.70
(b) -0.97
(c) -> -0.50
(d) -0.05

20. The intercept of the regression line is approximately


(a) -0.7
(b) 18
(c) -> 1.8 (from the graph)
(d) 2.0

21. The regression line would predict that a car that emits 10 grams of CO per km driven would
emit approximately how many grams of NOX per km driven?
(a) 10
(b) 1.7
(c) -> 1.1 (this is interpolation)
(d) Can’t tell

22. Predicting the NOX emission of a car that emits 25 grams of CO per km is called
(a) correlation
(b) -> extrapolation (this is extrapolation)
(c) causation
(d) interpolation

Section 2: Free-Response Problems
1. Match each graph with the appropriate correlation coefficient.
...... 0.62 . . . . . . 0.54 . . . . . . 0.09 . . . . . . 0.93
Notes: 0.09 is weak positive; 0.93 is strong positive; -0.54 is strong negative; 0.62 is strong positive.
2. The regression equation relating the number of years of college (x) and the current annual
salary in thousands of dollars (y) for a random sample of heavy equipment salespeople is:

ŷ = 21.6 + 2x
(a) What does the slope mean in this context? (Answers on next page)
(b) What does the intercept mean in this context? Is it meaningful?
(c) Predict the annual salary of a salesperson with one year of college.
3. Chicken sandwiches are often advertised as a healthier alternative to beef because many are
lower in fat. Data from tests on 6 different sandwiches are shown below. (Answers on next page)
Fat (g)     10   20   25   30   35   40
Calories   220  450  400  600  720  810

(a) Find and interpret the correlation coefficient between calories and fat content.
(b) At 1% significance level, are the calories and fat content linearly related?
(c) Find the equation of the regression line to estimate calories from the fat content.
(d) Explain the meaning of the y-intercept.
(e) Explain the meaning of the slope.
(f) A new chicken sandwich containing 28 grams of fat is introduced. How many calories
does it have?

4. The following data show the retail price (in Dhs) for 8 randomly selected laptop computers
along with their corresponding processor speeds in gigahertz (GHz).

Speed (GHz)     2   1.6   1.8     2   1.2   1.6     1   1.4
Price (Dhs)  8068  3688  7768  8548  4048  8008  3748  3476

(a) Construct a scatter plot of price and processor speed of laptops. Comment on the graph.
(b) Find and interpret the correlation coefficient.
(c) Test at the 5% level of significance whether the processor speed of a laptop and its price
are linearly related.
(d) Find the regression equation relating the price to the processor speed.
(e) Interpret the meaning of the regression coefficients.
(f) Predict the price of a laptop with a 2.16 GHz processor speed.
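
Parts (b), (c), (d) and (f) of problem 4 can be worked from the eight data points with the usual sums of squares. The following is a sketch, not the course's official solution: the helper `fit_line` and all variable names are illustrative, and the P-value for part (c) would still need a t-table or a statistics package with n - 2 = 6 degrees of freedom.

```python
from math import sqrt

speed_ghz = [2, 1.6, 1.8, 2, 1.2, 1.6, 1, 1.4]
price_dhs = [8068, 3688, 7768, 8548, 4048, 8008, 3748, 3476]

def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r

b0, b1, r = fit_line(speed_ghz, price_dhs)

# Test statistic for H0: no linear relationship (n - 2 degrees of freedom)
n = len(speed_ghz)
t_stat = r * sqrt((n - 2) / (1 - r ** 2))

# Part (f): prediction at 2.16 GHz
predicted_price = b0 + b1 * 2.16

print(f"r = {r:.3f}, t = {t_stat:.2f}")
print(f"price-hat = {b0:.1f} + {b1:.1f} * speed")
print(f"predicted price at 2.16 GHz: {predicted_price:.0f} Dhs")
```

Note that 2.16 GHz lies slightly above the largest observed speed (2 GHz), so the prediction in part (f) is a mild extrapolation.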

Q.2  ŷ = b0 + b1 x = 21.6 + 2x
x = number of years of college; ŷ = annual salary (thousands of $); b0 = intercept; b1 = slope.

(a) Slope: b1 = 2 thousand $ = $2,000 per year.
=> For every additional year of college, annual salary increases by $2,000.

(b) Intercept: b0 = 21.6 thousand $ = $21,600.
When x = 0, ŷ = $21,600 (starting salary).
=> With no years of college, the salary is $21,600.

(c) Predict ŷ when x = 1 year:
ŷ = 21.6 + 2x = 21.6 + 2(1) = 23.6 thousand
=> With 1 year of college, the predicted annual salary is $23,600.

Q.3 (a) Correlation coefficient: r = 0.972
=> As fat (x) increases, calories (y) in the sandwiches increase.

(b) H0: ρ = 0 (no linear relation); Ha: ρ ≠ 0; α = 0.01
$t = r\sqrt{\dfrac{n-2}{1-r^2}}$
P-value = 0.001
P-value < α => reject H0 (significant)
Conclusion: We have enough evidence to conclude that there is a significant linear
relation between fat (x) and calories (y).

(c) Regression equation: ŷ = 9.14 + 19.66x
x = fat (g); ŷ = calories; b0 = 9.14; b1 = 19.66.

(d) Meaning of the y-intercept (b0): when x = 0 (no fat), ŷ = 9.14 calories.
=> A fat-free sandwich has 9.14 calories.

(e) Slope (b1): 19.66 calories per gram of fat.
=> For every additional gram of fat there is an increase of 19.66 calories.

(f) When x = 28 g, ŷ = ?
ŷ = 9.14 + 19.66x = 9.14 + 19.66(28) ≈ 559.6 calories
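
The Q.3 figures can be reproduced from the six (fat, calories) pairs given in problem 3. The sketch below uses the standard sums-of-squares formulas; the variable names are illustrative, and the P-value is not computed because it requires a t-distribution table or a statistics library (with n - 2 = 4 degrees of freedom).

```python
from math import sqrt

fat_g = [10, 20, 25, 30, 35, 40]
calories = [220, 450, 400, 600, 720, 810]
n = len(fat_g)

mean_x = sum(fat_g) / n
mean_y = sum(calories) / n

# Sums of squares and cross-products about the means
sxx = sum((x - mean_x) ** 2 for x in fat_g)
syy = sum((y - mean_y) ** 2 for y in calories)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fat_g, calories))

r = sxy / sqrt(sxx * syy)                    # part (a)
t_stat = r * sqrt((n - 2) / (1 - r ** 2))    # part (b) test statistic
b1 = sxy / sxx                               # part (c) slope
b0 = mean_y - b1 * mean_x                    # part (c) intercept
calories_at_28 = b0 + b1 * 28                # part (f)

print(f"r = {r:.3f}, t = {t_stat:.2f}")
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
print(f"predicted calories at 28 g of fat: {calories_at_28:.1f}")
```

The printed coefficients should agree with the handwritten r = 0.972 and ŷ = 9.14 + 19.66x; the prediction can differ in the first decimal place depending on whether rounded or unrounded coefficients are used.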
