STA2023 Summary Notes: Chapter 1 - 10

This document provides an overview of introductory statistics concepts from Chapter 1 of a statistics textbook. It covers topics like descriptive versus inferential statistics, types of variables (qualitative, quantitative, discrete, continuous), data collection and sampling, and key terminology like population, sample, and random sample. The chapter also discusses variables and data types like discrete versus continuous variables and how continuous data requires rounding measurements.

Uploaded by

Divine Mabika

MIAMI DADE COLLEGE - HIALEAH CAMPUS

STA2023
Summary Notes
Chapter 1 - 10
Dr. Mohammad Shakil
Editor: Jeongmin Correa

Contents

Chapter 1: The Nature of Probability and Statistics
Chapter 2: Frequency Distribution and Graphs
Chapter 3: Data Description
Chapter 4: Probability and Counting Rules
Chapter 5: Discrete Probability Distributions
Chapter 6: The Normal Distribution
Chapter 7: Confidence Intervals and Sample Size
Chapter 8: Hypothesis Testing
Chapter 9: Testing the Difference Between Two Means, Two Variances, and Two Proportions
Chapter 10: Correlation and Regression

Ch 1

1 - 1 Descriptive and Inferential Statistics

• Statistics
The methods of classification and analysis of numerical and non-numerical data for drawing valid conclusions and making reasonable decisions.

< Two Major Areas of Statistics >

Descriptive Statistics
It consists of the collection, organization, summarization, and presentation of data.
(It describes the situation as it is.)

Inferential Statistics
It consists of making inferences from samples to populations, hypothesis testing, determining relationships among variables, and making predictions.
(It is based on probability theory.)

* Probability: the chance of an event occurring.
Ex) Cards, dice, bingo, and lotteries

In order to gain information about seemingly haphazard events, statisticians study random variables.

1. Variables
A variable is a characteristic or an attribute that can assume different values.
Ex) Height, weight, temperature, number of phone calls received, etc.

2. Random Variables
Variables whose values are determined by chance.

< Collection of Data >

The collection of data constitutes the starting point of any statistical investigation. It should be conducted systematically, with a definite aim in view and with as much accuracy as is desired in the final results, for detailed analysis would not compensate for the bias and inaccuracies in the original data.

1. Data: the measurements or observations (values) for a variable
2. Data Set: a collection of data values
3. Data Value or Datum: each value in the data set

Example:
Suppose a researcher selects a specific day and records the number of calls received by a local office of the Internal Revenue Service each hour as follows: {8, 10, 12, 12, 15, 11, 13, 6}, where 8 is the number of calls received during the first hour, 10 the number of calls received during the second hour, and so on.
The collection of these numbers is an example of a data set, and each number in the data set is a data value.

Data may be collected for each and every unit of the whole lot (called the population), for this would ensure greater accuracy.
However, since in most cases the populations under study are very large, and it would be difficult and time-consuming to use all members, statisticians use subgroups called samples to get the necessary data for their studies. The conclusions drawn on the basis of the sample are taken to hold for the population.

1. Population: the totality of all subjects possessing certain common characteristics that are being studied.
2. Sample: a subgroup or subset of the population.
3. Random Sample: a sample obtained without bias or showing preferences in selecting items of the population.

1 – 2 Variables and Types of Data

< Classification of Variables (and Data) >

1. Qualitative Variables – no mathematical meaning, or non-numerical
Variables that can be placed into distinct categories, according to some characteristic or attribute.
Ex) gender, religious preferences, geographic locations, grades of a student, car tags, numbers on the uniforms of baseball players, etc.

2. Quantitative Variables
Numerical in nature and can be ordered or ranked.
Ex) age, heights, weights, body temperatures, etc.

Discrete Variables
Assume values that can be counted, such as whole numbers.
Ex) the number of children in a family, the number of students in a classroom, the number of calls received by a switchboard operator each day for one month, batting order numbers in baseball, etc.

Continuous Variables
Can assume all values between any two specific values, obtained by measuring.
Ex) temperature, height, weight, length, time, speed, etc.

* Since continuous data must be measured, rounding answers is necessary because of the limits of the measuring device. Usually, answers are rounded to the nearest given unit.
(For example, there is time between any 2 seconds, so a measured time must be rounded.)

Ex) Heights must be rounded to the nearest inch, weights to the nearest ounce, etc.
Hence, a recorded height of 73 inches would mean any measure of 72.5 inches up to but not including 73.5 inches.
Thus, the boundary of this measure is given as 72.5 – 73.5 inches.
(We have taken 72.5 as one of the boundaries since it could be rounded to 73. But we cannot include 73.5 because it would be 74 when rounded.) Sometimes 72.5 – 73.5 is called a class, which will contain the recorded height of 73 inches.
The concept of the boundaries of a continuous variable is illustrated in the following Table I:

TABLE I
Variable       Recorded Value   Boundaries (Class)
Length         15 cm            14.5 – 15.5 cm
Temperature    86 °F            85.5 – 86.5 °F
Time           0.43 sec         0.425 – 0.435 sec
Weight         1.6 gm           1.55 – 1.65 gm

Note: The boundaries of a continuous variable in the above table are given to one additional decimal place and always end with the digit 5.

< MEASUREMENT SCALES OF DATA >

1. Nominal-level Data (no order, no comparing of values)
– Equality, Categories, No mathematical meaning – Binomial
The nominal level of measurement classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no ordering or ranking can be imposed on the data.
Ex) Zip code, gender, color, ethnicity, political affiliation, religious affiliation, major field, nationality, marital status, sports players' back numbers, AM & PM, credit card numbers

2. Ordinal-level Data (Qualitative data) – Order, Rank
The ordinal level of measurement classifies data into categories that can be ordered or ranked (only before and after, not how much bigger or smaller).
However, precise differences between the ranks do not exist (no equal distance between 2 ranks).
Ex) Grade (A, B, C, D, F), judging (1st, 2nd, 3rd), rating scale (excellent, good, bad), ranking of sports players, months, Mon ~ Fri, left / center / right, date, morning / afternoon / evening, birthdays

3. Interval-level Data (Quantitative data)
The interval level of measurement ranks data, and precise differences between units of measure do exist (equal distances between 2 points).
However, there is no meaningful zero (i.e., starting point).
Ex) STA score, IQ, temperature, 12 hours of a day, date of a week, days of a month, weeks, months of a year

4. Ratio-level Data (Quantitative data)
The ratio level of measurement possesses all the characteristics of interval measurement (i.e., data can be ranked and differences are meaningful), and there exists a true zero or starting point.
In addition, true ratios exist between different units of measure.
Ex) Height, weight, time, salary, age, 24 hours of a day (0 = 24)

The four levels compared:
Nominal: Sue is young, and Mary is old.
Ordinal: Sue is younger than Mary.
Interval: Sue is 20 years younger than Mary.
Ratio: Mary is twice as old as Sue.


1 – 3 Data Collection and Sampling Techniques

When the population is large and diverse, a sampling method must be designed so
that the sample is representative, unbiased and random, i.e. every subject (or element)
in the population has an equal chance of being selected for the sample.

1. Random Sampling
This method requires that each member of the population be identified and assigned a number.
Then a set of numbers drawn randomly from this list forms the required random sample.
Note that each member of the population has an equal chance of being selected.
Ex) For a large population, computers are used to generate random numbers, i.e., series of numbers arranged in random order.

2. Systematic Sampling – every k-th member (for example, every 5th number)
This method requires that every k-th member (or item) of the population be selected to form the required sample.
Ex) We might select every 10th house on a city block for the sample.

3. Stratified Sampling
This method requires that the population be classified into a number of smaller homogeneous strata or subgroups.
A sample is drawn randomly from each stratum.
= Subdivide the population into at least 2 different subgroups (or strata) so that subjects within the same subgroup share the same characteristic (such as gender or age bracket), then draw a sample from each subgroup.
Ex) age, sex, marital status, education, religion, occupation, ethnic background, or virtually any characteristic.


4. Cluster Sampling
The population area is first divided into a number of sections (or subpopulations) called clusters.
A few of those clusters are randomly selected, and sampling is carried out only in those clusters
(and then choose all members from the selected clusters).
Ex) A community can be divided into city blocks as its clusters. Several blocks are then randomly selected. After this, residents on the selected blocks are randomly chosen, providing a sampling of the entire community.

5. Convenience Sampling
We use the results that are readily available.
Ex) Someone could say to you, "Do you know…?"

< Statistical Inference and Measurement of Reliability >

A statistical inference is an estimate or prediction or some other generalization about a population based on information contained in a random sample of the population. That is, the information contained in the random sample is used to learn about the population.

A measure of reliability is a statement (usually quantified) about the degree of uncertainty associated with a statistical inference.

< Elements of Descriptive and Inferential Statistical Problems >

1. Four Elements of Descriptive Statistical Problems
a. The population or sample of interest.
b. One or more variables (characteristics of the population or sample units) that are to be investigated.
c. Tables, graphs, numerical summary tools.
d. Identification of patterns in the data.

2. Five Elements of Inferential Statistical Problems
a. The population of interest.
b. One or more variables that are to be investigated.
c. The sample of population units.
d. The statistical inference about the population based on information contained in the random sample of the population.
e. A measure of reliability for the statistical inference.
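
As a recap of the sampling techniques in section 1 – 3, here is a minimal Python sketch of random, systematic, stratified, and cluster sampling. The function names, the fixed `seed` argument, and the example population of numbered houses are assumptions made for illustration; they do not come from the notes.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Random sampling: every member has an equal chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, seed=0):
    """Systematic sampling: random start among the first k members, then every k-th member."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, stratum_of, n_per_stratum, seed=0):
    """Stratified sampling: split into strata, then draw randomly from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=0):
    """Cluster sampling: randomly pick whole clusters and keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical population: houses numbered 1..100.
houses = list(range(1, 101))
every_tenth = systematic_sample(houses, k=10)
by_parity = stratified_sample(houses, lambda h: "odd" if h % 2 else "even", 5)
blocks = {"block A": [1, 2, 3], "block B": [4, 5], "block C": [6, 7, 8]}
two_blocks = cluster_sample(blocks, 2)
```

Convenience sampling is left out on purpose: it involves no random selection, so there is nothing to randomize.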


Ch 2

• Raw (Original) Data: data in original form (unorganized)
• Class: each raw data value is placed into a quantitative or qualitative category.
• Frequency Distribution
The organization of raw data in table form, using classes and frequencies
a) Categorical Frequency Distribution - non-numerical data
b) Grouped Frequency Distribution - numerical data
c) Ungrouped Frequency Distribution

Rules for Constructing a Frequency Distribution
1. The number of classes should be between 5 and 20.
2. The class midpoint: $X_m = \dfrac{\text{lower limit} + \text{upper limit}}{2}$
3. The classes must be mutually exclusive (class limits do not overlap), although adjacent classes share a class boundary.
4. The classes must be continuous (no gaps).
The only exception is when the first or the last class has a frequency of zero.
5. The classes must be equal in width.
The only exception is an open-ended class ("below", "and more", etc.).

• How to Make the Table of a Grouped Frequency Distribution

1. Range = Highest value – Lowest value
2. The number of classes desired (5 ~ 20 classes)
* Ideal number of classes by Sturges' guideline: $k = 1 + \dfrac{\log n}{\log 2}$, where $n$ is the number of data values (round up to the next whole number)
3. The class width = Range ÷ the number of classes (round up to the next whole number)
Class width = lower class limit – previous lower class limit (vertical)
= upper class boundary – lower class boundary (horizontal)
(Subtract the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class.)
4. Select the starting point for the lowest class limit.
5. Subtract one unit from the lower limit of the second class to get the upper limit of the 1st class.
Then add the width to each upper limit to get all the upper limits.
6. Class boundaries (0.5 or 0.05, depending on the precision of the data):
Lower boundary = Lower limit – 0.5 (or 0.05)
Upper boundary = Upper limit + 0.5 (or 0.05)

Ex1)
Class Limits   Class Boundaries
24 – 30        (24 – 0.5) – (30 + 0.5) = 23.5 – 30.5
31 – 37        (31 – 0.5) – (37 + 0.5) = 30.5 – 37.5

Ex2)
Class Limits   Class Boundaries
2.3 – 2.9      (2.3 – 0.05) – (2.9 + 0.05) = 2.25 – 2.95
3.0 – 3.6      (3.0 – 0.05) – (3.6 + 0.05) = 2.95 – 3.65

7. Tally & Frequency: count the number of data values in each class.
8. Find the sum of all the frequencies.
9. Cumulative Frequency: add the frequencies of the classes less than or equal to the upper class boundary of a specific class.
** The cumulative frequency of the last class and the sum of the frequencies must be the same.

10. Relative Frequency = frequency ÷ total number = $f / n$
11. Percent = $(f / n) \times 100\%$
12. Midpoint = (lower limit + upper limit) ÷ 2

P38 Ex) Distribution of Blood Types - Categorical F. Distribution
A B B AB O O O B AB B B B O
A O A O O O AB AB A O B A

Blood Type A: 5 people    Blood Type B: 7 people
Blood Type O: 9 people    Blood Type AB: 4 people    Total: 25 people

Class   Frequency   Relative F.    Percent (%)
A       5           5/25 = 0.20    20%
B       7           7/25 = 0.28    28%
O       9           9/25 = 0.36    36%
AB      4           4/25 = 0.16    16%
Total   ∑f = 25     1              100%

Class   Cumulative Frequency
A       5
B       5+7 = 12
O       12+9 = 21
AB      21+4 = 25 (= ∑f)

P41 Ex 2-2) Record High Temperatures - Grouped F. Distribution
112 100 127 120 134 118 105 110 109 112 110 118 117 116 118
122 114 114 105 109 107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104
111 120 113 120 117 105 110 118 112 114 114

Solution)
1. Range = Highest value – Lowest value: 134 – 100 = 34
2. The number of classes desired, between 5 and 20: 7 classes
3. The class width = Range ÷ the number of classes: 34 ÷ 7 = 4.9 → 5 (round up to the next whole number)
4. Select the starting point for the lowest class limit: 100
5. Subtract one unit from the lower limit of the second class to get the upper limit of the 1st class.
Then add the width to each upper limit to get all the upper limits.
100-104, 105-109, 110-114, 115-119, 120-124, 125-129, 130-134
6. Find the boundaries (0.5 or 0.05, depending on the precision of the data).
Lower boundary = Lower limit – 0.5 (or 0.05)
Upper boundary = Upper limit + 0.5 (or 0.05)
7. Tally & Frequency: count the number of data values in each class.
8. Find the sum of all the frequencies.
9. Cumulative Frequency: add the frequencies of the classes less than or equal to the upper class boundary of a specific class.
** The cumulative frequency of the last class and the sum of the frequencies must be the same.
10. Relative Frequency = frequency ÷ total number = $f / n$ for each class
11. Percent = $(f / n) \times 100\%$
12. Midpoint
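
The steps of Ex 2-2 can be sketched in Python as follows. The helper name `grouped_distribution` and its `unit` parameter (the precision of the data: 1 for whole numbers, 0.1 for tenths) are assumptions made for this illustration, not part of the notes.

```python
import math

temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112, 110, 118, 117, 116, 118,
    122, 114, 114, 105, 109, 107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104,
    111, 120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

def grouped_distribution(data, start, width, num_classes, unit=1):
    """Return one dict per class: limits, boundaries, midpoint, f, rf, cf."""
    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = start + i * width          # lower class limit
        upper = lower + width - unit       # upper class limit
        f = sum(1 for x in data if lower <= x <= upper)
        cumulative += f
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - unit / 2, upper + unit / 2),
            "midpoint": (lower + upper) / 2,
            "f": f,
            "rf": f / len(data),           # relative frequency
            "cf": cumulative,              # cumulative frequency
        })
    return rows

num_classes = 7
data_range = max(temps) - min(temps)           # step 1: range
width = math.ceil(data_range / num_classes)    # step 3: round up
rows = grouped_distribution(temps, start=min(temps), width=width,
                            num_classes=num_classes)
for row in rows:
    print(row["limits"], row["boundaries"], row["midpoint"],
          row["f"], row["rf"], row["cf"])
```

The cumulative frequency of the last class should equal the number of data values, which is a quick check that no value fell outside the classes.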


C.L.      Class boundaries   f    Midpoint
100-104   99.5 – 104.5       2    102
105-109   104.5 – 109.5      8    107
110-114   109.5 – 114.5      18   112
115-119   114.5 – 119.5      13   117
120-124   119.5 – 124.5      7    122
125-129   124.5 – 129.5      1    127
130-134   129.5 – 134.5      1    132

<Cumulative Frequency Distribution>

Class           f    C.F.
Less than 105   2    2
Less than 110   8    2+8 = 10
Less than 115   18   10+18 = 28
Less than 120   13   28+13 = 41
Less than 125   7    41+7 = 48
Less than 130   1    48+1 = 49
Less than 135   1    49+1 = 50

C.L.      f         R.F.            %      C.F.
100-104   2         2/50 = 0.04     4%     2
105-109   8         8/50 = 0.16     16%    10
110-114   18        18/50 = 0.36    36%    28
115-119   13        13/50 = 0.26    26%    41
120-124   7         7/50 = 0.14     14%    48
125-129   1         1/50 = 0.02     2%     49
130-134   1         1/50 = 0.02     2%     50
Total     ∑f = 50   1               100%

** The last C.F. is the same as ∑f.

C.L. = class limits
f = frequency
C.F. = cumulative frequency
R.F. = relative frequency

• Histogram
Displays the data by using contiguous vertical bars (unless the frequency of a class is 0) of various heights to represent the frequencies of the classes.

• Frequency Polygon
Displays the data by using lines that connect points plotted for the frequencies of the classes (starts from zero).
The frequencies are represented by the heights of the points.

• Ogive (= cumulative frequency graph)
Represents the cumulative frequencies for the classes in a frequency distribution.

*** Note: These three graphs are used when the data are contained in a grouped frequency distribution.


Graphs from the Ungrouped Frequency Distribution of Blood Types

[Figure: <Vertical Bar Graph>, <Frequency Graph>, and <Cumulative Frequency Graph> of blood types A, B, O, AB; x-axis: blood type, y-axis: frequency (cumulative frequency for the last graph)]

Ex) Drawing Graphs of the Grouped Frequency Distribution from Ex 2-2)

[Figure: Frequency Distribution; x-axis: class limits 100-104 to 130-134, y-axis: frequency]

Histogram
*** Using class boundaries for the x-axis and frequencies for the y-axis ***
[Figure: histogram; x-axis: class boundaries 99.5 to 129.5, y-axis: frequency]

Frequency Polygon
*** Using midpoints for the x-axis and frequencies for the y-axis ***
[Figure: frequency polygon; x-axis: midpoints 102 to 132, y-axis: frequency]

Ogive
*** Using class boundaries for the x-axis and cumulative frequencies for the y-axis ***
[Figure: ogive; x-axis: class boundaries 99.5 to 129.5, y-axis: cumulative frequency]


< Distribution Shapes >

[Figure: example histograms of eight distribution shapes: bell shape, uniform, U shape, J shape, reverse J shape, bimodal, left skewed, right skewed]

<Other Types of Graphs>

Bar Graph
: uses vertical or horizontal bars whose heights or lengths represent the frequencies of the data
[Figure: vertical and horizontal bar graphs of blood types A, B, O, AB]

Pareto Chart (Horizontal)
: represents a categorical variable; the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest
[Figure: Pareto chart of blood types in the order O, B, A, AB]

Time Series Graph
: represents data that occur over a specific period of time
(Ex: temperatures over a 24-hour period)
[Figure: time series graph; x-axis: time from 12AM to 9PM, y-axis: temperature from 50 to 80]


Pie Graph (Percentages or proportions - Nominal or Categorical)
: a circle divided into sections or wedges according to the percentage of frequencies in each category of the distribution
[Figure: pie graph of blood types A, B, O, AB]

Step 1) Degrees = $\dfrac{f}{n} \times 360°$ (to measure each wedge)
Step 2) % = $\dfrac{f}{n} \times 100\%$ (to show on the graph)

→ The sum of the degrees or percentages does not always equal 360° or 100%, due to rounding.

<Scatter Plots>
A graph of ordered pairs of data values that is used to determine if a relationship exists between the two variables.

Ex)
No. of Accidents, x:  376  650  884  1162  1513  1650  2236  3002  4028
Fatalities, y:          5   20   20    28    26    34    35    56    68
[Figure: scatter plot; x-axis: number of accidents (0 to 5000), y-axis: fatalities (0 to 80)]

<Analyzing the Scatter Plot>
[Figure: example scatter plots showing no relationship, a positive linear relationship, a negative linear relationship, and no linear relationship]
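
For the pie graph steps in this section, a short Python sketch using the blood type data from Chapter 2 shows how the degrees and percentages come out. The variable names are illustrative assumptions.

```python
from collections import Counter

# Blood type data from the categorical frequency distribution example.
blood = "A B B AB O O O B AB B B B O A O A O O O AB AB A O B A".split()

counts = Counter(blood)          # frequency f of each category
n = sum(counts.values())         # total number of data values

pie = {
    blood_type: {
        "f": f,
        "percent": f / n * 100,  # Step 2: (f / n) x 100%
        "degrees": f / n * 360,  # Step 1: (f / n) x 360 degrees
    }
    for blood_type, f in counts.items()
}

for blood_type, row in pie.items():
    print(blood_type, row["f"], round(row["percent"], 1), round(row["degrees"], 1))
```

Here the wedges add up to a full circle exactly; with other data, rounded degrees or percentages may not sum to 360° or 100%.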


< Stem and Leaf Plot (Exploratory Data Analysis) >

A data plot that uses part of the data value as the stem and part of the data value as the leaf to form groups or classes.

Step 1) Arrange the data in order.
Step 2) Separate the data according to the first digit.
Step 3) A display can be made by using the leading digit as the stem and the trailing digit as the leaf.
** If there are no data values in a class, you should write the stem number and leave the leaf row blank. Do not put a zero in the leaf row.

Ex) 24 32 56 44 2 13 32 44 31 32 14 105 23 20
Step 1) 2 13 14 20 23 24 31 32 32 32 44 44 56 105
Step 2) 02 | 13 14 | 20 23 24 | 31 32 32 32 | 44 44 | 56 | 105
Step 3)
Stem (Leading Digit)   Leaf (Trailing Digit)
0                      2
1                      3 4
2                      0 3 4
3                      1 2 2 2
4                      4 4
5                      6
10                     5

Ex) Atlanta: 26 29 30 31 36 36 40 40 50 52 60
    N.Y.   : 25 31 31 32 36 39 40 43 51 52 56
Step 1) The data are already arranged in order.
Step 3)
Atlanta    Stem   N.Y.
    9 6    2      5
6 6 1 0    3      1 1 2 6 9
    0 0    4      0 3
    2 0    5      1 2 6
      0    6

Ch 3

3 – 1 Measures of Central Tendency

• Statistic: a characteristic or measure obtained by using the data values from a sample
• Parameter: a characteristic or measure obtained by using all the data values from a specific population
Ex) Population: 1, 2, 3, 4, 5, 6. Sample S: 2, 4, 6.
A statistic is computed from 2, 4, 6 (a sample); a parameter is computed from 1, 2, 3, 4, 5, 6 (the population).

• Mean (= Arithmetic Average): affected by the highest and lowest values
Sample mean: $\bar{X} = \dfrac{\sum X}{n}$
Population mean: $\mu = \dfrac{\sum X}{N}$
** n = the number of values in the sample
** N = the number of values in the population

Ex) 2 6 9 10 5 7
$\bar{X} = \dfrac{\sum X}{n} = \dfrac{39}{6} = 6.5$

• Median (MD): the midpoint of the data array
1) Arrange all the data in order.
2) Select the midpoint.
3) If there are 2 middle numbers, add the 2 numbers and then divide by 2.
** Data array = the ordered data set

Ex) 3 5 4 9 2 3 4 6 10
2 3 3 4 4 5 6 9 10 → 4 is the MD

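Ex) 20 41 66 27 21 24
20 21 24 27 41 66 → 24 & 27 are in the middle
$MD = \dfrac{24 + 27}{2} = 25.5$

• Mode: the value that occurs most often in a data set
- No mode: all data values different      ex) 2 3 5 9 7 12 1 4
- Unimodal: one mode                       ex) 2 3 5 9 7 5 1 4
- Bimodal: two modes                       ex) 2 3 5 9 3 7 2 11
- Multimodal: more than two modes          ex) 2 3 9 2 7 3 4 9

• Midrange (MR): a rough estimate of the middle of the data values
$MR = \dfrac{\text{lowest value} + \text{highest value}}{2}$

• Weighted Mean (ex: GPA)
$\bar{X} = \dfrac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \dfrac{\sum wX}{\sum w}$

Ex)
Course    Credits (w)   Grade (X)
Math      3             A (4 points)
English   4             C (2 points)
Biology   2             B (3 points)
$\bar{X} = \dfrac{3 \cdot 4 + 4 \cdot 2 + 2 \cdot 3}{3 + 4 + 2} = \dfrac{26}{9} \approx 2.9$

• Distribution Shapes
[Figure: three histograms. Positively skewed (right skewed): mode < median < mean. Negatively skewed (left skewed): mean < median < mode. Bell shape, symmetric (evenly distributed): mean = median = mode.]

3 – 2 Measures of Variation

• Range: $R = \text{highest value} - \text{lowest value}$
** the distance between the highest and lowest values

• Population Variance and Standard Deviation
: Variance and standard deviation give a more meaningful statistic for measuring variability.
: When the means of 2 sets of data are equal, the larger the variance or standard deviation, the more variable the data are.

$\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$
** the average of the squares of the distance that each value is from the mean

$\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (X - \mu)^2}{N}}$

• Sample Variance and Standard Deviation
The formula $\dfrac{\sum (X - \bar{X})^2}{n}$ is not usually used, since in most cases the purpose of calculating the statistic is to estimate the corresponding parameter.
Dividing by $n - 1$ instead gives a slightly larger value and an unbiased estimate of the population variance ($\sigma^2$).

$s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$

$s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$

*** Shortcut or computational formulas (no need to find $\bar{X}$)

$s^2 = \dfrac{n \sum X^2 - (\sum X)^2}{n(n - 1)}$

$s = \sqrt{\dfrac{n \sum X^2 - (\sum X)^2}{n(n - 1)}}$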
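
The measures of central tendency in section 3 – 1 can be checked with Python's standard `statistics` module. The helpers `midrange` and `weighted_mean` are small illustrative functions written for this sketch, not library calls.

```python
import statistics

def midrange(data):
    """Midrange: (lowest value + highest value) / 2."""
    return (min(data) + max(data)) / 2

def weighted_mean(values, weights):
    """Weighted mean: sum(w * X) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

mean_value = statistics.mean([2, 6, 9, 10, 5, 7])                 # sum 39, n = 6
median_odd = statistics.median([3, 5, 4, 9, 2, 3, 4, 6, 10])      # one middle value
median_even = statistics.median([20, 41, 66, 27, 21, 24])         # average of 24 and 27
modes = statistics.multimode([2, 3, 5, 9, 3, 7, 2, 11])           # bimodal example
mr = midrange([2, 6, 9, 10, 5, 7])

# GPA example: grade points weighted by course credits.
gpa = weighted_mean(values=[4, 2, 3], weights=[3, 4, 2])

print(mean_value, median_odd, median_even, modes, mr, round(gpa, 1))
```

One caveat: when every value occurs only once, `statistics.multimode` returns all of the values, whereas the notes describe that case as "no mode".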

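Ex) 131p Find the population variance and the population standard deviation.
Comparison of outdoor paint (how long each will last before fading, in months)
A: 10 60 50 30 40 20
B: 35 45 30 35 40 25

Step 1) $\sum X$
A: 10 + 60 + 50 + 30 + 40 + 20 = 210
B: 35 + 45 + 30 + 35 + 40 + 25 = 210
Step 2) $\mu = \dfrac{\sum X}{N}$
A: 210 ÷ 6 = 35    B: 210 ÷ 6 = 35
Step 3) Range
A: 60 – 10 = 50 months
B: 45 – 25 = 20 months
Step 4) Variance $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$
A: 10-35=-25, 60-35=25, 50-35=15, 30-35=-5, 40-35=5, 20-35=-15
B: 35-35=0, 45-35=10, 30-35=-5, 35-35=0, 40-35=5, 25-35=-10
A: $\sigma^2 = \dfrac{625 + 625 + 225 + 25 + 25 + 225}{6} = \dfrac{1750}{6} \approx 291.7$
B: $\sigma^2 = \dfrac{0 + 100 + 25 + 0 + 25 + 100}{6} = \dfrac{250}{6} \approx 41.7$
Step 5) Standard deviation $\sigma = \sqrt{\sigma^2}$
A: $\sqrt{291.7} \approx 17.1$    B: $\sqrt{41.7} \approx 6.5$

• Variance and Standard Deviation for Grouped Data
- Uses the midpoints of each class

Ex)
Class          Frequency ($f$)   Midpoint ($X_m$)
5.5 - 10.5     1                 8
10.5 - 15.5    2                 13
15.5 - 20.5    3                 18
20.5 - 25.5    5                 23
25.5 - 30.5    4                 28
30.5 - 35.5    3                 33
35.5 - 40.5    2                 38

Step 1) Find the midpoint of each class.
Step 2) Multiply the frequency by the midpoint for each class: $f \cdot X_m$
Step 3) Multiply the frequency by the square of the midpoint: $f \cdot X_m^2$

Class          $f$    $X_m$   $f \cdot X_m$   $f \cdot X_m^2$
5.5 - 10.5     1      8       8               64
10.5 - 15.5    2      13      26              338
15.5 - 20.5    3      18      54              972
20.5 - 25.5    5      23      115             2645
25.5 - 30.5    4      28      112             3136
30.5 - 35.5    3      33      99              3267
35.5 - 40.5    2      38      76              2888
Sum (∑)        20             490             13310

Step 4) $s^2 = \dfrac{n \sum f X_m^2 - (\sum f X_m)^2}{n(n - 1)} = \dfrac{20(13310) - 490^2}{20(19)} = \dfrac{26100}{380} \approx 68.7$
Step 5) $s = \sqrt{68.7} \approx 8.3$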
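
The variance calculations on this page can be reproduced with a short Python sketch. The function names are illustrative assumptions; the formulas are the population definition and the shortcut (computational) formulas given in section 3 – 2.

```python
import math

def population_variance(data):
    """sigma^2 = sum((X - mu)^2) / N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def sample_variance_shortcut(data):
    """s^2 = (n * sum(X^2) - (sum X)^2) / (n * (n - 1))."""
    n = len(data)
    return (n * sum(x * x for x in data) - sum(data) ** 2) / (n * (n - 1))

def grouped_sample_variance(midpoints, freqs):
    """s^2 for grouped data, using class midpoints X_m and frequencies f."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]
var_a = population_variance(brand_a)
var_b = population_variance(brand_b)
print(round(var_a, 1), round(math.sqrt(var_a), 1))
print(round(var_b, 1), round(math.sqrt(var_b), 1))

midpoints = [8, 13, 18, 23, 28, 33, 38]
freqs = [1, 2, 3, 5, 4, 3, 2]
var_grouped = grouped_sample_variance(midpoints, freqs)
print(round(var_grouped, 1), round(math.sqrt(var_grouped), 1))
```

Both brands have the same mean of 35 months, so the much larger variance of brand A is what shows it to be the more variable paint.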


<Uses of the Variance and Standard Deviation>

1. Variance and standard deviation can be used to determine the spread of the data. If the variance or standard deviation is large, the data are more dispersed. This information is useful in comparing two (or more) data sets to determine which is more (most) variable.
2. The measures of variance and standard deviation are used to determine the consistency of a variable. For example, in the manufacture of fittings, such as nuts and bolts, the variation in the diameters must be small, or the parts will not fit together.
3. The variance and standard deviation are used to determine the number of data values that fall within a specified interval in a distribution. For example, Chebyshev's Theorem shows that, for any distribution, at least 75% of the data values will fall within 2 standard deviations of the mean.
4. Finally, the variance and standard deviation are used quite often in inferential statistics. These uses will be shown in later chapters.

• Coefficient of Variation, as a percentage (%)
*** To compare standard deviations when the units are different

$CVar = \dfrac{s}{\bar{X}} \times 100\%$

The data set with the larger coefficient of variation is more variable than the other.

Ex 3-25) p140 The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5. The mean of the commissions is $5225, and the standard deviation is $773.
Compare the variations of the two.

-Solution
Sales: $CVar = \dfrac{5}{87} \times 100\% \approx 5.7\%$
Commissions: $CVar = \dfrac{773}{5225} \times 100\% \approx 14.8\%$
Since the coefficient of variation is larger for commissions, the commissions are more variable than sales.

• Range Rule of Thumb

$s \approx \dfrac{\text{range}}{4}$
Rough estimate of the smallest data value: $\bar{X} - 2s$
Rough estimate of the largest data value: $\bar{X} + 2s$

• Chebyshev's Theorem
1. The proportion of values from a data set that will fall within k standard deviations of the mean will be at least $1 - \dfrac{1}{k^2}$, where k is a number greater than 1 (k isn't necessarily an integer).
2. It is used to find the minimum % of data values that will fall between any two given values.
3. It states that at least 75% of the data values will fall within 2 standard deviations of the mean of the data set.

[Figure: number line marked $\bar{X} - 3s$, $\bar{X} - 2s$, $\bar{X}$, $\bar{X} + 2s$, $\bar{X} + 3s$; at least 75% of the values lie within 2 standard deviations and at least 88.89% within 3]

Ex) The mean price of houses in a certain neighborhood is $50,000, and the standard deviation is $10,000. Find the price range for which at least 75% of the houses will sell.

-Solution
At least 75% corresponds to k = 2:
$50,000 – 2($10,000) = $30,000 and $50,000 + 2($10,000) = $70,000
Hence, at least 75% of all homes sold in the area will have a price range from $30,000 to $70,000.
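
The coefficient of variation and Chebyshev's Theorem examples above can be sketched in Python as follows. The function names are assumptions made for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """CVar = (s / mean) * 100, expressed as a percentage."""
    return std_dev / mean * 100

def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations: 1 - 1/k^2."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def chebyshev_interval(mean, std_dev, k):
    """Interval (mean - k*s, mean + k*s) covered by the theorem."""
    return (mean - k * std_dev, mean + k * std_dev)

# Ex 3-25: sales (mean 87, s = 5) versus commissions (mean 5225, s = 773).
cv_sales = coefficient_of_variation(5, 87)
cv_commissions = coefficient_of_variation(773, 5225)
print(round(cv_sales, 1), round(cv_commissions, 1))

# House prices: mean $50,000, standard deviation $10,000, k = 2.
print(chebyshev_min_proportion(2), chebyshev_interval(50_000, 10_000, 2))
```

Because k need not be an integer, the same function also answers questions such as k = 1.5, where at least 1 - 1/2.25, or about 55.6%, of the values fall within the interval.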

• Standard Scores or z score ( z )
- Used when a comparison of a relative standard similar to both data sets can be made with the mean and standard deviation
- The number of standard deviations a data value is above or below the mean for a specific distribution of values

Sample: $z = \dfrac{X - \bar{X}}{s}$
Population: $z = \dfrac{X - \mu}{\sigma}$

• The Empirical (normal) rule for a bell-shaped distribution
Approximately 68% of the data values will fall within 1 standard deviation of the mean.
Approximately 95% of the data values will fall within 2 standard deviations of the mean.
Approximately 99.7% of the data values will fall within 3 standard deviations of the mean.

[Figure: bell curve marked $\bar{X} - 3s$, $\bar{X} - 2s$, $\bar{X} - 1s$, $\bar{X}$, $\bar{X} + 1s$, $\bar{X} + 2s$, $\bar{X} + 3s$, with the 68%, 95%, and 99.7% regions]

3-3 Measures of Position

• Percentiles ( Pn )
- Divide the data set into 100 equal groups (each part = 1%)
- Position in hundredths that a data value holds in the distribution

*** To find the approximate percentile rank of a data value X:

$\text{Percentile} = \dfrac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \times 100\%$

Ex 3-32) A teacher gives a 20-point test to 10 students. The scores are shown here. Find the percentile rank of a score of 12.
18 15 12 6 8 2 3 5 20 10

Step 1) Arrange the data: 2 3 5 6 8 10 12 15 18 20
Step 2) $\text{Percentile} = \dfrac{6 + 0.5}{10} \times 100\% = 65\text{th percentile}$
Step 3) A student whose score was 12 did better than 65% of the class.

• Finding a Value Corresponding to a Given Percentile

$c = \dfrac{n \cdot p}{100}$, where n = total number of values and p = percentile

If c is not a whole number, round it up to the next whole number and use that value in the ordered data.
If c is a whole number, use the value halfway between the c-th and (c+1)-th values.

Ex 3-34) From Ex 3-32, find the value corresponding to the 25th percentile.
Step 1) $c = \dfrac{10 \cdot 25}{100} = 2.5$ → round up to 3
Step 2) The 3rd value is 5.
Hence, the value 5 corresponds to the 25th percentile.
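
The two percentile procedures above can be sketched in Python as follows. The function names are illustrative assumptions, and the second function assumes 0 < p < 100.

```python
import math

def percentile_rank(data, x):
    """Approximate percentile rank: (values below x + 0.5) / n * 100."""
    below = sum(1 for v in data if v < x)
    return (below + 0.5) * 100 / len(data)

def value_at_percentile(data, p):
    """Value corresponding to the p-th percentile, for 0 < p < 100."""
    ordered = sorted(data)
    c = len(ordered) * p / 100
    if c != int(c):
        # c is not a whole number: round up and take that position (1-based).
        return ordered[math.ceil(c) - 1]
    # c is a whole number: halfway between the c-th and (c+1)-th values.
    c = int(c)
    return (ordered[c - 1] + ordered[c]) / 2

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_rank(scores, 12))      # Ex 3-32
print(value_at_percentile(scores, 25))  # Ex 3-34
```

Library routines such as `statistics.quantiles` use interpolation rules that differ from this textbook procedure, so their results may not match the hand calculation.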


Ex 3-35) From Ex 3-32, find the value corresponding to the 60th percentile.
Step 1) $c = \dfrac{10 \cdot 60}{100} = 6$ (a whole number)
Step 2) The 6th value (= c) is 10 and the 7th value (= c+1) is 12: $\dfrac{10 + 12}{2} = 11$
Hence, 11 corresponds to the 60th percentile.
Anyone scoring 11 would have done better than 60% of the class.

• Quartiles ( Qn )
: Position in fourths that a data value holds in the distribution
Step 1) Arrange the data in order from lowest to highest.
Step 2) Divide into 4 groups.

Smallest data value | 25% | Q1 | 25% | Q2 (MD) | 25% | Q3 | 25% | Largest data value
Q1 = 25th percentile, Q2 = 50th percentile, Q3 = 75th percentile

Ex 3-36) 15 13 6 5 12 50 22 18   Find Q1, Q2, & Q3.
Step 1) Arrange the data set → 5 6 12 13 15 18 22 50
Step 2) To find Q2, divide the data into 2 halves: $\dfrac{13 + 15}{2} = 14$ = MD = Q2
Step 3) Q1 = $\dfrac{6 + 12}{2}$ = 9
Step 4) Q3 = $\dfrac{18 + 22}{2}$ = 20

• Deciles ( Dn )
Position in tenths that a data value holds in the distribution
Step 1) Arrange the data in order from lowest to highest.
Step 2) Divide into 10 groups.

Smallest data value | 10% | 10% | 10% | 10% | 10% | 10% | 10% | 10% | 10% | 10% | Largest data value

• Outliers
- An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.
- Outliers strongly affect the mean and standard deviation.
Step 1) Arrange the data in order and find Q1 and Q3.
Step 2) Find the interquartile range: IQR = Q3 - Q1
Step 3) Find (1.5) IQR.
Step 4) Find Q1 - [ (1.5) IQR ] and Q3 + [ (1.5) IQR ].
Step 5) Check the data set for any value that is smaller than Q1 - [ (1.5) IQR ] or larger than Q3 + [ (1.5) IQR ].

[Figure: number line with Q1, Q2, Q3, the IQR between Q1 and Q3, and the limits Q1 - [ (1.5) IQR ] and Q3 + [ (1.5) IQR ]]

Ex 3-37) Check the data set from Ex 3-36 for outliers.
Step 1) Arrange the data set → 5 6 12 13 15 18 22 50
Q2 = $\dfrac{13 + 15}{2}$ = 14 = MD, Q1 = $\dfrac{6 + 12}{2}$ = 9, Q3 = $\dfrac{18 + 22}{2}$ = 20
Step 2) IQR = Q3 - Q1 = 20 - 9 = 11
Step 3) (1.5) IQR = 1.5 × 11 = 16.5
Step 4) Q1 - [ (1.5) IQR ] = 9 - 16.5 = -7.5
Q3 + [ (1.5) IQR ] = 20 + 16.5 = 36.5
Step 5) Check the data set for any data values that fall outside the interval from -7.5 to 36.5.
The value 50 is outside this interval.
Hence, it can be considered an outlier.
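
The quartile and outlier steps of Ex 3-36 and Ex 3-37 can be sketched in Python. The sketch follows the method used in these notes, taking Q1 and Q3 as the medians of the lower and upper halves of the ordered data; the function names are assumptions, and the code expects at least two data values.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median of each half of the ordered data."""
    ordered = sorted(data)
    half = len(ordered) // 2          # for odd n the middle value is left out
    lower, upper = ordered[:half], ordered[-half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

def find_outliers(data):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

data = [15, 13, 6, 5, 12, 50, 22, 18]
print(quartiles(data))
print(find_outliers(data))
```

Other quartile conventions exist (for example, interpolated percentiles), so software defaults may give slightly different Q1 and Q3 than this hand method.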


3 - 5 Exploratory Data Analysis
 The Five - Number Summary and Boxplots
1. The 5-Number Summary
1) The lowest value of the data set (Minimum)
2) Q1
3) Q2 = The Median
4) Q3
5) The highest value of the data set (Maximum)
2. A Boxplot
A graph of a data set obtained by drawing a horizontal line from
the minimum data value to Q1, a horizontal line from Q3 to the maximum data
value, and drawing a box whose vertical sides pass through Q1 and
Q3 with a vertical line inside the box passing through the median or Q2.
3. How to make a Boxplot
Step 1) Arrange the data in order.
Step 2) Find Q2 (The Median).
Step 3) Find Q1 & Q3.
Step 4) Draw a scale for the data on the x axis.
Step 5) Locate the lowest value, Q1, the median, Q3, and
the highest value on the scale.
Step 6) Draw a box around Q1 & Q3, draw a vertical line through
the median, and connect the upper and lower values.
4. Information Obtained from a Boxplot
1) If the median is near the center of the box,
the distribution is approximately symmetric.
2) If the median falls to the left of the center of the box,
the distribution is positively skewed.
3) If the median falls to the right of the center,
the distribution is negatively skewed.
4) If the lines are about the same length,
the distribution is approximately symmetric.
5) If the right line is larger than the left line,
the distribution is positively skewed.
6) If the left line is larger than the right line,
the distribution is negatively skewed.

Ex 3-39) A dietitian is interested in comparing the sodium content of
real cheese with the sodium content of a cheese substitute.
The data for two random samples are shown.
Compare the distributions, using boxplots.
Real Cheese : 310 420 45 40 220 240 180 90
Cheese Substitute : 270 180 250 290 130 260 340 310
Step 1) Real cheese : 40 45 90 180 220 240 310 420
        Cheese Substitute : 130 180 250 260 270 290 310 340
Step 2) Q2 (The Median)
        Real cheese : (180 + 220) / 2 = 200 = Q2
        Cheese Substitute : (260 + 270) / 2 = 265 = Q2
Step 3) Q1 & Q3
        Real cheese : (45 + 90) / 2 = 67.5 = Q1 ; (240 + 310) / 2 = 275 = Q3
        Cheese Substitute : (180 + 250) / 2 = 215 = Q1 ; (290 + 310) / 2 = 300 = Q3
Step 4, 5, & 6) Draw both boxplots on one scale from 0 to 500.
        Real cheese : 40 67.5 200 275 420
        Cheese Substitute : 130 215 265 300 340
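As a rough check on the boxplot steps, the five-number summary of each cheese sample can be computed directly. This is only a sketch: it assumes the median-of-halves rule for Q1 and Q3, and `five_number_summary` is an illustrative name, not a library function.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) via median-of-halves."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    q1 = median(values[:half])             # median of the lower half
    q3 = median(values[half + n % 2:])     # median of the upper half
    return values[0], q1, median(values), q3, values[-1]

real_cheese = [310, 420, 45, 40, 220, 240, 180, 90]
cheese_substitute = [270, 180, 250, 290, 130, 260, 340, 310]

real_summary = five_number_summary(real_cheese)
substitute_summary = five_number_summary(cheese_substitute)
print(real_summary)
print(substitute_summary)
```

The five values of each tuple are exactly the points located on the scale in Step 5 of the boxplot procedure.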

** Compare the plots. It is quite apparent that the distribution for the cheese
substitute data has a higher median than the distribution for the real
cheese data. The variation or spread for the distribution of the real cheese data is
larger than the variation for the distribution of the cheese substitute data.

Traditional                 Exploratory Data Analysis
Frequency distribution      Stem and leaf plot
Histogram                   Boxplot
Mean                        Median
Standard deviation          Interquartile range

The three most commonly used measures of central tendency are the mean, median,
and mode.
The three most commonly used measures of variation are the range, variance, and
standard deviation.
The most common measures of position are percentiles, quartiles, and deciles.
The coefficient of variation is used to describe the standard deviation in relationship
to the mean.
These methods are commonly called traditional statistical methods and are
primarily used to confirm various conjectures about the nature of the data.
The boxplot and 5-number summaries are part of exploratory data analysis, which
examines data to see what they reveal.

CH 4
4-1 Sample Space and Counting Rules
 Probability - The chance of an event occurring
1. Probability Experiment
A chance process that leads to well-defined results called outcomes.
(The result is not known in advance of an act.)
2. Outcome; The result of a single trial of a probability experiment
3. Event ( = E )
A subset (a sample from the total) of the given sample space, denoted
by A, B, C, D, etc. (It can consist of more than one outcome.)

Ex 1) A multiple-choice question has 4 possible results
(outcomes): ⓐ, ⓑ, ⓒ, and ⓓ.
Only one of them is the right answer.
What is the chance that a person who guesses gets the right answer? 1 out of 4 = 1/4

Ex 2) Tossing a fair and balanced coin.
(Well-defined outcomes: Head & Tail)
What is the chance of getting "Head"?
2 possible outcomes (Head & Tail = H & T), so 1 out of 2 = 1/2
* Fair - each side (face) is equally likely
* Balanced - it should fall on either side (Head or Tail)

Ex 3) Rolling a die (a six-faced cube numbered from 1 to 6),
what is the probability of getting 4? 1 out of 6 = 1/6

4. Sample Space (= S )
The set (or collection) of all possible outcomes of a probability
experiment
* A die is rolled S = {1, 2, 3, 4, 5, 6} (= set notation)
* A coin is tossed S = {H, T}

Ex 4) A die is rolled S = {1, 2, 3, 4, 5, 6} Let E = {2, 4, 6}
Observing an even number

 Sample Space of Rolling 2 Dice
[Figure: the 36 outcomes of rolling 2 dice]

Experiment       Sample Space
Toss a coin      Head, Tail
Toss 2 coins     H-H, H-T, T-T, T-H
Roll a die       1, 2, 3, 4, 5, 6
Roll 2 dice      1-1, 1-2, 1-3, 1-4, 1-5, 1-6, 2-1, 2-2, 2-3, etc. (36 outcomes)

 Playing Cards in a deck
Diamonds (Red) ; 13 Cards
Spades (Black) ; 13 Cards
Clubs (Black) ; 13 Cards
Hearts (Red) ; 13 Cards
Total: a deck of 52 Cards = 26 Red Cards + 26 Black Cards
Face or picture cards = 12 = 4 Jacks (J) + 4 Queens (Q) + 4 Kings (K)

 Venn Diagram
A diagram used as a pictorial representation for a
probability concept or rule
S = Sample Space = all the possible outcomes
Event A , Event B

You can represent the probability of the events using a Venn diagram from set
theory. (This method can't be used in all cases.)
The rectangle is the Sample Space (S).
The circle (set) of A or B is the event, and they are dependent of each other.
The intersection area of events A and B is a nice correspondence between "events A
and B both occurring" and "being inside both circle A and circle B".
The union area of events A or B covers the combined area of A and B;
when they do not overlap, it is the maximum possible area of A-union-B.

[Venn diagrams] From Ex 2) Tossing a coin: S contains Head and Tail.
From Ex 1) S contains ⓐ, ⓑ, ⓒ, and ⓓ.


From Ex 4) Let E = {2, 4, 6} (Observing an even number)
[Venn diagram] S: 1, 3, 5 lie outside E; 2, 4, 6 lie inside E.

 Tree Diagram; the method of constructing a sample space

P193 [Ex 4 - 4] Gender of Children
a) Find all the possible outcomes when a
married couple has 3 children. (Girls and boys)
1st Child   2nd Child   3rd Child   Outcome
B           B           B           BBB
B           B           G           BBG
B           G           B           BGB
B           G           G           BGG
G           B           B           GBB
G           B           G           GBG
G           G           B           GGB
G           G           G           GGG
n (S) = 8 outcomes
b) Find the probability that all children are boys
P(BBB) = 1/8

Ex 5) A coin is tossed 100 times, find n(S)
S = { HHH…H, …, TTT…T } , n(S) = 2^100
Ex) A coin is tossed only one time : n(S) = 2
    A coin is tossed 3 times : n(S) = 2 × 2 × 2 = 2^3 = 8

 When a coin is tossed N times
Proceeding in the same manner, if a coin is tossed N times, n(S) = 2^N

Ex 6) A coin is tossed 10 times, find P(all are Heads)
n(S) = 2^10 = 1024 , so P(all are Heads) = 1/1024

Ex 7) A die is rolled,
1. Find the odds in favor of getting less than 4.
   3 : 3 = 1 : 1
2. Find the odds in favor of getting less than 5.
   4 : 2 = 2 : 1
3. Find the odds against getting less than 5.
   2 : 4 = 1 : 2
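A tree diagram is just a systematic enumeration, so the same sample spaces can be listed programmatically. The sketch below assumes boys and girls (or heads and tails) are equally likely; `coin_sample_space_size` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# Every branch of the 3-child tree diagram: B or G at each of 3 levels.
sample_space = ["".join(outcome) for outcome in product("BG", repeat=3)]
n_s = len(sample_space)

# Classical probability: favorable outcomes over total outcomes.
p_all_boys = Fraction(sample_space.count("BBB"), n_s)

def coin_sample_space_size(n_tosses):
    """Number of outcomes when a coin is tossed n_tosses times."""
    return 2 ** n_tosses

p_ten_heads = Fraction(1, coin_sample_space_size(10))
print(sample_space, p_all_boys, p_ten_heads)
```

Listing the outcomes is only practical for small N; for 100 tosses only the count 2^100 is useful.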


 Odds
The Actual Odds Against event A occurring are the ratio
P(not A) / P(A), usually expressed in the form of a:b (or "a to b"),
where a and b are integers having no common factors.
The Actual Odds in Favor of event A are the reciprocal of the actual odds
against that event. If the odds against A are a:b,
then the odds in favor of A are b:a.
The Payoff Odds against event A represent the ratio of net profit
(if you win) to the amount bet.
Payoff Odds Against Event A = ( Net profit ) : ( Amount bet )

Favor    Against
#F       #A
#T = number of Total
Number of Total = Number of F + Number of A = n(S)
#A = #T - #F
#F = #T - #A

at least (no less than) ≥ ; at most (no more than) ≤ ; less than < ; greater than >

Ex 8) A card is drawn from a deck (4 + 48 = 52)

 Three Basic Interpretations of Probability
1. Classical Probability
2. Empirical or Relative Frequency Probability
3. Subjective Probability

1. Classical Probability
P(E) = n(E) / n(S) = (number of outcomes in E) / (number of outcomes in S)
a) P(E) = 0 ; an event E is impossible (0%)
   Φ (Phi) = the empty set, no outcome in the sample space
b) P(E) = 1 ; an event E is certain (100%)
The sum of the probabilities of all outcomes in the Sample Space is 1.

Ex 9) A die is rolled, let A = {1}. P(1)?
P(A) = 1/6
Ex 10) A die is rolled. Let B = {2, 4, 1, 3}
P(B) = 4/6 = 2/3
* The order isn't important in set notation.
Ex 11) A die is rolled. Find P(S)
P(S) = 6/6 = 1
Ex 12) An event of observing the 13 when a die is rolled.
13 is not in the sample space, so the event is Φ and P(Φ) = 0


P194 [Ex4-7] Drawing a card from a deck (52 cards)
a) Of getting a Jack
   4/52 = 1/13
b) Of getting the 6 of clubs
   1/52
c) Of getting a 3 or a diamond
   4/52 + 13/52 - 1/52 = 16/52 = 4/13
d) Of getting a 3 or a 6
   4/52 + 4/52 = 8/52 = 2/13

Probability scale:
0 (Impossible) ---- Unlikely ---- 0.5 (Fifty-fifty chance) ---- Likely ---- 1 (Certain)

2. Empirical Probability
Given a frequency distribution, the probability of an event being in a given class is
P(E) = f / n = (frequency for the class) / (total frequencies in the distribution),
and it is based on observation.

p200 [Ex4-13]
Distribution of Blood Type - Find the following probabilities
Type        A    B    AB   O    Total
Frequency   22   5    2    21   50
a) A person has type O blood
   21/50
b) A person has type A or type B blood
   22/50 + 5/50 = 27/50
c) A person has neither type A nor type O blood
   (5 + 2)/50 = 7/50
d) A person doesn't have type AB blood
   1 - 2/50 = 48/50 = 24/25

P201 [Ex4-14]
Number of days maternity patients stayed in the hospital
Number of days stayed   3    4    5    6    7    Total
Frequency               15   32   56   19   5    127
a) A patient stayed exactly 5 days
   56/127
b) Less than 6 days
   (15 + 32 + 56)/127 = 103/127
c) At most 4 days
   (15 + 32)/127 = 47/127
d) At least 5 days
   (56 + 19 + 5)/127 = 80/127
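Empirical probabilities come straight from a frequency table, which makes them easy to sketch in code. The example below uses the maternity-stay frequencies from Ex 4-14; the helper name `empirical_probability` is an assumption made for illustration.

```python
from fractions import Fraction

# Frequency distribution from Ex 4-14: days stayed -> number of patients.
days_stayed = {3: 15, 4: 32, 5: 56, 6: 19, 7: 5}
total = sum(days_stayed.values())

def empirical_probability(condition):
    """P(E) = f / n, summing the frequencies of classes that satisfy condition."""
    favorable = sum(f for days, f in days_stayed.items() if condition(days))
    return Fraction(favorable, total)

p_exactly_5 = empirical_probability(lambda d: d == 5)
p_less_than_6 = empirical_probability(lambda d: d < 6)
p_at_most_4 = empirical_probability(lambda d: d <= 4)
p_at_least_5 = empirical_probability(lambda d: d >= 5)
print(p_exactly_5, p_less_than_6, p_at_most_4, p_at_least_5)
```

Note how "at most" and "at least" translate into `<=` and `>=`, which is where most mistakes in this type of exercise come from.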


3. Subjective Probability
; The type of probability that uses a probability value based on an educated guess or
estimate, employing opinions and inexact information
(based on the person's experience and evaluation of a solution)

 Complementary Events
The complement of an event E (written Ē) is the set of outcomes in S that are not in E.
1. P(Ē) = 1 - P(E)
2. P(E) = 1 - P(Ē)
3. P(E) + P(Ē) = 1
4. "at least one" = complement of "none"
   "none" = complement of "at least one"
   P(at least one) = 1 - P(none)
   P(none) = 1 - P(at least one)

P197 Ex 4-10]
Finding Complements
a) Rolling a die and getting a 4
   Complement: getting a 1, 2, 3, 5, or 6
b) Selecting a month and getting a month that begins with J.
   Complement: getting February, March, April, May, August, September,
   October, November, or December
c) Selecting a day of the week and getting a weekday
   Complement: getting Saturday or Sunday

P205 24] Computers in Elementary School
Elementary and secondary schools were classified by the
number of computers they had.
Choose one of these schools at random.
Computers   1-10   11-20   21-50    51-100   100+
Schools     3170   4590    16,741   23,753   34,803
Find the probability that it has
(Find the total: 83,057. The classes have no intersection.)
a. 50 or fewer computers   (3170 + 4590 + 16,741) / 83,057 = 0.295
b. More than 100 computers   34,803 / 83,057 = 0.419
c. No more than 20 computers   (3170 + 4590) / 83,057 = 0.093
*(in class) Choose class "51-100"

P205 19] Prime Numbers
A prime number is a number that is evenly divisible only by
1 and itself. Those less than 100 are listed below.
2 3 5 7 11 13 17 19 23 29 31 37 41
43 47 53 59 61 67 71 73 79 83 89 97
Choose one at random and find the probability that
a. The number is even
   1/25 (only 2)
b. The sum of the number's digits is even
   13/25
c. The number is greater than 50
   10/25 = 2/5


4-2 The Addition Rules for Probability
 Mutually Exclusive Events
; Probability events that cannot occur at the same time

 Event
1. Simple; can't break the event down further ex) E = {1}
2. Compound; joined with "and" ; "or"

Case 1
(Mutually exclusive events)
P(A or B) = P(A) + P(B)
* In a single trial, event A or B occurs,
  and there is no intersection
* A and B are mutually exclusive
  (i.e., disjoint, A ∩ B = Φ)
Case 2
(Not mutually exclusive events)
P(A or B) = P(A) + P(B) - P(A and B)
* A and B aren't mutually exclusive
  (i.e., A ∩ B ≠ Φ)
Case 3 (an extra case)
P(A or B or C)
= P(A) + P(B) + P(C)
- P(A and B) - P(A and C) - P(B and C)
+ P(A and B and C)

P212 [2] Determine whether these events are mutually exclusive.
a. Roll a die: Get an even number, and get a number less than 3.
b. Roll a die: Get a prime number (2, 3, 5), and get an odd number.
c. Roll a die: Get a number greater than 3,
   and get a number less than 3.
d. Select a student in your class:
   The student has blond hair, and the student has blue eyes.
e. Select a student in your college:
   The student is a sophomore, and the student is a business major.
f. Select any course:
   It is a calculus course, and it is an English course.
g. Select a registered voter: The voter is a Republican,
   and the voter is a Democrat.
Ans: Yes - c, f, and g.

P212 [5] At a convention there are 7 mathematics instructors,
5 computer science instructors, 3 statistics instructors, and 4 science instructors.
If an instructor is selected, find the probability of getting
a science or math instructor.
Total = n(S) = 7 + 5 + 3 + 4 = 19
P(science or math) = 4/19 + 7/19 = 11/19

Ex] A die is rolled one time, find P(E) of getting a 4 or a number less than 6.
E = {1, 2, 3, 4, 5} (4 is already less than 6), so P(E) = 5/6

Ex] A card is drawn randomly from an ordinary deck of 52 cards.
Find P(the card is a diamond or an ace)
13/52 + 4/52 - 1/52 = 16/52 = 4/13

P209 [Ex4- 20]
A single card is drawn at random from an ordinary deck of
cards. Find the probability of either an ace or a black card.
4/52 + 26/52 - 2/52 = 28/52 = 7/13 (likely)
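The addition rule for events that are not mutually exclusive can be checked by counting cards directly. The sketch below builds an ordinary 52-card deck as a set of (rank, suit) pairs; the names `deck` and `prob` are illustrative, and every card is assumed equally likely.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))          # 52 equally likely outcomes

def prob(event):
    """Classical probability of an event given as a set of cards."""
    return Fraction(len(event), len(deck))

aces = {card for card in deck if card[0] == "A"}
diamonds = {card for card in deck if card[1] == "diamonds"}
black = {card for card in deck if card[1] in ("clubs", "spades")}

# P(A or B) = P(A) + P(B) - P(A and B)
p_diamond_or_ace = prob(diamonds) + prob(aces) - prob(diamonds & aces)
p_ace_or_black = prob(aces) + prob(black) - prob(aces & black)
print(p_diamond_or_ace, p_ace_or_black)
```

Counting the union set directly, `prob(diamonds | aces)`, gives the same value, which is a quick way to confirm that the overlap was subtracted exactly once.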

P209 [Ex 4-24]
In a hospital unit, there are 8 nurses and 5 physicians; 7 nurses and
3 physicians are females. Find the probability that the
subject is a nurse or a male.
             Females   Males   Total
Nurses       7         1       8
Physicians   3         2       5
Total        10        3       13
P(nurse or male) = 8/13 + 3/13 - 1/13 = 10/13

p213 [13] P(male or 18~24) = P(male) + P(18~24) – P(male and 18~24)

Ex] In a statistics class there are 18 juniors and 10 seniors;
6 of the seniors are females, and 12 of the juniors are
males. If a student is selected at random, find the probability
of selecting the following:
18 Juniors = 12 males + 6 females
10 Seniors = 4 males + 6 females
28 students = 16 males + 12 females
a. A junior or a female
   18/28 + 12/28 - 6/28 = 24/28 = 6/7
b. A senior or a female
   10/28 + 12/28 - 6/28 = 16/28 = 4/7
c. A junior or a senior
   18/28 + 10/28 = 28/28 = 1

4–3
The Multiplication Rules and Conditional Probability
P (A and B) = P (both A and B)
= (An event A occurs in the 1st trial and
   event B occurs in the 2nd trial)
(* "and" or "both" is in the sentence.)

Case 1 Independent Events
P(A and B) = P(A) · P(B)
When A and B are independent
(i.e., the occurrence of A doesn't affect the probability of
the occurrence of B)

Ex ] Find the probability of getting a Head on the coin and
a 4 on the die
1/2 · 1/6 = 1/12

P220 Ex 4-25]
There are 3 red balls, 2 blue balls, and 5 white balls.
2 balls are selected, and the first is replaced before the second is drawn.
( → with replacement = independent, 2 events)
a. 2 blue balls
   2/10 · 2/10 = 4/100 = 1/25
b. A blue and then a white
   2/10 · 5/10 = 10/100 = 1/10
c. A red and then a blue
   3/10 · 2/10 = 6/100 = 3/50

Case 2 Dependent Events
P(A and B) = P(A) · P(B | A)
where P(B | A) is the probability of B, given that A has already occurred.
(* The event A – the 1st outcome, a given event, or
   previous event - stated in the past tense)
(* The event B – the 2nd outcome or the last event)
This applies when the probability of the occurrence of the event B is affected
by the occurrence of the event A.


P222 Ex 4-30) 3 cards are drawn from a deck and not replaced
(Not replaced = Dependent)
a. Getting 3 Jacks
   4/52 · 3/51 · 2/50 = 24/132,600 = 1/5525
b. Getting an Ace, a King, a Queen (in that order)
   4/52 · 4/51 · 4/50 = 64/132,600 = 8/16,575
c. Getting a club, a spade, a heart (in that order)
   13/52 · 13/51 · 13/50 = 2197/132,600 = 169/10,200
d. Getting 3 clubs
   13/52 · 12/51 · 11/50 = 1716/132,600 = 11/850

Ex) There is a 30% chance to get sick. Find the probability of selecting
2 students who are both sick in the school.
( → It's a dependent case and the probability is already given.)

Conditional Probability
P(B | A) = P(A and B) / P(A)

Ex] A die is rolled twice. Find the probability of getting a 4
after getting an even number.
→ Event A = even number ; 1st outcome
→ Event B = "4" ; 2nd outcome

P225 Ex 4-32]
A box contains black chips and white chips.
A person selects 2 chips without replacement.
The probability of selecting a black chip and a white chip, P(B and W),
and the probability of selecting a black chip on the first draw, P(B), are given.
Find the probability of selecting a white chip on the second draw.
P(W | B) = P(B and W) / P(B)

P225 Ex 4-34]
A recent survey asked 100 people if they thought women in
the armed forces should be permitted to participate in combat.
Gender   Yes   No   Total
Male     32    18   50
Female   8     42   50
Total    40    60   100
a. The respondent answered yes,
   given that the person was a female.
   ( was a female; 1st event, yes; 2nd event)
   P(yes | female) = 8/50 = 4/25
b. The respondent was a male, given that the person answered no
   ( answered no; 1st event, male; 2nd event)
   P(male | no) = 18/60 = 3/10

P230 [33] At an exclusive country club,
68% of the members play bridge and drink champagne,
and 83% play bridge.
If a member is selected at random, find the probability
that the member drinks champagne,
given that he or she plays bridge.
P(champagne | bridge) = 0.68 / 0.83 ≈ 0.819
Try P230 [34]
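Conditional probability restricts attention to the rows or columns where the given event happened. A minimal sketch using the survey counts from Ex 4-34 follows; the dictionary layout and the `conditional` helper are assumptions made for the illustration.

```python
from fractions import Fraction

# Survey counts from Ex 4-34, keyed by (gender, answer).
survey = {
    ("male", "yes"): 32, ("male", "no"): 18,
    ("female", "yes"): 8, ("female", "no"): 42,
}

def conditional(event, given):
    """P(event | given) = count(event and given) / count(given)."""
    both = sum(n for key, n in survey.items() if event(key) and given(key))
    given_total = sum(n for key, n in survey.items() if given(key))
    return Fraction(both, given_total)

p_yes_given_female = conditional(lambda k: k[1] == "yes", lambda k: k[0] == "female")
p_male_given_no = conditional(lambda k: k[0] == "male", lambda k: k[1] == "no")

# With percentages instead of counts, divide P(A and B) by P(A).
p_champagne_given_bridge = 0.68 / 0.83
print(p_yes_given_female, p_male_given_no, round(p_champagne_given_bridge, 3))
```

In both forms the denominator is the "given" event, never the whole sample.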


4 - 4 Counting Rules
1. The Fundamental Counting Rule
In a sequence of n events in which the 1st one has k1 possibilities, the 2nd has k2,
and so on, the total number of possibilities of the sequence is k1 · k2 · k3 ··· kn
; the number of ways a sequence of n events can occur
if the 1st event can occur in k1 ways, the 2nd event can occur
in k2 ways, etc.
a. When events are just listed with "and", it's a counting rule case.
b. Event A, event B and event C = Event A × event B × event C
   (In this case "and" means to multiply)

P233 Ex 4-38] Tossing a coin and rolling a die, find the number of
outcomes for the sequence of events.
( 2 different events = 1st outcome × 2nd outcome)
2 · 6 = 12

P233 Ex 4-38]
A paint manufacturer wishes to manufacture several different paints.
Color     Red, blue, white, black, green, brown, yellow
Type      Latex, oil
Texture   Flat, semi gloss, high gloss
Use       Outdoor, indoor
How many different kinds of paint can be made if a person
selects one color, one type, one texture, and one use?
7 · 2 · 3 · 2 = 84

Ex] How many ways can a dinner patron select from 2 appetizers,
2 drinks, 3 foods, and 2 desserts on the menu?
2 · 2 · 3 · 2 = 24

Ex] The digits 0, 1, 2, 3, and 4 are to be used in a four-digit ID card.
How many different cards are possible
a. if digits can be repeated?
   5 · 5 · 5 · 5 = 625
b. if digits cannot be repeated?
   5 · 4 · 3 · 2 = 120

 Factorial Notation
n! = n (n - 1)(n - 2) ··· 1 , and 0! = 1

P241 [1] How many ways can a baseball manager arrange
a batting order of 9 players? (no repeat)
( → 9 positions, 9 players)
9! = 362,880

Ex ] Florida lottery = {1, 2, 3, …, 53}
Choose any six numbers out of 53 numbers; the
picked numbers are not in order.
53C6 = 22,957,480 = n(S) = total (using calculator) → Very unlikely
S: 53C6 = 22,957,480 outcomes, with one chance to win, so P(win) = 1 / 22,957,480
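The counting-rule answers are plain products, factorials, and combinations, so they can be verified with Python's standard `math` module. The variable names below are illustrative only.

```python
import math
from fractions import Fraction

# Fundamental counting rule: multiply the number of choices at each step.
coin_and_die = 2 * 6
paint_kinds = 7 * 2 * 3 * 2            # color x type x texture x use
id_cards_repeat = 5 ** 4               # digits 0-4, repetition allowed
id_cards_no_repeat = 5 * 4 * 3 * 2     # digits 0-4, no repetition

# Factorial: arrangements of 9 players in 9 batting positions.
batting_orders = math.factorial(9)

# Combination: 6 numbers chosen from 53, order ignored.
lottery_outcomes = math.comb(53, 6)
p_win = Fraction(1, lottery_outcomes)
print(coin_and_die, paint_kinds, batting_orders, lottery_outcomes, p_win)
```

`math.comb` requires Python 3.8 or later; on older versions the same value is `factorial(53) // (factorial(6) * factorial(47))`.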


2. Permutation Rule
An ordered arrangement of different things
nPr = n! / (n - r)!

3. Combination Rule
A set of different objects in which ordering isn't important.
The elements of a combination are usually
listed alphabetically.
nCr = n! / [ (n - r)! r! ]
'n' ; items (all different)
'r' ; items selected out of 'n'

p238 Ex 4-46]
Given the letters A, B, C, and D, list the permutations
and combinations for selecting 2 letters.
Permutation           Combination
AB  BA  CA  DA        AB  BC
AC  BC  CB  DB        AC  BD
AD  BD  CD  DC        AD  CD
12 ways               6 ways

p238 Ex 4-49]
In a club there are 7 women and 5 men. A committee of
3 women and 2 men is to be chosen. How many different
possibilities are there?
7C3 · 5C2 = 35 · 10 = 350

P241 [1] How many 5-digit zip codes are possible
a. if digits can be repeated?
   ( 5 places, 10 digits = 0~9)
   10 · 10 · 10 · 10 · 10 = 100,000
b. if there cannot be repetitions?
   10 · 9 · 8 · 7 · 6 = 30,240

Ex] How many different tests can be made from a test bank
of 20 questions if the test consists of 5 questions?
( order & repetition are not important = Combination)
20C5 = 15,504
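The difference between permutations and combinations is easiest to see by listing both for the same letters. The sketch below uses the standard `itertools` and `math` modules; variable names are illustrative.

```python
import math
from itertools import combinations, permutations

letters = "ABCD"

# Order matters: AB and BA are different permutations.
perms = ["".join(p) for p in permutations(letters, 2)]
# Order does not matter: AB and BA are the same combination.
combs = ["".join(c) for c in combinations(letters, 2)]

committees = math.comb(7, 3) * math.comb(5, 2)   # 3 of 7 women and 2 of 5 men
zip_no_repeat = math.perm(10, 5)                 # 5 distinct digits, in order
tests = math.comb(20, 5)                         # 5 of 20 questions
print(len(perms), len(combs), committees, zip_no_repeat, tests)
```

Each combination of 2 letters corresponds to 2! = 2 permutations, which is why 12 / 2 = 6.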


Ch5
 From Ch1
A Discrete Variable: assumes values that can be counted
A Continuous Variable: can assume all values in the interval between any 2 values

 Discrete Probability Distribution
Consists of the values a random variable can assume and the corresponding probabilities
of the values. The probabilities are determined theoretically or by observation.
P262 Ex5-1] Construct a probability distribution for rolling a die.

Outcome X          1     2     3     4     5     6     ∑
Probability P(X)   1/6   1/6   1/6   1/6   1/6   1/6   1

X has a discrete probability distribution.

 2 Requirements for a Probability Distribution (P.D.)
1. ∑ P(X) = 1
2. 0 ≤ P(X) ≤ 1

P265 Ex 5-4] Determine whether each distribution is a Probability


Distribution (P.D.)
0 5 10 15 20 P.D.

0 2 4 6 No P.D.

-1.0 1.5 0.3 0.2

1 2 3 4 P.D.

2 3 7 No P.D.
0.3 0.4 ∑


 Mean of a Probability Distribution (P.D.)
μ = ∑ X · P(X)
The mean of a random variable with a discrete probability distribution
X ; outcomes
P(X) ; corresponding probability

P262 Ex5-5] Construct a probability distribution for rolling a die and find its mean.
Outcome X          1     2     3     4     5     6     ∑
Probability P(X)   1/6   1/6   1/6   1/6   1/6   1/6   1
μ = (1 + 2 + 3 + 4 + 5 + 6) · 1/6 = 21/6 = 3.5
(A die doesn't have 3.5, but the theoretical average is 3.5.)

P262 Ex5-6] In a family with 2 kids, find the mean of the number
of the kids who will be girls.
# of girls   0 girl   1 girl   2 girls
P(X)         1/4      1/2      1/4
μ = ∑ X · P(X) = 0 · 1/4 + 1 · 1/2 + 2 · 1/4 = 1

p275 #3] X      0      1      2      3      4
         P(X)   0.18   0.44   0.27   0.08   0.03
a. Is this a probability distribution?
   a) all P(X) are between 0 and 1
   b) ∑ P(X) = 0.18 + 0.44 + 0.27 + 0.08 + 0.03 = 1.00
   c) Thus it's a probability distribution.
b. Find its mean.
   μ = 0(0.18) + 1(0.44) + 2(0.27) + 3(0.08) + 4(0.03) = 1.34

p275 #8] X      1      2      3      4
         P(X)   0.32   0.51   0.12   0.05
Is this a probability distribution?
   a) all P(X) are between 0 and 1
   b) ∑ P(X) = 0.32 + 0.51 + 0.12 + 0.05 = 1.00
   c) Thus, with the values shown, it's a probability distribution.

 Binomial Probability Distribution Requirements
1. There must be a fixed number of trials.
2. The probability of a success must remain the same
   for each trial.
3. Each trial can have only two outcomes or
   outcomes that can be reduced to two outcomes.
4. The outcomes of each trial must be independent of each other.

P285 #1] Are they binomial experiments or not?
Yes/No (fixed number of trials, only two outcomes)
1. Surveying 100 people to determine if they like Sudsy Soap.
   Yes (100, like or dislike)
2. Tossing a coin 100 times to see how many heads occur.
   Yes (100, head or tail)
3. Drawing a card from a deck and getting a heart
   Yes (1, heart or no heart)
4. Asking 1000 people which brand of cigarettes they smoke
   No (1000, more than 2 brands)
5. Testing 4 different brands of aspirin to see which brands are
   effective No (no, 4 brands)
6. Testing 1 brand of aspirin by using 10 people to determine
   whether it is effective Yes (10, effective or not)
7. Asking 100 people if they smoke
   Yes (100, smoke or no smoke)
8. Checking 1000 applicants to see whether they were admitted
   to White Oak College Yes (1000, admitted or not)
9. Surveying 300 prisoners to see how many different crimes
   they were convicted of No (300, more than 2 crimes)
10. Surveying 300 prisoners to see whether this is their 1st offence
    Yes (300, 1st offence or not)
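Checking the two requirements and computing the mean of a discrete probability distribution can be sketched as follows. Exact fractions are used to avoid floating-point round-off when testing that the probabilities sum to 1; the function names are assumptions for the illustration.

```python
from fractions import Fraction

def is_probability_distribution(dist):
    """Requirement 1: probabilities sum to 1. Requirement 2: each is in [0, 1]."""
    probs = dist.values()
    return sum(probs) == 1 and all(0 <= p <= 1 for p in probs)

def distribution_mean(dist):
    """Mean of a discrete distribution: sum of X * P(X)."""
    return sum(x * p for x, p in dist.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}
girls = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
# Values from p275 #3, written as exact decimals.
p275_3 = {x: Fraction(p) for x, p in
          zip(range(5), ["0.18", "0.44", "0.27", "0.08", "0.03"])}

print(distribution_mean(die), distribution_mean(girls), float(distribution_mean(p275_3)))
```

With ordinary floats the sum test should use a small tolerance (for example `math.isclose`) instead of `== 1`.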

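 Binomial Probability
P(X) = [ n! / ( (n - X)! X! ) ] · p^X · q^(n - X)
n = number of trials, X = number of successes,
p = probability of a success, q = 1 - p = probability of a failure
 Binomial Probability Using Table B
 Mean, Variance, and Standard Deviation of a Binomial Distribution
Mean μ = n · p
Variance σ² = n · p · q
Standard deviation σ = √(n · p · q)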

P286 # 14] Find the mean, variance, and standard deviation (parts 1 to 8)
P285 #3] Compute the probability of X successes,
using Table B in Appendix C (p636) (parts 1 to 9)
P285 #3] Compute the binomial probability of X successes using the formula (parts 1 to 5)


Ch6
Discrete Random Variable; Binomial Distribution
Continuous Random Variable; Normal distribution interval (a, b)
ex) height, weight, temperature, blood pressure, & time
In theory, a normal distribution curve is the theoretical counterpart to a relative
frequency histogram for a large number of data values with a very small class width.

 Properties of a normal distribution
1. A normal distribution curve is bell-shaped.
2. The mean, median, and mode are equal and are located
   at the center of the distribution.
3. A normal distribution curve is unimodal (i.e., it has only one mode).
4. The curve is symmetric about the mean, which is equivalent to
   saying that its shape is the same on both sides of a vertical line passing
   through the center.
5. The curve is continuous, that is, there are no gaps or holes.
   For each value of X, there is a corresponding value of Y.
6. The curve never touches the x axis.
   Theoretically, no matter how far in either direction the curve extends,
   it never meets the x axis – but it gets increasingly closer.
7. The total area under a normal distribution curve is equal to 1.00,
   or 100%.
8. The area under the part of a normal curve that lies within 1 standard
   deviation of the mean is approximately 0.68, or 68%; within 2 standard
   deviations, about 0.95, or 95%; and within 3 standard deviations,
   about 0.997, or 99.7%. The Empirical rule applies.

In statistics, a standard score is derived by subtracting the population mean from an
individual raw score and then dividing the difference by the population standard deviation:
z = (X - μ) / σ


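 Standard scores are also called z-scores.

 Standard Normal Distribution is a normal distribution with a mean
of 0 and a standard deviation of 1.

 Finding Area Under the Standard Normal Distribution Curve
1. Draw a picture
2. Put the z on the graph and shade the area
3. Find the value of probability (= area) in the table
   (Cumulative Standard Normal Distribution Table)

Case 1 For the area to the left of a specified z value, use the table entry directly.
       P(z < a) = table value for a
Case 2 For the area to the right of a specified z value, subtract the table entry from 1.
       P(z > a) = 1 - P(z < a)
Case 3 For the area between two z values, subtract the smaller table entry from the larger.
       P(a < z < b) = P(z < b) - P(z < a)
Case 4 For the area in two tails, to the left of - a and to the right of b, add the two areas.
       P(z < - a or z > b) = P(z < - a) + P(z > b)
                           = P(z < - a) + [ 1 - P(z < b) ]
       Another way: 1 - P(- a < z < b)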
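The table lookups for areas under the standard normal curve can be reproduced with `statistics.NormalDist` (Python 3.8+), whose `cdf` gives the cumulative area to the left of z. The four helper names below are made up for this sketch and mirror the usual left, right, between, and two-tail cases.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

def area_left(z):
    """Area to the left of z (the cumulative table entry)."""
    return Z.cdf(z)

def area_right(z):
    """Area to the right of z."""
    return 1 - Z.cdf(z)

def area_between(z1, z2):
    """Area between two z values."""
    return Z.cdf(max(z1, z2)) - Z.cdf(min(z1, z2))

def area_tails(a, b):
    """Area to the left of a plus the area to the right of b."""
    return Z.cdf(a) + (1 - Z.cdf(b))

print(round(area_left(1.23), 4), round(area_right(1.23), 4),
      round(area_between(-0.5, 1.5), 4), round(area_tails(-1.96, 1.96), 4))
```

Small differences from a printed table can appear in the last digit because the table rounds z to two decimals and areas to four.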


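 Finding z Value that corresponds to the given area
Find the area inside the table, then read z from the row (first decimal)
and the column (second decimal).

(ex) Area to the left of a is 89.07% = 0.8907
     → row 1.2, column 0.03 → a = 1.23
(ex) Area to the left of - a is 10.93% = 0.1093
     → row - 1.2, column 0.03 → - a = - 1.23
(ex) Area to the right of a is 10.93% = 0.1093
     → the table gives - 1.23 for the left area 0.1093; by symmetry the answer is a = 1.23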


 Normal Distribution
Non-Standard Normal distribution; any mean μ and standard deviation σ
Standard Normal distribution; μ = 0 and σ = 1

 Relationship between x and z
Population                  Sample
z = (X - μ) / σ             z = (X - X̄) / s
X = z · σ + μ               X = z · s + X̄

 Continuous Random Value
Suppose X has a normal distribution with mean μ and standard deviation σ.
A typical score = X ; standard deviation = σ

 Finding probability for a normally distributed variable by transforming it
into a standard normal variable.
Step 1) Find the z value corresponding to a given number X, or X1 and X2:
        z = (X - μ) / σ
Step 2) Draw the figure and represent the area
        (to the left, right, between, or union area of the z)
        (z has been calculated.)
Step 3) Find the probability or the area in the table.


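<Finding probabilities (area) for a normally distributed variable by
transforming it into a standard normal variable by using the formula>
Ex 1] The average or the mean = 3.1 hours. The standard deviation = 0.5
Find the percentage of less than 3.5 hours.
Step 1) z = (3.5 - 3.1) / 0.5 = 0.8
Step 2) Draw the curve and shade the area to the left of z = 0.8
Step 3) Row 0.8, column .00 → 0.7881 = 78.81%

Ex 2 ] The mean is 28 lb, and the standard deviation is 2 lb.
1. Between 27 and 31 lb
Step 1) z1 = (27 - 28) / 2 = - 0.5 , z2 = (31 - 28) / 2 = 1.5
Step 2) Shade the area between z = - 0.5 and z = 1.5
Step 3) 0.9332 - 0.3085 = 0.6247 = 62.47%
2. More than 30.2 lb
Step 1) z = (30.2 - 28) / 2 = 1.1
Step 2) Shade the area to the right of z = 1.1
Step 3) 1 - 0.8643 = 0.1357 = 13.57%

<Finding specific data values for given percentages,
using the standard normal distribution>
Ex 3) To be in the top 10%, the mean is 200 and the standard deviation is 20.
Find the lowest possible score to qualify.
Step 1) Top 10% → area to the left = 1 - 0.10 = 0.90
Step 2) Find the z in the table: the closest area to 0.9000 gives z = 1.28
Step 3) X = z · σ + μ = (1.28)(20) + 200 = 225.6
Step 4) The lowest possible score is 226.

Ex 4) To select the middle 60% of the population, the mean is 120,
and the standard deviation is 8. Find the upper and lower values.
Step 1) 60% = 0.60, so each tail is (1 - 0.60) / 2 = 0.20
Step 2) Find the z values in the table: area 0.2000 → z = - 0.84 , area 0.8000 → z = 0.84
Step 3) Lower: X = (- 0.84)(8) + 120 = 113.28
Step 4) Upper: X = (0.84)(8) + 120 = 126.72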
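The same normal-distribution examples can be checked without converting to z by hand, since `statistics.NormalDist` accepts any mean and standard deviation. This is a sketch: `inv_cdf` plays the role of the reverse table lookup, and its unrounded z values differ slightly from the two-decimal z values read from a table.

```python
from statistics import NormalDist

# Ex 1: mean 3.1 hours, standard deviation 0.5; P(X < 3.5)
hours = NormalDist(mu=3.1, sigma=0.5)
p_less_than_3_5 = hours.cdf(3.5)

# Ex 2: mean 28 lb, standard deviation 2 lb
weights = NormalDist(mu=28, sigma=2)
p_between_27_31 = weights.cdf(31) - weights.cdf(27)
p_more_than_30_2 = 1 - weights.cdf(30.2)

# Ex 3: top 10% cutoff for mean 200, standard deviation 20
cutoff = NormalDist(mu=200, sigma=20).inv_cdf(0.90)

# Ex 4: middle 60% for mean 120, standard deviation 8
pressure = NormalDist(mu=120, sigma=8)
lower, upper = pressure.inv_cdf(0.20), pressure.inv_cdf(0.80)

print(round(p_less_than_3_5, 4), round(p_between_27_31, 4),
      round(p_more_than_30_2, 4), round(cutoff, 2), round(lower, 2), round(upper, 2))
```

The cutoff and the middle-60% bounds land within a few hundredths of the table-based answers.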


Ch 7
<Use the Central Limit Theorem to Solve Problems Involving Sample Means
for Large Samples>

 A Sampling Distribution of Sample Means
A distribution obtained by using the means computed from random samples of a
specific size taken from a population

 Sampling Error
The difference between the sample measure and the corresponding population
measure due to the fact that the sample is not a perfect representation of the
population.

Ch5. Two Requirements for a Probability Distribution
1. 0 ≤ P(X) ≤ 1
2. ∑ P(X) = 1
Two Requirements for Distribution of Sample Means
1. 0 ≤ P(X̄) ≤ 1
2. ∑ P(X̄) = 1

 Properties of the Distribution of Sample Means
1. The mean of sample means will be the same as the population mean
   μ_X̄ = μ
2. The standard deviation of the sample means is σ_X̄ = σ / √n , so
   z = (X̄ - μ) / (σ / √n)
σ is the population standard deviation.
n is the size of sample (or the number of observations in the sample).
z depends on the level of confidence.

 Confidence Interval Estimate of a Parameter, say population mean μ
Population (μ, a Parameter) → Random Sample → X̄ (a Statistic, a random value)
A Statistic ; a characteristic or measure obtained by using
the data values from a sample.
A Parameter ; a characteristic or measure obtained by using
all the data values for a specific population

< Confidence Intervals for the Mean (σ Known or n ≥ 30)
and Sample Size >
 A Point Estimate ( X̄ )
A specific numerical value estimate of a parameter
The best point estimate of the population mean μ is the sample mean X̄
(Consider μ the true value, unknown, and X̄ the best estimated value)

 3 Properties of a Good Estimator
1. The estimator should be an unbiased estimator.
   The expected value or the mean of the estimates obtained from
   samples of a given size is equal to the parameter being estimated.
2. The estimator should be consistent.
   For a consistent estimator, as sample size increases, the value of
   the estimator approaches the value of the parameter estimated.
3. The estimator should be a relatively efficient estimator.
   That is, of all the statistics that can be used to estimate a parameter,
   the relatively efficient estimator has the smallest variance.


An Interval Estimate of a Parameter is an interval or a range of values
used to estimate the parameter.
The estimate may or may not contain the value of the parameter being
estimated.
The Confidence Level of an interval estimate of a parameter is the probability that
the interval estimate will contain the parameter.
A Confidence Interval is a specific interval estimate of a parameter determined by
using data obtained from a sample and by using the specific confidence level of the
estimate.
 Range of values
 Which may contain μ
 It's called an interval estimate of μ; the confidence level is 1 - α
 Probability of success
α is the total area in both tails of the standard normal distribution curve.

an Interval Estimate of μ (the population mean)
Confidence Interval (C.I.)   α                    α (decimal)   α/2
90%                          100% - 90% = 10%     0.10          0.05
95%                          100% - 95% = 5%      0.05          0.025
99%                          100% - 99% = 1%      0.01          0.005
 A 99% Confidence Interval is more likely to contain the parameter than a 90% or 95%
interval because the Confidence Level is larger (the interval is also wider).

1. Confidence Interval estimate of μ
2. Suppose σ is known or n ≥ 30
3. Find the answer with the "Z – table"
4. The Maximum Error of Estimate ( E = Margin of error)
   the maximum likely difference between the point estimate of a parameter and
   the actual value of the parameter.
   E = z(α/2) · (σ / √n)
   When n ≥ 30, s can be used in place of σ: E = z(α/2) · (s / √n)

 The Confidence Interval of the Mean for a Specific α
X̄ - z(α/2) · (σ / √n) < μ < X̄ + z(α/2) · (σ / √n)
X̄ - E < μ < X̄ + E

 The Minimum Sample Size
Required for the confidence interval
n = ( z(α/2) · σ / E )²
**** Round up to next whole number ****
Ex) 0.23 → 1 ; 2.43 → 3 ; 4.91 → 5


P364 #11
A sample of reading scores of 35 students
A. Find the best point estimate of the mean
   The best point estimate of μ is the sample mean X̄
B. Find the 95% confidence interval of the mean reading scores of all students
   → X̄ - E < μ < X̄ + E
[Step 1] α = 1 - 0.95 = 0.05 , α/2 = 0.025
[Step 2] In the table, it's less than 0.50 (50%), so look at the ⊝ table
[Step 3] use only +1.96
         E = 1.96 · (σ / √35)
[Step 4] X̄ - E < μ < X̄ + E


<How large a sample is necessary to make an accurate estimate?>
a. Depends on 3 things
   E (the maximum error of estimate)
   σ (the population standard deviation)
   The degree of confidence (90%, 95%, 99%, etc.)
b. Use the Minimum Sample Size formula
   n = ( z(α/2) · σ / E )²

P365 #21
A university dean of students wishes to estimate the average number of hours
students spend doing homework per week.
The standard deviation is 6.2 hours.
How large a sample must be selected, if he wants to be 99% confident of finding
whether the true mean differs from the sample mean by 1.5 hours?
→ To find the Minimum Sample Size
[Step 1] α = 1 - 0.99 = 0.01 , α/2 = 0.005
[Step 2] In the table, it's less than 0.50 (50%), so look at the ⊝ table
[Step 3] use only +2.575
         n = ( 2.575 · 6.2 / 1.5 )² = 113.28 → 114
* Round up to next whole number

<Confidence Intervals for the Mean (σ Unknown)
and Sample Size n < 30>
1. z -- Characteristics of the Distribution
1. A normal distribution curve is bell-shaped
2. The mean, median, and mode are equal and are located
   at the center of the distribution
3. A normal distribution curve is unimodal (i.e., it has only one mode)
4. The curve is symmetric about the mean
5. The curve is continuous
6. The curve never touches the x axis.
2. t -- Characteristics of the Distribution
 The variance is greater than 1.
 A family of curves based on the concept of "Degrees of Freedom",
  which is related to sample size.
  Degrees of Freedom = d.f. = n - 1
 As the sample size increases, the t distribution approaches
  the standard normal distribution.
*At the bottom of the table where d.f. = ∞, the z(α/2) values can be found for
specific confidence intervals
 The normal distribution graph has only one curve,
  but the t – distribution graph changes & it depends on n.

 A Specific Confidence Interval of the Mean,
When σ is Unknown and Sample Size n < 30
X̄ - t(α/2) · (s / √n) < μ < X̄ + t(α/2) · (s / √n)
X̄ - E < μ < X̄ + E , where E = t(α/2) · (s / √n)
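The minimum sample size calculation is short enough to sketch directly. The function name `minimum_sample_size` is illustrative; the critical value is passed in as read from the z table, and the result is always rounded up.

```python
import math

def minimum_sample_size(z_critical, sigma, max_error):
    """n = (z * sigma / E)^2, rounded up to the next whole number."""
    return math.ceil((z_critical * sigma / max_error) ** 2)

# P365 #21: 99% confidence (z = 2.575), sigma = 6.2 hours, E = 1.5 hours.
n_required = minimum_sample_size(2.575, 6.2, 1.5)
print(n_required)
```

`math.ceil` is used instead of `round` because rounding down would leave the margin of error larger than requested.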


 How to find the Confidence Interval Estimate of p373 #20


When is Unknown and Sample Size A sample of data; 61, 12, 6, 40, 27, 38, 93, 5, 13, 40
[Step 1] d.f. = n -1 Construct a 98% confidence interval based on the data.
[Step 2] Critical Values = Use the t- distribution table; [Step1] d.f. = n -1 = 10-1 =9
[Step 2] Critical Values = Use the t- distribution table;
left colum of d.f.'s the number & top row (Confidence Interval)
which is given 90%, 95%, 99%, etc.  left colum of d.f. 9 & top row (Confidence Interval) 98%  2.821 =
[Step 3] The Maximum Error of Estimate [Step 3] s =?

⁄ ∑ ̅ ∑ ̅

√ √
√ √
finding sample standard diviation (p136) ⁄
* Rounding 2 decimal places (dp = 2 CF significant figure) √ √
[Step 4] [Step 4] = 98% (given)
[Step 5] the Confidence Interval Estimate of [Step 5] the Confidence Interval Estimate of
̅ ̅ ∑
̅
Ex) Find the ⁄ value for a 95% confidence interval, sample size is 22. ̅ ̅
[Step 1] the d.f. = 22-1 =21 & C.I. =95%
[Step 2] Use the t- distribution table;
left colum 21 & top row (Confidence Interval) 95%  2.080
❖ When to Use the z or t Distribution (p371)
Is σ known?
  Yes → Use $z_{\alpha/2}$ values, no matter what the sample size is.
  No  → Is n ≥ 30?
          Yes → Use $z_{\alpha/2}$ values and s in place of σ in the formula.
          No  → Use $t_{\alpha/2}$ values and s in the formula.
                The variable must be approximately normally distributed.

p372 #11
n = 28, 95% confidence interval, sample standard deviation s = 2
[Step 1] d.f. = n - 1 = 28 - 1 = 27
[Step 2] Critical value: use the t-distribution table;
left column d.f. 27 & top row (confidence interval) 95% → $t_{\alpha/2}$ = 2.052
[Step 3] $E = t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 2.052\left(\frac{2}{\sqrt{28}}\right) \approx 0.78$
[Step 5] The confidence interval estimate of μ
$\bar{X} - E < \mu < \bar{X} + E$, with $\bar{X} = \frac{\sum X}{n}$
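The z-or-t decision chart can be written as a small function. This is a sketch of the chart as described in these notes; the function name and the returned labels are hypothetical.

```python
def choose_distribution(sigma_known: bool, n: int) -> str:
    """Follow the z-or-t chart: sigma known? then n >= 30?"""
    if sigma_known:
        # Sigma known: z values, whatever the sample size.
        return "z with sigma"
    if n >= 30:
        # Sigma unknown, large sample: z values with s in place of sigma.
        return "z with s"
    # Sigma unknown, small sample: t values with s
    # (the variable must be approximately normally distributed).
    return "t with s"


print(choose_distribution(sigma_known=False, n=28))
```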


p372 #5 (H.W. #20)
99% confidence interval estimate of μ
[Step 1] d.f. = n - 1 = 20 - 1 = 19
[Step 2] Critical value: use the t-distribution table;
left column d.f. 19 & top row (confidence interval) 99% → $t_{\alpha/2}$ = 2.861
[Step 3] $E = t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$
[Step 4] The confidence interval estimate of μ
$\bar{X} - E < \mu < \bar{X} + E$

❖ Percentiles: divide the data set into 100 equal groups. (p151)
❖ Quartiles: divide the distribution into 4 equal groups.

p325 #30
The scores are normally distributed, with a mean of 62 and a standard deviation of 8.
a. Find the 67th percentile. (similar to H.W. #10)
[Step 1] 67% = 0.67 → in the table, z = 0.44
[Step 2] $X = z\sigma + \mu = 0.44(8) + 62 = 65.52$, rounded up to 66
b. Find $Q_3$ (3rd quartile).
[Step 1] 75% = 0.75 → in the table,
0.7486 (0.0014 difference) → closer → z = 0.67
0.7517 (0.0017 difference)
[Step 2] $X = z\sigma + \mu = 0.67(8) + 62 = 67.36$, rounded up to 68
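The table lookup in p325 #30 can be cross-checked with Python's standard library. This is only a sketch: `NormalDist.inv_cdf` returns the exact z value rather than the nearest table entry, so the results differ slightly from the hand calculation with z = 0.44 and z = 0.67.

```python
from statistics import NormalDist

# Scores: normal with mean 62 and standard deviation 8 (p325 #30).
scores = NormalDist(mu=62, sigma=8)

z67 = NormalDist().inv_cdf(0.67)   # z value with 67% of the area to its left
p67 = scores.inv_cdf(0.67)         # 67th percentile of the scores
q3 = scores.inv_cdf(0.75)          # 3rd quartile (75th percentile)

print(f"z for 0.67: {z67:.2f}")
print(f"67th percentile: {p67:.2f}")
print(f"Q3: {q3:.2f}")
```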

H.W. #7)
[Step 1]
[Figure: standard normal curve with -a, 0, and a marked on the axis]
[Step 2]


P325 #12
N = 4, sample size n = 3
Values: 90, 150, 110, 190

Ex) N = 4, n = 2
Values: 2, 6, 4, 8

$\bar{X} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{X} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
σ = the population standard deviation.
n = the size of sample (or the number of observations in the sample).
$z_{\alpha/2}$ depends on the level of confidence.
$1 - \alpha$ = confidence level
E = the maximum error of estimate, or margin of error
The degree of confidence (90%, 95%, 99%, etc.)
d.f. = degrees of freedom

Ch 8

❖ Statistical Hypothesis: a conjecture about a population parameter. This conjecture may or may not be true.

❖ Hypothesis-Testing Common Phrases (P400)
≥                             ≤                          >                 <
at least                      at most                    Is greater than   Is less than
no less than                  no more than               Is above          Is below
Is greater than or equal to   Is less than or equal to   Is increased      Is decreased or reduced from
                                                         Is longer than    Is shorter than
                                                         Is bigger than    Is smaller than
                                                         Is higher than    Is lower than
=   Is equal to, is the same as, has not changed from, is exactly the same as
≠   Is not equal to, is not the same as, has changed from, is different from

❖ Two Types of Statistical Hypotheses for Each Situation
Hypothesis or suggestion (= claim)    Between a parameter and a specific value, or between 2 parameters
Null ($H_0$)                           No difference
Alternative ($H_1$) (= research)       A difference


< Traditional Method >

1. Statistical Test
Uses the data obtained from a sample to make a decision about whether $H_0$ (the null hypothesis) should be rejected.
2. Test Value
The numerical value obtained from a statistical test.

Tails                Null ($H_0$)   Alternative ($H_1$)
Two-tailed test      μ = k          μ ≠ k
Right-tailed test    μ = k          μ > k
Left-tailed test     μ = k          μ < k

❖ Six Steps of Hypothesis-Testing
1. The Null Hypothesis ($H_0$)
2. The Alternative Hypothesis ($H_1$)
3. Test Statistic (T.S.) (or Test Value = T.V.) for z or t
   $z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
   $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
4. Critical Values (C.V.s) using α
5. Decision: always about $H_0$ (Null); 2 options
   [Reject $H_0$ / Do not reject $H_0$]
6. Conclusion: about the claim, which could be $H_0$ or $H_1$

                      Claim is H0   Claim is H1
   Reject H0               B             A
   Do not reject H0        A             B

A: There is enough evidence to support the claim that ~
   There is not enough evidence to reject the claim that ~
B: There is not enough evidence to support the claim that ~
   There is enough evidence to reject the claim that ~

Critical values of z
One-tail α   0.25   0.20   0.10   0.05    0.025   0.02   0.01   0.005
Two-tail α   0.50   0.40   0.20   0.10    0.05    0.04   0.02   0.01
z            0.67   0.84   1.28   1.645   1.96    2.05   2.33   2.58
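The z row of this table can be reproduced from the one-tail areas. The following is a sketch using the standard library's `NormalDist`; the exact values differ from the table only in rounding.

```python
from statistics import NormalDist

one_tail_alphas = [0.25, 0.20, 0.10, 0.05, 0.025, 0.02, 0.01, 0.005]
table_z = [0.67, 0.84, 1.28, 1.645, 1.96, 2.05, 2.33, 2.58]

std_normal = NormalDist()  # mean 0, standard deviation 1

# The critical value leaves an area of alpha in the right tail.
computed_z = [std_normal.inv_cdf(1 - a) for a in one_tail_alphas]

for alpha, z_table, z_calc in zip(one_tail_alphas, table_z, computed_z):
    print(f"one-tail alpha = {alpha:<6} table z = {z_table:<6} computed z = {z_calc:.4f}")
```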


Ex) State the null and alternative hypotheses for each conjecture.
1. A researcher thinks that if expectant mothers use vitamin pills, the weight of the babies will increase. The average birth weight of the population is 8.6 lb.
Ans.: Increase → $H_1: \mu > 8.6$
      Average of the population is 8.6 lb. → $H_0: \mu = 8.6$
2. An engineer hypothesizes that the mean number of defects can be decreased in a manufacturing process of compact disks by using robots instead of humans for certain tasks. The mean number of defective disks per 1000 is 18.
Ans.: Decrease → $H_1: \mu < 18$
      Average is 18 → $H_0: \mu = 18$
3. A psychologist feels that playing soft music during a test will change the results of the test. The psychologist is not sure whether the grades will be higher or lower. In the past, the mean of the scores was 73.
Ans.: Change (whether higher or lower) → $H_1: \mu \neq 73$
      Mean was 73 → $H_0: \mu = 73$

❖ Hypothesis Testing & a Jury Trial
                                      H0 True (Innocent)        H0 False (Not Innocent)
Reject H0 (find guilty)               Type I Error              Correct Decision
                                      P(Type I Error) = α
Do not reject H0 (find not guilty)    Correct Decision          Type II Error
                                                                P(Type II Error) = β

❖ Type I Error: if $H_0$ is true, reject $H_0$.
❖ Type II Error: if $H_0$ is false, do not reject $H_0$.

❖ Type I Error: There is not sufficient evidence to support the claim
(There is enough evidence to reject the claim.)
Probability of type I error = P(type I error) = α
Type II Error: There is sufficient evidence to support the claim
(There is not enough evidence to reject the claim.)
Probability of type II error = P(type II error) = β

* T.S. = Test Statistic (or Test Value = T.V.)
* C.V. = Critical Values (at a significance level of α)
 Situation in Hypothesis Testing ̅
1. If $H_0$ is true
2. If $H_0$ is false

$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
σ = the population standard deviation.
n = the size of sample

4. We are testing if μ is outside the confidence interval.
5. We are testing whether $\bar{X}$ is in one of the rejection regions.


< Z Test for a Mean >

Ex) Two-tailed Case
[Idea: $H_1: \mu \neq 24{,}267$ → 2-tailed test, n = 35 ≥ 30 → z table]
Step 1) $H_0: \mu = 24{,}267$
Step 2) $H_1: \mu \neq 24{,}267$ → Claim
Step 3) $z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = 1.01$
Step 4) Critical Values (C.V.s) using z table: ±2.58
[Figure: reject regions below -2.58 (C.V.) and above 2.58 (C.V.); do-not-reject region between them; T.S. = 1.01]
Step 5) Decision: always about $H_0$ (Null); 2 options
Do not reject $H_0$ (Null), because the test value is in the nonrejection (noncritical) region.
(Rejecting $H_0$ would mean $H_0$ (Null): false, $H_1$ (Alternative): true.)
Step 6) Conclusion (about the claim)
There is not sufficient evidence to support the claim that the average cost is different from $24,267.

Ex) Left-tailed Case
[Idea: $H_1: \mu < 80$ → left-tailed test, n = 36 ≥ 30 → z table]
Step 1) $H_0: \mu = 80$
Step 2) $H_1: \mu < 80$ → Claim
Step 3) $z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = -1.56$
Step 4) Critical Values (C.V.s) using z table: -1.28
[Figure: reject region below -1.28 (C.V.); nonreject region above it, centered at 0; T.S. = -1.56]
Step 5) Decision: always about $H_0$ (Null); 2 options
The decision is to reject $H_0$ (Null).
Step 6) Conclusion (about the claim that μ < 80)
There is sufficient evidence to support the claim that the average cost is less than $80.

*** Fail to Reject = Do not Reject
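Steps 4 and 5 of the traditional method can be sketched in code. The function below is an illustration, not the notes' own procedure; its name is hypothetical, and the α values in the usage lines (0.01 and 0.10) are assumptions inferred from the critical values ±2.58 and -1.28 in the two examples.

```python
from statistics import NormalDist


def z_decision(test_value: float, alpha: float, tail: str) -> str:
    """Compare a z test value with the critical value(s) and decide about H0."""
    std_normal = NormalDist()
    if tail == "two":
        cv = std_normal.inv_cdf(1 - alpha / 2)   # critical values are -cv and +cv
        reject = abs(test_value) > cv
    elif tail == "left":
        cv = std_normal.inv_cdf(alpha)           # negative critical value
        reject = test_value < cv
    elif tail == "right":
        cv = std_normal.inv_cdf(1 - alpha)       # positive critical value
        reject = test_value > cv
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    return "reject H0" if reject else "do not reject H0"


print(z_decision(1.01, alpha=0.01, tail="two"))    # two-tailed example
print(z_decision(-1.56, alpha=0.10, tail="left"))  # left-tailed example
```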


Ex) Right-tailed Case
[Idea: $H_1: \mu > 42{,}000$ → right-tailed test, n = 30 ≥ 30 → z table]
Step 1) $H_0: \mu = 42{,}000$
Step 2) $H_1: \mu > 42{,}000$ → Claim
Step 3) $z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = 1.32$
Step 4) Critical Values (C.V.s) using z table: 1.65
[Figure: nonreject region, centered at 0, up to 1.65 (C.V.); reject region above it; T.S. = 1.32]
Step 5) Decision: always about $H_0$ (Null); 2 options
The decision is not to reject $H_0$ (Null).
Step 6) Conclusion (about the claim)
There is not sufficient evidence to support the claim that the average cost is more than $42,000.

❖ Hypothesis Test: Wording of Final Conclusion
1. Does the claim have equality ( = )?
   Yes ($H_0$)   Reject = Cancel → B
                 Do not reject = Keep → A
   No ($H_1$)    Reject = Keep → A
                 Do not reject = Cancel → B

A: There is enough evidence to support the claim that ~
   There is not enough evidence to reject the claim that ~
B: There is not enough evidence to support the claim that ~
   There is enough evidence to reject the claim that ~

❖ Hypothesis Test: Wording of Final Conclusion ( 1 )
A: There is sufficient evidence to support the claim that ~
B: There is not sufficient evidence to support the claim that ~

[Figure 1: test value z; C.V. = -1.96 and 1.96; T.S. = 2.05]
A (because the claim is true)    B (because the claim could be false)
[Figure 2: test value z; C.V. = -1.96 and 1.96; T.S. = 1.65]
B (because the claim is false)   A (because the claim could be true)
[Figure 3: test value z; T.S. = -1.75; C.V. = -1.645]
A (because the claim is true)    B (because the claim could be false)
[Figure 4: test value z; C.V. = -1.645; T.S. = -0.3]
B (because the claim is false)   A (because the claim could be true)
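The wording rule (does the claim have equality, and was the null rejected?) reduces to a two-input lookup. The sketch below uses the shorter A/B wording; the function name and argument names are hypothetical.

```python
def conclusion(claim_is_null: bool, reject_null: bool) -> str:
    """Return wording A or B for the final conclusion about the claim."""
    # A applies when the decision agrees with the claim:
    #   claim is H0 and H0 is not rejected, or claim is H1 and H0 is rejected.
    if claim_is_null != reject_null:
        return "A: There is sufficient evidence to support the claim."
    return "B: There is not sufficient evidence to support the claim."


# Claim is H1 (no equality) and H0 is rejected -> wording A.
print(conclusion(claim_is_null=False, reject_null=True))
```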


< P-Value Method >

(P-value, p-value, or Probability Value)
The probability of getting a sample statistic (such as the mean) or a more extreme sample statistic in the direction of $H_1$ (the alternative hypothesis) when $H_0$ (the null hypothesis) is true.
The actual area under the standard normal distribution curve representing the probability of a particular sample statistic or a more extreme sample statistic occurring if $H_0$ (the null hypothesis) is true.

❖ Guidelines for P-values
P-value                   $H_0$ (Null)     The difference is
P-value ≤ 0.01            Reject           Highly significant
0.01 < P-value ≤ 0.05     Reject           Significant
0.05 < P-value ≤ 0.10     Type I error (consider its consequences before rejecting)
0.10 < P-value            Do not reject    Not significant

*** P-value is all about area (probability)

❖ Six Steps of Hypothesis-Testing
1. The Null Hypothesis ($H_0$)
2. The Alternative Hypothesis ($H_1$)
3. Test Statistic (T.S.) (or Test Value = T.V.) for z or t by respective formulas
   $z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
   $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
4. Find the P-value using the z or t table (about the probability) with the T.S.

   Type of test    P-value
   Two-tailed      Twice the area to the left of the test statistic (T.S. on the left), or
                   twice the area to the right of the test statistic (T.S. on the right)
   Left-tailed     Area to the left of the test statistic
   Right-tailed    Area to the right of the test statistic

5. Decision: about $H_0$ (Null) [Reject $H_0$ if P-value ≤ α / Do not reject $H_0$ if P-value > α]
6. Conclusion: about the claim (it is $H_0$ or $H_1$)

A: There is sufficient evidence to support the claim that ~
B: There is not sufficient evidence to support the claim that ~


Ex) A significance level of α is used in testing the claim that ~ , and the sample data result in a test statistic of z = 1.18.
[Idea: → right-tailed test]
Step 1) z = 1.18 with z table → 0.1190
Step 2) P-value = P(z > 1.18) = 0.1190
[Figure: area to the right of 1.18 (T.S.); P-value = 0.1190]
Step 3) Fail to reject $H_0$
Step 4) The P-value of 0.119 is relatively large, indicating that the sample results could easily occur by chance.

Ex) A significance level of α = 0.05 is used in testing the claim that ~ , and the sample data result in a test statistic of z = 2.34.
[Idea: → two-tailed test]
Step 1) z = 2.34 with z table → 0.0096
P-value = 0.0096 × 2 = 0.0192
Step 2) α/2 = 0.025
P-value/2 = 0.0192/2 = 0.0096
[Figure: tail areas beyond -2.34 (T.S.) and 2.34 (T.S.)]
P-value 0.0192 < 0.05 = α
Step 3) Reject $H_0$
Step 4) The P-value of 0.0192 is small, indicating that the sample results are not likely to occur by chance.

Ex) It is claimed that the average age of lifeguards in a city is greater than 24 years. A sample of 36 guards has a mean of 24.7 years and a standard deviation of 2 years.
Is there evidence to support the claim at α = 0.05?
[Idea: $H_1: \mu > 24$ → right-tailed test, n = 36 ≥ 30 → z table]
Step 1) $H_0: \mu = 24$
Step 2) $H_1: \mu > 24$ → Claim
Step 3) $z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{24.7 - 24}{2/\sqrt{36}} = 2.10$
Step 4) Find the P-value using the z table
P-value = P(z > 2.10) = P(z < -2.10) (right-tailed → symmetric) = 0.0179
[Figure: nonreject region, centered at 0; reject region with α = 0.05; T.S. = 2.10]
Step 5) Decision: always about $H_0$ (Null)
P-value = 0.0179 < α = 0.05; reject $H_0$
Step 6) Conclusion (also about the claim)
There is sufficient evidence to support the claim that the average age is greater than 24 years.
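The P-values in these examples are tail areas under the standard normal curve, so they can be checked directly. This is a sketch with a hypothetical helper `p_value`; the exact areas differ from the table values only in the last digit.

```python
from statistics import NormalDist


def p_value(z: float, tail: str) -> float:
    """Area under the standard normal curve that is at least as extreme as z."""
    std_normal = NormalDist()
    if tail == "right":
        return 1 - std_normal.cdf(z)            # area to the right of z
    if tail == "left":
        return std_normal.cdf(z)                # area to the left of z
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))  # twice the one-tail area
    raise ValueError("tail must be 'right', 'left', or 'two'")


print(f"{p_value(1.18, 'right'):.4f}")  # right-tailed, z = 1.18
print(f"{p_value(2.34, 'two'):.4f}")    # two-tailed, z = 2.34
print(f"{p_value(2.10, 'right'):.4f}")  # lifeguard example, z = 2.10
```

A P-value at or below α leads to rejecting the null hypothesis; a larger one does not.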


< t Test for a Mean >

Ex) A job director claims that the average starting salary for nurses is $24,000. A sample of 10 nurses has a mean of $23,450 and a standard deviation of $400.
Is there enough evidence to reject the director's claim at α = 0.05?
Assume the variable is normally distributed.
[Idea: $H_0: \mu = 24{,}000$ → 2-tailed test, n = 10 < 30 → t table]

Step 1) $H_0: \mu = 24{,}000$ → Claim
Step 2) $H_1: \mu \neq 24{,}000$
Step 3) $t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{23{,}450 - 24{,}000}{400/\sqrt{10}} = -4.35$

Step 4) Critical Values (C.V.s) using t table: d.f. = 10 - 1 = 9 → ±2.262
[Figure: reject regions below -2.262 (C.V.) and above 2.262 (C.V.); do-not-reject region between them; T.S. = -4.35]
Step 5) Decision: always about $H_0$ (Null); 2 options
Reject $H_0$ (Null), because the test value is in the rejection (critical) region.

Step 6) Conclusion (about the claim)
There is not enough evidence to support the claim that the starting salary of nurses is $24,000.
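The test value for the nurses example can be recomputed in a few lines. This is a sketch; the critical value 2.262 is the one shown in the example's figure (t table, d.f. = 9, two-tailed), typed in rather than computed.

```python
from math import sqrt

# Nurses example: claimed mean 24,000; sample of 10 with mean 23,450 and s = 400.
x_bar, mu_0, s, n = 23_450, 24_000, 400, 10
critical_value = 2.262   # from the t table, d.f. = n - 1 = 9, two-tailed

t = (x_bar - mu_0) / (s / sqrt(n))       # test value
reject_null = abs(t) > critical_value    # two-tailed rejection rule

print(f"t = {t:.2f}, reject H0: {reject_null}")
```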

Ex) Find the P-value when the t test value is 2.983, the sample size is 6, and the test is two-tailed.

Step 1) d.f. = 6 - 1 = 5; look for 2.983 in the d.f. = 5 row of the t table, using the two-tailed α values.
Step 2) 2.983 falls between 2.571 & 3.365 → α = 0.05 & α = 0.02
Step 3) 0.02 < P-value < 0.05
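Bracketing a P-value from a t table is a search along one row. The sketch below hard-codes the d.f. = 5 row with its two-tailed α headings as found in a standard t table (an assumption, since only 2.571 and 3.365 appear in these notes); the helper name is hypothetical.

```python
# d.f. = 5 row of a standard t table: (two-tailed alpha, critical value).
ROW_DF5 = [(0.20, 1.476), (0.10, 2.015), (0.05, 2.571), (0.02, 3.365), (0.01, 4.032)]


def bracket_p_value(t: float, row: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (lower, upper) bounds on the two-tailed P-value for test value t."""
    t = abs(t)
    if t < row[0][1]:
        return (row[0][0], 1.0)          # smaller than every table entry
    for (alpha_hi, cv_lo), (alpha_lo, cv_hi) in zip(row, row[1:]):
        if cv_lo <= t < cv_hi:
            return (alpha_lo, alpha_hi)  # t lies between two neighbouring entries
    return (0.0, row[-1][0])             # larger than every table entry


low, high = bracket_p_value(2.983, ROW_DF5)
print(f"{low} < P-value < {high}")
```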


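Tip!

❖ Using the z or t Test
$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ (σ known)
$z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$ (σ unknown, n ≥ 30)
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$ (σ unknown, n < 30)

❖ The claim is right →
There is enough evidence to support the claim that ~
(or There is not enough evidence to reject the claim that ~)

❖ The claim is wrong →
There is not enough evidence to support the claim that ~
There is enough evidence to reject the claim that ~

❖ Six Steps of Hypothesis-Testing
1. The Null Hypothesis ($H_0$) - Prove Innocent
2. The Alternative Hypothesis ($H_1$) - Prove not Innocent
3. Test Statistic (T.S.) (or Test Value = T.V.) for z or t, by respective formulas
4. Critical Values (C.V.s) using z or t table
5. Decision: always about $H_0$ (Null); 2 options [Reject $H_0$ / Do not reject $H_0$]
6. Conclusion: always about the claim, which is $H_0$ or $H_1$

Ch 9

<Testing the Difference between 2 Means>

❖ Assumptions for the Test to Determine the Difference between 2 Means
1. The samples must be independent of each other. (They are not related.)
2. Normally distributed (symmetric)
3. Three Types of Formulas
1) The standard deviations of the variable must be known, and $n_1 \geq 30$, $n_2 \geq 30$ (both)
$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

                 $H_0$               $H_1$
Two-tailed       $\mu_1 = \mu_2$     $\mu_1 \neq \mu_2$
Left-tailed      $\mu_1 = \mu_2$     $\mu_1 < \mu_2$
Right-tailed     $\mu_1 = \mu_2$     $\mu_1 > \mu_2$

2) The standard deviations of the variable are unknown, and $n_1 \geq 30$, $n_2 \geq 30$
$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
3) The standard deviations of the variable are unknown, and n < 30 (one or both) → t-table

❖ The Confidence Interval for Difference between 2 Means
($n_1 \geq 30$ and $n_2 \geq 30$, both)
$(\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$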


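Ex) The average hotel room rate in N.Y. is $88.42 and in Miami is $80.61. Two samples of 50 hotels each were taken; the standard deviations were $5.62 & $4.83. At α = 0.05, can it be concluded that there is a significant difference in the rates?
[Idea: $\bar{X}_1 = 88.42$, $\bar{X}_2 = 80.61$, $\sigma_1 = 5.62$, $\sigma_2 = 4.83$, $n_1 = n_2 = 50 \geq 30$ → z table; "difference" → 2-tailed test]
Step 1) $H_0: \mu_1 = \mu_2$
Step 2) $H_1: \mu_1 \neq \mu_2$ → Claim
Step 3) Test Statistic (T.S.) (or Test Value = T.V.)
$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(88.42 - 80.61) - 0}{\sqrt{\frac{5.62^2}{50} + \frac{4.83^2}{50}}} = 7.45$
Step 4) Critical Values (C.V.s) using z table
α/2 = 0.025 → using z table → ±1.96
Step 5) Decision: always about $H_0$ (Null) → Reject $H_0$
[Figure: reject regions below -1.96 and above +1.96; do-not-reject region between them; T.S. = 7.45]
Step 6) Conclusion: about the claim →
There is enough evidence to support the claim that $\mu_1 \neq \mu_2$ (there is a significant difference in the rates).

Ex) There are 2 groups: those who left their profession within a few months after graduation (leavers) and those who remained in their profession after they graduated (stayers).
Test the claim that those who stayed had a higher science grade point average than those who left. Use α = 0.01.
[Idea: Leavers: $\bar{X}_1$, $s_1$, $n_1$; Stayers: $\bar{X}_2$, $s_2$, $n_2$; "stayers higher" → left-tailed test]
Step 1) $H_0: \mu_1 = \mu_2$
Step 2) $H_1: \mu_1 < \mu_2$ → Claim
Step 3) Test Statistic (T.S.) (or Test Value = T.V.)
$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = -2.01$
Step 4) Critical Values (C.V.s) using z table
α = 0.01 → using z table → -2.33
Step 5) Decision: always about $H_0$ (Null) → Do not reject $H_0$
[Figure: reject region below -2.33; do-not-reject region above it; T.S. = -2.01]
Step 6) Conclusion: about the claim
There is not enough evidence to support the claim that $\mu_1 < \mu_2$ (that stayers had a higher science grade point average than leavers).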
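The hotel-rate test value can be verified with the two-sample z formula. This is a sketch of that one calculation; the critical value 1.96 is the one shown in the example's figure, and the variable names are my own.

```python
from math import sqrt

# Hotel example: N.Y. (sample 1) and Miami (sample 2), 50 hotels each.
x_bar_1, x_bar_2 = 88.42, 80.61
sigma_1, sigma_2 = 5.62, 4.83
n_1 = n_2 = 50
critical_value = 1.96   # two-tailed, from the z table

# Under H0 the difference between the population means is 0.
standard_error = sqrt(sigma_1**2 / n_1 + sigma_2**2 / n_2)
z = ((x_bar_1 - x_bar_2) - 0) / standard_error
reject_null = abs(z) > critical_value

print(f"z = {z:.2f}, reject H0: {reject_null}")
```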


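<Testing the Difference between 2 Means: Small Dependent Samples>

           Two-tailed        Left-tailed     Right-tailed
$H_0$      $\mu_D = 0$       $\mu_D = 0$     $\mu_D = 0$
$H_1$      $\mu_D \neq 0$    $\mu_D < 0$     $\mu_D > 0$

❖ Six Steps of Hypothesis-Testing
Step 1) The Null Hypothesis ($H_0$)
Step 2) The Alternative Hypothesis ($H_1$)
Step 3) Test Statistic (T.S.) (or Test Value = T.V.) for z or t by respective formulas
* Compute the differences $D = X_1 - X_2$, then $\sum D$ and $\sum D^2$
$\bar{D} = \frac{\sum D}{n}$
$s_D = \sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{n}}{n - 1}}$
$t = \frac{\bar{D} - \mu_D}{s_D/\sqrt{n}}$
Step 4) Critical Values (C.V.s) using t table, degrees of freedom d.f. = n - 1
Step 5) Decision: always about $H_0$ (Null); 2 options [Reject $H_0$ / Do not reject $H_0$]
Step 6) Conclusion: always about the claim, which is $H_0$ or $H_1$

A: There is sufficient evidence to support the claim that ~
B: There is not sufficient evidence to support the claim that ~

Ex) A dietitian wishes to see if a person's cholesterol level will change if the diet is supplemented by a certain mineral.
Six subjects were pretested, and then they took the mineral supplement for a 6-week period. The results are shown in the table.
Can it be concluded that the cholesterol level has been changed at α = 0.10?

Subject   1     2     3     4     5     6
Before    210   235   208   190   172   244
After     190   170   210   188   173   228

[Idea: before and after on the same subjects → dependent samples; "change" → 2-tailed test; n = 6 < 30 → t table]
Step 1) $H_0: \mu_D = 0$
Step 2) $H_1: \mu_D \neq 0$ → Claim
Step 3) Test Statistic (T.S.) (or Test Value = T.V.)

Before ($X_1$)   After ($X_2$)   $D = X_1 - X_2$   $D^2$
210              190             20                400
235              170             65                4,225
208              210             -2                4
190              188             2                 4
172              173             -1                1
244              228             16                256
Σ = 1,259        Σ = 1,159       ΣD = 100          ΣD² = 4,890

$\bar{D} = \frac{\sum D}{n} = \frac{100}{6} = 16.7$
$s_D = \sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{n}}{n - 1}} = \sqrt{\frac{4{,}890 - \frac{100^2}{6}}{5}} = 25.4$
$t = \frac{\bar{D} - \mu_D}{s_D/\sqrt{n}} = \frac{16.7 - 0}{25.4/\sqrt{6}} = 1.61$
Step 4) Critical Values (C.V.s) using t table
α = 0.10 (two-tailed) → using t table, d.f. = 6 - 1 = 5 → ±2.015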
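The paired-sample computation in Steps 3 and 4 of the cholesterol example can be sketched as follows. The critical value 2.015 is taken from the t table (d.f. = 5); the variable names are my own.

```python
from math import sqrt

before = [210, 235, 208, 190, 172, 244]
after = [190, 170, 210, 188, 173, 228]
critical_value = 2.015   # from the t table, d.f. = 5, two-tailed

# Differences D = X1 - X2 for each subject.
differences = [b - a for b, a in zip(before, after)]
n = len(differences)
sum_d = sum(differences)
sum_d2 = sum(d * d for d in differences)

d_bar = sum_d / n                                   # mean of the differences
s_d = sqrt((sum_d2 - sum_d**2 / n) / (n - 1))       # std. deviation of the differences
t = (d_bar - 0) / (s_d / sqrt(n))                   # test value under H0: mu_D = 0
reject_null = abs(t) > critical_value

print(f"sum D = {sum_d}, sum D^2 = {sum_d2}")
print(f"D_bar = {d_bar:.1f}, s_D = {s_d:.1f}, t = {t:.2f}, reject H0: {reject_null}")
```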


Step 5) Decision: always about $H_0$ (Null) → Do not reject $H_0$
[Figure: reject regions below -2.015 and above +2.015; do-not-reject region between them]
Step 6) Conclusion: about the claim →
There is not enough evidence to support the claim that $\mu_D \neq 0$ (that the mineral changes a person's cholesterol level).

Ch 10

<Linear Correlation Coefficient r>
❖ A scatter plot: a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.

❖ The Correlation Coefficient r
Computed from the sample data, it measures the strength and direction of a linear relationship between two variables.
The symbol for the sample correlation coefficient is r.
The symbol for the population correlation coefficient is ρ (Greek letter rho).

❖ Range of Values for the Correlation Coefficient

-1          -0.5        0               0.5         +1
Perfect     Strong      No Linear       Strong      Perfect
Negative    Negative    Relationship    Positive    Positive
L.C.C.      L.C.C.      L.C.C.          L.C.C.      L.C.C.
* L.C.C. = Linear Correlation Coefficient

< Line of Best Fit >
$\hat{y} = a + bx$
* In algebra, $y = mx + b$
  In statistics, $\hat{y} = a + bx$

❖ Relationship Between the Correlation Coefficient and the Line of Best Fit
[Figure: six scatter plots with lines of best fit]
(a) r = 0.50    (b) r = 0.90 (strong)    (c) r = 1.00 (perfect)
(d) r = -0.50   (e) r = -0.90 (strong)   (f) r = -1.00 (perfect)



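*** When the line is horizontal, r = 0.

❖ Four Steps of Hypothesis-Testing
Step 1) Make a table of x, y, xy, x², y² and find Σx, Σy, Σxy, Σx², Σy²
Step 2) The correlation coefficient r
$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$
Step 3) The values of a and b
$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$
$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$
Step 4) When r is significant, the regression line equation
$\hat{y} = a + bx$

Ex) In a study of age and systolic blood pressure of 6 randomly selected subjects:
Subject      A     B     C     D     E     F
Age x        43    48    56    61    67    70
Pressure y   128   120   135   143   141   152

Step 1)
x          y          xy             x²             y²
43         128        5504           1849           16384
48         120        5760           2304           14400
56         135        7560           3136           18225
61         143        8723           3721           20449
67         141        9447           4489           19881
70         152        10640          4900           23104
Σx = 345   Σy = 819   Σxy = 47,634   Σx² = 20,399   Σy² = 112,443

Step 2) The correlation coefficient r
$r = \frac{6(47{,}634) - (345)(819)}{\sqrt{[6(20{,}399) - 345^2][6(112{,}443) - 819^2]}} = 0.897$
* The correlation coefficient suggests a strong positive relationship between age and blood pressure.
Step 3) The values of a and b
$a = \frac{(819)(20{,}399) - (345)(47{,}634)}{6(20{,}399) - 345^2} = 81.048$
$b = \frac{6(47{,}634) - (345)(819)}{6(20{,}399) - 345^2} = 0.964$
Step 4) When r is significant, the regression line equation
$\hat{y} = 81.048 + 0.964x$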
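The sums, r, a, and b for the age and blood pressure data can be recomputed from the raw pairs. The sketch below follows the computational formulas for r, a, and b directly; the prediction at x = 50 is included as a usage check, and the variable names are my own.

```python
from math import sqrt

ages = [43, 48, 56, 61, 67, 70]                 # x
pressures = [128, 120, 135, 143, 141, 152]      # y
n = len(ages)

# Step 1: the five sums.
sum_x = sum(ages)
sum_y = sum(pressures)
sum_xy = sum(x * y for x, y in zip(ages, pressures))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in pressures)

# Step 2: correlation coefficient r.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2)
)

# Step 3: intercept a and slope b of the regression line.
a = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x**2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)

# Step 4: regression line y_hat = a + b x, used to predict at x = 50.
y_hat_50 = a + b * 50

print(f"r = {r:.3f}, a = {a:.3f}, b = {b:.3f}")
print(f"predicted pressure at age 50: {y_hat_50:.0f}")
```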

[Figure: scatter plot of pressure y (110 to 150) against age x (40 to 70)]
Plot (43, 128), (48, 120), (56, 135), (61, 143), (67, 141), & (70, 152)

Ex) Using the equation, predict the blood pressure for a person who is 50 years old.
[Idea: 50 years old → x value]
$\hat{y} = 81.048 + 0.964(50) = 129.248 \approx 129$
In other words, the predicted systolic blood pressure for a 50-year-old person is 129.

Source
Elementary Statistics - A Brief Version
http://rchsbowman.wordpress.com/category/statistics/statistics-notes/page/7/
http://statistics.laerd.com/statistical-guides/normal-distribution-calculations.php
http://rchsbowman.wordpress.com/2009/12/01/statistics-notes-%E2%80%93-the-standard-normal-distribution-2/
http://www.analyzemath.com/statistics/mutually-exclusive.html

 The t Test for the Correlation coefficient
