STA2023 Summary Notes: Chapter 1 - 10
Dr. Mohammad Shakil
Editor: Jeongmin Correa
Contents
Chapter 1: The Nature of Probability and Statistics
Chapter 2: Frequency Distributions and Graphs
Chapter 7: Confidence Intervals and Sample Size
Chapter 8: Hypothesis Testing
Chapter 9: Testing the Difference Between Two Means, Two Variances, and Two Proportions
Chapter 10: Correlation and Regression

Ch1
1 - 1 Descriptive and Inferential Statistics
Statistics
The methods of classification and analysis of numerical & non-numerical data for drawing valid conclusions and making reasonable decisions.
< Two Major Areas of Statistics >
* Probability: the chance of an event occurring.
Ex) Cards, dice, bingo, & lotteries
In order to gain information about seemingly haphazard events, statisticians study random variables.
1. Variables
A variable is a characteristic or an attribute that can assume different values.
Ex) Height, weight, temperature, number of phone calls received, etc.
2. Random Variables
Variables whose values are determined by chance
When the population is large and diverse, a sampling method must be designed so
that the sample is representative, unbiased and random, i.e. every subject (or element)
in the population has an equal chance of being selected for the sample.
1. Random Sampling
This method requires that each member of the population be identified and
assigned a number.
Then a set of numbers drawn randomly from this list forms the required random
sample.
Note that each member of the population has an equal chance of
being selected.
Ex) For a large population, computers are used to generate random
numbers which contain series of numbers arranged in random order.
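The random sampling method above can be sketched in a few lines of Python. This is only an illustration: the function name `simple_random_sample`, the population size of 500, and the seed are assumptions, not part of the notes.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Number the members 1..population_size and draw sample_size of them
    without replacement, so every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)

# Hypothetical population of 500 numbered members, sample of 10
sample = simple_random_sample(population_size=500, sample_size=10, seed=1)
print(sorted(sample))
```

Fixing the seed only makes the illustration repeatable; a real study would not reuse a seed.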
3. Stratified Sampling
This method requires that the population be classified into a number of smaller
homogeneous strata or subgroups.
A sample is drawn randomly from each stratum.
= Subdivide the population into at least 2 different subgroups
(or strata) so that subjects within the same subgroup share the same characteristics (such as
gender or age bracket), then draw a sample from each subgroup.
Ex) age, sex, marital status, education, religion, occupation, ethnic
background or virtually any characteristic.
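A minimal sketch of stratified sampling, assuming a made-up population of 60 people split into two age brackets. The names `stratified_sample`, `stratum_of`, and `per_stratum` are hypothetical helpers for illustration only.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, per_stratum, seed=None):
    """Group subjects into strata, then draw a random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    return {
        name: rng.sample(members, min(per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical population: (id, age bracket)
people = [(i, "under 30" if i % 3 else "30 and over") for i in range(1, 61)]
picked = stratified_sample(people, stratum_of=lambda p: p[1], per_stratum=5, seed=1)
for bracket, members in picked.items():
    print(bracket, [person_id for person_id, _ in members])
```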
Ex) A community can be divided into city blocks as its clusters. Several blocks
are then randomly selected. After this, residents on the selected blocks are
randomly chosen, providing a sampling of the entire community.
A measure of reliability is a statement (usually quantified) about the degree of
uncertainty associated with a statistical inference.
The only exception is if the 1st or the last class starts with 'zero' frequency.
5. The classes must be equal in width.
The only exception is an open-ended class (below, and more, etc.).
Ex1)
Class Limits    Class Boundaries
24 – 30         (24 – 0.5) – (30 + 0.5) = 23.5 – 30.5
31 – 37         (31 – 0.5) – (37 + 0.5) = 30.5 – 37.5
Ex2)
Class Limits    Class Boundaries
2.3 – 2.9       (2.3 – 0.05) – (2.9 + 0.05) = 2.25 – 2.95
3.0 – 3.6       (3.0 – 0.05) – (3.6 + 0.05) = 2.95 – 3.65
10. Relative Frequency = frequency ÷ total number = f ⁄ n
11. Percent = (f ⁄ n) × 100%
12. Midpoint = (lower limit + upper limit) ⁄ 2

P38 Ex) Distribution of Blood Types - Categorical F. Distribution
A B B AB O O O B AB B B B O
A O A O O O AB AB A O B A
Blood Type A: 5 people    Blood Type B: 7 people
Blood Type O: 9 people    Blood Type AB: 4 people    Total: 25 people
Class    Frequency    Relative F.    Percent (%)
A        5            5⁄25           20%
B        7            7⁄25           28%
O        9            9⁄25           36%
AB       4            4⁄25           16%
Total    ∑f = 25      1              100%
Class    Cumulative Frequency
A        5
B        5+7 = 12
O        12+9 = 21
AB       21+4 = 25 (= ∑f)

P41 Ex 2-2) Record High Temperatures - Grouped F. Distribution
112 100 127 120 134 118 105 110 109 112 110 118 117 116 118
122 114 114 105 109 107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104
111 120 113 120 117 105 110 118 112 114 114
Solution)
1. Range = Highest value – Lowest value = 134 – 100 = 34
2. The number of classes desired should be between 5 and 20. → 7 classes
3. The Class Width = Range ÷ the number of classes = 34 ÷ 7 = 4.9 → 5
(Round up to the next whole number)
4. Select the starting point for the lowest class limit. → 100
5. Subtract one unit from the lower limit of the second class to get the upper limit of
the 1st class.
Then add the width to each upper limit to get all the upper limits.
100-104, 105-109, 110-114, 115-119, 120-124, 125-129, 130-134
6. Find boundaries.
Lower Boundary = Lower Limit – 0.5 (or 0.05)
Upper Boundary = Upper Limit + 0.5 (or 0.05)
(0.5 or 0.05 depends on the number of decimal places in the data)
7. Tally & Frequency: Count the number of data values in each class
8. Find the sum of all the frequencies.
9. Cumulative Frequency: adding the frequencies of the classes less
than or equal to the upper class boundary of a specific class.
** The cumulative frequency of the last class and the sum of the frequencies must be the same.
10. Relative Frequency = frequency ÷ total number = f ⁄ n for each class
11. Percent = (f ⁄ n) × 100%
12. Midpoint
Class      Frequency    Relative F.    Percent    Midpoint    Cumulative F.
100-104    2            2⁄50           4%         102         2
105-109    8            8⁄50           16%        107         10
110-114    18           18⁄50          36%        112         28
115-119    13           13⁄50          26%        117         41
120-124    7            7⁄50           14%        122         48
125-129    1            1⁄50           2%         127         49
130-134    1            1⁄50           2%         132         50
Total      ∑f = 50      1              100%

Histogram
Displays the data by using continuous vertical bars (unless the frequency of a
class is 0) of various heights to represent the frequencies of the classes
Frequency Polygon
Displays the data by using lines that connect points plotted for frequencies for the classes.
(starts from zero)
The frequencies are represented by the height of the points.
Ogive (= Cumulative frequency graph)
Represents the cumulative frequencies for the classes in a frequency distribution
***Note: Those three graphs are used when the data are contained in a grouped
frequency distribution
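The steps of Ex 2-2 can be reproduced with a short sketch. The data are the 50 record high temperatures from the notes; the variable names (`temps`, `rows`, and so on) are illustrative assumptions.

```python
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112, 110, 118, 117, 116, 118,
    122, 114, 114, 105, 109, 107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104,
    111, 120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

width = 5            # class width from step 3 (34 / 7 rounded up)
lowest_limit = 100   # starting point from step 4
classes = [(lo, lo + width - 1) for lo in range(lowest_limit, 135, width)]

n = len(temps)
cumulative = 0
rows = []
for lower, upper in classes:
    f = sum(lower <= t <= upper for t in temps)   # tally for this class
    cumulative += f
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - 0.5, upper + 0.5),
        "f": f,
        "relative": f / n,
        "percent": 100 * f / n,
        "midpoint": (lower + upper) / 2,
        "cumulative": cumulative,
    })

for r in rows:
    print(r["limits"], r["boundaries"], r["f"], r["relative"], r["midpoint"], r["cumulative"])
```

The last cumulative frequency equals the sum of the frequencies (50), which is the check recommended in step 9.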
[Figure: bar graph of the blood type frequencies (A, B, O, AB)]
< Frequency Graph >    < Cumulative Frequency Graph >
[Figures: frequency graph and cumulative frequency graph of the blood types A, B, O, AB]
Ex) Drawing Graphs of Grouped Frequency Distribution from Ex 2-2)
[Figure: histogram of Ex 2-2; x-axis: class boundaries 99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5; y-axis: frequency]
Frequency Polygon
*** Using midpoints for the x-axis and frequencies for the y-axis ***
[Figure: frequency polygon of Ex 2-2; x-axis: midpoints 102, 107, 112, 117, 122, 127, 132; y-axis: frequency]
Ogive
*** Using class boundaries for the x-axis and cumulative frequencies for the y-axis ***
[Figure: ogive of Ex 2-2; x-axis: class boundaries 99.5 to 129.5; y-axis: cumulative frequency 0 to 50]
Frequency Distribution
[Figure: bar graph of the classes 100-104, 105-109, 110-114, 115-119, 120-124, 125-129, 130-134]
Time Series Graph: represents data that occur over a specific period of time
(Temperatures over a 24-hour period)
[Figure: time series graph; x-axis: 12AM, 3AM, 6AM, 9AM, 12PM, 3PM, 6PM, 9PM; y-axis: temperature 50 to 80]
[Figure: pie graph of the blood types A, B, O, AB]
Step 2) % = (f ⁄ n) × 100%, where n = ∑f: to show each category as a percentage
The sum of the degrees or percentages does not always equal 360° or 100%, due to
rounding
<Scatter Plots>
A graph of ordered pairs of data values that is used to determine if a relationship exists
between the two values
Possible patterns: no linear relationship, positive linear relationship, negative linear relationship
Ex)
No. of Accidents, x:  376  650  884  1162  1513  1650  2236  3002  4028
Fatalities, y:        5    20   20   28    26    34    35    56    68
[Figure: scatter plot; x-axis: number of accidents 0 to 5000; y-axis: fatalities 0 to 80]
Step 1) Arrange the data in order
Step 2) Separate the data according to the first digit
Step 3) A display can be made by using the leading digit as the stem
and the trailing digit as the leaf.
** If there are no data values in a class, you should write the stem number and leave
the leaf row blank. Do not put a zero in the leaf row.
Ex) 24 32 2 56 44 2 13 32 44 31 32 14 105 23 20
Step 1) 2 13 14 20 23 24 31 32 32 32 44 44 56 105
Step 2) 02
13 14
20 23 24
31 32 32 32
44 44
56
105
Step 3) Stem (Leading Digit)    Leaf (Trailing Digit)
0     2
1     3 4
2     0 3 4
3     1 2 2 2
4     4 4
5     6
10    5
Ex) Atlanta: 26 29 30 31 36 36 40 40 50 52 60
N.Y. : 25 31 31 32 36 39 40 43 51 52 56
Step 1) It's arranged already.
Atlanta: 26 29 30 31 36 36 40 40 50 52 60
N.Y. : 25 31 31 32 36 39 40 43 51 52 56
Step 3)
Atlanta    Stem    N.Y.
9 6        2       5
6 6 1 0    3       1 1 2 6 9
0 0        4       0 3
2 0        5       1 2 6
0          6

Statistic: a characteristic or measure obtained by using
the data values from a sample
Parameter: a characteristic or measure obtained by using
all the data values from a specific population
[Figure: population S = {1, 2, 3, 4, 5, 6} with the sample {2, 4, 6} marked]
Ex) Statistic: 2, 4, 6 (a sample)
Parameter: 1, 2, 3, 4, 5, 6 (population)

Mean (= Arithmetic Average): affected by the highest and lowest values
Sample mean: $\bar{X} = \frac{\sum X}{n}$
Population mean: $\mu = \frac{\sum X}{N}$
** n = the number of values in the sample
** N = the number of values in the population
Ex) 2 6 9 10 5 7
$\bar{X} = \frac{\sum X}{n} = \frac{39}{6} = 6.5$

Median (MD): the midpoint of the data array
1) Arrange all the data in order
2) Select the midpoint
3) If there are 2 middle numbers, add the 2 numbers
and then divide by 2.
** Data array = the ordered data set
Ex) 3 5 4 9 2 3 4 6 10
2 3 3 4 4 5 6 9 10 → 4 is MD
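The two small examples for the mean and the median can be checked with Python's standard `statistics` module. The variable names are illustrative assumptions.

```python
import statistics

mean_data = [2, 6, 9, 10, 5, 7]
median_data = [3, 5, 4, 9, 2, 3, 4, 6, 10]

sample_mean = statistics.mean(mean_data)        # sum of values / n
sample_median = statistics.median(median_data)  # sorts the data, then takes the midpoint

print(sample_mean)
print(sample_median)
```

For an even number of values, `statistics.median` averages the two middle values, which matches rule 3) above.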
Miami Dade College -- Hialeah Campus
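Population variance and standard deviation:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$,  $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

Weighted Mean: (ex) GPA
$\bar{X} = \frac{\sum wX}{\sum w}$
Ex) Course     Credits (w)    Grade (X)
Math           3              A (4 points)
English        4              C (2 points)
Biology        2              B (3 points)
$\bar{X} = \frac{3(4) + 4(2) + 2(3)}{3 + 4 + 2} = \frac{26}{9} \approx 2.9$

Sample Variance and Standard Deviation
The formula $\frac{\sum (X - \bar{X})^2}{n}$ is not usually used, since in most cases the purpose of calculating the statistic is to
estimate the corresponding parameter.
Dividing by n – 1 instead gives a slightly larger value and an unbiased estimate of the
population variance ($\sigma^2$).
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$,  $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
***Shortcut or Computation Formulas (no need to find $\bar{X}$)
$s^2 = \frac{n \sum X^2 - (\sum X)^2}{n(n - 1)}$,  $s = \sqrt{\frac{n \sum X^2 - (\sum X)^2}{n(n - 1)}}$

Distribution Shapes
Positively skewed (right skewed)    Negatively skewed (left skewed)    Symmetric (bell shape, evenly distributed)
[Figures: three histograms showing a positively skewed, a negatively skewed, and a symmetric distribution]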
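Ex) 131p Find the population variance and population standard deviation
Comparison of outdoor paint (how long each will last before fading)
A 10 60 50 30 40 20
B 35 45 30 35 40 25
Step1) A: $\mu = \frac{\sum X}{N} = \frac{210}{6} = 35$
Step2) B: $\mu = \frac{\sum X}{N} = \frac{210}{6} = 35$
Step3) Range A: 60 – 10 = 50 months
B: 45 – 25 = 20 months
Step4) Variance
A: 10-35=-25, 60-35=25, 50-35=15, 30-35=-5, 40-35=5, 20-35=-15
B: 35-35=0, 45-35=10, 30-35=-5, 35-35=0, 40-35=5, 25-35=-10
A: $\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{1750}{6} \approx 291.7$
B: $\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{250}{6} \approx 41.7$
Step5) Standard deviation
A: $\sigma = \sqrt{291.7} \approx 17.1$
B: $\sigma = \sqrt{41.7} \approx 6.5$

Variance and Standard Deviation for Grouped Data
- Uses the midpoint of each class
Ex) Class        Frequency (f)    Midpoint ($X_m$)
5.5 - 10.5       1                8
10.5 - 15.5      2                13
15.5 - 20.5      3                18
20.5 - 25.5      5                23
25.5 - 30.5      4                28
30.5 - 35.5      3                33
35.5 - 40.5      2                38
Step 1) Find the midpoint of each class.
Step 2) Multiply each frequency by the midpoint and by the squared midpoint, then find the sums:
n = ∑f = 20,  $\sum f \cdot X_m = 490$,  $\sum f \cdot X_m^2 = 13310$
Step 3) $s^2 = \frac{n(\sum f \cdot X_m^2) - (\sum f \cdot X_m)^2}{n(n - 1)} = \frac{20(13310) - 490^2}{20(19)} \approx 68.7$,  $s = \sqrt{68.7} \approx 8.3$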
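The paint comparison and the grouped-data example can be verified with a short sketch. `statistics.pvariance` and `statistics.pstdev` divide by N (population formulas); the helper `grouped_sample_variance` is a hypothetical name that applies the shortcut formula with the class midpoints.

```python
import math
import statistics

paint_a = [10, 60, 50, 30, 40, 20]
paint_b = [35, 45, 30, 35, 40, 25]

# Population variance and standard deviation (divide by N)
var_a, sd_a = statistics.pvariance(paint_a), statistics.pstdev(paint_a)
var_b, sd_b = statistics.pvariance(paint_b), statistics.pstdev(paint_b)

def grouped_sample_variance(midpoints, freqs):
    """Shortcut formula for grouped data: [n*sum(f*Xm^2) - (sum(f*Xm))^2] / [n(n-1)]."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

midpoints = [8, 13, 18, 23, 28, 33, 38]
freqs = [1, 2, 3, 5, 4, 3, 2]
grouped_var = grouped_sample_variance(midpoints, freqs)
grouped_sd = math.sqrt(grouped_var)

print(round(var_a, 1), round(sd_a, 1))
print(round(var_b, 1), round(sd_b, 1))
print(round(grouped_var, 1), round(grouped_sd, 1))
```

Both paints have the same mean of 35 months, but brand A varies far more than brand B.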
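[Figure: number line marked $\bar{X} - 3s$, $\bar{X} - 2s$, $\bar{X}$, $\bar{X} + 2s$, $\bar{X} + 3s$]
Ex 3-25) p140 The mean of the number of sales of cars over a 3-month period is
87, and the standard deviation is 5. The mean of the commissions is $5225, and the
standard deviation is $773.
Compare the variations of the two.
-Solution
Coefficient of variation: $CVar = \frac{s}{\bar{X}} \cdot 100\%$
Sales: $CVar = \frac{5}{87} \cdot 100\% \approx 5.7\%$
Commissions: $CVar = \frac{773}{5225} \cdot 100\% \approx 14.8\%$
Since the coefficient of variation is larger for commissions, the
commissions are more variable than sales.

ex) The mean price of houses in a certain neighborhood is $50,000,
and the standard deviation is $10,000. Find the price range
for which at least 75% of the houses will sell.
-Solution
By Chebyshev's theorem, at least $1 - \frac{1}{k^2}$ of the values fall within k standard deviations of the mean; for k = 2 this is $1 - \frac{1}{4} = 75\%$.
$50,000 – 2($10,000) = $30,000 and $50,000 + 2($10,000) = $70,000
Hence, at least 75% of all homes sold in the area will have a price range from
$30,000 to $70,000.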
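A small sketch of the two calculations above, the coefficient of variation and the Chebyshev range. The function names are illustrative assumptions.

```python
def coefficient_of_variation(mean, sd):
    """CVar = (s / mean) * 100, expressed as a percent."""
    return sd / mean * 100

def chebyshev_interval(mean, sd, k):
    """At least 1 - 1/k^2 of the values lie within k standard deviations of the mean."""
    return (1 - 1 / k ** 2, mean - k * sd, mean + k * sd)

cvar_sales = coefficient_of_variation(87, 5)
cvar_commissions = coefficient_of_variation(5225, 773)
fraction, low_price, high_price = chebyshev_interval(50_000, 10_000, k=2)

print(round(cvar_sales, 1), round(cvar_commissions, 1))
print(fraction, low_price, high_price)
```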
Step 3) A student whose score was 12 did better than 65% of the class.
[Figure: number line marked $\bar{X} - 3s$, $\bar{X} - 2s$, $\bar{X} - 1s$, $\bar{X}$, $\bar{X} + 1s$, $\bar{X} + 2s$, $\bar{X} + 3s$]

Finding a Value Corresponding to a Given Percentile
Ex 3-35) From Ex 3-32, find the value corresponding to the 60th percentile.
Step 2) The 6th value (= c) is 10 and the 7th value (= c + 1) is 12.
(10 + 12) ÷ 2 = 11
Hence, 11 corresponds to the 60th percentile.
Anyone scoring 11 would have done better than 60% of the class.

Quartiles ( Qn )
: Position in fourths that a data value holds in the distribution
Step 1) Arrange the data in order from lowest to highest
Step 2) Divide into 4 groups
[Figure: Q1, Q2, Q3 dividing the data into four groups of 25% each; deciles dividing the data into ten groups of 10% each]

Outliers
- An outlier is an extremely high or an extremely low data value
when compared with the rest of the data values.
- Strongly affects the mean and standard deviation
Step 1) Arrange the data in order and find Q1 and Q3.
Step 2) Find the interquartile range = IQR = Q3 - Q1
Step 3) Find (1.5) IQR
Step 4) Find Q1 - [ (1.5) IQR ] and Q3 + [ (1.5) IQR ]
Step 5) Check the data set for any value that is smaller than
Q1 - [ (1.5) IQR ] or larger than Q3 + [ (1.5) IQR ]
[Figure: number line showing Q1 - (1.5) IQR, Q1, Q2, Q3, Q3 + (1.5) IQR, with the IQR spanning Q1 to Q3]
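The five outlier steps can be sketched as follows. The data set `scores` is hypothetical (it does not come from the notes), and the quartiles are computed as medians of the lower and upper halves, which is an assumption about the textbook method; other software may use slightly different quartile rules.

```python
def median_of(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves
    (the middle value is left out of both halves when n is odd)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return median_of(ordered[:half]), median_of(ordered[-half:])

def find_outliers(data):
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    flagged = [x for x in data if x < low_fence or x > high_fence]
    return flagged, (low_fence, high_fence)

scores = [5, 6, 12, 13, 15, 18, 22, 50]  # hypothetical data
outliers, fences = find_outliers(scores)
print(outliers, fences)
```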
3 - 5 Exploratory Data Analysis
The Five-Number Summary and Boxplots
1. The 5-Number Summary
1) The lowest value of the data set (Minimum)
2) Q1
3) Q2 = The Median
4) Q3
5) The highest value of the data set (Maximum)
2. A Boxplot
A graph of a data set obtained by drawing a horizontal line from
the minimum data value to Q1, a horizontal line from Q3 to the maximum data
value, and drawing a box whose vertical sides pass through Q1 and
Q3 with a vertical line inside the box passing through the median or Q2.
3. How to make a Boxplot
Step 1) Arrange the data in order.
Step 2) Find Q2 (The Median).
Step 3) Find Q1 & Q3.
Step 4) Draw a scale for the data on the x axis.
Step 5) Locate the lowest value, Q1, the median, Q3, and
the highest value on the scale.
Step 6) Draw a box around Q1 & Q3, draw a vertical line through
the median, and connect the upper and lower values.

Ex 3-39) A dietitian is interested in comparing the sodium content of
real cheese with the sodium content of a cheese substitute.
The data for two random samples are shown.
Compare the distributions, using boxplots.
Real Cheese: 310 420 45 40 220 240 180 90
Cheese Substitute: 270 180 250 290 130 260 340 310
Step 1) Real cheese : 40 45 90 180 220 240 310 420
Cheese Substitute : 130 180 250 260 270 290 310 340
Step 2) Q2 (The Median)
Real cheese : (180 + 220) ÷ 2 = 200 = Q2
Cheese Substitute : (260 + 270) ÷ 2 = 265 = Q2
Step 3) Q1 & Q3
Real cheese : (45 + 90) ÷ 2 = 67.5 = Q1    (240 + 310) ÷ 2 = 275 = Q3
Cheese Substitute : (180 + 250) ÷ 2 = 215 = Q1    (290 + 310) ÷ 2 = 300 = Q3
Step 4, 5, & 6)
Real cheese : 40  67.5  200  275  420
Cheese Substitute : 130  215  265  300  340
[Figure: boxplots of the real cheese and cheese substitute data on a common scale from 0 to 500]

4. Information Obtained from a Boxplot
1) If the median is near the center of the box,
the distribution is approximately symmetric.
2) If the median falls to the left of the center of the box,
the distribution is positively skewed.
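The five-number summaries of Ex 3-39 can be reproduced with a short sketch. The helper names are illustrative, and the quartiles are taken as medians of the lower and upper halves of the ordered data, as in the worked steps.

```python
def median_of(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median_of(ordered[:half])    # median of the lower half
    q3 = median_of(ordered[-half:])   # median of the upper half
    return ordered[0], q1, median_of(ordered), q3, ordered[-1]

real_cheese = [310, 420, 45, 40, 220, 240, 180, 90]
cheese_substitute = [270, 180, 250, 290, 130, 260, 340, 310]

real_summary = five_number_summary(real_cheese)
substitute_summary = five_number_summary(cheese_substitute)
print(real_summary)
print(substitute_summary)
```

The interquartile range is 275 - 67.5 = 207.5 for real cheese and 300 - 215 = 85 for the substitute, which is one way to see the larger spread of the real cheese data.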
** Compare the plots. It is quite apparent that the distribution for the cheese
substitute data has a higher median than the median for the distribution for the real
cheese data. The variation or spread for the distribution of the real cheese data is
larger than the variation for the distribution of the cheese substitute data.

Traditional               Exploratory Data Analysis
Frequency distribution    Stem and Leaf Plot
Histogram                 Boxplot
Mean                      Median
Standard deviation        Interquartile range

The three most commonly used measures of central tendency are mean, median,
and mode.
The three most commonly used measures of variation are range, variance, and
standard deviation.
The most common measures of position are percentiles, quartiles, and deciles.
The coefficient of variation is used to describe the standard deviation in relationship
to the mean.
These methods are commonly called traditional statistical methods and are
primarily used to confirm various conjectures about the nature of the data.
The boxplot and 5-number summaries are part of exploratory data analysis, which is used to
examine data to see what they reveal.

CH 4
4-1 Sample Space and Counting Rules
Probability - The chance of an event occurring
1. Probability Experiments
A chance process that leads to well-defined results called outcomes.
(not known in advance of an act)
2. Outcome: the result of a single trial of a probability experiment
3. Event ( = E )
a subset (a sample from the total) of the given sample space, denoted
by A, B, C, D, etc. (It can consist of more than one outcome.)
Ex 1) A question has multiple choices with 4 possible results
(outcomes) such as ⓐ, ⓑ, ⓒ, and ⓓ.
Only one of them is the right answer.
What is the chance that a person gets the right answer?
Ex 2) Tossing a fair and balanced coin.
(Well-defined outcomes: Head & Tail)
What is the probability (or chance) of getting "Head"?
2 possible outcomes (Head & Tail = H & T)
4. Sample Space (= S )
the set (or collection) of all possible outcomes of a probability
experiment
* A die is rolled S = {1, 2, 3, 4, 5, 6} (= set notation)
* A coin is tossed S = {H, T}
S= Sample Space
Sample Space of Rolling 2 Dice
= all the possible outcomes
Event A , Event B
You can represent the probability of the events using a Venn diagram from set
theory. (This method can't be used in all cases.)
The rectangle is Sample Space (S).
The circle (set) of A or B is the event, and they are dependent of each other.
The intersection area of events A and B is a nice correspondence between "events A
and B both occurring" and "being inside both circle A and circle B".
The union of events A or B covers the combined area of A and B;
when they do not overlap, this is the maximum possible area of A-union-B.
[Figures: Venn diagrams of events A and B inside the sample space S]
Experiment      Sample Space
Toss a coin     Head, Tail
Toss 2 coins    H-H, H-T, T-T, T-H
Roll a die      1, 2, 3, 4, 5, 6
Roll 2 dice     1-1, 1-2, 1-3, 1-4, 1-5, 1-6, 2-1, 2-2, 2-3, etc. (36 outcomes)
From Ex 4) Let E = {2, 4, 6} (observing an even number)
[Figure: Venn diagram of S split into {1, 3, 5} and E = {2, 4, 6}]
Ex 5) A coin is tossed 100 times; find n(S).
S = { H,H,H,…T,T,T,…}
[Figure: tree diagram with branches B and G at each of three levels] n(S) = 8 outcomes
Ex 7) A die is rolled.
1. Find the odds in favor of getting a number less than 4.
[Figure: probability scale from 0 (impossible) through 0.5 (fifty-fifty chance) to 1 (certain); events near 0 are unlikely, events near 1 are likely]
P194 [Ex4-7] Drawing a card from a deck (52 cards)
a) Of getting a Jack: P = 4⁄52 = 1⁄13
d) Of getting a 3 or a 6: P = 8⁄52 = 2⁄13

p200 [Ex4-13]
Distribution of Blood Type - Find the following probabilities
Type         A    B    AB    O    Total
Frequency    22   5    2     21   50
d) A person doesn't have type AB blood: P = 1 – 2⁄50 = 48⁄50 = 24⁄25

P201 [Ex4-14]
Number of days maternity patients stayed in the hospital, in the distribution
Number of days stayed    3    4    5    6    7
Frequency                15   32   56   19   5    (Total = ∑f = 127)
a) A patient stayed exactly 5 days: P = 56⁄127
d) At least 5 days: P = (56 + 19 + 5)⁄127 = 80⁄127
4-2 The Addition Rules for Probability
Mutually Exclusive Events
; events that cannot occur at the same time
Event
1. Simple; can't break the event ex) E = {1}
2. Compound; "and" ; "or"

Case 1
(Mutually exclusive events)
P(A or B) = P(A) + P(B)
* In a single trial, only event A or event B occurs,
and there is no intersection
* A and B are mutually exclusive
(i.e., disjoint )
[Figure: Venn diagram with circles A and B that do not overlap]
Case 2
(Not mutually exclusive events)
P(A or B) = P(A) + P(B) - P(A and B)
* A and B aren't mutually exclusive
(i.e., they overlap)
Case 3 (an extra case)
P(A or B or C)
= P(A) + P(B) + P(C)
- P(A and B) - P(A and C) - P(B and C)
+ P(A and B and C)

P212 [2] Determine whether these events are mutually exclusive.
a. Roll a die: Get an even number, and get a number less than 3.
b. Roll a die: Get a prime number (2, 3, 5), and get an odd number.
c. Roll a die: Get a number greater than 3,
and get a number less than 3.
d. Select a student in your class:
The student has blond hair, and the student has blue eyes.
e. Select a student in your college:
The student is a sophomore, and the student is a business major.
f. Select any course:
It is a calculus course, and it is an English course.
g. Select a registered voter: The voter is a Republican,
and the voter is a Democrat.
Ans: Yes - c, f, and g.

P212 [5] At a convention there are 7 mathematics instructors,
5 computer science instructors, 3 statistics instructors, and 4 science instructors.
If an instructor is selected, find the probability of getting
a science or math instructor.
Total = n(S) = 7 + 5 + 3 + 4 = 19
P(science or math) = 4⁄19 + 7⁄19 = 11⁄19

Ex] A die is rolled one time; find P(E) of getting a 4 or a number less than 6.
P(4 or less than 6) = 1⁄6 + 5⁄6 – 1⁄6 = 5⁄6

Ex] A card is drawn randomly from an ordinary deck of 52 cards.
Find P(the card is a diamond or an ace)
P = 13⁄52 + 4⁄52 – 1⁄52 = 16⁄52 = 4⁄13

P209 [Ex4-20]
A single card is drawn at random from an ordinary deck of
cards. Find the probability of either an ace or a black card.
P = 4⁄52 + 26⁄52 – 2⁄52 = 28⁄52 = 7⁄13
(likely)
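The Case 2 addition rule can be checked by listing a 52-card deck and counting. This is a sketch for illustration; the helper `prob` and the predicate names are assumptions.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # the sample space S, n(S) = 52

def prob(event):
    """Classical probability: favorable outcomes / n(S)."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_ace(card):
    return card[0] == "A"

def is_black(card):
    return card[1] in ("clubs", "spades")

def is_diamond(card):
    return card[1] == "diamonds"

# P(A or B) = P(A) + P(B) - P(A and B)
p_ace_or_black = prob(is_ace) + prob(is_black) - prob(lambda c: is_ace(c) and is_black(c))
p_diamond_or_ace = prob(is_diamond) + prob(is_ace) - prob(lambda c: is_diamond(c) and is_ace(c))

print(p_ace_or_black)
print(p_diamond_or_ace)
```

Counting the union directly, `prob(lambda c: is_ace(c) or is_black(c))`, gives the same fraction, which is a useful check that the overlap was subtracted exactly once.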
Ex] In a statistics class there are 18 juniors and 10 seniors;
6 of the seniors are females, and 12 of the juniors are
males. If a student is selected at random, find the probability
of selecting the following:
a. A junior or a female
18 Juniors = 12 males + 6 females
10 Seniors = 4 males + 6 females
28 students = 16 males + 12 females
P(junior or female) = 18⁄28 + 12⁄28 – 6⁄28 = 24⁄28 = 6⁄7

P220 Ex 4-25]
There are 3 red balls, 2 blue balls, 5 white balls.
2 balls are selected, and the first ball is replaced before the second is drawn.
( replaced = independent, 2 events)
a. 2 blue balls: P = 2⁄10 × 2⁄10 = 1⁄25
b. A blue and a white: P = 2⁄10 × 5⁄10 = 1⁄10

P222 Ex 4-30) 3 cards are drawn from a deck and not replaced.
(Not replaced = Dependent )
a. Getting 3 Jacks: P = 4⁄52 × 3⁄51 × 2⁄50 = 1⁄5525
b. Getting an Ace, a King, a Queen (in order): P = 4⁄52 × 4⁄51 × 4⁄50 = 8⁄16575
d. Getting 3 clubs: P = 13⁄52 × 12⁄51 × 11⁄50 = 11⁄850

Ex) There is a 30% chance of getting sick. Find the probability of selecting
2 students in the school who are both sick.
( It's a dependent case and the probability is already given )

Conditional Probability
$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$

Ex] A die is rolled twice. Find the probability of getting 4
after getting an even number.
Event A = P(even number) ; 1st outcome
Event B = P("4") ; 2nd outcome

P225 Ex 4-32]
A box contains black chips and white chips.
A person selects 2 chips without replacement.
The probability of selecting a black chip and a white chip and
the probability of selecting a black chip on the first draw are given.
Find the probability of selecting a white chip on the second draw.
$P(W \mid B) = \frac{P(B \text{ and } W)}{P(B)}$

P225 Ex 4-34]
A recent survey asked 100 people if they thought women in
the armed forces should be permitted to participate in combat.
Gender    Yes    No    Total
Male      32     18    50
Female    8      42    50
Total     40     60    100
a. The respondent answered yes,
given that the person was a female.
( was a female; 1st event, yes; 2nd event)
P(yes | female) = P(female and yes) ⁄ P(female) = (8⁄100) ⁄ (50⁄100) = 8⁄50 = 4⁄25

P230 [33] At an exclusive country club,
68% of the members play bridge and drink champagne,
and 83% play bridge.
If a member is selected at random, find the probability
that the member drinks champagne,
given that he or she plays bridge.
P(champagne | bridge) = 0.68 ⁄ 0.83 ≈ 0.819
Try P230 [34]
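The conditional probability formula P(B | A) = P(A and B) / P(A) can be applied to the survey table of Ex 4-34 and to the country club problem. The dictionary layout and variable names below are assumptions made for the sketch.

```python
from fractions import Fraction

# Survey counts from Ex 4-34: (gender, answer) -> number of people
survey = {
    ("male", "yes"): 32, ("male", "no"): 18,
    ("female", "yes"): 8, ("female", "no"): 42,
}
total = sum(survey.values())

p_female = Fraction(survey[("female", "yes")] + survey[("female", "no")], total)
p_female_and_yes = Fraction(survey[("female", "yes")], total)

# P(yes | female) = P(female and yes) / P(female)
p_yes_given_female = p_female_and_yes / p_female

# P230 [33]: P(champagne | bridge) = P(bridge and champagne) / P(bridge)
p_champagne_given_bridge = 0.68 / 0.83

print(p_yes_given_female)
print(round(p_champagne_given_bridge, 3))
```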
4 - 4 Counting Rules
1. The Fundamental Counting Rule
In a sequence of n events in which the 1st one has $k_1$ possibilities, the 2nd has $k_2$,
and so on, the total number of possibilities of the sequence is $k_1 \cdot k_2 \cdot k_3 \cdots k_n$.
; the number of ways a sequence of n events can occur
if the 1st event can occur in k1 ways, the 2nd event can occur
in k2 ways, etc.
a. When events are just listed with "and", it's a counting rule case.
b. Event A, event B and event C = Event A × event B × event C
(In this case "and" means to multiply)

P233 Ex 4-38] Tossing a coin and rolling a die, find the number of
outcomes for the sequence of events.
( 2 different events = 1st outcome × 2nd outcome) 2 × 6 = 12

P233 Ex 4-38]
A paint manufacturer wishes to manufacture several different paints.
Color      Red, blue, white, black, green, brown, yellow
Type       Latex, oil
Texture    Flat, semi gloss, high gloss
Use        Outdoor, indoor
How many different kinds of paint can be made if a person
selects one color, one type, one texture, and one use?
7 × 2 × 3 × 2 = 84

Ex] How many ways can a dinner patron select from 2 appetizers,
2 drinks, 3 foods, and 2 desserts on the menu?
2 × 2 × 3 × 2 = 24

Ex] The digits 0, 1, 2, 3, and 4 are to be used in a four-digit ID card.
How many different cards are possible
a. if digits can be repeated: 5 × 5 × 5 × 5 = 625
b. if digits cannot be repeated: 5 × 4 × 3 × 2 = 120

Factorial Notation
n! = n(n – 1)(n – 2) ⋯ 1
P241 [1] How many ways can a baseball manager arrange
a batting order of 9 players? (no repeat)
( 9 positions × 9 players) 9! = 362,880

Combinations
'n' ; items (all different)
'r' ; items selected out of 'n'
$_nC_r = \frac{n!}{(n - r)! \, r!}$
Ex ] Florida lottery = {1, 2, 3,…,53}
By choosing any six numbers out of 53 numbers and the
picked numbers are not in order.
53C6 = 22,957,480 = n(S) = total (using calculator)
One chance to win out of 22,957,480 outcomes: very unlikely

p238 Ex 4-49]
In a club there are 7 women and 5 men. A committee of
3 women and 2 men is to be chosen. How many different
possibilities are there?
7C3 × 5C2 = 35 × 10 = 350

Ex] How many different tests can be made from a test bank
of 20 questions if the test consists of 5 questions?
( order & repetition are not important = Combination)
20C5 = 15,504
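The counting examples in this section can be checked with the standard `math` module (`math.comb` and `math.perm` need Python 3.8 or later). The variable names are illustrative assumptions.

```python
import math

coin_and_die = 2 * 6                         # fundamental counting rule
paints = 7 * 2 * 3 * 2                       # color x type x texture x use
id_cards_repeat = 5 ** 4                     # digits 0-4, repetition allowed
id_cards_no_repeat = math.perm(5, 4)         # 5 x 4 x 3 x 2
batting_orders = math.factorial(9)           # 9!
lottery_outcomes = math.comb(53, 6)          # 53C6, order not important
committees = math.comb(7, 3) * math.comb(5, 2)
tests = math.comb(20, 5)

print(coin_and_die, paints, id_cards_repeat, id_cards_no_repeat)
print(batting_orders, lottery_outcomes, committees, tests)
print(1 / lottery_outcomes)                  # chance of winning with one ticket
```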
Ch5
From Ch1
A Discrete Variable: assume values that can be counted
A Continuous Variable: can assume all values in the interval between any 2 values
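Outcome X           1  2  3  4  5  6    ∑
Probability P(X)                         1
Is it a probability distribution (P.D.)? Each P(X) must be between 0 and 1, and ∑P(X) = 1.
X: 0 2 4 6;  P(X): -1.0 1.5 0.3 0.2    → No P.D.
X: 1 2 3 4    → P.D.
X: 2 3 7;  P(X): 0.3 0.4    → No P.D.

Binomial Probability
Mean: $\mu = n \cdot p$
Variance: $\sigma^2 = n \cdot p \cdot q$
Standard deviation: $\sigma = \sqrt{n \cdot p \cdot q}$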
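[Figures: shaded areas (x%) under the standard normal curve: between -a and b, between -a and 0, between 0 and a, between -a and a, and in the tails beyond -a and a]
Another way
Case 2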
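Standard Normal Distribution
$z = \frac{X - \mu}{\sigma}$        $X = z \cdot \sigma + \mu$

<Finding probabilities (area) for a normally distributed variable by
transforming it into a standard normal variable by using the formula>
Step 1) Find the z value corresponding to a given number X, or X1 and X2.
Ex 1] The average or the mean = 3.1 hours. The standard deviation = 0.5.
Find the percentage of less than 3.5 hours.
Step 1) $z = \frac{3.5 - 3.1}{0.5} = 0.8$
Step 2) Find the z in the table
Step 3) 0.8 + .00 → 0.7881 = 78.81%

Ex 2 ] The mean is 28 lb, and the standard deviation is 2 lb.
1. Between 27 and 31 lb
Step 1) $z_1 = \frac{27 - 28}{2} = -0.5$,  $z_2 = \frac{31 - 28}{2} = 1.5$
Step 2) P(-0.5 < z < 1.5) = 0.9332 - 0.3085
Step 3) = 0.6247 = 62.47%
2. More than 30.2 lb
Step 1) $z = \frac{30.2 - 28}{2} = 1.1$
Step 2) P(z > 1.1) = 1 - 0.8643
Step 3) = 0.1357 = 13.57%

<Finding specific data values for given percentages,
using the standard normal distribution>
Ex 3) To be in the top 10%; the mean is 200 and the standard deviation is 20.
Find the lowest possible score to qualify.
Step 1) Top 10% → area to the left = 1 - 0.10 = 0.9000
Step 2) The closest area in the table is 0.8997 → z = 1.28
Step 3) X = z · σ + μ = 1.28(20) + 200 = 225.6
Step 4) Round up: 226

Ex 4) To select the middle 60% of the population; the mean is 120,
and the standard deviation is 8. Find the upper and lower values.
Step 1) 60% = 0.60, so each tail has area (1 - 0.60) ÷ 2 = 0.20
[Figure: standard normal curve with the middle 60% between -a and a]
Step 2) Areas 0.2000 and 0.8000 in the table → z = -0.84 and z = 0.84
Step 3) Lower value: X = -0.84(8) + 120 = 113.28
Step 4) Upper value: X = 0.84(8) + 120 = 126.72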
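The normal-distribution examples above can be checked without a z table by using `statistics.NormalDist` (Python 3.8 or later). This is a sketch; because it does not round z to two decimals, the results can differ from the table answers in the last digit.

```python
import math
from statistics import NormalDist

# Ex 1: P(X < 3.5) with mean 3.1 and standard deviation 0.5
p_less = NormalDist(3.1, 0.5).cdf(3.5)

# Ex 2: mean 28 lb, standard deviation 2 lb
weights = NormalDist(28, 2)
p_between = weights.cdf(31) - weights.cdf(27)
p_more = 1 - weights.cdf(30.2)

# Ex 3: lowest score in the top 10% (area 0.90 to the left)
cutoff = NormalDist(200, 20).inv_cdf(0.90)

# Ex 4: limits of the middle 60% (areas 0.20 and 0.80 to the left)
middle = NormalDist(120, 8)
lower, upper = middle.inv_cdf(0.20), middle.inv_cdf(0.80)

print(round(p_less, 4), round(p_between, 4), round(p_more, 4))
print(round(cutoff, 1), math.ceil(cutoff))
print(round(lower, 2), round(upper, 2))
```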
<Use the Central Limit Theorem to Solve Problems Involving Sample Means
for Large Samples>
[Figure: population (a parameter) → random sample → value $\bar{X}$ (a statistic)]
A Sampling Distribution of Sample Means
A distribution obtained by using the means computed from random samples of a
specific size taken from a population
Sampling Error
The difference between the sample measure and the corresponding population
measure due to the fact that the sample is not a perfect representation of the
population.
A Statistic ; a characteristic or measure obtained by using
the data values from a sample.
A Parameter ; a characteristic or measure obtained by using
all the data values for a specific population
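Interval Estimate
A range of values which may contain the parameter.
It's called an interval estimate of μ; the confidence level is 1 - α.
α is the total area in both tails of the standard normal distribution curve.
The Confidence Interval of the Mean for a Specific α
$\bar{X} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{X} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
Maximum error of estimate: $E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$, so $\bar{X} - E < \mu < \bar{X} + E$
Confidence level    100% - level         α              α/2
90%                 100% - 90% = 10%     10% = 0.10     0.05
95%                 100% - 95% = 5%      5% = 0.05      0.025
99%                 100% - 99% = 1%      1% = 0.01      0.005
**** Round up to next whole number ****
Ex) 0.23 → 1 ; 2.43 → 3 ; 4.91 → 5

P364 #11
A sample of reading scores of 35 students
[Step 1] Find α/2
[Step 2] Find $z_{\alpha/2}$
In the table, α/2 is less than 0.50 (50%), so look at the ⊝ (negative z) table
[Step 3] $E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
* Round up to next whole number
[Step 4] $\bar{X} - E < \mu < \bar{X} + E$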
<How large a sample is necessary to make an accurate estimate?>
a. Depends on 3 things
E (the maximum error of estimate)
σ (the population standard deviation)
The degree of confidence (90%, 95%, 99%, etc.)
b. Use the minimum sample size formula
$n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2$

<Confidence Intervals for the Mean and Sample Size>
1. z -- Characteristics of the Normal Distribution
1. A normal distribution curve is bell-shaped
2. The mean, median, and mode are equal and are located
at the center of the distribution
3. A normal distribution curve is unimodal (i.e., it has only one mode)
4. The curve is symmetric about the mean
5. The curve is continuous
6. The curve never touches the x axis.
P365 #21
A university dean of students wishes to estimate the average number of hours
students spend doing homework per week.
The standard deviation is 6.2 hours.
How large a sample must be selected, if he wants to be 99% confident of finding
whether the true mean differs from the sample mean by 1.5 hours?
To find the minimum sample size
[Step 1] α = 1 - 0.99 = 0.01, α/2 = 0.005
[Step 2] Find $z_{\alpha/2}$
In the table, 0.005 is less than 0.50 (50%), so look at the ⊝ (negative z) table → $z_{\alpha/2}$ = 2.58
[Step 3] $n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2 = \left(\frac{2.58 \cdot 6.2}{1.5}\right)^2 \approx 113.7$ → 114 (round up)

2. t -- Characteristics of the t Distribution
The variance is greater than 1.
A family of curves based on the concept of "degrees of freedom",
which is related to sample size.
Degrees of Freedom = d.f.    d.f. = n - 1
As the sample size increases, the t distribution approaches
the standard normal distribution.
*At the bottom of the table where d.f. = ∞, the $z_{\alpha/2}$ values can be found for
specific confidence intervals

Confidence interval for the mean using the t distribution
Find $\bar{X} = \frac{\sum X}{n}$ and the sample standard deviation $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$ (p136)
* Round to 2 decimal places (dp = 2; cf. significant figures)
[Step 4] Confidence level = 98% (given); find $t_{\alpha/2}$ with d.f. = n - 1
[Step 5] The confidence interval estimate of μ:
$\bar{X} - t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{X} + t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$

Ex) Find the $t_{\alpha/2}$ value for a 95% confidence interval when the sample size is 22.
[Step 1] the d.f. = 22 - 1 = 21 & C.I. = 95%
[Step 2] Use the t-distribution table;
left column 21 & top row (Confidence Interval) 95% → 2.080
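The minimum sample size calculation of P365 #21 can be sketched as follows. The function name and its optional `z` argument are assumptions for illustration; when `z` is omitted, the critical value is computed with `statistics.NormalDist` instead of being read from a table.

```python
import math
from statistics import NormalDist

def minimum_sample_size(sigma, max_error, confidence, z=None):
    """n = (z_{alpha/2} * sigma / E)^2, always rounded up to the next whole number."""
    if z is None:
        alpha = 1 - confidence
        z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return math.ceil((z * sigma / max_error) ** 2)

# sigma = 6.2 hours, E = 1.5 hours, 99% confidence
n_computed = minimum_sample_size(6.2, 1.5, 0.99)
n_table = minimum_sample_size(6.2, 1.5, 0.99, z=2.58)  # table value of z
print(n_computed, n_table)
```

Rounding up rather than to the nearest whole number guarantees the error stays within E.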
p372 #11
n = 28, for a 95% confidence interval, sample standard deviation = 2
[Step 1] d.f. = n - 1 = 28 - 1 = 27
[Step 2] Critical Values = Use the t-distribution table;
left column of d.f. 27 & top row (Confidence Interval) 95% → 2.052 = $t_{\alpha/2}$
[Step 3] $E = t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 2.052\left(\frac{2}{\sqrt{28}}\right) \approx 0.78$
[Step 5] The confidence interval estimate of μ: $\bar{X} - E < \mu < \bar{X} + E$

When to Use the z or t Distribution (p371)
Is σ known?
Yes → Use $z_{\alpha/2}$ values no matter what the sample size is.
No → Is n ≥ 30?
    Yes → Use $z_{\alpha/2}$ values and s in place of σ in the formula.
    No → Use $t_{\alpha/2}$ values and s in place of σ in the formula.

[Step 1] d.f. = n - 1 = 20 - 1 = 19
[Step 2] Critical Values = t-distribution table;
left column of d.f. 19 & top row (Confidence Interval) 99% → 2.861 = $t_{\alpha/2}$
[Step 3] $E = t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$
[Step 4] The confidence interval estimate of μ: $\bar{X} - E < \mu < \bar{X} + E$

Percentiles; Divide the data set into 100 equal groups. (p151)
Quartiles; Divide the distribution into 4 equal groups.
a. Find the 67th percentile. (similar to H.W. #10)
[Step 1] 67% = 0.67; in the table, z = 0.44
[Step 2] X = z · σ + μ ≈ 66
b. Find Q3 (3rd quartile)
[Step 1] 75% = 0.75; in the table,
0.7486 (0.0014 difference) is closer → z = 0.67
0.7517 (0.0017 difference)
[Step 2] X = z · σ + μ ≈ 68
H.W. #7)
Step 1]
Step 2]
[Figure: standard normal curve marked -a, 0, a]
P325 #12
N = 4, sample size n = 3
Values: 90, 110, 150, 190
Ex) N = 4, n = 2
Values: 2, 4, 6, 8

Ch 8
Statistical Hypothesis; a conjecture about a population
parameter. This conjecture may or may not be true.

Hypothesis-Testing Common Phrases (P400)
≥                              ≤                           >                  <
at least                       at most                     Is greater than    Is less than
no less than                   no more than                Is above           Is below
Is greater than or equal to    Is less than or equal to    Is increased       Is decreased or reduced from
                                                           Is longer than     Is shorter than
                                                           Is bigger than     Is smaller than
                                                           Is higher than     Is lower than
Six Steps of Hypothesis Testing
1. The Null Hypothesis ( $H_0$ )
2. The Alternative Hypothesis ( $H_1$ )
3. Test Statistic (T.S.) (or Test Value = T.V.) for z or t
$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$        $t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$

1. Statistical Test
Uses the data obtained from a sample to make a decision about whether
the null hypothesis ( $H_0$ ) should be rejected.
2. Test Value
The numerical value obtained from a statistical test
[Table: Null ( $H_0$ ), Alternative ( $H_1$ ), Tails, Graph; e.g. Two-Tailed Test]
α
One-tail    0.25  0.20  0.10  0.05  0.025  0.02  0.01  0.005
Two-tail    0.50  0.40  0.20  0.10  0.05   0.04  0.02  0.01
Ex) State the null and alternative hypotheses for each conjecture.
1. A researcher thinks that if expectant mothers use vitamin pills, the weight of the
babies will increase. The average birth weight of the population is 8.6 lb.
Ans.; Increase → $H_1: \mu > 8.6$
Average ~ of the population is 8.6 lb. → $H_0: \mu = 8.6$
2. An engineer hypothesizes that the mean number of defects can be decreased in a
manufacturing process of compact disks by using robots instead of humans for
certain tasks. The mean number of defective disks per 1000 is 18.
Ans.; Decrease → $H_1: \mu < 18$
Average ~ is 18 → $H_0: \mu = 18$
3. A psychologist feels that playing soft music during a test will change the results
of the test. The psychologist is not sure whether the grades will be higher or
lower. In the past, the mean of the scores was 73.
Ans.; Change, whether ~ higher or lower → $H_1: \mu \neq 73$
Mean ~ was 73 → $H_0: \mu = 73$

Hypothesis Testing & a Jury Trial
                                        $H_0$ True (Innocent)             $H_0$ False (Not Innocent)
Reject $H_0$ (Find guilty)              Type I Error, P(Type I Error) = α    Correct Decision
Do not Reject $H_0$ (Find not guilty)   Correct Decision                     Type II Error, P(Type II Error) = β
Type I Error: Reject $H_0$ when $H_0$ is true
Type II Error: Do Not Reject $H_0$ when $H_0$ is false
Type I Error: There is not sufficient evidence to support the claim
(There is enough evidence to reject the claim.)
Probability of type I error = P(type I error) = α
Type II Error: There is sufficient evidence to support the claim
(There is not enough evidence to reject the claim.)
Probability of type II error = P(type II error) = β
2. If $H_0$ is false
4. We are testing if μ is outside the confidence interval.
5. We are testing whether $\bar{X}$ is in one of the rejection regions.

[Idea; 2 tailed test, n = 35 ≥ 30 → z table]
Step 1) State $H_0$ and $H_1$
Step 2) Claim
Step 3) $z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ → T.S. = 1.01
Step 4) Critical Values (C.V.s) using z table, with area α/2 in each tail
[Figure: reject regions below -2.58 (C.V.) and above 2.58 (C.V.); do-not-reject region in between, containing 1.01 (T.S.)]
Step 5) Decision always about $H_0$ (Null), 2 options
$H_0$ (Null) is not rejected because the test value is
in the nonrejection (noncritical) region
Step 6) Conclusion (about Claim)
There is not sufficient evidence to support the claim that
the average cost is different from $24,267.

[Idea; left tailed test, n = 36 ≥ 30 → z table]
Step 1) State $H_0$ and $H_1$
Step 2) Claim
Step 3) $z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ → T.S. = -1.56
Step 4) Critical Values (C.V.s) using z table
[Figure: reject region below -1.28 (C.V.), containing -1.56 (T.S.); nonreject region above]
Step 5) Decision always about $H_0$ (Null), 2 options
The decision is to reject $H_0$ (Null)
$H_0$ (Null) : false
$H_1$ (Alternative) : true
Step 6) Conclusion (about Claim)
There is sufficient evidence to support the claim that
the average cost is less than $80.

[Figure: nonreject region containing 1.32 (T.S.); reject region above 1.65 (C.V.)]
Step 5) Decision always about $H_0$ (Null), 2 options
The decision is not to reject $H_0$ (Null)
Step 6) Conclusion (about Claim)
There is not sufficient evidence to support the claim
that the average cost is more than $42,000.

[Figure: two-tailed test with C.V.s -1.96 and 1.96 and T.S. 1.65; B (because the claim is false), A (because the claim could be true)]
Test Value z
Hypothesis Test: wording of Final Conclusion
1. Does the Claim have equaility ( = )? A B
(Because the claim (Because the claim
Yes Reject = Cancel B is true) could be false)
( ) Do not reject = Keep A -1.75 -1.645
(T.S) (C.V.)
No Reject = Keep A
Test Value z
Do not reject = Cancel B
B
A
A: There is a enough evidence to support the claim that ~ (Because the claim
(Because the claim
There is not a enough evidence to reject the claim that ~ is false)
icould be true)
B: There is not a enough evidence to support the claim that ~ -1.645 -0.3
There is a enough evidence to reject the claim that ~ (C.V.) (T.S)
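The critical-value steps and the wording rule above can be sketched in a few lines of Python. This is a minimal illustration, not the notes' own method: the function names are hypothetical, and each α below is an assumption inferred from the critical values shown in the examples (−1.28, ±2.58, 1.65), since the problem statements are not repeated here.

```python
from statistics import NormalDist

def z_test_decision(z, alpha, tail):
    """Return (critical value, reject H0?) for a z test.

    tail is 'left', 'right', or 'two'.
    """
    std_normal = NormalDist()
    if tail == "left":
        cv = std_normal.inv_cdf(alpha)
        return cv, z < cv
    if tail == "right":
        cv = std_normal.inv_cdf(1 - alpha)
        return cv, z > cv
    if tail == "two":
        cv = std_normal.inv_cdf(1 - alpha / 2)
        return cv, abs(z) > cv
    raise ValueError("tail must be 'left', 'right', or 'two'")

def final_conclusion(claim_has_equality, reject_h0):
    """Wording of the final conclusion (statements A and B in the notes)."""
    if claim_has_equality:  # the claim is H0
        if reject_h0:
            return "There is enough evidence to reject the claim."
        return "There is not enough evidence to reject the claim."
    # the claim is H1
    if reject_h0:
        return "There is enough evidence to support the claim."
    return "There is not enough evidence to support the claim."

# Test values from the notes; every alpha is an assumption inferred from the C.V.
examples = [(-1.56, 0.10, "left"), (1.01, 0.01, "two"), (1.32, 0.05, "right")]
for z, alpha, tail in examples:
    cv, reject = z_test_decision(z, alpha, tail)
    print(f"{tail:>5}: z = {z:+.2f}, C.V. = {cv:.2f} -> "
          f"{final_conclusion(False, reject)}")
```

In all three examples the claim is the alternative hypothesis, so `claim_has_equality` is passed as `False`.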
Ex) A significance level of α is used in testing a claim,
and the sample data result in a test statistic of z = 1.18.
[Idea; right tailed test]
Step 1) z = 1.18 with z table → 0.1190
Step 2) P-value = P(z > 1.18) = 0.119
        [Figure: right tail beyond 1.18 (T.S), P-value = 0.1190]
Step 3) Fail to reject H0
Step 4) The P-value of 0.119 is relatively large, indicating that
        the sample results could easily occur by chance.

Ex) A significance level of α = 0.05 is used in testing a claim,
and the sample data result in a test statistic of z = 2.34.
[Idea; two-tailed test]
Step 1) z = 2.34 with z table → 0.0096
        P-value = 0.0096 × 2 = 0.0192
Step 2) α/2 = 0.025
        P-value/2 = 0.0192/2 = 0.0096
        Since 0.0096 < 0.025 (equivalently, P-value 0.0192 < α = 0.05), reject H0.

Ex) The claim is that the average age of lifeguards in a city is greater than 24 years.
In a sample of 36 guards, the mean of the sample is 24.7 years, with a standard
deviation of 2 years.
Is there evidence to support the claim at α = 0.05?
[Idea; right tailed test, n = 36 ≥ 30 → z table]
Step 1) H0: μ = 24
Step 2) H1: μ > 24 (Claim)
Step 3) z = (x̄ − μ) / (s/√n) = (24.7 − 24) / (2/√36) = 2.10 (T.S.)
Step 4) Find P-value using z table
        P-value = P(z > 2.10) = P(z < −2.10) = 0.0179 (right tailed; symmetric)
        [Figure: Non Reject Region to the left, Reject Region (α = 0.05) to the right,
         0 < 2.10 (T.S)]
Step 5) Decision always about H0 (Null)
        P-value = 0.0179 < α = 0.05 ; Reject H0
Step 6) Conclusion (Also about Claim)
        There is sufficient evidence to support the claim
        that the average age is greater than 24 years.
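The P-values in these examples can be reproduced without a printed z table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `p_value` is hypothetical, and the lifeguard numbers (n = 36, mean 24.7, standard deviation 2, hypothesized mean 24) are taken from the example above.

```python
from statistics import NormalDist

def p_value(z, tail):
    """P-value of a z test statistic; tail is 'left', 'right', or 'two'."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Test statistics given in the notes.
print(f"right tailed, z = 1.18: P-value = {p_value(1.18, 'right'):.4f}")
print(f"two tailed,   z = 2.34: P-value = {p_value(2.34, 'two'):.4f}")

# Lifeguard example: z = (sample mean - hypothesized mean) / (s / sqrt(n)).
z_lifeguard = (24.7 - 24) / (2 / 36 ** 0.5)
p_lifeguard = p_value(z_lifeguard, "right")
print(f"lifeguards, z = {z_lifeguard:.2f}: P-value = {p_lifeguard:.4f}")
print("reject H0" if p_lifeguard < 0.05 else "do not reject H0")
```

A P-value smaller than α leads to rejecting H0, which is why the lifeguard example rejects at α = 0.05.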
Ex) A job director claims that the average starting salary for nurses is
$24,000. A sample of 10 nurses has a mean of $23,450 and a
standard deviation of $400.
Is there enough evidence to reject the director's claim at significance level α?
Assume the variable is normally distributed.
[Idea; 2 tailed test, n = 10 < 30 → t table]
Step 1) H0: μ = $24,000 (Claim)
Step 2) H1: μ ≠ $24,000
Step 3) t = (x̄ − μ) / (s/√n) = (23,450 − 24,000) / (400/√10) = −4.35, d.f. = n − 1 = 9
        [Figure: two-tailed t distribution with a Reject Region in each tail
         and the Do not Reject Region in the middle]

Ex) Find the P - value when the t test value is 2.983, the sample size is 6,
and the test is two-tailed.
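Both t examples can be worked through with a short script. This is a sketch under stated assumptions: the Python standard library has no t distribution, so the P-value is only bracketed between two table entries, the way it is done by hand with a t table. The row of critical values below is the standard t table row for d.f. = 5 (two-tailed), and the function names are hypothetical.

```python
from math import sqrt

def t_statistic(sample_mean, mu0, s, n):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n)), with d.f. = n - 1."""
    return (sample_mean - mu0) / (s / sqrt(n))

# Standard t table row for d.f. = 5: (two-tailed alpha, critical value).
T_TABLE_DF5_TWO_TAILED = [(0.10, 2.015), (0.05, 2.571), (0.02, 3.365), (0.01, 4.032)]

def bracket_two_tailed_p(t, table_row):
    """Return (low, high) such that low < P-value < high, using one t table row.

    table_row must list alphas in decreasing order.
    """
    t = abs(t)
    low, high = 0.0, 1.0
    for alpha, critical in table_row:
        if t >= critical:
            high = alpha   # the P-value is below this alpha
        else:
            low = alpha    # first critical value the test value does not reach
            break
    return low, high

# Nurses' salaries: n = 10, mean 23,450, s = 400, claimed mean 24,000.
t_nurses = t_statistic(23450, 24000, 400, 10)
print(f"t = {t_nurses:.2f} with d.f. = {10 - 1}")

# t test value 2.983, sample size 6 (d.f. = 5), two-tailed.
low, high = bracket_two_tailed_p(2.983, T_TABLE_DF5_TWO_TAILED)
print(f"{low} < P-value < {high}")
```

Because 2.983 falls between the table values 2.571 and 3.365, the two-tailed P-value lies between 0.02 and 0.05.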
Tip ! Ch 9
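Ex) The average hotel room rate in N.Y. is $88.42 and in Miami is $80.61.
Two samples of 50 hotels each were taken; the standard deviations were $5.62 &
$4.83. At α = 0.05, can it be concluded that there is a significant
difference in the rates?
[ Idea; 2 tailed test, n1 = 50 ≥ 30 and n2 = 50 ≥ 30 → z table ]
        N.Y.:  x̄1 = 88.42, σ1 = 5.62, n1 = 50
        Miami: x̄2 = 80.61, σ2 = 4.83, n2 = 50
Step 1) H0: μ1 = μ2
Step 2) H1: μ1 ≠ μ2 (Claim)
Step 3) Test Statistics (T.S.) (or Test Value = T.V.)
        z = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2)
          = (88.42 − 80.61) / √(5.62²/50 + 4.83²/50) = 7.45
Step 4) Critical Values (C.V.s) using z table
        ±z(α/2) = ±1.96 using z table
Step 5) Decision always about H0 (Null)
        Reject Region: z < −1.96 or z > +1.96      Do not Reject Region: −1.96 ≤ z ≤ +1.96
        7.45 (T.S.) > +1.96 (C.V.) → Reject H0
Step 6) Conclusion; about the Claim
        There is enough evidence to support the claim that μ1 ≠ μ2
        (there is a significant difference in the rates).

Ex) There are 2 groups: those who left their profession within a few months after
graduation (leavers) and those who remained in their profession after they
graduated (stayers).
Test the claim that those who stayed had a higher science grade point
average than those who left. Use α = 0.01.
        Leavers: x̄1, s1, n1      Stayers: x̄2, s2, n2
[ Idea; left tailed test → z table ]
Step 1) H0: μ1 = μ2
Step 2) H1: μ1 < μ2 (Claim)
Step 3) Test Statistics (T.S.) (or Test Value = T.V.)
        z = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2) = −2.01
Step 4) Critical Values (C.V.s) using z table
        −z(α) = −2.33 using z table
Step 5) Decision always about H0 (Null)
        Reject Region: z < −2.33      Do not Reject Region: z ≥ −2.33
        −2.33 (C.V.) < −2.01 (T.S.) → Do not reject H0
Step 6) Conclusion; about the Claim
        There is not enough evidence to support the claim that μ1 < μ2
        (that the stayers had a higher science grade point average than the leavers).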
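The hotel-rate test statistic can be checked directly from the numbers in the problem statement. The sketch below is a minimal illustration with a hypothetical function name; it assumes H0: μ1 = μ2, so the hypothesized difference is 0, and uses α = 0.05, which matches the ±1.96 critical values shown in the notes.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for two large independent samples under H0: mu1 = mu2."""
    return (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# N.Y. versus Miami hotel room rates, 50 hotels in each sample.
z = two_sample_z(88.42, 5.62, 50, 80.61, 4.83, 50)

alpha = 0.05  # matches the +/-1.96 critical values in the notes
cv = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z) > cv

print(f"z = {z:.2f}, C.V. = +/-{cv:.2f}")
print("Reject H0" if reject else "Do not reject H0")
```

The same function applies to the leavers and stayers example once the sample means, standard deviations, and sample sizes are filled in.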
<Testing the Difference between 2 Means : Small Dependent Samples>
              Two Tailed      Left Tailed     Right Tailed
   H0:        μD = 0          μD = 0          μD = 0
   H1:        μD ≠ 0          μD < 0          μD > 0

   D = X1 − X2
   D̄ = ∑D / n
   s_D = √[ (∑D² − (∑D)²/n) / (n − 1) ]
   t = (D̄ − μD) / (s_D/√n),   d.f. = n − 1

Ex) A dietitian wishes to see if a person's cholesterol level will change if the diet is
supplemented by a certain mineral.
Six subjects were pretested, and then they took the mineral supplement for a 6-week
period. The results are shown in the table.
Can it be concluded that the cholesterol level has been
changed at significance level α?

   Before (X1)    After (X2)    D = X1 − X2    D²
   210            190           20             400
   235            170           65             4,225
   208            210           −2             4
   190            188           2              4
   172            173           −1             1
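The dependent-samples formulas (mean of the differences, standard deviation of the differences, and the t value) can be sketched as follows. Only the five table rows that appear above are used here, although the example mentions six subjects, so the printed t value illustrates the method and is not the answer to the six-subject problem. The function name is hypothetical.

```python
from math import sqrt

def paired_t(before, after, mu_d=0.0):
    """Return (mean difference, s_D, t, d.f.) for dependent (paired) samples."""
    diffs = [b - a for b, a in zip(before, after)]  # D = X1 - X2
    n = len(diffs)
    sum_d = sum(diffs)
    sum_d2 = sum(d * d for d in diffs)
    d_bar = sum_d / n
    s_d = sqrt((sum_d2 - sum_d ** 2 / n) / (n - 1))
    t = (d_bar - mu_d) / (s_d / sqrt(n))
    return d_bar, s_d, t, n - 1

# Only the five rows shown in the table above (the sixth subject is not listed).
before = [210, 235, 208, 190, 172]
after = [190, 170, 210, 188, 173]

d_bar, s_d, t, df = paired_t(before, after)
print(f"mean D = {d_bar:.1f}, s_D = {s_d:.2f}, t = {t:.3f}, d.f. = {df}")
```

The t value would then be compared with the t table critical value for the chosen α and d.f. = n − 1.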
   r:    −1          −0.5        0              0.5         +1
         Perfect     Strong      No Linear      Strong      Perfect
         Negative    Negative    Relationship   Positive    Positive
         L.C.C.      L.C.C.      L.C.C.         L.C.C.      L.C.C.
* L.C.C. = Linear Correlation Coefficient
* In Algebra, y = mx + b
  In statistics, ŷ = a + bx

   r = [ n∑xy − (∑x)(∑y) ] / √{ [ n∑x² − (∑x)² ] [ n∑y² − (∑y)² ] } = 0.897
* The correlation coefficient suggests a Strong Positive Relationship
  between age and blood pressure.

Step 3) The values of a and b
   a = [ (∑y)(∑x²) − (∑x)(∑xy) ] / [ n∑x² − (∑x)² ] = 81.048
   b = [ n∑xy − (∑x)(∑y) ] / [ n∑x² − (∑x)² ] = 0.964
Step 4) When r is significant, the Regression Line Equation is
   ŷ = a + bx = 81.048 + 0.964x
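The formulas for r, a, and b can be evaluated with a few sums. The sketch below uses the six (age, pressure) points plotted in this example; the function name is hypothetical, and the last line applies the fitted line to a 50-year-old, as an illustration of how the equation is used for prediction.

```python
from math import sqrt

def correlation_and_line(xs, ys):
    """Return (r, a, b) for the line y-hat = a + b*x, using the sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    r = (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    a = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return r, a, b

# (age, blood pressure) points from the scatter plot in this example.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

r, a, b = correlation_and_line(ages, pressures)
predicted = a + b * 50  # predicted blood pressure at age 50

print(f"r = {r:.3f}, a = {a:.3f}, b = {b:.3f}")
print(f"predicted pressure at age 50: {predicted:.1f}")
```

Using the unrounded slope gives a prediction slightly above the hand calculation with b rounded to 0.964; both round to 129.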
[Figure: scatter plot of Pressure (y, axis from 110 to 150) against Age (x, axis from 40 to 70)]
Plot (43, 128), (48, 120), (56, 135), (61, 143), (67, 141), & (70, 152)

Ex) Using the equation, predict the blood pressure for a person
who is 50 years old.
[ Idea: 50 years old → x value ]
ŷ = a + b(50) = 81.048 + 0.964(50) = 129.248 ≈ 129

Miami Dade College -- Hialeah Campus

Source
Elementary Statistics - A Brief Version
http://rchsbowman.wordpress.com/category/statistics/statistics-notes/page/7/
http://statistics.laerd.com/statistical-guides/normal-distribution-calculations.php
http://rchsbowman.wordpress.com/2009/12/01/statistics-notes-%E2%80%93-the-standard-normal-distribution-2/
http://www.analyzemath.com/statistics/mutually-exclusive.html