Statistics Lecture PDF
Section 1 Outline
Introduction to statistics and basic concepts: variables and data
SPSS: getting acquainted with SPSS
Reading materials: Chap 1, 2 (Keller)
Types of statistics
Descriptive statistics: collecting, organising, summarising, and presenting data
E.g.: graphical techniques; numerical techniques
Inferential statistics: estimating, predicting, and making decisions about a population based on sample data
E.g.: estimation; hypothesis testing

Basic concepts: variables and data
A variable is a characteristic of a population or sample
E.g.:
• Height of female students
• Occupation of students in this class
Data are the observed values of a variable
E.g.:
• Height of 10 female students: 1.6, 1.7, 1.55, 1.59, 1.5, 1.58, 1.64, 1.67, 1.58, 1.55
• Occupation of 5 students: teller, accountant, IT, marketing manager, teacher
Types of data: nominal, ordinal, discrete, continuous
Ordinal data: a kind of nominal data whose values have a natural order
- Study performance of students: poor, fair, good, very good, excellent
- Opinions of consumers: strongly disagree, somewhat disagree, neither disagree nor agree, agree, strongly agree
Population versus sample
A population is the set of all items or people that share some common characteristics
A census is obtained by collecting information about every member of a population
- Collect the height of all Vietnamese citizens
- Verify the quality of all products that are produced by factory X
Parameter: a descriptive measure of a population (μ, σ²)

Population versus sample (cont.)
A sample is a smaller group drawn from the population
A sample survey is obtained by collecting information from some members of the population
- Collect the height of 1,000 Vietnamese citizens
- Verify the quality of a proportion of the products that are produced by factory X
Statistic: a descriptive measure of a sample (x̄, s²)
Sampling: taking a sample from the population
An important requirement: a sample must be representative of the population; that means the profile of the sample is the same as that of the population
Sampling methods (from the diagram): multistage sample, cluster sample, quota sample, non-random sampling
Data presentation: Outline
Tables and charts
Frequency distribution
- Simple frequency table
- Grouped frequency table
Charts
- Bar and pie charts
- Histograms
- Boxplot
- Stem-and-leaf
- Ogive
Reading materials: Chap 2, 3 (Keller)
Recap
◦ In the previous chapter you learned how to collect data. Data collected through surveys are called 'raw' data.
◦ Raw data may include thousands of observations and often provide too much information => they need to be summarised before being presented to an audience
Requirement
◦ A data summary clears away detail but should convey the overall pattern.
◦ Summarised information is concise but should reflect an accurate view of the original data
Methods to summarise and present data
◦ Tables
◦ Charts
◦ Numerical summaries (measures of location and dispersion)

Frequency distribution
Frequency is the number of times a certain event has happened
A frequency distribution records the number of times each value occurs and is presented in the form of a table
Types of frequency distribution:
• Simple frequency distribution
• Grouped frequency distribution
• Cumulative, percentage, and cumulative percentage frequency distribution
Simple frequency distribution is suitable for:
• Qualitative data
• Discrete variables with few values
Simple frequency distribution: nominal variable
Example 2: We have a data set of 686 international students studying at UNSW, Australia. Create a frequency table.
Large data set => can't create a frequency table manually
Creating a simple frequency table using SPSS

Nationality      Number of students (frequency)
Australia        179
New Zealand      1
Hong Kong        21
Singapore        48
Malaysia         70
Indonesia        76
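The slides build this table in SPSS; as an illustration only, here is a minimal Python sketch of the same idea, assuming the observations are stored as a list of nationality strings (the list below simply reproduces the counts shown above).

```python
from collections import Counter

# Reconstruct a list of observations matching the counts in the table above
nationalities = (["Australia"] * 179 + ["New Zealand"] * 1 + ["Hong Kong"] * 21
                 + ["Singapore"] * 48 + ["Malaysia"] * 70 + ["Indonesia"] * 76)

# Simple frequency table: count how many times each value occurs
freq = Counter(nationalities)
for nationality, count in freq.most_common():
    print(f"{nationality:12s} {count}")
```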
Grouped frequency table: discrete variable with many values
Example 3: the marks scored by 58 candidates seeking promotion in a personnel selection test were recorded as follows. Construct a frequency table using a class width of ten marks.

37 49 58 59 56 79
62 82 53 58 34 45
40 43 44 50 42 61
54 30 49 54 76 47
64 53 64 54 60 39
49 44 47 44 25 38
55 57 54 55 59 40
31 41 53 47 58 55
59 64 56 42 38 37
33 33 47 50

Marks (class interval)    Number of candidates (frequency)
21 – 30                   2
31 – 40                   11
41 – 50                   17
51 – 60                   20
61 – 70                   5
71 – 80                   2
81 – 90                   1
Total                     58

Note: the decision on the number of classes and the class intervals is subjective, but the number should be chosen carefully.
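As a rough cross-check of the table (not part of the original slides), the grouping can be reproduced in Python with numpy, assuming the 58 marks are typed in as listed:

```python
import numpy as np

# The 58 marks from Example 3
marks = [37, 49, 58, 59, 56, 79, 62, 82, 53, 58, 34, 45,
         40, 43, 44, 50, 42, 61, 54, 30, 49, 54, 76, 47,
         64, 53, 64, 54, 60, 39, 49, 44, 47, 44, 25, 38,
         55, 57, 54, 55, 59, 40, 31, 41, 53, 47, 58, 55,
         59, 64, 56, 42, 38, 37, 33, 33, 47, 50]

# Class boundaries for the intervals 21-30, 31-40, ..., 81-90 (width of ten marks)
edges = [20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5, 90.5]
counts, _ = np.histogram(marks, bins=edges)

for lo, hi, c in zip(range(21, 82, 10), range(30, 91, 10), counts):
    print(f"{lo:2d} - {hi:2d}: {c}")
```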
Example 4: draw a frequency table of wages (in USD) paid to 30 people as follows:

202 277 554 145 361
457 87 94 240 144
310 391 362 437 429
176 325 221 374 216
480 120 274 398 282
153 470 303 338 209

Wages (class interval)    Number of people (frequency)
< $100                    2
$100 – <$200              5
$200 – <$300              8
$300 – <$400              9
$400 – <$500              5
$500 – <$600              1
Total                     30

Terminology:
Lower value: the lowest value of one class.
Upper value: the highest value of one class.
Class interval: the range from the lower to the upper value.
Open-ended class: the first or last class in the range may be open-ended, meaning it has no lower or upper value (e.g. < $100). Open-ended classes are designed for uncommon values that are too low or too high.
Frequency distribution: summary
1. Simple frequency distribution: an easy task that can be done manually or with statistical software.
2. Grouped frequency distribution: more difficult. The hardest task is to decide the number of classes and the class width (class intervals). Ideally, each class reflects differences in the nature of the data. The more you work on it, the more reasonable a number and size of classes you will decide on.
3. The upper value of one class should not coincide with the lower value of the following class, so that each value belongs to only one class.

Cumulative, percentage, and cumulative percentage frequency distribution

Wages (class interval)   Frequency   Cumulative frequency   Percentage frequency   Cumulative percentage frequency
< $100                   2           2                      6.7                    6.7
$100 – <$200             5           7                      16.7                   23.3
$200 – <$300             8           15                     26.7                   50.0
$300 – <$400             9           24                     30.0                   80.0
$400 – <$500             5           29                     16.7                   96.7
$500 – <$600             1           30                     3.3                    100.0
Total                    30
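If the class frequencies are already known, the cumulative and percentage columns follow directly from running totals; a small Python sketch (an illustration, not from the slides):

```python
import numpy as np

# Wage class frequencies from Example 4
labels = ["< $100", "$100-<$200", "$200-<$300", "$300-<$400", "$400-<$500", "$500-<$600"]
freq = np.array([2, 5, 8, 9, 5, 1])

cum_freq = np.cumsum(freq)            # running total of frequencies
pct = 100 * freq / freq.sum()         # percentage frequency
cum_pct = np.cumsum(pct)              # cumulative percentage frequency

for label, f, cf, p, cp in zip(labels, freq, cum_freq, pct, cum_pct):
    print(f"{label:12s} {f:3d} {cf:3d} {p:6.1f} {cp:6.1f}")
```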
Tools for continuous data:
• Histograms
• Stem-and-leaf plots
• Cumulative frequency curve (ogive)
• Boxplots (discussed in lecture 3)

Region of origin of the 686 international students (used for the bar and pie charts):
Australia & NZ       180     26.24%
China                120     17.49%
South East Asia      227     33.09%
India                11      1.60%
USA & Canada         14      2.04%
UK & Ireland         35      5.10%
Other Europe         42      6.12%
Rest of the world    57      8.31%
Total                686     100.00%
[Bar chart and pie chart of the region frequencies and percentages shown above]
Histograms
Raw data => frequency table => histogram
A histogram looks like a bar chart except that the bars are joined together.
Two types of histograms:
- Equal-width histogram
- Unequal-width histogram

Equal-width histograms
All bars have the same width (the same class intervals).
The height of each bar represents the frequency of the class interval.
Using the raw data in example 4, draw a histogram representing wages.
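A hedged sketch of how the equal-width histogram for Example 4 could be drawn in Python with matplotlib (the slides use SPSS; the $100-wide bins below start at $0 rather than using an open-ended first class):

```python
import matplotlib.pyplot as plt

# Raw wage data (USD) from Example 4
wages = [202, 277, 554, 145, 361, 457, 87, 94, 240, 144,
         310, 391, 362, 437, 429, 176, 325, 221, 374, 216,
         480, 120, 274, 398, 282, 153, 470, 303, 338, 209]

# Equal-width classes of $100; joined bars distinguish a histogram from a bar chart
plt.hist(wages, bins=range(0, 700, 100), edgecolor="black")
plt.xlabel("Wages (USD)")
plt.ylabel("Frequency")
plt.title("Equal-width histogram of wages")
plt.show()
```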
[Histogram examples: a positively skewed distribution and a symmetric distribution]
[Histogram examples: a negatively skewed distribution and a bimodal distribution]
Histogram terms

Stem-and-leaf display
Ogive
An ogive plots the cumulative frequency against the value (upper class boundary).

Value        Frequency   Cumulative frequency
<100         22          22
100-<150     44          66
150-<200     79          145
200-<250     96          241
250-<300     44          285
300-<350     15          300
Numerical summaries: Outline
Central tendency and dispersion
Measures of location:
- Mean, median, mode
- Selection of measures of location
Measures of dispersion:
- Range, quartile deviation, variance, standard deviation
- Chebyshev's law
- Coefficient of variation
- Coefficient of skewness
Reading materials: Chap 4 (Keller)
A measure of location shows where the centre of the data is.
Arithmetic mean (average):
- From a population: μ = (Σ_{i=1}^{N} X_i) / N
- From a sample: x̄ = (Σ_{i=1}^{n} x_i) / n
Median
The median is the value of the observation located in the middle of the data set.
Steps to find the median:
1. Arrange the observations in order of size (normally ascending order)
2. Find the number of observations and hence the middle observation
3. The median is the value of the middle observation

Calculating the median from raw data
If the data has an odd number of observations:
◦ the middle observation is the ((n+1)/2)th, so Median = x_((n+1)/2)
If the data has an even number of observations:
◦ there are two observations located in the middle, and Median = (x_(n/2) + x_(n/2+1)) / 2
E.g. 1. Raw data: 11, 11, 13, 14, 17 => median = 13
E.g. 2. Raw data: 11, 11, 13, 14, 16, 17 => median = (13 + 14)/2 = 13.5

Advantages of the median:
◦ Easy to understand and calculate
◦ Not affected by outlying values => can be used when the mean would be misleading
Disadvantages:
◦ It is the value of a single observation => fails to reflect the whole data set
◦ Not easy to use in further analysis
Mode
The mode is the value which occurs most frequently in the data set.
Steps to find the mode:
1. Draw a frequency table for the data
2. The mode is the value with the highest frequency

Example to calculate the mode:
X     Frequency
8     3
12    7
16    12
Mode = 16 (the value with the highest frequency)
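For a quick check, Python's standard library computes all three measures of location; the sketch below reuses the raw data from E.g. 2 in the median slide (an illustration only):

```python
import statistics

data = [11, 11, 13, 14, 16, 17]        # raw data from E.g. 2 above

print(statistics.mean(data))           # arithmetic mean = 13.666...
print(statistics.median(data))         # 13.5: average of the two middle observations
print(statistics.mode(data))           # 11: the value occurring most frequently
```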
Bimodal and multimodal data

Mean, median and mode in normal and skewed distributions
Range
The range is the difference between the largest and smallest values => sort the data before computing the range.
Formula: Range = maximum value - minimum value

Variance
Variance from a population: σ² = Σ(X_i - μ)² / N
Variance from a sample: s² = Σ(x_i - x̄)² / (n - 1)
Example data sets:
A: 8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.5
B: 12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4

Empirical rule (for bell-shaped distributions):
◦ approximately 68% of all observations fall within 1 standard deviation of the mean, i.e. in the range (x̄ - 1s, x̄ + 1s)
◦ 95.45% of all observations fall within 2 standard deviations of the mean, i.e. in the range (x̄ - 2s, x̄ + 2s)
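As an illustration (not from the slides), the sample standard deviations of data sets A and B, and the one-standard-deviation range used by the empirical rule, can be computed with numpy:

```python
import numpy as np

A = [8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.5]
B = [12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4]

for name, data in (("A", A), ("B", B)):
    x = np.array(data)
    mean = x.mean()
    s = x.std(ddof=1)                  # sample standard deviation (divides by n - 1)
    print(name, "mean =", round(mean, 2), " s =", round(s, 2),
          " one-sd range =", (round(mean - s, 2), round(mean + s, 2)))
```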
Boxplots
Need the MEDIAN and QUARTILES to create a boxplot.
MEDIAN = middle of the observations, i.e. ½ of the way through the observations
QUARTILES = mark the quarter points of the observations, i.e. ¼ (Q1) and ¾ (Q3) of the way through the data [positions (n+1)/4 and 3(n+1)/4]
INTERQUARTILE RANGE (IQR) = Q3 - Q1
Whiskers: maximum length is 1.5*IQR; they stretch from the box to the furthest data point within this range
Points further out from the box are marked with stars and called outliers

Example: the boxplot of the heights of international students studying at UNSW shows the lower whisker, lower quartile, median, box, upper quartile and upper whisker, with heights ranging from about 150 to 200.
Coefficient of variation: comparing two data sets
                      A      B
Mean                  120    125
Range
Standard deviation    50     51
Coefficient of variation (cont.)
Formula: Coefficient of variation = standard deviation / mean = s / x̄
C of V_A = 0.417 and C of V_B = 0.408 => A is more spread out than B relative to its mean

Coefficient of skewness (C of S)
This measures the shape of a distribution.
There are several measures of skewness. Below is a common one: Pearson's coefficient of skewness.
Coefficient of skewness = 3 × (mean - median) / standard deviation
If C of S is nearly +1 or -1, the distribution is highly skewed
If C of S is positive => the distribution is skewed to the right (positive skew)
If C of S is negative => the distribution is skewed to the left (negative skew)
Activity 1
Summary statistics of two data sets are as follows:

                      Set 1: Ages of students studying at UNSW    Set 2: Wages of staff
Mean                  22.4839                                      294.3
Median                21                                           292.5
Standard deviation    6.3756                                       125.93

Compute Pearson's coefficient of skewness for these data.

Distribution shapes: [histograms of the ages (roughly 20 to 100) and the wages (roughly 100 to 600)]
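A short sketch of the Activity 1 calculation using the formula above (the inputs are the summary statistics in the table; the comments describe what the signs suggest, not answers given in the slides):

```python
# Pearson's coefficient of skewness = 3 * (mean - median) / standard deviation
def pearson_skew(mean, median, std):
    return 3 * (mean - median) / std

print(pearson_skew(22.4839, 21, 6.3756))    # Set 1 (ages): about 0.70, skewed to the right
print(pearson_skew(294.3, 292.5, 125.93))   # Set 2 (wages): about 0.04, roughly symmetric
```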
Covariance
Sample covariance: cov(X,Y) = [Σ_{i=1}^{n} X_i·Y_i - n·X̄·Ȳ] / (n - 1)

[Scatterplots: a positive linear relationship and a negative linear relationship between X and Y]
Values of covariance
If cov = 0, then as X changes, Y doesn't change => the variables are not linearly related.
[Scatterplot: zero linear relationship between X and Y]

Coefficient of Correlation
Also measures the strength of the linear relationship between X and Y.
Is bounded between -1 and +1.
ρ = COV(X,Y) / (σ_X·σ_Y) for a population;  r = cov(X,Y) / (s_X·s_Y) for a sample
If correlation equals…
If r = -1, perfect negative linear relationship
If r = +1, perfect positive linear relationship

Example
Calculate the covariance and correlation for the following data. (The worked table has columns x_i, y_i, x_i - x̄, (x_i - x̄)², y_i - ȳ, (y_i - ȳ)², and (x_i - x̄)(y_i - ȳ).)
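The data values for this example are not reproduced in the slides, so the sketch below uses made-up numbers purely to show how the covariance and correlation would be computed in Python:

```python
import numpy as np

# Hypothetical paired observations; substitute the values from the lecture example
x = np.array([1, 3, 5, 7, 9])
y = np.array([2, 4, 7, 9, 12])

cov_xy = np.cov(x, y, ddof=1)[0, 1]    # sample covariance of X and Y
r = np.corrcoef(x, y)[0, 1]            # correlation coefficient, between -1 and +1

print("cov(X,Y) =", round(cov_xy, 3))
print("r        =", round(r, 3))
```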
Section 2
Why do we need to study probability and probability distributions?
Continued
An event is a collection of one or more simple (individual) outcomes.
E.g. roll a die: event A = an odd number comes up. Then A = {1, 3, 5}.
In general, use the sample space S = {E1, E2, …, En}, where there are n possible outcomes.
The probability of an event Ei occurring on a single trial is written as P(Ei).

Probabilities
Probability of an event = Number of favorable outcomes / Total number of outcomes
For the sample space S, P(S) = 1
E.g. roll a die: sample space S = {1, 2, 3, 4, 5, 6}.
Examples of events:
- Obtain the number '1': A = {1} and P(A) = 1/6
- Obtain an odd number: B = {1, 3, 5} and P(B) = 1/2
- Obtain a number larger than 6: C = {} and P(C) = 0
- Obtain a number smaller than 7: D = {1, 2, 3, 4, 5, 6} and P(D) = 1
Two rules about probabilities

Probabilities of Combined Events

Joint Probabilities

Marginal Probabilities (1)
E.g.: mutual funds (http://www.howtosavemoney.com/how-do-mutual-funds-work/)
Some rules of probability
Additive rule (for the union of two events): P(A or B) = P(A) + P(B) - P(A and B)
Multiplicative rule (for the joint probability of two events): P(A and B) = P(A|B)·P(B) = P(B|A)·P(A)
Conditional probability: P(A|B) = P(A and B) / P(B)
Complement rule: A and its complement Ā satisfy P(A) + P(Ā) = 1; therefore P(Ā) = 1 - P(A)

Independence
Two events are independent if P(A|B) = P(A) or P(B|A) = P(B)
Note: if (and only if) A and B are independent, then P(A and B) = P(A)·P(B)
In that case P(A|B) = P(A and B)/P(B) = P(A)·P(B)/P(B) = P(A)
Discrete probability distributions
Definition: a table or formula listing all possible values that a discrete r.v. can take, together with the associated probabilities.
E.g. for our toss-three-coins example:
x        0     1     2     3
P(X=x)   1/8   3/8   3/8   1/8
(Check that the probabilities in the table sum to 1.)

More on discrete probability distributions
If x is a value taken by a r.v. X, then p(x) = P(X=x) = the sum of all the probabilities associated with the simple events for which X = x.
If a r.v. X can take values x_i, then
1. 0 ≤ p(x_i) ≤ 1 for all x_i
2. Σ_{x_i} p(x_i) = 1
Variance
Measures the spread/dispersion of a distribution.
Let X be a discrete random variable with values x_i that occur with probability p(x_i), and E(X) = μ.
The variance of X is defined as
σ² = E[(X - μ)²] = Σ_{all x_i} (x_i - μ)²·p(x_i)

Variance continued
A shortcut formula:
σ² = E(X²) - μ² = Σ_{all x_i} x_i²·p(x_i) - μ²
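A small sketch (illustration only) applying both variance formulas to the toss-three-coins distribution from the earlier slide:

```python
import numpy as np

# Number of heads in three coin tosses
x = np.array([0, 1, 2, 3])
p = np.array([1/8, 3/8, 3/8, 1/8])

assert np.isclose(p.sum(), 1.0)                 # probabilities must sum to one

mu = np.sum(x * p)                              # E(X)
var_def = np.sum((x - mu) ** 2 * p)             # sigma^2 = E[(X - mu)^2]
var_shortcut = np.sum(x ** 2 * p) - mu ** 2     # shortcut: E(X^2) - mu^2

print(mu, var_def, var_shortcut)                # 1.5 0.75 0.75
```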
Example continued
Outcome (S)   x   y
HHH           3   0
HHT           2   1
HTH           2   2
THH           2   1
TTH           1   1
THT           1   2
HTT           1   1
TTT           0   0

Bivariate probability distribution
           y=0    y=1    y=2    p_x(x)
x=0        1/8    0      0      1/8
x=1        0      2/8    1/8    3/8
x=2        0      2/8    1/8    3/8
x=3        1/8    0      0      1/8
p_y(y)     2/8    4/8    2/8    1
Correlation for the bivariate example: ρ_{x,y} = cov(x,y) / (σ_x·σ_y)
Covariance for the example
σ_xy = Σ_{i=1}^{m} Σ_{j=1}^{n} x_i·y_j·p(x_i, y_j) - μ_x·μ_y

The sum of two random variables
Consider two real estate agents.
Law of expected value and variance of the sum of two variables
E(X + Y) = E(X) + E(Y)
V(X + Y) = V(X) + V(Y) + 2·COV(X, Y)

Application of this: portfolio diversification and asset allocation
Notes about continuous pdfs
3) A continuous random variable has a mean and a variance!
The mean measures the location of the distribution; the variance measures the spread of the distribution.

The Normal Distribution
Bell-shaped, symmetric about µ, reaches its highest point at x = µ, and tends to zero as x → ±∞.
[Normal density curves f(x) with σ = 0.5, σ = 1 and σ = 2]
OR we require (2)

So, we need to find the area under the curve… (3)
Examples using tables (1)
1) P(Z < 1.5) = 0.9332 (from the table)

Examples using tables (2)
2) P(Z > 1) = 1 - P(Z < 1) = 1 - 0.8413 (from the tables) = 0.1587
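The same table look-ups can be reproduced numerically; here is a minimal Python sketch that uses the error function from the standard library (an alternative to the tables, not part of the slides):

```python
from math import erf, sqrt

def phi(z):
    """P(Z < z) for a standard normal Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(phi(1.5), 4))         # P(Z < 1.5) = 0.9332
print(round(1 - phi(1.0), 4))     # P(Z > 1)   = 0.1587
```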
In general
We know that Z = (X - μ)/σ ~ N(0,1).
So, P(X ≤ a) = P((X - μ)/σ ≤ (a - μ)/σ) = P(Z ≤ (a - μ)/σ), where Z ~ N(0,1).
Section 3: Sampling Distribution
Outline
Distribution of sample means
The central limit theorem
Reading materials: Chap 9 (Keller)
[Histograms of pizza delivery times for simulated samples]
Pizza time:  N = 50,   Mean = 17.256, Median = 17.041, StDev = 3.743
Pizza time:  N = 50,   Mean = 17.585, Median = 17.374, StDev = 3.872
Pizza time:  N = 1000, Mean = 17.934, Median = 17.627, StDev = 4.009
[Histograms of the 1000 sample averages and 1000 sample medians from samples of size 10]
median:  N = 1000, Mean = 17.757, Median = 17.804, StDev = 1.433
More random numbers
Another thousand datasets are generated from the same model, but this time each dataset has 25 observations.
[Histograms of the sample means, medians and standard deviations for the 1000 random samples of size 10 and of size 25]

The sampling distributions are becoming more and more symmetric and bell-shaped, and less variable, particularly those for the sample mean. The spread of the sampling distribution of the sample mean decreases as the sample size increases.
This is the Central Limit Theorem
If X is a random variable with mean µ and variance σ², then, in general,
X̄ ≈ N(µ, σ²/n)
and Z = (X̄ - µ) / (σ/√n) → N(0,1) as n → ∞.

So, how large does n need to be?
Generally, it depends on the original distribution of X.
◦ If X has a normal distribution, then the sample mean has a normal distribution for all sample sizes.
◦ If X has a distribution that is close to normal, the approximation is good for small sample sizes (e.g. n = 20).
◦ If X has a distribution that is far from normal, the approximation requires larger sample sizes (e.g. n = 50).
Activity 1
The average height of Vietnamese women is 1.6m, with a standard deviation of 0.2m. If I choose 25 women at random, what is the probability that their average height is less than 1.53m?
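A sketch of how the activity can be solved with the CLT (the standardisation follows the Z formula above; the numeric answer is an illustration, not given in the slides):

```python
from math import erf, sqrt

mu, sigma, n = 1.6, 0.2, 25
se = sigma / sqrt(n)                  # standard deviation of the sample mean = 0.04
z = (1.53 - mu) / se                  # standardise: z = -1.75

p = 0.5 * (1 + erf(z / sqrt(2)))      # P(Z < -1.75)
print(round(p, 4))                    # about 0.04
```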
Estimation: Outline
Concepts of estimation: point and interval estimators; unbiasedness and consistency
Estimating the population mean when the population variance is known
Estimating the population mean when the population variance is unknown
Selecting the sample size
Reading materials: Chap 10 (Keller)
As n → ∞, the distribution of the sample mean becomes Normal, with centre µ and standard deviation σ/√n. This happens regardless of the shape of the original population.
i.e. X̄ follows a Normal distribution with E(X̄) = µ and var(X̄) = σ²/n.
If the distribution of X is normal, then for all n the sample mean will follow a normal distribution.
If the distribution of X is very far from normal, then we will need a large n before we see the normality of the distribution of the sample mean.
In all cases, as n gets larger, the distribution of the mean gets more normal.
Estimators
There are two types of estimators:
Point estimate: a single value or point, e.g. a sample mean of 4 is a point estimate of the population mean µ.
Interval estimate: draws inferences about a population by estimating a parameter using an interval (range).
• E.g. we are 95% confident that the unknown mean score lies between 56 and 78.

Desirable qualities of estimators
We want our estimators to be precise and accurate.
Accurate: on average, our estimator is centred on the true value
Precise: our estimates are close together
The sample mean is a precise and accurate estimator of the population mean. (Sometimes, accurate and precise together is referred to as unbiased.)
P(x̄ - 1.96·σ/√n ≤ μ ≤ x̄ + 1.96·σ/√n) = 0.95
This is called a 95% confidence interval for μ.
What this means:
• In repeated sampling, 95% of the intervals created this way would contain μ, and 5% would not.
We can change how confident we are by changing the 1.96:
• Use 1.64 to get a 90% confidence interval
• Use 2.57 to get a 99% confidence interval

Example 1
Suppose we know from experience that a random variable X ~ N(μ, 1.66), and for a sample of size 10 from this population the sample mean is 1.58.
Now,
P(x̄ - 1.96·σ/√n ≤ μ ≤ x̄ + 1.96·σ/√n) = 0.95
P(1.58 - 1.96·√(1.66/10) ≤ μ ≤ 1.58 + 1.96·√(1.66/10)) = 0.95
P(0.78 ≤ μ ≤ 2.38) = 0.95
Interpretation: if the experiment were carried out multiple times, 95% of the intervals created in this way would contain μ.
Lower Confidence Limit: 0.78; Upper Confidence Limit: 2.38

General notation
In general, a 100(1-α)% confidence interval estimator for μ is given by
x̄ ± Z_{α/2}·σ/√n
Notation:
Confidence level: 100(1-α)%, the probability that the parameter falls into the CI
CI: x̄ ± Z_{α/2}·σ/√n
LCL: x̄ - Z_{α/2}·σ/√n;  UCL: x̄ + Z_{α/2}·σ/√n
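A minimal sketch of the Example 1 interval in Python, assuming (as in the example) that 1.66 is the known population variance:

```python
from math import sqrt

# Example 1: X ~ N(mu, 1.66), sample of size 10, sample mean 1.58
x_bar, var, n = 1.58, 1.66, 10
z = 1.96                                   # use 1.64 for 90%, 2.57 for 99%

half_width = z * sqrt(var / n)
lcl, ucl = x_bar - half_width, x_bar + half_width
print(round(lcl, 2), round(ucl, 2))        # 0.78 2.38
```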
Factors influencing the width of the interval

IMPORTANT!
About the t-distribution (1)
Found by Gosset, who published under the pseudonym "Student".
Called "Student's t-distribution".
It is symmetric around 0 and mound-shaped (like a normal), but has a higher variance than the normal distribution.
The higher the degrees of freedom, the more normal the curve looks.

About the t-distribution (2)
[Figure: the standard normal (Z) density compared with t-distributions with df = 13 and df = 5; the t curves are bell-shaped and symmetric but more spread out]
Determining the sample size
Suppose that before we gather data, we know that we want our average to be within a certain distance of the true population value.
We can use the CLT to find the minimum sample size required to meet this condition, if the standard deviation of the population is known.

Sample size required
Example 4: Assume that the standard deviation of a population is 5. I want to estimate the true population mean to within a range of 3, with 99% certainty.
Step 1: set up the equation needed:
P(|X̄ - µ| ≤ 3) = 0.99
Activity 1
Suppose that we know the standard deviation of men's heights is 10cm. How many men should we measure to ensure that the sample mean we obtain is no more than 2cm from the population mean, with 99% confidence?
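A sketch of the sample-size calculation implied by the CLT, n = (z·σ/E)² rounded up, applied to Example 4 and to Activity 1 (z = 2.575 is used here for 99% confidence; the slides round it to 2.57):

```python
from math import ceil

def sample_size(sigma, error, z):
    """Smallest n so the sample mean is within `error` of mu at the stated confidence."""
    return ceil((z * sigma / error) ** 2)

print(sample_size(5, 3, 2.575))     # Example 4: about 19 observations
print(sample_size(10, 2, 2.575))    # Activity 1: about 166 men
```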
Section 4: Hypothesis Testing
Outline
Hypothesis testing: basic concepts
Testing µ when σ is known
Testing µ when σ is unknown
Testing for the difference of two means (independent samples)
Reading materials: Chap 11, 12 (Keller)
Analogy continued – outcomes (1)
Criminal law                      Hypothesis testing
Accused is acquitted              Choose H0
Accused is convicted              Choose HA
Convict an innocent person        Type 1 error
Acquit a guilty person            Type 2 error
"Beyond reasonable doubt"         "95% certainty of making the right decision"

Analogy continued – outcomes (2)
If we say we have a 95% chance of making the right decision, it means we have a 5% chance of making an error. But what type of error do we have a 5% chance of making?
A Type 1 error is considered to be more serious than a Type 2 error. Therefore, by convention, we set up testing so that the probability of making a Type 1 error, α, is small.
Ideally, we would also like the probability of making a Type 2 error, β, to be small. But reducing the chance of a Type 1 error increases the chance of a Type 2 error.
Therefore, we choose to set α to 5% (i.e. a 5% chance that we reject H0 when it is true), or some other fixed, low probability, and ignore β.
Applying the rules to example 1

Recap: The Central Limit Theorem
P-value approach (by hand or computer)
The p-value is the probability of getting our test statistic, or one further away from the middle, if the null is true.
Draw a diagram: it is the area more extreme than our test statistic; i.e. for the last example, the p-value is P(Z > 2.46).
A small p-value is evidence against the null hypothesis.
Rule:
If p-value < α => reject the null hypothesis
If p-value > α => do not reject the null hypothesis

Applying this to example 1 (by hand)
From the standard normal tables:
P(Z > 2.46) = 1 - P(Z < 2.46) = 1 - 0.9931 = 0.0069
This means that the probability of observing a sample mean at least as large as 178, for a population whose mean is 170, is 0.0069: extremely small (much smaller than 0.05). Therefore, we reject the null and conclude that the mean monthly account is higher than $170 (the same conclusion as we reached using the rejection region approach).
For a 5% significance level, we set up a rejection region:
Z = (X̄ - µ)/(σ/√n) < -1.96  or  Z = (X̄ - µ)/(σ/√n) > 1.96
The acceptance region is: -1.96 ≤ (X̄ - µ)/(σ/√n) ≤ 1.96
Then the 95% CI for µ is: X̄ - 1.96·σ/√n < µ < X̄ + 1.96·σ/√n

For example 1, the 95% confidence interval for μ is:
178 - 1.96·(65/√400) < μ < 178 + 1.96·(65/√400)
171.63 < μ < 184.37
Because the hypothesised value µ = 170 does not lie within this CI, we reject the null in favour of the alternative.
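Pulling the pieces of Example 1 together (the figures x̄ = 178, µ0 = 170, σ = 65 and n = 400 are read off the slides above), a short Python sketch of the z-test, p-value and confidence interval:

```python
from math import erf, sqrt

mu0, x_bar, sigma, n = 170, 178, 65, 400

z = (x_bar - mu0) / (sigma / sqrt(n))            # test statistic = 2.46
p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))       # one-sided p-value: P(Z > 2.46)

half_width = 1.96 * sigma / sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)    # 95% CI: (171.63, 184.37)

print(round(z, 2), round(p_value, 4), ci)
```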
So if the alternative is ">"
Right-tailed test
The rejection region will be Z > +Z_α
The p-value will be P(Z > test statistic)

If the alternative is "<"
Left-tailed test
The rejection region will be Z < -Z_α
The p-value will be P(Z < test statistic)
Testing µ when σ is unknown: use the test statistic
t = (X̄ - µ) / (s/√n)
It follows the t-distribution with n-1 degrees of freedom (use the t-table to find the rejection region or p-value).

Conclusion
Section 5: Regression analysis
Outline
Simple Regression:
- Form of the general model
- Procedure in SPSS
- Interpretation of SPSS output
- Testing significance of a slope/intercept
- Assumption checking
Steps in regression analysis
1. Analyse the nature of the relationship between the independent and dependent variables
2. Make a scatterplot
3. Formulate the mathematical model that describes the relationship between the independent and dependent variables
4. Estimate and interpret the coefficients of the model
5. Test the model

Simple linear regression: notation
Simple regression has one predictor.
We have n observations.
X_i = value of the independent variable for the ith observation
Y_i = value of the dependent variable for the ith observation
s_x = sample standard deviation of the independent variable
s_y = sample standard deviation of the dependent variable
X̄ is the sample average of the independent variable
Ȳ is the sample average of the dependent variable
The simple linear regression model (with residual term):
Y_i = β_0 + β_1·X_i + ε_i
[Scatterplot: Attitude Towards City versus duration of residence, with the fitted line and its slope indicated]
Estimate the parameters
Step 4: estimate the parameters (slope and intercept)
Ŷ_i = β̂_0 + β̂_1·X_i
We can calculate estimates of the slope and intercept using formulae derived from ordinary least squares (OLS):
β̂_1 = [n·Σ X_i·Y_i - (Σ X_i)(Σ Y_i)] / [n·Σ X_i² - (Σ X_i)²]
β̂_0 = Ȳ - β̂_1·X̄

Applying this to the example
Slope = 16.333 / 27.697 = 0.5897
Intercept = 6.5833 - 0.5897 × 9.333 = 1.0796
Fitted equation: Ŷ_i = 1.0796 + 0.5897·X_i
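The original observations behind the attitude-towards-city example are not reproduced in the slides, so the sketch below applies the same OLS formulae to made-up data purely to show the mechanics:

```python
import numpy as np

# Hypothetical data: duration of residence (x) and attitude towards the city (y)
x = np.array([2, 4, 6, 8, 10, 12, 14, 16], dtype=float)
y = np.array([2.1, 3.0, 4.8, 5.1, 6.9, 7.2, 9.3, 10.0])

n = len(x)
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
b0 = y.mean() - b1 * x.mean()                    # intercept = y-bar - slope * x-bar

print(f"fitted equation: y_hat = {b0:.4f} + {b1:.4f} x")
```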
β̂_1 = 0.5897 means that for each additional year of staying in the city, your attitude towards the city increases by an average of 0.5897 points.
β̂_0 = 1.0796 is the predicted value of Y when X = 0. This means that factors unrelated to the duration of residence make your attitude towards the city equal to 1.0796 points.

We can test the significance of the linear relationship:
H0: β_1 = 0
HA: β_1 ≠ 0
Test statistic:
T = (β̂_1 - β_1) / s_{β̂_1}, where s_{β̂_1} is the standard error of β̂_1.
Step 6: Determine the strength and significance of the association

Applying this to the example
Model – general form
Y = β_0 + β_1·X_1 + β_2·X_2 + ε

Interpreting a Partial Regression Coefficient
Points about R²
Now called the coefficient of multiple determination.
It will go up as we add more explanatory terms to the model, whether they are "important" or not.
Often we use the "adjusted R²", which compensates for adding more variables, so it is lower than R² when the added variables are not "important".

Significance Testing
We can test two different things:
1. The significance of the overall regression
2. The significance of specific partial regression coefficients
Check residuals
Assumptions made:
- Error terms are normally distributed
- Error terms have mean 0 and constant variance
- Error terms are independent
Definition: a residual (also called an error term) is the difference between the observed response value Y_i and the value predicted by the regression equation, Ŷ_i. (It is the vertical distance between the point and the line.)

Error terms normally distributed
Can be checked by looking at a histogram of the residuals: look for a bell-shaped distribution.
Also a normal probability plot: look for a straight line.
For preference, use standardised residuals, which have a standard deviation of 1.
Error terms have mean 0, constant variance
Checked by using plots of residuals vs predicted values, and residuals vs the independent variables.
Look for a random scatter of points around zero.
If not, this may indicate that linear regression is not appropriate; you may need to transform the data.

Error terms are independent
Check in the previous plots; also in plots of residuals vs time/order.
Look for a random scatter of residuals.
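A hedged sketch (reusing the made-up data from the regression sketch above) of the three residual checks: a histogram, residuals versus fitted values, and residuals versus observation order:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([2, 4, 6, 8, 10, 12, 14, 16], dtype=float)
y = np.array([2.1, 3.0, 4.8, 5.1, 6.9, 7.2, 9.3, 10.0])

n = len(x)
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
b0 = y.mean() - b1 * x.mean()

fitted = b0 + b1 * x
resid = y - fitted
std_resid = resid / resid.std(ddof=2)            # crude standardisation of the residuals

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(std_resid)                          # look for a bell shape
axes[1].scatter(fitted, std_resid)               # look for random scatter around zero
axes[2].plot(std_resid, marker="o")              # residuals in observation order
plt.show()
```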
Example
[Residual plots for Attitude Towards City: a normal probability plot of the residuals, standardized residuals versus fitted values, a histogram of the standardized residuals, and standardized residuals versus observation order]
Section 6: Introduction to Time Series
Outline
Overview of time series
Stationarity
Autoregressive processes
Determining process order
Reading materials: Chap 20 (Keller)
A process is strictly stationary if the joint distribution of (y_{t1}, y_{t2}, ..., y_{tm}) is the same as the joint distribution of (y_{t1+h}, y_{t2+h}, ..., y_{tm+h}) for any h.
The LLN and CLT hold for a time series if the process satisfies stationarity conditions.
A process is weakly (covariance) stationary if the covariances between y_t and y_{t+h}, for any h, do not depend upon t.
[Autocorrelation function plot for lags 1 to 80]
The partial autocorrelation at lag 3 is the correlation between y_t and y_{t-3}, after controlling for the effects of y_{t-1} and y_{t-2}.
Note: at lag 1, the autocorrelation and partial autocorrelation coefficients are equal, since there are no intermediate lag effects to eliminate.
[Partial autocorrelation function plot for lags 1 to 80]

Choosing the lag order with information criteria:
AIC = e^{2k/n} · (RSS/n), where k = lag order and n = number of observations
SIC = n^{k/n} · (RSS/n)
Technique:
1. Fit a model with k lags. Calculate the AIC and SIC;
2. Fit another model with k+1 or k-1 lags; and,
3. The best model will have the lowest AIC/SIC.
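A sketch of the lag-selection technique: fit AR(k) models by least squares to a series and compute the AIC and SIC with the formulas above. The simulated AR(2) data and the plain least-squares fit below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def fit_ar(y, k):
    """Fit an AR(k) model by least squares; return its residual sum of squares and n."""
    Y = y[k:]
    X = np.column_stack([np.ones(len(Y))] + [y[k - j:-j] for j in range(1, k + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    rss = np.sum((Y - X @ beta) ** 2)
    return rss, len(Y)

# Simulated AR(2) series, just to illustrate the procedure
rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

for k in (1, 2, 3, 4):
    rss, n = fit_ar(y, k)
    aic = (rss / n) * np.exp(2 * k / n)     # AIC = e^{2k/n} * RSS / n
    sic = (rss / n) * n ** (k / n)          # SIC = n^{k/n} * RSS / n
    print(k, round(aic, 3), round(sic, 3))  # the best lag order has the lowest AIC/SIC
```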