
12 Correlation and Significance

Uploaded by 990293kwi
Copyright © All Rights Reserved

Correlation

Q&A
• Are SS_res and SSE the same thing?

• Is dividing by the sum of squared deviations of x from its mean, Σ(x − x̄)², in the denominator done for standardization?

Reviews
1. What is a probability distribution? What is it used for?
2. What are degrees of freedom? What are they used for?
3. What is the difference between the z-distribution and the t-distribution? What are their applications?
4. What is two-sample testing used for?
5. What is the 5% significance level?
6. What is the null hypothesis?
7. What is the p-value? What is it used for?
8. What is linear regression used for?
9. How can we judge whether a dataset is normally distributed?
Correlation and covariance
• In statistics, correlation is a linear statistical relationship between two random variables.
• It measures the degree to which two random variables move/change in relation to each other.
• It is a statistical analysis of scatter plots.
• It has a coefficient (called the correlation coefficient, R or r) ranging from −1 to 1.
• The correlation coefficient (R or r) is calculated from the covariance.
• The correlation is the covariance normalized by the standard deviations of the two random variables.

Correlation vs linear regression?
(Think and discuss)
• What is the difference between R² (coefficient of determination) and R (correlation coefficient)?


$$R = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$$
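A minimal sketch of the summation formula above, written with NumPy. The function name `pearson_r` and the sample arrays are illustrative assumptions, not data from these slides; the result is cross-checked against NumPy's built-in `numpy.corrcoef`.

```python
import numpy as np

def pearson_r(x, y):
    """Correlation coefficient R computed directly from the summation formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()                     # deviations of x from its mean
    dy = y - y.mean()                     # deviations of y from its mean
    numerator = np.sum(dx * dy)
    denominator = np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dy**2))
    return numerator / denominator

# Hypothetical example data (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
print(round(r, 4))
# NumPy's built-in: off-diagonal entry of the 2x2 correlation matrix
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))
```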

More about the coefficient of determination (R²)
• In statistics, the coefficient of determination measures how much of the variation in one variable (y) can be explained by the variation in a second variable (x).
• It is used to see how well one variable (y) can be predicted from the second variable (x).
• This quantity measures the proportion of variability explained by the fitted model.
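A minimal sketch, assuming a least-squares straight line fitted with `numpy.polyfit`: R² is computed as 1 − SS_res/SS_tot, i.e. the share of the total variability about the mean that the fitted line removes. The data and the name `r_squared` are hypothetical. For simple linear regression with an intercept, this R² equals the square of the correlation coefficient R, and flipping the sign of y changes the sign of R but not R².

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination for the least-squares line y = b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, deg=1)       # returns [slope, intercept]
    y_hat = b0 + b1 * x                    # fitted values
    ss_res = np.sum((y - y_hat) ** 2)      # residual (unexplained) sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares about the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical example data (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_neg = [-v for v in y]                    # same spread, opposite direction

R2 = r_squared(x, y)
r = np.corrcoef(x, y)[0, 1]
print(R2, r**2)                            # equal for simple linear regression
print(r_squared(x, y_neg), np.corrcoef(x, y_neg)[0, 1])  # same R^2, negative R
```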

More about the coefficient of determination (R²)
• An example:
• A coefficient of determination of 0.913 suggests that the fitted model explains 91.3% of the observed variability in the data.

• [Probability & Statistics for Engineers & Scientists, Ninth Edition, Walpole et al., 2011]

Correlation coefficient (R) vs coefficient of determination (R²)
• R²: a measure of the proportion of variability explained by the fitted model.

• R: a measure of the strength/degree of the relationship between two variables.

Examples (R from −1 to 1)

Examples (R from −1 to 1)
What is covariance?
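• Think about some words that begin with “co-”.
• In statistics, covariance is a measure of the joint variability of two random variables.
• Covariance extends the idea of variance from one random variable to two: it measures how the two variables vary together (the covariance of a variable with itself is its variance).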

Calculating covariance

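• Calculate the variance from sample data (d.f. = N − 1):
$$\operatorname{var}(x) = \sigma_x^2 = \frac{\sum_i (x_i - \bar{x})^2}{N - 1}$$

• Calculate the covariance of two sample variables (d.f. = N − 1):
$$\sigma_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$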
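A short sketch of both formulas with the N − 1 denominator, checked against NumPy. The function names and data are hypothetical. Note that `numpy.var` divides by N unless `ddof=1` is passed, while `numpy.cov` divides by N − 1 by default.

```python
import numpy as np

def sample_variance(x):
    """Sample variance with d.f. = N - 1."""
    x = np.asarray(x, dtype=float)
    return np.sum((x - x.mean()) ** 2) / (len(x) - 1)

def sample_covariance(x, y):
    """Sample covariance of paired data with d.f. = N - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Hypothetical example data (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print(sample_variance(x), np.var(x, ddof=1))        # ddof=1 -> divide by N - 1
print(sample_covariance(x, y), np.cov(x, y)[0, 1])  # np.cov uses N - 1 by default
print(sample_covariance(x, x))                      # covariance with itself = variance
```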
Calculating correlation (REVIEW)
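• The correlation is the covariance normalized by the standard deviations of the two random variables.

$$R = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\,\sigma_x \sigma_y} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$$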
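A sketch of the first form, R = σ_xy/(σ_x σ_y), using sample (N − 1) statistics so the N − 1 factors cancel and the result matches the summation form. The metre/centimetre data are a hypothetical illustration of why the normalization matters: rescaling x changes the covariance but leaves R unchanged, because R is dimensionless.

```python
import numpy as np

def correlation_from_covariance(x, y):
    """R as covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov_xy = np.cov(x, y)[0, 1]   # sample covariance (N - 1)
    s_x = np.std(x, ddof=1)       # sample standard deviations (N - 1)
    s_y = np.std(y, ddof=1)
    return cov_xy / (s_x * s_y)

# Hypothetical data: the same lengths expressed in metres and in centimetres
x_m = [1.0, 2.0, 3.0, 4.0, 5.0]
x_cm = [100.0 * v for v in x_m]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print(np.cov(x_m, y)[0, 1], np.cov(x_cm, y)[0, 1])   # covariance depends on units
print(correlation_from_covariance(x_m, y),
      correlation_from_covariance(x_cm, y))          # R does not
```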

Standard Coordinates of Histograms (REVIEW)
• Assume we have a dataset {x} of N data items, x_1, …, x_N.
• First make the data dimensionless (normalized).
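A sketch of normalizing to standard coordinates, assuming the usual definition x̂_i = (x_i − mean)/std with the standard deviation taken with N − 1 to match the covariance slide (some textbooks use N; R comes out the same as long as one convention is used throughout). The data are hypothetical. Once both variables are in standard coordinates, R is simply their (N − 1)-averaged product.

```python
import numpy as np

def standard_coordinates(x):
    """Convert data to dimensionless standard coordinates: (x - mean) / std."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / np.std(x, ddof=1)   # N - 1 convention, as in this deck

# Hypothetical example data (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
x_hat = standard_coordinates(x)
y_hat = standard_coordinates(y)

print(x_hat.mean(), np.std(x_hat, ddof=1))   # approximately 0 and 1
r = np.sum(x_hat * y_hat) / (len(x) - 1)     # correlation from standard coordinates
print(r, np.corrcoef(x, y)[0, 1])
```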

Confusion Caused by Correlation
(Using correlation incorrectly) -- REVIEW
• A high correlation only tells us that when one variable is large, the other tends to be large (positive correlation) or small (negative correlation).
• But correlation DOES NOT mean that a change in one variable causes (or necessarily causes) the other to change. (causation)
• Examples:
(1) Shoe size vs reading skills
(2) Fertilizer vs plant size
(3) Ocean temperature vs ocean salinity
(4) Ocean temperature vs ocean current speed
Confusion of Correlation Coefficients -- REVIEW
(Using correlation coefficients incorrectly)
• Besides calculating correlation coefficients, we need to do significance testing to check how good/reliable the correlation analysis is (we will discuss this next semester (maybe later) after learning probability).

$$R = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$$

Critical value of r at the 5% significance level:

$$r_c = \sqrt{\frac{t^2}{t^2 + n - 2}}$$

n − 2: d.f.
t-table
[Table: critical values of t; read the row for d.f. = n − 2]

$$r_c = \sqrt{\frac{t_c^2}{t_c^2 + n - 2}} = 0.8783$$
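A sketch of the same calculation with `scipy.stats.t.ppf` standing in for the printed t-table (two-tailed, 5% level, d.f. = n − 2). The helper names are hypothetical. Assuming n = 5 (d.f. = 3, t_c ≈ 3.182), it reproduces the 0.8783 above; the slide's actual d.f. was in the table image, so that choice is an assumption that happens to match the value.

```python
import math
from scipy import stats

def critical_r(n, alpha=0.05):
    """Two-tailed critical value of r for n paired samples (d.f. = n - 2)."""
    df = n - 2
    t_c = stats.t.ppf(1 - alpha / 2, df)      # critical t, replaces the printed t-table
    return math.sqrt(t_c**2 / (t_c**2 + df))

def is_significant(r, n, alpha=0.05):
    """True if |r| exceeds the critical value at the given significance level."""
    return abs(r) > critical_r(n, alpha)

print(round(critical_r(5), 4))                          # 0.8783
print(is_significant(0.95, 5), is_significant(0.80, 5))  # True False
```

As n grows, r_c shrinks, so a weaker correlation can still be significant with more data.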

Homework 11 (due before this Friday)

• Link: https://forms.gle/pcxJyJ4kpohCDAdA6

HW 11
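• Find the correlation coefficient (R) between x and y in the dataset below.

$$R = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$$

$$r_c = \sqrt{\frac{t_c^2}{t_c^2 + n - 2}}$$

• Based on the degrees of freedom of the data, determine whether the computed correlation coefficient (R) is statistically significant.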

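Supplementary Exercise
• Using the x and y data below, find (a) their correlation coefficient, (b) their coefficient of determination, and (c) the equation of the linear regression line. Are these statistical results significant? Why?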
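A sketch of the whole workflow using `scipy.stats.linregress`, which returns the slope, intercept, correlation (`rvalue`) and a two-sided p-value for a zero slope. The x and y values here are hypothetical placeholders because the exercise data are not reproduced in this text. The significance check compares |R| with the critical r at the 5% level; since `linregress` uses the same t-distribution with n − 2 d.f., the p-value test reaches the same decision.

```python
import numpy as np
from scipy import stats

# Hypothetical placeholder data, not the exercise's values
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 4.1, 5.9, 8.3, 9.7, 12.2])

res = stats.linregress(x, y)
r = res.rvalue                   # (a) correlation coefficient R
r2 = r**2                        # (b) coefficient of determination R^2
print(f"(a) R = {r:.4f}")
print(f"(b) R^2 = {r2:.4f}")
print(f"(c) y = {res.intercept:.3f} + {res.slope:.3f} x")   # least-squares line

# Significance at the 5% level with d.f. = n - 2
n = len(x)
t_c = stats.t.ppf(0.975, n - 2)
r_c = np.sqrt(t_c**2 / (t_c**2 + n - 2))
print(f"|R| > r_c ({r_c:.4f})? {abs(r) > r_c}; p-value = {res.pvalue:.2g}")
```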

se??

se?? (Standard error)

• Comparing the sample mean (x̄) and the population mean (μ)

Standard error (of the estimated sample mean): $SE = s/\sqrt{n}$

Standard error (of the estimated sample mean)

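[Figure: t-distribution with critical values $-t_{\alpha/2}$ and $t_{\alpha/2}$]

Confidence interval: $\bar{x} - t_{\alpha/2}\,\dfrac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$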
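A sketch of the standard error and the t-based confidence interval from summary statistics, with `scipy.stats.t.ppf` supplying $t_{\alpha/2}$ for d.f. = n − 1. The function names are hypothetical. The usage line plugs in the second discussion example (mean 0.47, s 0.56); n = 4 is inferred from SE = 0.28 and t = 3.182 (d.f. = 3), not stated on that slide.

```python
import math
from scipy import stats

def standard_error(s, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def confidence_interval(mean, s, n, level=0.95):
    """Two-sided t-based confidence interval for the population mean mu."""
    se = standard_error(s, n)
    t_half = stats.t.ppf(1 - (1 - level) / 2, n - 1)   # t_{alpha/2}, d.f. = n - 1
    return mean - t_half * se, mean + t_half * se

# Second discussion example; n = 4 is an inference (see lead-in)
low, high = confidence_interval(0.47, 0.56, 4)
print(f"{low:.2f} < mu < {high:.2f}")   # -0.42 < mu < 1.36
```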
Previous Exercise 01 (discussion)
• John recorded a set of data for a sine wave, as shown below. Help him conduct a two-tailed t-test for the mean of the data. Is the mean of the data statistically trustworthy/representative? (The mean and standard deviation of the data are about 0.097 and 0.639, respectively.)

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

John's data:
t (s)   h (m)
23      0.39
25      0.42
34      0.55
70      0.93
200     -0.34
229     -0.75
501     0.62
509     0.51
593     -0.79
685     -0.57

[Figure: "John Data" plotted with a "Sine wave", h (m) vs t (s)]
Example (discussion)
• (The mean and standard deviation of the data are about 0.097 and 0.639, respectively; n = 10)
• $SE = s/\sqrt{n} \approx 0.2$
• $t_{\alpha/2} = 2.262$ (d.f. = 9)
• $-0.36 < \mu < 0.55$ (95%)
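A sketch that reproduces these numbers from John's h (m) values in the table (the times are not needed for a test on the mean). μ0 = 0 is an assumption: the mean of an ideal sine wave over whole periods. The hand calculation is cross-checked with `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

# John's heights h (m) from the slide's table
h_m = np.array([0.39, 0.42, 0.55, 0.93, -0.34, -0.75, 0.62, 0.51, -0.79, -0.57])

n = len(h_m)
mean = h_m.mean()
s = h_m.std(ddof=1)                  # sample standard deviation (d.f. = n - 1)
se = s / np.sqrt(n)                  # standard error of the mean

mu0 = 0.0                            # assumed null hypothesis: the sine wave's mean is 0
t_stat = (mean - mu0) / se
t_crit = stats.t.ppf(0.975, n - 1)   # two-tailed, 5% significance level

print(f"mean={mean:.3f}, s={s:.3f}, SE={se:.3f}")
print(f"t={t_stat:.3f}, t_crit={t_crit:.3f}")
print(f"95% CI: {mean - t_crit * se:.2f} < mu < {mean + t_crit * se:.2f}")

# The same test with SciPy's one-sample t-test
result = stats.ttest_1samp(h_m, popmean=mu0)
print(f"p-value = {result.pvalue:.3f}")
```

Here |t| is about 0.48, well below 2.262, so the data do not contradict a true mean of 0; the wide interval shows the sample mean is not a precise estimate with only 10 points.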

t-table

Example (discussion)
• (The mean and standard deviation of the data are about 0.47 and 0.56, respectively)
• $SE = s/\sqrt{n} \approx 0.28$
• $t_{\alpha/2} = 3.182$ (d.f. = 3)
• $-0.42 < \mu < 1.36$ (95%)

Example comparison (discussion)
• (The mean and standard deviation of the data are about 0.097 and 0.639, respectively; n = 10)
• $SE = s/\sqrt{n} \approx 0.2$
• $t_{\alpha/2} = 2.262$ (d.f. = 9)
• $-0.36 < \mu < 0.55$ (95%)

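Question (application & discussion)
• If the mean and standard deviation of the data are about 0.097 and 0.7, respectively, how many data points do you need to get a standard error < 0.1, assuming that the natural signal John observed is unknown?
• $s/\sqrt{n} = 0.1$
• $\sqrt{n} = 0.7/0.1 = 7$
• $n = 49$ (this gives SE = 0.1 exactly, so SE < 0.1 requires n > 49, i.e. at least 50 data points)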
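A sketch of the sample-size calculation, solving s/√n < SE for n. The function name `min_samples` is hypothetical. Values are passed as strings and converted with `fractions.Fraction` so the boundary case (0.7/√49 = 0.1 exactly) is not distorted by floating-point rounding; `strict=False` returns the slide's 49, where SE equals the target.

```python
import math
from fractions import Fraction

def min_samples(std, target_se, strict=True):
    """Smallest n with std / sqrt(n) < target_se (<= target_se if strict=False).

    std and target_se are decimal strings such as "0.7" so that Fraction
    keeps the arithmetic exact.
    """
    ratio_sq = (Fraction(std) / Fraction(target_se)) ** 2   # n at which SE == target
    n = math.ceil(ratio_sq)
    if strict and n == ratio_sq:   # SE would equal the target exactly, so go one higher
        n += 1
    return n

print(min_samples("0.7", "0.1"))                 # 50: SE strictly below 0.1
print(min_samples("0.7", "0.1", strict=False))   # 49: SE exactly 0.1
```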
