Cheat Sheet 2 in 1-1

Chapter 1
Rate of return (for CV): 𝑅 = 𝑋/𝑐, where X is the cash flow [a uniform distribution; more in Chapter 4] and c is the cost of the project [a constant]

Variable: 1) Categorical (ordinal = ordered; nominal = not ordered) → bar chart (count) // pie chart (percentage) // two-way (contingency) table
[Area principle for bar chart & pie chart: the area of a plot that shows data should be PROPORTIONAL TO the amount of data — beware décor and baselines not at 0]
Variable: 2) Numerical → histogram // boxplot // scatterplot | Boxplot: shows left- vs right-skew || Detailed boxplot: Md; IQR; whiskers at ±1.5·IQR
Normal curve: symmetric, bell-shaped // Unimodal // Bimodal // Left-skewed // Right-skewed || Mean > Median → right-skewed // Mean < Median → left-skewed
Time series vs. cross-sectional data || Independent (explanatory) variable vs. dependent (response) variable
Q0 = Min // Q1 = ((n+1)/4)th // Q2 = ((n+1)/2)th // Q3 = (3(n+1)/4)th // Q4 = Max → 5-number summary | IQR = Q3 − Q1
Robust = insensitive to a few extreme observations | Robust: median (centre); IQR (spread) | Not robust: mean (centre); SD (spread)

Bernoulli distribution: 1) 2 possible outcomes; 2) fixed probability; 3) independence; 4) n = 1
Probability: 𝑃(𝑋 = 1) = 𝑝 ; 𝑃(𝑋 = 0) = 1 − 𝑝 = 𝑞 | Mean [pop. mean]: 𝐸(𝑋) = 𝑝 | Variance [pop. var]: 𝑉𝑎𝑟(𝑋) = 𝑝(1 − 𝑝) = 𝑝𝑞 | Standard deviation: √(𝑝𝑞)
Binomial distribution: 1) 2 possible outcomes; 2) fixed probability; 3) independence; 4) n identical trials
Probability: 𝑝(𝑟) = 𝐶(𝑛, 𝑟) ∙ 𝑝ʳ ∙ (1 − 𝑝)ⁿ⁻ʳ = 𝐶(𝑛, 𝑟) ∙ 𝑝ʳ ∙ 𝑞ⁿ⁻ʳ | Mean: 𝐸(𝑋) = 𝑛𝑝 | Variance: 𝜎² = 𝑛𝑝(1 − 𝑝) = 𝑛𝑝𝑞 | Standard deviation: √(𝑛𝑝𝑞)
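The Binomial identities above can be sanity-checked numerically; a minimal pure-Python sketch (the parameters n = 10, p = 0.3 are arbitrary illustrative choices):

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) = C(n, r) * p^r * q^(n - r), with q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

n, p = 10, 0.3
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

# Mean and variance computed from the pmf should match n*p and n*p*q.
mean = sum(r * pr for r, pr in enumerate(pmf))
var = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))

print(round(sum(pmf), 6), round(mean, 6), round(var, 6))
```

Summing the pmf over all outcomes recovers 1, and the moments agree with the closed forms E(X) = np and Var(X) = npq.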
Sample: mean 𝑥̄ = Σ𝑥/𝑛 ; variance 𝑠² = Σ(𝑥 − 𝑥̄)²/(𝑛 − 1) ; sd = √𝑣𝑎𝑟 ; cov 𝑠ₓᵧ = Σ(𝑥 − 𝑥̄)(𝑦 − 𝑦̄)/(𝑛 − 1) || cor 𝑟 = 𝑠ₓᵧ/(𝑠ₓ𝑠ᵧ), −1 ≤ 𝑟 ≤ 1 *cor is unitless; cov is not*
Population: mean 𝜇 = Σ 𝑥 𝑝(𝑥) ; variance 𝜎² = Σ(𝑥 − 𝜇)² 𝑝(𝑥) ; sd = √𝑣𝑎𝑟 ; cov = Σₓ,ᵧ (𝑥 − 𝜇ₓ)(𝑦 − 𝜇ᵧ) 𝑃(𝑋 = 𝑥 ∩ 𝑌 = 𝑦) || cor = cov/(𝜎ₓ𝜎ᵧ), −1 ≤ 𝑟 ≤ 1
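The sample formulas above (mean, n−1 variance, covariance, correlation) can be sketched in a few lines of pure Python; the paired data values are hypothetical illustrations:

```python
from math import sqrt

# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample variance and sd use the (n - 1) denominator.
sd_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
sd_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Sample covariance and correlation.
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
r = cov_xy / (sd_x * sd_y)  # unitless, always in [-1, 1]

print(round(x_bar, 4), round(cov_xy, 4), round(r, 4))
```

Dividing the covariance by both standard deviations strips the units, which is why r (but not cov) is comparable across datasets.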

Scatterplot: shows the relationship between 2 QUANTITATIVE variables measured on the SAME individuals | Look for: 1) trend up/down 2) linear/curved 3) clustered/scattered 4) outliers?
Two-way (contingency) table: describes the relationship between two categorical variables; tables contain counts or proportions
cells = combinations of values of the two variables; joint distribution (% only); marginal distribution (table margins); conditional distribution (condition as denominator)
Simpson's Paradox = a change in the direction of association between two variables when data are separated into groups defined by a third variable (lurking variable)
Lurking variable = a variable that has an important effect but was overlooked

Chapter 2
Experiment; Outcome; Sample Space; Event; Probability: 1) classical method (assumes all outcomes are equally likely) 2) long-run relative frequency 3) subjective – an assessment based on experience / expertise
Law of Large Numbers (LLN): the relative frequency of an outcome converges to a number, i.e. the probability of the outcome, as the number of observed outcomes increases
Law of Total Probability; Mutually exclusive; Independent; Dependent | Joint prob = 𝑃(𝐴 ∩ 𝐵) | Marginal (unconditional) prob = 𝑃(𝐴) | Conditional prob = 𝑃(𝐴|𝐵)
When P(A) & P(B) > 0 → events cannot be mutually exclusive and independent at the same time
Rules:
1) Addition rule → 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵), where 𝑃(𝐴 ∩ 𝐵) = 0 for mutually exclusive events
2) Complement rule → 𝑃(𝐴) = 1 − 𝑃(𝐴ᶜ) // 𝑃(𝐴|𝐵) = 1 − 𝑃(𝐴ᶜ|𝐵)
3) Multiplication rule (also a test for independence) → 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) × 𝑃(𝐵) [independent] // 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴|𝐵) × 𝑃(𝐵) = 𝑃(𝐵|𝐴) × 𝑃(𝐴) [dependent]
* Conditional probability: 𝑃(𝐴|𝐵) = 𝑃(𝐴 ∩ 𝐵)/𝑃(𝐵) where 𝑃(𝐵) ≠ 0
Bayes' rule: 𝑃(𝐴|𝐵) = 𝑃(𝐴 ∩ 𝐵)/𝑃(𝐵) = 𝑃(𝐵|𝐴)𝑃(𝐴) / [𝑃(𝐵|𝐴)𝑃(𝐴) + 𝑃(𝐵|𝐴ᶜ)𝑃(𝐴ᶜ)] || needed: 1) 𝑃(𝐴) 2) 𝑃(𝐴ᶜ) 3) 𝑃(𝐵|𝐴) 4) 𝑃(𝐵|𝐴ᶜ) || Contingency table → practice

Chapter 4
PDF (continuous RV): f(x) is a continuous function such that f(x) ≥ 0 for all x; uniform, normal, or other shapes; the total area under f(x) = 1
1) 𝑋 ~ 𝑈𝑛𝑖𝑓𝑜𝑟𝑚[𝑐, 𝑑] → 𝑓(𝑥) = 1/(𝑑 − 𝑐) for 𝑐 ≤ 𝑥 ≤ 𝑑, and 0 otherwise
Mean 𝜇 = (𝑐 + 𝑑)/2 | Standard deviation 𝜎 = (𝑑 − 𝑐)/√12 (sd, not variance!) | Probability 𝑃(𝑐 ≤ 𝑋 ≤ 𝑑) = width ∙ height = (𝑑 − 𝑐) ∙ 1/(𝑑 − 𝑐) = 1
2) 𝑋 ~ 𝑁(𝜇, 𝜎²) → Standard normal: 𝑍 ~ 𝑁(0, 1) || Mean = Median = Mode || The area under the normal curve over (−∞, +∞) is 1
Empirical rule: 68-95-99.7
Standard score: 𝑧 = (𝑥 − 𝜇)/𝜎 || Reverse (find a specific value): 𝑥 = 𝜇 + 𝑧𝜎
Quantile-quantile (QQ) plot → a graphical method to compare two probability distributions by plotting their quantiles against each other
Normal approximation to the Binomial: 𝑌 ~ 𝐵𝑖𝑛𝑜𝑚𝑖𝑎𝑙(𝑛, 𝑝) → 𝑋 ~ 𝑁(𝑛𝑝, 𝑛𝑝𝑞) = 𝑁(𝜇, 𝜎²) || The approximation is good only when 𝑛𝑝 ≥ 5 and 𝑛𝑞 ≥ 5
t-distribution: 1) Mean = 0 for df > 1; 2) Median = Mode = 0; 3) Symmetric and bell-shaped, with fatter tails than the normal [higher probability in the tails]; 4) t tends to z as df tends to ∞

Chapter 5
Central Limit Theorem (CLT): as the sample size increases, the sampling distribution of the sample mean (mean 𝜇, sd 𝜎/√𝑛) approaches the normal distribution, 𝑋̄ ~ 𝑁(𝜇, 𝜎²/𝑛) → holds whenever 𝑛 ≥ 30
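The CLT claim can be checked with a quick pure-Python simulation; a sketch assuming a Uniform[0,1] population (μ = 0.5, σ = 1/√12; the sample size and repetition count are arbitrary choices):

```python
import random
from math import sqrt

random.seed(42)

# Population: Uniform[0, 1] -> mu = 0.5, sigma = 1/sqrt(12).
mu, sigma = 0.5, 1 / sqrt(12)
n = 30          # sample size (the usual n >= 30 rule of thumb)
reps = 20_000   # number of simulated samples

# Draw many samples of size n and record each sample mean.
means = [sum(random.random() for _ in range(n)) / n for _ in range(reps)]

emp_mean = sum(means) / reps
emp_sd = sqrt(sum((m - emp_mean) ** 2 for m in means) / (reps - 1))

# The empirical mean and sd of the sample means should approach
# mu and sigma / sqrt(n).
print(round(emp_mean, 4), round(emp_sd, 4), round(sigma / sqrt(n), 4))
```

Even though the population is flat (uniform), the histogram of the 20,000 sample means is approximately normal with the predicted centre and spread.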

Chapter 3
Discrete RV → probability mass function (PMF) | Continuous RV → probability density function (PDF)
Expectation: 1) 𝐸(𝑎𝑋) = 𝑎𝐸(𝑋); 2) 𝐸(𝑋 + 𝑏) = 𝐸(𝑋) + 𝑏; 3) 𝐸(𝑋 + 𝑌) = 𝐸(𝑋) + 𝐸(𝑌)
Variance: 1) 𝑉𝑎𝑟(𝑎𝑋) = 𝑎² ∙ 𝑉𝑎𝑟(𝑋); 2) 𝑉𝑎𝑟(𝑋 + 𝑏) = 𝑉𝑎𝑟(𝑋); 3) 𝑉𝑎𝑟(𝑋) = 𝐸(𝑋²) − [𝐸(𝑋)]²
Covariance: 𝐶𝑜𝑣(𝑎𝑋, 𝑏𝑌) = 𝑎𝑏 ∙ 𝐶𝑜𝑣(𝑋, 𝑌); 𝐶𝑜𝑣(𝑋 + 𝑎, 𝑌 + 𝑏) = 𝐶𝑜𝑣(𝑋, 𝑌); 𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸(𝑋 ∙ 𝑌) − 𝐸(𝑋) ∙ 𝐸(𝑌)
∗∗∗ 𝑉𝑎𝑟(𝑎𝑋 + 𝑏𝑌) = 𝑎² ∙ 𝑉𝑎𝑟(𝑋) + 𝑏² ∙ 𝑉𝑎𝑟(𝑌) + 2𝑎𝑏 ∙ 𝐶𝑜𝑣(𝑋, 𝑌) ∗∗∗
Independence implies uncorrelatedness → uncorrelated = cov and cor = 0 (the converse does not hold)
Coefficient of variation: 𝐶𝑉 = 𝑠/𝑥̄ = 𝜎/𝜇 → measures risk without changing unit (unlike the Sharpe ratio) | Higher ratio = higher risk
Sharpe ratio: 𝑆(𝑋) = (𝑥̄ − 𝑟𝑓)/𝑠 where rf = risk-free rate | Higher ratio = higher average rate of return relative to s.d.

z or t for the sample mean 𝑋̄ [Chapter 5]:
Population s.d. 𝜎 | Population distribution | Sample size n | Result
Known | Normal | ≥ 30 | 𝑍 = (𝑋̄ − 𝜇)/(𝜎/√𝑛) ~ 𝑁(0, 1)
Known | Normal | < 30 | 𝑍 ~ 𝑁(0, 1)
Known | Not normal | ≥ 30 | 𝑍 ~ 𝑁(0, 1)
Known | Not normal | < 30 | Nil
Unknown | Normal | ≥ 30 | 𝑍 ~ 𝑁(0, 1)
Unknown | Normal | < 30 | 𝑡 = (𝑋̄ − 𝜇)/(𝑠/√𝑛) ~ 𝑡ₙ₋₁
Unknown | Not normal | ≥ 30 | 𝑍 ~ 𝑁(0, 1)
Unknown | Not normal | < 30 | Nil
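The z-vs-t decision table above can be encoded as a small helper; a sketch (the function name `z_or_t` is an illustrative choice, not course terminology):

```python
def z_or_t(sigma_known: bool, population_normal: bool, n: int) -> str:
    """Which statistic standardizes the sample mean, per the table above.

    Returns "z", "t", or "Nil" (no standard result applies).
    """
    if n >= 30:
        return "z"                      # CLT covers every n >= 30 row
    if not population_normal:
        return "Nil"                    # small sample, non-normal population
    return "z" if sigma_known else "t"  # small sample from a normal population

# Spot checks against the table rows:
print(z_or_t(True, True, 10), z_or_t(False, True, 10), z_or_t(False, False, 10))
```

Note the only row that produces a t statistic is sigma unknown + normal population + n < 30; every n ≥ 30 row collapses to z via the CLT.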
Chapter 6
Confidence level 95% → Motive: 95% confident that 𝜇 is within the interval // Mechanism: 95 in 100 such intervals cover 𝜇
Steps:
1) Identify whether it is a sample proportion (binomial) or a sample mean
2) If proportion → check the good-approximation requirement → z // If mean → z or t? [ref. the Chapter 5 table]
3) 𝛼 = 1 − 𝑥% → get 𝛼
4) Get df = n − 1 if needed for a t-interval
5) Find the critical value 𝑧_(𝛼/2) or 𝑡_(𝛼/2, 𝑛−1)
6) Find the margin of error 𝐿 = 𝑧_(𝛼/2) ∙ 𝜎/√𝑛 or 𝑧_(𝛼/2) ∙ √(𝑝̂(1 − 𝑝̂)/𝑛) or 𝑡_(𝛼/2, 𝑛−1) ∙ 𝑠/√𝑛
7) Find the interval = [mean − L, mean + L]
Sample size requirement calculation: 𝑛 ≥ 𝑧² ∙ 𝑝̂ ∙ (1 − 𝑝̂)/𝐿² (proportion) or 𝑛 ≥ 𝑧² ∙ 𝜎²/𝐿² (mean) [NEVER USE t IN SAMPLE SIZE REQUIREMENT CALCULATION]

Chi-square test for independence:
[H0: the Y variable is independent of the X variable] vs [Ha: the Y variable is NOT independent of the X variable]
Requirements for the test: 1) the observed frequencies are obtained from a Simple Random Sample (SRS); 2) the expected frequencies are all ≥ 5
Expected table: 𝐸 = (column total × row total)/grand total — do this for all cells || Degrees of freedom = (C − 1)(R − 1)
Reject H0 when 𝜒² > 𝜒²_(𝛼, df), where 𝜒² = Σ (𝑂ᵢ − 𝐸ᵢ)²/𝐸ᵢ

Chapter 8
Simple linear regression model (only 1 explanatory variable): 𝜇_(𝑦|𝑥) = 𝛽₀ + 𝛽₁𝑥 // 𝑦̂ = 𝑏₀ + 𝑏₁𝑥
Build up a regression model with b0 and b1:
𝑏₁ = 𝑠ₓᵧ/𝑠ₓ² = [Σ𝑥𝑦 − (Σ𝑥)(Σ𝑦)/𝑛] / [Σ𝑥² − (Σ𝑥)²/𝑛] ; 𝑏₀ = 𝑌̄ − 𝑏₁𝑋̄ || where 1) 𝑋̄ = Σ𝑥/𝑛 ; 2) 𝑠ₓ² = Σ(𝑥 − 𝑋̄)²/(𝑛 − 1) [same for Y]
𝑌̂ = 𝑏₀ + 𝑏₁𝑋 by LSE (Least Squares Estimation)
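The least-squares estimates b1 = Sxy/Sxx and b0 = Ȳ − b1·X̄ fit in a few lines; a pure-Python sketch with hypothetical data lying near y = 1 + 2x:

```python
# Hypothetical (x, y) observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 9.0, 10.8]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# b1 = Sxy / Sxx ; b0 = y_bar - b1 * x_bar  (least squares estimates)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]  # fitted values on the LSE line
print(round(b0, 4), round(b1, 4))
```

Dividing numerator and denominator by (n − 1) shows this is exactly b1 = sxy/sx², the covariance-over-variance form on the sheet.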
Chapter 7
Hypothesis situations:
One-sided: Less than "𝐻ₐ" → 𝐻₀: 𝜇 ≥ 𝑘 vs 𝐻ₐ: 𝜇 < 𝑘
One-sided: Greater than "𝐻ₐ" → 𝐻₀: 𝜇 ≤ 𝑘 vs 𝐻ₐ: 𝜇 > 𝑘
Two-sided: Not equal to "𝐻ₐ" → 𝐻₀: 𝜇 = 𝑘 vs 𝐻ₐ: 𝜇 ≠ 𝑘
Error:
Type I error → Rejecting a true 𝐻₀ || 𝛼 = 𝑃(𝑅𝑒𝑗𝑒𝑐𝑡 𝐻₀ | 𝐻₀ 𝑖𝑠 𝑡𝑟𝑢𝑒) || Significance level: 𝛼
Type II error → NOT rejecting a false 𝐻₀ || 𝛽 = 𝑃(𝑁𝑜𝑡 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻₀ | 𝐻₀ 𝑖𝑠 𝑓𝑎𝑙𝑠𝑒) || Power: 1 − 𝛽

Model assumptions (Chapter 8): 𝑌 = 𝛽₀ + 𝛽₁𝑋 + 𝜖 where 𝜖 ~ 𝑁(0, 𝜎²) i.i.d. (identically, independently distributed)
1) Linearity → 𝐸(𝜖) = 0 → mean-zero assumption → check with scatterplot (Y vs X) and residual plot (e vs X)
2) Independence (of errors) → not a time series
3) Normality → errors are normal RVs → check with a QQ plot
4) Equal / constant variance of errors (MSE) → check scatterplot (Y vs X) and residual plot (e vs X) → no fanning out / funneling in (heteroscedastic errors); points should be evenly spread (homoscedastic errors)
*Danger of extrapolation when X is outside the experimental region (dataset range)*
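A worked two-sided z-test tying the hypothesis setup to the rejection rule; a sketch with hypothetical numbers (H0: μ = 100 vs Ha: μ ≠ 100, x̄ = 103, known σ = 10, n = 49, α = 0.05), using the error function for the normal CDF:

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical test inputs.
x_bar, k, sigma, n, alpha = 103.0, 100.0, 10.0, 49, 0.05

z = (x_bar - k) / (sigma / sqrt(n))  # z-statistic
p_value = 2 * (1 - phi(abs(z)))      # two-sided: double the right tail of |z|
reject = p_value < alpha             # p-value rejection rule

print(round(z, 3), round(p_value, 4), reject)
```

Here z = 2.1 exceeds the two-sided critical value 1.96, so the p-value (≈ 0.036) falls below α and H0 is rejected in favor of Ha.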
Hypothesis testing:
1) Identify whether it is a sample proportion (binomial) or a sample mean
2) If proportion → check the good-approximation requirement → z // If mean → z or t? [ref. the Chapter 5 table]
3) Identify which of the three hypothesis situations applies → then follow the steps below

Critical value approach:
4) Get df = n − 1 if needed for a t-test
5) Find the critical point using significance level 𝛼: greater than (upper tail) → 𝑧_𝛼 or 𝑡_(𝛼, 𝑛−1) // less than (lower tail) → −𝑧_𝛼 or −𝑡_(𝛼, 𝑛−1) [negative] // not equal to (two-sided) → ±𝑧_(𝛼/2) or ±𝑡_(𝛼/2, 𝑛−1)
6) Find the z- or t-statistic: z or t = (𝑥̄ − 𝑘)/(𝑠/√𝑛) for a mean // z = (𝑝̂ − 𝑝₀)/√(𝑝₀𝑞₀/𝑛) for a proportion
7) Apply the rejection rule to reject 𝐻₀ (in favor of 𝐻ₐ): upper tail → if z > 𝑧_𝛼 or t > 𝑡_(𝛼, 𝑛−1) // lower tail → if z < −𝑧_𝛼 or t < −𝑡_(𝛼, 𝑛−1) // two-sided → if |z| > 𝑧_(𝛼/2) or |t| > 𝑡_(𝛼/2, 𝑛−1)

p-value approach (usually a z-test, because t p-values are hard to read from tables):
4) Find the z-statistic as in step 6 above
5) Find the p-value: upper tail → 𝑃(𝑍 ≥ 𝑧 | 𝐻₀ 𝑖𝑠 𝑡𝑟𝑢𝑒) = area to the right of the z-statistic // lower tail → 𝑃(𝑍 ≤ 𝑧 | 𝐻₀ 𝑖𝑠 𝑡𝑟𝑢𝑒) = area to the left // two-sided → 2 × 𝑃(𝑍 ≥ |𝑧| | 𝐻₀ 𝑖𝑠 𝑡𝑟𝑢𝑒) = double the right-tail area of |z|
6) Apply the rejection rule: reject 𝐻₀ (in favor of 𝐻ₐ) if p < 𝛼
Remember the sample proportion: 𝑝̂ ~ 𝑁(𝑝, 𝑝𝑞/𝑛) approximately

Chapter 8 (regression inference):
SSE (unexplained / error sum of squares) = Σ𝑒ᵢ² = Σ(𝑌ᵢ − 𝑌̂ᵢ)²
SSR (model / explained sum of squares) = Σ(𝑌̂ᵢ − 𝑌̄)²
SST (total sum of squares; overall variability in Y) = Σ(𝑌ᵢ − 𝑌̄)²
MSE (mean square error) = 𝑠² = 𝑆𝑆𝐸/(𝑛 − 2)
SE (standard error) = 𝑠 = √𝑀𝑆𝐸
Note that 𝑆𝑆𝑇 = 𝑆𝑆𝑅 + 𝑆𝑆𝐸
Coefficient of determination 𝑅² = 𝑆𝑆𝑅/𝑆𝑆𝑇 = 1 − 𝑆𝑆𝐸/𝑆𝑆𝑇
Interpretation: about (𝑅² × 100)% of the sample variation in Y can be explained by the simple linear regression model where we use X to predict Y
Coefficient of correlation (or simply [correlation]) 𝑟 = 𝑠ₓᵧ/(𝑠ₓ𝑠ᵧ) | Interpretation: see Chapter 1 (strong/weak, +ve/−ve linear relationship) || 𝑟² = 𝑅²
Standard deviations of the estimates: 𝑠_(𝑏₁) = 𝑠/(𝑠ₓ ∙ √(𝑛 − 1)) ; 𝑠_(𝑏₀) = 𝑠 ∙ √(1/𝑛 + 𝑋̄²/((𝑛 − 1) ∙ 𝑠ₓ²))
Hypothesis testing for b1, b0 (determine z or t simply by 𝑛 ≥ 30 or not; df = n − 2 because 2 parameters are estimated):
𝑧 𝑜𝑟 𝑡 = (𝑏₁ − 𝛽₁)/𝑠_(𝑏₁) ; 𝑧 𝑜𝑟 𝑡 = (𝑏₀ − 𝛽₀)/𝑠_(𝑏₀) → usually assume 𝛽₀, 𝛽₁ = 0 and use a two-sided test
Interval estimate (only z): 𝑏₁ ± 𝑧_(𝛼/2) ∙ 𝑠_(𝑏₁) ; 𝑏₀ ± 𝑧_(𝛼/2) ∙ 𝑠_(𝑏₀)
CI (estimate the average / mean value of Y): 𝑌̂ ± 𝑧_(𝛼/2) ∙ 𝑠 ∙ √(1/𝑛 + (𝑋 − 𝑋̄)²/((𝑛 − 1) ∙ 𝑠ₓ²))
PI (predict an individual value of Y): 𝑌̂ ± 𝑧_(𝛼/2) ∙ 𝑠 ∙ √(1 + 1/𝑛 + (𝑋 − 𝑋̄)²/((𝑛 − 1) ∙ 𝑠ₓ²))
Distinguish: 𝑠, 𝑠², 𝑠ₓ, 𝑠ᵧ, 𝑠ₓᵧ, 𝑠_(𝑏₀), 𝑠_(𝑏₁), 𝑋̄, 𝑌̄, 𝑆𝑆𝐸, 𝑆𝑆𝑅, 𝑆𝑆𝑇, 𝑟, 𝑟², 𝑅²
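The SST = SSR + SSE identity and the CI/PI half-widths can be verified end to end; a sketch on hypothetical data (z = 1.96 for ~95%, x0 = 3.0 chosen arbitrarily):

```python
from math import sqrt

# Hypothetical data; fit y_hat = b0 + b1*x by least squares first.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 9.0, 10.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)  # equals (n - 1) * sx^2
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Sums of squares and the standard error s = sqrt(MSE).
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sst = sum((yi - y_bar) ** 2 for yi in y)
s = sqrt(sse / (n - 2))  # MSE uses n - 2 (two parameters estimated)
r2 = ssr / sst

# Interval half-widths at x0; the PI adds the extra "1 +" inside the root.
x0, z = 3.0, 1.96
ci_half = z * s * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
pi_half = z * s * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)

print(round(r2, 4), round(ci_half, 4), round(pi_half, 4))
```

The PI is always wider than the CI at the same x0, because predicting one individual Y carries the extra error variance σ² on top of the uncertainty in the estimated mean line.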