Data Class
Panel or longitudinal data: time series for each
-> Estimator: used to find the mean (µ).
-> An estimator is said to be unbiased when its expected value equals the value of the population parameter: E[x̄] = µ
* Unbiased estimators -> variance, proportions
ex:
A) random variable u = X1 - b·X2     (X1, X2 independent)
-> mean: the expected value of X1 = µ1, the expected value of X2 = µ2
   µ_u = E[u] = E[X1 - b·X2] = E[X1] - E[b·X2] = µ1 - b·E[X2] = µ1 - b·µ2     (E[aX] = a·E[X])
-> variance: the variance of X1 = σ1^2, the variance of X2 = σ2^2
   Var(u) = Var(X1 - b·X2) = Var(X1) + Var(b·X2) = σ1^2 + b^2·Var(X2) = σ1^2 + b^2·σ2^2     (Var(aX) = a^2·Var(X))
   The sign is + because the coefficient -b is squared: (-b)^2 = b^2.
B) v = a·X2 + b·X1
-> mean:
   µ_v = E[v] = E[a·X2 + b·X1] = a·E[X2] + b·E[X1] = a·µ2 + b·µ1     (E[aX] = a·E[X])
-> variance:
   Var(v) = σ_v^2 = Var(a·X2 + b·X1) = a^2·Var(X2) + b^2·Var(X1) = a^2·σ2^2 + b^2·σ1^2     (Var(aX) = a^2·Var(X), Var(X) = σ^2)
-> X1 and X2 are not related (they are indep.), so the cross covariance = 0
-> mean: E[µ̂1]     (E[aX] = a·E[X])
   E[µ̂1] = E[(1/4)(X1 + X2 + X3 + X4)] = (1/4)·E[X1 + X2 + X3 + X4]
          = (1/4)·(E[X1] + E[X2] + E[X3] + E[X4]) = (1/4)·4µ = µ
-> variance: Var(µ̂1)     (Var(aX) = a^2·Var(X))
   Var(µ̂1) = Var((1/4)(X1 + X2 + X3 + X4)) = (1/4)^2·Var(X1 + X2 + X3 + X4)
           = (1/4)^2·(Var(X1) + Var(X2) + Var(X3) + Var(X4)) = (1/16)·4σ^2 = σ^2/4
-> variance: Var(µ̂2)
   Var(µ̂2) = Var((1/8)X1 + (1/8)X2 + (1/4)X3 + (1/2)X4)
           = (1/8)^2·Var(X1) + (1/8)^2·Var(X2) + (1/4)^2·Var(X3) + (1/2)^2·Var(X4)
           = (1/64)σ^2 + (1/64)σ^2 + (1/16)σ^2 + (1/4)σ^2
           = (1 + 1 + 4 + 16)/64 · σ^2     (common denominator: 1/16 ×4, 1/4 ×16)
           = (22/64)σ^2 = (11/32)σ^2
c) Sufficiency => an estimator is sufficient when it uses all the information available in the sample
   -> the sample mean uses all the information
   * both µ̂1 & µ̂2 use all the info (X1, X2, X3, X4)
µ̂1 -> σ^2/4 = 0.25σ^2
µ̂2 -> 11σ^2/32 = 0.34σ^2
-> µ̂1 has lower variance than µ̂2 -> more efficient estimator
   smallest variance = ⊕ efficient
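The efficiency comparison above can be checked with a short sketch. It assumes the four observations are independent with a common variance σ^2, so that Var(Σ w_i·X_i) = (Σ w_i^2)·σ^2. The function name `variance_factor` and the unequal weights (1/8, 1/8, 1/4, 1/2) are illustrative assumptions, chosen because they reproduce the 0.34σ^2 figure.

```python
from fractions import Fraction

def variance_factor(weights):
    """Return c such that Var(sum(w_i * X_i)) = c * sigma^2.

    Assumes the X_i are independent and share the same variance sigma^2.
    """
    return sum(Fraction(w) ** 2 for w in weights)

# Equal weights: the sample mean of four observations.
equal_weights = [Fraction(1, 4)] * 4
# Unequal weights (assumed here for illustration).
unequal_weights = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)]

var_equal = variance_factor(equal_weights)      # 1/4
var_unequal = variance_factor(unequal_weights)  # 11/32

print(float(var_equal), float(var_unequal))
```

Both weight sets sum to 1, so both estimators are unbiased; the equal-weight version has the smaller variance factor and is therefore the more efficient of the two.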
Consistency -> behaviour of the sampling distribution of the estimator as the sample size n gets "LARGER": the estimator gets closer to the population parameter
Biased estimator
ex: sample medians, ranges & standard deviation
-> a BIASED ESTIMATOR is not the most efficient one: it doesn't take into consideration some numbers & doesn't target the population parameter
① Linearity -> the dep. variable Y is related to the indep. variable X + error u as => Y = β0 + β1·X + u
② Random sampling -> we assume the data we are given is selected randomly
③ Variation -> the X samples are not all the same value (X1, X2, X3, ...)
-> linear equation: Y = b0 + b1·X
   simple linear regression model = we use only one variable
   -> b0 = the intercept = value of Y if X = 0
   -> b1 = the slope = the change in Y for every unit change in X
cov(x, y) = [(x1 - x̄)(y1 - ȳ) + (x2 - x̄)(y2 - ȳ) + ... + (xn - x̄)(yn - ȳ)] / (n - 1)
b1 = Cov(x, y) / s_x^2     (s_x^2 = variance of x, s_x = standard deviation)
b0 = ȳ - b1·x̄
-> ŷ = b0 + b1·x
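As a minimal sketch of the slope and intercept formulas (b1 = Cov(x, y) / Var(x), b0 = ȳ - b1·x̄), the function below uses only the standard library. The name `ols_simple` and the sample data are made up for illustration; the data lie exactly on y = 2 + 3x so the result is easy to verify by hand.

```python
def ols_simple(x, y):
    """Fit y = b0 + b1*x using b1 = Cov(x, y) / Var(x) and b0 = y_bar - b1*x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance and sample variance, both divided by n - 1.
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b1 = cov_xy / var_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Illustrative data lying exactly on the line y = 2 + 3x.
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]
b0, b1 = ols_simple(x, y)
print(b0, b1)
```

Because both the covariance and the variance are divided by n - 1, that factor cancels in b1; it is kept here only to mirror the formulas in the notes.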
correlation coefficient
r_xy = cov(x, y) / (s_x · s_y)
-> standard dev. is ALWAYS POSITIVE
s = √(s^2)          CV = s / x̄
s = √1000 = 31.622
x̄ = 5000 / 50 = 100
coefficient of variation: CV = 31.622 / 100 = 0.3162 -> measures dispersion
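A small sketch of the coefficient of variation, CV = s / x̄. The inputs (variance 1000, mean 100) are one reading of the worked numbers in these notes and should be treated as assumptions; the function name `coefficient_of_variation` is hypothetical.

```python
import math

def coefficient_of_variation(variance, mean):
    """CV = standard deviation / mean, a unit-free measure of dispersion."""
    return math.sqrt(variance) / mean

# Assumed inputs: variance = 1000, mean = 100.
s = math.sqrt(1000)
cv = coefficient_of_variation(1000, 100)
print(round(s, 3), round(cv, 4))
```

Since the CV has no units, it can be used to compare the dispersion of variables measured on different scales.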
A scatter plot is a
typical application of a
-> the correlation coef. r is between -1 and +1: -1 ≤ r ≤ 1 (it measures how closely the points follow a line)
   r = 1      -> ⊕ perfect linear relationship
   r ≈ 1      -> ⊕ strong linear relationship
   1 > r > 0  -> ⊕ moderate / weak linear relationship
   r = 0      -> no linear relationship
   0 > r > -1 -> ⊖ moderate / weak linear relationship
   r ≈ -1     -> ⊖ strong linear relationship
   r = -1     -> ⊖ perfect linear relationship
in conclusion → the linear correlation coefficient is
(> SE deviation
b) 23
d) Yes -> older, less salary: ρ = -0.075, negative
Coefficients Estimation
Ordinary Least Square Model
-> OLS: method of estimating unknown parameters
Regression equation: Yi = b0 + b1·Xi + ei
Fitted value: given b0 & b1 we obtain ŷi -> predicted value (fitted value)
Residual value: difference between Yi & its fitted value
   ûi = Yi - ŷi,     ŷi = b0 + b1·Xi
   * if ûi ⊕ -> under predicting Yi
   * if ûi ⊖ -> over predicting Yi
   * preferred: ûi = 0, but in most cases every residual ûi ≠ 0
   ûi = Yi - (b0 + b1·Xi)
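The sign rule for residuals can be sketched as follows. The coefficients b0 = 1 and b1 = 2 and the observed values are hypothetical, chosen only to show one positive and one negative residual; `residual` and `describe` are illustrative names.

```python
def residual(actual, b0, b1, x):
    """Residual = actual value minus fitted value (b0 + b1*x)."""
    fitted = b0 + b1 * x
    return actual - fitted

def describe(u):
    """Translate the sign of a residual into the wording used in the notes."""
    if u > 0:
        return "under-predicting"
    if u < 0:
        return "over-predicting"
    return "exact"

# Hypothetical fitted line y_hat = 1 + 2x, evaluated at x = 3 (fitted value 7).
u_high = residual(9.0, 1.0, 2.0, 3.0)  # actual above the line
u_low = residual(6.0, 1.0, 2.0, 3.0)   # actual below the line
print(u_high, describe(u_high))
print(u_low, describe(u_low))
```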
929.058 =
9095 -
7224.058
Residual = actual y value - predicted y value
Y =
Bo + Box +132×-0-7=130+135×+1327
Goodness Of Fit
Coefficient of determination: R-squared
-> value of r^2: 0 ≤ r^2 ≤ 1
   r^2 = 1 -> perfect fit
   r^2 ≈ 0 -> poor fit of the OLS line
=> R squared = SSE / SST   or   = 1 - SSR / SST     (SSE = explained, SSR = unexplained by the regression)
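The two ways of writing R squared agree because SST = SSE + SSR for an OLS fit with an intercept. The sketch below checks this on a small made-up data set; the function name `r_squared_both_ways` and the labels SSE (explained) and SSR (unexplained) are naming choices for this example.

```python
def r_squared_both_ways(x, y):
    """Fit simple OLS, then return (SSE/SST, 1 - SSR/SST)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    sse = sum((fi - y_bar) ** 2 for fi in fitted)          # explained by the regression
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # unexplained (residual)
    return sse / sst, 1 - ssr / sst

# Illustrative data (not from the notes).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r2_explained, r2_residual = r_squared_both_ways(x, y)
print(r2_explained, r2_residual)
```

For this data both expressions give 0.6, meaning 60% of the variability of y is explained by x.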
INTERPRET
Y = B0 + B1·X
-> B0 = intercept
-> B1 = slope = ΔY / ΔX
For the slope, if income per capita increases by 1 thousand euros, consumption
of electric energy increases by 0.571 thousand kWh on average.
① step 1 -> X? (independent, explanatory) X = g
            Y? (dependent, explained)     Y = e
-> B1 = Cov(g, e) / Var(g)
-> Cov(g, e) = 1/(n - 1) · Σ (gi - ḡ)(ei - ē) = (1/24) · 29.76 = 1.24
-> Var(g) = 1/(n - 1) · Σ (gi - ḡ)^2 = (1/24) · 60.77 = 2.53
-> B1 = 1.24 / 2.53 = 0.489
-> B0 = ē - B1·ḡ = 0.83 - 0.489 · 2.82 = -0.549
=> êi = B0 + B1·gi -> êi = -0.549 + 0.489·gi
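The arithmetic of this worked example can be replayed in a few lines. The divisor n - 1 = 24 is an assumption inferred from the figures (29.76 / 24 = 1.24 and 60.77 / 24 ≈ 2.53); the variable names are illustrative.

```python
# Sums and means as read from the worked example; n - 1 = 24 is an assumption.
n_minus_1 = 24
sum_cross_products = 29.76  # sum of (g_i - g_bar) * (e_i - e_bar)
sum_squares_g = 60.77       # sum of (g_i - g_bar)^2
g_bar, e_bar = 2.82, 0.83

cov_ge = sum_cross_products / n_minus_1
var_g = sum_squares_g / n_minus_1
b1 = cov_ge / var_g
b0 = e_bar - b1 * g_bar

print(round(cov_ge, 2), round(var_g, 2), b1, b0)
```

Carrying full precision gives values within a few thousandths of 0.489 and -0.549; the small differences come from rounding the intermediate results.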
Calculating R2
-> SST (sum of squares total)
-> SSE (sum of squares explained by the regression)
-> SSR (sum of squares unexplained by the regression)
R2 = 1 - SSR / SST = 0.58 -> 58%     (0 ≤ R2 ≤ 1)
① Linearity -> Y = b0 + b1·X1 + b2·X2 + b3·X3 + ... + ui
   -> non-linear: biased estimator, less efficient
② Random sampling -> random sample of observations
   -> non-random: biased estimator
③ No perfect collinearity -> one variable can't be determined perfectly from the other
MLR
salary = const + b1·(points) + ... + ui
-> when points increase by 1 point, salary will increase by 0.33 ths dollars on average, holding all the other variables constant (ceteris paribus)
?⃝
}
Model 1: Y = B0 + B1·X1 + B2·X2 + B3·X3 + B4·X4 + u -> 4 variables
-> the slope of points is higher in model 2:
   model 1: 0.33 (points)
   model 2: 0.587 (points)
R
squared
-
if variables
}
we add
22576934
→ " "°" " ^ = ^ "
=
◦ ◦ +8
'
→ → "
° " "
+ ◦ the M"" " ↑ "
24512637
we need to do
21816987
0.777
=
'
→ R 2 =
t > TT i RL adjusted
-
Model
-
-
.
zuggzggy
.
-
3
R2-adjusted = 1 - [(n - 1) / (n - k - 1)] · (1 - R2)
R2-adj Model 1 = 1 - [(56 - 1) / (56 - 4 - 1)] · (1 - 0.078) = 0.006
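A minimal sketch of the adjusted R2 formula, 1 - (n - 1)/(n - k - 1) · (1 - R2), where n is the number of observations and k the number of explanatory variables. The function name is hypothetical, and the inputs (R2 = 0.078, n = 56, k = 4) are the values read from the Model 1 calculation.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R2: penalises R2 for the number of explanatory variables k."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# Values as read from the Model 1 calculation (n = 56 observations, k = 4 variables).
adj_model_1 = adjusted_r_squared(0.078, 56, 4)
print(round(adj_model_1, 3))
```

Adjusted R2 is never larger than R2, which is why it is the fairer measure when comparing models with different numbers of variables.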
s G T
(7-0-779)
-
R' 0.078
7
ju
=
2
=
model
-
adi
_
-
-
p -
→
we prefer MODEL T
Problem set 1
's C- ( × ] Ecxz ]
E.
+
→
Ecju ]
=
, = 2
C- Gu , ] =
,u=t ,
- _
, ju
=
y
µ ,
BA 's
I. 2 set
]=¥E¥)+E¥)=
o
C- Gui
-
-0 Vav [× ] =
02=9
VI.fi#--'-i.z=o-sVavCsuo)--z-VavCxrz
•
I :c;)
+
) = •
2 =
0.72s
}
-> Bias = E[θ̂] - θ
   Bias(µ̂1) = 0
   Bias(µ̂2) = -0.5
-> MSE = Var + Bias^2
   MSE(µ̂1) = 0.5 + 0^2 = 0.5
   MSE(µ̂2) = 0.125 + (-0.5)^2 = 0.375
We prefer the second estimator, as its MSE is lower than that of the first estimator.
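The comparison rests on the decomposition MSE = Var + Bias^2. A short sketch, using the variances (0.5 and 0.125) and biases (0 and -0.5) as read from this problem; the function name `mse` is illustrative.

```python
def mse(variance, bias):
    """Mean squared error of an estimator: variance plus squared bias."""
    return variance + bias ** 2

# Values as read from the problem: (variance, bias) for each estimator.
mse_1 = mse(0.5, 0.0)     # unbiased, larger variance
mse_2 = mse(0.125, -0.5)  # biased, smaller variance
print(mse_1, mse_2)
```

The second estimator is biased, but its much smaller variance more than offsets the squared bias, so it has the lower MSE.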
→ None ,
sample
c) r_xy = Cov(x, y) / (s_x · s_y),     x̄ = 7.3,   ȳ = 8.716
   Cov(x, y) = [(5.2 - 7.3)(3.4 - 8.716) + (7.2 - 7.3)(9.5 - 8.716) + ...] / (6 - 1) = 2.44
   Var(x) = [(5.2 - 7.3)^2 + (7.2 - 7.3)^2 + ...] / (6 - 1)
   Var(y) = [(3.4 - 8.716)^2 + (9.5 - 8.716)^2 + (6.6 - 8.716)^2 + ...] / (6 - 1)
   r = 2.44 / √(3.276 · 65.524) = 0.1665 -> r ≈ 0 -> WEAK
   -> there is a ⊕ correlation between the two variables, but it's weak.
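The final step of this exercise, r = Cov / (s_x · s_y), can be checked numerically. The covariance 2.44 and the two variances 3.276 and 65.524 are the figures that appear in the calculation; the function name `correlation` is an assumption for this sketch.

```python
import math

def correlation(cov_xy, var_x, var_y):
    """Correlation coefficient from a covariance and the two variances."""
    return cov_xy / math.sqrt(var_x * var_y)

# Figures as read from the worked problem.
r = correlation(2.44, 3.276, 65.524)
print(round(r, 4))
```

A value this close to 0 indicates a positive but weak linear relationship.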
Problem set 2
INTERPRET: intercept, slope
Slope: if the city population increases by 1%, then the rent rates will increase by 0.03% on average.
-> Yes: the higher the population in the city, the higher the rent rates will be.
-> r ≈ 0 -> WEAK     (0 ≤ r^2 ≤ 1)
b) r^2 = 0.192 -> 19.2% of the variability of rent rates is explained by the city population.
•
-> constant: 2.684 on average.
b) For the slope, if marketing spending increases by 1 ths euros, then the members of parliament will increase by 0.025 on average.
c) r^2 = 0.392 -> 39.2% of the variability of members of parliament is explained by the spending on marketing.
d) 750,000 euros -> 750 ths euros
   MP = 2.684 + 0.025 · 750 = 21.434
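Part d) is a plug-in prediction from the fitted line. The sketch below assumes the intercept 2.684, the slope 0.025 per thousand euros, and a spending level of 750 thousand euros, as read from the answers above; `predict` is a hypothetical helper name.

```python
def predict(b0, b1, x):
    """Fitted value from a simple regression line: b0 + b1 * x."""
    return b0 + b1 * x

# Spending is expressed in thousands of euros, so 750,000 euros -> 750.
predicted_members = predict(2.684, 0.025, 750)
print(round(predicted_members, 3))
```

The units matter here: because the slope is per thousand euros, the spending figure has to be converted to thousands before it is plugged in.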
create model Y = B0 + B1·X + u
a) X? = attend
   Y? = grade
   => grade_i = B0 + B1·attend_i + u_i
-> negative relationship: when a student has more absences, they will have a lower grade
R2 = 0.785 -> 78.5% of the variability of students' final grade is explained by their absences.
d) grade ; =
S -
68-0=5.68
will increase
by 0-7-6 points on
students grade
average
-> Model 1: R2 = 0.785 -> 78.5%
-> Model 2: R2 = 0.385 -> 38.5%
0 -25s -025 Si
RL
-
2 adj
=
Model -
.
-o variance
/
↳variation
if F is niquer
0 then overall se(Bi)
* will be niquer
↳ if not lower
variance is higher= se higher
D Assumptions
p
-linearity: assumption
is broken as there is no
less efficient.
③ INTERPRET
s@
tevitiavy.edu
Bo
If R&D ↑ by 1% of GDP, the innovation will ↑ by 8.48 points.
• (4-82) •
(63.3 ) •
(353)
↳ countries
Lo 7- 0
⑤ Model 1 -> R2 = 63%
   Model 2 -> R2 = 68%
   -> different number of variables => compare R2 adj
   R2-adj M1 = 1 - [(72 - 1) / (72 - 1 - 1)] · (1 - 0.63) = 0.6247
   R2-adj M2 = 1 - [(72 - 1) / (72 - 3 - 1)] · (1 - 0.68) = 0.6659
   (the factor is (1 - R2), not squared)
   In Model 2, 66.6% of the variability of innovation is explained by R&D, tertiary & GDP. Adding two more variables helps us explain more of the variability of innovation compared to Model 1.