Data Class

1. Cross-sectional data provides information at a given point in time, while time series data provides information ordered over time.
2. Pooled cross-sectional data combines cross-sectional and time series data.
3. Panel or longitudinal data provides time series information for each cross-sectional member. It allows observation of the same sample over time.

Uploaded by icisanman
Copyright © All Rights Reserved
DATA CLASS

Cross-sectional data: observations at a given point in time.

Time series: observations over time, in chronological order.

Pooled cross section: cross-sectional ⊕ time series data.

Panel or longitudinal data: a time series for each cross-sectional member.
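The four structures differ only in how the observations are indexed (which units, which periods). The sketch below uses made-up (unit, year, value) records, all hypothetical, and a rough heuristic `classify` function (our own, not a standard routine) that tells them apart: a pooled cross section samples different units in different periods, while a panel follows the same units in every period.

```python
# Hypothetical records (unit, year, value), made up for illustration only
cross_section = [("A", 2020, 5.1), ("B", 2020, 4.7), ("C", 2020, 6.0)]
time_series = [("A", 2018, 4.2), ("A", 2019, 4.8), ("A", 2020, 5.1)]
pooled = [("A", 2019, 4.8), ("B", 2019, 3.9), ("C", 2020, 6.0), ("D", 2020, 5.5)]
panel = [("A", 2019, 4.0), ("B", 2019, 3.5), ("A", 2020, 5.0), ("B", 2020, 4.1)]


def classify(records):
    """Rough heuristic: tell the data structures apart by their (unit, year) index."""
    units = {unit for unit, _, _ in records}
    years = {year for _, year, _ in records}
    if len(years) == 1:
        return "cross-section"  # many units, one point in time
    if len(units) == 1:
        return "time series"  # one unit, ordered over time
    # Panel: the same units are observed in every period
    if all({u for u, y, _ in records if y == year} == units for year in years):
        return "panel"
    return "pooled cross section"  # different samples in different periods


for name, data in [("cross_section", cross_section), ("time_series", time_series),
                   ("pooled", pooled), ("panel", panel)]:
    print(name, "->", classify(data))
```

Real pooled samples can contain the same unit twice by chance, so this is only a teaching heuristic.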


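Estimators & their properties

Estimator: a rule applied to the sample to estimate a population parameter, e.g. the sample mean $\bar{x}$ is used to find the mean $\mu$.

Unbiasedness: an estimator is said to be unbiased when its expected value equals the value of the population parameter:

$E(\bar{x}) = \mu$

* Other unbiased estimators: the sample variance $s^2$ and sample proportions.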
A) Random variables $X_1$ and $X_2$, with $u = X_1 - bX_2$, where $b$ is a number.

→ Mean: the expected value of $X_1$ is $\mu_1$ and the expected value of $X_2$ is $\mu_2$. Using $E[aX] = a \cdot E[X]$:

$\mu_u = E[u] = E[X_1 - bX_2] = E[X_1] - b\,E[X_2] = \mu_1 - b\mu_2$

→ Variance: the variance of $X_1$ is $\sigma_1^2$ and the variance of $X_2$ is $\sigma_2^2$. Using $\mathrm{Var}(aX) = a^2 \cdot \mathrm{Var}(X)$, and since $X_1$ and $X_2$ are independent (cross covariance = 0), the variances add even though the second term is subtracted:

$\sigma_u^2 = \mathrm{Var}(X_1 - bX_2) = \mathrm{Var}(X_1) + b^2\,\mathrm{Var}(X_2) = \sigma_1^2 + b^2\sigma_2^2$
B) $v = aX_2 + bX_1$

→ Mean: using $E[aX] = a \cdot E[X]$:

$\mu_v = E[v] = E[aX_2 + bX_1] = a\,E[X_2] + b\,E[X_1] = a\mu_2 + b\mu_1$

→ Variance: using $\mathrm{Var}(aX) = a^2 \cdot \mathrm{Var}(X)$:

$\sigma_v^2 = \mathrm{Var}(aX_2 + bX_1) = a^2\,\mathrm{Var}(X_2) + b^2\,\mathrm{Var}(X_1) = a^2\sigma_2^2 + b^2\sigma_1^2$

because $X_1$ and $X_2$ are not related: they are independent, so the cross covariance = 0.
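The mean and variance rules for $u = X_1 - bX_2$ and $v = aX_2 + bX_1$ can be checked by simulation. This is a minimal Monte Carlo sketch under assumed values: the parameters `mu1`, `sigma1`, `mu2`, `sigma2`, `a`, `b` are hypothetical, and $X_1$, $X_2$ are drawn independently from normal distributions so that their covariance is 0.

```python
import random

random.seed(0)

# Hypothetical parameters, chosen only for the check
mu1, sigma1 = 2.0, 1.5
mu2, sigma2 = -1.0, 0.5
a, b = 3.0, 2.0
n_draws = 100_000

# X1 and X2 are drawn independently, so their covariance is 0
x1 = [random.gauss(mu1, sigma1) for _ in range(n_draws)]
x2 = [random.gauss(mu2, sigma2) for _ in range(n_draws)]


def mean(values):
    return sum(values) / len(values)


def sample_var(values):
    m = mean(values)
    return sum((val - m) ** 2 for val in values) / (len(values) - 1)


u = [p - b * q for p, q in zip(x1, x2)]
v = [a * q + b * p for p, q in zip(x1, x2)]

mean_u, var_u = mean(u), sample_var(u)
mean_v, var_v = mean(v), sample_var(v)

print(f"E[u]   ~ {mean_u:.3f}  theory {mu1 - b * mu2:.3f}")
print(f"Var(u) ~ {var_u:.3f}  theory {sigma1**2 + b**2 * sigma2**2:.3f}")
print(f"E[v]   ~ {mean_v:.3f}  theory {a * mu2 + b * mu1:.3f}")
print(f"Var(v) ~ {var_v:.3f}  theory {a**2 * sigma2**2 + b**2 * sigma1**2:.3f}")
```

With these values the theoretical results are $E[u] = 4$, $\mathrm{Var}(u) = 3.25$, $E[v] = 1$ and $\mathrm{Var}(v) = 11.25$; note that $\mathrm{Var}(u)$ is larger than $\sigma_1^2$, as the variance rule predicts.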

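Comparing two estimators of $\mu$: $X_1, X_2, X_3, X_4$ are independent observations, each with mean $\mu$ and variance $\sigma^2$.

$\hat{\mu}_1 = \tfrac{1}{4}(X_1 + X_2 + X_3 + X_4)$

$\hat{\mu}_2 = \tfrac{1}{8}X_1 + \tfrac{1}{8}X_2 + \tfrac{1}{4}X_3 + \tfrac{1}{2}X_4$

→ Mean: using $E[aX] = a \cdot E[X]$:

$E[\hat{\mu}_1] = E\left[\tfrac{1}{4}(X_1 + X_2 + X_3 + X_4)\right] = \tfrac{1}{4}\left(E[X_1] + E[X_2] + E[X_3] + E[X_4]\right) = \tfrac{1}{4} \cdot 4\mu = \mu$

so $\hat{\mu}_1$ is unbiased.

→ Variance: using $\mathrm{Var}(aX) = a^2 \cdot \mathrm{Var}(X)$:

$\mathrm{Var}(\hat{\mu}_1) = \left(\tfrac{1}{4}\right)^2 \mathrm{Var}(X_1 + X_2 + X_3 + X_4) = \tfrac{1}{16}\left(\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3) + \mathrm{Var}(X_4)\right) = \tfrac{1}{16} \cdot 4\sigma^2 = \tfrac{\sigma^2}{4}$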
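→ Variance of $\hat{\mu}_2$:

$\mathrm{Var}(\hat{\mu}_2) = \mathrm{Var}\left(\tfrac{1}{8}X_1 + \tfrac{1}{8}X_2 + \tfrac{1}{4}X_3 + \tfrac{1}{2}X_4\right) = \left(\tfrac{1}{8}\right)^2 \mathrm{Var}(X_1) + \left(\tfrac{1}{8}\right)^2 \mathrm{Var}(X_2) + \left(\tfrac{1}{4}\right)^2 \mathrm{Var}(X_3) + \left(\tfrac{1}{2}\right)^2 \mathrm{Var}(X_4)$

$= \tfrac{1}{64}\sigma^2 + \tfrac{1}{64}\sigma^2 + \tfrac{1}{16}\sigma^2 + \tfrac{1}{4}\sigma^2 = \tfrac{1 + 1 + 4 + 16}{64}\sigma^2 = \tfrac{22}{64}\sigma^2 \approx 0.34\sigma^2$

(common denominator 64: multiply $\tfrac{1}{16}$ by 4 and $\tfrac{1}{4}$ by 16). Its weights sum to 1, so $E[\hat{\mu}_2] = \mu$ and $\hat{\mu}_2$ is also unbiased.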

C) Sufficiency: an estimator is sufficient if it uses all the information available in the sample.

The sample mean uses all the information. Both $\hat{\mu}_1$ and $\hat{\mu}_2$ use all the information ($X_1, X_2, X_3, X_4$), so they are sufficient.

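D) Efficiency: we prefer the estimator with the LOWER VARIANCE.

$\hat{\mu}_2 \to \tfrac{22}{64}\sigma^2 \approx 0.34\sigma^2$

$\hat{\mu}_1 \to \tfrac{1}{4}\sigma^2 = 0.25\sigma^2$

We prefer the $\hat{\mu}_1$ estimator, as it has a lower variance than $\hat{\mu}_2$: it is more efficient. The efficient estimator is the one with the smallest variance.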
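Both estimators are weighted sums $\sum_i w_i X_i$ of independent observations with common variance $\sigma^2$, so $E = \mu\sum_i w_i$ and $\mathrm{Var} = \sigma^2 \sum_i w_i^2$. A small sketch with exact fractions (the helper name `variance_factor` is our own) reproduces the comparison:

```python
from fractions import Fraction


def variance_factor(weights):
    """Var(sum w_i X_i) / sigma^2 for independent X_i with a common variance sigma^2."""
    return sum(w * w for w in weights)


w1 = [Fraction(1, 4)] * 4                                          # mu_hat_1
w2 = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)]  # mu_hat_2

for name, w in (("mu_hat_1", w1), ("mu_hat_2", w2)):
    # Weights summing to 1 means the estimator is unbiased
    print(name, "sum of weights =", sum(w),
          "Var / sigma^2 =", variance_factor(w), f"({float(variance_factor(w)):.5f})")
```

Equal weights minimise $\sum_i w_i^2$ among weights summing to 1, which is why the simple average comes out more efficient. `Fraction` prints $\tfrac{22}{64}$ in lowest terms as `11/32`.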
Consistency: describes the behaviour of the sampling distribution of the estimator as the sample size $n$ gets LARGER. The larger $n$, the more consistent the estimator: it gets closer to the population parameter.
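Consistency can be seen by repeating the estimation at growing sample sizes. This is a sketch under assumptions: observations are drawn from a Uniform(0, 1) population (true mean 0.5), and the helper `mean_abs_error` (hypothetical name) averages $|\bar{x} - \mu|$ over 100 repeated samples for each $n$.

```python
import random

random.seed(1)
MU = 0.5  # population mean of a Uniform(0, 1) variable (hypothetical example)


def mean_abs_error(n, reps=100):
    """Average |x_bar - mu| over `reps` independent samples of size n."""
    total = 0.0
    for _ in range(reps):
        x_bar = sum(random.random() for _ in range(n)) / n
        total += abs(x_bar - MU)
    return total / reps


errors = {n: mean_abs_error(n) for n in (10, 100, 1_000, 10_000)}
for n, err in errors.items():
    print(f"n = {n:>6}: average |x_bar - mu| = {err:.4f}")
```

The average error shrinks roughly like $1/\sqrt{n}$, so the sample mean gets closer to $\mu$ as $n$ grows.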

Biased estimator: the estimator tends to over- or underestimate the parameter → not precise (measured with the MSE).

ex: sample medians, ranges and the sample standard deviation.

↳ An unbiased estimator is not necessarily the most efficient one.

Biased estimators don't take into consideration some of the numbers and don't target the population parameter.
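A standard, easy-to-check case of bias (our own illustration, not one of the examples in the notes) is the sample variance computed with divisor $n$ instead of $n - 1$: its expected value is $\tfrac{n-1}{n}\sigma^2$, so it systematically underestimates $\sigma^2$. The sketch assumes a normal population with $\sigma^2 = 4$ and small samples of $n = 5$.

```python
import random

random.seed(2)
sigma2 = 4.0   # hypothetical population variance (standard deviation 2)
n = 5          # small samples make the bias visible
reps = 50_000


def var_divisor(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / divisor


biased, unbiased = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 2.0) for _ in range(n)]
    biased.append(var_divisor(sample, n))        # divides by n: biased
    unbiased.append(var_divisor(sample, n - 1))  # divides by n - 1: s^2, unbiased

avg_biased = sum(biased) / reps
avg_unbiased = sum(unbiased) / reps
print(f"average with divisor n:     {avg_biased:.3f} (theory {(n - 1) / n * sigma2:.3f})")
print(f"average with divisor n - 1: {avg_unbiased:.3f} (theory {sigma2:.3f})")
```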

Assumptions for simple linear regression

① Linearity: the dependent variable $Y$ is related to the independent variable $X$ and the error $u$ as $Y = \beta_0 + \beta_1 X + u$.

② Random sampling: we assume the data we are given is selected randomly.

③ Sample variation: the $X$ sample values ($x_1, x_2, x_3, \dots$) are not all the same value.

④ Zero conditional mean: $u$ has an expected value of 0 given any value of $X$: $E(u \mid X) = 0$.


Linear Regression Model

→ Linear equation: $Y = \beta_0 + \beta_1 X$

Simple linear regression model: we only have one $X$.

* $Y$: dependent variable
* $\beta_0$: the intercept, the value of $Y$ if $X = 0$
* $\beta_1$: the slope, the change in $Y$ for every unit change in $X$

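Sample covariance:

$\mathrm{cov}(X, Y) = \dfrac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \dots + (x_n - \bar{x})(y_n - \bar{y})}{n - 1}$

OLS slope and intercept, where $s_x^2$ is the sample variance of $X$ ($s_x$ is its standard deviation):

$\hat\beta_1 = \dfrac{\mathrm{cov}(X, Y)}{s_x^2}$

$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$

→ $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$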
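The two formulas can be applied directly. The data below are hypothetical toy values; both the covariance and the variance use the divisor $n - 1$, which cancels in the slope.

```python
# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and sample variance, both with divisor n - 1
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

beta1_hat = cov_xy / var_x              # slope
beta0_hat = y_bar - beta1_hat * x_bar   # intercept

print(f"cov = {cov_xy:.2f}, s_x^2 = {var_x:.2f}")
print(f"y_hat = {beta0_hat:.2f} + {beta1_hat:.2f} x")
```

Here $\mathrm{cov}(X, Y) = 1.5$ and $s_x^2 = 2.5$, so $\hat\beta_1 = 0.6$ and $\hat\beta_0 = 4 - 0.6 \cdot 3 = 2.2$.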

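Correlation coefficient:

$\rho_{xy} = \dfrac{\mathrm{cov}(X, Y)}{s_x s_y}$

The standard deviations are ALWAYS POSITIVE, so the correlation coefficient can't have a different sign from the covariance. In this example the correlation is negative while the covariance is positive ⇒ NOT COMPATIBLE.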
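A minimal sketch of the correlation formula on hypothetical toy data; the final `assert` encodes the sign argument: dividing by the positive product $s_x s_y$ can never flip the sign of the covariance.

```python
import math

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def correlation(x, y):
    """Return (sample covariance, correlation coefficient)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))  # always positive
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))  # always positive
    return cov, cov / (s_x * s_y)


cov, r = correlation(x, y)
print(f"cov = {cov:.3f}, r = {r:.4f}")
# s_x and s_y are positive, so r always has the same sign as cov
assert math.copysign(1, r) == math.copysign(1, cov)
```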


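Coefficient of variation (CV): measures dispersion relative to the mean.

$s = \sqrt{s^2}, \qquad \bar{x} = \dfrac{\text{sum of all the } X}{n}, \qquad CV = \dfrac{s}{\bar{x}}$

ex: $s = \sqrt{1000} = 31.622$ and $\bar{x} = \dfrac{5000}{50} = 100$, so $CV = \dfrac{31.622}{100} = 0.3162$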
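The coefficient of variation is a one-line computation. This sketch assumes the example's figures are $s^2 = 1000$ and $\bar{x} = 5000/50$; the function name `coefficient_of_variation` is our own.

```python
import math


def coefficient_of_variation(s, mean):
    """CV = standard deviation / mean (unit-free, so variables in different units can be compared)."""
    return s / mean


s = math.sqrt(1000)   # standard deviation from the variance
x_bar = 5000 / 50     # sum of all the X divided by n
cv = coefficient_of_variation(s, x_bar)
print(f"s = {s:.3f}, x_bar = {x_bar}, CV = {cv:.4f}")
```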

A scatter plot is a typical application of a regression analysis (not biased).

The correlation coefficient $r$ is between −1 and +1 and measures how close the points are to a line:

r = +1 → perfect positive linear relationship
r ≈ +1 → strong positive linear relationship
0 < r < 1 → moderate/weak positive linear relationship
r = 0 → no linear relationship
−1 < r < 0 → moderate/weak negative linear relationship
r ≈ −1 → strong negative linear relationship
r = −1 → perfect negative linear relationship

[Figure: example scatter plot of Y against X]

In conclusion → the linear correlation coefficient indicates a strong LINEAR relationship, but the graph shows a non-linear relationship (a curve), so the linear correlation coefficient should be closer to 0.
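$r$ only measures linear association, so a perfectly exact but curved relationship can give $r = 0$. A sketch with hypothetical points: $y = x^2$ on values symmetric around 0 versus the straight line $y = 2x + 1$.

```python
import math


def correlation(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # the (n - 1) factors cancel


x = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [xi ** 2 for xi in x]     # exact non-linear (quadratic) relationship
y_line = [2 * xi + 1 for xi in x]   # exact linear relationship

r_curve = correlation(x, y_curve)
r_line = correlation(x, y_line)
print(f"r for the curve: {r_curve:.3f}, r for the line: {r_line:.3f}")
```

The curve is fully determined by $x$, yet its $r$ is 0, which is why the graph should always be checked alongside $r$.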

NBA exercise:

a) Cross section: NBA players, 56 observations.

b) 23

c) We can't compare them directly, because the variables have different units; we need to calculate the coefficient of variation (CV).

d) Yes → older players earn a lower salary: ρ = −0.075 (negative).

e) Points & minutes → largest linear correlation coefficient.
Estimation: Ordinary Least Squares Model

→ OLS: method of estimating the unknown parameters.

Regression equation: $Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat{u}_i$

Fitted value: given $\hat\beta_0$ and $\hat\beta_1$ we obtain $\hat{Y}_i$, the predicted value:

$\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$

Residual value: the difference between $Y_i$ and its fitted value:

$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - (\hat\beta_0 + \hat\beta_1 X_i)$

* if $\hat{u}_i > 0$ → under-predicting $Y_i$
* if $\hat{u}_i < 0$ → over-predicting $Y_i$

Preferred: $\hat{u}_i = 0$, but in most cases every residual $\hat{u}_i \neq 0$.

Residual = actual $y$ value − predicted $y$ value, e.g. $\hat{u}_i = 9095 - 7224.058$.

With more regressors: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \;\to\; \hat{Y} = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2$
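Fitted values and residuals follow directly from the estimated line. A sketch on hypothetical toy data: each residual is labelled as under- or over-prediction, and, because the model includes an intercept, the OLS residuals sum to 0 (up to rounding) even though almost none of them is 0 individually.

```python
# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
beta1_hat = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
beta0_hat = y_bar - beta1_hat * x_bar

fitted = [beta0_hat + beta1_hat * xi for xi in x]      # Y_hat_i
residuals = [yi - fi for yi, fi in zip(y, fitted)]     # u_hat_i = Y_i - Y_hat_i

for yi, fi, ui in zip(y, fitted, residuals):
    status = "under-predicting" if ui > 0 else "over-predicting" if ui < 0 else "exact"
    print(f"Y = {yi:.1f}  Y_hat = {fi:.2f}  u_hat = {ui:+.2f}  ({status})")
print(f"sum of residuals: {sum(residuals):.10f}")
```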

Goodness Of Fit

Coefficient of determination: R-squared
→ the fraction of the sample variation in $Y$ that is explained by $X$.

Value of $R^2$: $0 \le R^2 \le 1$

* $R^2 = 1$ → perfect fit
* $R^2 \approx 0$ → poor fit of the OLS line

$R^2 = \dfrac{SSE}{SST} = 1 - \dfrac{SSR}{SST}$
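The three sums of squares and both forms of $R^2$ can be computed on a small hypothetical data set. The names follow the notes' convention (SSE = explained, SSR = residual); some textbooks swap these two abbreviations.

```python
# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
beta1_hat = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
beta0_hat = y_bar - beta1_hat * x_bar
fitted = [beta0_hat + beta1_hat * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation in Y
sse = sum((fi - y_bar) ** 2 for fi in fitted)            # explained by the regression
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual, unexplained

r2 = sse / sst
print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SSR = {ssr:.2f}")
print(f"R^2 = SSE/SST = {r2:.3f} = 1 - SSR/SST = {1 - ssr / sst:.3f}")
```

For these values SST = 6 = SSE + SSR = 3.6 + 2.4, so $R^2 = 0.6$: 60% of the variation in $Y$ is explained by $X$.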
INTERPRET: $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$

* Intercept $\hat\beta_0$: the value of $\hat{Y}$ when $X = 0$
* Slope $\hat\beta_1$: $\Delta\hat{Y} = \hat\beta_1 \Delta X$, the change in $\hat{Y}$ when $X$ increases by one unit

For the slope: if income per capita increases by 1 thousand euros, consumption of electric energy increases by 0.571 thousand kWh on average.

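FIND OLS ESTIMATOR

Step 1 → $X$? (independent, explanatory): $X = g$
$Y$? (dependent, explained): $Y = e$

Step 2 → $\hat{y} = \hat\beta_0 + \hat\beta_1 x$, i.e. $\hat{e}_i = \hat\beta_0 + \hat\beta_1 g_i$, with

$\hat\beta_1 = \dfrac{\mathrm{Cov}(g, e)}{\mathrm{Var}(g)}, \qquad \hat\beta_0 = \bar{e} - \hat\beta_1 \bar{g}$

$\mathrm{Cov}(g, e) = \dfrac{\sum_i (g_i - \bar{g})(e_i - \bar{e})}{n - 1} = \dfrac{29.76}{24} = 1.24$

$\mathrm{Var}(g) = \dfrac{\sum_i (g_i - \bar{g})^2}{n - 1} = \dfrac{60.77}{24} = 2.53$

$\hat\beta_1 = \dfrac{1.24}{2.53} \approx 0.489$

$\hat\beta_0 = 0.83 - 0.489 \cdot 2.82 = -0.549$

→ $\hat{e}_i = -0.549 + 0.489\,g_i$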
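The same estimates can be reproduced from the summary statistics alone. This sketch takes the numbers as read from the worked example ($n - 1 = 24$, the two sums, $\bar{g} = 2.82$, $\bar{e} = 0.83$); the variable names are our own.

```python
# Summary statistics as read from the worked example (n = 25, so n - 1 = 24)
sum_cross = 29.76   # sum of (g_i - g_bar)(e_i - e_bar)
sum_sq_g = 60.77    # sum of (g_i - g_bar)^2
g_bar, e_bar = 2.82, 0.83
n_minus_1 = 24

cov_ge = sum_cross / n_minus_1   # sample covariance
var_g = sum_sq_g / n_minus_1     # sample variance of g
beta1_hat = cov_ge / var_g       # the (n - 1) cancels: 29.76 / 60.77
beta0_hat = e_bar - beta1_hat * g_bar

# Intercept computed with the slope rounded to 0.489, as in the hand calculation
beta0_from_rounded = e_bar - 0.489 * g_bar

print(f"Cov = {cov_ge:.2f}, Var = {var_g:.2f}")
print(f"e_hat = {beta0_hat:.3f} + {beta1_hat:.4f} g")
print(f"intercept with rounded slope: {beta0_from_rounded:.3f}")
```

At full precision the slope is about 0.4897 and the intercept about −0.551; the hand value −0.549 comes from using the rounded slope 0.489.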
Calculating $R^2$

→ SST (sum of squares total)
→ SSE (sum of squares explained by the regression)
→ SSR (sum of squares residual): unexplained by the regression

$R^2 = \dfrac{SSE}{SST}, \qquad 0 \le R^2 \le 1$

→ $R^2 = 0.58$ → 58%

* 58% of the variability in the average growth rate is explained by the growth rate of GDP.
Properties of the regression coefficients
MULTIPLE linear regression

MLR assumptions:

① Linearity → $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + u$
↳ non-linear: biased estimator, less efficient

② Random sampling: a random sample of $n$ observations
↳ non-random: biased estimator

③ No perfect collinearity: one variable can't be determined perfectly from the others.
Perfect collinearity example: in $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$ with $X_3 = X_1 + X_2$, knowing the values of $X_1$ and $X_2$ determines the value of $X_3$ exactly, which this assumption rules out.

④ Zero conditional mean: $u$ has an expected value of 0 given any values of the independent variables.
↳ violated: the regression model differs from the true model

⑤ Homoskedasticity: $u$ has the same variance given any values of the independent variables.
↳ violated: the variance is biased
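Perfect collinearity shows up as a rank-deficient design matrix: OLS needs the columns (constant, $X_1$, $X_2$, $X_3$) to be linearly independent to have a unique solution. A sketch assuming NumPy is available, with hypothetical random regressors:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3_collinear = x1 + x2       # X3 is perfectly determined by X1 and X2
x3_ok = rng.normal(size=n)   # X3 carries its own information

ones = np.ones(n)            # column for the intercept beta_0
X_bad = np.column_stack([ones, x1, x2, x3_collinear])
X_ok = np.column_stack([ones, x1, x2, x3_ok])

rank_bad = np.linalg.matrix_rank(X_bad)
rank_ok = np.linalg.matrix_rank(X_ok)
print("rank with X3 = X1 + X2:", rank_bad, "of", X_bad.shape[1], "columns")
print("rank with independent X3:", rank_ok, "of", X_ok.shape[1], "columns")
```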

MLR

$salary = \beta_0 + \beta_1\,points + \beta_2\,HT + \beta_3\,age + \beta_4\,wt + u$

When points increase by 1 point, salary will increase by 0.33 thousand dollars on average, holding all the other variables (HT, age, wt) fixed: CETERIS PARIBUS.
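The ceteris paribus reading of a slope can be made concrete: raise one regressor by 1 while keeping the others fixed, and the prediction moves by exactly that coefficient. A sketch assuming NumPy, with a hypothetical noise-free model (coefficients and regressor ranges invented for the illustration) so that `np.linalg.lstsq` recovers the true coefficients.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 56
points = rng.uniform(0, 30, n)   # hypothetical regressor
age = rng.uniform(20, 38, n)     # hypothetical regressor

true_beta = np.array([100.0, 0.33, -2.0])   # hypothetical beta_0, beta_points, beta_age
y = true_beta[0] + true_beta[1] * points + true_beta[2] * age  # no error term

X = np.column_stack([np.ones(n), points, age])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS fit

# Ceteris paribus: +1 point, age held fixed
x_row = np.array([1.0, 10.0, 25.0])
x_row_plus = x_row + np.array([0.0, 1.0, 0.0])
change = x_row_plus @ beta_hat - x_row @ beta_hat
print("beta_hat:", np.round(beta_hat, 4))
print(f"change in predicted salary for +1 point: {change:.4f}")
```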
Model 1: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + u$ (4 variables): slope of points = 0.33

Model 2: slope of points = 0.58

→ The slope of points is higher in Model 2.
R-squared

If we add variables, $R^2$ increases (it never decreases), so to compare models with different numbers of variables we need the adjusted $R^2$:

$\bar{R}^2 = 1 - \dfrac{n - 1}{n - k - 1}\,(1 - R^2)$

where $n$ is the number of observations and $k$ the number of independent variables.

Example with $n = 56$ and $k = 4$:

$\bar{R}^2 = 1 - \dfrac{56 - 1}{56 - 4 - 1}\,(1 - 0.078) = 0.006$

The other model has $R^2 = 0.779$, so its adjusted $R^2$ is computed the same way with $(1 - 0.779)$.

→ We prefer MODEL 1.
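The adjusted $R^2$ formula as a small function (the name `adjusted_r2` is our own). The second part uses hypothetical numbers to show the penalty at work: adding a fifth variable that nudges $R^2$ from 0.500 to 0.505 still lowers the adjusted value.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


# Worked example from the notes: n = 56, k = 4, R^2 = 0.078
print(round(adjusted_r2(0.078, n=56, k=4), 3))

# Hypothetical: a fifth variable raises R^2 slightly but the adjusted R^2 falls
before = adjusted_r2(0.500, n=56, k=4)
after = adjusted_r2(0.505, n=56, k=5)
print(f"adjusted R^2 with 4 variables: {before:.4f}, with 5 variables: {after:.4f}")
```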
Problem set 1

MSE & BIAS

$\mathrm{Bias}(\hat\theta) = E[\hat\theta] - \theta$

$\mathrm{MSE}(\hat\theta) = \mathrm{Var}(\hat\theta) + \mathrm{Bias}(\hat\theta)^2$

a) $\hat\mu_1$ is unbiased: $E[\hat\mu_1] = \mu$, so $\mathrm{Bias}(\hat\mu_1) = 0$.

$\mathrm{MSE}(\hat\mu_1) = 0.5 + 0^2 = 0.5$

$\mathrm{MSE}(\hat\mu_2) = 0.725$

We prefer the first estimator, as its MSE is lower than that of the second.
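The decomposition $\mathrm{MSE} = \mathrm{Var} + \mathrm{Bias}^2$ can be checked numerically. The sketch uses a hypothetical biased estimator (0.9 times the mean of 4 normal draws, true parameter 5, so the expected bias is −0.5); with population-style averages over the simulated estimates, the direct MSE and the decomposed MSE agree up to floating-point rounding.

```python
import random


def mse(variance, bias):
    """MSE decomposition: variance plus squared bias."""
    return variance + bias ** 2


print(mse(0.5, 0.0))  # an unbiased estimator: MSE equals its variance

random.seed(3)
theta = 5.0  # hypothetical true parameter
# Hypothetical shrunken estimator: 0.9 times the mean of 4 draws (biased towards 0)
estimates = [0.9 * sum(random.gauss(theta, 2.0) for _ in range(4)) / 4
             for _ in range(20_000)]

m = sum(estimates) / len(estimates)
variance = sum((e - m) ** 2 for e in estimates) / len(estimates)
bias = m - theta
mse_direct = sum((e - theta) ** 2 for e in estimates) / len(estimates)
print(f"bias = {bias:.3f}, Var + Bias^2 = {mse(variance, bias):.4f}, "
      f"direct MSE = {mse_direct:.4f}")
```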

b) None: they don't use all the information in the sample.
c) $\rho_{xy} = \dfrac{\mathrm{Cov}(x, y)}{s_x s_y}$, with $\bar{x} = 7.3$ and $\bar{y} = 8.716$

$\mathrm{Cov}(x, y)$ uses $(6.0 - 7.3)(3.4 - 8.716) + (7.2 - 7.3)(9.5 - 8.716) + \dots$

$\mathrm{Var}(y)$ uses $(3.4 - 8.716)^2 + (9.5 - 8.716)^2 + (6.6 - 8.716)^2 + \dots$

$\rho \approx 0.1665$ → close to 0: WEAK

There is a positive correlation between U and I, but it is weak.
Problem set 2

INTERPRET slope and intercept

a) Slope: if the city population increases by 1%, then the rent rates will increase by 0.03% on average.
→ Yes, they have a positive relationship: the more people there are in the city, the higher the rent rates.

b) $R^2 = 0.192$ → $0 \le R^2 \le 1$, close to 0: WEAK
→ The determination coefficient shows a positive but weak linear relationship: 19.2% of the variability of rent rates is explained by the city population.
a) Intercept (constant): if the spending in marketing is 0, then the members of parliament will be 2.684 on average.

b) Slope: if marketing spending increases by 1 thousand euros, then the members of parliament will increase by 0.025 on average.

c) $R^2 = 0.392$ → 39.2% of the variability of members of parliament is explained by spending on marketing.

d) Prediction for marketing spending of 50 (thousand euros):

$\widehat{ME} = 2.684 + 0.025 \cdot 50 = 3.934$
Create the model: $Y = \beta_0 + \beta_1 X + u$

a) $X$? attend; $Y$? grade
→ $grade_i = \beta_0 + \beta_1\,attend_i + u_i$
Negative relationship: when a student has more absences, they will have a lower grade.

b) Notes taken in class.

c) Slope: if a student's number of absences increases by 1 absence, their final grade will decrease by 0.26 points on average.
$R^2 = 0.0785$ → 7.85% of the variability of students' final grades is explained by their absences.

d) $\widehat{grade}_i = 5.68 - 0.26 \cdot 0 = 5.68$

e) If participation in class increases by 1, the student's grade will increase by 0.76 points on average.

→ Model 1: $R^2 = 0.0785$ → 7.85%; adjusted $R^2 = 0.0183$ → 1.83%
→ Model 2: $R^2 = 0.385$ → 38.5%; adjusted $R^2 = 0.255$ → 25.5%

Model 2 is better than Model 1, as 25.5% of students' grades is explained by their absences and participation.
Mock exam

→ Variance / variation: if the error variance $\sigma^2$ is higher, then overall $se(\hat\beta_j)$ will be higher (and if not, lower): higher variance ⇒ higher se.

D) Assumptions
→ Linearity: the assumption is broken, as there is no linear relationship. The estimators are biased and less efficient.
③ INTERPRET

If R&D ↑ by 1% of GDP, innovation will ↑ by 8.48 points.

(4.82) (63.3) (353)
↳ countries

Model 1: $R^2 = 63\%$
Model 2 (different variables): $R^2 = 68\%$

Adjusted $R^2$ for each model: $\bar{R}^2 = 1 - \dfrac{n - 1}{n - k - 1}(1 - R^2)$

In Model 2, 89.3% of the variability of innovation is explained by R&D, tertiary.edu & GDP. Adding two more variables helps us explain more of the variability of innovation compared to Model 1.