Empirical Methods For Finance: Sjoerd Van Den Hauwe

The document discusses principal components analysis (PCA) as a method to reduce the dimensionality of correlated explanatory variables and extract common signals from imperfect proxies. It provides an example of using PCA to compute investor sentiment from a set of market-based sentiment proxies. Specifically, it (1) standardizes and transforms the proxies, (2) computes principal components from the correlation matrix to obtain loadings and eigenvectors, and (3) uses the top component to represent the common sentiment signal shared among the proxies.


Empirical Methods for Finance

(FEM11198-21)

Sjoerd van den Hauwe



Department of Business Economics – Finance


Erasmus School of Economics
Erasmus Universiteit Rotterdam

Master’s in Financial Economics


Week 7, December 8, 2021
Outline Week 7

Part I: Principal components analysis


I Motivation
I Investor sentiment
I Computation
I Number of components
Part II: Q&A

Motivation

All covered methods consider the set of regressors as known.


Two issues might pop up.
I Regressors show strong mutual correlation (week 3).
I Imperfect proxies for the financial quantity of interest.

Remedies
I Summarizing the regressors in fewer variables (dimension reduction).
I Combining proxies to extract a stronger signal of the financial quantity.

→ Principal components analysis (PCA) can do both jobs.

Setting

For each firm $i$ ($i = 1, 2, \ldots, T$) we have

I The set of explanatory variables $x_{j,i}$ ($j = 1, 2, \ldots, J$)
I → Potentially relevant for the dependent variable $y_i$.

Pairs of regressors $(x_{j_1}, x_{j_2})$ can

I Be strongly correlated (check sample correlations; VIFs).


I Share a common component (economics/finance theory).

PCA: Construct linear combinations of regressors such that

I Much of the variation in the regressor set is captured in few combinations.


I These linear combinations are uncorrelated (orthogonal).

Regressor Combination

We stack regressor observations per firm i


I $x_i = (x_{1,i}, x_{2,i}, \ldots, x_{J,i})'$
I No constant (= regressor that does not vary)

Collect these firm regressor vectors in

I $X = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_T' \end{pmatrix}$ (see week 1).
I $T \times J$ matrix.

A linear combination of the $J$ regressors is

I $z = Xa = \left( \sum_{j=1}^{J} x_{j,1} a_j, \; \sum_{j=1}^{J} x_{j,2} a_j, \; \ldots, \; \sum_{j=1}^{J} x_{j,T} a_j \right)'$,
I $a = (a_1, a_2, \ldots, a_J)'$, a $J \times 1$ vector with linear combination weights.
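As a minimal sketch of the linear combination $z = Xa$, the snippet below uses a made-up $4 \times 3$ regressor matrix and made-up weights (both are illustrative assumptions, not course data) and checks that the matrix product agrees with the explicit sum over regressors.

```python
import numpy as np

# Hypothetical toy data: T = 4 firms, J = 3 regressors (no constant column).
X = np.array([
    [1.0, 2.0, 0.5],
    [0.0, 1.0, 1.5],
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
])
a = np.array([0.5, -1.0, 2.0])  # J x 1 vector of linear combination weights

# Matrix form: z is a T x 1 vector with z_i = sum_j x_{j,i} * a_j.
z = X @ a

# The same quantity written out as the explicit sum over regressors j.
z_explicit = np.array([
    sum(X[i, j] * a[j] for j in range(X.shape[1]))
    for i in range(X.shape[0])
])
```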

Principal Components

PCA finds the linear combinations $z_k = X a_k$ ($k = 1, 2, \ldots, J$) such that

I Total variation of all the $z_k$ equals total regressor variation.
I All combinations are uncorrelated: $z_{k_1}' z_{k_2} = 0$ for $k_1 \neq k_2$.

→ These linear combinations, ordered (descending) according to their sample variance, are called the sample principal components.

Hence, the sample variance of the first principal component $z_1$ is at least as large as that of the second, $z_2$, and so on.

Investor Sentiment Example

Baker & Wurgler (2006, 2007) [BW]:


I Relation investor sentiment and stock prices.

BW define investor sentiment as “[investor’s] propensity to speculate.”

No definitive or uncontroversial measures of sentiment →


I Construct a composite index
I Use the common variation in a set of sentiment proxies.

Proxies can be
I Survey based (e.g., confidence indexes)
I Market based (trading characteristics)

Both can be dealt with using PCA to extract a common component that is
defined as investor sentiment.

Market-Based Proxies

BW consider 5 variables, each containing a part representative of investor sentiment.

Monthly data on (acronym; sign of relation with sentiment)

I Number of initial public offerings (IPOs) (NIPO; +)
I First-day IPO returns: Monthly average (RIPO; +)
I Closed-end fund discount: Average difference between net asset value per share and market price (CEFD; –)
I Equity share in new issues: Volume of new equity issues relative to new equity + long-term debt issues (ENI; +)
I Dividend premium: Log difference in average value-weighted market-to-book (M2B) ratios of dividend payers and non-payers (PDND; –)

All 5 contain a sentiment component and an idiosyncratic (= non-sentiment) part.

Transformations

Check regressors for


I Relative timing (some lead/lag others) (time-series data)
I Trending behavior (same reasons as in week 3) (time-series data)
I Aberrant observations

→ Check plots.

Additionally: In most cases we standardize the $x_j$'s

I If one regressor’s sample variance dominates the others’, that regressor will essentially be the first principal component.

→ Standardization removes the impact of the regressors’ scale.

I It is the scale-free mutual correlation we need.
I PCA combines those parts of the regressors that co-move to form a linear combination with maximum sample variance.
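The effect of scale can be illustrated with a small simulation. This is a sketch under assumptions: the data are simulated (three hypothetical proxies sharing one common component, the first measured on a scale 100 times larger), and the helper names `standardize` and `first_loadings` are introduced here for illustration only.

```python
import numpy as np

def standardize(X):
    """Demean each column and divide by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def first_loadings(X):
    """Unit-length eigenvector of the sample covariance matrix of X
    that belongs to the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return eigvecs[:, -1]  # eigh returns eigenvalues in ascending order

# Simulated (hypothetical) data: a common component plus idiosyncratic noise.
rng = np.random.default_rng(0)
T = 500
common = rng.standard_normal(T)
X_raw = np.column_stack([
    100.0 * (common + rng.standard_normal(T)),  # same signal, much larger scale
    common + rng.standard_normal(T),
    common + rng.standard_normal(T),
])

# Absolute values are used because eigenvectors are only known up to sign.
load_raw = np.abs(first_loadings(X_raw))               # dominated by column 0
load_std = np.abs(first_loadings(standardize(X_raw)))  # roughly equal weights
```

Without standardization the first component is essentially the large-scale regressor itself; after standardization all three proxies receive weights of similar size.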

Example

Investor sentiment: Monthly data for January 1967–December 2018 (T = 624)

Timing:
Some proxies lead/lag (inspecting plots and cross-correlations):
I Dividend premium (PDND) leads 12 months.
I RIPO leads 12 months.

Transformations:
I 12-Month moving averages to remove noise.
I All 5 proxies standardized.
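The transformations listed above could be sketched as follows. The slides do not say whether the moving average is trailing or centered, so the trailing version below is an assumption, and the helper names are hypothetical. A proxy that leads by 12 months enters with its value from 12 months earlier, which is what `lag` does.

```python
import numpy as np

def trailing_moving_average(x, window=12):
    """Trailing moving average; the first window-1 entries are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def lag(x, k=12):
    """Shift a series k >= 1 periods: out[t] = x[t - k], first k entries NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    out[k:] = x[:-k]
    return out

def standardize(x):
    """Zero mean and unit sample variance, ignoring NaN entries."""
    x = np.asarray(x, dtype=float)
    return (x - np.nanmean(x)) / np.nanstd(x, ddof=1)

# Hypothetical monthly series, only to show the order of the steps.
raw = np.arange(48, dtype=float)
smoothed = trailing_moving_average(raw, window=12)
proxy = standardize(lag(smoothed, k=12))
```

Rows that contain NaN after smoothing and lagging would have to be dropped before the correlation matrix is computed.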

Inversely Related Proxies
[Figure: The 2 transformed proxies inversely related to sentiment, Jan. 1967–Dec. 2018. Legend: RECESSION, CEFD, PDND.]

Positively Related Proxies
[Figure: The 3 transformed proxies positively related to sentiment, Jan. 1967–Dec. 2018. Legend: RECESSION, RIPO, NIPO, ENI.]

Eigenvalues and Eigenvectors

A square ($q \times q$) matrix $A$ has $q$

I Eigenvalues (scalars) $\lambda_i$, $i = 1, \ldots, q$
I Associated eigenvectors ($q \times 1$ vectors) $e_i$.

Each eigenvalue-eigenvector pair $(\lambda_i, e_i)$ has the characteristic that

$$A e_i = \lambda_i e_i$$

→ Matrix times eigenvector equals eigenvector multiplied by the eigenvalue.

Note:
I Eigenvectors are known up to a proportionality factor.
I If $e_i$ is an eigenvector, then so is $c \cdot e_i$ for any scalar $c \neq 0$.
I We normalize the eigenvectors to unit length, so that $e_i' e_i = 1$.
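These properties can be checked numerically. The sketch below uses a small symmetric matrix chosen purely for illustration and NumPy's `np.linalg.eigh`, which is intended for symmetric matrices such as a correlation matrix.

```python
import numpy as np

# Small symmetric example matrix (chosen for illustration only).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns the eigenvalues in ascending order and the eigenvectors
# as the columns of eigvecs, each normalized to unit length.
eigvals, eigvecs = np.linalg.eigh(A)

for lam, e in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ e, lam * e)                  # A e = lambda e
    assert np.isclose(e @ e, 1.0)                       # unit length: e'e = 1
    assert np.allclose(A @ (3.0 * e), lam * (3.0 * e))  # c*e is an eigenvector too
```

Because the sign of a normalized eigenvector is still arbitrary, different software may report $e_i$ or $-e_i$.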

Computing Principal Components

If regressor variables are standardized, then

$$R \equiv \frac{1}{T-1} X'X$$

is the matrix with sample correlations.

Employing PCA:
1. Compute all eigenvalue-eigenvector pairs $(\lambda_k, e_k)$ of $R$ ($k = 1, 2, \ldots, J$).

I Sort eigenvalues in descending order: $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_J$.
I Eigenvectors are normalized: $e_k' e_k = 1$.

2. The $k$th sample principal component (SPC) is the $T \times 1$ vector

$$z_k = X e_k$$

→ the $i$th observation of the $k$th SPC is $z_{k,i} = x_i' e_k$.
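The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's own implementation: the function name `sample_principal_components` and the simulated input are assumptions made for the example.

```python
import numpy as np

def sample_principal_components(X_raw):
    """Return (eigenvalues, loadings, SPCs) based on the sample correlation matrix.

    X_raw is a T x J array of regressors without a constant column.
    """
    X = np.asarray(X_raw, dtype=float)
    T = X.shape[0]
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each column
    R = X.T @ X / (T - 1)                              # sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                  # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Z = X @ eigvecs                                    # column k holds z_k = X e_k
    return eigvals, eigvecs, Z

# Hypothetical simulated data, only to exercise the function.
rng = np.random.default_rng(1)
X_demo = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 4))
lam, E, Z = sample_principal_components(X_demo)
```

On any input the returned SPCs should be mutually uncorrelated, with sample variances equal to the eigenvalues, and the eigenvalues should sum to $J$.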

Correlation Matrix

Investor sentiment example January 1967–December 2018 (T = 624)

Correlation matrix R (lower triangle):

        CEFD    NIPO    PDND    RIPO    ENI
CEFD    1
NIPO   −0.35    1
PDND    0.61   −0.53    1
RIPO   −0.24    0.29   −0.51    1
ENI     0.25    0.27   −0.04    0.15    1

→ Finance interpretation: Mutual correlation predominantly due to the sentiment component they share.

→ PCA to extract this common component: First SPC summarizes investor sentiment.

Loadings

Eigenvalues of sample correlation matrix $R$ in descending order.

$\lambda_1$    $\lambda_2$    $\lambda_3$    $\lambda_4$    $\lambda_5$
2.30    1.25    0.73    0.42    0.29

Associated eigenvectors are called SPCs’ loadings (elements ordered CEFD, NIPO, PDND, RIPO, ENI).

$e_1 = (-0.473, \; 0.482, \; -0.589, \; 0.437, \; 0.080)'$,
$e_2 = (0.467, \; 0.256, \; 0.081, \; 0.182, \; 0.822)'$,
$e_3 = (0.208, \; -0.540, \; -0.028, \; 0.805, \; -0.126)'$,
$e_4 = (0.489, \; 0.621, \; 0.200, \; 0.213, \; -0.538)'$,
$e_5 = (-0.525, \; 0.159, \; 0.778, \; 0.287, \; 0.109)'$.
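As a rough cross-check, the reported eigenvalues and first loadings can be approximately recovered from the reported correlation matrix. The correlations are rounded to two decimals, so only approximate agreement can be expected; the tolerance mentioned below is an assumption chosen to allow for that rounding.

```python
import numpy as np

# Sample correlations as reported (rounded to two decimals);
# order of rows and columns: CEFD, NIPO, PDND, RIPO, ENI.
lower = np.array([
    [ 1.00,  0.00,  0.00, 0.00, 0.00],
    [-0.35,  1.00,  0.00, 0.00, 0.00],
    [ 0.61, -0.53,  1.00, 0.00, 0.00],
    [-0.24,  0.29, -0.51, 1.00, 0.00],
    [ 0.25,  0.27, -0.04, 0.15, 1.00],
])
R = lower + lower.T - np.eye(5)  # fill in the symmetric upper triangle

eigvals, eigvecs = np.linalg.eigh(R)   # ascending eigenvalues
order = np.argsort(eigvals)[::-1]      # re-sort in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Eigenvectors are only determined up to sign; flip the first one so that
# CEFD loads negatively, matching the sign convention of the reported e_1.
e1 = eigvecs[:, 0]
if e1[0] > 0:
    e1 = -e1

reported_eigvals = np.array([2.30, 1.25, 0.73, 0.42, 0.29])
reported_e1 = np.array([-0.473, 0.482, -0.589, 0.437, 0.080])

print(np.round(eigvals, 2))
print(np.round(e1, 3))
```

The computed values should agree with the reported ones to within a few hundredths; exact equality is not expected because of the rounding in $R$.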

Interpretation

Check how the $k$th SPC $z_k$ loads on the original variables $x_j$ ($j = 1, \ldots, J$).

I As the $x_j$ are standardized, the sample correlation of SPC and regressor is

$$\widehat{\mathrm{cor}}(z_k, x_j) = e_{k,j} \sqrt{\lambda_k}.$$

I Example: First sample principal component = investor sentiment:

$$z_{1,t} = -0.473 \cdot \mathrm{CEFD}_t + 0.482 \cdot \mathrm{NIPO}_t - 0.589 \cdot \mathrm{PDND}_{t-12} + 0.437 \cdot \mathrm{RIPO}_{t-12} + 0.080 \cdot \mathrm{ENI}_t.$$

I Correlation of the 5 financial sentiment proxies with investor sentiment

CEFD    NIPO    PDND    RIPO    ENI
−0.72   0.73    −0.89   0.66    0.12
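The correlations in this table follow from the reported first eigenvalue and loadings. The reasoning is that, for standardized regressors, the sample covariance of $z_k$ and $x_j$ is the $j$th element of $R e_k = \lambda_k e_k$, while $z_k$ has sample variance $\lambda_k$ and $x_j$ has sample variance 1. A short sketch of the arithmetic, using only the rounded numbers reported in the slides:

```python
import numpy as np

proxies = ["CEFD", "NIPO", "PDND", "RIPO", "ENI"]
lambda_1 = 2.30                                         # largest eigenvalue (reported)
e_1 = np.array([-0.473, 0.482, -0.589, 0.437, 0.080])   # first loadings (reported)

# cor(z_1, x_j) = e_{1,j} * sqrt(lambda_1) for standardized regressors
cor_with_sentiment = np.round(e_1 * np.sqrt(lambda_1), 2)

for name, c in zip(proxies, cor_with_sentiment):
    print(f"{name}: {c:+.2f}")
```

ENI is the only proxy that is weakly correlated with the first component, in line with its small loading of 0.080.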

Investor Sentiment Index
[Figure: Time-series plot of Baker & Wurgler Investor Sentiment, Jan. 1967–Dec. 2018. Legend: RECESSION, PC1 -- Investor Sentiment.]

Number of Components

In the example we have a clear financial interpretation of the first component.

How many components to select in general?

I If the goal is to summarize much of the total variation by a few components:
I Select the first $p$ that capture a relatively large share of it.

If regressors are standardized →

I Total sample variance equals the sum of the diagonal elements of $R$: $J \cdot 1 = J$.
I Sample variance of the $k$th SPC equals $e_k' R e_k = \lambda_k$.
I Hence, the $k$th SPC accounts for $\lambda_k / J \times 100\%$ of total sample variance.

As SPCs are uncorrelated, the first $p$ explain $J^{-1} \sum_{k=1}^{p} \lambda_k \times 100\%$.
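Applied to the eigenvalues reported for the investor sentiment example, the shares can be computed directly. This is a small sketch that uses the rounded eigenvalues from the slides, with $J = 5$ as the total variance of the standardized proxies.

```python
import numpy as np

eigvals = np.array([2.30, 1.25, 0.73, 0.42, 0.29])  # reported, rounded
J = len(eigvals)  # total sample variance of J standardized regressors

share = eigvals / J * 100                  # % explained by each SPC
cumulative = np.cumsum(eigvals) / J * 100  # % explained by the first p SPCs

print(np.round(share, 1))
print(np.round(cumulative).astype(int))
```

Rounded to whole percentages, the cumulative shares are 46, 71, 86, 94 and 100, so the first SPC alone captures close to half of the total sample variance.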

Choosing Components

How to choose the SPCs depends on the financial application.

1. Summarizing much of total sample variance in few SPCs (multicollinearity)

I Make a scree plot of $\lambda_k$ versus $k$:
I Look for the “elbow”: eigenvalues are more or less of equal size after the elbow.

2. Extracting a common component with financial interpretation


I Compute loadings of SPCs on regressors:
I Look for pattern in loadings/check plots.

Relation between regressors and $y_i$ is NOT taken into account by PCA.

I The last SPC can have the highest sample correlation with the regressand $y_i$!
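The warning about the last SPC can be made concrete with a simulated case. The setup is an assumption made for illustration: two regressors share a strong common component, but the regressand depends only on their difference, which is the low-variance direction that PCA ranks last.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
common = rng.standard_normal(T)
x1 = common + 0.3 * rng.standard_normal(T)
x2 = common + 0.3 * rng.standard_normal(T)
y = x1 - x2  # y depends only on the difference of the two regressors

# Standardize and compute the SPCs from the sample correlation matrix.
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = X.T @ X / (T - 1)

eigvals, eigvecs = np.linalg.eigh(R)   # ascending eigenvalues
order = np.argsort(eigvals)[::-1]      # re-sort in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
Z = X @ eigvecs

cor_first = np.corrcoef(Z[:, 0], y)[0, 1]  # first SPC: the common component
cor_last = np.corrcoef(Z[:, 1], y)[0, 1]   # last SPC: the difference
```

Here the first SPC captures most of the regressor variance yet is nearly uncorrelated with `y`, while the last SPC is almost perfectly correlated with it (up to sign).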

Scree Plot
Cumulative percentage of total sample variance explained by the first $k$ SPCs

k    1    2    3    4    5
%    46   71   86   94   100

[Figure: Scree plot of the ordered eigenvalues $\lambda_k$ against $k = 1, \ldots, 5$.]

Principal Components Regression

Observations $z_{k,i}$ on the first $p$ SPCs can serve as regressors in a model for $y_i$,

$$y_i = \beta_0 + \sum_{k=1}^{p} \beta_k z_{k,i} + \varepsilon_i.$$

I Much of the variation of the original regressors is retained.


I New regressors are mutually uncorrelated by construction.
I Finance interpretation: Examine correlation between relevant SPCs in the
regression and original regressors.

Example: Baker & Wurgler use investor sentiment


I To trade on it.
I As factor (=regressor) in empirical asset-pricing model.
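A principal components regression along these lines might be sketched as follows. The function name `pcr_fit` and the simulated data are assumptions for illustration; the estimation itself is ordinary least squares via `np.linalg.lstsq`.

```python
import numpy as np

def pcr_fit(X_raw, y, p):
    """OLS of y on a constant and the first p SPCs of the standardized regressors."""
    X = np.asarray(X_raw, dtype=float)
    y = np.asarray(y, dtype=float)
    T = X.shape[0]
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each column
    eigvals, eigvecs = np.linalg.eigh(X.T @ X / (T - 1))
    order = np.argsort(eigvals)[::-1]                   # descending eigenvalues
    Z = X @ eigvecs[:, order[:p]]                       # T x p matrix of SPCs
    design = np.column_stack([np.ones(T), Z])           # constant + first p SPCs
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta, Z

# Hypothetical simulated data, only to exercise the function.
rng = np.random.default_rng(3)
X_demo = rng.standard_normal((300, 5)) @ rng.standard_normal((5, 5))
y_demo = X_demo @ np.array([1.0, 0.5, 0.0, -0.5, 0.2]) + rng.standard_normal(300)
beta, Z = pcr_fit(X_demo, y_demo, p=2)
```

Because the SPCs have mean zero and are mutually uncorrelated, each slope equals the coefficient from a simple regression of $y_i$ on that SPC alone, and the intercept equals the sample mean of $y_i$.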

Course Conclusions

Range of commonly applied empirical methods for finance → to analyze


I Time-series data
I Cross-sectional data
I Panel data

Scheme for doing empirical work in finance


I RQ → empirical (regression) model
I Estimate model parameters
I Check model assumptions (diagnostics)
I Adjust model/use robust s.e.’s if need be
I Test hypotheses (t and F testing)

Last Words . . .

This course is epitomized in the “Last Words” by Angrist & Pischke (2009, p. 327):

“If applied econometrics were easy, theorists would do it. But it’s
not as hard as the dense pages of Econometrica might lead you to
believe. Carefully applied to coherent causal questions, regression and
2SLS almost always make sense. Your standard errors probably won’t
be quite right, but they rarely are. Avoid embarrassment by being your
own best skeptic, and especially, DON’T PANIC!”

Joshua D. Angrist and Jörn-Steffen Pischke (2009), Mostly Harmless Econometrics, Princeton, NJ: Princeton University Press.

Course Material Week 7

I Slides

Principal components analysis


I Brooks: Appendix 4.2

Background Material Week 7

Principal Components Analysis and Investor Sentiment


I M. Baker & J. Wurgler (2006), Investor Sentiment and the Cross-Section
of Stock Returns, Journal of Finance, 61, pp. 1645–1680
I M. Baker & J. Wurgler (2007), Investor Sentiment in the Stock Market,
Journal of Economic Perspectives, 21, pp. 129–151

Q&A

Any questions on the 7 lectures or course material?
