CH - 4 - Application To Time Series and Panel Data in Stata

Mengistu Yismaw (MSc.)
Department of Economics, Debre Markos University (Burie Campus)
Email: [email protected]

Chapter Outline
- Introduction to time series regression model
  - What is time series data
  - Dealing with time series data: declaring the data as time series, sorting
  - Some diagnostic tests
    - Autocorrelation: Durbin-Watson test, Breusch-Godfrey test
    - Dealing with autocorrelation
    - Stationarity test
      - Informal tests: graphical methods of detecting non-stationarity
      - Formal tests: unit root test
  - Estimation of stationary data: OLS
  - Dealing with non-stationary data
    - Differencing
    - Detrending
  - Estimation of non-stationary data: cointegration and ECM
- Introduction to panel regression model
  - What is panel data
  - Specification and estimation of panel regression model
  - Fixed vs random effect model: Hausman test

4.1. Introduction to Time Series Regression Model

What is Time Series Data
- Time-series data is a sequence of data points collected over time.
- It allows us to track changes over time.
- Time-series data can track changes over years, quarters, months, weeks, days, seconds, ...
- A collection of random variables ordered in time is known as a stochastic process.

Dealing with Time Series Data
In estimating with time series data, the first thing we need to do is declare the data as time series.
To declare the data as yearly data:
i. Using the menu bar
   Click Statistics > Time series > Setup and utilities > Declare dataset to be time-series data.
   Stata then opens the corresponding dialog box.
ii. Using the command bar
   Syntax: tsset timevar, yearly
   Example: tsset Year, yearly
Note: you can declare time series data at any frequency you want (the time variable must then hold dates at that frequency). For example,
- Quarterly: tsset Year, quarterly
- Monthly: tsset Year, monthly
- Daily: tsset Year, daily

Some Diagnostic Tests
- Time series data have certain characteristics that cross-sectional data do not. These require special attention when applying OLS to time series data.
- Before running diagnostic tests, we have to sort the data by time.
   Syntax: sort timevar
   Example: sort Year

1. Autocorrelation Test
- The autocorrelation (serial correlation) problem arises when the error terms in a regression model are correlated over time, i.e. depend on each other.
- That is, if there is an autocorrelation problem, $\operatorname{cov}(u_i, u_j \mid X_i, X_j) = E(u_i u_j) \neq 0$ for $i \neq j$.

I. Informal tests
- They help to visualize whether there is an autocorrelation problem.

A. Draw a time series graph of the residuals
   Follow these steps:
   i. Run the regression model
   ii. Predict the error term (uhat)
   iii. Draw a time series line for the residuals
   Example:
      reg NPL Loans
      predict uhat, residual
      tsline uhat
   You can also draw the graph with a reference line at zero:
      tsline uhat if e(sample)==1, yline(0)

B. Correlogram and autocorrelation function (ACF)
   Syntax: corrgram var
   Example: corrgram uhat
   - Plot the autocorrelation function.
   - The ACF relates the correlation coefficient within a given variable over time to its lag length,
     i.e. it shows $\rho_k = \operatorname{cov}(Y_t, Y_{t-k}) / \operatorname{var}(Y_t)$.
   Syntax: ac var
   Example: ac uhat

C. Run an auxiliary regression
   Syntax: reg uhat l.uhat
   - If the coefficient is high and statistically significant, this indicates evidence of an autocorrelation problem.
   - In our case the coefficient is high enough (0.369) and statistically significant at 5% (p-value = 0.002).
   - This implies there is an autocorrelation problem in our model.

II. Formal tests
- They provide statistical evidence about the presence of an autocorrelation problem.

A. Durbin-Watson test
   Follow these steps:
   1. Run the regression model
   2. Run the Durbin-Watson test
   Syntax: estat dwatson
   Example:
      reg NPL Loans
      estat dwatson
   - Or you can use the older 'dwstat' command instead of 'estat dwatson'.
   Output: Durbin-Watson d-statistic(2, 66) ≈ 1.2088
   Decision
   - If the model has no autocorrelation problem, the d-statistic will be 2 (very close to 2).
   - If the model has a serious positive autocorrelation problem, the d-statistic will be close to 0 (a value close to 4 indicates serious negative autocorrelation).
   - For the above example, since the d-statistic (about 1.21) is not very close to 2, our model has some autocorrelation problem.

B. Breusch-Godfrey test
   Syntax: estat bgodfrey (run after the regression; 'bgodfrey' in older Stata versions)
   H0: no serial correlation
   Decision: if the p-value is less than the intended level of significance, we reject H0.
   - That is, there is an autocorrelation problem.
   - For the above example, since the p-value (0.0022) is lower than the intended level of significance (10%), we reject H0.
   - There is a serial correlation (autocorrelation) problem.

Dealing with autocorrelation
- Use the Cochrane-Orcutt regression.
- The 'prais' command with the 'corc' option performs the Cochrane-Orcutt transformation.
   Example: prais NPL Loans, corc
Note: the Durbin-Watson and Breusch-Godfrey tests are not appropriate after the corc regression,
- because the problem has already been corrected (these test commands only work after regress).
- Instead, look at the transformed d-statistic reported by prais.

2. Stationarity (unit root) Test
- Non-stationary time series data have a time-varying mean, a time-varying variance, or both,
- which makes forecasting or prediction difficult.

I. Informal tests
- They help to visualize the data: if there is some sort of negative or positive trend in the data, it is an indication of non-stationarity.

A. Autocorrelation graph
   Syntax: ac var
   Example:
      ac NPL
      ac Loans

B. Draw a time series line graph
   Syntax: tsline var
   - With a zero reference line (e(sample) restricts the plot to the observations used in the last estimation):
      tsline var if e(sample)==1, yline(0)
   Example: tsline NPL if e(sample)==1, yline(0)
   - You can draw a time series line graph for multiple variables at a time.
   Example: tsline NPL Deposit Loans if e(sample)==1, yline(0)
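The autocorrelation checks and the graphical checks described above can be tried end to end on simulated data. The do-file below is only a sketch: it does not use the chapter's NPL/Loans data, all variable names (t, x, e, y, uhat) are hypothetical, and the errors are built as an AR(1) process so that autocorrelation is present by construction.

```stata
* Sketch on simulated data; every variable name here is hypothetical.
clear
set seed 12345
set obs 66
gen t = 1955 + _n                 // hypothetical yearly index, 1956 to 2021
tsset t, yearly                   // declare the data as a yearly time series

gen x = rnormal()
gen e = rnormal()
* replace works observation by observation, so e[_n-1] is the already-updated value
replace e = 0.6*e[_n-1] + rnormal() in 2/l
gen y = 1 + 0.5*x + e

regress y x
predict uhat, residuals
tsline uhat, yline(0)             // informal check: residual plot around zero
ac uhat                           // informal check: autocorrelation function
regress uhat L.uhat               // informal check: auxiliary regression

regress y x                       // re-run so the tests refer to the main model
estat dwatson                     // Durbin-Watson d-statistic
estat bgodfrey                    // Breusch-Godfrey LM test
prais y x, corc                   // Cochrane-Orcutt correction
```

Because the data are random, the exact statistics depend on the seed; the point is only the order of the commands.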
II. Formal test
Dickey-Fuller (DF) test
- Suppose the following random walk model:
  $Y_t = \rho Y_{t-1} + u_t$
  H0: unit root ($\rho = 1$), non-stationary
  H1: no unit root ($|\rho| < 1$), stationary
- The general essence behind the unit root test of stationarity is, therefore, to find out whether the estimated $\rho$ is statistically equal to one.
   Syntax: dfuller var
   Examples:
      dfuller NPL
      dfuller Loans
Decision: if the absolute value of the test statistic is greater than the absolute value of the critical value at the intended significance level (usually 5%), we reject H0.
- This implies that the data (variable) is stationary (no unit root).
- In other words, if the MacKinnon p-value is greater than the intended significance level (usually 5%), we fail to reject H0.
- Both cases show that NPL and Loans are not stationary at level.

Estimation of stationary data: OLS
- If our data were stationary, the process could finish here,
- and we could carry out estimation, prediction, ... using OLS without any problem,
- i.e. we could run the regression 'reg NPL Loans' and interpret the result.
- However, our data is not stationary, so we have to take some remedial measures before we run the regression.
- Otherwise our result will be spurious!

Dealing with non-stationary data
- Although our interest is in stationary time series, one often encounters nonstationary time series, the classic example being the random walk model (RWM).
- The following are the most common techniques used to transform nonstationary time series.

A. Detrending: Trend Stationary Processes (TSP)
- If the trend in a time series is completely predictable and not variable, we call it a deterministic trend.
- It can be treated by detrending.
- This procedure of removing the (deterministic) trend is called detrending.
i. Subtract the mean of Y from Y_t; the resulting series will be stationary, hence the name trend stationary.
   (Strictly, the mean to subtract is the time-varying mean, i.e. the fitted trend; subtracting a constant sample mean only re-centres the series.)
   Example: for the variable NPL
      egen NPLbar = mean(NPL)
      gen dNPL = (NPL - NPLbar)
   And for the variable Loans
      egen Loansbar = mean(Loans)
      gen dloans = (Loans - Loansbar)
Why?
- Then check the stationarity of the new variables dNPL and dloans:
      dfuller dNPL
      dfuller dloans
- The DF results show that neither variable becomes stationary by detrending.
ii. Regress the variable on time; the residuals from this regression will then be stationary.
   Example: for the variable NPL
      reg NPL Year
      predict ui, residual
   And for the variable Loans
      reg Loans Year
      predict ui2, residual
- Then check the stationarity of the new error terms (ui and ui2).
- If the error terms are stationary, use ui and ui2 in the regression instead of NPL and Loans.
   Syntax: reg ui ui2
- The estimates of the above model will not be spurious!
- However, the DF results for both error terms show that they do not become stationary either. Why?
- This is because the two variables are not TSPs but DSPs.
- Let us check whether they are DSPs!

B. Differencing: Difference Stationary Processes (DSP)
- If the trend is not predictable, we call it a stochastic trend.
- It can be transformed to stationary by taking the nth difference of the variable.
- This procedure of transforming non-stationary data to stationary by taking the nth difference is called differencing.
   Syntaxes:
   To generate the 1st difference of the variable (e.g. NPL)
      gen d1NPL = d.NPL
   To generate the 1st difference of the variable (e.g. Loans)
      gen d1loans = d.Loans
- Then check the stationarity of the new variables (d1NPL and d1loans).
Note: you lose one observation with each differencing. Hence, take care with the sample size and degrees of freedom (df).
- The DF results show that both variables are stationary at first difference.
- This implies that both variables are DSPs.
- If the first difference of the variable is not stationary, take the second difference, and so on.
   To generate the 2nd difference of the variable (e.g. NPL)
      gen d2NPL = d.d1NPL
Note: when you take more differences, you lose more observations. Hence, take care with the sample size and df.

Integrated Stochastic Processes
- A time series that can be made stationary by differencing is called an integrated stochastic process.
- Recall that the RWM without drift is nonstationary but its first difference is stationary.
- Thus we call the RWM without drift integrated of order 1, denoted $y_t \sim I(1)$.
- Similarly, if a time series has to be differenced twice to make it stationary, it is called integrated of order 2, denoted $y_t \sim I(2)$.
- In general, if a nonstationary time series has to be differenced d times to make it stationary, it is said to be integrated of order d, $y_t \sim I(d)$.
- If a time series $y_t$ is stationary from the start, it is called integrated of order 0, $y_t \sim I(0)$.
Note: we use the terms 'stationary time series' and 'time series integrated of order zero' to mean the same thing.

Properties of integrated series
Let $x_t$, $y_t$ and $z_t$ be three time series:
- If $x_t \sim I(0)$ and $y_t \sim I(1)$, then $z_t = (x_t + y_t)$ is $I(1)$.
  The sum of a stationary and a nonstationary time series is nonstationary.
- If $x_t \sim I(d)$, then $y_t = (a + b x_t) \sim I(d)$, where a and b are constants ($b \neq 0$).
  A linear transformation of an I(d) series is also I(d).
- If $x_t \sim I(d_1)$ and $y_t \sim I(d_2)$, then $z_t = (a x_t + b y_t) \sim I(d_1)$, where $d_1 > d_2$.
- If $x_t \sim I(d)$ and $y_t \sim I(d)$, then $z_t = (a x_t + b y_t) \sim I(d^*)$, where generally $d^* = d$, but sometimes $d^* < d$.
Example: if $x_t \sim I(1)$ and $y_t \sim I(1)$, it can be that $z_t = (a x_t + b y_t) \sim I(0)$.
- The possibility of finding stationary linear combinations of nonstationary time series is known as cointegration!

Cointegration
- We have seen that the regression of a nonstationary time series on another nonstationary time series may produce a spurious regression.
- However, this may not always happen.
- If the variables are integrated of the same order, the regression of one unit root time series on another unit root time series may sometimes give a non-spurious regression.
- In that case the variables are co-integrated, i.e. the error term is stationary.
- This makes estimation of the model using OLS possible!
- We have seen that NPL and Loans are stationary at 1st difference.
- This means that they are integrated of order 1, $y_t \sim I(1)$.
- There may be a possibility of cointegration between the two variables.
- If so, the result of the OLS regression 'reg NPL Loans' will not be spurious, even though the two variables are not stationary at level [not integrated of order 0, $y_t \sim I(0)$].
- Therefore, let us conduct the cointegration test.

The existence of cointegration among sets of nonstationary time series has three important implications:
1. The existence of dynamic long-run equilibria (co-movement relationship).
2. The long-run parameter estimates converge to their population values.
3. It allows for the specification of both long-run and short-run dynamics.

The Choice of Appropriate Lag Length
- Before conducting the cointegration test, we need to know that the variables are integrated of the same order.
- We also have to determine the appropriate lag length.
- Follow the following two-step commands:
   Syntaxes:
      varbasic vars
      varsoc
   Example:
      varbasic NPL Loans
      varsoc
Decision: the stars on each information criterion (IC) indicate the appropriate lag length.
- For the above example, the appropriate lag length is 1.

Testing for Cointegration
- A number of methods for testing cointegration have been proposed in the literature.
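Before any cointegration test is run, the steps covered so far (unit root test at level, first differencing, lag-length selection) can be rehearsed on simulated data. The do-file below is a sketch under stated assumptions: x and y are two hypothetical, independently generated random walks, so each is I(1) by construction and any apparent relationship between them in levels is spurious.

```stata
* Sketch on simulated data; t, x and y are hypothetical names.
clear
set seed 2021
set obs 200
gen t = _n
tsset t

gen x = sum(rnormal())     // running sum of white noise: a random walk
gen y = sum(rnormal())     // a second, independent random walk

dfuller x                  // unit root test at level
dfuller y
dfuller D.x                // unit root test at first difference
dfuller D.y

regress y x                // levels regression of two unrelated I(1) series
varsoc y x                 // lag-order selection; stars mark the chosen lag
```

With series built this way, the level tests would typically fail to reject the unit root while the first-difference tests reject it, which is the I(1) pattern described above; exact values depend on the seed.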
- Here we apply a comparatively simple one, the Johansen co-integration test.

Johansen co-integration test
- Since the appropriate lag length is 1, let us conduct the test at this lag.
   H0: no co-integration
   H1: there is co-integration
   Example: vecrank NPL Loans, lags(1)
Decision: if the trace statistic is greater than the critical value, there is co-integration (reject H0).
- Hence, there is co-integration between NPL and Loans!
- The result of the Johansen co-integration test shows a co-integration relationship.
- This implies that the result of the OLS regression will not be spurious!
- So we can estimate the model by OLS; the result shows the long-run relationship between the variables.
      reg NPL Loans

Error Correction Model (ECM)
- We just showed that NPL and Loans are cointegrated.
- There is a long-term, or equilibrium, relationship between the two.
- However, there may be disequilibrium in the short run.
- The short-run (SR) relationship is estimated using the ECM.
- The ECM for the above model is specified as
  $\Delta \mathrm{NPL}_t = \beta_0 + \gamma\,\Delta \mathrm{Loans}_t + \theta\,\mathrm{residual}_{t-1} + \varepsilon_t$
Why?
Steps:
   Step 1: Estimate the model in levels and obtain the residual.
   Step 2: Estimate the model using the first difference of the variables and the first lag of the residual.
   Syntax: reg d.NPL d.Loans l.ui   (ui is the residual saved in Step 1)
- The coefficient of the lagged residual is significant, and it takes more than one period (1/0.556 ≈ 1.8, i.e. about 2 years) to correct the deviation from the long-run (LR) equilibrium.

4.2. Introduction to Panel Regression Model

What is Panel Data
- So far, we have seen regression analysis using cross-sectional and time series data separately.
- However, these two types of data can come together.
- Such data are called panel or pooled data (pooling of time series and cross-sectional observations).
- Panel data are also known as longitudinal or cross-sectional time-series data.
- A panel data set is one where there are repeated observations on a set of entities that remain the same through time.
Note: panel (longitudinal) data and pooled data are not exactly the same thing.

Types of Panel Data
- Short panel: many individuals and few time periods
- Long panel: many time periods and few individuals
- Both: many time periods and many individuals
- Balanced panel: all individuals are observed in all time periods
- Unbalanced panel: some individuals are not observed in all time periods

Regressors
- Varying regressors $x_{it}$
  e.g. annual income for a person, annual consumption of a product
- Time-invariant regressors $x_{it} = x_i$ for all t
  e.g. gender, race, education
- Individual-invariant regressors $x_{it} = x_t$ for all i
  e.g. time trend, economy-wide trends such as the unemployment rate

Variation for the dependent variable and regressors
- Between variation: variation between individuals
- Within variation: variation within individuals (over time)
- Overall variation: variation over time and individuals

Panel data models have the following general form:
  $y_{it} = \alpha_0 + \alpha_1 x_{it} + u_{it}$
Where:
- i stands for the ith cross-sectional unit and t for the time period,
- x is the regressor,
- y is the usual dependent variable,
- $u_{it}$ is the error term, assumed to follow the classical assumptions of zero mean and constant variance.
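The between, within and overall variation listed earlier on this slide sequence can be tied together in one identity. The block below is a sketch in notation assumed here rather than taken from the slides: $\bar{x}_i$ is the mean of individual i over time, $\bar{x}$ is the grand mean, and the panel is balanced with N individuals and T periods.

```latex
% Each overall deviation splits into a within part and a between part
x_{it} - \bar{x} \;=\; \underbrace{(x_{it} - \bar{x}_i)}_{\text{within}} \;+\; \underbrace{(\bar{x}_i - \bar{x})}_{\text{between}}

% In a balanced panel the cross term sums to zero, so the sums of squares add up
\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x})^2
  \;=\; \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)^2
  \;+\; T \sum_{i=1}^{N} (\bar{x}_i - \bar{x})^2
```

A time-invariant regressor has $x_{it} = \bar{x}_i$, so its within part is zero; an individual-invariant regressor has $\bar{x}_i = \bar{x}$ for every i, so its between part is zero.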
For example, suppose you want to estimate the effect of capital stock on RGDP in Ethiopia and Kenya between 1972 and 2021.
Then the model becomes
  $\mathrm{RGDP}_{it} = \alpha_0 + \alpha_1 \mathrm{CapStock}_{it} + u_{it}$
where i stands for either Ethiopia or Kenya and t for the time period.

Declaring the data as panel data
- As with time series data, the first thing we need to do is declare the data as panel data before conducting any estimation.
- To declare the data as panel data, use the following commands:
  - To store the name of the cross-sectional identifier (e.g. a country code variable; here ID) in a global macro
       Syntax: global id ID
  - To store the name of the time variable (here Year) in a global macro
       Syntax: global year Year
  - Finally, declare the data as panel
       Syntax: xtset ID Year   (or, using the macros, xtset $id $year)
- The output shows that our data is a balanced panel,
- which is the same as saying you have no missing observations.

Suppose the following simple hypothetical data (three individuals, three observations each; values recovered from the calculations below):

  Individual   Obs. 1   Obs. 2   Obs. 3   Individual mean
  1               9       10       11          10
  2              20       20       20          20
  3              25       30       35          30
  Overall mean: 20

Overall variance:
  $s_O^2 = \frac{1}{NT-1}\sum_i\sum_t (x_{it}-\bar{x})^2 = \frac{1}{8}\left[(9-20)^2+(10-20)^2+(11-20)^2+3(20-20)^2+(25-20)^2+(30-20)^2+(35-20)^2\right] = \frac{652}{8} = 81.5$, so $s_O = \sqrt{81.5} = 9.0277$
Within variance:
  $s_W^2 = \frac{1}{NT-1}\sum_i\sum_t (x_{it}-\bar{x}_i)^2 = \frac{1}{8}\left[(9-10)^2+(10-10)^2+(11-10)^2+3(20-20)^2+(25-30)^2+(30-30)^2+(35-30)^2\right] = \frac{52}{8} = 6.5$, so $s_W = \sqrt{6.5} = 2.5495$
Between variance:
  $s_B^2 = \frac{1}{N-1}\sum_i (\bar{x}_i-\bar{x})^2 = \frac{1}{2}\left[(10-20)^2+(20-20)^2+(30-20)^2\right] = 100$, so $s_B = \sqrt{100} = 10$
   Syntax: xtsum id year Cons

Let us go back to the data we are working on and take a look.
   Syntax: xtsum Year ID RGDP Capitalstock
- RGDP has more within variation ($728140) than between variation ($51230.31).
- The average RGDP of an individual ranges between $242726.3 and $315176.9 across individuals.
Note: time-invariant variables like the individual ID have zero within variation.
- Also, individual-invariant regressors such as time (Year) have zero between variation. Why?
- ID has zero within variation because it doesn't change (vary) with time.
- Year has zero between variation because it doesn't change (vary) between the two groups (Ethiopia and Kenya).

Estimation of panel data regression model
- Estimation of a panel data model depends on the assumptions we make about the intercept, the slope coefficients, and the error term $u_{it}$.
- Based on these assumptions, there are three types of models:
  - The pooled model (pooled OLS)
  - The fixed effects model
  - The random effects model

Pooled OLS
Assumption: all coefficients are constant across time and individuals.
- It is the simplest, and possibly naive, approach, as it disregards the space and time dimensions of the pooled data and just estimates the usual OLS regression.
- The problem with this model is that it doesn't distinguish between cross-sections (countries).
- By combining (pooling) these countries, we deny the heterogeneity or individuality that may exist between countries.
- This is the most restrictive panel data model and is not used much in the literature.

Choosing between fixed and random effects
- To identify which model to use, we can conduct the Hausman test.
- Run the following commands in sequence:
      quietly xtreg RGDP Capitalstock, fe
      estimates store fixed
      quietly xtreg RGDP Capitalstock, re
      estimates store random
      hausman fixed random
   Hausman test hypotheses
   H0: the random effect model is consistent
   H1: the fixed effect model is consistent
Decision: if the p-value is significant (p-value < the chosen significance level, usually 10%), reject H0.
- This implies that it is better to use the fixed effect model.
- For the above example, the p-value (0.0044) is less than the intended level of significance, so we have to reject H0.
- This implies that the fixed effect model is consistent for our data.

Fixed Effect Model
- Use the following command to run the fixed effect model:
   Syntax: xtreg RGDP Capitalstock, fe
- corr(u_i, Xb) = -0.0689 shows the correlation of the errors u_i with the regressors.
- The value of rho shows that 2.6% of the variance (variation in RGDP) is due to differences across panels,
  where $\rho = \sigma_u^2 / (\sigma_u^2 + \sigma_e^2)$ and
  - sigma_u = sd of residuals within groups, u_i
  - sigma_e = sd of residuals (overall error term), e_it

Random Effect Model
- Use the following command to run the random effect model:
   Syntax: xtreg RGDP Capitalstock, re
- corr(u_i, X) = 0 (assumed) shows the correlation of the errors u_i with the regressors,
- because it is the RE model.
- The value of sigma_u (the residual within the group) is … because the RE model considers variation between the groups.
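The pooled, fixed-effects and random-effects steps above can be tried on simulated data. The do-file below is a minimal sketch, not the chapter's Ethiopia/Kenya data: it assumes a hypothetical balanced panel of 20 units over 10 years, and all names (id, year, a, x, y) are made up. The regressor is deliberately correlated with the unit effect, which is the situation in which the fixed-effects model is the safer choice.

```stata
* Sketch on simulated data; every variable name here is hypothetical.
clear
set seed 42
set obs 20                          // 20 cross-sectional units
gen id = _n
gen a = rnormal()                   // unobserved unit-specific effect
expand 10                           // 10 time periods per unit
bysort id: gen year = 2000 + _n
xtset id year                       // declare the panel structure

gen x = 0.5*a + rnormal()           // regressor correlated with the unit effect
gen y = 1 + 2*x + a + rnormal()

xtsum y x                           // overall, between and within variation
regress y x                         // pooled OLS

quietly xtreg y x, fe
estimates store fixed
quietly xtreg y x, re
estimates store random
hausman fixed random                // H0: the random-effects model is consistent

xtreg y x, fe                       // inspect sigma_u, sigma_e and rho
```

With only two countries, as in the chapter's example, the between dimension is very small, so a simulated panel with more units makes the comparison between the estimators easier to see.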
