Data Preparation: Missing Values (Excel)

In this issue, we start with the sampling assumptions of the time series: equal spacing and completeness. Then we consider a time series with missing values and discuss how to represent them in Excel, with the aid of NumXL processing. Finally, we look at unequally spaced time series, how they come into existence, how they are related to the missing values scenario, and what to do with them. For the example spreadsheet and a tutorial video, please visit us at: http://bitly.com/HwJZVT

Uploaded by

NumXL Pro
Copyright
© Attribution Non-Commercial (BY-NC)

#N/A

Missing Values

This issue is the first in a series of articles that explore the data preparation aspect of time series analysis. Data preparation is often overlooked by analysts, but we believe it is a vital phase that wields a vast influence on the overall analysis and modeling process. The vast majority of time series and econometric theories assume input time series to be stationary and homogeneous, with equally spaced observations and values that are present and real. In practice, we often handle samples with missing values, unequally spaced observations, possible outliers, mean/variance dependency, restricted value ranges, and other phenomena. The aim of this series of articles is to address each of these problems and introduce practical methods to overcome them.

In this issue, we start with the sampling assumptions of the time series: equal spacing and completeness. Then we consider a time series with missing values and discuss how to represent them in Excel, with the aid of NumXL processing. Finally, we look at unequally spaced time series, how they come into existence, how they are related to the missing values scenario, and what to do with them.

Sampling
The common (perfect) situation for a time series sample is one that has equally spaced observations and present values for all points. This arises either because observations are made deliberately at even intervals (continuous process), or because the process only generates outputs at such intervals in time (discrete process).

Furthermore, the time unit for a sampling period (i.e. step) between two consecutive observations can be either absolute (e.g. daily, weekly, monthly, or annual), or based on a holiday calendar (i.e. adjusted for weekends and holidays). For example, a daily financial time series of IBM stock closing prices is based on the NYSE holidays calendar, so each observation is taken on a NYSE trading day (open/close).

With respect to time series modeling and forecasting, it is not important whether we use absolute time or if we adjust for weekends and holidays. What is important is how we interpret the out-of-sample dates, as they too are based on the same sampling method.

Next, let's examine some cases where the input time series is not so perfect.

Tips & Hints: Missing Values

Spider Financial Corp, 2012

Issue 1: Missing values


In some situations, one or more observation dates yield invalid or missing values. These values are designated as not-a-number values, or NaN for short. In Excel, NaN is identified by the special #N/A representation. A few built-in functions can be used to produce or detect it (e.g. NA(), ISNA(.), IFERROR(.), etc.) or ignore it (e.g. MIN(.), MAX(.)), while other functions do not support it.

In time series analysis, we often encounter missing values, either in the original raw time series or as a result of a time series operator (e.g. lag, differencing, etc.).

Q: What can we do with a time series with missing values?

NumXL has two simple rules:

1. The missing values at the beginning or the end of the time series are simply ignored. NumXL will truncate the input time series to start from the first non-missing value and end with the last non-missing value.
2. The intermediate missing values are considered serious flaws in the input time series, and NumXL can't process them.

These rules raise the question: how do we handle missing intermediate values?
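The two rules above can be illustrated outside Excel. The following is a minimal Python sketch, not NumXL's implementation: it assumes that `None` and NaN stand in for Excel's #N/A, and the helper names (`is_missing`, `trim_missing_ends`, `has_intermediate_missing`) are hypothetical.

```python
import math

def is_missing(value):
    """Treat None and NaN as missing (an assumed stand-in for Excel's #N/A)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def trim_missing_ends(series):
    """Rule 1: drop leading and trailing missing values, keep the rest."""
    present = [i for i, v in enumerate(series) if not is_missing(v)]
    if not present:
        return []
    return series[present[0]:present[-1] + 1]

def has_intermediate_missing(series):
    """Rule 2: report whether a missing value remains after trimming."""
    return any(is_missing(v) for v in trim_missing_ends(series))

raw = [math.nan, math.nan, 10.0, 11.0, math.nan, 12.5, math.nan]
trimmed = trim_missing_ends(raw)          # starts at 10.0, ends at 12.5
needs_repair = has_intermediate_missing(raw)
print(len(trimmed), needs_repair)
```

In this sketch the trimmed series still contains one intermediate gap, which is the case that has to be repaired before further processing.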

Many techniques have been proposed to handle time series with missing data, but we can summarize these proposals with two principles: ignore and interpolate.

IGNORE

The ignore solution simply drops the missing value from the time series. You can use the NumXL RMNA(.) function for this purpose. However, you should approach this solution cautiously, as it alters the sampling of the time series itself.

INTERPOLATE

The interpolate approach replaces the missing values with interpolated values. There are several interpolation algorithms: linear, polynomial, smoothing, spline, filtering, etc.

Interpolation does not change the frequency of the sampling, but it may affect the perceived dynamics of the underlying process if it is used for several points in the time series.

NumXL comes with an interpolation function, INTERPOLATE, which supports four (4) different interpolation algorithms:

1. Forward flat interpolation
2. Backward flat interpolation
3. Linear interpolation
4. Cubic spline interpolation

NOTE: The INTERPOLATE function discards all points with missing values, so we can use the function directly on the raw data set without any intermediate preparation.
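To make the flat and linear interpolation methods concrete, here is a minimal Python sketch. It is not the NumXL INTERPOLATE function: the `interpolate` helper, its signature, and the method names are assumptions made for illustration, and cubic spline interpolation is left out for brevity. Like the function described above, it discards points with missing values before interpolating.

```python
import math

def interpolate(xs, ys, x_new, method="linear"):
    """Estimate y at x_new from (xs, ys), skipping points with missing y.

    method: "forward"  carries the previous known value forward,
            "backward" uses the next known value,
            "linear"   draws a straight line between the two neighbours.
    xs must be sorted ascending and x_new must lie within the known range.
    """
    known = [(x, y) for x, y in zip(xs, ys)
             if y is not None and not math.isnan(y)]
    for x, y in known:
        if x == x_new:
            return y                      # value is present, nothing to do
    before = [(x, y) for x, y in known if x < x_new]
    after = [(x, y) for x, y in known if x > x_new]
    if not before or not after:
        raise ValueError("x_new is outside the range of known points")
    (x0, y0), (x1, y1) = before[-1], after[0]
    if method == "forward":
        return y0
    if method == "backward":
        return y1
    if method == "linear":
        return y0 + (y1 - y0) * (x_new - x0) / (x1 - x0)
    raise ValueError(f"unknown method: {method}")

days = [1, 2, 3, 4, 5]
prices = [10.0, math.nan, math.nan, 13.0, 14.0]
filled = [interpolate(days, prices, d, "linear") for d in days]
print(filled)
```

With several consecutive gaps, the linear method places the filled values on one straight line, which is the kind of smoothing that can alter the perceived dynamics of the process.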

Issue 2: Unevenly-spaced time series


Unevenly-spaced time series are common in many real-life applications when measurements are constrained by practical conditions. The irregularity of observations can have several fundamental reasons. First, any event-driven collection process (in which observations are collected when some event occurs) is inherently irregular. Second, in such applications as sensor networks or any distributed monitoring infrastructure, data collection is distributed and collection agents can't easily synchronize with one another. In addition, the sampling intervals and policies may be different. Finally, measurements cannot be made regularly or may have to be interrupted due to some events (either foreseen or not).

Note: Unlike the equally-spaced time series case, intermediate observations with missing values can be safely dropped from the original series without any loss of information, and, obviously, the resultant series is unevenly spaced as well.

Many techniques have been proposed to handle time series with missing data, which in the limit can be viewed as irregularly sampled.

In data analysis practice, irregularity is a recognized data characteristic, and practitioners deal with it heuristically.

Solution 1: Convert to equally-spaced time series


1. IGNORE: Ignore the irregularity in the times and treat the data as if it were regular.
2. RESAMPLE: Resample using a lower sampling rate. The reduction simplifies the problem to one that has already been thoroughly analyzed and for which many approaches are available.
   Note: For a price time series, downsampling requires taking the last observation in the new sample period. For the corresponding log returns, the resampled return is the cumulative return over all original sample periods that fall within the new sample period.
3. INTERPOLATE: Interpolate the intermediate missing values and convert the series to one with equally spaced sampling times. While this is a reasonable heuristic for dealing with missing values, the interpolation process typically results in a significant bias (e.g. smoothing of the data) that changes the dynamics of the process, so these models cannot be applied if the data is truly unequally spaced.
4. Kernel Smoothing
5. Brownian Bridging: A number of authors have suggested using continuous-time diffusion processes to find missing values. In principle, to interpolate a missing value, we assume a Brownian motion between the non-missing observations immediately prior to and after the missing value.

Note: As of the date of this issue, NumXL does not support the Brownian bridging interpolation method.
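The resampling note for prices and log returns can be sketched in Python as follows. This is an illustration under stated assumptions, not a NumXL feature: the period labels (`"W1"`, `"W2"`), the sample prices, and the helper names `downsample_last` and `downsample_log_returns` are hypothetical, and observations are assumed to be in time order.

```python
import math

def downsample_last(periods, prices):
    """Keep the last observed price in each coarser sample period."""
    last = {}
    for period, price in zip(periods, prices):
        last[period] = price              # later observations overwrite earlier ones
    return last

def downsample_log_returns(periods, log_returns):
    """Sum the log returns in each period (log returns are additive over time)."""
    total = {}
    for period, r in zip(periods, log_returns):
        total[period] = total.get(period, 0.0) + r
    return total

weeks = ["W1", "W1", "W1", "W2", "W2"]
prices = [100.0, 101.0, 103.0, 102.0, 105.0]
weekly_close = downsample_last(weeks, prices)

# Log returns of the original series; the first observation has no return.
log_ret = [math.log(b / a) for a, b in zip(prices, prices[1:])]
weekly_ret = downsample_log_returns(weeks[1:], log_ret)
print(weekly_close, weekly_ret)
```

The two views agree: the summed log return for the second period equals the log of the ratio of the two period-end prices, which is why the last observation is the natural choice when downsampling a price series.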

Solution 2: Use unequally-spaced time series models


These models are slightly more complex than their equally-spaced counterparts, and many can be viewed as an extension of the equally-spaced time series models.

Suppose $Y(t)$ is a time series with irregular sampling. We can decompose it into:

$$Y(t) = a(t) + X(t)$$

where $a(t)$ is a slowly changing deterministic function (trend component) and $X(t)$ is a random noise component.

In general, one can only observe $Y(t)$. Our first goal is to estimate the deterministic component and extract the random noise $X(t) = Y(t) - a(t)$; our second goal is to find a satisfactory probabilistic model for the process $X(t)$.

Note: As of the date of this issue, NumXL does not support unevenly-spaced time series models.
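As a rough illustration of the first goal, the Python sketch below splits an irregularly sampled series into a trend and a noise part, so that each observation is the sum of the two. The article does not prescribe a form for the trend; a straight line fitted by ordinary least squares is assumed here purely for illustration, and the function names and sample data are hypothetical.

```python
def fit_linear_trend(times, values):
    """Ordinary least squares fit of an assumed trend a(t) = intercept + slope * t."""
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

def detrend(times, values):
    """Return the fitted trend a(t) and the residual noise X(t) = Y(t) - a(t)."""
    intercept, slope = fit_linear_trend(times, values)
    trend = [intercept + slope * t for t in times]
    noise = [y - a for y, a in zip(values, trend)]
    return trend, noise

# Observation times are deliberately unevenly spaced.
times = [0.0, 0.7, 1.9, 2.2, 4.5, 5.1]
values = [1.0, 1.6, 2.7, 3.4, 5.2, 6.3]
trend, noise = detrend(times, values)
print(noise)
```

The least squares fit uses the actual observation times, so no resampling or interpolation is needed. Choosing a probabilistic model for the extracted noise, the second goal, is outside the scope of this sketch.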
