Tutorial:
Empirical Distribution Function (EDF)
This is the second entry in our ongoing series about the empirical (sample) distribution. In this tutorial, we will start with the general definition, motivation, and applications of the EDF, and then use NumXL to carry out our EDF analysis.

In an earlier entry, we discussed the histogram as a nonparametric method for inferring the probability distribution of a random variable. In this tutorial, we go over the empirical distribution function and estimate its values at the different points in the sample.

For sample data, we generated a data set of 29 random values from the Gaussian distribution.
Background
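The empirical distribution function (EDF), or empirical CDF, is a step function that jumps by 1/N at the occurrence of each observation:

$$\mathrm{EDF}(x) = \frac{1}{N} \sum_{i=1}^{N} I\{x_i \le x\}$$

Where $I\{A\}$ is the indicator function of the event $A$:

$$I\{x_i \le x\} = \begin{cases} 1 & x_i \le x \\ 0 & x_i > x \end{cases}$$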
By definition, the EDF computes the cumulative distribution of the observed sample, which serves as an estimate of the cumulative distribution of the underlying random variable.
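The definition above can be sketched in a few lines of Python. This is only an illustration outside of NumXL: the function name `edf` and the sample values are hypothetical and are not part of the tutorial's data set.

```python
from bisect import bisect_right

def edf(sample):
    """Return a function F(x) = (number of observations <= x) / N."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must contain at least one observation")

    def F(x):
        # bisect_right counts how many sorted observations are <= x
        return bisect_right(xs, x) / n

    return F

sample = [2.1, -0.4, 0.7, 0.7, 1.5]   # hypothetical data
F = edf(sample)
print(F(-1.0))  # 0.0 (no observation is <= -1.0)
print(F(0.7))   # 0.6 (three of five observations are <= 0.7)
print(F(3.0))   # 1.0 (all observations are <= 3.0)
```

Note that a repeated value (0.7 here) makes the function jump by 2/N at that point, one step of 1/N per observation.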
Why do we care?
The EDF estimates the true underlying cumulative distribution function (CDF) at the points in the sample. For independent, identically distributed observations, it converges to the true distribution as the sample size grows (the Glivenko-Cantelli theorem).
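To get a feel for this convergence, the following sketch measures the largest vertical gap between the EDF and the true CDF for simulated standard normal samples of increasing size. It is an illustration only: the helper names, the seed, and the sample sizes (other than 29, which matches the tutorial's data set size) are assumptions.

```python
import math
import random

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def max_gap(sample):
    """Largest vertical distance between the EDF and the N(0, 1) CDF."""
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        cdf = normal_cdf(x)
        # The EDF jumps from i/n to (i + 1)/n at x; check both sides of the jump
        gap = max(gap, abs((i + 1) / n - cdf), abs(i / n - cdf))
    return gap

rng = random.Random(0)  # fixed seed so the run is repeatable
gaps = {}
for n in (29, 1000, 20000):
    gaps[n] = max_gap([rng.gauss(0.0, 1.0) for _ in range(n)])
    print(n, round(gaps[n], 4))
```

The gap typically shrinks roughly in proportion to 1/sqrt(N), so a sample of 29 points can still sit noticeably away from the true curve.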
Process
First, let's organize our input data. We can start by placing the values of the sample data in a separate column. The sample may contain one or more missing values.
Empirical Distribution Function (EDF) Tutorial
Spider Financial Corp, 2013
Now we are ready to construct our EDF plot. First, select the empty cell in your worksheet where you wish the output table to be generated, then locate and click on the Descriptive Statistics icon in the NumXL tab (or toolbar). Then, select the Empirical Distribution Function item from the drop-down menu.
The EDF Wizard pops up.
Select the cell range for the values of the input variable.

Notes:
1. The cell range may include the (optional) heading (Label) cell, which would be used in the output tables where it references those variables.
2. By default, the output table cell range is set to the currently selected cell in your worksheet.
3. By default, the output graph cell range is set to the 7 cells to the right of the currently selected cell in your worksheet.

Finally, once we select the input data (X) cell range, the Options and Missing Values tabs become available (enabled).

Next, select the Options tab.
Initially, the tab is set to the following values: Overlay Normal distribution is checked. This option in effect instructs the wizard to generate a second curve for the Gaussian distribution for comparison purposes. Leave this option checked.
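For readers who want to reproduce a comparable overlay outside Excel, here is a minimal Python sketch. It assumes the comparison curve is a normal CDF parameterized by the sample mean and sample standard deviation; the tutorial does not state how NumXL parameterizes its overlay, so treat that choice, the helper name, and the data as assumptions.

```python
import math
import statistics

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and std. deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

sample = [0.3, -1.2, 0.8, 1.9, -0.5, 0.1]   # hypothetical data, no ties
mu = statistics.mean(sample)
sigma = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

xs = sorted(sample)
n = len(xs)
# With no repeated values, the EDF at the i-th smallest point is (i + 1) / n
rows = [(x, (i + 1) / n, normal_cdf(x, mu, sigma)) for i, x in enumerate(xs)]
for x, edf_value, cdf_value in rows:
    print(f"{x:6.2f}  EDF={edf_value:.3f}  Normal CDF={cdf_value:.3f}")
```

Placing the two columns side by side makes it easy to see where the sample departs from the fitted Gaussian curve.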
Now, click on the Missing Values tab.
In this tab, you can select an approach to handle missing values in the data set (X's). By default, any observation with a missing value is excluded from the analysis.

This treatment is a good approach for our analysis, so let's leave it unchanged.

Now, click OK to generate the output tables.
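The default treatment described above (dropping observations with missing values) can be mimicked in Python as follows. This is a sketch under the assumption that missing entries are represented as `None` or NaN; the function name and data are hypothetical, and NumXL's own handling of missing cells may differ in detail.

```python
import math

def drop_missing(values):
    """Drop entries that are None or NaN; keep everything else in order."""
    kept = []
    for v in values:
        if v is None:
            continue
        if isinstance(v, float) and math.isnan(v):
            continue
        kept.append(v)
    return kept

raw = [1.2, None, -0.7, float("nan"), 0.4]   # hypothetical data with gaps
clean = drop_missing(raw)
print(clean)        # [1.2, -0.7, 0.4]
print(len(clean))   # 3, so each EDF step is 1/3 rather than 1/5
```

Excluding the missing entries changes N, and therefore the size of each EDF step, so the filtering has to happen before the EDF is computed.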
Notes:
1. The values of all observations are sorted in ascending order and placed in column E.
2. The XBar and YBar columns carry no special statistical meaning; they are merely computed to assist us in generating a stepwise type of graph in Excel.
3. Finally, the equivalent cumulative distribution function (CDF) of the normal distribution is computed in column I.

The generated plot of the EDF is shown below:
Conclusion
In this tutorial, we demonstrated the process of generating an empirical distribution function in Excel using NumXL's add-in functions.
Where do we go from here?
To obtain the probability density function (PDF), one needs to take the derivative of the CDF, but the EDF is a step function and differentiation is a noise-amplifying operation. As a result, the resulting PDF is very jagged and needs considerable smoothing for many areas of application.

In our next entry, we will look at the kernel density estimation method to obtain the probability density function of the underlying random process.
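The jaggedness mentioned above is easy to see with a naive finite-difference derivative of the EDF on an evenly spaced grid. The sketch below is illustrative only: the function names, the grid limits, and the data are assumptions, and the finite difference is essentially a histogram scaled to unit area.

```python
from bisect import bisect_right

def edf_at(xs_sorted, x):
    """EDF value at x for an already sorted sample."""
    return bisect_right(xs_sorted, x) / len(xs_sorted)

def naive_density(sample, lo, hi, bins):
    """Finite-difference slope of the EDF on an evenly spaced grid."""
    xs = sorted(sample)
    h = (hi - lo) / bins
    edges = [lo + k * h for k in range(bins + 1)]
    return [(edf_at(xs, edges[k + 1]) - edf_at(xs, edges[k])) / h
            for k in range(bins)]

sample = [-1.3, -0.6, -0.5, 0.0, 0.2, 0.3, 0.9, 1.7]   # hypothetical data
for bins in (4, 16):
    dens = naive_density(sample, -2.0, 2.0, bins)
    print(bins, [round(d, 2) for d in dens])
```

With 4 wide intervals the estimate looks reasonably smooth, while with 16 narrow intervals it breaks up into isolated spikes separated by zeros, which is the behavior that motivates smoothing.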