
Stepwise Regression Tutorial With NumXL

This is the second entry in our regression analysis and modeling series. In this tutorial, we continue the analysis discussion we started earlier and leverage an advanced technique, stepwise regression, to help us find an optimal set of explanatory variables for the model. For more information and/or to download the spreadsheet file, see http://bitly.com/133pCQn

Uploaded by NumXL Pro
Copyright: © Attribution Non-Commercial (BY-NC)

Tutorial: Regression 102
This is the second entry in our regression analysis and modeling series. In this tutorial, we continue the analysis discussion we started earlier and leverage an advanced technique, stepwise regression, to help us find an optimal set of explanatory variables for the model.

Again, we will use a sample data set gathered from 20 different salespersons. The regression model attempts to explain and predict weekly sales for each salesperson (dependent variable) using two explanatory variables: intelligence (IQ) and extroversion.

Data Preparation
Similar to what we did in an earlier tutorial, we organize our sample data by placing the values of each variable in a separate column and each observation in a separate row.

Next, we introduce the mask. The mask is a Boolean array (0, 1) that chooses which variable is included in (or excluded from) the analysis. Initially, at the top of the table, let's insert the mask cells array, each with a value of 1 (i.e. included). The array is shown highlighted below.

In this example, we have 20 observations and two independent (explanatory) variables. The response or dependent variable is the weekly sales.
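The mask idea is easy to mimic outside the spreadsheet. The Python sketch below uses NumPy; the function name `apply_mask` and the sample numbers are hypothetical and not part of NumXL. It simply keeps the columns of X whose mask entry is 1, which is all the mask does to the set of explanatory variables.

```python
import numpy as np

def apply_mask(X, mask):
    """Keep only the columns of X whose mask entry is 1 (included)."""
    X = np.asarray(X, dtype=float)
    keep = np.asarray(mask, dtype=bool)
    if keep.shape != (X.shape[1],):
        raise ValueError("the mask needs exactly one entry per column of X")
    return X[:, keep]

# Hypothetical data: one row per observation, one column per variable.
X = np.array([[100.0, 5.0],
              [110.0, 7.0],
              [95.0, 3.0]])          # columns: IQ, Extroversion (made-up values)
print(apply_mask(X, [1, 1]).shape)  # (3, 2): both variables included
print(apply_mask(X, [0, 1]).shape)  # (3, 1): IQ excluded
```

Because the data columns never move, switching a variable in or out only means changing one entry of the mask.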

Process
Now, we are ready to conduct our regression analysis. First, select an empty cell in your worksheet where you wish the output to be generated, then locate and click on the regression icon in the NumXL tab (or toolbar).

Regression 102 Tutorial
Spider Financial Corp, 2013

The Regression wizard appears.

Select the cells range for the response/dependent variable values (i.e. weekly sales). Select the cells range for the explanatory (independent) variables values. For Variables (X) Mask, select the cells at the top of the data table (Boolean array).

Notes:
1. The cells range may optionally include the heading (label) cell, which is then used in the output tables wherever they reference those variables.
2. The explanatory variables (i.e. X) are already grouped by columns (each column represents a variable), so we don't need to change that.
3. By default, the output cells range is set to the currently selected cell in your worksheet.

Please note that once we select the X and Y cells range, the Options, Forecast and Missing Values tabs become available (enabled). Next, select the Options tab.


Initially, the tab is set to the following values:
- The regression intercept/constant is left blank. This indicates that the regression intercept will be estimated by the regression. To fix the intercept at a given value (e.g. zero (0)), enter it there.
- The significance level (aka. α) is set to 5%.
- In the Output section, the most common regression analyses are selected.
- Leave Auto Modeling unchecked. We will discuss this functionality in a later issue.
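To make the intercept option concrete, here is a minimal NumPy sketch, assuming ordinary least squares: leaving the constant blank corresponds to estimating it along with the slopes, while entering a value (e.g. 0) fixes it so only the slopes are fitted. `fit_ols` is a hypothetical helper, not a NumXL function.

```python
import numpy as np

def fit_ols(X, y, intercept=None):
    """Ordinary least squares fit of y on the columns of X.

    intercept=None: the constant is estimated (the blank-cell default).
    intercept=<number>: the constant is fixed at that value and only the
    slopes are estimated.
    Returns (constant, slopes).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    if intercept is None:
        design = np.column_stack([np.ones(len(y)), X])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        return float(coef[0]), coef[1:]
    # Fixed constant: move it to the left-hand side and fit without a constant.
    slopes, *_ = np.linalg.lstsq(X, y - intercept, rcond=None)
    return float(intercept), slopes

x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]               # made-up values
print(fit_ols(x, y))                   # constant estimated
print(fit_ols(x, y, intercept=0.0))    # constant fixed at zero
```

Fixing the constant removes one estimated parameter, so the slopes absorb whatever level the data has; that is why a zero intercept is usually only chosen when theory demands it.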

Now, click on the Missing Values tab.


In this tab, you can select an approach to handle missing values in the data set (X and Y). By default, any missing value found in X or in Y in any observation excludes that observation from the analysis. This treatment is a good approach for our analysis, so let's leave it unchanged.

Now, click OK to generate the output tables:
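The default treatment described above is commonly called listwise deletion. Here is a minimal Python sketch, assuming missing values are represented as NaN; the helper name `drop_incomplete_rows` is hypothetical.

```python
import numpy as np

def drop_incomplete_rows(X, y):
    """Listwise deletion: drop every observation with a NaN in X or in y."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    # A row is kept only if y and every X value in that row are present.
    complete = ~np.isnan(y) & ~np.isnan(X).any(axis=1)
    return X[complete], y[complete], complete

nan = float("nan")
X = [[101.0, 6.0], [nan, 4.0], [97.0, 5.0]]   # made-up values
y = [3200.0, 2900.0, nan]
X_ok, y_ok, kept = drop_incomplete_rows(X, y)
print(kept)   # only the first observation is complete
```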

Analysis
Aside from the Variables (X) Mask settings, everything is exactly the same as we did in the prior tutorial, so what's our next step? The Mask variable determines which variable is included in the regression analysis, so let's take another look at the Coefficients table.

First, let's exclude the Intelligence input variable from the analysis. This is done simply by flipping the mask value for this cell to zero.


Now, if you have the Calculation option set to manual, force recalculation. Otherwise, the spreadsheet recalculates automatically.
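Flipping a mask value is equivalent to refitting the model on a subset of the columns. The sketch below assumes ordinary least squares with an estimated intercept and uses made-up synthetic data in place of the tutorial's spreadsheet; `r_squared` is a hypothetical helper. Because the reduced model is nested in the full one, its R-square can never be higher, so a small drop in R-square is expected even when the excluded variable adds little.

```python
import numpy as np

def r_squared(X, y, mask):
    """R-square of an OLS fit (estimated intercept) on the masked-in columns of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X[:, np.asarray(mask, dtype=bool)]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

# Synthetic stand-in for the tutorial's data (20 rows, 2 variables).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 1.0 + 0.2 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=20)
print(r_squared(X, y, [1, 1]))   # both variables in
print(r_squared(X, y, [0, 1]))   # first variable masked out: never higher
```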

Checking the output tables, we find the following:
- R-square dropped by 6%.
- Adjusted R-square dropped by 1.5%.
- Standard error increased by $3.
- AIC dropped by one (1).
- ANOVA table shows the regression is significant.
- Residual diagnosis checks out for all tests.
- In the regression coefficients table, the intercept and the coefficient of the Extroversion variable are both statistically significant.
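For reference, the statistics compared above can all be computed from the residuals of the fit. The following sketch assumes OLS with an estimated intercept and uses the Gaussian log-likelihood for AIC, counting only the regression coefficients as parameters. This tutorial does not state NumXL's exact AIC convention, so absolute AIC values may differ by an additive constant; `regression_stats` is a hypothetical helper.

```python
import math
import numpy as np

def regression_stats(X, y):
    """R-square, adjusted R-square, standard error and AIC of an OLS fit
    of y on X with an estimated intercept.

    The AIC convention (Gaussian log-likelihood, k = number of coefficients
    including the intercept) is an assumption, not NumXL's documented formula.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    n, p = X.shape                      # p = number of explanatory variables
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sse = float(resid @ resid)          # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - sse / sst
    # Adjusted R-square penalizes each extra explanatory variable.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    std_err = math.sqrt(sse / (n - p - 1))
    k = p + 1
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(sse / n) + 1.0)
    aic = 2 * k - 2 * log_lik
    return {"r2": r2, "adj_r2": adj_r2, "std_err": std_err, "aic": aic}
```

Comparing `regression_stats(X, y)` with `regression_stats(X[:, 1], y)` mirrors the mask flip above: R-square always falls when a variable is removed, while adjusted R-square and AIC can move either way, which is what makes them useful for comparing models of different sizes.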

This model has fewer explanatory variables (one instead of two) and explains the variation in the response variable nearly as well as the model with two (2) explanatory variables: adjusted R-square dropped by only 1.5%. Now, let's plot the estimated values against the actual values.



[Figure: actual weekly sales ($Sales/Week) against regression estimates (Estimated) for observations 1 to 20]

The shaded area represents the 95% confidence interval for the estimates of the regression model.

So far, we have demonstrated that dropping a variable from the analysis is as easy as flipping a switch; no more copying data and cluttering your spreadsheet with tons of output tables. This is nice, but you might be wondering: if I had more explanatory variables (say 10), what is the optimal set of variables? Should I try every single subset?

NumXL supports an interesting functionality, stepwise regression, to help you select this optimal set. Let's demonstrate how you would use it.

(1) In the Mask cells range, turn on (or off) the variables that you wish the stepwise regression to consider. For this demonstration, we will turn them all on.

(2) Locate and click on the regression icon in the NumXL tab.

(3) The Regression Wizard pops up.
(4) In the General tab, select the input cells range and the mask cells range.
(5) Under the Options tab, check the Stepwise Regression box.

(6) Leave the 3 different methods checked.
(7) Click OK.
(8) The output tables are generated.

The stepwise regression generates one additional table next to the coefficients table.

Let's take a closer look at this new table. The stepwise regression carries out a series of partial F-tests to include (or drop) variables from the regression model.
- Forward selection: we start with an intercept-only model and examine adding one additional variable at a time.
- Backward elimination: we start from the full model with all variables in, and consider dropping one regressor at a time.
- Bidirectional elimination is a hybrid of the two methods.
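The partial F-test behind each step compares a reduced model with a fuller one that adds some variables. Below is a minimal sketch of the statistic, assuming OLS with an intercept and that `reduced_cols` is a subset of `full_cols`; the helper names are hypothetical. To turn the statistic into a p-value, you would compare it with the F distribution with q and n - p - 1 degrees of freedom, where q is the number of variables being tested and p the number of variables in the fuller model (for instance via `scipy.stats.f.sf`, if SciPy is available).

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus X[:, cols]."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def partial_f(X, y, reduced_cols, full_cols):
    """Partial F statistic for the variables in full_cols that are not in reduced_cols."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    q = len(full_cols) - len(reduced_cols)      # number of variables being tested
    df_resid = len(y) - len(full_cols) - 1      # residual df of the fuller model
    sse_reduced = sse(X, y, reduced_cols)
    sse_full = sse(X, y, full_cols)
    # Improvement in fit per tested variable, relative to the remaining noise.
    return ((sse_reduced - sse_full) / q) / (sse_full / df_resid)
```

A large value means the extra variables reduce the residual sum of squares by much more than noise alone would explain.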

The table displays, in each column, the mask for the optimal model found by that method. One (1) stands for inclusion and zero (0) for exclusion.

At the bottom of the table, we compute the regression statistics for each model for comparison. In this case, the three methods came back with the same set of variables, so no comparison is needed.

Please note that, given the same set of input variables and responses, the mask is enough to distinguish one model from the others: it is simply the inclusion/exclusion list.
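To see how each column of the stepwise table can be read as a mask, here is a simplified sketch of forward selection and backward elimination built on the partial F idea. It is not NumXL's implementation: it uses a fixed F-to-enter/F-to-remove threshold of 4.0 (a common rule of thumb, assumed here) instead of a significance level, and it omits the bidirectional variant. Each function returns a 0/1 mask with one entry per explanatory variable.

```python
import numpy as np

F_THRESHOLD = 4.0   # assumed F-to-enter / F-to-remove cutoff (rule of thumb)

def _sse(X, y, cols):
    """Residual sum of squares for an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ coef
    return float(r @ r)

def _partial_f(X, y, small, big):
    """Partial F for the single extra variable in `big` relative to `small`."""
    sse_small, sse_big = _sse(X, y, small), _sse(X, y, big)
    df = len(y) - len(big) - 1
    return (sse_small - sse_big) / (sse_big / df)

def forward_selection(X, y, threshold=F_THRESHOLD):
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    chosen, remaining = [], list(range(X.shape[1]))
    while remaining:
        # Try each remaining variable; keep the one with the largest partial F.
        scores = {j: _partial_f(X, y, chosen, chosen + [j]) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break
        chosen.append(best)
        remaining.remove(best)
    return [1 if j in chosen else 0 for j in range(X.shape[1])]

def backward_elimination(X, y, threshold=F_THRESHOLD):
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    chosen = list(range(X.shape[1]))
    while chosen:
        # Partial F for dropping each variable still in the model.
        scores = {j: _partial_f(X, y, [c for c in chosen if c != j], chosen)
                  for j in chosen}
        worst = min(scores, key=scores.get)
        if scores[worst] >= threshold:
            break
        chosen.remove(worst)
    return [1 if j in chosen else 0 for j in range(X.shape[1])]
```

When both functions return the same mask, as the three NumXL methods did in this example, there is nothing left to compare; when they differ, the regression statistics at the bottom of the table decide between the candidates.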

Conclusion
So far, we have created a regression model, examined its significance, verified that it satisfies the underlying assumptions, and found the optimal subset of variables for the model. For many, this is the end of the analysis, and they would probably start using the model for forecasting.

Before we can use the model for forecasting, there are two more questions we ought to answer:
(1) Do we have any observation that exerts a significant influence (e.g. outliers) on the regression model?
(2) Is the regression model stable over the sample data?

This will be covered in the 3rd entry in our regression tutorial series. Please read on.
