MATH11002 PRAC Modules Reco
Math11002 Business Statistics
By: Aurino Rilman Adam Djamaris
MODELLING AND SIMULATION LABORATORY
MANAGEMENT PROGRAM
2010
1.1 Answer questions below with a brief description
1. EXPLAIN KEY DEFINITIONS AND GIVE AT LEAST 1 EXAMPLE!
1.2 Use Microsoft Excel to complete the following tasks!
1.3 Create a bar chart and also include a cumulative line chart using data in Table 1
1.4 Create a pie chart, and attach the Excel chart results as your answer!
1.5 The following data represent the cost of electricity during July 2006 for a random sample of 50 one-bedroom apartments in a large city
1.6 Form a frequency distribution and a percentage distribution that have class intervals with upper class limits $99, $119, and so on
1.7 Construct a histogram and a percentage polygon
1.8 Form a cumulative percentage distribution and plot a cumulative percentage polygon
1.9 Around what amount does the monthly electricity cost seem to be concentrated?
1.10 Appendix
1.10.1 Installing Excel Add-Ins for PHStat2
1.10.2 INSTALLING DATA ANALYSIS ON EXCEL 2007
1.10.3 Installing and Operating the Prentice Hall PHStat on Your Home Computer
1.10.4 Configuring Excel 2007 Security for PHStat2
2 NUMERICAL DESCRIPTIVE MEASURES
2.1 Central Tendency
2.1.1 The Mean
2.1.2 The Median
2.1.3 The Mode
2.1.4 Quartiles
2.1.5 The Geometric Mean
2.1.6 Other Useful Excel Basic Built-In Functions
2.2 Assignment 2.1
2.3 Variation
2.3.1 The Range
2.3.2 The Interquartile Range
2.3.3 The Variance and Standard Deviation
2.3.4 The Coefficient of Variation
ARD BUSINESS STATISTICS Sec. 2
Page 2 of 131
2.3.5 Z Scores
2.4 Shape
2.4.1 Formula
2.5 Assignment 2.2
2.6 Descriptive Summary of a Population
2.6.1 Excel Statistical Analysis Tools
2.6.2 Install and Use the Analysis ToolPak
2.7 Box-and-Whisker Plot
2.8 Assignment 2.3
2.9 Weighted Mean
2.10 Assignment 2.4
2.11 Correlation Coefficients
2.12 Covariance
2.13 Assignment 2.5
2.13.1 Calories and Fat Relationship
2.13.2 Fuel Efficiency Calculation and Standard
3 PROBABILITY
3.1 Basic Probability
3.2 Sample Spaces and Events, Contingency Tables, Simple Probability and Joint Probability
3.2.1 Sample Space
3.2.2 Event in Sample Space
3.2.3 Simple and Joint Probability
3.3 Bayes' Theorem
3.4 Assignment 3.1
3.5 Basic Probability Rules
3.5.1 Discrete Random Variable
3.5.2 Discrete Random Variables: Expected Value
3.5.3 Discrete Random Variables: Dispersion
3.5.4 Covariance
3.5.5 The Sum of Two Random Variables: Measures
3.6 Binomial Distribution
3.6.1 Properties
3.6.2 The Binomial Distribution Formula
3.6.3 The Shape and Characteristics
3.7 Poisson Distribution
3.7.1 Properties
3.7.2 Formula
3.7.3 Shape
3.8 Hypergeometric Distribution
3.8.1 Formula
3.8.2 Example
3.9 Read Excel Companion to Chapter 5
3.10 Assignment 3.2
3.11 Assignment 3.3
4 NORMAL AND SAMPLING DISTRIBUTION
4.1 Normal Distribution and Evaluating Normality
4.1.1 Normal Probability Density Function
4.1.2 Evaluating Normality
4.2 Sampling and Sampling Distribution
4.2.1 Sample
4.2.2 Types of Samples
4.2.3 Sampling Distributions
4.2.4 SAMPLING FROM FINITE POPULATIONS
4.3 Assignment for Simple Random Sample
4.4 Assignment for Sampling Distribution
4.5 Assignment for the Sampling Distribution of the Mean
4.6 Assignment for Sampling from Finite Population
5 CONFIDENCE INTERVAL ESTIMATION
5.1 Confidence Intervals
5.1.1 A Point Estimate and a Confidence Interval Estimate
5.1.2 Confidence Interval for μ (σ Known)
5.1.3 Confidence Interval for μ (σ Unknown)
5.2 Confidence Interval Estimate for a Single Population Proportion
5.2.1 Example for Confidence Intervals for the Population Proportion
5.3 Determining Sample Size
5.3.1 If Population Standard Deviation (σ) Known
5.3.2 If Population Standard Deviation (σ) Unknown
5.3.3 To Determine the Required Sample Size for the Proportion
5.4 Assignment 5
6 HYPOTHESIS TESTING AND TWO-SAMPLE TESTS
6.1 Hypothesis Testing
6.1.1 The Null Hypothesis, H0
6.1.2 The Alternative Hypothesis, H1
6.1.3 The Hypothesis Testing Process
6.1.4 The Test Statistic and Critical Values
6.1.5 Errors in Decision Making
6.1.6 Level of Significance, α
6.1.7 Hypothesis Testing: σ Known
6.1.8 6 Steps of Hypothesis Testing
6.1.9 Hypothesis Testing: σ Known, p-Value Approach
6.1.10 Hypothesis Testing: σ Known, Confidence Interval Connections
6.1.11 One-Tail Tests
6.1.12 Hypothesis Testing: σ Unknown
6.1.13 Hypothesis Testing: Connection to Confidence Intervals
6.1.14 Hypothesis Testing: Proportion
6.2 Assignment 6.1
6.3 Two-Sample Tests
6.3.1 Two-Sample Tests: Independent Populations
6.3.2 Independent Populations, Unequal Variance
7 ANOVA AND CHI-SQUARE AND NONPARAMETRIC TESTS
7.1 One-Way Analysis of Variance
7.1.1 Hypotheses: One-Way ANOVA
7.1.2 Partitioning the Variation
7.1.3 Obtaining the Mean Squares
7.1.4 One-Way ANOVA Table
7.1.5 Test Statistic
7.1.6 Example
7.1.7 The Tukey-Kramer Procedure
7.1.8 ANOVA Assumptions
7.2 Two-Way Analysis of Variance
7.2.1 Sources of Variation
7.2.2 Two-Way ANOVA: Features
7.2.3 Interaction
7.3 CHI-SQUARE AND NONPARAMETRIC TESTS
7.3.1 One-Variable Chi-Square (Goodness-of-Fit Test) with Equal Expected Frequencies
7.3.2 One-Variable Chi-Square (Goodness-of-Fit Test) with Predetermined Expected Frequencies
7.3.3 Two-Variable Chi-Square (Test of Independence)
7.4 Assignment
7.4.1 Assignment 7.1
7.4.2 Assignment 7.2
7.4.3 Assignment 7.3
8 REGRESSION ANALYSIS
8.1 Simple Regression Analysis
8.2 Regression Analysis Using Excel
8.3 Regression Dialog Box
8.4 Simple Regression
8.5 Linear Correlation and Regression Analysis
9 MULTIPLE REGRESSION MODEL
9.1 MULTIPLE REGRESSION USING THE DATA ANALYSIS ADD-IN
9.2 INTERPRET REGRESSION STATISTICS TABLE
9.3 INTERPRET ANOVA TABLE
9.4 INTERPRET REGRESSION COEFFICIENTS TABLE
9.5 CONFIDENCE INTERVALS FOR SLOPE COEFFICIENTS
9.6 TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")
9.7 TEST HYPOTHESIS ON A REGRESSION PARAMETER
9.7.1 Using the p-Value Approach
9.7.2 Using the Critical Value Approach
9.8 OVERALL TEST OF SIGNIFICANCE OF THE REGRESSION PARAMETERS
9.9 PREDICTED VALUE OF Y GIVEN REGRESSORS
9.10 EXCEL LIMITATIONS
9.11 Assignment 9.1
10 TIME SERIES FORECASTING
10.1 Time Series Forecasting Models
10.1.1 CLASSICAL MULTIPLICATIVE TIME SERIES MODEL FOR ANNUAL DATA
10.1.2 Assignment 9.1
10.2 Moving Average and Exponential Smoothing
10.2.1 Moving Average Models
10.2.2 Exponential Smoothing Models
10.3 Assignment 10.2
10.4 Linear, Exponential and Quadratic Trend
10.4.1 Linear Trend Model
10.4.2 Exponential Trend Model
10.4.3 Model Selection Using First, Second, and Percentage Differences
10.4.4 Assignment 10.3
10.5 The Autoregressive and the Least-Squares Models for Seasonal Data
10.6 Price Indexes
10.6.1 Example
10.7 Aggregated and Simple Indexes
10.7.1 Unweighted Aggregate Price Index
10.7.2 Weighted Aggregate Price Indexes
Practicum: Math11002
Business Statistics
MODULE 1
Submitted only on
Day/Date: ____________/______________
Time: 12.00-14.00 WIB
In: ____________________
Module Description:
Objective
Output
Date of Receipt
Score:
Assistant Signature
I hereby declare, by signing below, that I have done all the work in this module by myself.
Name/NIM: ______________________________/_______________
Signature: _______________________________________________
Rem.:
Data Collection and Data Presentation
Students understand the sources of data used in business and the types of data used in business; develop tables and charts for categorical data; develop tables and charts for numerical data and present graphs; examine cross-tabulated data using the contingency table and the side-by-side bar chart; and use Microsoft Excel to process business data.
Use separate sheets of paper to report your results (handwritten or computer printout). The report should present the working procedures and results in both soft copy and hard copy.
Pre-Lab Reading:
Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, pages 18-30 and pages 75-93.
Set up the Microsoft Excel application so that the Data Analysis add-in is ready. See pages 28-29.
1.1 Answer the questions below with a brief description.
1. Explain the key definitions and give at least one example of each!
1.1 Population:
1.2 Sample:
1.3 Parameter:
1.4 Statistic:
1.5 Descriptive Statistics:
1.6 Inferential Statistics:
2. Name three circumstances that require data collection.
3. Explain the difference between descriptive and inferential statistics.
4. Design a data-collection questionnaire of your own with at least 10 questions!
5. According to The State of the News Media, 2006, the average age of viewers of "ABC World News
Tonight" is 59 years. Suppose a rival network executive hypothesizes that the average age of ABC
news viewers is less than 59. To test her hypothesis, she samples 500 ABC nightly news viewers and
determines the age of each.
5.1 Describe the population.
5.2 Describe the variable of interest.
5.3 Describe the sample.
5.4 Describe the inference.
6. Problem: "Cola wars" is the popular term for the intense competition between Coca-Cola and Pepsi
displayed in their marketing campaigns. Their campaigns have featured movie and television stars,
rock videos, athletic endorsements, and claims of consumer preference based on taste tests.
Suppose, as part of a Pepsi marketing campaign, 1,000 cola consumers are given a blind taste test
(i.e., a taste test in which the two brand names are disguised). Each consumer is asked to state a
preference for brand A or brand B.
6.1 Describe the population.
6.2 Describe the variable of interest.
6.3 Describe the sample.
6.4 Describe the inference.
1.2 Use Microsoft Excel to complete the following tasks!
1.3 Create a bar chart and also include a cumulative line chart using the data in Table 1.
1.4 Create a pie chart, and attach the Excel chart results as your answer!
Table 1. Percentage of Expended Money
What You Would Do With the Money          Percentage (%)
Buy a luxury item, vacation, or gift      20
Give it to charity                        2
Pay debt                                  24
Save                                      31
Spend on essentials                       16
Other                                     7
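The bar chart with a cumulative line in task 1.3 is essentially a Pareto chart: categories are ordered from largest to smallest percentage, and the line shows the running total. The pie chart in task 1.4 allots each category a slice of 360 × percentage / 100 degrees. The minimal Python sketch below (Python is an assumption; the module itself uses Excel) computes both from the Table 1 values, so you can check your Excel chart. The function names `pareto_order` and `pie_angles` are hypothetical. It uses a plain descending sort, whereas some textbooks keep an "Other" category last regardless of its size.

```python
# Table 1 values, copied from the module (the percentages sum to 100).
table1 = {
    "Buy a luxury item, vacation, or gift": 20,
    "Give it to charity": 2,
    "Pay debt": 24,
    "Save": 31,
    "Spend on essentials": 16,
    "Other": 7,
}


def pareto_order(data):
    """Sort categories by descending percentage and attach a running total.

    Returns (category, percentage, cumulative_percentage) tuples: the bars
    use the percentage, the cumulative line uses the running total.
    """
    rows = []
    running = 0
    for category, pct in sorted(data.items(), key=lambda item: item[1], reverse=True):
        running += pct
        rows.append((category, pct, running))
    return rows


def pie_angles(data):
    """Convert each percentage into the angle (in degrees) of its pie slice."""
    return {category: 360 * pct / 100 for category, pct in data.items()}


if __name__ == "__main__":
    for category, pct, cum in pareto_order(table1):
        print(f"{category:<40}{pct:>5}%{cum:>6}%")
    for category, angle in pie_angles(table1).items():
        print(f"{category:<40}{angle:>7.1f} degrees")
```

With these data the cumulative line reaches 100% at the last bar, which is a quick check that no category was omitted.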
1.5 The following data represent the cost of electricity during July 2006 for a random
sample of 50 one-bedroom apartments in a large city.
Table 2. Utility Charge ($)
96 171 202 178 147 102 153 197 127 82
157 141 95 108 111 128 143 135 148 144 187 191 213 168 166 137 130 109 139 129 165 167 149 158
1.6 Form a frequency distribution and a percentage distribution that have class
intervals with upper class limits $99, $119, and so on.
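Upper class limits of $99, $119, and so on give classes of width $20: $80-$99, $100-$119, $120-$139, and so on. The minimal Python sketch below (the function name `frequency_distribution` is hypothetical) counts the values in each class and converts the counts to percentages of the total. It assumes whole-dollar data and counts each class as "$80 up to but not including $100", so a value such as 99.5 would still be placed correctly. For illustration it is run only on the first row of Table 2 (10 apartments); for the assignment, pass in all 50 values.

```python
def frequency_distribution(values, first_upper=99, width=20):
    """Build a frequency and percentage distribution.

    Classes have upper limits first_upper, first_upper + width, ... (99, 119, ...),
    so the first class is 80-99, the next 100-119, and so on. Each class counts
    values in [lower, lower + width).
    Returns a list of (lower, upper, frequency, percentage) tuples.
    """
    lower = first_upper - width + 1  # 80 when first_upper=99 and width=20
    if min(values) < lower:
        raise ValueError("a value falls below the first class")
    n = len(values)
    rows = []
    while lower <= max(values):
        freq = sum(1 for v in values if lower <= v < lower + width)
        rows.append((lower, lower + width - 1, freq, 100 * freq / n))
        lower += width
    return rows


# Illustration only: the first row of Table 2 (10 of the 50 apartments).
first_row = [96, 171, 202, 178, 147, 102, 153, 197, 127, 82]

for lower, upper, freq, pct in frequency_distribution(first_row):
    print(f"${lower}-${upper}: frequency {freq}, {pct:.1f}%")
```

In Excel, the same counts can be obtained with the FREQUENCY function using the bins 99, 119, 139, and so on.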
1.7 Construct a histogram and a percentage polygon.
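The histogram draws one bar per class, with heights equal to the class frequencies or percentages. The percentage polygon joins the class midpoints at the height of each class percentage, and an empty (0%) class is usually added at each end so the line closes on the horizontal axis. The minimal Python sketch below (the function name `percentage_polygon` is hypothetical) lists those plotting points. It assumes classes of width $20 read as "$80 up to but not including $100", so the midpoint is the lower limit plus $10. The input percentages come from the first row of Table 2 only and are for illustration.

```python
def percentage_polygon(rows, width=20):
    """Return the (midpoint, percentage) points of a percentage polygon.

    rows: (lower_limit, percentage) pairs in ascending class order, for example
    taken from the percentage distribution of task 1.6. The midpoint of each
    class is lower + width / 2. An empty class (0%) is added at each end so the
    polygon closes on the horizontal axis.
    """
    points = [(lower + width / 2, pct) for lower, pct in rows]
    before = (points[0][0] - width, 0.0)
    after = (points[-1][0] + width, 0.0)
    return [before] + points + [after]


# Percentages for the first row of Table 2 only (illustration, not the full sample).
demo_rows = [(80, 20.0), (100, 10.0), (120, 10.0), (140, 20.0),
             (160, 20.0), (180, 10.0), (200, 10.0)]

for midpoint, pct in percentage_polygon(demo_rows):
    print(f"midpoint ${midpoint:.0f}: {pct:.0f}%")
```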
1.8 Form a cumulative percentage distribution and plot a cumulative percentage
polygon.
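A cumulative percentage distribution gives, for each class boundary, the percentage of values that fall below it. The cumulative percentage polygon (ogive) plots those percentages against the boundaries, rising from 0% at the lowest boundary to 100% at the upper boundary of the last class. The minimal Python sketch below (the function name `cumulative_percentage_polygon` is hypothetical) builds these points from a percentage distribution. It assumes $20-wide classes, and it is illustrated with percentages from the first row of Table 2 only.

```python
def cumulative_percentage_polygon(rows, width=20):
    """Return (boundary, cumulative percentage) points for an ogive.

    rows: (lower_limit, percentage) pairs in ascending class order.
    Each point gives the percentage of values below a class boundary:
    0% below the first lower limit, 100% below the upper boundary of the last class.
    """
    points = [(rows[0][0], 0.0)]
    running = 0.0
    for lower, pct in rows:
        running += pct
        points.append((lower + width, running))
    return points


# Percentages for the first row of Table 2 only (illustration, not the full sample).
demo_rows = [(80, 20.0), (100, 10.0), (120, 10.0), (140, 20.0),
             (160, 20.0), (180, 10.0), (200, 10.0)]

for boundary, cum_pct in cumulative_percentage_polygon(demo_rows):
    print(f"less than ${boundary}: {cum_pct:.0f}%")
```

A steep section of the ogive marks classes that hold many values, which is one way to approach task 1.9.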
1.9 Around what amount does the monthly electricity cost seem to be concentrated?
1.10 Appendix
1.10.1 Installing Excel Add-Ins for PHStat2
The Prentice Hall PHStat Microsoft Excel add-in enhances Microsoft Excel to better support the
statistical analyses taught in an introductory statistics course. Using PHStat lessens the technical training
needed to use Microsoft Excel to perform statistical analysis and allows you to generate results that
would otherwise be very tedious or impossible to produce from worksheets built from scratch. PHStat
requires that Data Analysis be installed in Excel, and it has the following system requirements:
Any Windows 95 (or later) system; Microsoft Excel 95 or Microsoft Excel 97 (or later).
32 MB of main memory; 64 MB required when running sampling distribution simulations and
data-intensive regression analyses; approximately 5 MB of free hard disk space during the setup
process and 3 MB of hard disk space after installation.
Preferred display settings: PHStat will run with any display settings, but for best results set the
Desktop area to 800 by 600 pixels with Small Fonts. (Use the Settings tab of the Display applet of
the Control Panel to change the settings.)
1.10.2 INSTALLING DATA ANALYSIS ON EXCEL 2007
Click the Microsoft Office button in the upper left-hand corner of the Excel spreadsheet and click
Excel Options in the lower right-hand corner of the pull-down menu. On the left side of the Excel
Options page, click Add-Ins and then the Go button at the bottom of the page. This should open
the Add-Ins dialog box. Select Analysis ToolPak and Analysis ToolPak - VBA and click OK.
1.10.3 Installing and Operating the Prentice Hall PHStat on Your Home Computer
To use the Prentice Hall PHStat Microsoft Excel add-in, you first need to run the setup program
(Setup.exe) located in the PHStat directory on this disk. The setup program will install the PHStat
program files on your system and add icons for PHStat on your Desktop and Start Menu. To do this,
simply insert the PHStat disk in your CD drive and follow the directions.
To operate PHStat with Excel, simply double-click the PHStat icon. Excel 2007 users will likely
have to click Enable Macros in the prompt that should pop up by itself.
1.10.4 Configuring Excel 2007 Security for PHStat2
You must change the Trust Center settings to allow PHStat2 to function properly. Click the Office Button, and then click Excel Options in the Office menu. In the Excel Options dialog box that appears, click Trust Center, and then in the Trust Center panel, click Trust Center Settings. In the left pane of the Trust Center dialog box that appears, first click Add-ins and clear, if necessary, all of the check boxes that appear under the Add-ins banner. Next, click Macro Settings in the left pane and click either Disable all macros with notification (recommended) or Enable all macros (not recommended; use it only if the other choice fails to allow PHStat2 to function properly).
Practicum: MATH11002
Business Statistics
MODULE 2
Submitted only on
Day/Date: ____________ / ______________
Time: 12.00-14.00 WIB
In ____________________
Module Description:
Objective
Output
Date of Receipt
Score:
Assistant Signature
I hereby declare that I have done all the work in this module by myself.
Name/NIM: ______________________________ / _______________
Signature: _______________________________________________
Rem.:
NUMERICAL DESCRIPTIVE MEASURES
Measures of central tendency, variation, and shape; population summary measures; five-number summary and box-and-whisker plots; covariance and coefficient of correlation.
Use separate sheets of paper to report your results (handwritten or computer printout). The report produced by the students should present the working procedures and results in both soft copy and hard copy.
2 NUMERICAL DESCRIPTIVE MEASURES
2.1 Central Tendency
Central tendency refers to the tendency of the individual measures in a distribution to cluster together toward some point of aggregation.
2.1.1 The Mean
The mean (arithmetic mean) is the sum of all the data values divided by the number of data values included in the calculation.
2.1.1.1 Formula: The Mean
The sum of the values divided by the number of values:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Where
$\bar{X}$ = sample mean
$n$ = number of values or sample size
$X_i$ = ith value of the variable X
$\sum_{i=1}^{n} X_i$ = summation of all $X_i$ values in the sample
2.1.1.2 MS Excel Built-In Function for Calculating the Mean
The function is written as follows:
=AVERAGE(argument)
The argument for this function is the data contained in the selected range of cells.
Example Using Excel's AVERAGE Function:
Note: For help with this example, see the image to the right.
1. Enter the following data into cells C1 to C6: 11, 12, 13, 14, 15, 16.
2. Click on cell C7, the location where the result will be displayed.
3. Type "=average(" in cell C7.
4. Drag-select cells C1 to C6 with the mouse pointer.
5. Type the closing bracket ")" after the cell range in cell C7.
6. Press the ENTER key on the keyboard.
7. The answer 13.5 should be displayed in cell C7.
8. The complete function =AVERAGE(C1:C6) appears in the formula bar above the worksheet.
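To check the AVERAGE result outside Excel, the short Python sketch below applies the mean formula directly to the same six values and compares it with the standard-library `statistics.mean`. It is only an illustration; the variable names are our own.

```python
from statistics import mean

# Same six values entered in cells C1:C6 of the AVERAGE example
values = [11, 12, 13, 14, 15, 16]

# Mean formula: total of the values divided by the number of values (n)
x_bar = sum(values) / len(values)

print(x_bar)         # 13.5, matching =AVERAGE(C1:C6)
print(mean(values))  # the standard-library function gives the same answer
```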
2.1.2 The Median
The MEDIAN shows you the middle value in a list of numbers. Middle, in this case, refers to arithmetic size rather than the location of the numbers in the list. If the list contains an even number of values, the median is the average of the two middle values.
2.1.2.1 Formula: The Median
The median is the middle value that separates the greater and lesser halves of a data set ranked in order. Its position is:
$$\text{Median} = \frac{n+1}{2} \text{ ranked value}$$
2.1.2.2 MS Excel Built-In Function for Calculating the Median
The syntax for the MEDIAN function is:
=MEDIAN(number1, number2, ... number255)
Note: Up to 255 numbers can be entered into the function.
Example Using Excel's MEDIAN Function:
Note: For help with this example, see the image to the right.
1. Enter the following data into cells D1 to D5: 4, 12, 49, 24, 65.
2. Click on cell E1, the location where the result will be displayed.
3. Click on the Formulas tab.
4. Choose More Functions > Statistical from the ribbon to open the function drop-down list.
5. Click on MEDIAN in the list to bring up the function's dialog box.
6. Drag-select cells D1 to D5 in the spreadsheet to enter the range into the dialog box, then click OK.
7. The answer 24 should appear in cell E1, since there are two numbers larger (49 and 65) and two numbers smaller (4 and 12) than it in the list.
8. The complete function =MEDIAN(D1:D5) appears in the formula bar above the worksheet when you click on cell E1.
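The ranking rule behind MEDIAN can be sketched in a few lines of Python. The helper `median_by_hand` is a hypothetical name used only for illustration; it sorts the data and handles both the odd and even cases described above, and the result is cross-checked against `statistics.median`.

```python
from statistics import median

def median_by_hand(data):
    """Rank the data, then take the middle value (odd n) or the
    average of the two middle values (even n)."""
    ranked = sorted(data)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

values = [4, 12, 49, 24, 65]   # cells D1:D5
print(median_by_hand(values))  # 24
print(median(values))          # 24
```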
2.1.3 The Mode
The mode is the most frequently occurring value in a data set.
2.1.3.1 Formula: The Mode
For example, the mode of the array 1, 3, 4, 4, 4, 7, 7, 12, 17 is 4.
2.1.3.2 MS Excel Built-In Function for Calculating the Mode
The MODE function, one of Excel's statistical functions, tells you the most frequently occurring value in a list of numbers.
The syntax for the MODE function is:
=MODE(number1, number2, ...)
Example Using Excel's MODE Function:
1. Enter the following data into cells D1 to D6: 98, 135, 147, 135, 98, 135.
2. Click on cell E1, the location where the result will be displayed.
3. Click on the Formulas tab.
4. Choose More Functions > Statistical from the ribbon to open the function drop-down list.
5. Click on MODE in the list to bring up the function's dialog box.
6. Drag-select cells D1 to D6 in the spreadsheet to enter the range into the dialog box, then click OK.
7. The answer 135 should appear in cell E1, since this number appears most often (three times) in the list of data.
8. The complete function =MODE(D1:D6) appears in the formula bar above the worksheet when you click on cell E1.
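Finding the mode amounts to counting how often each value occurs. The Python sketch below does this with `collections.Counter` on the same six values. One difference to keep in mind: Excel's MODE returns #N/A when no value repeats, whereas this sketch would simply report the first value with a count of 1.

```python
from collections import Counter

values = [98, 135, 147, 135, 98, 135]  # cells D1:D6

# Count how often each value occurs and keep the most frequent one
counts = Counter(values)
mode_value, frequency = counts.most_common(1)[0]
print(mode_value, frequency)  # 135 3
```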
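2.1.4 Quartiles
Quartiles split a ranked data set into four equal parts. They are often used in sales and survey data to divide populations into groups. For example, you can use QUARTILE to find the top 25 percent of incomes in a population.
2.1.4.1 Formulas of Quartiles
$$Q_1 = \frac{n+1}{4} \text{ ranked value}, \qquad Q_3 = \frac{3(n+1)}{4} \text{ ranked value}$$
Excel's QUARTILE function uses a slightly different interpolation rule, so its results can differ from these hand-calculated values.
2.1.4.2 MS Excel Built-In Function for Calculating Quartiles
The syntax for the QUARTILE function is:
=QUARTILE(array, quart)
Array is the array or cell range of numeric values for which you want the quartile value.
Quart indicates which value to return.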
If quart equals | QUARTILE returns
0 | Minimum value
1 | First quartile (25th percentile)
2 | Median value (50th percentile)
3 | Third quartile (75th percentile)
4 | Maximum value
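The QUARTILE table can be mimicked in Python. This minimal sketch assumes that Excel's QUARTILE interpolates linearly at position 1 + (n - 1) x quart / 4 of the sorted data, which is our understanding of its inclusive method. Python's `statistics.quantiles(..., method="inclusive")` follows the same rule, so the two results should agree. The helper name `excel_quartile` is hypothetical, and the data are the seven values used in the later examples of this module.

```python
from statistics import quantiles

data = [4, 5, 8, 7, 11, 4, 3]

def excel_quartile(values, quart):
    """Sketch of an inclusive quartile: linear interpolation at
    0-based position (n - 1) * quart / 4 of the sorted data."""
    ranked = sorted(values)
    pos = (len(ranked) - 1) * quart / 4
    lower = int(pos)
    frac = pos - lower
    if lower + 1 < len(ranked):
        return ranked[lower] + frac * (ranked[lower + 1] - ranked[lower])
    return float(ranked[lower])  # quart = 4 lands exactly on the maximum

print([excel_quartile(data, q) for q in range(5)])  # [3.0, 4.0, 5.0, 7.5, 11.0]
print(quantiles(data, n=4, method="inclusive"))     # [4.0, 5.0, 7.5]
```

For the same data, the textbook (n + 1)/4 rule places Q1 at the 2nd ranked value and Q3 at the 6th ranked value (4 and 8). This is why hand and Excel answers can differ.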
2.1.5 The Geometric Mean
The geometric mean measures the rate of change of a variable over time. Excel's GEOMEAN function returns the geometric mean of an array or range of positive data. For example, you can use GEOMEAN to calculate the average growth rate given compound interest with variable rates.
2.1.5.1 Formula: The Geometric Mean
The geometric mean is the nth root of the product of n values:
$$\bar{X}_G = \left(X_1 \times X_2 \times \cdots \times X_n\right)^{1/n}$$
Or
The geometric mean rate of return measures the average percentage return of an investment over time:
$$R_G = \left[(1+R_1) \times (1+R_2) \times \cdots \times (1+R_n)\right]^{1/n} - 1$$
2.1.5.2 MS Excel Built-In Function for Calculating the Geometric Mean
Syntax
=GEOMEAN(number1, number2, ...)
Number1, number2, ... are 1 to 255 arguments for which you want to calculate the mean. You can also use a single array or a reference to an array instead of arguments separated by commas.
Example:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3.
2. In B4 type the formula =GEOMEAN(A2:A8), then press ENTER.
3. The answer 5.47698697 should appear in cell B4.
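Both geometric-mean formulas translate directly into Python. The sketch below reproduces the GEOMEAN example and adds a helper, `geometric_mean_rate_of_return` (a hypothetical name), that implements R_G. The two yearly returns in the last line are invented purely for illustration.

```python
import math
from statistics import geometric_mean

data = [4, 5, 8, 7, 11, 4, 3]  # cells A2:A8

# nth root of the product of the n values
g = math.prod(data) ** (1 / len(data))
print(g)                     # about 5.477, as in the Excel example
print(geometric_mean(data))  # standard-library equivalent (Python 3.8+)

def geometric_mean_rate_of_return(returns):
    """R_G = [(1 + R1)(1 + R2)...(1 + Rn)]^(1/n) - 1, returns given as decimals."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

# Hypothetical returns: +100% in year 1, then -50% in year 2
print(geometric_mean_rate_of_return([1.00, -0.50]))  # 0.0
```

The arithmetic mean of +100% and -50% is +25%, yet the investment ends exactly where it started. The geometric mean rate of return of 0% describes that outcome correctly.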
2.1.6 Other Useful Excel Basic Built-In Functions:
2.1.6.1 SUM
Layout | Values | Total | Formula
Horizontal (cells C4:E4) | 100, 200, 300 | 600 | =SUM(C4:E4)
Vertical (cells C7:C9) | 100, 200, 300 | 600 | =SUM(C7:C9)
Single cells | 100, 200, 300 | 600 |
What Does It Do?
This function creates a total from a list of numbers.
It can be used either horizontally or vertically.
The numbers can be in single cells, in ranges, or come from other functions.
Syntax
=SUM(Range1, Range2, Range3, ... through to Range30)
2.1.6.2 COUNT
Entries to be counted:
C | D | E | Count | Formula
10 | 20 | 30 | 3 | =COUNT(C4:E4)
10 | 0 | 30 | 3 | =COUNT(C5:E5)
10 | -20 | 30 | 3 | =COUNT(C6:E6)
10 | 1-Jan-88 | 30 | 3 | =COUNT(C7:E7)
10 | 21:30 | 30 | 3 | =COUNT(C8:E8)
10 | 0.758576 | 30 | 3 | =COUNT(C9:E9)
10 | (blank) | 30 | 2 | =COUNT(C10:E10)
10 | Hello | 30 | 2 | =COUNT(C11:E11)
10 | #DIV/0! | 30 | 2 | =COUNT(C12:E12)
What Does It Do?
This function counts the number of numeric entries in a list.
It will ignore blanks, text and errors.
Syntax
=COUNT(Range1, Range2, Range3, ... through to Range30)
2.1.6.3 MAX
Data | C | D | E | F | G | Maximum | Formula
Values | 120 | 800 | 100 | 120 | 250 | 800 | =MAX(C4:G4)
Dates | 1-Jan-98 | 25-Dec-98 | 31-Mar-98 | 27-Dec-98 | 4-Jul-98 | 27-Dec-98 | =MAX(C7:G7)
What Does It Do?
This function picks the highest value from a list of data.
Syntax
=MAX(Range1, Range2, Range3, ... through to Range30)
2.1.6.4 MIN
Data | C | D | E | F | G | Minimum | Formula
Values | 120 | 800 | 100 | 120 | 250 | 100 | =MIN(C4:G4)
Dates | 1-Jan-98 | 25-Dec-98 | 31-Mar-98 | 27-Dec-98 | 4-Jul-98 | 1-Jan-98 | =MIN(C7:G7)
What Does It Do?
This function picks the lowest value from a list of data.
Syntax
=MIN(Range1, Range2, Range3, ... through to Range30)
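The four worksheet functions above have close Python counterparts: `sum`, `max`, `min`, and a small counting helper. The sketch uses the values and dates from the MAX/MIN tables. For COUNT, it assumes a simple encoding of worksheet cells in which `None` stands for a blank cell and strings stand for text or an error value such as "#DIV/0!". The helper `count_numbers` is hypothetical.

```python
from datetime import date

values = [120, 800, 100, 120, 250]            # C4:G4 in the MAX/MIN examples
print(sum(values), max(values), min(values))  # 1390 800 100

dates = [date(1998, 1, 1), date(1998, 12, 25), date(1998, 3, 31),
         date(1998, 12, 27), date(1998, 7, 4)]  # C7:G7
print(max(dates), min(dates))                   # 1998-12-27 1998-01-01

# COUNT counts numeric entries only. Assumed encoding: None = blank cell,
# strings = text or an error value such as "#DIV/0!".
rows = [[10, 20, 30], [10, None, 30], [10, "Hello", 30], [10, "#DIV/0!", 30]]

def count_numbers(cells):
    """Count int/float entries, skipping blanks, text and error strings."""
    return sum(1 for c in cells
               if isinstance(c, (int, float)) and not isinstance(c, bool))

print([count_numbers(r) for r in rows])  # [3, 2, 2, 2]
```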
2.2 Assignment 2.1:
The sample data of 38 banks for direct deposit customers who maintain a Rp. 100 (millions) balance:
26 18 28 20 28 20 10 25
40 25 2 25 20 25 21 22
21 22 22 30 22 30 25 30
25 30 25 30 25 3 18 65
18 15 25 20 25 20 15 29
15 29 20 23 20 26 18 45
1. Using the formulas above, calculate the mean, median, mode, quartiles and geometric mean of the sample data.
2. Use MS Excel functions to calculate the mean, median, mode, quartiles and geometric mean of the sample data.
3. Compare the results and report your analysis.
2.3 Variation
Variability or variation refers to the overall separations and differences that exist among the
individual measures in a distribution, while central tendency refers to their closeness and
similarity. Variation measures the spread or the dispersion of values in a data set.
2.3.1 The Range
The range is equal to the largest value minus the smallest value.
2.3.1.1 Formula: The Range
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
2.3.1.2 MS Excel Built-In Function for Calculating the Range
To calculate the range in MS Excel we use two built-in functions: MAX() and MIN(). See Section 2.1.6 above.
Based on the formula for the range above, the syntax of the formula to calculate the range is:
=MAX(range)-MIN(range)
Example:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3.
2. In B4 type the formula =MAX(A2:A8)-MIN(A2:A8), then press ENTER.
3. The answer 8 should appear in cell B4.
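The range formula is a one-liner in Python. This sketch mirrors =MAX(A2:A8)-MIN(A2:A8) for the example data (11 - 3).

```python
data = [4, 5, 8, 7, 11, 4, 3]  # cells A2:A8

# Range = largest value - smallest value, like =MAX(A2:A8)-MIN(A2:A8)
data_range = max(data) - min(data)
print(data_range)  # 8
```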
2.3.2 The Interquartile Range
The interquartile range is equal to the difference between the third quartile and the first quartile in a set of data.
2.3.2.1 Formula: The Interquartile Range
$$\text{Interquartile range} = Q_3 - Q_1$$
2.3.2.2 MS Excel Built-In Function for Calculating the Interquartile Range
To calculate the interquartile range in MS Excel we use a formula with the built-in function QUARTILE(). See Section 2.1.4.2 above.
Based on the formula above, the syntax of the formula to calculate the interquartile range is:
=QUARTILE(range,3)-QUARTILE(range,1)
Example:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3.
2. In B4 type the formula =QUARTILE(A2:A8,3)-QUARTILE(A2:A8,1), then press ENTER.
3. The answer 3.5 should appear in cell B4.
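The same interquartile range can be computed in Python. The sketch assumes that the "inclusive" method of `statistics.quantiles` matches Excel's QUARTILE interpolation. With that method, Q1 = 4 and Q3 = 7.5 for this data.

```python
from statistics import quantiles

data = [4, 5, 8, 7, 11, 4, 3]  # cells A2:A8

# "inclusive" interpolation is assumed to match Excel's QUARTILE
q1, median_q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q3, iqr)  # 4.0 7.5 3.5
```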
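2.3.3 The Variance and Standard Deviation
The variance measures the spread of the values around the mean as (roughly) the average of the squared deviations from the mean. The standard deviation is the square root of the variance, and it is expressed in the same units as the data.
2.3.3.1 Formula: The Variance and Standard Deviation
Variance formula (sample):
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
Or (population):
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Standard deviation formula:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$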
2.3.3.2 MS Excel Built-In Function for Calculating the Variance and Standard Deviation
To calculate the variance in MS Excel we use the built-in function VAR(), and for the standard deviation we use STDEV(). Both functions treat the data as a sample (dividing by n - 1).
Syntax:
=VAR(number1, number2, ...)
=STDEV(number1, number2, ...)
Example for VARIANCE:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3.
2. In B4 type the formula =VAR(A2:A8), then press ENTER.
3. The answer 8 should appear in cell B4.
Example for STANDARD DEVIATION:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3.
2. In B4 type the formula =STDEV(A2:A8), then press ENTER.
3. The answer 2.828427125 should appear in cell B4.
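The sample variance and standard deviation formulas can be written out step by step in Python. This sketch reproduces VAR() = 8 and STDEV() = 2.828427125 for the example data: the mean is 6 and the squared deviations sum to 48, so S² = 48/6. It also shows the population variance for comparison.

```python
import math
from statistics import variance, stdev, pvariance

data = [4, 5, 8, 7, 11, 4, 3]  # cells A2:A8
n = len(data)
x_bar = sum(data) / n

# Sample variance: sum of squared deviations divided by (n - 1), like VAR()
sum_sq = sum((x - x_bar) ** 2 for x in data)
s2 = sum_sq / (n - 1)
s = math.sqrt(s2)    # sample standard deviation, like STDEV()
print(s2, s)         # 8.0 2.8284271247461903

print(variance(data), stdev(data))  # standard-library sample versions
print(pvariance(data))              # population version divides by n instead
```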
2.3.4 The Coefficient of Variation
The coefficient of variation is a relative measure of variation that is always expressed as a percentage.
2.3.4.1 Formula: The Coefficient of Variation
The coefficient of variation is equal to the standard deviation divided by the mean, multiplied by 100%.
Formula:
$$CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$$
2.3.4.2 MS Excel Formula for Calculating the Coefficient of Variation
To calculate the coefficient of variation in MS Excel we use a formula with the built-in function STDEV(), and for the mean we use AVERAGE().
Syntax:
=(STDEV(number1, number2, ...)/AVERAGE(number1, number2, ...))*100%
Number1, number2, ... are 1 to 255 number arguments corresponding to a sample of a population.
Example for the COEFFICIENT OF VARIATION:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3.
2. In B4 type the formula =(STDEV(A2:A8)/AVERAGE(A2:A8))*100%, then press ENTER.
3. The answer 0.4714, that is 47.14%, should appear in cell B4 (depending on the cell's number format).
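As a cross-check of the coefficient of variation, the Python sketch below divides the sample standard deviation by the mean and scales it to a percentage (2.8284 / 6 x 100).

```python
from statistics import mean, stdev

data = [4, 5, 8, 7, 11, 4, 3]  # cells A2:A8

# CV = (S / mean) x 100%
cv_percent = stdev(data) / mean(data) * 100
print(round(cv_percent, 2))  # 47.14
```

Because the CV has no units, it can compare the relative spread of data sets measured on different scales.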
2.3.5 Z Scores
A Z score measures how many standard deviations a data value lies above or below the mean. A value whose Z score is far from zero (commonly, below -3.0 or above +3.0) is considered an extreme value or outlier.
Formula:
$$Z = \frac{X - \bar{X}}{S}$$
2.3.5.1 MS Excel Formula for Z Scores
To calculate Z scores in MS Excel we use a formula with the built-in function STDEV(), and for the mean we use AVERAGE().
Syntax:
=(number - AVERAGE(range of numbers))/STDEV(range of numbers)
Example for Z SCORES:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3.
2. In C2 type the formula =(A2-AVERAGE($A$2:$A$8))/STDEV($A$2:$A$8), press ENTER, and then copy it to the other cells (C3 to C8).
3. The answer -0.707106781 should appear in cell C2.
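The Z-score column can be reproduced in Python. This sketch computes Z for every value, like copying the formula down C2:C8, and flags values beyond +/-3 standard deviations. The cut-off of 3 is the common rule of thumb, not a fixed standard.

```python
from statistics import mean, stdev

data = [4, 5, 8, 7, 11, 4, 3]  # cells A2:A8
m, s = mean(data), stdev(data)

# Z = (X - mean) / S for every value, like copying the formula down C2:C8
z_scores = [(x - m) / s for x in data]
print([round(z, 4) for z in z_scores])

# Flag values beyond +/-3 standard deviations (none in this small data set)
outliers = [x for x, z in zip(data, z_scores) if abs(z) > 3]
print(outliers)  # []
```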
2.4 Shape
The shape of a data set represents the pattern of all the values, from the lowest to the highest value. A distribution is either symmetrical or skewed. In a symmetrical distribution, the values below the mean are distributed exactly as the values above the mean. In a skewed distribution, there is an imbalance of low values or high values.
2.4.1 Formula:
Shape influences the relationship of the mean to the median in the following ways:
Mean < Median: negative or left-skewed
Mean = Median: symmetric or zero skewness
Mean > Median: positive or right-skewed
2.4.1.1 MS Excel Function for Calculating Skewness
Returns the skewness of a distribution. Skewness characterizes the degree of asymmetry of a distribution
around its mean. Positive skewness indicates a distribution with an asymmetric tail extending toward more
positive values. Negative skewness indicates a
distribution with an asymmetric tail extending toward
more negative values.
Syntax
=SKEW(numbers)
Examples:
Example for Negative SKEWNESS:
1. Enter data into cells A2 through A8: 10, 10, 20, 30, 40, 50, 50.
2. In B2 type the formula =SKEW(A3:A8), then press ENTER.
3. The answer -0.38 should appear in cell B2. This means that the distribution of the data (A3 to A8) is negatively skewed.
4. In B4 type the formula =SKEW(A2:A8), then press ENTER.
5. The answer 0 should appear in cell B4. This means that the distribution of the data (A2 to A8) is symmetric.
6. In B6 type the formula =SKEW(A2:A7), then press ENTER.
7. The answer +0.38 should appear in cell B6. This means that the distribution of the data (A2 to A7) is positively skewed.
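The sample skewness formula documented for Excel's SKEW is n / ((n - 1)(n - 2)) x the sum of ((x - mean) / s)³, where s is the sample standard deviation. A minimal Python sketch of that formula (`excel_skew` is a hypothetical helper name) reproduces the three examples above. `math.fsum` is used so that the symmetric case cancels cleanly.

```python
import math
from statistics import mean, stdev

def excel_skew(values):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * math.fsum(((x - m) / s) ** 3 for x in values)

a = [10, 10, 20, 30, 40, 50, 50]     # cells A2:A8
print(round(excel_skew(a[1:]), 2))   # -0.38 (A3:A8)
print(round(excel_skew(a), 2))       # approximately 0 (A2:A8)
print(round(excel_skew(a[:-1]), 2))  # 0.38 (A2:A7)
```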
2.5 Assignment 2.2:
Using the data in 1.2 above, calculate or compose the range, interquartile range, variance and standard deviation, coefficient of variation, Z scores, and shape. Report your results.
2.6 Descriptive Summary of Population
The Descriptive Statistics procedure of the ToolPak add-in.
Excel Statistical Analysis Tools
Excel has several data analysis tools included through an Analysis ToolPak add-in. These tools
can quickly produce complex engineering or statistical analyses of your data. Each tool is a little
different, but all require you to input what data you wish Excel to analyze.
In Excel 2003 and earlier versions, Data Analysis is located under the Tools menu. If the option is not there, you will need to install the Analysis ToolPak.
2.6.2 Install and Use the Analysis ToolPak
Excel 2003 and earlier:
1. On the Tools menu, click Add-Ins.
2. Select the Analysis ToolPak check box.
3. On the Tools menu, click Data Analysis.
Note: If Analysis ToolPak is not listed in the Add-Ins dialog box, click Browse and locate the drive, folder name, and file name for the Analysis ToolPak add-in, Analys32.xll, usually located in the Microsoft Office\Office\Library\Analysis folder, or run the Setup program if it isn't installed.
If Data Analysis does not appear on the Tools pull-down menu, click Add-Ins and check the first two boxes (Analysis ToolPak and Analysis ToolPak - VBA). Click OK and open Data Analysis.
Excel 2007:
1. Click the Microsoft Office button in the upper left-hand corner of the Excel spreadsheet and click Excel Options in the lower right-hand corner of the pull-down menu. On the left side of the Excel Options page, click Add-Ins and then the Go button at the bottom of the page. This should open the Add-Ins dialog box.
2. Select Analysis ToolPak and Analysis ToolPak - VBA and click OK.
3. Click the Data tab and click the Data Analysis icon on the Data tab.
2.7 Box-and-Whisker Plot
In descriptive statistics, a box plot or boxplot (also known as a box-and-whisker diagram or plot) is a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (sample maximum). A boxplot may also indicate which observations, if any, might be considered outliers.
A boxplot, or box-and-whisker diagram, provides a simple graphical summary of a set of data. It shows a measure of central location (the median), two measures of dispersion (the range and interquartile range), the skewness (from the orientation of the median relative to the quartiles) and potential outliers (marked individually). Boxplots are especially useful when comparing two or more sets of data.
Regrettably, there is currently no boxplot facility in Microsoft Excel. For simplicity, many recent statistics textbooks (for example, Daly et al., 1995) omit the fences used to identify possible outliers. These simplified boxplots, displaying most of the important features, can be drawn quite easily in Excel. In the absence of any fences (see Devore and Peck (1990) for a definition), a simple rule is that a whisker which is longer than three times the length of the box probably indicates an outlier.
Create a Box-and-Whisker Plot
To create a box plot using MS Excel 2007:
1. Highlight the whole table, including figures and series labels, then select Insert from the main menu. Under Charts, select a Line chart and choose the Line with Markers option.
2. Under Chart Tools select Design > Switch Row/Column. Right-click on a data point from the first data series, and choose Format Data Series > Line Colour > No line to remove the connecting lines. Repeat for the other four data series in turn.
3. Select any of the data series and under Chart Tools select Layout > Analysis > Lines > High-Low Lines, then Layout > Analysis > Lines > Up/Down Bars > Up/Down Bars.
4. Further customising can be carried out according to your own preferences by right-clicking on the relevant object and selecting the Format option on the shortcut menu.
The Result:
[Figure: box-and-whisker plot of set1, set2 and set3 on a 0 to 90 scale, with the series Min, Q1, Median, Q3 and Max]
2.8 Assignment 2.3
Replicate the Section 2.7 box-and-whisker plot procedure.
2.9 Weighted Mean
Excel does not contain a built-in function to calculate a weighted average. It is, however, easy to do it using the SUMPRODUCT() function in a simple formula.
Weighted average
Row | A | B (Cost) | C (Staff)
4 | Grade A | 13000 | 5
5 | Grade B | 15000 | 2
6 | Grade C | 20000 | 3
8 | Average | 16000 |
9 | Wtd Avg | 15500 |
SUMPRODUCT() multiplies two arrays (or ranges) together and returns the sum of the products. In the illustration it would calculate (B4 x C4) + (B5 x C5) + (B6 x C6).
The formula in cell B9 is: =SUMPRODUCT(B4:B6,C4:C6)/SUM(C4:C6)
The result shows that the weighted average is less than the plain arithmetic mean. This is because it takes into account the larger number of staff being paid the lower salary.
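The SUMPRODUCT formula in cell B9 corresponds to a short Python expression. This sketch weights each grade's cost by its number of staff and compares the result with the plain average in B8.

```python
cost = [13000, 15000, 20000]  # B4:B6
staff = [5, 2, 3]             # C4:C6 (the weights)

# SUMPRODUCT(B4:B6, C4:C6) / SUM(C4:C6)
sumproduct = sum(c * w for c, w in zip(cost, staff))
weighted_avg = sumproduct / sum(staff)
plain_avg = sum(cost) / len(cost)  # like =AVERAGE(B4:B6)

print(plain_avg, weighted_avg)  # 16000.0 15500.0
```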
Forecast incorporating risk (columns F, G and H)
Row | F | G (Probability) | H (Sales)
16 | Good weather | 30% | 10000
17 | Mediocre weather | 50% | 8000
18 | Poor weather | 19% | 2000
19 | Hurricane | 1% | 0
 | Forecast | 100% | 7380
The weighted average can also be used for assessing risk or determining the probability of various outcomes. If a judgement is made about the likelihood of various weather conditions for an outdoor sporting event and their effect on ticket sales, a predicted value of sales can be calculated using a formula similar to the previous example. =SUMPRODUCT(G16:G19,H16:H19) returns the value of 7,380. The probability values (G16:G19) are already expressed as percentages (total = 100% or 1.0), so there is no need to divide by SUM(G16:G19).
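The risk-weighted forecast is an expected value: each outcome is multiplied by its probability and the products are summed. Because the probabilities already total 1, the sum needs no further division. The Python sketch below reproduces the 7,380 figure.

```python
probabilities = [0.30, 0.50, 0.19, 0.01]  # G16:G19
sales = [10000, 8000, 2000, 0]            # H16:H19

# Equivalent of =SUMPRODUCT(G16:G19, H16:H19)
forecast = sum(p * s for p, s in zip(probabilities, sales))
print(round(forecast))  # 7380
```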
2.10 Assignment 2.4
Capital Component | Cost | % of capital structure
Retained Earnings | 8% | 30%
Common Stocks | 9% | 10%
Preferred Stocks | 10% | 15%
Debt (Bonds) | 6.67% | 45%
Using the table above, calculate the weighted average cost of capital (WACC) of this company!
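2.11 Correlation Coefficients
2.11.1.1 Correlation Coefficient Formula
If (X1, Y1), (X2, Y2), (X3, Y3), ..., (Xn, Yn) are the observed values, then the correlation coefficient (usually denoted Corr(X, Y) or $\rho_{XY}$) of the observed sample is defined as:
$$\rho_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
Another way of visualizing the formula is:
$$\rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{S_X\,S_Y}$$
Now we generalize the idea of the sample correlation coefficient to the case where the sample is not bivariate but multivariate.
Let X~1, X~2, X~3, ..., X~n be a random sample where each X~i is a k-dimensional vector of the form X~i = (Xi1, Xi2, Xi3, ..., Xik), just like in the previous topic.
Just like in the case of the sample covariance, in the multivariate case we talk of the sample correlation coefficient matrix. Like the dispersion matrix, the sample correlation coefficient matrix is a square matrix of order k x k, defined as below:
$$R = \begin{pmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & \rho_{kk} \end{pmatrix}$$
All the diagonal entries are 1, since, both mathematically and heuristically, the correlation coefficient of any variable with itself should be 1:
$$\rho_{ii} = 1 \text{ for all } i$$
Similar to the dispersion matrix, the off-diagonal elements are the correlation coefficients of the ith and jth variables.
Or in another way:
$$\rho_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}\,s_{jj}}}, \quad i \ne j$$
where $s_{ij}$ is the sample covariance of the ith and jth variables.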
2.11.1.2 MS Excel Function for Calculating Correlation
Step 1: To make this calculation, select Tools/Data Analysis/Correlation. The following dialog box is displayed:
Step 2: In the Input Range text box, enter the range of the data (include the first row containing the variable names), or click on the data selection icon and mark the range to use.
Step 3: Notice that the Labels in First Row check box is checked.
Step 4: Click on OK and the following information will appear in a new worksheet:
 | TIME1 | TIME2
TIME1 | 1 |
TIME2 | 0.763957 | 1
The Pearson's correlation for these two variables is 0.764 (rounded).
Example 2
2.11.1.3 A second way to calculate the correlation is with a function.
Step 1: In the Example worksheet, enter some labels in column I to indicate that you are calculating a correlation.
Step 2: In cell J3 (or wherever you want it), you will enter an Excel function that will calculate the desired correlation.
Step 3: Enter the formula
=CORREL(C2:C51,D2:D51)
Note that it is of the form =CORREL(array1, array2), where the first array and second array contain the paired numbers to correlate. It is IMPORTANT that the numbers be paired correctly.
The answer will appear in the cell. In this case, the Pearson's correlation is 0.764 (rounded).
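The Pearson formula from Section 2.11.1.1 can be sketched in Python, which shows what CORREL computes. The TIME1/TIME2 data are not reproduced in this module, so the `x` and `y` lists below are hypothetical values chosen only to exercise the function. The helper `pearson_r` is also our own name.

```python
import math

def pearson_r(x, y):
    """Sum of cross-deviations divided by the square root of the
    product of the two sums of squared deviations."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data (not the TIME1/TIME2 worksheet)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))        # 0.7746
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0 (perfect positive relationship)
```

As with CORREL, the result depends on the pairing. Shuffling one list independently of the other changes the answer.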
2.11.1.4 Calculate the Correlation Using the Formula
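2.12 Covariance
For a bivariate sample we have dealt with the covariance already. Let us just recall it:
Given a random sample (X1, Y1), (X2, Y2), (X3, Y3), ..., (Xn, Yn), the sample covariance Cov(X, Y) is defined as
$$\operatorname{Cov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$
2.12.1.1 MS Excel Calculation for Covariance:
2.12.1.2 MS Excel Function for Covariance:
To calculate the covariance using an MS Excel function, we can use COVAR(array1, array2). Note that COVAR divides by n (the population covariance), so its result equals (n - 1)/n times the sample covariance defined above.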
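The two covariance conventions differ only in the divisor, which matters when comparing a hand calculation with COVAR. In the sketch below, the helper `covariance` (a hypothetical name) returns the sample version by default and the divide-by-n version, which is what COVAR reports, when `sample=False`. The data are the same hypothetical pairs used for illustration only.

```python
def covariance(x, y, sample=True):
    """Covariance of paired data. sample=True divides by n - 1 (the formula
    above); sample=False divides by n, as Excel's COVAR does."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cross / (n - 1) if sample else cross / n

# Hypothetical paired data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance(x, y))                # 1.5 (sample)
print(covariance(x, y, sample=False))  # 1.2 (population, like COVAR)
```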
2.13 Assignment 2.5
2.13.1 Calories and Fat Relationship
Calories
Fat
Product
240
8.0
260
3.5
350
22.0
Starbucks Iced Coffee Mocha Expresso (whole milk and whipped cream
350
20.0
420
16.0
510
22.0
530
19.0
2.13.2 Fuel Efficiency Calculation and Standard
a.
b.
c.
Government
Standard
Car
Owner
14.3
16.8
15.0
17.8
27.8
26.2
27.9
34.2
48.8
47.6
16.8
18.3
23.7
28.5
32.8
33.1
37.3
56.0
Compute the covariance using both techniques explained above and compare them. Explain!
Compute the coefficient of correlation using the techniques explained above.
What are your conclusions about the relationship between the Owner calculation and the Government Standard? Explain.
Practicum: MATH11002
Business Statistics
MODULE 3
Submitted only on
Day/Date: ____________ / ______________
Time: __________ WIB
In ____________________
Module Description:
Objective
Output
Date of Receipt
Score:
Assistant Signature
I hereby declare that I have done all the work in this module by myself.
Name/NIM: ______________________________ / _______________
Signature: _______________________________________________
Rem.:
Probability
The student understands and is able to define and examine basic probability concepts; define conditional, joint and marginal probability; use Bayes' theorem to revise probabilities; explain statistical independence; address the probability distribution of a discrete random variable; define covariance and discuss its application in finance; compute probabilities from the binomial, Poisson and hypergeometric distributions; and use these distributions to solve business problems using MS Excel, regression analysis or other statistical software.
The report produced by the students should present the working procedures and results in both soft copy and hard copy.
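PROBABILITY
3.1 Basic Probability
Probability: the chance that an uncertain event will occur (always between 0 and 1)
Event: each possible type of occurrence or outcome
Simple Event: an event that can be described by a single characteristic
Sample Space: the collection of all possible events
There are three approaches to assessing the probability of an uncertain event:
1. A priori classical probability: the probability of an event is based on prior knowledge of the process involved.
$$P(\text{occurrence}) = \frac{X}{T} = \frac{\text{number of ways the event can occur}}{\text{total number of elementary outcomes}}$$
Example: Find the probability of selecting a face card (Jack, Queen, or King) from a standard deck of 52 cards. Answer: there are 3 face cards in each of the 4 suits, so P(face card) = 12/52 = 3/13 = 0.231.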
2. Empirical classical probability: the probability of an event is based on observed data.
$$P(\text{occurrence}) = \frac{\text{number of favorable outcomes observed}}{\text{total number of outcomes observed}}$$
Example: Find the probability of selecting a male taking statistics from the population described in the following table:
 | Taking Stats | Not Taking Stats | Total
Male | 84 | 145 | 229
Female | 76 | 134 | 210
Total | 160 | 279 | 439
P(Male and Taking Stats) = 84/439 = 0.191
3. Subjective probability: the probability of an event is determined by an individual, based on that person's past experience, personal opinion, and/or analysis of a particular situation.
3.2 Sample Spaces and Events, Contingency Tables, Simple Probability and Joint Probability
3.2.1 Sample Space
The sample space is the collection of all possible events.
Ex. All 6 faces of a die
Ex. All 52 cards in a deck of cards
Ex. All possible outcomes when having a child: boy or girl
3.2.2 Events in a Sample Space
Simple event
An outcome from a sample space with one characteristic
ex. A red card from a deck of cards
Complement of an event A (denoted A')
All outcomes that are not part of event A
ex. All cards that are not diamonds
Joint event
Involves two or more characteristics simultaneously
ex. An ace that is also red from a deck of cards
In mathematics, the probability of an event A is represented by a real number in the range from 0 to 1 and written as P(A), p(A) or Pr(A). An impossible event has a probability of 0, and a certain event has a probability of 1. The opposite or complement of an event A is the event [not A] (that is, the event of A not occurring); its probability is given by P(not A) = 1 - P(A). As an example, the chance of not rolling a six on a six-sided die is 1 - (chance of rolling a six) = 1 - 1/6 = 5/6.
3.2.3 Simple and Joint Probability
Simple (marginal) probability refers to the probability of a simple event.
ex. P(King)
Joint probability refers to the probability of an occurrence of two or more events.
ex. P(King and Spade)
If both the events A and B occur on a single performance of an experiment, this is called the intersection or joint probability of A and B, denoted as $P(A \cap B)$. If two events A and B are independent, then the joint probability is
$$P(A \cap B) = P(A)\,P(B)$$
For example, if two coins are flipped, the chance of both being heads is $\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.
If either event A or event B or both events occur on a single performance of an experiment, this is called the union of the events A and B, denoted as $P(A \cup B)$. If two events are mutually exclusive, then the probability of either occurring is
$$P(A \cup B) = P(A) + P(B)$$
For example, the chance of rolling a 1 or 2 on a six-sided die is $P(1 \cup 2) = P(1) + P(2) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$.
If the events are not mutually exclusive, then
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
For example, when drawing a single card at random from a regular deck of cards, the chance of getting a heart or a face card (J, Q, K) (or one that is both) is $\frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{22}{52} = \frac{11}{26}$. Of the 52 cards in a deck, 13 are hearts, 12 are face cards, and 3 are both. The "3 that are both" are included in each of the "13 hearts" and the "12 face cards", but they should only be counted once.
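Conditional probability is the probability of some event A, given the occurrence of some other event B. Conditional probability is written P(A|B), and is read "the probability of A, given B". It is defined by
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
If P(B) = 0, then P(A|B) is undefined.
Summary of probabilities
Event | Probability
A | $P(A) \in [0, 1]$
not A | $P(A') = 1 - P(A)$
A or B | $P(A \cup B) = P(A) + P(B) - P(A \cap B)$; $P(A \cup B) = P(A) + P(B)$ if A and B are mutually exclusive
A and B | $P(A \cap B) = P(A \mid B)\,P(B)$; $P(A \cap B) = P(A)\,P(B)$ if A and B are independent
A given B | $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$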
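3.3 Bayes' Theorem
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2) + \cdots + P(A \mid B_k)\,P(B_k)}$$
where:
B_i = ith event of k mutually exclusive and collectively exhaustive events
A = new event that might impact P(B_i)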
Bayes' Theorem Example
A drilling company has estimated a 40% chance of striking oil for their new well. A detailed test has been scheduled for more information. Historically, 60% of successful wells have had detailed tests, and 20% of unsuccessful wells have had detailed tests. Given that this well has been scheduled for a detailed test, what is the probability that the well will be successful?
Solution:
Let S = successful well and U = unsuccessful well
P(S) = .4, P(U) = .6 (prior probabilities)
Define the detailed test event as D
Conditional probabilities: P(D|S) = .6 and P(D|U) = .2
$$P(S \mid D) = \frac{P(D \mid S)\,P(S)}{P(D \mid S)\,P(S) + P(D \mid U)\,P(U)} = \frac{(.6)(.4)}{(.6)(.4) + (.2)(.6)} = \frac{.24}{.24 + .12} = .667$$
Given the detailed test, the revised probability of a successful well has risen to .667 from the original estimate of .4.
Event | Prior Prob. | Conditional Prob. | Joint Prob. | Revised Prob.
S (successful) | .4 | .6 | .4*.6 = .24 | .24/.36 = .667
U (unsuccessful) | .6 | .2 | .6*.2 = .12 | .12/.36 = .333
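The table above (prior x conditional = joint, then each joint divided by the total) is exactly what the small Python sketch below does. The helper `bayes_posterior` is a hypothetical name. It accepts any number of mutually exclusive, collectively exhaustive events.

```python
def bayes_posterior(priors, likelihoods):
    """Revised probabilities: joint = prior * conditional,
    then each joint divided by the sum of all joints."""
    joints = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joints)  # P(D) = .24 + .12 = .36 in the drilling example
    return [j / total for j in joints]

# Events S and U: priors P(S), P(U); likelihoods P(D|S), P(D|U)
posterior = bayes_posterior([0.4, 0.6], [0.6, 0.2])
print([round(p, 3) for p in posterior])  # [0.667, 0.333]
```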
3.4 Assignment 3.1
Create the entries as in the screenshot below, or use the Probability.xls workbook file from the CD companion of the Statistics for Managers Using Microsoft Excel textbook.
Input the data only into the blue-colored cells.
Probabilities
Sample Space
Row Variable \ Column Variable | B | B' | Totals
A | 200 | 50 | 250
A' | 100 | 650 | 750
Totals | 300 | 700 | 1000
Simple Probabilities
P(A) = 0.25 | P(A') = 0.75 | P(B) = 0.30 | P(B') = 0.70
Joint Probabilities
P(A and B) = 0.20 | P(A and B') = 0.05 | P(A' and B) = 0.10 | P(A' and B') = 0.65
Addition Rule
P(A or B) = 0.35 | P(A or B') = 0.90 | P(A' or B) = 0.95 | P(A' or B') = 0.80
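The same simple, joint and addition-rule probabilities can be derived from the four cell counts in a few lines of Python. This mirrors what the worksheet computes. The dictionary layout and the helper names (`p_row`, `p_col`, `p_and`, `p_or`) are our own.

```python
# Counts from the Sample Space table: rows A, A'; columns B, B'
counts = {
    ("A", "B"): 200, ("A", "B'"): 50,
    ("A'", "B"): 100, ("A'", "B'"): 650,
}
total = sum(counts.values())  # 1000

def p_row(r):
    """Simple probability of a row event, e.g. P(A)."""
    return sum(v for (row, _), v in counts.items() if row == r) / total

def p_col(c):
    """Simple probability of a column event, e.g. P(B)."""
    return sum(v for (_, col), v in counts.items() if col == c) / total

def p_and(r, c):
    """Joint probability, e.g. P(A and B)."""
    return counts[(r, c)] / total

def p_or(r, c):
    """General addition rule: P(R or C) = P(R) + P(C) - P(R and C)."""
    return p_row(r) + p_col(c) - p_and(r, c)

print(p_row("A"), p_col("B"))      # 0.25 0.3
print(p_and("A", "B"))             # 0.2
print(round(p_or("A", "B"), 2))    # 0.35
print(round(p_or("A'", "B'"), 2))  # 0.8
```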
1. A music store has been visited, at random times, by 7 customers who bought some goods and 9 others who were just window shopping. Achmad (a customer) arrived at 11:30 am.
a. Give an example of a simple event.
b. What is the complement of a customer having bought some goods?
2. Given the following contingency table:
 | B | B'
A | 12 | 48
A' | 30 | 54
Use a calculator and MS Excel to find the probability of
a. Event A
b. Event A and B
c. Event A and B
d. Event A and B
3. Compare the calculation results (calculator and MS Excel).
4. A box of nine gloves contains two left-handed gloves and seven right-handed gloves.
a. If two gloves are randomly selected from the box without replacement, what is the probability that both gloves selected will be right-handed?
b. If two gloves are randomly selected from the box without replacement, what is the probability that there will be one right-handed and one left-handed glove?
c. If three gloves are selected from the box with replacement, what is the probability that all three gloves will be left-handed?
d. If you were sampling with replacement, what would be the answers to (a) and (b)?
5. An advertising executive is studying the television viewing habits of married men and women during prime-time hours. Based on past viewing records, the executive has determined that during prime time, husbands are watching television 60% of the time. When the husband is watching television, 40% of the time the wife is also watching. When the husband is not watching television, 30% of the time the wife is watching television. Find the probability that
a. if the wife is watching television, the husband is also watching television;
b. the wife is watching television in prime time.
3.5 Basic Probability Rules
A random variable represents a possible numerical value from an uncertain event.
Discrete random variables produce outcomes that come from a counting process (e.g., the number of classes you are taking).
Continuous random variables produce outcomes that come from a measurement (e.g., your annual salary or your weight).
3.5.1 Discrete Random Variable
A probability distribution for a discrete random variable is a mutually exclusive listing of all possible numerical outcomes for that variable and a particular probability of occurrence associated with each outcome.
Number of Classes Taken | Probability
2 | 0.20
3 | 0.40
4 | 0.24
5 | 0.16
Example: An experiment of tossing 2 coins. Let X = number of heads.
X Value | Probability
0 | 1/4 = .25
1 | 2/4 = .50
2 | 1/4 = .25
3.5.2 Discrete Random Variables: Expected Value
Expected value (or mean) of a discrete distribution (weighted average):
\( E(X) = \sum_{i=1}^{N} X_i P(X_i) \)
Example: Toss 2 coins, X = number of heads.
Compute the expected value of X:
E(X) = (0)(.25) + (1)(.50) + (2)(.25) = 1.0
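The weighted-average formula can be checked with a short Python sketch (Python is an assumption here; the module itself works with a calculator and Excel). The dictionaries hold the two distributions tabulated in 3.5.1, and the helper name `expected_value` is illustrative.

```python
def expected_value(dist):
    """Return E(X) = sum of X_i * P(X_i) for a dict {outcome: probability}."""
    return sum(x * p for x, p in dist.items())

# X = number of heads when two coins are tossed
coins = {0: 0.25, 1: 0.50, 2: 0.25}
# Number of classes taken, from the first table in 3.5.1
classes = {2: 0.20, 3: 0.40, 4: 0.24, 5: 0.16}

print(expected_value(coins))              # 1.0
print(round(expected_value(classes), 2))  # 3.36
```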
3.5.3 Discrete Random Variables: Dispersion
Variance of a discrete random variable:
\( \sigma^2 = \sum_{i=1}^{N} [X_i - E(X)]^2 P(X_i) \)
Standard deviation of a discrete random variable:
\( \sigma = \sqrt{\sum_{i=1}^{N} [X_i - E(X)]^2 P(X_i)} \)
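where:
E(X) = expected value of the discrete random variable X
X_i = the ith outcome of X
P(X_i) = probability of the ith occurrence of X
Example: Toss 2 coins, X = number of heads; compute the standard deviation (recall that E(X) = 1):
\( \sigma = \sqrt{(0-1)^2(.25) + (1-1)^2(.50) + (2-1)^2(.25)} = \sqrt{.50} = .707 \)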
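A minimal Python sketch of the variance and standard deviation formulas, applied to the two-coin example (the function name `variance` is illustrative, not from the module):

```python
import math

def variance(dist):
    """Return sigma^2 = sum of [X_i - E(X)]^2 * P(X_i) for a dict {outcome: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** 2 * p for x, p in dist.items())

coins = {0: 0.25, 1: 0.50, 2: 0.25}  # X = number of heads in two tosses
var_x = variance(coins)
sd_x = math.sqrt(var_x)              # standard deviation is the square root of the variance
print(var_x, round(sd_x, 4))         # 0.5 0.7071
```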
3.5.4 Covariance
The covariance measures the strength of the linear relationship between two numerical random variables X and Y. A positive covariance indicates a positive relationship. A negative covariance indicates a negative relationship.
Covariance formula:
\( \sigma_{XY} = \sum_{i=1}^{N} [X_i - E(X)][Y_i - E(Y)] P(X_i Y_i) \)
where:
X = discrete variable X
X_i = the ith outcome of X
Y = discrete variable Y
Y_i = the ith outcome of Y
P(X_i Y_i) = probability of occurrence of the condition affecting the ith outcome of X and the ith outcome of Y
Example:
Consider the return per $1000 for two types of investments:
P(X_i Y_i)   Economic Condition   Passive Fund X   Aggressive Fund Y
0.2          Recession            -$25             -$200
0.5          Stable Economy       +$50             +$60
0.3          Expanding Economy    +$100            +$350
Investment Returns: The Mean
E(X) = μ_X = (-25)(.2) + (50)(.5) + (100)(.3) = 50
E(Y) = μ_Y = (-200)(.2) + (60)(.5) + (350)(.3) = 95
Interpretation: Fund X is averaging a $50.00 return and fund Y is averaging a $95.00 return per $1000 invested.
Investment Returns: Standard Deviation
\( \sigma_X = \sqrt{(-25-50)^2(.2) + (50-50)^2(.5) + (100-50)^2(.3)} = 43.30 \)
\( \sigma_Y = \sqrt{(-200-95)^2(.2) + (60-95)^2(.5) + (350-95)^2(.3)} = 193.71 \)
Interpretation: Even though fund Y has a higher average return, it is subject to much more variability, and the probability of loss is higher.
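Investment Returns: Covariance
\( \sigma_{XY} = (-25-50)(-200-95)(.2) + (50-50)(60-95)(.5) + (100-50)(350-95)(.3) = 8250 \)
Interpretation: Since the covariance is large and positive, there is a positive relationship between the two investment funds, meaning that they will likely rise and fall together.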
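The three investment measures (means, standard deviations, covariance) can be reproduced in one pass with a Python sketch. Each scenario tuple stores (probability, return of fund X, return of fund Y) per $1000, taken from the table above; the variable names are illustrative.

```python
import math

# (probability, return of passive fund X, return of aggressive fund Y) per $1000
scenarios = [
    (0.2, -25, -200),  # recession
    (0.5, 50, 60),     # stable economy
    (0.3, 100, 350),   # expanding economy
]

mean_x = sum(p * x for p, x, _ in scenarios)
mean_y = sum(p * y for p, _, y in scenarios)
sd_x = math.sqrt(sum(p * (x - mean_x) ** 2 for p, x, _ in scenarios))
sd_y = math.sqrt(sum(p * (y - mean_y) ** 2 for p, _, y in scenarios))
cov_xy = sum(p * (x - mean_x) * (y - mean_y) for p, x, y in scenarios)

print(round(mean_x, 2), round(mean_y, 2))  # 50.0 95.0
print(round(sd_x, 2), round(sd_y, 2))      # 43.3 193.71
print(round(cov_xy, 2))                    # 8250.0
```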
3.5.5 The Sum of Two Random Variables: Measures
Expected value: \( E(X+Y) = E(X) + E(Y) \)
Variance: \( \mathrm{Var}(X+Y) = \sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y + 2\sigma_{XY} \)
Standard deviation: \( \sigma_{X+Y} = \sqrt{\sigma^2_{X+Y}} \)
Example: Portfolio Expected Return and Expected Risk
Investment portfolios usually contain several different funds (random variables).
The expected return and standard deviation of two funds together can now be calculated.
Investment objective: maximize return (mean) while minimizing risk (standard deviation).
Recall: Investment X: E(X) = 50, σ_X = 43.30
Investment Y: E(Y) = 95, σ_Y = 193.71
σ_XY = 8250
Suppose 40% of the portfolio is in Investment X and 60% is in Investment Y:
E(P) = .4(50) + (.6)(95) = 77
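The text stops at the expected return E(P). A Python sketch can also estimate the portfolio risk by applying the variance rule for a sum of random variables (3.5.5) to weighted funds: σ_P = sqrt(w²σ_X² + (1 - w)²σ_Y² + 2w(1 - w)σ_XY), where w is the weight in fund X. This weighted form is the standard extension of that rule rather than a formula printed in this module, and the function name `portfolio` is hypothetical.

```python
import math

def portfolio(w, mean_x, mean_y, sd_x, sd_y, cov_xy):
    """Expected return and risk (SD) with weight w in fund X and 1 - w in fund Y."""
    expected = w * mean_x + (1 - w) * mean_y
    var = w ** 2 * sd_x ** 2 + (1 - w) ** 2 * sd_y ** 2 + 2 * w * (1 - w) * cov_xy
    return expected, math.sqrt(var)

# Values recalled from the investment example above
e_p, sd_p = portfolio(0.4, 50, 95, 43.30, 193.71, 8250)
print(round(e_p, 2), round(sd_p, 2))
```

With a 40/60 split this gives an expected return of about 77 and a risk of about 133.3; trying other weights shows the return-versus-risk trade-off.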
3.6 Binomial Distribution
3.6.1 Properties
A fixed number of observations, n
   e.g., 15 tosses of a coin; ten light bulbs taken from a warehouse
Two mutually exclusive and collectively exhaustive categories
   e.g., head or tail in each toss of a coin; defective or not defective light bulb; having a boy or a girl
   Generally called "success" and "failure"
   Probability of success is p, probability of failure is 1 - p
Constant probability for each observation
   e.g., the probability of getting a tail is the same each time we toss the coin
Observations are independent
   The outcome of one observation does not affect the outcome of the other
Two sampling methods
   Infinite population without replacement
   Finite population with replacement
The number of combinations of selecting X objects out of n objects is:
\( \binom{n}{X} = {}_nC_X = \frac{n!}{X!(n-X)!} \)
where:
n! = n(n - 1)(n - 2)...(2)(1)
X! = X(X - 1)(X - 2)...(2)(1)
0! = 1 (by definition)
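The combinations formula translates directly into Python's factorial function; this is a sketch, and `combinations` is an illustrative name. Python 3.8+ also provides `math.comb(n, x)`, and Excel offers COMBIN(n, x) for the same count.

```python
import math

def combinations(n, x):
    """Number of ways to select x objects out of n: n! / (x! (n - x)!)."""
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))

print(combinations(5, 1))   # 5
print(combinations(10, 3))  # 120
print(combinations(4, 0))   # 1, because 0! = 1
```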
3.6.2 The Binomial Distribution Formula
\( P(X) = \frac{n!}{X!(n-X)!} p^X (1-p)^{n-X} \)
P(X) = probability of X successes in n trials, with probability of success p on each trial
X = number of successes in sample (X = 0, 1, 2, ..., n)
n = sample size (number of trials or observations)
p = probability of success
Example: What is the probability of one success in five observations if the probability of success is .1? X = 1, n = 5, and p = .1
\( P(X=1) = \frac{n!}{X!(n-X)!} p^X (1-p)^{n-X} = \frac{5!}{1!(5-1)!}(.1)^1(1-.1)^{5-1} = (5)(.1)(.9)^4 = .32805 \)
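A Python sketch of the binomial formula, reproducing the worked example (the function name `binomial_pmf` is illustrative). In Excel 2007 the same value comes from BINOMDIST(1, 5, 0.1, FALSE).

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

print(round(binomial_pmf(1, 5, 0.1), 5))  # 0.32805
```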
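3.6.3 The Shape and Characteristics
The shape of the binomial distribution depends on the values of p and n.
Mean: \( \mu = E(X) = np \)
Variance and standard deviation: \( \sigma^2 = np(1-p) \) and \( \sigma = \sqrt{np(1-p)} \)
For n = 5 and p = .1: \( \mu = np = (5)(.1) = 0.5 \) and \( \sigma = \sqrt{np(1-p)} = \sqrt{(5)(.1)(1-.1)} = 0.6708 \)
For n = 5 and p = .5: \( \mu = np = (5)(.5) = 2.5 \) and \( \sigma = \sqrt{np(1-p)} = \sqrt{(5)(.5)(1-.5)} = 1.118 \)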
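The mean and standard deviation shortcuts can be checked for both cases above with a small Python sketch (`binomial_mean_sd` is an illustrative name):

```python
import math

def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p)) of a binomial distribution."""
    return n * p, math.sqrt(n * p * (1 - p))

for p in (0.1, 0.5):
    mean, sd = binomial_mean_sd(5, p)
    print(p, round(mean, 4), round(sd, 4))
# 0.1 0.5 0.6708
# 0.5 2.5 1.118
```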
3.7 Poisson Distribution
An area of opportunity is a continuous unit or interval of time, volume, or such area in which more than one occurrence of an event can occur.
   e.g., the number of scratches in a car's paint
   e.g., the number of mosquito bites on a person
   e.g., the number of computer crashes in a day
3.7.1 Properties
Count the number of times an event occurs in a given area of opportunity.
The probability that an event occurs in one area of opportunity is the same for all areas of opportunity.
The number of events that occur in one area of opportunity is independent of the number of events that occur in the other areas of opportunity.
The probability that two or more events occur in an area of opportunity approaches zero as the area of opportunity becomes smaller.
The average number of events per unit is λ (lambda).
3.7.2 Formula
\( P(X) = \frac{e^{-\lambda}\lambda^X}{X!} \)
where:
P(X) = the probability of X events in an area of opportunity
λ = expected number of events
e = mathematical constant approximated by 2.71828
Suppose that, on average, 5 cars enter a parking lot per minute. What is the probability that in a given minute, 7 cars will enter? So, X = 7 and λ = 5.
\( P(7) = \frac{e^{-\lambda}\lambda^X}{X!} = \frac{e^{-5}5^7}{7!} = 0.104 \)
So, there is a 10.4% chance that 7 cars will enter the parking lot in a given minute.
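A Python sketch of the Poisson formula for the parking-lot example (`poisson_pmf` is an illustrative name). In Excel 2007, POISSON(7, 5, FALSE) returns the same probability.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lambda) * lambda^x / x! for a Poisson distribution."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

print(round(poisson_pmf(7, 5), 3))  # 0.104
```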
3.7.3 Shape
Poisson probabilities for λ = 0.5:
X      0        1        2        3        4        5        6        7
P(X)   0.6065   0.3033   0.0758   0.0126   0.0016   0.0002   0.0000   0.0000
For example, P(X = 2) = .0758.
[Figure: bar chart of P(X) against X for λ = 0.5]
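The λ = 0.5 table above can be regenerated with a short Python sketch, which also makes it easy to see how the shape changes for other values of λ (`poisson_table` is an illustrative name):

```python
import math

def poisson_table(lam, max_x):
    """Return [P(0), P(1), ..., P(max_x)] for a Poisson distribution with mean lam."""
    return [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(max_x + 1)]

table = poisson_table(0.5, 7)
for x, p in enumerate(table):
    print(x, f"{p:.4f}")
```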
3.8 Hypergeometric Distribution
The binomial distribution is applicable when selecting from a finite population with replacement or from an infinite population without replacement.
The hypergeometric distribution is applicable when selecting from a finite population without replacement.
   n trials in a sample taken from a finite population of size N
   Sample taken without replacement
   Outcomes of trials are dependent
   Concerned with finding the probability of X successes in the sample where there are A successes in the population
3.8.1 Formula
\( P(X) = \frac{\binom{A}{X}\binom{N-A}{n-X}}{\binom{N}{n}} \)
where:
N = population size
A = number of successes in the population
N - A = number of failures in the population
n = sample size
X = number of successes in the sample
n - X = number of failures in the sample
The mean of the hypergeometric distribution is: \( \mu = E(X) = \frac{nA}{N} \)
The standard deviation is: \( \sigma = \sqrt{\frac{nA(N-A)}{N^2}} \cdot \sqrt{\frac{N-n}{N-1}} \)
where \( \sqrt{\frac{N-n}{N-1}} \) is called the finite population correction factor from sampling without replacement from a finite population.
3.8.2 Example
Three different computers are checked from 10 in the department. 4 of the 10 computers have illegal software loaded. What is the probability that 2 of the 3 selected computers have illegal software loaded?
So, N = 10, n = 3, A = 4, X = 2
\( P(X=2) = \frac{\binom{A}{X}\binom{N-A}{n-X}}{\binom{N}{n}} = \frac{\binom{4}{2}\binom{6}{1}}{\binom{10}{3}} = \frac{(6)(6)}{120} = 0.3 \)
The probability that 2 of the 3 selected computers have illegal software loaded is .30, or 30%.
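A Python sketch of the hypergeometric probability, mean, and standard deviation, applied to the computer example (function names are illustrative). In Excel 2007, HYPGEOMDIST(2, 3, 4, 10) returns the same probability.

```python
import math

def hypergeometric_pmf(x, N, n, A):
    """P(X = x) when n items are drawn without replacement from N items containing A successes."""
    return math.comb(A, x) * math.comb(N - A, n - x) / math.comb(N, n)

def hypergeometric_mean_sd(N, n, A):
    mean = n * A / N
    fpc = math.sqrt((N - n) / (N - 1))  # finite population correction factor
    sd = math.sqrt(n * A * (N - A) / N ** 2) * fpc
    return mean, sd

print(hypergeometric_pmf(2, N=10, n=3, A=4))  # 0.3
print(hypergeometric_mean_sd(N=10, n=3, A=4))
```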
3.9 Read Excel Companion to Chapter 5
Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, pages 211-215.
3.10 Assignment 3.2
1. Problems for Section 5.1 Number 5.2 and 5.4
2. Problems for Section 5.2 Number 5.14
3. Problems for Section 5.3 Number 5.24 and 5.28
4. Problems for Section 5.4 Number 5.34 and 5.42
5. Problems for Section 5.5 Number 5.46 and 5.50
3.11 Assignment 3.3
1. Problem Section 6.2 No. 6.2 and No. 6.7
2. Problem Section 6.3 No. 6.14, 6.15, and No. 6.16
3. Problem Section 6.4 No. 6.24, 6.25, and No. 6.26
4. Problem Section 6.5 No. 6.35 and No. 6.36
Practicum: Math11002
Business Statistics
MODULE 4
Submitted only on
Day/Date: ____________/______________
Time: 12.00-14.00 WIB
In ____________________
Module Description:
Objective
Output
Date of Receipt
Score:
Assistant Signature
I hereby declare that I have done all of this module by myself.
Name/NIM: ______________________________/_______________
Signature: _______________________________________________
Rem.:
NORMAL AND SAMPLING DISTRIBUTION
Define continuous distributions: normal, uniform, and exponential. Compute probabilities using formulas and tables. Understand the concept of the sampling distribution and the importance of the Central Limit Theorem. Examine when to apply different distributions.
Use separate papers to report your results (in handwriting or computer printout). A report produced by the students should be in the form of working procedures and results in both soft copy and hard copy.
4 NORMAL AND SAMPLING DISTRIBUTION
4.1 Normal Distribution and Evaluating Normality
The normal distribution, or Gaussian distribution, is a continuous probability distribution that describes data that cluster around the mean. The normal distribution has several theoretical properties:
Bell-shaped in its appearance
Measures of central tendency (mean, median, and mode) are equal
Interquartile range is approximately equal to 1.33 standard deviations
Infinite range
The normal distribution can be used to describe, at least approximately, any variable that tends to cluster around the mean. For example, the heights of adult males in Indonesia are roughly normally distributed, with a mean of about 160 cm. Most men have a height close to the mean, though a small number of outliers have a height significantly above or below the mean. A histogram of male heights will appear similar to a bell curve, with the correspondence becoming closer if more data are used.
Figure 4-1 Normal Distribution
Source: http://upload.wikimedia.org/wikipedia/commons/b/bb/Normal_distribution_and_scales.gif
By the central limit theorem, the sum of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed. For this reason, the normal distribution is used throughout statistics, natural science, and social science as a simple model for complex phenomena. For example, the observational error in an experiment is usually assumed to follow a normal distribution, and the propagation of uncertainty is computed using this assumption.
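4.1.1 Normal Probability Density Function
\( f(X) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(1/2)[(X-\mu)/\sigma]^2} \)
where e ≈ 2.71828, π ≈ 3.14159, μ is the population mean, σ is the population standard deviation, and X is any value of the continuous variable.
The Z value is equal to the difference between X and the mean, μ, divided by the standard deviation, σ:
\( Z = \frac{X - \mu}{\sigma} \)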
4.1.1.2 Probability and the Normal Curve
The normal distribution is a continuous probability distribution. This has several implications for probability.
The total area under the normal curve is equal to 1.
The probability that a normal random variable X equals any particular value is 0.
The probability that X is greater than a equals the area under the normal curve bounded by a and plus infinity (as indicated by the non-shaded area in the figure below).
The probability that X is less than a equals the area under the normal curve bounded by a and minus infinity (as indicated by the shaded area in the figure below).
The standardized normal probability density function is given by the equation:
\( f(Z) = \frac{1}{\sqrt{2\pi}} e^{-(1/2)Z^2} \)
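The areas described above can be computed with Python's `statistics.NormalDist` (Python 3.8+), sketched here as an alternative to tables or Excel's NORMDIST and NORMSDIST. The numbers are hypothetical: μ = 160 cm echoes the height example in 4.1, while σ = 7 cm and the cutoff a = 170 cm are assumed for illustration only.

```python
from statistics import NormalDist

mu, sigma = 160.0, 7.0  # hypothetical: mean height 160 cm, SD assumed to be 7 cm
a = 170.0               # hypothetical cutoff

z = (a - mu) / sigma                   # transformation formula Z = (X - mu) / sigma
p_below = NormalDist(mu, sigma).cdf(a)  # P(X < a): area from minus infinity to a
p_above = 1 - p_below                   # P(X > a): area from a to plus infinity
p_below_std = NormalDist().cdf(z)       # same area read from the standardized normal

print(round(z, 4), round(p_below, 4), round(p_above, 4))
```

Both routes give the same area, which is why a single standardized normal table is enough for any normal distribution.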
Additionally, every normal curve (regardless of its mean or standard deviation) conforms to the following "rule":
About 68% of the area under the curve falls within 1 standard deviation of the mean.
About 95% of the area under the curve falls within 2 standard deviations of the mean.
About 99.7% of the area under the curve falls within 3 standard deviations of the mean.
Collectively, these points are known as the empirical rule or the 68-95-99.7 rule. Clearly, given a normal distribution, most outcomes will be within 3 standard deviations of the mean.
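The empirical-rule percentages come straight from the standardized normal distribution, as this Python sketch shows using `statistics.NormalDist`. It also prints the interquartile range in standard-deviation units. The exact figure is about 1.349, so the 1.33 quoted in this module is a rounded rule of thumb.

```python
from statistics import NormalDist

z = NormalDist()  # standardized normal: mean 0, standard deviation 1

for k in (1, 2, 3):
    area = z.cdf(k) - z.cdf(-k)  # area within k standard deviations of the mean
    print(k, round(area, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

iqr_in_sd = z.inv_cdf(0.75) - z.inv_cdf(0.25)  # Q3 - Q1 in standard-deviation units
print(round(iqr_in_sd, 3))  # 1.349
```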
To see how the transformation formula is applied, see pages 222-232, Chapter 6 "The Normal Distribution," of Levine, et al. 2008. Statistics for Managers Using Microsoft Excel, Fifth Edition.
4.1.2 Evaluating Normality
4.1.2.1 Compare Data Characteristics to Theoretical Properties of the Normal Distribution
The normal distribution is:
Symmetrical: mean and median are equal
Bell-shaped: the empirical rule applies
Interquartile range = 1.33 standard deviations
How to compare:
1. Construct charts and observe their appearance. For small or moderate data sets, construct a stem-and-leaf display or a box-and-whisker plot. For large data sets, construct the frequency distribution and plot the histogram or polygon.
2. Compute descriptive numerical measures and compare the characteristics of the data with the theoretical properties of the normal distribution. Compare the mean and median. The interquartile range should be approximately 1.33 times the standard deviation. The range should be approximately 6 times the standard deviation.
3. Evaluate how the values in the data are distributed. Determine whether approximately 2/3 of the values lie between the mean ± 1 standard deviation. Determine whether approximately 4/5 of the values lie between the mean ± 1.28 standard deviations. Determine whether approximately 19 out of every 20 values lie between the mean ± 2 standard deviations.
4. Example:
3-Year Return
Mean                        17.8
Standard Error              0.17099
Median                      17.2
Mode                        15.1
Standard Deviation          4.94991
Sample Variance             24.5016
Kurtosis                    1.03812
Skewness                    0.66073
Range                       35.6
Minimum                     6.7
Maximum                     42.3
Sum                         14916.4
Count                       838
Largest(1)                  42.3
Smallest(1)                 6.7
Confidence Level (95.0%)    0.33562
[Figure: box-and-whisker plot of 3-Year Return, axis marked 10 to 40]
1. The mean (17.8) is slightly higher than the median (17.2). {Normal dist.: mean = median}
2. The box-and-whisker plot is right-skewed, with a maximum outlier of 42.3. {Normal dist.: symmetrical}
3. The interquartile range of 7.0 is approximately 1.41 standard deviations (SD). {Normal dist.: 1.33 SD}
4. The range of 35.6 is equal to 7.19 SD. {Normal dist.: 6 SD}
5. 74.2% of the returns are within ±1 SD of the mean. {Normal dist.: 68.26%}
6. 83.3% of the returns are within ±1.28 SD of the mean. {Normal dist.: 80%}
Thus, based on the facts above, we conclude that the three-year returns are right-skewed and not normally distributed.
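The numerical checks in steps 2 and 3 can be bundled into one function. This is a Python sketch, not the PHStat procedure. It runs on hypothetical data because the 838 three-year returns are not listed here. Note that `statistics.quantiles` uses the "exclusive" quartile method by default, so quartiles may differ slightly from Excel's QUARTILE.

```python
import statistics

def normality_checks(data):
    """Compare a sample with the theoretical properties of a normal distribution."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)                  # sample standard deviation
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (Python 3.8+)

    def share_within(k):
        # fraction of values between mean - k*sd and mean + k*sd
        return sum(abs(x - mean) <= k * sd for x in data) / len(data)

    return {
        "mean_minus_median": mean - median,            # about 0 if normal
        "iqr_over_sd": (q3 - q1) / sd,                 # about 1.33
        "range_over_sd": (max(data) - min(data)) / sd,  # about 6
        "within_1sd": share_within(1),                 # about 2/3
        "within_1.28sd": share_within(1.28),           # about 4/5
        "within_2sd": share_within(2),                 # about 19/20
    }

# Hypothetical sample for illustration only
sample = [12.1, 14.3, 15.1, 15.1, 16.0, 16.8, 17.2, 17.9, 18.5, 19.4, 21.0, 24.7, 29.3]
for name, value in normality_checks(sample).items():
    print(f"{name}: {value:.3f}")
```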
4.1.2.2 Construct a Normal Probability Plot
A normal probability plot is a graphical approach for evaluating whether data are normally distributed. A common approach is called the quantile-quantile plot. A normal probability plot for data from a normal distribution will be approximately linear. To compute normal probabilities and create plots, we can use PHStat as described in the Excel Companion to Chapter 6 of Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, pages 247-249.
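The idea behind a quantile-quantile plot can be sketched in Python: sort the data and pair each value with the standardized normal quantile at its plotting position. The plotting position i/(n + 1) used here is one common choice, assumed for illustration. Other texts use (i - 0.5)/n, and PHStat's exact convention is not stated in this module. Plotting the pairs with Z on the horizontal axis should give a roughly straight line for normal data.

```python
from statistics import NormalDist

def normal_probability_points(data):
    """Pair each ordered value with a standardized normal quantile for a Q-Q plot.

    Uses the plotting position i / (n + 1); other texts use (i - 0.5) / n.
    """
    ordered = sorted(data)
    n = len(ordered)
    z = NormalDist()
    return [(z.inv_cdf(i / (n + 1)), x) for i, x in enumerate(ordered, start=1)]

# Hypothetical data; plot the pairs with Z on the horizontal axis
for z_value, x in normal_probability_points([52, 47, 50, 55, 44, 49, 51]):
    print(f"{z_value:6.3f}  {x}")
```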
4.2 Sampling and Sampling Distribution
Read the Excel Companion to Chapter 7 of Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, pages 281-282.
4.2.1 Sample
Selecting a sample is less time-consuming than selecting every item in the population (census).
Selecting a sample is less costly than selecting every item in the population.
An analysis of a sample is less cumbersome and more practical than an analysis of the entire population.
Figure 4-2 The relationship between populations, samples, parameters, and statistics.
4.2.2 Types of Samples
In a nonprobability sample, items included are chosen without regard to their probability of occurrence.
o Convenience sampling: items are selected based only on the fact that they are easy, inexpensive, or convenient to sample.
o Judgment sample: you get the opinions of pre-selected experts in the subject matter.
In a probability sample, items in the sample are chosen on the basis of known probabilities.
o Simple random sampling: every individual or item from the frame has an equal chance of being selected. Selection may be with replacement (the selected individual is returned to the frame for possible reselection) or without replacement (the selected individual isn't returned to the frame). Samples are obtained from a table of random numbers or computer random number generators.
o Systematic sampling: decide on the sample size, n; divide the frame of N individuals into groups of k individuals, k = N/n; randomly select one individual from the 1st group; select every kth individual thereafter.
   For example, suppose you were sampling n = 9 individuals from a population of N = 72. So, the population would be divided into k = 72/9 = 8 groups. Randomly select a member from group 1, say individual 3. Then, select every 8th individual thereafter (i.e., 3, 11, 19, 27, 35, 43, 51, 59, 67).
o Stratified sampling: divide the population into two or more subgroups (called strata) according to some common characteristic. A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes. Samples from subgroups are combined into one. This is a common technique when sampling a population of voters, stratifying across racial or socioeconomic lines.
o Cluster sampling: the population is divided into several clusters, each representative of the population. A simple random sample of clusters is selected. All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique. A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.
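The systematic sampling steps above can be sketched in Python. This sketch assumes that N is a multiple of n, as in the N = 72, n = 9 example. The `start` parameter is a hypothetical convenience for reproducing the worked example. When it is omitted, the first member is drawn at random from group 1.

```python
import random

def systematic_sample(N, n, start=None):
    """Return 1-based positions of a systematic sample of n items from a frame of N."""
    k = N // n                        # group size k = N / n (assumes N is a multiple of n)
    if start is None:
        start = random.randint(1, k)  # random member of the first group (inclusive range)
    return [start + i * k for i in range(n)]

print(systematic_sample(72, 9, start=3))  # [3, 11, 19, 27, 35, 43, 51, 59, 67]
print(systematic_sample(72, 9))           # random starting point between 1 and 8
```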
Comparing Sampling Methods
o Simple random sample and systematic sample
   Simple to use
   May not be a good representation of the population's underlying characteristics
o Stratified sample
   Ensures representation of individuals across the entire population
o Cluster sample
   More cost-effective
   Less efficient (needs a larger sample to acquire the same level of precision)
4.2.3 Sampling Distributions
A sampling distribution is a distribution of all of the possible values of a statistic for a given size sample selected from a population.
For example, suppose you sample 50 students from your college regarding their mean GPA. If you obtained many different samples of 50, you would compute a different mean for each sample. We are interested in the distribution of all potential mean GPAs we might calculate for any given sample of 50 students.
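To make the idea concrete, a Python sketch can enumerate a complete sampling distribution for a tiny population. It uses the four ages 18, 20, 22, and 24 from the example that follows. The sample size n = 2, drawn with replacement, is an assumption made for illustration. The mean of all 16 sample means equals the population mean μ = 21, and their standard deviation equals σ/√n ≈ 1.58.

```python
import itertools
import statistics

population = [18, 20, 22, 24]  # ages in the four-person population
n = 2                          # assumed sample size, drawn with replacement

# Every ordered sample of size 2 with replacement: 4 * 4 = 16 samples
sample_means = [statistics.mean(s) for s in itertools.product(population, repeat=n)]

mu = statistics.mean(population)
sigma = statistics.pstdev(population)          # population standard deviation
mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)  # standard error of the mean

print(len(sample_means), mu, round(sigma, 4))
print(mean_of_means, round(sd_of_means, 4), round(sigma / n ** 0.5, 4))
```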
Example:
o Suppose your population (simplified) was four people at your institution.
o Population size N = 4
o Random variable, X, is the age of individuals
o Values of X: 18, 20, 22, 24 (years)
4.2.4 Sampling from Finite Populations
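4.2.4.1 Using the Finite Population Correction Factor with the Mean
In the cereal-filling example in Section 7.3 on page 265 of Levine, et al. (2008), you selected a sample of 25 cereal boxes from a filling process with μ = 368 grams. Suppose that 2,000 boxes (i.e., the population) are filled on this particular day. Using the fpc factor, what is the probability that the sample mean is below 365 grams?
SOLUTION: Using the fpc factor, σ = 15, n = 25, and N = 2,000, so that
\( \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{15}{\sqrt{25}}\sqrt{\frac{2000-25}{2000-1}} = 2.98 \)
The probability that the sample mean is below 365 is computed as follows:
\( Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{365 - 368}{2.98} = -1.01 \)
From Table E.2, the area below 365 grams is 0.1562.
The fpc factor has a very small effect on the standard error of the mean and the subsequent area under the normal curve because the sample size is only 1.25% of the population size (that is, n/N = 25/2,000 = 0.0125).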
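The fpc calculation can be sketched in Python, with `statistics.NormalDist` replacing Table E.2 (the function name is illustrative). Because the code does not round Z to -1.01, it reports an area of about 0.157 instead of the table's 0.1562. The standard error comes out at about 2.98 and Z at about -1.006.

```python
import math
from statistics import NormalDist

def standard_error_fpc(sigma, n, N):
    """Standard error of the mean, multiplied by the finite population correction factor."""
    fpc = math.sqrt((N - n) / (N - 1))
    return sigma / math.sqrt(n) * fpc

se = standard_error_fpc(sigma=15, n=25, N=2000)
z = (365 - 368) / se
prob = NormalDist().cdf(z)  # area below 365 grams
print(round(se, 4), round(z, 4), round(prob, 4))
```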
4.3 Assignment for Simple Random Sample
Problems for Section 7.1 Number 7.2, 7.4, and 7.8
Problems for Section 7.2 Number 7.10 and 7.14
4.4 Assignment for Sampling Distribution
Problems for Section 7.4 Number 7.18, 7.20, and 7.24
4.5 Assignment for the Sampling Distribution of the Mean
Problems for Section 7.5 Number 7.28 and 7.32
4.6 Assignment for Sampling from Finite Population
1. Given that N = 80 and n = 10 and the sample is selected without replacement, determine the fpc factor.
2. Historically, 93% of the deliveries of an overnight mail service arrive before 10:30 the following morning. If a random sample of 500 deliveries is selected without replacement from a population that consisted of 10,000 deliveries, what is the probability that the sample will have:
a. between 93% and 95% of the deliveries arriving before 10:30 the following morning?
b. more than 95% of the deliveries arriving before 10:30 the following morning?
Practicum: Math11002
Business Statistics
MODULE 5
Submitted only on
Day/Date: ____________/______________
Time: 12.00-14.00 WIB
In ____________________
Module Description:
Objective
Output
Date of Receipt
Score:
Assistant Signature
I hereby declare that I have done all of this module by myself.
Name/NIM: ______________________________/_______________
Signature: _______________________________________________
Rem.:
CONFIDENCE INTERVAL ESTIMATION
To construct and interpret confidence interval estimates for the mean and the proportion. How to determine the sample size necessary to develop a confidence interval for the mean or proportion. How to use confidence interval estimates in auditing.
Use separate papers to report your results (in handwriting or computer printout). A report produced by the students should be in the form of working procedures and results in both soft copy and hard copy.
Pre-Lab Read:
Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, pages 322-326.
5 CONFIDENCE INTERVAL ESTIMATION
5.1 Confidence Intervals
5.1.1 A Point Estimate and a Confidence Interval Estimate
5.1.1.1 Point Estimates
A point estimate is a single number. For the population mean (and population standard deviation), a point estimate is the sample mean (and sample standard deviation). A confidence interval provides additional information about variability.
[Figure: point estimate and width of confidence interval]
5.1.1.2 Confidence Interval Estimates
Point Estimate ± (Critical Value)(Standard Error)
A confidence interval gives a range estimate of values:
Takes into consideration variation in sample statistics from sample to sample
Based on all the observations from 1 sample
Gives information about closeness to unknown population parameters
Confidence level: the confidence that the interval will contain the unknown population parameter. It is a percentage (less than 100%), stated in terms of level of confidence, e.g., 95% confidence, 99% confidence.
5.1.1.3 Confidence Level
Suppose confidence level = 95%, also written (1 - α) = .95. A relative frequency interpretation: in the long run, 95% of all the confidence intervals that can be constructed will contain the unknown true parameter. A specific interval either will contain or will not contain the true parameter.
5.1.2 Confidence Interval for μ (σ Known)
Assumptions:
o Population standard deviation σ is known
o Population is normally distributed
o If the population is not normal, use a large sample
Confidence interval estimate: \( \bar{X} \pm Z\frac{\sigma}{\sqrt{n}} \)
where Z is the standardized normal distribution critical value for a probability of α/2 in each tail.
5.1.2.1 Finding the Critical Value, Z
Consider a 95% confidence interval: 1 - α = .95, so there is α/2 = .025 in each tail and Z = ±1.96.
Commonly used confidence levels are 90%, 95%, and 99%.
Confidence Level   Confidence Coefficient   Z Value
80%                .80                      1.282
90%                .90                      1.645
95%                .95                      1.960
98%                .98                      2.326
99%                .99                      2.576
99.8%              .998                     3.090
99.9%              .999                     3.291
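The Z values in the table can be regenerated with `statistics.NormalDist().inv_cdf`, sketched here as a Python counterpart to Excel's NORMSINV. A confidence level of 1 - α leaves α/2 in the upper tail, so the critical value is the (1 - α/2) quantile.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: the Z with area (1 - confidence) / 2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  {z_critical(level):.3f}")
```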
5.1.2.2 Intervals and Level of Confidence
Example:
A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is .35 ohms. Determine a 95% and a 99% confidence interval for the true mean resistance of the population.
Solution:
95% CI:
\( \bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96(.35/\sqrt{11}) = 2.20 \pm .2068 = (1.9932, 2.4068) \)
We are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms. Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean.
99% CI:
\( \bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 2.20 \pm 2.58(.35/\sqrt{11}) = 2.20 \pm .2723 = (1.9277, 2.4723) \)
We are 99% confident that the true mean resistance is between 1.9277 and 2.4723 ohms. Although the true mean may or may not be in this interval, 99% of intervals formed in this manner will contain the true mean.
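Both resistance intervals can be reproduced with a small Python sketch (`z_interval` is an illustrative name). Excel's CONFIDENCE(0.05, 0.35, 11) returns the 95% half-width directly.

```python
import math

def z_interval(mean, sigma, n, z):
    """Confidence interval X-bar +/- Z * sigma / sqrt(n) when sigma is known."""
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

print([round(v, 4) for v in z_interval(2.20, 0.35, 11, 1.96)])  # [1.9932, 2.4068]
print([round(v, 4) for v in z_interval(2.20, 0.35, 11, 2.58)])  # [1.9277, 2.4723]
```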
5.1.3 Confidence Interval for μ (σ Unknown)
If the population standard deviation σ is unknown, we can substitute the sample standard deviation, S. This introduces extra uncertainty, since S varies from sample to sample, so we use the t distribution instead of the normal distribution.
Assumptions:
o Population standard deviation is unknown
o Population is normally distributed
o If the population is not normal, use a large sample
o Use Student's t distribution
Confidence interval estimate: \( \bar{X} \pm t_{n-1}\frac{S}{\sqrt{n}} \), where t is the critical value of the t distribution with n - 1 d.f. and an area of α/2 in each tail.
The t value depends on degrees of freedom (d.f.), the number of observations that are free to vary after the sample mean has been calculated: d.f. = n - 1.
5.1.3.1 Student's t Distribution
As n increases, the t distribution approaches the standardized normal (Z) distribution.
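5.1.3.2 Student's t Table
5.1.3.3 Confidence Interval for μ (σ Unknown): Example
Example 1
A random sample of n = 25 has \( \bar{X} \) = 50 and S = 8. Form a 95% confidence interval for μ.
Solution: d.f. = n - 1 = 24, so \( t_{\alpha/2, n-1} = t_{.025, 24} = 2.0639 \), and the confidence interval is
\( \bar{X} \pm t_{\alpha/2, n-1}\frac{S}{\sqrt{n}} = 50 \pm (2.0639)\frac{8}{\sqrt{25}} = (46.698, 53.302) \)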
Form a 95% confidence interval for the mean using MS Excel:
Row   Column A                       Column B   Formula in column B
3     Data
4     Sample Standard Deviation      8
5     Sample Mean                    50
6     Sample Size                    25
7     Confidence Level               95%
9     Intermediate Calculations
10    Standard Error of the Mean     1.6000     =B4/SQRT(B6)
11    Degrees of Freedom             24         =B6-1
12    t Value                        2.0639     =TINV(1-B7,B11)
13    Interval Half Width            3.3022     =B12*B10
15    Confidence Interval
16    Interval Lower Limit           46.6978    =B5-B13
17    Interval Upper Limit           53.3022    =B5+B13
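The same worksheet logic can be sketched in Python. The standard library has no Student's t quantile function, so the critical value 2.0639 is read from the t table (the Excel equivalent is TINV(0.05, 24)). If SciPy is available, `scipy.stats.t.ppf(0.975, 24)` gives it directly. `t_interval` is an illustrative name.

```python
import math

def t_interval(mean, s, n, t_crit):
    """Confidence interval X-bar +/- t * S / sqrt(n); t_crit comes from a t table."""
    se = s / math.sqrt(n)    # standard error of the mean (cell B10)
    half_width = t_crit * se  # interval half width (cell B13)
    return mean - half_width, mean + half_width

# t(.025, 24) = 2.0639 from the Student's t table
low, high = t_interval(50, 8, 25, 2.0639)
print(round(low, 4), round(high, 4))  # 46.6978 53.3022
```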
Example 2:
Construct a 95% confidence interval estimate for the population mean force required to break the insulator:
Force Required to Break Electric Insulators (in pounds)
1870  1728  1656  1610  1634  1784  1522  1696  1592  1662
1866  1764  1734  1662  1734  1774  1550  1756  1762  1866
1820  1744  1788  1688  1810  1752  1680  1810  1652  1736
Solution:
Put the data in the range F2:O4.
Estimate for the Mean Amount of Force Required
Row   Column A                       Column B   Formula in column B
3     Data
4     Sample Standard Deviation      89.5508    =STDEV(F2:O4)
5     Sample Mean                    1723.4     =AVERAGE(F2:O4)
6     Sample Size                    30         =COUNT(F2:O4)
7     Confidence Level               95%
9     Intermediate Calculations
10    Standard Error of the Mean     16.3497    =B4/SQRT(B6)
11    Degrees of Freedom             29         =B6-1
12    t Value                        2.0452     =TINV(1-B7,B11)
13    Interval Half Width            33.4388    =B12*B10
15    Confidence Interval
16    Interval Lower Limit           1689.96    =B5-B13
17    Interval Upper Limit           1756.84    =B5+B13
We can conclude with 95% confidence that the mean breaking force required for the population of insulators is between 1689.96 and 1756.84 pounds. The validity of this confidence interval estimate depends on the assumption that the force required is normally distributed. If the sample size is large, we can slightly loosen this assumption. Thus, with a sample of 30, we can use the t distribution even though the distribution is slightly left-skewed (see the normal probability plot or the box-and-whisker plot), and the t distribution is appropriate for the data.
[Figure: normal probability plot of Force against Z value, and box-and-whisker plot of Force (axis 1500 to 1900)]
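The insulator worksheet can be cross-checked with a Python sketch. `statistics.stdev` is the sample standard deviation (like Excel's STDEV). The critical value t(.025, 29) = 2.0452 is read from the t table because the standard library has no t quantile function.

```python
import math
import statistics

force = [
    1870, 1728, 1656, 1610, 1634, 1784, 1522, 1696, 1592, 1662,
    1866, 1764, 1734, 1662, 1734, 1774, 1550, 1756, 1762, 1866,
    1820, 1744, 1788, 1688, 1810, 1752, 1680, 1810, 1652, 1736,
]

n = len(force)                 # COUNT(F2:O4)
mean = statistics.mean(force)  # AVERAGE(F2:O4)
s = statistics.stdev(force)    # STDEV(F2:O4), sample standard deviation
se = s / math.sqrt(n)
t_crit = 2.0452                # t(.025, 29) from the t table
half_width = t_crit * se

print(n, round(mean, 1), round(s, 4))
print(round(mean - half_width, 2), round(mean + half_width, 2))
```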
5.2 Confidence Interval Estimate for a Single Population Proportion
An interval estimate for the population proportion (π) can be calculated by adding an allowance for uncertainty to the sample proportion (p).
Recall that the distribution of the sample proportion is approximately normal if the sample size is large, with standard deviation:
\( \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} \)
We will estimate this with sample data:
\( \sqrt{\frac{p(1-p)}{n}} \)
Upper and lower confidence limits for the population proportion are calculated with the formula:
\( p \pm Z\sqrt{\frac{p(1-p)}{n}} \)
where:
Z is the standardized normal value for the level of confidence desired
p is the sample proportion
n is the sample size
5.2.1 Example for Confidence Intervals for the Population Proportion
A random sample of 100 people shows that 25 have opened IRAs this year. Form a 95% confidence interval for the true proportion of the population who have opened IRAs.
Solving a confidence interval for a population proportion using MS Excel (this worksheet uses a different data set: 10 in-error sales invoices in a sample of 100):
Proportion of In-Error Sales Invoices
Row   Column A                            Column B   Formula in column B
3     Data
4     Sample Size                         100
5     Number of Successes                 10
6     Confidence Level                    95%
8     Intermediate Calculations
9     Sample Proportion                   0.1        =B5/B4
10    Z Value                             -1.9600    =NORMSINV((1-B6)/2)
11    Standard Error of the Proportion    0.03       =SQRT(B9*(1-B9)/B4)
12    Interval Half Width                 0.0588     =ABS(B10*B11)
13    Confidence Interval
14    Interval Lower Limit                0.0412     =B9-B12
15    Interval Upper Limit                0.1588     =B9+B12
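A Python sketch of the proportion interval covers both the IRA example (25 of 100) and the worksheet data (10 of 100). `proportion_interval` is an illustrative name, and Z = 1.96 is taken from the confidence-level table for 95%.

```python
import math

def proportion_interval(successes, n, z):
    """Confidence interval p +/- Z * sqrt(p(1 - p) / n) for a population proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

print([round(v, 4) for v in proportion_interval(25, 100, 1.96)])  # [0.1651, 0.3349]
print([round(v, 4) for v in proportion_interval(10, 100, 1.96)])  # [0.0412, 0.1588]
```

For the IRA example, this gives a 95% interval of roughly 16.5% to 33.5% of the population.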
5.3 Determining Sample Size
The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence (1 - α). The margin of error, also called sampling error, is the amount of imprecision in the estimate of the population parameter and the amount added to and subtracted from the point estimate to form the confidence interval.
To determine the required sample size for the mean, you must know the desired level of confidence (1 - α), which determines the critical Z value; the acceptable sampling error (margin of error), e; and the standard deviation, σ.
The formula: \( n = \frac{Z^2\sigma^2}{e^2} \)
5.3.1 If Population Standard Deviation (σ) Is Known
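If σ = 45, what sample size is needed to estimate the mean within ±5 with 90% confidence?
Solution: \( n = \frac{Z^2\sigma^2}{e^2} = \frac{(1.645)^2(45)^2}{5^2} = 219.19 \)
The required sample size is n = 220 (always round up to the next whole number).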
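A Python sketch of the sample-size formula for the mean. `math.ceil` enforces the "always round up" rule. `sample_size_mean` is an illustrative name.

```python
import math

def sample_size_mean(z, sigma, e):
    """n = Z^2 * sigma^2 / e^2, rounded up to the next whole number."""
    return math.ceil((z * sigma / e) ** 2)

print(sample_size_mean(1.645, 45, 5))  # 220
```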
Using MS Excel:
5.3.2 If Population Standard Deviation (σ) Is Unknown
If σ is unknown, it can be estimated when using the required sample size formula, either by using a value for σ that is expected to be at least as large as the true σ, or by selecting a pilot sample and estimating σ with the sample standard deviation, S.
5.3.3 To Determine the Required Sample Size for the Proportion
To determine the required sample size for the proportion, you must know:
o The desired level of confidence (1 - α), which determines the critical Z value
o The acceptable sampling error (margin of error), e
o The true proportion of successes, π
o π can be estimated with a pilot sample, if necessary (or conservatively use π = .50)
\( n = \frac{Z^2\pi(1-\pi)}{e^2} \)
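Example: How large a sample would be necessary to estimate the true proportion defective in a large population within ±3%, with 95% confidence? (Assume a pilot sample yields p = .12.)
Solution: For 95% confidence, use Z = 1.96, e = .03, and p = .12, so use p to estimate π:
\( n = \frac{Z^2\pi(1-\pi)}{e^2} = \frac{(1.96)^2(.12)(1-.12)}{(.03)^2} = 450.74 \), so use n = 451.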
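The proportion version of the sample-size formula can be sketched the same way in Python. The second call uses the conservative π = .50 mentioned in 5.3.3 to show how much larger the sample must be without a pilot estimate.

```python
import math

def sample_size_proportion(z, pi, e):
    """n = Z^2 * pi * (1 - pi) / e^2, rounded up to the next whole number."""
    return math.ceil(z ** 2 * pi * (1 - pi) / e ** 2)

print(sample_size_proportion(1.96, 0.12, 0.03))  # 451 (pilot estimate p = .12)
print(sample_size_proportion(1.96, 0.50, 0.03))  # 1068 (conservative pi = .50)
```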
Using MS Excel:
5.4 Assignment 5
Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, Chapter 8 Problems, pages 283-319:
1. Problems for Section 8.1 No. 8.2, 8.4, 8.8
2. Problems for Section 8.2 No. 8.12, 8.14, 8.18, 8.22
3. Problems for Section 8.3 No. 8.24, 8.28, 8.32
4. Problems for Section 8.4 No. 8.36, 8.40, 8.46
Practicum: Math11002
Business Statistics
MODULE 6
Submitted only on
Day/Date: ____________/______________
Time: 12.00-14.00 WIB
In ____________________
Module Description:
Objective
Output
Date of Receipt
Score:
Assistant Signature
I hereby declare that I have done all of this module by myself.
Name/NIM: ______________________________/_______________
Signature: _______________________________________________
Rem.:
HYPOTHESIS TESTING AND TWO-SAMPLE TEST
The basic principles of hypothesis testing. How to use hypothesis testing to test a mean or proportion. The assumptions of each hypothesis-testing procedure, how to evaluate them, and the consequences if they are violated. Formulate a decision rule for testing a hypothesis. Know Type I and Type II errors. Use hypothesis testing to compare the difference between the means of two independent populations, the means of two related populations, the proportions of two independent populations, and the variances of two independent populations.
Use separate papers to report your results (in handwriting or computer printout). A report produced by the students should be in the form of working procedures and results in both soft copy and hard copy.
Pre-Lab Read:
Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, Chapter 9 and Excel Companion to Chapter 9, pages 328-367 and 369-420.
6 HYPOTHESIS TESTING AND TWO-SAMPLE TEST
6.1 Hypothesis Testing
A hypothesis is a claim (assumption) about a population parameter:
Population mean. Example: The mean monthly cell phone bill of this city is μ = $52.
Population proportion. Example: The proportion of adults in this city with cell phones is π = .68.
A hypothesis states the (numerical) assumption to be tested.
Example: The mean number of TV sets in U.S. homes is equal to three. H0: μ = 3
6.1.1 The Null Hypothesis, H0
o Is always about a population parameter, not about a sample statistic.
o Begin with the assumption that the null hypothesis is true.
o It refers to the status quo.
o Always contains the "=", "≤", or "≥" sign.
o May or may not be rejected.
6.1.2 The Alternative Hypothesis, H1
o Is the opposite of the null hypothesis. Example: The mean number of TV sets in U.S. homes is not equal to 3 (H1: μ ≠ 3).
o Challenges the status quo.
o Never contains the "=", "≤", or "≥" sign.
o May or may not be proven.
o Is generally the hypothesis that the researcher is trying to prove.
6.1.3 The Hypothesis Testing Process
Claim: The population mean age is 50.
o H0: μ = 50, H1: μ ≠ 50
o Sample the population and find the sample mean.
[Figure: population and sample]
o Suppose the sample mean age was \( \bar{X} \) = 20.
o This is significantly lower than the claimed mean population age of 50.
o If the null hypothesis were true, the probability of getting such a different sample mean would be very small, so you reject the null hypothesis.
o In other words, getting a sample mean of 20 is so unlikely if the population mean was 50 that you conclude that the population mean must not be 50.
6.1.4 The Test Statistic and Critical Values
If the sample mean is close to the assumed population mean, the null hypothesis is not rejected.
If the sample mean is far from the assumed population mean, the null hypothesis is rejected.
How far is "far enough" to reject H0?
The critical value of a test statistic creates a "line in the sand" for decision making.
6.1.5 Errors in Decision Making
6.1.5.1 Type I Error
o Rejecting a true null hypothesis
o Considered a serious type of error
o The probability of a Type I error is α
Called the level of significance of the test
Set by the researcher in advance
6.1.5.2 Type II Error
o Failing to reject a false null hypothesis
o The probability of a Type II error is β
Possible Hypothesis Test Outcomes
Decision | Actual Situation: H0 True | Actual Situation: H0 False
Do Not Reject H0 | No Error (Probability 1 − α) | Type II Error (Probability β)
Reject H0 | Type I Error (Probability α) | No Error (Probability 1 − β)
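6.1.6 Level of Significance, α
The level of significance α defines the values of the sample statistic that are unlikely if the null hypothesis is true; these values form the rejection region of the sampling distribution. Typical values are .01, .05, or .10.
For example, Claim: The population mean age is 50. H0: μ = 50, H1: μ ≠ 50. This is a two-tail test, so the rejection region is split into two tails, each with area α/2.
6.1.7 Hypothesis Testing: σ Known
For a two-tail test for the mean, σ known:
Convert the sample statistic (X̄) to the test statistic:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

Determine the critical Z values for a specified level of significance α from a table or by using Excel.
Decision rule: If the test statistic falls in the rejection region, reject H0; otherwise do not reject H0.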
Example: Test the claim that the true mean weight of chocolate bars manufactured in a factory is 3 ounces.
Solution:
State the appropriate null and alternative hypotheses: H0: μ = 3, H1: μ ≠ 3 (this is a two-tail test).
Specify the desired level of significance: suppose that α = .05 is chosen for this test.
Choose a sample size: suppose a sample of size n = 100 is selected.
Determine the appropriate technique:
o σ is known, so this is a Z test.
Set up the critical values:
o For α = .05, the critical Z values are ±1.96.
Collect the data and compute the test statistic:
o Suppose the sample results are n = 100, X̄ = 2.84 (σ = 0.8 is assumed known from past company records).
So the test statistic is:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{2.84 - 3}{0.8/\sqrt{100}} = \frac{-0.16}{0.08} = -2.0$$

Since Z = −2.0 < −1.96, you reject the null hypothesis and conclude that there is sufficient evidence that the mean weight of chocolate bars is not equal to 3 ounces.
6.1.8 Six Steps of Hypothesis Testing:
1. State the null hypothesis, H0, and the alternative hypothesis, H1.
2. Choose the level of significance, α, and the sample size, n.
3. Determine the appropriate statistical technique and the test statistic to use.
4. Find the critical values and determine the rejection region(s).
5. Collect data and compute the test statistic from the sample result.
6. Compare the test statistic to the critical value to determine whether the test statistic falls in the region of rejection. Make the statistical decision: reject H0 if the test statistic falls in the rejection region. Express the decision in the context of the problem.
See Examples 9.2 and 9.3 in Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, pages 336 and 337.
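The critical-value approach for the chocolate-bar example can be checked with a short script. This is a minimal sketch using only the Python standard library (`statistics.NormalDist`, Python 3.8+); the helper name `z_test_mean` is hypothetical and not part of the module or of Excel.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-tail Z test for a mean with sigma known (critical-value approach)."""
    z = (x_bar - mu0) / (sigma / sqrt(n))         # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper critical value (about 1.96)
    reject = abs(z) > z_crit                      # outside +/- z_crit means reject H0
    return z, z_crit, reject


# Chocolate-bar example: H0: mu = 3, n = 100, X-bar = 2.84, sigma = 0.8
z, z_crit, reject = z_test_mean(2.84, 3.0, 0.8, 100)
print(f"Z = {z:.2f}, critical values = +/-{z_crit:.2f}, reject H0: {reject}")
```

Computing the critical value with `inv_cdf` instead of reading it from a table makes it easy to try other values of α.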
6.1.9 Hypothesis Testing: σ Known, p-Value Approach
The p-value is the probability of obtaining a test statistic equal to or more extreme (< or >) than the observed sample value, given that H0 is true. It is also called the observed level of significance, and it is the smallest value of α for which H0 can be rejected.
Convert the sample statistic (e.g., X̄) to a test statistic (e.g., the Z statistic).
Obtain the p-value from a table or by using Excel.
Compare the p-value with α:
If p-value < α, reject H0.
If p-value ≥ α, do not reject H0.
Example:
6.1.9.1 Manual Calculation
How likely is it to see a sample mean of 2.84 (or something further from the mean, in either direction) if the true mean is μ = 3.0? Suppose the sample results are n = 100 and X̄ = 2.84, and σ = 0.8 is assumed known.
X̄ = 2.84 translates to a Z score of Z = −2.0, so
p-value = P(Z < −2.0) + P(Z > 2.0) = .0228 + .0228 = .0456
Here: p-value = .0456 and α = .05. Since .0456 < .05, you reject the null hypothesis.
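The two-tail p-value can also be computed directly instead of being read from the normal table. A minimal sketch with the standard library; `two_tail_p_value` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_tail_p_value(z):
    """P(Z <= -|z|) + P(Z >= |z|) for a standard normal Z."""
    return 2 * NormalDist().cdf(-abs(z))


z = (2.84 - 3.0) / (0.8 / sqrt(100))  # about -2.0, as in the chocolate-bar example
p_value = two_tail_p_value(z)
alpha = 0.05
print(f"p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

The unrounded p-value is about .0455; the .0456 in the text comes from doubling the rounded table value .0228. The decision (reject H0 at α = .05) is the same.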
6.1.9.2 Using MS Excel:
6.1.10 Hypothesis Testing: σ Known, Confidence Interval Connections
For X̄ = 2.84, σ = 0.8, and n = 100, the 95% confidence interval is:

$$2.84 - (1.96)\frac{0.8}{\sqrt{100}} \le \mu \le 2.84 + (1.96)\frac{0.8}{\sqrt{100}}$$

$$2.6832 \le \mu \le 2.9968$$

Since this interval does not contain the hypothesized mean (3.0), you reject the null hypothesis at α = .05.
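The confidence-interval connection can be sketched as follows, assuming σ is known and the Z critical value comes from `statistics.NormalDist`; the function name `z_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, level=0.95):
    """Confidence interval for mu when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # 1.96 for a 95% interval
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


low, high = z_confidence_interval(2.84, 0.8, 100)
print(f"{low:.4f} <= mu <= {high:.4f}")
print("Hypothesized mean 3.0 inside interval:", low <= 3.0 <= high)
```

If the hypothesized mean lies outside the interval, the two-tail test at the matching α rejects H0, which is exactly the connection described above.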
6.1.11 One-Tail Tests
In many cases, the alternative hypothesis focuses on a particular direction.
H0: μ ≥ 3, H1: μ < 3. This is a lower-tail test, since the alternative hypothesis is focused on the lower tail below the mean of 3.
H0: μ ≤ 3, H1: μ > 3. This is an upper-tail test, since the alternative hypothesis is focused on the upper tail above the mean of 3.
Example:
A phone industry manager thinks that customer monthly cell phone bills have increased and now average more than $52 per month. The company wishes to test this claim. Past company records indicate that the standard deviation is about $10.
Form the hypothesis test:
H0: μ ≤ 52 (the mean is less than or equal to $52 per month)
H1: μ > 52 (the mean is greater than $52 per month, i.e., sufficient evidence exists to support the manager's claim)
Suppose that α = .10 is chosen for this test.
Find the rejection region:
What is Z given α = 0.10? For an upper-tail test with α = .10, the critical value is Z = 1.28, so reject H0 if Z > 1.28.
Suppose a sample is taken with the following results: n = 64, X̄ = 53.1 (σ = 10 was assumed known from past company records).
Then the test statistic is:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{53.1 - 52}{10/\sqrt{64}} = 0.88$$

Do not reject H0, since Z = 0.88 ≤ 1.28,
i.e., there is not sufficient evidence that the mean bill is greater than $52.
Calculate the p-value and compare it to α: p-value = P(Z > 0.88) = 1 − .8106 = .1894. Since .1894 ≥ .10, do not reject H0.
Microsoft Excel Z test results
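An upper-tail version of the Z test, covering both the critical value and the p-value for the cell phone example, might look like this. It is a sketch using only the standard library; `upper_tail_z_test` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(x_bar, mu0, sigma, n, alpha):
    """Upper-tail Z test (H1: mu > mu0) with sigma known."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha)  # all of alpha is in the upper tail
    p_value = 1 - NormalDist().cdf(z)         # P(Z >= z)
    return z, z_crit, p_value


z, z_crit, p_value = upper_tail_z_test(53.1, 52, 10, 64, alpha=0.10)
print(f"Z = {z:.2f}, critical Z = {z_crit:.2f}, p-value = {p_value:.4f}")
print("Reject H0:", z > z_crit)
```

Unlike the two-tail case, α is not halved here, because the whole rejection region lies in the upper tail.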
6.1.12 Hypothesis Testing: σ Unknown
If the population standard deviation is unknown, you instead use the sample standard deviation S. Because of this change, you use the t distribution instead of the Z distribution to test the null hypothesis about the mean. All other steps, concepts, and conclusions are the same.
The t test statistic with n − 1 degrees of freedom is:

$$t_{n-1} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

Example: The mean cost of a hotel room in New York is said to be $168 per night. A random sample of 25 hotels resulted in X̄ = $172.50 and S = $15.40. Test at the α = 0.05 level.
(A stem-and-leaf display and a normal probability plot indicate the data are approximately normally distributed.)
H0: μ = 168
H1: μ ≠ 168
α = 0.05, n = 25, df = 25 − 1 = 24; critical values: t = ±2.0639

$$t_{n-1} = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{172.50 - 168}{15.40/\sqrt{25}} = 1.46$$

Since −2.0639 < 1.46 < 2.0639, do not reject H0: there is not sufficient evidence that the true mean cost is different from $168.
6.1.13 Hypothesis Testing: Connection to Confidence Intervals
For X̄ = 172.5, S = 15.40, and n = 25, the 95% confidence interval is:

$$172.5 - (2.0639)\frac{15.4}{\sqrt{25}} \le \mu \le 172.5 + (2.0639)\frac{15.4}{\sqrt{25}}$$

$$166.14 \le \mu \le 178.86$$

Since this interval contains the hypothesized mean (168), you do not reject the null hypothesis at α = .05.
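The hotel-room t test and its confidence interval can be reproduced with a short sketch. The Python standard library has no t distribution, so the critical value 2.0639 (df = 24, α = .05, two-tail) is taken from the table value used in the text; the helper names are hypothetical.

```python
from math import sqrt


def t_statistic(x_bar, mu0, s, n):
    """t statistic with n - 1 degrees of freedom (sigma unknown)."""
    return (x_bar - mu0) / (s / sqrt(n))


def t_confidence_interval(x_bar, s, n, t_crit):
    """Confidence interval for mu using a tabled t critical value."""
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


T_CRIT_24 = 2.0639  # from the t table in the text: df = 24, alpha = .05, two-tail
t = t_statistic(172.50, 168, 15.40, 25)
low, high = t_confidence_interval(172.50, 15.40, 25, T_CRIT_24)
print(f"t = {t:.2f}, reject H0: {abs(t) > T_CRIT_24}")
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Both views agree: |t| is below the critical value, and the interval contains 168.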
o Recall that you assume that the sample statistic comes from a random sample from a normal distribution.
o If the sample size is small (< 30), you should use a box-and-whisker plot or a normal probability plot to assess whether the assumption of normality is valid.
o If the sample size is large, the central limit theorem applies and the sampling distribution of the mean will be approximately normal.
Microsoft Excel Results
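6.1.14 Hypothesis Testing: Proportion
Involves categorical variables with two possible outcomes: Success (possesses a certain characteristic) and Failure (does not possess that characteristic). The fraction or proportion of the population in the success category is denoted by π.
The sample proportion in the success category is denoted by p:

$$p = \frac{X}{n} = \frac{\text{number in category of interest in sample}}{\text{sample size}}$$

When both nπ and n(1 − π) are at least 5, p can be approximated by a normal distribution with mean and standard deviation

$$\mu_p = \pi \qquad \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$

The sampling distribution of p is approximately normal, so the test statistic is a Z value:

$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}}$$

Example: A marketing company claims that it receives 8% responses from its mailing. To test this claim, a random sample of 500 were surveyed, with 30 responses. Test at the α = .05 significance level.
Solution:
Check the normal approximation: nπ = (500)(.08) = 40 and n(1 − π) = (500)(.92) = 460, both at least 5.
H0: π = .08, H1: π ≠ .08; for α = .05 the critical Z values are ±1.96.
p = 30/500 = .06, so

$$Z = \frac{.06 - .08}{\sqrt{\dfrac{.08(1 - .08)}{500}}} \approx -1.65$$

Since −1.96 < −1.65 < 1.96, do not reject H0: there is not sufficient evidence to reject the company's claim of an 8% response rate.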
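The proportion test for the mailing example can be sketched as below, assuming the normal approximation from the text; `z_test_proportion` is a hypothetical helper, and it refuses to run when nπ or n(1 − π) is below 5.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(x, n, pi0, alpha=0.05):
    """Two-tail Z test for a population proportion (normal approximation)."""
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("normal approximation not justified: need n*pi and n*(1-pi) >= 5")
    p = x / n                                      # sample proportion
    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)      # standard error uses pi0, not p
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return p, z, abs(z) > z_crit


p, z, reject = z_test_proportion(30, 500, 0.08)
print(f"p = {p:.2f}, Z = {z:.3f}, reject H0: {reject}")
```

Note that the standard error is computed from the hypothesized π = .08, as in the formula above, not from the sample proportion.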
6.2 Assignment 6.1
1. Problems for Section 9.1: No. 9.1 through 9.5, 9.14, 9.18
2. Problems for Section 9.2: No. 9.20, 9.24, 9.30, 9.32
3. Problems for Section 9.3: No. 9.36, 9.44, 9.46
4. Problems for Section 9.4: No. 9.50, 9.54, 9.56, 9.62
5. Problems for Section 9.5: No. 9.68, 9.70, 9.74
6.3 Two-Sample Tests
Goal: Test a hypothesis or form a confidence interval for the difference between two population means, μ1 − μ2.
The point estimate for the difference between the population means is the difference between the sample means, X̄1 − X̄2.
Different data sources:
Independent: The sample selected from one population has no effect on the sample selected from the other population.
Use the difference between the 2 sample means.
Use a Z test, pooled-variance t test, or separate-variance t test.
Independent Population Means:
1. σ1 and σ2 known: use a Z test statistic.
Assumptions: Samples are randomly and independently drawn, and the population distributions are normal.
When σ1 and σ2 are known and both populations are normal, the test statistic is a Z value, and the standard error of X̄1 − X̄2 is

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad\text{and}\qquad Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Two Independent Populations, Comparing Means
2. σ1 and σ2 unknown: use S to estimate the unknown σ, and use a t test statistic.
6.3.1 Independent Populations, Equal Variance (Pooled-Variance t Test)
Assumptions: Samples are randomly and independently drawn, the populations are normally distributed, and the population variances are unknown but assumed equal.
Forming interval estimates: The population variances are assumed equal, so use the two sample standard deviations and pool them to estimate σ. The test statistic is a t value with (n1 + n2 − 2) degrees of freedom:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \qquad\text{where}\qquad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$
Two-Sample Tests: Independent Populations
You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE and NASDAQ? You collect the following data:
Statistic | NYSE | NASDAQ
Number | 21 | 25
Sample mean | 3.27 | 2.53
Sample std dev | 1.30 | 1.16
Assuming both populations are approximately normal with equal variances, is there a difference in average yield (α = 0.05)?
The test statistic is:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021\left(\dfrac{1}{21} + \dfrac{1}{25}\right)}} = 2.040$$

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1)(1.30)^2 + (25 - 1)(1.16)^2}{(21 - 1) + (25 - 1)} = 1.5021$$

H0: μ1 − μ2 = 0, i.e., (μ1 = μ2)
H1: μ1 − μ2 ≠ 0, i.e., (μ1 ≠ μ2)
α = 0.05
df = 21 + 25 − 2 = 44
Critical values: t = ±2.0154
Test statistic: t = 2.040
Decision: Reject H0 at α = 0.05, since 2.040 > 2.0154.
Conclusion: There is evidence of a difference in the means.
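The pooled-variance computation for the NYSE/NASDAQ example can be sketched from summary statistics alone. The helper `pooled_variance_t` is hypothetical, and the critical value 2.0154 is the table value used in the text (the standard library has no t distribution).

```python
from math import sqrt


def pooled_variance_t(x1, s1, n1, x2, s2, n2):
    """Pooled-variance t test from summary statistics (H0: mu1 - mu2 = 0)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
    t = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return sp2, t, df


sp2, t, df = pooled_variance_t(3.27, 1.30, 21, 2.53, 1.16, 25)
T_CRIT_44 = 2.0154  # two-tail critical value for df = 44, alpha = 0.05, from the text
print(f"Sp^2 = {sp2:.4f}, t = {t:.3f}, df = {df}, reject H0: {abs(t) > T_CRIT_44}")
```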
6.3.2 Independent Populations, Unequal Variance
If you cannot assume the population variances are equal, the pooled-variance t test is inappropriate. Instead, use a separate-variance t test, which includes the two separate sample variances in the computation of the test statistic. The computations are complicated and are best performed using Excel.
When σ1 and σ2 are known, the confidence interval for μ1 − μ2 is:

$$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
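For reference, the separate-variance statistic is commonly computed as Welch's t with Welch-Satterthwaite degrees of freedom, which is what Excel's "t-Test: Two-Sample Assuming Unequal Variances" tool reports. The sketch below is an illustration under that assumption; `separate_variance_t` is a hypothetical name, and the NYSE/NASDAQ figures are reused only to show the arithmetic (the text analyzes them with the pooled test).

```python
from math import sqrt


def separate_variance_t(x1, s1, n1, x2, s2, n2):
    """Welch's separate-variance t statistic and approximate df (H0: mu1 - mu2 = 0)."""
    v1, v2 = s1**2 / n1, s2**2 / n2   # estimated variance of each sample mean
    t = (x1 - x2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; df is usually not a whole number
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


t, df = separate_variance_t(3.27, 1.30, 21, 2.53, 1.16, 25)
print(f"t = {t:.3f}, df = {df:.1f}")
```

To finish the test, compare t with the t critical value for the (rounded) df from a table.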
Practicum: Math11002
Business Statistics
MODULE 7
Submitted only on
Day/Date: ____________/______________
Time: 12.00-14.00 WIB
In ____________________
Module Description:
Objective
Output
Date of Receipt
Score:
Assistant Signature
I, the undersigned, hereby state that I have strived to do all of this module by myself.
Name/NIM: ______________________________/_______________
Signature: _______________________________________________
Rem.:
ANOVA AND CHI-SQUARE AND NONPARAMETRIC TESTS
The basic concepts of experimental design; how to use the one-way analysis of variance to test for differences among the means of several groups; how to use the two-way analysis of variance and interpret the interaction; how and when to use the chi-square test for contingency tables; how to use the Marascuilo procedure for determining pairwise differences when evaluating more than two proportions; how and when to use the McNemar test; and how and when to use nonparametric tests.
Use separate sheets of paper to report your results (handwritten or as a computer printout). The report produced by the students should take the form of working procedures and results, in both soft copy and hard copy.
Pre-Lab Reading:
Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey. Chapter 10 and Excel Companion to Chapter 10. Pages 369-420.
7 ANOVA AND CHI-SQUARE AND NONPARAMETRIC TESTS
ANOVA
General ANOVA Setting
Investigator controls one or more factors of interest
ARDBUSINESSSTATISTICSSec.9
o Each factor contains two or more levels
o Levels can be numerical or categorical
o Different levels produce different groups
o Think of the groups as populations
Observe effects on the dependent variable: are the groups the same?
Experimental design: the plan used to collect the data
Completely Randomized Design
Experimental units (subjects) are assigned randomly to the different levels (groups); subjects are assumed homogeneous.
Only one factor or independent variable, with two or more levels (groups).
Analyzed by one-factor analysis of variance (one-way ANOVA).
7.1 One-Way Analysis of Variance
Evaluate the difference among the means of three or more groups.
Examples: accident rates for 1st, 2nd, and 3rd shift, or expected mileage for five brands of tires.
Assumptions:
Populations are normally distributed
Populations have equal variances
Samples are randomly and independently drawn
7.1.1 Hypotheses: One-Way ANOVA

$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c$$

All population means are equal, i.e., no treatment effect (no variation in means among groups).

$$H_1: \text{not all of the population means } \mu_j \text{ are the same}$$

At least one population mean is different, i.e., there is a treatment (group) effect. This does not mean that all population means are different.
[Figure: All means are the same: the null hypothesis is true (no group effect).]
[Figure: At least one mean is different: the null hypothesis is NOT true (treatment effect is present).]
7.1.2 Partitioning the Variation
Total variation can be split into two parts: SST = SSA + SSW.
SST = Total Variation = the aggregate dispersion of the individual data values around the overall (grand) mean of all factor levels (SST)

$$SST = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(X_{ij} - \bar{\bar{X}})^2 = (X_{11} - \bar{\bar{X}})^2 + (X_{12} - \bar{\bar{X}})^2 + \cdots + (X_{n_c c} - \bar{\bar{X}})^2$$

Where:
SST = total sum of squares
c = number of groups
nj = number of values in group j
Xij = ith value from group j
X̿ = grand mean (mean of all data values)
SSA = Among-Group Variation = dispersion between the factor sample means (SSA)

$$SSA = \sum_{j=1}^{c} n_j(\bar{X}_j - \bar{\bar{X}})^2$$

Where:
SSA = sum of squares among groups
c = number of groups
nj = sample size from group j
X̄j = sample mean from group j
X̿ = grand mean (mean of all data values)
SSW = Within-Group Variation = dispersion that exists among the data values within the particular factor levels (SSW)

$$SSW = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(X_{ij} - \bar{X}_j)^2 = (X_{11} - \bar{X}_1)^2 + (X_{21} - \bar{X}_1)^2 + \cdots + (X_{n_c c} - \bar{X}_c)^2$$
Where:
SSW = sum of squares within groups
c = number of groups
nj = sample size from group j
X̄j = sample mean from group j
Xij = ith value in group j
7.1.3 Obtaining the Mean Squares

$$MST = \frac{SST}{n - 1} \qquad MSA = \frac{SSA}{c - 1} \qquad MSW = \frac{SSW}{n - c}$$

MST = Mean Squares Total
MSA = Mean Squares Among
MSW = Mean Squares Within
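7.1.4 One-Way ANOVA Table
Source of Variation | df | SS | MS (Variance) | F Ratio
Among Groups | c − 1 | SSA | MSA = SSA/(c − 1) | F = MSA/MSW
Within Groups | n − c | SSW | MSW = SSW/(n − c) |
Total | n − 1 | SST = SSA + SSW | |
c = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom
7.1.5 Test Statistic

$$F = \frac{MSA}{MSW}$$

MSA is the mean square among groups.
MSW is the mean square within groups.
Degrees of freedom:
df1 = c − 1 (c = number of groups)
df2 = n − c (n = sum of all sample sizes)
The F statistic is the ratio of the among-group variance to the within-group variance.
The ratio can never be negative.
df1 = c − 1 will typically be small.
df2 = n − c will typically be large.
Decision rule: Reject H0 if F > FU; otherwise do not reject H0.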
7.1.6 Example
An experiment was conducted to determine whether any significant differences exist in the strength of parachutes woven from synthetic fibers from four different suppliers (Supplier 1, Supplier 2, Supplier 3, and Supplier 4).
Tensile strength | Supplier 1 | Supplier 2 | Supplier 3 | Supplier 4
 | 18.5 | 26.3 | 20.6 | 25.4
 | 24.0 | 25.3 | 25.2 | 19.9
 | 17.2 | 24.0 | 20.8 | 22.6
 | 19.9 | 21.2 | 24.7 | 17.5
 | 18.0 | 24.5 | 22.9 | 20.4
Sample Mean (=AVERAGE()) | 19.52 | 24.26 | 22.84 | 21.16
Sample Standard Deviation (=STDEV()) | 2.69 | 1.92 | 2.13 | 2.98
[Figure: tensile strength by supplier]
To construct the ANOVA summary table, we compute the sample mean in each group. Then we compute the grand mean by summing all 20 values and dividing by the total number of values:

$$\bar{\bar{X}} = \frac{438.9}{20} = 21.945$$

Then we compute the sums of squares:

$$SSA = 5(19.52 - 21.945)^2 + 5(24.26 - 21.945)^2 + 5(22.84 - 21.945)^2 + 5(21.16 - 21.945)^2 = 63.2855$$

$$SSW = (18.5 - 19.52)^2 + \cdots + (18 - 19.52)^2 + (26.3 - 24.26)^2 + \cdots + (24.5 - 24.26)^2 + (20.6 - 22.84)^2 + \cdots + (22.9 - 22.84)^2 + (25.4 - 21.16)^2 + \cdots + (20.4 - 21.16)^2 = 97.5040$$

$$SST = (18.5 - 21.945)^2 + (24 - 21.945)^2 + \cdots + (20.4 - 21.945)^2 = 160.7895$$

$$MSA = \frac{63.2855}{4 - 1} = 21.0952 \qquad MSW = \frac{97.5040}{20 - 4} = 6.0940 \qquad F = \frac{21.0952}{6.0940} = 3.4616$$

FU from the F distribution table with 3 degrees of freedom in the numerator and 16 degrees of freedom in the denominator at the 0.05 level of significance is 3.24.
Because the computed test statistic F = 3.4616 > FU = 3.24, we reject the null hypothesis. We conclude that there is a significant difference in the mean tensile strength among the four suppliers.
Using MS Excel: Data → Data Analysis → Anova: Single Factor:
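The hand calculation above can be cross-checked with a small one-way ANOVA routine that follows the formulas in 7.1.2 to 7.1.5. This is a sketch using only the standard library; `one_way_anova` is a hypothetical helper, and FU = 3.24 is the F table value quoted in the text.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the one-way ANOVA quantities defined in 7.1.2-7.1.5 for a list of groups."""
    values = [x for g in groups for x in g]
    n, c = len(values), len(groups)
    grand_mean = mean(values)
    ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)  # among groups
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)         # within groups
    msa, msw = ssa / (c - 1), ssw / (n - c)
    return {"SSA": ssa, "SSW": ssw, "SST": ssa + ssw,
            "MSA": msa, "MSW": msw, "F": msa / msw, "df": (c - 1, n - c)}


suppliers = [
    [18.5, 24.0, 17.2, 19.9, 18.0],  # Supplier 1
    [26.3, 25.3, 24.0, 21.2, 24.5],  # Supplier 2
    [20.6, 25.2, 20.8, 24.7, 22.9],  # Supplier 3
    [25.4, 19.9, 22.6, 17.5, 20.4],  # Supplier 4
]
result = one_way_anova(suppliers)
F_U = 3.24  # F table value for 3 and 16 df at alpha = 0.05, as given in the text
print(f"F = {result['F']:.4f}, df = {result['df']}, reject H0: {result['F'] > F_U}")
```

SST is returned as SSA + SSW, using the partition from 7.1.2.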
7.1.7 The Tukey-Kramer Procedure
First compute the absolute differences between all pairs of sample means, |X̄j − X̄j'|, for j ≠ j'. Then compute the critical range for the Tukey-Kramer procedure:

$$\text{Critical range} = Q_U\sqrt{\frac{MSW}{2}\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)}$$

where:
QU = upper-tail critical value from the Studentized range distribution with c degrees of freedom in the numerator and n − c degrees of freedom in the denominator, for the desired level of α
MSW = mean square within
nj and nj' = sample sizes from groups j and j'
A pair of means is considered significantly different if the absolute difference between them is greater than the critical range.
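The Tukey-Kramer comparison for the parachute example can be sketched as below. The value QU = 4.05 is an assumption taken from a standard Studentized range table (c = 4, n − c = 16, α = .05) and is not given in this module; check it against your table. `tukey_kramer` is a hypothetical helper.

```python
from itertools import combinations
from math import sqrt


def tukey_kramer(means, sizes, msw, q_u):
    """Compare every pair of group means with the Tukey-Kramer critical range."""
    results = []
    for j, k in combinations(range(len(means)), 2):
        diff = abs(means[j] - means[k])
        critical_range = q_u * sqrt(msw / 2 * (1 / sizes[j] + 1 / sizes[k]))
        results.append((j + 1, k + 1, diff, critical_range, diff > critical_range))
    return results


means = [19.52, 24.26, 22.84, 21.16]  # supplier sample means from 7.1.6
Q_U = 4.05  # ASSUMPTION: Studentized range table value for c = 4, n - c = 16, alpha = .05
results = tukey_kramer(means, [5, 5, 5, 5], 6.0940, Q_U)
for j, k, diff, crit, significant in results:
    print(f"Supplier {j} vs {k}: |diff| = {diff:.2f}, "
          f"critical range = {crit:.4f}, significant: {significant}")
```

With equal sample sizes the critical range is the same for every pair, so only the pair whose means differ by more than that range (Supplier 1 vs Supplier 2) is flagged.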
7.1.8 ANOVA Assumptions
Randomness and Independence: Select random samples from the c groups (or randomly assign the levels).
Normality: The sample values from each group are from a normal population.
Homogeneity of Variance: Can be tested with Levene's test.
Levene's Test
o Tests the assumption that the variances of each group are equal.
o First, define the null and alternative hypotheses:
H0: σ1² = σ2² = ... = σc²
H1: Not all σj² are equal
o Second, compute the absolute value of the difference between each value and the median of each group.
o Third, perform a one-way ANOVA on these absolute differences.
For the parachute data, F = 0.2068 < 3.2389 (or the p-value = 0.8902 > 0.05), so we do not reject H0. There is no evidence of a significant difference among the four variances. Therefore, the homogeneity of variance assumption for the ANOVA procedure is justified.
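The three Levene steps listed above (absolute deviations from each group median, then a one-way ANOVA on them) can be sketched in a few lines. `levene_f` is a hypothetical helper built only on the standard library; the comparison value 3.2389 is the F critical value quoted in the text.

```python
from statistics import mean, median


def levene_f(groups):
    """Levene's test as described above: one-way ANOVA F on |X - group median|."""
    devs = [[abs(x - median(g)) for x in g] for g in groups]
    values = [d for g in devs for d in g]
    n, c = len(values), len(devs)
    grand_mean = mean(values)
    ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in devs)
    ssw = sum((d - mean(g)) ** 2 for g in devs for d in g)
    return (ssa / (c - 1)) / (ssw / (n - c))


suppliers = [
    [18.5, 24.0, 17.2, 19.9, 18.0],
    [26.3, 25.3, 24.0, 21.2, 24.5],
    [20.6, 25.2, 20.8, 24.7, 22.9],
    [25.4, 19.9, 22.6, 17.5, 20.4],
]
f_levene = levene_f(suppliers)
print(f"Levene F = {f_levene:.4f}, reject H0: {f_levene > 3.2389}")
```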
7.2 Two-Way Analysis of Variance
Examines the effect of:
Two factors of interest on the dependent variable
e.g., percent carbonation and line speed on a soft drink bottling process
Interaction between the different levels of these two factors
e.g., does the effect of one particular carbonation level depend on which level the line speed is set?
7.2.1 Assumptions
Populations are normally distributed
Populations have equal variances
Independent random samples are selected
Sources of Variation

$$SST = SSA + SSB + SSAB + SSE$$

Two factors of interest: A and B
r = number of levels of factor A
c = number of levels of factor B
n′ = number of replications for each cell
n = total number of observations in all cells (n = rcn′)
Xijk = value of the kth observation of level i of factor A and level j of factor B
7.2.2 Two-Way ANOVA: Features
Degrees of freedom always add up: n − 1 = rc(n′ − 1) + (r − 1) + (c − 1) + (r − 1)(c − 1)
Total = error + factor A + factor B + interaction
The denominator of the F test is always the same (MSE), but the numerator is different.
The sums of squares always add up: SST = SSE + SSA + SSB + SSAB
Total = error + factor A + factor B + interaction
7.2.3 Interaction
7.3 CHI-SQUARE AND NONPARAMETRIC TESTS
All of the inferential statistics we have covered in past lessons are what are called parametric statistics. To use these statistics, we make some assumptions about the distributions they come from, such as that they are normally distributed. With parametric statistics we also deal with data for the dependent variable that is at the interval or ratio level of measurement, i.e., test scores or physical measurements.
The parametric statistics we have discussed so far in this course are:
1. the Z-score test
2. the Z test
3. the single-sample t test
4. the independent t test
5. the dependent t test
6. one-way analysis of variance (ANOVA)
We will now consider a widely used nonparametric test, chi-square, which we can use with data at the nominal level, that is, data that are classificatory. For example, we know the frequency with which entering freshmen, when required to purchase a computer for college use, select Macintosh computers, IBM computers, or some other brand of computer. We want to know if there is a difference among the frequencies with which these three brands of computers are selected, or if students choose basically equally among the three brands. This is a problem we can use the chi-square statistic for.
The chi-square statistic is used to compare the observed frequency of some observation (such as the frequency of buying different brands of computers) with an expected frequency (such as buying equal numbers of each brand of computer). The comparison of observed and expected frequencies is used to calculate the value of the chi-square statistic, which in turn can be compared with the distribution of chi-square to make an inference about a statistical problem.
The symbol for chi-square and the formula are as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where
O is the observed frequency, and
E is the expected frequency.
The degrees of freedom for the one-dimensional chi-square statistic is:
df = C − 1
where C is the number of categories or levels of the independent variable.
7.3.1 One-Variable Chi-Square (Goodness-of-Fit Test) with Equal Expected Frequencies
We can use the chi-square statistic to test the distribution of measures over levels of a variable to indicate whether the distribution of measures is the same for all levels. This is the first use of the one-variable chi-square test. This test is also referred to as the goodness-of-fit test.
Return to the example already mentioned: entering freshmen, when required to purchase a computer for college use, select Macintosh computers, IBM computers, or some other brand of computer. We want to know if there is a significant difference among the frequencies with which these three brands of computers are selected, or if the students select equally among the three brands.
The data for 100 students are recorded in the table below (the observed frequencies). We have also indicated the expected frequency for each category. Since there are 100 measures or observations and there are three categories (Macintosh, IBM, and Other), the expected frequency for each category is 100/3, or 33.333. In the last column of the table we have calculated the square of the observed frequency minus the expected frequency, divided by the expected frequency. The sum of that column is the value of the chi-square statistic.
Frequency with which students select computer brand
Computer | Observed Frequency | Expected Frequency | (O − E)²/E
IBM | 47 | 33.333 | 5.604
Macintosh | 36 | 33.333 | 0.213
Other | 17 | 33.333 | 8.003
Total (chi-square) | | | 13.820
From the table we can see that:
χ² = 13.820
df = C − 1 = 3 − 1 = 2
We can compare the obtained value of chi-square with the critical value for the .05 level and 2 degrees of freedom obtained from Appendix Table F (Distribution of Chi-Square) on page 331 of the text. Looking under the column for .05 and the row for df = 2, we see that the critical value for chi-square is 5.991.
We now have the information we need to complete the six-step process for testing statistical hypotheses for our research problem.
1. State the null hypothesis and the alternative hypothesis based on your research question.
Note: Our null hypothesis, for the chi-square test, states that there are no differences between the observed and the expected frequencies. The alternative hypothesis states that there are significant differences between the observed and expected frequencies.
2. Set the alpha level.
Note: As usual we will set our alpha level at .05; we have 5 chances in 100 of making a Type I error.
3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.
χ² = 13.820
df = C − 1 = 2
4. Write the decision rule for rejecting the null hypothesis.
Reject H0 if χ² >= 5.991.
Note: To write the decision rule we had to know the critical value for chi-square, with an alpha level of .05 and 2 degrees of freedom. We can do this by looking at Appendix Table F and noting the tabled value for the column for the .05 level and the row for 2 df.
5. Write a summary statement based on the decision.
Reject H0, p < .05
Note: Since our calculated value of χ² (13.820) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.
6. Write a statement of results in standard English.
There is a significant difference among the frequencies with which students purchased three different brands of computers.
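The goodness-of-fit calculation with equal expected frequencies can be sketched as follows. `chi_square_statistic` is a hypothetical helper, and 5.991 is the table critical value used in the text (the standard library has no chi-square distribution).

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [47, 36, 17]                                     # IBM, Macintosh, Other
expected = [sum(observed) / len(observed)] * len(observed)  # 100/3 for each brand
chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1                                      # C - 1
CRITICAL_05_DF2 = 5.991                                     # table value used in the text
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {chi2 >= CRITICAL_05_DF2}")
```

Keeping the expected value as the exact fraction 100/3, rather than 33.333, gives exactly 13.82.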
7.3.2 One-Variable Chi-Square (Goodness-of-Fit Test) with Predetermined Expected Frequencies
Let's look at the problem we just solved in a way that illustrates the other use of one-variable chi-square, that is, with predetermined expected frequencies rather than equal frequencies. We could formulate our revised problem as follows:
In a national study, students required to buy computers for college use bought IBM computers 50% of the time, Macintosh computers 25% of the time, and other computers 25% of the time. Of 100 entering freshmen we surveyed, 36 bought Macintosh computers, 47 bought IBM computers, and 17 bought some other brand of computer. We want to know if these frequencies of computer-buying behavior are similar to or different from the national study data.
The data for 100 students are recorded in the table below (the observed frequencies). In this case the expected frequencies are those from the national study. To get the expected frequency, we multiply the percentage from the national study by the total number of subjects in the current study.
Expected frequency for IBM = 100 × 50% = 50
Expected frequency for Macintosh = 100 × 25% = 25
Expected frequency for Other = 100 × 25% = 25
The expected frequencies are recorded in the Expected Frequency column of the table. As before, we have calculated the square of the observed frequency minus the expected frequency, divided by the expected frequency, and recorded this result in the last column of the table. The sum of that column is the value of the chi-square statistic.
Frequency with which students select computer brand
Computer | Observed Frequency | Expected Frequency | (O − E)²/E
IBM | 47 | 50 | 0.18
Macintosh | 36 | 25 | 4.84
Other | 17 | 25 | 2.56
Total (chi-square) | | | 7.58
From the table we can see that:
χ² = 7.58
df = C − 1 = 3 − 1 = 2
We can compare the obtained value of chi-square with the critical value for the .05 level and 2 degrees of freedom obtained from Appendix Table F (Distribution of Chi-Square) on page 331 of the text. Looking under the column for .05 and the row for df = 2, we see that the critical value for chi-square is 5.991.
We now have the information we need to complete the six-step process for testing statistical hypotheses for our research problem.
1. State the null hypothesis and the alternative hypothesis based on your research question.
Note: Our null hypothesis, for the chi-square test, states that there are no differences between the observed and the expected frequencies. The alternative hypothesis states that there are significant differences between the observed and expected frequencies.
2. Set the alpha level.
Note: As usual we will set our alpha level at .05; we have 5 chances in 100 of making a Type I error.
3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.
χ² = 7.58
df = C − 1 = 2
4. Write the decision rule for rejecting the null hypothesis.
Reject H0 if χ² >= 5.991.
Note: To write the decision rule we had to know the critical value for chi-square, with an alpha level of .05 and 2 degrees of freedom. We can do this by looking at Appendix Table F and noting the tabled value for the column for the .05 level and the row for 2 df.
5. Write a summary statement based on the decision.
Reject H0, p < .05
Note: Since our calculated value of χ² (7.58) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.
6. Write a statement of results in standard English.
There is a significant difference between the frequencies with which students purchased three different brands of computers and the proportions suggested by a national study.
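With predetermined proportions, only the expected counts change: each is n times the national-study proportion. A minimal sketch; `chi_square_gof` is a hypothetical helper and 5.991 is the table value from the text.

```python
def chi_square_gof(observed, proportions):
    """Goodness-of-fit chi-square with expected counts n * proportion."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


# IBM, Macintosh, Other; national-study proportions 50%, 25%, 25%
chi2, expected = chi_square_gof([47, 36, 17], [0.50, 0.25, 0.25])
print(f"expected = {expected}, chi-square = {chi2:.2f}, reject H0: {chi2 >= 5.991}")
```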
7.3.3 Two-Variable Chi-Square (Test of Independence)
Now let us consider the case of the two-variable chi-square test, also known as the test of independence.
For example, we may wish to know if there is a significant difference in the frequencies with which males come from small, medium, or large cities as contrasted with females. The two variables we are considering here are hometown size (small, medium, or large) and sex (male or female). Another way of putting our research question is: Is gender independent of size of hometown?
The data for 30 females and 6 males are in the following table.
Frequency with which males and females come from small, medium, and large cities
 | Small | Medium | Large | Totals
Female | 10 | 14 | 6 | 30
Male | 4 | 1 | 1 | 6
Totals | 14 | 15 | 7 | 36
The formula for chi-square is the same as before:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where
O is the observed frequency, and
E is the expected frequency.
The degrees of freedom for the two-dimensional chi-square statistic is:
df = (C − 1)(R − 1)
where C is the number of columns or levels of the first variable and R is the number of rows or levels of the second variable.
In the table above we have the observed frequencies (six of them). Now we must calculate the expected frequency for each of the six cells. For two-variable chi-square we find the expected frequencies with the formula:
Expected Frequency for a Cell = (Column Total × Row Total) / Grand Total
In the table above we can see that the column totals are 14 (small), 15 (medium), and 7 (large), while the row totals are 30 (female) and 6 (male). The grand total is 36.
Using the formula, we can thus find the expected frequency for each cell.
1. The expected frequency for the small female cell is 14 × 30/36 = 11.667
2. The expected frequency for the medium female cell is 15 × 30/36 = 12.500
3. The expected frequency for the large female cell is 7 × 30/36 = 5.833
4. The expected frequency for the small male cell is 14 × 6/36 = 2.333
5. The expected frequency for the medium male cell is 15 × 6/36 = 2.500
6. The expected frequency for the large male cell is 7 × 6/36 = 1.167
We can put these expected frequencies in our table and also include the values for (O − E)²/E. The sum of all these will be the value of chi-square.
Observed frequencies, expected frequencies, and (O − E)²/E for males and females from small, medium, and large cities
City size | | Female | Male | Totals
Small | Observed | 10 | 4 | 14
Small | Expected | 11.667 | 2.333 |
Small | (O − E)²/E | 0.238 | 1.191 |
Medium | Observed | 14 | 1 | 15
Medium | Expected | 12.500 | 2.500 |
Medium | (O − E)²/E | 0.180 | 0.900 |
Large | Observed | 6 | 1 | 7
Large | Expected | 5.833 | 1.167 |
Large | (O − E)²/E | 0.005 | 0.024 |
Totals | | 30 | 6 | 36
From the table we can see that:
χ² = 0.238 + 0.180 + 0.005 + 1.191 + 0.900 + 0.024 = 2.538
and df = (C − 1)(R − 1) = (3 − 1)(2 − 1) = (2)(1) = 2
We now have the information we need to complete the six-step process for testing statistical hypotheses for our research problem.
1. State the null hypothesis and the alternative hypothesis based on your research question.
H0: Gender and hometown size are independent. H1: Gender and hometown size are not independent.
2. Set the alpha level.
α = .05
3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.
χ² = 2.538
df = (C − 1)(R − 1) = (2)(1) = 2
4. Write the decision rule for rejecting the null hypothesis.
Reject H0 if χ² >= 5.991.
Note: To write the decision rule we had to know the critical value for chi-square, with an alpha level of .05 and 2 degrees of freedom. We can do this by looking at Appendix Table F and noting the tabled value for the column for the .05 level and the row for 2 df.
5. Write a summary statement based on the decision.
Fail to reject H0
Note: Since our calculated value of χ² (2.538) is not greater than 5.991, we fail to reject the null hypothesis and are unable to accept the alternative hypothesis.
6. Write a statement of results in standard English.
There is not a significant difference in the frequencies with which males come from small, medium, or large towns as compared with females.
We have no evidence that hometown size depends on gender; the data are consistent with hometown size being independent of gender.
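The test of independence can be sketched for any R × C table of counts, using the expected-frequency formula (column total × row total) / grand total. `chi_square_independence` is a hypothetical helper; 5.991 is the table value from the text.

```python
def chi_square_independence(table):
    """Chi-square test-of-independence statistic and df for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected frequency = (column total x row total) / grand total
            expected = col_totals[j] * row_totals[i] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # (R - 1)(C - 1)
    return chi2, df


# Rows: female, male. Columns: small, medium, large hometown.
chi2, df = chi_square_independence([[10, 14, 6], [4, 1, 1]])
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {chi2 >= 5.991}")
```

The unrounded statistic is about 2.537; the 2.538 in the text comes from summing cell contributions that were each rounded to three decimals. The decision is unchanged.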
Chi-square is a useful nonparametric statistic to help evaluate statistical hypotheses involving the frequencies with which observations fall in various categories (nominal data).
7.4 Assignment
7.4.1 Assignment 7.1
[Formula not reproduced] is the formula for:
1. the dependent t test.
2. the independent t test.
3. the one-way analysis of variance test.
4. the Scheffé post hoc test.
[Formula not reproduced] is the formula for:
1. the dependent t test.
2. the independent t test.
3. the one-way analysis of variance test.
4. the Scheffé post hoc test.
[Formula not reproduced] is the formula for:
1. the dependent t test.
2. the independent t test.
3. the one-way analysis of variance test.
4. the Scheffé post hoc test.
For the following research problem: You are concerned with the effect of computers on the quality of written language. You randomly place the 30 students in your English class into two groups of 15 each. The first group is asked to write their next English theme assignment using a word processing program on a computer, while the other group is asked to write their themes by hand. You ask another English teacher to read all 30 themes and give them a 1 (poorest) to 10 (best) rating on the quality of their English usage. You want to know if there is a significant difference in the quality ratings of the two groups.
What is the proper statistical test to use with this research problem?
1. the dependent t test
2. the independent t test
3. the one-sample t test
4. the one-way analysis of variance test
For the following research problem: The number of hours a subject could stay awake was measured as a function of the dose level of a particular drug. Three levels of drug dosage were used. Analyze the results for the data on the dependent variable (number of hours awake) to determine if there was a significant difference among the three levels of drug dosage used.
What is the proper statistical test to use with this research problem?
1. the dependent t test
2. the independent t test
3. the one-sample t test
4. the one-way analysis of variance test
7.4.2 Assignment 7.2
1. An industrial psychologist is interested in evaluating four different types of training on worker productivity. Using a standard measure of productivity, the psychologist measures the productivity of a set of workers who have been trained using each one of the four procedures. Using the data below, determine whether there is a significant difference between the training methods. Larger numbers on the dependent variable indicate higher productivity.

Productivity Scores for Four Groups of Workers Trained by Different Methods
Group 1      Group 2             Group 3    Group 4
On the Job   Computer Assisted   Lecture    Videotape
67           68                  46         39
68           62                  38         47
61           59                  46         49
62           71                  37         46
60           60                  49         48
56           66                  49         53

1. H0:
2. H1:
3. F =
4. F12 =
5. F13 =
6. F23 =
7. Critical value for F =
8. State conditions under which you would reject H0:
2. A school guidance counselor investigates the influence of different motivational devices on the academic achievement of students. The counselor arranges for one group of students to receive immediate feedback upon the completion of an English assignment. A second group of students receives feedback at the end of the day, while a third group receives feedback at the end of the week. Using the students' grades on a standardized English test, determine whether there is a significant difference between the groups. If necessary, perform Scheffé tests.

English Test Results for Groups of Students Receiving Various Types of Feedback
No   Group 1              Group 2              Group 3
     Immediate Feedback   Day's End Feedback   Week's End Feedback
1    49                   40                   36
2    40                   37                   32
3    41                   42                   31
4    46                   39                   39
5    42                   45                   40
6    50                   39                   39
7    53                   45                   41
8    51                   49                   38

1. H0:
2. H1:
3. F =
4. F12 =
5. F13 =
6. F23 =
7. Critical value for F =
8. State conditions under which you would reject H0:
7.4.3 Assignment 7.3
For each of the following problems, state the null hypothesis, the alternate hypothesis, the calculated value of the statistic, the critical value of the statistic, and the conditions under which you would reject the null hypothesis.
1. A sample of 100 people is classified as to their social club membership and their academic status. Is belonging to a social club independent of academic status?

Academic Classification and Social Club Membership for 100 People
Academic classification   Belong to Social Club   Do not Belong to Social Club
Freshman                  9                       16
Sophomore                 11                      14
Junior                    16                      9
Senior                    19                      6

1. H0:
2. H1:
3. χ² =
4. Critical value for χ² =
5. State conditions under which you would reject H0:
2. A consumer research group asked 100 men to use each of three kinds of after-shave lotion for one month. After the trial period, each man indicated the lotion he preferred. Using the results below, determine whether there is a significant preference for any of the three after-shave lotions.

Number of Men Preferring Each of Three After-Shave Lotions
Lotion   Number of Men Preferring
1        42
2        36
3        22

1. H0:
2. H1:
3. χ² =
4. Critical value for χ² =
5. State conditions under which you would reject H0:
[formula not reproduced] is the formula for
1. the dependent t test.
2. the independent t test.
3. the one-way analysis of variance test.
4. the chi-square test.
[formula not reproduced] is the formula for
1. the dependent t test.
2. the independent t test.
3. the one-way analysis of variance test.
4. the chi-square test.
[formula not reproduced] is the formula for
1. the dependent t test.
2. the independent t test.
3. the one-way analysis of variance test.
4. the Scheffé post hoc test.
For the following research problem: You are interested in knowing whether or not the composition of a family is related to the type of vacations they like to take. Accordingly, you collect the following data from a survey of preferred vacations:
Frequencies with which families of various types prefer various vacation types
[Table not reproduced: frequencies by Vacation type and Family Type]
What is the proper statistical test to use with this research problem?
1. the dependent t test
2. the independent t test
3. the one-way analysis of variance test
4. the chi-square test
For the following research problem: Is it really true that people with graduate degrees in certain fields earn substantially less money than people with graduate degrees in certain other fields? To answer this question, you look at data collected by Yuppie University on the salaries earned by recent graduate and professional students.
Salaries for recent graduates of Yuppie University by field of study
Engineering PhD   Humanities PhD   Education PhD   J.D.   M.D.
$40,000
$22,000
$25,000
$40,000 $50,000
$28,000
$24,000
$27,000
$35,000 $43,000
$32,000
$28,000
$31,000
$33,000 $33,000
$36,000
$24,000
$24,000
$36,000 $39,000
$30,000
$27,000
$38,000 $50,000
$32,000
What is the proper statistical test to use with this research problem?
1. the dependent t test
2. the independent t test
3. the one-way analysis of variance test
4. the chi-square test
Practicum: MATH11002
Business Statistics
MODULE 8
Submitted only on
Day/Date: ____________ / ______________
Time: ________ WIB
In ____________________
Module Description: Regression Analysis
Objective: Students understand and are able to use regression analysis to predict the value of a dependent variable based on an independent variable; the meaning of the regression coefficients; making inferences about the slope and correlation coefficient; and estimating mean values and predicting individual values using MS Excel Regression Analysis or other statistical software.
Output: A report of Simple Regression Analysis produced by the students, in the form of working procedures and results, in both softcopy and hardcopy.
Date of Receipt:
Score:
Assistant Signature:
I hereby state that I have done all the work in this module myself.
Name/NIM: ______________________________ / _______________
Signature: _______________________________________________
Rem.:
Regression Analysis
8.1 Simple Regression Analysis
The field of econometrics uses regression analysis to create quantitative models that can be used to predict the value of a series if one knows the values of several other variables.
This analysis tool performs linear regression analysis by using the "least squares" method to fit a line through a set of observations. You can analyze how a single dependent variable is affected by the values of one or more independent variables.
For example, the wage per hour can be predicted if one knows the values of the variables that constitute the regression equation. This is a big leap of faith from a correlation or a confidence interval estimate. In a correlation, the statistician is not presuming or implying any causality. On the other hand, regression analysis is used so often (probably even abused) because of its supposed ability to link cause and effect. Skepticism of causal relationships is not only healthy but also important, because the real power of regression lies in a comprehensive interpretation of the results.
8.2 Regression Analysis Using Excel
Before using any analysis tool, you must arrange the data you want to analyze in columns or rows on your worksheet. This will be your input range. Once the data is set, you can open the analysis tool, in this case Regression: Tools > Data Analysis > Regression.
8.3 Regression Dialog Box
Input Y Range: Enter the reference for the range of dependent data. The range must consist of a single column of data. You can type in the reference or use the Collapse ("go out and get it") button. This will collapse your window so that you can select the data you wish to use. Once you have chosen your desired data, either press Enter or click the Expand button.
Input X Range: Enter the reference for the range of independent data. Microsoft Excel orders independent variables from this range in ascending order from left to right. The maximum number of independent variables is 16.
Labels: Select if the first row or column of your input range or ranges contains labels. Clear if your input has no labels; Excel generates appropriate data labels for the output table.
Confidence Level: Select to include an additional level in the summary output table. In the box, enter the confidence level you want applied in addition to the default 95 percent level.
Constant is Zero: Select to force the regression line to pass through the origin.
Output Range: Enter the reference for the upper-left cell of the output table. Allow at least seven columns for the summary output table, which includes an ANOVA table, coefficients, standard error of y estimate, r² values, number of observations, and standard error of coefficients.
New Worksheet Ply: Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.
New Workbook: Click to create a new workbook and paste the results in the new workbook.
Residuals: Select to include residuals in the residuals output table.
Standardized Residuals: Select to include standardized residuals in the residuals output table.
Residual Plots: Select to generate a chart for each independent variable versus the residual.
Line Fit Plots: Select to generate a chart for predicted values versus the observed values.
Normal Probability Plots: Select to generate a chart that plots normal probability.
8.4 Simple Regression
8.5 Linear Correlation and Regression Analysis
In this section the objective is to see whether there is a correlation between two variables and to find a model that predicts one variable in terms of the other. There are many examples we could mention, but we will use a popular one from the world of business. Usually the independent variable is represented by the letter x and the dependent variable by the letter y. A businessman would like to see whether there is a relationship between the number of cases of soda sold and the temperature on a hot summer day, based on information taken from the past. He would also like to estimate the number of cases of soda that will be sold on a particular hot summer day at a ball game. He recorded the temperatures and the number of cases of soda sold on those particular days. The following table shows the recorded data from June 1 through June 13. The weatherman predicts a temperature of 94°F for June 14. The businessman would like to meet all demand for cases of soda ordered by customers on June 14.
DAY      Cases of Soda   Temperature (°F)
1-Jun    57              56
2-Jun    59              58
3-Jun    65              63
4-Jun    67              66
5-Jun    75              73
6-Jun    81              78
7-Jun    86              85
8-Jun    88              85
9-Jun    88              87
10-Jun   84              84
11-Jun   82              88
12-Jun   80              84
13-Jun   83              89
Now let's use Excel to find the linear correlation coefficient and the regression line equation. The linear correlation coefficient is a quantity between −1 and +1, denoted by R. The closer R is to +1, the stronger the positive (direct) correlation; similarly, the closer R is to −1, the stronger the negative (inverse) correlation between the two variables. The general form of the regression line is y = mx + b. In this formula, m is the slope of the line and b is the y-intercept. You can find these quantities from the Excel output. In this situation the variable y (the dependent variable) is the number of cases of soda and x (the independent variable) is the temperature. To obtain the Excel output, take the following steps:
Step 1. From the menus choose Tools and click on Data Analysis.
Step 2. When the Data Analysis dialog box appears, click on Correlation.
Step 3. When the Correlation dialog box appears, enter B1:C14 in the Input Range box. Click on Labels in First Row and enter A16 in the Output Range box. Click on OK.
As you can see, the correlation between the number of cases of soda demanded and the temperature is a very strong positive correlation. This means that as the temperature increases, the demand for cases of soda also increases. The linear correlation coefficient is 0.966598577, which is very close to +1.
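The same coefficient can be checked outside Excel. The following Python sketch (standard library only; the helper name `pearson_r` is our own) computes the Pearson correlation between temperature and cases of soda from the table above; it should agree with Excel's 0.966598577 up to rounding.

```python
import math

# Data from the table above (June 1 to June 13)
cases = [57, 59, 65, 67, 75, 81, 86, 88, 88, 84, 82, 80, 83]          # y: cases of soda sold
temperature = [56, 58, 63, 66, 73, 78, 85, 85, 87, 84, 88, 84, 89]    # x: temperature (°F)


def pearson_r(x, y):
    """Pearson linear correlation coefficient R between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(temperature, cases)
print(f"R = {r:.9f}")
```

The correlation is symmetric, so `pearson_r(cases, temperature)` gives the same value; only the regression in the next step depends on which variable is treated as y.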
Now let's follow the same steps, slightly modified, to find the regression equation.
Step 1. From the menus choose Tools and click on Data Analysis.
Step 2. When the Data Analysis dialog box appears, click on Regression.
Step 3. When the Regression dialog box appears, enter B1:B14 in the Input Y Range box and C1:C14 in the Input X Range box. Click on Labels.
Step 4. Enter A19 in the Output Range box.
Note: The regression equation in general should look like Y = mX + b. In this equation m is the slope of the regression line and b is its y-intercept.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.966598577
R Square             0.934312809
Adjusted R Square    0.928341246
Standard Error       2.919383191
Observations         13

ANOVA
             df    SS             MS             F    Significance F
Residual     11    93.75078034    8.522798213
Total        12    1427.230769

              Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept     9.17800767      5.445742836
Temperature   0.879202711
The relationship between the number of cases of soda and the temperature is:
Y = 0.879202711 X + 9.17800767
That is, number of cases of soda = 0.879202711 × (Temperature) + 9.17800767. Using this expression we can approximately predict the number of cases of soda needed on June 14. The weather forecast for that day is 94 degrees, hence the number of cases of soda needed is:
Number of cases of soda = 0.879202711 × (94) + 9.17800767 = 91.82, or about 92 cases.
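As a cross-check on Excel's Regression tool, the least-squares slope and intercept can be computed directly from m = Sxy/Sxx and b = ȳ − m·x̄, where Sxy and Sxx are the sums of cross-products and squared deviations about the means. This is a minimal sketch assuming the June 1 to 13 data above, not a replacement for the Excel output.

```python
# Data from the table above (June 1 to June 13)
cases = [57, 59, 65, 67, 75, 81, 86, 88, 88, 84, 82, 80, 83]          # y
temperature = [56, 58, 63, 66, 73, 78, 85, 85, 87, 84, 88, 84, 89]    # x

n = len(temperature)
mean_x = sum(temperature) / n
mean_y = sum(cases) / n

# Least-squares estimates: m = Sxy / Sxx and b = mean_y - m * mean_x
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperature, cases))
sxx = sum((x - mean_x) ** 2 for x in temperature)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual sum of squares, comparable to the Residual SS in the ANOVA table
residual_ss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(temperature, cases))

forecast_temperature = 94  # weather forecast for June 14 (°F)
predicted_cases = slope * forecast_temperature + intercept
print(f"Y = {slope:.9f} X + {intercept:.8f}")
print(f"Predicted cases for June 14: {predicted_cases:.2f}")
```

Rounding the prediction up rather than to the nearest whole case is a business choice: the businessman wants to meet all demand, so he would order at least 92 cases.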
Practicum: MATH11002
Business Statistics
MODULE 9
Submitted only on
Day/Date: ____________ / ______________
Time: ________ WIB
In ____________________
Module Description: Multiple Regression
Objective: How to develop a multiple regression model; how to interpret the regression coefficients; how to determine which independent variables are most important in predicting a dependent variable; how to use quadratic terms in a regression model; how to measure the correlation among independent variables.
Output: A report of Multiple Regression Analysis produced by the students, in the form of working procedures and results, in both softcopy and hardcopy.
Date of Receipt:
Score:
Assistant Signature:
I hereby state that I have done all the work in this module myself.
Name/NIM: ______________________________ / _______________
Signature: _______________________________________________
Rem.:
9 Multiple Regression Model
Multiple regression is an extension of simple regression. Simple regression has only one independent (explanatory) variable. Multiple regression fits a model for one dependent (response) variable based on more than one independent (explanatory) variable.
9.1 MULTIPLE REGRESSION USING THE DATA ANALYSIS ADD-IN
We then create a new variable in cells C2:C6, cubed household size, as a regressor. Then in cell C1 give the heading CUBED HH SIZE. (It turns out that for these data squared HH SIZE has a coefficient of exactly 0.0, so the cube is used instead.)
The spreadsheet cells A1:C6 should look like:
We fit a regression with an intercept and the regressors HH SIZE (x2) and CUBED HH SIZE (x3):
y = b1 + b2 x2 + b3 x3
The only change over one-variable regression is to include more than one column in the
Input X Range.
Note, however, that the regressors need to be in contiguous columns (here columns B and
C).
If this is not the case in the original data, then columns need to be copied to get the
regressors in contiguous columns.
Clicking OK, we obtain output in three parts:
Regression statistics table
ANOVA table
Regression coefficients table
9.2 INTERPRET REGRESSION STATISTICS TABLE
This part of the output is shown below. Of greatest interest is R Square.

                  Value       Explanation
Multiple R        0.895828    R = square root of R²
R Square          0.802508    R²
Standard Error    0.444401    Sample estimate of the standard deviation of the error u
Observations      5           Number of observations used in the regression (n)

R² = 0.8025 means that 80.25% of the variation of yi around ȳ (its mean) is explained by the regressors x2i and x3i.
9.3 INTERPRET ANOVA TABLE
An ANOVA table is given. This is often skipped.

              df    SS        MS        F         Significance F
Regression    2     1.6050    0.8025    4.0635    0.1975
Residual      2     0.3950    0.1975
Total         4     2.0

The ANOVA (analysis of variance) table splits the sum of squares into its components:
Total sum of squares = Residual (or error) sum of squares + Regression (or explained) sum of squares.
Thus Σi (yi − ȳ)² = Σi (yi − ŷi)² + Σi (ŷi − ȳ)²,
where ŷi is the value of yi predicted from the regression line and ȳ is the sample mean of y.
For example:
R² = 1 − Residual SS / Total SS (general formula for R²)
   = 1 − 0.3950 / 2.0 (from data in the ANOVA table)
   = 0.8025 (which equals R² given in the Regression Statistics table).
The column labeled F gives the overall F-test of H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero.
Aside: Excel computes F as:
F = [Regression SS/(k − 1)] / [Residual SS/(n − k)] = [1.6050/2] / [0.39498/2] = 4.0635.
The column labeled Significance F has the associated p-value.
Since 0.1975 > 0.05, we do not reject H0 at significance level 0.05.
Note: Significance F in general = FDIST(F, k − 1, n − k), where k is the number of regressors including the intercept.
Here FDIST(4.0635, 2, 2) = 0.1975.
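The arithmetic in this section can be reproduced in a few lines of Python. This sketch takes the sums of squares from the ANOVA table as given. For the p-value it uses the fact that an F distribution with (2, 2) degrees of freedom has upper-tail probability 1/(1 + F); that shortcut is specific to this example's degrees of freedom, and for other cases you would use FDIST in Excel or a statistics library.

```python
# Sums of squares read from the ANOVA table above
regression_ss = 1.6050
residual_ss = 0.3950
total_ss = regression_ss + residual_ss  # equals the Total SS of 2.0

n = 5  # observations
k = 3  # regressors, including the intercept

r_squared = 1 - residual_ss / total_ss
f_stat = (regression_ss / (k - 1)) / (residual_ss / (n - k))

# Shortcut valid ONLY for an F distribution with (2, 2) degrees of freedom:
# P(F > f) = 1 / (1 + f). For other degrees of freedom use FDIST(F, k-1, n-k) in Excel.
assert (k - 1, n - k) == (2, 2)
significance_f = 1 / (1 + f_stat)

print(f"R^2 = {r_squared:.4f}")
print(f"F = {f_stat:.4f}, Significance F = {significance_f:.4f}")
```

Because the sketch uses the rounded residual SS of 0.3950 rather than Excel's unrounded 0.39498, its F comes out as about 4.0633 instead of 4.0635; the p-value still rounds to 0.1975.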
9.4 INTERPRET REGRESSION COEFFICIENTS TABLE
The regression output of most interest is the following table of coefficients and associated output:

               Coefficient   St. error   t Stat   P-value   Lower 95%   Upper 95%
Intercept      0.89655       0.76440     1.1729   0.3616    −2.3924     4.1855
HH SIZE        0.33647       0.42270     0.7960   0.5095    −1.4823     2.1552
CUBED HH SIZE  0.00209       0.01311     0.1594   0.8880    −0.0543     0.0585

Let βj denote the population coefficient of the jth regressor (intercept, HH SIZE and CUBED HH SIZE). Then:
Column "Coefficient" gives the least squares estimates bj of βj.
Column "Standard error" gives the standard errors (i.e. the estimated standard deviations) of the least squares estimates bj of βj.
Column "t Stat" gives the computed t-statistic for H0: βj = 0 against Ha: βj ≠ 0. This is the coefficient divided by the standard error. It is compared to a t distribution with (n − k) degrees of freedom, where here n = 5 and k = 3.
Column "P-value" gives the p-value for the test of H0: βj = 0 against Ha: βj ≠ 0. This equals Pr{|t| > t Stat}, where t is a t-distributed random variable with n − k degrees of freedom and t Stat is the computed value of the t-statistic given in the previous column. Note that this p-value is for a two-sided test. For a one-sided test, divide this p-value by 2 (also checking the sign of the t Stat).
Columns "Lower 95%" and "Upper 95%" define a 95% confidence interval for βj.
A simple summary of the above output is that the fitted line is
ŷ = 0.89655 + 0.33647 × HH SIZE + 0.00209 × CUBED HH SIZE
9.5 CONFIDENCE INTERVALS FOR SLOPE COEFFICIENTS
The 95% confidence interval for slope coefficient β2 is, from the Excel output, (−1.4823, 2.1552).
Excel computes this as
b2 ± t_.025(2) × se(b2)
= 0.33647 ± TINV(0.05, 2) × 0.42270
= 0.33647 ± 4.303 × 0.42270
= 0.33647 ± 1.8189
= (−1.4823, 2.1552).
Other confidence intervals can be obtained. For example, to find 99% confidence intervals, check the Confidence Level box in the Regression dialog box (in the Data Analysis Add-in) and set the level to 99%.
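A sketch of the same interval calculation in Python follows. Here the two-tailed critical value is computed rather than looked up: for exactly 2 degrees of freedom the t distribution has a closed-form quantile, t = √(2c²/(1 − c²)) with c = 1 − α, which gives about 4.303 for α = 0.05 and 9.925 for α = 0.01. The helper `t_crit_df2` is our own name and is valid only for 2 degrees of freedom; for other cases use TINV in Excel.

```python
import math


def t_crit_df2(alpha):
    """Two-tailed critical value of Student's t with exactly 2 degrees of freedom.

    Closed form: t = sqrt(2c^2 / (1 - c^2)) with c = 1 - alpha.
    Matches TINV(alpha, 2) in Excel; NOT valid for other degrees of freedom.
    """
    c = 1 - alpha
    return math.sqrt(2 * c ** 2 / (1 - c ** 2))


b2 = 0.33647     # coefficient of HH SIZE
se_b2 = 0.42270  # its standard error

intervals = {}
for level in (0.95, 0.99):
    t_crit = t_crit_df2(1 - level)
    margin = t_crit * se_b2  # half-width of the interval
    intervals[level] = (b2 - margin, b2 + margin)
    low, high = intervals[level]
    print(f"{level:.0%} CI for beta2: ({low:.4f}, {high:.4f})  [t = {t_crit:.3f}]")
```

The 99% interval is much wider than the 95% one because, with only 2 degrees of freedom, the t critical value more than doubles.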
9.6 TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")
The coefficient of HH SIZE has an estimated standard error of 0.4227, a t-statistic of 0.7960 and a p-value of 0.5095. It is therefore statistically insignificant at significance level α = .05, as p > 0.05.
The coefficient of CUBED HH SIZE has an estimated standard error of 0.0131, a t-statistic of 0.1594 and a p-value of 0.8880. It is therefore statistically insignificant at significance level α = .05, as p > 0.05.
There are 5 observations and 3 regressors (the intercept, HH SIZE and CUBED HH SIZE), so we use t(5 − 3) = t(2).
For example, for HH SIZE, p = TDIST(0.796, 2, 2) = 0.5095.
9.7 TEST HYPOTHESIS ON A REGRESSION PARAMETER
Here we test whether HH SIZE has coefficient β2 = 1.0.
Example: H0: β2 = 1.0 against Ha: β2 ≠ 1.0 at significance level α = .05.
Then
t = (b2 − H0 value of β2) / (standard error of b2)
  = (0.33647 − 1.0) / 0.42270
  = −1.569.
9.7.1 Using the p-value approach
p-value = TDIST(1.569, 2, 2) = 0.257. [Here n = 5 and k = 3, so n − k = 2.]
Do not reject the null hypothesis at level .05, since the p-value is > 0.05.
9.7.2 Using the critical value approach
We computed t = −1.569.
The critical value is t_.025(2) = TINV(0.05, 2) = 4.303. [Here n = 5 and k = 3, so n − k = 2.]
So do not reject the null hypothesis at level .05, since |t| = 1.569 < 4.303.
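Both approaches can be checked with the sketch below. It relies on the closed-form two-sided p-value for a t distribution with exactly 2 degrees of freedom, p = 1 − |t|/√(2 + t²), which matches TDIST(|t|, 2, 2); the function name `two_sided_p_df2` is hypothetical and the shortcut does not apply to other degrees of freedom.

```python
import math


def two_sided_p_df2(t):
    """Two-sided p-value for Student's t with exactly 2 degrees of freedom.

    Closed form 1 - |t| / sqrt(2 + t^2); equals TDIST(|t|, 2, 2) in Excel.
    """
    return 1 - abs(t) / math.sqrt(2 + t * t)


b2 = 0.33647       # estimate of beta2 (HH SIZE)
se_b2 = 0.42270    # standard error of b2
beta2_null = 1.0   # value of beta2 under H0
alpha = 0.05
t_crit = 4.303     # TINV(0.05, 2), as in the text

t_stat = (b2 - beta2_null) / se_b2
p_value = two_sided_p_df2(t_stat)

reject_by_p = p_value < alpha              # p-value approach
reject_by_critical = abs(t_stat) > t_crit  # critical value approach

print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")
print("reject H0" if reject_by_p else "do not reject H0", "(p-value approach)")
print("reject H0" if reject_by_critical else "do not reject H0", "(critical value approach)")
```

The same helper also reproduces the zero-slope p-value of Section 9.6: `two_sided_p_df2(0.7960)` is about 0.5095.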
9.8 OVERALL TEST OF SIGNIFICANCE OF THE REGRESSION PARAMETERS
We test H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero.
From the ANOVA table, the F-test statistic is 4.0635 with a p-value of 0.1975.
Since the p-value is not less than 0.05, we do not reject the null hypothesis that the regression parameters are zero at significance level 0.05.
Conclude that the parameters are jointly statistically insignificant at significance level 0.05.
Note: Significance F in general = FDIST(F, k − 1, n − k), where k is the number of regressors including the intercept.
Here FDIST(4.0635, 2, 2) = 0.1975.
9.9 PREDICTED VALUE OF Y GIVEN REGRESSORS
Consider the case where x = 4, in which case CUBED HH SIZE = x³ = 4³ = 64. Using the coefficients from the table in Section 9.4:
ŷ = b1 + b2 x2 + b3 x3 = 0.89655 + 0.33647 × 4 + 0.00209 × 64 = 2.3762
9.10 EXCEL LIMITATIONS
Excel requires that all the regressor variables be in adjoining columns. You may need to move columns to ensure this; e.g., if the regressors are in columns B and D, you need to copy at least one of columns B and D so that they are adjacent to each other.
Excel standard errors, t-statistics and p-values are based on the assumption that the error is independent with constant variance (homoskedastic).
Excel does not provide alternatives, such as heteroskedasticity-robust or autocorrelation-robust standard errors, t-statistics and p-values.
9.11 Assignment 9.1
DATA:
Store  Bars Sold  Price (cents)  Promotion ($)    Store  Bars Sold  Price (cents)  Promotion ($)
1      4141       59             200              18     2730       79             400
2      3842       59             200              19     2618       79             400
3      3056       59             200              20     4421       79             400
4      3519       59             200              21     4113       79             600
5      4226       59             400              22     3746       79             600
6      4630       59             400              23     3532       79             600
7      3507       59             400              24     3825       79             600
8      3754       59             400              25     1096       99             200
9      5000       59             600              26     761        99             200
A sample of 34 stores in a supermarket chain is selected for a test-market study of OmniPower. All the stores selected have approximately the same monthly sales volume. The two independent variables are the price of a bar (X1) and the monthly promotion (advertising) expenditures (X2).
a. Use Excel Data Analysis Regression to estimate the regression line.
b. Interpret the regression statistics table.
c. Use 95% and 99% confidence intervals.
d. Test hypothesis of zero slope coefficient ("test of statistical significance").
e. Test hypothesis on a regression parameter:
   i. Using the p-value approach
   ii. Using the critical value approach
f. Overall test of significance of the regression parameters.
g. Predicted value of Y given a price of 89 cents and a promotion of $800.
Practicum: MATH11002
Business Statistics
MODULE 10
Submitted only on
Day/Date: ____________ / ______________
Time: ________ WIB
In ____________________
Module Description: Time Series Forecasting
Objective: Discussed the importance of forecasting; performed smoothing of data series; described least-squares trend fitting and forecasting; addressed time series forecasting; addressed autoregressive models; described the procedure for choosing appropriate models.
Output:
Date of Receipt:
Score:
Assistant Signature:
I hereby state that I have done all the work in this module myself.
Name/NIM: ______________________________ / _______________
Signature: _______________________________________________
Rem.:
10 Time Series Forecasting
Time series analysis has two main goals:
* Identifying the nature of a sequence of observations.
* Predicting future values using historical observations (also known as forecasting).
In time series analysis, it is assumed that the data consist of a systematic pattern and also random noise that makes the pattern difficult to identify. Most time series analysis techniques use filtering to remove the noise from the data. There are two general components of time series patterns: trend and seasonality. The trend is a linear or nonlinear component that does not repeat within the time range. Seasonality repeats itself in systematic intervals over time. These two components are often both present in real data.
Trend Analysis
Trend analysis is a technique used to identify a trend component in time series data. In many cases the data can be approximated by a linear function, but logarithmic, exponential, and polynomial functions can also be used.
Regression Analysis
Regression analysis is the study of relationships among variables, and its purpose is to predict, or estimate, the value of one variable from the known values of other variables related to it. Any method of fitting equations to data may be called regression, and these equations are useful for making predictions and judging the strength of relationships.
Forecasting and extrapolation from present values to future values is not a function of regression analysis. To predict the future, time series analysis is used. To predict values it is necessary to find a predictive function that minimizes the distances between the points and the predictive function itself. The least-squares method is the most common way of finding such a function: it chooses the function that minimizes the sum of squared deviations between the points and the estimated function.
10.1 Time series forecasting models
The basic assumption of time series forecasting is that the factors that have influenced activities in the past and present will continue to do so in approximately the same way in the future. A trend is an overall long-term upward or downward movement in a time series. The most basic model is the classical multiplicative model, which has versions for annual, quarterly, and monthly data.
10.1.1 CLASSICAL MULTIPLICATIVE TIME SERIES MODEL FOR ANNUAL DATA
$Y_i = T_i \times C_i \times I_i$
Where:
$T_i$ = value of the trend component in year i
$C_i$ = value of the cyclical component in year i
$I_i$ = value of the irregular component in year i
CLASSICAL MULTIPLICATIVE TIME SERIES MODEL FOR DATA WITH A SEASONAL COMPONENT
$Y_i = T_i \times S_i \times C_i \times I_i$
Where:
$T_i$, $C_i$, $I_i$ = values of the trend, cyclical, and irregular components in time period i
$S_i$ = value of the seasonal component in time period i
Use the Wrigley coded data below to create an Excel chart plot of actual gross revenue.
Year   Actual Revenue ($millions)   Year   Actual Revenue ($millions)
1984   591                          1995   1770
1985   620                          1996   1851
1986   699                          1997   1954
1987   781                          1998   2023
1988   891                          1999   2079
1989   993                          2000   2146
1990   1111                         2001   2430
1991   1149                         2002   2746
1992   1301                         2003   3069
1993   1440                         2004   3649
1994   1661                         2005   4159

[Figure: Actual gross revenue ($millions) by year, 1984 to 2005, with linear trendline y = 143.63x − 284695, R² = 0.9121]
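The trendline on the chart is an ordinary least-squares fit of revenue on the calendar year. A minimal Python sketch, assuming the 1984 to 2005 actual revenue figures in the table above, reproduces its slope, intercept and R² up to rounding.

```python
years = list(range(1984, 2006))
revenue = [591, 620, 699, 781, 891, 993, 1111, 1149, 1301, 1440, 1661,
           1770, 1851, 1954, 2023, 2079, 2146, 2430, 2746, 3069, 3649, 4159]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(revenue) / n

# Least-squares slope and intercept of revenue on year
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, revenue))
sxx = sum((x - mean_x) ** 2 for x in years)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Coefficient of determination of the fitted trend
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, revenue))
ss_tot = sum((y - mean_y) ** 2 for y in revenue)
r_squared = 1 - ss_res / ss_tot

print(f"y = {slope:.2f}x {intercept:+.0f}, R^2 = {r_squared:.4f}")
```

The large negative intercept only reflects the use of raw calendar years as x; coding the years (for example 1984 = 0), as done later in this module, gives the same slope with an interpretable intercept.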
10.1.2 Assignment 10.1

Year   Population (thousands)   Workforce (thousands)
1984   176,383                  113,544
1985   178,206                  115,461
1986   180,587                  117,834
1987   182,753                  119,865
1988   184,613                  121,669
1989   186,393                  123,869
1990   189,164                  125,840
1991   190,925                  126,346
1992   192,805                  128,105
1993   194,838                  129,200
1994   196,814                  131,056
1995   198,584                  132,304
1996   200,591                  133,943
1997   203,133                  136,297
1998   205,220                  137,673
1999   207,753                  139,368
2000   212,577                  142,583
2001   215,092                  143,734
2002   217,570                  144,863
2003   221,168                  146,510
2004   223,357                  146,817
2005   226,082                  147,956

a. Plot, using MS Excel, the time series for the US civilian noninstitutional population of people 16 years and older.
b. Compute the linear trend forecasting equation.
c. Forecast the US civilian noninstitutional population of people 16 years and older for 2006 and 2007.
d. Repeat (a) through (c) for the US civilian noninstitutional workforce of people 16 years and older.
10.2 Moving Average and Exponential Smoothing
10.2.1 Moving Average Models
Use the Add Trendline option to analyze a moving average forecasting model in Excel. You must first create a graph of the time series you want to analyze. Select the range that contains your data and make a scatter plot of the data. Once the chart is created, follow these steps:
1. Click on the chart to select it, and click on any point on the line to select the data series. When you click on the chart to select it, a new option, Chart, is added to the menu bar.
2. From the Chart menu, select Add Trendline.
Moving averages for a chosen period of length L consist of a series of means computed over time such that each mean is calculated for a sequence of L observed values. Moving averages are represented by the symbol MA(L). For example, suppose we have 11 years of data and want to compute five-year moving averages (L = 5).
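Data for the 11-year period 1996 to 2006:
4.0, 5.0, 7.0, 6.0, 8.0, 9.0, 5.0, 2.0, 3.5, 5.5, 6.6
MA(5) = (Y1 + Y2 + Y3 + Y4 + Y5)/L = (4.0 + 5.0 + 7.0 + 6.0 + 8.0)/5 = 6.0
Place the moving average computed above at the middle value of its window (7.0, the third observation). Calculating the remaining MA(5) values, we have:

Year   Revenue   MA(5)
1996   4.0
1997   5.0
1998   7.0       6.0
1999   6.0       7.0
2000   8.0       7.0
2001   9.0       6.0
2002   5.0       5.5
2003   2.0       5.0
2004   3.5       4.5
2005   5.5
2006   6.6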
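The centering rule above can be sketched in Python as follows. This is an illustration assuming an odd window length L, so that each average sits on the middle observation of its window; the function name `centered_moving_average` is our own, and positions where a full window does not fit are left as `None` (the Cabot table below shows #N/A there).

```python
def centered_moving_average(values, length):
    """Centered moving average MA(L) for an odd window length L.

    Positions where a full window does not fit are returned as None.
    """
    if length < 1 or length % 2 == 0:
        raise ValueError("this sketch handles odd window lengths only")
    half = length // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        result[i] = sum(window) / length
    return result


data = [4.0, 5.0, 7.0, 6.0, 8.0, 9.0, 5.0, 2.0, 3.5, 5.5, 6.6]  # 1996 to 2006
ma5 = centered_moving_average(data, 5)
for year, value, ma in zip(range(1996, 2007), data, ma5):
    print(year, value, "" if ma is None else round(ma, 2))
```

The last MA(5) value is 4.52 before rounding, which the table shows as 4.5.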
The following are the three-year and seven-year moving averages for Cabot Corporation revenues ($millions):

Year   Revenue   MA 3-Year   MA 7-Year
1982   1588      #N/A        #N/A
1983   1558      1633.0      #N/A
1984   1753      1573.0      #N/A
1985   1408      1490.3      1531.1
1986   1310      1380.7      1581.0
1987   1424      1470.3      1599.1
1988   1677      1679.3      1561.3
1989   1937      1766.3      1583.3
1990   1685      1703.3      1627.4
1991   1488      1578.3      1665.0
1992   1562      1556.3      1688.4
1993   1619      1622.7      1678.1
1994   1687      1715.7      1671.3
1995   1841      1797.7      1694.9
1996   1865      1781.0      1714.4
1997   1637      1718.3      1725.7
1998   1653      1663.0      1702.3
1999   1699      1683.3      1661.7
2000   1698      1640.0      1651.7
2001   1523      1592.7      1694.1
2002   1557      1625.0      1761.6
2003   1795      1762.0      #N/A
2004   1934      1951.3      #N/A
2005   2125      #N/A        #N/A

[Figure: Cabot Corporation revenues ($millions) by year, 1982 to 2005, with the 3-year and 7-year moving averages]
10.2.2 Exponential Smoothing Models
The simplest way to analyze a time series using an exponential smoothing model in Excel is to use the data analysis tool. This tool works almost exactly like the one for Moving Average, except that instead of the number of periods you enter a damping factor, which equals 1 − W (one minus the smoothing coefficient). Once you have entered the data range and the damping factor, and indicated what output you want and its location, the analysis is the same as the one for the Moving Average model.
COMPUTING AN EXPONENTIALLY SMOOTHED VALUE IN TIME PERIOD i
$E_1 = Y_1$
$E_i = W Y_i + (1 - W) E_{i-1}, \quad i = 2, 3, 4, \ldots$
Where
$E_i$ = value of the exponentially smoothed series being computed in time period i
$E_{i-1}$ = value of the exponentially smoothed series already computed in time period i − 1
$Y_i$ = observed value of the time series in period i
$W$ = subjectively assigned weight or smoothing coefficient ($0 < W < 1$)
Year   Revenue   ES (W = .50)   ES (W = .25)
1982   1588      1588.0         1588.0
1983   1558      1573.0         1580.5
1984   1753      1663.0         1623.6
1985   1408      1535.5         1569.7
1986   1310      1422.8         1504.8
1987   1424      1423.4         1484.6
1988   1677      1550.2         1532.7
1989   1937      1743.6         1633.8
1990   1685      1714.3         1646.6
1991   1488      1601.1         1606.9
1992   1562      1581.6         1595.7
1993   1619      1600.3         1601.5
1994   1687      1643.6         1622.9
1995   1841      1742.3         1677.4
1996   1865      1803.7         1724.3
1997   1637      1720.3         1702.5
1998   1653      1686.7         1690.1
1999   1699      1692.8         1692.3
2000   1698      1695.4         1693.8
2001   1523      1609.2         1651.1
2002   1557      1583.1         1627.5
2003   1795      1689.1         1669.4
2004   1934      1811.5         1735.6
2005   2125      1968.3         1832.9

[Figure: Revenues ($millions) by year, 1982 to 2005, with the exponentially smoothed series for W = .50 and W = .25]
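The recursion can be written directly in Python. This sketch assumes the revenue series in the table above and the same starting rule E1 = Y1; the function name `exponential_smoothing` is our own. Excel's Exponential Smoothing tool asks for the damping factor, which corresponds to 1 − W in this notation.

```python
def exponential_smoothing(values, w):
    """E1 = Y1; Ei = W * Yi + (1 - W) * E(i-1) for i = 2, 3, ..."""
    if not 0 < w < 1:
        raise ValueError("W must lie strictly between 0 and 1")
    smoothed = [values[0]]  # E1 = Y1
    for y in values[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


revenue = [1588, 1558, 1753, 1408, 1310, 1424, 1677, 1937, 1685, 1488, 1562, 1619,
           1687, 1841, 1865, 1637, 1653, 1699, 1698, 1523, 1557, 1795, 1934, 2125]  # 1982 to 2005
es50 = exponential_smoothing(revenue, 0.50)
es25 = exponential_smoothing(revenue, 0.25)

for year, y, e50, e25 in zip(range(1982, 2006), revenue, es50, es25):
    print(f"{year}  {y:5d}  {e50:7.1f}  {e25:7.1f}")
```

A larger W tracks the latest observations more closely, while a smaller W (such as 0.25) gives a smoother series that reacts more slowly, which is visible when the two columns are compared.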
10.3 Assignment 10.2
Year   Deals
1995   715
1996   865
1997   708
1998   861
1999   931
2000   939
2001   1031
2002   893
2003   735
2004   759
2005   1013
2006   622
a. Plot the time series.
b. Fit a three-year moving average to the data and plot the results.
c. Using a smoothing coefficient of W = 0.50, exponentially smooth the series and plot the results.
d. Repeat (c) using W = 0.25.
e. Compare the results of (c) and (d).
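10.4 Linear, exponential and quadratic trend
10.4.1 Linear Trend Model
The linear trend model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ is the simplest forecasting model.
Using the Wrigley data above, we plot the time series of real gross revenues with Microsoft Excel, as shown below.
Using Microsoft Excel, we perform a simple linear regression analysis on the adjusted time series, which results in the following linear trend forecasting equation:
$\hat{Y}_i = 469.9158 + 62.1068 X_i$
The regression coefficients can be interpreted as follows:
The Y intercept, $b_0 = 469.9158$, is the fitted trend value of real gross revenues ($millions) in the base year 1984 (X = 0).
The slope, $b_1 = 62.1068$, indicates that real gross revenues increase by an estimated $62.1 million per year.
For example, to project the trend in 2006, substitute X = 22 (the code for 2006) into the linear trend forecasting equation:
$\hat{Y}_i = 469.9158 + 62.1068(22) = 1{,}836.265$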
Quadratic Trend Model
The quadratic trend model, $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$, is the simplest nonlinear model. The quadratic trend forecasting equation is presented below:
$\hat{Y}_i = b_0 + b_1 X_i + b_2 X_i^2$
where $b_2$ = estimated quadratic effect on Y.
For example, we use Microsoft Excel to compute the quadratic trend forecasting equation. The Excel results for the quadratic trend model used to forecast real gross revenues at the Wm. Wrigley Jr. Company give:
$\hat{Y}_i = 618.3211 + 17.5852 X_i + 2.1201 X_i^2$
To compute a forecast for 2006 using the quadratic trend equation, substitute X = 22 (the code for 2006) into the quadratic trend forecasting equation:
$\hat{Y}_i = 618.3211 + 17.5852(22) + 2.1201(22)^2 = 2{,}031.324$
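The two trend forecasts can be evaluated with a short Python sketch. The coefficients are the ones reported above, and the year coding (1984 → 0, so 2006 → 22) follows the text; the function names are hypothetical.

```python
BASE_YEAR = 1984  # coded year X = 0, so 2006 is coded X = 22


def linear_trend(x):
    """Linear trend forecasting equation: Y-hat = 469.9158 + 62.1068 X."""
    return 469.9158 + 62.1068 * x


def quadratic_trend(x):
    """Quadratic trend forecasting equation: Y-hat = 618.3211 + 17.5852 X + 2.1201 X^2."""
    return 618.3211 + 17.5852 * x + 2.1201 * x ** 2


x_2006 = 2006 - BASE_YEAR
linear_forecast = linear_trend(x_2006)
quadratic_forecast = quadratic_trend(x_2006)
print(f"X = {x_2006}")
print(f"Linear trend forecast for 2006:    {linear_forecast:,.3f}")
print(f"Quadratic trend forecast for 2006: {quadratic_forecast:,.3f}")
```

The gap of almost 200 between the two forecasts comes from the positive quadratic term, which bends the trend upward as X grows.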
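10.4.2 Exponential Trend Model
The exponential trend model is $Y_i = \beta_0 \beta_1^{X_i} \varepsilon_i$, where $(\beta_1 - 1) \times 100\%$ is the annual compound growth rate (in %). The exponential trend forecasting equation is $\log(\hat{Y}_i) = b_0 + b_1 X_i$, where log denotes the base-10 logarithm.
The Excel results worksheet for an exponential trend model for real gross revenues at the Wm. Wrigley Jr. Company gives the coefficients used below.
Using the exponential trend equation and these results, we have $\log(\hat{Y}_i) = 2.7647 + 0.0245 X_i$, where year 0 is 1984.
Compute the values of $\hat{\beta}_0$ and $\hat{\beta}_1$ by taking the antilog of the regression coefficients ($b_0$ and $b_1$):
$\hat{\beta}_0 = 10^{2.7647} = 581.701$
$\hat{\beta}_1 = 10^{0.0245} = 1.058$
Thus, the exponential trend forecasting equation is:
$\hat{Y}_i = 581.701 (1.058)^{X_i}$
The forecast of real gross revenues for 2006 (X = 22) using the above equation is as follows:
$\log(\hat{Y}_i) = 2.7647 + 0.0245(22) = 3.3037$
$\hat{Y}_i = 10^{3.3037} = 2{,}012.334$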
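The antilog step and the 2006 forecast can be checked the same way. This sketch assumes, as the antilog computations above imply, that log denotes the base-10 logarithm; the variable names are our own.

```python
b0, b1 = 2.7647, 0.0245  # coefficients of log10(Y-hat) = b0 + b1 X from the Excel output

beta0_hat = 10 ** b0                     # antilog of b0
beta1_hat = 10 ** b1                     # antilog of b1
growth_rate_pct = (beta1_hat - 1) * 100  # estimated annual compound growth rate, %

x_2006 = 22                              # 2006 coded relative to 1984 = 0
log_forecast = b0 + b1 * x_2006
forecast = 10 ** log_forecast

print(f"Y-hat = {beta0_hat:.3f} * ({beta1_hat:.3f})^X, growth about {growth_rate_pct:.1f}% per year")
print(f"log(Y-hat) for 2006 = {log_forecast:.4f}, forecast = {forecast:,.3f}")
```

Evaluating $\hat{\beta}_0 \hat{\beta}_1^{22}$ directly gives the same forecast as taking the antilog of $\log(\hat{Y}_i)$, because the two forms are algebraically identical.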
Thechartofexponentialtrendforecastingis:
10.4.3 ModelSelectionUsingFirst,Second,andPercentageDifferences
To select the most appropriate of the models above, we can visually inspect scatter plots and compare adjusted $r^2$ values. We can also compare and examine the first, second, and percentage differences of the series.
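Perfect fit for a linear trend model: the first differences are constant, that is, the differences between consecutive values in the series are the same throughout:
$(Y_2 - Y_1) = (Y_3 - Y_2) = \cdots = (Y_n - Y_{n-1})$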
Example:
Year   Passengers   First Diff
1997   30
1998   33           3
1999   36           3
2000   39           3
2001   42           3
2002   45           3
2003   48           3
2004   51           3
2005   54           3
2006   57           3
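Perfect fit for a quadratic trend model: the second differences are constant, that is, the differences between consecutive first differences are the same throughout:
$[(Y_3 - Y_2) - (Y_2 - Y_1)] = [(Y_4 - Y_3) - (Y_3 - Y_2)] = \cdots = [(Y_n - Y_{n-1}) - (Y_{n-1} - Y_{n-2})]$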
Example:
Year   Passengers   First Diff   Second Diff
1997   30
1998   31           1
1999   33.5         2.5          1.5
2000   37.5         4            1.5
2001   43           5.5          1.5
2002   50           7            1.5
2003   58.5         8.5          1.5
2004   68.5         10           1.5
2005   80           11.5         1.5
2006   93           13           1.5
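Perfect fit for an exponential trend model: the percentage differences between consecutive values are constant. Thus,
$\frac{Y_2 - Y_1}{Y_1} \times 100\% = \frac{Y_3 - Y_2}{Y_2} \times 100\% = \cdots = \frac{Y_n - Y_{n-1}}{Y_{n-1}} \times 100\%$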
Example (percentage differences rounded to the nearest percent):
Year   Passengers   First Diff   Second Diff   Percentage Diff
1997   30
1998   31.5         1.5                        5%
1999   33.1         1.6          0.1           5%
2000   34.8         1.7          0.1           5%
2001   36.5         1.7          0.0           5%
2002   38.3         1.8          0.1           5%
2003   40.2         1.9          0.1           5%
2004   42.2         2.0          0.1           5%
2005   44.3         2.1          0.1           5%
2006   46.5         2.2          0.1           5%
For the real gross revenue data at the Wm. Wrigley Jr. Company, none of the first differences, second differences, or percentage differences is constant across the series (see the table below). Therefore, other models may be more appropriate, including those considered in autoregressive modeling.
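The three perfect-fit checks above can be sketched in Python as follows. Treating a set of differences as "constant" when their range falls within a tolerance is an assumption; the text judges constancy by inspection. The 0.5-point tolerance for percentage differences is likewise an assumed allowance for the one-decimal rounding in the example table. The function names are hypothetical.

```python
# Minimal sketch of model selection by first, second, and percentage differences.
# The tolerance-based notion of "constant" is an assumption, not part of the text.

def first_diffs(y):
    return [b - a for a, b in zip(y, y[1:])]


def second_diffs(y):
    return first_diffs(first_diffs(y))


def pct_diffs(y):
    return [(b - a) / a * 100 for a, b in zip(y, y[1:])]


def is_constant(values, tol):
    return max(values) - min(values) <= tol


def suggest_model(y, tol_diff=0.01, tol_pct=0.5):
    """Return the first perfect-fit pattern the series matches, or 'none'."""
    if is_constant(first_diffs(y), tol_diff):
        return "linear"
    if is_constant(second_diffs(y), tol_diff):
        return "quadratic"
    if is_constant(pct_diffs(y), tol_pct):
        return "exponential"
    return "none"


linear = [30, 33, 36, 39, 42, 45, 48, 51, 54, 57]
quadratic = [30, 31, 33.5, 37.5, 43, 50, 58.5, 68.5, 80, 93]
exponential = [30, 31.5, 33.1, 34.8, 36.5, 38.3, 40.2, 42.2, 44.3, 46.5]
for name, series in [("linear", linear), ("quadratic", quadratic), ("exponential", exponential)]:
    print(name, "->", suggest_model(series))
# linear -> linear
# quadratic -> quadratic
# exponential -> exponential
```

The linear check runs first because a straight line also has constant (zero) second differences.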
10.4.4 Assignment 10.3
a. Plot the data in Table 10.1 (Bed Bath & Beyond Inc.).
Table 10.1 Bed Bath & Beyond Inc.
b. Compute a linear trend forecasting equation and plot the results.
c. Compute a quadratic trend forecasting equation and plot the results.
d. Compute an exponential trend forecasting equation and plot the results.
e. Using the forecasting equations in (b) through (d), what are your annual forecasts of the number of stores open for 2007 and 2008?
f. How can you explain the differences among the three forecasts in (e)? Which forecast do you think you should use? Why?
10.5 The Autoregressive and the Least-Squares Models for Seasonal Data
Autoregressive modeling is a technique used to forecast time series with autocorrelation. A first-order autocorrelation refers to the relationship between consecutive values in a time series. A second-order autocorrelation refers to the relationship between values that are two periods apart. A pth-order autocorrelation refers to the correlation between values in a time series that are p periods apart.
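First-Order Autoregressive Model
The first-order autoregressive model, $Y_i = A_0 + A_1 Y_{i-1} + \delta_i$, where $\delta_i$ is the random error, is similar in form to the simple linear regression model.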
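Because the first-order model has the same form as a simple linear regression, it can be estimated by regressing each value on the value one period earlier. The Python sketch below shows that. It also includes a lag-p sample autocorrelation matching the definition above. Both function names are hypothetical. The short series is made up purely for illustration: it is generated from $Y_i = 5 + 0.5Y_{i-1}$ with no error term, so the fit should recover those values exactly.

```python
# Minimal sketch: lag-p sample autocorrelation and a least-squares fit of the
# first-order autoregressive model Y_i = A0 + A1*Y_{i-1} + delta_i.
# lag_autocorrelation and fit_ar1 are hypothetical helper names.

def lag_autocorrelation(y, p):
    """Sample autocorrelation between values p periods apart."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[i] - mean) * (y[i - p] - mean) for i in range(p, n))
    den = sum((v - mean) ** 2 for v in y)
    return num / den


def fit_ar1(y):
    """Regress Y_i on Y_{i-1} by ordinary least squares; return (a0, a1)."""
    x = y[:-1]  # lagged values Y_{i-1}
    z = y[1:]   # current values Y_i
    n = len(z)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    a1 = (sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
          / sum((xi - x_bar) ** 2 for xi in x))
    a0 = z_bar - a1 * x_bar
    return a0, a1


# Illustrative series (not from the text): Y_1 = 2, then Y_i = 5 + 0.5 * Y_{i-1}.
series = [2, 6, 8, 9, 9.5, 9.75, 9.875]
a0, a1 = fit_ar1(series)
print(round(a0, 4), round(a1, 4))  # 5.0 0.5
print(round(lag_autocorrelation(series, 1), 3))
```

A pth-order model extends the same idea by adding $Y_{i-2}, \ldots, Y_{i-p}$ as further regressors, which requires a multiple-regression fit instead of this single-predictor formula.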
10.6 Price Indexes
Index numbers allow relative comparisons over time.
Index numbers are reported relative to a base-period index.
Base-period index = 100 by definition.
The simple price index for year i is
$I_i = \frac{P_i}{P_{base}}(100)$
where
$I_i$ = index number for year i
$P_i$ = price for year i
$P_{base}$ = price for the base year
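The simple price index formula can be sketched in Python as below. Since the airplane ticket prices themselves are not reproduced in this section, the sketch applies the formula to the lease-payment column of the expense table in Section 10.7.1 (base year 2003). `simple_price_index` is a hypothetical helper name.

```python
# Minimal sketch of the simple price index I_i = (P_i / P_base) * 100.
# Data: lease payments from the expense table in Section 10.7.1, base year 2003.

def simple_price_index(price, base_price):
    """Index number for one period relative to the base period (base = 100)."""
    return price / base_price * 100


lease = {2003: 260, 2004: 280, 2005: 305, 2006: 310}
base = lease[2003]
for year, price in lease.items():
    print(year, round(simple_price_index(price, base), 1))
# 2003 100.0
# 2004 107.7
# 2005 117.3
# 2006 119.2
```

Read in the same way as the airplane example: lease payments in 2006 were about 119.2% of their 2003 level.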
10.6.1 Example
Airplane ticket prices from 1998 to 2006 (base year 2000):
Prices in 1998 were 92.2% of base-year prices.
Prices in 2000 were 100% of base-year prices (by definition, since 2000 is the base year).
Prices in 2006 were 130.2% of base-year prices.
10.7 Aggregate and Simple Indexes
An aggregate index is used to measure the rate of change from a base period for a group of items.
10.7.1 Unweighted Aggregate Price Index
Example:
Year   Lease payment   Fuel   Repair   Total   Index (2003 = 100)
2003   260             45     40       345     100.0
2004   280             60     40       380     110.1
2005   305             55     45       405     117.4
2006   310             50     50       410     118.8
$I_{2006} = \frac{\sum P_{2006}}{\sum P_{2003}}(100) = \frac{410}{345}(100) = 118.8$
Unweighted total expenses were 18.8% higher in 2006 than in 2003.
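The unweighted aggregate index in the table above can be reproduced with the Python sketch below. It sums the item prices for each year and divides by the base-year sum. `unweighted_aggregate_index` is a hypothetical helper name, and the item lists follow the table's column order (lease payment, fuel, repair).

```python
# Minimal sketch of the unweighted aggregate price index:
# I = (sum of item prices in period t / sum of item prices in the base period) * 100.

def unweighted_aggregate_index(prices_t, prices_base):
    """Both arguments list the same items in the same order."""
    return sum(prices_t) / sum(prices_base) * 100


# Lease payment, fuel, repair for each year (base year 2003).
expenses = {
    2003: [260, 45, 40],
    2004: [280, 60, 40],
    2005: [305, 55, 45],
    2006: [310, 50, 50],
}
for year, items in expenses.items():
    idx = unweighted_aggregate_index(items, expenses[2003])
    print(year, sum(items), round(idx, 1))
# 2003 345 100.0
# 2004 380 110.1
# 2005 405 117.4
# 2006 410 118.8
```

Because every item enters the sum unweighted, the large lease payment dominates the index; this is the limitation that weighted aggregate indexes address.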
10.7.2 Weighted Aggregate Price Indexes