MATH11002 PRAC Modules Reco


Practicum Module

Math11002 Business Statistics

By: Aurino Rilman Adam Djamaris

MODELLING AND SIMULATION LABORATORY
MANAGEMENT PROGRAM

2010


MANAGEMENT PROGRAM
MODELLING AND SIMULATION LABORATORY

1.1 Answer questions below with a brief description .......... 8
  1. Explain key definition and give at least 1 example! .......... 8
1.2 Use Microsoft Excel to complete the following tasks! .......... 9
1.3 Create a bar chart and also include a cumulative line chart using the data in Table 1 .......... 9
1.4 Create a pie graph, and attach the Excel graph results to your answer! .......... 9
1.5 The following data represent the cost of electricity during July 2006 for a random sample of 50 one-bedroom apartments in a large city .......... 9
1.6 Form a frequency distribution and percentage distribution that have class intervals with upper class limits $99, $119, and so on .......... 10
1.7 Construct a histogram and a percentage polygon .......... 10
1.8 Form a cumulative percentage distribution and plot a cumulative percentage polygon .......... 10
1.9 Around what amount does monthly electricity cost seem to be concentrated? .......... 10
1.10 Appendix .......... 10
1.10.1 Installing Excel Add-Ins for PHStat2 .......... 10
1.10.2 Installing Data Analysis on Excel 2007 .......... 10
1.10.3 Installing and Operating the Prentice Hall PHStat on Your Home Computer .......... 11
1.10.4 Configuring Excel 2007 security for PHStat2 .......... 11

2 NUMERICAL DESCRIPTIVE MEASURES .......... 13
2.1 Central Tendency .......... 13
2.1.1 The Mean .......... 13
2.1.2 The Median .......... 14
2.1.3 The Mode .......... 15
2.1.4 Quartiles .......... 16
2.1.5 The Geometric Mean .......... 17
2.1.6 Other useful Excel Basic Built-In Functions .......... 17
2.2 Assignment 2.1 .......... 20
2.3 Variation .......... 20
2.3.1 The Range .......... 20
2.3.2 The Interquartile Range .......... 21
2.3.3 The Variance and Standard Deviation .......... 21
2.3.4 The Coefficient of Variation .......... 22
ARD BUSINESS STATISTICS Sec. 2


2.3.5 Z Scores .......... 23
2.4 Shape .......... 24
2.4.1 Formula .......... 24
2.5 Assignment 2.2 .......... 25
2.6 Descriptive summary of population .......... 25
2.6.1 Excel Statistical Analysis Tools .......... 25
2.6.2 Install and use the Analysis ToolPak .......... 26
2.7 Box-whisker plot .......... 27
2.8 Assignment 2.3 .......... 29
2.9 Weighted mean .......... 29
2.10 Assignment 2.4 .......... 30
2.11 Correlation coefficients .......... 30
2.12 Covariance .......... 33
2.13 Assignment 2.5 .......... 33
2.13.1 Calories and Fat relationship .......... 33
2.13.2 Fuel Efficiency Calculation and Standard .......... 34

3 PROBABILITY .......... 35
3.1 Basic Probability .......... 35
3.2 Sample spaces and events, contingency tables, simple probability and joint probability .......... 36
3.2.1 Sample Space .......... 36
3.2.2 Event in Sample Space .......... 36
3.2.3 Simple and Joint Probability .......... 37
3.3 Bayes' Theorem .......... 38
3.4 Assignment 3.1 .......... 39
3.5 Basic Probability Rules .......... 41
3.5.1 Discrete Random Variable .......... 41
3.5.2 Discrete Random Variables Expected Value .......... 42
3.5.3 Discrete Random Variables Dispersion .......... 42
3.5.4 Covariance .......... 42
3.5.5 The Sum of Two Random Variables: Measures .......... 43

3.6 Binomial Distribution .......... 44
3.6.1 Properties .......... 44
3.6.2 The Binomial Distribution Formula .......... 45
3.6.3 The Shape and Characteristics .......... 45
3.7 Poisson Distribution .......... 46
3.7.1 Properties .......... 46
3.7.2 Formula .......... 46
3.7.3 Shape .......... 47
3.8 Hypergeometric distribution .......... 47
3.8.1 Formula .......... 47
3.8.2 Example .......... 48
3.9 Read Excel Companion to Chapter 5 .......... 48
3.10 Assignment 3.2 .......... 48
3.11 Assignment 3.3 .......... 49

4 NORMAL AND SAMPLING DISTRIBUTION .......... 50
4.1 Normal Distribution and Evaluating Normality .......... 50
4.1.1 Normal Probability Density Function .......... 51
4.1.2 Evaluating Normality .......... 52
4.2 Sampling and Sampling Distribution .......... 54
4.2.1 Sample .......... 54
4.2.2 Types of Samples .......... 54
4.2.3 Sampling Distributions .......... 55
4.2.4 Sampling from Finite Populations .......... 56
4.3 Assignment for Simple Random Sample .......... 56
4.4 Assignment for Sampling Distribution .......... 56
4.5 Assignment for The Sampling Distribution of the mean .......... 56
4.6 Assignment for Sampling from Finite Population .......... 57

5 CONFIDENCE INTERVAL ESTIMATION .......... 58
5.1 Confidence intervals .......... 58
5.1.1 A point estimate and a confidence interval estimate .......... 58
5.1.2 Confidence Interval for μ (σ Known) .......... 59

5.1.3 Confidence Interval for μ (σ Unknown) .......... 61
5.2 Confidence Interval Estimate for a Single Population Proportion .......... 64
5.2.1 Example for Confidence Intervals for the Population Proportion .......... 64
5.3 Determining Sample Size .......... 65
5.3.1 If Population Standard Deviation (σ) Known .......... 65
5.3.2 If Population Standard Deviation (σ) Unknown .......... 66
5.3.3 To Determine the Required Sample Size for the Proportion .......... 66
5.4 Assignment 5 .......... 67

6 HYPOTHESIS TESTING AND TWO-SAMPLE TEST .......... 68
6.1 Hypothesis Testing .......... 68
6.1.1 The Null Hypothesis, H0 .......... 68
6.1.2 The Alternative Hypothesis, H1 .......... 69
6.1.3 The Hypothesis Testing Process .......... 69
6.1.4 The Test Statistic and Critical Values .......... 70
6.1.5 Errors in Decision Making .......... 70
6.1.6 Level of Significance, α .......... 71
6.1.7 Hypothesis Testing: σ Known .......... 71
6.1.8 6 Steps of Hypothesis Testing .......... 72
6.1.9 Hypothesis Testing: σ Known, p-Value Approach .......... 73
6.1.10 Hypothesis Testing: σ Known, Confidence Interval Connections .......... 74
6.1.11 One-Tail Tests .......... 74
6.1.12 Hypothesis Testing: σ Unknown .......... 77
6.1.13 Hypothesis Testing: Connection to Confidence Intervals .......... 77
6.1.14 Hypothesis Testing Proportion .......... 78
6.2 Assignment 6.1 .......... 79
6.3 Two-Sample Tests .......... 79
6.3.1 Two-Sample Tests Independent Populations .......... 81
6.3.2 Independent Populations Unequal Variance .......... 82

7 ANOVA AND CHI-SQUARE AND NONPARAMETRIC TESTS .......... 83
7.1 One-Way Analysis of Variance .......... 84
7.1.1 Hypotheses: One-Way ANOVA .......... 84
7.1.2 Partitioning the Variation .......... 85
7.1.3 Obtaining the Mean Squares .......... 86
7.1.4 One-Way ANOVA Table .......... 86
7.1.5 Test statistic .......... 86
7.1.6 Example .......... 87

7.1.7 The Tukey-Kramer Procedure .......... 88
7.1.8 ANOVA Assumptions .......... 89
7.2 Two-Way Analysis of Variance .......... 90
7.2.1 Sources of Variation .......... 90
7.2.2 Two-Way ANOVA: Features .......... 91
7.2.3 Interaction .......... 91
7.3 CHI-SQUARE AND NONPARAMETRIC TESTS .......... 91
7.3.1 One-Variable Chi-Square (goodness-of-fit test) with equal expected frequencies .......... 92
7.3.2 One-Variable Chi-Square (goodness-of-fit test) with predetermined expected frequencies .......... 94
7.3.3 Two-Variable Chi-Square (test of independence) .......... 96
7.4 Assignment .......... 98
7.4.1 Assignment 7.1 .......... 98
7.4.2 Assignment 7.2 .......... 100
7.4.3 Assignment 7.3 .......... 101

8 REGRESSION ANALYSIS .......... 105
8.1 Simple Regression Analysis .......... 105
8.2 Regression Analysis Using Excel .......... 105
8.3 Regression Dialog Box .......... 106
8.4 Simple Regression .......... 107
8.5 Linear Correlation and Regression Analysis .......... 107

9 MULTIPLE REGRESSION MODEL .......... 111
9.1 MULTIPLE REGRESSION USING THE DATA ANALYSIS ADD-IN .......... 111
9.2 INTERPRET REGRESSION STATISTICS TABLE .......... 113
9.3 INTERPRET ANOVA TABLE .......... 114
9.4 INTERPRET REGRESSION COEFFICIENTS TABLE .......... 114
9.5 CONFIDENCE INTERVALS FOR SLOPE COEFFICIENTS .......... 115
9.6 TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE") .......... 116
9.7 TEST HYPOTHESIS ON A REGRESSION PARAMETER .......... 116
9.7.1 Using the p-value approach .......... 116

9.7.2 Using the critical value approach .......... 116
9.8 OVERALL TEST OF SIGNIFICANCE OF THE REGRESSION PARAMETERS .......... 116
9.9 PREDICTED VALUE OF Y GIVEN REGRESSORS .......... 117
9.10 EXCEL LIMITATIONS .......... 117
9.11 Assignment 9.1 .......... 117

10 TIME SERIES FORECASTING .......... 119
10.1 Time series forecasting models .......... 120
10.1.1 CLASSICAL MULTIPLICATIVE TIME-SERIES MODEL FOR ANNUAL DATA .......... 120
10.1.2 Assignment 9.1 .......... 121
10.2 Moving Average and Exponential Smoothing .......... 121
10.2.1 Moving Average Models .......... 121
10.2.2 Exponential Smoothing Models .......... 123
10.3 Assignment 10.2 .......... 124
10.4 Linear, exponential and quadratic trend .......... 124
10.4.1 Linear Trend Model .......... 124
10.4.2 Exponential Trend Model .......... 126
10.4.3 Model Selection Using First, Second, and Percentage Differences .......... 127
10.4.4 Assignment 10.3 .......... 128
10.5 The autoregressive and the least-squares models for seasonal data .......... 128
10.6 Price indexes .......... 128
10.6.1 Example .......... 129
10.7 Aggregated and simple indexes .......... 129
10.7.1 Unweighted Aggregate Price Index .......... 130
10.7.2 Weighted Aggregate Price Indexes .......... 130


Practicum: Math11002
Business Statistics
MODULE 1

Submitted only on
Day/Date: ____________ / ______________
Time: 12.00-14.00 WIB
In: ____________________

Date of Receipt:
Score:
Assistant Signature:

I, the undersigned, hereby state that I have strived to complete this module by myself.
Name/NIM: ______________________________ / _______________
Signature: _______________________________________________
Rem.:

Module Description: Data Collection and Data Presentation

Objective: The student understands the sources of data used in business; the types of data used
in business; how to develop tables and charts for categorical data; how to develop tables and
charts for numerical data and present graphs; how to examine cross-tabulated data using the
contingency table and side-by-side bar chart; and how to use Microsoft Excel to process business data.

Output: Use separate papers to report your results (in handwriting or computer printout). A report
produced by the students should be in the form of working procedures and results in both soft copy
and hard copy.

Pre-Lab Read:

Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson
Education, Inc., Upper Saddle River, New Jersey, pages 18-30 and pages 75-93.
Set the MS Excel application to be ready for the Data Analysis Add-In. See pages 28-29.

1.1 Answer the questions below with a brief description.

1. Explain each key definition and give at least 1 example!
1.1 Population:
1.2 Sample:
1.3 Parameter:
1.4 Statistic:
1.5 Descriptive Statistics:
1.6 Inferential Statistics:
2. Name three circumstances that require data collection.
3. Explain the difference between Descriptive and Inferential Statistics.
4. Design a questionnaire of your own about data collection with at least 10 questions!


5. According to The State of the News Media, 2006, the average age of viewers of "ABC World News
Tonight" is 59 years. Suppose a rival network executive hypothesizes that the average age of ABC
news viewers is less than 59. To test her hypothesis, she samples 500 ABC nightly news viewers and
determines the age of each.
5.1 Describe the population.
5.2 Describe the variable of interest.
5.3 Describe the sample.
5.4 Describe the inference.
6. Problem: "Cola wars" is the popular term for the intense competition between Coca-Cola and Pepsi
displayed in their marketing campaigns. Their campaigns have featured movie and television stars,
rock videos, athletic endorsements, and claims of consumer preference based on taste tests.
Suppose, as part of a Pepsi marketing campaign, 1,000 cola consumers are given a blind taste test
(i.e., a taste test in which the two brand names are disguised). Each consumer is asked to state a
preference for brand A or brand B.
6.1 Describe the population.
6.2 Describe the variable of interest.
6.3 Describe the sample.
6.4 Describe the inference.

1.2 Use Microsoft Excel to complete the following tasks!
1.3 Create a bar chart and also include a cumulative line chart using the data in Table 1.
1.4 Create a pie graph, and attach the Excel graph results to your answer!
Table 1. Percentage Expended Money

What You Would Do With the Money        Percentage (%)
Buy a luxury item, vacation, or gift    20
Give it to charity                       2
Pay debt                                24
Save                                    31
Spend on essentials                     16
Other                                    7
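
As a cross-check for the Excel charts in tasks 1.3 and 1.4, the cumulative percentages behind the line chart can be sketched in Python. This is only an illustration and not part of the module: the variable names are hypothetical, and sorting the categories from largest to smallest before accumulating is an assumption borrowed from the usual Pareto-style layout.

```python
# Percentages copied from Table 1 of the module.
table1 = {
    "Buy a luxury item, vacation, or gift": 20,
    "Give it to charity": 2,
    "Pay debt": 24,
    "Save": 31,
    "Spend on essentials": 16,
    "Other": 7,
}

# Assumption: order the bars from largest to smallest share (Pareto style).
ordered = sorted(table1.items(), key=lambda item: item[1], reverse=True)

# Running total that the cumulative line chart would plot on top of the bars.
cumulative = []
running = 0
for label, pct in ordered:
    running += pct
    cumulative.append(running)

# Print one row per category: label, percentage, cumulative percentage.
for (label, pct), cum in zip(ordered, cumulative):
    print(f"{label:<40}{pct:>4}%{cum:>6}%")
```

The same running totals can be produced in Excel by adding each percentage to the cumulative value in the row above it.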

1.5 The following data represent the cost of electricity during July 2006 for a random
sample of 50 one-bedroom apartments in a large city.

Table 2. Utility Charge ($)

 96  171  202  178  147  102  153  197  127   82
157  185   90  116  172  111  148  213  130  165
141  149  206  175  123  128  144  168  109  167
 95  163  150  154  130  143  187  166  139  149
108  119  183  151  114  135  191  137  129  158

1.6 Form a frequency distribution and a percentage distribution that have class
intervals with upper class limits $99, $119, and so on.
1.7 Construct a histogram and a percentage polygon.
1.8 Form a cumulative percentage distribution and plot a cumulative percentage
polygon.
1.9 Around what amount does the monthly electricity cost seem to be concentrated?
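
Tasks 1.6 to 1.8 are meant to be done in Excel, but the tallying logic can be sketched in Python to check a worksheet against. This is a minimal sketch under stated assumptions: the module gives only the upper class limits ($99, $119, ...), so the lower limit of $80 for the first class and the resulting width of $20 are assumptions, and all names are hypothetical. The data are the 50 utility charges from Table 2.

```python
# Utility charges ($) from Table 2 of the module.
charges = [
    96, 171, 202, 178, 147, 102, 153, 197, 127, 82,
    157, 185, 90, 116, 172, 111, 148, 213, 130, 165,
    141, 149, 206, 175, 123, 128, 144, 168, 109, 167,
    95, 163, 150, 154, 130, 143, 187, 166, 139, 149,
    108, 119, 183, 151, 114, 135, 191, 137, 129, 158,
]

LOWER = 80  # assumed lower limit of the first class (its upper limit is $99)
WIDTH = 20  # class width implied by the upper limits $99, $119, ...

# Frequency distribution: count how many charges fall in each class.
n_classes = (max(charges) - LOWER) // WIDTH + 1
frequencies = [0] * n_classes
for x in charges:
    frequencies[(x - LOWER) // WIDTH] += 1

# Percentage distribution: each frequency as a share of the sample size.
percentages = [100 * f / len(charges) for f in frequencies]

# Cumulative percentage distribution: running total of the percentages.
cumulative = []
running = 0.0
for p in percentages:
    running += p
    cumulative.append(running)

# Print class limits, frequency, percentage, and cumulative percentage.
for i, (f, p, c) in enumerate(zip(frequencies, percentages, cumulative)):
    low = LOWER + i * WIDTH
    print(f"${low}-${low + WIDTH - 1}: {f:>2}  {p:>5.1f}%  {c:>6.1f}%")
```

The class with the highest frequency in this tally is the natural starting point for answering 1.9; the histogram and polygons in 1.7 and 1.8 plot these same columns.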

1.10 Appendix
1.10.1 Installing Excel Add-Ins for PHStat2

The Prentice Hall PHStat Microsoft Excel add-in enhances Microsoft Excel to better support the
statistical analyses taught in an introductory statistics course. Using PHStat lessens the technical
training needed to use Microsoft Excel to perform statistical analysis and allows you to generate
results that would otherwise be very tedious or impossible to produce from worksheets built from
scratch. PHStat requires that Data Analysis is installed on Excel and the following system requirements:

- Any Windows 95 (or later) system; Microsoft Excel 95 or Microsoft Excel 97 (or later).
- 32 MB of main memory; 64 MB required when running sampling distribution simulations and
  data-intensive regression analyses; approximately 5 MB hard disk free space during the setup
  process and 3 MB hard disk space after installation.
- Preferred display settings: PHStat will run with any display settings, but for best results set the
  Desktop area to 800 by 600 pixels with Small Fonts. (Use the Settings tab of the Display applet of
  the Control Panel to change settings.)

1.10.2 INSTALLING DATA ANALYSIS ON EXCEL 2007

1. Open Excel and click the Office Button.


2. In the Office Button pane, click Excel Options.


3. In the Excel Options dialog box that appears, click Add-Ins in the left panel and
look for Analysis ToolPak and Analysis ToolPak VBA under Active Application
Add-ins.
4. If they do not appear, click Go. In the Add-Ins dialog box that appears, verify that
Analysis ToolPak and Analysis ToolPak VBA are both checked in the Add-Ins
available list.
5. Click OK and exit Excel to save these settings.

ClickontheMicrosoftOfficebuttonintheupperlefthandcorneroftheEXCELspreadsheetandclick
onEXCELOptionsinthelowerrighthandcornerofthepulldownmenu.OntheleftsideoftheEXCEL
OptionspageclickonAddinsandthentheGobuttonatthebottomofthepage.Thisshouldopen
theAddinssection.SelectAnalysisToolPakandAnalysisToolPakVBAandclickOK.

1.10.3 Installing and Operating the Prentice Hall PHStat on Your Home Computer

To use the Prentice Hall PHStat Microsoft Excel add-in, you first need to run the setup program (Setup.exe) located in the PHStat directory on this disk. The setup program will install the PHStat program files to your system and add icons on your Desktop and Start Menu for PHStat. To do this, simply insert the PHStat disk in your CD drive and follow the directions.
To run PHStat in Excel, simply double-click the PHStat icon. Excel 2007 users will likely have to click Enable Macros in the prompt that should pop up by itself.

1.10.4 Configuring Excel 2007 Security for PHStat2
You must change the Trust Center settings to allow PHStat2 to function properly. Click the Office Button, and then click Excel Options in the Office menu. In the Excel Options dialog box that appears, click Trust Center and then, in the Trust Center panel, click Trust Center Settings. In the left pane of the Trust Center dialog box that appears, first click Add-Ins and clear, if necessary, all of the check boxes that appear under the Add-ins banner. Next, click Macro Settings in the left pane and click either Disable all macros with notification (recommended) or Enable all macros (not recommended; use only if the other choice fails to allow PHStat2 to function properly).


Practicum: Math11002
Business Statistics
MODULE 2

Submitted only on
Day/Date: ____________ / ______________
Time: 12.00-14.00 WIB
In ____________________

Date of Receipt:

Score:

Assistant Signature:

I, the undersigned, state that I have strived to complete this module by myself.
Name/NIM: ______________________________ / _______________
Signature: _______________________________________________
Rem.:

Module Description: NUMERICAL DESCRIPTIVE MEASURES
Objective: Measures of central tendency, variation, and shape; population summary measures; five-number summary and box-and-whisker plots; covariance and coefficient of correlation.
Output: Use separate papers to report your results (in handwriting or as a computer printout). The report produced by the students should be in the form of working procedures and results, in both soft copy and hard copy.

2 NUMERICAL DESCRIPTIVE MEASURES
2.1 Central Tendency
Central tendency refers to the tendency of the individual measures in a distribution to cluster together toward some point of aggregation.

2.1.1 The Mean
The mean, or arithmetic mean, is the sum of all values divided by the number of values included in the calculation.

2.1.1.1 Formula: The Mean
Total sum divided by the number of values:

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \]

Where
$\bar{X}$ = sample mean
$n$ = number of values, or sample size
$X_i$ = ith value of the variable X
$\sum_{i=1}^{n} X_i$ = summation of all $X_i$ values in the sample
2.1.1.2 Ms Excel Built-In Function for Calculating the Mean
The function is written as follows:

=AVERAGE(argument)

The argument for this function is the data contained in the selected range of cells.
Example Using Excel's AVERAGE Function:
Note: For help with this example, see the image to the right.
1. Enter the following data into cells C1 to C6: 11, 12, 13, 14, 15, 16.
2. Click on cell C7, the location where the result will be displayed.
3. Type "=average(" in cell C7.
4. Drag-select cells C1 to C6 with the mouse pointer.
5. Type the closing bracket ")" after the cell range in cell C7.
6. Press the ENTER key on the keyboard.
7. The answer 13.5 should be displayed in cell C7.
8. The complete function =AVERAGE(C1:C6) appears in the formula bar above the worksheet.
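
As a cross-check outside Excel, the same calculation can be sketched in Python. This is only an illustration of the mean formula; the variable names are assumptions, not part of the module.

```python
from statistics import mean

# Same data as cells C1:C6 in the AVERAGE example
values = [11, 12, 13, 14, 15, 16]

# Formula of the mean: sum of all X_i divided by n
sample_mean = sum(values) / len(values)

# The standard-library helper should agree with the manual formula
library_mean = mean(values)

print(sample_mean)  # 13.5
```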

2.1.2 The Median
The median is the middle value in a list of numbers. "Middle" here refers to arithmetic size rather than the position of the numbers in the list. If the list has an even number of values, the median is the average of the two middle values.

2.1.2.1 Formula: The Median
The median is the middle value that separates the greater and lesser halves of a data set:

\[ \text{Median} = \frac{n+1}{2} \text{ ranked value} \]

2.1.2.2 Ms Excel Built-In Function for Calculating the Median
The syntax for the MEDIAN function is:
=MEDIAN(number1, number2, ... number255)
Note: Up to 255 numbers can be entered into the function.

Example Using Excel's MEDIAN Function:
Note: For help with this example, see the image to the right.
1. Enter the following data into cells D1 to D5: 4, 12, 49, 24, 65.
2. Click on cell E1, the location where the result will be displayed.
3. Click on the Formulas tab.
4. Choose More Functions > Statistical from the ribbon to open the function drop-down list.
5. Click on MEDIAN in the list to bring up the function's dialog box.
6. Drag-select cells D1 to D5 in the spreadsheet to enter the range into the dialog box, then click OK.
7. The answer 24 should appear in cell E1, since there are two numbers larger (49 and 65) and two numbers smaller (4 and 12) than it in the list.
8. The complete function =MEDIAN(D1:D5) appears in the formula bar above the worksheet when you click on cell E1.
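
The ranked-value rule can also be sketched in Python. The snippet below is an illustration only; the four-value list used for the even case is a made-up example, not data from the module.

```python
from statistics import median

values = [4, 12, 49, 24, 65]          # same data as cells D1:D5

ranked = sorted(values)                # rank the values by size first
position = (len(ranked) + 1) / 2       # (n + 1) / 2 ranked value -> 3.0
middle_value = ranked[int(position) - 1]

# Even number of values (hypothetical list): average of the two middle values
even_median = median([4, 12, 24, 49])

print(middle_value, median(values), even_median)  # 24 24 18.0
```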

2.1.3 The Mode
The mode is the most frequent number in a data set.

2.1.3.1 Formula: The Mode
For example, the mode of the array 1, 3, 4, 4, 4, 7, 7, 12, 17 is 4.

2.1.3.2 Ms Excel Built-In Function for Calculating the Mode
The MODE function, one of Excel's statistical functions, tells you the most frequently occurring value in a list of numbers.
The syntax for the MODE function is:

=MODE(number1, number2, ... number255)

Note: Up to 255 numbers can be entered into the function.
Example Using Excel's MODE Function:
Note: For help with this example, see the image to the right.
1. Enter the following data into cells D1 to D6: 98, 135, 147, 135, 98, 135.
2. Click on cell E1, the location where the result will be displayed.
3. Click on the Formulas tab.
4. Choose More Functions > Statistical from the ribbon to open the function drop-down list.
5. Click on MODE in the list to bring up the function's dialog box.
6. Drag-select cells D1 to D6 in the spreadsheet to enter the range into the dialog box, then click OK.
7. The answer 135 should appear in cell E1, since this number appears the most (three times) in the list of data.
8. The complete function =MODE(D1:D6) appears in the formula bar above the worksheet when you click on cell E1.
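
A minimal Python sketch of the same idea counts how often each value occurs and picks the most frequent one. The names are illustrative assumptions.

```python
from collections import Counter
from statistics import mode

values = [98, 135, 147, 135, 98, 135]   # same data as cells D1:D6

frequencies = Counter(values)            # value -> number of occurrences
most_common_value, occurrences = frequencies.most_common(1)[0]

print(most_common_value, occurrences)    # 135 3
```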

2.1.4 Quartiles
Quartiles are often used in sales and survey data to divide populations into groups. For example, you can use QUARTILE to find the top 25 percent of incomes in a population.

2.1.4.1 Formulas of Quartiles

First quartile (designated Q1) = lower quartile = cuts off the lowest 25% of data = 25th percentile
Second quartile (designated Q2) = median = cuts the data set in half = 50th percentile
Third quartile (designated Q3) = upper quartile = cuts off the highest 25% of data, or lowest 75% = 75th percentile

2.1.4.2 Ms Excel Built-In Function for Calculating Quartiles
The syntax for the QUARTILE function is:
=QUARTILE(array, quart)

Array is the array or cell range of numeric values for which you want the quartile value.
Quart indicates which value to return.

If quart equals    QUARTILE returns
0                  Minimum value
1                  First quartile (25th percentile)
2                  Median value (50th percentile)
3                  Third quartile (75th percentile)
4                  Maximum value
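
The quartile table can be illustrated with a short Python sketch. It assumes that Excel's QUARTILE interpolates the way Python's "inclusive" quantile method does, and it reuses the seven values 4, 5, 8, 7, 11, 4, 3 that appear in this module's other examples; both choices are assumptions made for illustration.

```python
from statistics import quantiles

values = [4, 5, 8, 7, 11, 4, 3]

# Cut points for quart = 1, 2, 3 (inclusive method, assumed to mirror QUARTILE)
q1, q2, q3 = quantiles(values, n=4, method="inclusive")

# quart = 0 and quart = 4 are simply the minimum and maximum
five_numbers = [min(values), q1, q2, q3, max(values)]

print(five_numbers)  # [3, 4.0, 5.0, 7.5, 11]
```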


2.1.5 The Geometric Mean
The geometric mean measures the rate of change of a variable over time. The Excel function GEOMEAN returns the geometric mean of an array or range of positive data. For example, you can use GEOMEAN to calculate the average growth rate given compound interest with variable rates.

2.1.5.1 Formula: The Geometric Mean

The geometric mean is the nth root of the product of n values:

\[ \bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n} \]

Or

The geometric mean rate of return measures the average percentage return of an investment over time:

\[ \bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1 \]

where $R_i$ is the rate of return in period i.

2.1.5.2 Ms Excel Built-In Function for Calculating the Geometric Mean
Syntax

=GEOMEAN(number1, number2, ...)

Number1, number2, ... are 1 to 255 arguments for which you want to calculate the mean. You can also use a single array or a reference to an array instead of arguments separated by commas.

Example:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3
2. In cell B4, type the formula =GEOMEAN(A2:A8), then press ENTER.
3. The answer 5.47698697 should appear in cell B4.
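
Both geometric mean formulas can be sketched in Python as a check on the Excel result. The two yearly returns (+50% and -50%) are hypothetical numbers chosen only to illustrate the rate-of-return formula.

```python
import math
from statistics import geometric_mean

values = [4, 5, 8, 7, 11, 4, 3]                   # same data as cells A2:A8

# nth root of the product of the n values
manual_gm = math.prod(values) ** (1 / len(values))
library_gm = geometric_mean(values)

# Geometric mean rate of return for hypothetical returns of +50% and -50%
returns = [0.50, -0.50]
growth_factors = [1 + r for r in returns]
mean_rate = math.prod(growth_factors) ** (1 / len(returns)) - 1

print(round(manual_gm, 8))   # 5.47698697
print(round(mean_rate, 4))   # -0.134
```

Note that the arithmetic mean of the two hypothetical returns is 0%, while the geometric mean rate is negative, which reflects the actual loss over the two periods.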

2.1.6 Other Useful Excel Basic Built-In Functions

2.1.6.1 SUM

Horizontal     100   200   300     Total 600    =SUM(C4:E4)

Vertical       100
               200
               300
Total          600                              =SUM(C7:C9)

Single cells   100   200   300     Total 600

What does it do?
This function creates a total from a list of numbers.
It can be used either horizontally or vertically.
The numbers can be in single cells, in ranges, or come from other functions.
Syntax
=SUM(Range1,Range2,Range3... through to Range30).

2.1.6.2 COUNT

Entries To Be Counted          Count
10    20          30           3      =COUNT(C4:E4)
10    0           30           3      =COUNT(C5:E5)
10    -20         30           3      =COUNT(C6:E6)
10    1-Jan-88    30           3      =COUNT(C7:E7)
10    21:30       30           3      =COUNT(C8:E8)
10    0.758576    30           3      =COUNT(C9:E9)
10    (blank)     30           2      =COUNT(C10:E10)
10    Hello       30           2      =COUNT(C11:E11)
10    #DIV/0!     30           2      =COUNT(C12:E12)

What does it do?
This function counts the number of numeric entries in a list.
It will ignore blanks, text and errors.
Syntax
=COUNT(Range1,Range2,Range3... through to Range30)

2.1.6.3 MAX

Values    120        800         100         120         250         Maximum 800          =MAX(C4:G4)

Dates     1-Jan-98   25-Dec-98   31-Mar-98   27-Dec-98   4-Jul-98    Maximum 27-Dec-98    =MAX(C7:G7)

What does it do?
This function picks the highest value from a list of data.
Syntax
=MAX(Range1,Range2,Range3... through to Range30)

2.1.6.4 MIN

Values    120        800         100         120         250         Minimum 100          =MIN(C4:G4)

Dates     1-Jan-98   25-Dec-98   31-Mar-98   27-Dec-98   4-Jul-98    Minimum 1-Jan-98     =MIN(C7:G7)

What does it do?
This function picks the lowest value from a list of data.
Syntax
=MIN(Range1,Range2,Range3... through to Range30)


2.2 Assignment 2.1:
The sample data below are from 38 banks, for direct-deposit customers who maintain a Rp. 100 (millions) balance:
26
18
28
20

28
20
10
25

40
25
2
25

20
25
21
22

21
22
22
30

22
30
25
30

25
30
25
30

25
3
18
65

18
15
25
20

25
20
15
29

15
29
20
23

20
26
18
45

1. Using the formulas above, calculate the mean, median, mode, quartiles and geometric mean of the sample data.
2. Use Ms Excel functions to calculate the mean, median, mode, quartiles and geometric mean of the sample data.
3. Compare the results and report your analysis.

2.3 Variation
Variability or variation refers to the overall separations and differences that exist among the
individual measures in a distribution, while central tendency refers to their closeness and
similarity. Variation measures the spread or the dispersion of values in a data set.

2.3.1 The Range
The range is equal to the largest value minus the smallest value.
2.3.1.1 Formula: The Range

\[ \text{Range} = X_{\text{largest}} - X_{\text{smallest}} \]

2.3.1.2 Ms Excel Built-In Functions for Calculating the Range
To calculate the range in Ms Excel we use two built-in functions: MAX() and MIN(). See Section 2.1.6 above.
Based on the formula of the range above, the syntax of the formula to calculate the range is:
=MAX()-MIN()
Example:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3
2. In cell B4, type the formula =MAX(A2:A8)-MIN(A2:A8), then press ENTER.
3. The answer 8 should appear in cell B4.

2.3.2 The Interquartile Range
The interquartile range is equal to the difference between the third quartile and the first quartile in a set of data.
2.3.2.1 Formula: The Interquartile Range

\[ \text{Interquartile range} = Q_3 - Q_1 \]

2.3.2.2 Ms Excel Built-In Function for Calculating the Interquartile Range
To calculate the interquartile range in Ms Excel we use a formula with the built-in function QUARTILE(). See Section 2.1.4.2 above.
Based on the formula above, the syntax of the formula to calculate the interquartile range is:
=QUARTILE(range,3)-QUARTILE(range,1)
Example:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3
2. In cell B4, type the formula =QUARTILE(A2:A8,3)-QUARTILE(A2:A8,1), then press ENTER.
3. The answer 3.5 should appear in cell B4.
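
The range and interquartile range of the example data can be checked with a short Python sketch. It assumes that the "inclusive" quantile method matches the interpolation used by Excel's QUARTILE; the variable names are illustrative.

```python
from statistics import quantiles

values = [4, 5, 8, 7, 11, 4, 3]            # same data as cells A2:A8

# Range: largest value minus smallest value
data_range = max(values) - min(values)

# Interquartile range: Q3 - Q1 (inclusive method, assumed to mirror QUARTILE)
q1, _, q3 = quantiles(values, n=4, method="inclusive")
interquartile_range = q3 - q1

print(data_range, interquartile_range)     # 8 3.5
```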

2.3.3 The Variance and Standard Deviation
The sample variance is the sum of the squared differences between each value and the mean, divided by the sample size minus 1. The standard deviation is the square root of the variance.
2.3.3.1 Formula: The Variance and Standard Deviation
Variance formula:

\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]

Or, equivalently,

\[ S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1} \]

Standard deviation formula:

\[ S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]

2.3.3.2 Ms Excel Built-In Functions for Calculating the Variance and Standard Deviation
To calculate the variance in Ms Excel we use the built-in function VAR(), and for the standard deviation we use STDEV().
Syntax:
=VAR(number1, number2, ...)

=STDEV(number1, number2, ...)

Number1, number2, ... are 1 to 255 number arguments corresponding to a sample of a population.

Example for the variance:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3
2. In cell B4, type the formula =VAR(A2:A8), then press ENTER.
3. The answer 8 should appear in cell B4.

Example for the standard deviation:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3
2. In cell B4, type the formula =STDEV(A2:A8), then press ENTER.
3. The answer 2.828427125 should appear in cell B4.
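
The sample variance and standard deviation of the example data can be reproduced step by step in Python. This is a sketch of the n - 1 formula; the names are illustrative assumptions.

```python
import math
from statistics import stdev, variance

values = [4, 5, 8, 7, 11, 4, 3]                    # same data as cells A2:A8

n = len(values)
mean_value = sum(values) / n                        # 6.0

# Sum of squared deviations from the mean, divided by n - 1
squared_deviations = [(x - mean_value) ** 2 for x in values]
sample_variance = sum(squared_deviations) / (n - 1)
sample_std_dev = math.sqrt(sample_variance)

print(sample_variance)                              # 8.0
print(round(sample_std_dev, 9))                     # 2.828427125
```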

2.3.4 The Coefficient of Variation
The coefficient of variation is a relative measure of variation that is always expressed as a percentage.
2.3.4.1 Formula: The Coefficient of Variation
The coefficient of variation is equal to the standard deviation divided by the mean, multiplied by 100%.
Formula:

\[ CV = \left( \frac{S}{\bar{X}} \right) \times 100\% \]

2.3.4.2 Ms Excel Built-In Functions for Calculating the Coefficient of Variation
To calculate the coefficient of variation in Ms Excel we use a formula with the built-in function STDEV(), and for the mean we use AVERAGE().
Syntax:
=(STDEV(number1,number2,...)/AVERAGE(number1,number2,...))*100%

Number1, number2, ... are 1 to 255 number arguments corresponding to a sample of a population.

Example for the coefficient of variation:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3
2. In cell B4, type the formula =(STDEV(A2:A8)/AVERAGE(A2:A8))*100%, then press ENTER.
3. The answer 0.471404521 (47.14% when the cell is formatted as a percentage) should appear in cell B4.

2.3.5 Z Scores
The Z score of a value is the difference between the value and the mean, divided by the standard deviation. Z scores help to identify an extreme value or outlier, that is, a value located far away from the mean.
Formula:

\[ Z = \frac{X - \bar{X}}{S} \]

2.3.5.1 Ms Excel Built-In Functions for Z Scores
To calculate a Z score in Ms Excel we use a formula with the built-in function STDEV(), and for the mean we use AVERAGE().
Syntax:
=(number - AVERAGE(range of numbers))/STDEV(range of numbers)

number is 1 number argument corresponding to a sample of a population.

Range of numbers is 1 to 255 number arguments corresponding to a sample of a population.

Example for Z scores:
1. Enter data into cells A2 through A8: 4, 5, 8, 7, 11, 4, 3
2. In cell C2, type the formula =(A2-AVERAGE($A$2:$A$8))/STDEV($A$2:$A$8), press ENTER, then copy it to the other cells (C3 to C8).
3. The answer -0.707106781 should appear in cell C2.
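
The coefficient of variation and the Z scores for the same seven values can be sketched in Python as follows. The variable names are illustrative; the sample standard deviation (n - 1) is used, as with STDEV().

```python
from statistics import mean, stdev

values = [4, 5, 8, 7, 11, 4, 3]          # same data as cells A2:A8

mean_value = mean(values)                 # 6
std_dev = stdev(values)                   # sample standard deviation

# Coefficient of variation, expressed as a percentage
cv_percent = std_dev / mean_value * 100

# One Z score per value: (X - mean) / S
z_scores = [(x - mean_value) / std_dev for x in values]

print(round(cv_percent, 2))               # 47.14
print(round(z_scores[0], 9))              # -0.707106781
```

The largest Z score belongs to the value 11, which lies about 1.77 standard deviations above the mean.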

2.4 Shape
The shape of a data set represents the pattern of all the values, from the lowest to the highest value. A distribution is either symmetrical or skewed. In a symmetrical distribution, the values below the mean are distributed exactly as the values above the mean, while a skewed distribution has an imbalance of low values or high values.
2.4.1 Formula:
Shape influences the relationship of the mean to the median in the following ways:

Mean < Median: negative or left-skewed
Mean = Median: symmetric or zero skewness
Mean > Median: positive or right-skewed

2.4.1.1 Ms Excel Function for Calculating Skewness
The SKEW function returns the skewness of a distribution. Skewness characterizes the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values. Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values.

Syntax
=SKEW(numbers)

Examples:

Example for negative, zero and positive skewness:
1. Enter data into cells A2 through A8: 10, 10, 20, 30, 40, 50, 50
2. In cell B2, type the formula =SKEW(A3:A8), then press ENTER.
3. The answer -0.38 should appear in cell B2. This means that the distribution of the data (A3 to A8) is negatively skewed.
4. In cell B4, type the formula =SKEW(A2:A8), then press ENTER.
5. The answer 0 should appear in cell B4. This means that the distribution of the data (A2 to A8) is symmetric.
6. In cell B6, type the formula =SKEW(A2:A7), then press ENTER.
7. The answer +0.38 should appear in cell B6. This means that the distribution of the data (A2 to A7) is positively skewed.
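
The three skewness results can be reproduced with a small Python function. The formula below is the adjusted sample skewness, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations; that SKEW uses exactly this formula is stated here as an assumption, and the function name is hypothetical.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Adjusted sample skewness (assumed to correspond to Excel's SKEW)."""
    n = len(values)
    m = mean(values)
    s = stdev(values)                     # sample standard deviation (n - 1)
    cubed = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

data = [10, 10, 20, 30, 40, 50, 50]       # cells A2:A8

skew_a3_a8 = sample_skewness(data[1:])    # A3:A8, tail toward low values
skew_a2_a8 = sample_skewness(data)        # A2:A8, symmetric
skew_a2_a7 = sample_skewness(data[:-1])   # A2:A7, tail toward high values

print(round(skew_a3_a8, 2), round(skew_a2_a7, 2))  # -0.38 0.38
```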

2.5 Assignment 2.2:
Using the data in Section 2.2 above, calculate or compose the range, interquartile range, variance and standard deviation, coefficient of variation, Z scores, and shape. Report your results.

2.6 Descriptive Summary of a Population
This section uses the Descriptive Statistics procedure of the Analysis ToolPak add-in.

2.6.1 Excel Statistical Analysis Tools

Excel has several data analysis tools included through an Analysis ToolPak add-in. These tools can quickly produce complex engineering or statistical analyses of your data. Each tool is a little different, but all require you to input what data you wish Excel to analyze.

Data Analysis is located under the Tools menu. If the option is not there, you will need to install the Analysis ToolPak.

2.6.2 Install and Use the Analysis ToolPak

1. On the Tools menu, click Add-Ins.
2. Select the Analysis ToolPak check box.
3. On the Tools menu, click Data Analysis.

Note: If Analysis ToolPak is not listed in the Add-Ins dialog box, click Browse and locate the drive, folder name, and file name for the Analysis ToolPak add-in, Analys32.xll, usually located in the Microsoft Office\Office\Library\Analysis folder, or run the Setup program if it isn't installed.

For Excel 2007:

1. Click the Data tab, then click the Data Analysis icon on the Data tab.
2. If Data Analysis does not appear, click the Microsoft Office button in the upper left-hand corner of the Excel spreadsheet and click Excel Options in the lower right-hand corner of the pull-down menu. On the left side of the Excel Options page, click Add-ins and then the Go button at the bottom of the page. This should open the Add-ins section.
3. Select Analysis ToolPak and Analysis ToolPak - VBA and click OK.

For Excel 2003 or earlier versions:

1. Click the Tools pull-down menu and click Data Analysis.
2. If Data Analysis does not appear on the Tools pull-down menu, click Add-Ins and check the first two boxes (Analysis ToolPak and Analysis ToolPak - VBA). Click OK and open Data Analysis.

Using ToolPak Descriptive Statistics

Open Data Analysis from the Analysis ToolPak add-in, select Descriptive Statistics from the Analysis Tools list and click OK. In the Descriptive Statistics dialog box, enter the cell range of the data as the Input Range. Click the Columns option and Labels in first row. See Designing Effective Worksheets in Section 1.6 of Levine, et al. 2008. Statistics For Managers Using Microsoft Excel, Fifth Edition. Pearson Education, Inc. Upper Saddle River, New Jersey, 07458.
Finish by clicking New Worksheet Ply, Summary statistics, Kth Largest, and Kth Smallest, and then OK.

2.7 Box-and-Whisker Plot
In descriptive statistics, a box plot or boxplot (also known as a box-and-whisker diagram or plot) is a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (sample maximum). A boxplot may also indicate which observations, if any, might be considered outliers.
A boxplot provides a simple graphical summary of a set of data. It shows a measure of central location (the median), two measures of dispersion (the range and interquartile range), the skewness (from the orientation of the median relative to the quartiles) and potential outliers (marked individually). Boxplots are especially useful when comparing two or more sets of data.
Regrettably, there is currently no boxplot facility in Microsoft Excel. For simplicity, many recent statistics textbooks (for example, Daly et al, 1995) omit the fences used to identify possible outliers. These simplified boxplots, displaying most of the important features, can be drawn quite easily in Excel. In the absence of any fences (see Devore and Peck (1990) for a definition), a simple rule is that a whisker which is longer than three times the length of the box probably indicates an outlier.


Create a Box-and-Whisker Plot

To create a box plot using Ms Excel 2007:
1. Highlight the whole table, including figures and series labels, then select Insert from the main menu. Under Charts select a Line chart and choose the Line with Markers option.
2. Under Chart Tools select Design > Switch Row/Column. Right-click on a data point from the first data series, and choose Format Data Series > Line Colour > No line to remove the connecting lines. Repeat for the other four data series in turn.
3. Select any of the data series and under Chart Tools select Layout > Analysis > Lines > High-Low Lines, then Layout > Analysis > Lines > Up/Down Bars > Up/Down Bars.
4. Further customising can be carried out according to your own preferences by right-clicking on the relevant object and selecting the Format option on the shortcut menu.

The result:
[Figure: box-and-whisker chart for set1, set2 and set3; vertical axis from 0 to 90; legend series in the order Q1, Min, Median, Max, Q3]


2.8 Assignment 2.3
Replicate the box-and-whisker plot procedure in Section 2.7.

2.9 Weighted Mean
Excel does not contain a built-in function to calculate a weighted average. It is, however, easy to do using the SUMPRODUCT() function in a simple formula.

     A            B        C
2    Weighted average
3                 Cost     Staff
4    Grade A      13000    5
5    Grade B      15000    2
6    Grade C      20000    3
7
8    Average      16000
9    Wtd Avg      15500

SUMPRODUCT() multiplies two arrays (or ranges) together and returns the sum of the products. In the illustration it would calculate '(B4 x C4) + (B5 x C5) + (B6 x C6)'.
The formula in cell B9 is: =SUMPRODUCT(B4:B6,C4:C6)/SUM(C4:C6)
The result shows that the weighted average is less than the plain arithmetic mean. This is because it takes into account the larger number of staff being paid the lower salary.

      F                    G              H
13    Forecast incorporating risk
14
15                         Probability    Sales
16    Good weather         30%            10000
17    Mediocre weather     50%            8000
18    Poor weather         19%            2000
19    Hurricane            1%             0
20
21    Forecast             100%           7380

The weighted average can also be used for assessing risk or determining the probability of various outcomes. If a judgement is made about the likelihood of various weather conditions for an outdoor sporting event and their effect on ticket sales, a predicted value of sales can be calculated using a formula similar to the previous example. =SUMPRODUCT(G16:G19,H16:H19) returns the value of 7,380. The probability values (G16:G19) are already expressed as percentages (total = 100% or 1.0) and so there is no need to divide by SUM(G16:G19).
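
Both SUMPRODUCT examples can be mirrored in a few lines of Python. The helper name sum_product is a hypothetical stand-in for Excel's SUMPRODUCT, used only for illustration.

```python
def sum_product(first, second):
    """Multiply two equal-length lists element by element and add the products."""
    return sum(a * b for a, b in zip(first, second))

# Weighted average salary: cost per grade weighted by number of staff
cost = [13000, 15000, 20000]
staff = [5, 2, 3]
plain_average = sum(cost) / len(cost)
weighted_average = sum_product(cost, staff) / sum(staff)

# Forecast incorporating risk: probabilities already sum to 1.0
probability = [0.30, 0.50, 0.19, 0.01]
sales = [10000, 8000, 2000, 0]
forecast = sum_product(probability, sales)

print(plain_average, weighted_average)   # 16000.0 15500.0
print(round(forecast))                   # 7380
```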

2.10 Assignment 2.4

Capital Component     Cost      % of capital structure
Retained Earnings     8%        30%
Common Stocks         9%        10%
Preferred Stocks      10%       15%
Debt (Bonds)          6.67%     45%

Using the table above, calculate the weighted average cost of capital (WACC) of this company!

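2.11 Correlation Coefficients

2.11.1.1 Correlation Coefficient Formula
If (X1,Y1), (X2,Y2), (X3,Y3), ..., (Xn,Yn) are the observed values, then the correlation coefficient (usually denoted Corr(X,Y) or $\rho_{XY}$) of the observed sample is defined as:

\[ \rho_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \]

Another way of visualizing the formula is:

\[ \rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{S_X S_Y} \]

where $S_X$ and $S_Y$ are the sample standard deviations of X and Y.

Now we generalize the idea of the sample correlation coefficient to the case where the sample is not bivariate but multivariate.
Let X~1, X~2, X~3, ..., X~n be a random sample where each X~i is a k-dimensional vector of the form X~i = (Xi1, Xi2, Xi3, ..., Xik).
Just as in the case of the sample covariance, in the multivariate case we talk of the sample correlation coefficient matrix. Like the dispersion matrix, the sample correlation coefficient matrix is a square matrix of order k x k, defined as below.

\[ R = \begin{pmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & \rho_{kk} \end{pmatrix} \]

All the diagonal entries are 1, as both mathematically and heuristically we see that the correlation coefficient of any variable with itself should be 1:
$\rho_{ii} = 1$ for all i

Similar to the dispersion matrix, the off-diagonal elements $\rho_{ij}$ are the correlation coefficients of the ith and jth variables.

Or in another way:

\[ \rho_{ij} = \frac{\mathrm{Cov}(X_i, X_j)}{S_i S_j} \]
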
2.11.1.2 Ms Excel Function for Calculating Correlation
Step 1: To make this calculation, select Tools / Data Analysis / Correlation. The Correlation dialog box is displayed.

Step 2: In the Input Range text box, enter the range of the data (include the first row containing the variable names) or click on the data selection icon and mark the range to use.

Step 3: Notice that the Labels in First Row check box is checked.

Step 4: Click on OK and the following information will appear in a new worksheet:

     A        B           C
1             TIME1       TIME2
2    TIME1    1
3    TIME2    0.763957    1

The Pearson's correlation for these two variables is 0.764 (rounded).

2.11.1.3 A second way to calculate the correlation is with a function.
Step 1: In the Example worksheet, enter some labels in column I to indicate that you are calculating a correlation.

Step 2: In cell J3 (or wherever you want it), you will enter an Excel function that will calculate the desired correlation.
Step 3: Enter the formula
=CORREL(C2:C51,D2:D51)
Note that it is of the form =CORREL(array1,array2),
where the first array and second array contain the paired numbers to correlate. (It is IMPORTANT that the numbers be paired correctly.)
The answer will appear in the cell. In this case, the Pearson's correlation is 0.764 (rounded).
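
The correlation formula can also be evaluated directly. The sketch below uses a small made-up data set (not the TIME1/TIME2 data, which are not reproduced in this module), and the function name is hypothetical.

```python
import math

def pearson_correlation(x, y):
    """Sum of cross-deviations divided by the root of the two sums of squares."""
    if len(x) != len(y):
        raise ValueError("the two arrays must contain paired values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_correlation(x, y)
print(round(r, 4))   # 0.7746
```

As with CORREL, the result depends on the values being paired correctly: the ith entry of x must belong to the same observation as the ith entry of y.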
2.11.1.4 Calculating the Correlation Using the Formula


2.12 Covariance
For a bivariate sample we have dealt with the covariance already. Let us just recall it:
Given a random sample (X1,Y1), (X2,Y2), (X3,Y3), ..., (Xn,Yn), the sample covariance Cov(X,Y) is defined as

\[ \mathrm{Cov}(X,Y) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \]

2.12.1.1 Ms Excel Calculation for Covariance:

2.12.1.2 Ms Excel Function for Covariance:
To calculate the covariance using an Ms Excel function we can use COVAR(array1,array2).

The COVAR function bases its calculation on the equation

\[ \mathrm{Cov}(X,Y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} \]

where $\bar{x}$ and $\bar{y}$ are the sample means AVERAGE(array1) and AVERAGE(array2), and n is the sample size.
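
The difference between dividing by n - 1 (the sample covariance) and dividing by n (the COVAR equation) can be seen in a short Python sketch. The paired data are made up for illustration, and the function names are hypothetical.

```python
def cross_deviation_sum(x, y):
    """Sum of (x_i - mean_x) * (y_i - mean_y) over all pairs."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

def sample_covariance(x, y):
    return cross_deviation_sum(x, y) / (len(x) - 1)   # divide by n - 1

def covar_style_covariance(x, y):
    return cross_deviation_sum(x, y) / len(x)          # divide by n, as in the COVAR equation

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_sample = sample_covariance(x, y)
cov_covar = covar_style_covariance(x, y)

print(cov_sample, cov_covar)   # 1.5 1.2
```

The two results differ only by the factor (n - 1)/n, so the gap shrinks as the sample size grows.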

2.13 Assignment 2.5
2.13.1 Calories and Fat Relationship

Product                                                                    Calories    Fat
Dunkin' Donuts Iced Mocha Swirl latte (whole milk)                         240         8.0
Starbucks Coffee Frappuccino blended coffee                                260         3.5
Dunkin' Donuts Coffee Coolatta (cream)                                     350         22.0
Starbucks Iced Coffee Mocha Espresso (whole milk and whipped cream)        350         20.0
Starbucks Mocha Frappuccino blended coffee (whipped cream)                 420         16.0
Starbucks Chocolate Brownie Frappuccino blended coffee (whipped cream)     510         22.0
Starbucks Chocolate Frappuccino Blended Crème (whipped cream)              530         19.0

Using the data above:

a. Compute the covariance using both techniques explained above and compare. Explain!
b. Compute the coefficient of correlation using the techniques explained above.
c. Which do you think is more valuable in expressing the relationship between calories and fat: the covariance or the coefficient of correlation? Explain.
d. What are your conclusions about the relationship between calories and fat? Explain.

2.13.2 Fuel Efficiency Calculation and Standard

Car                          Owner    Government Standard
2005 Ford F-150              14.3     16.8
2005 Chevrolet Silverado     15.0     17.8
2002 Honda Accord LX         27.8     26.2
2002 Honda Civic             27.9     34.2
2004 Honda Civic Hybrid      48.8     47.6
2002 Ford Explorer           16.8     18.3
2005 Toyota Camry            23.7     28.5
2003 Toyota Corolla          32.8     33.1
2005 Toyota Prius            37.3     56.0

a. Compute the covariance using both techniques explained above and compare. Explain!
b. Compute the coefficient of correlation using the techniques explained above.
c. What are your conclusions about the relationship between the owner calculation and the government standard? Explain.


Practicum: MATH11002
Business Statistics
MODULE 3

Submitted only on
Day/Date: ____________ / ______________
Time: WIB
In ____________________

Date of Receipt:

Score:

Assistant Signature:

I, the undersigned, state that I have strived to complete this module by myself.
Name/NIM: ______________________________ / _______________
Signature: _______________________________________________
Rem.:

Module Description: Probability
Objective: The student understands and is able to define and examine basic probability concepts; define conditional, joint and marginal probability; use Bayes' theorem to revise probabilities; understand statistical independence; address the probability of a discrete random variable; define covariance and discuss its application in finance; compute probabilities from the binomial, Poisson and hypergeometric distributions; and use these distributions to solve business problems using Ms Excel, regression analysis or other statistical software.
Output: The report produced by the students should be in the form of working procedures and results, in both soft copy and hard copy.

3 PROBABILITY

3.1 Basic Probability

Probability: the chance that an uncertain event will occur (always between 0 and 1)

Event: each possible type of occurrence or outcome

Simple event: an event that can be described by a single characteristic

Sample space: the collection of all possible events

There are three approaches to assessing the probability of an uncertain event:

1. A priori classical probability: the probability of an event is based on prior knowledge of the process involved.

Example: Find the probability of selecting a face card (Jack, Queen, or King) from a standard deck of 52 cards. Answer:

\[ \text{Probability of a face card} = \frac{12 \text{ face cards}}{52 \text{ total cards}} = \frac{3}{13} \approx 0.231 \]

2. Empirical classical probability: the probability of an event is based on observed data.

\[ \text{Probability of occurrence} = \frac{\text{number of favorable outcomes observed}}{\text{total number of outcomes observed}} \]

Example: Find the probability of selecting a male taking statistics from the population described in the following table:

            Taking Stats    Not Taking Stats    Total
Male        84              145                 229
Female      76              134                 210
Total       160             279                 439

\[ \text{Probability of a male taking stats} = \frac{84}{439} = 0.191 \]

3. Subjective probability: the probability of an event is determined by an individual, based on that person's past experience, personal opinion, and/or analysis of a particular situation.

3.2 Sample Spaces and Events, Contingency Tables, Simple Probability and Joint Probability
3.2.1 Sample Space
The sample space is the collection of all possible events.
Ex. All 6 faces of a die
Ex. All 52 cards in a deck of cards
Ex. All possible outcomes when having a child: Boy or Girl

3.2.2 Events in a Sample Space
Simple event
An outcome from a sample space with one characteristic
ex. A red card from a deck of cards
Complement of an event A (denoted A')
All outcomes that are not part of event A
ex. All cards that are not diamonds
Joint event
Involves two or more characteristics simultaneously
ex. An ace that is also red from a deck of cards

In mathematics, the probability of an event A is represented by a real number in the range from 0 to 1 and written as P(A), p(A) or Pr(A). An impossible event has a probability of 0, and a certain event has a probability of 1. The opposite or complement of an event A is the event [not A] (that is, the event of A not occurring); its probability is given by P(not A) = 1 - P(A). As an example, the chance of not rolling a six on a six-sided die is 1 - (chance of rolling a six) = $1 - \frac{1}{6} = \frac{5}{6}$.

3.2.3 Simple and Joint Probability
Simple (marginal) probability refers to the probability of a simple event.
ex. P(King)
Joint probability refers to the probability of an occurrence of two or more events.
ex. P(King and Spade)

If both the events A and B occur on a single performance of an experiment, this is called the intersection or joint probability of A and B, denoted as $P(A \cap B)$. If two events, A and B, are independent, then the joint probability is

\[ P(A \text{ and } B) = P(A \cap B) = P(A)P(B) \]

For example, if two coins are flipped, the chance of both being heads is $\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.

If either event A or event B or both events occur on a single performance of an experiment, this is called the union of the events A and B, denoted as $P(A \cup B)$. If two events are mutually exclusive, then the probability of either occurring is

\[ P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) \]

For example, the chance of rolling a 1 or 2 on a six-sided die is

\[ P(1 \text{ or } 2) = P(1) + P(2) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3} \]

If the events are not mutually exclusive, then

\[ P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) \]

For example, when drawing a single card at random from a regular deck of cards, the chance of getting a heart or a face card (J, Q, K) (or one that is both) is $\frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{11}{26}$, because of the 52 cards of a deck 13 are hearts, 12 are face cards, and 3 are both: here the possibilities included in the "3 that are both" are included in each of the "13 hearts" and the "12 face cards" but should only be counted once.
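
The heart-or-face-card example can be verified by listing all 52 cards and counting, as in the Python sketch below. The rank and suit labels are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))                     # sample space of 52 cards

hearts = {card for card in deck if card[1] == "hearts"}
faces = {card for card in deck if card[0] in ("J", "Q", "K")}

def probability(event):
    return Fraction(len(event), len(deck))

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
by_rule = probability(hearts) + probability(faces) - probability(hearts & faces)

# Direct count of the union, where each card is counted only once
by_count = probability(hearts | faces)

print(by_rule, by_count)   # 11/26 11/26
```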

Conditional probability is the probability of some event A, given the occurrence of some other event B. Conditional probability is written P(A|B), and is read "the probability of A, given B". It is defined by

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]

If P(B) = 0 then P(A|B) is undefined.

Summary of probabilities

Event        Probability
A            $P(A) \in [0, 1]$
not A        $P(A') = 1 - P(A)$
A or B       $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
             $= P(A) + P(B)$ if A and B are mutually exclusive
A and B      $P(A \cap B) = P(A \mid B)P(B)$
             $= P(A)P(B)$ if A and B are independent
A given B    $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

3.3 Bayes' Theorem

\[ P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2) + \cdots + P(A \mid B_k)\,P(B_k)} \]

where:
$B_i$ = the i-th event of k mutually exclusive and collectively exhaustive events
$A$ = new event that might impact $P(B_i)$

Bayes' Theorem Example
A drilling company has estimated a 40% chance of striking oil for their new well. A detailed test has
been scheduled for more information. Historically, 60% of successful wells have had detailed tests, and
20% of unsuccessful wells have had detailed tests. Given that this well has been scheduled for a
detailed test, what is the probability that the well will be successful?

Solution:
Let S = successful well and U = unsuccessful well
P(S) = .4, P(U) = .6 (prior probabilities)
Define the detailed test event as D
Conditional probabilities: P(D|S) = .6 and P(D|U) = .2

\[ P(S \mid D) = \frac{P(D \mid S)\,P(S)}{P(D \mid S)\,P(S) + P(D \mid U)\,P(U)} = \frac{(.6)(.4)}{(.6)(.4) + (.2)(.6)} = \frac{.24}{.24 + .12} = .667 \]

Given the detailed test, the revised probability of a successful well has risen to .667 from the
original estimate of 0.4.

Event              Prior Prob.   Conditional Prob.   Joint Prob.      Revised Prob.
S (successful)     .4            .6                  .4*.6 = .24      .24/.36 = .667
U (unsuccessful)   .6            .2                  .6*.2 = .12      .12/.36 = .333
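The revision table can be reproduced with a few lines of code. This is a minimal sketch, not part of the module's Excel workflow; the function name `bayes_posterior` and the dictionary layout are our own choices.

```python
def bayes_posterior(priors, likelihoods):
    """Return revised (posterior) probabilities via Bayes' theorem.

    priors:      {event: P(event)}
    likelihoods: {event: P(new information | event)}
    """
    # Joint probability of each event together with the new information.
    joints = {event: priors[event] * likelihoods[event] for event in priors}
    # The denominator of Bayes' theorem: total probability of the new information.
    total = sum(joints.values())
    return {event: joint / total for event, joint in joints.items()}


# Drilling example: S = successful well, U = unsuccessful well, D = detailed test.
priors = {"S": 0.4, "U": 0.6}
likelihoods = {"S": 0.6, "U": 0.2}  # P(D|S) and P(D|U)

posterior = bayes_posterior(priors, likelihoods)
print(round(posterior["S"], 3), round(posterior["U"], 3))  # 0.667 0.333
```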

3.4 Assignment 3.1
Create the entries as in the screenshot below, or use the Probability.xls workbook file from the CD
companion of the Statistics for Managers Using Microsoft Excel textbook.

Input the data only into the blue cells.

Probabilities

Sample Space          Column Variable
Row Variable          B       B'      Totals
A                     200     50      250
A'                    100     650     750
Totals                300     700     1000

Simple Probabilities
P(A)                  0.25
P(A')                 0.75
P(B)                  0.30
P(B')                 0.70

Joint Probabilities
P(A and B)            0.20
P(A and B')           0.05
P(A' and B)           0.10
P(A' and B')          0.65

Addition Rule
P(A or B)             0.35
P(A or B')            0.90
P(A' or B)            0.95
P(A' or B')           0.80
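As a cross-check on the worksheet, the same probabilities can be derived from the four cell counts. The sketch below is illustrative; the helper names `simple`, `joint` and `union` are our own and do not correspond to anything in Probability.xls.

```python
# Cell counts from the sample-space table: (row event, column event) -> count.
counts = {
    ("A", "B"): 200, ("A", "B'"): 50,
    ("A'", "B"): 100, ("A'", "B'"): 650,
}
total = sum(counts.values())


def simple(event):
    """Simple (marginal) probability of a row or column event."""
    return sum(n for cell, n in counts.items() if event in cell) / total


def joint(row, col):
    """Joint probability P(row and col)."""
    return counts[(row, col)] / total


def union(row, col):
    """General addition rule: P(row or col) = P(row) + P(col) - P(row and col)."""
    return simple(row) + simple(col) - joint(row, col)


print(simple("A"), simple("B"))          # 0.25 0.3
print(joint("A", "B"))                   # 0.2
print(round(union("A", "B"), 2))         # 0.35
print(round(union("A'", "B'"), 2))       # 0.8
```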

1. A music store has been visited, at random times, by 7 customers who bought some goods and 9 others
   who were just window shopping. Achmad (a customer) arrived at 11:30 am.
   a. Give an example of a simple event.
   b. What is the complement of a customer having bought some goods?
2. Given the following contingency table:

        B     B'
   A    12    48
   A'   30    54

   Use a calculator and MS Excel to find the probability of
   a. Event A
   b. Event A and B
   c. Event A and B
   d. Event A and B
3. Compare the calculation results (calculator and MS Excel).
4. A box of nine gloves contains two left-handed gloves and seven right-handed gloves.
   a. If two gloves are randomly selected from the box without replacement, what is the
      probability that both gloves selected will be right-handed?
   b. If two gloves are randomly selected from the box without replacement, what is the
      probability that there will be one right-handed and one left-handed glove?
   c. If three gloves are selected from the box with replacement, what is the probability that
      all three gloves will be left right handed?
   d. If you were sampling with replacement, what would be the answers to (a) and (b)?


5. An advertising executive is studying the television viewing habits of married men and women during
   prime-time hours. Based on past viewing records, the executive has determined that during
   prime time, husbands are watching television 60% of the time. When the husband is watching
   television, 40% of the time the wife is also watching. When the husband is not watching
   television, 30% of the time the wife is watching television. Find the probability that
   a. If the wife is watching television, the husband is also watching television.
   b. The wife is watching television in prime time.

3.5 Basic Probability Rules

A random variable represents a possible numerical value from an uncertain event.
• Discrete random variables produce outcomes that come from a counting process (e.g. the number
  of classes you are taking).
• Continuous random variables produce outcomes that come from a measurement (e.g. your
  annual salary, or your weight).

3.5.1 Discrete Random Variable
A probability distribution for a discrete random variable is a mutually exclusive listing of all
possible numerical outcomes for that variable and a particular probability of occurrence
associated with each outcome.

Number of Classes Taken    Probability
2                          0.2
3                          0.4
4                          0.24
5                          0.16

Example: Experiment with a toss of 2 coins. Let X = number of heads.

X Value    Probability
0          1/4 = .25
1          2/4 = .50
2          1/4 = .25
3.5.2 Discrete Random Variables: Expected Value
Expected value (or mean) of a discrete distribution (weighted average):

\[ \mu = E(X) = \sum_{i=1}^{N} X_i \, P(X_i) \]

Example: Toss 2 coins, X = # of heads.
Compute the expected value of X:
E(X) = (0)(.25) + (1)(.50) + (2)(.25) = 1.0

3.5.3 Discrete Random Variables: Dispersion
Variance of a discrete random variable:

\[ \sigma^2 = \sum_{i=1}^{N} [X_i - E(X)]^2 \, P(X_i) \]

Standard deviation of a discrete random variable:

\[ \sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} [X_i - E(X)]^2 \, P(X_i)} \]

where:
E(X) = expected value of the discrete random variable X
$X_i$ = the i-th outcome of X
$P(X_i)$ = probability of the i-th occurrence of X

Example: Toss 2 coins, X = # heads, compute the standard deviation (recall that E(X) = 1):

\[ \sigma = \sqrt{(0-1)^2(.25) + (1-1)^2(.50) + (2-1)^2(.25)} = \sqrt{.50} = .707 \]
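The expected value and standard deviation formulas translate directly into code. The following is a small illustrative sketch for the two-coin example; the dictionary `dist` is simply our encoding of the X Value / Probability table.

```python
from math import sqrt

# Probability distribution of X = number of heads in two coin tosses.
dist = {0: 0.25, 1: 0.50, 2: 0.25}

# E(X) = sum of X_i * P(X_i)
expected = sum(x * p for x, p in dist.items())

# Variance = sum of [X_i - E(X)]^2 * P(X_i); standard deviation is its square root.
variance = sum((x - expected) ** 2 * p for x, p in dist.items())
std_dev = sqrt(variance)

print(expected, variance, round(std_dev, 3))  # 1.0 0.5 0.707
```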


3.5.4 Covariance
The covariance measures the strength of the linear relationship between two numerical random
variables X and Y. A positive covariance indicates a positive relationship. A negative covariance
indicates a negative relationship.

Covariance formula:

\[ \sigma_{XY} = \sum_{i=1}^{N} [X_i - E(X)]\,[Y_i - E(Y)] \, P(X_i Y_i) \]

where:
X = discrete variable X
$X_i$ = the i-th outcome of X
Y = discrete variable Y
$Y_i$ = the i-th outcome of Y
$P(X_i Y_i)$ = probability of occurrence of the condition affecting the i-th outcome of X and the
i-th outcome of Y
Example:
Consider the return per $1000 for two types of investments.

                                      Investment
$P(X_i Y_i)$   Economic Condition     Passive Fund X    Aggressive Fund Y
0.2            Recession              −$25              −$200
0.5            Stable Economy         +$50              +$60
0.3            Expanding Economy      +$100             +$350

Investment Returns: The Mean
$E(X) = \mu_X = (-25)(.2) + (50)(.5) + (100)(.3) = 50$
$E(Y) = \mu_Y = (-200)(.2) + (60)(.5) + (350)(.3) = 95$
Interpretation: Fund X is averaging a $50.00 return and fund Y is averaging a $95.00
return per $1000 invested.

Investment Returns: Standard Deviation

\[ \sigma_X = \sqrt{(-25-50)^2(.2) + (50-50)^2(.5) + (100-50)^2(.3)} = 43.30 \]
\[ \sigma_Y = \sqrt{(-200-95)^2(.2) + (60-95)^2(.5) + (350-95)^2(.3)} = 193.71 \]

Interpretation: Even though fund Y has a higher average return, it is subject to much
more variability and the probability of loss is higher.

Investment Returns: Covariance

\[ \sigma_{XY} = (-25-50)(-200-95)(.2) + (50-50)(60-95)(.5) + (100-50)(350-95)(.3) = 8250 \]

Interpretation: Since the covariance is large and positive, there is a positive relationship
between the two investment funds, meaning that they will likely rise and fall together.
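The mean, standard deviation and covariance of the two funds can be recomputed from the scenario table. This is a sketch under the assumption that the table is encoded as (probability, return of X, return of Y) tuples; the variable names are ours.

```python
from math import sqrt

# (probability, return of Passive Fund X, return of Aggressive Fund Y) per $1000.
scenarios = [
    (0.2, -25, -200),   # recession
    (0.5, 50, 60),      # stable economy
    (0.3, 100, 350),    # expanding economy
]

# Expected returns.
e_x = sum(p * x for p, x, _ in scenarios)
e_y = sum(p * y for p, _, y in scenarios)

# Standard deviations.
sd_x = sqrt(sum(p * (x - e_x) ** 2 for p, x, _ in scenarios))
sd_y = sqrt(sum(p * (y - e_y) ** 2 for p, _, y in scenarios))

# Covariance: probability-weighted product of the two deviations.
cov_xy = sum(p * (x - e_x) * (y - e_y) for p, x, y in scenarios)

print(round(e_x, 2), round(e_y, 2))      # 50.0 95.0
print(round(sd_x, 2), round(sd_y, 2))    # 43.3 193.71
print(round(cov_xy, 2))                  # 8250.0
```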

3.5.5 The Sum of Two Random Variables: Measures
Expected value: $E(X + Y) = E(X) + E(Y)$
Variance: $\mathrm{Var}(X + Y) = \sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y + 2\sigma_{XY}$
Standard deviation: $\sigma_{X+Y} = \sqrt{\sigma^2_{X+Y}}$

Example: Portfolio Expected Return and Expected Risk
Investment portfolios usually contain several different funds (random variables).
The expected return and standard deviation of two funds together can now be calculated.
Investment objective: maximize return (mean) while minimizing risk (standard deviation).

Recall:
Investment X: $E(X) = 50$, $\sigma_X = 43.30$
Investment Y: $E(Y) = 95$, $\sigma_Y = 193.71$
$\sigma_{XY} = 8250$

Suppose 40% of the portfolio is in Investment X and 60% is in Investment Y:

\[ E(P) = .4(50) + (.6)(95) = 77 \]
\[ \sigma_P = \sqrt{(.4)^2(43.30)^2 + (.6)^2(193.71)^2 + 2(.4)(.6)(8250)} = 133.30 \]

The portfolio return is between the values for investments X and Y considered individually.
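A short sketch of the portfolio calculation follows. It assumes a weight w on fund X and (1 − w) on fund Y, and it takes the variances of the two funds (1875 and 37525, i.e. the squares of 43.30 and 193.71 before rounding) from the standard deviation step of the example; the function name `portfolio` is our own.

```python
from math import sqrt


def portfolio(w, e_x, e_y, var_x, var_y, cov_xy):
    """Expected return and risk of a two-fund portfolio.

    w is the share invested in X; (1 - w) is the share invested in Y.
    """
    expected_return = w * e_x + (1 - w) * e_y
    variance = w ** 2 * var_x + (1 - w) ** 2 * var_y + 2 * w * (1 - w) * cov_xy
    return expected_return, sqrt(variance)


# Values from the investment example (variances are the unrounded squares
# of the standard deviations 43.30 and 193.71).
exp_return, risk = portfolio(0.4, 50, 95, 1875, 37525, 8250)
print(round(exp_return, 2), round(risk, 2))  # 77.0 133.3
```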

3.6 Binomial Distribution
3.6.1 Properties
• A fixed number of observations, n
  ex. 15 tosses of a coin; ten light bulbs taken from a warehouse
• Two mutually exclusive and collectively exhaustive categories
  ex. head or tail in each toss of a coin; defective or not defective light bulb; having a
  boy or girl
  Generally called "success" and "failure"
  Probability of success is p, probability of failure is 1 − p
• Constant probability for each observation
  ex. the probability of getting a tail is the same each time we toss the coin
• Observations are independent
  The outcome of one observation does not affect the outcome of the other
• Two sampling methods
  o Infinite population without replacement
  o Finite population with replacement

The number of combinations of selecting X objects out of n objects is:

\[ {}_nC_X = \binom{n}{X} = \frac{n!}{X!\,(n-X)!} \]

where:
n! = n(n−1)(n−2)...(2)(1)
X! = X(X−1)(X−2)...(2)(1)
0! = 1 (by definition)


3.6.2 The Binomial Distribution Formula

\[ P(X) = \frac{n!}{X!\,(n-X)!} \, p^X (1-p)^{n-X} \]

P(X) = probability of X successes in n trials, with probability of success p on each trial
X = number of successes in the sample (X = 0, 1, 2, ..., n)
n = sample size (number of trials or observations)
p = probability of success

Example: What is the probability of one success in five observations if the probability of success
is .1? X = 1, n = 5, and p = .1

\[ P(X = 1) = \frac{n!}{X!\,(n-X)!} \, p^X (1-p)^{n-X} = \frac{5!}{1!\,(5-1)!} (.1)^1 (1-.1)^{5-1} = (5)(.1)(.9)^4 = .32805 \]
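The binomial formula can be evaluated with the standard library. This is a minimal sketch; `binomial_pmf` is our own helper name, and `math.comb` supplies the number of combinations.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# One success in five observations with p = .1
print(round(binomial_pmf(1, 5, 0.1), 5))  # 0.32805

# The probabilities over all possible outcomes sum to 1.
total = sum(binomial_pmf(x, 5, 0.1) for x in range(6))
```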
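3.6.3 The Shape and Characteristics
The shape of the binomial distribution depends on the values of p and n.

Mean: $\mu = E(X) = np$
Variance and standard deviation: $\sigma^2 = np(1-p)$ and $\sigma = \sqrt{np(1-p)}$

For n = 5 and p = .1:
$\mu = np = (5)(.1) = 0.5$
$\sigma = \sqrt{np(1-p)} = \sqrt{(5)(.1)(1-.1)} = 0.6708$

For n = 5 and p = .5:
$\mu = np = (5)(.5) = 2.5$
$\sigma = \sqrt{np(1-p)} = \sqrt{(5)(.5)(1-.5)} = 1.118$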

3.7 Poisson Distribution
An area of opportunity is a continuous unit or interval of time, volume, or such area in which
more than one occurrence of an event can occur.
ex. The number of scratches in a car's paint
ex. The number of mosquito bites on a person
ex. The number of computer crashes in a day

3.7.1 Properties
• Count the number of times an event occurs in a given area of opportunity
• The probability that an event occurs in one area of opportunity is the same for all areas of
  opportunity
• The number of events that occur in one area of opportunity is independent of the number
  of events that occur in the other areas of opportunity
• The probability that two or more events occur in an area of opportunity approaches zero as
  the area of opportunity becomes smaller
• The average number of events per unit is $\lambda$ (lambda)

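3.7.2 Formula

\[ P(X) = \frac{e^{-\lambda} \lambda^X}{X!} \]

where:
P(X) = the probability of X events in an area of opportunity
$\lambda$ = expected number of events
e = mathematical constant approximated by 2.71828

Suppose that, on average, 5 cars enter a parking lot per minute. What is the probability that in a
given minute, 7 cars will enter? So, X = 7 and $\lambda$ = 5:

\[ P(7) = \frac{e^{-\lambda} \lambda^X}{X!} = \frac{e^{-5} \, 5^7}{7!} = 0.104 \]

So, there is a 10.4% chance that 7 cars will enter the parking lot in a given minute.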
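The Poisson formula is easy to evaluate directly. The sketch below is illustrative; `poisson_pmf` is our own helper name and `lam` stands for the expected number of events.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with expected number of events lam."""
    return exp(-lam) * lam ** x / factorial(x)


# Parking-lot example: on average 5 cars per minute, probability of exactly 7.
print(round(poisson_pmf(7, 5), 3))  # 0.104
```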


3.7.3 Shape

For $\lambda$ = 0.5:

X    P(X)
0    0.6065
1    0.3033
2    0.0758
3    0.0126
4    0.0016
5    0.0002
6    0.0000
7    0.0000

For example, P(X = 2) = .0758.

[Figure: bar chart of P(x) against x for $\lambda$ = 0.5]

3.8 Hypergeometric Distribution

The binomial distribution is applicable when selecting from a finite population with replacement
or from an infinite population without replacement.
The hypergeometric distribution is applicable when selecting from a finite population without
replacement.

• n trials in a sample taken from a finite population of size N
• Sample taken without replacement
• Outcomes of trials are dependent
• Concerned with finding the probability of X successes in the sample where there are A
  successes in the population

3.8.1 Formula

\[ P(X) = \frac{\binom{A}{X} \binom{N-A}{n-X}}{\binom{N}{n}} \]

where:
N = population size
A = number of successes in the population
N − A = number of failures in the population
n = sample size
X = number of successes in the sample
n − X = number of failures in the sample

The mean of the hypergeometric distribution is:

\[ \mu = E(X) = \frac{nA}{N} \]

The standard deviation is:

\[ \sigma = \sqrt{\frac{nA(N-A)}{N^2}} \cdot \sqrt{\frac{N-n}{N-1}} \]

where $\sqrt{\frac{N-n}{N-1}}$ is called the finite population correction factor, which results from
sampling without replacement from a finite population.
3.8.2 Example
Three different computers are checked from 10 in the department. 4 of the 10 computers have illegal
software loaded. What is the probability that 2 of the 3 selected computers have illegal
software loaded?

So, N = 10, n = 3, A = 4, X = 2

\[ P(X = 2) = \frac{\binom{A}{X} \binom{N-A}{n-X}}{\binom{N}{n}} = \frac{\binom{4}{2} \binom{6}{1}}{\binom{10}{3}} = \frac{(6)(6)}{120} = 0.3 \]

The probability that 2 of the 3 selected computers have illegal software loaded is .30, or 30%.
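The hypergeometric probability, mean and standard deviation can be sketched as follows. The helper names are our own, and the argument order (x, N, A, n) simply mirrors the symbols used in the formula.

```python
from math import comb, sqrt


def hypergeom_pmf(x, N, A, n):
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N items containing A successes."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)


def hypergeom_mean_sd(N, A, n):
    """Mean and standard deviation, including the finite population correction."""
    mean = n * A / N
    sd = sqrt(n * A * (N - A) / N ** 2) * sqrt((N - n) / (N - 1))
    return mean, sd


# Computer example: N = 10, A = 4 with illegal software, sample of n = 3.
print(round(hypergeom_pmf(2, 10, 4, 3), 4))  # 0.3
mean, sd = hypergeom_mean_sd(10, 4, 3)
```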

3.9 Read Excel Companion to Chapter 5
Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson
Education, Inc., Upper Saddle River, New Jersey, pages 211-215.

3.10 Assignment 3.2
1. Problems for Section 5.1: Number 5.2 and 5.4
2. Problems for Section 5.2: Number 5.14
3. Problems for Section 5.3: Number 5.24 and 5.28
4. Problems for Section 5.4: Number 5.34 and 5.42
5. Problems for Section 5.5: Number 5.46 and 5.50

3.11 Assignment 3.3
1. Problems for Section 6.2: No. 6.2 and No. 6.7
2. Problems for Section 6.3: No. 6.14, 6.15 and No. 6.16
3. Problems for Section 6.4: No. 6.24, 6.25 and No. 6.26
4. Problems for Section 6.5: No. 6.35 and No. 6.36

Practicum: Math11002 Business Statistics
MODULE 4

Submitted only on
Day/Date: ____________ / ______________
Time: 12.00-14.00 WIB
In ____________________

Date of Receipt:
Score:
Assistant Signature:

I hereby declare that I have strived to do this module by myself.
Name/NIM: ______________________________ / _______________
Signature: _______________________________________________
Rem.:

Module Description: NORMAL AND SAMPLING DISTRIBUTION

Objective:
• Define continuous distributions: normal, uniform and exponential
• Probabilities using formulas and tables
• The concept of the sampling distribution
• The importance of the Central Limit Theorem
• Examine when to apply different distributions

Output:
Use separate papers to report your results (in handwriting or computer
print out). A report produced by the students should be in the form of
working procedures and results in both soft copy and hard copy.

4 NORMAL AND SAMPLING DISTRIBUTION
4.1 Normal Distribution and Evaluating Normality
The normal distribution, or Gaussian distribution, is a continuous probability distribution that describes
data that cluster around the mean. The normal distribution has several theoretical properties:

• Bell-shaped in its appearance
• Measures of central tendency (mean, median and mode) are equal
• Interquartile range is equal to 1.33 standard deviations
• Infinite range

The normal distribution can be used to describe, at least approximately, any variable that tends to
cluster around the mean. For example, the heights of adult males in Indonesia are roughly normally
distributed, with a mean of about 160 cm. Most men have a height close to the mean, though a small
number of outliers have a height significantly above or below the mean. A histogram of male heights will
appear similar to a bell curve, with the correspondence becoming closer if more data are used.

Figure 4-1 Normal Distribution

Source: http://upload.wikimedia.org/wikipedia/commons/b/bb/Normal_distribution_and_scales.gif

By the central limit theorem, the sum of a large number of independent random variables is distributed
approximately normally. For this reason, the normal distribution is used throughout statistics, natural
science, and social science as a simple model for complex phenomena. For example, the observational
error in an experiment is usually assumed to follow a normal distribution, and the propagation of
uncertainty is computed using this assumption.
4.1.1 Normal Probability Density Function

Normal equation. The value of the random variable Y (f(X)) is:

\[ f(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{1}{2}\left[\frac{X-\mu}{\sigma}\right]^2} \]

where X is a normal random variable, $\mu$ is the mean, $\sigma$ is the standard deviation, $\pi$ is
approximately 3.14159, and e is approximately 2.71828.

4.1.1.1 Transformation Formula

\[ Z = \frac{X - \mu}{\sigma} \]

The Z value is equal to the difference between X and the mean, $\mu$, divided by the standard
deviation, $\sigma$.
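The transformation formula can be combined with the standard normal cumulative distribution to obtain probabilities. The sketch below uses Python's `statistics.NormalDist`; the mean of 160 cm echoes the height example above, while the standard deviation of 10 cm and the cut-off of 175 cm are purely hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Transformation formula: Z = (X - mu) / sigma."""
    return (x - mu) / sigma


mu = 160.0     # mean height in cm (from the text)
sigma = 10.0   # hypothetical standard deviation, assumed for this sketch
x = 175.0      # hypothetical height of interest

z = z_score(x, mu, sigma)

# P(X < 175) equals the area under the standard normal curve to the left of Z.
p_below = NormalDist().cdf(z)

print(z, round(p_below, 4))  # 1.5 0.9332
```

The same probability is obtained from `NormalDist(mu, sigma).cdf(x)`, which shows that standardizing does not change the area under the curve.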


4.1.1.2 Probability and the Normal Curve

The normal distribution is a continuous probability distribution. This has several implications for
probability.

• The total area under the normal curve is equal to 1.
• The probability that a normal random variable X equals any particular value is 0.
• The probability that X is greater than a equals the area under the normal curve bounded by a
  and plus infinity (as indicated by the non-shaded area in the figure below).
• The probability that X is less than a equals the area under the normal curve bounded by a and
  minus infinity (as indicated by the shaded area in the figure below).

The standardized normal probability density function is given by the equation:

\[ f(Z) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2} Z^2} \]

Additionally, every normal curve (regardless of its mean or standard deviation) conforms to the
following "rule".

• About 68% of the area under the curve falls within 1 standard deviation of the mean.
• About 95% of the area under the curve falls within 2 standard deviations of the mean.
• About 99.7% of the area under the curve falls within 3 standard deviations of the mean.

Collectively, these points are known as the empirical rule or the 68-95-99.7 rule. Clearly, given a normal
distribution, most outcomes will be within 3 standard deviations of the mean.
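The empirical rule can be verified numerically from the standard normal cumulative distribution. This is a minimal sketch using `statistics.NormalDist`; the helper name `area_within` is our own.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```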
To see how the transformation formula is applied, see pages 222-232, Chapter 6 The Normal Distribution, of
Levine, et al. 2008. Statistics for Managers Using Microsoft Excel, Fifth Edition.

4.1.2 Evaluating Normality

4.1.2.1 Compare Data Characteristics to Theoretical Properties of the Normal Distribution
The normal distribution is:

• Symmetrical: mean and median are equal
• Bell-shaped: the empirical rule applies
• Interquartile range = 1.33 standard deviations

How to compare:
1. Construct charts and observe their appearance. For small or moderate data sets,
   construct a stem-and-leaf display or a box-and-whisker plot. For large data sets, construct the
   frequency distribution and plot the histogram or polygon.
2. Compute descriptive numerical measures and compare the characteristics of the data
   with the theoretical properties of the normal distribution. Compare the mean and median.
   The interquartile range should be approximately 1.33 times the standard deviation. The
   range should be approximately 6 times the standard deviation.
3. Evaluate how the values in the data are distributed. Determine whether 2/3 of the values lie
   between the mean ± 1 standard deviation. Determine whether 4/5 of the values lie between
   the mean ± 1.28 standard deviations. Determine whether 19 out of every 20
   values lie between the mean ± 2 standard deviations.
4. Example:

3-Year Return
Mean                          17.8
Standard Error                0.17099
Median                        17.2
Mode                          15.1
Standard Deviation            4.94991
Sample Variance               24.5016
Kurtosis                      1.03812
Skewness                      0.66073
Range                         35.6
Minimum                       6.7
Maximum                       42.3
Sum                           14916.4
Count                         838
Largest(1)                    42.3
Smallest(1)                   6.7
Confidence Level (95.0%)      0.33562

[Figure: Box-and-Whisker Plot of Three-Year Returns, horizontal axis from 10 to 40]

1. The mean (17.8) is slightly higher than the median (17.2). {Normal dist.: mean = median}
2. The box-and-whisker plot is right-skewed, with a maximum outlier of 42.3. {Normal dist.: symmetrical}
3. The interquartile range of 7.0 is approximately 1.41 standard deviations (SD). {Normal dist.: 1.33 SD}
4. The range of 35.6 is equal to 7.19 SD. {Normal dist.: 6 SD}
5. 74.2% of returns are within 1 SD of the mean. {Normal dist.: 68.26%}
6. 83.3% of returns are within 1.28 SD of the mean. {Normal dist.: 80%}

Thus, based on the facts above, the conclusion is that the three-year returns are right-skewed and
not normally distributed.
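The numerical comparisons in this checklist can be automated. The sketch below is only an illustration: the function name `normality_checks` and the tiny sample are hypothetical, and Python's default quartile method may differ slightly from the one Excel uses, so the interquartile range can deviate a little from a spreadsheet result.

```python
from statistics import mean, median, quantiles, stdev


def normality_checks(values):
    """Compare sample characteristics with the theoretical normal properties."""
    m = mean(values)
    s = stdev(values)                    # sample standard deviation
    q1, _, q3 = quantiles(values, n=4)   # quartiles (Python's default method)
    within_1sd = sum(1 for v in values if abs(v - m) <= s) / len(values)
    return {
        "mean": m,
        "median": median(values),
        "iqr_over_sd": (q3 - q1) / s,                       # normal: about 1.33
        "range_over_sd": (max(values) - min(values)) / s,   # normal: about 6
        "share_within_1sd": within_1sd,                     # normal: about 0.68
    }


# Hypothetical data, used only to show the call.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
checks = normality_checks(sample)
for name, value in checks.items():
    print(name, round(value, 3))
```

Values far from 1.33, 6 and 0.68, or a clear gap between mean and median, point away from normality in the same way as the six observations listed above.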
4.1.2.2 Construct a Normal Probability Plot
A normal probability plot is a graphical approach for evaluating whether data are normally
distributed. The approach is also called a quantile-quantile plot. A normal probability plot for data
from a normal distribution will be approximately linear. To compute normal probabilities and create
plots, we can use PHStat as described in the Excel Companion to Chapter 6 of Levine, et al. 2008.
Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson Education, Inc., Upper
Saddle River, New Jersey, pages 247-249.

4.2 Sampling and Sampling Distribution
Read the Excel Companion to Chapter 7, Levine, et al. 2008. Statistics for Managers Using Microsoft Excel.
Fifth Edition. Pearson Education, Inc., Upper Saddle River, New Jersey, pages 281-282.

4.2.1 Sample
• Selecting a sample is less time-consuming than selecting every item in the population (census).
• Selecting a sample is less costly than selecting every item in the population.
• An analysis of a sample is less cumbersome and more practical than an analysis of the entire
  population.

Figure 4-2 The relationship between populations, samples, parameters, and statistics.

4.2.2 Types of Samples
• In a nonprobability sample, items included are chosen without regard to their probability of
  occurrence.
  o Convenience sampling: items are selected based only on the fact that they are easy,
    inexpensive, or convenient to sample.
  o Judgment sample: you get the opinions of preselected experts in the subject
    matter.
• In a probability sample, items in the sample are chosen on the basis of known probabilities.
  o Simple random sampling: every individual or item from the frame has an equal
    chance of being selected. Selection may be with replacement (the selected individual is
    returned to the frame for possible reselection) or without replacement (the selected
    individual isn't returned to the frame). Samples are obtained from a table of random
    numbers or computer random number generators.
  o Systematic sampling: decide on the sample size, n; divide the frame of N individuals into
    groups of k individuals, k = N/n; randomly select one individual from the 1st group;
    select every k-th individual thereafter.
    For example, suppose you were sampling n = 9 individuals from a population
    of N = 72. So, the population would be divided into k = 72/9 = 8 groups.
    Randomly select a member from group 1, say individual 3. Then, select
    every 8th individual thereafter (i.e. 3, 11, 19, 27, 35, 43, 51, 59, 67).
  o Stratified sampling: divide the population into two or more subgroups (called strata)
    according to some common characteristic. A simple random sample is selected from
    each subgroup, with sample sizes proportional to strata sizes. Samples from
    subgroups are combined into one. This is a common technique when sampling a
    population of voters, stratifying across racial or socioeconomic lines.
  o Cluster sampling: the population is divided into several "clusters", each representative
    of the population. A simple random sample of clusters is selected. All items in the
    selected clusters can be used, or items can be chosen from a cluster using another
    probability sampling technique. A common application of cluster sampling involves
    election exit polls, where certain election districts are selected and sampled.
• Comparing sampling methods
  o Simple random sample and systematic sample
    Simple to use
    May not be a good representation of the population's underlying
    characteristics
  o Stratified sample
    Ensures representation of individuals across the entire population
  o Cluster sample
    More cost effective
    Less efficient (need a larger sample to acquire the same level of precision)
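The systematic sampling procedure described above can be sketched in a few lines. The function name `systematic_sample` and its `start` and `seed` parameters are our own; individuals are assumed to be numbered 1 to N, and N is assumed to be a multiple of n as in the example.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Select n individuals from a frame numbered 1..N by systematic sampling.

    The frame is split into groups of k = N // n individuals; one individual is
    drawn at random from the first group and every k-th individual follows.
    """
    k = N // n
    if start is None:
        start = random.Random(seed).randint(1, k)  # random pick in group 1
    return [start + i * k for i in range(n)]


# Reproduces the example: N = 72, n = 9, first pick is individual 3.
print(systematic_sample(72, 9, start=3))
# [3, 11, 19, 27, 35, 43, 51, 59, 67]
```

Leaving `start` unset draws the first individual at random, which is how the method would be used in practice.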

4.2.3 Sampling Distributions
A sampling distribution is a distribution of all of the possible values of a statistic for a given
size sample selected from a population.
For example, suppose you sample 50 students from your college regarding their mean GPA.
If you obtained many different samples of 50, you would compute a different mean for each
sample. We are interested in the distribution of all potential mean GPAs we might calculate
for any given sample of 50 students.
Example:
  o Suppose your population (simplified) was four people at your institution.
  o Population size N = 4
  o Random variable, X, is the age of individuals
  o Values of X: 18, 20, 22, 24 (years)

4.2.4 SAMPLING FROM FINITE POPULATIONS

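4.2.4.1 USING THE FINITE POPULATION CORRECTION FACTOR WITH THE MEAN
In the cereal-filling example in Section 7.3 on page 265, you selected a sample of 25 cereal
boxes from a filling process with $\mu$ = 368 grams. Suppose that 2,000 boxes (i.e., the population)
are filled on this particular day. Using the fpc factor, what is the probability that the sample
mean is below 365 grams?

SOLUTION: Using the fpc factor, $\sigma$ = 15, n = 25, and N = 2,000, so that

\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{15}{\sqrt{25}} \sqrt{\frac{2000-25}{2000-1}} = 3\sqrt{0.988} = 2.982 \]

The probability that the sample mean is below 365 is computed as follows:

\[ Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{365 - 368}{2.982} = -1.01 \]

From Table E.2, the area below 365 grams is 0.1562.

The fpc factor has a very small effect on the standard error of the mean and the subsequent
area under the normal curve because the sample size is only 1.25% of the population size (that
is, n/N = 25/2,000 = 0.0125).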
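The finite population correction in this example can be reproduced as a short calculation. This is a sketch using the figures quoted above ($\sigma$ = 15, n = 25, N = 2,000); the helper name `fpc` is our own, and the normal area comes from `statistics.NormalDist` rather than Table E.2.

```python
from math import sqrt
from statistics import NormalDist


def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))


sigma, n, N = 15, 25, 2000
mu, x_bar = 368, 365

# Standard error of the mean, with and without the correction.
se_plain = sigma / sqrt(n)
se_fpc = se_plain * fpc(N, n)

# Probability that the sample mean falls below 365 grams.
z = (x_bar - mu) / se_fpc
p_below = NormalDist().cdf(z)

print(round(se_plain, 3), round(se_fpc, 3), round(z, 2))  # 3.0 2.982 -1.01
```

With the unrounded Z the area is roughly 0.157; the table value of 0.1562 corresponds to Z rounded to −1.01.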

4.3 Assignment for Simple Random Sample
Problems for Section 7.1: Number 7.2, 7.4, and 7.8
Problems for Section 7.2: Number 7.10 and 7.14

4.4 Assignment for Sampling Distribution
Problems for Section 7.4: Number 7.18, 7.20, and 7.24

4.5 Assignment for the Sampling Distribution of the Mean
Problems for Section 7.5: Number 7.28 and 7.32

4.6 Assignment for Sampling from Finite Population
1. Given that N = 80 and n = 10 and the sample is selected without replacement,
   determine the fpc factor.
2. Historically, 93% of the deliveries of an overnight mail service arrive before 10:30 the
   following morning. If a random sample of 500 deliveries is selected without replacement
   from a population that consisted of 10,000 deliveries, what is the probability that the
   sample will have:
   a. between 93% and 95% of the deliveries arriving before 10:30 the following morning?
   b. more than 95% of the deliveries arriving before 10:30 the following morning?

Practicum: Math11002 Business Statistics
MODULE 5

Submitted only on
Day/Date: ____________ / ______________
Time: 12.00-14.00 WIB
In ____________________

Date of Receipt:
Score:
Assistant Signature:

I hereby declare that I have strived to do this module by myself.
Name/NIM: ______________________________ / _______________
Signature: _______________________________________________
Rem.:

Module Description: CONFIDENCE INTERVAL ESTIMATION

Objective:
• To construct and interpret confidence interval estimates for the mean and the proportion
• How to determine the sample size necessary to develop a confidence interval for the mean or
  proportion
• How to use confidence interval estimates in auditing

Output:
Use separate papers to report your results (in handwriting or computer
print out). A report produced by the students should be in the form of
working procedures and results in both soft copy and hard copy.

Pre-Lab Read:
Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson
Education, Inc., Upper Saddle River, New Jersey, pages 322-326.

5 CONFIDENCE INTERVAL ESTIMATION
5.1 Confidence Intervals
5.1.1 A Point Estimate and a Confidence Interval Estimate

5.1.1.1 Point Estimates
A point estimate is a single number. For the population mean (and population standard
deviation), a point estimate is the sample mean (and sample standard deviation). A confidence
interval provides additional information about variability.

[Figure: point estimate and width of confidence interval]

5.1.1.2 Confidence Interval Estimates
Point Estimate ± (Critical Value)(Standard Error)

A confidence interval gives a range estimate of values:

• Takes into consideration variation in sample statistics from sample to sample
• Based on all the observations from 1 sample
• Gives information about closeness to unknown population parameters
• Confidence level: the confidence with which the interval will contain the unknown population
  parameter. A percentage (less than 100%). Stated in terms of level of confidence.
  Ex. 95% confidence, 99% confidence

5.1.1.3 Confidence Level
Suppose the confidence level = 95%, also written $(1 - \alpha) = .95$. A relative frequency
interpretation: in the long run, 95% of all the confidence intervals that can be constructed will
contain the unknown true parameter. A specific interval either will contain or will not contain the
true parameter.
5.1.2 Confidence Interval for $\mu$ ($\sigma$ Known)
Assumptions:
  o Population standard deviation $\sigma$ is known
  o Population is normally distributed
  o If the population is not normal, use a large sample

Confidence interval estimate:

\[ \bar{X} \pm Z \frac{\sigma}{\sqrt{n}} \]

where Z is the standardized normal distribution critical value for a probability of $\alpha/2$ in each tail.

5.1.2.1 Finding the Critical Value, Z
Consider a 95% confidence interval: $1 - \alpha = .95$, so $\alpha = .05$ and $\alpha/2 = .025$ in each tail,
which gives $Z = \pm 1.96$.

Commonly used confidence levels are 90%, 95%, and 99%.

Confidence Level    Confidence Coefficient    Z value
80%                 .80                       1.28
90%                 .90                       1.645
95%                 .95                       1.96
98%                 .98                       2.33
99%                 .99                       2.58
99.8%               .998                      3.09
99.9%               .999                      3.29
5.1.2.2 Intervals and Level of Confidence

Example:
A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We
know from past testing that the population standard deviation is .35 ohms. Determine a 95%
and a 99% confidence interval for the true mean resistance of the population.

Solution:
95% CI

\[ \bar{X} \pm Z \frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(.35/\sqrt{11}) = 2.20 \pm .2068 = (1.9932,\ 2.4068) \]

We are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms.
Although the true mean may or may not be in this interval, 95% of intervals formed in this
manner will contain the true mean.

99% CI

\[ \bar{X} \pm Z \frac{\sigma}{\sqrt{n}} = 2.20 \pm 2.58\,(.35/\sqrt{11}) = 2.20 \pm 0.2723 = (1.9277,\ 2.4723) \]

We are 99% confident that the true mean resistance is between 1.9277 and 2.4723 ohms.
Although the true mean may or may not be in this interval, 99% of intervals formed in this
manner will contain the true mean.
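The interval for a known population standard deviation can be sketched as a small function. The name `z_interval` is our own; the critical value is taken from `statistics.NormalDist().inv_cdf` instead of a printed table, so the 99% limits differ in the fourth decimal from those obtained with the rounded value 2.58.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, alpha/2 in each tail
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Circuit example: n = 11, sample mean 2.20 ohms, sigma = .35 ohms.
low95, high95 = z_interval(2.20, 0.35, 11, 0.95)
low99, high99 = z_interval(2.20, 0.35, 11, 0.99)

print(round(low95, 4), round(high95, 4))  # 1.9932 2.4068
print(round(low99, 4), round(high99, 4))
```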

5.1.3 Confidence Interval for $\mu$ ($\sigma$ Unknown)
If the population standard deviation $\sigma$ is unknown, we can substitute the sample standard
deviation, S. This introduces extra uncertainty, since S varies from sample to sample. So we
use the t distribution instead of the normal distribution.
Assumptions:
  o Population standard deviation is unknown
  o Population is normally distributed
  o If the population is not normal, use a large sample
  o Use Student's t distribution

Confidence interval estimate:

\[ \bar{X} \pm t_{n-1} \frac{S}{\sqrt{n}} \]

where t is the critical value of the t distribution with n − 1 d.f. and an area of $\alpha/2$ in each tail.
The t value depends on the degrees of freedom (d.f.), the number of observations that are free to vary
after the sample mean has been calculated: d.f. = n − 1

5.1.3.1 Student's t Distribution

As n increases, $t \to Z$.

5.1.3.2 Student's t Table

5.1.3.3 Confidence Interval for $\mu$ ($\sigma$ Unknown) Example
Example 1
A random sample of n = 25 has $\bar{X}$ = 50 and S = 8. Form a 95% confidence interval for $\mu$.
Solution: d.f. = n − 1 = 24, so the confidence interval is

\[ \bar{X} \pm t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} = 50 \pm (2.0639) \frac{8}{\sqrt{25}} = (46.698,\ 53.302) \]

Form a 95% confidence interval for the mean using Ms Excel:

Row   A                              B          Formula in B
3     Data
4     Sample Standard Deviation      8
5     Sample Mean                    50
6     Sample Size                    25
7     Confidence Level               95%
9     Intermediate Calculations
10    Standard Error of the Mean     1.6000     =B4/SQRT(B6)
11    Degrees of Freedom             24         =B6-1
12    t Value                        2.0639     =TINV(1-B7,B11)
13    Interval Half Width            3.3022     =B12*B10
15    Confidence Interval
16    Interval Lower Limit           46.6978    =B5-B13
17    Interval Upper Limit           53.3022    =B5+B13

Example 2:
Construct a 95% confidence interval estimate for the population mean force required to break the
insulator:

Force Required to Break Electric Insulators (in pounds)
1870   1728   1656   1610   1634   1784   1522   1696   1592   1662
1866   1764   1734   1662   1734   1774   1550   1756   1762   1866
1820   1744   1788   1688   1810   1752   1680   1810   1652   1736

Solution:
Put the data in the range F2 to O4.

Estimate for the Mean Amount of Force Required

Row   A                              B          Formula in B
3     Data
4     Sample Standard Deviation      89.5508    =STDEV(F2:O4)
5     Sample Mean                    1723.4     =AVERAGE(F2:O4)
6     Sample Size                    30         =COUNT(F2:O4)
7     Confidence Level               95%
9     Intermediate Calculations
10    Standard Error of the Mean     16.3497    =B4/SQRT(B6)
11    Degrees of Freedom             29         =B6-1
12    t Value                        2.0452     =TINV(1-B7,B11)
13    Interval Half Width            33.4388    =B12*B10
15    Confidence Interval
16    Interval Lower Limit           1689.96    =B5-B13
17    Interval Upper Limit           1756.84    =B5+B13

We can conclude with 95% confidence that the mean breaking force required for the population of
insulators is between 1689.96 and 1756.84 pounds. The validity of this confidence interval estimate
depends on the assumption that the force required is normally distributed. If the sample size is large,
then we can slightly loosen this assumption. Thus, with a sample of 30, we can use the t distribution
even if the distribution is slightly left-skewed (see the probability plot or box-and-whisker plot). Thus,
the t distribution is appropriate for the data.
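The worksheet steps for the insulator data can be mirrored outside Excel. The sketch below uses only the standard library, which has no t distribution, so the critical value 2.0452 (29 degrees of freedom, 95% confidence) is taken from the worksheet as a given input rather than computed; the function name `t_interval` is our own.

```python
from math import sqrt
from statistics import mean, stdev

# Force required to break electric insulators (in pounds), 30 observations.
force = [
    1870, 1728, 1656, 1610, 1634, 1784, 1522, 1696, 1592, 1662,
    1866, 1764, 1734, 1662, 1734, 1774, 1550, 1756, 1762, 1866,
    1820, 1744, 1788, 1688, 1810, 1752, 1680, 1810, 1652, 1736,
]


def t_interval(values, t_critical):
    """Confidence interval for the mean when sigma is unknown.

    t_critical must be looked up for len(values) - 1 degrees of freedom.
    """
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)                 # sample standard deviation (like STDEV)
    half_width = t_critical * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


lower, upper = t_interval(force, t_critical=2.0452)
print(round(mean(force), 1), round(stdev(force), 4))  # 1723.4 89.5508
print(round(lower, 2), round(upper, 2))               # 1689.96 1756.84
```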

[Figure: normal probability plot (Force against Z Value) and box-and-whisker plot of the Force
Required to Break Electrical Insulators]

5.2 Confidence Interval Estimate for a Single Population Proportion
An interval estimate for the population proportion ($\pi$) can be calculated by adding an allowance
for uncertainty to the sample proportion (p).
Recall that the distribution of the sample proportion is approximately normal if the sample size is
large, with standard deviation:

\[ \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} \]

We will estimate this with sample data:

\[ \sqrt{\frac{p(1-p)}{n}} \]

Upper and lower confidence limits for the population proportion are calculated with the formula:

\[ p \pm Z \sqrt{\frac{p(1-p)}{n}} \]

where:
Z is the standardized normal value for the level of confidence desired
p is the sample proportion
n is the sample size

5.2.1 Example for Confidence Intervals for the Population Proportion
A random sample of 100 people shows that 25 have opened IRAs this year. Form a 95%
confidence interval for the true proportion of the population who have opened IRAs.

\[ p \pm Z \sqrt{p(1-p)/n} = 25/100 \pm 1.96 \sqrt{.25(.75)/100} = .25 \pm 1.96\,(.0433) = (0.1651,\ 0.3349) \]

Solving the Confidence Interval for the Population Proportion using Ms Excel

Proportion of In-Error Sales Invoices

Cell   A                                  B          Formula in B
       Data
B4     Sample Size                        100
B5     Number of Successes                10
B6     Confidence Level                   95%
       Intermediate Calculations
B9     Sample Proportion                  0.1        =B5/B4
B10    Z Value                            -1.9600    =NORMSINV((1-B6)/2)
B11    Standard Error of the Proportion   0.03       =SQRT(B9*(1-B9)/B4)
B12    Interval Half Width                0.0588     =ABS(B10*B11)
       Confidence Interval
       Interval Lower Limit               0.0412     =B9-B12
       Interval Upper Limit               0.1588     =B9+B12

5.3 Determining Sample Size
The required sample size can be found to reach a desired margin of error (e) with a specified level
of confidence (1 - α). The margin of error, also called sampling error, is the amount of
imprecision in the estimate of the population parameter; it is the amount added to and subtracted
from the point estimate to form the confidence interval.
To determine the required sample size for the mean, you must know the desired level of
confidence (1 - α), which determines the critical Z value; the acceptable sampling error (margin of
error), e; and the standard deviation, σ.
The formula: $n = \frac{Z^2\sigma^2}{e^2}$

5.3.1 If Population Standard Deviation (σ) Known
If σ = 45, what sample size is needed to estimate the mean within ±5 with 90% confidence?

Solution: $n = \frac{Z^2\sigma^2}{e^2} = \frac{(1.645)^2(45)^2}{5^2} = 219.19$

The required sample size is n = 220 (always round up).
Using Ms Excel:

5.3.2 If Population Standard Deviation (σ) Unknown
If σ is unknown, it can be estimated when using the required sample size formula: either use a value
for σ that is expected to be at least as large as the true σ, or select a pilot sample and
estimate σ with the sample standard deviation, S.

5.3.3 To Determine the Required Sample Size for the Proportion
To determine the required sample size for the proportion, you must know:
o The desired level of confidence (1 - α), which determines the critical Z value
o The acceptable sampling error (margin of error), e
o The true proportion of "successes", π
o π can be estimated with a pilot sample, if necessary (or conservatively use π = .50)

$n = \frac{Z^2\pi(1-\pi)}{e^2}$

Example: How large a sample would be necessary to estimate the true proportion defective in a large
population within ±3%, with 95% confidence? (Assume a pilot sample yields p = .12)
Solution: For 95% confidence, use Z = 1.96, e = .03 and p = .12, so use this to estimate π:

$n = \frac{Z^2\pi(1-\pi)}{e^2} = \frac{(1.96)^2(.12)(1-.12)}{(.03)^2} = 450.74$, so use n = 451 samples

Using Ms Excel:
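The two sample-size formulas of this section can also be sketched in Python. The function names below are hypothetical; the Z values are the table values used in the worked examples, and the result is always rounded up to the next whole observation.

```python
from math import ceil


def sample_size_mean(z, sigma, e):
    """n = Z^2 * sigma^2 / e^2, rounded up."""
    return ceil(z ** 2 * sigma ** 2 / e ** 2)


def sample_size_proportion(z, pi, e):
    """n = Z^2 * pi * (1 - pi) / e^2, rounded up."""
    return ceil(z ** 2 * pi * (1 - pi) / e ** 2)


n_mean = sample_size_mean(1.645, 45, 5)            # sigma = 45, e = 5, 90% confidence
n_prop = sample_size_proportion(1.96, 0.12, 0.03)  # p = .12, e = .03, 95% confidence
print(n_mean, n_prop)
```

Using π = .50 instead of the pilot estimate gives the most conservative (largest) sample size for a proportion.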

5.4 Assignment 5

Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson
Education, Inc., Upper Saddle River, New Jersey. Chapter 8 Problems, pages 283-319.
1. Problems for Section 8.1: No. 8.2, 8.4, 8.8
2. Problems for Section 8.2: No. 8.12, 8.14, 8.18, 8.22
3. Problems for Section 8.3: No. 8.24, 8.28, 8.32
4. Problems for Section 8.4: No. 8.36, 8.40, 8.46

Practicum: Math11002 Business Statistics
MODULE 6

Submitted only on
Day/Date: ____________/______________
Time: 12.00-14.00 WIB
In ____________________

I herewith state that I have strived to do all of this module by myself.
Name/NIM: ______________________________/_______________
Signature: _______________________________________________

Date of Receipt:
Score:
Assistant Signature:
Rem.:

Module Description: HYPOTHESIS TESTING AND TWO-SAMPLE TEST

Objective: The basic principles of hypothesis testing. How to use hypothesis testing to
test a mean or proportion. The assumptions of each hypothesis-testing
procedure, how to evaluate them, and the consequences if they are
violated. Formulate a decision rule for testing a hypothesis. Know Type I
and Type II errors. Use hypothesis testing for comparing the
difference between: the means of two independent populations, the
means of two related populations, the proportions of two independent
populations, and the variances of two independent populations.

Output: Use separate papers to report your results (in handwriting or computer
print out). A report produced by the students should be in the form of
working procedures and results in both soft copy and hard copy.

Pre-Lab Read:
Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson
Education, Inc., Upper Saddle River, New Jersey. Chapter 9 and Excel Companion to Chapter 9.
Pages 328-367 and 369-420.

6 HYPOTHESIS TESTING AND TWO-SAMPLE TEST
6.1 Hypothesis Testing

A hypothesis is a claim (assumption) about a population parameter:
  Population mean. Example: The mean monthly cell phone bill of this city is μ = $52.
  Population proportion. Example: The proportion of adults in this city with cell phones is
  π = .68

6.1.1 The Null Hypothesis, H0
States the assumption (numerical) to be tested.
Example: The mean number of TV sets in U.S. homes is equal to three. H0: μ = 3
o Is always about a population parameter, not about a sample statistic.
o Begin with the assumption that the null hypothesis is true.
o It refers to the status quo
o Always contains the "=", "≤" or "≥" sign
o May or may not be rejected

6.1.2 The Alternative Hypothesis, H1
o Is the opposite of the null hypothesis. Ex: The mean number of TV sets in U.S. homes is not
  equal to 3 (H1: μ ≠ 3)
o Challenges the status quo
o Never contains the "=", "≤" or "≥" sign
o May or may not be proven
o Is generally the hypothesis that the researcher is trying to prove

6.1.3 The Hypothesis Testing Process
Claim: The population mean age is 50.
o H0: μ = 50, H1: μ ≠ 50
o Sample the population and find the sample mean.

[Figure: population and sample]

o Suppose the sample mean age was X̄ = 20.
o This is significantly lower than the claimed mean population age of 50.
o If the null hypothesis were true, the probability of getting such a different sample mean
  would be very small, so you reject the null hypothesis.
o In other words, getting a sample mean of 20 is so unlikely if the population mean was 50 that you
  conclude that the population mean must not be 50.

6.1.4 The Test Statistic and Critical Values
  If the sample mean is close to the assumed population mean, the null hypothesis is not
  rejected.
  If the sample mean is far from the assumed population mean, the null hypothesis is rejected.
  How far is "far enough" to reject H0?
  The critical value of a test statistic creates a "line in the sand" for decision making.

6.1.5 Errors in Decision Making

6.1.5.1 Type I Error
o Reject a true null hypothesis
o Considered a serious type of error
o The probability of a Type I Error is α
    Called level of significance of the test
    Set by researcher in advance
6.1.5.2 Type II Error
o Failure to reject a false null hypothesis
o The probability of a Type II Error is β

Possible Hypothesis Test Outcomes

                      Actual Situation
Decision              H0 True                          H0 False
Do Not Reject H0      No Error, Probability 1 - α      Type II Error, Probability β
Reject H0             Type I Error, Probability α      No Error, Probability 1 - β

6.1.6 Level of Significance, α
For example, Claim: The population mean age is 50.

6.1.7 Hypothesis Testing: σ Known
For a two-tail test for the mean, σ known:
  Convert the sample statistic (X̄) to a test statistic: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
  Determine the critical Z values for a specified
  level of significance α from a table or by using Excel
  Decision Rule: If the test statistic falls in the rejection region, reject H0; otherwise do
  not reject H0

Example: Test the claim that the true mean weight of chocolate bars manufactured in a factory is 3
ounces.
Solution:

  State the appropriate null and alternative hypotheses: H0: μ = 3, H1: μ ≠ 3 (This is a two-tailed
  test)
  Specify the desired level of significance: Suppose that α = .05 is chosen for this test
  Choose a sample size: Suppose a sample of size n = 100 is selected
  Determine the appropriate technique
  o σ is known so this is a Z test
  Set up the critical values
  o For α = .05 the critical Z values are ±1.96
  Collect the data and compute the test statistic
  o Suppose the sample results are n = 100, X̄ = 2.84 (σ = 0.8 is assumed known from past
    company records)
  So the test statistic is:

  $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{2.84 - 3}{0.8/\sqrt{100}} = \frac{-.16}{.08} = -2.0$

  Since Z = -2.0 < -1.96, you reject the null hypothesis and conclude that there is sufficient
  evidence that the mean weight of chocolate bars is not equal to 3.

6.1.8 6 Steps of Hypothesis Testing:
1. State the null hypothesis, H0, and state the alternative hypothesis, H1
2. Choose the level of significance, α, and the sample size n.
3. Determine the appropriate statistical technique and the test statistic to use
4. Find the critical values and determine the rejection region(s)
5. Collect data and compute the test statistic from the sample result
6. Compare the test statistic to the critical value to determine whether the test statistic falls in
   the region of rejection. Make the statistical decision: Reject H0 if the test statistic falls in the
   rejection region. Express the decision in the context of the problem.

See Example 9.2 and 9.3
Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson
Education, Inc., Upper Saddle River, New Jersey, pages 336 and 337

6.1.9 Hypothesis Testing: σ Known, p-Value Approach
The p-value is the probability of obtaining a test statistic equal to or more extreme (< or >)
than the observed sample value given H0 is true. It is also called the observed level of significance:
the smallest value of α for which H0 can be rejected.

  Convert the sample statistic (ex. X̄) to a test statistic (ex. Z statistic)
  Obtain the p-value from a table or by using Excel
  Compare the p-value with α
  If p-value < α, reject H0
  If p-value ≥ α, do not reject H0

Example:
6.1.9.1 Manual Calculation
How likely is it to see a sample mean of 2.84 (or something further from the mean, in either
direction) if the true mean is μ = 3.0? Suppose the sample results are n = 100, and σ = 0.8 is assumed
known. The test statistic is Z = -2.0, so P(Z < -2.0) = .0228 and P(Z > 2.0) = .0228, and the
two-tail p-value is .0228 + .0228 = .0456.

Compare the p-value with α:
  If p-value < α, reject H0
  If p-value ≥ α, do not reject H0
Here: p-value = .0456 and α = .05. Since .0456 < .05, you reject the null hypothesis.
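The critical-value and p-value approaches for the chocolate bar example can be compared in a few lines of Python. This is a sketch, not part of the module's Excel workflow; the function name `z_test_two_tail` is an assumption, and the normal probabilities come from `statistics.NormalDist` instead of a printed table, so the p-value differs from the table-based .0456 in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tail(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-tail Z test for the mean when sigma is known."""
    z = (x_bar - mu0) / (sigma / sqrt(n))           # test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # upper critical Z value
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # area in both tails
    return z, critical, p_value


z, critical, p_value = z_test_two_tail(2.84, 3.0, 0.8, 100)
reject = abs(z) > critical  # same decision as p_value < alpha
print(f"Z = {z:.2f}, critical = ±{critical:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Both rules lead to the same decision because the p-value is below α exactly when |Z| exceeds the critical value.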

6.1.9.2 Using Ms Excel:

6.1.10 Hypothesis Testing: σ Known, Confidence Interval Connections
For X̄ = 2.84, σ = 0.8 and n = 100, the 95% confidence interval is:

$2.84 - (1.96)\frac{0.8}{\sqrt{100}}$ to $2.84 + (1.96)\frac{0.8}{\sqrt{100}}$

2.6832 ≤ μ ≤ 2.9968
Since this interval does not contain the hypothesized mean (3.0), you reject the null
hypothesis at α = .05

6.1.11 One-Tail Tests
In many cases, the alternative hypothesis focuses on a particular direction.
  H0: μ ≥ 3, H1: μ < 3. This is a lower-tail test since the alternative hypothesis is focused on the
  lower tail below the mean of 3
  H0: μ ≤ 3, H1: μ > 3. This is an upper-tail test since the alternative hypothesis is focused on
  the upper tail above the mean of 3
Example:
A phone industry manager thinks that customer monthly cell phone bills have increased, and
now average more than $52 per month. The company wishes to test this claim. Past company
records indicate that the standard deviation is about $10.
Form the hypothesis test:
H0: μ ≤ 52 the mean is less than or equal to $52 per month
H1: μ > 52 the mean is greater than $52 per month (i.e., sufficient evidence exists to support
the manager's claim)
Suppose that α = .10 is chosen for this test.
Find the rejection region: What is Z given α = 0.10? For an upper-tail test, reject H0 if Z > 1.28.

Suppose a sample is taken with the following results: n = 64, X̄ = 53.1 (σ = 10 was assumed
known from past company records)

Then the test statistic is:

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{53.1 - 52}{10/\sqrt{64}} = 0.88$

Do not reject H0 since Z = 0.88 ≤ 1.28
i.e.: there is not sufficient evidence that the mean bill is greater than $52
Calculate the p-value and compare to α: p-value = P(Z ≥ 0.88) = 1 - .8106 = .1894. Since
.1894 > .10, do not reject H0.

Microsoft Excel Z test Results
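For readers without the Excel output at hand, the upper-tail test for the cell phone bill example can be sketched in Python. The function name `z_test_upper_tail` is hypothetical; the standard library's `NormalDist` supplies the critical value and tail area.

```python
from math import sqrt
from statistics import NormalDist


def z_test_upper_tail(x_bar, mu0, sigma, n, alpha):
    """Upper-tail Z test (H1: mu > mu0) when sigma is known."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha)  # all of alpha sits in the upper tail
    p_value = 1 - NormalDist().cdf(z)           # area to the right of Z
    return z, critical, p_value


z, critical, p_value = z_test_upper_tail(53.1, 52, 10, 64, alpha=0.10)
reject = z > critical
print(f"Z = {z:.2f}, critical = {critical:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

In a one-tail test the whole of α is placed in one tail, which is why the critical value is 1.28 rather than the two-tail 1.645 at the same α = .10.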

6.1.12 Hypothesis Testing: σ Unknown
If the population standard deviation is unknown, you instead use the sample standard deviation
S. Because of this change, you use the t distribution instead of the Z distribution to test the null
hypothesis about the mean. All other steps, concepts, and conclusions are the same.
The t test statistic with n - 1 degrees of freedom is:

$t_{n-1} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$

Example: The mean cost of a hotel room in New York is said to be $168 per night. A random
sample of 25 hotels resulted in X̄ = $172.50 and S = 15.40. Test at the α = 0.05 level.
(A stem-and-leaf display and a normal probability plot indicate the data are approximately
normally distributed)
H0: μ = 168
H1: μ ≠ 168

$t_{n-1} = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{172.50 - 168}{15.40/\sqrt{25}} = 1.46$

With 24 degrees of freedom the critical values are ±2.0639, and -2.0639 < 1.46 < 2.0639.
Do not reject H0: not sufficient evidence that the true mean cost is different from $168
6.1.13 Hypothesis Testing: Connection to Confidence Intervals
For X̄ = 172.5, S = 15.40 and n = 25, the 95% confidence interval is:

$172.5 - (2.0639)\frac{15.4}{\sqrt{25}}$ to $172.5 + (2.0639)\frac{15.4}{\sqrt{25}}$

166.14 ≤ μ ≤ 178.86
Since this interval contains the hypothesized mean (168), you do not reject the null hypothesis
at α = .05
o Recall that you assume that the sample statistic comes from a random sample from a
  normal distribution.
o If the sample size is small (< 30), you should use a box-and-whisker plot or a normal
  probability plot to assess whether the assumption of normality is valid.
o If the sample size is large, the central limit theorem applies and the sampling
  distribution of the mean will be approximately normal.

Microsoft Excel Results
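The hotel example can be reproduced with a small Python sketch. Python's standard library has no t distribution, so the critical value 2.0639 (24 degrees of freedom, α = .05, two-tail) is simply copied from the text above; the variable and function names are illustrative assumptions.

```python
from math import sqrt

T_CRITICAL = 2.0639  # t table value for 24 df, two-tail alpha = .05 (taken from the text)


def t_statistic(x_bar, mu0, s, n):
    """t test statistic for the mean when sigma is unknown."""
    return (x_bar - mu0) / (s / sqrt(n))


x_bar, s, n = 172.50, 15.40, 25
t = t_statistic(x_bar, 168, s, n)
reject = abs(t) > T_CRITICAL

# Matching 95% confidence interval: x_bar ± t * S / sqrt(n)
half_width = T_CRITICAL * s / sqrt(n)
interval = (x_bar - half_width, x_bar + half_width)
print(f"t = {t:.2f}, reject H0: {reject}, interval = {interval[0]:.2f} to {interval[1]:.2f}")
```

The test and the interval agree: 168 lies inside the interval, and |t| is below the critical value.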

6.1.14 Hypothesis Testing: Proportion
Involves categorical variables. Two possible outcomes, that is, "Success" (possesses a certain
characteristic) and "Failure" (does not possess that characteristic). The fraction or proportion of
the population in the "success" category is denoted by π.
The sample proportion in the success category is denoted by p:

$p = \frac{X}{n} = \frac{\text{number of successes in sample}}{\text{sample size}}$

When both nπ and n(1 - π) are at least 5, p can be approximated by a normal distribution with
mean and standard deviation

$\mu_p = \pi$ and $\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$

The sampling distribution of p is approximately normal, so the test statistic is a Z value:

$Z = \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}}$

Example: A marketing company claims that it receives 8% responses from its mailing. To test
this claim, a random sample of 500 were surveyed with 30 responses. Test at the α = .05
significance level.
Solution:
nπ = (500)(.08) = 40

n(1 - π) = (500)(.92) = 460
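The remaining arithmetic of the mailing example can be sketched in Python by plugging the given numbers into the Z formula for a proportion. The hypotheses H0: π = .08 against H1: π ≠ .08 are assumed here (a two-tail reading of "test this claim"), and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, pi0, alpha=0.05):
    """Two-tail Z test for a proportion (assumes n*pi0 and n*(1-pi0) are at least 5)."""
    p = successes / n
    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, critical, p_value


z, critical, p_value = z_test_proportion(30, 500, 0.08)
reject = abs(z) > critical
print(f"Z = {z:.2f}, critical = ±{critical:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Under these assumptions the statistic falls inside the ±1.96 limits, so the sample does not contradict the claimed 8% response rate at α = .05.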

6.2 Assignment 6.1
1. Problems for Section 9.1: No. 9.1 through 9.5, 9.14, 9.18
2. Problems for Section 9.2: No. 9.20, 9.24, 9.30, 9.32
3. Problems for Section 9.3: No. 9.36, 9.44, 9.46
4. Problems for Section 9.4: No. 9.50, 9.54, 9.56, 9.62
5. Problems for Section 9.5: No. 9.68, 9.70, 9.74

6.3 Two-Sample Tests
Goal: Test a hypothesis or form a confidence interval for the difference between two population
means, μ1 - μ2
The point estimate for the difference is the difference between the sample means: X̄1 - X̄2
Different data sources:
  Independent: the sample selected from one population has no effect on the sample selected
  from the other population
  Use the difference between 2 sample means
  Use a Z test, pooled-variance t test, or separate-variance t test
Independent Population Means:
1. σ1 and σ2 known: Use a Z test statistic
   Assumptions: Samples are randomly and independently drawn and population
   distributions are normal
   When σ1 and σ2 are known and both populations are normal, the test statistic is
   a Z value and the standard error of X̄1 - X̄2 is

   $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ and $Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

[Figure: Two Independent Populations, Comparing Means]

2. σ1 and σ2 unknown: Use S to estimate the unknown σ, use a t test statistic
   Assumptions: Samples are randomly and independently drawn, populations
   are normally distributed, and population variances are unknown but assumed
   equal
   Forming interval estimates: The population variances are assumed equal, so
   use the two sample variances and pool them to estimate the common σ². The test
   statistic is a t value with (n1 + n2 - 2) degrees of freedom:

   $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$

   $t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

6.3.1 Two-Sample Tests, Independent Populations
You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between
stocks listed on the NYSE & NASDAQ? You collect the following data:

                   NYSE     NASDAQ
Number             21       25
Sample mean        3.27     2.53
Sample std dev     1.30     1.16

Assuming both populations are approximately normal with equal variances, is there a difference
in average yield (α = 0.05)?

H0: μ1 - μ2 = 0 i.e. (μ1 = μ2)
H1: μ1 - μ2 ≠ 0 i.e. (μ1 ≠ μ2)
α = 0.05
df = 21 + 25 - 2 = 44
Critical Values: t = ±2.0154

The pooled variance is:

$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1)1.30^2 + (25 - 1)1.16^2}{(21 - 1) + (25 - 1)} = 1.5021$

The test statistic is:

$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021\left(\frac{1}{21} + \frac{1}{25}\right)}} = 2.040$

Test Statistic: 2.040
Decision: Reject H0 at α = 0.05
Conclusion: There is evidence of a difference in the means
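The pooled-variance calculation for the NYSE and NASDAQ example can be sketched in Python as follows. The function name `pooled_t` is an assumption, and the critical value 2.0154 is the table value quoted above for 44 degrees of freedom, since the standard library has no t distribution.

```python
from math import sqrt

T_CRITICAL = 2.0154  # t table value for 44 df, two-tail alpha = .05 (taken from the text)


def pooled_t(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Pooled-variance t statistic for two independent samples with equal variances."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / ((n1 - 1) + (n2 - 1))
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - hypothesized_diff) / std_error
    return pooled_var, t


pooled_var, t = pooled_t(3.27, 1.30, 21, 2.53, 1.16, 25)
reject = abs(t) > T_CRITICAL
print(f"Sp^2 = {pooled_var:.4f}, t = {t:.3f}, reject H0: {reject}")
```

The statistic only just exceeds the critical value, so the decision here is sensitive to the equal-variance assumption.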
6.3.2 Independent Populations, Unequal Variance
If you cannot assume population variances are equal, the pooled-variance t test is inappropriate.
Instead, use a separate-variance t test, which includes the two separate sample variances in the
computation of the test statistic. The computations are complicated and are best performed
using Excel.

The confidence interval for μ1 - μ2 (σ1 and σ2 known) is:

$(\bar{X}_1 - \bar{X}_2) \pm Z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Practicum: Math11002 Business Statistics
MODULE 7

Submitted only on
Day/Date: ____________/______________
Time: 12.00-14.00 WIB
In ____________________

I herewith state that I have strived to do all of this module by myself.
Name/NIM: ______________________________/_______________
Signature: _______________________________________________

Date of Receipt:
Score:
Assistant Signature:
Rem.:

Module Description: ANOVA AND CHI-SQUARE AND NONPARAMETRIC TESTS

Objective: The basic concepts of experimental design. How to use the one-way
analysis of variance to test for the differences among the means of
several groups. How to use the two-way analysis of variance and interpret
the interaction. How and when to use the chi-square test for
contingency tables. How to use the Marascuilo procedure for determining
pairwise differences when evaluating more than two proportions. How
and when to use the McNemar test. How and when to use nonparametric
tests.

Output: Use separate papers to report your results (in handwriting or computer
print out). A report produced by the students should be in the form of
working procedures and results in both soft copy and hard copy.

Pre-Lab Read:
Levine, et al. 2008. Statistics for Managers Using Microsoft Excel. Fifth Edition. Pearson
Education, Inc., Upper Saddle River, New Jersey. Chapter 10 and Excel Companion to Chapter 10.
Pages 369-420

7 ANOVA AND CHI-SQUARE AND NONPARAMETRIC TESTS
ANOVA

General ANOVA Setting

Investigator controls one or more factors of interest
o Each factor contains two or more levels
o Levels can be numerical or categorical
o Different levels produce different groups
o Think of the groups as populations
Observe effects on the dependent variable: are the groups the same?
Experimental design: the plan used to collect the data

Completely Randomized Design

Experimental units (subjects) are assigned randomly to the different levels (groups); subjects are
assumed homogeneous
Only one factor or independent variable, with two or more levels (groups)
Analyzed by one-factor analysis of variance (one-way ANOVA)

7.1 One-Way Analysis of Variance
Evaluate the difference among the means of three or more groups
Examples: Accident rates for 1st, 2nd, and 3rd shift, or expected mileage for five brands of tires
Assumptions:
  Populations are normally distributed
  Populations have equal variances
  Samples are randomly and independently drawn

7.1.1 Hypotheses: One-Way ANOVA
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_c$
All population means are equal, i.e., no treatment effect (no variation in means among
groups)

$H_1$: Not all $\mu_j$ are equal
At least one population mean is different, i.e., there is a treatment (groups) effect. Does
not mean that all population means are different.

[Figure: All means are the same: the null hypothesis is true (no group effect)]
[Figure: At least one mean is different: the null hypothesis is NOT true (treatment effect is present)]

7.1.2 Partitioning the Variation
Total variation can be split into two parts: SST = SSA + SSW

SST = Total Variation = the aggregate dispersion of the individual data values around the
overall (grand) mean of all factor levels

$SST = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(X_{ij} - \bar{\bar{X}})^2$

$SST = (X_{11} - \bar{\bar{X}})^2 + (X_{21} - \bar{\bar{X}})^2 + \dots + (X_{n_c c} - \bar{\bar{X}})^2$

Where:
  SST = Total sum of squares
  c = number of groups
  $n_j$ = number of values in group j
  $X_{ij}$ = ith value from group j
  $\bar{\bar{X}}$ = grand mean (mean of all data values)

SSA = Among-Group Variation = dispersion between the factor sample means

$SSA = \sum_{j=1}^{c} n_j(\bar{X}_j - \bar{\bar{X}})^2$

$SSA = n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + \dots + n_c(\bar{X}_c - \bar{\bar{X}})^2$

Where:
  SSA = Sum of squares among groups
  c = number of groups
  $n_j$ = sample size from group j
  $\bar{X}_j$ = sample mean from group j
  $\bar{\bar{X}}$ = grand mean (mean of all data values)

SSW = Within-Group Variation = dispersion that exists among the data values within the
particular factor levels

$SSW = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(X_{ij} - \bar{X}_j)^2$

$SSW = (X_{11} - \bar{X}_1)^2 + (X_{21} - \bar{X}_1)^2 + \dots + (X_{n_c c} - \bar{X}_c)^2$

Where:
  SSW = Sum of squares within groups
  c = number of groups
  $n_j$ = sample size from group j
  $\bar{X}_j$ = sample mean from group j
  $X_{ij}$ = ith value in group j

7.1.3 Obtaining the Mean Squares

Mean Squares Among:   $MSA = \frac{SSA}{c - 1}$
Mean Squares Within:  $MSW = \frac{SSW}{n - c}$
Mean Squares Total:   $MST = \frac{SST}{n - 1}$

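7.1.4 One-Way ANOVA Table

Source of Variation    SS                 df       MS (Variance)        F ratio
Among Groups           SSA                c - 1    MSA = SSA/(c - 1)    F = MSA/MSW
Within Groups          SSW                n - c    MSW = SSW/(n - c)
Total                  SST = SSA + SSW    n - 1

c = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom

7.1.5 Test statistic

$F = \frac{MSA}{MSW}$

MSA is the mean squares among variances
MSW is the mean squares within variances
Degrees of freedom:
  df1 = c - 1 (c = number of groups)
  df2 = n - c (n = sum of all sample sizes)
The F statistic is the ratio of the among variance to the within variance
  The ratio must always be positive
  df1 = c - 1 will typically be small
  df2 = n - c will typically be large
Decision Rule: Reject H0 if F > F_U, otherwise do not reject H0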

7.1.6 Example
An experiment was conducted to determine whether any significant differences exist in the
strength of parachutes woven from synthetic fibers from four different suppliers (Supplier 1,
Supplier 2, Supplier 3, and Supplier 4)

                             Supplier 1   Supplier 2   Supplier 3   Supplier 4
                             18.5         26.3         20.6         25.4
                             24.0         25.3         25.2         19.9
                             17.2         24.0         20.8         22.6
                             19.9         21.2         24.7         17.5
                             18.0         24.5         22.9         20.4
Sample Mean                  19.52        24.26        22.84        21.16        =AVERAGE()
Sample Standard Deviation    2.69         1.92         2.13         2.98         =STDEV()

[Figure: Tensile Strength Scatter Diagram; Tensile Strength (0 to 30) plotted against Supplier]

To construct the ANOVA summary table, we compute the sample means in each group.
Then compute the grand mean by summing all 20 values and dividing by the total number of
values:

$\bar{\bar{X}} = \frac{438.9}{20} = 21.945$

Then compute the sums of squares:

$SSA = 5(19.52 - 21.945)^2 + 5(24.26 - 21.945)^2 + 5(22.84 - 21.945)^2 + 5(21.16 - 21.945)^2 = 63.2855$

$SSW = (18.5 - 19.52)^2 + \dots + (18 - 19.52)^2 + (26.3 - 24.26)^2 + \dots + (24.5 - 24.26)^2 + (20.6 - 22.84)^2 + \dots + (22.9 - 22.84)^2 + (25.4 - 21.16)^2 + \dots + (20.4 - 21.16)^2 = 97.5040$

$SST = (18.5 - 21.945)^2 + (24 - 21.945)^2 + \dots + (20.4 - 21.945)^2 = 160.7895$

Then compute the mean squares and the F statistic:

$MSA = \frac{63.2855}{4 - 1} = 21.0952$

$MSW = \frac{97.5040}{20 - 4} = 6.0940$

$F = \frac{21.0952}{6.0940} = 3.4616$

F_U from the F distribution table with 3 degrees of freedom in the numerator and 16 degrees of
freedom in the denominator at the 0.05 level of significance is 3.24.
Because the computed test statistic F = 3.4616 > F_U = 3.24, we reject the null hypothesis. We
conclude that there is a significant difference in the mean tensile strength among the
four suppliers.
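The sums of squares above can be verified with a short Python sketch that follows the partitioning formulas of section 7.1.2 directly. The function name `one_way_anova` and the dictionary keys are illustrative choices; the critical value 3.24 is the table value quoted in the text.

```python
def one_way_anova(groups):
    """Return SSA, SSW, SST, MSA, MSW and F for a list of groups of values."""
    values = [x for group in groups for x in group]
    n, c = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(group) / len(group) for group in groups]

    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    sst = sum((x - grand_mean) ** 2 for x in values)

    msa, msw = ssa / (c - 1), ssw / (n - c)
    return {"SSA": ssa, "SSW": ssw, "SST": sst, "MSA": msa, "MSW": msw, "F": msa / msw}


suppliers = [
    [18.5, 24.0, 17.2, 19.9, 18.0],  # Supplier 1
    [26.3, 25.3, 24.0, 21.2, 24.5],  # Supplier 2
    [20.6, 25.2, 20.8, 24.7, 22.9],  # Supplier 3
    [25.4, 19.9, 22.6, 17.5, 20.4],  # Supplier 4
]
result = one_way_anova(suppliers)
F_CRITICAL = 3.24  # F table value for 3 and 16 df at alpha = .05 (taken from the text)
reject = result["F"] > F_CRITICAL
print({name: round(value, 4) for name, value in result.items()}, reject)
```

Computing SST separately, rather than as SSA + SSW, gives a quick check that the partition adds up.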

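Using Ms Excel (Data > Data Analysis > Anova: Single Factor):

7.1.7 The Tukey-Kramer Procedure
First compute the absolute differences, $|\bar{X}_j - \bar{X}_{j'}|$ (where j ≠ j'), among all pairs of
group means. Then compute the critical range for the Tukey-Kramer procedure:

$\text{Critical range} = Q_U\sqrt{\frac{MSW}{2}\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)}$

where Q_U is the upper-tail critical value from a Studentized range distribution having c degrees
of freedom in the numerator and n - c degrees of freedom in the denominator.
where:
  Q_U = Value from the Studentized Range Distribution with c and n - c degrees of freedom
  for the desired level of α
  MSW = Mean Square Within
  n_j and n_j' = Sample sizes from groups j and j'
A pair of means is declared significantly different if its absolute difference exceeds the
critical range.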
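Applied to the parachute example, the procedure might look like the sketch below. The value Q_U = 4.05 is an assumption: it is the Studentized range table entry for c = 4 and n - c = 16 at α = .05 and does not appear in this module, so it should be checked against the table in the textbook. MSW and the group means are taken from section 7.1.6.

```python
from itertools import combinations
from math import sqrt

Q_U = 4.05    # ASSUMED Studentized range table value (c = 4, n - c = 16, alpha = .05)
MSW = 6.0940  # mean square within, from the ANOVA example
N_PER_GROUP = 5

means = {
    "Supplier 1": 19.52,
    "Supplier 2": 24.26,
    "Supplier 3": 22.84,
    "Supplier 4": 21.16,
}

# Equal group sizes, so one critical range applies to every pair
critical_range = Q_U * sqrt((MSW / 2) * (1 / N_PER_GROUP + 1 / N_PER_GROUP))

# Keep every pair whose absolute difference in means exceeds the critical range
significant = [
    (a, b) for a, b in combinations(means, 2)
    if abs(means[a] - means[b]) > critical_range
]
print(f"critical range = {critical_range:.4f}", significant)
```

With unequal group sizes the critical range would have to be recomputed for each pair, using that pair's n_j and n_j'.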

7.1.8 ANOVA Assumptions
  Randomness and Independence: Select random samples from the c groups (or randomly
  assign the levels)
  Normality: The sample values from each group are from a normal population
  Homogeneity of Variance: Can be tested with Levene's Test

Levene's Test
o Tests the assumption that the variances of each group are equal.
o First, define the null and alternative hypotheses:
    $H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_c^2$
    $H_1$: Not all $\sigma_j^2$ are equal
o Second, compute the absolute value of the difference between each value and
  the median of each group.
o Third, perform a one-way ANOVA on these absolute differences.

For the parachute data, F = 0.2068 < 3.2389 (or the p-value = 0.8902 > 0.05), thus we do not reject H0.
There is no evidence of a significant difference among the four variances. Therefore, the
homogeneity of variance assumption for the ANOVA procedure is justified.
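The three steps of Levene's test can be sketched in Python on the same parachute data. The helper names are hypothetical, and the critical value 3.2389 is the one quoted above; only the F statistic is computed, not the p-value, because the standard library has no F distribution.

```python
from statistics import median


def f_statistic(groups):
    """One-way ANOVA F = MSA / MSW for a list of groups."""
    values = [x for g in groups for x in g]
    n, c = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssa / (c - 1)) / (ssw / (n - c))


def levene_f(groups):
    """Levene's test: one-way ANOVA on absolute deviations from each group median."""
    abs_devs = [[abs(x - median(g)) for x in g] for g in groups]
    return f_statistic(abs_devs)


suppliers = [
    [18.5, 24.0, 17.2, 19.9, 18.0],
    [26.3, 25.3, 24.0, 21.2, 24.5],
    [20.6, 25.2, 20.8, 24.7, 22.9],
    [25.4, 19.9, 22.6, 17.5, 20.4],
]
F_CRITICAL = 3.2389  # F table value for 3 and 16 df at alpha = .05 (taken from the text)
levene = levene_f(suppliers)
reject = levene > F_CRITICAL
print(f"Levene F = {levene:.4f}, reject H0: {reject}")
```

Using the median rather than the mean as the centre makes the test less sensitive to outliers and skewness.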

7.2 Two-Way Analysis of Variance
Examines the effect of:
  Two factors of interest on the dependent variable
    e.g., percent carbonation and line speed on a soft drink bottling process
  Interaction between the different levels of these two factors
    e.g., Does the effect of one particular carbonation level depend on which
    level the line speed is set?
Assumptions:
  Populations are normally distributed
  Populations have equal variances
  Independent random samples are selected

7.2.1 Sources of Variation
SST = SSA + SSB + SSAB + SSE
Two factors of interest: A and B
  r = number of levels of factor A
  c = number of levels of factor B
  n′ = number of replications for each cell
  n = total number of observations in all cells (n = rcn′)
  X_ijk = value of the kth observation of level i of factor A and level j of factor B

7.2.2 Two-Way ANOVA: Features
Degrees of freedom always add up: n - 1 = rc(n′ - 1) + (r - 1) + (c - 1) + (r - 1)(c - 1)
  Total = error + factor A + factor B + interaction
The denominator of the F test is always the same but the numerator is different
The sums of squares always add up: SST = SSE + SSA + SSB + SSAB
  Total = error + factor A + factor B + interaction

7.2.3 Interaction

7.3 CHI-SQUARE AND NONPARAMETRIC TESTS
All of the inferential statistics we have covered in past lessons are what are called parametric
statistics. To use these statistics we make some assumptions about the distributions they come
from, such as that they are normally distributed. With parametric statistics we also deal with data for
the dependent variable that is at the interval or ratio level of measurement, e.g. test scores,
physical measurements.
The parametric statistics we have discussed so far in this course are:
1. the Z-score test
2. the Z test
3. the single-sample t test
4. the independent t test
5. the dependent t test
6. one-way analysis of variance (ANOVA)

We will now consider a widely used nonparametric test, chi-square, which we can use with data at
the nominal level, that is, data that is classificatory. For example, we know the frequency with which
entering freshmen, when required to purchase a computer for college use, select Macintosh
Computers, IBM Computers, or some other brand of computer. We want to know if there is a
difference among the frequencies with which these three brands of computers are selected or if
they choose basically equally among the three brands. This is a problem we can use the chi-square
statistic for.
The chi-square statistic is used to compare the observed frequency of some observation (such as
the frequency of buying different brands of computers) with an expected frequency (such as buying
equal numbers of each brand of computer). The comparison of observed and expected frequencies
is used to calculate the value of the chi-square statistic, which in turn can be compared with the
distribution of chi-square to make an inference about a statistical problem.
The symbol for chi-square and the formula are as follows:

$\chi^2 = \sum\frac{(O - E)^2}{E}$

where
O is the observed frequency, and
E is the expected frequency.
The degrees of freedom for the one-dimensional chi-square statistic is:
df = C - 1
where C is the number of categories or levels of the independent variable.
7.3.1 One-Variable Chi-Square (goodness-of-fit test) with equal expected frequencies
We can use the chi-square statistic to test the distribution of measures over levels of a variable to
indicate if the distribution of measures is the same for all levels. This is the first use of the
one-variable chi-square test. This test is also referred to as the goodness-of-fit test.
We use the example we already mentioned of the frequency with which entering freshmen, when
required to purchase a computer for college use, select Macintosh Computers, IBM Computers, or
some other brand of computer. We want to know if there is a significant difference among the
frequencies with which these three brands of computers are selected or if the students select
equally among the three brands.
The data for 100 students is recorded in the table below (the observed frequencies). We have also
indicated the expected frequency for each category. Since there are 100 measures or
observations and there are three categories (Macintosh, IBM, and Other) we would indicate the
expected frequency for each category to be 100/3 or 33.333. In the third column of the table we
have calculated the square of the observed frequency minus the expected frequency divided by
the expected frequency. The sum of the third column would be the value of the chi-square
statistic.

Frequency with which students select computer brand

Computer             Observed Frequency   Expected Frequency   (O-E)²/E
IBM                  47                   33.333               5.604
Macintosh            36                   33.333               0.213
Other                17                   33.333               8.003
Total (chi-square)                                             13.820

From the table we can see that:

$\chi^2 = 13.820$

The df = C - 1 = 3 - 1 = 2
We can compare the obtained value of chi-square with the critical value for the .05 level and
with degrees of freedom of 2 obtained from Appendix Table F (Distribution of Chi Square) on
page 331 of the text. Looking under the column for .05 and the row for df = 2 we see that the
critical value for chi-square is 5.991.
We now have the information we need to complete the six-step process for testing statistical
hypotheses for our research problem.
1. State the null hypothesis and the alternative hypothesis based on your research question.

Note: Our null hypothesis, for the chi-square test, states that there are no differences between
the observed and the expected frequencies. The alternate hypothesis states that there are
significant differences between the observed and expected frequencies.
2. Set the alpha level.

Note: As usual we will set our alpha level at .05; we have 5 chances in 100 of making a type I
error.
3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the
statistical test if necessary.

χ² = 13.820
df = C - 1 = 2
4. Write the decision rule for rejecting the null hypothesis.
Reject H0 if χ² >= 5.991.

Note: To write the decision rule we had to know the critical value for chi-square, with an alpha
level of .05, and 2 degrees of freedom. We can do this by looking at Appendix Table F and noting
the tabled value for the column for the .05 level and the row for 2 df.
5. Write a summary statement based on the decision.
Reject H0, p < .05
Note: Since our calculated value of χ² (13.820) is greater than 5.991, we reject the null
hypothesis and accept the alternative hypothesis.
6. Write a statement of results in standard English.
There is a significant difference among the frequencies with which students purchased three
different brands of computers.
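The table arithmetic for the equal-frequency case can be sketched in Python as follows. The function name `chi_square_gof` is hypothetical, and the critical value 5.991 is the tabled value quoted above for df = 2 at the .05 level.

```python
def chi_square_gof(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [47, 36, 17]                             # IBM, Macintosh, Other
expected = [sum(observed) / len(observed)] * len(observed)  # equal expected frequencies
chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1

CHI2_CRITICAL = 5.991  # chi-square table value for df = 2, alpha = .05 (from the text)
reject = chi2 >= CHI2_CRITICAL
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject}")
```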
7.3.2

OneVariableChiSquare(goodnessoffittest)withpredeterminedexpectedfrequencies
Let'slookattheproblemwejustsolved,inawaythatillustratestheotheruseofonevariable
chisquare,thatiswithpredeterminedexpectedfrequenciesratherthanwithequalfrequencies.
Wecouldformulatedourrevisedproblemasfollows:
Inanationalstudy,studentsrequiredtobuycomputersforcollegeuseboughtIBMcomputers
50%ofthetime,Macintoshcomputers25%ofthetime,andothercomputers25%ofthetime.
Of100enteringfreshmanwesurveyed36boughtMacintoshComputers,47boughtIBM
computers,and17boughtsomeotherbrandofcomputer.Wewanttoknowifthese
frequenciesofcomputerbuyingbehaviorissimilartoordifferentthanthenationalstudydata.

Thedatafor100studentsisrecordedinthetablebelow(theobservedfrequencies).Inthiscasethe
expectedfrequenciesarethosefromthenationalstudy.Togettheexpectedfrequencywetakethe
percentagesfromthenationalstudytimesthetotalnumberofsubjectsinthecurrentstudy.

ExpectedfrequencyforIBM=100X50%=50
ExpectedfrequencyforMacintosh=100X25%=25
ExpectedfrequencyforOther=100X25%=25

Theexpectedfrequenciesarerecordedinthesecondcolumnofthetable.Asbeforewehave
calculatedthesquareoftheobservedfrequencyminustheexpectedfrequencydividedbythe
expectedfrequencyandrecordedthisresultinthethirdcolumnofthetable.Thesumofthethird
columnwouldbethevalueofthechisquarestatistic.

ARDBUSINESSSTATISTICSSec.9

Page94of131


MANAGEMENT PROGRAM
MODELLING AND SIMULATION LABORATORY

Frequencywithwhichstudentsselectcomputerbrand
Computer

Observed
Frequency
IBM
47
Macintosh
36
Other
17
Total(chisquare)

Expected
Frequency
50
25
25

(OE)2/E
0.18
4.84
2.56
7.58

Fromthetablewecanseethat:

Thedf=C1=31=2
Wecancomparetheobtainedvalueofchisquarewiththecriticalvalueforthe.05leveland
withdegreeesoffreedomof2obtainedfromAppendixTableF(DistributionofChiSquare)on
page331ofthetext.Lookingunderthecolumnfor.05andtherowfordf=2weseethatthe
criticalvalueforchisquareis5.991.
Wenowhavetheinformationweneedtocompletethesixstepprocessfortestingstatistical
hypothesesforourresearchproblem.
1. State the null hypothesis and the alternative hypothesis based on your research question.

Note: Our null hypothesis, for the chi-square test, states that there are no differences between the observed and the expected frequencies. The alternate hypothesis states that there are significant differences between the observed and expected frequencies.
2. Set the alpha level.

Note: As usual we will set our alpha level at .05; that is, we have 5 chances in 100 of making a Type I error.
3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.
$\chi^2 = 7.58$
df = C - 1 = 2
4. Write the decision rule for rejecting the null hypothesis.
Reject H0 if $\chi^2 \geq 5.991$.


Note: To write the decision rule we had to know the critical value for chi-square, with an alpha level of .05 and 2 degrees of freedom. We can find this by looking at Appendix Table F and noting the tabled value for the column for the .05 level and the row for 2 df.
5. Write a summary statement based on the decision.
Reject H0, p < .05
Note: Since our calculated value of $\chi^2$ (7.58) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.
6. Write a statement of results in standard English.
There is a significant difference between the frequencies with which students purchased three different brands of computers and the proportions suggested by a national study.
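
The one-variable calculation above can be checked with a short script. The following is a minimal sketch in Python (standard library only) and is not part of the original module; all variable names are illustrative assumptions. It relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability is exp(-x/2), so the critical value is -2 ln(alpha). For other degrees of freedom a table or a statistics library would be needed.

```python
import math

# Observed counts from the 100 students and the national-study proportions
observed = {"IBM": 47, "Macintosh": 36, "Other": 17}
proportions = {"IBM": 0.50, "Macintosh": 0.25, "Other": 0.25}

n = sum(observed.values())  # total number of subjects (100)
expected = {brand: n * p for brand, p in proportions.items()}

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi_square = sum(
    (observed[b] - expected[b]) ** 2 / expected[b] for b in observed
)
df = len(observed) - 1  # C - 1

# Closed forms valid ONLY for df = 2: P(X > x) = exp(-x / 2)
alpha = 0.05
critical_value = -2 * math.log(alpha)
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.2f}, df = {df}")
print(f"critical value = {critical_value:.3f}, p = {p_value:.4f}")
print("Reject H0" if chi_square >= critical_value else "Fail to reject H0")
```

The script applies the same decision rule as step 4: the null hypothesis is rejected when the statistic is at least as large as the critical value.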
7.3.3 Two-Variable Chi-Square (test of independence)

Now let us consider the case of the two-variable chi-square test, also known as the test of independence.
For example, we may wish to know if there is a significant difference in the frequencies with which males come from small, medium, or large cities as contrasted with females. The two variables we are considering here are hometown size (small, medium, or large) and sex (male or female). Another way of putting our research question is: Is gender independent of size of hometown?
The data for 30 females and 6 males is in the following table.
Frequency with which males and females come from small, medium, and large cities

          Small   Medium   Large   Totals
Female    10      14       6       30
Male      4       1        1       6
Totals    14      15       7       36

The formula for chi-square is the same as before:

$\chi^2 = \sum \frac{(O - E)^2}{E}$

where
O is the observed frequency, and
E is the expected frequency.
The degrees of freedom for the two-dimensional chi-square statistic is:

df = (C - 1)(R - 1)
where C is the number of columns or levels of the first variable and R is the number of rows or levels of the second variable.
In the table above we have the observed frequencies (six of them). Now we must calculate the expected frequency for each of the six cells. For two-variable chi-square we find the expected frequencies with the formula:
Expected Frequency for a Cell = (Column Total × Row Total) / Grand Total
In the table above we can see that the Column Totals are 14 (small), 15 (medium), and 7 (large), while the Row Totals are 30 (female) and 6 (male). The grand total is 36.
Using the formula we can thus find the expected frequency for each cell.
1. The expected frequency for the small female cell is 14 × 30 / 36 = 11.667
2. The expected frequency for the medium female cell is 15 × 30 / 36 = 12.500
3. The expected frequency for the large female cell is 7 × 30 / 36 = 5.833
4. The expected frequency for the small male cell is 14 × 6 / 36 = 2.333
5. The expected frequency for the medium male cell is 15 × 6 / 36 = 2.500
6. The expected frequency for the large male cell is 7 × 6 / 36 = 1.167

We can put these expected frequencies in our table and also include the values for (O-E)^2/E. The sum of all these will of course be the value of chi-square.

Observed frequencies, expected frequencies, and (O-E)^2/E for males and females from small, medium, and large cities

         |          Small           |          Medium          |          Large           |
         | Obs.  Exp.     (O-E)^2/E | Obs.  Exp.     (O-E)^2/E | Obs.  Exp.     (O-E)^2/E | Totals
Female   | 10    11.667   0.238     | 14    12.500   0.180     | 6     5.833    0.005     | 30
Male     | 4     2.333    1.191     | 1     2.500    0.900     | 1     1.167    0.024     | 6
Totals   | 14                       | 15                       | 7                        | 36

From the table we can see that:

$\chi^2 = 0.238 + 0.180 + 0.005 + 1.191 + 0.900 + 0.024 = 2.538$
and df = (C - 1)(R - 1) = (3 - 1)(2 - 1) = (2)(1) = 2
We now have the information we need to complete the six-step process for testing statistical hypotheses for our research problem.

1. State the null hypothesis and the alternative hypothesis based on your research question.

2. Set the alpha level.

3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.
$\chi^2 = 2.538$
df = (C - 1)(R - 1) = (2)(1) = 2
4. Write the decision rule for rejecting the null hypothesis.
Reject H0 if $\chi^2 \geq 5.991$.

Note: To write the decision rule we had to know the critical value for chi-square, with an alpha level of .05 and 2 degrees of freedom. We can find this by looking at Appendix Table F and noting the tabled value for the column for the .05 level and the row for 2 df.
5. Write a summary statement based on the decision.
Fail to reject H0
Note: Since our calculated value of $\chi^2$ (2.538) is not greater than 5.991, we fail to reject the null hypothesis and are unable to accept the alternative hypothesis.
6. Write a statement of results in standard English.
There is not a significant difference in the frequencies with which males come from small, medium, or large towns as compared with females.
The data give no evidence that hometown size depends on gender.
Chi-square is a useful non-parametric statistic for evaluating statistical hypotheses that involve the frequencies with which observations fall in various categories (nominal data).
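
The two-variable calculation can be reproduced in the same way. This is a hedged sketch in standard-library Python, not part of the original module; the dictionary layout and names are assumptions made for illustration. As in the one-variable case, the critical value uses the closed form that holds only for 2 degrees of freedom.

```python
import math

# Observed frequencies: rows = sex, columns = hometown size
observed = {
    "Female": {"Small": 10, "Medium": 14, "Large": 6},
    "Male": {"Small": 4, "Medium": 1, "Large": 1},
}
sizes = ["Small", "Medium", "Large"]

row_totals = {sex: sum(cells.values()) for sex, cells in observed.items()}
col_totals = {s: sum(observed[sex][s] for sex in observed) for s in sizes}
grand_total = sum(row_totals.values())

chi_square = 0.0
for sex in observed:
    for s in sizes:
        # Expected frequency = (column total x row total) / grand total
        e = col_totals[s] * row_totals[sex] / grand_total
        o = observed[sex][s]
        chi_square += (o - e) ** 2 / e
        print(f"{sex:6s} {s:6s} O={o:2d} E={e:6.3f}")

df = (len(sizes) - 1) * (len(observed) - 1)  # (C - 1)(R - 1)

# Closed form valid ONLY for df = 2: critical value = -2 ln(alpha)
critical_value = -2 * math.log(0.05)
print(f"chi-square = {chi_square:.3f}, df = {df}")
print("Reject H0" if chi_square >= critical_value else "Fail to reject H0")
```

Because the script carries full precision instead of summing rounded cell values, its statistic can differ from the hand calculation in the third decimal place; the decision is unaffected.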
7.4 Assignment

7.4.1 Assignment 7.1

(formula image not reproduced here) is the formula for
1. the dependent t-test.
2. the independent t-test.
3. the one-way analysis of variance test.
4. the Scheffe post hoc test.


(formula image not reproduced here) is the formula for
1. the dependent t-test.
2. the independent t-test.
3. the one-way analysis of variance test.
4. the Scheffe post hoc test.

(formula image not reproduced here) is the formula for
1. the dependent t-test.
2. the independent t-test.
3. the one-way analysis of variance test.
4. the Scheffe post hoc test.

For the following research problem: You are concerned with the effect of computers on the quality of written language. You randomly place the 30 students in your English class into two groups of 15 each. The first group is asked to write their next English theme assignment using a word processing program on a computer, while the other group is asked to write their themes by hand. You ask another English teacher to read all 30 themes and give them a 1 (poorest) to 10 (best) rating on the quality of their English usage. You want to know if there is a significant difference in the quality ratings of the two groups.
What is the proper statistical test to use with this research problem?
1. the dependent t-test
2. the independent t-test
3. the one-sample t-test
4. the one-way analysis of variance test

For the following research problem: The number of hours a subject could stay awake was measured as a function of the dose level of a particular drug. Three levels of drug dosage were used. Analyze the results for the data on the dependent variable (number of hours awake) to determine if there was a significant difference among the three levels of drug dosage used.
What is the proper statistical test to use with this research problem?
1. the dependent t-test
2. the independent t-test
3. the one-sample t-test
4. the one-way analysis of variance test


7.4.2 Assignment 7.2

1. An industrial psychologist is interested in evaluating four different types of training on worker productivity. Using a standard measure of productivity, the psychologist measures the productivity of a set of workers who have been trained using each one of the four procedures. Using the data below, determine whether there is a significant difference between the training methods. Larger numbers on the dependent variable indicate higher productivity.

Productivity Scores for Four Groups of Workers Trained by Different Methods

Group 1        Group 2              Group 3    Group 4
On-the-Job     Computer Assisted    Lecture    Videotape
67             68                   46         37
68             62                   39         46
61             59                   38         49
62             71                   47         48
60             60                   46         49
56             66                   49         53

1. H0:
2. H1:
3. F =
4. F12 =
5. F13 =
6. F23 =
7. Critical Value for F =
8. State conditions under which you would reject H0:

2. A school guidance counselor investigates the influence of different motivational devices on the academic achievement of students. The counselor arranges for one group of students to receive immediate feedback upon the completion of an English assignment. A second group of students receives feedback at the end of the day, while a third group receives feedback at the end of the week. Using the students' grades on a standardized English test, determine whether there is a significant difference between the groups. If necessary, perform Scheffe tests.

English Test Results for Groups of Students Receiving Various Types of Feedback

No   Group 1               Group 2               Group 3
     Immediate Feedback    Day's End Feedback    Week's End Feedback
1    49                    40                    36
2    40                    37                    32
3    41                    42                    31
4    46                    39                    39
5    42                    45                    40
6    50                    39                    39
7    53                    45                    41
8    51                    49                    38

1. H0:
2. H1:
3. F =
4. F12 =
5. F13 =
6. F23 =
7. Critical Value for F =
8. State conditions under which you would reject H0:

7.4.3 Assignment 7.3

For each of the following problems, state the null hypothesis, the alternate hypothesis, the calculated value of the statistic, the critical value of the statistic, and the conditions under which you would reject the null hypothesis.
1. A sample of 100 people are classified as to their social club membership and their academic status. Is belonging to a social club independent of academic status?

Academic Classification and Social Club Membership for 100 People

Academic classification   Belong to Social Club   Do not Belong to Social Club
Freshman                  9                       16
Sophomore                 11                      14
Junior                    16                      9
Senior                    19                      6

1. H0:
2. H1:
3. $\chi^2$ =
4. Critical Value for $\chi^2$ =
5. State conditions under which you would reject H0
2. A consumer research group asked 100 men to use each of three kinds of after-shave lotion for one month. After the trial period, each man indicated the lotion he preferred. Using the results below, determine whether there is a significant preference for any of the three after-shave lotions.


Number of Men Preferring Each of Three After-Shave Lotions

Lotion   Number of Men Preferring
1        42
2        36
3        22

1. H0:
2. H1:
3. $\chi^2$ =
4. Critical Value for $\chi^2$ =
5. State conditions under which you would reject H0

(formula image not reproduced here) is the formula for
1. the dependent t-test.
2. the independent t-test.
3. the one-way analysis of variance test.
4. the chi-square test.

(formula image not reproduced here) is the formula for
1. the dependent t-test.
2. the independent t-test.
3. the one-way analysis of variance test.
4. the chi-square test.

(formula image not reproduced here) is the formula for
1. the dependent t-test.
2. the independent t-test.
3. the one-way analysis of variance test.
4. the Scheffe post hoc test.

For the following research problem: You are interested in knowing whether or not the composition of a family is related to the type of vacations they like to take. Accordingly, you collect the following data from a survey of preferred vacations:

Frequencies with which families of various types prefer various vacation types

                     Family Type
Vacation             No Children   Less Than 5 Children   5-10 Children
Visit Relatives      0             15                     5
Go to Beach          5             5                      10
Urban Sightseeing    15            0                      5

What is the proper statistical test to use with this research problem?
1. the dependent t-test
2. the independent t-test
3. the one-way analysis of variance test
4. the chi-square test

For the following research problem: Is it really true that people with graduate degrees in certain fields earn substantially less money than people with graduate degrees in certain other fields? To answer this question, you look at data collected by Yuppie University on the salaries earned by recent graduate and professional students.

Salaries for recent graduates of Yuppie University by field of study
EngineeringPhD HumanitiesPhD EducationPhD
J.D.
M.D.
$40,000
$22,000
$25,000
$40,000 $50,000
$28,000
$24,000
$27,000
$35,000 $43,000
$32,000
$28,000
$31,000
$33,000 $33,000
$36,000
$24,000
$24,000
$36,000 $39,000
$30,000

$27,000
$38,000 $50,000

$32,000
What is the proper statistical test to use with this research problem?
1. the dependent t-test
2. the independent t-test
3. the one-way analysis of variance test
4. the chi-square test


Practicum: MATH11002 Business Statistics
MODULE 8

Submitted only on
Day/Date: ____________ / ______________
Time: ______ WIB
In: ____________________

Date of Receipt:
Score:
Assistant Signature:

I, the undersigned, hereby state that I have strived to do all the work in this module myself.
Name/NIM: ______________________________ / _______________
Signature: _______________________________________________
Rem.:

Module Description: Regression Analysis

Objective: The student understands and is able to use regression analysis to predict the value of a dependent variable based on an independent variable; understands the meaning of the regression coefficients; can make inferences about the slope and correlation coefficient; and can estimate mean values and predict individual values using MS Excel Regression Analysis or other statistical software.

Output: A report of Simple Regression Analysis produced by the students should be in the form of working procedures and results in both softcopy and hardcopy.

Regression Analysis

8.1 Simple Regression Analysis
The field of econometrics uses regression analysis to create quantitative models that can be used to predict the value of a series if one knows the value of several other variables.
This analysis tool performs linear regression analysis by using the "least squares" method to fit a line through a set of observations. You can analyze how a single dependent variable is affected by the values of one or more independent variables.
For example, the wage per hour can be predicted if one knows the values of the variables that constitute the regression equation. This is a big leap of faith from a correlation or confidence interval estimate. In a correlation, the statistician is not presuming or implying any causality or deduction of causality. On the other hand, regression analysis is used so often (probably even abused) because of its supposed ability to link cause and effect. Skepticism of causal relationships is not only healthy but also important, because the real power of regression lies in a comprehensive interpretation of the results.

8.2 Regression Analysis Using Excel
Before using any analysis tool, you must arrange the data you want to analyze in columns or rows on your worksheet. This will be your input range. Once the data is set you can open the analysis tool, in this case Regression: Tools > Data Analysis > Regression.

8.3 Regression Dialog Box

Input Y Range: Enter the reference for the range of dependent data. The range must consist of a single column of data. You can type in the reference or use the Collapse ("go out and get it") button. This will collapse the dialog box so that you can select the data you wish to use. Once you have chosen your desired data, either press Enter or click the Expand button.
Input X Range: Enter the reference for the range of independent data. Microsoft Excel orders independent variables from this range in ascending order from left to right. The maximum number of independent variables is 16.
Labels: Select if the first row or column of your input range or ranges contains labels. Clear if your input has no labels; Excel generates appropriate data labels for the output table.
Confidence Level: Select to include an additional level in the summary output table. In the box, enter the confidence level you want applied in addition to the default 95 percent level.
Constant is Zero: Select to force the regression line to pass through the origin.

Output Range: Enter the reference for the upper-left cell of the output table. Allow at least seven columns for the summary output table, which includes an ANOVA table, coefficients, standard error of y estimate, r2 values, number of observations, and standard error of coefficients.
New Worksheet Ply: Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.
New Workbook: Click to create a new workbook and paste the results in the new workbook.
Residuals: Select to include residuals in the residuals output table.
Standardized Residuals: Select to include standardized residuals in the residuals output table.
Residual Plots: Select to generate a chart for each independent variable versus the residual.
Line Fit Plots: Select to generate a chart for predicted values versus the observed values.
Normal Probability Plots: Select to generate a chart that plots normal probability.

8.4 Simple Regression

8.5 Linear Correlation and Regression Analysis
In this section the objective is to see whether there is a correlation between two variables and to find a model that predicts one variable in terms of the other variable. There are many examples that we could mention, but we will use a popular one from the world of business. Usually the independent variable is denoted by the letter x and the dependent variable by the letter y. A businessman would like to see whether there is a relationship between the number of cases of soda sold and the temperature on a hot summer day, based on information taken from the past. He also would like to estimate the number of cases of soda that will be sold on a particular hot summer day at a ball game. He recorded the temperatures and the number of cases of soda sold on those particular days. The following table shows the recorded data from June 1 through June 13. The weatherman predicts a temperature of 94 degrees F for June 14. The businessman would like to meet all demand for the cases of soda ordered by customers on June 14.
DAY      Cases of Soda   Temperature
1-Jun    57              56
2-Jun    59              58
3-Jun    65              63
4-Jun    67              66
5-Jun    75              73
6-Jun    81              78
7-Jun    86              85
8-Jun    88              85
9-Jun    88              87
10-Jun   84              84
11-Jun   82              88
12-Jun   80              84
13-Jun   83              89
Now let's use Excel to find the linear correlation coefficient and the regression line equation. The linear correlation coefficient is a quantity between -1 and +1. This quantity is denoted by R. The closer R is to +1, the stronger the positive (direct) correlation; similarly, the closer R is to -1, the stronger the negative (inverse) correlation between the two variables. The general form of the regression line is y = mx + b. In this formula, m is the slope of the line and b is the y-intercept. You can find these quantities from the Excel output. In this situation the variable y (the dependent variable) is the number of cases of soda and x (the independent variable) is the temperature. To obtain the Excel output, take the following steps:
Step 1. From the menus choose Tools and click on Data Analysis.
Step 2. When the Data Analysis dialog box appears, click on Correlation.
Step 3. When the Correlation dialog box appears, enter B1:C14 in the Input Range box. Click on Labels in First Row and enter A16 in the Output Range box. Click on OK.

                Cases of Soda   Temperature
Cases of Soda   1
Temperature     0.966598577     1

As you can see, the correlation between the number of cases of soda demanded and the temperature is a very strong positive correlation. This means that as the temperature increases, the demand for cases of soda also increases. The linear correlation coefficient is 0.966598577, which is very close to +1.
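
For readers who want to check the correlation coefficient outside Excel, the following is a minimal sketch in standard-library Python. It is not part of the original module; the function name `pearson_r` and the list names are assumptions made for illustration.

```python
import math

# Data from the table: June 1 to June 13
cases = [57, 59, 65, 67, 75, 81, 86, 88, 88, 84, 82, 80, 83]
temperature = [56, 58, 63, 66, 73, 78, 85, 85, 87, 84, 88, 84, 89]

def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(temperature, cases)
print(f"r = {r:.6f}")
```

The coefficient is symmetric, so swapping the two arguments gives the same value.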
Now let's follow similar steps to find the regression equation.
Step 1. From the menus choose Tools and click on Data Analysis.
Step 2. When the Data Analysis dialog box appears, click on Regression.
Step 3. When the Regression dialog box appears, enter B1:B14 in the Input Y Range box and C1:C14 in the Input X Range box. Click on Labels.
Step 4. Enter A19 in the Output Range box.
Note: The regression equation in general should look like Y = mX + b. In this equation m is the slope of the regression line and b is its y-intercept.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.966598577
R Square            0.934312809
Adjusted R Square   0.928341246
Standard Error      2.919383191
Observations        13

ANOVA
             df   SS            MS            F             Significance F
Regression   1    1333.479989   1333.479989   156.4603497   7.58511E-08
Residual     11   93.75078034   8.522798213
Total        12   1427.230769

              Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept     9.17800767     5.445742836      1.685354587   0.120044801   -2.80799756   21.16401
Temperature   0.879202711    0.07028892       12.50841116   7.58511E-08   0.724497763   1.033908

The relationship between the number of cases of soda and the temperature is:

Y = 0.879202711 X + 9.17800767
The number of cases of soda = 0.879202711 × (Temperature) + 9.17800767. Using this expression we can approximately predict the number of cases of soda needed on June 14. The weather forecast for this day is 94 degrees, hence the number of cases of soda needed is:
The number of cases of soda = 0.879202711 × (94) + 9.17800767 = 91.82, or about 92 cases.
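
The slope, intercept, and June 14 forecast can also be reproduced directly from the least-squares formulas. The sketch below uses only the Python standard library and is offered as an illustration, not as the module's own procedure; `fit_line` and the variable names are hypothetical.

```python
# Data from the table: June 1 to June 13
cases = [57, 59, 65, 67, 75, 81, 86, 88, 88, 84, 82, 80, 83]
temperature = [56, 58, 63, 66, 73, 78, 85, 85, 87, 84, 88, 84, 89]

def fit_line(x, y):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    m = s_xy / s_xx
    b = mean_y - m * mean_x
    return m, b

m, b = fit_line(temperature, cases)

# Forecast for a 94-degree day
forecast = m * 94 + b
print(f"Y = {m:.6f} X + {b:.6f}")
print(f"Forecast at 94 degrees: {forecast:.2f} cases")
```

Note that 94 degrees lies above every temperature in the sample, so the forecast is an extrapolation of the fitted line.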


Assignment 8.1 Regression Analysis:

The highway deaths per 100 million vehicle miles and highway speed limits for 10 countries are given below:
(Death, Speed) = (3.0, 55), (3.3, 55), (3.4, 55), (3.5, 70), (4.1, 55), (4.3, 60), (4.7, 55), (4.9, 60), (5.1, 60), and (6.1, 75).
From this we can see that five countries with the same speed limit have very different positions on the safety list. For example, Britain ... with a speed limit of 70 is demonstrably safer than Japan, at 55. Can we argue that speed has little to do with safety? Use regression analysis to answer this question.


Practicum: MATH11002 Business Statistics
MODULE 9

Submitted only on
Day/Date: ____________ / ______________
Time: ______ WIB
In: ____________________

Date of Receipt:
Score:
Assistant Signature:

I, the undersigned, hereby state that I have strived to do all the work in this module myself.
Name/NIM: ______________________________ / _______________
Signature: _______________________________________________
Rem.:

Module Description: MULTIPLE REGRESSION

Objective: How to develop a multiple regression model; how to interpret the regression coefficients; how to determine which independent variables are most important in predicting a dependent variable; how to use quadratic terms in a regression model; how to measure the correlation among independent variables.

Output: A report of Multiple Regression Analysis produced by the students should be in the form of working procedures and results in both softcopy and hardcopy.

9 Multiple Regression Model
Multiple regression is an extension of simple regression. Simple regression has only one independent (explanatory) variable. Multiple regression fits a model for one dependent (response) variable based on more than one independent (explanatory) variable.

9.1 MULTIPLE REGRESSION USING THE DATA ANALYSIS ADD-IN
We then create a new variable in cells C2:C6, cubed household size, as a regressor.
Then in cell C1 give the heading CUBED HH SIZE. (It turns out that for these data squared HH SIZE has a coefficient of exactly 0.0, so the cube is used.)
The spreadsheet cells A1:C6 should look like:

We have regression with an intercept and the regressors HH SIZE and CUBED HH SIZE.


The population regression model is: $y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + u$

It is assumed that the error u is independent with constant variance (homoskedastic) - see EXCEL LIMITATIONS at the bottom.
We wish to estimate the regression line: $y = b_1 + b_2 x_2 + b_3 x_3$

We do this using the Data Analysis Add-in and Regression.

The only change over one-variable regression is to include more than one column in the Input X Range.
Note, however, that the regressors need to be in contiguous columns (here columns B and C).
If this is not the case in the original data, then columns need to be copied to get the regressors in contiguous columns.
Hitting OK we obtain


The regression output has three components:

Regression statistics table
ANOVA table
Regression coefficients table

9.2 INTERPRET REGRESSION STATISTICS TABLE
This is the following output. Of greatest interest is R Square.

                                Explanation
Multiple R          0.895828    R = square root of R^2
R Square            0.802508    R^2
Adjusted R Square   0.605016    Adjusted R^2, used if more than one x variable
Standard Error      0.444401    This is the sample estimate of the standard deviation of the error u
Observations        5           Number of observations used in the regression (n)

The above gives the overall goodness-of-fit measures:

R^2 = 0.8025
Correlation between y and y-hat is 0.8958 (when squared gives 0.8025).
Adjusted R^2 = R^2 - (1-R^2)*(k-1)/(n-k) = .8025 - .1975*2/2 = 0.6050.
The standard error here refers to the estimated standard deviation of the error term u.
It is sometimes called the standard error of the regression. It equals sqrt(SSE/(n-k)).
It is not to be confused with the standard error of y itself (from descriptive statistics) or with the standard errors of the regression coefficients given below.
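
The goodness-of-fit measures can be recomputed from R Square and the residual sum of squares. The following is a small sketch in standard-library Python under stated assumptions: n = 5 observations, k = 3 regressors including the intercept, and a residual sum of squares of 0.39498 as quoted in the ANOVA discussion of this module. The variable names are illustrative only.

```python
import math

r_squared = 0.802508   # R Square from the regression statistics table
n = 5                  # number of observations
k = 3                  # regressors including the intercept
residual_ss = 0.39498  # residual (error) sum of squares, SSE

# Multiple R is the square root of R Square
multiple_r = math.sqrt(r_squared)

# Adjusted R^2 = R^2 - (1 - R^2) * (k - 1) / (n - k)
adjusted_r_squared = r_squared - (1 - r_squared) * (k - 1) / (n - k)

# Standard error of the regression = sqrt(SSE / (n - k))
standard_error = math.sqrt(residual_ss / (n - k))

print(f"Multiple R        = {multiple_r:.6f}")
print(f"Adjusted R Square = {adjusted_r_squared:.6f}")
print(f"Standard Error    = {standard_error:.4f}")
```

With only two residual degrees of freedom, the adjustment removes a large share of R Square, which is why the adjusted value is so much lower.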

R2 = 0.8025 means that 80.25% of the variation of yi around ybar (its mean) is explained
by the regressors x2i and x3i.

9.3 INTERPRET ANOVA TABLE
An ANOVA table is given. This is often skipped.

             df   SS       MS       F        Significance F
Regression   2    1.6050   0.8025   4.0635   0.1975
Residual     2    0.3950   0.1975
Total        4    2.0

The ANOVA (analysis of variance) table splits the sum of squares into its components.
Total sum of squares
= Residual (or error) sum of squares + Regression (or explained) sum of squares.
Thus $\sum_i (y_i - \bar{y})^2 = \sum_i (y_i - \hat{y}_i)^2 + \sum_i (\hat{y}_i - \bar{y})^2$
where $\hat{y}_i$ is the value of $y_i$ predicted from the regression line
and $\bar{y}$ is the sample mean of y.
For example:
R^2 = 1 - Residual SS / Total SS (general formula for R^2)
= 1 - 0.3950 / 2.0 (from data in the ANOVA table)
= 0.8025 (which equals R^2 given in the regression statistics table).
The column labeled F gives the overall F-test of H0: $\beta_2 = 0$ and $\beta_3 = 0$ versus Ha: at least one of $\beta_2$ and $\beta_3$ does not equal zero.
Aside: Excel computes this F as:
F = [Regression SS/(k-1)] / [Residual SS/(n-k)] = [1.6050/2] / [.39498/2] = 4.0635.
The column labeled Significance F has the associated P-value.
Since 0.1975 > 0.05, we do not reject H0 at significance level 0.05.
Note: Significance F in general = FDIST(F, k-1, n-k), where k is the number of regressors including the intercept.
Here FDIST(4.0635,2,2) = 0.1975.
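
The F statistic and its P-value can be checked by hand in this example. The sketch below is standard-library Python and is an illustration only; it uses the closed form P(F > f) = (1 + 2f/df2)^(-df2/2), which is exact only when the numerator degrees of freedom equal 2 (as they do here). Other cases need an F table or a statistics library. Variable names are assumptions.

```python
regression_ss = 1.6050  # explained sum of squares
residual_ss = 0.39498   # residual sum of squares
n, k = 5, 3             # observations; regressors including the intercept

df1 = k - 1  # numerator degrees of freedom
df2 = n - k  # denominator degrees of freedom

# F = [Regression SS / (k - 1)] / [Residual SS / (n - k)]
f_stat = (regression_ss / df1) / (residual_ss / df2)

# R^2 = 1 - Residual SS / Total SS
total_ss = regression_ss + residual_ss
r_squared = 1 - residual_ss / total_ss

# Closed form valid ONLY when df1 = 2
assert df1 == 2
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)

print(f"F = {f_stat:.4f}, R^2 = {r_squared:.4f}, p = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Do not reject H0")
```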

9.4 INTERPRET REGRESSION COEFFICIENTS TABLE
The regression output of most interest is the following table of coefficients and associated output:


                Coefficient   St. error   t Stat   P-value   Lower 95%   Upper 95%
Intercept       0.89655       0.76440     1.1729   0.3616    -2.3924     4.1855
HH SIZE         0.33647       0.42270     0.7960   0.5095    -1.4823     2.1552
CUBED HH SIZE   0.00209       0.01311     0.1594   0.8880    -0.0543     0.0585

Let $\beta_j$ denote the population coefficient of the jth regressor (intercept, HH SIZE and CUBED HH SIZE).
Then

Column "Coefficient" gives the least squares estimates of $\beta_j$.
Column "Standard error" gives the standard errors (i.e. the estimated standard deviation) of the least squares estimates $b_j$ of $\beta_j$.
Column "t Stat" gives the computed t-statistic for H0: $\beta_j = 0$ against Ha: $\beta_j \neq 0$. This is the coefficient divided by the standard error. It is compared to a t distribution with (n-k) degrees of freedom, where here n = 5 and k = 3.

Column "P-value" gives the p-value for the test of H0: $\beta_j = 0$ against Ha: $\beta_j \neq 0$. This equals Pr{|t| > |t Stat|}, where t is a t-distributed random variable with n-k degrees of freedom and t Stat is the computed value of the t-statistic given in the previous column.
Note that this p-value is for a two-sided test. For a one-sided test divide this p-value by 2 (also checking the sign of the t Stat).

Columns "Lower 95%" and "Upper 95%" define a 95% confidence interval for $\beta_j$.

A simple summary of the above output is that the fitted line is

y = 0.8966 + 0.3365*x + 0.0021*z

where x is HH SIZE and z is CUBED HH SIZE.

9.5 CONFIDENCE INTERVALS FOR SLOPE COEFFICIENTS
The 95% confidence interval for slope coefficient $\beta_2$ is, from the Excel output, (-1.4823, 2.1552).
Excel computes this as
b2 ± t_.025(2) × se(b2)
= 0.33647 ± TINV(0.05, 2) × 0.42270
= 0.33647 ± 4.303 × 0.42270
= 0.33647 ± 1.8189
= (-1.4823, 2.1552).
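
The interval can be reproduced without Excel in this particular case. The sketch below is standard-library Python and is an illustration, not the module's procedure. It assumes n = 5 and k = 3, so that the t distribution has 2 degrees of freedom, and uses the closed form that holds only for df = 2: P(|T| <= t) = t / sqrt(t^2 + 2). The helper name `t_critical_df2` is hypothetical.

```python
import math

b2 = 0.33647     # coefficient of HH SIZE
se_b2 = 0.42270  # its standard error
n, k = 5, 3
df = n - k       # degrees of freedom (2)

def t_critical_df2(confidence):
    """Two-sided critical t value, valid for df = 2 only.

    For df = 2, P(|T| <= t) = t / sqrt(t^2 + 2). Setting this equal to the
    confidence level c and solving gives t = c * sqrt(2 / (1 - c^2)).
    """
    c = confidence
    return c * math.sqrt(2 / (1 - c ** 2))

assert df == 2  # the closed form above does not apply otherwise

t_crit = t_critical_df2(0.95)
margin = t_crit * se_b2
lower, upper = b2 - margin, b2 + margin

print(f"t critical = {t_crit:.3f}")
print(f"95% CI for the HH SIZE coefficient: ({lower:.4f}, {upper:.4f})")
```

Calling `t_critical_df2(0.99)` instead gives the multiplier for a 99% interval with the same 2 degrees of freedom.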


Other confidence intervals can be obtained.
For example, to find 99% confidence intervals: in the Regression dialog box (in the Data Analysis Add-in), check the Confidence Level box and set the level to 99%.

9.6 TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")
The coefficient of HH SIZE has estimated standard error of 0.4227, t-statistic of 0.7960 and p-value of 0.5095.
It is therefore statistically insignificant at significance level $\alpha = .05$ as p > 0.05.
The coefficient of CUBED HH SIZE has estimated standard error of 0.0131, t-statistic of 0.1594 and p-value of 0.8880.
It is therefore statistically insignificant at significance level $\alpha = .05$ as p > 0.05.
There are 5 observations and 3 regressors (intercept, HH SIZE and CUBED HH SIZE) so we use t(5-3) = t(2).
For example, for HH SIZE p = TDIST(0.796,2,2) = 0.5095.

9.7 TEST HYPOTHESIS ON A REGRESSION PARAMETER
Here we test whether HH SIZE has coefficient $\beta_2 = 1.0$.
Example: H0: $\beta_2 = 1.0$ against Ha: $\beta_2 \neq 1.0$ at significance level $\alpha = .05$.
Then
t = (b2 - H0 value of $\beta_2$) / (standard error of b2)
= (0.33647 - 1.0) / 0.42270
= -1.569.

9.7.1 Using the p-value approach
p-value = TDIST(1.569,2,2) = 0.257. [Here n = 5 and k = 3 so n-k = 2].
Do not reject the null hypothesis at level .05 since the p-value is > 0.05.

9.7.2 Using the critical value approach
We computed t = -1.569.
The critical value is t_.025(2) = TINV(0.05,2) = 4.303. [Here n = 5 and k = 3 so n-k = 2].
So do not reject the null hypothesis at level .05 since |t| = 1.569 < 4.303.
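
Both approaches can be checked with a few lines of code. This is a hedged sketch in standard-library Python, not part of the original module. It again relies on closed forms that are valid only for 2 degrees of freedom: the two-sided p-value is 1 - |t| / sqrt(t^2 + 2), and the two-sided critical value at confidence c is c * sqrt(2 / (1 - c^2)). Variable names are illustrative.

```python
import math

b2 = 0.33647        # estimated coefficient of HH SIZE
se_b2 = 0.42270     # its standard error
hypothesized = 1.0  # value of the coefficient under H0
alpha = 0.05

# t = (estimate - hypothesized value) / standard error
t_stat = (b2 - hypothesized) / se_b2

# p-value approach (closed form valid ONLY for df = 2)
p_value = 1 - abs(t_stat) / math.sqrt(t_stat ** 2 + 2)

# Critical value approach (closed form valid ONLY for df = 2)
c = 1 - alpha
t_crit = c * math.sqrt(2 / (1 - c ** 2))

reject_by_p = p_value < alpha
reject_by_critical = abs(t_stat) > t_crit

print(f"t = {t_stat:.4f}, p = {p_value:.3f}, critical = {t_crit:.3f}")
print("Reject H0" if reject_by_p else "Do not reject H0")
```

The two approaches always agree, since both compare the same statistic against the same distribution.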

9.8 OVERALL TEST OF SIGNIFICANCE OF THE REGRESSION PARAMETERS
We test H0: $\beta_2 = 0$ and $\beta_3 = 0$ versus Ha: at least one of $\beta_2$ and $\beta_3$ does not equal zero.
From the ANOVA table the F-test statistic is 4.0635 with p-value of 0.1975.
Since the p-value is not less than 0.05 we do not reject the null hypothesis that the regression parameters are zero at significance level 0.05.
Conclude that the parameters are jointly statistically insignificant at significance level 0.05.
Note: Significance F in general = FDIST(F, k-1, n-k), where k is the number of regressors including the intercept.
Here FDIST(4.0635,2,2) = 0.1975.

9.9 PREDICTED VALUE OF Y GIVEN REGRESSORS
Consider the case where x = 4, in which case CUBED HH SIZE = x^3 = 4^3 = 64.
Predicted y = b1 + b2 x2 + b3 x3 = 0.8966 + 0.3365 × 4 + 0.0021 × 64 = 2.3770
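
The prediction is a direct substitution into the fitted line. The sketch below is standard-library Python, offered as an illustration only; it uses the coefficients as printed in the coefficients table (0.89655, 0.33647, 0.00209), so its result can differ in the third decimal place from a hand calculation with rounded coefficients. Variable names are assumptions.

```python
# Coefficients as printed in the regression coefficients table
b1 = 0.89655  # intercept
b2 = 0.33647  # HH SIZE
b3 = 0.00209  # CUBED HH SIZE

hh_size = 4
cubed_hh_size = hh_size ** 3  # second regressor derived from the first

# Predicted y = b1 + b2 * x2 + b3 * x3
y_hat = b1 + b2 * hh_size + b3 * cubed_hh_size
print(f"CUBED HH SIZE = {cubed_hh_size}, predicted y = {y_hat:.4f}")
```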

9.10 EXCEL LIMITATIONS

Excel restricts the number of regressors (only up to 16 regressors).

Excel requires that all the regressor variables be in adjoining columns. You may need to move columns to ensure this. E.g. if the regressors are in columns B and D, you need to copy at least one of columns B and D so that they are adjacent to each other.

Excel standard errors, t-statistics and p-values are based on the assumption that the error is independent with constant variance (homoskedastic).

Excel does not provide alternatives, such as heteroskedastic-robust or autocorrelation-robust standard errors, t-statistics and p-values.

9.11 Assignment 9.1
DATA:

Store   Bars Sold   Price (cents)   Promotion ($)   |   Store   Bars Sold   Price (cents)   Promotion ($)
1       4141        59              200             |   18      2730        79              400
2       3842        59              200             |   19      2618        79              400
3       3056        59              200             |   20      4421        79              400
4       3519        59              200             |   21      4113        79              600
5       4226        59              400             |   22      3746        79              600
6       4630        59              400             |   23      3532        79              600
7       3507        59              400             |   24      3825        79              600
8       3754        59              400             |   25      1096        99              200
9       5000        59              600             |   26      761         99              200
10      5120        59              600             |   27      2088        99              200
11      4011        59              600             |   28      820         99              200
12      5015        59              600             |   29      2114        99              400
13      1916        79              200             |   30      1882        99              400
14      675         79              200             |   31      2159        99              400
15      3636        79              200             |   32      1602        99              400
16      3224        79              200             |   33      3354        99              600
17      2295        79              400             |   34      2927        99              600

A sample of 34 stores in a supermarket chain is selected for a test-market study of OmniPower. All the stores selected have approximately the same monthly sales volume. The two independent variables are the price of a bar (X1) and the monthly advertising (promotion) expenditure (X2).
a. Use Excel Data Analysis Regression to estimate the regression line
b. Interpret the regression statistics table
c. Use 95% and 99% confidence intervals
d. Test Hypothesis of Zero Slope Coefficient ("Test of Statistical Significance")
e. Test Hypothesis on a Regression Parameter
   i. Using the P-Value Approach
   ii. Using the Critical Value Approach
f. Overall Test of Significance of the Regression Parameters
g. Predicted Value of Y Given Price 89 cents and Promotion 800


Practicum: MATH11002 Business Statistics
MODULE 10

Submitted only on
Day/Date: ____________ / ______________
Time: ______ WIB
In: ____________________

Date of Receipt:
Score:
Assistant Signature:

I, the undersigned, hereby state that I have strived to do all the work in this module myself.
Name/NIM: ______________________________ / _______________
Signature: _______________________________________________
Rem.:

Module Description: TIME SERIES FORECASTING

Objective: Discuss the importance of forecasting; perform smoothing of data series; describe least-squares trend fitting and forecasting; address time series forecasting; address autoregressive models; describe the procedure for choosing appropriate models.

Output: A report produced by the students should be in the form of working procedures and results in both softcopy and hardcopy.

10 Time Series Forecasting
Time series analysis has two main goals:
* Identifying the nature of a sequence of observations.
* Predicting future values using historical observations (also known as forecasting).
In time series analysis, it is assumed that the data consist of a systematic pattern plus random noise that makes the pattern difficult to identify. Most time series analysis techniques use filtering to remove the noise. There are two general components of time series patterns: trend and seasonality. The trend is a linear or nonlinear component that does not repeat within the time range. The seasonality repeats itself at systematic intervals over time. These two components are often both present in real data.
Trend Analysis
Trend analysis is a technique used to identify a trend component in time series data. In many cases the data can be approximated by a linear function, but logarithmic, exponential, and polynomial functions can also be used.
Regression Analysis
Regression analysis is the study of relationships among variables, and its purpose is to predict, or estimate, the value of one variable from the known values of other variables related to it. Any method of fitting equations to data may be called regression, and these equations are useful for making predictions and judging the strength of relationships.

Forecasting and extrapolation from present values to future values are not a function of regression analysis as such. To predict the future, time series analysis is used. To predict values, it is necessary to find a predictive function that minimizes the sum of the distances between each data point and the predictive function itself. The least-squares method is the most common way of choosing such a function: it selects the function that minimizes the sum of squared deviations between the points and the estimated function.

10.1 Time series forecasting models

The basic assumption of time series forecasting is that the factors that have influenced activities in the past and present will continue to do so in approximately the same way in the future. A trend is an overall long-term upward or downward movement in a time series. The most basic model is the classical multiplicative model for annual, quarterly, and monthly data.
10.1.1 CLASSICAL MULTIPLICATIVE TIME SERIES MODEL FOR ANNUAL DATA

$$Y_i = T_i \times C_i \times I_i$$
Where:
$T_i$ = value of the trend component in year i
$C_i$ = value of the cyclical component in year i
$I_i$ = value of the irregular component in year i

CLASSICAL MULTIPLICATIVE TIME SERIES MODEL FOR DATA WITH A SEASONAL COMPONENT

$$Y_i = T_i \times S_i \times C_i \times I_i$$
Where:
$T_i$, $C_i$, $I_i$ = value of the trend, cyclical, and irregular components in time period i
$S_i$ = value of the seasonal component in time period i

Use the Wrigley coded data below to create an Excel chart of actual gross revenue.

Year   Actual Revenue      Year   Actual Revenue
1984   591                 1995   1770
1985   620                 1996   1851
1986   699                 1997   1954
1987   781                 1998   2023
1988   891                 1999   2079
1989   993                 2000   2146
1990   1111                2001   2430
1991   1149                2002   2746
1992   1301                2003   3069
1993   1440                2004   3649
1994   1661                2005   4159

[Figure: Wm. Wrigley Jr. Company Actual Revenue. Revenue ($ millions) plotted against Year, with a fitted linear trendline y = 143.63x - 284695, R² = 0.9121.]
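The trendline on the chart can be checked outside Excel. The following is a minimal Python sketch, not part of the original module, that fits a least-squares line to the Wrigley actual revenue values listed above; the function name `fit_linear_trend` is an illustrative choice.

```python
# Wrigley actual revenue ($ millions), 1984-2005, copied from the table above.
years = list(range(1984, 2006))
revenue = [591, 620, 699, 781, 891, 993, 1111, 1149, 1301, 1440, 1661,
           1770, 1851, 1954, 2023, 2079, 2146, 2430, 2746, 3069, 3649, 4159]


def fit_linear_trend(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


b0, b1 = fit_linear_trend(years, revenue)
print(f"slope = {b1:.2f}, intercept = {b0:.0f}")
```

The slope and intercept should agree with the trendline equation displayed on the Excel chart (slope of about 143.63).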

10.1.2 Assignment 9.1

Year   Population   Workforce      Year   Population   Workforce
1984   176,383      113,544        1995   198,584      132,304
1985   178,206      115,461        1996   200,591      133,943
1986   180,587      117,834        1997   203,133      136,297
1987   182,753      119,865        1998   205,220      137,673
1988   184,613      121,669        1999   207,753      139,368
1989   186,393      123,869        2000   212,577      142,583
1990   189,164      125,840        2001   215,092      143,734
1991   190,925      126,346        2002   217,570      144,863
1992   192,805      128,105        2003   221,168      146,510
1993   194,838      129,200        2004   223,357      146,817
1994   196,814      131,056        2005   226,082      147,956

a. Using MS Excel, plot the time series for the US civilian noninstitutional population of people 16 years and older.
b. Compute the linear trend forecasting equation.
c. Forecast the US civilian noninstitutional population of people 16 years and older for 2006 and 2007.
d. Repeat (a) through (c) for the US civilian noninstitutional workforce of people 16 years and older.

10.2 Moving Average and Exponential Smoothing
10.2.1 Moving Average Models
Use the Add Trendline option to analyze a moving average forecasting model in Excel. You must first create a graph of the time series you want to analyze. Select the range that contains your data and make a scatter plot of the data. Once the chart is created, follow these steps:

1. Click on the chart to select it, and click on any point on the line to select the data series. When you click on the chart to select it, a new option, Chart, is added to the menu bar.
2. From the Chart menu, select Add Trendline.
Moving averages for a chosen period of length L consist of a series of means computed over time such that each mean is calculated for a sequence of L observed values. Moving averages are represented by the symbol MA(L). For example, suppose we have 11 years of data and want to compute five-year moving averages (L = 5).
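Data for the 11-year period 1996 to 2006:
4.0, 5.0, 7.0, 6.0, 8.0, 9.0, 5.0, 2.0, 3.5, 5.5, 6.6

The first five-year moving average is
$$MA(5) = \frac{Y_1 + Y_2 + Y_3 + Y_4 + Y_5}{L} = \frac{4.0 + 5.0 + 7.0 + 6.0 + 8.0}{5} = 6.0$$
Place this moving average at the middle of its five-year window, that is, against the third value (7.0, for 1998). Calculating the remaining MA(5) values in the same way gives:

Year   Revenue   MA(5)
1996   4.0       #N/A
1997   5.0       #N/A
1998   7.0       6.0
1999   6.0       7.0
2000   8.0       7.0
2001   9.0       6.0
2002   5.0       5.5
2003   2.0       5.0
2004   3.5       4.5
2005   5.5       #N/A
2006   6.6       #N/A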
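The centered moving average calculation can also be reproduced in a few lines of Python. This is a sketch under the assumption that only odd window lengths are needed (as in the MA(5) example); the function name `centered_moving_average` is illustrative, and positions without a full window are returned as `None` where Excel shows #N/A.

```python
def centered_moving_average(values, length):
    """Return centered moving averages; None where no full window exists."""
    if length < 1 or length % 2 == 0:
        raise ValueError("this sketch handles odd window lengths only")
    half = length // 2
    result = [None] * len(values)
    for mid in range(half, len(values) - half):
        # Window of `length` values centered on position `mid`.
        window = values[mid - half: mid + half + 1]
        result[mid] = sum(window) / length
    return result


# The 11 yearly values (1996 to 2006) from the example above.
revenue = [4.0, 5.0, 7.0, 6.0, 8.0, 9.0, 5.0, 2.0, 3.5, 5.5, 6.6]
ma5 = centered_moving_average(revenue, 5)
ma5_rounded = [None if m is None else round(m, 1) for m in ma5]
print(ma5_rounded)
```

Calling the same function with `length=3` or `length=7` gives three-year and seven-year moving averages for any other yearly series.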

The following are three-year and seven-year moving averages for Cabot Corporation revenues ($ millions):

Year   Revenue   MA 3-Year   MA 7-Year
1982   1588      #N/A        #N/A
1983   1558      1633.0      #N/A
1984   1753      1573.0      #N/A
1985   1408      1490.3      1531.1
1986   1310      1380.7      1581.0
1987   1424      1470.3      1599.1
1988   1677      1679.3      1561.3
1989   1937      1766.3      1583.3
1990   1685      1703.3      1627.4
1991   1488      1578.3      1665.0
1992   1562      1556.3      1688.4
1993   1619      1622.7      1678.1
1994   1687      1715.7      1671.3
1995   1841      1797.7      1694.9
1996   1865      1781.0      1714.4
1997   1637      1718.3      1725.7
1998   1653      1663.0      1702.3
1999   1699      1683.3      1661.7
2000   1698      1640.0      1651.7
2001   1523      1592.7      1694.1
2002   1557      1625.0      1761.6
2003   1795      1762.0      #N/A
2004   1934      1951.3      #N/A
2005   2125      #N/A        #N/A

[Figure: Moving Averages for Cabot Corporation Revenue. Revenues ($ millions) plotted against Year, showing the Revenue, MA 3-Year, and MA 7-Year series.]

10.2.2 Exponential Smoothing Models
The simplest way to analyze a time series using an exponential smoothing model in Excel is to use the Data Analysis tool. This tool works almost exactly like the one for Moving Average, except that you will need to input the value of α (the smoothing coefficient, written W below) instead of the number of periods, k. Once you have entered the data range and the damping factor, 1 − α, and indicated what output you want and a location, the analysis is the same as the one for the Moving Average model.
COMPUTING AN EXPONENTIALLY SMOOTHED VALUE IN TIME PERIOD i
$$E_1 = Y_1$$
$$E_i = W Y_i + (1 - W) E_{i-1}, \quad i = 2, 3, 4, \ldots$$
Where
$E_i$ = value of the exponentially smoothed series being computed in time period i
$E_{i-1}$ = value of the exponentially smoothed series already computed in time period i − 1
$Y_i$ = observed value of the time series in period i
W = subjectively assigned weight or smoothing coefficient (0 < W < 1).
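The recursion can be sketched in Python as follows. This is an illustrative implementation, not the Excel tool itself: the first smoothed value is set to the first observation, and each later value is W times the current observation plus (1 − W) times the previous smoothed value. The sample values are the first five Cabot Corporation revenues used in this section; the function name is an assumption.

```python
def exponential_smoothing(values, w):
    """Return the exponentially smoothed series for smoothing coefficient w."""
    if not 0 < w < 1:
        raise ValueError("w must satisfy 0 < w < 1")
    smoothed = [float(values[0])]          # first smoothed value = first observation
    for y in values[1:]:
        # Weighted blend of the new observation and the previous smoothed value.
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


# Cabot Corporation revenue ($ millions), 1982 to 1986.
cabot = [1588, 1558, 1753, 1408, 1310]
es_50 = exponential_smoothing(cabot, 0.50)
es_25 = exponential_smoothing(cabot, 0.25)
print([round(e, 1) for e in es_50])
print([round(e, 1) for e in es_25])
```

A smaller W gives more weight to the history and therefore a smoother series; a larger W tracks recent observations more closely.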

Year   Revenue   ES(W=.50)   ES(W=.25)
1982   1588      1588.0      1588.0
1983   1558      1573.0      1580.5
1984   1753      1663.0      1623.6
1985   1408      1535.5      1569.7
1986   1310      1422.8      1504.8
1987   1424      1423.4      1484.6
1988   1677      1550.2      1532.7
1989   1937      1743.6      1633.8
1990   1685      1714.3      1646.6
1991   1488      1601.1      1606.9
1992   1562      1581.6      1595.7
1993   1619      1600.3      1601.5
1994   1687      1643.6      1622.9
1995   1841      1742.3      1677.4
1996   1865      1803.7      1724.3
1997   1637      1720.3      1702.5
1998   1653      1686.7      1690.1
1999   1699      1692.8      1692.3
2000   1698      1695.4      1693.8
2001   1523      1609.2      1651.1
2002   1557      1583.1      1627.5
2003   1795      1689.1      1669.4
2004   1934      1811.5      1735.6
2005   2125      1968.3      1832.9

[Figure: Exponentially Smoothed Cabot Corporation Revenue. Revenues ($ millions) plotted against Year, showing the Revenue, ES(W=.50), and ES(W=.25) series.]

10.3 Assignment 10.2

Year   Deals
1995   715
1996   865
1997   708
1998   861
1999   931
2000   939
2001   1031
2002   893
2003   735
2004   759
2005   1013
2006   622

a. Plot the time series.
b. Fit a three-year moving average to the data and plot the results.
c. Using a smoothing coefficient of W = 0.50, exponentially smooth the series and plot the results.
d. Repeat (c) using W = 0.25.
e. Compare the results of (c) and (d).

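10.4 Linear, exponential and quadratic trend

10.4.1 Linear Trend Model
The linear trend model, $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, is the simplest forecasting model.
Using the Wrigley data above, we plot the time series of real gross revenues in Microsoft Excel.

[Figure: Excel time series plot of Wm. Wrigley Jr. Company real gross revenues.]

Using Microsoft Excel, a simple linear regression analysis on the adjusted time series results in the following linear trend forecasting equation, with the years coded so that 1984 is year 0:
$$\hat{Y}_i = 469.9158 + 62.1068 X_i$$
The regression coefficients can be interpreted as follows:
* The Y intercept, $b_0$ = 469.9158, is the predicted real gross revenue in the base year (X = 0).
* The slope, $b_1$ = 62.1068, is the predicted increase in real gross revenue per year.

For example, to project the trend to 2006, substitute $X_{23} = 22$ (the code for 2006) into the linear trend forecasting equation:
$$\hat{Y}_{23} = 469.9158 + 62.1068(22) = 1{,}836.265$$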

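Quadratic Trend Model

The quadratic trend model, $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$, is the simplest nonlinear model. The quadratic trend forecasting equation is presented below:
$$\hat{Y}_i = b_0 + b_1 X_i + b_2 X_i^2$$
where $b_0$ = estimated Y intercept, $b_1$ = estimated linear effect on Y, and $b_2$ = estimated quadratic effect on Y.

For example, Microsoft Excel can be used to compute the quadratic trend forecasting equation. The results for the quadratic trend model used to forecast real gross revenues at the Wm. Wrigley Jr. Company are:
$$\hat{Y}_i = 618.3211 + 17.5852 X_i + 2.1201 X_i^2$$
To compute a forecast for 2006 using the quadratic trend equation, substitute $X_{23} = 22$ (the code for 2006) into the quadratic trend forecasting equation:
$$\hat{Y}_{23} = 618.3211 + 17.5852(22) + 2.1201(22)^2 = 2{,}031.324$$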

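10.4.2 Exponential Trend Model
The exponential trend model equation is
$$Y_i = \beta_0 \beta_1^{X_i} \varepsilon_i$$
where $\beta_0$ = Y intercept and $(\beta_1 - 1) \times 100\%$ is the annual compound growth rate (in %). The exponential trend forecasting equation is $\log(\hat{Y}_i) = b_0 + b_1 X_i$.

[Figure: Excel results worksheet for an exponential trend model for real gross revenues at the Wm. Wrigley Jr. Company.]

Using the exponential trend equation and the Excel results, we have $\log(\hat{Y}_i) = 2.7647 + 0.0245 X_i$, where year 0 is 1984.
Compute the values for $\hat{\beta}_0$ and $\hat{\beta}_1$ by taking the antilog of the regression coefficients ($b_0$ and $b_1$):
$$\hat{\beta}_0 = \text{antilog}(2.7647) = 10^{2.7647} = 581.701$$
$$\hat{\beta}_1 = \text{antilog}(0.0245) = 10^{0.0245} = 1.058$$
Thus, the exponential trend forecasting equation is:
$$\hat{Y}_i = (581.701)(1.058)^{X_i}$$
The forecast of real gross revenues for 2006 ($X_{23} = 22$) using the above equation is as follows:
$$\log(\hat{Y}_{23}) = 2.7647 + 0.0245(22) = 3.3037$$
$$\hat{Y}_{23} = \text{antilog}(3.3037) = 10^{3.3037} = 2{,}012.334$$

[Figure: chart of the exponential trend forecast.]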
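The three trend forecasts for 2006 can be cross-checked with a short script. The sketch below simply evaluates the fitted linear, quadratic, and exponential trend equations quoted in this section at the coded year X = 22; the function names and the `BASE_YEAR` constant are illustrative, and the coefficients are copied from the text rather than re-estimated.

```python
BASE_YEAR = 1984  # coded year 0, as stated in the text


def year_code(year):
    """Convert a calendar year to the coded X value used in the trend equations."""
    return year - BASE_YEAR


def linear_trend(x):
    return 469.9158 + 62.1068 * x


def quadratic_trend(x):
    return 618.3211 + 17.5852 * x + 2.1201 * x ** 2


def exponential_trend(x):
    # The fitted equation is in base-10 logs, so take the antilog.
    return 10 ** (2.7647 + 0.0245 * x)


x_2006 = year_code(2006)
forecasts = {
    "linear": linear_trend(x_2006),
    "quadratic": quadratic_trend(x_2006),
    "exponential": exponential_trend(x_2006),
}
for name, value in forecasts.items():
    print(f"{name:12s} {value:,.3f}")
```

The quadratic and exponential results should agree with the 2,031.324 and 2,012.334 worked out above; changing the argument of `year_code` gives forecasts for other years.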

10.4.3 Model Selection Using First, Second, and Percentage Differences
To select which of the models above is the most appropriate, in addition to visually inspecting the scatter plot and comparing the adjusted r² values, we can compute and examine first, second, and percentage differences.
Perfect Fit For Linear Trend Model: The first differences are constant; that is, the differences between consecutive values in the series are the same throughout:
$$(Y_2 - Y_1) = (Y_3 - Y_2) = \cdots = (Y_n - Y_{n-1})$$

Example:

Year   Passengers   First Diff
1997   30
1998   33           3
1999   36           3
2000   39           3
2001   42           3
2002   45           3
2003   48           3
2004   51           3
2005   54           3
2006   57           3

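Perfect Fit For Quadratic Trend Model: The second differences are constant; that is, the differences between consecutive first differences are the same throughout:
$$[(Y_3 - Y_2) - (Y_2 - Y_1)] = [(Y_4 - Y_3) - (Y_3 - Y_2)] = \cdots = [(Y_n - Y_{n-1}) - (Y_{n-1} - Y_{n-2})]$$

Example:

Year   Passengers   First Diff   Second Diff
1997   30
1998   31           1.0
1999   33.5         2.5          1.5
2000   37.5         4.0          1.5
2001   43           5.5          1.5
2002   50           7.0          1.5
2003   58.5         8.5          1.5
2004   68.5         10.0         1.5
2005   80           11.5         1.5
2006   93           13.0         1.5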

Perfect Fit For Exponential Trend Model: The percentage differences between consecutive values are constant. Thus
$$\frac{Y_2 - Y_1}{Y_1} \times 100\% = \frac{Y_3 - Y_2}{Y_2} \times 100\% = \cdots = \frac{Y_n - Y_{n-1}}{Y_{n-1}} \times 100\%$$

Example:

Year   Passengers   First Diff   Second Diff   Percentage Diff
1997   30
1998   31.5         1.5                        5%
1999   33.1         1.6          0.1           5%
2000   34.8         1.7          0.1           5%
2001   36.5         1.7          0.0           5%
2002   38.3         1.8          0.1           5%
2003   40.2         1.9          0.1           5%
2004   42.2         2.0          0.1           5%
2005   44.3         2.1          0.1           5%
2006   46.5         2.2          0.1           5%
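The three kinds of differences are easy to compute for any series. The following Python sketch (function names are illustrative) applies them to the linear and quadratic passenger examples above and to an assumed unrounded version of the exponential example, generated as 30 × 1.05^k, so that the percentage differences come out at exactly 5%.

```python
def first_differences(series):
    """Differences between consecutive values: Y[i] - Y[i-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def second_differences(series):
    """Differences between consecutive first differences."""
    return first_differences(first_differences(series))


def percentage_differences(series):
    """Percentage change between consecutive values."""
    return [(b - a) / a * 100 for a, b in zip(series, series[1:])]


linear = [30, 33, 36, 39, 42, 45, 48, 51, 54, 57]
quadratic = [30, 31, 33.5, 37.5, 43, 50, 58.5, 68.5, 80, 93]
# Assumption: unrounded series growing 5% per year from 30.
exponential = [30 * 1.05 ** k for k in range(10)]

print(first_differences(linear))
print(second_differences(quadratic))
print([round(p, 2) for p in percentage_differences(exponential)])
```

For real data none of the three sets of differences will be exactly constant; the model whose differences vary least is the natural first candidate.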

For the real gross revenue data at the Wm. Wrigley Jr. Company, neither the first differences, the second differences, nor the percentage differences are constant across the series (see table below). Therefore, other models may be more appropriate (including those considered in Autoregressive Modeling).
10.4.4 Assignment 10.3
a. Plot the data of Table 10-1, Bed Bath & Beyond Inc.

Table 10-1 Bed Bath & Beyond Inc.

b. Compute a linear trend forecasting equation and plot the results.
c. Compute a quadratic trend forecasting equation and plot the results.
d. Compute an exponential trend forecasting equation and plot the results.
e. Using the forecasting equations in (b) through (d), what are your annual forecasts of the number of stores open for 2007 and 2008?
f. How can you explain the differences in the three forecasts in (e)? Which forecast do you think you should use? Why?

10.5 The autoregressive and the least-squares models for seasonal data
Autoregressive modeling is a technique used to forecast time series with autocorrelation. A first-order autocorrelation refers to the relationship between consecutive values in a time series. A second-order autocorrelation refers to the relationship between values that are two periods apart. A pth-order autocorrelation refers to the correlation between values in a time series that are p periods apart.
First-Order Autoregressive Model
$$Y_i = A_0 + A_1 Y_{i-1} + \delta_i$$
is similar in form to the simple linear regression model.
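Because a first-order autoregressive model has the same form as a simple linear regression, it can be estimated by regressing each value on the value one period earlier. The sketch below assumes the model is written as Y_i = a0 + a1 × Y_(i−1) plus a random error (notation chosen here for illustration) and uses a hypothetical series constructed so that the relationship holds exactly; it is not data from this module.

```python
def fit_first_order_autoregressive(series):
    """Least-squares estimates (a0, a1) for Y[i] = a0 + a1 * Y[i-1]."""
    lagged = series[:-1]    # Y[i-1]: the predictor
    current = series[1:]    # Y[i]:   the response
    n = len(current)
    mean_x = sum(lagged) / n
    mean_y = sum(current) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, current))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


# Hypothetical series: each value is exactly 2 + 0.5 * (previous value).
series = [10.0]
for _ in range(7):
    series.append(2 + 0.5 * series[-1])

a0, a1 = fit_first_order_autoregressive(series)
next_forecast = a0 + a1 * series[-1]   # one-period-ahead forecast
print(round(a0, 4), round(a1, 4), round(next_forecast, 4))
```

With real data the fit will not be exact, and the one-period-ahead forecast uses the most recent observed value as its input.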

10.6 Price indexes

* Index numbers allow relative comparisons over time.
* Index numbers are reported relative to a base period index.
* The base period index = 100 by definition.

$$I_i = \frac{P_i}{P_{base}} \times 100$$
where

$I_i$ = index number for year i
$P_i$ = price for year i
$P_{base}$ = price for the base year

10.6.1 Example
Airplane ticket prices from 1998 to 2006:

* Prices in 1998 were 92.2% of base year prices.
* Prices in 2000 were 100% of base year prices (by definition, since 2000 is the base year).
* Prices in 2006 were 130.2% of base year prices.

10.7 Aggregated and simple indexes
An aggregate index is used to measure the rate of change from a base period for a group of items.

10.7.1 Unweighted Aggregate Price Index

Example:

Year   Lease payment   Fuel   Repair   Total   Index (2003=100)
2003   260             45     40       345     100.0
2004   280             60     40       380     110.1
2005   305             55     45       405     117.4
2006   310             50     50       410     118.8

$$I_{2006} = \frac{\sum P_{2006}}{\sum P_{2003}} \times 100 = \frac{410}{345}(100) = 118.8$$

Unweighted total expenses were 18.8% higher in 2006 than in 2003.
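The index column in the example can be reproduced with a short Python sketch. It sums the item prices for each year and divides by the base-year total, as in the calculation above; the function name and the dictionary layout are illustrative choices.

```python
# Item prices per year, in the order: lease payment, fuel, repair.
expenses = {
    2003: [260, 45, 40],
    2004: [280, 60, 40],
    2005: [305, 55, 45],
    2006: [310, 50, 50],
}


def unweighted_aggregate_index(prices_by_year, base_year):
    """Return {year: index}, where the base year's index is 100."""
    base_total = sum(prices_by_year[base_year])
    return {
        year: sum(prices) / base_total * 100
        for year, prices in prices_by_year.items()
    }


index = unweighted_aggregate_index(expenses, base_year=2003)
for year in sorted(index):
    print(year, round(index[year], 1))
```

Passing a different `base_year` rebases the whole series, which is a quick way to see that an index only describes change relative to the chosen base period.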

10.7.2 Weighted Aggregate Price Indexes
