Probability and Statistics in Excel

The document provides a guide to using probability and statistics functions in Microsoft Excel. It outlines descriptive statistics, graphical representations, permutations and combinations, and standard probability distributions that can be calculated in Excel. It also discusses limitations of Excel for statistical computing and recommends other statistical packages.

Uploaded by

Jaroslav Pokorny
A guide to

Probability and Statistics in
Microsoft Excel

Resources to support the learning of mathematics,
statistics and OR in higher education.

www.mathstore.ac.uk
The Statistical Education through Problem Solving (STEPS)
glossary
www.stats.gla.ac/steps/glossary
Probability and Statistics in Microsoft Excel
Excel provides more than 100 functions relating to probability and statistics. It also has a facility for
constructing a wide range of charts and graphs for displaying data. This leaflet provides a quick reference
guide to assist you in harnessing Excel's statistical capability. Except where indicated, the features included
here are available in Excel Versions 4.0 and above. Almost all the instructions here also apply to the
spreadsheet facility in OpenOffice (http://openoffice.orgsuite.com/); any slight variations in commands
should be obvious to the user.

Excel is not designed for statistical computing. If you require statistical analysis beyond data validation and
manipulation, tabulation, presentation and calculation of summary statistics, you are advised to use a
bespoke statistical package such as Minitab or SPSS.
Excel has an Analysis ToolPak optional add-in facility that includes macros for carrying out many
elementary statistical analyses. The instructions for installation of this add-in vary with the version of Excel;
use the Help facility in Excel for further information on this. This add-in facility is not used in this leaflet.

There are two reasons why this add-in should be used with care:
1. Unlike other spreadsheet functionality, which ensures that calculations automatically update in the
   light of changes elsewhere in the workbook, the output from the add-in is not dynamically linked to the
   source data. Hence if any of the data change, the add-in must be run again to obtain updated output.
2. Output from the add-in can be misleading (see http://support.microsoft.com/kb/829252 for example).

There are other commercially available add-ins that make use of Excel's familiar user interface but
supplement its statistical functionality. Examples include:
Analyse-it   http://www.analyse-it.com/
RExcel       http://rcom.univie.ac.at/
Unistat      http://www.unistat.com/
XLSTAT       http://www.xlstat.com/en/home/
StatTools    http://www.palisade.com/stattools/

Using this leaflet
Suppose you have a sample of three data values, 10.4, 11.2 and 16.4, that you have entered into cells A2:A4 on a
worksheet. In Excel a function, e.g. SUM, can be applied to these data in one of four ways:
=SUM(10.4,11.2,16.4)
=SUM(A2,A3,A4)
=SUM(A2:A4)
=SUM(x)    where x is the name attached to range A2:A4.
In this leaflet, for simplicity, we have chosen to refer to named
ranges. To name a range, simply highlight the range of cells, click in
the Name Box on the far left of the Formula Bar, type in the required
name, e.g. x, then press Enter. In Excel 2007 names can be managed
via Formulas > Name Manager.

If you prefer not to use names then in what follows simply replace the
name of the range, e.g. x, by the range address, e.g. A2:A4.

Descriptive Statistics
Assuming a sample of data in range x
Sample total, Σx                  =SUM(x)
Sample size, n                    =COUNT(x)
Sample mean, Σx/n                 =AVERAGE(x)
Sample variance, s²               =VAR(x)
Sample standard deviation, s      =STDEV(x)
Mean squared deviation            =VARP(x)
Root mean squared deviation       =STDEVP(x)
Corrected sum of squares, Sxx     =DEVSQ(x)
Raw sum of squares, Σx²           =SUMSQ(x)
Minimum value                     =MIN(x)
Maximum value                     =MAX(x)
Range                             =MAX(x)-MIN(x)
Lower quartile, Q1*               =QUARTILE(x,1)
Median, Q2                        =MEDIAN(x)
Upper quartile, Q3*               =QUARTILE(x,3)
Interquartile range, IQR          =QUARTILE(x,3)-QUARTILE(x,1)
Kth percentile                    =PERCENTILE(x,K%)    where K is a number between 0 and 100
Mode                              =MODE(x)

*Note: There are several different definitions for the upper and lower quartiles, so the values calculated by
Excel may not agree with your textbook or other statistical calculation tools.
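
As a cross-check outside Excel, the sketch below recomputes a few of these summaries in Python for the three sample values 10.4, 11.2 and 16.4. It is only an illustration, assuming the standard library statistics module; the variable names are not part of the leaflet. It shows in particular that VAR divides the corrected sum of squares by n - 1 whereas VARP divides by n.

```python
import math
import statistics

x = [10.4, 11.2, 16.4]  # the three sample values from "Using this leaflet"

n = len(x)                                # =COUNT(x)
mean = statistics.mean(x)                 # =AVERAGE(x)
sxx = sum((v - mean) ** 2 for v in x)     # =DEVSQ(x)
raw_ss = sum(v * v for v in x)            # =SUMSQ(x)
var_sample = statistics.variance(x)       # =VAR(x), divisor n - 1
var_pop = statistics.pvariance(x)         # =VARP(x), divisor n
sd_sample = math.sqrt(var_sample)         # =STDEV(x)

print(f"mean={mean:.4f}  VAR={var_sample:.4f}  VARP={var_pop:.4f}")
```

Quartiles are deliberately left out of the sketch, because different tools use different quartile definitions.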

Boxplot    See http://www.coventry.ac.uk/ec/~nhunt/boxplot.htm

Grouped Frequency Data
Assuming a frequency distribution with class midpoints stored in range x and frequencies in range f:
Sample size, n                    =SUM(f)
Sample total, Σfx                 =SUMPRODUCT(f,x)
Sample mean, Σfx/n                =SUMPRODUCT(f,x)/SUM(f)
Corrected sum of squares, Sxx     =SUMPRODUCT(f,x,x)-SUMPRODUCT(f,x)^2/SUM(f)
Sample variance, s²               =(SUMPRODUCT(f,x,x)-SUMPRODUCT(f,x)^2/SUM(f))/(SUM(f)-1)
Sample standard deviation, s      =SQRT(Sample variance)
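
The grouped-data formulas can be checked with a short Python sketch. The midpoints and frequencies below are hypothetical, chosen only for illustration. The last lines expand the table into individual values and confirm that the grouped variance agrees with the ordinary sample variance of the expanded data.

```python
import math
import statistics

# Hypothetical grouped data: class midpoints x and frequencies f.
x = [5.0, 15.0, 25.0, 35.0]
f = [3, 7, 8, 2]

n = sum(f)                                            # =SUM(f)
total = sum(fi * xi for fi, xi in zip(f, x))          # =SUMPRODUCT(f,x)
mean = total / n                                      # =SUMPRODUCT(f,x)/SUM(f)
sxx = sum(fi * xi * xi for fi, xi in zip(f, x)) - total ** 2 / n
variance = sxx / (n - 1)
sd = math.sqrt(variance)

# Cross-check: one copy of each midpoint per unit of frequency.
expanded = [xi for fi, xi in zip(f, x) for _ in range(fi)]
print(math.isclose(variance, statistics.variance(expanded)))
```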

Graphical Representations
Excel offers a wide range of chart types for displaying data. Many of these are over-elaborate. In
particular, 3-D effects can be misleading and should be avoided.
In Excel 2007 to construct a chart for your data:
1. Select the range containing your data, including any row or column labels.
2. On the main ribbon, click on the Insert tab.
3. Under the Charts group of icons, select the chart type required, then the preferred chart subtype.
4. Under Chart Tools on the main ribbon, use the Design, Layout and Format tabs to customise the chart.
In earlier versions of Excel, select the data range and then Insert > Chart to invoke the Chart Wizard.

Permutations and Combinations
Number of different combinations of m objects selected from n objects
nCm    =COMBIN(n,m)

Number of different permutations of m objects selected from n objects
nPm    =PERMUT(n,m)
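
For comparison, Python's math module (version 3.8 or later) offers the same two counts. The values of n and m below are hypothetical. The final line illustrates the relationship between the two counts: each combination of m objects can be ordered in m! ways.

```python
import math

n, m = 10, 3  # hypothetical values

combinations = math.comb(n, m)   # counterpart of =COMBIN(n,m)
permutations = math.perm(n, m)   # counterpart of =PERMUT(n,m)

# Each unordered selection of m objects can be arranged in m! orders.
print(permutations == combinations * math.factorial(m))
```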

Standard Probability Distributions
Assuming a random variable X and constants a and b

Binomial    Bin(n,p)

P(X = a)    =BINOMDIST(a,n,p,FALSE)

P(X ≤ a)    =BINOMDIST(a,n,p,TRUE)

Geometric    Geom(p)

P(X = a)    =BINOMDIST(1,a,p,FALSE)/a

P(X ≤ a)    =1-BINOMDIST(0,a,p,FALSE)
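
The geometric formulas borrow the binomial function because the binomial probability of exactly one success in a trials is a·p·(1-p)^(a-1), which is a times the geometric probability. The Python sketch below checks both identities numerically. The helper binom_pmf and the values of p and a are illustrative assumptions, not part of the leaflet.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for Bin(n, p); stands in for BINOMDIST(k, n, p, FALSE)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

p, a = 0.3, 4  # hypothetical success probability and trial number

geom_pmf_direct = p * (1 - p) ** (a - 1)        # first success on trial a
geom_pmf_via_binom = binom_pmf(1, a, p) / a     # =BINOMDIST(1,a,p,FALSE)/a

geom_cdf_direct = 1 - (1 - p) ** a              # first success on or before trial a
geom_cdf_via_binom = 1 - binom_pmf(0, a, p)     # =1-BINOMDIST(0,a,p,FALSE)

print(geom_pmf_via_binom, geom_cdf_via_binom)
```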



Poisson    Po(λ)

P(X = a)    =POISSON(a,lambda,FALSE)

P(X ≤ a)    =POISSON(a,lambda,TRUE)

Pascal    Pasc(n,p)

P(X = a)    =NEGBINOMDIST(a-n,n,p)

P(X ≤ a)    =BETADIST(p,n,a-n+1)/BETADIST(1,n,a-n+1)

Normal    N(μ, σ²)

f(a)            =NORMDIST(a,mu,sigma,FALSE)

P(X ≤ a)        =NORMDIST(a,mu,sigma,TRUE)

P(a ≤ X ≤ b)    =NORMDIST(b,mu,sigma,TRUE)-NORMDIST(a,mu,sigma,TRUE)

P(X ≥ b)        =1-NORMDIST(b,mu,sigma,TRUE)
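
The normal probabilities can be reproduced with statistics.NormalDist from the Python standard library (version 3.8 or later). This is a sketch only; mu, sigma, a and b are hypothetical values chosen so that a lies one standard deviation below the mean and b two above it.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # hypothetical mean and standard deviation
a, b = 85.0, 130.0        # hypothetical constants

X = NormalDist(mu, sigma)

density_at_a = X.pdf(a)             # =NORMDIST(a,mu,sigma,FALSE)
p_le_a = X.cdf(a)                   # =NORMDIST(a,mu,sigma,TRUE)
p_between = X.cdf(b) - X.cdf(a)     # P(a <= X <= b)
p_ge_b = 1 - X.cdf(b)               # =1-NORMDIST(b,mu,sigma,TRUE)

print(round(p_le_a, 4), round(p_between, 4), round(p_ge_b, 4))
```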



Exponential    Expon(θ)

f(a)            =EXPONDIST(a,theta,FALSE)

P(X ≤ a)        =EXPONDIST(a,theta,TRUE)

P(a ≤ X ≤ b)    =EXP(-a*theta)-EXP(-b*theta)

P(X ≥ b)        =EXP(-b*theta)

Gamma    Ga(α, β)

f(a)            =GAMMADIST(a,alpha,beta,FALSE)

P(X ≤ a)        =GAMMADIST(a,alpha,beta,TRUE)

P(a ≤ X ≤ b)    =GAMMADIST(b,alpha,beta,TRUE)-GAMMADIST(a,alpha,beta,TRUE)

P(X ≥ b)        =1-GAMMADIST(b,alpha,beta,TRUE)

Test Statistics for Popular Significance Tests

One-sample test of a mean
Assuming a sample of data in range x, drawn from a population with mean μ and standard deviation σ:
H0: μ = μ0    H1: μ ≠ μ0
Test statistic, z    =(AVERAGE(x)-mu0)/(sigma/SQRT(COUNT(x)))       assuming σ known
Test statistic, t    =(AVERAGE(x)-mu0)/(STDEV(x)/SQRT(COUNT(x)))    assuming σ unknown

One-sample test of a variance
Assuming a sample of data in range x, drawn from a population with mean μ and standard deviation σ:
H0: σ² = σ0²    H1: σ² > σ0²
Test statistic, χ²    =DEVSQ(x)/sigma0^2

Two-sample test of difference between means
Assuming two samples of data in ranges x and y, drawn from populations with means μ1 and μ2 and equal
variances:
H0: μ1 - μ2 = c    H1: μ1 - μ2 ≠ c
Estimate the unknown common standard deviation by the pooled estimate:
s                    =SQRT((DEVSQ(x)+DEVSQ(y))/(COUNT(x)+COUNT(y)-2))
Test statistic, t    =(AVERAGE(x)-AVERAGE(y)-c)/(s*SQRT(1/COUNT(x)+1/COUNT(y)))
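
The pooled two-sample calculation can be sketched in Python as follows. Both samples and the hypothesised difference c are hypothetical, and devsq is an illustrative helper that mimics DEVSQ. The pooled variance is a weighted average of the two sample variances, with the degrees of freedom as weights.

```python
import math
import statistics

# Hypothetical samples; c is the hypothesised difference between the means.
x = [10.4, 11.2, 16.4, 12.9, 13.5]
y = [9.8, 10.1, 12.2, 11.0]
c = 0.0

def devsq(data):
    """Sum of squared deviations from the mean, like =DEVSQ(data)."""
    m = statistics.mean(data)
    return sum((v - m) ** 2 for v in data)

nx, ny = len(x), len(y)
df = nx + ny - 2
s_pooled = math.sqrt((devsq(x) + devsq(y)) / df)
t = (statistics.mean(x) - statistics.mean(y) - c) / (s_pooled * math.sqrt(1 / nx + 1 / ny))

print(f"pooled s = {s_pooled:.4f}, t = {t:.4f} on {df} degrees of freedom")
```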

Two-sample test of ratio of variances
Assuming two samples of data in ranges x and y, drawn from populations with variances σ1² and σ2²:
H0: σ1² = σ2²    H1: σ1² > σ2²
Test statistic, F    =VAR(x)/VAR(y)

Chi-squared test of association
Assuming a two-way contingency table of observed frequencies.
H0: row factor independent of column factor
H1: some association between row and column factors
The suggested layout below for a 4x2 table can easily be modified for tables of other sizes.

A1:   =SUM(C3:D6)                           grand total
A3:   =SUM(C3:D3)                           row totals, copy down to A6
C1:   =SUM(C3:C6)                           column totals, copy across to D1
G3:   =$A3*C$1/$A$1                         expected frequencies, copy into G3:H6
C8:   =CHITEST(C3:D6,G3:H6)                 P-value
C9:   =(COUNT(A3:A6)-1)*(COUNT(C1:D1)-1)    degrees of freedom
C10:  =CHIINV(C8,C9)                        test statistic recovered from the P-value
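
The same layout can be mirrored in a few lines of Python. The observed counts below are hypothetical. The sketch computes the expected frequencies, the chi-squared statistic and the degrees of freedom directly; it does not compute the P-value, since the Python standard library has no chi-squared distribution function.

```python
# Hypothetical 4x2 table of observed frequencies (cells C3:D6 in the layout).
observed = [[12, 8], [15, 5], [9, 11], [14, 6]]

row_totals = [sum(row) for row in observed]          # A3:A6
col_totals = [sum(col) for col in zip(*observed)]    # C1:D1
grand_total = sum(row_totals)                        # A1

# Expected frequency = row total * column total / grand total (G3:H6).
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)   # C9

print(f"chi-squared = {chi_squared:.2f} on {df} degrees of freedom")
```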

Critical Values and P-values for Statistical Tests
There are two approaches to conducting significance tests. Some analysts like to compare the test statistic
with the critical value for a given significance level; others prefer to calculate the P-value corresponding to
the test statistic. Excel can be used for either method.
Assuming significance level, α (typically α = 5% or 0.05):

Two-tailed z test
Upper tail critical value    =NORMSINV(1-alpha/2)
P-value for given z          =2*(1-NORMSDIST(ABS(z)))
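
Both approaches for the two-tailed z test can be illustrated with statistics.NormalDist in Python (version 3.8 or later). The observed z value below is hypothetical. The sketch shows that the two routes lead to the same decision: z exceeds the critical value exactly when the P-value falls below alpha.

```python
from statistics import NormalDist

alpha = 0.05
z = 2.1  # hypothetical observed test statistic

standard_normal = NormalDist()  # mean 0, standard deviation 1

critical_value = standard_normal.inv_cdf(1 - alpha / 2)   # =NORMSINV(1-alpha/2)
p_value = 2 * (1 - standard_normal.cdf(abs(z)))           # =2*(1-NORMSDIST(ABS(z)))

reject_by_critical_value = abs(z) > critical_value
reject_by_p_value = p_value < alpha
print(reject_by_critical_value, reject_by_p_value)
```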

Two-tailed t test with v degrees of freedom
Upper tail critical value       =TINV(alpha,v)
P-value for given t             =TDIST(ABS(t),v,2)

One-tailed χ² test with v degrees of freedom
Upper tail critical value       =CHIINV(alpha,v)
P-value for given chisquared    =CHIDIST(chisquared,v)

One-tailed F test with v1 degrees of freedom in
the numerator and v2 in the denominator
Upper tail critical value       =FINV(alpha,v1,v2)
P-value for given F             =FDIST(F,v1,v2)

Confidence Limits
Assuming degree of confidence 100(1-α)% (e.g. for 95% confidence α = 0.05):

One-sample statistics, with data in range x
For μ (σ known)    Lower limit    =AVERAGE(x)-NORMSINV(1-alpha/2)*sigma/SQRT(COUNT(x))
                   or             =AVERAGE(x)-CONFIDENCE(alpha,sigma,COUNT(x))
                   Upper limit    =AVERAGE(x)+NORMSINV(1-alpha/2)*sigma/SQRT(COUNT(x))
                   or             =AVERAGE(x)+CONFIDENCE(alpha,sigma,COUNT(x))
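
The two forms of each limit are equivalent because CONFIDENCE returns the half-width of the interval. A minimal Python sketch, using the three sample values from earlier in the leaflet and a hypothetical known sigma, is:

```python
import math
from statistics import NormalDist, mean

x = [10.4, 11.2, 16.4]   # sample from "Using this leaflet"
sigma = 3.0              # hypothetical known population standard deviation
alpha = 0.05             # 95% confidence

# Half-width of the interval, the quantity =CONFIDENCE(alpha,sigma,COUNT(x)) returns.
half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / math.sqrt(len(x))

lower = mean(x) - half_width
upper = mean(x) + half_width
print(f"{lower:.3f} to {upper:.3f}")
```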

For μ (σ unknown)    Lower limit    =AVERAGE(x)-TINV(alpha,COUNT(x)-1)*STDEV(x)/SQRT(COUNT(x))

                     Upper limit    =AVERAGE(x)+TINV(alpha,COUNT(x)-1)*STDEV(x)/SQRT(COUNT(x))

For σ²               Lower limit    =DEVSQ(x)/CHIINV(alpha/2,COUNT(x)-1)

                     Upper limit    =DEVSQ(x)/CHIINV(1-alpha/2,COUNT(x)-1)

Two-sample statistics, with data for the first sample in range x, and the second sample in range y
For μx - μy (σx known, σy known)
Lower limit
=AVERAGE(x)-AVERAGE(y)-NORMSINV(1-alpha/2)*SQRT(sigmax^2/COUNT(x)+sigmay^2/COUNT(y))
Upper limit
=AVERAGE(x)-AVERAGE(y)+NORMSINV(1-alpha/2)*SQRT(sigmax^2/COUNT(x)+sigmay^2/COUNT(y))

For μx - μy (σx and σy unknown but assumed equal)

Estimate the unknown common standard deviation by the pooled estimate:
s    =SQRT((DEVSQ(x)+DEVSQ(y))/(COUNT(x)+COUNT(y)-2))
Lower limit
=AVERAGE(x)-AVERAGE(y)-TINV(alpha,COUNT(x)+COUNT(y)-2)*s*SQRT(1/COUNT(x)+1/COUNT(y))
Upper limit
=AVERAGE(x)-AVERAGE(y)+TINV(alpha,COUNT(x)+COUNT(y)-2)*s*SQRT(1/COUNT(x)+1/COUNT(y))

For σx²/σy²

Lower limit    =VAR(x)/VAR(y)/FINV(alpha/2,COUNT(x)-1,COUNT(y)-1)
Upper limit    =VAR(x)/VAR(y)/FINV(1-alpha/2,COUNT(x)-1,COUNT(y)-1)

Simple Linear Regression
In Excel Versions 5 and above, a regression line (or trendline) can be added to a scatterplot by right-clicking
on one of the plotted points and selecting Add Trendline from the shortcut menu. Both linear and a variety
of non-linear models may be fitted to the data. The equation of the fitted model may be displayed,
together with the value of the coefficient of determination, R². There are also options to extrapolate the
trendline in either direction, or to force the trendline to have a specific intercept.

The trendline approach is purely graphical. To calculate predictions, regression functions must be used.

Assuming a sample of values of the independent variable in range x, and corresponding values of the
dependent variable in range y:
Least squares estimate of intercept, a     =INTERCEPT(y,x)
Least squares estimate of slope, b         =SLOPE(y,x)
Sxy                                        =SUMPRODUCT(x,y)-COUNT(x)*AVERAGE(x)*AVERAGE(y)
Sxx                                        =DEVSQ(x)
Syy                                        =DEVSQ(y)
Sample covariance, Cov(x,y)                =COVAR(x,y)*COUNT(x)/(COUNT(x)-1)
Estimate of σ, s                           =STEYX(y,x)
Prediction of y at x = x0, ŷ = a + b·x0    =FORECAST(x0,y,x)

Estimated standard error of individual predicted y at x = x0
=STEYX(y,x)*SQRT(1+1/COUNT(x)+(x0-AVERAGE(x))^2/DEVSQ(x))
Estimated standard error of mean predicted y at x = x0
=STEYX(y,x)*SQRT(1/COUNT(x)+(x0-AVERAGE(x))^2/DEVSQ(x))
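
To show how these quantities fit together, the Python sketch below builds the slope, intercept, residual standard deviation and the two standard errors from Sxx, Syy and Sxy. The data and the prediction point x0 are hypothetical. The comments name the Excel function each line corresponds to.

```python
import math

# Hypothetical paired data (not from the leaflet).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
x0 = 6.0  # hypothetical value at which to predict

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)                          # =DEVSQ(x)
syy = sum((yi - mean_y) ** 2 for yi in y)                          # =DEVSQ(y)
sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * mean_x * mean_y   # Sxy

b = sxy / sxx                               # =SLOPE(y,x)
a = mean_y - b * mean_x                     # =INTERCEPT(y,x)
s = math.sqrt((syy - b * sxy) / (n - 2))    # =STEYX(y,x)
y_hat = a + b * x0                          # =FORECAST(x0,y,x)

se_individual = s * math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx)
se_mean = s * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)
print(f"y = {a:.2f} + {b:.2f}x, prediction at x0: {y_hat:.2f}")
```

The standard error for an individual prediction is always the larger of the two, because it also includes the scatter of a single observation about the line.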

Correlation
Assuming two samples of paired data in ranges x and y:
Pearson product moment
correlation coefficient, r    =CORREL(x,y)

Rank Correlation
Assuming two samples of paired data in ranges x and y with no ties:
Rank of ith value in range x    =RANK(INDEX(x,i),x,1)

Assuming two samples of paired data in ranges x and y with some tied values:
Rank of ith value in range x    =(RANK(INDEX(x,i),x,1)-RANK(INDEX(x,i),x,0)+COUNT(x)+1)/2

Assuming that the ranges rx and ry contain the ranks of the data in x and y respectively:
Spearman rank correlation coefficient, rS    =CORREL(rx,ry)


In the example above:
D2:   =RANK(B2,$B$2:$B$7,1)              copy down to D7
E2:   =RANK(B2,$B$2:$B$7,0)              copy down to E7
F2:   =(D2-E2+COUNT($B$2:$B$7)+1)/2      copy down to F7
F9:   =CORREL(C2:C7,F2:F7)               adjusted for ties
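
The tie adjustment works because RANK gives every tied value the best rank in its group, so averaging the ascending rank with the mirror image of the descending rank yields the mid-rank. The Python sketch below imitates this; the function names and the data are hypothetical.

```python
def excel_rank(value, data, ascending):
    """Imitates RANK: one plus the number of values that outrank `value`."""
    if ascending:
        return 1 + sum(v < value for v in data)
    return 1 + sum(v > value for v in data)

def tied_rank(value, data):
    """Mid-rank, following (ascending rank - descending rank + n + 1) / 2."""
    return (excel_rank(value, data, True) - excel_rank(value, data, False) + len(data) + 1) / 2

data = [7, 3, 7, 1, 5, 3]  # hypothetical sample with two pairs of ties
ranks = [tied_rank(v, data) for v in data]
print(ranks)
```

Mid-ranks always add up to n(n+1)/2, the same total as ranks without ties.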

Time Series
The examples below refer to three years of observed quarterly data.
Forecasts are made for a further four quarters (one extra year).
Level only

Simple moving average, period 5
C4:   =AVERAGE(B2:B6)                        copy down to C11
C14:  =C$11                                  copy down to C17
Centred moving average, period 4
D4:   =(AVERAGE(B2:B5)+AVERAGE(B3:B6))/2     copy down to D11
D14:  =D$11                                  copy down to D17
Exponentially weighted moving average (smoothing constant in G2)
E2:   =B2                                    initial level estimate
E3:   =$G$2*B3+(1-$G$2)*E2                   copy down to E13
E14:  =E$13                                  copy down to E17
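
The exponentially weighted moving average recursion can be written as a short Python function. The quarterly figures and the smoothing constant below are hypothetical; alpha plays the role of the constant held in column G of the worksheet.

```python
def ewma(series, alpha):
    """Level estimates: the first equals the first observation, then
    new level = alpha * observation + (1 - alpha) * previous level."""
    levels = [series[0]]
    for obs in series[1:]:
        levels.append(alpha * obs + (1 - alpha) * levels[-1])
    return levels

# Hypothetical quarterly data for three years.
data = [120.0, 135.0, 128.0, 140.0, 125.0, 138.0,
        131.0, 145.0, 129.0, 142.0, 136.0, 150.0]
alpha = 0.3  # hypothetical smoothing constant

levels = ewma(data, alpha)
forecasts = [levels[-1]] * 4  # a level-only model gives a flat forecast
print(round(levels[-1], 2))
```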

The chart was drawn by highlighting B1:B17 and E1:E17 then using Insert > Charts > Line > 2-D Line.

Level and constant trend

C2:   =FORECAST(A2,$B$2:$B$13,$A$2:$A$13)    copy down to C17

Level and changing trend (smoothing constants in F2 and G2)

C2:   =B2                                    initial level estimate
C3:   =$F$2*B3+(1-$F$2)*(C2+D2)              copy down to C13
D2:   =B3-B2                                 initial trend estimate
D3:   =$G$2*(C3-C2)+(1-$G$2)*D2              copy down to D13
E3:   =C2+D2                                 copy down to E13
E14:  =C$13+(A14-A$13)*D$13                  copy down to E17
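
The level and changing trend recursion, often called Holt's method, can be sketched in Python as below. The function names, data and smoothing constants are hypothetical; alpha and gamma correspond to the two constants held in columns F and G of the worksheet.

```python
def holt(series, alpha, gamma):
    """Returns the level and trend estimates for each period."""
    level = [series[0]]                 # initial level: first observation
    trend = [series[1] - series[0]]     # initial trend: first difference
    for obs in series[1:]:
        one_step_forecast = level[-1] + trend[-1]
        new_level = alpha * obs + (1 - alpha) * one_step_forecast
        new_trend = gamma * (new_level - level[-1]) + (1 - gamma) * trend[-1]
        level.append(new_level)
        trend.append(new_trend)
    return level, trend

def holt_forecast(level, trend, steps_ahead):
    """Forecast: last level plus steps_ahead times the last trend."""
    return level[-1] + steps_ahead * trend[-1]

# Hypothetical quarterly data with an upward drift.
data = [100.0, 104.0, 109.0, 111.0, 117.0, 120.0,
        126.0, 129.0, 133.0, 139.0, 142.0, 147.0]
level, trend = holt(data, alpha=0.4, gamma=0.3)
forecasts = [holt_forecast(level, trend, h) for h in range(1, 5)]
print([round(f, 1) for f in forecasts])
```

On perfectly linear data the method reproduces the line exactly, which is a convenient sanity check.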

Level, changing trend and seasonality

C5:   =AVERAGE(B2:B5)                        initial level estimate
C6:   =G$2*B6/E2+(1-G$2)*(C5+D5)             copy down to C13
D5:   =(AVERAGE(B6:B9)-C5)/4                 initial trend estimate
D6:   =H$2*(C6-C5)+(1-H$2)*D5                copy down to D13
E2:   =B2/C$5                                copy down to E5, initial seasonal estimates
E6:   =I$2*B6/C6+(1-I$2)*E2                  copy down to E13
F6:   =(C5+D5)*E2                            copy down to F13
F14:  =(C$13+(A14-A$13)*D$13)*E10            copy down to F17
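
The recursion with multiplicative seasonal factors can be sketched in Python as follows. It follows the same initial estimates and update order as the worksheet formulas. The function name, data and smoothing constants are hypothetical; alpha, beta and gamma correspond to the constants held in G2, H2 and I2.

```python
def holt_winters(series, alpha, beta, gamma, period=4):
    """Multiplicative seasonal smoothing with the worksheet's initial estimates."""
    level = sum(series[:period]) / period                               # mean of first year
    trend = (sum(series[period:2 * period]) / period - level) / period  # change per quarter
    seasonal = [obs / level for obs in series[:period]]                 # initial factors
    for t in range(period, len(series)):
        prev_level, prev_trend = level, trend
        old_factor = seasonal[t - period]
        level = alpha * series[t] / old_factor + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonal.append(gamma * series[t] / level + (1 - gamma) * old_factor)
    n = len(series)
    # Forecast h quarters ahead with the latest factor for that quarter.
    forecasts = [(level + h * trend) * seasonal[n - period + h - 1]
                 for h in range(1, period + 1)]
    return level, trend, seasonal, forecasts

# Hypothetical quarterly data for three years.
data = [80.0, 125.0, 100.0, 95.0, 88.0, 135.0,
        108.0, 104.0, 95.0, 148.0, 117.0, 112.0]
level, trend, seasonal, forecasts = holt_winters(data, alpha=0.3, beta=0.2, gamma=0.1)
print([round(f, 1) for f in forecasts])
```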

Version 9, 9 June 2009
