8 Probability distributions
8.2 Examining the distribution of a set of data
Given a (univariate) set of data we can examine its distribution in a large number of ways. The simplest is to examine the numbers. Two slightly different summaries are given by summary and fivenum, and a display of the numbers by stem (a "stem and leaf" plot):
> attach(faithful)
> summary(eruptions)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.600 2.163 4.000 3.488 4.454 5.100
> fivenum(eruptions)
[1] 1.6000 2.1585 4.0000 4.4585 5.1000
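The small discrepancies between the two summaries (2.163 versus 2.1585, for example) arise because summary computes quartiles with quantile's default rule, while fivenum returns Tukey's hinges. A quick illustrative check (a sketch, not part of the original example):
> quantile(eruptions, c(0.25, 0.75))   # quartiles as used by summary()
> fivenum(eruptions)[c(2, 4)]          # lower and upper hinges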
> stem(eruptions)

  The decimal point is 1 digit(s) to the left of the |

16 | 070355555588
18 | 000022233333335577777777888822335777888
20 | 00002223378800035778
22 | 0002335578023578
24 | 00228
26 | 23
28 | 080
30 | 7
32 | 2337
34 | 250077
36 | 0000823577
38 | 2333335582225577
40 | 0000003357788888002233555577778
42 | 03335555778800233333555577778
44 | 02222335557780000000023333357778888
46 | 0000233357700000023578
48 | 00000022335800333
50 | 0370
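The amount of detail in the display can be tuned with stem's scale argument; for example (an illustrative variation, not shown in the text):
> stem(eruptions, scale = 2)   # expand the display to about twice the default length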
A stem-and-leaf plot is like a histogram, and R has a function hist to plot histograms.
> hist(eruptions)
## make the bins smaller, make a plot of density
> hist(eruptions, seq(1.6, 5.2, 0.2), prob=TRUE)
> lines(density(eruptions, bw=0.1))
> rug(eruptions) # show the actual data points
More elegant density plots can be made by density, and we added a line produced by
density in this example. The bandwidth bw was chosen by trial-and-error as the default gives
too much smoothing (it usually does for “interesting” densities). (Better automated methods of
bandwidth choice are available, and in this example bw = "SJ" gives a good result.)
[Figure: Histogram of eruptions on the relative frequency scale, with the density estimate and rug overlaid; x-axis: eruptions, y-axis: Relative Frequency.]
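For instance, the Sheather-Jones choice could be overlaid for comparison (a sketch along the lines of the commands above):
> hist(eruptions, seq(1.6, 5.2, 0.2), prob=TRUE)
> lines(density(eruptions, bw = "SJ"), lty = 2)   # automatic bandwidth choice
> rug(eruptions)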
We can plot the empirical cumulative distribution function by using the function ecdf.
> plot(ecdf(eruptions), do.points=FALSE, verticals=TRUE)
This distribution is obviously far from any standard distribution. How about the right-hand
mode, say eruptions longer than 3 minutes? Let us fit a normal distribution and overlay the
fitted CDF.
> long <- eruptions[eruptions > 3]
> plot(ecdf(long), do.points=FALSE, verticals=TRUE)
> x <- seq(3, 5.4, 0.01)
> lines(x, pnorm(x, mean=mean(long), sd=sqrt(var(long))), lty=3)
[Figure: empirical CDF of long, Fn(x) against x, with the fitted normal CDF shown as a dotted line.]
Quantile-quantile (Q-Q) plots can help us examine this more carefully.
par(pty="s")       # arrange for a square figure region
qqnorm(long); qqline(long)
which shows a reasonable fit but a shorter right tail than one would expect from a normal
distribution. Let us compare this with some simulated data from a t distribution
[Figure: Normal Q-Q plot of long, with Theoretical Quantiles on the x-axis.]
x <- rt(250, df = 5)
qqnorm(x); qqline(x)
which will usually (if it is a random sample) show longer tails than expected for a normal. We
can make a Q-Q plot against the generating distribution by
qqplot(qt(ppoints(250), df = 5), x, xlab = "Q-Q plot for t dsn")
qqline(x)
Finally, we might want a more formal test of agreement with normality (or not). R provides
the Shapiro-Wilk test
> shapiro.test(long)
data: long
W = 0.9793, p-value = 0.01052
and the Kolmogorov-Smirnov test
> ks.test(long, "pnorm", mean = mean(long), sd = sqrt(var(long)))
data: long
D = 0.0661, p-value = 0.4284
alternative hypothesis: two-sided
(Note that the distribution theory is not valid here as we have estimated the parameters of the
normal distribution from the same sample.)
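One way around this difficulty (a sketch of a parametric bootstrap in the spirit of the Lilliefors test, not part of the original example) is to recompute the statistic on samples drawn from the fitted normal, re-estimating the parameters each time:
> D.obs <- ks.test(long, "pnorm", mean = mean(long), sd = sd(long))$statistic
> D.sim <- replicate(1000, {
+     y <- rnorm(length(long), mean = mean(long), sd = sd(long))
+     ks.test(y, "pnorm", mean = mean(y), sd = sd(y))$statistic
+ })
> mean(D.sim >= D.obs)   # a p-value that allows for the parameter estimation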
8.3 One- and two-sample tests
So far we have compared a single sample to a normal distribution. A more common operation is to compare aspects of two samples. Consider the following sets of data on the latent heat of the fusion of ice (cal/gm), obtained by two different methods. Boxplots provide a simple graphical comparison of the two samples.
A <- scan()
79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97 80.05 80.03 80.02 80.00 80.02

B <- scan()
80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97
boxplot(A, B)
[Figure: side-by-side boxplots of the two samples (1 = A, 2 = B); y-axis from 79.94 to 80.04.]
which indicates that the first group tends to give higher results than the second.
To test for the equality of the means of the two samples, we can use an unpaired t-test by
> t.test(A, B)
data: A and B
t = 3.2499, df = 12.027, p-value = 0.00694
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.01385526 0.07018320
sample estimates:
mean of x mean of y
80.02077 79.97875
which does indicate a significant difference, assuming normality. By default the R function does
not assume equality of variances in the two samples (in contrast to the similar S-Plus t.test
function). We can use the F test to test for equality in the variances, provided that the two
samples are from normal populations.
> var.test(A, B)
data: A and B
F = 0.5837, num df = 12, denom df = 7, p-value = 0.3938
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.1251097 2.1052687
sample estimates:
ratio of variances
0.5837405
which shows no evidence of a significant difference, and so we can use the classical t-test that
assumes equality of the variances.
> t.test(A, B, var.equal=TRUE)
data: A and B
t = 3.4722, df = 19, p-value = 0.002551
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.01669058 0.06734788
sample estimates:
mean of x mean of y
80.02077 79.97875
All these tests assume normality of the two samples. The two-sample Wilcoxon (or Mann-Whitney) test only assumes a common continuous distribution under the null hypothesis.
> wilcox.test(A, B)
data: A and B
W = 89, p-value = 0.007497
alternative hypothesis: true location shift is not equal to 0
Warning message:
Cannot compute exact p-value with ties in: wilcox.test(A, B)
Note the warning: there are several ties in each sample, which suggests strongly that these data
are from a discrete distribution (probably due to rounding).
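The ties are easy to see directly; as a quick illustrative check:
> table(c(A, B))   # counts greater than 1 show the tied values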
There are several ways to compare graphically the two samples. We have already seen a pair
of boxplots. The following
> plot(ecdf(A), do.points=FALSE, verticals=TRUE, xlim=range(A, B))
> plot(ecdf(B), do.points=FALSE, verticals=TRUE, add=TRUE)
will show the two empirical CDFs, and qqplot will perform a Q-Q plot of the two samples. The
Kolmogorov-Smirnov test is of the maximal vertical distance between the two ecdf’s, assuming
a common continuous distribution:
> ks.test(A, B)
data: A and B
D = 0.5962, p-value = 0.05919
alternative hypothesis: two-sided
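The two-sample Q-Q plot mentioned above can be drawn by, for example:
> qqplot(A, B)
> abline(0, 1, lty = 3)   # points near this line suggest a common distribution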