r Programming 1

Chapter 6 discusses discrete and continuous probability distributions, focusing on their applications and how to determine probabilities in various scenarios. It covers discrete probability distributions, including the discrete uniform distribution and the binomial distribution, along with their properties, mean, variance, and how to generate random samples using R. The chapter provides examples and R commands for simulating these distributions and calculating probabilities.

Uploaded by

devankshisharma
Copyright
© All Rights Reserved
CHAPTER-6

PROBABILITIES AND DISTRIBUTIONS IN R


Some common standard discrete and continuous probability distributions which
are widely used for either practical applications or constructing statistical methods are
discussed in this chapter. Suppose we are interested in determining the probability of a
certain event. The determination of probabilities depends upon the nature of the study and
various prevailing conditions which affect it. For example, the determination of the
probability of a head when tossing a coin is different from the determination of the
probability of rain in the afternoon. One can speculate that some mathematical functions
can be defined which depict the behaviour of probabilities under different situations.
Such functions have special properties and describe how probabilities are distributed
under different conditions. We now discuss some distributions which are very useful.
These distributions are based on certain assumptions. Using these distributions,
predictions can be made on theoretical grounds.
6.1 Discrete Probability Distributions:
If the support of the random variable is a finite or countably infinite number of
values, then the random variable is discrete. Discrete random variables have a probability
mass function (pmf). This pmf gives the probability that a random variable will take on
each value in its support. The cumulative distribution function (cdf) provides the
probability that the random variable is less than or equal to a particular value. The quantile
function is the inverse of the cumulative distribution function, i.e. you provide a
probability and the quantile function returns the value of the random variable such that
the cdf will return that probability.
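
The relationship between the pmf, the cdf and the quantile function can be checked numerically. The sketch below assumes a fair six-sided die as the example; the names support, pmf, cdf and quantile_fn are illustrative and do not come from the text.

```r
# Assumed example: a fair six-sided die with support 1, ..., 6
support <- 1:6
pmf <- rep(1 / 6, 6)      # P(X = x) for each value in the support
cdf <- cumsum(pmf)        # P(X <= x): running sum of the pmf

# Quantile function: smallest value x in the support with cdf(x) >= prob.
# The small tolerance guards against floating-point rounding in cumsum().
quantile_fn <- function(prob) support[which(cdf >= prob - 1e-12)[1]]

cdf[3]             # P(X <= 3)
quantile_fn(0.5)   # smallest x with P(X <= x) >= 0.5
```

Feeding the value cdf[3] back into quantile_fn() returns 3, which is what "inverse of the cdf" means for a discrete variable.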

104/ Rfor data science


6.1.1 Discrete Uniform Distribution:

The probability distribution in which the random variable assumes all its values
with equal probabilities is known as the discrete uniform distribution. If the random variable
X assumes the values $x_1, x_2, \ldots, x_k$ with equal probabilities, then its probability function
is given by
$$P(X = x_i) = \frac{1}{k}, \quad i = 1, 2, \ldots, k.$$
If the outcomes are the natural numbers $x_i = i$ $(i = 1, 2, \ldots, k)$, the mean and
variance of X are obtained as
$$E(X) = \frac{k+1}{2},$$
$$\mathrm{Var}(X) = \frac{1}{12}\,(k^2 - 1).$$
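
The mean and variance formulas can be checked directly from the definition of expectation. This is a minimal sketch assuming k = 6 (a die); the variable names are illustrative.

```r
k <- 6                  # assumed example: outcomes 1, 2, ..., 6
x <- 1:k
p <- rep(1 / k, k)      # equal probabilities 1/k

mean_x <- sum(x * p)                # E(X) from the definition
var_x  <- sum((x - mean_x)^2 * p)   # Var(X) from the definition

c(mean_x, (k + 1) / 2)      # the two entries should agree
c(var_x, (k^2 - 1) / 12)    # the two entries should agree
```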
How to do it with R

To generate random numbers from a discrete uniform distribution, the sample() function is
used in R. The general syntax to simulate a discrete uniform random variable is
> sample(x, size, replace = TRUE)
The argument x identifies the numbers from which to randomly sample. If x is a
number, then sampling is done from 1 to x. The argument size tells how big the sample
size should be, and replace tells whether or not numbers should be replaced in the urn
after having been sampled. The default option is replace = FALSE but for discrete
uniforms the sampled values should be replaced.
Example: To flip a fair coin 500 times, do
> set.seed(1234)
> sample(c("H", "T"), size = 500, replace = TRUE)
The command generates a random sample of size 500 from a uniform distribution with
the two possible outcomes "H" and "T". It is necessary to use the option replace = TRUE
to simulate draws with replacement, i.e. to guarantee that a value can occur more than
once. One can choose an integer at random with the sample function. The use of the
set.seed() function allows the generated random numbers to be reproduced at any time.
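
A long vector of "H" and "T" values is hard to read, so it usually helps to summarise it. The sketch below stores the flips in a variable (flips, counts and props are illustrative names, not from the text) and tabulates relative frequencies, which for a fair coin should both be close to 0.5.

```r
set.seed(1234)   # same seed as in the example, for reproducibility
flips <- sample(c("H", "T"), size = 500, replace = TRUE)

counts <- table(flips)         # number of heads and tails
props  <- prop.table(counts)   # relative frequencies, expected near 0.5 each
props
```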

> set.seed(1234)
> sample(c("H", "T"), size = 500, replace = TRUE)
[Console output: a character vector of 500 values, each "H" or "T", printed in rows of 20 indexed [1], [21], [41], ...]
6.1.2 The Binomial Distribution
A series of independent trials, each of which can result in one of two mutually
exclusive outcomes called success or failure, such that the probability of success (or
failure) in each trial is constant, are called Bernoulli trials. If we perform a series of
n Bernoulli trials such that, for each trial, p is the probability of success and q is the
probability of failure (p + q = 1), then the probability of x successes in a series of
n independent trials is given by
$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n. \qquad (1)$$
This is known as the binomial distribution.
Definition: A discrete random variable X is said to follow a binomial distribution with
parameters n and p if its PMF is given by (1). We also write X ~ B(n, p). The mean and
variance of a binomial random variable X are given by
E(X) = np,
Var(X) = np(1 - p)
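
The PMF formula and the expressions for the mean and variance can be verified numerically. The sketch below assumes n = 50 and p = 0.7 (the values used in the example later in this section) and compares the hand-written formula with R's dbinom().

```r
n <- 50
p <- 0.7
x <- 0:n

pmf_formula <- choose(n, x) * p^x * (1 - p)^(n - x)   # formula (1) written out
pmf_builtin <- dbinom(x, size = n, prob = p)          # R's built-in PMF

mean_x <- sum(x * pmf_builtin)                # E(X) from the definition
var_x  <- sum((x - mean_x)^2 * pmf_builtin)   # Var(X) from the definition

all.equal(pmf_formula, pmf_builtin)   # TRUE if the two PMFs agree
c(mean_x, n * p)                      # both entries should agree
c(var_x, n * p * (1 - p))             # both entries should agree
```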
To work with the binomial distribution in R, the following commands are used:
dbinom(x, size, prob, log = FALSE): The prefix d stands for "density". This
function is used to find the probability at a particular value for data that follow a
binomial distribution, i.e. it finds P(X = x).
pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE): The function pbinom() is
used to find the cumulative probability of data following a binomial distribution up to
a given value, i.e. it finds P(X <= x).
qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE): This function is
used to find the quantile for a given probability, that is, if P(X <= x) is given, it finds x.
rbinom(n, size, prob): This function generates n random values from a binomial
distribution with the given size and probability.
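
The four functions fit together as follows. This is a small sketch assuming n = 50 and p = 0.7; the seed and the variable names are arbitrary choices for illustration.

```r
n <- 50
p <- 0.7

cdf_30   <- pbinom(30, size = n, prob = p)                      # P(X <= 30)
upper_30 <- pbinom(30, size = n, prob = p, lower.tail = FALSE)  # P(X > 30)
sum_pmf  <- sum(dbinom(0:30, size = n, prob = p))               # cdf as a sum of pmf values
median_x <- qbinom(0.5, size = n, prob = p)                     # smallest x with P(X <= x) >= 0.5

set.seed(1)                              # arbitrary seed, for reproducibility
draws <- rbinom(5, size = n, prob = p)   # five random counts between 0 and 50

c(cdf_30, sum_pmf)    # the cdf equals the summed pmf
cdf_30 + upper_30     # lower and upper tails add up to 1
```

Setting lower.tail = FALSE is generally preferable to computing 1 - pbinom(...) by hand when the upper tail is very small, because it avoids subtracting two nearly equal numbers.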

For example, consider an experiment of rolling an unfair die 50 times with the
probability of success of 0.7. We can use the pmf to calculate the probability of a
particular outcome of the experiment. What is the probability of seeing 6 successes? We
can use the dbinom function as given
below:
> n <- 50
> p <- 0.7
> dbinom(6, size = n, prob = p)
[1] 1.841054e-17
> x <- 0:n
> plot(x, dbinom(x, size = n, prob = p), main = "Probability mass function for Bin(50, 0.7)")

[Figure: probability mass function for Bin(50, 0.7); x-axis: x, y-axis: dbinom(x, size = n, prob = p)]

Figure 6.1 Binomial PMF plot

A plot of the PMF of a binomial distribution with n = 50 and p = 0.7 (i.e. B(50, 0.7))
is given in Figure 6.1. If we want to calculate the probability of observing an
outcome less than or equal to a particular value, we can use the cumulative distribution
function.

We can use the pbinom(x, n, p) command, where the prefix p stands for probability,
to calculate the CDF at any point. For example, what is the probability of observing 30
or fewer successes? Use the code given
below:
> pbinom(30, size = n, prob = p)
[1] 0.0848026
> plot(x, pbinom(x, size = n, prob = p), type = "s", main = "Cumulative distribution function for Bin(50,0.7)")

[Figure: cumulative distribution function for Bin(50,0.7); x-axis: x, y-axis: pbinom(x, size = n, prob = p)]

Figure 6.2 Binomial CDF plot


For a discrete random variable, the cdf is a step function since the
function jumps whenever it comes across a value in the support for the random variable.
To plot a CDF for a binomial distribution in R we use the plot() function as illustrated in
Figure 6.2.
Here is the quantile function using the qbinom function. Figure 6.3 illustrates the
quantile function of B(50,0.7).
> p_seq <- seq(from = 0, to = 1, length = 101)
> plot(p_seq, qbinom(p_seq, size = n, prob = p), type = "s", main = "Quantile function for Bin(50,0.7)")



How to do it in R
> p_seq <- seq(from = 0, to = 1, length = 101)
> p_seq
  [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
 [11] 0.10 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19
 [21] 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
 [31] 0.30 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39
 [41] 0.40 0.41 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49
 [51] 0.50 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59
 [61] 0.60 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69
 [71] 0.70 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79
 [81] 0.80 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89
 [91] 0.90 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99
[101] 1.00
> plot(p_seq, qbinom(p_seq, size = n, prob = p), type = "s", main = "Quantile function for Bin(50,0.7)")
[Figure: quantile function for Bin(50,0.7); x-axis: p_seq, y-axis: qbinom(p_seq, size = n, prob = p)]

Figure 6.3 Binomial quantile plot


Finally, we can draw random values from this distribution using the rbinom function.
# Generating a random sample from a binomial distribution with n=50 and p=0.7
> draws <- rbinom(100, size = n, prob = p)
> brks <- (0:(n+1)) - 0.5
> hist(draws, breaks = brks, main = "Random draws from Bin(50,0.7)")
Rather than having the number of draws, we often want the percentage of draws. For
reference, we can add the true probabilities using the pmf (shown in red) in
Figure 6.4.
> hist(draws, breaks = brks, probability = TRUE)
> points(x, dbinom(x, size = n, prob = p), col = "red")
[Figure: histogram of draws with Density on the y-axis and the binomial pmf overlaid as points]

Figure 6.4 Histogram of a binomial distribution


6.1.3 Poisson distribution
The Poisson distribution is a discrete probability function, which means the variable can
only take specific values in a given list of numbers, possibly infinite. A Poisson
distribution measures how many times an event is likely to occur within "x" period of
time. In other words, we can define it as the probability distribution that results from the
Poisson experiment. A Poisson experiment is a statistical experiment that classifies the
experiment into two categories, such as success or failure. The Poisson distribution is a
limiting process of the binomial distribution. A Poisson random variable "x" defines the
number of successes in the experiment. This distribution occurs when there are events that
do not occur as the outcomes of a definite number of outcomes. The Poisson distribution is
used under certain conditions. They are:
The number of trials "n" tends to infinity
Probability of success "p" tends to zero
np = λ is finite
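
The limiting relationship with the binomial distribution can be illustrated numerically. The sketch below assumes λ = 2 and a few illustrative values of n; it keeps np = λ fixed and measures how far the binomial PMF is from the Poisson PMF.

```r
lambda <- 2             # assumed rate; np = lambda is held fixed
x <- 0:10
n_values <- c(10, 100, 10000)

# Largest absolute gap between Bin(n, lambda / n) and Po(lambda) over x = 0, ..., 10
max_diff <- sapply(n_values, function(n) {
  max(abs(dbinom(x, size = n, prob = lambda / n) - dpois(x, lambda = lambda)))
})

max_diff   # the gap shrinks as n grows and p = lambda / n tends to zero
```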
A discrete random variable X is said to follow a Poisson distribution with parameter
$\lambda > 0$ if its PMF is given by
$$P(X = x) = \frac{\lambda^x}{x!} \exp(-\lambda) \quad (x = 0, 1, 2, \ldots).$$
We also write X ~ Po($\lambda$). The mean and variance of a Poisson random variable are
identical:
$$E(X) = \mathrm{Var}(X) = \lambda.$$
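
The equality of the mean and the variance can be checked from the PMF. This is a sketch assuming λ = 2; the support is truncated at 100, an assumption that is harmless here because the probability mass beyond that point is negligible.

```r
lambda <- 2
x <- 0:100              # truncated support (assumption: the tail beyond 100 is negligible)

pmf_formula <- lambda^x * exp(-lambda) / factorial(x)   # PMF written out
pmf_builtin <- dpois(x, lambda = lambda)                # R's built-in PMF

mean_x <- sum(x * pmf_builtin)
var_x  <- sum((x - mean_x)^2 * pmf_builtin)

all.equal(pmf_formula, pmf_builtin)   # TRUE if the two PMFs agree
c(mean_x, var_x)                      # both should be close to lambda
```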
How to do it with R

Syntax: the Poisson distribution functions are given by


dpois(x, lambda, log = FALSE)
ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)
qpois(p, lambda, lower.tail = TRUE, log.p = FALSE)
rpois(n, lambda)
The pmf is given by
> rate <- 2
> x <- 0:10
> plot(x, dpois(x, lambda = rate), main = "Probability mass function for Po(2)")

A plot of the PMF of Po(2) is given below in Figure 6.5

[Figure: probability mass function for Po(2); x-axis: x, y-axis: dpois(x, lambda = rate)]

Figure 6.5 Poisson PMF plot


The cdf is given by
> plot(x, ppois(x, lambda = rate), type = "s", main = "Cumulative distribution function for Po(2)")
A plot of the CDF of a Po(2) is shown below in Figure 6.6

[Figure: cumulative distribution function for Po(2); x-axis: x, y-axis: ppois(x, lambda = rate)]

Figure 6.6 Poisson CDF plot

The quantile function is given by
> plot(p_seq, qpois(p_seq, lambda = rate), type = "s", ylim = c(0, 10), main = "Quantile function for Po(2)")
A plot of the quantile function of Po(2) is shown in Figure 6.7

[Figure: quantile function for Po(2); x-axis: p_seq, y-axis: qpois(p_seq, lambda = rate)]

Figure 6.7 Poisson quantile plot


# Generating a random sample from a Poisson distribution with lambda=2
> draws <- rpois(100, lambda = rate)
> mean(draws)
[1] 1.95
> sd(draws)
[1] 1.388081
> hist(draws, breaks = (0:(max(draws)+1)) - 0.5, probability = TRUE, main = "Random draws from Po(2)")
> points(x, dpois(x, lambda = rate), col = "red")
A plot of the density function of Po(2) is shown in Figure 6.8

[Figure: histogram "Random draws from Po(2)"; x-axis: draws, y-axis: Density]

Figure 6.8 Density function of Po(2)


6.2 Continuous distributions
In contrast to discrete random variables, continuous random variables can take on
an uncountably infinite number of values. The easiest way for this to happen is that the
random variable can take on any value between two specified values. Continuous random
variables have a probability density function (pdf) instead of a pmf. When integrated
from a to b, this pdf gives the probability the random variable will take on a value
between a and b. Continuous random variables still have a cdf, quantile function, and
random generator that all still have the same interpretation.
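
The statement that integrating the pdf from a to b gives a probability can be checked with R's numerical integrator. The sketch below assumes a uniform distribution on [10, 50] and the interval from 20 to 30; these values are chosen only for illustration.

```r
# Assumed example: X uniform on [10, 50]; probability of falling between a and b
a <- 20
b <- 30
pdf_fn <- function(x) dunif(x, min = 10, max = 50)

area     <- integrate(pdf_fn, lower = a, upper = b)$value   # integral of the pdf from a to b
cdf_diff <- punif(b, min = 10, max = 50) - punif(a, min = 10, max = 50)

c(area, cdf_diff)   # both give P(a <= X <= b)
```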
6.2.1 Uniform distribution
The simplest continuous distribution is the uniform distribution. The continuous
uniform distribution is also referred to as the probability distribution of any random
number selection from the continuous interval defined between a and b. A
uniform distribution holds the same probability for the entire interval. Thus, its plot is a
rectangle, and therefore it is often referred to as the rectangular distribution. The
probability density function for a uniform random variable is zero outside of a and b. The
continuous random variable X is said to be uniformly distributed, or having a rectangular
distribution on the interval [a,b]. We write X~U(a,b), if its probability density function
equals

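$$f(x) = \frac{1}{b-a}, \quad x \in [a, b],$$ and 0 elsewhere (Lovric 2011). That is,
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & \text{when } a \le x \le b \\ 0, & \text{when } x < a \text{ or } x > b \end{cases}$$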
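
The piecewise definition translates directly into code. In this sketch unif_pdf is a hypothetical helper written only for illustration, and a = 10, b = 50 are assumed values; its output is compared with R's dunif().

```r
# Hypothetical helper implementing the piecewise density by hand
unif_pdf <- function(x, a, b) ifelse(x >= a & x <= b, 1 / (b - a), 0)

x <- c(5, 10, 30, 50, 60)        # points below, inside and above [10, 50]
manual  <- unif_pdf(x, a = 10, b = 50)
builtin <- dunif(x, min = 10, max = 50)

manual
all.equal(manual, builtin)       # TRUE if the two agree
```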

Probability Density Function


The dunif() method in R is used to generate the density function. It calculates the uniform
density function in the R language in the specified interval (a, b).
Syntax:
dunif(x, min = 0, max = 1, log = FALSE)
Parameter:

x: input sequence
min, max: range of values

log: logical indicator; if TRUE, the density values are returned on the log scale.


How to do it in R

> x_dunif <- seq(0, 100, by = 1)
> y_dunif <- dunif(x_dunif, min = 10, max = 50)
> plot(y_dunif, type = "o")
Output:
> x_dunif <- seq(0, 100, by = 1)
> x_dunif
  [1]   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
 [21]  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39
 [41]  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59
 [61]  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79
 [81]  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
[101] 100
> y_dunif <- dunif(x_dunif, min = 10, max = 50)
> y_dunif
  [1] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.025 0.025 0.025
 [14] 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025
 [27] 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025
 [40] 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.025 0.000
 [53] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
 [66] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
 [79] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
 [92] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
> plot(y_dunif, type = "o")

The plot of the PDF of the uniform distribution is shown below in Figure 6.9

[Figure: y_dunif plotted against Index]

Figure 6.9 Uniform PDF plot


Cumulative probability distribution
The punif() method in R is used to calculate the uniform cumulative distribution
function, that is, the probability of a variable X taking a value lower than or equal to x
(that is, P(X <= x)).
If we need to compute P(X > x), we can calculate 1 - punif(x).
Syntax:
punif(q, min = 0, max = 1, lower.tail = TRUE)
> x_punif <- seq(0, 100, by = 1)
> y_punif <- punif(x_punif, min = 10, max = 50)
> plot(y_punif, type = "o")
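
The upper-tail probability can be obtained in two equivalent ways. This is a brief sketch assuming a uniform distribution on [10, 50] and the point x = 20; the variable names are illustrative.

```r
lower  <- punif(20, min = 10, max = 50)                       # P(X <= 20)
upper1 <- 1 - lower                                           # P(X > 20) by complement
upper2 <- punif(20, min = 10, max = 50, lower.tail = FALSE)   # P(X > 20) directly

c(lower, upper1, upper2)
```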

Output:
> x_punif <- seq(0, 100, by = 1)
> x_punif
  [1]   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
 [21]  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39
 [41]  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59
 [61]  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79
 [81]  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
[101] 100
> y_punif <- punif(x_punif, min = 10, max = 50)
> y_punif
  [1] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.025 0.050
 [14] 0.075 0.100 0.125 0.150 0.175 0.200 0.225 0.250 0.275 0.300 0.325 0.350 0.375
 [27] 0.400 0.425 0.450 0.475 0.500 0.525 0.550 0.575 0.600 0.625 0.650 0.675 0.700
 [40] 0.725 0.750 0.775 0.800 0.825 0.850 0.875 0.900 0.925 0.950 0.975 1.000 1.000
 [53] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
 [66] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
 [79] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
 [92] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
> plot(y_punif, type = "o")

Figure 6.10 illustrates the plot of the CDF of the uniform distribution

[Figure: y_punif plotted against Index]

Figure 6.10 Uniform CDF plot


Quantile probability distribution
By a quantile, we mean the fraction (or percent) of points below the given
value. The qunif() method is used to calculate the corresponding quantile for any probability
(p) for a given uniform distribution. To use this function, simply call it with
the required parameters. Figure 6.11 illustrates the plot of the uniform quantile
distribution.
Syntax:
qunif(p, min = 0, max = 1)
Parameter:
p: the vector of probabilities
min, max: the limits for calculation of the quantile function
> x_qunif <- seq(0, 1, by = 0.01)
> y_qunif <- qunif(x_qunif, min = 10, max = 50)
> plot(y_qunif, type = "o")

Output:
> x_qunif <- seq(0, 1, by = 0.01)
> x_qunif
  [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15
 [17] 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.30 0.31
 [33] 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44 0.45 0.46 0.47
 [49] 0.48 0.49 0.50 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.60 0.61 0.62 0.63
 [65] 0.64 0.65 0.66 0.67 0.68 0.69 0.70 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79
 [81] 0.80 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.90 0.91 0.92 0.93 0.94 0.95
 [97] 0.96 0.97 0.98 0.99 1.00
> y_qunif <- qunif(x_qunif, min = 10, max = 50)
> y_qunif
  [1] 10.0 10.4 10.8 11.2 11.6 12.0 12.4 12.8 13.2 13.6 14.0 14.4 14.8 15.2 15.6 16.0
 [17] 16.4 16.8 17.2 17.6 18.0 18.4 18.8 19.2 19.6 20.0 20.4 20.8 21.2 21.6 22.0 22.4
 [33] 22.8 23.2 23.6 24.0 24.4 24.8 25.2 25.6 26.0 26.4 26.8 27.2 27.6 28.0 28.4 28.8
 [49] 29.2 29.6 30.0 30.4 30.8 31.2 31.6 32.0 32.4 32.8 33.2 33.6 34.0 34.4 34.8 35.2
 [65] 35.6 36.0 36.4 36.8 37.2 37.6 38.0 38.4 38.8 39.2 39.6 40.0 40.4 40.8 41.2 41.6
 [81] 42.0 42.4 42.8 43.2 43.6 44.0 44.4 44.8 45.2 45.6 46.0 46.4 46.8 47.2 47.6 48.0
 [97] 48.4 48.8 49.2 49.6 50.0
> plot(y_qunif, type = "o")
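
Because the quantile function is the inverse of the cdf, applying punif() to the output of qunif() should return the original probabilities. The sketch below assumes the same interval [10, 50]; the probabilities chosen are illustrative.

```r
p <- c(0.1, 0.25, 0.5, 0.9)               # illustrative probabilities

q    <- qunif(p, min = 10, max = 50)      # quantiles; for U(10, 50) this is 10 + 40 * p
back <- punif(q, min = 10, max = 50)      # cdf evaluated at those quantiles

q
all.equal(back, p)                        # TRUE: punif() undoes qunif()
```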

[Figure: y_qunif plotted against Index]

Figure 6.11 Uniform quantile plot


The runif() function in the R programming language is used to generate a sequence
of random numbers following the uniform distribution. The density function of the uniform
distribution is given in Figure 6.12.
Syntax:
runif(n, min = 0, max = 1)

Parameter:

n: number of random samples

min: minimum value (by default 0)
max: maximum value (by default 1)
> set.seed(1234)
> N <- 10000
> y_runif <- runif(N, min = 10, max = 50)
> head(y_runif)
[1] 14.54814 34.89198 34.37099 34.93518 44.43662 35.61242
> hist(y_runif, breaks = 50, main = "", xlim = c(0, 100))

[Figure: histogram of y_runif; x-axis: y_runif, y-axis: Frequency]

Figure 6.12 Density function of uniform distribution

6.3 Normal Distribution:
While dealing with quantities whose magnitude is of a continuous nature, such as
weight, height etc., a continuous probability distribution is needed. Also, if the
experiment involves discrete phenomena, the appropriate discrete model may be difficult
to use if the number of observations is large. In such cases, it is often convenient to use
a continuous model to approximate the discrete model. The normal distribution is one of the
most useful theoretical distributions for continuous variables. Most statistical data
concerning biological and agricultural research can be assumed to have a normal
distribution. The functions to generate the normal distribution in R begin with dnorm().
Syntax:

dnorm(x, mean = 0, sd = 1, log = FALSE)


It is used to find the height of the probability distribution at each point for a
given mean and standard deviation.
In statistics, it is given by the formula
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2 / (2\sigma^2)}$$
where $\mu$ is the mean and $\sigma$ is the standard deviation.
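
The density formula can be written out by hand and compared with dnorm(). In this sketch normal_pdf is a hypothetical helper, and the mean of 1 and standard deviation of 2 are assumed values chosen for illustration.

```r
# Hypothetical helper implementing the normal density formula directly
normal_pdf <- function(x, mu = 0, sigma = 1) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sqrt(2 * pi) * sigma)
}

x <- seq(-3, 3, by = 0.5)
manual  <- normal_pdf(x, mu = 1, sigma = 2)
builtin <- dnorm(x, mean = 1, sd = 2)

all.equal(manual, builtin)   # TRUE if the formula matches dnorm()
```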
How to do it in R
> x_dnorm <- seq(-5, 5, by = 0.05)
> head(x_dnorm)
[1] -5.00 -4.95 -4.90 -4.85 -4.80 -4.75
> y_dnorm <- dnorm(x_dnorm)
> head(y_dnorm)
[1] 1.486720e-06 1.906601e-06 2.438961e-06 3.112176e-06
[5] 3.961299e-06 5.029507e-06
> plot(y_dnorm)

The density function of the normal distribution is given in Figure 6.13.

[Figure: y_dnorm plotted against Index]

Figure 6.13 Normal Density function plot


Syntax: pnorm(q, mean, sd)
The pnorm() function is the cumulative distribution function, which measures the probability
that a random number X takes a value less than or equal to q, i.e., in statistics it is given
by
$$F_X(q) = \Pr[X \le q] = \alpha$$
> x_pnorm <- seq(-5, 5, by = 0.05)
> head(x_pnorm)
[1] -5.00 -4.95 -4.90 -4.85 -4.80 -4.75
> y_pnorm <- pnorm(x_pnorm)
> head(y_pnorm)
[1] 2.866516e-07 3.710674e-07 4.791833e-07 6.173074e-07
[5] 7.933282e-07 1.017083e-06
> plot(y_pnorm)
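
A common use of pnorm() is computing the probability that a normal variable falls in an interval, as a difference of two cdf values. The sketch below assumes the standard normal distribution; the variable names are illustrative.

```r
# P(a < X <= b) = F(b) - F(a), here for the standard normal distribution
within_1sd <- pnorm(1) - pnorm(-1)    # probability within one standard deviation of the mean
within_2sd <- pnorm(2) - pnorm(-2)    # probability within two standard deviations

upper_tail <- pnorm(1.96, lower.tail = FALSE)   # P(X > 1.96)

round(c(within_1sd, within_2sd, upper_tail), 4)
```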

The plot() function is used to plot the cumulative density (Figure 6.14).

[Figure: y_pnorm plotted against Index]

Figure 6.14 Normal Cumulative density function


qnorm()
Syntax: qnorm(p, mean, sd)
The qnorm() function is the inverse of the pnorm() function. It takes a probability value and
gives the output which corresponds to that probability value. It is useful in finding the
percentiles of a normal distribution. The plot() function is used to plot the quantile
plot (Figure 6.15).
> x_qnorm <- seq(0, 1, by = 0.005)
> head(x_qnorm)
[1] 0.000 0.005 0.010 0.015 0.020 0.025
> y_qnorm <- qnorm(x_qnorm)
> head(y_qnorm)
[1]      -Inf -2.575829 -2.326348 -2.170090 -2.053749 -1.959964
> plot(y_qnorm)
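
The inverse relationship between qnorm() and pnorm() can be checked with a round trip. In the sketch below, the mean of 100 and standard deviation of 15 are assumed values used only to show how a percentile scales with the parameters.

```r
z_975 <- qnorm(0.975)            # 97.5th percentile of the standard normal

p <- c(0.05, 0.5, 0.95)          # illustrative probabilities
round_trip <- pnorm(qnorm(p))    # pnorm() undoes qnorm()

# Assumed mean and sd: the percentile shifts and scales as mean + sd * z
q_scaled <- qnorm(0.975, mean = 100, sd = 15)

z_975
all.equal(round_trip, p)
c(q_scaled, 100 + 15 * z_975)    # both entries should agree
```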

[Figure: y_qnorm plotted against Index]

Figure 6.15 Normal Quantile function plot
rnorm()
The rnorm() function in R programming is used to generate a vector of random numbers
which are normally distributed. The density function for the normal distribution is given in
Figure 6.16.
Syntax: rnorm(n, mean, sd)
> set.seed(12345)
> N <- 1000
> y_rnorm <- rnorm(N)
> head(y_rnorm)
[1]  0.5855288  0.7094660 -0.1093033 -0.4534972  0.6058875 -1.8179560
> plot(y_rnorm)
> plot(density(y_rnorm))
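
To judge how well a simulated sample matches the distribution it was drawn from, the sample mean and standard deviation can be compared with the parameters, and the theoretical density can be overlaid on a histogram. This is a sketch assuming a mean of 10 and a standard deviation of 2; these values and the variable names are illustrative.

```r
set.seed(12345)
y <- rnorm(1000, mean = 10, sd = 2)   # assumed mean and sd, for illustration

c(mean(y), sd(y))                     # should be close to 10 and 2

# Histogram on the density scale with the theoretical normal curve in red
hist(y, probability = TRUE, breaks = 30,
     main = "Random draws from a normal distribution")
curve(dnorm(x, mean = 10, sd = 2), add = TRUE, col = "red")
```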

[Figure: density.default(x = y_rnorm); y-axis: Density; N = 1000, Bandwidth = 0.2167]

Figure 6.16 Normal Density Plot.
