R-Program Lab Manual
1. Binomial Distribution in R Programming
The binomial distribution is a discrete probability distribution that gives the probability of a given number of successes in a fixed number of independent trials. Each trial has only two possible outcomes, and the probability of success is the same in every trial. Tossing a coin is the classic example: each toss gives either a head or a tail. The probability of getting exactly three heads when a coin is tossed ten times is calculated using the binomial distribution.
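R provides four functions for the binomial distribution: dbinom(x, size, prob), pbinom(q, size, prob), qbinom(p, size, prob) and rbinom(n, size, prob). Their parameters are:
1. x, q It is a vector of numbers (numbers of successes).
2. p It is a vector of probabilities.
3. n It is the number of observations (random values) to generate.
4. size It is the number of trials.
5. prob It is the probability of success in each trial.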
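1. dbinom(): Density
The dbinom() function gives the probability of getting exactly x successes in size trials, P(X = x), for each value in x.
Example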
Output:
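A minimal sketch of dbinom(), assuming a fair coin (prob = 0.5) tossed ten times as in the coin example above; these values are illustrative. It returns P(X = x) for each number of heads from 0 to 10.

```r
# Number of heads we are interested in: 0, 1, ..., 10
x <- 0:10
# P(X = x) for X ~ Binomial(size = 10, prob = 0.5), an assumed fair coin
probs <- dbinom(x, size = 10, prob = 0.5)
# Probability of exactly three heads in ten tosses
p_three <- dbinom(3, size = 10, prob = 0.5)
print(p_three)
# Plot the probability mass function as vertical bars
plot(x, probs, type = "h", xlab = "Number of heads", ylab = "Probability")
```

Here p_three equals choose(10, 3) / 2^10 = 120/1024 = 0.1171875, and the probabilities in probs sum to 1.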
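2. pbinom(): Direct Look-Up, Intervals
The pbinom() function gives the cumulative probability P(X ≤ q), i.e. the probability of getting at most q successes. The probability of an interval is obtained as the difference of two cumulative values.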
Example
Output:
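A sketch of direct look-ups and intervals with pbinom(), again assuming a fair coin tossed ten times (illustrative values):

```r
# P(X <= 3): at most three heads in ten tosses of a fair coin
p_le3 <- pbinom(3, size = 10, prob = 0.5)
# P(X > 3): the upper tail, via lower.tail = FALSE
p_gt3 <- pbinom(3, size = 10, prob = 0.5, lower.tail = FALSE)
# P(3 <= X <= 6) as the difference of two cumulative values
p_3to6 <- pbinom(6, size = 10, prob = 0.5) - pbinom(2, size = 10, prob = 0.5)
print(c(p_le3, p_gt3, p_3to6))
```

The lower and upper tails add up to 1, and the interval probability equals the sum of dbinom() over 3 to 6.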
3. qbinom()
The qbinom() function of R takes a probability value and returns the smallest number of successes whose cumulative probability is at least that value. In simple words, it calculates the inverse cumulative distribution function of the binomial distribution.
Let's find the number of heads that corresponds to a cumulative probability of 0.45 when a coin is tossed 51 times.
Example
Output:
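A sketch of this calculation, assuming a fair coin (prob = 0.5). qbinom() returns the smallest number of heads x with P(X ≤ x) ≥ 0.45, and pbinom() confirms the result. For 51 tosses of a fair coin this is 25 heads: by symmetry P(X ≤ 25) is exactly 0.5, while P(X ≤ 24) is only about 0.39.

```r
# Smallest number of heads x such that P(X <= x) >= 0.45,
# for X ~ Binomial(size = 51, prob = 0.5) (fair coin assumed)
x <- qbinom(0.45, size = 51, prob = 0.5)
print(x)
# Check: the cumulative probability reaches 0.45 at x but not at x - 1
print(pbinom(c(x - 1, x), size = 51, prob = 0.5))
```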
4. rbinom()
The rbinom() function of R generates the required number of random values from a binomial distribution with a given number of trials and probability of success.
Let's see an example in which we generate nine random values, each being the number of successes in 160 trials with a probability of success of 0.5.
Example
Output:
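Because rbinom() is random, its output changes on every run. A sketch with a fixed seed (123, an arbitrary choice) makes the nine values reproducible:

```r
# Fix the seed so the random draws can be reproduced
set.seed(123)
# Nine random values, each the number of successes in 160 trials with prob = 0.5
draws <- rbinom(9, size = 160, prob = 0.5)
print(draws)
```

Each value lies between 0 and 160 and, with prob = 0.5, clusters around the mean 160 * 0.5 = 80.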
2. Poisson distribution
Many probability distributions can be easily implemented in R language with the help
of R’s inbuilt functions.
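Consider a random variable X with a Poisson distribution given as
$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$
The mean ($\mu$) of this distribution is $\mu = \lambda$, and its variance is also $\sigma^2 = \lambda$.
So if there are n trials, of which k are successful, and the probability of success p is very small, then the probability of exactly k successes is approximately the Poisson probability with $\lambda = np$:
$P(X = k) \approx \frac{(np)^k e^{-np}}{k!}$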
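The binomial-to-Poisson approximation described above can be checked numerically. This sketch compares dbinom() with dpois() using lambda = n * p; n = 1000 and p = 0.003 are illustrative values, not taken from the manual.

```r
# Many trials with a small success probability (illustrative values)
n <- 1000
p <- 0.003
k <- 0:8
# Exact binomial probabilities and their Poisson approximation with lambda = n * p
exact <- dbinom(k, size = n, prob = p)
approx <- dpois(k, lambda = n * p)
print(round(cbind(k, exact, approx), 4))
```

The two columns agree closely; the total absolute difference is bounded by 2np², which is 0.018 for these values.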
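1. dpois()
The function dpois() gives the Poisson probability mass function: the probability that a Poisson random variable is exactly equal to a given value. Its values can be plotted to illustrate the Poisson distribution in an R plot.
Syntax:
dpois(x, lambda, log)
where,
x: number of successful events happened in an interval
lambda: mean per interval
log: If TRUE, the function returns the probability in log form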
Example:
dpois(2, 3)
dpois(6, 6)
Output:
[1] 0.2240418
[1] 0.1606231
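The two values above can be reproduced from the Poisson formula P(X = k) = λ^k e^(-λ) / k!. In this sketch, pois_pmf is a hypothetical helper written only for the comparison:

```r
# Poisson probability mass function written out directly (hypothetical helper)
pois_pmf <- function(k, lambda) lambda^k * exp(-lambda) / factorial(k)
# Compare with R's built-in dpois() for the two examples above
manual <- c(pois_pmf(2, 3), pois_pmf(6, 6))
builtin <- c(dpois(2, 3), dpois(6, 6))
print(manual)
print(all.equal(manual, builtin))
```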
2. ppois()
This function gives the cumulative distribution function of the Poisson distribution, which can also be illustrated in an R plot. The function ppois() calculates the probability that a random variable will be less than or equal to a number.
Syntax:
ppois(q, lambda, lower.tail, log.p)
where,
q: number of successful events happened in an interval
lambda: mean per interval
lower.tail: If TRUE (the default), the left tail P(X ≤ q) is returned; if FALSE, the right tail P(X > q) is returned
log.p: If TRUE, the function returns the probability in log form
Example:
ppois(2, 3)
ppois(6, 6)
Output:
[1] 0.4231901
[1] 0.6063028
3. rpois()
The function rpois() generates random numbers from a given Poisson distribution.
Syntax:
rpois(n, lambda)
where
n: number of random numbers needed
lambda: mean per interval
Example:
rpois(2, 3)
rpois(6, 6)
Output (the values are random and will differ between runs):
[1] 2 3
[1] 6 7 6 10 9 4
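Because rpois() draws random values, its output changes on every run. This sketch fixes a seed (an arbitrary choice) and draws a large sample to show that the sample mean and variance both come out close to lambda, as expected for a Poisson distribution, whose mean and variance are equal.

```r
set.seed(1)  # any fixed seed makes the draws reproducible
# A large sample from a Poisson distribution with lambda = 3 (illustrative value)
samples <- rpois(100000, lambda = 3)
# For a Poisson distribution the mean and the variance both equal lambda
print(mean(samples))
print(var(samples))
```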
4. qpois()
The function qpois() returns quantiles of a given Poisson distribution: for each probability p, it returns the smallest integer x such that P(X ≤ x) ≥ p.
In probability, quantiles are cut points that divide the range of a probability distribution into intervals with equal probabilities.
Syntax:
qpois(p, lambda, lower.tail, log.p)
Where,
p: vector of probabilities
lambda: mean per interval
lower.tail: If TRUE (the default), the probabilities are P(X ≤ x); if FALSE, they are P(X > x)
log.p: If TRUE, the probabilities p are given in log form
Example:
y <- c(.01, .05, .1, .2)
qpois(y, 2)
qpois(y, 6)
Output:
[1] 0 0 0 1
[1] 1 2 3 4
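The definition of qpois() can be checked with ppois(): each returned quantile is the smallest integer whose cumulative probability reaches the requested p. A sketch using the example's probabilities with lambda = 6:

```r
y <- c(.01, .05, .1, .2)
q <- qpois(y, 6)
# Cumulative probability at each quantile: at least the requested p
print(ppois(q, 6))
# One step below each quantile: still below the requested p
print(ppois(q - 1, 6))
```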
3. R Normal Distribution
In random collections of data from independent sources, the data are commonly found to be normally distributed. This means that if we plot a graph with the values of the variable on the horizontal axis and the count of the values on the vertical axis, we get a bell-shaped curve. The center of the curve represents the mean of the data set. Fifty percent of the values lie to the left of the mean and the other fifty percent lie to the right of it. This is referred to as the normal distribution.
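R provides the dnorm(), pnorm(), qnorm() and rnorm() functions for the normal distribution. Their parameters are:
1. x It is a vector of numbers.
2. p It is a vector of probabilities.
3. n It is the number of observations.
4. mean It is the mean value of the sample data whose default value is zero.
5. sd It is the standard deviation whose default value is 1.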
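1. dnorm(): Density
The dnorm() function of R calculates the height of the probability density curve at each point for a given mean and standard deviation. The probability density of the normal distribution is:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
where $\mu$ is the mean and $\sigma$ is the standard deviation.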
Example
Output:
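A minimal sketch of dnorm(), assuming a mean of 2.5 and a standard deviation of 0.5 (illustrative values). It also evaluates the normal density formula by hand to show that dnorm() computes the same curve.

```r
# Points at which to evaluate the density (illustrative range)
x <- seq(-10, 10, by = 0.1)
# Density of a normal distribution with mean 2.5 and sd 0.5 (assumed values)
y <- dnorm(x, mean = 2.5, sd = 0.5)
# The same density written out explicitly
y_manual <- exp(-(x - 2.5)^2 / (2 * 0.5^2)) / (0.5 * sqrt(2 * pi))
print(all.equal(y, y_manual))
# Plot the bell-shaped curve
plot(x, y, type = "l")
```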
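2. pnorm(): Direct Look-Up
The pnorm() function gives the cumulative distribution function, the probability that a normally distributed random variable is less than or equal to x:
$F(x) = P(X \le x)$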
Example
# Creating a sequence of numbers between -1 and 20 incrementing by 0.1.
x <- seq(-1, 20, by = .1)
# Choosing the mean as 2.0 and standard deviation as 0.5.
y <- pnorm(x, mean = 2.0, sd = 0.5)
# Giving a name to the chart file.
png(file = "pnorm.png")
# Plotting the graph
plot(x, y)
# Saving the file.
dev.off()
Output:
3. qnorm(): Inverse Look-Up
The qnorm() function takes a probability value as input and calculates the number whose cumulative probability matches that value. If F is the cumulative distribution function, F and its inverse are related by
$p = F(x)$
$x = F^{-1}(p)$
Example
Output:
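A sketch of qnorm() on the standard normal distribution (mean 0, sd 1, chosen for illustration). Feeding its results back into pnorm() recovers the original probabilities, since p = F(x).

```r
# Probabilities at which to invert the CDF
p <- c(0.025, 0.5, 0.975)
# x = F^{-1}(p) for the standard normal distribution
x <- qnorm(p, mean = 0, sd = 1)
print(x)
# Applying pnorm() recovers the original probabilities
print(pnorm(x))
```

The middle value is 0 (the mean), and the outer values are about -1.96 and 1.96.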
4. rnorm(): Random variates
The rnorm() function generates normally distributed random numbers. It takes the sample size as input, along with the mean and standard deviation. Let's see an example in which we draw a histogram to show the distribution of the generated numbers.
Example
Output:
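A sketch of rnorm() with a histogram of the generated values. The seed, sample size, mean and standard deviation are assumed values chosen for illustration.

```r
set.seed(42)  # fixed seed for reproducibility (assumed value)
# 1000 normally distributed values with mean 50 and sd 10 (illustrative parameters)
values <- rnorm(1000, mean = 50, sd = 10)
# Histogram showing the bell shape of the generated values
hist(values, main = "Normal distribution", col = "lightblue")
# Sample mean and standard deviation, close to 50 and 10
print(c(mean(values), sd(values)))
```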
4. Uniform Distribution
In simple terms, the uniform distribution is a probability distribution in which every outcome in a given range is equally likely.
For example, consider a question that asks for the probability that a male or a female would choose blue for their car. Under a uniform distribution of the data, males and females would have equal probabilities of that outcome.
R provides the following functions for the uniform distribution:
1. dunif() function
2. runif() function
3. qunif() function
4. punif() function
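A quick numeric sketch of all four functions on an assumed interval [0, 10] (illustrative bounds) shows how they relate: the density is constant at 1 / (max - min), punif() and qunif() are inverses of each other, and runif() stays within the bounds.

```r
# Uniform distribution on [0, 10] (illustrative bounds)
d <- dunif(5, min = 0, max = 10)     # density is 1 / (max - min) inside the interval
p <- punif(2.5, min = 0, max = 10)   # P(X <= 2.5)
q <- qunif(0.25, min = 0, max = 10)  # value with 25% of the probability below it
set.seed(7)                          # arbitrary seed for reproducibility
r <- runif(5, min = 0, max = 10)     # five random values in [0, 10]
print(c(d, p, q))
print(r)
```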
1. dunif() function
R dunif() function enables us to calculate the uniform probability density function for the set of values passed to it.
Syntax:
dunif(x, min, max)
x: vector of values at which the density is evaluated.
min: Specifies the minimum value to be set for the uniform density function.
max: Specifies the maximum value for the uniform density function.
Example:
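# Remove all existing objects
rm(list = ls())
# Illustrative input values and bounds (assumed; not from the original example)
x <- seq(0, 100, by = 1)
dta <- dunif(x, min = 10, max = 80)
plot(dta, type = "o")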
Output:
2. R punif() function
R punif() function calculates the uniform cumulative distribution function for a set of values.
In the example below, we use the same set of values as above; punif() returns the cumulative probability for each value passed to it.
Example:
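dta <- punif(seq(0, 100, by = 1), min = 10, max = 80)  # x from 0 to 100, bounds 10 and 80 (illustrative)
plot(dta, type = "o")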
Output:
3. R qunif() function
R qunif() function is the inverse of punif(): it takes probabilities and returns the corresponding quantiles of the uniform distribution.
In this example, we use qunif() to get the quantiles for a sequence of probabilities.
Example:
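rm(list = ls())
p <- seq(0, 1, by = 0.1)  # illustrative probabilities (assumed)
dta <- qunif(p, min = 10, max = 80)
plot(dta, type = "o")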
Output:
4. R runif() function
R runif() function generates random numbers that are uniformly distributed between min and max, i.e. that follow a uniform distribution.
Example:
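rm(list = ls())
info <- 20000  # number of random values to generate
dta <- runif(info, min = 10, max = 80)  # illustrative bounds (assumed)
hist(dta, breaks = 40)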
Output:
5. Exponential Distribution
The exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate. It is a special case of the gamma distribution.
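R provides the following functions for the exponential distribution:
dexp(): density, dexp(x_dexp, rate)
pexp(): cumulative distribution function, pexp(x_pexp, rate)
qexp(): quantile function, qexp(x_qexp, rate)
rexp(): random number generation, rexp(N, rate)
Where,
x_dexp, x_pexp: vectors of quantiles
x_qexp: vector of probabilities
N: number of random values to generate
rate: rate of the distribution (default 1); the mean of the distribution is 1/rate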
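The formulas behind these functions are simple enough to check directly. A sketch with an assumed rate of 2: the density is rate * exp(-rate * x), the CDF is 1 - exp(-rate * x), and qexp() inverts pexp().

```r
rate <- 2                  # illustrative rate parameter
x <- c(0, 0.5, 1, 2)
# Exponential density: rate * exp(-rate * x)
print(all.equal(dexp(x, rate), rate * exp(-rate * x)))
# Exponential CDF: 1 - exp(-rate * x)
print(all.equal(pexp(x, rate), 1 - exp(-rate * x)))
# qexp() undoes pexp(), returning the original x values
print(all.equal(qexp(pexp(x, rate), rate), x))
```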
1. dexp() Function
dexp() function returns the corresponding values of the exponential density for an
input vector of quantiles.
Syntax:
dexp(x_dexp, rate)
Example:
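# R program to illustrate
# exponential distribution
# Specify x-values (illustrative range)
x_dexp <- seq(0, 1, by = 0.02)
# Apply dexp() with rate = 5 (an assumed example value)
y_dexp <- dexp(x_dexp, rate = 5)
# Plot the density values
plot(y_dexp)
Output: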
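2. pexp() Function
pexp() function returns the values of the exponential cumulative distribution function, P(X ≤ x), for an input vector of quantiles.
Syntax:
pexp(x_pexp, rate)
Example:
# R program to illustrate
# exponential distribution
# Specify x-values (illustrative range)
x_pexp <- seq(0, 1, by = 0.02)
# Apply pexp() with rate = 5 (an assumed example value)
y_pexp <- pexp(x_pexp, rate = 5)
# Plot the cumulative probabilities
plot(y_pexp)
Output: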
3. qexp() Function
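qexp() function returns the corresponding values of the exponential quantile function for an input vector of probabilities.
Syntax:
qexp(x_qexp, rate)
Example:
# R program to illustrate
# exponential distribution
# Specify probabilities (illustrative)
x_qexp <- seq(0, 1, by = 0.02)
# Apply qexp() with rate = 1 (an assumed example value)
y_qexp <- qexp(x_qexp, rate = 1)
# Plot values
plot(y_qexp)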
Output:
4. rexp() Function
rexp() function is used to simulate a set of random numbers drawn from the
exponential distribution.
Syntax:
rexp(N, rate )
Example:
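# R program to illustrate
# exponential distribution
set.seed(500)
# Specify size
N <- 100
# Draw N random values with rate = 1 (an assumed example value)
y_rexp <- rexp(N, rate = 1)
# Plot a histogram of the random values
hist(y_rexp, breaks = 50, main = "")
Output: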
6. T-Test in R
In statistics, the T-test is one of the most common tests used to determine whether the means of two groups are equal. The test assumes that both groups are sampled from normal distributions with equal variances. The null hypothesis is that the two means are the same, and the alternative is that they are not. Under the null hypothesis, the t-statistic follows a t-distribution with n1 + n2 - 2 degrees of freedom.
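In R, there are various types of T-test, such as the one-sample, paired and Welch T-tests. R provides the t.test() function, which performs all of them. The following syntaxes of t.test() are used for the different T-tests:
Independent 2-group T-test, where y is numeric and x is a binary factor:
t.test(y~x)
Independent 2-group T-test, where y1 and y2 are numeric:
t.test(y1,y2)
Paired T-test, where y1 and y2 are numeric:
t.test(y1,y2,paired=TRUE)
One-Sample T-test, where the null hypothesis is that the mean equals 3:
t.test(y,mu=3)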
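The first two syntaxes above are two ways of writing the same independent 2-group test. This sketch uses R's built-in mtcars data (an illustrative choice), comparing mileage (mpg) between automatic (am = 0) and manual (am = 1) cars, and checks that both calls agree.

```r
# Formula interface: numeric response ~ two-level grouping variable
by_formula <- t.test(mpg ~ am, data = mtcars)
# Equivalent call with two numeric vectors, one per group
by_vectors <- t.test(mtcars$mpg[mtcars$am == 0], mtcars$mpg[mtcars$am == 1])
print(c(by_formula$statistic, by_vectors$statistic))
```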
Let's see how the one-sample, paired-sample, and independent-sample T-tests are performed.
1. One-Sample T-test
The one-sample T-test compares the mean of a vector against a theoretical mean. The following formula is used to compute the t-statistic:
$t = \frac{m - \mu}{s / \sqrt{n}}$
Here,
m is the mean.
$\mu$ is the theoretical mean.
s is the standard deviation.
n is the number of observations.
The syntax is:
t.test(x, mu = 0)
Here, x is a numeric vector of data values and mu is the theoretical mean (default 0).
Example
Let's see an example of a one-sample T-test in which we test whether the mean volume of a shipment of wood was less than the usual 35000 ($\mu_0 = 35000$).
set.seed(0)
ship_vol <- rnorm(70, mean = 35000, sd = 2000)
# alternative = "less" tests whether the true mean is below mu
t.test(ship_vol, mu = 35000, alternative = "less")
Output:
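As a check on the formula for t, this sketch recomputes the statistic by hand for the shipment example and compares it with the value stored in the object returned by t.test(). It regenerates the same simulated data (set.seed(0), 70 values) so that it runs on its own.

```r
set.seed(0)
ship_vol <- rnorm(70, mean = 35000, sd = 2000)
mu0 <- 35000
# t = (m - mu) / (s / sqrt(n))
t_manual <- (mean(ship_vol) - mu0) / (sd(ship_vol) / sqrt(length(ship_vol)))
# The statistic stored in the test result (the same for any alternative)
t_builtin <- unname(t.test(ship_vol, mu = mu0)$statistic)
print(c(t_manual, t_builtin))
```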
2. Paired-Sample T-test
To perform a paired-sample test, we need two numeric vectors y1 and y2 of the same length. Then we run the code using the syntax t.test(y1, y2, paired = TRUE).
Example:
set.seed(2800)
pre_treatment <- rnorm(2000, mean = 130, sd = 5)
post_treatment <- rnorm(2000, mean = 144, sd = 4)
t.test(pre_treatment, post_treatment, paired = TRUE)
Output:
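A paired T-test is equivalent to a one-sample T-test of the differences y1 - y2 against mu = 0. This sketch regenerates the example's simulated data and checks that both calls give the same statistic and p-value.

```r
set.seed(2800)
pre_treatment <- rnorm(2000, mean = 130, sd = 5)
post_treatment <- rnorm(2000, mean = 144, sd = 4)
paired <- t.test(pre_treatment, post_treatment, paired = TRUE)
# A paired T-test is a one-sample T-test on the differences (mu = 0)
diffs <- t.test(pre_treatment - post_treatment, mu = 0)
print(c(paired$statistic, diffs$statistic))
```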
3. Independent-Sample T-test
Depending on the structure of our data and on whether their variances are assumed equal, the independent-sample T-test can take one of three forms: two numeric vectors with unequal variances (Welch's test, the default), two numeric vectors with equal variances (var.equal = TRUE), or a numeric variable and a two-level grouping factor (t.test(y ~ x)).
The general form of the t.test() function for the independent-sample T-test is:
t.test(y1, y2, paired = FALSE)
By default, R assumes that the variances of y1 and y2 are unequal and therefore uses Welch's test. To assume equal variances, we set var.equal = TRUE.
set.seed(0)
Spenders.Cleve <- rnorm(50, mean = 300, sd = 70)
Spenders.NY <- rnorm(50, mean = 350, sd = 70)
t.test(Spenders.Cleve, Spenders.NY, var.equal = TRUE)
Output:
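To see what var.equal changes, this sketch runs both versions on the same simulated spending data. The pooled (equal-variance) test uses n1 + n2 - 2 = 98 degrees of freedom, while Welch's test estimates fractional degrees of freedom that can be no larger. With equal group sizes, as here, the two t statistics coincide; only the degrees of freedom and p-values differ.

```r
set.seed(0)
Spenders.Cleve <- rnorm(50, mean = 300, sd = 70)
Spenders.NY <- rnorm(50, mean = 350, sd = 70)
welch <- t.test(Spenders.Cleve, Spenders.NY)                     # default: var.equal = FALSE
student <- t.test(Spenders.Cleve, Spenders.NY, var.equal = TRUE)  # pooled variance
# Degrees of freedom used by each test
print(c(welch = unname(welch$parameter), student = unname(student$parameter)))
```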
7. F test in R
Fisher's F-test is used to assess whether the variances of two populations (A and B) are equal. The method is simple: it takes the ratio of the two sample variances (when done by hand, usually the larger variance divided by the smaller one). The F statistic is also used in analysis of variance to check whether the means of three or more groups differ, but here it is used to compare two variances. The var.test() function in R performs an F-test between two normal populations with the null hypothesis that the variances of the two populations are equal; it reports the ratio var(x)/var(y).
Implementation in R
To test the equality of variances between two samples, use var.test(x, y).
Syntax:
var.test(x, y)
Parameters:
x, y: numeric vectors
Example 1:
Suppose we have two samples, x and y. The R function var.test() can be used to compare their variances as follows:
# var test in R
Output:
The p-value of the F-test is p = 0.4901, which is greater than the significance level alpha = 0.05. In conclusion, there is no significant difference between the variances of the two samples.
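A self-contained sketch of the same check, using simulated samples (assumed data, so its p-value will not be 0.4901). It also confirms that the F statistic reported by var.test() is var(x)/var(y).

```r
set.seed(10)  # illustrative seed
x <- rnorm(30, mean = 10, sd = 2)  # simulated sample (assumed data)
y <- rnorm(30, mean = 10, sd = 2)  # simulated sample (assumed data)
res <- var.test(x, y)
# The F statistic reported by var.test() is var(x) / var(y)
print(c(res$statistic, var(x) / var(y)))
# Compare the p-value with alpha = 0.05 to decide
print(res$p.value)
```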
8. Chi-square test
The chi-square test is used to analyze a frequency table (i.e., a contingency table) formed by two categorical variables. It evaluates whether there is a significant association between the categories of the two variables. The variables should come from the same population and should be categorical, such as Yes/No, Red/Green, Male/Female, etc.
R provides the chisq.test() function to perform the chi-square test. This function takes as input data in table form, containing the counts of the variables in the observations.
Let's see an example in which we use the Cars93 data from the "MASS" package. This data describes 93 models of cars on sale in the USA in 1993.
Data:
library("MASS")
str(Cars93)
Output:
Example:
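A sketch of the test on Cars93, assuming we cross-tabulate the AirBags and Type columns (the choice of columns is illustrative). R may warn that the chi-squared approximation may be incorrect, because some cells of this table have small counts.

```r
library(MASS)
# Contingency table of airbag type against car type
car.data <- table(Cars93$AirBags, Cars93$Type)
print(car.data)
# Chi-square test of independence between the two categorical variables
result <- chisq.test(car.data)
print(result)
```

With 3 airbag categories and 6 car types, the test has (3 - 1) * (6 - 1) = 10 degrees of freedom.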
9. Correlation Coefficient
R provides two functions for the Pearson correlation coefficient: cor() and cor.test(). cor() computes only the correlation coefficient, whereas cor.test() performs a test of association (correlation) between paired samples and returns both the correlation coefficient and the significance level (p-value) of the correlation.
Syntax:
cor(x, y, method = "pearson")
cor.test(x, y, method = "pearson")
Parameters:
x, y: numeric vectors of the same length
method: the correlation method, one of "pearson" (default), "kendall" or "spearman"
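# R program to illustrate
# pearson Correlation Testing
# Using cor()
# Taking two numeric vectors (illustrative values)
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)
# Calculating the correlation coefficient using cor() method
result <- cor(x, y, method = "pearson")
# Print the result
cat("Pearson correlation coefficient is:", result, "\n")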
Output:
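# R program to illustrate
# pearson Correlation Testing
# Using cor.test()
# Taking two numeric vectors (illustrative values)
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)
# Calculating the correlation coefficient and its p-value using cor.test() method
result <- cor.test(x, y, method = "pearson")
# Print the result
print(result)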
Output:
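Pearson's r is the covariance of x and y divided by the product of their standard deviations. This sketch writes that formula out for illustrative vectors x and y and compares it with cor(); for these values r = 15/28, about 0.536.

```r
x <- c(1, 2, 3, 4, 5, 6, 7)  # illustrative values
y <- c(1, 3, 6, 2, 7, 4, 5)  # illustrative values
# Pearson's r: sum of cross-products of deviations over the root of the sums of squares
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
print(r_manual)
print(cor(x, y, method = "pearson"))
```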
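IQ <- c(25.46872, 26.72004, 27.16163, 27.55291, 27.72577, 28.00731, 28.18095, 28.28053, 28.29086, 28.34474, 28.35581, 28.40969, 28.72583, 28.81105, 28.87337, 29.00383, 29.01762, 29.03629, 29.18109, 29.39251, 29.40852, 29.78844, 29.80456, 29.81815, 29.86478, 29.91535, 30.04204, 30.09565, 30.28495, 30.39359, 30.78886, 30.79307, 30.98601, 31.14602, 31.48225, 31.74983, 31.94705, 31.94772, 33.63058, 35.35096)  # IQ values, as listed in the output below
result <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1)  # binary (0/1) result for each row, as listed below
# Data Frame
df <- as.data.frame(cbind(IQ, result))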
Output:
IQ result
1 25.46872 0
2 26.72004 0
3 27.16163 0
4 27.55291 1
5 27.72577 0
6 28.00731 0
7 28.18095 0
8 28.28053 0
9 28.29086 0
10 28.34474 1
11 28.35581 1
12 28.40969 0
13 28.72583 0
14 28.81105 0
15 28.87337 1
16 29.00383 1
17 29.01762 0
18 29.03629 0
19 29.18109 1
20 29.39251 0
21 29.40852 0
22 29.78844 0
23 29.80456 1
24 29.81815 0
25 29.86478 0
26 29.91535 1
27 30.04204 1
28 30.09565 0
29 30.28495 1
30 30.39359 1
31 30.78886 1
32 30.79307 1
33 30.98601 1
34 31.14602 0
35 31.48225 1
36 31.74983 1
37 31.94705 1
38 31.94772 1
39 33.63058 0
40 35.35096 1
abline() function in R Language is used to add one or more straight lines to a graph. The
abline() function can be used to add vertical, horizontal or regression lines to plot.
Syntax:
abline(a=NULL, b=NULL, h=NULL, v=NULL, …)
Parameters:
a, b: the intercept and the slope of the line
h: y-value(s) for horizontal line(s)
v: x-value(s) for vertical line(s)
…: other graphical parameters, such as col, lty and lwd
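# Histogram of 180 standard normal values
set.seed(1200); mydata <- rnorm(180)
hist(mydata, col = "darkgreen")
# Add a vertical line at the mean of the data
abline(v = mean(mydata), col = "red", lwd = 2)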
Output:
In the example above, a straight line is added to the plot using abline().
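The parameter list above also mentions regression lines. This sketch uses R's built-in mtcars data (an illustrative choice) and adds a horizontal line with h, a vertical line with v and a fitted regression line to one scatter plot.

```r
# Scatter plot of car weight against mileage from the built-in mtcars data
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Mileage")
abline(h = mean(mtcars$mpg), col = "blue", lty = 2)  # horizontal line at the mean mileage
abline(v = 3, col = "gray")                          # vertical line at weight = 3
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red", lwd = 2)                    # regression line from the fitted model
print(coef(fit))
```

Passing the fitted model to abline() draws the line with the model's intercept and slope, the same as abline(a = coef(fit)[1], b = coef(fit)[2]).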
12. R Histogram
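For creating a histogram, R provides the hist() function, which takes a vector as input and accepts further parameters to add more functionality. The syntax of the hist() function is:
hist(v,main,xlab,ylab,xlim,ylim,breaks,col,border)
Here, v is a vector of numeric values, main is the chart title, xlab and ylab are the axis labels, xlim and ylim set the ranges of the axes, breaks controls the bins, col sets the bar color and border sets the bar border color.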
Example
# Creating data for the graph.
v <- c(12,24,16,38,21,13,55,17,39,10,60)

# Giving a name to the chart file.
png(file = "histogram_chart.png")

# Creating the histogram.
hist(v,xlab = "Weight",ylab="Frequency",col = "green",border = "red")

# Saving the file.
dev.off()
Output:
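The example above uses only a few of the hist() parameters. A sketch using main, xlim, ylim and explicit breaks (values chosen for illustration) on the same data; hist() also returns the bin counts, which can be checked directly.

```r
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60)
# main, xlim, ylim and breaks control the title, axis ranges and bin boundaries
h <- hist(v, main = "Weight distribution", xlab = "Weight", col = "green", border = "red",
          xlim = c(0, 70), ylim = c(0, 5), breaks = c(0, 10, 20, 30, 40, 50, 60, 70))
# Number of values falling in each bin
print(h$counts)
```

With these breaks the counts are 1, 4, 2, 2, 0, 2, 0: by default each bin includes its right edge, so 10 falls in the first bin and 60 in the sixth.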
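13. R Bar Charts
In R, we can create a bar chart to visualize data efficiently. For this purpose, R provides the barplot() function, which has the following syntax:
barplot(H, xlab, ylab, main, names.arg, col)
Here, H is a vector or matrix of values giving the bar heights, xlab and ylab are the axis labels, main is the chart title, names.arg is a vector of names shown under the bars, and col gives the bar colors.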
Example
H<- c(12,35,54,3,41)
png(file = "bar_chart.png")
barplot(H)
dev.off()
Output:
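A sketch of the names.arg, col, main, xlab and ylab parameters on the same heights; the month labels are hypothetical, chosen only for illustration.

```r
H <- c(12, 35, 54, 3, 41)
M <- c("Feb", "Mar", "Apr", "May", "Jun")  # hypothetical bar labels
# names.arg labels the bars; col, border, main, xlab and ylab style the chart
mids <- barplot(H, names.arg = M, col = "blue", border = "red",
                main = "Revenue chart", xlab = "Month", ylab = "Revenue")
# barplot() returns the x positions of the bar midpoints
print(mids)
```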
14. R Pie Charts
Pie charts are created with the pie() function, which takes a vector of positive numbers as input. Additional parameters control labels, colors, titles, etc.
Syntax:
pie(x, labels, radius, main, col, clockwise)
Here,
1. x is a vector that contains the numeric values used in the pie chart.
2. labels gives descriptions to the slices.
3. radius sets the radius of the pie chart.
4. main gives the title of the chart.
5. col defines the color palette.
6. clockwise is a logical value that indicates whether the slices are drawn clockwise or anti-clockwise.
Output:
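A sketch of the pie() parameters listed above; the values and city labels are hypothetical. Each label is extended with its slice's percentage share.

```r
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")  # hypothetical labels
# Percentage share of each slice, rounded to one decimal place
pct <- round(100 * x / sum(x), 1)
pie(x, labels = paste0(labels, " (", pct, "%)"), radius = 1,
    main = "City pie chart", col = rainbow(length(x)), clockwise = TRUE)
print(pct)
```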
15. Scatter plots in R.
Example
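data <- mtcars[, c('wt', 'mpg')]
png(file = "scatterplot.png")
# Plotting the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = data$wt, y = data$mpg, xlab = "Weight", ylab = "Mileage",
     xlim = c(2.5, 5), ylim = c(15, 30), main = "Weight vs Mileage")
dev.off()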
Output:
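16. Box Plots in R
R provides the boxplot() function to create a box plot. The syntax of the boxplot() function is:
boxplot(x, data, notch, varwidth, names, main)
Here, x is a vector or a formula, data is the data frame, notch is a logical value that draws a notch in each box, varwidth is a logical value that makes box widths proportional to the sample sizes, names gives the group labels, and main is the chart title.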
Output:
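A sketch using R's built-in mtcars data (an illustrative choice): the formula mpg ~ cyl draws one box of mileage for each number of cylinders. boxplot() also returns the statistics it drew.

```r
# Mileage grouped by number of cylinders in the built-in mtcars data
stats <- boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of cylinders",
                 ylab = "Miles per gallon", main = "Mileage data")
# One column per group: lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats$stats)
```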