Unit3 R

The document covers statistical concepts and functions in R, including mean, median, mode, variance, covariance, and correlation, along with their respective syntax and examples. It also discusses basic data visualization techniques using various R packages and functions for creating different types of charts such as bar plots, histograms, pie charts, and scatter plots. Additionally, it introduces common probability distributions, particularly the normal distribution, and the built-in R functions for generating and analyzing these distributions.
BCA V R

UNIT-3
Statistics and Probability
Mean, Median, Mode
Mean: The mean is calculated by taking the sum of the values and dividing by
the number of values in a data series.
Syntax:
The basic syntax for calculating mean in R is −
mean(x, trim = 0, na.rm = FALSE, ...)
Following is the description of the parameters used −
• x is the input vector.
• trim is the fraction (0 to 0.5) of observations dropped from each end of the
sorted vector before the mean is computed.
• na.rm is a logical value; if TRUE, missing values (NA) are removed from the
input vector before the mean is computed.
Example:
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find Mean.
result.mean <- mean(x)
print(result.mean)
o/p:
[1] 8.22
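The effect of the trim and na.rm parameters can be seen in a short sketch. The vector x_na is a hypothetical variation of the example vector with one missing value appended; it is not part of the original example.

```r
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

# trim = 0.1 drops 10% of the observations (here, one value) from each
# end of the sorted vector, so -21 and 54 are excluded from the mean.
trimmed <- mean(x, trim = 0.1)
print(trimmed)

# Hypothetical vector with a missing value appended.
x_na <- c(x, NA)
print(mean(x_na))                # NA, because one value is missing
print(mean(x_na, na.rm = TRUE))  # the missing value is removed first
```

The trimmed mean is less sensitive to extreme values such as 54 and -21 than the ordinary mean.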
Median: The middle most value in a data series is called the median. The
median() function is used in R to calculate this value.
Syntax
The basic syntax for calculating median in R is −
median(x, na.rm = FALSE)
Following is the description of the parameters used −
• x is the input vector.
• na.rm is used to remove the missing values from the input vector.
Example
# Create the vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find the median.
median.result <- median(x)
print(median.result)
o/p:
[1] 5.6
Mode: The mode is the value that has the highest number of occurrences in a set
of data. Unlike mean and median, the mode can be found for both numeric and
character data.
R does not have a standard built-in function to calculate the mode. (The
built-in mode() function returns the storage mode of an object, not the
statistical mode.)
Finding a mode is perhaps most easily achieved by using R's table function,
which gives you the frequencies you need.
Example:
R> xdata <- c(2,4.4,3,3,2,2.2,2,4)
R> xtab <- table(xdata)
R> xtab
xdata
  2 2.2   3   4 4.4
  3   1   2   1   1
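In the frequency table, the value 2 occurs three times, more than any other value, so it is the mode. The lookup can be wrapped in a small helper. This is a minimal sketch; get_mode is a hypothetical name, not a built-in R function.

```r
# Hypothetical helper (not built in): returns the most frequent value(s).
get_mode <- function(v) {
  tab <- table(v)
  # table() stores the distinct values as names, so the result is a
  # character vector; ties return every value with the top frequency.
  names(tab)[tab == max(tab)]
}

xdata <- c(2, 4.4, 3, 3, 2, 2.2, 2, 4)
print(get_mode(xdata))                    # "2"
print(get_mode(c("red", "blue", "red")))  # "red"
```

Because the result comes from the table names, numeric modes are returned as text; as.numeric() can convert them back if needed.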

The min and max functions will report the smallest and largest values, with range
returning both in a vector of length 2.
R> min(xdata)
[1] 2
R> max(xdata)
[1] 4.4
R> range(xdata)
[1] 2.0 4.4
tapply() function
The tapply() function computes a statistical measure (mean, median, min, max,
etc.) or applies a self-written function to each group of values in a vector,
where the groups are defined by the levels of a factor.
Syntax: tapply(X, INDEX, FUN)
• X: the input vector or object.
• INDEX: the factor (or list of factors) that defines the groups.
• FUN: the function to be applied to each group.
R> tapply(chickwts$weight,INDEX=chickwts$feed,FUN=function(x) length(x)/nrow(chickwts))
   casein horsebean   linseed  meatmeal   soybean sunflower
0.1690141 0.1408451 0.1690141 0.1549296 0.1971831 0.1690141
The round function rounds numeric output to a given number of decimal places.
R> round(table(chickwts$feed)/nrow(chickwts),digits=3)
   casein horsebean   linseed  meatmeal   soybean sunflower
    0.169     0.141     0.169     0.155     0.197     0.169
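The same grouping idea works with any summary function. The sketch below first uses a small hypothetical data set (weight and group are made-up names and values) so the result is easy to check by hand, then applies the same call to the built-in chickwts data.

```r
# Hypothetical data: four weights in two groups.
weight <- c(10, 20, 30, 40)
group  <- factor(c("a", "a", "b", "b"))

# Mean weight within each level of the factor.
group_means <- tapply(weight, INDEX = group, FUN = mean)
print(group_means)   # a = 15, b = 35

# The same call on the built-in chickwts data: mean weight per feed type.
feed_means <- tapply(chickwts$weight, chickwts$feed, mean)
print(round(feed_means, 1))
```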
Quantiles, Percentiles, and the Five-Number Summary: A quantile is a value
computed from a collection of numeric measurements that indicates an
observation's rank when compared to all the other present observations. For
example, the median is itself a quantile: it gives you a value below which half
of the measurements lie, so it is the 0.5th quantile. Alternatively, quantiles
can be expressed as percentiles; this is identical but on a "percent scale" of
0 to 100. The five-number summary consists of the minimum, the first quartile
(0.25 quantile), the median, the third quartile (0.75 quantile), and the
maximum.
quantile function:
Syntax: quantile(x, probs)
x: numeric data vector
probs: vector of probabilities between 0 and 1 (default: 0, 0.25, 0.5, 0.75, 1)
Example:
R> xdata <- c(2,4.4,3,3,2,2.2,2,4)
R> quantile(xdata,probs=0.8)
80%
3.6
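Several quantiles can be requested at once by passing a vector of probabilities. The following sketch computes the three quartiles of the same data and checks that the 0.5 quantile agrees with median().

```r
xdata <- c(2, 4.4, 3, 3, 2, 2.2, 2, 4)

# First quartile, median, and third quartile in one call.
quartiles <- quantile(xdata, probs = c(0.25, 0.5, 0.75))
print(quartiles)

# The 0.5 quantile is the median.
print(all.equal(unname(quartiles[2]), median(xdata)))
```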
Summary Function: The summary function reports the five-number summary together
with the mean.
R> summary(xdata)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.000   2.000   2.600   2.825   3.250   4.400
Variance: The variance is the average squared distance of each observation from
the mean. (R's var computes the sample variance, which divides by n − 1 rather
than n.)
The standard deviation is simply the square root of the variance.
The interquartile range (IQR) measures the width of the "middle 50 percent" of
the data, that is, the distance between the first quartile (0.25 quantile) and
the third quartile (0.75 quantile).
The direct R commands for computing these measures of spread are
var (variance), sd (standard deviation), and IQR (interquartile range).
R> var(xdata)
[1] 0.9078571
R> sd(xdata)
[1] 0.9528154
R> IQR(xdata)
[1] 1.25
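To make the definitions concrete, the three measures can be recomputed by hand and compared with the built-in functions. This is a sketch; the names var_manual, sd_manual, and iqr_manual are chosen here for illustration.

```r
xdata <- c(2, 4.4, 3, 3, 2, 2.2, 2, 4)
n <- length(xdata)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
var_manual <- sum((xdata - mean(xdata))^2) / (n - 1)

# Standard deviation: square root of the variance.
sd_manual <- sqrt(var_manual)

# IQR: third quartile minus first quartile.
iqr_manual <- unname(diff(quantile(xdata, probs = c(0.25, 0.75))))

print(c(var_manual, var(xdata)))
print(c(sd_manual, sd(xdata)))
print(c(iqr_manual, IQR(xdata)))
```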
Covariance and Correlation
• The covariance expresses how much two numeric variables "change
together" and the nature of that relationship, whether it is positive or
negative.
• Correlation allows you to interpret the covariance further by identifying
both the direction and the strength of any association.
R> xdata <- c(2,4.4,3,3,2,2.2,2,4)
R> ydata <- c(1,4.4,1,3,2,2.2,2,7)
R> cov(xdata,ydata)
[1] 1.479286
R> cor(xdata,ydata)
[1] 0.7713962
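The link between the two measures is that Pearson's correlation is the covariance divided by the product of the two standard deviations, which rescales it to lie between -1 and 1. A minimal check (cor_manual is a name chosen here for illustration):

```r
xdata <- c(2, 4.4, 3, 3, 2, 2.2, 2, 4)
ydata <- c(1, 4.4, 1, 3, 2, 2.2, 2, 7)

# Correlation = covariance scaled by both standard deviations.
cor_manual <- cov(xdata, ydata) / (sd(xdata) * sd(ydata))

print(cor_manual)
print(cor(xdata, ydata))
```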
BASIC DATA VISUALIZATION
R Visualization Packages
1) plotly
The plotly package provides interactive, high-quality graphs. This package is
built on the JavaScript library plotly.js.
2) ggplot2
The ggplot2 package allows us to create graphics declaratively, following the
"grammar of graphics". This package is known for its elegant and high-quality
graphs.
3) tidyquant
The tidyquant package is used for carrying out quantitative financial analysis.
It fits into the tidyverse and is used for importing, analyzing, and
visualizing financial data.
4) taucharts
taucharts is a data-focused charting library. It provides a declarative
interface for rapid mapping of data fields to visual properties.
5) ggiraph
It is a tool that allows us to create dynamic ggplot graphs. This package allows
us to add tooltips, JavaScript actions, and animations to the graphics.
6) geofacet
This package provides geofaceting functionality for 'ggplot2'. Geofaceting
arranges a sequence of plots for different geographical entities into a grid that
preserves some of the geographical orientation.
7) googleVis
googleVis provides an interface between R and Google's charts tools. With the
help of this package, we can create web pages with interactive charts based on R
data frames.
8) RColorBrewer
This package provides color schemes for maps and other graphics, which are
designed by Cynthia Brewer.
9) dygraphs
The dygraphs package is an R interface to the dygraphs JavaScript charting
library. It provides rich features for charting time-series data in R.
10) shiny
The shiny package allows us to develop interactive and aesthetically pleasing
web apps directly from R. Apps can be extended with HTML widgets, CSS, and
JavaScript.
barplot(): R uses the barplot() function to create bar charts. Both vertical
and horizontal bars can be drawn.
Syntax:
barplot(H, xlab, ylab, main, names.arg, col)
Parameters:
H: This parameter is a vector or matrix containing numeric values which are used
in bar chart.
xlab: This parameter is the label for x axis in bar chart.
ylab: This parameter is the label for y axis in bar chart.
main: This parameter is the title of the bar chart.
names.arg: This parameter is a vector of names appearing under each bar in bar
chart.
col: This parameter is used to give colors to the bars in the graph.
Example:
# Create the data for the chart
A <- c(17, 32, 8, 53, 1)
# Plot the bar chart
barplot(A, xlab = "X-axis", ylab = "Y-axis", main ="Bar-Chart")
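The names.arg and col parameters listed above are not used in the basic example. The following sketch adds them; the month labels and the color are hypothetical choices made for illustration.

```r
A <- c(17, 32, 8, 53, 1)

# Hypothetical category labels, one per bar.
months <- c("Jan", "Feb", "Mar", "Apr", "May")

# barplot() returns the x positions of the bar midpoints, which is
# useful when adding text or points on top of the bars later.
mids <- barplot(A, names.arg = months, col = "steelblue",
                xlab = "Month", ylab = "Value", main = "Bar Chart")
print(mids)
```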
Creating a Horizontal Bar Chart in R


To create a horizontal bar chart:
• Take all parameters which are required to make a simple bar chart.
• Add the parameter horiz = TRUE to draw the bars horizontally.
barplot(A, horiz=TRUE)
Example:
barplot(A, horiz = TRUE, xlab = "X-axis", ylab = "Y-axis", main = "Horizontal Bar Chart")

R Histogram
A histogram is a type of bar chart that shows how many values fall into each of
a set of value ranges (bins). A histogram is used to show the distribution of a
numeric variable, whereas a bar chart is used for comparing different entities.
Syntax
The basic syntax for creating a histogram using R is −
hist(v,main,xlab,xlim,ylim,breaks,col,border)
• v is a vector containing numeric values used in histogram.
• main indicates title of the chart.
• col is used to set color of the bars.
• border is used to set border color of each bar.
• xlab is used to give description of x-axis.
• xlim is used to specify the range of values on the x-axis.
• ylim is used to specify the range of values on the y-axis.
• breaks specifies the number of bins or the break points between bins, which
determines the width of each bar.
Example
# Creating data for the graph.
v <- c(12,24,16,38,21,13,55,17,39,10,60)
# Giving a name to the chart file.
png(file = "histogram_chart.png")

# Creating the histogram.


hist(v,xlab = "Weight",ylab="Frequency",col = "green",border = "red")
# Saving the file.
dev.off()
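To see how the breaks parameter determines the bins, the histogram can be computed without drawing it. In this sketch the break points 10, 20, ..., 60 are an assumption chosen to cover the example data.

```r
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60)

# plot = FALSE returns the histogram object instead of drawing it.
h <- hist(v, breaks = seq(10, 60, by = 10), plot = FALSE)

print(h$breaks)  # bin boundaries
print(h$counts)  # number of values in each bin
```

Each bin includes its right boundary, and the first bin also includes its left boundary, so the value 10 is counted in the first bin and 60 in the last.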

R Pie Charts
A pie chart is a representation of values in the form of slices of a circle with
different colors. Pie charts are created with the help of the pie() function.
Syntax:
pie(x, labels, radius, main, col, clockwise)
• x is a vector containing the numeric values used in the pie chart.
• labels is used to give description to the slices.
• radius indicates the radius of the circle of the pie chart (value between −1
and +1).
• main indicates the title of the chart.
• col indicates the color palette.
• clockwise is a logical value indicating if the slices are drawn clockwise or
anticlockwise.
Example:
# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
# Plot the chart.


pie(x,labels)
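Slice labels are often more informative when they show percentages. The sketch below builds such labels for the same data; the variable names pct and pie_labels and the title are illustrative assumptions.

```r
x <- c(21, 62, 10, 53)
cities <- c("London", "New York", "Singapore", "Mumbai")

# Percentage share of each slice, rounded to whole numbers.
pct <- round(100 * x / sum(x))

# Combine the city name and percentage into one label per slice.
pie_labels <- paste0(cities, " (", pct, "%)")

pie(x, labels = pie_labels, main = "City pie chart",
    col = rainbow(length(x)))
```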

R - Line Chart
A line chart is a graph that connects a series of points by drawing line segments
between them. The plot() function in R is used to create the line graph.
Syntax
plot(v,type,col,xlab,ylab,main)
• v is a vector containing the numeric values.
• type takes the value "p" to draw only the points, "l" to draw only the lines
and "o" to draw both points and lines.
• xlab is the label for x axis.
• ylab is the label for y axis.
• main is the title of the chart.
• col is used to give colors to both the points and lines.
Example:
v <- c(7,12,28,3,41)
# Plot the line chart.
plot(v,type = "o")
R – Boxplots
Boxplots show how the values in a data set are distributed. The three quartiles
divide the data set into four equal parts. The graph represents the minimum,
first quartile, median, third quartile, and maximum of the data set.
Syntax
boxplot(x, data, notch, varwidth, names, main)
• x is a vector or a formula.
• data is the data frame.
• notch is a logical value. Set to TRUE to draw a notch.
• varwidth is a logical value. Set to TRUE to draw the width of each box
proportionate to the sample size.
• names are the group labels which will be printed under each boxplot.
• main is used to give a title to the graph.
Example
data <- data.frame(Group_A=c(25,28,30,32,35,37,38,39,40,41,42),
                   Group_B=c(22,24,26,29,31,33,36,37,38,40,43))
boxplot(data, main="Boxplot of Group A and B", xlab="Groups", ylab="Values",
        col=c("lightblue","lightgreen"), border="black")
R – Scatterplots
Scatterplots show many points plotted in the Cartesian plane. Each point
represents the values of two variables. One variable is chosen in the horizontal
axis and another in the vertical axis.
Syntax
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
• x is the data set whose values are the horizontal coordinates.
• y is the data set whose values are the vertical coordinates.
• main is the title of the graph.
• xlab is the label in the horizontal axis.
• ylab is the label in the vertical axis.
• xlim is the limits of the values of x used for plotting.
• ylim is the limits of the values of y used for plotting.
• axes indicates whether both axes should be drawn on the plot.
Example:
x <- 1:10
y <- c(2,4,5,7,8,10,11,13,14,16)
plot(x, y, main="Scatterplot Example", xlab="X-axis", ylab="Y-axis",
     col="blue", pch=16, xlim=c(0,11), ylim=c(0,17))
Common probability distributions


In R, each probability distribution has four functions that share a common
suffix (for example, norm) and differ in their prefix. The probability density
(or mass) function starts with 'd', the cumulative distribution function starts
with 'p', the inverse cumulative distribution (quantile) function starts with
'q', and the function that produces random values starts with 'r'.
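As a quick illustration of the four prefixes, the sketch below uses the standard normal distribution (mean 0, standard deviation 1). The seed value is arbitrary and only makes the random draw repeatable.

```r
# d: density at a point
d0 <- dnorm(0)        # equals 1 / sqrt(2 * pi)

# p: cumulative probability P(X <= 1.96), about 0.975
p196 <- pnorm(1.96)

# q: quantile, the inverse of p (about 1.96 for probability 0.975)
q975 <- qnorm(0.975)

# r: random values
set.seed(1)
r3 <- rnorm(3)

print(c(d0, p196, q975))
print(r3)
```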
Normal Distribution
The normal distribution is a probability distribution that describes how the
values of a continuous variable are spread symmetrically around their mean. It
is the most important probability distribution in statistics because many
real-world measurements follow it approximately, for example, the heights of a
population, shoe sizes, and IQ scores. (A single roll of a die is not normally
distributed, because all six outcomes are equally likely.)
The normal distribution (Gaussian distribution) is defined by its probability
density function, where μ is the population mean and σ² is the variance. It is
represented as N(μ, σ²).
In R, there are 4 built-in functions for working with the normal distribution:
• dnorm()
• pnorm()
• qnorm()
• rnorm()
dnorm()
The dnorm() function in R returns the value of the probability density function
of the normal distribution at x. In statistics, the density is given by the
formula:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Syntax:
dnorm(x, mean, sd)
where,
– x is the vector of values at which the density is evaluated
– mean is the mean of the distribution. Its default value is 0
– sd is the standard deviation of the distribution. Its default value is 1
Example:
# creating a sequence of values
# between -15 to 15 with a difference of 0.1
x = seq(-15, 15, by=0.1)

y = dnorm(x, mean(x), sd(x))


# Plot the graph.
plot(x, y)
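The normal density can also be written out by hand and compared with dnorm(). In this sketch, normal_pdf is a hypothetical helper name, and the values mu = 1 and sigma = 2 are arbitrary.

```r
# Hypothetical helper: normal density with mean mu and sd sigma,
# written directly from the density formula.
normal_pdf <- function(x, mu = 0, sigma = 1) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

x <- seq(-3, 3, by = 0.5)
manual  <- normal_pdf(x, mu = 1, sigma = 2)
builtin <- dnorm(x, mean = 1, sd = 2)

# The two results agree up to floating-point error.
print(all.equal(manual, builtin))
```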

pnorm()
The pnorm() function is the cumulative distribution function. It gives the
probability that a random variable X takes a value less than or equal to x,
i.e., in statistics it is given by:
$F(x) = P(X \le x)$

Syntax:
pnorm(q, mean, sd, lower.tail)
– q is the vector of values (quantiles) at which the probability is evaluated
– mean is the mean of the distribution. Its default value is 0
– sd is the standard deviation of the distribution. Its default value is 1
– lower.tail is a logical value indicating whether to compute the lower tail
probability P(X ≤ x) (TRUE) or the upper tail P(X > x) (FALSE). Its default
value is TRUE
Example
# creating a sequence of values
# between -10 to 10 with a difference of 0.1
x <- seq(-10, 10, by=0.1)


y <- pnorm(x, mean = 2.5, sd = 2)
plot(x, y)
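A common use of pnorm() is to find the probability that a value falls in an interval, by subtracting two cumulative probabilities. The sketch below reuses mean = 2.5 and sd = 2 from the example; the interval end points 0.5 and 4.5 (one standard deviation either side of the mean) are chosen for illustration.

```r
# P(0.5 < X <= 4.5): within one standard deviation of the mean.
p_interval <- pnorm(4.5, mean = 2.5, sd = 2) - pnorm(0.5, mean = 2.5, sd = 2)
print(p_interval)   # about 0.68

# P(X > 4.5): upper tail, using lower.tail = FALSE.
p_upper <- pnorm(4.5, mean = 2.5, sd = 2, lower.tail = FALSE)
print(p_upper)
```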

qnorm()
The qnorm() function is the inverse of the pnorm() function. It takes a
probability p and returns the value x for which P(X ≤ x) = p. It is useful in
finding the percentiles of a normal distribution.
Syntax:
qnorm(p, mean, sd)
– p is a vector of probabilities
– mean is the mean of the distribution. Its default value is 0
– sd is the standard deviation of the distribution. Its default value is 1
Example:
# Create a sequence of probability values
# incrementing by 0.02.
x <- seq(0, 1, by = 0.02)
y <- qnorm(x, mean(x), sd(x))
plot(x, y)

rnorm()
The rnorm() function in R is used to generate a vector of random numbers which
are normally distributed.
Syntax:
rnorm(n, mean, sd)
– n is the number of random values to generate
– mean is the mean of the distribution. Its default value is 0
– sd is the standard deviation of the distribution. Its default value is 1
Example
# Create a vector of 10000 random numbers
# with mean=90 and sd=5
x <- rnorm(10000, mean=90, sd=5)
# Create the histogram with about 50 bins
hist(x, breaks=50)
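Because rnorm() draws random values, the result differs on every run unless a seed is set. The sketch below (the seed 42 is arbitrary) shows that the same seed reproduces the same sample and that the sample mean and standard deviation are close to the requested values.

```r
set.seed(42)
x <- rnorm(10000, mean = 90, sd = 5)

# Sample statistics are close to, but not exactly, 90 and 5.
print(mean(x))
print(sd(x))

# Resetting the seed reproduces exactly the same sample.
set.seed(42)
y <- rnorm(10000, mean = 90, sd = 5)
print(identical(x, y))
```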

Poisson Distribution
The Poisson distribution gives the probability of a given number of events
occurring in a fixed interval of time or space, if these events occur with a
known constant mean rate and independently of the time since the last event.
The probability mass function of the Poisson distribution is:
$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$
Where:
• X is a random variable following a Poisson distribution
• k is the number of times an event occurs
• P(X = k) is the probability that an event will occur k times
• e is Euler's number (approximately 2.718)
• λ is the average number of times an event occurs
• ! is the factorial function

There are four Poisson functions available in R:


• dpois
• ppois
• qpois
• rpois
dpois()
The function dpois() calculates the probability that a Poisson random variable
takes exactly a given value, that is, P(X = x).
Syntax:
dpois(x, lambda, log)
where,
• x: number of events occurring in an interval
• lambda: mean number of events per interval
• log: if TRUE, the function returns the logarithm of the probability
Example:
dpois(2, 3)
dpois(6, 6)

Output:
[1] 0.2240418
[1] 0.1606231
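These values can be reproduced from the Poisson probability mass function, lambda^k * exp(-lambda) / k!. In the sketch below, poisson_pmf is a hypothetical helper name.

```r
# Hypothetical helper: Poisson probability mass function written by hand.
poisson_pmf <- function(k, lambda) {
  lambda^k * exp(-lambda) / factorial(k)
}

print(poisson_pmf(2, 3))   # matches dpois(2, 3)
print(dpois(2, 3))

# The probabilities over all possible counts sum to 1
# (counts above 100 contribute a negligible amount here).
print(sum(dpois(0:100, 3)))
```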

ppois()
The function ppois() calculates the probability that a Poisson random variable
is less than or equal to a given number, that is, P(X ≤ q).
Syntax:
ppois(q, lambda, lower.tail, log.p)
where,
• q: number of events occurring in an interval
• lambda: mean number of events per interval
• lower.tail: if TRUE, the left tail P(X ≤ q) is returned; if FALSE, the
right tail P(X > q) is returned
• log.p: if TRUE, the function returns the logarithm of the probability
Example:
ppois(2, 3)
ppois(6, 6)
Output:
[1] 0.4231901
[1] 0.6063028
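The cumulative probability is the sum of the individual probabilities from dpois(), and the right tail is its complement. A short check using the first example:

```r
# P(X <= 2) for lambda = 3, as a sum of point probabilities.
p_sum <- sum(dpois(0:2, 3))
print(p_sum)
print(ppois(2, 3))

# P(X > 2): right tail, using lower.tail = FALSE.
p_right <- ppois(2, 3, lower.tail = FALSE)
print(p_right)
```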
qpois()
The function qpois() returns quantiles of a given Poisson distribution: for a
probability p, it returns the smallest integer k such that P(X ≤ k) ≥ p. In
probability, quantiles are cut points that divide the range of a probability
distribution into intervals which have equal probabilities.
Syntax:
qpois(p, lambda, lower.tail, log.p)
where,
• p: vector of probabilities
• lambda: mean number of events per interval
• lower.tail: if TRUE, p is treated as a left-tail probability P(X ≤ k); if
FALSE, as a right-tail probability P(X > k)
• log.p: if TRUE, the probabilities p are given as logarithms
Example
y <- c(.01, .05, .1, .2)
qpois(y, 2)
qpois(y, 6)
Output:
[1] 0 0 0 1
[1] 1 2 3 4
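The relationship between qpois() and ppois() can be checked directly: the returned quantile is the smallest count whose cumulative probability reaches the requested probability. The sketch uses p = 0.2 and lambda = 6 from the example above.

```r
p <- 0.2
lambda <- 6

k <- qpois(p, lambda)
print(k)                      # 4, as in the example output

# The cumulative probability first reaches p at k, not at k - 1.
print(ppois(k - 1, lambda))   # below 0.2
print(ppois(k, lambda))       # at least 0.2
```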

rpois()
The function rpois() is used for generating random numbers from a given
Poisson distribution.
Syntax:
rpois(n, lambda)
where,
• n: number of random numbers needed
• lambda: mean number of events per interval
Example
rpois(2, 3)
rpois(6, 6)
Output (the values are random and will differ between runs):
[1] 2 3
[1] 6 7 6 10 9 4

Binomial Distribution
The binomial distribution model deals with finding the probability of a given
number of successes in a series of independent experiments, each of which has
only two possible outcomes. For example, tossing a coin always gives a head or
a tail. The probability of getting exactly 3 heads when tossing a coin 10 times
is calculated using the binomial distribution.

R has four built-in functions for working with the binomial distribution. They
are described below.
dbinom(x, size, prob)
pbinom(x, size, prob)
qbinom(p, size, prob)
rbinom(n, size, prob)
Following is the description of the parameters used −
• x is a vector of numbers (counts of successes).
• p is a vector of probabilities.
• n is number of observations.
• size is the number of trials.
• prob is the probability of success of each trial.
dbinom()
This function gives the probability mass at each point, that is, the
probability of exactly x successes.
Example
# Create a sequence of the 51 integers from 0 to 50.
x <- seq(0,50,by = 1)
# Create the binomial distribution.
y <- dbinom(x,50,0.5)
# Plot the graph for this sample.
plot(x,y)
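The coin example mentioned earlier (exactly 3 heads in 10 tosses of a fair coin) can be computed with dbinom() and checked against the binomial formula, choose(n, k) * p^k * (1 - p)^(n - k). The variable names below are illustrative.

```r
# Probability of exactly 3 heads in 10 tosses of a fair coin.
p_three <- dbinom(3, size = 10, prob = 0.5)
print(p_three)

# The same value from the binomial formula.
p_formula <- choose(10, 3) * 0.5^3 * (1 - 0.5)^(10 - 3)
print(p_formula)

# The probabilities over all possible counts sum to 1.
print(sum(dbinom(0:50, 50, 0.5)))
```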
pbinom()
This function gives the cumulative probability of an event, that is, the
probability of x or fewer successes. It is a single value for each x.
Example
# Probability of getting 26 or fewer heads from 51 tosses of a coin.
x <- pbinom(26,51,0.5)
print(x)
When we execute the above code, it produces the following result –
[1] 0.610116
qbinom()
This function takes a probability value and gives the smallest number of
successes whose cumulative probability is at least that value.
Example
# Number of heads at the 0.25 quantile (cumulative probability of at least
# 0.25) when a coin is tossed 51 times.
x <- qbinom(0.25,51,1/2)
print(x)
When we execute the above code, it produces the following result −
[1] 23
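The results of pbinom() and qbinom() can be cross-checked against each other and against dbinom(). This sketch reuses the 51-toss example.

```r
# pbinom() is the running sum of dbinom().
p_cum <- sum(dbinom(0:26, 51, 0.5))
print(p_cum)
print(pbinom(26, 51, 0.5))

# qbinom() returns the smallest count whose cumulative probability
# is at least the requested probability.
k <- qbinom(0.25, 51, 0.5)
print(k)
print(pbinom(k - 1, 51, 0.5))   # below 0.25
print(pbinom(k, 51, 0.5))       # at least 0.25
```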
rbinom()
This function generates the required number of random values from a binomial
distribution with a given number of trials and probability of success.
Example
# Generate 8 random values, each the number of successes in 150 trials
# with success probability 0.4.
x <- rbinom(8,150,.4)
print(x)
When we execute the above code, it produces a result like the following (the
values are random and will differ between runs) −
[1] 58 61 59 66 55 60 61 67

Continuous uniform distribution in R


A continuous uniform distribution is a probability distribution in which every
value in an interval from a to b is equally likely to be chosen. The
probability that we will obtain a value between x1 and x2 (where
a ≤ x1 ≤ x2 ≤ b) can be found using the formula:
P(obtain value between x1 and x2) = (x2 − x1) / (b − a)
The uniform distribution has the following properties:
• The mean of the distribution is μ = (a + b) / 2
• The variance of the distribution is σ² = (b − a)² / 12
• The distribution's standard deviation, or SD, is σ = √σ²
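The interval formula and the two properties can be checked numerically. In this sketch the interval a = 0 to b = 60 and the points x1 = 15, x2 = 30 are arbitrary choices, and the seed only makes the simulation repeatable.

```r
a <- 0; b <- 60
x1 <- 15; x2 <- 30

# Probability of a value between x1 and x2, from the formula and from punif().
p_formula <- (x2 - x1) / (b - a)
p_punif   <- punif(x2, min = a, max = b) - punif(x1, min = a, max = b)
print(c(p_formula, p_punif))

# Theoretical mean and variance.
mu     <- (a + b) / 2
sigma2 <- (b - a)^2 / 12

# A large random sample has mean and variance close to these values.
set.seed(1)
u <- runif(100000, min = a, max = b)
print(c(mu, mean(u)))
print(c(sigma2, var(u)))
```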
The dunif() function in R computes the uniform density function over the
specified interval (a, b).
Syntax:
dunif(x, min = 0, max = 1, log = FALSE)

Parameter:
• x: input sequence
• min, max: range of values
• log: logical; if TRUE, the logarithm of the density is returned.
One density value is returned for each element of x, so the result is a
vector.
Example 1:
# generating a sequence of values
x <- 5:10
print ("dunif value")

# calculating density function


dunif(x, min = 1, max = 20)
Output
[1] "dunif value"
[1] 0.05263158 0.05263158 0.05263158 0.05263158 0.05263158 0.05263158
All values are equal and this is the reason why it is called uniform distribution.
Let us plot it for a better picture.
Example 2:
min <- 0
max <- 100
# Specify x-values for the dunif function
xpos <- seq(min, max, by = 0.5)
# Compute the corresponding densities (y coordinates)
ypos <- dunif(xpos, min = 10, max = 80)
# Plot the density against the x-values
plot(xpos, ypos, type="o")

The punif() function in R is used to calculate the uniform cumulative
distribution function, that is, the probability of a variable X taking a value
less than or equal to x, P(X ≤ x). If we need the probability P(X > x), we can
calculate 1 – punif(x) or set lower.tail = FALSE.

Syntax:
punif(q, min = 0, max = 1, lower.tail = TRUE)

For min ≤ q ≤ max, the result is (q − min) / (max − min).
Example:
min <- 0
max <- 60
# calculating punif value
punif (15 , min =min , max = max)
Output
[1] 0.25
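For the same interval, the upper-tail probability and the inverse lookup with qunif() can be sketched as follows (the variable names are illustrative):

```r
# P(X > 15) on the interval 0 to 60, computed in two equivalent ways.
p_upper  <- punif(15, min = 0, max = 60, lower.tail = FALSE)
p_upper2 <- 1 - punif(15, min = 0, max = 60)
print(c(p_upper, p_upper2))   # both 0.75

# qunif() inverts punif(): the 0.25 quantile is 15.
q25 <- qunif(0.25, min = 0, max = 60)
print(q25)
```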
Example 2:
# Grid of X-axis values
x <- seq(-0.5, 1.5, 0.01)

# Uniform distribution between 0 and 1


plot(x, punif(x), type = "l", main = "Uniform CDF", ylab = "F(x)", lwd = 2,
     col = "red")

The qunif() function is used to calculate the quantile corresponding to any
probability p for a given uniform distribution, that is, the value x for which
P(X ≤ x) = p.

Syntax:
qunif(p, min = 0, max = 1)

Parameter:
• p: the vector of probabilities
• min, max: the limits of the distribution
Example
min <- 0
max <- 1
# Specify probabilities (x-values) for the qunif function
xpos <- seq(min, max, by = 0.02)
# Compute the corresponding quantiles (y coordinates)
ypos <- qunif(xpos, min = 10, max = 100)
# Plot the graph
plot(xpos, ypos)

The runif() function in R is used to generate a sequence of random numbers
following the uniform distribution.
Syntax:
runif(n, min = 0, max = 1)
Parameter:
• n: number of random samples
• min: minimum value (by default 0)
• max: maximum value (by default 1)
Example 1:
print("Random 15 numbers between 1 and 3")
runif(15, min=1, max=3)
Output (the values are random and will differ between runs)
[1] "Random 15 numbers between 1 and 3"
[1] 1.534 1.772 1.027 1.765 2.739 1.681 1.964 2.199 1.987 1.372 2.655 2.337
2.588 1.216 2.447
Example 2:
# n = 1000
hist(runif(1000), main = "n = 1000", xlim = c(-0.2, 1.25),
     xlab = "", prob = TRUE)
# Overlay the theoretical density on a grid of x values
x <- seq(-0.5, 1.5, 0.01)
lines(x, dunif(x), col = "red", lwd = 2)

Bernoulli Distribution

The Bernoulli distribution is a special case of the binomial distribution where
only a single trial is performed (n = 1). It is a discrete probability
distribution for a Bernoulli trial (a trial that has only two outcomes, i.e.,
either success or failure). For example, a fair coin toss can be represented as
a Bernoulli trial where the probability of getting a head is 0.5 and of getting
a tail is 0.5. The random variable takes the value 1 with probability p and the
value 0 with probability q = 1 − p.

The probability mass function f of this distribution, over possible outcomes k,
is given by:
$f(k; p) = p^k (1 - p)^{1 - k}, \quad k \in \{0, 1\}$
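Since a Bernoulli trial is a binomial experiment with a single trial, base R's binomial functions with size = 1 give the same results without any extra package. The sketch below assumes p = 0.7; bern_pmf is a hypothetical helper name, and the seed is arbitrary.

```r
p <- 0.7

# Hypothetical helper: Bernoulli probability mass function for k in {0, 1}.
bern_pmf <- function(k, p) p^k * (1 - p)^(1 - k)

print(bern_pmf(c(0, 1), p))                  # 0.3 0.7
print(dbinom(c(0, 1), size = 1, prob = p))   # same values from base R

# Ten random Bernoulli outcomes (each 0 or 1) using rbinom() with size = 1.
set.seed(123)
draws <- rbinom(10, size = 1, prob = p)
print(draws)
```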

dbern()
The dbern() function, provided by the Rlab package (not base R), returns the
probability mass function of the Bernoulli distribution.
Syntax: dbern(x, prob, log = FALSE)

Parameter:
• x: vector of quantiles
• prob: probability of success on each trial
• log: logical; if TRUE, probabilities p are given as log(p)
Example:
# Importing the Rlab library
library(Rlab)

# x values for the dbern() function


x <- c(0, 1, 3, 5, 7, 10)

# Using dbern() function to obtain the corresponding Bernoulli PDF


y <- dbern(x, prob = 0.5)

# Plotting dbern values


plot(x, y, type = "o")

pbern()
The pbern() function gives the cumulative distribution function of the
Bernoulli distribution.
Syntax: pbern(q, prob, lower.tail = TRUE, log.p = FALSE)

Parameter:
• q: vector of quantiles
• prob: probability of success on each trial
• lower.tail: logical; if TRUE (default), probabilities are P[X ≤ x],
otherwise, P[X > x]
• log.p: logical; if TRUE, probabilities p are given as log(p).
Example:
# import Rlab library
library(Rlab)

# x values for the pbern() function
x <- seq(0, 10, by = 1)
# Apply pbern() to x to obtain the corresponding Bernoulli CDF
y <- pbern(x, prob = 0.7)

# plot pbern values


plot(y, type = "o")

qbern()
qbern() gives the quantile function for the Bernoulli distribution. For a given
probability, the quantile function returns the smallest value of the random
variable such that the probability of the variable being less than or equal to
that value is at least the given probability.

Syntax: qbern(p, prob, lower.tail = TRUE, log.p = FALSE)

Parameter:
• p: vector of probabilities.
• prob: probability of success on each trial.
• lower.tail: logical; if TRUE (default), probabilities are P[X ≤ x],
otherwise, P[X > x]
• log.p: logical; if TRUE, probabilities p are given as log(p).
Example:
# import Rlab library
library(Rlab)
# x values (probabilities) for the qbern() function
x <- seq(0, 1, by = 0.2)
# Apply qbern() to x to obtain the corresponding Bernoulli quantiles
y <- qbern(x, prob = 0.5, lower.tail = TRUE, log.p = FALSE)

# plot qbern values


plot(y, type = "o")

rbern()
The rbern() function is used to generate a vector of random numbers which are
Bernoulli distributed.

Syntax: rbern(n, prob)

Parameter:
• n: number of observations.
• prob: probability of success on each trial.
Example:
# import Rlab library
library(Rlab)
set.seed(9999)
# sample size
N <- 100
# generate random variables using the rbern() function
random_values <- rbern(N, prob = 0.5)

# print the values


print(random_values)

# plot of randomly
# drawn density
hist(random_values,breaks = 10,main = "")

Output:

[1] 0 0 0 1 0 1 1 0 0 1 0 1 1 1 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0
[41] 1 0 1 0 1 1 0 1 1 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 0 1 1 0 1 1 0
[81] 1 0 0 0 1 0 0 1 1 0 1 1 0 1 1 1 1 1 0 1

Student t Distribution
The t-distribution, also known as the Student's t-distribution, is a type of
probability distribution that is similar to the normal distribution with its bell
shape but has heavier tails. It is used for estimating population parameters
when the sample size is small and the population variance is unknown.
The dt() function in R returns the value of the probability density function
(pdf) of the Student's t-distribution at x.
Syntax: dt(x, df)
Parameters:
• x is the quantiles vector
• df is the degrees of freedom (the degrees of freedom determine the shape of
the distribution; as df increases, it approaches the normal distribution)
Example:
x_dt <- seq(- 10, 10, by = 0.01)
y_dt <- dt(x_dt, df = 3)
plot(y_dt)
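The effect of the degrees of freedom can be seen by comparing dt() with dnorm(). The df values below are arbitrary choices for illustration.

```r
# Density at the center for increasing degrees of freedom.
center <- dt(0, df = c(1, 3, 30, 1000))
print(center)
print(dnorm(0))   # the t density approaches this value as df grows

# Heavier tails: far from the center, the t density exceeds the normal density.
print(dt(4, df = 3))
print(dnorm(4))
```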

The pt() function is used to get the cumulative distribution function (CDF) of
a t-distribution.
Syntax: pt(q, df, lower.tail = TRUE)
Parameter:
• q is the quantiles vector
• df is the degrees of freedom
• lower.tail – if TRUE (default), probabilities are P[X ≤ x], otherwise,
P[X > x].
Example:
x_pt <- seq(- 10, 10, by = 0.01) # Specify x-values for pt function
y_pt <- pt(x_pt, df = 3) # Apply pt function
plot(y_pt) # Plot pt values

The qt() function is used to get the quantile function, or inverse cumulative
distribution function, of a t-distribution.
Syntax: qt(p, df, lower.tail = TRUE)
Parameter:
• p is the vector of probabilities
• df is the degrees of freedom
• lower.tail – if TRUE (default), probabilities are P[X ≤ x], otherwise,
P[X > x].
Example:
x_qt <- seq(0, 1, by = 0.01) # Specify x-values for qt function
y_qt <- qt(x_qt, df = 3) # Apply qt function
plot(y_qt) # Plot qt values
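A typical use of qt() is to look up critical values, and pt() inverts it. In this sketch the choices df = 10 and probability 0.975 (a two-sided 5% level) are assumptions made for illustration.

```r
# Critical value that leaves 2.5% in the upper tail, for 10 degrees of freedom.
t_crit <- qt(0.975, df = 10)
print(t_crit)          # about 2.228

# The corresponding normal value is smaller, reflecting the heavier t tails.
z_crit <- qnorm(0.975)
print(z_crit)          # about 1.96

# Probability in both tails beyond the critical value.
two_tail <- pt(-t_crit, df = 10) + pt(t_crit, df = 10, lower.tail = FALSE)
print(two_tail)        # 0.05

# qt() and pt() are inverses of each other.
round_trip <- qt(pt(1.5, df = 3), df = 3)
print(round_trip)      # 1.5
```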
The rt() function is used to generate random deviates from a Student's
t-distribution.
Syntax: rt(n, df)
Parameter:
• n is the number of observations to generate
• df is the degrees of freedom
Example:
set.seed(91929) # Set seed for reproducibility
N <- 10000 # Specify sample size
y_rt <- rt(N, df = 3) # Draw N t-distributed values
y_rt # Print values to RStudio console
hist(y_rt, breaks = 100,main = "") # Plot of randomly drawn student t density
