
R Practicals

The document discusses various data structures and looping functions in R programming. It provides examples of creating different data structures like vectors, matrices, arrays, data frames and lists. It also demonstrates the use of mathematical, statistical and looping functions like apply, lapply and sapply on vectors and data frames. Indexing and subsetting of vectors and data frames is explained with examples.
A LAB Manual of

Elements of R Programming
BBA BUSINESS ANALYTICS, Sem-II
2022-23

Faculty
Dr. D. Srinivasa Rao
INDEX

P.No.  Name of the Experiment                                          Remarks
1      Introduction to R Programming: Creating Data Structures in R
2      Basic Mathematical and Statistical Functions in R
3      Indexing and Sub-setting of Data Frames
4      Looping Functions in R
5      Basic Graphs with R
6      Advanced Graphs with R Packages
7      Correlation and Regression with R
8      Sampling with R
9
10

Signature of Faculty
Declaration

The data and information created and presented in this manual are original to the best of my knowledge.

Signature of Student
Practical 1: Creating Data Structures in R
2. Concept: Data structures in R. R provides a number of data structures; a data structure is a framework for holding different types of data. Vectors, matrices and arrays hold elements of a single type, while data frames (columns of different types) and lists (any R objects) can hold mixed types.
There are 5 main data structures in R:
1. Vector
2. Matrix
3. Array
4. Data frame
5. List
3. Example: We shall create all five data structures.
4. Procedure:
- To create a vector we shall use the c() function
- To create a matrix we shall use the matrix() function
- To create an array we shall use the array() function
- To create a data frame we shall use the data.frame() function
- To create a list we shall use the list() function
5. R code:

##############################
#Data structure in R program
#############################
#Creating a vector
# vector of Doubles
x=c(1.1,2,9,3.4,4.7)
class(x)
#Vector of integers
Y=c(1L,3L,5L)
class(Y)
# character vector
z=c('a','b','c')
class(z)
# logical vector
A=c(T,F,T)
class(A)
# complex vector
B=complex(real=c(1,2,3), imaginary=c(2,3,1))
class(B)
# creating matrices
mymat=matrix(1:10,nrow=2)
mymat
class(mymat)
# creating an Array
myarr=array(1:12,dim=c(2,2,3))
myarr
class(myarr)
#data frame
x1=c(1,2,3,5)
y1=c('a','b','c','d')
z1=c(T,F,T,F)
mydf=data.frame(x1,y1,z1)
mydf
class(mydf)
# creating a list
mylist=list(c(12,3), matrix(1:10,2), mydf, myarr)
mylist
class(mylist)
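A short sketch of how the five structures differ in the kinds of data they can hold. The object names (v, m, df, lst) are illustrative only; the calls are base R, and the character column result assumes R 4.0 or later, where strings are no longer converted to factors by default.

```r
# Vectors hold a single type: mixing numbers and characters coerces everything to character
v <- c(1, "a", TRUE)
class(v)            # "character"

# A matrix is a vector with a dim attribute (rows x columns)
m <- matrix(1:6, nrow = 2)
dim(m)              # 2 3

# A data frame can hold columns of different types
df <- data.frame(id = 1:3, name = c("a", "b", "c"), flag = c(TRUE, FALSE, TRUE))
sapply(df, class)   # id "integer", name "character", flag "logical"

# A list can hold any R objects, including other data structures
lst <- list(num = c(12, 3), mat = m, frame = df)
str(lst)
```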

Screenshot of R syntax:
Practical 2: Basic Mathematical and Statistical Functions in R
2. Concept: Built-in Mathematical and Statistical Functions in R
The following are some of the mathematical functions in R:
- sum() # sum of numbers
- prod() # product of numbers
- seq() # sequence of numbers
- rep() # repeating an input
- min() # minimum of a numeric vector
- max() # maximum of a numeric vector
- log() # logarithm (natural logarithm, base e, by default)
- exp() # exponential function: e raised to the given power
- abs() # absolute value
- length() # no. of elements in a vector
- dim() # no. of rows and columns in a data frame
- sqrt() # square root
- factorial() # factorial of a given number
- choose() # number of combinations
- rank() # ranks of the numbers
The following are some of the statistical functions in R:
- mean() # arithmetic mean
- median() # median
- sd() # sample standard deviation of a vector
- var() # sample variance of a vector
- quantile() # quantiles (quartiles by default)
- skew() # skewness (from the psych package, not base R)
- range() # minimum and maximum of a vector
- cor() # correlation
- summary() # descriptive summary of data frames
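The sample variance and standard deviation returned by var() and sd() use the n - 1 denominator. A minimal sketch checking this by hand on the vector used in the variance example of this practical (the names n, var_manual and sd_manual are illustrative):

```r
x <- c(12, 10, 20, 31, 21)
n <- length(x)

# Sample variance: sum of squared deviations from the mean divided by (n - 1)
var_manual <- sum((x - mean(x))^2) / (n - 1)
# Standard deviation is the square root of the variance
sd_manual <- sqrt(var_manual)

var_manual                       # 69.7
all.equal(var_manual, var(x))    # TRUE
all.equal(sd_manual, sd(x))      # TRUE
```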
3. Example:
sum(1:100) # sum of the first 100 natural numbers
prod(1:10) # product of the first 10 natural numbers
seq(1, 30) # sequence from 1 to 30
rep(c(1,2,3),3) # repeating an input 3 times
min(c(2,10,20,30)) # minimum of a numeric vector
max(c(10,30,100,7)) # maximum of a numeric vector
log(10) # natural logarithm (base e) of 10
exp(20) # e raised to the power 20
abs(c(-2,-3,1)) # absolute values of a vector with negative elements
length(c(2,2,8,9,10)) # no. of elements in the vector
dim(trees) # no. of rows and columns in the trees data frame
sqrt(169) # square root of 169
factorial(5) # factorial of 5
choose(5,3) # 5C3
rank(c(2,10,12,7,9,3,14)) # ranks of the elements in the vector
# Statistical functions:
mean(1:10) # mean of the first 10 natural numbers
median(20:40) # median of the numbers from 20 to 40
sd(c(2,2,3,10,10)) # standard deviation of a vector
var(c(12,10,20,31,21)) # variance of a vector
quantile(c(1,3,5,9,10)) # quartiles of a vector
library(psych) # skew() comes from the psych package
skew(c(23,13,14,20,12)) # skewness
range(c(2,10,34,23,18)) # range (minimum and maximum)
cor(x=c(12,18,23,12,10),y=c(1,3,8,4,12)) # correlation between x and y
summary(trees) # descriptive summary of the trees data frame
4. Procedure: We shall execute them in an R script.
5. R code output:
####################################
## Lab 2: Mathematical functions in R
####################################
sum(1:100) # sum of the first 100 natural numbers
prod(1:10) # product of the first 10 natural numbers
seq(1, 30) # sequence from 1 to 30
rep(c(1,2,3),3) # repeating an input 3 times
min(c(2,10,20,30)) # minimum of a numeric vector
max(c(10,30,100,7)) # maximum of a numeric vector
log(10) # natural logarithm (base e) of 10
exp(20) # e raised to the power 20
abs(c(-2,-3,1)) # absolute values of a vector with negative elements
length(c(2,2,8,9,10)) # no. of elements in the vector
dim(trees) # no. of rows and columns in the trees data frame
sqrt(169) # square root of 169
factorial(5) # factorial of 5
choose(5,3) # 5C3
rank(c(2,10,12,7,9,3,14)) # ranks of the elements in the vector
################################
## Statistical functions in R
################################
mean(1:10) # mean of the first 10 natural numbers
median(20:40) # median of the numbers from 20 to 40
sd(c(2,2,3,10,10)) # standard deviation of a vector
var(c(12,10,20,31,21)) # variance of a vector
quantile(c(1,3,5,9,10)) # quartiles of a vector
library(psych) # skew() is in the psych package; install.packages("psych") if needed
skew(c(23,13,14,20,12)) # skewness
range(c(2,10,34,23,18)) # range (minimum and maximum)
cor(x=c(12,18,23,12,10),y=c(1,3,8,4,12)) # correlation between x and y
summary(trees) # descriptive summary of the trees data frame
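A few results of the script above can be confirmed with stopifnot(). Note that range() returns the minimum and the maximum, so the single-number statistical range is diff(range(x)). A small check, with r as an illustrative name:

```r
stopifnot(
  sum(1:100) == 5050,       # 100 * 101 / 2
  prod(1:10) == 3628800,    # 10!
  choose(5, 3) == 10,       # 5! / (3! * 2!)
  median(20:40) == 30,
  mean(1:10) == 5.5
)

r <- range(c(2, 10, 34, 23, 18))
r          # 2 34 (minimum and maximum)
diff(r)    # 32, the range as a single number

# rank() gives the position of each element in sorted order
rank(c(2, 10, 12, 7, 9, 3, 14))   # 1 5 6 3 4 2 7

# quantile() returns the quartiles by default (0%, 25%, 50%, 75%, 100%)
quantile(c(1, 3, 5, 9, 10))       # 1 3 5 9 10
```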

Screen shot of R syntax:


Practical-3: Indexing and Sub-setting of Data Frames
1. Concept:

Indexing or sub-setting means extracting a part of the data. The data could be a vector or a data frame.
- To subset a vector we use the [] operator
- To subset a data frame we use three methods:
1. The $ operator (extracts a single column)
2. The [i,j] operator, 'i' for rows and 'j' for columns
3. The subset() function

2. Examples & Procedure:

#############################################
## Practical-3: Indexing and sub-setting of data frames
#############################################
## creating a numeric vector x and a character vector y
x=c(1,3,6,9,3,3)
y=c('a','b','c')
## subset the first three elements of x
x[1:3]
## subset the third and fifth elements of x
x[c(3,5)]
## subset the last element of y
y[3]
## subsetting the built-in data frame 'trees'
## method 1: subsetting a column with the $ operator
## subset the 'Volume' column of the trees data frame
trees$Volume
## subset the 'Species' column of the iris data frame
iris$Species
## method 2: using the [i,j] operator
## subset the first 12 rows of the trees data frame
trees[1:12,]
## subset rows 1, 4, 5 and 9 of the trees data frame
trees[c(1,4,5,9),]
## subset the first two columns of the trees data frame
trees[,c(1:2)]
## subset the first three rows and the first two columns
trees[c(1:3),c(1:2)]
## negative subsetting: drop rows 4 to 31 and column 3 (Volume)
trees[-(4:31),-3]
## method 3: the subset() function
## find trees with Height > 75
subset(trees, Height>75)
## find trees with Height > 75 and Volume > 18
subset(trees, Height>75 & Volume>18)
## find trees with Height > 75, Volume > 18 and Girth < 18
subset(trees, Height>75 & Volume>18 & Girth<18)
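Two further indexing forms follow the same rules: negative indices drop elements, and a logical condition keeps the elements where it is TRUE. The sketch below also checks that a logical condition inside [i,j] selects the same rows as subset() for the Height > 75 example; tall_bracket and tall_subset are illustrative names.

```r
x <- c(1, 3, 6, 9, 3, 3)

x[-1]          # drop the first element: 3 6 9 3 3
x[x > 3]       # keep elements greater than 3: 6 9
x[length(x)]   # last element, whatever the length of x: 3

# Logical row selection with [i, j] versus subset() on the built-in trees data
tall_bracket <- trees[trees$Height > 75, ]
tall_subset  <- subset(trees, Height > 75)
isTRUE(all.equal(tall_bracket, tall_subset))   # TRUE

# subset() can also pick columns through its select argument
subset(trees, Height > 75, select = c(Girth, Volume))
```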
3.Output:
Practical-4: Looping Functions in R

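1. Concept:

Looping means repeatedly applying a function to the elements of a vector, list or data frame. In R programming there are several looping functions:
1. tapply()
2. apply()
3. lapply()
4. sapply()

o tapply() applies a function to a vector separately for each group defined by a factor.
o apply() is used on a matrix or a data frame (converted to a matrix), row-wise (MARGIN = 1) or column-wise (MARGIN = 2).
o lapply() applies a function to each element of a list or vector; on a data frame it works on each column, and it always returns a list.
o sapply() is a simplified version of lapply() that returns a vector or matrix when possible.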

2.Examples and Procedure:


#############################################
##Practical-4: Looping functions in R
#############################################
## tapply() function
## find the group-wise (Species) arithmetic mean of Sepal.Length in the iris data
tapply(iris$Sepal.Length,iris$Species,mean)
## find the group-wise median of Petal.Width in the iris data
tapply(iris$Petal.Width,iris$Species,median)
## apply() function (MARGIN = 2 means column-wise)
## find the standard deviation of all variables in the trees data frame
apply(trees,2,sd)
## find the summary of all variables in the trees data frame
apply(trees,2,summary)
### lapply() function - applies the function to each column
## the output is a list
lapply(trees,var)
## sapply() - similar to lapply() but simplifies the output to a vector
sapply(trees,var)
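A minimal sketch of the four functions on tiny hand-made inputs, so the results can be checked by eye. The vectors scores and group, the matrix m and the list lst are illustrative and not part of the manual's data sets.

```r
# Small illustrative inputs
scores <- c(10, 20, 30, 40)
group  <- c("a", "b", "a", "b")

# tapply(): apply mean() to scores separately within each group
tapply(scores, group, mean)        # a = 20, b = 30

m <- matrix(1:6, nrow = 2)         # rows: 1 3 5 and 2 4 6
apply(m, 1, sum)                   # row sums: 9 12
apply(m, 2, sum)                   # column sums: 3 7 11

lst <- list(a = 1:3, b = 4:6)
lapply(lst, mean)                  # a list: $a 2, $b 5
sapply(lst, mean)                  # a named numeric vector: a 2, b 5
```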

3.Output:
Practical-5: Basic Graphs with R

1. Concept:

- Graphs and charts convert numbers into visuals.
- The type of graph depends on the type of data.
- We have two types of data: quantitative and qualitative.
- We have three types of graphs and charts:
o Univariate
o Bivariate
o Multivariate
- From the above classification we have:
o Plots for univariate quantitative variables
o Plots for univariate qualitative variables
o Plots for bivariate quantitative variables
o Plots for bivariate qualitative variables
o Plots for multivariate quantitative variables

2. Examples and Procedures:

- Plots for univariate quantitative variables:
o Histogram
o Box plot
o Density plot
- Plots for univariate qualitative variables:
o Pie chart
o Bar chart
- Plots for bivariate quantitative variables:
o Scatter plot
- Plots for bivariate qualitative variables:
o Stacked bar plot
o Side-by-side bar plot
- Plots for multivariate quantitative variables:
o Heat map
3. R code:
#################################
### Practical-5: Basic Graphs with R
#################################
### Plots for univariate quantitative variables
## Histogram
hist(trees$Girth)
## Box plot
boxplot(iris$Sepal.Length)
## Density plot
plot(density(chickwts$weight))
### Plots for univariate qualitative variables
## Pie chart
pie(table(chickwts$feed))
## Bar chart
barplot(table(chickwts$feed))
### Plots for bivariate quantitative variables
## Scatter plot
plot(iris$Sepal.Length,iris$Sepal.Width)
### Plots for bivariate qualitative variables
## create gender and religion variables (90 observations each)
gender=rep(c('m','f','m'),30)
religion=rep(c('H','M','C','O'),c(50,20,10,10))
## stacked bar diagram
barplot(table(gender,religion))
## side-by-side bar plot
barplot(table(gender,religion),beside = TRUE)
### Multivariate plots
## Heat map of the correlation matrix (cor.plot() is from the psych package)
library(psych)
cor.plot(trees)
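Both bivariate bar plots above are drawn from the same two-way frequency table. Printing the table shows what each bar represents; the sketch re-creates the gender and religion vectors from the script, and tab is an illustrative name.

```r
gender   <- rep(c('m', 'f', 'm'), 30)                        # 90 values
religion <- rep(c('H', 'M', 'C', 'O'), c(50, 20, 10, 10))   # 90 values

tab <- table(gender, religion)
tab
# Each column of tab is one bar; in the stacked plot the gender counts
# are piled on top of each other, with beside = TRUE they stand side by side.

# Column totals give the full height of each stacked bar
colSums(tab)   # C 10, H 50, M 20, O 10
```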
4. Screenshot of R syntax:
5. Graphs:
# Univariate quantitative variables:
a. Histogram:
b. Box plot:
c. Density plot:
# Univariate qualitative variables:
a. Pie chart:
b. Bar plot:
# Bivariate quantitative variables:
a. Scatter plot:
# Bivariate qualitative variables:
a. Stacked bar diagram:
b. Side-by-side bar plot:
# Multivariate plots for quantitative variables:
a. Heat map:
Practical-6: Advanced Graphs with R Packages
1. Concept:
- In R there are two widely used packages for advanced graphs: lattice and ggplot2.
- The lattice package is based on the grid graphics system.
- The ggplot2 package is based on the grammar of graphics.
- We can draw both basic and advanced graphs with these packages.

2. Examples and Procedure:

Lattice package:
- The lattice package is built on the grid graphics system.
- It uses a formula interface.
- Lattice syntax format: (DV ~ IV | group, data); for univariate plots the DV is left out, e.g. (~ IV | group, data).
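A short sketch of the formula interface on built-in data. The object names p1, p2 and p3 are illustrative. Lattice functions return "trellis" objects, which are drawn when printed (automatically at the console, explicitly with print() inside scripts or loops).

```r
library(lattice)   # ships with R as a recommended package

# Bivariate form DV ~ IV: Height (y-axis) against Girth (x-axis)
p1 <- xyplot(Height ~ Girth, data = trees)

# Conditioning with |: one panel of Sepal.Length against Sepal.Width per Species
p2 <- xyplot(Sepal.Length ~ Sepal.Width | Species, data = iris)

# Univariate form ~ IV | group: one histogram panel per Species
p3 <- histogram(~ Petal.Length | Species, data = iris)

class(p2)   # "trellis"
print(p2)   # draws the three-panel plot
```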

ggplot2 package:
- Based on the grammar of graphics.
- It uses three components in the syntax:
o Data
o Aesthetics (aes)
o Geometry (geom)
3.R-code:
#############################
## Graphs with lattice package
## Graphs for quantitative variables
# Histogram
library(lattice)
histogram(~ Height,data=trees)
## Grouped Histogram
histogram(~Sepal.Length|Species,data=iris)
## Density plot
densityplot(~Sepal.Width,data=iris)
## grouped density plots
densityplot(~Sepal.Width|Species,data=iris)
## Box plot
bwplot(~Petal.Length,data=iris)
## Grouped Box plots
bwplot(~Petal.Length|Species,data=iris)
# Scatter plot
xyplot(Height~Girth,data=trees)
## Bar plot
barchart(~Sepal.Length|Species,data=iris)
######################
##plots with ggplot2
#####################
## univariate plots
## plots in Quantitative variables
library(ggplot2)
##Histogram
ggplot(data=trees,aes(x=Height))+geom_histogram(bins=10)
## Box plot
ggplot(data=trees,aes(x=Girth))+geom_boxplot()
##Density plot
ggplot(data=trees,aes(x=Volume))+geom_density()
##plots for qualitative variable
## Bar plot
ggplot(data=chickwts,aes(x=feed))+geom_bar()
##Pie chart
ggplot(data=chickwts,aes(x=feed))+geom_bar()+coord_polar()
##Bivariate plots
## Scatter plot
ggplot(data=trees,aes(x=Height,y=Volume))+geom_point()
4.Screen shot of R syntax:

5.Graphs:

# Lattice Package

#Quantitative variables

1.Histogram:
2.Grouped Histogram:

3.Density plot:
4.Grouped Density Plot:

5.Box plot:
6.Grouped Box plots:

7.Scatter plot:
8.Bar plot:

## ggplot2 package

## Univariate plots

##Quantitative variables
1. Histogram:

2. Box plot:

3.Density plot:
# Plots for Qualitative Variables:
1. Bar plot:

2. Pie chart:
##Bivariate plot:

1. Scatter plot:
Practical 7: Correlation and Regression with R

1. Concept:

a) Correlation:
- Correlation is a measure of association between two numeric variables; Pearson's coefficient measures linear association.
- Correlation can be positive, negative or zero.
- We can measure correlation by two methods:
o Pearson's correlation coefficient
o Spearman's rank correlation coefficient (Pearson's coefficient computed on the ranks; it measures monotonic association)
- The correlation between two variables after removing the effect of one or more other variables is called partial correlation.
b) Regression:
- Regression tries to find a functional relationship between a dependent variable and one or more independent variables.
- Regression is of two types: linear and non-linear.
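A short sketch of two points above on the built-in mtcars data: Spearman's coefficient equals Pearson's coefficient computed on the ranks, and a partial correlation can be obtained by correlating the residuals left after regressing each variable on the controlling variable. Controlling for wt (car weight) is an illustrative choice, not part of the manual's exercise; rho, res_mpg, res_hp and partial_r are illustrative names.

```r
x <- mtcars$mpg
y <- mtcars$hp

# Spearman's rho is Pearson's r applied to the ranks of the data
rho <- cor(x, y, method = "spearman")
all.equal(rho, cor(rank(x), rank(y)))   # TRUE

# Partial correlation of mpg and hp, controlling for wt:
# remove the linear effect of wt from each variable, then correlate what is left
res_mpg   <- residuals(lm(mpg ~ wt, data = mtcars))
res_hp    <- residuals(lm(hp ~ wt, data = mtcars))
partial_r <- cor(res_mpg, res_hp)
partial_r
```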

2. Example:

- We shall consider the mtcars data from base R.
- The R function cor(mtcars) gives Pearson's correlation matrix; cor(mtcars, method = "spearman") gives Spearman's rank correlation matrix.
## Simple Linear Regression Model:
- We shall use the trees data set for this.
- DV = Volume, IV = Height
- The R function for linear regression is lm()

3. Procedure: We shall execute them in an R script.

>cor(mtcars) # Correlation matrix
## Simple Linear Regression Model
>Model=lm(Volume~Height, data=trees)
## Summary of Model
>summary(Model)
## Finding MSE (mean of the squared residuals)
>mse=mean((Model$fitted.values-trees$Volume)^2)
## Multiple Linear Regression Model
# DV = Volume, IVs = Height, Girth
>Model=lm(Volume~Height+Girth, data=trees)
## Summary of Model
>summary(Model)
## Finding MSE
>mse=mean((Model$fitted.values-trees$Volume)^2)
4. R-Output:

## Finding correlation coefficients
## Pearson correlation coefficient between mpg and hp of mtcars data
cor(mtcars$mpg,mtcars$hp)
## Spearman's rank correlation between mpg and hp
cor(mtcars$mpg,mtcars$hp,method = 'spearman')
## Correlation matrix
cor(mtcars)

## Simple Linear Regression
Model=lm(Volume~Height,data=trees)
summary(Model)
## R-squared value is 0.3579
## Finding MSE
MSE=mean((Model$fitted.values-trees$Volume)^2)
MSE
## We find R-squared as 0.3579 and MSE as 167.89

## Multiple Linear Regression
Model=lm(Volume~Height+Girth,data=trees)
## Summary
summary(Model)
## We find the adjusted R-squared value is 0.9442
## Finding MSE
MSE=mean((Model$fitted.values-trees$Volume)^2)
MSE
## We find that MSE is 13.61
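The MSE can be computed either from the residuals or from fitted minus observed values, and for a simple linear regression with an intercept the R-squared equals the squared Pearson correlation between the two variables. A minimal check on the simple model above (model, mse1 and mse2 are illustrative names):

```r
model <- lm(Volume ~ Height, data = trees)

# MSE: mean of squared residuals (observed minus fitted values)
mse1 <- mean(residuals(model)^2)
mse2 <- mean((trees$Volume - fitted(model))^2)
all.equal(mse1, mse2)   # TRUE

# With one predictor and an intercept, R-squared equals the squared Pearson correlation
all.equal(summary(model)$r.squared, cor(trees$Volume, trees$Height)^2)   # TRUE

# summary() reports both the multiple and the adjusted R-squared
summary(model)$adj.r.squared
```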

5. R-Syntax:
Practical-8: Sampling with R

1. Concept:
i. Sampling:
- Sampling means taking a sample from a population, with or without replacement.
ii. Types of sampling:
- There are two types of sampling:
o Probability sampling
o Non-probability sampling
iii. Probability sampling: It is based on the rules of probability; every unit has a known chance of being selected. There are four common types of probability sampling:
o Simple random sampling
o Systematic random sampling
o Stratified random sampling
o Cluster sampling
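Cluster sampling is listed above but not demonstrated in this practical. A minimal sketch, assuming the feed groups of the built-in chickwts data are treated as clusters: a few whole groups are chosen at random and every chick in them is kept. The choice of 2 clusters, the seed and the object names are arbitrary.

```r
set.seed(1)                                   # arbitrary seed for a reproducible draw
clusters <- levels(chickwts$feed)             # the 6 feed groups act as clusters
chosen   <- sample(clusters, size = 2)        # select 2 whole clusters at random

# Keep every unit belonging to the chosen clusters
cluster_sample <- chickwts[chickwts$feed %in% chosen, ]
chosen
table(droplevels(cluster_sample$feed))
```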

2. Procedure: We shall execute three different sampling methods in R.
# Simple random sampling with/without replacement of a vector.
###############################
## Practical 8 - sampling with R
###############################
## simple random sampling
## create a population of the first 100 natural numbers
pop=1:100
## random sample of 20 members without replacement
sample(pop,size = 20)
## random sample of 20 members with replacement
sample(pop,size = 20,replace = TRUE)
## systematic random sampling
## from a population of 20 natural numbers take a systematic random sample of 3 elements
## N = population size, n = sample size
obtain_sys=function(N,n){
  k=floor(N/n)         # sampling interval; floor() keeps every selected unit within 1..N
  r=sample.int(k,1)    # random start between 1 and k
  seq(r,r+k*(n-1),k)}  # every k-th unit from the random start
obtain_sys(20,3)
## stratified random sampling
## we shall do stratified random sampling on the chickwts data
## size gives the number of chicks drawn from each feed group
library(sampling)
sampling::strata(chickwts,stratanames = c('feed'),size=c(4,4,3,2,3,3))
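The same idea can be sketched in base R without the sampling package: split the data into strata and draw a simple random sample within each. This is an alternative illustration, not the sampling package's method; the sample size of 2 per feed group, the seed and the object names are assumptions.

```r
set.seed(1)                       # arbitrary seed for a reproducible draw
n_per_stratum <- 2                # assumed sample size in each feed group

# Split the data into strata (one data frame per feed), then sample within each
strata_list  <- split(chickwts, chickwts$feed)
strat_sample <- do.call(rbind, lapply(strata_list, function(d) {
  d[sample(nrow(d), n_per_stratum), ]   # simple random sample without replacement
}))
table(strat_sample$feed)          # 2 chicks from every feed group
```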
3. R-output screen shots:
Practical-9: Time Series Analysis: Decomposition

1. Concept:
- Decomposition of a time series means separating the series into its components.
- There are two types of decomposition:
o Additive: T+C+S+R
o Multiplicative: T*C*S*R
o Here T is the trend, C the cycles, S the seasonality and R the random component.
2. Procedure:
- From a given time series we calculate the trend, seasonality and cycles separately, and then remove these three components from the series (by subtraction in the additive model, by division in the multiplicative model) to get the random component.
3. Example:
- We shall consider the built-in R data set 'JohnsonJohnson' for decomposition.
- R code for the decomposition of the 'JohnsonJohnson' data:

4. R-CODE:
#############
## Lab-9
#############
## Decomposition of time series data
## data: JohnsonJohnson
#######################
# Loading the data
data("JohnsonJohnson")
## Structure of the data
str(JohnsonJohnson)
## it is quarterly time series data
## Examining the time series plot
plot(JohnsonJohnson)
## we find that the seasonal variation in the data is increasing over time
## so we shall use the multiplicative model: T*S*R
## (decompose() does not separate cycles; they remain part of the trend)
mod=decompose(JohnsonJohnson,type="multiplicative")
## Finding the trend, seasonal and random components
mod_trend=mod$trend
mod_season=mod$seasonal
mod_rand=mod$random
## plotting the multiplicative decomposition of the JohnsonJohnson data
plot(mod)
## the plot shows the observed series along with the trend, seasonal and random components
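In the multiplicative model the series should equal trend * seasonal * random wherever all three are available (the moving-average trend is NA at both ends of the series). A minimal check on the same data; recon and ok are illustrative names.

```r
mod <- decompose(JohnsonJohnson, type = "multiplicative")

# Rebuild the series from its components and compare where no NA is present
recon <- mod$trend * mod$seasonal * mod$random
ok <- !is.na(recon)
all.equal(as.numeric(recon[ok]), as.numeric(JohnsonJohnson[ok]))   # TRUE

frequency(JohnsonJohnson)   # 4 observations per year (quarterly)
mod$figure                  # the four seasonal factors
```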

5. R-output:
