Nishant R File
A Practical File on Data Analytics Using R Lab
(f) repeat all the values in colour vector exactly twice
(g) access multiple elements simultaneously in colour vector
9. Check and justify the outcome of the following expressions.
Code:
a) sqrt(7)^2==7
b) sqrt(4)^2==4
c) near(sqrt(3)^2,3)
d) near(sqrt(4)^2,5)
15. Write a program using tapply to explore population by region.
16. Write a script file to compute the following for the numeric variables
in the Boston dataset.
a) sum
b) range
c) mean
a) find the character count in each name
b) find "bruno" and "stuart" in student
22. Create an empty vector of factor datatype for the names of the first 6
months of a year. Remember to keep the levels in order of month from
Jan-June.
23. Create a vector to store the grades of 20 students for the first minor
exam. Grades are given as {A,B,C,D}. Compute the modal grade. Further,
store the grades of the same students for the second minor exam. Compare
the grades for the 2 exams. Count the number of students who have got a
higher grade.
PROGRAM – 01
Install R and then RStudio. Get yourself acquainted with the GUI of various
working windows of RStudio.
Step 2: Open the downloaded installer and click Next through the setup wizard, following the
instructions it gives.
Step 3: Select the destination location.
Step 5: Interact with the R GUI.
Step 2: Click Next to set up RStudio.
Step 4: Choose the Start Menu folder.
Interaction with the GUI of RStudio:
PROGRAM – 02
SOURCE CODE
#BTech CSE
a<-38L #Integer
class(a)
b<-38 #Numeric
class(b)
comp<-38+24i #Complex
class(comp)
l<-TRUE #Logical
class(l)
ch<-"STUDENT" #Character
class(ch)
Output: -
# Type Conversion
x<-as.character(l)
class(x)
y<-as.complex(b)
class(y)
z<-as.numeric(a)
class(z)
n<-as.logical(a)
class(n)
m<-as.integer(b)
class(m)
Output: -
# Mathematical Operation
23+45
67-34
244/4
8*5
min(45,43,67,98,332)
max(34,57,76,12,98)
sum(23,5,32,89)
mean(c(23,87,65,34)) #mean() averages only its first argument, so pass one vector
sqrt(144)
ceiling(7.9)
floor(9.3)
Output: -
PROGRAM – 03
R-code:
# B.TECH CSE
# create two vectors and perform multiple operations
x1<-c(23,34,45,67)
x2<-c(98,87,76,54)
x1+x2 # Element wise addition
x1-x2 # Element wise subtract
x1*x2 # Element wise multiplication
paste(x1,x2) #pastes corresponding elements together as strings
sum(x1,x2) #sum of all elements of both vectors
avg<-mean(c(x1,x2)) #average of the concatenated vector
avg
Output: -
PROGRAM – 04
Create a vector of all those values from 1:100 that are divisible by 5
and do the following operations on the vector (x):
(a) find the length of vector x
(b) print the value stored at 10th, 15th, 20th location of vector (x)
(c) find the sum, mean, median, standard deviation of vector (x)
(d) create another vector with name colour and print its value
(e) change the value of 5th location in colour vector
(f) repeat all the values in colour vector exactly twice
(g) access multiple elements simultaneously in colour vector
# B.TECH CSE
#create vector of the value from 1:100 divisible by 5
x<-seq(from=5,to=100,by=5)
x
#a) find the length of vector x
length(x)
#b) print the value of 10th, 15th, 20th location
x[c(10,15,20)]
#c) find the sum, mean, median, standard deviation
sum(x)
sum(x)/length(x) #mean
median(x) #median
sd(x) #standard deviation
#d) create another vector name color and print its value
color<-c("Blue","Red","Green","Yellow","Black")
color
#e) change 5th location in color vector
color[5]<-"orange"
color
#f) repeat all the values in color exactly twice
color_twice<-rep(color,each=2) #a variable named c would mask the c() function
color_twice
#g) access multiple elements simultaneously in color vector
color[c(1,2,4)] #an index vector selects several elements in one step
Output: -
PROGRAM – 05
Create a list of students' names and perform the following operations:
(a) access an element from the list
(b) change the item value at 2nd and 3rd location
(c) find out the length of the list
(d) check if an item exists in the list or not
(e) add a new student's name to the list
(f) access the elements of the list through a loop
(g) make another list of students and merge it with the original list
# B.TECH CSE
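The source code for this program is missing from the file. A minimal sketch of the seven operations might look like the following. The student names and the variable names students, more_students and all_students are hypothetical and do not come from the original.

```r
# Hypothetical data: these names are illustrative, not from the original file
students <- list("asha", "ravi", "meena", "kabir")

# (a) access an element from the list
students[[1]]

# (b) change the item values at the 2nd and 3rd locations
students[2:3] <- list("rohan", "sara")

# (c) find the length of the list
length(students)

# (d) check whether an item exists in the list
"sara" %in% students

# (e) add a new student's name to the list
students <- append(students, "tara")

# (f) access the elements of the list through a loop
for (s in students) {
  print(s)
}

# (g) make another list of students and merge it with the original
more_students <- list("vikram", "zoya")
all_students <- c(students, more_students)
length(all_students)
```

Double brackets ([[ ]]) return the element itself, while single brackets return a sub-list, which is why (a) and (b) use different forms.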
Output: -
PROGRAM – 6
Apply the summary() command to the iris dataset of the 'datasets' package
and interpret the output.
Code:
#cse
summary(iris)
Output: -
Interpretation:
The summary() function reports the distribution of each variable: the minimum, quartiles, median,
mean and maximum of the four numeric measurements, and the count of each level of Species. There
are 50 observations for each of the three species. The iris dataset contains four features
(length and width of sepals and petals) for 50 samples of each of three species of iris (Iris setosa,
Iris virginica and Iris versicolor). These measurements were used to build a linear discriminant
model to classify the species. The dataset is often used in data mining, classification and
clustering examples and to test algorithms.
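The count of 50 observations per species can be checked directly. A quick sketch using only base R and the built-in iris data follows; the variable name species_counts is an assumption made for illustration.

```r
# Count the observations per species in the built-in iris dataset
species_counts <- table(iris$Species)
species_counts

# Check that every species appears exactly 50 times
all(species_counts == 50)
```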
PROGRAM -7
Use plot(iris) function and interpret the output. Write down your
findings about the dataset.
Code:
#cse
plot(iris)
Output: -
Interpretation:
We can see that the data frame contains 5 columns and 150 rows. plot(iris) draws a scatterplot
matrix with one panel for every pair of columns.
The iris dataset is considered the "Hello world" of data science. It contains five columns, namely
Petal Length, Petal Width, Sepal Length, Sepal Width and Species. Iris is a flowering plant;
researchers measured various features of different iris flowers and recorded them digitally.
We can use a support vector machine (SVM) or a neural network for further classification of the
iris dataset.
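The size and column types of the data frame can be confirmed from the console before plotting. A small sketch using only base R:

```r
# Rows and columns of the built-in iris data frame
dim(iris)
names(iris)

# The four measurements are numeric; Species is a factor
sapply(iris, class)
```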
PROGRAM – 8
Install and load ‘MASS’ package and access the Boston dataset. Study the dataset from the
resources available on the internet and write what you can find relevant to the dataset.
Code:
#cse
install.packages("MASS")
library("MASS")
head(Boston)
summary(Boston)
Output: -
Relevant data:
The Boston data frame has 506 rows and 14 columns. For an easy overview of the Boston dataset,
we use head(Boston), which shows its first 6 rows. The dataset gives housing values in the
suburbs of Boston along with relevant information about each area, such as the per capita crime
rate (crim), the proportion of residential land zoned for lots over 25,000 sq.ft (zn), the
nitrogen oxides concentration in parts per 10 million (nox), etc.
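PROGRAM - 9
Check and justify the outcome of the following expressions.
Code:
library(dplyr) #near() is provided by the dplyr package
sqrt(7)^2==7 #a)
sqrt(4)^2==4 #b)
near(sqrt(3)^2,3) #c)
near(sqrt(4)^2,5) #d)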
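To justify the outcomes: == compares floating-point values exactly, so rounding error in sqrt() can make a mathematically true comparison come out FALSE. A tolerance-based comparison, which is the idea behind near() from the dplyr package, absorbs that error. The sketch below uses base R only. The helper name is_close and its tolerance are assumptions chosen for illustration, not dplyr's own code.

```r
# Hypothetical helper: TRUE when x and y differ by less than tol
is_close <- function(x, y, tol = 1e-8) {
  abs(x - y) < tol
}

# Show the rounding error left after squaring the square root
sqrt(7)^2 - 7
sqrt(4)^2 - 4   # sqrt(4) is exactly 2, so there is no error here

is_close(sqrt(3)^2, 3)   # the values agree within the tolerance
is_close(sqrt(4)^2, 5)   # 4 and 5 differ by 1, far outside the tolerance
```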
Output: -
PROGRAM - 10
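A=matrix(c(2,0,1,3),ncol=2)
B=matrix(c(5,2,4,-1),ncol=2)
A+B
A-B
3*A
diag(c(4,1,2,3),nrow=4)
eigen(A)
solve(A,B) #solve Ax=B for x
t(A)
eigen(A*t(A)) #element-wise product; use A%*%t(A) for the matrix product
A%*%B
b=c(7,4)
A*b #element-wise product with b recycled down the columns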
Output: -
PROGRAM – 11
Install MASS package and then use apply to find the measures of
central tendency and dispersion.
Code:
install.packages("MASS")
library(MASS)
data(state)
head(state.x77)
str(state.x77)
apply(state.x77,2,mean)
apply(state.x77,2,median)
apply(state.x77,2,sd)
Output: -
PROGRAM - 12
Create a function that gives the mean and standard deviation, then
save them as an object.
CODE:
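The code for this program is missing from the file. A minimal sketch in the style of Program 13 follows. It assumes the same state.x77 matrix from the built-in datasets package; the names mean_sd and state.summary are hypothetical.

```r
# Hypothetical helper returning the mean and standard deviation of a vector
mean_sd <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

# Apply it to every column of state.x77 and save the result as an object
state.summary <- apply(state.x77, 2, mean_sd)
state.summary
```

The saved object is a matrix with one row per statistic and one column per variable, so a single value can be read back with, for example, state.summary["mean", "Income"].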
Output: -
PROGRAM – 13
Write a function that gives the min, median and max, then save
them as an object.
Code:
state.range=apply(state.x77,2,function(x)c(min(x),median(x),max(x)))
state.range
Output: -
PROGRAM – 14
CODE:
population<- state.x77[,"Population"] #select the column by name and keep the state names
area<- state.area
pop.dens<-mapply(function(x,y)x/y, population, area)
pop.dens
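mapply() calls the function once for each pair of elements. Because division in R is already vectorised, plain division gives the same numbers. A short sketch comparing the two follows; the names dens_mapply and dens_vector are assumptions for illustration.

```r
# Population column of state.x77 and the state.area vector (built-in datasets)
population <- state.x77[, "Population"]
area <- state.area

# mapply() applies the function to each (population, area) pair in turn
dens_mapply <- mapply(function(x, y) x / y, population, area)

# Vectorised division performs the same element-wise calculation
dens_vector <- population / area

# Compare the values, ignoring element names
all.equal(unname(dens_mapply), unname(dens_vector))
```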
Output: -
PROGRAM – 15
Write a program using tapply to explore population by region.
Code:
region.info<- tapply( population ,state.region ,function (x)
c(min(x),median(x),max(x)))
region.info
Output: -
PROGRAM - 16
Write a script file to compute the following for the numeric variables in
the Boston dataset.
a) sum
b) range
c) mean
d) standard deviation
Code:
library(MASS)
Boston
sapply(Boston,sum)
sapply(Boston,range)
sapply(Boston,mean)
sapply(Boston,sd)
Output: -
PROGRAM - 17
Code:
student<- c("henry", "voung", "bruno", "george", "harry","nicole",
"john", "stuart", "anne", "katty")
nchar(student)
"bruno" %in% student
"stuart" %in% student
Output: -
PROGRAM - 18
Output the indexes of the names that contain substring "aa" in
vector student of program 17.
CODE:
student<- c("henry", "voung", "bruno", "george", "harry","nicole",
"john", "stuart", "anne", "katty")
for (i in seq_along(student))
if (grepl("aa",student[i]))
print(i)
Output: -
PROGRAM - 19
Find out how many strings end with ‘ry’ in vector student of
assignment 17.
Code:
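The code for this program is missing from the file. One possible sketch uses grepl() with a regular expression anchored to the end of the string. The variable name ends_with_ry is an assumption; the student vector is the one from Program 17.

```r
student <- c("henry", "voung", "bruno", "george", "harry", "nicole",
             "john", "stuart", "anne", "katty")

# In the pattern "ry$", the "$" anchors the match to the end of the string
ends_with_ry <- grepl("ry$", student)

student[ends_with_ry]   # the matching names
sum(ends_with_ry)       # how many names end with "ry"
```

sum() works here because TRUE counts as 1 and FALSE as 0.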
Output: -
PROGRAM - 20
Create a vector of factor type data for the hair colours of 10
people where values for hair colours are black, grey, dark brown,
blonde.
1) Display the levels of factor data
2) Find the max value in the vector of hair colour using table
function.
Code:
haircolor <- factor(c("black","grey","blonde","dark brown","black",
"blonde","dark brown","blonde","dark brown","grey"))
levels(haircolor)
table(haircolor)
which.max(table(haircolor)) #most frequent colour (the first one if there is a tie)
Output: -
PROGRAM - 21
Apply the class, str and summary commands to the vector
created in assignment 17.
Code:
student<- c("henry", "voung", "bruno", "george", "harry","nicole",
"john", "stuart", "anne", "katty")
class(student)
str(student)
summary(student)
Output: -
PROGRAM - 22
Create an empty vector of factor datatype for the names of the first 6
months of a year. Remember to keep the levels in order of month
from Jan-June.
Code:
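The code for this program is missing from the file. A minimal sketch follows. The month abbreviations and the names month_levels, empty_months and filled_months are assumptions made for illustration.

```r
# Month labels in calendar order (the abbreviations are an assumption)
month_levels <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun")

# An empty factor: no values yet, but the levels are fixed in calendar order
empty_months <- factor(character(0), levels = month_levels)
empty_months
levels(empty_months)

# Values created with the same levels sort by calendar order, not alphabetically
filled_months <- factor(c("Mar", "Jan", "Jun"), levels = month_levels)
sort(filled_months)
```

Passing levels explicitly matters because factor() would otherwise order the levels alphabetically (Apr, Feb, Jan, ...).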
Output: -
PROGRAM – 23
Create a vector to store the grades of 20 students for the first minor
exam. Grades are given as {A,B,C,D}. Compute the modal grade.
Further, store the grades of the same students for the second minor
exam.
Compare the grades for the 2 exams. Count the number of students who
have got a higher grade.
CODE:
minor1<-factor(c('A','B','D','C','A', 'A','B','D','C','A', 'A','B','D','C','A',
'A','B','D','C','A'),levels=c('A','B','C','D'),ordered=TRUE)
minor1
table(minor1)
#20 grades for the second minor, one per student
minor2<-factor(c('B','A','D','C','A', 'B','A','D','C','A', 'B','A','D','C','A',
'B','A','D','C','A'),levels=c('A','B','C','D'),ordered=TRUE)
minor2
table(minor2)
minor1==minor2
table(minor1==minor2)
sum(minor1>minor2) #levels are ordered A<B<C<D, so this counts students whose grade moved towards A in the second exam
which.max(table(minor1)) #modal grade of the first minor
which.min(table(minor2)) #least frequent grade of the second minor
Output: -
PROGRAM - 24
Create a 4*3 matrix A of uniformly distributed random integer
numbers between 1 to 100. Create another 3*4 matrix B with
uniformly distributed random integer numbers between 1 to 10.
Perform matrix multiplication of the two matrices and store the
result in the third matrix C.
CODE:
A<-matrix(sample(1:100,12,replace=TRUE),nrow=4,ncol=3) #runif() returns real numbers; sample() draws integers
A
B<-matrix(sample(1:10,12,replace=TRUE),nrow=3,ncol=4)
B
C<-A%*%B
C
Output: -
PROGRAM - 25
Create A and B, two 4*3 matrices of normally distributed random
numbers, with mean 0 and standard deviation 1. Find the indices
of all those numbers in matrix A which are less than the respective
numbers in matrix B and print these numbers.
Code:
A<-matrix(rnorm(12,mean=0,sd=1),nrow=4,ncol=3)
A
B<-matrix(rnorm(12,mean=0,sd=1),nrow=4,ncol=3)
B
A<B
C=which(A<B,arr.ind = TRUE)
A[C]
Output: -
PROGRAM - 26
Plotting pressure dataset in different forms:
(a) Histogram
(b) Boxplot
Code:
(a) Histogram
hist(pressure$temperature,main="frequency distribution of temperature variable",
xlab="temperature (in Celsius)",ylab="frequency",border="black",
col=c("violet","green","blue","orange","brown","purple","pink"))
box("figure")
PLOT:
(b) Boxplot
boxplot(pressure,main="box plot of variables of pressure dataset",
names=c("temperature (in Celsius)","pressure (mm Hg)"),border="black",col=c("blue","red"))
PLOT
PROGRAM – 27
Plotting mtcars dataset as a frequency polygon.
CODE:
library(ggplot2)
my_plot = ggplot(mtcars, aes(x=mpg))
my_geom = geom_freqpoly(binwidth = 1.0)
my_labels = labs(title = "Frequency polygon : Mileage per Gallon(mtcars)",
x = "mpg", y = "Frequency")
my_plot + my_geom + my_labels
OUTPUT:
PROGRAM - 28
Plotting scatter plots from iris dataset with title and labels.
CODE:
library(ggplot2)
ggplot(iris, aes(x = Petal.Length, y = Sepal.Length, color = Species,
shape = Species)) + geom_point() +
labs(title = "Iris: Petal Length versus Sepal Length", x = "Petal Length",
y = "Sepal Length") +
theme(plot.title = element_text(family = "Helvetica", size = 12,
face = "bold"), axis.title = element_text(family = "Helvetica", size =
10, face = "italic"), axis.text = element_text(family = "Courier", color
= "black", size = 9))
Output: -