
Nishant R File

The document is a practical file for a Data Analytics course using R, detailing various programming tasks and exercises. It includes instructions for installing R and RStudio, performing mathematical operations, creating vectors and lists, and analyzing datasets like iris and Boston. Each section outlines specific programming tasks with corresponding R code and expected outputs.

Uploaded by

Tarun kumar

Guru Jambheshwar University of Science & Technology, Hisar

A Practical File
On
Data Analytics Using R Lab.

Submitted To:
Renu
Asst. Professor (Dept. of CSE)

Submitted By:
Nishant
200010130080
B.Tech. CSE 2 (6th Sem/3rd Year)
INDEX

1. Install R and then RStudio. Get yourself acquainted with the GUI of the various working windows of RStudio.

2. Perform the following operations in R:
   • Create variables of different data types and print their class.
   • Perform type conversion.
   • Perform all the basic mathematical operations.

3. Create two vectors and find their element-wise addition, subtraction and multiplication; concatenate the two vectors and find their sum and average.

4. Create a vector of all those values from 1:100 that are divisible by 5 and perform the following operations on the vector (x):
   (a) find the length of vector x
   (b) print the values stored at the 10th, 15th and 20th locations of vector x
   (c) find the sum, mean, median and standard deviation of vector x
   (d) create another vector named colour and print its value
   (e) change the value at the 5th location of the colour vector
   (f) repeat all the values in the colour vector exactly twice
   (g) access multiple elements simultaneously in the colour vector

5. Create a list of student names and perform the following operations:
   (a) access an element of the list
   (b) change the item values at the 2nd and 3rd locations
   (c) find out the length of the list
   (d) check whether an item exists in the list
   (e) add a new student's name to the list
   (f) access the elements of the list through a loop
   (g) make another list of students and merge it with the original list

6. Apply the summary() command to the iris dataset of the 'datasets' package and interpret the output.

7. Use the plot(iris) function and interpret the output. Write down your findings about the dataset.

8. Install and load the 'MASS' package and access the Boston dataset. Study the dataset from the resources available on the internet and write down what you find relevant about it.

9. Check and justify the outcome of the following expressions:
   a) sqrt(7)^2 == 7
   b) sqrt(4)^2 == 4
   c) near(sqrt(3)^2, 3)
   d) near(sqrt(4)^2, 5)

10. Consider A = matrix(c(2,0,1,3), ncol=2) and B = matrix(c(5,2,4,-1), ncol=2).
   1. a) Find A+B
      b) Find A-B
      c) Scalar multiplication 3*A
   2. Using the diag() function, build a diagonal matrix of size 4 with the diagonal values 4, 1, 2, 3.
   3. Using the function eigen(), find the eigenvalues of A.
   4. Find the values of x in Ax = B.
   5. Find the transpose of A.
   6. Find the eigenvalues and eigenvectors of A.
   7. Find the product AB, where B is the same as in the question statement.
   8. Find the product Ab, where A is the same as in the question statement and b = c(7,4).

11. Install the MASS package and then use apply() to find measures of central tendency and dispersion.

12. Create a function that gives the mean and standard deviation, then save them as an object.

13. Write a function that gives the min, median and max, then save them as an object.

14. Write a program to find the population density for each state by using the mapply() function.

15. Write a program using tapply() to explore population by region.

16. Write a script file to compute the following for the numeric variables in the Boston dataset:
   a) sum
   b) range
   c) mean

17. Assuming the character vector student holds 10 names of students:
   a) find the character count in each name
   b) find "bruno" and "stuart" in student

18. Output the indexes of the names that contain the substring "aa" in the vector student of program 17.

19. Find out how many strings end with "ry" in the vector student of program 17.

20. Create a vector of factor data type for the hair colours of 10 people, where the values for hair colour are black, grey, dark brown and blonde.
   1) Display the levels of the factor data.
   2) Find the max value in the hair colour vector using the table() function.

21. Apply the class, str and summary commands to the vector created in program 17.

22. Create an empty vector of factor data type for the names of the first 6 months of a year, remembering to keep the levels in month order from January to June.

23. Create a vector to store the grades of 20 students for the first minor exam; grades are given as {A, B, C, D}. Compute the modal grade. Further, store the grades of the same students for the second minor exam. Compare the grades for the 2 exams and count the number of students who got a higher grade.

24. Create a 4*3 matrix A of uniformly distributed random integers between 1 and 100. Create another 3*4 matrix B of uniformly distributed random integers between 1 and 10. Perform matrix multiplication of the two matrices and store the result in a third matrix C.

25. Create A and B, two 4*3 matrices of normally distributed random numbers with mean 0 and standard deviation 1. Find the indices of all those numbers in matrix A that are less than the respective numbers in matrix B and print these numbers.

26. Plot the pressure dataset in different forms:
   (a) Histogram
   (b) Boxplot

27. Plot the mtcars dataset as a frequency polygon.

28. Plot scatter plots from the iris dataset with a title and labels.

PROGRAM – 01
Install R and then RStudio. Get yourself acquainted with the GUI of the various working windows of RStudio.

Step 1: To install R, go to https://cran.rstudio.com/ and click Download R-4.2.2 for Windows (if you are operating on Windows).

Step 2: Open the downloaded installer and click Next on each screen, following the setup instructions it gives.

Step 3: Select the destination location.

Step 4: Install R.

Step 5: Interact with the R GUI.

~Installing RStudio:

Step 1: To download RStudio, go to www.rstudio.com and click the DOWNLOAD RSTUDIO DESKTOP FOR WINDOWS button.

Step 2: Click Next to set up RStudio.

Step 3: Select the installation path for RStudio.

Step 4: Choose the Start Menu folder.

Step 5: Install RStudio.

~Interaction with the GUI of RStudio:
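Inside RStudio, the Console pane runs R directly. As a minimal check that RStudio picked up the R installed above, a few base-R commands can be typed there. This is only a sketch: the printed version and path depend on your own installation (the steps above download R-4.2.2).

```r
# Print the R version that the current session (e.g. RStudio's Console pane) is using
cat(R.version.string, "\n")
# getRversion() returns a version object that can be compared with a string
r_ver <- getRversion()
r_ver >= "4.0.0"     # TRUE for the R 4.x releases used in this file
# Folder of the R installation that RStudio is running
R.home()
# Packages attached at start-up in a fresh session
search()
```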

PROGRAM – 02

Perform the following operations in R:


• Create variables of different data types and print their class.
• Perform type conversion.
• Perform all the basic mathematical operations.

SOURCE CODE
#BTech CSE
a<-38L #Integer
class(a)
b<-38 #Numeric
class(b)
comp<-38+24i #Complex
class(comp)
l<-TRUE #Logical
class(l)
ch<-"STUDENT" #Character
class(ch)

Output: -

# Type Conversion
x<-as.character(l)
class(x)
y<-as.complex(b)
class(y)
z<-as.numeric(a)
class(z)
n<-as.logical(a)
class(n)
m<-as.integer(b)
class(m)
Output: -

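# Mathematical Operations

23+45 # addition
67-34 # subtraction
244/4 # division
8*5 # multiplication
17%%5 # modulus (remainder)
17%/%5 # integer division
2^10 # exponentiation
min(45,43,67,98,332)
max(34,57,76,12,98)
sum(23,5,32,89)
mean(c(23,87,65,34)) # mean() needs one vector; mean(23,87,65,34) would return 23
sqrt(144)
ceiling(7.9)
floor(9.3)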

Output: -
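The type-conversion functions above do not always behave like rounding or string parsing. A minimal sketch of some edge cases, using illustrative values that are not from the original program:

```r
# Sketch: edge cases of R's as.*() conversion functions (values chosen for illustration)
int_from_pos <- as.integer(7.9)      # 7: the fractional part is dropped, not rounded
int_from_neg <- as.integer(-7.9)     # -7: truncation is toward zero
lgl_from_num <- as.logical(c(0, 38)) # FALSE TRUE: only 0 becomes FALSE
lgl_from_chr <- as.logical("yes")    # NA: only strings such as "TRUE", "true", "T" convert
num_from_chr <- suppressWarnings(as.numeric("abc"))  # NA (R normally warns here)
class(5L / 2L)                       # "numeric": / always returns a double
class(5L %/% 2L)                     # "integer": integer division keeps the type
```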

PROGRAM – 03

Create two vectors and find their element-wise addition, subtraction and multiplication; concatenate the two vectors and find their sum and average.

R-code:
# B.TECH CSE
# create two vectors and perform multiple operations
x1<-c(23,34,45,67)
x2<-c(98,87,76,54)
x1+x2 # element-wise addition
x1-x2 # element-wise subtraction
x1*x2 # element-wise multiplication
x3<-c(x1,x2) # concatenation of the two vectors (paste() would build strings instead)
x3
sum(x3) # sum of the concatenated vector
avg<-mean(x3) # average of the concatenated vector
avg

Output: -
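Both vectors above have the same length, so each element pairs with exactly one partner. When the lengths differ, R recycles the shorter vector. The sketch below uses illustrative vectors that are not part of the original program:

```r
# Sketch: element-wise arithmetic on vectors of different lengths (illustrative values)
long  <- c(1, 2, 3, 4)
short <- c(10, 20)
recycled <- long + short   # 11 22 13 24: short is reused as 10 20 10 20
# If the longer length is not a multiple of the shorter one, R still recycles
# but warns that the lengths do not match; suppressWarnings() hides that here
uneven <- suppressWarnings(c(1, 2, 3) + c(10, 20))  # 11 22 13
recycled
uneven
```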

PROGRAM – 04

Create a vector of all those values from 1:100 that are divisible by 5 and perform the following operations on the vector (x):
(a) find the length of vector x
(b) print the values stored at the 10th, 15th and 20th locations of vector x
(c) find the sum, mean, median and standard deviation of vector x
(d) create another vector named colour and print its value
(e) change the value at the 5th location of the colour vector
(f) repeat all the values in the colour vector exactly twice
(g) access multiple elements simultaneously in the colour vector

# B.TECH CSE
#create vector of the value from 1:100 divisible by 5
x<-seq(from=5,to=100,by=5)
x
#a) find the length of vector x
length(x)
#b) print the value of 10th, 15th, 20th location
x[c(10,15,20)]
#c) find the sum, mean, median, standard deviation
sum(x)
sum(x)/length(x) #mean
median(x) #median
sd(x) #standard deviation
#d) create another vector name color and print its value
color<-c("Blue","Red","Green","Yellow","Black")
color

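#e) change the 5th location in color vector
color[5]<-"orange"
color
#f) repeat all the values in color exactly twice
color_twice<-rep(color,each=2) # avoid naming a variable c, which masks c()
color_twice
#g) access multiple elements simultaneously in color vector
color[c(1,2,4)] # vector indexing returns several elements at once
a<-c(1,2,4)
for (i in a) # the same elements, printed one at a time
{
print(color[i])
}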
Output: -
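sd() returns the sample standard deviation, which divides by n - 1 rather than n. A minimal sketch that recomputes the mean and standard deviation of the same vector x by hand and checks them against the built-in functions:

```r
x <- seq(from = 5, to = 100, by = 5)
n <- length(x)                                   # 20 values
manual_mean <- sum(x) / n                        # 1050 / 20 = 52.5
# sample standard deviation: divide the squared deviations by n - 1, not n
manual_sd <- sqrt(sum((x - manual_mean)^2) / (n - 1))
all.equal(manual_sd, sd(x))                      # TRUE
round(manual_sd, 2)                              # 29.58
x[c(10, 15, 20)]                                 # 50 75 100
```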

PROGRAM – 05

Create a list of student names and perform the following operations:
(a) access an element of the list
(b) change the item values at the 2nd and 3rd locations
(c) find out the length of the list
(d) check whether an item exists in the list
(e) add a new student's name to the list
(f) access the elements of the list through a loop
(g) make another list of students and merge it with the original list
# B.TECH CSE

# Create a list of student names
students<-list("George","Alice","Joe")
students[[2]] #(a) access an element of the list ([[ ]] returns the element itself)

#(b) change the item values at the 2nd and 3rd positions
students[[2]]<-"harry"
students[[3]]<-"nicole"
students[c(2,3)]
length(students) #(c) find out the length of the list
"lokesh" %in% students #(d) check whether an item exists in the list
studentsnew<-append(students,"deny") #(e) add a new student's name to the list
studentsnew
for(x in studentsnew) #(f) access the elements of the list through a loop
{
print(x)
}
#(g) make another list of students and merge it with the original list
students2<-list("sam","rio","john","katty")
merged<-c(studentsnew,students2) # c() joins two lists into one list
merged

Output: -
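Part (a) depends on the difference between single and double brackets on a list. A minimal sketch using the original three names:

```r
students <- list("George", "Alice", "Joe")
one_item_list <- students[2]     # single brackets: a list of length 1 containing "Alice"
element       <- students[[2]]   # double brackets: the character value "Alice" itself
class(one_item_list)             # "list"
class(element)                   # "character"
# Only [[ ]] gives a plain value that string functions can use directly
toupper(element)                 # "ALICE"
```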

PROGRAM – 6
Apply the summary() command to iris database of ‘datasets’ package
and interpret the output.
Code:
#cse
summary(iris)
Output: -

Interpretation:
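The summary() function describes the distribution of each variable. For the four numeric columns (Sepal.Length, Sepal.Width, Petal.Length and Petal.Width, measured in centimetres) it reports the minimum, first quartile, median, mean, third quartile and maximum. For the factor column Species it reports the count of each level, showing 50 observations for each of the three species. The iris dataset therefore contains four features (the length and width of the sepals and petals) for 50 samples of each of three species of iris (Iris setosa, Iris virginica and Iris versicolor), 150 rows in total. These measurements were used by R. A. Fisher to build a linear discriminant model to classify the species, and the dataset is often used in data mining, classification and clustering examples and to test algorithms.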
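summary(iris) pools all 150 flowers, so it hides how different the species are. One way to look per species is sketched below with base R's aggregate() and a subset; this is an added illustration, not part of the original exercise.

```r
# Mean of each measurement per species (aggregate() with the formula interface)
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
species_means
# summary() applied to one species at a time
summary(iris[iris$Species == "setosa", 1:4])
```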

PROGRAM -7
Use plot(iris) function and interpret the output. Write down your
findings about the dataset.
Code:

#cse
plot(iris)
Output: -

Interpretation:
plot(iris) draws a scatterplot matrix: every column is plotted against every other column. The data frame contains 5 columns and 150 rows. The iris dataset is considered the "Hello World" of data science. Its five columns are Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species. Iris is a flowering plant; researchers measured various features of different iris flowers and recorded them digitally. In the plots, petal length and petal width rise together strongly, and the setosa flowers form a separate cluster with small petals. Classifiers such as a support vector machine (SVM) or a neural network can be used for further classification of the iris dataset.
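In plot(iris) the species are not coloured, so the clusters are hard to attribute. A minimal sketch that colours the same scatterplot matrix by species and puts numbers on the visual trends with a correlation matrix:

```r
# Colour the scatterplot matrix by species (point colours 1, 2, 3 follow the factor levels)
pairs(iris[, 1:4], col = as.integer(iris$Species), pch = 19,
      main = "Iris measurements coloured by species")
# Correlation matrix of the four numeric columns, rounded for reading
iris_cor <- round(cor(iris[, 1:4]), 2)
iris_cor
```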

PROGRAM – 8
Install and load the 'MASS' package and access the Boston dataset. Study the dataset from the resources available on the internet and write down what you find relevant about it.

Code:

#cse
install.packages("MASS") # MASS usually comes installed with R, so this step is often optional
library(MASS)
head(Boston)
summary(Boston)
Output: -

Relevant data:

The Boston data frame has 506 rows and 14 columns and describes housing values in the suburbs of Boston. For a quick first look at the dataset we use head(Boston), which shows its first 6 rows. Alongside the housing values, each row gives relevant information about the area, such as the per-capita crime rate (crim), the proportion of residential land zoned for lots over 25,000 sq. ft (zn), the nitrogen oxides concentration in parts per 10 million (nox), and the median value of owner-occupied homes in $1000s (medv).
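Since medv is the housing value, a natural next question is which area variables move with it. A minimal sketch that correlates every other column with medv; correlation only measures linear association, so this is exploratory rather than a model.

```r
library(MASS)
dim(Boston)                                   # 506 14
# Correlation of every other column with medv (median home value, $1000s)
cors <- sapply(Boston[, names(Boston) != "medv"], function(col) cor(col, Boston$medv))
sort(round(cors, 2))                          # most negative first
names(which.max(abs(cors)))                   # variable most strongly correlated with medv
```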

PROGRAM - 9

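Check and justify the outcome of the following expressions.

Code:

library(dplyr) # near() comes from the dplyr package
sqrt(7)^2==7 # a)
sqrt(4)^2==4 # b)
near(sqrt(3)^2,3) # c)
near(sqrt(4)^2,5) # d)

Justification:
a) FALSE: sqrt(7) is irrational, so R stores only the nearest double-precision approximation; squaring it gives a value that differs from 7 in the last binary digit, and == tests exact equality.
b) TRUE: sqrt(4) is exactly 2, which is stored without error, so 2^2 is exactly 4.
c) TRUE: sqrt(3)^2 is slightly below 3 (2.9999999999999996), but near() compares with a tolerance (by default .Machine$double.eps^0.5, about 1.5e-8), so the tiny rounding error is ignored.
d) FALSE: sqrt(4)^2 is 4, which differs from 5 by far more than the tolerance.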

Output: -
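The same tolerance idea works without dplyr. A minimal sketch with a hypothetical helper near_base() that mirrors near()'s default tolerance, plus base R's all.equal():

```r
# Hypothetical helper mirroring dplyr::near(): equal if the gap is below the tolerance
near_base <- function(x, y, tol = .Machine$double.eps^0.5) {
  abs(x - y) < tol
}
exact   <- sqrt(7)^2 == 7                   # FALSE: exact comparison of doubles
approx  <- near_base(sqrt(7)^2, 7)          # TRUE
base_ok <- isTRUE(all.equal(sqrt(3)^2, 3))  # TRUE: base R's own tolerance-based check
sprintf("%.17g", sqrt(7)^2)                 # prints the stored value, just above 7
```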

PROGRAM - 10

Consider A = matrix(c(2,0,1,3), ncol=2) and B = matrix(c(5,2,4,-1), ncol=2).

1. a) Find A+B
   b) Find A-B
   c) Scalar multiplication 3*A
2. Using the diag() function, build a diagonal matrix of size 4 with the diagonal values 4, 1, 2, 3.
3. Using the function eigen(), find the eigenvalues of A.
4. Find the values of x in Ax = B.
5. Find the transpose of A.
6. Find the eigenvalues and eigenvectors of A.
7. Find the product AB, where B is the same as in the question statement.
8. Find the product Ab, where A is the same as in the question statement and b = c(7,4).
Code:

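A<-matrix(c(2,0,1,3),ncol=2)
B<-matrix(c(5,2,4,-1),ncol=2)
A+B # 1a) addition
A-B # 1b) subtraction
3*A # 1c) scalar multiplication
diag(c(4,1,2,3),nrow=4) # 2) 4x4 diagonal matrix
eigen(A)$values # 3) eigenvalues of A
solve(A,B) # 4) x such that A %*% x = B
t(A) # 5) transpose of A
eigen(A) # 6) eigenvalues ($values) and eigenvectors ($vectors)
A%*%B # 7) matrix product AB
b<-c(7,4)
A%*%b # 8) matrix-vector product Ab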

Output: -
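The results of solve() and eigen() can be checked by substituting them back into their defining equations. A minimal sketch of that check for the same A and B:

```r
A <- matrix(c(2, 0, 1, 3), ncol = 2)   # filled column-wise: rows (2, 1) and (0, 3)
B <- matrix(c(5, 2, 4, -1), ncol = 2)
X <- solve(A, B)                        # the x in A %*% x = B
isTRUE(all.equal(A %*% X, B))           # TRUE: multiplying back recovers B
e <- eigen(A)
e$values                                # 3 2 (A is upper triangular, so they are its diagonal)
# each column v of e$vectors satisfies A %*% v = lambda * v
checks <- sapply(seq_along(e$values), function(k) {
  v <- e$vectors[, k]
  isTRUE(all.equal(as.vector(A %*% v), e$values[k] * v))
})
checks                                  # TRUE TRUE
```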

PROGRAM – 11

Install the MASS package and then use apply() to find measures of central tendency and dispersion.

Code:
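install.packages("MASS") # only needed once; state.x77 itself comes with the base 'datasets' package
library(MASS)
data(state)
head(state.x77)
str(state.x77)
apply(state.x77,2,mean) # central tendency: column means
apply(state.x77,2,median) # central tendency: column medians
apply(state.x77,2,sd) # dispersion: column standard deviations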

Output: -
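The standard deviation is only one measure of dispersion. A minimal sketch of a few others computed column-wise with apply(); the coefficient of variation is included because the state.x77 columns are in very different units:

```r
data(state)                                    # state.x77 ships with base R's datasets package
disp <- apply(state.x77, 2, function(x) c(range = diff(range(x)), IQR = IQR(x), var = var(x)))
round(disp, 2)
# Coefficient of variation: sd relative to the mean, comparable across units
cv <- apply(state.x77, 2, sd) / apply(state.x77, 2, mean)
round(cv, 2)
```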

PROGRAM - 12
Create a function that gives the mean and standard deviation, then save them as an object.

CODE:

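mean_sd<-function(x) c(mean=mean(x),sd=sd(x)) # returns both statistics, named
state.summary<-apply(state.x77,2,mean_sd) # one column per variable, rows "mean" and "sd"
state.summary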

Output: -
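If "save as object" is read as saving to disk, base R's saveRDS()/readRDS() store a single object and restore it unchanged. A minimal sketch; tempfile() is an assumption so the example runs anywhere, and in practice you would choose a file name such as "state_summary.rds" (hypothetical).

```r
mean_sd <- function(x) c(mean = mean(x), sd = sd(x))
state.summary <- apply(state.x77, 2, mean_sd)
# Save the object to disk; tempfile() stands in for a real file name
path <- tempfile(fileext = ".rds")
saveRDS(state.summary, path)
restored <- readRDS(path)
identical(restored, state.summary)   # TRUE
```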

PROGRAM – 13

Write a function that gives the min, median and max, then save them as an object.

Code:
min_med_max<-function(x) c(min=min(x),median=median(x),max=max(x)) # named results
state.range<-apply(state.x77,2,min_med_max)
state.range

Output: -
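The same three numbers are the 0%, 50% and 100% quantiles, so quantile() gives an equivalent result. A minimal sketch on one column (Income, chosen only for illustration):

```r
x <- state.x77[, "Income"]
via_function <- c(min(x), median(x), max(x))
via_quantile <- quantile(x, probs = c(0, 0.5, 1), names = FALSE)  # default type 7
all.equal(via_function, via_quantile)   # TRUE
fivenum(x)   # Tukey's five numbers: min, lower hinge, median, upper hinge, max
```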

PROGRAM – 14

Write a program to find population density for each state by using mapply function.

CODE:

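population<-state.x77[,"Population"] # population estimate in thousands, one value per state
area<-state.area # area of each state in square miles
pop.dens<-mapply(function(x,y) x/y, population, area) # thousands of people per square mile
pop.dens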

Output: -
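Division is already vectorised in R, so mapply() is not strictly required here. A minimal sketch confirming that the plain element-wise division gives the same densities:

```r
population <- state.x77[, "Population"]     # thousands of people
area <- state.area                           # square miles
pop.dens <- mapply(function(x, y) x / y, population, area)
vectorised <- population / area             # same division, no mapply needed
all.equal(unname(pop.dens), unname(vectorised))   # TRUE
head(sort(pop.dens, decreasing = TRUE), 3)  # three densest states
```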

PROGRAM – 15

Write a program using tapply to explore population by region.

Code:
population<-state.x77[,"Population"] # as in program 14 (thousands)
region.info<-tapply(population,state.region,function(x) c(min=min(x),median=median(x),max=max(x)))
region.info

Output: -
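tapply() accepts any summary function. A minimal sketch of two more views of population by region, totals and averages, plus a check that the regions partition the 50 states:

```r
population <- state.x77[, "Population"]                  # thousands
region_total <- tapply(population, state.region, sum)   # total population per region
region_mean  <- tapply(population, state.region, mean)  # average state population per region
region_total
round(region_mean)
isTRUE(all.equal(sum(region_total), sum(population)))   # TRUE: each state is in one region
```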

PROGRAM - 16
Write a script file to compute the following for the numeric variables in the Boston dataset.

a) sum
b) range
c) mean
d) standard deviation

Code:

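library(MASS) # provides the Boston dataset
head(Boston)
sapply(Boston,sum) # a) sum of each column
sapply(Boston,range) # b) range: row 1 = minimum, row 2 = maximum
sapply(Boston,mean) # c) mean of each column
sapply(Boston,sd) # d) standard deviation of each column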

Output: -
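As a script file, the four statistics can be gathered into one table. A minimal sketch; the file name "boston_stats.R" is hypothetical, and saving these lines in it would let them run with source("boston_stats.R").

```r
# Contents of a script file such as "boston_stats.R" (hypothetical name)
library(MASS)
col_stats <- function(x) {
  c(sum = sum(x), min = min(x), max = max(x), mean = mean(x), sd = sd(x))
}
boston_stats <- sapply(Boston, col_stats)   # one column per variable, one row per statistic
round(t(boston_stats), 2)                   # transpose so each variable is a row
```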

PROGRAM - 17

Assuming the character vector student holds 10 names of students:
a) find the character count in each name
b) find "bruno" and "stuart" in student

Code:
student<- c("henry", "voung", "bruno", "george", "harry","nicole",
"john", "stuart", "anne", "katty")
nchar(student)
"bruno" %in% student
"stuart" %in% student

Output: -
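%in% only answers whether each name is present. To also get where the names sit in the vector, match() or which() can be used, as in this minimal sketch:

```r
student <- c("henry", "voung", "bruno", "george", "harry", "nicole",
             "john", "stuart", "anne", "katty")
match(c("bruno", "stuart"), student)       # 3 8: positions of the names
which(student %in% c("bruno", "stuart"))   # 3 8: same positions via a logical vector
match("lokesh", student)                   # NA when the name is absent
```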

PROGRAM - 18
Output the indexes of the names that contain substring “aa” in
vector student of program 17.

CODE:
student<- c("henry", "voung", "bruno", "george", "harry","nicole",
"john", "stuart", "anne", "katty")
for (i in 1:length(student))
if (grepl("aa",student[i]))
print(i)
which(grepl("aa",student)) # same result in one step; integer(0) here, as no name contains "aa"

Output: -
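None of the ten names contains "aa", so the program prints nothing. To see grep() actually return indexes, the sketch below uses a hypothetical vector of names chosen so that some of them match:

```r
# Hypothetical names, chosen so that some of them do contain "aa"
names_demo <- c("aaron", "bruno", "isaac", "saanvi")
grep("aa", names_demo, fixed = TRUE)                 # 1 3 4: indexes of matches
grep("aa", names_demo, value = TRUE)                 # the matching names themselves
grep("aa", c("Aaron", "isaac"), ignore.case = TRUE)  # 1 2: "Aa" counts once case is ignored
```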

PROGRAM - 19
Find out how many strings end with "ry" in the vector student of program 17.

Code:

student<- c("henry", "voung", "bruno", "george", "harry","nicole",
"john", "stuart", "anne", "katty")
endsWith(student,"ry") # TRUE for names ending in "ry"
sum(endsWith(student,"ry")) # number of such names

Output: -
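A regular expression anchored with $ gives the same answer and extends to more complex patterns. A minimal sketch:

```r
student <- c("henry", "voung", "bruno", "george", "harry", "nicole",
             "john", "stuart", "anne", "katty")
ends_ry <- grepl("ry$", student)   # "$" anchors the pattern to the end of the string
student[ends_ry]                   # "henry" "harry"
sum(ends_ry)                       # 2
identical(ends_ry, endsWith(student, "ry"))   # TRUE
```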

PROGRAM - 20
Create a vector of factor type data for the hair colours of 10 people, where the values for hair colour are black, grey, dark brown and blonde.
1) Display the levels of the factor data.
2) Find the max value in the hair colour vector using the table function.

Code:
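haircolor<-factor(c("black","grey","blonde","dark brown","black","blonde",
"dark brown","blonde","dark brown","grey"))

levels(haircolor) # 1) levels, sorted alphabetically
table(haircolor) # frequency of each colour
max(table(haircolor)) # 2) highest frequency
which.max(table(haircolor)) # first colour with the highest frequency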

Output: -
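In this data blonde and dark brown both occur three times, but which.max() reports only the first of them. A minimal sketch that returns every colour sharing the maximum count:

```r
haircolor <- factor(c("black", "grey", "blonde", "dark brown", "black", "blonde",
                      "dark brown", "blonde", "dark brown", "grey"))
counts <- table(haircolor)
which.max(counts)                         # reports only the first maximum
names(counts)[counts == max(counts)]      # all colours sharing the maximum count
```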

PROGRAM - 21
Apply the class, str and summary commands to the vector created in program 17.
Code:
student<- c("henry", "voung", "bruno", "george", "harry","nicole",
"john", "stuart", "anne", "katty")
class(student)
str(student)

summary(student)

Output: -
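For a character vector, summary() reports only its length, class and mode. Converting the same names to a factor changes what the three commands report, as this minimal sketch shows:

```r
student <- c("henry", "voung", "bruno", "george", "harry", "nicole",
             "john", "stuart", "anne", "katty")
summary(student)            # for character data: only Length, Class and Mode
student_f <- factor(student)
class(student_f)            # "factor"
str(student_f)              # Factor w/ 10 levels ...
summary(student_f)          # one count per level (each name appears once here)
```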

PROGRAM - 22
Create an empty vector of factor data type for the names of the first 6 months of a year, remembering to keep the levels in month order from January to June.
Code:

Month<-factor(character(0),levels=c("january","february","march","april","may","june"))
Month

Output: -
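Fixing the levels up front is what keeps month order once values arrive. A minimal sketch with three example months (chosen for illustration); values outside the levels would become NA:

```r
Month <- factor(character(0),
                levels = c("january", "february", "march", "april", "may", "june"))
length(Month)                          # 0: empty, but the levels are already fixed
# New values reuse the same levels; a value outside the levels would become NA
Month <- factor(c("march", "january", "june"), levels = levels(Month))
sort(Month)                            # sorted in month order, not alphabetically
as.integer(Month)                      # 3 1 6: position of each month among the levels
```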

PROGRAM – 23
Create a vector to store the grades of 20 students for the first minor exam; grades are given as {A, B, C, D}. Compute the modal grade.
Further, store the grades of the same students for the second minor exam.
Compare the grades for the 2 exams. Count the number of students who got a higher grade.

CODE:
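# levels in the order A, B, C, D: A is the best grade, so a smaller level means a better grade
minor1<-factor(c('A','B','D','C','A','A','B','D','C','A','A','B','D','C','A',
'A','B','D','C','A'),levels=c('A','B','C','D'),ordered=TRUE)
minor1
table(minor1)
names(which.max(table(minor1))) # modal grade of minor 1
minor2<-factor(c('B','A','D','C','A','B','A','D','C','A','B','A','D','C','A',
'B','A','D','C','A'),levels=c('A','B','C','D'),ordered=TRUE)
minor2
table(minor2)
names(which.max(table(minor2))) # modal grade of minor 2
minor1==minor2 # TRUE where a student got the same grade in both exams
table(minor1==minor2)
sum(minor1>minor2) # students whose minor-2 grade is better than their minor-1 grade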

Output: -
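Because A is the best grade but the lowest level, "higher grade" means a smaller factor value. A minimal sketch that builds the same 20 grades with rep() and counts improvements, declines and unchanged grades; modal_grade() is a hypothetical helper name.

```r
# rep() builds the same 20 grades as the code above (each block of 5 repeated 4 times)
grades <- c("A", "B", "C", "D")
minor1 <- factor(rep(c("A", "B", "D", "C", "A"), times = 4), levels = grades, ordered = TRUE)
minor2 <- factor(rep(c("B", "A", "D", "C", "A"), times = 4), levels = grades, ordered = TRUE)
modal_grade <- function(f) names(which.max(table(f)))  # hypothetical helper: most frequent level
modal_grade(minor1)                                    # "A"
# With levels A < B < C < D, a better grade is a smaller level
improved <- sum(minor2 < minor1)   # better grade in minor 2
declined <- sum(minor2 > minor1)   # worse grade in minor 2
same     <- sum(minor2 == minor1)
c(improved = improved, declined = declined, same = same)   # 4 4 12
```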

PROGRAM - 24
Create a 4*3 matrix A of uniformly distributed random integers between 1 and 100. Create another 3*4 matrix B of uniformly distributed random integers between 1 and 10.
Perform matrix multiplication of the two matrices and store the result in a third matrix C.

CODE:
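set.seed(1) # optional: makes the random matrices reproducible
A<-matrix(sample(1:100,12,replace=TRUE),nrow=4,ncol=3) # runif() would give non-integers
A
B<-matrix(sample(1:10,12,replace=TRUE),nrow=3,ncol=4)
B
C<-A%*%B # (4x3) %*% (3x4) gives a 4x4 matrix
C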

Output: -
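Matrix multiplication depends on the order of the factors, and each entry of C is a row-by-column dot product. A minimal sketch of both points; the seed value is arbitrary.

```r
set.seed(42)   # arbitrary seed so the sketch is reproducible
A <- matrix(sample(1:100, 12, replace = TRUE), nrow = 4, ncol = 3)
B <- matrix(sample(1:10, 12, replace = TRUE), nrow = 3, ncol = 4)
C <- A %*% B
dim(C)         # 4 4: rows of A by columns of B
dim(B %*% A)   # 3 3: the order of the factors matters
# Entry C[1, 1] is the dot product of row 1 of A and column 1 of B
C[1, 1] == sum(A[1, ] * B[, 1])   # TRUE
```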

PROGRAM - 25
Create A and B, two 4*3 matrices of normally distributed random
numbers, with mean 0 and standard deviation 1. Find the indices
of all those numbers in matrix A which are less than the respective
numbers in matrix B and print these numbers.
Code:
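A<-matrix(rnorm(12,mean=0,sd=1),nrow=4,ncol=3)
A
B<-matrix(rnorm(12,mean=0,sd=1),nrow=4,ncol=3)
B
A<B # logical matrix of element-wise comparisons
C<-which(A<B,arr.ind=TRUE) # row and column indices where A is smaller
C
A[C] # the corresponding numbers of A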

Output: -
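Printing the indices and values separately makes them hard to match up. A minimal sketch that puts each position next to both compared values; the seed is arbitrary.

```r
set.seed(7)    # arbitrary seed for a reproducible sketch
A <- matrix(rnorm(12, mean = 0, sd = 1), nrow = 4, ncol = 3)
B <- matrix(rnorm(12, mean = 0, sd = 1), nrow = 4, ncol = 3)
idx <- which(A < B, arr.ind = TRUE)          # two-column matrix: row, col
# Put each position next to the two values being compared
result <- cbind(idx, A = A[idx], B = B[idx])
result
all(result[, "A"] < result[, "B"])           # TRUE by construction
```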

PROGRAM - 26
Plotting pressure dataset in different forms:
(a) Histogram
(b) Boxplot
Code:
(a) Histogram
hist(pressure$temperature,main="Frequency distribution of temperature variable",
xlab="Temperature (in Celsius)",ylab="Frequency",border="black",
col=c("violet","green","blue","orange","brown","purple","pink"))
box("figure")
PLOT:

(b) Boxplot
boxplot(pressure,main="Box plot of variables of pressure dataset",
names=c("temperature (in Celsius)","pressure (mm Hg)"),border="black",col=c("blue","red"))

PLOT
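Temperature (0 to 360) and pressure (from well below 1 up to several hundred mm Hg) are on very different scales, so a shared axis squashes one of the boxes. A minimal sketch that draws each variable on its own axis, plus the histogram's bin counts without drawing:

```r
# pressure has 19 rows: temperature in degrees Celsius, pressure in mm of mercury
range(pressure$pressure)     # the two variables sit on very different scales
op <- par(mfrow = c(1, 2))   # two panels side by side
boxplot(pressure$temperature, main = "Temperature", ylab = "degrees Celsius", col = "blue")
boxplot(pressure$pressure, main = "Pressure", ylab = "mm Hg", col = "red")
par(op)                      # restore the previous layout
h <- hist(pressure$temperature, plot = FALSE)  # bin counts without drawing
h$counts
```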

PROGRAM – 27
Plotting mtcars dataset as a frequency polygon.

CODE:
library(ggplot2) # provides ggplot(), geom_freqpoly() and labs()
my_plot = ggplot(mtcars, aes(x = mpg))
my_geom = geom_freqpoly(binwidth = 1.0)
my_labels = labs(title = "Frequency polygon: Mileage per Gallon (mtcars)", x = "mpg", y = "Frequency")
my_plot + my_geom + my_labels

OUTPUT:
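A frequency polygon is histogram bin counts joined at the bin midpoints, so base R can draw a comparable one without ggplot2. A minimal sketch with 1-mpg bins; the bin edges are an assumption and may not line up exactly with ggplot2's.

```r
# Base-R frequency polygon: compute histogram bin counts, then join the bin midpoints
h <- hist(mtcars$mpg, breaks = seq(10, 35, by = 1), plot = FALSE)  # 1-mpg bins
plot(h$mids, h$counts, type = "b", pch = 19,
     main = "Frequency polygon: Mileage per Gallon (mtcars)",
     xlab = "mpg", ylab = "Frequency")
sum(h$counts)   # 32: every car falls in exactly one bin
```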

PROGRAM - 28
Plotting scatter plots from iris dataset with title and labels.

CODE:
library(ggplot2)
ggplot(iris, aes(x = Petal.Length, y = Sepal.Length, color = Species,
shape = Species)) + geom_point() +
labs(title = "Iris: Petal Length versus Sepal Length", x = "Petal Length",
y = "Sepal Length") +
theme(plot.title = element_text(family = "Helvetica", size = 12,
face = "bold"), axis.title = element_text(family = "Helvetica", size =
10, face = "italic"), axis.text = element_text(family = "Courier", color
= "black", size = 9))

Output: -
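The same labelled scatter plot can be drawn with base graphics. A minimal sketch that writes it to a temporary PDF (an assumption, so it also runs without a screen) and uses the species codes for both colour and point shape:

```r
# Write the plot to a temporary PDF so the sketch also runs without a screen
out <- tempfile(fileext = ".pdf")
pdf(out, width = 7, height = 5)
species_id <- as.integer(iris$Species)          # 1, 2, 3 in level order
plot(iris$Petal.Length, iris$Sepal.Length,
     col = species_id, pch = species_id,
     main = "Iris: Petal Length versus Sepal Length",
     xlab = "Petal Length", ylab = "Sepal Length")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 1:3)
dev.off()
file.exists(out)   # TRUE
```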
