
R Assignment

This document contains 7 problems involving data analysis and manipulation in R. Key points:
- Problem 1 has the user perform basic calculations in R and showcase the output.
- Problem 2 involves creating and manipulating vectors, including multiplying two vectors and extracting subsets.
- Problem 3 calculates statistics (sum, median, standard deviation) of a vector created in Problem 2.
- Problem 4 demonstrates creating and combining multiple vectors, adding new records, and indexing values.
- Problem 5 creates lists of different object types and merges the lists.
- Problem 6 loads dataset WHO1, creates plots and tables to visualize and analyze variables.
- Problem 7 demonstrates creating a data frame, applying a conditional to a column

Uploaded by

Pratik Kumar
Copyright © All Rights Reserved

R Assignment

Problem 1: Use R as a calculator to compute the following values. After you do so, cut and paste
your input and output from R to Word. Add numbering in Word to identify each part of each
problem. (Do this for every problem from now on).

Ans:

(a): 27(48 - 19)

> a <- 27*(48-19)
> a
[1] 783
The * operator performs multiplication and the - operator performs subtraction.
(b): 19^7
> a <- 19^7
> a
[1] 893871739
The ^ operator raises a number to a power.
(c): √(436/12)
> a <- sqrt(436/12)
> a
[1] 6.027714
sqrt() returns the square root of its argument; here the whole quotient 436/12 is placed inside the root.
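A short script that reproduces the three parts of Problem 1 also shows why the parentheses in part (c) matter: taking the root of 436 first and then dividing by 12 gives a different number. This is only a sketch; the variable names part_a, part_b, part_c and alt are illustrative, not part of the assignment.

```r
# Part (a): multiplication and subtraction
part_a <- 27 * (48 - 19)
print(part_a)                # 783

# Part (b): exponentiation with ^
part_b <- 19^7
print(part_b)                # 893871739

# Part (c): square root of the whole quotient 436 / 12
part_c <- sqrt(436 / 12)
print(part_c)                # about 6.027714

# Moving the parenthesis changes the result: root of 436, then divide by 12
alt <- sqrt(436) / 12
print(alt)                   # about 1.740051
```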

Problem 2: Create the following vectors in R (2 Points): a = (5, 10, 15, 20, ..., 160) and b = (97, 96, 95, ...,
56). Use vector arithmetic to multiply these vectors and call the result d. Select subsets of d to
identify the following. (a) What are the 13th, 14th, and 15th elements of d? (b) What are all of the
elements of d that are less than 2000?

Solution:

> a <- seq(5, 160, 5)
> a
[1] 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95
[20] 100 105 110 115 120 125 130 135 140 145 150 155 160

seq() takes three inputs here: the starting number, the ending number, and the increment (its by argument).
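The call seq(5, 160, 5) relies on positional arguments. The sketch below spells out the same call with named arguments, shows an equivalent way to build each vector, and prints the lengths, which matter for the multiplication that follows (32 versus 42 elements). The names a2 and b2 are illustrative.

```r
# seq() with named arguments: start, end, and step size
a <- seq(from = 5, to = 160, by = 5)
length(a)                     # 32 values

# The same vector described by its length instead of its step
a2 <- seq(from = 5, to = 160, length.out = 32)
all.equal(a, a2)              # TRUE

# 97:56 counts down by 1; seq() with a negative step gives the same values
b <- 97:56
b2 <- seq(97, 56, by = -1)
length(b)                     # 42 values
all(b == b2)                  # TRUE
```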
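> b <- 97:56
> b
[1] 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73
[26] 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56
The colon operator builds a sequence in steps of 1; because the first number is larger than the second, it counts down.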

> d <- a*b
Warning message:
In a * b : longer object length is not a multiple of shorter object length
> d
[1] 485 960 1425 1880 2325 2760 3185 3600 4005 4400 4785 5160
[13] 5525 5880 6225 6560 6885 7200 7505 7800 8085 8360 8625 8880
[25] 9125 9360 9585 9800 10005 10200 10385 10560 325 640 945 1240
[37] 1525 1800 2065 2320 2565 2800

The elements of a and b are multiplied pairwise to obtain vector d. Because a has 32 elements and b has 42, R recycles a from its first element to fill the last 10 positions (for example, d[33] = 5 * 65 = 325). The warning appears because 42 is not a multiple of 32.
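R's recycling rule can be seen on a pair of small vectors. In this sketch, x and y are made-up illustrative vectors: the shorter one is reused from its start until it matches the longer one, and R warns because 4 is not a multiple of 3, the same situation as a * b above.

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30, 40)

# x is recycled to (1, 2, 3, 1); R also warns that 4 is not a multiple of 3
z <- x * y
print(z)                      # 10 40 90 40

# Comparing lengths first makes silent recycling visible
if (length(x) != length(y)) {
  message("Lengths differ: ", length(x), " vs ", length(y))
}
```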

(a): > d[13]
[1] 5525
> d[14]
[1] 5880
> d[15]
[1] 6225

The 13th, 14th, and 15th elements of vector d are retrieved by vector indexing.
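(b): > d[d<2000]
[1] 485 960 1425 1880 325 640 945 1240 1525 1800

There are ten elements in d that are less than 2000. The comparison d < 2000 produces a logical vector, and indexing d with it keeps only the positions where the result is TRUE.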
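Both parts can also be answered with single calls. The sketch below rebuilds d, selects the three elements with a range index, and applies which() and sum() to the logical test to report positions and a count. suppressWarnings() only hides the recycling warning already discussed.

```r
a <- seq(5, 160, 5)
b <- 97:56
d <- suppressWarnings(a * b)   # same d as above, warning suppressed

# (a) a range index returns the 13th, 14th and 15th elements together
d[13:15]                       # 5525 5880 6225

# (b) which() gives the positions; summing a logical vector counts TRUEs
which(d < 2000)                # 1 2 3 4 33 34 35 36 37 38
sum(d < 2000)                  # 10
```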

Problem 3: Using d from problem 2, use R to compute the following statistics of d:

(a) sum (b) median (c) standard deviation

Ans:

(a): > sum(d)
[1] 217745

sum() adds up all the elements of vector d.

(b): > median(d)
[1] 4972.5

median() returns the middle value of the sorted elements of d. Because d has an even number of elements (42), the median is the mean of the 21st and 22nd sorted values, 4785 and 5160.

(c): > sd(d)
[1] 3369.714

sd() returns the sample standard deviation of d, which is the square root of its sample variance (the sum of squared deviations from the mean divided by n - 1).
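To see what median() and sd() compute, the sketch below recomputes both from their definitions and compares them with the built-in functions using all.equal(). The names manual_median and manual_sd are illustrative.

```r
a <- seq(5, 160, 5)
b <- 97:56
d <- suppressWarnings(a * b)

n <- length(d)                          # 42 elements
sum(d)                                  # 217745

# Median of an even-length vector: mean of the two middle sorted values
s <- sort(d)
manual_median <- (s[n / 2] + s[n / 2 + 1]) / 2

# Sample standard deviation: squared deviations divided by n - 1
manual_sd <- sqrt(sum((d - mean(d))^2) / (n - 1))

all.equal(manual_median, median(d))     # TRUE
all.equal(manual_sd, sd(d))             # TRUE
```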
Problem 4: Using the concept of vectors, perform the following:

a) Create a vector named student and add 10 values to it.

b) Create a vector named age and add the corresponding values to it.

c) Show the values of both vectors.

d) Show the values at positions 4, 7, and 10 in either vector.

e) Combine the vectors student and age and show them.

f) Add one more vector, called subject, to the existing vector student and show it.

g) Add a new student record to the existing vector and show it.

Ans:

(a):
> student <- c("Ram", "Shyam", "Gyan", "Iqbal", "Ketan", "Param", "Nancy", "Nakul", "Ambika", "Ankur")
A character vector named student is created with 10 names as its elements.

(b)
> age <- c(18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
A numeric vector named age is created that holds the ages of the ten students.

(c)
> student
[1] "Ram" "Shyam" "Gyan" "Iqbal" "Ketan"
[6] "Param" "Nancy" "Nakul" "Ambika" "Ankur"
> age
[1] 18 19 20 21 22 23 24 25 26 27
All the values assigned to student and age are shown by typing the name of each vector at the
prompt.
(d) > student[4]
[1] "Iqbal"
> student[7]
[1] "Nancy"
> student[10]
[1] "Ankur"
The elements in vectors student and age can be referenced using vector indexing.
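Instead of three separate calls, a vector of positions can be used as the index. This sketch recreates the two vectors from parts (a) and (b) and selects positions 4, 7, and 10 from each in one step; the name positions is illustrative.

```r
student <- c("Ram", "Shyam", "Gyan", "Iqbal", "Ketan",
             "Param", "Nancy", "Nakul", "Ambika", "Ankur")
age <- c(18, 19, 20, 21, 22, 23, 24, 25, 26, 27)

# A vector of positions selects several elements at once
positions <- c(4, 7, 10)
student[positions]            # "Iqbal" "Nancy" "Ankur"
age[positions]                # 21 24 27
```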

(e)
> studentInfo<-data.frame(student,age)
> studentInfo
student age
1 Ram 18
2 Shyam 19
3 Gyan 20
4 Iqbal 21
5 Ketan 22
6 Param 23
7 Nancy 24
8 Nakul 25
9 Ambika 26
10 Ankur 27
The two vectors student and age are combined into a data frame with the data.frame() function; each vector becomes one column.
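(f)

> student <- c("subject")
> student
[1] "subject"
Assigning with <- replaces the previous contents of student, so the vector now holds only the value "subject"; the ten names remain available in studentInfo, which was built in part (e).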
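If part (f) is meant to keep the ten names and add one more value, the vector has to be extended rather than reassigned. A minimal sketch, assuming "subject" is simply the extra value to add: c() and append() both return a longer vector that keeps the original names. The names student_extended and student_appended are illustrative.

```r
student <- c("Ram", "Shyam", "Gyan", "Iqbal", "Ketan",
             "Param", "Nancy", "Nakul", "Ambika", "Ankur")

# c() returns a new, longer vector that keeps the existing values
student_extended <- c(student, "subject")
length(student_extended)                         # 11

# append() adds at the end by default (it also accepts an `after` position)
student_appended <- append(student, "subject")
identical(student_extended, student_appended)    # TRUE
```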
(g)
> age<-c(28)
> studentrecord<-data.frame(student,age)
> studentrecord
student age
1 subject 28
> Newstudentrecord<-rbind(studentInfo,studentrecord)
> Newstudentrecord
student age
1 Ram 18
2 Shyam 19
3 Gyan 20
4 Iqbal 21
5 Ketan 22
6 Param 23
7 Nancy 24
8 Nakul 25
9 Ambika 26
10 Ankur 27
11 subject 28
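The data frames studentInfo and studentrecord are combined row-wise with the rbind() function, which places the row of studentrecord below the rows of studentInfo; both data frames must have the same column names.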
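A new student row can also be added directly with rbind(), as long as the new data frame has the same column names as studentInfo. In this sketch the name "Riya" and the age 28 are hypothetical values chosen only for illustration, and new_record is an illustrative name.

```r
studentInfo <- data.frame(
  student = c("Ram", "Shyam", "Gyan", "Iqbal", "Ketan",
              "Param", "Nancy", "Nakul", "Ambika", "Ankur"),
  age = c(18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
)

# Hypothetical new record; column names must match those of studentInfo
new_record <- data.frame(student = "Riya", age = 28)

Newstudentrecord <- rbind(studentInfo, new_record)
nrow(Newstudentrecord)        # 11
tail(Newstudentrecord, 1)     # the appended row
```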
Problem 5: An R list is an object that can contain elements of different types, such as strings, numbers, vectors, and
even another list. A list can also contain a matrix or a function as one of its elements. Lists are created
with the list() function; in other words, a list is a generic vector containing other objects. Create
three lists named vec, char_vec, and logic_vec as follows: vec <- c(1,2,3), char_vec <-
c("Hadoop", "Spark", "Flink", "Mahout"), logic_vec <- c(TRUE, FALSE, TRUE, FALSE). Print the lists,
access the third element of char_vec, and merge all the lists together.

Ans:

> vec<-c(1,2,3)

> char_vec <- c("Hadoop", "Spark", "Flink", "Mahout")

> logic_vec <- c(TRUE, FALSE, TRUE, FALSE)

> vec
[1] 1 2 3

> char_vec

[1] "Hadoop" "Spark" "Flink" "Mahout"

> logic_vec

[1] TRUE FALSE TRUE FALSE

> char_vec[3]

[1] "Flink"

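> a <- c(vec, char_vec, logic_vec)
> a
[1] "1" "2" "3" "Hadoop" "Spark" "Flink" "Mahout" "TRUE"
[9] "FALSE" "TRUE" "FALSE"

c() concatenates the three vectors into a single vector. Because a vector created this way can hold only one type, R coerces every value to the most flexible type present, character, which is why the numbers and logical values appear in quotes.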
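The difference between combining with c() and combining with list() is easiest to see side by side. This sketch builds the three vectors from the problem, shows that c() produces a single character vector, and then keeps each component's type by storing the vectors in a list. The names flat, merged and merged2 are illustrative.

```r
vec <- c(1, 2, 3)
char_vec <- c("Hadoop", "Spark", "Flink", "Mahout")
logic_vec <- c(TRUE, FALSE, TRUE, FALSE)

# c() on plain vectors coerces all values to one common type
flat <- c(vec, char_vec, logic_vec)
class(flat)                    # "character"

# list() keeps each component as a separate object with its own type
merged <- list(vec = vec, char_vec = char_vec, logic_vec = logic_vec)
str(merged)
sapply(merged, class)          # "numeric" "character" "logical"

# Third element of char_vec, reached through the list
merged$char_vec[3]             # "Flink"

# c() applied to lists concatenates the lists themselves
merged2 <- c(list(vec = vec), list(char_vec = char_vec, logic_vec = logic_vec))
identical(merged, merged2)     # TRUE
```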

Problem 6: Load the file named WHO1. Identify the number of variables and objects in the
dataset. After analyzing the variables, perform the following on WHO1 (3 Points):

(a) Visualize the data variables using plot() and boxplot(), label them as you choose, and give each
plot a title. Take a screenshot of your output, copy it into Word, and submit it for evaluation with a
proper explanation. (5 Points)

(b) Create a table for any variable in the dataset and apply the tapply() function.
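The number of variables and objects can be read off with dim(), nrow(), ncol(), and str(). Because the format and location of the WHO1 file are not given, this sketch uses R's built-in iris data frame as a stand-in; the commented read.csv() line, including the file name WHO1.csv, is only an assumption about how WHO1 might be loaded.

```r
# Assumption: WHO1 might be loaded from a CSV file such as
# WHO1 <- read.csv("WHO1.csv")   # hypothetical file name and format
# The built-in iris data frame stands in for WHO1 below.
data(iris)

dim(iris)      # rows (objects) and columns (variables): 150 5
nrow(iris)     # number of objects: 150
ncol(iris)     # number of variables: 5
names(iris)    # variable names
str(iris)      # type and first few values of each variable
```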

Ans:

(a): > plot(WHO1$year, WHO1$e_inc_tbhiv_100k_lo, xlab = "Year", ylab = "TBHIV", main = "Year wise TBHIV Data")
plot() draws a simple x-y graph; given two numeric vectors it produces a scatter plot, here of year against e_inc_tbhiv_100k_lo.

> boxplot(WHO1$e_tbhiv_prct ~ WHO1$g_whoregion, xlab = "Region", ylab = "e_tbhiv_prct", main = "Region wise e_tbhiv_prct")

The formula e_tbhiv_prct ~ g_whoregion draws one box per WHO region. Each box spans from the first
to the third quartile of the data, and the line inside the box marks the median. The distance between
the first and third quartiles is the interquartile range (IQR). The whiskers extend from the box to the
most extreme data points lying within 1.5 times the IQR of the box; points farther out are drawn
individually as outliers. Here we have used the boxplot() function to draw the graph.
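The numbers behind each box can be inspected with boxplot.stats(), which boxplot() calls for each group. In this sketch on a small made-up vector, the value 100 lies more than 1.5 times the hinge spread above the upper hinge, so it is reported as an outlier and the upper whisker stops at 9. R draws the box edges at Tukey's hinges, which are close to, but not always identical with, the first and third quartiles.

```r
# Made-up illustrative data with one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

st <- boxplot.stats(x)   # coef = 1.5 by default
st$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
           # 1.0 3.0 5.5 8.0 9.0
st$out     # values beyond the whiskers: 100
```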
(b): > table(WHO1$g_whoregion)

 AFR  AMR  EMR  EUR  SEA  WPR
 882  835  407 1016  207  684
The table() function builds a frequency table for a categorical variable: it counts how many
observations of WHO1 fall into each WHO region in g_whoregion.
> tapply(WHO1$e_inc_100k_hi, WHO1$g_whoregion, mean)
      AFR       AMR       EMR       EUR       SEA       WPR
419.32971  42.28084 107.44214  50.98944 352.08696 173.03744
The tapply() function splits the values of e_inc_100k_hi into groups by g_whoregion and then
computes the mean of each group.
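Since WHO1 itself is not available here, the sketch below applies table() and tapply() to the built-in iris data instead. It also shows that arguments placed after the function in tapply() are passed on to it; na.rm = TRUE would keep mean() from returning NA for a group whose values include missing entries. The name group_means is illustrative.

```r
data(iris)

# table(): frequency of each category
table(iris$Species)            # 50 of each species

# tapply(): split Sepal.Length by Species, then apply mean() to each group
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)
group_means                    # setosa 5.006, versicolor 5.936, virginica 6.588

# Extra arguments after FUN are passed to it (useful when data contain NA)
tapply(iris$Sepal.Length, iris$Species, mean, na.rm = TRUE)
```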

Problem 7:
(a): > x <- data.frame("Student" =
+ c("Ron","Jake","Ava","Sophia","Mia"),"Marks" = c(35,75,45,30,85))
> x
Student Marks
1 Ron 35
2 Jake 75
3 Ava 45
4 Sophia 30
5 Mia 85

> str(x)

'data.frame': 5 obs. of 2 variables:

$ Student: chr "Ron" "Jake" "Ava" "Sophia" ...

$ Marks : num 35 75 45 30 85

> x$Marks <- ifelse(x$Marks <= 40, "Fail", "Pass")

> print(x)
  Student Marks
1     Ron  Fail
2    Jake  Pass
3     Ava  Pass
4  Sophia  Fail
5     Mia  Pass

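x is a data frame that holds the names of students and their corresponding marks. ifelse() is
vectorized: it tests Marks <= 40 for every row at once and returns "Fail" where the test is TRUE
and "Pass" otherwise. Because the result is assigned back to x$Marks, the numeric marks are
replaced by these labels.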
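Assigning the ifelse() result back to x$Marks discards the numeric marks. A sketch that keeps them stores the label in a new column instead; the column name Result is an assumption, not part of the original task.

```r
x <- data.frame(Student = c("Ron", "Jake", "Ava", "Sophia", "Mia"),
                Marks = c(35, 75, 45, 30, 85))

# Store the label in a new column so the numeric marks are kept
x$Result <- ifelse(x$Marks <= 40, "Fail", "Pass")
print(x)
```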
(b): > mymat <- matrix(nrow = 20, ncol = 20)
> for (i in 1:dim(mymat)[1]) { for (j in 1:dim(mymat)[2]) { mymat[i, j] = i*j } }
> mymat[1:10, 1:10]

[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 2 3 4 5 6 7 8 9 10
[2,] 2 4 6 8 10 12 14 16 18 20
[3,] 3 6 9 12 15 18 21 24 27 30
[4,] 4 8 12 16 20 24 28 32 36 40
[5,] 5 10 15 20 25 30 35 40 45 50
[6,] 6 12 18 24 30 36 42 48 54 60
[7,] 7 14 21 28 35 42 49 56 63 70
[8,] 8 16 24 32 40 48 56 64 72 80
[9,] 9 18 27 36 45 54 63 72 81 90
[10,] 10 20 30 40 50 60 70 80 90 100
mymat is a matrix with 20 rows and 20 columns; the nrow and ncol arguments set those dimensions.
matrix() fills it with NA at first, and the nested for loops then set each cell [i, j] to i*j, producing a
multiplication table. Only the first 10 rows and 10 columns are shown in the result.
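The nested loops fill the matrix one cell at a time. outer() computes the same multiplication table in a single vectorized call, and the sketch below checks that both approaches agree; the name table_outer is illustrative.

```r
# Loop version, as in the answer above
mymat <- matrix(nrow = 20, ncol = 20)
for (i in seq_len(nrow(mymat))) {
  for (j in seq_len(ncol(mymat))) {
    mymat[i, j] <- i * j
  }
}

# Vectorized version: outer() multiplies every pair of row and column indices
table_outer <- outer(1:20, 1:20)
all(mymat == table_outer)     # TRUE
```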
