R Sem3

The document outlines a series of practical problems related to statistical computing using R software, covering tasks such as vector manipulation, matrix operations, and data type conversions. It includes detailed instructions and R code snippets for each problem, demonstrating how to perform operations like rounding, extracting elements, and calculating statistics. The problems are structured to enhance understanding of R programming and statistical analysis techniques.

Uploaded by

npes982

MSTP-305: Practical-III, 3rd Semester, Sept-Dec, 2023

Part-A: Problem Solving using R Software-I
(Statistical Computing-II)
Problem 1

Define a vector of decimal data whose elements are 2.345612, 1.256701, 2.19875, 0.123765, 4.132895, 7.312678, 0.123453, 4.103254, 3.198341, and perform the following tasks:

I. Check the length of this vector and also convert its elements to 3 decimal places.
> x = c(2.345612, 1.256701, 2.19875, 0.123765, 4.132895, 7.312678, 0.123453,
+ 4.103254, 3.198341)
> x
[1] 2.345612 1.256701 2.198750 0.123765 4.132895 7.312678 0.123453 4.103254 3.198341
> length(x)
[1] 9
> round(x,3)
[1] 2.346 1.257 2.199 0.124 4.133 7.313 0.123 4.103 3.198

II. Extract the 4th and 6th entry.


> x[4]
[1] 0.123765
> x[6]
[1] 7.312678

III. Extract all entries whose values are greater than or equal to 3.
> x[x>=3]
[1] 4.132895 7.312678 4.103254 3.198341

IV. Check whether it’s a numeric vector, character vector or integer vector?
> class(x)
[1] "numeric"

V. Calculate the square root and natural log of each entry in a single command.
> sqrt(x)
[1] 1.5315391 1.1210268 1.4828183 0.3518025 2.0329523 2.7041964 0.3513588 2.0256490
1.7883906
> log(x)
[1] 0.8525463 0.2284900 0.7878890 -2.0893707 1.4189781 1.9896096 -2.0918948
1.4117803 1.1626322
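
Task V asks for a single command, while the transcript above uses two. One possible way to get both transformations at once is to bind them as columns. This is only a sketch; the name `both` is illustrative and not part of the original solution.

```r
x <- c(2.345612, 1.256701, 2.19875, 0.123765, 4.132895,
       7.312678, 0.123453, 4.103254, 3.198341)

# One command returning a 9 x 2 matrix:
# square roots in the first column, natural logs in the second
both <- cbind(sqrt = sqrt(x), log = log(x))
both
```
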
VI. Convert this vector to character and integer vector, thereafter perform the checking by using
testing functions available in R.
> y = as.character(x)
> y
[1] "2.345612" "1.256701" "2.19875" "0.123765" "4.132895" "7.312678" "0.123453"
"4.103254" "3.198341"
> z = as.integer(x)
> z
[1] 2 1 2 0 4 7 0 4 3
> is.numeric(x)
[1] TRUE
> is.character(y)
[1] TRUE
> is.integer(z)
[1] TRUE

VII. Name the vector elements as X1, X2, X3, ….


> names(x) = paste0("X", 1:9)
> x
X1 X2 X3 X4 X5 X6 X7 X8 X9
2.345612 1.256701 2.198750 0.123765 4.132895 7.312678 0.123453 4.103254 3.198341

VIII. Using the names, subset the vector to the values of X3, X6 and X9.
> x[c("X3", "X6", "X9")]
X3 X6 X9
2.198750 7.312678 3.198341
Problem 2

Generate a vector of 20 elements starting with 2.5 by a jump of 0.3 and perform the following tasks:

I. Check the class and type of this vector.


> x = seq(2.5, by=0.3, length=20)
> x
[1] 2.5 2.8 3.1 3.4 3.7 4.0 4.3 4.6 4.9 5.2 5.5 5.8 6.1 6.4 6.7 7.0 7.3 7.6 7.9
[20] 8.2
> class(x)
[1] "numeric"
> typeof(x)
[1] "double"

II. Convert this to complex vector and save with name z.


> z = as.complex(x)
> z
[1] 2.5+0i 2.8+0i 3.1+0i 3.4+0i 3.7+0i 4.0+0i 4.3+0i 4.6+0i 4.9+0i 5.2+0i
[11] 5.5+0i 5.8+0i 6.1+0i 6.4+0i 6.7+0i 7.0+0i 7.3+0i 7.6+0i 7.9+0i 8.2+0i

III. Extract all the elements which are at the even places.
> even = x[seq(2,length(x), by=2)]
> even
[1] 2.8 3.4 4.0 4.6 5.2 5.8 6.4 7.0 7.6 8.2

IV. Extract the first five elements. Thereafter bind them row wise and column wise; and
store under the name row_mat. Perform the testing for matrix and vector.
> x[1:5]
[1] 2.5 2.8 3.1 3.4 3.7
> row_mat = rbind(x[1:5])
> row_mat
[,1] [,2] [,3] [,4] [,5]
[1,] 2.5 2.8 3.1 3.4 3.7
> col_mat =cbind(x[1:5])
> col_mat
[,1]
[1,] 2.5
[2,] 2.8
[3,] 3.1
[4,] 3.4
[5,] 3.7
> is.vector(x)
[1] TRUE
> is.matrix(row_mat)
[1] TRUE
> is.matrix(col_mat)
[1] TRUE
Problem 3

Generate one vector of size n=5 of 2’s and a second vector of size m=4 of 5’s and perform the
following tasks:

I. Combine them into a single vector.


> x = rep(2, times=5)
> y = rep(5, times=4)
> z = c(x,y)
> z
[1] 2 2 2 2 2 5 5 5 5
II. Add an element to the original vector, whose value is 1 and store them under the name
value.
> value = c(z,1)
> value
[1] 2 2 2 2 2 5 5 5 5 1

III. Extract all the elements which are at the odd places.
> odd = value[c(TRUE,FALSE)] # method 1
> odd
[1] 2 2 2 5 5
> odd = value[seq(1,length(value), by=2)] # method 2
> odd
[1] 2 2 2 5 5
Problem 4

Define a vector of the complex numbers, whose elements are 1+2i, 4-1i,2+1i,2- 2i,5-1i,3+6i and
perform the following tasks:

I. Extract the real and imaginary part of the above vector.


> x = c(1+2i, 4-1i, 2+1i, 2-2i, 5-1i, 3+6i)
> x
[1] 1+2i 4-1i 2+1i 2-2i 5-1i 3+6i
> Re(x)
[1] 1 4 2 2 5 3
> Im(x)
[1] 2 -1 1 -2 -1 6

II. Calculate the modulus and complex conjugate of the above vector.
> Mod(x)
[1] 2.236068 4.123106 2.236068 2.828427 5.099020 6.708204
> Conj(x)
[1] 1-2i 4+1i 2-1i 2+2i 5+1i 3-6i

III. Check the type and class of this vector, also use str() function to get the full detail of this
vector.
> typeof(x)
[1] "complex"
> class(x)
[1] "complex"
> str(x)
cplx [1:6] 1+2i 4-1i 2+1i ...
Problem 5

Define the following matrix with name mat_A in R.

5.07 3.19 -2.32 1.87


1.21 8.95 -2.04 4.49
-3.32 -4.66 9.93 2.05
1.18 6.72 -3.66 8.77

I. Print the number of row, column and dimensions of the above matrix.
> x = c(5.07, 3.19, -2.32, 1.87, 1.21, 8.95, -2.04, 4.49, -3.32, -4.66, 9.93, 2.05, 1.18, 6.72,
+ -3.66, 8.77)
> mat_A = matrix(x, 4, 4, byrow = TRUE)
> mat_A
[,1] [,2] [,3] [,4]
[1,] 5.07 3.19 -2.32 1.87
[2,] 1.21 8.95 -2.04 4.49
[3,] -3.32 -4.66 9.93 2.05
[4,] 1.18 6.72 -3.66 8.77
> dim(mat_A)
[1] 4 4
> nrow(mat_A)
[1] 4
> ncol(mat_A)
[1] 4

II. Calculate the transpose, inverse, row sums, column sums, row mean and column mean of
the above matrix.
> t(mat_A) # transpose
[,1] [,2] [,3] [,4]
[1,] 5.07 1.21 -3.32 1.18
[2,] 3.19 8.95 -4.66 6.72
[3,] -2.32 -2.04 9.93 -3.66
[4,] 1.87 4.49 2.05 8.77
> solve(mat_A) # inverse
[,1] [,2] [,3] [,4]
[1,] 0.23714256 -0.03800968 0.0332654713 -0.03888111
[2,] -0.02411353 0.18858693 -0.0005364163 -0.09128430
[3,] 0.06513140 0.09627464 0.1034974210 -0.08737042
[4,] 0.01375093 -0.09921181 0.0391279387 0.15274054
> rowSums(mat_A)
[1] 7.81 12.61 4.00 13.01
> colSums(mat_A)
[1] 4.14 14.20 1.91 17.18
> rowMeans(mat_A)
[1] 1.9525 3.1525 1.0000 3.2525
> colMeans(mat_A)
[1] 1.0350 3.5500 0.4775 4.2950
III. Use R function to print the diagonal of mat_A.
> diag(mat_A)
[1] 5.07 8.95 9.93 8.77

IV. Extract the 2nd column of mat_A.


> mat_A[,2]
[1] 3.19 8.95 -4.66 6.72

V. Extract the 3rd row of mat_A.


> mat_A[3,]
[1] -3.32 -4.66 9.93 2.05

VI. Extract the 4th entry of the 3rd column of mat_A.


> mat_A[4,3]
[1] -3.66

VII. Define the row names as R1, R2, R3, R4.


> rownames(mat_A) = c("R1","R2","R3","R4")
> mat_A
[,1] [,2] [,3] [,4]
R1 5.07 3.19 -2.32 1.87
R2 1.21 8.95 -2.04 4.49
R3 -3.32 -4.66 9.93 2.05
R4 1.18 6.72 -3.66 8.77

VIII. After step 7, define the column names also as C1, C2, C3, C4.
> colnames(mat_A) = c("C1", "C2","C3","C4")
> mat_A
C1 C2 C3 C4
R1 5.07 3.19 -2.32 1.87
R2 1.21 8.95 -2.04 4.49
R3 -3.32 -4.66 9.93 2.05
R4 1.18 6.72 -3.66 8.77
Problem 6

Generate two sequences, each of size 25, starting at 1 with jumps of 0.15 and 0.45 respectively. Using these two
sequences, make two matrices of order 5x5, named mat_B and mat_C.

> x = seq(1,by=0.15, length=25)


> y = seq(1,by=0.45, length=25)
> mat_B = matrix(x,5,5)
> mat_B
[,1] [,2] [,3] [,4] [,5]
[1,] 1.00 1.75 2.50 3.25 4.00
[2,] 1.15 1.90 2.65 3.40 4.15
[3,] 1.30 2.05 2.80 3.55 4.30
[4,] 1.45 2.20 2.95 3.70 4.45
[5,] 1.60 2.35 3.10 3.85 4.60
> mat_C = matrix(y,5,5)
> mat_C
[,1] [,2] [,3] [,4] [,5]
[1,] 1.00 3.25 5.50 7.75 10.00
[2,] 1.45 3.70 5.95 8.20 10.45
[3,] 1.90 4.15 6.40 8.65 10.90
[4,] 2.35 4.60 6.85 9.10 11.35
[5,] 2.80 5.05 7.30 9.55 11.80

I. Perform addition, subtraction and multiplication with the above two matrices.
> mat_B + mat_C
[,1] [,2] [,3] [,4] [,5]
[1,] 2.0 5.0 8.0 11.0 14.0
[2,] 2.6 5.6 8.6 11.6 14.6
[3,] 3.2 6.2 9.2 12.2 15.2
[4,] 3.8 6.8 9.8 12.8 15.8
[5,] 4.4 7.4 10.4 13.4 16.4
> mat_B - mat_C
[,1] [,2] [,3] [,4] [,5]
[1,] 0.0 -1.5 -3.0 -4.5 -6.0
[2,] -0.3 -1.8 -3.3 -4.8 -6.3
[3,] -0.6 -2.1 -3.6 -5.1 -6.6
[4,] -0.9 -2.4 -3.9 -5.4 -6.9
[5,] -1.2 -2.7 -4.2 -5.7 -7.2
> mat_B * mat_C
[,1] [,2] [,3] [,4] [,5]
[1,] 1.0000 5.6875 13.7500 25.1875 40.0000
[2,] 1.6675 7.0300 15.7675 27.8800 43.3675
[3,] 2.4700 8.5075 17.9200 30.7075 46.8700
[4,] 3.4075 10.1200 20.2075 33.6700 50.5075
[5,] 4.4800 11.8675 22.6300 36.7675 54.2800
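
Note that `*` multiplies two matrices entry by entry. If the task is read as asking for the matrix (row-by-column) product, `%*%` is the operator to use. The sketch below contrasts the two on the same matrices; the names `elementwise` and `mat_product` are illustrative only.

```r
mat_B <- matrix(seq(1, by = 0.15, length.out = 25), 5, 5)
mat_C <- matrix(seq(1, by = 0.45, length.out = 25), 5, 5)

elementwise <- mat_B * mat_C    # entry (i, j) is B[i, j] * C[i, j]
mat_product <- mat_B %*% mat_C  # entry (i, j) is the sum over k of B[i, k] * C[k, j]

elementwise[1, 2]               # 1.75 * 3.25
mat_product[1, 2]               # same as sum(mat_B[1, ] * mat_C[, 2])
```
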
II. Extract the 1st, 3rd and 5th column of the mat_B and store under the name mat_D.
> mat_D = mat_B[,c(1,3,5)]
> mat_D
[,1] [,2] [,3]
[1,] 1.00 2.50 4.00
[2,] 1.15 2.65 4.15
[3,] 1.30 2.80 4.30
[4,] 1.45 2.95 4.45
[5,] 1.60 3.10 4.60

III. Combine 5th column of mat_C to mat_D.


> mat_D = cbind(mat_D, mat_C[,5])
> mat_D
[,1] [,2] [,3] [,4]
[1,] 1.00 2.50 4.00 10.00
[2,] 1.15 2.65 4.15 10.45
[3,] 1.30 2.80 4.30 10.90
[4,] 1.45 2.95 4.45 11.35
[5,] 1.60 3.10 4.60 11.80

IV. Combine mat_B and mat_C row wise and column wise.
> rbind(mat_B, mat_C)
[,1] [,2] [,3] [,4] [,5]
[1,] 1.00 1.75 2.50 3.25 4.00
[2,] 1.15 1.90 2.65 3.40 4.15
[3,] 1.30 2.05 2.80 3.55 4.30
[4,] 1.45 2.20 2.95 3.70 4.45
[5,] 1.60 2.35 3.10 3.85 4.60
[6,] 1.00 3.25 5.50 7.75 10.00
[7,] 1.45 3.70 5.95 8.20 10.45
[8,] 1.90 4.15 6.40 8.65 10.90
[9,] 2.35 4.60 6.85 9.10 11.35
[10,] 2.80 5.05 7.30 9.55 11.80
> cbind(mat_B,mat_C)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1.00 1.75 2.50 3.25 4.00 1.00 3.25 5.50 7.75 10.00
[2,] 1.15 1.90 2.65 3.40 4.15 1.45 3.70 5.95 8.20 10.45
[3,] 1.30 2.05 2.80 3.55 4.30 1.90 4.15 6.40 8.65 10.90
[4,] 1.45 2.20 2.95 3.70 4.45 2.35 4.60 6.85 9.10 11.35
[5,] 1.60 2.35 3.10 3.85 4.60 2.80 5.05 7.30 9.55 11.80
Problem 7

Tree=c(rep(1,7),rep(2,7),rep(3,7),rep(4,7),rep(5,7))
Age=rep(c(118,484,664,1004,121,1372,1582),5)
Circumferences=c(30,58,87,115,120,142,145,33,69,111,156,172,203,203,30,51,75,108,115,139,140,
32,62,112,167,179,209,214,30,49,61,125,142,174,177)
df =data.frame(Tree,Age,Circumferences)
df

d = df[, c("Age", "Circumferences")]


d

for (i in 1:5)
{
print(i)
print("Minimum:")
print(min(df$Circumferences[Tree==i]))
print("Maximum:")
print(max(df$Circumferences[Tree==i]))
print("Mean:")
print(mean(df$Circumferences[Tree==i]))
print("Standard Deviation:")
print(sd(df$Circumferences[Tree==i]))
print('______________________')
}

m3 = mean(df$Circumferences[Tree==3])
m3
m5=mean(df$Circumferences[Tree==5])
m5
m3>m5
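
The per-tree summary printed by the loop can also be collected into one table. The following is a sketch of that idea using `split()` and `sapply()`; it assumes the same Tree and Circumferences values as above, and the name `stats` is illustrative.

```r
Tree <- rep(1:5, each = 7)
Circumferences <- c(30, 58, 87, 115, 120, 142, 145,
                    33, 69, 111, 156, 172, 203, 203,
                    30, 51, 75, 108, 115, 139, 140,
                    32, 62, 112, 167, 179, 209, 214,
                    30, 49, 61, 125, 142, 174, 177)
df <- data.frame(Tree, Circumferences)

# For each tree, compute min, max, mean and standard deviation of circumference
stats <- sapply(split(df$Circumferences, df$Tree),
                function(v) c(min = min(v), max = max(v), mean = mean(v), sd = sd(v)))

t(stats)  # one row per tree, one column per statistic
```
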
Problem 8
Plot the density function of the standard normal distribution on the interval (-6, 6) and add the points
-6(1)6 (from -6 to 6 in steps of 1) to the density curve. Also, plot the curve of the distribution function.

> x = seq(-6, 6, length.out = 1000)
> d = dnorm(x)
> p = pnorm(x)

> plot(x, d, type = 'l', col = 'skyblue', lwd = 2, ylim = c(0, 1), ylab = 'Density',
+ main = 'Standard Normal Distribution Density and Distribution Functions')

> y = seq(-6, 6, by = 1) # the points -6(1)6: from -6 to 6 in steps of 1
> points(y, dnorm(y), col = 'blue', pch = 16)

> lines(x, p, col = 'coral', lwd = 2)

> legend('topleft', legend = c('Density Function', 'Distribution Function', 'Points'),
+ col = c('skyblue', 'coral', 'blue'), lwd = c(2, 2, NA), pch = c(NA, NA, 16), bty = 'n')
Problem 9

Draw a random sample of size 25 from each pdf/pmf given below. Also, plot the pdf
(probability density function) and cdf (cumulative distribution function) of the following
continuous/discrete distributions.
I. Binomial with size=60, prob=0.3, 0.5, 0.9.
II. Poisson λ= 2, 5, 10
III. Gamma distribution
IV. Standard Cauchy distribution
V. Weibull distribution

> library(MASS)
> library(ggplot2)

> par(mfrow=c(3,3), mar=c(4,4,2,2), oma=c(2,0,2,0))

# binomial
> n = 60
> p = c(0.3,0.5,0.9)
> for (prob in p) {
+ x_binom = rbinom(25, size = n, prob = prob)
+ hist(x_binom, probability = TRUE, col = "lightblue",
+ main = paste("Binomial (n = 60, p = ", prob, ")", sep = ""), xlab = "Values", ylab = "Density")
+ lines(density(x_binom), col = "darkred", lwd = 2)
+ lines(ecdf(x_binom), col = "blue", lwd = 2)
+ }

# poisson
> l = c(2, 5, 10)
> for (lambda in l) {
+ x_pois <- rpois(25, lambda = lambda)
+ hist(x_pois, probability = TRUE, col = "lightgreen",
+ main = paste("Poisson (lambda = ", lambda, ")", sep = ""), xlab = "Values", ylab = "Density")
+ lines(density(x_pois), col = "darkred", lwd = 2)
+ lines(ecdf(x_pois), col = "blue", lwd = 2)
+ }

# gamma
> shape = 2
> scale = 1
> x_gamma <- rgamma(25, shape = shape, scale = scale)
> hist(x_gamma, probability = TRUE, col = "lightcoral", main = "Gamma (shape = 2, scale = 1)",
+ xlab = "Values", ylab = "Density")
> lines(density(x_gamma), col = "darkred", lwd = 2)
> lines(ecdf(x_gamma), col = "blue", lwd = 2)
# cauchy
> x_cauchy = rcauchy(25)
> hist(x_cauchy,probability = TRUE, col = "lightgoldenrod", main = "Standard Cauchy",
xlab = "Values", ylab = "Density")
> lines(density(x_cauchy), col = "darkred", lwd = 2)
> lines(ecdf(x_cauchy), col = "blue", lwd = 2)

# weibull
> shape = 2
> scale = 1
> x_weib = rweibull(25, shape = shape, scale = scale)
> hist(x_weib, probability = TRUE, col = "lavenderblush", main = "Weibull (shape=2, scale=1)",
+ xlab = "Values", ylab = "Density")
> lines(density(x_weib), col = "darkred", lwd = 2)
> lines(ecdf(x_weib), col = "blue", lwd = 2)

> mtext("Probability Density Function (PDF) and Cumulative Distribution Function (CDF)",
outer = TRUE, line = 0.5, cex = 1)
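
The curves drawn above are estimates from the 25 simulated values (a kernel density and an empirical cdf). If the theoretical pmf and cdf are wanted instead, they can be evaluated directly with the `d` and `p` functions. A minimal sketch for one binomial case (n = 60, p = 0.5), with illustrative variable names:

```r
n <- 60
prob <- 0.5
k <- 0:n

pmf <- dbinom(k, size = n, prob = prob)  # P(X = k)
cdf <- pbinom(k, size = n, prob = prob)  # P(X <= k)

op <- par(mfrow = c(1, 2))
plot(k, pmf, type = "h", xlab = "k", ylab = "P(X = k)",
     main = "Binomial pmf (n = 60, p = 0.5)")
plot(k, cdf, type = "s", xlab = "k", ylab = "P(X <= k)",
     main = "Binomial cdf (n = 60, p = 0.5)")
par(op)
```

The same pattern should carry over to the other distributions with `dpois`/`ppois`, `dgamma`/`pgamma`, `dcauchy`/`pcauchy` and `dweibull`/`pweibull`, using `curve()` or a fine grid for the continuous ones.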
Problem 10

Create a data frame using the following data for some 20 vegetables at a supermarket sold in a
month.
The names of the vegetables (in order) are as follows: Eggplant, Pea, Cucumber, Potato, Pumpkin,
Lettuce, Tomato, Sweet potato, Mushroom, Green bean, Corn, Cauliflower, Beetroot, Bell pepper,
Broccoli, Celery, Cabbage, Carrot, Onion, Lady finger. Perform the following:

> df = data.frame(
+ Sales_Kg = c(200, 100, 100, 400, 500, 800, 150, 200, 120, 400, 500, 185, 200, 150, 240,
500, 250, 170, 350, 180),
+ Price_Kg = c(40, 80, 40, 60, 70, 90, 40, 90, 30, 30, 60, 60, 100, 65, 45, 60, 65, 35, 85, 60),
+ Color_Flag = c(2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1) )
> df
Sales_Kg Price_Kg Color_Flag
1 200 40 2
2 100 80 1
3 100 40 1
4 400 60 2
5 500 70 2
6 800 90 1
7 150 40 2
8 200 90 2
9 120 30 2
10 400 30 1
11 500 60 1
12 185 60 1
13 200 100 2
14 150 65 2
15 240 45 1
16 500 60 1
17 250 65 1
18 170 35 2
19 350 85 2
20 180 60 1
I. Add these names, as the row names of the above data frame.
> Vegetable = c("Eggplant", "Pea", "Cucumber", "Potato", "Pumpkin", "Lettuce", "Tomato",
+ "Sweet potato", "Mushroom", "Green bean", "Corn", "Cauliflower", "Beetroot",
+ "Bell pepper", "Broccoli", "Celery", "Cabbage", "Carrot", "Onion", "Lady finger")
> row.names(df) = Vegetable
> df

II. Calculate the revenue for each vegetable and its proportion in the total revenue.
> df$Revenue = df$Sales_Kg * df$Price_Kg
> tot_rev = sum(df$Revenue)
> df$Proportion = round((df$Revenue / tot_rev),2)
> df

III. Add the label ‘green’ where 1 is written and the label ‘other.color’ where 2 is written.
> df$Label = ifelse(df$Color_Flag == 1, "green", "other.color")
> df

IV. Calculate the difference between the revenues of ‘green’ vegetables and ‘other.color’ vegetables.
> green_rev = sum(df$Revenue[df$Label == "green"])
> other_rev = sum(df$Revenue[df$Label == "other.color"])
> diff = green_rev - other_rev
> diff
[1] 44900

V. Replace the label ‘other.color’ by ‘not.green’.


> df$Label[df$Label == "other.color"] = "not.green"
> df

VI. Extract all the rows where ‘not.green’ appears.


> df[df$Label == "not.green", ]
             Sales_Kg Price_Kg Color_Flag Revenue Proportion     Label
Eggplant          200       40          2    8000       0.02 not.green
Potato            400       60          2   24000       0.07 not.green
Pumpkin           500       70          2   35000       0.10 not.green
Tomato            150       40          2    6000       0.02 not.green
Sweet potato      200       90          2   18000       0.05 not.green
Mushroom          120       30          2    3600       0.01 not.green
Beetroot          200      100          2   20000       0.05 not.green
Bell pepper       150       65          2    9750       0.03 not.green
Carrot            170       35          2    5950       0.02 not.green
Onion             350       85          2   29750       0.08 not.green
Problem 11
> data=USArrests
> head(data)

> murder=data$Murder
> murder
> assault=data$Assault
> assault
> rape=data$Rape
> rape

> s1=s2=s3=0
> for(i in 1:nrow(data))
+{
+ s1=s1+murder[i]
+ s2=s2+assault[i]
+ s3=s3+rape[i]
+}

> s1=s2=s3=0
> i=1
> while(i<=nrow(data))
+{
+ s1=s1+murder[i]
+ s2=s2+assault[i]
+ s3=s3+rape[i]
+ i=i+1
+}

> s1=s2=s3=0
> i=1
> repeat
+{
+ s1=s1+murder[i]
+ s2=s2+assault[i]
+ s3=s3+rape[i]
+ i=i+1
+ if(i>length(murder))
+{
+ break
+}
+}

> sum(murder)
> sum(assault)
> sum(rape)
> subset=data.frame()
> for(i in 1:nrow(data))
+{
+ if(murder[i]>22)
+{
+ next
+}
+ else if(murder[i] <10)
+{
+ subset=rbind(subset,data[i,])
+}
+}
> subset

> row_sums=apply(data,1,sum)
> row_sums
> col_sums=apply(data,2,sum)
> col_sums
> row_avg=apply(data,1,mean)
> row_avg
> col_avg=apply(data,2,mean)
> col_avg
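
The loops above are written out to practise `for`, `while` and `repeat`. For comparison, here is a vectorised sketch of the same totals and of the same row filter (the `for` loop keeps exactly the states with a murder rate below 10). The names `totals` and `low_murder` are illustrative.

```r
data <- USArrests

# Column totals without an explicit loop
totals <- colSums(data[, c("Murder", "Assault", "Rape")])
totals

# Same rows as the loop with next / else if: murder rate below 10
low_murder <- data[data$Murder < 10, ]
head(low_murder)
```
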
Problem 12

x = rgamma(100,shape=3,scale=2)
s=sum(x)
true_shape=3
true_scale=2
n=100

mle_scale=s/(n*true_shape)
mle_scale
# MLE of scale parameter: 1.917102

scale_val=seq(from=0.1,to=10,by=0.1)
log_likelihood=((true_shape-1)*sum(log(x)))-(s/scale_val)-(n*true_shape*log(scale_val))-
(n*lgamma(true_shape))
plot(scale_val,log_likelihood,type="l",xlab="Scale Parameter",ylab="Log Likelihood",
main="Log Likelihood for Scale Parameter")
abline(v=mle_scale,col="red",lty=2)

x=rexp(n=100,rate = 2.5)
s=sum(x)
true_lambda=2.5
n=100

mle_lambda=n/s
mle_lambda
# MLE of lambda: 2.461891

lambda_val=seq(0.1,10,by = 0.1)
log_likelihood=(n*log(lambda_val))-(lambda_val*s)
plot(lambda_val,log_likelihood,type="l",xlab="Lambda",ylab="Log Likelihood",
main="Log Likelihood for Lambda")
abline(v=mle_lambda,col="red",lty=2)
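
As a cross-check on the closed-form estimate, the log-likelihood can also be maximised numerically. The sketch below does this for the gamma scale parameter with the shape treated as known, using `optimize()`. The seed and the names `loglik`, `closed_form` and `numeric_max` are assumptions made for illustration, so the values will differ from the run shown above.

```r
set.seed(1)  # arbitrary seed, only so that this sketch is reproducible
shape <- 3
x <- rgamma(100, shape = shape, scale = 2)
n <- length(x)

# Log-likelihood of the scale parameter, shape assumed known
loglik <- function(scale) {
  (shape - 1) * sum(log(x)) - sum(x) / scale -
    n * shape * log(scale) - n * lgamma(shape)
}

closed_form <- sum(x) / (n * shape)
numeric_max <- optimize(loglik, interval = c(0.1, 10),
                        maximum = TRUE, tol = 1e-8)$maximum

c(closed_form = closed_form, numeric = numeric_max)
```

The two numbers should agree to several decimal places, which is consistent with the dashed vertical line sitting at the peak of the plotted curve.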
Problem 13

x = runif(50)
y = rnorm(100, mean = 0, sd = 1)
z = rexp(50, 0.1)

lst = list(x = x, y = y, z = z)

f = function(n)
{
result = list(
average = mean(n),
sum = sum(n),
range = range(n),
median = median(n),
minimum = min(n),
maximum = max(n),
log_e = log(n),
log_10 = log10(n)
)
return(result)
}

lapply(lst[c("x", "z")], f)
sapply(lst[c("x", "z")], f)

y = y[y > 0]
lst = list(x = x, y = y, z = z)

lapply(lst, f)
sapply(lst, f)
Problem 14

newton_raphson = function(initial_guess, tolerance = 0.0001) {


v1 = initial_guess
iterations = 0

repeat {
v2 = v1 - (exp(-v1) - 5 * v1 - 2) / (-exp(-v1) - 5)
d = v2 - v1
iterations = iterations + 1

if (abs(d) < tolerance) {


break
}

v1 = v2
}

result = list(root = v2, iterations = iterations)


return(result)
}

newton_raphson(initial_guess = 0)
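
The update rule in the function corresponds to f(x) = exp(-x) - 5x - 2 with derivative f'(x) = -exp(-x) - 5. One way to sanity-check the Newton-Raphson root is to compare it with base R's `uniroot()`. This is a sketch; the bracketing interval (-1, 0) is chosen here because f(-1) > 0 and f(0) < 0.

```r
f <- function(x) exp(-x) - 5 * x - 2

# f changes sign on (-1, 0), so a root lies in that interval
check <- uniroot(f, interval = c(-1, 0), tol = 1e-10)

check$root     # should agree with the Newton-Raphson root up to the tolerance used
f(check$root)  # should be very close to 0
```
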
Problem 15

> x = c(3,5,7,9,2)
> W = diag(x)
> W
[,1] [,2] [,3] [,4] [,5]
[1,] 3 0 0 0 0
[2,] 0 5 0 0 0
[3,] 0 0 7 0 0
[4,] 0 0 0 9 0
[5,] 0 0 0 0 2
> diag(4)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 1
> matrix(1,3,3)
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 1
[3,] 1 1 1

Problem 16

> p = matrix(c(0.5,0.5,0.0,0.0,0.0,
+ 0.5,0.5,0.0,0.0,0.0,
+ 0.0,0.0,0.3333333,0.3333333,0.0,
+ 0.0,0.0,0.6666667,0.6666667,0.0,
+ 0.2,0.2,0.2,0.2,0.2),5,5,byrow=TRUE)

> p_100 = p
> for (i in 1:99) {
+ p_100 = p_100 %*% p
+}

> p_100
[,1] [,2] [,3] [,4] [,5]
[1,] 0.50 0.50 0.0000000 0.0000000 0.000000e+00
[2,] 0.50 0.50 0.0000000 0.0000000 0.000000e+00
[3,] 0.00 0.00 0.3333333 0.3333333 0.000000e+00
[4,] 0.00 0.00 0.6666667 0.6666667 0.000000e+00
[5,] 0.25 0.25 0.2500000 0.2500000 1.267651e-70
Problem 17
> library(MASS)

> A = matrix(c(3, 1, 5, 2,
+ 1,11, 7, 6,
+ 3, 7, 4, 1,
+ 6, 3, 7, 1), nrow = 4, byrow = TRUE)
> A_plus = ginv(A)

> verify1 = function(A, A_plus) {


+ prop1 = all(round(A %*% A_plus %*% A, 8) == round(A, 8))
+ prop2 = all(round(A_plus %*% A %*% A_plus, 8) == round(A_plus, 8))
+ prop3 = all(round(t(A %*% A_plus), 8) == round(A %*% A_plus, 8))
+ prop4 = all(round(t(A_plus %*% A), 8) == round(A_plus %*% A, 8))
+ return(list(Property_1 = prop1, Property_2 = prop2, Property_3 = prop3, Property_4 = prop4))
+}

> verify1(A, A_plus)


$Property_1
[1] TRUE

$Property_2
[1] TRUE

$Property_3
[1] TRUE

$Property_4
[1] TRUE

> verify2 = function(A, A_plus) {


+ prop_a = qr(A)$rank == qr(A_plus)$rank
+ prop_b = all(round(A %*% A_plus, 0) == round(A_plus %*% A, 0) & round(A %*% A_plus, 0) ==
+ diag(4))
+ prop_c = all(round(t(ginv(t(A %*% A_plus))), 8) == round(A %*% A_plus, 8))
+ return(list(Property_A = prop_a, Property_B = prop_b, Property_C = prop_c))
+}

> verify2(A, A_plus)


$Property_A
[1] TRUE

$Property_B
[1] TRUE

$Property_C
[1] TRUE
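
The checks above compare rounded matrices with `==`. An alternative sketch, using only base R, builds the pseudoinverse from the singular value decomposition and compares with `all.equal()`, which tests equality up to a floating-point tolerance. The threshold used to drop tiny singular values is an assumption of this sketch, not necessarily the rule `ginv()` applies.

```r
A <- matrix(c(3,  1, 5, 2,
              1, 11, 7, 6,
              3,  7, 4, 1,
              6,  3, 7, 1), nrow = 4, byrow = TRUE)

# Pseudoinverse from the SVD A = U D V'
s <- svd(A)
keep <- s$d > max(dim(A)) * max(s$d) * .Machine$double.eps  # drop numerically zero singular values
A_plus <- s$v[, keep, drop = FALSE] %*%
  diag(1 / s$d[keep], nrow = sum(keep)) %*%
  t(s$u[, keep, drop = FALSE])

# The four Penrose conditions, each compared up to numerical tolerance
penrose <- c(
  isTRUE(all.equal(A %*% A_plus %*% A, A)),
  isTRUE(all.equal(A_plus %*% A %*% A_plus, A_plus)),
  isTRUE(all.equal(t(A %*% A_plus), A %*% A_plus)),
  isTRUE(all.equal(t(A_plus %*% A), A_plus %*% A))
)
penrose
```
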
Problem 18

> a1 = c(1,1,0)
> a2 = c(1,0,1)
> a3 = c(0,1,1)

> b1 = a1
> b2 = a2 - ((sum(a2 * b1)) * b1) / (sum(b1^2))
> b3 = a3 - ((sum(a3 * b1)) * b1) / sum(b1^2) - ((sum(a3 * b2)) * b2) / sum(b2^2)

> og_basis = list(b1, b2, b3)
> og_basis

[[1]]
[1] 1 1 0

[[2]]
[1] 0.5 -0.5 1.0

[[3]]
[1] -0.6666667 0.6666667 0.6666667

> b1_norm = b1 / (sqrt(sum(b1^2)))
> b2_norm = b2 / (sqrt(sum(b2^2)))
> b3_norm = b3 / (sqrt(sum(b3^2)))

> on_basis = list(b1_norm, b2_norm, b3_norm)
> on_basis

[[1]]
[1] 0.7071068 0.7071068 0.0000000

[[2]]
[1] 0.4082483 -0.4082483 0.8164966

[[3]]
[1] -0.5773503 0.5773503 0.5773503
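
A result of the Gram-Schmidt process can be checked by confirming that the normalised vectors have pairwise inner products of 0 and unit length. The sketch below wraps the classical procedure in a small helper and performs that check; `gram_schmidt` is a hypothetical name introduced here, and the columns of the input are assumed to be linearly independent.

```r
# Classical Gram-Schmidt on the columns of A (columns assumed linearly independent)
gram_schmidt <- function(A) {
  B <- A
  for (j in seq_len(ncol(A))) {
    for (k in seq_len(j - 1)) {
      # subtract the projection of the original column j onto the already-built b_k
      B[, j] <- B[, j] - sum(A[, j] * B[, k]) / sum(B[, k]^2) * B[, k]
    }
  }
  B
}

A <- cbind(c(1, 1, 0), c(1, 0, 1), c(0, 1, 1))
B <- gram_schmidt(A)                        # orthogonal columns
Q <- sweep(B, 2, sqrt(colSums(B^2)), "/")   # normalise each column to unit length

round(crossprod(Q), 10)  # Q'Q is the identity matrix when the columns are orthonormal
```
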
Problem 19

> f1 = function(x){(sqrt(x))/(sqrt(3-x)+sqrt(x))}
> f2 = function(x){1/(1+sqrt(1/tan(x)))}
> f3 = function(y){exp(-y) * y^6}
> f4 = function(x){x*abs(x)}
> f5 = function(x){abs(cos(x))}
> f6 = function(x){abs(sin(2*pi*x))}
> f7 = function(x){1/(sqrt(x+1)+sqrt(5*x+1))}
> f8 = function(x){atan((2*x-1)/(1+x-x^2))}
> f9 = function(x){(x^2-2)/(x^3 *sqrt(x^2-1))}
> f10 = function(x){1/(sqrt(1-x[1]^2)*sqrt(1- x[2]^2))}

> integrate(f1,1,2)$value
[1] 0.5
> integrate(f2, pi/6,pi/3)$value
[1] 0.2617994
> integrate(f3,0,Inf)$value
[1] 720
> integrate(f4,-1,1)$value
[1] 0
> integrate(f5,0,pi)$value
[1] 2
> integrate(f6,0,1)$value
[1] 0.6366198
> integrate(f7,0,3)$value
[1] 0.7445872
> round(integrate(f8,0,1)$value,8)
[1] 0
> round(integrate(f9,1,Inf)$value,8)
[1] 0
# install.packages("cubature")
> library(cubature) # provides adaptIntegrate()
> adaptIntegrate(f10, c(0, 0), c(1, 1))$integral
[1] 2.467393
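
Several of these integrals have known closed forms, which gives a quick check on the numerical output. A short sketch for three of them, redefining the integrands so the block runs on its own:

```r
f2 <- function(x) 1 / (1 + sqrt(1 / tan(x)))
f3 <- function(y) exp(-y) * y^6
f6 <- function(x) abs(sin(2 * pi * x))

# Each line prints the numerical value followed by the closed form
c(integrate(f2, pi / 6, pi / 3)$value, pi / 12)   # symmetry about pi/4 gives pi/12
c(integrate(f3, 0, Inf)$value, gamma(7))          # Gamma(7) = 6!
c(integrate(f6, 0, 1)$value, 2 / pi)              # two half-periods of area 1/pi each
```
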
Problem 20

> f1 <- expression(2*x^3 + x^2 + x)


> f2 <- expression((2*x - 1)/(1 + x - x^2))
> f3 <- expression(-3*x^5 + 8*sqrt(x) - 9.3)

> d1 <- D(f1, "x")


> d2 <- D(f2, "x")
> d3 <- D(f3, "x")

> eval(d1, list(x = 2))


[1] 29
> eval(d2, list(x = 3))
[1] 0.6
> eval(d2, list(x = 1))
[1] 3
> eval(d3, list(x = 0.5))
[1] 4.719354
Problem 21

> income = c(6.5, 10.5, 12.7, 13.8, 13.2, 11.4, 5.5, 8.0, 9.6, 9.1, 9.0, 8.5, 4.8, 7.3, 8.4, 8.7, 7.3, 7.4, 5.6,
6.8, 6.9, 6.8, 6.1, 6.5, 4.0, 6.4, 6.4, 8.0, 6.6, 6.2, 4.7, 7.4, 8.0, 8.3, 7.6, 6.7)

> mu_0 = 10
> alpha = 0.05

# two-sided t-test
> t_test = t.test(income, mu = mu_0)

> t_test$statistic
t
-5.826874
> t_test$p.value
[1] 1.305634e-06

# t.test() already reports the two-sided p-value, so it is compared with alpha directly
> if (t_test$p.value < alpha) {
+ print("Reject the null hypothesis. The mean income is significantly different from Rs. 10,000")
+ } else {
+ print("Fail to reject the null hypothesis. There is not enough evidence to conclude a difference in the mean income")
+ }
[1] "Reject the null hypothesis. The mean income is significantly different from Rs. 10,000"

# one-sided (left-tailed) t-test


> t_test = t.test(income, mu = mu_0, alternative = "less")

> t_test$statistic
t
-5.826874
> t_test$p.value
[1] 6.52817e-07

> if (t_test$p.value < alpha) {
+ print("Reject the null hypothesis. The mean income is significantly less than Rs. 10,000")
+ } else {
+ print("Fail to reject the null hypothesis. There is not enough evidence to conclude the mean income is less than Rs. 10,000")
+ }
[1] "Reject the null hypothesis. The mean income is significantly less than Rs. 10,000"
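
The statistic and both p-values can be reproduced by hand from the one-sample t formula, t = (mean - mu_0) / (s / sqrt(n)) with n - 1 degrees of freedom. A sketch with illustrative names (`t_manual`, `p_two_sided`, `p_left`):

```r
income <- c(6.5, 10.5, 12.7, 13.8, 13.2, 11.4, 5.5, 8.0, 9.6, 9.1, 9.0, 8.5,
            4.8, 7.3, 8.4, 8.7, 7.3, 7.4, 5.6, 6.8, 6.9, 6.8, 6.1, 6.5,
            4.0, 6.4, 6.4, 8.0, 6.6, 6.2, 4.7, 7.4, 8.0, 8.3, 7.6, 6.7)
mu_0 <- 10
n <- length(income)

t_manual    <- (mean(income) - mu_0) / (sd(income) / sqrt(n))
p_two_sided <- 2 * pt(-abs(t_manual), df = n - 1)  # both tails
p_left      <- pt(t_manual, df = n - 1)            # lower tail only

c(t = t_manual, two_sided = p_two_sided, left = p_left)
```

Because the observed t is negative, the left-tailed p-value is half the two-sided one, which matches the two p-values printed above.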
Problem 22

> df = rivers
> boxplot(df)

> q1 = quantile(df, 0.25)


> q3 = quantile(df, 0.75)
> ub = q3 + 1.5 * (q3-q1)
> lb = q1 - 1.5 * (q3-q1)
> df = ifelse(df >= lb & df <= ub, df, NA)
> df = ifelse(is.na(df), median(df, na.rm = TRUE), df)
> boxplot(df)

> library(MASS) # provides fitdistr()
> para = fitdistr(df, densfun = "gamma")


> ks.test(df, "pgamma", shape = para$estimate[1], rate = para$estimate[2])

Asymptotic one-sample Kolmogorov-Smirnov test

data: df
D = 0.11917, p-value = 0.03644
alternative hypothesis: two-sided
Warning message:
In ks.test.default(df, "pgamma", shape = para$estimate[1], rate = para$estimate[2]) :
ties should not be present for the Kolmogorov-Smirnov test
> qqplot(qgamma(ppoints(length(df)), shape = para$estimate[1], rate = para$estimate[2]), df,
+ main = "Q-Q Plot - Gamma Distribution", xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
> abline(0, 1, col = "blue")

> shapiro.test(df)

Shapiro-Wilk normality test

data: df
W = 0.88346, p-value = 3.924e-09

#b
> ks.test(rock$area, "pnorm")

Asymptotic one-sample Kolmogorov-Smirnov test

data: rock$area
D = 1, p-value < 2.2e-16
alternative hypothesis: two-sided

Warning message:
In ks.test.default(rock$area, "pnorm") :
ties should not be present for the Kolmogorov-Smirnov test
> para = fitdistr(rock$shape, "weibull")
Warning messages:
1: In densfun(x, parm[1], parm[2], ...) : NaNs produced
2: In densfun(x, parm[1], parm[2], ...) : NaNs produced
> ks.test(rock$shape, "pweibull", shape = para$estimate[1])

Asymptotic one-sample Kolmogorov-Smirnov test

data: rock$shape
D = 0.88681, p-value < 2.2e-16
alternative hypothesis: two-sided

Warning message:
In ks.test.default(rock$shape, "pweibull", shape = para$estimate[1]) :
ties should not be present for the Kolmogorov-Smirnov test
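
When `ks.test()` is given only the name of a distribution function, the remaining parameters take their defaults, so `"pnorm"` alone means the standard normal N(0, 1). A sketch of the variant that passes estimated parameters is given below. Note that estimating the parameters from the same data makes the reported p-value only approximate, and warnings are suppressed here only to keep the output short.

```r
# rock ships with R's datasets package
area <- rock$area

# Compare with a normal distribution that has the sample mean and sd,
# rather than with the default N(0, 1)
res <- suppressWarnings(
  ks.test(area, "pnorm", mean = mean(area), sd = sd(area))
)
res$statistic
res$p.value
```

The same idea applies to the Weibull case by passing both fitted parameters, for example `shape = para$estimate[1], scale = para$estimate[2]`.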
Problem 23

> df = mtcars
> boxplot(df, main="Before removing outliers")

> out = function(x) {


+ q1 = quantile(x, 0.25)
+ q3 = quantile(x, 0.75)
+ ub = q3 + 1.5 * (q3-q1)
+ lb = q1 - 1.5 * (q3-q1)

+ x = ifelse(x >= lb & x <= ub, x, NA)


+ x
+}

> y = c("hp","wt","qsec","carb")
> for (col in y) {
+ df[[col]] = out(df[[col]])
+}

> df = apply(df, 2, function(x) ifelse(is.na(x), median(x, na.rm = TRUE), x))


> boxplot(df, main="After removing outliers")
> X = df[,1]
> Y = df[,-1]
> model = lm(X ~ ., data = as.data.frame(Y))
> summary(model)

Call:
lm(formula = X ~ ., data = as.data.frame(Y))

Residuals:
Min 1Q Median 3Q Max
-4.7639 -1.6332 0.0468 1.3660 5.3056

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.722893 23.950216 0.823 0.4195
cyl 0.659105 1.077842 0.612 0.5474
disp -0.025542 0.012211 -2.092 0.0488 *
hp 0.002362 0.023532 0.100 0.9210
drat 1.856711 1.760976 1.054 0.3037
wt -3.833424 1.691290 -2.267 0.0341 *
qsec 0.455660 0.858629 0.531 0.6012
vs -1.068076 2.449393 -0.436 0.6672
am -1.425667 2.491880 -0.572 0.5733
gear 0.863971 1.640065 0.527 0.6039
carb -1.319005 0.630910 -2.091 0.0489 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.725 on 21 degrees of freedom


Multiple R-squared: 0.8615, Adjusted R-squared: 0.7956
F-statistic: 13.07 on 10 and 21 DF, p-value: 6.59e-07

The model is statistically significant overall (F = 13.07 on 10 and 21 DF, p-value = 6.59e-07), and disp, wt and
carb are individually significant at the 5% level. R² = 0.8615 indicates a good overall fit; because the model
uses ten predictors for 32 observations, the adjusted R² of 0.7956 is the better guide when judging its complexity.
Problem 24

> df = data.frame(
+ locationNo = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
+ y1 = c(35, 35, 40, 10, 6, 20, 35, 35, 35, 30),
+ y2 = c(3.5, 4.9, 30.0, 2.8, 2.7, 2.8, 4.6, 10.9, 8.0, 1.6),
+ y3 = c(2.80, 2.70, 4.38, 3.21, 2.73, 2.81, 2.88, 2.90, 3.28, 3.20)
+)
> df
locationNo y1 y2 y3
1 1 35 3.5 2.80
2 2 35 4.9 2.70
3 3 40 30.0 4.38
4 4 10 2.8 3.21
5 5 6 2.7 2.73
6 6 20 2.8 2.81
7 7 35 4.6 2.88
8 8 35 10.9 2.90
9 9 35 8.0 3.28
10 10 30 1.6 3.20

> cov(df[, c("y1", "y2", "y3")])


y1 y2 y3
y1 140.544444 49.680000 1.9412222
y2 49.680000 72.248444 3.6760889
y3 1.941222 3.676089 0.2501211

> cor(df[, c("y1", "y2", "y3")])


y1 y2 y3
y1 1.0000000 0.4930154 0.327411
y2 0.4930154 1.0000000 0.864762
y3 0.3274110 0.8647620 1.000000
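
The correlation matrix is the covariance matrix rescaled by the standard deviations, R = D^(-1/2) S D^(-1/2) with D = diag(S). The sketch below verifies this relationship on the same data, both by hand and with `cov2cor()`; the names `S`, `D_inv` and `R_manual` are illustrative.

```r
Y <- data.frame(
  y1 = c(35, 35, 40, 10, 6, 20, 35, 35, 35, 30),
  y2 = c(3.5, 4.9, 30.0, 2.8, 2.7, 2.8, 4.6, 10.9, 8.0, 1.6),
  y3 = c(2.80, 2.70, 4.38, 3.21, 2.73, 2.81, 2.88, 2.90, 3.28, 3.20)
)

S <- cov(Y)
D_inv <- diag(1 / sqrt(diag(S)))   # reciprocal standard deviations on the diagonal
R_manual <- D_inv %*% S %*% D_inv
dimnames(R_manual) <- dimnames(S)

all.equal(R_manual, cor(Y))        # rescaled covariance equals the correlation matrix
all.equal(cov2cor(S), cor(Y))      # built-in shortcut for the same rescaling
```
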
Problem 25

data("USArrests")
df = USArrests

pca = prcomp(df, scale. = TRUE)

v = pca$sdev^2
plot(v, type = "b", main = "Variances of Principal Components",xlab = "Principal Component", ylab =
"Variance")

plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2", main = "Principal Component Plot", pch = 16)

abline(h = 0, col = "gray", lty = 2)


abline(v = 0, col = "gray", lty = 2)

arrows(0, 0, pca$rotation[, 1], pca$rotation[, 2], angle = 15, length = 0.1, col = "red")

text(pca$rotation[, 1], pca$rotation[, 2], labels = colnames(df), pos = 3, col = "blue")
