R Sem3
Sept-Dec, 2023
Problem 1
I. Check the length of this vector and also convert its elements to 3 decimal places.
> x = c(2.345612, 1.256701, 2.19875, 0.123765, 4.132895, 7.312678, 0.123453,
+ 4.103254, 3.198341)
> x
[1] 2.345612 1.256701 2.198750 0.123765 4.132895 7.312678 0.123453 4.103254 3.198341
> length(x)
[1] 9
> round(x,3)
[1] 2.346 1.257 2.199 0.124 4.133 7.313 0.123 4.103 3.198
III. Extract all those entries whose values are greater than or equal to 3.
> x[x>=3]
[1] 4.132895 7.312678 4.103254 3.198341
IV. Check whether it’s a numeric vector, character vector or integer vector?
> class(x)
[1] "numeric"
V. Calculate the square root and natural log of each entry in a single command.
> sqrt(x)
[1] 1.5315391 1.1210268 1.4828183 0.3518025 2.0329523 2.7041964 0.3513588 2.0256490
1.7883906
> log(x)
[1] 0.8525463 0.2284900 0.7878890 -2.0893707 1.4189781 1.9896096 -2.0918948
1.4117803 1.1626322
VI. Convert this vector to character and integer vector, thereafter perform the checking by using
testing functions available in R.
> y = as.character(x)
> y
[1] "2.345612" "1.256701" "2.19875" "0.123765" "4.132895" "7.312678" "0.123453"
"4.103254" "3.198341"
> z = as.integer(x)
> z
[1] 2 1 2 0 4 7 0 4 3
> is.numeric(x)
[1] TRUE
> is.character(y)
[1] TRUE
> is.integer(z)
[1] TRUE
VIII. Using the element names, subset the vector to get the values of X3, X6 and X9.
> x[c("X3", "X6", "X9")]
X3 X6 X9
2.198750 7.312678 3.198341
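The step that attaches the names X1 to X9 is not shown in the transcript. A minimal sketch, assuming the names were assigned with names() and paste0() (the original step may have used a different call):

```r
# Vector from Problem 1
x <- c(2.345612, 1.256701, 2.19875, 0.123765, 4.132895, 7.312678,
       0.123453, 4.103254, 3.198341)

# Assumed step: label the elements X1, ..., X9
names(x) <- paste0("X", seq_along(x))

# Subset by name rather than by position
picked <- x[c("X3", "X6", "X9")]
print(picked)
```

Subsetting by name keeps working if the vector is later reordered, which positional indexing does not.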
Problem 2
Generate a vector of 20 elements starting with 2.5 by a jump of 0.3 and perform the following tasks:
III. Extract all the elements which are at the even places.
> even = x[seq(2,length(x), by=2)]
> even
[1] 2.8 3.4 4.0 4.6 5.2 5.8 6.4 7.0 7.6 8.2
IV. Extract the first five elements. Thereafter bind them row wise and column wise, and
store the results under the names row_mat and col_mat. Perform the testing for matrix and vector.
> x[1:5]
[1] 2.5 2.8 3.1 3.4 3.7
> row_mat = rbind(x[1:5])
> row_mat
[,1] [,2] [,3] [,4] [,5]
[1,] 2.5 2.8 3.1 3.4 3.7
> col_mat =cbind(x[1:5])
> col_mat
[,1]
[1,] 2.5
[2,] 2.8
[3,] 3.1
[4,] 3.4
[5,] 3.7
> is.vector(x)
[1] TRUE
> is.matrix(row_mat)
[1] TRUE
> is.matrix(col_mat)
[1] TRUE
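The command that creates x for Problem 2 is not shown. One way to obtain the values printed above, assuming seq() with length.out was used:

```r
# 20 elements starting at 2.5 with a step of 0.3 (assumed construction)
x <- seq(from = 2.5, by = 0.3, length.out = 20)

# Elements at even positions
even <- x[seq(2, length(x), by = 2)]

# rbind() and cbind() on a plain vector give a 1 x 5 and a 5 x 1 matrix
row_mat <- rbind(x[1:5])
col_mat <- cbind(x[1:5])
print(dim(row_mat))
print(dim(col_mat))
```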
Problem 3
Generate one vector of size n = 5 of 2’s and a second vector of size m = 4 of 5’s and perform the
following tasks:
III. Extract all the elements which are at the odd places.
> odd = value[c(TRUE,FALSE)] # method 1
> odd
[1] 2 2 2 5 5
> odd = value[seq(1,length(value), by=2)] # method 2
> odd
[1] 2 2 2 5 5
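The vector value is not defined in the transcript. A sketch that reproduces the output above, assuming value is the concatenation of the two generated vectors (the names twos and fives are illustrative):

```r
twos  <- rep(2, times = 5)   # five 2's
fives <- rep(5, times = 4)   # four 5's
value <- c(twos, fives)      # assumed definition of `value`

# Method 1: the logical index c(TRUE, FALSE) is recycled along the vector,
# so positions 1, 3, 5, ... are kept
odd_recycled <- value[c(TRUE, FALSE)]

# Method 2: explicit odd positions
odd_seq <- value[seq(1, length(value), by = 2)]

print(odd_recycled)
print(odd_seq)
```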
Problem 4
Define a vector of the complex numbers, whose elements are 1+2i, 4-1i, 2+1i, 2-2i, 5-1i, 3+6i and
perform the following tasks:
II. Calculate the modulus and complex conjugate of the above vector.
> Mod(x)
[1] 2.236068 4.123106 2.236068 2.828427 5.099020 6.708204
> Conj(x)
[1] 1-2i 4+1i 2-1i 2+2i 5+1i 3-6i
III. Check the type and class of this vector, also use str() function to get the full detail of this
vector.
> typeof(x)
[1] "complex"
> class(x)
[1] "complex"
> str(x)
cplx [1:6] 1+2i 4-1i 2+1i ...
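The definition of the complex vector is not shown in the transcript. A short sketch of how it could be entered, together with the modulus computed by hand as a cross-check:

```r
# Complex literals use the suffix i on the imaginary part
x <- c(1+2i, 4-1i, 2+1i, 2-2i, 5-1i, 3+6i)

# Modulus is sqrt(Re^2 + Im^2); Conj() flips the sign of the imaginary part
manual_mod <- sqrt(Re(x)^2 + Im(x)^2)
print(Mod(x))
print(manual_mod)
print(Conj(x))
```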
Problem 5
I. Print the number of rows, columns and the dimensions of the above matrix.
> x = c(5.07, 3.19, -2.32, 1.87, 1.21, 8.95, -2.04, 4.49, -3.32, -4.66, 9.93, 2.05, 1.18, 6.72,
+ -3.66, 8.77)
> mat_A = matrix(x,4,4, byrow=T)
> mat_A
[,1] [,2] [,3] [,4]
[1,] 5.07 3.19 -2.32 1.87
[2,] 1.21 8.95 -2.04 4.49
[3,] -3.32 -4.66 9.93 2.05
[4,] 1.18 6.72 -3.66 8.77
> dim(mat_A)
[1] 4 4
> nrow(mat_A)
[1] 4
> ncol(mat_A)
[1] 4
II. Calculate the transpose, inverse, row sums, column sums, row mean and column mean of
the above matrix.
> t(mat_A) # transpose
[,1] [,2] [,3] [,4]
[1,] 5.07 1.21 -3.32 1.18
[2,] 3.19 8.95 -4.66 6.72
[3,] -2.32 -2.04 9.93 -3.66
[4,] 1.87 4.49 2.05 8.77
> solve(mat_A) # inverse
[,1] [,2] [,3] [,4]
[1,] 0.23714256 -0.03800968 0.0332654713 -0.03888111
[2,] -0.02411353 0.18858693 -0.0005364163 -0.09128430
[3,] 0.06513140 0.09627464 0.1034974210 -0.08737042
[4,] 0.01375093 -0.09921181 0.0391279387 0.15274054
> rowSums(mat_A)
[1] 7.81 12.61 4.00 13.01
> colSums(mat_A)
[1] 4.14 14.20 1.91 17.18
> rowMeans(mat_A)
[1] 1.9525 3.1525 1.0000 3.2525
> colMeans(mat_A)
[1] 1.0350 3.5500 0.4775 4.2950
III. Use R function to print the diagonal of mat_A.
> diag(mat_A)
[1] 5.07 8.95 9.93 8.77
VIII. After step 7, define the column names also as C1, C2, C3, C4.
> colnames(mat_A) = c("C1", "C2","C3","C4")
> mat_A
C1 C2 C3 C4
R1 5.07 3.19 -2.32 1.87
R2 1.21 8.95 -2.04 4.49
R3 -3.32 -4.66 9.93 2.05
R4 1.18 6.72 -3.66 8.77
Problem 6
Generate two sequences, each of size 25, starting with 1 by a jump of 0.15 and 0.45 respectively. Using these two
sequences make two matrices of order 5×5, with names mat_B and mat_C.
I. Perform addition, subtraction and multiplication with the above two matrices.
> mat_B + mat_C
[,1] [,2] [,3] [,4] [,5]
[1,] 2.0 5.0 8.0 11.0 14.0
[2,] 2.6 5.6 8.6 11.6 14.6
[3,] 3.2 6.2 9.2 12.2 15.2
[4,] 3.8 6.8 9.8 12.8 15.8
[5,] 4.4 7.4 10.4 13.4 16.4
> mat_B - mat_C
[,1] [,2] [,3] [,4] [,5]
[1,] 0.0 -1.5 -3.0 -4.5 -6.0
[2,] -0.3 -1.8 -3.3 -4.8 -6.3
[3,] -0.6 -2.1 -3.6 -5.1 -6.6
[4,] -0.9 -2.4 -3.9 -5.4 -6.9
[5,] -1.2 -2.7 -4.2 -5.7 -7.2
> mat_B * mat_C
[,1] [,2] [,3] [,4] [,5]
[1,] 1.0000 5.6875 13.7500 25.1875 40.0000
[2,] 1.6675 7.0300 15.7675 27.8800 43.3675
[3,] 2.4700 8.5075 17.9200 30.7075 46.8700
[4,] 3.4075 10.1200 20.2075 33.6700 50.5075
[5,] 4.4800 11.8675 22.6300 36.7675 54.2800
II. Extract the 1st, 3rd and 5th column of the mat_B and store under the name mat_D.
> mat_D = mat_B[,c(1,3,5)]
> mat_D
[,1] [,2] [,3]
[1,] 1.00 2.50 4.00
[2,] 1.15 2.65 4.15
[3,] 1.30 2.80 4.30
[4,] 1.45 2.95 4.45
[5,] 1.60 3.10 4.60
IV. Combine mat_B and mat_C row wise and column wise.
> rbind(mat_B, mat_C)
[,1] [,2] [,3] [,4] [,5]
[1,] 1.00 1.75 2.50 3.25 4.00
[2,] 1.15 1.90 2.65 3.40 4.15
[3,] 1.30 2.05 2.80 3.55 4.30
[4,] 1.45 2.20 2.95 3.70 4.45
[5,] 1.60 2.35 3.10 3.85 4.60
[6,] 1.00 3.25 5.50 7.75 10.00
[7,] 1.45 3.70 5.95 8.20 10.45
[8,] 1.90 4.15 6.40 8.65 10.90
[9,] 2.35 4.60 6.85 9.10 11.35
[10,] 2.80 5.05 7.30 9.55 11.80
> cbind(mat_B,mat_C)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1.00 1.75 2.50 3.25 4.00 1.00 3.25 5.50 7.75 10.00
[2,] 1.15 1.90 2.65 3.40 4.15 1.45 3.70 5.95 8.20 10.45
[3,] 1.30 2.05 2.80 3.55 4.30 1.90 4.15 6.40 8.65 10.90
[4,] 1.45 2.20 2.95 3.70 4.45 2.35 4.60 6.85 9.10 11.35
[5,] 1.60 2.35 3.10 3.85 4.60 2.80 5.05 7.30 9.55 11.80
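The construction of mat_B and mat_C is not shown. The printed values are consistent with filling each matrix column by column from the two sequences, so a sketch under that assumption is given here. Note that `*` multiplies element by element, while `%*%` gives the matrix product; the task wording allows either reading.

```r
# Assumed construction: column-wise fill (the default for matrix())
mat_B <- matrix(seq(1, by = 0.15, length.out = 25), nrow = 5, ncol = 5)
mat_C <- matrix(seq(1, by = 0.45, length.out = 25), nrow = 5, ncol = 5)

elementwise <- mat_B * mat_C     # Hadamard (entry-by-entry) product
matrix_prod <- mat_B %*% mat_C   # usual matrix product

print(elementwise)
print(matrix_prod)
```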
Problem 7
Tree=c(rep(1,7),rep(2,7),rep(3,7),rep(4,7),rep(5,7))
Age=rep(c(118,484,664,1004,121,1372,1582),5)
Circumferences=c(30,58,87,115,120,142,145,33,69,111,156,172,203,203,30,51,75,108,115,139,140,
32,62,112,167,179,209,214,30,49,61,125,142,174,177)
df =data.frame(Tree,Age,Circumferences)
df
for (i in 1:5)
{
print(i)
print("Minimum:")
print(min(df$Circumferences[Tree==i]))
print("Maximum:")
print(max(df$Circumferences[Tree==i]))
print("Mean:")
print(mean(df$Circumferences[Tree==i]))
print("Standard Deviation:")
print(sd(df$Circumferences[Tree==i]))
print('______________________')
}
m3 = mean(df$Circumferences[Tree==3])
m3
m5=mean(df$Circumferences[Tree==5])
m5
m3>m5
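As an alternative to the explicit loop, the same per-tree summaries can be collected in one table. This is a sketch using split() and sapply(), not the approach used above; it reuses the circumference data exactly as entered in this problem.

```r
Tree <- rep(1:5, each = 7)
Circumferences <- c(30, 58, 87, 115, 120, 142, 145,
                    33, 69, 111, 156, 172, 203, 203,
                    30, 51, 75, 108, 115, 139, 140,
                    32, 62, 112, 167, 179, 209, 214,
                    30, 49, 61, 125, 142, 174, 177)
df <- data.frame(Tree, Circumferences)

# One column per tree, one row per statistic
stats <- sapply(split(df$Circumferences, df$Tree), function(v) {
  c(min = min(v), max = max(v), mean = mean(v), sd = sd(v))
})
print(round(stats, 3))

# Compare the mean circumference of trees 3 and 5
print(stats["mean", "3"] > stats["mean", "5"])
```

Indexing through df$Tree rather than the free-standing Tree vector keeps the computation correct even if the data frame is later filtered or reordered.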
Problem 8
Plot the density function of the standard normal distribution in the interval (-6, 6) and add the points
-6(1)6 (that is, -6, -5, ..., 6) to the obtained density curve. Also, plot the curve of the distribution function.
> plot(x, d, type = 'l', col = 'skyblue', lwd = 2, ylim = c(0, 1), ylab = 'Density',
+ main = 'Standard Normal Distribution Density and Distribution Functions')
> legend('topleft', legend = c('Density Function', 'Distribution Function', 'Points'),
+ col = c('skyblue', 'coral', 'blue'), lwd = 2, pch = c(NA, NA, 16), bty = 'n')
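The transcript omits the definitions of x and d and the calls that add the points and the distribution curve. A self-contained sketch of the whole figure follows; the grid size of 601 and the names p and pts are assumptions made for illustration.

```r
# Fine grid on (-6, 6) for smooth curves (grid size is an arbitrary choice)
x <- seq(-6, 6, length.out = 601)
d <- dnorm(x)   # standard normal density
p <- pnorm(x)   # standard normal distribution function

# The points -6(1)6, i.e. -6, -5, ..., 6
pts <- seq(-6, 6, by = 1)

plot(x, d, type = "l", col = "skyblue", lwd = 2, ylim = c(0, 1),
     xlab = "x", ylab = "Density",
     main = "Standard Normal Density and Distribution Functions")
lines(x, p, col = "coral", lwd = 2)                 # distribution function
points(pts, dnorm(pts), col = "blue", pch = 16)     # points on the density
legend("topleft",
       legend = c("Density Function", "Distribution Function", "Points"),
       col = c("skyblue", "coral", "blue"),
       lty = c(1, 1, NA), lwd = c(2, 2, NA), pch = c(NA, NA, 16), bty = "n")
```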
Problem 9
Draw random samples of size 25 from each of the distributions given below. Also, plot the pdf
(probability density function) and cdf (cumulative distribution function) of the following
continuous/discrete distributions.
I. Binomial with size=60, prob=0.3, 0.5, 0.9.
II. Poisson λ= 2, 5, 10
III. Gamma distribution
IV. Standard Cauchy distribution
V. Weibull distribution
> library(MASS)
> library(ggplot2)
# binomial
> n = 60
> p = c(0.3,0.5,0.9)
> for (prob in p){
+ x_binom = rbinom(25, size=n, prob=prob)
+ hist(x_binom, probability = TRUE, col = "lightblue",
+ main = paste("Binomial (n = 60, p = ", prob, ")", sep = ""), xlab = "Values", ylab = "Density")
+ lines(density(x_binom), col = "darkred", lwd = 2)
+ lines(ecdf(x_binom), col = "blue", lwd = 2)
+ }
# poisson
> l = c(2, 5, 10)
> for (lambda in l) {
+ x_pois <- rpois(25, lambda = lambda)
+ hist(x_pois, probability = TRUE, col = "lightgreen",
+ main = paste("Poisson (lambda = ", lambda, ")", sep = ""), xlab = "Values", ylab = "Density")
+ lines(density(x_pois), col = "darkred", lwd = 2)
+ lines(ecdf(x_pois), col = "blue", lwd = 2)
+ }
# gamma
> shape = 2
> scale = 1
> x_gamma <- rgamma(25, shape = shape, scale = scale)
> hist(x_gamma, probability = TRUE, col = "lightcoral", main = "Gamma (shape = 2, scale = 1)",
+ xlab = "Values", ylab = "Density")
> lines(density(x_gamma), col = "darkred", lwd = 2)
> lines(ecdf(x_gamma), col = "blue", lwd = 2)
# cauchy
> x_cauchy = rcauchy(25)
> hist(x_cauchy, probability = TRUE, col = "lightgoldenrod", main = "Standard Cauchy",
+ xlab = "Values", ylab = "Density")
> lines(density(x_cauchy), col = "darkred", lwd = 2)
> lines(ecdf(x_cauchy), col = "blue", lwd = 2)
# weibull
> shape = 2
> scale = 1
> x_weib = rweibull(25, shape = shape, scale = scale)
> hist(x_weib, probability = TRUE, col = "lavenderblush", main = "Weibull (shape=2, scale=1)",
+ xlab = "Values", ylab = "Density")
> lines(density(x_weib), col = "darkred", lwd = 2)
> lines(ecdf(x_weib), col = "blue", lwd = 2)
> mtext("Probability Density Function (PDF) and Cumulative Distribution Function (CDF)",
+ outer = TRUE, line = 0.5, cex = 1)
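The histograms above estimate the density from only 25 draws. The exact pmf and cdf can also be plotted directly from the d- and p-functions. The sketch below does this for the binomial case only; the panel layout and the name k are illustrative choices. The other distributions follow the same pattern with dpois/ppois, dgamma/pgamma, dcauchy/pcauchy and dweibull/pweibull.

```r
n <- 60
p <- c(0.3, 0.5, 0.9)
k <- 0:n   # support of the binomial distribution

old_par <- par(mfrow = c(length(p), 2))   # one row of panels per value of p
for (prob in p) {
  # Exact probability mass function as vertical spikes
  plot(k, dbinom(k, size = n, prob = prob), type = "h",
       main = paste0("Binomial pmf (n = 60, p = ", prob, ")"),
       xlab = "k", ylab = "P(X = k)")
  # Exact cumulative distribution function as a step curve
  plot(k, pbinom(k, size = n, prob = prob), type = "s",
       main = paste0("Binomial cdf (n = 60, p = ", prob, ")"),
       xlab = "k", ylab = "P(X <= k)")
}
par(old_par)
```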
Problem 10
Create a data frame using the following data for some 20 vegetables at a supermarket sold in a
month.
The names of the vegetables (in order) are as follows: Eggplant, Pea, Cucumber, Potato, Pumpkin,
Lettuce, Tomato, Sweet potato, Mushroom, Green bean, Corn, Cauliflower, Beetroot, Bell pepper,
Broccoli, Celery, Cabbage, Carrot, Onion, Lady finger. Perform the following:
> df = data.frame(
+ Sales_Kg = c(200, 100, 100, 400, 500, 800, 150, 200, 120, 400, 500, 185, 200, 150, 240,
+ 500, 250, 170, 350, 180),
+ Price_Kg = c(40, 80, 40, 60, 70, 90, 40, 90, 30, 30, 60, 60, 100, 65, 45, 60, 65, 35, 85, 60),
+ Color_Flag = c(2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1) )
> df
Sales_Kg Price_Kg Color_Flag
1 200 40 2
2 100 80 1
3 100 40 1
4 400 60 2
5 500 70 2
6 800 90 1
7 150 40 2
8 200 90 2
9 120 30 2
10 400 30 1
11 500 60 1
12 185 60 1
13 200 100 2
14 150 65 2
15 240 45 1
16 500 60 1
17 250 65 1
18 170 35 2
19 350 85 2
20 180 60 1
I. Add these names, as the row names of the above data frame.
> Vegetable = c("Eggplant", "Pea", "Cucumber", "Potato", "Pumpkin", "Lettuce", "Tomato",
+ "Sweet potato", "Mushroom", "Green bean", "Corn", "Cauliflower", "Beetroot",
+ "Bell pepper", "Broccoli", "Celery", "Cabbage", "Carrot", "Onion", "Lady finger")
> row.names(df) = Vegetable
> df
II. Calculate the revenue for each vegetable and its proportion in the total revenue.
> df$Revenue = df$Sales_Kg * df$Price_Kg
> tot_rev = sum(df$Revenue)
> df$Proportion = round((df$Revenue / tot_rev),2)
> df
III. Add the label ‘green’ where 1 is written and the label ‘other.color’ where 2 is written.
> df$Label = ifelse(df$Color_Flag == 1, "green", "other.color")
> df
IV. Calculate the difference between the revenues of the ‘green’ vegetables and the ‘other.color’ vegetables.
> green_rev = sum(df$Revenue[df$Label == "green"])
> other_rev = sum(df$Revenue[df$Label == "other.color"])
> diff = green_rev - other_rev
> diff
[1] 44900
Problem 11
> murder=data$Murder
> murder
> assault=data$Assault
> assault
> rape=data$Rape
> rape
> s1=s2=s3=0
> for(i in 1:nrow(data))
+ {
+ s1=s1+murder[i]
+ s2=s2+assault[i]
+ s3=s3+rape[i]
+ }
> s1=s2=s3=0
> i=1
> while(i<=nrow(data))
+ {
+ s1=s1+murder[i]
+ s2=s2+assault[i]
+ s3=s3+rape[i]
+ i=i+1
+ }
> s1=s2=s3=0
> i=1
> repeat
+ {
+ s1=s1+murder[i]
+ s2=s2+assault[i]
+ s3=s3+rape[i]
+ i=i+1
+ if(i>length(murder))
+ {
+ break
+ }
+ }
> sum(murder)
> sum(assault)
> sum(rape)
> subset=data.frame()
> for(i in 1:nrow(data))
+ {
+ if(murder[i]>22)
+ {
+ next
+ }
+ else if(murder[i] <10)
+ {
+ subset=rbind(subset,data[i,])
+ }
+ }
> subset
> row_sums=apply(data,1,sum)
> row_sums
> col_sums=apply(data,2,sum)
> col_sums
> row_avg=apply(data,1,mean)
> row_avg
> col_avg=apply(data,2,mean)
> col_avg
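The loops above accumulate column totals and pick out rows one at a time. The same results can be obtained with vectorised calls. This is a sketch that assumes data is the built-in USArrests data set; the transcript does not show where data comes from, although the column names Murder, Assault and Rape match it.

```r
data <- USArrests   # assumption: the transcript does not define `data`

# Column totals in one call instead of for / while / repeat loops
totals <- colSums(data[, c("Murder", "Assault", "Rape")])
print(totals)

# Rows with Murder < 10; rows with Murder > 22 are excluded automatically,
# which matches the next / else-if logic of the loop
low_murder <- data[data$Murder < 10, ]
print(head(low_murder))

# rowSums / colMeans are the direct equivalents of apply(data, 1, sum)
# and apply(data, 2, mean)
row_totals <- rowSums(data)
col_avg <- colMeans(data)
```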
Problem 12
x = rgamma(100,shape=3,scale=2)
s=sum(x)
true_shape=3
true_scale=2
n=100
mle_scale=s/(n*true_shape)
mle_scale
# MLE of scale parameter: 1.917102
scale_val=seq(from=0.1,to=10,by=0.1)
log_likelihood=((true_shape-1)*sum(log(x)))-(s/scale_val)-(n*true_shape*log(scale_val))-
(n*lgamma(true_shape))
plot(scale_val,log_likelihood,type="l",xlab="Scale Parameter",ylab="Log Likelihood",
main="Log Likelihood for Scale Parameter")
abline(v=mle_scale,col="red",lty=2)
x=rexp(n=100,rate = 2.5)
s=sum(x)
true_lambda=2.5
n=100
mle_lambda=n/s
mle_lambda
# MLE of lambda: 2.461891
lambda_val=seq(0.1,10,by = 0.1)
log_likelihood=(n*log(lambda_val))-(lambda_val*s)
plot(lambda_val,log_likelihood,type="l",xlab="Lambda",ylab="Log Likelihood",
main="Log Likelihood for Lambda")
abline(v=mle_lambda,col="red",lty=2)
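The closed-form estimates used in this problem can be cross-checked by maximising the log-likelihood numerically. The sketch below does this with optimize(); the seed, the search interval and the function names are assumptions made for illustration, so the numbers will differ from the commented values above.

```r
set.seed(1)   # arbitrary seed so the sketch is reproducible
n <- 100

# Gamma with known shape k: the scale MLE is sum(x) / (n * k)
k <- 3
x_gamma <- rgamma(n, shape = k, scale = 2)
loglik_scale <- function(theta) {
  (k - 1) * sum(log(x_gamma)) - sum(x_gamma) / theta -
    n * k * log(theta) - n * lgamma(k)
}
scale_closed  <- sum(x_gamma) / (n * k)
scale_numeric <- optimize(loglik_scale, interval = c(0.1, 10),
                          maximum = TRUE)$maximum

# Exponential: the rate MLE is n / sum(x)
x_exp <- rexp(n, rate = 2.5)
loglik_rate <- function(lambda) n * log(lambda) - lambda * sum(x_exp)
rate_closed  <- n / sum(x_exp)
rate_numeric <- optimize(loglik_rate, interval = c(0.1, 10),
                         maximum = TRUE)$maximum

print(c(scale_closed = scale_closed, scale_numeric = scale_numeric))
print(c(rate_closed = rate_closed, rate_numeric = rate_numeric))
```

The two values in each pair should agree up to the tolerance of optimize(), which is a quick sanity check on both the formula and the plotted curve.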
Problem 13
x = runif(50)
y = rnorm(100, mean = 0, sd = 1)
z = rexp(50, 0.1)
lst = list(x = x, y = y, z = z)
f = function(n)
{
result = list(
average = mean(n),
sum = sum(n),
range = range(n),
median = median(n),
minimum = min(n),
maximum = max(n),
log_e = log(n),
log_10 = log10(n)
)
return(result)
}
lapply(lst[c("x", "z")], f)
sapply(lst[c("x", "z")], f)
y = y[y > 0]
lst = list(x = x, y = y, z = z)
lapply(lst, f)
sapply(lst, f)
Problem 14
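# Newton-Raphson for exp(-v) - 5v - 2 = 0
# tol and max_iter are assumed values; the original stopping rule is not shown
newton_raphson = function(initial_guess, tol = 1e-8, max_iter = 100) {
  v1 = initial_guess
  iterations = 0
  repeat {
    v2 = v1 - (exp(-v1) - 5 * v1 - 2) / (-exp(-v1) - 5)
    d = v2 - v1
    iterations = iterations + 1
    v1 = v2
    if (abs(d) < tol || iterations >= max_iter) break
  }
  list(root = v2, iterations = iterations)
}
newton_raphson(initial_guess = 0)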
Problem 15
> x = c(3,5,7,9,2)
> W = diag(x)
> W
[,1] [,2] [,3] [,4] [,5]
[1,] 3 0 0 0 0
[2,] 0 5 0 0 0
[3,] 0 0 7 0 0
[4,] 0 0 0 9 0
[5,] 0 0 0 0 2
> diag(4)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 1
> matrix(1,3,3)
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 1
[3,] 1 1 1
Problem 16
> p = matrix(c(0.5,0.5,0.0,0.0,0.0,
+ 0.5,0.5,0.0,0.0,0.0,
+ 0.0,0.0,0.3333333,0.3333333,0.0,
+ 0.0,0.0,0.6666667,0.6666667,0.0,
+ 0.2,0.2,0.2,0.2,0.2),5,5,byrow=TRUE)
> p_100 = p
> for (i in 1:99) {
+ p_100 = p_100 %*% p
+ }
> p_100
[,1] [,2] [,3] [,4] [,5]
[1,] 0.50 0.50 0.0000000 0.0000000 0.000000e+00
[2,] 0.50 0.50 0.0000000 0.0000000 0.000000e+00
[3,] 0.00 0.00 0.3333333 0.3333333 0.000000e+00
[4,] 0.00 0.00 0.6666667 0.6666667 0.000000e+00
[5,] 0.25 0.25 0.2500000 0.2500000 1.267651e-70
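The loop above needs 99 matrix multiplications. For large powers, repeated squaring needs far fewer. The helper mat_power below is a hypothetical name introduced for this sketch and is not part of base R; the matrix is copied from the transcript unchanged.

```r
# k-th power of a square matrix by repeated squaring (k a non-negative integer)
mat_power <- function(m, k) {
  result <- diag(nrow(m))            # identity matrix
  while (k > 0) {
    if (k %% 2 == 1) result <- result %*% m
    m <- m %*% m
    k <- k %/% 2
  }
  result
}

p <- matrix(c(0.5, 0.5, 0.0,       0.0,       0.0,
              0.5, 0.5, 0.0,       0.0,       0.0,
              0.0, 0.0, 0.3333333, 0.3333333, 0.0,
              0.0, 0.0, 0.6666667, 0.6666667, 0.0,
              0.2, 0.2, 0.2,       0.2,       0.2), 5, 5, byrow = TRUE)

p_100 <- mat_power(p, 100)
print(p_100)
```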
Problem 17
> library(MASS)
> A = matrix(c(3, 1, 5, 2,
+ 1,11, 7, 6,
+ 3, 7, 4, 1,
+ 6, 3, 7, 1), nrow = 4, byrow = TRUE)
> A_plus = ginv(A)
$Property_2
[1] TRUE
$Property_3
[1] TRUE
$Property_4
[1] TRUE
$Property_B
[1] TRUE
$Property_C
[1] TRUE
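The code that produced the TRUE values above is not shown. As an illustration, the four Moore-Penrose conditions can be checked as follows. The list names cond_1 to cond_4 and the helper near() are assumptions made for this sketch and are not the property labels used in the output above.

```r
library(MASS)   # provides ginv()

A <- matrix(c(3,  1, 5, 2,
              1, 11, 7, 6,
              3,  7, 4, 1,
              6,  3, 7, 1), nrow = 4, byrow = TRUE)
A_plus <- ginv(A)

# Compare up to floating-point tolerance rather than exactly
near <- function(P, Q) isTRUE(all.equal(P, Q))

checks <- list(
  cond_1 = near(A %*% A_plus %*% A, A),            # A A+ A = A
  cond_2 = near(A_plus %*% A %*% A_plus, A_plus),  # A+ A A+ = A+
  cond_3 = near(t(A %*% A_plus), A %*% A_plus),    # A A+ is symmetric
  cond_4 = near(t(A_plus %*% A), A_plus %*% A)     # A+ A is symmetric
)
print(checks)
```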
Problem 18
> a1 = c(1,1,0)
> a2 = c(1,0,1)
> a3 = c(0,1,1)
> b1 = a1
> b2 = a2 - ((sum(a2 * b1)) * b1) / (sum(b1^2))
> b3 = a3 - ((sum(a3 * b1)) * b1) / sum(b1^2) - ((sum(a3 * b2)) * b2) / sum(b2^2)
[[1]]
[1] 1 1 0
[[2]]
[1] 0.5 -0.5 1.0
[[3]]
[1] -0.6666667 0.6666667 0.6666667
[[1]]
[1] 0.7071068 0.7071068 0.0000000
[[2]]
[1] 0.4082483 -0.4082483 0.8164966
[[3]]
[1] -0.5773503 0.5773503 0.5773503
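In the Gram-Schmidt procedure each new vector has its projection onto every earlier orthogonal vector subtracted, so the resulting vectors must have pairwise zero dot products. A general sketch follows; gram_schmidt is a hypothetical helper written for this illustration.

```r
# Classical Gram-Schmidt on a list of linearly independent vectors
gram_schmidt <- function(vectors) {
  basis <- list()
  for (a in vectors) {
    b <- a
    for (prev in basis) {
      # subtract the projection of a onto each earlier basis vector
      b <- b - (sum(a * prev) / sum(prev^2)) * prev
    }
    basis[[length(basis) + 1]] <- b
  }
  basis
}

a <- list(c(1, 1, 0), c(1, 0, 1), c(0, 1, 1))
b <- gram_schmidt(a)                             # orthogonal vectors
e <- lapply(b, function(v) v / sqrt(sum(v^2)))   # orthonormal vectors

print(b)
print(e)

# Orthogonality check: all pairwise dot products should be (numerically) zero
print(c(sum(b[[1]] * b[[2]]), sum(b[[1]] * b[[3]]), sum(b[[2]] * b[[3]])))
```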
Problem 19
> f1 = function(x){(sqrt(x))/(sqrt(3-x)+sqrt(x))}
> f2 = function(x){1/(1+sqrt(1/tan(x)))}
> f3 = function(y){exp(-y) * y^6}
> f4 = function(x){x*abs(x)}
> f5 = function(x){abs(cos(x))}
> f6 = function(x){abs(sin(2*pi*x))}
> f7 = function(x){1/(sqrt(x+1)+sqrt(5*x+1))}
> f8 = function(x){atan((2*x-1)/(1+x-x^2))}
> f9 = function(x){(x^2-2)/(x^3 *sqrt(x^2-1))}
> f10 = function(x){1/(sqrt(1-x[1]^2)*sqrt(1- x[2]^2))}
> integrate(f1,1,2)$value
[1] 0.5
> integrate(f2, pi/6,pi/3)$value
[1] 0.2617994
> integrate(f3,0,Inf)$value
[1] 720
> integrate(f4,-1,1)$value
[1] 0
> integrate(f5,0,pi)$value
[1] 2
> integrate(f6,0,1)$value
[1] 0.6366198
> integrate(f7,0,3)$value
[1] 0.7445872
> round(integrate(f8,0,1)$value,8)
[1] 0
> round(integrate(f9,1,Inf)$value,8)
[1] 0
# install.packages("cubature")
> library(cubature)
> adaptIntegrate(f10, c(0, 0), c(1, 1))$integral
[1] 2.467393
Problem 20
> income = c(6.5, 10.5, 12.7, 13.8, 13.2, 11.4, 5.5, 8.0, 9.6, 9.1, 9.0, 8.5, 4.8, 7.3, 8.4, 8.7, 7.3, 7.4, 5.6,
6.8, 6.9, 6.8, 6.1, 6.5, 4.0, 6.4, 6.4, 8.0, 6.6, 6.2, 4.7, 7.4, 8.0, 8.3, 7.6, 6.7)
> mu_0 = 10
> alpha = 0.05
# two-sided t-test
> t_test = t.test(income, mu = mu_0)
> t_test$statistic
t
-5.826874
> t_test$p.value
[1] 1.305634e-06
# one-sided t-test (alternative: mean less than mu_0)
> t_test = t.test(income, mu = mu_0, alternative = "less")
> t_test$statistic
t
-5.826874
> t_test$p.value
[1] 6.52817e-07
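The t statistic reported by t.test() is (sample mean - mu_0) / (s / sqrt(n)), and the one-sided p-value for a negative statistic is half the two-sided one. A sketch computing both by hand from the same income data; the names t_stat, p_two_sided and p_lower are illustrative.

```r
income <- c(6.5, 10.5, 12.7, 13.8, 13.2, 11.4, 5.5, 8.0, 9.6, 9.1, 9.0, 8.5,
            4.8, 7.3, 8.4, 8.7, 7.3, 7.4, 5.6, 6.8, 6.9, 6.8, 6.1, 6.5,
            4.0, 6.4, 6.4, 8.0, 6.6, 6.2, 4.7, 7.4, 8.0, 8.3, 7.6, 6.7)
mu_0 <- 10
n <- length(income)

# Test statistic from its definition
t_stat <- (mean(income) - mu_0) / (sd(income) / sqrt(n))

# p-values from the t distribution with n - 1 degrees of freedom
p_two_sided <- 2 * pt(-abs(t_stat), df = n - 1)
p_lower     <- pt(t_stat, df = n - 1)   # alternative: mean < mu_0

print(c(t = t_stat, two_sided = p_two_sided, lower = p_lower))
```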
> df = rivers
> boxplot(df)
> ks.test(df, "pgamma", shape = para$estimate[1], rate = para$estimate[2])
data: df
D = 0.11917, p-value = 0.03644
alternative hypothesis: two-sided
Warning message:
In ks.test.default(df, "pgamma", shape = para$estimate[1], rate = para$estimate[2]) :
ties should not be present for the Kolmogorov-Smirnov test
> qqplot(qgamma(ppoints(length(df)), shape = para$estimate[1], rate = para$estimate[2]), df,
+ main = "Q-Q Plot - Gamma Distribution", xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
> abline(0, 1, col = "blue")
> shapiro.test(df)
data: df
W = 0.88346, p-value = 3.924e-09
#b
> ks.test(rock$area, "pnorm")
data: rock$area
D = 1, p-value < 2.2e-16
alternative hypothesis: two-sided
Warning message:
In ks.test.default(rock$area, "pnorm") :
ties should not be present for the Kolmogorov-Smirnov test
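Calling ks.test(x, "pnorm") without further arguments compares the data with the standard normal N(0, 1), which explains D = 1 for values in the thousands. A sketch that passes the sample mean and standard deviation instead is shown below. Because the parameters are estimated from the same data, the resulting p-value is only approximate; suppressWarnings() is used here solely to silence the ties warning.

```r
area <- rock$area   # built-in data set used above

# Comparison with N(0, 1): every observation lies far in the upper tail
res_default <- suppressWarnings(ks.test(area, "pnorm"))

# Comparison with a normal distribution matched to the sample
res_fitted <- suppressWarnings(
  ks.test(area, "pnorm", mean = mean(area), sd = sd(area))
)

print(res_default$statistic)
print(res_fitted)
```

The same consideration applies to the Weibull test: pweibull() uses scale = 1 unless a scale argument is supplied.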
> para = fitdistr(rock$shape, "weibull")
Warning messages:
1: In densfun(x, parm[1], parm[2], ...) : NaNs produced
2: In densfun(x, parm[1], parm[2], ...) : NaNs produced
> ks.test(rock$shape, "pweibull", shape = para$estimate[1])
data: rock$shape
D = 0.88681, p-value < 2.2e-16
alternative hypothesis: two-sided
Warning message:
In ks.test.default(rock$shape, "pweibull", shape = para$estimate[1]) :
ties should not be present for the Kolmogorov-Smirnov test
Problem 23
> df = mtcars
> boxplot(df, main="Before removing outliers")
> y = c("hp","wt","qsec","carb")
> for (col in y) {
+ df[[col]] = out(df[[col]])
+ }
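The function out() called in the loop is not defined in the transcript. Purely as an illustration, the sketch below shows one hypothetical definition based on the 1.5 × IQR rule that replaces outliers with NA; the original may instead have capped or imputed the values, so results from this sketch need not match the regression output that follows.

```r
# Hypothetical stand-in for the undefined out(): values outside
# [Q1 - k * IQR, Q3 + k * IQR] are replaced by NA
out <- function(v, k = 1.5) {
  q <- quantile(v, probs = c(0.25, 0.75), na.rm = TRUE)
  fence <- k * (q[2] - q[1])
  v[v < q[1] - fence | v > q[2] + fence] <- NA
  v
}

df <- mtcars
for (col in c("hp", "wt", "qsec", "carb")) {
  df[[col]] <- out(df[[col]])
}

# Number of values flagged per column
print(colSums(is.na(df[, c("hp", "wt", "qsec", "carb")])))
```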
Call:
lm(formula = X ~ ., data = as.data.frame(Y))
Residuals:
Min 1Q Median 3Q Max
-4.7639 -1.6332 0.0468 1.3660 5.3056
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.722893 23.950216 0.823 0.4195
cyl 0.659105 1.077842 0.612 0.5474
disp -0.025542 0.012211 -2.092 0.0488 *
hp 0.002362 0.023532 0.100 0.9210
drat 1.856711 1.760976 1.054 0.3037
wt -3.833424 1.691290 -2.267 0.0341 *
qsec 0.455660 0.858629 0.531 0.6012
vs -1.068076 2.449393 -0.436 0.6672
am -1.425667 2.491880 -0.572 0.5733
gear 0.863971 1.640065 0.527 0.6039
carb -1.319005 0.630910 -2.091 0.0489 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The model is statistically significant overall, and three variables (disp, wt, carb) are individually
significant at the 5% level. The 𝑅² value indicates a good overall fit, but with ten predictors the
adjusted 𝑅² is the better measure when judging the model's complexity.
Problem 24
> df = data.frame(
+ locationNo = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
+ y1 = c(35, 35, 40, 10, 6, 20, 35, 35, 35, 30),
+ y2 = c(3.5, 4.9, 30.0, 2.8, 2.7, 2.8, 4.6, 10.9, 8.0, 1.6),
+ y3 = c(2.80, 2.70, 4.38, 3.21, 2.73, 2.81, 2.88, 2.90, 3.28, 3.20)
+)
> df
locationNo y1 y2 y3
1 1 35 3.5 2.80
2 2 35 4.9 2.70
3 3 40 30.0 4.38
4 4 10 2.8 3.21
5 5 6 2.7 2.73
6 6 20 2.8 2.81
7 7 35 4.6 2.88
8 8 35 10.9 2.90
9 9 35 8.0 3.28
10 10 30 1.6 3.20
data("USArrests")
df = USArrests
v = pca$sdev^2
plot(v, type = "b", main = "Variances of Principal Components",xlab = "Principal Component", ylab =
"Variance")
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2", main = "Principal Component Plot", pch = 16)
arrows(0, 0, pca$rotation[, 1], pca$rotation[, 2], angle = 15, length = 0.1, col = "red")
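The object pca is used above without being created. A minimal sketch, assuming it came from prcomp() on the standardised USArrests variables; the scaling choice and the name prop_var are assumptions.

```r
data("USArrests")
df <- USArrests

# Assumed step: principal components of the standardised variables
pca <- prcomp(df, scale. = TRUE)

v <- pca$sdev^2          # variance of each principal component
prop_var <- v / sum(v)   # proportion of total variance explained

print(round(v, 4))
print(round(prop_var, 4))
print(pca$rotation)      # loadings used for the arrows in the plot
```

With standardised variables the component variances add up to the number of variables, which makes the scree plot easy to read.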