
STA2050 Assignment 2

Mark Bilahi M’rabu

2023-11-28

Import the Necessary Packages

library(imputeTS)

## Warning: package ’imputeTS’ was built under R version 4.3.2

## Registered S3 method overwritten by ’quantmod’:


## method from
## as.zoo.data.frame zoo

library(VIM)

## Warning: package ’VIM’ was built under R version 4.3.2

## Loading required package: colorspace

## Loading required package: grid

## VIM is ready to use.

## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues

##
## Attaching package: ’VIM’

## The following object is masked from ’package:datasets’:


##
## sleep

library(nnet)
library(readxl)
library(csv)
library(dplyr)

##
## Attaching package: ’dplyr’

## The following objects are masked from ’package:stats’:
##
## filter, lag

## The following objects are masked from ’package:base’:


##
## intersect, setdiff, setequal, union

library(tidyverse)

## Warning: package ’tidyverse’ was built under R version 4.3.2

## -- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --


## v forcats 1.0.0 v readr 2.1.4
## v ggplot2 3.4.3 v stringr 1.5.0
## v lubridate 1.9.3 v tibble 3.2.1
## v purrr 1.0.2 v tidyr 1.3.0

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --


## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(finalfit)

## Warning: package ’finalfit’ was built under R version 4.3.2

library(lubridate)
library(tidyr)
library(ggplot2)
library(psych)

## Warning: package ’psych’ was built under R version 4.3.2

##
## Attaching package: ’psych’
##
## The following objects are masked from ’package:ggplot2’:
##
## %+%, alpha

library(lme4)

## Loading required package: Matrix


##
## Attaching package: ’Matrix’
##
## The following objects are masked from ’package:tidyr’:
##
## expand, pack, unpack

library(mice)

##
## Attaching package: ’mice’
##
## The following object is masked from ’package:stats’:
##
## filter
##
## The following objects are masked from ’package:base’:
##
## cbind, rbind

library(tibble)

Question 1

# Create the DataSet


Tree_no = c(1:20)
# Note: the last two diameters were cut off in the source; 9.3 and 8.2 are assumed
# values, chosen so that the sample total matches the reported 188.1.
Diameter_X = c(12, 11.4, 7.9, 10.5, 7.9, 9, 7.3, 10.2, 11.7, 11.3, 5.7, 8, 10.3, 12,
               9.2, 8.5, 7, 10.7, 9.3, 8.2)
Age_Y = c(125, 119, 83, 85, 99, 117, 69, 133, 154, 168, 61, 80, 114, 147, 122, 106, 82, 88, 97, 99)
Treedata = data.frame(cbind(Tree_no, Diameter_X, Age_Y))
head(Treedata)

## Tree_no Diameter_X Age_Y


## 1 1 12.0 125
## 2 2 11.4 119
## 3 3 7.9 83
## 4 4 10.5 85
## 5 5 7.9 99
## 6 6 9.0 117

#View(Treedata)
str(Treedata)

## ’data.frame’: 20 obs. of 3 variables:


## $ Tree_no : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Diameter_X: num 12 11.4 7.9 10.5 7.9 9 7.3 10.2 11.7 11.3 ...
## $ Age_Y : num 125 119 83 85 99 117 69 133 154 168 ...

a) Draw a scatterplot of y vs x

Treeplot = ggplot(Treedata, aes(x = Diameter_X, y = Age_Y)) + geom_point() +
  labs(title = 'Scatterplot of Y (Age) against X (Diameter)',
       x = 'Tree Diameter', y = 'Tree Age')


Treeplot

[Figure: Scatterplot of Y (Age) against X (Diameter); x-axis: Tree Diameter, y-axis: Tree Age]

"\n"

## [1] "\n"

b) Estimate the Population Mean age of trees in the stand using ratio estimation and give an approximate
standard error for your estimate

Diameter = auxiliary variable = $x$; Age = variable of interest = $y$.

$$t_x = \sum_{i=1}^{n} x_i, \qquad t_y = \sum_{i=1}^{n} y_i$$

$$\frac{t_x}{t_y} = \frac{\mu_x}{\mu_y}
\quad\Rightarrow\quad
\text{Population mean age: } \hat{\mu}_y = \mu_x \cdot \frac{t_y}{t_x}$$

$$\text{Standard Error} = \frac{s_y}{\sqrt{n}}$$

where $s_y$ is the sample standard deviation of age and $n$ the sample size.
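For comparison only (this is not the method used in the chunk below): the usual large-sample standard error of a ratio estimate is based on the residuals $e_i = y_i - \hat{B}x_i$ with $\hat{B} = t_y/t_x$, rather than on $s_y/\sqrt{n}$. A minimal sketch, assuming the finite-population correction can be ignored because the total number of trees in the stand is not given:

# Hedged sketch: linearization SE for the ratio estimate (alternative to the SE used below)
B_hat    = sum(Age_Y) / sum(Diameter_X)   # estimated ratio B = t_y / t_x
e        = Age_Y - B_hat * Diameter_X     # ratio residuals
n_trees  = length(Age_Y)
se_ratio = sqrt(var(e) / n_trees)         # approximate SE of the estimated mean age (no fpc)
se_ratio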

#Estimating Population Mean Age
meanx = 10.3
totalx = sum(Diameter_X)
cat("Diameter Sample Total =", totalx, "\n")

## Diameter Sample Total = 188.1

totaly = sum(Age_Y)
cat("Age Sample Total =", totaly, "\n")

## Age Sample Total = 2148

meany = meanx * (totaly / totalx)


cat("Population Mean Age =", meany, "\n")

## Population Mean Age = 117.6204

#Approximating Standard Error for above Estimation


StanError = sd(Age_Y) / (sqrt(length(Age_Y)))
cat("Standard Error=", StanError, "\n")

## Standard Error= 6.40904

cat("Therefore, the estimated population mean age of trees is", meany, "while the standard error for sai

## Therefore, the estimated population mean age of trees is 117.6204 while the standard error for said estimate is 6.40904

c) Repeat b) using Regression Estimation

$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2},
\qquad y = b_0 + b_1 x$$
#Creating a Subset of the Treedata
Regdata = Treedata
Regdata[4] = Regdata[2] * Regdata[3]
Regdata[5] = Regdata[2]^2
Regdata[6] = Regdata[3]^2
colnames(Regdata) = c("Tree_no", 'Diameter_X', "Age_Y", "XY", "Xsqr", "Ysqr")
Regdata = Regdata %>%
mutate_at(vars(1), as.character)
Totals = Regdata %>%
summarize(Tree_no = "Totals",
Diameter_X = sum(Regdata$Diameter_X),
Age_Y = sum(Regdata$Age_Y),
XY = sum(Regdata$XY),
Xsqr = sum(Regdata$Xsqr),
Ysqr = sum(Regdata$Ysqr))
Regdata = bind_rows(Regdata, Totals)
#Regdata = Regdata %>%
# mutate(Regdata$Tree_no[21], "Totals")
str(Regdata)

## ’data.frame’: 21 obs. of 6 variables:
## $ Tree_no : chr "1" "2" "3" "4" ...
## $ Diameter_X: num 12 11.4 7.9 10.5 7.9 9 7.3 10.2 11.7 11.3 ...
## $ Age_Y : num 125 119 83 85 99 117 69 133 154 168 ...
## $ XY : num 1500 1357 656 892 782 ...
## $ Xsqr : num 144 130 62.4 110.2 62.4 ...
## $ Ysqr : num 15625 14161 6889 7225 9801 ...

head(Regdata)

## Tree_no Diameter_X Age_Y XY Xsqr Ysqr


## 1 1 12.0 125 1500.0 144.00 15625
## 2 2 11.4 119 1356.6 129.96 14161
## 3 3 7.9 83 655.7 62.41 6889
## 4 4 10.5 85 892.5 110.25 7225
## 5 5 7.9 99 782.1 62.41 9801
## 6 6 9.0 117 1053.0 81.00 13689
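As a cross-check, the slope and intercept can also be computed directly from the sums in the formula above, using only the original 20 trees; a minimal sketch (the names b1_check and b0_check are illustrative):

n_obs    = nrow(Treedata)
b1_check = (n_obs * sum(Treedata$Diameter_X * Treedata$Age_Y) -
              sum(Treedata$Diameter_X) * sum(Treedata$Age_Y)) /
           (n_obs * sum(Treedata$Diameter_X^2) - sum(Treedata$Diameter_X)^2)
b0_check = mean(Treedata$Age_Y) - b1_check * mean(Treedata$Diameter_X)
c(intercept = b0_check, slope = b1_check)

Note that the lm() fit in the next chunk is run on Regdata, which at this point still contains the appended Totals row (21 rows, hence the 19 residual degrees of freedom in the summary), so its coefficients may differ from the hand-computed values above.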

#Fitting the Linear Regression Model


Treemodel = lm(Age_Y ~ Diameter_X, data = Regdata)
summary(Treemodel)

##
## Call:
## lm(formula = Age_Y ~ Diameter_X, data = Regdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.892 -11.171 -0.288 9.977 38.971
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03031 4.33635 -0.007 0.994
## Diameter_X 11.42115 0.10301 110.874 <2e-16 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 17.98 on 19 degrees of freedom
## Multiple R-squared: 0.9985, Adjusted R-squared: 0.9984
## F-statistic: 1.229e+04 on 1 and 19 DF, p-value: < 2.2e-16

Intercept = coef(Treemodel)[1]
Slope = coef(Treemodel)[2]
cat("The slope of the model is", Slope, "while the intercept is", Intercept, "\n")

## The slope of the model is 11.42115 while the intercept is -0.03030842

Estimate = Intercept + (Slope * meanx)

The fitted regression (from the coefficients above) is:

$$\widehat{\text{age}} = -0.03 + 11.42\,\text{diameter}$$
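The regression estimate of the population mean age plugs the known population mean diameter $\mu_x = 10.3$ into the fitted line; for an ordinary least-squares fit this is equivalent to adjusting the sample mean age by the slope:

$$\hat{\mu}_{y,\text{reg}} = b_0 + b_1\,\mu_x = \bar{y} + b_1\left(\mu_x - \bar{x}\right)$$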

#Approximating Standard Error for the above Estimation
Residuals = resid(Treemodel)
ResStanDev = sd(Residuals)
SE = ResStanDev * sqrt(1 + 1/length(Age_Y) + ((meanx - mean(Diameter_X))^2 / sum((Diameter_X - mean(Diameter_X))^2)))
cat("The Regression Estimate of the Population Mean Age is", Estimate,
    "while the Standard Error for the above estimate is", SE, "\n")

## The Regression Estimate of the Population Mean Age is 117.6075 while the Standard Error for the above

d) Label your Estimates on your graph. How do they compare?

Ratio = c(10.3, meany)


Regression = c(10.3, Estimate)
Est = data.frame(rbind(Ratio, Regression))
Est

## V1 X.Intercept.
## Ratio 10.3 117.6204
## Regression 10.3 117.6075

colnames(Est) = c("X", "Y")


X = Est$X
Y = Est$Y
EstGraph = ggplot(Treedata, aes(x = Diameter_X, y = Age_Y)) + geom_point() +
  # the layers after geom_point() were cut off in the source; the ones below are an
  # assumed reconstruction based on the plot's title, subtitle, and the Est data frame
  geom_point(data = Est, aes(x = X, y = Y), color = "blue", shape = 15, size = 3) +
  geom_text(data = Est, aes(x = X, y = Y, label = rownames(Est)), vjust = -1) +
  geom_abline(intercept = Intercept, slope = Slope) +
  labs(title = 'Scatterplot of Y (Age) against X (Diameter)',
       subtitle = 'with Estimates & Regression Line',
       x = 'Tree Diameter', y = 'Tree Age')
EstGraph

[Figure: Scatterplot of Y (Age) against X (Diameter) with Estimates & Regression Line; x-axis: Tree Diameter, y-axis: Tree Age]

"\n"

## [1] "\n"

From the graph above it is clear that the ratio estimate and the regression estimate of the
population mean age are very close in value; however, the standard errors of the two methods differ considerably.

Question 2

The new candy Green Gobules is being test-marketed in an area of upstate New York. The market research firm
decided to sample 6 of the 45 cities in the area and then to sample supermarkets within the chosen cities,
in order to estimate the number of cases of Green Gobules sold.

a) Obtain summary statistics for each cluster

# Create the Data Sets


cluster_1 = c(146, 180, 251, 152, 72, 181, 171, 361, 73, 186)
cluster_2 = c(99, 101, 52, 121)
cluster_3 = c(199, 179, 98, 63, 126, 87, 62)
cluster_4 = c(226, 129, 57, 46, 86, 43, 85, 165)
cluster_5 = c(12, 23)
cluster_6 = c(87, 43, 59)

#Cluster 1
summary(cluster_1)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 72.0 147.5 175.5 177.3 184.8 361.0

sd1 = sd(cluster_1)
var1 = var(cluster_1)
cat("The Standard Deviation of Cluster 1 is", sd1, "while the Variance is", var1, "\n", "\n")

## The Standard Deviation of Cluster 1 is 83.59964 while the Variance is 6988.9


##

#Cluster 2
summary(cluster_2)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 52.00 87.25 100.00 93.25 106.00 121.00

sd2 = sd(cluster_2)
var2 = var(cluster_2)
cat("The Standard Deviation of Cluster 2 is", sd2, "while the Variance is", var2, "\n", "\n")

## The Standard Deviation of Cluster 2 is 29.23896 while the Variance is 854.9167


##

#Cluster 3
summary(cluster_3)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 62.0 75.0 98.0 116.3 152.5 199.0

sd3 = sd(cluster_3)
var3 = var(cluster_3)
cat("The Standard Deviation of Cluster 3 is", sd3, "while the Variance is", var3, "\n", "\n")

## The Standard Deviation of Cluster 3 is 54.53963 while the Variance is 2974.571


##

#Cluster 4
summary(cluster_4)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 43.00 54.25 85.50 104.62 138.00 226.00

sd4 = sd(cluster_4)
var4 = var(cluster_4)
cat("The Standard Deviation of Cluster 4 is", sd4, "while the Variance is", var4, "\n", "\n")

## The Standard Deviation of Cluster 4 is 64.59309 while the Variance is 4172.268


##

#Cluster 5
summary(cluster_5)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 12.00 14.75 17.50 17.50 20.25 23.00

sd5 = sd(cluster_5)
var5 = var(cluster_5)
cat("The Standard Deviation of Cluster 5 is", sd5, "while the Variance is", var5, "\n", "\n")

## The Standard Deviation of Cluster 5 is 7.778175 while the Variance is 60.5


##

#Cluster 6
summary(cluster_6)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 43 51 59 63 73 87

sd6 = sd(cluster_6)
var6 = var(cluster_6)
cat("The Standard Deviation of Cluster 1 is", sd6, "while the Variance is", var6, "\n", "\n")

## The Standard Deviation of Cluster 6 is 22.27106 while the Variance is 496
##
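The six nearly identical chunks above can be collapsed into a single pass over a named list; a minimal sketch (the object names clusters and cluster_stats are illustrative):

clusters = list(cluster_1 = cluster_1, cluster_2 = cluster_2, cluster_3 = cluster_3,
                cluster_4 = cluster_4, cluster_5 = cluster_5, cluster_6 = cluster_6)
cluster_stats = data.frame(
  cluster = names(clusters),
  n       = sapply(clusters, length),   # supermarkets observed in each sampled city
  total   = sapply(clusters, sum),      # cases sold in the sampled supermarkets
  mean    = sapply(clusters, mean),
  sd      = sapply(clusters, sd),
  var     = sapply(clusters, var)
)
cluster_stats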

b) Estimate the total number of cases sold, and the average number sold per supermarket, along with the
standard errors of your Estimates

# Create the Data Set


City = c(1, 2, 3, 4, 5, 6)
Supermarket_No = c(52, 19, 37, 39, 8, 14)
"cluster_1 = I(list(c(146, 180, 251, 152, 72, 181, 171, 361, 73, 186)))
cluster_2 = I(list(c(99, 101, 52, 121)))
cluster_3 = I(list(c(199, 179, 98, 63, 126, 87, 62)))
cluster_4 = I(list(c(226, 129, 57, 46, 86, 43, 85, 165)))
cluster_5 = I(list(c(12, 23)))
cluster_6 = I(list(c(87, 43, 59)))"

## [1] "cluster_1 = I(list(c(146, 180, 251, 152, 72, 181, 171, 361, 73, 186)))\ncluster_2 = I(list(c(99,

Cases_Sold = data.frame(Clusters = c("Cluster_1", "Cluster_2", "Cluster_3",
                                     "Cluster_4", "Cluster_5", "Cluster_6"),
                        Cases = I(list(cluster_1, cluster_2, cluster_3,
                                       cluster_4, cluster_5, cluster_6)))
# The rest of this line was cut off in the source; the Cases list-column is an assumed
# reconstruction, consistent with the cbind() and the non-numeric warning further down.


#Cases_Sold = bind_rows(
# tibble(Cluster = "cluster_1", Value = cluster_1),
# tibble(Cluster = "cluster_2", Value = cluster_2),
# tibble(Cluster = "cluster_3", Value = cluster_3),
# tibble(Cluster = "cluster_4", Value = cluster_4),
# tibble(Cluster = "cluster_5", Value = cluster_5),
# tibble(Cluster = "cluster_6", Value = cluster_6)
#)
#Cases_Sold
#Cases_Sold = tidyr::gather(data.frame(cluster_1, cluster_2, cluster_3, cluster_4, cluster_5, cluster_6)
#Cases_Sold = data.frame(var1 = rbind(cluster_1, cluster_2, cluster_3, cluster_4, cluster_5, cluster_6))
View(Cases_Sold)
Green_Gobules = cbind(City, Supermarket_No, Cases_Sold[2])
View(Green_Gobules)
#Green_Gobules = Green_Gobules %>%
# mutate_at(vars(3), as.numeric)
mean(Green_Gobules[1, 3])

## Warning in mean.default(Green_Gobules[1, 3]): argument is not numeric or


## logical: returning NA

## [1] NA
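The chunk above stops before producing the estimates that part b) asks for. A minimal sketch of one way to compute them from the objects already defined, assuming Supermarket_No holds the total number of supermarkets in each sampled city, the 6 cities were drawn by simple random sampling from the 45, and the within-city (second-stage) variance component is ignored for brevity:

# Hedged sketch: two-stage cluster estimates for Green Gobules (assumptions noted above)
N  = 45                                            # cities in the area
n  = 6                                             # sampled cities
Mi = Supermarket_No                                # supermarkets per sampled city (assumed totals)
clusters = list(cluster_1, cluster_2, cluster_3, cluster_4, cluster_5, cluster_6)

ti_hat = Mi * sapply(clusters, mean)               # estimated total cases per sampled city

# Estimated total cases sold across all 45 cities, with between-city SE only
t_hat = N * mean(ti_hat)
se_t  = N * sqrt((1 - n / N) * var(ti_hat) / n)

# Ratio estimate of mean cases sold per supermarket, with the number of supermarkets as auxiliary
ybar_r = sum(ti_hat) / sum(Mi)
ei     = ti_hat - ybar_r * Mi                      # ratio residuals
se_y   = sqrt((1 - n / N) * var(ei) / n) / mean(Mi)

cat("Estimated total cases sold:", t_hat, "SE:", se_t, "\n")
cat("Estimated mean cases per supermarket:", ybar_r, "SE:", se_y, "\n")

The total uses the unbiased estimator $N\,\bar{\hat{t}}$ over the estimated city totals, while the per-supermarket mean uses a ratio estimator, since the total number of supermarkets across all 45 cities is not given.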
