
WEEK – 5

a) Find the correlation matrix and plot the correlations for the iris data set.

install.packages("corrplot") # If not already installed

library(corrplot)

# Calculate and plot the correlation matrix

cor_matrix <- cor(iris[, 1:4]) # Correlation for numeric columns

corrplot(cor_matrix, method = "circle", main = "Correlation Matrix of Iris Dataset")

b) Plot the correlation matrix of the iris data set and customize the plot to give an overview of the data.

corrplot(cor_matrix, method = "circle",

main = "Correlation Matrix of Iris Dataset",

tl.col = "black", tl.srt = 45)


c) Analysis of variance (ANOVA) when the data contain a categorical variable, illustrated on the iris data.

SOURCE CODE:

library(ggplot2)

# Scatter plot

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) + geom_point()

# Box plot

ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) + geom_boxplot()


# Correlation heatmap

cor_matrix <- cor(iris[, 1:4])

cor_data <- as.data.frame(as.table(cor_matrix))

ggplot(cor_data, aes(Var1, Var2, fill = Freq)) + geom_tile() + scale_fill_gradient2()
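
The code above only visualizes the data; to actually run the analysis of variance named in part (c), a minimal sketch is given below, assuming Sepal.Length as the response and Species as the categorical factor.

# One-way ANOVA: does mean Sepal.Length differ across Species?
iris_aov <- aov(Sepal.Length ~ Species, data = iris)
summary(iris_aov)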


WEEK – 6

Create Relationship Model & get the Coefficients

# Reduced data points (3 points each)

x <- c(151, 174, 138)

y <- c(63, 81, 56)

# Apply the lm() function

relation <- lm(y ~ x)

# Print the model

print(relation)
Call:

lm(formula = y ~ x)

Coefficients:

(Intercept) x

-42.0787 0.7046

Get the Summary of the Relationship


# Reduced data points (3 points each)

x <- c(151, 174, 138)

y <- c(63, 81, 56)

# Apply the lm() function

relation <- lm(y ~ x)

# Print the summary of the model

print(summary(relation))
Call:

lm(formula = y ~ x)

Residuals:

1 2 3

-1.3180 0.4759 0.8420

Coefficients:

Estimate Std. Error t value Pr(>|t|)


(Intercept) -42.07874 9.83170 -4.28 0.1461

x 0.70461 0.06341 11.11 0.0571 .

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.635 on 1 degrees of freedom

Multiple R-squared: 0.992, Adjusted R-squared: 0.9839

F-statistic: 123.5 on 1 and 1 DF, p-value: 0.05714
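
The quantities printed in the summary can also be pulled out of the fitted object directly; a short sketch, assuming the same relation model from above, is:

# Extract coefficients and fit statistics from the fitted model object
coef(relation)                    # intercept and slope
summary(relation)$r.squared       # multiple R-squared
summary(relation)$coefficients    # full coefficient table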

Predict the weight of new persons


# Reduced data points (3 points each)

x <- c(151, 174, 138)

y <- c(63, 81, 56)

# Apply the lm() function

relation <- lm(y ~ x)

# Print the summary of the model

print(summary(relation))
Call:

lm(formula = y ~ x)

Residuals:

1 2 3

-1.3180 0.4759 0.8420

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -42.07874 9.83170 -4.28 0.1461

x 0.70461 0.06341 11.11 0.0571 .

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.635 on 1 degrees of freedom

Multiple R-squared: 0.992, Adjusted R-squared: 0.9839

F-statistic: 123.5 on 1 and 1 DF, p-value: 0.05714


# Predict y values based on the model

predicted_y <- predict(relation, newdata = data.frame(x = c(160, 145, 170)))

# Print the predicted values

print(predicted_y)
1 2 3

70.65948 60.09027 77.70562
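
predict() can also return interval estimates around the fitted values; a hedged sketch using the same relation model and the same new heights is shown below (the 95% level is the default, not part of the original code).

# Predict with 95% confidence intervals around the fitted mean
predict(relation, newdata = data.frame(x = c(160, 145, 170)),
        interval = "confidence")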

Visualize the Regression Graphically


# Reduced data points (3 points each)

x <- c(151, 174, 138)

y <- c(63, 81, 56)

# Apply the lm() function

relation <- lm(y ~ x)

# Plot the data points

plot(x, y, main = "Height vs Weight Regression",
     xlab = "Height (cm)", ylab = "Weight (kg)",
     pch = 16, col = "blue")

# Add the regression line

abline(relation, col = "red")


WEEK – 7

# Example dataset

data(mtcars)

# Convert 'cyl' (cylinder) into a factor (categorical variable)

mtcars$cyl <- factor(mtcars$cyl)

# Apply logistic regression

mylogit <- glm(am ~ mpg + wt + cyl, data = mtcars, family = "binomial")

# Print the summary of the logistic regression model

summary(mylogit)
Call:

glm(formula = am ~ mpg + wt + cyl, family = "binomial", data = mtcars)

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) 23.92836 14.17738 1.688 0.0915 .

mpg -0.09851 0.35135 -0.280 0.7792

wt -8.17801 3.34965 -2.441 0.0146 *

cyl6 3.00979 2.51067 1.199 0.2306

cyl8 4.98194 3.50934 1.420 0.1557

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 43.230 on 31 degrees of freedom

Residual deviance: 14.588 on 27 degrees of freedom

AIC: 24.588

Number of Fisher Scoring iterations: 7
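
The summary above only reports coefficients; to obtain class predictions from the fitted logistic model, one possible sketch is given below (the 0.5 cut-off is an assumption, not part of the original code).

# Predicted probabilities of am = 1 (manual transmission)
probs <- predict(mylogit, type = "response")
# Convert probabilities to class labels with a 0.5 threshold
pred_class <- ifelse(probs > 0.5, 1, 0)
# Compare predictions against the observed transmission type
table(Predicted = pred_class, Actual = mtcars$am)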


WEEK – 8

# Load the mtcars dataset

data(mtcars)

# Convert 'cyl' to a factor

mtcars$cyl <- factor(mtcars$cyl)

# Fit a linear regression model with 'cyl' as the predictor

model <- lm(mpg ~ cyl, data = mtcars)

# Display regression coefficients

coef(summary(model))
Estimate Std. Error t value Pr(>|t|)

(Intercept) 26.663636 0.9718008 27.437347 2.688358e-22

cyl6 -6.920779 1.5583482 -4.441099 1.194696e-04

cyl8 -11.563636 1.2986235 -8.904534 8.568209e-10

# Perform ANOVA on the model

anova(model)
Analysis of Variance Table

Response: mpg

Df Sum Sq Mean Sq F value Pr(>F)

cyl 2 824.78 412.39 39.697 4.979e-09 ***

Residuals 29 301.26 10.39

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
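
If pairwise comparisons between the cylinder groups are wanted after the ANOVA, Tukey's HSD test is one option; a minimal sketch that refits the model with aov() is:

# Pairwise comparisons of mean mpg between the cylinder groups
tukey_fit <- aov(mpg ~ cyl, data = mtcars)
TukeyHSD(tukey_fit)
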
WEEK – 9

Install the relevant packages

install.packages("rpart.plot")

install.packages("tree")

install.packages("ISLR")

install.packages("rattle")

library(tree)

library(ISLR)

library(rpart.plot)

library(rattle)

# Load data and clean NA values

Hitters <- na.omit(Hitters)

# Log transform Salary

Hitters$Salary <- log(Hitters$Salary)

# Plot transformed Salary

hist(Hitters$Salary, main = "Log-Transformed Salary", xlab = "Log(Salary)")


# Fit regression tree

tree.fit <- tree(Salary ~ Hits + Years, data = Hitters)

summary(tree.fit)
Regression tree:

tree(formula = Salary ~ Hits + Years, data = Hitters)

Number of terminal nodes: 7

Residual mean deviance: 0.002708 = 0.6933 / 256

Distribution of residuals:

Min. 1st Qu. Median Mean 3rd Qu. Max.

-0.2355000 -0.0258400 -0.0005869 0.0000000 0.0332000 0.2069000

# Plot the tree

plot(tree.fit, type = "uniform")

text(tree.fit, pretty = 0, cex = 0.8)


# Split data into training and testing sets
# createDataPartition() comes from the caret package
library(caret)

set.seed(123)

split <- createDataPartition(Hitters$Salary, p = 0.5, list = FALSE)

train <- Hitters[split, ]

test <- Hitters[-split, ]

# Train a tree model on the training data

trees <- tree(Salary ~ ., data = train)

plot(trees)

text(trees, pretty = 0)

# Cross-validation to prune the tree

cv.trees <- cv.tree(trees)


plot(cv.trees)

# Prune tree to best model

prune.trees <- prune.tree(trees, best = 4)

plot(prune.trees)

text(prune.trees, pretty = 0)

# Predict on test data using the pruned tree

yhat <- predict(prune.trees, test)

# Plot predicted vs actual salary

plot(yhat, test$Salary)
abline(0, 1)

# Calculate mean squared error

mean_squared_error <- mean((yhat - test$Salary)^2)

cat("Mean Squared Error:", mean_squared_error, "\n")


Mean Squared Error: 0.002714588
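
The rpart.plot and rattle packages installed at the start of this week are not used above; a sketch of an alternative tree drawing with rpart and fancyRpartPlot is shown below (an addition, assuming the same log-transformed Hitters data).

# Fit the same tree with rpart and draw it with fancyRpartPlot
library(rpart)
rpart.fit <- rpart(Salary ~ Hits + Years, data = Hitters)
fancyRpartPlot(rpart.fit, main = "Regression Tree for Log(Salary)")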
WEEK – 10

Clustering algorithms for unsupervised classification.

# Load the cluster library

library(cluster)

# Set random seed for reproducibility

set.seed(20)

# Apply K-means clustering on Petal.Length and Petal.Width (columns 3 and 4)

irisCluster <- kmeans(iris[, 3:4], centers = 3, nstart = 20)

# Output the clustering result

print(irisCluster)
K-means clustering with 3 clusters of sizes 52, 48, 50

Cluster means:

Petal.Length Petal.Width

1 4.269231 1.342308

2 5.595833 2.037500

3 1.462000 0.246000

Clustering vector:

[1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3

[46] 3 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1

[91] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2

[136] 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2

Within cluster sum of squares by cluster:

[1] 13.05769 16.29167 2.02200

(between_SS / total_SS = 94.3 %)

Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss"

[7] "size" "iter" "ifault"

# Convert cluster assignments to factor

irisCluster$cluster <- as.factor(irisCluster$cluster)

# Plot the clustering results

library(ggplot2)

ggplot(iris, aes(Petal.Length, Petal.Width, color = irisCluster$cluster)) +

geom_point() +

labs(title = "K-means Clustering on Iris Dataset",

x = "Petal Length", y = "Petal Width")

# Create a distance matrix from the mtcars dataset

d <- dist(as.matrix(mtcars))

# Apply hierarchical clustering

hc <- hclust(d)

# Plot the dendrogram

plot(hc, main = "Hierarchical Clustering of mtcars", xlab = "", sub = "")
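
Cluster memberships can be recovered from the dendrogram by cutting it at a chosen number of groups; a minimal sketch assuming 3 clusters is:

# Cut the dendrogram into 3 clusters and inspect the group sizes
groups <- cutree(hc, k = 3)
table(groups)
# Draw rectangles around the 3 clusters on the existing dendrogram
rect.hclust(hc, k = 3, border = "red")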


# Generate a synthetic dataset with two clusters

x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),

cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))

# Apply PAM clustering with 2 clusters and plot the result

library(cluster)

clusplot(pam(x, 2))

# Add two columns of random noise (25 values each) to the dataset

x4 <- cbind(x, rnorm(25), rnorm(25))

# Apply PAM clustering with 2 clusters and plot the result

library(cluster)

clusplot(pam(x4, 2))
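
The quality of the PAM solution on the noisy data can be inspected with silhouette widths; a minimal sketch (an addition to the original exercise) is:

# Silhouette plot for the PAM clustering of the noisy data
pam_fit <- pam(x4, 2)
plot(silhouette(pam_fit), main = "Silhouette Plot for PAM (k = 2)")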
