Model Lab
# Load data (assumes titanic.csv is in the working directory)
titanic <- read.csv("titanic.csv")
# Explore data
str(titanic)
summary(titanic)
head(titanic)
# Convert to factors
titanic$Sex <- as.factor(titanic$Sex)
titanic$Embarked <- as.factor(titanic$Embarked)
2. Use Esquisse to visualize the mtcars dataset:
   1. Create a histogram/density plot for mpg (Miles per Gallon).
   2. Compare the number of automatic vs. manual cars using a bar plot of am.
   3. Show the relationship between cyl (Cylinders) and hp (Horsepower) using a box/scatter plot.
   4. Visualize mpg vs. wt (Weight) with color for cyl.
   5. Create a bar plot showing car counts by gear (Transmission gears).
   Export and save the R code.
install.packages("esquisse")
library(esquisse)
# Load mtcars
data("mtcars")
3. Load the mtcars dataset, identify numerical and categorical variables, and visualize
distributions using box plots, histograms, and violin plots. Use scatter plots with
trend lines to analyze relationships. Create facets, bar plots, and heatmaps to
explore patterns and correlations. Provide insights on distributions, outliers, and
variable relationships.
library(ggplot2)
library(vioplot)
data("mtcars")
# Boxplot
boxplot(mtcars$mpg, main="MPG Boxplot")
# Histogram
hist(mtcars$hp, main="HP Histogram", col="lightblue")
# Violin plot
vioplot::vioplot(mtcars$wt, names = "Weight")
# Correlation heatmap
heatmap(cor(mtcars), main="Correlation Heatmap")
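The prompt also asks for a scatter plot with a trend line, facets, and a bar plot; a brief ggplot2 sketch (the variable pairings are illustrative choices):
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm") + # linear trend line
  facet_wrap(~cyl) # one panel per cylinder count
ggplot(mtcars, aes(x = factor(gear))) +
  geom_bar() # car counts by gear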
4. Develop a Shiny app using the iris dataset with three features: (1) Select a
numerical and categorical variable to display summary statistics. (2) Choose two
numerical variables for a scatter plot, colored by a categorical variable. (3)
Generate a box plot for a selected numerical and categorical variable.
library(shiny)
data(iris)
ui <- fluidPage(
titlePanel("Iris Shiny App"),
selectInput("num", "Numerical Variable:", choices=names(iris)[1:4]),
selectInput("cat", "Categorical Variable:", choices=c("Species")),
verbatimTextOutput("summary"),
selectInput("x", "X-axis Variable:", choices=names(iris)[1:4]),
selectInput("y", "Y-axis Variable:", choices=names(iris)[1:4]),
plotOutput("scatter"),
plotOutput("boxplot")
)
server <- function(input, output) {
  output$summary <- renderPrint({
    summary(iris[[input$num]])
  })
  # Scatter plot of the two selected numeric variables, colored by Species
  output$scatter <- renderPlot({
    plot(iris[[input$x]], iris[[input$y]],
         col = iris$Species, pch = 19,
         xlab = input$x, ylab = input$y)
  })
  # Box plot of the selected numeric variable grouped by the categorical one
  output$boxplot <- renderPlot({
    boxplot(iris[[input$num]] ~ iris[[input$cat]],
            xlab = input$cat, ylab = input$num)
  })
}
shinyApp(ui = ui, server = server)
5. Analyze outliers in the mtcars dataset using IQR and Z-score methods. Compute
probabilities of selecting outlier cars based on mpg, hp, and wt. Explore
conditional probabilities, independence, and expected outliers. Examine
distribution shapes, skewness, correlations, and compare different outlier
detection methods to assess consistency and insights.
library(e1071)
data("mtcars")
# IQR method
Q1 <- quantile(mtcars$mpg, 0.25)
Q3 <- quantile(mtcars$mpg, 0.75)
IQR_val <- Q3 - Q1
outliers_iqr <- which(mtcars$mpg < (Q1 - 1.5 * IQR_val) | mtcars$mpg > (Q3 + 1.5 * IQR_val))
# Z-score method
z_scores <- scale(mtcars$mpg)
outliers_z <- which(abs(z_scores) > 3)
# Print outliers
mtcars[outliers_iqr, ]
mtcars[outliers_z, ]
# Skewness
skewness(mtcars$mpg)
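A short sketch of the probability steps the prompt mentions, treating each car as equally likely to be selected (empirical proportions only):
p_out <- length(outliers_iqr) / nrow(mtcars) # P(a randomly chosen car is an mpg outlier)
p_out
10 * p_out # expected number of outliers among 10 cars drawn with replacement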
6. Perform correlation analysis on the mtcars dataset to evaluate relationships
between mpg and other features using Pearson and Spearman methods. Identify
key influencing variables and visualize insights using heatmaps, scatter plots, box
plots, bar plots, and clustering dendrograms, ensuring a comprehensive
understanding of mpg dependencies and trends.
data("mtcars")
data("mtcars")
# Correlation heatmap
heatmap(cor(mtcars), main = "Correlation Heatmap")
# Dendrogram clustering
dist_m <- dist(mtcars)
hcl <- hclust(dist_m)
plot(hcl)
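The Pearson/Spearman comparison the prompt asks for, in base R (the wt pairing in the first two lines is an illustrative choice):
cor(mtcars$mpg, mtcars$wt, method = "pearson")
cor(mtcars$mpg, mtcars$wt, method = "spearman")
sort(cor(mtcars)["mpg", -1]) # mpg's Pearson correlation with every other variable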
7. Analyze sales trends over time using scatter plots and regression models. Fit linear
and quadratic regression, compare their R-squared and RMSE values, and
interpret results. Identify the best-fitting model and predict sales for the next 6
months (Months: 61-66) to assess future trends and decision-making insights.
set.seed(123)
months <- 1:60
sales <- 100 + 2*months + rnorm(60, 0, 10)
data <- data.frame(months, sales)
# Linear regression
model1 <- lm(sales ~ months, data)
summary(model1)
# Quadratic regression
model2 <- lm(sales ~ months + I(months^2), data)
summary(model2)
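A sketch of the comparison and forecasting steps (RMSE is computed on the training data here for simplicity; substitute model2 in the prediction if it fits better):
rmse1 <- sqrt(mean(residuals(model1)^2))
rmse2 <- sqrt(mean(residuals(model2)^2))
c(linear = rmse1, quadratic = rmse2)
predict(model1, newdata = data.frame(months = 61:66)) # sales forecast for months 61-66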
# Simulated patient data: glucose, cholesterol, and blood pressure
df <- data.frame(
glucose = rnorm(100, 100, 15),
cholesterol = rnorm(100, 200, 25),
bp = rnorm(100, 120, 10)
)
# Correlation
cor(df, method = "pearson")
cor(df, method = "spearman")
# Eigen decomposition
e <- eigen(cor(df))
e$values
e$vectors
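The eigen decomposition of a correlation matrix gives the PCA of the standardized data; a quick cross-check with prcomp, shown as an aside:
pca_df <- prcomp(df, scale. = TRUE)
pca_df$rotation # matches e$vectors up to column signs
pca_df$sdev^2 # matches e$values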
9. Analyze the Heart Disease dataset by exploring its structure, target variable, and
missing values. Visualize correlations, split data, and train a Logistic Regression
model. Evaluate using accuracy, precision, recall, and F1-score. Apply PCA for
dimensionality reduction and assess if performance improves for better heart
disease prediction.
# Load the dataset (assumed to be a local CSV with a binary target column; adjust the path to your copy)
heart <- read.csv("heart.csv")
# Train-test split
set.seed(123)
train_idx <- sample(1:nrow(heart), 0.7 * nrow(heart))
train <- heart[train_idx, ]
test <- heart[-train_idx, ]
# Logistic regression
model <- glm(target ~ ., data=train, family="binomial")
pred <- predict(model, newdata=test, type="response")
pred_class <- ifelse(pred > 0.5, 1, 0)
# Evaluation
conf_matrix <- table(Predicted=pred_class, Actual=test$target)
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
accuracy
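# Precision, recall, and F1 from the confusion matrix, treating class 1 as positive
# (added here as a sketch, since the prompt asks for these metrics)
precision <- conf_matrix["1", "1"] / sum(conf_matrix["1", ])
recall <- conf_matrix["1", "1"] / sum(conf_matrix[, "1"])
f1 <- 2 * precision * recall / (precision + recall)
c(precision = precision, recall = recall, f1 = f1)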
# PCA on the predictors (assumes the target is the last column)
pca <- prcomp(train[, -ncol(train)], scale. = TRUE)
summary(pca)
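To assess whether PCA helps, one hedged sketch is to refit the model on the leading components (five is an arbitrary illustrative cutoff) and recompute the same metrics:
train_pc <- data.frame(pca$x[, 1:5], target = train$target)
model_pc <- glm(target ~ ., data = train_pc, family = "binomial")
test_pc <- as.data.frame(predict(pca, newdata = test[, -ncol(test)]))[, 1:5]
pred_pc <- ifelse(predict(model_pc, newdata = test_pc, type = "response") > 0.5, 1, 0)
mean(pred_pc == test$target) # accuracy on the PCA features, to compare against the original model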
10. A real estate company aims to predict house prices based on factors such as square
footage, number of bedrooms, and location. Formulate the problem as a multiple
linear regression equation and express it in matrix form. Using the normal
equation, calculate the regression coefficients to fit the model. Discuss how
multicollinearity among predictor variables affects the reliability of the regression
model and explain how Principal Component Regression (PCR) can be used to
address multicollinearity issues while improving predictive accuracy.
# Sample data (illustrative values; extra rows added because with only three
# houses the three predictors are perfectly collinear and t(X) %*% X is singular)
data <- data.frame(
  sqft = c(1000, 1500, 2000, 1200, 1800, 2500),
  bedrooms = c(2, 3, 4, 2, 3, 5),
  location = c(1, 2, 3, 2, 1, 3), # encoded location
  price = c(200000, 300000, 400000, 260000, 330000, 520000)
)
# Design matrix with an intercept column, and the response vector
X <- cbind(1, as.matrix(data[, c("sqft", "bedrooms", "location")]))
y <- data$price
# Normal equation: beta = (X'X)^(-1) X'y
beta <- solve(t(X) %*% X) %*% t(X) %*% y
beta
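A minimal PCR sketch on the same toy data (illustrative, not a production model): regress price on the leading principal components of the predictors, which discards the near-collinear directions that destabilize the normal-equation coefficients.
pc <- prcomp(data[, c("sqft", "bedrooms", "location")], scale. = TRUE)
pcr_fit <- lm(data$price ~ pc$x[, 1:2]) # keep the first two components
summary(pcr_fit)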