Lab Test 2

Uploaded by

Christina Tauauvea


Q1

# Load necessary libraries

library(dplyr)

# Import the insurance data

insurance_data <- read.csv("https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv")

# Take a peek at the data

head(insurance_data)
glimpse(insurance_data)

# Convert sex, smoker, and region into nominal categorical variables (factor)
insurance_data$sex <- factor(insurance_data$sex)
insurance_data$smoker <- factor(insurance_data$smoker)
insurance_data$region <- factor(insurance_data$region)

# Convert children into an ordinal categorical variable (factor with ordered levels)
insurance_data$children <- factor(insurance_data$children,
ordered = TRUE)

# Check the structure of the updated data

str(insurance_data)
glimpse(insurance_data)
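
One consequence of storing children as an ordered factor is worth checking before modelling: by default, lm() codes ordered factors with polynomial contrasts (contr.poly), so the model summary reports terms such as .L and .Q rather than a single slope. The sketch below illustrates this on simulated data; the data frame `toy` and its columns are made up for illustration and are not part of the insurance dataset.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(1)
toy <- data.frame(
  y = rnorm(60),
  kids_num = rep(0:2, times = 20)
)
toy$kids_ord <- factor(toy$kids_num, ordered = TRUE)

# Numeric predictor: a single slope coefficient
fit_num <- lm(y ~ kids_num, data = toy)
names(coef(fit_num))

# Ordered factor: polynomial contrasts (linear and quadratic terms for three levels)
fit_ord <- lm(y ~ kids_ord, data = toy)
names(coef(fit_ord))
```

If a single "per child" slope is wanted in a regression, keeping a numeric copy of the column alongside the factor is one option.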

# Check for missing values in the dataset

colSums(is.na(insurance_data))

# Summary statistics for numerical variables to check for outliers

summary(insurance_data)

# Fit a basic linear model for medical charges based on all other variables
linear_model <- lm(charges ~ age + sex + bmi + children + smoker + region,
                   data = insurance_data)

# Calculate Cook's Distance to identify influential observations

cooksd <- cooks.distance(linear_model)

# Plot Cook's Distance

plot(cooksd, main = "Cook's Distance for Influential Observations",
     ylab = "Cook's Distance")
abline(h = 4/(nrow(insurance_data)), col = "red") # A common threshold for Cook's Distance

# Display the rows with Cook's Distance greater than the threshold
influential_obs <- which(cooksd > 4/nrow(insurance_data))
insurance_data[influential_obs, ]
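
The 4/n cutoff used above is a rule of thumb rather than a formal test. As a minimal self-contained sketch of the same steps, here is the procedure on R's built-in mtcars data, chosen only so the snippet runs without a download; the model formula is an arbitrary stand-in.

```r
# Fit a small model on the built-in mtcars data (illustrative choice)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Cook's distance: one value per observation
cd <- cooks.distance(fit)

# Rule-of-thumb cutoff 4/n, where n is the number of observations
cutoff <- 4 / nrow(mtcars)

# Observations above the cutoff, largest first
flagged <- sort(cd[cd > cutoff], decreasing = TRUE)
flagged
```

Flagged rows are candidates for inspection, not automatic removal; refitting with and without them shows how much the coefficients actually move.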
Q3

# Load Hmisc (provides rcorr() for correlation matrices with p-values);
# cor() and cor.test() used below come from base R's stats package

library(Hmisc)

# Compute Pearson correlation between bmi and age

cor_bmi_age <- cor(insurance_data$bmi, insurance_data$age, method = "pearson")
cor_test_bmi_age <- cor.test(insurance_data$bmi, insurance_data$age, method = "pearson")

# Compute Pearson correlation between bmi and charges

cor_bmi_charges <- cor(insurance_data$bmi, insurance_data$charges, method = "pearson")
cor_test_bmi_charges <- cor.test(insurance_data$bmi, insurance_data$charges, method =
"pearson")

# Display correlation coefficients and significance tests

cat("Correlation between BMI and Age:\n")
print(cor_bmi_age)
print(cor_test_bmi_age)

cat("\nCorrelation between BMI and Charges:\n")

print(cor_bmi_charges)
print(cor_test_bmi_charges)
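
The printed test objects contain more than a write-up usually needs. A minimal sketch, on simulated vectors `x` and `y` (hypothetical, not the insurance columns), of pulling out just the coefficient, p-value, and confidence interval from a cor.test() result:

```r
set.seed(42)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)   # y is constructed to correlate with x

res <- cor.test(x, y, method = "pearson")

r_hat   <- unname(res$estimate)  # sample correlation coefficient
p_value <- res$p.value           # p-value for H0: true correlation is 0
ci      <- res$conf.int          # 95% confidence interval by default

cat(sprintf("r = %.3f, p = %.4g, 95%% CI [%.3f, %.3f]\n",
            r_hat, p_value, ci[1], ci[2]))
```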

# Load the GGally package

library(GGally)

# Select the relevant continuous variables (bmi, age, charges)

data_subset <- insurance_data[, c("bmi", "age", "charges")]

# Create a scatterplot matrix

ggpairs(data_subset,
        title = "Scatterplot Matrix of BMI, Age, and Charges",
        upper = list(continuous = "cor"),     # Show correlation in the upper panels
        lower = list(continuous = "smooth"),  # Add a smooth line in the lower panels
        diag = list(continuous = "density"))  # Show density plots on the diagonal

Q5

# Build the multiple linear regression model

model <- lm(bmi ~ age + sex + children + charges, data = insurance_data)

# Print out the results of the model

summary(model)


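- Equation: bmi = 28.72 + 0.02472*age + 0.4544*sex - 0.1558*children (the coefficient for charges, which is also in the model, is not reported here).

- Holding the other predictors fixed, each one-unit increase in age is associated with a 0.02472 increase in predicted bmi, and each one-unit increase in children with a 0.1558 decrease. Because sex is a categorical predictor, its coefficient is not a per-unit slope: 0.4544 is the difference in predicted bmi between the non-reference level and the reference level (R takes the alphabetically first level, here female, as the reference). A predictor is statistically significant in explaining 'bmi' where its p-value is less than 0.05; otherwise it is not.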
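
To apply the p < 0.05 rule to each predictor, the coefficient table from summary() can be read programmatically. The sketch below uses the built-in mtcars data and an arbitrary formula as stand-ins, so the numbers it prints say nothing about the insurance model; the `significant` column name is an illustrative choice.

```r
# Stand-in model on built-in data (not the insurance model)
fit <- lm(mpg ~ wt + hp + am, data = mtcars)

# Coefficient table with columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- as.data.frame(coef(summary(fit)))

# Flag predictors whose p-value is below the 0.05 threshold
coefs$significant <- coefs[["Pr(>|t|)"]] < 0.05
coefs
```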

library(ggplot2)
# Create predicted values based on the model
insurance_data$predicted_bmi <- predict(model)

# Create residuals (difference between observed and predicted values)

insurance_data$residuals <- insurance_data$bmi - insurance_data$predicted_bmi
# Scatterplot of observed vs predicted BMI values
ggplot(insurance_data, aes(x = predicted_bmi, y = bmi)) +
  geom_point(color = "blue", alpha = 0.5) +  # Scatterplot points
  geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +  # Line of perfect fit
  labs(title = "Observed vs. Predicted BMI",
       x = "Predicted BMI",
       y = "Observed BMI")
- The points scatter widely around the dashed line of perfect fit (observed = predicted), so the model does not capture much of the variability in 'bmi'; other factors not included here may explain it better.
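
The visual impression of scatter can be backed with numbers such as RMSE and R-squared. A minimal sketch, again on the built-in mtcars data with an arbitrary formula (stand-ins, not the bmi model above):

```r
# Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp, data = mtcars)

pred  <- predict(fit)        # fitted values on the training data
resid <- mtcars$mpg - pred   # observed minus predicted

rmse <- sqrt(mean(resid^2))          # typical prediction error, in units of the response
r2   <- summary(fit)$r.squared       # share of variance explained by the model
adj  <- summary(fit)$adj.r.squared   # R-squared adjusted for the number of predictors

c(RMSE = rmse, R2 = r2, adj_R2 = adj)
```

A low R-squared together with a wide cloud around the dashed line would support the reading that important predictors are missing.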
