
9.18 Problem Set - Classification

The document outlines a problem set focused on classification methods using a breast cancer biopsy dataset. It details the steps for data preparation, including cleaning and splitting the dataset, and applying logistic regression, random forest, and k-NN for prediction. The results discuss the accuracy of each method and emphasize the importance of minimizing false negatives in medical diagnoses.


---
title: '9.18 Problem Set: Classification'
author: 'Trinh Le'
---
```{r, eval=FALSE}
# One-time setup: install the packages used below (not re-run on every knit)
install.packages("randomForest")
install.packages("MLmetrics")
```

```{r}
library(MASS)         # biopsy dataset
library(tidyverse)    # data wrangling (loaded after MASS so dplyr::select wins)
library(class)        # knn()
library(randomForest)
library(MLmetrics)    # Accuracy() and ConfusionMatrix(), used below
```

1. Review the dataset documentation and describe the dataset in your own words.
```{r}
# The biopsy dataset from the MASS package contains data from breast cancer
# biopsies. It includes:
# - Predictor variables: V1-V9, nine numeric features describing cell-nucleus
#   characteristics such as clump thickness and cell size.
# - Target variable: class, a factor indicating whether the case is benign or
#   malignant.
```
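A quick structural look confirms the description above (a sketch; uses only what the MASS package provides):

```{r}
library(MASS)

# Nine numeric predictors (V1-V9), an ID column, and the class factor
str(biopsy)

# Class balance: benign cases outnumber malignant ones
table(biopsy$class)
```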
2. Prepare your data:

a. Save to a variable as a tibble.
```{r}
data <- as_tibble(biopsy)
```
b. Remove the ID variable.
```{r}
data <- data %>% select(-ID)
```

c. Drop rows that have any NA values using the function drop_na().
```{r}
data <- data %>% drop_na()
```
d. Split the data into a training set and a testing set, using 80% of the records for the training set and 20% for the testing set.
```{r}
set.seed(42) # For reproducibility
n <- nrow(data)
train_indices <- sample(1:n, size = 0.8 * n)
train_set <- data[train_indices, ]
test_set <- data[-train_indices, ]
```
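As a sanity check, the two partitions should be disjoint and together cover every row — a minimal sketch:

```{r}
# Every row lands in exactly one partition, and the split is roughly 80/20
stopifnot(nrow(train_set) + nrow(test_set) == nrow(data))
nrow(train_set) / nrow(data)
```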

3. Use each of the following classification methods to predict class. Use all of the available predictor variables. For each method, report accuracy on the test set and print a confusion matrix.

a. Logistic regression (Note that you will need to convert the class variable to be 1 or 0 instead of “benign” or “malignant.” Be sure to not include the original class variable in your model.)
```{r}
library(MLmetrics)  # Accuracy() and ConfusionMatrix()

# Encode the outcome as 1 (malignant) / 0 (benign)
train_set <- train_set %>%
  mutate(class_binary = if_else(class == "malignant", 1, 0))

test_set <- test_set %>%
  mutate(class_binary = if_else(class == "malignant", 1, 0))

# Fit on all predictors, excluding the original factor class
logreg_model <- glm(class_binary ~ . - class, family = binomial, data = train_set)

# Convert predicted probabilities to class labels at a 0.5 threshold
logreg_pred <- predict(logreg_model, newdata = test_set, type = "response")
logreg_binary <- if_else(logreg_pred > 0.5, 1, 0)

logreg_accuracy <- Accuracy(y_pred = logreg_binary, y_true = test_set$class_binary)
logreg_conf_matrix <- ConfusionMatrix(y_pred = logreg_binary, y_true = test_set$class_binary)

logreg_accuracy
logreg_conf_matrix
```

b. Random forest
```{r}
library(randomForest)

# class_binary would leak the outcome, so exclude it from the predictors
rf_model <- randomForest(class ~ . - class_binary, data = train_set)

rf_pred <- predict(rf_model, newdata = test_set)

rf_accuracy <- Accuracy(y_pred = rf_pred, y_true = test_set$class)
rf_conf_matrix <- ConfusionMatrix(y_pred = rf_pred, y_true = test_set$class)

rf_accuracy
rf_conf_matrix
```

c. k-NN (use k = 5. Note that the data is already normalized and ready to use for k-NN.)
```{r}
library(class)

# knn() takes numeric predictors only, so drop both label columns
knn_pred <- knn(
  train = train_set %>% select(-class, -class_binary),
  test = test_set %>% select(-class, -class_binary),
  cl = train_set$class,
  k = 5
)

knn_accuracy <- Accuracy(y_pred = knn_pred, y_true = test_set$class)
knn_conf_matrix <- ConfusionMatrix(y_pred = knn_pred, y_true = test_set$class)

knn_accuracy
knn_conf_matrix
```

4. Briefly discuss the results. Which method performed the best? Do you think false positives or false negatives are more important in this case?
```{r}
# Which method performed the best? Based on the accuracy values: logistic
# regression had an accuracy of [logreg_accuracy], random forest had an
# accuracy of [rf_accuracy], and k-NN had an accuracy of [knn_accuracy]. The
# method with the highest test accuracy is the best at predicting whether a
# case is benign or malignant.
# False positives vs. false negatives: false negatives (malignant classified
# as benign) are more dangerous, because they could delay treatment for a
# potentially life-threatening condition. False positives (benign classified
# as malignant) are less critical, but cause unnecessary stress and additional
# testing.
# Conclusion: the best method is the one that distinguishes benign from
# malignant cases most accurately, but it is crucial to keep false negatives
# low so that no malignant cases are missed.
```
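Since false negatives are the main concern, the false-negative count can be read straight off each confusion matrix. A minimal sketch, assuming the MLmetrics layout in which rows are true labels and columns are predictions; the "malignant"/"benign" cell names apply to the forest and k-NN matrices, while the logistic matrix uses the 1/0 encoding:

```{r}
# False negatives: truly malignant cases predicted benign
# (rows = true labels, columns = predictions in MLmetrics::ConfusionMatrix)
fn_logreg <- logreg_conf_matrix["1", "0"]
fn_rf <- rf_conf_matrix["malignant", "benign"]
fn_knn <- knn_conf_matrix["malignant", "benign"]

c(logistic = fn_logreg, random_forest = fn_rf, knn = fn_knn)
```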
