Classification

The document outlines the process of performing classification in RStudio, focusing on predicting discrete target variables based on input features. It details the steps for building a classification model, including data preparation, exploratory data analysis, model training, evaluation, and validation, along with required R packages like caret and randomForest. An implementation example is provided, demonstrating the use of a decision tree to classify smoking status based on lung capacity, age, and height, with an achieved accuracy of approximately 88.48%.

Uploaded by

productionsankit

LAB PRACTICAL-9

Objective: Perform classification in RStudio.

Classification is a type of supervised learning where the aim is to assign a label or category
to input data based on a set of features. In R, classification models are developed and
evaluated using statistical and machine learning methods. Below is an overview of the
theoretical concepts involved in classification.

Fundamentals of Classification
• Objective: Predict a discrete target variable (class label) based on input features.
• Inputs: A dataset with:
  o Features: Predictor variables (e.g., age, income, etc.).
  o Class Labels: Target categories to predict (e.g., "yes" or "no").
• Output: A model that maps features to class labels.
Common classification algorithms include Decision Tree, Random Forest, and Support Vector
Machine.

Steps to Build a Classification Model in R
1. Data Preparation: Load, clean, and split the dataset.
2. Exploratory Data Analysis (EDA): Explore the data to understand relationships and distributions.
3. Model Training: Fit the model to the training data.
4. Model Evaluation: Use metrics like accuracy, precision, recall, and ROC-AUC.
5. Hyperparameter Tuning: Optimize the model for better performance.
6. Validation: Test the model on unseen data.
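
The six steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the practical's own program: it uses R's built-in iris data reduced to two classes as a stand-in dataset, a plain random split instead of caTools, and pruning by the complexity parameter cp as one possible form of hyperparameter tuning. The object names (df, train, test, fit, pruned) are hypothetical.

```r
library(rpart)  # decision trees; a recommended package shipped with most R installations

# 1. Data preparation: two-class subset of the built-in iris data (stand-in dataset)
df <- droplevels(subset(iris, Species != "setosa"))
set.seed(1)                                        # for reproducibility
train_rows <- sample(nrow(df), size = 0.7 * nrow(df))
train <- df[train_rows, ]
test  <- df[-train_rows, ]

# 2. EDA: class balance and feature distributions in the training data
table(train$Species)
summary(train)

# 3. Model training: fit a classification tree
fit <- rpart(Species ~ ., data = train, method = "class")

# 4. Model evaluation: confusion matrix and accuracy on the test data
pred <- predict(fit, test, type = "class")
cm <- table(Predicted = pred, Actual = test$Species)
accuracy <- sum(diag(cm)) / sum(cm)

# 5. Hyperparameter tuning: choose the cp value with the lowest
#    cross-validated error (column "xerror" of the cp table) and prune
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)

# 6. Validation: accuracy of the tuned model on data it has not seen
pruned_accuracy <- mean(predict(pruned, test, type = "class") == test$Species)
print(c(accuracy = accuracy, pruned_accuracy = pruned_accuracy))
```

Because cp is chosen by cross-validation on the training data alone, the test rows are still unseen when the pruned tree is validated.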

R packages required for Classification

1. caret: Provides a unified interface to numerous machine learning algorithms for
classification and regression tasks.

2. randomForest: Implements the Random Forest algorithm for classification and
regression tasks.

3. rpart: Builds decision trees for classification and regression tasks.

The program code in this practical also loads caTools (for the train/test split) and
rpart.plot (for plotting the tree).
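
Before running the practical, it can help to check which of these packages are installed. The sketch below is one way to do that; the vector pkgs lists the three packages above plus caTools and rpart.plot, which the program code in this practical loads, and the names installed and missing_pkgs are hypothetical.

```r
# Packages named in this practical
pkgs <- c("caret", "randomForest", "rpart", "caTools", "rpart.plot")

# requireNamespace() returns TRUE if a package is installed, without attaching it
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  # Print a ready-to-paste install command for whatever is missing
  message("Not installed: ", paste(missing_pkgs, collapse = ", "),
          "\nInstall with: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All packages are installed.")
}
```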

IMPLEMENTATION

1. Import the .CSV data file into RStudio.

Type the following command in the console: data3 <- read.csv(file.choose(), header = TRUE)
A dialog box will appear. Choose the dataset file, and the data frame data3 will then be
available in the Global Environment in RStudio.
All subsequent operations can then be performed on data3.
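
file.choose() is interactive, so in a script the file path is usually passed to read.csv() directly. The sketch below shows this on a small temporary file; the header text and the two data rows are invented for illustration and are not the practical's dataset. It also shows why column names such as LungCap.cc. contain dots: by default read.csv() converts headers into syntactically valid R names, replacing characters like parentheses and spaces with dots.

```r
# Write a small hypothetical CSV to a temporary file (invented headers and values)
csv_path <- tempfile(fileext = ".csv")
writeLines(c('"LungCap(cc)","Age (years)","Height(inches)","Smoke"',
             '6.5,6,62.1,no',
             '10.1,18,74.7,yes'),
           csv_path)

# header = TRUE treats the first row as column names;
# check.names = TRUE (the default) makes those names valid R identifiers
demo <- read.csv(csv_path, header = TRUE)

names(demo)  # "LungCap.cc." "Age..years." "Height.inches." "Smoke"
str(demo)
```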
PROGRAM CODE:
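# Inspect the data
str(data3)                 # View structure of the dataset
summary(data3)             # Summary statistics
colSums(is.na(data3))      # Count missing values in each column

# Convert the target variable to a factor (class label)
data3$Smoke <- as.factor(data3$Smoke)

# Split into 70% training and 30% test data, preserving the class ratio
set.seed(123)              # For reproducibility
library(caTools)
split <- sample.split(data3$Smoke, SplitRatio = 0.7)
train_data <- subset(data3, split == TRUE)
test_data <- subset(data3, split == FALSE)

# Fit a classification tree on the training data
library(rpart)
tree_model <- rpart(Smoke ~ LungCap.cc. + Age..years. + Height.inches.,
                    data = train_data,
                    method = "class")

# Visualize the tree
library(rpart.plot)
rpart.plot(tree_model)

# Predict class labels for the test data and tabulate them against the actual labels
predictions <- predict(tree_model, test_data, type = "class")
table(Predicted = predictions, Actual = test_data$Smoke)
confusionMatrix <- table(Predicted = predictions, Actual = test_data$Smoke)

# Accuracy: correct predictions (the diagonal) divided by all predictions
accuracy <- sum(diag(confusionMatrix)) / sum(confusionMatrix)
print(accuracy)

# Using the caret package for detailed metrics
# (the call still resolves to caret's confusionMatrix() function, because R
# skips non-function objects when it looks up a function name)
library(caret)
confusionMatrix(predictions, test_data$Smoke)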
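
The split step in the program relies on sample.split() from caTools, which keeps the ratio of class labels the same in both subsets. As a rough sketch of that idea in base R (not caTools' actual implementation), the code below samples 70% of the rows within each class separately. The label vector smoke is a hypothetical stand-in built from the class counts in the summary output (648 "no", 77 "yes").

```r
set.seed(123)  # for reproducibility

# Hypothetical labels with the same class counts as the Smoke column
smoke <- factor(c(rep("no", 648), rep("yes", 77)))

# For each class, draw 70% of that class's row indices for training
train_idx <- unlist(lapply(split(seq_along(smoke), smoke), function(idx) {
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
}))
is_train <- seq_along(smoke) %in% train_idx

table(smoke[is_train])   # class counts in the training set
table(smoke[!is_train])  # class counts in the test set
```

With these counts the test set holds 194 "no" and 23 "yes" rows, the same class totals that appear in the confusion matrix of the OUTPUT section.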

OUTPUT

> str(data3) # View structure of the dataset

'data.frame': 725 obs. of 6 variables:
$ LungCap.cc. : num 6.47 10.12 9.55 11.12 4.8 ...
$ Age..years. : int 6 18 16 14 5 11 8 11 15 11 ...
$ Height.inches.: num 62.1 74.7 69.7 71 56.9 58.7 63.3 70.4 70.5 59.2 ...
$ Smoke : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 1 1 1 1 ...
$ Gender : chr "male" "female" "female" "male" ...
$ Caesarean : chr "no" "no" "yes" "no" ...
> summary(data3) # Summary statistics
  LungCap.cc.      Age..years.    Height.inches.
 Min.   : 0.507   Min.   : 3.00   Min.   :45.30
 1st Qu.: 6.150   1st Qu.: 9.00   1st Qu.:59.90
 Median : 8.000   Median :13.00   Median :65.40
 Mean   : 7.863   Mean   :12.33   Mean   :64.84
 3rd Qu.: 9.800   3rd Qu.:15.00   3rd Qu.:70.30
 Max.   :14.675   Max.   :19.00   Max.   :81.80
 Smoke        Gender            Caesarean
 no :648   Length:725         Length:725
 yes: 77   Class :character   Class :character
           Mode  :character   Mode  :character

> colSums(is.na(data3))
   LungCap.cc.    Age..years. Height.inches.
             0              0              0
         Smoke         Gender      Caesarean
             0              0              0
> data3$Smoke <- as.factor(data3$Smoke)
> set.seed(123) # For reproducibility
> library(caTools)
> split <- sample.split(data3$Smoke, SplitRatio = 0.7)
> train_data <- subset(data3, split == TRUE)
> test_data <- subset(data3, split == FALSE)
> library(rpart)
> tree_model <- rpart(Smoke ~ LungCap.cc. + Age..years. + Height.inches.,
+ data = train_data,
+ method = "class")
> # Visualize the tree
> library(rpart.plot)
> rpart.plot(tree_model)
> predictions <- predict(tree_model, test_data, type = "class")
> table(Predicted = predictions, Actual = test_data$Smoke)
         Actual
Predicted  no yes
      no  189  20
      yes   5   3
> confusionMatrix <- table(Predicted = predictions, Actual = test_data$Smoke)
> # Accuracy
> accuracy <- sum(diag(confusionMatrix)) / sum(confusionMatrix)
> print(accuracy)
[1] 0.8847926
> # Using the caret package for detailed metrics
> library(caret)
Loading required package: ggplot2
Use suppressPackageStartupMessages() to eliminate
package startup messages
Loading required package: lattice
> confusionMatrix(predictions, test_data$Smoke)
Confusion Matrix and Statistics

          Reference
Prediction  no yes
       no  189  20
       yes   5   3

Accuracy : 0.8848
95% CI : (0.8346, 0.924)
No Information Rate : 0.894
P-Value [Acc > NIR] : 0.71613

Kappa : 0.1469

Mcnemar's Test P-Value : 0.00511

Sensitivity : 0.9742
Specificity : 0.1304
Pos Pred Value : 0.9043
Neg Pred Value : 0.3750
Prevalence : 0.8940
Detection Rate : 0.8710
Detection Prevalence : 0.9631
Balanced Accuracy : 0.5523

'Positive' Class : no
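
The main statistics in the caret output can be reproduced by hand from the four cells of the confusion matrix. The sketch below does this in base R, taking "no" as the positive class as caret reports; the variable names (cm, tp, fn, fp, tn and the metric names) are hypothetical.

```r
# Confusion matrix from the output above (rows = predicted, columns = actual)
cm <- matrix(c(189, 5, 20, 3), nrow = 2,
             dimnames = list(Predicted = c("no", "yes"),
                             Actual    = c("no", "yes")))

# Cell counts, with "no" as the positive class
tp <- cm["no",  "no"]    # predicted no,  actually no
fn <- cm["yes", "no"]    # predicted yes, actually no
fp <- cm["no",  "yes"]   # predicted no,  actually yes
tn <- cm["yes", "yes"]   # predicted yes, actually yes

accuracy            <- (tp + tn) / sum(cm)
sensitivity         <- tp / (tp + fn)          # recall for the positive class
specificity         <- tn / (tn + fp)
pos_pred_value      <- tp / (tp + fp)          # precision for the positive class
neg_pred_value      <- tn / (tn + fn)
balanced_accuracy   <- (sensitivity + specificity) / 2
no_information_rate <- max(colSums(cm)) / sum(cm)  # always predicting the majority class

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, pos_pred_value = pos_pred_value,
        neg_pred_value = neg_pred_value,
        balanced_accuracy = balanced_accuracy,
        no_information_rate = no_information_rate), 4)
```

The accuracy of 0.8848 is slightly below the No Information Rate of 0.8940, and the specificity is only 0.1304: the tree identifies just 3 of the 23 smokers in the test set. With classes this imbalanced, accuracy alone overstates how well the model separates the two groups.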
