Supervised Learning: Classification, Part 2
Nearest Neighbor, Part 2
Consider the following dataset and apply k-NN to classify a sample point
with sepal length = 5.2 and sepal width = 3.1:
• If k = 1: setosa
• If k = 2: setosa
• If k = 3: setosa
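As a sketch of how this classification could be reproduced in R (assuming the slide's dataset resembles the sepal measurements in R's built-in iris data; the slide's actual table is not shown here), the knn() function from the class package can be applied directly:

```r
# Sketch only: uses R's built-in iris data as a stand-in for the slide's dataset.
library(class)

train <- iris[, c("Sepal.Length", "Sepal.Width")]   # two predictor features
labels <- iris$Species                              # class labels
sample_point <- data.frame(Sepal.Length = 5.2, Sepal.Width = 3.1)

# Classify the sample point for k = 1, 2, and 3
for (k in 1:3) {
  pred <- knn(train = train, test = sample_point, cl = labels, k = k)
  cat("k =", k, "->", as.character(pred), "\n")
}
```

With the iris data, the nearest neighbors of (5.2, 3.1) fall in the setosa cluster, matching the answers above.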
Diagnosing Breast Cancer with
the k-NN Algorithm
• We will utilize the Wisconsin Breast Cancer
Diagnostic dataset.
• The breast cancer data includes 569 examples
of cancer biopsies, each with 32 features.
• One feature is an identification number,
another is the cancer diagnosis, and 30 are
numeric-valued laboratory measurements.
• The diagnosis is coded as "M" to indicate
malignant or "B" to indicate benign.
• Download the wisc_bc_data.csv file and save
it to your R working directory.
• Save the Wisconsin breast cancer data to the
wbcd data frame:
> wbcd <- read.csv("wisc_bc_data.csv",
stringsAsFactors = FALSE)
• To examine the structure of wbcd,
execute:
> str(wbcd)
• The first variable is an integer variable named id. As
this is simply a unique identifier (ID) for each
patient in the data, it does not provide useful
information, and we will need to exclude it from
the model.
> wbcd <- wbcd[-1]
• The next variable indicates whether the example is
from a benign or malignant mass. The table()
output indicates that 357 masses are benign while
212 are malignant:
> table(wbcd$diagnosis)
• Many R classifiers require the target feature to be coded as a factor, so we will need to recode the diagnosis variable.
> wbcd$diagnosis<- factor(wbcd$diagnosis,
levels = c("B", "M"), labels = c("Benign",
"Malignant"))
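To see what this recoding does on a small scale, the same factor() call can be tried on a toy vector (the vector here is made up for illustration):

```r
# Toy example: recode a hypothetical vector of diagnosis codes
codes <- c("B", "M", "B", "B", "M")
diagnosis <- factor(codes, levels = c("B", "M"),
                    labels = c("Benign", "Malignant"))

diagnosis         # values now display as Benign / Malignant
table(diagnosis)  # Benign: 3, Malignant: 2
```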
• Now, when we look at the prop.table() output,
we notice that the values have been labeled
Benign and Malignant with 62.7 percent and
37.3 percent of the masses, respectively:
> round(prop.table(table(wbcd$diagnosis)) *
100, digits = 1)
• The remaining 30 features are all numeric.
>summary(wbcd[c("radius_mean",
"area_mean", "smoothness_mean")])
• The summary output shows that the features are measured on very
different scales: area_mean ranges into the thousands, while
smoothness_mean never exceeds 1. Without rescaling, area will have
a much larger impact than smoothness on the distance calculation.
• To normalize these features, we need to create a
normalize() function.
> normalize <- function(x) {
return ((x - min(x)) / (max(x) - min(x)))
}
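To confirm the function behaves as expected, it can be tested on a couple of simple vectors:

```r
# Min-max normalization: rescales a numeric vector to the range [0, 1]
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

normalize(c(1, 2, 3, 4, 5))        # 0.00 0.25 0.50 0.75 1.00
normalize(c(10, 20, 30, 40, 50))   # same result: the scale doesn't matter
```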
• We can now apply the normalize() function to the
numeric features in our data frame.
• The lapply() function takes a list and applies a
specified function to each list element.
> wbcd_n <- as.data.frame(lapply(wbcd[2:31],
normalize))
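The effect of this lapply() step can be sketched on a small synthetic data frame standing in for wbcd[2:31] (the column names and values below are made up for illustration):

```r
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

# Two features on wildly different scales, like area_mean vs. smoothness_mean
df <- data.frame(area = c(100, 550, 1000),
                 smoothness = c(0.05, 0.10, 0.15))

# lapply() applies normalize() to each column; the result is re-wrapped
# into a data frame
df_n <- as.data.frame(lapply(df, normalize))
df_n  # both columns now run from 0 to 1, so neither dominates the distance
```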
• We will use the first 469 records for the training dataset
and the remaining 100 to simulate new patients.
• we will split the wbcd_n data frame into wbcd_train
and wbcd_test:
> wbcd_train <- wbcd_n[1:469, ]
> wbcd_test <- wbcd_n[470:569, ]
• When we constructed our normalized training and
test datasets, we excluded the target variable,
diagnosis.
• For training the k-NN model, we will need to store
these class labels in factor vectors, split between the
training and test datasets:
> wbcd_train_labels <- wbcd[1:469, 1]
> wbcd_test_labels <- wbcd[470:569, 1]
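With the data prepared, a natural next step (sketched here; it assumes the wbcd_train, wbcd_test, and label objects built above, along with the knn() function from the class package; k = 21 is one common choice, roughly the square root of the 469 training records) is to classify the test records and compare predictions against the true labels:

```r
# Sketch of the next step: classify the 100 test records with k-NN.
# Assumes the objects created in the preceding steps exist.
library(class)

wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test,
                      cl = wbcd_train_labels, k = 21)

# Cross-tabulate predicted vs. actual labels to evaluate accuracy
table(wbcd_test_pred, wbcd_test_labels)
```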