
Week Ten Exercise

Abhishek Srivastava

2024-08-10

Introduction to Machine Learning


# Load necessary libraries
library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.4.1

## Warning: package 'tidyr' was built under R version 4.4.1

## Warning: package 'purrr' was built under R version 4.4.1

## Warning: package 'dplyr' was built under R version 4.4.1

## Warning: package 'forcats' was built under R version 4.4.1

## Warning: package 'lubridate' was built under R version 4.4.1

## ── Attaching core tidyverse packages ──────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(class)    # provides knn()
library(ggplot2)  # already attached via tidyverse; loading again is harmless

# Load the datasets
binary_data <- read.csv("binary-classifier-data.csv")
trinary_data <- read.csv("trinary-classifier-data.csv")

# Plot the data from the binary dataset
ggplot(binary_data, aes(x = x, y = y, color = factor(label))) +
  geom_point() +
  labs(title = "Binary Classifier Data", x = "X", y = "Y")

# Plot the data from the trinary dataset
ggplot(trinary_data, aes(x = x, y = y, color = factor(label))) +
  geom_point() +
  labs(title = "Trinary Classifier Data", x = "X", y = "Y")
# Function to calculate the Euclidean distance between two points
euclidean_distance <- function(p1, p2) {
  sqrt((p1$x - p2$x)^2 + (p1$y - p2$y)^2)
}
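This helper is illustrative: knn() from the class package computes distances internally, so euclidean_distance() is never called by the code below. A quick check of what it computes, using the first two rows of the binary dataset:

# Distance between the first two observations (illustrative only)
euclidean_distance(binary_data[1, ], binary_data[2, ])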

# Function to predict labels with the k-nearest neighbors algorithm
knn_predict <- function(train_data, test_data, k) {
  predicted_labels <- knn(train = train_data[, c("x", "y")],
                          test = test_data[, c("x", "y")],
                          cl = train_data$label,
                          k = k)
  return(predicted_labels)
}

# Function to calculate classification accuracy
calculate_accuracy <- function(true_labels, predicted_labels) {
  accuracy <- sum(true_labels == predicted_labels) / length(true_labels)
  return(accuracy)
}

# Fit a k-nearest neighbors model to each dataset and compute accuracy
k_values <- c(3, 5, 10, 15, 20, 25)
accuracy_results <- data.frame()

for (k in k_values) {
  binary_predicted_labels <- knn_predict(binary_data, binary_data, k)
  binary_accuracy <- calculate_accuracy(binary_data$label,
                                        binary_predicted_labels)

  trinary_predicted_labels <- knn_predict(trinary_data, trinary_data, k)
  trinary_accuracy <- calculate_accuracy(trinary_data$label,
                                         trinary_predicted_labels)

  accuracy_results <- rbind(accuracy_results,
                            data.frame(k = k,
                                       binary_accuracy = binary_accuracy,
                                       trinary_accuracy = trinary_accuracy))
}
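Note that the loop above evaluates each model on the same data it was trained on, which tends to overstate accuracy (with k = 1 it would be nearly perfect). A minimal hold-out sketch, assuming an arbitrary 70/30 split and k = 5:

# Hold-out evaluation sketch (split proportion and k are arbitrary choices)
set.seed(42)  # for a reproducible split
train_idx <- sample(nrow(binary_data), size = 0.7 * nrow(binary_data))
train_set <- binary_data[train_idx, ]
test_set <- binary_data[-train_idx, ]

holdout_preds <- knn_predict(train_set, test_set, k = 5)
holdout_accuracy <- calculate_accuracy(test_set$label, holdout_preds)
holdout_accuracy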

# Plot accuracy results
ggplot(accuracy_results, aes(x = k)) +
  geom_line(aes(y = binary_accuracy, color = "Binary Dataset")) +
  geom_line(aes(y = trinary_accuracy, color = "Trinary Dataset")) +
  labs(title = "Accuracy of k-Nearest Neighbors Model",
       x = "k", y = "Accuracy") +
  scale_color_manual(values = c("blue", "red"))

# A linear classifier is unlikely to work well on these datasets because the
# data points are not linearly separable.

# The accuracy of last week's logistic regression classifier may differ from
# k-nearest neighbors because of the models' underlying assumptions. Logistic
# regression assumes a linear relationship between the features and the
# log-odds of the outcome, while k-nearest neighbors predicts from the
# similarity of nearby points, so it can capture non-linear decision
# boundaries. Which method performs better depends on the distribution of
# the data and the relationships between the features.
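To make that comparison concrete, a minimal logistic regression baseline could be fit on the binary dataset and scored the same way. This is a sketch, assuming the label is coded 0/1; the model from last week's exercise may have been specified differently.

# Logistic regression baseline on the binary dataset (training accuracy)
logit_model <- glm(label ~ x + y, data = binary_data, family = binomial)
logit_probs <- predict(logit_model, type = "response")
logit_preds <- ifelse(logit_probs > 0.5, 1, 0)
calculate_accuracy(binary_data$label, logit_preds)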

Clustering Solution

### Summary:

1. Data Preparation:
• Load the dataset: load clustering-data.csv.
• Inspect the data: examine the structure and content of the dataset.

2. Data Visualization:
• Scatter plot: plot the data points in a scatter plot to visualize the dataset.

3. K-Means Clustering:
• Fit the model: apply the k-means clustering algorithm to the dataset for each value of k from 2 to 12.
• Visualize clusters: create scatter plots showing the resulting clusters for each value of k.

4. Average Distance Calculation:
• Calculate distance: compute the average distance of each data point to the center of its assigned cluster.
• Plot results: plot these average distances with k on the x-axis and the average distance on the y-axis.

5. Determine Elbow Point:
• Elbow point: analyze the graph to identify the “elbow point,” the value of k at which the average distance begins to decrease more slowly.

Detailed Steps:

a. Load and Inspect Data:

# Load libraries (both are already attached via tidyverse above)
library(ggplot2)
library(dplyr)

# Load the dataset
data <- read.csv("clustering-data.csv")

# Inspect the dataset
head(data)

## x y
## 1 46 236
## 2 69 236
## 3 144 236
## 4 171 236
## 5 194 236
## 6 195 236

b. Plot Data:

# Scatter plot of the dataset
ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  ggtitle("Clustering Data")

c. Implement K-Means and Visualize Clusters:

# Function to fit k-means and plot the resulting clusters
plot_clusters <- function(data, k) {
  kmeans_result <- kmeans(data, centers = k)
  data$cluster <- as.factor(kmeans_result$cluster)

  ggplot(data, aes(x = x, y = y, color = cluster)) +
    geom_point() +
    ggtitle(paste("K-Means Clustering with k =", k))
}

# Plot clusters for k from 2 to 12
for (k in 2:12) {
  print(plot_clusters(data, k))
}
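One caveat: kmeans() starts from randomly chosen centers, so plot_clusters() here and average_distance() below can converge to different solutions for the same k. A small sketch of one way to stabilize a run (the seed value and nstart = 25 are arbitrary choices):

# Make a k-means run reproducible and less sensitive to initialization
set.seed(123)                                            # arbitrary seed
stable_result <- kmeans(data, centers = 3, nstart = 25)  # 25 random restarts
stable_result$tot.withinss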
d. Calculate Average Distance:

# Function to calculate the average distance of points from their cluster center
average_distance <- function(data, k) {
  kmeans_result <- kmeans(data, centers = k)
  data$cluster <- kmeans_result$cluster
  centers <- kmeans_result$centers

  distances <- sapply(1:nrow(data), function(i) {
    cluster_center <- centers[data$cluster[i], ]
    sqrt(sum((data[i, 1:2] - cluster_center)^2))
  })

  avg_distance <- mean(distances)
  return(avg_distance)
}

# Calculate average distances for k from 2 to 12
k_values <- 2:12
avg_distances <- sapply(k_values, average_distance, data = data)
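As an aside, kmeans() already reports the within-cluster sum of squares, so an alternative elbow curve can be built without recomputing distances by hand. Its scale differs from avg_distances (it sums squared distances rather than averaging unsquared ones), but the elbow shape is usually comparable; a sketch:

# Alternative: total within-cluster sum of squares straight from kmeans()
wss <- sapply(k_values, function(k) kmeans(data, centers = k)$tot.withinss)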

e. Plot Average Distances:

# Line plot of average distances
avg_distance_data <- data.frame(k = k_values, avg_distance = avg_distances)

ggplot(avg_distance_data, aes(x = k, y = avg_distance)) +
  geom_line() +
  geom_point() +
  ggtitle("Average Distance from Cluster Center vs. k") +
  xlab("Number of Clusters (k)") +
  ylab("Average Distance")

f. Determine Elbow Point:

• Elbow point analysis: the “elbow point” is typically where the curve starts to flatten out, indicating that adding more clusters no longer reduces the average distance significantly. In this example, the elbow appears to be at k = 6.
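That visual reading can be sanity-checked numerically, for example by finding where the drop in average distance slows the most (the largest second difference). This is a rough heuristic, not a formal test:

# Rough numeric check of the elbow: largest second difference in the curve
drops <- diff(avg_distances)                 # change between consecutive k
elbow_k <- k_values[which.max(diff(drops)) + 1]
elbow_k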
