Week 10 Abhishek Srivastava VFinal
Abhishek Srivastava
2024-08-10
library(class)
library(ggplot2)
# Fit a k-nearest neighbors model for each value of k and record the accuracy
k_values <- c(3, 5, 10, 15, 20, 25)
accuracy_results <- data.frame()
for (k in k_values) {
  binary_predicted_labels <- knn_predict(binary_data, binary_data, k)
  binary_accuracy <- calculate_accuracy(binary_data$label,
                                        binary_predicted_labels)
  accuracy_results <- rbind(accuracy_results,
                            data.frame(k = k, accuracy = binary_accuracy))
}
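The loop above calls two helper functions, `knn_predict()` and `calculate_accuracy()`, whose definitions are assumed from earlier in the assignment. A minimal sketch of what they might look like, assuming the class column is named `label` and using `class::knn()`, is:

```r
# Hypothetical helper definitions (a sketch, not the original ones):
# knn_predict() wraps class::knn(); calculate_accuracy() compares labels.
library(class)

knn_predict <- function(train, test, k) {
  # Assumes a "label" column; every other column is treated as a feature
  features <- setdiff(names(train), "label")
  knn(train = train[, features, drop = FALSE],
      test  = test[, features, drop = FALSE],
      cl    = train$label, k = k)
}

calculate_accuracy <- function(true_labels, predicted_labels) {
  # Fraction of predictions that match the true labels
  mean(true_labels == predicted_labels)
}
```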
A linear classifier might not work well on these datasets because the data points are not linearly separable.

The accuracy of last week's logistic regression classifier may differ from k-nearest neighbors because of the models' underlying assumptions and how they handle the data. Logistic regression assumes a linear relationship between the features and the log-odds of the output, while k-nearest neighbors predicts from the similarity of nearby data points. The two methods can therefore perform differently depending on the distribution of the data and the relationships between features.
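This difference can be illustrated on synthetic data with a circular class boundary, where no straight line separates the classes. The dataset and both fits below are illustrative assumptions, not part of the assignment:

```r
# Illustrative sketch: on circularly separated data, logistic regression's
# linear decision boundary struggles, while kNN adapts to the local shape.
library(class)

set.seed(42)
n <- 200
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
label <- factor(ifelse(x1^2 + x2^2 < 0.4, 1, 0))  # circular boundary
df <- data.frame(x1, x2, label)

# Logistic regression: log-odds linear in x1 and x2 -> straight boundary
logit_fit  <- glm(label ~ x1 + x2, data = df, family = binomial)
logit_pred <- factor(ifelse(predict(logit_fit, type = "response") > 0.5,
                            1, 0), levels = levels(label))

# kNN: prediction from the 5 nearest neighbors, no linearity assumption
knn_pred <- knn(df[, c("x1", "x2")], df[, c("x1", "x2")], df$label, k = 5)

logit_acc <- mean(logit_pred == df$label)  # limited by the linear boundary
knn_acc   <- mean(knn_pred == df$label)    # follows the circular boundary
```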
Clustering Solution
### Summary:
1. Data Preparation:
• Load the dataset: read in clustering-data.csv.
• Inspect the data: Examine the structure and content of the dataset.
2. Data Visualization:
• Scatter plot: Plot the data points using a scatter plot to visualize the dataset.
3. K-Means Clustering:
• Fit the model: Apply the k-means clustering algorithm to the dataset for different
values of k (from 2 to 12).
• Visualize clusters: Create scatter plots showing the resultant clusters for each
value of k.
Detailed Steps:
a. Load and Inspect Data:
# Load dataset
data <- read.csv("clustering-data.csv")
# Inspect dataset
head(data)
## x y
## 1 46 236
## 2 69 236
## 3 144 236
## 4 171 236
## 5 194 236
## 6 195 236
b. Plot Data:
# Scatter plot of the dataset
ggplot(data, aes(x = x, y = y)) +
geom_point() +
ggtitle("Clustering Data")
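The k-means step described in the summary can be sketched as a loop over k = 2 to 12, fitting a model and plotting the resulting clusters each time (assuming `data` loaded above with columns `x` and `y`):

```r
# Fit k-means for k = 2..12 and plot the clusters for each k
library(ggplot2)

set.seed(123)  # k-means uses random starts; fix the seed for reproducibility
for (k in 2:12) {
  fit <- kmeans(data[, c("x", "y")], centers = k, nstart = 10)
  p <- ggplot(data, aes(x = x, y = y, color = factor(fit$cluster))) +
    geom_point() +
    labs(color = "Cluster") +
    ggtitle(paste("k-means clustering, k =", k))
  print(p)  # print() is required to render ggplot objects inside a loop
}
```

`nstart = 10` reruns the algorithm from ten random initializations and keeps the best result, which makes the clustering less sensitive to an unlucky starting assignment.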