
Record

The document outlines various machine learning algorithms implemented in R and Python, including regression models, logistic regression, decision trees, KNN, K-means, and K-medoids. Each experiment includes an aim, algorithm steps, source code, and results indicating successful execution. Additionally, it covers correlation and covariance analysis using R with visualizations and statistical evaluations.


Exp 1
Regression Model

Aim : To write a program for a regression model using R, importing the data from a web page.

Algorithm :

Step 1 : Start

Step 2 : Load ggplot2 library.

Step 3 : Import Iris dataset from URL.

Step 4 : Fit linear regression: sepal_length ~ sepal_width + petal_length + petal_width.

Step 5 : Display model summary.

Step 6 : Predict sepal_length using the model.

Step 7 : Plot actual vs predicted values.

Step 8 : Stop

Source code :

library(ggplot2)

url <- "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"

df <- read.csv(url)

model <- lm(sepal_length ~ sepal_width + petal_length + petal_width, data = df)

summary(model)

predictions <- predict(model, df)

ggplot(df, aes(x = sepal_length, y = predictions)) +

geom_point(color = 'blue') +

geom_abline(slope = 1, intercept = 0, color = 'red', linetype = "dashed") +

labs(title = "Actual vs Predicted Sepal Length", x = "Actual", y = "Predicted") +

theme_minimal()
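As an optional check (a sketch added for clarity, reusing the model and predictions objects defined above), the quality of the fit can also be summarised numerically:

# Optional: quantify the fit numerically
rmse <- sqrt(mean((df$sepal_length - predictions)^2))
cat("R-squared:", summary(model)$r.squared, " RMSE:", rmse, "\n")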

Result : The program has been successfully executed.

Output:
Exp 2

Logistic regression
Aim : To write an R program for a logistic regression model.

Algorithm:

Step 1: Start

Step 2: Load libraries: dplyr, ggplot2.

Step 3: Generate data: Create random GRE, GPA, Rank, and Admission data.

Step 4: Fit model: Logistic regression (glm) with Admission as the target.

Step 5: Predict: Calculate predicted probabilities.

Step 6: Plot: Visualize GRE vs. predicted probabilities, colored by Admission.

Step 7: Stop

Source code

library(dplyr)

library(ggplot2)

set.seed(123)

n <- 200

data <- data.frame(

GRE = rnorm(n, mean = 320, sd = 10),

GPA = rnorm(n, mean = 3.5, sd = 0.5),

Rank = sample(1:4, n, replace = TRUE),

Admission = sample(0:1, n, replace = TRUE)

)

model <- glm(Admission ~ GRE + GPA + Rank, data = data, family = binomial)

summary(model)

data$Predicted_Probabilities <- predict(model, type = "response")

ggplot(data, aes(x = GRE, y = Predicted_Probabilities, color = as.factor(Admission))) +



geom_point() +

labs(title = "Predicted Probabilities of Admission based on GRE Score",

x = "GRE Score",

y = "Predicted Probability of Admission") +

theme_minimal()
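Optionally, the predicted probabilities can be converted to class labels at a 0.5 cutoff (a minimal sketch; because Admission is simulated at random here, accuracy near 0.5 is expected):

# Optional: classify at a 0.5 threshold and check in-sample accuracy
data$Predicted_Class <- ifelse(data$Predicted_Probabilities > 0.5, 1, 0)
cat("In-sample accuracy:", mean(data$Predicted_Class == data$Admission), "\n")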

Result : The program has been successfully executed.

Output:
Exp 3

Apriori algorithm
Aim : To implement the Apriori algorithm in Python.

Algorithm:

Step 1: Start

Step 2: Input Dataset: Provide a list of transactions, where each transaction is a list of items.

Step 3: Data Preprocessing: Convert the dataset to a one-hot encoded DataFrame.

Step 4: Apply Apriori Algorithm: Use the apriori() function to find frequent itemsets with a
minimum support threshold.

Step 5: Generate Association Rules: Use association_rules() to generate rules from the
frequent itemsets, filtering based on a metric like "lift".

Step 6: Display Results: Show frequent itemsets and association rules.

Step 7: Stop

Source code:

import pandas as pd

from mlxtend.frequent_patterns import apriori, association_rules

from mlxtend.preprocessing import TransactionEncoder

dataset = [ ['Milk', 'Bread', 'Butter'],

['Milk', 'Bread'],

['Bread', 'Butter'],

['Milk', 'Bread', 'Butter'],

['Milk', 'Butter']]

# One-hot encode the transactions; building the DataFrame directly from the

# list mis-aligns items to columns, so TransactionEncoder is used instead

te = TransactionEncoder()

df = pd.DataFrame(te.fit(dataset).transform(dataset), columns=te.columns_)

frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

print(frequent_itemsets)

rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)

print(rules)
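The rules table can be narrowed further if desired. A minimal sketch, assuming the standard association_rules output columns and an illustrative confidence cutoff of 0.7:

# Optional: keep only higher-confidence rules (0.7 is an illustrative cutoff)
strong_rules = rules[rules['confidence'] >= 0.7]
print(strong_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])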

Result : The program has been successfully executed.

Output:
Exp 4

K- Nearest Neighbor Algorithm

Aim : To implement the KNN algorithm in Python.

Algorithm :

Step 1 : Start

Step 2 : Load the Iris dataset

Step 3 : Create a DataFrame to view the data (optional)

Step 4 : Split the data into training and testing sets

Step 5 : Initialize the KNN classifier with k = 3

Step 6 : Train the classifier on the training data

Step 7 : Predict class labels for the test data

Step 8 : Visualize the predictions using a scatter plot of the first two features

Step 9 : Stop

Source code:

import pandas as pd

import matplotlib.pyplot as plt

from sklearn import datasets

from sklearn.datasets import make_blobs

from sklearn.neighbors import KNeighborsClassifier

from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)

print(df)

X = iris['data']

y = iris['target']

# X, y = make_blobs(n_samples=500, n_features=2, centers=4, cluster_std=1.5, random_state=4)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)

knn3 = KNeighborsClassifier(n_neighbors = 3)

knn3.fit(X_train, y_train)

y_pred_3 = knn3.predict(X_test)

plt.scatter(X_test[:,0], X_test[:,1], c=y_pred_3)

plt.title("Predicted values with k=3", fontsize=20)

plt.show()
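Since the true test labels are available, the predictions can also be scored numerically (an optional addition to the experiment):

# Optional: score the k=3 predictions against the held-out labels
from sklearn.metrics import accuracy_score
print("Accuracy (k=3):", accuracy_score(y_test, y_pred_3))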

Result : The program has been successfully executed.

Output :
Exp 5

Decision Tree Algorithm

Aim : To implement a decision tree in Python.

Algorithm :

Step 1 : Start

Step 2 : Load the Iris dataset

Step 3 : Split data into training and testing sets (70/30)

Step 4 : Train a Decision Tree classifier on the training data

Step 5 : Predict the target values for the test data

Step 6 : Calculate and print the model’s accuracy

Step 7 : Visualize the trained decision tree with feature and class names

Step 8 : Stop

Source code :

from sklearn.datasets import load_iris

from sklearn.tree import DecisionTreeClassifier

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score

from sklearn.tree import plot_tree

import matplotlib.pyplot as plt

iris = load_iris()

X = iris.data # Features

y = iris.target # Labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier()

clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)



print("Accuracy:", accuracy)


plt.figure(figsize=(12, 8))

plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)

plt.show()
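Where a figure is inconvenient, scikit-learn can also render the same tree as plain text (an optional sketch using export_text):

# Optional: plain-text view of the fitted tree
from sklearn.tree import export_text
print(export_text(clf, feature_names=list(iris.feature_names)))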

Result : The program has been successfully executed.

Output :
Exp 6

K Means Algorithm

Aim : To implement the K-Means algorithm in Python.

Algorithm :

Step 1 : Start

Step 2 : Load the Iris dataset

Step 3 : Set the number of clusters (k = 3)

Step 4 : Apply K-Means algorithm to fit the data

Step 5 : Assign each data point to a cluster

Step 6 : Print the cluster centers

Step 7 : Plot the data points colored by cluster labels using the first two features

Step 8 : Stop

Source code :

from sklearn.datasets import load_iris

from sklearn.cluster import KMeans

import matplotlib.pyplot as plt

iris = load_iris()

X = iris.data

kmeans = KMeans(n_clusters=3, random_state=42)

kmeans.fit(X)

labels = kmeans.labels_

print("Cluster centers:")

print(kmeans.cluster_centers_)

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')

plt.title("K-Means Clustering (Iris Dataset)")



plt.xlabel("Feature 1")

plt.ylabel("Feature 2")

plt.show()
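Step 3 fixes k = 3 because Iris has three species; when k is not known in advance, the elbow method is a common check. A minimal sketch plotting the KMeans inertia over a range of k values:

# Optional: elbow method -- inertia (within-cluster sum of squares) vs. k
inertias = [KMeans(n_clusters=k, random_state=42, n_init=10).fit(X).inertia_ for k in range(1, 8)]
plt.plot(range(1, 8), inertias, marker='o')
plt.xlabel("k")
plt.ylabel("Inertia")
plt.title("Elbow Method")
plt.show()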

Result : The program has been successfully executed.

Output :
Exp 7

K-Medoids Algorithm

Aim : To implement the K-Medoids algorithm in Python.

Algorithm :

Step 1 : Start

Step 2 : Load the Iris dataset

Step 3 : Initialize k random medoids from the data

Step 4 : Assign each data point to the nearest medoid (using Euclidean distance)

Step 5 : Update each medoid to the most central point (minimize total distance in its cluster)

Step 6 : Repeat steps 4–5 until medoids do not change or max iterations reached

Step 7 : Plot the final clusters and medoids using the first two features

Step 8 : Stop

Source code :

import numpy as np
import random
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

iris = load_iris()
X = iris.data

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

def assign_points(X, medoids):
    # Assign every point to the cluster of its nearest medoid
    clusters = {}
    for idx, medoid in enumerate(medoids):
        clusters[idx] = []
    for point in X:
        distances = [euclidean_distance(point, medoid) for medoid in medoids]
        cluster_idx = np.argmin(distances)
        clusters[cluster_idx].append(point)
    return clusters

def calculate_cost(clusters, medoids):
    # Total distance of all points to their cluster's medoid
    cost = 0
    for idx, cluster_points in clusters.items():
        for point in cluster_points:
            cost += euclidean_distance(point, medoids[idx])
    return cost

def update_medoids(clusters):
    # For each cluster, pick the member that minimises the summed
    # distance to all other members of that cluster
    new_medoids = []
    for cluster_points in clusters.values():
        min_cost = float('inf')
        best_medoid = None
        for candidate in cluster_points:
            cost = sum(euclidean_distance(candidate, point) for point in cluster_points)
            if cost < min_cost:
                min_cost = cost
                best_medoid = candidate
        new_medoids.append(best_medoid)
    return new_medoids

def k_medoids(X, k, max_iter=100):
    # Start from k random points and alternate assignment/update
    # until the medoids stop changing or max_iter is reached
    medoids = random.sample(list(X), k)
    for i in range(max_iter):
        clusters = assign_points(X, medoids)
        new_medoids = update_medoids(clusters)
        if np.all([np.array_equal(m, nm) for m, nm in zip(medoids, new_medoids)]):
            break
        medoids = new_medoids
    return medoids, clusters

k = 3
medoids, clusters = k_medoids(X, k)

colors = ['r', 'g', 'b']
for idx, points in clusters.items():
    points = np.array(points)
    plt.scatter(points[:, 0], points[:, 1], c=colors[idx], label=f'Cluster {idx}')
    plt.scatter(medoids[idx][0], medoids[idx][1], c='black', marker='X', s=200)

plt.title("K-Medoids Clustering (from scratch)")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.show()
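Note that calculate_cost() is defined above but never called; an optional one-liner can report the objective value of the final clustering:

# Optional: total within-cluster distance of the final solution
print("Total clustering cost:", calculate_cost(clusters, medoids))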

Result : The program has been successfully executed.



Output :
Exp 8

Classification Model

Aim : To implement a classification model using R.

Algorithm :

Step 1 : Start

Step 2 : Load required packages (caret, e1071)

Step 3 : Import Iris dataset from the web

Step 4 : Convert species to a factor

Step 5 : Split data into training (70%) and testing (30%) sets

Step 6 : Train a k-NN model using train() from caret

Step 7 : Predict on the test data

Step 8 : Convert predictions to factor with correct levels

Step 9 : Evaluate using confusionMatrix()

Step 10 : Display accuracy and metrics

Step 11 : Stop

Source code :

# Install required packages (run once if not already installed)

install.packages("caret")

install.packages("e1071")

# Load the libraries

library(caret)

library(e1071)

# Import the Iris dataset from the web

url <- "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"

iris_data <- read.csv(url)



# Convert the target column to factor (important!)

iris_data$species <- as.factor(iris_data$species)

# Set seed for reproducibility

set.seed(123)

# Split the data into training (70%) and testing (30%)

trainIndex <- createDataPartition(iris_data$species, p = 0.7, list = FALSE)

trainData <- iris_data[trainIndex, ]

testData <- iris_data[-trainIndex, ]

# Train a k-NN classifier using caret

knn_model <- train(species ~ ., data = trainData, method = "knn", tuneLength = 5)

# View model details

print(knn_model)

# Predict on test data

predictions <- predict(knn_model, newdata = testData)

# Make sure predictions and actual values are factors with the same levels

predictions <- factor(predictions, levels = levels(testData$species))

# Evaluate model performance

conf_matrix <- confusionMatrix(predictions, testData$species)

print(conf_matrix)
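If only the headline number is needed, the overall accuracy can be pulled out of the caret confusion-matrix object (an optional sketch):

# Optional: extract just the overall accuracy
cat("Test accuracy:", conf_matrix$overall["Accuracy"], "\n")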

Result : The program has been executed successfully.



Output :
Exp 9

Correlation and Covariance Analysis

Aim : To implement correlation and covariance analysis using R

Algorithm :

Step 1 : Start

Step 2 : Load libraries and iris dataset

Step 3 : Compute correlation matrix (numeric columns only)

Step 4 : Plot the correlation matrix using corrplot

Step 5 : Perform ANOVA: Sepal.Length ~ Species

Step 6 : Visualize ANOVA result with a boxplot

Step 7 : Stop

Source code :

library(ggplot2)

library(corrplot)

library(dplyr)

# Load the iris dataset

data(iris)

### a. Compute the correlation matrix

# We'll exclude the Species column since it's categorical

cor_matrix <- cor(iris[, 1:4]) # Only numerical columns

print("Correlation Matrix:")

print(cor_matrix)
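The aim also mentions covariance, which the correlation matrix alone does not cover; a minimal sketch for the same four numeric columns:

# Covariance matrix for the numeric columns (complements the correlation above)
cov_matrix <- cov(iris[, 1:4])
print("Covariance Matrix:")
print(cov_matrix)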

### b. Plot the correlation plot

corrplot(cor_matrix, method = "color", type = "upper",

tl.col = "black", tl.srt = 45, addCoef.col = "black",

title = "Correlation Plot of Iris Dataset", mar=c(0,0,1,0))



### c. Analysis of Variance (ANOVA)

# We'll check if species has a significant effect on Sepal.Length

# Fit the ANOVA model

anova_model <- aov(Sepal.Length ~ Species, data = iris)

summary(anova_model)

# Plotting boxplot for visual understanding

ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +

geom_boxplot() +

labs(title = "ANOVA: Sepal Length by Species",

x = "Species", y = "Sepal Length") +

theme_minimal()

Result : The program has been successfully executed

Output :