
G H RAISONI UNIVERSITY, AMRAVATI

SCHOOL OF ENGINEERING & TECHNOLOGY

Department of Computer Science & Engineering

Lab Manual

Subject: Machine Learning Algorithms


(UAIPR206)

Semester / Branch:
SEM-VI / BTECH CSE


NAME OF PROGRAM: B. TECH CSE

Name of Course: Machine Learning Algorithms Course Code: UAIPR206


Year/Semester: III / VI Session: 2024-25

Course Outcome: After completion of the course, students will be able to


CO1 : Understand modern notions in machine learning and computing
CO2 : Understand a wide variety of learning algorithms
CO3 : Be capable of confidently applying common Machine Learning algorithms in practice and implementing
their own
CO4 : Evaluate Machine Learning Models generated from data
CO5 : Apply the algorithms to a real problem, optimize the models learned and report on the expected accuracy
that can be achieved by applying the models

PRACTICAL LIST

Sr. No. List of Practical


1 Evaluate a regression model using the following metrics: 1. Mean Absolute Error 2. Mean Squared Error 3. R-squared (R²) Score
2 Implement a classification model on given dataset.
3 Implement a Linear Regression Model on given dataset.
4 Implement a Decision Tree on given dataset.
5 Identify Overfitting and Underfitting on given dataset.
6 Implement Logistic Regression on given dataset.
7 Implement Gaussian Naïve Bayes learning on given dataset.
8 Implement PCA Algorithm
9 Implement K-Means Clustering
10 Implement Gaussian Mixture Models

Practical Teachers: Dr. Mahip Bartere, Dr. Ajay Kumar, Prof. Sneha Bohra
HOD: Dr. Amit Gaikwad

Practical No 1
Aim: Write a program to evaluate a regression model using the following metrics:
1. Mean Absolute Error
2. Mean Squared Error
3. R-squared (R²) Score
Software Required: Python, VS Code or Google Colab
Theory: Regression algorithms are used to predict continuous numerical values based on input features.
Types of Regression Metrics
Some common regression metrics are:
1. Mean Absolute Error (MAE)
2. Mean Squared Error (MSE)
3. R-squared (R²) Score

Mean Absolute Error (MAE): In the fields of statistics and machine learning, the Mean Absolute Error (MAE) is a frequently employed metric. It measures the average absolute difference between a dataset's actual values and its predicted values.
Mathematical Formula
The formula to calculate MAE for a dataset with "n" data points is:

MAE = \frac{1}{n} \sum_{i=1}^{n} |x_i - y_i|

Where:
- x_i represents the actual or observed value for the i-th data point.
- y_i represents the predicted value for the i-th data point.

Mean Squared Error (MSE): A popular metric in statistics and machine learning is the Mean Squared Error (MSE). It measures the average of the squared differences between a dataset's actual values and its predicted values. MSE is frequently utilized in regression problems and is used to assess how well predictive models work.
Mathematical Formula
For a dataset containing "n" data points, the MSE calculation formula is:

MSE = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2

where:
- x_i represents the actual or observed value for the i-th data point.
- y_i represents the predicted value for the i-th data point.

R-squared (R²) Score: A statistical metric frequently used to assess the goodness of fit of a regression model is the R-squared (R²) score, also referred to as the coefficient of determination. It quantifies the proportion of the variance in the dependent variable that is explained by the model's independent variables, which makes it a useful statistic for evaluating the overall effectiveness and explanatory power of a regression model.
Mathematical Formula
The formula to calculate the R-squared score is as follows:

R^2 = 1 - \frac{SSR}{SST}

Where:
- R² is the R-squared score.
- SSR represents the sum of squared residuals between the predicted values and actual values.
- SST represents the total sum of squares, which measures the total variance in the dependent variable.
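
To connect these formulas with code, the snippet below (a minimal sketch using NumPy, separate from the program that follows) computes all three metrics by hand on the same sample values used in the program; the results match the scikit-learn output shown later.

import numpy as np

# Actual and predicted values (the same sample values used in the program below)
x = np.array([2.5, 3.7, 1.8, 4.0, 5.2])   # actual values x_i
y = np.array([2.1, 3.9, 1.7, 3.8, 5.0])   # predicted values y_i

mae = np.mean(np.abs(x - y))          # MAE = (1/n) * sum |x_i - y_i|
mse = np.mean((x - y) ** 2)           # MSE = (1/n) * sum (x_i - y_i)^2
ssr = np.sum((x - y) ** 2)            # SSR: sum of squared residuals
sst = np.sum((x - np.mean(x)) ** 2)   # SST: total sum of squares
r2 = 1 - ssr / sst                    # R^2 = 1 - SSR/SST

print("MAE:", mae)   # 0.22
print("MSE:", mse)   # 0.058
print("R2 :", r2)    # ~0.9589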

Program:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

true_values = [2.5, 3.7, 1.8, 4.0, 5.2]
predicted_values = [2.1, 3.9, 1.7, 3.8, 5.0]

mae = mean_absolute_error(true_values, predicted_values)
print("Mean Absolute Error:", mae)

mse = mean_squared_error(true_values, predicted_values)
print("Mean Squared Error:", mse)

# Store the result in a new variable so it does not shadow the imported r2_score function
r2 = r2_score(true_values, predicted_values)
print("R2-Square:", r2)

Output:
Mean Absolute Error: 0.22000000000000003
Mean Squared Error: 0.057999999999999996
R2-Square: 0.9588769143505389

Conclusion: Thus, we have successfully evaluated the regression model using various metrics.
Practical No 2
Aim: Implement and evaluate a classification model on given dataset.
Problem Statement: Use the Iris dataset to implement KNN classification.
Software Required: Python, VS Code or Google Colab
Theory: Classification is a type of supervised learning in machine learning, where the goal is to
predict the categorical label (or class) of a given input based on labeled training data. In other
words, the model is trained on a dataset where the input data is associated with a predefined class,
and it learns to map the input features to these classes. After training, the model can then classify
new, unseen data into one of the predefined classes.
Types of Classification
1. Binary Classification:
o A classification problem where the task is to classify the input data into one of two
classes.
o Example: Predicting whether an email is "spam" or "not spam."
2. Multiclass Classification:
o A classification problem where the task is to classify the input data into more than
two classes.
o Example: Predicting the type of flower (Iris dataset) into categories like "setosa,"
"versicolor," or "virginica."
3. Multilabel Classification:
o A problem where each data point can be assigned multiple labels simultaneously.
o Example: Predicting which genres a movie belongs to (action, comedy, drama, etc.).
A single movie can belong to multiple genres.

Common Classification Algorithms:


1. Logistic Regression
2. K-Nearest Neighbors (KNN)
3. Support Vector Machines (SVM)
4. Decision Trees
5. Random Forest
6. Naive Bayes
7. Neural Networks

Evaluation Metrics for Classification: To assess the performance of classification models, various
metrics are used, depending on the nature of the task and the class distribution:
1. Accuracy
2. Precision
3. Recall (Sensitivity)
4. F1 Score
5. Confusion Matrix
6. Area Under the ROC Curve (AUC-ROC)
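
As a brief illustration of how precision and recall relate to the confusion matrix (a minimal sketch with hypothetical binary labels, independent of the Iris program below):

from sklearn.metrics import confusion_matrix, f1_score

# Hypothetical true and predicted labels for a binary problem (1 = positive class)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("Precision = TP / (TP + FP) =", tp / (tp + fp))   # 0.8
print("Recall    = TP / (TP + FN) =", tp / (tp + fn))   # 0.8
print("F1 Score  =", f1_score(y_true, y_pred))          # 0.8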

Program:
# Import necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features
y = iris.target  # Target labels

# Convert the features into a DataFrame to display the features with column names
iris_df = pd.DataFrame(X, columns=iris.feature_names)

# Add the target variable 'species' to the DataFrame
iris_df['species'] = iris.target_names[y]  # Map target values to species names

# Display the first few rows of the dataset (features)
print("Iris Dataset Features:")
print(iris_df.head())  # Displaying the first 5 rows

# Plot pairwise scatter plots (using pairplot from seaborn)
sns.pairplot(iris_df, hue="species", palette="Set2", plot_kws={'alpha': 0.5})
plt.suptitle('Pairwise Scatter Plots of Iris Dataset', y=1.02)
plt.show()

# Plot correlation heatmap (numerical features only)
corr_matrix = iris_df.drop(columns=['species']).corr()
plt.figure(figsize=(6, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Heatmap of Iris Dataset')
plt.show()

# Split the dataset into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling (standardize the data)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize the KNN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)

# Train the model
knn.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"\nAccuracy: {accuracy:.2f}")

# Print the classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Print the confusion matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

Output:
Iris Dataset Features:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm) species
0                5.1               3.5                1.4               0.2  setosa
1                4.9               3.0                1.4               0.2  setosa
2                4.7               3.2                1.3               0.2  setosa
3                4.6               3.1                1.5               0.2  setosa
4                5.0               3.6                1.4               0.2  setosa

Accuracy: 1.00

Classification Report:
precision recall f1-score support

0 1.00 1.00 1.00 10


1 1.00 1.00 1.00 9
2 1.00 1.00 1.00 11

accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30

Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]

Conclusion: Thus, we have successfully implemented Classification Model.


Practical No 3
Aim: Implement a Linear Regression Model on given dataset.
Software Required: Python, VS Code or Google Colab
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Step 1: Load the California Housing dataset
data = fetch_california_housing()
X = data.data[:, :1]  # Use just the first feature for simplicity in visualization
y = data.target

# Step 2: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Initialize the Linear Regression model
model = LinearRegression()

# Step 4: Train the model using the training data
model.fit(X_train, y_train)

# Step 5: Make predictions using the trained model
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

# Step 6: Evaluate the model performance using Mean Squared Error (MSE) and R-squared (R²)
train_mse = mean_squared_error(y_train, y_train_pred)
test_mse = mean_squared_error(y_test, y_test_pred)
train_r2 = r2_score(y_train, y_train_pred)
test_r2 = r2_score(y_test, y_test_pred)

# Print the results
print(f"Training Mean Squared Error: {train_mse:.4f}")
print(f"Test Mean Squared Error: {test_mse:.4f}")
print(f"Training R²: {train_r2:.4f}")
print(f"Test R²: {test_r2:.4f}")

# Step 7: Visualize the results
plt.figure(figsize=(8, 6))

# Plot the training data and the model's predictions
plt.scatter(X_train, y_train, color='blue', label='Training Data', alpha=0.5)
plt.plot(X_train, y_train_pred, color='red', label='Linear Regression Line (Training)', linewidth=2)

# Plot the testing data and the model's predictions
plt.scatter(X_test, y_test, color='green', label='Testing Data', alpha=0.5)
plt.plot(X_test, y_test_pred, color='orange', label='Linear Regression Line (Testing)', linestyle='--', linewidth=2)

plt.title('Linear Regression on California Housing Dataset')
plt.xlabel('Feature 1: Median Income')
plt.ylabel('Median House Value')
plt.legend()
plt.show()

Output:
Training Mean Squared Error: 0.7051
Test Mean Squared Error: 0.6918
Training R²: 0.4737
Test R²: 0.4729

Conclusion: Thus, we have successfully implemented Linear Regression.


Practical No 4
Aim: Implement a Decision tree on given dataset.
Problem Statement: Use Iris Dataset to implement Decision Tree.
Software Required: Python, VS Code or Google Colab
Theory: A Decision Tree is a popular and powerful algorithm used in machine learning for both
classification and regression tasks. It is a supervised learning method that recursively splits the
dataset into subsets based on the features, ultimately creating a tree-like model of decisions. The
goal is to make predictions by following paths from the root of the tree to a leaf node based on the
input features.
In a classification task, a decision tree predicts a class label for a given input, while in a regression
task, it predicts a continuous value.
Key Concepts in Decision Trees
1. Root Node: The root node represents the entire dataset, which is then recursively split into
subgroups based on the feature that results in the most effective division. The root node contains
the whole dataset before any decisions are made.
2. Internal Nodes: Each internal node represents a decision or a test on one of the features. Based on the outcome of the test, the dataset is split into two or more branches.
3. Branches: Branches represent the outcome of a test (decision) at each internal node. Each branch connects a node to its child node.
4. Leaf Nodes (Terminal Nodes): Leaf nodes represent the final decision or outcome. In classification tasks, they contain the predicted class label. In regression tasks, they contain the predicted continuous value.
5. Splitting: The process of dividing the dataset into subsets based on a specific feature and its corresponding threshold or category. The goal is to create pure nodes, where the data points in each node are as similar as possible with respect to the target variable.
6. Pruning: Pruning is the process of removing branches from the decision tree after it has been grown to its full size. This helps to prevent overfitting and ensures that the model generalizes well to new data.
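
To make the idea of a "pure" split concrete, the short sketch below computes the Gini impurity of a hypothetical parent node and of a candidate split. Gini impurity is the default splitting criterion in scikit-learn's DecisionTreeClassifier, though the theory above does not prescribe one specific measure, so this is only an illustration.

import numpy as np

def gini_impurity(labels):
    # Gini impurity of a node: 1 - sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# Hypothetical parent node containing two classes
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# A candidate split that produces two child nodes
left = np.array([0, 0, 0, 1])
right = np.array([0, 1, 1, 1])

# Weighted impurity after the split; a lower value means purer (better) children
weighted = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / len(parent)
print("Parent impurity:        ", gini_impurity(parent))  # 0.5
print("Weighted child impurity:", weighted)               # 0.375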

Program:
# Importing the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

# Loading the dataset
iris = load_iris()

# Converting the data to a pandas DataFrame
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Creating a separate column for the target variable of the iris dataset
data['Species'] = iris.target

# Replacing the categories of the target variable with the actual species names
target = np.unique(iris.target)
target_n = np.unique(iris.target_names)
target_dict = dict(zip(target, target_n))
data['Species'] = data['Species'].replace(target_dict)

# Separating the independent and dependent variables of the dataset
x = data.drop(columns="Species")
y = data["Species"]
names_features = x.columns
target_labels = y.unique()

# Splitting the dataset into training and testing datasets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=93)

# Creating an instance of the Decision Tree classifier
dtc = DecisionTreeClassifier(max_depth=3, random_state=93)

# Fitting the training dataset to the model
dtc.fit(x_train, y_train)

# Plotting the Decision Tree
plt.figure(figsize=(30, 10), facecolor='b')
tree.plot_tree(dtc, feature_names=names_features, class_names=target_labels, rounded=True, filled=True, fontsize=14)
plt.show()

# Making predictions on the test set
y_pred = dtc.predict(x_test)

# Finding the confusion matrix
conf_matrix = metrics.confusion_matrix(y_test, y_pred)
matrix = pd.DataFrame(conf_matrix)

# Plotting the confusion matrix as a heatmap
plt.figure(figsize=(10, 7))
sns.set(font_scale=1.3)
axis = sns.heatmap(matrix, annot=True, fmt="g", cmap="magma")
axis.set_title('Confusion Matrix')
axis.set_xlabel("Predicted Values", fontsize=10)
axis.set_xticklabels(target_labels)
axis.set_ylabel("True Labels", fontsize=10)
axis.set_yticklabels(target_labels, rotation=0)
plt.show()
Output:

Conclusion: Thus, we have successfully implemented Decision Tree.


Practical No 5
Aim: Identify Overfitting and Underfitting on given dataset.
Problem Statement: Use the California Housing dataset with polynomial regression models of different degrees to identify overfitting and underfitting.
Software Required: Python, VS Code or Google Colab
Theory: In machine learning, the terms overfitting and underfitting refer to the model's ability
to generalize well to unseen data. These are key concepts when evaluating how well a model
fits a dataset, and they are associated with bias-variance trade-off.
Overfitting: Overfitting occurs when a machine learning model learns not only the underlying
patterns in the training data but also the noise and random fluctuations that do not generalize
well to new, unseen data. Essentially, the model becomes too complex and too tailored to the
training data.
Characteristics of Overfitting:
1. High variance, low bias
2. Model Complexity
3. Low training error, high test error
Why Does Overfitting Happen?
1. Too many features
2. Excessive model complexity
3. Insufficient data
Signs of Overfitting:
1. Good performance on training data, poor performance on test data.
2. The model is very sensitive to small changes in the input data.
3. A very complex model (e.g., deep decision trees or a high-degree polynomial in
regression) that closely tracks the training data.
How to Avoid Overfitting?
1. Simplify the model
2. Regularization
3. Cross-validation
4. Increase the dataset size
5. Early stopping

Underfitting: Underfitting occurs when a model is too simple to capture the underlying patterns
of the data. It is characterized by both high bias and low variance. The model is not complex
enough to learn the relationships in the data, resulting in poor performance on both the training
and test datasets.

Characteristics of Underfitting:
1. High bias, low variance
2. Model Complexity
3. High training error, high test error
Why Does Underfitting Happen?
1. Too few features
2. Model too simple
3. Overly strong assumptions

Signs of Underfitting:
1. Poor performance on both training and test data.
2. The model is too simple and doesn't have the capacity to learn the data patterns.
3. The model may consistently make the same error (e.g., using a linear regression on non-
linear data).
How to Avoid Underfitting?
1. Increase model complexity
2. Use more relevant features
3. Reduce regularization
4. Tune hyperparameters
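
The symptoms described above can be made visible with a small experiment (a minimal sketch on a hypothetical synthetic dataset, separate from the California Housing program below): as the polynomial degree grows, the training error keeps falling while the test error eventually rises again.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical noisy samples from a sine curve: with few points, a flexible model can memorize noise
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=30)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in [1, 4, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"Degree {degree:2d}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")

# Typically, degree 1 underfits (both errors high), degree 4 fits well,
# and degree 15 overfits (training error far below test error).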

Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

# Step 1: Load the California Housing dataset
data = fetch_california_housing()
X = data.data[:, :1]  # Using just one feature for simplicity in visualization
y = data.target

# Step 2: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Function to train and evaluate the model with different polynomial degrees
def plot_overfitting_underfitting(degree):
    # Step 3: Create polynomial features
    poly = PolynomialFeatures(degree)
    X_poly_train = poly.fit_transform(X_train)
    X_poly_test = poly.transform(X_test)

    # Step 4: Train the Linear Regression model
    model = LinearRegression()
    model.fit(X_poly_train, y_train)

    # Step 5: Make predictions
    y_train_pred = model.predict(X_poly_train)
    y_test_pred = model.predict(X_poly_test)

    # Step 6: Calculate Mean Squared Error
    train_mse = mean_squared_error(y_train, y_train_pred)
    test_mse = mean_squared_error(y_test, y_test_pred)
    print(f"Degree {degree} -> Train MSE: {train_mse:.4f}, Test MSE: {test_mse:.4f}")

    # Plot the results (showing the fit over the data)
    plt.figure(figsize=(8, 6))
    X_range = np.linspace(X.min(), X.max(), 1000).reshape(-1, 1)
    X_poly_range = poly.transform(X_range)
    y_range_pred = model.predict(X_poly_range)

    plt.scatter(X, y, color='gray', label='Data points')
    plt.plot(X_range, y_range_pred, label=f'Polynomial Degree {degree}', color='red')
    plt.title(f'Overfitting and Underfitting (Degree {degree})')
    plt.xlabel('Feature 1 (Simplified for Visualization)')
    plt.ylabel('Target (Median House Value)')
    plt.legend()
    plt.show()

# Step 7: Test different polynomial degrees
for degree in [1, 3]:
    plot_overfitting_underfitting(degree)

Output:

Conclusion: Thus, we have successfully identified over fitting and under fitting.
Practical No 6
Aim: Implement a Logistic Regression Model on given dataset.
Problem Statement: Use the Diabetes dataset to implement and evaluate Logistic Regression.
Software Required: Python, VS Code or Google Colab
Theory: Logistic Regression is a statistical model for binary classification. It models the likelihood that an instance belongs to a particular class: a linear combination of the input features is passed through the sigmoid function, which restricts predictions to values between 0 and 1. The model's coefficients are optimized with methods such as gradient descent to minimize the log loss, and the fitted coefficients define the decision boundary that separates the two classes. Logistic Regression is widely used in machine learning for problems with binary outcomes because it is simple, interpretable, and effective across many domains. Overfitting can be reduced by adding regularization, which also improves generalization.
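
To make the sigmoid step concrete (a minimal sketch with hypothetical coefficients, not taken from the trained model in the program below):

import numpy as np

def sigmoid(z):
    # Map a linear score z to a probability between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, intercept, and one input instance with two features
w = np.array([0.8, -0.5])   # feature coefficients
b = 0.1                     # intercept
x = np.array([2.0, 1.0])    # input features

z = np.dot(w, x) + b        # linear combination: w·x + b
p = sigmoid(z)              # predicted probability of the positive class
print(f"z = {z:.2f}, P(y=1|x) = {p:.3f}")   # class 1 is predicted when p >= 0.5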
Program:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_curve, auc

# Load the diabetes dataset
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# Convert the target variable to binary (1 for diabetes, 0 for no diabetes)
y_binary = (y > np.median(y)).astype(int)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_binary, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Output:
Accuracy: 73.03%
Confusion Matrix:
[[36 13]
[11 29]]

Classification Report:
precision recall f1-score support

0 0.77 0.73 0.75 49


1 0.69 0.72 0.71 40

accuracy 0.73 89
macro avg 0.73 0.73 0.73 89
weighted avg 0.73 0.73 0.73 89

Conclusion: Thus, we have successfully implemented Logistic Regression Model.



Practical No 7
Aim: Implement Gaussian Naïve Bayes learning on given dataset.
Problem Statement: Use Wine Dataset to implement Gaussian Naïve Bayes.
Software Required: Python, VS Code or Google Colab
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sns

# Step 1: Load the Wine dataset
wine = datasets.load_wine()
X = wine.data    # Features (13 chemical properties of wines)
y = wine.target  # Target labels (3 classes of wines)

# Step 2: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Initialize and train the Gaussian Naive Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Step 4: Make predictions on the test set
y_pred = gnb.predict(X_test)

# Step 5: Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

# Print the accuracy and confusion matrix
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Confusion Matrix:")
print(conf_matrix)

# Step 6: Visualize the confusion matrix using a heatmap
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=wine.target_names, yticklabels=wine.target_names)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix for Gaussian Naive Bayes (Wine Dataset)')
plt.show()
Output:
Accuracy: 100.00%
Confusion Matrix:
[[19 0 0]
[ 0 21 0]
[ 0 0 14]]

Conclusion: Thus, we have successfully implemented Gaussian Naïve Bayes.



Practical No 8
Aim: Implement PCA Algorithm
Problem Statement: Use Breast Cancer Dataset to implement PCA Algorithm.
Software Required: Python, VS Code or Google Colab
Theory: Principal Component Analysis (PCA) is a dimensionality reduction technique widely
used in data analysis, machine learning, and statistics. PCA transforms the data into a new
coordinate system, where the axes are the principal components that capture the maximum
variance in the data. This technique is essential for simplifying data, improving computational
efficiency, and enhancing visualization, especially in high-dimensional spaces.
Objective of PCA: The primary goal of PCA is to reduce the dimensionality of a dataset while
retaining as much variance (information) as possible. It achieves this by projecting the original
data onto a new set of orthogonal axes called principal components. These components are
ordered in such a way that the first principal component captures the maximum variance in the
data, followed by the second, and so on.
Why Use PCA?
- Dimensionality Reduction: PCA helps in reducing the number of features (dimensions) in a dataset while preserving the most important patterns in the data. This is especially useful for visualizing high-dimensional data (e.g., reducing from 30 features to 2 or 3 for visualization).
- Noise Reduction: By discarding components with low variance, PCA can help in eliminating noisy features that might not contribute to the underlying structure of the data.
- Improving Model Performance: Reducing the number of features can lead to better performance for machine learning algorithms, particularly for models that struggle with high-dimensional data.
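
The program below implements PCA by hand through the covariance matrix and its eigenvectors. As a cross-check (a minimal sketch, assuming the same standardized Breast Cancer features), scikit-learn's PCA class should report essentially the same explained-variance ratios:

from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_breast_cancer().data
X_standardized = StandardScaler().fit_transform(X)

# Keep the top 2 principal components, as in the manual implementation below
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_standardized)

print("Explained Variance Ratio:", pca.explained_variance_ratio_)
# Expected to be close to the manually computed values (~0.4427 and ~0.1897)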

Program:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Step 1: Load the Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data    # Features
y = cancer.target  # Target labels

# Step 2: Standardize the data
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

# Step 3: Compute the covariance matrix
cov_matrix = np.cov(X_standardized.T)

# Step 4: Calculate the eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

# Step 5: Sort the eigenvalues and eigenvectors in descending order of eigenvalue
eigenvalue_index = np.argsort(eigenvalues)[::-1]
sorted_eigenvectors = eigenvectors[:, eigenvalue_index]
sorted_eigenvalues = eigenvalues[eigenvalue_index]

# Step 6: Select the top k eigenvectors (for 2D visualization, k=2)
k = 2
eigenvector_subset = sorted_eigenvectors[:, :k]

# Step 7: Project the data onto the new basis (principal components)
X_pca = X_standardized.dot(eigenvector_subset)

# Step 8: Visualize the projected data (2D plot)
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='coolwarm', edgecolor='k', s=100)
plt.title('PCA - Breast Cancer Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.colorbar(label='Malignant (1) or Benign (0)')
plt.show()

# Step 9: Print the explained variance ratio (how much variance each component captures)
explained_variance = sorted_eigenvalues / np.sum(sorted_eigenvalues)
print("Explained Variance Ratio:", explained_variance[:k])

Output:
Explained Variance Ratio: [0.44272026 0.18971182]

Conclusion: Thus, we have successfully implemented PCA.


Practical No 9
Aim: Implement K-Means Clustering
Problem Statement: Use Wine Dataset to implement K-Means Clustering.
Software Required: Python, VS Code or Google Colab
Theory: K-Means Clustering is one of the most widely used unsupervised learning algorithms
for clustering data. It is a partition-based clustering algorithm that divides a dataset into K
clusters based on feature similarities. The main idea behind K-Means is to assign each data
point to one of the K clusters and update the cluster centers (centroids) iteratively until
convergence.
Objective of K-Means Clustering: The objective of the K-Means algorithm is to partition a
given set of data points into K clusters such that
1. Points within each cluster are more similar to each other than to those in other clusters.
2. Each cluster is represented by its centroid, which is the mean of all points within that
cluster.
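
The program below fixes K = 3 because the Wine data contains three cultivars. When K is not known in advance, a common heuristic is the elbow method (a minimal sketch using the same standardized Wine features as the program below): plot the inertia for a range of K values and pick the point where it stops dropping sharply.

import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

X = StandardScaler().fit_transform(load_wine().data)

# Fit K-Means for several values of K and record the inertia (sum of squared distances)
inertias = []
k_values = range(1, 11)
for k in k_values:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(X)
    inertias.append(km.inertia_)

plt.plot(k_values, inertias, marker='o')
plt.xlabel('Number of clusters K')
plt.ylabel('Inertia')
plt.title('Elbow Method for Choosing K (Wine Dataset)')
plt.show()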
Program:
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Step 1: Load the Wine dataset
wine = load_wine()
X = wine.data    # Features
y = wine.target  # Target labels (not used in clustering)

# Step 2: Standardize the data
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

# Step 3: Apply K-Means clustering
kmeans = KMeans(n_clusters=3, random_state=42)  # 3 clusters, since there are 3 wine types
kmeans.fit(X_standardized)

# Step 4: Get the cluster labels and centroids
labels = kmeans.labels_              # Cluster labels for each data point
centroids = kmeans.cluster_centers_  # Centroids of each cluster

# Step 5: Visualize the clusters (using the first two features for 2D visualization)
plt.figure(figsize=(8, 6))
plt.scatter(X_standardized[:, 0], X_standardized[:, 1], c=labels, cmap='viridis', edgecolor='k', s=100)
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='X', s=200, label='Centroids')
plt.title('K-Means Clustering on Wine Dataset')
plt.xlabel('Standardized Alcohol Content')
plt.ylabel('Standardized Malic Acid')
plt.legend()
plt.show()

# Step 6: Print the cluster centroids
print("Cluster Centroids (after K-Means clustering):")
print(centroids)

# Step 7: Optional: Evaluate the clustering (e.g., using inertia or silhouette score)
inertia = kmeans.inertia_
print("Inertia (Sum of squared distances to centroids):", inertia)

Output:
Cluster Centroids (after K-Means clustering):
[[-0.92607185 -0.39404154 -0.49451676 0.17060184 -0.49171185
-0.07598265 0.02081257 -0.03353357 0.0582655 -0.90191402
0.46180361 0.27076419 -0.75384618]
[ 0.16490746 0.87154706 0.18689833 0.52436746 -0.07547277
-0.97933029 -1.21524764 0.72606354 -0.77970639 0.94153874
-1.16478865 -1.29241163 -0.40708796]
[ 0.83523208 -0.30380968 0.36470604 -0.61019129 0.5775868
0.88523736 0.97781956 -0.56208965 0.58028658 0.17106348
0.47398365 0.77924711 1.12518529]]
Inertia (Sum of squared distances to centroids):
1277.928488844642

Conclusion: Thus, we have successfully implemented K-Means Clustering.



Practical No 10
Aim: Implement Gaussian Mixture Models
Problem Statement: Use Digit Dataset to implement Gaussian Mixture Models.
Software Required: Python, VS Code or Google Colab

Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA

# Step 1: Load the Digits dataset
digits = load_digits()
X = digits.data    # Features (pixel values of the images)
y = digits.target  # Labels (the actual digits, not used for clustering)

# Step 2: Standardize the data
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

# Step 3: Fit the Gaussian Mixture Model (GMM)
# Assume 10 clusters, because there are 10 digits (0-9)
gmm = GaussianMixture(n_components=10, random_state=42)
gmm.fit(X_standardized)

# Step 4: Predict the cluster labels
labels = gmm.predict(X_standardized)

# Step 5: Visualize the GMM clustering result (use PCA to reduce dimensionality to 2D)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_standardized)

plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap='viridis', edgecolor='k', s=100)
plt.title('Gaussian Mixture Model Clustering (Digits Dataset)')
plt.xlabel('PCA Component 1')
plt.ylabel('PCA Component 2')
plt.colorbar(label='Cluster Label')
plt.show()

# Step 6: Print the GMM means (centroids) and covariances (cluster shapes)
print("Cluster Means (Centroids):")
print(gmm.means_)

print("\nCovariances (Cluster Shapes):")
print(gmm.covariances_)

# Step 7: Log-Likelihood (for model evaluation)
log_likelihood = gmm.score(X_standardized)
print("\nLog-Likelihood of the model:", log_likelihood)

Output:

Conclusion: Thus, we have successfully implemented Gaussian Mixture Models.
