
Guru Tegh Bahadur Institute of Technology, Delhi

(Affiliated to Guru Gobind Singh Indraprastha University, Dwarka, New Delhi)

Machine Learning Lab


Paper Code: CIE‐421P

Academic Year: 2024-2025


Semester: 7th

Submitted To: Ms. Amanpreet Kaur

Submitted By: Gyanesh Kumar
Roll No: 05876802721

Guru Tegh Bahadur Institute of Technology


Department of Computer Science & Engineering

INDEX

S.No.   List of Experiments                                                      Date    T. Sign
1.      Introduction to JUPYTER IDE and its libraries Pandas and NumPy
2.      Program to demonstrate Simple Linear Regression
3.      Program to demonstrate Logistic Regression
4.      Program to demonstrate Decision Tree - ID3 Algorithm
5.      Program to demonstrate k-Nearest Neighbor flowers classification
6.      Program to demonstrate Naïve Bayes Classifier
7.      Program to demonstrate PCA and LDA on Iris dataset
8.      Program to demonstrate DBSCAN clustering algorithm
9.      Program to demonstrate K-Medoids clustering algorithm
10.     Program to demonstrate K-Means Clustering Algorithm on Handwritten Dataset

Experiment -1
Aim:
Introduction to JUPYTER IDE and its libraries Pandas and NumPy.

Theory:
Jupyter IDE: Jupyter Notebook provides an interactive coding environment ideal for data
analysis, machine learning, and scientific computing. It allows users to mix code with rich
text, images, and visualizations within a single document, making it popular for both research
and education.
Main Features of Jupyter Notebook:
• Cell-based Execution: Code is written in cells, and each cell can be run
independently, allowing for step-by-step code execution.
• Markdown Integration: Support for Markdown lets users document their code and add
formatted text, equations, and images, enhancing readability.
• Data Visualization: Jupyter can display outputs inline, making it easy to embed
visualizations and graphs from libraries like Matplotlib and Seaborn.
• Export Options: Notebooks can be exported in formats such as HTML, PDF, or slides,
facilitating collaboration and sharing.
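
For example, assuming Jupyter is installed and the file to convert is named notebook.ipynb, a notebook can be exported from the command line with nbconvert:

jupyter nbconvert --to html notebook.ipynb
jupyter nbconvert --to slides notebook.ipynb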

Core Libraries for Data Analysis:
1. NumPy (Numerical Python):
o Purpose: Provides support for numerical and matrix computations,
foundational for scientific computing in Python.
o Main Components:
▪ ndarray (N-dimensional array): NumPy’s primary data structure, which
supports fast and efficient operations on large datasets.
▪ Broadcasting and Vectorized Operations: Eliminates the need for
looping over data by enabling element-wise operations, enhancing
performance.
▪ Mathematical Functions and Linear Algebra: Includes functions for
statistical analysis, linear algebra, Fourier transformations, and random
sampling, which are essential for scientific computations.
o Benefits: NumPy’s optimized performance makes it a preferred choice for
tasks requiring fast and memory-efficient computations, such as handling large
datasets and matrix operations.
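
A minimal sketch of these ideas in a single code cell (the array values are illustrative):

import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])      # 2x2 ndarray
b = np.array([10.0, 20.0])

print(a + b)               # broadcasting: b is added to every row of a
print(a.mean(axis=0))      # vectorized column means, no explicit loop
print(np.linalg.inv(a))    # linear algebra: matrix inverse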

2. Pandas:
o Purpose: Provides powerful data manipulation and analysis capabilities,
especially for structured data.
o Main Components:

▪ DataFrames and Series: DataFrame is a 2D labeled data structure (akin
to a table), while Series is a 1D array with labels. Together, they
provide an intuitive way to handle and manipulate structured data.
▪ Data Cleaning and Transformation: Pandas excels in data
preprocessing with functions for handling missing values, filtering
rows and columns, renaming, and reshaping data.
▪ Data Aggregation and Grouping: Allows grouping and summarizing
data, which is useful for generating statistics, creating pivot tables, and
performing operations on subsets of data.
▪ Time Series Handling: Pandas has specialized support for time series
data, making it easier to analyze trends and perform date-based
calculations.
o Benefits: Pandas is flexible and integrates well with other libraries, making it
ideal for data wrangling and exploratory data analysis.
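
A minimal sketch of typical Pandas operations (the column names and values are illustrative):

import pandas as pd

df = pd.DataFrame({
    'city': ['Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
    'sales': [250, 300, None, 180]
})

df['sales'] = df['sales'].fillna(df['sales'].mean())   # filling a missing value
print(df.groupby('city')['sales'].sum())               # aggregation by group
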
Combining Jupyter, NumPy, and Pandas:
These tools together provide an efficient, interactive platform for data exploration and
analysis. Jupyter serves as the interface for executing code and visualizing data, while
NumPy and Pandas provide the computational and data manipulation power needed for
analytical tasks.
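
Putting the three together, a typical notebook cell might look like the following sketch (the column names and values are illustrative; the %matplotlib inline magic keeps plots inside the notebook output):

%matplotlib inline
import numpy as np
import pandas as pd

# A small DataFrame built from NumPy arrays
df = pd.DataFrame({'x': np.arange(10), 'y': np.arange(10) ** 2})

print(df.describe())                       # summary statistics shown below the cell
df.plot(x='x', y='y', title='y = x^2');    # plot rendered inline in the notebook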

Experiment -2
Aim:
Program to demonstrate Simple Linear Regression

Theory:
Simple linear regression models the relationship between a single independent variable X and a dependent variable y by fitting a straight line y = b0 + b1*x. The coefficients are chosen by ordinary least squares, i.e. so that the sum of squared differences between observed and predicted values is minimized. The program below fits such a model with scikit-learn on synthetic data, reports the mean squared error on a held-out test set, and plots the fitted line.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Generating synthetic dataset (or you can load your dataset here)
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Splitting the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating and training the Linear Regression model


model = LinearRegression()
model.fit(X_train, y_train)

# Making predictions on the test set


y_pred = model.predict(X_test)

# Calculating mean squared error


mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Visualizing the results
plt.scatter(X, y, color="blue", label="Data Points")

# Plotting a continuous regression line


X_line = np.sort(X, axis=0)
y_line = model.predict(X_line)
plt.plot(X_line, y_line, color="red", linewidth=2, label="Regression Line")

plt.xlabel("Independent Variable (X)")


plt.ylabel("Dependent Variable (y)")
plt.title("Simple Linear Regression")
plt.legend()
plt.show()

Output:

Experiment -3
Aim:
Program to demonstrate Logistic Regression

Theory:
Logistic regression is a linear classifier for binary outcomes. It passes a weighted sum of the features through the sigmoid function, P(y = 1 | x) = 1 / (1 + e^-(w·x + b)), and assigns the class with the higher predicted probability, which yields a linear decision boundary. The program below trains the model on a synthetic two-feature dataset, reports accuracy, the confusion matrix and the classification report, and plots the decision boundary.
# Importing required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

# Generating a synthetic dataset for binary classification


X, y = make_classification(n_samples=200, n_features=2, n_classes=2,
n_informative=2, n_redundant=0, n_clusters_per_class=1,
random_state=0)

# Splitting the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating and training the Logistic Regression model


model = LogisticRegression()
model.fit(X_train, y_train)

# Making predictions on the test set


y_pred = model.predict(X_test)

# Evaluating the model

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print("Accuracy:", accuracy)
print("\nConfusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)

# Visualizing the decision boundary


plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', marker='o', s=50)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("Logistic Regression Decision Boundary")

# Plotting the decision boundary


x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
np.arange(y_min, y_max, 0.1))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3, cmap='viridis')
plt.show()

Output:

Experiment -4
Aim:
Program to demonstrate Decision Tree - ID3 Algorithm

Theory:
ID3 builds a decision tree top-down by repeatedly splitting on the attribute with the highest information gain, i.e. the largest reduction in the entropy of the class labels. scikit-learn does not implement ID3 exactly; DecisionTreeClassifier with criterion="entropy" uses the same information-gain criterion but produces binary (CART-style) splits, so the program below approximates ID3 on the classic PlayTennis dataset.
# Importing required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn import tree
import matplotlib.pyplot as plt

# Example dataset (or load your dataset here)


data = {
'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast', 'Sunny',
'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],
'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool',
'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High',
'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
'Windy': [False, True, False, False, False, True, True, False, False, False, True,
True, False, True],
'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes',
'Yes', 'Yes', 'No']
}

# Creating a DataFrame
df = pd.DataFrame(data)

# Encoding categorical data

df = pd.get_dummies(df, columns=['Outlook', 'Temperature', 'Humidity', 'Windy'],
drop_first=False)

# Splitting the dataset into features and target variable


X = df.drop('PlayTennis', axis=1)
y = df['PlayTennis'].apply(lambda x: 1 if x == 'Yes' else 0) # Encoding target

# Splitting the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Creating and training the Decision Tree classifier with ID3 algorithm
model = DecisionTreeClassifier(criterion="entropy", random_state=42)
model.fit(X_train, y_train)

# Making predictions on the test set


y_pred = model.predict(X_test)

# Evaluating the model


accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print("Accuracy:", accuracy)
print("\nConfusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)

# Visualizing the Decision Tree


plt.figure(figsize=(12, 8))
tree.plot_tree(model, feature_names=X.columns, class_names=["No", "Yes"], filled=True)
plt.title("Decision Tree - ID3 Algorithm")
plt.show()

Output:

Experiment -5
Aim:
Program to demonstrate k-Nearest Neighbor flowers classification

Theory:
k-Nearest Neighbors (k-NN) is an instance-based classifier: a new sample is assigned the majority class among its k closest training samples, with closeness usually measured by Euclidean distance. A small k follows the training data closely but is sensitive to noise, while a larger k smooths the decision boundary. The program below classifies the Iris flower dataset with k = 3 and visualizes the confusion matrix.
# Importing required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
import seaborn as sns

# Loading the Iris dataset


iris = load_iris()
X = iris.data # Feature matrix
y = iris.target # Target variable

# Splitting the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Creating and training the k-NN model (using k=3 for this example)
k=3
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)

# Making predictions on the test set


y_pred = knn.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred, target_names=iris.target_names)

print("Accuracy:", accuracy)
print("\nConfusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)

# Visualizing the confusion matrix


plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, cmap="Blues", fmt="d",
xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.title(f"k-Nearest Neighbors (k={k}) - Confusion Matrix")
plt.show()

Output:

Experiment -6
Aim:
Program to demonstrate Naïve Bayes Classifier

Theory:
The Naïve Bayes classifier applies Bayes' theorem under the simplifying assumption that features are conditionally independent given the class. Gaussian Naïve Bayes further assumes each feature is normally distributed within each class, which suits the continuous measurements of the Iris dataset used in the program below.
# Importing required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
import seaborn as sns

# Loading the Iris dataset


iris = load_iris()
X = iris.data # Feature matrix
y = iris.target # Target variable

# Splitting the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Creating and training the Naive Bayes model


nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Making predictions on the test set


y_pred = nb_model.predict(X_test)

# Evaluating the model

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred, target_names=iris.target_names)

print("Accuracy:", accuracy)
print("\nConfusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)

# Visualizing the confusion matrix


plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, cmap="Blues", fmt="d",
xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.title("Naive Bayes Classifier - Confusion Matrix")
plt.show()

Output:

Experiment -7
Aim:
Program to demonstrate PCA and LDA on Iris dataset

Theory:
PCA (Principal Component Analysis) is an unsupervised method that projects the data onto the orthogonal directions of maximum variance, whereas LDA (Linear Discriminant Analysis) is supervised and finds the directions that best separate the classes. The program below reduces the four Iris features to two components with each method and plots the resulting projections.
# Importing required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import train_test_split

# Loading the Iris dataset


iris = load_iris()
X = iris.data # Feature matrix
y = iris.target # Target variable
target_names = iris.target_names

# Splitting the dataset into training and testing sets for LDA (optional for PCA)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Applying PCA (keeping first 2 components)


pca = PCA(n_components=2)
X_pca = pca.fit_transform(X) # PCA is applied to the whole dataset

# Plotting the PCA results


plt.figure(figsize=(8, 6))
for i, target_name in zip([0, 1, 2], target_names):
    plt.scatter(X_pca[y == i, 0], X_pca[y == i, 1], label=target_name)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA of Iris Dataset")
plt.legend()
plt.show()

# Applying LDA (keeping first 2 components)


lda = LDA(n_components=2)
X_lda = lda.fit_transform(X_train, y_train) # LDA needs both X and y for fitting

# Plotting the LDA results


plt.figure(figsize=(8, 6))
for i, target_name in zip([0, 1, 2], target_names):
    plt.scatter(X_lda[y_train == i, 0], X_lda[y_train == i, 1], label=target_name)
plt.xlabel("LDA Component 1")
plt.ylabel("LDA Component 2")
plt.title("LDA of Iris Dataset")
plt.legend()
plt.show()

Output:

Experiment -8
Aim:
Program to demonstrate DBSCAN clustering algorithm

Theory:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups points that lie in dense regions: a point with at least min_samples neighbors within radius eps is a core point, points reachable from core points join its cluster, and the remaining points are labeled noise (-1). Unlike K-Means, it does not need the number of clusters in advance and can recover arbitrarily shaped clusters, such as the two half-moons generated in the program below.
# Importing required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Generating synthetic data (two interleaving half circles)


X, y = make_moons(n_samples=300, noise=0.1, random_state=42)

# Applying DBSCAN
dbscan = DBSCAN(eps=0.2, min_samples=5)
y_dbscan = dbscan.fit_predict(X)

# Plotting the results


plt.figure(figsize=(10, 6))
unique_labels = set(y_dbscan)
colors = [plt.cm.Spectral(i / float(len(unique_labels))) for i in range(len(unique_labels))]

for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black color for noise points
        col = 'k'

    # Mask for the current cluster
    class_member_mask = (y_dbscan == k)

    # Plotting the core points (larger markers)
    xy_core = X[class_member_mask & np.isin(np.arange(len(X)), dbscan.core_sample_indices_)]
    plt.plot(xy_core[:, 0], xy_core[:, 1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=8)

    # Plotting the non-core (border) points
    xy_non_core = X[class_member_mask & ~np.isin(np.arange(len(X)), dbscan.core_sample_indices_)]
    plt.plot(xy_non_core[:, 0], xy_non_core[:, 1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=5)

plt.title('DBSCAN Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

Output:

Experiment -9
Aim:
Program to demonstrate K-Medoids clustering algorithm

Theory:
K-Medoids is similar to K-Means but represents each cluster by an actual data point (the medoid) instead of a mean, which makes it more robust to outliers. The KMedoids estimator used below is provided by the scikit-learn-extra package, which must be installed separately (pip install scikit-learn-extra).
# Importing required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn_extra.cluster import KMedoids

X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Applying K-Medoids
kmedoids = KMedoids(n_clusters=4, random_state=0)
y_kmedoids = kmedoids.fit_predict(X)

plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=y_kmedoids, s=30, cmap='viridis', marker='o', edgecolor='k')

# Plotting the Medoids (cluster centers)


plt.scatter(kmedoids.cluster_centers_[:, 0], kmedoids.cluster_centers_[:, 1],
c='red', s=200, marker='X', label='Medoids')

plt.title('K-Medoids Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()

plt.show()

Output:

Experiment -10
Aim:
Program to demonstrate K-Means Clustering Algorithm on Handwritten Dataset

Theory:
K-Means partitions data into k clusters by alternately assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, minimizing the within-cluster sum of squared distances. The program below runs K-Means on synthetic blobs; a sketch applying the same workflow to scikit-learn's handwritten digits dataset follows after it.
# Importing required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generating synthetic blob data (a simple stand-in; a sketch using the
# handwritten digits dataset named in the aim follows after this program)
X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Applying K-Means (n_init set explicitly to match the classic default
# and avoid version-dependent warnings)
kmeans = KMeans(n_clusters=4, random_state=0, n_init=10)
y_kmeans = kmeans.fit_predict(X)

plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=30, cmap='viridis', marker='o', edgecolor='k')

plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c='red', s=200, marker='X', label='Centroids')

plt.title('K-Means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')

plt.legend()

plt.show()
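
The program above clusters synthetic blobs for simplicity. Since the aim names a handwritten dataset, the following is a minimal sketch of the same K-Means workflow on scikit-learn's bundled handwritten digits dataset (load_digits); the choice of 10 clusters and the 8x8 reshaping follow from that dataset containing ten digit classes stored as 8x8 images:

# K-Means on the handwritten digits dataset (sketch)
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

digits = load_digits()     # 1797 samples of 8x8 grayscale digit images
X_digits = digits.data     # each image flattened into a 64-dimensional vector

# One cluster per digit class (0-9)
kmeans_digits = KMeans(n_clusters=10, random_state=0, n_init=10)
labels = kmeans_digits.fit_predict(X_digits)

# Visualizing each cluster center as an 8x8 image
fig, axes = plt.subplots(2, 5, figsize=(8, 3))
for ax, center in zip(axes.ravel(), kmeans_digits.cluster_centers_):
    ax.imshow(center.reshape(8, 8), cmap='gray')
    ax.axis('off')
fig.suptitle('K-Means cluster centers on handwritten digits')
plt.show()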

Output:
