ML Lab
INDEX

S. No.  List of Experiments                                                Date    T. Sign
1.      Introduction to JUPYTER IDE and its libraries Pandas and NumPy
2.      Program to demonstrate Simple Linear Regression
3.      Program to demonstrate Logistic Regression
4.      Program to demonstrate Decision Tree - ID3 Algorithm
5.      Program to demonstrate k-Nearest Neighbor flowers classification
6.      Program to demonstrate Naïve Bayes Classifier
7.      Program to demonstrate PCA and LDA on Iris dataset
8.      Program to demonstrate DBSCAN clustering algorithm
9.      Program to demonstrate K-Medoids clustering algorithm
10.     Program to demonstrate K-Means Clustering Algorithm on Handwritten Dataset
Experiment 1
Aim:
Introduction to JUPYTER IDE and its libraries Pandas and NumPy.
Theory:
Jupyter IDE: Jupyter Notebook provides an interactive coding environment ideal for data
analysis, machine learning, and scientific computing. It allows users to mix code with rich
text, images, and visualizations within a single document, making it popular for both research
and education.
Main Features of Jupyter Notebook:
• Cell-based Execution: Code is written in cells, and each cell can be run
independently, allowing for step-by-step code execution.
• Markdown Integration: Support for Markdown lets users document their code and add
formatted text, equations, and images, enhancing readability.
• Data Visualization: Jupyter can display outputs inline, making it easy to embed
visualizations and graphs from libraries like Matplotlib and Seaborn.
• Export Options: Notebooks can be exported in formats such as HTML, PDF, or slides,
facilitating collaboration and sharing.
Core Libraries for Data Analysis:
1. NumPy (Numerical Python):
o Purpose: Provides support for numerical and matrix computations,
foundational for scientific computing in Python.
o Main Components:
▪ ndarray (N-dimensional array): NumPy’s primary data structure, which
supports fast and efficient operations on large datasets.
▪ Broadcasting and Vectorized Operations: Eliminates the need for
looping over data by enabling element-wise operations, enhancing
performance.
▪ Mathematical Functions and Linear Algebra: Includes functions for
statistical analysis, linear algebra, Fourier transformations, and random
sampling, which are essential for scientific computations.
o Benefits: NumPy’s optimized performance makes it a preferred choice for
tasks requiring fast and memory-efficient computations, such as handling large
datasets and matrix operations.
2. Pandas:
o Purpose: Provides powerful data manipulation and analysis capabilities,
especially for structured data.
o Main Components:
▪ DataFrames and Series: DataFrame is a 2D labeled data structure (akin
to a table), while Series is a 1D array with labels. Together, they
provide an intuitive way to handle and manipulate structured data.
▪ Data Cleaning and Transformation: Pandas excels in data
preprocessing with functions for handling missing values, filtering
rows and columns, renaming, and reshaping data.
▪ Data Aggregation and Grouping: Allows grouping and summarizing
data, which is useful for generating statistics, creating pivot tables, and
performing operations on subsets of data.
▪ Time Series Handling: Pandas has specialized support for time series
data, making it easier to analyze trends and perform date-based
calculations.
o Benefits: Pandas is flexible and integrates well with other libraries, making it
ideal for data wrangling and exploratory data analysis.
Combining Jupyter, NumPy, and Pandas:
Together, these tools provide an efficient, interactive platform for data exploration and
analysis: Jupyter serves as the interface for executing code and visualizing results, while
NumPy and Pandas supply the computational and data-manipulation power needed for
analytical tasks, as the short sketch below illustrates.
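As a concrete illustration, the minimal sketch below shows the two libraries working together; the student names, sections, and scores are made-up example values, not a lab dataset.

import numpy as np
import pandas as pd

# Vectorized NumPy operations: element-wise arithmetic with no explicit loop
scores = np.array([55.0, 72.5, 90.0, 64.5])
normalized = (scores - scores.mean()) / scores.std()
print("Normalized scores:", normalized)

# A DataFrame is a 2D labeled structure; each of its columns is a Series
df = pd.DataFrame({
    "student": ["A", "B", "C", "D"],
    "section": ["X", "X", "Y", "Y"],
    "score": scores,
})

# Data aggregation: group rows by section and summarize each group
print(df.groupby("section")["score"].agg(["mean", "max"]))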
Experiment 2
Aim:
Program to demonstrate Simple Linear Regression
Theory:
Simple linear regression fits a straight line y = b0 + b1*x by minimizing the sum of squared
errors between predictions and observations. The synthetic data below is generated from
y = 4 + 3x plus Gaussian noise, so the fitted intercept and slope should come out close to 4 and 3.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
# Generating synthetic dataset (or you can load your dataset here)
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Creating and training the Simple Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predicting on the test set and measuring the error
y_pred = model.predict(X_test)
print("Intercept:", model.intercept_)
print("Slope:", model.coef_)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))

# Visualizing the results
plt.scatter(X, y, color="blue", label="Data Points")
x_line = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
plt.plot(x_line, model.predict(x_line), color="red", linewidth=2, label="Fitted Line")
plt.legend()
plt.show()
Output:
Experiment 3
Aim:
Program to demonstrate Logistic Regression
Theory:
Logistic regression models the probability of the positive class by applying the sigmoid
function sigmoid(z) = 1 / (1 + e^(-z)) to a linear combination of the input features; a sample
is assigned to class 1 when this probability exceeds 0.5.
# Importing required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
# Generating a synthetic two-feature binary classification dataset
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

# Splitting into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Creating and training the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predicting on the test set
y_pred = model.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print("Accuracy:", accuracy)
print("\nConfusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)
Output:
Experiment 4
Aim:
Program to demonstrate Decision Tree - ID3 Algorithm
Theory:
The ID3 algorithm grows a decision tree top-down by splitting, at each node, on the attribute
with the highest information gain (the largest reduction in entropy). scikit-learn's
DecisionTreeClassifier with criterion="entropy" applies the same entropy-based splitting rule.
# Importing required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn import tree
import matplotlib.pyplot as plt
# Defining a small play-tennis style dataset (illustrative values)
data = {
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Overcast', 'Sunny', 'Rain'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Mild', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High'],
    'Windy': [False, True, False, False, False, True, False, True],
    'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}

# Creating a DataFrame
df = pd.DataFrame(data)
# One-hot encoding the categorical attributes
df = pd.get_dummies(df, columns=['Outlook', 'Temperature', 'Humidity', 'Windy'], drop_first=False)

# Separating features and target, then splitting into train and test sets
X = df.drop('Play', axis=1)
y = df['Play']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# Creating and training the Decision Tree classifier with ID3 algorithm
model = DecisionTreeClassifier(criterion="entropy", random_state=42)
model.fit(X_train, y_train)
# Predicting on the test set and evaluating
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

# Visualizing the learned tree
plt.figure(figsize=(12, 6))
tree.plot_tree(model, feature_names=list(X.columns), class_names=list(model.classes_), filled=True)
plt.show()
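As a complement to the plotted tree, the learned split rules can also be printed as text; a
small sketch assuming the fitted model and one-hot encoded features above.

# Printing the tree as text rules (sketch; uses the fitted model from above)
from sklearn.tree import export_text
print(export_text(model, feature_names=list(X.columns)))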
Output:
Experiment 5
Aim:
Program to demonstrate k-Nearest Neighbor flowers classification
Theory:
k-Nearest Neighbor classifies a sample by a majority vote among its k closest training
points, using Euclidean distance by default; small odd values such as k = 3 are common choices.
# Importing required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
import seaborn as sns
# Loading the Iris flowers dataset
iris = load_iris()
X, y = iris.data, iris.target

# Splitting into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Creating and training the k-NN model (using k=3 for this example)
k=3
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
# Predicting on the test set
y_pred = knn.predict(X_test)
# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred, target_names=iris.target_names)
print("Accuracy:", accuracy)
print("\nConfusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)
Output:
Experiment 6
Aim:
Program to demonstrate Naïve Bayes Classifier
Theory:
A Naïve Bayes classifier applies Bayes' theorem under the simplifying assumption that the
features are conditionally independent given the class. GaussianNB models each feature with a
per-class normal distribution, which suits continuous measurements such as the Iris sepal and
petal dimensions.
# Importing required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
import seaborn as sns
# Loading the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Splitting into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Creating and training the Gaussian Naive Bayes model
model = GaussianNB()
model.fit(X_train, y_train)

# Predicting on the test set
y_pred = model.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred, target_names=iris.target_names)
print("Accuracy:", accuracy)
print("\nConfusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)
Output:
Experiment 7
Aim:
Program to demonstrate PCA and LDA on Iris dataset
Theory:
PCA (Principal Component Analysis) is an unsupervised projection onto the directions of
maximum variance, while LDA (Linear Discriminant Analysis) is a supervised projection that
maximizes separation between the known classes. Both are used here to reduce the four Iris
features to two components for plotting.
# Importing required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import train_test_split
# Loading the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Splitting the dataset into training and testing sets for LDA (optional for PCA)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Applying PCA and projecting onto the first two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Plotting the PCA projection, one color per class
plt.figure(figsize=(8, 6))
for i, target_name in enumerate(iris.target_names):
    plt.scatter(X_pca[y == i, 0], X_pca[y == i, 1], label=target_name)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA of Iris Dataset")
plt.legend()
plt.show()
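The aim also calls for LDA, but only the PCA plot appears in this listing; below is a minimal
LDA counterpart, assuming X, y, and iris from above (LDA permits at most n_classes - 1 = 2
components here).

# Projecting onto the two LDA components (supervised: uses the class labels)
lda = LDA(n_components=2)
X_lda = lda.fit_transform(X, y)

plt.figure(figsize=(8, 6))
for i, target_name in enumerate(iris.target_names):
    plt.scatter(X_lda[y == i, 0], X_lda[y == i, 1], label=target_name)
plt.xlabel("LDA Component 1")
plt.ylabel("LDA Component 2")
plt.title("LDA of Iris Dataset")
plt.legend()
plt.show()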
Output:
Experiment 8
Aim:
Program to demonstrate DBSCAN clustering algorithm
Theory:
DBSCAN clusters points that lie in dense regions: a point is a core point if at least
min_samples points fall within distance eps of it, points reachable from core points join the
same cluster, and the remainder are labeled noise (-1). Unlike K-Means, it requires no preset
number of clusters and can recover non-convex shapes such as the two moons used below.
# Importing required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
# Generating a two-moons dataset, a non-convex shape K-Means cannot separate
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Applying DBSCAN
dbscan = DBSCAN(eps=0.2, min_samples=5)
y_dbscan = dbscan.fit_predict(X)

# Plotting each cluster; noise points (label -1) are drawn in gray
plt.figure(figsize=(8, 6))
unique_labels = sorted(set(y_dbscan))
colors = plt.cm.viridis(np.linspace(0, 1, len(unique_labels)))
is_core = np.isin(np.arange(len(X)), dbscan.core_sample_indices_)
for label, col in zip(unique_labels, colors):
    if label == -1:
        col = 'gray'
    class_member_mask = (y_dbscan == label)

    # Plotting the core points
    xy_core = X[class_member_mask & is_core]
    plt.plot(xy_core[:, 0], xy_core[:, 1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=8)

    # Plotting the border and noise points smaller
    xy_other = X[class_member_mask & ~is_core]
    plt.plot(xy_other[:, 0], xy_other[:, 1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=4)
plt.title('DBSCAN Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
Output:
Experiment 9
Aim:
Program to demonstrate K-Medoids clustering algorithm
Theory:
K-Medoids works like K-Means, except that each cluster is represented by an actual data point
(its medoid) rather than a mean, which makes it more robust to outliers. The KMedoids
estimator below comes from the separate scikit-learn-extra package (imported as sklearn_extra).
# Importing required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn_extra.cluster import KMedoids
# Generating a synthetic dataset with four blobs
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

# Applying K-Medoids
kmedoids = KMedoids(n_clusters=4, random_state=0)
y_kmedoids = kmedoids.fit_predict(X)
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=y_kmedoids, s=30, cmap='viridis', marker='o', edgecolor='k')

# Highlighting the medoids: unlike K-Means centroids, each one is a real data point
plt.scatter(kmedoids.cluster_centers_[:, 0], kmedoids.cluster_centers_[:, 1],
            c='red', s=200, marker='X', edgecolor='k', label='Medoids')
plt.title('K-Medoids Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
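Cluster quality can also be checked numerically with the silhouette score; a small sketch
assuming X and y_kmedoids from above.

# Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
from sklearn.metrics import silhouette_score
print("Silhouette score:", silhouette_score(X, y_kmedoids))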
Output:
Experiment 10
Aim:
Program to demonstrate K-Means Clustering Algorithm on Handwritten Dataset
Theory:
K-Means alternates between assigning every point to its nearest centroid and recomputing each
centroid as the mean of its assigned points, until the assignments stop changing. Here it is
applied to scikit-learn's handwritten digits dataset (1,797 8x8 grayscale images), with one
cluster per digit class.
# Importing required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.cluster import KMeans

# Loading the handwritten digits dataset (8x8 grayscale images)
digits = load_digits()
X = digits.data

# Applying K-Means with one cluster per digit class
kmeans = KMeans(n_clusters=10, random_state=0, n_init=10)
y_kmeans = kmeans.fit_predict(X)

# Visualizing each cluster center as an 8x8 image
fig, axes = plt.subplots(2, 5, figsize=(10, 5))
for ax, center in zip(axes.ravel(), kmeans.cluster_centers_):
    ax.imshow(center.reshape(8, 8), cmap='gray')
    ax.axis('off')
fig.suptitle('K-Means Cluster Centers on the Digits Dataset')
plt.show()
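Because load_digits also ships the true labels, the unsupervised clusters can be scored
against them; a small sketch assuming digits and y_kmeans from above.

# Comparing cluster assignments with the true digit labels (labels were not used in fitting)
from sklearn.metrics import adjusted_rand_score
print("Adjusted Rand Index:", adjusted_rand_score(digits.target, y_kmeans))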
Output: