Simple linear regression
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def generate_dataset(n_samples=100):
    np.random.seed(42)
    X = 2 * np.random.rand(n_samples, 1)
    y = 3 * X + 4 + np.random.randn(n_samples, 1)
    return X, y

class SimpleLinearRegression:
    def __init__(self):
        self.slope = None
        self.intercept = None

    def fit(self, X, y):
        # Closed-form least-squares estimates for the slope and intercept
        X_mean = np.mean(X)
        y_mean = np.mean(y)
        numerator = np.sum((X - X_mean) * (y - y_mean))
        denominator = np.sum((X - X_mean) ** 2)
        self.slope = numerator / denominator
        self.intercept = y_mean - self.slope * X_mean

    def predict(self, X):
        return self.slope * X + self.intercept

if __name__ == "__main__":
    X, y = generate_dataset()
    dataset = pd.DataFrame({
        "X": X.flatten(),
        "y": y.flatten()
    })
    print("Dataset:")
    print(dataset)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = SimpleLinearRegression()
    model.fit(X_train.flatten(), y_train.flatten())

    y_pred = model.predict(X_test.flatten())
    mse = mean_squared_error(y_test, y_pred)

    print(f"Model Coefficients: Slope = {model.slope:.2f}, Intercept = {model.intercept:.2f}")
    print(f"Mean Squared Error on Test Set: {mse:.2f}")

    plt.scatter(X, y, color="blue", label="Actual Data")
    plt.plot(X, model.predict(X.flatten()), color="red", label="Regression Line")
    plt.xlabel('X')
    plt.ylabel('y')
    plt.title('Simple Linear Regression')
    plt.legend()
    plt.show()
Explanation:
Step-by-step breakdown:
• Step 1: Importing Libraries
o numpy: Used for generating synthetic data and performing numerical operations.
o pandas: Used to display the generated dataset as a DataFrame.
o matplotlib.pyplot: Used for visualizing the data and the regression line.
o train_test_split: Splits the dataset into training and testing sets.
o mean_squared_error: Used to evaluate the performance of the model by computing the mean squared error.
• Step 2: Generating Synthetic Data
o We generate synthetic data using the equation y = 3x + 4 with some added Gaussian noise. This helps simulate real-world data where the relationship between variables is linear but contains some randomness.
o X contains the feature values (input), and y contains the target values (output).
• Step 3: Splitting Data
o train_test_split() divides the data into training and testing sets. 80% of the data is
used for training, and 20% is used for testing.
• Step 4: Initializing the Model
o We create an instance of the SimpleLinearRegression class, which implements a least-squares linear fit from scratch.
• Step 5: Training the Model
o model.fit(X_train.flatten(), y_train.flatten()) fits the model to the training data, learning the coefficients (slope and intercept) that best describe the linear relationship between X and y.
• Step 6: Making Predictions
o y_pred = model.predict(X_test.flatten()) predicts the target values (y_pred) for the test data (X_test).
• Step 7: Evaluating the Model
o Mean Squared Error (MSE) is used to measure how well the model fits the data. A lower MSE indicates a better fit.
o R-squared measures the proportion of the variance in the target variable that is predictable from the features. A value closer to 1 indicates a good fit (a short sketch computing both metrics follows this list).
• Step 8: Visualizing the Results
o We use matplotlib to plot the data points (X vs. y) together with the regression line that represents the model's predictions.
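A minimal sketch of the evaluation in Step 7, assuming the y_test and y_pred variables from the script above are in scope; r2_score is scikit-learn's R-squared helper:

from sklearn.metrics import mean_squared_error, r2_score

# Assumes y_test and y_pred come from the script above
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)  # 1.0 would mean the line explains all of the variance
print(f"MSE = {mse:.2f}, R^2 = {r2:.2f}")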
How Linear Regression Works:
Linear regression attempts to model the relationship between a dependent variable y and an
independent variable X by fitting a straight line to the data. The relationship is described by the
equation:
y = β₀ + β₁·X
Where:
• y is the target variable (output),
• X is the input feature (independent variable),
• β₀ is the intercept (where the line crosses the y-axis),
• β₁ is the slope of the line.
The goal of the algorithm is to find the values of β₀ and β₁ that minimize the difference between the predicted values y_pred and the actual values y (using a loss function like Mean Squared Error).
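The program above computes β₀ and β₁ with the closed-form least-squares formulas. As an illustrative alternative, the same Mean Squared Error can be minimized iteratively with gradient descent; the learning rate, iteration count, and variable names below are assumptions made for this sketch, not part of the original program:

import numpy as np

def fit_gradient_descent(X, y, lr=0.05, n_iters=2000):
    # Iteratively minimize MSE for y ≈ b0 + b1 * X (illustrative hyperparameters)
    b0, b1 = 0.0, 0.0
    n = len(X)
    for _ in range(n_iters):
        y_hat = b0 + b1 * X
        error = y_hat - y
        b0 -= lr * (2 / n) * np.sum(error)        # gradient of MSE w.r.t. b0
        b1 -= lr * (2 / n) * np.sum(error * X)    # gradient of MSE w.r.t. b1
    return b0, b1

# Data generated the same way as generate_dataset() above
X_demo = 2 * np.random.rand(100)
y_demo = 3 * X_demo + 4 + np.random.randn(100)
print(fit_gradient_descent(X_demo, y_demo))  # should approach (intercept ≈ 4, slope ≈ 3)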
KNN program
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k
        self.X_train = None
        self.y_train = None

    def fit(self, X, y):
        # KNN is a lazy learner: "fitting" simply stores the training data
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        predictions = []
        for x in X:
            # Euclidean distance from x to every training point
            distances = np.linalg.norm(self.X_train - x, axis=1)
            # Indices of the k nearest neighbors
            nearest_indices = distances.argsort()[:self.k]
            nearest_labels = self.y_train[nearest_indices]
            # Majority vote among the k nearest labels (labels must be non-negative ints)
            prediction = np.bincount(nearest_labels).argmax()
            predictions.append(prediction)
        return np.array(predictions)

iris = load_iris()
iris_df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                       columns=iris['feature_names'] + ['target'])

X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

knn = KNNClassifier(k=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

print("\nPredictions:")
for i, (true_label, pred_label) in enumerate(zip(y_test, y_pred)):
    status = "Correct" if true_label == pred_label else "Incorrect"
    print(f"Test Sample {i + 1}: True Label = {true_label}, Predicted = {pred_label}, {status}")
Explanation:
• Step 1: Importing Libraries
o numpy: Used for handling arrays and matrix operations.
o pandas: Used to view the Iris data as a DataFrame.
o train_test_split: This function splits the dataset into training and testing subsets.
o load_iris: A function to load the Iris dataset, a classic dataset used for classification tasks.
o accuracy_score: This function calculates the accuracy of predictions by comparing the predicted labels to the true labels.
• Step 2: Loading the Dataset
o We use load_iris() to load the Iris dataset, which is a simple classification
dataset where the goal is to predict the type of iris flower (Setosa, Versicolour, or
Virginica) based on four features (sepal length, sepal width, petal length, petal
width).
• Step 3: Splitting the Data
o train_test_split() splits the data into a training set and a test set (80% for training
and 20% for testing in this case). It shuffles the data and ensures that we
evaluate the model on unseen data.
• Step 4: Creating the KNN Classifier
o We create an instance of our KNNClassifier class with k=3. This means that the class of a new data point will be predicted from the majority class among its 3 nearest neighbors.
• Step 5: Training the Model
o knn.fit(X_train, y_train) trains the model using the training dataset (X_train as
input features and y_train as target labels).
• Step 6: Making Predictions
o knn.predict(X_test) makes predictions on the test data based on the trained
model. The X_test is the feature set for which we want to predict the class labels.
• Step 7: Evaluating the Model
o accuracy_score(y_test, y_pred) compares the predicted labels (y_pred) with the
true labels (y_test) and calculates the accuracy.
How KNN Works:
KNN is a simple yet powerful classification algorithm:
• For each test data point:
1. It calculates the distance (usually Euclidean distance) from that point to every
other point in the training set.
2. Then, it selects the k nearest points.
3. The majority class among the k nearest neighbors is taken as the prediction for
the test data point.
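For comparison, a minimal sketch of the same workflow using scikit-learn's built-in KNeighborsClassifier, whose n_neighbors parameter plays the role of k in the from-scratch class above:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# n_neighbors=3 mirrors k=3 in the from-scratch KNNClassifier above
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, clf.predict(X_test)) * 100:.2f}%")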
K-means program
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

def initialize_centroids(X, k):
    # Pick k distinct data points at random as the starting centroids
    return X[np.random.choice(X.shape[0], k, replace=False)]

def compute_distance(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def assign_clusters(X, centroids):
    clusters = []
    for point in X:
        distances = [compute_distance(point, centroid) for centroid in centroids]
        cluster = np.argmin(distances)  # Index of the nearest centroid
        clusters.append(cluster)
    return np.array(clusters)

def update_centroids(X, clusters, k):
    new_centroids = np.zeros((k, X.shape[1]))
    for i in range(k):
        new_centroids[i] = np.mean(X[clusters == i], axis=0)
    return new_centroids

def k_means(X, k, max_iters=100, tolerance=1e-4):
    centroids = initialize_centroids(X, k)
    for i in range(max_iters):
        clusters = assign_clusters(X, centroids)
        new_centroids = update_centroids(X, clusters, k)
        # Stop once the centroids move by less than the tolerance
        if np.all(np.abs(new_centroids - centroids) < tolerance):
            print(f"Converged at iteration {i}")
            break
        centroids = new_centroids
    return centroids, clusters

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
k = 4
centroids, clusters = k_means(X, k)

plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap='viridis')
plt.scatter(centroids[:, 0], centroids[:, 1], s=300, c='red', marker='X')
plt.title("K-Means Clustering (from scratch)")
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
Explanation:
Step-by-step breakdown:
• Step 1: Importing Libraries
o numpy: For handling numerical data and matrix operations.
o matplotlib.pyplot: For visualizing the data points and clusters.
o make_blobs: A function to generate synthetic data with a specified number of clusters.
• Step 2: Generate Data
o make_blobs() generates synthetic data with 300 samples, 4 centers (clusters), and a fixed spread (cluster_std=0.60). This simulates a real-world clustering problem.
o X holds the generated data points; the true labels returned by make_blobs are discarded (assigned to _), since K-Means is an unsupervised learning algorithm and does not use them.
• Step 3: Running K-Means from Scratch
o k_means(X, k) implements the algorithm directly: initialize_centroids() picks k random data points as the starting centroids, assign_clusters() assigns every point to its nearest centroid, and update_centroids() recomputes each centroid as the mean of the points assigned to it.
• Step 4: Checking Convergence
o The loop repeats the assignment and update steps for up to max_iters iterations and stops early once the centroids move by less than the tolerance, i.e. when the algorithm has converged.
• Step 5: Get Cluster Centroids and Labels
o centroids: the first array returned by k_means() gives the coordinates of the centroids (the centers) of each of the 4 clusters.
o clusters: the second array returned by k_means() gives the predicted labels (cluster assignments) for each data point. Each data point is assigned the index of the cluster it belongs to.
• Step 6: Visualize Clusters
o The plt.scatter() calls visualize the clusters by coloring each data point according to its assigned cluster, with the centroids highlighted in red using an 'X' marker.
o This plot helps us visually confirm the clusters formed by the K-Means algorithm.
How K-Means Works:
• Initialization:
o K-Means starts by randomly initializing k centroids (cluster centers).
• Iteration:
1. Assigning Labels: For each data point, it computes the distance from the point to each
centroid and assigns the point to the nearest centroid (i.e., the cluster).
2. Recalculating Centroids: After assigning labels to all points, it recalculates the centroids
by averaging the points within each cluster.
3. Repeat: Steps 1 and 2 are repeated iteratively until the centroids no longer change (i.e.,
convergence is reached).
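For comparison, a minimal sketch using scikit-learn's KMeans, which runs the same assign/recalculate loop internally (by default it uses the smarter k-means++ initialization rather than purely random centroids):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# n_clusters=4 mirrors k=4 in the from-scratch version above
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
kmeans.fit(X)

print(kmeans.cluster_centers_)  # coordinates of the 4 centroids
print(kmeans.labels_[:10])      # cluster assignments of the first 10 points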
Naïve Bayes Program
import numpy as np
from sklearn.datasets import make_classification

class NaiveBayes:
    def __init__(self):
        self.class_probs = {}
        self.class_means = {}
        self.class_vars = {}

    def fit(self, X, y):
        # Get unique class labels
        classes = np.unique(y)
        # Prior probability P(class) for each class
        for c in classes:
            self.class_probs[c] = np.mean(y == c)
        # Per-class feature means and variances for the Gaussian likelihood
        for c in classes:
            X_c = X[y == c]
            self.class_means[c] = np.mean(X_c, axis=0)
            self.class_vars[c] = np.var(X_c, axis=0)

    def gaussian_pdf(self, x, mean, var):
        return (1 / np.sqrt(2 * np.pi * var)) * np.exp(-(x - mean) ** 2 / (2 * var))

    def predict(self, X):
        predictions = []
        for sample in X:
            class_probs = {}
            for c in self.class_probs:
                prob = np.log(self.class_probs[c])  # Log prior P(class)
                # Add the log likelihood of each feature under the class Gaussian
                for i in range(len(sample)):
                    prob += np.log(self.gaussian_pdf(sample[i],
                                                     self.class_means[c][i],
                                                     self.class_vars[c][i]))
                class_probs[c] = prob
            predicted_class = max(class_probs, key=class_probs.get)
            predictions.append(predicted_class)
        return np.array(predictions)

# n_informative and n_redundant are set explicitly so the 2 requested features are valid
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=2, random_state=42)

nb = NaiveBayes()
nb.fit(X, y)

predictions = nb.predict(X)
accuracy = np.mean(predictions == y)
print(f"Accuracy: {accuracy * 100:.2f}%")
Explanation:
Step-by-step breakdown:
• Step 1: Importing Libraries
o numpy: Used for the numerical work: class priors, per-class means and variances, the Gaussian density, and the accuracy computation.
o make_classification: A scikit-learn helper that generates a synthetic binary classification dataset to train and evaluate the classifier on.
• Step 2: Generating the Dataset
o make_classification(n_samples=200, n_features=2, n_classes=2, ...) creates 200 samples, each with 2 features (X) and a binary target label (y).
• Step 3: Initializing the Naive Bayes Model
o NaiveBayes() initializes the from-scratch classifier, which assumes the features within each class are normally distributed (Gaussian distribution).
• Step 4: Training the Model
o nb.fit(X, y) estimates, for every class, the prior probability P(class) and the per-feature mean and variance used by the Gaussian likelihood.
• Step 5: Making Predictions
o nb.predict(X) computes the log posterior of every class for each sample and predicts the class with the highest score.
• Step 6: Evaluating the Model
o np.mean(predictions == y) compares the predicted labels with the true labels and reports the fraction that match. Since no train/test split is used here, this is the training accuracy.
How Naive Bayes Works:
Naive Bayes is a probabilistic classifier based on Bayes' Theorem, with the "naive" assumption
that all features are independent given the class label. It works by computing the probability of
each class given the features and predicting the class with the highest probability.
• Bayes' Theorem: P(C|X) = P(X|C) · P(C) / P(X), where:
o P(C|X) is the posterior probability of class C given the features X.
o P(X|C) is the likelihood of the features X given the class C.
o P(C) is the prior probability of class C.
o P(X) is the evidence: the probability of the features X.
In practice, Naive Bayes estimates the probability of each class by assuming that the features
are conditionally independent. For the Gaussian Naive Bayes (used here), it assumes the
features are normally distributed and uses the mean and variance of each feature to calculate
the likelihood.
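For comparison, a minimal sketch of the same idea with scikit-learn's built-in GaussianNB, which likewise estimates a per-class mean and variance for every feature; the data-generation parameters mirror the from-scratch example above:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Same kind of synthetic binary dataset as in the from-scratch example
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=2, random_state=42)

clf = GaussianNB()
clf.fit(X, y)
accuracy = np.mean(clf.predict(X) == y)
print(f"Accuracy: {accuracy * 100:.2f}%")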