AMITY UNIVERSITY JHARKHAND
LAB MANUAL
Note: The above-mentioned instructions can be modified based on the context of the lab.

Credit Units:
L (Lecture) | T (Tutorial) | P/S (Practical/Studio) | Total Credit Units
3           | -            | 2                      | 4

Evaluation Components:
Components:    Lab Performance | Mid Term | Viva | Attendance | Record | Practical | Viva
Weightage (%): 20              | 10       | 5    | 10         | 5      | 30        | 20
Machine learning is the science of getting computers to act without being explicitly programmed. This
course provides a broad introduction to machine learning, data mining, and statistical pattern recognition.
It will introduce you to a wide range of machine learning tools in Python. The focus is on the concepts,
methods, and applications of general predictive modeling and unsupervised learning, and on how they are
implemented in the Python language environment. The goal is to understand how to use these tools to
solve real-world problems. After this course you will be able to carry out your own experiments with
publicly available algorithms or develop your own algorithms.
Pre-requisites:
List of Experiments
WEEK 1
1. Write a Python program to create a line chart, bar chart, and histogram using matplotlib.
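As a minimal sketch of one way to produce all three charts in a single figure (the data values below are made up for illustration):

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 6)
y = np.array([2, 3, 5, 7, 11])

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].plot(x, y, marker='o')                 # line chart
axes[0].set_title('Line Chart')
axes[1].bar(x, y)                              # bar chart
axes[1].set_title('Bar Chart')
axes[2].hist(np.random.randn(1000), bins=30)   # histogram of random data
axes[2].set_title('Histogram')
plt.tight_layout()
plt.show()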
WEEK 2
1. Write a Python program to create an n x k matrix to represent a linear function that maps
k-dimensional vectors to n-dimensional vectors. Use NumPy to generate a 4x3 matrix with random
integers between 1 and 10.
import numpy as np

n = 4
k = 3
matrix = np.random.randint(1, 11, size=(n, k))
print(matrix)
import numpy as np

random_number = np.random.rand()
print(random_number)  # Output: a random float in the range [0, 1)
WEEK 3
1. A psychologist is observing eating behaviour in 131 children aged 3 years old from Ranchi. He
presents each child 20 new foods which they have never eaten before. He then records the number
of foods they try. The results are shown in the table below. Previous research with thousands of
children from across the country has shown that we expect 40 % of young children to try 0 to 5 new
foods, 30% to try 6 to 10 new foods, 20% to try 11 to 15 new foods and 10 % to try 16 to 20 new
foods.
Perform a chi square test to see if the children from Ranchi follow the same distribution that the
research on Indian children for significance level 5% 3 degrees of freedom (7.815).
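A minimal sketch of the test using scipy.stats.chisquare. The observed counts below are placeholders, since the original table is not reproduced here; substitute the recorded values for the 131 children.

import numpy as np
from scipy.stats import chisquare

observed = np.array([40, 45, 30, 16])                  # hypothetical counts per category (replace with table data)
expected = 131 * np.array([0.40, 0.30, 0.20, 0.10])    # expected counts under the national distribution

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"Chi-square statistic: {stat:.3f}, p-value: {p_value:.4f}")
# Reject H0 at the 5% level if the statistic exceeds 7.815 (critical value, df = 3)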
Dataset: customer_churn-1.csv
1. Start off by importing the customer_churn.csv file in the jupyter notebook and store that in churn
DataFrame.
2. From the churn DataFrame, select only the 3rd, 7th, 9th, and 20th columns (all rows) and store
the result in a new DataFrame named newCols.
3. From the original DataFrame, select only the rows from the 200th index till the 1000th
index (inclusive).
4. Now select the rows from the 20th index till the 200th index (exclusive), and the columns from
the 2nd index till the 15th index.
5. Display the top 100 records from the original DataFrame.
6. Display the last 10 records from the DataFrame.
7. Display the last record from the DataFrame.
8. Now from the churn DataFrame, sort the data by the tenure column in descending order.
9. Fetch all the records that satisfy the following conditions:
   a. Tenure > 50 and gender as 'Female'
   b. Gender as 'Male' and SeniorCitizen as 0
   c. TechSupport as 'Yes' and Churn as 'No'
   d. Contract type as 'Month-to-month' and Churn as 'Yes'
10. Use a for loop to calculate the number of customers who are getting tech support and are male
senior citizens.
11. Write a Python program to manipulate and rescale the following data using pandas and
scikit-learn (a sketch covering several of these tasks follows the code below):
import pandas as pd

data = {'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
print(df)
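A hedged sketch of some of the tasks above. The column names (tenure, gender, SeniorCitizen, TechSupport) follow the problem statement; verify them against the actual CSV before running.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

churn = pd.read_csv('customer_churn.csv')                      # task 1
newCols = churn.iloc[:, [2, 6, 8, 19]]                         # task 2: 3rd, 7th, 9th, 20th columns (0-based positions)
rows_slice = churn.iloc[200:1001]                              # task 3: rows 200..1000 inclusive
block = churn.iloc[20:200, 2:15]                               # task 4
sorted_churn = churn.sort_values('tenure', ascending=False)    # task 8
cond_a = churn[(churn['tenure'] > 50) & (churn['gender'] == 'Female')]  # task 9a

count = 0
for _, row in churn.iterrows():                                # task 10
    if row['TechSupport'] == 'Yes' and row['gender'] == 'Male' and row['SeniorCitizen'] == 1:
        count += 1
print("Male senior citizens with tech support:", count)

# Task 11: rescale the toy DataFrame with scikit-learn
data = {'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
df_scaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)
print(df_scaled)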
1. Write a Python script using Scrapy to scrape the titles and prices of books from the sample book
store website http://books.toscrape.com.
Instructions:
1. Setup Scrapy Project:
   o Install Scrapy if you haven't already: pip install scrapy
   o Create a new Scrapy project: scrapy startproject bookscraper
   o Navigate to the project directory: cd bookscraper
   o Generate a new spider: scrapy genspider books books.toscrape.com
2. Define the Spider:
o Open the books_spider.py file in the spiders directory.
o Modify the spider to scrape book titles and prices.
Sample Code:
# bookscraper/spiders/books_spider.py
import scrapy

class BooksSpider(scrapy.Spider):
    name = "books"
    start_urls = ['http://books.toscrape.com']

    def parse(self, response):
        for book in response.css('article.product_pod'):
            yield {
                'title': book.css('h3 a::attr(title)').get(),
                'price': book.css('div.product_price p.price_color::text').get(),
            }
Additional Questions:
1. Explain the purpose of each part of the spider code.
2. Modify the spider to also scrape the book's availability status.
3. How would you handle potential issues such as missing data or pagination errors?
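A hedged sketch addressing questions 2 and 3. The availability selector and next-page link below match books.toscrape.com's markup at the time of writing, but you should verify them in your browser's inspector.

    def parse(self, response):
        for book in response.css('article.product_pod'):
            avail = book.css('p.instock.availability::text').getall()
            yield {
                'title': book.css('h3 a::attr(title)').get(),
                # .get(default=...) guards against missing fields (question 3)
                'price': book.css('div.product_price p.price_color::text').get(default='N/A'),
                # availability lives in <p class="instock availability"> (question 2)
                'availability': avail[-1].strip() if avail else None,
            }
        # Pagination (question 3): .get() returns None on the last page,
        # so the spider stops cleanly instead of raising an error
        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)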
Step-by-Step Guide:
1. Setup Scrapy Project:
   o Open your terminal and run: pip install scrapy
# Drop features (tail of a helper that returns the reduced DataFrame
# and the list of dropped columns)
# df_reduced = df.drop(columns=to_drop)
# return df_reduced, to_drop

# Train and evaluate the model using the original features
mse_original, r2_original = train_and_evaluate(X_train_orig, X_test_orig, y_train, y_test)
print("Original Features - MSE: {:.4f}, R²: {:.4f}".format(mse_original, r2_original))

# Train and evaluate the model using the reduced features
mse_reduced, r2_reduced = train_and_evaluate(X_train_red, X_test_red, y_train, y_test)
print("Reduced Features - MSE: {:.4f}, R²: {:.4f}".format(mse_reduced, r2_reduced))

def train_and_evaluate_lasso(X_train, X_test, y_train, y_test, alpha=0.1):
    model = Lasso(alpha=alpha)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    return mse, r2

# Train and evaluate the Lasso model using the original features
mse_original_lasso, r2_original_lasso = train_and_evaluate_lasso(X_train_orig, X_test_orig, y_train, y_test)
print("Lasso with Original Features - MSE: {:.4f}, R²: {:.4f}".format(mse_original_lasso, r2_original_lasso))

# Train and evaluate the Lasso model using the reduced features
mse_reduced_lasso, r2_reduced_lasso = train_and_evaluate_lasso(X_train_red, X_test_red, y_train, y_test)
print("Lasso with Reduced Features - MSE: {:.4f}, R²: {:.4f}".format(mse_reduced_lasso, r2_reduced_lasso))

# Predictions with the fitted Lasso model
y_pred_lasso = lasso_model.predict(X_test)

# Predictions with the fitted Ridge model
y_pred_ridge = ridge_model.predict(X_test)
WEEK 8
Python Example: MLE for Bivariate Gaussian Distribution
We'll simulate a dataset representing two features, which could correspond to the sizes and weights in the
previous example, and perform MLE to estimate the parameters of the bivariate Gaussian distribution.
Step-by-step Explanation:
1. Generate Data: create synthetic data for two features.
2. Compute Mean: calculate the sample mean of each feature.
3. Compute Covariance Matrix: manually calculate the covariance matrix.
4. MLE Estimation: use the computed mean and covariance as the MLE estimates.
def compute_covariance(data, mean):
    n_samples = data.shape[0]
    deviations = data - mean
    covariance_matrix = np.dot(deviations.T, deviations) / n_samples
    return covariance_matrix
mu and Sigma: These are the parameters for the mean and covariance of the distribution. You can modify
these to see how they affect the distribution’s shape and orientation.
np.random.multivariate_normal: Generates random data points based on the specified mean and
covariance.
sns.kdeplot: Adds a Kernel Density Estimate (KDE) plot that shows the distribution's density with contour
lines.
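A minimal end-to-end sketch of the MLE steps above. The mu and Sigma values are made-up "true" parameters used only to simulate data; note the MLE covariance divides by n rather than n-1.

import numpy as np

mu = np.array([5.0, 2.0])                    # true mean (e.g., sizes and weights)
Sigma = np.array([[1.0, 0.6], [0.6, 0.5]])   # true covariance
data = np.random.multivariate_normal(mu, Sigma, size=1000)

mean_mle = data.mean(axis=0)                 # MLE of the mean
deviations = data - mean_mle
cov_mle = deviations.T @ deviations / data.shape[0]  # MLE covariance (divides by n)

print("Estimated mean:", mean_mle)
print("Estimated covariance:\n", cov_mle)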
Week 9
Write Python code to implement correlation, covariance, Mahalanobis distance, Minkowski distance,
general distance metrics, the Jaccard coefficient, handling of missing values, feature transformations,
and the geometrical interpretation of Euclidean distance.
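The sections below cover most of these topics; missing-value handling has no code section of its own, so here is a minimal pandas sketch (the toy DataFrame is made up for illustration):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, np.nan]})
print(df.isna().sum())            # count missing values per column
df_filled = df.fillna(df.mean())  # impute with the column mean
df_dropped = df.dropna()          # or drop incomplete rows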
Mahalanobis Distance
# Calculate the Mahalanobis distance between the first and second sample
mean = np.mean(X, axis=0)
cov_matrix = np.cov(X.T)
inv_cov_matrix = np.linalg.inv(cov_matrix)
mahal_dist = mahalanobis(X.iloc[0], X.iloc[1], inv_cov_matrix)
print(f"Mahalanobis Distance between the first and second sample: {mahal_dist}")
Minkowski Distance
# Calculate the Minkowski distance (p=3) between the first and second sample
minkowski_dist = minkowski(X.iloc[0], X.iloc[1], p=3)
print(f"Minkowski Distance (p=3) between the first and second sample: {minkowski_dist}")
Jaccard Coefficient
# The Jaccard coefficient is usually used for binary data; we'll create a simple example.
# Example binary data
binary_data1 = np.array([0, 1, 1, 0, 1])
binary_data2 = np.array([1, 1, 0, 0, 1])

# Calculate the Jaccard coefficient
# (note: scipy's jaccard() actually returns the Jaccard dissimilarity, i.e. 1 - coefficient)
jaccard_coeff = jaccard(binary_data1, binary_data2)
print(f"Jaccard Coefficient between binary data samples: {jaccard_coeff}")
Feature Transformations
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Normalization (Min-Max scaling)
scaler = MinMaxScaler()
X_normalized = scaler.fit_transform(X)

# Standardization
standardizer = StandardScaler()
X_standardized = standardizer.fit_transform(X)

print("First 5 samples after Min-Max Scaling:\n", X_normalized[:5])
print("First 5 samples after Standardization:\n", X_standardized[:5])
# Plot the Euclidean distance between the first and second sample
point1 = X_2d.iloc[0]
point2 = X_2d.iloc[1]
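A sketch completing the fragment above. X_2d is assumed to be a two-column slice of X (for example, X.iloc[:, :2]); the plot shows the distance as the straight-line segment between the two points.

import numpy as np
import matplotlib.pyplot as plt

euclid = np.linalg.norm(point1 - point2)   # geometric (straight-line) distance
plt.scatter(X_2d.iloc[:, 0], X_2d.iloc[:, 1], alpha=0.3)
plt.plot([point1.iloc[0], point2.iloc[0]], [point1.iloc[1], point2.iloc[1]], 'r-')
plt.title(f"Euclidean distance = {euclid:.3f}")
plt.show()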
Ridge Regression, also known as Tikhonov regularization, is a technique used to analyze multiple
regression data that suffer from multicollinearity. By adding a degree of bias to the regression
estimates, ridge regression reduces the standard errors.
print(X.head())
print(y.head())

Step 3: Split the Data into Training and Testing Sets
Next, we split the data into training and testing sets.

# You can change the alpha value to tune the regularization strength
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

# Make predictions
y_pred_train = ridge.predict(X_train)
y_pred_test = ridge.predict(X_test)
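Since the dataset for this step is not shown here, a self-contained sketch on synthetic data with correlated features (the situation where ridge helps) might look like:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=200)   # near-duplicate feature (multicollinearity)
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
y_pred = ridge.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))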
Lasso Regression (Least Absolute Shrinkage and Selection Operator) is a type of linear regression that
uses L1 regularization. The L1 regularization adds a penalty equal to the absolute value of the magnitude
of coefficients. This type of regression can shrink some coefficients to zero, effectively performing
variable selection.
print(X.head())
print(y.head())

Step 3: Split the Data into Training and Testing Sets
Next, we split the data into training and testing sets.

# Fit the Lasso model, then make predictions
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
y_pred_train = lasso.predict(X_train)
y_pred_test = lasso.predict(X_test)
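To see the variable-selection behaviour described above, a small synthetic sketch (the data is made up so that only two features are informative) shows most coefficients shrunk exactly to zero:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 5 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=200)  # only 2 informative features

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print("Coefficients:", np.round(lasso.coef_, 3))  # most should be exactly 0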
Correlation and Covariance
# Correlation matrix
correlation_matrix = X.corr()
print("Correlation Matrix:\n", correlation_matrix)
# Covariance matrix
covariance_matrix = X.cov()
print("Covariance Matrix:\n", covariance_matrix)
LOGISTIC REGRESSION
Problem Statement: You work in XYZ Company. The company officials have collected some data on
health parameters based on diabetes and wish for you to create a model from it.
Dataset: diabetes.csv
Tasks to Be Performed:
• Load the dataset using pandas
• Extract the data from the Outcome column into a variable named Y
• Extract the data from every column except the Outcome column into a variable named X
• Divide the dataset into two parts for training and testing in 80% and 20% proportion
• Create and train a Logistic Regression model on the training set
• Make predictions on the testing set using the trained model
• Check the performance by calculating the confusion matrix and accuracy score of the model (a sketch follows this list)
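A minimal sketch of the tasks above, assuming diabetes.csv has an 'Outcome' column (as in the common Pima diabetes dataset; adjust the name to match your file):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

df = pd.read_csv('diabetes.csv')
Y = df['Outcome']
X = df.drop(columns=['Outcome'])

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

print(confusion_matrix(Y_test, Y_pred))
print("Accuracy:", accuracy_score(Y_test, Y_pred))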
We aim to classify whether a tumor is malignant or benign based on features such as mean radius, mean
texture, mean perimeter, mean area, and mean smoothness. We'll use a Decision Tree classifier to model
this relationship and evaluate its performance.
A Decision Tree classifier splits the data at each node based on the feature that provides the best split
according to a certain criterion (e.g., Gini impurity or information gain). The process continues
recursively, creating a tree structure where each leaf node represents a class label.
First, we'll load the Breast Cancer dataset. This dataset is available in the sklearn.datasets module.
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
print(X.head())
print(y.head())

Step 2: Preprocess the Data
Step 3: Split the Data into Training and Testing Sets
Next, we split the data into training and testing sets.
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
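The training step itself is not shown above; a self-contained sketch of the full Decision Tree workflow (Gini impurity is scikit-learn's default criterion, and max_depth=4 is an arbitrary illustrative choice):

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(criterion='gini', max_depth=4, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))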
Problem Statement
We aim to classify different types of wine into three classes based on 13 chemical attributes such as
alcohol, malic acid, ash, alcalinity of ash, magnesium, total phenols, flavanoids, and others. We'll use an
SVM classifier to model this relationship and evaluate its performance.
Support Vector Machine Overview
Support Vector Machines find the hyperplane that maximizes the margin between different classes. For
non-linearly separable data, SVM uses kernel tricks to transform the data into a higher-dimensional space
where a linear separator can be found. Common kernels include linear, polynomial, and radial basis
function (RBF).
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
print(X.head())
print(y.head())
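A minimal sketch of the SVM workflow described above (the RBF kernel and C=1.0 are illustrative defaults, not tuned values):

import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

X_scaled = StandardScaler().fit_transform(X)   # SVMs are sensitive to feature scale
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

svm = SVC(kernel='rbf', C=1.0, gamma='scale')
svm.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, svm.predict(X_test)))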
KMEANS CLUSTERING
Problem Statement
We aim to cluster iris flowers into three groups based on four features: sepal length, sepal width, petal
length, and petal width. We'll use the K-Means clustering algorithm to achieve this and evaluate the
results.
K-Means clustering aims to partition n observations into k clusters in which each observation belongs to
the cluster with the nearest mean. The algorithm iteratively updates the centroids of the clusters and
assigns data points to the nearest cluster until convergence.
Step 1: Load the Dataset
First, we'll load the Iris dataset. This dataset is available in the sklearn.datasets module.
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
print(X.head())
print(y.head())
print(labels)
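The clustering step that produces labels is not shown above; a self-contained sketch of the K-Means workflow (n_clusters=3 matches the three iris species, and the silhouette score evaluates cluster quality):

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)

X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

print("Cluster labels:", labels[:10])
print("Silhouette score:", silhouette_score(X_scaled, labels))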
We aim to classify handwritten digits (0-9) based on pixel values of 8x8 images. We'll use an MLP
classifier to model this relationship and evaluate its performance.
A Multi-layer Perceptron consists of an input layer, one or more hidden layers, and an output layer. Each
neuron in a layer is connected to all neurons in the next layer. During training, the model adjusts the
weights using backpropagation, which involves calculating the gradient of the loss function with respect
to each weight and updating the weights accordingly.
First, we'll load the digits dataset. This dataset is available in the sklearn.datasets module.
import numpy as np
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
print(X.head())
print(y.head())
We will standardize the features to ensure they all have the same scale.
# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
We'll evaluate the model's performance using accuracy, classification report, and confusion matrix.
# Make predictions
y_pred_train = mlp.predict(X_train)
y_pred_test = mlp.predict(X_test)
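A self-contained sketch tying the MLP steps above together (the single hidden layer of 64 neurons and max_iter=500 are illustrative choices, not tuned hyperparameters):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

digits = load_digits()
X_scaled = StandardScaler().fit_transform(digits.data)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, digits.target, test_size=0.2, random_state=42)

mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=42)
mlp.fit(X_train, y_train)   # weights are updated by backpropagation
print("Test accuracy:", accuracy_score(y_test, mlp.predict(X_test)))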
********
Textbooks:
• The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies, by Erik
Brynjolfsson and Andrew McAfee. ISBN-10: 0393239357
• Getting Started with the Internet of Things, by Cuno Pfister, Shroff; First edition (17 May 2011).
ISBN-10: 9350234130
• Big Data and The Internet of Things, by Robert Stackowiak and Art Licht, Springer Nature; 1st edition
(12 May 2015). ISBN-10: 1484209877