
Practical File

Of
Machine Learning

Submitted to:
Er. Zubair Fayaz
Dept. of Computer Science & Engineering

Submitted by:
Name: Diljeet Singh
Class: B-Tech (Sem-4) AI-ML
AUID: 237106007

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

AKAL UNIVERSITY, TALWANDI SABO

May 2025

TABLE OF CONTENTS

1. Implementation of vector algebra in machine learning
2. Implementation of matrix algebra in machine learning
3. Implementation of various data preprocessing steps in Python
4. Implementation of simple linear regression in Python
5. Implementation of multiple linear regression in Python
6. Implementation of Support Vector Regression using Python
7. Implementation of Decision Tree Regression using Python
8. Implementation of Random Forest classification using Python
9. Implementation of Random Forest regression using Python
10. Implementation of Logistic Regression using Python
11. Implementation of KNN regression in Python
12. Implementation of clustering with k-means in Python
13. Implementation of agglomerative hierarchical clustering in Python
14. Implementation of Naïve Bayes using Python
15. Implementation of hierarchical clustering using Python
16. Implementation of Ridge and Lasso Regression using Python
17. Implementation of DBSCAN using Python
18. Implementation of K-Means clustering using Python

Practical-1
AIM: Implementation of vector algebra using Python.

1. # import the necessary library and create two vectors
import numpy as np
a = np.array([2, 3, 7])
b = np.array([1, 2, 3])

Output:

2. # vector addition
print(a + b)

Output:

3. # vector subtraction
print(a - b)

Output:

4. # element-wise vector multiplication
print(a * b)

Output:

5. # element-wise vector division
print(a / b)

Output:

6. # vector scalar multiplication
print(5 * a)

Output:

7. # element-wise vector exponentiation
print(a ** b)

Output:

8. # vector dot product
print(np.dot(a, b))

Output:

9. # vector cross product
print(np.cross(a, b))

Output:

10. # vector norm
print(np.linalg.norm(a))

Output:

Practical-2
AIM: Implementation of basic matrix algebra using Python.

1. # create two matrices
import numpy as np
a = np.array([[1, 3, 5], [2, 4, 7], [4, 9, 2]])
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(a)
print(b)

Output:

2. # matrix addition
print(np.add(a, b))

Output:

3. # matrix subtraction
print(np.subtract(a, b))

Output:

4. # matrix scalar multiplication
print(np.multiply(5, a))

Output:

5. # matrix-vector multiplication
v = np.array([[1], [3], [3]])
print(np.dot(a, v))

Output:

6. # matrix multiplication (three equivalent ways)
print(np.matmul(a, b))
print(np.dot(a, b))
result = a @ b
print(result)

Output:

7. # determinant
print(np.linalg.det(a))

Output:

8. # transpose
print(np.transpose(a))

Output:

9. # inverse
print(np.linalg.inv(a))

Output:

Practical-3
AIM: Implementation of various data preprocessing steps in Python.

Description:
1) Importing the libraries
2) Importing the dataset
3) Taking care of missing data
4) Encoding the categorical data
5) Feature scaling (normalization and standardization)

# importing the dataset
import pandas as pd
data = pd.read_csv("employees.csv")
data

# checking missing data
data.isnull()

# check the number of null values in each column
data.isnull().sum()

# total number of null values in the dataset
data.isnull().sum().sum()

# total number of not-null values in the dataset
data.notnull().sum().sum()

# drop all rows with missing values
data.dropna()

# drop null values from a particular column
data["Gender"].dropna()

# dropping a column
data.drop("Gender", axis=1)

# dropping a row
data.drop(0, axis=0)

# using dropna() with any()/all()
data.dropna().any()
data.dropna().all()

# dropping rows on the basis of a particular keyword
data[data["Team"].str.contains("Marketing") == False]

# using fillna
import numpy as np
data.fillna(50)
# data.fillna(method='pad')
# data['Team'].fillna(method='bfill', inplace=True)
data.replace(to_replace=np.nan, value=50)
data.head(5)

scores = {'FirstScore': [100, 90, np.nan, 95],
          'SecondScore': [30, 45, 56, np.nan],
          'ThirdScore': [np.nan, 40, 80, 98]}

# creating a dataframe from the dictionary and filling missing values with the column mean
df = pd.DataFrame(scores)
m = df['FirstScore'].mean()
df['FirstScore'].fillna(m)

# filling missing values by linear interpolation
df.interpolate(method='linear', limit_direction='forward')

Encoding the Categorical Data

Description: Encoding categorical data is a common task in machine learning and data analysis, especially when working with algorithms that require numerical input.

• One-hot encoding (1/0 form)

Description: This method creates a binary column for each category and assigns a 1 or 0 to indicate the presence or absence of that category. For example, for the categories "red," "green," and "blue," one-hot encoding creates three columns: "red," "green," and "blue."

# encoding categorical data: one-hot encoding
import category_encoders as ce

cities = {'City': ['Delhi', 'Chennai', 'bangalore', 'Hyderabad', 'Jammu']}
df = pd.DataFrame(cities)
df

encoder = ce.OneHotEncoder(cols="City", handle_unknown=True)
encoded_data = encoder.fit_transform(df)
encoded_data

# the same result with pandas
OneHot = pd.get_dummies(df["City"])
OneHot

Merge = pd.concat([df, OneHot], axis=1)
Merge.drop(["City"], axis=1)

• Dummy encoding

Description: Similar to one-hot encoding, but it drops one of the columns to avoid multicollinearity. This is often used when building linear models, to avoid redundancy in the encoded variables.

# dummy encoding
dummy = pd.get_dummies(df["City"])
dummy = dummy.drop("Delhi", axis=1)
dummy

# effect encoding (also called deviation or sum encoding)
effect = ce.SumEncoder(cols=["City"])
encoded_data = effect.fit_transform(df)
encoded_data

• Label encoding

Description: This involves assigning a unique integer to each category. For example, if you have categories like "red," "green," and "blue," you could encode them as 0, 1, and 2, respectively.

# label encoding
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
cities = {'City': ['Delhi', 'Chennai', 'bangalore', 'Chennai', 'Hyderabad', 'Jammu']}
df1 = pd.DataFrame(cities)
df1

df1["City_Label"] = le.fit_transform(df1["City"])
df1
• Ordinal encoding

Description: This is suitable when there is an inherent order or hierarchy among the categories. For instance, if you have categories like "low," "medium," and "high," you can assign them ordinal values like 1, 2, and 3, respectively.

# ordinal encoding
from sklearn.preprocessing import OrdinalEncoder
encoder = OrdinalEncoder()
sizes = {'T-Shirt size': ['Small', 'Medium', 'Large']}
df2 = pd.DataFrame(sizes)
df2

encoded_data = encoder.fit_transform(df2)
encoded_data

• Binary encoding

Description: This method converts each category into binary digits and then splits those digits into separate columns. It reduces the number of columns compared to one-hot encoding while still preserving the information.

# binary encoding
from category_encoders import BinaryEncoder
BE = BinaryEncoder()
encoded_data = BE.fit_transform(df2)
encoded_data

• Count encoding

Description: Count encoding transforms categorical variables into numerical representations based on the frequency of each category in the dataset: each category is replaced with the number of times it appears. This technique is particularly useful for high-cardinality categorical variables, where one-hot encoding would produce a high-dimensional sparse matrix.

# count encoding
from category_encoders import CountEncoder
CE = CountEncoder()
df3 = pd.DataFrame({'fruits': ['Apple', 'banana', 'Cherry', 'Apple', 'Cherry']})
df3

# the same counts can be computed directly with pandas
a = df3['fruits'].value_counts()
df3['fruits'].map(a)

encoded_data = CE.fit_transform(df3)
encoded_data

• BaseN encoding

Description: BaseN encoding generalizes binary encoding: each category's integer index is written in a chosen base N (for example base 2, 5, or 8), and each digit of that representation becomes a separate column. Larger bases produce fewer columns.

# baseN encoding
import category_encoders as ce
cities = {'City': ['Delhi', 'Chennai', 'bangalore', 'Chennai', 'Hyderabad', 'Jammu']}
df1 = pd.DataFrame(cities)
df1

encoder = ce.BaseNEncoder(cols=['City'], base=5, return_df=True)
encoded_data = encoder.fit_transform(df1)
encoded_data

• Target encoding

Description: Target encoding, also known as mean encoding or likelihood encoding, encodes categorical variables into numerical representations based on the target variable: each category of the categorical variable is replaced with the mean (or another statistic) of the target variable for that category.

# target encoding
import pandas as pd
import category_encoders as ce

car1 = {"cars": "bmw", "price": 20}
car2 = {"cars": "audi", "price": 30}
rows = []
for i in range(10000):
    rows.append(car1)
for i in range(10000):
    rows.append(car2)

df = pd.DataFrame(rows)
df

ce.TargetEncoder().fit_transform(df["cars"], df['price'])

➢ Feature scaling

Feature scaling is a preprocessing technique used in machine learning to standardize the range of the independent variables or features of a dataset. It ensures that all features are on the same scale, which is crucial for algorithms based on distance calculations or gradient-descent optimization.

Min-Max Scaling (Normalization)

Description: Min-max scaling, also known as normalization, scales numeric features to a specific range, typically [0, 1] or [-1, 1], using x' = (x - min) / (max - min).

# min-max scaling
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
weights = {'weight in grams': [500, 400, 300, 700, 800],
           'price in dollars': [10, 8, 5, 12, 15]}
df5 = pd.DataFrame(weights)
df5

import pandas as pd
data = pd.read_csv("pima-indians-diabetes.data.csv")
df6 = pd.DataFrame(data)
scaled_data = scaler.fit_transform(df6)
# labels = ('a','b','c','d','e','f','g','h','i')
frame = pd.DataFrame(scaled_data, columns=df6.columns)
frame

Standardization

Description: Standardization, also known as z-score normalization, is another preprocessing technique used to scale numeric features. Unlike min-max scaling, it does not bound the data to a specific range such as [0, 1] or [-1, 1]. Instead, it centers the data around the mean and scales it by the standard deviation, z = (x - mean) / std, so the transformed data has a mean of 0 and a standard deviation of 1.

from sklearn.preprocessing import StandardScaler
rescaled_data = StandardScaler().fit_transform(df6)
print(rescaled_data)
PRACTICAL-4
AIM: Implementation of simple linear regression in Python.

Simple linear regression finds a linear relationship that describes the correlation between one independent variable and one dependent variable.

import pandas as pd
import matplotlib.pyplot as plt

d = pd.read_csv("/content/placement and cgpa.csv")
print(d)

plt.scatter(d['cgpa'], d['package'])
plt.xlabel('cgpa')
plt.ylabel('package')
x = d.iloc[:, 0:1]
print(x)

y = d.iloc[:, 1:2]
print(y)

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
print(x_train)
print(x_test)
print(y_train)

MAKING OUR MODEL USING THE TRAINING DATASET. THIS IS WHERE OUR MACHINE ACTUALLY LEARNS!

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(x_train, y_train)
PREDICTIONS ON TRAINING DATASET
import pandas as pd
predictions = lr.predict(x_train)
pre_d = pd.DataFrame(predictions, columns=['predictions'])
print(pre_d)

print(x_train)
print(y_train)

VISUALIZATION ON TRAINING DATASET

plt.scatter(d['cgpa'], d['package'])
plt.plot(x_train, lr.predict(x_train), color='red')
plt.xlabel('cgpa')
plt.ylabel('package (in lpa)')
PREDICTIONS ON TEST DATASET
lr.predict(x_test.iloc[0].values.reshape(1, 1))

# Visualization on test dataset
plt.scatter(d['cgpa'], d['package'])
plt.plot(x_test, lr.predict(x_test), color='green')  # predicted target values for each input in x_test
plt.xlabel('CGPA')
plt.ylabel('Package (in lpa)')
# DOING RANDOM PREDICTIONS FOR TESTING OUR MODEL

m = lr.coef_
print(m)

b = lr.intercept_
print(b)

# predict the package for a CGPA of 3.58 using y = m*x + b
y = m * 3.58 + b
print(y)
EVALUATION METRICS

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = lr.predict(x_test)
print(y_pred)

y_test.values

print("MAE", mean_absolute_error(y_test, y_pred))
print("MSE", mean_squared_error(y_test, y_pred))

print("RMSE",np.sqrt(mean_squared_error(y_test,y_pred)))

33
print("R2 Score"
,r2_score
(y_test,y_pred
))

# Assuming y_test and y_pred are your actual and predicted values respectively

# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)

# Calculate the variance of the actual target values
variance_y_test = np.var(y_test)

# Relative MSE compares the model's error to the spread of the data
relative_mse = mse / variance_y_test
print("Relative MSE:", relative_mse)

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(rmse)

mean_y_test = np.mean(y_test)
mean_y_test

# coefficient of variation of the RMSE
CV = rmse / mean_y_test
CV
PRACTICAL-5
AIM: Implementation of multiple linear regression in Python.

Multiple linear regression is one of the important regression algorithms; it models the linear relationship between a single dependent continuous variable and more than one independent variable. Example: prediction of CO2 emission based on engine size and number of cylinders in a car.

import pandas as pd
import matplotlib.pyplot as plt

d = pd.read_csv('50_Startups.csv')
d.head()

x = d.iloc[:, 0:4]
print(x)

y = d.iloc[:, 4:5]
print(y)
plt.scatter(d['R&D Spend'], d['Profit'])
plt.xlabel('R&D Spend')
plt.ylabel('Profit')

plt.scatter(d['Administration'], d['Profit'])
plt.xlabel('Administration')
plt.ylabel('Profit')

plt.scatter(d['Marketing Spend'], d['Profit'])
plt.xlabel('Marketing Spend')
plt.ylabel('Profit')

plt.scatter(d['State'], d['Profit'])
plt.xlabel('State')
plt.ylabel('Profit')

Handling Categorical Variables

import pandas as pd
dp = pd.get_dummies(data=d, drop_first=True)
print(dp)
from sklearn.model_selection import train_test_split

# splitting the encoded data (the categorical 'State' column must be encoded first)
X = dp.drop('Profit', axis=1)
y = dp['Profit']
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

from sklearn.linear_model import LinearRegression
# creating an object of the LinearRegression class
LR = LinearRegression()
# fitting the training data
LR.fit(x_train, y_train)

y_prediction = LR.predict(x_test)
y_prediction

coefficients = LR.coef_
intercept = LR.intercept_
print(coefficients)
print(intercept)

import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

# predicting the accuracy score
score = r2_score(y_test, y_prediction)
print('r2 score is', score)
print('mean_squared_error is', mean_squared_error(y_test, y_prediction))
print('root_mean_squared_error is', np.sqrt(mean_squared_error(y_test, y_prediction)))
Practical-6
AIM: Implementation of Support Vector Regression using Python.
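The code and outputs for this practical appeared as screenshots in the original file and were not preserved. A minimal sketch of support vector regression with scikit-learn, assuming a small synthetic dataset, would look like this:

# a minimal SVR sketch; the noisy sine data is an assumption,
# not the dataset used in the original practical
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler

X = np.sort(5 * np.random.rand(40, 1), axis=0)      # 40 points in [0, 5)
y = np.sin(X).ravel() + 0.1 * np.random.randn(40)   # noisy sine targets

# feature scaling matters for SVR because the RBF kernel is distance-based
sc = StandardScaler()
X_scaled = sc.fit_transform(X)

# C controls regularization strength, epsilon the width of the no-penalty tube
regressor = SVR(kernel='rbf', C=100, epsilon=0.1)
regressor.fit(X_scaled, y)
print(regressor.predict(X_scaled[:5]))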
Practical-7
AIM: Implementation of Decision Tree Regression using Python.
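The code and outputs here were likewise screenshots. A minimal sketch of decision tree regression, again on assumed synthetic data, might look like this:

# a minimal decision tree regression sketch (synthetic data assumed)
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.arange(0, 10, 0.5).reshape(-1, 1)  # single feature
y = np.sin(X).ravel()                     # smooth target to approximate

# max_depth limits how finely the tree partitions the input space;
# deeper trees fit the training data more closely but can overfit
regressor = DecisionTreeRegressor(max_depth=3, random_state=42)
regressor.fit(X, y)
print(regressor.predict([[2.5]]))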
Practical-8
AIM: Implementation of Random Forest classification using Python.

Step: 1 Import the necessary libraries
import numpy as np
import pandas as pd

Step: 2 Load the dataset
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
data.data
data.feature_names
data.target
data.target_names

Step: 3 Create a DataFrame from the dataset
df = pd.DataFrame(np.c_[data.data, data.target], columns=list(data.feature_names) + ['target'])
df.head()
df.tail()
df.shape

Step: 4 Split the data
X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print('Shape of X_train = ', X_train.shape)
print('Shape of y_train = ', y_train.shape)
print('Shape of X_test = ', X_test.shape)
print('Shape of y_test = ', y_test.shape)

Step: 5 Train the Random Forest classification model
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=100, criterion='gini')
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

Step: 6 Predict cancer for a new patient
patient1 = [17.99, 10.38, 122.8, 1001.0, 0.1184, 0.2776, 0.3001, 0.1471, 0.2419, 0.07871, 1.095,
            0.9053, 8.589, 153.4, 0.006399, 0.04904, 0.05373, 0.01587, 0.03003, 0.006193, 25.38, 17.33,
            184.6, 2019.0, 0.1622, 0.6656, 0.7119, 0.2654, 0.4601, 0.1189]
patient1 = np.array([patient1])
patient1

pred = classifier.predict(patient1)
if pred[0] == 0:
    print('Patient has cancer (malignant tumor)')
else:
    print('Patient has no cancer (benign tumor)')
Practical-9
AIM: Implementation of Random Forest regression using Python.

Step 1: Import necessary libraries.
Step 2: Load the Height-Age dataset.
Step 3: Separate the dataset into independent and dependent variables.
Step 4: Split the dataset into training and testing sets.
Step 5: Import the Random Forest Regressor.
Step 6: Create a Random Forest Regressor object.
Step 7: Train the model with the training data.
Step 8: Make predictions on the test dataset.
Step 9: Evaluate the model using R-square.
Step 10: Visualize the Random Forest regression.

A sketch covering these steps is given below.
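The original code and outputs were screenshots and are not reproduced here. This minimal sketch, assuming a CSV file named height_age.csv with Age and Height columns (both assumed names), covers Steps 1 through 10:

# a minimal sketch of the steps above; the file name and column
# names are assumptions, not the original practical's dataset
import numpy as np                                    # Step 1
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor   # Step 5
from sklearn.metrics import r2_score

df = pd.read_csv("height_age.csv")                    # Step 2 (assumed file name)
X = df[['Age']].values                                # Step 3: independent variable
y = df['Height'].values                               # Step 3: dependent variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # Step 4

regressor = RandomForestRegressor(n_estimators=100, random_state=42)  # Step 6
regressor.fit(X_train, y_train)                       # Step 7

y_pred = regressor.predict(X_test)                    # Step 8
print('R2 Score:', r2_score(y_test, y_pred))          # Step 9

plt.scatter(X_test, y_test, color='blue', label='actual')     # Step 10
plt.scatter(X_test, y_pred, color='red', label='predicted')
plt.xlabel('Age')
plt.ylabel('Height')
plt.legend()
plt.show()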

Practical-10
AIM: Implementation of Logistic Regression using Python.

import numpy
# X represents the size of a tumor in centimeters.
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)

# Note: X has to be reshaped into a column from a row for the LogisticRegression() function to work.
# y represents whether or not the tumor is cancerous (0 for "No", 1 for "Yes").
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

from sklearn import linear_model
logr = linear_model.LogisticRegression()
logr.fit(X, y)

# predict whether a tumor is cancerous when its size is 3.46 cm:
predicted = logr.predict(numpy.array([3.46]).reshape(-1, 1))
print(predicted)

Example 2:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing
from sklearn.metrics import accuracy_score
from matplotlib import pyplot as plt
import seaborn as sns

df = pd.read_csv('creditcard.csv')
df.info()
df.head()

sum(df.duplicated())
df.drop_duplicates(inplace=True)
df.drop('Time', axis=1, inplace=True)

X = df.iloc[:, df.columns != 'Class']
y = df.Class
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=5, stratify=y)

from sklearn.preprocessing import StandardScaler

# Fit the scaler to the training data and transform both the training and test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train_scaled, y_train)  # training the model

# Make predictions using the trained model
y_pred = model.predict(X_test_scaled)

train_acc = model.score(X_train_scaled, y_train)
print("The Accuracy for Training Set is {}".format(train_acc * 100))

test_acc = accuracy_score(y_test, y_pred)
print("The Accuracy for Test Set is {}".format(test_acc * 100))
Practical-11
AIM: Implementation of KNN regression using Python.

# importing necessary libraries
import pandas as pd
import numpy as np

# sample gym data (these values are assumed for illustration;
# the original dataset was not reproduced in the file)
ColumnNames = ['Hours', 'Calories', 'Weight']
DataValues = [[1.0, 2500, 95], [2.0, 2000, 85], [2.5, 1900, 83],
              [3.0, 1850, 81], [3.5, 1600, 80], [4.0, 1500, 78],
              [5.0, 1500, 77], [5.5, 1600, 80], [6.0, 1700, 75],
              [6.5, 1500, 70]]
gymdata = pd.DataFrame(data=DataValues, columns=ColumnNames)
gymdata.head()

TargetVariable = 'Weight'
Predictors = ['Hours', 'Calories']
X = gymdata[Predictors].values
y = gymdata[TargetVariable].values
print(X)
print(y)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

from sklearn.neighbors import KNeighborsRegressor
RegModel = KNeighborsRegressor(n_neighbors=2)
print(RegModel)

KNN = RegModel.fit(X_train, y_train)
prediction = KNN.predict(X_test)
print(X_test)

from sklearn import metrics
print('R2 Value:', metrics.r2_score(y_train, KNN.predict(X_train)))
print('Accuracy', 100 - (np.mean(np.abs((y_test - prediction) / y_test)) * 100))

TestingDataResults = pd.DataFrame(data=X_test, columns=Predictors)
TestingDataResults[TargetVariable] = y_test
TestingDataResults['Predicted' + TargetVariable] = prediction
TestingDataResults.head()

Practical-12
AIM: Implementation of clustering with k-means using Python.

from sklearn.cluster import KMeans
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt

df = pd.read_csv("income.csv")
df.head()

plt.scatter(df.Age, df['Income($)'])
plt.xlabel('Age')
plt.ylabel('Income($)')

# Assuming df is your DataFrame containing the 'Age' and 'Income($)' columns
km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age', 'Income($)']])

# Now y_predicted contains the cluster labels
print(y_predicted)

df['cluster'] = y_predicted
df.head()
km.cluster_centers_

df1 = df[df.cluster == 0]
df2 = df[df.cluster == 1]
df3 = df[df.cluster == 2]

plt.scatter(df1.Age, df1['Income($)'], color='green')
plt.scatter(df2.Age, df2['Income($)'], color='red')
plt.scatter(df3.Age, df3['Income($)'], color='black')
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1], color='purple', marker='*', label='centroid')
plt.xlabel('Age')
plt.ylabel('Income ($)')
plt.legend()
Preprocessing using MinMaxScaler:

scaler = MinMaxScaler()
scaler.fit(df[['Income($)']])
df['Income($)'] = scaler.transform(df[['Income($)']])

scaler.fit(df[['Age']])
df['Age'] = scaler.transform(df[['Age']])
df.head()

plt.scatter(df.Age, df['Income($)'])
km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age', 'Income($)']])
y_predicted

df['cluster'] = y_predicted
df.head()

km.cluster_centers_

df1 = df[df.cluster == 0]
df2 = df[df.cluster == 1]
df3 = df[df.cluster == 2]
plt.scatter(df1.Age, df1['Income($)'], color='green')
plt.scatter(df2.Age, df2['Income($)'], color='red')
plt.scatter(df3.Age, df3['Income($)'], color='black')
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1], color='purple', marker='*', label='centroid')
plt.legend()

Elbow Plot

sse = []
k_rng = range(1, 10)
for k in k_rng:
    km = KMeans(n_clusters=k)
    km.fit(df[['Age', 'Income($)']])
    sse.append(km.inertia_)

plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_rng, sse)
Practical-13
AIM: Implementation of Agglomerative Hierarchical Clustering in Python.

# Importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
import warnings

# Define a function that triggers a specific warning
def trigger_warning():
    warnings.warn("This is a warning message", Warning)

# Ignore the warning using a context manager
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    trigger_warning()

# After the context manager, warnings are no longer ignored
trigger_warning()

# Importing the dataset
dataset = pd.read_csv('Mall_Customers.csv')
dataset.head()

# selecting the Annual Income and Spending Score columns
x = dataset.iloc[:, [3, 4]].values
print(x)

# Finding the optimal number of clusters using the dendrogram
import scipy.cluster.hierarchy as shc
dendro = shc.dendrogram(shc.linkage(x, method="ward"))  # ward linkage minimizes within-cluster variance
mtp.title("Dendrogram Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()
# Training the hierarchical clustering model on the dataset
from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters=5, metric='euclidean', linkage='ward')
y_pred = hc.fit_predict(x)
y_pred
# Visualizing the clusters
mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s=100, c='blue', label='Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s=100, c='green', label='Cluster 2')
mtp.scatter(x[y_pred == 2, 0], x[y_pred == 2, 1], s=100, c='red', label='Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s=100, c='cyan', label='Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s=100, c='magenta', label='Cluster 5')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()
Practical-14
AIM: Implementation of Naïve Bayes using Python.
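The code and outputs for this practical were screenshots in the original file. A minimal Gaussian Naïve Bayes sketch, assuming the Iris dataset as a stand-in, would look like this:

# Gaussian Naïve Bayes sketch (Iris dataset assumed; not necessarily
# the dataset used in the original practical)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# GaussianNB models each feature as normally distributed within each class
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))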
Practical-15
AIM: Implementation of Hierarchical Clustering using Python.
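The original code and outputs were screenshots. A minimal hierarchical clustering sketch using SciPy, assuming a small synthetic dataset, might look like this:

# hierarchical clustering sketch with SciPy (synthetic data assumed)
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (20, 2)),   # one cluster around (0, 0)
               rng.normal(6, 1, (20, 2))])  # another around (6, 6)

Z = linkage(X, method='ward')  # build the merge tree bottom-up
dendrogram(Z)
plt.title('Dendrogram')
plt.show()

labels = fcluster(Z, t=2, criterion='maxclust')  # cut the tree into 2 clusters
print(labels)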
Practical-16
AIM: Implementation of Ridge and Lasso Regression using Python.
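The original code and outputs were screenshots. A minimal Ridge and Lasso sketch, assuming the scikit-learn diabetes dataset as a stand-in, would look like this:

# Ridge and Lasso sketch (diabetes dataset assumed; not necessarily
# the dataset used in the original practical)
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Ridge adds an L2 penalty (shrinks coefficients toward zero); Lasso adds an
# L1 penalty (can zero coefficients out entirely); alpha sets the penalty strength
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
lasso = Lasso(alpha=0.1).fit(X_train, y_train)

print('Ridge R2:', ridge.score(X_test, y_test))
print('Lasso R2:', lasso.score(X_test, y_test))
print('Coefficients zeroed by Lasso:', (lasso.coef_ == 0).sum())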
Practical-17
AIM: Implementation of DBSCAN using Python.
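This practical's heading in the original file repeated "Practical-15: Hierarchical Clustering", apparently a copy-paste slip; per the table of contents, Practical-17 covers DBSCAN. The original code was a screenshot; a minimal DBSCAN sketch on assumed synthetic data would look like this:

# DBSCAN sketch (synthetic two-moons data assumed)
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

X, _ = make_moons(n_samples=200, noise=0.05, random_state=42)
X = StandardScaler().fit_transform(X)

# eps is the neighborhood radius and min_samples the density threshold;
# points that fall in no dense region are labeled -1 (noise)
db = DBSCAN(eps=0.3, min_samples=5).fit(X)
print('Cluster labels found:', set(db.labels_))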
Practical-18
AIM: Implementation of K-Means Clustering using Python.
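The original code and outputs were screenshots. A minimal k-means sketch on assumed synthetic blob data, complementing the income.csv example of Practical-12, might look like this:

# k-means sketch (synthetic blob data assumed)
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# n_init=10 restarts k-means from 10 random initializations and keeps the best run
km = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = km.fit_predict(X)
print(km.cluster_centers_)
print('Inertia (sum of squared distances):', km.inertia_)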
