Machine Learning
May 2025
TABLE OF CONTENTS

S. No.  Name of Practical
1. Implementation of vector algebra in machine learning
2. Implementation of matrix algebra in machine learning
3. Implementation of various data preprocessing steps in Python
4. Implementation of Simple linear regression in Python
5. Implementation of Multiple linear regression in Python
6. Implementation of Support Vector Machine using Python
7. Implementation of Decision Tree Regression using Python
8. Implementation of Random forest classification using Python
9. Implementation of Random Forest Regression using Python
10. Implementation of Logistic Regression using Python
11. Implementation of KNN Regression using Python
12. Implementation of Clustering with K-Means using Python
13. Implementation of agglomerative hierarchical clustering in Python
14. Implementation of Naïve Bayes using Python
15. Implementation of Hierarchical clustering using Python
16. Implementation of Ridge and Lasso Regression using Python
17. Implementation of DBSCAN using Python
18. Implementation of K-Means Clustering using Python
Practical: 1
AIM: Implementation of vector algebra using Python.
import numpy as np

1. #create a vector
a=np.array([2,3,7])
b=np.array([1,2,3])
Output:
2. #vector addition
print(a+b)
Output:
3. #vector subtraction
print(a-b)
Output:
4. #vector multiplication
print(a*b)
Output:
5. #vector division
print(a/b)
Output:
6. #vector scalar multiplication
print(5*a)
Output:
7. #vector exponentiation (element-wise)
print(a**b)
Output:

8. #dot product
print(np.dot(a,b))
Output:
9. #cross product
print(np.cross(a,b))
Output:
10. #vector norm
print(np.linalg.norm(a))
Output:
Practical: 2
AIM: Implementation of basic matrix algebra using Python.
1. #create two matrices
import numpy as np
a=np.array([[1,3,5], [2,4,7], [4,9,2]])
# only the first row of b is legible in the original; the remaining rows below are assumed
b=np.array([[1,2,3], [4,5,6], [7,8,9]])
print(a)
print(b)
Output:
2. #addition
print(np.add(a,b))
Output:
3. #subtraction
print(np.subtract(a,b))
Output:
4. #matrix scalar multiplication
print(np.multiply(5,a))
Output:
5. #matrix-vector multiplication
v=np.array([[1],[3],[3]])
print(np.dot(a,v))
Output:
6. #matrix multiplication
print(np.matmul(a,b))
print(np.dot(a,b))
result=a@b
print(result)
Output:
7. #determinant
print(np.linalg.det(a))
Output:
8. #transpose
print(np.transpose(a))
Output:
9. #inverse
print(np.linalg.inv(a))
Output:
Practical -3
Aim: Implementation of various data preprocessing steps in Python.
➢ Handling missing values
import pandas as pd
# 'data' is the DataFrame loaded earlier (the read_csv step appears only as a screenshot);
# among its columns are "Gender" and "Team".
#total number of null values in the dataset
data.isnull().sum().sum()
#total number of non-null values in the dataset
data.notnull().sum().sum()
#dropping a column
data.drop("Gender", axis=1)
#dropping a row
data.drop(0, axis=0)
#using dropna(how=...)
data.dropna(how='any')
data.dropna(how='all')
#dropping the rows whose "Team" column contains a particular keyword
data[data["Team"].str.contains("Marketing") == False]

#filling null values with a constant
import numpy as np
data.fillna(50)
#data.fillna(method='pad')   # forward fill
#filling null values with the column mean (only the first column of the original dict is legible)
dict = {'FirstScore': [100, 90, np.nan, 95]}
df = pd.DataFrame(dict)
m = df['FirstScore'].mean()
df['FirstScore'].fillna(m)
df.interpolate(method ='linear', limit_direction ='forward')
4) Encoding the Categorical data

• One-hot encoding
Description: - This method creates binary columns for each category and assigns a
1 or 0 to indicate the presence or absence of a category. For example, if you have
"red," "green," and "blue" categories, one-hot encoding creates three columns:
"red," "green," and "blue."
import category_encoders as ce
dict = {'City':['Delhi','Chennai','bangalore','Hyderabad','Jammu']}
df = pd.DataFrame(dict)
encoder = ce.OneHotEncoder(cols=['City'])
encoded_data = encoder.fit_transform(df)
encoded_data

OneHot = pd.get_dummies(df["City"])
OneHot
Merge= pd.concat([df, OneHot], axis=1)
#Merge
Merge.drop(["City"], axis=1)
#Merge
• Dummy encoding
Description: - Similar to one-hot encoding but drops one of the columns to avoid
multicollinearity. This is often used when building linear models to avoid
redundancy in the encoded variables.
#dummy encoding
dummy = pd.get_dummies(df["City"])
dummy = dummy.drop("Delhi", axis=1)
dummy

• Effect (sum) encoding
#effect encoding
effect = ce.sum_coding.SumEncoder(cols=['City'])
encoded_data = effect.fit_transform(df)
encoded_data
• Label encoding
Description: - Each category is assigned a unique integer label.
#label encoding
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
dict = {'City':['Delhi','Chennai','bangalore','Chennai','Hyderabad','Jammu']}
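The application of the label encoder appears only as a screenshot in the original; a minimal sketch of the assumed step (the variable names df2 and City_encoded are illustrative):
df2 = pd.DataFrame(dict)
df2['City_encoded'] = le.fit_transform(df2['City'])
df2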
• Ordinal encoding
Description:- This is suitable when there's an inherent order or hierarchy among the
categories. For instance, if you have categories like "low," "medium," and "high,"
you can assign them ordinal values like 1, 2, and 3, respectively.
#ordinal encoding
from sklearn.preprocessing import OrdinalEncoder
encoder = OrdinalEncoder()
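The fitting step itself is only a screenshot in the original; a minimal sketch of the assumed usage, with an illustrative low/medium/high column and an explicitly specified category order:
df_ord = pd.DataFrame({'size': ['low', 'high', 'medium', 'low']})
encoder = OrdinalEncoder(categories=[['low', 'medium', 'high']])   # explicit order: low < medium < high
df_ord['size_encoded'] = encoder.fit_transform(df_ord[['size']]).ravel()
df_ord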
• Binary encoding
Description:- This method converts categories into binary digits and then splits
those digits into separate columns. It reduces the number of columns compared to
one-hot encoding while still preserving the information.
#binary encoding
from category_encoders import BinaryEncoder
BE = BinaryEncoder()
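The application of the binary encoder is only a screenshot in the original; a minimal sketch of the assumed step (df is the City DataFrame created above):
encoded_data = BE.fit_transform(df["City"])
encoded_data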
• Count encoding
Description: - Each category is replaced by the number of times it appears in the column.
#count encoding
from category_encoders import CountEncoder
CE = CountEncoder()

#count encoding with pandas (df3, a DataFrame with a 'fruits' column, is not shown in the
#extracted text; see the self-contained sketch below)
a = df3['fruits'].value_counts()
df3['fruits'].map(a)
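Since df3 is not defined in the extracted text, a self-contained sketch of the same idea (the fruit values here are purely illustrative):
df3 = pd.DataFrame({'fruits': ['apple', 'mango', 'apple', 'banana', 'mango', 'apple']})
a = df3['fruits'].value_counts()
df3['fruits_count'] = df3['fruits'].map(a)   # apple -> 3, mango -> 2, banana -> 1
df3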
• BaseN encoding
dict = {'City':['Delhi','Chennai','bangalore','Chennai','Hyderabad','Jammu']}
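The BaseN step itself is only a screenshot in the original; a minimal sketch of the assumed usage (the base value 3 and the variable names are illustrative):
df4 = pd.DataFrame(dict)
BaseN = ce.BaseNEncoder(cols=['City'], base=3)
encoded_data = BaseN.fit_transform(df4)
encoded_data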
• Target encoding
Description: - Target encoding replaces each category with a numerical
representation based on the target variable. In target encoding, each category of
the categorical variable is replaced with the mean (or another statistic) of the target
variable for that category.

#target encoding
import pandas as pd
import category_encoders as ce
# car1 and car2 are dicts with 'cars' and 'price' keys; their definitions and the first
# loop header appear only as screenshots, so the lines below are a reconstruction
list = []
for i in range(10000):
    list.append(car1)
for i in range(10000):
    list.append(car2)
df = pd.DataFrame(list)
df
ce.TargetEncoder().fit_transform(df["cars"], df['price'])
➢ Feature scaling
Feature scaling is a preprocessing technique used in machine learning to
standardize the range of independent variables or features of the dataset. It ensures
that all features have the same scale, which can be crucial for certain algorithms to
perform effectively, particularly those based on distance calculations or gradient
descent optimization.
#min-max scaling
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
dict = {'weight in grams': [500, 400, 300, 700, 800], 'price in dollars': [10, 8, 5, 12, 15]}
df5 = pd.DataFrame(dict)
scaler.fit_transform(df5)
import pandas as pd
#labels = ('a','b','c','d','e','f','g','h','i')
data = pd.read_csv("pima-indians-diabetes.data.csv")
df6 = data   # df6 holds the diabetes data to be scaled
scaler.fit_transform(df6)

Standardization
from sklearn.preprocessing import StandardScaler
rescaled_data = StandardScaler().fit_transform(df6)
print(rescaled_data)
PRACTICAL-4
AIM: Implementation of Simple linear regression in Python.
import pandas as pd
import matplotlib.pyplot as plt
# d is the cgpa/package DataFrame loaded earlier (the read_csv step is only a screenshot)
plt.scatter(d['cgpa'], d['package'])
plt.xlabel('cgpa')
plt.ylabel('package')
x=d.iloc[:, 0:1]
print(x)
y=d.iloc[:, 1:2]
print(y)
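The train/test split appears only as a screenshot in the original; a minimal sketch of the assumed step (an 80/20 split; the random_state value is illustrative):
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)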
print(y_train)
MAKING OUR MODEL: WE ALWAYS FIT ON THE TRAINING DATASET, AND THIS IS WHERE OUR MACHINE IS ACTUALLY LEARNING!
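The fitting code itself is only a screenshot in the original; a minimal sketch of the assumed step:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(x_train, y_train)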
PREDICTIONS ON TRAINING DATASET
import pandas as pd
predictions = lr.predict(x_train)
pre_d=pd.DataFrame (predictions, columns=['predictions'])
print(pre_d)
print(x_train)
print(y_train)
plt.scatter(d['cgpa'], d['package'])
plt.plot(x_train, lr.predict(x_train), color='red')
plt.xlabel('cgpa')
plt.ylabel('package (in lpa)')
PREDICTIONS ON TEST DATASET
lr.predict(x_test.iloc[0].values.reshape(1,1))
# DOING RANDOM PREDICTIONS FOR TESTING OUR MODEL
m=lr.coef_
print(m)
b=lr.intercept_
print(b)
y=m*3.58+b
print(y)
EVALUATION METRICS
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np
y_pred = lr.predict(x_test)
print(y_pred)
y_test.values
print("MAE",mean_absolute_error(y_test,y_pred))
print("MSE",mean_squared_error(y_test,y_pred))
print("RMSE",np.sqrt(mean_squared_error(y_test,y_pred)))
print("R2 Score", r2_score(y_test, y_pred))
# Assuming y_test and y_pred are your actual and predicted values respectively
rmse = np.sqrt(mean_squared_error(y_test,y_pred))
print(rmse)
mean_y_test = np.mean(y_test)
mean_y_test
PRACTICAL-5
AIM: Implementation of Multiple linear regression in Python.

# d is the startup-profit DataFrame loaded earlier (the read_csv step is only a screenshot);
# column index 4 is 'Profit', the target variable.
y = d.iloc[:, 4:5]
print(y)
import pandas as pd
import matplotlib.pyplot as plt
plt.scatter(d['R&D Spend'], d['Profit'])
plt.xlabel('R&D Spend')
plt.ylabel('Profit')

plt.scatter(d['Administration'], d['Profit'])
plt.xlabel('Administration')
plt.ylabel('Profit')
plt.scatter(d['Marketing Spend'], d['Profit'])
plt.xlabel('Marketing Spend')
plt.ylabel('Profit')
plt.scatter(d['State'], d['Profit'])
plt.ylabel('Profit')
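The selection of the feature matrix X is only a screenshot in the original; a minimal sketch of the assumed step (using only the three numeric spend columns and leaving out the categorical 'State' column is an assumption):
X = d[['R&D Spend', 'Administration', 'Marketing Spend']]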
from sklearn.model_selection import train_test_split
# splitting the data
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

from sklearn.linear_model import LinearRegression
# creating an object of LinearRegression class
LR = LinearRegression()
# fitting the training data
LR.fit(x_train,y_train)
import numpy as np
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
# predicting on the test data
y_prediction = LR.predict(x_test)
score = r2_score(y_test, y_prediction)
print('r2 score is==', score)
print('mean_sqrd_error is==', mean_squared_error(y_test, y_prediction))
Practical-6
Aim: Implementation of Support Vector Regression using Python
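The code and output for this practical exist only as screenshots in the original. A minimal, self-contained sketch of what an SVR implementation typically looks like (the synthetic data and hyperparameters below are illustrative assumptions, not the original values):
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler

# illustrative data: position level vs a salary-like target
X = np.arange(1, 11).reshape(-1, 1).astype(float)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float)

# SVR is sensitive to feature scale, so scale both X and y
sc_X, sc_y = StandardScaler(), StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

regressor = SVR(kernel='rbf')
regressor.fit(X_scaled, y_scaled)

# predict for level 6.5 and convert back to the original scale
pred_scaled = regressor.predict(sc_X.transform([[6.5]]))
print(sc_y.inverse_transform(pred_scaled.reshape(-1, 1)))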
Practical -7
Aim: Implementation of Decision Tree Regression using Python
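The original pages for this practical are screenshots only; a minimal hedged sketch of Decision Tree Regression (the dataset here is illustrative, not the original one):
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# illustrative data: position level vs a salary-like target
X = np.arange(1, 11).reshape(-1, 1).astype(float)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float)

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

print(regressor.predict([[6.5]]))   # predict for an unseen level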
Practical: 8
Aim: Implementation of random forest classification using Python
Step 1: Import the necessary libraries.
import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer

Step 2: Load the breast cancer dataset.
data = load_breast_cancer()
data.data
data.feature_names
data.target
data.target_names
df = pd.DataFrame(np.c_[data.data, data.target], columns=list(data.feature_names) + ['target'])
df.head()
df.tail()
df.shape

X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]
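The train/test split (presumably the step before training) is only a screenshot in the original; a minimal sketch of the assumed step (the random_state value is illustrative):
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)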
Step 5: Train the Random Forest Classification model.
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=100, criterion='gini')
classifier.fit(X_train, y_train)
patient1 = [17.99, 10.38, 122.8, 1001.0, 0.1184, 0.2776, 0.3001, 0.1471, 0.2419, 0.07871, 1.095,
            0.9053, 8.589, 153.4, 0.006399, 0.04904, 0.05373, 0.01587, 0.03003, 0.006193, 25.38, 17.33,
            184.6, 2019.0, 0.1622, 0.6656, 0.7119, 0.2654, 0.4601, 0.1189]
patient1 = np.array([patient1])
patient1
classifier.predict(patient1)
pred = classifier.predict(patient1)
if pred[0] == 0:
    print('Patient has cancer (malignant tumor)')
else:
    print('Patient is safe (benign tumor)')
Practical –9
Aim: Implement Random Forest Regression using Python.
Step 1: Import necessary libraries.
Step 2: Load the Height-Age dataset.
Step 3: Separate the dataset into independent and dependent variables.
Step 10: Visualize the Random Forest Regression.
(The code for each step is a screenshot in the original; a hedged sketch of the full workflow follows below.)
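A minimal self-contained sketch of the same workflow (the Height-Age values below are illustrative, and the intermediate steps 4-9 are compressed into training the model):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor

# Steps 1-2: libraries and an illustrative Height-Age dataset
data = pd.DataFrame({'Age': [5, 8, 11, 14, 17, 20, 23, 26, 29, 32],
                     'Height': [108, 127, 143, 160, 170, 174, 175, 176, 176, 177]})

# Step 3: independent and dependent variables
X = data[['Age']].values
y = data['Height'].values

# train the Random Forest Regressor
regressor = RandomForestRegressor(n_estimators=100, random_state=0)
regressor.fit(X, y)

# Step 10: visualize the Random Forest Regression on a fine grid
X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)
plt.scatter(X, y, color='blue', label='data')
plt.plot(X_grid, regressor.predict(X_grid), color='green', label='prediction')
plt.xlabel('Age')
plt.ylabel('Height')
plt.legend()
plt.show()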
Practical-10
Aim: Implementation of Logistic Regression using Python
Example 1:
import numpy
#X represents the size of a tumor in centimeters.
X = numpy.array([
3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
#Note: X has to be reshaped into a column from a row for the LogisticRegression() function to work.
#y represents whether or not the tumor is cancerous (0 for "No", 1 for "Yes").
y = numpy.array([
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
from sklearn import linear_model
logr = linear_model.LogisticRegression()
logr.fit(X,y)

#predict whether a tumor of a given size is cancerous (the 3.46 cm value here is illustrative)
predicted = logr.predict(numpy.array([3.46]).reshape(-1,1))
print(predicted)
Example 2:
import pandas as pd
# df is the dataset loaded earlier (the read_csv step is only a screenshot); it has a 'Class' target column
df.head()
sum(df.duplicated())
df.drop_duplicates(inplace=True)
X = df.drop('Class', axis=1)   # the exact feature selection is not shown; dropping the target column is assumed
y = df.Class
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=5, stratify=y)
# Fit the scaler to the training data and transform both the training and test data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
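The remaining fitting and evaluation steps are only screenshots in the original; a minimal sketch of the assumed continuation:
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

from sklearn.metrics import accuracy_score
print("Accuracy:", accuracy_score(y_test, y_pred))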
Practical-11
Aim: Implementation of KNN Regression using Python
#importing necessary libraries
import pandas as pd
import numpy as np
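The definitions of DataValues and ColumnNames appear only as a screenshot in the original; the sample values below are purely illustrative stand-ins so the rest of the code runs:
ColumnNames = ['Hours', 'Calories', 'Weight']
DataValues = [[1.0, 2500, 95], [2.0, 2000, 85], [2.5, 1900, 81],
              [3.0, 1850, 80], [3.5, 1600, 78], [4.0, 1500, 77],
              [5.0, 1500, 80], [5.5, 1600, 79], [6.0, 1700, 80], [6.5, 1500, 78]]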
gymdata = pd.DataFrame(data=DataValues, columns=ColumnNames)
gymdata.head()

TargetVariable = 'Weight'
Predictors = ['Hours', 'Calories']
X = gymdata[Predictors].values
y = gymdata[TargetVariable].values
print(X)
print(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
from sklearn.neighbors import KNeighborsRegressor
RegModel = KNeighborsRegressor(n_neighbors=2)
print(RegModel)

KNN = RegModel.fit(X_train, y_train)
prediction = KNN.predict(X_test)
print(X_test)

TestingDataResults = pd.DataFrame(data=X_test, columns=Predictors)
TestingDataResults[TargetVariable] = y_test
TestingDataResults[('Predicted' + TargetVariable)] = prediction
TestingDataResults.head()
Practical-12
Aim: Implementation of Clustering with K-Means using Python.
from sklearn.cluster import KMeans
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt

df = pd.read_csv("income.csv")
df.head()

plt.scatter(df.Age, df['Income($)'])
plt.xlabel('Age')
plt.ylabel('Income($)')
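The first KMeans fit (on the unscaled data) appears only as a screenshot; the assumed step is:
km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age', 'Income($)']])
y_predicted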
df['cluster'] = y_predicted
df.head()

km.cluster_centers_
Preprocessing using MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(df[['Income($)']])
df['Income($)'] = scaler.transform(df[['Income($)']])

scaler.fit(df[['Age']])
df['Age'] = scaler.transform(df[['Age']])
df.head()

plt.scatter(df.Age, df['Income($)'])
km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age', 'Income($)']])
y_predicted

df['cluster'] = y_predicted
df.head()

km.cluster_centers_
df1 = df[df.cluster==0]
df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
plt.scatter(df1.Age, df1['Income($)'], color='green')
plt.scatter(df2.Age, df2['Income($)'], color='red')
plt.scatter(df3.Age, df3['Income($)'], color='black')
plt.scatter(km.cluster_centers_[:,0], km.cluster_centers_[:,1], color='purple', marker='*', label='centroid')
plt.legend()
Elbow Plot
sse = []
k_rng = range(1,10)
for k in k_rng:
    km = KMeans(n_clusters=k)
    km.fit(df[['Age','Income($)']])
    sse.append(km.inertia_)
plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_rng, sse)
Practical-13
Aim: Implementation of Agglomerative Hierarchical Clustering in Python.
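The dataset-loading step is only a screenshot in the original; a minimal sketch of the assumed step (the file name is an assumption inferred from the Annual Income / Spending Score columns used below):
import pandas as pd
dataset = pd.read_csv('Mall_Customers.csv')   # assumed file; columns 3 and 4 are Annual Income and Spending Score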
x = dataset.iloc[:, [3, 4]].values
print(x)
#training the hierarchical clustering model on the dataset
from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters=5, metric='euclidean', linkage='ward')
y_pred = hc.fit_predict(x)
y_pred
#visualizing the clusters
import matplotlib.pyplot as mtp
mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s = 50, c = 'blue', label = 'Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s = 100, c = 'green', label = 'Cluster 2')
mtp.scatter(x[y_pred == 2, 0], x[y_pred == 2, 1], s = 100, c = 'red', label = 'Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()
Practical -14
Aim: Implementation of Naïve Bayes in Python.
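The original pages for this practical are screenshots only; a minimal hedged sketch of Gaussian Naïve Bayes on the iris dataset (the dataset choice is an assumption, not necessarily the one used in the original):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))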
Practical -15
Aim: Implementation of Hierarchical Clustering in Python.
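The original pages are screenshots only; a minimal hedged sketch using a scipy dendrogram on illustrative 2-D points:
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# illustrative 2-D points
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

# build the linkage matrix with Ward's method and plot the dendrogram
Z = linkage(X, method='ward')
dendrogram(Z)
plt.title('Dendrogram')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()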
Practical -16
Aim: Implementation of Ridge and Lasso Regression in Python.
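The original pages are screenshots only; a minimal hedged sketch comparing Ridge and Lasso on a synthetic regression problem (the alpha values and dataset are illustrative):
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

ridge = Ridge(alpha=1.0).fit(X_train, y_train)
lasso = Lasso(alpha=0.1).fit(X_train, y_train)

print("Ridge R2:", r2_score(y_test, ridge.predict(X_test)))
print("Lasso R2:", r2_score(y_test, lasso.predict(X_test)))
print("Lasso coefficients (some shrink to zero):", lasso.coef_)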
Practical -15
Aim: Implementation of Hierarchical in Python.
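The original pages are screenshots only; a minimal hedged sketch of DBSCAN on synthetic two-moons data (the eps and min_samples values are illustrative):
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X)      # label -1 marks noise points

plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.title('DBSCAN clusters')
plt.show()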
Practical -18
Aim: Implementation of K-Means Clustering in Python.
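The original pages are screenshots only; a minimal hedged sketch on synthetic blob data (Practical 12 above shows a fuller worked example on the income dataset):
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

km = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = km.fit_predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1], marker='*', s=200, color='red')
plt.title('K-Means clusters with centroids')
plt.show()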