Machine_Learning_Lab_File (1)

The document is a lab file for a Machine Learning course at Chandigarh Group of Colleges, detailing various experiments related to data preprocessing, regression techniques, decision trees, and random forest classification. Each experiment includes an aim, methodology, and program code for implementation. The document serves as a practical guide for students to understand and apply machine learning concepts using Python.

Uploaded by

varnikasood05

Chandigarh Group of Colleges

College of Engineering, Landran, Mohali-140307

Department of Computer Science & Engineering

Machine Learning
Lab File (BTCS619-18)

SUBMITTED TO: SUBMITTED BY:

INDEX

Exp. No.   Name of the Experiment

1          Implement Data Preprocessing.
2          Implement Simple Linear Regression.
3          Simulate Multiple Linear Regression.
4          Implement Decision Tree.
5          Deploy Random Forest Classification.
6          Simulate Naïve Bayes Theorem.
7          Implement K-Nearest Neighbors (K-NN), k-means.
8          Deploy Support Vector Machine, Apriori Algorithm.
9          Simulate Artificial Neural Network.
10         Implement the Genetic Algorithm code.

EXPERIMENT-1
AIM: Implement Data Preprocessing.

Data preprocessing is a data mining technique used to transform raw data
into a useful and efficient format.

Steps Involved in Data Preprocessing:

1. Data Cleaning:
Raw data can contain many irrelevant and missing parts. Data cleaning
handles these problems, including missing data and noisy data.
(a) Missing Data:
This situation arises when some values are absent from the dataset. It can
be handled in various ways.
Some of them are:
1. Ignore the tuples:
This approach is suitable only when the dataset is quite large
and multiple values are missing within a tuple.
2. Fill the missing values:
There are various ways to do this. You can fill the missing values
manually, with the attribute mean, or with the most probable value.
(b) Noisy Data:
Noisy data is meaningless data that cannot be interpreted by machines. It
can be caused by faulty data collection, data entry errors, etc. It can
be handled in the following ways:

1. Binning Method:
This method works on sorted data in order to smooth it. The data is
divided into segments (bins) of equal size, and each segment is handled
separately. All values in a segment can be replaced by the segment
mean, or by the nearest segment boundary value.
2. Regression:
Here data is smoothed by fitting it to a regression function. The
regression used may be linear (having one independent variable) or
multiple (having multiple independent variables).
3. Clustering:
This approach groups similar data into clusters. Outliers either fall
outside the clusters or may go undetected.
2. Data Transformation:
This step transforms the data into forms suitable for the mining process.
It involves the following:
1. Normalization:
It scales the data values into a specified range (-1.0 to 1.0
or 0.0 to 1.0).
2. Attribute Selection:
In this strategy, new attributes are constructed from the given set of
attributes to help the mining process.
3. Discretization:
This replaces the raw values of a numeric attribute with interval
levels or conceptual levels.
4. Concept Hierarchy Generation:
Here attributes are converted from a lower level to a higher level in the
hierarchy. For example, the attribute “city” can be converted to “country”.
3. Data Reduction:
Data mining is used to handle huge amounts of data, and analysis becomes
harder as the volume of data grows. Data reduction techniques address this
problem. They aim to increase storage efficiency and reduce data storage
and analysis costs.
The various steps of data reduction are:
1. Data Cube Aggregation:
An aggregation operation is applied to the data to construct the data
cube.
2. Attribute Subset Selection:
Only the highly relevant attributes should be used; the rest can be
discarded. For attribute selection, one can use the level of significance
and the p-value of the attribute. An attribute whose p-value is greater
than the significance level can be discarded.
3. Numerosity Reduction:
This stores a model of the data instead of the whole data, for example:
Regression Models.
4. Dimensionality Reduction:
This reduces the size of the data through encoding mechanisms. It can be
lossy or lossless. If the original data can be retrieved after
reconstruction from the compressed data, the reduction is called lossless;
otherwise it is called lossy. Two effective methods of dimensionality
reduction are Wavelet transforms and PCA (Principal Component Analysis).
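
The cleaning and transformation steps described above can be illustrated with a minimal sketch. It is not part of the lab program: the `values` array is made-up data, and the choice of three bins is an assumption made only for illustration. It shows mean imputation of missing values, smoothing by bin means, and min-max normalization to the range 0.0 to 1.0.

```python
import numpy as np

# Hypothetical attribute with two missing entries (NaN); the numbers are made up.
values = np.array([4.0, 8.0, np.nan, 15.0, 21.0, 24.0, np.nan, 28.0, 34.0])

# 1. Fill missing values with the attribute mean (nanmean ignores the NaNs).
mean_value = np.nanmean(values)
filled = np.where(np.isnan(values), mean_value, values)

# 2. Binning: sort the data, split it into 3 equal-size bins (an assumed bin
#    count), and replace every value in a bin by that bin's mean.
sorted_vals = np.sort(filled)
bins = np.array_split(sorted_vals, 3)
smoothed = np.concatenate([np.full(len(b), b.mean()) for b in bins])

# 3. Min-max normalization: rescale the filled values to the range [0.0, 1.0].
normalized = (filled - filled.min()) / (filled.max() - filled.min())

print("Filled    :", filled)
print("Smoothed  :", smoothed)
print("Normalized:", normalized)
```

Replacing the bin mean with the nearest bin boundary would give the "boundary values" variant of binning mentioned above.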

Program :

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

data = pd.read_csv("data1.csv")
print("data\n")
print(data.head())

# Find the rows where the "RM" value is missing

indi = []
for index, item in data.iterrows():
    if pd.isnull(item["RM"]):
        indi.append(index)
print("Index where data is null\n")
print(indi)

# Using the pandas function
print("Using function")
null_data = pd.isnull(data["RM"])
print(null_data)
print(data[null_data])

# Train-Test Splitting

def split_train_test_data(data, test_ratio):
    # Shuffle the row positions with a fixed seed, then slice off the test rows
    np.random.seed(42)
    shuffled = np.random.permutation(len(data))
    test_set_size = int(len(data) * test_ratio)
    test_indices = shuffled[:test_set_size]
    train_indices = shuffled[test_set_size:]
    return data.iloc[train_indices], data.iloc[test_indices]

train_set, test_set = split_train_test_data(data, 0.4)

print(f"Rows in train set: {len(train_set)}\nRows in test set: {len(test_set)}\n")

# Using the scikit-learn function
print("Using Function")
train_set_using_function, test_set_using_function = train_test_split(
    data, test_size=0.2, random_state=42)
print(f"Rows in train set: {len(train_set_using_function)}\n"
      f"Rows in test set: {len(test_set_using_function)}\n")

Output :
EXPERIMENT-2
AIM: Implement Simple Linear Regression.
Simple Linear Regression is a type of regression algorithm that models the
relationship between a dependent variable and a single independent variable.
The relationship shown by a Simple Linear Regression model is linear, i.e. a
sloped straight line, hence it is called Simple Linear Regression.

For simple linear regression, the form of the model is-

Y = β0 + β1X

Here,
• Y is a dependent variable.
• X is an independent variable.
• β0 and β1 are the regression coefficients.
• β0 is the intercept or the bias that fixes the offset of the line.
• β1 is the slope or weight that specifies the factor by which X has an
impact on Y.

The following 3 cases are possible-

Case-01: β1 < 0

• It indicates that variable X has a negative impact on Y.
• If X increases, Y will decrease and vice-versa.

Case-02: β1 = 0

• It indicates that variable X has no impact on Y.
• If X changes, there will be no change in Y.

Case-03: β1 > 0

• It indicates that variable X has a positive impact on Y.
• If X increases, Y will increase and vice-versa.
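
As a rough illustration of how β0 and β1 are obtained, the sketch below applies the standard least-squares formulas β1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β0 = ȳ − β1·x̄. The `x` and `y` arrays are made-up values, not the lab dataset, and the variable names are chosen only for this example.

```python
import numpy as np

# Hypothetical data that lies exactly on the line y = 1 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

x_mean, y_mean = x.mean(), y.mean()

# Slope: covariance of x and y divided by the variance of x.
beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
# Intercept: forces the fitted line through the point (x_mean, y_mean).
beta0 = y_mean - beta1 * x_mean

print("beta0 (intercept):", beta0)
print("beta1 (slope)    :", beta1)
```

Here the slope is positive, so this data falls under Case-03.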
Program :

import pandas
from pandas import DataFrame
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from scipy.stats import pearsonr

data = pandas.read_csv('Linear_regression_basic.csv')
print("Dataset : \n",data)

print(data.describe())

x = DataFrame(data,columns=['x'])
y = DataFrame(data,columns=['y'])

# plt.figure(figsize=(10,10))
plt.title('LINEAR REGRESSION')
plt.xlabel('X axis') #label of X axis
plt.ylabel('Y axis') #label of Y axis
plt.ylim(0,7) #for Y axis limit
plt.xlim(0,8) #for X axis limit
plt.grid() #for grid
# plt.scatter(x,y,alpha=(0.7)) #Visibility of point
plt.scatter(x,y,color='green',s=50) #s for size
plt.show()

print("Data in x : \n",data['x'])
print("Data in y : \n",data['y'])
# Correlation Coefficient
corr,_ = pearsonr(data['x'],data['y'])
print("Correlation Coefficient : ",corr)

regression = LinearRegression()
regression.fit(x,y)
print("Regression Coefficient : ",regression.coef_)
# Intercept
print("Regression Intercept : ",regression.intercept_)

plt.figure(figsize=(10,6))
plt.title('LINEAR REGRESSION')
plt.xlabel('x --->')
plt.ylabel('y --->')
plt.ylim(0,15)
plt.xlim(0,15)
plt.scatter(x,y,alpha=(0.5))
plt.plot(x,regression.predict(x),color = "red",linewidth=2)
plt.show()
print("R^2 score : ",regression.score(x,y)) # score() returns R^2 for regression

print(" New data ")

data.loc[2, 'y'] = 7 # .loc avoids pandas chained assignment

print(data)

X = DataFrame(data,columns=['x'])
Y = DataFrame(data,columns=['y'])

plt.title('LINEAR REGRESSION NEW')

plt.xlabel('X axis') #label of X axis
plt.ylabel('Y axis') #label of Y axis
plt.ylim(0,10) #for Y axis limit
plt.xlim(0,10) #for X axis limit
plt.grid() #for grid
# plt.scatter(x,y,alpha=(0.7)) #Visibility of point
plt.scatter(X,Y,color='green',s=50) #s for size
plt.show()

regression = LinearRegression()
regression.fit(X,Y)

print("Regression Coefficient : ",regression.coef_)

print("Regression Intercept : ",regression.intercept_)

plt.figure(figsize=(10,6))
plt.title('LINEAR REGRESSION NEW')
plt.xlabel('x --->')
plt.ylabel('y --->')
plt.ylim(0,15)
plt.xlim(0,15)
plt.scatter(X,Y)
plt.plot(X,regression.predict(X),color = "red",linewidth=2)
plt.show()

print("New score : ",regression.score(X,Y))

Output :
EXPERIMENT-3
AIM: Simulate Multiple Linear Regression.
In multiple linear regression, the dependent variable depends on more than
one independent variable.

For multiple linear regression, the form of the model is-

Y = β0 + β1X1 + β2X2 + β3X3 + …… + βnXn

Here,
• Y is a dependent variable.
• X1, X2, …., Xn are independent variables.
• β0, β1,…, βn are the regression coefficients.
• βj (1<=j<=n) is the slope or weight that specifies the factor by
which Xj has an impact on Y.
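
To make the model form concrete, the following sketch estimates β0, β1 and β2 by ordinary least squares using NumPy. The data is made up and generated without noise from Y = 1 + 2·X1 + 3·X2, so the recovered coefficients should match those values; it is an illustration only, not the lab program.

```python
import numpy as np

# Hypothetical data with two independent variables X1 and X2.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
Y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Add a column of ones so that the first coefficient acts as the intercept β0.
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of A @ beta = Y.
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)

print("beta0, beta1, beta2:", beta)
```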

Program :

import pandas as pd
import numpy as np
from sklearn import linear_model

data = pd.read_csv("mlr.csv")
print("\n Dataset\n")
print(data.head())

print("length of data : ",data.shape)

# dependent variable = price

# independent variables (features) = area,bedrooms,age

# price = m1 x area + m2 x bedrooms + m3 x age + m4

# m1,m2,m3 = coefficient
# m4 = intercept

# Train-Test Splitting

from sklearn.model_selection import train_test_split

train_set, test_set = train_test_split(data, test_size=0.2, random_state=42)
print(f"Rows in train set: {len(train_set)}\nRows in test set: {len(test_set)}\n")

print("Train set : ",train_set)

print("Test set : ",test_set)

reg = linear_model.LinearRegression()
reg.fit(train_set[["area","bedrooms","age"]],train_set["price"])

print("Regression Coefficient : ",reg.coef_) #m1,m2,m3

print("Intercept : ",reg.intercept_) #m4

print("predicted value of test set : ")

print(reg.predict(test_set[["area","bedrooms","age"]]))

Output:
EXPERIMENT-4
AIM: Implement Decision Trees.
o Decision Tree is a Supervised learning technique that can be used for
both Classification and Regression problems, but it is mostly preferred
for solving Classification problems. It is a tree-structured classifier,
where internal nodes represent the features of a dataset, branches
represent the decision rules and each leaf node represents the
outcome.
o In a Decision tree, there are two types of nodes: the Decision
Node and the Leaf Node. Decision nodes are used to make a decision and
have multiple branches, whereas Leaf nodes are the output of those
decisions and do not contain any further branches.
o The decisions or tests are performed on the basis of the features of the
given dataset.
o It is a graphical representation for getting all the possible solutions to
a problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the
root node, which expands on further branches and constructs a tree-like
structure.
o In order to build a tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm.
o A decision tree simply asks a question, and based on the answer
(Yes/No), it further splits the tree into subtrees.
ALGORITHM OF DECISION TREES:

o Step-1: Begin the tree with the root node, say S, which contains the
complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values of the
best attribute.
o Step-4: Generate the decision tree node, which contains the best
attribute.
o Step-5: Recursively make new decision trees using the subsets of the
dataset created in Step-3. Continue this process until a stage is reached
where the nodes cannot be classified further; such a final node is called
a leaf node.
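
Step-2 above depends on an Attribute Selection Measure. The sketch below uses Gini impurity as that measure (an assumption made for this example; entropy-based information gain is another common choice) and searches one numeric feature for the threshold that gives the purest split. The `age` and `survived` arrays and the helper names `gini` and `best_threshold` are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array: 1 - sum of squared class proportions."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(feature, labels):
    """Return (threshold, weighted impurity) of the best split feature <= threshold."""
    best_t, best_score = None, float("inf")
    values = np.unique(feature)
    # Candidate thresholds are the midpoints between consecutive distinct values.
    for t in (values[:-1] + values[1:]) / 2:
        left, right = labels[feature <= t], labels[feature > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Made-up data: younger passengers labelled 0, older passengers labelled 1.
age = np.array([22, 25, 30, 35, 40, 50])
survived = np.array([0, 0, 0, 1, 1, 1])

threshold, impurity = best_threshold(age, survived)
print("Best threshold:", threshold, "Weighted Gini:", impurity)
```

A full tree builder would apply this search to every feature, split on the best one, and recurse on each subset as in Step-5.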

Implementation:

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

df = pd.read_csv("titanic.csv")
print(df.head(5))
print("Shape : ", df.shape)
data = pd.DataFrame(df, columns=["Pclass", "Sex", "Age", "Fare", "Survived"])
print(data.head())

# Named "inputs" rather than "input" to avoid shadowing the Python built-in
inputs = data.drop('Survived', axis=1)
inputs.Sex = inputs.Sex.map({'male': 1, 'female': 2})
target = data.Survived
print("Input Data set : \n")
print(inputs)
print("Target Data set : \n")
print(target)
inputs.info()
# Fill missing ages with the mean age
inputs.Age = inputs.Age.fillna(inputs.Age.mean())

X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)
print("length of Train Dataset", len(X_train))
print("length of test Dataset", len(X_test))

model = tree.DecisionTreeClassifier()
model.fit(X_train, y_train)
print("predicted values on test set : \n")
y_predicted = model.predict(X_test)
print(y_predicted)
print("Score : ", model.score(X_test, y_test))

print("Confusion Matrix : ")
cm = confusion_matrix(y_test, y_predicted)
print(cm)

plt.figure(figsize=(10, 7))
sn.heatmap(cm, annot=True)
plt.xlabel('Predicted')
plt.ylabel('Truth')
plt.show()

Dataset :

Titanic data set

Output:
EXPERIMENT-5
AIM: Implement Random Forest Classification.
Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique. It can be used for both Classification and
Regression problems in ML. It is based on the concept of ensemble
learning, which is a process of combining multiple classifiers to solve a complex
problem and to improve the performance of the model.

As the name suggests, "Random Forest is a classifier that contains a number
of decision trees on various subsets of the given dataset and takes the
average to improve the predictive accuracy of that dataset." Instead of
relying on one decision tree, the random forest takes the prediction from each
tree and predicts the final output based on the majority vote of those
predictions.

A greater number of trees in the forest generally gives more stable accuracy
and reduces the risk of overfitting compared with a single decision tree.

ALGORITHM OF RANDOM FOREST:

Random Forest works in two phases: the first is to create the random forest by
combining N decision trees, and the second is to make predictions with each
tree created in the first phase.

The working process can be explained in the steps below:

Step-1: Select random K data points from the training set.

Step-2: Build the decision tree associated with the selected data points
(subset).

Step-3: Choose the number N of decision trees that you want to build.

Step-4: Repeat Steps 1 and 2 until N trees have been built.

Step-5: For new data points, find the predictions of each decision tree, and
assign the new data points to the category that wins the majority of votes.
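
The steps above can be sketched by hand with scikit-learn decision trees. This is only an illustration of the idea, not how `RandomForestClassifier` is implemented internally: the synthetic dataset from `make_classification`, the choice of 11 trees, and drawing as many points as the training set holds are all assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class dataset (made up for illustration).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

rng = np.random.default_rng(0)
n_trees = 11  # Step-3: number N of trees (odd, so a two-class vote cannot tie)
trees = []
for _ in range(n_trees):
    # Step-1: draw random data points from the training set, with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # Step-2: build a decision tree on that subset.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Step-5: collect each tree's prediction and take the majority vote.
votes = np.array([t.predict(X) for t in trees])        # shape (n_trees, n_samples)
majority = (votes.sum(axis=0) > n_trees / 2).astype(int)
accuracy = (majority == y).mean()

print("Training accuracy of the majority vote:", accuracy)
```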

Python Code:
# Random Forest Classification
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset

dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the Random Forest Classification model on the Training set

from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results

y_pred = classifier.predict(X_test)

# Making the Confusion Matrix

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Visualising the Training set results

from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

# Visualising the Test set results

X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

Output:
[[63 5]
[ 4 28]]
EXPERIMENT-6
AIM: Simulate Naïve Bayes algorithm.
o Naïve Bayes algorithm is a supervised learning algorithm, which is based
on Bayes' theorem and used for solving classification problems.
o It is mainly used in text classification that includes a high-dimensional
training dataset.
o Naïve Bayes Classifier is one of the simplest and most effective
Classification algorithms, which helps in building fast machine
learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the
probability of an object.
o Some popular examples of the Naïve Bayes algorithm are spam filtering,
sentiment analysis, and classifying articles.

The name Naïve Bayes is made up of two words, Naïve and Bayes, which
can be described as:

o Naïve: It is called Naïve because it assumes that the occurrence of a
certain feature is independent of the occurrence of other features. For
example, if a fruit is identified on the basis of color, shape, and taste, then
a red, spherical, and sweet fruit is recognized as an apple. Hence each
feature individually contributes to identifying it as an apple without
depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes'
Theorem.

Bayes' Theorem:

o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is
used to determine the probability of a hypothesis with prior knowledge.
It depends on the conditional probability.
o The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) × P(A) / P(B)

Where,

P(A|B) is the Posterior probability: Probability of hypothesis A given the
observed event B.

P(B|A) is the Likelihood probability: Probability of the evidence given that
hypothesis A is true.

P(A) is the Prior Probability: Probability of the hypothesis before observing
the evidence.

P(B) is the Marginal Probability: Probability of the evidence.

Working of Naïve Bayes' Classifier:

Working of Naïve Bayes' Classifier can be understood with the help of the
example below:

Suppose we have a dataset of weather conditions and a corresponding target
variable "Play". Using this dataset, we need to decide whether we
should play or not on a particular day according to the weather conditions.
To solve this problem, we need to follow the steps below:

1. Convert the given dataset into frequency tables.

2. Generate a Likelihood table by finding the probabilities of the given features.
3. Now, use Bayes' theorem to calculate the posterior probability.
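
The three steps can be worked through by hand for a single feature. The sketch below reuses the weather and play values from this experiment's program and computes P(Yes | Sunny) from frequency counts; the variable names are chosen only for this example.

```python
from collections import Counter

# Weather and play columns, as listed in the program of this experiment.
outlook = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
n = len(play)

# Step 1: frequency tables.
play_counts = Counter(play)                   # how often each class occurs
outlook_counts = Counter(outlook)             # how often each weather value occurs
joint_counts = Counter(zip(outlook, play))    # (weather, class) pair counts

# Step 2: prior, evidence and likelihood from the counts.
p_yes = play_counts['Yes'] / n                                           # P(Yes)
p_sunny = outlook_counts['Sunny'] / n                                    # P(Sunny)
p_sunny_given_yes = joint_counts[('Sunny', 'Yes')] / play_counts['Yes']  # P(Sunny|Yes)

# Step 3: Bayes' theorem, P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny).
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny

print("P(Yes | Sunny) =", round(p_yes_given_sunny, 3))
```

With these 14 records, 2 of the 5 sunny days have Play = Yes, so the posterior works out to 2/5.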

Program :

import numpy as np
# Import LabelEncoder
from sklearn import preprocessing
# Import Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB

# Assigning features and label variables

weather = ['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny',
           'Rainy','Sunny','Overcast','Overcast','Rainy']
temp = ['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
humidity = ['high','high','high','high','normal','normal','normal','high','normal','normal','normal','high','normal','high']
windy = ['false','true','false','false','false','true','true','false','false','false','true','true','false','true']
play = ['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']

# Creating the label encoder (le --> Label Encoder)

le = preprocessing.LabelEncoder()

# Converting string labels into numbers.
# LabelEncoder assigns codes in sorted order, e.g. Overcast=0, Rainy=1, Sunny=2.

weather_encoded = le.fit_transform(weather)
temp_encoded = np.array(le.fit_transform(temp))
humidity_encoded = np.array(le.fit_transform(humidity))
windy_encoded = np.array(le.fit_transform(windy))
label = np.array(le.fit_transform(play))

for r in range(0, len(weather)):
    print("%10s\t%d\t%10s\t%d\t%10s\t%d\t%10s\t%d\t%10s\t%d" %
          (weather[r], weather_encoded[r], temp[r], temp_encoded[r], humidity[r],
           humidity_encoded[r], windy[r], windy_encoded[r], play[r], label[r]))

# Combining weather, temp, humidity and windy into a single list of rows

features = []
print("\nFeature Array")
for r in range(0, len(weather)):
    row = [weather_encoded[r], temp_encoded[r], humidity_encoded[r], windy_encoded[r]]
    features.append(row)
    print("%5d%5d%5d%5d" % (row[0], row[1], row[2], row[3]))

# Create a Gaussian Classifier

model = GaussianNB()

# Train the model using the training sets

model.fit(features, label)

# Predict Output
predicted = model.predict([[2, 1, 0, 0]]) # 2:Sunny, 1:Hot, 0:high, 0:false
if predicted[0] == 0: print("Predicted Value:", predicted, "\tPlay: NO")
if predicted[0] == 1: print("Predicted Value:", predicted, "\tPlay: YES")
EXPERIMENT-7
AIM: Implement K-Nearest Neighbors (K-NN), k-means.
K-NN Algorithm:
o K-Nearest Neighbour is one of the simplest Machine Learning algorithms
based on the Supervised Learning technique.
o K-NN algorithm assumes similarity between the new case/data and the
available cases and puts the new case into the category that is most
similar to the available categories.
o K-NN algorithm stores all the available data and classifies a new data
point based on the similarity. This means that when new data appears, it
can be easily classified into a well-suited category by using the K-NN
algorithm.
o K-NN algorithm can be used for Regression as well as for Classification,
but it is mostly used for Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any
assumption about the underlying data.
o It is also called a lazy learner algorithm because it does not learn from
the training set immediately; instead it stores the dataset and, at the time
of classification, performs an action on the dataset.
o KNN algorithm at the training phase just stores the dataset, and when it
gets new data, it classifies that data into the category that is most
similar to the new data.

K-NN Algorithm:

o Step-1: Select the number K of neighbors.

o Step-2: Calculate the Euclidean distance from the new data point to each
training data point.

o Step-3: Take the K nearest neighbors as per the calculated Euclidean
distance.

o Step-4: Among these K neighbors, count the number of data points in
each category.

o Step-5: Assign the new data point to the category for which the number of
neighbors is maximum.

o Step-6: Our model is ready.
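
The steps above can be sketched directly in NumPy, without scikit-learn. The helper name `knn_predict` and the six training points are hypothetical and serve only to illustrate the distance calculation and the majority vote.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by a majority vote among its k nearest training points."""
    # Step-2: Euclidean distance from x_new to every training point.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step-3: indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Step-4 and Step-5: count the categories and return the most common one.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Made-up training data: two well-separated groups labelled 0 and 1.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict(X_train, y_train, np.array([1.2, 1.1]))
pred_b = knn_predict(X_train, y_train, np.array([6.4, 6.2]))
print("Prediction near the first group :", pred_a)
print("Prediction near the second group:", pred_b)
```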

Program:

import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()

print("IRIS\nFeature Names:\n",iris.feature_names)
print("\nTarget Names:\n",iris.target_names)
df = pd.DataFrame(iris.data,columns=iris.feature_names)

df['target'] = iris.target
print("\nDATASET:\n",df.head())
print("Shape: ",df.shape)

from sklearn.model_selection import train_test_split

X = df.drop(['target'], axis='columns')
y = df.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train,y_train)
print("Predicted values on testing dataset: \n",model.predict(X_test))
print("Model Score : ",model.score(X_test,y_test))

Output:
k-Means Algorithm:

K-Means Clustering is an Unsupervised Learning algorithm, which groups the
unlabelled dataset into different clusters. Here K defines the number of
pre-defined clusters that need to be created in the process: if K=2, there
will be two clusters, for K=3, there will be three clusters, and so on.

It allows us to cluster the data into different groups and is a convenient way
to discover the categories of groups in the unlabelled dataset on its own,
without the need for any training.

It is a centroid-based algorithm, where each cluster is associated with a
centroid. The main aim of this algorithm is to minimize the sum of squared
distances between the data points and the centroids of their clusters.

The algorithm takes the unlabelled dataset as input, divides the dataset into
K clusters, and repeats the process until the cluster assignments stop
changing. The value of K must be chosen in advance.

k-Means ALGORITHM:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as the initial centroids. (They need not be
points from the input dataset.)

Step-3: Assign each data point to its closest centroid, which will form the
predefined K clusters.

Step-4: Calculate the mean of each cluster and place the new centroid of that
cluster there.

Step-5: Repeat the third step, which means reassign each data point to the
new closest centroid.

Step-6: If any reassignment occurs, then go to Step-4, else go to FINISH.

Step-7: The model is ready.
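
A minimal NumPy sketch of these steps is given below. It is an illustration under simple assumptions, not the scikit-learn implementation: the initial centroids are picked at random from the data points, the function name `kmeans` and its parameters are hypothetical, and the six data points are made up.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step-2: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Step-3 / Step-5: assign every point to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step-6: stop when no point changes its cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step-4: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Made-up data: two well-separated groups of three points each.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])
labels, centroids = kmeans(X, k=2)
print("Labels   :", labels)
print("Centroids:\n", centroids)
```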

Program:

from sklearn.cluster import KMeans

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt

df = pd.read_csv("income.csv")
print("DATAFRAME:\n",df.head())

plt.scatter(df.Age,df['Income($)'])
plt.xlabel('Age')
plt.ylabel('Income($)')
plt.show()

km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age','Income($)']])
print("Y predicted:",y_predicted)

df['cluster']=y_predicted
print("New dataframe:\n",df.head())
print("\nCluster Centers : ",km.cluster_centers_)

df1 = df[df.cluster==0]
df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
plt.scatter(df1.Age,df1['Income($)'],color='green')
plt.scatter(df2.Age,df2['Income($)'],color='red')
plt.scatter(df3.Age,df3['Income($)'],color='black')
plt.scatter(km.cluster_centers_[:,0],km.cluster_centers_[:,1],color='purple',marker='*',label='centroid')
plt.xlabel('Age')
plt.ylabel('Income ($)')
plt.legend()
plt.show()

# Preprocessing using min max scaler

scaler = MinMaxScaler()

scaler.fit(df[['Income($)']])
df['Income($)'] = scaler.transform(df[['Income($)']])

scaler.fit(df[['Age']])
df['Age'] = scaler.transform(df[['Age']])
print("\nDataframe :\n",df.head())
plt.scatter(df.Age,df['Income($)'])
plt.show()
km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age','Income($)']])
print("\nY predicted : ",y_predicted)
df['cluster']=y_predicted
print("\nDataset\n",df.head())

print("\nCluster Centers : ",km.cluster_centers_)

Output:
EXPERIMENT-8
AIM: Deploy Support Vector Machine, Apriori algorithm.
Support Vector Machine

Support Vector Machine or SVM is one of the most popular Supervised
Learning algorithms, which is used for Classification as well as Regression
problems. However, it is primarily used for Classification problems in Machine
Learning.

The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put
a new data point in the correct category in the future. This best decision
boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is
termed Support Vector Machine.
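
To see which points act as support vectors, the sketch below fits scikit-learn's `SVC` with a linear kernel on six made-up 2-D points and prints the `support_vectors_` attribute together with the hyperplane parameters. The data and the large value of `C` (chosen here to approximate a hard margin) are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable data: class 0 near the origin, class 1 further out.
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C penalizes margin violations heavily (close to a hard margin).
model = SVC(kernel="linear", C=1000.0)
model.fit(X, y)

# The extreme points that define the margin.
support_vectors = model.support_vectors_
# For a linear kernel the hyperplane is w . x + b = 0.
w, b = model.coef_[0], model.intercept_[0]

prediction = model.predict([[2.0, 2.0], [5.5, 5.5]])
print("Support vectors:\n", support_vectors)
print("w =", w, "b =", b)
print("Predictions:", prediction)
```

Only the points closest to the other class appear among the support vectors; moving any of the remaining points slightly would leave the hyperplane unchanged.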

Program:

import pandas as pd
from sklearn.datasets import load_digits
digits = load_digits()

print("Digits Target : ",digits.target)

df = pd.DataFrame(digits.data) # the labels go in the 'target' column, not the index
df['target'] = digits.target
print("\nDATASET DIGIT\n",df.head())

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df.drop('target',axis='columns'), df.target, test_size=0.3)

print("\nRBF model\n")
from sklearn.svm import SVC
rbf_model = SVC(kernel='rbf')

rbf_model.fit(X_train, y_train)
print("RBF model Score",rbf_model.score(X_test,y_test))
print("\nUsing Linear kernel\n")
linear_model = SVC(kernel='linear')
linear_model.fit(X_train,y_train)
print("linear model Score",linear_model.score(X_test,y_test))
Output:

APRIORI ALGORITHM:

The Apriori algorithm uses frequent itemsets to generate association rules, and
it is designed to work on databases that contain transactions. With the
help of these association rules, it determines how strongly or how weakly two
objects are connected. This algorithm uses a breadth-first search and a Hash
Tree to calculate the itemset associations efficiently. It is an iterative
process for finding the frequent itemsets in a large dataset.

This algorithm was proposed by R. Agrawal and R. Srikant in 1994. It is
mainly used for market basket analysis and helps to find products that
are likely to be bought together. It can also be used in the healthcare field
to find drug reactions for patients.
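
The strength of an association rule is usually measured with support, confidence and lift. The sketch below computes these three measures by hand on five made-up transactions; it does not use mlxtend, and the helper names `support`, `confidence` and `lift` are chosen only for this example.

```python
# Five hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """How often the consequent appears in transactions containing the antecedent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)

s = support({"bread", "butter"})        # bread and butter together: 3 of 5 baskets
c = confidence({"butter"}, {"bread"})   # rule: butter -> bread
l = lift({"butter"}, {"bread"})

print("support(bread, butter)     =", s)
print("confidence(butter -> bread) =", c)
print("lift(butter -> bread)       =", l)
```

A lift above 1 suggests the two items are bought together more often than would be expected if they were independent.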

Program:

import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

data = pd.read_excel('Online_Retail.xlsx')
print("DataSet:",data.head())
print("Data Columns : ",data.columns)
print("Data Shape : ",data.shape)

# Stripping extra spaces in the description

data['Description'] = data['Description'].str.strip()

# Dropping the rows without any invoice number

data.dropna(axis = 0, subset =['InvoiceNo'], inplace = True)
data['InvoiceNo'] = data['InvoiceNo'].astype('str')
# Dropping all transactions which were done on credit
data = data[~data['InvoiceNo'].str.contains('C')]

# Splitting the data according to the region of transaction

# Transactions done in France
basket_France = (data[data['Country'] == "France"]
                 .groupby(['InvoiceNo', 'Description'])['Quantity']
                 .sum().unstack().reset_index().fillna(0)
                 .set_index('InvoiceNo'))

def hot_encode(x):
    # Any positive quantity means the item is present in the invoice
    if x <= 0:
        return 0
    return 1

# Applying one hot encoding

basket_encoded = basket_France.applymap(hot_encode)
basket_France = basket_encoded

# Building the model

frq_items = apriori(basket_France, min_support = 0.1, use_colnames = True)

# Collecting the inferred rules in a dataframe

rules = association_rules(frq_items, metric ="lift", min_threshold = 1)
rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False])

print(rules.head())

Output:
From the above output, it can be seen that paper cups and plates are bought together
in France.

EXPERIMENT-9
AIM: Simulate Artificial Neural Network.
Neural networks (NN), also called artificial neural networks (ANN), are a subset
of learning algorithms within the machine learning field that are loosely based
on the concept of biological neural networks.
Andrey Bulezyuk, a Germany-based machine learning specialist with more than
five years of experience, says that “neural networks are revolutionizing
machine learning because they are capable of efficiently modelling
sophisticated abstractions across an extensive range of disciplines and
industries.”

Basically, an ANN comprises the following components:

An input layer that receives data and passes it on
A hidden layer
An output layer
Weights between the layers
An activation function for every hidden layer. In this simple neural network,
we employ the Sigmoid activation function.

There are several types of neural networks. In this experiment, we create a
feed-forward (perceptron) neural network. This type of ANN relays data directly
from the front to the back.
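As an illustration of how data moves from the input layer through a hidden layer to the output layer, the following is a minimal sketch of a single forward pass. The layer sizes, the bias vectors and the names `forward`, `w_hidden` and `w_out` are assumptions chosen for the example; the program later in this experiment uses a simpler network with no hidden layer.

```python
import numpy as np


def sigmoid(x):
    """Sigmoid activation: squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One feed-forward pass: input layer -> hidden layer -> output layer."""
    hidden = sigmoid(inputs @ w_hidden + b_hidden)
    output = sigmoid(hidden @ w_out + b_out)
    return hidden, output


# Assumed sizes for the sketch: 3 inputs, 4 hidden neurons, 1 output neuron.
rng = np.random.default_rng(1)
n_inputs, n_hidden, n_outputs = 3, 4, 1

# Random weights in [-1, 1) between the layers, biases started at zero.
w_hidden = rng.uniform(-1, 1, size=(n_inputs, n_hidden))
b_hidden = np.zeros(n_hidden)
w_out = rng.uniform(-1, 1, size=(n_hidden, n_outputs))
b_out = np.zeros(n_outputs)

# Three training examples with three features each.
X = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 1]], dtype=float)

hidden, output = forward(X, w_hidden, b_hidden, w_out, b_out)
print("hidden activations shape:", hidden.shape)
print("network output shape:", output.shape)
```

Each row of `X` produces one row of hidden activations and one output value, and data only flows forward; no weights are changed in this step.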

Training a feed-forward network usually requires back-propagation, which is
driven by corresponding sets of inputs and expected outputs. When the input
data is transmitted into a neuron, it is processed, and an output is
generated.
Summarizing an Artificial Neural Network:
1. Take inputs
2. Add bias (if required)
3. Assign random weights to input features
4. Run the code for training.
5. Find the error in prediction.
6. Update the weights using the gradient descent algorithm.
7. Repeat the training phase with updated weights.
8. Make predictions.

Python Code:
import numpy as np

class NeuralNet(object):
    def __init__(self):
        # Seed the random number generator so that every run is reproducible
        np.random.seed(1)

        # Assign random weights in the range [-1, 1) to a 3 x 1 matrix
        self.synaptic_weights = 2 * np.random.random((3, 1)) - 1

    # The Sigmoid function
    def __sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    # The derivative of the Sigmoid function (the gradient of the Sigmoid
    # curve). Here x is already a Sigmoid output, so the derivative is x * (1 - x).
    def __sigmoid_derivative(self, x):
        return x * (1 - x)

    # Train the neural network and adjust the weights each time.
    def train(self, inputs, outputs, training_iterations):
        for iteration in range(training_iterations):
            # Pass the training set through the network.
            output = self.learn(inputs)

            # Calculate the error
            error = outputs - output

            # Adjust the weights by a factor
            factor = np.dot(inputs.T, error * self.__sigmoid_derivative(output))
            self.synaptic_weights += factor

    # Forward pass: the neural network computes its output for the given inputs.
    def learn(self, inputs):
        return self.__sigmoid(np.dot(inputs, self.synaptic_weights))

if __name__ == "__main__":
    # Initialize
    neural_network = NeuralNet()

    # The training set.
    inputs = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 1]])
    outputs = np.array([[1, 0, 1]]).T

    # Train the neural network
    neural_network.train(inputs, outputs, 10000)

    # Test the neural network with a test example.
    print(neural_network.learn(np.array([1, 0, 1])))

Output:
EXPERIMENT-10
AIM: Implement the Genetic Algorithm Code.
Genetic Algorithm (GA) is a search-based optimization technique based on the
principles of genetics and natural selection. It is frequently used to find
optimal or near-optimal solutions to difficult problems that would otherwise
take a lifetime to solve, and it is applied to optimization problems in
research and in machine learning.
WORKING OF GENETIC ALGORITHM:
1. Initial Population– Initialize the population randomly based on the data.
2. Fitness function– Find the fitness value of each of the chromosomes (a
chromosome is a set of parameters which defines a proposed solution to
the problem that the genetic algorithm is trying to solve).
3. Selection– Select the fittest chromosomes as parents to pass their
genes on to the next generation and create a new population.
4. Cross-over– Create a new set of chromosomes by combining the parents and
add them to the new population set.
5. Mutation– Perform mutation, which alters one or more gene values in a
chromosome in the newly generated population set. Mutation helps to keep
the population diverse. The resulting population is used in the
next generation.
Repeat steps 2-5 for each generation.
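The five steps above can be sketched on a toy problem before they are applied to feature selection. The example below is a minimal illustration using only the standard library; the "count the ones" fitness function, the single-point crossover and all parameter values are assumptions made for the sketch, not the settings used in the program that follows.

```python
import random


def fitness(chromosome):
    """Toy fitness function: the number of genes set to 1."""
    return sum(chromosome)


def run_ga(n_genes=20, pop_size=12, n_parents=6,
           mutation_rate=0.05, n_gen=30, seed=0):
    rng = random.Random(seed)

    # 1. Initial population: random bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []

    for _ in range(n_gen):
        # 2. Fitness: rank the chromosomes from best to worst.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))

        # 3. Selection: keep the fittest chromosomes as parents.
        parents = population[:n_parents]

        # 4. Cross-over: single-point crossover between neighbouring parents.
        children = []
        for i in range(pop_size - n_parents):
            a = parents[i % n_parents]
            b = parents[(i + 1) % n_parents]
            cut = rng.randint(1, n_genes - 1)
            children.append(a[:cut] + b[cut:])  # slicing builds a new list

        # 5. Mutation: flip each gene of a child with a small probability.
        for child in children:
            for j in range(n_genes):
                if rng.random() < mutation_rate:
                    child[j] = 1 - child[j]

        # Parents survive unchanged, so the best score never decreases.
        population = parents + children

    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


best, history = run_ga()
print("best fitness per generation:", history)
print("best chromosome:", best)
```

Because the parents are carried over unmodified, the best fitness in `history` can only stay the same or improve from one generation to the next.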
Python Code:

import numpy as np
import pandas as pd
import random
import matplotlib.pyplot
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Import the breast cancer dataset
cancer = load_breast_cancer()
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
label = cancer["target"]

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df, label, test_size=0.30, random_state=101)

# Training a logistic regression model on all features (baseline)
logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)
predictions = logmodel.predict(X_test)
print("Accuracy = " + str(accuracy_score(y_test, predictions)))

# Defining the various steps required for the genetic algorithm
#defining various steps required for the genetic algorithm
def initilization_of_population(size, n_feat):
    population = []
    for i in range(size):
        # Each chromosome is a boolean mask over the features;
        # about 30% of the features start switched off.
        chromosome = np.ones(n_feat, dtype=bool)
        chromosome[:int(0.3 * n_feat)] = False
        np.random.shuffle(chromosome)
        population.append(chromosome)
    return population

def fitness_score(population):
    scores = []
    for chromosome in population:
        # Fit and evaluate the model using only the selected features
        logmodel.fit(X_train.iloc[:, chromosome], y_train)
        predictions = logmodel.predict(X_test.iloc[:, chromosome])
        scores.append(accuracy_score(y_test, predictions))
    scores, population = np.array(scores), np.array(population)
    inds = np.argsort(scores)
    # Return scores and chromosomes sorted from best to worst
    return list(scores[inds][::-1]), list(population[inds, :][::-1])

def selection(pop_after_fit, n_parents):
    # Keep the n_parents best chromosomes
    population_nextgen = []
    for i in range(n_parents):
        population_nextgen.append(pop_after_fit[i])
    return population_nextgen

def crossover(pop_after_sel):
    # Copy the list so that appending children does not change pop_after_sel
    population_nextgen = list(pop_after_sel)
    n = len(pop_after_sel)
    for i in range(n):
        # Copy the parent so that the crossover does not overwrite it
        child = pop_after_sel[i].copy()
        child[3:7] = pop_after_sel[(i + 1) % n][3:7]
        population_nextgen.append(child)
    return population_nextgen

def mutation(pop_after_cross, mutation_rate):
    population_nextgen = []
    for i in range(0, len(pop_after_cross)):
        # Work on a copy so that the input population is left unchanged
        chromosome = pop_after_cross[i].copy()
        for j in range(len(chromosome)):
            if random.random() < mutation_rate:
                chromosome[j] = not chromosome[j]
        population_nextgen.append(chromosome)
    return population_nextgen

def generations(size, n_feat, n_parents, mutation_rate, n_gen, X_train,
                X_test, y_train, y_test):
    best_chromo = []
    best_score = []
    population_nextgen = initilization_of_population(size, n_feat)
    for i in range(n_gen):
        scores, pop_after_fit = fitness_score(population_nextgen)
        print(scores[:2])
        pop_after_sel = selection(pop_after_fit, n_parents)
        pop_after_cross = crossover(pop_after_sel)
        population_nextgen = mutation(pop_after_cross, mutation_rate)
        best_chromo.append(pop_after_fit[0])
        best_score.append(scores[0])
    return best_chromo, best_score

Output:
