Abhishek ML File
LAB EXPERIMENT 1
The probability that it is Friday and that a student is absent is 3%. Since there are 5 school days in a week, the probability that it is Friday is 20%. What is the probability that a student is absent given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans: 15%)
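By the definition of conditional probability, P(Absent | Friday) = P(Friday and Absent) / P(Friday) = 0.03 / 0.20 = 0.15, i.e. 15%.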
ALGORITHM:
Step 1: Calculate the probability for each word in a text and filter out the words whose probability is less than the threshold probability; words with probability below the threshold are irrelevant.
Step 2: Then, for each word in the dictionary, compute the probability of that word appearing in insincere questions and its probability in sincere questions, then find the conditional probability to use in the naive Bayes classifier.
Step 3: Prediction using conditional probabilities.
Step 4: End.
PROGRAM:
PFIA=float(input("Enter probability that it is Friday and that a student is absent="))
PF=float(input("probability that it is Friday="))
PABF=PFIA / PF
print("probability that a student is absent given that today is Friday using conditional probabilities=",PABF)
OUTPUT:
Enter probability that it is Friday and that a student is absent= 0.03
probability that it is Friday= 0.2
probability that a student is absent given that today is Friday using conditional probabilities= 0.15
LAB EXPERIMENT 2
ALGORITHM:
Step 1: Connect to MySQL from Python
Step 2: Define a SQL SELECT Query
Step 3: Get Cursor Object from Connection
Step 4: Execute the SELECT query using execute() method
Step 5: Extract all rows from a result
Step 6: Iterate each row
Step 7: Close the cursor object and database connection object
Step 8: End.
PROCEDURE
CREATING A DATABASE IN MYSQL AS FOLLOWS:
CREATE DATABASE myDB;
SHOW DATABASES;
USE myDB;
CREATE TABLE MyGuests (id INT, name VARCHAR(20), email VARCHAR(20));
SHOW TABLES;
INSERT INTO MyGuests (id,name,email) VALUES(1,"sairam","[email protected]");
SELECT * FROM MyGuests;
We need to install mysql-connector to connect Python with MySQL. You can use the below command to install it on your system.
pip install mysql-connector-python-rf
PROGRAM:
import mysql.connector
# Connect to MySQL (host, user and password below are placeholders -- replace them with your own credentials)
mydb = mysql.connector.connect(host="localhost", user="root", password="password", database="myDB")
mycursor = mydb.cursor()
mycursor.execute("SELECT * FROM MyGuests")
myresult = mycursor.fetchall()
for x in myresult:
    print(x)
# Step 7: close the cursor object and the database connection object
mycursor.close()
mydb.close()
OUTPUT:
Extracting data from Excel sheet using Python
Step 1: First convert the dataset present in Excel to a CSV file using online resources, then execute the following program.
Consider that the dataset consists of 14 input columns and 3 output columns (C1, C2, C3), as follows:
Python Source Code:
import pandas as pd
dataset=pd.read_csv("Mul_Label_Dataset.csv", delimiter=',')
print(dataset) #Print entire dataset
X = dataset[['Send','call','DC','IFMSCV','MSCV','BA','MBZ','TxO','RS','CA','AL','IFWL','WWL','FWL']].values
Y = dataset[['C1','C2','C3']].values
print(Y) #Prints output values
print(X) #Prints input values
X1 = dataset[['Send','call','DC','IFMSCV','MSCV']].values
print(X1) #Prints first 5 columns of input values
print(X[0:5]) # Prints only first 5 rows of input values
OUTPUT SCREENS:
Excel Format:
CSV Format:
LAB EXPERIMENT 3
ALGORITHM:
Step 1: Load the data
Step 2: Initialize the value of k
Step 3: To get the predicted class, iterate from 1 to the total number of training data points:
i) Calculate the distance between the test data and each row of training data. Here we use Euclidean distance as our distance metric, since it is the most popular method; other metrics that can be used are Chebyshev, cosine, etc.
ii) Sort the calculated distances in ascending order based on the distance values and get the top k rows from the sorted array.
iii) Get the most frequent class of these rows, i.e. get the labels of the selected k entries.
iv) Return the predicted class:
If regression, return the mean of the k labels.
If classification, return the mode of the k labels.
Step 4: End.
PROGRAM
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()
data = iris.data
labels = iris.target
for i in [0, 79, 99, 101]:
    print(f"index: {i:3}, features: {data[i]}, label: {labels[i]}")
np.random.seed(42)
indices = np.random.permutation(len(data))
n_training_samples = 12
learn_data = data[indices[:-n_training_samples]]
learn_labels = labels[indices[:-n_training_samples]]
test_data = data[indices[-n_training_samples:]]
test_labels = labels[indices[-n_training_samples:]]
print("The first samples of our learn set:")
print(f"{'index':7s}{'data':20s}{'label':3s}")
for i in range(5):
    print(f"{i:4d} {learn_data[i]} {learn_labels[i]:3}")
print("The first samples of our test set:")
print(f"{'index':7s}{'data':20s}{'label':3s}")
for i in range(5):
    print(f"{i:4d} {test_data[i]} {test_labels[i]:3}")
#The following code is only necessary to visualize the data of our learnset
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
colours = ("r", "b")
X = []
for iclass in range(3):
X.append([[], [], []])
for i in range(len(learn_data)):
if learn_labels[i] == iclass:
X[iclass][2].append(sum(learn_data[i][2:]))
colours = ("r", "g", "y")
fig = plt.figure()
for iclass in range(3):
ax.scatter(X[iclass][0], X[iclass][1], X[iclass][2], c=colours[iclass])
plt.show()
#
def distance(instance1, instance2):
    """ Calculates the Euclidean distance between two instances """
    return np.linalg.norm(np.subtract(instance1, instance2))

def get_neighbors(training_set, labels, test_instance, k, distance):
    """
    get_neighbors calculates a list of the k nearest neighbors of an instance 'test_instance'.
    The function returns a list of k 3-tuples. Each 3-tuple consists of (sample, dist, label).
    """
    distances = []
    for index in range(len(training_set)):
        dist = distance(test_instance, training_set[index])
        distances.append((training_set[index], dist, labels[index]))
    distances.sort(key=lambda x: x[1])
    neighbors = distances[:k]
    return neighbors
for i in range(5):
    neighbors = get_neighbors(learn_data, learn_labels, test_data[i], 3, distance=distance)
    print("Index: ", i, '\n',
          "Testset Data: ", test_data[i], '\n',
          "Testset Label: ", test_labels[i], '\n',
          "Neighbors: ", neighbors, '\n')
OUTPUT:
LAB EXPERIMENT 4
ALGORITHM:
Step 1: Create the database for Linear Regression
Step 2: Find the hypothesis of Linear Regression
Step 3: Train a Linear Regression model
Step 4: Evaluate the model
Step 5: Scikit-learn implementation
Step 6: End
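PROGRAM:
A minimal scikit-learn sketch of Steps 1-5 follows; the arrays below are small illustrative data, not a real dataset.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Step 1: create a small illustrative dataset
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.8, 12.2])
# Steps 2-3: fit the hypothesis y = w*x + b to the data
model = LinearRegression()
model.fit(X, y)
print("Slope (w):", model.coef_[0])
print("Intercept (b):", model.intercept_)
# Step 4: evaluate the model with the R^2 score on the training data
print("R^2 score:", model.score(X, y))
# Step 5: plot the data points and the fitted regression line
plt.scatter(X, y)
plt.plot(X, model.predict(X))
plt.xlabel("x")
plt.ylabel("y")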
plt.show()
OUTPUT:
LAB EXPERIMENT 5
ALGORITHM:
Step 1: Read the given data sample into X
Step 2: Train the dataset with K=5
Step 3: Find the optimal number of clusters (k) in the dataset using the Elbow method
Step 4: Train the dataset with K=3 (the optimal K value)
Step 5: Compare the results
Step 6: End
PROGRAM:
#Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn import datasets
#Read DataSet
df = datasets.load_iris()
x = df.data
y = df.target
print(x)
print(y)
#Lets try with k=5 initially
kmeans5 = KMeans(n_clusters=5)
y_kmeans5 = kmeans5.fit_predict(x)
print(y_kmeans5)
print(kmeans5.cluster_centers_)
# To find optimal number of clusters(k) in a dataset
Error = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i)
    kmeans.fit(x)
    Error.append(kmeans.inertia_)
plt.plot(range(1, 11), Error)
plt.title('Elbow method')
plt.xlabel('No of clusters')
plt.ylabel('Error')
plt.show()
#Now try with k=3 finally
kmeans3 = KMeans(n_clusters=3)
y_kmeans3 = kmeans3.fit_predict(x)
print(y_kmeans3)
print(kmeans3.cluster_centers_)
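One way to carry out the comparison in Step 5 is to cross-tabulate the k=3 cluster assignments against the true iris labels; this small check is an illustrative sketch, not part of the original listing.
# Compare the k=3 clusters with the true labels (illustrative check)
print(pd.crosstab(y, y_kmeans3, rownames=['true label'], colnames=['cluster']))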
OUTPUT:
LAB EXPERIMENT 6
OBJECTIVE: Implement Naive Bayes Theorem to Classify the English Text using python
The values 0, 1, 2, ... encode the frequency of a word that appeared in the initial text data. E.g. the first transformed row is [0 1 1 1 0 0 1 0 1] and the unique vocabulary is ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this'], thus this means that the words "document", "first", "is", "the" and "this" appeared 1 time each in the initial text string (i.e. 'This is the first document.').
In our example, we will convert the collection of text documents (train and test sets) into a matrix of token counts.
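For reference, the token-count row quoted above can be reproduced with scikit-learn's CountVectorizer (this snippet mirrors the standard scikit-learn example):
from sklearn.feature_extraction.text import CountVectorizer
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())  # ['and' 'document' 'first' 'is' 'one' 'second' 'the' 'third' 'this']
print(X.toarray()[0])  # [0 1 1 1 0 0 1 0 1]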
To implement that text transformation we will use the make_pipeline function. This will internally transform the text data, and then the model will be fitted using the transformed data.
Source Code
print("NAIVE BAYES ENGLISH TEST CLASSIFICATION")
import numpy as np, pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import confusion_matrix, accuracy_score
sns.set() # use seaborn plotting style
# Load the dataset
data = fetch_20newsgroups()
# Get the text categories
text_categories = data.target_names
# Define the training set
train_data = fetch_20newsgroups(subset="train", categories=text_categories)
# Define the test set
test_data = fetch_20newsgroups(subset="test", categories=text_categories)
print("We have {} unique classes".format(len(text_categories)))
print("We have {} training samples".format(len(train_data.data)))
print("We have {} test samples".format(len(test_data.data)))
# let's have a look at some training data, say the 5th sample only
#print(test_data.data[5])
# Build the model
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
# Train the model using the training data
model.fit(train_data.data, train_data.target)
# Predict the categories of the test data
predicted_categories = model.predict(test_data.data)
print(np.array(test_data.target_names)[predicted_categories])
# plot the confusion matrix
mat = confusion_matrix(test_data.target, predicted_categories)
sns.heatmap(mat.T, square=True, annot=True, fmt="d",
            xticklabels=train_data.target_names,
            yticklabels=train_data.target_names)
plt.xlabel("true label")
plt.ylabel("predicted label")
plt.show()
print("The accuracy is {}".format(accuracy_score(test_data.target,
predicted_categories)))
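As a quick sanity check, the fitted pipeline can also classify a new sentence; the helper below and its example input are illustrative additions, not part of the original listing.
def predict_category(s, train=train_data, model=model):
    # Predict the newsgroup category of a single string
    pred = model.predict([s])
    return train.target_names[pred[0]]

print(predict_category("sending a payload to the ISS"))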
OUTPUT:
LAB EXPERIMENT 7
OBJECTIVE: Implement an algorithm to demonstrate the significance of Genetic
Algorithm in python
ALGORITHM:
1. Individuals in the population compete for resources and mates.
2. Those individuals who are successful (fittest) then mate to create more offspring than others.
3. Genes from the "fittest" parents propagate throughout the generation; that is, sometimes parents create offspring which are better than either parent.
4. Thus each successive generation is better suited to its environment.
Given a target string, the goal is to produce the target string starting from a random string of the same length. In the following implementation, the following analogies are made:
Characters A-Z, a-z, 0-9 and other special symbols are considered as genes.
A string generated from these characters is considered as a chromosome/solution/individual.
The fitness score is the number of characters which differ from the characters in the target string at a particular index, so an individual having a lower fitness value is given more preference.
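For example, against the TARGET string "I love GeeksforGeeks", the candidate string "I lovf GeeksforGeekz" differs at 2 positions, so its fitness is 2; a fitness of 0 means the target has been reached.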
Source Code
# Python3 program to create target string, starting from
# random string using Genetic Algorithm
import random
# Number of individuals in each generation
POPULATION_SIZE = 100
# Valid genes
GENES = '''abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 1234567890, .-;:_!"#%&/()=?@${[]}'''
# Target string to be generated
TARGET = "I love GeeksforGeeks"
class Individual(object):
    ''' Class representing an individual in the population '''
    def __init__(self, chromosome):
        self.chromosome = chromosome
        self.fitness = self.cal_fitness()
    @classmethod
    def mutated_genes(cls):
        ''' Create a random gene for mutation '''
        global GENES
        gene = random.choice(GENES)
        return gene
    @classmethod
    def create_gnome(cls):
        ''' Create a chromosome, i.e. a string of genes '''
        global TARGET
        gnome_len = len(TARGET)
        return [cls.mutated_genes() for _ in range(gnome_len)]
    def mate(self, par2):
        ''' Perform mating and produce a new offspring '''
        # chromosome for offspring
        child_chromosome = []
        for gp1, gp2 in zip(self.chromosome, par2.chromosome):
            # random probability
            prob = random.random()
            # with probability 0.45, take the gene from parent 1
            if prob < 0.45:
                child_chromosome.append(gp1)
            # with probability 0.45, take the gene from parent 2
            elif prob < 0.90:
                child_chromosome.append(gp2)
            # otherwise insert a random gene (mutation), to maintain diversity
            else:
                child_chromosome.append(self.mutated_genes())
        # create a new Individual (offspring) using the generated chromosome
        return Individual(child_chromosome)
    def cal_fitness(self):
        ''' Calculate the fitness score: the number of
        characters in the string which differ from the target string. '''
        global TARGET
        fitness = 0
        for gs, gt in zip(self.chromosome, TARGET):
            if gs != gt:
                fitness += 1
        return fitness
# Driver code
def main():
    global POPULATION_SIZE
    # current generation
    generation = 1
    found = False
    # create the initial population of random individuals
    population = []
    for _ in range(POPULATION_SIZE):
        gnome = Individual.create_gnome()
        population.append(Individual(gnome))
    while not found:
        # sort the population in increasing order of fitness score
        population = sorted(population, key=lambda x: x.fitness)
        # if the individual having the lowest fitness score is 0,
        # we know that we have reached the target, so break the loop
        if population[0].fitness <= 0:
            found = True
            break
        # Otherwise generate new offspring for the new generation
        new_generation = []
        # Perform Elitism: 10% of the fittest population
        # goes to the next generation unchanged
        s = int((10*POPULATION_SIZE)/100)
        new_generation.extend(population[:s])
        # From the 50% fittest population, individuals
        # will mate to produce offspring
        s = int((90*POPULATION_SIZE)/100)
        for _ in range(s):
            parent1 = random.choice(population[:50])
            parent2 = random.choice(population[:50])
            child = parent1.mate(parent2)
            new_generation.append(child)
        population = new_generation
        print("Generation: {}\tString: {}\tFitness: {}".
              format(generation,
                     "".join(population[0].chromosome),
                     population[0].fitness))
        generation += 1
print("Generation: {}\tString: {}\tFitness: {}".\
format(generation,
"".join(population[0].chromosome),
population[0].fitness))
if name == ' main ':
main()
OUTPUT:
LAB EXPERIMENT 8
ALGORITHM:
Backpropagation is the most widely used algorithm for training artificial neural networks. In the simplest scenario, the architecture of a neural network consists of some sequential layers, where the layer numbered i is connected to the layer numbered i+1. The layers can be classified into 3 classes:
1. Input
2. Hidden
3. Output
Usually, each neuron in the hidden layer uses an activation function like sigmoid or rectified linear unit (ReLU). This helps to capture the non-linear relationship between the inputs and their outputs. The neurons in the output layer also use activation functions like sigmoid (for binary outputs) or SoftMax (for multi-class classification).
To train a neural network, there are 2 passes (phases):
Forward
Backward
The forward and backward phases are repeated for some epochs. In each epoch, the following occurs:
1. The inputs are propagated from the input to the output layer.
2. The network error is calculated.
3. The error is propagated from the output layer to the input layer.
Knowing that there's an error, what should we do? We should minimize it. To minimize the network error, we must change something in the network. Remember that the only parameters we can change are the weights and biases. We can try different weights and biases, and then test our network.
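Concretely, the backward pass below applies the chain rule: for the single-neuron network y = w1*x1 + w2*x2 with predicted = sigmoid(y), the gradient of the error with respect to each weight factorizes as dE/dw1 = (dE/dpredicted) * (dpredicted/dy) * (dy/dw1), which is exactly the product g1 * g2 * g3w1 computed in the code; each weight is then updated as w = w - learning_rate * grad.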
Source Code:
import numpy
import matplotlib.pyplot as plt
def sigmoid(sop):
    return 1.0/(1+numpy.exp(-1*sop))

def error(predicted, target):
    return numpy.power(predicted-target, 2)

def error_predicted_deriv(predicted, target):
    return 2*(predicted-target)

def sigmoid_sop_deriv(sop):
    return sigmoid(sop)*(1.0-sigmoid(sop))

def sop_w_deriv(x):
    return x

def update_w(w, grad, learning_rate):
    return w - learning_rate*grad
x1=0.1
x2=0.4
target = 0.7
learning_rate = 0.01
w1=numpy.random.rand()
w2=numpy.random.rand()
print("Initial W : ", w1, w2)
predicted_output = []
network_error = []
old_err = 0
for k in range(80000):
    # Forward Pass
    y = w1*x1 + w2*x2
    predicted = sigmoid(y)
    err = error(predicted, target)
    predicted_output.append(predicted)
    network_error.append(err)
    # Backward Pass
    g1 = error_predicted_deriv(predicted, target)
    g2 = sigmoid_sop_deriv(y)
    g3w1 = sop_w_deriv(x1)
    g3w2 = sop_w_deriv(x2)
    gradw1 = g3w1*g2*g1
    gradw2 = g3w2*g2*g1
    w1 = update_w(w1, gradw1, learning_rate)
    w2 = update_w(w2, gradw2, learning_rate)
    #print(predicted)
plt.figure()
plt.plot(network_error)
plt.title("Iteration Number vs Error")
plt.xlabel("Iteration Number")
plt.ylabel("Error")
plt.show()
plt.figure()
plt.plot(predicted_output)
plt.title("Iteration Number vs Prediction")
plt.xlabel("Iteration Number")
plt.ylabel("Prediction")
plt.show()
OUTPUT:
Initial W : 0.08698924153243281 0.4532713230157145
LAB EXPERIMENT 9
Training Database
Algorithm
1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x:
   For each attribute constraint a_i in h:
      If the constraint a_i is satisfied by x, then do nothing;
      Else replace a_i in h by the next more general constraint that is satisfied by x.
3. Output hypothesis h.
Hypothesis Construction
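As an illustration (assuming the classic EnjoySport data), if the first two positive instances are ('Sunny','Warm','Normal','Strong','Warm','Same') and ('Sunny','Warm','High','Strong','Warm','Same'), the hypothesis starts as the most specific ['0','0','0','0','0','0'], becomes the first instance itself, and then generalizes to ['Sunny','Warm','?','Strong','Warm','Same'], because the two instances differ only in the third attribute.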
Source Code:
import csv
a = []
with open('enjoysport.csv', 'r') as csvfile:
    for row in csv.reader(csvfile):
        a.append(row)
print(a)
print("\n The total number of training instances are : ", len(a))
num_attribute = len(a[0])-1
print("\n The initial hypothesis is : ")
hypothesis = ['0']*num_attribute
print(hypothesis)
for i in range(0, len(a)):
    if a[i][num_attribute] == 'TRUE':  # for each positive example only
        for j in range(0, num_attribute):
            if hypothesis[j] == '0' or hypothesis[j] == a[i][j]:
                hypothesis[j] = a[i][j]
            else:
                hypothesis[j] = '?'
    print("\n The hypothesis for the training instance {} is : \n".format(i+1), hypothesis)
print("\n The Maximally specific hypothesis for the training instance is ")
print(hypothesis)
OUTPUT:
LAB EXPERIMENT 10
Training Database
Algorithm
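1. Initialize the specific hypothesis S to the first training instance and the general hypothesis G to the most general hypothesis ('?' for every attribute).
2. For each positive training instance, generalize S just enough to cover it, and remove from G any hypothesis inconsistent with it.
3. For each negative training instance, specialize G just enough to exclude it while remaining consistent with S.
4. Output the final specific hypothesis S and the set of general hypotheses G.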
Source Code:
import csv
with open("enjoysport.csv") as f:
csv_file=csv.reader(f)
data=list(csv_file)
print(data)
print(" ")
s = data[1][:-1]  # first training instance without its label (data[0] is assumed to be the header row)
g=[['?' for i in range(len(s))] for j in range(len(s))]
print(s)
print(" ")
print(g)
print(" ")
for i in data:
    if i[-1] == "TRUE":  # For each positive training record or instance
        for j in range(len(s)):
            if i[j] != s[j]:
                s[j] = '?'
                g[j][j] = '?'
    elif i[-1] == "FALSE":  # For each negative training record or example
        for j in range(len(s)):
            if i[j] != s[j]:
                g[j][j] = s[j]
            else:
                g[j][j] = "?"
    print("\nSteps of Candidate Elimination Algorithm", data.index(i)+1)
    print(s)
    print(g)
gh = []
for i in g:
    for j in i:
        if j != '?':
            gh.append(i)
            break
print("\nFinal specific hypothesis:\n",s)
print("\nFinal general hypothesis:\n",gh)
OUTPUT: