AI & ML Lab Manual

This manual provides an overview of Computer Science, Artificial Intelligence, Machine Learning, and Deep Learning, highlighting their interconnections and applications. It includes a detailed explanation of the Traveling Salesman Problem (TSP) and two algorithms for solving it: the Nearest Neighbor Algorithm and a brute-force method using permutations. Additionally, it describes the A* Search algorithm, which is used for finding the shortest path in a graph, and includes example implementations in Python.

Computer Science (CS):

Computer Science is the study of algorithms, computation, information, and their implementations through various systems. It encompasses a wide range of topics including software development, programming languages, data structures, algorithms, computer architecture, artificial intelligence, and more. CS involves both theoretical understanding and practical application to design, develop, and analyze computing systems and software solutions.

Artificial Intelligence (AI):


Artificial Intelligence refers to the simulation of human intelligence in
machines that are programmed to think and perform tasks that typically
require human intelligence. It encompasses a broad spectrum of
techniques, including machine learning, natural language processing,
robotics, computer vision, and more. AI systems aim to replicate cognitive
functions such as learning, problem-solving, perception, and decision-
making.

Machine Learning (ML):


Machine Learning is a subset of artificial intelligence that focuses on
enabling machines to learn and improve from experience without being
explicitly programmed. ML algorithms use data to identify patterns, make
predictions, or optimize processes. It involves various types of learning,
including supervised learning (learning from labeled data), unsupervised
learning (finding patterns in unlabeled data), and reinforcement learning
(learning from feedback).

Deep Learning (DL):


Deep Learning is a specialized field of machine learning that uses neural
networks with multiple layers (deep neural networks) to extract high-level
features from data. DL models are capable of learning representations of
data in a hierarchical manner, enabling them to perform tasks such as
image and speech recognition, natural language processing, and other
complex pattern recognition tasks. Deep Learning has gained attention
due to its ability to handle large amounts of data and solve intricate
problems.
In summary, Computer Science provides the foundational knowledge and
tools for various fields, including AI, which focuses on creating intelligent
systems. Within AI, Machine Learning employs algorithms to enable
machines to learn from data, while Deep Learning, a subset of ML, utilizes
complex neural networks for pattern recognition and sophisticated tasks.

1. Write a program to solve the Traveling Salesman Problem (TSP).
Solution:
The distance function calculates the Euclidean distance between two points in a two-dimensional space. In simple terms, it measures the straight-line distance between two points in a plane. The formula is derived from the Pythagorean theorem: it calculates the length of the straight line (hypotenuse) between the two points when considering them as the vertices of a right-angled triangle, with the horizontal and vertical distances as the other two sides.

Here's a breakdown of the formula: for points (x1, y1) and (x2, y2), the horizontal separation (x2 - x1) and the vertical separation (y2 - y1) are each squared and the squares are summed. Taking the square root of this sum gives the actual Euclidean distance between the points:

d = sqrt((x2 - x1)^2 + (y2 - y1)^2)

This distance calculation is often used in various applications, including geometry, computer science, and machine learning (especially in clustering algorithms), to measure the proximity or separation between points in a two-dimensional space.
Example:
(0, 0),
(1, 2),
(2, 4),
(3, 1),
(5, 3)
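
As a quick check of the formula (a minimal sketch; the full program below defines the same distance function):

import math

# Distance between the first two example cities, (0, 0) and (1, 2):
# sqrt((1 - 0)**2 + (2 - 0)**2) = sqrt(5) ≈ 2.236
print(math.sqrt((1 - 0)**2 + (2 - 0)**2))  # 2.23606797749979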
Explanation of the Nearest Neighbour Algorithm:

Initialization:
Choose a starting city arbitrarily. In most cases, the starting city is selected randomly or as the first city in the list of cities.

Iterative Process:
From the current city, find the nearest unvisited city.
Mark that city as visited and add it to the tour.
Update the current city to be the newly visited city.
Repeat this process until all cities are visited.

Completion:
Once all cities are visited, return to the starting city to complete the tour.

Step-by-Step Execution:

Starting City Selection:
The algorithm begins at a chosen starting city (in the provided code, it starts from city 0).

City Selection Process:
From the current city, the algorithm identifies the nearest unvisited city by calculating the distances between the current city and all unvisited cities.
It selects the city with the shortest distance as the next city to visit and adds it to the tour.
This process continues iteratively until all cities are visited.

Returning to the Starting City:
After visiting all cities, the algorithm returns to the starting city, completing the tour.
import math
import matplotlib.pyplot as plt

# Function to calculate the Euclidean distance between two points
def distance(point1, point2):
    return math.sqrt((point1[0] - point2[0])**2 + (point1[1] - point2[1])**2)

# Nearest Neighbor algorithm for solving TSP
def nearest_neighbor(cities):
    num_cities = len(cities)
    visited = [False] * num_cities
    path = [0]  # Start from city 0
    visited[0] = True
    total_distance = 0
    distances = []

    for _ in range(num_cities - 1):
        current_city = path[-1]
        min_dist = float('inf')
        nearest_city = None

        # Find the nearest unvisited city
        for j in range(num_cities):
            if not visited[j] and j != current_city:
                dist = distance(cities[current_city], cities[j])
                if dist < min_dist:
                    min_dist = dist
                    nearest_city = j

        distances.append(min_dist)
        total_distance += min_dist
        path.append(nearest_city)
        visited[nearest_city] = True

    # Return to the starting city
    path.append(0)
    total_distance += distance(cities[path[-2]], cities[0])
    distances.append(distance(cities[path[-2]], cities[0]))
    return path, total_distance, distances

# Example cities represented by their coordinates (x, y)
cities = [
    (0, 0),
    (1, 2),
    (2, 4),
    (3, 1),
    (5, 3)
]

# Solve TSP using the nearest neighbor algorithm
optimal_path, total_distance, distances = nearest_neighbor(cities)

# Extract x and y coordinates for plotting
x_coords = [city[0] for city in cities]
y_coords = [city[1] for city in cities]

# Plot the cities
plt.figure(figsize=(8, 6))
plt.scatter(x_coords, y_coords, c='red', label='Cities')
plt.plot(x_coords[0], y_coords[0], 'go', label='Start/End')  # Mark the starting city

# Plot the tour
for i in range(len(optimal_path) - 1):
    city1 = cities[optimal_path[i]]
    city2 = cities[optimal_path[i + 1]]
    plt.plot([city1[0], city2[0]], [city1[1], city2[1]], 'b--')

# Annotate the cities with their indices
for i, (x, y) in enumerate(cities):
    plt.text(x, y, f'{i}', ha='center', va='center')

# Show the distance of each leg at its midpoint
for i, dist in enumerate(distances):
    city1 = cities[optimal_path[i]]
    city2 = cities[optimal_path[i + 1]]
    mid_x = (city1[0] + city2[0]) / 2
    mid_y = (city1[1] + city2[1]) / 2
    plt.text(mid_x, mid_y, f'{dist:.2f}', color='green', ha='center', va='center')

plt.title('Traveling Salesman Problem - Nearest Neighbor Algorithm')
plt.xlabel('X-coordinate')
plt.ylabel('Y-coordinate')
plt.legend()
plt.grid(visible=True)
plt.show()

print("Optimal Path:", optimal_path)
print("Total Distance:", total_distance)
print("Distances between cities:", distances)

Output:

Optimal Path: [0, 1, 2, 3, 4, 0]
Total Distance: 16.29379263475945
Distances between cities: [2.23606797749979, 2.23606797749979, 3.1622776601683795, 2.8284271247461903, 5.830951894845301]
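
As a quick sanity check: the five legs of the tour [0, 1, 2, 3, 4, 0] have lengths sqrt(5), sqrt(5), sqrt(10), sqrt(8) and sqrt(34), i.e. roughly 2.236 + 2.236 + 3.162 + 2.828 + 5.831 ≈ 16.294, matching the reported total. Note that nearest neighbour is a greedy heuristic: despite the variable name optimal_path, the tour it returns is not guaranteed to be optimal.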
Example-2:
import matplotlib.pyplot as plt
from itertools import permutations

V = 4

# Function to find all possible paths and the optimal path
def find_paths(graph, s):
    # (s is kept for the call signature; all permutations are enumerated)
    min_path = float('inf')
    optimal_path = []
    all_paths = []

    # Generate all permutations of the cities
    perm = permutations(range(V))

    for p in perm:
        current_pathweight = 0
        for i in range(V - 1):
            current_pathweight += graph[p[i]][p[i + 1]]
        current_pathweight += graph[p[-1]][p[0]]  # Add the return to the start city

        all_paths.append((list(p), current_pathweight))

        if current_pathweight < min_path:
            min_path = current_pathweight
            optimal_path = list(p) + [p[0]]

    return min_path, optimal_path, all_paths

# Function to plot the graph and paths
def plot_paths(graph, paths, optimal_path):
    x_coords = [0, 10, 15, 20]
    y_coords = [10, 0, 35, 25]

    plt.figure(figsize=(8, 6))
    plt.scatter(x_coords, y_coords, c='red', label='Cities')
    plt.plot(x_coords[0], y_coords[0], 'go', label='Start/End')  # Mark the starting city

    # Draw every tour in black, then the optimal tour (including the
    # closing edge back to the start) in green on top
    for path, weight in paths:
        for i in range(V - 1):
            plt.plot([x_coords[path[i]], x_coords[path[i + 1]]],
                     [y_coords[path[i]], y_coords[path[i + 1]]], 'k--')
    for i in range(len(optimal_path) - 1):
        plt.plot([x_coords[optimal_path[i]], x_coords[optimal_path[i + 1]]],
                 [y_coords[optimal_path[i]], y_coords[optimal_path[i + 1]]], 'g--')

    plt.title('Traveling Salesman Problem - All Paths')
    plt.xlabel('X-coordinate')
    plt.ylabel('Y-coordinate')
    plt.legend()
    plt.grid(visible=True)
    plt.show()

# Driver code
if __name__ == "__main__":
    graph = [
        [0, 10, 15, 20],
        [10, 0, 35, 25],
        [15, 35, 0, 30],
        [20, 25, 30, 0]
    ]
    start_city = 0

    min_dist, optimal_path, all_paths = find_paths(graph, start_city)

    print("\nOptimal Path:", optimal_path)
    print("Total Distance of Optimal Path:", min_dist)
    print("\nAll Paths and their Distances:")
    for path, dist in all_paths:
        print(f"Path: {path}, Distance: {dist}")

    plot_paths(graph, all_paths, optimal_path)

OUTPUT:
Optimal Path: [0, 1, 3, 2, 0]
Total Distance of Optimal Path: 80

All Paths and their Distances:
Path: [0, 1, 2, 3], Distance: 95
Path: [0, 1, 3, 2], Distance: 80
Path: [0, 2, 1, 3], Distance: 95
Path: [0, 2, 3, 1], Distance: 80
Path: [0, 3, 1, 2], Distance: 95
Path: [0, 3, 2, 1], Distance: 95
Path: [1, 0, 2, 3], Distance: 80
Path: [1, 0, 3, 2], Distance: 95
Path: [1, 2, 0, 3], Distance: 95
Path: [1, 2, 3, 0], Distance: 95
Path: [1, 3, 0, 2], Distance: 95
Path: [1, 3, 2, 0], Distance: 80
Path: [2, 0, 1, 3], Distance: 80
Path: [2, 0, 3, 1], Distance: 95
Path: [2, 1, 0, 3], Distance: 95
Path: [2, 1, 3, 0], Distance: 95
Path: [2, 3, 0, 1], Distance: 95
Path: [2, 3, 1, 0], Distance: 80
Path: [3, 0, 1, 2], Distance: 95
Path: [3, 0, 2, 1], Distance: 95
Path: [3, 1, 0, 2], Distance: 80
Path: [3, 1, 2, 0], Distance: 95
Path: [3, 2, 0, 1], Distance: 80
Path: [3, 2, 1, 0], Distance: 95
2. Implement A* Search algorithm.
Solution:

The A* (A-star) search algorithm is a widely used graph traversal and pathfinding algorithm that efficiently finds the shortest path between nodes in a graph, considering both the actual cost from the start node and a heuristic estimate of the remaining cost to the goal node. It is widely applied in many fields, especially robotics, gaming, and navigation systems. A* combines the features of Dijkstra's algorithm (which guarantees the shortest path) and greedy best-first search (which is faster but does not guarantee optimality): it always expands the node with the lowest value of f(n) = g(n) + h(n), where g(n) is the cost accumulated from the start and h(n) is the heuristic estimate to the goal.

As an illustration, consider a graph (figure not reproduced here) in which the numbers written on the edges represent distances between nodes and the numbers written on the nodes represent heuristic values; we want the most cost-effective path from start state A to final state G. Since A is the starting node, g(A) = 0, and from the graph its heuristic value is 11, therefore f(A) = g(A) + h(A) = 0 + 11 = 11.
import heapq
import matplotlib.pyplot as plt

class Node:
    def __init__(self, state, parent=None, cost=0, heuristic=0):
        self.state = state
        self.parent = parent
        self.cost = cost
        self.heuristic = heuristic

    def __lt__(self, other):
        # Order nodes by f = g + h so heapq pops the most promising first
        return (self.cost + self.heuristic) < (other.cost + other.heuristic)

def a_star_search(start, goal, neighbors, heuristic):
    open_set = []
    closed_set = set()

    start_node = Node(start, None, 0, heuristic(start, goal))
    heapq.heappush(open_set, start_node)

    while open_set:
        current_node = heapq.heappop(open_set)

        if current_node.state == goal:
            # Reconstruct the path by walking back through the parents
            path = []
            while current_node:
                path.append(current_node.state)
                current_node = current_node.parent
            return path[::-1]

        closed_set.add(current_node.state)

        for neighbor in neighbors(current_node.state):
            if neighbor in closed_set:
                continue

            neighbor_node = Node(neighbor, current_node,
                                 current_node.cost + 1, heuristic(neighbor, goal))

            # Push only if no equally cheap (or cheaper) entry for this
            # state is already waiting in the open set
            for node in open_set:
                if node.state == neighbor and node.cost <= neighbor_node.cost:
                    break
            else:
                heapq.heappush(open_set, neighbor_node)

    return None

# Example heuristic function (Manhattan distance for grid-like environments)
def manhattan_distance(state, goal):
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])

# Example function to get the neighbors of a node in a 4-connected grid
def get_neighbors(state):
    x, y = state
    neighbors = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    # Keep only cells inside the example 10x10 grid
    return [(nx, ny) for nx, ny in neighbors if 0 <= nx < 10 and 0 <= ny < 10]

# Function to visualize the grid and the path
def visualize_grid(start, goal, path):
    plt.figure(figsize=(8, 8))
    plt.title('A* Search Visualization')
    plt.xlabel('X-coordinate')
    plt.ylabel('Y-coordinate')

    # Plot the grid cells (example 10x10 grid)
    for i in range(10):
        for j in range(10):
            plt.plot(i, j, 'o', color='lightgray')

    # Plot the path nodes
    for node in path:
        plt.plot(node[0], node[1], 'go', markersize=8)

    plt.plot(start[0], start[1], 'bs', label='Start')  # Plot the start node
    plt.plot(goal[0], goal[1], 'rs', label='Goal')     # Plot the goal node

    plt.legend()
    plt.grid(visible=True)
    plt.gca().invert_yaxis()  # Invert the y-axis to match the grid representation
    plt.show()

# Example usage:
start_state = (0, 0)
goal_state = (9, 9)
path = a_star_search(start_state, goal_state, get_neighbors, manhattan_distance)

if path:
    print("Path found:", path)
    visualize_grid(start_state, goal_state, path)
else:
    print("No path found.")
Output:
Path found: [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3),
(3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (4, 9), (5, 9), (6, 9),
(7, 9), (8, 9), (9, 9)]
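
A quick check on this output: the path contains 19 nodes, i.e. 18 unit moves, which equals the Manhattan distance |9 - 0| + |9 - 0| = 18 from start to goal, so A* has found an optimal path on this obstacle-free grid.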

3. AO* algorithm – Artificial Intelligence

The goal of buying a car can be broken down into smaller problems or tasks that can be accomplished to achieve the main goal; this decomposition is an example of a simple AND-OR graph (figure not reproduced here). One alternative is to steal a car, which on its own accomplishes the main goal; the other is to use your own money to purchase a car. The AND arc in the graph indicates that all of the subproblems it connects must be resolved before the parent node can be considered finished.
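
A minimal sketch of how such an AND-OR decomposition can be written down in code (the node names here are illustrative, not from the manual; the AO* program below uses the same dictionary structure):

# Hypothetical encoding of the car-buying AND-OR graph described above.
# 'OR' lists alternative subgoals; 'AND' lists subgoals that must all
# be achieved together.
conditions = {
    'GetCar': {'OR': ['StealCar'], 'AND': ['EarnMoney', 'PurchaseCar']}
}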
In the AO* algorithm, a knowledge-based search strategy, the start state and the target state are known in advance, and the best path is identified using heuristics. This informed search technique considerably reduces the algorithm's time complexity, and the AO* algorithm is far more effective at searching AND-OR trees than the A* algorithm.
Difference between the A* algorithm and the AO* algorithm
 Both the A* algorithm and the AO* algorithm are based on best-first search.
 Both are informed searches and work on given heuristic values.
 A* always gives an optimal solution, but AO* does not guarantee an optimal solution.
 Once AO* finds a solution, it does not explore all possible paths, whereas A* explores all paths.
 When compared to the A* algorithm, the AO* algorithm uses less memory.
 Unlike the A* algorithm, the AO* algorithm cannot go into an endless loop.
Example:

In this example, the number written below each node is its heuristic value h(n), and every edge length is taken as 1. (The step-by-step figures for Steps 1-3 are not reproduced here.)

Code
# Cost to find the AND and OR path
def Cost(H, condition, weight=1):
    cost = {}
    if 'AND' in condition:
        AND_nodes = condition['AND']
        Path_A = ' AND '.join(AND_nodes)
        PathA = sum(H[node] + weight for node in AND_nodes)
        cost[Path_A] = PathA

    if 'OR' in condition:
        OR_nodes = condition['OR']
        Path_B = ' OR '.join(OR_nodes)
        PathB = min(H[node] + weight for node in OR_nodes)
        cost[Path_B] = PathB
    return cost

# Update the cost of each node, bottom-up
def update_cost(H, Conditions, weight=1):
    Main_nodes = list(Conditions.keys())
    Main_nodes.reverse()
    least_cost = {}
    for key in Main_nodes:
        condition = Conditions[key]
        print(key, ':', Conditions[key], '>>>', Cost(H, condition, weight))
        c = Cost(H, condition, weight)
        H[key] = min(c.values())
        least_cost[key] = Cost(H, condition, weight)
    return least_cost

# Print the shortest path
def shortest_path(Start, Updated_cost, H):
    Path = Start
    if Start in Updated_cost.keys():
        Min_cost = min(Updated_cost[Start].values())
        key = list(Updated_cost[Start].keys())
        values = list(Updated_cost[Start].values())
        Index = values.index(Min_cost)

        # Find the key of the minimum-cost path
        Next = key[Index].split()
        # Add to the path for an OR branch (single successor)
        if len(Next) == 1:
            Start = Next[0]
            Path += '<--' + shortest_path(Start, Updated_cost, H)
        # Add to the path for an AND branch (both successors)
        else:
            Path += '<--(' + key[Index] + ') '

            Start = Next[0]
            Path += '[' + shortest_path(Start, Updated_cost, H) + ' + '

            Start = Next[-1]
            Path += shortest_path(Start, Updated_cost, H) + ']'

    return Path

H = {'A': -1, 'B': 5, 'C': 2, 'D': 4, 'E': 7, 'F': 9, 'G': 3, 'H': 0, 'I': 0, 'J': 0}

Conditions = {
    'A': {'OR': ['B'], 'AND': ['C', 'D']},
    'B': {'OR': ['E', 'F']},
    'C': {'OR': ['G'], 'AND': ['H', 'I']},
    'D': {'OR': ['J']}
}
# weight
weight = 1
# Updated cost
print('Updated Cost :')
Updated_cost = update_cost(H, Conditions, weight=1)
print('*' * 75)
print('Shortest Path :\n', shortest_path('A', Updated_cost, H))

Output:
4. Write a program to demonstrate the working of the decision tree-based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a
new sample.
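
At every node, ID3 splits on the attribute with the highest information gain. For a set S containing p positive and n negative examples, Entropy(S) = -(p/(p+n))·log2(p/(p+n)) - (n/(p+n))·log2(n/(p+n)), and the gain of an attribute A is Gain(S, A) = Entropy(S) - Σv (|Sv|/|S|)·Entropy(Sv), where Sv is the subset of S with A = v. The _get_entropy and _get_gain methods in the program below compute exactly these two quantities.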

import math
import pandas as pd
from operator import itemgetter

class DecisionTree:
    def __init__(self, df, target, positive, parent_val, parent):
        self.data = df
        self.target = target
        self.positive = positive
        self.parent_val = parent_val
        self.parent = parent
        self.childs = []
        self.decision = ''

    def _get_entropy(self, data):
        # Entropy of the positive/negative split of this subset
        p = sum(data[self.target] == self.positive)
        n = data.shape[0] - p
        p_ratio = p / (p + n)
        n_ratio = 1 - p_ratio
        entropy_p = -p_ratio * math.log2(p_ratio) if p_ratio != 0 else 0
        entropy_n = -n_ratio * math.log2(n_ratio) if n_ratio != 0 else 0
        return entropy_p + entropy_n

    def _get_gain(self, feat):
        # Information gain = entropy of this node minus the weighted
        # average entropy of the subsets produced by splitting on feat
        avg_info = 0
        for val in self.data[feat].unique():
            avg_info += (self._get_entropy(self.data[self.data[feat] == val])
                         * sum(self.data[feat] == val) / self.data.shape[0])
        return self._get_entropy(self.data) - avg_info

    def _get_splitter(self):
        self.splitter = max(self.gains, key=itemgetter(1))[0]

    def update_nodes(self):
        self.features = [col for col in self.data.columns if col != self.target]
        self.entropy = self._get_entropy(self.data)
        if self.entropy != 0:
            self.gains = [(feat, self._get_gain(feat)) for feat in self.features]
            self._get_splitter()
            residual_columns = [k for k in self.data.columns if k != self.splitter]
            for val in self.data[self.splitter].unique():
                df_tmp = self.data[self.data[self.splitter] == val][residual_columns]
                tmp_node = DecisionTree(df_tmp, self.target, self.positive,
                                        val, self.splitter)
                tmp_node.update_nodes()
                self.childs.append(tmp_node)

def print_tree(n):
    for child in n.childs:
        if child:
            print(child.__dict__.get('parent', ''))
            print(child.__dict__.get('parent_val', ''), '\n')
            print_tree(child)

df = pd.read_csv('id3.csv')
dt = DecisionTree(df, 'Play', 'Yes', '', '')
dt.update_nodes()
print_tree(dt)
6. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
Solution:
Artificial Neural Network (ANN) using the backpropagation algorithm:
An Artificial Neural Network (ANN) is a computational model inspired by the way biological neural networks in the human brain operate. It consists of interconnected nodes, called neurons, organized in layers. In a typical ANN, there are three types of layers:

Input Layer: This layer receives input signals and passes them on to the next layer. The number of neurons in the input layer corresponds to the number of input features in your dataset.

Hidden Layers: These layers process the input data through weighted connections from the previous layer, applying an activation function to the weighted sum of inputs. Hidden layers allow neural networks to learn complex patterns in the data.

Output Layer: This layer produces the final output of the neural network. The number of neurons in the output layer depends on the nature of the problem you are trying to solve (e.g., regression, classification).

The backpropagation algorithm is a method used to train neural networks. It works by iteratively adjusting the weights of the connections in the network to minimize the difference between the predicted output and the actual output. This is done by computing the gradient of the loss function with respect to the weights using the chain rule of calculus and updating the weights in the direction that reduces the loss. For the sigmoid activation used below, the derivative takes the convenient form sigmoid'(s) = s·(1 - s) when s is the sigmoid output, so the output-layer delta is simply (y - o)·o·(1 - o).

import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)  # X = (hours sleeping, hours studying)
y = np.array(([92], [86], [89]), dtype=float)        # y = score on test

# Scale units
X = X / np.amax(X, axis=0)  # divide by the maximum of each X column
y = y / 100                 # max test score is 100

class Neural_Network(object):
    def __init__(self):
        # Parameters
        self.inputSize = 2
        self.outputSize = 1
        self.hiddenSize = 3
        # Weights
        self.W1 = np.random.randn(self.inputSize, self.hiddenSize)   # (2x3) weight matrix from input to hidden layer
        self.W2 = np.random.randn(self.hiddenSize, self.outputSize)  # (3x1) weight matrix from hidden to output layer

    def forward(self, X):
        # Forward propagation through the network
        self.z = np.dot(X, self.W1)         # dot product of X (input) and the first set of weights
        self.z2 = self.sigmoid(self.z)      # activation function
        self.z3 = np.dot(self.z2, self.W2)  # dot product of the hidden layer (z2) and the second set of weights
        o = self.sigmoid(self.z3)           # final activation function
        return o

    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))  # activation function

    def sigmoidPrime(self, s):
        return s * (1 - s)  # derivative of sigmoid (s is the sigmoid output)

    def backward(self, X, y, o):
        # Backward propagation through the network
        self.o_error = y - o  # error in output
        self.o_delta = self.o_error * self.sigmoidPrime(o)  # apply derivative of sigmoid to the error
        self.z2_error = self.o_delta.dot(self.W2.T)  # z2 error: how much the hidden layer weights contributed to the output error
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.z2)  # apply derivative of sigmoid to the z2 error
        self.W1 += X.T.dot(self.z2_delta)       # adjust the first set (input --> hidden) of weights
        self.W2 += self.z2.T.dot(self.o_delta)  # adjust the second set (hidden --> output) of weights

    def train(self, X, y):
        o = self.forward(X)
        self.backward(X, y, o)

NN = Neural_Network()
for i in range(1000):  # train the NN 1,000 times
    print("\nInput: \n" + str(X))
    print("\nActual Output: \n" + str(y))
    print("\nPredicted Output: \n" + str(NN.forward(X)))
    print("\nLoss: \n" + str(np.mean(np.square(y - NN.forward(X)))))  # mean squared loss
    NN.train(X, y)

Output:
7.Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.
# Import the necessary libraries
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB

# Load data from CSV
data = pd.read_csv('tennisdata.csv')
print("The first 5 values of data is :\n", data.head())

# Separate the feature columns (X) from the target column
X = data.iloc[:, :-1]
print("\nThe first 5 values of train data is\n", X.head())

The First 5 values of train data is
Outlook Temperature Humidity Windy
0 Sunny Hot High False
1 Sunny Hot High True
2 Overcast Hot High False
3 Rainy Mild High False
4 Rainy Cool Normal False
y = data.iloc[:,-1]
print("\nThe first 5 values of Train output is\n",y.head())

The first 5 values of Train output is
0 No
1 No
2 Yes
3 Yes
4 Yes
Name: PlayTennis, dtype: object
# Convert the categorical columns into numbers
le_outlook = LabelEncoder()
X.Outlook = le_outlook.fit_transform(X.Outlook)

le_Temperature = LabelEncoder()
X.Temperature = le_Temperature.fit_transform(X.Temperature)

le_Humidity = LabelEncoder()
X.Humidity = le_Humidity.fit_transform(X.Humidity)

le_Windy = LabelEncoder()
X.Windy = le_Windy.fit_transform(X.Windy)

print("\nNow the Train data is :\n",X.head())

Now the Train data is :
Outlook Temperature Humidity Windy
0 2 1 0 0
1 2 1 0 1
2 0 1 0 0
3 1 2 0 0
4 1 0 1 0
le_PlayTennis = LabelEncoder()
y = le_PlayTennis.fit_transform(y)
print("\nNow the Train output is\n",y)

Now the Train output is
[0 0 1 1 1 0 1 0 1 1 1 1 1 0]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.20)

classifier = GaussianNB()
classifier.fit(X_train,y_train)

from sklearn.metrics import accuracy_score
print("Accuracy is:", accuracy_score(classifier.predict(X_test), y_test))
Accuracy is: 0.6666666666666666
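
To classify a new sample, the fitted encoders and classifier can be reused. Below is a minimal sketch; the encoded values are an assumption based on the mappings printed above (Sunny → 2, Cool → 0, High → 0, True → 1):

# Hypothetical new day, written directly in encoded form:
# Outlook=Sunny(2), Temperature=Cool(0), Humidity=High(0), Windy=True(1)
new_sample = [[2, 0, 0, 1]]
prediction = classifier.predict(new_sample)
print("Predicted class:", le_PlayTennis.inverse_transform(prediction)[0])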

8. Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.

from sklearn.cluster import KMeans
from sklearn import preprocessing
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris
import sklearn.metrics as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset = load_iris()
# print(dataset)

X = pd.DataFrame(dataset.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(dataset.target)
y.columns = ['Targets']
# print(X)

plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])

# REAL PLOT
plt.subplot(1, 3, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real')

# K-MEANS PLOT
plt.subplot(1, 3, 2)
model = KMeans(n_clusters=3)
model.fit(X)
predY = np.choose(model.labels_, [0, 1, 2]).astype(np.int64)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[predY], s=40)
plt.title('KMeans')

# GMM (EM) PLOT: standardize the features, then fit a Gaussian mixture
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns=X.columns)
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)

y_cluster_gmm = gmm.predict(xs)
plt.subplot(1, 3, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title('GMM Classification')
plt.show()
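
To compare the two clusterings quantitatively, something like the following can be appended (a sketch using the sm metrics module already imported; since cluster labels are arbitrary permutations of 0/1/2, a permutation-invariant score such as the adjusted Rand index is used instead of raw accuracy):

# Compare each clustering against the true species labels
print("K-Means ARI:", sm.adjusted_rand_score(y.Targets, model.labels_))
print("GMM ARI    :", sm.adjusted_rand_score(y.Targets, y_cluster_gmm))

On the iris data, the GMM clustering typically matches the true species more closely than k-means, which is the comment on clustering quality that the exercise asks for.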
Output:
9. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris
data set. Print both correct and wrong predictions. Java/Python ML library classes can be
used for this problem.
Answer:
The Iris dataset is a classic dataset in machine learning and
statistics. It was introduced by the British biologist and statistician
Ronald Fisher in his 1936 paper "The use of multiple measurements
in taxonomic problems" as an example of linear discriminant
analysis. The dataset consists of 150 samples of iris flowers from
three different species: Setosa, Versicolor, and Virginica. For each
sample, four features are measured: the length and width of the
sepals and petals, in centimeters.

The k-Nearest Neighbors (k-NN) algorithm is a simple and intuitive classification algorithm. It works by storing all available cases and classifying new cases based on a similarity measure (e.g., distance functions). When a new case is to be classified, the algorithm finds the k nearest neighbors in the training dataset and assigns the most common class among those neighbors to the new case.
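
Conceptually the decision rule is only a few lines. Here is a minimal, library-free sketch (knn_predict, train and query are illustrative names; train holds (features, label) pairs):

from collections import Counter
import math

def knn_predict(train, query, k=3):
    # Sort the training pairs by Euclidean distance to the query point
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Majority vote among the labels of the k nearest neighbours
    return Counter(label for _, label in nearest[:k]).most_common(1)[0][0]

# Example: a query near the second group of points gets that group's label
data = [((0.0, 0.0), 'a'), ((0.1, 0.2), 'a'), ((5.0, 5.0), 'b'), ((5.1, 4.9), 'b')]
print(knn_predict(data, (4.8, 5.2)))  # prints 'b'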

In the context of the Iris dataset, the k-NN algorithm can be used to
classify iris flowers into one of the three species based on their
sepal and petal measurements. The algorithm calculates the
distances between the new flower and all other flowers in the
dataset, then selects the k nearest neighbors and assigns the class
label based on the majority class among those neighbors.

The performance of the k-NN algorithm can be evaluated using various metrics such as accuracy, precision, recall, and F1-score, which provide insights into how well the algorithm is classifying the flowers. These metrics help us understand the strengths and weaknesses of the algorithm and can be used to tune its parameters for better performance.
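
For reference, these metrics are computed from the per-class counts of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN): accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The 'weighted' averaging used in the code below averages the per-class scores weighted by class frequency.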

Code:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import matplotlib.pyplot as plt
import numpy as np
from tabulate import tabulate

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the k-NN classifier
knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier
knn.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn.predict(X_test)

# Calculate performance metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

# Print the performance metrics
print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1-score: {f1}")

# Plot the performance metrics
labels = ['Accuracy', 'Precision', 'Recall', 'F1-score']
values = [accuracy, precision, recall, f1]

plt.figure(figsize=(10, 5))
plt.bar(labels, values, color=['blue', 'green', 'orange', 'red'])
plt.ylabel('Score')
plt.title('Performance Metrics of k-NN on Iris Dataset')
plt.ylim(0, 1)
plt.show()

# Create a table of actual vs predicted flower names
# (wrong predictions are the rows where the two columns differ)
table_data = []
for i in range(len(y_test)):
    table_data.append([target_names[y_test[i]], target_names[y_pred[i]]])

print("\nActual vs Predicted:")
print(tabulate(table_data, headers=['Actual', 'Predicted'], tablefmt='grid'))

Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1-score: 1.0
10. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate dataset for your experiment and draw graphs.
Answer:
To implement the Locally Weighted Regression (LWR) algorithm, we'll use Python. We also need a dataset to fit; the example below generates a noisy sine curve (100 points of sin(x) plus Gaussian noise), fits it with the LOWESS variant of LWR, and plots the result.
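
The key idea is that each training point x_i receives a weight that falls off with its distance from the query point x. The code below uses the tricube kernel, w = (1 - |d|**3)**3 for |d| <= 1 and w = 0 otherwise, where d = (x - x_i)/h and the bandwidth h is the distance from x to its r-th nearest neighbour, with r = ceil(f·n). A small weighted least-squares problem is then solved at every point to fit a local line, and over the iterations a robustness weight delta, computed from the residuals, down-weights outliers.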
from math import ceil
import math
import numpy as np
from scipy import linalg
import matplotlib.pyplot as plt

def lowess(x, y, f, iterations):
    n = len(x)
    r = int(ceil(f * n))
    # Bandwidth for each point: distance to its r-th nearest neighbour
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]
    # Tricube weights
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3
    yest = np.zeros(n)
    delta = np.ones(n)
    for iteration in range(iterations):
        for i in range(n):
            weights = delta * w[:, i]
            # Solve the 2x2 weighted least-squares system for a local line
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = linalg.solve(A, b)
            yest[i] = beta[0] + beta[1] * x[i]

        # Robustness step: down-weight points with large residuals
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2

    return yest

# Generate a noisy sine curve and fit it
n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)
f = 0.25
iterations = 3
yest = lowess(x, y, f, iterations)

plt.plot(x, y, "r.")
plt.plot(x, yest, "b-")
plt.show()

Output:
