CS3491 AI & ML Lab Manual
SEMESTER IV
LAB MANUAL
DHIRAJLAL GANDHI COLLEGE OF
TECHNOLOGY
Salem Airport (Opp.), Salem – 636 309 Ph.
(04290) 233333, www.dgct.ac.in
BONAFIDE CERTIFICATE
Name : …………………………………………………………
Degree : …………………………………………………………
Branch: …………………………………………………………
Certified that this is the bonafide record of the work done by the above student in
…………………………………………………………………………………………………………………….
Laboratory during the academic year …………………………………
▪ Students must be present in proper dress code and wear the ID card.
▪ Students should enter the log-in and log-out time in the log
register without fail.
▪ Students are not allowed to download pictures, music,
videos or files without the permission of respective lab in-
charge.
▪ Students should wear their own lab coats and bring observation
note books to the laboratory classes regularly.
▪ Record of experiments done in a particular class should be
submitted in the next lab class.
▪ Students who do not submit the record note book in time will not
be allowed to do the next experiment and will not be given
attendance for that laboratory class.
▪ Students will not be allowed to leave the laboratory until they
complete the experiment.
▪ Students are advised to switch-off the Monitors and CPU when they
leave the lab.
▪ Students are advised to arrange the chairs properly when they leave
the lab.
DHIRAJLAL GANDHI COLLEGE OF TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
College
Vision
To improve the quality of human life through multi-disciplinary programs in
Engineering, Architecture and Management that are internationally recognized
and that facilitate research work incorporating social, economic and
environmental development.
Mission
To create a vibrant atmosphere that produces competent engineers, innovators,
scientists, entrepreneurs, academicians and thinkers of tomorrow.
To establish centers of excellence that provide sustainable solutions to industry
and society.
To enhance capability through various value-added programs so as to meet the
challenges of dynamically changing global needs.
Department
Vision
To cultivate creative, globally competent, employable and disciplined computing
professionals with the spirit of benchmarking educational system that promotes
academic excellence, scientific pursuits, entrepreneurship and professionalism.
Mission
● To develop the creators of tomorrow’s technology to meet the social needs of
our nation.
● To promote and encourage the strength of research in Engineering, Science
and Technology.
● To bridge the gap between Academia, Industry and Society.
PEO1: The graduates of the program will constantly learn and update their
knowledge in the emerging fields of technology.
Program Outcomes (POs)
PO1: To apply knowledge of mathematics, science, engineering fundamentals and
computer science theory to solve complex problems in Computer Science and
Engineering.
PO2: To analyze problems, and to identify and define solutions using basic
principles of mathematics, science, technology and computer engineering.
PO3: To design, implement, and evaluate computer-based systems, processes,
components, or software that meet realistic constraints for public health and
safety, and cultural, societal and environmental considerations.
PO4: To design and conduct experiments, perform analysis and interpretation,
and provide valid conclusions using research-based knowledge and research
methodologies related to Computer Science and Engineering.
PO5: To propose innovative, original ideas and solutions, culminating in modern
engineering products with longevity for a large section of society.
PO6: To apply the understanding of legal, health, security, cultural and social
issues, and thereby one's responsibility in their application in professional
engineering practice.
COURSE OUTCOME Mapping
CO1 3 2 2 - - - - - 2 1 3 1 2 3 2
CO2 1 3 2 3 3 - - - 2 3 2 2 3 2 1
CO3 3 3 2 1 1 - - - 1 - 1 3 3 2 1
CO4 3 1 2 1 3 - - - 1 - 2 1 1 3 2
CO5 3 1 1 1 1 - - - 1 1 2 1 2 1 2
LIST OF EXPERIMENTS:
OUTCOMES: Upon completion of the course, the students will be able to:
CONTENTS
1. Implementation of Uninformed search algorithms (BFS, DFS)
2. Implementation of Informed search algorithms (A*, memory-bounded A*)
Aim:
The aim of implementing the Breadth-First Search (BFS) algorithm is to traverse a graph or tree data
structure systematically, visiting every reachable node level by level without visiting any node twice.
Program:
# Breadth-First Search (BFS) algorithm
graph = {
  '5' : ['3','7'],
  '3' : ['2', '4'],
  '7' : ['8'],
  '2' : [],
  '4' : ['8'],
  '8' : []
}

visited = []  # list to keep track of visited nodes
queue = []    # FIFO queue of discovered nodes

def bfs(visited, graph, node):  # function for BFS traversal
    visited.append(node)
    queue.append(node)
    while queue:                # visit each node in discovery order
        m = queue.pop(0)
        print(m, end=" ")
        for neighbour in graph[m]:
            if neighbour not in visited:
                visited.append(neighbour)
                queue.append(neighbour)

# Driver Code
print("Following is the Breadth-First Search")
bfs(visited, graph, '5')
Output:
Following is the Breadth-First Search
5 3 7 2 4 8
Result:
Thus the uninformed search algorithm Breadth-First Search (BFS) was executed successfully and
the output was verified.
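For comparison, the same traversal can be written idiomatically with collections.deque, which gives O(1) pops from the front of the queue instead of the O(n) list.pop(0) used above (a sketch over the same sample graph; the helper name bfs_order is illustrative):

```python
from collections import deque

graph = {
    '5': ['3', '7'],
    '3': ['2', '4'],
    '7': ['8'],
    '2': [],
    '4': ['8'],
    '8': []
}

def bfs_order(graph, start):
    """Return the list of nodes in breadth-first order from `start`."""
    visited = {start}           # mark nodes as soon as they are enqueued
    queue = deque([start])      # deque gives O(1) popleft
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order

print(bfs_order(graph, '5'))    # → ['5', '3', '7', '2', '4', '8']
```

Using a set for the visited check also reduces the membership test from O(n) to O(1) on large graphs.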
Viva Questions:
1. Breadth First Search is equivalent to which of the traversal in the Binary Trees?
a) Pre-order Traversal b) Post-order Traversal
c) Level-order Traversal d) In-order Traversal
2. The Data structure used in standard implementation of Breadth First Search is?
a) Stack b) Queue c) Linked List d) Tree
6. Regarding implementation of Breadth First Search using queues, what is the maximum distance
between two nodes present in the queue? (considering each edge length 1)
a) Can be anything b) 0 c) At most 1 d) Insufficient Information
8. A person wants to visit some places. He starts from a vertex and then wants to visit every place
connected to this vertex, and so on. Which algorithm should he use?
a) Depth First Search b) Breadth First Search
c) Prim’s algorithm d) Kruskal’s algorithm
9. The Breadth First Search algorithm has been implemented using the queue data structure. One possible
order of visiting the nodes of the following graph is
Practice Exercise:
1. Develop a code by implementing the Uninformed search algorithm- BFS
2. Develop a code by implementing the 8 puzzles using the BFS.
3. Implementation of Breadth First Search for Tic-Tac-Toe Problem
4. Write a program to implement Towers of Hanoi problem.
5. A---B
|\ |
| \ |
| \|
C---D
Write a Python program to perform a Breadth-First Search on the above graph starting from vertex ‘A’.
6. A
/ \
B C
/ \ \
D E F
Write a Python program to perform a Breadth-First Search on this graph starting from vertex ‘A’.
7. Write a Python program to implement BFS where the graph is represented as an adjacency list.
8. Write a program to implement the Uninformed strategy – Breadth-First Search considering the
following graph
graph = {'Q': ['P', 'C'], 'R':['D'], 'C':[], 'P':[]}
9. Write a program to implement the Uninformed strategy – Uniform-Cost Search
Program:
# Depth-First Search (DFS) algorithm
graph = {
  '5' : ['3','7'],
  '3' : ['2', '4'],
  '7' : ['8'],
  '2' : [],
  '4' : ['8'],
  '8' : []
}

visited = set()  # set to keep track of visited nodes

def dfs(visited, graph, node):  # function for recursive DFS traversal
    if node not in visited:
        print(node)
        visited.add(node)
        for neighbour in graph[node]:
            dfs(visited, graph, neighbour)

# Driver Code
print("Following is the Depth-First Search")
dfs(visited, graph, '5')
Output:
Following is the Depth-First Search
5
3
2
4
8
7
Result:
Thus the uninformed search algorithm Depth-First Search (DFS) was executed successfully and
the output was verified.
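The recursive traversal above can also be written iteratively with an explicit stack, which avoids Python's recursion depth limit on deep graphs (a sketch over the same sample graph; the helper name dfs_iterative is illustrative):

```python
graph = {
    '5': ['3', '7'],
    '3': ['2', '4'],
    '7': ['8'],
    '2': [],
    '4': ['8'],
    '8': []
}

def dfs_iterative(graph, start):
    """Return nodes in depth-first order using an explicit stack."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()              # LIFO order gives depth-first behaviour
        if node not in visited:
            visited.add(node)
            order.append(node)
            # push neighbours in reverse so the first neighbour is explored first
            for neighbour in reversed(graph[node]):
                if neighbour not in visited:
                    stack.append(neighbour)
    return order

print(dfs_iterative(graph, '5'))        # → ['5', '3', '2', '4', '8', '7']
```

This visits the nodes in the same order as the recursive version, because reversing the neighbour list before pushing mirrors the recursion order.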
Viva Questions:
1. Depth First Search is equivalent to which of the traversal in the Binary Trees?
a) Pre-order Traversal b) Post-order Traversal
c) Level-order Traversal d) In-order Traversal
3. The Data structure used in standard implementation of Depth First Search is?
a) Stack b) Queue c) Linked List d) Tree
5. A person wants to visit some places. He starts from a vertex, visits every vertex reachable along one
path until it ends, backtracks, and then explores other vertices from the same vertex. Which algorithm
should he use?
a) Depth First Search b) Breadth First Search
c) Prim’s algorithm d) Kruskal’s algorithm
8. Regarding implementation of Depth First Search using stacks, what is the maximum distance between two
nodes present in the stack? (Considering each edge length 1)
a) Can be anything b) 0 c) At most 1 d) Insufficient Information
10. State whether the following statement is true or false: if a DFS of a directed graph contains a back
edge, any other DFS of the same graph will also contain at least one back edge.
a) True b) False
Practice Exercise:
1. Develop a code by implementing the Uninformed search algorithm- DFS
2. Develop a code to implement DFS on a large dataset, handling Python’s maximum recursion depth limit
3. Implementation of Depth First Search for Water Jug Problem
4. Write a Python program to implement Depth-First Search using a tree.
5. Write a Python program to perform a DFS traversal starting from a given vertex in a graph and show
the order of visited vertices.
6. Write a Python program to find the articulation points of the graph using Depth-First Search.
7. Given the following adjacency matrix:
0 1 1 0
1 0 0 1
1 0 0 1
0 1 1 0
Perform a Depth-First Search on this graph starting from vertex ‘0’.
8. Write a program to implement the Uninformed strategy – Depth First Search considering the following
graph
graph = {'A': ['B', 'C'], 'B':['D'], 'C':[], 'D':[]}
9. Write a Program to Implement Monkey Banana Problem using Python.
Aim:
The aim of implementing informed search algorithms such as A* and memory-bounded A* is to efficiently
find the shortest path between two points in a graph or network. The A* algorithm is a heuristic-based
search algorithm that finds the shortest path between two points by evaluating, for each candidate node,
the cost function f(n) = g(n) + h(n): the cost of the path so far plus a heuristic estimate of the remaining
cost.
Program:
from queue import PriorityQueue

v = 14
graph = [[] for i in range(v)]  # adjacency list of (neighbour, cost) pairs

# undirected weighted edges of the sample graph
for x, y, c in [(0, 1, 3), (0, 2, 6), (0, 3, 5), (1, 4, 9), (1, 5, 8),
                (2, 6, 12), (2, 7, 14), (3, 8, 7), (8, 9, 5), (8, 10, 6),
                (9, 11, 1), (9, 12, 10), (9, 13, 2)]:
    graph[x].append((y, c))
    graph[y].append((x, c))

def best_first_search(source, target, n):
    visited = [False] * n
    visited[source] = True
    pq = PriorityQueue()        # frontier ordered by heuristic cost
    pq.put((0, source))
    while not pq.empty():
        u = pq.get()[1]
        print(u)                # print each node as it is expanded
        if u == target:
            break
        for v, c in graph[u]:
            if visited[v] == False:
                visited[v] = True
                pq.put((c, v))
    print()

source = 0
target = 9
best_first_search(source, target, v)
OUTPUT:
A* :
0
1
3
2
8
9
Result:
Thus the above program was executed successfully and the output was verified.
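The program above orders its frontier by heuristic value alone (greedy best-first search); full A* also accumulates the path cost g(n) and expands nodes by f(n) = g(n) + h(n). A minimal sketch of A* follows; the small weighted graph, heuristic values and function name are illustrative, not part of the program above:

```python
import heapq

def a_star(graph, h, start, goal):
    """Return (cost, path) of the cheapest path from start to goal, or (inf, [])."""
    # frontier entries are (f = g + h, g, node, path-so-far)
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}                      # cheapest known cost to each node
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for neighbour, weight in graph[node]:
            new_g = g + weight
            if new_g < best_g.get(neighbour, float('inf')):
                best_g[neighbour] = new_g
                heapq.heappush(frontier,
                               (new_g + h[neighbour], new_g, neighbour, path + [neighbour]))
    return float('inf'), []

# hypothetical weighted graph with admissible heuristic values
graph = {
    'A': [('B', 1), ('C', 4)],
    'B': [('C', 2), ('D', 5)],
    'C': [('D', 1)],
    'D': []
}
h = {'A': 3, 'B': 2, 'C': 1, 'D': 0}

print(a_star(graph, h, 'A', 'D'))   # → (4, ['A', 'B', 'C', 'D'])
```

Because the heuristic never overestimates the true remaining cost here, A* returns the optimal path A-B-C-D of cost 4 rather than the direct but costlier A-C-D.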
Viva Questions:
1. Which data structure is typically used to implement the open and closed lists in the A* search algorithm?
A. Queue B. Stack C. Set D. Priority Queue
2. Which of the following best describes the heuristic function in the A* search algorithm?
A. A function that assigns weights to the edges of the graph.
B. A function that estimates the cost from the current node to the goal node.
C. A function that selects the next node to expand based on the lowest path cost.
D. A function that checks if the goal node has been reached.
4. Which of the following best describes the admissibility property in relation to the heuristic function in A*
search algorithm?
A. The heuristic function never overestimates the actual cost to reach the goal node.
B. The heuristic function always underestimates the actual cost to reach the goal node.
C. The heuristic function provides an accurate estimate of the actual cost to reach the goal node.
D. The heuristic function does not affect the search process in A* algorithm.
6. Which of the following conditions can lead to an optimal path in the A* search algorithm?
A. The heuristic function is admissible, but not consistent.
B. The heuristic function is both admissible and consistent.
C. The heuristic function is consistent, but not admissible.
D. The heuristic function is neither admissible nor consistent.
8. Which of the following scenarios can cause the A* search algorithm to return a suboptimal path?
A. The heuristic function is admissible but not consistent.
B. The heuristic function is consistent but not admissible.
C. The search space contains cycles.
D. The heuristic function is neither admissible nor consistent.
9. Which of the following techniques can be used to improve the efficiency of the A* search algorithm?
A. Increasing the size of the open list.
B. Using an effective heuristic function.
C. Randomizing the order of expanding nodes.
D. Ignoring the closed list.
10. Which of the following search algorithms is a generalization of the A* search algorithm and guarantees
finding an optimal path even with inconsistent heuristic functions?
A. Depth-First Search (DFS)
B. Breadth-First Search (BFS)
C. Iterative-Deepening A* (IDA*)
D. Uniform Cost Search (UCS)
Practice Exercise:
1. Develop a code by implementing the Informed search algorithm- A*
2. Develop a code using the repository of UCI Dataset and perform the Informed search algorithm- A*
3. Write the program to find the shortest path from `start` to `goal` in a `graph` by means of A* algorithm.
4. Write a program to implement the A* algorithm to find the shortest path from source to all vertices.
5. Write a program to implement Hill Climbing algorithm.
Aim:
The aim of implementing informed search algorithms such as memory-bounded A* is to efficiently find the
shortest path between two points in a graph or network. The memory-bounded A* algorithm is a variant of
A* that uses a limited amount of memory, making it suitable for large search spaces.
Program:
# Memory-bounded A* (AO*) algorithm
class Graph:
    def __init__(self, graph, heuristicNodeList, startNode):
        # instantiate graph object with graph topology, heuristic values and start node
        self.graph = graph
        self.H = heuristicNodeList
        self.start = startNode
        self.parent = {}
        self.status = {}
        self.solutionGraph = {}

    def applyAOStar(self):  # start the AO* algorithm from the start node
        self.aoStar(self.start, False)

    def getNeighbors(self, v):  # get the neighbour sets of a given node
        return self.graph.get(v, '')

    def getStatus(self, v):  # return the status of a given node
        return self.status.get(v, 0)

    def setStatus(self, v, val):  # set the status of a given node
        self.status[v] = val

    def getHeuristicNodeValue(self, n):
        return self.H.get(n, 0)  # return the heuristic value of a given node

    def setHeuristicNodeValue(self, n, value):
        self.H[n] = value  # set the revised heuristic value of a given node

    def printSolution(self):
        print("FOR GRAPH SOLUTION, TRAVERSE THE GRAPH FROM THE START NODE:", self.start)
        print("------------------------------------------------------------")
        print(self.solutionGraph)
        print("------------------------------------------------------------")

    def computeMinimumCostChildNodes(self, v):
        # computes the minimum cost of the child nodes of a given node v
        minimumCost = 0
        costToChildNodeListDict = {}
        costToChildNodeListDict[minimumCost] = []
        flag = True
        for nodeInfoTupleList in self.getNeighbors(v):  # iterate over each set of child node/s
            cost = 0
            nodeList = []
            for c, weight in nodeInfoTupleList:
                cost = cost + self.getHeuristicNodeValue(c) + weight
                nodeList.append(c)
            if flag == True:  # initialize minimum cost with the cost of the first set of child node/s
                minimumCost = cost
                costToChildNodeListDict[minimumCost] = nodeList
                flag = False
            else:  # compare the remaining sets with the current minimum cost
                if minimumCost > cost:
                    minimumCost = cost
                    costToChildNodeListDict[minimumCost] = nodeList
        # return minimum cost and the minimum cost child node/s
        return minimumCost, costToChildNodeListDict[minimumCost]

    def aoStar(self, v, backTracking):  # AO* algorithm for a start node and backTracking status flag
        print("HEURISTIC VALUES :", self.H)
        print("SOLUTION GRAPH :", self.solutionGraph)
        print("PROCESSING NODE :", v)
        print("-----------------------------------------------------------------------------------------")
        if self.getStatus(v) >= 0:  # if status of node v >= 0, compute minimum cost nodes of v
            minimumCost, childNodeList = self.computeMinimumCostChildNodes(v)
            print(minimumCost, childNodeList)
            self.setHeuristicNodeValue(v, minimumCost)
            self.setStatus(v, len(childNodeList))
            solved = True  # check whether the minimum cost nodes of v are solved
            for childNode in childNodeList:
                self.parent[childNode] = v
                if self.getStatus(childNode) != -1:
                    solved = solved & False
            if solved == True:
                # if the minimum cost nodes of v are solved, set the current node status as solved (-1)
                self.setStatus(v, -1)
                # update the solution graph with the solved nodes, which may be a part of the solution
                self.solutionGraph[v] = childNodeList
            if v != self.start:  # if the current node is not the start node, backtrack to its parent
                self.aoStar(self.parent[v], True)
            if backTracking == False:  # if the current call is not for backtracking
                for childNode in childNodeList:  # for each minimum cost child node
                    self.setStatus(childNode, 0)  # status 0: child node needs exploration
                    self.aoStar(childNode, False)

print("Graph - 1")
h1 = {'A': 1, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1}
graph1 = {
    'A': [[('B', 1), ('C', 1)], [('D', 1)]],
    'B': [[('G', 1)], [('H', 1)]],
    'C': [[('J', 1)]],
    'D': [[('E', 1), ('F', 1)]],
    'G': [[('I', 1)]]
}
G1 = Graph(graph1, h1, 'A')
G1.applyAOStar()
G1.printSolution()

print("Graph - 2")
h2 = {'A': 1, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7}
graph2 = {
    'A': [[('B', 1), ('C', 1)], [('D', 1)]],
    'B': [[('G', 1)], [('H', 1)]],
    'D': [[('E', 1), ('F', 1)]]
}
G2 = Graph(graph2, h2, 'A')  # instantiate Graph object with graph, heuristic values and start node
G2.applyAOStar()             # run the AO* algorithm
G2.printSolution()           # print the solution graph found by the AO* search
Output:
Graph - 1
HEURISTIC VALUES : {'A': 1, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
10 ['B', 'C']
HEURISTIC VALUES : {'A': 10, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : B
-----------------------------------------------------------------------------------------
6 ['G']
HEURISTIC VALUES : {'A': 10, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
10 ['B', 'C']
HEURISTIC VALUES : {'A': 10, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : G
-----------------------------------------------------------------------------------------
8 ['I']
HEURISTIC VALUES : {'A': 10, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 8, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : B
-----------------------------------------------------------------------------------------
8 ['H']
HEURISTIC VALUES : {'A': 10, 'B': 8, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 8, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
12 ['B', 'C']
HEURISTIC VALUES : {'A': 12, 'B': 8, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 8, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : I
-----------------------------------------------------------------------------------------
0 []
HEURISTIC VALUES : {'A': 12, 'B': 8, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 8, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': []}
PROCESSING NODE : G
-----------------------------------------------------------------------------------------
1 ['I']
HEURISTIC VALUES : {'A': 12, 'B': 8, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I']}
PROCESSING NODE : B
-----------------------------------------------------------------------------------------
2 ['G']
HEURISTIC VALUES : {'A': 12, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G']}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
6 ['B', 'C']
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G']}
PROCESSING NODE : C
-----------------------------------------------------------------------------------------
2 ['J']
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G']}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
6 ['B', 'C']
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G']}
PROCESSING NODE : J
-----------------------------------------------------------------------------------------
0 []
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 0}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G'], 'J': []}
PROCESSING NODE : C
-----------------------------------------------------------------------------------------
1 ['J']
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 1, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 0}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G'], 'J': [], 'C': ['J']}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
5 ['B', 'C']
FOR GRAPH SOLUTION, TRAVERSE THE GRAPH FROM THE START NODE: A
------------------------------------------------------------
{'I': [], 'G': ['I'], 'B': ['G'], 'J': [], 'C': ['J'], 'A': ['B', 'C']}
------------------------------------------------------------
Graph - 2
HEURISTIC VALUES : {'A': 1, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
11 ['D']
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {}
PROCESSING NODE : D
-----------------------------------------------------------------------------------------
10 ['E', 'F']
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
11 ['D']
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {}
PROCESSING NODE : E
-----------------------------------------------------------------------------------------
0 []
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 10, 'E': 0, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': []}
PROCESSING NODE : D
-----------------------------------------------------------------------------------------
6 ['E', 'F']
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 6, 'E': 0, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': []}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
7 ['D']
HEURISTIC VALUES : {'A': 7, 'B': 6, 'C': 12, 'D': 6, 'E': 0, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': []}
PROCESSING NODE : F
-----------------------------------------------------------------------------------------
0 []
HEURISTIC VALUES : {'A': 7, 'B': 6, 'C': 12, 'D': 6, 'E': 0, 'F': 0, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': [], 'F': []}
PROCESSING NODE : D
-----------------------------------------------------------------------------------------
2 ['E', 'F']
HEURISTIC VALUES : {'A': 7, 'B': 6, 'C': 12, 'D': 2, 'E': 0, 'F': 0, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': [], 'F': [], 'D': ['E', 'F']}
PROCESSING NODE : A
-----------------------------------------------------------------------------------------
3 ['D']
FOR GRAPH SOLUTION, TRAVERSE THE GRAPH FROM THE START NODE: A
------------------------------------------------------------
{'E': [], 'F': [], 'D': ['E', 'F'], 'A': ['D']}
Result:
Thus the above program was executed successfully and the output was verified.
Viva Questions:
1. What is the other name of informed search strategy?
a) Simple search b) Heuristic search c) Online search d) None of the mentioned
3. Which search uses the problem specific knowledge beyond the definition of the problem?
a) Informed search b) Depth-first search
c) Breadth-first search d) Uninformed search
4. Which function will select the lowest expansion node at first for evaluation?
a) Greedy best-first search b) Best-first search
c) Depth-first search d) None of the mentioned
7. Which method is used to search better by learning?
a) Best-first search b) Depth-first search
c) Metalevel state space d) None of the mentioned
10. Which search method will expand the node that is closest to the goal?
a) Best-first search b) Greedy best-first search
c) A* search d) None of the mentioned
Practice Exercise:
1. Develop a code by implementing the Informed search algorithm- Memory Bounded A*
2. Implement the code for accessing the Basketball Logo through Informed search algorithm- Memory
Bounded A*
3. Write a Program to Implement N-Queens Problem using Python
4. Consider the following grid map, where each cell is either passable (0) or blocked (1):
000100000
000100000
000100000
000100000
000000000
000000000
Write the program to find the shortest path from the start to the goal using A* algorithm.
Note: Moves can be made in any of the four cardinal directions (up, down, left, right) but not diagonally.
The start position is (0, 0) and the goal position is (5, 8).
IMPLEMENTATION OF NAÏVE BAYES
Aim:
The aim of the Naïve Bayes algorithm is to classify a given set of data points into different classes based on
the probability of each data point belonging to a particular class. The algorithm is based on Bayes' theorem,
which states that the probability of an event, given prior knowledge of another event, can be calculated
using conditional probability.
Algorithm:
1. Collect the dataset: The first step in using Naïve Bayes is to collect a dataset that contains a set of data
points and their corresponding classes.
2. Prepare the data: The next step is to preprocess the data and prepare it for the Naïve Bayes algorithm.
This involves removing any unnecessary features or attributes and normalizing the data.
3. Compute the prior probabilities: The prior probability of each class can be computed by counting the
number of data points belonging to each class and dividing it by the total number of data points.
4. Compute the likelihoods: The likelihood of each feature for each class can be computed by calculating
the conditional probability of the feature given the class. This involves counting the number of data points
in each class that have the feature and dividing it by the total number of data points in that class.
5. Compute the posterior probabilities: The posterior probability of each class can be computed by
multiplying the prior probability of the class with the product of the likelihoods of each feature for that
class.
6. Make predictions: Once the posterior probabilities have been computed for each class, the Naïve Bayes
algorithm can be used to make predictions by selecting the class with the highest probability.
7. Evaluate the model: The final step is to evaluate the performance of the Naïve Bayes model. This can be
done by computing performance metrics such as accuracy, precision, recall, and F1 score.
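The prior/likelihood/posterior steps above can be sketched for categorical features with plain counting. The toy weather dataset and function name below are illustrative, and add-one (Laplace) smoothing is included to avoid the zero-frequency problem discussed in the viva questions:

```python
from collections import Counter, defaultdict

# toy dataset: (outlook, windy) -> play
data = [
    (('sunny', 'no'), 'yes'), (('sunny', 'yes'), 'no'),
    (('rainy', 'yes'), 'no'), (('overcast', 'no'), 'yes'),
    (('rainy', 'no'), 'yes'), (('sunny', 'no'), 'yes'),
]

classes = Counter(label for _, label in data)   # class frequencies for the priors
total = sum(classes.values())

# likelihood counts: counts[class][feature_index][feature_value]
counts = defaultdict(lambda: defaultdict(Counter))
for features, label in data:
    for i, value in enumerate(features):
        counts[label][i][value] += 1

def predict(features):
    """Pick the class with the highest posterior (with add-one smoothing)."""
    best_label, best_score = None, -1.0
    for label, class_count in classes.items():
        score = class_count / total                          # prior P(class)
        for i, value in enumerate(features):
            seen = counts[label][i]
            # likelihood P(value | class), smoothed so unseen values get a small mass
            score *= (seen[value] + 1) / (class_count + len(seen) + 1)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict(('sunny', 'no')))    # → yes
```

The product of prior and likelihoods is proportional to the posterior; since only the argmax is needed, the normalizing constant is never computed.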
Program:
# load the iris dataset
from sklearn.datasets import load_iris
iris = load_iris()

# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# split X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# train a Gaussian Naive Bayes classifier on the training set
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# make predictions on the testing set
y_pred = gnb.predict(X_test)

# comparing actual response values (y_test) with predicted response values (y_pred)
from sklearn import metrics
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(y_test, y_pred)*100)
Output:
Gaussian Naive Bayes model accuracy(in %): 95.0
Result:
Thus the program for Naïve Bayes was executed successfully and the output was verified.
Viva Questions:
1. Which of the following statements best describes Naive Bayes Algorithm?
a) It is a supervised learning algorithm used for classification.
b) It is an unsupervised learning algorithm used for clustering.
c) It is a reinforcement learning algorithm used for decision making.
d) It is a dimensionality reduction algorithm used for feature extraction.
2. What assumption does Naive Bayes Algorithm make regarding the independence of features?
a) Conditional independence b) Mutual independence c) Dependence d) None of the above
3. Which probability distribution is commonly used for modeling the likelihood in Naive Bayes Algorithm?
a) Normal distribution b) Uniform distribution
c) Poisson distribution d) Bernoulli distribution
5. Which assumption is violated by Naive Bayes Algorithm if there is a high degree of interdependence among
the features?
a) Linearity assumption b) Normality assumption
c) Independence assumption d) Homoscedasticity assumption
6. Which variant of Naive Bayes Algorithm is suitable for handling continuous-valued features?
a) Gaussian Naive Bayes b) Multinomial Naive Bayes
c) Complement Naive Bayes d) Bernoulli Naive Bayes
b) Estimating the conditional probabilities of each feature given the class
c) Combining the prior and conditional probabilities
d) All of the above
8. What problem can occur in Naive Bayes Algorithm if a particular feature has zero probability in the training
dataset for a certain class?
a) Overfitting b) Underfitting c) Zero-frequency problem d) Class imbalance problem
9. Which evaluation metric is commonly used to assess the performance of Naive Bayes Algorithm for
classification tasks?
a) Mean Absolute Error (MAE) b) Root Mean Squared Error (RMSE)
c) F1 score d) R-squared (R^2) score
Practice Exercise:
1. Develop a code to analyze a data set using naïve Bayes models
2. Develop a code to implement the Gaussian naïve Bayes models for the spam filtering process.
3. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform
this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and
recall for your data set.
4. Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV
file. Compute the accuracy of the classifier, considering few test data sets.
5. Write a python program to implement a Naive Bayes classifier using scikit-learn library
6. Write a python program to implement Gaussian naïve bayes models
7. Write a python program to implement Bernoulli naïve bayes models
8. Write a python program to implement Multinomial naïve bayes models
9. Write a program to implement Naive Bayes models for the following problem Assume we have to find the
probability of the randomly picked card to be king given that it is a face card.
Aim:
The aim of implementing Bayesian Networks is to model the probabilistic relationships between a set of
variables. A Bayesian Network is a graphical model that represents the conditional dependencies between
different variables in a probabilistic manner. It is a powerful tool for reasoning under uncertainty and can be
used for a wide range of applications, including decision making, risk analysis, and prediction.
Algorithm:
1. Define the variables: The first step in implementing a Bayesian Network is to define the variables that will
be used in the model. Each variable should be clearly defined and its possible states should be enumerated.
2. Determine the relationships between variables: The next step is to determine the probabilistic relationships
between the variables. This can be done by identifying the causal relationships between the variables or by
using data to estimate the conditional probabilities of each variable given its parents.
3. Construct the Bayesian Network: The Bayesian Network can be constructed by representing the variables
as nodes in a directed acyclic graph (DAG). The edges between the nodes represent the conditional
dependencies between the variables.
4. Assign probabilities to the variables: Once the structure of the Bayesian Network has been defined, the
probabilities of each variable must be assigned. This can be done by using expert knowledge, data, or a
combination of both.
5. Inference: Inference refers to the process of using the Bayesian Network to make predictions or draw
conclusions. This can be done by using various inference algorithms, such as variable elimination or belief
propagation.
6. Learning: Learning refers to the process of updating the probabilities in the Bayesian Network based on
new data. This can be done using various learning algorithms, such as maximum likelihood or Bayesian
learning.
7. Evaluation: The final step in implementing a Bayesian Network is to evaluate its performance. This can be
done by comparing the predictions of the model to actual data and computing various performance
metrics, such as accuracy or precision.
Attribute Information:
Program:
#install
!pip install pgmpy
import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork
from pgmpy.inference import VariableElimination
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?',np.nan)
print(heartDisease.head())
model = BayesianNetwork([('age', 'heartdisease'), ('sex', 'heartdisease'),
                         ('exang', 'heartdisease'), ('cp', 'heartdisease'),
                         ('heartdisease', 'restecg'), ('heartdisease', 'chol')])
print('\nLearning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)
print('\nInferencing with Bayesian Network:')
infer = VariableElimination(model)
print(infer.query(variables=['heartdisease'], evidence={'restecg': 1}))
Output:
oldpeak slope ca thal heartdisease
0 2.3 3 0 6 0
1 1.5 2 3 3 2
2 2.6 2 2 7 1
3 3.5 3 0 3 0
4 1.4 1 0 3 0
thal object
heartdisease int64
dtype: object
Result:
Thus the program to implement a Bayesian network on the given heart disease dataset has been executed
successfully and the output verified.
Viva Questions:
5. How can the entries in the full joint probability distribution be calculated?
a) Using variables b) Using information
c) Both Using variables & information d) None of the mentioned
10. What is the relationship between a node and its predecessors when creating a Bayesian network?
a) Functionally dependent b) Dependent
c) Conditionally independent d) Both conditionally dependent & dependent
Practice Exercise:
1. Write a program to implement Bayesian Network that will model the performance of a student on an
exam.
2. Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate
the diagnosis of heart patients using standard Heart Disease Data Set. You can use Python ML library
classes/API
3. Write a python program to create a simple Bayesian network using pgmpy.
4. Write a python program to implement the EM algorithm for Bayesian networks in Python
5. Write a python program using the K2 algorithm for learning the structure of a Bayesian network
6. Develop a program that implements Bayesian networks, performs the iteration process, and analyzes
random networks.
7. Write EM code to model heart disease and implement it using Bayesian networks.
8. Develop a program that checks the probabilistic relationship between two datasets using Bayesian
networks.
Aim:
To build regression models such as locally weighted linear regression and plot the necessary graphs.
Algorithm:
1. Read the Given data Sample to X and the curve (linear or non-linear) to Y
2. Set the value for Smoothening parameter or Free parameter say τ
3. Set the bias /Point of interest set x0 which is a subset of X
4. Determine the weight matrix using: w(x, x0) = exp(-(x - x0)² / (2τ²))
5. Determine the value of the model parameter β using: β = (XᵀWX)⁻¹ XᵀWy
6. Prediction = x0·β
Program:
import math
import numpy as np
import matplotlib.pyplot as plt

def lowess(x, y, f, iterations):
    """Locally weighted regression (LOWESS) with a tricube kernel."""
    n = len(x)
    r = int(np.ceil(f * n))
    # bandwidth for each point = distance to its r-th nearest neighbour
    h = np.array([np.sort(np.abs(x - x[i]))[r] for i in range(n)])
    w = (1 - np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0) ** 3) ** 3
    yest = np.zeros(n)
    delta = np.ones(n)
    for _ in range(iterations):
        for i in range(n):
            weights = delta * w[:, i]
            # weighted least-squares line fit around x[i]
            beta = np.polyfit(x, y, 1, w=np.sqrt(weights))
            yest[i] = np.polyval(beta, x[i])
        # robustifying weights down-weight outliers on the next pass
        s = np.median(np.abs(y - yest))
        delta = (1 - np.clip((y - yest) / (6.0 * s), -1, 1) ** 2) ** 2
    return yest

n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)
f = 0.25
iterations = 3
yest = lowess(x, y, f, iterations)
plt.plot(x, y, "r.")
plt.plot(x, yest, "b-")
plt.show()
Output
Result:
Thus the program to implement the non-parametric locally weighted regression algorithm and fit data points
with a graph visualization has been executed successfully.
VIVA Questions
1. Which one of the following statements about the correlation coefficient is correct?
🗸 The correlation coefficient is unaffected by scale changes.
🗸 Both the change of scale and the change of origin have no effect on the correlation coefficient.
🗸 The correlation coefficient is unaffected by the change of origin.
🗸 The correlation coefficient is affected by changes of origin and scale.
2. Choose the correct option concerning the correlation analysis between 2 sets of data.
🗸 Multiple correlation is a correlational analysis comparing two sets of data.
🗸 A partial correlation is a correlational analysis comparing two sets of data.
🗸 A simple correlation is a correlational analysis comparing two sets of data.
🗸 None of the preceding.
Practice Exercises
1. Develop a program to understand and predict an outcome variable from input data using regression
models.
2. Predict the increase in the number of customers by analyzing the data through a regression model.
3. Write a Python program to implement simple linear regression and plot the graph.
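Exercise 3 can be approached with a sketch like the following (the data here is synthetic, generated from an assumed line y = 2.5x + 1 plus noise):

```python
# Minimal simple linear regression sketch with synthetic data
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50).reshape(-1, 1)          # one feature, 50 samples
y = 2.5 * x.ravel() + 1.0 + rng.normal(0, 1, 50)   # true line plus Gaussian noise

model = LinearRegression().fit(x, y)
print("slope ~", round(model.coef_[0], 1), "intercept ~", round(model.intercept_, 1))

plt.scatter(x, y)
plt.plot(x, model.predict(x), color='red')
plt.savefig("simple_linear_regression.png")
```

The fitted slope and intercept should land close to the values used to generate the data.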
Aim:
To implement the concept of decision trees with suitable dataset from real world problems using CART
algorithm.
Algorithm:
Steps in CART algorithm:
1. Select the best attribute and split point using the Gini impurity measure.
2. Partition the training records into two subsets based on the chosen split.
3. Repeat the splitting recursively on each subset until a stopping criterion (e.g. maximum depth or pure
nodes) is met.
4. Assign the majority class of the records in each leaf node as its prediction.
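The Gini impurity that CART minimizes at each split can be computed directly; the helper below is a small illustrative sketch, not part of the lab program:

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum over classes of p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# A pure node has impurity 0; a 50/50 binary node has impurity 0.5
print(gini([1, 1, 1, 1]))    # 0.0
print(gini([0, 0, 1, 1]))    # 0.5
```

CART evaluates candidate splits by the weighted average Gini impurity of the two child nodes and picks the split with the lowest value.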
Program
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_csv('Social_Network_Ads.csv')
data.head()
feature_cols = ['Age', 'EstimatedSalary']
x = data.iloc[:, [2, 3]].values
y = data.iloc[:, 4].values
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier()
classifier = classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
x1, x2 = np.meshgrid(np.arange(start=x_set[:, 0].min()-1, stop=x_set[:, 0].max()+1, step=0.01),
np.arange(start=x_set[:, 1].min()-1, stop=x_set[:, 1].max()+1, step=0.01))
plt.contourf(x1,x2, classifier.predict(np.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape), alpha=0.75,
cmap=ListedColormap(("red", "green")))
plt.xlim(x1.min(), x1.max())
plt.ylim(x2.min(), x2.max())
for i, j in enumerate(np.unique(y_set)):
plt.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1], c=ListedColormap(("red", "green"))(i), label=j)
plt.title("Decision Tree(Test set)")
plt.xlabel("Age")
plt.ylabel("Estimated Salary")
plt.legend()
plt.show()
from sklearn.tree import export_graphviz
from six import StringIO
from IPython.display import Image
import pydotplus
dot_data = StringIO()
export_graphviz(classifier, out_file=dot_data, filled=True, rounded=True, special_characters=True,
feature_names=feature_cols, class_names=['0', '1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('decisiontree.png')
Image('decisiontree.png')
classifier = DecisionTreeClassifier(criterion="gini", max_depth=3)
classifier = classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
dot_data = StringIO()
export_graphviz(classifier, out_file=dot_data, filled=True, rounded=True, special_characters=True,
feature_names=feature_cols, class_names=['0', '1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('opt_decisiontree_gini.png')
Image('opt_decisiontree_gini.png')
Result:
Thus the program to implement the concept of decision trees on a suitable real-world dataset using the
CART algorithm has been executed successfully.
VIVA Questions
🗸 Decision Node
🗸Path
🗸 Arc/Edge
4. An increase in training time tends to cause
🗸 A decrease in size
🗸 An increase in size
🗸 Constant size
🗸 None of the above
5. With respect to the number of random attributes tested at each split, performance tends to be
🗸 Sensitive
🗸 Insensitive
🗸 Fairly insensitive
🗸 None of the above
Practice Exercises
1. Develop a program to build random forests for a dataset and understand the difference between random
forests and decision trees.
2. Develop a program that uses decision trees to assess the risk of heart attack for prevention.
3. Write a Python program to build decision tree regression using the scikit-learn library.
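Practice exercise 1 can be started from a minimal random forest sketch such as this (Iris is used as a stand-in dataset):

```python
# Minimal random forest sketch with scikit-learn (Iris as a stand-in dataset)
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A random forest averages many decision trees trained on bootstrap samples
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, rf.predict(X_test)))
```

Compare this accuracy with a single `DecisionTreeClassifier` on the same split to see the effect of ensembling.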
Aim:
To create a machine learning model which classifies the Spam and Ham E-Mails from a given dataset using
Support Vector Machine algorithm.
Algorithm:
1. Read the e-mail dataset and explore its structure.
2. Clean the message text by removing punctuation and stop words.
3. Convert the cleaned text into count vectors using CountVectorizer.
4. Split the vectors into training and test sets.
5. Train a linear-kernel SVM classifier on the training set.
6. Evaluate the model with accuracy, F1 score, recall, precision, a confusion matrix and a ROC curve.
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import string
from nltk.corpus import stopwords
import os
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
from PIL import Image
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_curve, auc
from sklearn import metrics
from sklearn import model_selection
from sklearn import svm
from nltk import word_tokenize
from sklearn.metrics import roc_auc_score
from matplotlib import pyplot
from sklearn.metrics import ConfusionMatrixDisplay
class data_read_write(object):
def __init__(self, file_link=None):
# load the CSV immediately when a path is supplied
if file_link is not None:
self.data_frame = pd.read_csv(file_link)
def read_csv_file(self, file_link):
return self.data_frame
def write_to_csvfile(self, file_link):
self.data_frame.to_csv(file_link, encoding='utf-8', index=False, header=True)
return
class generate_word_cloud(data_read_write):
def __init__(self):
pass
def variance_column(self, data):
return np.var(data)
def word_cloud(self, data_frame_column, output_image_file):
text = " ".join(review for review in data_frame_column)
stopwords = set(STOPWORDS)
stopwords.update(["subject"])
wordcloud = WordCloud(width = 1200, height = 800, stopwords=stopwords,
max_font_size = 50, margin=0,
background_color = "white").generate(text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.savefig("Distribution.png")
plt.show()
wordcloud.to_file(output_image_file)
return
class data_cleaning(data_read_write):
def __init__(self):
pass
def message_cleaning(self, message):
Test_punc_removed = [char for char in message if char not in string.punctuation]
Test_punc_removed_join = ''.join(Test_punc_removed)
Test_punc_removed_join_clean = [word for word in Test_punc_removed_join.split()
if word.lower() not in stopwords.words('english')]
final_join = ' '.join(Test_punc_removed_join_clean)
return final_join
def apply_to_column(self, data_column_text):
data_processed = data_column_text.apply(self.message_cleaning)
return data_processed
class apply_embedding_and_model(data_read_write):
def __init__(self):
pass
def apply_count_vector(self, v_data_column):
vectorizer = CountVectorizer(min_df=2, analyzer="word", tokenizer=None,
preprocessor=None, stop_words=None)
return vectorizer.fit_transform(v_data_column)
def apply_svm(self, X, y):
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
params = {'kernel': 'linear', 'C': 2, 'gamma': 1}
svm_cv = svm.SVC(C=params['C'], kernel=params['kernel'], gamma=params['gamma'],
probability=True)
svm_cv.fit(X_train, y_train)
y_predict_test = svm_cv.predict(X_test)
cm = confusion_matrix(y_test, y_predict_test)
sns.heatmap(cm, annot=True)
print(classification_report(y_test, y_predict_test))
print("test set")
print("\nAccuracy Score: " + str(metrics.accuracy_score(y_test, y_predict_test)))
print("F1 Score: " + str(metrics.f1_score(y_test, y_predict_test)))
print("Recall: " + str(metrics.recall_score(y_test, y_predict_test)))
print("Precision: " + str(metrics.precision_score(y_test, y_predict_test)))
class_names = ['ham', 'spam']
titles_options = [("Confusion matrix, without normalization", None),
("Normalized confusion matrix", 'true')]
for title, normalize in titles_options:
disp = ConfusionMatrixDisplay.from_estimator(svm_cv, X_test, y_test,
display_labels=class_names, cmap=plt.cm.Blues, normalize=normalize)
disp.ax_.set_title(title)
print(title)
print(disp.confusion_matrix)
plt.savefig("SVM.png")
plt.show()
ns_probs = [0 for _ in range(len(y_test))]
lr_probs = svm_cv.predict_proba(X_test)
lr_probs = lr_probs[:, 1]
ns_auc = roc_auc_score(y_test, ns_probs)
lr_auc = roc_auc_score(y_test, lr_probs)
print('No Skill: ROC AUC=%.3f' % (ns_auc))
print('SVM: ROC AUC=%.3f' % (lr_auc))
ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_probs)
pyplot.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
pyplot.plot(lr_fpr, lr_tpr, marker='.', label='SVM')
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
pyplot.savefig("SVMMat.png")
pyplot.show()
return
data_obj = data_read_write("emails.csv")
data_frame = data_obj.read_csv_file("processed.csv")
data_frame.head()
data_frame.tail()
data_frame.describe()
data_frame.info()
data_frame.head()
data_frame.groupby('spam').describe()
data_frame['length'] = data_frame['text'].apply(len)
data_frame['length'].max()
sns.set(rc={'figure.figsize':(11.7,8.27)})
ham_messages_length = data_frame[data_frame['spam']==0]
spam_messages_length = data_frame[data_frame['spam']==1]
ham_messages_length['length'].plot(bins=100, kind='hist',label = 'Ham')
spam_messages_length['length'].plot(bins=100, kind='hist',label = 'Spam')
plt.title('Distribution of Length of Email Text')
plt.xlabel('Length of Email Text')
plt.legend()
data_frame[data_frame['spam']==0].text.values
ham_words_length = [len(word_tokenize(title)) for title in
data_frame[data_frame['spam']==0].text.values]
spam_words_length = [len(word_tokenize(title)) for title in
data_frame[data_frame['spam']==1].text.values]
print(max(ham_words_length))
print(max(spam_words_length))
sns.set(rc={'figure.figsize':(11.7,8.27)})
ax = sns.distplot(ham_words_length, norm_hist = True, bins = 30, label = 'Ham')
ax = sns.distplot(spam_words_length, norm_hist = True, bins = 30, label = 'Spam')
plt.title('Distribution of Number of Words')
plt.xlabel('Number of Words')
plt.legend()
plt.savefig("SVMGraph.png")
plt.show()
def mean_word_length(x):
word_lengths = np.array([])
for word in word_tokenize(x):
word_lengths = np.append(word_lengths, len(word))
return word_lengths.mean()
ham_meanword_length =data_frame[data_frame['spam']==0].text.apply(mean_word_length)
spam_meanword_length =data_frame[data_frame['spam']==1].text.apply(mean_word_length)
sns.distplot(ham_meanword_length, norm_hist = True, bins = 30, label = 'Ham')
sns.distplot(spam_meanword_length , norm_hist = True, bins = 30, label = 'Spam')
plt.title('Distribution of Mean Word Length')
plt.xlabel('Mean Word Length')
plt.legend()
plt.savefig("Graph.png")
plt.show()
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
def stop_words_ratio(x):
num_total_words = 0
num_stop_words = 0
for word in word_tokenize(x):
if word in stop_words:
num_stop_words += 1
num_total_words += 1
return num_stop_words / num_total_words
ham_stopwords = data_frame[data_frame['spam'] == 0].text.apply(stop_words_ratio)
spam_stopwords = data_frame[data_frame['spam'] == 1].text.apply(stop_words_ratio)
sns.distplot(ham_stopwords, norm_hist=True, label='Ham')
sns.distplot(spam_stopwords, label='Spam')
print('Ham Mean: {:.3f}'.format(ham_stopwords.values.mean()))
print('Spam Mean: {:.3f}'.format(spam_stopwords.values.mean()))
plt.title('Distribution of Stop-word Ratio')
plt.xlabel('Stop Word Ratio')
plt.legend()
ham = data_frame[data_frame['spam']==0]
spam = data_frame[data_frame['spam']==1]
spam['length'].plot(bins=60, kind='hist')
ham['length'].plot(bins=60, kind='hist')
data_frame['Ham(0) and Spam(1)'] = data_frame['spam']
print( 'Spam percentage =', (len(spam) / len(data_frame) )*100,"%")
print( 'Ham percentage =', (len(ham) / len(data_frame) )*100,"%")
sns.countplot(x='Ham(0) and Spam(1)', data=data_frame)
data_clean_obj = data_cleaning()
data_frame['clean_text'] = data_clean_obj.apply_to_column(data_frame['text'])
data_frame.head()
data_obj.data_frame.head()
data_obj.write_to_csvfile("processed_file.csv")
cv_object = apply_embedding_and_model()
spamham_countvectorizer = cv_object.apply_count_vector(data_frame['clean_text'])
X = spamham_countvectorizer
label = data_frame['spam'].values
y = label
cv_object.apply_svm(X,y)
Output:
test set
Accuracy Score: 0.9895287958115183
F1 Score: 0.9776119402985075
Recall: 0.9739776951672863
Precision: 0.9812734082397003
Normalized confusion matrix
[[0.99429875 0.00570125]
[0.0260223 0.9739777 ]]
Result:
Thus the program to create a machine learning model that classifies spam and ham e-mails from the given
dataset using the Support Vector Machine algorithm has been executed successfully.
VIVA Questions
Practice Exercises
1. Write a python program to build SVM (Support Vector Machine) models using scikit-learn.
2. Write a program to implement the SVM using the following dataset
https://www.kaggle.com/mltuts/social- network-ads.
3. Write a python program to implement Agglomerative Hierarchical Clustering, using Python and
scikit-learn library.
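Practice exercise 1 can be started from a minimal sketch like the one below (Iris stands in for a real dataset):

```python
# Minimal linear SVM sketch with scikit-learn (Iris as a stand-in dataset)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A linear kernel finds the maximum-margin separating hyperplane
clf = SVC(kernel='linear', C=1.0)
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

For text data, as in the spam program above, the feature matrix would come from `CountVectorizer` instead of ready-made numeric features.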
Aim:
To implement the ensembling technique of Blending with the given Alcohol QCM Dataset.
Algorithm:
1. Split the training dataset into train, test and validation dataset.
2. Fit all the base models using train dataset.
3. Make predictions on validation and test dataset.
4. These predictions are used as features to build a second level model
5. This model is used to make predictions on test and meta-features.
Program:
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor
import xgboost as xgb
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
df = pd.read_csv("train_data.csv")
target = df["target"]
train = df.drop("target", axis=1)
train_ratio = 0.70
validation_ratio = 0.20
test_ratio = 0.10
x_train, x_test, y_train, y_test = train_test_split(
train, target, test_size=1 - train_ratio)
x_val, x_test, y_val, y_test = train_test_split(
x_test, y_test, test_size=test_ratio/(test_ratio + validation_ratio))
model_1 = LinearRegression()
model_2 = xgb.XGBRegressor()
model_3 = RandomForestRegressor()
model_1.fit(x_train, y_train)
val_pred_1 = model_1.predict(x_val)
test_pred_1 = model_1.predict(x_test)
val_pred_1 = pd.DataFrame(val_pred_1)
test_pred_1 = pd.DataFrame(test_pred_1)
model_2.fit(x_train, y_train)
val_pred_2 = model_2.predict(x_val)
test_pred_2 = model_2.predict(x_test)
val_pred_2 = pd.DataFrame(val_pred_2)
test_pred_2 = pd.DataFrame(test_pred_2)
model_3.fit(x_train, y_train)
val_pred_3 = model_3.predict(x_val)
test_pred_3 = model_3.predict(x_test)
val_pred_3 = pd.DataFrame(val_pred_3)
test_pred_3 = pd.DataFrame(test_pred_3)
df_val = pd.concat([x_val, val_pred_1, val_pred_2, val_pred_3], axis=1)
df_test = pd.concat([x_test, test_pred_1, test_pred_2, test_pred_3], axis=1)
final_model = LinearRegression()
final_model.fit(df_val, y_val)
final_pred = final_model.predict(df_test)
print(mean_squared_error(y_test, final_pred))
Output:
4790
Result:
Thus the program to implement the ensembling technique of blending with the given Alcohol QCM dataset
has been executed successfully and the output verified.
Practice Exercises:
1. Write a python program to implement ensemble techniques, such as voting and bagging, using
scikit-learn.
2. Write a python program to implement clustering algorithms, specifically K-means and
DBSCAN, using scikit-learn.
3. Write a python program to implement the EM algorithm for Bayesian networks in Python.
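Exercise 1 can be started from a minimal sketch covering both voting and bagging (Iris as a stand-in dataset):

```python
# Minimal voting and bagging ensemble sketch with scikit-learn (Iris as a stand-in)
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier, BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Hard voting: each base model casts one vote per sample
voting = VotingClassifier([('lr', LogisticRegression(max_iter=1000)),
                           ('dt', DecisionTreeClassifier(random_state=0))],
                          voting='hard')
voting.fit(X_train, y_train)
print("Voting accuracy:", accuracy_score(y_test, voting.predict(X_test)))

# Bagging: many trees trained on bootstrap resamples of the training data
bagging = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                            n_estimators=20, random_state=0)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", accuracy_score(y_test, bagging.predict(X_test)))
```

Both ensembles typically match or beat their individual base models on the same split.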
Aim:
To implement K-Nearest Neighbor algorithm to classify the Iris Dataset
Algorithm:
Step-1: Select the number K of the neighbors
Step-2: Calculate the Euclidean distance of K number of neighbors
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these k neighbors, count the number of the data points in each category.
Step-5: Assign the new data points to that category for which the number of the neighbor is maximum.
Step-6: Our model is ready.
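The Euclidean distance in Step-2 can be computed directly, as this small sketch shows:

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

# classic 3-4-5 right triangle
print(euclidean([0, 0], [3, 4]))    # 5.0
```

scikit-learn's `KNeighborsClassifier` performs this distance computation internally for every test point against the stored training set.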
Program:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
import pandas as pd
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()
iris_data = iris.data
iris_labels = iris.target
x_train, x_test, y_train, y_test = train_test_split(iris_data, iris_labels, test_size=0.2, random_state=0)
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
print("Confusion matrix is as follows")
print(confusion_matrix(y_test, y_pred))
print("accuracy is")
print(classification_report(y_test, y_pred))
Output:
accuracy is
Result:
Thus the program to implement the k-Nearest Neighbour algorithm to classify the Iris dataset has been
executed successfully and the output verified.
Viva Questions:
1.Which of the following is a goal of clustering algorithms?
🗸 Classification
🗸 Regression
🗸 Dimensionality reduction
🗸 Grouping similar data points together
2. Which clustering algorithm is based on the concept of centroids?
🗸 K-Means
🗸 DBSCAN
🗸 Agglomerative
🗸 Mean-Shift
3.Which of the following is finally produced by Hierarchical Clustering?
🗸 final estimate of cluster centroids
🗸 tree showing how close things are to each other
🗸 assignment of each point to clusters
🗸 all of the mentioned
4. Which of the following clustering requires merging approach?
🗸 Partitional
🗸 Hierarchical
🗸 Naive Bayes
🗸 None of the mentioned
5.Which of the following is true for clustering
🗸 Clustering is a technique used to group similar objects into clusters.
🗸 partition data into groups
🗸 dividing entire data, based on patterns in data
🗸 All of the above
Practice Exercise :
1. Implement an application that predicts customer segments and classifies customer requirements using
clustering algorithms.
2. Write a program to implement the k-means clustering algorithm.
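Exercise 2 can be started from a minimal k-means sketch such as this (Iris is used as a stand-in dataset):

```python
# Minimal k-means clustering sketch with scikit-learn (Iris as a stand-in dataset)
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X, _ = load_iris(return_X_y=True)

# k-means alternates assigning points to the nearest centroid and recomputing centroids
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(kmeans.cluster_centers_.shape)    # three centroids in the 4-D feature space
```

Since clustering is unsupervised, the cluster indices are arbitrary; comparing them against the true species labels requires a mapping step.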
Aim:
To implement the EM algorithm for clustering networks using the given dataset.
Algorithm:
Step 1 :Initialize θ randomly Repeat until convergence:
Step 2:E-step: Compute q(h) = P(H = h | E = e; θ) for each h (probabilistic inference)
Step 3:Create fully -observed weighted examples: (h, e) with weight q(h)
Step 4:M-step: Maximum likelihood (count and normalize) on weighted examples to get θ
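The E-step and M-step above can be seen concretely in a one-dimensional, two-component Gaussian mixture (a toy sketch with assumed parameters, independent of the lab program below):

```python
import numpy as np

# Synthetic data from two well-separated 1-D Gaussians
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])

mu = np.array([-1.0, 1.0])      # Step 1: random-ish initial parameters
sigma = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])
for _ in range(50):
    # E-step: responsibilities q(h) of each component for each point
    dens = np.exp(-0.5 * ((data[:, None] - mu) / sigma) ** 2) / sigma
    resp = pi * dens
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: weighted maximum-likelihood updates (count and normalize)
    nk = resp.sum(axis=0)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk)
    pi = nk / len(data)
print(np.round(np.sort(mu), 1))
```

The recovered means should approach the true component means (0 and 5); `GaussianMixture` in the program below runs the same iteration in multiple dimensions.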
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets, preprocessing
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
dataset = datasets.load_iris()
X=pd.DataFrame(dataset.data)
X.columns=['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y=pd.DataFrame(dataset.target)
y.columns=['Targets']
# print(X)
plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])
# REAL PLOT
plt.subplot(1,3,1)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y.Targets],s=40)
plt.title('Real')
# K-PLOT
plt.subplot(1,3,2)
model=KMeans(n_clusters=3)
model.fit(X)
predY=np.choose(model.labels_,[0,1,2]).astype(np.int64)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[predY],s=40)
plt.title('KMeans')
# GMM PLOT
scaler=preprocessing.StandardScaler()
scaler.fit(X)
xsa=scaler.transform(X)
xs=pd.DataFrame(xsa,columns=X.columns)
gmm=GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm=gmm.predict(xs)
plt.subplot(1,3,3)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm],s=40)
plt.title('GMM Classification')
Output:
Result:
Thus the program to implement the EM algorithm for clustering on the given dataset has been executed
successfully and the output verified.
Viva Questions:
Practice Exercise :
1. Write a Python program that uses the EM algorithm to learn the parameters of a Bayesian network with the pgmpy library.
2. Write EM code to model heart disease and implement it using a Bayesian network.
Aim:
To implement the neural network model for the given dataset.
Algorithm:
Step-1 Image Acquisition: The first step is to acquire images of paper documents with the help of
optical scanners. This way, an original image can be captured and stored.
Step-2: Pre-processing: The noise level on an image should be optimized and areas outside the text removed.
Pre-processing is especially vital for recognizing handwritten documents that are more sensitive to noise.
Step-3: Segmentation: The process of segmentation is aimed at grouping characters into meaningful chunks.
There can be predefined classes for characters. So, images can be scanned for patterns that match the classes.
Step-4: Feature Extraction: This step means splitting the input data into a set of features, that is, finding the
essential characteristics that make one or another pattern recognizable.
Step-5: Classification: The extracted features are fed to a classifier (here, a neural network) that assigns each
character to a class.
Step-6: Post-processing: This stage is the process of refinement, as an OCR model can require some corrections.
However, it isn't possible to achieve 100% recognition accuracy; the identification of characters heavily
depends on the context.
Program:
from __future__ import print_function
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers import Conv2D, MaxPooling2D, Flatten
from keras.optimizers import RMSprop, SGD
from keras.optimizers import Adam
from keras.utils import np_utils
from emnist import list_datasets
from emnist import extract_training_samples
from emnist import extract_test_samples
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
np.random.seed(1671) # for reproducibility
# network and training
NB_EPOCH = 30
BATCH_SIZE = 256
VERBOSE = 2
NB_CLASSES = 256 # size of the output layer (EMNIST byclass itself has 62 classes)
OPTIMIZER = Adam()
N_HIDDEN = 512
VALIDATION_SPLIT=0.2 # how much TRAIN is reserved for VALIDATION
DROPOUT = 0.20
print(list_datasets())
X_train, y_train = extract_training_samples('byclass')
print("train shape: ", X_train.shape)
print("train labels: ",y_train.shape)
X_test, y_test = extract_test_samples('byclass')
print("test shape: ",X_test.shape)
print("test labels: ",y_test.shape)
#for indexing from 0
y_train = y_train-1
y_test = y_test-1
RESHAPED = X_train.shape[1] * X_train.shape[2] # 28 x 28 = 784 input features
X_train = X_train.reshape(len(X_train), RESHAPED)
X_test = X_test.reshape(len(X_test), RESHAPED)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# normalize
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, NB_CLASSES)
Y_test = np_utils.to_categorical(y_test, NB_CLASSES)
# N_HIDDEN units in the first hidden layer, followed by three 256-unit layers
# NB_CLASSES outputs
# final stage is softmax
model = Sequential()
model.add(Dense(N_HIDDEN, input_shape=(RESHAPED,)))
model.add(Activation('relu'))
model.add(Dropout(DROPOUT))
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(DROPOUT))
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(DROPOUT))
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(DROPOUT))
model.add(Dense(NB_CLASSES))
model.add(Activation('softmax'))
model.summary()
model.compile(loss='categorical_crossentropy',
              optimizer=OPTIMIZER,
              metrics=['accuracy'])
history = model.fit(X_train, Y_train,
                    batch_size=BATCH_SIZE, epochs=NB_EPOCH,
                    verbose=VERBOSE, validation_split=VALIDATION_SPLIT)
score = model.evaluate(X_test, Y_test, verbose=VERBOSE)
print("\nTest score:", score[0])
print('Test accuracy:', score[1])
# list all data in history
print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
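The label preprocessing in the program above turns integer class labels into one-hot vectors via np_utils.to_categorical. The same transform can be sketched in plain NumPy; the toy labels below stand in for the real EMNIST ones:

```python
import numpy as np

def to_one_hot(y, num_classes):
    """Convert integer labels to a (len(y), num_classes) one-hot matrix."""
    out = np.zeros((len(y), num_classes), dtype='float32')
    out[np.arange(len(y)), y] = 1.0   # set the column matching each label
    return out

# toy labels standing in for y_train
y = np.array([0, 2, 1])
Y = to_one_hot(y, 4)
print(Y)
```

Each row has exactly one 1.0, in the column of its class, which is the form categorical_crossentropy expects.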
Output:
['balanced', 'byclass', 'bymerge', 'digits', 'letters', 'mnist']
train shape: (697932, 28, 28)
train labels: (697932,)
test shape: (116323, 28, 28)
test labels: (116323,)
697932 train samples
116323 test samples
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 512)               401920
activation (Activation)      (None, 512)               0
dropout (Dropout)            (None, 512)               0
dense_1 (Dense)              (None, 256)               131328
activation_1 (Activation)    (None, 256)               0
dropout_1 (Dropout)          (None, 256)               0
dense_2 (Dense)              (None, 256)               65792
activation_2 (Activation)    (None, 256)               0
dropout_2 (Dropout)          (None, 256)               0
dense_3 (Dense)              (None, 256)               65792
activation_3 (Activation)    (None, 256)               0
dropout_3 (Dropout)          (None, 256)               0
dense_4 (Dense)              (None, 256)               65792
activation_4 (Activation)    (None, 256)               0
=================================================================
Total params: 730,624
Trainable params: 730,624
Non-trainable params: 0
Result:
Thus the program to implement the neural network model for the given dataset has been executed successfully and the output verified.
Viva Questions:
1. Why do we need biological neural networks?
🗸 To make smart human interactive & user friendly system
🗸 To apply heuristic search methods to find solutions of problem
🗸 To solve tasks like machine vision & natural language processing
🗸 All of the above
2. Artificial neural network is used for
🗸 Classification
🗸 Clustering
🗸 Pattern recognition
🗸 All of the above
3. Artificial Neural Network is based on which approach?
🗸 Weak Artificial Intelligence approach
🗸 Cognitive Artificial Intelligence approach
🗸 Strong Artificial Intelligence approach
🗸 Applied Artificial Intelligence approach
4. A Neural Network can answer
🗸 For Loop questions
🗸 what-if questions
🗸 If-Then-Else analysis questions
🗸 None of the mentioned
5. The first neural network computer:
🗸 AM
🗸 AN
🗸 RFD
🗸 SNARC
Practice Exercise :
1. Develop a neural network model that recognises patterns in the given information.
2. Write Python code to scale data for a long short-term memory (LSTM) network.
Aim:
To implement and build a Convolutional neural network model which predicts the age and gender of a
person using the given pre-trained models.
Algorithm:
Steps in CNN Algorithm:
Step-1: Choose the Dataset.
Step-2: Prepare the Dataset for training.
Step-3: Create training Data.
Step-4: Shuffle the Dataset.
Step-5: Assigning Labels and Features.
Step-6: Normalising X and converting labels to categorical data.
Step-7: Split X and Y for use in CNN.
Step-8: Define, compile and train the CNN Model.
Step-9: Accuracy and Score of the model.
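Steps 4 to 7 above (shuffling, normalising, one-hot encoding the labels, and splitting) can be sketched with NumPy alone; the toy arrays below stand in for the real image data and labels:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy stand-ins for image data (pixel values 0-255) and integer labels
X = rng.integers(0, 256, size=(10, 8, 8)).astype('float32')
y = rng.integers(0, 3, size=10)

# Step 4: shuffle data and labels with the same permutation
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]

# Step 6: normalise X to [0, 1] and one-hot encode the labels
X /= 255.0
Y = np.eye(3, dtype='float32')[y]

# Step 7: hold out the last 20% for validation
split = int(0.8 * len(X))
X_train, X_val = X[:split], X[split:]
Y_train, Y_val = Y[:split], Y[split:]
print(X_train.shape, X_val.shape)   # (8, 8, 8) (2, 8, 8)
```

Shuffling data and labels with one shared permutation keeps each image paired with its label; in the program itself the Keras validation_split argument performs the hold-out.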
Program:
import cv2 as cv
import time
from google.colab.patches import cv2_imshow

def getFaceBox(net, frame, conf_threshold=0.7):
    frameOpencvDnn = frame.copy()
    frameHeight = frameOpencvDnn.shape[0]
    frameWidth = frameOpencvDnn.shape[1]
    blob = cv.dnn.blobFromImage(frameOpencvDnn, 1.0, (300, 300), [104, 117, 123], True, False)
    net.setInput(blob)
    detections = net.forward()
    bboxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > conf_threshold:
            x1 = int(detections[0, 0, i, 3] * frameWidth)
            y1 = int(detections[0, 0, i, 4] * frameHeight)
            x2 = int(detections[0, 0, i, 5] * frameWidth)
            y2 = int(detections[0, 0, i, 6] * frameHeight)
            bboxes.append([x1, y1, x2, y2])
            cv.rectangle(frameOpencvDnn, (x1, y1), (x2, y2), (0, 255, 0),
                         int(round(frameHeight/150)), 8)
    return frameOpencvDnn, bboxes
faceProto = "/content/opencv_face_detector.pbtxt"
faceModel = "/content/opencv_face_detector_uint8.pb"
ageProto = "/content/age_deploy.prototxt"
ageModel = "/content/age_net.caffemodel"
genderProto = "/content/gender_deploy.prototxt"
genderModel = "/content/gender_net.caffemodel"
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
ageList = ['(0-2)', '(4-6)', '(8-12)', '(15-20)', '(25-32)', '(38-43)', '(48-53)', '(60-100)']
genderList = ['Male', 'Female']
ageNet = cv.dnn.readNet(ageModel, ageProto)
genderNet = cv.dnn.readNet(genderModel, genderProto)
faceNet = cv.dnn.readNet(faceModel, faceProto)
padding = 20  # padding was undefined in the original listing; 20 px of context around each face is a typical choice

def age_gender_detector(frame):
    # Read frame
    t = time.time()
    frameFace, bboxes = getFaceBox(faceNet, frame)
    for bbox in bboxes:
        # print(bbox)
        # crop the face with padding, clamped to the image borders
        face = frame[max(0, bbox[1]-padding):min(bbox[3]+padding, frame.shape[0]-1),
                     max(0, bbox[0]-padding):min(bbox[2]+padding, frame.shape[1]-1)]
        blob = cv.dnn.blobFromImage(face, 1.0, (227, 227), MODEL_MEAN_VALUES, swapRB=False)
        genderNet.setInput(blob)
        genderPreds = genderNet.forward()
        gender = genderList[genderPreds[0].argmax()]
        # print("Gender Output : {}".format(genderPreds))
        print("Gender : {}, conf = {:.3f}".format(gender, genderPreds[0].max()))
        ageNet.setInput(blob)
        agePreds = ageNet.forward()
        age = ageList[agePreds[0].argmax()]
        print("Age Output : {}".format(agePreds))
        print("Age : {}, conf = {:.3f}".format(age, agePreds[0].max()))
        label = "{},{}".format(gender, age)
        cv.putText(frameFace, label, (bbox[0], bbox[1]-10), cv.FONT_HERSHEY_SIMPLEX, 0.8,
                   (0, 255, 255), 2, cv.LINE_AA)
    return frameFace

from google.colab import files
uploaded = files.upload()
image = cv.imread("2.jpg")
output = age_gender_detector(image)
cv2_imshow(output)
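The face crop inside age_gender_detector pads each detected box and clamps it to the image borders so the slice never leaves the frame. That clamping arithmetic can be sketched on its own; the frame size and box values below are made up for illustration:

```python
def padded_box(bbox, frame_h, frame_w, padding=20):
    """Expand [x1, y1, x2, y2] by `padding` pixels, clamped to the frame."""
    x1, y1, x2, y2 = bbox
    top = max(0, y1 - padding)                 # never above row 0
    bottom = min(y2 + padding, frame_h - 1)    # never below the last row
    left = max(0, x1 - padding)                # never left of column 0
    right = min(x2 + padding, frame_w - 1)     # never right of the last column
    return top, bottom, left, right

# a box near the top-left corner of a 480x640 frame
print(padded_box([10, 5, 100, 120], 480, 640))   # (0, 140, 0, 120)
```

Without the max/min clamps, a face near an edge would produce negative or out-of-range slice indices and an empty or wrapped crop.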
Output:
Result:
Thus the program to implement and build a convolutional neural network model which predicts the age and gender of a person using the given pre-trained models has been executed successfully and the output verified.
Viva Questions:
1. ________ computes the output volume by computing the dot product between all filters and the image patch
🗸 Input Layer
🗸 Convolution Layer
🗸 Pool Layer
🗸 Activation Function Layer
2. _____ is/are the ways to represent uncertainty
🗸 Fuzzy logic
🗸 Entropy
🗸 Probability
🗸 All of the above
Practice Exercise :
1. Write a Python program to build a deep neural network model using the Keras library (a multi-layer perceptron (MLP) model for multi-class classification).