UNIT-1 Introduction To Artificial Intelligence: Mrs - Harsha Patil, Dr.D.Y.Patil ACS College, Pimpri, Pune
INTRODUCTION TO
ARTIFICIAL INTELLIGENCE
Year 1966: The researchers emphasized developing algorithms which can solve
mathematical problems. Joseph Weizenbaum created the first chatbot in 1966,
which was named ELIZA.
Year 1972: The first intelligent humanoid robot was built in Japan; it was named
WABOT-1.
A boom of AI (1980-1987)
Year 1980: After the AI winter, AI came back with "Expert Systems". Expert
systems were programs that emulate the decision-making ability of a human
expert.
In the year 1980, the first national conference of the American Association for
Artificial Intelligence (AAAI) was held at Stanford University.
Year 1997: In 1997, IBM's Deep Blue beat world chess champion Garry Kasparov,
becoming the first computer to beat a world chess champion.
Year 2002: For the first time, AI entered the home in the form of Roomba, a robotic
vacuum cleaner.
Year 2006: By 2006, AI had entered the business world. Companies like
Facebook, Twitter, and Netflix started using AI.
2. Speech Recognition
In the 1990s, computer speech recognition reached a practical level for limited
purposes. For example, United Airlines replaced its keyboard tree for flight
information with a system using speech recognition of flight numbers and city
names, which is quite convenient. On the other hand, while it is possible to
instruct some computers using speech, most users have gone back to the keyboard
and the mouse as still more convenient.
Branches of AI
Ordinary Problems
1.Perception
➢ Vision
➢ Voice Recognition
➢ Speech Recognition
2.Natural Language
➢ Understanding
➢ Generation
➢ Translation
3.Robot Control
Formal Problems
➢ Game Playing
➢ Solving complex mathematical problems
Expert Problems
➢ Design
➢ Fault Finding
➢ Scientific Analysis
➢ Medical Diagnosis
➢ Financial Analysis
Solution 3
Liters in 4-Liter Jug   Liters in 3-Liter Jug   Rule Applied
0                       0                       (start)
4                       0                       1
1                       3                       8
0                       3                       3
3                       0                       5
3                       3                       2
4                       2                       7
0                       2                       3
2                       0                       5
Water Jug Problem of 8,5 and 3 Ltr Jug
The following is a problem which can be solved using the state
space search technique. "We have 3 jugs of capacities 3, 5, and
8 liters respectively. There is no scale on the jugs, so it is only
their capacities that we certainly know. Initially the 8-liter jug
is full of water and the other two are empty. We can pour water
from one jug to another, and the goal is to have exactly 4 liters
of water in any of the jugs. There is no scale on the jugs and we
do not have any other tools that would help. The amount of
water in the other two jugs at the end is irrelevant."
Formalize the above problem as state space search . You should
1. Suggest suitable representation of the problem
2. State the initial and goal state of this problem
3. Specify the production rules for getting from one state to
another
Water Jug Problem of 8,5 and 3 Ltr Jug
Solution:-
The state space for this problem can be defined as a triple (x, y, z), where:
x represents the number of liters of water in the 8-liter jug,
y represents the number of liters of water in the 5-liter jug,
z represents the number of liters of water in the 3-liter jug.
Therefore, x = 0, 1, 2, 3, 4, 5, 6, 7 or 8;
y = 0, 1, 2, 3, 4 or 5;
z = 0, 1, 2 or 3.
The initial state is (8, 0, 0). The goal is to get 4 liters of water into any jug;
since the 3-liter jug cannot hold 4 liters, the goal state can be written as
(4, y, z) or (x, 4, z) for any values of the other two jugs. A minimal search
sketch over this state space follows.
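As a sketch of how this state space can be searched programmatically: a minimal breadth-first search, assuming the capacities (8, 5, 3), the (x, y, z) representation above, and pouring as the only operator.

from collections import deque

CAPACITIES = (8, 5, 3)

def successors(state):
    # all states reachable by pouring jug i into jug j until i is empty or j is full
    result = []
    for i in range(3):
        for j in range(3):
            if i != j and state[i] > 0:
                amount = min(state[i], CAPACITIES[j] - state[j])
                if amount > 0:
                    new = list(state)
                    new[i] -= amount
                    new[j] += amount
                    result.append(tuple(new))
    return result

def solve(start=(8, 0, 0), target=4):
    # breadth-first search: shortest pouring sequence putting `target` liters in some jug
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if target in path[-1]:
            return path
        for nxt in successors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])

print(solve())   # one shortest sequence of states from (8, 0, 0) to a state containing 4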
Missionaries and Cannibals Problem: the initial state (i, j) is (3, 3), i.e., three
missionaries and three cannibals on side A of the river, and (0, 0) on side B of the river.
[Figure: 8-puzzle search tree, expanding board states from the initial configuration
(2 8 3 / 1 6 4 / 7 _ 5) toward the goal configuration (1 2 3 / 8 _ 4 / 7 6 5).]
Actions: It gives the description of all the actions available to the agent.
Solution: An action sequence which leads from the start node to the goal node.
Optimal Solution: A solution that has the lowest cost among all solutions.
Breadth-first Search
Depth-first Search
Depth-limited Search
Iterative deepening depth-first search
Uniform cost search
Bidirectional Search
Issues in the design of search programs:
1. Breadth-first Search:
Advantages:
✓ BFS will provide a solution if any solution exists.
✓ If there is more than one solution for a given problem, BFS will provide the
minimal solution, i.e., the one requiring the fewest steps.
Example:
In the tree structure below, we show the traversal of the tree using the BFS
algorithm from root node S to goal node K. BFS traverses level by level, so it
follows the path shown by the dotted arrow, and the traversed path will be:
S ---> A ---> B ---> C ---> D ---> G ---> H ---> E ---> F ---> I ---> K
A minimal code sketch follows.
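A minimal Python sketch of BFS, assuming an illustrative tree whose child ordering reproduces the visit order above (the real tree is in the lost figure):

from collections import deque

# illustrative tree, keyed by node -> list of children
tree = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['G', 'H'],
        'C': ['E', 'F'], 'D': [], 'G': ['I'], 'H': ['K'],
        'E': [], 'F': [], 'I': [], 'K': []}

def bfs(start, goal):
    # expand nodes level by level using a FIFO queue
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        if node == goal:
            return order
        queue.extend(tree[node])
    return None

print(bfs('S', 'K'))   # ['S', 'A', 'B', 'C', 'D', 'G', 'H', 'E', 'F', 'I', 'K']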
2. Depth-first Search:
Advantages:
✓ DFS requires less memory, as it only needs to store a stack of the nodes on
the path from the root node to the current node.
✓ It takes less time to reach the goal node than the BFS algorithm (if it
traverses the right path).
Example:
In the search tree below, we show the flow of depth-first search, which
follows this order:
It starts searching from root node S and traverses A, then B, then D and E.
After traversing E, it backtracks the tree, as E has no other successor and the
goal node is still not found. After backtracking, it traverses node C and then
G, where it terminates, as it has found the goal node. A minimal code sketch follows.
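A minimal recursive DFS sketch; the tree here is illustrative, chosen so the visit order matches the description above:

def dfs(tree, node, goal, visited=None):
    # recursive depth-first search; returns the visit order up to the goal
    if visited is None:
        visited = []
    visited.append(node)
    if node == goal:
        return visited
    for child in tree.get(node, []):
        result = dfs(tree, child, goal, visited)
        if result:
            return result
    return None   # dead end: backtrack to try the next sibling

tree = {'S': ['A', 'C'], 'A': ['B'], 'B': ['D', 'E'], 'C': ['G']}
print(dfs(tree, 'S', 'G'))   # ['S', 'A', 'B', 'D', 'E', 'C', 'G']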
Depth-first Search
Space Complexity: The DFS algorithm needs to store only a single path from the
root node, hence the space complexity of DFS is equivalent to the size of the
fringe set, which is O(bm), where b is the branching factor and m the maximum depth.
BFS vs DFS:
• BFS stands for Breadth First Search; DFS stands for Depth First Search.
• BFS is more suitable for searching vertices closer to the given source; DFS is
more suitable when the solutions are away from the source.
• The time complexity of both BFS and DFS is O(V + E) when an adjacency list is
used and O(V^2) when an adjacency matrix is used, where V stands for vertices
and E stands for edges.
3. Depth-limited Search:
Advantages:
✓ Depth-limited search is memory efficient.
4. Uniform Cost Search:
Disadvantages:
✓ It does not care about the number of steps involved in the search and is
concerned only with path cost, due to which this algorithm may get stuck in an
infinite loop.
Time Complexity:
Let C* be the cost of the optimal solution and ε the least cost of a step toward
the goal. Then the number of steps is C*/ε + 1 (we add 1 because we start from
state 0 and end at C*/ε). Hence, the worst-case time complexity of uniform cost
search is O(b^(1 + [C*/ε])).
Space Complexity:
The same logic applies to space, so the worst-case space complexity of
uniform cost search is O(b^(1 + [C*/ε])).
Optimal:
Uniform-cost search is always optimal as it only selects a path with the lowest
path cost.
5. Iterative Deepening Depth-first Search:
Disadvantages:
✓The main drawback of IDDFS is that it repeats all the work of the
previous phase.
Completeness:
This algorithm is complete if the branching factor is finite.
Time Complexity:
Let b be the branching factor and d the depth of the shallowest goal; then the
worst-case time complexity is O(b^d).
Space Complexity:
The space complexity of IDDFS is O(bd).
Optimal:
The IDDFS algorithm is optimal if path cost is a non-decreasing function of
the depth of the node.
6. Bidirectional Search:
Advantages:
✓Bidirectional search is fast.
✓Bidirectional search requires less memory
Example:
In the below search tree, bidirectional search algorithm is applied. This
algorithm divides one graph/tree into two sub-graphs. It starts traversing
from node 1 in the forward direction and starts from goal node 16 in the
backward direction.
The algorithm terminates at node 9 where two searches meet.
Best-first Search (Greedy Search):
Here h(n) is the heuristic (estimated) cost and h*(n) is the actual cost; for an
admissible heuristic, the heuristic cost should be less than or equal to the
actual cost.
The greedy best-first search algorithm always selects the path which appears
best at that moment. It combines depth-first search and breadth-first search,
using a heuristic function to guide the search, and so takes advantage of both
algorithms. With best-first search, at each step we can choose the most
promising node: we expand the node which appears closest to the goal, where
the closeness is estimated by the heuristic function, i.e.
f(n) = h(n).
Best-first Search
Where h(n) = estimated cost from node n to the goal.
The greedy best-first algorithm is implemented using a priority queue.
Disadvantages:
✓ It can behave as an unguided depth-first search in the worst-case scenario.
✓ It can get stuck in a loop, like DFS.
✓ This algorithm is not optimal.
Example:
Consider the search problem below, which we traverse using greedy best-first
search. At each iteration, the node to expand is chosen using the evaluation
function f(n) = h(n), given in the table below. A minimal code sketch follows.
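A minimal sketch of greedy best-first search using a priority queue ordered by h(n); the graph and heuristic table are illustrative stand-ins for the lost figure:

import heapq

def greedy_best_first(graph, h, start, goal):
    # always expand the node with the smallest heuristic value f(n) = h(n)
    frontier = [(h[start], start, [start])]
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbour in graph.get(node, []):
            heapq.heappush(frontier, (h[neighbour], neighbour, path + [neighbour]))
    return None

graph = {'S': ['A', 'B'], 'B': ['E', 'F'], 'F': ['I', 'G']}
h = {'S': 13, 'A': 12, 'B': 4, 'E': 8, 'F': 2, 'I': 9, 'G': 0}
print(greedy_best_first(graph, h, 'S', 'G'))   # ['S', 'B', 'F', 'G']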
Algorithm of A* search:
Step1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not; if the list is empty, then
return failure and stop.
Step 3: Select the node from the OPEN list which has the smallest value of the
evaluation function (g + h). If node n is the goal node, then return success and
stop; otherwise go to Step 4.
Step 4: Expand node n and generate all of its successors, and put n into the
closed list. For each successor n', check whether n' is already in the OPEN
or CLOSED list, if not then compute evaluation function for n' and place
into Open list.
Advantages:
A* performs better than other search algorithms.
A* is optimal and complete.
It can solve very complex problems.
Disadvantages:
It does not always produce the shortest path, as it relies mostly on heuristics
and approximation.
A* has some complexity issues.
The main drawback of A* is its memory requirement: it keeps all generated
nodes in memory, so it is not practical for many large-scale problems.
A* Search
Example:
In this example, we will traverse the given graph using the A* algorithm.
The heuristic value of all states is given in the below table so we will
calculate the f(n) of each state using the formula f(n)= g(n) + h(n), where
g(n) is the cost to reach any node from start state.
Here we will use OPEN and CLOSED list.
Points to remember:
➢ A* returns the first path found and does not search all remaining paths.
➢ The efficiency of the A* algorithm depends on the quality of the heuristic.
➢ A* expands all nodes which satisfy the condition f(n) < C*, where C* is the
cost of the optimal solution. A minimal code sketch follows.
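A minimal sketch of A* with OPEN as a priority queue ordered by f(n) = g(n) + h(n) and CLOSED as a set; the weighted graph and heuristic values are illustrative:

import heapq

def a_star(graph, h, start, goal):
    # graph: node -> list of (neighbour, edge_cost)
    open_list = [(h[start], 0, start, [start])]
    closed = set()
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        if node in closed:
            continue
        closed.add(node)
        for neighbour, cost in graph.get(node, []):
            if neighbour not in closed:
                heapq.heappush(open_list,
                               (g + cost + h[neighbour], g + cost,
                                neighbour, path + [neighbour]))
    return None

graph = {'S': [('A', 1), ('G', 10)], 'A': [('B', 2), ('C', 1)],
         'B': [('D', 5)], 'C': [('G', 4)], 'D': [('G', 2)]}
h = {'S': 5, 'A': 3, 'B': 4, 'C': 2, 'D': 6, 'G': 0}
print(a_star(graph, h, 'S', 'G'))   # (['S', 'A', 'C', 'G'], 6)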
✓ AND-OR graphs are useful for problems whose solution involves decomposing
the problem into smaller problems; this is called Problem Reduction.
✓ Here, alternatives involve branches where some or all must be satisfied
before we can progress.
✓ As in the A* algorithm, we use the OPEN list to hold nodes that have been
generated but not expanded, and the CLOSED list to hold nodes that have been
expanded.
✓ It requires that nodes traversed in the tree be labelled SOLVED or
UNSOLVED in the solution process, to account for AND-node solutions, which
require solutions to all successor nodes.
✓ A solution is found when the start node is labelled SOLVED.
✓ AO* is a best-first algorithm for solving AND-OR graphs.
[Figure: AND-OR graph example for the goal of acquiring a TV set, showing AND and OR arcs.]
Disadvantages:
✓ Sometimes, for unsolvable nodes, it cannot find the optimal path. Its
complexity is higher than that of other algorithms.
In the figure, the top node A has been expanded, producing two arcs, one leading
to B and one leading to C and D. The numbers at each node represent the heuristic
cost h at that node (the cost of getting to the goal state from the current
state). For simplicity, it is assumed that every operation (i.e., applying a
rule) has unit cost, so each arc adds a cost of 1 for each of its components.
With the information available so far, C appears to be the most promising node
to expand, since its h = 3 is the lowest; but going through B would be better,
since to use C we must also use D, and that cost would be 9 (3 + 4 + 1 + 1),
while through B it would be 6 (5 + 1).
Thus the choice of the next node to expand depends not only on the h value but
also on whether that node is part of the current best path from the initial node.
In the figure, node G appears to be the most promising node, with the
least f′ value. But G is not on the current best path, since to use G we
must use the arc G-H with a cost of 9, and again this demands that further
arcs be used (with a cost of 27). The path from A through B and E-F is
better, with a total cost of 18 (17 + 1).
2. Pick one of these unexpanded nodes and expand it. Add its
successors to the graph and compute f′ (the estimate of the remaining
distance) for each of them.
Hill climbing is also called greedy local search, as it only looks to its good
immediate neighbour state and not beyond that.
Step 1: Evaluate the initial state, if it is goal state then return success and
Stop.
Step 2: Loop Until a solution is found or there is no new operator left to
apply.
Step 3: Select and apply an operator to the current state.
Step 4: Check new state:
a. If it is goal state, then return success and quit.
b. Else if it is better than the current state then assign new state as a
current state.
c. Else, if it is not better than the current state, then return to Step 2.
Step 5: Exit.
Step 1: Evaluate the initial state, if it is goal state then return success
and stop, else make current state as initial state.
Step 2: Loop until a solution is found or the current state does not
change
Stochastic hill climbing does not examine all its neighbours before moving.
Instead, this search algorithm selects one neighbour node at random and
decides whether to move to it or to examine another state. A minimal sketch
of simple hill climbing follows.
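A minimal sketch of simple hill climbing on a one-dimensional toy objective (the objective function and step size are illustrative):

def simple_hill_climbing(objective, start, step=1, max_iter=1000):
    # Step 1: make the start state the current state
    current = start
    for _ in range(max_iter):
        moved = False
        # Step 3: apply an operator (move one step right or left)
        for neighbour in (current + step, current - step):
            # Step 4b: if the neighbour is better, make it the current state
            if objective(neighbour) > objective(current):
                current = neighbour
                moved = True
                break
        if not moved:
            break   # no better neighbour: a (local or global) maximum
    return current

# toy objective with a single maximum at x = 7
print(simple_hill_climbing(lambda x: -(x - 7) ** 2, start=0))   # 7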
Hill Climbing
Problems in Hill Climbing Algorithm:
[Figure: search tree for the 8-puzzle problem by the hill climbing procedure;
each board state is labelled with its heuristic value, descending from H = 4 at
the start to H = 0 at the goal state (1 2 3 / 8 _ 4 / 7 6 5).]
✓ We have studied strategies which can reason either forward or backward, but a
mixture of the two directions is often appropriate for solving complex and
large problems. Such a mixed strategy makes it possible to solve the major
parts of a problem first and then go back and solve the smaller problems that
arise while combining the big parts. This technique is called Means-Ends Analysis.
✓ Means-Ends Analysis is a problem-solving technique used in Artificial
Intelligence for limiting search in AI programs.
✓ It is a mixture of backward and forward search techniques.
✓ The MEA technique was first introduced in 1961 by Allen Newell and
Herbert A. Simon in their problem-solving computer program, named the
General Problem Solver (GPS).
✓ The MEA process is centered on the evaluation of the difference
between the current state and the goal state.
Means-Ends Analysis
How means-ends analysis Works:
➢First, evaluate the difference between Initial State and final State.
➢Select the various operators which can be applied for each difference.
➢Apply the operator at each difference, which reduces the difference
between the current state and goal state.
Solution:
To solve the above problem, we will first find the differences between initial
states and goal states, and for each difference, we will generate a new state
and will apply the operators. The operators we have for this problem are:
Move
Delete
Expand
1. Evaluating the initial state: In the first step, we evaluate the initial state and
compare the initial and goal states to find the differences between them.
2. Applying the Delete operator: The first difference is that the goal state has no
dot symbol, while the initial state does; so we first apply the Delete operator to
remove the dot.
3. Applying the Move operator: After applying the Delete operator, a new state
occurs, which we again compare with the goal state. Comparing these states
reveals another difference: the square is outside the circle; so we apply the
Move operator.
4. Applying the Expand operator: A new state is generated in the third step, and we
compare this state with the goal state. There is still one difference, the size of the
square; so we apply the Expand operator, and finally it generates the goal state.
START → 1. WALK(R1) → 2. PICKUP(A) → 3. PUTDOWN(A) → 4. PICKUP(B) →
5. PUTDOWN(B) → 6. PUSH(D,R2) → 7. WALK(R1) → 8. PICKUP(A) →
9. CARRY(A,R2) → 10. PUTDOWN(A) → 11. WALK(R1) → 12. PICKUP(B) →
13. CARRY(B,R2) → 14. PLACE(A,B) → GOAL
Eg.
Solution:
➢ From the first row of the multiplication it is clear that B = 1, as
JE * B = JE.
➢ In the multiplication, the second row must have 0 in the tens place,
so A = 0.
➢ Now, in the hundreds place, J + something = 10; adding something to a
single-digit number so that the result is 10 gives J = 9.
Solution:
From the first row of the multiplication, H = 1 is clear, as HE x H = HE.
Now, H + A = M, i.e., 1 + A = 10 + M, as there is a carry to the next level;
therefore A = 9, M = 0 and N = 2.
Now, HE * E = HHA, i.e., 1E * E = 119, so by trial and error we get E = 7.
Solution:
Set of variables Xi = {Pune, Mumbai, Nasik, Jalgaon, Nagpur}; set of
domains Di = {Red, Green, Blue} for each Xi.
Constraint: no adjacent cities have the same color.
Map or Graph Coloring Problem
City / Operation          Pune   Nasik   Mumbai   Nagpur   Jalgaon
Initial domain            RGB    RGB     RGB      RGB      RGB
Assign Red to Pune        R      GB      GB       RGB      RGB
Assign Green to Nasik     R      G       B        RG       RG
Assign Red to Nagpur      R      G       B        R        G
Assign Green to Jalgaon   R      G       B        R        G
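A minimal backtracking sketch of this CSP; the adjacency list below is an assumption chosen to reproduce the assignment in the table, so adjust it to the actual map:

# assumed adjacency between cities (illustrative, not from an actual map)
neighbours = {
    'Pune': ['Nasik', 'Mumbai'],
    'Nasik': ['Pune', 'Mumbai', 'Nagpur'],
    'Mumbai': ['Pune', 'Nasik'],
    'Nagpur': ['Nasik', 'Jalgaon'],
    'Jalgaon': ['Nagpur'],
}
colors = ['Red', 'Green', 'Blue']

def backtrack(assignment, cities):
    # assign a color to one city at a time, backtracking on conflicts
    if not cities:
        return assignment
    city, rest = cities[0], cities[1:]
    for color in colors:
        # constraint: no adjacent city may share a color
        if all(assignment.get(n) != color for n in neighbours[city]):
            result = backtrack({**assignment, city: color}, rest)
            if result:
                return result
    return None

print(backtrack({}, list(neighbours)))
# {'Pune': 'Red', 'Nasik': 'Green', 'Mumbai': 'Blue', 'Nagpur': 'Red', 'Jalgaon': 'Green'}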
Per indeed.com, the percentage growth of Python is 500 times more than that of its peer languages.
http://www.indeed.com/jobtrends?q=Perl%2C+.Net%2C+Python%2Cjava&l=&relative=1
Source: http://www.forbes.com/sites/louiscolumbus/2014/12/29/where-big-data-jobs-will-be-in-
2015/
Easy to Maintain ✓ Python code is easy to write and debug; part of Python's success is that its
source code is fairly easy to maintain.
Portable ✓ Python can run on a wide variety of operating systems and platforms, providing the
same interface on all of them.
Broad Standard Libraries ✓ Python comes with many prebuilt libraries (approximately 21K).
High-Level Programming ✓ Python is intended to make complex programming simpler; it deals with memory
addresses, garbage collection, etc. internally.
Interactive ✓ Python provides an interactive shell to test things before implementation, giving
the user a direct interface with Python.
Database Interfaces ✓ Python provides interfaces to all major commercial databases; these interfaces are
quite easy to use.
GUI Programming ✓ Python supports GUI applications and has frameworks for the web, with interfaces
to tkinter, wxPython, and Django.
Python 2's print statement has been replaced by the print() function.
The division of two integers returns a float instead of an integer; "//" can be
used to get the "old" behaviour. A small sketch of both changes follows.
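# Python 3: print is a function, not a statement
print("Hello")        # Python 2 equivalent: print "Hello"

# Python 3: integer division returns a float; // restores the old behaviour
print(7 / 2)          # 3.5
print(7 // 2)         # 3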
▪ String indexes start at 0 at the beginning of the string; negative indexes work
their way from -1 at the end.
Assuming a = 'Hello' and b = 'Python' (which the examples imply):
+ Concatenation: adds values on either side of the operator; a + b gives HelloPython
* Repetition: creates new strings by concatenating multiple copies of the same string; a * 2 gives HelloHello
[] Slice: gives the character at the given index; a[1] gives e, a[-1] gives o
[:] Range slice: gives the characters in the given range; a[1:4] gives ell
in Membership: returns True if a character exists in the given string; 'H' in a gives True
str.isalpha() Returns True if the string has at least 1 character and all characters are alphabetic,
and False otherwise.
str.isdigit() Returns True if string contains only digits and False otherwise.
str.lower() Converts all uppercase letters in string to lowercase.
str.upper() Converts lowercase letters in string to uppercase.
str.replace(old, new) Replaces all occurrences of old in string with new.
str.split(str=‘ ’) Splits string according to delimiter str (space if not provided) and returns list
of substrings.
str.strip() Removes all leading and trailing whitespace of string.
str.title() Returns "titlecased" version of string.
→ List comprehension
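A small illustrative list comprehension:

# build a list of squares of the even numbers 0-9 in one expression
squares = [n ** 2 for n in range(10) if n % 2 == 0]
print(squares)   # [0, 4, 16, 36, 64]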
A keyword is one that means something to the language; in other words, you can't
use a reserved word as the name of a variable, a function, a class, or a module.
All Python keywords contain lowercase letters only (except True, False, and None).
▪ Python Tuples are Immutable objects that cannot be changed once they have been
created.
▪ A tuple contains items separated by commas and enclosed in parentheses instead of square
brackets.
•Assume that you have an object and you want to assign a key to it to
make searching easy.
•To store the key/value pair, you can use a simple array-like data
structure, where keys (integers) can be used directly as an index to
store values.
•However, in cases where the keys are large and
cannot be used directly as an index, you should use
hashing.
▪ Python's dictionaries are kind of hash table type which consist of key-value pairs of
unordered elements.
• Keys : must be immutable data types ,usually numbers or strings.
• Values : can be any arbitrary Python object.
▪ Python Dictionaries are mutable objects that can change their values.
▪ A dictionary is enclosed by curly braces ({ }); each key is separated from its
value by a colon (:), and the items are separated by commas.
▪ Dictionary’s values can be assigned and accessed using square braces ([]) with a
key to obtain its value.
Method Description
dict.keys() Returns list of dict's keys
dict.values() Returns list of dict's values
dict.items() Returns a list of dict's (key, value) tuple pairs
dict.get(key, default=None) For key, returns value or default if key not in dict
dict.has_key(key) Returns True if key in dict, False otherwise (Python 2 only; in Python 3, use: key in dict)
dict.update(dict2) Adds dict2's key-values pairs to dict
dict.clear() Removes all elements of dict
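A small sketch of these operations (the keys and values are illustrative; note that in Python 3, keys() returns a view, so we wrap it in list()):

student = {'name': 'Asha', 'roll': 7}        # key-value pairs in curly braces
print(student['name'])                        # access by key -> Asha
student['marks'] = 92                         # assign a new pair via square brackets
print(student.get('grade', 'NA'))             # default returned when key absent -> NA
student.update({'grade': 'A'})                # merge another dict's key-value pairs
print(list(student.keys()))                   # ['name', 'roll', 'marks', 'grade']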
▪ continue :Causes the loop to skip the remainder of its body and immediately retest its
condition prior to reiterating.
▪ pass :Used when a statement is required syntactically but you do not want any
command or code to execute.
A function is a block of organized, reusable code that is used to perform a single, related action.
Functions provide better modularity for your application and a high degree of code reusing.
Defining a Function
• Function blocks begin with the keyword def, followed by the function name and
parentheses ( ).
• Any input parameters or arguments should be placed within these parentheses. You can also
define parameters inside these parentheses.
• The first statement of a function can be an optional statement - the documentation string of
the function or docstring.
• The code block within every function starts with a colon (:) and is indented.
• The statement return [expression] exits a function, optionally passing back an expression to
the caller. A return statement with no arguments is the same as return None.
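A minimal sketch tying these rules together (the function name and strings are illustrative):

def greet(name):
    """Return a greeting for the given name (this is the docstring)."""
    return 'Hello, ' + name    # return passes a value back to the caller

print(greet('Python'))          # Hello, Python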
▪ Function Arguments
You can call a function by using any of the following types of arguments:
• Required arguments: the arguments passed to the function in correct
positional order.
• Keyword arguments: the function call identifies the arguments by the
parameter names.
• Default arguments: the argument has a default value in the function
declaration used when the value is not provided in the function call.
• Variable-length arguments: This used when you need to process unspecified additional
arguments. An asterisk (*) is placed before the variable name in the function declaration.
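A small sketch showing the argument types (the function and parameter names are illustrative):

def book_ticket(passenger, seat='window', *extras):
    # passenger: required; seat: default; *extras: variable-length
    print(passenger, seat, extras)

book_ticket('Asha')                               # required argument only
book_ticket(seat='aisle', passenger='Ravi')       # keyword arguments
book_ticket('Meena', 'middle', 'meal', 'wifi')    # variable-length arguments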
EOFError: Raised when there is no input from either the raw_input() or input() function and the
end of file is reached.
ImportError: Raised when an import statement fails.
KeyboardInterrupt: Raised when the user interrupts program execution, usually by pressing Ctrl+C.
LookupError: Base class for all lookup errors.
IndexError: Raised when an index is not found in a sequence.
KeyError: Raised when the specified key is not found in the dictionary.
NameError: Raised when an identifier is not found in the local or global namespace.
UnboundLocalError: Raised when trying to access a local variable in a function or method but no
value has been assigned to it.
EnvironmentError: Base class for all exceptions that occur outside the Python environment.
IOError: Raised when an input/output operation fails, such as the print statement or the open()
function when trying to open a file that does not exist.
OSError: Raised for operating-system-related errors.
SyntaxError: Raised when there is an error in Python syntax.
IndentationError: Raised when indentation is not specified properly.
SystemError: Raised when the interpreter finds an internal problem; when this error is
encountered, the Python interpreter does not exit.
SystemExit: Raised when the Python interpreter is quit by using the sys.exit() function. If not
handled in the code, it causes the interpreter to exit.
TypeError: Raised when an operation or function is attempted that is invalid for the specified
data type.
ValueError: Raised when a built-in function for a data type receives arguments of the valid
type, but the arguments have invalid values.
RuntimeError: Raised when a generated error does not fall into any category.
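A small sketch of catching some of these exceptions (the failing conversion is illustrative):

try:
    value = int('abc')               # raises ValueError: invalid literal
except ValueError as err:
    print('Bad value:', err)
except (IOError, OSError):
    print('I/O or OS problem')
else:
    print('Conversion succeeded:', value)
finally:
    print('Runs whether or not an exception occurred')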
▪ A module is a file consisting of Python code that can define functions, classes and
variables.
▪ A module allows you to organize your code by grouping related code which makes the code
easier to understand and use.
▪ You can use any Python source file as a module by executing an import statement.
▪ Python's from statement lets you import specific attributes from a module into
the current namespace.
▪ The from module import * statement can be used to import all names from a module
into the current namespace.
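A small sketch of the three import forms, using the standard math module:

import math                  # import the whole module
from math import sqrt        # import a specific attribute
from math import *           # import all public names into the namespace

print(math.pi)               # 3.141592653589793
print(sqrt(16))              # 4.0
print(floor(2.9))            # 2, available via the * import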
→ Class variable
→ Class constructor
▪ Data Hiding: name attributes with a double underscore prefix, and those
attributes are then not directly visible to outsiders.
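The code slides for class variables, constructors, and data hiding are not preserved; a minimal sketch (the class and attribute names are illustrative):

class Student:
    count = 0                        # class variable, shared by all instances

    def __init__(self, name):        # class constructor
        self.name = name
        self.__secret = 'hidden'     # double underscore prefix: data hiding
        Student.count += 1

s = Student('Asha')
print(Student.count)                 # 1
print(s.name)                        # Asha
# print(s.__secret)                  # AttributeError: not directly visible
print(s._Student__secret)            # still reachable via name mangling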
[Figures: side-by-side Java and Python code comparisons for Hello World, string
operations, and collections.]
▪ Python IDEs
• Vim
• Eclipse with PyDev
• Sublime Text
• Emacs
• Komodo Edit
• PyCharm
2. Data Processing
The data received in the data acquisition layer is then sent to the data
processing layer, where it is subjected to advanced integration and processing;
this involves normalization of the data, data cleaning, transformation, and
encoding. The data processing also depends on the type of learning being used.
For example, if supervised learning is being used, the data will need to be
segregated into samples for training of the system; the data thus created is
called training sample data, or simply training data.
4. Execution
This stage in machine learning is where the experimentation is done, testing is
involved and tunings are performed. The general goal behind being to optimize the
algorithm in order to extract the required machine outcome and maximize the
system performance, The output of the step is a refined solution capable of
providing the required data for the machine to make decisions.
5. Deployment
Like any other software output, ML outputs need to be operationalized or
forwarded for further exploratory processing. The output can be considered as a
non-deterministic query which needs to be further deployed into the decision-
making system. It is advised to move the ML output seamlessly to production,
where it will enable the machine to make decisions directly based on the
output and reduce the dependency on further exploratory steps.
➢ Machine Learning Applications in Healthcare :-
Doctors and medical practitioners will soon be able to predict accurately how
long patients with fatal diseases will live. Medical systems will learn from
data and help patients save money by skipping unnecessary tests.
i) Drug Discovery/Manufacturing
ii) Personalized Treatment/Medication
4. Text Data :-
Text data is nothing but literals. The first step in handling text data is to
convert it into numbers, as our model is mathematical and needs data in the
form of numbers. To do so, we might use formulations such as bag-of-words.
3. Test Dataset: Often, when we try to make changes to the model based on the
output of the validation set, we unintentionally let the model peek into our
validation set; as a result, the model might overfit the validation set as
well. To overcome this issue, we keep a test dataset that is used only to test
the final output of the model, in order to confirm its accuracy.
Gathering Data
Data preparation
Data Wrangling
Analyze Data
Train the model
Test the model
Deployment
1. Gathering Data:
Data Gathering is the first step of the machine learning life cycle. The
goal of this step is to identify and obtain all data-related problems.
In this step, we need to identify the different data sources, as data can be
collected from various sources such as files, database, internet,
or mobile devices. It is one of the most important steps of the life cycle.
The quantity and quality of the collected data will determine the efficiency
of the output: the more data we have, the more accurate the prediction will be.
2. Data preparation :
After collecting the data, we need to prepare it for further steps. Data preparation is
a step where we put our data into a suitable place and prepare it to use in our
machine learning training.
In this step, first, we put all data together, and then randomize the ordering of data.
This step can be further divided into two processes:
Data exploration:
It is used to understand the nature of data that we have to work with. We need to
understand the characteristics, format, and quality of data.
A better understanding of data leads to an effective outcome. In this, we find
Correlations, general trends, and outliers.
Data pre-processing:
Now the next step is preprocessing of data for its analysis.
Data wrangling is the process of cleaning and converting raw data into a useable
format. It is the process of cleaning the data, selecting the variable to use, and
transforming the data in a proper format to make it more suitable for analysis in the
next step. It is one of the most important steps of the complete process. Cleaning of
data is required to address the quality issues.
It is not necessary that data we have collected is always of our use as some of the
data may not be useful. In real-world applications, collected data may have various
issues, including:
Missing Values
Duplicate data
Invalid data
Noise
Now the next step is to train the model. In this step we train our model to
improve its performance for a better outcome of the problem.
We use datasets to train the model using various machine learning
algorithms. Training a model is required so that it can understand the
various patterns, rules, and features.
6. Test Model :
Once our machine learning model has been trained on a given dataset,
then we test the model. In this step, we check for the accuracy of our model
by providing a test dataset to it.
Testing the model determines the percentage accuracy of the model as per
the requirement of project or problem.
1.Data Integration
2.Data Cleaning
3.Data Transformation
2. Data Cleaning :-
2.1 Dealing with Missing data :-
It is common to have some missing or null data in a real-world data set.
Most machine learning algorithms will not work with such data, so it becomes
important to deal with missing or null values. Some of the common measures
taken (sketched in code after this list) are:
Get rid of a column if plenty of its rows have null values.
Eliminate a row if plenty of its columns have null values.
Replace the missing value with the mean, median, or mode of that column,
depending on the data distribution in that column.
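A small pandas sketch of these measures (the toy DataFrame is illustrative):

import pandas as pd

df = pd.DataFrame({'age': [25, None, 40, 31],
                   'income': [50000, 62000, None, 58000]})

# option 1: drop rows that contain any null value
print(df.dropna())

# option 2: replace missing values with each column's mean
print(df.fillna(df.mean()))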
Outliers are observations with extreme values, far beyond the normal range of
values for that feature. For example, the very high salary of the CEO of a
company can be an outlier if we consider the salaries of the other regular
employees of the company.
Even a few outliers in a data set can contribute to poor accuracy of a machine
learning model. The common methods to detect and remove outliers are:
Standard Deviation
Box Plot
Box Plots:
Box plots are a graphical depiction of numerical data through their
quartiles. They are a very simple but effective way to visualize outliers. Think
of the lower and upper whiskers as the boundaries of the data
distribution: any data points that show above or below the whiskers can
be considered outliers or anomalous.
The concept of the Interquartile Range (IQR) is used to build the box
plot graphs. IQR is a concept in statistics that is used to measure the
statistical dispersion and data variability by dividing the dataset into
quartiles.
In simple words, any dataset or any set of observations is divided into four
defined intervals based upon the values of the data and how they
compare to the entire dataset. Quartiles divide the data at three points
into four intervals. A small code sketch follows.
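A small NumPy sketch of IQR-based outlier detection (the data values are illustrative):

import numpy as np

data = np.array([102, 98, 110, 105, 99, 101, 500])    # 500 is an outlier

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr          # the box plot "whiskers"
outliers = data[(data < lower) | (data > upper)]
print(outliers)   # [500]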
3.Data Transformation :-
First of all, let us have a look at the dataset we are going to use for this
particular example. You can download or take this dataset from :
https://github.com/tarunlnmiit/machine_learning/blob/master/DataPrepro
cessing.csv
It is as shown below:
When you run this code section, along with libraries, you should not see any
errors. When successfully executed, you can move to variable explorer in the
Spyder UI and you will see the following three variables.
The first idea is to remove the lines in the observations where there is
some missing data. But that can be quite dangerous: imagine this data set
contains key information; it would be risky to remove such observations. So we
need a better way to handle this problem, and the most common idea is to take
the mean of the columns, as discussed in the earlier section.
If you noticed, in our dataset we have two values missing: one in the Age
column at data index 6 and one in the Income column at data row 4. Missing
values should be handled during data analysis, so we do that as follows.
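The code that followed this slide is not preserved; a minimal sketch using scikit-learn's SimpleImputer, assuming X holds the feature matrix from the dataset above with Age and Income in columns 1 and 2:

import numpy as np
from sklearn.impute import SimpleImputer

# replace each missing entry (np.nan) with the mean of its column
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])   # columns 1-2: Age, Income (assumed layout)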
Here, we take the training set to be 80% of the original data set and the
testing set to be 20%. This is usually the ratio in which they are split,
though you can sometimes come across a 70-30% or 75-25% split. But you don't
want to split it 50-50%; this can lead to model overfitting. For now, we split
in an 80-20% ratio, as in the sketch below. After the split, our training set
and testing set are produced.
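A minimal sketch of the split, assuming X and y were prepared as above:

from sklearn.model_selection import train_test_split

# 80-20 split; random_state fixes the shuffle so results are reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)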
As you can see, we have two columns, age and income, that contain numeric
values. Notice that the variables are not on the same scale: the ages go from
32 to 55, while the salaries go from about 57.6K to 99.6K. Because the age
variable and the salary variable do not have the same scale, this will cause
issues in your machine learning models. Why? Because many ML models are based
on what is called the Euclidean distance.
We use feature scaling to convert different scales to a standard scale to make
it easier for Machine Learning algorithms. We do this in Python as follows:
# feature scaling: fit the scaler on the training set, then apply it to both sets
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
1.Statistical Analysis
2.Data Visualization
3.Data Modelling and Machine Learning
4.Deep Learning
5.Natural Language Processing (NLP)
➢ Python comes with tons of libraries for the sole purpose of statistical
analysis. Top statistical packages that provide in-built functions to perform
the most complex statistical computations are:
1.Supervised Learning:
Supervised Learning is the one, where you can consider the learning
is guided by a teacher. We have a dataset which acts as a teacher and
its role is to train the model or the machine. Once the model gets
trained it can start making a prediction or decision when new data is
given to it.
Supervised learning uses labelled training data to learn the mapping
function that turns input variables (X) into the output variable (Y). In
other words, it solves for f in the following equation:
Y = f (X)
This allows us to accurately generate outputs when given new inputs.
“The outcome or output for the given input is known before itself” and the
machine must be able to map or assign the given input to the output.
Consider multiple images of a cat, dog, orange, apple, etc., where the images
are labelled. These are fed into the machine for training, and the machine must
identify them. Just as a human child who is shown a cat and told so will still
identify a completely different cat among others as a cat, the same method is
employed here. In short, supervised learning means: Train me!
Unsupervised learning models are used when we only have the input
variables (X) and no corresponding output variables.
They use unlabelled training data to model the underlying structure of the
data. Input data is given and the model is run on it. The image or the input
given are mixed together and insights on the inputs can be found .
The model learns through observation and finds structures in the data.
Once the model is given a dataset, it automatically finds patterns and
relationships in the dataset by creating clusters in it.
What it cannot do is add labels to the cluster, like it cannot say this a
group of apples or mangoes, but it will separate all the apples from
mangoes.
3. Reinforcement Learning:
“Signal” as the true underlying pattern that you wish to learn from the
data.
“Noise” on the other hand, refers to the irrelevant information or
randomness in a dataset.
Overfitting occurs when our machine learning model tries to cover all the
data points, or more than the required data points, present in the given
dataset. Because of this, the model starts capturing the noise and inaccurate
values present in the dataset, and all these factors reduce its efficiency
and accuracy. An overfitted model has low bias and high variance.
The chance of overfitting increases the more we train our model.
Overfitting is the main problem that occurs in supervised learning.
Example: The concept of the overfitting can be understood by the below
graph of the linear regression output:
Cross-Validation
Training with more data
Removing features
Early stopping the training
Regularization
Ensembling
The "Goodness of fit" term is taken from the statistics, and the goal of the
machine learning models to achieve the goodness of fit. In statistics
modeling, it defines how closely the result or predicted values match the
true values of the dataset.
The model with a good fit is between the underfitted and overfitted
model, and ideally, it makes predictions with 0 errors, but in practice, it is
difficult to achieve it.
There are two other methods by which we can get a good point for our
model, which are the resampling method to estimate model accuracy
and validation dataset.
Linear Regression
Logistic Regression
Polynomial Regression
Support Vector Regression
Decision Tree Regression
Random Forest Regression
Ridge Regression
Lasso Regression:
Recall the geometry lesson from high school. What is the equation of a
line?
y = mx + c
➢ m is the slope. It determines the angle of the line; it is the parameter,
often denoted β.
In machine learning notation this becomes
y = b0 + b1 * x1
➢ b0 is the constant (intercept).
➢ b1 is the coefficient of the independent variable x1.
➢ y is the dependent variable.
#importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
dataset = pd.read_csv('salary_data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
#splitting the data and fitting the model (needed by the plotting code below)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
viz_train = plt
viz_train.scatter(X_train, y_train, color='red')
viz_train.plot(X_train, regressor.predict(X_train), color='blue')
viz_train.title('Salary VS Experience (Training set)')
viz_train.xlabel('Year of Experience')
viz_train.ylabel('Salary')
viz_train.show()
viz_test = plt
viz_test.scatter(X_test, y_test, color='red')
viz_test.plot(X_train, regressor.predict(X_train), color='blue')
viz_test.title('Salary VS Experience (Test set)')
viz_test.xlabel('Year of Experience')
viz_test.ylabel('Salary')
viz_test.show()
After running the above code, you can see two plots in the console window,
as shown below:
Output :
y_pred = regressor.predict(X_test)
For example:
The selling price of a house can depend on the desirability of the
location, the number of bedrooms, the number of bathrooms, the year the
house was built, the square footage of the plot, and a number of other
factors.
The height of a child can depend on the height of the mother, the height of
the father, nutrition, and environmental factors.
y = b0 + b1*x1 + b2*x2 + ... + bn*xn
or, equivalently,
Y = b0 + Σ (i = 1 to n) bi * xi
#Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
dataset = pd.read_csv('salary_data.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
#Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
#Predicting for a new candidate described by four feature values
x_new = [[5],[2],[1],[2]]
y_pred = regressor.predict(np.array(x_new).reshape(1, 4))
print(y_pred)
accuracy = (regressor.score(X_test, y_test))
print(accuracy)
You can offer to your candidate the salary of ₹48017.20 and this is the
best salary for him!
where,
b0 is the constant,
y is the dependent variable,
bi is a coefficient that can be thought of as a multiplier connecting the
independent and dependent variables; it translates how much y will be
affected by a unit change in xi. In other words, a change in xi does not
usually mean an equal change in y.
xi is an independent variable.
#Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
dataset = pd.read_csv('position_salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
#Fitting polynomial regression (degree 4 assumed), used by the plots below
poly_reg = PolynomialFeatures(degree=4)
pol_reg = LinearRegression()
pol_reg.fit(poly_reg.fit_transform(X), y)
def viz_polymonial():
    plt.scatter(X, y, color='red')
    plt.plot(X, pol_reg.predict(poly_reg.fit_transform(X)), color='blue')
    plt.title('Truth or Bluff (Polynomial Regression)')
    plt.xlabel('Position level')
    plt.ylabel('Salary')
    plt.show()
    return

viz_polymonial()
def viz_polymonial_smooth():
    # denser grid of X values gives a smoother prediction curve
    X_grid = np.arange(min(X), max(X), 0.1)
    X_grid = X_grid.reshape(len(X_grid), 1)
    plt.scatter(X, y, color='red')
    plt.plot(X_grid, pol_reg.predict(poly_reg.fit_transform(X_grid)), color='blue')
    plt.title('Truth or Bluff (Polynomial Regression)')
    plt.xlabel('Position level')
    plt.ylabel('Salary')
    plt.show()
    return

viz_polymonial_smooth()
print(pol_reg.predict(poly_reg.fit_transform([[5.5]])))
Output:
It's time to let our candidate know: we will offer him a best-in-class salary
of ₹132,148!
1. Boosting
Boosting refers to a group of algorithms that utilize weighted averages to
turn weak learners into stronger learners. Boosting is all about
"teamwork": each model that runs dictates which features the next model
will focus on. In boosting, as the name suggests, one learner learns from
another, which in turn boosts the learning.
Lazy Learners: Lazy Learner firstly stores the training dataset and wait
until it receives the test dataset. In Lazy learner case, classification is done
on the basis of the most related data stored in the training dataset. It takes
less time in training but more time for predictions.
Example: K-NN algorithm, Case-based reasoning.
Linear Models
• Logistic Regression
• Support Vector Machines
Non-linear Models
• K-Nearest Neighbors
• Kernel SVM
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification
What is Logistic Regression :
In linear regression we would model P(X) = b0 + b1 * X directly; logistic
regression instead models the log-odds:
log( P(X) / (1 - P(X)) ) = b0 + b1 * X
where, the left-hand side is called the logit or log-odds function, and p(x)/(1-
p(x)) is called odds.
The odds signify the ratio of the probability of success [p(x)] to the
probability of failure [1 - p(x)]. Therefore, in logistic regression, a linear
combination of the inputs is mapped to the log(odds) of the output being equal to 1.
If we take the inverse of the above function, we get the sigmoid:
P(X) = e^(b0 + b1*X) / (1 + e^(b0 + b1*X))
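A minimal scikit-learn sketch of logistic regression on toy data (hours studied vs pass/fail, purely illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6]])   # hours studied
y = np.array([0, 0, 0, 1, 1, 1])               # fail (0) / pass (1)

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.5], [4.5]]))             # e.g. [0 1]
print(clf.predict_proba([[4.5]]))              # [P(fail), P(pass)]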
For step 3, the most used distance formula is the Euclidean distance, by which
the distance between two points P1(x1, y1) and P2(x2, y2) can be expressed as:
d(P1, P2) = √((x2 - x1)² + (y2 - y1)²)
The K-NN working can be explained on the basis of the below algorithm:
Suppose we have a new data point and we need to put it in the required
category. Consider the below image:
Below are some points to remember while selecting the value of K in the
KNN algorithm:
There is no particular way to determine the best value for "K", so we need
to try some values to find the best out of them. The most preferred value
for K is 5.
A very low value of K, such as K = 1 or K = 2, can be noisy and expose the
model to the effects of outliers.
Large values of K are good, but a very large K may cause difficulties.
It is simple to implement.
It is robust to the noisy training data
It can be more effective if the training data is large.
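A minimal scikit-learn K-NN sketch on toy 2-D points (the data and K value are illustrative):

from sklearn.neighbors import KNeighborsClassifier

# toy 2-D points with two classes
X = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=5)   # K = 5, the commonly preferred value
knn.fit(X, y)
print(knn.predict([[2, 2], [6, 5]]))        # [0 1]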
In a Decision tree, there are two nodes, which are the Decision
Node and Leaf Node. Decision nodes are used to make any decision and
have multiple branches, whereas Leaf nodes are the output of those
decisions and do not contain any further branches.
The decisions or the test are performed on the basis of features of the
given dataset.
Below are the two reasons for using the Decision tree:
Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes; the tree cannot be segregated further once a leaf node is reached.
In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record and, based on the comparison, follows the branch and jumps to the next node. For the next node, the algorithm again compares the attribute value with those of the sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the below algorithm:
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; such a final node is called a leaf node.
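A minimal sketch of Steps 1-5 with scikit-learn's DecisionTreeClassifier. The Gini index used here is one common Attribute Selection Measure ('entropy', i.e. information gain, is another), and the Iris dataset is illustrative:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
# criterion='gini' is the default ASM used at every split
tree = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)
tree.fit(X, y)
# Print the learned root-to-leaf decision rules as text
print(export_text(tree))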
The below diagram explains the working of the Random Forest algorithm:
➢ Model Accuracy:
Model accuracy, in terms of classification models, can be defined as the ratio of correctly classified samples to the total number of samples:
    Accuracy = (Number of correctly classified samples) / (Total number of samples)
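A minimal sketch combining the two ideas above: a Random Forest (an ensemble of decision trees) is trained, and its accuracy is computed as the ratio of correctly classified test samples. The dataset and parameters are illustrative:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each trained on a bootstrap sample; predictions are voted
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)
# accuracy = correctly classified samples / total samples
print("Accuracy:", accuracy_score(y_test, y_pred))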
Market Segmentation
Statistical data analysis
Social network analysis
Image segmentation
Anomaly detection, etc.
The clustering methods are broadly divided into Hard Clustering (each data point belongs to only one group) and Soft Clustering (a data point can also belong to another group). Various other clustering approaches exist as well. Below are the main clustering methods used in Machine learning:
Partitioning Clustering
Density-Based Clustering
Distribution Model-Based Clustering
Hierarchical Clustering
Fuzzy Clustering
Applications of Clustering:
2. In Search Engines: Search engines also work on the clustering technique. The search results appear based on the objects closest to the search query: similar data objects are grouped together, far from dissimilar objects. The accuracy of a query's results depends on the quality of the clustering algorithm used.
5. In Land Use: The clustering technique is used to identify areas of similar land use in a GIS database. This is very useful for determining the purpose for which a particular parcel of land is most suitable.
It allows us to cluster the data into different groups and is a convenient way to discover the categories of groups in an unlabeled dataset on its own, without the need for any training.
The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until the cluster assignments stop improving. The value of k should be predetermined in this algorithm.
It assigns each data point to its closest k-center; the data points near a particular k-center form a cluster.
Hence each cluster has data points with some commonalities and is away from the other clusters. The sketch below illustrates the working of the K-means Clustering Algorithm:
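A minimal K-means sketch with scikit-learn, assuming k=2 for the illustrative toy points below:

import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D points forming two loose groups (illustrative data)
X = np.array([[1, 2], [1, 4], [2, 3],
              [8, 8], [9, 10], [10, 9]])

# k must be chosen in advance; here k=2
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print("Cluster labels:", labels)
print("Centroids:", kmeans.cluster_centers_)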
Elbow Method:
The Elbow method is one of the most popular ways to find the optimal
number of clusters. This method uses the concept of WCSS
value. WCSS stands for Within Cluster Sum of Squares, which defines
the total variations within a cluster. The formula to calculate the value of
WCSS (for 3 clusters) is given below:
    WCSS = ∑(Pi in Cluster1) distance(Pi, C1)² + ∑(Pi in Cluster2) distance(Pi, C2)² + ∑(Pi in Cluster3) distance(Pi, C3)²
Here ∑(Pi in Cluster1) distance(Pi, C1)² is the sum of the squared distances between each data point in Cluster1 and its centroid C1, and the same holds for the other two terms.
To measure the distance between data points and centroid, we can use any
method such as Euclidean distance or Manhattan distance.
To find the optimal number of clusters, the elbow method follows the below steps:
It executes K-means clustering on a given dataset for different K values (typically ranging from 1 to 10).
For each value of K, it calculates the WCSS value.
It plots a curve between the calculated WCSS values and the number of clusters K.
Where the plot bends sharply, like the elbow of an arm, that point is considered the best value of K.
Since the graph shows a sharp bend that looks like an elbow, the method is known as the elbow method. The graph for the elbow method looks like the one produced by the sketch below:
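A minimal sketch of the elbow method with scikit-learn, where the inertia_ attribute of a fitted model holds its WCSS value (the blob data is illustrative):

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# inertia_ is scikit-learn's name for WCSS
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X)
    wcss.append(km.inertia_)

plt.plot(range(1, 11), wcss, marker='o')
plt.xlabel('Number of clusters K')
plt.ylabel('WCSS')
plt.title('Elbow Method')
plt.show()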
Hierarchical clustering is another unsupervised machine learning algorithm, used to group unlabeled datasets into clusters; it is also known as hierarchical cluster analysis (HCA).
In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-shaped structure is known as the dendrogram.
Sometimes the results of K-means clustering and hierarchical clustering may look similar, but the two differ in how they work: in hierarchical clustering there is no requirement to predetermine the number of clusters, as there is in the K-means algorithm.
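A minimal HCA sketch that draws the dendrogram with SciPy; the blob data and the Ward linkage are illustrative choices:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=20, centers=3, random_state=0)

# 'ward' linkage merges, at each step, the pair of clusters whose
# merge gives the smallest increase in within-cluster variance
Z = linkage(X, method='ward')
dendrogram(Z)
plt.title('Dendrogram (HCA)')
plt.show()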
3. Lift: It is the strength of a rule, defined by the formula below:
    Lift(A → B) = sup(A ∧ B) / (sup(A) × sup(B))
Step-2: Take all the itemsets in the transactions having a support value higher than the minimum (selected) support value.
Step-3: Find all the rules of these subsets that have a confidence value higher than the threshold (minimum confidence).
In the first step, we will create a table that contains the support count (the frequency of each itemset individually in the dataset) of each itemset in the given dataset. This table is called the Candidate set, or C1.
In this step, we will generate C2 with the help of L1. In C2, we will create pairs of the itemsets of L1 in the form of subsets.
After creating the subsets, we will again find the support count from the main transaction table of the dataset, i.e., how many times these pairs have occurred together in the given dataset. So, we will get the below table for C2:
Now we will create the L3 table. As we can see from the above C3 table, there is only one combination of itemsets whose support count is equal to the minimum support count. So L3 will have only one combination, i.e., {A, B, C}.
To generate the association rules, we first create a new table with the possible rules from the occurring combination {A, B, C}. For each rule, we calculate the Confidence using the formula sup(A ∧ B)/sup(A). After calculating the confidence value for all rules, we exclude the rules whose confidence is below the minimum threshold (50%).
Consider the below table:
(A ∧ B) = 2/4 = 0.5 = 50%
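A minimal hand-rolled sketch of the support/confidence arithmetic above, mirroring the (A ∧ B) = 2/4 = 50% computation; the transactions are illustrative, not the ones from the slides:

transactions = [
    {'A', 'B'}, {'A', 'B', 'C'}, {'A', 'C'}, {'B', 'D'},
]

def support(itemset):
    # Fraction of transactions containing every item in itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # conf(A -> B) = sup(A ∧ B) / sup(A)
    return support(antecedent | consequent) / support(antecedent)

print(support({'A', 'B'}))        # 2/4 = 0.5, i.e. 50%
print(confidence({'A'}, {'B'}))   # 0.5 / 0.75 ≈ 0.67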
The transaction table (1 = the transaction contains the item):
TID  Bread  Butter  Milk  Coffee  Tea
T1     1      1      0      0      1
T2     0      1      0      1      0
T3     0      1      1      0      0
T4     1      1      0      1      0
T5     1      0      1      0      0
T6     0      1      1      0      0
T7     1      0      1      0      0
T8     1      1      1      0      1
T9     1      1      1      0      0
K=1
ITEM TIDSET
Bread {T1, T4, T5, T7, T8, T9}
Butter {T1, T2, T3, T4, T6, T8, T9}
Milk {T3, T5, T6, T7, T8, T9}
Coffee {T2, T4}
Tea {T1, T8}
K=2
ITEM TIDSET
{Bread, Butter} {T1, T4, T8, T9}
{Bread, Milk} {T5, T7, T8, T9}
{Bread, Coffee} {T4}
{Bread, Tea} {T1, T8}
{Butter, Milk} {T3, T6, T8, T9}
{Butter, Coffee} {T2, T4}
{Butter, Tea} {T1, T8}
{Milk, Tea} {T8}
K=3
ITEM TIDSET
{Bread, Butter, Milk} {T8, T9}
{Bread, Butter, Tea} {T1, T8}
K=4
ITEM TIDSET
{Bread, Butter, Milk, Tea} {T8}
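These tables are exactly what the ECLAT algorithm computes on the vertical (item → TIDSET) layout: the TIDSET of a larger itemset is the intersection of the TIDSETs of its parts. A minimal sketch using the K=1 tidsets above:

# ECLAT works on the vertical layout: item -> set of transaction IDs
tidsets = {
    'Bread':  {'T1', 'T4', 'T5', 'T7', 'T8', 'T9'},
    'Butter': {'T1', 'T2', 'T3', 'T4', 'T6', 'T8', 'T9'},
    'Milk':   {'T3', 'T5', 'T6', 'T7', 'T8', 'T9'},
    'Coffee': {'T2', 'T4'},
    'Tea':    {'T1', 'T8'},
}

# K=2: intersect two tidsets to get the tidset of the pair
pair = tidsets['Bread'] & tidsets['Butter']
print(sorted(pair))                     # {T1, T4, T8, T9}, as in the K=2 table

# K=3: one more intersection gives the triple's tidset
print(sorted(pair & tidsets['Milk']))   # {T8, T9}, as in the K=3 table

The support count of an itemset is simply the size of its tidset, so frequent itemsets are those whose intersections stay above the minimum support.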