UNIT-1: INTRODUCTION TO ARTIFICIAL INTELLIGENCE

What is AI?
Artificial Intelligence (AI) is a branch of science which deals with helping machines
find solutions to complex problems in a more human-like fashion.
This generally involves borrowing characteristics from human intelligence and applying
them as algorithms in a computer-friendly way.
A more or less flexible or efficient approach can be taken depending on the
requirements established, which influences how artificial the intelligent behavior appears.
Artificial intelligence can be viewed from a variety of perspectives.
From the perspective of intelligence, artificial intelligence is making machines
"intelligent" -- acting as we would expect people to act.
The inability to distinguish computer responses from human responses is called the
Turing test.
Intelligence requires knowledge.
Expert problem solving involves restricting the domain so that significant relevant
knowledge can be included.
What is AI?
Object-oriented languages are a class of languages more recently used for AI
programming. Important features of object-oriented languages include: the concepts of
objects and messages; objects bundle data and the methods for manipulating that data;
the sender specifies what is to be done and the receiver decides how to do it; and
inheritance (an object hierarchy where objects inherit the attributes of the more
general class of objects). Examples of object-oriented languages are Smalltalk,
Objective C, and C++. Object-oriented extensions to LISP (CLOS - Common LISP Object
System) and PROLOG (L&O - Logic & Objects) are also used.
Artificial Intelligence can also be described as an electronic machine that stores a
large amount of information and processes it at very high speed.
In the Turing test, the computer is interrogated by a human via a teletype; it passes
if the human cannot tell whether there is a computer or a human at the other end.
AI is also characterized as the ability to solve problems.
It is the science and engineering of making intelligent machines, especially
intelligent computer programs. It is related to the similar task of using computers to
understand human intelligence.
-
History of Artificial Intelligence
Artificial Intelligence is not a new word and not a new technology for researchers;
this technology is much older than you would imagine. There are even myths of
mechanical men in ancient Greek and Egyptian mythology. Following are some milestones
in the history of AI which define the journey from the earliest work on AI to
present-day developments.
-
History of Artificial Intelligence
Maturation of Artificial Intelligence (1943-1952)
Year 1943: The first work which is now recognized as AI was done by Warren McCulloch
and Walter Pitts in 1943. They proposed a model of artificial neurons.
Year 1949: Donald Hebb demonstrated an updating rule for modifying the connection
strength between neurons. His rule is now called Hebbian learning.
Year 1950: Alan Turing, an English mathematician, pioneered machine learning in 1950.
Turing published "Computing Machinery and Intelligence", in which he proposed a test
that can check a machine's ability to exhibit intelligent behavior equivalent to human
intelligence, called the Turing test.
-
History of Artificial Intelligence
The first AI winter (1974-1980)
The duration between the years 1974 and 1980 was the first AI winter. An AI winter
refers to a time period in which computer scientists dealt with a severe shortage of
government funding for AI research.
During AI winters, public interest in artificial intelligence decreased.
The emergence of intelligent agents (1993-2011)
Year 1997: In the year 1997, IBM's Deep Blue beat world chess champion Garry Kasparov
and became the first computer to beat a world chess champion.
Year 2002: For the first time, AI entered the home in the form of Roomba, a vacuum
cleaner.
Year 2006: AI came into the business world. Companies like Facebook, Twitter, and
Netflix also started using AI.
History of Artificial Intelligence
Deep learning, big data and artificial general intelligence (2011-present)
Year 2011: In the year 2011, IBM's Watson won Jeopardy!, a quiz show in which it had
to solve complex questions as well as riddles. Watson proved that it could understand
natural language and solve tricky questions quickly.
Year 2012: Google launched an Android app feature, "Google Now", which was able to
provide information to the user as a prediction.
Year 2014: In the year 2014, the chatbot "Eugene Goostman" won a competition in the
famous "Turing test."
Year 2018: IBM's "Project Debater" debated complex topics with two master debaters and
performed extremely well. Google demonstrated an AI program, "Duplex", a virtual
assistant that booked a hairdresser appointment over the phone, and the lady on the
other side did not notice that she was talking to a machine.
Now AI has developed to a remarkable level. The concepts of deep learning, big data,
and data science are now trending like a boom. Nowadays companies like Google,
Facebook, IBM, and Amazon are working with AI and creating amazing devices. The future
of Artificial Intelligence is inspiring and will come with high intelligence.
Branches of AI
1. Game Playing
You can buy machines that can play master level chess for a few hundred
dollars.
There is some AI in them, but they play well against people mainly through
brute
force computation--looking at hundreds of thousands of positions. To
beat a world champion by brute force and known reliable heuristics
requires being able to look at 200 million positions per second.
2. Speech Recognition
In the 1990s, computer speech recognition reached a practical level for limited
purposes. Thus United Airlines replaced its keyboard tree for flight information with a
system using speech recognition of flight numbers and city names. It is quite
convenient. On the other hand, while it is possible to instruct some computers using
speech, most users have gone back to the keyboard and the mouse as still more
convenient.
Branches of AI
3. Understanding Natural Language
Just getting a sequence of words into a computer is not enough. Parsing
sentences is not enough either. The computer has to be provided with
an understanding of the
domain the text is about, and this is presently possible only for very
limited domains.
4. Computer Vision
The world is composed of three-dimensional objects, but the inputs to the human eye
and computers' TV cameras are two-dimensional. Some useful programs can work solely in
two dimensions, but full computer vision requires partial three-dimensional information
that is not just a set of two-dimensional views. At present there are only limited ways
of representing three-dimensional information directly, and they are not as good as
what humans evidently use.
Branches of AI
5. Expert Systems
A ``knowledge engineer'' interviews experts in a certain domain and
tries to embody their knowledge in a computer program for carrying
out some task. How well this
works depends on whether the intellectual mechanisms required for
the task are within the present state of AI. When this turned out not
to be so, there were many disappointing results.
6. Heuristic Classification
One of the most feasible kinds of expert system given the present
knowledge of AI is
to put some information in one of a fixed set of categories using
several sources of
information. An example is advising whether to accept a proposed credit card purchase.
Information is available about the owner of the credit card, his record of payment and
also about the item he is buying and about the establishment from which he is buying it
(e.g., about whether there have been previous credit card frauds at this establishment).
Applications of AI
Artificial Intelligence has various applications in today's society. It is becoming
essential for today's time because it can solve complex problems in an efficient way in
multiple industries, such as healthcare, entertainment, finance, education, etc. AI is
making our daily life more comfortable and fast.
Following are some sectors which have applications of Artificial Intelligence:
-
Applications of AI
1. AI in Astronomy
Artificial Intelligence can be very useful for solving complex universe problems. AI
technology can be helpful for understanding the universe, such as how it works, its
origin, etc.
2. AI in Healthcare
In the last five to ten years, AI has become more advantageous for the healthcare
industry and is going to have a significant impact on this industry.
Healthcare industries are applying AI to make better and faster diagnoses than humans.
AI can help doctors with diagnoses and can inform them when patients are worsening so
that medical help can reach the patient before hospitalization.
3. AI in Gaming
AI can be used for gaming purposes. AI machines can play strategic games like chess,
where the machine needs to think of a large number of possible positions.
-
Applications of AI
4. AI in Finance
AI and finance industries are the best matches for each other. The finance
industry is implementing automation, chatbot, adaptive intelligence,
algorithm trading, and machine learning into financial processes.
5. AI in Data Security
The security of data is crucial for every company, and cyber-attacks are growing very
rapidly in the digital world. AI can be used to make your data more safe and secure.
Some examples, such as the AEG bot and the AI2 platform, are used to detect software
bugs and cyber-attacks in a better way.
6. AI in Social Media
Social Media sites such as Facebook, Twitter, and Snapchat contain billions of
user profiles, which need to be stored and managed in a very efficient way.
AI can organize and manage massive amounts of data. AI can analyze lots of
data to identify the latest trends, hashtag, and requirement of different
users.
-
Applications of AI
7. AI in Travel & Transport
AI is becoming highly demanding for travel industries. AI is capable of
doing various travel related works such as from making travel
arrangement to suggesting the hotels, flights, and best routes to the
customers. Travel industries are using AI- powered chatbots which can
make human-like interaction with customers for better and fast
response.
8.AI in Automotive Industry
Some Automotive industries are using AI to provide virtual assistant to
their user for better performance. Such as Tesla has introduced
TeslaBot, an intelligent virtual assistant.
Various Industries are currently working for developing self-driven
cars which can make your journey more safe and secure.
9. AI in Robotics:
Artificial Intelligence has a remarkable role in robotics. Usually, general robots are
programmed such that they can perform some repetitive task, but with the help of AI, we
can create intelligent robots which can perform tasks from their own experience without
being pre-programmed.
Humanoid robots are the best examples of AI in robotics; recently, the intelligent
humanoid robots named Erica and Sophia have been developed, which can talk and behave
like humans.
10. AI in Entertainment
We are currently using some AI-based applications in our daily life with entertainment
services such as Netflix or Amazon. With the help of ML/AI algorithms, these services
show recommendations for programs or shows.
11. AI in Agriculture
Agriculture is an area which requires various resources, labor, money, and time for the
best result. Nowadays agriculture is becoming digital, and AI is emerging in this
field. Agriculture is applying AI for agriculture robotics, soil and crop monitoring,
and predictive analysis. AI in agriculture can be very helpful for farmers.
-
Applications of AI
12. AI in E-commerce
AI is providing a competitive edge to the e-commerce industry, and it is becoming more
in demand in the e-commerce business. AI is helping shoppers discover associated
products with recommended size, color, or even brand.
13. AI in Education
AI can automate grading so that the tutor can have more time to teach. An AI chatbot
can communicate with students as a teaching assistant.
In the future, AI can work as a personal virtual tutor for students, which will be
easily accessible at any time and any place.
-
AI Problems & Techniques
Following are some problems that can be solved using AI. The following categories of
problems are considered AI problems.
Ordinary Problems
1.Perception
Vision
Voice Recognition
Speech Recognition
2.Natural Language
Understanding
Generation
Translation
3.Robot Control
-
AI Problems & Techniques
Formal Problems
Game Playing
Solving complex mathematical Problem
Expert Problems
Design
Fault Finding
Scientific Analysis
Medical Diagnosis
Financial Analysis
-
AI Problems & Techniques
There are three important AI techniques:
Search — Provides a way of solving problems for which no direct approach is available.
It also provides a framework into which any direct techniques that are available can be
embedded.
Use of Knowledge — Provides a way of solving complex problems by exploiting the
structures of the objects that are involved.
Abstraction — Provides a way of separating important features and variations from the
many unimportant ones that would otherwise overwhelm the process.
-
Thanks!!!
-
Chapter 2: Problem Spaces & Search
The objective of this lesson is to provide an overview of problem representation
techniques, i.e.:
Representing AI problems as a mathematical model
Representing AI problems as a production system
Defining AI problems as a state space search
This lesson also gives in-depth knowledge about the searching techniques BFS and DFS,
with their algorithms and advantages.
Solution 3 (Water Jug Problem with a 4-litre and a 3-litre jug)

Litres in 4-Litre Jug   Litres in 3-Litre Jug   Rule Applied
0                       0
4                       0                       1
1                       3                       8
0                       3                       3
3                       0                       5
3                       3                       2
4                       2                       7
0                       2                       3
2                       0                       5
Water Jug Problem with 8, 5 and 3 Litre Jugs
The following is a problem which can be solved by using the state space search
technique. "We have 3 jugs of capacities 3, 5, and 8 liters respectively. There is no
scale on the jugs, so it is only their capacities that we certainly know. Initially the
8-liter jug is full of water and the other two are empty. We can pour water from one
jug to another, and the goal is to have exactly 4 liters of water in any of the jugs.
There is no scale on the jugs and we do not have any other tools that would help. The
amount of water in the other two jugs at the end is irrelevant."
Formalize the above problem as a state space search. You should:
1. Suggest a suitable representation of the problem
2. State the initial and goal states of this problem
3. Specify the production rules for getting from one state to another
Water Jug Problem with 8, 5 and 3 Litre Jugs
Solution:
The state space for this problem can be defined as a triple (x, y, z), where
x represents the number of liters of water in the 8-liter jug, x = 0, 1, 2, ..., 8
y represents the number of liters of water in the 5-liter jug, y = 0, 1, 2, 3, 4 or 5
z represents the number of liters of water in the 3-liter jug, z = 0, 1, 2 or 3
The initial state is (8, 0, 0). The goal is to get 4 liters of water in any jug, so the
goal state can be defined as (4, n, n) or (n, 4, n) for any value of n.
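A minimal Python sketch of this formulation, searching the (x, y, z) state space with
breadth-first search; the only production rule encoded is pouring water from one jug
into another until the source is empty or the target is full:

from collections import deque

CAPACITIES = (8, 5, 3)          # capacities of the three jugs
START = (8, 0, 0)               # 8-litre jug full, others empty

def successors(state):
    """Generate all states reachable by pouring one jug into another."""
    for i in range(3):
        for j in range(3):
            if i == j or state[i] == 0:
                continue
            # pour as much as possible from jug i into jug j
            amount = min(state[i], CAPACITIES[j] - state[j])
            if amount > 0:
                new_state = list(state)
                new_state[i] -= amount
                new_state[j] += amount
                yield tuple(new_state)

def solve(goal_amount=4):
    """Breadth-first search over the state space; returns a list of states."""
    frontier = deque([[START]])
    visited = {START}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if goal_amount in state:            # goal: 4 litres in any jug
            return path
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

print(solve())   # e.g. [(8, 0, 0), (3, 5, 0), (3, 2, 3), (6, 2, 0), ...]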
Missionaries and Cannibals Problem: The initial state (i, j) is (3, 3), i.e. three
missionaries and three cannibals on side A of the river, and (0, 0) on side B of the
river.
[Figure: 8-puzzle board configurations showing the initial state and the successor
states generated toward the goal arrangement 1-2-3 / 8-_-4 / 7-6-5.]
Actions: It gives the description of all the available actions to the agent.
Solution: It is an action sequence which leads from the start node to the
goal node.
Optimal Solution: If a solution has the lowest cost among all solutions.
Search and Control Strategies
Types of uninformed search algorithms:
Breadth-first Search
Depth-first Search
Depth-limited Search
Iterative deepening depth-first search
Uniform cost search
Bidirectional Search
Issues in the design of search programs

Breadth-first Search
Advantages:
BFS will provide a solution if any solution exists.
If there is more than one solution for a given problem, then BFS will provide the
minimal solution, i.e. the one which requires the least number of steps.
Example:
In the tree structure below, we show the traversal of the tree using the BFS algorithm
from the root node S to the goal node K. The BFS algorithm traverses in layers, so it
will follow the path shown by the dotted arrow, and the traversed path will be:
S---> A--->B---->C--->D---->G--->H--->E---->F---->I---->K
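A minimal Python sketch of this layer-by-layer traversal; the tree is written as an
adjacency list chosen to reproduce the path above (the actual figure is assumed):

from collections import deque

graph = {
    'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['G', 'H'],
    'C': ['E', 'F'], 'D': [], 'G': ['I'], 'H': [],
    'E': [], 'F': ['K'], 'I': [], 'K': [],
}

def bfs(start, goal):
    """Expand nodes level by level; return the order in which nodes are visited."""
    frontier = deque([start])
    visited = []
    while frontier:
        node = frontier.popleft()       # FIFO queue gives breadth-first order
        visited.append(node)
        if node == goal:
            return visited
        frontier.extend(graph[node])
    return visited

print(bfs('S', 'K'))   # ['S', 'A', 'B', 'C', 'D', 'G', 'H', 'E', 'F', 'I', 'K']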
Depth-first Search
Advantages:
DFS requires very little memory, as it only needs to store a stack of the nodes on the
path from the root node to the current node.
It can take less time to reach the goal node than the BFS algorithm (if it traverses
along the right path).
Example:
In the search tree below, we show the flow of depth-first search, which follows this
order:
It will start searching from the root node S and traverse A, then B, then D and E.
After traversing E, it will backtrack up the tree, as E has no other successor and the
goal node has not yet been found. After backtracking it will traverse node C and then
G, where it terminates because it has found the goal node.
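A minimal Python sketch of this behaviour; the tree below is assumed so that the visit
order matches the description (S, A, B, D, E, backtrack, C, G):

graph = {'S': ['A'], 'A': ['B', 'C'], 'B': ['D', 'E'],
         'C': ['G'], 'D': [], 'E': [], 'G': []}

def dfs(node, goal, visited):
    """Recursive depth-first search; fills `visited` with the visit order."""
    visited.append(node)
    if node == goal:
        return True
    for child in graph[node]:          # go deep along one branch, then backtrack
        if dfs(child, goal, visited):
            return True
    return False

order = []
dfs('S', 'G', order)
print(order)   # ['S', 'A', 'B', 'D', 'E', 'C', 'G']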
Depth-first Search
Space Complexity: The DFS algorithm needs to store only a single path from the root
node, hence the space complexity of DFS is equivalent to the size of the fringe set,
which is O(bm), where b is the branching factor and m is the maximum depth.
BFS vs. DFS:
BFS stands for Breadth First Search; DFS stands for Depth First Search.
BFS is more suitable for searching vertices which are closer to the given source; DFS
is more suitable when there are solutions away from the source.
The time complexity of BFS is O(V + E) when an adjacency list is used and O(V^2) when
an adjacency matrix is used, where V stands for vertices and E stands for edges; the
time complexity of DFS is the same.
Advantages:
Depth-limited search is memory efficient.
Uniform-cost Search
Disadvantages:
It does not care about the number of steps involved in searching and is only concerned
with path cost, due to which this algorithm may get stuck in an infinite loop.
Example:
Time Complexity:
Let C* be the cost of the optimal solution and ε the cost of each step toward the goal
node. Then the number of steps is C*/ε + 1 (we add 1 because we start from state 0 and
end at C*/ε). Hence, the worst-case time complexity of uniform-cost search is
O(b^(1 + [C*/ε])).
Space Complexity:
The same logic applies for space complexity, so the worst-case space complexity of
uniform-cost search is O(b^(1 + [C*/ε])).
Optimal:
Uniform-cost search is always optimal as it only selects a path with the lowest
path cost.
Iterative deepening depth-first Search
5. Iterative deepening depth-first Search:
Disadvantages:
The main drawback of IDDFS is that it repeats all the work of the
previous phase.
Example:
Completeness:
This algorithm is complete if the branching factor is finite.
Time Complexity:
Let's suppose b is the branching factor and d is the depth; then the worst-case time
complexity is O(b^d).
Space Complexity:
The space complexity of IDDFS is O(bd).
Optimal:
The IDDFS algorithm is optimal if the path cost is a non-decreasing function of the
depth of the node.
Bidirectional Search
6. Bidirectional Search Algorithm:
Advantages:
Bidirectional search is fast.
Bidirectional search requires less memory
Example:
In the below search tree, bidirectional search algorithm is applied. This
algorithm divides one graph/tree into two sub-graphs. It starts traversing
from node 1 in the forward direction and starts from goal node 16 in the
backward direction.
The algorithm terminates at node 9 where two searches meet.
Here h(n) is the heuristic (estimated) cost and h*(n) is the actual cost; for an
admissible heuristic, the heuristic cost should be less than or equal to the actual
cost.
The greedy best-first search algorithm always selects the path which appears best at
that moment. It is a combination of the depth-first search and breadth-first search
algorithms. It uses the heuristic function and search. Best-first search allows us to
take the advantages of both algorithms. With the help of best-first search, at each
step we can choose the most promising node. In the greedy best-first search algorithm,
we expand the node which is closest to the goal node, where the closeness is estimated
by the heuristic function, i.e.
f(n) = h(n).
Best-first Search
Where, h(n)= estimated cost from node n to the goal.
The greedy best first algorithm is implemented by the priority queue.
Disadvantages:
It can behave as an unguided depth-first search in the worst case scenario.
It can get stuck in a loop as DFS.
This algorithm is not optimal.
Example:
Consider the below search problem, and we will traverse it using greedy
best-first search. At each iteration, each node is expanded using evaluation
function f(n)=h(n) , which is given in the below table.
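The slide's table and graph are assumed here; the following minimal Python sketch shows
greedy best-first search with a priority queue ordered by f(n) = h(n), using
illustrative heuristic values:

import heapq

graph = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['E', 'F'],
         'C': [], 'D': [], 'E': ['G'], 'F': [], 'G': []}
h = {'S': 10, 'A': 9, 'B': 4, 'C': 8, 'D': 7, 'E': 3, 'F': 5, 'G': 0}

def greedy_best_first(start, goal):
    """Always expand the frontier node with the smallest heuristic value h(n)."""
    frontier = [(h[start], start, [start])]     # priority queue keyed on h(n)
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for child in graph[node]:
            heapq.heappush(frontier, (h[child], child, path + [child]))
    return None

print(greedy_best_first('S', 'G'))   # ['S', 'B', 'E', 'G']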
Algorithm of A* search:
Step1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not, if the list is empty then
return failure and stops.
Step 3: Select the node from the OPEN list which has the smallest value of
evaluation function (g+h), if node n is goal node then return success and
stop, otherwise
Step 4: Expand node n and generate all of its successors, and put n into the
closed list. For each successor n', check whether n' is already in the OPEN
or CLOSED list, if not then compute evaluation function for n' and place
into Open list.
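A minimal Python sketch of these steps: the OPEN list is a priority queue ordered by
f = g + h, and CLOSED holds expanded nodes. The graph, step costs and heuristic values
below are assumed for illustration:

import heapq

graph = {'S': [('A', 1), ('B', 3)], 'A': [('C', 2), ('D', 5)],
         'B': [('D', 2)], 'C': [('G', 6)], 'D': [('G', 3)], 'G': []}
h = {'S': 7, 'A': 6, 'B': 4, 'C': 5, 'D': 3, 'G': 0}   # assumed heuristic values

def a_star(start, goal):
    """Expand the OPEN node with the smallest f = g + h; keep expanded nodes in CLOSED."""
    open_list = [(h[start], 0, start, [start])]   # (f, g, node, path)
    closed = set()
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        if node in closed:
            continue
        closed.add(node)
        for neighbour, cost in graph[node]:
            if neighbour not in closed:
                g2 = g + cost
                heapq.heappush(open_list, (g2 + h[neighbour], g2, neighbour,
                                           path + [neighbour]))
    return None, float('inf')

print(a_star('S', 'G'))   # (['S', 'B', 'D', 'G'], 8) -> lowest-cost path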
Advantages:
The A* search algorithm is better than other search algorithms.
The A* search algorithm is optimal and complete.
This algorithm can solve very complex problems.
Disadvantages:
It does not always produce the shortest path, as it is mostly based on heuristics and
approximation.
The A* search algorithm has some complexity issues.
The main drawback of A* is its memory requirement.
A* Search
Example:
In this example, we will traverse the given graph using the A* algorithm.
The heuristic value of all states is given in the below table so we will
calculate the f(n) of each state using the formula f(n)= g(n) + h(n), where
g(n) is the cost to reach any node from start state.
Here we will use OPEN and CLOSED list.
Points to remember:
A* algorithm returns the path which occurred first, and it does not search
for all remaining paths.
The efficiency of A* algorithm depends on the quality of heuristic.
The A* algorithm expands all nodes which satisfy the condition on f(n).

A* Algorithm Example (continued):
f = g + h = 149 + 100 = 249
Put this node in the START list and sort the list,
hence START = [ K(172), L(243), I(249), B(258), D(285) ]
Remove the Bestnode from START, i.e. K, which is not our goal node, and hence generate
its successor, i.e. M, and calculate its f value.
A* Algorithm Example
F(A-C-F-J-K-M) = g(M) + h(M) = 172 + 0 = 172
Put this node in the START list and sort the list.
Therefore START = [ M(172), L(243), B(258), D(285) ]
Remove the Bestnode from START, i.e. M, which is our goal node.
AND-OR graphs are useful for certain problems where the solution
involves decomposing the problem into smaller problems. This is called
Problem Reduction.
Here, alternatives involve branches where some or all must be satisfied before we can
progress.
In the case of the A* algorithm, we use the open list to hold nodes that have been
generated but not expanded, and the closed list to hold nodes that have been expanded.
It requires that nodes traversed in the tree be labelled as SOLVED or UNSOLVED in the
solution process, to account for AND node solutions which require solutions to all
successor nodes.
A solution is found when the start node is labelled as SOLVED.
AO* is the best algorithm for solving cyclic AND-OR graphs.
[Figure: AND-OR graph for the goal "Acquire a TV set", with AND and OR arcs
decomposing the goal into subproblems.]
Disadvantages:
Sometimes, for unsolvable nodes, it cannot find the optimal path. Its complexity is
higher than that of other algorithms.
[Figure: AND-OR graphs in which the goal is decomposed either into A1 AND A2, or A3
alone, and alternatively into A1 alone, or A2 AND A3, with each arc having cost 1.]
In the figure, the top node A has been expanded, producing two arcs, one leading to B
and one leading to C and D. The numbers at each node represent the heuristic cost h at
that node (the cost of getting to the goal state from the current state). For
simplicity, it is assumed that every operation (i.e. applying a rule) has unit cost,
i.e. each arc with a single successor has a cost of 1, and likewise for each of its
components. With the information available so far, it appears that C is the most
promising node to expand, since its h = 3 is the lowest; but going through B would be
better, since to use C we must also use D, and that cost would be 9 (3+4+1+1), whereas
through B it would be 6 (5+1).
Thus the choice of the next node to expand depends not only on its value but also on
whether that node is part of the current best path from the initial node.
In the figure, node G appears to be the most promising node, with the least f' value.
But G is not on the current best path, since to use G we must also use H, with a cost
of 9, and this in turn demands that further arcs be used (with a cost of 27). The path
from A through B, E-F is better, with a total cost of (17+1=18).
2. Pick one of these unexpanded nodes and expand it. Add its successors to the graph
and compute f' (the cost of the remaining distance) for each of them.
Step 1: Evaluate the initial state, if it is goal state then return success and
Stop.
Step 2: Loop Until a solution is found or there is no new operator left to
apply.
Step 3: Select and apply an operator to the current state.
Step 4: Check new state:
a. If it is goal state, then return success and quit.
b.Else if it is better than the current state then assign new state
as a current state.
c. Else if not better than the current state, then return to step2.
Step 5: Exit.
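A minimal Python sketch of the simple hill climbing procedure described above: move to
the first neighbour that improves on the current state, and stop when no operator
yields an improvement (which may be only a local maximum). The objective function and
neighbour rule are assumed for illustration:

def simple_hill_climbing(start, neighbours, value):
    """Move to the first better neighbour; stop at a goal or local maximum."""
    current = start
    while True:
        improved = False
        for candidate in neighbours(current):
            if value(candidate) > value(current):   # better than the current state
                current = candidate
                improved = True
                break                               # take the first improvement found
        if not improved:
            return current

# toy example: maximise f(x) = -(x - 7)**2 over the integers
value = lambda x: -(x - 7) ** 2
neighbours = lambda x: [x - 1, x + 1]
print(simple_hill_climbing(0, neighbours, value))   # climbs from 0 up to 7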
Stochastic hill climbing does not examine all its neighbors before moving. Rather, this
search algorithm selects one neighbor node at random and decides whether to choose it
as the current state or to examine another state.
Hill Climbing
Problems in Hill Climbing Algorithm:
[Figure: Search tree for the 8-puzzle problem generated by the hill climbing procedure,
with heuristic values H=4 at the start, H=3 and H=2 for successors, then H=1 and H=0 at
the goal state; successors with worse values (H=5, H=3) are rejected.]
Tower of Hanoi Problem
Problem Statement: Move a stack of disks from the source peg to the destination peg
using an auxiliary peg, moving one disk at a time and never placing a larger disk on a
smaller one.

Means-Ends Analysis works as follows:
First, evaluate the difference between the Initial State and the Final State.
Select the various operators which can be applied for each difference.
Apply the operator at each difference, which reduces the difference between the current
state and the goal state.
Solution:
To solve the above problem, we will first find the differences between initial
states and goal states, and for each difference, we will generate a new state
and will apply the operators. The operators we have for this problem are:
Move
Delete
Expand
Means-Ends Analysis
1. Evaluating the initial state: In the first step, we will evaluate the initial state and
will compare the initial and Goal state to find the differences between both states.
2. Applying Delete operator: As we can check the first difference is that in goal state
there is no dot symbol which is present in the initial state, so, first we will apply the
Delete operator to remove this dot.
3. Applying Move Operator: After applying the Delete operator, the new state occurs
which we will again compare with goal state. After comparing these states, there is
another difference that is the square is outside the circle, so, we will apply the
Move Operator.
Means-Ends Analysis
4. Applying Expand Operator: Now a new state is generated in the third step, and we will
compare this state with the goal state. After comparing the states there is still one difference
which is the size of the square, so, we will apply Expand operator, and finally, it will generate
the goal state.
Example plan (robot task):
START → 1. WALK(R1) → 2. PICKUP(A) → 3. PUTDOWN(A) → 4. PICKUP(B) → 5. PUTDOWN(B) →
6. PUSH(D,R2) → 7. WALK(R1) → 8. PICKUP(A) → 9. CARRY(A,R2) → 10. PUTDOWN(A) →
11. WALK(R1) → 12. PICKUP(B) → 13. CARRY(B,R2) → 14. PLACE(A,B) → GOAL
Eg.
Solution:
From the first row of the multiplication it is clear that B = 1, as JE * B = JE.
Since, in the multiplication, the second row should start with 0 at the ten's place,
A = 0.
Now in the hundred's place, J + something = 10; when adding something to a single-digit
number gives 10, that digit must be 9, so J = 9.
Cryptarithmetic Problem
Now J + E = 10 + D, i.e. 9 + E = 10 + D. Here E cannot be 0 or 1, as these digits are
already assigned to A and B respectively.
Assume E = 2, which gives 9 + 2 = 11 and hence D = 1, which is not possible; therefore
E cannot be 2.
Assume E = 3, which gives 9 + 3 = 12, hence D = 2.
Hence the solution is obtained.
Solution:
From the first row of the multiplication, H = 1 is clear, as HE x H = HE.
Now H + A = M, i.e. 1 + A = 10 + M, as there is a carry over to the next level;
therefore A = 9, M = 0 and N = 2.
Now HE * E = HHA, i.e. 1E * E = 119, so by trial and error we get E = 7.
Solution:
Set of variables Xi = [Pune, Mumbai, Nasik, Jalgaon, Nagpur]
Set of domains Di = [Red, Green, Blue] for each Xi
Constraint: No adjacent cities have the same color.
Map or Graph Coloring Problem
City / Operation          Pune   Nasik   Mumbai   Nagpur   Jalgaon
Initial Domain            RGB    RGB     RGB      RGB      RGB
Assign Red to Pune        R      GB      GB       RGB      RGB
Assign Green to Nasik     R      G       B        RG       RG
Assign Red to Nagpur      R      G       B        R        G
Assign Green to Jalgaon   R      G       B        R        G
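A minimal Python sketch of solving this constraint satisfaction problem by backtracking
search; the adjacency list below is assumed for illustration, since the map itself is
not reproduced here:

neighbours = {
    'Pune':    ['Mumbai', 'Nasik'],
    'Mumbai':  ['Pune', 'Nasik'],
    'Nasik':   ['Pune', 'Mumbai', 'Jalgaon', 'Nagpur'],
    'Jalgaon': ['Nasik', 'Nagpur'],
    'Nagpur':  ['Nasik', 'Jalgaon'],
}
colours = ['Red', 'Green', 'Blue']

def consistent(city, colour, assignment):
    """The constraint: no two adjacent cities share a colour."""
    return all(assignment.get(n) != colour for n in neighbours[city])

def backtrack(assignment, cities):
    if len(assignment) == len(cities):
        return assignment
    city = next(c for c in cities if c not in assignment)
    for colour in colours:
        if consistent(city, colour, assignment):
            assignment[city] = colour
            result = backtrack(assignment, cities)
            if result:
                return result
            del assignment[city]        # undo the assignment and try the next colour
    return None

print(backtrack({}, list(neighbours)))
# e.g. {'Pune': 'Red', 'Mumbai': 'Green', 'Nasik': 'Blue', 'Jalgaon': 'Red', 'Nagpur': 'Green'}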
-
Python Overview
-
-
Job Trend
According to indeed.com, the percentage growth of Python is 500 times more than that of
its peer languages.
http://www.indeed.com/jobtrends?q=Perl%2C+.Net%2C+Python%2Cjava&l=&relative=1
-
Job In Big Data space
Source: http://www.forbes.com/sites/louiscolumbus/2014/12/29/where-big-data-
jobs-will-be-in- 2015/
-
What is Scripting Language?
-
What is Python?
-
Interpreters Versus Compilers
-
• Create a source file using a text editor
• Use a compiler to syntax-check and convert the source file into binary
• Use a linker to turn the binary files into an executable format
• Run the resulting executable file in the operating system.
-
• The biggest difference between interpreted code and compiled
code is that an interpreted application need not be
“complete.”
• You can test it in bits and pieces until you are satisfied with
the results and put them all together later for the end user to
use.
-
Python Features
-
More Features ..
-
Why Python
Easy to read: Python scripts have clear syntax, a simple structure and very few
protocols to remember before programming.
Easy to maintain: Python code is easy to write and debug. Part of Python's success is
that its source code is fairly easy to maintain.
Portable: Python can run on a wide variety of operating systems and platforms and
provides a similar interface on all platforms.
Broad standard libraries: Python comes with many prebuilt libraries (approximately 21K).
High-level programming: Python is intended to make complex programming simpler. Python
deals with memory addresses, garbage collection, etc. internally.
Interactive: Python provides an interactive shell to test things before implementation.
It provides the user a direct interface with Python.
Database interfaces: Python provides interfaces to all major commercial databases.
These interfaces are pretty easy to use.
GUI programming: Python supports GUI applications and has frameworks for the web.
Interfaces to Tkinter, wxPython, and Django make GUI and web programming in Python easy.
-
History of Python
-
Python Versions
-
Python time line
By Ripal
Ranpara
-
Key Changes in Python 3.0
Python 2's print statement has been replaced by the print() function.
Old: New:
-
Key Changes in Python 3.0
In Python 3, we should enclose the exception argument in parentheses.
Old: New:
Old: New:
The division of two integers returns a float instead of an integer. "//" can be
used to have the "old" behavior.
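A minimal sketch of the three changes described above, written in Python 3 syntax:

print("Hello")                      # print is a function; Python 2 allowed: print "Hello"

try:
    raise ValueError("bad input")   # the exception argument goes in parentheses
except ValueError as err:           # "as err" replaces the old ", err" form
    print(err)

print(7 / 2)    # 3.5 -> division of two integers returns a float in Python 3
print(7 // 2)   # 3   -> "//" keeps the old floor-division behaviour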
-
Python Syntax
-
Basic Syntax
Indentation is used in Python to delimit blocks. The number of spaces is variable, but
all statements within the same block must be indented by the same amount.
The header line of compound statements, such as if, while, def, and class, should be
terminated with a colon ( : ).
The semicolon ( ; ) is optional at the end of a statement.
-
Variables
-
Python Data Types
-
Numbers
Numbers are immutable objects in Python, i.e. they cannot change their values.
There are three built-in data types for numbers in Python 3:
• Integer (int)
• Floating-point numbers (float)
• Complex numbers: <real part> + <imaginary part>j (not used much in Python programming)
Common Number Functions
Function   Description
int(x)     Converts x to an integer
float(x)   Converts x to a floating-point number
abs(x)     The absolute value of x
cmp(x,y)   -1 if x < y, 0 if x == y, or 1 if x > y (Python 2 only)
exp(x)     The exponential of x: e**x
log(x)     The natural logarithm of x, for x > 0
pow(x,y)   The value of x**y
sqrt(x)    The square root of x, for x > 0
(exp, log and sqrt are provided by the math module.)
-
Strings
Python strings are immutable objects that cannot change their values.
String indexes start at 0 at the beginning of the string; negative indexes work their
way from -1 at the end.
-
Strings
String Formatting (the examples assume a = "Hello" and b = "Python")
+    Concatenation - Adds values on either side of the operator: a + b will give HelloPython
*    Repetition - Creates new strings, concatenating multiple copies of the same string: a*2 will give HelloHello
[]   Slice - Gives the character at the given index: a[1] will give e; a[-1] will give o
[:]  Range slice - Gives the characters in the given range: a[1:4] will give ell
in   Membership - Returns True if a character exists in the given string: 'H' in a will give True
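A minimal sketch of these operators in use:

a = "Hello"
b = "Python"

print(a + b)      # HelloPython  (concatenation)
print(a * 2)      # HelloHello   (repetition)
print(a[1])       # e            (indexing)
print(a[-1])      # o            (negative index counts from the end)
print(a[1:4])     # ell          (range slice)
print('H' in a)   # True         (membership)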
-
Strings
Common String Methods
Method Description
str.count(sub, beg= Counts how many times sub occurs in string or in a substring of string if starting index
0,end=len(str)) beg and ending index end are given.
str.isalpha() Returns True if string has at least 1 character and all characters are alphanumeric
and False otherwise.
str.isdigit() Returns True if string contains only digits and False otherwise.
str.lower() Converts all uppercase letters in string to lowercase.
str.upper() Converts lowercase letters in string to uppercase.
str.replace(old, new) Replaces all occurrences of old in string with new.
str.split(str=‘ ’) Splits string according to delimiter str (space if not provided) and returns list
of substrings.
str.strip() Removes all leading and trailing whitespace of string.
str.title() Returns "titlecased" version of string.
-
Lists
A list in Python is an ordered group of items or elements, and these list elements don't have
to be of the same type.
Python Lists are mutable objects that can change their values.
A list contains items separated by commas and enclosed within square brackets.
List indexes, like string indexes, start at 0 at the beginning of the list; negative
indexes work their way from -1 at the end.
Similar to strings, Lists operations include slicing ([ ] and [:]) , concatenation (+),
repetition (*), and membership (in).
This example shows how to access, update and delete list elements:
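A minimal sketch of accessing, updating and deleting list elements; the values are
illustrative:

fruits = ['apple', 'banana', 'cherry', 'mango']

print(fruits[0])        # apple                  (access by index)
print(fruits[1:3])      # ['banana', 'cherry']   (slice)

fruits[1] = 'orange'    # update an element in place (lists are mutable)
fruits.append('grape')  # add an element at the end

del fruits[0]           # delete by index
fruits.remove('mango')  # delete by value

print(fruits)           # ['orange', 'cherry', 'grape']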
-
Lists
Lists can have sublists as elements and these sublists may contain other sublists as
well.
-
Lists
Common List
Methods Method Description
list.append(obj) Appends object obj to list
list.insert(index, obj) Inserts object obj into list at offset index
List comprehension
-
Python Reserved Words

Tuples
Python tuples are immutable objects that cannot be changed once they have been created.
A tuple contains items separated by commas and enclosed in parentheses instead of
square brackets.
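A minimal sketch of a tuple; the values are illustrative:

point = (3, 4, 'label')      # comma-separated items in parentheses
print(point[0], point[-1])   # 3 label  (indexing works like lists)
# point[0] = 5               # would raise TypeError: tuples are immutable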
-
Sets
Sets are used to store multiple items in a single variable.
Set is one of 4 built-in data types in Python used to store collections of data, the other 3
are List, Tuple, and Dictionary, all with different qualities and usage.
A set is a collection which is both unordered and unindexed.
Sets are written with curly brackets.
Example
Create a Set:
thisset = {"apple", "banana", "cherry"}
print(thisset)
Set Items
Set items are unordered, unchangeable, and do
not allow duplicate values.
Unordered
Unordered means that the items in a set do not
have a defined order.
Set items can appear in a different order every
time you use them, and cannot be referred to
by index
or key.
-
Unchangeable
Sets are unchangeable, meaning that we cannot change the items after the set has been created.
Once a set is created, you cannot change its items, but you can add new items.
Hash Table
• Hashing is a technique that is used to uniquely identify a specific object
from a group of similar objects.
•Assume that you have an object and you want to assign a key to it
to make searching easy.
-
Dictionary
Python's dictionaries are a kind of hash table type which consists of key-value pairs
of unordered elements.
• Keys: must be immutable data types, usually numbers or strings.
• Values: can be any arbitrary Python object.
Python dictionaries are mutable objects that can change their values.
A dictionary is enclosed by curly braces ({ }); the items are separated by commas, and
each key is separated from its value by a colon (:).
A dictionary's values can be assigned and accessed using square brackets ([ ]) with a
key to obtain its value.
-
Dictionary
This example shows how to access, update and delete dictionary elements:
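A minimal sketch with illustrative keys and values; the expected output is shown in the
comments:

student = {'name': 'Asha', 'age': 21, 'course': 'AI'}

print(student['name'])        # Asha  (access a value by its key)

student['age'] = 22           # update an existing entry
student['city'] = 'Pune'      # add a new key-value pair

del student['course']         # delete one entry
print(student)                # {'name': 'Asha', 'age': 22, 'city': 'Pune'}

student.clear()               # remove all entries
print(student)                # {}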
-
Dictionary
Common Dictionary Functions
• cmp(dict1, dict2): compares the elements of both dictionaries (Python 2 only).
• len(dict): gives the total number of (key, value) pairs in the dictionary.
-
Python Control Structures
-
Conditionals
In Python, True and False are Boolean objects of class 'bool' and they are immutable.
Python assumes any non-zero and non-null values as True, otherwise it is False value.
Python does not provide switch or case statements as in other languages.
Syntax:
Example:
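A minimal sketch of the if / elif / else form described above; the variable and
thresholds are illustrative:

marks = 72

if marks >= 75:
    grade = 'Distinction'
elif marks >= 40:        # chained elif plays the role of switch/case
    grade = 'Pass'
else:
    grade = 'Fail'

print(grade)             # Pass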
-
Conditionals
-
Loops
-
Loops
Loop Control Statements
break: Terminates the loop statement and transfers execution to the statement
immediately following the loop.
continue: Causes the loop to skip the remainder of its body and immediately retest its
condition prior to reiterating.
pass: Used when a statement is required syntactically but you do not want any command
or code to execute.
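A minimal sketch of the three loop control statements in a for loop:

for n in range(1, 8):
    if n == 6:
        break          # terminates the loop; execution continues after it
    if n % 2 == 0:
        continue       # skips the rest of the body for even numbers
    if n == 5:
        pass           # placeholder that does nothing
    print(n)           # prints 1, 3, 5

while True:
    break              # while loops support the same control statements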
-
Python Functions
-
Functions
A function is a block of organized, reusable code that is used to perform a single, related action.
Functions provide better modularity for your application and a high degree of code reusing.
Defining a Function
• Function blocks begin with the keyword def followed by the function name and parentheses ( (
) ).
• Any input parameters or arguments should be placed within these parentheses. You can also
define parameters inside these parentheses.
• The first statement of a function can be an optional statement - the documentation string of
the function or docstring.
• The code block within every function starts with a colon (:) and is indented.
• The statement return [expression] exits a function, optionally passing back an expression to
the caller. A return statement with no arguments is the same as return None.
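A minimal sketch of a function definition following these rules (keyword def, a
docstring, an indented block, and a return statement); the names are illustrative:

def add(a, b=10):
    """Return the sum of a and b (this string is the function's docstring)."""
    return a + b

print(add(5))         # 15 -> b falls back to its default value
print(add(5, 2))      # 7  -> required (positional) arguments
print(add(b=1, a=4))  # 5  -> keyword arguments in any order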
-
Functions
Function Syntax
Function Arguments
You can call a function by using any of the following types of arguments:
• Required arguments: the arguments passed to the function in correct
positional order.
• Keyword arguments: the function call identifies the arguments by the
parameter names.
• Default arguments: the argument has a default value in the function
declaration used when the value is not provided in the function call.
-
Functions
• Variable-length arguments: These are used when you need to process unspecified
additional arguments. An asterisk (*) is placed before the variable name in the
function declaration, as in the sketch below.
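A minimal sketch of a variable-length argument list:

def total(*numbers):          # the * collects extra positional arguments into a tuple
    return sum(numbers)

print(total(1, 2, 3, 4))      # 10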
-
Python File Handling
-
File Handling
-
File Handling
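A minimal file-handling sketch: writing to and reading from a file with a context
manager. The file name is illustrative:

with open("notes.txt", "w") as f:    # "w" = write mode; creates the file if absent
    f.write("Python file handling\n")

with open("notes.txt", "r") as f:    # "r" = read mode
    print(f.read())                   # prints the file's contents

# The with-statement closes the file automatically, even if an error occurs.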
-
Python Exception Handling
-
Exception Handling
Common Exceptions in Python:
NameError - TypeError - IndexError - KeyError - Exception
Exception Handling Syntax:
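A minimal sketch of the try / except / else / finally syntax:

try:
    value = int("42x")          # raises ValueError
except (ValueError, TypeError) as err:
    print("Conversion failed:", err)
else:
    print("Converted:", value)  # runs only if no exception was raised
finally:
    print("Done")               # always runs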
-
EXCEPTION NAME      DESCRIPTION
Exception           Base class for all exceptions.
StopIteration       Raised when the next() method of an iterator does not point to any object.
SystemExit          Raised by the sys.exit() function.
StandardError       Base class for all built-in exceptions except StopIteration and SystemExit.
ArithmeticError     Base class for all errors that occur for numeric calculation.
OverflowError       Raised when a calculation exceeds the maximum limit for a numeric type.
FloatingPointError  Raised when a floating point calculation fails.
ZeroDivisionError   Raised when division or modulo by zero takes place for all numeric types.
AssertionError      Raised in case of failure of the assert statement.
AttributeError      Raised in case of failure of attribute reference or assignment.
EOFError            Raised when there is no input from either the raw_input() or input() function and the end of file is reached.
ImportError         Raised when an import statement fails.
KeyboardInterrupt   Raised when the user interrupts program execution, usually by pressing Ctrl+C.
LookupError         Base class for all lookup errors.
IndexError          Raised when an index is not found in a sequence.
KeyError            Raised when the specified key is not found in the dictionary.
NameError           Raised when an identifier is not found in the local or global namespace.
UnboundLocalError   Raised when trying to access a local variable in a function or method but no value has been assigned to it.
EnvironmentError    Base class for all exceptions that occur outside the Python environment.
IOError             Raised when an input/output operation fails, such as the print statement or the open() function when trying to open a file that does not exist.
OSError             Raised for operating system-related errors.
SyntaxError         Raised when there is an error in Python syntax.
IndentationError    Raised when indentation is not specified properly.
SystemError         Raised when the interpreter finds an internal problem, but when this error is encountered the Python interpreter does not exit.
SystemExit          Raised when the Python interpreter is quit by using the sys.exit() function. If not handled in the code, causes the interpreter to exit.
-
Modules
A module is a file consisting of Python code that can define functions, classes and
variables.
A module allows you to organize your code by grouping related code which
makes the code
easier to understand and use.
You can use any Python source file as a module by executing an import statement
Python's from statement lets you import specific attributes from a module into the
current namespace.
import * statement can be used to import all names from a module into the
current
namespace
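A minimal sketch of the import and from statements:

import math                 # import a whole module
from math import sqrt       # import a specific attribute into the current namespace

print(math.pi)              # 3.141592653589793
print(sqrt(16))             # 4.0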
-
Python Object Oriented
-
Python Classes
Class variable
Class constructor
Output
-
Python Classes
Data Hiding: You need to name attributes with a double underscore prefix; those
attributes are then not directly visible to outsiders.
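A minimal sketch showing a class variable, a constructor, and data hiding with the
double underscore prefix; the names and values are illustrative:

class Employee:
    count = 0                            # class variable shared by all instances

    def __init__(self, name, salary):    # constructor
        self.name = name
        self.__salary = salary           # double underscore -> name-mangled, "hidden"
        Employee.count += 1

    def display(self):
        print(self.name, self.__salary)

e = Employee("Asha", 50000)
e.display()                 # Asha 50000
print(Employee.count)       # 1
# print(e.__salary)         # AttributeError: not directly visible to outsiders
print(e._Employee__salary)  # 50000 -> still reachable via name mangling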
-
Class Inheritance
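A minimal sketch of inheritance and method overriding; the class names are
illustrative:

class Animal:                       # parent (base) class
    def speak(self):
        return "..."

class Dog(Animal):                  # child class inherits from Animal
    def speak(self):                # overrides the parent method
        return "Woof"

print(Dog().speak())                # Woof
print(isinstance(Dog(), Animal))    # True -> a Dog is also an Animal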
-
Python vs. Java
Code Examples
-
Python vs. Java
Hello World
Java
Python
String Operations
Java
Python
-
Python vs. Java
Collections
Java
Python
-
Python vs. Java
Class and Inheritance
Java
Python
-
Python Useful Tools
-
Useful Tools
Python IDEs
•Vim
•Eclipse with
PyDev
•Sublime Text
•Emacs
•Komodo Edit
•PyCharm
-
Useful Tools
-
Who Uses Python?
-
Organizations Use Python
-
Thank You
-
-
⦿ Machine learning is about extracting knowledge from the data. It can be defined
as,
⦿ Machine learning is a subfield of artificial intelligence, which enables machines to
learn from past data or experiences without being explicitly programmed.
⦿ Machine learning enables a computer system to make predictions or take some
decisions using historical data without being explicitly programmed. Machine
learning uses a massive amount of structured and semi-structured data so that a
machine learning model can generate accurate result or give predictions based on
that data.
⦿ Machine learning works on algorithms which learn on their own using historical data.
It works only for specific domains: for example, if we are creating a machine learning
model to detect pictures of dogs, it will only give results for dog images, and if we
provide new data such as a cat image, it will become unresponsive. Machine learning is
being used in various places, such as online recommender systems, Google search
algorithms, email spam filters, Facebook auto friend tagging suggestions, etc.
It can be divided into three types:
⦿ Supervised learning
⦿ Reinforcement learning
⦿ Unsupervised learning
-
-
-
⦿ A Machine Learning system learns from historical data, builds
the prediction models, and whenever it receives new data,
predicts the output for it. The accuracy of predicted output
depends upon the amount of data, as the huge amount of data
helps to build a better model which predicts the output more
accurately.
⦿ Suppose we have a complex problem, where we need to perform
some predictions, so instead of writing a code for it, we just need
to feed the data to generic algorithms, and with the help of these
algorithms, machine builds the logic as per the data and predict
the output. Machine learning has changed our way of thinking
about the problem. The below block diagram explains the
working of Machine Learning algorithm:
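A minimal sketch of this idea using scikit-learn (an assumed library, not named in
these notes): historical data goes in, the algorithm builds the logic, and the fitted
model predicts the output for new data without explicitly programmed rules. The data
and feature names are invented for illustration:

from sklearn.tree import DecisionTreeClassifier

# toy historical data: [weight_kg, ear_length_cm] -> label (0 = cat, 1 = dog)
X = [[4, 6], [5, 7], [20, 12], [25, 14], [3, 5], [30, 15]]
y = [0, 0, 1, 1, 0, 1]

model = DecisionTreeClassifier()
model.fit(X, y)                      # the algorithm builds the logic from the data

print(model.predict([[22, 13]]))     # [1] -> predicted to be a dog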
-
Features of Machine Learning:
Machine learning uses data to detect various patterns in a given dataset.
It can learn from past data and improve automatically.
It is a data-driven technology.
Machine learning is much similar to data mining as it also deals with the
huge amount of the data.
-
Fig:- Block diagram of decision flow architecture for ML System
-
⦿ 1. Data Acquisition
As machine learning is based on the data available to the system for making decisions, the first step defined in the architecture is data acquisition. This involves collecting the data, preparing and segregating the case scenarios based on the features involved in the decision-making cycle, and forwarding the data to the processing unit for further categorization. This stage is sometimes called the data preprocessing stage. The data model expects reliable, fast and elastic data, which may be discrete or continuous in nature. The data is then passed into stream processing systems (for continuous data) and stored in batch data warehouses (for discrete data) before being passed on to the data modeling or processing stages.
⦿ 2. Data Processing
The received data in the data acquisition layer is then sent forward to the data
processing layer where it is subjected to advanced integration and processing and
involves normalization of the data, data cleaning, transformation, and encoding.
The data processing is also dependent on the type of learning being used. For example, if supervised learning is being used, the data needs to be segregated into the sample data required for training the system; the data thus created is called training sample data or simply training data.
-
⦿ 3. Data Modeling
This layer of the architecture involves the selection of the different algorithms that might adapt the system to address the problem for which the learning is being devised. These algorithms are either developed or inherited from a set of libraries. The algorithms are used to model the data accordingly; this makes the system ready for the execution step.
⦿ 4. Execution
This stage in machine learning is where the experimentation is done, testing is involved and tunings are performed. The general goal is to optimize the algorithm in order to extract the required machine outcome and maximize the system performance. The output of this step is a refined solution capable of providing the required data for the machine to make decisions.
⦿ 5. Deployment
Like any other software output, ML outputs need to be operationalized or forwarded for further exploratory processing. The output can be considered as a non-deterministic query which needs to be further deployed into the decision-making system. It is advised to seamlessly move the ML output directly to production, where it will enable the machine to directly make decisions based on the output and reduce the dependency on further exploratory steps.
-
Machine Learning Applications in Healthcare :-
Doctors and medical practitioners will soon be able to predict with
accuracy on how long patients with fatal diseases will live. Medical
systems will learn from data and help patients save money by skipping
unnecessary tests.
i) Drug Discovery/Manufacturing
ii) Personalized Treatment/Medication
-
Machine Learning Applications in Retail :-
Machine learning in retail is more than just the latest trend; retailers are implementing big data technologies like Hadoop and Spark to build big data solutions. Machine learning algorithms process this data intelligently and automate the analysis to make this ambitious goal possible for retail giants like Amazon, Alibaba and Walmart.
i) Machine Learning Examples in Retail for Product
Recommendations
ii)Machine Learning Examples in Retail for Improved Customer
Service.
-
Machine Learning Applications in Media :-
Machine learning offers the most efficient means of engaging billions of
social media users. From personalizing news feed to rendering targeted
ads, machine learning is the heart of all social media platforms for their
own and user benefits. Social media and chat applications have advanced to such an extent that users do not pick up the phone or use email to communicate with brands – they leave a comment on Facebook or Instagram expecting a speedier reply than through traditional channels.
⦿ Earlier, Facebook used to prompt users to tag their friends, but nowadays the social network's artificial neural network machine learning algorithm identifies familiar faces from the contact list. The ANN (Artificial Neural Network) algorithm mimics the structure of the human brain to power facial recognition.
⦿ The professional network LinkedIn knows where you should apply for your next job, whom you should connect with, and how your skills stack up against your peers as you search for a new job.
-
Let’s understand the type of data available in the datasets from the
perspective of machine learning.
1. Numerical Data :-
Any data points which are numbers are termed numerical data. Numerical data can be discrete or continuous. Continuous data can take any value within a given range, while discrete data takes distinct values. For example, the number of doors of a car will be discrete, i.e. two, four, six, etc., while the price of the car will be continuous, e.g. $1000 or $1250.50. The data type of numerical data is int64 or float64.
2. Categorical Data :-
Categorical data are used to represent characteristics, for example car colour, date of manufacture, etc. It can also be a numerical value, provided the numerical value indicates a class. For example, 1 can be used to denote a gas car and 0 a diesel car. We can use categorical data to form groups but cannot perform any mathematical operations on them. Its data type is object.
-
3. Time Series Data :-
It is a collection of a sequence of numbers collected at regular intervals over a certain period of time. It is very important in fields like the stock market, where we need the price of a stock after a constant interval of time. This type of data has a temporal field attached to it so that the timestamp of the data can be easily monitored.
4. Text Data :-
Text data is nothing but literals. The first step in handling text data is to convert it into numbers, as our model is mathematical and needs data in the form of numbers. To do so we might use techniques such as the bag-of-words formulation (a small sketch follows).
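A small bag-of-words sketch with scikit-learn (the sentences are made up for illustration):
from sklearn.feature_extraction.text import CountVectorizer

texts = ["the cat sat on the mat", "the dog sat on the log"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(texts)        # each sentence becomes a vector of word counts
print(vectorizer.get_feature_names_out())       # the vocabulary learned from the text
print(counts.toarray())                         # the numeric representation fed to the model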
-
⦿ ML Dataset :-
-
⦿ Types of datasets :-
1. Training Dataset: This dataset is used to train the model, i.e. these data are used to update the weights of the model.
2. Validation Dataset: This dataset is used to evaluate the model during training and to tune its hyperparameters.
3. Test Dataset: Most of the time, when we try to make changes to the model based upon the output of the validation set, we unintentionally make the model peek into our validation set and, as a result, our model might get overfit on the validation set as well. To overcome this issue we have a test dataset that is only used to test the final output of the model in order to confirm its accuracy.
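A quick sketch of carving a dataset into these parts with scikit-learn (the toy arrays and the 60/20/20 proportions are only an example):
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # toy feature matrix
y = np.arange(10)                  # toy labels

# first hold out a test set, then split the remainder into training and validation sets
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
# 0.25 of the remaining 80% gives roughly a 60/20/20 train/validation/test split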
-
⦿ Machine learning life cycle is a cyclic process to build an efficient
machine learning project. The main purpose of the life cycle is to find a
solution to the problem or project.
⦿ Machine learning life cycle involves seven major steps, which are given
below:
⦿ Gathering Data
⦿ Data preparation
⦿ Data Wrangling
⦿ Analyze Data
⦿ Train the model
⦿ Test the model
⦿ Deployment
-
-
⦿ In the complete life cycle process, to solve a problem, we create a
machine learning system called "model", and this model is created by
providing "training". But to train a model, we need data, hence, life cycle
starts by collecting data.
⦿ The most important thing in the complete process is to understand the
problem and to know the purpose of the problem.
1. Gathering Data:
⦿ Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify and obtain all the data relevant to the problem.
⦿ In this step, we need to identify the different data sources, as data can be collected from various sources such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle. The quantity and quality of the collected data will determine the efficiency of the output: the more data we have, the more accurate the prediction will be.
-
⦿ This step includes the below tasks:
Identify various data sources
Collect data
Integrate the data obtained from different sources
By performing the above tasks, we get a coherent set of data, also called a dataset. It will be used in further steps.
2. Data preparation :
⦿ After collecting the data, we need to prepare it for further steps. Data preparation is
a step where we put our data into a suitable place and prepare it to use in our
machine learning training.
⦿ In this step, first, we put all data together, and then randomize the ordering of data.
⦿ This step can be further divided into two processes:
⦿ Data exploration:
It is used to understand the nature of data that we have to work with. We need to
understand the characteristics, format, and quality of data.
A better understanding of data leads to an effective outcome. In this, we find
Correlations, general trends, and outliers.
⦿ Data pre-processing:
Now the next step is preprocessing of data for its analysis.
-
3. Data Wrangling :
⦿ Data wrangling is the process of cleaning and converting raw data into a usable format. It is the process of cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. It is one of the most important steps of the complete process. Cleaning of data is required to address the quality issues.
⦿ It is not necessary that data we have collected is always of our use as some of the
data may not be useful. In real-world applications, collected data may have various
issues, including:
⦿ Missing Values
⦿ Duplicate data
⦿ Invalid data
⦿ Noise
-
4. Data Analysis :
⦿ Now the cleaned and prepared data is passed on to the analysis step. This
step involves:
⦿ Selection of analytical techniques
⦿ Building models
⦿ Review the result
⦿ The aim of this step is to build a machine learning model to analyze the
data using various analytical techniques and review the outcome. It starts
with the determination of the type of the problems, where we select the
machine learning techniques such as Classification, Regression, Cluster
analysis, Association, etc. then build the model using prepared data, and
evaluate the model.
⦿ Hence, in this step, we take the data and use machine learning algorithms
to build the model.
-
5. Train Model :
⦿ Now the next step is to train the model. In this step we train our model to improve its performance for a better outcome of the problem.
⦿ We use datasets to train the model using various machine learning algorithms. Training a model is required so that it can understand the various patterns, rules, and features.
6. Test Model :
⦿ Once our machine learning model has been trained on a given dataset,
then we test the model. In this step, we check for the accuracy of our model
by providing a test dataset to it.
⦿ Testing the model determines the percentage accuracy of the model as per
the requirement of project or problem.
-
7. Deployment :
⦿ The last step of machine learning life cycle is deployment, where we
deploy the model in the real-world system.
⦿ If the above-prepared model is producing an accurate result as per our
requirement with acceptable speed, then we deploy the model in the real
system. But before deploying the project, we will check whether it is
improving its performance using available data or not. The deployment
phase is similar to making the final report for a project.
-
⦿ Pre-processing refers to the changes applied to our data before feeding it
to the ML algorithm. Data pre-processing is a technique that is used to
convert the created or collected (raw) data into a clean data set. In other
words, whenever the data is gathered from different sources it is collected
in raw format which is not feasible for the analysis or processing by ML
model. Following figure shows transformation processing performed on
raw data before, during and after applying ML techniques:
-
⦿ Data Pre-processing in Machine Learning can be broadly divided into 3
main parts –
1. Data Integration
2. Data Cleaning
3. Data Transformation
-
-
1. Data Integration and formatting :-
During hackathons and competitions, we often deal with a single CSV or Excel file containing all the training data. But in the real world, the source of data might not be this simple. In real life, we might have to extract data from various sources and integrate it.
2. Data Cleaning :-
2.1 Dealing with Missing data :-
It is common to have some missing or null data in a real-world data set. Most machine learning algorithms will not work with such data, so it becomes important to deal with missing or null values. Some common measures taken are:
⦿ Get rid of a column if it has plenty of rows with null values.
⦿ Eliminate a row if it has plenty of columns with null values.
⦿ Replace the missing value with the mean, median or mode of that column, depending on the data distribution in that column.
-
⦿ In the case of a categorical feature column, by substituting the missing values with 'NA' or 'Unknown' or some other relevant term, we can treat missing data as a new category in itself.
⦿ Another method is to come up with educated guesses of possible candidates, replacing the missing value by applying regression or classification techniques (a small pandas sketch of the simpler measures is given below).
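A hedged pandas sketch of the simpler measures above (the column names and values are hypothetical):
import numpy as np
import pandas as pd

df = pd.DataFrame({'age':  [25, np.nan, 30, 28],
                   'city': ['Pune', 'Delhi', None, 'Mumbai']})

df = df.dropna(how='all')                        # drop rows in which every value is null
df['age'] = df['age'].fillna(df['age'].mean())   # replace missing numeric values with the column mean
df['city'] = df['city'].fillna('Unknown')        # treat missing categories as a new 'Unknown' category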
-
⦿ Binning Method:
First sort the data and partition it into bins.
Then one can smooth by bin means, bin medians or bin boundaries.
For example:
• Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
• Partition into (equi-depth) bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
• Smoothing by bin means:
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
• Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
⦿ Regression Method: the data can also be smoothed by fitting it to a regression function (e.g. linear regression) and using the fitted values in place of the noisy ones.
-
2.3 Remove Outliers from Data :-
⦿ Outliers are those observations that have extreme values, much beyond the normal range of values for that feature. For example, the very high salary of the CEO of a company can be an outlier if we consider the salaries of the other regular employees of the company.
⦿ Even few outliers in data set can contribute to poor accuracy of machine
learning model. The common methods to detect outliers and remove
them are –
⦿ Standard Deviation
⦿ Box Plot
-
⦿ Standard Deviation
In statistics, if a data distribution is approximately normal then about 68%
of the data values lie within one standard deviation of the mean and about
95% are within two standard deviations, and about 99.7% lie within three
standard deviations.
-
⦿ Therefore, if you have any data point that is more than 3 standard deviations away from the mean, then that point is very likely to be anomalous or an outlier.
⦿ Box Plots:
⦿ Box plots are a graphical depiction of numerical data through their quartiles. It is a very simple but effective way to visualize outliers. Think of the lower and upper whiskers as the boundaries of the data distribution. Any data points that show up above or below the whiskers can be considered outliers or anomalous.
⦿ The concept of the Interquartile Range (IQR) is used to build the box plot graphs. IQR is a concept in statistics that is used to measure the statistical dispersion and data variability by dividing the dataset into quartiles.
⦿ In simple words, any dataset or any set of observations is divided into four defined intervals based upon the values of the data and how they compare to the entire dataset. The quartiles are the three points that divide the data into four intervals.
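A brief sketch of both detection rules on a made-up salary column with one CEO-style value:
import numpy as np

rng = np.random.default_rng(0)
salaries = np.append(rng.normal(loc=33, scale=2, size=200), 300)   # 200 ordinary salaries plus one extreme value

# 1. Standard deviation rule: flag points more than 3 standard deviations from the mean
mean, std = salaries.mean(), salaries.std()
sd_outliers = salaries[np.abs(salaries - mean) > 3 * std]

# 2. IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(salaries, [25, 75])
iqr = q3 - q1
iqr_outliers = salaries[(salaries < q1 - 1.5 * iqr) | (salaries > q3 + 1.5 * iqr)]

print(sd_outliers, iqr_outliers)    # the extreme value of 300 is flagged by both rules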
-
-
2.4 Dealing with Duplicate Data :-
The approach to dealing with duplicate data depends on whether the duplicate data represents a real-world scenario or is more of an inconsistency. If it is the former, the duplicate data should be conserved; otherwise it should be removed (see the pandas sketch below).
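When the duplicates are genuine inconsistencies, pandas can drop them in one line (the toy frame is illustrative):
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 2, 3],
                   'city': ['Pune', 'Delhi', 'Delhi', 'Mumbai']})
df = df.drop_duplicates()     # keeps the first occurrence and drops exact duplicate rows
print(df)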
3.Data Transformation :-
-
3.1 Feature Scaling :-
Huge differences in the ranges of features in a data set can distort the training of a machine learning model. So, we need to bring the ranges of all the features to a common scale. The common approaches to feature scaling are –
⦿ Mean Normalization
⦿ Min-Max Normalization
⦿ Z-Score Normalization or Standardization,
We will give a brief overview with examples of these approaches in the last section of this unit; a quick NumPy preview follows.
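As a preview, the three rescalings can be written directly in NumPy (x is a toy feature column):
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])          # toy feature column

mean_norm = (x - x.mean()) / (x.max() - x.min())      # Mean Normalization
min_max   = (x - x.min()) / (x.max() - x.min())       # Min-Max Normalization (values land in [0, 1])
z_score   = (x - x.mean()) / x.std()                  # Z-Score Normalization / Standardization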
3.2 Dealing with categorical data :-
⦿ Categorical data, also known as qualitative data, are text or string-based data. Examples of categorical data are the gender of persons (Male or Female), names of places (India, America, England), and the colour of a car (Red, White).
⦿ Most machine learning algorithms work on numerical data only and will not be able to process categorical data. So, we need to transform categorical data into numerical form without losing the sense of the information. Below are the popular approaches to convert categorical data into numerical form –
⦿ Label Encoding
⦿ One Hot Encoding
⦿ Binary Encoding
-
3.3 Dealing with Imbalanced Data Set :-
⦿ An imbalanced data set is a type of data set in which most of the data belongs to only one class and very little data belongs to the other class. This is common in medical diagnosis and anomaly detection, where the data belonging to the positive class is a very small percentage.
⦿ For e.g. only 5-10% of data might belong to a disease positive class which
can be an expected distribution in medical diagnosis. But this skewed
data distribution can trick the machine learning model in training phase
to only identify the majority classes and it fails to learn the minority
classes. For example, the model might fail to identify the medical
condition even though it might be showing a very high accuracy by
identifying negative scenarios.
⦿ We need to do something about an imbalanced data set to avoid a bad machine learning model. Below are some approaches to deal with such a situation (a small resampling sketch follows the list) –
⦿ Under Sampling Majority Class
⦿ Over Sampling Minority Class
⦿ SMOTE (Synthetic Minority Oversampling Technique)
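A rough sketch of random under- and over-sampling using sklearn.utils.resample; SMOTE itself lives in the separate imbalanced-learn package and is not shown here (the toy frame is hypothetical):
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({'feature': range(20),
                   'label':   [0] * 18 + [1] * 2})    # 18 negatives, 2 positives: imbalanced

majority = df[df.label == 0]
minority = df[df.label == 1]

# Under-sample the majority class down to the minority size
under = resample(majority, replace=False, n_samples=len(minority), random_state=0)
balanced_under = pd.concat([under, minority])

# Over-sample the minority class up to the majority size
over = resample(minority, replace=True, n_samples=len(majority), random_state=0)
balanced_over = pd.concat([majority, over])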
-
3.4 Feature Engineering :-
⦿ Feature engineering is the art of creating new features from the given data by applying some domain knowledge, some common sense, or both.
⦿ A very common example of feature engineering is converting a Date feature into additional features like Day, Week, Month and Year, thus adding more information to the data set (a small pandas sketch follows).
⦿ Feature engineering enriches the data set with more information that can help the model learn better.
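The Date example in pandas might look like this (the dates are made up):
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2021-01-15', '2021-06-30'])})

df['day']   = df['date'].dt.day                    # new Day feature
df['week']  = df['date'].dt.isocalendar().week     # new Week feature
df['month'] = df['date'].dt.month                  # new Month feature
df['year']  = df['date'].dt.year                   # new Year feature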
-
⦿ Execution of Data Pre-processing methods using Python
commonly involves
following steps:
Importing the libraries
Importing the Dataset
Handling of Missing Data
Handling of Categorical Data
Splitting the dataset into training and testing datasets
Feature Scaling
For this Data Pre-processing script, we are going to use Anaconda Navigator and specifically Spyder (IDE) to write the following code.
-
Importing the libraries :-
import numpy as np                                               # used for handling numbers
import pandas as pd                                              # used for handling the dataset
from sklearn.impute import SimpleImputer                         # used for handling missing data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder    # used for encoding categorical data
from sklearn.model_selection import train_test_split             # used for splitting data into training and testing sets
from sklearn.preprocessing import StandardScaler                 # used for feature scaling
from sklearn.compose import ColumnTransformer                    # used to apply transformers to columns of an array
-
Importing the Dataset :-
⦿ First of all, let us have a look at the dataset we are going to use for this particular example. You can download or take this dataset from:
https://github.com/tarunlnmiit/machine_learning/blob/master/DataPreprocessing.csv
⦿ It is as shown below:
-
⦿ By pressing the raw button in the link, copy this dataset and store it in a Data.csv file, in the folder where your program is stored.
⦿ In order to import this dataset into our script, we are going to use pandas as follows.
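The import snippet itself is not reproduced in these notes; a typical version, where the column positions are an assumption based on the dataset described above, would be:
dataset = pd.read_csv('Data.csv')     # the file saved from the raw GitHub page
X = dataset.iloc[:, :-1].values       # independent variables (all columns except the last)
Y = dataset.iloc[:, -1].values        # dependent variable (the last column)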
⦿ When you run this code section along with the libraries, you should not see any errors. When it executes successfully, you can move to the variable explorer in the Spyder UI and you will see the following three variables.
-
-
⦿ When you double click on each of these variables, you
should see something similar.
-
Handling of Missing Data :-
⦿ The first idea is to remove the lines of the observations where there is some missing data. But that can be quite dangerous, because imagine this data set contains key information; it would be risky to remove such observations. So, we need to figure out a better way to handle this problem, and the most common idea for handling missing data is to take the mean of the column, as discussed in the earlier section.
⦿ If you noticed, in our dataset we have two values missing: one in the Age column at the 6th data index and one in the Income column in the 4th data row. Missing values should be handled during the data analysis. So, we do that as follows.
-
# handling the missing data: replace missing values (np.nan) with the mean of the other values in the column
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:, 1:])
X[:, 1:] = imputer.transform(X[:, 1:])
-
Here you can see, that the missing values have been replaced by
the average
values of the respective columns.
-
Handling of Categorical Data :-
In this dataset we can see that we have two categorical columns: Region and Online Shopper. These need to be encoded into numbers, which is done as follows.
-
# encode categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
rg = ColumnTransformer([("Region", OneHotEncoder(), [0])], remainder='passthrough')
X = rg.fit_transform(X)
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)
-
-
⦿ Here, you can see that the Region variable is now made up of a 3
bit binary variable. The left most bit represents India, 2nd
bit represents Brazil and the last bit represents USA. If
the bit is 1 then it represents data for that country otherwise
not.
For Online Shopper variable, 1 represents Yes and 0 represents
No.
-
Splitting the dataset into training and testing datasets :-
⦿ Any machine learning algorithm needs to be tested for accuracy. In order to do that, we divide our data set into two parts: a training set and a testing set. As the name itself suggests, we use the training set to make the algorithm learn the behaviours present in the data and check the correctness of the algorithm by testing on the testing set. In Python, we do that as follows:
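The splitting call itself does not appear in these notes; a typical call would be the following, after which the features are scaled:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
# test_size=0.2 keeps 20% of the rows aside for testing (an illustrative choice)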
# feature scaling
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
⦿ After the execution of this code, our training independent variables X_train and our testing independent variables X_test look like this.
-
⦿ In the older days, people used to perform Machine Learning tasks by
manually coding all the algorithms and mathematical and statistical
formula. This made the process time consuming, tedious and inefficient.
But in the modern days, it has become much easier and more efficient thanks to various Python libraries, frameworks, and modules. Today, Python is one of the most popular programming languages for this task and it has replaced many languages in the industry; one of the reasons is its vast collection of libraries. Python libraries that are used in Machine Learning are:
⦿ Numpy
⦿ Scipy
⦿ Scikit-learn
⦿ Theano
⦿ TensorFlow
⦿ Keras
⦿ PyTorch
⦿ Pandas
⦿ Matplotlib
-
⦿ The single most important reason for the popularity of Python in the field
of AI and ML is the fact that Python provides 1000s of inbuilt libraries that
have in-built functions and methods to easily carry out data analysis,
processing, wrangling, modelling and so on.
In the below section, we’ll discuss the libraries for the following tasks:
1. Statistical Analysis
2. Data Visualization
3. Data Modelling and Machine Learning
4. Deep Learning
5. Natural Language Processing (NLP)
-
1.Statistical Analysis:
Python comes with tons of libraries for the sole purpose of statistical
analysis. Top statistical packages that provide in-built functions to perform
the most complex statistical computations are:
-
Pandas :- Pandas is another important statistical library mainly used in a
wide range of fields including, statistics, finance, economics, data analysis
and so on. The library relies on the NumPy array for the purpose of
processing pandas data objects. NumPy, Pandas, and SciPy are heavily
dependent on each other for performing scientific computations, data
manipulation and so on. Pandas is one of the best libraries for processing
huge chunks of data, whereas NumPy has excellent support for multi-
dimensional arrays and Scipy, on the other hand, provides a set of sub-
packages that perform a majority of the statistical analysis tasks.
StatsModels :- Built on top of NumPy and SciPy, the StatsModels Python
package is the best for creating statistical models, data handling and
model evaluation. Along with using NumPy arrays and scientific models
from SciPy library, it also integrates with Pandas for effective data
handling. This library is famously known for statistical computations,
statistical testing, and data exploration.
-
2. Data Visualization:
A picture speaks more than a thousand words. Data visualization is all
about expressing the key insights from data effectively through graphical
representations. It includes the implementation of graphs, charts, mind
maps, heat-maps, histograms, density plots, etc, to study the correlations
between various data variables.
Best Python data visualization packages that provide in-built functions to
study the dependencies between various data features are:
Matplotlib :-Matplotlib is the most basic data visualization package in
Python. It provides support for a wide variety of graphs such as
histograms, bar charts, power spectra, error charts, and so on. It is a 2
Dimensional graphical library that produces clear and concise graphs
that are essential for Exploratory Data Analysis (EDA).
Seaborn :-The Matplotlib library forms the base of the Seaborn library. In
comparison to Matplotlib, Seaborn can be used to create more appealing
and descriptive statistical graphs. Along with extensive supports for data
visualization, Seaborn also comes with an inbuilt data set oriented API for
studying the relationships between multiple variables.
-
Plotly :- Plotly is one of the most well-known graphical Python libraries. It provides interactive graphs for understanding the dependencies between target and predictor variables. It can be used to analyze and visualize statistical, financial, commerce and scientific data to produce clear and concise graphs, sub-plots, heatmaps, 3D charts and so on.
-
3. Machine Learning :
Implementing ML, DL, etc. involves coding 1000s of lines of code and this
can become more cumbersome when you want to create models that
solve complex problems through neural networks. But thankfully we don’t
have to code any algorithms because Python comes with several
packages just for the purpose of implementing machine learning
techniques and algorithms.
Top ML packages that provide in-built functions to implement all the ML
algorithms:
-
XGBoost :-XGBoost which stands for Extreme Gradient Boosting is one of
the best Python packages for performing Boosting Machine Learning.
Libraries such as LightGBM and CatBoost are also equally equipped with
well-defined functions and methods. This library is built mainly for the
purpose of implementing gradient boosting machines which are used to
improve the performance and accuracy of Machine Learning Models.
-
4.Deep Learning :
The biggest advancements in ML and AI have been through deep learning. With the introduction of deep learning, it is now possible to build complex models and process humongous data sets. Thankfully, Python
provides the best deep learning packages that help in building effective
neural networks.
Top deep learning packages that provide in-built functions to implement
convoluted Neural Networks are:
-
Pytorch :-Pytorch is an open-source, Python-based scientific computing
package that is used to implement Deep Learning techniques and Neural
Networks on large datasets. This library is actively used by Facebook to
develop neural networks that help in various tasks such as face
recognition and auto-tagging.
-
5.Natural Language Processing:
Have you ever wondered how Google so aptly predicts what you're searching for? The technology behind Alexa, Siri, and other chatbots is
Natural Language Processing. NLP has played a huge role in designing AI-
based systems that help in describing the interaction between human
language and computers.
Top Natural Language Processing packages that provide in-built functions
to implement high-level AI-based systems are:
-
spaCy:- spaCy is a free, open-source Python library for implementing
advanced Natural Language Processing (NLP) techniques. When you’re
working with a lot of text it is important that you understand the
morphological meaning of the text and how it can be classified to
understand human language. These tasks can be easily achieved through spaCy.
-
Thanks !!!
-
-
Types of ML :-
⦿ There are four types of machine learning:
1. Supervised Learning:
⦿ Supervised Learning is the one where you can consider the learning to be guided by a teacher. We have a dataset which acts as a teacher and its role is to train the model or the machine. Once the model gets trained, it can start making a prediction or decision when new data is given to it.
⦿ Supervised learning uses labelled training data to learn the mapping
function that turns input variables (X) into the output variable (Y). In
other words, it solves for f in the following equation:
Y = f (X)
⦿ This allows us to accurately generate outputs when given new inputs.
-
⦿ Two types of supervised learning are: classification and regression.
-
⦿ Thus, in supervised Machine Learning,
⦿ "The outcome or output for the given input is known beforehand", and the machine must be able to map or assign the given input to the output. For example, multiple labelled images of a cat, dog, orange, apple etc. are fed into the machine for training, and the machine must identify them. Just like a human child who is shown a cat and told so: when it later sees a completely different cat among others, it still identifies it as a cat. The same method is employed here. In short, Supervised Learning means – Train Me!
-
2.Unsupervised Learning:
⦿ Unsupervised learning models are used when we only have the input
variables (X) and no corresponding output variables.
⦿ They use unlabelled training data to model the underlying structure of the data. Input data is given and the model is run on it. The images or inputs given are mixed together, and insights about the inputs can be found.
⦿ The model learns through observation and finds structures in the data. Once the model is given a dataset, it automatically finds patterns and relationships in the dataset by creating clusters in it.
⦿ What it cannot do is add labels to the clusters; it cannot say this is a group of apples or mangoes, but it will separate all the apples from the mangoes.
-
⦿ Two types of unsupervised learning are: Association and Clustering (a small clustering sketch follows).
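A tiny clustering sketch with scikit-learn (the points are made up); the model groups the data without ever being told what the groups mean:
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],        # one bunch of points
              [10, 2], [10, 4], [10, 0]])    # another bunch of points

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                        # a cluster index per point, e.g. [1 1 1 0 0 0] (the labels themselves are arbitrary)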
-
Fig: grouping of similar data
-
3. Semi-supervised Learning:
⦿ Semi-supervised learning sits between supervised and unsupervised learning: it uses a small amount of labelled data together with a large amount of unlabelled data during training.
4. Reinforcement Learning:
-
⦿ It is the ability of an agent to interact with the environment and find out what the best outcome is. It follows the concept of the hit and trial method. The agent is rewarded or penalized with a point for a correct or a wrong answer, and on the basis of the positive reward points gained, the model trains itself.
-
Fig : Types of Machine
Learning
-
1. Overfitting :Over fitting refers to a model that models the training data
too well.
⦿ Over fitting happens when a model learns the detail and noise in the
training data to the extent that it negatively impacts the performance of
the model on new data. This means that the noise or random fluctuations
in the training data is picked up and learned as concepts by the model.
The problem is that these concepts do not apply to new data and negatively impact the model's ability to generalize.
⦿ Over fitting is more likely with nonparametric and nonlinear models that
have more flexibility when learning a target function. As such, many
nonparametric machine learning algorithms also include parameters or
techniques to limit and constrain how much detail the model learns.
2. Underfitting : Under fitting refers to a model that can neither model the
training data nor generalize to new data.
⦿ An under fit machine learning model is not a suitable model and will be
obvious as it will have poor performance on the training data.
⦿ Under fitting is often not discussed as it is easy to detect given a good
performance metric. The remedy is to move on and try alternate machine
learning algorithms. Nevertheless, it does provide a good contrast to the
problem of over fitting.
-
⦿ Bias: It tells us how close our model's average predictions are to the training data. Algorithms with high bias learn fast and are easy to understand, but are less flexible: they lose the ability to model complex problems, which results in underfitting of our model.
⦿ Getting more training data will not help much.
⦿ “Signal” as the true underlying pattern that you wish to learn from the
data.
⦿ “Noise” on the other hand, refers to the irrelevant information or
randomness in a dataset.
-
⦿ Overfitting and Underfitting are the two main problems that occur in
machine learning and degrade the performance of the machine learning
models.
-
Over fitting :
⦿ Overfitting occurs when our machine learning model tries to cover all the
data points or more than the required data points present in the given
dataset. Because of this, the model starts caching noise and inaccurate
values present in the dataset, and all these factors reduce the efficiency
and accuracy of the model. The overfitted model has low bias and high
variance.
⦿ The chances of overfitting increase the more we train our model: the more training we provide, the higher the chance of obtaining an overfitted model.
⦿ Overfitting is the main problem that occurs in supervised learning.
⦿ Example: The concept of the overfitting can be understood by the below
graph of the linear regression output:
-
-
⦿ In the above graph, the model tries to cover all the data points present in the scatter plot. It may look efficient, but in reality it is not. Because the goal of the regression model is to find the best-fit line, and here we have not got a true best fit, it will generate prediction errors.
⦿ How to avoid the Overfitting in Model :
⦿ Both overfitting and underfitting cause the degraded performance of the
machine learning model. But the main cause is overfitting, so there are
some ways by which we can reduce the occurrence of overfitting in our
model.
⦿ Cross-Validation
⦿ Training with more data
⦿ Removing features
⦿ Early stopping the training
⦿ Regularization
⦿ Ensembling
-
Underfitting :
⦿ Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data. To avoid overfitting in the model, the feeding of training data can be stopped at an early stage, due to which the model may not learn enough from the training data. As a result, it may fail to find the best fit of the dominant trend in the data.
⦿ In the case of underfitting, the model is not able to learn enough from the
training data, and hence it reduces the accuracy and produces
unreliable predictions.
⦿ An underfitted model has high bias and low variance.
⦿ Example: We can understand the underfitting using below output of the
linear regression model:
-
-
⦿ In above graph, the model is unable to capture the data points present in
the plot.
-
Goodness of Fit :
⦿ The "Goodness of fit" term is taken from the statistics, and the goal of the
machine learning models to achieve the goodness of fit. In statistics
modeling, it defines how closely the result or predicted values match the
true values of the dataset.
⦿ The model with a good fit is between the underfitted and overfitted
model, and ideally, it makes predictions with 0 errors, but in practice, it is
difficult to achieve it.
⦿ There are two other methods by which we can get a good point for our
model, which are the resampling method to estimate model accuracy
and validation dataset.
-
What is Regression :
Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables.
It helps to understand how the value of the dependent variable changes corresponding to an independent variable when the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.
Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict the continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time series modeling, and determining the causal-effect relationship between variables.
-
In Regression, we plot a graph between the variables which best fits the
given data points, using this plot, the machine learning model can make
predictions about the data. In simple words, "Regression shows a line or
curve that passes through all the data points on target-predictor graph
in such a way that the vertical distance between the data points and the
regression line is minimum." The distance between data points and line
tells whether a model has captured a strong relationship or not.
-
⦿ Terminologies Related to the Regression :
-
⦿ Types of Regression :
There are various types of regressions which are used in data science and
machine learning.
⦿ Linear Regression
⦿ Logistic Regression
⦿ Polynomial Regression
⦿ Support Vector Regression
⦿ Decision Tree Regression
⦿ Random Forest Regression
⦿ Ridge Regression
⦿ Lasso Regression:
-
-
Linear Regression:
⦿ Linear regression is a statistical regression method which is used for
predictive analysis.
⦿ It is one of the very simple and easy algorithms which works on
regression and shows the relationship between the continuous variables.
⦿ It is used for solving the regression problem in machine learning.
⦿ Linear regression shows the linear relationship between the independent
variable (X-axis) and the dependent variable (Y-axis), hence called
linear regression.
⦿ If there is only one input variable (x), then such linear regression is
called simple linear regression. And if there is more than one input
variable, then such linear regression is called multiple linear
regression.
⦿ The relationship between variables in the linear regression model can be
explained using the below image. Here we are predicting the salary of an
employee on the basis of the year of experience.
-
-
⦿ Some popular applications of linear regression are:
-
⦿ Model the relationship between the two variables. Such as the
relationship between Income and expenditure, experience and Salary,
etc.
⦿ Forecasting new observations. Such as Weather forecasting according
to temperature, Revenue of a company according to the investments in a
year, etc.
⦿ Recall the geometry lesson from high school. What is the equation of a
line?
y = mx + c
-
Where,
m is the slope. It determines the angle of the line. It is the parameter denoted as β.
In machine learning notation, the same line is written as:
y = b0 + b1 * x1
-
Where,
b0 is the constant (intercept),
b1 is the coefficient (slope) of the independent variable x1, and
y is the dependent variable.
-
⦿ Simple Linear Regression in Python :
# importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('salary_data.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
-
# Splitting the dataset into the Training set and Test set
-
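The notes jump from the splitting comment above straight to the visualization; the code that creates X_train, y_train and the fitted regressor used below is roughly as follows (the 1/3 test size and random_state=0 are illustrative choices):
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Splitting the dataset and fitting Simple Linear Regression to the Training set
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)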
# Visualizing the Training set results
viz_train = plt
viz_train.scatter(X_train, y_train, color='red')
viz_train.plot(X_train, regressor.predict(X_train), color='blue')
viz_train.title('Salary VS Experience (Training set)')
viz_train.xlabel('Year of Experience')
viz_train.ylabel('Salary')
viz_train.show()
viz_test = plt
viz_test.scatter(X_test, y_test, color='red')
viz_test.plot(X_train, regressor.predict(X_train), color='blue')
viz_test.title('Salary VS Experience (Test set)')
viz_test.xlabel('Year of Experience')
viz_test.ylabel('Salary')
viz_test.show()
-
⦿ After running the above code (excluding the code-explanation parts), you can see 2 plots in the console window as shown below:
-
-
⦿ One plot is from the training set and another from the test set. The blue lines are in the same direction, so our model is good to use now.
⦿ Now we can use it to calculate (predict) the value of y for any value of X. This can be done by using the predict() function as follows:
Output :
-
In conclusion, with Simple Linear Regression, we have to do 5 steps
as per below:
y_pred = regressor.predict(X_test)
-
Predict y_pred using array of
X_test
-
2. Multiple Linear Regression :
⦿ For Examples:
⦿ The selling price of a house can depend on the desirability of the
location, the number of bedrooms, the number of bathrooms, the year the
house was built, the square footage of the plot and a number of other
factors.
⦿ The height of a child can rest on the height of the mother, the height of the
father, nutrition, and environmental factors.
-
⦿ Multiple linear regression works the same way as that of simple linear
regression, except for the introduction of more independent variables and
their corresponding coefficients.
⦿ In Simple Linear Regression we dealt with the equation:
y = b0 + b1 * x1
In Multiple Linear Regression this extends to:
y = b0 + b1 * x1 + b2 * x2 + b3 * x3 + ........ + bn * xn
Or
y = b0 + Σ (i = 1 to n) bi * xi
-
⦿ In translation, the predicted value y is the sum of all features multiplied by their coefficients, summed with the base coefficient b0.
Where b0 is the constant (intercept) and b1 … bn are the coefficients of the features x1 … xn.
-
Multiple Linear Regression in Python :
# Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Importing the dataset (assumed here to have 4 feature columns, with salary in the 5th column)
dataset = pd.read_csv('salary_data.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set (split parameters are illustrative)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Fitting Multiple Linear Regression to the Training set
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the salary for a new candidate with feature values [5, 2, 1, 2]
x_new = [[5], [2], [1], [2]]
y_pred = regressor.predict(np.array(x_new).reshape(1, 4))
print(y_pred)

# R^2 score of the model on the test set
accuracy = regressor.score(X_test, y_test)
print(accuracy)
-
Output :
You can offer your candidate a salary of ₹48017.20, and this is the best salary for him!
-
3. Polynomial Linear Regression :
-
⦿ For example: the increment of salary of employees per year is often non-linear. We may express it in terms of a polynomial equation as
y = b0 + b1x + b2x² + b3x³ + ...... + bnxⁿ
where,
⦿ b0 is a constant,
⦿ y is the dependent variable,
⦿ the bi coefficients can be thought of as multipliers that connect the independent and dependent variables. They translate how much y will be affected by a degree or power of change in x. In other words, a change in xⁱ does not usually mean an equal change in y.
⦿ x is the independent variable.
-
⦿ Let us consider dataset of this kind of example that represent the
Polynomial shape.
-
⦿ To get an overview of the increment of salary, let’s visualize the data set
into a chart:
-
⦿ Let’s think about our candidate. He has 5.5 Year of experience. What if we
use the Linear Regression in this example?
-
Polynomial Linear Regression in Python :
# Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('position_salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
-
# Splitting the dataset into the Training set and Test set
-
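Here again the notes go straight to the visualization functions; the step that actually builds poly_reg and pol_reg (both used below) is not shown. A minimal sketch, where the polynomial degree of 4 is only an assumption:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly_reg = PolynomialFeatures(degree=4)    # degree chosen here only as an example
X_poly = poly_reg.fit_transform(X)         # expand X into polynomial terms
pol_reg = LinearRegression()
pol_reg.fit(X_poly, y)                     # fit linear regression on the polynomial terms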
# Visualizing the Polynomial Regression results
def viz_polymonial():
plt.scatter(X, y, color='red')
plt.plot(X, pol_reg.predict(poly_reg.fit_transform(X)), color='blue')
plt.title('Truth or Bluff (Linear Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
return
viz_polymonial()
-
# Additional feature
# Making the plot line (Blue one) more smooth
def viz_polymonial_smooth():
X_grid = np.arange(min(X), max(X), 0.1)
X_grid = X_grid.reshape(len(X_grid), 1)
plt.scatter(X, y, color='red')
plt.plot(X_grid, pol_reg.predict(poly_reg.fit_transform(X_grid)),
color='blue')
plt.title('Truth or Bluff (Linear Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
return
viz_polymonial_smooth()
-
⦿ After calling the viz_polymonial() function, you can see a plot as per below:
-
Last step, let's predict the value for our candidate (with 5.5 years of experience) using the Polynomial Regression model:
print(pol_reg.predict(poly_reg.fit_transform([[5.5]])))
Output:
It's time to let our candidate know that we will offer him a best-in-class salary of ₹132,148!
-
⦿ Decision trees are supervised learning algorithms used for both,
classification and regression.
⦿ Decision trees are assigned to the information-based learning algorithms
which use different measures of information gain for learning. We can use
decision trees for issues where we have continuous but also categorical
input and target features.
⦿ The main idea of decision trees is to find those descriptive features which
contain the most "information" regarding the target feature and then split
the dataset along the values of these features such that the target feature
values for the resulting sub datasets are as pure as possible.
⦿ The descriptive feature which splits the target feature most purely is said to be the most informative one.
⦿ This process of finding the "most informative" feature is done until we
accomplish a stopping criterion, where we then finally end up in so
called leaf nodes.
-
-
⦿ The leaf nodes contain the predictions we will make for new query
instances presented to our trained model.
⦿ This is possible since the model has kind of learned the underlying
structure of the training data and hence can, given some assumptions,
make predictions about the target feature value (class) of unseen query
instances.
⦿ A decision tree mainly contains of a root node, interior nodes, and leaf
nodes which are then connected by branches.
-
⦿ Decision trees are sensitive to the specific data on which they are trained.
If the training data is changed the resulting decision tree can be quite
different and in turn the predictions can be quite different.
⦿ Also, Decision trees are computationally expensive to train, carry a big
risk of overfitting (learning system tightly fits the given training data so
much that it would be inaccurate in predicting the outcomes of the
untrained data. In decision trees, over-fitting occurs when the tree is
designed so as to perfectly fit all samples in the training data set.), and
tend to find local optima because they can’t go back after they have made
a split.
⦿ To solve these weaknesses, we use Random Forest which illustrates the
power of combining many decision trees into one model.
-
-
⦿ Random forest is a Supervised Learning algorithm which uses ensemble
learning method for classification and regression.
⦿ An Ensemble method is a technique that combines the predictions from
multiple machine learning algorithms together to make more accurate
predictions than any individual model. A model comprised of many
models is called an Ensemble model.
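A short illustrative sketch of such an ensemble with scikit-learn (the built-in iris data is used here purely as a stand-in dataset):
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)   # an ensemble of 100 decision trees
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))   # accuracy of the combined model on unseen data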
-
Types of Ensemble Learning:
⦿ Boosting.
⦿ Bootstrap Aggregation (Bagging).
1. Boosting
Boosting refers to a group of algorithms that utilize weighted averages to turn weak learners into stronger learners. Boosting is all about "teamwork": each model that runs dictates which features the next model will focus on. In boosting, as the name suggests, one learner learns from another, which in turn boosts the learning.
-
⦿ The Classification algorithm is a Supervised Learning technique that is
used to identify the category of new observations on the basis of training
data. In Classification, a program learns from the given dataset or
observations and then classifies new observation into a number of classes
or groups. Such as, Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.
Classes can be called as targets/labels or categories.
-
⦿ The best example of an ML classification algorithm is Email Spam
Detector.
⦿ The main goal of the Classification algorithm is to identify the category of
a given dataset, and these algorithms are mainly used to predict the
output for the categorical data.
⦿ Classification algorithms can be better understood using the below
diagram. In the below diagram, there are two classes, class A and Class B.
These classes have features that are similar to each other and dissimilar to
other classes.
-
-
⦿ The algorithm which implements the classification on a dataset is known
as a classifier. There are two types of Classifications:
-
Learners in Classification Problems:
⦿ Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. In the lazy learner case, classification is done on the basis of the most related data stored in the training dataset. It takes less time in training but more time for predictions.
Example: K-NN algorithm, Case-based reasoning.
⦿ Eager Learners: An eager learner builds a classification model from the training dataset before receiving the test dataset. It takes more time in training but less time in prediction.
Example: Decision Trees, Naïve Bayes.
-
Types of ML Classification Algorithms:
Classification Algorithms can be further divided into the Mainly two category:
⦿ Linear Models
• Logistic Regression
• Support Vector Machines
⦿ Non-linear Models
• K-Nearest Neighbors
• Kernel SVM
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification
What is Logistic Regression :
-
Type of Logistic Regression:
-
⦿ Let's first try to understand why logistic, and why not linear?
⦿ Let 'x' be some feature and 'y' be the output, which can be either 0 or 1 (binary classification).
⦿ The probability that the output is 1 given its input x can be represented as P(X) = P(y = 1 | x).
⦿ If we predict this probability via linear regression, we can write it as:
P(X) = b0 + b1 * X
⦿ The problem with this is that the right-hand side can take any value from minus infinity to plus infinity, while a probability must always lie between 0 and 1.
-
⦿ To avoid this problem, the log-odds or logit function is used.
⦿ Logistic regression can therefore be expressed in terms of the logit function as:
log( P(x) / (1 - P(x)) ) = b0 + b1 * X
⦿ where the left-hand side is called the logit or log-odds function, and P(x) / (1 - P(x)) is called the odds.
⦿ The odds signify the ratio of the probability of success [P(x)] to the probability of failure [1 - P(x)]. Therefore, in Logistic Regression, a linear combination of inputs is mapped to the log(odds) of the output being equal to 1.
⦿ If we take the inverse of the above function, we get:
P(x) = e^(b0 + b1*X) / (1 + e^(b0 + b1*X))
-
⦿ In a more simplified form, the above equation becomes
P(x) = 1 / (1 + e^-(b0 + b1*X)), which is the sigmoid (logistic) function.
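A small numeric sketch of this mapping (the coefficient values below are made up purely for illustration):
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # squashes any real number into the range (0, 1)

b0, b1 = -4.0, 1.5                     # illustrative coefficients
for x in [0, 2, 4, 6]:
    print(x, sigmoid(b0 + b1 * x))     # the probability rises smoothly from near 0 towards 1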
-
Support Vector Machine(SVM) :
⦿ SVM is a Supervised Learning algorithm that can be used for Classification as well as Regression problems, but it is mostly used for Classification in Machine Learning.
⦿ The goal of the SVM algorithm is to create the best hyperplane or
decision boundary that can separate n-dimensional space into classes so
that we can easily put the new data point in the correct category in the
future.
⦿ SVM chooses the extreme points/vectors called Support Vectors that help
in creating the hyperplane. Consider the following diagram in which
there are two different categories that are classified using a decision
boundary or hyperplane:
-
-
Hyperplane and Support Vectors in the SVM algorithm:
⦿ Hyperplane: There can be multiple lines/decision boundaries to separate the classes in n-dimensional space, but we need to find out the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM.
The dimensions of the hyperplane depend on the features present in the dataset, which means if there are 2 features (as shown in the image), then the hyperplane will be a straight line. And if there are 3 features, then the hyperplane will be a 2-dimensional plane.
We always create a hyperplane that has a maximum margin, which means
the maximum distance between the data points.
⦿ Support Vectors:
The data points or vectors that are the closest to the hyperplane and
which affect the position of the hyperplane are termed as Support Vector.
Since these vectors support the hyperplane, hence called a Support
vector.
-
⦿ Here suppose there is a strange cat that has some features as that of dogs,
so if we want a model that can accurately identify whether it is a cat or
dog, in such cases we use SVM. We will first train our model with lots of
features of cats and dogs so that it can learn from number of features of
cats and dogs, and then test it with this strange animal. So, as the support vector machine creates a decision boundary between these two classes (cat and dog) and chooses extreme cases (support vectors), it will see the extreme cases of cat and dog. On the basis of these extreme cases (support vectors), it will classify the animal as a cat.
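A compact sketch of this idea using scikit-learn's SVC, with made-up 2-D points standing in for the cat and dog features:
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2],      # class 0 ("cat"-like feature points)
              [8, 8], [9, 8], [8, 9]])     # class 1 ("dog"-like feature points)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear')                 # looks for the maximum-margin hyperplane
clf.fit(X, y)
print(clf.support_vectors_)                # the extreme points that define the boundary
print(clf.predict([[2, 2]]))               # the new "strange animal" is classified as class 0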
-
⦿ K-Nearest Neighbour is one of the simplest Supervised Machine Learning algorithms. It compares a new data point with the K most similar data points (nearest neighbours) in the available dataset and puts it into the category those neighbours are most similar to; hence the name K-NN.
⦿ This algorithm can be used for Regression as well as for Classification,
but mostly it is useful for the classification problems.
⦿ It is a non-parametric algorithm, which means it does not make any assumptions about the underlying data. It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on the dataset.
⦿ In other words, at the training phase K-NN algorithm just stores the
dataset and when it gets new data, then it classifies that data
into a category that is much similar to the new data.
-
⦿ For Example, Suppose, we have a new animal that looks similar to cat and
dog, but we want to know either it is a cat or dog. So, for this identification,
we can use the KNN algorithm, as it works on a similarity principle. Our
KNN model will simply, find the similar features of the new data set into
the cats and dogs’ available dataset and based on the most similar
features it will put it in either category of cat or dog.
-
1. If classification, assign the uncategorized object to the class to which the maximum number of neighbours belong.
or
2. If regression, find the average value of all the closest neighbours and assign it as the value for the unknown object.
⦿ For step 3, the most used distance formula is the Euclidean Distance, which is given as follows:
⦿ By the Euclidean Distance, the distance between two points P1(x1, y1) and P2(x2, y2) can be expressed as:
d(P1, P2) = √((x2 - x1)² + (y2 - y1)²)
-
Why do we need a K-NN Algorithm?
Suppose there are two categories, i.e., Category A and Category B, and
we have a new data point x1, so this data point will lie in which of these
categories. To solve this type of problem, we need a K-NN algorithm. With
the help of K-NN, we can easily identify the category or class of a
particular dataset. Consider the below diagram:
-
How does K-NN work?
The K-NN working can be explained on the basis of the below algorithm:
⦿ Step-1: Select the number K of the neighbors
⦿ Step-2: Calculate the Euclidean distance of K number of neighbors
⦿ Step-3: Take the K nearest neighbors as per the calculated Euclidean
distance.
⦿ Step-4: Among these k neighbors, count the number of the data points in
each category.
⦿ Step-5: Assign the new data points to that category for which the number
of the neighbor is maximum.
⦿ Step-6: Our model is ready.
⦿ Suppose we have a new data point and we need to put it in the required
category. Consider the below image:
-
Firstly, we will choose the number of neighbors, so we will choose the k=5.
Next, we will calculate the Euclidean distance between the data points.
The Euclidean distance is the distance between two points, which we have
already studied in geometry. It can be calculated as:
-
By calculating the Euclidean distance we got the nearest neighbors, as
three nearest neighbors in category A and two nearest neighbors in
category B. Consider the below image:
-
As we can see, the 3 nearest neighbors are from category A; hence this new data point must belong to category A.
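The same walk-through can be written in a few lines with scikit-learn's KNeighborsClassifier and k = 5 (the points below are invented for illustration):
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 2], [2, 3], [3, 3], [2, 1],      # Category A points
              [7, 8], [8, 8], [8, 9], [9, 7]])     # Category B points
y = np.array(['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'])

knn = KNeighborsClassifier(n_neighbors=5)          # Step 1: choose K = 5
knn.fit(X, y)                                      # K-NN simply stores the training data
print(knn.predict([[3, 2]]))                       # Steps 2-5: distances, vote, assign -> ['A']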
-
How to select the value of K in the K-NN Algorithm?
Below are some points to remember while selecting the value of K in the
KNN algorithm:
⦿ There is no particular way to determine the best value for "K", so we need
to try some values to find the best out of them. The most preferred value
for K is 5.
⦿ A very low value for K such as K=1 or K=2, can be noisy and lead to the
effects of outliers in the model.
⦿ Large values for K are good, but it may find some difficulties.
-
Advantages of KNN Algorithm:
⦿ It is simple to implement.
⦿ It is robust to the noisy training data
⦿ It can be more effective if the training data is large.
-
⦿ Decision Tree is a Supervised learning technique that can be used for
both classification and Regression problems, but mostly it is preferred for
solving Classification problems.
⦿ In a Decision tree, there are two nodes, which are the Decision
Node and Leaf Node. Decision nodes are used to make any decision and
have multiple branches, whereas Leaf nodes are the output of those
decisions and do not contain any further branches.
⦿ The decisions or the test are performed on the basis of features of the
given dataset.
-
Decision Tree Terminologies:
Below are some important terms used in a Decision tree:
⦿ Root Node: Root node is from where the decision tree starts. It represents
the entire dataset, which further gets divided into two or more
homogeneous sets.
⦿ Leaf Node: Leaf nodes are the final output node, and the tree cannot be
segregated further after getting a leaf node.
-
⦿ Splitting: Splitting is the process of dividing the decision node/root node
into sub-nodes according to the given conditions.
⦿ Branch/Sub Tree: A tree formed by splitting the tree.
⦿ Pruning: Pruning is the process of removing the unwanted branches from
the tree.
⦿ Parent/Child node: The root node of the tree is called the parent node,
and other nodes are called the child nodes.
In a decision tree, for predicting the class of a given record, the
algorithm starts from the root node of the tree. The algorithm compares
the value of the root attribute with the record's (real dataset) attribute and,
based on the comparison, follows the corresponding branch and jumps to
the next node. For the next node, the algorithm again compares the
attribute value with the other sub-nodes and moves further. It continues
this process until it reaches a leaf node of the tree. The complete process can be better
understood using the below algorithm:
-
⦿ Step-1: Begin the tree with the root node, say S, which contains the
complete dataset.
⦿ Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM).
⦿ Step-3: Divide S into subsets that contain the possible values of the best
attribute.
⦿ Step-4: Generate the decision tree node that contains the best
attribute.
⦿ Step-5: Recursively make new decision trees using the subsets of the
dataset created in Step-3. Continue this process until a stage is reached
where you cannot classify the nodes any further; the final node is called a
leaf node.
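As a rough sketch of these steps, the scikit-learn DecisionTreeClassifier
below builds such a tree on the Iris dataset (an illustrative dataset, not the
one in the slides) and then follows the branches to classify a new record:

# Minimal decision-tree sketch on an illustrative dataset.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# criterion='entropy' uses information gain; criterion='gini' uses the Gini index
# (both Attribute Selection Measures are described in the next slides).
tree = DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=load_iris().feature_names))  # the learned splits
print(tree.predict([[5.1, 3.5, 1.4, 0.2]]))  # follow the branches from root to a leaf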
-
Attribute Selection Measures :-
⦿ While implementing a Decision tree, the main issue is how to
select the best attribute for the root node and for the sub-nodes. To solve
such problems there is a technique called Attribute Selection
Measure, or ASM. By this measurement, we can easily select the best
attribute for the nodes of the tree. There are two popular techniques for
ASM, which are:
1. Information Gain:
⦿ Information gain is the measurement of changes in entropy after the
segmentation of a dataset based on an attribute.
⦿ It calculates how much information a feature provides us about a
class.
⦿ According to the value of information gain, we split the node and build
the decision tree.
⦿ A decision tree algorithm always tries to maximize the value of
information gain, and a node/attribute having the highest information
gain is split first. It can be calculated using the below formula:
Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Entropy: Entropy is a metric to measure the impurity in a given attribute. It
specifies randomness in data. Entropy can be calculated as:
Entropy(S) = −P(yes)·log2(P(yes)) − P(no)·log2(P(no))
Where,
S = the set of samples (total number of samples)
P(yes) = probability of yes
P(no) = probability of no
2. Gini Index:
⦿ Gini index is a measure of impurity or purity used while creating a decision
tree in the CART(Classification and Regression Tree) algorithm.
⦿ An attribute with a low Gini index should be preferred over one with a
high Gini index.
⦿ It only creates binary splits, and the CART algorithm uses the Gini index to
create binary splits.
⦿ Gini index can be calculated using the below formula:
Gini Index = 1 − ∑j (Pj)²
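A small, self-contained sketch of both ASM measures is shown below; the
'yes'/'no' sample counts and the two-way split are hypothetical and only
illustrate the Entropy, Information Gain and Gini Index formulas above:

import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum(p * log2(p)) over the class probabilities.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    # Gini Index = 1 - sum(p_j ** 2) over the class probabilities.
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, subsets):
    # Information Gain = Entropy(S) - weighted average entropy of the subsets.
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted

# Toy split: 9 'yes' / 5 'no' samples divided by some attribute into two subsets.
parent = ['yes'] * 9 + ['no'] * 5
left, right = ['yes'] * 6 + ['no'] * 1, ['yes'] * 3 + ['no'] * 4
print(entropy(parent), gini(parent), information_gain(parent, [left, right]))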
-
Pruning: Getting an Optimal Decision tree
⦿ Pruning is a process of deleting the unnecessary nodes from a tree in order
to get the optimal decision tree.
⦿ A too-large tree increases the risk of overfitting, and a small tree may not
capture all the important features of the dataset. Therefore, a technique
that decreases the size of the learning tree without reducing accuracy is
known as Pruning. There are mainly two types of tree pruning techniques
used: 1. Cost Complexity Pruning and 2. Reduced Error Pruning.
Advantages of the Decision Tree
⦿ It is simple to understand, as it follows the same process which a human
follows while making any decision in real life.
⦿ It can be very useful for solving decision-related problems.
⦿ It helps to think about all the possible outcomes for a problem.
⦿ There is less requirement of data cleaning compared to other
algorithms.
Disadvantages of the Decision Tree
⦿ The decision tree contains lots of layers, which makes it complex.
⦿ It may have an overfitting issue, which can be resolved using the Random
Forest algorithm.
⦿ For more class labels, the computational complexity of the decision tree
may increase.
-
⦿ Random Forest is a popular machine learning algorithm that belongs to
the supervised learning technique. It can be used for both Classification
and Regression problems in ML. It is based on the concept of ensemble
learning, which is a process of combining multiple classifiers to solve a
complex problem and to improve the performance of the model.
⦿ As the name suggests, "Random Forest is a classifier that contains a
number of decision trees on various subsets of the given dataset and
takes the average to improve the predictive accuracy of that
dataset." Instead of relying on one decision tree, the random forest takes
the prediction from each tree and based on the majority votes of
predictions, and it predicts the final output.
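A minimal sketch of this idea with scikit-learn is shown below; the synthetic
dataset and the parameter choice of 100 trees are assumptions made only for
illustration:

# Minimal Random Forest sketch: many decision trees on random subsets, majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# n_estimators = number of decision trees; each is trained on a bootstrap sample.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))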
-
⦿ In machine learning classification problems, there are often too many
factors on the basis of which the final classification is done. These factors
are basically variables called features. The higher the number of features,
the harder it gets to visualize the training set and then work on it.
Sometimes, most of these features are correlated, and hence redundant.
This is where dimensionality reduction algorithms come into play.
Dimensionality reduction is the process of reducing the number of
random variables under consideration, by obtaining a set of principal
variables. It can be divided into feature selection and feature extraction.
-
-
Components of Dimensionality Reduction :
-
⦿ Dimensionality reduction may be either linear or non-linear, depending
upon the method used. The prime linear method is called Principal
Component Analysis, or PCA.
⦿ Principal Component Analysis(PCA)
⦿ This method was introduced by Karl Pearson. It works on a condition that
while the data in a higher dimensional space is mapped to data in a lower
dimension space, the variance of the data in the lower dimensional space
should be maximum.
-
It involves the following steps:
-
Advantages of Dimensionality Reduction :
-
Principal Component Analysis :
⦿ Principal Component Analysis is an unsupervised learning algorithm that
is used for the dimensionality reduction in machine learning. It is a
statistical process that converts the observations of correlated features
into a set of linearly uncorrelated features with the help of orthogonal
transformation. These new transformed features are called the Principal
Components. It is one of the popular tools that is used for exploratory
data analysis and predictive modeling. It is a technique to draw strong
patterns from the given dataset by reducing the variances.
⦿ PCA generally tries to find the lower-dimensional surface to project the
high-dimensional data.
⦿ PCA works by considering the variance of each attribute, because an
attribute with high variance indicates a good split between the classes,
and hence PCA reduces the dimensionality. Some real-world applications
of PCA are image processing, movie recommendation systems, and
optimizing the power allocation in various communication channels. It is
a feature extraction technique, so it keeps the important variables and
drops the least important variables.
-
The PCA algorithm is based on some mathematical concepts such as:
⦿ Variance and Covariance
⦿ Eigenvalues and Eigenvectors
-
Principal Components in PCA :
-
Steps for PCA algorithm :
1. Getting the dataset.
2. Representing the data as a matrix of observations (rows) and features
(columns).
3. Standardizing the data so that each feature has zero mean and unit
variance; the standardized matrix is called Z.
4. Calculating the Covariance of Z
To calculate the covariance of Z, we will take the matrix Z, and will transpose
it. After transpose, we will multiply it by Z. The output matrix will be the
Covariance matrix of Z.
5. Calculating the Eigenvalues and Eigenvectors
Now we need to calculate the eigenvalues and eigenvectors for the resultant
covariance matrix of Z. The eigenvectors of the covariance matrix are the
directions of the axes with the most information (highest variance), and the
corresponding eigenvalues give the amount of variance along those directions.
6. Sorting the Eigenvectors
In this step, we will take all the eigenvalues and sort them in decreasing
order, which means from largest to smallest, and simultaneously sort the
corresponding eigenvectors into a matrix P. The resultant sorted matrix will
be named P*.
7. Calculating the new features, or Principal Components
Here we will calculate the new features. To do this, we will multiply the
matrix Z by P*. In the resultant matrix Z*, each observation is a linear
combination of the original features, and the columns of Z* are
uncorrelated with each other.
-
8. Remove less important features from the new dataset.
The new feature set has been obtained, so we decide here what to keep
and what to remove. That is, we will only keep the relevant or important
features in the new dataset, and the unimportant features will be removed.
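The steps above can be sketched directly with NumPy; the random 100 x 5
dataset and the choice to keep two components are illustrative assumptions:

# Sketch of the PCA steps above using NumPy on an illustrative random dataset.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                    # 100 observations, 5 features

# Standardize the data (matrix Z).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance of Z (step 4), then eigenvalues/eigenvectors (step 5).
cov = Z.T @ Z / (len(Z) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort eigenvectors by decreasing eigenvalue (step 6) -> matrix P*.
order = np.argsort(eigvals)[::-1]
P_star = eigvecs[:, order]

# New features / principal components (step 7), keep the top 2 (step 8).
Z_star = Z @ P_star
X_reduced = Z_star[:, :2]
print(X_reduced.shape)                           # (100, 2)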
-
Evaluation metrics are tied to machine learning tasks. There are different
metrics for the tasks of classification, regression, ranking, clustering, topic
modeling, etc. Some metrics, such as precision-recall, are useful for
multiple tasks. Classification, regression, and ranking are examples of
supervised learning, which constitutes a majority of machine learning
applications.
Model Accuracy:
⦿ Model accuracy in terms of classification models can be defined as the
ratio of correctly classified samples to the total number of samples:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
-
-
⦿ True Positive (TP): an outcome where the model correctly predicts the
positive class.
⦿ True Negative (TN): an outcome where the model correctly predicts the
negative class.
⦿ False Positive (FP): an outcome where the model incorrectly predicts the
positive class.
⦿ False Negative (FN): an outcome where the model incorrectly predicts the
negative class.
-
⦿ Binary Classification Model — Predict whether the patient has cancer
or not.
⦿ Let’s assume we have a training dataset with labels—100 cases, 10 labeled
as ‘Cancer’, 90 labeled as ‘Normal’
⦿ Let’s try calculating the accuracy of this model on the above dataset, given
the following results:
-
⦿ In the above case let’s define the TP, TN, FP, FN:
⦿ TP (Actual Cancer and predicted Cancer) = 1
⦿ TN (Actual Normal and predicted Normal) = 90
⦿ FN (Actual Cancer and predicted Normal) = 8
⦿ FP (Actual Normal and predicted Cancer) = 1
-
⦿ So the accuracy of this model is (1 + 90) / 100 = 91%. But the question
remains: is this model useful, despite being so accurate?
⦿ This highly accurate model may not be useful, as it isn't able to predict
the actual cancer patients; hence, it can have the worst consequences.
⦿ So, in these types of scenarios, how can we trust machine learning
models?
⦿ Accuracy alone doesn’t tell the full story when we’re working with
a class-imbalanced dataset like this one, where there’s a significant
disparity between the number of positive and negative labels.
-
-
⦿ Recall is defined as the number of true positives divided by the total
number of elements that actually belong to the positive class (i.e. the sum
of true positives and false negatives, which are items which were not
labeled as belonging to the positive class but should have been).
-
Let’s try to measure precision and recall for our cancer prediction use
case:
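Using the TP/TN/FP/FN counts above and the standard definitions
precision = TP / (TP + FP) and recall = TP / (TP + FN) (assumed here, since
the formula images are not reproduced), a short calculation makes the point
concrete:

# Confusion-matrix arithmetic for the cancer example above (TP=1, TN=90, FN=8, FP=1).
TP, TN, FN, FP = 1, 90, 8, 1

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # 91/100 = 0.91
precision = TP / (TP + FP)                    # of predicted 'Cancer', how many were correct
recall    = TP / (TP + FN)                    # of actual 'Cancer', how many were found

print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")
# accuracy=0.91, precision=0.50, recall=0.11 -> high accuracy, but very poor recall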
-
Classification Accuracy :
⦿ Classification Accuracy is what we usually mean, when we use the term
accuracy. It is the ratio of number of correct predictions to the total
number of input samples.
⦿ It works well only if there is an equal number of samples belonging to
each class.
⦿ For example, consider that there are 98% samples of class A and 2%
samples of class B in our training set. Then our model can easily get 98%
training accuracy by simply predicting every training sample belonging
to class A.
⦿ When the same model is tested on a test set with 60% samples of class A
and 40% samples of class B, the test accuracy would drop to 60%.
Classification accuracy looks great in training, but it gives us a false sense
of achieving high accuracy.
⦿ The real problem arises when the cost of misclassifying the minority
class samples is very high. If we deal with a rare but fatal disease, the
cost of failing to diagnose the disease in a sick person is much higher
than the cost of sending a healthy person for more tests.
-
⦿ Clustering or cluster analysis is a machine learning technique, which
groups the unlabelled dataset. It can be defined as "A way of grouping
the data points into different clusters, consisting of similar data points.
The objects with possible similarities remain in a group that has
few or no similarities with another group."
-
The clustering technique can be widely used in various tasks. Some most
common uses of this technique are:
⦿ Market Segmentation
⦿ Statistical data analysis
⦿ Social network analysis
⦿ Image segmentation
⦿ Anomaly detection, etc.
⦿ Apart from these general usages, it is used by Amazon in its
recommendation system to provide recommendations based on a user's
past product searches. Netflix also uses this technique to recommend
movies and web series to its users as per their watch history.
⦿ The below diagram explains the working of the clustering algorithm. We
can see the different fruits are divided into several groups with similar
properties.
-
-
Types of Clustering Methods
⦿ The clustering methods are broadly divided into Hard Clustering (data
point belongs to only one group) and Soft Clustering (data points can
belong to more than one group). There are also various other clustering
approaches. Below are the main clustering methods used in Machine
learning:
⦿ Partitioning Clustering
⦿ Density-Based Clustering
⦿ Distribution Model-Based Clustering
⦿ Hierarchical Clustering
⦿ Fuzzy Clustering
-
1. Partitioning Clustering :
⦿ It is a type of clustering that divides the data into non-hierarchical groups. It
is also known as the Centroid-based method. The most common example
of partitioning clustering is the K-Means Clustering algorithm.
⦿ In this type, the dataset is divided into a set of k groups, where K is used to
define the number of pre-defined groups. The cluster centers are created in
such a way that each data point is closer to its own cluster's centroid
than to the centroid of any other cluster.
-
2. Density-Based Clustering :
⦿ The density-based clustering method connects the highly-dense areas into
clusters, and the arbitrarily shaped distributions are formed as long as the
dense region can be connected. This algorithm works by identifying
different clusters in the dataset and connecting the areas of high density
into clusters. The dense areas in the data space are separated from each
other by sparser areas.
⦿ These algorithms can face difficulty in clustering the data points if the
dataset has varying densities and high dimensions.
-
3. Distribution Model-Based Clustering :
⦿ In the distribution model-based clustering method, the data is divided
based on the probability of how a dataset belongs to a particular
distribution. The grouping is done by assuming some distribution,
commonly the Gaussian distribution.
⦿ An example of this type is the Expectation-Maximization Clustering
algorithm, which uses Gaussian Mixture Models (GMM).
-
4. Hierarchical Clustering :
⦿ Hierarchical clustering can be used as an alternative to partitioning
clustering, as there is no requirement to pre-specify the number of
clusters to be created. In this technique, the dataset is divided into
clusters to create a tree-like structure, which is also called a dendrogram.
The observations or any number of clusters can be selected by cutting
the tree at the correct level. The most common example of this method is
the Agglomerative Hierarchical algorithm.
-
5. Fuzzy Clustering :
⦿ Fuzzy clustering is a type of soft method in which a data object may
belong to more than one group or cluster. Each data point has a set of
membership coefficients, which depend on its degree of membership in
each cluster. The Fuzzy C-means algorithm is an example of this type of
clustering; it is sometimes also known as the Fuzzy K-means algorithm.
⦿ Applications of Clustering :
-
3. Customer Segmentation: It is used in market research to segment the
customers based on their choice and preferences.
-
⦿ K-Means Clustering is an Unsupervised Learning algorithm, which groups
the unlabeled dataset into different clusters. Here K defines the number of
pre-defined clusters that need to be created in the process, as if K=2, there
will be two clusters, and for K=3, there will be three clusters, and so on.
⦿ It allows us to cluster the data into different groups and is a convenient
way to discover the categories of groups in the unlabeled dataset on its
own, without the need for any training.
⦿ The algorithm takes the unlabeled dataset as input, divides the dataset
into k clusters, and repeats the process until it finds the best clusters. The
value of k should be predetermined in this algorithm.
-
The k-means clustering algorithm mainly performs two tasks:
⦿ Assigns each data point to its closest k-center. Those data points which
are near to the particular k-center, create a cluster.
⦿ Hence each cluster has data points with some commonalities, and it is
away from other clusters. The below diagram explains the working of the
K-means Clustering Algorithm:
-
-
How does the K-Means Algorithm Work?
The working of the K-Means algorithm is explained in the below steps:
-
Suppose we have two variables M1 and M2. The x-y axis scatter plot of
these two variables is given below:
-
⦿ Let's take number k of clusters, i.e., K=2, to identify the dataset and to put
them into different clusters. It means here we will try to group these
datasets into two different clusters.
⦿ We need to choose some random k points or centroid to form the cluster.
These points can be either the points from the dataset or any other point.
So, here we are selecting the below two points as k points, which are not
the part of our dataset. Consider the below image:
-
Now we will assign each data point of the scatter plot to its closest K-
point or centroid. We will compute it by applying some mathematics that
we have studied to calculate the distance between two points. So, we will
draw a median between both the centroids. Consider the below image:
-
⦿ From the above image, it is clear that the points on the left side of the line
are near the K1 or blue centroid, and the points to the right of the line are
close to the yellow centroid. Let's color them blue and yellow for clear
visualization.
-
⦿ As we need to find the closest cluster, we will repeat the process by
choosing new centroids. To choose the new centroids, we will compute the
center of gravity of the points in each cluster and place the new centroids
there, as shown below:
-
⦿ Next, we will reassign each datapoint to the new centroid. For this, we will
repeat the same process of finding a median line. The median will be like
below image:
-
⦿ From the above image, we can see that one yellow point is on the left side
of the line, and two blue points are to the right of the line. So, these three
points will be assigned to the new centroids.
-
⦿ As reassignment has taken place, we will again go to step-4, which is
finding new centroids or K-points.
⦿ We will repeat the process by finding the center of gravity of centroids, so
the new centroids will be as shown in the below image:
-
⦿ As we have got the new centroids, we will again draw the median line and
reassign the data points. So, the image will be:
-
⦿ We can see in the above image that there are no dissimilar data points on
either side of the line, which means our model is formed. Consider the
below image:
-
⦿ As our model is ready, we can now remove the assumed centroids, and
the two final clusters will be as shown in the below image:
-
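A minimal scikit-learn sketch of the same procedure is shown below; the two
blobs of points stand in for the M1-M2 scatter plot and are generated data,
not the slides' dataset:

# Minimal K-Means sketch with two variables (M1, M2) and K=2, as in the walkthrough.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two illustrative blobs standing in for the scatter plot of M1 vs M2.
X = np.vstack([rng.normal(loc=[2, 2], scale=0.5, size=(20, 2)),
               rng.normal(loc=[7, 6], scale=0.5, size=(20, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)            # assign each point to its closest centroid

print(kmeans.cluster_centers_)            # the two final centroids
print(labels[:10])                        # cluster index (0 or 1) for the first points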
Choose the value of "K number of clusters" in K-means Clustering :
Elbow Method :
The Elbow method is one of the most popular ways to find the optimal
number of clusters. This method uses the concept of WCSS
value. WCSS stands for Within Cluster Sum of Squares, which defines
the total variations within a cluster. The formula to calculate the value of
WCSS (for 3 clusters) is given below:
-
WCSS = ∑(Pi in Cluster1) distance(Pi, C1)² + ∑(Pi in Cluster2) distance(Pi, C2)² + ∑(Pi in Cluster3) distance(Pi, C3)²
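In practice the elbow method is usually run as a loop over K; scikit-learn
exposes the WCSS of a fitted model as inertia_. The sketch below uses the
same kind of illustrative two-blob data as the K-Means sketch above:

# Elbow-method sketch: WCSS for K = 1..10 on illustrative data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[2, 2], scale=0.5, size=(20, 2)),
               rng.normal(loc=[7, 6], scale=0.5, size=(20, 2))])

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)    # inertia_ = sum of squared distances to the closest centroid

for k, value in zip(range(1, 11), wcss):
    print(k, round(value, 2))   # look for the 'elbow' where the curve stops dropping sharply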
-
1. Agglomerative Hierarchical clustering :
-
Working of Agglomerative Hierarchical clustering :
The working of the AHC algorithm can be explained using the below steps:
Step-1: Create each data point as a single cluster. Let's say there are N data
points, so the number of clusters will also be N.
-
Step-2: Take two closest data points or clusters and merge them to form one
cluster. So, there will now be N-1 clusters.
-
Step-3: Again, take the two closest clusters and merge them together to form
one cluster. There will be N-2 clusters.
-
Step-4: Repeat Step 3 until only one cluster is left. We will then get the
following clusters. Consider the below images:
-
-
Step-5: Once all the clusters are combined into one big cluster, develop
the dendrogram to divide the clusters as per the problem.
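A small SciPy sketch of this agglomerative procedure is shown below; the six
2-D points are illustrative, and the linkage method can be swapped for any of
the measures described on the next slides:

# Agglomerative clustering sketch with SciPy: merge the closest clusters step by step.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[1, 1], [1.2, 1.1], [5, 5], [5.1, 4.9], [9, 1], [9.2, 1.1]])

Z = linkage(points, method='single')      # 'single', 'complete', 'average', 'centroid', ...
labels = fcluster(Z, t=3, criterion='maxclust')   # cut the tree into 3 clusters
print(labels)

# scipy.cluster.hierarchy.dendrogram(Z) would draw the corresponding tree
# if a matplotlib figure is available.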
-
Measure for the distance between two clusters :
⦿ As we have seen, the closest distance between the two clusters is crucial
for the hierarchical clustering. There are various ways to calculate the
distance between two clusters, and these ways decide the rule for
clustering. These measures are called Linkage methods. Some of the
popular linkage methods are given below:
1.Single Linkage: It is the Shortest Distance between the closest points
of the clusters. Consider the below image:
-
2.Complete Linkage: It is the farthest distance between the two points of
two different clusters. It is one of the popular linkage methods as it forms
tighter clusters than single-linkage.
-
3.Average Linkage: It is the linkage method in which the distance
between each pair of datasets is added up and then divided by the total
number of datasets to calculate the average distance between two
clusters. It is also one of the most popular linkage methods.
4.Centroid Linkage: It is the linkage method in which the distance
between the centroid of the clusters is calculated. Consider the below
image:
-
Working of Dendrogram in Hierarchical clustering :
-
-
⦿ In the above diagram, the left part is showing how clusters are created in
agglomerative clustering, and the right part is showing the corresponding
dendrogram.
⦿ As we have discussed above, firstly, the data points P2 and P3 combine
together and form a cluster; correspondingly, a dendrogram is created,
which connects P2 and P3 with a rectangular shape. The height is decided
according to the Euclidean distance between the data points.
⦿ In the next step, P5 and P6 form a cluster, and the corresponding
dendrogram is created. It is higher than the previous one, as the Euclidean
distance between P5 and P6 is a little greater than that between P2 and P3.
⦿ Again, two new dendrograms are created that combine P1, P2, and P3 in
one dendrogram, and P4, P5, and P6 in another dendrogram.
⦿ At last, the final dendrogram is created that combines all the data points
together.
⦿ We can cut the dendrogram tree structure at any level as per our
requirement.
-
⦿ Association rule learning is a type of unsupervised learning technique that
checks for the dependency of one data item on another data item and
maps them accordingly so that the relationship can be made more
profitable. It tries to find interesting relations or associations among the
variables of a dataset. It is based on different rules for discovering the
interesting relations between variables in the database.
⦿ The association rule is one of the very important concepts of machine
learning, and it is employed in Market Basket analysis, Web usage
mining, continuous production, etc. Here, market basket analysis is a
technique used by various big retailers to discover the associations
between items. We can understand it by taking the example of a
supermarket: in a supermarket, all products that are purchased
together are put together.
⦿ For example, if a customer buys bread, he most likely can also buy butter,
eggs, or milk, so these products are stored within a shelf or mostly nearby.
Consider the below diagram:
-
-
Association rule learning can be divided into three types of algorithms:
⦿ Apriori
⦿ Eclat
⦿ F-P Growth Algorithm
-
⦿ Here the "If" element is called the Antecedent, and the "then" statement is
called the Consequent. These types of relationships, where we can find
some association or relation between two items, are known as single cardinality. It
is all about creating rules, and if the number of items increases, then
cardinality also increases accordingly. So, to measure the associations
between thousands of data items, there are several metrics. These metrics
are given below:
⦿ Support
⦿ Confidence
⦿ Lift
-
1.Support : Support is the frequency of an item X, i.e., the fraction of the
transactions that contain X:
Support(X) = (Number of transactions containing X) / (Total number of transactions)
2.Confidence : Confidence indicates how often the rule has been found to
be true, i.e., how often the items X and Y occur together in the dataset when
the occurrence of X is already given. It is the ratio of the transactions that
contain both X and Y to the number of transactions that contain X:
Confidence(X → Y) = Support(X ∪ Y) / Support(X)
3.Lift : It is the strength of a rule, which can be defined by the below formula:
Lift(X → Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
It is the ratio of the observed support measure to the expected support if X
and Y were independent of each other. It has three possible values:
Lift = 1 (X and Y are independent), Lift > 1 (X and Y are positively
correlated), and Lift < 1 (X and Y are negatively correlated).
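These three metrics can be computed directly from a list of transactions. The
tiny bread/butter/milk transaction list below is an assumption used only to
illustrate the formulas:

# Support / Confidence / Lift for a tiny illustrative transaction list.
transactions = [{'bread', 'butter'}, {'bread', 'milk'},
                {'bread', 'butter', 'milk'}, {'milk'}]
N = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / N   # fraction containing the itemset

def confidence(X, Y):
    return support(X | Y) / support(X)

def lift(X, Y):
    return support(X | Y) / (support(X) * support(Y))

X, Y = {'bread'}, {'butter'}
print(support(X | Y), confidence(X, Y), lift(X, Y))
# support = 2/4 = 0.5, confidence = 0.5/0.75 ≈ 0.67, lift ≈ 0.67/0.5 ≈ 1.33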
-
Applications of Association Rule :
-
Apriori Algorithm :
The Apriori algorithm uses frequent item sets to generate association rules,
and it is designed to work on databases that contain transactions. With the
help of these association rules, it determines how strongly or how weakly
two objects are connected. This algorithm uses a Breadth-First Search and a
Hash Tree to calculate the itemset associations efficiently. It is an iterative
process for finding the frequent item sets in a large dataset.
This algorithm was given by R. Agrawal and R. Srikant in the year 1994. It
is mainly used for market basket analysis and helps to find those products
that can be bought together. It can also be used in the healthcare field to find
drug reactions for patients.
Step-1: Determine the support of the itemsets in the transactional database,
and select the minimum support and confidence.
Step-2: Take all the itemsets in the transactions with a higher support value
than the minimum or selected support value.
Step-3: Find all the rules of these subsets that have a higher confidence value
than the threshold or minimum confidence.
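A simplified level-wise sketch of the frequent-itemset search (Steps 1 and 2)
is given below; it omits Apriori's subset-pruning optimisation, and the four
transactions and minimum support of 2 are illustrative assumptions, not the
dataset used in the worked example that follows:

# Level-wise Apriori sketch: generate candidates, count support, prune below min_support.
from itertools import combinations

transactions = [{'A', 'B', 'C'}, {'A', 'B'}, {'A', 'C'}, {'A', 'B', 'C', 'D'}]
min_support = 2

items = sorted({i for t in transactions for i in t})
frequent, k = {}, 1
candidates = [frozenset([i]) for i in items]

while candidates:
    # Keep only candidates whose support count meets the minimum support (Step-2).
    counts = {c: sum(c <= t for t in transactions) for c in candidates}
    level = {c: n for c, n in counts.items() if n >= min_support}
    frequent.update(level)
    # Build (k+1)-item candidates from the surviving k-item sets.
    keys = list(level)
    candidates = list({a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1})
    k += 1

for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(itemset), count)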
-
Advantages of Apriori Algorithm :
-
Apriori Algorithm Working :
-
Solution:
In the first step, we will create a table that contains support count (The
frequency of each itemset individually in the dataset) of each itemset in
the given dataset. This table is called the Candidate set or C1.
-
Now, we will take out all the itemsets that have a support count greater
than the Minimum Support (2). This will give us the table for the frequent
itemset L1.
Since all the itemsets except E have a support count greater than or equal
to the minimum support, the itemset E will be removed.
-
Step-2: Candidate Generation C2, and L2:
In this step, we will generate C2 with the help of L1. In C2, we will create
the pair of the itemsets of L1 in the form of subsets.
After creating the subsets, we will again find the support count from the
main transaction table of datasets, i.e., how many times these pairs have
occurred together in the given dataset. So, we will get the below table for
C2:
-
Again, we need to compare the C2 Support count with the minimum
support count, and after comparing, the itemset with less support count
will be eliminated from the table C2. It will give us the below table for L2
-
Step-3: Candidate generation C3, and L3:
For C3, we will repeat the same two processes, but now we will form the
C3 table with subsets of three itemsets together, and will calculate the
support count from the dataset. It will give the below table:
Now we will create the L3 table. As we can see from the above C3 table,
there is only one combination of itemset that has support count equal to
the minimum support count. So, the L3 will have only one combination,
i.e., {A, B, C}.
-
Step-4: Finding the association rules for the subsets:
To generate the association rules, first, we will create a new table with the
possible rules from the occurred combination {A, B, C}. For all the rules,
we will calculate the Confidence using the formula sup(X ∪ Y) / sup(X). After
calculating the confidence value for all rules, we will exclude the rules
that have less confidence than the minimum threshold (50%).
Consider the below table:
-
Rules | Support | Confidence
A ^ B → C | 2 | sup(A ^ B ^ C) / sup(A ^ B) = 2/4 = 0.5 = 50%
-
⦿ Eclat, abbreviated from Equivalence Class Clustering and bottom-up
Lattice Traversal, is an algorithm for finding frequent item sets in a
transaction dataset. It is one of the best alternative methods of
Association Rule Learning and is a more efficient and scalable version of
the Apriori algorithm. The Apriori algorithm works in a horizontal sense,
imitating the Breadth-First Search of a graph, whereas the ECLAT
algorithm works in a vertical manner, just like the Depth-First Search of a
graph. This vertical style makes the ECLAT algorithm faster than the
Apriori algorithm.
⦿ Generally, Transaction Id sets, also called tidsets, are used to calculate
the Support value of an itemset. In the first call of the function, all single
items are used along with their respective tidsets. Then the function is
called recursively. In each recursive call, each item-tidset pair is verified
and combined with the other item-tidset pairs. This process is repeated
until no candidate item-tidset pairs can be combined.
-
⦿ The input given to this Eclat algorithm is a transaction dataset and a
threshold value which is in the range of 0 to 100.
⦿ A transaction dataset is a set of transactions, where each transaction is a
set of items. It is important to note that an item should not appear more
than once in the same transaction, and the items are assumed to be sorted
in lexicographical order within a transaction.
⦿ Each frequent itemset is marked with its corresponding support value.
The support of an itemset is given by the number of times the itemset
appears in the transaction dataset.
⦿ The given transaction data should be a Boolean matrix where for each cell
(i, j), the value denotes that whether the jth item is included in the
ith transaction or not. Here, 1 means true and 0 means false.
⦿ Now, we have to call the function for the first time and arrange each item
with its tidset in a tabular column. We have to call this function
iteratively till no more item-tidset pairs can be combined.
-
⦿ As discussed earlier, the basic idea of Eclat is to use Transaction Id Set
(tidset) intersections to compute the support value of a candidate. In the
first call of the function, all single items are used along with their tidsets.
Then the function is called recursively, and in each recursive call, each
item-tidset pair is verified and combined with other item-tidset pairs. This
process is continued until no candidate item-tidset pairs can be combined.
Consider the following transaction record:-
-
Transaction Id Bread Butter Milk Coffee Tea
T1 1 1 0 0 1
T2 0 1 0 1 0
T3 0 1 1 0 0
T4 1 1 0 1 0
T5 1 0 1 0 0
T6 0 1 1 0 0
T7 1 0 1 0 0
T8 1 1 1 0 1
T9 1 1 1 0 0
-
⦿ Each cell (i, j), of the above given data, which is a boolean matrix, denotes
whether the j’th item is included in the i’th transaction or not. 1 means true
while 0 means false.
⦿ We now call the function for the first time and arrange each item with its
tidset in a tabular fashion:-
⦿ k = 1, minimum support = 2
ITEM TIDSET
Bread {T1, T4, T5, T7, T8, T9}
Butter {T1, T2, T3, T4, T6, T8, T9}
Milk {T3, T5, T6, T7, T8, T9}
Coffee {T2, T4}
Tea {T1, T8}
-
We now recursively call the function till no more item-tidset pairs can be
combined:-
k=2
ITEM TIDSET
{Bread, Butter} {T1, T4, T8, T9}
{Bread, Milk} {T5, T7, T8, T9}
{Bread, Coffee} {T4}
{Bread, Tea} {T1, T8}
{Butter, Milk} {T3, T6, T8, T9}
{Butter, Coffee} {T2, T4}
{Butter, Tea} {T1, T8}
{Milk, Tea} {T8}
-
K=3
ITEM TIDSET
{Bread, Butter, Milk} {T8, T9}
{Bread, Butter, Tea} {T1, T8}
K=4
ITEM TIDSET
{Bread, Butter, Milk, Tea} {T8}
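The same vertical, tidset-intersection idea can be sketched in a few lines of
Python; the tidsets below are copied from the k = 1 table above, and
intersecting them reproduces the frequent k = 2 itemsets (pairs below the
minimum support, such as {Bread, Coffee}, would be pruned):

# Eclat-style sketch: represent each item by its tidset and intersect tidsets
# to get the support of larger itemsets (data taken from the table above).
from itertools import combinations

tidsets = {
    'Bread':  {'T1', 'T4', 'T5', 'T7', 'T8', 'T9'},
    'Butter': {'T1', 'T2', 'T3', 'T4', 'T6', 'T8', 'T9'},
    'Milk':   {'T3', 'T5', 'T6', 'T7', 'T8', 'T9'},
    'Coffee': {'T2', 'T4'},
    'Tea':    {'T1', 'T8'},
}
min_support = 2

# k = 2: intersect the tidsets of every pair of items.
for a, b in combinations(tidsets, 2):
    common = tidsets[a] & tidsets[b]
    if len(common) >= min_support:        # infrequent pairs are pruned
        print((a, b), sorted(common))     # e.g. ('Bread', 'Butter') -> ['T1', 'T4', 'T8', 'T9']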
-
ITEMS BOUGHT RECOMMENDED PRODUCTS
Bread Butter
Bread Milk
Bread Tea
Butter Milk
Butter Coffee
Butter Tea
Bread and Butter Milk
Bread and Butter Tea
-
Advantages of the Eclat algorithm over the Apriori algorithm:-
-
⦿ Reinforcement Learning is defined as a Machine Learning method that is
concerned with how software agents should take actions in an
environment.
⦿ Alternatively, it is the training of machine learning models to make
a sequence of decisions. The agent (algorithm/software) learns to achieve a
goal in an uncertain, potentially complex environment. In reinforcement
learning, an artificial intelligence faces a game-like situation.
The computer employs trial and error to come up with a solution
to the problem. To get the machine to do what the programmer wants,
the artificial intelligence gets either rewards or penalties for the actions it
performs. Its goal is to maximize the total reward.
⦿ It is one of the three basic machine learning paradigms, along with
supervised and unsupervised learning, which we have already covered in
units 3 to 6.
⦿ Popular applications of reinforcement learning range over a wide set of
areas, including robotics, optimizing chemical reactions, games, assisting
humans, understanding the consequences of different strategies,
self-driving cars, and medical and industrial purposes.
-
Fig.: Components of Reinforcement Learning
-
Here are some important terms used in Reinforcement Learning:
-
⦿ Value Function: It specifies the value of a state, i.e., the total amount of
reward an agent can expect to accumulate starting from that state.
⦿ Model of the environment: This mimics the behaviour of the
environment. It helps you to make inferences and also to determine how
the environment will behave.
⦿ Model-based methods: These are methods for solving reinforcement
learning problems that use a model of the environment.
⦿ Q value or action value (Q): Q value is quite similar to value. The only
difference between the two is that it takes an additional parameter as a
current action.
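Since the Q value has just been introduced, a minimal tabular Q-learning
sketch is given below; the 5-state "corridor" environment, the reward of +1 in
the last state, and the learning parameters are all hypothetical choices for
illustration only:

# Minimal tabular Q-learning sketch on a hypothetical 5-state corridor:
# the agent moves left/right and gets a reward of +1 only on reaching the last state.
import random

n_states, actions = 5, [-1, +1]          # action -1 = left, +1 = right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.1, 0.9, 0.2    # learning rate, discount, exploration rate

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy action selection
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda a: Q[(s, a)])
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-value update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# Learned greedy policy: every non-terminal state should prefer action +1 (go right).
print({s: max(actions, key=lambda a: Q[(s, a)]) for s in range(n_states - 1)})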
-
Working Of Reinforcement Learning :
-
-
Explanation of Example:
-
Applications Of Reinforcement Learning :
⦿ RL is mainly used to train Robots for industrial automation.
⦿ Business strategy planning is aided by RL; Bonsai is one of several
-
⦿ Health and medicine: The RL setup of an agent interacting with an
environment receiving feedback based on actions taken, shares
similarities with the problem of learning treatment policies in the medical
sciences. In fact, many RL applications in health care mostly relate to
finding optimal treatment policies. Recent papers mentioned applications
of RL to usage of medical equipment, medication dosing, and two-stage
clinical trials.
⦿ Aircraft control and robot motion control
⦿ Text, speech, and dialog systems: Companies collect a lot of text, and
good tools that can help unlock unstructured text will find users. AI
researchers at SalesForce used deep RL for abstractive text
summarization (a technique for automatically generating summaries from
text based on content “abstracted” from some original text document).
This could be an area where RL-based tools gain new users, as many
companies are in need of better text mining solutions.
-
⦿ RL is also being used to allow dialog systems (i.e., chatbots) to learn from
user interactions and thus help them improve over time (many enterprise
chatbots currently rely on decision trees). This is an active area of
research and VC investment: see Semantic Machines and VocalIQ, which
was acquired by Apple.
⦿ Media and advertising: Microsoft recently described an internal system
called Decision Service that has since been made available on Azure. This
paper describes applications of Decision Service to content
recommendation and advertising. Decision Service more generally
targets machine learning products that suffer from failure modes
including “feedback loops and bias, distributed data collection, changes
in the environment, and weak monitoring and debugging.”
⦿ Other applications of RL include cross-channel marketing optimization
and real time bidding systems for online display advertising.
⦿ Finance: A Financial Times article described an RL-based system for
optimal trade execution. The system (dubbed "LOXM") is being used to
execute trading orders at maximum speed and at the best possible price.
-
Thanks !!!