UNIT-1: INTRODUCTION TO ARTIFICIAL INTELLIGENCE

What is AI?
Artificial Intelligence (AI) is a branch of science which deals with helping machines
find solutions to complex problems in a more human-like fashion.
This generally involves borrowing characteristics from human intelligence and applying
them as algorithms in a computer-friendly way.
A more or less flexible or efficient approach can be taken depending on the
requirements established, which influences how artificial the intelligent behavior appears.
Artificial intelligence can be viewed from a variety of perspectives.
From the perspective of intelligence, artificial intelligence is making machines
"intelligent" -- acting as we would expect people to act.
The inability to distinguish computer responses from human responses is called the
Turing test.
Intelligence requires knowledge.
Expert problem solving involves restricting the domain so that significant relevant
knowledge can be included.
What is AI?
Object-oriented languages are a class of languages more recently used for AI
programming. Important features of object-oriented languages include: the concepts of
objects and messages; objects bundle data and the methods for manipulating that data;
the sender specifies what is to be done and the receiver decides how to do it; and
inheritance (an object hierarchy where objects inherit the attributes of the more
general class of objects). Examples of object-oriented languages are Smalltalk,
Objective C, and C++. Object-oriented extensions to LISP (CLOS - Common LISP Object
System) and PROLOG (L&O - Logic & Objects) are also used.
Artificial Intelligence can also be described as an electronic machine that stores a
large amount of information and processes it at very high speed.
In the Turing test, the computer is interrogated by a human via a teletype; it passes
if the human cannot tell whether there is a computer or a human at the other end.
AI is also characterized as the ability to solve problems.
It is the science and engineering of making intelligent machines, especially
intelligent computer programs. It is related to the similar task of using computers to
understand human intelligence.
-
History of Artificial Intelligence
Artificial Intelligence is not a new word and not a new technology for researchers;
this technology is much older than you would imagine. There are even myths of
mechanical men in ancient Greek and Egyptian mythology. Following are some milestones
in the history of AI which define the journey from the earliest work on AI to
present-day developments.
-
History of Artificial Intelligence
Maturation of Artificial Intelligence (1943-1952)
Year 1943: The first work which is now recognized as AI was done by Warren McCulloch
and Walter Pitts in 1943. They proposed a model of artificial neurons.
Year 1949: Donald Hebb demonstrated an updating rule for modifying the connection
strength between neurons. His rule is now called Hebbian learning.
Year 1950: Alan Turing, an English mathematician, pioneered machine learning in 1950.
Turing published "Computing Machinery and Intelligence", in which he proposed a test
that can check a machine's ability to exhibit intelligent behavior equivalent to human
intelligence, called the Turing test.
-
History of Artificial Intelligence
The first AI winter (1974-1980)
The duration between the years 1974 and 1980 was the first AI winter. An AI winter
refers to a time period in which computer scientists dealt with a severe shortage of
government funding for AI research.
During AI winters, public interest in artificial intelligence decreased.
The emergence of intelligent agents (1993-2011)
Year 1997: In the year 1997, IBM's Deep Blue beat world chess champion Garry Kasparov
and became the first computer to beat a world chess champion.
Year 2002: For the first time, AI entered the home in the form of Roomba, a vacuum
cleaner.
Year 2006: AI came into the business world. Companies like Facebook, Twitter, and
Netflix also started using AI.
History of Artificial Intelligence
Deep learning, big data and artificial general intelligence (2011-present)
Year 2011: In the year 2011, IBM's Watson won Jeopardy!, a quiz show in which it had
to solve complex questions as well as riddles. Watson proved that it could understand
natural language and solve tricky questions quickly.
Year 2012: Google launched an Android app feature, "Google Now", which was able to
provide information to the user as a prediction.
Year 2014: In the year 2014, the chatbot "Eugene Goostman" won a competition in the
famous "Turing test."
Year 2018: IBM's "Project Debater" debated complex topics with two master debaters and
performed extremely well. Google demonstrated an AI program, "Duplex", a virtual
assistant that booked a hairdresser appointment over the phone, and the lady on the
other side did not notice that she was talking to a machine.
Now AI has developed to a remarkable level. The concepts of deep learning, big data,
and data science are now trending like a boom. Nowadays companies like Google,
Facebook, IBM, and Amazon are working with AI and creating amazing devices. The future
of Artificial Intelligence is inspiring and will come with high intelligence.
Branches of AI
1. Game Playing
You can buy machines that can play master level chess for a few hundred
dollars.
There is some AI in them, but they play well against people mainly through
brute
force computation--looking at hundreds of thousands of positions. To
beat a world champion by brute force and known reliable heuristics
requires being able to look at 200 million positions per second.
2. Speech Recognition
In the 1990s, computer speech recognition reached a practical level for limited
purposes. Thus United Airlines replaced its keyboard tree for flight information with a
system using speech recognition of flight numbers and city names. It is quite
convenient. On the other hand, while it is possible to instruct some computers using
speech, most users have gone back to the keyboard and the mouse as still more
convenient.
Branches of AI
3. Understanding Natural Language
Just getting a sequence of words into a computer is not enough. Parsing
sentences is not enough either. The computer has to be provided with
an understanding of the
domain the text is about, and this is presently possible only for very
limited domains.
4. Computer Vision
The world is composed of three-dimensional objects, but the inputs to the human eye
and computers' TV cameras are two-dimensional. Some useful programs can work solely in
two dimensions, but full computer vision requires partial three-dimensional information
that is not just a set of two-dimensional views. At present there are only limited ways
of representing three-dimensional information directly, and they are not as good as
what humans evidently use.
Branches of AI
5. Expert Systems
A ``knowledge engineer'' interviews experts in a certain domain and
tries to embody their knowledge in a computer program for carrying
out some task. How well this
works depends on whether the intellectual mechanisms required for
the task are within the present state of AI. When this turned out not
to be so, there were many disappointing results.
6. Heuristic Classification
One of the most feasible kinds of expert system given the present
knowledge of AI is
to put some information in one of a fixed set of categories using
several sources of
information. An example is advising whether to accept a proposed credit card purchase.
Information is available about the owner of the credit card, his record of payment and
also about the item he is buying and about the establishment from which he is buying it
(e.g., about whether there have been previous credit card frauds at this establishment).
Applications of AI
Artificial Intelligence has various applications in today's society. It is becoming
essential for today's time because it can solve complex problems in an efficient way in
multiple industries, such as healthcare, entertainment, finance, education, etc. AI is
making our daily life more comfortable and fast.
Following are some sectors which have applications of Artificial Intelligence:
-
Applications of AI
1. AI in Astronomy
Artificial Intelligence can be very useful for solving complex universe problems. AI
technology can be helpful for understanding the universe, such as how it works, its
origin, etc.
2. AI in Healthcare
In the last five to ten years, AI has become more advantageous for the healthcare
industry and is going to have a significant impact on this industry.
Healthcare industries are applying AI to make better and faster diagnoses than humans.
AI can help doctors with diagnoses and can inform them when patients are worsening so
that medical help can reach the patient before hospitalization.
3. AI in Gaming
AI can be used for gaming purposes. AI machines can play strategic games like chess,
where the machine needs to think of a large number of possible positions.
-
Applications of AI
4. AI in Finance
AI and finance industries are the best matches for each other. The finance
industry is implementing automation, chatbot, adaptive intelligence,
algorithm trading, and machine learning into financial processes.
5. AI in Data Security
The security of data is crucial for every company, and cyber-attacks are growing very
rapidly in the digital world. AI can be used to make your data more safe and secure.
Some examples, such as the AEG bot and the AI2 platform, are used to detect software
bugs and cyber-attacks in a better way.
6. AI in Social Media
Social Media sites such as Facebook, Twitter, and Snapchat contain billions of
user profiles, which need to be stored and managed in a very efficient way.
AI can organize and manage massive amounts of data. AI can analyze lots of
data to identify the latest trends, hashtag, and requirement of different
users.
-
Applications of AI
7. AI in Travel & Transport
AI is becoming highly demanding for travel industries. AI is capable of
doing various travel related works such as from making travel
arrangement to suggesting the hotels, flights, and best routes to the
customers. Travel industries are using AI- powered chatbots which can
make human-like interaction with customers for better and fast
response.
8.AI in Automotive Industry
Some Automotive industries are using AI to provide virtual assistant to
their user for better performance. Such as Tesla has introduced
TeslaBot, an intelligent virtual assistant.
Various Industries are currently working for developing self-driven
cars which can make your journey more safe and secure.
9. AI in Robotics:
Artificial Intelligence has a remarkable role in robotics. Usually, general robots are
programmed such that they can perform some repetitive task, but with the help of AI, we
can create intelligent robots which can perform tasks from their own experience without
being pre-programmed.
Humanoid robots are the best examples of AI in robotics; recently, the intelligent
humanoid robots named Erica and Sophia have been developed, which can talk and behave
like humans.
10. AI in Entertainment
We are currently using some AI-based applications in our daily life with entertainment
services such as Netflix or Amazon. With the help of ML/AI algorithms, these services
show recommendations for programs or shows.
11. AI in Agriculture
Agriculture is an area which requires various resources, labor, money, and time for the
best result. Nowadays agriculture is becoming digital, and AI is emerging in this
field. Agriculture is applying AI for agriculture robotics, soil and crop monitoring,
and predictive analysis. AI in agriculture can be very helpful for farmers.
-
Applications of AI
12. AI in E-commerce
AI is providing a competitive edge to the e-commerce industry, and it is becoming more
in demand in the e-commerce business. AI is helping shoppers discover associated
products with recommended size, color, or even brand.
13. AI in Education
AI can automate grading so that the tutor can have more time to teach. An AI chatbot
can communicate with students as a teaching assistant.
In the future, AI can work as a personal virtual tutor for students, which will be
easily accessible at any time and any place.
-
AI Problems & Techniques
Following are some problems that can be solved using AI. The following categories of
problems are considered AI problems.
Ordinary Problems
1.Perception
Vision
Voice Recognition
Speech Recognition
2.Natural Language
Understanding
Generation
Translation
3.Robot Control
-
AI Problems & Techniques
Formal Problems
Game Playing
Solving complex mathematical Problem
Expert Problems
Design
Fault Finding
Scientific Analysis
Medical Diagnosis
Financial Analysis
-
AI Problems & Techniques
There are three important AI techniques:
Search — Provides a way of solving problems for which no direct approach is available.
It also provides a framework into which any direct techniques that are available can be
embedded.
Use of Knowledge — Provides a way of solving complex problems by exploiting the
structures of the objects that are involved.
Abstraction — Provides a way of separating important features and variations from the
many unimportant ones that would otherwise overwhelm the process.
-
Thanks!!!
-
Chapter 2: Problem Spaces & Search
The objective of this lesson is to provide an overview of problem representation
techniques, i.e.:
Representing AI problems as a mathematical model
Representing AI problems as a production system
Defining AI problems as a state space search
This lesson also gives in-depth knowledge about the searching techniques BFS and DFS,
with their algorithms and advantages.
Solution 3 (Water Jug Problem with a 4-litre and a 3-litre jug)

Litres in 4-Litre Jug   Litres in 3-Litre Jug   Rule Applied
0                       0
4                       0                       1
1                       3                       8
0                       3                       3
3                       0                       5
3                       3                       2
4                       2                       7
0                       2                       3
2                       0                       5
Water Jug Problem with 8, 5 and 3 Litre Jugs
The following is a problem which can be solved by using the state space search
technique. "We have 3 jugs of capacities 3, 5, and 8 liters respectively. There is no
scale on the jugs, so it is only their capacities that we certainly know. Initially the
8-liter jug is full of water and the other two are empty. We can pour water from one
jug to another, and the goal is to have exactly 4 liters of water in any of the jugs.
There is no scale on the jugs and we do not have any other tools that would help. The
amount of water in the other two jugs at the end is irrelevant."
Formalize the above problem as a state space search. You should:
1. Suggest a suitable representation of the problem
2. State the initial and goal states of this problem
3. Specify the production rules for getting from one state to another
Water Jug Problem with 8, 5 and 3 Litre Jugs
Solution:
The state space for this problem can be defined as a triple (x, y, z), where
x represents the number of liters of water in the 8-liter jug, x = 0, 1, 2, ..., 8
y represents the number of liters of water in the 5-liter jug, y = 0, 1, 2, 3, 4 or 5
z represents the number of liters of water in the 3-liter jug, z = 0, 1, 2 or 3
The initial state is (8, 0, 0). The goal is to get 4 liters of water in any jug, so the
goal state can be defined as (4, n, n) or (n, 4, n) for any value of n.
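A minimal Python sketch of this formulation, searching the (x, y, z) state space with
breadth-first search; the only production rule encoded is pouring water from one jug
into another until the source is empty or the target is full:

from collections import deque

CAPACITIES = (8, 5, 3)          # capacities of the three jugs
START = (8, 0, 0)               # 8-litre jug full, others empty

def successors(state):
    """Generate all states reachable by pouring one jug into another."""
    for i in range(3):
        for j in range(3):
            if i == j or state[i] == 0:
                continue
            # pour as much as possible from jug i into jug j
            amount = min(state[i], CAPACITIES[j] - state[j])
            if amount > 0:
                new_state = list(state)
                new_state[i] -= amount
                new_state[j] += amount
                yield tuple(new_state)

def solve(goal_amount=4):
    """Breadth-first search over the state space; returns a list of states."""
    frontier = deque([[START]])
    visited = {START}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if goal_amount in state:            # goal: 4 litres in any jug
            return path
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

print(solve())   # e.g. [(8, 0, 0), (3, 5, 0), (3, 2, 3), (6, 2, 0), ...]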
Missionaries and Cannibals Problem: The initial state (i, j) is (3, 3), i.e. three
missionaries and three cannibals on side A of the river, and (0, 0) on side B of the
river.
[Figure: 8-puzzle board configurations showing the initial state and the successor
states generated toward the goal arrangement 1-2-3 / 8-_-4 / 7-6-5.]
Actions: It gives the description of all the available actions to the agent.
Solution: It is an action sequence which leads from the start node to the
goal node.
Optimal Solution: If a solution has the lowest cost among all solutions.
Search and Control Strategies
Types of uninformed search algorithms:
Breadth-first Search
Depth-first Search
Depth-limited Search
Iterative deepening depth-first search
Uniform cost search
Bidirectional Search
Issues in the design of search programs

Breadth-first Search
Advantages:
BFS will provide a solution if any solution exists.
If there is more than one solution for a given problem, then BFS will provide the
minimal solution, i.e. the one which requires the least number of steps.
Example:
In the tree structure below, we show the traversal of the tree using the BFS algorithm
from the root node S to the goal node K. The BFS algorithm traverses in layers, so it
will follow the path shown by the dotted arrow, and the traversed path will be:
S---> A--->B---->C--->D---->G--->H--->E---->F---->I---->K
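A minimal Python sketch of this layer-by-layer traversal; the tree is written as an
adjacency list chosen to reproduce the path above (the actual figure is assumed):

from collections import deque

graph = {
    'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['G', 'H'],
    'C': ['E', 'F'], 'D': [], 'G': ['I'], 'H': [],
    'E': [], 'F': ['K'], 'I': [], 'K': [],
}

def bfs(start, goal):
    """Expand nodes level by level; return the order in which nodes are visited."""
    frontier = deque([start])
    visited = []
    while frontier:
        node = frontier.popleft()       # FIFO queue gives breadth-first order
        visited.append(node)
        if node == goal:
            return visited
        frontier.extend(graph[node])
    return visited

print(bfs('S', 'K'))   # ['S', 'A', 'B', 'C', 'D', 'G', 'H', 'E', 'F', 'I', 'K']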
Depth-first Search
Advantages:
DFS requires very little memory, as it only needs to store a stack of the nodes on the
path from the root node to the current node.
It can take less time to reach the goal node than the BFS algorithm (if it traverses
along the right path).
Example:
In the search tree below, we show the flow of depth-first search, which follows this
order:
It will start searching from the root node S and traverse A, then B, then D and E.
After traversing E, it will backtrack up the tree, as E has no other successor and the
goal node has not yet been found. After backtracking it will traverse node C and then
G, where it terminates because it has found the goal node.
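A minimal Python sketch of this behaviour; the tree below is assumed so that the visit
order matches the description (S, A, B, D, E, backtrack, C, G):

graph = {'S': ['A'], 'A': ['B', 'C'], 'B': ['D', 'E'],
         'C': ['G'], 'D': [], 'E': [], 'G': []}

def dfs(node, goal, visited):
    """Recursive depth-first search; fills `visited` with the visit order."""
    visited.append(node)
    if node == goal:
        return True
    for child in graph[node]:          # go deep along one branch, then backtrack
        if dfs(child, goal, visited):
            return True
    return False

order = []
dfs('S', 'G', order)
print(order)   # ['S', 'A', 'B', 'D', 'E', 'C', 'G']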
Depth-first Search
Space Complexity: The DFS algorithm needs to store only a single path from the root
node, hence the space complexity of DFS is equivalent to the size of the fringe set,
which is O(bm), where b is the branching factor and m is the maximum depth.
BFS vs. DFS:
BFS stands for Breadth First Search; DFS stands for Depth First Search.
BFS is more suitable for searching vertices which are closer to the given source; DFS
is more suitable when there are solutions away from the source.
The time complexity of BFS is O(V + E) when an adjacency list is used and O(V^2) when
an adjacency matrix is used, where V stands for vertices and E stands for edges; the
time complexity of DFS is the same.
Advantages:
Depth-limited search is memory efficient.
Uniform-cost Search
Disadvantages:
It does not care about the number of steps involved in searching and is only concerned
with path cost, due to which this algorithm may get stuck in an infinite loop.
Example:
Time Complexity:
Let C* be the cost of the optimal solution and ε the cost of each step toward the goal
node. Then the number of steps is C*/ε + 1 (we add 1 because we start from state 0 and
end at C*/ε). Hence, the worst-case time complexity of uniform-cost search is
O(b^(1 + [C*/ε])).
Space Complexity:
The same logic applies for space complexity, so the worst-case space complexity of
uniform-cost search is O(b^(1 + [C*/ε])).
Optimal:
Uniform-cost search is always optimal as it only selects a path with the lowest
path cost.
Iterative deepening depth-first Search
5. Iterative deepening depth-first Search:
Disadvantages:
The main drawback of IDDFS is that it repeats all the work of the
previous phase.
Example:
Completeness:
This algorithm is complete if the branching factor is finite.
Time Complexity:
Let's suppose b is the branching factor and d is the depth; then the worst-case time
complexity is O(b^d).
Space Complexity:
The space complexity of IDDFS is O(bd).
Optimal:
The IDDFS algorithm is optimal if the path cost is a non-decreasing function of the
depth of the node.
Bidirectional Search
6. Bidirectional Search Algorithm:
Advantages:
Bidirectional search is fast.
Bidirectional search requires less memory
Example:
In the below search tree, bidirectional search algorithm is applied. This
algorithm divides one graph/tree into two sub-graphs. It starts traversing
from node 1 in the forward direction and starts from goal node 16 in the
backward direction.
The algorithm terminates at node 9 where two searches meet.
Here h(n) is the heuristic (estimated) cost and h*(n) is the actual cost; for an
admissible heuristic, the heuristic cost should be less than or equal to the actual
cost.
The greedy best-first search algorithm always selects the path which appears best at
that moment. It is a combination of the depth-first search and breadth-first search
algorithms. It uses the heuristic function and search. Best-first search allows us to
take the advantages of both algorithms. With the help of best-first search, at each
step we can choose the most promising node. In the greedy best-first search algorithm,
we expand the node which is closest to the goal node, where the closeness is estimated
by the heuristic function, i.e.
f(n) = h(n).
Best-first Search
Where, h(n)= estimated cost from node n to the goal.
The greedy best first algorithm is implemented by the priority queue.
Disadvantages:
It can behave as an unguided depth-first search in the worst case scenario.
It can get stuck in a loop as DFS.
This algorithm is not optimal.
Example:
Consider the below search problem, and we will traverse it using greedy
best-first search. At each iteration, each node is expanded using evaluation
function f(n)=h(n) , which is given in the below table.
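The slide's table and graph are assumed here; the following minimal Python sketch shows
greedy best-first search with a priority queue ordered by f(n) = h(n), using
illustrative heuristic values:

import heapq

graph = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['E', 'F'],
         'C': [], 'D': [], 'E': ['G'], 'F': [], 'G': []}
h = {'S': 10, 'A': 9, 'B': 4, 'C': 8, 'D': 7, 'E': 3, 'F': 5, 'G': 0}

def greedy_best_first(start, goal):
    """Always expand the frontier node with the smallest heuristic value h(n)."""
    frontier = [(h[start], start, [start])]     # priority queue keyed on h(n)
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for child in graph[node]:
            heapq.heappush(frontier, (h[child], child, path + [child]))
    return None

print(greedy_best_first('S', 'G'))   # ['S', 'B', 'E', 'G']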
Algorithm of A* search:
Step1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not, if the list is empty then
return failure and stops.
Step 3: Select the node from the OPEN list which has the smallest value of
evaluation function (g+h), if node n is goal node then return success and
stop, otherwise
Step 4: Expand node n and generate all of its successors, and put n into the
closed list. For each successor n', check whether n' is already in the OPEN
or CLOSED list, if not then compute evaluation function for n' and place
into Open list.
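A minimal Python sketch of these steps: the OPEN list is a priority queue ordered by
f = g + h, and CLOSED holds expanded nodes. The graph, step costs and heuristic values
below are assumed for illustration:

import heapq

graph = {'S': [('A', 1), ('B', 3)], 'A': [('C', 2), ('D', 5)],
         'B': [('D', 2)], 'C': [('G', 6)], 'D': [('G', 3)], 'G': []}
h = {'S': 7, 'A': 6, 'B': 4, 'C': 5, 'D': 3, 'G': 0}   # assumed heuristic values

def a_star(start, goal):
    """Expand the OPEN node with the smallest f = g + h; keep expanded nodes in CLOSED."""
    open_list = [(h[start], 0, start, [start])]   # (f, g, node, path)
    closed = set()
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        if node in closed:
            continue
        closed.add(node)
        for neighbour, cost in graph[node]:
            if neighbour not in closed:
                g2 = g + cost
                heapq.heappush(open_list, (g2 + h[neighbour], g2, neighbour,
                                           path + [neighbour]))
    return None, float('inf')

print(a_star('S', 'G'))   # (['S', 'B', 'D', 'G'], 8) -> lowest-cost path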
Advantages:
The A* search algorithm is better than other search algorithms.
The A* search algorithm is optimal and complete.
This algorithm can solve very complex problems.
Disadvantages:
It does not always produce the shortest path, as it is mostly based on heuristics and
approximation.
The A* search algorithm has some complexity issues.
The main drawback of A* is its memory requirement.
A* Search
Example:
In this example, we will traverse the given graph using the A* algorithm.
The heuristic value of all states is given in the below table so we will
calculate the f(n) of each state using the formula f(n)= g(n) + h(n), where
g(n) is the cost to reach any node from start state.
Here we will use OPEN and CLOSED list.
Points to remember:
A* algorithm returns the path which occurred first, and it does not search
for all remaining paths.
The efficiency of A* algorithm depends on the quality of heuristic.
The A* algorithm expands all nodes which satisfy the condition on f(n).

A* Algorithm Example (continued):
f = g + h = 149 + 100 = 249
Put this node in the START list and sort the list,
hence START = [ K(172), L(243), I(249), B(258), D(285) ]
Remove the Bestnode from START, i.e. K, which is not our goal node, and hence generate
its successor, i.e. M, and calculate its f value.
A* Algorithm Example
F(A-C-F-J-K-M) = g(M) + h(M) = 172 + 0 = 172
Put this node in the START list and sort the list.
Therefore START = [ M(172), L(243), B(258), D(285) ]
Remove the Bestnode from START, i.e. M, which is our goal node.
AND-OR graphs are useful for certain problems where the solution
involves decomposing the problem into smaller problems. This is called
Problem Reduction.
Here, alternatives involve branches where some or all must be satisfied before we can
progress.
In the case of the A* algorithm, we use the open list to hold nodes that have been
generated but not expanded, and the closed list to hold nodes that have been expanded.
It requires that nodes traversed in the tree be labelled as SOLVED or UNSOLVED in the
solution process, to account for AND node solutions which require solutions to all
successor nodes.
A solution is found when the start node is labelled as SOLVED.
AO* is the best algorithm for solving cyclic AND-OR graphs.
[Figure: AND-OR graph for the goal "Acquire a TV set", with AND and OR arcs
decomposing the goal into subproblems.]
Disadvantages:
Sometimes, for unsolvable nodes, it cannot find the optimal path. Its complexity is
higher than that of other algorithms.
[Figure: AND-OR graphs in which the goal is decomposed either into A1 AND A2, or A3
alone, and alternatively into A1 alone, or A2 AND A3, with each arc having cost 1.]
In the figure, the top node A has been expanded, producing two arcs, one leading to B
and one leading to C and D. The numbers at each node represent the heuristic cost h at
that node (the cost of getting to the goal state from the current state). For
simplicity, it is assumed that every operation (i.e. applying a rule) has unit cost,
i.e. each arc with a single successor has a cost of 1, and likewise for each of its
components. With the information available so far, it appears that C is the most
promising node to expand, since its h = 3 is the lowest; but going through B would be
better, since to use C we must also use D, and that cost would be 9 (3+4+1+1), whereas
through B it would be 6 (5+1).
Thus the choice of the next node to expand depends not only on its value but also on
whether that node is part of the current best path from the initial node.
In the figure, node G appears to be the most promising node, with the least f' value.
But G is not on the current best path, since to use G we must also use H, with a cost
of 9, and this in turn demands that further arcs be used (with a cost of 27). The path
from A through B, E-F is better, with a total cost of (17+1=18).
2. Pick one of these unexpanded nodes and expand it. Add its successors to the graph
and compute f' (the cost of the remaining distance) for each of them.
Step 1: Evaluate the initial state, if it is goal state then return success and
Stop.
Step 2: Loop Until a solution is found or there is no new operator left to
apply.
Step 3: Select and apply an operator to the current state.
Step 4: Check new state:
a. If it is goal state, then return success and quit.
b.Else if it is better than the current state then assign new state
as a current state.
c. Else if not better than the current state, then return to step2.
Step 5: Exit.
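A minimal Python sketch of the simple hill climbing procedure described above: move to
the first neighbour that improves on the current state, and stop when no operator
yields an improvement (which may be only a local maximum). The objective function and
neighbour rule are assumed for illustration:

def simple_hill_climbing(start, neighbours, value):
    """Move to the first better neighbour; stop at a goal or local maximum."""
    current = start
    while True:
        improved = False
        for candidate in neighbours(current):
            if value(candidate) > value(current):   # better than the current state
                current = candidate
                improved = True
                break                               # take the first improvement found
        if not improved:
            return current

# toy example: maximise f(x) = -(x - 7)**2 over the integers
value = lambda x: -(x - 7) ** 2
neighbours = lambda x: [x - 1, x + 1]
print(simple_hill_climbing(0, neighbours, value))   # climbs from 0 up to 7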
Stochastic hill climbing does not examine all its neighbors before moving. Rather, this
search algorithm selects one neighbor node at random and decides whether to choose it
as the current state or to examine another state.
Hill Climbing
Problems in Hill Climbing Algorithm:
[Figure: Search tree for the 8-puzzle problem generated by the hill climbing procedure,
with heuristic values H=4 at the start, H=3 and H=2 for successors, then H=1 and H=0 at
the goal state; successors with worse values (H=5, H=3) are rejected.]
Tower of Hanoi Problem
Problem Statement: Move a stack of disks from the source peg to the destination peg
using an auxiliary peg, moving one disk at a time and never placing a larger disk on a
smaller one.

Means-Ends Analysis works as follows:
First, evaluate the difference between the Initial State and the Final State.
Select the various operators which can be applied for each difference.
Apply the operator at each difference, which reduces the difference between the current
state and the goal state.
Solution:
To solve the above problem, we will first find the differences between initial
states and goal states, and for each difference, we will generate a new state
and will apply the operators. The operators we have for this problem are:
Move
Delete
Expand
Means-Ends Analysis
1. Evaluating the initial state: In the first step, we will evaluate the initial state and
will compare the initial and Goal state to find the differences between both states.
2. Applying Delete operator: As we can check the first difference is that in goal state
there is no dot symbol which is present in the initial state, so, first we will apply the
Delete operator to remove this dot.
3. Applying Move Operator: After applying the Delete operator, the new state occurs
which we will again compare with goal state. After comparing these states, there is
another difference that is the square is outside the circle, so, we will apply the
Move Operator.
Means-Ends Analysis
4. Applying Expand Operator: Now a new state is generated in the third step, and we will
compare this state with the goal state. After comparing the states there is still one difference
which is the size of the square, so, we will apply Expand operator, and finally, it will generate
the goal state.
Example plan (robot task):
START → 1. WALK(R1) → 2. PICKUP(A) → 3. PUTDOWN(A) → 4. PICKUP(B) → 5. PUTDOWN(B) →
6. PUSH(D,R2) → 7. WALK(R1) → 8. PICKUP(A) → 9. CARRY(A,R2) → 10. PUTDOWN(A) →
11. WALK(R1) → 12. PICKUP(B) → 13. CARRY(B,R2) → 14. PLACE(A,B) → GOAL
Eg.
Solution:
From the first row of the multiplication it is clear that B = 1, as JE * B = JE.
Since, in the multiplication, the second row should start with 0 at the ten's place,
A = 0.
Now in the hundred's place, J + something = 10; when adding something to a single-digit
number gives 10, that digit must be 9, so J = 9.
Cryptarithmetic Problem
Now J + E = 10 + D, i.e. 9 + E = 10 + D. Here E cannot be 0 or 1, as these digits are
already assigned to A and B respectively.
Assume E = 2, which gives 9 + 2 = 11 and hence D = 1, which is not possible; therefore
E cannot be 2.
Assume E = 3, which gives 9 + 3 = 12, hence D = 2.
Hence the solution is obtained.
Solution:
From the first row of the multiplication, H = 1 is clear, as HE x H = HE.
Now H + A = M, i.e. 1 + A = 10 + M, as there is a carry over to the next level;
therefore A = 9, M = 0 and N = 2.
Now HE * E = HHA, i.e. 1E * E = 119, so by trial and error we get E = 7.
Solution:
Set of variables Xi = [Pune, Mumbai, Nasik, Jalgaon, Nagpur]
Set of domains Di = [Red, Green, Blue] for each Xi
Constraint: No adjacent cities have the same color.
Map or Graph Coloring Problem
City / Operation          Pune   Nasik   Mumbai   Nagpur   Jalgaon
Initial Domain            RGB    RGB     RGB      RGB      RGB
Assign Red to Pune        R      GB      GB       RGB      RGB
Assign Green to Nasik     R      G       B        RG       RG
Assign Red to Nagpur      R      G       B        R        G
Assign Green to Jalgaon   R      G       B        R        G
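A minimal Python sketch of solving this constraint satisfaction problem by backtracking
search; the adjacency list below is assumed for illustration, since the map itself is
not reproduced here:

neighbours = {
    'Pune':    ['Mumbai', 'Nasik'],
    'Mumbai':  ['Pune', 'Nasik'],
    'Nasik':   ['Pune', 'Mumbai', 'Jalgaon', 'Nagpur'],
    'Jalgaon': ['Nasik', 'Nagpur'],
    'Nagpur':  ['Nasik', 'Jalgaon'],
}
colours = ['Red', 'Green', 'Blue']

def consistent(city, colour, assignment):
    """The constraint: no two adjacent cities share a colour."""
    return all(assignment.get(n) != colour for n in neighbours[city])

def backtrack(assignment, cities):
    if len(assignment) == len(cities):
        return assignment
    city = next(c for c in cities if c not in assignment)
    for colour in colours:
        if consistent(city, colour, assignment):
            assignment[city] = colour
            result = backtrack(assignment, cities)
            if result:
                return result
            del assignment[city]        # undo the assignment and try the next colour
    return None

print(backtrack({}, list(neighbours)))
# e.g. {'Pune': 'Red', 'Mumbai': 'Green', 'Nasik': 'Blue', 'Jalgaon': 'Red', 'Nagpur': 'Green'}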
-
Python Overview
-
-
Job Trend
According to indeed.com, the percentage growth of Python is 500 times more than that of
its peer languages.
http://www.indeed.com/jobtrends?q=Perl%2C+.Net%2C+Python%2Cjava&l=&relative=1
-
Job In Big Data space
Source: http://www.forbes.com/sites/louiscolumbus/2014/12/29/where-big-data-
jobs-will-be-in- 2015/
-
What is Scripting Language?
-
What is Python?
-
Interpreters Versus Compilers
-
• Create a source file using a text editor
• Use a compiler to syntax-check and convert the source file into binary
• Use a linker to turn the binary files into an executable format
• Run the resulting executable file in the operating system.
-
• The biggest difference between interpreted code and compiled
code is that an interpreted application need not be
“complete.”
• You can test it in bits and pieces until you are satisfied with
the results and put them all together later for the end user to
use.
-
Python Features
-
More Features ..
-
Why Python
Easy to read: Python scripts have clear syntax, a simple structure and very few
protocols to remember before programming.
Easy to maintain: Python code is easy to write and debug. Part of Python's success is
that its source code is fairly easy to maintain.
Portable: Python can run on a wide variety of operating systems and platforms and
provides a similar interface on all platforms.
Broad standard libraries: Python comes with many prebuilt libraries (approximately 21K).
High-level programming: Python is intended to make complex programming simpler. Python
deals with memory addresses, garbage collection, etc. internally.
Interactive: Python provides an interactive shell to test things before implementation.
It provides the user a direct interface with Python.
Database interfaces: Python provides interfaces to all major commercial databases.
These interfaces are pretty easy to use.
GUI programming: Python supports GUI applications and has frameworks for the web.
Interfaces to Tkinter, wxPython, and Django make GUI and web programming in Python easy.
-
History of Python
-
Python Versions
-
Python time line
By Ripal
Ranpara
-
Key Changes in Python 3.0
Python 2's print statement has been replaced by the print() function.
Old: New:
-
Key Changes in Python 3.0
In Python 3, we should enclose the exception argument in parentheses.
Old: New:
Old: New:
The division of two integers returns a float instead of an integer. "//" can be
used to have the "old" behavior.
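A minimal sketch of the three changes described above, written in Python 3 syntax:

print("Hello")                      # print is a function; Python 2 allowed: print "Hello"

try:
    raise ValueError("bad input")   # the exception argument goes in parentheses
except ValueError as err:           # "as err" replaces the old ", err" form
    print(err)

print(7 / 2)    # 3.5 -> division of two integers returns a float in Python 3
print(7 // 2)   # 3   -> "//" keeps the old floor-division behaviour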
-
Python Syntax
-
Basic Syntax
Indentation is used in Python to delimit blocks. The number of spaces is variable, but
all statements within the same block must be indented by the same amount.
The header line of compound statements, such as if, while, def, and class, should be
terminated with a colon ( : ).
The semicolon ( ; ) is optional at the end of a statement.
-
Variables
-
Python Data Types
-
Numbers
Numbers are immutable objects in Python, i.e. they cannot change their values.
There are three built-in data types for numbers in Python 3:
• Integer (int)
• Floating-point numbers (float)
• Complex numbers: <real part> + <imaginary part>j (not used much in Python programming)
Common Number Functions
Function   Description
int(x)     Converts x to an integer
float(x)   Converts x to a floating-point number
abs(x)     The absolute value of x
cmp(x,y)   -1 if x < y, 0 if x == y, or 1 if x > y (Python 2 only)
exp(x)     The exponential of x: e**x
log(x)     The natural logarithm of x, for x > 0
pow(x,y)   The value of x**y
sqrt(x)    The square root of x, for x > 0
(exp, log and sqrt are provided by the math module.)
-
Strings
Python strings are immutable objects that cannot change their values.
String indexes start at 0 at the beginning of the string; negative indexes work their
way from -1 at the end.
-
Strings
String Formatting (the examples assume a = "Hello" and b = "Python")
+    Concatenation - Adds values on either side of the operator: a + b will give HelloPython
*    Repetition - Creates new strings, concatenating multiple copies of the same string: a*2 will give HelloHello
[]   Slice - Gives the character at the given index: a[1] will give e; a[-1] will give o
[:]  Range slice - Gives the characters in the given range: a[1:4] will give ell
in   Membership - Returns True if a character exists in the given string: 'H' in a will give True
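A minimal sketch of these operators in use:

a = "Hello"
b = "Python"

print(a + b)      # HelloPython  (concatenation)
print(a * 2)      # HelloHello   (repetition)
print(a[1])       # e            (indexing)
print(a[-1])      # o            (negative index counts from the end)
print(a[1:4])     # ell          (range slice)
print('H' in a)   # True         (membership)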
-
Strings
Common String Methods
Method Description
str.count(sub, beg= Counts how many times sub occurs in string or in a substring of string if starting index
0,end=len(str)) beg and ending index end are given.
str.isalpha() Returns True if string has at least 1 character and all characters are alphanumeric
and False otherwise.
str.isdigit() Returns True if string contains only digits and False otherwise.
str.lower() Converts all uppercase letters in string to lowercase.
str.upper() Converts lowercase letters in string to uppercase.
str.replace(old, new) Replaces all occurrences of old in string with new.
str.split(str=‘ ’) Splits string according to delimiter str (space if not provided) and returns list
of substrings.
str.strip() Removes all leading and trailing whitespace of string.
str.title() Returns "titlecased" version of string.
-
Lists
A list in Python is an ordered group of items or elements, and these list elements don't have
to be of the same type.
Python Lists are mutable objects that can change their values.
A list contains items separated by commas and enclosed within square brackets.
List indexes, like string indexes, start at 0 at the beginning of the list; negative
indexes work their way from -1 at the end.
Similar to strings, Lists operations include slicing ([ ] and [:]) , concatenation (+),
repetition (*), and membership (in).
This example shows how to access, update and delete list elements:
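A minimal sketch of accessing, updating and deleting list elements; the values are
illustrative:

fruits = ['apple', 'banana', 'cherry', 'mango']

print(fruits[0])        # apple                  (access by index)
print(fruits[1:3])      # ['banana', 'cherry']   (slice)

fruits[1] = 'orange'    # update an element in place (lists are mutable)
fruits.append('grape')  # add an element at the end

del fruits[0]           # delete by index
fruits.remove('mango')  # delete by value

print(fruits)           # ['orange', 'cherry', 'grape']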
-
Lists
Lists can have sublists as elements and these sublists may contain other sublists as
well.
-
Lists
Common List
Methods Method Description
list.append(obj) Appends object obj to list
list.insert(index, obj) Inserts object obj into list at offset index
List comprehension
-
Python Reserved Words

Tuples
Python tuples are immutable objects that cannot be changed once they have been created.
A tuple contains items separated by commas and enclosed in parentheses instead of
square brackets.
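A minimal sketch of a tuple; the values are illustrative:

point = (3, 4, 'label')      # comma-separated items in parentheses
print(point[0], point[-1])   # 3 label  (indexing works like lists)
# point[0] = 5               # would raise TypeError: tuples are immutable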
-
Sets
Sets are used to store multiple items in a single variable.
Set is one of 4 built-in data types in Python used to store collections of data, the other 3
are List, Tuple, and Dictionary, all with different qualities and usage.
A set is a collection which is both unordered and unindexed.
Sets are written with curly brackets.
Example
Create a Set:
thisset = {"apple", "banana", "cherry"}
print(thisset)
Set Items
Set items are unordered, unchangeable, and do
not allow duplicate values.
Unordered
Unordered means that the items in a set do not
have a defined order.
Set items can appear in a different order every
time you use them, and cannot be referred to
by index
or key.
-
Unchangeable
Sets are unchangeable, meaning that we cannot change the items after the set has been created.
Once a set is created, you cannot change its items, but you can add new items.
Hash Table
• Hashing is a technique that is used to uniquely identify a specific object
from a group of similar objects.
•Assume that you have an object and you want to assign a key to it
to make searching easy.
-
Dictionary
Python's dictionaries are a kind of hash table type which consists of key-value pairs
of unordered elements.
• Keys: must be immutable data types, usually numbers or strings.
• Values: can be any arbitrary Python object.
Python dictionaries are mutable objects that can change their values.
A dictionary is enclosed by curly braces ({ }); the items are separated by commas, and
each key is separated from its value by a colon (:).
A dictionary's values can be assigned and accessed using square brackets ([ ]) with a
key to obtain its value.
-
Dictionary
This example shows how to access, update and delete dictionary elements:
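A minimal sketch with illustrative keys and values; the expected output is shown in the
comments:

student = {'name': 'Asha', 'age': 21, 'course': 'AI'}

print(student['name'])        # Asha  (access a value by its key)

student['age'] = 22           # update an existing entry
student['city'] = 'Pune'      # add a new key-value pair

del student['course']         # delete one entry
print(student)                # {'name': 'Asha', 'age': 22, 'city': 'Pune'}

student.clear()               # remove all entries
print(student)                # {}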
-
Dictionary
Common Dictionary Functions
• cmp(dict1, dict2): compares the elements of both dictionaries (Python 2 only).
• len(dict): gives the total number of (key, value) pairs in the dictionary.
-
Python Control Structures
-
Conditionals
In Python, True and False are Boolean objects of class 'bool' and they are immutable.
Python assumes any non-zero and non-null values as True, otherwise it is False value.
Python does not provide switch or case statements as in other languages.
Syntax:
Example:
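A minimal sketch of the if / elif / else form described above; the variable and
thresholds are illustrative:

marks = 72

if marks >= 75:
    grade = 'Distinction'
elif marks >= 40:        # chained elif plays the role of switch/case
    grade = 'Pass'
else:
    grade = 'Fail'

print(grade)             # Pass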
-
Conditionals
-
Loops
-
Loops
Loop Control Statements
break: Terminates the loop statement and transfers execution to the statement
immediately following the loop.
continue: Causes the loop to skip the remainder of its body and immediately retest its
condition prior to reiterating.
pass: Used when a statement is required syntactically but you do not want any command
or code to execute.
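A minimal sketch of the three loop control statements in a for loop:

for n in range(1, 8):
    if n == 6:
        break          # terminates the loop; execution continues after it
    if n % 2 == 0:
        continue       # skips the rest of the body for even numbers
    if n == 5:
        pass           # placeholder that does nothing
    print(n)           # prints 1, 3, 5

while True:
    break              # while loops support the same control statements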
-
Python Functions
-
Functions
A function is a block of organized, reusable code that is used to perform a single, related action.
Functions provide better modularity for your application and a high degree of code reusing.
Defining a Function
• Function blocks begin with the keyword def followed by the function name and parentheses ( (
) ).
• Any input parameters or arguments should be placed within these parentheses. You can also
define parameters inside these parentheses.
• The first statement of a function can be an optional statement - the documentation string of
the function or docstring.
• The code block within every function starts with a colon (:) and is indented.
• The statement return [expression] exits a function, optionally passing back an expression to
the caller. A return statement with no arguments is the same as return None.
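A minimal sketch of a function definition following these rules (keyword def, a
docstring, an indented block, and a return statement); the names are illustrative:

def add(a, b=10):
    """Return the sum of a and b (this string is the function's docstring)."""
    return a + b

print(add(5))         # 15 -> b falls back to its default value
print(add(5, 2))      # 7  -> required (positional) arguments
print(add(b=1, a=4))  # 5  -> keyword arguments in any order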
-
Functions
Function Syntax
Function Arguments
You can call a function by using any of the following types of arguments:
• Required arguments: the arguments passed to the function in correct
positional order.
• Keyword arguments: the function call identifies the arguments by the
parameter names.
• Default arguments: the argument has a default value in the function
declaration used when the value is not provided in the function call.
-
Functions
• Variable-length arguments: These are used when you need to process unspecified
additional arguments. An asterisk (*) is placed before the variable name in the
function declaration, as in the sketch below.
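A minimal sketch of a variable-length argument list:

def total(*numbers):          # the * collects extra positional arguments into a tuple
    return sum(numbers)

print(total(1, 2, 3, 4))      # 10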
-
Python File Handling
-
File Handling
-
File Handling
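A minimal file-handling sketch: writing to and reading from a file with a context
manager. The file name is illustrative:

with open("notes.txt", "w") as f:    # "w" = write mode; creates the file if absent
    f.write("Python file handling\n")

with open("notes.txt", "r") as f:    # "r" = read mode
    print(f.read())                   # prints the file's contents

# The with-statement closes the file automatically, even if an error occurs.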
-
Python Exception Handling
-
Exception Handling
Common Exceptions in Python:
NameError - TypeError - IndexError - KeyError - Exception
Exception Handling Syntax:
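A minimal sketch of the try / except / else / finally syntax:

try:
    value = int("42x")          # raises ValueError
except (ValueError, TypeError) as err:
    print("Conversion failed:", err)
else:
    print("Converted:", value)  # runs only if no exception was raised
finally:
    print("Done")               # always runs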
-
EXCEPTION NAME      DESCRIPTION
Exception           Base class for all exceptions.
StopIteration       Raised when the next() method of an iterator does not point to any object.
SystemExit          Raised by the sys.exit() function.
StandardError       Base class for all built-in exceptions except StopIteration and SystemExit.
ArithmeticError     Base class for all errors that occur for numeric calculation.
OverflowError       Raised when a calculation exceeds the maximum limit for a numeric type.
FloatingPointError  Raised when a floating point calculation fails.
ZeroDivisionError   Raised when division or modulo by zero takes place for all numeric types.
AssertionError      Raised in case of failure of the assert statement.
AttributeError      Raised in case of failure of attribute reference or assignment.
EOFError            Raised when there is no input from either the raw_input() or input() function and the end of file is reached.
ImportError         Raised when an import statement fails.
KeyboardInterrupt   Raised when the user interrupts program execution, usually by pressing Ctrl+C.
LookupError         Base class for all lookup errors.
IndexError          Raised when an index is not found in a sequence.
KeyError            Raised when the specified key is not found in the dictionary.
NameError           Raised when an identifier is not found in the local or global namespace.
UnboundLocalError   Raised when trying to access a local variable in a function or method but no value has been assigned to it.
EnvironmentError    Base class for all exceptions that occur outside the Python environment.
IOError             Raised when an input/output operation fails, such as the print statement or the open() function when trying to open a file that does not exist.
OSError             Raised for operating system-related errors.
SyntaxError         Raised when there is an error in Python syntax.
IndentationError    Raised when indentation is not specified properly.
SystemError         Raised when the interpreter finds an internal problem, but when this error is encountered the Python interpreter does not exit.
SystemExit          Raised when the Python interpreter is quit by using the sys.exit() function. If not handled in the code, causes the interpreter to exit.
-
Modules
A module is a file consisting of Python code that can define functions, classes and
variables.
A module allows you to organize your code by grouping related code which
makes the code
easier to understand and use.
You can use any Python source file as a module by executing an import statement
Python's from statement lets you import specific attributes from a module into the
current namespace.
import * statement can be used to import all names from a module into the
current
namespace
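A minimal sketch of the import and from statements:

import math                 # import a whole module
from math import sqrt       # import a specific attribute into the current namespace

print(math.pi)              # 3.141592653589793
print(sqrt(16))             # 4.0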
-
Python Object Oriented
-
Python Classes
Class variable
Class constructor
Output
-
Python Classes
Data Hiding: You need to name attributes with a double underscore prefix; those
attributes are then not directly visible to outsiders.
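A minimal sketch showing a class variable, a constructor, and data hiding with the
double underscore prefix; the names and values are illustrative:

class Employee:
    count = 0                            # class variable shared by all instances

    def __init__(self, name, salary):    # constructor
        self.name = name
        self.__salary = salary           # double underscore -> name-mangled, "hidden"
        Employee.count += 1

    def display(self):
        print(self.name, self.__salary)

e = Employee("Asha", 50000)
e.display()                 # Asha 50000
print(Employee.count)       # 1
# print(e.__salary)         # AttributeError: not directly visible to outsiders
print(e._Employee__salary)  # 50000 -> still reachable via name mangling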
-
Class Inheritance
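A minimal sketch of inheritance and method overriding; the class names are
illustrative:

class Animal:                       # parent (base) class
    def speak(self):
        return "..."

class Dog(Animal):                  # child class inherits from Animal
    def speak(self):                # overrides the parent method
        return "Woof"

print(Dog().speak())                # Woof
print(isinstance(Dog(), Animal))    # True -> a Dog is also an Animal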
-
Python vs. Java
Code Examples
-
Python vs. Java
Hello World
Java
Python
String Operations
Java
Python
-
Python vs. Java
Collections
Java
Python
-
Python vs. Java
Class and Inheritance
Java
Python
-
Python Useful Tools
-
Useful Tools
Python IDEs
•Vim
•Eclipse with
PyDev
•Sublime Text
•Emacs
•Komodo Edit
•PyCharm
-
Useful Tools
-
Who Uses Python?
-
Organizations Use Python
-
Thank You
-
-
⦿ Machine learning is about extracting knowledge from the data. It can be defined
as,
⦿ Machine learning is a subfield of artificial intelligence, which enables machines to
learn from past data or experiences without being explicitly programmed.
⦿ Machine learning enables a computer system to make predictions or take some
decisions using historical data without being explicitly programmed. Machine
learning uses a massive amount of structured and semi-structured data so that a
machine learning model can generate accurate result or give predictions based on
that data.
⦿ Machine learning works on algorithms which learn on their own using historical data.
It works only for specific domains: for example, if we are creating a machine learning
model to detect pictures of dogs, it will only give results for dog images, and if we
provide new data such as a cat image, it will become unresponsive. Machine learning is
being used in various places, such as online recommender systems, Google search
algorithms, email spam filters, Facebook auto friend tagging suggestions, etc.
It can be divided into three types:
⦿ Supervised learning
⦿ Reinforcement learning
⦿ Unsupervised learning
-
-
-
⦿ A Machine Learning system learns from historical data, builds
the prediction models, and whenever it receives new data,
predicts the output for it. The accuracy of predicted output
depends upon the amount of data, as the huge amount of data
helps to build a better model which predicts the output more
accurately.
⦿ Suppose we have a complex problem, where we need to perform
some predictions, so instead of writing a code for it, we just need
to feed the data to generic algorithms, and with the help of these
algorithms, machine builds the logic as per the data and predict
the output. Machine learning has changed our way of thinking
about the problem. The below block diagram explains the
working of Machine Learning algorithm:
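A minimal sketch of this idea using scikit-learn (an assumed library, not named in
these notes): historical data goes in, the algorithm builds the logic, and the fitted
model predicts the output for new data without explicitly programmed rules. The data
and feature names are invented for illustration:

from sklearn.tree import DecisionTreeClassifier

# toy historical data: [weight_kg, ear_length_cm] -> label (0 = cat, 1 = dog)
X = [[4, 6], [5, 7], [20, 12], [25, 14], [3, 5], [30, 15]]
y = [0, 0, 1, 1, 0, 1]

model = DecisionTreeClassifier()
model.fit(X, y)                      # the algorithm builds the logic from the data

print(model.predict([[22, 13]]))     # [1] -> predicted to be a dog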
-
Features of Machine Learning:
Machine learning uses data to detect various patterns in a given dataset.
It can learn from past data and improve automatically.
It is a data-driven technology.
Machine learning is much similar to data mining as it also deals with the
huge amount of the data.
-
Fig:- Block diagram of decision flow architecture for ML System
-
⦿ 1. Data Acquisition
As machine learning is based on the data available to the system for making decisions, the first step defined in the architecture is data acquisition. This involves collecting the data, preparing and segregating the case scenarios based on the features involved in the decision-making cycle, and forwarding the data to the processing unit for further categorization. This stage is sometimes called the data preprocessing stage. The data model expects reliable, fast and elastic data, which may be discrete or continuous in nature. The data is then passed into stream processing systems (for continuous data) and stored in batch data warehouses (for discrete data) before being passed on to the data modeling or processing stages.
⦿ 2. Data Processing
The received data in the data acquisition layer is then sent forward to the data
processing layer where it is subjected to advanced integration and processing and
involves normalization of the data, data cleaning, transformation, and encoding.
The data processing is also dependent on the type of learning being used. For example, if supervised learning is being used, the data needs to be segregated into the sample data required for training the system; the data thus created is called training sample data or simply training data.
-
⦿ 3. Data Modeling
This layer of the architecture involves the selection of the different algorithms that might adapt the system to address the problem for which the learning is being devised. These algorithms are either developed or inherited from a set of libraries. The algorithms are used to model the data accordingly; this makes the system ready for the execution step.
⦿ 4. Execution
This stage in machine learning is where the experimentation is done, testing is involved and tunings are performed. The general goal is to optimize the algorithm in order to extract the required machine outcome and maximize the system performance. The output of this step is a refined solution capable of providing the required data for the machine to make decisions.
⦿ 5. Deployment
Like any other software output, ML outputs need to be operationalized or forwarded for further exploratory processing. The output can be considered as a non-deterministic query which needs to be further deployed into the decision-making system. It is advised to seamlessly move the ML output directly to production, where it will enable the machine to directly make decisions based on the output and reduce the dependency on further exploratory steps.
-
Machine Learning Applications in Healthcare :-
Doctors and medical practitioners will soon be able to predict with
accuracy on how long patients with fatal diseases will live. Medical
systems will learn from data and help patients save money by skipping
unnecessary tests.
i) Drug Discovery/Manufacturing
ii) Personalized Treatment/Medication
-
Machine Learning Applications in Retail :-
Machine learning in retail is more than just the latest trend; retailers are implementing big data technologies like Hadoop and Spark to build big data solutions. Machine learning algorithms process this data intelligently and automate the analysis to make this ambitious goal possible for retail giants like Amazon, Alibaba and Walmart.
i) Machine Learning Examples in Retail for Product
Recommendations
ii)Machine Learning Examples in Retail for Improved Customer
Service.
-
Machine Learning Applications in Media :-
Machine learning offers the most efficient means of engaging billions of
social media users. From personalizing news feed to rendering targeted
ads, machine learning is the heart of all social media platforms for their
own and user benefits. Social media and chat applications have advanced to such an extent that users do not pick up the phone or use email to communicate with brands – they leave a comment on Facebook or Instagram expecting a speedier reply than through traditional channels.
⦿ Earlier, Facebook used to prompt users to tag their friends, but nowadays the social network's artificial neural network machine learning algorithm identifies familiar faces from the contact list. The ANN (Artificial Neural Network) algorithm mimics the structure of the human brain to power facial recognition.
⦿ The professional network LinkedIn knows where you should apply for your next job, whom you should connect with, and how your skills stack up against your peers as you search for a new job.
-
Let’s understand the type of data available in the datasets from the
perspective of machine learning.
1. Numerical Data :-
Any data points which are numbers are termed numerical data. Numerical data can be discrete or continuous. Continuous data can take any value within a given range, while discrete data takes distinct values. For example, the number of doors of a car will be discrete, i.e. two, four, six, etc., while the price of the car will be continuous, e.g. $1000 or $1250.50. The data type of numerical data is int64 or float64.
2. Categorical Data :-
Categorical data are used to represent characteristics, for example car colour, date of manufacture, etc. It can also be a numerical value, provided the numerical value indicates a class. For example, 1 can be used to denote a gas car and 0 a diesel car. We can use categorical data to form groups but cannot perform any mathematical operations on them. Its data type is object.
-
3. Time Series Data :-
It is a collection of a sequence of numbers collected at regular intervals over a certain period of time. It is very important in fields like the stock market, where we need the price of a stock after a constant interval of time. This type of data has a temporal field attached to it so that the timestamp of the data can be easily monitored.
4. Text Data :-
Text data is nothing but literals. The first step in handling text data is to convert it into numbers, as our model is mathematical and needs data in the form of numbers. To do so we might use techniques such as the bag-of-words formulation (a small sketch follows).
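A small bag-of-words sketch with scikit-learn (the sentences are made up for illustration):
from sklearn.feature_extraction.text import CountVectorizer

texts = ["the cat sat on the mat", "the dog sat on the log"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(texts)        # each sentence becomes a vector of word counts
print(vectorizer.get_feature_names_out())       # the vocabulary learned from the text
print(counts.toarray())                         # the numeric representation fed to the model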
-
⦿ ML Dataset :-
-
⦿ Types of datasets :-
1. Training Dataset: This dataset is used to train the model, i.e. these data are used to update the weights of the model.
2. Validation Dataset: This dataset is used to evaluate the model during training and to tune its hyperparameters.
3. Test Dataset: Most of the time, when we try to make changes to the model based upon the output of the validation set, we unintentionally make the model peek into our validation set and, as a result, our model might get overfit on the validation set as well. To overcome this issue we have a test dataset that is only used to test the final output of the model in order to confirm its accuracy.
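A quick sketch of carving a dataset into these parts with scikit-learn (the toy arrays and the 60/20/20 proportions are only an example):
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # toy feature matrix
y = np.arange(10)                  # toy labels

# first hold out a test set, then split the remainder into training and validation sets
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
# 0.25 of the remaining 80% gives roughly a 60/20/20 train/validation/test split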
-
⦿ Machine learning life cycle is a cyclic process to build an efficient
machine learning project. The main purpose of the life cycle is to find a
solution to the problem or project.
⦿ Machine learning life cycle involves seven major steps, which are given
below:
⦿ Gathering Data
⦿ Data preparation
⦿ Data Wrangling
⦿ Analyze Data
⦿ Train the model
⦿ Test the model
⦿ Deployment
-
-
⦿ In the complete life cycle process, to solve a problem, we create a
machine learning system called "model", and this model is created by
providing "training". But to train a model, we need data, hence, life cycle
starts by collecting data.
⦿ The most important thing in the complete process is to understand the
problem and to know the purpose of the problem.
1. Gathering Data:
⦿ Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify and obtain all the data relevant to the problem.
⦿ In this step, we need to identify the different data sources, as data can be collected from various sources such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle. The quantity and quality of the collected data will determine the efficiency of the output: the more data we have, the more accurate the prediction will be.
-
⦿ This step includes the below tasks:
Identify various data sources
Collect data
Integrate the data obtained from different sources
By performing the above tasks, we get a coherent set of data, also called a dataset. It will be used in further steps.
2. Data preparation :
⦿ After collecting the data, we need to prepare it for further steps. Data preparation is
a step where we put our data into a suitable place and prepare it to use in our
machine learning training.
⦿ In this step, first, we put all data together, and then randomize the ordering of data.
⦿ This step can be further divided into two processes:
⦿ Data exploration:
It is used to understand the nature of data that we have to work with. We need to
understand the characteristics, format, and quality of data.
A better understanding of data leads to an effective outcome. In this, we find
Correlations, general trends, and outliers.
⦿ Data pre-processing:
Now the next step is preprocessing of data for its analysis.
-
3. Data Wrangling :
⦿ Data wrangling is the process of cleaning and converting raw data into a usable format. It is the process of cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. It is one of the most important steps of the complete process. Cleaning of data is required to address the quality issues.
⦿ It is not necessary that data we have collected is always of our use as some of the
data may not be useful. In real-world applications, collected data may have various
issues, including:
⦿ Missing Values
⦿ Duplicate data
⦿ Invalid data
⦿ Noise
-
4. Data Analysis :
⦿ Now the cleaned and prepared data is passed on to the analysis step. This
step involves:
⦿ Selection of analytical techniques
⦿ Building models
⦿ Review the result
⦿ The aim of this step is to build a machine learning model to analyze the
data using various analytical techniques and review the outcome. It starts
with the determination of the type of the problems, where we select the
machine learning techniques such as Classification, Regression, Cluster
analysis, Association, etc. then build the model using prepared data, and
evaluate the model.
⦿ Hence, in this step, we take the data and use machine learning algorithms
to build the model.
-
5. Train Model :
⦿ Now the next step is to train the model. In this step we train our model to improve its performance for a better outcome of the problem.
⦿ We use datasets to train the model using various machine learning algorithms. Training a model is required so that it can understand the various patterns, rules, and features.
6. Test Model :
⦿ Once our machine learning model has been trained on a given dataset,
then we test the model. In this step, we check for the accuracy of our model
by providing a test dataset to it.
⦿ Testing the model determines the percentage accuracy of the model as per
the requirement of project or problem.
-
7. Deployment :
⦿ The last step of machine learning life cycle is deployment, where we
deploy the model in the real-world system.
⦿ If the above-prepared model is producing an accurate result as per our
requirement with acceptable speed, then we deploy the model in the real
system. But before deploying the project, we will check whether it is
improving its performance using available data or not. The deployment
phase is similar to making the final report for a project.
-
⦿ Pre-processing refers to the changes applied to our data before feeding it
to the ML algorithm. Data pre-processing is a technique that is used to
convert the created or collected (raw) data into a clean data set. In other
words, whenever the data is gathered from different sources it is collected
in raw format which is not feasible for the analysis or processing by ML
model. Following figure shows transformation processing performed on
raw data before, during and after applying ML techniques:
-
⦿ Data Pre-processing in Machine Learning can be broadly divided into 3
main parts –
1. Data Integration
2. Data Cleaning
3. Data Transformation
-
-
1. Data Integration and formatting :-
During hackathons and competitions, we often deal with a single CSV or Excel file containing all the training data. But in the real world, the source of data might not be this simple. In real life, we might have to extract data from various sources and integrate it.
2. Data Cleaning :-
2.1 Dealing with Missing data :-
It is common to have some missing or null data in a real-world data set. Most machine learning algorithms will not work with such data, so it becomes important to deal with missing or null values. Some common measures taken are:
⦿ Get rid of a column if it has plenty of rows with null values.
⦿ Eliminate a row if it has plenty of columns with null values.
⦿ Replace the missing value with the mean, median or mode of that column, depending on the data distribution in that column.
-
⦿ In the case of a categorical feature column, by substituting the missing values with 'NA' or 'Unknown' or some other relevant term, we can treat missing data as a new category in itself.
⦿ Another method is to come up with educated guesses of possible candidates, replacing the missing value by applying regression or classification techniques (a small pandas sketch of the simpler measures is given below).
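A hedged pandas sketch of the simpler measures above (the column names and values are hypothetical):
import numpy as np
import pandas as pd

df = pd.DataFrame({'age':  [25, np.nan, 30, 28],
                   'city': ['Pune', 'Delhi', None, 'Mumbai']})

df = df.dropna(how='all')                        # drop rows in which every value is null
df['age'] = df['age'].fillna(df['age'].mean())   # replace missing numeric values with the column mean
df['city'] = df['city'].fillna('Unknown')        # treat missing categories as a new 'Unknown' category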
-
⦿ Binning Method:
First sort the data and partition it into bins.
Then one can smooth by bin means, bin medians or bin boundaries.
For example:
• Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
• Partition into (equi-depth) bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
• Smoothing by bin means:
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
• Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
⦿ Regression Method: the data can also be smoothed by fitting it to a regression function (e.g. linear regression) and using the fitted values in place of the noisy ones.
-
2.3 Remove Outliers from Data :-
⦿ Outliers are those observations that have extreme values, much beyond the normal range of values for that feature. For example, the very high salary of the CEO of a company can be an outlier if we consider the salaries of the other regular employees of the company.
⦿ Even few outliers in data set can contribute to poor accuracy of machine
learning model. The common methods to detect outliers and remove
them are –
⦿ Standard Deviation
⦿ Box Plot
-
⦿ Standard Deviation
In statistics, if a data distribution is approximately normal then about 68%
of the data values lie within one standard deviation of the mean and about
95% are within two standard deviations, and about 99.7% lie within three
standard deviations.
-
⦿ Therefore, if you have any data point that is more than 3 standard deviations away from the mean, then that point is very likely to be anomalous or an outlier.
⦿ Box Plots:
⦿ Box plots are a graphical depiction of numerical data through their quartiles. It is a very simple but effective way to visualize outliers. Think of the lower and upper whiskers as the boundaries of the data distribution. Any data points that show up above or below the whiskers can be considered outliers or anomalous.
⦿ The concept of the Interquartile Range (IQR) is used to build the box plot graphs. IQR is a concept in statistics that is used to measure the statistical dispersion and data variability by dividing the dataset into quartiles.
⦿ In simple words, any dataset or any set of observations is divided into four defined intervals based upon the values of the data and how they compare to the entire dataset. The quartiles are the three points that divide the data into four intervals.
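A brief sketch of both detection rules on a made-up salary column with one CEO-style value:
import numpy as np

rng = np.random.default_rng(0)
salaries = np.append(rng.normal(loc=33, scale=2, size=200), 300)   # 200 ordinary salaries plus one extreme value

# 1. Standard deviation rule: flag points more than 3 standard deviations from the mean
mean, std = salaries.mean(), salaries.std()
sd_outliers = salaries[np.abs(salaries - mean) > 3 * std]

# 2. IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(salaries, [25, 75])
iqr = q3 - q1
iqr_outliers = salaries[(salaries < q1 - 1.5 * iqr) | (salaries > q3 + 1.5 * iqr)]

print(sd_outliers, iqr_outliers)    # the extreme value of 300 is flagged by both rules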
-
-
2.4 Dealing with Duplicate Data :-
The approach to dealing with duplicate data depends on whether the duplicate data represents a real-world scenario or is more of an inconsistency. If it is the former, the duplicate data should be conserved; otherwise it should be removed (see the pandas sketch below).
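When the duplicates are genuine inconsistencies, pandas can drop them in one line (the toy frame is illustrative):
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 2, 3],
                   'city': ['Pune', 'Delhi', 'Delhi', 'Mumbai']})
df = df.drop_duplicates()     # keeps the first occurrence and drops exact duplicate rows
print(df)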
3.Data Transformation :-
-
3.1 Feature Scaling :-
Huge differences in the ranges of features in a data set can distort the training of a machine learning model. So, we need to bring the ranges of all the features to a common scale. The common approaches to feature scaling are –
⦿ Mean Normalization
⦿ Min-Max Normalization
⦿ Z-Score Normalization or Standardization,
We will give a brief overview with examples of these approaches in the last section of this unit; a quick NumPy preview follows.
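As a preview, the three rescalings can be written directly in NumPy (x is a toy feature column):
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])          # toy feature column

mean_norm = (x - x.mean()) / (x.max() - x.min())      # Mean Normalization
min_max   = (x - x.min()) / (x.max() - x.min())       # Min-Max Normalization (values land in [0, 1])
z_score   = (x - x.mean()) / x.std()                  # Z-Score Normalization / Standardization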
3.2 Dealing with categorical data :-
⦿ Categorical data, also known as qualitative data, are text or string-based data. Examples of categorical data are the gender of persons (Male or Female), names of places (India, America, England), and the colour of a car (Red, White).
⦿ Most machine learning algorithms work on numerical data only and will not be able to process categorical data. So, we need to transform categorical data into numerical form without losing the sense of the information. Below are the popular approaches to convert categorical data into numerical form –
⦿ Label Encoding
⦿ One Hot Encoding
⦿ Binary Encoding
-
3.3 Dealing with Imbalanced Data Set :-
⦿ An imbalanced data set is a type of data set in which most of the data belongs to only one class and very little data belongs to the other class. This is common in medical diagnosis and anomaly detection, where the data belonging to the positive class is a very small percentage.
⦿ For e.g. only 5-10% of data might belong to a disease positive class which
can be an expected distribution in medical diagnosis. But this skewed
data distribution can trick the machine learning model in training phase
to only identify the majority classes and it fails to learn the minority
classes. For example, the model might fail to identify the medical
condition even though it might be showing a very high accuracy by
identifying negative scenarios.
⦿ We need to do something about an imbalanced data set to avoid a bad machine learning model. Below are some approaches to deal with such a situation (a small resampling sketch follows the list) –
⦿ Under Sampling Majority Class
⦿ Over Sampling Minority Class
⦿ SMOTE (Synthetic Minority Oversampling Technique)
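A rough sketch of random under- and over-sampling using sklearn.utils.resample; SMOTE itself lives in the separate imbalanced-learn package and is not shown here (the toy frame is hypothetical):
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({'feature': range(20),
                   'label':   [0] * 18 + [1] * 2})    # 18 negatives, 2 positives: imbalanced

majority = df[df.label == 0]
minority = df[df.label == 1]

# Under-sample the majority class down to the minority size
under = resample(majority, replace=False, n_samples=len(minority), random_state=0)
balanced_under = pd.concat([under, minority])

# Over-sample the minority class up to the majority size
over = resample(minority, replace=True, n_samples=len(majority), random_state=0)
balanced_over = pd.concat([majority, over])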
-
3.4 Feature Engineering :-
⦿ Feature engineering is the art of creating new features from the given data by applying some domain knowledge, some common sense, or both.
⦿ A very common example of feature engineering is converting a Date feature into additional features like Day, Week, Month and Year, thus adding more information to the data set (a small pandas sketch follows).
⦿ Feature engineering enriches the data set with more information that can help the model learn better.
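The Date example in pandas might look like this (the dates are made up):
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2021-01-15', '2021-06-30'])})

df['day']   = df['date'].dt.day                    # new Day feature
df['week']  = df['date'].dt.isocalendar().week     # new Week feature
df['month'] = df['date'].dt.month                  # new Month feature
df['year']  = df['date'].dt.year                   # new Year feature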
-
⦿ Execution of Data Pre-processing methods using Python
commonly involves
following steps:
Importing the libraries
Importing the Dataset
Handling of Missing Data
Handling of Categorical Data
Splitting the dataset into training and testing datasets
Feature Scaling
For this Data Pre-processing script, we are going to use Anaconda Navigator and specifically Spyder (IDE) to write the following code.
-
Importing the libraries :-
import numpy as np                                               # used for handling numbers
import pandas as pd                                              # used for handling the dataset
from sklearn.impute import SimpleImputer                         # used for handling missing data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder    # used for encoding categorical data
from sklearn.model_selection import train_test_split             # used for splitting data into training and testing sets
from sklearn.preprocessing import StandardScaler                 # used for feature scaling
from sklearn.compose import ColumnTransformer                    # used to apply transformers to columns of an array
-
Importing the Dataset :-
⦿ First of all, let us have a look at the dataset we are going to use for this particular example. You can download or take this dataset from:
https://github.com/tarunlnmiit/machine_learning/blob/master/DataPreprocessing.csv
⦿ It is as shown below:
-
⦿ By pressing the raw button in the link, copy this dataset and store it in a Data.csv file, in the folder where your program is stored.
⦿ In order to import this dataset into our script, we are going to use pandas as follows.
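The import snippet itself is not reproduced in these notes; a typical version, where the column positions are an assumption based on the dataset described above, would be:
dataset = pd.read_csv('Data.csv')     # the file saved from the raw GitHub page
X = dataset.iloc[:, :-1].values       # independent variables (all columns except the last)
Y = dataset.iloc[:, -1].values        # dependent variable (the last column)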
⦿ When you run this code section along with the libraries, you should not see any errors. When it executes successfully, you can move to the variable explorer in the Spyder UI and you will see the following three variables.
-
-
⦿ When you double click on each of these variables, you
should see something similar.
-
Handling of Missing Data :-
⦿ The first idea is to remove the lines of the observations where there is some missing data. But that can be quite dangerous, because imagine this data set contains key information; it would be risky to remove such observations. So, we need to figure out a better way to handle this problem, and the most common idea for handling missing data is to take the mean of the column, as discussed in the earlier section.
⦿ If you noticed, in our dataset we have two values missing: one in the Age column at the 6th data index and one in the Income column in the 4th data row. Missing values should be handled during the data analysis. So, we do that as follows.
-
# handling the missing data: replace missing values (np.nan) with the mean of the other values in the column
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:, 1:])
X[:, 1:] = imputer.transform(X[:, 1:])
-
Here you can see, that the missing values have been replaced by
the average
values of the respective columns.
-
Handling of Categorical Data :-
In this dataset we can see that we have two categorical columns: Region and Online Shopper. These need to be encoded into numbers, which is done as follows.
-
# encode categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
rg = ColumnTransformer([("Region", OneHotEncoder(), [0])], remainder='passthrough')
X = rg.fit_transform(X)
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)
-
-
⦿ Here, you can see that the Region variable is now made up of a 3
bit binary variable. The left most bit represents India, 2nd
bit represents Brazil and the last bit represents USA. If
the bit is 1 then it represents data for that country otherwise
not.
For Online Shopper variable, 1 represents Yes and 0 represents
No.
-
Splitting the dataset into training and testing datasets :-
⦿ Any machine learning algorithm needs to be tested for accuracy. In order to do that, we divide our data set into two parts: a training set and a testing set. As the name itself suggests, we use the training set to make the algorithm learn the behaviours present in the data and check the correctness of the algorithm by testing on the testing set. In Python, we do that as follows:
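The splitting call itself does not appear in these notes; a typical call would be the following, after which the features are scaled:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
# test_size=0.2 keeps 20% of the rows aside for testing (an illustrative choice)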
# feature scaling
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
⦿ After the execution of this code, our training independent variables X_train and our testing independent variables X_test look like this.
-
⦿ In the older days, people used to perform Machine Learning tasks by
manually coding all the algorithms and mathematical and statistical
formula. This made the process time consuming, tedious and inefficient.
But in the modern days, it has become much easier and more efficient thanks to various Python libraries, frameworks, and modules. Today, Python is one of the most popular programming languages for this task and it has replaced many languages in the industry; one of the reasons is its vast collection of libraries. Python libraries that are used in Machine Learning are:
⦿ Numpy
⦿ Scipy
⦿ Scikit-learn
⦿ Theano
⦿ TensorFlow
⦿ Keras
⦿ PyTorch
⦿ Pandas
⦿ Matplotlib
-
⦿ The single most important reason for the popularity of Python in the field
of AI and ML is the fact that Python provides 1000s of inbuilt libraries that
have in-built functions and methods to easily carry out data analysis,
processing, wrangling, modelling and so on.
In the below section, we’ll discuss the libraries for the following tasks:
1. Statistical Analysis
2. Data Visualization
3. Data Modelling and Machine Learning
4. Deep Learning
5. Natural Language Processing (NLP)
-
1.Statistical Analysis:
Python comes with tons of libraries for the sole purpose of statistical
analysis. Top statistical packages that provide in-built functions to perform
the most complex statistical computations are:
-
Pandas :- Pandas is another important statistical library mainly used in a
wide range of fields including, statistics, finance, economics, data analysis
and so on. The library relies on the NumPy array for the purpose of
processing pandas data objects. NumPy, Pandas, and SciPy are heavily
dependent on each other for performing scientific computations, data
manipulation and so on. Pandas is one of the best libraries for processing
huge chunks of data, whereas NumPy has excellent support for multi-
dimensional arrays and Scipy, on the other hand, provides a set of sub-
packages that perform a majority of the statistical analysis tasks.
StatsModels :- Built on top of NumPy and SciPy, the StatsModels Python
package is the best for creating statistical models, data handling and
model evaluation. Along with using NumPy arrays and scientific models
from SciPy library, it also integrates with Pandas for effective data
handling. This library is famously known for statistical computations,
statistical testing, and data exploration.
-
2. Data Visualization:
A picture speaks more than a thousand words. Data visualization is all
about expressing the key insights from data effectively through graphical
representations. It includes the implementation of graphs, charts, mind
maps, heat-maps, histograms, density plots, etc, to study the correlations
between various data variables.
Best Python data visualization packages that provide in-built functions to
study the dependencies between various data features are:
Matplotlib :-Matplotlib is the most basic data visualization package in
Python. It provides support for a wide variety of graphs such as
histograms, bar charts, power spectra, error charts, and so on. It is a 2
Dimensional graphical library that produces clear and concise graphs
that are essential for Exploratory Data Analysis (EDA).
Seaborn :-The Matplotlib library forms the base of the Seaborn library. In
comparison to Matplotlib, Seaborn can be used to create more appealing
and descriptive statistical graphs. Along with extensive supports for data
visualization, Seaborn also comes with an inbuilt data set oriented API for
studying the relationships between multiple variables.
-
Plotly :- Plotly is one of the most well-known graphical Python libraries. It provides interactive graphs for understanding the dependencies between target and predictor variables. It can be used to analyze and visualize statistical, financial, commerce and scientific data to produce clear and concise graphs, sub-plots, heatmaps, 3D charts and so on.
-
3. Machine Learning :
Implementing ML, DL, etc. involves coding 1000s of lines of code and this
can become more cumbersome when you want to create models that
solve complex problems through neural networks. But thankfully we don’t
have to code any algorithms because Python comes with several
packages just for the purpose of implementing machine learning
techniques and algorithms.
Top ML packages that provide in-built functions to implement all the ML
algorithms:
-
XGBoost :-XGBoost which stands for Extreme Gradient Boosting is one of
the best Python packages for performing Boosting Machine Learning.
Libraries such as LightGBM and CatBoost are also equally equipped with
well-defined functions and methods. This library is built mainly for the
purpose of implementing gradient boosting machines which are used to
improve the performance and accuracy of Machine Learning Models.
-
4.Deep Learning :
The biggest advancements in ML and AI have been through deep learning. With the introduction of deep learning, it is now possible to build complex models and process humongous data sets. Thankfully, Python
provides the best deep learning packages that help in building effective
neural networks.
Top deep learning packages that provide in-built functions to implement
convoluted Neural Networks are:
-
Pytorch :-Pytorch is an open-source, Python-based scientific computing
package that is used to implement Deep Learning techniques and Neural
Networks on large datasets. This library is actively used by Facebook to
develop neural networks that help in various tasks such as face
recognition and auto-tagging.
-
5.Natural Language Processing:
Have you ever wondered how Google so aptly predicts what you're searching for? The technology behind Alexa, Siri, and other chatbots is
Natural Language Processing. NLP has played a huge role in designing AI-
based systems that help in describing the interaction between human
language and computers.
Top Natural Language Processing packages that provide in-built functions
to implement high-level AI-based systems are:
-
spaCy:- spaCy is a free, open-source Python library for implementing
advanced Natural Language Processing (NLP) techniques. When you’re
working with a lot of text it is important that you understand the
morphological meaning of the text and how it can be classified to
understand human language. These tasks can be easily achieved through spaCy.
-
Thanks !!!
-
-
Types of ML :-
⦿ There are four types of machine learning:
1. Supervised Learning:
⦿ Supervised Learning is the one where you can consider the learning to be guided by a teacher. We have a dataset which acts as a teacher and its role is to train the model or the machine. Once the model gets trained, it can start making a prediction or decision when new data is given to it.
⦿ Supervised learning uses labelled training data to learn the mapping
function that turns input variables (X) into the output variable (Y). In
other words, it solves for f in the following equation:
Y = f (X)
⦿ This allows us to accurately generate outputs when given new inputs.
-
⦿ Two types of supervised learning are: classification and regression.
-
⦿ Thus, in supervised Machine Learning,
⦿ "The outcome or output for the given input is known beforehand", and the machine must be able to map or assign the given input to the output. For example, multiple labelled images of a cat, dog, orange, apple etc. are fed into the machine for training, and the machine must identify them. Just like a human child who is shown a cat and told so: when it later sees a completely different cat among others, it still identifies it as a cat. The same method is employed here. In short, Supervised Learning means – Train Me!
-
2.Unsupervised Learning:
⦿ Unsupervised learning models are used when we only have the input
variables (X) and no corresponding output variables.
⦿ They use unlabelled training data to model the underlying structure of the data. Input data is given and the model is run on it. The images or inputs given are mixed together, and insights about the inputs can be found.
⦿ The model learns through observation and finds structures in the data. Once the model is given a dataset, it automatically finds patterns and relationships in the dataset by creating clusters in it.
⦿ What it cannot do is add labels to the clusters; it cannot say this is a group of apples or mangoes, but it will separate all the apples from the mangoes.
-
⦿ Two types of unsupervised learning are: Association and Clustering (a small clustering sketch follows).
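A tiny clustering sketch with scikit-learn (the points are made up); the model groups the data without ever being told what the groups mean:
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],        # one bunch of points
              [10, 2], [10, 4], [10, 0]])    # another bunch of points

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                        # a cluster index per point, e.g. [1 1 1 0 0 0] (the labels themselves are arbitrary)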
-
Fig: grouping of similar data
-
3. Semi-supervised Learning:
⦿ Semi-supervised learning sits between supervised and unsupervised learning: it uses a small amount of labelled data together with a large amount of unlabelled data during training.
4. Reinforcement Learning:
-
⦿ It is the ability of an agent to interact with the environment and find out what the best outcome is. It follows the concept of the hit and trial method. The agent is rewarded or penalized with a point for a correct or a wrong answer, and on the basis of the positive reward points gained, the model trains itself.
-
Fig : Types of Machine
Learning
-
1. Overfitting :Over fitting refers to a model that models the training data
too well.
⦿ Over fitting happens when a model learns the detail and noise in the
training data to the extent that it negatively impacts the performance of
the model on new data. This means that the noise or random fluctuations
in the training data is picked up and learned as concepts by the model.
The problem is that these concepts do not apply to new data and negatively impact the model's ability to generalize.
⦿ Over fitting is more likely with nonparametric and nonlinear models that
have more flexibility when learning a target function. As such, many
nonparametric machine learning algorithms also include parameters or
techniques to limit and constrain how much detail the model learns.
2. Underfitting : Under fitting refers to a model that can neither model the
training data nor generalize to new data.
⦿ An under fit machine learning model is not a suitable model and will be
obvious as it will have poor performance on the training data.
⦿ Under fitting is often not discussed as it is easy to detect given a good
performance metric. The remedy is to move on and try alternate machine
learning algorithms. Nevertheless, it does provide a good contrast to the
problem of over fitting.
-
⦿ Bias: It tells us how close our model's average predictions are to the training data. Algorithms with high bias learn fast and are easy to understand, but are less flexible: they lose the ability to model complex problems, which results in underfitting of our model.
⦿ Getting more training data will not help much.
⦿ “Signal” as the true underlying pattern that you wish to learn from the
data.
⦿ “Noise” on the other hand, refers to the irrelevant information or
randomness in a dataset.
-
⦿ Overfitting and Underfitting are the two main problems that occur in
machine learning and degrade the performance of the machine learning
models.
-
Over fitting :
⦿ Overfitting occurs when our machine learning model tries to cover all the
data points or more than the required data points present in the given
dataset. Because of this, the model starts caching noise and inaccurate
values present in the dataset, and all these factors reduce the efficiency
and accuracy of the model. The overfitted model has low bias and high
variance.
⦿ The chances of overfitting increase the more we train our model: the more training we provide, the higher the chance of obtaining an overfitted model.
⦿ Overfitting is the main problem that occurs in supervised learning.
⦿ Example: The concept of the overfitting can be understood by the below
graph of the linear regression output:
-
-
⦿ In the above graph, the model tries to cover all the data points present in the scatter plot. It may look efficient, but in reality it is not. Because the goal of the regression model is to find the best-fit line, and here we have not got a true best fit, it will generate prediction errors.
⦿ How to avoid the Overfitting in Model :
⦿ Both overfitting and underfitting cause the degraded performance of the
machine learning model. But the main cause is overfitting, so there are
some ways by which we can reduce the occurrence of overfitting in our
model.
⦿ Cross-Validation
⦿ Training with more data
⦿ Removing features
⦿ Early stopping the training
⦿ Regularization
⦿ Ensembling
-
Underfitting :
⦿ Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data. To avoid overfitting in the model, the feeding of training data can be stopped at an early stage, due to which the model may not learn enough from the training data. As a result, it may fail to find the best fit of the dominant trend in the data.
⦿ In the case of underfitting, the model is not able to learn enough from the
training data, and hence it reduces the accuracy and produces
unreliable predictions.
⦿ An underfitted model has high bias and low variance.
⦿ Example: We can understand the underfitting using below output of the
linear regression model:
-
-
⦿ In above graph, the model is unable to capture the data points present in
the plot.
-
Goodness of Fit :
⦿ The "Goodness of fit" term is taken from the statistics, and the goal of the
machine learning models to achieve the goodness of fit. In statistics
modeling, it defines how closely the result or predicted values match the
true values of the dataset.
⦿ The model with a good fit is between the underfitted and overfitted
model, and ideally, it makes predictions with 0 errors, but in practice, it is
difficult to achieve it.
⦿ There are two other methods by which we can get a good point for our
model, which are the resampling method to estimate model accuracy
and validation dataset.
-
What is Regression :
Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables.
It helps to understand how the value of the dependent variable changes corresponding to an independent variable when the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.
Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict the continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time series modeling, and determining the causal-effect relationship between variables.
-
In Regression, we plot a graph between the variables which best fits the
given data points, using this plot, the machine learning model can make
predictions about the data. In simple words, "Regression shows a line or
curve that passes through all the data points on target-predictor graph
in such a way that the vertical distance between the data points and the
regression line is minimum." The distance between data points and line
tells whether a model has captured a strong relationship or not.
-
⦿ Terminologies Related to the Regression :
-
⦿ Types of Regression :
There are various types of regressions which are used in data science and
machine learning.
⦿ Linear Regression
⦿ Logistic Regression
⦿ Polynomial Regression
⦿ Support Vector Regression
⦿ Decision Tree Regression
⦿ Random Forest Regression
⦿ Ridge Regression
⦿ Lasso Regression:
-
-
Linear Regression:
⦿ Linear regression is a statistical regression method which is used for
predictive analysis.
⦿ It is one of the very simple and easy algorithms which works on
regression and shows the relationship between the continuous variables.
⦿ It is used for solving the regression problem in machine learning.
⦿ Linear regression shows the linear relationship between the independent
variable (X-axis) and the dependent variable (Y-axis), hence called
linear regression.
⦿ If there is only one input variable (x), then such linear regression is
called simple linear regression. And if there is more than one input
variable, then such linear regression is called multiple linear
regression.
⦿ The relationship between variables in the linear regression model can be
explained using the below image. Here we are predicting the salary of an
employee on the basis of the year of experience.
-
-
⦿ Some popular applications of linear regression are:
-
⦿ Model the relationship between the two variables. Such as the
relationship between Income and expenditure, experience and Salary,
etc.
⦿ Forecasting new observations. Such as Weather forecasting according
to temperature, Revenue of a company according to the investments in a
year, etc.
⦿ Recall the geometry lesson from high school. What is the equation of a
line?
y = mx + c
-
Where,
m is the slope. It determines the angle of the line. It is the parameter denoted as β.
In machine learning notation, the same line is written as:
y = b0 + b1 * x1
-
Where,
b0 is the constant (intercept),
b1 is the coefficient (slope) of the independent variable x1, and
y is the dependent variable.
-
⦿ Simple Linear Regression in Python :
# importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('salary_data.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
-
# Splitting the dataset into the Training set and Test set
-
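The notes jump from the splitting comment above straight to the visualization; the code that creates X_train, y_train and the fitted regressor used below is roughly as follows (the 1/3 test size and random_state=0 are illustrative choices):
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Splitting the dataset and fitting Simple Linear Regression to the Training set
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)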
# Visualizing the Training set results
viz_train = plt
viz_train.scatter(X_train, y_train, color='red')
viz_train.plot(X_train, regressor.predict(X_train), color='blue')
viz_train.title('Salary VS Experience (Training set)')
viz_train.xlabel('Year of Experience')
viz_train.ylabel('Salary')
viz_train.show()
viz_test = plt
viz_test.scatter(X_test, y_test, color='red')
viz_test.plot(X_train, regressor.predict(X_train), color='blue')
viz_test.title('Salary VS Experience (Test set)')
viz_test.xlabel('Year of Experience')
viz_test.ylabel('Salary')
viz_test.show()
-
⦿ After running the above code (excluding the code-explanation parts), you can see 2 plots in the console window as shown below:
-
-
⦿ One plot is from the training set and another from the test set. The blue lines are in the same direction, so our model is good to use now.
⦿ Now we can use it to calculate (predict) the value of y for any value of X. This can be done by using the predict() function as follows:
Output :
-
In conclusion, with Simple Linear Regression, we have to do 5 steps
as per below:
y_pred = regressor.predict(X_test)
-
Predict y_pred using array of
X_test
-
2. Multiple Linear Regression :
⦿ For Examples:
⦿ The selling price of a house can depend on the desirability of the
location, the number of bedrooms, the number of bathrooms, the year the
house was built, the square footage of the plot and a number of other
factors.
⦿ The height of a child can rest on the height of the mother, the height of the
father, nutrition, and environmental factors.
-
⦿ Multiple linear regression works the same way as that of simple linear
regression, except for the introduction of more independent variables and
their corresponding coefficients.
⦿ In Simple Linear Regression we dealt with the equation:
y = b0 + b1 * x1
In Multiple Linear Regression this extends to:
y = b0 + b1 * x1 + b2 * x2 + b3 * x3 + ........ + bn * xn
Or
y = b0 + Σ (i = 1 to n) bi * xi
-
⦿ In translation, the predicted value y is the sum of all features multiplied by their coefficients, summed with the base coefficient b0.
Where b0 is the constant (intercept) and b1 … bn are the coefficients of the features x1 … xn.
-
Multiple Linear Regression in Python :
# Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Importing the dataset (assumed here to have 4 feature columns, with salary in the 5th column)
dataset = pd.read_csv('salary_data.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set (split parameters are illustrative)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Fitting Multiple Linear Regression to the Training set
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the salary for a new candidate with feature values [5, 2, 1, 2]
x_new = [[5], [2], [1], [2]]
y_pred = regressor.predict(np.array(x_new).reshape(1, 4))
print(y_pred)

# R^2 score of the model on the test set
accuracy = regressor.score(X_test, y_test)
print(accuracy)
-
Output :
You can offer your candidate a salary of ₹48017.20, and this is the best salary for him!
-
3. Polynomial Linear Regression :
-
⦿ For example: the increment of salary of employees per year is often non-linear. We may express it in terms of a polynomial equation as
y = b0 + b1x + b2x² + b3x³ + ...... + bnxⁿ
where,
⦿ b0 is a constant,
⦿ y is the dependent variable,
⦿ the bi coefficients can be thought of as multipliers that connect the independent and dependent variables. They translate how much y will be affected by a degree or power of change in x. In other words, a change in xⁱ does not usually mean an equal change in y.
⦿ x is the independent variable.
-
⦿ Let us consider dataset of this kind of example that represent the
Polynomial shape.
-
⦿ To get an overview of the increment of salary, let’s visualize the data set
into a chart:
-
⦿ Let’s think about our candidate. He has 5.5 Year of experience. What if we
use the Linear Regression in this example?
-
Polynomial Linear Regression in Python :
# Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('position_salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
-
# Splitting the dataset into the Training set and Test set
-
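Here again the notes go straight to the visualization functions; the step that actually builds poly_reg and pol_reg (both used below) is not shown. A minimal sketch, where the polynomial degree of 4 is only an assumption:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly_reg = PolynomialFeatures(degree=4)    # degree chosen here only as an example
X_poly = poly_reg.fit_transform(X)         # expand X into polynomial terms
pol_reg = LinearRegression()
pol_reg.fit(X_poly, y)                     # fit linear regression on the polynomial terms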
# Visualizing the Polynomial Regression results
def viz_polymonial():
plt.scatter(X, y, color='red')
plt.plot(X, pol_reg.predict(poly_reg.fit_transform(X)), color='blue')
plt.title('Truth or Bluff (Linear Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
return
viz_polymonial()
-
# Additional feature
# Making the plot line (Blue one) more smooth
def viz_polymonial_smooth():
X_grid = np.arange(min(X), max(X), 0.1)
X_grid = X_grid.reshape(len(X_grid), 1)
plt.scatter(X, y, color='red')
plt.plot(X_grid, pol_reg.predict(poly_reg.fit_transform(X_grid)),
color='blue')
plt.title('Truth or Bluff (Linear Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
return
viz_polymonial_smooth()
-
⦿ After calling the viz_polymonial() function, you can see a plot as per below:
-
Last step, let's predict the value for our candidate (with 5.5 years of experience) using the Polynomial Regression model:
print(pol_reg.predict(poly_reg.fit_transform([[5.5]])))
Output:
It's time to let our candidate know that we will offer him a best-in-class salary of ₹132,148!
-
⦿ Decision trees are supervised learning algorithms used for both,
classification and regression.
⦿ Decision trees are assigned to the information-based learning algorithms
which use different measures of information gain for learning. We can use
decision trees for issues where we have continuous but also categorical
input and target features.
⦿ The main idea of decision trees is to find those descriptive features which
contain the most "information" regarding the target feature and then split
the dataset along the values of these features such that the target feature
values for the resulting sub datasets are as pure as possible.
⦿ The descriptive feature which splits the target feature most purely is said to be the most informative one.
⦿ This process of finding the "most informative" feature is done until we
accomplish a stopping criterion, where we then finally end up in so
called leaf nodes.
-
-
⦿ The leaf nodes contain the predictions we will make for new query
instances presented to our trained model.
⦿ This is possible since the model has kind of learned the underlying
structure of the training data and hence can, given some assumptions,
make predictions about the target feature value (class) of unseen query
instances.
⦿ A decision tree mainly contains of a root node, interior nodes, and leaf
nodes which are then connected by branches.
-
⦿ Decision trees are sensitive to the specific data on which they are trained.
If the training data is changed the resulting decision tree can be quite
different and in turn the predictions can be quite different.
⦿ Also, Decision trees are computationally expensive to train, carry a big
risk of overfitting (learning system tightly fits the given training data so
much that it would be inaccurate in predicting the outcomes of the
untrained data. In decision trees, over-fitting occurs when the tree is
designed so as to perfectly fit all samples in the training data set.), and
tend to find local optima because they can’t go back after they have made
a split.
⦿ To solve these weaknesses, we use Random Forest which illustrates the
power of combining many decision trees into one model.
-
-
⦿ Random forest is a Supervised Learning algorithm which uses ensemble
learning method for classification and regression.
⦿ An Ensemble method is a technique that combines the predictions from
multiple machine learning algorithms together to make more accurate
predictions than any individual model. A model comprised of many
models is called an Ensemble model.
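A short illustrative sketch of such an ensemble with scikit-learn (the built-in iris data is used here purely as a stand-in dataset):
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)   # an ensemble of 100 decision trees
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))   # accuracy of the combined model on unseen data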
-
Types of Ensemble Learning:
⦿ Boosting.
⦿ Bootstrap Aggregation (Bagging).
1. Boosting
Boosting refers to a group of algorithms that utilize weighted averages to turn weak learners into stronger learners. Boosting is all about "teamwork": each model that runs dictates which features the next model will focus on. In boosting, as the name suggests, one learner learns from another, which in turn boosts the learning.
-
⦿ The Classification algorithm is a Supervised Learning technique that is
used to identify the category of new observations on the basis of training
data. In Classification, a program learns from the given dataset or
observations and then classifies new observation into a number of classes
or groups. Such as, Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.
Classes can be called as targets/labels or categories.
-
⦿ The best example of an ML classification algorithm is Email Spam
Detector.
⦿ The main goal of the Classification algorithm is to identify the category of
a given dataset, and these algorithms are mainly used to predict the
output for the categorical data.
⦿ Classification algorithms can be better understood using the below
diagram. In the below diagram, there are two classes, class A and Class B.
These classes have features that are similar to each other and dissimilar to
other classes.
-
-
⦿ The algorithm which implements the classification on a dataset is known
as a classifier. There are two types of Classifications:
-
Learners in Classification Problems:
⦿ Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. In the lazy learner case, classification is done on the basis of the most related data stored in the training dataset. It takes less time in training but more time for predictions.
Example: K-NN algorithm, Case-based reasoning.
⦿ Eager Learners: An eager learner builds a classification model from the training dataset before receiving the test dataset. It takes more time in training but less time in prediction.
Example: Decision Trees, Naïve Bayes.
-
Types of ML Classification Algorithms:
Classification Algorithms can be further divided into the Mainly two category:
⦿ Linear Models
• Logistic Regression
• Support Vector Machines
⦿ Non-linear Models
• K-Nearest Neighbors
• Kernel SVM
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification
What is Logistic Regression :
-
Type of Logistic Regression:
-
⦿ Let's first try to understand why logistic, and why not linear?
⦿ Let 'x' be some feature and 'y' be the output, which can be either 0 or 1 (binary classification).
⦿ The probability that the output is 1 given its input x can be represented as P(X) = P(y = 1 | x).
⦿ If we predict this probability via linear regression, we can write it as:
P(X) = b0 + b1 * X
⦿ The problem with this is that the right-hand side can take any value from minus infinity to plus infinity, while a probability must always lie between 0 and 1.
-
⦿ To avoid this problem, the log-odds or logit function is used.
⦿ Logistic regression can therefore be expressed in terms of the logit function as:
log( P(x) / (1 - P(x)) ) = b0 + b1 * X
⦿ where the left-hand side is called the logit or log-odds function, and P(x) / (1 - P(x)) is called the odds.
⦿ The odds signify the ratio of the probability of success [P(x)] to the probability of failure [1 - P(x)]. Therefore, in Logistic Regression, a linear combination of inputs is mapped to the log(odds) of the output being equal to 1.
⦿ If we take the inverse of the above function, we get:
P(x) = e^(b0 + b1*X) / (1 + e^(b0 + b1*X))
-
⦿ In a more simplified form, the above equation becomes
P(x) = 1 / (1 + e^-(b0 + b1*X)), which is the sigmoid (logistic) function.
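A small numeric sketch of this mapping (the coefficient values below are made up purely for illustration):
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # squashes any real number into the range (0, 1)

b0, b1 = -4.0, 1.5                     # illustrative coefficients
for x in [0, 2, 4, 6]:
    print(x, sigmoid(b0 + b1 * x))     # the probability rises smoothly from near 0 towards 1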
-
Support Vector Machine(SVM) :
⦿ SVM is a Supervised Learning algorithm that can be used for Classification as well as Regression problems, but it is mostly used for Classification in Machine Learning.
⦿ The goal of the SVM algorithm is to create the best hyperplane or
decision boundary that can separate n-dimensional space into classes so
that we can easily put the new data point in the correct category in the
future.
⦿ SVM chooses the extreme points/vectors called Support Vectors that help
in creating the hyperplane. Consider the following diagram in which
there are two different categories that are classified using a decision
boundary or hyperplane:
-
-
Hyperplane and Support Vectors in the SVM algorithm:
⦿ Hyperplane: There can be multiple lines/decision boundaries to separate the classes in n-dimensional space, but we need to find out the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM.
The dimensions of the hyperplane depend on the features present in the dataset, which means if there are 2 features (as shown in the image), then the hyperplane will be a straight line. And if there are 3 features, then the hyperplane will be a 2-dimensional plane.
We always create a hyperplane that has a maximum margin, which means
the maximum distance between the data points.
⦿ Support Vectors:
The data points or vectors that are the closest to the hyperplane and
which affect the position of the hyperplane are termed as Support Vector.
Since these vectors support the hyperplane, hence called a Support
vector.
-
⦿ Here suppose there is a strange cat that has some features as that of dogs,
so if we want a model that can accurately identify whether it is a cat or
dog, in such cases we use SVM. We will first train our model with lots of
features of cats and dogs so that it can learn from number of features of
cats and dogs, and then test it with this strange animal. So, as the support vector machine creates a decision boundary between these two classes (cat and dog) and chooses extreme cases (support vectors), it will see the extreme cases of cat and dog. On the basis of these extreme cases (support vectors), it will classify the animal as a cat.
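A compact sketch of this idea using scikit-learn's SVC, with made-up 2-D points standing in for the cat and dog features:
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2],      # class 0 ("cat"-like feature points)
              [8, 8], [9, 8], [8, 9]])     # class 1 ("dog"-like feature points)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear')                 # looks for the maximum-margin hyperplane
clf.fit(X, y)
print(clf.support_vectors_)                # the extreme points that define the boundary
print(clf.predict([[2, 2]]))               # the new "strange animal" is classified as class 0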
-
⦿ K-Nearest Neighbour is one of the simplest Supervised Machine Learning algorithms. It compares a new data point with the K most similar data points (nearest neighbours) in the available dataset and puts it into the category those neighbours are most similar to; hence the name K-NN.
⦿ This algorithm can be used for Regression as well as for Classification,
but mostly it is useful for the classification problems.
⦿ It is a non-parametric algorithm, which means it does not make any assumptions about the underlying data. It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on the dataset.
⦿ In other words, at the training phase K-NN algorithm just stores the
dataset and when it gets new data, then it classifies that data
into a category that is much similar to the new data.
-
⦿ For Example, Suppose, we have a new animal that looks similar to cat and
dog, but we want to know either it is a cat or dog. So, for this identification,
we can use the KNN algorithm, as it works on a similarity principle. Our
KNN model will simply, find the similar features of the new data set into
the cats and dogs’ available dataset and based on the most similar
features it will put it in either category of cat or dog.
-
1. If classification, assign the uncategorized object to the class to which the maximum number of neighbours belong.
or
2. If regression, find the average value of all the closest neighbours and assign it as the value for the unknown object.
⦿ For step 3, the most used distance formula is the Euclidean Distance, which is given as follows:
⦿ By the Euclidean Distance, the distance between two points P1(x1, y1) and P2(x2, y2) can be expressed as:
d(P1, P2) = √((x2 - x1)² + (y2 - y1)²)
-
Why do we need a K-NN Algorithm?
Suppose there are two categories, i.e., Category A and Category B, and
we have a new data point x1, so this data point will lie in which of these
categories. To solve this type of problem, we need a K-NN algorithm. With
the help of K-NN, we can easily identify the category or class of a
particular dataset. Consider the below diagram:
-
How does K-NN work?
The K-NN working can be explained on the basis of the below algorithm:
⦿ Step-1: Select the number K of the neighbors
⦿ Step-2: Calculate the Euclidean distance of K number of neighbors
⦿ Step-3: Take the K nearest neighbors as per the calculated Euclidean
distance.
⦿ Step-4: Among these k neighbors, count the number of the data points in
each category.
⦿ Step-5: Assign the new data points to that category for which the number
of the neighbor is maximum.
⦿ Step-6: Our model is ready.
⦿ Suppose we have a new data point and we need to put it in the required
category. Consider the below image:
-
Firstly, we will choose the number of neighbors, so we will choose the k=5.
Next, we will calculate the Euclidean distance between the data points.
The Euclidean distance is the distance between two points, which we have
already studied in geometry. It can be calculated as:
-
By calculating the Euclidean distance we got the nearest neighbors, as
three nearest neighbors in category A and two nearest neighbors in
category B. Consider the below image:
-
As we can see, the 3 nearest neighbors are from category A; hence this new data point must belong to category A.
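The same walk-through can be written in a few lines with scikit-learn's KNeighborsClassifier and k = 5 (the points below are invented for illustration):
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 2], [2, 3], [3, 3], [2, 1],      # Category A points
              [7, 8], [8, 8], [8, 9], [9, 7]])     # Category B points
y = np.array(['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'])

knn = KNeighborsClassifier(n_neighbors=5)          # Step 1: choose K = 5
knn.fit(X, y)                                      # K-NN simply stores the training data
print(knn.predict([[3, 2]]))                       # Steps 2-5: distances, vote, assign -> ['A']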
-
How to select the value of K in the K-NN Algorithm?
Below are some points to remember while selecting the value of K in the
KNN algorithm:
⦿ There is no particular way to determine the best value for "K", so we need
to try some values to find the best out of them. The most preferred value
for K is 5.
⦿ A very low value for K such as K=1 or K=2, can be noisy and lead to the
effects of outliers in the model.
⦿ Large values for K are good, but it may find some difficulties.
-
Advantages of KNN Algorithm:
⦿ It is simple to implement.
⦿ It is robust to the noisy training data
⦿ It can be more effective if the training data is large.
-
⦿ Decision Tree is a Supervised learning technique that can be used for
both classification and Regression problems, but mostly it is preferred for
solving Classification problems.
⦿ In a Decision tree, there are two nodes, which are the Decision
Node and Leaf Node. Decision nodes are used to make any decision and
have multiple branches, whereas Leaf nodes are the output of those
decisions and do not contain any further branches.
⦿ The decisions or the test are performed on the basis of features of the
given dataset.
-
Decision Tree Terminologies:
Below are some important terms used in a Decision tree:
⦿ Root Node: Root node is from where the decision tree starts. It represents
the entire dataset, which further gets divided into two or more
homogeneous sets.
⦿ Leaf Node: Leaf nodes are the final output node, and the tree cannot be
segregated further after getting a leaf node.
-
⦿ Splitting: Splitting is the process of dividing the decision node/root node
into sub-nodes according to the given conditions.
⦿ Branch/Sub Tree: A tree formed by splitting the tree.
⦿ Pruning: Pruning is the process of removing the unwanted branches from
the tree.
⦿ Parent/Child node: The root node of the tree is called the parent node,
and other nodes are called the child nodes.
In a decision tree, for predicting the class of a given record, the
algorithm starts from the root node of the tree. The algorithm compares
the value of the root attribute with the record's (real dataset) attribute and,
based on the comparison, follows the corresponding branch and jumps to
the next node. For the next node, the algorithm again compares the
attribute value with the other sub-nodes and moves further. It continues
this process until it reaches a leaf node of the tree. The complete process can be better
understood using the below algorithm:
-
⦿ Step-1: Begin the tree with the root node, say S, which contains the
complete dataset.
⦿ Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM).
⦿ Step-3: Divide S into subsets that contain the possible values of the best
attribute.
⦿ Step-4: Generate the decision tree node that contains the best
attribute.
⦿ Step-5: Recursively make new decision trees using the subsets of the
dataset created in Step-3. Continue this process until a stage is reached
where you cannot classify the nodes any further; the final node is called a
leaf node.
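As a rough sketch of these steps, the scikit-learn DecisionTreeClassifier
below builds such a tree on the Iris dataset (an illustrative dataset, not the
one in the slides) and then follows the branches to classify a new record:

# Minimal decision-tree sketch on an illustrative dataset.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# criterion='entropy' uses information gain; criterion='gini' uses the Gini index
# (both Attribute Selection Measures are described in the next slides).
tree = DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=load_iris().feature_names))  # the learned splits
print(tree.predict([[5.1, 3.5, 1.4, 0.2]]))  # follow the branches from root to a leaf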
-
Attribute Selection Measures :-
⦿ While implementing a Decision tree, the main issue is how to
select the best attribute for the root node and for the sub-nodes. To solve
such problems there is a technique called Attribute Selection
Measure, or ASM. By this measurement, we can easily select the best
attribute for the nodes of the tree. There are two popular techniques for
ASM, which are:
1. Information Gain:
⦿ Information gain is the measurement of changes in entropy after the
segmentation of a dataset based on an attribute.
⦿ It calculates how much information a feature provides us about a
class.
⦿ According to the value of information gain, we split the node and build
the decision tree.
⦿ A decision tree algorithm always tries to maximize the value of
information gain, and a node/attribute having the highest information
gain is split first. It can be calculated using the below formula:
Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Entropy: Entropy is a metric to measure the impurity in a given attribute. It
specifies randomness in data. Entropy can be calculated as:
Entropy(S) = −P(yes)·log2(P(yes)) − P(no)·log2(P(no))
Where,
S = the set of samples (total number of samples)
P(yes) = probability of yes
P(no) = probability of no
2. Gini Index:
⦿ Gini index is a measure of impurity or purity used while creating a decision
tree in the CART(Classification and Regression Tree) algorithm.
⦿ An attribute with a low Gini index should be preferred over one with a
high Gini index.
⦿ It only creates binary splits, and the CART algorithm uses the Gini index to
create binary splits.
⦿ Gini index can be calculated using the below formula:
Gini Index = 1 − ∑j (Pj)²
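A small, self-contained sketch of both ASM measures is shown below; the
'yes'/'no' sample counts and the two-way split are hypothetical and only
illustrate the Entropy, Information Gain and Gini Index formulas above:

import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum(p * log2(p)) over the class probabilities.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    # Gini Index = 1 - sum(p_j ** 2) over the class probabilities.
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, subsets):
    # Information Gain = Entropy(S) - weighted average entropy of the subsets.
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted

# Toy split: 9 'yes' / 5 'no' samples divided by some attribute into two subsets.
parent = ['yes'] * 9 + ['no'] * 5
left, right = ['yes'] * 6 + ['no'] * 1, ['yes'] * 3 + ['no'] * 4
print(entropy(parent), gini(parent), information_gain(parent, [left, right]))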
-
Pruning: Getting an Optimal Decision tree
⦿ Pruning is a process of deleting the unnecessary nodes from a tree in order
to get the optimal decision tree.
⦿ A too-large tree increases the risk of overfitting, and a small tree may not
capture all the important features of the dataset. Therefore, a technique
that decreases the size of the learning tree without reducing accuracy is
known as Pruning. There are mainly two types of tree pruning techniques
used: 1. Cost Complexity Pruning and 2. Reduced Error Pruning.
Advantages of the Decision Tree
⦿ It is simple to understand, as it follows the same process which a human
follows while making any decision in real life.
⦿ It can be very useful for solving decision-related problems.
⦿ It helps to think about all the possible outcomes for a problem.
⦿ There is less requirement of data cleaning compared to other
algorithms.
Disadvantages of the Decision Tree
⦿ The decision tree contains lots of layers, which makes it complex.
⦿ It may have an overfitting issue, which can be resolved using the Random
Forest algorithm.
⦿ For more class labels, the computational complexity of the decision tree
may increase.
-
⦿ Random Forest is a popular machine learning algorithm that belongs to
the supervised learning technique. It can be used for both Classification
and Regression problems in ML. It is based on the concept of ensemble
learning, which is a process of combining multiple classifiers to solve a
complex problem and to improve the performance of the model.
⦿ As the name suggests, "Random Forest is a classifier that contains a
number of decision trees on various subsets of the given dataset and
takes the average to improve the predictive accuracy of that
dataset." Instead of relying on one decision tree, the random forest takes
the prediction from each tree and based on the majority votes of
predictions, and it predicts the final output.
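A minimal sketch of this idea with scikit-learn is shown below; the synthetic
dataset and the parameter choice of 100 trees are assumptions made only for
illustration:

# Minimal Random Forest sketch: many decision trees on random subsets, majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# n_estimators = number of decision trees; each is trained on a bootstrap sample.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))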
-
⦿ In machine learning classification problems, there are often too many
factors on the basis of which the final classification is done. These factors
are basically variables called features. The higher the number of features,
the harder it gets to visualize the training set and then work on it.
Sometimes, most of these features are correlated, and hence redundant.
This is where dimensionality reduction algorithms come into play.
Dimensionality reduction is the process of reducing the number of
random variables under consideration, by obtaining a set of principal
variables. It can be divided into feature selection and feature extraction.
-
-
Components of Dimensionality Reduction :
-
⦿ Dimensionality reduction may be either linear or non-linear, depending
upon the method used. The prime linear method is called Principal
Component Analysis, or PCA.
⦿ Principal Component Analysis(PCA)
⦿ This method was introduced by Karl Pearson. It works on a condition that
while the data in a higher dimensional space is mapped to data in a lower
dimension space, the variance of the data in the lower dimensional space
should be maximum.
-
It involves the following steps:
-
Advantages of Dimensionality Reduction :
-
Principal Component Analysis :
⦿ Principal Component Analysis is an unsupervised learning algorithm that
is used for the dimensionality reduction in machine learning. It is a
statistical process that converts the observations of correlated features
into a set of linearly uncorrelated features with the help of orthogonal
transformation. These new transformed features are called the Principal
Components. It is one of the popular tools that is used for exploratory
data analysis and predictive modeling. It is a technique to draw strong
patterns from the given dataset by reducing the variances.
⦿ PCA generally tries to find the lower-dimensional surface to project the
high-dimensional data.
⦿ PCA works by considering the variance of each attribute, because an
attribute with high variance indicates a good split between the classes,
and hence PCA reduces the dimensionality. Some real-world applications
of PCA are image processing, movie recommendation systems, and
optimizing the power allocation in various communication channels. It is
a feature extraction technique, so it keeps the important variables and
drops the least important variables.
-
The PCA algorithm is based on some mathematical concepts such as:
⦿ Variance and Covariance
⦿ Eigenvalues and Eigenvectors
-
Principal Components in PCA :
-
Steps for PCA algorithm :
1. Getting the dataset.
2. Representing the data as a matrix of observations (rows) and features
(columns).
3. Standardizing the data so that each feature has zero mean and unit
variance; the standardized matrix is called Z.
4. Calculating the Covariance of Z
To calculate the covariance of Z, we will take the matrix Z, and will transpose
it. After transpose, we will multiply it by Z. The output matrix will be the
Covariance matrix of Z.
5. Calculating the Eigenvalues and Eigenvectors
Now we need to calculate the eigenvalues and eigenvectors for the resultant
covariance matrix of Z. The eigenvectors of the covariance matrix are the
directions of the axes with the most information (highest variance), and the
corresponding eigenvalues give the amount of variance along those directions.
6. Sorting the Eigenvectors
In this step, we will take all the eigenvalues and sort them in decreasing
order, which means from largest to smallest, and simultaneously sort the
corresponding eigenvectors into a matrix P. The resultant sorted matrix will
be named P*.
7. Calculating the new features, or Principal Components
Here we will calculate the new features. To do this, we will multiply the
matrix Z by P*. In the resultant matrix Z*, each observation is a linear
combination of the original features, and the columns of Z* are
uncorrelated with each other.
-
8. Remove less important features from the new dataset.
The new feature set has been obtained, so we decide here what to keep
and what to remove. That is, we will only keep the relevant or important
features in the new dataset, and the unimportant features will be removed.
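The steps above can be sketched directly with NumPy; the random 100 x 5
dataset and the choice to keep two components are illustrative assumptions:

# Sketch of the PCA steps above using NumPy on an illustrative random dataset.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                    # 100 observations, 5 features

# Standardize the data (matrix Z).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance of Z (step 4), then eigenvalues/eigenvectors (step 5).
cov = Z.T @ Z / (len(Z) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort eigenvectors by decreasing eigenvalue (step 6) -> matrix P*.
order = np.argsort(eigvals)[::-1]
P_star = eigvecs[:, order]

# New features / principal components (step 7), keep the top 2 (step 8).
Z_star = Z @ P_star
X_reduced = Z_star[:, :2]
print(X_reduced.shape)                           # (100, 2)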
-
Evaluation metrics are tied to machine learning tasks. There are different
metrics for the tasks of classification, regression, ranking, clustering, topic
modeling, etc. Some metrics, such as precision-recall, are useful for
multiple tasks. Classification, regression, and ranking are examples of
supervised learning, which constitutes a majority of machine learning
applications.
Model Accuracy:
⦿ Model accuracy in terms of classification models can be defined as the
ratio of correctly classified samples to the total number of samples:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
-
-
⦿ True Positive (TP): an outcome where the model correctly predicts the
positive class.
⦿ True Negative (TN): an outcome where the model correctly predicts the
negative class.
⦿ False Positive (FP): an outcome where the model incorrectly predicts the
positive class.
⦿ False Negative (FN): an outcome where the model incorrectly predicts the
negative class.
-
⦿ Binary Classification Model — Predict whether the patient has cancer
or not.
⦿ Let’s assume we have a training dataset with labels—100 cases, 10 labeled
as ‘Cancer’, 90 labeled as ‘Normal’
⦿ Let’s try calculating the accuracy of this model on the above dataset, given
the following results:
-
⦿ In the above case let’s define the TP, TN, FP, FN:
⦿ TP (Actual Cancer and predicted Cancer) = 1
⦿ TN (Actual Normal and predicted Normal) = 90
⦿ FN (Actual Cancer and predicted Normal) = 8
⦿ FP (Actual Normal and predicted Cancer) = 1
-
⦿ So the accuracy of this model is (1 + 90) / 100 = 91%. But the question
remains: is this model useful, despite being so accurate?
⦿ This highly accurate model may not be useful, as it isn't able to predict
the actual cancer patients; hence, it can have the worst consequences.
⦿ So, in these types of scenarios, how can we trust machine learning
models?
⦿ Accuracy alone doesn’t tell the full story when we’re working with
a class-imbalanced dataset like this one, where there’s a significant
disparity between the number of positive and negative labels.
-
-
⦿ Recall is defined as the number of true positives divided by the total
number of elements that actually belong to the positive class (i.e. the sum
of true positives and false negatives, which are items which were not
labeled as belonging to the positive class but should have been).
-
Let’s try to measure precision and recall for our cancer prediction use
case:
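Using the TP/TN/FP/FN counts above and the standard definitions
precision = TP / (TP + FP) and recall = TP / (TP + FN) (assumed here, since
the formula images are not reproduced), a short calculation makes the point
concrete:

# Confusion-matrix arithmetic for the cancer example above (TP=1, TN=90, FN=8, FP=1).
TP, TN, FN, FP = 1, 90, 8, 1

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # 91/100 = 0.91
precision = TP / (TP + FP)                    # of predicted 'Cancer', how many were correct
recall    = TP / (TP + FN)                    # of actual 'Cancer', how many were found

print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")
# accuracy=0.91, precision=0.50, recall=0.11 -> high accuracy, but very poor recall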
-
Classification Accuracy :
⦿ Classification Accuracy is what we usually mean, when we use the term
accuracy. It is the ratio of number of correct predictions to the total
number of input samples.
⦿ It works well only if there is an equal number of samples belonging to
each class.
⦿ For example, consider that there are 98% samples of class A and 2%
samples of class B in our training set. Then our model can easily get 98%
training accuracy by simply predicting every training sample belonging
to class A.
⦿ When the same model is tested on a test set with 60% samples of class A
and 40% samples of class B, the test accuracy would drop to 60%.
Classification accuracy looks great in training, but it gives us a false sense
of achieving high accuracy.
⦿ The real problem arises when the cost of misclassifying the minority
class samples is very high. If we deal with a rare but fatal disease, the
cost of failing to diagnose the disease in a sick person is much higher
than the cost of sending a healthy person for more tests.
-
⦿ Clustering or cluster analysis is a machine learning technique, which
groups the unlabelled dataset. It can be defined as "A way of grouping
the data points into different clusters, consisting of similar data points.
The objects with possible similarities remain in a group that has
few or no similarities with another group."
-
The clustering technique can be widely used in various tasks. Some most
common uses of this technique are:
⦿ Market Segmentation
⦿ Statistical data analysis
⦿ Social network analysis
⦿ Image segmentation
⦿ Anomaly detection, etc.
⦿ Apart from these general usages, it is used by Amazon in its
recommendation system to provide recommendations based on a user's
past product searches. Netflix also uses this technique to recommend
movies and web series to its users as per their watch history.
⦿ The below diagram explains the working of the clustering algorithm. We
can see the different fruits are divided into several groups with similar
properties.
-
-
Types of Clustering Methods
⦿ The clustering methods are broadly divided into Hard Clustering (data
point belongs to only one group) and Soft Clustering (data points can
belong to more than one group). There are also various other clustering
approaches. Below are the main clustering methods used in Machine
learning:
⦿ Partitioning Clustering
⦿ Density-Based Clustering
⦿ Distribution Model-Based Clustering
⦿ Hierarchical Clustering
⦿ Fuzzy Clustering
-
1. Partitioning Clustering :
⦿ It is a type of clustering that divides the data into non-hierarchical groups. It
is also known as the Centroid-based method. The most common example
of partitioning clustering is the K-Means Clustering algorithm.
⦿ In this type, the dataset is divided into a set of k groups, where K is used to
define the number of pre-defined groups. The cluster centers are created in
such a way that each data point is closer to its own cluster's centroid
than to the centroid of any other cluster.
-
2. Density-Based Clustering :
⦿ The density-based clustering method connects the highly-dense areas into
clusters, and the arbitrarily shaped distributions are formed as long as the
dense region can be connected. This algorithm works by identifying
different clusters in the dataset and connecting the areas of high density
into clusters. The dense areas in the data space are separated from each
other by sparser areas.
⦿ These algorithms can face difficulty in clustering the data points if the
dataset has varying densities and high dimensions.
-
3. Distribution Model-Based Clustering :
⦿ In the distribution model-based clustering method, the data is divided
based on the probability of how a dataset belongs to a particular
distribution. The grouping is done by assuming some distribution,
commonly the Gaussian distribution.
⦿ An example of this type is the Expectation-Maximization Clustering
algorithm, which uses Gaussian Mixture Models (GMM).
-
4. Hierarchical Clustering :
⦿ Hierarchical clustering can be used as an alternative to partitioning
clustering, as there is no requirement to pre-specify the number of
clusters to be created. In this technique, the dataset is divided into
clusters to create a tree-like structure, which is also called a dendrogram.
The observations or any number of clusters can be selected by cutting
the tree at the correct level. The most common example of this method is
the Agglomerative Hierarchical algorithm.
-
5. Fuzzy Clustering :
⦿ Fuzzy clustering is a type of soft method in which a data object may
belong to more than one group or cluster. Each data point has a set of
membership coefficients, which depend on its degree of membership in
each cluster. The Fuzzy C-means algorithm is an example of this type of
clustering; it is sometimes also known as the Fuzzy K-means algorithm.
⦿ Applications of Clustering :
-
3. Customer Segmentation: It is used in market research to segment the
customers based on their choice and preferences.
-
⦿ K-Means Clustering is an Unsupervised Learning algorithm, which groups
the unlabeled dataset into different clusters. Here K defines the number of
pre-defined clusters that need to be created in the process, as if K=2, there
will be two clusters, and for K=3, there will be three clusters, and so on.
⦿ It allows us to cluster the data into different groups and is a convenient
way to discover the categories of groups in the unlabeled dataset on its
own, without the need for any training.
⦿ The algorithm takes the unlabeled dataset as input, divides the dataset
into k clusters, and repeats the process until it finds the best clusters. The
value of k should be predetermined in this algorithm.
-
The k-means clustering algorithm mainly performs two tasks:
⦿ Assigns each data point to its closest k-center. Those data points which
are near to the particular k-center, create a cluster.
⦿ Hence each cluster has data points with some commonalities, and it is
away from other clusters. The below diagram explains the working of the
K-means Clustering Algorithm:
-
-
How does the K-Means Algorithm Work?
The working of the K-Means algorithm is explained in the below steps:
-
Suppose we have two variables M1 and M2. The x-y axis scatter plot of
these two variables is given below:
-
⦿ Let's take number k of clusters, i.e., K=2, to identify the dataset and to put
them into different clusters. It means here we will try to group these
datasets into two different clusters.
⦿ We need to choose some random k points or centroid to form the cluster.
These points can be either the points from the dataset or any other point.
So, here we are selecting the below two points as k points, which are not
the part of our dataset. Consider the below image:
-
Now we will assign each data point of the scatter plot to its closest K-
point or centroid. We will compute it by applying some mathematics that
we have studied to calculate the distance between two points. So, we will
draw a median between both the centroids. Consider the below image:
-
⦿ From the above image, it is clear that the points on the left side of the line
are near the K1 or blue centroid, and the points to the right of the line are
close to the yellow centroid. Let's color them blue and yellow for clear
visualization.
-
⦿ As we need to find the closest cluster, we will repeat the process by
choosing new centroids. To choose the new centroids, we will compute the
center of gravity of the points in each cluster and place the new centroids
there, as shown below:
-
⦿ Next, we will reassign each datapoint to the new centroid. For this, we will
repeat the same process of finding a median line. The median will be like
below image:
-
⦿ From the above image, we can see that one yellow point is on the left side
of the line, and two blue points are to the right of the line. So, these three
points will be assigned to the new centroids.
-
⦿ As reassignment has taken place, we will again go to step-4, which is
finding new centroids or K-points.
⦿ We will repeat the process by finding the center of gravity of centroids, so
the new centroids will be as shown in the below image:
-
⦿ As we have got the new centroids, we will again draw the median line and
reassign the data points. So, the image will be:
-
⦿ We can see in the above image that there are no dissimilar data points on
either side of the line, which means our model is formed. Consider the
below image:
-
⦿ As our model is ready, we can now remove the assumed centroids, and
the two final clusters will be as shown in the below image:
-
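A minimal scikit-learn sketch of the same procedure is shown below; the two
blobs of points stand in for the M1-M2 scatter plot and are generated data,
not the slides' dataset:

# Minimal K-Means sketch with two variables (M1, M2) and K=2, as in the walkthrough.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two illustrative blobs standing in for the scatter plot of M1 vs M2.
X = np.vstack([rng.normal(loc=[2, 2], scale=0.5, size=(20, 2)),
               rng.normal(loc=[7, 6], scale=0.5, size=(20, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)            # assign each point to its closest centroid

print(kmeans.cluster_centers_)            # the two final centroids
print(labels[:10])                        # cluster index (0 or 1) for the first points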
Choose the value of "K number of clusters" in K-means Clustering :
Elbow Method :
The Elbow method is one of the most popular ways to find the optimal
number of clusters. This method uses the concept of WCSS
value. WCSS stands for Within Cluster Sum of Squares, which defines
the total variations within a cluster. The formula to calculate the value of
WCSS (for 3 clusters) is given below:
-
WCSS = ∑(Pi in Cluster1) distance(Pi, C1)² + ∑(Pi in Cluster2) distance(Pi, C2)² + ∑(Pi in Cluster3) distance(Pi, C3)²
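In practice the elbow method is usually run as a loop over K; scikit-learn
exposes the WCSS of a fitted model as inertia_. The sketch below uses the
same kind of illustrative two-blob data as the K-Means sketch above:

# Elbow-method sketch: WCSS for K = 1..10 on illustrative data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[2, 2], scale=0.5, size=(20, 2)),
               rng.normal(loc=[7, 6], scale=0.5, size=(20, 2))])

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)    # inertia_ = sum of squared distances to the closest centroid

for k, value in zip(range(1, 11), wcss):
    print(k, round(value, 2))   # look for the 'elbow' where the curve stops dropping sharply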
-
1. Agglomerative Hierarchical clustering :
-
Working of Agglomerative Hierarchical clustering :
The working of the AHC algorithm can be explained using the below steps:
Step-1: Create each data point as a single cluster. Let's say there are N data
points, so the number of clusters will also be N.
-
Step-2: Take two closest data points or clusters and merge them to form one
cluster. So, there will now be N-1 clusters.
-
Step-3: Again, take the two closest clusters and merge them together to form
one cluster. There will be N-2 clusters.
-
Step-4: Repeat Step 3 until only one cluster is left. We will then get the
following clusters. Consider the below images:
-
-
Step-5: Once all the clusters are combined into one big cluster, develop
the dendrogram to divide the clusters as per the problem.
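A small SciPy sketch of this agglomerative procedure is shown below; the six
2-D points are illustrative, and the linkage method can be swapped for any of
the measures described on the next slides:

# Agglomerative clustering sketch with SciPy: merge the closest clusters step by step.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[1, 1], [1.2, 1.1], [5, 5], [5.1, 4.9], [9, 1], [9.2, 1.1]])

Z = linkage(points, method='single')      # 'single', 'complete', 'average', 'centroid', ...
labels = fcluster(Z, t=3, criterion='maxclust')   # cut the tree into 3 clusters
print(labels)

# scipy.cluster.hierarchy.dendrogram(Z) would draw the corresponding tree
# if a matplotlib figure is available.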
-
Measure for the distance between two clusters :
⦿ As we have seen, the closest distance between the two clusters is crucial
for the hierarchical clustering. There are various ways to calculate the
distance between two clusters, and these ways decide the rule for
clustering. These measures are called Linkage methods. Some of the
popular linkage methods are given below:
1.Single Linkage: It is the Shortest Distance between the closest points
of the clusters. Consider the below image:
-
2.Complete Linkage: It is the farthest distance between the two points of
two different clusters. It is one of the popular linkage methods as it forms
tighter clusters than single-linkage.
-
3.Average Linkage: It is the linkage method in which the distance
between each pair of datasets is added up and then divided by the total
number of datasets to calculate the average distance between two
clusters. It is also one of the most popular linkage methods.
4.Centroid Linkage: It is the linkage method in which the distance
between the centroid of the clusters is calculated. Consider the below
image:
-
Working of Dendrogram in Hierarchical clustering :
-
-
⦿ In the above diagram, the left part is showing how clusters are created in
agglomerative clustering, and the right part is showing the corresponding
dendrogram.
⦿ As we have discussed above, firstly, the data points P2 and P3 combine
together and form a cluster; correspondingly, a dendrogram is created,
which connects P2 and P3 with a rectangular shape. The height is decided
according to the Euclidean distance between the data points.
⦿ In the next step, P5 and P6 form a cluster, and the corresponding
dendrogram is created. It is higher than the previous one, as the Euclidean
distance between P5 and P6 is a little greater than that between P2 and P3.
⦿ Again, two new dendrograms are created that combine P1, P2, and P3 in
one dendrogram, and P4, P5, and P6 in another dendrogram.
⦿ At last, the final dendrogram is created that combines all the data points
together.
⦿ We can cut the dendrogram tree structure at any level as per our
requirement.
-
⦿ Association rule learning is a type of unsupervised learning technique that
checks for the dependency of one data item on another data item and
maps them accordingly so that the relationship can be made more
profitable. It tries to find interesting relations or associations among the
variables of a dataset. It is based on different rules for discovering the
interesting relations between variables in the database.
⦿ The association rule is one of the very important concepts of machine
learning, and it is employed in Market Basket analysis, Web usage
mining, continuous production, etc. Here, market basket analysis is a
technique used by various big retailers to discover the associations
between items. We can understand it by taking the example of a
supermarket: in a supermarket, all products that are purchased
together are put together.
⦿ For example, if a customer buys bread, he most likely can also buy butter,
eggs, or milk, so these products are stored within a shelf or mostly nearby.
Consider the below diagram:
-
-
Association rule learning can be divided into three types of algorithms:
⦿ Apriori
⦿ Eclat
⦿ F-P Growth Algorithm
-
⦿ Here the "If" element is called the Antecedent, and the "then" statement is
called the Consequent. These types of relationships, where we can find
some association or relation between two items, are known as single cardinality. It
is all about creating rules, and if the number of items increases, then
cardinality also increases accordingly. So, to measure the associations
between thousands of data items, there are several metrics. These metrics
are given below:
⦿ Support
⦿ Confidence
⦿ Lift
-
1.Support : Support is the frequency of an item X, i.e., the fraction of the
transactions that contain X:
Support(X) = (Number of transactions containing X) / (Total number of transactions)
2.Confidence : Confidence indicates how often the rule has been found to
be true, i.e., how often the items X and Y occur together in the dataset when
the occurrence of X is already given. It is the ratio of the transactions that
contain both X and Y to the number of transactions that contain X:
Confidence(X → Y) = Support(X ∪ Y) / Support(X)
3.Lift : It is the strength of a rule, which can be defined by the below formula:
Lift(X → Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
It is the ratio of the observed support measure to the expected support if X
and Y were independent of each other. It has three possible values:
Lift = 1 (X and Y are independent), Lift > 1 (X and Y are positively
correlated), and Lift < 1 (X and Y are negatively correlated).
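These three metrics can be computed directly from a list of transactions. The
tiny bread/butter/milk transaction list below is an assumption used only to
illustrate the formulas:

# Support / Confidence / Lift for a tiny illustrative transaction list.
transactions = [{'bread', 'butter'}, {'bread', 'milk'},
                {'bread', 'butter', 'milk'}, {'milk'}]
N = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / N   # fraction containing the itemset

def confidence(X, Y):
    return support(X | Y) / support(X)

def lift(X, Y):
    return support(X | Y) / (support(X) * support(Y))

X, Y = {'bread'}, {'butter'}
print(support(X | Y), confidence(X, Y), lift(X, Y))
# support = 2/4 = 0.5, confidence = 0.5/0.75 ≈ 0.67, lift ≈ 0.67/0.5 ≈ 1.33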
-
Applications of Association Rule :
-
Apriori Algorithm :
The Apriori algorithm uses frequent item sets to generate association rules,
and it is designed to work on databases that contain transactions. With the
help of these association rules, it determines how strongly or how weakly
two objects are connected. This algorithm uses a Breadth-First Search and a
Hash Tree to calculate the itemset associations efficiently. It is an iterative
process for finding the frequent item sets in a large dataset.
This algorithm was given by R. Agrawal and R. Srikant in the year 1994. It
is mainly used for market basket analysis and helps to find those products
that can be bought together. It can also be used in the healthcare field to find
drug reactions for patients.
Step-1: Determine the support of the itemsets in the transactional database,
and select the minimum support and confidence.
Step-2: Take all the itemsets in the transactions with a higher support value
than the minimum or selected support value.
Step-3: Find all the rules of these subsets that have a higher confidence value
than the threshold or minimum confidence.
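A simplified level-wise sketch of the frequent-itemset search (Steps 1 and 2)
is given below; it omits Apriori's subset-pruning optimisation, and the four
transactions and minimum support of 2 are illustrative assumptions, not the
dataset used in the worked example that follows:

# Level-wise Apriori sketch: generate candidates, count support, prune below min_support.
from itertools import combinations

transactions = [{'A', 'B', 'C'}, {'A', 'B'}, {'A', 'C'}, {'A', 'B', 'C', 'D'}]
min_support = 2

items = sorted({i for t in transactions for i in t})
frequent, k = {}, 1
candidates = [frozenset([i]) for i in items]

while candidates:
    # Keep only candidates whose support count meets the minimum support (Step-2).
    counts = {c: sum(c <= t for t in transactions) for c in candidates}
    level = {c: n for c, n in counts.items() if n >= min_support}
    frequent.update(level)
    # Build (k+1)-item candidates from the surviving k-item sets.
    keys = list(level)
    candidates = list({a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1})
    k += 1

for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(itemset), count)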
-
Advantages of Apriori Algorithm :
-
Apriori Algorithm Working :
-
Solution:
In the first step, we will create a table that contains support count (The
frequency of each itemset individually in the dataset) of each itemset in
the given dataset. This table is called the Candidate set or C1.
-
Now, we will take out all the itemsets that have a support count greater
than the Minimum Support (2). This will give us the table for the frequent
itemset L1.
Since all the itemsets except E have a support count greater than or equal
to the minimum support, the itemset E will be removed.
-
Step-2: Candidate Generation C2, and L2:
In this step, we will generate C2 with the help of L1. In C2, we will create
the pair of the itemsets of L1 in the form of subsets.
After creating the subsets, we will again find the support count from the
main transaction table of datasets, i.e., how many times these pairs have
occurred together in the given dataset. So, we will get the below table for
C2:
-
Again, we need to compare the C2 Support count with the minimum
support count, and after comparing, the itemset with less support count
will be eliminated from the table C2. It will give us the below table for L2
-
Step-3: Candidate generation C3, and L3:
For C3, we will repeat the same two processes, but now we will form the
C3 table with subsets of three itemsets together, and will calculate the
support count from the dataset. It will give the below table:
Now we will create the L3 table. As we can see from the above C3 table,
there is only one combination of itemset that has support count equal to
the minimum support count. So, the L3 will have only one combination,
i.e., {A, B, C}.
-
Step-4: Finding the association rules for the subsets:
To generate the association rules, first, we will create a new table with the
possible rules from the occurred combination {A, B, C}. For all the rules,
we will calculate the Confidence using the formula sup(X ∪ Y) / sup(X). After
calculating the confidence value for all rules, we will exclude the rules
that have less confidence than the minimum threshold (50%).
Consider the below table:
-
Rules | Support | Confidence
A ^ B → C | 2 | sup(A ^ B ^ C) / sup(A ^ B) = 2/4 = 0.5 = 50%
-
⦿ Eclat, abbreviated from Equivalence Class Clustering and bottom-up
Lattice Traversal, is an algorithm for finding frequent item sets in a
transaction dataset. It is one of the best alternative methods of
Association Rule Learning and is a more efficient and scalable version of
the Apriori algorithm. The Apriori algorithm works in a horizontal sense,
imitating the Breadth-First Search of a graph, whereas the ECLAT
algorithm works in a vertical manner, just like the Depth-First Search of a
graph. This vertical style makes the ECLAT algorithm faster than the
Apriori algorithm.
⦿ Generally, Transaction Id sets, also called tidsets, are used to calculate
the Support value of an itemset. In the first call of the function, all single
items are used along with their respective tidsets. Then the function is
called recursively. In each recursive call, each item-tidset pair is verified
and combined with the other item-tidset pairs. This process is repeated
until no candidate item-tidset pairs can be combined.
-
⦿ The input given to this Eclat algorithm is a transaction dataset and a
threshold value which is in the range of 0 to 100.
⦿ A transaction dataset is a set of transactions, where each transaction is a
set of items. It is important to note that an item should not appear more
than once in the same transaction, and the items are assumed to be sorted
in lexicographical order within a transaction.
⦿ Each frequent itemset is marked with its corresponding support value.
The support of an itemset is given by the number of times the itemset
appears in the transaction dataset.
⦿ The given transaction data should be a Boolean matrix where for each cell
(i, j), the value denotes that whether the jth item is included in the
ith transaction or not. Here, 1 means true and 0 means false.
⦿ Now, we have to call the function for the first time and arrange each item
with its tidset in a tabular column. We have to call this function
iteratively till no more item-tidset pairs can be combined.
-
⦿ As discussed earlier, the basic idea of Eclat is to use Transaction Id Set
(tidset) intersections to compute the support value of a candidate. In the
first call of the function, all single items are used along with their tidsets.
Then the function is called recursively, and in each recursive call, each
item-tidset pair is verified and combined with other item-tidset pairs. This
process is continued until no candidate item-tidset pairs can be combined.
Consider the following transaction record:-
-
Transaction Id Bread Butter Milk Coffee Tea
T1 1 1 0 0 1
T2 0 1 0 1 0
T3 0 1 1 0 0
T4 1 1 0 1 0
T5 1 0 1 0 0
T6 0 1 1 0 0
T7 1 0 1 0 0
T8 1 1 1 0 1
T9 1 1 1 0 0
-
⦿ Each cell (i, j), of the above given data, which is a boolean matrix, denotes
whether the j’th item is included in the i’th transaction or not. 1 means true
while 0 means false.
⦿ We now call the function for the first time and arrange each item with its
tidset in a tabular fashion:-
⦿ k = 1, minimum support = 2
ITEM TIDSET
Bread {T1, T4, T5, T7, T8, T9}
Butter {T1, T2, T3, T4, T6, T8, T9}
Milk {T3, T5, T6, T7, T8, T9}
Coffee {T2, T4}
Tea {T1, T8}
-
We now recursively call the function till no more item-tidset pairs can be
combined:-
k=2
ITEM TIDSET
{Bread, Butter} {T1, T4, T8, T9}
{Bread, Milk} {T5, T7, T8, T9}
{Bread, Coffee} {T4}
{Bread, Tea} {T1, T8}
{Butter, Milk} {T3, T6, T8, T9}
{Butter, Coffee} {T2, T4}
{Butter, Tea} {T1, T8}
{Milk, Tea} {T8}
-
K=3
ITEM TIDSET
{Bread, Butter, Milk} {T8, T9}
{Bread, Butter, Tea} {T1, T8}
K=4
ITEM TIDSET
{Bread, Butter, Milk, Tea} {T8}
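The same vertical, tidset-intersection idea can be sketched in a few lines of
Python; the tidsets below are copied from the k = 1 table above, and
intersecting them reproduces the frequent k = 2 itemsets (pairs below the
minimum support, such as {Bread, Coffee}, would be pruned):

# Eclat-style sketch: represent each item by its tidset and intersect tidsets
# to get the support of larger itemsets (data taken from the table above).
from itertools import combinations

tidsets = {
    'Bread':  {'T1', 'T4', 'T5', 'T7', 'T8', 'T9'},
    'Butter': {'T1', 'T2', 'T3', 'T4', 'T6', 'T8', 'T9'},
    'Milk':   {'T3', 'T5', 'T6', 'T7', 'T8', 'T9'},
    'Coffee': {'T2', 'T4'},
    'Tea':    {'T1', 'T8'},
}
min_support = 2

# k = 2: intersect the tidsets of every pair of items.
for a, b in combinations(tidsets, 2):
    common = tidsets[a] & tidsets[b]
    if len(common) >= min_support:        # infrequent pairs are pruned
        print((a, b), sorted(common))     # e.g. ('Bread', 'Butter') -> ['T1', 'T4', 'T8', 'T9']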
-
ITEMS BOUGHT RECOMMENDED PRODUCTS
Bread Butter
Bread Milk
Bread Tea
Butter Milk
Butter Coffee
Butter Tea
Bread and Butter Milk
Bread and Butter Tea
-
Advantages of the Eclat algorithm over the Apriori algorithm:-
-
⦿ Reinforcement Learning is defined as a Machine Learning method that is
concerned with how software agents should take actions in an
environment.
⦿ Alternatively, it is the training of machine learning models to make
a sequence of decisions. The agent (algorithm/software) learns to achieve a
goal in an uncertain, potentially complex environment. In reinforcement
learning, an artificial intelligence faces a game-like situation.
The computer employs trial and error to come up with a solution
to the problem. To get the machine to do what the programmer wants,
the artificial intelligence gets either rewards or penalties for the actions it
performs. Its goal is to maximize the total reward.
⦿ It is one of the three basic machine learning paradigms, along with
supervised and unsupervised learning, which we have already covered in
units 3 to 6.
⦿ Popular applications of reinforcement learning range over a wide set of
areas, including robotics, optimizing chemical reactions, games, assisting
humans, understanding the consequences of different strategies,
self-driving cars, and medical and industrial purposes.
-
Fig.: Components of Reinforcement Learning
-
Here are some important terms used in Reinforcement Learning:
-
⦿ Value Function: It specifies the value of a state, i.e., the total amount of
reward an agent can expect to accumulate starting from that state.
⦿ Model of the environment: This mimics the behaviour of the
environment. It helps you to make inferences and also to determine how
the environment will behave.
⦿ Model-based methods: These are methods for solving reinforcement
learning problems that use a model of the environment.
⦿ Q value or action value (Q): Q value is quite similar to value. The only
difference between the two is that it takes an additional parameter as a
current action.
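Since the Q value has just been introduced, a minimal tabular Q-learning
sketch is given below; the 5-state "corridor" environment, the reward of +1 in
the last state, and the learning parameters are all hypothetical choices for
illustration only:

# Minimal tabular Q-learning sketch on a hypothetical 5-state corridor:
# the agent moves left/right and gets a reward of +1 only on reaching the last state.
import random

n_states, actions = 5, [-1, +1]          # action -1 = left, +1 = right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.1, 0.9, 0.2    # learning rate, discount, exploration rate

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy action selection
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda a: Q[(s, a)])
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-value update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# Learned greedy policy: every non-terminal state should prefer action +1 (go right).
print({s: max(actions, key=lambda a: Q[(s, a)]) for s in range(n_states - 1)})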
-
Working Of Reinforcement Learning :
-
-
Explanation of Example:
-
Applications Of Reinforcement Learning :
⦿ RL is mainly used to train Robots for industrial automation.
⦿ Business strategy planning is aided by RL; Bonsai is one of several
-
⦿ Health and medicine: The RL setup of an agent interacting with an
environment receiving feedback based on actions taken, shares
similarities with the problem of learning treatment policies in the medical
sciences. In fact, many RL applications in health care mostly relate to
finding optimal treatment policies. Recent papers mentioned applications
of RL to usage of medical equipment, medication dosing, and two-stage
clinical trials.
⦿ Aircraft control and robot motion control
⦿ Text, speech, and dialog systems: Companies collect a lot of text, and
good tools that can help unlock unstructured text will find users. AI
researchers at SalesForce used deep RL for abstractive text
summarization (a technique for automatically generating summaries from
text based on content “abstracted” from some original text document).
This could be an area where RL-based tools gain new users, as many
companies are in need of better text mining solutions.
-
⦿ RL is also being used to allow dialog systems (i.e., chatbots) to learn from
user interactions and thus help them improve over time (many enterprise
chatbots currently rely on decision trees). This is an active area of
research and VC investment: see Semantic Machines and VocalIQ, which
was acquired by Apple.
⦿ Media and advertising: Microsoft recently described an internal system
called Decision Service that has since been made available on Azure. This
paper describes applications of Decision Service to content
recommendation and advertising. Decision Service more generally
targets machine learning products that suffer from failure modes
including “feedback loops and bias, distributed data collection, changes
in the environment, and weak monitoring and debugging.”
⦿ Other applications of RL include cross-channel marketing optimization
and real time bidding systems for online display advertising.
⦿ Finance: A Financial Times article described an RL-based system for
optimal trade execution. The system (dubbed "LOXM") is being used to
execute trading orders at maximum speed and at the best possible price.
-
Thanks !!!