
What is artificial intelligence (AI)?

Today's largest and most successful enterprises have used AI to improve their operations and gain an advantage over their competitors.

Artificial intelligence is the simulation of human intelligence processes by machines, especially computer systems. Specific applications of AI include expert systems, natural language processing, speech recognition and machine vision.

According to the father of Artificial Intelligence, John McCarthy, it is "the science and engineering of making intelligent machines, especially intelligent computer programs". Artificial intelligence is a way of making a computer, a computer-controlled robot, or a piece of software think intelligently, in a manner similar to the way intelligent humans think.

How does AI work?

As the hype around AI has accelerated, vendors have been scrambling to promote how their products and services use AI. Often what they refer to as AI is simply one component of AI, such as machine learning. AI requires a foundation of specialized hardware and software for writing and training machine learning algorithms. No one programming language is synonymous with AI, but a few, including Python, R and Java, are popular.

In general, AI systems work by ingesting large amounts of labeled training data, analyzing the data for correlations and patterns, and using these patterns to make predictions about future states. In this way, a chatbot that is fed examples of text chats can learn to produce lifelike exchanges with people, or an image recognition tool can learn to identify and describe objects in images by reviewing millions of examples.

AI programming focuses on three cognitive skills: learning, reasoning and self-correction.

Learning processes. This aspect of AI programming focuses on acquiring data and creating rules for how to turn the data into actionable information. The rules, which are called algorithms, provide computing devices with step-by-step instructions for how to complete a specific task.

Reasoning processes. This aspect of AI programming focuses on choosing the right algorithm to reach a desired outcome.

Self-correction processes. This aspect of AI programming is designed to continually fine-tune algorithms and ensure they provide the most accurate results possible.

Why is artificial intelligence important?

AI is important because it can give enterprises insights into their operations that they may not have been aware of previously and because, in some cases, AI can perform tasks better than humans. Particularly when it comes to repetitive, detail-oriented tasks like analyzing large numbers of legal documents to ensure relevant fields are filled in properly, AI tools often complete jobs quickly and with relatively few errors.

This has helped fuel an explosion in efficiency and opened the door to entirely new business opportunities for some larger enterprises. Prior to the current wave of AI, it would have been hard to imagine using computer software to connect riders to taxis, but today Uber has become one of the largest companies in the world by doing just that. It utilizes sophisticated machine learning algorithms to predict when people are likely to need rides in certain areas, which helps proactively get drivers on the road before they're needed. As another example, Google has become one of the largest players in a range of online services by using machine learning to understand how people use their services and then improving them. In 2017, the company's CEO, Sundar Pichai, pronounced that Google would operate as an "AI first" company.

What are the advantages and disadvantages of artificial intelligence?

Artificial neural networks and deep learning artificial intelligence technologies are quickly evolving, primarily because AI processes large amounts of data much faster and makes predictions more accurately than humanly possible. While the huge volume of data being created on a daily basis would bury a human researcher, AI applications that use machine learning can take that data and quickly turn it into actionable information. As of this writing, the primary disadvantage of using AI is that it is expensive to process the large amounts of data that AI programming requires.

Advantages

• Good at detail-oriented jobs;
• Reduced time for data-heavy tasks;
• Delivers consistent results; and
• AI-powered virtual agents are always available.

Disadvantages

• Expensive;
• Requires deep technical expertise;
• Limited supply of qualified workers to build AI tools;
• Only knows what it's been shown; and
• Lack of ability to generalize from one task to another.

Strong AI vs. weak AI

AI can be categorized as either weak or strong.

• Weak AI, also known as narrow AI, is an AI system that is designed and trained to complete a specific task. Industrial robots and virtual personal assistants, such as Apple's Siri, use weak AI.
• Strong AI, also known as artificial general intelligence (AGI), describes programming that can replicate the cognitive abilities of the human brain. When presented with an unfamiliar task, a strong AI system can use fuzzy logic to apply knowledge from one domain to another and find a solution autonomously. In theory, a strong AI program should be able to pass both a Turing Test and the Chinese room test.

What are the 4 types of artificial intelligence?

Arend Hintze, an assistant professor of integrative biology and computer science and engineering at Michigan State University, explained in a 2016 article that AI can be categorized into four types, beginning with the task-specific intelligent systems in wide use today and progressing to sentient systems, which do not yet exist. The categories are as follows:

• Type 1: Reactive machines. These AI systems have no memory and are task specific. An example is Deep Blue, the IBM chess program that beat Garry Kasparov in the 1990s. Deep Blue can identify pieces on the chessboard and make predictions, but because it has no memory, it cannot use past experiences to inform future ones.
• Type 2: Limited memory. These AI systems have memory, so they can use past experiences to inform future decisions. Some of the decision-making functions in self-driving cars are designed this way.
• Type 3: Theory of mind. Theory of mind is a psychology term. When applied to AI, it means that the system would have the social intelligence to understand emotions. This type of AI will be able to infer human intentions and predict behavior, a necessary skill for AI systems to become integral members of human teams.
• Type 4: Self-awareness. In this category, AI systems have a sense of self, which gives them consciousness. Machines with self-awareness understand their own current state. This type of AI does not yet exist.

What are examples of AI technology and how is it used today?

AI is incorporated into a variety of different types of technology. Here are six examples:

• Automation. When paired with AI technologies, automation tools can expand the volume and types of tasks performed. An example is robotic process automation (RPA), a type of software that automates repetitive, rules-based data processing tasks traditionally done by humans. When combined with machine learning and emerging AI tools, RPA can automate bigger portions of enterprise jobs, enabling RPA's tactical bots to pass along intelligence from AI and respond to process changes.
• Machine learning. This is the science of getting a computer to act without being explicitly programmed. Deep learning is a subset of machine learning that, in very simple terms, can be thought of as the automation of predictive analytics. There are three types of machine learning algorithms (a short sketch follows this list):
o Supervised learning. Data sets are labeled so that patterns can be detected and used to label new data sets.
o Unsupervised learning. Data sets aren't labeled and are sorted according to similarities or differences.
o Reinforcement learning. Data sets aren't labeled but, after performing an action or several actions, the AI system is given feedback.
• Machine vision. This technology gives a machine the ability to see. Machine vision captures and analyzes visual information using a camera, analog-to-digital conversion and digital signal processing. It is often compared to human eyesight, but machine vision isn't bound by biology and can be programmed to see through walls, for example. It is used in a range of applications from signature identification to medical image analysis. Computer vision, which is focused on machine-based image processing, is often conflated with machine vision.
• Natural language processing (NLP). This is the processing of human language by a computer program. One of the older and best-known examples of NLP is spam detection, which looks at the subject line and text of an email and decides if it's junk. Current approaches to NLP are based on machine learning. NLP tasks include text translation, sentiment analysis and speech recognition.
• Robotics. This field of engineering focuses on the design and manufacturing of robots. Robots are often used to perform tasks that are difficult for humans to perform or to perform consistently. For example, robots are used in assembly lines for car production or by NASA to move large objects in space. Researchers are also using machine learning to build robots that can interact in social settings.
• Self-driving cars. Autonomous vehicles use a combination of computer vision, image recognition and deep learning to build automated skill at piloting a vehicle while staying in a given lane and avoiding unexpected obstructions, such as pedestrians.
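The three machine learning settings described in the list above can be illustrated with a minimal sketch. The snippet below assumes the scikit-learn library is installed; the data, model choices and reward function are invented purely for the example.

```python
# Minimal sketch of supervised, unsupervised and reinforcement-style learning.
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised learning: data sets are labeled, so patterns can be learned and
# then used to label new data.
X_labeled = [[0.1, 1.2], [0.4, 0.9], [3.1, 2.8], [2.9, 3.3]]
y_labels = [0, 0, 1, 1]
classifier = LogisticRegression().fit(X_labeled, y_labels)
print(classifier.predict([[3.0, 3.0]]))      # predicts a label for unseen data

# Unsupervised learning: no labels; items are grouped by similarity.
X_unlabeled = [[0.1, 1.2], [0.4, 0.9], [3.1, 2.8], [2.9, 3.3]]
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X_unlabeled)
print(clusters)                              # cluster assignment per data point

# Reinforcement learning: no labels; the system acts and receives feedback
# (reward). Only the feedback loop is sketched here.
def reward(action):
    return 1.0 if action == "good" else -1.0

print(sum(reward(a) for a in ["good", "bad", "good"]))
```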
What are the applications of AI?

Artificial intelligence has made its way into a wide variety of markets. Here are nine examples.

AI in healthcare. The biggest bets are on improving patient outcomes and reducing costs. Companies are applying machine learning to make better and faster diagnoses than humans. One of the best-known healthcare technologies is IBM Watson. It understands natural language and can respond to questions asked of it. The system mines patient data and other available data sources to form a hypothesis, which it then presents with a confidence scoring schema. Other AI applications include using online virtual health assistants and chatbots to help patients and healthcare customers find medical information, schedule appointments, understand the billing process and complete other administrative processes. An array of AI technologies is also being used to predict, fight and understand pandemics such as COVID-19.

AI in business. Machine learning algorithms are being integrated into analytics and customer relationship management (CRM) platforms to uncover information on how to better serve customers. Chatbots have been incorporated into websites to provide immediate service to customers. Automation of job positions has also become a talking point among academics and IT analysts.

AI in education. AI can automate grading, giving educators more time. It can assess students and adapt to their needs, helping them work at their own pace. AI tutors can provide additional support to students, ensuring they stay on track. And it could change where and how students learn, perhaps even replacing some teachers.

AI in finance. AI in personal finance applications, such as Intuit Mint or TurboTax, is disrupting financial institutions. Applications such as these collect personal data and provide financial advice. Other programs, such as IBM Watson, have been applied to the process of buying a home. Today, artificial intelligence software performs much of the trading on Wall Street.

AI in law. The discovery process -- sifting through documents -- in law is often overwhelming for humans. Using AI to help automate the legal industry's labor-intensive processes is saving time and improving client service. Law firms are using machine learning to describe data and predict outcomes, computer vision to classify and extract information from documents, and natural language processing to interpret requests for information.

AI in manufacturing. Manufacturing has been at the forefront of incorporating robots into the workflow. For example, the industrial robots that were at one time programmed to perform single tasks and separated from human workers increasingly function as cobots: smaller, multitasking robots that collaborate with humans and take on responsibility for more parts of the job in warehouses, factory floors and other workspaces.

AI in banking. Banks are successfully employing chatbots to make their customers aware of services and offerings and to handle transactions that don't require human intervention. AI virtual assistants are being used to improve and cut the costs of compliance with banking regulations. Banking organizations are also using AI to improve their decision-making for loans, and to set credit limits and identify investment opportunities.

AI in transportation. In addition to AI's fundamental role in operating autonomous vehicles, AI technologies are used in transportation to manage traffic, predict flight delays, and make ocean shipping safer and more efficient.

Security. AI and machine learning are at the top of the buzzword list security vendors use today to differentiate their offerings. Those terms also represent truly viable technologies. Organizations use machine learning in security information and event management (SIEM) software and related areas to detect anomalies and identify suspicious activities that indicate threats. By analyzing data and using logic to identify similarities to known malicious code, AI can provide alerts to new and emerging attacks much sooner than human employees and previous technology iterations. The maturing technology is playing a big role in helping organizations fight off cyber attacks.

Augmented intelligence vs. artificial intelligence

Some industry experts believe the term artificial intelligence is too closely linked to popular culture, and this has caused the general public to have improbable expectations about how AI will change the workplace and life in general.
• Augmented intelligence. Some researchers and marketers hope the label augmented intelligence, which has a more neutral connotation, will help people understand that most implementations of AI will be weak and simply improve products and services. Examples include automatically surfacing important information in business intelligence reports or highlighting important information in legal filings.
• Artificial intelligence. True AI, or artificial general intelligence, is closely associated with the concept of the technological singularity -- a future ruled by an artificial superintelligence that far surpasses the human brain's ability to understand it or how it is shaping our reality. This remains within the realm of science fiction, though some developers are working on the problem. Many believe that technologies such as quantum computing could play an important role in making AGI a reality and that we should reserve the use of the term AI for this kind of general intelligence.

Ethical use of artificial intelligence

While AI tools present a range of new functionality for businesses, the use of artificial intelligence also raises ethical questions because, for better or worse, an AI system will reinforce what it has already learned.

This can be problematic because machine learning algorithms, which underpin many of the most advanced AI tools, are only as smart as the data they are given in training. Because a human being selects what data is used to train an AI program, the potential for machine learning bias is inherent and must be monitored closely.

Anyone looking to use machine learning as part of real-world, in-production systems needs to factor ethics into their AI training processes and strive to avoid bias. This is especially true when using AI algorithms that are inherently unexplainable in deep learning and generative adversarial network (GAN) applications.

Explainability is a potential stumbling block to using AI in industries that operate under strict regulatory compliance requirements. For example, financial institutions in the United States operate under regulations that require them to explain their credit-issuing decisions. When a decision to refuse credit is made by AI programming, however, it can be difficult to explain how the decision was arrived at because the AI tools used to make such decisions operate by teasing out subtle correlations between thousands of variables. When the decision-making process cannot be explained, the program may be referred to as black box AI.

Despite potential risks, there are currently few regulations governing the use of AI tools, and where laws do exist, they typically pertain to AI indirectly. For example, as previously mentioned, United States Fair Lending regulations require financial institutions to explain credit decisions to potential customers. This limits the extent to which lenders can use deep learning algorithms, which by their nature are opaque and lack explainability.

The European Union's General Data Protection Regulation (GDPR) puts strict limits on how enterprises can use consumer data, which impedes the training and functionality of many consumer-facing AI applications.

In October 2016, the National Science and Technology Council issued a report examining the potential role governmental regulation might play in AI development, but it did not recommend specific legislation be considered.

Crafting laws to regulate AI will not be easy, in part because AI comprises a variety of technologies that companies use for different ends, and partly because regulations can come at the cost of AI progress and development. The rapid evolution of AI technologies is another obstacle to forming meaningful regulation of AI. Technology breakthroughs and novel applications can make existing laws instantly obsolete. For example, existing laws regulating the privacy of conversations and recorded conversations do not cover the challenge posed by voice assistants like Amazon's Alexa and Apple's Siri that gather but do not distribute conversation -- except to the companies' technology teams, which use it to improve machine learning algorithms. And, of course, the laws that governments do manage to craft to regulate AI don't stop criminals from using the technology with malicious intent.

Cognitive computing and AI

The terms AI and cognitive computing are sometimes used interchangeably, but, generally speaking, the label AI is used in reference to machines that replace human intelligence by simulating how we sense, learn, process and react to information in the environment.

The label cognitive computing is used in reference to products and services that mimic and augment human thought processes.

What is the history of AI?

The concept of inanimate objects endowed with intelligence has been around since ancient times. The Greek god Hephaestus was depicted in myths as forging robot-like servants out of gold. Engineers in ancient Egypt built statues of gods animated by priests. Throughout the centuries, thinkers from Aristotle to the 13th century Spanish theologian Ramon Llull to René Descartes and Thomas Bayes used the tools and logic of their times to describe human thought processes as symbols, laying the foundation for AI concepts such as general knowledge representation.

The late 19th and first half of the 20th centuries brought forth the foundational work that would give rise to the modern computer. In 1836, Cambridge University mathematician Charles Babbage and Augusta Ada Byron, Countess of Lovelace, invented the first design for a programmable machine.

1940s. Princeton mathematician John Von Neumann conceived the architecture for the stored-program computer -- the idea that a computer's program and the data it processes can be kept in the computer's memory. And Warren McCulloch and Walter Pitts laid the foundation for neural networks.

1950s. With the advent of modern computers, scientists could test their ideas about machine intelligence. One method for determining whether a computer has intelligence was devised by the British mathematician and World War II code-breaker Alan Turing. The Turing Test focused on a computer's ability to fool interrogators into believing its responses to their questions were made by a human being.

1956. The modern field of artificial intelligence is widely cited as starting this year during a summer conference at Dartmouth College. Sponsored by the Defense Advanced Research Projects Agency (DARPA), the conference was attended by 10 luminaries in the field, including AI pioneers Marvin Minsky, Oliver Selfridge and John McCarthy, who is credited with coining the term artificial intelligence. Also in attendance were Allen Newell, a computer scientist, and Herbert A. Simon, an economist, political scientist and cognitive psychologist, who presented their groundbreaking Logic Theorist, a computer program capable of proving certain mathematical theorems and referred to as the first AI program.

1950s and 1960s. In the wake of the Dartmouth College conference, leaders in the fledgling field of AI predicted that a man-made intelligence equivalent to the human brain was around the corner, attracting major government and industry support. Indeed, nearly 20 years of well-funded basic research generated significant advances in AI: for example, in the late 1950s, Newell and Simon published the General Problem Solver (GPS) algorithm, which fell short of solving complex problems but laid the foundations for developing more sophisticated cognitive architectures; McCarthy developed Lisp, a language for AI programming that is still used today. In the mid-1960s, MIT Professor Joseph Weizenbaum developed ELIZA, an early natural language processing program that laid the foundation for today's chatbots.

1970s and 1980s. But the achievement of artificial general intelligence proved elusive, not imminent, hampered by limitations in computer processing and memory and by the complexity of the problem. Government and corporations backed away from their support of AI research, leading to a fallow period lasting from 1974 to 1980 and known as the first "AI Winter." In the 1980s, research on deep learning techniques and industry's adoption of Edward Feigenbaum's expert systems sparked a new wave of AI enthusiasm, only to be followed by another collapse of government funding and industry support. The second AI winter lasted until the mid-1990s.

1990s through today. Increases in computational power and an explosion of data sparked an AI renaissance in the late 1990s that has continued to present times. The latest focus on AI has given rise to breakthroughs in natural language processing, computer vision, robotics, machine learning, deep learning and more. Moreover, AI is becoming ever more tangible, powering cars, diagnosing disease and cementing its role in popular culture. In 1997, IBM's Deep Blue defeated Russian chess grandmaster Garry Kasparov, becoming the first computer program to beat a world chess champion. Fourteen years later, IBM's Watson captivated the public when it defeated two former champions on the game show Jeopardy!. More recently, the historic defeat of 18-time World Go champion Lee Sedol by Google DeepMind's AlphaGo stunned the Go community and marked a major milestone in the development of intelligent machines.

AI as a service

Because hardware, software and staffing costs for AI can be expensive, many vendors are including AI components in their standard offerings or providing access to artificial intelligence as a service (AIaaS) platforms. AIaaS allows individuals and companies to experiment with AI for various business purposes and sample multiple platforms before making a commitment.

Popular AI cloud offerings include the following:

• Amazon AI
• IBM Watson Assistant
• Microsoft Cognitive Services
• Google AI

Inference:

In artificial intelligence, we need intelligent computers which can create new logic from old logic or from evidence, so generating conclusions from evidence and facts is termed inference.

Inference rules:

Inference rules are the templates for generating valid arguments. Inference rules are applied to derive proofs in artificial intelligence, and a proof is a sequence of conclusions that leads to the desired goal.

In inference rules, the implication among all the connectives plays an important role. Following are some terminologies related to inference rules:

• Implication: It is one of the logical connectives, which can be represented as P → Q. It is a Boolean expression.
• Converse: The converse of an implication means the right-hand side proposition goes to the left-hand side and vice versa. It can be written as Q → P.
• Contrapositive: The negation of the converse is termed the contrapositive, and it can be represented as ¬Q → ¬P.
• Inverse: The negation of the implication is called the inverse. It can be represented as ¬P → ¬Q.

From the above terms, some of the compound statements are equivalent to each other, which we can prove using a truth table.
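As an illustration of that check, the minimal Python sketch below enumerates the truth table for the four statements; it is not taken from the original text, and the column labels are just names for the formulas above.

```python
# Truth-table sketch: which of the four statements are logically equivalent?
from itertools import product

def implies(a, b):
    return (not a) or b        # truth-functional definition of a -> b

for P, Q in product([True, False], repeat=2):
    print({
        "P": P, "Q": Q,
        "P->Q (implication)": implies(P, Q),
        "Q->P (converse)": implies(Q, P),
        "~Q->~P (contrapositive)": implies(not Q, not P),
        "~P->~Q (inverse)": implies(not P, not Q),
    })

# In every row, P->Q matches its contrapositive ~Q->~P, and the converse Q->P
# matches the inverse ~P->~Q, so those pairs are equivalent.
```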

First-Order Logic in Artificial Intelligence

In the topic of propositional logic, we have seen how to represent statements using propositional logic. But unfortunately, in propositional logic we can only represent facts, which are either true or false. PL is not sufficient to represent complex sentences or natural language statements; propositional logic has very limited expressive power. Consider the following sentences, which we cannot represent using PL logic:

• "Some humans are intelligent", or
• "Sachin likes cricket."

To represent the above statements, PL logic is not sufficient, so we require some more powerful logic, such as first-order logic.

First-Order Logic:

• First-order logic is another way of knowledge representation in artificial intelligence. It is an extension of propositional logic.
• FOL is sufficiently expressive to represent natural language statements in a concise way.
• First-order logic is also known as predicate logic or first-order predicate logic. First-order logic is a powerful language that develops information about objects in an easier way and can also express the relationships between those objects.
• First-order logic (like natural language) does not only assume that the world contains facts, as propositional logic does, but also assumes the following things in the world:
o Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus, ...
o Relations: It can be a unary relation, such as red, round, is adjacent, or an n-ary relation, such as the sister of, brother of, has color, comes between.
o Functions: father of, best friend, third inning of, end of, ...
• As in a natural language, first-order logic also has two main parts:
1. Syntax
2. Semantics
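For instance, the two example sentences above could be written in first-order logic as shown below; the predicate and constant names are chosen only for illustration and are not fixed by the text.

```latex
\exists x \,\big(\mathrm{Human}(x) \land \mathrm{Intelligent}(x)\big)
  \qquad \text{``Some humans are intelligent.''}

\mathrm{Likes}(\mathrm{Sachin}, \mathrm{Cricket})
  \qquad \text{``Sachin likes cricket.''}
```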
Forward Chaining

Forward chaining is also known as a forward deduction or forward reasoning method when using an inference engine. Forward chaining is a form of reasoning which starts with atomic sentences in the knowledge base and applies inference rules (Modus Ponens) in the forward direction to extract more data until a goal is reached.

The forward-chaining algorithm starts from known facts, triggers all rules whose premises are satisfied, and adds their conclusions to the known facts. This process repeats until the problem is solved.

Properties of Forward Chaining:

• It is a down-up (bottom-up) approach, as it moves from the bottom to the top.
• It is a process of making a conclusion based on known facts or data, starting from the initial state and reaching the goal state.
• The forward-chaining approach is also called data-driven, as we reach the goal using the available data.
• The forward-chaining approach is commonly used in expert systems, such as CLIPS, and in business and production rule systems.
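A minimal forward-chaining sketch in Python follows; the rule format (premises, conclusion) and the example facts are invented for illustration only.

```python
# Forward chaining: start from known facts, fire every rule whose premises are
# satisfied (Modus Ponens), add its conclusion, and repeat until the goal is
# reached or no new facts can be derived.
def forward_chain(rules, facts, goal):
    facts = set(facts)
    changed = True
    while changed and goal not in facts:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)      # rule fires: conclusion becomes a fact
                changed = True
    return goal in facts, facts

rules = [({"A", "B"}, "C"),   # A and B  =>  C
         ({"C"}, "D")]        # C        =>  D
print(forward_chain(rules, {"A", "B"}, "D"))   # (True, {'A', 'B', 'C', 'D'})
```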
Difference between backward chaining and forward chaining

Following are the differences between forward chaining and backward chaining:

• Forward chaining, as the name suggests, starts from the known facts and moves forward by applying inference rules to extract more data, and it continues until it reaches the goal, whereas backward chaining starts from the goal and moves backward by using inference rules to determine the facts that satisfy the goal.
• Forward chaining is called a data-driven inference technique, whereas backward chaining is called a goal-driven inference technique.
• Forward chaining is known as a bottom-up approach, whereas backward chaining is known as a top-down approach.
• Forward chaining uses a breadth-first search strategy, whereas backward chaining uses a depth-first search strategy.
• Forward and backward chaining both apply the Modus Ponens inference rule.
• Forward chaining can be used for tasks such as planning, design process monitoring, diagnosis and classification, whereas backward chaining can be used for classification and diagnosis tasks.
• Forward chaining can be like an exhaustive search, whereas backward chaining tries to avoid unnecessary paths of reasoning.
• In forward chaining there can be various ASK questions from the knowledge base, whereas in backward chaining there can be fewer ASK questions.
• Forward chaining is slow, as it checks all the rules, whereas backward chaining is fast, as it checks only the few required rules.

The same comparison, point by point:

1. Forward chaining starts from known facts and applies inference rules to extract more data until it reaches the goal. Backward chaining starts from the goal and works backward through inference rules to find the required facts that support the goal.
2. Forward chaining is a bottom-up approach. Backward chaining is a top-down approach.
3. Forward chaining is known as a data-driven inference technique, as we reach the goal using the available data. Backward chaining is known as a goal-driven technique, as we start from the goal and divide it into sub-goals to extract the facts.
4. Forward chaining reasoning applies a breadth-first search strategy. Backward chaining reasoning applies a depth-first search strategy.
5. Forward chaining tests all the available rules. Backward chaining tests only the few required rules.
6. Forward chaining is suitable for planning, monitoring, control and interpretation applications. Backward chaining is suitable for diagnostic, prescription and debugging applications.
7. Forward chaining can generate an infinite number of possible conclusions. Backward chaining generates a finite number of possible conclusions.
8. Forward chaining operates in the forward direction. Backward chaining operates in the backward direction.
9. Forward chaining is aimed at any conclusion. Backward chaining is aimed only at the required data.
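For comparison with the forward-chaining sketch earlier, here is a minimal backward-chaining sketch; the rule format and the example are again invented for illustration.

```python
# Backward chaining: start from the goal, divide it into sub-goals through the
# rules that conclude it, and recurse until known facts are reached.
def backward_chain(rules, facts, goal, seen=None):
    seen = set() if seen is None else seen
    if goal in facts:
        return True
    if goal in seen:                 # avoid looping on cyclic rules
        return False
    seen.add(goal)
    for premises, conclusion in rules:
        if conclusion == goal and all(
                backward_chain(rules, facts, p, seen) for p in premises):
            return True
    return False

rules = [({"A", "B"}, "C"), ({"C"}, "D")]
print(backward_chain(rules, {"A", "B"}, "D"))   # True: D <- C <- {A, B}
```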

Reasoning in Artificial Intelligence

In previous topics, we have learned various ways of knowledge representation in artificial intelligence. Now we will learn the various ways to reason on this knowledge using different logical schemes.

Reasoning:

Reasoning is the mental process of deriving logical conclusions and making predictions from available knowledge, facts and beliefs. Or we can say, "Reasoning is a way to infer facts from existing data." It is a general process of thinking rationally to find valid conclusions.

In artificial intelligence, reasoning is essential so that the machine can also think rationally, as a human brain does, and can perform like a human.

Types of Reasoning

In artificial intelligence, reasoning can be divided into the following categories:

• Deductive reasoning
• Inductive reasoning
• Abductive reasoning
• Common sense reasoning
• Monotonic reasoning
• Non-monotonic reasoning

Note: Inductive and deductive reasoning are forms of propositional logic.

1. Deductive reasoning:

Deductive reasoning is deducing new information from logically related known information. It is a form of valid reasoning, which means the argument's conclusion must be true when the premises are true.

Deductive reasoning is a type of propositional logic in AI, and it requires various rules and facts. It is sometimes referred to as top-down reasoning, and it is the opposite of inductive reasoning.

In deductive reasoning, the truth of the premises guarantees the truth of the conclusion.

Deductive reasoning mostly starts from general premises and moves to a specific conclusion, which can be explained with the example below.

Example:

Premise 1: All humans eat veggies.

Premise 2: Suresh is human.

Conclusion: Suresh eats veggies.

In general, deductive reasoning proceeds from general premises, through the application of rules, to a specific conclusion.

2. Inductive reasoning:

Inductive reasoning is a form of reasoning used to arrive at a conclusion from a limited set of facts by the process of generalization. It starts with a series of specific facts or data and reaches a general statement or conclusion.

Inductive reasoning is a type of propositional logic, which is also known as cause-effect reasoning or bottom-up reasoning.

In inductive reasoning, we use historical data or various premises to generate a generic rule, for which the premises support the conclusion.

In inductive reasoning, premises provide probable support for the conclusion, so the truth of the premises does not guarantee the truth of the conclusion.

Example:

Premise: All of the pigeons we have seen in the zoo are white.

Conclusion: Therefore, we can expect all pigeons to be white.

3. Abductive reasoning:

Abductive reasoning is a form of logical reasoning which starts with one or more observations and then seeks to find the most likely explanation or conclusion for the observations.

Abductive reasoning is an extension of deductive reasoning, but in abductive reasoning the premises do not guarantee the conclusion.

Example:

Implication: The cricket ground is wet if it is raining.

Axiom: The cricket ground is wet.

Conclusion: It is raining.

4. Common sense reasoning

Common sense reasoning is an informal form of reasoning which can be gained through experience.

Common sense reasoning simulates the human ability to make presumptions about events which occur every day.

It relies on good judgment rather than exact logic and operates on heuristic knowledge and heuristic rules.

Example:

1. One person can be at only one place at a time.
2. If I put my hand in a fire, then it will burn.

The above two statements are examples of common sense reasoning, which a human mind can easily understand and assume.

5. Monotonic reasoning:

In monotonic reasoning, once a conclusion is drawn, it will remain the same even if we add other information to the existing information in our knowledge base. In monotonic reasoning, adding knowledge does not decrease the set of propositions that can be derived.

To solve monotonic problems, we can derive valid conclusions from the available facts only, and they will not be affected by new facts.

Monotonic reasoning is not useful for real-time systems because, in real time, facts change, so we cannot use monotonic reasoning.

Monotonic reasoning is used in conventional reasoning systems, and a logic-based system is monotonic.

Any theorem proving is an example of monotonic reasoning.

Example:

• The Earth revolves around the Sun.

This is a true fact, and it cannot be changed even if we add another sentence to the knowledge base, such as "The Moon revolves around the Earth" or "The Earth is not round."

Advantages of monotonic reasoning:

• In monotonic reasoning, each old proof will always remain valid.
• If we deduce some facts from the available facts, they will always remain valid.

Disadvantages of monotonic reasoning:

• We cannot represent real-world scenarios using monotonic reasoning.
• Hypothetical knowledge cannot be expressed with monotonic reasoning, which means facts must be true.
• Since we can only derive conclusions from the old proofs, new knowledge from the real world cannot be added.

6. Non-monotonic reasoning

In non-monotonic reasoning, some conclusions may be invalidated if we add more information to our knowledge base.

A logic is said to be non-monotonic if some conclusions can be invalidated by adding more knowledge to the knowledge base.

Non-monotonic reasoning deals with incomplete and uncertain models.

"Human perceptions of various things in daily life" is a general example of non-monotonic reasoning.

Example: Let us suppose the knowledge base contains the following knowledge:

• Birds can fly.
• Penguins cannot fly.
• Pitty is a bird.
From the above sentences, we can conclude that Pitty can fly.

However, if we add another sentence to the knowledge base, "Pitty is a penguin", this concludes "Pitty cannot fly", and so it invalidates the above conclusion.

Advantages of non-monotonic reasoning:

• For real-world systems, such as robot navigation, we can use non-monotonic reasoning.
• In non-monotonic reasoning, we can choose probabilistic facts or make assumptions.

Disadvantages of non-monotonic reasoning:

• In non-monotonic reasoning, old facts may be invalidated by adding new sentences.
• It cannot be used for theorem proving.
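As a rough illustration of how new knowledge can withdraw a conclusion, the sketch below encodes the Pitty example with "birds can fly" treated as a default; the fact encoding and function name are invented for the example.

```python
# Non-monotonic (default) reasoning sketch: a default conclusion is withdrawn
# when an exception becomes known.
def can_fly(kb, animal):
    if ("penguin", animal) in kb:        # known exception defeats the default
        return False
    return ("bird", animal) in kb        # default rule: birds can fly

kb = {("bird", "Pitty")}
print(can_fly(kb, "Pitty"))              # True  -- concluded by default

kb.add(("penguin", "Pitty"))             # adding knowledge invalidates it
print(can_fly(kb, "Pitty"))              # False -- the earlier conclusion is withdrawn
```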

Propositional Logic in Artificial Intelligence

Propositional logic (PL) is the simplest form of logic, where all the statements are made by propositions. A proposition is a declarative statement which is either true or false. It is a technique of knowledge representation in logical and mathematical form.

Example:

a) It is Sunday.
b) The Sun rises in the West (false proposition).
c) 3 + 3 = 7 (false proposition).
d) 5 is a prime number.

Following are some basic facts about propositional logic:

• Propositional logic is also called Boolean logic, as it works on 0 and 1.
• In propositional logic, we use symbolic variables to represent the logic, and we can use any symbol to represent a proposition, such as A, B, C, P, Q, R, etc.
• Propositions can be either true or false, but cannot be both.
• Propositional logic consists of objects, relations or functions, and logical connectives.
• These connectives are also called logical operators.
• The propositions and connectives are the basic elements of propositional logic.
• A connective can be seen as a logical operator which connects two sentences.
• A proposition formula which is always true is called a tautology, and it is also called a valid sentence.
• A proposition formula which is always false is called a contradiction.
• A proposition formula which has both true and false values is called a contingency.
• Statements which are questions, commands or opinions are not propositions; for example, "Where is Rohini?", "How are you?" and "What is your name?" are not propositions.

Search Algorithms in Artificial Intelligence

Search algorithms are one of the most important areas of Artificial Intelligence. This topic will explain all about the search algorithms in AI.

Problem-solving agents:

In Artificial Intelligence, search techniques are universal problem-solving methods. Rational agents or problem-solving agents in AI mostly use these search strategies or algorithms to solve a specific problem and provide the best result. Problem-solving agents are goal-based agents and use an atomic representation. In this topic, we will learn various problem-solving search algorithms.

Search Algorithm Terminologies:

• Search: Searching is a step-by-step procedure to solve a search problem in a given search space. A search problem can have three main factors:
1. Search space: The search space represents the set of possible solutions which a system may have.
2. Start state: It is the state from which the agent begins the search.
3. Goal test: It is a function which observes the current state and returns whether the goal state has been achieved or not.
• Search tree: A tree representation of a search problem is called a search tree. The root of the search tree is the root node, which corresponds to the initial state.
• Actions: It gives a description of all the actions available to the agent.
• Transition model: A description of what each action does can be represented as a transition model.
• Path cost: It is a function which assigns a numeric cost to each path.
• Solution: It is an action sequence which leads from the start node to the goal node.
• Optimal solution: A solution which has the lowest cost among all solutions.

Properties of Search Algorithms:

Following are the four essential properties of search algorithms, used to compare their efficiency:

Completeness: A search algorithm is said to be complete if it guarantees to return a solution whenever at least one solution exists for any random input.

Optimality: If the solution found by an algorithm is guaranteed to be the best solution (lowest path cost) among all other solutions, then such a solution is said to be an optimal solution.

Time complexity: Time complexity is a measure of the time an algorithm takes to complete its task.

Space complexity: It is the maximum storage space required at any point during the search, as a function of the complexity of the problem.
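The terminologies listed above can be gathered into a single small sketch; the class, the tiny example graph and its costs are invented for illustration and are not prescribed by the text.

```python
# A minimal search-problem definition: search space, start state, goal test,
# actions, transition model and path cost.
class SearchProblem:
    def __init__(self, start, goals, graph):
        self.start = start          # start state
        self.goals = goals          # set of goal states
        self.graph = graph          # state -> list of (action, next_state, step_cost)

    def goal_test(self, state):          # observes a state, reports goal or not
        return state in self.goals

    def actions(self, state):            # actions available in a state
        return [a for a, _, _ in self.graph.get(state, [])]

    def result(self, state, action):     # transition model: what an action does
        for a, nxt, _ in self.graph[state]:
            if a == action:
                return nxt

    def step_cost(self, state, action):  # used to compute the path cost
        for a, _, c in self.graph[state]:
            if a == action:
                return c

graph = {"S": [("to-A", "A", 1), ("to-B", "B", 4)],
         "A": [("to-G", "G", 2)],
         "B": [("to-G", "G", 1)]}
problem = SearchProblem("S", {"G"}, graph)
print(problem.actions("S"), problem.goal_test("G"))
```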
Uninformed Search Algorithms

Uninformed search is a class of general-purpose search algorithms which operate in a brute-force way. Uninformed search algorithms do not have any additional information about the state or search space other than how to traverse the tree, so they are also called blind search.

Following are the various types of uninformed search algorithms:

1. Breadth-first search
2. Depth-first search
3. Depth-limited search
4. Iterative deepening depth-first search
5. Uniform cost search
6. Bidirectional search

1. Breadth-First Search:

• Breadth-first search is the most common search strategy for traversing a tree or graph. This algorithm searches breadthwise in a tree or graph, so it is called breadth-first search.
• The BFS algorithm starts searching from the root node of the tree and expands all successor nodes at the current level before moving to nodes of the next level.
• The breadth-first search algorithm is an example of a general-graph search algorithm.
• Breadth-first search is implemented using a FIFO queue data structure.

Advantages:

• BFS will provide a solution if any solution exists.
• If there is more than one solution for a given problem, BFS will provide the minimal solution, i.e. the one which requires the least number of steps.

Disadvantages:

• It requires lots of memory, since each level of the tree must be saved in memory to expand the next level.
• BFS needs lots of time if the solution is far away from the root node.
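A minimal sketch of breadth-first search with a FIFO queue follows; the small example graph is invented for illustration.

```python
# Breadth-first search: expand the shallowest node first using a FIFO queue.
from collections import deque

def bfs(graph, start, goal):
    frontier = deque([[start]])          # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()        # shallowest path comes out first
        node = path[-1]
        if node == goal:
            return path                  # minimal solution in number of steps
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None                          # no solution exists

graph = {"S": ["A", "B"], "A": ["G"], "B": ["G"]}
print(bfs(graph, "S", "G"))              # ['S', 'A', 'G']
```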
Depth-First Search

• Depth-first search is a recursive algorithm for traversing a tree or graph data structure.
• It is called depth-first search because it starts from the root node and follows each path to its greatest depth node before moving to the next path.
• DFS uses a stack data structure for its implementation.
• The process of the DFS algorithm is similar to that of the BFS algorithm.

Note: Backtracking is an algorithmic technique for finding all possible solutions using recursion.

Advantages:

• DFS requires very little memory, as it only needs to store a stack of the nodes on the path from the root node to the current node.
• It takes less time to reach the goal node than the BFS algorithm (if it traverses the right path).

Disadvantages:

• There is the possibility that many states keep re-occurring, and there is no guarantee of finding a solution.
• The DFS algorithm goes for deep-down searching, and sometimes it may go into an infinite loop.

Depth-Limited Search Algorithm:

A depth-limited search algorithm is similar to depth-first search with a predetermined limit. Depth-limited search can solve the drawback of the infinite path in depth-first search. In this algorithm, a node at the depth limit is treated as if it has no further successor nodes.

Depth-limited search can be terminated with two conditions of failure:

• Standard failure value: It indicates that the problem does not have any solution.
• Cutoff failure value: It indicates that there is no solution for the problem within the given depth limit.

Advantages:

• Depth-limited search is memory efficient.

Disadvantages:

• Depth-limited search also has the disadvantage of incompleteness.
• It may not be optimal if the problem has more than one solution.

Uniform-Cost Search Algorithm:

Uniform-cost search is a searching algorithm used for traversing a weighted tree or graph. This algorithm comes into play when a different cost is available for each edge. The primary goal of uniform-cost search is to find a path to the goal node which has the lowest cumulative cost. Uniform-cost search expands nodes according to their path costs from the root node. It can be used to solve any graph or tree where the optimal cost is in demand. A uniform-cost search algorithm is implemented using a priority queue, which gives maximum priority to the lowest cumulative cost. Uniform-cost search is equivalent to the BFS algorithm if the path cost of all edges is the same.
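The uniform-cost search just described can be sketched with a priority queue (Python's heapq); the weighted example graph is invented for illustration.

```python
# Uniform-cost search: always expand the node with the lowest cumulative path cost.
import heapq

def uniform_cost_search(graph, start, goal):
    frontier = [(0, start, [start])]             # (path_cost, node, path)
    best_cost = {start: 0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path                    # lowest cumulative-cost path
        for neighbor, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor, path + [neighbor]))
    return None

graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 5)], "B": [("G", 1)]}
print(uniform_cost_search(graph, "S", "G"))      # (5, ['S', 'B', 'G'])
```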

Informed Search Algorithms

So far we have talked about uninformed search algorithms, which look through the search space for all possible solutions to the problem without having any additional knowledge about the search space. An informed search algorithm, by contrast, has knowledge such as how far we are from the goal, the path cost, how to reach the goal node, etc. This knowledge helps agents explore less of the search space and find the goal node more efficiently.

Informed search algorithms are more useful for large search spaces. An informed search algorithm uses the idea of a heuristic, so it is also called heuristic search.

Heuristic function: A heuristic is a function which is used in informed search.


To solve the problem of building a system, you should take the following steps:
1. Define the problem accurately, including detailed specifications and what constitutes a suitable solution.
2. Scrutinize the problem carefully, for some features may have a central effect on the chosen method of solution.
3. Segregate and represent the background knowledge needed in the solution of the problem.
4. Choose the best problem-solving techniques for the problem.

Problem solving is a process of generating solutions from observed data.
• A 'problem' is characterized by a set of goals,
• a set of objects, and
• a set of operations.
These could be ill-defined and may evolve during problem solving.
• A 'problem space' is an abstract space. A problem space encompasses all valid states that can be generated by the application of any combination of operators on any combination of objects. The problem space may contain one or more solutions. A solution is a combination of operations and objects that achieve the goals.
• A 'search' refers to the search for a solution in a problem space. Search proceeds with different types of 'search control strategies'. Depth-first search and breadth-first search are the two common search strategies.

2.1 AI - General Problem Solving

Problem solving has been the key area of concern for Artificial Intelligence. Problem solving is a process of generating solutions from observed or given data. It is, however, not always possible to use direct methods (i.e. to go directly from data to solution). Instead, problem solving often needs to use indirect or model-based methods.

General Problem Solver (GPS) was a computer program created in 1957 by Simon and Newell to build a universal problem solver machine. GPS was based on Simon and Newell's theoretical work on logic machines. GPS in principle can solve any formalized symbolic problem, such as theorem proving, geometric problems and chess playing. GPS solved many simple problems, such as the Towers of Hanoi, that could be sufficiently formalized, but GPS could not solve any real-world problems.

To build a system to solve a particular problem, we need to:
• Define the problem precisely – find the input situations as well as the final situations for an acceptable solution to the problem.
• Analyze the problem – find the few important features that may have an impact on the appropriateness of various possible techniques for solving the problem.
• Isolate and represent the task knowledge necessary to solve the problem.
• Choose the best problem-solving technique(s) and apply them to the particular problem.

Problem definitions

A problem is defined by its 'elements' and their 'relations'. To provide a formal description of a problem, we need to do the following:
a. Define a state space that contains all the possible configurations of the relevant objects, including some impossible ones.
b. Specify one or more states that describe possible situations from which the problem-solving process may start. These states are called initial states.
c. Specify one or more states that would be an acceptable solution to the problem. These states are called goal states.

2.3 DEFINING PROBLEM AS A STATE SPACE SEARCH

To solve the problem of playing a game, we require the rules of the game and targets for winning, as well as a way of representing positions in the game. The opening position can be defined as the initial state and a winning position as a goal state. Moves from the initial state to other states leading to the goal state follow legally. However, the rules are far too abundant in most games -- especially in chess, where they exceed the number of particles in the universe. Thus, the rules cannot be supplied accurately, and computer programs cannot handle them easily. Storage also presents another problem, but searching can be achieved by hashing.

The number of rules that are used must be minimized, and the set can be created by expressing each rule in as general a form as possible. The representation of games leads to a state space representation, and it is common for well-organized games with some structure. This representation allows for the formal definition of a problem that requires movement from a set of initial positions to one of a set of target positions. It means that the solution involves using known techniques and a systematic search. This is quite a common method in Artificial Intelligence.

2.3.1 State Space Search

A state space represents a problem in terms of states and operators that change states. A state space consists of:
• A representation of the states the system can be in. For example, in a board game, the board represents the current state of the game.
• A set of operators that can change one state into another state. In a board game, the operators are the legal moves from any given state. Often the operators are represented as programs that change a state representation to represent the new state.
• An initial state.
• A set of final states; some of these may be desirable, others undesirable. This set is often represented implicitly by a program that detects terminal states.

PRODUCTION SYSTEMS

Production systems provide appropriate structures for performing and describing search processes. A production system has four basic components, as enumerated below:
• A set of rules, each consisting of a left side that determines the applicability of the rule and a right side that describes the operation to be performed if the rule is applied.
• A database of current facts established during the process of inference.
• A control strategy that specifies the order in which the rules will be compared with the facts in the database and also specifies how to resolve conflicts in the selection of several rules or the selection of more facts.
• A rule firing module.

The production rules operate on the knowledge database. Each rule has a precondition that is either satisfied or not by the knowledge database. If the precondition is satisfied, the rule can be applied. Application of the rule changes the knowledge database. The control system chooses which applicable rule should be applied and ceases computation when a termination condition on the knowledge database is satisfied.
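A production system of this kind can be sketched in a few lines; the rules, the facts and the very simple control strategy below are invented for illustration and are not taken from the text.

```python
# Minimal production-system sketch: rules (left side / right side), a database
# of facts, a control strategy (first applicable rule fires) and rule firing.
rules = [
    (lambda db: "wet_ground" in db and "open_field" in db,   # left side: applicability
     lambda db: db.add("slippery")),                         # right side: operation
    (lambda db: "raining" in db,
     lambda db: db.add("wet_ground")),
]

def run(rules, database, max_steps=10):
    for _ in range(max_steps):                 # termination condition: step limit
        fired = False
        for condition, action in rules:        # control strategy: first match fires
            before = set(database)
            if condition(database):
                action(database)               # firing the rule changes the database
                if database != before:
                    fired = True
                    break
        if not fired:                          # no rule changed anything: stop
            break
    return database

print(run(rules, {"raining", "open_field"}))
# {'raining', 'open_field', 'wet_ground', 'slippery'}
```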
Tree structure - A tree is a way of organizing objects that are related in a hierarchical fashion.

• A tree is a type of data structure in which each element is attached to one or more elements directly beneath it.
• The connections between elements are called branches.
• Trees are often called inverted trees because they are drawn with the root at the top.
• The elements that have no elements below them are called leaves.
• A binary tree is a special type: each element has at most two branches below it.

Properties
• A tree is a special case of a graph.
• The topmost node in a tree is called the root node.
• At the root node all operations on the tree begin.
• A node has at most one parent.
• The topmost node (root node) has no parents.
• Each node has zero or more child nodes, which are below it.

• The nodes at the bottommost level of the tree are called leaf nodes. search algorithms is illustrated below. The representation begins with two types of search:
Since leaf nodes are at the bottom most level, they do not have children. • Uninformed Search: Also called blind, exhaustive or brute-force search, it uses no
• A node that has a child is called the child’s parent node. information about the problem to guide the search and therefore may not be very efficient.
• The depth of a node n is the length of the path from the root to the node. • Informed Search: Also called heuristic or intelligent search, this uses information about the
• The root node is at depth zero. problem to guide the search—usually guesses the distance to a goal state and is therefore
efficient, but the search may not be always possible.
CHARACTERISTICS OF PRODUCTION SYSTEMS
Production systems provide us with good ways of describing the operations that can be Heuristics
performed in a search for a solution to a problem. A heuristic is a method that improves the efficiency of the search process. These are like tour
At this time, two questions may arise: guides. There are good to the level that they may neglect the points in general interesting
1. Can production systems be described by a set of characteristics? And how can they be directions; they are bad to the level that they may neglect points of interest to particular
easily implemented? individuals. Some heuristics help in the search process without sacrificing any claims to entirety
2. What relationships are there between the problem types and the types of production that the process might previously had. Others may occasionally cause an excellent path to be
systems well suited for solving the problems? overlooked. By sacrificing entirety it increases efficiency. Heuristics may not find the best
To answer these questions, first consider the following definitions of classes of production 36
systems: solution every time but guarantee that they find a good solution in a reasonable time. These are
1. A monotonic production system is a production system in which the application of a particularly useful in solving tough and complex problems, solutions of which would require
rule never prevents the later application of another rule that could also have been infinite time, i.e. far longer than a lifetime for the problems which are not solved in any other
applied at the time the first rule was selected. way.
2. A non-monotonic production system is one in which this is not true. Heuristic search
3. A partially communicative production system is a production system with the To find a solution in proper time rather than a complete solution in unlimited time we use
property that if the application of a particular sequence of rules transforms state P into heuristics. ‘A heuristic function is a function that maps from problem state descriptions to
state Q, then any combination of those rules that is allowable also transforms state P measures of desirability, usually represented as numbers’. Heuristic search methods use
into state Q. knowledge about the problem domain and choose promising operators first. These heuristic
4. A commutative production system is a production system that is both monotonic and search methods use heuristic functions to evaluate the next state towards the goal state. For
partially commutative. finding a solution, by using the heuristic technique, one should carry out the following steps:
HEURISTIC SEARCH TECHNIQUES: 1. Add domain—specific information to select what is the best path to continue searching along.
Search Algorithms 2. Define a heuristic function h(n) that estimates the ‘goodness’ of a node n.
Many traditional search algorithms are used in AI applications. For complex problems, the Specifically, h(n) = estimated cost(or distance) of minimal cost path from n to a goal state.
traditional algorithms are unable to find the solutions within some practical time and space 3. The term, heuristic means ‘serving to aid discovery’ and is an estimate, based on domain
limits. Consequently, many special techniques are developed, using heuristic functions. specific information that is computable from the current state description of how close we are to
The algorithms that use heuristic functions are called heuristic algorithms. a goal.
• Heuristic algorithms are not really intelligent; they appear to be intelligent because they Finding a route from one city to another city is an example of a search problem in which
achieve better performance. different search orders and the use of heuristic knowledge are easily understood.
• Heuristic algorithms are more efficient because they take advantage of feedback from the data 1. State: The current city in which the traveller is located.
to direct the search path. 2. Operators: Roads linking the current city to other cities.
• Uninformed search algorithms or Brute-force algorithms, search through the search space all 3. Cost Metric: The cost of taking a given road between cities.
possible candidates for the solution checking whether each candidate satisfies the problem’s 4. Heuristic information: The search could be guided by the direction of the goal city from the
statement. current city, or we could use airline distance as an estimate of the distance to the goal.
• Informed search algorithms use heuristic functions that are specific to the problem, apply Heuristic search techniques
them to guide the search through the search space to try to reduce the amount of time spent in For complex problems, the traditional algorithms, presented above, are unable to find the
searching. solution within some practical time and space limits. Consequently, many special techniques are
A good heuristic will make an informed search dramatically outperform any uninformed search: developed, using heuristic functions.
for example, the Traveling Salesman Problem (TSP), where the goal is to find is a good solution • Blind search is not always possible, because it requires too much time or Space (memory).
instead of finding the best solution. Heuristics are rules of thumb; they do not guarantee a solution to a problem.
In such problems, the search proceeds using current information about the problem to predict • Heuristic Search is a weak technique but can be effective if applied correctly; it requires
which path is closer to the goal and follow it, although it does not always guarantee to find the domain specific information.
best possible solution. Such techniques help in finding a solution within reasonable time and Characteristics of heuristic search
space (memory). Some prominent intelligent search algorithms are stated below: • Heuristics are knowledge about domain, which help search and reasoning in its domain.
1. Generate and Test Search • Heuristic search incorporates domain knowledge to improve efficiency over blind search.
2. Best-first Search • Heuristic is a function that, when applied to a state, returns value as estimated merit of state,
3. Greedy Search with respect to goal.
4. A* Search Heuristics might (for reasons) underestimate or overestimate the merit of a state with
5. Constraint Search respect to goal.
6. Means-ends analysis Heuristics that underestimate are desirable and called admissible.
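To make the route-finding heuristic above concrete, here is a small sketch (not from the original notes; the city names and coordinates are invented for illustration) of an h(n) based on straight-line (airline) distance to the goal city:

import math

# Hypothetical 2-D coordinates for a few cities (purely illustrative).
CITY_COORDS = {
    "A": (0.0, 0.0),
    "B": (3.0, 4.0),
    "Goal": (10.0, 2.0),
}

def h(city, goal="Goal"):
    """Heuristic h(n): straight-line (airline) distance from city to the goal.
    It never overestimates the road distance, so it is admissible."""
    (x1, y1), (x2, y2) = CITY_COORDS[city], CITY_COORDS[goal]
    return math.hypot(x2 - x1, y2 - y1)

print(h("A"), h("B"))   # estimated cost-to-goal for cities A and B

Such a function plugs directly into the informed search algorithms listed above.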
There are some more algorithms. They are either improvements or combinations of these. • Heuristic evaluation function estimates likelihood of given state leading to goal state.
• Hierarchical Representation of Search Algorithms: A Hierarchical representation of most • Heuristic search function estimates cost from current state to goal, presuming function is
efficient. 3. Stochastic hill climbing : It does not examine all the neighboring nodes before deciding
which node to select .It just selects a neighboring node at random, and decides (based on
Hill Climbing the amount of improvement in that neighbor) whether to move to that neighbor or to
Hill Climbing is heuristic search used for mathematical optimization problems in the field of examine another.
Artificial Intelligence . State Space diagram for Hill Climbing
Given a large set of inputs and a good heuristic function, it tries to find a sufficiently good State space diagram is a graphical representation of the set of states our search algorithm can
solution to the problem. This solution may not be the global optimal maximum. reach vs the value of our objective function(the function which we wish to maximize).
In the above definition, mathematical optimization problems implies that hill climbing X-axis : denotes the state space ie states or configuration our algorithm may reach.
solves the problems where we need to maximize or minimize a given real function by Y-axis : denotes the values of objective function corresponding to to a particular state.
choosing values from the given inputs. Example-Travelling salesman problem where we The best solution will be that state space where objective function has maximum value(global
need to minimize the distance traveled by salesman. maximum).
‘Heuristic search’ means that this search algorithm may not find the optimal solution to
the problem. However, it will give a good solution in reasonable time. Best First Search (Informed Search)
A heuristic function is a function that will rank all the possible alternatives at any In BFS and DFS, when we are at a node, we can consider any of the adjacent as next
branching step in search algorithm based on the available information. It helps the node. So both BFS and DFS blindly explore paths without considering any cost function. The
algorithm to select the best route out of possible routes. idea of Best First Search is to use an evaluation function to decide which adjacent is most
Features of Hill Climbing promising and then explore. Best First Search falls under the category of Heuristic Search or
1. Variant of generate and test algorithm : It is a variant of generate and test algorithm. The Informed Search.
generate and test algorithm is as follows : We use a priority queue to store costs of nodes. So the implementation is a variation of BFS, we
1. Generate a possible solutions. just need to change Queue to PriorityQueue.
2. Test to see if this is the expected solution.
3. If the solution has been found quit else go to step 1. Searching And-Or graphs
Hence we call Hill climbing as a variant of generate and test algorithm as it takes the feedback The DFS and BFS strategies for OR trees and graphs can be adapted for And-Or trees
from test procedure. Then this feedback is utilized by the generator in deciding the next move in The main difference lies in the way termination conditions are determined, since all
search space. goals following an And node must be realized, whereas a single goal node following
2. Uses the Greedy approach : At any point in state space, the search moves in that direction an Or node will do
only which optimizes the cost of function with the hope of finding the optimal solution at A more general optimal strategy is AO* (O for ordered) algorithm
the end. As in the case of the A* algorithm, we use the open list to hold nodes that have been
Types of Hill Climbing generated but not expanded and the closed list to hold nodes that have been expanded
1. Simple Hill climbing : It examines the neighboring nodes one by one and selects the first The algorithm is a variation of the original given by Nilsson
neighboring node which optimizes the current cost as next node. It requires that nodes traversed in the tree be labeled as solved or unsolved in the
Algorithm for Simple Hill climbing : solution process to account for And node solutions which require solutions to all
Step 1 : Evaluate the initial state. If it is a goal state then stop and return success. Otherwise, successors nodes.
make initial state as current state. A solution is found when the start node is labeled as solved
Step 2 : Loop until the solution state is found or there are no new operators present which can be
applied to current state. A* Search Algorithm
a) Select a state that has not been yet applied to the current state and apply it to produce a new A* is a type of search algorithm. Some problems can be solved by representing the world in the
state. initial state, and then for each action we can perform on the world we generate states for what the
b) Perform these to evaluate new state world would be like if we did so. If you do this until the world is in the state that we specified as
i. If the current state is a goal state, then stop and return success. a solution, then the route from the start to this goal state is the solution to your problem.
ii. If it is better than the current state, then make it current state and proceed further.
iii. If it is not better than the current state, then continue in the loop until a solution is found. AO* Search: (And-Or) Graph
Step 3 : Exit. The Depth first search and Breadth first search given earlier for OR trees or graphs can be easily
2. Steepest-Ascent Hill climbing : It first examines all the neighboring nodes and then determined, since all goals following an AND nodes must be realized; where as a single goal
selects the node closest to the solution state as next node. node following an OR node will do. So for this purpose we are using AO* algorithm.
Step 1 : Evaluate the initial state. If it is goal state then exit else make the current state as initial Like A* algorithm here we will use two arrays and one heuristic function.
state OPEN:
Step 2 : Repeat these steps until a solution is found or current state does not change It contains the nodes that has been traversed but yet not been marked solvable or unsolvable.
i. Let ‘target’ be a state such that any successor of the current state will be better than it; CLOSE:
ii. for each operator that applies to the current state It contains the nodes that have already been processed.
a. apply the new operator and create a new state 6 7:The distance from current node to goal node.
b. evaluate the new state Algorithm:
c. if this state is goal state then quit else compare with ‘target’ Step 1: Place the starting node into OPEN.
d. if this state is better than ‘target’, set this state as ‘target’ Step 2: Compute the most promising solution tree say T0.
e. if target is better than current state set current state to Target Step 3: Select a node n that is both on OPEN and a member of T0. Remove it from OPEN and
Step 3 : Exit place it in
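As a rough sketch of the hill-climbing loops above (assuming two helper callables, neighbors and value, that stand in for the move generator and the objective or heuristic evaluation; neither is defined in the notes):

def steepest_ascent_hill_climbing(start, neighbors, value, max_steps=1000):
    """Steepest-ascent hill climbing: move to the best neighbour at each step and
    stop when no neighbour improves on the current state (a local maximum)."""
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=value, default=None)
        if best is None or value(best) <= value(current):
            return current      # no improving neighbour; may be only a local maximum
        current = best
    return current

# Toy usage: maximise -(x - 3)^2 over integer states reachable by +/-1 moves.
print(steepest_ascent_hill_climbing(
    0,
    neighbors=lambda x: [x - 1, x + 1],
    value=lambda x: -(x - 3) ** 2))     # -> 3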

CLOSE explanation and confirmation. All of them are intimately related to problems of belief
Step 4: If n is the terminal goal node then label n as solved and label all the ancestors of n revision and theory development, knowledge absorption, discovery, and learning.
as solved. If the starting node is marked as solved then success and exit. Logical Reasoning
Step 5: If n is not a solvable node, then mark n as unsolvable. If starting node is marked as Logic is a language for reasoning. It is a collection of rules called Logic arguments, we
unsolvable, then return failure and exit. use when doing logical reasoning.
Step 6: Expand n. Find all its successors and find their h (n) value, push them into OPEN. The logic reasoning is the process of drawing conclusions from premises using rules of
Step 7: Return to Step 2. inference.
Step 8: Exit. The study of logic divided into formal and informal logic. The formal logic is sometimes
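For comparison with the AO* steps above, here is a minimal A* sketch for ordinary OR graphs, ordering the OPEN list by f(n) = g(n) + h(n); the graph, costs and heuristic below are assumptions made only for the example:

import heapq

def a_star(start, goal, successors, h):
    """A*: always expand the OPEN node with the smallest f = g + h.
    successors(n) yields (neighbour, step_cost) pairs; h(n) estimates cost to goal."""
    open_list = [(h(start), 0, start, [start])]        # entries are (f, g, node, path)
    closed = set()
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        if node in closed:
            continue
        closed.add(node)
        for nxt, cost in successors(node):
            if nxt not in closed:
                heapq.heappush(open_list, (g + cost + h(nxt), g + cost, nxt, path + [nxt]))
    return None, float("inf")

# Tiny assumed graph: S -> A -> G is cheaper than the direct edge S -> G.
graph = {"S": [("A", 1), ("G", 10)], "A": [("G", 1)], "G": []}
print(a_star("S", "G", lambda n: graph[n], h=lambda n: 0))   # (['S', 'A', 'G'], 2)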
called symbolic logic.
MEANS - ENDS ANALYSIS:- Symbolic logic is the study of symbolic abstractions (construct) that capture the formal
Most of the search strategies either reason forward or backward; however, often a mixture of the features of logical inference by a formal system.
two directions is appropriate. Such a mixed strategy would make it possible to solve the major The formal system consists of two components, a formal language plus a set of inference
parts of the problem first and solve the smaller problems that arise when combining them together. rules.
Such a technique is called "Means - Ends Analysis". The formal system has axioms. Axiom is a sentence that is always true within the system.
The means -ends analysis process centers around finding the difference between current state and Sentences derived using the system’s axioms and rules of derivation called theorems.
goal state. The problem space of means - ends analysis has an initial state and one or more goal The Logical Reasoning is of our concern in AI.
state, a set of operators with a set of preconditions for their application, and difference functions that Approaches to Reasoning
compute the difference between two states s(i) and s(j). A problem is solved using means - ends There are three different approaches to reasoning under uncertainties.
analysis by 1. Symbolic reasoning
2. Statistical reasoning
Propositional Resolution 3. Fuzzy logic reasoning
1. Convert all the propositions of F to clause form. Symbolic Reasoning
2. Negate P and convert the result to clause form. Add it to the set of clauses obtained in The basis for intelligent mathematical software is the integration of the “power of
step 1. symbolic mathematical tools” with the suitable “proof technology”.
3. Repeat until either a contradiction is found or no progress can be made: Mathematical reasoning enjoys a property called monotonicity, that says, “If a conclusion
1. Select two clauses. Call these the parent clauses. follows from given premises A, B, C… then it also follows from any larger set of
2. Resolve them together. The resulting clause, called the resolvent, will be the premises, as long as the original premises A, B, C.. included.”
disjunction of all of the literals of both of the parent clauses with the following Moreover, Human reasoning is not monotonic.
exception: If there are any pairs of literals L and ¬ L such that one of the parent People arrive at conclusions only tentatively; based on partial or incomplete information,
clauses contains L and the other contains ¬L, then select one such pair and reserve the right to retract those conclusions while they learn new facts. Such reasoning
eliminate both L and ¬ L from the resolvent. non-monotonic, precisely because the set of accepted conclusions have become smaller
3. If the resolvent is the empty clause, then a contradiction has been found. If it is when the set of premises expanded.
not, then add it to the set of clauses available to the procedure.
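A minimal sketch of the resolution loop above, assuming clauses are represented as frozensets of literal strings and negation is written with a leading '~' (these conventions are mine, not the notes'):

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return every resolvent of two parent clauses (frozensets of literals)."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return resolvents

def resolution_refutation(clauses):
    """Resolve pairs of clauses until the empty clause appears or no progress is made."""
    clauses = set(clauses)
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a is not b:
                    for r in resolve(a, b):
                        if not r:
                            return True      # empty clause: contradiction found
                        new.add(r)
        if new <= clauses:
            return False                     # no progress can be made
        clauses |= new

# Prove P from (P or Q) and (not Q) by refuting not P:
kb = [frozenset({"P", "Q"}), frozenset({"~Q"}), frozenset({"~P"})]
print(resolution_refutation(kb))             # True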
The Unification Algorithm Game Playing
In propositional logic, it is easy to determine that two literals cannot both be true at the
same time. Charles Babbage, the nineteenth-century computer architect thought about programming
Simply look for L and ¬L in predicate logic, this matching process is more complicated his analytical engine to play chess and later of building a machine to play tic-tac-toe.
since the arguments of the predicates must be considered. There are two reasons that games appeared to be a good domain.
For example, man(John) and ¬man(John) is a contradiction, while the man(John) and 1. They provide a structured task in which it is very easy to measure success or
¬man(Spot) is not. failure.
Thus, in order to determine contradictions, we need a matching procedure that compares 2. They are easily solvable by straightforward search from the starting state to a
two literals and discovers whether there exists a set of substitutions that makes them winning position.
identical. The first is true for all games but the second is not true for all, except the simplest games.
There is a straightforward recursive procedure, called the unification algorithm, that does For example, consider chess.
it. The average branching factor is around 35. In an average game, each player might make
50.
Symbolic Reasoning So in order to examine the complete game tree, we would have to examine 35^100 positions.
The reasoning is the act of deriving a conclusion from certain properties using a given Thus it is clear that a simple search is not able to select even its first move during the
methodology. lifetime of its opponent.
The reasoning is a process of thinking; reasoning is logically arguing; reasoning is It is clear that to improve the effectiveness of a search based problem-solving program
drawing the inference. two things can do.
When a system is required to do something, that it has not been explicitly told how to do, 1. Improve the generate procedure so that only good moves generated.
it must reason. It must figure out what it needs to know from what it already knows. 2. Improve the test procedure so that the best move will recognize and explored first.
81 If we use legal-move generator then the test procedure will have to look at each of them
Many types of Reasoning have been identified and recognized, but many questions because the test procedure must look at so many possibilities, it must be fast.
regarding their logical and computational properties still remain controversial. Instead of the legal-move generator, we can use plausible-move generator in which only
The popular methods of Reasoning include abduction, induction, model-based, some small numbers of promising moves generated.
As the number of legal moves available increases, it becomes increasingly important What is NLP?
in applying heuristics to select only those moves that seem more promising. Natural language processing (NLP) can be de_ned as the automatic (or semi-automatic) processing of
The performance of the overall system can improve by adding heuristic knowledge into human language.
both the generator and the tester. The term `NLP' is sometimes used rather more narrowly than that, often excluding information retrieval
In game playing, a goal state is one in which we win but the game like chess. It is not and sometimes even excluding machine translation. NLP is sometimes contrasted with `computational
possible. Even we have good plausible move generator. linguistics', with NLP being thought of as more applied. Nowadays, alternative terms are often preferred,
The depth of the resulting tree or graph and its branching factor is too great. like `Language Technology' or `Language Engineering'. Language is often used in contrast with speech
It is possible to search tree only ten or twenty moves deep then in order to choose the best (e.g., Speech and Language Technology). But I'm going to simply refer to NLP and use the term
move. The resulting board positions must compare to discover which is most broadly.
advantageous. NLP is essentially multidisciplinary: it is closely related to linguistics (although the extent to which NLP
This is done using static evolution function, which uses whatever information it has to overtly draws on linguistic theory varies considerably). It also has links to research in cognitive science,
evaluate individual board position by estimating how likely they are to lead eventually to psychology, philosophy and maths (especially logic). Within CS, it relates to formal language theory,
a win. compiler techniques, theorem proving, machine learning and human-computer interaction. Of course it
Its function is similar to that of the heuristic function h’ in the A* algorithm: in the is also related to AI, though nowadays it's not generally thought of as part of AI.
absence of complete information, choose the most promising position.
MINIMAX Search Procedure Some linguistic terminology
The minimax search is a depth-first and depth limited procedure. The course is organised so that there are six lectures corresponding to different NLP subareas, moving
The idea is to start at the current position and use the plausible-move generator to from relatively
generate the set of possible successor positions. `shallow' processing to areas which involve meaning and connections with the real world. These
93 subareas loosely
Now we can apply the static evolution function to those positions and simply choose the correspond to some of the standard subdivisions of linguistics:
best one. 1. Morphology: the structure of words. For instance, unusually can be thought of as composed of a
After doing so, we can back that value up to the starting position to represent our pre_x un-, a
evolution of it. stem usual, and an af_x -ly. composed is compose plus the in_ectional af_x -ed: a spelling rule means
Here we assume that static evolution function returns larger values to indicate good we end up with composed rather than composeed. Morphology will be discussed in lecture 2.
situations for us. 2. Syntax: the way words are used to form phrases. e.g., it is part of English syntax that a determiner
So our goal is to maximize the value of the static evaluation function of the next board such as the will come before a noun, and also that determiners are obligatory with certain singular
position. nouns. Formal and computational aspects of syntax will be discussed in lectures 3, 4 and 5.
The opponents’ goal is to minimize the value of the static evaluation function. 3. Semantics. Compositional semantics is the construction of meaning (generally expressed as logic)
The alternation of maximizing and minimizing at alternate ply when evaluations are based on syntax. Compositional semantics is discussed in lecture 5. This is contrasted to lexical
to be pushed back up corresponds to the opposing strategies of the two players is semantics, i.e., the meaning of individual words, which is discussed in lecture 6.
called MINIMAX. 4. Pragmatics: meaning in context. This will come into lecture 7, although linguistics and NLP generally
It is the recursive procedure that depends on two procedures have very different perspectives here.
MOVEGEN(position, player)— The plausible-move generator, which returns a
list of nodes representing the moves that can make by Player in Position. Introduction to Natural Language Processing
STATIC(position, player)– static evaluation function, which returns a number Language meant for communicating with the world.
representing the goodness of Position from the standpoint of Player. Also, By studying language, we can come to understand more about the world.
With any recursive program, we need to decide when recursive procedure should stop. If we can succeed at building computational mode of language, we will have a powerful
There are the variety of factors that may influence the decision they are, tool for communicating with the world.
Has one side won? Also, We look at how we can exploit knowledge about the world, in combination with
How many plies have we already explored? Or how much time is left? linguistic facts, to build computational natural language systems.
How stable is the configuration? Natural Language Processing (NLP) problem can divide into two tasks:
We use DEEP-ENOUGH which assumed to evaluate all of these factors and to return 1. Processing written text, using lexical, syntactic and semantic knowledge of the language
TRUE if the search should be stopped at the current level and FALSE otherwise. as well as the required real-world information.
It takes two parameters, position, and depth, it will ignore its position parameter and 2. Processing spoken language, using all the information needed above plus additional
simply return TRUE if its depth parameter exceeds a constant cut off value. knowledge about phonology as well as enough added information to handle the further
One problem that arises in defining MINIMAX as a recursive procedure is that it needs to ambiguities that arise in speech.
return not one but two results. Steps in Natural Language Processing
The backed-up value of the path it chooses. Morphological Analysis
The path itself. We return the entire path even though probably only the first Individual words analyzed into their components and non-word tokens such as
element, representing the best move from the current position, actually needed. punctuation separated from the words.
We assume that MINIMAX returns a structure containing both results and we have two Syntactic Analysis
functions, VALUE and PATH that extract the separate components. Linear sequences of words transformed into structures that show how the words relate to
Initially, It takes three parameters, a board position, the current depth of the search, and each other.
the player to move, Moreover, Some word sequences may reject if they violate the language’s rule for how
MINIMAX(current,0,player-one) If player –one is to move words may combine.
MINIMAX(current,0,player-two) If player –two is to move Semantic Analysis
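A compact recursive sketch of the MINIMAX procedure just described. MOVEGEN, STATIC and DEEP-ENOUGH are passed in as plain functions (their concrete behaviour is game-specific and assumed here), and, as in the text, the procedure returns both the backed-up value and the path. The sketch uses the negamax formulation, in which STATIC always scores the position from the standpoint of the player to move:

def minimax(position, depth, player, movegen, static, deep_enough, opponent):
    """Return (backed-up value, best path) from position for player."""
    if deep_enough(position, depth):
        return static(position, player), [position]
    successors = movegen(position, player)
    if not successors:
        return static(position, player), [position]
    best_value, best_path = float("-inf"), [position]
    for succ in successors:
        value, path = minimax(succ, depth + 1, opponent(player),
                              movegen, static, deep_enough, opponent)
        value = -value                      # what is good for the opponent is bad for us
        if value > best_value:
            best_value, best_path = value, [position] + path
    return best_value, best_path

# Toy game: a position is a number, a move adds 1 or 2, search two plies deep.
value, path = minimax(
    0, 0, "max",
    movegen=lambda pos, player: [pos + 1, pos + 2],
    static=lambda pos, player: pos if player == "max" else -pos,
    deep_enough=lambda pos, depth: depth >= 2,
    opponent=lambda player: "min" if player == "max" else "max")
print(value, path)                          # 3 [0, 2, 3]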

The structures created by the syntactic analyzer assigned meanings. parsing is done, on the other hand, it constrains the number of constituents that
Also, A mapping made between the syntactic structures and objects in the task domain. semantics can consider.
Moreover, Structures for which no such mapping possible may reject. 2. Syntactic parsing is computationally less expensive than is semantic processing.
Discourse integration Thus it can play a significant role in reducing overall system complexity.
The meaning of an individual sentence may depend on the sentences that precede it. And Although it is often possible to extract the meaning of a sentence without using
also, may influence the meanings of the sentences that follow it. grammatical facts, it is not always possible to do so.
Pragmatic Analysis Almost all the systems that are actually used have two main components:
Moreover, The structure representing what said reinterpreted to determine what was 1. A declarative representation, called a grammar, of the syntactic facts about the
actually meant. language.
Summary 2. A procedure, called parser that compares the grammar against input sentences to
Results of each of the main processes combine to form a natural language system. produce parsed structures.
All of the processes are important in a complete natural language understanding system.
Not all programs are written with exactly these components. Semantic Analysis
Sometimes two or more of them collapsed. The structures created by the syntactic analyzer assigned meanings.
Doing that usually results in a system that is easier to build for restricted subsets of A mapping made between the syntactic structures and objects in the task domain.
English but one that is harder to extend to wider coverage. Structures for which no such mapping is possible may rejected.
The semantic analysis must do two important things:
Steps Natural Language Processing It must map individual words into appropriate objects in the knowledge base or
Morphological Analysis database.
Suppose we have an English interface to an operating system and the following sentence It must create the correct structures to correspond to the way the meanings of the
typed: I want to print Bill’s .init file. individual words combine with each other. Semantic Analysis AI
The morphological analysis must do the following things: Producing a syntactic parse of a sentence is only the first step toward understanding it.
Pull apart the word “Bill’s” into proper noun “Bill” and the possessive suffix “’s” We must produce a representation of the meaning of the sentence.
Recognize the sequence “.init” as a file extension that is functioning as an adjective in the Because understanding is a mapping process, we must first define the language into
sentence. which we are trying to map.
This process will usually assign syntactic categories to all the words in the sentence. There is no single definitive language in which all sentence meaning can describe.
Morphology concerns the structure of words. Words are assumed to be made up of morphemes, which The choice of a target language for any particular natural language understanding
are the minimal information carrying unit. Morphemes which can only occur in conjunction with other program must depend on what is to do with the meanings once they constructed.
morphemes are af_xes: words are made up of a stem (more than one in the case of compounds) and Choice of the target language in Semantic Analysis AI
zero or more af_xes. For instance, dog is a stem which may occur with the plural suf_x +s i.e., dogs. There are two broad families of target languages that used in NL systems,
English only has suf_xes (af_xes which come after a stem) and pre_xes (which come before the stem.in depending on the role that the natural language system playing in a larger system:
English these are limited to derivational morphology), but other languages have in_xes (af_xes which When natural language considered as a phenomenon on its own, as for example
occur inside the stem) and circum_xes (af_xes which go around a stem). For instance, when one builds a program whose goal is to read the text and then answer
Arabic has stems (root forms) such as k t b, which are combined with in_xes to form words (e.g., questions about it. A target language can design specifically to support language
kataba, he wrote; kotob, books). Some English irregular verbs show a relic of in_ection by in_xation processing.
(e.g. sing, sang, sung) but this process is no longer productive (i.e., it won't apply to any new words, When natural language used as an interface language to another program (such as
such as ping). a db query system or an expert system), then the target language must legal input
Syntactic Analysis to that other program. Thus the design of the target language driven by the
A syntactic analysis must exploit the results of the morphological analysis to build a backend program.
structural description of the sentence.
The goal of this process, called parsing, is to convert the flat list of words that form the Statistical Natural Language Processing
sentence into a structure that defines the units that represented by that flat list. Formerly, many language-processing tasks typically involved the direct hand coding of
The important thing here is that a flat sentence has been converted into a hierarchical rules, which is not in general robust to natural-language variation. The machine-learning
structure. And that the structure corresponds to meaning units when a semantic analysis paradigm calls instead for using statistical inference to automatically learn such rules through the
performed. analysis of large corpora of typical real-world examples (a corpus (plural, "corpora") is a set of
Reference markers (set of entities) shown in the parenthesis in the parse tree. documents, possibly with human or computer annotations).
Each one corresponds to some entity that has mentioned in the sentence. Many different classes of machine learning algorithms have been applied to natural-language
These reference markers are useful later since they provide a place in which to processing tasks. These algorithms take as input a large set of "features" that are generated from
accumulate information about the entities as we get it. the input data. Some of the earliest-used algorithms, such as decision trees, produced systems of
hard if-then rules similar to the systems of hand-written rules that were then common.
Syntactic Processing Increasingly, however, research has focused on statistical models, which make
Syntactic Processing is the step in which a flat input sentence converted into a soft, probabilistic decisions based on attaching real-valued weights to each input feature. Such
hierarchical structure that corresponds to the units of meaning in the sentence. This models have the advantage that they can express the relative certainty of many different possible
process called parsing. answers rather than only one, producing more reliable results when such a model is included as a
It plays an important role in natural language understanding systems for two reasons: component of a larger system.
1. Semantic processing must operate on sentence constituents. If there is no syntactic Systems based on machine-learning algorithms have many advantages over hand-produced rules:
parsing step, then the semantics system must decide on its own constituents. If 110
The learning procedures used during machine learning automatically focus on the most system performs very limited manipulation on the input to map to a known question. The same basic
common cases, whereas when writing rules by hand it is often not at all obvious where technique is used
the effort should be directed. in many online help systems.
Automatic learning procedures can make use of statistical inference algorithms to _ summarization
produce models that are robust to unfamiliar input (e.g. containing words or structures _ text segmentation
that have not been seen before) and to erroneous input (e.g. with misspelled words or _ exam marking
words accidentally omitted). Generally, handling such input gracefully with hand-written _ report generation (possibly multilingual)
rules—or more generally, creating systems of hand-written rules that make soft _ machine translation
decisions—is extremely difficult, error-prone and time-consuming. _ natural language interfaces to databases - Natural language interfaces were the `classic' NLP problem
Systems based on automatically learning the rules can be made more accurate simply by in the 70s and 80s. LUNAR is the classic example of a natural language interface to a database (NLID):
supplying more input data. However, systems based on hand-written rules can only be its database concerned lunar rock samples brought back from the Apollo missions. LUNAR is described
made more accurate by increasing the complexity of the rules, which is a much more by Woods (1978) (but note most of the work was done several years earlier): it
difficult task. In particular, there is a limit to the complexity of systems based on handcrafted was capable of translating elaborate natural language expressions into database queries.
rules, beyond which the systems become more and more unmanageable. _ email understanding
However, creating more data to input to machine-learning systems simply requires a _ dialogue systems
corresponding increase in the number of man-hours worked, generally without significant Several of these applications are discussed briefly below. Roughly speaking, they are ordered according
increases in the complexity of the annotation process. to the complexity
Spell Checking of the language technology required. The applications towards the top of the list can be seen simply as
Spell checking is one of the applications of natural language processing that impacts billions of aids to
users daily. A good introduction to spell checking can be found on Peter Norvig’s webpage. The human users, while those at the bottom are perceived as agents in their own right. Perfect performance
article introduces a simple 21-line spell checker implementation in Python combining simple on any of these
language and error models to predict the word a user intended to type. The language model applications would be AI-complete, but perfection isn't necessary for utility: in many cases, useful
estimates how likely a given word `c` is in the language for which the spell checker is versions of these
designed, this can be written as `P(C)`. The error model estimates the probability `P(w|c)` applications had been built by the late 70s. Commercial success has often been harder to achieve,
of typing the misspelled version `w` conditionally to the intention of typing the correctly however.
spelled word `c`.The spell checker then returns word `c` corresponding to the highest value of THE CHALLENGES OF NATURAL LANGUAGE
`P(w|c)P(c)` among all possible words in the language. PROCESSING
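A toy version of this noisy-channel idea (the corpus and the candidate generator below are deliberately tiny assumptions, and the error model P(w|c) is crudely treated as uniform over one-edit candidates, so this is far simpler than Norvig's actual program):

from collections import Counter

# Language model P(c): word frequencies from a tiny, made-up corpus.
CORPUS = "the quick brown fox jumps over the lazy dog the fox".split()
COUNTS = Counter(CORPUS)
TOTAL = sum(COUNTS.values())

def P(word):
    return COUNTS[word] / TOTAL

def edits1(word):
    """All strings one edit away from word (deletes, transposes, replaces, inserts)."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correction(w):
    """Return the known candidate c with the highest P(c)."""
    candidates = ({w} if w in COUNTS else set()) or (edits1(w) & COUNTS.keys()) or {w}
    return max(candidates, key=P)

print(correction("teh"), correction("foxx"))    # the fox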
Natural language processing (NLP) is the field of designing methods and algorithms that take
1.4 Some NLP applications as input or produce as output unstructured, natural language data. Human language is highly
The following list is not complete, but useful systems have been built for: ambiguous (consider the sentence I ate pizza with friends, and compare it to I ate pizza with
_ spelling and grammar checking olives), and also highly variable (the core message of I ate pizza with friends can also be expressed as
_ optical character recognition (OCR) friends and I shared some pizza). It is also ever changing and evolving. People are great at producing
_ screen readers for blind and partially sighted users language and understanding language, and are capable of expressing, perceiving, and interpreting
augmentative and alternative communication (i.e., systems to aid people who have dif_culty very elaborate and nuanced meanings. At the same time, while we humans are great users of
communicating because of disability) language, we are also very poor at formally understanding and describing the rules that govern
_ machine aided translation (i.e., systems which help a human translator, e.g., by storing translations of language.
phrases and providing online dictionaries integrated with word processors, etc) Understanding and producing language using computers is thus highly challenging. Indeed,
_ lexicographers' tools the best known set of methods for dealing with language data are using supervised machine learning
_ information retrieval - Information retrieval involves returning a set of documents in response to a algorithms, that attempt to infer usage patterns and regularities from a set of pre-annotated
user query: Internet search engines are a form of IR. However, one change from classical IR is that input and output pairs. Consider, for example, the task of classifying a document into one of four
Internet search now uses techniques that rank documents according to how many links there are to categories: S ,P ,G , and E . Obviously, the words in the
them (e.g., Google's PageRank) as well as the presence of search terms. documents
_ document classi_cation (_ltering, routing) provide very strong hints, but which words provide what hints? Writing up rules for this task is
_ document clustering rather challenging. However, readers can easily categorize a document into its topic, and then,
_ information extraction - Information extraction involves trying to discover speci_c information from a based on a few hundred human-categorized examples in each category, let a supervised machine
set of documents. The information required can be described as a template. For instance, for company learning algorithm come up with the patterns of word usage that help categorize the documents.
joint ventures, the template might have slots for Machine learning methods excel at problem domains where a good set of rules is very hard to
the companies, the dates, the products, the amount of money involved. The slot _llers are generally define but annotating the expected output for a given input is relatively simple.
strings. Besides the challenges of dealing with ambiguous and variable inputs in a system with illdefined
_ question answering - Question answering attempts to _nd a speci_c answer to a speci_c question from and unspecified set of rules, natural language exhibits an additional set of properties that
a set of documents, or at least a short piece of text that contains the answer. make it even more challenging for computational approaches, including machine learning: it is
What is the capital of France? discrete, compositional, and sparse.
Paris has been the French capital for many centuries. Language is symbolic and discrete. e basic elements of written language are characters.
There are some question-answering systems on theWeb, but most use very basic techniques. For Characters form words that in turn denote objects, concepts, events, actions, and ideas. Both
instance, Ask Jeeves characters and words are discrete symbols: words such as “hamburger” or “pizza” each evoke in us
relies on a fairly large staff of people who search the web to _nd pages which are answers to potential a certain mental representations, but they are also distinct symbols, whose meaning is external to
questions. The them and left to be interpreted in our heads.

NEURAL NETWORKS AND DEEP LEARNING The first international conference on Machine Translation (MT) was held in 1952 and second was held in 1956.
Deep learning is a branch of machine learning. It is a re-branded name for neural networks—a
family of learning techniques that was historically inspired by the way computation works in the In 1961, the work presented in Teddington International Conference on Machine Translation of Languages and
brain, and which can be characterized as learning of parameterized differentiable mathematical Applied Language analysis was the high point of this phase.
functions. The name deep learning stems from the fact that many layers of these differentiable
functions are often chained together. Second Phase (AI Influenced Phase) – Late 1960s to late 1970s
While all of machine learning can be characterized as learning to make predictions based In this phase, the work done was majorly related to world knowledge and on its role in the construction and
manipulation of meaning representations. That is why, this phase is also called AI-flavored phase.
on past observations, deep learning approaches work by learning to not only predict but also to
The phase had in it, the following:
correctly represent the data, such that it is suitable for prediction. Given a large set of desired
In early 1961, the work began on the problems of addressing and constructing data or knowledge base. This
inputoutput work was influenced by AI.
mapping, deep learning approaches work by feeding the data into a network that produces
successive transformations of the input data until a final transformation predicts the output. e In the same year, a BASEBALL question-answering system was also developed. The input to this system was
transformations produced by the network are learned from the given input-output mappings, restricted and the language processing involved was a simple one.
such that each transformation makes it easier to relate the data to the desired label.
A much advanced system was described in Minsky (1968). This system, when compared to the BASEBALL
Applications of morphological processing question-answering system, was recognized and provided for the need of inference on the knowledge base in
It is possible to use a full-form lexicon for English NLP: i.e., to list all the in_ected forms and to treat interpreting and responding to language input.
derivational morphology as non-productive. However, when a new word has to be treated (generally
because the application is expanded but in principle because a new word has entered the language) it is Third Phase (Grammatico-logical Phase) – Late 1970s to late 1980s
redundant to have to specify (or learn) the in_ected forms as well as the stem, since the vast majority This phase can be described as the grammatico-logical phase. Due to the failure of practical system building in last
of words in English have regular morphology. So a full-form lexicon is best regarded as a form of phase, the researchers moved towards the use of logic for knowledge representation and reasoning in AI.
compilation. Many other languages have many more in_ectional forms, which increases the need to do The third phase had the following in it:
The grammatico-logical approach, towards the end of decade, helped us with powerful general-purpose
morphological analysis rather than full-form listing.
sentence processors like SRI’s Core Language Engine and Discourse Representation Theory, which offered a means
IR systems use stemming rather than full morphological analysis. For IR, what is required is to relate
of tackling more extended discourse.
forms, not to analyse them compositionally, and this can most easily be achieved by reducing all
morphologically complex forms to a canonical form. Although this is referred to as stemming, the In this phase we got some practical resources & tools like parsers, e.g. Alvey Natural Language Tools along with
canonical form may not be the linguistic stem.The most commonly used algorithm is the Porter more operational and commercial systems, e.g. for database query.
stemmer, which uses a series of simple rules to strip endings (see J&M, section 3.4) without the need
for a lexicon. However, stemming does not necessarily help IR. Search engines sometimes do The work on lexicon in 1980s also pointed in the direction of grammatico-logical approach.
in_ectional morphology, but this can be dangerous. For instance, one search engine searches for corpus
as well as corpora when given the latter as input, resulting in a large number of spurious results Fourth Phase (Lexical & Corpus Phase) – The 1990s
involving Corpus Christi and similar terms. We can describe this as a lexical & corpus phase. The phase had a lexicalized approach to grammar that appeared
In most NLP applications, however, morphological analysis is a precursor to some form of parsing. In in late 1980s and became an increasing influence. There was a revolution in natural language processing in this
this case, the requirement is to analyse the form into a stem and af_xes so that the necessary syntactic decade with the introduction of machine learning algorithms for language processing.
(and possibly semantic) information can be associated with it. Morphological analysis is often called
lemmatization. For instance, for the part of speech tagging application which we will discuss in the next
lecture, mugged would be assigned a part of speech tag which indicates it is a verb, though mug is
ambiguous between verb and noun. For full parsing, as discussed in lectures 4 and 5, we'll need more
detailed syntactic and semantic information. Morphological generation takes a stem and some syntactic
information and returns the correct form. For some applications, there is a requirement that
morphological processing is bidirectional: that is, can be used for analysis and generation. The _nite
state transducers we will look at below have this property.
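As a rough illustration of the suffix stripping used by IR stemmers (far cruder than the real Porter stemmer, and purely an assumed toy):

def crude_stem(word):
    """Strip a few common endings. As with IR stemming, the result need not be
    a real linguistic stem; it only conflates related surface forms."""
    for suffix in ("ingly", "ings", "ing", "ied", "ies", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([crude_stem(w) for w in ["mugged", "dogs", "corpora", "unusually"]])
# ['mugg', 'dog', 'corpora', 'unusually']  -- note that 'mugg' is not a linguistic stem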

History of NLP
We have divided the history of NLP into four phases. The phases have distinctive concerns and styles.
First Phase (Machine Translation Phase) – Late 1940s to late 1960s
The work done in this phase focused mainly on machine translation (MT). This phase was a period of enthusiasm
and optimism.
Let us now see all that the first phase had in it:
The research on NLP started in early 1950s after Booth & Richens’ investigation and Weaver’s memorandum on
machine translation in 1949.

1954 was the year when a limited experiment on automatic translation from Russian to English demonstrated in
the Georgetown-IBM experiment.

In the same year, the publication of the journal MT (Machine Translation) started.
1 Introduction to Machine Learning y = wx + b (1)
Machine learning is a set of tools that, broadly speaking, allow us to “teach” computers how to where w is a weight and b is a bias. These two scalars are the parameters of the model, which
perform tasks by providing examples of how they should be done. For example, suppose we wish we would like to learn from training data. In particular, we wish to estimate w and b from the N
to write a program to distinguish between valid email messages and unwanted spam. We could try training pairs {(xi, yi)}, i = 1, ..., N.
to write a set of simple rules, for example, flagging messages that contain certain features (such Then, once we have values for w and b, we can compute the y for a
as the word “viagra” or obviously-fake headers). However, writing rules to accurately distinguish new x.
which text is valid can actually be quite difficult to do well, resulting either in many missed spam
messages, or, worse, many lost emails. Worse, the spammers will actively adjust the way they K-Nearest Neighbors
send spam in order to trick these strategies (e.g., writing “vi@gr@”). Writing effective rules — At heart, many learning procedures — especially when our prior knowledge is weak — amount
and keeping them up-to-date — quickly becomes an insurmountable task. Fortunately, machine to smoothing the training data. RBF fitting is an example of this. However, many of these fitting
learning has provided a solution. Modern spam filters are “learned” from examples: we provide the procedures require making a number of decisions, such as the locations of the basis functions, and
learning algorithm with example emails which we have manually labeled as “ham” (valid email) can be sensitive to these choices. This raises the question: why not cut out the middleman, and
or “spam” (unwanted email), and the algorithms learn to distinguish between them automatically. smooth the data directly? This is the idea behind K-Nearest Neighbors regression.
Machine learning is a diverse and exciting field, and there are multiple ways of defining it: The idea is simple. We first select a parameter K, which is the only parameter to the algorithm.
1. The Artifical Intelligence View. Learning is central to human knowledge and intelligence, Then, for a new input x, we find the K nearest neighbors to x in the training set, based on their
and, likewise, it is also essential for building intelligent machines. Years of effort in AI Euclidean distance ||x−xi||2. Then, our new output y is simply an average of the training outputs
has shown that trying to build intelligent computers by programming all the rules cannot be
done; automatic learning is crucial. For example, we humans are not born with the ability K-nearest neighbors is simple and easy to implement; it doesn’t require us to muck about at
to understand language — we learn it — and it makes sense to try to have computers learn all with different choices of basis functions or regularizations. However, it doesn’t compress the
language instead of trying to program it all it. data at all: we have to keep around the entire training set in order to use it, which could be very
2. The Software Engineering View. Machine learning allows us to program computers by expensive, and we must search the whole data set to make predictions. (The cost of searching
example, which can be easier than writing code the traditional way. can be mitigated with spatial data-structures designed for searching, such as k-d-trees and
3. The Stats View. Machine learning is the marriage of computer science and statistics: computational localitysensitive
techniques are applied to statistical problems. Machine learning has been applied hashing.
to a vast number of problems in many contexts, beyond the typical statistics problems. Machine
learning is often designed with different considerations than statistics (e.g., speed is Quadratics
often more important than accuracy). The objective functions used in linear least-squares and regularized least-squares are multidimensional
Often, machine learning methods are broken into two phases: quadratics. We now analyze multidimensional quadratics further. We will see many more
1. Training: A model is learned from a collection of training data. uses of quadratics further in the course, particularly when dealing with Gaussian distributions.
2. Application: The model is used to make decisions about some new test data. The general form of a one-dimensional quadratic is given by:
f(x) = w2 x^2 + w1 x + w0 (46)
Types of Machine Learning This can also be written in a slightly different way (called standard form):
Some of the main types of machine learning are: f(x) = a(x − b)2 + c (47)
1. Supervised Learning, in which the training data is labeled with the correct answers, e.g., where a = w2, b = −w1/(2w2), c = w0 − w1^2/(4w2). These two forms are equivalent, and it is
“spam” or “ham.” The two most common types of supervised learning are classification
(where the outputs are discrete labels, as in spam filtering) and regression (where the outputs easy to go back and forth between them (e.g., given a, b, c, what are w0,w1,w2?).
are real-valued).
2. Unsupervised learning, in which we are given a collection of unlabeled data, which we wish Basic Probability Theory
to analyze and discover patterns within. The two most important examples are dimension Probability theory addresses the following fundamental question: how do we reason? Reasoning
reduction and clustering. is central to many areas of human endeavor, including philosophy (what is the best way to make
3. Reinforcement learning, in which an agent (e.g., a robot or controller) seeks to learn the decisions?), cognitive science (how does the mind work?), artificial intelligence (how do we build
optimal actions to take based the outcomes of past actions. reasoning machines?), and science (how do we test and develop theories based on experimental
There are many other types of machine learning as well, for example: data?). In nearly all real-world situations, our data and knowledge about the world is incomplete,
1. Semi-supervised learning, in which only a subset of the training data is labeled indirect, and noisy; hence, uncertainty must be a fundamental part of our decision-making process.
2. Time-series forecasting, such as in financial markets Bayesian reasoning provides a formal and consistent way to reasoning in the presence of
3. Anomaly detection such as used for fault-detection in factories and in surveillance uncertainty; probabilistic inference is an embodiment of common sense reasoning.
4. Active learning, in which obtaining data is expensive, and so an algorithm must determine The approach we focus on here is Bayesian. Bayesian probability theory is distinguished by
which training data to acquire defining probabilities as degrees-of-belief. This is in contrast to Frequentist statistics, where the
probability of an event is defined as its frequency in the limit of an infinite number of repeated
Linear Regression trials.
In regression, our goal is to learn a mapping from one real-valued space to another. Linear regression 5.1 Classical logic
is the simplest form of regression: it is easy to understand, often quite effective, and very Perhaps the most famous attempt to describe a formal system of reasoning is classical logic, originally
efficient to learn and use. developed by Aristotle. In classical logic, we have some statements that may be true or false,
2.1 The 1D case and we have a set of rules which allow us to determine the truth or falsity of new statements.
We will start by considering linear regression in just 1 dimension. Here, our goal is to learn a
mapping y = f(x), where x and y are both real-valued scalars (i.e., x R, y R). We will take Basic definitions and rules
f to be an linear function of the form: The rules of probability theory provide a system for reasoning with uncertainty.There are a number
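For the model y = wx + b of equation (1) above, the least-squares estimates of w and b have a simple closed form; a small sketch with made-up, noise-free data:

def fit_linear_1d(xs, ys):
    """Least-squares fit of y = w*x + b to the training pairs (xs[i], ys[i])."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

# Synthetic data generated from y = 2x + 1 (an assumption for the example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_linear_1d(xs, ys)
print(w, b)               # 2.0 1.0
print(w * 10.0 + b)       # prediction for a new x = 10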

of justifications for the use of probability theory to represent logic (such as Cox’s Axioms) that set” and a “validation set.” Let K be the unknown model parameter. We pick a set of range of
show, for certain particular definitions of common-sense reasoning, that probability theory is the possible values for K (e.g., K = 1, ..., 5). For each possible value of K, we learn a model with
only system that is consistent with common-sense reasoning that K on the training set, and compute that model’s error on the validation set. For example, the
error on validation set might be just the squared error, Σi ||yi − f(xi)||^2. We then pick the K
Gaussian distributions
Arguably the single most important PDF is the Normal (a.k.a., Gaussian) probability distribution
function (PDF). Among the reasons for its popularity are that it is theoretically elegant, and arises which has the smallest validation set error. The same idea can be applied if we have more model
naturally in a number of situations. It is the distribution that maximizes entropy, and it is also tied parameters (e.g., the σ in KNN), however, we must try many possible combinations of K and σ to
to the Central Limit Theorem: the distribution of a random variable which is the sum of a number find the best.
of random variables approaches the Gaussian distribution as that number tends to infinity (Figure There is a significant problem with this approach: we use less training data when fitting the
6). other model parameters, and so we will only get good results if our initial training set is rather
Perhaps most importantly, it is the analytical properties of the Gaussian that make it so ubiquitous. large. If large amounts of data are expensive or impossible to obtain this can be a serious problem.
Gaussians are easy to manipulate, and their form so well understood, that we often assume N-Fold Cross Validation. We can use the datamuchmore efficiently byN-fold cross-validation.
quantities are Gaussian distributed, even though they are not, in order to turn an intractable model, In this approach, we randomly partition the training data into N sets of equal size and run the
or problem, into something that is easier to work with. learning algorithm N times. Each time, a different one of the N sets is deemed the test set, and
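A sketch of the N-fold cross-validation loop described here; learn and error are assumed callables (for example, fitting and scoring a KNN model for one candidate value of K):

import random

def n_fold_cv_score(data, K, learn, error, n_folds=5, seed=0):
    """Score parameter K by N-fold cross-validation: train on N-1 folds,
    measure error on the held-out fold, and average over the N folds."""
    data = list(data)
    random.Random(seed).shuffle(data)
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i in range(n_folds):
        held_out = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = learn(training, K)
        scores.append(sum(error(model, x) for x in held_out) / len(held_out))
    return sum(scores) / n_folds

# Pick the K with the lowest cross-validated score (candidate range assumed):
# best_K = min(range(1, 6), key=lambda K: n_fold_cv_score(data, K, learn, error))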
the model is trained on the remaining N − 1 sets. The value of K is scored by averaging the error
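A quick simulation makes the Central Limit Theorem claim above tangible. This sketch is my own; the choice of exponential variables, the sample sizes, and the seed are illustrative assumptions. The skewness of the standardized sum shrinks toward zero, its value for a Gaussian, as more variables are summed.

```python
import numpy as np

# Sums of n independent exponential random variables, repeated many times.
# As n grows, the standardized sum looks increasingly Gaussian
# (its sample skewness drops toward 0, the Gaussian value).
rng = np.random.default_rng(1)
for n in (1, 2, 10, 50):
    sums = rng.exponential(scale=1.0, size=(100_000, n)).sum(axis=1)
    z = (sums - sums.mean()) / sums.std()
    print(f"n = {n:3d}   sample skewness = {np.mean(z ** 3):.3f}")
```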
5.1 Classical logic
Perhaps the most famous attempt to describe a formal system of reasoning is classical logic, originally developed by Aristotle. In classical logic, we have some statements that may be true or false, and we have a set of rules which allow us to determine the truth or falsity of new statements.

Basic definitions and rules
The rules of probability theory provide a system for reasoning with uncertainty. There are a number of justifications for the use of probability theory to represent logic (such as Cox's Axioms) that show, for certain particular definitions of common-sense reasoning, that probability theory is the only system consistent with common-sense reasoning. The approach we focus on here is Bayesian: Bayesian probability theory is distinguished by defining probabilities as degrees of belief. This is in contrast to Frequentist statistics, where the probability of an event is defined as its frequency in the limit of an infinite number of repeated trials.

Bayes' Rule
In general, given that we have a model of the world described by some unknown variables, and we observe some data, our goal is to determine the model from the data. (In the coin-flip example, the model consisted of the likelihood of the coin landing heads and the prior over θ, while the data consisted of the results of N coin flips.) We describe the probability model as p(data|model): if we knew model, then this distribution tells us what data to expect. Furthermore, we must have some prior beliefs as to what model is, p(model), even if these beliefs are completely non-committal (e.g., a uniform distribution). Given the data, what do we know about model?
Applying the product rule as before gives:
p(data, model) = p(data|model) p(model) = p(model|data) p(data)   (130)
Solving for the desired distribution gives a seemingly simple but powerful result, known widely as Bayes' Rule:
p(model|data) = p(data|model) p(model) / p(data)
• The likelihood distribution describes the likelihood of data given model; it reflects our assumptions about how the data was generated.
• The prior distribution describes our assumptions about model before observing the data.
• The posterior distribution describes our knowledge of model, incorporating both the data and the prior.
• The evidence is useful in model selection, and will be discussed later. Here, its only role is to normalize the posterior PDF.
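As a small worked example of Bayes' Rule (the numbers are mine, not from the text), consider two candidate models for a coin, one fair and one biased, and a handful of observed flips; the evidence is simply the constant that normalizes the posterior.

```python
# Two candidate models for a coin: fair (p_heads = 0.5) or biased (p_heads = 0.8).
# The priors and the observed flips are illustrative assumptions.
priors = {"fair": 0.5, "biased": 0.5}
p_heads = {"fair": 0.5, "biased": 0.8}
data = ["H", "H", "T", "H"]  # observed flips

def likelihood(model):
    # p(data | model): product of per-flip probabilities.
    p = 1.0
    for flip in data:
        p *= p_heads[model] if flip == "H" else 1.0 - p_heads[model]
    return p

evidence = sum(likelihood(m) * priors[m] for m in priors)                # p(data)
posterior = {m: likelihood(m) * priors[m] / evidence for m in priors}    # p(model | data)
print(posterior)
```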
Cross Validation
Suppose we must choose between two possible ways to fit some data. How do we choose between them? Simply measuring how well they fit the data would mean that we always try to fit the data as closely as possible; the best method for fitting the data is then simply to memorize it in a big look-up table. However, fitting the data is no guarantee that we will be able to generalize to new measurements. As another example, consider the use of polynomial regression to model a function given a set of data points. Higher-order polynomials will always fit the data as well as or better than a low-order polynomial; indeed, an N − 1 degree polynomial will fit N data points exactly (to within numerical error). So just fitting the data as well as we can usually produces models with many parameters, and they are not going to generalize to new inputs in almost all cases of interest.
The general solution is to evaluate models by testing them on a new data set (the "test set"), distinct from the training set. This measures how predictive the model is: Is it useful in new situations? More generally, we often wish to obtain empirical estimates of performance. This can be useful for finding errors in implementation, comparing competing models and learning algorithms, and detecting over- or under-fitting in a learned model.
10.1 Cross-Validation
The idea of empirical performance evaluation can also be used to determine model parameters that might otherwise be hard to determine. Examples of such model parameters include the constant K in the K-Nearest Neighbors approach or the σ parameter in the Radial Basis Function approach.
Hold-out Validation. In the simplest method, we first partition our data randomly into a "training set" and a "validation set." Let K be the unknown model parameter. We pick a range of possible values for K (e.g., K = 1, ..., 5). For each possible value of K, we learn a model with that K on the training set, and compute that model's error on the validation set. For example, the error on the validation set might be just the squared error, Σ_i ||y_i − f(x_i)||². We then pick the K which has the smallest validation set error. The same idea can be applied if we have more model parameters (e.g., the σ in KNN); however, we must try many possible combinations of K and σ to find the best.
There is a significant problem with this approach: we use less training data when fitting the other model parameters, and so we will only get good results if our initial training set is rather large. If large amounts of data are expensive or impossible to obtain, this can be a serious problem.
N-Fold Cross Validation. We can use the data much more efficiently by N-fold cross-validation. In this approach, we randomly partition the training data into N sets of equal size and run the learning algorithm N times. Each time, a different one of the N sets is deemed the test set, and the model is trained on the remaining N − 1 sets. The value of K is scored by averaging the error across the N test errors. We can then pick the value of K that has the lowest score, and then learn model parameters for this K.
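A compact sketch of N-fold cross-validation for choosing K in K-Nearest Neighbors regression follows. The synthetic 1D data, the 5-fold split, and the candidate range of K are illustrative assumptions rather than anything prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-3.0, 3.0, size=120)
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)   # illustrative 1D regression data

def knn_predict(x_train, y_train, x_query, k):
    # Average the targets of the k nearest training points (simple KNN regression).
    dists = np.abs(x_train[None, :] - x_query[:, None])
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

# Fix the N folds once, then score every candidate K on the same partition.
n_folds = 5
folds = np.array_split(rng.permutation(x.size), n_folds)

def cv_error(k):
    errors = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate(folds[:i] + folds[i + 1:])
        pred = knn_predict(x[train], y[train], x[test], k)
        errors.append(np.mean((y[test] - pred) ** 2))
    return np.mean(errors)

best_k = min(range(1, 16), key=cv_error)
print("selected K:", best_k)
```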
Bayesian Methods
So far, we have considered statistical methods which select a single "best" model given the data. This approach can have problems, such as over-fitting when there is not enough data to fully constrain the model fit. In contrast, in the "pure" Bayesian approach, as much as possible we only compute distributions over unknowns; we never maximize anything. For example, consider a model parameterized by some weight vector w, and some training data D that comprises input-output pairs (xi, yi), for i = 1...N. The posterior probability distribution over the parameters, conditioned on the data, is given by Bayes' rule:
p(w|D) = p(D|w) p(w) / p(D)   (205)
The reason we want to fit the model in the first place is to allow us to make predictions with future test data. That is, given some future input xnew, we want to use the model to predict ynew. To accomplish this task through estimation in previous chapters, we used optimization to find ML or MAP estimates of w, e.g., by maximizing (205).
In a Bayesian approach, rather than estimating a single best value for w, we compute (or approximate) the entire posterior distribution p(w|D). Given the entire distribution, we can still make predictions with the following integral:
p(ynew|D, xnew) = ∫ p(ynew, w|D, xnew) dw = ∫ p(ynew|w, D, xnew) p(w|D, xnew) dw   (206)
The first step in this equality follows from the Sum Rule. The second follows from the Product Rule. Additionally, the outputs ynew and the training data D are independent conditioned on w, so p(ynew|w, D) = p(ynew|w). That is, given w, we have all available information about making predictions that we could possibly get from the training data D (according to the model). Finally, given D, it is safe to assume that xnew, in itself, provides no information about w. With these assumptions we have the following expression for our predictions:
p(ynew|D, xnew) = ∫ p(ynew|w, xnew) p(w|D) dw
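The predictive integral above rarely has a closed form. One simple approximation, offered here as my own sketch rather than anything from the text, is to put a grid over a one-dimensional weight w in the model y = wx + Gaussian noise, normalize the posterior on the grid, and average predictions under it.

```python
import numpy as np

# Tiny Bayesian linear model y = w*x + Gaussian noise, with a grid over w.
rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, size=20)
y = 1.5 * x + 0.3 * rng.standard_normal(x.size)    # illustrative data, true w = 1.5
sigma = 0.3                                        # assumed known noise level
w_grid = np.linspace(-3.0, 3.0, 601)

log_prior = -0.5 * w_grid ** 2                     # N(0, 1) prior on w (unnormalized)
log_lik = np.array([-0.5 * np.sum((y - w * x) ** 2) / sigma ** 2 for w in w_grid])
log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())
post /= post.sum()                                 # p(w|D) evaluated on the grid

x_new = 0.5
pred_mean = np.sum(post * w_grid) * x_new          # E[y_new | D, x_new] = E[w|D] * x_new here
print(f"posterior mean of w = {np.sum(post * w_grid):.3f}, predictive mean at x_new = {pred_mean:.3f}")
```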
Principal Components Analysis
We now discuss an unsupervised learning algorithm, called Principal Components Analysis, or PCA. The method is unsupervised because we are learning a mapping without any examples of what the mapping looks like; all we see are the outputs, and we want to estimate both the mapping and the inputs.
PCA is primarily a tool for dealing with high-dimensional data. If our measurements are 17-dimensional, or 30-dimensional, or 10,000-dimensional, manipulating the data can be extremely difficult. Quite often, the actual data can be described by a much lower-dimensional representation that captures all of the structure of the data. PCA is perhaps the simplest approach for finding such a representation, and yet it is also very fast and effective, resulting in it being very widely used.
There are several ways in which PCA can help:
• Visualization: PCA provides a way to visualize the data, by projecting the data down to two
or three dimensions that you can plot, in order to get a better sense of the data. Furthermore,
the principal component vectors sometimes provide insight as to the nature of the data as
well.
• Preprocessing: Learning complex models of high-dimensional data is often very slow, and
also prone to overfitting—the number of parameters in a model is usually exponential in the
number of dimensions, meaning that very large data sets are required for higher-dimensional
models. This problem is generally called the curse of dimensionality. PCA can be used to
first map the data to a low-dimensional representation before applying a more sophisticated
algorithm to it. With PCA one can also whiten the representation, which rebalances the
weights of the data to give better performance in some cases.
• Modeling: PCA learns a representation that is sometimes used as an entire model, e.g., a
prior distribution for new data.
• Compression: PCA can be used to compress data, by replacing data with its low-dimensional
representation.
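A minimal PCA sketch via the SVD of mean-centered data is shown below; this is a standard construction, but the function name, data shapes, and number of components are assumptions of the example, not something specified in the text.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X (n_samples, n_features) onto the top principal components."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    Z = X_centered @ components.T              # low-dimensional coordinates
    return Z, components

# Illustrative use: 100 points in 10 dimensions reduced to 2.
rng = np.random.default_rng(4)
X = rng.standard_normal((100, 10))
Z, components = pca(X, n_components=2)
print(Z.shape, components.shape)               # (100, 2) (2, 10)
```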
Clustering
Clustering is an unsupervised learning problem in which our goal is to discover “clusters” in the
data. A cluster is a collection of data that are similar in some way.
Clustering is often used for several different problems. For example, a market researcher might
want to identify distinct groups of the population with similar preferences and desires. When
working with documents you might want to find clusters of documents based on the occurrence
frequency of certain words. For example, this might allow one to discover financial documents,
legal documents, or email from friends. Working with image collections you might find clusters
of images which are images of people versus images of buildings. Often when we are given large amounts of complicated data we want to look for some underlying structure in the data, which might reflect certain natural kinds within the training data. Clustering can also be used to compress data, by replacing all of the elements in a cluster with a single representative element.
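This excerpt does not spell out a particular clustering algorithm, so as a hedged illustration here is a bare-bones K-means pass, one common way of finding such groups; the two synthetic blobs and the choice of K = 2 are made-up inputs.

```python
import numpy as np

def kmeans(X, k, n_iters=50, seed=0):
    """Very small K-means: alternately assign points to the nearest center
    and recompute each center as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its closest center.
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        # Recompute centers, keeping the old center if a cluster goes empty.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

# Two illustrative 2D blobs.
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)), rng.normal(3.0, 0.5, size=(50, 2))])
labels, centers = kmeans(X, k=2)
print(np.bincount(labels), centers.round(2))
```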
Rules of Inference in Artificial intelligence

Inference:
In artificial intelligence, we need intelligent machines that can create new logic from old logic or from evidence; generating conclusions from evidence and facts is termed inference.

Inference rules:
Inference rules are templates for generating valid arguments. Inference rules are applied to derive proofs in artificial intelligence, and a proof is a sequence of conclusions that leads to the desired goal.
In inference rules, the implication among all the connectives plays an important role. Following are some terminologies related to inference rules:
• Implication: It is one of the logical connectives, which can be represented as P → Q. It is a Boolean expression.
• Converse: The converse of an implication means the right-hand side proposition goes to the left-hand side and vice versa. It can be written as Q → P.
• Contrapositive: The negation of the converse is termed the contrapositive, and it can be represented as ¬Q → ¬P.
• Inverse: The negation of the implication is called the inverse. It can be represented as ¬P → ¬Q.
From the above terms, some of the compound statements are equivalent to each other, which we can prove using a truth table (reconstructed in the sketch below): P → Q is equivalent to ¬Q → ¬P, and Q → P is equivalent to ¬P → ¬Q.

Types of Inference rules:
1. Modus Ponens:
The Modus Ponens rule is one of the most important rules of inference. It states that if P and P → Q are true, then we can infer that Q will be true. It can be represented as:
P → Q, P ⊢ Q
Example:
Statement-1: "If I am sleepy then I go to bed." ==> P → Q
Statement-2: "I am sleepy." ==> P
Conclusion: "I go to bed." ==> Q
Hence, we can say that if P → Q is true and P is true, then Q will be true.
Proof by Truth table:
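The truth tables referenced in this section were images in the source and did not survive extraction. The short sketch below regenerates them: it prints the table for the four related implications and checks both the stated equivalences and the Modus Ponens pattern. The helper implies() is just a local convenience, not a library function.

```python
from itertools import product

def implies(a, b):
    # Truth-functional implication: P -> Q is False only when P is True and Q is False.
    return (not a) or b

print("P      Q      P->Q   Q->P   ~Q->~P  ~P->~Q")
for p, q in product([True, False], repeat=2):
    row = (p, q, implies(p, q), implies(q, p), implies(not q, not p), implies(not p, not q))
    print("  ".join(str(v).ljust(5) for v in row))

rows = list(product([True, False], repeat=2))
# The equivalences claimed above hold on every row.
assert all(implies(p, q) == implies(not q, not p) for p, q in rows)
assert all(implies(q, p) == implies(not p, not q) for p, q in rows)
# Modus Ponens: in every row where both P and P -> Q hold, Q holds as well.
assert all(q for p, q in rows if p and implies(p, q))
```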
2. Modus Tollens:
The Modus Tollens rule states that if P → Q is true and ¬Q is true, then ¬P will also be true. It can be represented as:
P → Q, ¬Q ⊢ ¬P
Example:
Statement-1: "If I am sleepy then I go to bed." ==> P → Q
Statement-2: "I do not go to the bed." ==> ¬Q
Conclusion: Which infers that "I am not sleepy." ==> ¬P
Proof by Truth table:

3. Hypothetical Syllogism:
The Hypothetical Syllogism rule states that if P → Q is true and Q → R is true, then P → R is true. It can be represented with the following notation:
P → Q, Q → R ⊢ P → R
Example:
Statement-1: If you have my home key then you can unlock my home. P → Q
Statement-2: If you can unlock my home then you can take my money. Q → R
Conclusion: If you have my home key then you can take my money. P → R
Proof by truth table:

4. Disjunctive Syllogism:
The Disjunctive Syllogism rule states that if P ∨ Q is true and ¬P is true, then Q will be true. It can be represented as:
P ∨ Q, ¬P ⊢ Q
Example:
Statement-1: "Today is Sunday or Monday." ==> P ∨ Q
Statement-2: "Today is not Sunday." ==> ¬P
Conclusion: "Today is Monday." ==> Q
Proof by truth table:
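The truth-table proofs for these rules were likewise images in the source. As a stand-in, the brute-force check below (my own sketch, reusing the same truth-functional implication as before) confirms that every assignment satisfying the premises of each rule also satisfies its conclusion.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def entails(premises, conclusion, n_vars=3):
    """True if the conclusion holds in every truth assignment that satisfies all premises."""
    return all(conclusion(*vals)
               for vals in product([True, False], repeat=n_vars)
               if all(prem(*vals) for prem in premises))

# Modus Tollens:           P -> Q, ~Q      |-  ~P
print(entails([lambda p, q, r: implies(p, q), lambda p, q, r: not q], lambda p, q, r: not p))
# Hypothetical Syllogism:  P -> Q, Q -> R  |-  P -> R
print(entails([lambda p, q, r: implies(p, q), lambda p, q, r: implies(q, r)], lambda p, q, r: implies(p, r)))
# Disjunctive Syllogism:   P or Q, ~P      |-  Q
print(entails([lambda p, q, r: p or q, lambda p, q, r: not p], lambda p, q, r: q))
# Each line prints True.
```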