Artificial Intelligence Notes
INTRODUCTION TO AI
1.1 ARTIFICIAL INTELLIGENCE (AI): A field of computer science and engineering concerned with
the computational understanding of what is commonly called intelligent behavior, and with the creation of
artifacts that exhibit such behavior.
Artificial intelligence (AI, also machine intelligence, MI) is intelligence demonstrated by machines, in contrast to the natural intelligence (NI) displayed by humans and other animals, who perceive their environment, learn from it, and take action based on what they discover.
Artificial Intelligence can be defined by examining the meaning of the two words: Artificial and Intelligence. Artificial means something that is not real, a kind of imitation, because it is simulated; artificial grass used on sports fields is a good example, and the cost of maintaining the artificial version is less than that of the original. Intelligence is defined in many ways: logic, understanding, self-awareness, learning, emotional knowledge, planning, creativity and problem solving.
Taken together, artificial and intelligence describe something that is not real but has logic, reasoning and a problem-solving approach just like a human. Thus, Artificial Intelligence (AI) is the study and creation of computer systems that can perceive, reason and act.
The primary aim of AI is to produce intelligent machines. The intelligence should be exhibited by thinking, making decisions, solving problems and, most importantly, by learning. AI is an interdisciplinary field that requires knowledge of computer science, linguistics, psychology, biology, philosophy and so on for serious research. AI can also be defined as the area of computer science that deals with the ways in which computers can be made to perform cognitive functions ascribed to humans. But this definition does not say what functions are performed, to what degree they are performed, or how these functions are carried out.
1. Computer Science
2. Cognitive Science
3. Engineering
4. Ethics
5. Linguistics
6. Logic
7. Mathematics
8. Natural Sciences
9. Philosophy
10. Physiology
11. Psychology
12. Statistics
FIG 1.1: Application Areas of AI (General Problem Solving, Expert Systems, Natural Language Processing, Computer Vision)
1.1.1 DEFINITIONS OF AI
In computer science AI research is defined as the study of "intelligent agents": any device that perceives its
environment and takes actions that maximize its chance of successfully achieving its goals. Colloquially,
the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans
associate with other human minds, such as "learning" and "problem solving".
“The automation of activities that we associate with human thinking, activities such as decision-making,
problem solving, learning” (Bellman, 1978)
“The study of mental faculties through the use of computational models” (Charniak and McDermott,
1985)
“The study of how to make computers do things at which, at the moment, people are better” (Rich and
Knight, 1991)
“The branch of computer science that is concerned with the automation of intelligent behavior” (Luger and Stubblefield, 1993)
Artificial Intelligence is the ability of machines to perform tasks that normal human beings do. They may not be able to perform all these activities correctly, but they can come close with the help of the conditions that are set for them. This is an important field in computer science and can be further divided into two terms, Strong Artificial Intelligence and Weak Artificial Intelligence. These terms have nothing to do with being strong or weak physically.
Weak Artificial Intelligence is the phenomenon that machines which are not intelligent enough to do their own work can be built in such a way that they seem smart. Thus, Weak AI does not hold that creating human-level intelligence in machines is possible, but AI techniques can still be developed to solve many real-life problems: search engines that convert voice commands to text and search for the given query, virtual reality where the touch screen is replaced by gestures, surveillance, and autonomous transport. That is, it is the study of mental models implemented on a computer. Siri by Apple is a good example of Weak AI. It involves machine learning.
Strong AI vs. Weak AI:
• Definition − Strong AI: the machine can actually think and perform tasks on its own, just like a human being. Weak AI: the devices cannot perform these tasks on their own but are made to look intelligent.
• Functionality − Strong AI: the algorithm is stored by a computer program. Weak AI: tasks are entered manually to be performed.
• Examples − Strong AI: there are no proper examples yet. Weak AI: an automatic car or remote-control devices.
• Humans perceive by patterns whereas machines perceive by a set of rules and data.
• Humans store and recall information by patterns; machines do it by searching algorithms. For example, the number 40404040 is easy to remember, store, and recall because its pattern is simple.
• Humans can figure out the complete object even if some part of it is missing or distorted, whereas machines cannot do this correctly.
A typical program has three major segments: input, processing and output. So regular programming and
Artificial Intelligence programming can be compared in terms of these three segments.
Input
In regular programming, input is a sequence of alphanumeric symbols presented and stored as per some
given set of previously stipulated rules and that uses a limited set of communication media such as
keyboard, mouse, disc, etc.
In Artificial Intelligence programming the input may be a sight, sound, touch, smell or taste. Sight means one-dimensional symbols such as typed text, two-dimensional objects or three-dimensional scenes. Sound input includes spoken language, music and noise made by objects. Touch includes temperature, smoothness and resistance to pressure. Smell input includes odors emanating from animate and inanimate objects. And taste input includes sweet, sour, salty and bitter foodstuffs and chemicals.
Processing
In regular programming, processing means manipulation of the stored symbols by a set of previously stipulated rules.
Output
In regular programming, output is a sequence of alphanumeric symbols, may be in a given set of colors,
that represents the result of the processing and that is placed on such a medium as a CRT screen, paper, or
magnetic disk.
In AI programming, output can be in the form of printed language and synthesized speech (speech synthesis is the computer-generated simulation of human speech; it is used to translate written information into aural information where that is more convenient, especially for mobile applications such as voice-enabled e-mail and unified messaging, and it is also used to assist the vision-impaired so that, for example, the contents of a display screen can be automatically read aloud to a blind user), manipulation of physical objects, or locomotion, i.e., movement in space.
1.1.5 CHALLENGES
The scope of AI is disputed: as machines become increasingly capable, tasks considered as requiring
"intelligence" are often removed from the definition, a phenomenon known as the AI effect, leading to the
quip, "AI is whatever hasn't been done yet." For instance, optical character recognition (scanner) is
frequently excluded from "artificial intelligence", having become a routine technology. Capabilities
generally classified as AI as of 2017 include successfully understanding human speech, competing at the
highest level in strategic game systems (such as chess and Go), autonomous cars, intelligent routing in content delivery networks, and military simulations.
It is true that AI has not yet achieved its ultimate goal. AI systems still cannot match even a three-year-old child on many counts: the ability to recognize and remember different objects, adapt to new situations, understand and generate human languages, and so on. The main problem is that we still do not understand how the human mind works, how we learn new things, and especially how we learn languages and reproduce them properly.
1.1.6 APPLICATIONS
There are many AI applications that we witness: robotics, machine translators, chatbots (audio or video conversation with a machine that behaves like a human and may pass the Turing test) and voice recognizers, to name a few. AI techniques are used to solve many real-life problems. Some kinds of robots help to find land mines or to search for humans trapped in rubble after natural calamities.
1.1.7 FUTURE OF AI
AI is the best field for dreamers to play around in. It evolved from the thought that making a human-like machine is possible. Though many conclude that this is not possible, there is still a lot of research going on in this field to attain the final objective. There are inherent advantages to using computers: they do not get tired or lose their temper, and they are becoming faster and faster. Only time will tell what the future of AI will be: whether it will attain human-level or above-human-level intelligence or not.
1. Acting humanly
The Turing test (1950) “Computing machinery and intelligence”:
Turing test is performed to identify a computer and human.
During the test, one of the humans functions as the questioner, while the second human and the computer
function as respondents. The questioner interrogates the respondents within a certain subject area, using a
specified format and context. After a preset length of time or number of questions, the questioner is then
asked to decide which respondent was human and which was a computer.
The test is repeated many times. If the questioner makes the correct determination in half of the test runs or
less, the computer is considered to have artificial intelligence, because the questioner regards it as "just as
human" as the human respondent.
FIG 1.2: Turing Test
Suggested major components of AI: knowledge, reasoning, language understanding, learning.
2. Thinking humanly
Thinking humanly is related to cognitive science. It is the theory of how the brain works: how it handles its nodes and how it analyzes the data that your body generates. So far, there is no full understanding of how the brain functions. Thinking humanly means trying to understand and model how the human mind works. There are (at least) two possible routes that humans use to find the answer to a question: introspection (observing our own thoughts) and psychological experiments.
The difference between “acting humanly” and “thinking humanly” is that the first is only concerned with the actions, the outcome or product of the human’s thinking process, whereas the latter is concerned with modeling human thinking processes.
3. Thinking rationally
Trying to understand how we actually think is one route to AI. But another approach is to model
how we should think. The “thinking rationally” approach to AI uses symbolic logic to capture the
laws of rational thought as symbols that can be manipulated. Reasoning involves manipulating the symbols according to well-defined rules, kind of like algebra. The result is an idealized model of human reasoning. This approach is attractive to theorists, i.e., it models how humans should think and reason in an ideal world.
4. Acting rationally
Rational behavior: doing the right thing. The right thing is that which is expected to maximize goal achievement, given the available information. Acting rationally means acting to achieve one’s goals, given one’s beliefs or understanding about the world. An agent is a system that perceives an environment and acts within that environment. An intelligent agent is one that acts rationally with respect to its goals. For example, an agent that is designed to play a game should make moves that increase its chances of winning the game. When constructing an intelligent agent, emphasis shifts from designing the theoretically best decision-making procedure to designing the best decision-making procedure possible within the circumstances in which the agent is acting. Logical approaches may be used to help find the best action, but there are also other approaches. Achieving so-called “perfect rationality”, making the best decision theoretically possible, is not usually possible due to limited resources in a real environment (e.g., time, memory, computational power, uncertainty). The trick is to do the best with the information and resources you have. This represents a shift in the field of AI from optimizing (early AI) to satisficing (more recent AI).
1.3 HISTORY OF AI
Many disciplines (philosophy, mathematics, economics, psychology, linguistics, computer engineering, control theory, neuroscience, and more) have contributed ideas, viewpoints, and techniques to AI; a handful of scientists from these fields began to discuss the possibility of creating an artificial brain. The history of AI has had cycles of success, misplaced optimism, and resulting retrenchments; cycles of new creativity and systematic refinement of the best approaches.
Artificial intelligence was founded as an academic discipline in 1956, and in the years since has experienced
several waves of optimism, followed by disappointment and the loss of funding (known as an "AI winter"),
followed by new approaches, success and renewed funding. For most of its history, AI research has been
divided into subfields that often fail to communicate with each other. These sub-fields are based on
technical considerations, such as particular goals (e.g. "robotics" or "machine learning"), the use of
particular tools ("logic" or artificial neural networks), or deep philosophical differences. Subfields have
also been based on social factors (particular institutions or the work of particular researchers).
The traditional problems (or goals) of AI research include reasoning, knowledge representation, planning,
learning, natural language processing, perception and the ability to move and manipulate objects. General
intelligence is among the field's long-term goals. Approaches include statistical methods, computational
intelligence, and traditional symbolic AI. Many tools are used in AI, including versions of search and
mathematical optimization, artificial neural networks, and methods based on statistics, probability and
economics. The AI field draws upon computer science, mathematics, psychology, linguistics, philosophy
and many others.
The field was founded on the claim that human intelligence "can be so precisely described that a machine
can be made to simulate it". This raises philosophical arguments about the nature of the mind and the ethics
of creating artificial beings endowed with human-like intelligence which are issues that have been explored
by myth, fiction and philosophy since antiquity. Some people also consider AI to be a danger to humanity
if it progresses unabatedly. Others believe that AI, unlike previous technological revolutions, will create a
risk of mass unemployment.
In the twenty-first century, AI techniques have experienced a resurgence following concurrent advances in
computer power, large amounts of data, and theoretical understanding; and AI techniques have become an
essential part of the technology industry, helping to solve many challenging problems in computer science.
1943-1955: Gestation - the idea of producing a computer that follows human instruction takes shape.
1952-1969: Great Expectations - The early years of AI were full of successes, in a limited way. The General Problem Solver (GPS) was a computer program created in 1957 by Herbert Simon and Allen Newell to build a universal problem-solver machine. The order in which the program considered subgoals and possible actions was similar to that in which humans approached the same problems. Thus, GPS was probably the first program to embody the "thinking humanly" approach. At IBM, Nathaniel Rochester and his colleagues produced some of the first AI programs. Herbert Gelernter (1959) constructed the Geometry Theorem Prover, which was able to prove theorems that many students of mathematics would find quite tricky.
1966-1973: Reality - From the beginning, AI researchers were not shy about making predictions of their coming successes. The following statement by Herbert Simon in 1957 is often quoted: “It is not my aim to surprise or shock you, but the simplest way I can summarize is to say that there are now in the world machines that think, that learn and that create. Moreover, their ability to do these things is going to increase rapidly until, in a visible future, the range of problems they can handle will be coextensive with the range to which the human mind has been applied.”
1969-1979: Knowledge is Power-Dendral was an influential pioneer project in artificial intelligence (AI) of the
1960s, and the computer software expert system that it produced. Its primary aim was to help organic chemists in
identifying unknown organic molecules, by analyzing their mass spectra and using knowledge of chemistry. It was
done at Stanford University by Edward Feigenbaum, Bruce Buchanan, Joshua Lederberg, and Carl Djerassi.
1980-present: AI and Industry - In 1981, the Japanese announced the "Fifth Generation" project, a 10-year plan to build intelligent computers running Prolog. Overall, the AI industry boomed from a few million dollars in 1980 to billions of dollars in 1988.
1986-present: The Return of Neural Networks-Psychologists including David Rumelhart and Geoff Hinton
continued the study of neural-net models of memory.
1987-present: AI Becomes a Science- In recent years, approaches based on hidden Markov models (HMMs) have
come to dominate the area. Speech technology and the related field of handwritten character recognition are already
making the transition to widespread industrial and consumer applications. The Bayesian network formalism was
invented to allow efficient representation of, and rigorous reasoning with, uncertain knowledge.
1995-present: Intelligent Agents - One of the most important environments for intelligent agents is the Internet.
1987– AI becomes a science: rapid increase in technical depth; building on existing theories; basing claims on theorems and/or experiments rather than intuition; real-world applications rather than toy examples; replication of experiments with data and code repositories; less isolationism.
1995– Intelligent agents: whole agents rather than fragments; the situated movement; Internet environments (“bots”).
1950: Alan Turing introduced the Turing Test for the evaluation of intelligence and published Computing Machinery and Intelligence. Claude Shannon published a detailed analysis of chess playing as search.
1956: John McCarthy coined the term Artificial Intelligence. Demonstration of the first running AI program at Carnegie Mellon University.
1958: John McCarthy invented the LISP programming language for AI.
1964: Danny Bobrow's dissertation at MIT showed that computers can understand natural language well enough to solve algebra word problems correctly.
1965: Joseph Weizenbaum at MIT built ELIZA, an interactive program that carries on a dialogue in English.
1969: Scientists at Stanford Research Institute developed Shakey, a robot equipped with locomotion, perception, and problem solving.
1973: The Assembly Robotics group at Edinburgh University built Freddy, the Famous Scottish Robot, capable of using vision to locate and assemble models.
1979: The first computer-controlled autonomous vehicle, the Stanford Cart, was built.
1985: Harold Cohen created and demonstrated the drawing program Aaron.
1990: Major advances in all areas of AI −
• Significant demonstrations in machine learning
• Case-based reasoning
• Multi-agent planning
• Scheduling
• Data mining, web crawlers
• Natural language understanding and translation
• Vision, virtual reality
• Games
1997: The Deep Blue chess program beat the then world chess champion, Garry Kasparov.
2000: Interactive robot pets became commercially available. MIT displayed Kismet, a robot with a face that expresses emotions. The robot Nomad explored remote regions of Antarctica and located meteorites.
• Natural Language Processing − It is possible to interact with a computer that understands natural language spoken by humans. Just getting a sequence of words into a computer is not enough. Parsing sentences is not enough either. The computer has to be provided with an understanding of the domain the text is about, and this is presently possible only for very limited domains. Applications include voice-to-text, converting one language to another, and semantic search (sentence queries).
• Speech Recognition − In the 1990s, computer speech recognition reached a practical level for limited purposes. Thus United Airlines replaced its keyboard tree for flight information with a system using speech recognition of flight numbers and city names. It is quite convenient. On the other hand, while it is possible to instruct some computers using speech, most users have gone back to the keyboard and the mouse as still more convenient.
Some intelligent systems are capable of hearing and comprehending language in terms of sentences and their meanings while a human talks. They can handle different accents, slang words, noise in the background, changes in a human's voice due to a cold, etc.
• Expert Systems − There are some applications which integrate machine, software, and special information to impart reasoning and advising. They provide explanations and advice to the users. One of the first expert systems was MYCIN in 1974, which diagnosed bacterial infections of the blood and suggested treatments. Expert systems are also used in financial decision making, e.g., detecting fraud and expediting financial transactions.
• Vision Systems − These systems understand, interpret, and comprehend visual input on the computer. For example,
o A spying aeroplane takes photographs, which are used to figure out spatial information or a map of the area.
o Doctors use a clinical expert system to diagnose the patient.
o Police use computer software that can recognize the face of a criminal against the stored portrait made by a forensic artist.
The world is composed of three-dimensional objects, but the inputs to the human eye and
computers' TV cameras are two dimensional. Some useful programs can work solely in two
dimensions, but full computer vision requires partial three-dimensional information that is not just
a set of two-dimensional views. At present there are only limited ways of representing three-
dimensional information directly, and they are not as good as what humans evidently use.
• Handwriting Recognition − Handwriting recognition software reads text written on paper with a pen or on a screen with a stylus. It can recognize the shapes of the letters and convert them into editable text.
• Intelligent Robots − Robots are able to perform the tasks given by a human. They have sensors to
detect physical data from the real world such as light, heat, temperature, movement, sound, bump,
and pressure. They have efficient processors, multiple sensors and huge memory, to exhibit
intelligence. In addition, they are capable of learning from their mistakes and they can adapt to the
new environment.
An agent is anything that can perceive its environment through sensors and act upon that environment through effectors.
• A human agent has sensory organs such as eyes, ears, nose, tongue and skin paralleling the sensors, and other organs such as hands, legs and mouth for effectors.
• A robotic agent substitutes cameras and infrared range finders for the sensors, and various motors and actuators for the effectors.
• A software agent has encoded bit strings as its programs and actions.
FIG1.3: AI Agent
• Performance Measure of Agent − It is the criteria, which determines how successful an agent is.
• Behavior of Agent − It is the action that agent performs after any given sequence of percepts.
• Percept − It is the agent’s perceptual input at a given instant.
• Percept Sequence − It is the history of all that an agent has perceived till date.
• Agent Function − It is a map from the percept sequence to an action.
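To make the agent-function idea concrete, here is a minimal sketch (not from the original notes) of a table-driven agent in Python: a lookup table maps each percept sequence seen so far to an action. The percepts, actions and table entries for a two-square vacuum world are illustrative assumptions.

class TableDrivenAgent:
    def __init__(self, table):
        self.table = table            # maps percept-sequence tuples to actions
        self.percept_sequence = []    # history of everything perceived so far

    def act(self, percept):
        self.percept_sequence.append(percept)
        # The agent function: percept sequence -> action.
        return self.table.get(tuple(self.percept_sequence), "NoOp")

# Illustrative table for a two-square vacuum world.
table = {
    (("A", "Dirty"),): "Suck",
    (("A", "Clean"),): "Right",
    (("A", "Clean"), ("B", "Dirty")): "Suck",
}
agent = TableDrivenAgent(table)
print(agent.act(("A", "Clean")))   # -> Right
print(agent.act(("B", "Dirty")))   # -> Suck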
1.5.2 RATIONAL AGENT
Rationality is nothing but status of being reasonable, sensible, and having good sense of judgment.
Rationality is concerned with expected actions and results depending upon what the agent has perceived.
Performing actions with the aim of obtaining useful information is an important part of rationality.
A rational agent always performs the right action, where the right action means the action that causes the agent to be most successful given the percept sequence. The problem the agent solves is characterized by Performance Measure, Environment, Actuators, and Sensors (PEAS).
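As an illustration of the PEAS scheme (my own example, not from the notes), the problem faced by a robot vacuum cleaner might be specified roughly as follows:

# Hypothetical PEAS description of a robot vacuum cleaner (illustrative only).
peas_vacuum = {
    "Performance": ["cleanliness", "battery use", "time taken"],
    "Environment": ["room layout", "dirt", "furniture", "people"],
    "Actuators":   ["wheels", "brushes", "suction motor"],
    "Sensors":     ["camera", "bump sensor", "dirt sensor", "cliff sensor"],
}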
Intelligent agents are often described schematically as an abstract functional system similar to a computer
program. For this reason, intelligent agents are sometimes called abstract intelligent agents (AIA) to
distinguish them from their real world implementations as computer systems, biological systems, or
organizations. Some definitions of intelligent agents emphasize their autonomy, and so prefer the term
autonomous intelligent agents. Still others (notably Russell & Norvig (2003)) considered goal-directed
behavior as the essence of intelligence and so prefer a term borrowed from economics, "rational agent".
Intelligent agents in artificial intelligence are closely related to agents in economics, and versions of the
intelligent agent paradigm are studied in cognitive science, ethics, the philosophy of practical reason, as
well as in many interdisciplinary socio-cognitive modeling and computer social simulations.
Intelligent agents are also closely related to software agents (autonomous computer programs that carry out tasks on behalf of users, such as generating mail or messages). In computer science, the term intelligent agent may be used to refer to a software agent that has some intelligence, regardless of whether it is a rational agent by Russell and Norvig's definition. For example, autonomous programs used for operator assistance or data mining (sometimes referred to as bots) are also called "intelligent agents".
A simple agent program can be defined mathematically as an agent function which maps every possible percept sequence to a possible action the agent can perform, or to a coefficient or feedback element.
Agent function is an abstract concept as it could incorporate various principles of decision making like
calculation of utility of individual options, deduction over logic rules, fuzzy logic, etc.
The program agent, instead, maps every possible percept to an action. We use the term percept to refer to
the agent's perceptional inputs at any given instant. An agent is anything that can be viewed as perceiving
its environment through sensors and acting upon that environment through actuators.
1.5.6.1 ARCHITECTURES
Weiss (2013) said we should consider four classes of agents:
• Logic-based agents – in which the decision about what action to perform is made via logical
deduction;
• Reactive agents – in which decision making is implemented in some form of direct mapping from
situation to action;
• Belief-desire-intention agents – in which decision making depends upon the manipulation of data
structures representing the beliefs, desires, and intentions of the agent.
• Layered architectures – in which decision making is realized via various software layers, each of
which is more or less explicitly reasoning about the environment at different levels of abstraction.
FIG 1.9: Model-Based, Utility-Based Agent
A situation-action rule is basically a hypothetical imperative. If situation X is the current state of affairs and goal Z requires plan Y, then execute Y. Or even more simply: given X, execute Y. Thus for a medical diagnostic agent, if a certain set of symptoms is present, given a certain medical history, offer diagnosis X. Some expert systems fall under the category of reflex agent. Examples of reflex agents: Chess, Checkers, Tic-Tac-Toe, Connect Four.
The simple mercury-type thermostat is also a reflex agent, with only three rules: if the temperature reaches x, turn the heater on; if the temperature reaches y, turn the heater off; otherwise, do nothing. The main difference is that the reflex agent requires a program that is not itself immediately and mechanically linked to the environment.
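The three thermostat rules translate directly into a condition-action program. A minimal sketch in Python, with illustrative values for the thresholds x and y:

X_LOW, Y_HIGH = 18, 22    # illustrative thresholds x and y, in degrees Celsius

def thermostat_agent(temperature):
    # Rule 1: if the temperature reaches x, turn the heater on.
    if temperature <= X_LOW:
        return "turn heater on"
    # Rule 2: if the temperature reaches y, turn the heater off.
    if temperature >= Y_HIGH:
        return "turn heater off"
    # Rule 3: otherwise, do nothing.
    return "do nothing"

print(thermostat_agent(15))   # -> turn heater on
print(thermostat_agent(25))   # -> turn heater off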
A model-based reflex agent should maintain some sort of internal model that depends on the percept history and thereby reflects at least some of the unobserved aspects of the current state. The percept history and the impact of actions on the environment can be determined using this internal model. The agent then chooses an action in the same way as a reflex agent.
Example: Agent: robot vacuum cleaner
Environment: dirty room, furniture.
Model: map of room, which areas already cleaned.
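A minimal sketch of this vacuum-cleaner example in Python, assuming a simple set of squares as the internal model; the state update and action choice are illustrative:

class ModelBasedVacuum:
    def __init__(self, squares):
        self.squares = squares
        self.cleaned = set()      # internal model: squares believed clean

    def act(self, percept):
        location, status = percept
        if status == "Dirty":
            self.cleaned.discard(location)   # update the model from the percept
            return "Suck"
        self.cleaned.add(location)           # update the model from the percept
        remaining = [s for s in self.squares if s not in self.cleaned]
        # Then choose an action the same way a reflex agent would.
        return f"MoveTo {remaining[0]}" if remaining else "Stop"

robot = ModelBasedVacuum(["A", "B"])
print(robot.act(("A", "Dirty")))   # -> Suck
print(robot.act(("A", "Clean")))   # -> MoveTo B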
3 GOAL-BASED AGENTS
Goal-based agents further expand on the capabilities of model-based agents by using "goal" information. Goal information describes situations that are desirable. This allows the agent a way to choose among multiple possibilities, selecting the one which reaches a goal state. Search and planning are the subfields of artificial intelligence devoted to finding action sequences that achieve the agent's goals.
Goal-based agents use information about what they know and their current state to determine whether they have accomplished what they wanted to. For example, if you have an autonomous car that you want to take you somewhere, once it gets to a stop and has the ability to turn in 3 different directions, a goal-based agent can work out which turn it should take to get to its destination.
Example: Agent: robot maid
Environment: house & people.
Goals: clean clothes, tidy room, table laid, etc.
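A minimal sketch of the autonomous-car example above: the agent searches a made-up road graph for a turn that reaches its goal. The graph, location names and search strategy (breadth-first) are illustrative assumptions.

from collections import deque

roads = {                          # illustrative road graph
    "stop": ["left", "straight", "right"],
    "left": ["park"], "straight": ["mall"], "right": ["home"],
    "park": [], "mall": [], "home": [],
}

def reaches(start, goal):
    # Breadth-first search: is there any path from start to goal?
    frontier, seen = deque([start]), {start}
    while frontier:
        node = frontier.popleft()
        if node == goal:
            return True
        for nxt in roads[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

def choose_turn(current, goal):
    # Pick, among the possible turns, one that actually reaches the goal.
    for turn in roads[current]:
        if reaches(turn, goal):
            return turn
    return None

print(choose_turn("stop", "home"))   # -> right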
4 UTILITY-BASED AGENTS
Goal-based agents only distinguish between goal states and non-goal states, but it is possible to define a measure of how desirable a particular state is. Sometimes agents will also have multiple conflicting goals. In such cases, a utility function is more appropriate. This measure can be obtained through a utility function which maps a state to a measure of the utility of that state. A more general performance measure should allow a comparison of different world states according to exactly how happy they would make the agent. The term utility can be used to describe how "happy" the agent is.
Returning to the autonomous-car example: once the car gets to a stop and can turn in 3 different directions, a utility-based agent can compare the candidate routes and choose the one with the highest utility, e.g., the fastest or safest way to its destination.
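Continuing the same example, a minimal sketch of a utility function that scores each candidate turn instead of merely testing whether it reaches the goal; the routes, attributes and weights are illustrative assumptions:

routes = {                         # illustrative outcome states for each turn
    "left":     {"reaches_goal": True,  "minutes": 25, "tolls": 2},
    "straight": {"reaches_goal": True,  "minutes": 15, "tolls": 5},
    "right":    {"reaches_goal": False, "minutes": 10, "tolls": 0},
}

def utility(state):
    # A state that never reaches the goal is worthless; otherwise trade off
    # travel time against tolls with illustrative weights.
    if not state["reaches_goal"]:
        return float("-inf")
    return -1.0 * state["minutes"] - 0.5 * state["tolls"]

best = max(routes, key=lambda turn: utility(routes[turn]))
print(best)   # -> straight (utility -17.5 beats left's -26.0)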
5 LEARNING AGENTS
An Uber driver learns something new by trying something different. Learning agents operate similarly.
A learning agent is a tool in AI that is capable of learning from its experiences. It starts with some basic
knowledge and is then able to act and adapt autonomously, through learning, to improve its own
performance. Unlike intelligent agents that act on information provided by a programmer, learning agents
are able to perform tasks, analyze performance and look for new ways to improve on those tasks.
1. Performance element: The performance element takes in percepts and selects external actions; it is what we have previously considered to be the entire agent.
2. Critic element: The critic element determines the outcome of the action and gives feedback.
3. Learning element: The learning element takes the feedback from the critic element and figures out
how to make the action better next time.
4. Problem generator: The problem generator is the component that is tasked with developing new
experiences for the learning agent to try. This is the piece that helps the agent continue to learn.
Learning has the advantage that it allows the agents to initially operate in unknown environments and to
become more competent than its initial knowledge alone might allow. The most important distinction is
between the "learning element", which is responsible for making improvements, and the "performance
element", which is responsible for selecting external actions.
The learning element uses feedback from the "critic" on how the agent is doing and determines how the
performance element should be modified to do better in the future. The performance element is what we
have previously considered to be the entire agent: it takes in percepts and decides on actions.
The last component of the learning agent is the "problem generator". It is responsible for suggesting actions
that will lead to new and informative experiences.
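The four components can be sketched as a skeleton class in Python; every method here is an illustrative stub, not a working learner:

import random

class LearningAgent:
    def __init__(self):
        self.rules = {}    # the performance element's current knowledge

    def performance_element(self, percept):
        # Takes in percepts and selects an external action.
        return self.rules.get(percept, "default_action")

    def critic(self, reward):
        # Judges the outcome of an action and produces feedback.
        return reward    # e.g. +1 for a good outcome, -1 for a bad one

    def learning_element(self, percept, feedback):
        # Uses the critic's feedback to modify the performance element.
        if feedback < 0:
            self.rules[percept] = "try_something_else"

    def problem_generator(self):
        # Occasionally suggests exploratory actions leading to new experiences.
        return "exploratory_action" if random.random() < 0.1 else None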
1.5.8 APPLICATIONS
Intelligent agents are applied as automated online assistants, where they function to perceive the needs of customers in order to perform individualized customer service. Such an agent may basically consist of a dialog system, an avatar (an icon or figure representing a particular person in a video game, Internet forum, etc.), as well as an expert system to provide specific expertise to the user. They can also be used to optimize the coordination of human groups online.
As our human visual understanding of the world is reflected in our ability to make decisions through what we see, providing such visual understanding to computers would allow them the same power:
FIG 1.15: Basic Steps in Scene Interpretation
Most of the time, the raw data acquired by these devices needs to be post-processed in order to be more
efficiently exploited in the next steps.
Computer vision is an interdisciplinary field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do.
Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions. Understanding in this context means the transformation of visual images (the input of the retina) into descriptions of the world that can interface with other thought processes and elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.
As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract
information from images. The image data can take many forms, such as video sequences, views from
multiple cameras, or multi-dimensional data from a medical scanner. As a technological discipline,
computer vision seeks to apply its theories and models for the construction of computer vision systems.
Sub-domains of computer vision include scene reconstruction (building a 3D model of a scene from one or more images), event detection (identifying that a particular event has occurred, e.g., in a video stream), video tracking (locating a moving object, or multiple objects, over time using a camera), object recognition (identifying objects in images or videos), 3D pose estimation (determining the position and orientation of an object from a 2D image), learning (knowledge acquisition), indexing (organizing image databases so they can be searched by content), motion estimation (determining motion vectors that describe the transformation from one 2D image to another, usually between adjacent frames in a video sequence), and image restoration (taking a corrupt or noisy image and estimating the clean, original image).
1.6.1 APPLICATIONS
Applications range from tasks such as industrial machine vision systems which, say, inspect bottles
speeding by on a production line, to research into artificial intelligence and computers or robots that can
comprehend the world around them. The computer vision and machine vision fields have significant
overlap. Computer vision covers the core technology of automated image analysis which is used in many
fields. Machine vision usually refers to a process of combining automated image analysis with other
methods and technologies to provide automated inspection and robot guidance in industrial applications. In
many computer vision applications, the computers are pre-programmed to solve a particular task, but
methods based on learning are now becoming increasingly common.
Examples of applications of computer vision include systems for:
1. Learning 3D shapes has been a challenging task in computer vision. Recent advances in deep learning have enabled researchers to build models that can generate and reconstruct 3D shapes from single- or multi-view depth maps or silhouettes seamlessly and efficiently.
• Automatic inspection, e.g., in manufacturing applications;
• Assisting humans in identification tasks, e.g., a species identification system;
• Controlling processes, e.g., an industrial robot;
• Detecting events, e.g., for visual surveillance or people counting;
• Interaction, e.g., as the input to a device for computer-human interaction;
• Modeling objects or environments, e.g., medical image analysis or topographical modeling;
• Navigation, e.g., by an autonomous vehicle or mobile robot; and
• Organizing information, e.g., for indexing databases of images and image sequences.
2. One of the most prominent application fields is medical computer vision or medical image processing.
This area is characterized by the extraction of information from image data for the purpose of making
a medical diagnosis of a patient. Generally, image data is in the form of microscopy images, X-ray
images, angiography images, ultrasonic images, and tomography images. An example of information
which can be extracted from such image data is detection of tumours, arteriosclerosis or other malignant changes. It can also include measurements of organ dimensions, blood flow, etc. This application area also
supports medical research by providing new information, e.g., about the structure of the brain, or about
the quality of medical treatments. Applications of computer vision in the medical area also includes
enhancement of images that are interpreted by humans, for example ultrasonic images or X-ray images,
to reduce the influence of noise.
3. A second application area in computer vision is in industry, sometimes called machine vision, where
information is extracted for the purpose of supporting a manufacturing process. One example is quality
control where details or final products are being automatically inspected in order to find defects.
Another example is measurement of position and orientation of details to be picked up by a robot arm.
Machine vision is also heavily used in agricultural processes to remove undesirable foodstuff from bulk material, a process called optical sorting.
4. Military applications are probably one of the largest areas for computer vision. The obvious examples
are detection of enemy soldiers or vehicles and missile guidance. More advanced systems for missile
guidance send the missile to an area rather than a specific target, and target selection is made when the
missile reaches the area based on locally acquired image data. Modern military concepts, such as
"battlefield awareness", imply that various sensors, including image sensors, provide a rich set of
information about a combat scene which can be used to support strategic decisions. In this case,
automatic processing of the data is used to reduce complexity and to fuse information from multiple
sensors to increase reliability.
5. One of the newer application areas is autonomous vehicles. A fully autonomous vehicle typically uses computer vision for navigation, i.e., for knowing where it is, for producing a map of its environment (SLAM) and for detecting obstacles. It can also be used for detecting certain task-specific events, e.g., a UAV looking for forest fires. Examples of supporting systems are obstacle warning systems in cars, and systems for autonomous landing of aircraft.
Several car manufacturers have demonstrated systems for autonomous driving of cars, but this technology
has still not reached a level where it can be put on the market. There are ample examples of military
autonomous vehicles ranging from advanced missiles, to UAVs for recon missions or missile guidance.
Space exploration is already being made with autonomous vehicles using computer vision, e.g., NASA's Mars Exploration Rover and ESA's ExoMars Rover. Other application areas include:
• Support of visual effects creation for cinema and broadcast, e.g., camera tracking (matchmoving).
• Surveillance.
• Tracking and counting organisms in the biological sciences
Natural language processing (NLP) is the ability of a computer program to understand human language as it is spoken. NLP is a branch of artificial intelligence (AI) that deals with analyzing, understanding and generating the languages that humans use naturally, in order to interface with computers in both written and spoken contexts using natural human languages instead of computer languages.
One of the challenges inherent in natural language processing is teaching computers to understand the way
humans learn and use language. Take, for example, the sentence "Baby swallows fly." This simple sentence
has multiple meanings, depending on whether the word "swallows" or the word "fly" is used as the verb,
which also determines whether "baby" is used as a noun or an adjective. In the course of human
communication, the meaning of the sentence depends on both the context in which it was communicated and each person's understanding of the ambiguity in human languages. This sentence poses problems for software that must be programmed to understand context and linguistic structures.
The development of NLP applications is challenging because computers traditionally require humans to
"speak" to them in a programming language that is precise, unambiguous and highly structured, or through
a limited number of clearly enunciated voice commands. Human speech, however, is not always precise --
it is often ambiguous and the linguistic structure can depend on many complex variables, including slang,
regional dialects and social context.
Natural language processing (NLP) is an area of computer science and artificial intelligence concerned
with the interactions between computers and human (natural) languages, in particular how to program
computers to process and analyze large amounts of natural language data.
Sentiment analysis is another primary use case for NLP. Using sentiment analysis, data scientists can assess
comments on social media to see how their business's brand is performing, for example, or review notes
from customer service teams to identify areas where people want the business to perform better.
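As a toy illustration of the idea (a hand-made word list, not a real sentiment model):

POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "slow", "hate", "terrible", "broken"}

def sentiment(comment):
    # Count positive vs. negative words in the comment.
    words = comment.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this brand, great service"))     # -> positive
print(sentiment("terrible support and slow delivery"))   # -> negative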
Google and other search engines base their machine translation technology on NLP deep learning models
(Deep Learning is an artificial intelligence function that imitates the workings of the human brain in
processing data and creating patterns for use in decision making. Deep learning is a subset of machine
learning in Artificial Intelligence (AI) that has networks capable of learning unsupervised from data that is
unstructured or unlabeled.). This allows algorithms to read text on a webpage, interpret its meaning and
translate it to another language.
Natural Language Processing (NLP) refers to the AI method of communicating with intelligent systems using a natural language such as English. Processing of natural language is required when you want an intelligent system like a robot to perform as per your instructions, when you want to hear a decision from a dialogue-based clinical expert system, etc. The field of NLP involves making computers perform useful tasks with the natural languages humans use. The input and output of an NLP system can be −
• Speech
• Written Text
COMPONENTS OF NLP
There are two components of NLP, as given below −
• Natural Language Understanding (NLU)
• Natural Language Generation (NLG)
FIG 1.17: An NLP system: natural language text input passes through a parser (supported by a dictionary) into a knowledge representation system, and an output translator produces natural language text or computer-language output.
DIFFICULTIES IN NLU
NL has an extremely rich form and structure.
It is very ambiguous. There can be different levels of ambiguity −
• Lexical ambiguity − It occurs at a very primitive level, such as the word level.
• For example, should the word “board” be treated as a noun or a verb?
• Syntax-level ambiguity − A sentence can be parsed in different ways.
• For example, “He lifted the beetle with red cap.” − Did he use a cap to lift the beetle, or did he lift a beetle that had a red cap?
• Referential ambiguity − Referring to something using pronouns. For example, Rima went to Gauri. She said, “I am tired.” − Exactly who is tired?
• One input can have different meanings.
• Many inputs can mean the same thing.
NLP Terminology
• Phonology − It is the study of organizing sounds systematically.
• Morphology − It is the study of the construction of words from primitive meaningful units.
• Morpheme − It is a primitive unit of meaning in a language.
• Syntax − It refers to arranging words to make a sentence. It also involves determining the
structural role of words in the sentence and in phrases.
• Semantics − It is concerned with the meaning of words and how to combine words into
meaningful phrases and sentences.
• Pragmatics − It deals with using and understanding sentences in different situations and how the
interpretation of the sentence is affected.
• Discourse − It deals with how the immediately preceding sentence can affect the interpretation of
the next sentence.
• World Knowledge − It includes the general knowledge about the world.
Steps in NLP
There are, in general, five steps −
• Lexical Analysis − It involves identifying and analyzing the structure of words. The lexicon of a language means the collection of words and phrases in that language. Lexical analysis divides the whole chunk of text into paragraphs, sentences, and words.
• Syntactic Analysis (Parsing) − It involves analysis of the words in the sentence for grammar, and arranging the words in a manner that shows the relationships among them. A sentence such as “The school goes to boy” is rejected by an English syntactic analyzer.
• Semantic Analysis − It draws the exact or dictionary meaning from the text, checking that it is meaningful.
• Discourse Integration − The meaning of any sentence depends upon the meaning of the sentence just before it.
• Pragmatic Analysis − During this, what was said is re-interpreted according to what it actually meant. It involves deriving those aspects of language which require real-world knowledge.
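To make the first step concrete, here is a minimal sketch of lexical analysis: dividing raw text into sentences and then words using naive regular expressions (real tokenizers handle abbreviations, quotes, etc.; this is for illustration only):

import re

text = "The bird pecks the grains. The school goes to boy."
# Split into sentences after sentence-final punctuation, then into words.
sentences = re.split(r"(?<=[.!?])\s+", text.strip())
tokens = [re.findall(r"[A-Za-z']+", s) for s in sentences]
print(tokens)
# [['The', 'bird', 'pecks', 'the', 'grains'], ['The', 'school', 'goes', 'to', 'boy']]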
CONTEXT-FREE GRAMMAR
It is a grammar that consists of rules with a single symbol on the left-hand side of the rewrite rules. Let us create a grammar to parse the sentence “The bird pecks the grains”.
The parse tree breaks down the sentence into structured parts so that the computer can easily understand
and process it. In order for the parsing algorithm to construct this parse tree, a set of rewrite rules, which
describe what tree structures are legal, need to be constructed.
These rules say that a certain symbol may be expanded in the tree by a sequence of other symbols.
For example, if there are two strings, a Noun Phrase (NP) and a Verb Phrase (VP), then the string formed by NP followed by VP is a sentence. The rewrite rules for the sentence are as follows:
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
Lexicon −
DET → a | the
ADJ → beautiful | perching
N → bird | birds | grain | grains
V → peck | pecks | pecking
Now consider the above rewrite rules. Since V can be replaced by both "peck" and "pecks", sentences such as "The bird peck the grains" with incorrect subject-verb agreement are also permitted.
TOP-DOWN PARSER
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of terminal symbols that matches the classes of the words in the input sentence, until it consists entirely of terminal symbols. These are then checked against the input sentence to see if they match. If not, the process is started over again with a different set of rules. This is repeated until a specific rule is found which describes the structure of the sentence.
Merit − It is simple to implement.
Demerits
• It is inefficient, as the search process has to be repeated if an error occurs.
• It works slowly.
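A minimal recursive-descent (top-down) parser for the grammar above, written from scratch in Python for illustration: each nonterminal tries its rewrite rules in order and backtracks on failure, exactly as described.

GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP":  [["V", "NP"]],
    "DET": ["a", "the"],
    "ADJ": ["beautiful", "perching"],
    "N":   ["bird", "birds", "grain", "grains"],
    "V":   ["peck", "pecks", "pecking"],
}

def parse(symbol, words, i):
    # Try each rewrite rule for `symbol` at position i; yield every position
    # the rule can consume up to (backtracking happens automatically).
    for rule in GRAMMAR[symbol]:
        if isinstance(rule, str):              # terminal: must match the word
            if i < len(words) and words[i] == rule:
                yield i + 1
        else:                                  # nonterminal: expand each part
            positions = [i]
            for part in rule:
                positions = [k for j in positions for k in parse(part, words, j)]
            yield from positions

def accepts(sentence):
    words = sentence.lower().split()
    return len(words) in parse("S", words, 0)

print(accepts("The bird pecks the grains"))   # -> True
print(accepts("The bird peck the grains"))    # -> True (no agreement check)
print(accepts("Grains peck the bird the"))    # -> False

Note that, as observed above, this parser also accepts "The bird peck the grains", since nothing in the grammar enforces subject-verb agreement.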
Computer (machine-level) language is complex for humans to understand, and for computers, understanding natural language is equally difficult at present. In order to understand natural language a computer must know how to generate, understand and translate it. Thus, a natural language computer must have a parser, a knowledge representation system and an output translator, as shown in FIG 1.17. The computer has to be provided with an understanding of the domain the text is about, and this is presently possible only for very limited domains. A robot capable of understanding speech in a natural language would be of immense importance, for it could execute any task verbally communicated to it. The phonetic typewriter, which prints the words pronounced by a person, is another recent invention where speech is employed in a commercial application.
1.7.4 MAIN NLP APPLICATIONS
Automatic translation
• text-to-text translation (web, email)
• speech-to-speech translation (telephone, phrasebook)
• assistive technologies: speech-to-subtitles, speech-to-sign-language
Human-computer dialogue
• text dialogue systems (SHRDLU, Eliza, chatbots, web helper agents)
• spoken dialogue systems (call centres, in-car systems, Apple SIRI)
• multi-modal systems (smartphone, information desks, avatars/talking heads)
Question answering
• given a human-language question, determine its answer
• the IBM Watson system won Jeopardy in February 2011
Text mining
• web search
• summarization
• categorization
• entity/relation recognition
• sentiment analysis
Accessibility
1. visually impaired:
• speech synthesis: screen readers, VoiceXML
• speech recognition: dictation, dialogue systems
• automatic Braille terminals
2. hearing impaired:
• speech recognition and synthesis
• sign language recognition and synthesis
• real-time sign language translation of TV programs
3. elderly:
• can have problems with seeing, hearing, short-term memory, fine motor skills, loneliness
• possible NLP technologies: speech recognition and synthesis, automatic
summarisation, dialogue systems, chatbots
4. communicative disorders:
• alternative and augmentative communication (AAC)
• speech and dialogue technologies can help communicating with the society
Newspaper headlines
Newspaper headlines are especially prone to ambiguities, since they often lack function words.
• Infant abducted from hospital safe --- lexical ambiguity (safe)
• British left waffles on Falklands --- lexical amb. (left, waffles)
• Jails for women in need of a facelift --- structural amb. (in need)
• Enraged cow injures farmer with axe --- structural (with axe)
• Stolen painting found by tree --- word sense (by)
• Miners refuse to work after death --- reference (after death)
• Jail releases upset judges --- lexical (releases, upset)
• Drunk gets nine months in violin case --- word sense (case)
• Teacher strikes idle kids --- lexical (strikes)
• Squad helps dog bite victim --- lexical (bite)
• Prostate cancer more common in men --- reference (more common)
• Smithsonian may cancel bombing of Japan exhibits --- structural (exhibits)
• Juvenile court to try shooting defendant --- lexical (try)
• Two sisters reunited after 18 years in checkout counter --- structural (in counter)
• Two Soviet ships collide, one dies --- reference (one)
• Taxiförare dödade man med bil (Swedish: "Taxi driver killed man with car") --- structural amb. (med bil, "with car")
• Förbud mot droger utan verkan (Swedish: "Ban on drugs without effect") --- structural amb. (utan verkan, "without effect")
Phonological ambiguity
• "Eye halve a spelling checker
It came with my pea sea
It plainly marks four my revue
Miss steaks eye kin knot sea."
Lexical ambiguity
1. one word -- several meanings = word senses
• "by" is a preposition with 8 senses (New Oxford American Dictionary)
• "case" is a noun with 4 senses
2. different words -- same spelling (or pronunciation)
• "safe" is a noun and an adjective
• "left" is a noun, an adjective and past tense of the verb "leave"
3. there is no general consensus on when we have one word with several senses, or different words
4. most lexical ambiguities automatically lead to structural differences:
• ((jail) releases (upset judges)) vs. ((jail releases) upset (judges))
• ((time) flies (like an arrow)) vs. ((fruit flies) like (a banana))
Structural ambiguity
1. Attachment ambiguity
• adjectives: "Tibetan history teacher"; "old men and women"
• prepositions: "I once shot an elephant in my pajamas. How he got into my pajamas, I'll
never know." (Groucho Marx)
• "I saw the man with the telescope" / "I saw the man with the dog"
2. Garden path sentences
• "the horse raced past the barn fell"
• "the old man the boat"
• "the complex houses married and single soldiers and their families"
Semantic ambiguity
1. Quantifier scope:
• "every man loves a woman" / "some woman admires every man"
• "no news is good news" / "no war is a good war"
• "too many cooks spoil the soup" / "too many parents spoil their children"
• "in New York City, a pedestrian is hit by a car every ten minutes."
2. Pronoun scope:
• "Mary told her mother that she was pregnant."
3. Ellipsis:
• "Kim noticed two typos before Lee did." --- did Lee notice the same typos?
• "Eva worked hard and passed the exam. Adam too." --- what did Adam do?
Pragmatic ambiguity
1. Speech-act ambiguity:
• "Do you know the time?" --- "yes"
• "Can you close the window?" --- "sure I can, I'm already five years old"
2. Contextual ambiguity:
• "you have a green light"
• if you are in a car, then perhaps the traffic light has changed
• if you are talking to your boss at work, then perhaps you can go ahead with your project
• or, there could be a green lamp somewhere in your room