Ai Document
Submitted in partial fulfilment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
CERTIFICATE
DECLARATION
We hereby declare that this Project entitled “ARTIFICIAL INTELLIGENCE” being submitted to
the Department of COMPUTER SCIENCE AND ENGINEERING, Approved by AICTE,
Accredited by NAAC, affiliated to JNTU, KAKINADA for the award of Bachelor of
Technology, is a record of bonafide work done by us and it has not been submitted to any other
Institute or University for the award of any other degree or prize.
Place:
Date:
SUBMITTED BY:
V.ROHITH 22AR1A0585
ACKNOWLEDGEMENT
We express our deep sense of gratitude to DR. NALABOTHU VENKATA RAO, B.V.SC
Chairman of SAI TIRUMALA NVR ENGINEERING COLLEGE Narasaraopet, for creating
an excellent academic atmosphere and providing good infrastructural facilities to us. We express our deep-felt
gratitude to the management of TIRUMALA INSTITUTE OF TECHNOLOGY AND SCIENCES for
helping us in the successful completion of the project.
We are grateful to our Director Prof. Dr. Y. V. NARAYANA B. Tech, M. Tech, Ph. D. for
providing all the necessary facilities for the completion of this project in a specified time.
We are grateful to our principal, Prof. Dr. K. PRASADA RAO. M. Tech, Ph. D for
providing us with all the necessary facilities for the completion of this project in the
specified time.
We express our sincere gratitude to Dr. K. PRASADA RAO. M. Tech, Ph. D Professor and
Head of the Department of Computer Science & Engineering, for his support and
encouragement during the period of project work.
We express our earnest thanks to other faculty members of CSE for extending their helping
hands and valuable suggestions when in need. We express our special thanks to all the
library staff and non-teaching staff of SAI TIRUMALA NALABOTHU VENKATA
RAO ENGINEERING COLLEGE, for providing the necessary library facilities. Last but
not least, we express our heartfelt thanks to all our friends for our endeavour to complete the
project successfully.
V.ROHITH 22AR1A0585
Index
Introduction
Applications of AI
State space representation
Artificial Neural Networks
AI Problem Solving
Machine Learning
Deep Learning
Image Classification
Image Recognition
Data Science
Linear Algebra
Calculus in AI
Bayes’ Theorem
Markov’s Decision Process
Utility Theorem & Functions
Reinforcement Learning
Passive, Active & Q Learning
Introduction
Artificial intelligence (AI) is the ability of a digital computer or computer-controlled
robot to perform tasks commonly associated with intelligent beings. The term is
frequently applied to the project of developing systems endowed with the intellectual processes
characteristic of humans, such as the ability to reason, discover meaning, generalize, or learn
from past experience. Since the development of the digital computer in the 1940s, it has been
demonstrated that computers can be programmed to carry out very complex tasks—as, for
example, discovering proofs for mathematical theorems or playing chess—with great
proficiency. Still, despite continuing advances in computer processing speed and memory
capacity, there are as yet no programs that can match human flexibility over wider domains or
in tasks requiring much everyday knowledge. On the other hand, some programs have attained
the performance levels of human experts and professionals in performing certain specific tasks,
so that artificial intelligence in this limited sense is found in applications as diverse as
medical diagnosis, computer search engines, and voice or handwriting recognition.
Psychologists generally do not characterize human intelligence by just one trait but by
the combination of many diverse abilities. Research in AI has focused chiefly on the following
components of intelligence: learning, reasoning, problem solving, perception, and using
language.
Learning
There are a number of different forms of learning as applied to artificial intelligence.
The simplest is learning by trial and error. For example, a simple computer program for solving
mate-in-one chess problems might try moves at random until mate is found. The program might
then store the solution with the position so that the next time the computer encountered the
same position it would recall the solution. This simple memorizing of individual items and
procedures—known as rote learning—is relatively easy to implement on a computer. More
challenging is the problem of implementing what is called generalization. Generalization
involves applying past experience to analogous new situations. For example, a program that
learns the past tense of regular English verbs by rote will not be able to produce the past tense
of a word such as jump unless it previously had been presented with jumped, whereas a
program that is able to generalize can learn the “add ed” rule and so form the past tense
of jump based on experience with similar verbs.
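The contrast between rote learning and generalization can be sketched in a few lines of Python. This is only an illustration: the stored verbs and the function names are assumptions made for the example, and the "add ed" rule is hard-coded rather than learned from data.

```python
# Rote learner: can only recall verb forms it has already been shown.
# The stored verbs below are illustrative assumptions.
rote_memory = {"walk": "walked", "talk": "talked"}

def rote_past_tense(verb):
    """Return the memorized past tense, or None if the verb was never seen."""
    return rote_memory.get(verb)

def generalized_past_tense(verb):
    """Apply the 'add ed' rule for regular verbs to any verb, seen or unseen."""
    return verb + "ed"

print(rote_past_tense("jump"))         # None: "jumped" was never presented
print(generalized_past_tense("jump"))  # jumped
```

The rote learner fails on any verb outside its memory, while the generalizing version handles new regular verbs (and would, of course, be wrong for irregular ones).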
Reasoning
To reason is to draw inferences appropriate to the situation. Inferences are classified as
either deductive or inductive. An example of the former is, “Fred must be in either the museum
or the café. He is not in the café; therefore he is in the museum,” and of the latter, “Previous
accidents of this sort were caused by instrument failure; therefore this accident was caused by
instrument failure.” The most significant difference between these forms of reasoning is that
in the deductive case the truth of the premises guarantees the truth of the conclusion, whereas
in the inductive case the truth of the premise lends support to the conclusion without giving
absolute assurance. Inductive reasoning is common in science, where data are collected and
tentative models are developed to describe and predict future behaviour—until the appearance
of anomalous data forces the model to be revised. Deductive reasoning is common
in mathematics and logic, where elaborate structures of irrefutable theorems are built up from
a small set of basic axioms and rules.
There has been considerable success in programming computers to draw inferences,
especially deductive inferences. However, true reasoning involves more than just drawing
inferences; it involves drawing inferences relevant to the solution of the particular task or
situation. This is one of the hardest problems confronting AI.
Problem solving
Problem solving, particularly in artificial intelligence, may be characterized as a
systematic search through a range of possible actions in order to reach some predefined goal
or solution. Problem-solving methods divide into special purpose and general purpose. A
special-purpose method is tailor-made for a particular problem and often exploits very specific
features of the situation in which the problem is embedded. In contrast, a general-purpose
method is applicable to a wide variety of problems.
One general-purpose technique used in AI is means-end analysis—a step-by-step,
or incremental, reduction of the difference between the current state and the final goal. The
program selects actions from a list of means—in the case of a simple robot this might consist
of PICKUP, PUTDOWN, MOVEFORWARD, MOVEBACK, MOVELEFT, and
MOVERIGHT—until the goal is reached.
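As a rough sketch of means-end analysis, the Python below moves a hypothetical robot on an unobstructed grid by always choosing the action that most reduces the difference between the current position and the goal. The grid coordinates, the use of Manhattan distance as the "difference", and the omission of PICKUP and PUTDOWN are simplifying assumptions for this example.

```python
# Hypothetical robot on an open 2-D grid: a state is an (x, y) position.
MOVES = {
    "MOVERIGHT": (1, 0),
    "MOVELEFT": (-1, 0),
    "MOVEFORWARD": (0, 1),
    "MOVEBACK": (0, -1),
}

def difference(state, goal):
    """Manhattan distance between the current state and the goal."""
    return abs(goal[0] - state[0]) + abs(goal[1] - state[1])

def apply_move(state, delta):
    """Return the state reached by applying a move offset."""
    return (state[0] + delta[0], state[1] + delta[1])

def means_end_plan(start, goal):
    """Repeatedly pick the action that most reduces the remaining difference."""
    state, plan = start, []
    while difference(state, goal) > 0:
        name, delta = min(
            MOVES.items(),
            key=lambda item: difference(apply_move(state, item[1]), goal),
        )
        state = apply_move(state, delta)
        plan.append(name)
    return plan

print(means_end_plan((0, 0), (2, 1)))
```

On an open grid some move always reduces the distance, so the loop terminates; with obstacles, a purely greedy reduction like this could get stuck and a search procedure would be needed.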
Many diverse problems have been solved by artificial intelligence programs. Some examples
are finding the winning move (or sequence of moves) in a board game, devising mathematical
proofs, and manipulating “virtual objects” in a computer-generated world.
Perception
In perception the environment is scanned by means of various sensory organs, real or
artificial, and the scene is decomposed into separate objects in various spatial relationships.
Analysis is complicated by the fact that an object may appear different depending on the angle
from which it is viewed, the direction and intensity of illumination in the scene, and how much
the object contrasts with the surrounding field.
At present, artificial perception is sufficiently well advanced to enable optical sensors
to identify individuals, autonomous vehicles to drive at moderate speeds on the open road, and
robots to roam through buildings collecting empty soda cans. One of the earliest systems
to integrate perception and action was FREDDY, a stationary robot with a moving television
eye and a pincer hand, constructed at the University of Edinburgh, Scotland, during the period
1966–73 under the direction of Donald Michie. FREDDY was able to recognize a variety of
objects and could be instructed to assemble simple artifacts, such as a toy car, from a random
heap of components.
Applications of AI
Marketing
Banking
Finance
Agriculture
HealthCare
Gaming
Space Exploration
Autonomous Vehicles
Chatbots
Artificial Creativity
AI Problem Solving
Solving any kind of problem requires certain systematic steps to be followed, and the same
holds when AI agents solve problems. Have a look at the flowchart below. This
flowchart shows the basic steps followed by an AI agent in order to solve
a problem in an AI environment.
Goal Formulation–In this step, as soon as a problem arises, the agent sets a goal or a
target. This requires the agent to promptly analyse and define the problem. This is a crucial
step: if the goal for the problem is wrongly formulated, then all the steps taken in order to
reach the goal would be of no use.
Problem Definition–This is one of the main steps of problem solving. Whenever
a problem arises, the agent decides here what actions must be taken to reach the
formulated goal. This is done in the following steps:
➢ Defining the State Space–A state space can be defined as a collection of all the valid
states in which an agent can be in when finding a solution to the problem.
➢ Defining Initial State–In order for an agent to start solving the problem, it needs to
start from a state. The first state from where the agent starts working is referred to as
the initial state.
➢ Gather Knowledge–Now the agent gathers information and uses the information
required by it to solve the problem. This knowledge will be gathered with past
experiences as well as current learnings.
➢ Planning the Transitions–Some problems are small and can be solved easily,
but most problems need proper planning and execution.
This requires choosing suitable data structures and control strategies well in
advance.
Here the problem is divided into sub problems. The results of the various actions taken in
solving the previous sub problem are forwarded to the next sub problem and the combined
effect of the sub problems leads to the final solution. This requires proper planning and
execution of transitions.
Testing the Goal State–In this step, the results yielded by the agent are compared
with the goal state. If the goal has been reached, then the agent stops any further actions
and the problem reaches its final state. But if the goal is not reached, then the agent continues to
find actions to reach the goal.
➢ Calculating the Cost of Path taken–Whenever an agent takes a path in order to solve
a problem it allots a numeric value (or cost) to that path. These values are then evaluated
using a cost function. The evaluated result is hence used in the agent’s performance
measure. The solution which is reached with the minimum or lowest cost of path is
termed as an optimal solution.
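The steps above (state space, initial state, goal test and path cost) can be sketched with a uniform-cost search, which always expands the cheapest path found so far and therefore returns a lowest-cost solution when step costs are non-negative. The small graph and the function name are assumptions made for this example; the report itself does not prescribe a particular search algorithm.

```python
import heapq

def lowest_cost_path(graph, start, goal):
    """Uniform-cost search: return (cost, path) for the cheapest path, or None."""
    frontier = [(0, start, [start])]   # (path cost so far, state, path taken)
    best_cost = {start: 0}
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state == goal:              # goal test
            return cost, path
        if cost > best_cost.get(state, float("inf")):
            continue                   # a cheaper route to this state was already found
        for nxt, step_cost in graph.get(state, {}).items():
            new_cost = cost + step_cost
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return None

# Hypothetical state space: each state maps to its successors and step costs.
graph = {"A": {"B": 1, "C": 5}, "B": {"C": 1, "D": 4}, "C": {"D": 1}, "D": {}}
print(lowest_cost_path(graph, "A", "D"))
```

Here the direct-looking routes A-C-D (cost 6) and A-B-D (cost 5) are both beaten by A-B-C-D (cost 3), which is the optimal solution in the sense described above.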
Artificial Neural Networks
Artificial neural networks (ANNs) began as an attempt to exploit the architecture of the human brain to perform
tasks that conventional algorithms had little success with. They soon reoriented towards
improving empirical results, mostly abandoning attempts to remain true to their biological
precursors. Neurons are connected to each other in various patterns, to allow the output of some
neurons to become the input of others. The network forms a directed, weighted graph.
An artificial neural network consists of a collection of simulated neurons. Each neuron
is a node which is connected to other nodes via links that correspond to biological axon-
synapse-dendrite connections. Each link has a weight, which determines the strength of one
node's influence on another.
Artificial neurons
ANNs are composed of artificial neurons which are conceptually derived from
biological neurons. Each artificial neuron has inputs and produces a single output which can
be sent to multiple other neurons.[47] The inputs can be the feature values of a sample of external
data, such as images or documents, or they can be the outputs of other neurons. The outputs of
the final output neurons of the neural net accomplish the task, such as recognizing an object in
an image.
To find the output of the neuron, first we must take the weighted sum of all the inputs,
weighted by the weights of the connections from the inputs to the neuron. We add a bias term
to this sum. This weighted sum is sometimes called the activation. This weighted sum is then
passed through a (usually nonlinear) activation function to produce the output. The initial
inputs are external data, such as images and documents. The ultimate outputs accomplish the
task, such as recognizing an object in an image.
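The computation of a single artificial neuron described above can be written out directly. The sketch below assumes a sigmoid activation function and made-up input values and weights; any other (usually nonlinear) activation could be passed in instead.

```python
import math

def sigmoid(z):
    """A common nonlinear activation function, used here as an assumption."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Illustrative values: the weighted sum is 1.0*0.5 + 2.0*(-0.25) + 0.0 = 0.0
print(neuron_output([1.0, 2.0], [0.5, -0.25], bias=0.0))
```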
Machine Learning
Machine learning (ML) is a type of artificial intelligence (AI) that allows software
applications to become more accurate at predicting outcomes without being explicitly
programmed to do so. Machine learning algorithms use historical data as input to predict new
output values. Recommendation engines are a common use case for machine learning. Other
popular uses include fraud detection, spam filtering, malware threat detection, business process
automation (BPA) and predictive maintenance.
Machine learning is important because it gives enterprises a view of trends in customer
behaviour and business operational patterns, as well as supports the development of new
products. Many of today's leading companies, such as Facebook, Google and Uber, make
machine learning a central part of their operations. Machine learning has become a significant
competitive differentiator for many companies.
Types of Machine Learning
Classical machine learning is often categorized by how an algorithm learns to become more
accurate in its predictions. There are four basic approaches:
supervised learning, unsupervised learning, semi-supervised learning and reinforcement
learning. The type of algorithm data scientists choose to use depends on what type of data they
want to predict.
Supervised learning: In this type of machine learning, data scientists supply
algorithms with labeled training data and define the variables they want the algorithm to assess
for correlations. Both the input and the output of the algorithm are specified.
Unsupervised learning: This type of machine learning involves algorithms that train
on unlabeled data. The algorithm scans through data sets looking for any meaningful
connection. The data that algorithms train on as well as the predictions or recommendations
they output are predetermined.
Semi-supervised learning: This approach to machine learning involves a mix of the
two preceding types. Data scientists may feed an algorithm mostly labeled training data, but
the model is free to explore the data on its own and develop its own understanding of the data
set.
Reinforcement learning: Data scientists typically use reinforcement learning to teach
a machine to complete a multi-step process for which there are clearly defined rules. Data
scientists program an algorithm to complete a task and give it positive or negative cues as it
works out how to complete a task. But for the most part, the algorithm decides on its own what
steps to take along the way.
Deep Learning
Deep learning (also known as deep structured learning) is part of a broader family
of machine learning methods based on artificial neural networks with representation learning.
Learning can be supervised, semi-supervised or unsupervised.
Deep-learning architectures such as deep neural networks, deep belief networks, deep
reinforcement learning, recurrent neural networks and convolutional neural networks have
been applied to fields including computer vision, speech recognition, natural language
processing, machine translation, bioinformatics, drug design, medical image analysis, climate
science, material inspection and board game programs, where they have produced results
comparable to and in some cases surpassing human expert performance.
Deep learning is a modern variation which is concerned with an unbounded number of
layers of bounded size, which permits practical application and optimized implementation,
while retaining theoretical universality under mild conditions. In deep learning the layers are
also permitted to be heterogeneous and to deviate widely from biologically
informed connectionist models, for the sake of efficiency, trainability and understandability,
whence the "structured" part.
Image Classification
We live in the era of data. With the Internet of Things (IoT) and Artificial Intelligence
(AI) becoming ubiquitous technologies, we now have huge volumes of data being generated.
Differing in form, data could be speech, text, image, or a mix of any of these. In the form of
photos or videos, images make up a significant share of global data creation.
Image classification is the task of categorizing and assigning labels to groups of pixels
or vectors within an image according to particular rules. The classification rule can be applied
through one or multiple spectral or textural characterizations.
Image classification techniques are mainly divided into two categories: Supervised and
unsupervised image classification techniques.
Unsupervised classification
Unsupervised classification technique is a fully automated method that does not
leverage training data. This means machine learning algorithms are used to analyze and cluster
unlabeled datasets by discovering hidden patterns or data groups without the need for human
intervention.
With the help of a suitable algorithm, the particular characterizations of an image are
recognized systematically during the image processing stage. Pattern recognition and image
clustering are two of the most common image classification methods used here. Two popular
algorithms used for unsupervised image classification are ‘K-means’ and ‘ISODATA.’
K-means is an unsupervised classification algorithm that groups objects into k groups based
on their characteristics. This grouping is also called clustering. K-means clustering is one of the
simplest and most popular unsupervised machine learning algorithms.
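A minimal pure-Python sketch of k-means is given below to make the idea concrete. It assumes points are tuples of numbers, uses squared Euclidean distance, picks the initial centroids at random from the data, and runs a fixed number of iterations; the sample points and parameter names are illustrative, and real image data would normally be handled with a numerical library.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group points into k clusters; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initial centroids drawn from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

For this well-separated toy data the two groups are recovered regardless of which points are drawn as initial centroids; on real data the result can depend on the initialization.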
ISODATA stands for “Iterative Self-Organizing Data Analysis Technique.” It is an
unsupervised method used for image classification. The ISODATA approach includes iterative
methods that use Euclidean distance as the similarity measure to cluster data elements into
different classes. While k-means assumes that the number of clusters is known a priori (in
advance), the ISODATA algorithm allows the number of clusters to vary.
Supervised classification
Supervised image classification methods use previously classified reference samples
(the ground truth) in order to train the classifier and subsequently classify new, unknown data.
Therefore, the supervised classification technique is the process of visually choosing samples
of training data within the image and allocating them to pre-chosen categories, including
vegetation, roads, water resources, and buildings. This is done to create statistical measures to
be applied to the overall image.
Image Recognition
Image recognition, in the context of machine vision, is the ability of software to identify
objects, places, people, writing and actions in images. Computers can use machine vision
technologies in combination with a camera and artificial intelligence software to achieve image
recognition.
Image recognition is used in many everyday settings. It is used
to perform many machine-based visual tasks, such as labeling the content of images with meta-tags,
performing image content search and guiding autonomous robots, self-driving cars and
accident-avoidance systems.
While human and animal brains recognize objects with ease, computers have difficulty
with the task. Software for image recognition requires deep learning.
Performance is best on convolutional neural net processors as the specific task
otherwise requires massive amounts of power for its compute-intensive nature. Image
recognition algorithms can function by use of comparative 3D models, appearances from
different angles using edge detection or by components. Image recognition algorithms are often
trained on millions of pre-labeled pictures with guided computer learning.
Current and future applications of image recognition include smart photo libraries,
targeted advertising, the interactivity of media, accessibility for the visually impaired and
enhanced research capabilities.
Data Science: the impact of Statistics
Statistics influences Data Science in many ways. This section addresses the impact
of statistics on such steps as data acquisition and enrichment, data exploration, data analysis
and modeling, validation, and representation and reporting.
Statistical data Analysis
Finding structure in data and making predictions are the most important steps in Data
Science. Here, in particular, statistical methods are essential since they are able to handle many
different analytical tasks. Important examples of statistical data analysis methods are the
following.
➢ Hypothesis testing
➢ Classification
➢ Regression
➢ Time series analysis
Linear Algebra
How Linear Algebra is used in AI
Sub-fields of AI in which Linear Algebra is used
Artificial Intelligence is not a single subject; it has sub-fields such as Learning (Machine
Learning & Deep Learning), Communication using NLP, Knowledge Representation &
Reasoning, Problem Solving, and Uncertain Knowledge & Reasoning.
Data representation and data processing are not sub-areas of AI in themselves; they are used in the
ML, DL & NLP areas.
In the above diagram, linear algebra (LA) objects are also used in other sub-areas such as problem solving,
knowledge representation and reasoning, but not as much as in Learning (ML/DL) and NLP.
Calculus in AI
Calculus, as a rigorous mathematics course, can provide strict and meticulous logical
thinking for AI. It can model objective problems with mathematical knowledge related to
calculus. At the same time, it can solve AI problems by introducing fuzzy mathematics,
optimization theory or linear algebra.
Calculus is one of the core mathematical concepts in machine learning that permits us
to understand the internal workings of different machine learning algorithms.
One of the important applications of calculus in machine learning is the gradient descent
algorithm, which, in tandem with backpropagation, allows us to train a neural network model.
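To show where calculus enters, here is a small sketch of gradient descent on a one-variable function. The function f(x) = (x - 3)^2, its derivative f'(x) = 2(x - 3), the learning rate and the number of steps are all assumptions chosen for illustration; in a neural network the gradient would instead come from backpropagation.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a minimum."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose derivative is f'(x) = 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # very close to 3, the minimizer of f
```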
Bayes’ Theorem
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which
determines the probability of an event with uncertain knowledge.
In probability theory, it relates the conditional probability and marginal probabilities of
two random events.
Bayes' theorem was named after the British mathematician Thomas Bayes.
The Bayesian inference is an application of Bayes' theorem, which is fundamental to Bayesian
statistics.
It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
Bayes' theorem allows updating the probability prediction of an event by observing new
information of the real world.
Example: If cancer corresponds to one's age then by using Bayes' theorem, we can determine
the probability of cancer more accurately with the help of age.
Bayes' theorem can be derived using the product rule and the conditional probability of event
A with known event B.
From the product rule we can write:
P(A ⋀ B) = P(A|B) P(B)
Similarly, for the probability of event B with known event A:
P(A ⋀ B) = P(B|A) P(A)
Equating the right-hand sides of both equations and dividing by P(B), we get:
P(A|B) = P(B|A) P(A) / P(B)    ...(a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis
of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate. It is read as the probability of
hypothesis A given that evidence B has occurred.
P(B|A) is called the likelihood: assuming that the hypothesis is true, we
calculate the probability of the evidence.
P(A) is called the prior probability, the probability of the hypothesis before considering the
evidence.
P(B) is called the marginal probability, the probability of the evidence alone.
In the equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai), hence Bayes'
rule can be written as:
P(Ai|B) = P(Ai) P(B|Ai) / Σk P(Ak) P(B|Ak)
where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
Applying Bayes' rule:
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A).
This is very useful in cases where we have good estimates of these three terms and want to
determine the fourth one. Suppose we perceive the effect of some unknown cause and
want to compute the probability of that cause; then Bayes' rule becomes:
P(cause|effect) = P(effect|cause) P(cause) / P(effect)
Example-1:
Question: What is the probability that a patient with a stiff neck has meningitis?
Given Data:
A doctor is aware that meningitis causes a patient to have a stiff neck
80% of the time. He is also aware of some more facts, which are given as follows:
The known probability that a patient has meningitis is 1/30,000.
The known probability that a patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient
has meningitis, so we can calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133 = 1/750
Hence, about 1 patient out of 750 patients with a stiff neck has meningitis.
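The same calculation can be checked with a few lines of Python. The function and variable names are chosen for this example only; the numbers are the ones given in Example-1.

```python
def bayes_posterior(likelihood, prior, evidence):
    """Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence

p_stiff_given_meningitis = 0.8   # P(a|b)
p_meningitis = 1 / 30000         # P(b)
p_stiff_neck = 0.02              # P(a)

p_meningitis_given_stiff = bayes_posterior(
    p_stiff_given_meningitis, p_meningitis, p_stiff_neck
)
print(p_meningitis_given_stiff)  # approximately 0.00133, i.e. 1/750
```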
Markov’s Decision Process
• Markov chain - used by systems that are autonomous and have fully observable
states
• Hidden Markov model - used by systems that are autonomous where the state is
partially observable.
Markov models are named after their creator, Andrey Markov, a Russian mathematician
who worked in the late 1800s and early 1900s.
Utility Theorem & Functions
• Dominance: If there are two choices, say A and B, and A is more effective than B,
then A will be chosen. Thus, A dominates B. A multi-attribute
utility function offers two types of dominance:
• Strict Dominance: If there are two websites T and D, where the cost of T is less and
provides better service than D. Obviously, the customer will prefer T rather than D.
Therefore, T strictly dominates D. Here, the attribute values are known.
• Stochastic Dominance: It is a generalized approach where the attribute values are
unknown, which frequently occurs in real problems. Here, a uniform distribution is given,
and the choice that stochastically dominates the other choices is picked. The exact
relationship can be viewed by examining the cumulative distributions of the attributes.
• Preference Structure: Representation theorems are used to show that an agent with a
preference structure has a utility function as:
• Preference without uncertainty: The preference where two attributes are preferentially
independent of the third attribute. It is because the preference between the outcomes of
the first two attributes does not depend on the third one.
• Preference with uncertainty: This refers to the concept of preference structure with
uncertainty. Here, utility independence extends preference independence:
a set of attributes X is utility independent of another set of attributes Y only if the value
of the attributes in X is independent of the values of the attributes in Y. A set is said to be mutually
utility independent (MUI) if each subset is utility-independent of the remaining
attributes.
Reinforcement Learning
Reinforcement learning is an area of Machine Learning. It is about taking suitable
action to maximize reward in a particular situation. It is employed by various software and
machines to find the best possible behavior or path it should take in a specific situation.
Reinforcement learning differs from supervised learning in a way that in supervised learning
the training data has the answer key with it so the model is trained with the correct answer
itself whereas in reinforcement learning, there is no answer but the reinforcement agent
decides what to do to perform the given task. In the absence of a training dataset, it is bound
to learn from its experience.
Example: We have an agent and a reward, with many
hurdles in between. The agent is supposed to find the best possible path to reach the reward.
The following example illustrates the problem.
The above image shows the robot, diamond, and fire. The goal of the robot is to get
the reward, which is the diamond, and avoid the hurdles, which are fire. The robot learns by trying
all the possible paths and then choosing the path which gives it the reward with the fewest
hurdles. Each right step gives the robot a reward and each wrong step subtracts from the
robot's reward. The total reward is calculated when it reaches the final reward, which
is the diamond.
• Input: The input should be an initial state from which the model will start
• Output: There are many possible outputs as there are a variety of solutions to a
particular problem
• Training: The training is based upon the input. The model will return a state and
the user will decide to reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
In Direct Utility Estimation, the agent executes a sequence of trials or runs (sequences of
state-action transitions that continue until the agent reaches the terminal state). Each trial gives a
sample value, and the agent estimates the utility based on the sample values. The estimates can be calculated
as running averages of the sample values. The main drawback is that this method wrongly
assumes that state utilities are independent, while in reality they are Markovian. Also, it is
slow to converge.
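The averaging described above can be sketched as follows. Each trial is assumed to be a list of (state, reward) pairs in the order visited, and the sample value of a state is taken to be the discounted sum of rewards from that state to the end of the trial; the state names, rewards and the `gamma` parameter are assumptions made for this example.

```python
from collections import defaultdict

def direct_utility_estimation(trials, gamma=1.0):
    """Estimate each state's utility as the average observed reward-to-go."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for trial in trials:
        reward_to_go = 0.0
        # Walk backwards so reward_to_go accumulates from the terminal state.
        for state, reward in reversed(trial):
            reward_to_go = reward + gamma * reward_to_go
            totals[state] += reward_to_go
            counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}

# Two hypothetical runs ending in terminal state "T".
trials = [
    [("A", -1), ("B", -1), ("T", 10)],
    [("A", -1), ("T", 10)],
]
print(direct_utility_estimation(trials))
```

State "A" receives samples 8 and 9 from the two runs, so its estimate is their average, 8.5. Each state is averaged on its own, which is exactly the independence assumption criticized above.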
Suppose we have a 4x3 grid as the environment in which the agent can move either Left,
Right, Up or Down (the set of available actions). An example of a run
Adaptive Dynamic Programming (ADP) is a smarter method than Direct Utility Estimation, as it runs trials to learn the model
of the environment, estimating the utility of a state as the sum of the reward for being in that state
and the expected discounted utility of being in the next state:
Uπ(s) = R(s) + γ Σs’ P(s’|s, π(s)) Uπ(s’)
where R(s) = reward for being in state s, P(s’|s, π(s)) = transition model, γ = discount factor
and Uπ(s) = utility of being in state s.
It can be solved using the value-iteration algorithm. The algorithm converges fast but can
become quite costly to compute for large state spaces. ADP is a model-based approach and
requires the transition model of the environment. A model-free approach is Temporal Difference
(TD) Learning.
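Once a transition model has been estimated, the utilities for a fixed policy can be computed by repeatedly applying the relationship "reward in the state plus discounted expected utility of the next state". The sketch below is a simple iterative policy evaluation, not a full ADP agent: it assumes the rewards and transition probabilities are already known, and the two-state model, the names and the number of sweeps are illustrative assumptions.

```python
def evaluate_policy(rewards, transitions, gamma=0.9, sweeps=200):
    """Iteratively compute U(s) = R(s) + gamma * sum over s2 of P(s2|s) * U(s2).

    transitions[s] maps each next state to its probability under the fixed
    policy; terminal states map to an empty dict.
    """
    utilities = {s: 0.0 for s in rewards}
    for _ in range(sweeps):
        utilities = {
            s: rewards[s]
            + gamma * sum(p * utilities[s2] for s2, p in transitions[s].items())
            for s in rewards
        }
    return utilities

# Hypothetical model: state "A" always moves to terminal state "T".
rewards = {"A": 0.0, "T": 1.0}
transitions = {"A": {"T": 1.0}, "T": {}}
print(evaluate_policy(rewards, transitions, gamma=0.5))
```

Every sweep touches every state and every successor, which is why this kind of computation becomes costly for large state spaces.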
TD learning does not require the agent to learn the transition model. The update occurs
between successive states, and the agent only updates states that are directly affected.
While ADP adjusts the utility of s using all of its successor states, TD learning adjusts it
using a single successor state s’. TD is slower to converge but much simpler in terms
of computation.
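A single TD update can be sketched as below. It uses the standard TD(0) form, in which the utility of the current state is nudged toward the observed reward plus the discounted utility of the one successor actually reached. The learning rate `alpha`, the default values and the state names are assumptions introduced for this example.

```python
def td_update(utilities, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Move U(state) toward reward + gamma * U(next_state) by a fraction alpha."""
    u_s = utilities.get(state, 0.0)
    u_next = utilities.get(next_state, 0.0)
    utilities[state] = u_s + alpha * (reward + gamma * u_next - u_s)
    return utilities[state]

# One observed transition A -> B with reward 0 (illustrative values).
utilities = {"A": 0.0, "B": 1.0}
td_update(utilities, "A", "B", reward=0.0, alpha=0.5, gamma=1.0)
print(utilities["A"])
```

Only the entry for the state just left is changed, and no transition probabilities are needed, which is what makes the method model-free and cheap per step.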
Active Learning
1) ADP with exploration function
As the goal of an active agent is to learn an optimal policy, the agent needs to learn the
expected utility of each state and update its policy. This can be done using a passive ADP agent,
which can then learn optimal actions using value or policy iteration. But this approach results in a
greedy agent. Hence, we use an approach that gives higher weights to unexplored actions and
lower weights to actions with lower utilities.
Here f(u, n) is the exploration function, which increases with the expected value u and decreases with
the number of tries n.
R+ is an optimistic reward and Ne is the number of times we want an agent to be forced to
pick an action in every state. The exploration function converts a passive agent into an active
one.
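One simple exploration function that matches this description returns the optimistic reward R+ while an action has been tried fewer than Ne times, and the current utility estimate u afterwards. This particular form is a common textbook choice and is offered here as an assumption, not necessarily the exact function intended in the report; the parameter names and default values are illustrative.

```python
def exploration_function(u, n, optimistic_reward=1.0, min_tries=5):
    """Return R+ for under-explored actions, otherwise the estimated utility u.

    optimistic_reward plays the role of R+ and min_tries the role of Ne.
    """
    return optimistic_reward if n < min_tries else u

print(exploration_function(u=0.2, n=3))  # still exploring: optimistic value
print(exploration_function(u=0.2, n=5))  # tried often enough: actual estimate
```

Because untried actions look as good as the best possible outcome, the agent is pushed to try every action in every state at least Ne times before settling on its estimates.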
2) Q-Learning
Q-learning is a TD learning method which does not require the agent to learn the
transition model; instead, it learns the Q-value function Q(s, a).
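The standard Q-learning update can be sketched as follows. The table is stored as a dictionary keyed by (state, action), and the learning rate `alpha`, discount factor `gamma`, state names and action names are assumptions made for this example.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Move Q(s, a) toward reward + gamma * max over a2 of Q(s2, a2)."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]

# Illustrative table: unseen (state, action) pairs default to 0.0.
q = defaultdict(float)
q[("s2", "left")] = 2.0
q_update(q, "s1", "right", reward=1.0, next_state="s2",
         actions=["left", "right"], alpha=0.5, gamma=0.5)
print(q[("s1", "right")])
```

The update uses only the observed transition and the current Q-values of the next state, so no transition model is required; the greedy action in a state is simply the one with the largest Q(s, a).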