Quiz AI1704 Page 2 of 2
Question 11
Answer saved
Marked out of 0.50
How can we estimate the performance gradient with respect to the policy parameter when the gradient depends on the unknown
effect of policy changes on the state distribution?
a. the TD
b. the dynamic programming
https://lms-hcmuni.fpt.edu.vn/mod/quiz/attempt.php?attempt=940899&cmid=56041&page=1 1/8
10:01 18/7/24 Quiz - AI1704 (page 2 of 2)
Question 12
Answer saved
Marked out of 0.50
SARSA is a variant of the Expected SARSA algorithm that enhances learning by taking the expected value of action selection instead of
selecting a single action deterministically.
Select one:
True
False
Question 13
Answer saved
Marked out of 0.50
Select one:
True
False
Question 14
Answer saved
Marked out of 0.50
Imagine the agent is learning in an episodic problem. Which of the following is true?
b. The agent takes the same action at each step during an episode.
c. The number of steps in an episode is stochastic: each episode can have a different number of steps.
Question 15
Answer saved
Marked out of 0.50
Action selection is based on the expected value of all possible actions according to the current policy. It computes the expected value
of all actions and selects actions probabilistically based on their probabilities under the current policy.
a. SARSA
b. Expected SARSA
c. Bellman
d. Deep Learning
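The computation described in this question, an expectation of action values weighted by the current policy's action probabilities, can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the two-action example, and the numeric values are all hypothetical.

```python
def expected_action_value(q_values, policy_probs):
    """Return the sum over actions of pi(a | s) * Q(s, a)."""
    return sum(p * q for p, q in zip(policy_probs, q_values))


def expected_td_target(reward, gamma, q_next, policy_probs_next):
    """One-step target that uses the expectation over next actions
    instead of the value of a single sampled next action."""
    return reward + gamma * expected_action_value(q_next, policy_probs_next)


# Hypothetical numbers: two actions in the next state.
q_next = [1.0, 3.0]          # assumed action-value estimates Q(s', a)
probs_next = [0.25, 0.75]    # assumed policy probabilities pi(a | s')
target = expected_td_target(reward=1.0, gamma=0.5,
                            q_next=q_next, policy_probs_next=probs_next)
```

Averaging over all next actions removes the variance that comes from sampling one next action, at the cost of evaluating every action's value at each step.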
Question 16
Answer saved
Marked out of 0.50
Which algorithm has the step: "Interact with Environment: Sample trajectories by following the current policy in the environment"?
a. Actor Critic
b. Temporal Difference
c. Dynamic programming
d. Monte Carlo
Question 17
Answer saved
Marked out of 0.50
Given a state, the effect of the policy parameter on the actions, and thus on reward, can be computed in a relatively straightforward
way from knowledge of _____________.
a. the gradient
b. the parameterization
c. the value function
d. the algorithm
Question 18
Answer saved
Marked out of 0.50
The one-step algorithm is semi-gradient Expected Sarsa, which involves importance sampling.
Select one:
True
False
Question 19
Answer saved
Marked out of 0.50
Which method directly optimizes the parameters of a parameterized policy in order to maximize the expected cumulative reward obtained by
an agent in an environment?
a. Model
b. Policy Gradient
c. Bellman
d. TD(0)
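Directly optimizing the parameters of a parameterized policy can be illustrated with a small sketch. The example below assumes a softmax policy over two actions in a one-step (bandit) setting and uses a REINFORCE-style update; the function names, the reward means, the noise level, and the step size are all hypothetical choices made for illustration only.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences (the policy parameters) into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(mean_rewards, steps=2000, lr=0.1, seed=0):
    """Adjust policy parameters by stochastic gradient ascent on expected reward."""
    rng = random.Random(seed)
    theta = [0.0] * len(mean_rewards)  # one preference per action
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # Assumed reward model: fixed mean per action plus Gaussian noise.
        reward = mean_rewards[action] + rng.gauss(0.0, 0.1)
        for a in range(len(theta)):
            # For a softmax policy, d log pi(action) / d theta[a]
            # equals 1[a == action] - pi(a).
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr * reward * grad_log
    return softmax(theta)


# Hypothetical two-action problem in which action 1 pays more on average.
final_probs = reinforce_bandit([0.0, 1.0])
```

No value function is learned in this sketch: the parameters move in the direction that makes rewarded actions more probable, so the final policy should favor the higher-paying action.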
Question 20
Answer saved
Marked out of 0.50
a.
b.
c.
d.