
CS 4501: Introduction to Reinforcement Learning

Instructor: Hongning Wang (hw5x)


TA: Wanyu Du (wd5jq)
https://www.cs.virginia.edu/~hw5x/Course/RL2022-Fall/_site/
Department of Computer Science
University of Virginia

1 Course Overview
Be curious, not judgmental.
– Walt Whitman, an American poet, essayist, and journalist.

Reinforcement learning has been widely adopted in our everyday life, from robotics, self-driving
cars, and game playing to intelligent information systems such as search engines and
recommender systems, and many more! It has been extensively studied in multiple disciplines,
including computer science, psychology, neuroscience, optimization, and operations research.
The recent success of DeepMind's AlphaGo has put reinforcement learning under the spotlight
and helped it receive ever-increasing attention from the field.
In this undergraduate-level course, we will discuss the foundations of reinforcement learning,
starting from multi-armed bandits and moving through Markov Decision Processes, planning,
on-policy and off-policy learning, recent developments in the context of deep learning, and
real-world applications. Through our lecture discussions and working on your course projects
with your teammates, you will be able to:

• Get familiar with the history of reinforcement learning research and its focus in different
fields of study, and recognize the boundaries between it and other machine learning
paradigms;
• Understand core reinforcement learning techniques, including dynamic programming,
Monte Carlo methods, temporal-difference methods, function approximation methods, and
off-policy evaluation and learning;
• Go deep into a specific topic in reinforcement learning on your own and present your
thoughts and ideas to your instructor and peers in classroom discussions;
• Team up with others to solve challenging reinforcement learning problems;
• Most importantly, get a sense of basic research activities: formulating a real-world problem
in an abstract and mathematical way, and developing principled solutions for it.

2 Prerequisites
Due to the intrinsic complexity of reinforcement learning, a number of prerequisites are
imposed or assumed. First of all, a good level of familiarity with general machine learning is
needed. As each one of you is expected to work with specific reinforcement learning algorithms
in your final course project, a strong background in machine learning is necessary. For
example, you should know what supervised machine learning is, so you can better appreciate how
reinforcement learning differs from it.
Second, strong mathematical skills will help you gain an in-depth understanding of the
concepts and algorithms discussed in the course and develop your own ideas for new solutions.
You need to be familiar with basic concepts of probability (e.g., probability distributions,
Bayes' theorem, and expectation), linear algebra (e.g., matrix inverse and decomposition), and
calculus (e.g., the Hessian and second-order optimality conditions).
Last but not least, significant programming experience will be helpful, as it lets you focus
on the exciting reinforcement learning algorithms being explored rather than the syntax of
programming languages. It is recommended that you have taken CS 2150 (or higher) and have a good
working familiarity with at least one programming language (Python is highly recommended).
If you are not sure whether you have met these prerequisites, please feel free to contact the instructor.

3 Course Content & Schedule


To give you broad exposure to reinforcement learning techniques, we will cover a variety of
basic elements, techniques, and modern advances in reinforcement learning, though the coverage
can never be exhaustive. The course will mainly be delivered through lecture-style discussions
led by the instructor, but students are highly encouraged to read the recommended papers after
each lecture to broaden the scope and deepen their understanding of reinforcement learning.
You will learn through lectures, in-class paper discussions, homework assignments, and course
projects. Topics to be covered include (the schedule is tentative and subject to change; please
keep track of it on our course website):

1. Introduction (∼2 lectures): We will highlight the basic structure and major topics of this
course, and go over some logistical issues and course requirements.

2. Basic elements of reinforcement learning (∼3 lectures): We will lay down the basic con-
cepts (e.g., reward vs. return, value function vs. policy) and building blocks of rein-
forcement learning, and introduce categorizations of reinforcement learning problems (e.g.,
planning/control vs. learning) and of the corresponding solutions (e.g., model-based vs.
model-free reinforcement learning).

3. Multi-armed bandits (∼4 lectures): They are a good entry point to more complicated reinforce-
ment learning problems, as a bandit can be understood as reinforcement learning with no state,
or with state but no state transitions (a.k.a. contextual bandits). We will focus on the key
challenge in multi-armed bandits, i.e., the explore-exploit trade-off, and introduce classical
solutions that effectively balance this trade-off.
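To make the explore-exploit trade-off concrete, here is a minimal, illustrative sketch (in Python, the language this course recommends) of the classical ε-greedy strategy on a Bernoulli bandit. The arm means and parameter values below are invented for illustration and are not part of any assignment:

```python
import random

def epsilon_greedy_bandit(true_means, n_steps=10000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit; return the per-arm
    sample-average reward estimates and the total reward collected."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # sample-average reward estimate per arm
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:                  # explore: random arm
            arm = rng.randrange(k)
        else:                                       # exploit: current best estimate
            arm = max(range(k), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental sample-average update of the pulled arm's estimate
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward
```

With a higher ε the agent explores more and wastes pulls on bad arms; with ε too low it risks locking onto a suboptimal arm — exactly the trade-off the lectures will formalize.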

4. Markov decision processes (∼2 lectures): The MDP is one of the most well-studied reinforcement
learning problems, and it is often (mistakenly) considered to be the reinforcement learning
problem. It provides a mathematical framework for modeling decision making in situations
where outcomes are partly random and partly under the control of a decision maker. We
will carefully discuss its structure, underlying assumptions, and limitations. This will be
our primary problem setup for discussing existing reinforcement learning algorithms in this
course.

5. Dynamic programming (∼2 lectures): Dynamic programming is a well-known problem-solving
principle in computer science (e.g., in algorithm design), and it is the foundation for computing
optimal policies given a perfect model of the environment. We will cover the important con-
cepts of the Bellman optimality equation, value iteration, and policy iteration, which originate
from the dynamic programming technique.
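As a taste of these ideas, the value iteration update V(s) ← max_a [R(s,a) + γ Σ_{s'} p(s'|s,a) V(s')] fits in a few lines of Python. This is an illustrative sketch only; the data-structure conventions (dicts keyed by state and action) are our own choice, not course material:

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Compute optimal state values for a known finite MDP.
    P[s][a] = list of (probability, next_state); R[s][a] = expected reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality backup over all actions
            v_new = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                        for a in actions)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:       # stop once the largest change is negligible
            return V
```

For a toy two-state MDP where staying in state 1 yields reward 1 and γ = 0.9, this converges to V(1) = 1/(1-0.9) = 10 and V(0) = 0.9 · V(1) = 9, matching the geometric-series intuition behind discounted return.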

6. Monte Carlo methods (∼2 lectures): They relax the assumption of complete knowledge of the
environment and instead enable an agent to learn from the experience gained by inter-
acting with the environment. We will cover Monte Carlo methods for value estimation and
control. Time permitting, we will also cover another important direction in reinforcement
learning, i.e., off-policy evaluation and learning, using Monte Carlo methods.
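The core of Monte Carlo value estimation is simply averaging observed returns. Here is a minimal first-visit Monte Carlo prediction sketch (illustrative only; the episode representation as a list of (state, reward) pairs is our own convention):

```python
import random

def first_visit_mc(sample_episode, n_episodes=5000, gamma=1.0, seed=0):
    """Estimate V(s) by averaging first-visit returns over sampled episodes.
    sample_episode(rng) must return a list of (state, reward) pairs, where
    the reward is the one received on leaving that state."""
    rng = random.Random(seed)
    returns = {}  # state -> (sum of first-visit returns, visit count)
    for _ in range(n_episodes):
        episode = sample_episode(rng)
        # compute the return G at every step by sweeping backwards
        G = 0.0
        rets = []
        for s, r in reversed(episode):
            G = r + gamma * G
            rets.append((s, G))
        rets.reverse()
        # record only the FIRST visit of each state in this episode
        seen = set()
        for s, G in rets:
            if s not in seen:
                seen.add(s)
                total, count = returns.get(s, (0.0, 0))
                returns[s] = (total + G, count + 1)
    return {s: total / count for s, (total, count) in returns.items()}
```

No transition model is needed: the agent learns purely from sampled experience, which is exactly the relaxation this lecture block is about.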

7. Temporal-difference learning (∼3 lectures): It is an important class of model-free rein-
forcement learning solutions, which combine ideas from Monte Carlo methods (learning
from experience) and dynamic programming (model-based planning) via bootstrapping.
It is widely considered one of the most fundamental ideas in reinforcement learning.
We will cover n-step TD learning methods, TD(λ), and Q-learning in our lectures.
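To preview the bootstrapping idea, here is tabular Q-learning on a toy chain environment (an illustrative sketch with made-up parameters, not an assignment solution): the TD target bootstraps from the current estimate of the next state's value instead of waiting for the full return.

```python
import random

def q_learning(n_states=5, n_episodes=2000, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy deterministic chain: actions move left (-1)
    or right (+1); reaching the rightmost state ends the episode, reward 1."""
    rng = random.Random(seed)
    actions = [-1, +1]
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(n_episodes):
        s = 0
        while s != n_states - 1:
            if rng.random() < epsilon:           # explore
                a = rng.choice(actions)
            else:                                # exploit, breaking ties randomly
                best = max(Q[(s, a_)] for a_ in actions)
                a = rng.choice([a_ for a_ in actions if Q[(s, a_)] == best])
            s2 = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # bootstrap from the greedy value of the next state (zero at terminal)
            boot = 0.0 if s2 == n_states - 1 else max(Q[(s2, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (r + gamma * boot - Q[(s, a)])
            s = s2
    return Q
```

After training, the greedy policy at every state is "move right," and Q(s, +1) approaches γ^(distance to goal minus one), illustrating how value propagates backwards through bootstrapped updates.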

8. Policy gradient (∼3 lectures): Policy gradient methods are another important family of
model-free reinforcement learning solutions, which perform gradient-based optimization directly
in a parameterized policy space. They are especially useful in problems with continuous state
spaces. We will cover the classical REINFORCE algorithm, off-policy policy gradient, and the
actor-critic method.
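As a small preview of gradient-based policy optimization, here is one-step REINFORCE with a softmax policy on a two-armed Bernoulli bandit (an illustrative sketch with invented parameters; in the one-step case this coincides with the gradient bandit algorithm). A running average reward serves as a baseline to reduce variance:

```python
import math
import random

def reinforce_bandit(true_means=(0.2, 0.8), n_steps=5000, lr=0.1, seed=0):
    """One-step REINFORCE with a softmax policy over action preferences.
    Returns the final action probabilities."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]   # action preferences (policy parameters)
    baseline = 0.0       # running average reward, used as a baseline

    def softmax(prefs):
        m = max(prefs)
        exps = [math.exp(x - m) for x in prefs]
        z = sum(exps)
        return [e / z for e in exps]

    for t in range(1, n_steps + 1):
        probs = softmax(theta)
        # sample an action from the current stochastic policy
        a = 0 if rng.random() < probs[0] else 1
        r = 1.0 if rng.random() < true_means[a] else 0.0
        baseline += (r - baseline) / t
        # grad of log pi(a) w.r.t. theta[i] is 1{i==a} - probs[i]
        for i in range(2):
            grad = (1.0 if i == a else 0.0) - probs[i]
            theta[i] += lr * (r - baseline) * grad
    return softmax(theta)
```

Unlike value-based methods, nothing here estimates Q-values: the policy parameters are nudged directly in the direction that makes above-baseline rewards more likely.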

9. Approximation methods (∼2 lectures): Function approximation is an important technique
for making reinforcement learning applicable in practice, especially when the state or action
space is prohibitively large. We will cover various commonly employed approximation meth-
ods that extend reinforcement learning from tabular methods to parametric methods.
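To illustrate the tabular-to-parametric jump, here is semi-gradient TD(0) with a linear value function on a simple random walk (an illustrative sketch; the two-feature representation — normalized position plus a bias — is our own choice for this toy problem):

```python
import random

def semi_gradient_td0(n_states=10, n_episodes=3000, alpha=0.05, gamma=1.0, seed=0):
    """Semi-gradient TD(0) with a linear value function on a random walk over
    states 0..n_states, where both ends are terminal and only the right end
    pays reward 1. Returns the learned value function."""
    rng = random.Random(seed)
    w = [0.0, 0.0]  # weights for features [position / n_states, bias]

    def features(s):
        return [s / n_states, 1.0]

    def value(s):
        return sum(wi * xi for wi, xi in zip(w, features(s)))

    for _ in range(n_episodes):
        s = n_states // 2                       # start in the middle
        while 0 < s < n_states:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s2 == n_states else 0.0
            v_next = 0.0 if s2 in (0, n_states) else value(s2)
            delta = r + gamma * v_next - value(s)
            # semi-gradient update: only differentiate through value(s)
            for i, xi in enumerate(features(s)):
                w[i] += alpha * delta * xi
            s = s2
    return value
```

Instead of one table entry per state, two weights summarize the whole value function — the same idea that scales, via richer function classes, to the deep methods discussed next.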

10. Deep reinforcement learning (∼4 lectures): Among the commonly employed function
approximation methods, deep neural networks stand out due to their exceptional repre-
sentation learning power. We will introduce deep neural network based methods for both
value-based and policy-based reinforcement learning algorithms.

11. Offline reinforcement learning (∼2 lectures): Reinforcement learning improves a policy
through interactions with the environment. However, in many real-world applications, online
interaction with an environment is expensive. It is thus important to study how to leverage
existing offline data for policy learning. We will cover the most recent developments in
this direction and elaborate on the key insights for addressing this challenging problem.

4 Assessments
This course will be structured as a hybrid of lecture-driven classes, paper reading, and a hands-on
course project. Four lightweight homework assignments will be provided to help you practice
reinforcement learning algorithms, four in-class quizzes will help you master the basic concepts and
key ideas in reinforcement learning, and the course project will help you take a deep dive into the
field. These planned activities should give you a comprehensive understanding of the
course materials and the spectrum of reinforcement learning.
Paper Review (10%) Paper reading is a vital skill in any field of research, and we would like
to provide you with training to hone this skill. Throughout the semester, every student is required
to choose a paper from the suggested readings of each lecture, carefully read it, summarize your
understanding of it, and post your summary on Piazza. We will ask you to act as a reviewer of
the paper and write a critical review of it (covering, e.g., both positive and negative aspects of the
paper). The summaries will be peer-evaluated, and high-quality summaries will receive
bonus points.
Homework Assignments (36%) We have prepared three machine problems (12% each) to
guide you through the core details of multi-armed bandit algorithms (MP1), Markov Decision
Processes (MP2), and policy gradient methods (MP3). Given the nature of reinforcement learning,
the effectiveness of your algorithm, in terms of optimality (e.g., regret), will be prioritized.
Implementation efficiency will also be emphasized, since these algorithms are often
applied in an online fashion.
In-class Quizzes (24%) To help you master the basic concepts and key ideas in reinforce-
ment learning, we will have four in-class quizzes (6% each). The quizzes consist of
True/False questions, multiple-choice questions, and short-answer questions.
Course Project (40%) Practice makes perfect. Given the intrinsic complexity of reinforcement
learning, it is hard to believe one can gain any in-depth understanding of an algorithm without
using it to solve real problems. Our course project gives you such hands-on experience in solving
interesting reinforcement learning problems, e.g., playing StarCraft. The project welcomes
either research-oriented problems or “deliverables.” You need to identify the problem on your
own, apply the knowledge learned in class and beyond, and work in a group of 2-3 students
to solve it. It is preferred that the outcome of your project be publishable, e.g., your
(unique) solution to some (interesting/important/new) problem, or tangible, e.g., a
prototype system that can be demonstrated. Bonus points will be given to groups that meet
either of the above criteria. Discussing your project idea and progress with the instructor and
TA is an important way to ensure your success in the end. Every group needs to present
their work to the class and submit a written report summarizing their results.

5 Resources
There are already tons of video lectures, informative documentation, blog articles, technical
reports, research papers, and open implementations available on the Internet about reinforce-
ment learning. As a student, it is very important to leverage such online resources to
boost your knowledge and research.
We have an official textbook for this course: “Reinforcement Learning: An Introduc-
tion, Second Edition,” by Richard S. Sutton and Andrew G. Barto, MIT Press, November 2018.
Our lecture discussions will be mostly based on this textbook.
The instructor has also listed several good online resources to help you master the course
material, including online tutorials, similar courses offered at other institutions, public toolkits
and libraries, and research papers and reports. You can find them on our course website. You
are also welcome to share any material you find helpful in our course forum.

6 Policies
How to participate in this course? To minimize the impact of COVID-19 on our students,
we will take extra caution and safety measures to protect everyone's health, including streaming
this course live on Zoom throughout the whole semester. Although the lectures will be given
in person in a classroom, the instructor will also present the materials live via Zoom and have
them recorded, so that if you do not feel well or have concerns about COVID exposure you
can still access the class remotely. It is completely fine for you to participate in our discussions
remotely via Zoom, including your final project presentations. For your convenience, the
Zoom link for our live lecture discussions is listed below: https://virginia.zoom.us/meeting/
tJIpfumopzstEtdMTHLrGt_A6jj_dnNWxopj/ics?icsToken=98tyKuCuqjIqGt2VtxGERowABor4c_
TxmGJaj7dZsSvNLzJ0djzXYOhIDbZxPu_I.
When should I start working on my assignments? You will be given two weeks to finish each
assignment. Our late policy is simple: 1) you have 7 free late days in total across all assignments;
2) you can use late days on any assignment, and each late day extends the deadline by 24 hours;
3) once you have used all 7 late days, the penalty is 10% for each additional late day (until 0
points are left). Starting early is always recommended: given the nature of computer programming,
exceptions and errors always show up at the last step.
Evaluation Rubrics The detailed evaluation rubrics will be carefully discussed in the instruc-
tions for your homework assignments, paper reading, and final project report, and you can find
them on our course website. Please note that NO curving will be applied at the end of the
semester; the final grade will be calculated using the weight of each assessment
defined in the Assessments section.
What should you not do in this course? Plagiarism is considered serious misconduct in
computer science, in both industry and academia: it hurts your credibility and may also lead to
legal issues involving copyright and intellectual property. For our machine problems, while
discussing with peers or the instructor is allowed, copying others' code or implementations
(including those of former students in this class) is strictly prohibited. All our machine problems
are designed for students to finish individually, and therefore sharing code or implementations,
or collaborating, is not allowed. Using third-party public libraries is allowed (unless explicitly
instructed not to), but it has to be clearly documented and explained in your assignment report.
Disabilities The University of Virginia strives to provide accessibility to all students. If you
require an accommodation to fully access this course, please contact the Student Disability
Access Center (SDAC) at (434) 243-5180 or [email protected]. If you are unsure whether you
require an accommodation, or to learn more about their services, you may contact the SDAC
at the number above or by visiting their website at http://studenthealth.virginia.edu/
student-disability-access-center/faculty-staff.
Religious Accommodations It is the University’s long-standing policy and practice to rea-
sonably accommodate students so that they do not experience an adverse academic consequence
when sincerely held religious beliefs or observances conflict with academic requirements. Stu-
dents who wish to request academic accommodation for a religious observance should submit
their request in writing directly to me by email as far in advance as possible. Students and
instructors who have questions or concerns about academic accommodations for religious obser-
vance or religious beliefs may contact the University’s Office for Equal Opportunity and Civil
Rights (EOCR) at [email protected] or 434-924-3200. Accommodations do not relieve
you of the responsibility for completing any part of the coursework missed as the result of a
religious observance.
Grade Cutoffs We will use the standard grade cutoff points, and no curving will be applied to
your final grades, so that you can keep track of and predict your final letter grade on the fly:

Table 1: Grade cutoff points


Letter Grade Point Range
A+ [97,110]
A [93,97)
A- [90, 93)
B+ [87, 90)
B [83, 87)
B- [80, 83)
C+ [77, 80)
C [73, 77)
C- [70, 73)
D+ [67, 70)
D [63, 67)
D- [60, 63)
F [0, 60)
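Read mechanically, the table assigns the letter of the first cutoff a point total meets or exceeds. As a small illustrative Python helper (our own sketch of the table above, not official grading code):

```python
def letter_grade(points):
    """Map a final point total to a letter grade using the cutoffs in Table 1.
    Each entry is (lower bound, grade); intervals are inclusive at the bottom."""
    cutoffs = [
        (97, "A+"), (93, "A"), (90, "A-"),
        (87, "B+"), (83, "B"), (80, "B-"),
        (77, "C+"), (73, "C"), (70, "C-"),
        (67, "D+"), (63, "D"), (60, "D-"),
    ]
    for lower, grade in cutoffs:
        if points >= lower:
            return grade
    return "F"   # anything below 60
```

Because the A+ range extends to 110, bonus points earned on paper reviews and the course project can push a total above 100 without being wasted.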

7 Communications
Meeting Times We will have our lectures every Tuesday and Thursday morning from
9:30am to 10:45am, both in person in Thornton Hall E303 and via Zoom. The Zoom
link is https://virginia.zoom.us/meeting/tJIpfumopzstEtdMTHLrGt_A6jj_dnNWxopj/ics?
icsToken=98tyKuCuqjIqGt2VtxGERowABor4c_TxmGJaj7dZsSvNLzJ0djzXYOhIDbZxPu_I, which
can also be found on our Collab site.
Office Hours The instructor's office hours will be held on Tuesday and Thursday afternoons
from 4:00pm to 5:00pm, online via Zoom. The TA's office hours will be held on Wednesday
and Friday afternoons from 2:00pm to 3:00pm, online via Zoom. You can find the Zoom links for
our office hours on Collab, but please make an appointment beforehand (at least two hours in
advance). Additional office hours can be requested by email, and you can also request in-person
meetings if you feel that would make our discussions more effective.
Course Web Site The course website is located at http://www.cs.virginia.edu/~hw5x/
Course/RL2022-Fall/_site/. All course announcements and materials will be posted on
this website. Our Collab site will be used for homework submission, grade release, Zoom
meetings, and recordings.
Piazza The most important forum for communication in this class is the course's Piazza fo-
rum. Piazza is like a newsgroup or forum: you are encouraged to use it to ask questions,
initiate discussions, express opinions, share resources, and give advice. The Piazza site for this
class is https://piazza.com/virginia/fall2022/cs450120055. Please enroll yourself at the
beginning of the semester.
We expect that you will be courteous and post only material that is somehow related to the
topic of reinforcement learning or the course content. Posts will be lightly moderated. Note
that private posts on Piazza can be used for things like conflict requests, or for letting us know
about anything you do not really want to share with your classmates.

8 At the end
Thanks to you for reading the entire syllabus. Hopefully it makes your experience a bit easier
and less stressful, and focus on more on this exciting area of research!
