Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning
and teaching. Machine Learning, 9:293–321.
Littman, M. L., Sutton, R. S., and Singh, S. P. (2001). Predictive representations of state.
In Dietterich et al. (2001), pages 1555–1561.
Maei, H., Szepesvári, C., Bhatnagar, S., Silver, D., Precup, D., and Sutton, R. (2010a).
Convergent temporal-difference learning with arbitrary smooth function approximation.
In NIPS-22, pages 1204–1212.
Maei, H., Szepesvári, C., Bhatnagar, S., and Sutton, R. (2010b). Toward off-policy learning
control with function approximation. In Wrobel et al. (2010).
Maei, H. R. and Sutton, R. S. (2010). GQ(λ): A general gradient algorithm for temporal-
difference prediction learning with eligibility traces. In Baum, E., Hutter, M., and Kitzel-
mann, E., editors, Proceedings of the Third Conference on Artificial General Intelligence,
pages 91–96. Atlantis Press.
McAllester, D. A. and Myllymäki, P., editors (2008). Proceedings of the 24th Conference in
Uncertainty in Artificial Intelligence (UAI’08). AUAI Press.
Melo, F. S., Meyn, S. P., and Ribeiro, M. I. (2008). An analysis of reinforcement learning
with function approximation. In Cohen et al. (2008), pages 664–671.
Menache, I., Mannor, S., and Shimkin, N. (2005). Basis function adaptation in temporal
difference reinforcement learning. Annals of Operations Research, 134(1):215–238.
Mnih, V., Szepesvári, C., and Audibert, J.-Y. (2008). Empirical Bernstein stopping. In
Cohen et al. (2008), pages 672–679.
Munos, R. and Szepesvári, C. (2008). Finite-time bounds for fitted value iteration. Journal
of Machine Learning Research, 9:815–857.
Nascimento, J. and Powell, W. (2009). An optimal approximate dynamic programming
algorithm for the lagged asset acquisition problem. Mathematics of Operations Research,
34:210–237.
Nedić, A. and Bertsekas, D. P. (2003). Least squares policy evaluation algorithms with linear
function approximation. Discrete Event Dynamic Systems, 13(1):79–110.
Neu, G., György, A., and Szepesvári, C. (2010). The online loop-free stochastic shortest-path
problem. In COLT-10.
Ng, A. Y. and Jordan, M. (2000). PEGASUS: A policy search method for large MDPs and
POMDPs. In Boutilier, C. and Goldszmidt, M., editors, Proceedings of the 16th Confer-
ence in Uncertainty in Artificial Intelligence (UAI’00), pages 406–415, San Francisco CA.
Morgan Kaufmann.
Ortner, R. (2008). Online regret bounds for Markov decision processes with deterministic
transitions. In Freund, Y., Györfi, L., Turán, G., and Zeugmann, T., editors, Proc. of the
19th International Conference on Algorithmic Learning Theory (ALT 2008), volume 5254
of Lecture Notes in Computer Science, pages 123–137. Springer.
Parr, R., Li, L., Taylor, G., Painter-Wakefield, C., and Littman, M. L. (2008). An analysis of
linear models, linear value-function approximation, and feature selection for reinforcement
learning. In Cohen et al. (2008), pages 752–759.
Parr, R., Painter-Wakefield, C., Li, L., and Littman, M. L. (2007). Analyzing feature gener-
ation for value-function approximation. In Ghahramani (2007), pages 737–744.
Peters, J., Vijayakumar, S., and Schaal, S. (2003). Reinforcement learning for humanoid
robotics. In Humanoids2003, Third IEEE-RAS International Conference on Humanoid
Robots, pages 225–230.
Platt, J. C., Koller, D., Singer, Y., and Roweis, S. T., editors (2008). Advances in Neural
Information Processing Systems 20, Cambridge, MA, USA. MIT Press.
Poupart, P., Vlassis, N., Hoey, J., and Regan, K. (2006). An analytic solution to discrete
Bayesian reinforcement learning. In Cohen and Moore (2006), pages 697–704.
Rasmussen, C. and Williams, C. (2005). Gaussian Processes for Machine Learning (Adaptive
Computation and Machine Learning). The MIT Press.
Riedmiller, M. (2005). Neural fitted Q iteration – first experiences with a data efficient
neural reinforcement learning method. In Gama, J., Camacho, R., Brazdil, P., Jorge, A.,
and Torgo, L., editors, Proceedings of the 16th European Conference on Machine Learning
(ECML-05), volume 3720 of Lecture Notes in Computer Science, pages 317–328. Springer.
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the
American Mathematical Society, 58:527–535.
Ross, S. and Pineau, J. (2008). Model-based Bayesian reinforcement learning in large struc-
tured domains. In McAllester and Myllymäki (2008), pages 476–483.
Ross, S., Pineau, J., Paquet, S., and Chaib-draa, B. (2008). Online planning algorithms for
POMDPs. Journal of Artificial Intelligence Research, 32:663–704.
Rummery, G. A. (1995). Problem solving with reinforcement learning. PhD thesis, Cambridge
University.
Rummery, G. A. and Niranjan, M. (1994). On-line Q-learning using connectionist systems.
Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Depart-
ment.
Rusmevichientong, P., Salisbury, J. A., Truss, L. T., Van Roy, B., and Glynn, P. W. (2006).
Opportunities and challenges in using online preference data for vehicle pricing: A case
study at General Motors. Journal of Revenue and Pricing Management, 5(1):45–61.
Scherrer, B. (2010). Should one compute the temporal difference fix point or minimize the
Bellman residual? The unified oblique projection view. In Wrobel et al. (2010).
Schölkopf, B., Platt, J. C., and Hoffman, T., editors (2007). Advances in Neural Information
Processing Systems 19, Cambridge, MA, USA. MIT Press.
Shavlik, J. W., editor (1998). Proceedings of the 15th International Conference on Machine
Learning (ICML 1998), San Francisco, CA, USA. Morgan Kaufmann.
Silver, D., Sutton, R. S., and Müller, M. (2007). Reinforcement learning of local shape in
the game of Go. In Veloso, M. M., editor, Proceedings of the 20th International Joint
Conference on Artificial Intelligence (IJCAI 2007), pages 1053–1058.
Simão, H. P., Day, J., George, A. P., Gifford, T., Nienow, J., and Powell, W. B. (2009). An
approximate dynamic programming algorithm for large-scale fleet management: A case
application. Transportation Science, 43(2):178–197.
Singh, S. P. and Bertsekas, D. P. (1997). Reinforcement learning for dynamic channel al-
location in cellular telephone systems. In Mozer, M. C., Jordan, M. I., and Petsche, T.,
editors, NIPS-9: Advances in Neural Information Processing Systems: Proceedings of the
1996 Conference, pages 974–980, Cambridge, MA, USA. MIT Press.
Singh, S. P., Jaakkola, T., and Jordan, M. I. (1995). Reinforcement learning with soft state
aggregation. In Tesauro et al. (1995), pages 361–368.
Singh, S. P., Jaakkola, T., Littman, M. L., and Szepesvári, C. (2000). Convergence results for
single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287–308.
Singh, S. P. and Yee, R. C. (1994). An upper bound on the loss from approximate optimal-
value functions. Machine Learning, 16(3):227–233.
Solla, S. A., Leen, T. K., and Müller, K. R., editors (1999). Advances in Neural Information
Processing Systems 12, Cambridge, MA, USA. MIT Press.
Strehl, A. L., Li, L., Wiewiora, E., Langford, J., and Littman, M. L. (2006). PAC model-free
reinforcement learning. In Cohen and Moore (2006), pages 881–888.
Strehl, A. L. and Littman, M. L. (2008). Online linear regression and its application to
model-based reinforcement learning. In Platt et al. (2008), pages 1417–1424.
Strens, M. (2000). A Bayesian framework for reinforcement learning. In Langley, P., edi-
tor, Proceedings of the 17th International Conference on Machine Learning (ICML 2000),
pages 943–950. Morgan Kaufmann.
Sutton, R. S. (1992). Gain adaptation beats least squares. In Proceedings of the 7th Yale
Workshop on Adaptive and Learning Systems, pages 161–166.
Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., Szepesvári, C., and
Wiewiora, E. (2009a). Fast gradient-descent methods for temporal-difference learning
with linear function approximation. In Danyluk et al. (2009), pages 993–1000.
Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. (1999a). Policy gradient
methods for reinforcement learning with function approximation. In Solla et al. (1999),
pages 1057–1063.
Sutton, R. S., Precup, D., and Singh, S. P. (1999b). Between MDPs and semi-MDPs:
A framework for temporal abstraction in reinforcement learning. Artificial Intelligence,
112:181–211.
Sutton, R. S., Szepesvári, C., Geramifard, A., and Bowling, M. H. (2008). Dyna-style
planning with linear function approximation and prioritized sweeping. In McAllester and
Myllymäki (2008), pages 528–536.
Sutton, R. S., Szepesvári, C., and Maei, H. R. (2009b). A convergent O(n) temporal-
difference algorithm for off-policy learning with linear function approximation. In Koller
et al. (2009), pages 1609–1616.
Szepesvári, C. (1997). Learning and exploitation do not conflict under minimax optimality. In
Someren, M. and Widmer, G., editors, Machine Learning: ECML’97 (9th European Conf.
on Machine Learning, Proceedings), volume 1224 of Lecture Notes in Artificial Intelligence,
pages 242–249. Springer, Berlin.
Szepesvári, C. (1998). Static and Dynamic Aspects of Optimal Sequential Decision Making.
PhD thesis, Bolyai Institute of Mathematics, University of Szeged, Szeged, Aradi vrt. tere
1, HUNGARY, 6720.
Szita, I. and Lőrincz, A. (2008). The many faces of optimism: a unifying approach. In Cohen
et al. (2008), pages 1048–1055.
Szita, I. and Szepesvári, C. (2010). Model-based reinforcement learning with nearly tight
exploration complexity bounds. In Wrobel et al. (2010).
Tadić, V. B. (2004). On the almost sure rate of convergence of linear stochastic approximation
algorithms. IEEE Transactions on Information Theory, 50(2):401–409.
Tanner, B. and White, A. (2009). RL-Glue: Language-independent software for
reinforcement-learning experiments. Journal of Machine Learning Research, 10:2133–2136.
Taylor, G. and Parr, R. (2009). Kernelized value function approximation for reinforcement
learning. In Danyluk et al. (2009), pages 1017–1024.
Tesauro, G., Touretzky, D., and Leen, T., editors (1995). NIPS-7: Advances in Neural
Information Processing Systems: Proceedings of the 1994 Conference, Cambridge, MA,
USA. MIT Press.
Toussaint, M., Charlin, L., and Poupart, P. (2008). Hierarchical POMDP controller opti-
mization by likelihood maximization. In McAllester and Myllymäki (2008), pages 562–570.
Tsitsiklis, J. N. and Mannor, S. (2004). The sample complexity of exploration in the multi-
armed bandit problem. Journal of Machine Learning Research, 5:623–648.
Tsitsiklis, J. N. and Van Roy, B. (1996). Feature-based methods for large scale dynamic
programming. Machine Learning, 22:59–94.
Tsitsiklis, J. N. and Van Roy, B. (1997). An analysis of temporal difference learning with
function approximation. IEEE Transactions on Automatic Control, 42:674–690.
Tsitsiklis, J. N. and Van Roy, B. (1999a). Average cost temporal-difference learning. Auto-
matica, 35(11):1799–1808.
Tsitsiklis, J. N. and Van Roy, B. (1999b). Optimal stopping of Markov processes: Hilbert
space theory, approximation algorithms, and an application to pricing financial derivatives.
IEEE Transactions on Automatic Control, 44:1840–1851.
Tsitsiklis, J. N. and Van Roy, B. (2001). Regression methods for pricing complex American-
style options. IEEE Transactions on Neural Networks, 12:694–703.
Van Roy, B. (2006). Performance loss bounds for approximate value iteration with state
aggregation. Mathematics of Operations Research, 31(2):234–244.
Wahba, G. (2003). Reproducing kernel Hilbert spaces – two brief reviews. In Proceedings of
the 13th IFAC Symposium on System Identification, pages 549–559.
Wang, T., Lizotte, D. J., Bowling, M. H., and Schuurmans, D. (2008). Stable dual dynamic
programming. In Platt et al. (2008).
Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. PhD thesis, King’s College,
Cambridge, UK.
Widrow, B. and Stearns, S. (1985). Adaptive Signal Processing. Prentice Hall, Englewood
Cliffs, NJ.
Wrobel, S., Fürnkranz, J., and Joachims, T., editors (2010). Proceedings of the 27th An-
nual International Conference on Machine Learning (ICML 2010), ACM International
Conference Proceeding Series, New York, NY, USA. ACM.
Xu, X., He, H., and Hu, D. (2002). Efficient reinforcement learning using recursive least-
squares methods. Journal of Artificial Intelligence Research, 16:259–292.
Xu, X., Hu, D., and Lu, X. (2007). Kernel-based least squares policy iteration for reinforce-
ment learning. IEEE Transactions on Neural Networks, 18:973–992.
Yu, H. and Bertsekas, D. (2007). Q-learning algorithms for optimal stopping based on least
squares. In Proceedings of the European Control Conference.
Yu, J. and Bertsekas, D. P. (2008). New error bounds for approximations from projected lin-
ear equations. Technical Report C-2008-43, Department of Computer Science, University
of Helsinki. Revised July 2009.
Yu, J. Y., Mannor, S., and Shimkin, N. (2009). Markov decision processes with arbitrary
reward processes. Mathematics of Operations Research, to appear.
Zhang, W. and Dietterich, T. G. (1995). A reinforcement learning approach to job-shop
scheduling. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI 95), pages 1114–1120, San Francisco, CA, USA. Morgan Kaufmann.