Shangtong Zhang
2020 – today
- 2024
- [c22] Shuze Liu, Shangtong Zhang: Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design. ICML 2024
- [c21] Shangtong Zhang, Xueyan Wang, Weisheng Zhao, Yier Jin: CRISP: Triangle Counting Acceleration via Content Addressable Memory-Integrated 3D-Stacked Memory. ITC-Asia 2024: 1-6
- [i32] Shuze Liu, Shuhang Chen, Shangtong Zhang: The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise. CoRR abs/2401.07844 (2024)
- [i31] Jiuqi Wang, Ethan Blaser, Hadi Daneshmand, Shangtong Zhang: Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning. CoRR abs/2405.13861 (2024)
- [i30] Shuze Liu, Yuxin Chen, Shangtong Zhang: Efficient Multi-Policy Evaluation for Reinforcement Learning. CoRR abs/2408.08706 (2024)
- [i29] Jiuqi Wang, Shangtong Zhang: Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features. CoRR abs/2409.12135 (2024)
- [i28] Ethan Blaser, Shangtong Zhang: Almost Sure Convergence of Average Reward Temporal Difference Learning. CoRR abs/2409.19546 (2024)
- [i27] Shuze Liu, Claire Chen, Shangtong Zhang: Doubly Optimal Policy Evaluation for Reinforcement Learning. CoRR abs/2410.02226 (2024)
- [i26] Claire Chen, Shuze Liu, Shangtong Zhang: Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning. CoRR abs/2410.05655 (2024)
- [i25] Xiaochi Qian, Zixuan Xie, Xinyu Liu, Shangtong Zhang: Almost Sure Convergence Rates and Concentration of Stochastic Approximation and Reinforcement Learning with Markovian Noise. CoRR abs/2411.13711 (2024)
- [i24] Amar Kulkarni, Shangtong Zhang, Madhur Behl: CRASH: Challenging Reinforcement-Learning Based Adversarial Scenarios For Safety Hardening. CoRR abs/2411.16996 (2024)
- 2023
- [j4] Yuntao Wei, Xueyan Wang, Shangtong Zhang, Jianlei Yang, Xiaotao Jia, Zhaohao Wang, Gang Qu, Weisheng Zhao: IMGA: Efficient In-Memory Graph Convolution Network Aggregation With Data Flow Optimizations. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 42(12): 4695-4705 (2023)
- [c20] Shangtong Zhang: A New Challenge in Policy Evaluation. AAAI 2023: 15465
- [c19] Shangtong Zhang, Remi Tachet des Combes, Romain Laroche: On the Convergence of SARSA with Linear Function Approximation. ICML 2023: 41613-41646
- [i23] Shuze Liu, Shangtong Zhang: Improving Monte Carlo Evaluation with Offline Data. CoRR abs/2301.13734 (2023)
- [i22] Xiaochi Qian, Shangtong Zhang: Direct Gradient Temporal Difference Learning. CoRR abs/2308.01170 (2023)
- [i21] Michaël Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Çaglar Gülçehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad Zolna, Julian Schrittwieser, David H. Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gómez Colmenarejo, Aäron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals: AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning. CoRR abs/2308.03526 (2023)
- 2022
- [j3] Shangtong Zhang, Shimon Whiteson: Truncated Emphatic Temporal Difference Methods for Prediction and Control. J. Mach. Learn. Res. 23: 153:1-153:59 (2022)
- [j2] Shangtong Zhang, Remi Tachet des Combes, Romain Laroche: Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch. J. Mach. Learn. Res. 23: 343:1-343:91 (2022)
- [c18] Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt: Learning Expected Emphatic Traces for Deep RL. AAAI 2022: 7015-7023
- [c17] Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes: A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms. AAMAS 2022: 1491-1499
- [i20] Shangtong Zhang, Remi Tachet des Combes, Romain Laroche: On the Chattering of SARSA with Linear Function Approximation. CoRR abs/2202.06828 (2022)
- 2021
- [c16] Shangtong Zhang, Bo Liu, Shimon Whiteson: Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning. AAAI 2021: 10905-10913
- [c15] Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson: Average-Reward Off-Policy Policy Evaluation with Function Approximation. ICML 2021: 12578-12588
- [c14] Shangtong Zhang, Hengshuai Yao, Shimon Whiteson: Breaking the Deadly Triad with a Target Network. ICML 2021: 12621-12631
- [c13] Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson: Deep Residual Reinforcement Learning (Extended Abstract). IJCAI 2021: 4869-4873
- [i19] Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson: Average-Reward Off-Policy Policy Evaluation with Function Approximation. CoRR abs/2101.02808 (2021)
- [i18] Shangtong Zhang, Hengshuai Yao, Shimon Whiteson: Breaking the Deadly Triad with a Target Network. CoRR abs/2101.08862 (2021)
- [i17] Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt: Learning Expected Emphatic Traces for Deep RL. CoRR abs/2107.05405 (2021)
- [i16] Shangtong Zhang, Shimon Whiteson: Truncated Emphatic Temporal Difference Methods for Prediction and Control. CoRR abs/2108.05338 (2021)
- [i15] Shangtong Zhang, Remi Tachet des Combes, Romain Laroche: Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch. CoRR abs/2111.02997 (2021)
- 2020
- [c12] Yuhang Song, Jianyi Wang, Thomas Lukasiewicz, Zhenghua Xu, Shangtong Zhang, Andrzej Wojcicki, Mai Xu: Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards. AAAI 2020: 5826-5833
- [c11] Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson: Deep Residual Reinforcement Learning. AAMAS 2020: 1611-1619
- [c10] Shangtong Zhang, Bo Liu, Shimon Whiteson: GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values. ICML 2020: 11194-11203
- [c9] Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson: Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation. ICML 2020: 11204-11213
- [c8] Shangtong Zhang, Vivek Veeriah, Shimon Whiteson: Learning Retrospective Knowledge with Reverse Reinforcement Learning. NeurIPS 2020
- [i14] Shangtong Zhang, Bo Liu, Shimon Whiteson: GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values. CoRR abs/2001.11113 (2020)
- [i13] Shangtong Zhang, Bo Liu, Shimon Whiteson: Per-Step Reward: A New Perspective for Risk-Averse Reinforcement Learning. CoRR abs/2004.10888 (2020)
- [i12] Shangtong Zhang, Vivek Veeriah, Shimon Whiteson: Learning Retrospective Knowledge with Reverse Reinforcement Learning. CoRR abs/2007.06703 (2020)
- [i11] Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes: A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms. CoRR abs/2010.01069 (2020)
2010 – 2019
- 2019
- [c7] Shangtong Zhang, Hengshuai Yao: ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search. AAAI 2019: 5789-5796
- [c6] Shangtong Zhang, Hengshuai Yao: QUOTA: The Quantile Option Architecture for Reinforcement Learning. AAAI 2019: 5797-5804
- [c5] Borislav Mavrin, Shangtong Zhang, Hengshuai Yao, Linglong Kong: Exploration in the Face of Parametric and Intrinsic Uncertainties. AAMAS 2019: 2117-2119
- [c4] Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson: Generalized Off-Policy Actor-Critic. NeurIPS 2019: 1999-2009
- [c3] Shangtong Zhang, Shimon Whiteson: DAC: The Double Actor-Critic Architecture for Learning Options. NeurIPS 2019: 2010-2020
- [i10] Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson: Generalized Off-Policy Actor-Critic. CoRR abs/1903.11329 (2019)
- [i9] Shangtong Zhang, Shimon Whiteson: DAC: The Double Actor-Critic Architecture for Learning Options. CoRR abs/1904.12691 (2019)
- [i8] Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson: Deep Residual Reinforcement Learning. CoRR abs/1905.01072 (2019)
- [i7] Yuhang Song, Jianyi Wang, Thomas Lukasiewicz, Zhenghua Xu, Shangtong Zhang, Mai Xu: Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards. CoRR abs/1905.04640 (2019)
- [i6] Borislav Mavrin, Shangtong Zhang, Hengshuai Yao, Linglong Kong, Kaiwen Wu, Yaoliang Yu: Distributional Reinforcement Learning for Efficient Exploration. CoRR abs/1905.06125 (2019)
- [i5] Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson: Provably Convergent Off-Policy Actor-Critic with Function Approximation. CoRR abs/1911.04384 (2019)
- 2018
- [j1] Ryan R. Curtin, Marcus Edel, Mikhail Lozhnikov, Yannis Mentekidis, Sumedh Ghaisas, Shangtong Zhang: mlpack 3: a fast, flexible machine learning library. J. Open Source Softw. 3(26): 726 (2018)
- [i4] Shangtong Zhang, Borislav Mavrin, Linglong Kong, Bo Liu, Hengshuai Yao: QUOTA: The Quantile Option Architecture for Reinforcement Learning. CoRR abs/1811.02073 (2018)
- [i3] Shangtong Zhang, Hao Chen, Hengshuai Yao: ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search. CoRR abs/1811.02696 (2018)
- 2017
- [c2] Vivek Veeriah, Shangtong Zhang, Richard S. Sutton: Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks. ECML/PKDD (1) 2017: 445-459
- [i2] Shangtong Zhang, Osmar R. Zaïane: Comparing Deep Reinforcement Learning and Evolutionary Methods in Continuous Control. CoRR abs/1712.00006 (2017)
- [i1] Shangtong Zhang, Richard S. Sutton: A Deeper Look at Experience Replay. CoRR abs/1712.01275 (2017)
- 2015
- [c1] Pengjing Zhang, Xiaoqing Zheng, Wenqiang Zhang, Siyan Li, Sheng Qian, Wenqi He, Shangtong Zhang, Ziyuan Wang: A Deep Neural Network for Modeling Music. ICMR 2015: 379-386
last updated on 2025-01-02 18:17 CET by the dblp team
all metadata released as open data under CC0 1.0 license