| 2024 | Stop Regressing: Training Value Functions via Classification for Scalable Deep RL | |
| 2024 | Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking | |
| 2023 | Empirical Design in Reinforcement Learning | |
| 2023 | Does Zero-Shot Reinforcement Learning Exist? | |
| 2023 | Backstepping Temporal Difference Learning | |
| 2023 | An Analysis of Quantile Temporal-Difference Learning | |
| 2022 | Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error | |
| 2022 | Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach | |
| 2022 | TRC: Trust Region Conditional Value at Risk for Safe Reinforcement Learning | |
| 2022 | Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk | |
| 2022 | Safety-constrained Reinforcement Learning with a Distributional Safety Critic | |
| 2022 | SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics | |
| 2022 | Constrained Variational Policy Optimization for Safe Reinforcement Learning | |
| 2022 | Conformal Off-Policy Prediction in Contextual Bandits | |
| 2022 | A Review of Off-Policy Evaluation in Reinforcement Learning | |
| 2021 | Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction | |
| 2021 | Learning Successor States and Goal-Dependent Values: A Mathematical Viewpoint | |
| 2021 | Learning One Representation to Optimize All Rewards | |
| 2021 | Hoeffding’s Inequality for General Markov Chains and Its Applications to Statistical Learning | |
| 2021 | Adaptive Sampling for Best Policy Identification in MDPs | |
| 2020 | Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning | |
| 2020 | On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces | |
| 2020 | Fast Active Learning for Pure Exploration in Reinforcement Learning | |
| 2020 | CoinDICE: Off-Policy Confidence Interval Estimation | |
| 2019 | Revisiting the Softmax Bellman Operator: New Benefits and New Perspective | |
| 2019 | Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP | |
| 2019 | Provably Efficient Reinforcement Learning with Linear Function Approximation | |
| 2019 | Benchmarking Safe Exploration in Deep Reinforcement Learning | |
| 2018 | Is Q-learning Provably Efficient? | |
| 2018 | Deep Reinforcement Learning that Matters | |
| 2018 | Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation | |
| 2018 | Adaptive Sampling for Policy Identification | |
| 2017 | Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning | |
| 2017 | Risk-Constrained Reinforcement Learning with Percentile Risk Criteria | |
| 2017 | Constrained Policy Optimization | |
| 2016 | Learning the Variance of the Reward-To-Go | |
| 2015 | Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs | |
| 2015 | Risk-Sensitive and Robust Decision-Making: A CVaR Optimization Approach | |
| 2015 | High Confidence Policy Improvement | |
| 2015 | High Confidence Off-Policy Evaluation | |
| 2015 | A Comprehensive Survey on Safe Reinforcement Learning | |
| 2012 | Policy Gradients with Variance Related Risk Criteria | |
| 2009 | An Analysis of Reinforcement Learning with Function Approximation | |
| 2008 | An Analysis of Model-Based Interval Estimation for Markov Decision Processes | |
| 2006 | PAC Model-Free Reinforcement Learning | |
| 2004 | Bias and Variance in Value Function Estimation | |
| 2001 | TD Algorithm for the Variance of Return and Mean-Variance Reinforcement Learning | |
| 2001 | Convergence of Optimistic and Incremental Q-Learning | |
| 2000 | Eligibility Traces for Off-Policy Policy Evaluation | |
| 2000 | Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms | |
| 1993 | Convergence of Stochastic Iterative Dynamic Programming Algorithms | |
| 1992 | Reinforcement Learning Applied to Linear Quadratic Regulation | |
| 1982 | The Variance of Discounted Markov Decision Processes | |