Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms Paper • 2304.03056 • Published Apr 6, 2023
KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal Paper • 2205.14211 • Published May 27, 2022
Local and adaptive mirror descents in extensive-form games Paper • 2309.00656 • Published Sep 1, 2023
Model-free Posterior Sampling via Learning Rate Randomization Paper • 2310.18186 • Published Oct 27, 2023
Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments Paper • 2211.10515 • Published Nov 18, 2022
DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm Paper • 2305.18501 • Published May 29, 2023
A New Bound on the Cumulant Generating Function of Dirichlet Processes Paper • 2409.18621 • Published Sep 27, 2024
VA-learning as a more efficient alternative to Q-learning Paper • 2305.18161 • Published May 29, 2023
Generalized Preference Optimization: A Unified Approach to Offline Alignment Paper • 2402.05749 • Published Feb 8, 2024
RL-finetuning LLMs from on- and off-policy data with a single algorithm Paper • 2503.19612 • Published Mar 25, 2025
A Provably Efficient Sample Collection Strategy for Reinforcement Learning Paper • 2007.06437 • Published Jul 13, 2020
Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall Paper • 2106.06279 • Published Jun 11, 2021
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation Paper • 2106.13125 • Published Jun 24, 2021
UCB Momentum Q-learning: Correcting the bias without forgetting Paper • 2103.01312 • Published Mar 1, 2021
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret Paper • 2104.11186 • Published Apr 22, 2021
Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity Paper • 2111.02338 • Published Nov 3, 2021
Marginalized Operators for Off-policy Reinforcement Learning Paper • 2203.16177 • Published Mar 30, 2022