Michal Valko

misovalko

https://misovalko.github.io/

AI & ML interests

large language models, reasoning, fine-tuning, test-time computation, reinforcement learning from human feedback, world models

Recent Activity

upvoted a paper 2 days ago

A General Theoretical Paradigm to Understand Learning from Human Preferences

authored a paper 2 days ago

Optimal Design for Reward Modeling in RLHF

authored a paper 2 days ago

Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms

authored 20 papers 2 days ago

Optimal Design for Reward Modeling in RLHF

Paper • 2410.17055 • Published Oct 22, 2024

Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms

Paper • 2304.03056 • Published Apr 6, 2023

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

Paper • 2205.14211 • Published May 27, 2022

Local and adaptive mirror descents in extensive-form games

Paper • 2309.00656 • Published Sep 1, 2023

Model-free Posterior Sampling via Learning Rate Randomization

Paper • 2310.18186 • Published Oct 27, 2023

Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments

Paper • 2211.10515 • Published Nov 18, 2022

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

Paper • 2305.18501 • Published May 29, 2023

A New Bound on the Cumulant Generating Function of Dirichlet Processes

Paper • 2409.18621 • Published Sep 27, 2024

VA-learning as a more efficient alternative to Q-learning

Paper • 2305.18161 • Published May 29, 2023

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Paper • 2402.05749 • Published Feb 8, 2024

Preference Optimization with Multi-Sample Comparisons

Paper • 2410.12138 • Published Oct 16, 2024

RL-finetuning LLMs from on- and off-policy data with a single algorithm

Paper • 2503.19612 • Published Mar 25, 2025

A Provably Efficient Sample Collection Strategy for Reinforcement Learning

Paper • 2007.06437 • Published Jul 13, 2020

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

Paper • 2106.06279 • Published Jun 11, 2021

Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation

Paper • 2106.13125 • Published Jun 24, 2021

Broaden Your Views for Self-Supervised Video Learning

Paper • 2103.16559 • Published Mar 30, 2021

UCB Momentum Q-learning: Correcting the bias without forgetting

Paper • 2103.01312 • Published Mar 1, 2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Paper • 2104.11186 • Published Apr 22, 2021

Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity

Paper • 2111.02338 • Published Nov 3, 2021

Marginalized Operators for Off-policy Reinforcement Learning

Paper • 2203.16177 • Published Mar 30, 2022