Accepted Papers — RLC 2025
123 accepted papers
Friday, August 8
Applied RL
Action Mapping for Reinforcement Learning in Continuous Environments with Constraints
Mirco Theile, Lukas Dirnberger, Raphael Trumpp, Marco Caccamo, Alberto Sangiovanni-Vincentelli
Chargax: A JAX Accelerated EV Charging Simulator
Koen Ponse, Jan Felix Kleuker, Thomas M. Moerland, Aske Plaat
WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies
William Solow, Sandhya Saisubramanian, Alan Fern
Drive Fast, Learn Faster: On-Board RL for High Performance Autonomous Racing
Benedict Hildisch, Edoardo Ghignone, Nicolas Baumann, Cheng Hu, Andrea Carron, Michele Magno
Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits
Yannik Mahlau, Maximilian Schier, Christoph Reinders, Frederik Schubert, Marco Bügling, Bodo Rosenhahn
Gaussian Process Q-Learning for Finite-Horizon Markov Decision Process
Maximilian Bloor, Tom Savage, Calvin Tsay, Antonio Del Rio Chanona, Max Mowbray
Hybrid Classical/RL Local Planner for Ground Robot Navigation
Vishnu Dutt Sharma, Jeongran Lee, Matthew Andrews, Ilija Hadžić
V-Max: Making RL Practical for Autonomous Driving
Valentin Charraut, Waël Doulazmi, Thomas Tournaire, Thibault Buhet
Shaping Laser Pulses with Reinforcement Learning
Francesco Capuano, Davorin Peceli, Gabriele Tiboni
Learning Sub-Second Routing Optimization in Computer Networks requires Packet-Level Dynamics
Foundations
Effect of a slowdown correlated to the current state of the environment on an asynchronous learning architecture
Idriss Abdallah, Laurent Ciarletta, Patrick Henaff, Jonathan Champagne, Matthieu Bonavent
Average-Reward Soft Actor-Critic
Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V Kulkarni
Your Learned Constraint is Secretly a Backward Reachable Tube
Mohamad Qadri, Gokul Swamy, Jonathan Francis, Michael Kaess, Andrea Bajcsy
Recursive Reward Aggregation
Yuting Tang, Yivan Zhang, Johannes Ackermann, Yu-Jie Zhang, Soichiro Nishimori, Masashi Sugiyama
Investigating the Utility of Mirror Descent in Off-policy Actor-Critic
Samuel Neumann, Jiamin He, Adam White, Martha White
Rethinking the Foundations for Continual Reinforcement Learning
Esraa Elelimy, David Szepesvari, Martha White, Michael Bowling
An Analysis of Action-Value Temporal-Difference Methods That Learn State Values
Brett Daley, Prabhat Nagarajan, Martha White, Marlos C. Machado
Reinforcement Learning with Adaptive Temporal Discounting
Sahaj Singh Maini, Zoran Tiganj
Multi-Agent RL
Reinforcement Learning for Finite Space Mean-Field Type Game
Kai Shao, Jiacheng Shen, Mathieu Lauriere
Collaboration Promotes Group Resilience in Multi-Agent RL
Ilai Shraga, Guy Azran, Matthias Gerstgrasser, Ofir Abu, Jeffrey Rosenschein, Sarah Keren
Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models
Aaron Dharna, Cong Lu, Jeff Clune
Hierarchical Multi-agent Reinforcement Learning for Cyber Network Defense
Aditya Vikram Singh, Ethan Rathbun, Emma Graham, Lisa Oakley, Simona Boboila, Peter Chin, Alina Oprea
Efficient Information Sharing for Training Decentralized Multi-Agent World Models
Xiaoling Zeng, Qi Zhang
Adaptive Reward Sharing to Enhance Learning in the Context of Multiagent Teams
Kyle Tilbury, David Radke
Seldonian Reinforcement Learning for Ad Hoc Teamwork
Edoardo Zorzi, Alberto Castellini, Leonidas Bakopoulos, Georgios Chalkiadakis, Alessandro Farinelli
Joint-Local Grounded Action Transformation for Sim-to-Real Transfer in Multi-Agent Traffic Control
Justin Turnau, Longchao Da, Khoa Vo, Ferdous Al Rafi, Shreyas Bachiraju, Tiejin Chen, Hua Wei
TransAM: Transformer-Based Agent Modeling for Multi-Agent Systems via Local Trajectory Encoding
Conor Wallace, Umer Siddique, Yongcan Cao
PEnGUiN: Partially Equivariant Graph NeUral Networks for Sample Efficient MARL
Joshua McClellan, Greyson Brothers, Furong Huang, Pratap Tokekar
Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers
Jake Grigsby, Yuqi Xie, Justin Sasek, Steven Zheng, Yuke Zhu
RL Algorithms, Deep RL
Bayesian Meta-Reinforcement Learning with Laplace Variational Recurrent Networks
Joery A. de Vries, Jinke He, Mathijs de Weerdt, Matthijs T. J. Spaan
Cascade - A sequential ensemble method for continuous control tasks
Robin Schmöcker, Alexander Dockhorn
HANQ: Hypergradients, Asymmetry, and Normalization for Fast and Stable Deep Q-Learning
Braham Snyder, Chen-Yu Wei
Rectifying Regression in Reinforcement Learning
Alex Ayoub, David Szepesvari, Alireza Bakhtiari, Csaba Szepesvari, Dale Schuurmans
Efficient Morphology-Aware Policy Transfer to New Embodiments
Michael Przystupa, Hongyao Tang, Glen Berseth, Mariano Phielipp, Santiago Miret, Martin Jägersand, Matthew E. Taylor
Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting
Edoardo Cetin, Ahmed Touati, Yann Ollivier
Concept-Based Off-Policy Evaluation
Ritam Majumdar, Jack Teversham, Sonali Parbhoo
Multiple-Frequencies Population-Based Training
Waël Doulazmi, Auguste Lehuger, Marin Toromanoff, Valentin Charraut, Thibault Buhet, Fabien Moutarde
AVG-DICE: Stationary Distribution Correction by Regression
Fengdi Che, Bryan Chan, Chen Ma, A. Rupam Mahmood
Iterated Q-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning
Thursday, August 7
Deep RL
Understanding the Effectiveness of Learning Behavioral Metrics in Deep Reinforcement Learning
Impoola: The Power of Average Pooling for Image-based Deep Reinforcement Learning
Raphael Trumpp, Ansgar Schäfftlein, Mirco Theile, Marco Caccamo
Eau De Q-Network: Adaptive Distillation of Neural Networks in Deep Reinforcement Learning
Théo Vincent, Tim Faust, Yogesh Tripathi, Jan Peters, Carlo D'Eramo
Disentangling Recognition and Decision Regrets in Image-Based Reinforcement Learning
Alihan Hüyük, Arndt Ryo Koblitz, Atefeh Mohajeri Moghaddam, Matthew Andrews
Make the Pertinent Salient: Task-Relevant Reconstruction for Visual Control with Distractions
Kyungmin Kim, JB Lanier, Roy Fox
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning
Subhojyoti Mukherjee, Josiah P. Hanna, Qiaomin Xie, Robert D Nowak
Sampling from Energy-based Policies using Diffusion
Vineet Jain, Tara Akhound-Sadegh, Siamak Ravanbakhsh
Optimistic critics can empower small actors
Olya Mastikhina, Dhruv Sreenivas, Pablo Samuel Castro
Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks
AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning
Deep Reinforcement Learning with Gradient Eligibility Traces
Esraa Elelimy, Brett Daley, Andrew Patterson, Marlos C. Machado, Adam White, Martha White
Exploration
Uncertainty Prioritized Experience Replay
Rodrigo Antonio Carrasco-Davis, Sebastian Lee, Claudia Clopath, Will Dabney
Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget
Dengwang Tang, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo
Quantitative Resilience Modeling for Autonomous Cyber Defense
Xavier Cadet, Simona Boboila, Edward Koh, Peter Chin, Alina Oprea
Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization
Sebastian Griesbach, Carlo D'Eramo
Syllabus: Portable Curricula for Reinforcement Learning Agents
Ryan Sullivan, Ryan Pégoud, Ameen Ur Rehman, Xinchen Yang, Junyun Huang, Aayush Verma, Nistha Mitra, John P Dickerson
Exploration-Free Reinforcement Learning with Linear Function Approximation
Luca Civitavecchia, Matteo Papini
Value Bonuses using Ensemble Errors for Exploration in Reinforcement Learning
Abdul Wahab, Raksha Kumaraswamy, Martha White
Intrinsically Motivated Discovery of Temporally Abstract Graph-based Models of the World
Akhil Bagaria, Anita De Mello Koch, Rafael Rodriguez-Sanchez, Sam Lobel, George Konidaris
An Optimisation Framework for Unsupervised Environment Design
Nathan Monette, Alistair Letcher, Michael Beukman, Matthew Thomas Jackson, Alexander Rutherford, Alexander David Goldie, Jakob Nicolaus Foerster
Epistemically-guided forward-backward exploration
Núria Armengol Urpí, Marin Vlastelica, Georg Martius, Stelian Coros
RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning
Social, Economic, Neuroscience
Pareto Optimal Learning from Preferences with Hidden Context
Ryan Bahlous-Boldi, Li Ding, Lee Spector, Scott Niekum
When and Why Hyperbolic Discounting Matters for Reinforcement Learning Interventions
Ian M. Moore, Eura Nofshin, Siddharth Swaroop, Susan Murphy, Finale Doshi-Velez, Weiwei Pan
Reinforcement Learning from Human Feedback with High-Confidence Safety Guarantees
Yaswanth Chittepu, Blossom Metevier, Will Schwarzer, Austin Hoag, Scott Niekum, Philip S. Thomas
Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models
Kefan Song, Jin Yao, Runnan Jiang, Rohan Chandra, Shangtong Zhang
Reinforcement Learning for Human-AI Collaboration via Probabilistic Intent Inference
Yuxin Lin, Seyede Fatemeh Ghoreishi, Tian Lan, Mahdi Imani
High-Confidence Policy Improvement from Human Feedback
Hon Tik Tse, Philip S. Thomas, Scott Niekum
Building Sequential Resource Allocation Mechanisms without Payments
Sihan Zeng, Sujay Bhatt, Alec Koppel, Sumitra Ganesh
From Explainability to Interpretability: Interpretable Reinforcement Learning Via Model Explanations
Peilang Li, Umer Siddique, Yongcan Cao
Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning
Umer Siddique, Peilang Li, Yongcan Cao
AI in a vat: Fundamental limits of efficient world modelling for safe agent sandboxing
Fernando Rosas, Alexander Boyd, Manuel Baltieri
Theoretical RL, Bandits
A Finite-Time Analysis of Distributed Q-Learning
Han-Dong Lim, Donghwan Lee
Finite-Time Analysis of Minimax Q-Learning
Narim Jeong, Donghwan Lee
Improved Regret Bound for Safe Reinforcement Learning via Tighter Cost Pessimism and Reward Optimism
Kihyun Yu, Duksang Lee, William Overman, Dabeen Lee
Non-Stationary Latent Auto-Regressive Bandits
Anna L. Trella, Walter H. Dempsey, Asim Gazi, Ziping Xu, Finale Doshi-Velez, Susan Murphy
A Finite-Sample Analysis of an Actor-Critic Algorithm for Mean-Variance Optimization in a Discounted MDP
Tejaram Sangadi, Prashanth L. A., Krishna Jagannathan
Leveraging priors on distribution functions for multi-arm bandits
Sumit Vashishtha, Odalric-Ambrym Maillard
Multi-task Representation Learning for Fixed Budget Pure-Exploration in Linear and Bilinear Bandits
Subhojyoti Mukherjee, Qiaomin Xie, Robert D Nowak
On Slowly-varying Non-stationary Bandits
Ramakrishnan K, Aditya Gopalan
Empirical Bound Information-Directed Sampling
Piotr M. Suder, Eric Laber
Thompson Sampling for Constrained Bandits
Rohan Deb, Mohammad Ghavamzadeh, Arindam Banerjee
Achieving Limited Adaptivity for Multinomial Logistic Bandits
Sukruta Prakash Midigeshi, Tanmay Goyal, Gaurav Sinha
Wednesday, August 6
Evaluation, Benchmarks
Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences
Takuya Hiraoka, Takashi Onishi, Guanquan Wang, Yoshimasa Tsuruoka
Offline vs. Online Learning in Model-based RL: Lessons for Data Collection Strategies
Jiaqi Chen, Ji Shi, Cansu Sancaktar, Jonas Frey, Georg Martius
Multi-Task Reinforcement Learning Enables Parameter Scaling
Reginald McLean, Evangelos Chatzaroulas, J K Terry, Isaac Woungang, Nariman Farsad, Pablo Samuel Castro
Benchmarking Massively Parallelized Multi-Task Reinforcement Learning for Robotics Tasks
Viraj Joshi, Zifan Xu, Bo Liu, Peter Stone, Amy Zhang
PufferLib 2.0: Reinforcement Learning at 1M steps/s
Joseph Suarez
Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL
Ömer Veysel Çağatan, Baris Akgun
Benchmarking Partial Observability in Reinforcement Learning with a Suite of Memory-Improvable Domains
Ruo Yu Tao, Kaicheng Guo, Cameron Allen, George Konidaris
How Should We Meta-Learn Reinforcement Learning Algorithms?
Alexander David Goldie, Zilin Wang, Jaron Cohen, Jakob Nicolaus Foerster, Shimon Whiteson
AdaStop: adaptive statistical testing for sound comparisons of Deep RL agents
MixUCB: Enhancing Safe Exploration in Contextual Bandits with Human Oversight
Jinyan Su, Rohan Banerjee, Jiankai Sun, Wen Sun, Sarah Dean
Hierarchical RL, Planning Algorithms
AVID: Adapting Video Diffusion Models to World Models
Marc Rigter, Tarun Gupta, Agrin Hilmkil, Chao Ma
The Confusing Instance Principle for Online Linear Quadratic Control
Waris Radji, Odalric-Ambrym Maillard
Long-Horizon Planning with Predictable Skills
Nico Gürtler, Georg Martius
Optimal discounting for offline input-driven MDP
Randy Lefebvre, Audrey Durand
A Timer-Enforced Hybrid Supervisor for Robust, Chatter-Free Policy Switching
Jan de Priester, Ricardo Sanfelice
Focused Skill Discovery: Learning to Control Specific State Variables while Minimizing Side Effects
Jonathan Colaço Carr, Qinyi Sun, Cameron Allen
Representation Learning and Skill Discovery with Empowerment
Andrew Levy, Alessandro G Allievi, George Konidaris
Compositional Instruction Following with Language Models and Reinforcement Learning
Composition and Zero-Shot Transfer with Lattice Structures in Reinforcement Learning
Double Horizon Model-Based Policy Optimization
RL Algorithms
Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes
Juan Sebastian Rojas, Chi-Guhn Lee
RL³: Boosting Meta Reinforcement Learning via RL inside RL²
Abhinav Bhatia, Samer B. Nashed, Shlomo Zilberstein
Fast Adaptation with Behavioral Foundation Models
Harshit Sikchi, Andrea Tirinzoni, Ahmed Touati, Yingchen Xu, Anssi Kanervisto, Scott Niekum, Amy Zhang, Alessandro Lazaric, Matteo Pirotta
Understanding Learned Representations and Action Collapse in Visual Reinforcement Learning
Xi Chen, Zhihui Zhu, Andrew Perrault
Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions
Ayush Jain, Norio Kosaka, Xinhu Li, Kyung-Min Kim, Erdem Biyik, Joseph J Lim
ProtoCRL: Prototype-based Network for Continual Reinforcement Learning
Michela Proietti, Peter R. Wurman, Peter Stone, Roberto Capobianco
Offline Reinforcement Learning with Domain-Unlabeled Data
Soichiro Nishimori, Xin-Qiang Cai, Johannes Ackermann, Masashi Sugiyama
SPEQ: Offline Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning
Carlo Romeo, Girolamo Macaluso, Alessandro Sestini, Andrew D. Bagdanov
Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps
Motoki Omura, Yusuke Mukuta, Kazuki Ota, Takayuki Osa, Tatsuya Harada
Zero-Shot Reinforcement Learning Under Partial Observability
Scott Jeen, Tom Bewley, Jonathan Cullen
Adaptive Submodular Policy Optimization
Branislav Kveton, Anup Rao, Viet Dac Lai, Nikos Vlassis, David Arbour
RL from Human Feedback, Imitation Learning
Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback
Qinqing Zheng, Mikael Henaff, Amy Zhang, Aditya Grover, Brandon Amos
Nonparametric Policy Improvement in Continuous Action Spaces via Expert Demonstrations
Agustin Castellano, Sohrab Rezaei, Jared Markowitz, Enrique Mallada
DisDP: Robust Imitation Learning via Disentangled Diffusion Policies
Pankhuri Vanjani, Paul Mattes, Xiaogang Jia, Vedant Dave, Rudolf Lioutikov
Mitigating Goal Misgeneralization via Minimax Regret
Karim Abdel Sadek, Matthew Farrugia-Roberts, Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger, Michael D Dennis
Modelling human exploration with light-weight meta reinforcement learning algorithms
Thomas D. Ferguson, Alona Fyshe, Adam White
Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners
Calarina Muslimani, Kerrick Johnstonbaugh, Suyog Chandramouli, Serena Booth, W. Bradley Knox, Matthew E. Taylor
PAC Apprenticeship Learning with Bayesian Active Inverse Reinforcement Learning
Ondrej Bajgar, Dewi Sid William Gould, Jonathon Liu, Alessandro Abate, Konstantinos Gatsis, Michael A Osborne
Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets
Alexander Levine, Peter Stone, Amy Zhang
One Goal, Many Challenges: Robust Preference Optimization Amid Content-Aware and Multi-Source Noise
Amirabbas Afzali, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, Sanjay Lall
Goals vs. Rewards: A Comparative Study of Objective Specification Mechanisms
Septia Rani, Serena Booth, Sarath Sreedharan