RLC 2025 Schedule
August 4–8, 2025 · University of Alberta, Edmonton, AB, Canada
Monday, August 4
Tutorials Day
2:00 PM – 6:00 PM
Tutorials
📍 CCIS, University of Alberta
Tuesday, August 5
Workshops Day
9:00 AM – 5:00 PM
Workshops
→ Programmatic Reinforcement Learning
→ Reinforcement Learning and Video Games
→ Inductive Biases in Reinforcement Learning
→ Coordination and Cooperation in Multi-Agent RL
→ Practical Insights into RL for Real Systems
→ The Causal Reinforcement Learning Workshop
→ RL Beyond Rewards: Ingredients for Developing Generalist Agents
→ Finding the Frame: Examining Conceptual Frameworks in RL
5:00 PM – 6:30 PM
Welcome Reception
📍 University (Faculty) Club
🎉 Social Event
7:00 PM
RLBReW After Dark
Discuss wacky RL ideas over food and drinks; find collaborators and friends!
Wednesday, August 6
Main Conference – Day 1
9:00 AM – 11:30 AM
Oral Presentations — Four Parallel Tracks
▶ Track 1: RL Algorithms
- Burning RED: Unlocking Subtask-Driven RL and Risk-Awareness in Average-Reward MDPs
- RL³: Boosting Meta Reinforcement Learning via RL inside RL²
- Fast Adaptation with Behavioral Foundation Models
- Understanding Learned Representations and Action Collapse in Visual RL
- Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions
- ProtoCRL: Prototype-based Network for Continual RL
▶ Track 2: RLHF & Imitation Learning
- Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback
- Nonparametric Policy Improvement in Continuous Action Spaces via Expert Demonstrations
- DisDP: Robust Imitation Learning via Disentangled Diffusion Policies
- Mitigating Goal Misgeneralization via Minimax Regret
- Modelling Human Exploration with Light-weight Meta RL Algorithms
- Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners
▶ Track 3: Hierarchical RL & Planning
- AVID: Adapting Video Diffusion Models to World Models
- The Confusing Instance Principle for Online Linear Quadratic Control
- Long-Horizon Planning with Predictable Skills
- Optimal Discounting for Offline Input-Driven MDP
- A Timer-Enforced Hybrid Supervisor for Robust, Chatter-Free Policy Switching
- Focused Skill Discovery: Learning to Control Specific State Variables
▶ Track 4: Evaluation & Benchmarks
- Which Experiences Are Influential for RL Agents?
- Offline vs. Online Learning in Model-based RL: Lessons for Data Collection
- Multi-Task RL Enables Parameter Scaling
- Benchmarking Massively Parallelized Multi-Task RL for Robotics
- PufferLib 2.0: Reinforcement Learning at 1M steps/s
- Uncovering RL Integration in SSL Loss
11:45 AM – 12:30 PM
Oral Presentations (Continued)
▶ Track 1: RL Algorithms (cont.)
- Offline RL with Domain-Unlabeled Data
- SPEQ: Offline Stabilization Phases for Efficient Q-Learning
- Offline RL with Wasserstein Regularization via Optimal Transport Maps
- Zero-Shot RL Under Partial Observability
- Adaptive Submodular Policy Optimization
▶ Track 2: RLHF & Imitation Learning (cont.)
- PAC Apprenticeship Learning with Bayesian Active Inverse RL
- Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets
- One Goal, Many Challenges: Robust Preference Optimization Amid Multi-Source Noise
- Goals vs. Rewards: A Comparative Study of Objective Specification Mechanisms
▶ Track 3: Hierarchical RL & Planning (cont.)
- Representation Learning and Skill Discovery with Empowerment
- Compositional Instruction Following with Language Models and RL
- Composition and Zero-Shot Transfer with Lattice Structures in RL
- Double Horizon Model-Based Policy Optimization
▶ Track 4: Evaluation & Benchmarks (cont.)
- Benchmarking Partial Observability in RL with a Suite of Memory-Improvable Domains
- How Should We Meta-Learn RL Algorithms?
- AdaStop: Adaptive Statistical Testing for Sound Comparisons of Deep RL Agents
- MixUCB: Enhancing Safe Exploration in Contextual Bandits with Human Oversight
🍽️ Banquet Dinner
6:00 PM
Conference Banquet
📍 Edmonton Convention Centre, Hall D · 9797 Jasper Ave, Edmonton, AB T5J 1N9
Doors open at 6:00 PM · Buffet dinner at 6:45 PM
Entertainment: Improv by RapidFire Theatre & Puzzle Hunt by Michael Bowling and Michael Littman
Thursday, August 7
Main Conference – Day 2
9:00 AM – 11:30 AM
Oral Presentations — Four Parallel Tracks
▶ Track 1: Deep RL
- Understanding the Effectiveness of Learning Behavioral Metrics in Deep RL
- Impoola: The Power of Average Pooling for Image-based Deep RL
- Eau De Q-Network: Adaptive Distillation of Neural Networks in Deep RL
- Disentangling Recognition and Decision Regrets in Image-Based RL
- Make the Pertinent Salient: Task-Relevant Reconstruction for Visual Control
- Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Bandit Learning
▶ Track 2: Social/Economic & Neuroscience
- Pareto Optimal Learning from Preferences with Hidden Context
- When and Why Hyperbolic Discounting Matters for RL Interventions
- RL from Human Feedback with High-Confidence Safety Guarantees
- Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models
- RL for Human-AI Collaboration via Probabilistic Intent Inference
- High-Confidence Policy Improvement from Human Feedback
▶ Track 3: Exploration
- Uncertainty Prioritized Experience Replay
- Pure Exploration for Constrained Best Mixed Arm Identification
- Quantitative Resilience Modeling for Autonomous Cyber Defense
- Learning to Explore in Diverse Reward Settings via TD-Error Maximization
- Syllabus: Portable Curricula for Reinforcement Learning Agents
- Exploration-Free RL with Linear Function Approximation
▶ Track 4: Theory & Bandits
- A Finite-Time Analysis of Distributed Q-Learning
- Finite-Time Analysis of Minimax Q-Learning
- Improved Regret Bound for Safe RL via Tighter Cost Pessimism and Reward Optimism
- Non-Stationary Latent Auto-Regressive Bandits
- A Finite-Sample Analysis of an Actor-Critic Algorithm for Mean-Variance Optimization
- Leveraging Priors on Distribution Functions for Multi-arm Bandits
11:45 AM – 12:30 PM
Oral Presentations (Continued)
▶ Track 1: Deep RL (cont.)
- Sampling from Energy-based Policies using Diffusion
- Optimistic Critics Can Empower Small Actors
- Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks
- AGaLiTe: Approximate Gated Linear Transformers for Online RL
- Deep RL with Gradient Eligibility Traces
▶ Track 2: Social/Economic & Neuroscience (cont.)
- Building Sequential Resource Allocation Mechanisms without Payments
- From Explainability to Interpretability: Interpretable RL Via Model Explanations
- Learning Fair Pareto-Optimal Policies in Multi-Objective RL
- AI in a Vat: Fundamental Limits of Efficient World Modelling for Safe Agent Sandboxing
▶ Track 3: Exploration (cont.)
- Value Bonuses using Ensemble Errors for Exploration in RL
- Intrinsically Motivated Discovery of Temporally Abstract Graph-based Models
- An Optimisation Framework for Unsupervised Environment Design
- Epistemically-guided Forward-backward Exploration
- RLeXplore: Accelerating Research in Intrinsically-Motivated RL
▶ Track 4: Theory & Bandits (cont.)
- Multi-task Representation Learning for Fixed Budget Pure-Exploration in Bandits
- On Slowly-varying Non-stationary Bandits
- Empirical Bound Information-Directed Sampling
- Thompson Sampling for Constrained Bandits
- Achieving Limited Adaptivity for Multinomial Logistic Bandits
Friday, August 8
Main Conference – Day 3
9:00 AM – 11:30 AM
Oral Presentations — Four Parallel Tracks
▶ Track 1: RL Algorithms & Deep RL
- Bayesian Meta-RL with Laplace Variational Recurrent Networks
- Cascade: A Sequential Ensemble Method for Continuous Control Tasks
- HANQ: Hypergradients, Asymmetry, and Normalization for Fast Deep Q-Learning
- Rectifying Regression in Reinforcement Learning
- Efficient Morphology-Aware Policy Transfer to New Embodiments
- Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting
▶ Track 2: Applied RL
- Action Mapping for RL in Continuous Environments with Constraints
- Chargax: A JAX Accelerated EV Charging Simulator
- WOFOSTGym: A Crop Simulator for Learning Crop Management Strategies
- Drive Fast, Learn Faster: On-Board RL for High Performance Autonomous Racing
- Multi-Agent RL for Inverse Design in Photonic Integrated Circuits
- Gaussian Process Q-Learning for Finite-Horizon MDP
▶ Track 3: Multi-Agent RL
- RL for Finite Space Mean-Field Type Game
- Collaboration Promotes Group Resilience in Multi-Agent RL
- Foundation Model Self-Play: Open-Ended Strategy Innovation
- Hierarchical Multi-agent RL for Cyber Network Defense
- Efficient Information Sharing for Training Decentralized Multi-Agent World Models
- Adaptive Reward Sharing to Enhance Learning in Multiagent Teams
▶ Track 4: Foundations
- Effect of a Slowdown Correlated to the Current State on Asynchronous Learning
- Average-Reward Soft Actor-Critic
- Your Learned Constraint is Secretly a Backward Reachable Tube
- Recursive Reward Aggregation
11:45 AM – 12:30 PM
Oral Presentations (Continued)
▶ Track 1: RL Algorithms & Deep RL (cont.)
- Concept-Based Off-Policy Evaluation
- Multiple-Frequencies Population-Based Training
- AVG-DICE: Stationary Distribution Correction by Regression
- Iterated Q-Network: Beyond One-Step Bellman Updates in Deep RL
▶ Track 2: Applied RL (cont.)
- Hybrid Classical/RL Local Planner for Ground Robot Navigation
- V-Max: Making RL Practical for Autonomous Driving
- Shaping Laser Pulses with Reinforcement Learning
- Learning Sub-Second Routing Optimization in Computer Networks
▶ Track 3: Multi-Agent RL (cont.)
- Seldonian RL for Ad Hoc Teamwork
- Joint-Local Grounded Action Transformation for Sim-to-Real Transfer in Multi-Agent Traffic Control
- TransAM: Transformer-Based Agent Modeling for Multi-Agent Systems
- PEnGUiN: Partially Equivariant Graph Neural Networks for Sample Efficient MARL
- Human-Level Competitive Pokémon via Scalable Offline RL with Transformers
▶ Track 4: Foundations (cont.)
- Investigating the Utility of Mirror Descent in Off-policy Actor-Critic
- Rethinking the Foundations for Continual RL
- An Analysis of Action-Value Temporal-Difference Methods That Learn State Values
- RL with Adaptive Temporal Discounting