RLC 2026 Schedule
August 15–18, 2026 · University of Alberta, Edmonton, AB, Canada
Saturday, August 15
Tutorials
2:00 PM – 6:00 PM
Tutorials
📍 CCIS, University of Alberta
Sunday, August 16
Workshops
9:00 AM – 5:00 PM
Workshops
→ Programmatic Reinforcement Learning
→ Reinforcement Learning and Video Games
→ Inductive Biases in Reinforcement Learning
→ Coordination and Cooperation in Multi-Agent RL
→ Practical Insights into RL for Real Systems
→ The Causal Reinforcement Learning Workshop
→ RL Beyond Rewards: Ingredients for Developing Generalist Agents
→ Finding the Frame: Examining Conceptual Frameworks in RL
5:00 PM – 6:30 PM
Welcome Reception
📍 University (Faculty) Club
7:00 PM
RLBReW After Dark
📍 MKT Fresh Food | Beer Market
Monday, August 17
Main Conference – Day 1
9:00 AM – 11:30 AM
Oral Presentations — Four Parallel Tracks
▶ RL Algorithms
- Burning RED: Unlocking Subtask-Driven RL and Risk-Awareness in Average-Reward MDPs
- RL³: Boosting Meta Reinforcement Learning via RL inside RL²
- Fast Adaptation with Behavioral Foundation Models
- Understanding Learned Representations and Action Collapse in Visual RL
- Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions
- ProtoCRL: Prototype-based Network for Continual RL
▶ RLHF & Imitation Learning
- Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback
- Nonparametric Policy Improvement in Continuous Action Spaces via Expert Demonstrations
- DisDP: Robust Imitation Learning via Disentangled Diffusion Policies
- Mitigating Goal Misgeneralization via Minimax Regret
- Modelling Human Exploration with Light-weight Meta RL Algorithms
- Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners
▶ Hierarchical RL & Planning
- AVID: Adapting Video Diffusion Models to World Models
- The Confusing Instance Principle for Online Linear Quadratic Control
- Long-Horizon Planning with Predictable Skills
- Optimal Discounting for Offline Input-Driven MDP
- A Timer-Enforced Hybrid Supervisor for Robust, Chatter-Free Policy Switching
- Focused Skill Discovery: Learning to Control Specific State Variables
▶ Evaluation & Benchmarks
- Which Experiences Are Influential for RL Agents?
- Offline vs. Online Learning in Model-based RL: Lessons for Data Collection
- Multi-Task RL Enables Parameter Scaling
- Benchmarking Massively Parallelized Multi-Task RL for Robotics
- PufferLib 2.0: Reinforcement Learning at 1M steps/s
- Uncovering RL Integration in SSL Loss
11:45 AM – 12:30 PM
Oral Presentations (Continued)
▶ RL Algorithms
- Offline RL with Domain-Unlabeled Data
- SPEQ: Offline Stabilization Phases for Efficient Q-Learning
- Offline RL with Wasserstein Regularization via Optimal Transport Maps
- Zero-Shot RL Under Partial Observability
- Adaptive Submodular Policy Optimization
▶ RLHF & Imitation Learning
- PAC Apprenticeship Learning with Bayesian Active Inverse RL
- Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets
- One Goal, Many Challenges: Robust Preference Optimization Amid Multi-Source Noise
- Goals vs. Rewards: A Comparative Study of Objective Specification Mechanisms
▶ Hierarchical RL & Planning
- Representation Learning and Skill Discovery with Empowerment
- Compositional Instruction Following with Language Models and RL
- Composition and Zero-Shot Transfer with Lattice Structures in RL
- Double Horizon Model-Based Policy Optimization
▶ Evaluation & Benchmarks
- Benchmarking Partial Observability in RL with a Suite of Memory-Improvable Domains
- How Should We Meta-Learn RL Algorithms?
- AdaStop: Adaptive Statistical Testing for Sound Comparisons of Deep RL Agents
- MixUCB: Enhancing Safe Exploration in Contextual Bandits with Human Oversight
6:00 PM
Conference Banquet
📍 Edmonton Convention Centre, Hall D
Doors open at 6:00 PM. Buffet dinner at 6:45 PM. Entertainment by RapidFire Theatre.
Tuesday, August 18
Main Conference – Day 2
9:00 AM – 11:30 AM
Oral Presentations — Four Parallel Tracks
▶ Deep RL
- Understanding the Effectiveness of Learning Behavioral Metrics in Deep RL
- Impoola: The Power of Average Pooling for Image-based Deep RL
- Eau De Q-Network: Adaptive Distillation of Neural Networks in Deep RL
- Disentangling Recognition and Decision Regrets in Image-Based RL
- Make the Pertinent Salient: Task-Relevant Reconstruction for Visual Control
- Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Bandit Learning
▶ Social/Economic & Neuroscience
- Pareto Optimal Learning from Preferences with Hidden Context
- When and Why Hyperbolic Discounting Matters for RL Interventions
- RL from Human Feedback with High-Confidence Safety Guarantees
- Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models
- RL for Human-AI Collaboration via Probabilistic Intent Inference
- High-Confidence Policy Improvement from Human Feedback
▶ Exploration
- Uncertainty Prioritized Experience Replay
- Pure Exploration for Constrained Best Mixed Arm Identification
- Quantitative Resilience Modeling for Autonomous Cyber Defense
- Learning to Explore in Diverse Reward Settings via TD-Error Maximization
- Syllabus: Portable Curricula for Reinforcement Learning Agents
- Exploration-Free RL with Linear Function Approximation
▶ Theory & Bandits
- A Finite-Time Analysis of Distributed Q-Learning
- Finite-Time Analysis of Minimax Q-Learning
- Improved Regret Bound for Safe RL via Tighter Cost Pessimism and Reward Optimism
- Non-Stationary Latent Auto-Regressive Bandits
- A Finite-Sample Analysis of an Actor-Critic Algorithm for Mean-Variance Optimization
- Leveraging Priors on Distribution Functions for Multi-arm Bandits
11:45 AM – 12:30 PM
Oral Presentations (Continued)
▶ Deep RL
- Sampling from Energy-based Policies using Diffusion
- Optimistic Critics Can Empower Small Actors
- Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks
- AGaLiTe: Approximate Gated Linear Transformers for Online RL
- Deep RL with Gradient Eligibility Traces
▶ Social/Economic & Neuroscience
- Building Sequential Resource Allocation Mechanisms without Payments
- From Explainability to Interpretability: Interpretable RL Via Model Explanations
- Learning Fair Pareto-Optimal Policies in Multi-Objective RL
- AI in a Vat: Fundamental Limits of Efficient World Modelling for Safe Agent Sandboxing
▶ Exploration
- Value Bonuses using Ensemble Errors for Exploration in RL
- Intrinsically Motivated Discovery of Temporally Abstract Graph-based Models
- An Optimisation Framework for Unsupervised Environment Design
- Epistemically-guided Forward-backward Exploration
- RLeXplore: Accelerating Research in Intrinsically-Motivated RL
▶ Theory & Bandits
- Multi-task Representation Learning for Fixed Budget Pure-Exploration in Bandits
- On Slowly-varying Non-stationary Bandits
- Empirical Bound Information-Directed Sampling
- Thompson Sampling for Constrained Bandits
- Achieving Limited Adaptivity for Multinomial Logistic Bandits