Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.

publications

Probabilistic Multileave for Online Retrieval Evaluation

Anne Schuth, Robert-Jan Bruintjes, Fritjof Büttner, Joost van Doorn, Carla Groenland, Harrie Oosterhuis, Cong-Nguyen Tran, Bas Veeling, Jos van der Velde, Roger Wechsler, David Woudenberg and Maarten de Rijke. Published in Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’15), 2015. [pdf, code]

We propose probabilistic multileave and empirically show that it is highly sensitive and unbiased. An important implication of this result is that historical interactions with multileaved comparisons can be reused, allowing for ranker comparisons that need much less user interaction data. Read more

Probabilistic Multileave Gradient Descent

Harrie Oosterhuis, Anne Schuth and Maarten de Rijke. Published in European Conference on Information Retrieval (ECIR ’16), 2016. [pdf, code]

We propose an extension of DBGD, called probabilistic multileave gradient descent (P-MGD), that builds on probabilistic multileave, a recently proposed highly sensitive and unbiased online evaluation method. We demonstrate that P-MGD significantly outperforms state-of-the-art online learning to rank methods in terms of online performance, without sacrificing offline performance and at greater learning speed. Read more

Multileave Gradient Descent for Fast Online Learning to Rank

Anne Schuth, Harrie Oosterhuis, Shimon Whiteson and Maarten de Rijke. Published in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM ’16), 2016. [pdf, code]

An important implication of our results is that orders of magnitude less user interaction data is required to find good rankers when multileaved comparisons are used within online learning to rank. Hence, fewer users need to be exposed to possibly inferior rankers and our method allows search engines to adapt more quickly to changes in user preferences. Read more

Semantic Video Trailers

Harrie Oosterhuis, Sujith Ravi and Michael Bendersky. Published in ICML 2016 Workshop on Multi-View Representation Learning (MVRL ’16), 2016. [pdf]

Query-based video summarization is the task of creating a brief visual trailer, which captures the parts of the video (or a collection of videos) that are most relevant to the user-issued query. In this paper, we propose an unsupervised label propagation approach for this task. Read more

Query-level Ranker Specialization

Rolf Jagerman, Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 1st International Workshop on LEARning Next gEneration Rankers, co-located with the 3rd ACM International Conference on the Theory of Information Retrieval (ICTIR ’17), 2017. [pdf]

Traditional Learning to Rank models optimize a single ranking function for all available queries. This assumes that all queries come from a homogeneous source. Instead, it seems reasonable to assume that queries originate from heterogeneous sources, where certain queries may require documents to be ranked differently. Read more

Balancing Speed and Quality in Online Learning to Rank for Information Retrieval

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM ’17), 2017. [pdf, code]

Effective optimization is essential for interactive systems to provide a satisfactory user experience. However, it is often challenging to find an objective to optimize for. Generally, such objectives are manually crafted and rarely capture complex user needs accurately. Conversely, we propose an approach that infers the objective directly from observed user interactions. Read more

Sensitive and Scalable Online Evaluation with Theoretical Guarantees

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM ’17), 2017. [pdf, code]

Our contribution is two-fold. First, we propose a theoretical framework for systematically comparing multileaved comparison methods using the notions of considerateness, which concerns maintaining the user experience, and fidelity, which concerns reliable correct outcomes. Second, we introduce a novel multileaved comparison method, Pairwise Preference Multileaving (PPM). Read more

Optimizing Interactive Systems with Data-Driven Objectives

Ziming Li, Artem Grotov, Julia Kiseleva, Maarten de Rijke and Harrie Oosterhuis. Published in arXiv Preprint, 2018. [pdf]

Ranking for Relevance and Display Preferences in Complex Presentation Layouts

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR ’18), 2018. [pdf, code]

In this paper, we consider so-called complex ranking settings where it is not clear what should be displayed, that is, what the relevant items are, and how they should be displayed, that is, where the most relevant items should be placed. These ranking settings are complex as they involve both traditional ranking and inferring the best display order. Read more

Differentiable Unbiased Online Learning to Rank

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM ’18), 2018. [pdf, code]

We introduce an entirely novel approach to OLTR that constructs a weighted differentiable pairwise loss after each interaction: Pairwise Differentiable Gradient Descent (PDGD). PDGD breaks away from the traditional approach that relies on interleaving or multileaving and extensive sampling of models to estimate gradients. Instead, its gradient is based on inferring preferences between document pairs from user clicks and can optimize any differentiable model. Read more

The Potential of Learned Index Structures for Index Compression

Harrie Oosterhuis, J. Shane Culpepper and Maarten de Rijke. Published in Proceedings of the 23rd Australasian Document Computing Symposium (ADCS ’18), 2018. [pdf]

In this paper, we consider whether such models may be applied to conjunctive Boolean querying. First, we investigate how a learned model can replace document postings of an inverted index, and then evaluate the compromises such an approach might have. Second, we evaluate the potential gains that can be achieved in terms of memory requirements. Our work shows that learned models have great potential in inverted indexing, and this direction seems to be a promising area for future research. Read more

Optimizing Ranking Models in an Online Setting

Harrie Oosterhuis and Maarten de Rijke. Published in European Conference on Information Retrieval (ECIR ’19), 2019. [pdf, code, slides]

This paper won the Best Reproducibility Paper Award.

In this paper, we investigate whether the previous conclusions about the PDGD and DBGD comparison generalize from ideal to worst-case circumstances. We do so in two ways. First, we compare the theoretical properties of PDGD and DBGD, by taking a critical look at previously proven properties in the context of ranking. Second, we estimate an upper and lower bound on the performance of methods by simulating both ideal user behavior and extremely difficult behavior, i.e., almost-random non-cascading user models. Read more

To Model or to Intervene: A Comparison of Counterfactual and Online Learning to Rank from User Interactions

Rolf Jagerman, Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’19), 2019. [pdf, code, external video]

Learning to Rank (LTR) from user interactions is challenging as user feedback often contains high levels of bias and noise. At the moment, two methodologies for dealing with bias prevail in the field of LTR: counterfactual methods that learn from historical data and model user behavior to deal with biases; and online methods that perform interventions to deal with bias but use no explicit user models. Read more

Learning to Rank in Theory and Practice: From Gradient Boosting to Neural Networks and Unbiased Learning

Claudio Lucchese, Franco Maria Nardini, Rama Kumar Pasumarthi, Sebastian Bruch, Michael Bendersky, Xuanhui Wang, Harrie Oosterhuis, Rolf Jagerman and Maarten de Rijke. Published in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’19), 2019. [pdf, slides, website]

This tutorial aims to weave together diverse strands of modern Learning to Rank (LtR) research, and present them in a unified full-day tutorial. Read more

Unbiased Learning to Rank: Counterfactual and Online Approaches

Harrie Oosterhuis, Rolf Jagerman and Maarten de Rijke. Published in Companion Proceedings of the Web Conference 2020 (WWW ’20), 2020. [pdf, video, slides, website]

This tutorial is about Unbiased Learning to Rank, a recent research field that aims to learn unbiased user preferences from biased user interactions. We will provide an overview of the two main families of methods in Unbiased Learning to Rank: Counterfactual Learning to Rank (CLTR) and Online Learning to Rank (OLTR) and their underlying theory. Read more

Policy-Aware Unbiased Learning to Rank for Top-k Rankings

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’20), 2020. [pdf, code, video, slides]

There is currently no existing counterfactual unbiased LTR method for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR metrics that can account for the effect of a stochastic logging policy. We prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability to appear in the top-k ranking. Read more

Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval (ICTIR ’20), 2020. [pdf, code, video, slides]

Counterfactual evaluation can estimate Click-Through-Rate (CTR) differences between ranking systems based on historical interaction data, while mitigating the effect of position bias and item-selection bias. We introduce the novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for logging data so that the counterfactual estimate has minimal variance. As minimizing variance leads to faster convergence, LogOpt increases the data-efficiency of counterfactual estimation. LogOpt turns the counterfactual approach, which is indifferent to the logging policy, into an online approach, where the algorithm decides what rankings to display. Read more

Keeping Dataset Biases out of the Simulation: A Debiased Simulator for Reinforcement Learning based Recommender Systems

Jin Huang, Harrie Oosterhuis, Maarten de Rijke and Herke van Hoof. Published in Proceedings of the Fourteenth ACM Conference on Recommender Systems (RecSys ’20), 2020. [pdf, code, video]

We introduce a debiasing step in a RecSys simulation pipeline, which corrects for the biases present in the logged data before it is used to simulate user behavior. To evaluate the effects of bias on RL4Rec simulations, we propose a novel evaluation approach for simulators that considers the performance of policies optimized with the simulator. Our results reveal that the biases from logged data negatively impact the resulting policies, unless corrected for with our debiasing method. Read more

When Inverse Propensity Scoring does not Work: Affine Corrections for Unbiased Learning to Rank

Ali Vardasbi, Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM ’20), 2020. [pdf, code, video]

We prove that Inverse Propensity Scoring (IPS) is principally unable to correct for trust bias under non-trivial circumstances. Our main contribution is a new estimator based on affine corrections: it both reweights clicks and penalizes items displayed on ranks with high trust bias. Our estimator is the first estimator that is proven to remove the effect of both trust bias and position bias. Read more
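As a rough illustration of the affine-correction idea in the entry above (a sketch, not the authors' implementation), the Python snippet below assumes a click model in which the click probability at rank k is `alpha_k * relevance + beta_k`. The names `alpha_k`, `beta_k`, `affine_corrected_click` and `simulate_clicks`, as well as the example values, are illustrative assumptions and do not come from this listing. Under that assumed model, subtracting `beta_k` and dividing by `alpha_k` yields a corrected click whose expectation equals the relevance.

```python
import random


def affine_corrected_click(click, alpha_k, beta_k):
    """Map an observed click (0 or 1) to an affine-corrected relevance signal.

    Assumes P(click) = alpha_k * relevance + beta_k at the displayed rank.
    """
    return (click - beta_k) / alpha_k


def simulate_clicks(relevance, alpha_k, beta_k, n, seed=0):
    """Simulate n clicks under the assumed affine click model.

    Returns (raw click rate, mean of affine-corrected clicks).
    """
    rng = random.Random(seed)
    p_click = alpha_k * relevance + beta_k
    clicks = [1 if rng.random() < p_click else 0 for _ in range(n)]
    raw_rate = sum(clicks) / n
    corrected = sum(
        affine_corrected_click(c, alpha_k, beta_k) for c in clicks
    ) / n
    return raw_rate, corrected


if __name__ == "__main__":
    # Hypothetical values: true relevance 0.3, alpha_k = 0.5, beta_k = 0.1.
    raw, corrected = simulate_clicks(0.3, 0.5, 0.1, n=200_000)
    print(f"raw click rate: {raw:.3f}, corrected estimate: {corrected:.3f}")
```

In this toy setting the raw click rate settles near the click probability, while the corrected average settles near the assumed relevance; in practice `alpha_k` and `beta_k` would have to be estimated.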

Learning from User Interactions with Rankings: A Unification of the Field

Harrie Oosterhuis. Published in University of Amsterdam, PhD thesis, 2020. [pdf]

This thesis concerns learning to rank methods based on user clicks and specifically aims to unify the different families of these methods. As a whole, the second part of this thesis proposes a framework that bridges many gaps between areas of online, counterfactual, and supervised learning to rank. It has taken approaches, previously considered independent, and unified them into a single methodology for widely applicable and effective learning to rank from user clicks. Read more

Unifying Online and Counterfactual Learning to Rank

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM ’21), 2021. [pdf, code, video, slides, poster]

This paper won the WSDM ’21 Best Paper Award.

We propose a novel intervention-aware estimator for both counterfactual and online Learning to Rank (LTR). With the introduction of the intervention-aware estimator, we aim to bridge the online/counterfactual LTR division as it is shown to be highly effective in both online and counterfactual scenarios. Read more

Robust Generalization and Safe Query-Specialization in Counterfactual Learning to Rank

Harrie Oosterhuis and Maarten de Rijke. Published in The Web Conference (WWW ’21), 2021. [pdf, code, video, slides]

We introduce the Generalization and Specialization (GENSPEC) algorithm, a robust feature-based counterfactual LTR method that pursues per-query memorization when it is safe to do so. GENSPEC optimizes a single feature-based model for generalization: robust performance across all queries, and many tabular models for specialization: each optimized for high performance on a single query. GENSPEC uses novel relative high-confidence bounds to choose which model to deploy per query. By doing so, GENSPEC enjoys the high performance of successfully specialized tabular models with the robustness of a generalized feature-based model. Read more

Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness

Harrie Oosterhuis. Published in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21), 2021. [pdf, code, video, slides]

This paper won the SIGIR ’21 Best Paper Award.

In this paper, we introduce a novel algorithm: PL-Rank, that estimates the gradient of a PL ranking model w.r.t. both relevance and fairness metrics. Unlike existing approaches that are based on policy gradients, PL-Rank makes use of the specific structure of PL models and ranking metrics. Read more

Unifying Online and Counterfactual Learning to Rank (Extended Abstract)

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI ’21), 2021. [pdf, code, video, slides, poster]

It Is Different When Items Are Older: Debiasing Recommendations When Selection Bias and User Preferences Are Dynamic

Jin Huang, Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 15th ACM International Conference on Web Search and Data Mining (WSDM ’22), 2022. [pdf, code]

We theoretically show that in a dynamic scenario in which both the selection bias and user preferences are dynamic, existing debiasing methods are no longer unbiased. To address this limitation, we introduce DebiAsing in the dyNamiC scEnaRio (DANCER), a novel debiasing method that extends the inverse propensity scoring debiasing method to account for dynamic selection bias and user preferences. Read more

FOCUS: Flexible Optimizable Counterfactual Explanations for Tree Ensembles

Ana Lucic, Harrie Oosterhuis, Hinda Haned and Maarten de Rijke. Published in AAAI 2022: Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022. [pdf, code]

We introduce an approximation technique that is effective for finding counterfactual explanations for predictions of the original model and show that our counterfactual examples are significantly closer to the original instances than those produced by other methods specifically designed for tree ensembles. Read more

The Bandwagon Effect: Not Just Another Bias

Norman Knyazev and Harrie Oosterhuis. Published in Proceedings of the 2022 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR ’22), 2022. [pdf, code]

We argue that the bandwagon effect should not be seen as a problem of statistical bias. In fact, we prove that this effect leaves both individual interactions and their sample mean unbiased. Nevertheless, we show that it can make estimators inconsistent, introducing a distinct set of problems for convergence in relevance estimation. Read more

Reaching the End of Unbiasedness: Uncovering Implicit Limitations of Click-Based Learning to Rank

Harrie Oosterhuis. Published in Proceedings of the 2022 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR ’22), 2022. [pdf, slides]

This paper won the ICTIR ’22 Best Paper Award.

This work aims to uncover the implicit limitations of the high-level prevalent approach in the counterfactual LTR field. Thus, in contrast with limitations that follow from explicit assumptions, our aim is to recognize limitations that the field is currently unaware of. Read more

State Encoders in Reinforcement Learning for Recommendation: A Reproducibility Study

Jin Huang, Harrie Oosterhuis, Bunyamin Cetinkaya, Thijs Rood and Maarten de Rijke. Published in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22), 2022. [pdf, code]

We reproduce and expand on the existing comparison of attention-based state encoders (1) in the publicly available debiased RL4Rec SOFA simulator with (2) a different RL method, (3) more state encoders, and (4) a different dataset. Read more

Learning-to-Rank at the Speed of Sampling: Plackett-Luce Gradient Estimation With Minimal Computational Complexity

Harrie Oosterhuis. Published in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22), 2022. [pdf, code, video, slides, poster]

In this paper, we introduce the novel PL-Rank-3 algorithm that performs unbiased gradient estimation with a computational complexity comparable to the best sorting algorithms. Read more

Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness (Extended Abstract)

Harrie Oosterhuis. Published in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI ’22), 2022. [pdf, code, video, slides]

Closing the Gender Wage Gap: Adversarial Fairness in Job Recommendation

Clara Rus, Jeffrey Luppes, Harrie Oosterhuis and Gido H. Schoenmacker. Published in RecSys in HR’22 Workshop at RecSys ’22, 2022. [pdf]

Our results show that representations created from recruitment texts contain algorithmic bias and that this bias results in real-world consequences for recommendation systems. Without controlling for bias, women are recommended jobs with significantly lower salary in our data. Read more

VAE-IPS: A Deep Generative Recommendation Method for Unbiased Learning from Implicit Feedback

Shashank Gupta, Harrie Oosterhuis, and Maarten de Rijke. Published in CONSEQUENCES+REVEAL Workshop at RecSys ’22, 2022. [pdf]

In this work, we address this gap by introducing an inverse propensity scoring (IPS) based unbiased training method for VAEs from implicit feedback data, VAE-IPS, which is provably unbiased w.r.t. selection bias. Read more

Doubly-Robust Estimation for Correcting Position-Bias in Click Feedback for Unbiased Learning to Rank

Harrie Oosterhuis. Published in ACM Transactions on Information Systems 41.3 (TOIS ’23), 2023. [pdf, code, video, slides]

In this paper, we introduce a novel DR estimator that is the first DR approach specifically designed for position-bias. The difficulty with position bias is that the treatment – user examination – is not directly observable in click data. As a solution, our estimator uses the expected treatment per rank, instead of the actual treatment that existing DR estimators use. Read more

Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization

Shashank Gupta, Harrie Oosterhuis, and Maarten de Rijke. Published in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), 2023. [pdf, code]

We introduce a novel risk-aware CLTR method with theoretical guarantees for safe deployment. We apply a novel exposure-based concept of risk regularization to IPS estimation for LTR. Our risk regularization penalizes the mismatch between the ranking behavior of a learned model and a given safe model. Thereby, it ensures that learned ranking models stay close to a trusted model, when there is high uncertainty in IPS estimation, which greatly reduces the risks during deployment. Read more

Recent Advances in the Foundations and Applications of Unbiased Learning to Rank

Shashank Gupta, Philipp Hager, Jin Huang, Ali Vardasbi, and Harrie Oosterhuis. Published in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), 2023. [pdf, website]

This tutorial is intended to benefit both researchers and industry practitioners who are interested in developing new unbiased learning to rank solutions or utilizing them in real-world applications. Read more

A Deep Generative Recommendation Method for Unbiased Learning from Implicit Feedback

Shashank Gupta, Harrie Oosterhuis, and Maarten de Rijke. Published in Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR ’23), 2023. [pdf, code]

We introduce an inverse propensity scoring (IPS) based method for training VAEs from implicit feedback data in an unbiased way. Our IPS-based estimator for the VAE training objective, VAE-IPS, is provably unbiased w.r.t. selection bias. Read more

A Lightweight Method for Modeling Confidence in Recommendations with Learned Beta Distributions

Norman Knyazev and Harrie Oosterhuis. Published in Proceedings of the 17th ACM Conference on Recommender Systems (RecSys ’23), 2023. [pdf, code]

In this work, we propose learned beta distributions (LBD) as a simple and practical recommendation method with an explicit measure of confidence. Our main insight is that beta distributions predict user preferences as probability distributions that naturally model confidence on a closed interval, yet can be implemented with minimal model complexity. Read more

A First Look at Selection Bias in Preference Elicitation for Recommendation

Shashank Gupta, Harrie Oosterhuis, and Maarten de Rijke. Published in CONSEQUENCES Workshop at RecSys ’23 (CONSEQUENCES ’23), 2023. [pdf, code]

We take a first look at the effects of selection bias in preference elicitation and how they may be further investigated in the future. We find that a big hurdle is the current lack of any publicly available dataset that has preference elicitation interactions. Read more

Unbiased Learning to Rank: On Recent Advances and Practical Applications

Shashank Gupta, Philipp Hager, Jin Huang, Ali Vardasbi, and Harrie Oosterhuis. Published in Proceedings of the 17th ACM International Conference on Web Search and Data Mining (WSDM ’24), 2024. [pdf, website]

Is Interpretable Machine Learning Effective at Feature Selection for Neural Learning-to-Rank?

Lijun Lyu, Nirmal Roy, Harrie Oosterhuis, and Avishek Anand. Published in Proceedings of the 2024 European Conference on Information Retrieval (ECIR ’24), 2024. [pdf, code]

In this work, we explore feature selection for neural learning-to-rank (LTR). In particular, we investigate six widely-used methods from the field of interpretable machine learning (ML) and introduce our own modification, to select the input features that are most important to the ranking behavior. Read more

Estimating the Hessian Matrix of Ranking Objectives for Stochastic Learning to Rank with Gradient Boosted Trees

Jingwei Kang, Maarten de Rijke, and Harrie Oosterhuis. Published in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24), 2024. [pdf, code]

We introduce the first stochastic LTR method for GBDTs. Our main contribution is a novel estimator for the second-order derivatives, i.e., the Hessian matrix, which is a requirement for effective GBDTs. To efficiently compute both the first and second-order derivatives simultaneously, we incorporate our estimator into the existing PL-Rank framework. Read more

Going Beyond Popularity and Positivity Bias: Correcting for Multifactorial Bias in Recommender Systems

Jin Huang, Harrie Oosterhuis, Masoud Mansoury, Herke van Hoof, and Maarten de Rijke. Published in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24), 2024. [pdf, code]

We consider multifactorial selection bias in RSs. Our focus is on selection bias affected by both item and rating value factors, which is a generalization and combination of popularity and positivity bias. Read more

Local Feature Selection without Label or Feature Leakage for Interpretable Machine Learning Predictions

Harrie Oosterhuis, Lijun Lyu, and Avishek Anand. Published in Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, PMLR 235 (ICML ’24), 2024. [pdf, code]

Local feature selection in machine learning provides instance-specific explanations by focusing on the most relevant features for each prediction, enhancing the interpretability of complex models. However, such methods tend to produce misleading explanations by encoding additional information in their selections. Read more

Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I.

Harrie Oosterhuis, Rolf Jagerman, Zhen Qin, Xuanhui Wang, and Michael Bendersky. Published in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24), 2024. [pdf, code]

Recent advancements in generative artificial intelligence, specifically large language models (LLMs), can generate relevance annotations at an enormous scale with relatively small computational costs. Potentially, this could alleviate the costs traditionally associated with IR evaluation and make it applicable to numerous low-resource applications. However, generated relevance annotations are not immune to (systematic) errors, and as a result, directly using them for evaluation produces unreliable results. Read more

AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit

Mohanna Hoveyda, Arjen P de Vries, Maarten de Rijke, Harrie Oosterhuis, Faegheh Hasibi. Published in arXiv Preprint, 2024. [pdf, code]

We build on recent advances in the orchestration of multiple large language models (LLMs) and formulate adaptive QA as a dynamic orchestration challenge. We define this as a contextual multi-armed bandit problem, where the context is defined by the characteristics of the incoming question and the action space consists of potential communication graph configurations among the LLM agents. Read more

Optimal Baseline Corrections for Off-Policy Contextual Bandits

Shashank Gupta, Olivier Jeunen, Harrie Oosterhuis, and Maarten de Rijke. Published in Proceedings of the 18th ACM Conference on Recommender Systems (RecSys ’24), 2024. [pdf, code]

The off-policy learning paradigm allows for recommender systems and general ranking applications to be framed as decision-making problems, where we aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric. With unbiasedness comes potentially high variance, and prevalent methods exist to reduce estimation variance. These methods typically make use of control variates, either additive (i.e., baseline corrections or doubly robust methods) or multiplicative (i.e., self-normalisation). Read more

Proximal Ranking Policy Optimization for Practical Safety in Counterfactual Learning to Rank

Shashank Gupta, Harrie Oosterhuis, Maarten de Rijke. Published in CONSEQUENCES Workshop at RecSys ’24 (CONSEQUENCES ’24), 2024. [pdf, code]

We propose a novel approach, proximal ranking policy optimization (PRPO), that provides safety in deployment without assumptions about user behavior. PRPO removes incentives for learning ranking behavior that is too dissimilar to a safe ranking model. Read more

A Simpler Alternative to Variational Regularized Counterfactual Risk Minimization

Hua Chang Bakker, Shashank Gupta, Harrie Oosterhuis. Published in CONSEQUENCES Workshop at RecSys ’24 (CONSEQUENCES ’24), 2024. [pdf, code]

In this work, we revisit the original experimental setting of VRCRM and propose to minimize the f-divergence directly, instead of optimizing for the lower bound using an f-GAN approach. Surprisingly, we were unable to reproduce the results reported in the original setting. In response, we propose a novel, simpler alternative to f-divergence optimization: minimizing a direct approximation of the f-divergence instead of an f-GAN based lower bound. Read more

Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank

Shashank Gupta, Harrie Oosterhuis, and Maarten de Rijke. Published in Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM ’24), 2024. [pdf, code]

Our contributions are two-fold. First, we generalize the existing safe CLTR approach to make it applicable to state-of-the-art doubly robust CLTR and trust bias. Second, we propose a novel approach, proximal ranking policy optimization (PRPO), that provides safety in deployment without assumptions about user behavior. PRPO removes incentives for learning ranking behavior that is too dissimilar to a safe ranking model. Read more

Consolidating Ranking and Relevance Predictions of Large Language Models through Post-Processing

Le Yan, Zhen Qin, Honglei Zhuang, Rolf Jagerman, Xuanhui Wang, Michael Bendersky, Harrie Oosterhuis. Published in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP ’24), 2024. [pdf]

In this work, we propose a post-processing method to consolidate the relevance labels generated by an LLM with its powerful ranking abilities. Our method takes both LLM generated relevance labels and pairwise preferences. The labels are then altered to satisfy the pairwise preferences of the LLM, while staying as close to the original values as possible. Read more

A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning

Shashank Gupta, Chaitanya Ahuja, Tsung-Yu Lin, Sreya Dutta Roy, Harrie Oosterhuis, Maarten de Rijke, Satya Narayan Shukla. Published in arXiv Preprint, 2025. [pdf]

We systematically analyze the efficiency-effectiveness trade-off between REINFORCE and PPO, and propose leave-one-out PPO (LOOP), a novel RL method for diffusion fine-tuning. LOOP combines variance reduction techniques from REINFORCE, such as sampling multiple actions per input prompt and a baseline correction term, with the robustness and sample efficiency of PPO via clipping and importance sampling. Read more
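To make the two ingredients named in the entry above more concrete, here is a minimal Python sketch, not the authors' implementation. It assumes that the baseline for each sampled action is the mean reward of the other samples drawn for the same prompt, and that the clipping is the standard PPO clipped surrogate. The function names and the default `eps=0.2` are illustrative assumptions.

```python
def leave_one_out_advantages(rewards):
    """Advantage of each sample: its reward minus the mean of the others.

    `rewards` holds the rewards of several samples for the same prompt.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("need at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped objective for one sample.

    `ratio` is the importance weight: new policy probability divided by
    the probability under the policy that generated the sample.
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical rewards for three samples from one prompt.
    advantages = leave_one_out_advantages([1.0, 2.0, 3.0])
    print(advantages)
    print([clipped_surrogate(1.5, a) for a in advantages])
```

Because every sample is compared against the average of its siblings, the advantages for one prompt always sum to zero, which is what makes the baseline a variance-reduction device rather than a change of objective.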

Optimizing Compound Retrieval Systems

Harrie Oosterhuis, Rolf Jagerman, Zhen Qin, Xuanhui Wang. Published in Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’25), 2025. [pdf, code]

We propose the concept of compound retrieval systems as a broader class of retrieval systems that apply multiple prediction models. This encapsulates cascading models but also allows other types of interactions than top-K re-ranking. In particular, we enable interactions with large language models (LLMs) which can provide relative relevance comparisons. Read more

Adaptive Orchestration of Modular Generative Information Access Systems

Mohanna Hoveyda, Harrie Oosterhuis, Arjen P de Vries, Maarten de Rijke, Faegheh Hasibi. Published in Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’25), 2025. [pdf, code]

In this perspective paper, we argue that the architecture of future modular generative information access systems will not just assemble powerful components, but enable a self-organizing system through real-time adaptive orchestration - where components’ interactions are dynamically configured for each user input, maximizing information relevance while minimizing computational overhead. Read more

RecGaze: The First Eye Tracking and User Interaction Dataset for Carousel Interfaces

Santiago de Leon-Martinez, Jingwei Kang, Robert Moro, Maarten de Rijke, Branislav Kveton, Harrie Oosterhuis, Maria Bielikova. Published in Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’25), 2025. [pdf, code]

We introduce the RecGaze dataset: the first comprehensive feedback dataset on carousels that includes eye tracking results, clicks, cursor movements, and selection explanations. The dataset comprises interactions from 3 movie selection tasks with 40 different carousel interfaces per user. In total, 87 users and 3,477 interactions are logged. Read more

Learning to Rank with Variable Result Presentation Lengths

Norman Knyazev, Harrie Oosterhuis. Published in Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’25), 2025. [pdf, code]

We introduce the variable presentation length ranking task, where the ordering of documents and their presentation lengths are decided simultaneously. We propose VLPL, a new family of Plackett-Luce list-wise gradient estimation methods for the joint optimization of document ordering and lengths. Read more

Harnessing Pairwise Ranking Prompting Through Sample-Efficient Ranking Distillation

Junru Wu, Le Yan, Zhen Qin, Honglei Zhuang, Tianqi Liu, Zhe Dong, Xuanhui Wang, Harrie Oosterhuis. Published in Proceedings of ReNeuIR at SIGIR 2025: The Fourth Workshop on Reaching Efficiency in Neural Information Retrieval (ReNeuIR ’25), 2025. [pdf]

We propose to harness the effectiveness of PRP through pairwise distillation. Specifically, we distill a pointwise student ranker from pairwise teacher labels generated by PRP, resulting in an efficient student model that retains the performance of PRP with substantially lower computational costs. Read more

Rethinking Click Models in Light of Carousel Interfaces: Theory-Based Categorization and Design of Click Models

Jingwei Kang, Maarten de Rijke, Santiago de Leon-Martinez, Harrie Oosterhuis. Published in Proceedings of the 2025 ACM SIGIR International Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR ’25), 2025. [pdf]

This paper won the ICTIR ’25 Honorable Mention.

This work reconsiders what should be the fundamental concepts in click model design, grounding them - unlike previous approaches - in their mathematical properties. Read more

A Non-Parametric Choice Model That Learns How Users Choose Between Recommended Options

Thorsten Krause, Harrie Oosterhuis. Published in Proceedings of the 19th ACM Conference on Recommender Systems (RecSys ’25), 2025. [pdf, code]

We propose the learned choice model for recommendation (LCM4Rec), a non-parametric method for estimating the choice model. By applying kernel density estimation, LCM4Rec infers the most likely error distribution that describes the effect of inter-item cannibalization and thereby characterizes the users’ choice model. Read more

Dr. Harrie Oosterhuis

Sitemap

Pages

publications

talks