Sitemap

A list of all the posts and pages found on the site. For you robots out there is an XML version available for digesting as well.

Pages

publications

Probabilistic Multileave for Online Retrieval Evaluation

Anne Schuth, Robert-Jan Bruintjes, Fritjof Büttner, Joost van Doorn, Carla Groenland, Harrie Oosterhuis, Cong-Nguyen Tran, Bas Veeling, Jos van der Velde, Roger Wechsler, David Woudenberg and Maarten de Rijke. Published in Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’15), 2015. [pdf, code]

We propose probabilistic multileave and empirically show that it is highly sensitive and unbiased. An important implication of this result is that historical interactions with multileaved comparisons can be reused, allowing for ranker comparisons that need much less user interaction data. Read more

Probabilistic Multileave Gradient Descent

Harrie Oosterhuis, Anne Schuth and Maarten de Rijke. Published in European Conference on Information Retrieval (ECIR ’16), 2016. [pdf, code]

We propose an extension of DBGD, called probabilistic multileave gradient descent (P-MGD) that builds on probabilistic multileave, a recently proposed highly sensitive and unbiased online evaluation method. We demonstrate that P-MGD significantly outperforms state-of-the-art online learning to rank methods in terms of online performance, without sacrificing offline performance and at greater learning speed. Read more

Multileave Gradient Descent for Fast Online Learning to Rank

Anne Schuth, Harrie Oosterhuis, Shimon Whiteson and Maarten de Rijke. Published in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM ’16), 2016. [pdf, code]

An important implication of our results is that orders of magnitude less user interaction data is required to find good rankers when multileaved comparisons are used within online learning to rank. Hence, fewer users need to be exposed to possibly inferior rankers and our method allows search engines to adapt more quickly to changes in user preferences. Read more

Semantic Video Trailers

Harrie Oosterhuis, Sujith Ravi and Michael Bendersky. Published in ICML 2016 Workshop on Multi-View Representation Learning (MVRL ’16), 2016. [pdf]

Query-based video summarization is the task of creating a brief visual trailer, which captures the parts of the video (or a collection of videos) that are most relevant to the user-issued query. In this paper, we propose an unsupervised label propagation approach for this task. Read more

Query-level Ranker Specialization

Rolf Jagerman, Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 1st International Workshop on LEARning Next gEneration Rankers, co-located with the 3rd ACM International Conference on the Theory of Information Retrieval (ICTIR ’17), 2017. [pdf]

Traditional Learning to Rank models optimize a single ranking function for all available queries. This assumes that all queries come from a homogenous source. Instead, it seems reasonable to assume that queries originate from heterogenous sources, where certain queries may require documents to be ranked differently. Read more

Balancing Speed and Quality in Online Learning to Rank for Information Retrieval

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM ’17), 2017. [pdf, code]

Effective optimization is essential for interactive systems to provide a satisfactory user experience. However, it is often challenging to find an objective to optimize for. Generally, such objectives are manually crafted and rarely capture complex user needs accurately. Conversely, we propose an approach that infers the objective directly from observed user interactions. Read more

Sensitive and Scalable Online Evaluation with Theoretical Guarantees

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM ’17), 2017. [pdf, code]

Our contribution is two-fold. First, we propose a theoretical framework for systematically comparing multileaved comparison methods using the notions of considerateness, which concerns maintaining the user experience, and fidelity, which concerns reliable correct outcomes. Second, we introduce a novel multileaved comparison method, Pairwise Preference Multileaving (PPM). Read more

Optimizing Interactive Systems with Data-Driven Objectives

Ziming Li, Artem Grotov, Julia Kiseleva, Maarten de Rijke and Harrie Oosterhuis. Published in arXiv Preprint, 2018. [pdf]

Effective optimization is essential for interactive systems to provide a satisfactory user experience. However, it is often challenging to find an objective to optimize for. Generally, such objectives are manually crafted and rarely capture complex user needs accurately. Conversely, we propose an approach that infers the objective directly from observed user interactions. Read more

Ranking for Relevance and Display Preferences in Complex Presentation Layouts

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR ’18), 2018. [pdf, code]

In this paper, we consider so-called complex ranking settings where it is not clear what should be displayed, that is, what the relevant items are, and how they should be displayed, that is, where the most relevant items should be placed. These ranking settings are complex as they involve both traditional ranking and inferring the best display order. Read more

Differentiable Unbiased Online Learning to Rank

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM ’18), 2018. [pdf, code]

We introduce an entirely novel approach to OLTR that constructs a weighted differentiable pairwise loss after each interaction: Pairwise Differentiable Gradient Descent (PDGD). PDGD breaks away from the traditional approach that relies on interleaving or multileaving and extensive sampling of models to estimate gradients. Instead, its gradient is based on inferring preferences between document pairs from user clicks and can optimize any differentiable model. Read more

The Potential of Learned Index Structures for Index Compression

Harrie Oosterhuis, J. Shane Culpepper and Maarten de Rijke. Published in Proceedings of the 23rd Australasian Document Computing Symposium (ADCS ’18), 2018. [pdf]

In this paper, we consider whether such models may be applied to conjunctive Boolean querying. First, we investigate how a learned model can replace document postings of an inverted index, and then evaluate the compromises such an approach might have. Second, we evaluate the potential gains that can be achieved in terms of memory requirements. Our work shows that learned models have great potential in inverted indexing, and this direction seems to be a promising area for future research. Read more

Optimizing Ranking Models in an Online Setting

Harrie Oosterhuis and Maarten de Rijke. Published in European Conference on Information Retrieval (ECIR ’19), 2019. [pdf, code, slides]

Award Icon This paper won the Best Reproducibility Paper Award.

In this paper, we investigate whether the previous conclusions about the PDGD and DBGD comparison generalize from ideal to worst-case circumstances. We do so in two ways. First, we compare the theoretical properties of PDGD and DBGD, by taking a critical look at previously proven properties in the context of ranking. Second, we estimate an upper and lower bound on the performance of methods by simulating both ideal user behavior and extremely difficult behavior, i.e., almost-random non-cascading user models. Read more

To Model or to Intervene: A Comparison of Counterfactual and Online Learning to Rank from User Interactions

Rolf Jagerman, Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’19), 2019. [pdf, code, external video]

Learning to Rank (LTR) from user interactions is challenging as user feedback often contains high levels of bias and noise. At the moment, two methodologies for dealing with bias prevail in the field of LTR: counterfactual methods that learn from historical data and model user behavior to deal with biases; and online methods that perform interventions to deal with bias but use no explicit user models. Read more

Learning to Rank in Theory and Practice: From Gradient Boosting to Neural Networks and Unbiased Learning

Claudio Lucchese, Franco Maria Nardini, Rama Kumar Pasumarthi, Sebastian Bruch, Michael Bendersky, Xuanhui Wang, Harrie Oosterhuis, Rolf Jagerman and Maarten de Rijke. Published in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’19), 2019. [pdf, slides, website]

This tutorial aims to weave together diverse strands of modern Learning to Rank (LtR) research, and present them in a unified full-day tutorial. Read more

Unbiased Learning to Rank: Counterfactual and Online Approaches

Harrie Oosterhuis, Rolf Jagerman and Maarten de Rijke. Published in Companion Proceedings of the Web Conference 2020 (WWW ’20), 2020. [pdf, video, slides, website]

This tutorial is about Unbiased Learning to Rank, a recent research field that aims to learn unbiased user preferences from biased user interactions. We will provide an overview of the two main families of methods in Unbiased Learning to Rank: Counterfactual Learning to Rank (CLTR) and Online Learning to Rank (OLTR) and their underlying theory. Read more

Policy-Aware Unbiased Learning to Rank for Top-k Rankings

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’20), 2020. [pdf, code, video, slides]

There is currently no existing counterfactual unbiased LTR method for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR metrics that can account for the effect of a stochastic logging policy. We prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability to appear in the top-k ranking. Read more

Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval (ICTIR ’20), 2020. [pdf, code, video, slides]

Counterfactual evaluation can estimate Click-Through-Rate (CTR) differences between ranking systems based on historical interaction data, while mitigating the effect of position bias and item-selection bias. We introduce the novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for logging data so that the counterfactual estimate has minimal variance. As minimizing variance leads to faster convergence, LogOpt increases the data-efficiency of counterfactual estimation. LogOpt turns the counterfactual approach - which is indifferent to the logging policy - into an online approach, where the algorithm decides what rankings to display. Read more

Keeping Dataset Biases out of the Simulation: A Debiased Simulator for Reinforcement Learning based Recommender Systems

Jin Huang, Harrie Oosterhuis, Maarten de Rijke and Herke van Hoof. Published in Proceedings of the Fourteenth ACM Conference on Recommender Systems (RecSys ’20), 2020. [pdf, code, video]

We introduce a debiasing step in a RecSys simulation pipeline, which corrects for the biases present in the logged data before it is used to simulate user behavior. To evaluate the effects of bias on RL4Rec simulations, we propose a novel evaluation approach for simulators that considers the performance of policies optimized with the simulator. Our results reveal that the biases from logged data negatively impact the resulting policies, unless corrected for with our debiasing method. Read more

When Inverse Propensity Scoring does not Work: Affine Corrections for Unbiased Learning to Rank

Ali Vardasbi, Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM ’20), 2020. [pdf, code, video]

We prove that Inverse Propensity Scoring (IPS) is principally unable to correct for trust bias under non-trivial circumstances. Our main contribution is a new estimator based on affine corrections: it both reweights clicks and penalizes items displayed on ranks with high trust bias. Our estimator is the first estimator that is proven to remove the effect of both trust bias and position bias. Read more

Learning from User Interactions with Rankings: A Unification of the Field

Harrie Oosterhuis. Published in University of Amsterdam, PhD thesis, 2020. [pdf]

This thesis concerns learning to rank methods based on user clicks and specifically aims to unify the different families of these methods. As a whole, the second part of this thesis proposes a framework that bridges many gaps between areas of online, counterfactual, and supervised learning to rank. It has taken approaches, previously considered independent, and unified them into a single methodology for widely applicable and effective learning to rank from user clicks. Read more

Unifying Online and Counterfactual Learning to Rank

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM ’21), 2021. [pdf, code, video, slides, poster]

Award Icon This paper won the WSDM ’21 Best Paper Award.

We propose a novel intervention-aware estimator for both counterfactual and online Learning to Rank (LTR). With the introduction of the intervention-aware estimator, we aim to bridge the online/counterfactual LTR division as it is shown to be highly effective in both online and counterfactual scenarios. Read more

Robust Generalization and Safe Query-Specialization in Counterfactual Learning to Rank

Harrie Oosterhuis and Maarten de Rijke. Published in The Web Conference (WWW ’21), 2021. [pdf, code, video, slides]

We introduce the Generalization and Specialization (GENSPEC) algorithm, a robust feature-based counterfactual LTR method that pursues per-query memorization when it is safe to do so. GENSPEC optimizes a single feature-based model for generalization: robust performance across all queries, and many tabular models for specialization: each optimized for high performance on a single query. GENSPEC uses novel relative high-confidence bounds to choose which model to deploy per query. By doing so, GENSPEC enjoys the high performance of successfully specialized tabular models with the robustness of a generalized feature-based model. Read more

Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness

Harrie Oosterhuis. Published in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21), 2021. [pdf, code, video, slides]

Award Icon This paper won the SIGIR ’21 Best Paper Award.

In this paper, we introduce a novel algorithm: PL-Rank, that estimates the gradient of a PL ranking model w.r.t. both relevance and fairness metrics. Unlike existing approaches that are based on policy gradients, PL-Rank makes use of the specific structure of PL models and ranking metrics. Read more

Unifying Online and Counterfactual Learning to Rank (Extended Abstract)

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI ’21), 2021. [pdf, code, video, slides, poster]

We propose a novel intervention-aware estimator for both counterfactual and online Learning to Rank (LTR). With the introduction of the intervention-aware estimator, we aim to bridge the online/counterfactual LTR division as it is shown to be highly effective in both online and counterfactual scenarios. Read more

It Is Different When Items Are Older: Debiasing Recommendations When Selection Bias and User Preferences Are Dynamic

Jin Huang, Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 15th ACM International Conference on Web Search and Data Mining (WSDM ’22), 2022. [pdf, code]

We theoretically show that in a dynamic scenario in which both the selection bias and user preferences are dynamic, existing debiasing methods are no longer unbiased. To address this limitation, we introduce DebiAsing in the dyNamiC scEnaRio (DANCER), a novel debiasing method that extends the inverse propensity scoring debiasing method to account for dynamic selection bias and user preferences. Read more

FOCUS: Flexible Optimizable Counterfactual Explanations for Tree Ensembles

Ana Lucic, Harrie Oosterhuis, Hinda Haned and Maarten de Rijke. Published in AAAI 2022: Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022. [pdf, code]

We introduce an approximation technique that is effective for finding counterfactual explanations for predictions of the original model and show that our counterfactual examples are significantly closer to the original instances than those produced by other methods specifically designed for tree ensembles. Read more

The Bandwagon Effect: Not Just Another Bias

Norman Knyazev and Harrie Oosterhuis. Published in Proceedings of the 2022 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR ’22), 2022. [pdf, code]

We argue that the bandwagon effect should not be seen as a problem of statistical bias. In fact, we prove that this effect leaves both individual interactions and their sample mean unbiased. Nevertheless, we show that it can make estimators inconsistent, introducing a distinct set of problems for convergence in relevance estimation. Read more

Reaching the End of Unbiasedness: Uncovering Implicit Limitations of Click-Based Learning to Rank

Harrie Oosterhuis. Published in Proceedings of the 2022 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR ’22), 2022. [pdf, slides]

Award Icon This paper won the ICTIR ’22 Best Paper Award.

This work aims to uncover the implicit limitations of the high-level prevalent approach in the counterfactual LTR field. Thus, in contrast with limitations that follow from explicit assumptions, our aim is to recognize limitations that the field is currently unaware of. Read more

State Encoders in Reinforcement Learning for Recommendation: A Reproducibility Study

Jin Huang, Harrie Oosterhuis, Bunyamin Cetinkaya, Thijs Rood and Maarten de Rijke. Published in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22), 2022. [pdf, code]

We reproduce and expand on the existing comparison of attention-based state encoders (1) in the publicly available debiased RL4Rec SOFA simulator with (2) a different RL method, (3) more state encoders, and (4) a different dataset. Read more

Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness (Extended Abstract)

Harrie Oosterhuis. Published in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI ’22), 2022. [pdf, code, video, slides]

In this paper, we introduce a novel algorithm: PL-Rank, that estimates the gradient of a PL ranking model w.r.t. both relevance and fairness metrics. Unlike existing approaches that are based on policy gradients, PL-Rank makes use of the specific structure of PL models and ranking metrics. Read more

Closing the Gender Wage Gap: Adversarial Fairness in Job Recommendation

Clara Rus, Jeffrey Luppes, Harrie Oosterhuis and Gido H. Schoenmacker. Published in RecSys in HR’22 Workshop at RecSys ’22, 2022. [pdf]

Our results show that representations created from recruitment texts contain algorithmic bias and that this bias results in real-world consequences for recommendation systems. Without controlling for bias, women are recommended jobs with significantly lower salary in our data. Read more

Doubly-Robust Estimation for Correcting Position-Bias in Click Feedback for Unbiased Learning to Rank

Harrie Oosterhuis. Published in ACM Transactions on Information Systems 41.3 (TOIS ’23), 2023. [pdf, code, slides]

In this paper, we introduce a novel DR estimator that is the first DR approach specifically designed for position-bias. The difficulty with position bias is that the treatment – user examination – is not directly observable in click data. As a solution, our estimator uses the expected treatment per rank, instead of the actual treatment that existing DR estimators use. Read more

talks