Sitemap

A list of all the posts and pages found on this site. For the robots out there, an XML version is available for digestion as well.

Pages

publications

Probabilistic Multileave for Online Retrieval Evaluation

Anne Schuth, Robert-Jan Bruintjes, Fritjof Büttner, Joost van Doorn, Carla Groenland, Harrie Oosterhuis, Cong-Nguyen Tran, Bas Veeling, Jos van der Velde, Roger Wechsler, David Woudenberg and Maarten de Rijke. Published in Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’15), 2015. [pdf, code]

We propose probabilistic multileave and empirically show that it is highly sensitive and unbiased. An important implication of this result is that historical interactions with multileaved comparisons can be reused, allowing for ranker comparisons that need much less user interaction data. Read more

Probabilistic Multileave Gradient Descent

Harrie Oosterhuis, Anne Schuth and Maarten de Rijke. Published in European Conference on Information Retrieval (ECIR ’16), 2016. [pdf, code]

We propose an extension of DBGD, called probabilistic multileave gradient descent (P-MGD) that builds on probabilistic multileave, a recently proposed highly sensitive and unbiased online evaluation method. We demonstrate that P-MGD significantly outperforms state-of-the-art online learning to rank methods in terms of online performance, without sacrificing offline performance and at greater learning speed. Read more

Multileave Gradient Descent for Fast Online Learning to Rank

Anne Schuth, Harrie Oosterhuis, Shimon Whiteson and Maarten de Rijke. Published in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM ’16), 2016. [pdf, code]

An important implication of our results is that orders of magnitude less user interaction data is required to find good rankers when multileaved comparisons are used within online learning to rank. Hence, fewer users need to be exposed to possibly inferior rankers and our method allows search engines to adapt more quickly to changes in user preferences. Read more

Semantic Video Trailers

Harrie Oosterhuis, Sujith Ravi and Michael Bendersky. Published in ICML 2016 Workshop on Multi-View Representation Learning (MVRL ’16), 2016. [pdf]

Query-based video summarization is the task of creating a brief visual trailer, which captures the parts of the video (or a collection of videos) that are most relevant to the user-issued query. In this paper, we propose an unsupervised label propagation approach for this task. Read more

Query-level Ranker Specialization

Rolf Jagerman, Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 1st International Workshop on LEARning Next gEneration Rankers, co-located with the 3rd ACM International Conference on the Theory of Information Retrieval (ICTIR ’17), 2017. [pdf]

Traditional Learning to Rank models optimize a single ranking function for all available queries. This assumes that all queries come from a homogeneous source. Instead, it seems reasonable to assume that queries originate from heterogeneous sources, where certain queries may require documents to be ranked differently. Read more

Balancing Speed and Quality in Online Learning to Rank for Information Retrieval

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM ’17), 2017. [pdf, code]

Effective optimization is essential for interactive systems to provide a satisfactory user experience. However, it is often challenging to find an objective to optimize for. Generally, such objectives are manually crafted and rarely capture complex user needs accurately. Conversely, we propose an approach that infers the objective directly from observed user interactions. Read more

Sensitive and Scalable Online Evaluation with Theoretical Guarantees

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM ’17), 2017. [pdf, code]

Our contribution is two-fold. First, we propose a theoretical framework for systematically comparing multileaved comparison methods using the notions of considerateness, which concerns maintaining the user experience, and fidelity, which concerns reliable correct outcomes. Second, we introduce a novel multileaved comparison method, Pairwise Preference Multileaving (PPM). Read more

Optimizing Interactive Systems with Data-Driven Objectives

Ziming Li, Artem Grotov, Julia Kiseleva, Maarten de Rijke and Harrie Oosterhuis. Published in arXiv Preprint, 2018. [pdf]

Effective optimization is essential for interactive systems to provide a satisfactory user experience. However, it is often challenging to find an objective to optimize for. Generally, such objectives are manually crafted and rarely capture complex user needs accurately. Conversely, we propose an approach that infers the objective directly from observed user interactions. Read more

Ranking for Relevance and Display Preferences in Complex Presentation Layouts

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR ’18), 2018. [pdf, code]

In this paper, we consider so-called complex ranking settings where it is not clear what should be displayed, that is, what the relevant items are, and how they should be displayed, that is, where the most relevant items should be placed. These ranking settings are complex as they involve both traditional ranking and inferring the best display order. Read more

Differentiable Unbiased Online Learning to Rank

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM ’18), 2018. [pdf, code]

We introduce an entirely novel approach to OLTR that constructs a weighted differentiable pairwise loss after each interaction: Pairwise Differentiable Gradient Descent (PDGD). PDGD breaks away from the traditional approach that relies on interleaving or multileaving and extensive sampling of models to estimate gradients. Instead, its gradient is based on inferring preferences between document pairs from user clicks and can optimize any differentiable model. Read more
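
To make the pairwise idea concrete, here is a heavily simplified sketch of a PDGD-style update for a linear scoring model. It is not the released code: the preference-inference rule is reduced to "a clicked document beats the unclicked documents shown above it", and the debiasing weight `rho` is left as a placeholder.

```python
import numpy as np

def pdgd_style_update(doc_features, clicked, weights, lr=0.1):
    """Sketch of a PDGD-style update for a linear ranker.

    doc_features: (n_docs, n_dims) features of the displayed ranking (top to bottom).
    clicked:      boolean array, True for clicked documents.
    weights:      current linear model parameters.
    """
    scores = doc_features @ weights
    gradient = np.zeros_like(weights)
    for i, was_clicked in enumerate(clicked):
        if not was_clicked:
            continue
        # Simplified preference inference: a clicked document is preferred
        # over every unclicked document displayed above it.
        for j in range(i):
            if clicked[j]:
                continue
            # Pairwise Plackett-Luce probability that document i beats document j.
            p_i = np.exp(scores[i]) / (np.exp(scores[i]) + np.exp(scores[j]))
            # Gradient of log P(i > j) w.r.t. the linear weights, scaled by a
            # placeholder for PDGD's debiasing weight.
            rho = 1.0  # the real method computes this from the ranking distribution
            gradient += rho * (1.0 - p_i) * (doc_features[i] - doc_features[j])
    return weights + lr * gradient
```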

The Potential of Learned Index Structures for Index Compression

Harrie Oosterhuis, J. Shane Culpepper and Maarten de Rijke. Published in Proceedings of the 23rd Australasian Document Computing Symposium (ADCS ’18), 2018. [pdf]

In this paper, we consider whether such models may be applied to conjunctive Boolean querying. First, we investigate how a learned model can replace document postings of an inverted index, and then evaluate the compromises such an approach might have. Second, we evaluate the potential gains that can be achieved in terms of memory requirements. Our work shows that learned models have great potential in inverted indexing, and this direction seems to be a promising area for future research. Read more

Optimizing Ranking Models in an Online Setting

Harrie Oosterhuis and Maarten de Rijke. Published in European Conference on Information Retrieval (ECIR ’19), 2019. [pdf, code, slides]

This paper won the Best Reproducibility Paper Award.

In this paper, we investigate whether the previous conclusions about the PDGD and DBGD comparison generalize from ideal to worst-case circumstances. We do so in two ways. First, we compare the theoretical properties of PDGD and DBGD, by taking a critical look at previously proven properties in the context of ranking. Second, we estimate an upper and lower bound on the performance of methods by simulating both ideal user behavior and extremely difficult behavior, i.e., almost-random non-cascading user models. Read more

To Model or to Intervene: A Comparison of Counterfactual and Online Learning to Rank from User Interactions

Rolf Jagerman, Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’19), 2019. [pdf, code, external video]

Learning to Rank (LTR) from user interactions is challenging as user feedback often contains high levels of bias and noise. At the moment, two methodologies for dealing with bias prevail in the field of LTR: counterfactual methods that learn from historical data and model user behavior to deal with biases; and online methods that perform interventions to deal with bias but use no explicit user models. Read more

Learning to Rank in Theory and Practice: From Gradient Boosting to Neural Networks and Unbiased Learning

Claudio Lucchese, Franco Maria Nardini, Rama Kumar Pasumarthi, Sebastian Bruch, Michael Bendersky, Xuanhui Wang, Harrie Oosterhuis, Rolf Jagerman and Maarten de Rijke. Published in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’19), 2019. [pdf, slides, website]

This tutorial aims to weave together diverse strands of modern Learning to Rank (LtR) research, and present them in a unified full-day tutorial. Read more

Unbiased Learning to Rank: Counterfactual and Online Approaches

Harrie Oosterhuis, Rolf Jagerman and Maarten de Rijke. Published in Companion Proceedings of the Web Conference 2020 (WWW ’20), 2020. [pdf, video, slides, website]

This tutorial is about Unbiased Learning to Rank, a recent research field that aims to learn unbiased user preferences from biased user interactions. We will provide an overview of the two main families of methods in Unbiased Learning to Rank: Counterfactual Learning to Rank (CLTR) and Online Learning to Rank (OLTR) and their underlying theory. Read more

Policy-Aware Unbiased Learning to Rank for Top-k Rankings

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’20), 2020. [pdf, code, video, slides]

Currently, no counterfactual unbiased LTR method exists for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR metrics that can account for the effect of a stochastic logging policy. We prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability of appearing in the top-k ranking. Read more
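
As a rough illustration of the policy-aware idea (my own sketch with hypothetical argument names, not the paper's implementation), the propensity of a document becomes its examination probability marginalized over the rankings the stochastic logging policy may display:

```python
import numpy as np

def policy_aware_propensity(doc, logging_rankings, ranking_probs, exam_prob_at_rank, k):
    """Expected examination probability of `doc` under a stochastic logging policy.

    logging_rankings:  list of candidate rankings (lists of document ids).
    ranking_probs:     probability that the logging policy displays each ranking.
    exam_prob_at_rank: examination probability per rank (position bias).
    k:                 length of the displayed top-k ranking.
    """
    propensity = 0.0
    for ranking, p_ranking in zip(logging_rankings, ranking_probs):
        if doc in ranking[:k]:
            rank = ranking.index(doc)
            propensity += p_ranking * exam_prob_at_rank[rank]
        # Documents outside the top-k contribute zero examination probability.
    return propensity

# Clicks are then reweighted by 1 / propensity, which is well defined as long as
# every relevant document has a non-zero probability of appearing in the top-k.
```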

Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval (ICTIR ’20), 2020. [pdf, code, video, slides]

Counterfactual evaluation can estimate Click-Through-Rate (CTR) differences between ranking systems based on historical interaction data, while mitigating the effect of position bias and item-selection bias. We introduce the novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for logging data so that the counterfactual estimate has minimal variance. As minimizing variance leads to faster convergence, LogOpt increases the data-efficiency of counterfactual estimation. LogOpt turns the counterfactual approach, which is indifferent to the logging policy, into an online approach, where the algorithm decides what rankings to display. Read more
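
For context, the sketch below shows the kind of propensity-weighted CTR estimate that such counterfactual evaluation relies on; LogOpt's contribution, choosing the logging policy that minimizes the variance of this estimate, is not shown. Function and argument names are illustrative.

```python
import numpy as np

def ips_ctr_estimate(clicks, exam_prob_target, exam_prob_logging):
    """Sketch of an IPS-corrected CTR estimate for a target ranker.

    clicks:            binary click indicators from logged interactions.
    exam_prob_target:  examination probability each logged item would receive
                       at its rank under the target ranker.
    exam_prob_logging: examination probability the item had under the logging
                       policy (its propensity).
    """
    clicks = np.asarray(clicks, dtype=float)
    weights = np.asarray(exam_prob_target) / np.asarray(exam_prob_logging)
    return np.mean(clicks * weights)

# An estimated CTR difference between rankers A and B is then the difference of
# two such estimates computed from the same logged clicks.
```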

Keeping Dataset Biases out of the Simulation: A Debiased Simulator for Reinforcement Learning based Recommender Systems

Jin Huang, Harrie Oosterhuis, Maarten de Rijke and Herke van Hoof. Published in Proceedings of the Fourteenth ACM Conference on Recommender Systems (RecSys ’20), 2020. [pdf, code, video]

We introduce a debiasing step in a RecSys simulation pipeline, which corrects for the biases present in the logged data before it is used to simulate user behavior. To evaluate the effects of bias on RL4Rec simulations, we propose a novel evaluation approach for simulators that considers the performance of policies optimized with the simulator. Our results reveal that the biases from logged data negatively impact the resulting policies, unless corrected for with our debiasing method. Read more

When Inverse Propensity Scoring does not Work: Affine Corrections for Unbiased Learning to Rank

Ali Vardasbi, Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM ’20), 2020. [pdf, code, video]

We prove that Inverse Propensity Scoring (IPS) is fundamentally unable to correct for trust bias under non-trivial circumstances. Our main contribution is a new estimator based on affine corrections: it both reweights clicks and penalizes items displayed on ranks with high trust bias. Our estimator is the first estimator that is proven to remove the effect of both trust bias and position bias. Read more
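
The affine idea can be summarized in a small sketch (a paraphrase with hypothetical parameter names, assuming a trust-bias click model where examined non-relevant items are clicked with probability `eps_minus` and examined relevant items with probability `eps_plus`):

```python
def affine_corrected_label(click, exam_prob, eps_plus, eps_minus):
    """Affine correction sketch for trust and position bias at a single rank.

    click:      observed click (0 or 1).
    exam_prob:  examination probability at the displayed rank (position bias).
    eps_plus:   click probability given examination for a relevant item.
    eps_minus:  click probability given examination for a non-relevant item
                (trust bias: users also click non-relevant items on top ranks).
    """
    # Subtract the expected "trust" clicks a non-relevant item would receive,
    # then rescale, so the corrected label equals relevance in expectation.
    return (click - exam_prob * eps_minus) / (exam_prob * (eps_plus - eps_minus))
```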

Learning from User Interactions with Rankings: A Unification of the Field

Harrie Oosterhuis. Published in University of Amsterdam, PhD thesis, 2020. [pdf]

This thesis concerns learning to rank methods based on user clicks and specifically aims to unify the different families of these methods. As a whole, the second part of this thesis proposes a framework that bridges many gaps between areas of online, counterfactual, and supervised learning to rank. It takes approaches that were previously considered independent and unifies them into a single methodology for widely applicable and effective learning to rank from user clicks. Read more

Unifying Online and Counterfactual Learning to Rank

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM ’21), 2021. [pdf, code, video, slides, poster]

This paper won the WSDM ’21 Best Paper Award.

We propose a novel intervention-aware estimator for both counterfactual and online Learning to Rank (LTR). With the introduction of the intervention-aware estimator, we aim to bridge the online/counterfactual LTR division as it is shown to be highly effective in both online and counterfactual scenarios. Read more

Robust Generalization and Safe Query-Specialization in Counterfactual Learning to Rank

Harrie Oosterhuis and Maarten de Rijke. Published in The Web Conference (WWW ’21), 2021. [pdf, code, video, slides]

We introduce the Generalization and Specialization (GENSPEC) algorithm, a robust feature-based counterfactual LTR method that pursues per-query memorization when it is safe to do so. GENSPEC optimizes a single feature-based model for generalization: robust performance across all queries, and many tabular models for specialization: each optimized for high performance on a single query. GENSPEC uses novel relative high-confidence bounds to choose which model to deploy per query. By doing so, GENSPEC enjoys the high performance of successfully specialized tabular models with the robustness of a generalized feature-based model. Read more

Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness

Harrie Oosterhuis. Published in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21), 2021. [pdf, code, video, slides]

This paper won the SIGIR ’21 Best Paper Award.

In this paper, we introduce a novel algorithm, PL-Rank, which estimates the gradient of a PL ranking model w.r.t. both relevance and fairness metrics. Unlike existing approaches that are based on policy gradients, PL-Rank makes use of the specific structure of PL models and ranking metrics. Read more
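
For readers unfamiliar with Plackett-Luce (PL) ranking models, the sketch below shows how rankings can be sampled from one with the Gumbel trick; it is background only, since PL-Rank's actual contribution is the efficient gradient estimator built on top of such samples.

```python
import numpy as np

def sample_pl_rankings(log_scores, n_samples, rng=None):
    """Sample rankings from a Plackett-Luce model via the Gumbel trick.

    log_scores: unnormalized log-probabilities, one per document.
    Returns an array of shape (n_samples, n_docs) with document indices.
    """
    rng = np.random.default_rng() if rng is None else rng
    log_scores = np.asarray(log_scores, dtype=float)
    gumbel_noise = rng.gumbel(size=(n_samples, len(log_scores)))
    # Sorting the perturbed log-scores in descending order is equivalent to
    # sequentially sampling documents without replacement from the PL model.
    return np.argsort(-(log_scores + gumbel_noise), axis=1)
```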

Unifying Online and Counterfactual Learning to Rank (Extended Abstract)

Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI ’21), 2021. [pdf, code, video, slides, poster]

We propose a novel intervention-aware estimator for both counterfactual and online Learning to Rank (LTR). With the introduction of the intervention-aware estimator, we aim to bridge the online/counterfactual LTR division as it is shown to be highly effective in both online and counterfactual scenarios. Read more

It Is Different When Items Are Older: Debiasing Recommendations When Selection Bias and User Preferences Are Dynamic

Jin Huang, Harrie Oosterhuis and Maarten de Rijke. Published in Proceedings of the 15th ACM International Conference on Web Search and Data Mining (WSDM ’22), 2022. [pdf, code]

We theoretically show that in a dynamic scenario in which both the selection bias and user preferences are dynamic, existing debiasing methods are no longer unbiased. To address this limitation, we introduce DebiAsing in the dyNamiC scEnaRio (DANCER), a novel debiasing method that extends the inverse propensity scoring debiasing method to account for dynamic selection bias and user preferences. Read more

FOCUS: Flexible Optimizable Counterfactual Explanations for Tree Ensembles

Ana Lucic, Harrie Oosterhuis, Hinda Haned and Maarten de Rijke. Published in AAAI 2022: Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022. [pdf, code]

We introduce an approximation technique that is effective for finding counterfactual explanations for predictions of the original model and show that our counterfactual examples are significantly closer to the original instances than those produced by other methods specifically designed for tree ensembles. Read more

The Bandwagon Effect: Not Just Another Bias

Norman Knyazev and Harrie Oosterhuis. Published in Proceedings of the 2022 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR ’22), 2022. [pdf, code]

We argue that the bandwagon effect should not be seen as a problem of statistical bias. In fact, we prove that this effect leaves both individual interactions and their sample mean unbiased. Nevertheless, we show that it can make estimators inconsistent, introducing a distinct set of problems for convergence in relevance estimation. Read more

Reaching the End of Unbiasedness: Uncovering Implicit Limitations of Click-Based Learning to Rank

Harrie Oosterhuis. Published in Proceedings of the 2022 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR ’22), 2022. [pdf, slides]

This paper won the ICTIR ’22 Best Paper Award.

This work aims to uncover the implicit limitations of the high-level prevalent approach in the counterfactual LTR field. Thus, in contrast with limitations that follow from explicit assumptions, our aim is to recognize limitations that the field is currently unaware of. Read more

State Encoders in Reinforcement Learning for Recommendation: A Reproducibility Study

Jin Huang, Harrie Oosterhuis, Bunyamin Cetinkaya, Thijs Rood and Maarten de Rijke. Published in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22), 2022. [pdf, code]

We reproduce and expand on the existing comparison of attention-based state encoders (1) in the publicly available debiased RL4Rec SOFA simulator with (2) a different RL method, (3) more state encoders, and (4) a different dataset. Read more

Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness (Extended Abstract)

Harrie Oosterhuis. Published in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI ’22), 2022. [pdf, code, video, slides]

In this paper, we introduce a novel algorithm, PL-Rank, which estimates the gradient of a PL ranking model w.r.t. both relevance and fairness metrics. Unlike existing approaches that are based on policy gradients, PL-Rank makes use of the specific structure of PL models and ranking metrics. Read more

Closing the Gender Wage Gap: Adversarial Fairness in Job Recommendation

Clara Rus, Jeffrey Luppes, Harrie Oosterhuis and Gido H. Schoenmacker. Published in RecSys in HR’22 Workshop at RecSys ’22, 2022. [pdf]

Our results show that representations created from recruitment texts contain algorithmic bias and that this bias results in real-world consequences for recommendation systems. Without controlling for bias, women are recommended jobs with significantly lower salaries in our data. Read more

Doubly-Robust Estimation for Correcting Position-Bias in Click Feedback for Unbiased Learning to Rank

Harrie Oosterhuis. Published in ACM Transactions on Information Systems 41.3 (TOIS ’23), 2023. [pdf, code, video, slides]

In this paper, we introduce a novel DR estimator that is the first DR approach specifically designed for position bias. The difficulty with position bias is that the treatment, user examination, is not directly observable in click data. As a solution, our estimator uses the expected treatment per rank, instead of the actual treatment that existing DR estimators use. Read more
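
For reference, here is a sketch of the standard doubly robust form the paper starts from, with the treatment still treated as observed; the paper's estimator replaces the unobservable `examined` indicator with its expectation per rank. Names are illustrative, not taken from the paper's code.

```python
def standard_dr_relevance(click, examined, exam_prob, regression_pred):
    """Standard doubly robust relevance estimate for one logged display.

    click:           observed click (0 or 1).
    examined:        whether the user examined the item (the "treatment");
                     this is exactly what is not observable in real click logs.
    exam_prob:       examination probability at the displayed rank.
    regression_pred: a click/relevance model's prediction for the item.
    """
    # Regression prediction plus a propensity-weighted correction of its error.
    # The paper's estimator substitutes the expected treatment per rank for
    # `examined`, since actual examination cannot be observed in click data.
    return regression_pred + examined * (click - regression_pred) / exam_prob
```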

Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization

Shashank Gupta, Harrie Oosterhuis, and Maarten de Rijke. Published in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), 2023. [pdf, code]

We introduce a novel risk-aware CLTR method with theoretical guarantees for safe deployment. We apply a novel exposure-based concept of risk regularization to IPS estimation for LTR. Our risk regularization penalizes the mismatch between the ranking behavior of a learned model and a given safe model. Thereby, it ensures that learned ranking models stay close to a trusted model, when there is high uncertainty in IPS estimation, which greatly reduces the risks during deployment. Read more

Recent Advances in the Foundations and Applications of Unbiased Learning to Rank

Shashank Gupta, Philipp Hager, Jin Huang, Ali Vardasbi, and Harrie Oosterhuis. Published in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), 2023. [pdf, website]

This tutorial is intended to benefit both researchers and industry practitioners who are interested in developing new unbiased learning to rank solutions or utilizing them in real-world applications. Read more

A Deep Generative Recommendation Method for Unbiased Learning from Implicit Feedback

Shashank Gupta, Harrie Oosterhuis, and Maarten de Rijke. Published in Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR ’23), 2023. [pdf, code]

We introduce an inverse propensity scoring (IPS) based method for training VAEs from implicit feedback data in an unbiased way. Our IPS-based estimator for the VAE training objective, VAE-IPS, is provably unbiased w.r.t. selection bias. Read more
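
A rough sketch of the general IPS-weighting idea for implicit-feedback reconstruction losses follows; it uses a binary cross-entropy term for brevity rather than the paper's exact VAE objective, and the names are illustrative.

```python
import torch

def ips_weighted_reconstruction_loss(reconstruction_logits, interactions, propensities):
    """Sketch of an IPS-weighted reconstruction loss for implicit feedback.

    reconstruction_logits: decoder outputs, one logit per item.
    interactions:          observed binary implicit feedback (float tensor).
    propensities:          probability each interaction was observed, i.e., the
                           selection-bias propensities.
    """
    per_item_loss = torch.nn.functional.binary_cross_entropy_with_logits(
        reconstruction_logits, interactions, reduction="none")
    # Reweighting by inverse propensities corrects, in expectation, for which
    # interactions were observed under selection bias.
    return (per_item_loss / propensities).mean()
```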

A Lightweight Method for Modeling Confidence in Recommendations with Learned Beta Distributions

Norman Knyazev and Harrie Oosterhuis. Published in Proceedings of the 17th ACM Conference on Recommender Systems (RecSys ’23), 2023. [pdf, code]

In this work, we propose learned beta distributions (LBD) as a simple and practical recommendation method with an explicit measure of confidence. Our main insight is that beta distributions predict user preferences as probability distributions that naturally model confidence on a closed interval, yet can be implemented with minimal model complexity. Read more
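
A minimal sketch of the core idea, assuming PyTorch and hypothetical layer names rather than the released code: the model outputs the two parameters of a beta distribution per user-item pair, so the mean serves as the preference prediction and the spread as an explicit confidence measure.

```python
import torch
import torch.nn as nn

class BetaPreferenceHead(nn.Module):
    """Predicts a Beta(alpha, beta) distribution over a user's preference for an item."""

    def __init__(self, embedding_dim):
        super().__init__()
        self.alpha_layer = nn.Linear(embedding_dim, 1)
        self.beta_layer = nn.Linear(embedding_dim, 1)

    def forward(self, user_item_repr):
        # Softplus keeps both parameters positive, as the Beta distribution requires.
        alpha = nn.functional.softplus(self.alpha_layer(user_item_repr)) + 1e-6
        beta = nn.functional.softplus(self.beta_layer(user_item_repr)) + 1e-6
        mean = alpha / (alpha + beta)  # predicted preference in (0, 1)
        variance = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))  # confidence proxy
        return mean, variance
```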

A First Look at Selection Bias in Preference Elicitation for Recommendation

Shashank Gupta, Harrie Oosterhuis, and Maarten de Rijke. Published in the CONSEQUENCES Workshop at RecSys ’23 (CONSEQUENCES ’23), 2023. [pdf, code]

We take a first look at the effects of selection bias in preference elicitation and how they may be further investigated in the future. We find that a big hurdle is the current lack of any publicly available dataset that has preference elicitation interactions. Read more

Unbiased Learning to Rank: On Recent Advances and Practical Applications

Shashank Gupta, Philipp Hager, Jin Huang, Ali Vardasbi, and Harrie Oosterhuis. Published in Proceedings of the 17th ACM International Conference on Web Search and Data Mining (WSDM ’24), 2024. [pdf, website]

This tutorial is intended to benefit both researchers and industry practitioners who are interested in developing new unbiased learning to rank solutions or utilizing them in real-world applications. Read more

Is Interpretable Machine Learning Effective at Feature Selection for Neural Learning-to-Rank?

Lijun Lyu, Nirmal Roy, Harrie Oosterhuis, and Avishek Anand. Published in Proceedings of the 2024 European Conference on Information Retrieval (ECIR ’24), 2024. [pdf, code]

In this work, we explore feature selection for neural learning-to-rank (LTR). In particular, we investigate six widely-used methods from the field of interpretable machine learning (ML) and introduce our own modification, to select the input features that are most important to the ranking behavior. Read more

Estimating the Hessian Matrix of Ranking Objectives for Stochastic Learning to Rank with Gradient Boosted Trees

Jingwei Kang, Maarten de Rijke, and Harrie Oosterhuis. Published in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24), 2024. [pdf, code]

We introduce the first stochastic LTR method for GBDTs. Our main contribution is a novel estimator for the second-order derivatives, i.e., the Hessian matrix, which is a requirement for effective GBDTs. To efficiently compute both the first and second-order derivatives simultaneously, we incorporate our estimator into the existing PL-Rank framework. Read more

Going Beyond Popularity and Positivity Bias: Correcting for Multifactorial Bias in Recommender Systems

Jin Huang, Harrie Oosterhuis, Masoud Mansoury, Herke van Hoof, and Maarten de Rijke. Published in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24), 2024. [pdf, code]

We consider multifactorial selection bias in RSs. Our focus is on selection bias affected by both item and rating value factors, which is a generalization and combination of popularity and positivity bias. Read more

Local Feature Selection without Label or Feature Leakage for Interpretable Machine Learning Predictions

Harrie Oosterhuis, Lijun Lyu, and Avishek Anand. Published in Proceedings of the 41st International Conference on Machine Learning (ICML ’24), PMLR 235, Vienna, Austria, 2024. [pdf, code]

Local feature selection in machine learning provides instance-specific explanations by focusing on the most relevant features for each prediction, enhancing the interpretability of complex models. However, such methods tend to produce misleading explanations by encoding additional information in their selections. Read more

Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I.

Harrie Oosterhuis, Rolf Jagerman, Zhen Qin, Xuanhui Wang, and Michael Bendersky. Published in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24), 2024. [pdf, code]

Recent advancements in generative artificial intelligence, specifically large language models (LLMs), can generate relevance annotations at an enormous scale with relatively small computational costs. Potentially, this could alleviate the costs traditionally associated with IR evaluation and make it applicable to numerous low-resource applications. However, generated relevance annotations are not immune to (systematic) errors, and as a result, directly using them for evaluation produces unreliable results. Read more

Optimal Baseline Corrections for Off-Policy Contextual Bandits

Shashank Gupta, Olivier Jeunen, Harrie Oosterhuis, and Maarten de Rijke. Published in Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM ’24), 2024. [pdf, code]

The off-policy learning paradigm allows for recommender systems and general ranking applications to be framed as decision-making problems, where we aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric. With unbiasedness comes potentially high variance, and prevalent methods exist to reduce estimation variance. These methods typically make use of control variates, either additive (i.e., baseline corrections or doubly robust methods) or multiplicative (i.e., self-normalisation). Read more
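
For context, here is a sketch of the additive control-variate (baseline-correction) form the paper analyzes, in its generic version rather than the paper's optimal correction; any fixed baseline leaves the importance-weighted estimate unbiased but can reduce its variance. Names are illustrative.

```python
import numpy as np

def baseline_corrected_ips(rewards, target_probs, logging_probs, baseline=0.0):
    """Off-policy value estimate with IPS weights and an additive baseline correction.

    rewards:       observed rewards of the logged actions.
    target_probs:  probability the target policy assigns to those actions.
    logging_probs: probability the logging policy assigned to them.
    baseline:      scalar control variate; since the importance weights have
                   expectation one, adding the baseline back keeps the estimate
                   unbiased while a well-chosen value reduces variance.
    """
    weights = np.asarray(target_probs) / np.asarray(logging_probs)
    rewards = np.asarray(rewards, dtype=float)
    return np.mean(weights * (rewards - baseline)) + baseline
```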

Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank

Shashank Gupta, Harrie Oosterhuis, and Maarten de Rijke. Published in Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM ’24), 2024. [pdf, code]

Our contributions are two-fold. First, we generalize the existing safe CLTR approach to make it applicable to state-of-the-art doubly robust CLTR and trust bias. Second, we propose a novel approach, proximal ranking policy optimization (PRPO), that provides safety in deployment without assumptions about user behavior. PRPO removes incentives for learning ranking behavior that is too dissimilar to a safe ranking model. Read more

talks