Approximating true relevance model in relevance feedback.

Authors

Peng Zhang



Contributors

Dawei Song
Supervisor

John McCall
Supervisor

Abstract

Relevance is an essential concept in information retrieval (IR), and relevance estimation is a fundamental IR task. It involves estimating not only document relevance but also the user's information need. The relevance-based language model aims to estimate a relevance model (i.e., a relevant query term distribution) from relevance feedback documents. The true relevance model should be generated from truly relevant documents. An ideal estimate of the true relevance model is expected to be not only effective in terms of mean retrieval performance (e.g., Mean Average Precision) over all queries, but also stable, in the sense that performance is consistent across individual queries. In practice, however, when approximating the true relevance model, improving retrieval effectiveness often sacrifices retrieval stability, and vice versa. In this thesis, we explore and analyze this effectiveness-stability tradeoff from a new perspective: the bias-variance tradeoff, a fundamental concept in statistical estimation. We first formulate the bias, the variance and the tradeoff between them for retrieval performance as well as for query model estimation. We then analytically and empirically study a number of factors (e.g., query model complexity, query model combination, document weight smoothness and irrelevant document removal) that can affect the bias and variance. Our study shows that the proposed bias-variance tradeoff analysis can serve as an analytical framework for query model estimation. We then investigate in depth two key factors in query model estimation, document weight smoothness and the removal of irrelevant documents, by proposing novel methods for document weight smoothing and irrelevance distribution separation, respectively.
Systematic experimental evaluation on TREC collections shows that the proposed methods can improve both the retrieval effectiveness and the retrieval stability of query model estimation. In addition to these main contributions, we also carry out an initial exploration of two further directions: the formulation of bias-variance in personalization, and a view of query model estimation from a novel theoretical angle (i.e., quantum theory) that has partially inspired our research.
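
For readers unfamiliar with the bias-variance tradeoff mentioned in the abstract, the standard textbook decomposition of squared estimation error is sketched below. This is the generic statistical identity, not necessarily the formulation used in the thesis; the symbols $\theta$ (a scalar quantity of the true relevance model) and $\hat{\theta}$ (its estimate from feedback documents) are illustrative assumptions.

```latex
% Generic bias-variance decomposition (illustrative; notation assumed, not taken from the thesis).
% \theta: true value, \hat{\theta}: estimator, expectation taken over the sampling of feedback data.
\begin{align}
\mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  &= \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}] + \mathbb{E}[\hat{\theta}] - \theta)^2\big] \\
  &= \underbrace{\big(\mathbb{E}[\hat{\theta}] - \theta\big)^2}_{\text{bias}^2}
   + \underbrace{\mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]}_{\text{variance}}
\end{align}
% The cross term vanishes because \mathbb{E}[\hat{\theta} - \mathbb{E}[\hat{\theta}]] = 0.
```

Read loosely in the abstract's terms, a small bias would correspond to good average effectiveness and a small variance to stable performance across queries, which is why reducing one can come at the expense of the other.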

Thesis Type: Thesis
Institution Citation: ZHANG, P. 2013. Approximating true relevance model in relevance feedback. Robert Gordon University, PhD thesis.
Keywords: Relevance feedback; True relevance model; Bias-variance analysis; Document weight smoothing; Distribution separation method; Personalisation; Quantum

Files

ZHANG 2013 Approximating True Relevance (1.4 Mb)
PDF

Copyright Statement
Copyright: the author and Robert Gordon University




