The asymptotic behavior of a limited dependencies language model
Hoenkamp, Eduard; Bruza, Peter; Huang, Qiang; Song, Dawei
Authors
Eduard Hoenkamp
Peter Bruza
Qiang Huang
Dawei Song
Contributors
E. Hoenkamp (Editor)
M. De Cock (Editor)
V. Hoste (Editor)
Abstract
Intuitively, any ‘bag of words’ approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies into more useful statistics. This is done in three steps. First, the term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Second, queries and documents are modeled by their stationary distributions rather than by their initial distributions. Third, documents are ranked by the Kullback-Leibler divergence between the stationary distribution of the query and that of each document. These steps can be implemented as a simple and computationally inexpensive algorithm. The main contribution of this paper is to argue that the asymptotic behavior of the document model is a better representation of the document than any model that represents the dependencies in the document by its initial distribution. A secondary contribution is to investigate the practical application of this representation. To do so, the algorithm was tested on the AP88-89 and WSJ87-92 collections in a pseudo-relevance feedback setting. Results showed consistent improvements over a standard language model baseline. Moreover, even in its simple form, the algorithm already proved to be on a par with more sophisticated algorithms that depend on choosing sets of parameters or extensive training. Hence, adding such schemes may be expected to improve the results of the simple algorithm beyond current practice.
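The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: how the paper builds its chain from co-occurrence statistics is not given here, so the construction below is an assumption. Terms within a hypothetical `window` of each other are linked symmetrically, and a uniform jump weighted by a hypothetical `damping` parameter keeps every transition probability positive, which makes this particular chain ergodic by construction. The stationary distribution is found by power iteration, and documents are ranked by increasing KL(query model || document model). The function names `transition_matrix`, `stationary`, `kl_divergence` and `rank` are illustrative only, and the pseudo-relevance feedback setting used in the experiments is not reproduced.

```python
import math


def transition_matrix(tokens, vocab, window=2, damping=0.85):
    """Row-stochastic transition matrix built from windowed co-occurrence.

    Assumption (not from the paper): terms within `window` positions of each
    other are linked symmetrically, and a uniform jump with weight
    (1 - damping) keeps every entry positive, so the chain is ergodic.
    Every token must appear in `vocab`.
    """
    n = len(vocab)
    index = {term: i for i, term in enumerate(vocab)}
    counts = [[0.0] * n for _ in range(n)]
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            a, b = index[term], index[other]
            counts[a][b] += 1.0
            counts[b][a] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        # A term with no co-occurrences jumps uniformly to any term.
        probs = [c / total for c in row] if total else [1.0 / n] * n
        matrix.append([damping * p + (1.0 - damping) / n for p in probs])
    return matrix


def stationary(matrix, start=None, tol=1e-12, max_iter=10_000):
    """Power iteration pi <- pi P from `start` (a probability vector; uniform if omitted)."""
    n = len(matrix)
    pi = list(start) if start is not None else [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def kl_divergence(p, q):
    """KL(p || q); q must be positive wherever p is."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def rank(query, docs, window=2):
    """Document names ordered by increasing KL(query model || document model)."""
    vocab = sorted(set(query).union(*docs.values()))
    q_model = stationary(transition_matrix(query, vocab, window))
    scores = {
        name: kl_divergence(q_model, stationary(transition_matrix(toks, vocab, window)))
        for name, toks in docs.items()
    }
    return sorted(scores, key=scores.get)


# Toy usage with made-up documents (not data from the paper).
docs = {
    "d1": "markov chain stationary distribution markov chain".split(),
    "d2": "stock market price index stock market".split(),
}
print(rank("markov chain".split(), docs))  # → ['d1', 'd2']
```

Because every entry of the smoothed matrix is positive, `stationary` returns the same distribution whatever `start` vector it is given, which mirrors the abstract's point that the asymptotic behavior is independent of the initial state. The smoothing also guarantees that the document models are positive wherever the query model is, so the KL divergence stays finite.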
Citation
HOENKAMP, E., BRUZA, P., HUANG, Q. and SONG, D. 2008. The asymptotic behavior of a limited dependencies language model. In Hoenkamp, E., De Cock, M. and Hoste, V. (eds.) Proceedings of 8th Dutch-Belgian information retrieval workshop 2008 (DIR 2008), 14-15 April 2008, Maastricht, Netherlands. Enschede: Neslia Paniculata, pages 59-64.
Field | Value |
---|---|
Conference Name | 8th Dutch-Belgian information retrieval workshop 2008 (DIR 2008) |
Conference Location | Maastricht, Netherlands |
Start Date | Apr 14, 2008 |
End Date | Apr 15, 2008 |
Acceptance Date | Apr 15, 2008 |
Online Publication Date | Apr 15, 2008 |
Publication Date | Apr 30, 2008 |
Deposit Date | Jul 6, 2021 |
Publicly Available Date | Jul 6, 2021 |
Pages | 59-64 |
ISBN | 9789056812829 |
Keywords | Information retrieval; Language properties; Target documents; Markov chain |
Public URL | https://rgu-repository.worktribe.com/output/1379965 |
Files
HOENKAMP 2008 The asymptotic behavior (AAM) (PDF, 227 Kb)
Publisher Licence URL
https://creativecommons.org/licenses/by-nc/4.0/