The asymptotic behavior of a limited dependencies language model.

Hoenkamp, Eduard; Bruza, Peter; Huang, Qiang; Song, Dawei

Authors

Eduard Hoenkamp

Peter Bruza

Qiang Huang

Dawei Song



Contributors

E. Hoenkamp
Editor

M. De Cock
Editor

V. Hoste
Exhibitor

Abstract

Intuitively, any ‘bag of words’ approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies into more useful statistics. This is done in three steps. First, the term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Second, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Third, ranking is achieved by comparing the Kullback-Leibler divergence between the stationary distributions of query and documents. These steps can be implemented as a simple and computationally inexpensive algorithm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation of the document than any model that represents the dependencies in the document by its initial distribution. A secondary contribution is to investigate the practical application of this representation. To do so, the algorithm was tested on the AP88-89 and WSJ87-92 collections in a pseudo-relevance feedback setting. Results showed consistent improvements over a standard language model baseline. Moreover, even in its simple form, the algorithm already proved to be on a par with more sophisticated algorithms that depend on choosing sets of parameters or extensive training. Hence, adding such schemes may be expected to improve the results of the simple algorithm beyond current practice.
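
The three steps in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sliding co-occurrence window, the additive smoothing constant `eps` (used here to force every transition probability to be positive, so the chain is ergodic by construction rather than by the paper's proof), the power-iteration solver, the direction of the divergence, and all function names are assumptions made for illustration.

```python
import math


def transition_matrix(tokens, vocab, window=2, eps=1e-3):
    """Row-stochastic matrix built from windowed co-occurrence counts.

    `window` and `eps` are assumed parameters, not taken from the paper.
    Adding eps to every cell makes all entries positive, so the resulting
    Markov chain is irreducible and aperiodic (ergodic).
    """
    index = {term: i for i, term in enumerate(vocab)}
    n = len(vocab)
    counts = [[eps] * n for _ in range(n)]
    for i, term in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and term in index and tokens[j] in index:
                counts[index[term]][index[tokens[j]]] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total for c in row])
    return matrix


def stationary(P, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution pi = pi P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n  # any start works for an ergodic chain
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for strictly positive q."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def rank(query, docs, window=2):
    """Return document indices ordered by increasing D(query || document)."""
    vocab = sorted(set(query).union(*docs))
    q = stationary(transition_matrix(query, vocab, window))
    scored = []
    for k, doc in enumerate(docs):
        d = stationary(transition_matrix(doc, vocab, window))
        scored.append((kl_divergence(q, d), k))
    return [k for _, k in sorted(scored)]


if __name__ == "__main__":
    docs = [
        ["stock", "market", "prices", "fell"],
        ["markov", "chain", "stationary", "markov", "chain"],
    ]
    print(rank(["markov", "chain"], docs))
```

In this toy example the second document shares the query's co-occurring terms, so its stationary distribution lies closer to the query's and it is ranked first. A real system would work over a full collection vocabulary and, as in the paper's experiments, within a pseudo-relevance feedback loop, neither of which is modeled here.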

Citation

HOENKAMP, E., BRUZA, P., HUANG, Q. and SONG, D. 2008. The asymptotic behavior of a limited dependencies language model. In Hoenkamp, E., De Cock, M. and Hoste, V. (eds.) Proceedings of 8th Dutch-Belgian information retrieval workshop 2008 (DIR 2008), 14-15 April 2008, Maastricht, Netherlands. Enschede: Neslia Paniculata, pages 59-64.

Conference Name: 8th Dutch-Belgian information retrieval workshop 2008 (DIR 2008)
Conference Location: Maastricht, Netherlands
Start Date: Apr 14, 2008
End Date: Apr 15, 2008
Acceptance Date: Apr 15, 2008
Online Publication Date: Apr 15, 2008
Publication Date: Apr 30, 2008
Deposit Date: Jul 6, 2021
Publicly Available Date: Jul 6, 2021
Pages: 59-64
ISBN: 9789056812829
Keywords: Information retrieval; Language properties; Target documents; Markov chain
Public URL: https://rgu-repository.worktribe.com/output/1379965
