Concept-based document readability in domain specific information retrieval.

Yan, Xin; Song, Dawei; Li, Xue

doi:10.1145/1183614.1183692

Authors

Xin Yan

Dawei Song

Xue Li

Abstract

Domain specific information retrieval is increasingly in demand. Not only domain experts but also average non-expert users are interested in searching for domain specific (e.g., medical and health) information in online resources. However, a typical problem for average users is that the search results are always a mixture of documents with different levels of readability. Non-expert users may want to see documents with higher readability at the top of the list. Consequently, the search results need to be re-ranked in descending order of readability. It is often not practical for domain experts to manually label the readability of documents in large databases. Computational models of readability need to be investigated. However, traditional readability formulas are designed for general purpose text and are insufficient for the technical materials found in domain specific information retrieval. More advanced algorithms, such as the textual coherence model, are computationally expensive for re-ranking a large number of retrieved documents. In this paper, we propose an effective and computationally tractable concept-based model of text readability. In addition to textual genres of a document, our model also takes into account domain specific knowledge, i.e., how the domain-specific concepts contained in the document affect the document's readability. Three major readability formulas are proposed and applied to health and medical information retrieval. Experimental results show that our proposed readability formulas lead to remarkable improvements in terms of correlation with users' readability ratings over four traditional readability measures.

Citation

YAN, X., SONG, D. and LI, X. 2006. Concept-based document readability in domain specific information retrieval. In Proceedings of the 15th Association for Computing Machinery (ACM) international conference on information and knowledge management (CIKM'06), 5-11 November 2006, Arlington, USA. New York: ACM [online], pages 540-549. Available from: https://doi.org/10.1145/1183614.1183692

Presentation Conference Type	Conference Paper (published)
Conference Name	15th Association for Computing Machinery (ACM) international conference on information and knowledge management (CIKM'06)
Start Date	Nov 5, 2006
End Date	Nov 11, 2006
Acceptance Date	Dec 31, 2006
Online Publication Date	Dec 31, 2006
Publication Date	Dec 31, 2006
Deposit Date	May 12, 2009
Publicly Available Date	May 12, 2009
Publisher	Association for Computing Machinery (ACM)
Peer Reviewed	Peer Reviewed
Pages	540-549
ISBN	1595934332; 9781595934338
DOI	https://doi.org/10.1145/1183614.1183692
Keywords	Document ranking; Document readability; Document scope and cohesion; Readability; Readability formula; Coherence
Public URL	http://hdl.handle.net/10059/336
Contract Date	May 12, 2009

Files

YAN 2006 Concept-based document (267 Kb)
PDF

Publisher Licence URL
https://creativecommons.org/licenses/by-nc-nd/4.0/
