Taxonomic corpus-based concept summary generation for document annotation.

Nkisi-Orji, Ikechukwu; Wiratunga, Nirmalie; Hui, Kit-Ying; Heaven, Rachel; Massie, Stewart

doi:10.1007/978-3-319-67008-9_5

Authors

Dr Ikechukwu Nkisi-Orji i.nkisi-orji@rgu.ac.uk
Chancellor's Fellow

Professor Nirmalie Wiratunga n.wiratunga@rgu.ac.uk
Associate Dean for Research

Dr Kit-ying Hui k.hui@rgu.ac.uk
Lecturer

Rachel Heaven

Dr Stewart Massie s.massie@rgu.ac.uk
Associate Professor

Contributors

Jaap Kamps
Editor

Giannis Tsakonas
Editor

Yannis Manolopoulos
Editor

Lazaros Iliadis
Editor

Ioannis Karydis
Editor

Abstract

Semantic annotation is an enabling technology that links documents to concepts which unambiguously describe their content. Annotation improves access to document contents for both humans and software agents. However, the annotation process is a challenging task, as annotators often have to select from thousands of potentially relevant concepts from controlled vocabularies. The best approaches to assist in this task rely on reusing the annotations of an annotated corpus. In the absence of a pre-annotated corpus, alternative approaches suffer due to insufficient descriptive texts for concepts in most vocabularies. In this paper, we propose an unsupervised method for recommending document annotations based on generating node descriptors from an external corpus. We exploit knowledge of the taxonomic structure of a thesaurus to ensure that effective descriptors (concept summaries) are generated for concepts. Our evaluation on recommending annotations shows that the content that we generate effectively represents the concepts. Also, our approach outperforms those which rely on information from a thesaurus alone and is comparable with supervised approaches.

Citation

NKISI-ORJI, I., WIRATUNGA, N., HUI, K.-Y., HEAVEN, R. and MASSIE, S. 2017. Taxonomic corpus-based concept summary generation for document annotation. In Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L. and Karydis, I. (eds.) Proceedings of the 21st International conference on theory and practice of digital libraries (TPDL 2017): research and advanced technology for digital libraries, 18-21 September 2017, Thessaloniki, Greece. Lecture notes in computer science, 10450. Cham: Springer [online], pages 49-60. Available from: https://doi.org/10.1007/978-3-319-67008-9_5

Presentation Conference Type	Conference Paper (published)
Conference Name	21st International conference on theory and practice of digital libraries (TPDL 2017)
Start Date	Sep 18, 2017
End Date	Sep 21, 2017
Acceptance Date	May 26, 2017
Online Publication Date	Sep 2, 2017
Publication Date	Sep 30, 2017
Deposit Date	Nov 10, 2017
Publicly Available Date	Sep 3, 2018
Print ISSN	0302-9743
Publisher	Springer
Peer Reviewed	Peer Reviewed
Pages	49-60
Series Title	Lecture notes in computer science
Series Number	10450
Series ISSN	0302-9743
ISBN	9783319670072
DOI	https://doi.org/10.1007/978-3-319-67008-9_5
Keywords	Taxonomy; Text annotation; Information discovery
Public URL	http://hdl.handle.net/10059/2586
Contract Date	Nov 10, 2017