Taxonomic corpus-based concept summary generation for document annotation.
Nkisi-Orji, Ikechukwu; Wiratunga, Nirmalie; Hui, Kit-Ying; Heaven, Rachel; Massie, Stewart
Semantic annotation is an enabling technology that links documents to concepts which unambiguously describe their content. Annotation improves access to document contents for both humans and software agents. However, annotation is a challenging task because annotators often have to select from thousands of potentially relevant concepts in controlled vocabularies. The best approaches to assisting with this task reuse the annotations of an already annotated corpus. In the absence of a pre-annotated corpus, alternative approaches suffer because most vocabularies provide too little descriptive text for their concepts. In this paper, we propose an unsupervised method for recommending document annotations that generates node descriptors from an external corpus. We exploit knowledge of the taxonomic structure of a thesaurus to ensure that effective descriptors (concept summaries) are generated for concepts. Our evaluation on recommending annotations shows that the generated content represents the concepts effectively. Our approach also outperforms approaches that rely on information from a thesaurus alone, and it is comparable with supervised approaches.
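The general idea of corpus-based concept summaries can be illustrated with a minimal sketch. This is not the method evaluated in the paper: the toy thesaurus, the toy corpus, the stop-word list, the `neighbour_weight` parameter and all function names are assumptions made for illustration. Each concept collects the corpus sentences that mention its label, adds sentences that mention its parent or children at a reduced weight, and a document is then matched to concepts by cosine similarity.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for the example.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "is", "by", "when", "from", "to"}

def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def mentions(label, sentence):
    """True if the (possibly multi-word) label occurs as whole words in the sentence."""
    words = " ".join(re.findall(r"[a-z]+", sentence.lower()))
    return f" {label} " in f" {words} "

def concept_summary(concept, parents, corpus, neighbour_weight=0.5):
    """Weighted bag of words for one concept, built from corpus sentences.

    Sentences naming the concept count fully; sentences naming only its
    parent or a child count with neighbour_weight (an assumed value).
    """
    neighbours = {c for c, p in parents.items() if p == concept}
    if parents.get(concept):
        neighbours.add(parents[concept])
    summary = Counter()
    for sentence in corpus:
        if mentions(concept, sentence):
            weight = 1.0
        elif any(mentions(n, sentence) for n in neighbours):
            weight = neighbour_weight
        else:
            continue
        for token in tokenize(sentence):
            summary[token] += weight
    return summary

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(document, parents, corpus, top_k=3):
    """Rank thesaurus concepts by similarity between the document and each summary."""
    doc_vector = Counter(tokenize(document))
    scores = [(c, cosine(doc_vector, concept_summary(c, parents, corpus))) for c in parents]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy thesaurus (concept -> broader concept) and toy external corpus.
parents = {
    "rock": None,
    "igneous rock": "rock",
    "sedimentary rock": "rock",
    "granite": "igneous rock",
    "sandstone": "sedimentary rock",
}
corpus = [
    "Granite is a coarse grained intrusive rock rich in quartz and feldspar.",
    "Igneous rock forms when magma cools and crystallises.",
    "Sandstone is composed of sand sized grains cemented together.",
    "Sedimentary rock is deposited in layers by water or wind.",
]
document = "The sample cooled slowly from magma and contains quartz and feldspar crystals."

for concept, score in recommend(document, parents, corpus):
    print(f"{concept}: {score:.3f}")
```

In this sketch the taxonomy only decides which extra sentences a concept may borrow. A real system would need proper label matching, term weighting and a much larger corpus.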
| Conference Start Date | Sep 18, 2017 |
| Publication Date | Sep 30, 2017 |
| Publisher | Springer (part of Springer Nature) |
| Series Title | Lecture notes in computer science |
| Institution Citation | NKISI-ORJI, I., WIRATUNGA, N., HUI, K.-Y., HEAVEN, R. and MASSIE, S. 2017. Taxonomic corpus-based concept summary generation for document annotation. In Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L. and Karydis, I. (eds.) Proceedings of the 21st International conference on theory and practice of digital libraries (TPDL 2017): research and advanced technology for digital libraries, 18-21 September 2017, Thessaloniki, Greece. Lecture notes in computer science, 10450. Cham: Springer [online], pages 49-60. Available from: https://doi.org/10.1007/978-3-319-67008-9_5 |
| Keywords | Taxonomy; Text annotation; Information discovery |