Lexicon induction for interpretable text classification.

Clos, Jérémie; Wiratunga, Nirmalie

doi:10.1007/978-3-319-67008-9_39

Authors

Jérémie Clos

Professor Nirmalie Wiratunga n.wiratunga@rgu.ac.uk
Associate Dean for Research

Contributors

Jaap Kamps
Editor

Giannis Tsakonas
Editor

Yannis Manolopoulos
Editor

Lazaros Iliadis
Editor

Ioannis Karydis
Editor

Abstract

The automated classification of text documents is an active research challenge in document-oriented information systems: it helps users browse massive amounts of data, detect likely authors of unsigned work, or analyze large corpora along predefined dimensions of interest such as sentiment or emotion. Existing approaches to text classification tend toward black-box algorithms, which offer accurate classification at the cost of obscuring the rationale behind each prediction. Lexicon-based classifiers offer an alternative to black-box classifiers by modeling the classification problem with a trivially interpretable classifier. However, current techniques for lexicon-based document classification limit themselves to either handcrafted lexicons, which suffer from human bias and are difficult to extend, or automatically generated lexicons, which are induced using point estimates of some predefined probabilistic measure in the corpus of interest. This paper proposes LexicNet, an alternative way of generating high-accuracy classification lexicons that offers optimal generalization power without sacrificing model interpretability. We evaluate our approach on two tasks: stance detection and sentiment classification. We find that our lexicon outperforms baseline lexicon induction approaches as well as several standard text classifiers.
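The abstract contrasts LexicNet with lexicons "induced using point estimates of some predefined probabilistic measure in the corpus." The sketch below illustrates only that baseline family, not LexicNet, whose induction procedure is not described on this page. It assumes binary labels (1 = positive, 0 = negative), pre-tokenized documents, and a Laplace-smoothed log-odds ratio as the probabilistic measure; the function names `induce_lexicon`, `classify`, and `explain` and the toy data are hypothetical. It also shows why such classifiers are interpretable: the prediction is a sum of per-word lexicon weights, so the matched entries are the explanation.

```python
import math
from collections import Counter

def induce_lexicon(docs, labels, alpha=1.0):
    """Baseline point-estimate lexicon (assumption: smoothed log-odds ratio).

    Each word's weight is log(P(w | positive) / P(w | negative)), with both
    probabilities estimated from corpus counts plus add-alpha smoothing.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (pos_counts if label == 1 else neg_counts).update(tokens)
    vocab = set(pos_counts) | set(neg_counts)
    # Smoothed denominators: total tokens per class plus alpha per vocabulary word.
    pos_total = sum(pos_counts.values()) + alpha * len(vocab)
    neg_total = sum(neg_counts.values()) + alpha * len(vocab)
    lexicon = {}
    for w in vocab:
        p_pos = (pos_counts[w] + alpha) / pos_total
        p_neg = (neg_counts[w] + alpha) / neg_total
        lexicon[w] = math.log(p_pos / p_neg)
    return lexicon

def classify(tokens, lexicon, threshold=0.0):
    """Sum the lexicon weights of known words; unknown words contribute 0."""
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    return (1 if score > threshold else 0), score

def explain(tokens, lexicon):
    """Return matched lexicon entries, strongest contribution first."""
    matched = [(t, lexicon[t]) for t in tokens if t in lexicon]
    return sorted(matched, key=lambda pair: -abs(pair[1]))

if __name__ == "__main__":
    # Hypothetical toy corpus for illustration only.
    train_docs = [["great", "fun", "film"], ["great", "acting"],
                  ["boring", "film"], ["awful", "boring", "plot"]]
    train_labels = [1, 1, 0, 0]
    lex = induce_lexicon(train_docs, train_labels)
    label, score = classify(["great", "plot"], lex)
    print(label)  # → 1
    print(explain(["boring", "film", "fun"], lex))
```

Because every weight is a single corpus statistic, this kind of lexicon is easy to read but can overfit rare words; the abstract positions LexicNet as an alternative to such point-estimate induction.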

Citation

CLOS, J. and WIRATUNGA, N. 2017. Lexicon induction for interpretable text classification. In Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L. and Karydis, I. (eds.) Proceedings of the 21st International conference on theory and practice of digital libraries (TPDL 2017): research and advanced technology for digital libraries, 18-21 September 2017, Thessaloniki, Greece. Lecture notes in computer science, 10450. Cham: Springer [online], pages 498-510. Available from: https://doi.org/10.1007/978-3-319-67008-9_39

Presentation Conference Type	Conference Paper (published)
Conference Name	21st International conference on theory and practice of digital libraries (TPDL 2017)
Start Date	Sep 18, 2017
End Date	Sep 21, 2017
Acceptance Date	May 26, 2017
Online Publication Date	Sep 2, 2017
Publication Date	Sep 30, 2017
Deposit Date	Oct 16, 2018
Publicly Available Date	Oct 16, 2018
Print ISSN	0302-9743
Electronic ISSN	1611-3349
Publisher	Springer
Peer Reviewed	Peer Reviewed
Pages	498-510
Series Title	Lecture notes in computer science
Series Number	10450
Series ISSN	1611-3349
ISBN	9783319670072
DOI	https://doi.org/10.1007/978-3-319-67008-9_39
Keywords	Text classification; Lexicon induction; Sentiment analysis; Stance classification
Public URL	http://hdl.handle.net/10059/3176
Contract Date	Oct 16, 2018