Jérémie Clos
Lexicon induction for interpretable text classification.
Clos, Jérémie; Wiratunga, Nirmalie
Authors
Professor Nirmalie Wiratunga n.wiratunga@rgu.ac.uk
Associate Dean for Research
Contributors
Jaap Kamps
Editor
Giannis Tsakonas
Editor
Yannis Manolopoulos
Editor
Lazaros Iliadis
Editor
Ioannis Karydis
Editor
Abstract
The automated classification of text documents is an active research challenge in document-oriented information systems. It helps users browse massive amounts of data, detect the likely authors of unsigned work, and analyze large corpora along predefined dimensions of interest such as sentiment or emotion. Existing approaches to text classification tend toward black-box algorithms, which offer accurate classification at the price of not understanding the rationale behind each prediction. Lexicon-based classifiers offer an alternative to black-box classifiers by modeling the classification problem with a trivially interpretable classifier. However, current techniques for lexicon-based document classification limit themselves to either handcrafted lexicons, which suffer from human bias and are difficult to extend, or automatically generated lexicons, which are induced using point estimates of some predefined probabilistic measure in the corpus of interest. This paper proposes LexicNet, an alternative way of generating high-accuracy classification lexicons that offer optimal generalization power without sacrificing model interpretability. We evaluate our approach on two tasks: stance detection and sentiment classification. We find that our lexicon outperforms baseline lexicon induction approaches as well as several standard text classifiers.
Citation
CLOS, J. and WIRATUNGA, N. 2017. Lexicon induction for interpretable text classification. In Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L. and Karydis, I. (eds.) Proceedings of the 21st International conference on theory and practice of digital libraries (TPDL 2017): research and advanced technology for digital libraries, 18-21 September 2017, Thessaloniki, Greece. Lecture notes in computer science, 10450. Cham: Springer [online], pages 498-510. Available from: https://doi.org/10.1007/978-3-319-67008-9_39
Conference Name | 21st International conference on theory and practice of digital libraries (TPDL 2017) |
---|---|
Conference Location | Thessaloniki, Greece |
Start Date | Sep 18, 2017 |
End Date | Sep 21, 2017 |
Acceptance Date | May 26, 2017 |
Online Publication Date | Sep 2, 2017 |
Publication Date | Sep 30, 2017 |
Deposit Date | Oct 16, 2018 |
Publicly Available Date | Oct 16, 2018 |
Print ISSN | 0302-9743 |
Electronic ISSN | 1611-3349 |
Publisher | Springer |
Pages | 498-510 |
Series Title | Lecture notes in computer science |
Series Number | 10450 |
Series ISSN | 1611-3349 |
ISBN | 9783319670072 |
DOI | https://doi.org/10.1007/978-3-319-67008-9_39 |
Keywords | Text classification; Lexicon induction; Sentiment analysis; Stance classification |
Public URL | http://hdl.handle.net/10059/3176 |
Files
CLOS 2017 Lexicon induction for interpretable (PDF, 1.1 MB)
Publisher Licence URL
https://creativecommons.org/licenses/by-nc/4.0/
You might also like
Towards feasible counterfactual explanations: a taxonomy guided template-based NLG method.
(2023)
Conference Proceeding
Proceedings of the 6th International workshop on knowledge discovery from healthcare data (KDH@IJCAI 2023)
(2023)
Conference Proceeding
CBR driven interactive explainable AI.
(2023)
Conference Proceeding