Pattaramon Vuttipittayamongkol
On the class overlap problem in imbalanced data classification.
Vuttipittayamongkol, Pattaramon; Elyan, Eyad; Petrovski, Andrei
Abstract
Class imbalance is an active research area in the machine learning community. However, recent literature shows that class overlap has a greater negative impact on the performance of learning algorithms. This paper provides a detailed critical discussion and objective evaluation of class overlap in the context of imbalanced data and its impact on classification accuracy. First, we present a thorough experimental comparison of class overlap and class imbalance. Unlike previous work, our experiment was carried out on the full scale of class overlap and an extreme range of class imbalance degrees. Second, we provide an in-depth critical technical review of existing approaches to handling imbalanced datasets. Existing solutions from selected literature are critically reviewed and categorised as class distribution-based and class overlap-based methods. Emerging techniques and the latest developments in this area are also discussed in detail. The experimental results in this paper are consistent with the existing literature and show clearly that the performance of the learning algorithm deteriorates across varying degrees of class overlap, whereas class imbalance does not always have an effect. The review emphasises the need for further research on handling class overlap in imbalanced datasets to effectively improve learning algorithms’ performance.
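The kind of comparison the abstract describes can be illustrated with a toy sketch. This is not the authors' experimental setup: the one-dimensional Gaussian classes, the accuracy-maximising threshold classifier, and every name and parameter below (`make_data`, `fit_threshold`, `minority_sensitivity`, `separation`, `imbalance_ratio`) are assumptions chosen only to show how overlap and imbalance can be varied independently.

```python
import random


def make_data(n_majority, n_minority, separation, seed):
    """Two 1-D Gaussian classes; a smaller `separation` means more class overlap."""
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_majority)]
    data += [(rng.gauss(separation, 1.0), 1) for _ in range(n_minority)]
    return data


def fit_threshold(train):
    """Grid-search the threshold t that maximises training accuracy
    for the rule 'predict the minority class (1) if x > t'."""
    grid = [i * 0.05 - 4.0 for i in range(281)]  # candidate thresholds from -4.0 to 10.0

    def n_correct(t):
        return sum((x > t) == (y == 1) for x, y in train)

    return max(grid, key=n_correct)


def minority_sensitivity(separation, imbalance_ratio, n_majority=2000):
    """Train on imbalanced data, then measure recall on fresh minority samples."""
    n_minority = max(1, n_majority // imbalance_ratio)
    t = fit_threshold(make_data(n_majority, n_minority, separation, seed=1))
    test_minority = make_data(0, 1000, separation, seed=2)
    return sum(x > t for x, _ in test_minority) / len(test_minority)


# Hypothetical settings: well-separated vs. heavily overlapped classes,
# each at a balanced (1:1) and an imbalanced (50:1) ratio.
results = {
    (sep, ratio): minority_sensitivity(sep, ratio)
    for sep in (6.0, 0.5)
    for ratio in (1, 50)
}

for (sep, ratio), sens in sorted(results.items()):
    print(f"separation={sep:<4} ratio={ratio:>2}:1  minority sensitivity={sens:.3f}")
```

In a toy setting like this, one would expect minority-class sensitivity to stay high at both ratios when the classes are well separated, and to drop sharply under imbalance only when the classes overlap, which mirrors the abstract's qualitative claim without reproducing its results.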
Citation
VUTTIPITTAYAMONGKOL, P., ELYAN, E. and PETROVSKI, A. 2021. On the class overlap problem in imbalanced data classification. Knowledge-based systems [online], 212, article number 106631. Available from: https://doi.org/10.1016/j.knosys.2020.106631
Journal Article Type | Article |
---|---|
Acceptance Date | Nov 25, 2020 |
Online Publication Date | Nov 27, 2020 |
Publication Date | Jan 5, 2021 |
Deposit Date | Dec 2, 2020 |
Publicly Available Date | Nov 28, 2021 |
Journal | Knowledge-Based Systems |
Print ISSN | 0950-7051 |
Electronic ISSN | 1872-7409 |
Publisher | Elsevier |
Peer Reviewed | Peer Reviewed |
Volume | 212 |
Article Number | 106631 |
DOI | https://doi.org/10.1016/j.knosys.2020.106631 |
Keywords | Imbalanced data; Class overlap; Classification; Evaluation metric; Benchmark |
Public URL | https://rgu-repository.worktribe.com/output/1000460 |
Files
VUTTIPITTAYAMONGKOL 2021 On the class overlap problem
(616 Kb)
PDF
Publisher Licence URL
https://creativecommons.org/licenses/by-nc-nd/4.0/