Pattaramon Vuttipittayamongkol
Neighbourhood-based undersampling approach for handling imbalanced and overlapped data.
Vuttipittayamongkol, Pattaramon; Elyan, Eyad
Abstract
Class-imbalanced datasets are common across many domains, including health, security and banking. When trained on imbalanced datasets, a typical supervised learning algorithm tends to be biased towards the majority class. The learning task becomes more challenging when instances from different classes also overlap. In this paper, we propose an undersampling framework for handling class imbalance in binary datasets by removing potentially overlapped data points. Our methods are designed to identify and eliminate majority class instances from the overlapping region. Accurately identifying and eliminating these instances maximises the visibility of the minority class instances while minimising excessive elimination of data, which reduces information loss. Four methods are proposed, each based on neighbourhood searching but using different criteria to identify potentially overlapped instances. Extensive experiments were carried out on simulated and real-world datasets. Results show performance comparable to state-of-the-art methods across common metrics, with exceptional and statistically significant improvements in sensitivity.
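The abstract describes removing majority class instances from the overlapping region through a neighbourhood search. The Python sketch below illustrates one simple criterion of that general kind: a majority instance is discarded if it appears among the k nearest neighbours of any minority instance. This is an illustrative assumption, not a reproduction of the paper's four methods, whose exact criteria are not given here. The function names `knn_indices` and `neighbourhood_undersample`, the Euclidean distance, the brute-force search and the default `k` are all hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_indices(query: Point, data: List[Point], k: int) -> List[int]:
    """Hypothetical helper: indices of the k points in `data` closest to `query` (Euclidean, brute force)."""
    order = sorted(range(len(data)), key=lambda i: math.dist(query, data[i]))
    return order[:k]


def neighbourhood_undersample(
    X: List[Point], y: List[int], k: int = 3, minority: int = 1
) -> Tuple[List[Point], List[int]]:
    """Hypothetical sketch: drop majority instances found in the k-neighbourhood
    of any minority instance. Minority instances are never removed.
    Illustrative criterion only, not the paper's exact rule."""
    to_remove = set()
    for i, label in enumerate(y):
        if label != minority:
            continue
        # Search among all other points, excluding the minority query itself.
        others = [j for j in range(len(X)) if j != i]
        nearest = knn_indices(X[i], [X[j] for j in others], k)
        for pos in nearest:
            j = others[pos]
            if y[j] != minority:
                to_remove.add(j)  # majority point lies in the assumed overlap region
    keep = [j for j in range(len(X)) if j not in to_remove]
    return [X[j] for j in keep], [y[j] for j in keep]


if __name__ == "__main__":
    X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.2, 0.1), (6.0, 6.0)]
    y = [0, 0, 0, 0, 1, 0]  # class 1 is the single minority instance
    X_res, y_res = neighbourhood_undersample(X, y, k=2)
    print(y_res)  # [0, 0, 1, 0]: the two majority points next to the minority point are removed
```

Larger values of `k` remove more majority instances, which improves minority visibility at the cost of more information loss. This trade-off is the one the abstract says the proposed methods aim to balance.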
Citation
VUTTIPITTAYAMONGKOL, P. and ELYAN, E. 2020. Neighbourhood-based undersampling approach for handling imbalanced and overlapped data. Information sciences [online], 509, pages 47-70. Available from: https://doi.org/10.1016/j.ins.2019.08.062
Journal Article Type | Article |
---|---|
Acceptance Date | Aug 26, 2019 |
Online Publication Date | Sep 3, 2019 |
Publication Date | Jan 31, 2020 |
Deposit Date | Sep 9, 2019 |
Publicly Available Date | Sep 4, 2020 |
Journal | Information Sciences |
Print ISSN | 0020-0255 |
Electronic ISSN | 1872-6291 |
Publisher | Elsevier |
Peer Reviewed | Peer Reviewed |
Volume | 509 |
Pages | 47-70 |
DOI | https://doi.org/10.1016/j.ins.2019.08.062 |
Keywords | Imbalanced dataset; Undersampling; k-NN; Class overlap; Classification |
Public URL | https://rgu-repository.worktribe.com/output/512732 |
Contract Date | Sep 9, 2019 |
Publisher Licence URL | https://creativecommons.org/licenses/by-nc-nd/4.0/ |
You might also like
On the class overlap problem in imbalanced data classification. (2020). Journal Article
A data-driven decision support tool for offshore oil and gas decommissioning. (2021). Journal Article