Overlap-based undersampling for improving imbalanced data classification.

Vuttipittayamongkol, Pattaramon; Elyan, Eyad; Petrovski, Andrei; Jayne, Chrisina

Authors

Pattaramon Vuttipittayamongkol

Eyad Elyan

Andrei Petrovski

Chrisina Jayne

Contributors

Hujun Yin
Editor

David Camacho
Editor

Paulo Novais
Editor

Antonio J. Tallón-Ballesteros
Editor

Abstract

Classification of imbalanced data remains an important field in machine learning. Several methods have been proposed to address the class imbalance problem including data resampling, adaptive learning and cost adjusting algorithms. Data resampling methods are widely used due to their simplicity and flexibility. Most existing resampling techniques aim at rebalancing class distribution. However, class imbalance is not the only factor that impacts the performance of the learning algorithm. Class overlap has proved to have a higher impact on the classification of imbalanced datasets than the dominance of the negative class. In this paper, we propose a new undersampling method that eliminates negative instances from the overlapping region and hence improves the visibility of the minority instances. Testing and evaluating the proposed method using 36 public imbalanced datasets showed statistically significant improvements in classification performance.
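The abstract does not give the details of the proposed method, so the following is only a minimal sketch of the general idea of overlap-based undersampling, not the authors' algorithm. The keywords suggest the paper relies on Fuzzy C-means; this sketch swaps in a plain k-nearest-neighbour test instead. It assumes a negative (majority) instance lies in the overlapping region when at least one of its k nearest neighbours is positive. The function name `overlap_undersample`, the parameter `k`, and the toy data are all hypothetical.

```python
import math


def overlap_undersample(X, y, k=3, positive_label=1):
    """Drop negative instances that have a positive instance among
    their k nearest neighbours (an assumed, simplified overlap test).

    X is a list of numeric tuples, y a list of class labels.
    All positive (minority) instances are always kept.
    """
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == positive_label:
            kept_X.append(xi)
            kept_y.append(yi)
            continue
        # Indices of the k closest other points by Euclidean distance.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        # Assumption: any positive neighbour marks this point as overlapping.
        in_overlap = any(y[j] == positive_label for j in neighbours)
        if not in_overlap:
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y


# Toy data: four negatives far from the minority class,
# two negatives sitting right next to the two positives.
X = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),  # negatives, isolated
    (5.0, 5.0), (5.2, 5.1),                          # negatives, overlapping
    (5.0, 5.2), (5.1, 4.9),                          # positives
]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_res, y_res = overlap_undersample(X, y, k=2)
print(len(y_res), y_res.count(0), y_res.count(1))
```

On this toy set the two negatives adjacent to the positives are dropped while the isolated negatives and every positive are retained, which is the sense in which removing overlapping negatives "improves the visibility of the minority instances".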

Citation

VUTTIPITTAYAMONGKOL, P., ELYAN, E., PETROVSKI, A. and JAYNE, C. 2018. Overlap-based undersampling for improving imbalanced data classification. In Yin, H., Camacho, D., Novais, P. and Tallón-Ballesteros, A. (eds.) Intelligent data engineering and automated learning: proceedings of the 19th International intelligent data engineering and automated learning conference (IDEAL 2018), 21-23 November 2018, Madrid, Spain. Lecture notes in computer science, 11314. Cham: Springer [online], pages 689-697. Available from: https://doi.org/10.1007/978-3-030-03493-1_72

Conference Name: 19th International intelligent data engineering and automated learning conference (IDEAL 2018)
Conference Location: Madrid, Spain
Start Date: Nov 21, 2018
End Date: Nov 23, 2018
Acceptance Date: Aug 8, 2018
Online Publication Date: Nov 9, 2018
Publication Date: Dec 21, 2018
Deposit Date: Feb 8, 2019
Publicly Available Date: Feb 8, 2019
Publisher: Springer
Pages: 689-697
Series Title: Lecture notes in computer science
Series Number: 11314
Series ISSN: 0302-9743
ISBN: 9783030034924
DOI: https://doi.org/10.1007/978-3-030-03493-1_72
Keywords: Undersampling; Overlap; Imbalanced data; Classification; Fuzzy C-means; Resampling
Public URL: http://hdl.handle.net/10059/3281
