Learning from class-imbalanced data: overlap-driven resampling for imbalanced data classification.

Vuttipittayamongkol, Pattaramon

Abstract

Classification of imbalanced datasets has attracted substantial research interest in recent years. Imbalanced datasets are common in domains such as health, finance and security, yet learning algorithms are generally not designed to handle them. Many existing solutions focus mainly on the class distribution problem. However, a number of reports have shown that class overlap has a greater negative impact on the learning process than class imbalance. This thesis thoroughly explores the impact of class overlap on the learning algorithm and demonstrates how eliminating class overlap can effectively improve the classification of imbalanced datasets. Novel undersampling approaches were developed with the main objective of enhancing the presence of minority class instances in the overlapping region. This is achieved by identifying and removing majority class instances that potentially reside in that region. Seven methods under two different approaches were designed for the task. Extensive experiments were carried out to evaluate the methods on simulated and well-known real-world datasets. The results showed a substantial improvement in the classification accuracy of the minority class, with favourable trade-offs against majority class accuracy. Moreover, a successful application of the methods to predictive diagnostics of diseases with imbalanced records is presented. These novel overlap-based approaches have several advantages over other common resampling methods. First, the undersampling amount is independent of class imbalance and proportional to the degree of overlap. This could effectively address the problem of class overlap while reducing the effect of class imbalance. Second, information loss is minimised because instance elimination is confined to the problematic region. Third, adaptive parameters enable the methods to generalise across different problems. In addition, the methods provide different trade-offs, which gives real-world users more alternatives when selecting the best-fit solution to their problem.
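
The general idea of overlap-driven undersampling described in the abstract can be illustrated with a minimal sketch. This is not one of the seven methods developed in the thesis; it is a simplified neighbourhood rule written only for illustration. The function name `undersample_overlap`, the parameter `k` and the toy data are all assumptions: a majority instance is treated as lying in the overlapping region if any of its `k` nearest neighbours belongs to the minority class, and such instances are removed.

```python
from math import dist  # Euclidean distance, Python 3.8+


def undersample_overlap(X, y, minority_label, k=3):
    """Illustrative sketch only (hypothetical helper, not the thesis's method).

    Removes every majority instance that has at least one minority
    instance among its k nearest neighbours. Minority instances are
    always kept.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == minority_label:
            keep.append(i)
            continue
        # Indices of the k nearest neighbours of instance i (excluding itself).
        others = (j for j in range(len(X)) if j != i)
        neighbours = sorted(others, key=lambda j: dist(xi, X[j]))[:k]
        # Keep the majority instance only if its neighbourhood is free of
        # minority instances, i.e. it is assumed to lie outside the overlap.
        if not any(y[j] == minority_label for j in neighbours):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (assumed for illustration): label 0 is the majority class,
# label 1 the minority class. Two majority points sit next to the
# minority points; the remaining five form a separate cluster.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
     (5.1, 5.0), (4.9, 5.2),   # majority points inside the overlap
     (5.0, 5.0), (5.2, 5.1)]   # minority points
y = [0, 0, 0, 0, 0, 0, 0, 1, 1]

X_res, y_res = undersample_overlap(X, y, minority_label=1, k=3)
print(len(X_res), y_res.count(0), y_res.count(1))
```

In this sketch the number of removed instances depends on how many majority points sit near minority points, not on the class ratio, and removal is confined to the neighbourhood of the minority class. These two properties mirror the first two advantages listed in the abstract, although the thesis's own methods and their adaptive parameters may differ substantially from this rule.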

Citation

VUTTIPITTAYAMONGKOL, P. 2020. Learning from class-imbalanced data: overlap-driven resampling for imbalanced data classification. Robert Gordon University, PhD thesis. Hosted on OpenAIR [online]. Available from: https://openair.rgu.ac.uk

Thesis Type: Thesis
Deposit Date: Mar 1, 2021
Publicly Available Date: Mar 28, 2024
Keywords: Class imbalance; Class overlap; Undersampling; Classification; Machine learning; Medical informatics
Public URL: https://rgu-repository.worktribe.com/output/1239009
Award Date: Oct 31, 2020
