Pattaramon Vuttipittayamongkol
Learning from class-imbalanced data: overlap-driven resampling for imbalanced data classification.
Abstract
Classification of imbalanced datasets has attracted substantial research interest in recent years. Imbalanced datasets are common in domains such as health, finance and security, yet learning algorithms are generally not designed to handle them. Many existing solutions focus mainly on the class distribution problem. However, a number of studies have shown that class overlap has a greater negative impact on the learning process than class imbalance. This thesis thoroughly explores the impact of class overlap on the learning algorithm and demonstrates how eliminating class overlap can effectively improve the classification of imbalanced datasets. Novel undersampling approaches were developed with the main objective of enhancing the presence of minority class instances in the overlapping region. This is achieved by identifying and removing majority class instances that potentially reside in that region. Seven methods under two different approaches were designed for the task. Extensive experiments were carried out to evaluate the methods on simulated and well-known real-world datasets. Results showed a substantial improvement in the classification accuracy of the minority class, with favourable trade-offs against the majority class accuracy. Moreover, successful application of the methods in predictive diagnostics of diseases with imbalanced records is presented. These novel overlap-based approaches have several advantages over other common resampling methods. First, the undersampling amount is independent of class imbalance and proportional to the degree of overlap. This could effectively address the problem of class overlap while reducing the effect of class imbalance. Second, information loss is minimised because instance elimination is contained within the problematic region. Third, adaptive parameters enable the methods to be generalised across different problems. It is also worth pointing out that these methods provide different trade-offs, which gives real-world users more alternatives when selecting the best-fit solution to their problem.
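To make the general idea of overlap-driven undersampling concrete, the following is a minimal sketch only and is not one of the seven methods developed in the thesis. It assumes a simple neighbourhood rule: a majority class instance is treated as lying in the overlapping region if at least one of its k nearest neighbours belongs to another class. The function name `remove_overlapping_majority`, the parameter `k` and the toy data are hypothetical choices made for illustration.

```python
import numpy as np


def remove_overlapping_majority(X, y, majority_label, k=5):
    """Illustrative overlap-based undersampling (assumed rule, not the thesis's).

    Drops every majority instance that has at least one non-majority
    instance among its k nearest neighbours (Euclidean distance).
    Returns the reduced data, the reduced labels and the boolean keep mask.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise squared Euclidean distances; adequate for small datasets.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # an instance is not its own neighbour

    k = min(k, len(X) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    is_majority = y == majority_label
    # True where any of the k nearest neighbours is not a majority instance.
    has_other_class_neighbour = (~is_majority[neighbours]).any(axis=1)

    # Only majority instances inside the assumed overlap region are removed.
    keep = ~(is_majority & has_other_class_neighbour)
    return X[keep], y[keep], keep


# Toy data: four majority points far from the minority class,
# one majority point sitting next to the two minority points.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 0, 0, 1, 1])

X_res, y_res, keep = remove_overlapping_majority(X, y, majority_label=0, k=1)
print(keep)
```

In this toy example only the majority point at (5, 5) is removed, because it is the only majority instance with a minority neighbour. The number of removed instances therefore depends on how much the classes overlap rather than on the class ratio, and no instance outside the overlapping region is discarded.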
Citation
VUTTIPITTAYAMONGKOL, P. 2020. Learning from class-imbalanced data: overlap-driven resampling for imbalanced data classification. Robert Gordon University, PhD thesis. Hosted on OpenAIR [online]. Available from: https://openair.rgu.ac.uk
Thesis Type | Thesis |
---|---|
Deposit Date | Mar 1, 2021 |
Publicly Available Date | Mar 1, 2021 |
Keywords | Class imbalance; Class overlap; Undersampling; Classification; Machine learning; Medical informatics |
Public URL | https://rgu-repository.worktribe.com/output/1239009 |
Award Date | Oct 31, 2020 |
Files
VUTTIPITTAYAMONGKOL 2020 Learning from class-imbalanced data (PDF, 2.1 MB)
Licence
https://creativecommons.org/licenses/by-nc/4.0/
Copyright Statement
© The Author.