Heterogeneous ensemble selection for evolving data streams. [Dataset]
Contributors
Anh Vu Luong
Data Collector
Dr Thanh Nguyen t.nguyen11@rgu.ac.uk
Data Collector
Alan Wee-Chung Liew
Data Collector
Shilin Wang
Data Collector
Abstract
Ensemble learning has been widely applied to both batch data classification and streaming data classification. For the latter setting, most existing ensemble systems are homogeneous, which means they are generated from only one type of learning model. In contrast, by combining several types of different learning models, a heterogeneous ensemble system can achieve greater diversity among its members, which helps to improve its performance. Although heterogeneous ensemble systems have achieved many successes in the batch classification setting, it is not trivial to extend them directly to the data stream setting. In this study, we propose a novel HEterogeneous Ensemble Selection (HEES) method, which dynamically selects an appropriate subset of base classifiers to predict data under the stream setting. We are inspired by the observation that a well-chosen subset of good base classifiers may outperform the whole ensemble system. Here, we define a good candidate as one that expresses not only high predictive performance but also high confidence in its prediction. Our selection process is thus divided into two sub-processes: accurate-candidate selection and confident-candidate selection. We define an accurate candidate in the stream context as a base classifier with high accuracy over the current concept, and a confident candidate as one with a confidence score higher than a certain threshold. In the first sub-process, we employ the prequential accuracy to estimate the performance of a base classifier at a specific time, while in the second sub-process, we propose a new measure to quantify the predictive confidence and provide a method to learn the threshold incrementally. The final ensemble is formed by taking the intersection of the sets of confident classifiers and accurate classifiers.
Experiments on a wide range of data streams show that the proposed method achieves competitive performance with lower running time in comparison to the state-of-the-art online ensemble methods. The supplementary data presented here show the results of these experiments and provide more information about the data streams used.
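The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the confidence measure, how the threshold is learned, or how the selected classifiers are combined. The sketch therefore assumes that confidence scores and both thresholds are supplied by the caller, uses a plain cumulative prequential accuracy (no windowing or fading for concept drift), and combines predictions by majority vote. All class, function and parameter names are hypothetical.

```python
from collections import Counter


class PrequentialAccuracy:
    """Test-then-train accuracy: each example is scored before it is learned.

    Assumption: a simple cumulative estimate, with no window or fading factor.
    """

    def __init__(self):
        self.correct = 0
        self.seen = 0

    def update(self, predicted, actual):
        # Record whether the prediction made before training was correct.
        self.correct += int(predicted == actual)
        self.seen += 1

    @property
    def value(self):
        return self.correct / self.seen if self.seen else 0.0


def select_ensemble(accuracies, confidences, acc_threshold, conf_threshold):
    """Return the classifiers that are both accurate and confident.

    accuracies and confidences map a classifier name to a score. Both
    thresholds are hypothetical parameters, not values from the article.
    """
    accurate = {n for n, a in accuracies.items() if a >= acc_threshold}
    confident = {n for n, c in confidences.items() if c > conf_threshold}
    # The final ensemble is the intersection of the two candidate sets.
    return accurate & confident


def majority_vote(predictions, selected):
    """Vote among the selected classifiers (assumed combiner).

    Falls back to all classifiers if the selection is empty.
    """
    voters = selected or set(predictions)
    counts = Counter(predictions[name] for name in sorted(voters))
    return counts.most_common(1)[0][0]
```

For example, with accuracies `{"nb": 0.9, "knn": 0.6, "tree": 0.85}`, confidences `{"nb": 0.8, "knn": 0.9, "tree": 0.4}`, an accuracy threshold of 0.8 and a confidence threshold of 0.5, only `nb` lies in both candidate sets, so it alone forms the ensemble for that instance.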
Citation
LUONG, A.V., NGUYEN, T.T., LIEW, A.W.-C. and WANG, S. 2021. Heterogeneous ensemble selection for evolving data streams. [Dataset]. Pattern recognition [online], 112, article ID 107743. Available from: https://www.sciencedirect.com/science/article/pii/S003132032030546X#sec0023
Acceptance Date | Oct 30, 2020 |
---|---|
Online Publication Date | Nov 2, 2020 |
Publication Date | Apr 30, 2021 |
Deposit Date | Mar 18, 2021 |
Publicly Available Date | Nov 3, 2021 |
Publisher | Elsevier |
DOI | https://doi.org/10.1016/j.patcog.2020.107743 |
Keywords | Data streams; Heterogeneous ensembles; Ensemble selection |
Public URL | https://rgu-repository.worktribe.com/output/1167901 |
Publisher URL | https://www.sciencedirect.com/science/article/pii/S003132032030546X#sec0023 |
Related Public URLs | https://rgu-repository.worktribe.com/output/982146 |
Type of Data | Supplementary tables and figures. |
Collection Date | Oct 30, 2020 |
Collection Method | This file of supplementary data appeared alongside an article published in Pattern Recognition (https://doi.org/10.1016/j.patcog.2020.107743). Section 3 of the full article explains how the proposed selection method was developed. Section 4 explains how the sample data streams were selected for use in the experiments and describes how the experimental data were collected. |
Files
LUONG 2021 Heterogeneous (DATA)
(669 Kb)
PDF
Publisher Licence URL
https://creativecommons.org/licenses/by-nc-nd/4.0/