Heterogeneous ensemble selection for evolving data streams. [Dataset]

doi:10.1016/j.patcog.2020.107743

Contributors

Anh Vu Luong
Data Collector

Dr Thanh Nguyen t.nguyen11@rgu.ac.uk
Data Collector

Alan Wee-Chung Liew
Data Collector

Shilin Wang
Data Collector

Abstract

Ensemble learning has been widely applied to both batch data classification and streaming data classification. For the latter setting, most existing ensemble systems are homogeneous, which means they are generated from only one type of learning model. In contrast, by combining several different types of learning models, a heterogeneous ensemble system can achieve greater diversity among its members, which helps to improve its performance. Although heterogeneous ensemble systems have achieved many successes in the batch classification setting, it is not trivial to extend them directly to the data stream setting. In this study, we propose a novel HEterogeneous Ensemble Selection (HEES) method, which dynamically selects an appropriate subset of base classifiers to predict data under the stream setting. We are inspired by the observation that a well-chosen subset of good base classifiers may outperform the whole ensemble system. Here, we define a good candidate as one that exhibits not only high predictive performance but also high confidence in its prediction. Our selection process is thus divided into two sub-processes: accurate-candidate selection and confident-candidate selection. We define an accurate candidate in the stream context as a base classifier with high accuracy over the current concept, and a confident candidate as one with a confidence score higher than a certain threshold. In the first sub-process, we employ the prequential accuracy to estimate the performance of a base classifier at a specific time, while in the second sub-process, we propose a new measure to quantify the predictive confidence and provide a method to learn the threshold incrementally. The final ensemble is formed by taking the intersection of the sets of confident classifiers and accurate classifiers.
Experiments on a wide range of data streams show that the proposed method achieves competitive performance with lower running time in comparison to the state-of-the-art online ensemble methods. The supplementary data presented here show the results of these experiments, as well as providing more information about the list of data streams used.
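
The two-stage selection described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names Candidate, select_ensemble, acc_threshold and conf_threshold are hypothetical. The sketch assumes a plain cumulative prequential (test-then-train) accuracy, a fixed accuracy cut-off, and confidence scores supplied by the caller. The confidence measure and the incrementally learned threshold proposed in the article are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Running statistics for one base classifier (hypothetical helper)."""
    name: str
    correct: int = 0
    seen: int = 0

    def record(self, was_correct: bool) -> None:
        # Called after the classifier predicts an instance and before it is
        # trained on that instance (test-then-train).
        self.seen += 1
        self.correct += int(was_correct)

    def prequential_accuracy(self) -> float:
        # Assumption: plain cumulative accuracy, with no fading or windowing.
        return self.correct / self.seen if self.seen else 0.0


def select_ensemble(candidates, confidences, acc_threshold, conf_threshold):
    """Return the names of classifiers that are both accurate and confident.

    confidences maps a classifier name to its confidence score for the
    current instance; how that score is computed is left to the caller.
    """
    accurate = {c.name for c in candidates
                if c.prequential_accuracy() >= acc_threshold}
    confident = {name for name, score in confidences.items()
                 if score > conf_threshold}
    # Final ensemble: intersection of the accurate and confident sets.
    return accurate & confident


if __name__ == "__main__":
    nb, ht, knn = Candidate("nb"), Candidate("ht"), Candidate("knn")
    for ok in (True, True, True, False):
        nb.record(ok)
    for ok in (True, False, False, False):
        ht.record(ok)
    for ok in (True, True, True, True):
        knn.record(ok)
    scores = {"nb": 0.9, "ht": 0.95, "knn": 0.4}
    print(sorted(select_ensemble([nb, ht, knn], scores, 0.7, 0.5)))
```

In this toy run, "ht" is confident but not accurate and "knn" is accurate but not confident, so only "nb" is in both sets. The sketch does not say what to do when the intersection is empty; that case would need a fallback chosen by the user.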

Citation

LUONG, A.V., NGUYEN, T.T., LIEW, A.W.-C. and WANG, S. 2021. Heterogeneous ensemble selection for evolving data streams. [Dataset]. Pattern recognition [online], 112, article ID 107743. Available from: https://www.sciencedirect.com/science/article/pii/S003132032030546X#sec0023

Acceptance Date	Oct 30, 2020
Online Publication Date	Nov 2, 2020
Publication Date	Apr 30, 2021
Deposit Date	Mar 18, 2021
Publicly Available Date	Nov 3, 2021
Publisher	Elsevier
DOI	https://doi.org/10.1016/j.patcog.2020.107743
Keywords	Data streams; Heterogeneous ensembles; Ensemble selection
Public URL	https://rgu-repository.worktribe.com/output/1167901
Publisher URL	https://www.sciencedirect.com/science/article/pii/S003132032030546X#sec0023
Related Public URLs	https://rgu-repository.worktribe.com/output/982146
Type of Data	Supplementary tables and figures.
Collection Date	Oct 30, 2020
Collection Method	This file of supplementary data appeared alongside an article published in Pattern Recognition ( https://doi.org/10.1016/j.patcog.2020.107743 ). Section 3 of the full article explains how the proposed selection method was developed. Section 4 explains how the sample data streams were selected for use in the experiments and describes how the experimental data were collected.