Skip to main content

Research Repository

Advanced Search

Towards handling temporal dependence in concept drift streams.

Wares, Scott Brian

Authors

Scott Brian Wares



Contributors

Abstract

Modern technological advancements have led to the production of an incomprehensible amount of data from a wide array of devices. A constant supply of new data provides an invaluable opportunity for access to qualitative and quantitative insights. Organisations recognise that, in today's modern era, data provides a means of mitigating risk and loss whilst maximising effciency and profit. However, processing this data is not without its challenges. Much of this data is produced in an online environment. Realtime stream data is unbound in size, variety and velocity. Data may arrive complete or with missing attributes, and data availability and persistence is limited to a small window of time. Classification methods and techniques that process offline data are not applicable to online data streams. Instead, new online classification methods have been developed. Research concerning the problematic and prevalent issue of concept drift has produced a considerable number of methods that allow online classifiers to adapt to changes in the stream distribution. However, recent research suggests that the presence of temporal dependence can cause misleading evaluation when accuracy is used as the core metric. This thesis investigates temporal dependence and its negative effcts upon the classification of concept drift data. First, this thesis proposes a novel method for coping with temporal dependence during the classification of real-time data streams, where concept drift is present. Results indicate that a statistical based, selective resetting approach can reduce the impact of temporal dependence in concept drift streams without significant loss in predictive accuracy. Secondly, a new ensemble based method, KTUE, that adopts the Kappa-Temporal statistic for vote weighting is suggested. Results show that this method is capable of outperforming some state-of-the-art ensemble methods in both temporally dependent and non-temporally dependent environments. Finally, this research proposes a novel algorithm for the simulation of temporally dependent concept drift data, which aims to help address the lack of established datasets available for evaluation. Experimental results show that temporal dependence can be injected into fabricated data streams using existing generation methods.

Citation

WARES, S.B. 2023. Towards handling temporal dependence in concept drift streams. Robert Gordon University, PhD thesis. Hosted on OpenAIR [online]. Available from: https://doi.org/10.48526/rgu-wt-2271523

Thesis Type Thesis
Deposit Date Mar 14, 2024
Publicly Available Date Mar 14, 2024
DOI https://doi.org/10.48526/rgu-wt-2271523
Keywords Data; Data streams; Data classification; Concept drift; Temporal dependence
Public URL https://rgu-repository.worktribe.com/output/2271523
Award Date May 31, 2023

Files




You might also like



Downloadable Citations