A scalable expressive ensemble learning using random prism: a MapReduce approach.

Stahl, Frederic; May, David; Mills, Hugo; Bramer, Max; Gaber, Mohamed Medhat

doi:10.1007/978-3-662-46703-9_4

Authors

Frederic Stahl

David May

Hugo Mills

Max Bramer

Mohamed Medhat Gaber

Contributors

Abdelkader Hameurlain
Editor

Josef Küng
Editor

Roland Wagner
Editor

Sherif Sakr
Editor

Lizhe Wang
Editor

Albert Zomaya
Editor

Abstract

The induction of classification rules from previously unseen examples is one of the most important data mining tasks in scientific as well as commercial applications. In order to reduce the influence of noise in the data, ensemble learners are often applied. However, most ensemble learners are based on decision tree classifiers, which are affected by noise. The Random Prism classifier has recently been proposed as an alternative to the popular Random Forests classifier, which is based on decision trees. Random Prism is based on the Prism family of algorithms, which is more robust to noise. However, like most ensemble classification approaches, Random Prism does not scale well to large training data. This paper presents a thorough discussion of Random Prism and a recently proposed parallel version of it called Parallel Random Prism. Parallel Random Prism is based on the MapReduce programming paradigm. The paper provides, for the first time, a novel theoretical analysis of the proposed technique and an in-depth experimental study, which show that Parallel Random Prism scales well on a large number of training examples, a large number of data features and a large number of processors. The expressiveness of the decision rules that our technique produces makes it a natural choice for Big Data applications where informed decision making increases the user’s trust in the system.
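
As a rough illustration of the idea summarised in the abstract, the following minimal sketch shows how an ensemble of rule-based classifiers might be trained in a map step and merged in a reduce step. It is not the authors' implementation: the base learner is a simple single-feature rule learner standing in for Prism, the ensemble is assumed to use bootstrap sampling with unweighted majority voting, the map and reduce steps run in-process rather than on a MapReduce cluster, and all function names (`bootstrap_sample`, `train_one_rule`, `map_train`, `reduce_models`, `build_ensemble`, `predict`) are hypothetical.

```python
import random
from collections import Counter
from functools import reduce


def bootstrap_sample(data, rng):
    # Bagging step: draw len(data) instances with replacement.
    return [rng.choice(data) for _ in data]


def train_one_rule(sample):
    # Stand-in base learner (NOT Prism): for each feature, map every
    # observed value to its majority class, then keep the feature whose
    # rule set makes the fewest errors on the sample.
    default = Counter(label for _, label in sample).most_common(1)[0][0]
    best = None
    for f in range(len(sample[0][0])):
        counts = {}
        for x, y in sample:
            counts.setdefault(x[f], Counter())[y] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for x, y in sample if rules[x[f]] != y)
        if best is None or errors < best[0]:
            best = (errors, f, rules)
    _, feature, rules = best
    return feature, rules, default


def map_train(task):
    # Map step: each task trains one base classifier on its own sample.
    data, seed = task
    rng = random.Random(seed)
    return [train_one_rule(bootstrap_sample(data, rng))]


def reduce_models(left, right):
    # Reduce step: merge partial lists of classifiers into one ensemble.
    return left + right


def build_ensemble(data, n_models):
    # Local simulation of the map and reduce phases (assumption: a real
    # deployment would distribute the tasks across computing nodes).
    tasks = [(data, seed) for seed in range(n_models)]
    return reduce(reduce_models, map(map_train, tasks), [])


def predict(ensemble, x):
    # Unweighted majority vote over all base classifiers.
    votes = Counter(
        rules.get(x[feature], default) for feature, rules, default in ensemble
    )
    return votes.most_common(1)[0][0]
```

Calling `build_ensemble(data, 5)` on a list of `(features, label)` pairs returns five rule sets, and `predict(ensemble, x)` returns the class most of them vote for. Because each model is a small dictionary of value-to-class rules, the ensemble stays human-readable, which is the kind of expressiveness the abstract refers to.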

Citation

STAHL, F., MAY, D., MILLS, H., BRAMER, M. and GABER, M.M. 2015. A scalable expressive ensemble learning using random prism: a MapReduce approach. In Hameurlain, A., Küng, J., Wagner, R., Sakr, S., Wang, L. and Zomaya, A. (eds.) Transactions on large-scale data- and knowledge-centred systems XX: special issue on advanced techniques for big data management. Lecture notes in computer science, 9070. Berlin: Springer [online], pages 90-107. Available from: https://doi.org/10.1007/978-3-662-46703-9_4

Acceptance Date	Mar 18, 2015
Online Publication Date	Mar 18, 2015
Publication Date	Mar 18, 2015
Deposit Date	Nov 23, 2015
Publicly Available Date	Mar 19, 2016
Print ISSN	0302-9743
Publisher	Springer
Pages	90-107
Series Title	Lecture notes in computer science
Series Number	9070
Series ISSN	0302-9743
Book Title	Transactions on large-scale data- and knowledge-centred systems XX: special issue on advanced techniques for big data management.
ISBN	9783662467022
DOI	https://doi.org/10.1007/978-3-662-46703-9_4
Keywords	Random Forest; Base Classifier; Computing Node; Training Instance; Ensemble Classifier
Public URL	http://hdl.handle.net/10059/1360
Contract Date	Nov 23, 2015

Files

STAHL 2015 A scalable expressive ensemble (831 Kb)
PDF

Publisher Licence URL
https://creativecommons.org/licenses/by-nc-nd/4.0/
