
Ensemble-based relationship discovery in relational databases.

Ogunsemi, Akinola; McCall, John; Kern, Mathias; Lacroix, Benjamin; Corsar, David; Owusu, Gilbert

Authors

Akinola Ogunsemi

John McCall

Mathias Kern

Benjamin Lacroix

David Corsar

Gilbert Owusu



Contributors

Max Bramer
Editor

Richard Ellis
Editor

Abstract

We investigate how several data relationship discovery algorithms can be combined to improve performance. We consider eight relationship discovery algorithms, including Cosine similarity, Soundex similarity, Name similarity and Value range similarity, which identify potential links between database tables in different ways using different categories of database information. We propose two ensemble methods, a voting system and hierarchical clustering, to reduce the generalization error of each algorithm. The voting scheme uses a given weighting metric to combine the predictions of each algorithm. Hierarchical clustering groups predictions into clusters based on their similarities and then combines one member from each cluster. We run experiments to validate the performance of each algorithm and compare it with our ensemble methods and the state-of-the-art algorithms (FaskFK, Randomness and HoPF) using Precision, Recall and F-Measure as evaluation metrics over the TPCH and AdvWork datasets. Results show that the performance of each individual algorithm is limited, indicating the importance of combining them to consolidate their strengths.
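As a rough illustration of the voting idea described in the abstract, the following Python sketch combines the candidate relationships proposed by several discovery algorithms using per-algorithm weights, then scores the result with Precision, Recall and F-Measure. It is not the authors' implementation: the function names, the weights, the acceptance threshold and the example column pairs are all hypothetical, and the abstract does not specify which weighting metric the paper uses.

```python
from typing import Dict, Set, Tuple

# A candidate relationship: (referencing column, referenced column).
Pair = Tuple[str, str]


def weighted_vote(
    predictions: Dict[str, Set[Pair]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> Set[Pair]:
    """Keep a pair when the weight share of the algorithms proposing it
    reaches the threshold. Weights and threshold are illustrative assumptions."""
    total = sum(weights[name] for name in predictions)
    scores: Dict[Pair, float] = {}
    for name, pairs in predictions.items():
        for pair in pairs:
            scores[pair] = scores.get(pair, 0.0) + weights[name]
    return {pair for pair, score in scores.items() if score / total >= threshold}


def precision_recall_f1(
    predicted: Set[Pair], truth: Set[Pair]
) -> Tuple[float, float, float]:
    """Standard Precision, Recall and F-Measure over sets of pairs."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical predictions from three of the similarity-based algorithms.
example_predictions = {
    "cosine": {("orders.o_custkey", "customer.c_custkey")},
    "name": {
        ("orders.o_custkey", "customer.c_custkey"),
        ("orders.o_orderkey", "lineitem.l_partkey"),
    },
    "value_range": {("orders.o_custkey", "customer.c_custkey")},
}
# Hypothetical weights, e.g. derived from each algorithm's validation score.
example_weights = {"cosine": 3.0, "name": 1.0, "value_range": 2.0}

if __name__ == "__main__":
    accepted = weighted_vote(example_predictions, example_weights)
    print(accepted)
```

In this sketch a pair proposed by only one low-weight algorithm is filtered out, while a pair supported by most of the weighted vote is kept, which is one simple way an ensemble might offset the weaknesses of individual algorithms.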

Citation

OGUNSEMI, A., MCCALL, J., KERN, M., LACROIX, B., CORSAR, D. and OWUSU, G. 2020. Ensemble-based relationship discovery in relational databases. In Bramer, M. and Ellis, R. (eds.) Artificial intelligence XXXVII: proceedings of 40th British Computer Society's Specialist Group on Artificial Intelligence (SGAI) Artificial intelligence international conference 2020 (AI-2020), 15-17 December 2020, [virtual conference]. Lecture notes in artificial intelligence, 12498. Cham: Springer [online], pages 286-300. Available from: https://doi.org/10.1007/978-3-030-63799-6_22

Conference Name: 40th British Computer Society's Specialist Group on Artificial Intelligence (SGAI) Artificial intelligence international conference 2020 (AI-2020)
Conference Location: [virtual conference]
Start Date: Dec 15, 2020
End Date: Dec 17, 2020
Acceptance Date: Sep 3, 2020
Online Publication Date: Dec 8, 2020
Publication Date: Dec 31, 2020
Deposit Date: Jan 8, 2021
Publicly Available Date: Jan 8, 2021
Publisher: Springer
Volume: 12498
Pages: 286-300
Series Title: Lecture notes in artificial intelligence
Series ISSN: 0302-9743
Book Title: Artificial intelligence XXXVII: proceedings of 40th SGAI Artificial intelligence international conference (AI 2020), 15-17 December 2020, Cambridge, UK
ISBN: 9783030637989
DOI: https://doi.org/10.1007/978-3-030-63799-6_22
Keywords: Data discovery; Database management; Ensemble-based discovery; Primary/foreign key relationship; Semantic relationship
Public URL: https://rgu-repository.worktribe.com/output/1085256
