
Topology for preserving feature correlation in tabular synthetic data.

Arifeen, Murshedul; Petrovski, Andrei




Abstract

Tabular synthetic data generating models based on Generative Adversarial Networks (GANs) have contributed significantly to improving the performance of deep learning models by providing a sufficient amount of training data. However, existing GAN-based models cannot preserve the feature correlations in synthetic data during the data synthesis process. As a result, the synthetic data become unrealistic, which creates problems for certain applications such as correlation-based feature weighting. In this short theoretical paper, we show a promising approach, based on the topology of datasets, to preserve correlation in synthetic data. We formulate our hypothesis for preserving correlation in synthetic data and use persistent homology to show that the topological spaces of the original and synthetic data differ in their topological features, especially in the 0th and 1st homology groups. Finally, we conclude that minimizing the difference in topological features can make the synthetic data space locally homeomorphic to the original data space, and that the synthetic data may then preserve the feature correlation under homeomorphism conditions.
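The comparison described in the abstract (checking feature correlation and comparing 0th-homology features of original and synthetic data) can be illustrated with a minimal sketch. This is not the authors' method. It assumes Euclidean distance and a Vietoris-Rips filtration in which an edge enters at its length, and every helper name (`pearson_matrix`, `h0_persistence`, `correlation_gap`, `h0_gap`) is hypothetical. Only H0 is computed, using the fact that H0 death times in a Rips filtration equal the edge lengths of a minimum spanning tree. H1 requires building the full Rips complex and is usually delegated to a dedicated library such as Ripser or GUDHI. The toy "synthetic" data is produced by shuffling one column, which keeps each marginal distribution but destroys the correlation.

```python
import math
import random
from itertools import combinations


def pearson_matrix(rows):
    """Pairwise Pearson correlation between the columns of a list of rows."""
    cols = list(zip(*rows))
    n, k = len(rows), len(cols)
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    corr = [[1.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        cov = sum((a - means[i]) * (b - means[j]) for a, b in zip(cols[i], cols[j])) / n
        corr[i][j] = corr[j][i] = cov / (stds[i] * stds[j])  # assumes no constant column
    return corr


def h0_persistence(points):
    """Finite death times of H0 classes in a Vietoris-Rips filtration.

    All points are born at scale 0. Two components merge when the scale reaches
    the length of the shortest edge joining them, so the death times are the
    edge lengths of a minimum spanning tree (Kruskal's algorithm with
    union-find). The single class that never dies is omitted.
    """
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components: one H0 class dies
            parent[ri] = rj
            deaths.append(length)
    return deaths


def correlation_gap(a, b):
    """Largest absolute difference between the two correlation matrices."""
    ca, cb = pearson_matrix(a), pearson_matrix(b)
    return max(abs(x - y) for ra, rb in zip(ca, cb) for x, y in zip(ra, rb))


def h0_gap(a, b):
    """Hypothetical crude H0 discrepancy: max gap between sorted death times."""
    da, db = h0_persistence(a), h0_persistence(b)
    if len(da) != len(db):
        raise ValueError("this sketch assumes equal sample sizes")
    return max(abs(x - y) for x, y in zip(da, db))


# Toy "original" data: two strongly correlated features.
rng = random.Random(0)
original = []
for _ in range(200):
    x = rng.gauss(0, 1)
    original.append((x, 2 * x + rng.gauss(0, 0.1)))

# Toy "synthetic" data: shuffling the second column keeps both marginals
# but destroys the correlation between the features.
ys = [y for _, y in original]
rng.shuffle(ys)
synthetic = [(x, y) for (x, _), y in zip(original, ys)]

print("correlation gap:", round(correlation_gap(original, synthetic), 3))
print("H0 gap:", round(h0_gap(original, synthetic), 3))
```

The `h0_gap` score is a deliberately crude stand-in chosen for brevity; the bottleneck and Wasserstein distances between persistence diagrams are the usual choices for comparing topological features.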

Citation

ARIFEEN, M. and PETROVSKI, A. 2022. Topology for preserving feature correlation in tabular synthetic data. In Proceedings of the 15th IEEE (Institute of Electrical and Electronics Engineers) International conference on security of information and networks 2022 (SINCONF 2022), 11-13 November 2022, Sousse, Tunisia. Piscataway: IEEE [online], pages 61-66. Available from: https://doi.org/10.1109/SIN56466.2022.9970505

Conference Name: 15th International conference on security of information and networks 2022 (SINCONF 2022)
Conference Location: Sousse, Tunisia
Start Date: Nov 11, 2022
End Date: Nov 13, 2022
Acceptance Date: Sep 25, 2022
Online Publication Date: Nov 13, 2022
Publication Date: Dec 16, 2022
Deposit Date: Jan 9, 2023
Publicly Available Date: Mar 29, 2024
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 61-66
Book Title: Proceedings of the 2022 15th IEEE International conference on security of information and networks 2022 (SINCONF 2022)
ISBN: 9781665454650
DOI: https://doi.org/10.1109/SIN56466.2022.9970505
Keywords: Synthetic data; Correlation; GAN; Topology; Persistent homology
Public URL: https://rgu-repository.worktribe.com/output/1853567

Files

ARIFEEN 2022 Topology for preserving feature (AAM) (PDF, 686 Kb)

Copyright Statement
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.



