Mr Janaka Senanayake j.senanayake1@rgu.ac.uk
Lecturer
Labelled Vulnerability Dataset on Android source code (LVDAndro) to develop AI-based code vulnerability detection models.
Senanayake, Janaka; Kalutarage, Harsha; Al-Kadri, Mhd Omar; Piras, Luca; Petrovski, Andrei
Authors
Dr Harsha Kalutarage h.kalutarage@rgu.ac.uk
Associate Professor
Mhd Omar Al-Kadri
Luca Piras
Andrei Petrovski
Contributors
Sabrina De Capitani di Vimercati
Editor
Pierangela Samarati
Editor
Abstract
Ensuring the security of Android applications is a vital and intricate task that requires careful consideration during development. Unfortunately, many apps are published without sufficient security measures, possibly because vulnerabilities are not identified early. One possible solution is to employ machine learning models trained on a labelled dataset, but currently available datasets are suboptimal. This study creates a sequence of datasets of Android source code vulnerabilities, named LVDAndro, labelled based on the Common Weakness Enumeration (CWE). Three datasets were generated through app scanning by varying the number of apps and their sources. LVDAndro includes over 2,000,000 unique code samples, obtained by scanning over 15,000 apps. The AutoML technique was then applied to each dataset as a proof of concept to evaluate the applicability of LVDAndro in detecting vulnerable source code using machine learning. The AutoML model trained on the dataset achieved an accuracy of 94% and an F1-Score of 0.94 in binary classification, and an accuracy of 94% and an F1-Score of 0.93 in CWE-based multi-class classification. The LVDAndro dataset is publicly available and continues to expand as more apps are scanned and added regularly. The LVDAndro GitHub Repository also includes the source code for dataset generation and model training.
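For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how accuracy and F1-Score are computed for a binary vulnerable / non-vulnerable classifier. The function name `binary_metrics` and the toy labels are hypothetical and invented for illustration only; they are not taken from LVDAndro or from the authors' code.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, f1) for binary labels, where 1 means 'vulnerable'."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Toy labels, invented for illustration (not LVDAndro data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, f1 = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, f1={f1:.2f}")
```

Accuracy counts all correct predictions, while F1 balances precision and recall for the positive class, which is why the abstract reports both.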
Citation
SENANAYAKE, J., KALUTARAGE, H., AL-KADRI, M.O., PIRAS, L. and PETROVSKI, A. 2023. Labelled Vulnerability Dataset on Android source code (LVDAndro) to develop AI-based code vulnerability detection models. In De Capitani di Vimercati, S. and Samarati, P. (eds.) Proceedings of the 20th International conference on security and cryptography, 10-12 July 2023, Rome, Italy, volume 1. Setúbal: SciTePress [online], pages 659-666. Available from: https://doi.org/10.5220/0012060400003555
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | 20th International conference on Security and cryptography 2023 (SECRYPT 2023) |
Start Date | Jul 10, 2023 |
End Date | Jul 12, 2023 |
Acceptance Date | Apr 21, 2023 |
Online Publication Date | Jul 12, 2023 |
Publication Date | Dec 31, 2023 |
Deposit Date | Sep 7, 2023 |
Publicly Available Date | Sep 7, 2023 |
Publisher | SciTePress |
Peer Reviewed | Peer Reviewed |
Volume | 1 |
Pages | 659-666 |
Series ISSN | 2184-7711 |
Book Title | Proceedings of the 20th International conference on Security and cryptography |
ISBN | 9789897586668 |
DOI | https://doi.org/10.5220/0012060400003555 |
Keywords | Android application security; Code vulnerability; Labelled dataset; Artificial intelligence; Auto machine learning |
Public URL | https://rgu-repository.worktribe.com/output/2072016 |
Related Public URLs | https://rgu-repository.worktribe.com/output/2072071 (Related dataset link-only output) |
Additional Information | Publisher preferred citation: Senanayake, J.; Kalutarage, H.; Al-Kadri, M.; Piras, L. and Petrovski, A. (2023). Labelled Vulnerability Dataset on Android Source Code (LVDAndro) to Develop AI-Based Code Vulnerability Detection Models. In Proceedings of the 20th International Conference on Security and Cryptography - SECRYPT; ISBN 978-989-758-666-8; ISSN 2184-7711, SciTePress, pages 659-666. DOI: 10.5220/0012060400003555 |
Files
SENANAYAKE 2023 Labelled vulnerability dataset (VOR)
(1.1 Mb)
PDF
Publisher Licence URL
https://creativecommons.org/licenses/by-nc-nd/4.0/
Copyright Statement
© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)