Labelled Vulnerability Dataset on Android source code (LVDAndro) to develop AI-based code vulnerability detection models. [Dataset]

Senanayake, Janaka; Kalutarage, Harsha; Al-Kadri, Mhd Omar; Piras, Luca; Petrovski, Andrei

Authors

Janaka Senanayake

Mhd Omar Al-Kadri

Luca Piras



Abstract

Many Android apps are published without appropriate security considerations, possibly because code is not verified or vulnerabilities are not identified at the early stages of development. This can be addressed by using an AI-based model trained on a properly labelled dataset. Hence, LVDAndro provides a dataset of Android source code vulnerabilities, labelled according to the Common Weakness Enumeration (CWE). The dataset was generated from code lines scanned from real-world Android apps and contains a large number of distinct source code samples. The dataset can be downloaded from the Dataset directory. There are three dataset folders, and each contains a readme file with important details and links to download the dataset, which is stored on Google Drive.

Citation

SENANAYAKE, J., KALUTARAGE, H., AL-KADRI, M.O., PIRAS, L. and PETROVSKI, A. 2023. Labelled Vulnerability Dataset on Android source code (LVDAndro) to develop AI-based code vulnerability detection models [Dataset]. Hosted on GitHub (online). Available from: https://github.com/softwaresec-labs/LVDAndro

Online Publication Date: Sep 2, 2022
Publication Date: Sep 2, 2022
Deposit Date: Sep 7, 2023
Publicly Available Date: Sep 7, 2023
Keywords: Android application security; Code vulnerability; Labelled dataset; Artificial intelligence; Auto machine learning
Public URL: https://rgu-repository.worktribe.com/output/2072071
Publisher URL: https://web.archive.org/web/20230907100039/https://github.com/softwaresec-labs/LVDAndro
Related Public URLs: https://rgu-repository.worktribe.com/output/2072016 (Related conference paper)
Collection Date: Sep 2, 2022