Resource Efficient Boosting Method for IoT Security Monitoring

Machine learning (ML) methods are widely proposed for security monitoring of the Internet of Things (IoT). However, these methods can be computationally expensive for resource-constrained IoT devices. This paper proposes an optimized, resource-efficient ML method that can detect various attacks on IoT devices. It utilizes the Light Gradient Boosting Machine (LGBM). The performance of this approach was evaluated against four realistic IoT benchmark datasets. Experimental results show that the proposed method can effectively detect attacks on IoT devices with limited resources and outperforms state-of-the-art techniques.


I. INTRODUCTION
The Internet of Things (IoT) is advancing with the proliferation of physically connected objects. IoT integrates multiple devices into networks [1] to provide efficient and intelligent services at minimum cost. IoT is therefore the driving force behind various advanced automation systems.
Despite their diverse implementation and adoption in different sectors, the security and privacy of these interconnected smart things pose a significant challenge, and the need for robust security techniques that respect their resource limitations is escalating. Within the available resources, IoT networks have to provide effective security mechanisms that can monitor and detect severe cyber threats.
The adoption of Machine Learning (ML) techniques is gaining popularity, with wide applications in many areas. In the context of IoT, the anomaly detection system deployed in [2] improved the detection of malicious activities on smartphone devices, and the deep approach in [3] can detect Distributed Denial of Service (DDoS) botnet attacks on consumer IoT devices. The appealing property of ML that drives its practical implementation in various fields is the ability to develop a model that can learn the statistical distribution of complex, high-dimensional datasets.
Motivated by these applications of ML methods on IoT networks, Decision Tree Ensemble Methods (DTEMs) are employed to mitigate various security threats. An overview of such implementations is presented in [4]. However, most of the underlying DTEMs are computationally expensive on complex and larger datasets [5]. Their practical implementation requires intensive memory and time resources, which limits their direct deployment for IoT security monitoring.
Based on this observation, we develop an optimized boosting method for resource-constrained IoT devices. It adopts an ML model and factors out its less computationally expensive parameters. It can accurately discern attack and regular traffic on IoT networks and provides a robust implementation for IoT security monitoring, while the best trade-off depends on the datasets and the selected ML model.
The rest of this paper is organized as follows. Background and related studies are presented in Section II. The proposed approach is described in Section III. Section IV reports the evaluation process and Section V presents results. Section VI concludes the paper with a future research direction.

II. BACKGROUND AND RELATED WORK
In this section, we present a brief background on DTEMs and previous studies related to this work.
An ensemble procedure is a combination-based technique that strengthens multiple weaker classification models to produce a better one. The fine-tuned classifier model is then used to predict new data instances.
Bagging is an ensemble decision tree algorithm that manipulates the training data instances to improve classification performance [6]. The set of data generated by sampling with replacement from the original data is called a bootstrap replicate, while the technique is referred to as bootstrap aggregation [7].
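As a brief illustration of the bootstrap replicate idea, the sketch below draws one replicate with NumPy; the helper name and usage are illustrative only, not part of the proposed method.

```python
import numpy as np

def bootstrap_replicate(X, y, rng=None):
    """Draw one bootstrap replicate: sample n rows with replacement."""
    rng = np.random.default_rng(rng)
    n = len(X)
    idx = rng.integers(0, n, size=n)   # indices sampled with replacement
    return X[idx], y[idx]

# Example: each base tree in a bagging ensemble is trained on one replicate.
# X_rep, y_rep = bootstrap_replicate(X_train, y_train, rng=42)
```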
Boosting is an ensemble method that reduces the classification error produced by the previous classifier. This technique is more sensitive to model over-fitting on noisy data with extensive training iterations [8].
Despite the computational expense of accumulating multiple learners to make a decision, DTEMs offer appealing advantages in various fields, especially for solving a plethora of ML challenges. Several comprehensive surveys on ensemble learners are available in the literature [9], [10], [11]. In IoT networks, ensemble learners embody this idea of combination to detect attacks captured from multiple devices; the integrated statistical approach in [12] can identify irregular events in various IoT network traffic.
Motivated by its bagging and feature selection properties, the Random Forest (RF) method has found extensive applications. The authors in [13] utilized it in a real-time implementation, and Resende et al. [14] outline its employment opportunities in network security monitoring. Hassan et al. [15] adopted it for network intrusion detection and empirically validated the approach using an extended version of the KDD 99 dataset; in that context, the technique is less computationally expensive than the Support Vector Machine (SVM) model. The evaluation results reported in [16] show its effectiveness: it outperforms the SVM and Artificial Neural Network models when tested with smaller datasets. Also, Singh [17] adopts and implements it for peer-to-peer real-time botnet detection. However, the approach described in [17] does not consider memory consumption.
In contrast, an optimized DTEM based on RF was proposed in [18] and tested with the small-scale Iris dataset. Such an implementation over insufficient data records restricts the model's adoption capability and limits the approach's applicability in various fields, especially in the resource-constrained IoT domain.
In terms of efficient performance on sparse data, a scalable boosted algorithm was proposed in [19]. Also, a lighter gradient boosting decision tree method, based on feature bundling and sub-sampling that retains instances with larger gradients, was proposed in [20]. Its central assumption is that reducing the feature dimension speeds up the model classification process. However, this decision cannot be generalized across datasets as an approach to improve model performance.
A series of ML optimization methods has been proposed for classification tasks with Deep Neural Networks (DNNs) in [21] and [22]. The optimized loss function in [23] outperforms the popular cross-entropy technique for image classification. Further, a complex and robust deeper optimization approach has been successfully implemented in [24].
In particular, there is no existing work in the boosted decision tree literature that automatically optimizes the learning parameters of LGBM in the context of IoT. In this paper, we demonstrate such a usable approach that reduces memory and time resource consumption. The method ensures the selection of relevant parameters suitable for resource-constrained IoT environments and multidimensional, scalable datasets.

III. RESOURCE EFFICIENT BOOSTING METHOD
Finding an optimized decision tree classification model can be considered a challenging task, due to the intensive parameter tuning needed to achieve state-of-the-art performance. For example, optimization of the Gradient Boosting Method (GBM) requires attentive parameter tuning to build a base learner that is maximally correlated with the negative gradient of a loss function [25].
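For reference, the standard GBM formulation from [25] fits each base learner to the pseudo-residuals (the negative gradient of the loss); the notation below follows that convention and is included only to make the tuning target explicit.

```latex
% Stage m of gradient boosting: fit base learner h_m(x) to the pseudo-residuals r_{im}.
r_{im} = -\left[ \frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)} \right]_{F = F_{m-1}},
\qquad
F_m(x) = F_{m-1}(x) + \nu \, \rho_m \, h_m(x)
```

where L is the loss function, the learning rate (nu) shrinks each update, and the step size rho_m is found by line search.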

A. Light Gradient Boosting Machine
The Light Gradient Boosting Machine (LGBM) is a decision tree algorithm based on gradient-based one-side sampling of data instances and exclusive feature bundling [20]. This technique has proven successful with fewer iterations over the training data instances. However, practical implementations of LGBM need various parameters to be tuned, which is challenging with multidimensional and scalable datasets. Unfortunately, it is difficult to find optimum parameters that are memory- and time-efficient with larger training data samples. An attentive configuration is needed when setting the relevant boosted learning architectures for the tasks of regression, binary, and multiclass classification.
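As an illustration only (not the authors' exact configuration), the sketch below shows how such a binary LGBM classifier might be instantiated with the lightgbm Python package [26]; the parameter values are placeholders rather than the optimized settings reported later.

```python
import lightgbm as lgb

# Illustrative (non-optimized) LGBM binary classifier; values are placeholders.
clf = lgb.LGBMClassifier(
    objective="binary",
    num_leaves=31,          # number of leaves per tree
    learning_rate=0.1,      # shrinkage applied to each boosting step
    colsample_bytree=0.8,   # feature fraction used per tree
    subsample=0.8,          # bagging fraction of training rows
    subsample_freq=1,       # perform bagging at every iteration
    reg_alpha=0.0,          # L1 regularization term
    n_estimators=100,
)
# clf.fit(X_train, y_train)
# y_pred = clf.predict(X_test)
```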

B. LGBM Hyperparameters
The LGBM learning model has various hyperparameters to consider during the optimization phase and implementation, particularly for the task of binary classification. These include the number of leaves, feature fraction, bagging fraction, bagging frequency, learning rate, and a regularization term. Table I describes the range of values of these hyperparameters, and the LGBM module documentation [26] recommends how each may be varied. The regularization alpha constraint can be greater than 0.0, the feature and bagging fractions must be set within (0, 1], and enabling bagging by setting its frequency to a non-zero value can facilitate efficient model learning.
For the task of efficient resource utilization, the learning rate and the regularization term are selected in the range [0.0001, 0.1], and the bagging and feature fraction constraints are utilized within (0, 1]. Because a large number-of-leaves constraint may cause model over-fitting and increase computational cost, a minimum value of 2 is initialized and incremented sequentially. Proper configuration of these hyperparameters can minimize time and memory resource consumption.
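To make these ranges concrete, one possible (assumed) encoding of the search space is sketched below, using the scikit-learn-style parameter names of the lightgbm package; the exact grids used in the experiments are not reproduced here.

```python
# Candidate hyperparameter ranges, following the constraints described above.
param_space = {
    "learning_rate":    [0.0001, 0.001, 0.01, 0.1],   # within [0.0001, 0.1]
    "reg_alpha":        [0.0001, 0.001, 0.01, 0.1],   # regularization term > 0
    "colsample_bytree": [0.2, 0.4, 0.6, 0.8, 1.0],    # feature fraction in (0, 1]
    "subsample":        [0.2, 0.4, 0.6, 0.8, 1.0],    # bagging fraction in (0, 1]
    "subsample_freq":   [1, 5, 10],                    # non-zero to enable bagging
    "num_leaves":       [2, 4, 8, 16, 32],             # start small, increase stepwise
}
```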
Primarily, the grid search technique is employed to select the best parameter configuration among various learning models. However, this approach focuses on providing better prediction performance, and there remains a challenge in balancing the trade-off between memory usage and accurate prediction.
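For comparison, a conventional grid-search baseline can be expressed with scikit-learn as in the following hedged sketch; it optimizes prediction accuracy only and does not account for memory or time (clf and param_space are the illustrative objects defined above).

```python
from sklearn.model_selection import GridSearchCV

# Accuracy-only baseline: grid search ignores the memory/time cost of each setting.
grid = GridSearchCV(clf, param_grid=param_space, scoring="accuracy", cv=3)
# grid.fit(X_train, y_train)
# best_params = grid.best_params_
```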

C. Parameters Optimization Procedure
Optimization of a decision tree classification model is a requirement to maintain generality on resource-intensive tasks, particularly for deployment to resource-constrained IoT devices. Careful employment of an optimum parameter combination can enhance model performance with minimum resource consumption.
Figure 1 presents the process diagram for the proposed resource-efficient method. It requires datasets and a conventional ML model, along with the parameters involved in training. It accepts data, normalizes it, and outputs a model with optimized parameters. It employs datasets produced from the collection of realistic benign and malicious traffic using simulated IoT devices. For the task of binary classification, the efficient model can accurately differentiate regular traffic from attack traffic.
The pseudocode of this implementation procedure is presented in Algorithm 2. The foremost step in Algorithm 2 is the definition of the traditional classifier model with its parameter composition. During training, parameters are iterated sequentially to find those that fit a model efficiently. The returned parameters are those that are effective in terms of resource utilization. Testing data samples are then validated using the updated model.
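Algorithm 2 itself is not reproduced here; the following is a minimal Python sketch of the procedure it describes, under the assumption that resource efficiency is judged by measuring training time and peak memory for each parameter combination. The measurement and selection rules are illustrative, not the authors' exact criteria.

```python
import time
import tracemalloc
import lightgbm as lgb
from sklearn.model_selection import ParameterGrid

def optimize_lgbm(param_space, X_train, y_train, X_val, y_val):
    """Iterate parameter combinations and keep the most resource-efficient model."""
    best = None
    for params in ParameterGrid(param_space):          # sequential iteration over P
        model = lgb.LGBMClassifier(objective="binary", **params)
        tracemalloc.start()
        start = time.perf_counter()
        model.fit(X_train, y_train)
        acc = model.score(X_val, y_val)
        elapsed = time.perf_counter() - start
        _, peak_mem = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        # Illustrative selection rule: prefer lower memory/time at comparable accuracy.
        cost = (peak_mem, elapsed)
        if best is None or (acc >= best["acc"] - 0.001 and cost < best["cost"]):
            best = {"model": model, "params": params, "acc": acc, "cost": cost}
    return best
```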

D. Implementation
In practice, a parameter grid [27] is employed because a series of training sessions is required to discover optimum parameters with fewer resources; the parameter grid [27] can store multiple parameter combinations. Training and testing were implemented in Spyder [28].

IV. EVALUATION
In this section, we present the experimental evaluation of the proposed method, with a description of the benchmark datasets selected for IoT security monitoring.

A. Datasets and Preprocessing
The evaluation experiments used four accessible IoT datasets: N-BaIoT [29], Bot-IoT [30], Bot-10 [30], and Unsw [31]. Each dataset consists of various attacks along with normal traffic activities; in particular, Bot-IoT contains multiple categories of different botnet attacks. That dataset is composed of 72 million records in a 16.7 GB CSV file, while 10% of the data is made publicly available [30] for model evaluation.
The choice of these datasets allowed frequent model training with thorough experimentation. Each tested dataset is split into 70% training and 30% testing records. The datasets are described briefly in Table II. All data records are normalized using the min-max normalization formula described in Equation 1, X' = (X - X_min) / (X_max - X_min), where X represents the value of the feature vector and X_min and X_max are its minimum and maximum values.
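A hedged sketch of this preprocessing step (the 70/30 split and the min-max normalization of Equation 1) is given below; dataset loading and column handling are assumed and simplified.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# 70% training / 30% testing split, as described above.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Min-max normalization (Equation 1): X' = (X - X_min) / (X_max - X_min).
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)   # fit on training data only
X_test = scaler.transform(X_test)         # apply the same scaling to test data
```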

B. Optimized LGBM Hyperparameters
The most relevant hyperparameters discovered using the proposed method are described in Table III. These include the initial values for the unoptimized LGBM model and optimum values returned by the efficient boosted algorithm.

V. RESULTS
This section presents the analysis and discussion of the experimental results from the implementation of the proposed boosting method. Memory and time usage of the optimization phase are compared with the tested techniques.

A. Testing Speed
Figure 2(a) shows the testing times needed to evaluate the proposed method against each dataset's records. It is noticeably faster than the unoptimized model, demonstrating reduced classification time in processing each sample of the tested data.
The proposed method is capable of saving processing time across datasets. Compared with the conventional model in Figure 2(a), it saved 53.79%, 57.89%, 61.11%, and 47.76% of the time for validating a sample of Bot-10, Bot-IoT, Unsw, and N-BaIoT, respectively.

B. Testing Memory
The comparison of memory consumption when testing each data record with the proposed approach is presented in Figure 2(b). It requires less memory, saving 52.69%, 39.17%, 71.43%, and 41.78% of test memory for each record of the Bot-10, Bot-IoT, Unsw, and N-BaIoT datasets, respectively. These results indicate its robustness and its lightweight security monitoring advantages for IoT devices, with overall improvement across the benchmark datasets, and suggest that real-time IoT security monitoring with the proposed approach can be beneficial.

C. Testing Accuracy
Figure 3(a) shows the testing accuracy that the proposed method provides for each dataset. Despite its resource reduction advantages, it also outperformed the unoptimized LGBM method in accurately predicting the Bot-10 and Bot-IoT datasets. The loss of accuracy by the unoptimized approach was due to the utilization of the default configuration parameters. These results indicate the capability of the optimized technique to detect IoT attack traffic effectively.
Figure 3(b) compares the testing accuracy of the proposed method and the grid search technique. Despite being less computationally expensive, the proposed method achieves prediction accuracy on each dataset close to that of the grid search method.

D. Computational Performance Analysis
Table IV reports the performance comparison of the grid search technique and the proposed method for the tested datasets.
Regarding the resource consumption of each data record, the proposed approach is better: it requires minimal memory and maintains reduced classification time. These results indicate its effectiveness and efficiency advantages. Table V presents the computational performance of the employed algorithms on the N-BaIoT dataset sample. The proposed method shows reductions in memory and time resources and demonstrates a robust classification of attack and regular traffic with an accuracy of 99.90%.
Figure 4 demonstrates the relationship between hyperparameter tuning and memory consumption, showing the effect of (a) the number of tree leaves and (b) the bagging frequency on memory. Figure 4(a) indicates the effect of varying the number-of-leaves constraint on memory usage; the graph suggests smaller tree-leaf values for lower memory consumption. Figure 4(b) shows the corresponding effect of the bagging frequency, suggesting careful selection of its value where resources are limited; similar behaviour is observed for the bagging fraction and feature fraction. Figure 6(a) illustrates the sample memory consumption against the regularization term, and Figure 6(b) shows the impact of the learning rate on memory. Smaller values of these hyperparameters facilitate memory saving, which is useful in controlling the deployment of resource-hungry boosting algorithms.

VI. CONCLUSION AND FUTURE WORK
The increasing number and complexity of IoT devices motivate the development of a robust, efficient, and feasible security protection system. We present such an approach, which utilizes LGBM with optimized hyperparameters to lower the computational cost for resource-constrained IoT devices, mainly because most traditional ML methods are computationally expensive for IoT security monitoring. The proposed technique is efficient and usable for reducing IoT resource consumption. It outperforms the conventional LGBM model tested with the initialized hyperparameters as well as the grid search technique, and it is better than the five employed boosting algorithms for effective attack detection with lower resource consumption. Compared with the SGB algorithm, it reduced the processing time and memory consumed during testing by 88.09% and 99.70% per sample, respectively. These results motivate follow-up research to enhance the method for real-time IoT security monitoring.
An essential step to validate the method externally would be to replicate this study's results with regular and attack traffic captured from different real IoT devices, including extensive network traffic captured during the generation of various IoT attacks.
Considering additional datasets would allow the feasibility of the proposed technique to be measured against the amount and diversity of IoT traffic. We are particularly interested in the behaviour and variation of different IoT devices and, more broadly, in investigating whether specific devices are more vulnerable than others. Further, we would like to explore more challenging ML techniques with other complex parameters and hyperparameters available in the literature.

VII. ACKNOWLEDGEMENT
We thank the School of Computing, Robert Gordon University for their assistance. This work was supported by the Petroleum Technology Development Fund (PTDF), Nigeria.