Rapid Detection of Multi-QR Codes Based on Multistage Stepwise Discrimination and a Compressed MobileNet

Poor real-time performance in multi-QR codes detection has been a bottleneck in QR-code-decoding-based Internet of Things (IoT) systems. To tackle this issue, we propose in this article a rapid detection approach that consists of multistage stepwise discrimination (MSD) and a Compressed MobileNet. Inspired by object category determination analysis, the preprocessed QR codes are extracted accurately at a small scale using the MSD. Guided by the small image scale and the end-to-end detection model, we obtain a lightweight Compressed MobileNet through deep weight compression to realize rapid inference of multi-QR codes. The average detection precision (ADP), multiple box rate (MBR), and running time are used for quantitative evaluation of efficacy and efficiency. Compared with several state-of-the-art methods, our approach achieves higher performance in rapidly and accurately extracting all the QR codes. The approach is conducive to embedded implementation on edge devices with little computational overhead, which can further benefit a wide range of real-time IoT applications.

QR code, as a kind of low-cost reading label, can empower the reliable construction of deep-learning-based perception frameworks [2]. However, fast detection of multiple QR codes in images remains a challenging task, though it is crucial for registering large quantities of sample tubes or commodity goods in various scenes, such as medical facilities for COVID-19 testing, warehousing, and logistics, as illustrated in Fig. 1. This requires not only high real-time performance, namely, efficient detection of all QR codes in one image, but also low computational complexity for embedded implementation on edge devices.
Earlier QR code detection approaches [3], [4], [5] located QR codes by calculating the width ratio of the black and white regions. However, they can only be applied to images with a simple background and high contrast. The image resolution, illumination, noise, etc., can easily influence the detection performance of these approaches [6]. By extracting contour lines, QR codes can be detected using morphological processing and the Hough transform [7]. However, this approach has high computational complexity, and its overall accuracy on complex backgrounds is still low. Most conventional QR code detection approaches only consider the single-QR-code scenario and cannot be effectively applied to the multi-QR codes detection task.
Deep-learning-based object detection can help to predict the location of multiple QR codes in an image. These object detection approaches can be divided into two categories according to their detection principles, i.e., single-stage and two-stage approaches. The single-stage approach mainly uses fully convolutional networks to process the image so that the class probability and position of the target can be estimated directly. As a result, the detection speed is fast, but the accuracy is lower than that of the two-stage ones. Typical algorithms include YOLOv3 [8], SSD [9], etc. The two-stage target detection approach often needs to determine target candidate regions first before applying region-based classification. Compared with the single-stage scheme, the two-stage one, for instance, R-CNN [10], Fast R-CNN [11], and Faster R-CNN [11], has higher accuracy but is more time-consuming.
Many researchers have also introduced various network models in the field of multi-QR codes detection. For example, Sharif et al. [12] proposed an effective 1-D barcode detection network; however, the number of model parameters is high, and the detection accuracy still needs improvement. Jia et al. [13] first proposed a multiclass barcode detection network based on Faster R-CNN that is suitable for complex environments and can locate multiple barcodes and correct distortions. However, this approach is too time-consuming and does not consider the influence of illumination changes on detection accuracy. Subsequently, the authors compressed and optimized the network model to reduce the parameters in [14], but the computational cost remains high. Based on EfficientDet [15], the approach in [16] locates the four vertices of the QR code at a relatively high training cost. Furthermore, this approach adds an additional keypoint regression layer for locating the four vertices of the QR code accurately, which increases the algorithm's time complexity. In order to reduce the computational overhead of the learning-based model in the QR code detection task, the batch QR code detection approach [17] was proposed, which combines conventional QR code detection with a lightweight model and demonstrates improved detection speed. However, the application scenarios of this approach have some limitations: when the image contains a complicated background or noise, the false detection rate rises and the detection accuracy of QR codes becomes poor.
In summary, conventional detection approaches have difficulty detecting multiple QR codes in a single image, while learning-based ones depend on a massive model structure, sufficient labeled data, and high hardware configurations, leading to poor detection speed and difficult embedded implementation on edge devices. In order to overcome these shortcomings, we propose an effective approach for rapid detection of multiple QR codes. The major contributions are highlighted as follows.
1) Based on object category determination analysis, we propose a multistage stepwise discrimination (MSD) approach, which is capable of rapidly classifying redundant target candidate regions and accurately subsampling them to small sizes.
2) Inspired by the small-scale and end-to-end detection task, we use a deep-learning-based network to capture the convolutional features of QR codes and propose the lightweight Compressed MobileNet for high-accuracy classification.
3) We propose a rapid detection approach for multiple QR codes based on the MSD and the Compressed MobileNet. The experimental results show that our approach has higher detection accuracy and faster speed. Our model is conducive to embedded implementation on edge devices, meeting the needs of real-time multi-QR codes detection in IoT applications.
The remainder of this article is organized as follows. In Section II, we review the related work in detail. The framework of the proposed approach is given in Section III. Section IV presents the experiments and the results, followed by a summary and discussion of future research in Section V. Table I lists all acronyms, along with descriptions.

II. RELATED WORK AND MOTIVATIONS
QR code is a low-cost label reading strategy over the perception layer of the IoT. However, accurate and rapid QR code detection on edge sensing devices has been challenging. Related approaches can be divided into two categories, i.e., image-processing-based approaches and deep-learning-based models, as detailed below.

A. Image Processing-Based Approach
Early works [18], [19], [20], [21] use conventional image processing approaches, including image filtering, binarization, morphological processing, and the Hough transform, to detect QR codes, followed by extracting the QR codes as the foreground. This kind of approach is simple and effective and has also been applied to QR code detection in recent years. Sörös and Flörkemeier [22] proposed a QR code location method based on areas with a high concentration of edge and corner structures, the so-called structure matrix, in the HSV (hue, saturation, value) color system. Edge detection is used to capture the boundary information before transforming to the HSV space to locate the QR codes. Wachenfeld et al. [23] proposed an approach for 1-D barcode detection and recognition, which utilizes binarization and other operations to preprocess the image and then performs boundary detection and classification to locate the barcode. Based on conventional image processing, a more efficient QR code detection approach is proposed in [24], which can not only locate and detect QR codes under uneven lighting but is also suitable for low-resolution images. After converting the captured image to grayscale followed by binarization, Karrach et al. [24] obtain the QR code's bounding box by locating the finder pattern of the QR code. Hakim et al. [25] also convert the RGB image into a binary one and then perform morphological processing to detect the QR code. Lopez-Rincon et al. [26] proposed a QR code detection approach based on image integration, which uses the integral image to determine a threshold for binarization and then applies the finder pattern of the QR code to locate it through the extracted connected regions.
The primary purpose of the binarization operation above is to enhance the edges while removing the unwanted background. Image binarization can be divided into local and global thresholding-based approaches [27]. Local thresholding determines the binarization threshold using the pixel distribution within a neighborhood region. Although this allows the binarization to adapt to the image, it needs to calculate multiple thresholds, leading to low efficiency. The global thresholding approach achieves binarization by setting one threshold and is therefore fast. OTSU [28] is a global thresholding approach with simple and rapid calculation that is unaffected by image brightness and contrast. It divides the image into background and foreground according to the histogram distribution. Because of the rich black and white block features of the QR code, there is an apparent distinction between foreground and background. We can obtain the edge information of the QR code contours by using OTSU to find the threshold that maximizes the interclass variance between the QR code and the background. Therefore, in our work, we use OTSU to calculate the binarization threshold and convert the RGB image into a binary one.
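As a minimal illustration, Otsu's threshold can be computed directly from the grayscale histogram by maximizing the between-class variance. The sketch below uses only NumPy and a synthetic bimodal image; in practice, OpenCV's `cv2.threshold` with the `THRESH_OTSU` flag performs the same computation.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Pick the threshold that maximizes the between-class (interclass) variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                      # class-0 probability up to each t
    mu = np.cumsum(prob * np.arange(256))        # cumulative intensity mean
    mu_t = mu[-1]                                # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b)))

# Synthetic bimodal image: dark background (~30) and bright QR blocks (~220).
rng = np.random.default_rng(0)
img = np.where(rng.random((64, 64)) < 0.5, 30, 220).astype(np.uint8)
t = otsu_threshold(img)
binary = (img > t).astype(np.uint8) * 255        # foreground/background split
```

For a clearly bimodal histogram like a QR code against its white border, the computed threshold lands between the two modes, cleanly separating foreground from background.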

B. Deep-Learning-Based Model
In recent years, with the rapid development of deep learning and computer vision techniques, deep neural network (DNN) models have been applied to detect multiple QR code patterns in an image. Dubská et al. [29] used YOLOv2 [30] to detect the positions of multiple QR codes in the image; simultaneously, the rotation angles of the QR codes are regressed by DarkNet19. This approach achieves favorable results on the Muenster barcode data set. Zamberletti et al. [31] used an edge detection operator to process images, mapping them to the Hough space. The rotation angles of the 1-D barcodes are obtained through a trained multilayer perceptron network, which achieves satisfactory results in the experiments. A geometric approach in [32] is introduced into the deep-learning-based model, which extracts the bounding box regions and uses a line detection algorithm to detect the corresponding barcodes after utilizing YOLOv2 to locate multiple 1-D barcodes in the image. In [33], an end-to-end model is proposed to detect multiple QR codes in the image. This approach contains a quadrilateral regression layer that can accurately locate barcodes and a multiscale spatial pyramid pooling layer to detect small-scale barcodes.
In addition, many DNN models have achieved high accuracy in object detection tasks in recent years. These deep-learning-based models can be divided not only into two-stage and single-stage ones but also into anchor-based and anchor-free models, where an anchor is a pregenerated bounding box. Fast R-CNN, Faster R-CNN, YOLOv2, and SSD belong to the anchor-based methods, while DenseBox [34], CornerNet [35], ExtremeNet [36], FSAF [37], FCOS [38], and FoveaBox [39] are anchor-free ones. For multi-QR codes detection, however, the practicability of these approaches must be considered in actual scenes, where whether they can meet the requirements of perceptive devices with scarce computing resources is particularly challenging. Although anchor-free models proposed in recent years simplify the structure of anchor-based models and reduce the computational complexity, they still have a large number of parameters and require high-performance hardware, which prevents them from being applied directly to the multi-QR codes detection task. In order to deploy these DNN models in real scenarios, they need to be compressed through knowledge distillation, low-rank decomposition, network quantization, and so on. Moreover, SqueezeNet [40], ShuffleNet [41], MobileNet [42], [43], [44], and other lightweight networks have been proposed to reduce the complexity and parameters of DNN models. Inspired by network compression technology and the lightweight network model, we introduce a lightweight model in the last part of our approach to improve the recognition accuracy.

C. Our Motivations
Nowadays, the IoT has elicited great attention from both industrial and scholarly circles in areas such as big data, energy management, and commerce, to name a few. However, in the perception layer of the IoT, QR codes, as the most widely used labeling technology, still present challenges, primarily in two aspects. The first is the complicated industrial environment [45], where the intricacy of real-world settings renders detection and decoding more arduous. This complexity encompasses multiple scales, occlusion between QR codes and backgrounds, geometric distortion, and orientation, all commonly seen in industrial environments. The second is the slow detection speed for multi-QR codes, which may affect both the security of QR code information transmission [46], [47] and production line efficiency. All of these necessitate that IoT devices be able to quickly acquire as much QR code information as possible.
To tackle these challenges, we aim to develop a rapid multi-QR codes detection approach for industrial environments. Although attempts have been made to address some of these issues, as summarized in Table II, they are far from fully solved. Existing approaches can be categorized into image-processing-based and deep-learning-based ones. The former feature poor detection accuracy but fast speed, even on central processing units (CPUs). The latter require more computational power and are therefore difficult to implement on resource-constrained edge devices, yet they tend to have high detection accuracy. How to combine them for both high detection accuracy and fast speed is the major purpose of the proposed approach.

III. PROPOSED APPROACH
In the QR code image detection task, a large number of predicted bounding boxes are generated. By analyzing these bounding boxes using the category determination principle and the IoU [48], they can be divided into independent, intersecting, and containing ones. In this section, we propose an efficient box-screening strategy, namely, the MSD-based approach, to improve the screening performance. Finally, we present our multi-QR codes rapid detection approach in detail.

A. Multistage Stepwise Discrimination

1) Analysis of Predicted Bounding Boxes Based on Category Determination Principle:
In the multi-QR codes detection task, many generated bounding boxes may not contain full targets. Thus, these need to be examined to remove the false alarms. The classification of these boxes is a kind of object category determination problem. As shown in Fig. 2, there are four types of overlap between bounding boxes in an image: 1) the four corners of a bounding box are all contained in the interior of another; 2) one corner of a bounding box is contained inside another; 3) two corners of a bounding box are contained inside another; and 4) the bounding box does not contain any corners of the others. These types fall into two cases: one is the intersection relation shown in Fig. 2(a), and the other is the inclusion relation presented in Fig. 2(b)-(d). The bounding boxes X can be divided into independent boxes c_ind, intersecting ones c_int, and containing ones c_con, so the set of categories is defined as Y = {c_ind, c_int, c_con}. Let x ∈ X represent a bounding box; the risk of misplacing it into category c ∈ Y is denoted R(c|x). This risk can be minimized by finding a decision rule h : X → Y whose overall risk is

R(h) = E_x[R(h(x)|x)]

where E stands for the expectation. For each sample x, if h minimizes the risk value R(h(x)|x), the overall risk R(h) will also be minimized, i.e.,

h(x) = argmin_{c∈Y} R(c|x).

We study the bounding box categories to improve the detection efficiency and propose a decision rule h, namely, the MSD, which can eliminate extra boxes rapidly. Thus, all bounding boxes are quickly divided into independent boxes, intersecting ones, and containing ones. Finally, the QR code within each bounding box is detected and extracted.
2) Design of MSD Approach: For two bounding boxes containing the same detection target, as shown in Fig. 3, the IoU score can measure their degree of intersection. In order to judge whether two boxes are in the inclusion relation, we propose another metric based on the IoU score, namely, the coincidence score C. Let A and B in Fig. 3 be two intersected boxes, where (x_01, y_01), (x_02, y_02), w_1, and h_1 are the upper-left corner, lower-right corner, width, and height of A, and (x_11, y_11), (x_12, y_12), w_2, and h_2 are those of B, and let S_A and S_B denote the areas of A and B, respectively. The width iou_w and height iou_h of their intersected region and the IoU score are

iou_w = min(x_02, x_12) − max(x_01, x_11)
iou_h = min(y_02, y_12) − max(y_01, y_11)
IoU = (iou_w · iou_h) / (S_A + S_B − iou_w · iou_h).

The coincidence score C_AB is calculated as

C_AB = (iou_w · iou_h) / min(S_A, S_B).

The coincidence degree analysis (CDA) can then be conducted on A and B: C_AB = 1 when the relation of A and B is inclusion. Let X_ind, X_int, and X_con be the box sets of c_ind, c_int, and c_con, respectively, and let n_ind, n_int, and n_con be the numbers of samples in X_ind, X_int, and X_con. When a sample x is classified into c_ind, the risk value R(c_ind|x) is built from the sign function

sign(t) = 1 if t > 0, and sign(t) = 0 if t = 0

applied to the IoU scores between x and the other boxes: when x is assigned to c_ind correctly, x does not intersect any other bounding box and R(c_ind|x) = 0. When x is classified into c_int, R(c_int|x) = 0 when there is an intersection relation between x and some bounding box x_j in X_int, namely, x ∈ X_int; conversely, R(c_int|x) ∈ {0.5, 1}. When x is classified into c_con, R(c_con|x) = 0 when there is an inclusion relation between x and some bounding box x_j in X_con, namely, x ∈ X_con; on the contrary, 0 < R(c_con|x) < 1. So far, according to the IoU score and the coincidence score, we have completed the category determination of independent, intersecting, and containing bounding boxes by minimizing the risk value R(c|x).
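The IoU and coincidence scores can be sketched together as follows; the box representation here is illustrative, assuming boxes given by upper-left and lower-right corners, and the intersection-over-minimum-area form of C_AB, which equals 1 exactly under inclusion.

```python
class Box:
    """Axis-aligned box given by its upper-left and lower-right corners."""
    def __init__(self, x1, y1, x2, y2):
        self.x1, self.y1, self.x2, self.y2 = x1, y1, x2, y2

    @property
    def area(self):
        return (self.x2 - self.x1) * (self.y2 - self.y1)

def iou_and_coincidence(a, b):
    """Return (IoU, C_AB); C_AB == 1.0 exactly when one box contains the other."""
    iou_w = min(a.x2, b.x2) - max(a.x1, b.x1)
    iou_h = min(a.y2, b.y2) - max(a.y1, b.y1)
    if iou_w <= 0 or iou_h <= 0:        # the boxes do not intersect at all
        return 0.0, 0.0
    inter = iou_w * iou_h
    iou = inter / (a.area + b.area - inter)
    c = inter / min(a.area, b.area)     # coincidence score
    return iou, c
```

For A = Box(0, 0, 10, 10) and an enclosed B = Box(2, 2, 5, 5), the coincidence score is 1 while the IoU stays well below 1, which is what lets the CDA separate inclusion from mere intersection.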
After obtaining the category of each box through the MSD, we propose a filtering algorithm to eliminate the redundant candidate boxes in X_int and X_con.
3) Box Filtering Algorithm Based on MSD: Given the upper-left corner (x_lt, y_lt), width w, and height h of a bounding box, its center point (cx, cy) can be calculated as

cx = x_lt + w/2,  cy = y_lt + h/2

from which the center points (cx_A, cy_A) and (cx_B, cy_B) in Fig. 3 can be found. The horizontal distance d_x and vertical distance d_y between the two center points, namely, |cx_A − cx_B| and |cy_A − cy_B|, can then be obtained. The distance discrimination (DD) can be conducted for A and B against the critical intersection distances, where the critical distances t_x and t_y are calculated as

t_x = (w_1 + w_2)/2,  t_y = (h_1 + h_2)/2.

Two bounding boxes intersect each other if and only if d_x ≤ t_x and d_y ≤ t_y; otherwise, there is no intersection relation between them. Fig. 4 gives the flowchart of how to eliminate the extra boxes and only retain the smaller ones in Fig. 3, where the intersecting bounding boxes that do not contain a QR code should be discarded. Unlike the containing bounding boxes, for intersecting ones we cannot quickly determine whether the boxes contain real QR codes. Therefore, we design a rapid filtering approach that combines both the size and the edge features of the QR code, which has two stages, as in Fig. 4. First, all the independent bounding boxes are obtained as prior knowledge, and their size features are used to filter the intersecting ones preliminarily. Then, the edge feature is utilized to further filter the remaining boxes. When detecting multi-QR codes, their sizes should fall within a range [min_area, max_area], which can be determined from the areas of the detected independent bounding boxes. In practice, we introduce an area redundancy parameter β = 0.2 and set the size range to [(1 − β)min_area, (1 + β)max_area], aiming to prevent valid boxes from being missed or eliminated. The bounding boxes are retained when their areas fall within this range.
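The DD test and the β-relaxed area filter above can be sketched as follows; boxes are (x_lt, y_lt, w, h) tuples and the helper names are ours.

```python
def boxes_intersect(box_a, box_b):
    """Distance discrimination (DD): compare the center distances with the
    critical distances t_x = (w1 + w2) / 2 and t_y = (h1 + h2) / 2."""
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    cxa, cya = xa + wa / 2, ya + ha / 2       # center of A
    cxb, cyb = xb + wb / 2, yb + hb / 2       # center of B
    dx, dy = abs(cxa - cxb), abs(cya - cyb)
    tx, ty = (wa + wb) / 2, (ha + hb) / 2
    return dx <= tx and dy <= ty

def area_filter(boxes, min_area, max_area, beta=0.2):
    """Keep boxes whose area lies in [(1 - beta) * min_area, (1 + beta) * max_area],
    with the bounds taken from the independent boxes' areas."""
    lo, hi = (1 - beta) * min_area, (1 + beta) * max_area
    return [b for b in boxes if lo <= b[2] * b[3] <= hi]
```

With min_area = max_area = 100 from the independent boxes, a 10 × 10 candidate survives the relaxed [80, 120] range while a 1 × 1 speck and a 100 × 100 block are both discarded.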
Furthermore, for the bounding boxes that contain a QR code, another significant feature is the edge grayscale distribution, which comes from the binarization and morphological processing stage. Grayscale statistics (GSSs) are utilized to determine the edge grayscale distribution in each intersecting bounding box of the same target, in order to find the box with the maximal area proportion of the QR code.

B. Proposed Approach for Rapid Detection
This section introduces our detection algorithm for multi-QR codes using the MSD presented in the previous section. Different from approaches based on conventional image processing and the QR code's finder pattern, we eliminate false bounding boxes without splitting an image into several blocks; the quantity and size of such blocks are uncertain and may lead to incomplete QR code regions, so those approaches actually have limited adaptive ability. Inspired by DNN-based detection approaches, we obtain all bounding boxes that may contain a QR code in an image according to the QR code's dense edge information after preprocessing and contour extraction of the RGB image. Then the box filtering algorithm based on the MSD is applied to these bounding boxes, which has low computational complexity and high precision. Finally, we train a lightweight model, i.e., the Compressed MobileNet, to improve the classification precision of the bounding boxes. Fig. 5 shows the pipeline of our proposed approach, which is described in detail in the following.
1) Preprocessing: At the beginning, the multi-QR codes detector accepts high-resolution images as input. Such images are usually downsampled to reduce the computational burden of the subsequent steps. However, the number of QR codes in an image captured by an actual camera is uncertain. To make sure that as many QR codes as possible can be detected, we only carry out grayscale transformation on the RGB images instead of downsampling. In practice, both the finder pattern proportions unique to the QR code and the orthogonal distribution of the QR code's edge grayscale can become insignificant in complex environments, leading directly to detection failure. In studying the saliency features of the QR code, we find that the QR code's edge has a dense distribution of black and white blocks, which provides rich edge information for detection. Moreover, a standard QR code is surrounded by a white border to separate it from the background. This edge information and the white border are less sensitive to the environment and have robust saliency. Thus, we utilize them as the basis features to rapidly find all the regions that may contain a QR code, and mainly carry out edge feature extraction on the gray image during preprocessing.
An edge detection operator is applied to extract the QR code's edges. There are two types of common edge detection operators. One type is the first-order differential operators, for instance, Roberts, Sobel, and Scharr. The other is the second-order differential operators, such as the Laplacian and LoG/Marr. The Roberts operator is conducive to detecting steep edges but, like the Laplacian operator, is sensitive to noise, which leads to inaccurate localization. The LoG/Marr operator consists of Gaussian smoothing and a Laplacian filter, which requires more computation. The Sobel and Scharr operators can both detect edges accurately, but the sensitivity of the Sobel operator differs across directions, whereas the Scharr operator calculates grayscale gradient changes more accurately via a weighted neighborhood. Thus, we use the Scharr operator to process the gray image and obtain the gradient image. Furthermore, we binarize the gradient image to enhance the visual quality of the edge features. Since the gradient image, whose grayscale distribution ranges over [0, 255], does not have a complex background, we utilize OTSU [28] to obtain the binary image. Finally, we erode the binary image to eliminate slight noise and then dilate it several times. The intermediate results of the preprocessing stage are shown in Fig. 6.
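As a self-contained sketch of the gradient step (in practice, cv2.Scharr would be used on the full image), the Scharr kernels and a naive convolution reproduce the weighted-neighborhood edge response on a toy step image.

```python
import numpy as np

# Scharr derivative kernels: the 3/10/3 weighting of the neighborhood.
SCHARR_X = np.array([[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]], dtype=np.float64)
SCHARR_Y = SCHARR_X.T

def convolve2d(img, kernel):
    """Naive 'valid' 2-D correlation; fine for a small demonstration."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def scharr_gradient(gray):
    """L1 gradient magnitude combining horizontal and vertical responses."""
    gx = convolve2d(gray.astype(np.float64), SCHARR_X)
    gy = convolve2d(gray.astype(np.float64), SCHARR_Y)
    return np.abs(gx) + np.abs(gy)

# A vertical step edge: the response concentrates at the transition columns.
img = np.zeros((8, 8))
img[:, 4:] = 255.0
grad = scharr_gradient(img)
```

Flat regions yield zero response while the columns spanning the step fire strongly, which is exactly the dense black-and-white-block edge signal the preprocessing stage relies on.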
2) Detection of QR Code Regions: Since the QR code's edge feature cannot completely distinguish QR codes from the background, this section captures their bounding boxes through contour extraction and further filtering. After applying contour extraction [49] to the bounding regions that contain the QR code's edge feature, we represent them as Ω_0 = {r_i | i = 0, …, n}, where r_i denotes the ith bounding region and n is their count. Most bounding regions in Ω_0 are negative samples, which contain only background edges and not the QR code's edge feature. To obtain the positive ones, inspired by the characteristic that a QR code's bounding shape is close to a square, we define the square degree SD of a bounding shape as

SD = min(w, h) / max(w, h)

where w and h are the bounding shape's width and height.
We eliminate bounding shapes that are not square-like by setting an SD-based filter condition, which we call the square degree selection method (SDSM); SDSM is summarized in Algorithm 1.

Algorithm 1: SDSM
1: for each bounding box rect do
2:    Calculate rect's SD according to Eq. (14)
3:    if SD ≥ t then
4:       Keep rect and add it to out
5:    end if
6: end for

SDSM filters out the bounding boxes that contain only background, but negative samples with abnormally large or small areas remain. Thus, we first calculate the areas of all bounding boxes retained by SDSM, determine the critical area size via statistical analysis of the area-size histogram, and finally use it as a segmentation threshold to further eliminate negative samples. These steps, which we call the area statistics selection method (ASSM), are shown in Algorithm 2.

Algorithm 2: ASSM
1: for each rect retained by SDSM do
2:    Calculate rect's area size S
3:    if S > t then
4:       Keep rect and add it to out
5:    end if
6: end for

Fig. 7 shows the QR codes' bounding boxes after the preliminary SDSM and ASSM processing. Most negative samples are filtered out after ASSM, but several bounding boxes that overlap each other are still retained. These boxes also overlap positive samples, which makes them hard to eliminate; this phenomenon also exists in other multi-QR codes detection approaches. In our study, we use the MSD-based process to weed out these hard-to-eliminate samples.
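SDSM and ASSM can be sketched in a few lines. The SD form min(w, h)/max(w, h), the SD threshold, and the histogram-derived critical area below are placeholder assumptions, not the paper's tuned values.

```python
import numpy as np

def square_degree(w, h):
    """SD in (0, 1]; equals 1 for a perfect square (assumed form of Eq. (14))."""
    return min(w, h) / max(w, h)

def sdsm(boxes, t_sd=0.8):
    """Algorithm 1 sketch: keep near-square boxes; boxes are (x, y, w, h) tuples."""
    return [b for b in boxes if square_degree(b[2], b[3]) >= t_sd]

def assm(boxes, t_area=None):
    """Algorithm 2 sketch: keep boxes above a critical area. The paper derives
    the threshold from the area-size histogram; half the median is a stand-in."""
    areas = np.array([w * h for (_, _, w, h) in boxes])
    if t_area is None:
        t_area = 0.5 * float(np.median(areas))
    return [b for b, s in zip(boxes, areas) if s > t_area]
```

Running the two stages in sequence first drops elongated strips (low SD) and then prunes boxes whose areas fall far below the typical QR code region.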

3) Bounding Boxes Filtering Based on MSD:
We rapidly divide the previous stage's overlapping bounding boxes into independent boxes, intersecting ones, and containing ones through the MSD. These independent bounding boxes, used as prior knowledge, are combined with the QR code's edge feature to process the intersecting and containing bounding boxes, as shown in Fig. 4. Finally, the multi-QR codes and their bounding boxes are detected rapidly through Preprocessing, Detection of QR Code Regions, and Bounding Boxes Filtering Based on MSD. However, the bounding box localization accuracy is still low, so we introduce a lightweight DNN-based box selection to solve this problem. Unlike the MSD, which needs to classify a great number of overlapping bounding boxes rapidly, the inputs to this lightweight DNN model are nonoverlapping bounding regions that may contain QR codes. What is more, compared with other DNN models for multi-QR codes detection, the lightweight DNN model, which we call the Compressed MobileNet, has a simple structure that requires only a small computational overhead, while most of its inputs are positive samples.

4) Boxes Classification Based on Compressed MobileNet:
A deep convolutional network has the feature extraction ability to map an RGB image into a convolutional latent space and obtain the target's higher-layer features. Therefore, many object detection models use a deep convolutional network as the backbone, which can merge contextual features and capture the deep representation of the inputs. However, this increases the model's complexity and requires more training cost to update the parameters. The QR code itself consists of many obvious and abundant black and white blocks, a feature that helps differentiate the QR code from other objects. Thus, we only need to train a lightweight model that can learn this feature, which not only reduces the network parameters and computational complexity but also greatly improves the runtime speed. By doing this, it is unnecessary to update a large number of parameters as in other common object detection models.
In our work, we introduce a modifiable width multiplier and a modifiable resolution multiplier to compress a small trained MobileNet model, i.e., the Compressed MobileNet. The role of the modifiable width multiplier α ∈ (0, 1] is to thin the model uniformly at each layer: the number of a given layer's input channels M becomes αM, and the number of its output channels N becomes αN. α is tuned through a series of experiments during training. The modifiable resolution multiplier ρ ∈ (0, 1] reduces the input and the internal representation at each layer, which lowers the computational cost by a factor of ρ². ρ is tuned in a manner similar to α. The Compressed MobileNet is used for category discrimination on the outputs of the Bounding Boxes Filtering Based on MSD stage. The related experiment is conducted and discussed in Section IV-A.
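The effect of α and ρ on a MobileNet-style depthwise-separable layer can be illustrated by counting mult-adds; the layer dimensions below are arbitrary examples, not the Compressed MobileNet's actual configuration.

```python
def depthwise_separable_cost(m, n, dk, df, alpha=1.0, rho=1.0):
    """Mult-adds of one depthwise-separable layer: M input channels, N output
    channels, a DK x DK kernel, and a DF x DF feature map, thinned by the
    width multiplier alpha and the resolution multiplier rho."""
    m, n = int(alpha * m), int(alpha * n)
    df = int(rho * df)
    depthwise = dk * dk * m * df * df   # one spatial filter per input channel
    pointwise = m * n * df * df         # 1 x 1 channel-combining convolutions
    return depthwise + pointwise

base  = depthwise_separable_cost(32, 64, 3, 56)              # uncompressed layer
thin  = depthwise_separable_cost(32, 64, 3, 56, alpha=0.5)   # narrower channels
small = depthwise_separable_cost(32, 64, 3, 56, rho=0.5)     # smaller feature map
```

Halving ρ reduces the cost by exactly ρ² = 1/4, while halving α cuts the dominant pointwise term by roughly α², which is where most of the Compressed MobileNet's parameter savings come from.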

IV. EXPERIMENT RESULTS AND ANALYSIS
This section compares our approach with other approaches and analyzes the results. In Section IV-A, we compare the performance of the learning-based models. The time cost and computational overhead of our approach are analyzed in detail in Section IV-B. Section IV-C shows the quantitative results of the multi-QR codes detection approaches.
Our experiments are conducted on a server equipped with an Intel Core i7-5930k 3.50-GHz CPU and an NVIDIA RTX1080Ti GPU. The server runs Ubuntu 20.04 and has 32 GB of memory. We use the medical test tube images in [17] as our data set. The image resolution is 3096 × 4048, and each image may contain 2, 9, 40, 80, 120, or 160 QR codes. Each QR code region's area accounts for 0.25%-10% of the image. Some examples from our data set are shown in Fig. 8.

A. Quantitative Comparison for Lightweight Models
This section compares the classification performance of lightweight models. We first obtain images from the Preprocessing and Detection of QR Code Regions stages. These images may contain background or a QR code and were split into two groups manually: one group contains 30 000 positive samples, and the other has 17 000 negative ones. We use these images as a subdataset to train all the lightweight models and to evaluate their performance. Before training, we randomly divided the subdataset into training and test sets at a 9:1 ratio. Fig. 9 shows some examples from our subdataset.
We first train a small MobileNet model, while adjusting the modifiable width multiplier α and the modifiable resolution multiplier ρ step by step to compress it. By applying α and ρ, our Compressed MobileNet accepts 32 × 32 RGB images as input. Its parameter count was reduced to 0.26 million, which is about 1/10 of the standard MobileNet's. What is more, to enhance our model's robustness, we extended the training samples via data augmentation and increased the number of training epochs. The learning rate is set to 10⁻³, and we employed a variety of tricks to obtain the best-performing model during training. Our model converged approximately between the 45th and 50th iterations. We used this model exclusively for validation in the succeeding trials. Table III shows the accuracy comparison of our approach. In this comparison, SqueezeNet, ShuffleNet, MobileNetV1, MobileNetV2, and MobileNetV3 all adopt their standard specifications.
As seen in Table III, the accuracy of all lightweight models exceeds 95% in our binary classification task, and our Compressed MobileNet has the fewest parameters, i.e., less than 0.3 million, compared with over 1 million in the other models. Although MobileNetV3 achieves a classification accuracy of 98.1%, it has more than 5 million parameters and is thus difficult to implement on IoT devices. SqueezeNet and ShuffleNet have small numbers of parameters, yet their classification accuracies are only 96.6% and 95.7%, respectively. The Compressed MobileNet achieves a comparable classification accuracy of up to 97.6% with far fewer parameters. By training a small lightweight model, we can not only achieve high classification accuracy but also ensure embedded implementation on edge devices for real-time multi-QR codes detection.

B. Performance Analysis of the Proposed Approach
In this section, we present the implementation details and relevant experimental results of the proposed approach, using detection accuracy and time cost to evaluate the detection performance on multi-QR codes. The detection accuracy consists of two metrics: detection precision (DP), the detection accuracy for a single image, and average detection precision (ADP_c), the average detection accuracy over images containing a specific number c of QR codes. DP and ADP_c in our work are defined as

DP = (n_d / c) × 100%,    ADP_c = (1/N_c) Σ_{i=1}^{N_c} DP_i,

where n_d is the number of correctly detected QR codes in an image containing c codes and N_c is the number of test images with c codes. For all images with the number of QR codes c, n_r^c in (18) represents the number of images in which the number of generated bounding boxes is not less than c:

MBR_c = n_r^c / N_c. (18)

The MBR_c reflects the ability of the approach to generate redundant bounding boxes. In (19), the MMBR represents the mean of MBR_c over the test data set:

MMBR = (1/|C|) Σ_{c∈C} MBR_c. (19)

To compare the detection effect of different approaches, we divided the test data set into six groups, each containing 50 images. Group_c (c = 2, 9, 40, 80, 120, 160) indicates that each image in this group contains c QR codes. Table IV shows the detection accuracy of the different groups. It can be seen that the per-image DP exceeds 97% in Group_40, Group_80, Group_120, and Group_160, and most of the ADP_c values remain at 99.8%. However, the DP drops slightly on Group_2 and Group_9, falling as low as 50% on Group_2. The reason is that each image in Group_2 contains only two QR codes: if one QR code is missed during detection, DP decreases sharply. For the few cases of missed detection shown in Fig. 10, the failures are caused by extreme blur and extreme noise.
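Based on the metric descriptions above (DP per image, ADP_c as the group average, and MBR_c as the fraction of images generating at least c bounding boxes), the evaluation can be sketched as follows; the function name and the per-image tuple format are illustrative assumptions.

```python
def detection_metrics(results, c):
    """results: list of (n_detected, n_boxes) tuples, one per image
    in Group_c. Returns (ADP_c, MBR_c)."""
    # DP for each image: correctly detected codes over the c codes present
    dps = [n_det / c for n_det, _ in results]
    adp_c = sum(dps) / len(dps)
    # MBR_c: fraction of images generating at least c bounding boxes
    n_r = sum(1 for _, n_boxes in results if n_boxes >= c)
    mbr_c = n_r / len(results)
    return adp_c, mbr_c

# toy Group_2 example: the second image misses one of its two QR codes
group2 = [(2, 2), (1, 2), (2, 3)]
adp, mbr = detection_metrics(group2, c=2)
print(adp, mbr)  # → 0.8333333333333334 1.0
```

The toy example mirrors the Group_2 behavior noted above: missing a single code in a two-code image drops that image's DP to 50%, pulling the group average down sharply.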
Table VI compares the running time, in seconds, of several key modules in our approach, where our approach clearly has a fast processing speed. It requires about 0.6 s to accurately detect all 160 QR codes within an image, and more time is needed if an image contains more QR codes. In Fig. 11, we further visualize the running time of each module as a histogram, which clearly shows that the Compressed MobileNet takes about 2/3 of the total time. Regarding the mean square error of each stage's detection time across the six groups, the error is 0.5139 for the overall detection time, yet it reduces to 0.3160 for the Compressed MobileNet. This indicates that the detection stability of our approach largely benefits from the Compressed MobileNet. In fact, the number of parameters in the Compressed MobileNet can be further reduced by adjusting the modifiable width multiplier and the resolution multiplier. In addition, through MSD-based Bounding Boxes Filtering, most of the inputs to the Compressed MobileNet are positive samples, which further reduces the running time spent classifying negative samples and improves the detection efficiency.

C. Quantitative Comparison for Multi-QR Codes Detection Approaches
In this section, we use three kinds of metrics, i.e., ADP, MBR, and time cost, to quantitatively compare the performance of our approach with other up-to-date approaches. The test data set is used in the same way as in the previous section. Tables VII-IX show three representative results, corresponding to Group_2, Group_80, and Group_160, respectively. We applied the same training data set to all the learning-based approaches in this section. The images in the training data set are not repeated in the test data set, and each of them contains between 2 and 160 QR codes. Moreover, we did not stop training the network models until the training loss converged, and the hyperparameters of these models were set uniformly according to their respective articles.
As seen in Table VII, for Group_2, where each image contains two QR codes, the MBR_2 of our approach is the highest among all approaches, which means that our approach can generate more redundant bounding boxes. The ADP_2 of [16] is 1% higher than that of our approach, because with so few QR codes per image, a single false detection by our approach has a larger effect on the average. In Table VIII, we can find that the ADP_80 of both our approach and [16] exceeds 99% on Group_80, where each image contains 80 QR codes, and the MBR_80 of both is also over 98%. Compared with YOLOv3, Faster R-CNN, and [13], our approach has the best detection accuracy on Group_80. With the increasing number of QR codes in an image, Table IX further shows the excellent detection performance of our approach on Group_160. On Group_160, except for our approach, which achieves over 99%, all other approaches show an apparent decline in performance, e.g., an ADP_160 of 71.45% for Faster R-CNN, with 37.21% and 34.24% for the remaining two learning-based approaches.
As visually compared in Fig. 12, where different multi-QR codes detection approaches are evaluated on different groups, it can be found that as the number of QR codes in an image increases, the ADP and MBR curves of YOLOv3, Faster R-CNN, [13], and [16] ascend gradually and subsequently decline. The reason is that these network models [8], [11], [13], [16] require manually set anchor box sizes, which limits the size of the targets they can detect. If the size of an image is fixed, the size of the targets changes as the number of QR codes in the image changes. Only when the QR codes stay within an appropriate size range can these network models achieve optimal performance. In contrast, our approach does not rely on anchor size settings and can thus maintain robust detection performance. Moreover, the more QR codes an image contains, the better the detection effect our approach achieves. In addition, it can be seen from Fig. 12(b) that the MBR of our approach is the highest in all groups of the test data set. We attribute this excellent overall effect to the fact that our approach tends to generate more bounding boxes, namely, it is more tolerant toward the redundancy of bounding boxes. As shown in Table X, YOLOv3 has a rapid detection speed, with an average detection time of 113 ms for a multi-QR codes image. However, both the MADP and MMBR of YOLOv3 are poor, which indicates that YOLOv3 performs poorly in multi-QR codes detection. In contrast, Faster R-CNN, as a two-stage detection model, has a higher MADP and MMBR than YOLOv3, but it is quite time-consuming. The approach in [13], an improved Faster R-CNN with rotation-invariant detection and key point regression for QR code detection, also belongs to the two-stage detection approaches and achieves an MMBR of 90%. However, when the QR codes are large, its MADP is poor.
The QR code detection in [16] improves on EfficientDet by adding a key point regression head network, which captures the key points of the QR codes for regression training. It achieves a relatively high MADP of up to 97.81%. Its MMBR is also close to that of our approach, but the structure of the model is complex and results in a much higher computational cost. The overall detection accuracy of our approach is the best, and it has the lowest computational cost, benefiting real-time industrial applications.

V. CONCLUSION
In this article, we have proposed a fast multi-QR codes detection approach, which consists of four stages: 1) Preprocessing; 2) Detection of QR Code Regions; 3) MSD-based Bounding Boxes Filtering; and 4) Compressed MobileNet-based Classification of the detected bounding boxes. The proposed MSD strategy for box screening has improved the accuracy of rapid classification of multi-QR codes. In addition, the Compressed MobileNet has reduced the number of parameters for efficiency while retaining high recognition accuracy. In comparative experiments, we validated the superiority of our approach on a publicly available data set. Finally, due to lower hardware-resource requirements than the existing deep-learning-based approaches, our approach is conducive to embedded implementation on edge devices to further benefit multi-QR codes detection in real-time IoT scenarios.
To further improve the accuracy and efficiency of multi-QR codes detection, especially with more lightweight local spatial representations, we will draw on deep-learning-based image segmentation approaches [50], [51] and local feature extraction approaches [52]. Also, to improve the reliability and explainability of the DL-based models, we will explore multi-QR codes detection with Explainable Artificial Intelligence [53] in trustable IoT systems, as well as test on newer versions of YOLO, such as YOLOv5.
Peixian Wang is currently pursuing the M.Eng. degree in control science and engineering with Guangdong Polytechnic Normal University, Guangzhou, China.
His research interests mainly focus on image processing and perception, as well as Internet of Things technology.
Huimin Zhao received the B.Sc. and M.Sc. degrees in signal processing from Northwestern Polytechnical University, Xi'an, China, in 1992 and 1997, respectively, and the Ph.D. degree in electrical engineering from Sun Yat-sen University, Guangzhou, China, in 2001.
He is currently a Professor with Guangdong Polytechnic Normal University, Guangzhou. His research interest includes image, video, and information security technology.
Xu Lu received the M.S. and Ph.D. degrees in control theory and control engineering from Guangdong University of Technology, Guangzhou, China, in 2009 and 2015, respectively.
He is currently a Professor with the School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou. His research interests include artificial intelligence and Internet of Things.