A Survey on Network Intrusion Detection using Convolutional Neural Network

Artificial Intelligence (AI) and the research dedicated to it are gaining considerable attention worldwide. Although the growth of AI technology is perceived as a positive development for industry, it also puts several areas at risk. One of these is security, especially network security. An Intrusion Detection System (IDS), which provides real-time network security, has been recognized as one of the most effective security solutions. There are various Neural Network (NN) approaches for IDS, such as ANN, DNN, CNN, and RNN. This survey focuses on the CNN approach, whether used on its own or together with another technique. It analyses 81 articles that were selected against specific criteria. Among them, 28 hybrid approaches that combine CNN with another technique were identified. The survey also identified 21 evaluation metrics used to validate the models, as well as 12 datasets.


INTRODUCTION
With the rapid evolution of the Internet and communication technologies, connectivity has become a crucial aspect of almost every part of our lives. This has significantly increased the amount of data being generated and handled, which in turn has led us to the era of "big data" [1]. Consequently, protecting this data and the connections that carry it has become a challenging task, since any corruption or lack of security during data transmission may lead to serious problems for individuals and organizations. Moreover, the variety of attack methods and the complexity of network systems have increased the difficulty of this task [2]. Researchers are therefore investigating techniques and methods that could secure continuous connectivity. An Intrusion Detection System (IDS) is one of the ideal solutions [3].
There are two main classes of IDS. The first is the Network-Based Intrusion Detection System (NIDS), which monitors the network traffic and alerts the network administrator whenever an attack is detected. The second is the Host-Based Intrusion Detection System (HIDS), which scans each host device independently (not the network) and alerts the host when a suspicious packet is detected [4]. This research is primarily concerned with NIDS, which is divided into two main methods: misuse detection and anomaly detection. A misuse detection system must be pre-equipped with a set of attack signatures in order to detect them. Hence, it is not able to detect unknown attacks [5]. An anomaly detection system, on the other hand, operates on normal usage patterns, which allows it to detect unknown attacks. Nonetheless, because multiple normal usage patterns have to be defined, the anomaly detection system produces a high rate of false alarms. Using a technique that can learn by itself, such as a Deep Learning (DL) model, would therefore enhance the anomaly detection system's ability to determine the normal usage patterns. This would also help reduce false alarms [6].  [5] illustrates the machine learning algorithms, split into shallow and deep learning. CNN is a supervised deep-learning algorithm alongside other neural network types. It was used for the first time in intrusion detection by R. Upadhyay and D. Pantiukhin in 2017 [7]. Starting from that article and continuing until July 2021, this survey lists 81 articles that utilize CNN for IDS, whether alone or combined with another shallow or DL technique. To the best of our knowledge, no prior survey has addressed CNN specifically among the DL techniques.

RELATED WORK
Several surveys have discussed intrusion detection using DL or shallow techniques, as shown in Fig. 1. There are more than 10 techniques, and many of them can be combined with other methods to form a hybrid model. The authors in [8] discussed DL for cyber security intrusion detection. They reviewed 45 papers in total, 7 of which used CNN; however, only 3 of those were specifically for network intrusion detection. S. Gamage and J.

METHODOLOGY
Based on [15], the methodology of this survey is composed of three main stages which are planning, conducting, and reporting. The following subsections present the detailed implementation of this methodology.

Research Questions
To achieve our goal and analyze the articles successfully, we identified 3 research questions:
RQ1: What are the hybrid techniques used in combination with CNN?
RQ2: What are the datasets used to evaluate the model?
RQ3: What are the evaluation metrics that are used to validate the technique?

Study Selection
This survey involves a comparative analysis of the related articles. The initial search across the digital libraries returned 474 articles. We then started the filtration process by removing the duplicates. After that, we removed the unrelated articles by applying our inclusion and exclusion criteria.
The inclusion rules are:
1) Include the articles that use CNN for NIDS, even if it is combined with another method.
2) Include the articles published between Jan 2016 and July 2021.
The exclusion rules are:
1) Exclude the articles that use CNN for IDS in other systems such as surveillance, Internet of Vehicles (IoV), human/animal detection, or any other irrelevant application.
2) Exclude the articles that involve IDS in IoT environments.
3) Exclude any article that is not classified as peer-reviewed.
After this filtration process, 108 articles remained for the quality assessment phase.
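The filtration steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `Article` record, its fields (including the `domain` label), and the title-based duplicate check are hypothetical, since the survey does not describe how the articles were catalogued or how duplicates were matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    # Hypothetical record layout; not taken from the survey.
    title: str
    year: int
    month: int
    uses_cnn: bool
    domain: str          # assumed labels, e.g. "nids", "iot", "surveillance"
    peer_reviewed: bool


def deduplicate(articles):
    """Keep the first occurrence of each title (case-insensitive)."""
    seen = set()
    unique = []
    for article in articles:
        key = article.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique


def is_included(article):
    """Apply the inclusion and exclusion rules to one article."""
    # Inclusion rule 2: published between Jan 2016 and July 2021.
    in_period = (2016, 1) <= (article.year, article.month) <= (2021, 7)
    # Inclusion rule 1 and exclusion rules 1-2: CNN used for NIDS only.
    cnn_for_nids = article.uses_cnn and article.domain == "nids"
    # Exclusion rule 3: peer-reviewed articles only.
    return cnn_for_nids and in_period and article.peer_reviewed


def select(articles):
    """Remove duplicates first, then filter by the selection criteria."""
    return [a for a in deduplicate(articles) if is_included(a)]
```

Calling `select` on the raw search results would return the candidates for the quality assessment phase; in the survey this step reduced 474 articles to 108.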

Quality Assessment Rules (QAR)
To classify the quality of the articles, we selected 10 questions, each of which is answered with a score of 1 for "Excellent", 0.5 for "Good", or 0 for "Not explained":
Q1: Is the research problem clearly described?
Q2: Is the IDS idea clearly pinpointed?
Q3: Have the authors discussed related work?
Q4: Is the used dataset clearly defined?
Q5: Is the design of the proposed algorithm/architecture clearly explained?
Q6: Is the algorithm/architecture presented precisely in figures and graphs?
Q7: Are reasons/justifications given for the selected parameters in the model?
Q8: Does the research cover accurate evaluation parameters?
Q9: Are there comparisons with other algorithms/models in terms of result accuracy?
Q10: Is the future work pinpointed?
Based on these assessment questions, the 108 articles were evaluated to ensure sufficient quality. Articles with a total score of 5 or above were selected for the next stage. Hence, a total of 81 articles were chosen for the data extraction stage.
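The scoring rule can be expressed in a few lines. The sketch below assumes that an article's total is the plain sum of its ten per-question scores, which is how we read the threshold of 5; the function names are illustrative and do not come from the survey.

```python
VALID_SCORES = {0.0, 0.5, 1.0}   # "Not explained", "Good", "Excellent"
NUM_QUESTIONS = 10               # Q1 to Q10
THRESHOLD = 5.0                  # minimum total score to be selected


def quality_score(answers):
    """Sum the per-question scores after validating them."""
    if len(answers) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} answers, got {len(answers)}")
    for score in answers:
        if score not in VALID_SCORES:
            raise ValueError(f"invalid score: {score}")
    return sum(answers)


def passes_quality_assessment(answers):
    """An article is kept when its total score is 5 or above."""
    return quality_score(answers) >= THRESHOLD
```

For example, an article rated "Excellent" on four questions and "Good" on two scores 4 + 1 = 5 and is retained, while one rated "Good" on nine questions scores 4.5 and is dropped.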

Extract and Synthesize Data
The objective of this stage is to extract the required information to answer our research questions. We used methods obtained from [15] to present the collected information for answering the RQs. For all RQs, narrative synthesis was used. Moreover, the data was presented using bar charts and tables.

RQ1: What are the hybrid techniques used in combination with CNN?
The main objective of this question is to look for hybrid models. Therefore, articles that only improved CNN itself, by adding specific layers or changing the activation function, are not discussed. Overall, we identified 28 techniques that were combined with CNN. Table II summarizes which articles used the hybrid models.

RQ2: What are the datasets used to evaluate the model?
All the algorithms were tested on at least one IDS dataset to evaluate the proposed model. As presented in Fig. 3, 12 different datasets were used across the 81 articles. These datasets are classified into three categories as follows [96]:
1. Virtualized: This type of dataset is developed artificially to perform a specific task, as most of its features are virtual or abstract, such as the DARPA dataset [97].
2. Synthesized: This type of dataset is developed to meet particular conditions which might not be available in realistic datasets. Accordingly, it is very beneficial because realistic datasets face privacy concerns.
3. Realistic: This type of dataset is collected from real-world traffic, and can be classified as private or public.

Fig. 3. Utilized datasets
Some articles, such as P4, P17, P34, P40, P43, P44, P52, P80, and P81, used more than one dataset to test their algorithms. Therefore, the total number of experiments is 101, which is more than the number of articles. The most used dataset is NSL-KDD with 32.67% of the total experiments, followed by KDD Cup 1999 with 23.76%. Both are derived from DARPA, and together they represent 56.4% of the total experiments.
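The reported shares can be checked with simple arithmetic. The survey states only the percentages and the total of 101 experiments; the per-dataset counts used below (33 and 24) are inferred here as the only whole numbers that reproduce those percentages, and are not stated in the survey itself.

```python
TOTAL_EXPERIMENTS = 101  # stated in the survey

# Inferred counts (assumption): not reported directly in the survey.
inferred_counts = {
    "NSL-KDD": 33,
    "KDD Cup 1999": 24,
}


def share(count, total=TOTAL_EXPERIMENTS):
    """Percentage of experiments that used a dataset, to two decimals."""
    return round(100 * count / total, 2)


nsl_kdd_share = share(inferred_counts["NSL-KDD"])
kdd99_share = share(inferred_counts["KDD Cup 1999"])
combined_share = share(sum(inferred_counts.values()))
```

Under this assumption the two DARPA-derived datasets account for 57 of the 101 experiments, which is consistent with the combined figure of about 56.4%.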

RQ3: What are the evaluation metrics that are used to validate the technique?
All the models must be validated using specific measurements, which demonstrate how well the model performed. Each of these measurements is derived from a set of underlying quantities. Since most of the articles used more than one metric, the total number of metric uses is 302. The most used metric is accuracy, which was utilized in 92.59% of the articles, followed by TPR and FPR with 75.3% and 49.38% respectively.
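For reference, the three most used metrics can be computed as follows. The survey does not spell out the formulas, so this sketch assumes the standard confusion-matrix definitions in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), with an attack treated as the positive class.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples")
    return (tp + tn) / total


def true_positive_rate(tp, fn):
    """TPR (recall, detection rate): attacks correctly flagged."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("no positive samples")
    return tp / positives


def false_positive_rate(fp, tn):
    """FPR (false alarm rate): normal traffic wrongly flagged."""
    negatives = fp + tn
    if negatives == 0:
        raise ValueError("no negative samples")
    return fp / negatives
```

As a made-up example, a model that flags 90 of 100 attacks and raises an alarm on 5 of 100 normal flows has TP = 90, FN = 10, FP = 5, TN = 95, giving an accuracy of 0.925, a TPR of 0.9, and an FPR of 0.05. The example also shows why accuracy alone can be insufficient: it does not reveal how the errors split between missed attacks and false alarms.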

CONCLUSIONS AND RECOMMENDATIONS
We performed this SLR to explore NIDS using CNN specifically, whether alone or combined with another technique. We manually examined the initial 474 articles and retained only 81 relevant articles after applying our selection criteria. This work provides a closer look at employing CNN for NIDS, which helps guide researchers towards utilizing CNN to obtain better results. The conclusion could be summarized as follows:   Based on the results of this SLR, almost half of the selected articles relied only on CNN. We therefore recommend using more hybrid models for NIDS, which opens opportunities for better outcomes not only in terms of detection efficiency but also in model performance. The hybrid models have shown a remarkable boost in model efficiency and performance. Furthermore, many articles used only one or two evaluation metrics, which might not be enough to evaluate the model. Additionally, the most used dataset was NSL-KDD followed by KDD Cup 99, both of which are relatively old; it is recommended to use more up-to-date datasets that reflect real network traffic. This will help the model produce better results when it is applied in a real-world network.
Regarding future work, it would benefit researchers to add the attack types that each model can detect, as well as the data pre-processing technique used in each article. Besides that, we might also consider adding models that use CNN for IDS in other applications such as industrial control systems, surveillance systems, and database IDS. Nonetheless, these applications will most likely use different types of datasets, pre-processing techniques, and evaluation metrics.
[9] S. Gamage and J. Samarabandu, "Deep learning methods in network intrusion detection: A survey and an objective comparison," J.