A novel approach to ensembling MLP and Random Forest models for network security

This paper presents a novel approach to Network Intrusion Detection Systems (NIDS) using machine learning and deep learning. The approach uses two MLP (Multi-Layer Perceptron) models, one with 3 layers and the other with 6 layers, together with a Random Forest classifier. These models are ensembled in a way that boosts the final accuracy while also reducing testing time. Researchers have implemented various ways to ensemble multiple models; here, we use a contradiction-management concept. Under contradiction management, if two machine learning models contradict each other in their decisions (in our case, the 3-layer MLP and the Random Forest), the decision of a third model with higher accuracy (the 6-layer MLP) is taken instead. The third model is used only when the first two contradict, because its more complex architecture gives it a higher testing time than the other two models. This approach increases the final accuracy through ensembling while reducing overall testing time. The novelty of this paper lies in the choice and combination of the models for the purpose of network security.


INTRODUCTION
Today, use of the Internet has increased drastically, and in the future nearly every person on Earth will be part of a network. Many Internet users are not aware of hacking techniques, so there is a high probability that a black-hat hacker intrudes into a personal network of devices. Many companies use cloud technology for their system operations; if an unauthorized intruder gains access to a company's private data, the intruder could exploit that data for financial gain. A Network Intrusion Detection System (NIDS) is therefore a very important part of a network. A NIDS has to be deployed alongside the firewall, because a firewall blocks a particular website only if that website is blacklisted in its database, whereas a NIDS works on the packet data: it traces packets and uses an anomaly detector to detect the intruder.
In this approach, we use machine learning models to detect the intruder. The decisions of multiple machine learning models are considered, and the final decision is made using the proposed approach. A packet is captured by the packet capturer and then tested on the trained machine learning models. If the decision is positive, i.e. the packet is an anomaly, the system administrator is alerted about the packet. This approach also decreases the packet waiting time, i.e. the time for which a packet is held while being tested by the NIDS.
The main goals of this paper are to boost the accuracy of a NIDS using multiple machine learning models and to reduce the packet waiting time. We used the CIC (Canadian Institute for Cybersecurity) dataset to train the models; the packet analyzer used is CICFlowMeter.

RELATED WORK
Yong Zhang et al. [1] introduced a new network intrusion detection model, the deep hierarchical network, which combines LeNet-5 and LSTM neural networks.
Meng Wang et al. [2] proposed a method to choose optimal features for detecting DDoS attacks using sequential feature selection with an MLP.
Yi Yi Aung et al. [3] explained the method of applying Random Forest to a Network Intrusion Detection System.
Prachi Barapatre et al. [4] used an MLP with the backpropagation algorithm to classify attacks on the KDD99 dataset. They analysed the working of the MLP model and studied its advantages and disadvantages; the main goal was to decrease false alarms. Dimitra Chamou et al. [5] used deep neural networks for the classification of DDoS and malware attacks, with a 5-layer fully connected network.
Vinayakumar et al. [6] explained the working of Convolutional Neural Networks (CNNs) in Network Intrusion Detection Systems, applying the data to several models with varying numbers of layers.

Dataset Introduction: -
We trained our algorithm on the CIC (Canadian Institute for Cybersecurity) datasets covering 5 days of traffic. The network data was captured using CICFlowMeter, a packet capturer developed by CIC. Compared to the NSL-KDD dataset, this dataset contains more up-to-date attacks; it was captured by CIC in 2017.
The dataset is built on the abstract behaviour of 25 users across the HTTP, HTTPS, FTP, SSH, and email protocols. Capture started at 9 a.m. on Monday, July 3rd, 2017, and ended at 5 p.m. on Friday, July 7th, 2017. Monday contained only normal traffic, so we did not use it for training; we used the data from the other 4 days. The attacks carried out were: Tuesday, SSH-Patator and FTP-Patator; Wednesday, DoS/DDoS; Thursday, Infiltration; Friday, PortScan, DDoS, and Bot attacks. For more information about the dataset and the packet capturer, refer to [13].
Each day's dataset contains 79 features including the label column, with over 50,000 non-null samples; the data types are int, float, and object. The final dataset consists of packets from all 4 days, with the labels Anomaly and Normal, so it can be used directly for an anomaly detector.

Feature Engineering: -
In feature engineering, we select the important features responsible for classification, because features that are irrelevant to the output label decrease accuracy and should be dropped from the dataset. To measure the relevance of each feature, we calculate its correlation with the output label; features with low correlation are dropped. The feature importances for the CIC dataset are shown in Fig. 1 (the top 25 are shown, but the top 58 were used for training).
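The correlation-based filtering step can be sketched as follows. This is a minimal illustration on synthetic data: the feature names and the 0.1 correlation threshold are assumptions for demonstration, not values from the paper.

```python
# Correlation-based feature selection sketch (synthetic data for illustration).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
label = rng.integers(0, 2, n)            # 0 = Normal, 1 = Anomaly
df = pd.DataFrame({
    "Flow Duration": label * 50 + rng.normal(0, 10, n),  # strongly correlated
    "Fwd Packets":   label * 20 + rng.normal(0, 5, n),   # strongly correlated
    "Noise Feature": rng.normal(0, 1, n),                # irrelevant
    "Label": label,
})

# Absolute correlation of every feature with the output label
corr = df.corr()["Label"].drop("Label").abs()

# Keep only features whose correlation exceeds the (assumed) threshold
keep = corr[corr >= 0.1].index.tolist()
print(sorted(keep))
```

In practice the same filter would be applied to all 79 CIC columns, yielding the 58 features retained for training.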
After feature reduction (dimensionality reduction), the data is split into training, testing, and validation sets in the ratio 80% training, 10% testing, and 10% validation. The data is then label encoded: label encoding converts features containing character-type data into categorical (numeric) data. Finally, the data is normalized to prevent bias and reshaped into an array that can be fed to the neural networks.

Model Building: -
The CIC dataset is used to train three models: a 3-layer MLP (Multi-Layer Perceptron), a 6-layer MLP, and a Random Forest. After training, each packet is first sent to the Random Forest and the 3-layer MLP for testing, and their two decisions are compared. If they contradict, that packet is sent to the 6-layer MLP. The 6-layer MLP has higher accuracy than the Random Forest and the 3-layer MLP, but also a higher testing time than both. Testing every packet on the 6-layer MLP would therefore increase the packet waiting time, which is undesirable. There is no need to test a packet on the 6-layer MLP when the 3-layer MLP and the Random Forest give the same decision, because agreement between the two models indicates a high probability that the decision is correct.
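The routing logic above can be sketched as a small function. The toy threshold classifiers here are illustrative stand-ins for the trained models; any classifier with a scikit-learn-style `.predict()` would work.

```python
# Sketch of the contradiction-management ensemble described above.
import numpy as np

def ensemble_predict(fast_a, fast_b, slow, X):
    """Route every packet through the two fast models; escalate only the
    samples on which they contradict to the slower, more accurate model."""
    p1 = fast_a.predict(X)
    p2 = fast_b.predict(X)
    out = p1.copy()
    disagree = p1 != p2                    # contradiction mask
    if disagree.any():
        out[disagree] = slow.predict(X[disagree])
    return out

class Thresh:
    """Toy stand-in classifier: label 1 if the first feature exceeds t."""
    def __init__(self, t): self.t = t
    def predict(self, X): return (X[:, 0] > self.t).astype(int)

X = np.array([[0.2], [0.5], [0.9]])
fast_a, fast_b, slow = Thresh(0.4), Thresh(0.6), Thresh(0.45)
print(ensemble_predict(fast_a, fast_b, slow, X))   # -> [0 1 1]
```

Only the middle sample (where the two fast models disagree) is ever passed to the slow model, which is exactly what keeps the average testing time low.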
If all packets were tested on the 6-layer MLP, the testing time would be high; in our approach, we found that only a few samples (<5%) are sent to the 6-layer MLP, which decreases the overall testing time. The 3-layer MLP has 128 neurons in the 1st and 3rd layers and 64 neurons in the 2nd layer, with ReLU activation in the hidden layers, sigmoid output activation, and 1% dropout. The 6-layer MLP has a similar configuration, with alternating layers of 128 and 64 neurons. The architecture is shown in Fig. 4.

Fig. 3 6-Layer MLP Model
For the Random Forest, we used 50 estimators with the entropy criterion; the remaining parameters were left at their defaults.
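The three model configurations can be sketched with scikit-learn. Note this is an approximation: the paper's MLPs appear to be Keras-style networks with per-layer dropout and a sigmoid output, while scikit-learn's `MLPClassifier` does not support dropout (its logistic output for binary labels matches the sigmoid output); the hidden-layer sizes mirror the stated 128/64 alternation.

```python
# Approximate model configurations (scikit-learn stand-ins; dropout omitted).
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# 3-layer MLP: 128 -> 64 -> 128 hidden neurons, ReLU activations
mlp3 = MLPClassifier(hidden_layer_sizes=(128, 64, 128), activation="relu")

# 6-layer MLP: alternating 128/64 hidden layers
mlp6 = MLPClassifier(hidden_layer_sizes=(128, 64, 128, 64, 128, 64),
                     activation="relu")

# Random Forest: 50 estimators, entropy criterion, other parameters default
rf = RandomForestClassifier(n_estimators=50, criterion="entropy")
```

Each model would then be fit on the same training split and evaluated on the held-out test split before being plugged into the contradiction-management routing.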
Below is the architecture of the proposed approach.
From the metrics, we can see that the proposed approach achieves higher accuracy than the individual models, as it boosts accuracy by ensembling them. Table I shows the results when the dataset is tested only on the 3-layer MLP, which already gives good metrics. As the dataset is slightly imbalanced with respect to the labels, we also report precision, recall, and F1-score. Table II shows the Random Forest, whose metrics are lower than those of the 3-layer MLP but which can serve as a validation model for it. Table III shows the 6-layer MLP; with more layers the learning is more accurate, so its metrics are higher. Using the models from Tables I, II, and III in the proposed manner, the results show that the metrics are boosted.
Testing time is also lower when a packet is tested with the Random Forest and the 3-layer MLP than with the 6-layer MLP: more layers mean more mathematical operations and thus more testing time. The drawback arises when a packet has to be tested on all three models, but the probability of that happening was found to be very low, about 5%.
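A back-of-envelope calculation shows why the 5% escalation rate keeps testing time low: every packet pays the fast-model cost, but only the contradiction cases also pay the 6-layer MLP cost. The timing values below are illustrative placeholders, not the paper's measurements.

```python
# Expected testing time under the contradiction-management routing rule.
t_fast = 0.8              # assumed: seconds to test a batch on 3-layer MLP + RF
t_slow = 2.0              # assumed: seconds to test the same batch on 6-layer MLP
contradiction_rate = 0.05 # fraction of packets escalated (paper: <5%)

t_ensemble = t_fast + contradiction_rate * t_slow
print(t_ensemble)         # approximately 0.9, well below t_slow alone
```

So even with a slow third model, the ensemble's expected testing time stays close to the fast models' cost as long as the contradiction rate is small.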
Following are the testing times:

Models
Testing Time (seconds)

CONCLUSION
We can conclude that, by considering the decisions of multiple models and, when those decisions contradict, deferring to another neural network model with a more complex architecture and higher accuracy, the resulting system achieves high accuracy with low testing time. In future work, there is scope for developing new techniques for ensemble modelling.