Supervisory framework for threat detection with multilayer processing in CNN



Introduction
In the 21st century, protecting human life from attackers is a central security problem. Monitoring people's movement through live cameras makes it possible to secure places where access is restricted, such as jails, airports, border areas and VIP buildings. The use of CCTV has expanded to protect unattended areas because installation and storage costs are low. Fear of crime makes access control a major concern in many settings. Conventional techniques such as passwords and smart cards can be forged and are unreliable. Face recognition, by contrast, is a reliable and smart biometric identification and object detection technique. Recognition systems are useful in many applications, for example security systems, criminal recognition, explosives detection and identity verification for access. Owing to well-developed deep learning technology, face recognition can achieve very good results. The features extracted from images are examined and compared with the existing faces in the dataset, which makes the system effective at detecting danger. A Convolutional Neural Network, also known as ConvNet or CNN, is a type of artificial neural network used primarily for image recognition and processing because of its ability to recognize patterns in images. It is used for applications such as face recognition and natural language processing. The convolution steps are shown below from https://thinkingneuron.com. The proposed solution is a face recognition security framework that can recognize trespassers in highly sensitive areas and help limit human error.
This security system has two sides: a hardware side and a software side. The hardware side includes a camera and a metal detector, while the software side comprises face and object detection and identification/classification software. In this paper, face recognition begins when the framework is started. By default, the camera captures many photos of the individuals in the frame. Each detected face is checked against the current dataset. If an unknown face occurs, it is stored and sent to the admin with an alert email. If the face is known, it is matched with the faces stored in the database. In addition, the system scans for guns through the camera as well as through a metal detector that is integrated into the system.

Literature Survey
Mohammad Hoque et al. [1] demonstrate human face detection and identification strategies such as Camshift, AdaBoost, Haar cascade, Hausdorff distance and Viola-Jones. Among these, they used the Haar cascade classifier algorithm for face detection in their project. In future, their work will be extended to other face recognition algorithms, with face identification from a dedicated database, to analyze their performance under similar conditions. Apoorva P. et al. [2] present a technique for face detection in a real-time environment. They used Haar classifiers to track faces with OpenCV. The primary benefit is that they used a citizenship dataset that already exists.
Kushal M et al. [3] propose object recognition in a school environment: their work detects a person using the TensorFlow Object Detection API, and detects and recognizes the face using the Haar cascade method of OpenCV. They conclude that model training and testing were done using the TensorFlow Object Detection API with features provided for ID card detection, while a Haar cascade classifier was used for face detection and the LBPH method for face recognition.
Faizan Ahmad et al. [4] show the benefits of face-based identification over other biometrics. They aim to evaluate different face detection and recognition techniques and to provide a complete solution for image-based face detection and recognition with higher accuracy and a better response rate, as an initial step for video surveillance. In their work, they developed a system to evaluate face detection and recognition methods that are considered benchmarks.
Hong Zhao et al. [5] present a face recognition technology based on an embedded platform and put forward a solution that emphasizes the face detection algorithm, the face recognition algorithm and application development. They state that operation relies heavily on recognition by the back end and is limited by the bandwidth and stability of the data transmission network. Their system uses the advantages of the PCA algorithm for feature extraction and the advantages (for example, fast detection speed) of the Haar-based AdaBoost algorithm.
E. Omer Akay et al. [6] apply two different face recognition algorithms, namely Histogram of Oriented Gradients and Haar cascade, and measure their performance. Deep learning based on convolutional neural networks (CNNs) is used to identify the students in the classroom. As future work, they expect to integrate the developed system into the Student Data Management System of the college or school, which will generate the dataset for every class automatically.
Ayman Ben Thabet et al. [7] aim to replace expensive image processing boards with a Raspberry Pi board that uses an ARMv7 Cortex-A7 as the main processor for the OpenCV library. The project is based on image processing by connecting the OpenCV library to the Raspberry Pi board. They review related work in the field of home automation and present the system design, software algorithm, implementation and results. In their paper, a face recognition system is developed to study the potential application to home automation door security with real-time response and a better recognition rate.
Aashna R. Bhatia et al. [8] focus on the advantages and disadvantages of the Viola-Jones algorithm, why it is the most important face detection algorithm, and how it can be improved to address present-day requirements. The primary goal is to speed up the computation through parallel execution using CUDA and OpenCV on a Graphics Processing Unit (GPU), and then to analyze the computational results of the serial and parallel implementations.
Nandhini R et al. [9] state that in their face recognition project, a computer system identifies and recognizes human faces quickly and precisely in images and videos captured by a surveillance camera. Various algorithms and techniques have been developed to improve face recognition results, but the concept they implement is deep learning. The aim of their paper is to capture video of students, convert it into frames, compare the frames with the dataset to confirm presence or absence, and mark attendance to maintain the record.
Danish Ali Chowdhry et al. [10] design and implement a smart security framework for restricted places where access is limited to individuals whose faces are stored in the dataset. Simple methods are adopted to illustrate the idea of a smart system, which is in demand by many organizations, from investigation and intelligence to the military.
Dr. V Suresh et al. [11] mention that the main purpose of their system is to build a face recognition based attendance monitoring framework for academic institutes, improving and redesigning the current attendance system to be more efficient and effective than before. They state that face datasets will be created to feed data into the recognizer algorithm. Hence, facial recognition embedded in the attendance monitoring system can not only ensure that attendance is taken accurately but also eliminate the flaws of the previous system.
Samridhi Dev et al. [12] state that their system is meant to digitize the conventional practice of taking attendance by calling names and keeping pen-and-paper records. The proposed framework uses Haar classifiers, Gabor filters, generative adversarial networks, SVM (Support Vector Machine), CNN and KNN (k-nearest neighbors). Three algorithms were used to recognize faces: k-nearest neighbors, convolutional neural networks and support vector machines. Among these, the KNN algorithm showed the highest accuracy, 99.27%.
Md Khaled Hasan et al. [13] present a broad survey of face recognition techniques, classifying them into feature-based and image-based approaches. They also explore different types of available face detection algorithms along five aspects: history, working method, benefits, limitations and use in different sectors related to face detection. Their paper gives detailed comparisons among the algorithms to provide a comprehensive view.
Smitha et al. [14] propose a framework to mark attendance by means of face ID. The framework comprises four stages: database creation, face detection, face recognition and attendance updating. The database is built from pictures of the children in the classroom. Face recognition is done using the Local Binary Pattern Histogram algorithm and a Haar cascade classifier. Faces are detected from live real-time video of the classroom. Attendance is sent to the respective staff member at the end of the session.
Shivam Singh et al. [15] proposed an automated face recognition system. They used the KLT algorithm, the Viola-Jones algorithm for face detection (which detects human faces using a Haar cascade classifier, although the camera then detects the face in every frame) and the PCA algorithm for feature selection. In real-time situations, PCA outperforms the other algorithms. Their stated future work is algorithm recognition.
Michel Owayjan et al. [16] proposed a face recognition system that can recognize intruders to restricted areas and help limit human error. The system comprises face detection and recognition algorithms. When an individual enters the zone in question, the camera captures images and sends them to the software for analysis and comparison with the current dataset of trusted people.
Kruti Goyal et al. [17] introduced a system for tracking and detecting faces in videos that can be used for many tasks. Different algorithms such as AdaBoost and Haar cascades were used. They concluded that algorithms such as Camshift or Haar cascade are reliable and give more certain results than detection by motion. As far as time is concerned, however, Camshift and detection through motion are the better choice.
Xiao Han et al. [18] reviewed different research hotspots in deep learning based face recognition in the field of biometrics, together with the relevant theory and techniques of deep learning and face recognition technology. Their research concluded that deep learning has a significant advantage over other machine learning methods for face recognition: deep learning can not only learn to obtain more useful data but also build a more accurate model.
H. Zhang et al. [19] propose a face recognition model based on LBP features and a CNN. The LBP feature map is used as the input of the CNN for training and identification. After comparing the experimental results, they concluded that face recognition is easier and better with a CNN.
Tejashree Dhawle et al. [20] describe the ways in which deep learning, a significant part of the engineering field, can be used to detect faces using several libraries in OpenCV together with Python. The system helps in recognizing human faces in real time. This implementation can be used on different platforms, in machines and mobile phones, and in several software applications. The use of Python and OpenCV makes it a simple and useful tool that anyone can build according to their needs.
Kushsairy Kadir et al. [21] evaluate two types of features for face detection, Haar-like features and Local Binary Pattern features, on detection accuracy and speed. The algorithms were tested in Microsoft Visual C++ 2010 Express with the OpenCV library. A comparative study of the two algorithms was conducted using three face datasets as test sets: the Taarlab database, the MIT CBCL database and the Color FERET database. All three datasets provide single-face images with different kinds of faces. Results were evaluated on detection speed for each image and on detection accuracy. The experimental results show that Local Binary Pattern features are generally effective and reliable for implementing a real-time face detection system.
X. Peng et al. [22] analyze the effects of different activation functions on the face recognition rate. They focused mainly on CNN experiments with small samples. They concluded that the ReLU activation function performs better in the Siamese neural network than the MFM activation function.
Patrick Laytner et al. [23] state that the most crucial issue is insufficient contrast between facial features. To overcome this challenge, they propose a face recognition approach that combines AdaBoost learning, the Haar transform and histogram analysis. The extended Yale Face Database B is used to analyze the performance of the proposed method, which is compared against the commonly used OpenCV Haar detection algorithm. Experiments with 9,883 positive images and 10,349 negative images showed a considerable increase in hit rate without a significant change in the false acceptance rate, which indicates that the proposed method is a viable way to handle this issue.
Fan-Fan Peng et al. [24] show that an existing technique has a serious defect: when the lighting changes, detection performance drops quickly and is not sufficient for a practical system. The aim of their study is to address the common problem of face detection under varying lighting and to develop an accurate and effective human face detection algorithm using Visual Studio 2015 and OpenCV.
Amritha Nag et al. [25] state that their research aims to help develop door security in sensitive areas by using detection and identification methods. The proposed framework mainly comprises subsystems for capturing images, detecting and recognizing faces, sending email notifications and controlling door access. OpenCV-based face recognition is put forward because it uses Eigenfaces and reduces the size of face images without losing crucial features, so pictures of many people can be stored in the database. The door lock can be accessed remotely from anywhere in the world using the Telegram app. For security, the admin is notified by email with all the images captured by the camera.
Madan Lal et al. [26] discuss the most challenging aspects of face recognition, such as pose invariance, illumination, aging and partial occlusion. These are regarded as critical factors in face recognition systems applied to face images. After an in-depth study, they found that PCA is the best method when the feature dimension of the original image is large, while the eigenface image feature works better in frontal face recognition. The aim of their research is to give a detailed analysis of face recognition along with its uses.
Nourman S. Irjanto et al. [27] present a CNN AlexNet facial recognition system implemented in a door security system. Data were collected from 1048 facial samples, and the system trained on these images gives very accurate results. The research includes improving the facial data enhancement procedure by using a database, a good camera and the latest Raspberry Pi model to improve computational capacity.
Omar Abdul Rhman Salim et al. [28] propose a technique for developing a complete embedded class attendance system that also controls the door using facial recognition access. The system runs on Raspbian (Linux), the operating system of the Raspberry Pi. When a student faces the camera, it captures the picture and passes it to the Raspberry Pi, which is programmed to handle face recognition using the Local Binary Patterns (LBP) algorithm. When the student's face matches the trained database images, the prototype door opens using a servo motor, and a MySQL database stores the attendance. The database is connected to the Attendance Management System (AMS) web server, which allows the attendance to be accessed from any device.
Mayank Srivastava et al. [29] state that their product supports an automatic attendance system and lets teachers access students' information simply by keeping records of IN and OUT times, using OpenCV, face recognition, authentication and the Haar cascade algorithm. It can also be used by the general public to look up the image of any individual captured by any camera, given proper authorization against the dataset.
Prayag Bhatia et al. [30] proposed Local Binary Patterns to identify a person from the database. Their system is intended for home security with the help of a Raspberry Pi and a camera.

Methodology
The focus of this algorithm is to identify human faces and objects by means of a camera such as a webcam or CCTV. The Convolutional Neural Network, also called ConvNet or CNN, is a well-known technique in computer vision. It is a class of deep neural network (DNN) used for analysing images, and this kind of architecture is dominant for recognizing objects in an image or video. It is used in applications such as image or video recognition and natural language processing. The first part of the system performs face recognition using a CNN and then searches for a match in the dataset. A flowchart of the security system is given below.
To implement the framework, we used the layers of a Convolutional Neural Network. A camera reads the input, and serial communication is established with the Arduino Uno. The metal detector is made of an insulated copper coil of radius 10 cm. The results of the metal detector are transmitted to the GUI (graphical user interface) through the serial port of the Arduino.

Metal Detector:
When current passes through the coil, it generates a magnetic field around it. A change in the magnetic field in turn induces an electric field. According to Faraday's law, this electric field produces a voltage across the coil that opposes the change in the magnetic field. This is how the coil develops its inductance, which means that the induced voltage opposes an increase in the current. When any metal comes close to the coil, the inductance of the coil changes. The change in inductance depends on the type of metal: it decreases for non-magnetic metals and increases for ferromagnetic materials. The Arduino generates a square wave (pulse train) that is fed to a high-pass filter, so the coil produces a short spike at every transition. The pulse length of these spikes is proportional to the coil's inductance. Each rising spike charges a capacitor, and a few pulses are needed to charge the capacitor to the point where an Arduino analog pin can read its voltage. After the voltage is read, the capacitor is quickly discharged. Once the result is obtained, we transfer it through the serial port to the application interface and sound a buzzer to signal the presence of metal.
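The host-side handling of these readings can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the Arduino sends one integer analog reading per line, and the line format, the calibration window `baseline_samples` and the `threshold` value are all hypothetical. The function works on any iterable of text lines, so lines read from a serial port could be passed in place of the sample list.

```python
from statistics import mean


def parse_reading(line):
    """Parse one serial line into an integer ADC value, or None if malformed."""
    try:
        return int(line.strip())
    except ValueError:
        return None


def detect_metal(lines, baseline_samples=5, threshold=20):
    """Yield (reading, is_metal) for each valid reading after calibration.

    The first `baseline_samples` valid readings are averaged as the no-metal
    baseline (assumed calibration scheme). A later reading is flagged when it
    deviates from the baseline by more than `threshold` counts in either
    direction, because the inductance falls for non-magnetic metals and
    rises for ferromagnetic ones.
    """
    calibration = []
    baseline = None
    for line in lines:
        value = parse_reading(line)
        if value is None:
            continue  # skip corrupted or partial serial lines
        if baseline is None:
            calibration.append(value)
            if len(calibration) == baseline_samples:
                baseline = mean(calibration)
            continue
        yield value, abs(value - baseline) > threshold


# Hypothetical readings: five calibration samples, then live samples.
sample_lines = ["500", "502", "498", "501", "499", "503", "noise", "560", "455"]
results = list(detect_metal(sample_lines))
```

Using a two-sided deviation from a calibrated baseline keeps the sketch independent of the metal class; a real deployment would need thresholds tuned to the coil and circuit.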

Architecture of CNN:
Compared with other object detection techniques, CNN-based detection is considered the most effective. The CNN structure consists of different layers, such as convolution layers, max-pooling layers and fully connected layers. A typical structure has several convolution layers and a pooling layer, followed by one or more fully connected layers. The convolution and pooling operations can be performed for 2D CNNs as well as three-dimensional (3D) CNNs. The architecture takes a grayscale image of shape (100, 100, 1).

A. Convolution layer:
The convolution layer is a key part of the CNN architecture. It extracts features and normally combines a convolution operation with an activation function. The result of a linear operation such as convolution is passed through a nonlinear activation function, which models the behaviour of a biological neuron. The most common nonlinear activation function at present is the rectified linear unit (ReLU), which computes the function $f(x) = \max(0, x)$. After the feature maps are extracted, they are passed to a ReLU layer, which processes each element and sets all pixels less than 0 to 0. The graph of the ReLU function is shown in Fig. 4. The spatial size of the output volume depends on the input volume size $W$, the filter size $K \times K$ of the convolution layer neurons, the stride $S$ and the amount of zero padding $P$ on the edges. The number of neurons that fit along each spatial dimension of a given volume is then $(W - K + 2P)/S + 1$.
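The output-size relation and the ReLU step can be checked with a small pure-Python sketch. The 3 x 3 filter size used in the size check and the edge-detecting kernel are assumptions chosen for illustration; the paper does not state the filter sizes or weights of its network. Only the 100-pixel input width comes from the text.

```python
def conv_output_size(w, k, p=0, s=1):
    """Spatial output size for input width w, kernel k, zero padding p, stride s."""
    span = w - k + 2 * p
    if span < 0 or span % s != 0:
        raise ValueError("kernel, padding and stride do not tile the input")
    return span // s + 1


def relu(x):
    """Rectified linear unit: negative values are clamped to zero."""
    return max(0.0, x)


def conv2d_relu(image, kernel):
    """Unpadded, stride-1 2D convolution (CNN-style cross-correlation) of a
    square image with a square kernel, followed by ReLU."""
    k = len(kernel)
    out = conv_output_size(len(image), k)
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            acc = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k)
                for b in range(k)
            )
            row.append(relu(acc))
        result.append(row)
    return result


# Toy 4 x 4 image with a falling edge on the left and a rising edge on the right.
image = [[1, 0, 0, 1] for _ in range(4)]
# Hypothetical kernel that responds to dark-to-bright horizontal transitions.
kernel = [[-1, 1], [-1, 1]]
feature_map = conv2d_relu(image, kernel)
```

In this toy case the falling edge gives a negative response that ReLU sets to zero, while the rising edge survives. With the assumed 3 x 3 filter, a 100-pixel-wide input yields 98 outputs per row without padding and 100 with one pixel of zero padding.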

B. Pooling layer:
A pooling layer performs downsampling. Like a convolution layer, a pooling layer has hyperparameters: filter size, padding and stride.
The pooling layer uses different filters to distinguish various parts of the picture, such as edges, corners, body, feathers, eyes and beak. The most popular form of pooling operation is max pooling, which extracts patches from the input feature maps, outputs the maximum value in each patch and discards all other values. Max pooling with a filter of size 2 × 2 and a stride of 2 is commonly used in practice. This downsamples the in-plane dimension of the feature maps by a factor of 2.
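Max pooling with a 2 x 2 filter and a stride of 2 can be illustrated with a short sketch. The feature map values are made up for the example, and the function assumes a rectangular list-of-lists input without padding.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size patch, moving by `stride`."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + a][c * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 4 x 4 feature map; pooling halves each in-plane dimension.
fm = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
pooled = max_pool2d(fm)
```

Here the 4 x 4 map is reduced to 2 x 2, with each output holding the largest value of its patch.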

C. Fully connected layer:
The input of the fully connected layer is taken from the output of the last pooling layer. The output of the convolutional layers represents high-level features. While that output could be flattened and connected directly to the output layer, adding a fully connected layer is a (usually) cheap way of learning non-linear combinations of these features. The convolution layers provide a low-dimensional, meaningful and invariant feature space, and the fully connected layer learns a function in that space. Fully connected layers map, or embed, the feature maps (which capture spatial correlation and feature description) into an N-dimensional vector space, to which a given input x, represented by the set of its feature maps, is mapped. Ultimately each image, if appropriately mapped, represents a single point in the higher-dimensional vector space, and such points can form clusters, surfaces and manifolds.
To accomplish the described working, we divided the face recognition procedure into four parts: Detection, Extraction, Recognition and Alerting.
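As a brief aside before the four stages are described, the fully connected mapping from feature maps to an N-dimensional vector can be sketched as a flatten step followed by one weighted sum per output unit. The weights, biases and the choice N = 2 below are hypothetical values for illustration; in the trained network they would be learned.

```python
def flatten(feature_maps):
    """Flatten a list of 2D feature maps into one 1D vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def dense(x, weights, biases):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]


# One hypothetical 2 x 2 feature map from the last pooling layer.
feature_maps = [[[1, 2], [3, 4]]]
x = flatten(feature_maps)

# Hypothetical weights for N = 2 output units.
weights = [
    [1, 0, 0, 0],
    [0.5, 0.5, 0.5, 0.5],
]
biases = [0, -1]
embedding = dense(x, weights, biases)
```

The resulting vector is the point in the N-dimensional space that represents the input image.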

A. Detection
The main goal of this stage is to search the images to decide whether faces or objects of interest appear, pinpoint their location and prepare them for cropping. The result of this stage is the selected region from each image. To increase the success rate of the system, alignment and scaling are applied to the input picture. Detection is also used for locating regions of interest, video sorting, image retargeting and more. Using the CNN, the framework can perceive the presence or absence of a human face.

B. Extraction
After the face or object is identified in the picture, the selected regions are extracted from the pictures. To avoid natural deficiencies such as lighting, facial expressions, occlusion and clutter, feature extraction is performed to extract data from the picture for size reduction, salient feature extraction and noise reduction. After this process, the selection is extracted and converted into flattened layers.
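The cropping and flattening part of this stage can be sketched as below. This is an illustration under stated assumptions: the bounding box coordinates stand in for the output of the detection stage, pixels are assumed to be 8-bit grayscale values, and scaling to the range [0, 1] is a common preprocessing choice rather than a step confirmed by the paper.

```python
def crop(image, top, left, height, width):
    """Cut the selected region out of a 2D grayscale image."""
    return [row[left:left + width] for row in image[top:top + height]]


def normalize(region):
    """Scale 8-bit pixel values to the range [0, 1] (assumed preprocessing)."""
    return [[p / 255.0 for p in row] for row in region]


def to_vector(region):
    """Flatten the region row by row into a 1D vector."""
    return [p for row in region for p in row]


# Hypothetical 4 x 4 image with a 2 x 2 region of interest in the middle.
image = [
    [0, 0, 0, 0],
    [0, 255, 51, 0],
    [0, 102, 204, 0],
    [0, 0, 0, 0],
]
# Bounding box (top, left, height, width) assumed to come from detection.
region = crop(image, 1, 1, 2, 2)
vector = to_vector(normalize(region))
```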

C. Recognition
After the training file is set up and the structure of the face is interpreted, the next stage compares the input image with the database. When the system receives an input picture, the algorithm points out the attributes of a face or object, after which feature extraction processes only the selected face and compares the extracted attributes with the dataset. In this stage, two main functions are performed: the first is identification and the second is verification.
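The difference between identification and verification can be illustrated with a simple matcher over feature vectors. Note that this sketch swaps in nearest-neighbour matching with Euclidean distance, which is an assumption for illustration and not necessarily the comparison used by the proposed CNN. The enrolled names, vectors and the `threshold` value are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(query, database, threshold=0.6):
    """One-to-many search: return (name, distance) of the closest enrolled
    vector, or (None, distance) when nothing is within the threshold."""
    best_name, best_dist = None, float("inf")
    for name, vec in database.items():
        d = euclidean(query, vec)
        if d < best_dist:
            best_name, best_dist = name, d
    if best_dist > threshold:
        return None, best_dist  # treated as an unknown face
    return best_name, best_dist


def verify(query, claimed_vec, threshold=0.6):
    """One-to-one check: does the query match the claimed identity?"""
    return euclidean(query, claimed_vec) <= threshold


# Hypothetical enrolled feature vectors.
database = {"alice": [0.0, 0.0], "bob": [3.0, 4.0]}
known = identify([0.3, 0.4], database)
unknown = identify([10.0, 10.0], database)
```

Identification answers "who is this?" against the whole dataset, while verification answers "is this the claimed person?" against a single enrolled vector. An unknown result is what would trigger the alerting stage.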

D. Alerting
After identification, the system makes a note of the time. If it detects an unknown face or a dangerous object, an alarm is raised on the screen and an alert email with the picture is automatically sent to the admin.
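The alert email with the captured picture can be composed with Python's standard library, as in the following sketch. The addresses, subject wording and attachment file name are hypothetical, and the sketch only builds the message. Actually sending it would need an SMTP server and credentials, which the paper does not describe.

```python
from datetime import datetime
from email.message import EmailMessage


def build_alert(image_bytes, detected_at, sender, recipient, label="unknown face"):
    """Compose an alert email with the detection time and the captured image."""
    msg = EmailMessage()
    msg["Subject"] = f"Security alert: {label} detected"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"A {label} was detected at {detected_at.isoformat(timespec='seconds')}."
    )
    # Attach the captured frame; JPEG encoding is assumed here.
    msg.add_attachment(
        image_bytes, maintype="image", subtype="jpeg", filename="capture.jpg"
    )
    return msg


# Placeholder bytes stand in for a real encoded camera frame.
alert = build_alert(
    b"\xff\xd8\xff",
    datetime(2024, 1, 1, 12, 0, 0),
    "camera@example.com",
    "admin@example.com",
)
# Sending is left out on purpose; smtplib.SMTP(...).send_message(alert)
# would be one way to deliver it once a server is configured.
```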

Output
The real-time outputs of the security framework are shown below.

Conclusion and Future scope
Thus, in the proposed framework, CNNs make it possible, to an extent, to configure machines so that they perceive the world as we do and use the data for image analysis, image recognition and classification. The proposed threat detection system using CNN presents a face identification framework that depends on the speed of the embedded system and provides a solution that emphasizes the face detection algorithm, the face recognition algorithm, classification and application development. The system uses a hybrid software-hardware method to enhance the security of public gathering places such as banks and airports. This work has been tested to evaluate the success rate of the algorithms under different conditions. It is found that the system can be low cost, easy to use for ordinary security personnel, and good compared with the alert triggering systems presently available. The integration of CNN with Arduino UNO will help researchers develop prototypes of surveillance systems with a metal detector. The hardware (Arduino) is used in serial communication mode and could be replaced in future with a Raspberry Pi to increase efficiency and processing speed and to develop a multi-camera system with one CPU. This system shows potential for future developments in various fields, including Internet communication, security, access control, surveillance, PC entertainment and law enforcement.