Autism Spectrum Disorder Detection Using Enhanced Convolutional Neural Network and Wearable Sensors

Stereotypical Motor Movements (SMMs), among the distinctive postural and motor behaviours associated with autism spectrum disorders (ASDs), may seriously impede learning and social relationships. Wireless wearable sensor technology provides a reliable infrastructure for automatic and rapid SMM detection, which would facilitate targeted intervention and perhaps give early warning of meltdown episodes. However, the detection and quantification of SMM patterns remain challenging because of significant inter- and intra-subject variability, which handcrafted features handle poorly. In this work, we propose an Enhanced Convolutional Neural Network (ECNN) that learns discriminative features directly from multi-sensor accelerometer inputs; the parameters of the ECNN are tuned using whale optimization. Experimental results show that the ECNN yields accurate and robust SMM detectors.


Introduction
Developmental impairments known as ASDs hinder a patient's ability to interact socially and communicate with others to varying degrees [1]. An estimated 1 in 88 people has ASD. The condition is typically characterized by restricted, repetitive, and stereotyped behaviour. SMMs, including mouthing, body rocking, and complex hand movements, are common in individuals with autism and can severely limit a person's capacity for learning and engaging in social relationships. SMMs also increase the likelihood of emotional or sensory overload, which can lead to autistic meltdowns [2].
Therefore, one of the main goals of ASD therapies is to reduce SMMs, which calls for precise techniques for identifying and measuring SMM patterns [3,4]. Understanding the limits of standard approaches for evaluating SMM, such as direct behavioural observation, paper-based rating scales, and video-based coding, helps guide behavioural treatments and possibly prevent SMM resurgence. Machine learning algorithms combined with wireless accelerometer sensor technology offer a quick, effective, and precise way to quantify SMM [5].
As in many other signal processing applications, SMM detection often relies on extracting ad-hoc ("handcrafted") features from the accelerometer data, and numerous feature extraction techniques have been used so far. Two kinds of features are commonly extracted from the accelerometer signal: time-domain and frequency-domain features [7]. Time-domain features are statistical measures computed over overlapping windows of data, including the mean, standard deviation, zero-crossing rate, energy, and inter-axis correlation [6]. Frequency-domain features are computed with the discrete Fourier transform, which measures the power in distinct frequency bands. More recently, the Stockwell transform has been applied to feature extraction from inertial 3-axis accelerometers to improve time-frequency resolution for non-stationary data [8].
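As an illustration of the handcrafted features described above, the following Python sketch computes time-domain statistics and DFT band powers from a single 3-axis accelerometer window. The window length, band edges, and the way correlations are summarized are illustrative assumptions, not values from the paper; only the 60 Hz sampling rate matches the dataset described later.

```python
import numpy as np

def extract_features(window, fs=60.0):
    """Handcrafted features from one accelerometer window of shape (n_samples, 3).

    Time domain: mean, std, zero-crossing rate, energy per axis, plus
    pairwise inter-axis correlations. Frequency domain: DFT power in
    coarse bands (band edges are illustrative choices, not from the paper).
    """
    feats = []
    for ax in range(window.shape[1]):
        x = window[:, ax]
        feats += [x.mean(), x.std()]
        centered = x - x.mean()
        feats.append(np.mean(np.diff(np.sign(centered)) != 0))  # zero-crossing rate
        feats.append(np.sum(x ** 2) / len(x))                   # mean energy
    c = np.corrcoef(window.T)                  # 3x3 inter-axis correlation matrix
    feats += [c[0, 1], c[0, 2], c[1, 2]]
    spectrum = np.abs(np.fft.rfft(window, axis=0)) ** 2         # per-axis power spectrum
    freqs = np.fft.rfftfreq(window.shape[0], d=1.0 / fs)
    for lo, hi in [(0, 3), (3, 10), (10, fs / 2)]:              # three coarse bands
        band = (freqs >= lo) & (freqs < hi)
        feats += list(spectrum[band].sum(axis=0))               # band power per axis
    return np.array(feats)

# one-second window at 60 Hz, 3 axes (synthetic data for demonstration)
rng = np.random.default_rng(0)
f = extract_features(rng.standard_normal((60, 3)))
print(f.shape)  # (24,)
```

With 4 statistics per axis, 3 correlations, and 3 band powers per axis, the window is reduced to a 24-dimensional feature vector that a classifier could consume.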
Manual feature extraction, although widely used in movement analysis, has two significant drawbacks: a) rather than being learned from the movement data, the features depend largely on the researcher's domain expertise, so aspects of atypical movements may be missed and intra- and inter-subject variance ignored; b) feature extraction is a computationally demanding stage of the processing pipeline, which hinders real-time detection of atypical movements.
To learn discriminative features for SMM pattern recognition and overcome these restrictions, this study developed an enhanced convolutional neural network (ECNN). The ECNN reduces the multi-channel accelerometer signal to a smaller set of features in which the SMM and non-SMM classes can be separated, and its parameters are tuned using whale optimization. D. Sarangi et al. [2017] [9] used an ANFIS structure in an e-healthcare system. The ANFIS system is used by both doctors and patients to diagnose diseases and provide patient assistance, and a multi-agent system is managed with rule-based fuzzy parameters. Either the physician or the patient may access the service online: the doctor's diagnostic prescription can be shared with the pathology centre and vice versa, and the doctor is automatically informed of the patient's various conditions. Detection results were shared for monitoring, post-care, and prescribing the necessary medication, and this part of the intelligent system was found to perform excellently. Boursalie et al. [2015] [10] presented M4CVD, the Mobile Machine Learning Model for Cardiovascular Disease (CVD), a system designed specifically for mobile devices to ease the monitoring of cardiovascular illness. The system collects vital-sign trends from wearable sensors and contextualizes them with information from healthcare databases. Instead of sending the raw data directly to medical specialists, the system analyses the data locally by feeding it to a support vector machine (SVM), classifying a patient as either at "continuous risk" or "no longer at risk" for CVD. The reported system identified a patient's CVD risk with 90.5% accuracy.

Literature Review
Padma and Balasubramanie [2011] [11,12] provided an analytical tool, a fuzzy decision support system (FDSS), to rank the risk of occupations causing shoulder and neck pain (SNP), an important musculoskeletal disorder and the most ubiquitous pain complaint in occupational environments. The FDSS evaluates and prioritizes the relative importance of the imprecise, uncertain, and vague risk factors causing occupational SNP. The objective involves deriving mechanical-, physical- and psychosocial-related risk categories through knowledge acquisition, implemented by identifying the risk factors. The fuzzy analytic hierarchy process is applied as an evaluation tool to measure the significance of the risk factors in each occupation. The results indicate that the proposed system supplements SNP diagnosis experts with more precise key decision support information. This helps healthcare organizations systematically identify occupations that carry a high risk of SNP so that curative practices can be executed effectively.

Proposed Methodology
To diagnose autism spectrum disorders (ASDs) from multi-sensor accelerometer inputs, an enhanced convolutional neural network (ECNN) is proposed as part of a model-based healthcare monitoring system. Whale optimization is used to tune the parameters of the ECNN. The overall architecture of the proposed paradigm is shown in Figure 1.

Autism Spectrum Disorders (ASDs) Detection Using Enhanced Convolutional Neural Network (ECNN)
Data sensed by the sensors form the input for Autism Spectrum Disorders (ASDs) detection with the Enhanced Convolutional Neural Network (ECNN). A CNN is architecturally different from a conventional artificial neural network: whereas a traditional ANN flattens the input into a vector, the layers of a CNN are chosen to fit the input data spatially [13,14]. A standard CNN consists of several blocks of convolutional and subsampling layers, followed by a number of fully connected layers and output layers.


Weighted CNN
A CNN is made up of three main types of layers: the convolution layer, the sub-sampling layer, and the fully connected layer. A typical convolutional neural network (CNN) architecture is shown in Figure 1. The following sections briefly explain each type of layer.

Convolution Layer
In the convolution layer, the input features are convolved with a kernel (filter); convolving the input with n kernels produces n output feature maps. The output obtained by convolving a kernel with the input is known as a feature map, of size i×i, and the kernel of a convolution matrix is often referred to as a filter.
The CNN may have many convolutional layers, and the inputs and outputs of subsequent convolutional layers are feature maps. Each convolution layer has n filters in all; these filters are convolved with the input, and the number of filters employed equals the depth n of the feature maps that are produced. It is important to keep in mind that each feature map responds to a unique feature located in a particular part of the input [15,16].
The output of the l-th convolution layer, denoted C_i^(l), contains feature maps and is computed as

  C_i^(l) = B_i^(l) + Σ_j K_{i,j}^(l−1) * C_j^(l−1)   (1)

where B_i^(l) is the bias matrix and K_{i,j}^(l−1) is the convolution filter (kernel) of size a×a connecting the i-th feature map of layer l with the j-th feature map of layer (l−1). The output layer C_i^(l) consists of feature maps. In (2), for the first convolutional layer the kernel produces a feature map directly from the input. After the convolution layer, an activation function may be used for nonlinear processing of the convolutional layer's outputs:

  Y_i^(l) = f(C_i^(l))   (3)

where Y_i^(l) is the activation function's output and C_i^(l) is the input that is provided to it.
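A minimal NumPy sketch of the convolution-plus-activation step described above might look as follows. The shapes, the ReLU choice for f, and the use of "valid" cross-correlation (the operation CNN libraries typically implement as convolution) are assumptions for illustration, not details from the paper.

```python
import numpy as np

def conv_layer(feature_maps, kernels, biases):
    """One convolution layer: C_i = B_i + sum_j K_{i,j} (*) Y_j, then ReLU.

    feature_maps: (n_in, H, W); kernels: (n_out, n_in, a, a); biases: (n_out,)
    Returns activations of shape (n_out, H-a+1, W-a+1) ("valid" output size).
    Plain loops are used for clarity, not speed.
    """
    n_out, n_in, a, _ = kernels.shape
    H, W = feature_maps.shape[1:]
    out = np.zeros((n_out, H - a + 1, W - a + 1))
    for i in range(n_out):
        acc = np.full(out.shape[1:], biases[i])          # start from the bias B_i
        for j in range(n_in):
            for u in range(out.shape[1]):
                for v in range(out.shape[2]):
                    # slide the a x a kernel over the j-th input feature map
                    acc[u, v] += np.sum(kernels[i, j] * feature_maps[j, u:u+a, v:v+a])
        out[i] = np.maximum(acc, 0.0)                    # ReLU as the activation f
    return out

rng = np.random.default_rng(1)
y = conv_layer(rng.standard_normal((2, 8, 8)),   # 2 input feature maps
               rng.standard_normal((4, 2, 3, 3)),  # 4 filters of size 3x3
               np.zeros(4))
print(y.shape)  # (4, 6, 6)
```

Each of the 4 filters yields one 6×6 feature map, matching the rule that the number of filters equals the depth of the output.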

Subsampling or Pooling Layer
The main purpose of this layer is to reduce the spatial dimensionality of the feature maps produced by the preceding convolution layer. The sub-sampling procedure is carried out between a mask and the feature maps. Several sub-sampling strategies have been discussed, including average pooling, sum pooling, and maximum pooling. Max pooling, in which each block's maximum value becomes its corresponding output feature, is the most commonly used technique. It is important to note that the presence of a sub-sampling layer makes the convolution layer more robust to rotation and translation of the input images [17,18].

Parameter Tuning Using Whale Optimization
To reduce error and improve accuracy of the CNN, it is necessary to choose the best sizes for parameters such as the convolutional layer, pooling layer, and fully-connected layer parameters [19,20].
Recent research has produced the WOA, a novel population-based stochastic optimization approach. The WOA employs a collection of search agents to identify the optimal solution to an optimization problem, mimicking the "bubble-net hunting" behaviour that humpback whales use while pursuing prey. Encircling the prey, attacking with a bubble net, and searching for prey are the three main phases of the WOA. In benchmark tests, the WOA outperformed GWO and PSO, two other prominent meta-heuristic approaches [21,22].
Humpback whales encircle and chase prey using the bubble-net method: after enclosing the prey, such as a school of fish, the whales adjust their positions to obtain the best outcome. The WOA's primary mathematical components are shown in Eqs. (6) and (7).
  X(t+1) = X*(t) − A·D,  D = |C·X*(t) − X(t)|,  if p < 0.5   (6)
  X(t+1) = D'·e^(bl)·cos(2πl) + X*(t),  D' = |X*(t) − X(t)|,  if p ≥ 0.5   (7)

Here X* is the best solution currently available, t is the time or iteration index, and X is a vector representing a whale's location; A = 2a·r − a and C = 2r; b is a constant that specifies the shape of the logarithmic spiral, set to 1 in this paper; a is a coefficient that progressively decreases from 2 to 0 over the course of the iterations; the components of the random vector r vary from 0 to 1; p is a random value between 0 and 1 used to alternate between the position updates (6) and (7), so each is applied with 50% probability; and l is a random number between −1 and 1. During the bubble-net phase the components of A take values in [−1, 1], but during the searching phase they may be larger or smaller. The search process is shown in Eq. (8):

  X(t+1) = X_rand − A·D,  D = |C·X_rand − X|   (8)

This random search technique emphasizes exploration and forces a global search in the WOA algorithm. The WOA search process starts with a collection of random solutions [23,24], which are then updated iteratively according to Algorithm 1 until a predetermined maximum number of iterations is reached.

Algorithm 1. Whale Optimization for CNN Parameter Tuning
Input: parameters of the convolutional layer and pooling layer
Output: optimal sizes for all parameters
1. Import data
2. Initialize the locations X of the whale population
3. Compute the fitness of each whale
4. Initialize a and r; calculate A and C
5. Initialize X* as the best hunter whale location
6. while t ≤ max iterations do
7.     for each hunting whale do
8.         update the whale's location using Eq. (6), (7), or (8), according to p and |A|
9.     end for
10.    update a, A, C, and the best location X*
11. end while
12. Return X*
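The whale-optimization loop described above can be sketched in Python as follows. The toy objective, search bounds, population size, and iteration budget are illustrative choices; only b = 1 is taken from the text. In the actual system the fitness function would evaluate the ECNN with candidate layer parameters.

```python
import numpy as np

def woa(fitness, dim, n_whales=20, max_iter=100, lb=-10.0, ub=10.0, seed=0):
    """Minimal Whale Optimization Algorithm sketch following Eqs. (6)-(8).
    'fitness' is minimized; bounds and population size are illustrative."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_whales, dim))          # random initial solutions
    scores = np.apply_along_axis(fitness, 1, X)
    best, best_score = X[scores.argmin()].copy(), scores.min()
    b = 1.0                                           # spiral shape constant (b = 1)
    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter                  # decreases linearly from 2 to 0
        for i in range(n_whales):
            r = rng.random(dim)
            A, C = 2 * a * r - a, 2 * rng.random(dim)
            p, l = rng.random(), rng.uniform(-1, 1)
            if p < 0.5:
                if np.all(np.abs(A) < 1):             # encircling prey, Eq. (6)
                    D = np.abs(C * best - X[i])
                    X[i] = best - A * D
                else:                                  # random global search, Eq. (8)
                    rand = X[rng.integers(n_whales)]
                    D = np.abs(C * rand - X[i])
                    X[i] = rand - A * D
            else:                                      # spiral bubble-net attack, Eq. (7)
                D = np.abs(best - X[i])
                X[i] = D * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)
            s = fitness(X[i])
            if s < best_score:                         # keep the best hunter X*
                best_score, best = s, X[i].copy()
    return best, best_score

# toy objective: sphere function, optimum at the origin
best, score = woa(lambda x: np.sum(x ** 2), dim=3)
```

On this toy objective the returned score is driven close to zero, showing the exploration (Eq. 8) and exploitation (Eqs. 6–7) phases working together.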

Results and Discussion
This section examines the outcomes of experiments with the proposed model, which was implemented in MATLAB. On the sensed Autism Spectrum Disorder data, the existing SVM and ANFIS models are compared with the proposed ECNN model in terms of recall, precision, accuracy, and F-measure. An accelerometer signal dataset was collected in a longitudinal study of six individuals with autism. The participants wore three wireless accelerometer sensors measuring acceleration along three axes and performed SMM and non-SMM behaviours while the data were gathered in laboratory and classroom settings. Two sensors were attached to wristbands worn on the left and right wrists; the third was worn on the chest, fastened to the left side by a small strip of soft cloth looped around the chest. Video cameras captured the subjects' behaviours, which an expert then examined in order to annotate the data. MIT's sensors captured the initial dataset (henceforth referred to as Study1) at a sampling rate of 60 Hz.

Precision
Precision is the proportion of retrieved results that are relevant:

  Precision = TruePositive / (TruePositive + FalsePositive)   (9)

Recall
Recall is the proportion of all relevant results that the proposed algorithm correctly classifies as relevant:

  Recall = TruePositive / (TruePositive + FalseNegative)   (10)

Accuracy
The term "accuracy" refers to the proportion of times that the model generates correct predictions. Formally,

  Accuracy = (TruePositive + TrueNegative) / (TruePositive + TrueNegative + FalsePositive + FalseNegative)

F-measure
A machine learning model's performance is assessed using the F-score measure, which combines precision and recall into a single score:

  F-measure = 2 × Precision × Recall / (Precision + Recall)
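The four metrics above can be computed from the confusion counts in a few lines of Python; the binary labels (1 = SMM, 0 = non-SMM) and the example vectors are illustrative.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, accuracy, and F-measure for binary labels
    (1 = SMM, 0 = non-SMM), computed from the four confusion counts."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f_measure

# toy labels: tp=2, fp=1, fn=1, tn=2
p, r, acc, f1 = classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(round(acc, 4))  # 0.6667
```

With these counts, precision, recall, accuracy, and F-measure all equal 2/3, since the errors happen to be balanced between false positives and false negatives.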

Fig. 3. Accuracy results vs. classification methods
Figure 3 compares the proposed ECNN with the existing SVM and ANFIS models on the accuracy and precision performance measures. The different approaches are shown on the x-axis, while the accuracy and precision values are shown on the y-axis. The proposed work uses deep learning, which increases accuracy, and the figure shows that the proposed model outperforms the existing models. For instance, the SVM and ANFIS models achieve accuracies of only 60% and 75%, respectively, whereas the proposed ECNN model reaches 80%. Likewise, the proposed ECNN model yields a higher F-measure of 92%, compared with 79% for the SVM and 85% for the ANFIS model.

Conclusion and Future Work
This study used deep learning to address the problem of automated ASD detection. An ECNN architecture was used to build a robust feature space from multi-sensor inputs, and the parameters of the Enhanced Convolutional Neural Network were tuned using whale optimization. The results show that the deep learning paradigm holds great promise for overcoming the major obstacles facing real-time ASD detection systems built on wearable technology. According to the experimental data, the proposed model achieves higher detection accuracy than other current models. However, deep learning increases computational complexity, which future work will need to address.