Detection of COVID-19 from chest radiology using histogram equalization combined with a CNN convolutional network

The world was shaken by the arrival of the coronavirus (COVID-19), which ravaged all countries and caused extensive human and economic damage. Global activity was brought to a halt in order to stop this pandemic, but new waves of contamination continue to appear among the population, despite the vaccines made available to the countries of the world, owing to the emergence of new variants. All variants of this virus share a common symptom, an infection of the respiratory tract. In this paper, a new method for detecting the presence of this virus in patients is implemented, based on a deep learning model with a convolutional neural network (CNN) architecture and the COVID-QU chest X-ray imaging database. For this purpose, pre-processing was performed on all the images used, unifying their dimensions and applying histogram equalization to distribute the intensity evenly over the whole of each image. After the pre-processing phase, two groups were formed: the first, Train, is used in the training phase of the model, and the second, Test, is used for the validation of the model. Finally, a lightweight CNN architecture was used to train a model. The model was evaluated using two tools: the confusion matrix, from which accuracy, specificity, precision, sensitivity and F1-score are derived, and the Receiver Operating Characteristic (ROC) curve. The results of our simulations showed an improvement after using the histogram equalization technique, with an accuracy of 96.5%, a specificity of 98.60% and a precision of 98.66%.


Introduction
Over the last two years, the arrival of the coronavirus (COVID-19) has changed the life of humanity. Its rapid spread and the harm it can cause, especially to elderly people or those with chronic diseases, have put strong pressure on medical institutions and staff to care for everyone infected with this virus. This has pushed researchers to look for a solution to stop it. Several screening tests for detecting this virus in the human body have been developed, such as PCR tests, serological tests, ... All of these have experienced shortages because of high demand; in this case, the only technique adopted by doctors to eliminate this uncertainty is the chest X-ray.
Among the solutions that have been used is medical image processing, which is a branch of the field of computer vision. It has been able to revolutionize the medical field through the integration of image processing techniques [1] [2] [3] adapted to medical imaging. Among these techniques, the most widely used is deep learning with convolutional neural networks (CNN). CNNs are known for their robustness in feature extraction and classification. A CNN is an architecture that consists of several layers (convolution layers, pooling layers, flatten layers, dense layers), each of which has a specific role. The purpose of their use is to make the machine learn to detect the presence of a disease in an X-ray, chest X-ray (CXR), CT, MRI, ... image. They are used in several medical fields, such as the detection of brain tumors [4], prostate cancer [5] [6], pulmonary nodules [7] and breast cancer [8] [9].
* Corresponding author: benradi.gmail@gmail.com
In this paper, we designed a deep learning model with a lightweight CNN architecture using the COVID-QU chest X-ray database. For this purpose, the set of images was pre-processed using the histogram equalization technique in order to obtain a balanced distribution of intensity, which improves the evaluation metrics. The dataset was then split into two subsets, Train and Test. Finally, we implemented our CNN architecture in order to obtain a model that efficiently detects the presence of COVID-19 in patients suspected to be positive from a chest X-ray image, helping medical staff to assess the status of patients quickly in order to minimize the damage.

Related work
Several research works have aimed to train a model based on a CNN architecture to confirm the presence of COVID-19 from a pulmonary CXR. The authors in [10] proposed two models for the classification of COVID-19 CXR images: the first is based on a MobileNet architecture and the second on a ResNet architecture, both modified to solve the vanishing gradient problem and improve the classification performance. The results show that the first method achieved an accuracy of 99.6% and the second an accuracy of 99.3%. Another method based on a CNN architecture for the automatic detection and identification of COVID-19 was proposed in [11], recording an accuracy of 99.64%. In addition, a method combining a CNN architecture and a long short-term memory (LSTM) network was implemented in [12] to automatically diagnose COVID-19 from CXR images. The simulation results of this method reached an accuracy of 99.4%. In [13], another model for detecting patients affected by COVID-19 was proposed, based on a faster VGG-16 (Visual Geometry Group) architecture using a CNN architecture. This method achieved an accuracy of 99.28%.

Histogram equalization, described by Gonzalez and Woods in [16], operates independently on each pixel of the original image using its cumulative histogram. It consists in applying a transformation T independently to each pixel of an image coded on L levels. To perform this operation, we define the number of occurrences of a level $x_k$, denoted $n_k$. The probability of occurrence of a pixel of level $x_k$ in an image is then given by equation (1):

$$p_x(x_k) = \frac{n_k}{n}, \quad 0 \le k < L \qquad (1)$$

where $n$ is the number of pixels in the image and $p_x$ is the histogram normalized to [0,1]. The transformation T associates a new value $s_k = T(x_k)$ with each pixel of value $x_k$. This transformation is defined by equation (2):

$$s_k = T(x_k) = \sum_{j=0}^{k} p_x(x_j) = \sum_{j=0}^{k} \frac{n_j}{n} \qquad (2)$$
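The histogram equalization defined by equations (1) and (2) can be illustrated with a minimal NumPy sketch. It assumes 8-bit grayscale images (L = 256) and rescales the cumulative values $s_k$ to the integer range [0, L-1]. The function name `equalize_histogram`, the rescaling step and the synthetic test image are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def equalize_histogram(image: np.ndarray, levels: int = 256) -> np.ndarray:
    """Equalize an 8-bit grayscale image using its cumulative histogram."""
    # n_k: number of pixels at each intensity level x_k
    counts = np.bincount(image.ravel(), minlength=levels)
    # Equation (1): p_x(x_k) = n_k / n
    probs = counts / image.size
    # Equation (2): s_k = T(x_k) = sum of p_x(x_j) for j <= k, in [0, 1]
    cdf = np.cumsum(probs)
    # Assumption: map s_k back to integer levels in [0, levels - 1]
    lookup = np.round(cdf * (levels - 1)).astype(np.uint8)
    # Apply the transformation T independently to every pixel
    return lookup[image]

# Hypothetical low-contrast image: intensities confined to [100, 140)
rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 140, size=(64, 64), dtype=np.uint8)
equalized = equalize_histogram(low_contrast)

print("before:", low_contrast.min(), low_contrast.max())
print("after: ", equalized.min(), equalized.max())
```

Because the transformation is a lookup table built from the cumulative histogram, it is monotonic: the ordering of pixel intensities is preserved while the occupied range is stretched.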

Methodology
Figure 2 below shows an example of histogram equalization performed on a chest X-ray image:

Method
The COVID-QU database of chest radiography images [14] [15] was used to train our model. It contains 21165 images divided into four categories: 3616 COVID-19 positive images, 10192 normal images, 6012 lung opacity images and 1345 viral pneumonia images. We focus on just two categories, COVID-19 positive and normal cases. Our method is based on a convolutional neural network architecture whose objective is to produce a model capable of identifying the presence of contamination in the lungs of a patient from a chest X-ray image. For this purpose, we followed the steps below. We begin with a pre-processing phase that prepares the set of images used. To do so, we labeled all the images with their appropriate category (COVID or Normal). We then performed histogram equalization, whose purpose is to adjust the contrast of each image and distribute the intensities over the entire range of values; this yields a new image by applying an independent operation to each pixel of the original image using its cumulative histogram. Looking at the cumulative histogram in Figure 2, we can see that, after histogram equalization, the intensity has been distributed throughout the image.
After completing the pre-processing phase, we formed two sets of data: the first, called Train, is used in the training phase of the model, and the second, called Test, is used in the validation phase of the model. A CNN architecture was then developed to generate our model. This architecture is defined as follows:
-Three convolution layers that successively use 32, 64 and 128 filters, a 3x3 kernel and ReLU as the activation function.
-Each convolution layer is followed by a Maxpool layer with a pool size of (2,2), which reduces the input size by half.
-A Flatten layer to flatten all values.
-Two fully connected dense layers: the first uses ReLU as the activation function, followed by a Dropout with a rate of 0.5 to avoid overfitting, and the second uses Softmax as the activation function for classification.
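To make the layer list above concrete, the following sketch traces the output shape and parameter count of each layer in pure Python. Several values are not given in the text and are assumptions chosen only for illustration: a 128x128 single-channel input, "same" padding in the convolutions, 128 units in the first dense layer and 2 output classes. The helper name `trace_architecture` is hypothetical.

```python
def trace_architecture(input_size=128, in_channels=1, filters=(32, 64, 128),
                       kernel=3, dense_units=128, num_classes=2):
    """Return (layer name, output shape, parameter count) for each layer."""
    rows = []
    size, channels = input_size, in_channels
    for i, n_filters in enumerate(filters, start=1):
        # 3x3 convolution + ReLU; "same" padding (assumed) keeps the spatial size
        params = kernel * kernel * channels * n_filters + n_filters
        channels = n_filters
        rows.append((f"conv{i}", (size, size, channels), params))
        # (2,2) max pooling halves each spatial dimension
        size //= 2
        rows.append((f"maxpool{i}", (size, size, channels), 0))
    # Flatten turns the feature maps into one vector
    flat = size * size * channels
    rows.append(("flatten", (flat,), 0))
    # First dense layer (ReLU), then Dropout at rate 0.5 (no parameters)
    rows.append(("dense_relu", (dense_units,), flat * dense_units + dense_units))
    rows.append(("dropout_0.5", (dense_units,), 0))
    # Second dense layer (Softmax) produces one probability per class
    rows.append(("dense_softmax", (num_classes,),
                 dense_units * num_classes + num_classes))
    return rows

for name, shape, params in trace_architecture():
    print(f"{name:14s} {str(shape):18s} {params:>10,d}")
```

Under these assumptions, most of the parameters sit in the first dense layer, which is why the input resolution has a strong effect on how lightweight the model actually is.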
The figure below shows our CNN architecture.
The Receiver Operating Characteristic (ROC) curve is a method for evaluating the performance of a binary classifier. It is a curve that plots the true positive rate against the false positive rate.
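As an illustration of how a ROC curve is obtained, the sketch below sweeps a decision threshold over classifier scores and records the true positive rate and false positive rate at each step, then estimates the area under the curve with the trapezoidal rule. The labels and scores are made-up toy values, the function names are hypothetical, and the sketch assumes all scores are distinct.

```python
import numpy as np

def roc_curve_points(y_true, scores):
    """Return (fpr, tpr) arrays obtained by lowering the threshold step by step."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    # Sort samples from the highest score to the lowest
    order = np.argsort(-scores, kind="stable")
    y_sorted = y_true[order]
    # Cumulative true positives and false positives as the threshold drops
    tps = np.cumsum(y_sorted == 1)
    fps = np.cumsum(y_sorted == 0)
    # Prepend the (0, 0) point, where nothing is predicted positive
    tpr = np.concatenate(([0.0], tps / tps[-1]))
    fpr = np.concatenate(([0.0], fps / fps[-1]))
    return fpr, tpr

def area_under_curve(fpr, tpr):
    """Trapezoidal estimate of the area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy example: 1 = positive, 0 = negative (values invented for illustration)
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_curve_points(y_true, scores)
print("FPR:", np.round(fpr, 3))
print("TPR:", np.round(tpr, 3))
print("AUC:", round(area_under_curve(fpr, tpr), 4))
```

A curve that rises quickly toward the top-left corner, and therefore has an area close to 1, indicates a classifier that separates the two classes well.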

Results
Our simulations were performed on the COVID-QU image database in order to compute the evaluation metrics and to compare the impact of each operation performed on the set of images. To better measure the performance of our proposed method, two other architectures, a CNN and MobileNet, were tested for comparison. All models were trained for 100 epochs. Table 1 shows all the results of our simulations. We notice that the processing applied to the images has a direct influence on the results. The proposed method recorded a considerable increase in the evaluation metrics compared to using a CNN architecture without pre-processing.
The confusion matrices were generated using the test set, which consists of 600 images, 300 from patients with COVID-19 and 300 from normal patients, as shown in Figure 4. The confusion matrices reveal that the proposed method outperformed all other methods: it correctly identified 296 positive cases and misclassified 4 positive cases as Normal, and it correctly identified 283 Normal cases and misclassified 17 Normal cases as positive. The MobileNet architecture recorded the least satisfactory results, classifying 25 normal cases as positive, while missing just 4 positive cases by classifying them as normal. The ROC curves shown in Figure 5 below, which were used to select the optimal models and reject the suboptimal ones, reveal that the proposed method performed best among all models. This result indicates that our model is able to distinguish positive from negative cases.
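The confusion-matrix counts reported above for the proposed method can be turned into the usual metrics with a short sketch. It uses the standard definitions and assumes that COVID-19 is the positive class. Accuracy does not depend on that choice, but sensitivity, specificity and precision do, so values computed under this convention need not coincide with those reported in the paper. The function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1_score": f1_score,
    }

# Counts for the proposed method, assuming COVID-19 is the positive class:
# 296 positives detected, 4 positives missed,
# 283 normals detected, 17 normals flagged as positive.
metrics = confusion_metrics(tp=296, fn=4, tn=283, fp=17)
for name, value in metrics.items():
    print(f"{name:12s} {100 * value:6.2f}%")
```

The accuracy computed this way is (296 + 283) / 600, which agrees with the 96.5% reported for the proposed method.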

Discussion
The simulation results show that the pre-processing applied to the set of images has a positive impact on the evaluation metrics. The CNN without pre-processing reached an accuracy of 96%, a specificity of 96.93% and a precision of 97%, whereas the proposed method improved all these values, reaching an accuracy of 96.5%, a specificity of 98.60% and a precision of 98.66%. The MobileNet architecture, despite its complex architecture, which consists of more than 300 layers, recorded the least satisfactory results, with an accuracy of 95%.
In addition, a comparison of the accuracy reported in the related works with that of the proposed method, shown in the figure below (Fig. 6), indicates that the proposed method is competitive.

Conclusion
In this paper, we have developed a new medical image processing method for the detection of COVID-19 from chest X-ray images. The images are first pre-processed with the histogram equalization technique to distribute the intensity over each image, and are then passed to a lightweight CNN architecture in order to train our model to detect the presence of COVID-19 in X-ray images. The evaluation of our model was performed using two tools, the confusion matrix and the ROC curve. Our method gave satisfactory results and recorded an accuracy of 96.5%. As future work, it would be preferable to build our own database of chest radiography images to better test our model, and to improve the model's scores using other techniques such as segmentation.