Automated Non-invasive Diagnosis of Melanoma Skin Cancer using Dermoscopic Images

Melanoma skin cancer is one of the deadliest cancers today, and its incidence is rising exponentially. If it is not detected and treated early, it will most likely spread to other parts of the body. Properly detecting melanoma currently requires a skin biopsy, which is an invasive technique; hence the need for a diagnosis system that can replace the biopsy. In the proposed method, features are extracted from dermoscopic images and a neural network classifies the images as benign or malignant from the extracted features. The proposed method is observed to detect and correctly classify malignant and non-malignant lesions.


Introduction
Skin cancer is a life-threatening disease, and in recent years it has become one of the most lethal forms of cancer found in human beings. Of the various types of skin cancer, malignant melanoma is among the most dangerous, aggressive, and unpredictable, and its occurrence has been increasing quickly. Melanoma, also known as malignant melanoma, is a cancer that develops from pigment-containing cells known as melanocytes. It may develop from a mole and bring about changes that include an increase in size, an irregular border, a change in color, breakdown of the skin, or itchiness. Melanoma is usually caused by over-exposure to ultraviolet (UV) radiation from the sun and tanning beds, which damages the DNA of the skin cells so that the cells can begin to grow out of control. Because melanoma is so unpredictable, detecting it at an early stage helps in curing it quickly and efficiently. Figure 1 shows a dermoscopic image of a skin melanoma.
The ABCD rule is the most common method used by doctors for the detection of melanomas. The symptoms taken into consideration are:
• A - Asymmetry: One half of the spot/lesion does not match the other half.
• B - Border: The edges of the lesion are ragged, irregular, or blurred.
• C - Color: The color across the lesion is not homogeneous and may include different shades of white, red, brown, blue, gray, or black.
• D - Diameter: The lesion is larger than 6 mm across.
* e-mail: huda.gm.khan@gmail.com
Melanoma of the skin is the 19th most commonly occurring cancer in men and women [1]. There were about 300,000 new cases of melanoma skin cancer in 2018. In 2019, there were 104,350 patients with active melanomas, which resulted in 11,650 deaths worldwide. In 2018, the highest rate of melanoma was observed in Australia, followed by New Zealand. In 2018, India had about 26% of cancers reported as melanoma of the skin. Skin melanomas are treated according to the stage and location of the melanoma [2]. Treatment options include surgery to remove the affected area, radiation therapy, biological therapy, targeted therapy, immunotherapy, and chemotherapy. However, melanoma can be treated easily if it is identified and diagnosed at an early stage.
This makes early diagnosis critical, yet it can be a challenging task for dermatologists because other skin lesions may have similar physical characteristics. Dermoscopy is a widely used technique for capturing melanomas accurately through in-vivo observation of pigmented skin lesions. Dermoscopic images have shown great promise for the detection of malignant melanomas, but their interpretation is both time-consuming and subjective. As a result, a system that can help dermatologists make an accurate decision for the early diagnosis of melanoma has become critical. In this paper, we propose an automated non-invasive diagnosis of melanoma skin cancer using dermoscopic images. The skin lesion is represented with simple yet efficient descriptors and is then classified as malignant or non-malignant using a neural network.

Literature Survey
Global warming has increased the intensity of solar radiation, which has led to a rise in skin melanoma in human beings [2]. Melanoma can be cured if it is detected in the early stages. The traditional approach to melanoma detection requires a biopsy, an invasive technique that can be painful, costly, and slow. An automated procedure that detects skin melanoma accurately is therefore a necessity in the medical field. Melanomas occur in varied shapes, colors, and sizes, which makes them difficult to detect at an early stage; this is why it is essential to design a system that takes the proper features into account for extraction. Diagnosis of skin melanoma using an automated system includes the following steps:
• Image Acquisition: The input set of images is collected.
• Image Pre-processing: Unwanted noise and distortions are removed from the input images.
• Image Segmentation: The skin lesion is extracted from the input image for further analysis.
• Feature Extraction: The relevant features required to correctly detect melanoma are selected and extracted from the skin lesion. The most common method of feature extraction is the ABCD rule, where:
  - A - Asymmetry: Asymmetry is measured about the lesion's axes of symmetry. Benign lesions tend to be symmetric, while malignant lesions are asymmetric in shape.
  - B - Border: This is a measure of the irregularity of the lesion's border. Benign lesions are less ragged and have a smooth-looking border, while malignant lesions are highly irregular.
  - C - Color: The color score of a lesion is a crucial feature. Benign lesions are more homogeneously colored throughout, while malignant lesions are not uniform in color and may contain various undertones of brown, black, red, white, or blue.
  - D - Diameter: A malignant melanoma tends to grow in size and exceed 6 mm in diameter.
• Classification: The system classifies the input image as either malignant (cancerous) or benign (non-cancerous).
Majumder and Ullah [3] proposed a system that considered additional features beyond the fundamental ABCD rule. Image resizing and contrast adjustment were used for image pre-processing, and an averaging filter was applied to the RGB input image to remove noise such as air bubbles or hair. The skin lesion was segmented by Otsu's automatic thresholding, after which gray-scale masking removed any small blobs present in the segmented lesion. The features extracted were: Asymmetry (across the x- and y-axes), Border Irregularity, Color Variegation, Lesion Diameter, and the Difference between the Maximum and Minimum Feret's Diameters. On analysis, a higher difference was found to signify a malignant lesion and a lower difference a benign one. The extracted features were then used to train a Back-propagation Neural Network (BNN) that classifies the skin lesion as malignant or benign.
Eltayef et al. [5] proposed another system that focused on image pre-processing and segmentation techniques. Pre-processing in this system aimed at detecting and removing hair and noise. To eliminate air bubbles, a thresholding method based on the pixels' intensity values was used. For hair removal, the system first uses a bank of 64 directional filters to detect hair, filtering the image with each of these filters. Subsequently, a Gaussian filter is applied by calculating each pixel's local maximum. This thresholding step differentiates the hairs from the actual background of the image. Once the hair is detected, the binary hair mask and the gray-scale image are multiplied. This system used two-step segmentation: Fuzzy c-means (FCM) and Markov Random Field (MRF).
Premaladha and Ravichandran [6] emphasized the need to extract a larger number of features. Their system also uses a different image enhancement and segmentation technique. An input dataset of 992 dermoscopy images is used, and each image is pre-processed to remove noise, air bubbles, scars, or hairs on the skin lesion. Contrast Limited Adaptive Histogram Equalization (CLAHE) is used to obtain a contrast-enhanced image, which helps in deriving the features accurately. Median filtering is applied to eliminate noise and smooth out the ragged edges of the skin lesion. Normalized Otsu's Segmentation is used for segmentation. The features extracted are: Mean, Standard deviation, Variance, Entropy, Contrast, Homogeneity, Energy, Correlation, Area, Perimeter, Diameter, Asymmetry index, Circularity index, Fractal dimension, and Compactness index. Deep learning based neural networks (DLNN) and Hybrid AdaBoost algorithms were used for classification of the skin melanomas.
Jain et al. [7] proposed a system that can take the input image from any camera, e.g. a mobile camera. Image processing techniques such as gamma correction were used, which facilitated image resizing and adjustment of contrast and brightness. Image segmentation was done using Otsu's automatic thresholding, binary masking, and edge detection. The features extracted in this system were geometric features of the skin lesion: Area, Perimeter, Major Axis Length, Minor Axis Length, Circularity Index, and Irregularity Indices. A pre-defined set of thresholds based on the ABCD rule was set for the classification stage. The output classifies the lesion as a normal skin mole or melanoma skin cancer.
Table 1: Methods used in existing systems
Jain et al. [7]: Image resizing, contrast and brightness adjustment. Features: Area, Perimeter, Major and Minor Axis Lengths, Circularity Index, Irregularity Index. Classification done by the ABCD rule.
Sheha et al. [8]: Image resizing, RGB to grey-level conversion. Uses texture analysis.
Isasi et al. [9]: ABCD features along with features extracted from pattern recognition algorithms. Classification done by the ABCD rule.
She et al. [10]: High-pass filtering and gradients in the horizontal and vertical directions. Snake-based edge detection technique. Features: Asymmetry, Border irregularity, Color variegation, Diameter, and one feature set extracted from the pattern of the skin lesion.
Sheha et al. [8] proposed a system that uses texture analysis and eliminates the segmentation step. A set of 102 dermoscopy images, 51 benign and 51 malignant, was used, with each image resized to 512x512. The RGB image is converted to gray level, and features based on the grey-level co-occurrence matrix are then computed. The features extracted are: Contrast, Correlation, Cluster Prominence, Dissimilarity, Homogeneity, Difference variance, Difference entropy, two Information measures of correlation, Inverse difference homogenous, Inverse difference normalized, and Inverse difference moment normalized. For classification, a Multilayer Perceptron (MLP) is used.
Isasi et al. [9] proposed different pattern recognition algorithms based on the nature of the skin melanomas. Three algorithms (globular, reticulated, and blue pigmentation) are proposed, as these patterns are recurrent in malignant melanomas. The features extracted are therefore those of the fundamental ABCD rule in addition to the features extracted by the pattern recognition algorithms.
She et al. [10] designed a system that uses the ABCD rule to extract features from the skin lesion, which helps improve the system's classification accuracy. High-pass filtering and gradients in the horizontal and vertical directions are applied for image pre-processing, after which a snake-based edge detection technique defines the lesion boundary. The features extracted are those of the ABCD rule plus one from the pattern of the skin lesion. Furthermore, Principal Component Analysis (PCA) is used for classification. Table 1 summarizes the various methods used in existing systems. Table 2 compares the features extracted by the existing method and by the proposed method.
Table 2: Features extracted by the existing and proposed methods
Isasi et al. [9]: ABCD features along with features extracted from pattern recognition algorithms.
Proposed System: Asymmetry along x-axis, Asymmetry along y-axis (the concept of pixel connectivity is used to evaluate the principal axes for the asymmetry scores), Area to perimeter ratio, Compactness index, Product of Area and Perimeter, Color score evaluated with a bitwise AND operation on the HSV image and color mask, Average of lesion diameter, Difference between the major and minor axis lengths, and Lesion diameter.
The proposed methodology comprises the following primary steps: image acquisition, image pre-processing, image segmentation, feature extraction, and classification of the image. The input to the system is an image of a skin lesion that is to be classified as a benign or malignant melanoma. The input images are collected from the PH2 database [11] and the International Skin Imaging Collaboration (ISIC). Removing noise and hairs from the input image is crucial, which is why morphological and blackhat filtering and an inpainting algorithm are applied. To segment the skin lesion from the input image, Otsu's algorithm [12] and the Chan-Vese model [13] are used. The system extracts nine features according to the fundamental ABCD rule and uses an ANN classifier to classify the lesion as benign or malignant melanoma.

Image Acquisition
The PH2 database is a publicly available database of dermoscopy images provided by the Pedro Hispano Hospital. From this database, 200 melanocytic images were collected, consisting of 160 benign and 40 malignant lesions. Additionally, 219 benign and 325 malignant images were collected from another publicly available database of dermoscopic images, that of the International Skin Imaging Collaboration (ISIC). This gives a total dataset of 744 images.

Image Pre-processing and segmentation
The input image can contain a lot of noise and other artifacts, such as air bubbles, hairs, and varied skin textures. It is therefore essential that the image is properly pre-processed and correctly segmented to increase the accuracy of the system. Pre-processing is done by performing morphological operations on the grayscale image. After the morphological filter is applied to the grayscale image, a blackhat filtering operation is used to enhance the dark structures on the skin lesion as the region of interest. Finally, an inpainting algorithm is used: a hair mask that highlights the hair contours is created by thresholding, and the original image is then inpainted according to the mask. The inpainting algorithm restores the background beneath the hairs detected in the hair mask. Suppose the hair mask, $H$, specifies the location of the hair contours' pixels in the input image, $I$, as follows:

$$H(x, y) = \begin{cases} 0, & \text{if pixel } (x, y) \text{ has hair in } I \\ 1, & \text{if pixel } (x, y) \text{ has no hair in } I \end{cases}$$

Once the hairs are located, their pixels are reconstructed by exploiting the information present in the regions with no hair.
Segmentation is an important step in digital image processing. Separating the region of interest from the surrounding skin is crucial, as the segmented image is later used for feature extraction. Segmentation is performed in two steps using Otsu's thresholding method [12] and the Chan-Vese model [13]. Morphological operations are then performed to remove any remaining noise in the image, which allows accurate segmentation. Otsu's method separates the foreground and background pixels into two classes using a bi-modal histogram. The algorithm selects the threshold for which the inter-class variance of the two classes is as large as possible. Algorithm 1 shows Otsu's thresholding algorithm.
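Algorithm 1: Otsu's Thresholding Algorithm
1 Evaluate the histogram and the probability $p(i)$ of each intensity level $i$, where $L$ is the number of intensity levels.
2 Set the initial values of $\omega_i(0)$ and $\mu_i(0)$.
3 Compute the class probabilities of the two classes, $\omega_i$, and their class means, $\mu_i$, as:
$$\omega_0(t) = \sum_{i=0}^{t-1} p(i), \qquad \omega_1(t) = \sum_{i=t}^{L-1} p(i)$$
$$\mu_0(t) = \frac{1}{\omega_0(t)} \sum_{i=0}^{t-1} i \, p(i), \qquad \mu_1(t) = \frac{1}{\omega_1(t)} \sum_{i=t}^{L-1} i \, p(i)$$
4 Step through all possible thresholds $t$ from $t = 1$ to the maximum intensity.
5 Update $\omega_i(t)$ and $\mu_i(t)$.
6 Evaluate $\sigma_b^2(t)$ as:
$$\sigma_b^2(t) = \omega_0(t) \, \omega_1(t) \, [\mu_0(t) - \mu_1(t)]^2$$
7 Select the threshold $t$ for which $\sigma_b^2(t)$ is maximum.

The Chan-Vese model is an active contour model in which the contour must stop once it reaches the boundary separating the foreground and background. The binary output image of Otsu's method, $B$, is used as the initial contour for the Chan-Vese model. The model minimizes the energy function $E$ given in Equation 1:

$$E(c_1, c_2, C) = \mu \cdot (\mathrm{Length}(C))^p + \nu \cdot \mathrm{Area}(\mathrm{inside}(C)) + \lambda_1 \int_{\mathrm{inside}(C)} |u_0(x, y) - c_1|^2 \, dx \, dy + \lambda_2 \int_{\mathrm{outside}(C)} |u_0(x, y) - c_2|^2 \, dx \, dy \quad (1)$$

Of the parameters $\mu$, $\nu$, $\lambda_1$, $\lambda_2$ and $p$, the values used are $p = 1$, $\nu = 0$, and $\lambda_1 = \lambda_2 = 1$, according to the Mumford-Shah approach [14]. $c_1$ and $c_2$ are the averages of the given image, $u_0$, inside and outside the contour $C$ respectively. After the two-step segmentation, a median filter of window size 25 and a morphological closing operation are applied to the image. The pre-processed and segmented versions of two sample images are shown in Figure 2.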
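As a rough illustration of the first segmentation step, Algorithm 1 can be sketched in plain Python. This is a minimal sketch and not the authors' implementation: the function name `otsu_threshold`, the flat list of 8-bit intensities, and the toy pixel values are assumptions made for the example, and the lesion is assumed to be darker than the surrounding skin.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Class 0 holds intensities below t, class 1 holds intensities >= t.
    """
    # Histogram of the intensity levels.
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    weighted_total = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    n0 = 0  # number of pixels in class 0
    s0 = 0  # sum of intensities in class 0
    # Step through every candidate threshold, updating the class statistics.
    for t in range(1, levels):
        n0 += hist[t - 1]
        s0 += (t - 1) * hist[t - 1]
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue  # one class is empty, so the variance is undefined
        mu0 = s0 / n0
        mu1 = (weighted_total - s0) / n1
        var_between = (n0 / total) * (n1 / total) * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Toy data (assumed): 40 dark lesion pixels followed by 40 bright skin pixels.
lesion_pixels = [20, 22, 25, 30] * 10
skin_pixels = [200, 205, 210, 220] * 10
pixels = lesion_pixels + skin_pixels

t = otsu_threshold(pixels)
# Assumption: the lesion is the darker class, so it lies below the threshold.
lesion_mask = [1 if v < t else 0 for v in pixels]
```

The resulting binary mask plays the role of $B$, the initial contour handed to the Chan-Vese step; that second step is not sketched here.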

Feature Extraction
Once the skin lesion of interest is extracted from the input image, the appropriate features must be extracted from it. This paper relies heavily on the accuracy of the ABCD rule, so we continue to use the fundamental rule and supplement it with additional features. Features such as Asymmetry and Diameter require the principal axes for the evaluation of their scores. To find these axes, the concept of pixel connectivity in digital space is used. A pixel $p$ at coordinates $(x, y)$ is connected to its neighboring pixels horizontally, vertically, and diagonally. As shown in Figure 3, the center pixel $p$ has 8 connected pixels, which means that only four axes are possible through this pixel. After the centroid of the skin lesion is identified and translated to the origin of the coordinate system, the lengths of the four axes passing through the centroid are evaluated and compared. The axis with the greatest length is assigned as the major axis. As principal axes are always perpendicular to each other, the axis perpendicular to the major axis is assigned as the minor axis. For instance, in Figure 3, if 7-3 is the major axis, then 1-5 is assigned as the minor axis.
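The axis selection described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the names `DIRECTIONS`, `centroid`, `axis_length`, and `principal_axes` are hypothetical, the centroid is rounded to the nearest pixel and assumed to lie inside the lesion, and a diagonal step is assumed to count as $\sqrt{2}$ pixel widths.

```python
import math

# The four axes through a pixel under 8-connectivity, as (dx, dy) steps:
# horizontal, diagonal, vertical, anti-diagonal.
# Direction k is perpendicular to direction (k + 2) % 4.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1)]


def centroid(mask):
    """Centroid of the lesion pixels, rounded to the nearest pixel."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    cx = round(sum(x for x, _ in points) / len(points))
    cy = round(sum(y for _, y in points) / len(points))
    return cx, cy


def axis_length(mask, cx, cy, dx, dy):
    """Length of the lesion along one axis through (cx, cy)."""
    height, width = len(mask), len(mask[0])
    steps = 0
    for sign in (1, -1):  # walk away from the centroid in both directions
        x, y = cx + sign * dx, cy + sign * dy
        while 0 <= x < width and 0 <= y < height and mask[y][x]:
            steps += 1
            x, y = x + sign * dx, y + sign * dy
    return steps * math.hypot(dx, dy)


def principal_axes(mask):
    """Return (major direction, major length, minor direction, minor length)."""
    cx, cy = centroid(mask)
    lengths = [axis_length(mask, cx, cy, dx, dy) for dx, dy in DIRECTIONS]
    major = max(range(4), key=lengths.__getitem__)
    minor = (major + 2) % 4  # the perpendicular axis
    return DIRECTIONS[major], lengths[major], DIRECTIONS[minor], lengths[minor]


# Toy lesion (assumed): a 7-pixel-wide, 3-pixel-tall bar in a 9x5 image.
mask = [[1 if 1 <= y <= 3 and 1 <= x <= 7 else 0 for x in range(9)]
        for y in range(5)]
major_dir, major_len, minor_dir, minor_len = principal_axes(mask)
```

For the toy bar, the horizontal axis is the longest, so the vertical axis becomes the minor axis, mirroring the 7-3 and 1-5 pairing in the text.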

Asymmetry
Asymmetry is one of the most important features for classifying a skin lesion as malignant or benign. It evaluates how similar one half of the lesion is to the other half. A malignant melanoma tends to be more asymmetric. To calculate the asymmetry, the image obtained from segmentation (Figure 4(a)) is aligned with the Euclidean coordinate system. The centroid of the image (Figure 4(b)) is translated to the origin of the coordinate system (Figure 4(c)). The image is then rotated by the orientation angle so that the major axis of the blob lies along the x-axis, giving a lesion image, $L$ (Figure 4(d)). This rotated image is flipped along the x-axis to give $L_x$ (Figure 4(e)). The difference between $L$ and $L_x$ gives the non-overlapping region along the x-axis, $L'_x$ (Figure 4(f)). Asymmetry along the y-axis is computed in a similar fashion, as shown in Figure 4(g), 4(h), and 4(i).
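The flip-and-compare step can be illustrated with a small sketch. It assumes the mask has already been centered and rotated so that the principal axes coincide with the middle row and column of the array. The text here does not state how the non-overlapping region is turned into a score, so normalizing its area by the lesion area is an assumption, as are the function names.

```python
def flip_along_x(mask):
    """Mirror the mask across the horizontal (x) axis by reversing the rows."""
    return mask[::-1]


def flip_along_y(mask):
    """Mirror the mask across the vertical (y) axis by reversing each row."""
    return [row[::-1] for row in mask]


def non_overlap_area(a, b):
    """Number of pixels where the two masks differ (the non-overlapping region)."""
    return sum(p != q for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def asymmetry_scores(mask):
    """Return (A1, A2): asymmetry along the x-axis and along the y-axis.

    Assumption: each score is the non-overlapping area divided by the lesion area.
    """
    area = sum(map(sum, mask))
    a1 = non_overlap_area(mask, flip_along_x(mask)) / area
    a2 = non_overlap_area(mask, flip_along_y(mask)) / area
    return a1, a2


# Toy masks (assumed), already aligned with the axes.
symmetric = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
lopsided = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]

sym_scores = asymmetry_scores(symmetric)
lop_scores = asymmetry_scores(lopsided)
```

A perfectly symmetric mask scores zero on both axes, while the lopsided mask scores above zero, which matches the expectation that malignant lesions yield larger asymmetry values.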

Border
Border defines the contour of the blob on the skin lesion. A malignant melanoma is highly irregular and can have ragged edges. The border features are B1, B2, and B3, which are the Area to perimeter ratio, the Compactness index, and the Product of Area and Perimeter respectively. B1 and B2 give smaller values for malignant lesions. B2 represents the smoothness of the lesion's edges and ranges between 0 and 1. In malignant lesions the edges are uneven, giving a B2 score closer to 0, whereas a benign lesion has a B2 score closer to 1. Additionally, B3 tends to take greater values for malignant melanomas. These are calculated by Equations (6), (7), and (8) respectively, where $A$ is the area and $P$ the perimeter of the lesion:
$$B_1 = \frac{A}{P} \quad (6)$$
$$B_2 = \frac{4 \pi A}{P^2} \quad (7)$$
$$B_3 = A \cdot P \quad (8)$$
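A short worked example may make the behavior of the three border features concrete. The sketch assumes the compactness index takes the standard form $4\pi A / P^2$, which equals 1 for a perfect circle and matches the stated 0 to 1 range; the function name and the sample values are made up for illustration.

```python
import math


def border_features(area, perimeter):
    """Return (B1, B2, B3) for a lesion with the given area and perimeter."""
    b1 = area / perimeter                          # area to perimeter ratio
    b2 = 4 * math.pi * area / perimeter ** 2       # compactness (assumed form)
    b3 = area * perimeter                          # product of area and perimeter
    return b1, b2, b3


# A smooth, circular lesion of radius 10 pixels.
radius = 10
smooth = border_features(math.pi * radius ** 2, 2 * math.pi * radius)

# Same area, but a ragged border that doubles the perimeter.
ragged = border_features(math.pi * radius ** 2, 4 * math.pi * radius)
```

With the area held fixed, doubling the perimeter halves B1, quarters B2, and doubles B3, which is the direction of change the text associates with malignant lesions.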

Color
In determining the color score, six colors are considered important: black, dark brown, light brown, blue-gray, red, and white. The segmented image is used as a mask on the original RGB image to obtain a colored segmented image. RGB threshold values for each of the six colors are initialized, and the image is transformed into the HSV color space. A mask is created for each color. A bitwise AND operation between the HSV image and each color mask determines the regions of that color and their contours. If the number of contours found is greater than zero, that color is assumed to be present in the lesion.
Since up to six colors may be present in the lesion, the color score, C, ranges from 1 to 6. Two images with color scores 1 and 3 are shown in Figure 5(a) and Figure 5(b).
(a) (b) Figure 5: Color scores of two images
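The color counting can be sketched with the standard-library `colorsys` module. This is only an approximation of the procedure described above: the paper's threshold values are not given, so the HSV tests in `COLOR_TESTS` are hypothetical, and "at least one lesion pixel passes the test" stands in for "at least one contour is found".

```python
import colorsys

# Hypothetical HSV tests (h, s, v all in [0, 1]); not the paper's thresholds.
COLOR_TESTS = {
    "black": lambda h, s, v: v < 0.2,
    "dark brown": lambda h, s, v: 0.03 <= h < 0.12 and s > 0.4 and 0.2 <= v < 0.5,
    "light brown": lambda h, s, v: 0.03 <= h < 0.12 and s > 0.3 and v >= 0.5,
    "blue-gray": lambda h, s, v: 0.5 <= h < 0.7 and 0.1 <= s < 0.5 and 0.3 <= v < 0.85,
    "red": lambda h, s, v: (h < 0.03 or h > 0.95) and s > 0.5 and v >= 0.5,
    "white": lambda h, s, v: s < 0.15 and v > 0.85,
}


def color_score(image, mask):
    """Return (score, colors present) for the masked lesion pixels.

    image: rows of (R, G, B) tuples in 0..255; mask: rows of 0/1 flags.
    """
    present = set()
    for image_row, mask_row in zip(image, mask):
        for (r, g, b), inside in zip(image_row, mask_row):
            if not inside:
                continue  # pixel lies outside the segmented lesion
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            for name, test in COLOR_TESTS.items():
                if test(h, s, v):
                    present.add(name)
    return len(present), sorted(present)


# Toy 2x3 image (assumed values) and its segmentation mask.
image = [
    [(10, 10, 10), (101, 67, 33), (255, 255, 255)],
    [(181, 134, 84), (101, 67, 33), (10, 10, 10)],
]
mask = [
    [1, 1, 0],
    [0, 1, 1],
]
score, colors = color_score(image, mask)
full_score, _ = color_score(image, [[1, 1, 1], [1, 1, 1]])
```

Only pixels inside the mask contribute, so the white and light-brown pixels outside the toy lesion do not raise its score.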

Diameter
The diameter of a malignant melanoma may grow beyond 6 mm. The diameter features are D1, D2, and D3, which are the Average of the lesion diameters, the Difference between the major and minor axis lengths, and the Lesion diameter respectively. D1 is the average of the diameters $D_a$ and $D_b$ of the skin lesion. D2 is the difference between the major axis, $D$, and the minor axis, $d$. D3 is the diameter of the lesion in mm, where $M$ is the magnification factor of the original image and 0.2645 is the factor used to convert the diameter from pixels to mm. These are calculated by Equations (9), (10), and (11); the first two follow directly from the definitions above:
$$D_1 = \frac{D_a + D_b}{2} \quad (9)$$
$$D_2 = D - d \quad (10)$$
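The three diameter features can be illustrated with a few lines of Python. D1 and D2 follow the definitions in the text. The exact form of D3 is not recoverable from the text, so the sketch assumes it is the major-axis length in pixels multiplied by 0.2645 and divided by the magnification factor; the function name and sample values are likewise hypothetical.

```python
PIXEL_TO_MM = 0.2645  # pixel-to-mm conversion factor quoted in the text


def diameter_features(d_a, d_b, major, minor, magnification=1.0):
    """Return (D1, D2, D3) from lesion measurements given in pixels."""
    d1 = (d_a + d_b) / 2                         # average of the two diameters
    d2 = major - minor                           # major minus minor axis length
    d3 = major * PIXEL_TO_MM / magnification     # assumed form of the mm diameter
    return d1, d2, d3


# Made-up measurements in pixels, with no magnification.
d1, d2, d3 = diameter_features(d_a=40, d_b=30, major=42, minor=28)
exceeds_6_mm = d3 > 6
```

Under these assumptions the sample lesion measures roughly 11 mm across, above the 6 mm mark that the ABCD rule treats as suspicious.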

Classification
After the features are extracted, a classifier is trained on them. Here, a sequential artificial neural network with four dense layers has been used. The input layer has nine input dimensions (A1, A2, B1, B2, B3, C, D1, D2, and D3) and a batch size of 10, and the second and third dense layers have a batch size of 50. The elu activation function is used in the first two layers and the relu activation function in the third layer. The final output dense layer uses the sigmoid activation function. The classification model has been compiled using the Adam optimizer, and its loss is evaluated using the Mean Squared Error function.
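The layer structure can be made concrete with a dependency-free forward pass. This is a sketch under several assumptions rather than the authors' model: the figures 10 and 50, which the text calls batch sizes, are read here as layer widths; the weights are random and untrained; training with Adam is omitted; and an output of 0.5 or more is taken to mean malignant. The wording resembles a Keras-style Sequential model, but the text does not name the library.

```python
import math
import random


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# (inputs, units, activation) per dense layer; widths 10 and 50 are assumed.
LAYERS = [(9, 10, elu), (10, 50, elu), (50, 50, relu), (50, 1, sigmoid)]


def build_model(seed=0):
    """Create untrained layers with Glorot-uniform weights and zero biases."""
    rng = random.Random(seed)
    model = []
    for n_in, n_out, activation in LAYERS:
        limit = math.sqrt(6.0 / (n_in + n_out))
        weights = [[rng.uniform(-limit, limit) for _ in range(n_in)]
                   for _ in range(n_out)]
        model.append((weights, [0.0] * n_out, activation))
    return model


def predict(model, features):
    """Forward pass: returns the sigmoid output for one feature vector."""
    values = list(features)
    for weights, biases, activation in model:
        values = [activation(sum(w * v for w, v in zip(row, values)) + b)
                  for row, b in zip(weights, biases)]
    return values[0]


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up, pre-scaled feature vector: A1, A2, B1, B2, B3, C, D1, D2, D3.
sample = [0.3, 0.2, 0.5, 0.6, 0.1, 0.5, 0.4, 0.2, 0.3]
model = build_model()
probability = predict(model, sample)
label = "Malignant" if probability >= 0.5 else "Benign"
```

Because the weights are untrained, the predicted label carries no diagnostic meaning; the sketch only shows how nine features flow through the four layers to a single sigmoid output.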

Results
The diagnostic system proposed in this work has been implemented in Python. To test the system, 25 new images of lesions (12 benign and 13 malignant) were used. The system could correctly classify the images as 'Benign' or 'Malignant'. Figure 6(a), 6(b), 6(c), and 6(d) show graphs of the asymmetry scores (A1 and A2), border scores (B1, B2, and B3), color score (C), and diameter scores (D1, D2, and D3) for benign and malignant melanomas. It is observed that for malignant melanomas, A1 and A2 are greater on both axes owing to their asymmetric nature; B1 and B2 are smaller and B3 is greater; D1, D2, and D3 are greater; and C is greater, as the number of colors in them is higher than in benign melanomas. It is also noted that malignant melanomas tend to have two or more colors present in them.

System Performance
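The performance measures calculated are Accuracy, Precision, Recall, and F1-score, using Equations 12, 13, 14, and 15 respectively, where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (12)$$
$$\text{Precision} = \frac{TP}{TP + FP} \quad (13)$$
$$\text{Recall} = \frac{TP}{TP + FN} \quad (14)$$
$$\text{F1-score} = \frac{2TP}{2TP + FP + FN} \quad (15)$$
The system gives an accuracy of 89.93% with a test error rate of 0.074, precision of 95%, recall of 82.61%, and F1-score of 88.37%. Figure 7 shows the ROC curve of the ANN classifier.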
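The four measures are simple functions of the confusion-matrix counts, as the following sketch shows. The counts are hypothetical: the paper does not report its confusion matrix. TP = 19, FP = 1, and FN = 4 were chosen because they happen to reproduce the reported precision, recall, and F1-score, while TN = 26 is an arbitrary assumption and yields 90% accuracy rather than the reported 89.93%.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, precision, recall, f1


# Hypothetical counts; see the note above on how they were chosen.
accuracy, precision, recall, f1 = classification_metrics(tp=19, fp=1, fn=4, tn=26)
```

Note that F1 does not depend on TN at all, which is why the three reported values pin down TP, FP, and FN but leave TN open.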

Classification with Different Classifier Models
The extracted features have also been tested with four other classifier models: Decision Tree, Random Forest, Extra Trees, and Gradient Boosting. Figure 8 shows the ROC curves of the different classifiers and Table 3 shows their performance measures. All the models have an AUC of over 0.85, which is considered good for a classifier model. The accuracy, sensitivity, and specificity scores of the models are also above 85%, meaning that they perform fairly well. This shows that the extracted features are appropriate and can be fed to various classifier models to assist in classifying skin lesions as benign or malignant.

Conclusion and Further Work
The exponential rise of melanoma skin cancer calls for an alternative approach that detects melanomas at an early stage so that they can be cured. The traditional approach to detecting melanomas consists of carrying out a skin biopsy. This method, although widely used, is painful, costly, and time-consuming. There is a pressing need for an automated process that is just as effective to take over from the traditional method, which makes this an important area of research. The computer-aided system proposed in this paper aims to do so by using image processing methods and machine learning algorithms that can help detect melanomas accurately. It provides an automated pipeline with varied methods for pre-processing, segmentation, feature extraction, and classification of the image. However, some areas still require further study. One instance is the segmentation of images of dark-skinned patients. In future work, a larger image dataset will also be used to further improve the training of the neural network classifier.