Colored MRI biomedical image tumor classification and segmentation based on transfer learning of modified Y-Net

By incorporating colored MRI identification into an MRI segmentation model based on transfer learning with a Y-Net architecture, this study demonstrates the high potential of a multidisciplinary, system-level approach to diagnosis. Such a system preserves the integrity of the overall goal without compromising the quality of either task, while saving time. The integrated approach also yields enhancement and segmentation that are accurate and robust to variability in scanners and acquisition protocols. The system-level simulator is a Keras-based deep learning network, Y-VGG16, which achieves outstanding performance in medical image segmentation and differs from the AI diagnosis models reported in the literature. A partially frozen network is applied to the U-Net to compare different fine-tuning (FT) strategies, and network performance is also evaluated as a function of dataset size, showing the importance of combining the dataset, transfer learning (TL), and data augmentation (DA). TL yields more accurate deep learning performance for MRI medical image segmentation. The proposed system hybridizes the Y-Net architecture with TL to reduce the domain-shift effect in automated brain MRI segmentation.


Introduction
Medical imaging science improves medical diagnosis in an efficient and effective manner, helping to find solutions to medical problems. Common modalities include magnetic resonance imaging (MRI), endoscopy, computed tomography (CT), X-ray, positron emission tomography (PET), and ultrasound. MRI plays an important role in brain tumor diagnosis because it reveals human brain structure clearly. Brain tumors are generally difficult to diagnose because they are abnormal cell aggregations that grow within brain tissue [1]. To extract information from MRI, artificial intelligence techniques have attracted great attention due to their successes in intelligent medicine [2], and convolutional neural networks have had a significant impact on digital image processing, especially medical imaging [3]. The main focus of this article is therefore how AI can be applied to enhance medical diagnosis for MRI brain classification. The method presented in the following sub-sections is based on machine learning, specifically deep learning: the general principles of deep learning motivate the identification and segmentation tasks in neuro-oncology, along with their abilities and limitations [4]. Convolutional neural networks (CNNs) [LeCun 1995] are used for image processing and analysis (classification and segmentation) [5]. Local operations such as upsampling, convolution, and pooling exploit the spatial relations between pixels in 2D or voxels in 3D. This design reduces the number of network parameters and the computational cost of very large image inputs, since the operations can be parallelized. CNN-based image segmentation is generally trained in an end-to-end manner. Tumor segmentation in MRI is a particularly challenging task due to the nature of MRI tissue contrast and the diversity of imaging equipment [6].
Most current image segmentation methods are CNN-based, including brain tumor segmentation methods such as [Kamnitsas 2017a] and [Myronenko 2018] [7], [8]. The methodological contribution of this article is to evaluate the main results using a newly proposed DL network, Y-Net. Y-Net performs both identification and segmentation through thousands of convolutions, max-pooling, and up-sampling operations, whereas the typical U-Net [Ronneberger 2015] performs segmentation only [9]. During each training iteration, the intermediate results of the network operations are stored in the memory of the processing unit, and the backpropagation algorithm is applied to compute the gradients of the loss function. A typical MRI volume is composed of several million voxels, and currently available GPUs suffer from a shortage of memory relative to MRI requirements. For this reason, current segmentation models are usually trained on cloud servers.

Related Work
Many recent researchers have focused on the segmentation and classification of brain tumors. The authors of [10] demonstrated a five-step pipeline: first, an enhanced source image was fed to a 17-layer CNN segmentation network; second, a modified MobileNetV2 CNN was used for feature extraction and selection with an entropy-controlled method; third, classification was performed with a multiclass support vector machine (M-SVM); fourth, the tumor was segmented with the 17-layer CNN; and fifth, the tumor was classified as meningioma, glioma, or pituitary. The pipeline achieved an accuracy of 97.47% and a Dice coefficient of 96.71%. Munir et al. [11] segmented brain tumors with a U-Net CNN, aiming to reduce large structural deviations and spatial variability. Their data were taken from the BraTS 2019 dataset and processed by CNNs that are effective at recognizing brain tumors; the resulting system achieved a Dice coefficient of 0.9694. Furthermore, Pravitasari et al. [12] demonstrated U-VGG16 for tumor segmentation, consisting of a contracting path whose convolutional layers carry frozen weights transferred from VGG16, and an expansive path of new convolutional layers that complete the extraction. The whole network has about 2,324,353 parameters and took 32 minutes to train, achieving a loss of 0.054 and an accuracy of 0.96. Finally, Ghazanfar et al. [3] demonstrated a three-stage pipeline for tumor segmentation and classification.
The first stage performs binary classification with a nine-layer CNN of about 217,954 trainable parameters; the second stage segments the tumor using neighboring FCM, which extracts the tumor region by discarding normal tissue; and the third stage classifies the image as edema, necrosis, enhancing, or non-enhancing, with about 241,624 trainable parameters. The resulting accuracies are 96.88% for binary and 96.29% for multi-class classification.

MRI Imaging Deep Learning Network
Multiple MR sequences can image a variety of tumor tissues, including the necrotic core, active rim, and edema. Manual tumor segmentation is difficult and time-consuming, and results vary with the expertise of the human rater, whereas automated segmentation methods can help monitor tumor progression by providing precise localization of tumor sub-regions and their volumes. Nevertheless, the thousands of pooling, convolution, and up-sampling operations make CNN segmentation of tumors in large medical images a very demanding task. Current CNN architectures are designed to extract 2D and 3D context from input images [13]. Probabilistic priors of brain tumors are difficult to use because tumors vary in location, size, and shape; moreover, the intensities of MR tumor voxels overlap strongly with those of other brain structures, and MR image intensities vary greatly from one imaging center to another depending on the acquisition system. All of this information has to be analyzed. Because training from scratch demands extensive computing and time resources, pre-trained models are used through transfer learning (TL). TL improves learning on the target task based on knowledge from the source task, reducing training time [13]. This is implemented by hybridizing the U-Net with another network such as VGG-Net [12], simplifying the U-Net architecture. VGG-Net, shown in Fig. 1, has a layout similar to the encoder of U-Net; the implementation therefore replaces the U-Net encoder path with VGG-Net, hybridizing these two powerful architectures as shown in Fig. 2. The same approach is applied to our proposed Y-Net, and the resulting architecture is discussed later in the analysis and results.
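As a minimal illustration of why freezing a pre-trained backbone reduces training cost (assuming the standard Keras API; the exact encoder configuration used here may differ):

```python
from tensorflow.keras.applications import VGG16

# weights=None keeps this sketch offline; in practice one would pass
# weights="imagenet" to load the filters pre-trained on natural images.
encoder = VGG16(weights=None, include_top=False)
total_params = encoder.count_params()    # ~14.7M convolutional weights
encoder.trainable = False                # freeze the whole encoder
frozen = len(encoder.trainable_weights)  # 0 tensors left to update
```

After freezing, only the newly added layers (decoder and classification head) contribute trainable parameters, which is the source of the large parameter and training-time savings reported below.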

Y-Net with Transfer Learning Architecture
In this section, the Y-Net architecture is introduced for the purpose of detection and classification. Y-Net is constructed on the idea of the U-Net architecture; both are intended for biomedical image segmentation. To describe Y-Net, it is useful to introduce the U-Net architecture briefly. As shown in Fig. 3, the U-shape consists of an encoder (the left path) and a decoder (the right path). The encoder outputs a feature map/vector representing the input information; the decoder has the opposite structure and uses the encoder features to produce the intended output. The encoder reduces the input matrix size while increasing the number of features, and the decoder restores the matrix to its original size while reducing the number of features, yielding a ground-truth prediction for every pixel. To incorporate TL in U-Net, U-VGG16 replaces the encoder path of U-Net as a hybrid of these structures [12]. The VGG16 weights, pre-trained on ImageNet, are applied to the U-VGG16 system, saving both trainable parameters and training time [14]. This research focuses on the structure of Y-Net and how it is applied for identification and segmentation. The Y-Net architecture, given in Fig. 4, comprises two main functions: an encoder-decoder for segmentation, and the encoder plus VGG16 fully connected layers for identification. The Y-Net encoder uses the full convolutional stack of VGG16, while the decoder concatenates the encoder layers into a symmetric stack of transposed convolutional layers; the decoder layers expand in the reverse order of the encoder layers.
The identification function shares the segmentation encoder and adds the last three fully connected layers of VGG16 plus an additional dense block of the same filter size. To make the identifier's output binary, the last layer is a 1×1 convolution with sigmoid activation thresholded at 0.5. The encoder is transfer-trained from VGG16 on ImageNet once for both functions. In other words, we replace the encoder with a pre-trained Visual Geometry Group network to reduce the number of parameters in the convolutional layers, and thus the training time, while classifying whether the brain has a tumor and then segmenting it to calculate the tumor area within one architecture, which we call Y-Net. Because the number of available medical images is limited, TL is suggested for cross-domain learning with ImageNet pre-training: ImageNet-trained features are expert at detecting shapes, edges, and textures in natural images, and the model is then modified and fine-tuned, as has been done with US images [15]. Incorporating VGG16 yields the main proposed system, Y-VGG16Net; VGG16 can be replaced by VGG19 for other applications, yielding Y-VGG19.
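The two-branch structure described above can be sketched in Keras as follows. This is a hedged illustration, not the authors' exact model: the VGG16 layer names are Keras's published names, but the decoder filter counts and the size of the classification head are illustrative assumptions.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

def build_ynet(input_shape=(224, 224, 3), weights=None):
    """Y-Net sketch: one VGG16 encoder shared by a segmentation
    decoder (U-Net style skip connections) and a binary classifier."""
    encoder = VGG16(weights=weights, include_top=False,
                    input_shape=input_shape)
    encoder.trainable = False  # pre-trained encoder is frozen (TL)

    # Skip connections taken at the end of each VGG16 block.
    skips = [encoder.get_layer(n).output
             for n in ("block1_conv2", "block2_conv2",
                       "block3_conv3", "block4_conv3")]
    x = encoder.get_layer("block5_conv3").output

    # Segmentation branch: transposed convs mirror the encoder.
    for skip, f in zip(reversed(skips), (512, 256, 128, 64)):
        x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
    seg = layers.Conv2D(1, 1, activation="sigmoid",
                        name="segmentation")(x)

    # Identification branch: shares the encoder, adds dense layers.
    c = layers.GlobalAveragePooling2D()(encoder.output)
    c = layers.Dense(256, activation="relu")(c)
    cls = layers.Dense(1, activation="sigmoid", name="tumor_present")(c)

    return Model(encoder.input, [seg, cls])
```

Calling `build_ynet(weights="imagenet")` would load the pre-trained encoder filters; only the decoder and dense head then contribute trainable parameters, which is the mechanism behind the parameter savings reported in the next section.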

Y-Net with Training and Testing
Based on the proposed design requirements, the system is implemented on the Kaggle cloud engine to save memory and execution time for both training and testing. Because the libraries and datasets are available on the cloud, nothing needs to be installed locally. Accelerators such as CPU, GPU P100, GPU T4×2, or TPU v3-8 can easily be chosen; GPU T4×2 is used here. To supplement the available data, augmented datasets are created from the original Kaggle datasets: deep learning needs a large corpus of training data to learn effectively, but collecting such data can be very expensive and unrealistic. The original public Kaggle dataset plus augmentation yields about 3929 images, of which 65.05% are positive and the remainder negative. These images are used to train and test the proposed system for brain tumor segmentation and identification. The Y-Net architecture of Fig. 4 is trained and tested in this section for the tasks of segmentation and identification. To evaluate Y-Net as constructed in Section 4.1, we first run U-Net for segmentation and a CNN for classification, giving about 31,031,745 and 30,197,249 trainable parameters respectively; since no weights are pre-trained, training all parameters consumes a lot of time, and the full segmentation-plus-classification process takes about 135 minutes. A pre-trained U-Net for segmentation and VGG16 for classification give about 1,480,929 and 32,769 trainable parameters respectively. The proposed pre-trained Y-Net runs once for both segmentation and classification with about 1,480,929 trainable parameters, and the whole process takes about 45 minutes. To reduce the number of parameters, the architecture of Fig. 4 is used, where the pre-trained encoder is shared by both segmentation and classification.
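The augmentation step above can be sketched as simple geometric transforms applied identically to each image and its segmentation mask, so that the mask stays aligned with the augmented image. The exact transforms used to grow the Kaggle dataset are an assumption here; flips and quarter-turn rotations are a typical minimal choice.

```python
import numpy as np

def augment(image, mask, rng):
    """Return a randomly flipped/rotated copy of (image, mask).
    The same transform is applied to both so they stay aligned."""
    if rng.random() < 0.5:                       # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    k = int(rng.integers(0, 4))                  # 0-3 quarter turns
    return np.rot90(image, k), np.rot90(mask, k)

rng = np.random.default_rng(0)
img = np.arange(16).reshape(4, 4)                # toy 4x4 "image"
msk = (img > 7).astype(np.uint8)                 # toy binary "mask"
aug_img, aug_msk = augment(img, msk, rng)
```

Because image and mask undergo the same transform, each augmented pair remains a valid training example for the segmentation branch.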

Performance Metric
To evaluate our model during the training and testing phases, we apply the most widely used evaluation metrics: the loss function [15], the Dice coefficient DSC(A, B) [16, 17], and the IoU [18], as follows:

DSC(A, B) = 2|A ∩ B| / (|A| + |B|) = 2·TP / (2·TP + FP + FN)    (1)

IoU(A, B) = |A ∩ B| / |A ∪ B| = TP / (TP + FP + FN)    (2)

where A and B denote the predicted region and the ground truth respectively, and TP, FP, and FN denote true positives, false positives, and false negatives.
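The Dice coefficient and IoU can be computed directly from binary masks in numpy; the small epsilon (an implementation detail assumed here) avoids division by zero on empty masks.

```python
import numpy as np

def dice(pred, truth, eps=1e-7):
    """Dice coefficient: 2*TP / (2*TP + FP + FN) for binary masks."""
    tp = np.sum(pred * truth)
    return (2.0 * tp + eps) / (pred.sum() + truth.sum() + eps)

def iou(pred, truth, eps=1e-7):
    """Intersection over union: TP / (TP + FP + FN) for binary masks."""
    inter = np.sum(pred * truth)
    union = pred.sum() + truth.sum() - inter
    return (inter + eps) / (union + eps)

pred  = np.array([[1, 1], [0, 0]])   # one TP, one FP
truth = np.array([[1, 0], [0, 0]])
# Dice = 2/(2+1) ~ 0.667, IoU = 1/2 = 0.5
```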

Y-Net Results
As described in Section V, we evaluate the effect of transfer learning on both the classification and segmentation processes. First, we train a CNN to classify whether an image contains a tumor, and then we attach a transfer learning head using VGG16. The CNN has about 30,197,249 trainable parameters (Table 1), while the pre-trained VGG16 has about 32,769 (Table 2). The CNN has slightly higher accuracy than VGG16, but the VGG16 model executes faster, as shown in Table 3.

Fig. 5. Training accuracy, loss, validation loss, and validation accuracy: (a) CNN; (b) with VGG-16.

For segmentation, we first used U-Net, which has 31,031,745 trainable parameters (Table 4), and then the Y-Net segmentation model with a VGG16 encoder pre-trained on ImageNet, which has about 32,769 trainable parameters (Table 5). Comparing intersection over union, Y-Net scores higher than U-Net at about 0.8255, and its Dice coefficient and accuracy are also higher, at 0.9036 and 0.9983 respectively (Table 6). The execution time of Y-Net segmentation is also better than that of U-Net, since the encoder path is pre-trained and only the decoder path is trained (Table 5).
Fig. 6 and Fig. 7 show samples of the pre-trained system's classification and segmentation output respectively.

Conclusion
This study presented the Y-Net framework, which implements TL for both automatic classification and segmentation of MRI images. The pre-trained encoder is trained once and shared by both classification and segmentation. The visual predictions show significant refinement in edge smoothing and shaping. Comparison of Y-Net with other models shows both better results and savings in training time. GPU execution makes MRI classification and segmentation more reliable, and the system runs well on a computer with an Intel Core i7 processor, 32 GB RAM, and a 128 GB SSD. The ROI segmentation of the proposed model is well suited to the task of brain tumor MRI imaging. The system was tested on many 2D patch images from different centers for extraction and training. A future extension of the proposed system is to train models on 3D patches for the segmentation and identification functions.