An algorithm for crop segmentation in UAV images based on the U-Net CNN model: Application to sugar beet plants

In recent years, Digital Agriculture (DA) has been widely developed using new technologies and computer vision techniques. Drones and machine learning have proved their efficiency in the optimization of agricultural management. In this paper, we propose an algorithm based on the U-Net CNN model for crop segmentation in UAV images. The algorithm patches the input images into several 256 × 256 sub-images before creating a mask (ground truth) that is fed into a U-Net model for training. A set of experiments has been conducted on real UAV images of sugar beet crops, where the mean Intersection over Union (MIoU) and Segmentation Accuracy (SA) metrics are adopted to evaluate its performance against other algorithms used in the literature. The proposed algorithm shows a good segmentation accuracy compared to three well-known algorithms for UAV image segmentation.


Introduction
Digital agriculture (DA) has been widely developed during the last decades, and computer vision algorithms, especially image segmentation, play a major role in several DA applications.
Nowadays, Deep Learning (DL) delivers the best-performing state-of-the-art systems for image classification. In the last decade, several deep convolutional neural networks have been proposed, such as AlexNet [1], GoogleNet [2] and DenseNet [3]. The main idea of deep learning is to automatically build a data representation through the learning phase, thus avoiding human intervention; we therefore speak of representation learning. A deep learning algorithm learns increasingly complex hierarchical representations of the data [4].
The use of deep learning in DA has found success for two reasons: i) the large amount of data available in DA (aerial images); ii) the complexity of some DA tasks (disease detection, harvest estimation) compared with the performance of classical computer vision algorithms. The authors in [5] presented a DL architecture for real-time tomato disease and pest recognition based on three main families of detectors: Region-based Fully Convolutional Network (R-FCN), Faster Region-based Convolutional Neural Network (Faster R-CNN) and Single Shot Multibox Detector (SSD), combining these detectors with deep feature extractors. Another work, presented in [6], proposed a CNN model for the detection of mango fruit in tree canopies and the estimation of fruit load. In [7], the authors proposed a supervised deep-learning-based classification, called transfer learning, for soil segmentation. Finally, the works presented in [8], [9] use quantum superposition laws and machine learning for agricultural image segmentation.
In this paper, we present an algorithm for aerial agricultural image segmentation based on the U-Net CNN model. The proposed algorithm can be divided into four major parts: 1- Image patching. 2- Mask creation.
* Corresponding author: khalid.elamraoui@um5r.ac.ma
3- U-Net model training and patch prediction. 4- Reconstruction. In order to show the performance of our proposal, a set of experiments is conducted using agricultural images taken by a UAV, with the mean Intersection over Union (MIoU) and Segmentation Accuracy (SA) metrics used to quantify its performance. The paper is organised as follows: the next section is dedicated to the materials used. We detail the proposed method in Section 3. After that, a set of experiments is presented in Section 4, and finally a conclusion is given in the last section.

Materials
Our objective is plant segmentation from aerial images. For that, we used the database presented in [10], which consists of aerial images taken by a UAV (Fig. 1). It contains 474 images taken by a DJI Matrice 100 UAV for field A, and 194 images taken by a DJI Phantom 4 UAV for field B (Table 2). In our model, we used the images from field A. Table 1 summarizes the details of the used database.

Table 1. Field A database details.
Number of images: 474
Image type: RGB
Sensor size (H × V mm): . × .

Table 2. Field B database details.
Number of images: 194
Image type: RGB

Proposed Method
As mentioned before, the proposed method is composed of four major steps. In this section we present the four parts of the proposed algorithm, together with the theoretical background of the U-Net CNN model. Our main goal is the segmentation and localisation of plants in UAV images. For that, we start by patching each image of the database into 256 × 256 sub-images. After that, each sub-image is used to create a mask that serves as ground truth for the training model.

Image patching and labelling
The first step of our proposal consists of patching the input images of the database into several sub-images (144 sub-images for each 4K image). Each sub-image is labelled based on its position in the original image (Fig. 2). These labels will be used in the reconstruction process.
Fig. 2. Image patching and labelling.
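The patching step can be sketched as follows (a minimal NumPy sketch; the dictionary key serves as the position label, and the number of tiles depends on the exact frame resolution and on how borders are handled, so the 144-tile figure from the paper is not reproduced here):

```python
import numpy as np

def patch_image(image, patch=256):
    """Split an image into patch x patch sub-images, labelling each
    tile with its (row, column) position for later reconstruction.
    Border pixels that do not fill a full tile are ignored here."""
    h, w = image.shape[:2]
    patches = {}
    for i in range(h // patch):
        for j in range(w // patch):
            # the (i, j) key is the position label used during reconstruction
            patches[(i, j)] = image[i * patch:(i + 1) * patch,
                                    j * patch:(j + 1) * patch]
    return patches

# example: a 3840 x 2160 frame yields 15 x 8 = 120 full 256 x 256 tiles
frame = np.zeros((2160, 3840, 3), dtype=np.uint8)
tiles = patch_image(frame)
```
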

Mask creation
The second step of our proposal is mask generation. The idea here is to automatically segment the database images to produce masks that are used as ground truth, in place of manual annotation. For that, we used a color-based segmentation in the Hue Saturation Value (HSV) color space: the hue channel of the input image is thresholded to extract the green elements (Fig. 3). An example of the mask generation result is presented in Fig. 4.
Fig. 3. Extracting Green zone from the HSV color space.

U-Net Model training and patch prediction
U-Net is a semantic segmentation architecture [11]. It has two paths: one that contracts and one that expands. The contracting path follows the standard architecture of a convolutional network. It comprises the repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU), and a 2x2 max pooling operation with stride 2 for down-sampling. The number of feature channels is doubled with each down-sampling step. Every step in the expansive path involves up-sampling the feature map, a 2x2 convolution ("up-convolution") that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU. Cropping is required due to the loss of boundary pixels in every convolution. A 1x1 convolution is employed at the final layer to map each 64-component feature vector to the desired number of classes. The network comprises a total of 23 convolutional layers (Fig. 5).
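The valid-convolution arithmetic of this architecture can be traced with a short pure-Python sketch (illustrative only; it reproduces the 572 → 388 input/output sizes of the original U-Net paper, not the sizes used in our pipeline):

```python
def unet_shapes(size=572, base_channels=64, depth=4):
    """Trace spatial size and channel count through the original
    (valid-convolution) U-Net: each level applies two unpadded 3x3
    convolutions (size shrinks by 4), the contracting path halves the
    spatial size and doubles the channels, and the expansive path does
    the reverse with 2x2 up-convolutions. The skip connection is
    cropped to the current size, so it does not change the arithmetic."""
    ch = base_channels
    # contracting path
    for _ in range(depth):
        size -= 4        # two 3x3 valid convolutions
        size //= 2       # 2x2 max pooling, stride 2
        ch *= 2
    size -= 4            # bottleneck convolutions (1024 channels here)
    # expansive path
    for _ in range(depth):
        size *= 2        # 2x2 up-convolution
        ch //= 2
        size -= 4        # two 3x3 valid convolutions after concatenation
    return size, ch

out_size, out_channels = unet_shapes()  # classic 572 x 572 input
```

The final 1x1 convolution then maps the 64 output channels to the number of classes.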
After the training process, we used the trained model to predict on the patches (Fig.6).

Image reconstruction
The objective of the reconstruction is to reassemble the segmented sub-images into the original segmented high-resolution images. For that, we used the label given to each sub-image to reconstruct the original segmented 4K image, as shown in Fig. 7.
Fig. 7. Reconstruction process.
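The reconstruction can be sketched as the inverse of the patching step, assuming each sub-image carries its (row, column) position label:

```python
import numpy as np

def reconstruct(patches, rows, cols, patch=256, channels=3):
    """Reassemble labelled sub-images into the full-size image.
    `patches` maps (row, col) position labels to patch x patch arrays,
    i.e. the dictionary produced by the patching step."""
    out = np.zeros((rows * patch, cols * patch, channels), dtype=np.uint8)
    for (i, j), tile in patches.items():
        out[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = tile
    return out
```
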
The proposed algorithm is summarized in the Fig. 8.

Experimentation
In order to test the performance of the proposed algorithm, a set of experimentations is done on agricultural aerial images.

Evaluation method
To evaluate the algorithm in terms of segmentation accuracy, quantified and measurable metrics are needed. We used the mean Intersection over Union (MIoU) and the Segmentation Accuracy (SA) metrics, defined as follows.

The Mean Intersection over Union (MIoU):
The MIoU measures how similar the prediction is to the ground truth. For N classes, MIoU = (1/N) Σ_i |P_i ∩ G_i| / |P_i ∪ G_i|, where P_i and G_i denote the sets of pixels predicted and annotated as class i, respectively.

The Segmentation Accuracy (SA):
The SA is the proportion of correctly classified pixels: SA = (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP and FN are the numbers of true positive, true negative, false positive and false negative pixels.
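Both metrics can be computed directly from the predicted and ground-truth masks; a minimal NumPy sketch:

```python
import numpy as np

def miou(pred, gt, num_classes=2):
    """Mean Intersection over Union averaged over the classes."""
    ious = []
    for c in range(num_classes):
        p, g = (pred == c), (gt == c)
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both masks: skip it
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

def seg_accuracy(pred, gt):
    """Fraction of pixels classified correctly."""
    return float((pred == gt).mean())

# toy binary masks: MIoU = (2/3 + 1/2) / 2 = 7/12, SA = 3/4
pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
```
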

Experimentations
In order to show the performance of our algorithm, we applied it to aerial images from the database presented in Section 2 and compared it with three methods: the method proposed by Zhun et al. for tobacco plants, a three-stage algorithm consisting of i) plant segmentation using morphological operations, ii) CNN training and iii) a post-treatment operation [12]; the method proposed in [13], which uses a CNN model for plant detection and counting; and the method proposed by Pacheco et al., which uses an encoder-decoder architecture to classify each pixel as crop or non-crop [14].
Table 3 and Fig. 10 show the results of applying the proposed algorithm to the UAV images, compared with the other methods proposed for the same purpose.
Fig. 9 shows a test on an aerial image from field B of the database presented in Section 2. We divide the image into 256 × 256 sub-images, run the predictions, and then reconstruct the image to its original size from the predicted patches.

Conclusion
We presented in this paper an algorithm for plant segmentation in aerial images based on the U-Net CNN model. The proposed method starts with image patching and labelling before using the patches for mask creation. The created masks are then used as input for training the model. The proposed algorithm gives a segmentation accuracy of 97%, against 92.8% for the method proposed by Pacheco et al., 92.8% for the method proposed by Zhun et al., and 95.1% for the method proposed by Osco et al.
Our proposal can be optimized by adding a step to enhance the quality of the patched images. Manual mask creation could also increase the accuracy of the model by preventing falsely segmented masks.
As a perspective, we intend to add a super-resolution step to increase the quality of the sub-images using quantum principles.