An algorithm for the search and tracking of static and moving large-scale objects

We propose an algorithm for processing an image sequence in order to search for and track static and moving large-scale objects. A possible software implementation of the algorithm, based on multithreaded CUDA processing, is described. An experimental analysis of this implementation is performed.


Introduction
The problem of searching for and tracking static and moving large-scale objects arises in many areas of human activity, such as astronomy, aviation and medicine [1-7] (Fig. 1, 2). The problem implies processing a single image or a sequence of images in order to detect the coordinates of static and moving large-scale objects.
We propose a new method based on calculating the descriptors of the image frame using the graphics subsystem and comparing them with a preliminarily trained matrix of objects by means of an artificial neural network model. The method of large-scale object search consists of three steps:
• preliminary processing of an image of the video sequence (conversion to monochrome, noise reduction by median filtering, contrast enhancement, selection of the visibility region);
• detection of the point features of the image (calculation of descriptors) (Fig. 3);
• comparison of the point features with a template (comparison of the detected point features of the image with the template features of possible objects) (Fig. 4).
The proposed algorithm is computationally expensive, which is why we target it at parallel computing systems. To analyse the suggested algorithm, we have developed a software implementation based on multithreaded loading of the CUDA calculator.
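The preprocessing step above can be illustrated with a minimal, library-free CPU sketch. The paper's actual tool runs on the GPU; the function names here (to_grayscale, median_filter3, stretch_contrast) are ours, chosen for illustration only:

```python
def to_grayscale(rgb):
    """Convert an RGB image (nested lists of (r, g, b) tuples) to grayscale."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]

def median_filter3(gray):
    """3x3 median filter for noise reduction; border pixels are kept as-is."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(gray[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]          # median of the 9 values
    return out

def stretch_contrast(gray):
    """Linear contrast stretch of pixel values to the full 0..255 range."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        return [row[:] for row in gray]
    return [[(v - lo) * 255 // (hi - lo) for v in row] for row in gray]
```

A production implementation would of course use GPU kernels (or an image-processing library) for these operations; the sketch only fixes the semantics of each preprocessing stage.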

Algorithm
Figure 5 shows the structure flow chart of the developed algorithm. Let us describe it in detail. The first blocks of the flow chart show the input data of the algorithm: the next video-sequence frame frame, the vector opt_flow of optical flow parameters, the matrix sum_descr of training descriptors and the vector vPoints of the points that correspond to the desired targets.
The algorithm starts by calculating the descriptors descriptors_1GPU of the image frame using the graphics subsystem. The obtained descriptors are then compared with the preliminarily trained matrix sum_descr by means of an artificial neural network model. The operation of the model is described in the next section of the paper.
The output of the artificial neural network model is the matrices trainIdx, distance and matches. Next, we select those descriptors of the vector descriptors_1GPU that coincide most closely with the matrix sum_descr according to the maximum distance criterion MAX_DISTANCE. The following step transforms the obtained descriptors good_matches into items of the vector vPointsBig. The coordinates of each group of the vector vPointsBig are averaged by a special algorithm. The remaining items of the vector vPointsBig are added to the vector vPoints by the point-adding algorithm, which is similar to the algorithm of adding new points for point targets. The vector vPoints, which now contains the added items, is updated by the corresponding algorithm, similar to the vPoints update algorithm for point targets. The last step of the algorithm is the output of the suspected targets obtained by analysing the vector vPoints.
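The matching and filtering steps described above can be sketched on the CPU as follows. This is a simplified illustration with hypothetical names (hamming, match_descriptors, average_group); it uses brute-force nearest-neighbour matching of binary descriptors in place of the paper's GPU/neural-network implementation:

```python
def hamming(d1, d2):
    """Hamming distance between two binary descriptors (equal-width ints)."""
    return bin(d1 ^ d2).count("1")

def match_descriptors(frame_descr, train_descr, max_distance):
    """Match each frame descriptor to its nearest training descriptor and
    keep only pairs within max_distance (the MAX_DISTANCE criterion)."""
    good_matches = []
    for i, d in enumerate(frame_descr):
        dists = [hamming(d, t) for t in train_descr]
        j = min(range(len(dists)), key=dists.__getitem__)
        if dists[j] <= max_distance:
            good_matches.append((i, j, dists[j]))  # (queryIdx, trainIdx, distance)
    return good_matches

def average_group(points):
    """Average the coordinates of one group of matched key points,
    as done for each group of the vector vPointsBig."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
```

For example, with frame descriptors [0b1010, 0b1111], training descriptors [0b1010, 0b0000] and max_distance = 1, only the exact match (0, 0, 0) survives the filter.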

Implementation
To analyse the suggested algorithm, we have developed a software implementation [8]. For faster execution of our software tool, we use multithreaded loading of the CUDA calculator (see Fig. 6).
Since recent versions (higher than 4.0) of the CUDA platform follow the rule "one CUDA context per process in the system", the software implementation of multithreaded loading of the CUDA calculator runs into certain complications. This limitation can be avoided by using the CUDA API and generating virtual CUDA contexts, one per thread, in RAM. Owing to this approach, the rule "one CUDA context per process" no longer restricts us, and each thread has its own CUDA context. In this case, the only limitation is the available RAM of the computer. The main idea is to load a group of images into video memory and perform data-intensive processing of image fragments (the image is divided equally among the threads). The number of threads (cuda1 ... cudaN) is chosen experimentally according to the characteristics of the video card and the size of the available RAM (the recommended utilization of the graphics processor (GP) is not more than 80%). It is the utilization of the graphics processor that must be taken into account, not the top temperature limit of the core.
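The idea of dividing an image equally among N worker threads can be illustrated with a small CPU sketch. Here ThreadPoolExecutor stands in for the per-thread virtual CUDA contexts, and process_strip is a placeholder for the real GPU kernel; both names are ours:

```python
from concurrent.futures import ThreadPoolExecutor

def process_strip(strip):
    """Placeholder per-strip kernel; in the real tool each worker thread
    would launch CUDA work inside its own virtual CUDA context."""
    return [[255 - v for v in row] for row in strip]   # e.g. invert pixels

def process_image(image, n_threads):
    """Divide the image into n_threads horizontal strips, process them
    concurrently, then reassemble the result in the original order."""
    h = len(image)
    bounds = [(i * h // n_threads, (i + 1) * h // n_threads)
              for i in range(n_threads)]
    strips = [image[a:b] for a, b in bounds]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(process_strip, strips))
    return [row for strip in results for row in strip]
```

Because pool.map preserves argument order, the strips are reassembled exactly as they were split, regardless of which thread finishes first.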
To analyse the execution time of our software implementation of the suggested algorithm with multithreaded loading of the CUDA calculator, we used the following test [9, 10]. The number of threads is varied from 1 to 8 and the number of input images from 2 to 64. With 1 thread and 64 images, the utilization of the graphics processor does not exceed 10%; with 8 threads and 64 images, GP utilization reaches 80%. Table 1 shows the testing results. The first column contains the number of images in the sequence; the first row contains the number of threads. The execution time is measured in seconds.
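A test of this kind can be sketched as a simple wall-clock harness; time_runs is a hypothetical name, and the actual measurements in Table 1 were of course taken on the CUDA implementation, not on this CPU stand-in:

```python
import time

def time_runs(work, configurations):
    """Measure wall-clock time of `work(*args)` for each labelled
    configuration, e.g. (number of threads, number of images)."""
    timings = {}
    for label, args in configurations:
        t0 = time.perf_counter()
        work(*args)
        timings[label] = time.perf_counter() - t0
    return timings
```

Each configuration label would correspond to one cell of Table 1 (a given thread count and image count).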

Figure 1. Detection and tracking of static large-scale objects.

Figure 4. Filtered results with a real target.

Figure 5. The structure flow chart of the algorithm of large-scale object search and detection against a complex background.

Table 1. Testing results of the execution time.

As can be seen from the table, multithreaded loading of the CUDA calculator reduces the time costs roughly in proportion to the number of threads used.