Video synopsis algorithm based on two-stage target tubes grouping

Video synopsis generates a condensed video that can be browsed quickly. As the condensation ratio increases, more pseudo collisions occur between target tubes. To solve this problem, this paper proposes a video synopsis algorithm based on two-stage target tube grouping. In the first stage, a hypergraph is used to analyze the collision relationships between target tubes, and the tubes are grouped according to the hyperedges. In the second stage, a clustering algorithm based on equidistant nearest neighbor sampling is proposed to further group the target tubes. Target tubes are then selected according to the principle of quantity priority between groups and length priority within groups (QPB-LPG). Finally, the selected tubes are rearranged to generate a condensed video with fewer pseudo collisions. Experimental results show that, compared with existing video synopsis algorithms, the proposed algorithm significantly reduces pseudo collisions between target tubes without reducing the frame condensation ratio or the frame compact rate, which verifies the feasibility of the method.


Introduction
As an important form of data representation, video is increasingly used in criminal investigation, security management, and other fields. People urgently need a way to obtain the main content of a video efficiently while occupying less storage. Video synopsis is one of the effective means to solve this problem.
Collisions between target tubes are one of the key problems in current video synopsis research. References [1][2] reduce collisions between targets by shifting targets or changing their size and speed; however, changing these attributes loses part of the targets' motion information. References [3][4][5] use a graph to analyze the collision relationship between pairs of target tubes and then rearrange the tubes by graph coloring, but these methods can only analyze collisions between two targets. To solve this problem, this paper analyzes and handles collisions between target tubes with a hypergraph, which can represent collision relationships among multiple targets. Other researchers [6][7][8] have used hypergraphs to address video summarization from multiple perspectives. This paper uses a hypergraph to associate colliding target tubes and group them, so that colliding targets do not appear in the same frame during rearrangement.
To optimize the condensed video, this paper proposes a video synopsis algorithm based on two-stage target tube grouping. On the basis of the hypergraph grouping, the target tubes within each resulting group are further grouped by clustering. The tubes are then selected and rearranged, and an energy loss is computed so that different targets that collide are not shown at the same time.
The main contributions of this paper are as follows: (1) using a hypergraph to address collisions between target tubes; (2) proposing a two-stage target tube grouping algorithm; (3) proposing a condensation strategy that first selects target tubes and then rearranges them.

Two-stage grouping and rearrangement of target tubes
To reduce collisions, this paper proposes a video synopsis algorithm based on two-stage target tube grouping. The overall algorithm is shown in Figure 1.

Stage 1: grouping based on the hypergraph
A hypergraph is a generalization of a graph in which a single hyperedge can connect more than two vertices. This paper therefore applies a hypergraph to video synopsis to handle collisions among multiple target tubes. A hypergraph $H = (V, E)$ consists of a vertex set $V$ and a hyperedge set $E$. As shown in Figure 2, the vertex set $V$ represents the set of target tubes, the hyperedge set $E$ represents the collision relationships among multiple target tubes, and target tubes that collide with each other are connected by a hyperedge. The collision frame length is used to describe the collision of target tubes. A matrix $B_{ij}$ represents the collision degree between two target tubes in a group, as shown in formula (1), where $t_{ij}$ is the frame length of the collision between target tube vertex $i$ and target tube vertex $j$. A collision threshold $t_0$ is set: when $t_{ij}$ is greater than $t_0$, the collision is recorded as serious; otherwise, it is recorded as slight. Vertices judged to collide seriously are connected by a hyperedge, and the target tubes are grouped according to these collision relationships to obtain the group set $G = \{G_1, G_2, \ldots\}$. There is no collision relationship between target tubes from different groups.
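The first-stage grouping can be illustrated with a small sketch. It assumes the pairwise collision frame lengths $t_{ij}$ are already available as a symmetric integer matrix, builds one hyperedge per tube from its serious collisions ($t_{ij} > t_0$), and merges tubes that share a hyperedge with union-find. The function names and this particular way of enumerating hyperedges are hypothetical; the paper does not specify them.

```python
from typing import Dict, List, Set


def serious_collision_hyperedges(t: List[List[int]], t0: int) -> List[Set[int]]:
    """One hyperedge per tube i: i plus every tube j with t[i][j] > t0.

    Assumption: a plausible hyperedge construction, not taken from the paper.
    """
    n = len(t)
    edges: List[Set[int]] = []
    for i in range(n):
        members = {i} | {j for j in range(n) if j != i and t[i][j] > t0}
        if len(members) > 1:  # a tube with no serious collision forms no hyperedge
            edges.append(members)
    return edges


def group_by_hyperedges(n: int, edges: List[Set[int]]) -> List[List[int]]:
    """Merge tubes that share a hyperedge (union-find) and return sorted groups."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in edges:
        first, *rest = sorted(edge)
        for v in rest:
            ra, rb = find(first), find(v)
            if ra != rb:
                parent[rb] = ra

    groups: Dict[int, List[int]] = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


# Hypothetical collision frame lengths between 5 tubes; tube 4 collides with nothing.
t = [
    [0, 30, 0, 0, 0],
    [30, 0, 5, 0, 0],
    [0, 5, 0, 40, 0],
    [0, 0, 40, 0, 0],
    [0, 0, 0, 0, 0],
]
print(group_by_hyperedges(5, serious_collision_hyperedges(t, t0=10)))
# [[0, 1], [2, 3], [4]]
```

Tubes 1 and 2 overlap for only 5 frames, which is below $t_0$, so this slight collision does not merge their groups.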

Stage 2: cluster grouping of target tubes based on equidistant nearest neighbor sampling
The directions and positions of the targets within a group are often inconsistent, so during rearrangement targets moving in opposite directions may appear alternately, which makes the synopsis video longer. A second grouping stage is therefore added. Each group obtained in the first stage is further divided by DBSCAN clustering: similar targets are assigned to the same class according to the distance measurement and direction measurement between target tubes, and the target tubes are grouped by class. As shown in Figure 3, without the second-stage grouping, target tubes moving in different directions appear alternately during rearrangement (for example, tube 1 and tube 2, or tube 3 and tube 4), which introduces considerable redundancy. In reference [9], the distance between two targets is measured using only their starting and ending points. Because this uses too few points, an equidistant nearest neighbor sampling method is proposed: the connecting line between the starting point and the ending point is divided into equal segments, the point on the target tube closest to each division point is found, and these points are used as sampling points. As shown in Figure 4, $i$ and $j$ are two independent target tubes.

Selection of target tubes
For group sets containing only one group, priority is given to the longest target tube in each of these groups. Because such a deterministic selection may lack robustness, a roulette selection strategy is added [10]. Within a group, as shown in Figure 5, the length priority principle places longer tubes first, so that shorter tubes placed later can be inserted into gaps earlier in the video, which makes the synopsis noticeably shorter.
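The selection order can be sketched as follows. This is a simplified reading of QPB-LPG, assuming each group is a list of `(tube_id, length)` pairs: the group with the most tubes is taken first (quantity priority), and tubes inside a group are emitted longest first (length priority). The optional roulette step, which draws the next group with probability proportional to its size, is an assumed weighting; the paper only states that roulette selection is added. All names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Tube = Tuple[str, int]  # (tube id, length in frames), hypothetical representation


def qpb_lpg_order(groups: List[List[Tube]], rng: Optional[random.Random] = None) -> List[str]:
    """Order tubes by quantity priority between groups, length priority within groups.

    Without rng, the largest remaining group is always taken next; with rng,
    the next group is drawn by roulette-wheel selection weighted by group size
    (an assumption, not the paper's stated weighting).
    """
    remaining = [list(g) for g in groups if g]
    order: List[str] = []
    while remaining:
        if rng is None:
            idx = max(range(len(remaining)), key=lambda i: len(remaining[i]))
        else:
            weights = [len(g) for g in remaining]
            idx = rng.choices(range(len(remaining)), weights=weights)[0]
        group = remaining.pop(idx)
        # Length priority: longest tube first, so shorter ones can fill gaps later.
        order.extend(tube_id for tube_id, _ in sorted(group, key=lambda tb: -tb[1]))
    return order


groups = [[("t1", 40), ("t2", 120)], [("t3", 80), ("t4", 60), ("t5", 30)]]
print(qpb_lpg_order(groups))  # ['t3', 't4', 't5', 't2', 't1']
```

Passing a seeded `random.Random` makes the roulette variant reproducible while still letting smaller groups occasionally go first.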

Rearrangement of target tubes
Some losses arise during grouping, selection, and rearrangement, so an energy function is used to reflect these losses.

The energy function $E(F)$ is calculated as equation (3), where $F$ is the set of starting frames of all target tubes, $E_c(F)$ is the collision loss item, $E_t(F)$ is the duration loss item, and $E_s(F)$ is the sorting priority loss item:
(1) Collision loss item: the number of active target pixels that are not occluded in the original video but are occluded in the condensed video, as shown in equation (4).
(2) Duration loss item: the maximum end time of all target tubes after rearrangement, that is, the length of the final condensed video, as shown in equation (5), where a normalization parameter scales the duration loss item.
(3) Sorting priority loss item: the number of reversals between the time order and the priority order of the target tubes during rearrangement. The larger the value, the more the current priority order is destroyed, as shown in equation (6), where a normalization parameter scales the sorting priority loss item and the priority sequence number of the $i$-th target tube determines whether the order of a pair of tubes is reversed.
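The three loss items can be sketched directly from these definitions. The sketch assumes each tube carries its original start frame, a list of per-frame pixel sets from instance segmentation, and a priority sequence number where a smaller number means the tube should start earlier. It treats any pixel overlap with another tube as occlusion, and it combines the items as a weighted sum whose weights `w_c`, `w_t`, `w_s` stand in for the normalization parameters; both choices are simplifying assumptions, since equations (3) to (6) are not reproduced here.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

Pixel = Tuple[int, int]


@dataclass
class Tube:
    orig_start: int                # first frame of the tube in the input video
    masks: List[FrozenSet[Pixel]]  # instance-segmentation pixels per frame
    priority: int                  # priority sequence number (smaller = earlier, assumed)


def _covered(tubes: Sequence[Tube], starts: Sequence[int], i: int, k: int, p: Pixel) -> bool:
    """True if pixel p of tube i's k-th frame overlaps another tube at that time."""
    frame = starts[i] + k
    for j, other in enumerate(tubes):
        offset = frame - starts[j]
        if j != i and 0 <= offset < len(other.masks) and p in other.masks[offset]:
            return True
    return False


def collision_loss(tubes: Sequence[Tube], F: Sequence[int]) -> int:
    """Pixels occluded in the synopsis (start frames F) but not in the original video."""
    orig = [tb.orig_start for tb in tubes]
    return sum(
        1
        for i, tb in enumerate(tubes)
        for k, mask in enumerate(tb.masks)
        for p in mask
        if _covered(tubes, F, i, k, p) and not _covered(tubes, orig, i, k, p)
    )


def duration_loss(tubes: Sequence[Tube], F: Sequence[int]) -> int:
    """Largest end frame over all rearranged tubes, i.e. the synopsis length."""
    return max(F[i] + len(tb.masks) for i, tb in enumerate(tubes))


def priority_loss(tubes: Sequence[Tube], F: Sequence[int]) -> int:
    """Pairs whose start-time order is the reverse of their priority order."""
    return sum(
        1
        for i, j in combinations(range(len(tubes)), 2)
        if (F[i] - F[j]) * (tubes[i].priority - tubes[j].priority) < 0
    )


def energy(tubes, F, w_c: float = 1.0, w_t: float = 1.0, w_s: float = 1.0) -> float:
    """Assumed weighted sum of the three loss items."""
    return (w_c * collision_loss(tubes, F) + w_t * duration_loss(tubes, F)
            + w_s * priority_loss(tubes, F))


dot = [frozenset({(0, 0)})] * 2  # a 2-frame tube occupying a single pixel
tubes = [Tube(0, dot, priority=1), Tube(10, dot, priority=2)]
print(energy(tubes, [0, 0]))  # 6.0: 4 new collisions + length 2 + 0 inversions
print(energy(tubes, [2, 0]))  # 5.0: 0 collisions + length 4 + 1 inversion
```

The toy case shows the trade-off the energy captures: packing both tubes at frame 0 gives the shortest synopsis but introduces collisions, while separating them removes collisions at the cost of length and a priority inversion.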

Experimental design and result analysis

Experimental data sets
The data set scenes are shown in Figure 6. Figure 6(a) shows the Library data set, which has 15200 frames in total; Figure 6(b) shows the Corridor data set, with 15884 frames; Figure 6(c) shows the Road1 data set, with 4650 frames; Figure 6(d) shows the Road2 data set, with 4500 frames; and Figure 6(e) shows the Shopping mall data set, with 9275 frames. In this section, the effect of the grouping algorithms is measured by the adjusted Rand index (ARI) [12] and normalized mutual information (NMI) [13]:
(1) Adjusted Rand index (ARI): reflects the degree of agreement between two partitions. The closer the value is to 1, the better the clustering effect.
(2) Normalized mutual information (NMI): an information-theoretic measure of how well the clustering results and the predefined clusters predict each other, used to measure the similarity of two clustering results.
In Figure 8, the orange line indicates that a group lacks targets moving in the same direction. The experimental data are shown in Table 1.
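Both metrics can be computed from the contingency table of two label sequences, one giving the expected grouping and one the algorithm's output. The sketch below uses the standard adjusted Rand index formula and normalizes mutual information by the arithmetic mean of the two entropies; that normalization is one common choice and an assumption here, since the paper does not state which variant it uses. Function names are hypothetical.

```python
from collections import Counter
from math import comb, log
from typing import Hashable, Sequence


def adjusted_rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """ARI from pair counts in the contingency table of partitions a and b."""
    n = len(a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    pairs = comb(n, 2)
    expected = sum_a * sum_b / pairs if pairs else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate partitions (e.g. all singletons)
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def _entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """NMI normalized by the arithmetic mean of the two entropies (assumed variant)."""
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    mi = sum(c / n * log(n * c / (ca[x] * cb[y]))
             for (x, y), c in Counter(zip(a, b)).items())
    ha, hb = _entropy(a), _entropy(b)
    if ha == 0 and hb == 0:  # both partitions put everything in one group
        return 1.0
    return mi / ((ha + hb) / 2)


expected_groups = [0, 0, 1, 1]
print(round(adjusted_rand_index(expected_groups, [0, 0, 1, 2]), 4))     # 0.5714
print(round(normalized_mutual_info(expected_groups, [1, 1, 0, 0]), 4))  # 1.0
```

Both scores ignore the label names themselves, so a grouping that matches the expected one up to relabeling scores 1.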
According to Figure 8 and Table 1, the results of the two-stage target tube grouping algorithm are closer to the expected grouping, whereas the other two algorithms show clear deficiencies.

Comparison of video synopsis algorithms
The proposed video synopsis algorithm is compared in five different scenes with the video synopsis algorithm combining object speed and size change (OSSC-VC) [14] and the video synopsis method considering trajectory geographic direction (TGD-VC) [15].
(1) Frame compression ratio (FR) [4]: the ratio of the number of frames in the video summary to the number of frames in the original video, $FR = T_S / T_I$, where $T_S$ is the number of frames in the video summary and $T_I$ is the number of frames in the input original video.
(2) Frame compact rate (CR) [4]: used to judge whether the target tubes are rearranged compactly in the summary video; it is calculated by formula (9).
(3) Overlap ratio (OR) [4]: used to indicate the collision degree of the target tubes in the summary video; it is calculated by formula (10).
As Table 2 shows, in most scenarios the proposed video synopsis algorithm based on two-stage target tube grouping achieves better experimental results.
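A worked example of $FR = T_S / T_I$ follows. It assumes $T_S$ is taken as the last frame occupied by any rearranged tube, given the synopsis start frames and tube lengths; the numbers are illustrative only and are not results from the paper.

```python
from typing import Sequence


def frame_ratio(synopsis_starts: Sequence[int], tube_lengths: Sequence[int],
                input_frames: int) -> float:
    """FR = T_S / T_I, with T_S = latest end frame of any rearranged tube (assumed)."""
    if input_frames <= 0:
        raise ValueError("input video must have at least one frame")
    t_s = max(s + length for s, length in zip(synopsis_starts, tube_lengths))
    return t_s / input_frames


# Three hypothetical tubes packed into a synopsis of a 1500-frame input video.
print(frame_ratio([0, 0, 50], [120, 80, 100], 1500))  # 0.1
```

A smaller FR means a shorter synopsis relative to the input video.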

Conclusion
Our algorithm divides tubes into several groups by analyzing the collision and position relationships between targets, and the two-stage grouping method effectively groups the target tubes. The QPB-LPG selection principle is then adopted to place target tubes that occupy more space first, which facilitates the subsequent insertion of target tubes that occupy less space. Next, the target tubes are rearranged according to the energy loss. Experiments show that the proposed two-stage grouping video synopsis method effectively reduces the collision rate of the condensed video, improves its condensation ratio, and achieves a good condensation effect. However, the method still has shortcomings; future work will focus on improving the execution efficiency of the algorithm.
ITM Web of Conferences 45, 01005 (2022) CSCNS2021 https://doi.org/10.1051/itmconf/20224501005
In Figure 4, $s_i$ and $e_i$ are the starting point and ending point of target tube $i$, $s_j$ and $e_j$ are the starting point and ending point of target tube $j$, and $k$ denotes the $k$-th sampling point counted from the starting point, $k \in N$. Formula (2) gives the similarity of the two target tubes, where $a$ is a weight coefficient and the two weighted terms are the distance measurement and the direction measurement of the two target tubes; the direction measurement uses the included angle between the lines connecting the starting and ending points of the two tubes, expressed in radians.
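The second stage can be sketched end to end: equidistant nearest neighbor sampling along each track, a pairwise dissimilarity, and DBSCAN on the resulting matrix. Tracks are assumed to be lists of `(x, y)` centroid positions. Because formula (2) is not reproduced here, `dissimilarity` is a hypothetical stand-in that weights the mean distance between corresponding sampling points by `a` and the included start-to-end angle by `1 - a`; the DBSCAN shown is the textbook algorithm on a precomputed matrix, and `eps`, `min_pts`, and `n` are illustrative values, not the paper's settings.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def equidistant_nn_samples(track: Sequence[Point], n: int) -> List[Point]:
    """Split the start-end line into n equal parts; for each of the n + 1
    division points (endpoints included), keep the nearest track point."""
    (sx, sy), (ex, ey) = track[0], track[-1]
    samples = []
    for k in range(n + 1):
        bx, by = sx + (ex - sx) * k / n, sy + (ey - sy) * k / n
        samples.append(min(track, key=lambda p: math.hypot(p[0] - bx, p[1] - by)))
    return samples


def chord_angle(ti: Sequence[Point], tj: Sequence[Point]) -> float:
    """Included angle (radians) between the start-to-end lines of two tracks."""
    ux, uy = ti[-1][0] - ti[0][0], ti[-1][1] - ti[0][1]
    vx, vy = tj[-1][0] - tj[0][0], tj[-1][1] - tj[0][1]
    norm = math.hypot(ux, uy) * math.hypot(vx, vy)
    if norm == 0:
        return 0.0  # a stationary track has no direction
    return math.acos(max(-1.0, min(1.0, (ux * vx + uy * vy) / norm)))


def dissimilarity(ti: Sequence[Point], tj: Sequence[Point], n: int = 4, a: float = 0.5) -> float:
    """Hypothetical stand-in for formula (2); smaller means more similar."""
    si, sj = equidistant_nn_samples(ti, n), equidistant_nn_samples(tj, n)
    dist = sum(math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(si, sj)) / len(si)
    return a * dist + (1 - a) * chord_angle(ti, tj)


def dbscan(d: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN on a precomputed matrix; cluster ids, -1 for noise."""
    labels: List[Optional[int]] = [None] * len(d)
    cluster = -1
    for i in range(len(d)):
        if labels[i] is not None:
            continue
        seeds = [j for j in range(len(d)) if d[i][j] <= eps]
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: claim it, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = [k for k in range(len(d)) if d[j][k] <= eps]
            if len(nbrs) >= min_pts:  # j is a core point: expand from it
                seeds.extend(nbrs)
    return labels


# Two rightward tracks and the same two tracks traversed leftward.
right = [(float(x), 0.0) for x in range(11)]
right2 = [(float(x), 1.0) for x in range(11)]
tracks = [right, right2, right[::-1], right2[::-1]]
d = [[dissimilarity(p, q) for q in tracks] for p in tracks]
print(dbscan(d, eps=1.0, min_pts=2))  # [0, 0, 1, 1]
```

Although all four tracks cover the same corridor, the direction term and the paired sampling points separate the rightward and leftward tubes, which is the situation Figure 3 describes.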

Fig. 5. Length priority within groups.
The target tubes extracted from a video often differ in shape, length, and number of classes, and during rearrangement a lack of compactness among the targets makes the video longer. Therefore, this paper designs the QPB-LPG selection principle. For group sets with more than two groups, priority is given to the group $C_1$ that contains the largest number of target tubes in $G_1$.

In equation (4), a normalization parameter scales the collision loss item, and the pixel term denotes the pixels obtained by instance segmentation of the $k$-th frame of the $i$-th target tube.

Fig. 7. Display of target trajectories moving in the same direction in the original videos.

Figure 7 shows different algorithms in different scenarios from top to bottom. Figure 8 shows, from left to right, the grouping results of HD-DCA, SDF-DCA, and the algorithm in this paper. The red line indicates that the targets shown in Figure 7 are correctly divided into the same group, and the green line indicates target tracks from other directions that are wrongly divided into the same group.

Table 1. Comparison of grouping algorithms.

Table 2. Comparison of video synopsis algorithms.