Research on Infrared and Visible Image Registration Algorithm Based on Graph Theory

In this paper, we propose a registration algorithm that combines the pyramid hierarchical idea with graph theory. It addresses the challenging problem of extracting consistent features from, and matching, infrared and visible images. First, the maximally stable extremal regions (MSERs) are extracted from the images at the maximum down-sampling level, and each MSER is represented by a polygon. Then, graph features are constructed, and the mapping relationship of MSERs between the infrared and visible images is determined by a graph matching method. Next, an initial point set for matching is constructed according to this mapping relationship. Finally, the random sample consensus (RANSAC) algorithm is used to obtain the optimal transformation parameters and to determine the error evaluation parameters. Following the idea of pyramid stratification, the above process is repeated on the higher-resolution images under the constraint of the current matching error. Experimental results show that the algorithm makes full use of the visually similar structures between the images and achieves a smaller matching error while ensuring the robustness of the matching.


Introduction
Image registration is the process of aligning two or more images of the same scene taken at different times, from different viewpoints, or by different sensors. It has been widely used in military, medical, remote sensing, pattern recognition, computer vision and other fields [1][2][3]. The registration of infrared (IR) and visible images (VI) is an important branch of multi-sensor registration. At present, the IR and VI registration algorithms in the literature can be divided into two categories: methods based on image gray level and methods based on features [4][5]. Gray-level registration algorithms calculate the spatial transformation between images from the gray information under a certain similarity measure criterion. Such methods do not extract image features but use the gray information of the image directly. Their advantages are that no image preprocessing is needed and that they are widely applicable; their disadvantages are the large amount of computation and the sensitivity to changes in image gray level. Moreover, they reflect only the statistical gray-value information of each pixel while ignoring the spatial structure of the image, so they easily produce false matches [6]. Registration methods based on image features obtain the transformation by extracting invariant features (such as corners, lines, contours, etc.).
However, because IR and VI are captured in different bands, there is little correlation between them, so feature-point-based registration easily produces a large number of false matches. Registration based on line features places stringent requirements on image content and deformation, which limits the scope of application of such algorithms [7]. The validity of registration based on contour features depends largely on the quality of the extracted contours, while stable and effective contour extraction is itself a classical problem in computer vision [8].
In this paper, we propose a registration algorithm which combines the pyramid hierarchical idea and graph theory. First, the maximally stable extremal regions (MSERs) are extracted from the maximally down-sampled images and a graph feature is constructed for each region. Then the regional mapping relationship between the IR and VI is determined by the graph matching method. Finally, we construct the initial set of matching points based on the regional mapping, obtain the optimal matching parameters using the random sample consensus (RANSAC) algorithm, and determine the error evaluation parameters. Following the idea of pyramid stratification, exact matching over a local range in the higher-resolution images is carried out under the constraint of the current matching error. Good performance can be achieved while ensuring the robustness of the matching.

Registration definition and transformation model
Image registration is to make two images $f_1(x,y)$ and $f_2(x,y)$ consistent in geometric position and gray level. The relationship between them can be expressed as
$$f_2(x,y) = g\big(f_1(h(x,y))\big),$$
where $g$ represents a one-dimensional gray transform function and $h$ represents a two-dimensional spatial position transform function. Image registration is to find the ideal transformation parameters. Before registration, it is necessary to determine the transformation model. Commonly used models are rigid-body transformation, similarity transformation, affine transformation, perspective transformation, etc. In this paper, we consider the affine transformation, that is, there may exist not only translation, rotation and scaling between the images, but also inversion and shear [9]. The transformation matrix $H$ has the form
$$H = \begin{bmatrix} m_0 & m_1 & m_2 \\ m_3 & m_4 & m_5 \\ 0 & 0 & 1 \end{bmatrix},$$
where $m_0, m_1, m_2, m_3, m_4, m_5$ represent six independent degrees of freedom. The affine transformation of a point $(x, y)$ to $(x', y')$ can then be expressed as
$$x' = m_0 x + m_1 y + m_2, \qquad y' = m_3 x + m_4 y + m_5.$$
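As an illustration, the six-parameter affine model above can be sketched in a few lines (a minimal example, not the paper's implementation; the helper names are ours):

```python
import numpy as np

# Build the 3x3 affine matrix H from the six free parameters m0..m5,
# and apply it to points in homogeneous coordinates (x, y, 1).
def make_affine(m0, m1, m2, m3, m4, m5):
    return np.array([[m0, m1, m2],
                     [m3, m4, m5],
                     [0.0, 0.0, 1.0]])

def apply_affine(H, pts):
    """Apply a 3x3 affine matrix to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coords
    out = homo @ H.T
    return out[:, :2]

# Pure translation by (2, 3): identity rotation/scale part.
H = make_affine(1, 0, 2, 0, 1, 3)
print(apply_affine(H, [[0, 0], [1, 1]]))  # [[2. 3.] [3. 4.]]
```

Setting $m_1 = m_3 = 0$ and $m_0 = m_4 = 1$ recovers a pure translation; general values add rotation, scaling, inversion and shear.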

Extraction of MSER
It is difficult to extract the common features for IR and VI due to the different imaging mechanisms.
For the IR and VI, the same target may have different contrast and gray level; conversely, regions with the same contrast and gray level may not belong to the same target. The purpose of extracting MSERs is to overcome these difficulties and to extract as many of the common targets in the IR and VI as possible. The MSER is a regional feature extraction operator with the characteristic of affine invariance. First, the original images are converted into gray images; then the connected pixel sequences are obtained according to the pixel gray values; finally, the stability of the connected regions is checked against the maximum rate of change $V_{\max}$ and the region area to obtain the MSERs [10-11]. For a region $Q$ and its boundary $\partial Q$, $Q$ is an extremal region if the gray value of every pixel in $Q$ is higher (or lower) than that of every pixel on $\partial Q$. Let $Q_1 \subset Q_2 \subset \cdots \subset Q_i \subset \cdots$ represent extremal regions that are nested in each other as the gray threshold varies; the stability of $Q_i$ is measured by
$$v(i) = \frac{|Q_{i+\Delta} \setminus Q_{i-\Delta}|}{|Q_i|},$$
and $Q_{i^*}$ is maximally stable if $v(i)$ attains a local minimum at $i^*$.
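The stability test above can be sketched on a toy example (this is not a full MSER extractor — OpenCV's `cv2.MSER_create` provides one — and the area sequence below is hypothetical):

```python
# Given the area |Q_i| of one nested region chain at each gray
# threshold i, the stability score is
#   q(i) = (|Q_{i+delta}| - |Q_{i-delta}|) / |Q_i|,
# since nesting makes |Q_{i+delta} \ Q_{i-delta}| an area difference.
# Local minima of q mark maximally stable regions.
def stability(areas, delta):
    """Return q(i) for every threshold where i +/- delta stay in range."""
    q = {}
    for i in range(delta, len(areas) - delta):
        q[i] = (areas[i + delta] - areas[i - delta]) / areas[i]
    return q

# Hypothetical area sequence: the region grows slowly around i = 4,
# so that threshold should score as the most stable.
areas = [10, 40, 90, 100, 102, 105, 160, 400]
q = stability(areas, delta=1)
best = min(q, key=q.get)  # threshold with the most stable region
```

The plateau of nearly constant area around `i = 4` is exactly what "maximally stable" captures: the region barely changes as the threshold moves.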

Construction of graph features
In order to describe the structural characteristics of the extracted MSERs, we approximate each MSER by a polygon and then construct an attribute graph [12][13][14][15]. For the matching of two polygons, consider two attribute graphs $G_1 = (V_1, E_1, A_1, B_1)$ and $G_2 = (V_2, E_2, A_2, B_2)$, whose components represent a set of nodes, edges, node attributes and edge attributes, respectively. We only consider the edge attributes, which are expressed as $1 \times 19$ vectors. The first component represents the normalized distance, and the remaining 18 components represent the angle information of the edge with respect to its adjacent edges. For the edge $e_{ij}$ from node $v_i$ to node $v_j$, the first component represents the normalized length $L_{ij}$, and a circular Gaussian histogram $P_{ij}$ is used for the angle information, as shown in Figure 3: each angle contributes a discrete Gaussian window $N(\mu, \sigma)$ of size $\sigma$ centered on $\mu$, with additional Gaussian terms wrapping around to make the bins circular. The final edge attribute is the concatenation $a_{ij} = [L_{ij}, P_{ij}]$ of the normalized distance $L_{ij}$ and the angle information $P_{ij}$, which is asymmetric ($a_{ij} \ne a_{ji}$). In this work, we used a window size $\sigma_P = 5$, so that $N(0,5) = 1.0$, $N(\pm 1,5) = 0.4578$, $N(\pm 2,5) = 0.0439$, and 0 otherwise.
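A hedged sketch of this $1 \times 19$ edge attribute follows. It uses the Gaussian weights quoted above literally; the exact angle and binning conventions are our assumptions, not the paper's:

```python
import numpy as np

# Discrete Gaussian weights as quoted in the text: N(0)=1.0,
# N(+-1)=0.4578, N(+-2)=0.0439, and 0 otherwise.
GAUSS = {0: 1.0, 1: 0.4578, -1: 0.4578, 2: 0.0439, -2: 0.0439}
NBINS = 18  # 18 circular bins covering 360 degrees (20 degrees each)

def angle_histogram(angles_deg):
    """Place each angle in its 20-degree bin, smearing it into the
    circular neighbour bins with the Gaussian weights."""
    hist = np.zeros(NBINS)
    for a in angles_deg:
        b = int(a % 360) // 20
        for off, w in GAUSS.items():
            hist[(b + off) % NBINS] += w  # wrap-around makes bins circular
    return hist

def edge_attribute(norm_length, adjacent_angles_deg):
    """1x19 attribute: normalized length, then the angle histogram."""
    return np.concatenate([[norm_length],
                           angle_histogram(adjacent_angles_deg)])

a = edge_attribute(0.5, [30.0, 350.0])
```

Note how the angle at 350° spills weight across the 0° boundary into the first bins — the circular wrap-around is what keeps the attribute rotation-consistent.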

Graph matching model
In this paper, one-to-one matching constraints are adopted, so that every node in $G_1$ is mapped to at most one node in $G_2$ and vice versa. Graph matching identifies, among all possible correspondences, the subset of node correspondences between $G_1$ and $G_2$ that best preserves the attribute relations under the matching constraints. We use a binary indicator vector $\mathbf{y} \in \{0,1\}^{n_1 n_2}$, with $y_{ia} = 1$ if node $v_i$ of $G_1$ is matched to node $v_a$ of $G_2$ and $y_{ia} = 0$ otherwise. The affinity matrix $M$ consists of the affinity values between edges; the affinity between edge $e_{ij}$ of $G_1$ and edge $e_{ab}$ of $G_2$ is
$$M_{ia;jb} = \exp\!\left(-\frac{\|a_{ij} - a_{ab}\|^2}{\sigma_s^2}\right),$$
where the scaling factor $\sigma_s^2$ is chosen to be 0.1. The diagonal components of $M$ are set to zero, because no node attributes are used. The graph matching problem is then to find the best matching relation $\mathbf{y}^*$ between the two graphs.
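A minimal sketch of how such an affinity matrix could be assembled is given below. The Gaussian-kernel form and the toy 1-d edge attributes (standing in for the $1 \times 19$ vectors) are illustrative assumptions:

```python
import numpy as np

SIGMA_S2 = 0.1  # scaling factor sigma_s^2 from the text

def edge_affinity(a1, a2):
    """Gaussian kernel on the edge-attribute distance."""
    d2 = float(np.sum((np.asarray(a1) - np.asarray(a2)) ** 2))
    return np.exp(-d2 / SIGMA_S2)

def build_affinity(pairs, edges1, edges2):
    """pairs: candidate node correspondences [(i, a), ...];
    edges1/edges2: dicts mapping node pairs (i, j) / (a, b) to edge
    attribute vectors. M[p, q] compares edge (i, j) in G1 with edge
    (a, b) in G2, for candidates p = (i, a) and q = (j, b)."""
    n = len(pairs)
    M = np.zeros((n, n))
    for p, (i, a) in enumerate(pairs):
        for q, (j, b) in enumerate(pairs):
            if p != q and (i, j) in edges1 and (a, b) in edges2:
                M[p, q] = edge_affinity(edges1[(i, j)], edges2[(a, b)])
    return M  # diagonal stays zero: no node attributes

# Two-node toy graphs with 1-d edge attributes.
edges1 = {(0, 1): [0.5], (1, 0): [0.5]}
edges2 = {(0, 1): [0.5], (1, 0): [0.6]}
M = build_affinity([(0, 0), (1, 1)], edges1, edges2)
```

Identical edge attributes give affinity 1.0, and the slightly mismatched reverse edges score a little lower, which is the gradation the matcher exploits.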

Construction of objective function
We can find the affinity values of the corresponding edge attributes in $M$ according to the components of the match result $\mathbf{y}$ that are equal to 1. These affinity values are summed into a matching score, and the matching score is used to evaluate the matching result of the two graphs. Graph matching can then be expressed as
$$\mathbf{y}^* = \arg\max_{\mathbf{y}} \ \mathbf{y}^{\mathrm T} M \mathbf{y}, \qquad \mathbf{y} \in C_{\mathrm{int}} \cap C_{\mathrm{one}}, \tag{7}$$
where $C_{\mathrm{int}} = \{0,1\}^{n_1 n_2}$ constrains each of the $n_1 n_2$ components of $\mathbf{y}$ to be 0 or 1, and $C_{\mathrm{one}}$ enforces the one-to-one constraint. The affinity measure is restricted to be nonnegative; therefore the number of matches in the final solution is implicitly driven to the maximum possible number of matches under the one-to-one constraint. The underlying assumption is that a solution containing the true matches has a higher objective value than those that do not. However, this assumption holds poorly even with few outliers. The matching score $S$ is composed of two parts,

$$S = S_{\mathrm{in}} + S_{\mathrm{out}} = \mathbf{y}^{\mathrm T} M \mathbf{y},$$
where $S_{\mathrm{in}}$ represents the matching score of the correct matching point pairs, and $S_{\mathrm{out}}$ represents the matching score of the outliers. The existence of the outliers increases the matching score, so in this paper we consider not only the matching score but also the number of matching points, and change the objective function of formula (7) into formula (8), which balances the matching score against the number of selected matches; the parameter in formula (8) is kept fixed in all the experiments. We adopt the Markov chain Monte Carlo (MCMC) sampling technique proposed in the literature [16] to solve the optimization problem in formula (8).
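The score $S = \mathbf{y}^{\mathrm T} M \mathbf{y}$ and the outlier effect it suffers from can be demonstrated directly (a toy sketch; the matrix values are hypothetical):

```python
import numpy as np

# Matching score S = y^T M y for a binary indicator vector y over
# candidate correspondences.
def matching_score(M, y):
    y = np.asarray(y, dtype=float)
    return float(y @ M @ y)

M = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])

# Selecting the two strongly compatible candidates scores 0.9 + 0.9.
s2 = matching_score(M, [1, 1, 0])  # 1.8
# Adding a weakly compatible (outlier-like) third candidate still
# raises the score, which is why the modified objective must also
# take the number of selected matches into account.
s3 = matching_score(M, [1, 1, 1])  # 2.4
```

Because every affinity is nonnegative, adding any candidate can only increase $S$ — exactly the failure mode the modified objective of formula (8) is designed to counter.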

Solve transformation matrix
We adopt a coarse-to-fine strategy, so the transformation matrix needs to be obtained at each resolution level; the transformation matrix obtained at a low resolution guides the matching at the next higher resolution, and finally the matrix obtained on the original images is used to achieve registration. In this paper, the original images are down-sampled three times and the initial matching

point set is obtained on the maximally down-sampled images; this matching point set is then used as the input of the RANSAC algorithm. Following the idea of pyramid stratification, the above process is repeated on the higher-resolution images, guided by the transformation matrix obtained at the lower resolution and constrained by the current matching error, to achieve precise matching in a local area until the original-resolution images are reached.
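The paper uses the standard RANSAC algorithm for the affine fit at each pyramid level; a self-contained sketch is given below (the iteration count and inlier threshold are illustrative choices, not the paper's settings):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine P (3x2) mapping (N,2) src to (N,2) dst,
    N >= 3, via dst ~ [x, y, 1] @ P."""
    A = np.hstack([src, np.ones((len(src), 1))])
    P, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return P

def ransac_affine(src, dst, iters=200, thresh=1.0, rng=None):
    """Repeatedly fit an affine to a random 3-point sample and keep
    the model with the largest inlier set; refit on those inliers."""
    rng = np.random.default_rng(rng)
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    homo = np.hstack([src, np.ones((len(src), 1))])
    best_inl = np.zeros(len(src), bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)
        P = fit_affine(src[idx], dst[idx])
        inl = np.linalg.norm(homo @ P - dst, axis=1) < thresh
        if inl.sum() > best_inl.sum():
            best_inl = inl
    return fit_affine(src[best_inl], dst[best_inl]), best_inl

# Synthetic check: four points translated by (2, 3), plus one gross
# outlier injected into the correspondences.
src = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], float)
true_P = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
dst = np.hstack([src, np.ones((5, 1))]) @ true_P
dst[4] += 50.0  # outlier
P, inliers = ransac_affine(src, dst, rng=0)
```

The refit on the inlier set recovers the true translation while the corrupted correspondence is rejected, which is the robustness property the coarse-to-fine scheme relies on.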

Evaluation criteria
In this paper, we adopt the root mean square error (RMSE) as the basis for testing the accuracy of the registration algorithms. The smaller the RMSE, the higher the registration accuracy. The RMSE is defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left[(x_i - x_i')^2 + (y_i - y_i')^2\right]},$$
where $m$ is the number of matching points, $(x_i, y_i)$ are the coordinates in the reference image, and $(x_i', y_i')$ are the transformed coordinates in the image to be registered. At the same time, we also consider a subjective evaluation: if the registration accuracy is high, the common visually sensitive structures in the IR and VI align completely, so no significant dislocation appears in the fused image and a good visual effect is obtained.
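The RMSE formula above, written out directly as a small helper:

```python
import math

# Root mean square error between reference points and transformed
# points: average the squared distances, then take the square root.
def rmse(ref_pts, transformed_pts):
    m = len(ref_pts)
    total = sum((x - xt) ** 2 + (y - yt) ** 2
                for (x, y), (xt, yt) in zip(ref_pts, transformed_pts))
    return math.sqrt(total / m)

# Two points, each off by (3, 4) -> every residual is 5, so RMSE = 5.
err = rmse([(0, 0), (10, 10)], [(3, 4), (13, 14)])  # 5.0
```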

Experiments
In this section we evaluate our method on real images of three scenes. Scene 1 contains a building background, as shown in Figure 4 (a) and (b). There is a small deformation between the images in scene 2, as shown in Figure 5 (a) and (b). The image quality is poor in scene 3, as shown in Figure 6 (a) and (b). For each scene, we compare the result with two matching methods: the registration method based on mutual information and the registration method based on line features. In the experiments, the VI is the image to be registered and the IR is the reference image. For scene 3, the registration method based on mutual information cannot be applied because of the large difference between the VI and IR; therefore, we only give the experimental results of the line-feature-based method and the method proposed in this paper. The experimental results are shown as follows:

Objective evaluation
In order to quantitatively evaluate the registration, we calculate the RMSE of the registration method based on mutual information, the registration method based on line features and the proposed method. The results are shown in Table 1. It can be seen from Table 1 that our method achieves the best performance on all three scenes compared with the registration methods based on mutual information and line features.

Subjective evaluation
In order to facilitate observation of the registration effect, we fuse the infrared reference image with the visible image that has been registered; the fusion results are shown in the following figures: (a) mutual information method, (b) line features method, (c) proposed method.

IST2017
From the fusion results, we can see that the registration based on the mutual information method exhibits serious dislocation for scenes 1 and 2, and that the mutual information method is unable to achieve registration for scene 3 because of the poor image quality, while the registration method based on line features shows slight dislocation. With the registration algorithm proposed in this paper, the common visually sensitive structures are completely aligned; compared with the other two methods, there is no significant dislocation and the visual effect is good.

Conclusion
The gray characteristics of infrared and visible images of the same scene differ because of the different imaging mechanisms. This causes differences in the extracted corners, lines and contours, especially for images of poor quality, so some existing registration algorithms perform poorly. To solve this problem, we proposed a registration algorithm based on the pyramid hierarchical idea and graph theory, which considers the structure information of the graphs in a coarse-to-fine matching process. Experimental results show that the proposed method adapts well to image content and image deformation. Compared with the other methods, the proposed algorithm achieves state-of-the-art performance, with a small matching error while ensuring the robustness of the matching.
In the stability measure for the most stable extremal region, $\Delta$ represents the change of gray value, and $|\cdot|$ represents the region area, that is, the number of pixels covered by the region. The gray values of a most stable extremal region are less than (MSER−) or larger than (MSER+) the gray values of the pixels on its boundary. As shown in Figure 2, (a) shows part of the MSERs marked in the VI, and (b) shows the MSER extraction results for the marked regions in (a). Because connectivity is used in the extraction process together with a constraint on region size, only the lower part of the second word is retained. An advantage of the MSER is that its extraction process accords with the human visual system: the human eye is more sensitive to salient regions, which correspond to the MSERs.

Figure 3. The representation of angle information. We use 18 uniform bins of size 20°.

Figure 4. Comparison of results for the first scene.
Figure 5. Comparison of results for the second scene.

Figure 6. Comparison of results for the third scene.

Figure 7. Comparison of fusion results for the first scene.
Figure 8. Comparison of fusion results for the second scene.
Figure 9. Comparison of fusion results for the third scene.
Here $y_{ia} = 1$ if node $v_i$ of $G_1$ is matched to node $v_a$ of $G_2$, and $y_{ia} = 0$ otherwise; the affinity matrix $M$ consists of the affinity values between edges and nodes.

Table 1. The RMSE of the three registration methods.