A Relevance Vector Machine Prediction Method Based on the Biased Wavelet Kernel Function

Relevance Vector Machine (RVM) is an important learning method in machine learning because of its sparsity, global optimality and ability to solve nonlinear problems through kernel functions. In this paper, a family of biased wavelets was used to construct the kernel functions of RVM. Biased wavelets have an adjustable nonzero mean, which makes the kernel of RVM more flexible. Using Centered Kernel Target Alignment (CKTA), the biased parameter was selected to improve the prediction performance of the RVM model. The algorithm based on the biased wavelet kernel showed higher prediction accuracy than the wavelet kernel and the Cauchy kernel. In short, RVM with the biased wavelet kernel is a flexible prediction algorithm with high prediction accuracy.


Introduction
RVM, proposed by Tipping [1], is a Bayesian probabilistic model whose kernel does not need to satisfy the Mercer conditions [2]. Owing to its sparseness, Bayesian properties and kernel characteristics [3], RVM is one of the best-known sparse Bayesian learning models [4,5]. As with the support vector machine (SVM) model [6], the performance of RVM depends on the kernel function and the kernel parameters [7]. At present, methods for choosing an effective kernel function and reasonable kernel parameters are still imperfect [8,9].
The kernel matrix can be learned from data via semidefinite programming (SDP) techniques [10]. Cristianini, Shawe-Taylor, Elisseeff and Kandola [11] proposed a measure named 'kernel target alignment' (KTA) to adapt the kernel matrix to the sample labels, from which a series of algorithms were derived for clustering, transduction, kernel combination [12] and kernel selection [13]. However, a high KTA is only a sufficient condition for a good kernel matrix, not a necessary one: a kernel matrix can perform very well even though its KTA is low [14]. CKTA, proposed by Marina [15], performed better than KTA in several experiments.
The characteristics of the biased wavelet make it suitable as a kernel function of RVM. Wavelet analysis, which can efficiently overcome the shortcomings of Fourier analysis and other analysis tools, is becoming a focus of many sciences. Because wavelets have zero mean, a large number of multiresolution levels is often needed to reduce the $L^2$ norm of the approximation error. To reduce this redundancy, the biased wavelet was proposed by Galvao [16].
In this paper, the biased wavelet was used to construct the kernel function of RVM. CKTA was used to optimize the biased parameter with a fixed scale parameter. We evaluated the prediction accuracy of the biased wavelet kernel, the Cauchy kernel and the wavelet kernel using data from a fiber Bragg grating (FBG) temperature sensor system.
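After a short introduction to RVM and the biased wavelet, Section 2 describes how the biased parameters were filtered by CKTA to build the biased wavelet kernel. Experiments are presented in Section 3, and Section 4 concludes the paper.

Let $x$ be the input data and $y$ the output data. A point $y(x)$ can be predicted by:

$$y(x) = \sum_{n=1}^{N} w_n\, k(x, x_n) + w_0 \quad (1)$$

Using Bayes' rule, the posterior distribution of $w$, $\alpha$ and $\sigma^2$ given $y$ can be computed:

$$p(w, \alpha, \sigma^2 \mid y) = \frac{p(y \mid w, \alpha, \sigma^2)\, p(w, \alpha, \sigma^2)}{p(y)} \quad (2)$$

The parameters $\alpha$ correspond to a zero-mean Gaussian prior distribution over $w$, which avoids overfitting. The posterior distribution over the weights is Gaussian with mean $\mu$ and covariance $\Sigma$:

$$\Sigma = \left(\sigma^{-2} \Phi^{T} \Phi + A\right)^{-1}, \qquad \mu = \sigma^{-2}\, \Sigma\, \Phi^{T} y \quad (3)$$

where $\Phi$ is the design matrix with entries $\Phi_{in} = k(x_i, x_n)$ and $A = \mathrm{diag}(\alpha_1, \ldots, \alpha_N)$. After learning the mean and variance, the results of (3) are applied to (1) to predict a new point $x_*$:

$$y_* = \mu^{T} \phi(x_*), \qquad \sigma_*^2 = \sigma_{MP}^2 + \phi(x_*)^{T}\, \Sigma\, \phi(x_*)$$

where $\phi(x_*) = [k(x_*, x_1), \ldots, k(x_*, x_N)]^{T}$. The predicted variance $\sigma_*^2$ is the sum of the variance caused by the measurement noise, $\sigma_{MP}^2$, and the uncertainty in the prediction of $w$.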
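As a minimal sketch of the prediction step described above, the following Python code computes the posterior mean and covariance of the weights and the predictive mean and variance for fixed hyperparameters. It assumes the standard sparse Bayesian expressions from Tipping [1]; the function names, the Gaussian stand-in kernel and the toy data are illustrative assumptions, and the iterative re-estimation of $\alpha$ and $\sigma^2$ is omitted.

```python
import numpy as np

def rvm_posterior(Phi, y, alpha, sigma2):
    """Posterior mean and covariance of the weights for fixed alpha and sigma2."""
    A = np.diag(alpha)                                # prior precisions of the weights
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + A)   # posterior covariance
    mu = Sigma @ Phi.T @ y / sigma2                   # posterior mean
    return mu, Sigma

def rvm_predict(phi_star, mu, Sigma, sigma2):
    """Predictive mean and variance for one test point with basis vector phi_star."""
    mean = phi_star @ mu
    # noise variance plus the uncertainty contributed by the weights
    var = sigma2 + phi_star @ Sigma @ phi_star
    return mean, var

# Toy usage (hypothetical data; a Gaussian kernel is used only as a stand-in)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
Phi = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1)
y = np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal(20)
alpha = np.ones(20)       # assumed fixed; a full RVM would re-estimate these
sigma2 = 0.05 ** 2        # assumed known noise variance

mu, Sigma = rvm_posterior(Phi, y, alpha, sigma2)
mean, var = rvm_predict(Phi[3], mu, Sigma, sigma2)
```

In a complete RVM the hyperparameters would be updated until most $\alpha_n$ grow very large, which prunes the corresponding weights and yields the sparse model.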
Kernel functions are important for prediction performance. Methods of selecting kernel functions, such as selection by experience [17], by comparison [18] and multi-kernel methods [19,20], have been developed in practical applications.
The biased wavelet can describe both the data as a whole and its details. Therefore, when the biased wavelet is used as the kernel of RVM, the feature space can be brought close to the target space by dynamically adjusting the biased parameters for different types of data, which can improve the prediction accuracy of the RVM model.
A biased wavelet function $u$ satisfies a set of conditions, including:

iii. $u(\tau)$ decreases rapidly to zero as $\tau \to \infty$;
iv. its Fourier transform $\hat{u}(\omega)$ decreases rapidly to zero as $\omega \to \infty$.

In this paper, we used the third type of biased wavelet (6), with the Mexican Hat (7) as the mother wavelet. A set of biased wavelet kernels was then defined, where $\sigma$ is the scale parameter, $b$ is the translation parameter, $c$ is the biased parameter and $\tau$ is a continuous real variable.

Since there is no explicit form for the mapping function, the learning algorithm of RVM obtains information about the feature space, the model, the training data and their relationship from the Gram matrix. KTA based on the Gram matrix is therefore considered an effective way to filter biased parameters. The Gram matrix $M_K$ of the kernel $k$ has entries

$$[M_K]_{i,j} = k(x_i, x_j)$$
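To illustrate how a Gram matrix can be built from a biased wavelet, the sketch below uses the Mexican Hat mother wavelet plus a bias term scaled by $c$. The bias term `c * exp(-t**2 / 2)`, the translation-invariant product form of the kernel and all function names are assumptions made for illustration only; they are not necessarily the third-type biased wavelet (6) or the kernel used in this paper.

```python
import numpy as np

def mexican_hat(t):
    """Mexican Hat mother wavelet (unnormalized)."""
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def biased_wavelet(t, c):
    """ASSUMED biased wavelet: mother wavelet plus a Gaussian bump scaled by c."""
    return mexican_hat(t) + c * np.exp(-t ** 2 / 2.0)

def gram_matrix(X, Z, sigma, c):
    """Gram matrix with k(x, z) = prod_d u((x_d - z_d) / sigma).

    X has shape (n, d) and Z has shape (m, d); the result has shape (n, m).
    The translation parameter is absorbed by the difference x - z (assumption).
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    diff = (X[:, None, :] - Z[None, :, :]) / sigma
    return np.prod(biased_wavelet(diff, c), axis=2)

# Usage on a hypothetical one-dimensional input (time index as a column)
X = np.linspace(0.0, 2.0, 5).reshape(-1, 1)
c = -1.3
K = gram_matrix(X, X, sigma=1.0, c=c)
```

With this assumed form the diagonal of the Gram matrix equals $1 + c$, so the matrix need not be positive semidefinite; this is acceptable here because the RVM kernel does not have to satisfy the Mercer conditions.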
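The target matrix $Y$ was defined by its entries $Y(x_i, x_j)$, and the Frobenius inner product between the Gram matrix and the target matrix is

$$\langle M_K, Y \rangle_F = \sum_{i,j=1}^{N} M_K(x_i, x_j)\, Y(x_i, x_j)$$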
The purpose of KTA is to calculate the degree of alignment between the Gram matrix and the target matrix. However, if the origin in the feature space is far from the convex hull of the data, the elements of $M_K$ all have about the same value and, as a result, the matrix $M_K$ is ill-conditioned. Therefore CKTA, a better method, was proposed by Marina [15].
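From a given kernel $k$, the centered kernel was defined by

$$k_c(x, x') = k(x, x') - \frac{1}{N}\sum_{i=1}^{N} k(x_i, x') - \frac{1}{N}\sum_{j=1}^{N} k(x, x_j) + \frac{1}{N^2}\sum_{i,j=1}^{N} k(x_i, x_j)$$

The centered Gram matrix was defined by

$$[M_{K_c}]_{i,j} = k_c(x_i, x_j)$$

Similar to KTA, CKTA was defined by

$$\mathrm{CKTA}(M_K, Y) = \frac{\langle M_{K_c}, Y_c \rangle_F}{\sqrt{\langle M_{K_c}, M_{K_c} \rangle_F\, \langle Y_c, Y_c \rangle_F}}$$

where $Y_c$ is the centered target matrix and $\langle \cdot, \cdot \rangle_F$ is the Frobenius inner product. Compared with KTA, CKTA copes better with unbalanced data and is invariant to linear transformation.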
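The centered alignment can be sketched in a few lines of Python. This is a minimal illustration, assuming that the target matrix is the outer product $y y^{T}$ (as in the original KTA formulation) and that both the Gram matrix and the target matrix are centered; the function names are hypothetical.

```python
import numpy as np

def center_gram(K):
    """Center a Gram matrix: H K H with H = I - (1/N) 1 1^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def ckta(K, y):
    """Centered kernel target alignment between Gram matrix K and outputs y."""
    Y = np.outer(y, y)               # ASSUMPTION: target matrix y y^T
    Kc = center_gram(K)
    Yc = center_gram(Y)
    num = np.sum(Kc * Yc)            # Frobenius inner product
    den = np.linalg.norm(Kc) * np.linalg.norm(Yc)   # Frobenius norms
    return num / den

# Usage with a small hypothetical sample and a Gaussian stand-in kernel
y = np.array([1.0, 2.0, 4.0, 7.0])
K = np.exp(-(y[:, None] - y[None, :]) ** 2 / 10.0)
score = ckta(K, y)
```

Because centering removes any constant added to the Gram matrix and the ratio cancels any positive scale factor, `ckta(3 * K + 5, y)` gives the same value as `ckta(K, y)`.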
Based on CKTA, the filtering strategy for the biased parameters is shown in Fig. 1. The biased parameter with the maximum CKTA was used to construct the final selected kernel. Our experiments showed that the relationship between the biased parameter and the CKTA value was not monotonic, which meant that the target biased parameter could be found within a certain range.
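The selection strategy amounts to a one-dimensional search over the biased parameter with the scale parameter held fixed. The sketch below scans a grid of candidate values and keeps the one with the largest centered alignment. The biased wavelet form, the target matrix $y y^{T}$, the grid and the synthetic data are all assumptions made for illustration, not the paper's exact setup.

```python
import numpy as np

def biased_kernel(x, sigma, c):
    """Gram matrix of an ASSUMED biased Mexican Hat kernel for 1-D inputs."""
    t = (x[:, None] - x[None, :]) / sigma
    g = np.exp(-t ** 2 / 2.0)
    return (1.0 - t ** 2) * g + c * g      # mother wavelet plus bias term

def centered_alignment(K, y):
    """Centered alignment between K and the assumed target matrix y y^T."""
    n = len(y)
    H = np.eye(n) - 1.0 / n                # centering matrix I - (1/n) 1 1^T
    Kc = H @ K @ H
    Yc = H @ np.outer(y, y) @ H
    return np.sum(Kc * Yc) / (np.linalg.norm(Kc) * np.linalg.norm(Yc))

def select_bias(x, y, sigma, candidates):
    """Return the candidate c with maximum alignment and all scores."""
    scores = np.array([centered_alignment(biased_kernel(x, sigma, c), y)
                       for c in candidates])
    return candidates[int(np.argmax(scores))], scores

# Usage on hypothetical data: time index as input, a smooth signal as output
x = np.arange(30.0)
y = np.sin(x / 5.0)
candidates = np.linspace(-10.0, 10.0, 201)   # search range for c
best_c, scores = select_bias(x, y, sigma=5.0, candidates=candidates)
```

A grid search is used here because the alignment is not monotonic in $c$, so a bounded scan is a simple way to locate the maximum.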

Data Set and Error Measures
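The dataset contains 1440 instances from 120 hours of responses from the FBG temperature sensor system of the Second Yangtze River Bridge in Wuhan. The prediction algorithms were evaluated with respect to the mean relative error (MRE), the mean absolute error (MAE) and the root-mean-square error (RMSE):

$$\mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$

where $y_i$ is the true value, $\hat{y}_i$ is the predicted value and $n$ is the number of test samples.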
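The three error measures can be computed as follows. This is a small sketch assuming the usual definitions of MAE, MRE and RMSE (with MRE as a fraction rather than a percentage); the function name and the sample values are hypothetical.

```python
import numpy as np

def error_measures(y_true, y_pred):
    """Return (MAE, MRE, RMSE) for true values y_true and predictions y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mre = np.mean(np.abs(err) / np.abs(y_true))   # requires nonzero true values
    rmse = np.sqrt(np.mean(err ** 2))
    return mae, mre, rmse

# Usage with two hypothetical test samples
mae, mre, rmse = error_measures([2.0, 4.0], [1.0, 6.0])
```

For these two samples the errors are 1 and -2, so MAE is 1.5, MRE is 0.5 and RMSE is the square root of 2.5.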

3.2 Selection Method of Biased Wavelet Kernel
The wavelength responses from the sensors were selected as the output data, and the time series as the input data. Fig. 2 showed the relationship between the CKTA value and the biased parameter $c$ when 48 hours of data were used as the training data and the next 50 samples as the test data. The figure showed that the CKTA reached its maximum when the biased parameter was -1.3. Because the relationship was not monotonic, the best biased parameter was generally found in the range $[-10, 10]$. The target biased wavelet kernel was selected accordingly.
The biased wavelet kernel was compared with the Cauchy kernel and with the wavelet kernel using the Mexican Hat (7) as the mother wavelet.

Conclusions
In this paper, the kernel function of RVM was constructed from the biased wavelet, and the optimization of the kernel parameters was investigated. Because of the adjustable nonzero mean of the biased wavelet, the biased wavelet kernel is a flexible function. The CKTA method was used to optimize the parameters of the RVM kernel: the biased wavelet kernel function can be adjusted by changing its parameters to maximize the CKTA.
Experimental results showed that the biased wavelet kernel function achieved higher prediction accuracy.

In (1), $\{w_n\}$ is the weight vector, $k(x, x_n)$ is a kernel function, $N$ is the length of the weight vector and $w_0$ is the measurement noise. The noise $w_0$ is assumed to be normally distributed with zero mean and variance $\sigma^2$. Bayes' rule then gives the posterior distribution of $w$, $\alpha$ and $\sigma^2$.

Let the indices range over $i, j \in [1, N]$; the Gram matrix of the kernel was defined by its entries $[M_K]_{i,j}$.

Figure 1. Flowchart of the filtering strategy for the biased parameters.

Table I lists the MAE, MRE and RMSE obtained with training sets of different lengths taken from 96 hours of data. The 50 samples following each training set were used as the test set. The test results indicated that the biased wavelet kernel improved prediction performance.