A Data Forward Stepwise Fitting Algorithm Based on Orthogonal Function System

Data fitting is a main method of functional data analysis and is widely used in economics, social science, engineering technology, and other fields. The least squares method is the main method of data fitting, but it is not a successive approximation procedure, has no memory property, can leave a large fitting error, and is prone to over fitting. Based on the orthogonal trigonometric function system, this paper presents a data forward stepwise fitting algorithm. The algorithm adopts a forward stepwise fitting strategy: at each step, the next basis function is used to fit the residual error left by the previous fitting step, so that the residual mean square error is minimized. We theoretically prove the convergence, the memory property, and the fitting-error diminishing property of the algorithm. Experimental results show that the proposed algorithm is effective and that its fitting performance is better than that of the least squares method and of a forward stepwise fitting algorithm based on a non-orthogonal function system.


Introduction
Data fitting, or data approximation, is an important and common technique in data mining [1]. The least squares method (LSM) is widely used in data fitting, but it has the following drawbacks: the solution may not exist; the number of parameters must be far smaller than the number of data points, otherwise the fit is prone to over fitting; it is not a successive approximation algorithm, so the fitting error does not necessarily decrease as model complexity increases; and it has no memory property, so when a basis function is added all parameters must be retrained. In recent years, the papers [2][3][4][5] have put forward piecewise linear fitting methods, in which the original data sequence is represented by key points so as to compress the sequence and reduce storage and computation cost. However, these methods cannot approximate the original data sequence arbitrarily well, and the fitting error is large. Orthogonal function systems are an important class of function sets: once completed, such a system becomes a basis of a Hilbert space and can be used for Fourier analysis, function approximation, and data fitting. Orthogonal function systems are easy to construct, for example by Gram-Schmidt orthogonalization; the paper [6] gives several important orthogonal function systems. Because the members of an orthogonal function system are linearly independent, each extracts information along its own direction without overlap, which makes orthogonal function systems very suitable for data fitting. Based on LSM, the paper [7] uses a class of orthogonal spline function systems to fit scattered data, and the paper [8] uses wavelet functions to fit discrete data. Based on a popular surface-rendering method, the paper [9] applies a complete orthogonal function system to fit point cloud data.
Because of the excellent properties of orthogonal function systems, this paper adopts a linear combination of the complete orthogonal trigonometric function system on L2[0, 1] to fit the original data. Unlike the previous literature, however, this paper adopts a forward stepwise fitting strategy, extracting the maximum residual information at each step. The proposed algorithm has several excellent properties, including convergence, the memory property, and gradual approximation.
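As a concrete illustration of the orthogonality relied on here, the following sketch defines a standard orthonormal trigonometric system on L2[0, 1] (the paper's system (2) is of this type; the function names `phi` and `inner` and the midpoint-rule grid are our own choices, not the paper's) and checks orthonormality numerically:

```python
import math

# Orthonormal trigonometric system on L^2[0,1]:
#   phi_0(x) = 1,
#   phi_{2k-1}(x) = sqrt(2) * cos(2*pi*k*x),
#   phi_{2k}(x)   = sqrt(2) * sin(2*pi*k*x).
def phi(j, x):
    if j == 0:
        return 1.0
    k = (j + 1) // 2
    trig = math.cos if j % 2 == 1 else math.sin
    return math.sqrt(2.0) * trig(2.0 * math.pi * k * x)

def inner(i, j, grid=10000):
    # Midpoint-rule approximation of the L^2[0,1] inner product <phi_i, phi_j>.
    h = 1.0 / grid
    return h * sum(phi(i, (m + 0.5) * h) * phi(j, (m + 0.5) * h)
                   for m in range(grid))
```

Here `inner(i, i)` is approximately 1 and `inner(i, j)` approximately 0 for i != j, which is exactly the non-overlapping information extraction the text describes.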

Algorithm and Property
Let the data sequence be {y1, y2, ..., yN}, and let M be a given positive integer such that N/M is odd. The data sequence is mapped to the planar data set (1), and the linear combination (2) of the above orthogonal function system is used to fit f(x). The n in (2) reflects the complexity of the model: in general, the larger n is, the better the fit, but the more complex the model, the more prone it is to over fitting. The problem is therefore: given (1), how to learn the parameters of (2) and determine the appropriate complexity.
Given complexity n, we need to learn the parameters. LSM obtains the parameters by minimizing the mean square error, that is, by solving the optimization problem (3). In the following we solve this optimization problem: setting the partial derivatives to zero and solving the resulting equations yields the parameter formulas (5) and (6). Using (5) and (6), we obtain the following algorithm.
Algorithm A (data forward stepwise fitting algorithm based on the orthogonal trigonometric function system, OTFS-DFSFA).
Input: data sequence {y1, y2, ..., yN} and complexity n. Output: the fitted model. Step 1: map the data sequence into the data set (1). Step 2: compute the parameters according to (4)-(6). Step 4: output the fitted model.
Algorithm A has the properties stated in Propositions B-D below; note in particular that LSM does not have the property of Proposition D.
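The parameter formulas (4)-(6) did not survive extraction, so the following is only a sketch of the stepwise idea under stated assumptions: the data are mapped onto the uniform grid x_i = i/N in [0, 1), and each step fits a single trigonometric term to the current residual by one-dimensional least squares, so earlier coefficients are never retrained (the memory property):

```python
import math

def basis(j, x):
    # Assumed term ordering: 1, cos 2*pi*x, sin 2*pi*x, cos 4*pi*x, ...
    if j == 0:
        return 1.0
    k = (j + 1) // 2
    trig = math.cos if j % 2 == 1 else math.sin
    return trig(2.0 * math.pi * k * x)

def stepwise_fit(y, n_terms):
    """Forward stepwise fit: each basis function is fitted, in order, to
    the residual left by the previous ones (one-dimensional least squares
    per step), so earlier coefficients never need retraining."""
    N = len(y)
    xs = [i / N for i in range(N)]      # assumed mapping of indices into [0, 1)
    resid = list(y)
    coeffs = []
    for j in range(n_terms):
        col = [basis(j, x) for x in xs]
        denom = sum(c * c for c in col)          # nonzero (see Theorem E)
        a = sum(c * r for c, r in zip(col, resid)) / denom
        coeffs.append(a)
        resid = [r - a * c for c, r in zip(col, resid)]
    return coeffs, resid
```

For example, on y_i = 3 + 2 cos(2*pi*x_i) this recovers the constant 3 and the cosine coefficient 2 with a vanishing residual, and increasing n_terms can never increase the residual.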

Convergence of the Algorithm
Theorem E. Algorithm A is convergent. Proof. It suffices to prove that the denominators of formulas (5) and (6) are not equal to 0; that is, that inequality (8) holds.
So we only need to prove that the equality sign in (8) cannot occur. A necessary and sufficient condition for equality in (8) is condition (9), where K is a constant. First, it follows from the first item of (9) that there exists an integer m_i for which the required equality fails, so (9) is not established. Second, the second item of (9) leads to an equation whose left-hand side is an irrational number and whose right-hand side is an integer, which is a contradiction. Therefore (9) does not hold, the equality sign in (8) cannot be attained, and the denominators of (5) and (6) are nonzero.
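The quantities the theorem protects are the per-step denominators. A minimal numerical check, assuming the uniform grid x_i = i/N (the paper's exact mapping (1) is not reproduced here), confirms that the discrete denominators stay bounded away from zero for the frequencies a run would use:

```python
import math

def denominators(N, K):
    """Discrete denominators sum_i cos^2(2*pi*k*x_i) and
    sum_i sin^2(2*pi*k*x_i) on the assumed grid x_i = i/N.
    Theorem E guarantees the algorithm's denominators never vanish."""
    xs = [i / N for i in range(N)]
    pairs = []
    for k in range(1, K + 1):
        dc = sum(math.cos(2.0 * math.pi * k * x) ** 2 for x in xs)
        ds = sum(math.sin(2.0 * math.pi * k * x) ** 2 for x in xs)
        pairs.append((dc, ds))
    return pairs
```

For N = 65 and frequencies k = 1, ..., 32, every denominator equals N/2 = 32.5 up to rounding, comfortably away from zero.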

Model Selection
The canonical method of model selection is regularization. Regularization usually takes the form (10), where the first term measures how well the model fits the training data and the second term is the regularization term, which reflects the structural complexity of the model. The relative importance of fitting accuracy versus structural complexity is controlled by the regularization parameter λ ≥ 0. Data fitting is thus transformed into an optimization problem with a loss function and a regularization term, in which the parameter λ trades off loss against over-fitting risk. The literature [10] studies this problem thoroughly.
Given complexity n, because of the memory property of Algorithm A, a single run of Algorithm A yields n models, with complexities from 1 to n. The question is how to choose the right model among them. Following (10), the specific regularization form (11) is adopted. Suppose that when the complexity changes from k to k + 1, the value of (11) becomes larger. From this observation and the memory property of Algorithm A, the following over-fitting criterion is obtained: if complexity k satisfies (12), then f_m(x) is over fitting for all m ≥ k, where λ can be adjusted according to the practical problem.
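Formula (11) itself is lost in extraction; assuming the common penalized form J(k) = MSE_k + λk as a stand-in, model selection over the n models produced by one stepwise run can be sketched as follows (`select_model` is a hypothetical helper, not a name from the paper):

```python
def select_model(mses, lam=1.0 / 65):
    """Pick the complexity minimizing the penalized criterion
    J(k) = MSE_k + lam * k (a stand-in for form (11)); thanks to the
    memory property, mses[0..n-1] all come from a single stepwise run."""
    crit = [mse + lam * (k + 1) for k, mse in enumerate(mses)]
    best = min(range(len(crit)), key=crit.__getitem__) + 1
    return best, crit
```

Once J(k + 1) exceeds J(k), reasoning in the style of criterion (12) flags all larger complexities as over fitting.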

Experimental data
The data used in the experiment comes from the Dongguan statistical survey information network, which records the monthly electricity load in Dongguan from

Comparison of the fitting performance with the least square method
First, this experiment compares the descent speed of the MSE with that of LSM. Given complexity n, the fitting degree of the algorithm is measured by the mean square error MSE = (1/N) Σ_{i=1}^{N} (y_i − f_n(x_i))².
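A self-contained sketch of how such an MSE curve can be produced from a single stepwise run (the uniform grid x_i = i/N and the term ordering 1, cos 2πx, sin 2πx, ... are our assumptions): each step subtracts the least-squares projection of the current residual onto the next term, so the curve is nonincreasing.

```python
import math

def fit_mse_curve(y, n_terms):
    """Residual MSE after each forward-stepwise step. Each step subtracts
    the least-squares projection of the residual onto the next term, so
    the returned sequence of MSE values is nonincreasing."""
    N = len(y)
    xs = [i / N for i in range(N)]
    resid = list(y)
    curve = []
    for j in range(n_terms):
        if j == 0:
            col = [1.0] * N
        else:
            k = (j + 1) // 2
            trig = math.cos if j % 2 == 1 else math.sin
            col = [trig(2.0 * math.pi * k * x) for x in xs]
        a = sum(c * r for c, r in zip(col, resid)) / sum(c * c for c in col)
        resid = [r - a * c for c, r in zip(col, resid)]
        curve.append(sum(r * r for r in resid) / N)
    return curve
```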

Comparison of fitting performance with a non-orthogonal function system
This experiment adopts the following non-orthogonal function system, built from powers of the cosine: {1, cos 2πx, cos² 2πx, ..., cosⁿ 2πx, ...}. A forward stepwise fitting algorithm analogous to Algorithm A is used to compare the fitting effects of the two function systems. First, compare

Model selection and over fitting decision
Taking complexity 34, by the memory property of Algorithm A we obtain a total of 34 models, with complexities from 1 to 34. Taking λ = 1/65, we compute the regularization formula (11) and check the over-fitting condition (12) for each model. The calculation results are as follows.
As can be seen from the above, f_n(x) is over fitting for all n ≥ 33. Figure 3 shows the fitting effect of the models with complexity 32 and 33.

Conclusion
In this paper, a data forward stepwise fitting algorithm based on orthogonal function system is proposed. The algorithm has memory property, and the time complexity of the algorithm is less than that of LSM when learning multiple models. The algorithm has gradual approximation property, and the fitting error decreases as the complexity of the model increases. The experimental results show that the performance of the proposed algorithm (MSE dropping speed and fitting accuracy) is better than that of LSM and the forward stepwise fitting algorithm based on the non-orthogonal function system. At the same time, the optimal model selection method and over fitting criterion are given. The experiment shows that the model selection method and the over fitting criterion are effective and have practical significance.