Performance comparison of model selection criteria on generated experimental data

In bioinformatics and other areas, model selection is the process of choosing, from a set of candidate models of different classes, the model that provides the best balance between goodness of fit to the data and model complexity. Many criteria exist for evaluating mathematical models for data fitting. The main objectives of this study are: (1) to fit artificial experimental data with different models of increasing complexity; (2) to test whether two well-known criteria, Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), can correctly identify the model used to generate the artificial data; and (3) to assess and compare empirically the performance of AIC and BIC.


Introduction
Model selection is the process of choosing, from a set of candidate models of different classes, the model that provides the best balance between goodness of fit to the data and model complexity [1,2,3]. Different criteria exist for evaluating competing mathematical models for data fitting (approximation). Information criteria provide an attractive basis for model selection [1,3,4-11]. However, little is understood about their relative performance in model selection.
This research has several specific objectives: (1) to generate artificial experimental data from known test models; (2) to fit the data with various models of increasing complexity; and (3) to verify whether the class model used to generate the data can be correctly identified by the two commonly used criteria, Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), and to assess and compare their performance.

Generation of experimental data
We use the GraphPad Prism software (http://www.graphpad.com/scientific-software/prism) both to generate the artificial experimental data and to perform the curve fitting. GraphPad Prism combines nonlinear regression, basic biostatistics, and scientific graphing. To generate the artificial experimental data we use the class model of third-order polynomials. The individual member of this class is

y(x) = a + bx + cx^2 + dx^3 + ε,  (1)

where ε is a random error with Gaussian distribution and standard deviation SD. The graph of the third-order polynomial 44 + 99x − 59x^2 + 8x^3 (a = 44, b = 99, c = −59, d = 8) is shown in Figure 1. For our computational experiments we generated samples of different sizes: a small sample (15 points), a middle sample (31 points), and a large sample (101 points), following the classification in [12].
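The generation step can be sketched in Python as follows. This is a minimal illustration, not the GraphPad Prism procedure itself; the noise level SD = 5 and the random seed are assumptions, since the paper does not state the SD used, and np.linspace only approximates the paper's exact sampling grid.

```python
import numpy as np

# Generate artificial data from the true third-order polynomial
# y = 44 + 99x - 59x^2 + 8x^3 plus Gaussian noise.
# ASSUMPTION: SD = 5 and seed = 0 are illustrative choices only.
rng = np.random.default_rng(seed=0)

def generate_sample(n_points, sd=5.0):
    x = np.linspace(0.0, 3.0, n_points)        # interval [0, 3]
    y_true = 44 + 99*x - 59*x**2 + 8*x**3      # true class model
    y = y_true + rng.normal(0.0, sd, size=n_points)
    return x, y

x_small, y_small = generate_sample(15)    # small sample
x_mid, y_mid = generate_sample(31)        # middle sample
x_large, y_large = generate_sample(101)   # large sample
```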

Fitting experimental data
In this research we use different class models (polynomials from first to sixth order) to fit the artificial experimental data. To find the individual "optimal" models P*(M_j) in the classes M_j, j = 1, …, 6, we use least-squares fitting in GraphPad Prism 6.0. The least-squares fitting criterion is defined as follows:

Φ(a) = Σ_{i=1}^{n} (y_i − P(x_i; a))^2.  (2)

The problem is to find a* = (a_1*, …, a_k*) that minimizes Φ(a).
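The fitting step can be sketched with NumPy's least-squares polynomial fit, which minimizes the residual sum of squares in formula (2). Again the data set below is illustrative (SD = 5 is an assumption); the paper's actual fits were done in GraphPad Prism.

```python
import numpy as np

# Fit each polynomial class M_j (orders 1..6) to one generated data set
# and record the residual sum of squares (RSS) that formula (2) minimizes.
rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 3.0, 31)
y = 44 + 99*x - 59*x**2 + 8*x**3 + rng.normal(0.0, 5.0, size=31)

fits = {}
for order in range(1, 7):
    coeffs = np.polyfit(x, y, deg=order)   # least-squares estimate a*
    rss = float(np.sum((y - np.polyval(coeffs, x))**2))
    fits[order] = (coeffs, rss)
```

Because the six classes are nested, the RSS can only decrease (or stay equal) as the order grows; the information criteria trade that decrease off against the number of parameters.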

Akaike's information criterion
One of the most commonly used criteria for model selection is AIC. The idea of AIC is to select the model that minimizes the negative log-likelihood, penalized by the number of parameters:

AIC = n·ln(RSS/n) + 2k,  (3)

where n is the number of data points; k is the number of parameters fitted by the regression plus one (since the regression "estimates" the error variance as well as the values of the parameters); and RSS, the residual sum of squares, is the sum of the squares of the vertical deviations from each data point to the curve of the "optimal" fitted model.
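A minimal sketch of AIC in its common least-squares form, AIC = n·ln(RSS/n) + 2k, with k counting the fitted polynomial coefficients plus one for the estimated error variance, as described above:

```python
import math

# AIC for a least-squares fit: n data points, residual sum of squares
# rss, and n_coeffs fitted polynomial coefficients.
def aic(n, rss, n_coeffs):
    k = n_coeffs + 1          # coefficients plus the error variance
    return n * math.log(rss / n) + 2 * k
```

For a cubic (four coefficients) fitted to 31 points with RSS = 31, aic(31, 31.0, 4) evaluates to 10.0, since the log term vanishes and only the penalty 2k = 10 remains.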

Bayesian information criterion
The other most commonly used criterion is BIC:

BIC = n·ln(RSS/n) + k·ln(n),  (4)

with the same meaning of RSS, n, and k as above. The AIC and BIC criteria differ only in the coefficient that multiplies the number of parameters; in other words, they differ in how strongly they penalize large models.
In this situation, the model that minimizes BIC has the highest posterior probability. BIC penalizes models more than AIC for an increasing number of parameters, and its penalty, unlike that of AIC, depends directly on the sample size. In general, models chosen by BIC will be more parsimonious than those chosen by AIC.
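A minimal sketch of BIC in its common least-squares form, BIC = n·ln(RSS/n) + k·ln(n), with k defined as for AIC above:

```python
import math

# BIC for a least-squares fit, with the same n, rss, and n_coeffs
# arguments as the AIC sketch.
def bic(n, rss, n_coeffs):
    k = n_coeffs + 1          # coefficients plus the error variance
    return n * math.log(rss / n) + k * math.log(n)
```

Since ln(n) > 2 whenever n > e^2 ≈ 7.4, each extra parameter costs more under BIC than under AIC for all but the tiniest samples, which is why BIC tends to choose more parsimonious models.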

Program for calculating the AIC and BIC criteria
To calculate the values of the AIC and BIC criteria according to formulas (3) and (4), we use the program "Comparing Models" developed by us in our previous research (see Figure 2) [8].
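An end-to-end sketch of this workflow (illustrative only, not the actual "Comparing Models" program): generate one data set from the known third-order polynomial, fit polynomials of order 1 to 6, and pick the order minimizing AIC and BIC. The noise SD = 5 and seed are assumptions.

```python
import math
import numpy as np

def select_order(x, y, max_order=6):
    """Return the polynomial orders chosen by AIC and BIC."""
    n = len(x)
    best = {"AIC": (0, math.inf), "BIC": (0, math.inf)}
    for order in range(1, max_order + 1):
        coeffs = np.polyfit(x, y, deg=order)
        rss = float(np.sum((y - np.polyval(coeffs, x))**2))
        k = order + 2                 # order+1 coefficients, plus one
        aic = n * math.log(rss / n) + 2 * k
        bic = n * math.log(rss / n) + k * math.log(n)
        if aic < best["AIC"][1]:
            best["AIC"] = (order, aic)
        if bic < best["BIC"][1]:
            best["BIC"] = (order, bic)
    return best["AIC"][0], best["BIC"][0]

rng = np.random.default_rng(seed=1)
x = np.linspace(0.0, 3.0, 101)
y = 44 + 99*x - 59*x**2 + 8*x**3 + rng.normal(0.0, 5.0, size=101)
aic_order, bic_order = select_order(x, y)
```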

Case 1: generate 15 points
For our first experiment we generate 15 points (small sample) in the interval from 0 to 3 with step 0.2. In this case only the AIC criterion correctly identified the third-order polynomial (the true class model) as the optimal model, while the BIC criterion chose the fifth-order polynomial (a false class model) as optimal (see Table 1). The simulated data (15 points) and the curves of the fitting models (six fitted polynomial curves of increasing order, from 1 (straight line) to 6) are shown in Figure 3.

Case 2: generate 31 points
For the second experiment we use 31 points (middle sample), generated in the interval from 0 to 3 with step 0.1. Here both the AIC and BIC criteria correctly identified the third-order polynomial (the true class model) as the optimal model (see Table 2). The simulated data and the curves of the fitting models are shown in Figure 4.

Case 3: generate 101 points
In the last case we use 101 points (large sample), generated in the interval from 0 to 3 with step 0.03. The obtained results show that in this case only the BIC criterion correctly identified the third-order polynomial (the true class model) as the optimal model, while the AIC criterion chose the fifth-order polynomial (a false class model) as optimal (see Table 3). The simulated data and the curves of the fitting models are shown in Figure 5. In Figure 6 we compare the effectiveness of AIC and BIC in selecting the optimal model (in all three cases) from the set of six polynomial classes used to fit the data.
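Each of the three cases above rests on a single simulated data set. A hypothetical Monte Carlo extension (everything below, including SD = 5 and 200 replications, is an assumption, not taken from the paper) would estimate how often each criterion recovers the true order at each sample size:

```python
import math
import numpy as np

rng = np.random.default_rng(seed=2)

def hit_rates(n_points, n_reps=200, sd=5.0):
    """Fraction of replications in which AIC/BIC pick the true order 3."""
    hits = {"AIC": 0, "BIC": 0}
    x = np.linspace(0.0, 3.0, n_points)
    for _ in range(n_reps):
        y = 44 + 99*x - 59*x**2 + 8*x**3 + rng.normal(0.0, sd, size=n_points)
        scores = {"AIC": {}, "BIC": {}}
        for order in range(1, 7):
            coeffs = np.polyfit(x, y, deg=order)
            rss = float(np.sum((y - np.polyval(coeffs, x))**2))
            k = order + 2
            logterm = n_points * math.log(rss / n_points)
            scores["AIC"][order] = logterm + 2 * k
            scores["BIC"][order] = logterm + k * math.log(n_points)
        for crit in ("AIC", "BIC"):
            if min(scores[crit], key=scores[crit].get) == 3:
                hits[crit] += 1
    return {crit: hits[crit] / n_reps for crit in hits}

rates = {n: hit_rates(n) for n in (15, 31, 101)}
```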

Discussion
The results of the computational experiments suggest that AIC performs relatively well for small samples but is inconsistent and does not improve with increasing sample size. The BIC criterion performs relatively poorly for small samples but is consistent and improves as the sample size increases. This agrees with previous studies [4,13], which demonstrated that BIC is consistent (that is, it tends to choose the true model with probability approaching 1) in large samples. In our experiments BIC also outperforms AIC in identifying the true class model when the sample is large (101 data points). Overall, the current results suggest that AIC should generally be preferred for smaller samples, while BIC should be preferred for larger samples.

Fig. 1. The individual member of the third-order polynomial.

Fig. 2. Example calculation of AIC: dialogue box of the program "Comparing Models" for calculating AIC.

Fig. 4. Simulated data (31 points) and curves of the fitting models (six fitted polynomial curves of increasing order, from 1 (straight line) to 6).

Fig. 5. Simulated data (101 points) and curves of the fitting models (six fitted polynomial curves of increasing order, from 1 (straight line) to 6).

Fig. 6. Comparison of the effectiveness of AIC and BIC: panel (a) shows that AIC chooses the true class model, panel (b) shows that both AIC and BIC choose the true class model, and panel (c) shows that BIC chooses the true class model.