Using the notion of the «Pareto set» to develop forecasting models based on the modified clonal selection algorithm

An algorithm is proposed that performs multiobjective optimization within the modified clonal selection algorithm. It uses the notion of the «Pareto set» to form the parental population of antibodies when forecasting models are developed on the basis of strictly binary trees. Two quality indicators of the forecasting model are used as objective functions: the affinity indicator, based on the average forecasting error rate, and the tendencies discrepancy indicator. Results of experimental studies that confirm the efficiency of the proposed algorithm are given.


Introduction
A key stage in solving the time series (TS) forecasting problem with artificial intelligence technologies is the choice of the best forecasting model. In particular, in the approach based on strictly binary trees (SBT) and the modified clonal selection algorithm (MCSA) [1,2], a forecasting model is represented as an antibody. The antibody is a sequence of randomly selected characters. This sequence is transformed into an analytical dependence, that is, a function, which is then used to obtain the predicted values of the TS.
The best forecasting model is sought through iterative calculations. The best models are determined at each step of this process, and they become the parents of the next generation of models at the next step [1-8]. Obviously, correct selection of antibodies is the key to effective use of the MCSA and to its convergence.
The traditional approach to choosing short-term forecasting models of TS is to estimate model quality by the average forecasting error rate calculated on the training data set. This error rate should be minimized [1-8].
However, the average forecasting error rate alone is not always sufficient to determine the best forecasting model. It is often necessary to consider additional quality indicators, such as agreement with the seasonal tendencies of the TS, agreement with the trend of the TS, absence of outliers, complexity of the forecasting model, etc. [3].
Usually it is not possible to settle on a single quality indicator. Therefore, developing approaches to building forecasting models that reach an extremum of several quality indicators at once is a highly relevant problem. In particular, it is expedient to use, along with the average forecasting error rate, an additional quality indicator that estimates the general tendency of change in the known elements of the TS (for example, the tendencies discrepancy indicator) [3]. This makes it possible to increase the efficiency of SBT-based forecasting models in medium-term forecasting.

Theoretical part
The average forecasting error rate AFER [1,2], which is also called the affinity indicator Aff (in the context of the MCSA), is used as one of the quality indicators of the forecasting models and can be calculated as:

$$Aff = AFER = \frac{1}{n}\sum_{j=1}^{n}\left|\frac{f^{j}-d^{j}}{d^{j}}\right| \cdot 100\%, \tag{1}$$

where $d^{j}$ and $f^{j}$ are respectively the actual (fact) and forecasted values for the $j$-th element of the TS (for the $j$-th time point); $n$ is the number of TS elements (number of time points).
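A minimal Python sketch of the affinity indicator is given below. It assumes the common form of the AFER, namely the mean of |f − d| / |d| over all n known elements, expressed in percent, and it assumes that no actual value is zero. The function name `afer` and the sample values are illustrative only and are not taken from the experiments.

```python
from typing import Sequence


def afer(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Average forecasting error rate in percent (assumed form: mean of |f - d| / |d|)."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    # Relative error of every known element; a zero actual value would
    # raise ZeroDivisionError, which this sketch does not handle.
    total = sum(abs((f - d) / d) for d, f in zip(actual, forecast))
    return total / len(actual) * 100.0


# Illustrative values only.
actual = [100.0, 110.0, 120.0, 125.0]
forecast = [98.0, 112.0, 117.0, 130.0]
print(f"Aff = {afer(actual, forecast):.3f} %")
```

A smaller value means that the forecasted values lie closer to the actual ones, which is why this indicator is minimized.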
The rate of discrepancy between the tendencies of the TSs (the tendencies discrepancy indicator Tendency) is used as the other quality indicator of the forecasting models and can be calculated as:

$$Tendency = \frac{h}{n-r-1}, \tag{2}$$

where $h$ is the number of negative products $(f^{j-1}-f^{j})\cdot(d^{j-1}-d^{j})$; $d^{j}$ and $f^{j}$ are respectively the actual (fact) and forecasted values for the $j$-th element of the TS (for the $j$-th time point); $n$ is the number of TS elements (number of time points); $r$ is the model order; $n-r-1$ is the total number of products. This indicator is used to adapt the SBT- and MCSA-based forecasting models to medium-term forecasting.
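The tendencies discrepancy indicator can be sketched in Python as follows. The sketch assumes that a «negative multiplication» is a negative product of the successive differences of the forecasted and actual values, (f[j-1] − f[j])·(d[j-1] − d[j]), and that the forecast list is aligned with the actual series, so that the first r positions (for which a model of order r has no forecast) are skipped. The function name `tendency` and the sample data are hypothetical.

```python
from typing import Sequence


def tendency(actual: Sequence[float], forecast: Sequence[float], r: int = 0) -> float:
    """Share of steps where forecast and actual series move in opposite directions."""
    n = len(actual)
    total = n - r - 1  # total number of products, as stated in the text
    if len(forecast) != n or total <= 0:
        raise ValueError("need equal-length series with more than r + 1 elements")
    # 0-based j = r+1 .. n-1 corresponds to 1-based j = r+2 .. n.
    h = sum(
        1
        for j in range(r + 1, n)
        if (forecast[j - 1] - forecast[j]) * (actual[j - 1] - actual[j]) < 0
    )
    return h / total


# Illustrative values only.
actual = [10.0, 12.0, 11.0, 13.0, 15.0]
forecast = [10.5, 11.5, 12.0, 12.5, 14.0]
print(tendency(actual, forecast, r=1))
```

In this example the actual series falls once while the forecast keeps rising, so one of the three products is negative.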
Both indicators (Aff and Tendency) measure the similarity of the predicted values of the analyzed TS to the real ones, but they rely on different evaluation principles. The affinity indicator Aff is used in the MCSA to define the «adaptability» (quality) of the antibody Ab, while the tendencies discrepancy indicator Tendency estimates the quality of the antibody Ab with respect to how well it follows the trend of the analyzed TS. Both indicators must be minimized.
The affinity indicator Aff estimates how close the predicted values are to the actual values of the known elements of the analyzed TS. The tendencies discrepancy indicator Tendency estimates how closely the directions of change of the predicted values match those of the actual values of the known elements of the analyzed TS. This indicator helps to analyze tendencies in the TS and the presence of seasonal fluctuations.
Thus, both indicators (the affinity indicator (1) and the tendencies discrepancy indicator (2)) must be used simultaneously when assessing the quality of SBT- and MCSA-based forecasting models for medium-term forecasting.
Various well-proven approaches can be applied to account for two quality indicators simultaneously when developing forecasting models [9]. Among them, the approach based on multiobjective optimization algorithms, including evolutionary algorithms, deserves particular attention.
Such multiobjective optimization algorithms make it possible to account for several objective functions (criteria, quality indicators) in various applied tasks.
Currently, genetic algorithms (GA) are the most widely applied among the evolutionary multiobjective optimization algorithms. These algorithms have the following advantages:
• no restrictions on the nature of the optimized objective functions;
• resistance to local optimum traps;
• high speed of convergence to a solution;
• applicability to a wide class of tasks (including large-scale optimization problems);
• simplicity of implementation;
• applicability to tasks with a changing environment.
The VEGA [10] was proposed by D. Schaffer in 1984. It belongs to the group of selection algorithms based on switching objective functions. The algorithm rests on the idea that using parents with the best values of the individual objective functions (criteria, quality indicators), i.e. with the best sum (the value of the «supercriterion»), will eventually yield a solution that combines the best values of all the objective functions. This is the only one of the algorithms considered here that does not use the notion of the «Pareto set».
The FFGA [11] was proposed by Fonseca C.M. and Fleming P.J. in 1993. It uses the simplest way of forming the Pareto set for selecting solutions.
All solutions are ranked according to how each solution performs on all objective functions (criteria, quality indicators) relative to the other solutions. The lower the rank of a solution, the higher the probability that it is chosen for the next selection step. The fundamental feature of the FFGA is that all objective functions (criteria, quality indicators) are considered together (as a whole) for each solution: the ranks of the solutions in the population are analyzed rather than the values of the objective functions.
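Rank-based selection of this kind can be illustrated with a short Python sketch. It assumes the ranking rule usually attributed to Fonseca and Fleming, in which the rank of a solution is one plus the number of solutions that dominate it, and it assumes that all objectives are minimized. The names `dominates` and `ffga_ranks` and the (Aff, Tendency) pairs are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def ffga_ranks(objectives: Sequence[Sequence[float]]) -> List[int]:
    """Rank = 1 + number of solutions dominating the candidate (assumed rule)."""
    return [
        1 + sum(dominates(other, cand) for other in objectives)
        for cand in objectives
    ]


# Illustrative (Aff, Tendency) pairs, both minimized.
population = [(2.0, 0.30), (3.0, 0.10), (2.5, 0.35), (4.0, 0.40)]
print(ffga_ranks(population))  # → [1, 1, 2, 4]
```

The first two solutions are mutually nondominated and share rank 1; the last one is dominated by all the others and receives the highest rank.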
The principal difference and advantage of the NPGA [12], proposed by Horn J., Nafpliotis N. and Goldberg D.E. in 1994, over the other genetic multiobjective optimization algorithms is its mechanism for maintaining the diversity of the population of solutions. The algorithm combines the principles of tournament selection with the notion of Pareto dominance. The NPGA addresses convergence to a local minimum, which is one of the main problems of the GA. There are various modifications of the NPGA, which differ in how the parent population is formed.
The NSGA [13] was proposed by Srinivas N. and Deb K. in 1994. It takes a different approach to maintaining diversity in the parental population. Nondominated sorting is performed for each generation, and in addition the so-called phenotype distance is calculated when solutions are chosen for selection. If the distance from the considered solution to any solution already in the next generation is less than a given threshold value, the considered solution is not added to the next generation. However, this algorithm has some shortcomings. First, ranking the solutions becomes superfluous if the phenotype distance is used. Second, a threshold value that sets the admissible phenotype distance between solutions has to be determined.
The NSGA-II [14] was proposed by Deb K., Agrawal S., Pratap A. and Meyarivan T. in 2002. It corrects the shortcomings of the NSGA. First, the improved sorting algorithm reduces the computational complexity. Second, solutions from both the modified set and the initial set are used to form the new population. The so-called «crowding distance» is calculated for each solution. This distance estimates how close a solution is to its neighboring solutions. A larger mean «crowding distance» corresponds to greater diversity of the solutions in the population.
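The «crowding distance» can be sketched in Python as follows. The sketch follows the usual NSGA-II description: within one nondominated front, boundary solutions of each objective receive an infinite distance, and every other solution accumulates the normalized gap between its two neighbors. The function name and the sample front are illustrative assumptions.

```python
import math
from typing import List, Sequence


def crowding_distance(front: Sequence[Sequence[float]]) -> List[float]:
    """Crowding distance of each solution in a single nondominated front."""
    n = len(front)
    if n == 0:
        return []
    dist = [0.0] * n
    for k in range(len(front[0])):  # loop over objectives
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        # Boundary solutions are always preferred.
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue  # all values equal: this objective adds nothing
        for pos in range(1, n - 1):
            gap = front[order[pos + 1]][k] - front[order[pos - 1]][k]
            dist[order[pos]] += gap / (hi - lo)
    return dist


# Illustrative front of (Aff, Tendency) pairs.
print(crowding_distance([(1.0, 0.5), (2.0, 0.3), (4.0, 0.1)]))
```

Solutions with a larger distance lie in sparser regions of the front and are preferred when solutions of equal rank compete.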
The SPEA [15] was proposed by Zitzler E. and Thiele L. in 1998. It uses the following approach. The solutions that are not dominated by other solutions in the population are stored in a special external array. This implements an elitism mechanism that prevents good intermediate solutions from being lost.
The number of solutions selected in this way can be large. To reduce the number of solutions stored in the external array, a clustering procedure is carried out. The SPEA effectively deals with premature convergence, a typical problem that often arises when the principles of elitism are applied. For this purpose a special niching mechanism is used, in which overall fitness is determined not from the distance between solutions but from the principles of Pareto dominance.
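The maintenance of the external array can be illustrated by the simplified Python sketch below. It covers only the archive update (merging the current population into the array and discarding dominated entries); the clustering step and the fitness assignment of the SPEA are deliberately omitted. The names `dominates` and `update_archive` and the sample data are hypothetical, and both objectives are assumed to be minimized.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (Aff, Tendency), both minimized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[Point], population: List[Point]) -> List[Point]:
    """Merge the population into the archive and keep only nondominated points."""
    candidates = archive + [p for p in population if p not in archive]
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Illustrative values only.
archive = [(2.0, 0.30), (3.0, 0.10)]
population = [(1.5, 0.35), (2.5, 0.05), (2.0, 0.25)]
print(update_archive(archive, population))
```

Here both archived solutions are displaced, because each is dominated by a newcomer from the current population.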
In the context of developing SBT- and MCSA-based forecasting models, a solution is understood to be a forecasting model.
An analysis of the literature on multiobjective optimization with genetic algorithms shows that nowadays the SPEA, NSGA and NSGA-II are the most widely applied to practical tasks. For example, the NSGA and NSGA-II are successfully applied to scheduling problems, timetabling problems and the travelling salesman problem [16]. The NSGA-II is significantly better than the NSGA because it reduces the computational cost.
For this reason, it was decided to adapt the ideas of the NSGA-II to the MCSA that is applied to select SBT-based forecasting models.
To confirm the prospects of the proposed transformation of the MCSA, the following multiobjective optimization algorithm is proposed.
Step 1. Generate the initial population of antibodies. Each antibody is coded on the basis of the SBT and represents a forecasting model.
Step 2. Perform nondominated sorting of the population of antibodies using the two quality indicators of the forecasting model (the affinity indicator (1) and the tendencies discrepancy indicator (2)).
Step 3. Choose the parent antibodies for the next generation of clone antibodies based on the values of the rank and the «crowding distance».
Step 4. Go to step 5 if the desired values of the quality indicators are reached or the specified number of generations of the MCSA is exhausted. Otherwise, go to step 2.
Step 5. Accept the antibody with the minimum value of the affinity indicator (1) in the last population as the optimal solution. Use the forecasting model corresponding to this antibody for forecasting.
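Steps 2, 3 and 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: antibodies are represented only by their (Aff, Tendency) pairs, the sorting is a simple quadratic variant of nondominated sorting, and the cloning and mutation operators of the MCSA are left out. All function names and sample values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Objs = Sequence[Tuple[float, float]]  # (Aff, Tendency) per antibody, both minimized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated_sort(objs: Objs) -> List[List[int]]:
    """Step 2: split antibody indices into fronts; front 0 is the Pareto set."""
    remaining = set(range(len(objs)))
    fronts: List[List[int]] = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(objs: Objs, front: List[int]) -> Dict[int, float]:
    """Crowding distance of every antibody index in one front."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[front[0]])):
        order = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[order[0]][k], objs[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue
        for prev, cur, nxt in zip(order, order[1:], order[2:]):
            dist[cur] += (objs[nxt][k] - objs[prev][k]) / (hi - lo)
    return dist


def select_parents(objs: Objs, n_parents: int) -> List[int]:
    """Step 3: fill the parent set front by front, breaking ties by crowding distance."""
    parents: List[int] = []
    for front in nondominated_sort(objs):
        if len(parents) + len(front) <= n_parents:
            parents.extend(front)
        else:
            dist = crowding_distance(objs, front)
            spread_first = sorted(front, key=lambda i: dist[i], reverse=True)
            parents.extend(spread_first[: n_parents - len(parents)])
        if len(parents) >= n_parents:
            break
    return parents


# Illustrative population of six antibodies.
objs = [(2.0, 0.30), (3.0, 0.10), (2.5, 0.35), (4.0, 0.40), (1.5, 0.45), (2.2, 0.20)]
print(nondominated_sort(objs))      # fronts of antibody indices
print(select_parents(objs, 3))      # parents for the next generation
# Step 5: the antibody with the minimum Aff in the last population.
print(min(range(len(objs)), key=lambda i: objs[i][0]))
```

In this example the first front contains four antibodies, so the three parents are chosen from it by «crowding distance», which favors the two boundary antibodies and the less crowded interior one.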
Applying the proposed algorithm yields the Pareto set of nondominated forecasting models. These models provide the best combinations of values of the quality indicators used for the analyzed TS.
The resulting forecasting models can be applied to medium-term forecasting, which expands the application scope of SBT- and MCSA-based forecasting models.
Forecasting for each TS was performed with one (Aff) and two (Aff and Tendency) quality indicators of the forecasting model. The forecasting errors are presented in Table 1. These errors are average values calculated over 1000 program runs with 100 generations of the population each. The population size was set to 20 antibodies.
The values of the forward prediction errors (for 1-5 steps forward) indicate that the proposed approach to selecting forecasting models is effective both for short-term forecasting (1-3 steps forward) and for medium-term forecasting (4 and 5 steps forward).
It should be noted that the additional quality indicator of the forecasting model (the indicator Tendency) steered the «search» for the forecasting model in the necessary (correct) direction. As a result, for all the time series considered and for a small number of generations of the MCSA, smaller values of the affinity indicator Aff (training error (1)) and, in most cases, smaller forecasting errors for 1-5 steps forward were obtained (Table 1).
Forecasting results for the TS «The Brent crude oil price» are presented in Figures 1 and 2. These results were obtained with forecasting models based on one and two quality indicators, respectively. The forecasting model based on two quality indicators (Aff and Tendency) survives better over the extended forecasting horizon than the forecasting model based on one quality indicator (Aff). Moreover, this model is also more effective for short-term forecasting.
These calculations lead to the following conclusion: using the second quality indicator of the forecasting model provides a way to increase the lifetime of the forecasting model. At the same time, the efficiency of the forecasting model for short-term forecasts is preserved, which in general confirms the success of the proposed approach.

Conclusions
Initially, the MCSA was developed for short-term forecasting problems. However, the research carried out showed that the MCSA can also be applied to medium-term forecasting problems.
Evidently, applying the principles of Pareto dominance is an effective way to account for several quality indicators when developing forecasting models that represent analytical dependences based on the SBT.
It should be noted that the computational complexity of the MCSA increases only slightly when Pareto-optimal solutions are used, while the application scope of the MCSA can be expanded considerably.

DOI: 10.1051/ © Owned by the authors, published by EDP Sciences

Figure 1. Forecasting of «The Brent crude oil price» (with one quality indicator, Aff).

Figure 2. Forecasting of «The Brent crude oil price» (with two quality indicators, Aff and Tendency).

The plots show that the second model reproduces the behavior of the initial TS on the training data set better than the first one. Moreover, this property is also preserved on the test set (when forecasting 5 steps forward).

Table 1. The forecasting errors.