Development of an SVM classifier ensemble for increasing classification accuracy

The problem of improving classification accuracy using an SVM classifier ensemble is considered. This paper defines the rules for selecting the individual SVM classifiers subsequently used to create an ensemble, and the strategies for integrating the ensemble members.


Introduction
Currently, the SVM algorithm (Support Vector Machines, SVM), which performs training on precedents («supervised learning») and belongs to the group of boundary classification algorithms and methods [1], is successfully used for a wide range of classification problems in various applications. It allows developing classifiers that can be successfully used in many application areas [2].
When developing an SVM classifier, it is first necessary to perform multiple training and testing runs on different randomly generated training and test sets, and then to choose the best SVM classifier, which provides the highest possible classification quality. The classification quality can be assessed using various indicators [2].
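The repeated training-and-selection procedure described above can be sketched as follows. This is a minimal illustration assuming scikit-learn; the data set, the number of splits and the accuracy indicator are illustrative choices, not taken from the paper.

```python
# Sketch: train SVM classifiers on several randomly generated
# training/test splits and keep the one with the best test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

best_acc, best_clf = 0.0, None
for seed in range(10):  # ten randomly generated training/test sets
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)
    clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    if acc > best_acc:
        best_acc, best_clf = acc, clf
```

Other quality indicators (precision, recall, AUC) could replace accuracy in the selection loop without changing its structure.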
To train the SVM classifier, its internal parameters must be defined: the type of the kernel function, the values of the kernel parameters, and the value of the regularization parameter. These internal parameters participate in the construction of the classifying function f(x), which assigns an object x to a concrete class from the set {-1; +1} [2]. Therefore, the problem of selecting the parameters of the SVM classifier is crucial for obtaining accurate classification results.
In recent years, much attention has been paid to increasing the accuracy of models based on machine learning algorithms. In particular, the possibilities of combining several classifiers and creating classifier ensembles to improve the quality of applied solutions are being investigated [3-5]. Learning an ensemble of classifiers is the procedure of training a finite set of base (individual) classifiers, whose individual decisions are then combined to form the resulting classification decisions of the aggregated classifier. There are different approaches to choosing the rules for combining the individual classifiers in the ensemble and the strategies for forming the resulting classification decision [2].
The purpose of this work is to improve the accuracy of classification decisions using an SVM classifier ensemble based on various strategies of integrating the individual classifiers into the ensemble. Herewith it is assumed that every object x_i is mapped to a q-dimensional vector of numerical characteristic values (x_i1, x_i2, …, x_iq).

Theoretical part
Here x_il is the numeric value of the l-th characteristic of the i-th object x_i from the set X.
As the kernel function K(x_i, x), which allows separating the objects of different classes, one of the following functions is typically used [3]:
• linear function: K(x_i, x) = ⟨x_i, x⟩, where ⟨x_i, x⟩ is the scalar product of the vectors x_i and x;
• polynomial function: K(x_i, x) = (⟨x_i, x⟩ + 1)^d, where d is the degree of the polynomial.
To develop the SVM classifier it is necessary: 1) to choose the type of the kernel function K(x_i, x); 2) to set the values of the kernel parameters and the value of the regularization parameter C, which allows finding a compromise between maximizing the gap separating the classes and minimizing the total error; 3) to implement multiple learning and testing on different randomly generated training and test sets, with subsequent determination of the best SVM classifier. The test set comprises from 1/10 to 1/3 of the experimental data set; it is not involved in setting the parameters of the classifier and is used to verify its accuracy. If the quality of learning and testing is acceptable, the SVM classifier can be used to classify new objects.
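Steps 1)-3) can be sketched as a simple search over kernel types and parameter values. The candidate kernels and the grids for C below are illustrative assumptions (scikit-learn assumed); the held-out share of 1/3 matches the upper bound named above.

```python
# Sketch of steps 1)-3): try several kernel types and regularization
# values, holding out 1/3 of the data as the test set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3,
                                          random_state=1)

best = (0.0, None)
for kernel in ("linear", "poly", "rbf"):   # step 1: kernel function type
    for C in (0.1, 1.0, 10.0):             # step 2: regularization parameter
        clf = SVC(kernel=kernel, C=C, degree=3).fit(X_tr, y_tr)
        acc = clf.score(X_te, y_te)        # step 3: accuracy on the test set
        if acc > best[0]:
            best = (acc, clf)
```

The best (accuracy, classifier) pair can then be accepted if the test-set quality is acceptable, as step 3 requires.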
As a result of training, the classification function is determined in the following form [1-3]:

f(x) = Σ_{i=1}^{N} α_i y_i K(x_i, x) + b.   (1)

The classification decision, assigning the object x to the class -1 or +1, is made in accordance with the rule [1-3]:

y = sign(f(x)).   (2)

In (1) and (2): N is the number of support objects; K(x_i, x) is the kernel function; b is the parameter determining the shortest distance from the origin to the hyperplane that separates the classes; α_i is the Lagrange multiplier, α_i ≥ 0; y_i is the classification decision (-1 or +1) for the object x_i [3].
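Formula (1) can be checked numerically against a trained classifier. The sketch below assumes scikit-learn, whose SVC stores dual_coef_ = α_i·y_i for the support vectors and intercept_ = b, so f(x) can be recomputed by hand for the linear kernel.

```python
# Recompute f(x) = sum_i alpha_i * y_i * K(x_i, x) + b by hand and
# compare it with the library's decision_function value.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=4, random_state=2)
clf = SVC(kernel="linear").fit(X, y)

x = X[0]
K = clf.support_vectors_ @ x                # linear kernel: scalar products K(x_i, x)
f_manual = (clf.dual_coef_ @ K + clf.intercept_).item()   # formula (1)
f_sklearn = clf.decision_function([x]).item()             # library value
print(abs(f_manual - f_sklearn) < 1e-6)  # prints True
```

The classification decision (2) is then simply the sign of this value.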
The main problem in training the SVM classifier is the lack of recommendations for choosing the regularization parameter C, the kernel function K(x_i, x), and the parameters of the kernel function for which high classification accuracy is achieved. This problem can be solved using various optimization algorithms, in particular the PSO algorithm [6, 7]. After training, each classifier generates its own (individual) classification decisions, which may coincide with or differ from the actual classification results. Accordingly, different individual SVM classifiers have different classification accuracy. The quality of the obtained classification decisions can be improved on the basis of ensembles of SVM classifiers [2-5]. In this case, a finite set of individually trained classifiers must be learned; the classification decisions of these classifiers are then combined, and the resulting decision is produced by the aggregated classifier. The majority vote method and the vote method based on the degree of reliability can be used as the rules (strategies) for determining the aggregated decisions.
The majority vote method is one of the most common and frequently used methods for combining decisions in an ensemble of classifiers. However, this method does not fully use the information about the reliability of each individual SVM classifier. For example, suppose that the SVM classifier ensemble aggregates the results of five individual SVM classifiers, where the values of the function f(x) for the object x obtained from three individual SVM classifiers are negative (class -1) but very close to the neutral position, while the values of f(x) from the other two SVM classifiers are strongly positive (class +1), i.e. very far from the neutral position. Then the aggregated decision of the ensemble on the basis of «one classifier - one vote» is the following: the object x belongs to the negative class (majority vote), although it is obvious that the better and more appropriate choice for the object x is the positive class. Despite the good potential of the majority vote method for combining a group of decisions, it is recommended to use other methods to increase the classification accuracy.
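The five-classifier example above can be made concrete with illustrative f_j(x) values (the numbers are assumed for illustration, not taken from the paper): three weakly negative decisions and two strongly positive ones.

```python
# Three weakly negative and two strongly positive reliabilities.
f_values = [-0.1, -0.2, -0.15, 2.5, 3.0]

# Majority vote: «one classifier - one vote», magnitudes ignored.
votes = [1 if f > 0 else -1 for f in f_values]
majority = 1 if sum(votes) > 0 else -1

# Reliability-weighted vote: the magnitudes of f(x) are taken into account.
reliability = 1 if sum(f_values) > 0 else -1

print(majority, reliability)  # prints: -1 1
```

The majority vote assigns class -1, while accounting for the degrees of reliability assigns class +1, exactly the disagreement described in the text.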
The vote method based on the degree of reliability uses the value of the function f(x) for the object x obtained by each individual SVM classifier. The greater the positive value of f(x) in (1) returned by the SVM classifier, the more confidently the object x is assigned to class +1; the smaller (more negative) the value of f(x), the more confidently the object x is assigned to class -1. The values «-1» and «+1» of f(x) indicate that the object x is situated on the boundary of the negative and positive classes, respectively.

When an ensemble of classifiers is used for solving classification problems, special attention should be paid to the methods of forming the set of individual classifiers that can later be used in the development of the final SVM classifier. It has been experimentally confirmed [2-5] that an ensemble of classifiers shows greater accuracy than any of its individual members if the individual classifiers are accurate and diverse. Therefore, when forming the set of individual SVM classifiers it is required: 1) to use various kernel functions; 2) to build classifiers in different ranges of the kernel parameters and the regularization parameter; 3) to use various training and test data sets. To select the appropriate ensemble members from the set of trained SVM classifiers, it is recommended to use the principle of maximum decorrelation: the correlation between the selected classifiers should be as small as possible. After training, each individual j-th classifier from the k trained classifiers corresponds to a certain error array E_j = (e_1j, e_2j, …, e_nj), where e_ij is the error of the j-th classifier on the i-th row of the experimental data set (e_ij = 0 if y_ij = ỹ_i and e_ij = 1 otherwise; i = 1, …, n; j = 1, …, k); y_ij is the classification decision (-1 or +1) of the j-th classifier on the i-th row of the experimental data set; ỹ_i is the real class label (-1 or +1) to which the i-th object belongs.
The SVM classifiers that make no errors on the experimental data set should be excluded from further consideration, and from the remaining SVM classifiers it is necessary to select an appropriate number of individual SVM classifiers with maximal diversity. To solve this problem, the decorrelation maximization algorithm can be used. This algorithm provides the diversity of the individual SVM classifiers used in the construction of the ensemble [2]: if the correlation between the selected classifiers is small, then the decorrelation is maximal.
Let there be an error matrix

E = (e_ij), i = 1, …, n; j = 1, …, k,   (3)

where e_ij is the error of the j-th classifier on the i-th row of the experimental data set. On the basis of the error matrix E (3), the following assessments can be calculated [2]:
• variance:

v_jj = (1/n) Σ_{i=1}^{n} (e_ij − ē_j)²,   (5)

• covariance:

v_tj = (1/n) Σ_{i=1}^{n} (e_it − ē_t)(e_ij − ē_j),   (6)

where ē_j = (1/n) Σ_{i=1}^{n} e_ij is the mean error of the j-th classifier. Then the elements r_tj of the correlation matrix R of size k × k are calculated as

r_tj = v_tj / √(v_tt · v_jj),   (7)

where r_tj is the correlation coefficient representing the degree of correlation between the t-th and j-th classifiers.
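The quantities (5)-(7) can be computed directly with NumPy, which normalizes exactly as above when bias=True. The 0/1 entries of the 5 × 3 error matrix below are assumed example data, not from the paper.

```python
# Covariance matrix V (variances (5) on the diagonal, covariances (6)
# off the diagonal) and correlation matrix R (7) from an error matrix E.
import numpy as np

E = np.array([[0, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # n = 5 rows, k = 3 classifiers

V = np.cov(E, rowvar=False, bias=True)   # 1/n normalization, as in (5), (6)
R = np.corrcoef(E, rowvar=False)         # correlation matrix (7), size k x k
```

Each column of E is one classifier's error array E_j; rowvar=False treats columns as variables, so V and R come out k × k.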
Using the correlation matrix R, it is possible to calculate for each individual j-th classifier the multiple correlation coefficient ρ_j, which characterizes the degree of correlation between the j-th classifier and all the other (k − 1) classifiers with numbers t (t = 1, …, k; t ≠ j) [8]:

ρ_j = √(1 − |R| / R_jj),   (8)

where |R| is the determinant of the correlation matrix R, and R_jj is the cofactor of the element r_jj of the correlation matrix R. The quantity ρ_j² is called the coefficient of determination.
It shows the proportion of the variation of the analyzed variable that is explained by the variation of the other variables. The coefficient of determination ρ_j² can take values from 0 to 1. The closer the coefficient is to 1, the stronger the relationship between the analyzed variables (in this case, between the individual classifiers) [8]. It is believed that a dependency exists if the coefficient of determination is not less than 0.5; if the coefficient of determination is greater than 0.8, a high dependence is assumed.
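Formula (8) can be evaluated directly from the correlation matrix; the helper below is an illustrative implementation (the function name is not from the paper). Note that the cofactor of a diagonal element r_jj equals the determinant of the corresponding minor, since the sign factor (−1)^(j+j) is +1.

```python
# Multiple correlation coefficient rho_j = sqrt(1 - |R| / R_jj), where
# R_jj is the cofactor of the diagonal element r_jj.
import numpy as np

def multiple_correlation(R, j):
    minor = np.delete(np.delete(R, j, axis=0), j, axis=1)
    cofactor = np.linalg.det(minor)   # sign is +1 for a diagonal element
    return float(np.sqrt(1.0 - np.linalg.det(R) / cofactor))

# Sanity check: for k = 2 classifiers the multiple correlation of
# classifier 0 with the rest reduces to |r_12|.
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
print(multiple_correlation(R, 0))  # approximately 0.5
```

Squaring the returned value gives the coefficient of determination ρ_j² used for the threshold test below.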
To select the individual SVM classifiers for integration into the ensemble, it is necessary to determine a threshold T. The j-th individual classifier must be removed from the list of classifiers if its coefficient of determination is not less than the threshold (ρ_j² ≥ T). If it is necessary to identify the most diverse classifiers, generating decisions with the most different error arrays on the experimental data set, thresholds T satisfying the condition T ≤ 0.7 should be selected. Herewith, additional considerations can also be taken into account to avoid the exclusion of an insufficient or excessive number of individual SVM classifiers.
The decorrelation maximization algorithm can be summarized in the following steps [2].
Step 1. To calculate the covariance matrix V and the correlation matrix R with formulas (5), (6) and (7), respectively.
Step 2. To calculate the multiple correlation coefficients ρ_j (j = 1, …, k) with (8) for all classifiers.
Step 3. To remove from the list the classifiers for which ρ_j² ≥ T.
Step 4. To repeat steps 1-3 iteratively for the remaining classifiers in the list until the condition ρ_j² < T is satisfied for all classifiers.
As a result, the list of classifiers used to form the ensemble will consist of m (m ≤ k) individual classifiers.
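Steps 1-4 can be sketched as a loop over the error matrix. Two assumptions are made here: error-free classifiers are dropped first (as the text requires), and one most-correlated classifier is removed per iteration, which is one robust reading of step 3; the random error matrix at the end is illustrative data.

```python
# Sketch of the decorrelation maximization loop (steps 1-4).
import numpy as np

def decorrelation_maximization(E, T=0.7):
    kept = [j for j in range(E.shape[1]) if E[:, j].std() > 0]  # drop error-free classifiers
    while len(kept) > 2:
        R = np.corrcoef(E[:, kept], rowvar=False)   # steps 1-2: R and rho_j
        det_R = np.linalg.det(R)
        # coefficient of determination rho_j**2 = 1 - |R| / R_jj
        rho2 = [1.0 - det_R / np.linalg.det(np.delete(np.delete(R, i, 0), i, 1))
                for i in range(len(kept))]
        worst = int(np.argmax(rho2))
        if rho2[worst] < T:       # step 4: condition holds for all classifiers
            break
        kept.pop(worst)           # step 3: remove the most correlated classifier
    return kept

rng = np.random.default_rng(0)
E = (rng.random((50, 6)) < 0.3).astype(float)  # assumed random error matrix
kept = decorrelation_maximization(E, T=0.7)
```

The returned indices form the list of m ≤ k classifiers that enter the ensemble.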
For the classifiers selected into the ensemble, it is necessary to carry out:
• the normalization of the degrees of reliability;
• the search for a strategy for integrating the members of the ensemble;
• the calculation of the aggregated decision of the ensemble.
The reliability value f_j(x), determined for the object x by the j-th classifier, falls into the interval (−∞, +∞). The main drawback of such values is that the individual classifiers with large absolute values often dominate the final decision of the ensemble. To overcome this drawback, normalization is carried out: the values of the degrees of reliability are transformed into the interval [0; 1]. In the case of binary classification, the normalization produces for the object x the values of the reliability of its membership in the positive class (labeled +1) and in the negative class (labeled -1) according to (9) and (10). The normalized reliabilities of the selected individual classifiers are then combined using one of five strategies (11)-(15) (e.g., the median strategy or the product strategy). The value obtained by the chosen strategy is an aggregated measure of the reliability of the SVM classifier ensemble, and it can be used to integrate the members of the ensemble.
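The normalization and the combination strategies can be sketched as follows. The logistic transform is an assumed stand-in for the paper's normalization formulas (9) and (10), and the rule names follow the standard maximum / minimum / average / median / product set of fixed combination rules; the source itself names only the median and product strategies.

```python
# Sketch: normalize reliabilities into [0, 1] and aggregate them with
# one of five fixed combination strategies.
import math
import statistics

def normalize(f):
    """Assumed normalization: map f_j(x) from (-inf, +inf) into [0, 1]."""
    return 1.0 / (1.0 + math.exp(-f))

def aggregate(f_values, strategy="median"):
    mu = [normalize(f) for f in f_values]   # positive-class reliabilities
    prod_pos = math.prod(mu)
    prod_neg = math.prod(1.0 - m for m in mu)
    value = {
        "max": max(mu),
        "min": min(mu),
        "mean": statistics.mean(mu),
        "median": statistics.median(mu),
        "product": prod_pos / (prod_pos + prod_neg),
    }[strategy]
    return (1 if value > 0.5 else -1), value

label, _ = aggregate([2.0, 1.5, 3.0], "mean")
print(label)  # prints 1: all three reliabilities favour the positive class
```

A normalized value above 0.5 corresponds to the positive class, so large-magnitude individual classifiers can no longer dominate the others arbitrarily.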
The learning algorithm of the ensemble of SVM classifiers can be summarized in the following steps.
Step 1. To divide the experimental data set into k training data sets: TR_1, …, TR_k.
Step 2. To learn k individual SVM classifiers (ensemble members) on the different training data sets TR_1, …, TR_k.
Step 3. To select m (m ≤ k) decorrelated SVM classifiers from the k classifiers using the decorrelation maximization algorithm.
Step 4. To determine the values of the m classification functions f_j(x) for each selected individual SVM classifier.
Step 5. To transform the values of the degrees of reliability for the positive and negative classes using (9) and (10).
Step 6. To determine the aggregated reliability value of the SVM classifier ensemble using (11)-(15).
This algorithm, applied to weak SVM classifiers, provides better classification quality than the accuracy of any single individual classifier used in the aggregation.
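Steps 1-6 can be put together in a compact end-to-end sketch (scikit-learn assumed). Two simplifications are made: step 3 is reduced to excluding error-free classifiers rather than running the full decorrelation maximization, and the mean of the normalized reliabilities stands in for the aggregation rules (11)-(15).

```python
# End-to-end sketch of the ensemble learning algorithm (steps 1-6).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=3)

# Steps 1-2: train k individual SVM classifiers on different training sets.
k, classifiers = 6, []
for seed in range(k):
    X_tr, _, y_tr, _ = train_test_split(X, y, train_size=0.7,
                                        random_state=seed)
    kernel = ("linear", "rbf", "poly")[seed % 3]
    classifiers.append(SVC(kernel=kernel, C=1.0).fit(X_tr, y_tr))

# Step 3 (simplified): exclude classifiers that make no errors at all.
errors = np.array([clf.predict(X) != y for clf in classifiers], dtype=float).T
kept = [j for j in range(k) if errors[:, j].std() > 0] or list(range(k))

# Steps 4-6: compute f_j(x), normalize into [0, 1], aggregate by the mean.
F = np.array([classifiers[j].decision_function(X) for j in kept])
mu = 1.0 / (1.0 + np.exp(-F))             # assumed logistic normalization
y_pred = (mu.mean(axis=0) > 0.5).astype(int)
ensemble_acc = float((y_pred == y).mean())
```

Swapping the mean for the median or product rule changes only the aggregation line, leaving steps 1-5 intact.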
The problem of choosing the threshold T is very important. The value of T for which all five classification rules (11)-(15) show a stable improvement of the classification quality must be chosen as the threshold value T* (T* ≤ 0.7). Thus, the use of each of the five rules leads to an improvement of the classification quality, resulting in a reduction of the number of erroneous decisions, when the smaller number of individual classifiers corresponding to the threshold value T* is applied. Herewith, such a stable improvement of the classification quality is not observed for the other examined values T′. It should be noted that the majority vote rule may be applied to the decisions obtained using the classification rules (11)-(15) to determine the required threshold value T*.

Experimental studies
The feasibility of using SVM classifier ensembles was confirmed on test and real data. The real data used in the experimental research were taken from the Statlog project and from the UCI machine learning repository. In particular, a data set for credit scoring was used. The German credit data set contains 1000 instances, including 700 creditworthy cases (class 1) and 300 default cases (class 2); each applicant is described by 24 characteristics (q = 24; the source is http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/). As a result, 18 individual SVM classifiers were trained using various input parameters. During the selection of the ensemble members, various values of the threshold T were examined. Herewith, the final number of classifiers in the ensemble proved equal to 8; a further decrease in the number of classifiers is not feasible (due to a further sharp decrease in their number and a substantial reduction of their diversity). The use of the median strategy with T = 0.35 allowed classifying correctly 98.2% of the objects of the initial data set. At the same time, the maximum accuracy of any of the individual SVM classifiers used in the ensemble was equal to 93.1%, and the accuracy reached with the majority vote rule was equal to 96%.
Thus, the use of the SVM classifier ensemble allowed increasing the classification accuracy by more than 5% compared with the maximum accuracy of any of the individual classifiers.

Conclusions
The use of SVM classifier ensembles reduces the randomness of the classification decision obtained by a single classifier and helps to improve the classification accuracy. The shortcomings of some classifiers are compensated by the strengths of others thanks to the combination of their results. The classifiers counterbalance each other's random errors, finding on the basis of this balance the most plausible output classification decision. This allows finding the best classification result with the minimum classification error.
In the experiments, several individual SVM classifiers using different types of the kernel function, different values of the kernel parameters and different values of the regularization parameter were learned for a particular data set. Herewith, different training and test sets randomly generated from the original data set were used. Then the principle of maximum decorrelation (for the selection of the individual classifiers to be included in the ensemble) and various strategies of forming the aggregated classifier were applied to the trained classifiers. The obtained classifier should have a higher classification accuracy than the classification accuracy of any single individual classifier.
DOI: 10.1051/ © Owned by the authors, published by EDP Sciences