American Journal of Intelligent Systems
p-ISSN: 2165-8978 e-ISSN: 2165-8994
2013; 3(2): 57-70
doi:10.5923/j.ajis.20130302.02
1 Department of Computer Engineering, School of Technology and Business Studies, Dalarna University, Rödavägen 3, Borlänge, 78188, Sweden
2 University of Central Florida, 4000 Central Florida Blvd, Orlando, FL 32816, USA
Correspondence to: Hasan Fleyeh, Department of Computer Engineering, School of Technology and Business Studies, Dalarna University, Rödavägen 3, Borlänge, 78188, Sweden.
Copyright © 2012 Scientific & Academic Publishing. All Rights Reserved.
This paper presents a multi-class AdaBoost based on an ensemble of binary AdaBoost classifiers organized as a Binary Decision Tree (BDT). Binary AdaBoost is extremely successful in producing accurate binary classifications, but it does not perform well on multi-class problems. To avoid this performance degradation, the multi-class problem is divided into a number of binary problems, and binary AdaBoost classifiers are invoked to solve them. The approach is tested with a dataset consisting of 6500 binary images of traffic signs. Haar-like features of these images are computed, and the multi-class AdaBoost classifier is invoked to classify them. Classification rates of 96.7% and 95.7% are achieved for the traffic sign borders and pictograms, respectively. The proposed approach is also evaluated on a number of standard datasets such as Iris, Wine, and Yeast. The performance of the proposed BDT classifier is high compared with the state of the art, and it converges quickly to a solution, which indicates that it is a reliable classifier.
Keywords: Multiclass AdaBoost, Binary Decision Tree, Classification
Cite this paper: Hasan Fleyeh, Erfan Davami, Multiclass Adaboost based on an Ensemble of Binary AdaBoosts, American Journal of Intelligent Systems, Vol. 3 No. 2, 2013, pp. 57-70. doi: 10.5923/j.ajis.20130302.02.
In each dimension $d$ of the feature space, a weak learner is a decision stump defined by a threshold $th$ and a polarity $lp$:

$$h(x) = \begin{cases} +1, & lp \cdot x_d < lp \cdot th \\ -1, & \text{otherwise} \end{cases} \qquad (1)$$

The weights of the $m$ training samples $(x_i, y_i)$, with $y_i \in \{-1, +1\}$, are initialised uniformly:

$$w_i(1) = \frac{1}{m}, \qquad i = 1, \ldots, m \qquad (2)$$

In round $t$, the weighted classification error of a weak learner $h_t$ is

$$\epsilon_t = \sum_{i=1}^{m} w_i(t)\left[h_t(x_i) \neq y_i\right] \qquad (3)$$
The weight of the weak learner $\alpha_t$, which is a positive number in the range $[0, \infty)$ [5], is given by

$$\alpha_t = \frac{1}{2}\ln\!\left(\frac{1 - \epsilon_t}{\epsilon_t}\right) \qquad (4)$$

Note that $\alpha_t \geq 0$ if $\epsilon_t \leq 0.5$, and that $\alpha_t$ becomes larger when $\epsilon_t$ becomes smaller. In each round $t$, there is one weak learner in each dimension $d$, described by the four parameters $[th_{d,t}, lp_{d,t}, d_t, \alpha_{d,t}]$. The best weak learner among all weak learners in all dimensions in round $t$ is the one which generates the least classification error $\epsilon_t$. It is given by:

$$h_t = \arg\min_{d}\, \epsilon_{d,t} \qquad (5)$$

AdaBoost then updates the weight $w_i(t+1)$ of sample $x_i$ in round $t+1$ according to the weight $w_i(t)$ in round $t$ by the following equation:

$$w_i(t+1) = \frac{w_i(t)\,\exp\!\left(-\alpha_t\, y_i\, h_t(x_i)\right)}{Z_t} \qquad (6)$$

where $Z_t$ is a normalisation factor. The weight of a sample increases when the weak learner misclassifies it, and decreases when the weak learner classifies it correctly. In this way, AdaBoost gives more attention to the wrongly classified samples. Finally, the strong classifier is updated by the following equation:

$$H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right) \qquad (7)$$
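To make equations (1)–(7) concrete, the following is a minimal sketch of binary AdaBoost with decision stumps. It is an illustration written for this description, not the authors' implementation; the function names, the exhaustive threshold search, and the explicit normalisation step are choices made here for clarity.

```python
import numpy as np

def train_adaboost(X, y, rounds=50):
    """Binary AdaBoost with decision stumps, following equations (1)-(7).

    X: (m, D) feature matrix; y: labels in {-1, +1}.
    Returns a list of weak learners (dimension, threshold, polarity, alpha).
    """
    m, D = X.shape
    w = np.full(m, 1.0 / m)                        # eq. (2): uniform initial weights
    learners = []
    for _ in range(rounds):
        best = None                                # (error, d, th, lp, predictions)
        for d in range(D):                         # eq. (5): best stump over all dimensions
            for th in np.unique(X[:, d]):
                for lp in (+1, -1):                # polarity of the stump, eq. (1)
                    pred = np.where(lp * X[:, d] < lp * th, 1, -1)
                    err = np.sum(w[pred != y])     # eq. (3): weighted error
                    if best is None or err < best[0]:
                        best = (err, d, th, lp, pred)
        err, d, th, lp, pred = best
        err = np.clip(err, 1e-10, 1.0 - 1e-10)     # guard against division by zero
        alpha = 0.5 * np.log((1.0 - err) / err)    # eq. (4): weight of the weak learner
        w *= np.exp(-alpha * y * pred)             # eq. (6): re-weight the samples
        w /= w.sum()                               # normalisation factor Z_t
        learners.append((d, th, lp, alpha))
    return learners

def predict_adaboost(learners, X):
    """Strong classifier, eq. (7): sign of the weighted vote of the stumps."""
    score = np.zeros(X.shape[0])
    for d, th, lp, alpha in learners:
        score += alpha * np.where(lp * X[:, d] < lp * th, 1, -1)
    return np.sign(score)
```

A typical call is `learners = train_adaboost(X_train, y_train)` followed by `predict_adaboost(learners, X_test)`, with labels coded as -1 and +1.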
Figure 1. A weak learner (top) versus a strong learner (bottom)
Figure 2. Multi-class AdaBoost designed as a Binary Decision Tree
Recall that $\alpha_t \geq 0$ only when $\epsilon_t \leq 0.5$, and that $\alpha_t$ gets larger when $\epsilon_t$ gets smaller. When dealing with binary problems, this constraint is easy to satisfy, since an error of 0.5 is no better than random guessing. However, when the number of classes increases, it becomes very hard to achieve this constraint. In other words, when the error of the weak classifier becomes more than 0.5, $\alpha_t$ becomes negative; for example, a weak learner with $\epsilon_t = 0.6$ gives $\alpha_t = \frac{1}{2}\ln(0.4/0.6) \approx -0.2$, so the weights of correctly classified samples increase instead of decrease. This means that the weights of the training samples are updated in the wrong direction, and hence AdaBoost fails in achieving the multi-class classification task [10].

In order to keep the very successful binary classifier as the core and still achieve multi-class classification (N > 2), it is necessary to divide the multi-class problem into a number of binary problems, provided that this division relies on the classes' properties and their common similarities. A Binary Decision Tree decomposes an N-class problem into N-1 binary problems. The proposed algorithm takes advantage of the high classification accuracy achieved by AdaBoost and of the efficient architecture of the BDT. It is based on recursively dividing the classes into two disjoint groups in each node of the BDT, as illustrated in Figure 2. In the root of the BDT, the classes are decomposed into two disjoint sets such that classes with similar features are grouped in the same set. These two sets are further decomposed at each level of the BDT until the leaves are reached. The number of leaves in the tree is equal to the number of classes in the original set. Each node of the BDT contains an AdaBoost which is trained to decide to which of the two branches of the node an unknown sample is assigned.

Let N be the number of classes to be classified, and let $x_i$ for $i = 1, \ldots, m$ be a set of samples, each of which is labelled by $y_i \in \{1, 2, \ldots, N\}$. The samples are divided into two disjoint groups $G_1$ and $G_2$ as described in the following steps (a code sketch of this grouping step is given after the list):

• Calculate the centre of each of the N different classes.
• Assign the two classes whose centres have the largest Euclidean distance to the two disjoint groups $G_1$ and $G_2$.
• Repeat
• Calculate the Euclidean distance of each remaining class from both groups.
• Assign the class with the smallest Euclidean distance from one of the two groups to that group.
• Recalculate the centre of that group of classes.
• Until all the remaining classes are assigned to either of the two groups.

The pseudo code of the proposed algorithm is depicted in Algorithm 1. Note that the algorithm takes the attributes of the training samples and the corresponding classes as inputs and generates two disjoint lists of classes. The algorithm generates the BDT automatically during the training phase of the classification.
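The grouping steps above can be sketched as follows. This is a minimal illustration assuming numeric feature vectors; the function name `split_classes`, the group variables `g1`/`g2`, and the tie-breaking choice are introduced here and are not taken from the paper.

```python
import numpy as np

def split_classes(X, y):
    """Split the classes present in (X, y) into two disjoint groups.

    Seeds the two groups with the pair of classes whose centres are farthest
    apart, then repeatedly attaches the remaining class whose centre is
    closest to either group's centre.
    """
    labels = list(np.unique(y))
    centres = {c: X[y == c].mean(axis=0) for c in labels}    # centre of each class

    # Seed the two groups with the most distant pair of class centres.
    best_pair, best_dist = None, -1.0
    for a in labels:
        for b in labels:
            if a < b:
                dist = np.linalg.norm(centres[a] - centres[b])
                if dist > best_dist:
                    best_pair, best_dist = (a, b), dist
    g1, g2 = [best_pair[0]], [best_pair[1]]

    remaining = [c for c in labels if c not in best_pair]
    while remaining:
        c1 = np.mean([centres[c] for c in g1], axis=0)        # centre of group 1
        c2 = np.mean([centres[c] for c in g2], axis=0)        # centre of group 2
        # Choose the remaining class closest to either group centre.
        dists = [(min(np.linalg.norm(centres[c] - c1),
                      np.linalg.norm(centres[c] - c2)), c) for c in remaining]
        _, c = min(dists)
        if np.linalg.norm(centres[c] - c1) <= np.linalg.norm(centres[c] - c2):
            g1.append(c)
        else:
            g2.append(c)
        remaining.remove(c)
    return g1, g2
```

At each node of the BDT, the two lists returned by this split define the binary problem on which that node's AdaBoost is trained, and the same split is applied recursively until each group contains a single class.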
Figure 3. Generating Haar-like features. (A): the way to compute Haar-like features for a candidate object. (B): different Haar matrices
Each Haar-like feature is computed as

$$F = \sum_{x=1}^{K}\sum_{y=1}^{K} I(x, y)\, H(x, y) \qquad (8)$$

where $I(x, y)$ is the candidate image, $K = 36$ pixels, and $H(x, y)$ is the Haar matrix, in which 1 denotes the blue areas in Figure 3A, -1 the red areas, and 0 the other pixels of the image which are not overlapped by the Haar matrix. The total number of Haar-like features exceeds 598 million because of the different Haar matrices employed in this experiment (Figure 3B) and the large number of images.

The performance of the proposed multi-class AdaBoost is evaluated by 10-fold cross validation. The classifier is trained and tested with the 598 million Haar-like features, and the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) are computed for each class. Table 2 lists the average accuracy and recall [24] of the 10-fold cross validation computed per class, while Figure 4 depicts the results achieved in each of the 10 folds.

The robustness of the proposed approach to traffic signs situated at different angles of rotation was also tested. The multi-class AdaBoost is trained with Haar-like features of non-rotated signs and tested with images rotated by different angles from -30° to +30°. Figure 5 depicts two samples of the images invoked in the testing procedure. The classification rate achieved by the multi-class AdaBoost is computed for each rotation angle and plotted in Figure 6. The classification rate is not affected by the traffic sign's angle of rotation, which means that, by employing the proposed method, a system which is invariant to the angle of rotation can be achieved.

Figure 7 depicts the ROC diagram of the multi-class AdaBoost for one of the final stages in the binary decision tree. To produce this ROC diagram, the strong classifier is trained with a training set consisting of Haar-like features of one particular class (the positive class) and of the Negative class images (the negative class). The training is accomplished while the threshold of the weak learner is gradually increased from -∞ to +∞. The resulting classifier is then invoked to classify the test dataset of the positive class, and the false positive rate and the true positive rate are computed for each threshold value.
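The ROC points are thus obtained by sweeping a threshold and recording the false positive rate and true positive rate at each value. The sketch below shows the common simplification in which a decision threshold is swept over the strong classifier's real-valued scores on a fixed test set, rather than retraining the classifier for each weak-learner threshold as described above; the function name and the number of threshold steps are illustrative.

```python
import numpy as np

def roc_curve_points(scores, labels, num_thresholds=200):
    """Trace an ROC curve by sweeping a decision threshold over the scores.

    scores: real-valued outputs of the strong classifier on the test set.
    labels: ground-truth labels in {-1, +1}; both classes are assumed present.
    Returns a list of (false positive rate, true positive rate) pairs.
    """
    thresholds = np.linspace(scores.min() - 1.0, scores.max() + 1.0, num_thresholds)
    positives = np.sum(labels == 1)
    negatives = np.sum(labels == -1)
    points = []
    for th in thresholds:
        pred = np.where(scores >= th, 1, -1)
        tp = np.sum((pred == 1) & (labels == 1))     # true positives at this threshold
        fp = np.sum((pred == 1) & (labels == -1))    # false positives at this threshold
        points.append((fp / negatives, tp / positives))
    return points
```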
Figure 4. Classification rate achieved in different folds
Figure 5. Samples of the rotated signs used in the testing of the multi-class AdaBoost
Figure 6. Average classification rate versus rotation angle of traffic signs
Figure 7. ROC diagram of the multi-class AdaBoost
Figure 8. Testing errors for Iris dataset
Figure 9. Testing errors for Wine dataset
Figure 10. Testing errors for Yeast dataset
Figure 11. Testing errors for Segment dataset
Figure 12. Testing errors for Page-blocks dataset
Figure 13. Testing errors for Optical digits dataset
Figure 14. Testing errors for Pen digits dataset
Figure 15. Testing errors for Vowel dataset