American Journal of Bioinformatics Research
2012; 2(2): 1-10
doi: 10.5923/j.bioinformatics.20120202.01
Hossam Fathy ElSemellawy 1, Amr Badr 2, Mostafa Abdel Aziem 3
1Information System Department, Arab Academy for Science, Technology and Maritime Transport, Cairo, Egypt
2Computer Science Department, Faculty of Computers and Information (Cairo University), Giza, Egypt
3Computer Science Department, Arab Academy for Science, Technology and Maritime Transport, Cairo, Egypt
Correspondence to: Hossam Fathy ElSemellawy, Information System Department, Arab Academy for Science, Technology and Maritime Transport, Cairo, Egypt.
Copyright © 2012 Scientific & Academic Publishing. All Rights Reserved.
Successful prediction of MHC class II epitopes is an essential step in designing genetic vaccines[1]. MHC class II epitopes are short peptides, between 9 and 25 amino acids long, that are bound by MHC molecules. These epitopes are recognized by T-cell receptors, leading to activation of the cellular and humoral immune systems and, ultimately, to effective destruction of the pathogenic organism. Predicting MHC class II epitopes is more difficult than predicting MHC class I epitopes because the binding groove of class II molecules is open at both ends; this structure allows MHC II epitopes to vary in length and complicates detection of the core binding 9-mer. Considerable effort has been devoted to developing algorithms that predict which peptides will bind to a given MHC class II molecule. In this paper we present a novel classification algorithm for predicting MHC class II epitopes using the multiple instance learning technique. Separated Constructive Clustering Ensemble (SCCE) is our new version of Constructive Clustering Ensemble (CCE)[27]. The method converts a multiple instance learning problem into a standard single instance problem. Most of its processing lies in the vector preparation step that precedes classification; for the classifier we use the Support Vector Machine (SVM), a method with proven performance in a wide range of practical applications[38]. SCCE orchestrates several algorithms, including a genetic algorithm, k-medoid clustering, ensemble learning, and a support vector machine, to predict MHC II epitopes. SCCE was tested on three benchmark data sets and proved to be very competitive with state-of-the-art regression methods. It achieved these results using only binder and non-binder flags, without the need for regression data. An implementation of MHCII-SCCE is freely accessible as an online web server for predicting MHC-II epitopes.
Keywords: Major Histocompatibility Complex (MHC), Multiple Instance Learning (MIL), Genetic Algorithm (GA), Support Vector Machine (SVM)
Figure 1. Example of the tertiary structure of a peptide binding to MHC class II. The binding groove is open at both ends, in contrast to MHC class I.
Figure 2. Representing an MHC II binder peptide as a bag of 9-mer substrings; each instance is a candidate for the core 9-mer[20].
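The bag representation in Figure 2 can be illustrated with a minimal Python sketch. It assumes that a bag is simply the set of all contiguous 9-residue windows of a peptide; the function name `nine_mer_bag`, the `core_len` parameter, and the example peptide string are hypothetical illustrations, not taken from the SCCE implementation.

```python
from typing import List


def nine_mer_bag(peptide: str, core_len: int = 9) -> List[str]:
    """Return every contiguous window of length core_len in the peptide.

    Assumption for illustration: each window is one instance of the bag,
    i.e. one candidate for the core binding 9-mer.
    """
    peptide = peptide.strip().upper()
    if len(peptide) < core_len:
        # Too short to contain a full core; the bag is empty.
        return []
    return [peptide[i:i + core_len]
            for i in range(len(peptide) - core_len + 1)]


# Arbitrary 13-residue example string, used only to show the windowing.
bag = nine_mer_bag("PKYVKQNTLKLAT")
for instance in bag:
    print(instance)
```

A peptide of length n yields n - 8 instances under this assumption, so the 13-residue example produces a bag of five candidate cores. In a multiple instance learning setting, only the bag carries a binder or non-binder label; the individual instances do not.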
[1] C. Janeway, P. Travers et al., Immunobiology: The Immune System in Health and Disease, 6th ed. Garland Pub, 2004.
[2] Castellino F, Zhong G, : Antigen presentation by MHC class II molecules: invariant chain function, protein trafficking, and the molecular basis of diverse determinant capture. Hum Immunol 1997, 54:159-169.
[3] Yewdell JW, Bennink JR: Immunodominance in major histocompatibility complex class I-restricted T lymphocyte responses. Annual Review of Immunology 1999, 17:51-88.
[4] M. Nielsen, C. Lundegaard, and O. Lund, Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinformatics, vol. 8, p. 238, 2007.
[5] P. Reche, J. Glutting, H. Zhang, and , Enhancement to the RANKPEP resource for the prediction of peptide binding to MHC molecules using profiles. Immunogenetics, vol. 56, no. 6, pp. 405–419, 2004.
[6] H. Singh and G. Raghava, ProPred: prediction of HLA-DR binding sites. Bioinformatics, vol. 17, no. 12, pp. 1236–1237, 2001.
[7] M. Nielsen, C. Lundegaard, P. Worning, C. Sylvester-Hvid, K. Lamberth, S. Buus, S. Brunak, and O. Lund, Improved prediction of MHC class I and II epitopes using a novel Gibbs sampling approach. Bioinformatics, vol. 20, pp. 1388–97, 2004.
[8] M. Rajapakse, B. Schmidt, L. Feng, and V. Brusic, Predicting peptides binding to MHC class II molecules using multi-objective evolutionary algorithms. BMC Bioinformatics, vol. 8, no. 1, p. 459, 2007.
[9] H. Mamitsuka, Predicting peptides that bind to MHC molecules using supervised learning of Hidden Markov Models. PROTEINS: Structure, Function, and Genetics, vol. 33, pp. 460–474, 1998.
[10] M. Nielsen, C. Lundegaard, P. Worning, S. Lauemøller, K. Lamberth, S. Buus, S. Brunak, and O. Lund, Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Science, vol. 12, pp. 1007–1017, 2003.
[11] S. Buus, S. Lauemoller, P. Worning, C. Kesmir, T. Frimurer, S. Corbet, A. Fomsgaard, J. Hilden, A. Holm, and S. Brunak, Sensitive quantitative predictions of peptide-MHC binding by a 'Query by Committee' artificial neural network approach. Tissue Antigens, vol. 62, no. 5, pp. 378–384, 2003.
[12] J. Cui, L. Han, H. Lin, H. Zhang, Z. Tang, C. Zheng, Z. Cao, and Y. Chen, Prediction of MHC-binding peptides of flexible lengths from sequence-derived structural and physicochemical properties. Mol Immunol, 2006.
[13] J. Salomon and D. Flower, Predicting Class II MHC-Peptide binding: a kernel based approach using similarity scores. BMC Bioinformatics, vol. 7, no. 1, p. 501, 2006.
[14] M. Nielsen and O. Lund, NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction. BMC Bioinformatics, vol. 10, no. 1, p. 296, 2009.
[15] I. Doytchinova and D. Flower, Towards the in silico identification of class II restricted T-cell epitopes: a partial least squares iterative self consistent algorithm for affinity prediction. pp. 2263–2270, 2003.
[16] C. Hattotuwagama, P. Guan, I. Doytchinova, C. Zygouri, and D. Flower, Quantitative online prediction of peptide binding to the major histocompatibility complex. Journal of Molecular Graphics and Modelling, vol. 22, no. 3, pp. 195–207, 2004.
[17] W. Liu, X. Meng, Q. Xu, D. Flower, and T. Li, Quantitative prediction of mouse class I MHC peptide binding affinity using support vector machine regression (SVR) models. BMC Bioinformatics, vol. 7, no. 1, p. 182, 2006.
[18] H. Bui, J. Sidney, B. Peters, M. Sathiamurthy, A. Sinichi, K. Purton, B. Mothé, F. Chisari, D. Watkins, and A. Sette, Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix applications. Immunogenetics, vol. 57, no. 5, pp. 304–314, 2005.
[19] M. Nielsen, C. Lundegaard, and O. Lund: Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinformatics, vol. 8, p. 238, 2007.
[20] Yasser EL-Manzalawy, Drena Dobbs, and Vasant Honavar: Predicting MHC-II binding affinity using multiple instance regression. IEEE/ACM Transactions on Computational Biology and Bioinformatics.
[21] P. Wang, J. Sidney, C. Dow, B. Mothé, A. Sette, and B. Peters: A Systematic Assessment of MHC Class II Peptide Binding Predictions and Evaluation of a Consensus Approach. PLoS Computational Biology, vol. 4, no. 4, 2008.
[22] C. Lawrence, S. Altschul, M. Boguski, J. Liu, A. Neuwald, and J. Wootton: Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, vol. 262, no. 5131, pp. 208–214, 1993.
[23] S. Chang, D. Ghosh, D. Kirschner, and J. Linderman: Peptide length based prediction of peptide-MHC class II binding. Bioinformatics, vol. 22, no. 22, p. 2761, 2006.
[24] Y. Chen, J. Bi, and J. Wang, MILES: Multiple-instance learning via embedded instance selection. IEEE Trans Pattern Anal Mach Intell, vol. 28, no. 12, pp. 1931–1947, 2006.
[25] S. Henikoff and J. Henikoff, Amino Acid Substitution Matrices from Protein Blocks. Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 22, pp. 10915–10919, 1992.
[26] S. Shevade, S. Keerthi, C. Bhattacharyya, and K. Murthy: Improvements to the SMO Algorithm for SVM Regression. IEEE Transactions on Neural Networks, vol. 11, no. 5, p. 1189, 2000.
[27] Zhi-Hua Zhou, Min-Ling Zhang: Solving Multi-Instance Problems with Classifier Ensemble Based on Constructive Clustering. Knowledge and Information Systems, 11(2):155-170, 2007.
[28] Nielsen M, Lundegaard C, Blicher T, Peters B, Sette A, Justesen S, Buus S, Lund O: Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan. PLoS Comput Biol 2008, 4(7):e1000107.
[29] El-Manzalawy Y, Dobbs D, Honavar V: On evaluating MHC-II binding peptide prediction methods. PLoS One 2008, 3(9):e3268.
[30] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Perez, Solving the multiple-instance problem with axis parallel rectangles. Artificial Intelligence, vol. 89(1-2), pp. 31–71, 1997.
[31] Zhou Z-H, Zhang M-L (2003) Ensembles of multi-instance learners. In Lavrač N, Gamberger D, Blockeel H, Todorovski L (eds). Lecture Notes in Artificial Intelligence 2837, Springer, , pp 492-502.
[32] S. Andrews, I. Tsochantaridis, and T. Hofmann, Support vector machines for multiple-instance learning. Advances in Neural Information Processing Systems, vol. 15, 2002.
[33] T. Gartner, P. Flach, A. Kowalczyk, and A. Smola, Multi-instance kernels. Proceedings of the 19th International Conference on Machine Learning, pp. 179–186, 2002.
[34] Dietterich TG (2000) Ensemble methods in machine learning. In Kittler J, Roli F (eds). Lecture Notes in Computer Science 1867, Springer, , pp 1-15.
[35] Wang J, Zucker J-D (2000) Solving the multiple-instance problem: A lazy learning approach. In Proceedings of the 17th International Conference on Machine Learning, , 2000, pp 1119-1125.
[36] Weidmann N, Frank E, Pfahringer B (2003) A two-level learning method for generalized multi-instance problem. In Lavrač N, Gamberger D, Blockeel H, Todorovski L (eds). Lecture Notes in Artificial Intelligence 2837, Springer, , pp 468-479.
[37] Jiawei Han & Micheline Kamber (2001), Data Mining Concepts and Techniques, p. 351.
[38] Cristianini N, Shawe-Taylor J: An introduction to support vector machines and other kernel-based learning methods. , , Press; 2000.
[39] Peng Wang, John Sidney, Yohan Kim, Alessandro Sette, Ole Lund, Morten Nielsen, Bjoern Peters: Peptide binding predictions for HLA DR, DP and DQ molecules. BMC Bioinformatics 2010, 11:568.
[40] Brusic V, Rudy G, Harrison L. MHCPEP, a database of MHC-binding peptides: update 1997. Nucleic Acids Res 26: 368–371.
[41] Bhasin M, Singh H, Raghava G (2003) MHCBN: a comprehensive database of MHC binding and non-binding peptides. Bioinformatics 19: 665–666.
[42] Raghava G. MHCBench: Evaluation of MHC Binding Peptide Prediction Algorithms. Available at http://www.imtech.res.in/raghava/mhcbench/.