Algorithms Research
p-ISSN: 2324-9978 e-ISSN: 2324-996X
2013; 2(1): 8-17
doi:10.5923/j.algorithms.20130201.02
D. S.V.G.K.Kaladhar1, Bharath Kumar Pottumuthu1, Padmanabhuni V. Nageswara Rao2, Varahalarao Vadlamudi3, A.Krishna Chaitanya1, R. Harikrishna Reddy1
1Department of Bioinformatics, GIS, GITAM University, Visakhapatnam, 530045, India
2Dept. of Computer Sciences, GITAM University, Visakhapatnam, India
3Department of Biochemistry, Dr. L B PG College, Visakhapatnam, India
Correspondence to: D. S.V.G.K.Kaladhar, Department of Bioinformatics, GIS, GITAM University, Visakhapatnam, 530045, India.
Copyright © 2012 Scientific & Academic Publishing. All Rights Reserved.
Data mining is used in many medical applications, including tumor classification, protein structure prediction, gene classification, cancer classification based on microarray data, clustering of gene expression data, and statistical modeling of protein-protein interactions. Prediction of adverse drug events and of medical test effectiveness can be based on genomics and proteomics through data mining approaches. Cancer detection is an active research topic in bioinformatics. Data mining techniques such as pattern recognition, classification, and clustering are applied to gene expression data to detect cancer occurrence and predict survivability. In this work, a colon cancer dataset was classified using Weka 3.6. The Logistic, IBk, KStar, NNge, ADTree, and Random Forest algorithms classified 100% of instances correctly, followed by Naive Bayes and PART with 97.22%; SimpleCart and ZeroR performed worst, with 50% of instances correctly classified. The kappa statistic was at its maximum for Logistic, IBk, KStar, NNge, ADTree, and Random Forest. Mean absolute error and root mean squared error were low for Logistic, KStar, and NNge. The results show that the cancer dataset can be readily analyzed using a range of classification algorithms.
Keywords: Data Mining, Colon Cancer, Dataset, ROC
Cite this paper: D. S.V.G.K.Kaladhar, Bharath Kumar Pottumuthu, Padmanabhuni V. Nageswara Rao, Varahalarao Vadlamudi, A.Krishna Chaitanya, R. Harikrishna Reddy, The Elements of Statistical Learning in Colon Cancer Datasets: Data Mining, Inference and Prediction, Algorithms Research, Vol. 2 No. 1, 2013, pp. 8-17. doi: 10.5923/j.algorithms.20130201.02.
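The evaluation measures named in the abstract (percentage of correctly classified instances, kappa statistic, mean absolute error, and root mean squared error) can be illustrated with a minimal Python sketch. It is not the authors' Weka workflow. The function name `evaluation_summary`, the six-instance toy labels, and the choice to measure errors on the predicted probability of the positive class are assumptions made for illustration only.

```python
import math
from collections import Counter


def evaluation_summary(actual, predicted, prob_positive, positive=1):
    """Accuracy, Cohen's kappa, MAE and RMSE for a two-class problem.

    actual, predicted : sequences of class labels
    prob_positive     : predicted probability of the `positive` class
    (All names here are hypothetical; they do not come from the paper.)
    """
    n = len(actual)
    if n == 0 or not (n == len(predicted) == len(prob_positive)):
        raise ValueError("inputs must be non-empty and of equal length")

    # Observed agreement: fraction of correctly classified instances.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n

    # Agreement expected by chance, from the class marginals.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)

    # Kappa is undefined when chance agreement is 1; this sketch reports 0.0.
    kappa = 0.0 if expected == 1.0 else (observed - expected) / (1.0 - expected)

    # Errors between predicted probability and the 0/1 class indicator.
    errors = [
        prob - (1.0 if a == positive else 0.0)
        for a, prob in zip(actual, prob_positive)
    ]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    return {"accuracy": observed, "kappa": kappa, "mae": mae, "rmse": rmse}


# Toy data (not from the paper): a balanced two-class sample.
actual = [1, 1, 1, 0, 0, 0]

# A classifier that gets every instance right with full confidence.
perfect = evaluation_summary(actual, actual, [1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

# A majority-class baseline that always predicts class 1 with probability 0.5.
baseline = evaluation_summary(actual, [1] * 6, [0.5] * 6)

print(perfect)
print(baseline)
```

On a balanced sample, a majority-class baseline reaches 50% accuracy with a kappa of zero, because its agreement with the true labels is no better than chance. This is one reason kappa is usually reported alongside the percentage of correctly classified instances.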
Figure 1. ROC analysis graph using the kNN algorithm
Figure 2. Calibration plot drawn using the kNN algorithm
Figure 3. Sieve Multigram
Figure 4. Survey Plot
Figure 5. Hierarchical Clustering