American Journal of Bioinformatics Research
p-ISSN: 2167-6992 e-ISSN: 2167-6976
2016; 6(1): 19-25
doi:10.5923/j.bioinformatics.20160601.03
Md. Siraj-Ud-Doulah, Md. Bipul Hossen
Department of Statistics, Begum Rokeya University, Rangpur, Bangladesh
Correspondence to: Md. Siraj-Ud-Doulah, Department of Statistics, Begum Rokeya University, Rangpur, Bangladesh.
Copyright © 2016 Scientific & Academic Publishing. All Rights Reserved.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/
DNA microarray experiments have become one of the most popular tools for large-scale analysis of gene expression. The challenge for the biologist is to apply appropriate statistical techniques to determine which changes are relevant. Clustering, a widely used method for analysing microarray data, discerns hidden patterns in data without supervision and without prior knowledge. Clustering microarray data nevertheless poses several challenges. The results of common clustering algorithms are not consistent with one another, and even after multiple runs of different algorithms a further validation step is required. Because well-defined class labels are absent and the number of clusters is unknown, the unsupervised learning problem of finding an optimal clustering is hard. A consensus of judiciously obtained clusterings not only gives stable results but also lends a high level of confidence in their quality. In this paper, several runs of base algorithms are used to generate clusterings, and a co-association matrix over pairs of points is built using a configurable majority criterion. Synthetic as well as real-world datasets are used in the experiments, and the results are compared using various internal and external validity measures. The results obtained from consensus clustering are consistent and more accurate than those from the base algorithms. The consensus algorithm can also identify the number of clusters and detect outliers.
Keywords: Consensus Clustering, Linkage, Microarray, Outliers, Validation Indexes
Cite this paper: Md. Siraj-Ud-Doulah, Md. Bipul Hossen, Performance Evaluation of Clustering Methods in Microarray Data, American Journal of Bioinformatics Research, Vol. 6 No. 1, 2016, pp. 19-25. doi: 10.5923/j.bioinformatics.20160601.03.
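The co-association step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the hand-made label vectors standing in for base algorithm runs, the 0.5 threshold, the use of connected components to extract the final clusters, and the rule that singleton clusters are outlier candidates are all assumptions made for illustration.

```python
import numpy as np

def co_association(label_runs):
    """Fraction of base runs in which each pair of points shares a cluster."""
    runs = np.asarray(label_runs)                  # shape: (n_runs, n_points)
    same = runs[:, :, None] == runs[:, None, :]    # per-run pairwise agreement
    return same.mean(axis=0)                       # shape: (n_points, n_points)

def consensus_labels(C, threshold=0.5):
    """Assumed extraction rule: link pairs whose co-association exceeds the
    threshold, then take connected components of that graph as clusters."""
    n = len(C)
    adj = C > threshold
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            # unlabelled neighbours of i join the current component
            for j in np.flatnonzero(adj[i] & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

def singleton_outliers(labels):
    """Assumed outlier rule: points that end up alone in their cluster."""
    sizes = np.bincount(labels)
    return np.flatnonzero(sizes[labels] == 1)

# Hypothetical label vectors from three base runs on six points.
base_runs = [
    [0, 0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0, 2],
    [0, 0, 0, 1, 1, 1],
]
C = co_association(base_runs)
labels = consensus_labels(C, threshold=0.5)
print(labels.tolist())                      # [0, 0, 0, 1, 1, 2]
print(singleton_outliers(labels).tolist())  # [5]
```

In this toy case the number of consensus clusters (three) is not fixed in advance; it falls out of the thresholded matrix, and point 5, which never agrees with any other point in a majority of runs, is left as a singleton. In practice the base runs would come from algorithms such as k-means or hierarchical clustering with different linkage rules, and the threshold plays the role of the configurable majority criterion.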
Figure 1. Linkage Rules (Distance measures)
Figure 2. Clustering Algorithms (Validation Indexes)
Figure 3. K-Means clustering (k=4 & k=5)
Figure 4. Consensus Clustering