International Journal of Statistics and Applications
p-ISSN: 2168-5193 e-ISSN: 2168-5215
2014; 4(6): 241-248
doi:10.5923/j.statistics.20140406.01
V. Deneshkumar, K. Senthamaraikannan, M. Manikandan
Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, India
Correspondence to: V. Deneshkumar, Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, India.
Copyright © 2014 Scientific & Academic Publishing. All Rights Reserved.
Outlier detection has important applications in medical research. Clinical databases have accumulated large quantities of information about patients and their medical conditions, and relationships and patterns within these data could provide new medical knowledge. In this study, data mining techniques are used to search for such relationships in a large clinical database. The main objective of this paper is to detect outliers and to identify the influential factors in patients' diabetes symptoms using data mining techniques. Results are illustrated numerically and graphically.
Keywords: Data mining, Outlier detection, Diabetes, PCA and refined method
Cite this paper: V. Deneshkumar, K. Senthamaraikannan, M. Manikandan, Identification of Outliers in Medical Diagnostic System Using Data Mining Techniques, International Journal of Statistics and Applications, Vol. 4 No. 6, 2014, pp. 241-248. doi: 10.5923/j.statistics.20140406.01.
The communality of a given manifest variable is the proportion of its variance that is accounted for by the common factors of the hypothesized latent structure. The unique variance of a given manifest variable is the proportion of its variance that is not accounted for by the common factors [9]. The general factor model has the following form:

$$X_j = a_{j1}F_1 + a_{j2}F_2 + \cdots + a_{jm}F_m + e_j \tag{1}$$

where
$X_j$ = the jth variable (j = 1, 2, 3, ..., p),
$a_{jk}$ = the factor loading of the jth variable on the kth common factor,
$F_k$ = the kth common factor,
$m$ = the number of common factors (k = 1, 2, 3, ..., m),
$e_j$ = the unique part of the jth variable, whose variance is the unique variance.

Principal Component Analysis

Principal component analysis is a multivariate statistical technique used to identify the underlying structure characterizing a set of highly correlated variables. The principal components are extracted so that the first component accounts for the largest amount of total variation in the data, the second component accounts for the second largest amount, and so on. A principal component, which is a linear combination of the set of variables, has the following form:

$$PC_j = w_{j1}X_1 + w_{j2}X_2 + \cdots + w_{jp}X_p \tag{2}$$

where
$PC_j$ = the linear composite for the jth component (j = 1, 2, ..., p, where p is the number of variables),
$w_{ji}$ = the principal component loading of the ith variable on the jth component,
$X_i$ = the ith variable.

In principal component analysis, the number of principal components extracted is the same as the number of variables describing the dimensionality of the data. If a data set is characterized by ten variables, principal component analysis of the data would generate ten principal components. The most important issue in principal component analysis is therefore the number of principal components to retain for further analysis.

Varimax Rotation

A varimax rotation is a change of coordinates used in principal component analysis and factor analysis that maximizes the sum of the variances of the squared loadings. The rotated loadings are first scaled as

$$\tilde{l}^{*}_{ij} = \frac{\hat{l}^{*}_{ij}}{\hat{h}_i} \tag{3}$$

where $\hat{l}^{*}_{ij}$ is the loading of the ith variable on the jth factor after rotation and $\hat{h}_i^{2}$ is the communality for variable i, so each loading is divided by the square root of its variable's communality. The varimax procedure selects the rotation that maximizes the following quantity, which is the sample variance of the squared standardized loadings for each factor, summed over the m factors:

$$V = \frac{1}{p}\sum_{j=1}^{m}\left[\sum_{i=1}^{p}\left(\tilde{l}^{*}_{ij}\right)^{4} - \frac{1}{p}\left(\sum_{i=1}^{p}\left(\tilde{l}^{*}_{ij}\right)^{2}\right)^{2}\right] \tag{4}$$
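The varimax idea described above (scale each loading by the square root of its variable's communality, then rotate to maximize the variance of the squared scaled loadings summed over factors) can be sketched in a few lines. This is only an illustration under stated assumptions: the paper does not say which algorithm or software performed the rotation, so the sketch uses Kaiser's classical pairwise planar rotations, and the loading matrix `example_loadings` is hypothetical rather than taken from the study.

```python
import math


def varimax_criterion(loadings):
    """Variance of the squared, communality-scaled loadings, summed over factors."""
    p = len(loadings)
    m = len(loadings[0])
    # h_i is the square root of the communality of variable i (assumed non-zero).
    h = [math.sqrt(sum(v * v for v in row)) for row in loadings]
    scaled = [[v / h_i for v in row] for row, h_i in zip(loadings, h)]
    total = 0.0
    for j in range(m):
        squares = [row[j] ** 2 for row in scaled]
        total += sum(sq * sq for sq in squares) - sum(squares) ** 2 / p
    return total / p


def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Rotate a p x m loading matrix by pairwise planar rotations (assumed method)."""
    p = len(loadings)
    m = len(loadings[0])
    h = [math.sqrt(sum(v * v for v in row)) for row in loadings]
    L = [[v / h_i for v in row] for row, h_i in zip(loadings, h)]  # scaled copy
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for a in range(m - 1):
            for b in range(a + 1, m):
                u = [row[a] ** 2 - row[b] ** 2 for row in L]
                v = [2.0 * row[a] * row[b] for row in L]
                A, B = sum(u), sum(v)
                C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                D = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the criterion for this pair of factors.
                phi = 0.25 * math.atan2(D - 2.0 * A * B / p,
                                        C - (A * A - B * B) / p)
                largest_angle = max(largest_angle, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                for row in L:
                    xa, xb = row[a], row[b]
                    row[a] = xa * c + xb * s
                    row[b] = -xa * s + xb * c
        if largest_angle < tol:
            break
    # Undo the scaling so that the communalities are unchanged.
    return [[v * h_i for v in row] for row, h_i in zip(L, h)]


# Hypothetical unrotated loadings for four variables on two factors.
example_loadings = [[0.7, 0.5], [0.6, 0.6], [0.8, -0.4], [0.5, -0.6]]
rotated = varimax(example_loadings)
print("criterion before:", round(varimax_criterion(example_loadings), 4))
print("criterion after: ", round(varimax_criterion(rotated), 4))
```

Because the rotation is orthogonal, each variable's communality is the same before and after rotation; only the distribution of the loadings across factors changes.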
![]() | (5) |
Figure 1. Flow Chart for the Model
Figure 2. Scree Plot
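A scree plot displays the eigenvalue of each principal component in decreasing order, which helps in deciding how many components to retain. The following is a minimal sketch of how such eigenvalues and their shares of total variance could be computed. It assumes the components are extracted from a correlation matrix, uses a cyclic Jacobi iteration as a stand-in for whatever statistical software produced the figure, and runs on a hypothetical four-variable correlation matrix, not the study's data.

```python
import math


def jacobi_eigenvalues(matrix, max_sweeps=100, tol=1e-20):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) off-diagonal entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Hypothetical correlation matrix for four variables (illustration only).
correlation = [
    [1.0, 0.6, 0.5, 0.1],
    [0.6, 1.0, 0.4, 0.2],
    [0.5, 0.4, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]

eigenvalues = jacobi_eigenvalues(correlation)
total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
proportions = [value / total for value in eigenvalues]

cumulative = 0.0
for number, (value, share) in enumerate(zip(eigenvalues, proportions), start=1):
    cumulative += share
    print(f"PC{number}: eigenvalue={value:.3f} share={share:.1%} cumulative={cumulative:.1%}")
```

Plotting `eigenvalues` against the component number gives the scree curve; the point where the curve flattens is a common, informal guide to the number of components worth keeping.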
Figure 3. Model Summary and Cluster Quality
Figure 4. Model Summary
Figure 5. Cluster Distribution
Figure 6. Cluster Structure
Figure 7. Cluster Preference
[1] Ahmad, A., and L. Dey (2007): A k-Mean Clustering Algorithm for Mixed Numeric and Categorical Data. Data & Knowledge Engineering, Vol. 63, pp. 503-527.
[2] Barnett, V., and T. Lewis (1984): Outliers in Statistical Data. John Wiley & Sons, New York.
[3] Bellazzi, R., and B. Zupan (2008): Predictive Data Mining in Clinical Medicine: Current Issues and Guidelines. International Journal of Medical Informatics, Vol. 77, No. 2, pp. 81-97.
[4] Chaira, T. (2011): A Novel Intuitionistic Fuzzy C-Means Clustering Algorithm and Its Application to Medical Images. Applied Soft Computing, Vol. 11, pp. 1711-1717.
[5] Chrominski, K., and M. Tkacz (2010): Comparison of Outlier Detection Methods in Biomedical Data. Journal of Medical Informatics & Technologies, Vol. 16, ISSN 1642-6037.
[6] Gnanadesikan, R., and J. R. Kettenring (1972): Robust Estimates, Residuals, and Outlier Detection with Multi-response Data. Biometrics, Vol. 28, pp. 81-124.
[7] Hadi, A. S. (1992): Identifying Multiple Outliers in Multivariate Data. Journal of the Royal Statistical Society, Series B, Vol. 54, pp. 761-771.
[8] Hadi, A. S. (1994): A Modification of a Method for the Detection of Outliers in Multivariate Samples. Journal of the Royal Statistical Society, Series B, Vol. 56, No. 2.
[9] Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., and R. L. Tatham (2006): Multivariate Data Analysis. 6th ed. Prentice Hall, New Jersey.
[10] Hardin, J., and D. M. Rocke (2002): Outlier Detection in the Multiple Cluster Setting Using the Minimum Covariance Determinant Estimator. Computational Statistics & Data Analysis, Vol. 44, pp. 625-638.
[11] Hauskrecht, M., Batal, I., Valko, M., Visweswaran, S., Cooper, G. F., and G. Clermont (2013): Outlier Detection for Patient Monitoring and Alerting. Journal of Biomedical Informatics, Vol. 46, pp. 47-55.
[12] Hawkins, D. M. (1980): Identification of Outliers. Chapman and Hall.
[13] Hodge, V. J. (2004): A Survey of Outlier Detection Methodologies. Kluwer Academic Publishers, Netherlands, January, 43.
[14] Koufakou, A., and M. Georgiopoulos (2010): A Fast Outlier Detection Strategy for Distributed High-Dimensional Data Sets with Mixed Attributes. Data Mining and Knowledge Discovery, Vol. 20, pp. 259-289.
[15] Lavrac, N. (1999): Selected Techniques for Data Mining in Medicine. Artificial Intelligence in Medicine, Vol. 16, No. 1, pp. 3-23.
[16] Ordonez, C., Ezquerra, N., and C. A. Santana (2006): Constraining and Summarizing Association Rules in Medical Data. Knowledge and Information Systems, Vol. 9, No. 3, pp. 259-283.
[17] Parsons, L., Haque, E., and H. Liu (2004): Subspace Clustering for High Dimensional Data: A Review. SIGKDD Explorations, Vol. 6, No. 1, pp. 90-105.
[18] Penny, K. I., and I. T. Jolliffe (2001): A Comparison of Multivariate Outlier Detection Methods for Clinical Laboratory Safety Data. Journal of the Royal Statistical Society, Series D (The Statistician), Vol. 50, No. 3, pp. 295-308.
[19] Petrovskiy, M. I. (2003): Outlier Detection Algorithms in Data Mining Systems. Programming and Computer Software, Vol. 29, No. 4, pp. 228-23.
[20] Srimani, P. K., and Sanjay Koti, M. (2011): Application of Data Mining Techniques for Outlier Mining in Medical Databases. International Journal of Current Research, Vol. 33, No. 6, pp. 402-407.
[21] Wasan, K. S., Bhatnagar, V., and H. Kaur (2006): The Impact of Data Mining Techniques on Medical Diagnostics. Data Science Journal, Vol. 5, No. 19, pp. 119-126.