International Journal of Statistics and Applications

p-ISSN: 2168-5193    e-ISSN: 2168-5215

2023;  13(1): 20-26

doi:10.5923/j.statistics.20231301.03

Received: Nov. 7, 2022; Accepted: Jan. 21, 2023; Published: Jul. 24, 2023

 

Application of EM Algorithm in Classification Problem & Parameter Estimation of Gaussian Mixture Model

Felix N. Nwobi1, Stephen I. Kalu1, 2

1Department of Statistics, Imo State University, Owerri, Nigeria

2Department of Statistics, Abia State Polytechnic, Aba, Nigeria

Correspondence to: Stephen I. Kalu, Department of Statistics, Imo State University, Owerri, Nigeria.

Email:

Copyright © 2023 The Author(s). Published by Scientific & Academic Publishing.

This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/

Abstract

This paper studies the problem of misclassification in a k-group setting, in which the task is to identify the component from which each element is drawn. The problem is modeled as a Gaussian mixture model, and the expectation-maximization (EM) algorithm is used to estimate the parameters and to identify the component to which each element belongs. Two data sets were used: the weights of 1000 students and the weights of 200 babies at birth. Results show a 70% correct classification rate (30% misclassification) for data set 1 and a 74.5% correct classification rate (25.5% misclassification) for data set 2.

Keywords: Mixture Models, Estimation in mixture models, Gaussian mixture models, EM Algorithm, Classification in Gaussians

Cite this paper: Felix N. Nwobi, Stephen I. Kalu, Application of EM Algorithm in Classification Problem & Parameter Estimation of Gaussian Mixture Model, International Journal of Statistics and Applications, Vol. 13 No. 1, 2023, pp. 20-26. doi: 10.5923/j.statistics.20231301.03.

1. Introduction

According to Lindsay (1995), mixture models (MM) are probabilistic models used to represent the presence of subpopulations within an overall population. Mixture models provide a much wider range of modeling possibilities and capabilities than a single-component model. Further details on MM can be found in Bohning and Seidel (2003) and Bohning et al. (2007).
For estimation involving mixture models, various analytical methods have been developed for estimating the parameters of finite mixture models. Four such methods are the method of moments, the minimum distance method, the Bayesian method and the maximum likelihood (ML) method.
Pearson’s (1894) method of moments is one of the earliest methods for estimating the parameters of finite mixture models. This method held sway until the advent of modern computers made it practical to maximize the log-likelihood function numerically. Some developments in method-of-moments estimation can be found in Lindsay & Basak (1993), Furman & Lindsay (1994a, b), Lindsay (1995), Withers (1996) and Craigmile & Titherington (1998).
The minimum distance method, introduced by Wolfowitz (1957), is a general method for estimating the parameters of a finite mixture; it has been discussed by Parr and Schucany (1982) and Titherington et al. (1985).
Another method of estimation is the Bayesian method. Several reasons make researchers inclined to use Bayesian estimation when dealing with a finite MM (Fruhwirth-Schnatter, 2006). First, the introduction of a suitable prior distribution eliminates spurious modes in the course of maximizing the log-likelihood function; second, when the posterior distribution of the unknown parameters is available, the method provides reliable inference without recourse to the asymptotic normality of the estimators. These advantages are particularly valuable when the sample size is small, since the asymptotic theory of the MLE applies only when the sample size is large.
The fourth method for estimating the parameters of a finite MM is ML estimation. Likelihood-based inference has enjoyed tremendous and rapid development and has contributed immensely towards resolving estimation problems involving finite MMs. Since explicit expressions for the MLEs are typically unavailable, a numerical procedure, the EM algorithm, is used to maximize the log-likelihood function. The expectation-maximization (EM) algorithm is one of the methods most frequently used, following Dempster et al. (1977). More details about this method can be found in McLachlan and Krishna (1997), McLachlan and Peel (2000), Oleszak (2020), Kuroda (2021) and Smyth (2021).
The aim of this paper is to implement a classification procedure for a two-group Gaussian mixture model, identifying the component to which each element most probably belongs, and to estimate the parameters of the groups and their mixing weights. The EM algorithm procedures were implemented in MATLAB.

2. Methods

Let $X$ be a random variable from a normal population, and let $x_1, x_2, \ldots, x_n$ be a random sample on $X$ made up of two groups of sizes $n_1$ and $n_2$, such that
$$ n = n_1 + n_2. $$
According to Wirjanto (2009), we assume that within group $j$ the observations follow $N(\mu_j, \sigma_j^2)$, $j = 1, 2$. The Gaussian density is defined as
$$ f_j(x) = \frac{1}{\sigma_j \sqrt{2\pi}} \exp\left\{ -\frac{(x - \mu_j)^2}{2\sigma_j^2} \right\}, \qquad (1) $$
so that the probability density function of the mixture of Gaussians is given as
$$ f(x) = \pi_1 f_1(x) + \pi_2 f_2(x), \qquad (2) $$
where $\pi_1$ and $\pi_2$ are the mixing weights, with $0 \le \pi_j \le 1$ and $\pi_1 + \pi_2 = 1$. In the two-component Gaussian mixture model, $f_j(x)$ is the PDF of a normal distribution with finite mean $\mu_j$ and finite variance $\sigma_j^2$. The number of estimable parameters of a Gaussian mixture is given by the formula $3k - 1$, where $k$ is the number of components, so that if $k = 2$, as in our case, there are $3(2) - 1 = 5$ parameters.
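As an illustration of equations (1)-(2), the short MATLAB sketch below evaluates a two-component Gaussian mixture density on a grid of values; the parameter values are placeholders for illustration only, not estimates from our data.

% Illustrative evaluation of a two-component Gaussian mixture density, eqs. (1)-(2)
pi1 = 0.5;  pi2 = 1 - pi1;          % mixing weights, pi1 + pi2 = 1 (placeholders)
mu1 = 55;   mu2 = 70;               % component means (placeholders)
s1  = 6;    s2  = 8;                % component standard deviations (placeholders)
x   = linspace(30, 100, 200);       % grid of weight values
phi = @(x, mu, s) exp(-0.5*((x - mu)./s).^2) ./ (s*sqrt(2*pi));  % Gaussian pdf, eq. (1)
f   = pi1*phi(x, mu1, s1) + pi2*phi(x, mu2, s2);                 % mixture pdf, eq. (2)
plot(x, f);  xlabel('weight');  ylabel('mixture density');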

2.1. Implementation Procedure of EM Algorithm

Dempster et al. (1977) and Wu (1985) proposed the use of initial guesses for the parameters. Soderlind (2010), in line with Dempster et al. (1977) and Wu (1985), suggested simple starting values: both $\sigma_1^{2(0)}$ and $\sigma_2^{2(0)}$ can be set equal to the overall sample variance of the data. Here, we take an initial guess for $\mu_1^{(0)}$, and similarly for $\mu_2^{(0)}$, and we initialize the mixing proportion at a chosen starting value $\pi^{(0)}$.
(a) E-step:
Following Huber (1964), for each of the $n$ observed data points we can evaluate the corresponding posterior probabilities, called responsibilities, as follows:
$$ \gamma_j(x_i) = \frac{\pi_j \, f_j(x_i; \mu_j, \sigma_j^2)}{\sum_{k=1}^{2} \pi_k \, f_k(x_i; \mu_k, \sigma_k^2)}, \qquad (3) $$
for $i = 1, \ldots, n$ and $j = 1, 2$.
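A minimal MATLAB sketch of the E-step in equation (3) is given below; it assumes the data are held in a column vector x and that pi1, mu1, s1, mu2, s2 hold the current parameter values (the variable names are ours, not those of Appendix D).

% E-step: responsibilities of the two components for every observation, eq. (3)
phi    = @(x, mu, s) exp(-0.5*((x - mu)./s).^2) ./ (s*sqrt(2*pi));   % Gaussian pdf
num1   = pi1 * phi(x, mu1, s1);          % pi_1 * f_1(x_i)
num2   = (1 - pi1) * phi(x, mu2, s2);    % pi_2 * f_2(x_i)
gamma1 = num1 ./ (num1 + num2);          % responsibility of component 1
gamma2 = 1 - gamma1;                     % responsibility of component 2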
(b) M-step:
Now we compute the weighted means and variances using the responsibilities obtained in (3) as
$$ n_j = \sum_{i=1}^{n} \gamma_j(x_i), \qquad (4) $$
$$ \mu_j = \frac{1}{n_j} \sum_{i=1}^{n} \gamma_j(x_i) \, x_i, \qquad (5) $$
$$ \sigma_j^2 = \frac{1}{n_j} \sum_{i=1}^{n} \gamma_j(x_i) \, (x_i - \mu_j)^2. \qquad (6) $$
The mixing probability is computed as
$$ \pi_j = \frac{n_j}{n}. \qquad (7) $$
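The following MATLAB fragment sketches these updates, continuing from the E-step sketch above (again with our own variable names; gamma1 and gamma2 are the responsibilities from equation (3)).

% M-step: responsibility-weighted updates of the component parameters, eqs. (4)-(7)
n   = numel(x);
n1  = sum(gamma1);            n2  = sum(gamma2);            % eq. (4)
mu1 = sum(gamma1 .* x) / n1;  mu2 = sum(gamma2 .* x) / n2;  % eq. (5)
v1  = sum(gamma1 .* (x - mu1).^2) / n1;                     % eq. (6)
v2  = sum(gamma2 .* (x - mu2).^2) / n2;
pi1 = n1 / n;                 pi2 = n2 / n;                 % eq. (7)
s1  = sqrt(v1);               s2  = sqrt(v2);               % standard deviations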
Naturally, iteration continues until convergence is achieved. Convergence is generally assessed by evaluating the log-likelihood after each iteration and stopping when the log-likelihood no longer changes significantly from one iterative step to the next. Following Kiefer & Wolfowitz (1956), under the independent and identically distributed (iid) assumption, the log-likelihood is defined as follows:
$$ \ell = \sum_{i=1}^{n} \log \left[ \pi_1 f_1(x_i; \mu_1, \sigma_1^2) + \pi_2 f_2(x_i; \mu_2, \sigma_2^2) \right]. \qquad (8) $$
The updated overall mean and variance of the mixture are obtained as follows when $X$ is a random variable with a two-component Gaussian mixture distribution. Writing the mixture mean as $\mu$ and the mixture variance as $\sigma^2$, they can be estimated from the component parameters respectively as
$$ \mu = \pi_1 \mu_1 + \pi_2 \mu_2, \qquad (9) $$
$$ \sigma^2 = \pi_1 (\sigma_1^2 + \mu_1^2) + \pi_2 (\sigma_2^2 + \mu_2^2) - \mu^2, \qquad (10) $$
where $\mu$ and $\sigma^2$ are the updated mean and variance of the Gaussian mixture after each iterative step of the EM algorithm.
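In MATLAB these two updates are a couple of lines; the sketch below assumes pi1, mu1, mu2, v1 and v2 from the M-step fragment above (our variable names).

% Overall mixture mean and variance implied by the current component estimates
mu_mix = pi1*mu1 + (1 - pi1)*mu2;                               % eq. (9)
v_mix  = pi1*(v1 + mu1^2) + (1 - pi1)*(v2 + mu2^2) - mu_mix^2;  % eq. (10)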

2.2. Classification with Gaussians

We used Bayes’ theorem to relate the probability density function of the data given the class, $f(x \mid j)$, to the posterior probability of the class given the data. Our univariate data consist of observations $x_1, \ldots, x_n$ whose PDFs, given class $j$, are Gaussian with means $\mu_j$ and variances $\sigma_j^2$. Using Bayes’ theorem, we specify the posterior probability of class $j$ as
$$ P(j \mid x_i) = \frac{\pi_j \, f(x_i \mid j)}{\sum_{k=1}^{2} \pi_k \, f(x_i \mid k)}, \qquad (11) $$
where $f(x_i \mid j)$ is the likelihood of class $j$ given observation $x_i$. The probability of misclassification is a measure of the likelihood that individuals or objects are classified wrongly. There are two types of misclassification error (a MATLAB sketch of the resulting assignment rule follows this list):
1. An object is classified into group 1 when it is actually from group 2.
2. An object is classified into group 2 when it is actually from group 1.
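A minimal sketch of the classification rule implied by equation (11): each observation is assigned to the component with the larger posterior probability. It reuses the phi handle and parameter names from the earlier sketches; this is our illustration, not the code of Appendix G.

% Classify each observation to the component with the larger posterior, eq. (11)
post1 = pi1 * phi(x, mu1, s1);        % numerator for component 1
post2 = (1 - pi1) * phi(x, mu2, s2);  % numerator for component 2 (common denominator cancels)
group = 1 + (post2 > post1);          % 1 if assigned to component 1, 2 otherwise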
Following Richard et al. (2007), we tabulate the classification results against the true group membership by creating a statistician’s confusion matrix as in Table 1. The apparent error rate (APER) is a measure of classification performance that does not depend on the form of the parent population: it is the fraction of observations in the training sample that are misclassified by the sample classification function, and it can be calculated for any classification procedure. The confusion matrix is of the form
Table 1. Statistician’s confusion matrix

                      Classified as group 1    Classified as group 2
   Actual group 1     $n_{1C}$                 $n_{1M}$
   Actual group 2     $n_{2M}$                 $n_{2C}$

where
$n_{1C}$ = number of group-1 objects correctly classified into group 1;
$n_{1M}$ = number of group-1 objects misclassified into group 2;
$n_{2C}$ = number of group-2 objects correctly classified into group 2;
$n_{2M}$ = number of group-2 objects misclassified into group 1.
The apparent error rate is given as
$$ \mathrm{APER} = \frac{n_{1M} + n_{2M}}{n_{1C} + n_{1M} + n_{2C} + n_{2M}}. \qquad (12) $$
See Appendix G for the MATLAB source code for obtaining the relevant quantities in Table 1.
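For illustration only (this is not the Appendix G code), the quantities of Table 1 and the APER of equation (12) can be tallied along the following lines, assuming group holds the predicted component from the classification sketch above and truth holds the known group labels.

% Confusion-matrix counts and apparent error rate, eq. (12)
n1C  = sum(truth == 1 & group == 1);   % group-1 objects correctly classified
n1M  = sum(truth == 1 & group == 2);   % group-1 objects misclassified
n2C  = sum(truth == 2 & group == 2);   % group-2 objects correctly classified
n2M  = sum(truth == 2 & group == 1);   % group-2 objects misclassified
APER = (n1M + n2M) / (n1C + n1M + n2C + n2M);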

3. Results and Discussions

Two data sets were used. Data set 1 consists of the weights of 1000 male and female first-year students of Abia State Polytechnic, Aba. Data set 2 consists of the weights of 120 male and 80 female babies at birth from the Federal Medical Centre (FMC), Owerri, Imo State. See the data in Appendices A, B and C. In this section, MATLAB source code is used to implement the EM algorithm procedures and to carry out the classification analysis.

3.1. Result of the Expectation Step

Using data set 1, we took initial values for the parameters as described in Section 2.1, and we also take the overall mean of the data; here the data form an $n \times 1$ column vector of the combined weights of male and female students. We then generate the responsibilities for each of the data points; this step allocates the distinct mixture observations to the previously defined groups (see equations (1)-(7)). Using data set 2, the algorithm was initialized in the same manner, taking its overall initial mean from that data set; in this case the data form a column vector of the combined baby weights at birth. Data set 1 consists of $n = 1000$ observations, while data set 2 has $n = 200$, since $n_1 = 120$ and $n_2 = 80$.

3.2. Results of the Maximization Step

To obtain the log-likelihood and the mixing proportions for data sets 1 and 2, we applied the MATLAB code in Appendix F. This step maximizes the expected log-likelihood from the E-step and outputs the optimal mixing weights, component means and component variances for data set 1, and the corresponding final parameter updates for data set 2; the resulting values are reported in Tables 2 and 3. See equations (9)-(10).

3.3. Results of the EM Algorithm

Data set 1
After our implementation of the EM algorithm using data set 1, the iterated maximum likelihood estimates for the parameters are contained in Table 2.
Table 2. Results of estimation of model parameters using data set 1
     
Convergence required 17 iterations to meet the optimality criterion, with a final log-likelihood of -3492.46. See the source code in Appendix F for the implementation.
Data set 2
Also, having implemented the EM algorithm using data set 2, the iterated maximum likelihood estimates for the parameters are contained in Table 3.
Table 3. Results of estimation of model parameters using data set 2
     
Convergence required 87 iterations to meet the optimality criterion, with a final log-likelihood of -170.968. See the source code in Appendix F.

3.4. Description of the Iteration Procedure Using Data Set 1&2

To implement the EM algorithm iterative procedure on our data sets, we applied the following sequence of operations.
INITIALIZATION
Data set 1 initial values:
Data set 2 initial values:
Expectation step:
1. Input: (Slot in the initial values for either data set 1 or 2 into the source codes of Appendix D)
Output: A set of responsibility weights, which are then used in maximizing the log-likelihood function of equation (8), is generated.
Maximization step:
2. Update the mixture model parameters with the computed output weights from E-step using the MATLAB source codes in Appendix E.
3. Stopping criteria: If the stopping rule is satisfied (convergence of the parameters and the log-likelihood), we stop; otherwise we set $t = t + 1$, go back to step 1 and input the updated parameter values for the next iteration, where $t$ is the iteration counter and the procedure terminates at the iteration where convergence or the optimality conditions are achieved. Stopping rule: when all the parameter changes are less than or equal to 0.0001, or the parameter values appear not to be changing significantly from one iterative step to another, we assume that the EM solution is optimal at that point and cannot be improved upon further. The values of the complete iterations are contained in Table 2 and Table 3 of this paper. A compact MATLAB sketch of the whole loop is given below.
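The sketch below strings the steps together (initialization, E-step, M-step, stopping rule). It is our compact illustration under simple assumptions, not the Appendix D-F code: data is assumed to be a vector holding either of the two data sets, the initial means are crude guesses one standard deviation either side of the overall mean, the initial mixing weight of 0.5 is illustrative, and the tolerance follows the 0.0001 stopping rule above.

% Compact EM loop for a two-component Gaussian mixture (illustrative sketch)
x    = data(:);                                     % combined weights (data set 1 or 2)
mu1  = mean(x) - std(x);  mu2 = mean(x) + std(x);   % crude initial means
v1   = var(x);            v2  = var(x);             % overall sample variance as start value
pi1  = 0.5;               logL_old = -Inf;          % illustrative initial mixing weight
phi  = @(x, mu, s) exp(-0.5*((x - mu)./s).^2) ./ (s*sqrt(2*pi));

for t = 1:500
    % E-step: responsibilities, eq. (3)
    num1 = pi1*phi(x, mu1, sqrt(v1));   num2 = (1 - pi1)*phi(x, mu2, sqrt(v2));
    g1   = num1 ./ (num1 + num2);       g2   = 1 - g1;
    % M-step: weighted parameter updates, eqs. (4)-(7)
    mu1 = sum(g1.*x)/sum(g1);           mu2 = sum(g2.*x)/sum(g2);
    v1  = sum(g1.*(x - mu1).^2)/sum(g1);
    v2  = sum(g2.*(x - mu2).^2)/sum(g2);
    pi1 = mean(g1);
    % Stopping rule: change in log-likelihood, eq. (8), at most 0.0001
    logL = sum(log(pi1*phi(x, mu1, sqrt(v1)) + (1 - pi1)*phi(x, mu2, sqrt(v2))));
    if abs(logL - logL_old) <= 1e-4, break; end
    logL_old = logL;
end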

3.5. Result of the Classification Using Data Set 1

After implementing the classification procedure on the final posterior probabilities using equations (11) and (12), we obtained the following confusion matrix. See Appendix G for the MATLAB source code used to derive the discriminant values in Table 4 and Table 5.
Table 4. Confusion matrix for the classification of data set 1 (student weights)
     
From the confusion matrix obtained in Table 4, we can compute the apparent error rate, using equation (12), to determine the probability of misclassification for the student weights. The resulting APER is approximately 0.30, i.e. about 30% of the observations are misclassified and 70% are correctly classified.

3.6. Result of the Classification Using Data Set 2

Table 5. Confusion matrix for the classification of data set 2 (baby weights at birth)
     
Likewise, from Table 5 we compute the APER to determine the probability of misclassification for the baby weights at birth; the resulting APER is approximately 0.255, i.e. about 25.5% misclassification and a 74.5% correct classification rate.

4. Conclusions

In this paper, we explained how to estimate the parameters of a two-component Gaussian mixture model and how to classify observations using the EM algorithm. We also demonstrated the estimation of these model parameters using two sample data sets, namely data set 1 (weights of students) and data set 2 (weights of babies at birth).
Table 2 displays the maximized estimates of the relevant parameters of the Gaussian mixture model for data set 1, while Table 3 shows those for data set 2. The final rows of Tables 2 and 3 show that convergence was achieved, since the parameter estimates stopped changing significantly at those points (iterations 17 and 87 respectively). Table 4 presents the classification results for data set 1, which imply that about 30% of data set 1 was misclassified, giving a 70% correct classification rate. Table 5, on the other hand, shows that about 25.5% of data set 2 was misclassified, giving a classification success rate of 74.5%. The overall classification efficiency of the algorithm on the sample data can therefore be considered high, since 70% of data set 1 and 74.5% of data set 2 were correctly classified. These results suggest that the EM algorithm may be useful as an early-warning statistical tool for parameter estimation and for predicting and classifying mixtures of Gaussians.

Appendix D

The E-step: See William (2008) for Appendix D, E and F.
% The routine of William (2008) outputs the normalized posterior probabilities for observations 1 to 10; in our algorithm we replaced 10 with 1000 or 200 as the case requires. Note: the data vector is the entire set of student weights or baby weights at birth, and the overall mean is that of data set 1 or data set 2, as the case demands.

Appendix E

The M-step source codes

Appendix F

MATLAB source code for the log-likelihood and the mixing proportions.

Appendix G

MATLAB source code for the classification/discriminant analysis:
For Data Set 1
For Data Set 2

References

[1]  Bohning, D. and Seidel, W. (2003). Editorial: Recent developments in mixture models, computational statistics & Data Analysis 41: 349-357.
[2]  Bohning, D., Seidel W., Alfo, M., Garel, B., Patilea, V., and Walter, G. (2007). Editorial: Advances in mixture models, Computational Statistics and Data Analysis 51: 5205-5210.
[3]  Craigmile, P.F. & Titherington, D.M (1998). Parameter Estimation for finite mixtures of uniform distributions, Communication in Statistics-Theory and methods 26: 1981-1995.
[4]  Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39(1): 1-38 (https://www.jstor.org/stable/2984875).
[5]  Fruhwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. New York/Berlin/Heidelberg: Springer.
[6]  Furman, W.D. and Lindsay, B.G. (1994a). Testing for the number of components in a mixture of normal distributions using moment estimators. Computational Statistics and Data Analysis 17:473-492.
[7]  Furman, W.D. and Lindsay, B.G. (1994b). Measuring the effectiveness of moment estimators as starting values in maximizing mixture likelihoods. Computational Statistics and Data Analysis 17. 493-507.
[8]  Huber, P.J, (1964). Robust Estimation of a location parameter. Annals of Mathematical Statistics, 35, 73-101.
[9]  Kiefer, J. and Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Annals of Mathematical Statistics 27: 887-906.
[10]  Kuroda, M. (2021). Fast computation of the EM Algorithm for mixture Models. Computational Statistics and Application. DOI:10.5772/intechOpen.101249.
[11]  Lindsay, B.G. (1995). Mixture Models: Theory, Geometry and Applications. Institute of Mathematical Statistics, Hayward, California.
[12]  Lindsay, B.G. and Basak, P. (1993). Multivariate Normal Mixtures: a fast consistent method of moments. Journal of the American Statistical Association 86: 96-107.
[13]  McLachlan, G.J. and Krishna, T. (1997). The EM Algorithm and Extensions. N.Y: Wiley and sons Inc.
[14]  McLachlan, G.J. and Peel, D. (2000). Finite mixture models, inference and application to clustering. Marcel dekker, New York. (a)
[15]  MacLachlan, G.J and Peel, D (2000). Construction of Gaussian mixture distribution using MATLAB – Infinite mixture models. Hoboken, NJ: John Wiley & Sons, Inc. (b)
[16]  Oleszak, P. (2020). Estimating the economy with finite mixture models and the EM Algorithm. https://towardsdatascience.com/estimating-the-state-of-the-economy-with-finite-mixture-models-and-the-em-algorithm-d975d280dbc6.
[17]  Pearson, K. (1894). Contributions to the mathematical theory of evolution. Phil. Trans. Roy. Soc. London A 186: 71-110.
[18]  Richard, A, J. and Dean, W.W. (2007). Applied Multivariate Statistical Analysis, 16th Edition, Pearson education Inc., New Jersey, USA.
[19]  Smyth, P. (2021). Mixture Models and the EM Algorithm. https://www.ics.uci.edu/~smyth/couses/cs274/notes/mixture_models_EM.pdf.
[20]  Soderlind, P. (2010): Lecture Notes On Empirical Finance (PhD). Return Distributions. University of St. Gallen.
[21]  Titherington, D.M., Smith, A.F.M. and Makov, U.E. (1985). Statistical Analysis of Finite Mixture Distributions. John Wiley and Sons, New York.
[22]  Parr, W.C. and Schucany, W.R. (1982). Minimum distance estimation and components of goodness-of-fit statistics. Journal of the Royal Statistical Society, Series B (Methodological) 44(2): 178-189.
[23]  William H.P. (2008). Computational statistics with application to Bio-informatics – unit II: Gaussian mixture models and EM methods using MATLAB.
[24]  Wirjanto, T.S., Xu Dinghai (2009). The Applications of Mixtures of Normal Distributions in Empirical Finance: A Selected Survey. Chinese Economic Society (CES).
[25]  Withers, C.S. (1996). Moment Estimates for mixtures of several distributions and its different means and scales. Communication in Statistics, Theory and methods 25: 1799-1824.
[26]  Wolfowitz, J. (1957).The Minimum Distance Method. The annals of Mathematical Statistics 28(1): 75-88.
[27]  Wu, C.F. (1985). On the convergence properties of the EM algorithm. Annals of Statistics 11: 95-103.