International Journal of Probability and Statistics
p-ISSN: 2168-4871 e-ISSN: 2168-4863
2016; 5(3): 82-88
doi:10.5923/j.ijps.20160503.03

Generalized Linear Mixed Models for Longitudinal Data with Missing Values: A Monte Carlo EM Approach
Mohamed Y. Sabry, Rasha B. El Kholy, Ahmed M. Gad
Statistics Department, Faculty of Economics and Political Science, Cairo University, Cairo, Egypt
Correspondence to: Ahmed M. Gad, Statistics Department, Faculty of Economics and Political Science, Cairo University, Cairo, Egypt.
Copyright © 2016 Scientific & Academic Publishing. All Rights Reserved.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/

Longitudinal data have vast applications in medicine, epidemiology, agriculture and education. Longitudinal data analysis is usually complicated by the inter-correlation between repeated measurements on the same subject. One way to incorporate this inter-correlation is to extend the generalized linear model (GLM) to the generalized linear mixed model (GLMM) by including a random-effects component. When the distribution of the random effects departs from the usual features of the Gaussian density, for example by following a Student's t-distribution, the analysis becomes more complex and subject heterogeneity becomes a critical feature. Thus, statistical techniques that relax the normality assumption for the random-effects distribution are of interest. This paper aims to find the maximum likelihood estimates of the parameters of the GLMM when the normality assumption for the random effects is relaxed. Estimation is carried out in the presence of missing data, assuming a selection model for longitudinal data with a dropout pattern. The proposed estimation method is applied to both simulated and real data sets.
Keywords: Breast Cancer, Longitudinal data, Missing data, Mixed models, Monte Carlo EM
Cite this paper: Mohamed Y. Sabry, Rasha B. El Kholy, Ahmed M. Gad, Generalized Linear Mixed Models for Longitudinal Data with Missing Values: A Monte Carlo EM Approach, International Journal of Probability and Statistics, Vol. 5 No. 3, 2016, pp. 82-88. doi: 10.5923/j.ijps.20160503.03.

Let $Y_{ij}$ represent the jth measurement on the ith subject, $i = 1, \ldots, N$, $j = 1, \ldots, n_i$. Their distribution belongs to the exponential family and it can be expressed in a generalized linear model (GLM) as follows:

$$f(y_{ij}; \theta_{ij}, \phi) = \exp\left\{ \frac{y_{ij}\,\theta_{ij} - \psi(\theta_{ij})}{\phi} + c(y_{ij}, \phi) \right\}, \qquad (1)$$

where $\psi(\cdot)$ and $c(\cdot)$ are functions defined according to the chosen distribution of the exponential family and $\theta_{ij}$ is called the natural parameter of the distribution. For simplification we assume a balanced design, that is $n_i = n$ for all $i$.

Let $Y_i = (Y_i^{\mathrm{obs}\,\prime}, Y_i^{\mathrm{mis}\,\prime})'$, where $Y_i$, $Y_i^{\mathrm{obs}}$ and $Y_i^{\mathrm{mis}}$ are the complete, observed and missing outcome vectors of subject $i$, respectively. Assuming that we have $k_i$ missing values for each subject $i$, then $Y_i^{\mathrm{obs}} = (Y_{i1}, \ldots, Y_{i,n-k_i})'$ and $Y_i^{\mathrm{mis}} = (Y_{i,n-k_i+1}, \ldots, Y_{in})'$.

The conditional distribution of $Y_{ij}$ given the random effects $b_i$ has a GLM form and follows the exponential family form as in Eq. (1) with a canonical identity link function, i.e., $E(Y_{ij} \mid b_i) = \mu_{ij} = \theta_{ij}$. Thus $\theta_{ij}$ can be written in the form $\theta_{ij} = x_{ij}'\beta + z_{ij}' b_i$, where $x_{ij}'$ is the jth row of the $n \times p$ design matrix $X_i$ for the vector of fixed-effect parameters $\beta$ of dimension $p \times 1$, and $z_{ij}'$ is the jth row in $Z_i$, the $n \times q$ design matrix of the ith subject's random effects $b_i$ of dimension $q \times 1$. The scale parameter is $\phi$.
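As a concrete illustration of Eq. (1), the normal density with mean $\theta$ and variance $\phi$ is a member of the exponential family with $\psi(\theta) = \theta^2/2$ and $c(y, \phi) = -y^2/(2\phi) - \tfrac{1}{2}\log(2\pi\phi)$. The following sketch (an illustration added here, not part of the paper) verifies this numerically:

```python
import math

def expfam_normal_pdf(y, theta, phi):
    """Exponential-family form of Eq. (1) for the normal case:
    psi(theta) = theta^2 / 2,  c(y, phi) = -y^2/(2*phi) - log(2*pi*phi)/2."""
    psi = theta ** 2 / 2.0
    c = -y ** 2 / (2.0 * phi) - 0.5 * math.log(2.0 * math.pi * phi)
    return math.exp((y * theta - psi) / phi + c)

def normal_pdf(y, mu, sigma2):
    """Ordinary N(mu, sigma2) density, for comparison."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# The two forms agree for any y, with theta = mu and phi = sigma^2.
for y in (-1.5, 0.0, 2.3):
    assert abs(expfam_normal_pdf(y, 0.7, 1.8) - normal_pdf(y, 0.7, 1.8)) < 1e-12
```

With the identity link, the natural parameter $\theta_{ij}$ is exactly the conditional mean, which is what makes the linear predictor $x_{ij}'\beta + z_{ij}'b_i$ enter the density directly.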
The GLMM can be written in the following form:

$$Y_{ij} = x_{ij}'\beta + z_{ij}' b_i + \epsilon_{ij}, \qquad (2)$$

$$b_i \sim k(b_i; \mu_b, \Sigma, \nu). \qquad (3)$$

For the normal distribution the variance function is constant and equals one. Hence we can write $\mathrm{Var}(Y_{ij} \mid b_i) = \phi = \sigma^2$ and $E(Y_{ij} \mid b_i) = x_{ij}'\beta + z_{ij}' b_i$ in Eq. (2). The function $k(\cdot)$ is the density function of the random effects $b_i$ with non-central mean $\mu_b$, variance-covariance matrix $\Sigma$ and $\nu$ degrees of freedom. Note that the responses at all occasions $j = 1, \ldots, n$ for any subject $i$ are conditionally independent given the common random effect $b_i$. Thus, the likelihood for $N$ subjects with complete data can be written as follows:

$$L(\Omega; Y) = \prod_{i=1}^{N} \int \left[ \prod_{j=1}^{n} f(y_{ij} \mid b_i) \right] k(b_i) \, db_i. \qquad (4)$$

The contribution of subject $i$ to the likelihood is proportional to the marginal density function $f(y_i)$. This can be obtained by integrating out the random effect, i.e.

$$f(y_i) = \int f(y_i \mid b_i) \, k(b_i) \, db_i. \qquad (5)$$
Let $R_i = (R_{i1}, \ldots, R_{in})'$ be the missing data mechanism indicator whose jth component $R_{ij}$ equals 1 when the response $Y_{ij}$ is missing, and equals zero otherwise. Under the selection model [19], the distribution of $R_i$ given the response $Y_i$ is indexed with a parameter vector $\psi$ and has a multinomial distribution with cell probabilities

$$P(R_i = r_i \mid y_i, \psi). \qquad (6)$$

The full likelihood based on the observed data $(Y, R)$ then becomes

$$L(\Omega; Y, R) = \prod_{i=1}^{N} \int f(y_i \mid b_i) \, f(r_i \mid y_i, \psi) \, k(b_i) \, db_i. \qquad (7)$$
The classical normality assumption for the random effects is not always realistic. The t-distribution offers a more viable alternative for real-world data. Assuming a central multivariate t-distribution $t_q(0, \Sigma, \nu)$ for the random effects $b_i$, the density function $k(\cdot)$ is of the form

$$k(b_i) = \frac{\Gamma\!\left(\frac{\nu + q}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right) (\nu \pi)^{q/2} \, |\Sigma|^{1/2}} \left[ 1 + \frac{b_i' \Sigma^{-1} b_i}{\nu} \right]^{-(\nu + q)/2}. \qquad (8)$$
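The density in Eq. (8) can be implemented directly. As a sanity check (an illustration added here, not part of the paper), for $q = 1$ and $\Sigma = s^2$ it must coincide with a univariate Student's t density scaled by $s$:

```python
import math
import numpy as np
from scipy import stats

def mvt_density(b, Sigma, nu):
    """Central multivariate t density of Eq. (8) at a q-vector b."""
    b = np.atleast_1d(np.asarray(b, dtype=float))
    q = b.size
    Sigma = np.atleast_2d(Sigma)
    quad = float(b @ np.linalg.solve(Sigma, b))      # b' Sigma^{-1} b
    const = (math.gamma((nu + q) / 2.0)
             / (math.gamma(nu / 2.0) * (nu * math.pi) ** (q / 2.0)
                * math.sqrt(np.linalg.det(Sigma))))
    return const * (1.0 + quad / nu) ** (-(nu + q) / 2.0)

# For q = 1 and Sigma = s^2, Eq. (8) reduces to t_nu(b / s) / s.
s, nu = 1.5, 4.0
for b in (-2.0, 0.0, 0.7):
    assert abs(mvt_density([b], [[s ** 2]], nu) - stats.t.pdf(b / s, nu) / s) < 1e-12
```

Note that $\Sigma$ here plays the role of a scale matrix; the heavier tails of the t density (governed by $\nu$) are exactly what accommodates subject heterogeneity beyond the Gaussian.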
The complete-data log-likelihood for the $N$ subjects is therefore

$$\ell(\Omega; Y, R, b) = \sum_{i=1}^{N} \left[ \log f(y_i \mid b_i) + \log f(r_i \mid y_i, \psi) + \log k(b_i) \right], \qquad (9)$$

where $\Omega = (\beta', \sigma^2, \Sigma, \nu, \psi')'$ is the set of all parameters. The fixed-effects parameters $\beta$ and $\sigma^2$, the random-effects parameters $\Sigma$ and $\nu$, and the missing data model's parameters $\psi$ are estimated. A monotone (dropout) pattern of missing data for the outcome variable is assumed, where $R_{ij} = 1$ implies $R_{ij'} = 1$ for all $j' > j$. The random effects of the ith subject, $b_i$, are unobserved, so they are also treated as missing data. Thus, in the E-step of the EM algorithm we need to integrate the complete-data log-likelihood in Eq. (9) over the missing data $Y_i^{\mathrm{mis}}$ and $b_i$. In other words, in the E-step the expectation of the complete-data log-likelihood, given the observed data and the current parameter estimates, is obtained. The expectation is with respect to the joint conditional distribution of the missing data $Y_i^{\mathrm{mis}}$ and $b_i$. Starting with initial values $\Omega^{(0)}$, the (t+1)th iteration of the EM algorithm proceeds as follows.

The E-step: obtain the expectations with respect to the conditional distribution of $Y_i^{\mathrm{mis}}$ given in items 1 and 2 below; also, obtain the conditional expectation with respect to the conditional distribution of $b_i$ given in item 3 below.

The M-step: find the values $\beta^{(t+1)}$ and $\sigma^{2(t+1)}$ that maximize the function in item 1 above, the values $\psi^{(t+1)}$ that maximize the function in item 2 above, and the values $\Sigma^{(t+1)}$ and $\nu^{(t+1)}$ that maximize the function in item 3 above. If convergence is achieved, the current values are the maximum likelihood estimates; otherwise, set $t \leftarrow t + 1$ and return to the E-step.

The expectation of the log-likelihood for the ith subject at iteration (t+1) is

$$Q_i(\Omega \mid \Omega^{(t)}) = \iint \ell_i(\Omega; y_i, r_i, b_i) \, f(y_i^{\mathrm{mis}}, b_i \mid y_i^{\mathrm{obs}}, r_i, \Omega^{(t)}) \, dy_i^{\mathrm{mis}} \, db_i, \qquad (10)$$

where $f(y_i^{\mathrm{mis}}, b_i \mid y_i^{\mathrm{obs}}, r_i, \Omega^{(t)})$ is the conditional distribution of the missing data given the observed data and the current parameter estimates. The integral in Eq. (10) is intractable. Therefore, the expectations are obtained via Monte Carlo simulation, where sampling from
the joint conditional distribution $f(y_i^{\mathrm{mis}}, b_i \mid y_i^{\mathrm{obs}}, r_i, \Omega^{(t)})$ is required.

The following two relations describe the complete conditional distributions:

$$f(y_i^{\mathrm{mis}} \mid y_i^{\mathrm{obs}}, b_i, r_i) \propto f(y_i \mid b_i) \, f(r_i \mid y_i, \psi), \qquad (11)$$

$$f(b_i \mid y_i, r_i) \propto f(y_i \mid b_i) \, k(b_i), \qquad (12)$$

because the missingness mechanism does not depend on the random effects. A single draw $y_{i,d_i}$ for the first dropout of the ith subject is generated from the predictive distribution $f(y_i^{\mathrm{mis}} \mid y_i^{\mathrm{obs}}, b_i, \Omega^{(t)})$ instead of the full conditional, where $\Omega^{(t)}$ are the current parameter estimates. Then an accept-reject sampling procedure is used to sample from this full conditional.
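To make the E- and M-steps concrete, the following self-contained sketch (my own toy construction, not the authors' implementation) runs a few MCEM iterations for a simplified random-intercept model $Y_{ij} = \beta + b_i + \epsilon_{ij}$ with a scaled-t random intercept, no dropout, and the t parameters treated as known; the $b_i$ are drawn by an independence Metropolis step whose acceptance probability is the likelihood ratio (cf. Eq. (16)), and the M-step uses the closed-form updates for $\beta$ and $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: Y_ij = beta + b_i + eps_ij with a scaled-t random intercept
N, n = 40, 6
beta_true, sigma_true, scale_b, df_b = 2.0, 1.0, 0.7, 4.0
b_true = scale_b * rng.standard_t(df_b, size=N)
Y = beta_true + b_true[:, None] + sigma_true * rng.standard_normal((N, n))

def loglik(y_i, beta, sigma2, b):
    """log f(y_i | b) up to a constant: sum of univariate normal log-densities."""
    r = y_i - beta - b
    return -0.5 * np.sum(r ** 2) / sigma2

beta, sigma2 = 0.0, 2.0        # crude starting values
M = 200                        # Monte Carlo draws per subject per E-step
for _ in range(25):
    # E-step: independence Metropolis with the t prior as proposal; since the
    # proposal equals the prior, the acceptance ratio is the likelihood ratio.
    Eb = np.empty(N)
    Eb2 = np.empty(N)
    for i in range(N):
        b, chain = 0.0, []
        for _ in range(M):
            cand = scale_b * rng.standard_t(df_b)
            if np.log(rng.uniform()) < (loglik(Y[i], beta, sigma2, cand)
                                        - loglik(Y[i], beta, sigma2, b)):
                b = cand
            chain.append(b)
        chain = np.array(chain[M // 4:])   # drop an initial burn-in portion
        Eb[i], Eb2[i] = chain.mean(), np.mean(chain ** 2)
    # M-step: closed-form updates of beta and sigma^2 for the identity link,
    # using E[b_i] and E[b_i^2] from the Monte Carlo draws
    beta = np.mean(Y - Eb[:, None])
    sigma2 = np.mean((Y - beta) ** 2 - 2 * (Y - beta) * Eb[:, None] + Eb2[:, None])
```

The estimate of $\beta$ should settle close to the true value 2.0; the full algorithm of the paper additionally draws the missing outcomes and updates $\psi$, $\Sigma$ and $\nu$.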
For more details, interested readers are referred to [20]. From properties of a multivariate normal,
the predictive distribution $f(y_i^{\mathrm{mis}} \mid y_i^{\mathrm{obs}}, b_i, \Omega^{(t)})$ is a normal distribution with mean $\mu_{\mathrm{mis} \mid \mathrm{obs}}$ and covariance matrix $V_{\mathrm{mis} \mid \mathrm{obs}}$, where

$$\mu_{\mathrm{mis} \mid \mathrm{obs}} = \mu_{\mathrm{mis}} + V_{21} V_{11}^{-1} \left( y_i^{\mathrm{obs}} - \mu_{\mathrm{obs}} \right), \qquad (13)$$

$$V_{\mathrm{mis} \mid \mathrm{obs}} = V_{22} - V_{21} V_{11}^{-1} V_{12}, \qquad (14)$$

and $(\mu_{\mathrm{obs}}', \mu_{\mathrm{mis}}')'$ and $(V_{11}, V_{12}; V_{21}, V_{22})$ are the corresponding partitions of the mean
and the covariance matrix $V$, respectively.

Sampling from the second complete conditional distribution, Eq. (12), is conducted using the Metropolis algorithm. A candidate value $b_i^{\mathrm{new}}$ is generated from a proposal distribution, which is chosen to be the density function of the random effects (the multivariate t). The candidate is accepted, as opposed to keeping the previous value $b_i^{(t)}$, with probability

$$\alpha = \min\left\{ 1, \; \frac{f(y_i \mid b_i^{\mathrm{new}}) \, k(b_i^{\mathrm{new}}) \, k(b_i^{(t)})}{f(y_i \mid b_i^{(t)}) \, k(b_i^{(t)}) \, k(b_i^{\mathrm{new}})} \right\}. \qquad (15)$$

Each factor $f(y_{ij} \mid b_i)$ is a univariate normal density and can be easily evaluated. Since the proposal density cancels with the random-effects density $k(\cdot)$, the acceptance probability simplifies to

$$\alpha = \min\left\{ 1, \; \frac{f(y_i \mid b_i^{\mathrm{new}})}{f(y_i \mid b_i^{(t)})} \right\}, \qquad (16)$$
where $f(y_i \mid b_i)$ factorizes over the occasions as

$$f(y_i \mid b_i) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{\left( y_{ij} - x_{ij}'\beta - z_{ij}' b_i \right)^2}{2\sigma^2} \right\}, \qquad (17)$$

so that the ratio in Eq. (16) reduces to

$$\frac{f(y_i \mid b_i^{\mathrm{new}})}{f(y_i \mid b_i^{(t)})} = \exp\left\{ -\frac{1}{2\sigma^2} \sum_{j=1}^{n} \left[ \left( y_{ij} - x_{ij}'\beta - z_{ij}' b_i^{\mathrm{new}} \right)^2 - \left( y_{ij} - x_{ij}'\beta - z_{ij}' b_i^{(t)} \right)^2 \right] \right\}. \qquad (18)$$

Here $x_{ij}'$ is the jth row of the design matrix $X_i$ of dimension $n \times p$, $z_{ij}'$ is the jth row in the matrix $Z_i$ of dimension $n \times q$, and $\Sigma$ is the variance-covariance matrix of the random effects component $b_i$, of dimension $q \times q$.
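Equations (13)-(14) are the standard conditional-normal formulas, and are easy to implement and check; the sketch below (with an illustrative AR(1) covariance and made-up values, not the paper's data) exploits the AR(1) Markov property as a test case:

```python
import numpy as np

def conditional_normal(mu, V, n_obs, y_obs):
    """Mean and covariance of y_mis | y_obs for y ~ N(mu, V),
    with the first n_obs components observed (Eqs. (13)-(14))."""
    mu = np.asarray(mu, float)
    V = np.asarray(V, float)
    V11 = V[:n_obs, :n_obs]          # obs-obs block
    V12 = V[:n_obs, n_obs:]          # obs-mis block
    V21 = V[n_obs:, :n_obs]          # mis-obs block
    V22 = V[n_obs:, n_obs:]          # mis-mis block
    A = V21 @ np.linalg.inv(V11)
    cond_mean = mu[n_obs:] + A @ (np.asarray(y_obs, float) - mu[:n_obs])
    cond_cov = V22 - A @ V12
    return cond_mean, cond_cov

# AR(1)-type covariance as in the simulation study, with a hypothetical rho
rho = 0.5
V = np.array([[1.0, rho, rho ** 2],
              [rho, 1.0, rho],
              [rho ** 2, rho, 1.0]])
m, C = conditional_normal([0.0, 0.0, 0.0], V, 2, [1.0, 2.0])
# Under AR(1), y3 | y1, y2 depends only on y2: mean = rho * y2, var = 1 - rho^2
assert np.allclose(m, [rho * 2.0])
assert np.allclose(C, [[1 - rho ** 2]])
```

In the algorithm this computation supplies the predictive draw of the dropped-out response given the observed part of the trajectory.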
In a simulation study, the responses $Y_{ij}$, $i = 1, 2, \ldots, 30$, $j = 1, 2, \ldots, 7$, were generated as follows. The random effects $b_i$, $i = 1, 2, \ldots, 30$, were generated from the multivariate t-distribution with dimension $q = 4$ and known degrees of freedom $\nu$. The response values were generated, conditionally on the random effects, with a variance-covariance matrix $V$ which is assumed to have a first-order autoregressive correlation structure, AR(1). The responses of the ith subject are modeled using the GLMM as

$$Y_{ij} = \beta_0 + \beta_1 x_{ij1} + \beta_2 x_{ij2} + \beta_3 x_{ij3} + z_{ij}' b_i + \epsilon_{ij}, \qquad (19)$$

where $b_i$ is the subject's random effect and $z_{ij}$ is a vector of ones. The noise component $\epsilon_i$ follows a multivariate normal $N(0, V)$. The $x_{ijk}$, $k = 1, 2, 3$; $i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, m$, are covariates (design matrix) associated with each subject, and their values are fixed in advance.
The missing data process is modeled as a logistic model, where the missing data mechanism depends on the current and previous outcomes:

$$\mathrm{logit}\, P(R_{ij} = 1 \mid y_{ij}, y_{i,j-1}, \psi) = \psi_0 + \psi_1 \, y_{i,j-1} + \psi_2 \, y_{ij}, \qquad (20)$$

for all $i$. Also, the indicators $R_{ij}$ are assumed to be independent for all $(i, j)$ given $y_{ij}$ and $y_{i,j-1}$. From Eq. (6), the missing data model for all observations can be written as

$$f(r_i \mid y_i, \psi) = \prod_{j=2}^{d_i} P(R_{ij} = 1 \mid y_{ij}, y_{i,j-1}, \psi)^{r_{ij}} \left[ 1 - P(R_{ij} = 1 \mid y_{ij}, y_{i,j-1}, \psi) \right]^{1 - r_{ij}}, \qquad (21)$$

where $d_i$ is the occasion of the first dropout for subject $i$. The true parameter values used to generate the data are reported, alongside their estimates, in Table 1.
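A dropout mechanism of the form in Eqs. (20)-(21) can be simulated directly. The sketch below (with hypothetical $\psi$ values, not the ones used in the paper) generates monotone dropout indicators from complete responses:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_dropout(Y, psi0, psi1, psi2):
    """Monotone dropout: at each occasion j >= 2, the probability of dropping
    out depends on the previous and current outcomes via a logistic model."""
    N, n = Y.shape
    R = np.zeros((N, n), dtype=int)          # 1 = missing
    for i in range(N):
        for j in range(1, n):
            eta = psi0 + psi1 * Y[i, j - 1] + psi2 * Y[i, j]
            if rng.uniform() < 1.0 / (1.0 + np.exp(-eta)):
                R[i, j:] = 1                 # dropout: all later occasions missing
                break
    return R

Y = rng.normal(size=(30, 7))
R = simulate_dropout(Y, psi0=-2.0, psi1=0.5, psi2=0.5)
# Monotone pattern: once missing, always missing
assert all(np.all(np.diff(R[i]) >= 0) for i in range(30))
```

Because the dropout probability involves the current (possibly unobserved) outcome $y_{ij}$, the mechanism is non-ignorable by construction, which is what the simulation is designed to test.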
Table 1 shows the maximum likelihood estimates obtained using the MCEM algorithm and the corresponding standard errors. For the missing data (missing outcomes and random effects), 3000 samples were generated, with a burn-in period of 500 iterations. The percentage of complete subjects is around 50%. As can be seen from Table 1, all parameter estimates are highly significant. The relative bias, defined as the ratio of the bias to the actual parameter value, does not exceed 30% for any parameter except one, whose bias percentage is 43.14%. The small standard errors indicate precise estimates, and the true values lie within the 95% confidence intervals. All parameters of the missing data mechanism are significant; in particular, the coefficients of the previous and current outcomes are highly significant, which supports that the missing data mechanism is non-ignorable.
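The relative bias reported above is simply the bias divided by the true parameter value; a minimal sketch with hypothetical numbers:

```python
def relative_bias(estimate, true_value):
    """Relative bias = (estimate - true) / true, reported as a percentage."""
    return 100.0 * (estimate - true_value) / true_value

# e.g. an estimate of 1.25 for a true value of 1.0 has 25% relative bias
assert relative_bias(1.25, 1.0) == 25.0
```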
The proposed method is also applied to a real data set from a breast cancer clinical trial, where the response variable is a quality-of-life score (PACIS) measured repeatedly over time and the covariates are the doses of the treatments received. The MCEM algorithm is implemented using 3000 samples with a burn-in period of 500 iterations.

Table 3 reports the maximum likelihood estimates of all parameters. The estimates of the regression coefficients were around 0.99, 1.21, -1.90, 0.26 and -0.24, respectively. The results in Table 3 are the maximum likelihood estimates obtained using different starting points for all parameters. They show that the algorithm converges properly, as the parameter estimates are similar for the different starting values. Also, all estimates are highly significant. The highly significant coefficients of the treatments indicate a strong relationship between the response variable (PACIS) and the treatments. Given that any patient receives only one treatment during the trial, the higher the dose of treatment A or D, the smaller the PACIS value, i.e., the better the quality of life the patient attains. On the other hand, the higher the dose of treatment B or C, the higher the PACIS value, i.e., the more hardship the patient endures in coping with her life. Also, the significance of the coefficients of the previous and current outcomes in the dropout model supports that the missing data mechanism is non-ignorable.