International Journal of Probability and Statistics
2012; 1(3): 41-47
doi: 10.5923/j.ijps.20120103.03
Ahmed M. Gad, Rasha B. El Kholy
Department of Statistics, Faculty of Economics and Political Science, Cairo University, Cairo, Egypt
Correspondence to: Ahmed M. Gad, Department of Statistics, Faculty of Economics and Political Science, Cairo University, Cairo, Egypt.
Copyright © 2012 Scientific & Academic Publishing. All Rights Reserved.
The study of longitudinal data plays a significant role in medicine, epidemiology and the social sciences. Typically, the interest is in the dependence of an outcome variable on the covariates. The Generalized Linear Models (GLMs) were proposed to unify the regression approach for a wide variety of discrete and continuous longitudinal data. The responses (outcomes) in longitudinal data are usually correlated, hence we need an extension of the GLMs that accounts for such correlation. This can be done by the inclusion of random effects in the linear predictor; that is, the Generalized Linear Mixed Models (GLMMs), also called random effects models. The maximum likelihood estimates (MLE) are obtained for the regression parameters of a logit model when the traditional assumption of normal random effects is relaxed. In this case a more convenient distribution, such as the lognormal distribution, is used. However, adding non-normal random effects to the GLMM considerably complicates the likelihood estimation, so direct numerical evaluation techniques (such as Newton-Raphson) become analytically and computationally tedious. To overcome such problems, we propose and develop a Monte Carlo EM (MCEM) algorithm to obtain the maximum likelihood estimates. The proposed method is illustrated using simulated data.
Keywords: Generalized Linear Mixed Models, Logistic Regression, Longitudinal Data, Monte Carlo EM Algorithm, Random Effects Model
Let $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{im_i})'$ be an $m_i \times 1$ vector representing the observed sequence of the outcome variable $Y_{it}$, recorded at times $t = 1, 2, \ldots, m_i$, for the $i$th subject, $i = 1, 2, \ldots, n$. Also, assume that $x_{it} = (x_{it1}, x_{it2}, \ldots, x_{itp})$ is a $1 \times p$ vector of $p$ covariates observed at time $t$. Thus, $X_i = (x_{i1}', x_{i2}', \ldots, x_{im_i}')'$ is an $m_i \times p$ matrix of covariates corresponding to the $i$th subject. For simplicity we can assume that $m_i = m$. The primary focus is on the dependence of the outcome on the covariates [3, 5]. In other words, we are usually interested in inference about the regression coefficients in the usual linear model of the form

$$Y_i = X_i \beta + \epsilon_i,$$

where $\beta$ is a $p \times 1$ vector of regression coefficients. It is usually assumed that the errors, $\epsilon_{it}$, are independent and identically normally distributed. Therefore, these models cannot be used when the response variables have distributions other than the normal, or when they are qualitative rather than quantitative; binary longitudinal data are an example. To solve this problem, Reference [18] introduced the generalized linear models (GLMs) as a unified framework to model all types of longitudinal data [13, 15, 24]. These models assume that the distribution of $Y_{it}$ ($i = 1, \ldots, n$, $t = 1, 2, \ldots, m$) belongs to the exponential family. The exponential family of distributions can be written in the form:

$$f(y_{it}; \theta_{it}, \phi) = \exp\left\{\frac{y_{it}\theta_{it} - b(\theta_{it})}{a(\phi)} + c(y_{it}, \phi)\right\}, \quad (1)$$
where $a(\cdot)$, $b(\cdot)$ and $c(\cdot)$ are known functions. Also, $\phi$ is the scale parameter, and it is treated as a nuisance parameter when the main interest is in the regression coefficients [13]. The distribution in Equation (1) is in the canonical form, and $\theta_{it}$ is called the natural parameter of the distribution. The first two moments of $Y_{it}$ in Equation (1) are given by

$$E(Y_{it}) = \mu_{it} = b'(\theta_{it}), \qquad \mathrm{Var}(Y_{it}) = b''(\theta_{it})\, a(\phi). \quad (2)$$

In the GLM, the linear predictors, $\eta_{it} = x_{it}\beta$, are equal to some function of the expected value of $Y_{it}$, i.e., $\eta_{it} = h(\mu_{it})$, where $h(\cdot)$ is a monotone and differentiable function called the link function. From Equation (1) and Equation (2), we can write the inverse of the link function as $\mu_{it} = b'(\theta_{it}) = h^{-1}(\eta_{it})$. Note that a link function for which $\theta_{it} = \eta_{it}$ (e.g., $h$ is the identity function) is called the canonical link function. It is sometimes preferred because it often leads to simple, interpretable reparametrized models. The canonical link functions that correspond to the common exponential-family distributions (see Table 1) are the identity (normal distribution), the log (Poisson distribution) and the logit (binomial distribution) functions.
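As a standard worked example (ours, not reproduced from Table 1), consider a Bernoulli response with success probability $\pi$. Its probability mass function can be written in the form of Equation (1):

$$f(y; \pi) = \pi^{y}(1 - \pi)^{1 - y} = \exp\left\{y \log\frac{\pi}{1 - \pi} + \log(1 - \pi)\right\},$$

so that $\theta = \log\{\pi/(1 - \pi)\} = \mathrm{logit}(\pi)$, $b(\theta) = \log(1 + e^{\theta})$, $a(\phi) = 1$ and $c(y, \phi) = 0$. Differentiating $b(\theta)$ recovers the mean, $b'(\theta) = e^{\theta}/(1 + e^{\theta}) = \pi$, in agreement with Equation (2), and the logit emerges as the canonical link.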
Hence, the GLM has three elements:
1. The random component, which is the response variables $Y_{it}$ that are assumed to share the same distribution from the exponential family,
2. The systematic component, which is the explanatory variables that produce the linear predictor $\eta_{it} = x_{it}\beta$, and
3. The link function, which is a monotone and differentiable function $h(\cdot)$ relating the mean $\mu_{it}$ and the linear predictor $\eta_{it}$.

The aim of this paper is to obtain the maximum likelihood estimates (MLE) of the regression parameters for the logit model, relaxing the normality assumption of the random effects. In this case the estimation process is cumbersome and intractable, so MCMC techniques can be an alternative choice. We propose and develop a Monte Carlo EM (MCEM) algorithm to obtain the maximum likelihood estimates. The proposed method is illustrated using simulated data.

The rest of the paper is organized as follows. In Section 2, we review three distinct approaches to modelling longitudinal data and introduce the random effects model. Section 3 discusses several alternatives to maximum likelihood estimation for the GLM. The EM algorithm and its variant, the Monte Carlo EM (MCEM), are described and applied to the random effects model in Section 4. In Section 5, we implement the proposed MCEM algorithm on simulated binary data. The results obtained from the simulated data are presented in Section 6. Finally, conclusions are given in Section 7.
The first approach is the marginal model, in which the marginal expectation, $E(Y_{it})$, is modelled in terms of the covariates $x_{it}$, taking into consideration the possible correlation between the subject's measurements. Two models are specified separately: one for the marginal mean, $E(Y_{it}) = \mu_{it}$, and the other for the marginal covariance, $\mathrm{Var}(Y_i)$. In other words, we assume a known structure for the correlation between a subject's measurements. The two most natural generalizations of the diagonal covariance matrix (the case of uncorrelated measurements) are the uniform and the exponential correlation structures. Modelling the mean and the covariance separately has the advantage of making valid inferences about $\beta$ even when an incorrect form of the within-subject correlation structure is assumed.
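As an illustration (ours, not the paper's), the two correlation structures just mentioned can be constructed as follows, with `rho` denoting the common (uniform) or lag-one (exponential) correlation:

```python
import numpy as np

def uniform_corr(m, rho):
    """Uniform (exchangeable) structure: Corr(Y_it, Y_is) = rho for all t != s."""
    R = np.full((m, m), rho)
    np.fill_diagonal(R, 1.0)
    return R

def exponential_corr(m, rho):
    """Exponential (AR(1)-type) structure: Corr(Y_it, Y_is) = rho ** |t - s|."""
    t = np.arange(m)
    return rho ** np.abs(np.subtract.outer(t, t))

# Both reduce to the identity matrix (uncorrelated case) when rho = 0.
print(uniform_corr(4, 0.3))
print(exponential_corr(4, 0.3))
```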
The second approach is the transition model, which combines the dependence of $Y_{it}$ on $x_{it}$ and the correlation among the repeated measurements into a single equation. The idea behind this approach is that correlation within a subject arises because one response is explicitly caused by others. We can specify a regression model for the conditional expectation, $E(Y_{it} \mid H_{it})$, as an explicit function of $x_{it}$ and the other observations on the same subject. The conditional mean is thus a function of $x_{it}$ and $H_{it}$, where $H_{it}$ is the set of all observations on subject $i$ except that at time $t$. For longitudinal data, it is natural to reduce the set $H_{it}$ to include only observations prior to time $t$; $H_{it} = \{y_{i1}, y_{i2}, \ldots, y_{i,t-1}\}$. A well-known example of the transition model is the first-order autoregressive model, in which $Y_{it}$ depends on the past history $H_{it}$ only through the preceding measurement $y_{i,t-1}$. Note that the covariance matrix for $Y_i$ in this case corresponds to the marginal model with the exponential correlation structure.
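For instance, a standard first-order logistic transition model (a generic example, not necessarily the specification used in the paper) is

$$\mathrm{logit}\, P(Y_{it} = 1 \mid H_{it}) = x_{it}\beta + \alpha\, y_{i,t-1},$$

so that the whole history $H_{it}$ enters only through the preceding response $y_{i,t-1}$.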
The third approach is the random effects model. Conditional on the random effects ($b_i$), we have independent responses from a distribution belonging to the exponential family; i.e., the conditional distribution of $Y_{it}$ given $b_i$ has the form in Equation (1), and the conditional moments are given by Equation (2). The general specifications of the generalized linear mixed model (GLMM) are:
1. The conditional mean, $E(Y_{it} \mid b_i)$, is modelled by $h\{E(Y_{it} \mid b_i)\} = x_{it}\beta + d_{it}b_i$, where $h(\cdot)$ is an appropriate known link function. The $Y_{it}$, $x_{it}$ and $\beta$ are the outcome variable, the covariates and the regression coefficients vector, respectively. The $d_{it}$ is a $1 \times q$ subset of $x_{it}$ associated with random coefficients, and $b_i$ is a $q \times 1$ vector of random effects with mean 0 and variance D.
2. The random effects, $b_i$, are independent realizations from a common distribution.
3. Given the actual coefficients, $b_i$, for a given subject, the repeated observations for that subject are mutually independent.
Hence, the GLMM is an extension of the GLM that includes random effects in the linear predictor, giving an explicit probability model that explains the origin of the correlations [9, 10]. Note that for the linear model, where the response variable has a normal distribution and the identity link function is used, the random effects model coincides with the marginal model assuming the uniform correlation structure.
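To make the random-intercept case concrete, the following sketch simulates binary responses from a logit GLMM with a single random intercept per subject. The normal distribution for $b_i$ and the single time covariate are assumptions of this illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_random_intercept_logit(n, m, beta, sd_b):
    """Draw y_it | b_i ~ Bernoulli(mu_it) with logit(mu_it) = x_it beta + b_i."""
    time = np.tile(np.arange(m, dtype=float), (n, 1))   # m occasions per subject
    X = np.stack([np.ones((n, m)), time], axis=-1)      # intercept + time covariate
    b = rng.normal(0.0, sd_b, size=n)                   # random intercepts b_i
    eta = X @ beta + b[:, None]                         # linear predictor, n x m
    mu = 1.0 / (1.0 + np.exp(-eta))                     # inverse logit link
    return rng.binomial(1, mu), X, b

y, X, b = simulate_random_intercept_logit(n=50, m=7,
                                          beta=np.array([-0.5, 0.2]), sd_b=1.0)
```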
Let $y_i$ denote the observed data vector of size $m$ for the $i$th subject. The conditional distribution of $y_i$ given $b_i$ (the random effects vector for the $i$th subject) follows a GLM of the form in Equation (1) with linear predictor $\eta_{it} = x_{it}\beta + d_{it}b_i$. The vector $x_{it}$ represents the $t$th row of $X_i$, the model matrix for the fixed effects, $\beta$. The $d_{it}$ represents the $t$th row of $D_i$, the model matrix for the random effects, $b_i$, corresponding to the $i$th subject. We now formulate the notion of a GLMM:

$$f(y_{it} \mid b_i; \beta) = \exp\left\{\frac{y_{it}\theta_{it} - b(\theta_{it})}{a(\phi)} + c(y_{it}, \phi)\right\}, \qquad h(\mu_{it}^{b}) = \eta_{it} = x_{it}\beta + d_{it}b_i, \quad (4)$$

where the random effects $b_i$ are independent draws from a common distribution $K(\cdot)$ with variance-covariance matrix D. The probability of any response pattern $y_i$ (of size $m$), conditional on the random effects $b_i$, is equal to the product of the probabilities of the observations on each subject, because they are independent given the common random effects:

$$f(y_i \mid b_i; \beta) = \prod_{t=1}^{m} f(y_{it} \mid b_i; \beta).$$

Thus the likelihood function for the parameters $\beta$ and D can be written as:

$$L(\beta, D; y) = \prod_{i=1}^{n} \int \left[\prod_{t=1}^{m} f(y_{it} \mid b_i; \beta)\right] K(b_i; D)\, db_i. \quad (5)$$
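Equation (5) rarely has a closed form. A minimal Monte Carlo sketch of the integral, assuming for illustration a Bernoulli/logit conditional model and a sampler `draw_K` for $K(\cdot)$, is:

```python
import numpy as np

rng = np.random.default_rng(1)

def marginal_loglik_mc(y, X, beta, draw_K, n_draws=2000):
    """Approximate log L(beta, D; y) in Equation (5): for each subject, average
    the conditional likelihood of the whole response sequence over draws from
    K(.), then sum the logs over subjects. draw_K(size) samples b_i from K(.)."""
    n, m = y.shape
    loglik = 0.0
    for i in range(n):
        b = draw_K(n_draws)                          # b_i^(1), ..., b_i^(n_draws)
        eta = X[i] @ beta + b[:, None]               # n_draws x m linear predictors
        mu = 1.0 / (1.0 + np.exp(-eta))
        lik = np.prod(mu ** y[i] * (1 - mu) ** (1 - y[i]), axis=1)
        loglik += np.log(lik.mean())                 # Monte Carlo integral, subject i
    return loglik

# Example with normal random effects of standard deviation 1:
# marginal_loglik_mc(y, X, beta, lambda size: rng.normal(0.0, 1.0, size))
```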
The EM algorithm obtains the MLEs by treating the random effects, $b_i$, as missing data. The complete data is then $(y_i, b_i)$, $i = 1, \ldots, n$, and the complete-data likelihood is given by

$$L_c(\beta, D; y, b) = \prod_{i=1}^{n}\left[\prod_{t=1}^{m} f(y_{it} \mid b_i; \beta)\right] K(b_i; D).$$

Although the observed-data likelihood function in Equation (5) is complicated, the complete-data likelihood is relatively simple. In other words, the integration over the random effects in Equation (5) is avoided, since the values of $b_i$ (the missing data) will be simulated during the EM algorithm and will no longer be unknown. The log-likelihood is given by

$$\ell_c(\beta, D; y, b) = \sum_{i=1}^{n}\sum_{t=1}^{m} \log f(y_{it} \mid b_i; \beta) + \sum_{i=1}^{n} \log K(b_i; D). \quad (6)$$

Choosing $b_i$ to be the missing data has two advantages. First, upon knowing $b_i$, the $y_{it}$ are independent. Second, in the M-step, where the maximization is with respect to the parameters $\beta$ and D, the parameters $\beta$ and $\phi$ appear only in the first term of Equation (6). Thus, the M-step with respect to $\beta$ and $\phi$ uses only the GLM portion of the likelihood function, so it is similar to a standard GLM computation assuming that $b_i$ is known. Maximizing with respect to D, in the second term, is just maximum likelihood estimation using the distribution of $b_i$, after replacing the sufficient statistics (in the case that $K(\cdot)$ belongs to the exponential family) with their conditional expected values.

For the GLMM of the form in Equation (4), at the $(r + 1)$th iteration, starting with initial values $\beta^{(0)}$ and $D^{(0)}$, the EM algorithm follows these steps:
1. The E-step (Expectation step). Calculate the expectations with respect to the conditional distribution of the random effects given the observed data, using the current parameter values $\beta^{(r)}$ and $D^{(r)}$:
(a) $Q_1(\beta) = E\left[\sum_{i=1}^{n}\sum_{t=1}^{m} \log f(y_{it} \mid b_i; \beta) \mid y\right]$,
(b) $Q_2(D) = E\left[\sum_{i=1}^{n} \log K(b_i; D) \mid y\right]$.
2. The M-step (Maximization step). Find the values
(a) $\beta^{(r+1)}$ that maximizes 1(a),
(b) $D^{(r+1)}$ that maximizes 1(b).
If convergence is achieved, then the current values are the MLEs; otherwise increment $r = r + 1$ and repeat the two steps.
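The iteration can be summarized in code. The sketch below is schematic: the four callables are hypothetical placeholders that must be supplied for a concrete model, mirroring the two pieces of Equation (6), and D is taken to be scalar for simplicity:

```python
import numpy as np

def em(beta0, D0, make_Q1, make_Q2, argmax_Q1, argmax_Q2,
       tol=1e-6, max_iter=200):
    """Generic EM loop. make_Q1/make_Q2 build the conditional expectations in
    1(a) and 1(b) at the current (beta, D); argmax_Q1/argmax_Q2 return their
    maximizers for the M-step."""
    beta, D = np.asarray(beta0, dtype=float), float(D0)
    for _ in range(max_iter):
        Q1 = make_Q1(beta, D)            # E-step, part (a): GLM term of (6)
        Q2 = make_Q2(beta, D)            # E-step, part (b): random-effects term
        beta_new = argmax_Q1(Q1)         # M-step for beta (and phi, if present)
        D_new = argmax_Q2(Q2)            # M-step for D
        if np.max(np.abs(beta_new - beta)) < tol and abs(D_new - D) < tol:
            return beta_new, D_new       # converged: current values are the MLEs
        beta, D = beta_new, D_new
    return beta, D
```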
In the Monte Carlo EM (MCEM) algorithm, the expectations in the E-step are not evaluated analytically; instead, a sample from the conditional distribution of the random effects given the observed data is obtained. Then the required expectations are evaluated via Monte Carlo approximations. The Metropolis-Hastings algorithm can be implemented to draw a sample $b_i^{(1)}, b_i^{(2)}, \ldots, b_i^{(L)}$ from the conditional distribution of $b_i$ given $y_i$. A candidate value $b_i^{*}$ is generated from a proposal distribution $Q(\cdot)$. This potential value is accepted, as opposed to keeping the previous value, with probability

$$\alpha(b_i^{*}, b_i^{(l)}) = \min\left\{1,\ \frac{f(y_i \mid b_i^{*}; \beta)\, K(b_i^{*}; D)\, Q(b_i^{(l)} \mid b_i^{*})}{f(y_i \mid b_i^{(l)}; \beta)\, K(b_i^{(l)}; D)\, Q(b_i^{*} \mid b_i^{(l)})}\right\}. \quad (7)$$

This calculation only involves the conditional distribution of $y_i$ given $b_i$ and the random-effects density $K(\cdot)$, because the intractable normalizing constant of $f(b_i \mid y_i)$ cancels in the ratio.
Incorporating the Metropolis algorithm into the EM algorithm, starting from initial values $\beta^{(0)}$ and $D^{(0)}$, at iteration $(r + 1)$:
1. Generate $L$ values $b_i^{(1)}, \ldots, b_i^{(L)}$ from the conditional distribution $f(b_i \mid y_i)$ using the Metropolis algorithm and the current parameter values $\beta^{(r)}$ and $D^{(r)}$.
2. The E-step (Expectation step). Calculate the expectations as Monte Carlo estimates:
(a) $Q_1(\beta) \approx \frac{1}{L}\sum_{l=1}^{L}\sum_{i=1}^{n}\sum_{t=1}^{m} \log f(y_{it} \mid b_i^{(l)}; \beta)$,
(b) $Q_2(D) \approx \frac{1}{L}\sum_{l=1}^{L}\sum_{i=1}^{n} \log K(b_i^{(l)}; D)$.
3. The M-step (Maximization step). Find the values
(a) $\beta^{(r+1)}$ that maximizes 2(a),
(b) $D^{(r+1)}$ that maximizes 2(b).
If convergence is achieved, then the current values are the MLEs; otherwise increment $r = r + 1$ and go to step 1.
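A compact sketch of Step 1 for a scalar random intercept follows. The symmetric random-walk proposal (so that $Q(\cdot)$ cancels in Equation (7)) and the logit conditional likelihood are assumptions of this illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_cond(b, y_i, X_i, beta, log_K):
    """log f(y_i | b; beta) + log K(b): unnormalized log density of b_i | y_i."""
    eta = X_i @ beta + b
    return np.sum(y_i * eta - np.log1p(np.exp(eta))) + log_K(b)

def metropolis_b(y_i, X_i, beta, log_K, L=1000, step=0.5, b_init=0.1):
    """Generate b_i^(1), ..., b_i^(L) from f(b_i | y_i) by random-walk Metropolis."""
    draws = np.empty(L)
    b, lp = b_init, log_cond(b_init, y_i, X_i, beta, log_K)
    for l in range(L):
        b_star = b + step * rng.normal()                     # candidate value
        lp_star = log_cond(b_star, y_i, X_i, beta, log_K)
        if np.log(rng.uniform()) < lp_star - lp:             # accept w.p. (7)
            b, lp = b_star, lp_star
        draws[l] = b
    return draws

# Lognormal K with parameters mu, sigma (returns -inf outside the support b > 0):
def log_K_lognormal(b, mu=0.0, sigma=1.0):
    if b <= 0:
        return -np.inf
    return (-np.log(b * sigma * np.sqrt(2 * np.pi))
            - 0.5 * ((np.log(b) - mu) / sigma) ** 2)
```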
Binary responses $y_{it}$ are generated, conditionally on the random effect $b_i$, from a Bernoulli distribution with mean $\mu_{it}^{b} = E(y_{it} \mid b_i)$. A random intercept logit model, $\mathrm{logit}(\mu_{it}^{b}) = x_{it}\beta + b_i$, with treatment and time covariates, is used. The true parameter values are chosen as in [7]. The binary covariate $\mathrm{Trt}_i$ is set to be 1 for half of the subjects and 0 for the other half. The continuous covariate $\mathrm{Time}_{it}$ is independently generated from a normal distribution with mean vector (0, 1, 2, 3, 6, 9, 12) and standard deviation vector (0, 0.1, 0.2, 0.3, 0.5, 0.6, 0.8). Note that, for each subject $i$, the first value of the Time covariate will always be 0. The random intercepts $b_i$ are obtained by standardizing draws $z_i$, i.e., $b_i = \sqrt{D}\,\{z_i - E(z_i)\}/\sqrt{\mathrm{Var}(z_i)}$, such that $E(b_i) = 0$ and $\mathrm{Var}(b_i) = D$. The $z_i$ are generated from the lognormal distribution $\mathrm{Ln}N(0, 1)$. The lognormal density is chosen to represent a skewed distribution whose support, unlike that of the normal distribution, does not cover the whole real line. This setup is the same as in [7], to enable us to compare our results with those in [7]. In this case the GLMM in Equation (4) is

$$\mathrm{logit}\{E(y_{it} \mid b_i)\} = x_{it}\beta + b_i. \quad (8)$$
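A sketch of this data-generating process follows. The numerical values of `beta` are placeholders (the paper takes the true values from [7]), the exact form of $x_{it}\beta$ (intercept, Trt, Time and their interaction) is our assumption, and `n` is assumed even:

```python
import numpy as np

rng = np.random.default_rng(3)

def generate_data(n, beta, D=1.0):
    """Simulate the binary longitudinal data described above (placeholder beta)."""
    mean_t = np.array([0, 1, 2, 3, 6, 9, 12], dtype=float)
    sd_t = np.array([0, 0.1, 0.2, 0.3, 0.5, 0.6, 0.8])
    m = len(mean_t)
    trt = np.repeat([1.0, 0.0], n // 2)                    # 1 for half the subjects
    time = mean_t + sd_t * rng.standard_normal((n, m))     # Time_i1 = 0 for all i
    z = rng.lognormal(0.0, 1.0, size=n)                    # LnN(0, 1) draws
    mean_z, var_z = np.exp(0.5), (np.e - 1.0) * np.e       # moments of LnN(0, 1)
    b = np.sqrt(D) * (z - mean_z) / np.sqrt(var_z)         # E(b_i) = 0, Var(b_i) = D
    b0, b1, b2, b3 = beta
    eta = (b0 + b1 * trt[:, None] + b2 * time
           + b3 * trt[:, None] * time + b[:, None])
    mu = 1.0 / (1.0 + np.exp(-eta))                        # conditional Bernoulli mean
    return rng.binomial(1, mu), trt, time, b

y, trt, time, b = generate_data(n=100, beta=(-1.0, 0.5, 0.1, -0.05))
```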
Some simplifications arise in this case. First, for the Bernoulli distribution the scale parameter $\phi = 1$, so it drops out of the calculations. Second, we have a single random effect which is common (has the same value) for all the measurements of each subject; thus the $b_i$ are iid draws from $K(\cdot)$, where the variance D is a scalar. Finally, the distribution $K(\cdot)$ will be the lognormal distribution. The conditional density of each response is

$$f(y_{it} \mid b_i; \beta) = \exp\left\{y_{it}\theta_{it} - \log\left(1 + e^{\theta_{it}}\right)\right\},$$

where $\theta_{it} = x_{it}\beta + b_i$. The likelihood function in Equation (5) can be written as:

$$L(\beta, \mu, \sigma; y) = \prod_{i=1}^{n} \int_{0}^{\infty} \left[\prod_{t=1}^{m} \exp\left\{y_{it}\theta_{it} - \log\left(1 + e^{\theta_{it}}\right)\right\}\right] K(b_i; \mu, \sigma)\, db_i,$$

where $K(b; \mu, \sigma)$ denotes the lognormal density with parameters $\mu$ and $\sigma$. Hence, the log-likelihood is given as:

$$\ell(\beta, \mu, \sigma; y) = \sum_{i=1}^{n} \log \int_{0}^{\infty} \left[\prod_{t=1}^{m} \exp\left\{y_{it}\theta_{it} - \log\left(1 + e^{\theta_{it}}\right)\right\}\right] K(b_i; \mu, \sigma)\, db_i.$$
We apply the MCEM algorithm introduced in Section 4.2. Starting from initial values $\beta^{(0)}$, $\mu^{(0)}$ and $\sigma^{(0)}$, at iteration $(r + 1)$ the algorithm proceeds as follows:
1. For each subject $i$, generate $L$ values $b_i^{(1)}, \ldots, b_i^{(L)}$ from $f(b_i \mid y_i)$ using the Metropolis algorithm, with probability of acceptance as in Equation (7) given by

$$\min\left\{1,\ \frac{f(y_i \mid b_i^{*}; \beta^{(r)})\, K(b_i^{*}; \mu^{(r)}, \sigma^{(r)})\, Q(b_i^{(l)})}{f(y_i \mid b_i^{(l)}; \beta^{(r)})\, K(b_i^{(l)}; \mu^{(r)}, \sigma^{(r)})\, Q(b_i^{*})}\right\},$$

where

$$f(y_i \mid b_i; \beta) = \prod_{t=1}^{m} \exp\left\{y_{it}\theta_{it} - \log\left(1 + e^{\theta_{it}}\right)\right\}$$

and

$$K(b; \mu, \sigma) = \frac{1}{b\sigma\sqrt{2\pi}} \exp\left\{-\frac{(\log b - \mu)^2}{2\sigma^2}\right\}, \quad b > 0.$$

2. The E-step. Calculate the expectations as Monte Carlo estimates:
(a) $Q_1(\beta) \approx \frac{1}{L}\sum_{l=1}^{L}\sum_{i=1}^{n}\sum_{t=1}^{m}\left[y_{it}\theta_{it}^{(l)} - \log\left(1 + e^{\theta_{it}^{(l)}}\right)\right]$, where $\theta_{it}^{(l)} = x_{it}\beta + b_i^{(l)}$;
(b) $Q_2(\mu, \sigma) \approx \frac{1}{L}\sum_{l=1}^{L}\sum_{i=1}^{n} \log K(b_i^{(l)}; \mu, \sigma)$.
Note that $\beta^{(r)}$, $\mu^{(r)}$ and $\sigma^{(r)}$ do not show explicitly in this step, but they were already used in Step 1 to generate the $L$ values of $b_i$.
3. The M-step. Find the values
(a) $\beta^{(r+1)}$ that maximizes 2(a);
(b) $\mu^{(r+1)}$ and $\sigma^{(r+1)}$ that maximize 2(b), to be used for the calculation of the standard deviation of the random effect.
If convergence is achieved, then the current values are the MLEs; otherwise increment $r = r + 1$ and return to step 1.
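Under this lognormal specification, step 3(b) has a closed form (a standard lognormal fact, stated here for completeness): since 2(b) is a lognormal log-likelihood in the $\log b_i^{(l)}$,

$$\mu^{(r+1)} = \frac{1}{nL}\sum_{i=1}^{n}\sum_{l=1}^{L} \log b_i^{(l)}, \qquad \left(\sigma^{(r+1)}\right)^{2} = \frac{1}{nL}\sum_{i=1}^{n}\sum_{l=1}^{L} \left(\log b_i^{(l)} - \mu^{(r+1)}\right)^{2},$$

and the implied standard deviation of the random effect is

$$\mathrm{SD}(b_i) = \sqrt{\left(e^{\sigma^{2}} - 1\right) e^{2\mu + \sigma^{2}}}.$$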
Comparing the two fits, the estimates of the regression coefficients under the lognormal model are more biased, but this is compensated by a smaller variance, resulting in a lower value of the MSE. On the other hand, the estimate of the standard deviation of the random effect is better for the normal model than for the lognormal model. It is clear that there is only a small improvement in the estimates when using the lognormal MCEM. The bias for one of the regression coefficients is large when using either the normal or the lognormal MCEM; it is 100% of the true value of the parameter. Therefore, we turn to a modified MCEM (MECM) to improve the results. The modification is in the M-step, where the parameter vector is divided into two subsets: $\beta$ and $(\mu, \sigma)$. The maximization of the Monte Carlo expectations calculated in the preceding E-step is replaced by a conditional maximization. In other words, we first calculate the values of $\mu$ and $\sigma$ that maximize the expectation while $\beta$ is held fixed. Next, we substitute the maximizing arguments for $\mu$ and $\sigma$ in the expectation function and find the value of $\beta$ that maximizes the function. We repeat this conditional maximization twice, as sketched below. The results obtained for 50 data sets are summarized in Table 4.
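The conditional M-step can be sketched as follows; `Q` is a hypothetical callable returning the Monte Carlo expectation from the preceding E-step as a function of all the parameters:

```python
import numpy as np
from scipy.optimize import minimize

def conditional_m_step(Q, beta, mu, sigma, cycles=2):
    """MECM-style M-step: maximize Q over (mu, sigma) with beta held fixed,
    then over beta with (mu, sigma) fixed at their new values; repeat twice."""
    beta = np.asarray(beta, dtype=float)
    for _ in range(cycles):
        res = minimize(lambda p: -Q(beta, p[0], p[1]), x0=[mu, max(sigma, 1e-3)])
        mu, sigma = res.x                                   # update (mu, sigma)
        res = minimize(lambda v: -Q(v, mu, sigma), x0=beta)
        beta = res.x                                        # update beta
    return beta, mu, sigma
```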
Compared with the MCEM results, the bias is considerably lower for some of the regression coefficients and for the standard deviation of the random effect. On the other hand, the estimates of the other parameters become more variable.
Results for most of the parameters are good for both the lognormal MECM and the normal GLMM models; however, the latter gives better values. Finally, the MECM algorithm produces more variable estimates for some of the parameters, resulting in a higher value of the MSE.