International Journal of Probability and Statistics
p-ISSN: 2168-4871 e-ISSN: 2168-4863
2018; 7(4): 97-105
doi:10.5923/j.ijps.20180704.01

Ahmed M. Gad, Niveen I. EL-Zayat
Statistics Department, Faculty of Economics and Political Science, Cairo University, Egypt
Correspondence to: Ahmed M. Gad, Statistics Department, Faculty of Economics and Political Science, Cairo University, Egypt.
Copyright © 2018 The Author(s). Published by Scientific & Academic Publishing.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/

Measurements made on several outcomes for the same unit, implying multivariate longitudinal data, are very likely to be correlated. Fitting such a data structure can therefore be quite challenging due to the high-dimensional correlations that exist within and between outcomes over time. An additional challenge is encountered in longitudinal studies when subjects withdraw prematurely from the study, resulting in incomplete (missing) data. Incomplete data are more problematic when the missing-data mechanism is related to the unobserved outcomes, implying so-called non-ignorable missing data or missing not at random (MNAR). Obtaining valid estimates under the non-ignorable assumption requires that the missing-data mechanism be modeled as part of the estimation process. The model for the multiple continuous outcomes is introduced via the Gaussian multivariate linear mixed model (MLMM), while the missing-data mechanism is linked to the data model via a selection model such that the missing-data mechanism parameters are fitted using multivariate logistic regression. This article proposes and develops a stochastic expectation-maximization (SEM) algorithm to fit the MLMM in the presence of non-ignorable dropout. In the M-step, the likelihood function of the data model is maximized via a newly proposed Quasi-Newton (QN) algorithm of EM type, while the multivariate logistic regression is maximized via the Newton-Raphson (NR) algorithm. A simulation study is conducted to assess the performance of the proposed techniques.
Keywords: Non-ignorable missing, Selection models, Multivariate linear mixed models, Stochastic EM algorithm, Newton-Raphson, Fisher-Scoring, Quasi-Newton
Cite this paper: Ahmed M. Gad, Niveen I. EL-Zayat, Fitting Multivariate Linear Mixed Model for Multiple Outcomes Longitudinal Data with Non-ignorable Dropout, International Journal of Probability and Statistics, Vol. 7 No. 4, 2018, pp. 97-105. doi: 10.5923/j.ijps.20180704.01.
Assume that there are $N$ subjects, $i = 1, 2, \ldots, N$, and there are $n_i$ repeated measures on each subject. Also, assume that there are $m$ outcomes on each subject, $k = 1, 2, \ldots, m$. The responses for subject $i$ are collected in the $n_i \times m$ response matrix $Y_i = (y_i^{(1)}, \ldots, y_i^{(m)})$, where $y_{ij}^{(k)}$ denotes the measurement of the $i$th subject on the $k$th outcome at the $j$th occasion, and each $y_i^{(k)}$ is the $n_i$-vector of measurements of subject $i$ on the $k$th outcome. It is very common that some subjects withdraw from the study prematurely, which results in dropout. The multivariate longitudinal data can be modeled using the multivariate linear mixed model (MLMM), where for the $i$th subject the response is modeled as:

$$Y_i = X_i \beta + Z_i b_i + E_i, \qquad (1)$$
where $X_i$ is the $n_i \times p$ fixed between-subject design matrix, $\beta$ is the $p \times m$ matrix of fixed effects assumed to be common for all subjects, $Z_i$ is the $n_i \times q$ random within-subject design matrix, $b_i$ is the $q \times m$ matrix of random effects, and $E_i$ is the $n_i \times m$ matrix of measurement errors associated with the response matrix $Y_i$. The distributional assumptions that are often attached to the MLMM are the following. The rows of $E_i$ are normally distributed with mean $0$ and $m \times m$ unstructured covariance matrix $\Sigma$, so that $\mathrm{vec}(E_i) \sim N(0, \Sigma \otimes I_{n_i})$, where $\mathrm{vec}(\cdot)$ is the vectorizing operator that stacks all columns of a matrix vertically. The random effects satisfy $\mathrm{vec}(b_i) \sim N(0, \Psi)$, where $\Psi$ is an $mq \times mq$ unstructured matrix. Moreover, $b_i$ and $E_i$ are assumed to be independent. The marginal distribution of $\mathrm{vec}(Y_i)$ is normal with expectation $\mathrm{vec}(X_i \beta)$ and dispersion matrix $\Omega_i$ of dimension $mn_i \times mn_i$:

$$\mathrm{vec}(Y_i) \sim N(\mathrm{vec}(X_i \beta), \Omega_i). \qquad (2)$$
The dispersion matrix $\Omega_i$ can be expressed as the sum of the covariance induced by the random effects and the measurement-error covariance, where $\mathrm{vec}(Z_i b_i) = (I_m \otimes Z_i)\,\mathrm{vec}(b_i)$, and can be modelled as:

$$\Omega_i = (I_m \otimes Z_i)\, \Psi\, (I_m \otimes Z_i)^{\top} + \Sigma \otimes I_{n_i}. \qquad (3)$$

The p.d.f. of $\mathrm{vec}(Y_i)$ is given by

$$f(\mathrm{vec}(Y_i); \theta) = (2\pi)^{-mn_i/2} \lvert \Omega_i \rvert^{-1/2} \exp\left\{ -\tfrac{1}{2}\, \mathrm{vec}(Y_i - X_i\beta)^{\top} \Omega_i^{-1}\, \mathrm{vec}(Y_i - X_i\beta) \right\}. \qquad (4)$$
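To make the marginal formulation concrete, the following numpy sketch (our own illustration, not the authors' code; the function name and argument shapes are assumptions) assembles $\Omega_i$ as in Eq. (3) and evaluates the log-density of Eq. (4):

```python
import numpy as np

def marginal_loglik(Y, X, Z, beta, Sigma, Psi):
    """Log-density of vec(Y_i) under the marginal model of Eqs. (2)-(4).

    Shapes (illustrative): Y (n, m), X (n, p), Z (n, q),
    beta (p, m), Sigma (m, m), Psi (mq, mq).
    """
    n, m = Y.shape
    # Omega_i = (I_m kron Z_i) Psi (I_m kron Z_i)' + Sigma kron I_n, Eq. (3)
    K = np.kron(np.eye(m), Z)                      # (n*m, m*q)
    Omega = K @ Psi @ K.T + np.kron(Sigma, np.eye(n))
    r = (Y - X @ beta).flatten(order="F")          # vec() stacks columns
    _, logdet = np.linalg.slogdet(Omega)
    quad = r @ np.linalg.solve(Omega, r)
    return -0.5 * (n * m * np.log(2.0 * np.pi) + logdet + quad)
```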
$$L(\theta; Y_i) = f(\mathrm{vec}(Y_i); \theta). \qquad (5)$$

The likelihood function of the parameters, $\theta = (\beta, \Sigma, \Psi)$, based on the whole sample, can be obtained as

$$L(\theta) = \prod_{i=1}^{N} f(\mathrm{vec}(Y_i); \theta). \qquad (6)$$

The MLEs of $\theta$ can be obtained by maximizing the likelihood function in Eq. (6). Note that this function is based on the marginal distribution function of the response $\mathrm{vec}(Y_i)$ in the absence of the random effects. The random effects can be considered as nuisance parameters as long as the major interest is in the variance structure introduced by $\Psi$. Hence the log-likelihood function, based on Eq. (4), for the whole sample is

$$l(\theta) = -\frac{1}{2} \sum_{i=1}^{N} \left\{ mn_i \log(2\pi) + \log\lvert \Omega_i \rvert + \mathrm{vec}(Y_i - X_i\beta)^{\top} \Omega_i^{-1}\, \mathrm{vec}(Y_i - X_i\beta) \right\}, \qquad (7)$$

where $\beta$ is considered the parameter matrix of interest, while $\Sigma$ and $\Psi$ determine the covariance structure. A simpler form of the log-likelihood function is given by (Schafer and Yucel, 2002)

$$l(\theta) = -\frac{1}{2} \sum_{i=1}^{N} \left\{ \log\lvert \Omega_i \rvert + \mathrm{tr}\left( \Omega_i^{-1} \epsilon_i \epsilon_i^{\top} \right) \right\}, \qquad (8)$$

where $\epsilon_i = \mathrm{vec}(Y_i - X_i\beta)$, and the additive constant is dropped.
We assume that the variance-covariance matrices, $\Sigma$ and $\Psi$, are unstructured. Moreover, we represent these matrices in terms of their precision form, $\Sigma^{-1}$ and $\Psi^{-1}$. These forms are obtained as linear functions as

$$\Psi^{-1} = \sum_{l=1}^{d_1} \delta_l G_l, \qquad \Sigma^{-1} = \sum_{s=1}^{d_2} \xi_s H_s, \qquad (9)$$

where $d_1 = mq(mq+1)/2$ is the number of diagonal and off-diagonal elements of $\Psi^{-1}$. Thus $\delta = (\delta_1, \ldots, \delta_{d_1})^{\top}$. Similarly, $d_2 = m(m+1)/2$ and $\xi = (\xi_1, \ldots, \xi_{d_2})^{\top}$. Further, each of $G_l$ and $H_s$ is a square matrix of order $mq$ and $m$, respectively, with one on the position of the corresponding element and zero elsewhere. Hence, the response model parameter vector is $\theta = (\mathrm{vec}(\beta)^{\top}, \delta^{\top}, \xi^{\top})^{\top}$ and the total number of parameters is $pm + mq(mq+1)/2 + m(m+1)/2$.
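The precision parametrization in Eq. (9) can be illustrated with a short sketch. This is our own construction; in particular, filling both symmetric positions from a single coefficient is one of several equivalent conventions for the basis matrices:

```python
import numpy as np

def precision_from_coeffs(coeffs, dim):
    """Rebuild a symmetric precision matrix from the linear expansion of
    Eq. (9): each coefficient multiplies a basis matrix with a one on a
    single (r, c) position; symmetric pairs are filled together here
    (an assumption, one of several equivalent conventions)."""
    P = np.zeros((dim, dim))
    idx = 0
    for r in range(dim):
        for c in range(r, dim):
            P[r, c] = coeffs[idx]
            P[c, r] = coeffs[idx]
            idx += 1
    return P

# Example: for Psi^{-1} with m*q = 6 there are 6*7/2 = 21 coefficients.
delta = np.ones(21)
Psi_inv = precision_from_coeffs(delta, 6)
```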
Missing data model and likelihood function

Assume that, for subject $i$, $Y_i$ is subject to non-random dropout at a certain occasion $t_i$. The responses $\mathrm{vec}(Y_i)$ can be partitioned into two components; the observed part $y_i^{\mathrm{obs}}$ of dimension $m(t_i - 1)$ and the missing part $y_i^{\mathrm{mis}}$ of dimension $m(n_i - t_i + 1)$. Also, the response matrix using the $\mathrm{vec}$ operator can be partitioned into two parts; $\mathrm{vec}(Y_i) = (y_i^{\mathrm{obs}\top}, y_i^{\mathrm{mis}\top})^{\top}$, where $y_i^{\mathrm{obs}} = \mathrm{vec}(Y_i^{\mathrm{obs}})$ and $y_i^{\mathrm{mis}} = \mathrm{vec}(Y_i^{\mathrm{mis}})$. Assume that the missing-data indicator matrix $R_i$ is of dimension $n_i \times m$, where $r_{ij}^{(k)}$ is defined as $r_{ij}^{(k)} = 1$ if $y_{ij}^{(k)}$ is observed and $r_{ij}^{(k)} = 0$ otherwise. For monotone missing data we can define $R_i = (\mathbf{1}^{\top}, \mathbf{0}^{\top})^{\top}$, where $\mathbf{1}$ is the matrix of ones of dimension $(t_i - 1) \times m$ and $\mathbf{0}$ is the matrix of zeros of dimension $(n_i - t_i + 1) \times m$.
The Diggle and Kenward (1994) model was proposed for the univariate longitudinal data setting. We extend this model to multivariate longitudinal data of $m$ outcomes. Assume that the dropout of subject $i$ occurs at time $t_i$, and let $H_{ij}$ denote the matrix of observed response history of the $m$ outcomes up to occasion $j - 1$, where each row of $H_{ij}$ corresponds to the row vector of length $m$ of observed responses of the outcomes at one occasion. Diggle and Kenward (1994) modeled the probability of dropout in terms of the history $H_{ij}$ up to occasion $j - 1$ and the possibly missing current response, $y_{ij}$, as

$$P(D_i = j \mid H_{ij}, y_{ij}; \phi).$$

This probability is computed for each subject across all occasions as follows:

$$P(D_i = j) = \begin{cases} P(D_i = j \mid H_{ij}, y_{ij}) \displaystyle\prod_{v=2}^{j-1} \left\{1 - P(D_i = v \mid H_{iv}, y_{iv})\right\}, & \text{if subject } i \text{ drops out at } j, \\[2mm] \displaystyle\prod_{v=2}^{n_i} \left\{1 - P(D_i = v \mid H_{iv}, y_{iv})\right\}, & \text{if subject } i \text{ completes the study.} \end{cases} \qquad (10)$$

The dropout probability at any occasion $j$ can be obtained by the multivariate logistic regression

$$\mathrm{logit}\left\{P(D_i = j \mid H_{ij}, y_{ij}; \phi)\right\} = \phi_0 + \phi_1 y_{i,j-1} + \phi_2 y_{ij}, \qquad (11)$$

so that, by the inverse transformation,

$$P(D_i = j \mid H_{ij}, y_{ij}; \phi) = \frac{\exp(\phi_0 + \phi_1 y_{i,j-1} + \phi_2 y_{ij})}{1 + \exp(\phi_0 + \phi_1 y_{i,j-1} + \phi_2 y_{ij})}. \qquad (12)$$
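A minimal sketch of the per-occasion dropout probability implied by Eqs. (11)-(12) may help fix ideas. This is our own illustration; treating $\phi_1$ and $\phi_2$ as scalars shared across the $m$ outcomes is an assumption of the sketch:

```python
import numpy as np

def dropout_prob(phi, y_prev, y_curr):
    """Per-occasion dropout probability under the extended model of
    Eqs. (11)-(12): inverse-logit of an intercept plus contributions of
    the previous and the (possibly unobserved) current m-vectors."""
    eta = phi[0] + phi[1] * np.asarray(y_prev) + phi[2] * np.asarray(y_curr)
    return 1.0 / (1.0 + np.exp(-eta))              # inverse logit, Eq. (12)
```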
The likelihood of the observed data can be computed as

$$L(\theta, \phi; y^{\mathrm{obs}}, r) = \prod_{i=1}^{N} \int f(y_i; \theta)\, f(r_i \mid y_i; \phi)\, dy_i^{\mathrm{mis}}, \qquad (13)$$

where $f(y_i^{\mathrm{mis}} \mid y_i^{\mathrm{obs}})$ is the conditional density of the missing given the observed responses. Hence the integral in Eq. (13) can be approximated using Monte Carlo expectation (Gad and Ahmed, 2006). The complete-data log-likelihood for the $N$ subjects can be given by

$$l_c(\theta, \phi) = \sum_{i=1}^{N} \log f(y_i, r_i; \theta, \phi), \qquad (14)$$

where $y_i = \mathrm{vec}(Y_i)$. Also, it can be re-expressed as

$$l_c(\theta, \phi) = \sum_{i=1}^{N} \log f(y_i; \theta) + \sum_{i=1}^{N} \log f(r_i \mid y_i; \phi). \qquad (15)$$

The MLEs of $\theta$ and $\phi$ can be obtained by maximizing each of the two components in Eq. (15) separately. Direct maximization is not feasible because it requires multidimensional integration and maximization (Jason and Xihong, 2002). Numerical and iterative computations are therefore required.
Therefore, a stochastic EM (SEM) algorithm is proposed, in which the E-step is replaced by a simulation step (S-step) and the M-step comprises two sub-steps. In the first sub-step the dropout parameters $\phi$ are estimated using the Newton-Raphson algorithm. In the second sub-step a new type of Quasi-Newton algorithm is proposed to obtain the MLEs of the response parameters $\theta$.

The S-Step (Simulation step)

Let $Y_i^{\mathrm{mis}}$ denote the missing components of the response matrix $Y_i$ over the $m$ outcomes, such that $y_{i t_i}$ indicates the first missing $m$-vector, while the remaining missing part is the matrix of order $(n_i - t_i) \times m$ of the occasions following dropout. In this step (the S-step), we simulate an $m$-vector from the conditional distribution of $y_{i t_i}$ given the observed data, to represent the responses of the $m$ outcomes at the time of dropout $t_i$. This conditional distribution can be factorized, up to a constant of proportionality, as

$$f(y_{i t_i} \mid y_i^{\mathrm{obs}}, r_i; \theta, \phi) \propto f(y_{i t_i} \mid y_i^{\mathrm{obs}}; \theta)\, P(D_i = t_i \mid H_{i t_i}, y_{i t_i}; \phi). \qquad (16)$$
A candidate is simulated from the first factor, then it is passed through the following accept-reject procedure (sketched in code after the list):

1. Simulate a candidate $m$-vector $y^{*}$ from the conditional distribution $f(y_{i t_i} \mid y_i^{\mathrm{obs}}; \theta)$, which is an $m$-variate normal distribution with parameters $\mu^{*}$ and $\Sigma^{*}$; the $m \times 1$ location vector and the $m \times m$ dispersion matrix, respectively, are given by

$$\mu^{*} = \mu_i^{\mathrm{mis}} + \Omega_i^{21} (\Omega_i^{11})^{-1} (y_i^{\mathrm{obs}} - \mu_i^{\mathrm{obs}}), \qquad \Sigma^{*} = \Omega_i^{22} - \Omega_i^{21} (\Omega_i^{11})^{-1} \Omega_i^{12}, \qquad (17)$$

where $\Omega_i^{11}$ is the variance matrix corresponding to the observed response vector $y_i^{\mathrm{obs}}$, $\Omega_i^{22}$ corresponds to the $m$-vector at the dropout occasion, and $\mu_i^{\mathrm{obs}}$, $\mu_i^{\mathrm{mis}}$ are the corresponding blocks of $\mathrm{vec}(X_i \beta)$.

2. Compute the $m$-vector probability of dropout for the candidate vector $y^{*}$ based on the dropout model

$$\mathrm{logit}\left\{P(D_i = t_i \mid H_{i t_i}, y^{*}; \phi)\right\} = \phi_0 + \phi_1 y_{i, t_i - 1} + \phi_2 y^{*}, \qquad (18)$$

where $y_{i, t_i - 1}$ comprises the $m$-vector of responses at $t_i - 1$; the occasion preceding the dropout occasion. The dropout probability vector, $p^{*}$, is evaluated using the inverse transformation in Eq. (12). Then define the acceptance probability from $p^{*}$; since the dropout probability is bounded by one, $p^{*}$ itself can serve as the acceptance probability.

3. Simulate a random $m$-vector $U$ from the uniform distribution over the interval $(0, 1)$, then accept $y^{*}$ if $U \le p^{*}$, otherwise repeat step 1.
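The following sketch illustrates the accept-reject procedure above. It is our own reading of the steps; in particular, taking the envelope constant as one (the logistic dropout probability is bounded by one) stands in for the constant defined in step 2:

```python
import numpy as np

def s_step_draw(mu_star, Sigma_star, y_prev, phi, rng):
    """Accept-reject draw of the m-vector at the dropout occasion,
    targeting the factorization in Eq. (16)."""
    while True:
        y_cand = rng.multivariate_normal(mu_star, Sigma_star)  # step 1, Eq. (17)
        eta = phi[0] + phi[1] * np.asarray(y_prev) + phi[2] * y_cand
        p = 1.0 / (1.0 + np.exp(-eta))                         # step 2, Eq. (18)
        u = rng.uniform(size=p.shape)                          # step 3
        if np.all(u <= p):                                     # envelope constant = 1
            return y_cand

# Usage sketch: rng = np.random.default_rng(0), with mu_star and Sigma_star
# from Eq. (17) and y_prev the m-vector at the pre-dropout occasion.
```

Because the acceptance probability equals the dropout probability itself, candidate values that make dropout likely are preferentially retained, which is exactly the behaviour required by the non-ignorable mechanism.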
We argue that the remaining missing values, following the first missing value, can be considered as missing at random.

The M-Step (Maximization step)

The joint probability function of the pseudo-observed data is

$$f(y_i, r_i; \theta, \phi) = f(y_i; \theta)\, f(r_i \mid y_i; \phi), \qquad (19)$$

where $y_i$ now collects the observed responses together with the simulated values. Since $f(y_i; \theta)$ does not depend on $\phi$ by construction, the above equation can be simplified to

$$l(\theta, \phi) = \sum_{i=1}^{N} \log f(y_i; \theta) + \sum_{i=1}^{N} \log f(r_i \mid y_i; \phi), \qquad (20)$$

and the estimates of $\theta$ and $\phi$ will be any functions of $y$ and $r$ that maximize the corresponding components. Moreover, based on the selection model, $\theta$ and $\phi$ are assumed to be distinct by definition. Hence, maximizing the pseudo-complete-data likelihood is the same as maximizing each of the two components separately.

The M-step of the algorithm comprises two sub-steps; the logistic step (M1-step), and the normal step (M2-step).

The M1-Step (the logistic step)
In this sub-step the MLEs of the dropout parameters, $\phi$, are estimated numerically using the Newton-Raphson approach. A step-halving algorithm is adopted to guarantee that the log-likelihood function increases toward the maximum over the iterations. Moreover, in the case when the Hessian matrix of the log-likelihood is not negative definite, the Newton direction $d = -H^{-1} g$ may not point in an ascent direction. Therefore, a modification is adopted, using the Levenberg-Marquardt technique, to ensure that the search direction is an ascent direction; hence the algorithm guides the process to the maximum point (Chong and Zak, 2013).

The basic Newton-Raphson algorithm is used to obtain the MLEs of the dropout parameters of the multivariate logistic regression model. The algorithm seeks to maximize the log-likelihood function

$$l(\phi) = \sum_{i \in C} \sum_{j=2}^{n_i} \log\left\{1 - p_{ij}(\phi)\right\} + \sum_{i \in D} \left[ \log p_{i t_i}(\phi) + \sum_{j=2}^{t_i - 1} \log\left\{1 - p_{ij}(\phi)\right\} \right], \qquad (21)$$

where $C$ refers to the subjects who complete the study (the completers) and $D$ refers to the dropout subjects. The probabilities vector $p_{ij}$ is computed using Eq. (11) for any occasion $j < t_i$, and from Eq. (18) at the occasion of dropout $t_i$. Implementing the algorithm needs computing the gradient vector $\nabla l(\phi)$ and the Hessian matrix $H(\phi)$. First, the elements of the gradient vector are obtained as

$$\nabla l(\phi) = \sum_{i} \sum_{j} \left( d_{ij} - p_{ij} \right) w_{ij},$$

where $d_{ij}$ indicates dropout of subject $i$ at occasion $j$ and $w_{ij}$ collects the intercept and the $m$-vectors of the responses at occasions $j$ and $j-1$. The elements of the Hessian matrix are obtained as

$$H(\phi) = -\sum_{i} \sum_{j} p_{ij} (1 - p_{ij})\, w_{ij} w_{ij}^{\top},$$

where the rows $w_{ij}$ are built from the response history matrix of order $n_i \times m$. It is worth noting that, at an initial value $\phi^{(0)}$ that might be chosen such that the Hessian matrix $H(\phi^{(0)})$ is not negative definite, the Levenberg-Marquardt modification of Newton's algorithm is adopted. That approach approximates $H$ with another matrix $H_{\lambda} = H - \lambda I$, where the scalar $\lambda$ is chosen large enough such that $H_{\lambda}$ is negative definite (Chong and Zak, 2013).
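A schematic implementation of this safeguarded Newton-Raphson ascent, with step-halving and the Levenberg-Marquardt shift, might look as follows. This is a sketch under the assumption that the user supplies callables for Eq. (21) and its first two derivatives, not the authors' code:

```python
import numpy as np

def newton_ascent(phi0, loglik, grad, hess, max_iter=50, tol=1e-4):
    """Newton-Raphson ascent with step-halving and a Levenberg-Marquardt
    shift (H - lambda*I) whenever the Hessian is not negative definite,
    as described for the M1-step."""
    phi = np.asarray(phi0, dtype=float)
    for _ in range(max_iter):
        g, H = grad(phi), hess(phi)
        if np.linalg.norm(g) < tol:         # stopping rule on the gradient
            break
        lam, I = 0.0, np.eye(len(phi))
        while np.max(np.linalg.eigvalsh(H - lam * I)) >= 0.0:
            lam = max(2.0 * lam, 1e-4)      # grow shift until H - lam*I < 0
        d = -np.linalg.solve(H - lam * I, g)
        step, f0 = 1.0, loglik(phi)
        while loglik(phi + step * d) < f0 and step > 1e-8:
            step *= 0.5                     # step-halving keeps an ascent step
        phi = phi + step * d
    return phi
```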
The M2-Step (the normal step)

In this sub-step the MLEs of the parameters $\theta$ for the multivariate normal model are estimated numerically using a newly proposed approach of Quasi-Newton type. The proposed Quasi-Newton algorithm begins with initial values $\theta^{(0)}$ and $B^{(0)}$, where $B^{(0)}$ is computed as

$$B^{(0)} = \left. \frac{\partial^2 Q(\theta \mid \theta^{(0)})}{\partial \theta\, \partial \theta^{\top}} \right|_{\theta = \theta^{(0)}}, \qquad (22)$$

where $\theta^{(0)}$ is the initial values vector of the parameter $\theta$ and the function $Q(\cdot \mid \cdot)$ is as in the EM algorithm. Generally, the algorithm proceeds as follows (a schematic implementation is given after the list):

1. Initiate the algorithm with $\theta^{(0)}$, then compute the corresponding $B^{(0)}$ and $g^{(0)} = \nabla l(\theta^{(0)})$, where $\nabla l$ is the gradient of the observed log-likelihood function.
2. Compute the step direction vector $d = -(B^{(t)})^{-1} g^{(t)}$, then use the line search algorithm to specify the step length $\alpha$ that maximizes $l(\theta^{(t)} + \alpha d)$. The golden-section algorithm is adopted here to specify the single parameter $\alpha$ that maximizes the log-likelihood along the search direction $d$ from the point $\theta^{(t)}$.
3. Set $\theta^{(t+1)} = \theta^{(t)} + \alpha d$, and compute $g^{(t+1)} = \nabla l(\theta^{(t+1)})$.
4. Based on $\theta^{(t+1)}$, the curvature matrix $B^{(t+1)}$ can be evaluated as $B^{(t+1)} = \partial^2 Q(\theta \mid \theta^{(t+1)}) / \partial \theta\, \partial \theta^{\top} \big|_{\theta = \theta^{(t+1)}}$.
5. Replace $\theta^{(t)}$ by $\theta^{(t+1)}$, $g^{(t)}$ by $g^{(t+1)}$, and $B^{(t)}$ by $B^{(t+1)}$, and return to step (2) or stop based on a stopping criterion.
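A schematic sketch of steps 1-5, with a golden-section line search, is given below. It assumes user-supplied callables for the observed log-likelihood, its gradient, and the second derivative of the $Q$-function; it is an illustration, not the authors' code:

```python
import numpy as np

def golden_section(f, a=0.0, b=1.0, tol=1e-5):
    """Golden-section search for the step length maximizing f on [a, b]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                        # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = f(x2)
        else:                              # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = f(x1)
    return 0.5 * (a + b)

def quasi_newton_em(theta0, loglik, grad, q_hess, max_iter=100, tol=1e-4):
    """EM-type Quasi-Newton ascent: curvature B from the second derivative
    of the Q-function (Eq. 22), gradient from the observed log-likelihood,
    step length by golden-section (steps 1-5 above)."""
    theta = np.asarray(theta0, dtype=float)
    g, B = grad(theta), q_hess(theta)                    # step 1
    for _ in range(max_iter):
        d = -np.linalg.solve(B, g)                       # step 2: direction
        alpha = golden_section(lambda a: loglik(theta + a * d))
        theta_new = theta + alpha * d                    # step 3
        g_new = grad(theta_new)
        B = q_hess(theta_new)                            # step 4
        converged = np.linalg.norm(g_new - g) < tol      # gradient-change rule
        theta, g = theta_new, g_new                      # step 5
        if converged:
            break
    return theta
```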
To implement the above algorithm, the quantities $\nabla l(\theta)$, $Q(\theta \mid \theta')$ and its derivatives are needed. Derivations of these quantities are obtained as follows. First, the EM quantity, $Q(\theta \mid \theta')$, can easily be deduced by taking the expectation of the log-likelihood of the complete data given the observed responses $y^{\mathrm{obs}}$ and the current values of the parameter vector $\theta'$. This function is given by

$$Q(\theta \mid \theta') = E\left[ l_c(\theta; y) \mid y^{\mathrm{obs}}, \theta' \right], \qquad (23)$$

where $l_c$ is the complete-data log-likelihood and $y$ indicates the complete-data response vector. To compute this expectation, set $\mu_i = \mathrm{vec}(X_i \beta)$. Further, the complete-data vector $y_i$ is partitioned into $(y_i^{\mathrm{obs}}, y_i^{\mathrm{mis}})$; similarly, $\mu_i = (\mu_i^{\mathrm{obs}}, \mu_i^{\mathrm{mis}})$. Based on the axioms of the conditional normal distribution, it can easily be shown that

$$E(y_i^{\mathrm{mis}} \mid y_i^{\mathrm{obs}}) = \mu_i^{\mathrm{mis}} + \Omega_i^{21} (\Omega_i^{11})^{-1} (y_i^{\mathrm{obs}} - \mu_i^{\mathrm{obs}}),$$

where $\Omega_i^{11}$ is the square submatrix of $\Omega_i$ corresponding to the observed elements, and $\Omega_i^{21}$ is the rectangular sub-matrix of covariances between the missing and the observed elements. On the other hand, it can also be shown that

$$\mathrm{Cov}(y_i^{\mathrm{mis}} \mid y_i^{\mathrm{obs}}) = \Omega_i^{22} - \Omega_i^{21} (\Omega_i^{11})^{-1} \Omega_i^{12},$$

where $\Omega_i^{22}$ is the square submatrix of covariances between the missing elements.
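These two conditional moments can be computed with a single linear solve against the observed block, as in the following sketch (our own helper; the index arrays and names are illustrative):

```python
import numpy as np

def conditional_moments(mu, Omega, obs, mis, y_obs):
    """Conditional mean and covariance of the missing block given the
    observed block of a joint normal vector, as used when evaluating the
    Q-function of Eq. (23). obs/mis are integer index arrays."""
    O11 = Omega[np.ix_(obs, obs)]
    O12 = Omega[np.ix_(obs, mis)]
    O21 = O12.T
    O22 = Omega[np.ix_(mis, mis)]
    # Solve once against [residual | O12] to reuse the factorization.
    W = np.linalg.solve(O11, np.column_stack([y_obs - mu[obs], O12]))
    cond_mean = mu[mis] + O21 @ W[:, 0]
    cond_cov = O22 - O21 @ W[:, 1:]
    return cond_mean, cond_cov
```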
The gradient of the observed log-likelihood, $\nabla l(\theta)$, is derived elementwise as

$$\frac{\partial l(\theta)}{\partial \theta_k} = \sum_{i=1}^{N} \left\{ \frac{\partial \mu_i^{\top}}{\partial \theta_k} \Omega_i^{-1} \epsilon_i - \frac{1}{2}\,\mathrm{tr}\left( \Omega_i^{-1} \frac{\partial \Omega_i}{\partial \theta_k} \right) + \frac{1}{2}\, \epsilon_i^{\top} \Omega_i^{-1} \frac{\partial \Omega_i}{\partial \theta_k} \Omega_i^{-1} \epsilon_i \right\},$$

where $\epsilon_i = y_i - \mu_i$, and $\partial \Omega_i / \partial \theta_k$ follows from Eq. (3) and Eq. (9). Further, the derivatives $\partial \Psi^{-1} / \partial \delta_l$ and $\partial \Sigma^{-1} / \partial \xi_s$ are replaced with $G_l$ and $H_s$, respectively, such that the scalar $\Delta_{rc}$, which is widely known as the Kronecker delta, equals 1 when $r = c$ and 0 otherwise, where $r, c = 1, \ldots, mq$ for $\Psi^{-1}$, while $r, c = 1, \ldots, m$ for $\Sigma^{-1}$.

The second derivatives of the EM $Q$-function are also needed. The first and the second derivatives of the EM $Q$-function are

$$\frac{\partial Q(\theta \mid \theta')}{\partial \theta_k} = E\left[ \frac{\partial l_c(\theta; y)}{\partial \theta_k} \,\middle|\, y^{\mathrm{obs}}, \theta' \right] \quad \text{and} \quad \frac{\partial^2 Q(\theta \mid \theta')}{\partial \theta_k\, \partial \theta_{k'}} = E\left[ \frac{\partial^2 l_c(\theta; y)}{\partial \theta_k\, \partial \theta_{k'}} \,\middle|\, y^{\mathrm{obs}}, \theta' \right],$$

where the conditional expectations are evaluated using the conditional mean and covariance given above.
For a parameter $\theta_k$, the SEM estimate is taken as the mean of the sequence $\{\theta_k^{(t)}\}$ after the burn-in period, as

$$\hat{\theta}_k = \frac{1}{T - T_0} \sum_{t = T_0 + 1}^{T} \theta_k^{(t)}, \qquad (24)$$

where $T_0$ is the length of the burn-in period and $T$ is the total number of SEM iterations. In the above derivations of the gradient vector and the derivatives of the EM $Q$-function, the quantities $\partial \Psi^{-1} / \partial \delta_l$ and $\partial \Sigma^{-1} / \partial \xi_s$ are replaced with $G_l$ and $H_s$, respectively. The scalar $\Delta_{rc}$ is the Kronecker delta, which equals 1 when $r = c$ and 0 otherwise, where $r, c = 1, \ldots, mq$ for $\Psi^{-1}$. Similarly, the scalar is defined for $\Sigma^{-1}$ with $r, c = 1, \ldots, m$. The vector $e_r$ is the basis vector of length $mq$ with the $r$th element equal to 1 and all other elements equal to zero.
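Computing the SEM estimate of Eq. (24) from a stored parameter trajectory is then a one-line average after the burn-in, e.g.:

```python
import numpy as np

def sem_estimate(chain, burn_in):
    """SEM point estimate of Eq. (24): average the parameter trajectory
    after discarding the burn-in iterations. chain has shape (T, dim)."""
    chain = np.asarray(chain)
    return chain[burn_in:].mean(axis=0)
```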
Simulation study

A simulation study is conducted to evaluate the proposed approach. Two sample sizes, $N = 25$ and $N = 50$, are used to generate data. The number of outcomes for each subject is assumed to be three; $m = 3$. The number of time points is fixed at 10; $n_i = 10$ for all subjects. The number of replications is chosen as 200. It is assumed that subjects are allocated to one of three treatments A, B, and C. Hence, there is a $3 \times 3$ matrix of fixed effects $\beta$ of the three covariates across the three outcomes. The subject-specific design matrix is assumed to have two columns ($q = 2$); the random intercept and slope, such that the two subject-specific covariates take one set of values for odd subjects and another for even subjects. Also, the random design matrix $Z_i$ is assumed to be time-invariant. Consequently, the individual-level matrix of random-effects parameters $b_i$ is of order $2 \times 3$. It represents the parameters of the random effects across outcomes. This matrix is generated for each subject as $\mathrm{vec}(b_i) \sim N(0, \Psi)$, where $\Psi$ is of dimension $6 \times 6$ and assumed to be an unstructured matrix. The measurement errors associated with the three outcomes are assumed to be independent over the repeated measures, having a common unstructured variance-covariance matrix $\Sigma$ of dimension $3 \times 3$. Hence, for each subject the matrix $E_i$ is generated from a matrix normal distribution with parameters $(0, I_{10}, \Sigma)$. The true values of the elements of the parameter matrices $\beta$, $\Sigma$ and $\Psi$ are fixed in advance.
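Generating one subject under this design reduces to drawing $\mathrm{vec}(b_i)$ from $N(0, \Psi)$ and the rows of $E_i$ from $N(0, \Sigma)$, as in the following sketch (our own illustration; the true parameter values used in the study are not reproduced here):

```python
import numpy as np

def simulate_subject(X, Z, beta, Sigma, Psi, rng):
    """Generate one subject's (n x m) response matrix under Eq. (25):
    vec(b_i) ~ N(0, Psi), rows of E_i iid N(0, Sigma)."""
    n, m, q = X.shape[0], beta.shape[1], Z.shape[1]
    b = rng.multivariate_normal(np.zeros(m * q), Psi)
    b = b.reshape((q, m), order="F")       # un-vec: columns were stacked
    E = rng.multivariate_normal(np.zeros(m), Sigma, size=n)
    return X @ beta + Z @ b + E
```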
The response matrix $Y_i$, for each subject, is generated from the model

$$Y_i = X_i \beta + Z_i b_i + E_i. \qquad (25)$$

The dropout process is governed by the multivariate logistic model

$$\mathrm{logit}(p_{ij}) = \phi_0 + \phi_1 y_{ij} + \phi_2 y_{i,j-1}, \qquad (26)$$

where $\phi = (\phi_0, \phi_1, \phi_2)$, whose components are associated with the intercept, the vector of responses that were planned to be measured at occasion $j$, and the vector of responses at the previous occasion $j - 1$, respectively, and $p_{ij}$ is a 3-column vector of the probabilities of dropout at occasion $j$. However, if the dropout occasion for subject $i$ is specified as $t_i$, then the dropout probability is computed as the probability of the dropout time $t_i$ over all possible occasions, as

$$P(D_i = t_i) = P(D_i = t_i \mid H_{i t_i}, y_{i t_i}) \prod_{v=2}^{t_i - 1} \left\{1 - P(D_i = v \mid H_{iv}, y_{iv})\right\}.$$
For each subject, the response matrix generated in the previous step is exposed to a non-random dropout process at the occasion level. Precisely, at occasion $j$, after the baseline occasion, the response vector $y_{ij}$ is checked as to whether it is retained or dropped, as follows (a code sketch is given after this subsection):

1. The initial values for the parameter vector of the logistic dropout model are assumed to be fixed at $\phi^{(0)}$.
2. The probability vector $p_{ij}$ is computed based on Eq. (26).
3. A random variable $U$ is generated from the uniform $(0, 1)$ distribution.
4. The response vector $y_{ij}$ is dropped if $U \le p_{ij}$, otherwise it is retained.

Simulation Results

The proposed approach has been applied to the generated data. For each of the two maximization steps, the stopping criterion suggested in Demidenko (2004) is used. Precisely, the iterations are terminated if the norm of the absolute difference of two successive values of the gradient vector at fixed parameter values is less than 0.0001. To accelerate the algorithm, the study adopted the Levenberg-Marquardt technique to modify each of the Hessian matrix of the dropout parameters and the second-derivative matrix of the $Q$-function of the model precision parameters, so that each is negative definite. The SEM algorithm is iterated for 500 iterations with a burn-in period of 250 iterations. The MLEs of the parameters are taken as the averages of the last 250 iterations. The performance of the proposed approach is evaluated via the relative bias (RB) of the obtained MLEs, which is computed as

$$\mathrm{RB}(\hat{\theta}_k) = \frac{\bar{\hat{\theta}}_k - \theta_k}{\theta_k} \times 100\%,$$

where $\bar{\hat{\theta}}_k$ is the average of the estimates over the 200 replications.
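The occasion-level dropout process (steps 1-4 above) and the relative-bias criterion can be sketched as follows. This is our own reading: the rule that all $m$ uniform draws must fall below the corresponding probabilities, and the percent form of RB, are assumptions of the sketch:

```python
import numpy as np

def apply_dropout(Y, phi, rng):
    """Impose monotone non-random dropout on an (n x m) response matrix
    following steps 1-4; returns the dropout occasion (0-based index),
    or n if the subject completes the study."""
    n = Y.shape[0]
    for j in range(1, n):                            # after the baseline
        eta = phi[0] + phi[1] * Y[j] + phi[2] * Y[j - 1]
        p = 1.0 / (1.0 + np.exp(-eta))               # Eq. (26)
        if np.all(rng.uniform(size=p.shape) <= p):   # step 4: drop
            return j                                 # occasions j..n-1 missing
    return n

def relative_bias(estimates, truth):
    """Relative bias (in percent) of replication-averaged estimates."""
    est = np.asarray(estimates)                      # shape (n_reps, dim)
    return 100.0 * (est.mean(axis=0) - np.asarray(truth)) / np.asarray(truth)
```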
The parameter estimates are displayed in Table (1) for N = 25 and Table (2) for N = 50.
The estimates of the intercept of the first outcome show the maximum bias in both tables (30.7% and 42.6%). For sample size N = 25, the estimates of the effect of the first treatment on the second and third outcomes show a somewhat large bias (44.1% and 42.1%, respectively). Generally, the second table shows less bias for most parameters (except for the estimate of the first intercept); the bias rarely exceeds approximately 27.0%. Thus, the proposed approach shows better performance with larger sample sizes.
A possible extension is to allow the covariance matrix of the measurement errors to be $\Sigma \otimes \Lambda$, where the matrix $\Lambda$ can take a simple autoregressive pattern. However, this will imply estimating more parameters. The performance of the proposed approach is evaluated by a simulation study assuming three outcomes. The study needs to be extended to check its applicability for a higher number of outcomes.