International Journal of Probability and Statistics
p-ISSN: 2168-4871 e-ISSN: 2168-4863
2018; 7(1): 19-30
doi:10.5923/j.ijps.20180701.03

Morteza Marzjarani
NOAA, National Marine Fisheries Service, Southeast Fisheries Science Center, Galveston Laboratory, Galveston, USA
Correspondence to: Morteza Marzjarani, NOAA, National Marine Fisheries Service, Southeast Fisheries Science Center, Galveston Laboratory, Galveston, USA.
Copyright © 2018 Scientific & Academic Publishing. All Rights Reserved.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/

This article presents an overview of the homoscedastic and heteroscedastic Generalized Linear Mixed Model (GLMM) and General Linear Model (GLM). Mathematical relations are defined which map categorical variables onto continuous covariates. It is shown that these relations can be used for different purposes, including the addition of pairwise interactions and higher order terms (such as nested terms) to the models. The impacts of these relations on the 1984 through 2001 shrimp effort data in the Gulf of Mexico (GOM), year by year or all years together, are compared in the paper. These data sets are also checked for possible heteroscedasticity using the Breusch-Pagan and White tests. It was observed that these data sets show some degree of heteroscedasticity. The method of weighted least squares (WLS) was applied to these data sets, and shrimp efforts were estimated before and after the corrections for heteroscedasticity were made. In addition, it was shown that both the GLMM and the GLM represent these data sets in a satisfactory manner. Efforts generated via a GLMM and a GLM for both homoscedastic and heteroscedastic models showed that, although each data set was heteroscedastic, the severity of the heteroscedasticity was compromised when the data sets 1984 through 2001 were compared.
Keywords: Estimation, General linear models, Generalized linear mixed models, Heteroscedasticity
Cite this paper: Morteza Marzjarani, Heteroscedastic and Homoscedastic GLMM and GLM: Application to Effort Estimation in the Gulf of Mexico Shrimp Fishery, 1984 through 2001, International Journal of Probability and Statistics, Vol. 7 No. 1, 2018, pp. 19-30. doi: 10.5923/j.ijps.20180701.03.
Figure 1. The Gulf of Mexico is divided into twenty-one statistical areas (1-21) as shown
![]() | (1) |
![]() | (2) |
![]() | (3) |
stands for the “exclusive or” (exclusive disjunction, XOR) operator.
As an alternative to using Table 2 to revise fathomzone values above 12, the mode of the fathomzone values below 12 was found for each vessel (see formula (4)). All fathomzone values above 12 for that vessel were then replaced with the corresponding mode.
![]() | (4) |
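The mode-based revision described above can be sketched in a few lines of Python. This is only an illustration: the record layout (pairs of vessel id and fathomzone), the function name `replace_high_fathomzones`, and the choice to leave a vessel unchanged when it has no fathomzone below 12 are assumptions, not details taken from the paper.

```python
from collections import Counter, defaultdict

THRESHOLD = 12  # fathomzone values above this are revised


def replace_high_fathomzones(records, threshold=THRESHOLD):
    """Replace fathomzones above the threshold with the vessel's mode
    of its fathomzones below the threshold.

    records: list of (vessel_id, fathomzone) pairs (hypothetical layout).
    Vessels with no value below the threshold are left unchanged (assumption).
    """
    below = defaultdict(list)
    for vessel, zone in records:
        if zone < threshold:
            below[vessel].append(zone)
    # Counter.most_common resolves ties in favour of the value seen first.
    modes = {v: Counter(z).most_common(1)[0][0] for v, z in below.items()}
    revised = []
    for vessel, zone in records:
        if zone > threshold and vessel in modes:
            zone = modes[vessel]
        revised.append((vessel, zone))
    return revised


trips = [("A", 3), ("A", 3), ("A", 5), ("A", 20), ("B", 14), ("B", 7)]
revised = replace_high_fathomzones(trips)
```

In this toy input, vessel A's value of 20 becomes 3 (its most frequent value below 12) and vessel B's value of 14 becomes 7.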
where l<1, p>1, w<1.
• To present a more challenging scenario, suppose the experimenter knows from past experience that vessels whose vessel id number begins with 12345 are at least 60 feet long and usually fish in area 4, at depth 2, and in trimester 1. He/she wishes to set the vessel length to 60 whenever a recorded length is less than this number. Below is the algorithm for this hypothetical situation.
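One possible reading of this hypothetical rule is sketched below in Python. The field names (`vessel_id`, `area`, `depth`, `trimester`, `length`) and the decision to treat area, depth, and trimester as conditions that must all hold are assumptions made for illustration; the scenario could also be read as conditioning on the vessel id prefix alone.

```python
def revise_length(record, prefix="12345", min_length=60.0):
    """Return a copy of the record with length raised to min_length when the
    vessel id starts with the prefix and the trip matches the usual pattern
    (area 4, depth 2, trimester 1). Other records are returned unchanged."""
    matches = (
        str(record["vessel_id"]).startswith(prefix)
        and record["area"] == 4
        and record["depth"] == 2
        and record["trimester"] == 1
    )
    if matches and record["length"] < min_length:
        return {**record, "length": min_length}
    return dict(record)


# Hypothetical trip records.
short_trip = {"vessel_id": "1234567", "area": 4, "depth": 2, "trimester": 1, "length": 48.0}
other_trip = {"vessel_id": "9934567", "area": 4, "depth": 2, "trimester": 1, "length": 48.0}
fixed = revise_length(short_trip)
untouched = revise_length(other_trip)
```

Because the rule is written as an ordinary conditional rather than a single-valued formula, it is easy to extend with further conditions.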
Although all the examples mentioned above are functions, such relations do not have to be defined as functions, which gives the researcher even greater flexibility.
In order to develop a GLMM or a GLM for shrimp effort estimation using the 1984 through 2001 data sets individually or collectively, the variables length, lnlbs, wavgppnd, area, depth, and trimester, together with the first order interactions between the continuous variables, were included in the models as described below.
![]() | (5) |
is the response,
is the overall mean,
are constant observations,
‘s are error terms, and $z_{ip} = 1$ if $i = p$; 0 otherwise. It is more convenient to write this equation in matrix form:

$$y = X\beta + Z\gamma + \varepsilon \qquad (6)$$

Here $y$ is an $n \times 1$ column vector of observations, $X$ is an $n \times n_1$ matrix relating $y$ to $\beta$, and $\beta$ is an $n_1 \times 1$ column vector of the fixed portion of the model. Also, $Z$ is an $n \times n_2$ design matrix relating $y$ to $\gamma$, $\gamma$ is an $n_2 \times 1$ column vector of the random portion, and $\varepsilon$ is an $n \times 1$ column vector of the error terms, the variability in $y$ not explained by the portion $X\beta + Z\gamma$. In (6), $X\beta$ is the non-random portion of the model, $Z\gamma$ is the random effect, and $\varepsilon$ is the random error part of the model. As is usually the case, it is assumed that $\gamma$ is normally distributed with mean 0 and variance-covariance matrix $G$. It is also assumed that $\varepsilon$ is normally distributed with mean 0 and variance-covariance matrix $R$.

In this model, the fixed part includes the categorical variables area, depth, trimester, and year (where applicable). The random portion of the model includes the vessel length (length), lnlbs, wavgppnd, and their pairwise interactions or a mathematical relation. The matrices $G$ and $R$ are commonly known as the G-side and the R-side of the model. The random effect determines the G-side of the model variance, and it is defined by the RANDOM portion in the model. Furthermore, it is assumed that $\gamma$ and $\varepsilon$ are uncorrelated. Therefore, one can easily observe that

$$\mathrm{Var}(y) = Z G Z^{T} + R \qquad (7)$$
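When the random effects and the errors are uncorrelated, the variance of the response in a linear mixed model takes the standard form Z G Zᵀ + R, where Z is the random-effects design matrix and G and R are the G-side and R-side covariance matrices. The NumPy sketch below evaluates this for a toy layout. The matrices, their sizes, and the function name `marginal_covariance` are illustrative assumptions, not values from the shrimp data.

```python
import numpy as np


def marginal_covariance(Z, G, R):
    """Covariance of the response implied by uncorrelated random effects
    (covariance G) and errors (covariance R): Z G Z^T + R."""
    Z = np.asarray(Z, dtype=float)
    G = np.asarray(G, dtype=float)
    R = np.asarray(R, dtype=float)
    return Z @ G @ Z.T + R


# Toy example: four observations, two random-effect levels.
Z = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
G = 2.0 * np.eye(2)   # G-side: variance 2 for each random effect
R = 0.5 * np.eye(4)   # R-side: homoscedastic errors with variance 0.5
V = marginal_covariance(Z, G, R)
```

Observations that share a random-effect level are correlated through G, while the diagonal of V adds the error variance from R.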
![]() | (8) |
, the identity function is selected for $g(\cdot)$. The reason for having the link function in the GLMM is that, unlike in the GLM, the response variable in the GLMM does not need to be normally distributed and its range does not have to be the interval (−∞, +∞). Furthermore, the relationship between the predictors and the response does not have to be a simple relationship. The link function relates these components of the model in such a way that the non-linearly transformed mean $g(\cdot)$ ranges from −∞ to +∞. The model defined in (6) becomes equivalent to a randomized block design if the non-random portion of the model is dropped. Clearly, in the absence of the random portion of the model, equation (6) reduces to a GLM. When model (6) contains only random effects, the $G$ matrix is of interest. On the other hand, in a model with fixed effects only, the $R$ matrix is of interest. In the absence of the random effects, and if $R$ is a scalar multiple of $I$, where $I$ is an identity matrix (that is, a homoscedastic model), the GLMM reduces to a GLM or a GLM with overdispersion (this term will be defined later in this article). Using a similar approach as in the GLM, that is, maximum likelihood estimation (MLE), the parameters in (6) can be estimated as:
![]() | (9) |
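For a linear mixed model with known G and R, the usual estimators are the generalized least squares estimate of the fixed effects and the best linear unbiased predictor of the random effects. The sketch below computes both with NumPy. It is an illustration of these standard formulas under the assumption that G and R are known and that the fixed-effects design matrix has full column rank. It is not a reproduction of equation (9), and the function name and toy data are hypothetical.

```python
import numpy as np


def mixed_model_estimates(y, X, Z, G, R):
    """GLS estimate of fixed effects and BLUP of random effects, given G and R.

    beta_hat  = (X^T V^-1 X)^-1 X^T V^-1 y
    gamma_hat = G Z^T V^-1 (y - X beta_hat),  with V = Z G Z^T + R
    """
    V = Z @ G @ Z.T + R
    Vinv_X = np.linalg.solve(V, X)          # V^-1 X without forming V^-1
    Vinv_y = np.linalg.solve(V, y)          # V^-1 y
    beta_hat = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
    gamma_hat = G @ Z.T @ np.linalg.solve(V, y - X @ beta_hat)
    return beta_hat, gamma_hat


# Toy data with no noise, so the fixed effects are recovered exactly.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
Z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
G = 2.0 * np.eye(2)
R = 0.5 * np.eye(4)
y = X @ np.array([1.0, 2.0])
beta_hat, gamma_hat = mixed_model_estimates(y, X, Z, G, R)
```

In practice G and R are themselves estimated (for example by maximum likelihood), and these formulas are then evaluated at the estimated covariance matrices.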
was considered as follows:![]() | (10) |
are diagonal matrices. In the case of a homoscedastic model,
is an identity matrix. Some authors have considered some special cases. For example, [14] assumed that the
matrix for a fixed effect model with two covariates was in the form
where
with
With such a choice for the
matrix, the parameter δ measures the strength of the heteroscedasticity in the model: the lower its magnitude, the smaller the differences between individual variances. Of course, when δ = 0, all the variances are identical (homoscedastic model). In practice, the three values 1/2, 1, and 2 for δ are of particular interest.
![]() | (11) |
![]() | (12) |
![]() | (13) |
![]() | (14) |
contained all first order terms of continuous variables and their pairwise interactions and the vector
included all the categorical variables.
• For comparison, a GLM with all first order terms of the continuous variables length, lnlbs, wavgppnd, the categorical variables area, depth, trimester, and the pairwise interactions of the continuous variables was applied to the Match files.
• Both the GLMM and the GLM were fitted to the 1984 through 2001 data files and to the combined data file under the assumption of heteroscedasticity, and the results were compared with those obtained under the assumption of homoscedasticity.
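The heteroscedasticity check and the weighted least squares correction mentioned in the abstract can be illustrated with a small NumPy sketch. It uses the studentized (Koenker) form of the Breusch-Pagan statistic, n·R² from regressing the squared OLS residuals on the regressors. White's test follows the same idea with squares and cross products added to the auxiliary regression. The synthetic data, the function names, and the assumption that the error variance is proportional to x² are illustrative only and are not taken from the paper.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def breusch_pagan_lm(X, y):
    """Studentized Breusch-Pagan statistic: n * R^2 of squared OLS
    residuals regressed on X (X must include an intercept column)."""
    resid_sq = (y - X @ ols(X, y)) ** 2
    fitted = X @ ols(X, resid_sq)
    ss_res = np.sum((resid_sq - fitted) ** 2)
    ss_tot = np.sum((resid_sq - resid_sq.mean()) ** 2)
    return len(y) * (1.0 - ss_res / ss_tot)


def wls(X, y, variances):
    """Weighted least squares with weights 1 / variance."""
    w = 1.0 / np.asarray(variances, dtype=float)
    Xw = X * w[:, None]                       # rows of X scaled by weights
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)


# Synthetic data: deterministic alternating "errors" for reproducibility.
x = np.arange(1, 41, dtype=float)
X = np.column_stack([np.ones_like(x), x])
signs = np.where(np.arange(40) % 2 == 0, -1.0, 1.0)
y_het = 1.0 + 2.0 * x + x * signs   # error spread grows with x
y_hom = 1.0 + 2.0 * x + signs       # constant error spread

CHI2_CRIT_1DF = 3.841               # 5% critical value, chi-square, 1 df
lm_het = breusch_pagan_lm(X, y_het)
lm_hom = breusch_pagan_lm(X, y_hom)
beta_wls = wls(X, y_het, variances=x ** 2)  # assumed variance model
```

A statistic above the critical value points to heteroscedasticity, in which case the weighted fit downweights the observations with larger assumed variance.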
Figure 2. Efforts generated year by year via a GLMM or a GLM using Table 2 or Formula (4) to revise fathomzones over 12 (under the assumption of homoscedasticity)