International Journal of Statistics and Applications
p-ISSN: 2168-5193 e-ISSN: 2168-5215
2018; 8(2): 42-52
doi:10.5923/j.statistics.20180802.02

Morteza Marzjarani
NOAA, National Marine Fisheries Service, Southeast Fisheries Science Center, Galveston Laboratory, 4700 Avenue U, Galveston, Texas, USA
Correspondence to: Morteza Marzjarani, NOAA, National Marine Fisheries Service, Southeast Fisheries Science Center, Galveston Laboratory, 4700 Avenue U, Galveston, Texas, USA.
Copyright © 2018 The Author(s). Published by Scientific & Academic Publishing.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/

Linear models and multiple imputation have become powerful tools for prediction and for estimating missing data points. This research studies how the two tools work together and then deploys them to estimate shrimp effort (actual hours of fishing per trip) in the Gulf of Mexico (GOM) for the years 2007 through 2014 using a simple form of a general linear model (GLM). Because vessel lengths and prices per pound were missing for some records, multiple imputation was used to estimate the missing data points, including the missing vessel lengths. An ad hoc method was also used to estimate the missing vessel lengths, and the results were compared with those obtained from imputation. As an application, a GLM with a few continuous and categorical variables was developed and used to produce the effort estimates. Additionally, the model was revised by including year as an independent variable, and the results were compared with the year-by-year estimates.
Keywords: Imputation, General Linear Models
Cite this paper: Morteza Marzjarani, Estimating Missing Values via Imputation: Application to Effort Estimation in the Gulf of Mexico Shrimp Fishery, 2007-2014, International Journal of Statistics and Applications, Vol. 8 No. 2, 2018, pp. 42-52. doi: 10.5923/j.statistics.20180802.02.
Figure 1. The Gulf of Mexico is divided into twenty-one statistical subareas (1-21) as shown.
Figure 2. Conversion of statistical subareas (1-21) and fathomzones (1-12) in the Gulf of Mexico to areas (1-4) and depths (1-3), respectively.
Figure 3. Combination of areas (1-4) and depths (1-3) in the Gulf of Mexico, called SEDAR cells.
where one vector represented the observed and missing vessel lengths (length), a second represented totlbs with all values known, and a third was an indicator vector with elements 1 if a vessel length was missing and 0 otherwise. For a detailed definition of MCAR (missing completely at random), the reader is referred to [14]. The 2008 data, with a moderate share of missing vessel lengths (7%), were selected and checked for the MCAR pattern. To confirm the pattern, Little’s test [15] was applied to the same data set, producing χ² = 0.509 with p-value 0.476. Little’s test was also applied to the 2007 data, producing χ² = 3.378 with p-value 0.066 (close to the 0.05 threshold, but still non-significant). Therefore, the missing pattern was assumed to be MCAR in all data sets used in this research. One could also have assumed the MAR (missing at random) condition as the missing pattern. Imputation methods offer different models or mechanisms depending on the missing data pattern, as discussed below. The reader is referred to [16] for a comparison of different imputation methods.

Figure 4a. A monotone missing pattern (. = missing value). Figure 4b. An arbitrary missing pattern (. = missing value).
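As a minimal illustration of the idea behind this check, the sketch below builds the missingness indicator (1 if a vessel length is missing, 0 otherwise) and compares totlbs between trips with and without a recorded length using a Welch t statistic. This is a simpler two-group stand-in and not Little’s test [15]; the data values and the helper names `missing_indicator` and `welch_t` are hypothetical and do not come from the study.

```python
from math import sqrt
from statistics import mean, variance


def missing_indicator(lengths):
    """Return 1 where a vessel length is missing (None), 0 otherwise."""
    return [1 if value is None else 0 for value in lengths]


def welch_t(sample_a, sample_b):
    """Welch t statistic for the difference in means of two samples."""
    se = sqrt(variance(sample_a) / len(sample_a)
              + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se


# Hypothetical trips: vessel length (None = missing) and total pounds landed.
lengths = [62.0, None, 58.5, 71.0, None, 65.0, 60.0, None, 68.0, 59.0]
totlbs = [4200.0, 3900.0, 5100.0, 6100.0, 4700.0,
          3800.0, 4500.0, 5200.0, 5600.0, 4100.0]

r = missing_indicator(lengths)
lbs_missing = [y for y, flag in zip(totlbs, r) if flag == 1]
lbs_observed = [y for y, flag in zip(totlbs, r) if flag == 0]

# A t statistic near zero is consistent with missingness that does not
# depend on totlbs; it does not by itself establish MCAR.
t_stat = welch_t(lbs_missing, lbs_observed)
print(f"share missing = {sum(r) / len(r):.0%}, Welch t = {t_stat:.3f}")
```

A formal assessment would use Little’s test across all variables, as was done in the analysis described above.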
|
![]() | (3) |
![]() | (4) |
The within-imputation variance is the variance where we do not account for the missing values; it is found by averaging the variance estimates from each complete set of imputed data. Another quantity of interest is the variance between imputations (B), where
![]() | (5) |
![]() | (6) |
![]() | (7) |
![]() | (8) |
determine the variability of
The ratio indicates how much information is missing; this fraction of missing information is denoted by δ. The relative efficiency of the estimate (RE) is defined as:
![]() | (9) |
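The quantities described above can be sketched in code using the standard combining rules from [1]. The sketch assumes m completed data sets, each yielding a point estimate and a variance estimate; the function name `pool_rubin`, the dictionary keys, and the example numbers are hypothetical, and the formulas are the textbook forms rather than a transcription of this paper's equations.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances (standard rules, [1])."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations of equal length")
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 divisor)
    total = u_bar + (1 + 1 / m) * b
    if b == 0:
        # No between-imputation spread: nothing is lost to missingness.
        r, nu, delta = 0.0, math.inf, 0.0
    else:
        r = (1 + 1 / m) * b / u_bar            # relative increase in variance
        nu = (m - 1) * (1 + 1 / r) ** 2        # degrees of freedom
        delta = (r + 2 / (nu + 3)) / (r + 1)   # fraction of missing information
    re = 1 / (1 + delta / m)                   # relative efficiency
    return {"estimate": q_bar, "within": u_bar, "between": b,
            "total": total, "df": nu, "delta": delta, "re": re}


# Hypothetical results from three imputed data sets.
pooled = pool_rubin([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
print(pooled)
```

Because RE depends on δ only through δ/m, a small number of imputations already gives high efficiency when the fraction of missing information is modest.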
![]() | (1) |
![]() | (2) |
Here, $Y$ is a column matrix of the natural logarithm of towdays, $X$ is an n × m matrix of regressors relating the vector of responses $Y$ to $\beta$, the vector of fixed and unknown parameters, and $\varepsilon$ is the error term. The vector $\varepsilon$ is assumed to be normally, independently, and identically distributed (iid) with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2 I$. In this model, length is the vessel length, ln(totlbs) is the natural logarithm of total pounds of shrimp per trip, wavgppnd is the weighted average price per pound of shrimp per trip, area is a categorical variable with four levels, and depth and trimester are categorical variables with three levels. The response variable is towdays, modeled on the natural-log scale.
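A minimal sketch of how such a model could be fitted is given below. It assumes reference (dummy) coding with the first level of each categorical variable as the baseline and solves the ordinary least-squares normal equations directly; the coding scheme, the function names `design_row`, `solve`, and `fit_ols`, the coefficient values, and the synthetic records are all hypothetical and are not the software or data used in the study.

```python
import math
import random


def design_row(rec):
    """Intercept, three continuous terms, then reference-coded dummies."""
    row = [1.0, rec["length"], math.log(rec["totlbs"]), rec["wavgppnd"]]
    row += [1.0 if rec["area"] == k else 0.0 for k in (2, 3, 4)]
    row += [1.0 if rec["depth"] == k else 0.0 for k in (2, 3)]
    row += [1.0 if rec["trimester"] == k else 0.0 for k in (2, 3)]
    return row


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ols(records):
    """Least-squares fit of ln(towdays) on the design matrix."""
    x = [design_row(rec) for rec in records]
    y = [math.log(rec["towdays"]) for rec in records]
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical coefficients and noise-free synthetic trips, for checking only.
true_beta = [-4.0, 0.01, 0.6, -0.05, 0.2, -0.1, 0.3, 0.15, -0.2, 0.05, 0.1]
rng = random.Random(0)
records = []
for i in range(60):
    rec = {"length": rng.uniform(40, 90), "totlbs": rng.uniform(200, 8000),
           "wavgppnd": rng.uniform(1, 6), "area": i % 4 + 1,
           "depth": i % 3 + 1, "trimester": (i // 4) % 3 + 1}
    eta = sum(b * v for b, v in zip(true_beta, design_row(rec)))
    rec["towdays"] = math.exp(eta)
    records.append(rec)

beta_hat = fit_ols(records)
print([round(b, 4) for b in beta_hat])
```

Predicted effort on the original scale is obtained by exponentiating the fitted values; any back-transformation bias correction is outside the scope of this sketch.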
Figure 5. Effort generated via GLM for the years 2007 through 2014 for different missing vessel length choices (one imputation, ten imputations, or the average of existing vessel lengths) using the monotone regression imputation method.
Figure 6. Impact of imputation models on effort (year as a predictor): regression, MCMC, FCS, and monotone propensity.
| [1] | Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys, New York: John Wiley & Sons, Inc. |
| [2] | Graybill, F. (1968). Theory and Application of the Linear Model, Duxbury Classic Series. |
| [3] | He, Y. (2011). Missing Data Analysis Using Multiple Imputation: Getting to the heart of the Matter, HHS Public Access, 1-16. |
| [4] | Horton, N. J., Kleinman, K. P. (2007). Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models, Am. Stat., 61(1), 79-90. |
| [5] | Graham, J. W., Hofer, S. M., Donaldson, S. I., MacKinnon, D. P., Schafer, J. L. (1997). Analysis with missing data in prevention research. In K. Bryant, M., Windle, S. West (Eds.), The science of prevention: methodological advances from alcohol and substance abuse research. |
| [6] | Wayman, J. C. (2002). The utility of educational resilience for studying degree attainment in School dropouts. Journal of Educational Research, 95 (3), 167-178. |
| [7] | Schafer, J. L., Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: A data analyst’s perspective. Multivariate Behavioral Research, 33 (4), 545-571. |
| [8] | Graham, J. W., Hofer, S. M. (2000). Multiple imputation in multivariate research. In R. J. Little, K. U. Schnabel, J. Baumert (Eds.), Modeling longitudinal and multiple-group data: Practical issues, applied approaches, and specific examples. Erlbaum, Hillsdale. |
| [9] | Sinharay, S., Stern, H. S., Russell, D. (2001). The use of multiple imputation for the analysis of missing data. Psychological Methods, 6 (4), 317-29. |
| [10] | Griffin, W. L., Shah, A. K., Nance, J. M. (1997). Estimation of Standardized Effort in the Heterogeneous Gulf of Mexico Shrimp Fleet, Marine Fisheries Review, (59) 3, 23-33. |
| [11] | Hart R. A., Nance, J. M. (2013). Three Decades of U.S. Gulf of Mexico White Shrimp, Litopenaeus setiferus, Commercial Catch Statistics, Marine Fisheries Review, 75 (4), 43-47. |
| [12] | Marzjarani, M. (2016). Higher Dimensional Linear Models: An Application to Shrimp Effort in the Gulf of Mexico (Years 2007-2014), International Journal of Statistics and Applications 2016, 6(3), 96-104. |
| [13] | Reid D. G., Graham N., Rihan D. J., Kelly E., Gatt, I. R., Griffin, F., Gerritsen, H. D., Kynoch, R. J. (2011). Do big boats tow big nets? ICES Journal of Marine Science, 68(8), 1663–1669. doi:10.1093/icesjms/fsr130. |
| [14] | Raghunathan, T. (2016). Missing Data Analysis in Practice, CRC Press. |
| [15] | Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association. 83 (404), 1198-1202. |
| [16] | Schmitt, P., Mandel, J., and Guedj, M. (2015). A Comparison of Six Methods for Missing Data Imputation, J Biom Biostat 6:224. doi: 10.4172/2155-6180.1000224. |
| [17] | Yuan, Y. (2011). Multiple Imputation Using SAS Software, Journal of Statistical Software, Vol. 45, Issue 46, pp. 1-25. |
| [18] | Rosenbaum, P. R., Rubin, D. B. (1983). “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika, 70, 41–55. |
| [19] | Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data, New York: Chapman and Hall. |
| [20] | Allison, P. D. (2000). “Multiple Imputation for Missing Data: A Cautionary Tale” Sociological Methods and Research, 28, 301–309. |
| [21] | van Buuren, S., Boshuizen, H. C., Knook, D. L. (2007). “Multiple Imputation of Missing Blood Pressure Covariates in Survival Analysis” Statistics in Medicine, 18, 681–694. |
| [22] | Rubin, D. B. (1996). Multiple Imputation after 18+ Years, Journal of the American Statistical Association, Vol. 91, No. 434, 473-489. |
| [23] | Patella, F. (1975). Water surface area within statistical subarea used in reporting Gulf coast shrimp data. Mar. Fish. Rev. 37(12), 22–24. |