International Journal of Statistics and Applications
p-ISSN: 2168-5193 e-ISSN: 2168-5215
2014; 4(1): 46-57
doi:10.5923/j.statistics.20140401.05
Gurprit Grover1, Alka Sabharwal2, Juhi Mittal1
1Department of Statistics, University of Delhi, Delhi, 110007, India
2Department of Statistics, Kirori Mal College, University of Delhi, Delhi, 110007, India
Correspondence to: Alka Sabharwal, Department of Statistics, Kirori Mal College, University of Delhi, Delhi, 110007, India.
Copyright © 2012 Scientific & Academic Publishing. All Rights Reserved.
Diabetic nephropathy (DN) is one of the major complications of type 2 diabetes. Studies have shown that duration of diabetes and serum creatinine (SrCr) are significant predictors of the renal health status of a patient. In this study we estimate the duration of diabetes of a patient on the basis of their latest renal health status. For this purpose we develop the joint distribution of three correlated random variables, namely duration of diabetes, SrCr and fasting blood glucose (FBG), for type 2 diabetic nephropathy patients. Two datasets are considered: the first gives complete information (from the time of diagnosis until termination of the study) and the second gives only the latest information (latest 19 months) about the renal health status of a patient. The complete information from the first dataset is used to estimate the duration of disease for the DN patients belonging to the second dataset. Multivariate analysis is applied to estimate these disease durations by first selecting appropriate distributions for the three random variables, then checking the normal approximation for each distribution, and finally checking multivariate normality by applying the Mardia test. The distributions of the three correlated random variables were found to be approximately normal and also jointly normal; therefore, the three-dimensional multivariate normal (MVN) distribution is considered an appropriate distribution for duration of diabetes, SrCr and FBG. Conditional expectation under the MVN distribution is applied to estimate the duration of diabetes for given values of SrCr and FBG. We also apply the bivariate normal (BVN) distribution, as a special case of the MVN distribution, to estimate the duration of diabetes on the basis of SrCr only. Finally, the estimated durations from the MVN and BVN distributions are compared graphically.
This estimation procedure will help the medical fraternity to guide patients who have an incomplete record history about their approximate duration of disease. It will also help in monitoring and evaluating the severity of the DN complication.
Keywords: Akaike information criterion, Bivariate normal distribution, Gamma distribution, Lognormal distribution, Mardia test, Multivariate normal distribution
Cite this paper: Gurprit Grover, Alka Sabharwal, Juhi Mittal, Application of Multivariate and Bivariate Normal Distributions to Estimate Duration of Diabetes, International Journal of Statistics and Applications, Vol. 4 No. 1, 2014, pp. 46-57. doi: 10.5923/j.statistics.20140401.05.
![]() | Figure 1. Algorithm to estimate the duration of disease of type 2 diabetic nephropathy patients |
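A random vector $\mathbf{X} = (X_1, X_2, \ldots, X_p)'$ is said to have a p-dimensional multivariate normal distribution with mean vector $\boldsymbol{\mu}$ and variance-covariance matrix $\Sigma$ if its joint probability density function is [14]
$$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\,\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\} \qquad (1)$$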
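Suppose $\mathbf{X}$, which can be partitioned as $\mathbf{X} = (\mathbf{X}_1', \mathbf{X}_2')'$, has a multivariate normal distribution $N_p(\boldsymbol{\mu}, \Sigma)$ with mean vector $\boldsymbol{\mu} = (\boldsymbol{\mu}_1', \boldsymbol{\mu}_2')'$ and variance-covariance matrix
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$
where $\mathbf{X}_1$ and $\mathbf{X}_2$ are two sub-vectors of dimensions q and p-q of $\mathbf{X}$ respectively. Define a transformation from $\mathbf{X}$ to new variables $\mathbf{X}_1$ and $\mathbf{X}_2^{*} = \mathbf{X}_2 - \Sigma_{21}\Sigma_{11}^{-1}\mathbf{X}_1$. This is achieved by the linear transformation
$$\begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2^{*} \end{pmatrix} = \begin{pmatrix} I_q & 0 \\ -\Sigma_{21}\Sigma_{11}^{-1} & I_{p-q} \end{pmatrix} \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix} = A\mathbf{X} \qquad (2)$$
Since $\mathbf{X}$ is MVN, $A\mathbf{X}$ is also MVN; hence, the linear transformation shows that $\mathbf{X}_1$ and $\mathbf{X}_2^{*}$ are jointly MVN distributed. Now, we can show that $\mathbf{X}_1$ and $\mathbf{X}_2^{*}$ are independent by proving that they are uncorrelated:
$$\mathrm{Cov}(\mathbf{X}_2^{*}, \mathbf{X}_1) = \mathrm{Cov}(\mathbf{X}_2, \mathbf{X}_1) - \Sigma_{21}\Sigma_{11}^{-1}\,\mathrm{Cov}(\mathbf{X}_1, \mathbf{X}_1) = \Sigma_{21} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{11} = 0.$$
Since $\mathbf{X}_1$ and $\mathbf{X}_2^{*}$ are MVN variables and uncorrelated, they are independent. Thus,
$$E(\mathbf{X}_2^{*} \mid \mathbf{X}_1) = E(\mathbf{X}_2^{*}) = \boldsymbol{\mu}_2 - \Sigma_{21}\Sigma_{11}^{-1}\boldsymbol{\mu}_1.$$
Now, as $\mathbf{X}_2 = \mathbf{X}_2^{*} + \Sigma_{21}\Sigma_{11}^{-1}\mathbf{X}_1$, the conditional distribution of $\mathbf{X}_2$ given $\mathbf{X}_1 = \mathbf{x}_1$ is
$$\mathbf{X}_2 \mid \mathbf{X}_1 = \mathbf{x}_1 \;\sim\; N_{p-q}\left(\boldsymbol{\mu}_2 + \Sigma_{21}\Sigma_{11}^{-1}(\mathbf{x}_1 - \boldsymbol{\mu}_1),\; \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\right) \qquad (3)$$
[15].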
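The conditional distribution of one sub-vector of an MVN vector given the other can be sketched numerically as follows. This is a minimal illustration assuming NumPy is available; the function name `conditional_mvn` and its arguments are hypothetical and are not taken from the paper, which reports using SPSS and MATLAB.

```python
import numpy as np

def conditional_mvn(mu, sigma, given_idx, given_values):
    """Mean and covariance of the non-given components of an MVN vector,
    conditional on the components listed in given_idx (hypothetical helper)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    given_idx = np.asarray(given_idx, dtype=int)
    rest_idx = np.setdiff1d(np.arange(mu.size), given_idx)

    s_gg = sigma[np.ix_(given_idx, given_idx)]  # covariance of the given block
    s_rg = sigma[np.ix_(rest_idx, given_idx)]   # cross-covariance (rest, given)
    s_rr = sigma[np.ix_(rest_idx, rest_idx)]    # covariance of the remaining block

    # weights = s_rg @ inv(s_gg), computed with a linear solve rather than
    # an explicit inverse (s_gg is symmetric).
    weights = np.linalg.solve(s_gg, s_rg.T).T
    diff = np.asarray(given_values, dtype=float) - mu[given_idx]

    cond_mean = mu[rest_idx] + weights @ diff
    cond_cov = s_rr - weights @ s_rg.T
    return cond_mean, cond_cov
```

For a three-variable vector ordered as (t, SrCr, FBG), calling `conditional_mvn(mu, sigma, [1, 2], [srcr, fbg])` would return the conditional mean and variance of t for the supplied SrCr and FBG values.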
and
respectively by the following probability density function[14]: ![]() | (4) |
If the shape parameter γ is large compared to λ, the Gamma distribution tends to a normal distribution [16]. In the case of the lognormal distribution, if the arithmetic mean m is much larger than the arithmetic standard deviation s, the distribution tends to Normal(m, s²). A general rule of thumb for this approximation is m > 6s. If X is a random variable following a lognormal distribution with location parameter µ and scale parameter σ, with pdf
$$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left\{-\frac{(\ln x - \mu)^2}{2\sigma^2}\right\}, \quad x > 0,$$
then its mean and standard deviation are
$$m = \exp\left(\mu + \frac{\sigma^2}{2}\right)$$
and
$$s = m\sqrt{\exp(\sigma^2) - 1}.$$
Then, if $m > 6s$, X is approximately distributed as $N(m, s^2)$ [17].
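The lognormal rule of thumb can be checked with a few lines of code. The sketch below uses only the standard formulas for the lognormal mean and standard deviation; the function names are hypothetical, and the threshold 6 is the rule of thumb quoted in the text.

```python
import math

def lognormal_moments(mu, sigma):
    """Arithmetic mean m and standard deviation s of a lognormal variable
    with location mu and scale sigma (parameters of ln X)."""
    m = math.exp(mu + sigma ** 2 / 2.0)
    s = m * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return m, s

def lognormal_is_near_normal(mu, sigma, ratio=6.0):
    """Apply the rule of thumb m > ratio * s."""
    m, s = lognormal_moments(mu, sigma)
    return m > ratio * s
```

Because m/s equals 1/sqrt(exp(σ²) - 1), the rule m > 6s depends only on σ and is equivalent to σ² < ln(37/36), that is, σ below roughly 0.166.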
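For a sample $\mathbf{x}_1, \ldots, \mathbf{x}_n$ of p-dimensional observations, Mardia's measures of multivariate skewness and kurtosis are
$$b_{1,p} = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left[(\mathbf{x}_i - \bar{\mathbf{x}})'\,S^{-1}(\mathbf{x}_j - \bar{\mathbf{x}})\right]^3, \qquad b_{2,p} = \frac{1}{n}\sum_{i=1}^{n}\left[(\mathbf{x}_i - \bar{\mathbf{x}})'\,S^{-1}(\mathbf{x}_i - \bar{\mathbf{x}})\right]^2,$$
where $\bar{\mathbf{x}}$ is the sample mean vector and $S$ is the sample variance-covariance matrix. Mardia (1970) has shown that, for large samples, the statistic $n\,b_{1,p}/6$ follows a $\chi^2$ distribution with p(p+1)(p+2)/6 degrees of freedom and $\left(b_{2,p} - p(p+2)\right)/\sqrt{8p(p+2)/n}$ follows a standard normal distribution. Thus, these two measures allow one to test two hypotheses that are compatible with the assumption of normality [10, 18].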
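A minimal sketch of Mardia's two statistics is given below, assuming NumPy is available and using the covariance matrix with divisor n. The function name `mardia_statistics` and the returned keys are hypothetical. The chi-square p-value for the skewness statistic is not computed here, since it needs a chi-square survival function from a statistics library.

```python
import math
import numpy as np

def mardia_statistics(x):
    """Mardia's multivariate skewness and kurtosis statistics for an
    (n, p) data matrix x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    centered = x - x.mean(axis=0)
    s = centered.T @ centered / n                    # covariance with divisor n
    # d[i, j] = (x_i - xbar)' S^{-1} (x_j - xbar)
    d = centered @ np.linalg.solve(s, centered.T)

    b1 = (d ** 3).sum() / n ** 2                     # multivariate skewness
    b2 = (np.diag(d) ** 2).sum() / n                 # multivariate kurtosis

    skew_stat = n * b1 / 6.0                         # approx. chi-square under normality
    skew_df = p * (p + 1) * (p + 2) / 6.0
    kurt_z = (b2 - p * (p + 2)) / math.sqrt(8.0 * p * (p + 2) / n)
    kurt_p = math.erfc(abs(kurt_z) / math.sqrt(2.0))  # two-sided normal p-value
    return {"b1": b1, "b2": b2, "skew_stat": skew_stat,
            "skew_df": skew_df, "kurt_z": kurt_z, "kurt_p": kurt_p}
```

With three variables (p = 3) the skewness statistic is referred to a chi-square distribution with 10 degrees of freedom.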
The normal approximation for t is also judged by the graph presented in figure 2, which supports the claim that the normal distribution is a good approximation for duration of diabetes. The distribution of SrCr also tends to normal, since for its lognormal distribution the ratio of mean to standard deviation is large (> 6). Applying the results from section 2.3, we can say that SrCr is approximately normally distributed. The same is depicted by the graph presented in figure 3. Hence, all three random variables are marginally normally distributed, as FBG was also found to be normally distributed with parameters µ = 170.9643 and σ = 21.4293.
![]() | Figure 2. Normal approximation for Gamma distribution |
![]() | Figure 3. Normal approximation for Lognormal distribution |
and
(p-values > 0.05) respectively, indicating that t, SrCr and FBG are jointly normally distributed. The variables duration of diabetes, SrCr and FBG are thus marginally normally distributed with significant correlation coefficients (p-values < 0.05) and are also jointly normally distributed. Therefore, the MVN distribution is a suitable model for representing the joint distribution of t, SrCr and FBG and is defined as,![]() | (6) |
$$E(t \mid \mathbf{x}_1) = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(\mathbf{x}_1 - \boldsymbol{\mu}_1) \qquad (7)$$
where, taking $\mathbf{X}_1 = (\text{SrCr}, \text{FBG})'$ and $X_2 = t$, $\mu_2$ represents the mean value of t calculated from the simulated data corresponding to a specific time interval, $\mathbf{x}_1$ is the observed value of SrCr and FBG from the data corresponding to that interval, $\boldsymbol{\mu}_1$ is the mean value of SrCr and FBG, and $\Sigma_{21}$ (the covariances of t with SrCr and FBG) and $\Sigma_{11}$ (the variance-covariance matrix of SrCr and FBG) are calculated from the generated sample corresponding to that interval. The procedure of calculation for the first time interval t ≤ 8 is illustrated below:
1. The mean duration of diabetes is calculated from the data of 60 DN patients, for only those patients whose duration of diabetes is less than or equal to 8 years, and is found to be 7.400 years.
2. The SrCr values for these patients range from 1.6500 to 2.2000 mg/dl with mean value 1.925 mg/dl, and the FBG values for these patients range from 133 to 199 mg/dl with mean value 166 mg/dl.
3. Mean and standard deviation of t, SrCr and FBG and the covariances between t, SrCr and FBG are calculated from the simulated sample corresponding to the ranges t ≤ 8, 1.6500 ≤ SrCr ≤ 2.2000 and 133 ≤ FBG ≤ 199.
4. The conditional expectation of t given SrCr and FBG is obtained by substituting the above values in equation (7). This gives the mean duration of diabetes for DN patients whose observed duration of disease is less than or equal to 8 years with their mean SrCr and FBG values known.

Following the above procedure, the mean durations for all time intervals are estimated and presented in table 4.
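The interval-wise procedure above can be sketched as follows, assuming NumPy and a simulated sample stored as an array with columns ordered (t, SrCr, FBG). The function name `interval_conditional_mean`, its arguments, and the minimum subset size are hypothetical choices made for illustration; they are not the authors' implementation.

```python
import numpy as np

def interval_conditional_mean(sample, t_max, srcr_range, fbg_range, x_obs):
    """Plug-in conditional mean of t given observed (SrCr, FBG), using only
    the simulated rows that fall inside the stated ranges."""
    sample = np.asarray(sample, dtype=float)
    t, srcr, fbg = sample[:, 0], sample[:, 1], sample[:, 2]
    keep = (
        (t <= t_max)
        & (srcr >= srcr_range[0]) & (srcr <= srcr_range[1])
        & (fbg >= fbg_range[0]) & (fbg <= fbg_range[1])
    )
    subset = sample[keep]
    if subset.shape[0] < 4:
        # Arbitrary guard (an assumption) so the covariance matrix is estimable.
        raise ValueError("too few simulated points in the requested ranges")

    means = subset.mean(axis=0)            # means of t, SrCr, FBG in the subset
    cov = np.cov(subset, rowvar=False)     # 3 x 3 sample covariance matrix
    cov_t_x = cov[0, 1:]                   # covariances of t with SrCr and FBG
    cov_x_x = cov[1:, 1:]                  # covariance matrix of SrCr and FBG

    diff = np.asarray(x_obs, dtype=float) - means[1:]
    return float(means[0] + cov_t_x @ np.linalg.solve(cov_x_x, diff))
```

For the first interval described in the text, the call would use `t_max=8`, `srcr_range=(1.65, 2.20)`, `fbg_range=(133, 199)` and `x_obs=(1.925, 166)`, with `sample` drawn from the fitted three-dimensional normal distribution.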
) and 138 mg/dl (
) respectively.
2. Mean and standard deviation of t, SrCr and FBG and the covariances between t, SrCr and FBG are calculated from the simulated sample corresponding to the ranges 1.4000 ≤ SrCr ≤ 1.4200 and 126 ≤ FBG ≤ 138.
3. The conditional expectation of t given SrCr and FBG is obtained by substituting the above values in equation (7). This gives the mean duration of disease for the first DN patient with known SrCr and FBG values.

The mean durations of disease for the 14 DN patients are calculated by applying the above procedure and are presented in table 6.
; p-values > 0.05). Thus, the BVN distribution is a suitable model for representing the joint distribution of t and SrCr and is defined as
$$f(t, x) = \frac{1}{2\pi\sigma_t\sigma_x\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{t-\mu_t}{\sigma_t}\right)^2 - 2\rho\,\frac{(t-\mu_t)(x-\mu_x)}{\sigma_t\sigma_x} + \left(\frac{x-\mu_x}{\sigma_x}\right)^2\right]\right\} \qquad (8)$$
The conditional expectation of t given SrCr is
$$E(t \mid x) = \mu_t + \rho\,\frac{\sigma_t}{\sigma_x}(x - \mu_x) \qquad (9)$$
where $\mu_t$ and $\sigma_t$ represent the mean and standard deviation of t calculated from the simulated sample corresponding to a specific time interval, $x$ is the observed value of SrCr from the data corresponding to that interval, $\mu_x$ and $\sigma_x$ are the mean and standard deviation of SrCr respectively, calculated from the generated sample corresponding to that interval, and $\rho$ is the correlation coefficient between t and SrCr. The procedure of calculation for the first time interval t ≤ 8 is illustrated below:
1. The SrCr values for these patients range from 1.6500 to 2.2000 mg/dl with mean value 1.925 mg/dl.
2. Mean and standard deviation of t and SrCr are calculated from the simulated data corresponding to the ranges t ≤ 8 and 1.6500 ≤ SrCr ≤ 2.2000.
3. The conditional expectation of t given SrCr is obtained by substituting the above values in equation (9). This gives the mean duration of diabetes for DN patients whose observed duration of disease is less than or equal to 8 years with their mean SrCr value known.

The mean durations for all intervals are presented in table 5. The estimated durations of 60 DN patients obtained from MVN and BVN are compared graphically with the observed durations and are presented in figure 4.
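For the bivariate case the conditional expectation reduces to a one-line formula. The sketch below uses the standard BVN result in terms of the correlation coefficient; the function and parameter names are hypothetical and the inputs would come from the simulated sample for the interval of interest.

```python
import math

def bvn_conditional_mean(mu_t, sd_t, mu_x, sd_x, rho, x):
    """E(t | SrCr = x) under a bivariate normal model for (t, SrCr)."""
    return mu_t + rho * (sd_t / sd_x) * (x - mu_x)

def bvn_conditional_sd(sd_t, rho):
    """Standard deviation of t given SrCr; it does not depend on x."""
    return sd_t * math.sqrt(1.0 - rho ** 2)
```

When the correlation is zero the estimate collapses to the unconditional mean of t, which is a quick sanity check on any fitted parameters.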
2. Mean and standard deviation of t and SrCr are calculated from the simulated sample corresponding to the range 1.4000 ≤ SrCr ≤ 1.4200.
3. The conditional expectation of t given SrCr is obtained by substituting the above values in equation (9). This gives the mean duration of disease for the first DN patient with known SrCr value.

The mean durations of disease for the 14 DN patients are presented in table 6. These estimated durations of diabetes are further compared graphically with those estimated by applying the MVN distribution and are presented in figure 5. The calculations and analysis were performed using the statistical packages SPSS for Windows, version 15, and MATLAB, version 6.5.
![]() | Figure 4. Comparison of observed and estimated duration of diabetes for 60 DN patients of dataset 1 by applying BVN and MVN distributions |
![]() | Figure 5. Comparison of estimated duration of disease for 14 DN patients of dataset 2 by applying BVN and MVN distributions |
| [1] | Zhuo, L., Zou, G., Li, W., Lu, J., and Ren, W., 2013, Prevalence of diabetic nephropathy complicating non-diabetic renal disease among Chinese patients with type 2 diabetes mellitus, European Journal of Medical Research, 18(4), 1-8. |
| [2] | Alwakeel, J.S., Isnani, A.C., Alsuwaida, A., AlHarbi, A., Shaikh, S.A., AlMohaya, S., and Ghonaim, M.A., 2011, Factors affecting the progression of diabetic nephropathy and its complications: A single-center experience in Saudi Arabia, Ann Saudi Med, 31(3), 236-242. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3119962/ |
| [3] | Mogensen, C.E., 1999, Microalbuminuria, blood pressure, and diabetic renal disease: origin and development of ideas, Diabetologia, 42, 263-285. |
| [4] | The Diabetes Control and Complications Trial Research Group, 1993, The effect of intensive treatment of diabetes on the development and progression of long-term complications in insulin-dependent diabetes mellitus, NEJM, 329, 977-986. |
| [5] | United Kingdom Prospective Diabetes Study (UKPDS) Group, 1998, Intensive blood glucose control with sulphonylureas or insulin compared with conventional treatment and risk of complications in patients with type-2 diabetes (UKPDS 33). Lancet, 352:837-853[Erratum. Lancet. 1999;354:602]. |
| [6] | Mulec, H., Blohme, G., Grandi, B., and Bjorck, S., 1998, The effect of metabolic control on rate of decline in renal function in insulin-dependent diabetes mellitus with overt diabetic nephropathy, Nephrol Dial Transplant, 13, 651-655. |
| [7] | Grover, G., Gadpayle, A.K., and Sabharwal, A., 2012, Identifying patients with diabetic nephropathy based on serum creatinine in the presence of covariates in type-2 diabetes: A retrospective study, Biomed Res- India, 23(4), 615-624. |
| [8] | Grover, G., Gadpayle, A.K., and Sabharwal, A., 2010, Identifying patients with diabetic nephropathy based on serum creatinine under zero truncated model, EJASA, 3(1), 28 – 43. |
| [9] | Multivariate analysis concepts, http://support.sas.com/publishing/pubcat/chaps/56903.pdf |
| [10] | Eye, A.V., and Bogat, G.A., 2004, Testing the assumption of multivariate normality. Psychology Science, 2, 243-258. http://www.pabst-publishers.de/psychology-science/2-2004/ps_2_2004_243-258.pdf. |
| [11] | Lipow, M., and Eidemiller, R.L., 1964, Application of the bivariate normal distribution to a stress vs strength problem in reliability analysis, Technometrics, 6(3), 325-328. http://www.jstor.org/stable/1266043. |
| [12] | Yue, S., 1999, Applying bivariate normal distribution to flood frequency analysis, Water International, 24(3), 248-254. http://dx.doi.org/10.1080/02508069908692168. |
| [13] | Bradburn, M.J., Clark, T.G., Love, S.B., Altman, D.G., 2003, Survival analysis part III: Multivariate data analysis-choosing a model and assessing its adequacy and fit, British Journal of Cancer, 89, 605-11, doi:10.1038/sj.bjc.6601120. |
| [14] | Johnson, N.L., and Kotz, S., Distributions in statistics: Continuous multivariate distributions, John Wiley & Sons, New York, 1972. |
| [15] | Multivariate normal distribution. http://www.maths.manchester.ac.uk/~mkt/MT3732%20(MVA)/Notes/MVA_Section3.pdf |
| [16] | http://www.johndcook.com/normal_approx_to_gamma.html. |
| [17] | http://www.vosesoftware.com/modelriskhelp/index.htm |
| [18] | Mardia, K.V., 1970, Measures of multivariate skewness and kurtosis with applications, Biometrika, 57, 519-530. http://www.jstor.org/discover/10.2307/2334770?uid=3738256&uid=2&uid=4&sid=21102677097813. |