International Journal of Statistics and Applications
p-ISSN: 2168-5193 e-ISSN: 2168-5215
2015; 5(1): 31-46
doi:10.5923/j.statistics.20150501.05
Haider R. Mannan
Centre for Chronic Disease Prevention, School of Public Health and Tropical Medicine, James Cook University, Cairns, Australia
Correspondence to: Haider R. Mannan, Centre for Chronic Disease Prevention, School of Public Health and Tropical Medicine, James Cook University, Cairns, Australia.
Copyright © 2015 Scientific & Academic Publishing. All Rights Reserved.
Age- and year-specific rates are widely used in epidemiological modelling studies. Because these rates are usually unstable due to small denominators, they require smoothing in both dimensions. We demonstrated the application of a two-dimensional nearest neighbour method for smoothing age- and year-specific cardiac procedure and death rates. SAS macros were provided for smoothing two rates successively; however, these can be adapted to smooth more than two rates, or event counts, if required. We found that for the example data sets the order of the moving average in both the year and age dimensions was three, and hence a nine-point weighted moving average was justified. We demonstrated that, in terms of better calibration and capturing important changes in the data, the proposed smoother outperformed a similar smoother that assigns the maximum weight to the central cell but equal weights around it. The degree of smoothing increased as the assigned central cell weight increased. In conclusion, because of its simplicity, the proposed nearest neighbour smoother provides a convenient alternative to existing two-dimensional smoothers and is useful in situations that require smoothing a series of rates or counts in two dimensions. A robust version of the smoother is also available from the author.
Keywords: Smoothing, Two dimensional, Rates, Event counts, Nearest neighbour, Cardiologic application, SAS macros
Cite this paper: Haider R. Mannan, Application and Computer Programs for a Simple Adaptive Two Dimensional Smoother: A Case Study for Cardiac Procedure and Death Rates, International Journal of Statistics and Applications, Vol. 5 No. 1, 2015, pp. 31-46. doi: 10.5923/j.statistics.20150501.05.
$$\tilde{p}_{x,t} \;=\; \sum_{i=-1}^{1}\sum_{j=-1}^{1} w_{ij}\, p_{x+i,\,t+j}, \qquad \sum_{i=-1}^{1}\sum_{j=-1}^{1} w_{ij} = 1 \qquad (1)$$

where $p_{x,t}$ is the observed rate for age group $x$ in calendar year $t$, and the $w_{ij}$ are the smoothing weights, with the central weight $w_{00}$ the largest.
When a binary (event or non-event) experiment is repeated a fixed number of times, say n times, the count of events follows a binomial distribution (and the observed proportion is simply that count scaled by n). Hence, the use of the binomial distribution for constructing the likelihood function above is justifiable. The weighted moving averages defined above assume only one lag and one lead in each dimension (when smoothing the rates), resulting in nine cells in the smoothing bandwidth. If higher lags and leads were considered in both dimensions, the bandwidth would increase; for example, 25 cells would be required to smooth the rates if two lags and two leads were used in each dimension.

In our case, the term -2logLF evaluated at the set of weights around the central cell that minimizes it is the deviance. By large-sample theory, it should have an asymptotic chi-squared distribution with the error degrees of freedom [14]; the chi-squared distribution is used in this context to assess goodness of fit.

In our examples of smoothing rates, shown in the next section, we fix the central cell weight at 0.35. The value 0.35 is arbitrary; the only criterion is that the central weight must be the maximum of all the weights. Fixing the central cell weight at 0.35 gives a maximum of 0.30 for any of the other weights, so the criterion of assigning the maximum weight to the central cell is satisfied. Values higher than 0.35 (but less than 1) could also have been used to fix the central cell weight; however, it should not be too large because of the risk of over-smoothing. A weight of around 0.35 for the central cell is expected to provide gentler smoothing.
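To make the selection criterion concrete, the two computations described above can be sketched in Python (a hypothetical translation of the logic implemented in the paper's SAS macros; the function names and array layout are assumptions, and the constant binomial coefficient is omitted since it does not affect comparisons between weight sets):

```python
import numpy as np

def nine_point_smooth(p, weights):
    """Nine-point weighted moving average over an age x year grid.

    p       : 2-D array of observed rates (rows = age groups, cols = years)
    weights : 3x3 array of smoothing weights summing to 1, with the
              central weight the largest.
    Only interior cells are smoothed (one lag and one lead in each
    dimension); border cells are left as NaN.
    """
    smoothed = np.full_like(p, np.nan, dtype=float)
    for a in range(1, p.shape[0] - 1):
        for t in range(1, p.shape[1] - 1):
            smoothed[a, t] = np.sum(weights * p[a - 1:a + 2, t - 1:t + 2])
    return smoothed

def neg2_log_lf(events, trials, p_hat):
    """-2 log binomial likelihood of the observed event counts under the
    smoothed probabilities (binomial coefficient omitted as a constant)."""
    mask = ~np.isnan(p_hat)
    d, n = events[mask], trials[mask]
    q = np.clip(p_hat[mask], 1e-12, 1 - 1e-12)  # guard against log(0)
    return -2.0 * np.sum(d * np.log(q) + (n - d) * np.log(1.0 - q))
```

Evaluating `neg2_log_lf` for each candidate weight set and keeping the minimizer reproduces, in outline, the deviance-based selection described above.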
Figure 1. Observed estimates of the probability of a CABG given history of CHD, by age group for males

Figure 3. Observed estimates of the probability of a CABG given history of CHD, by calendar year, males

Table 2. Deviance for the Nearest Neighbour Smoother with Unequal and Equal Distribution of Weights Around the Central Cell Based on Some Selected Transition Probabilities

Appendix 2: SAS Codes for Finding -2logLF for the Two Rates Using Different Weight Sets Defined in Appendix 1 

Appendix 3: A SAS Macro for Finding the Deviance and the Initial Optimal Weight Sets for Smoothing the Two Rates
Appendix 4: SAS Codes for Refining the Initial Optimal Weights by an Increment of .01, Given the Initial Optimal Weights for Smoothing the First Rate are .10, .05, .05, .15, .35, .10, .05, .05, .10 and for the Second Rate are .05, .05, .05, .30, .35, .05, .05, .05 and .05
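The refinement step can be sketched in Python (a hypothetical re-implementation, not the author's SAS code; the greedy pairwise search below is an assumption about how a .01-increment refinement around the initial optimum might proceed, keeping the weights summing to one and the central weight fixed and maximal):

```python
import itertools
import numpy as np

def refine_weights(w0, deviance, step=0.01):
    """One refinement pass: try moving `step` of weight from one
    non-central cell to another (the central weight stays fixed), and
    accept any candidate that lowers the deviance.

    w0       : initial optimal weights as a 9-vector (central cell at index 4)
    deviance : callable mapping a 9-vector of weights to -2 log LF
    """
    best_w, best_d = np.asarray(w0, dtype=float), deviance(w0)
    centre = 4  # index of the central cell in the 9-vector
    for i, j in itertools.permutations(range(9), 2):
        if centre in (i, j):
            continue  # central weight is held fixed at its assigned value
        cand = best_w.copy()
        cand[i] += step
        cand[j] -= step
        # keep weights non-negative and the central weight maximal
        if cand[j] < 0 or cand[i] > best_w[centre]:
            continue
        d = deviance(cand)
        if d < best_d:
            best_w, best_d = cand, d
    return best_w, best_d
```

Because each move transfers weight between two cells, the candidate weights always sum to one, as required.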
Appendix 5: SAS Codes for Finding -2logLF for the Two Rates Using Different Weight Sets Defined in Appendix 3

These codes are not shown here as they are the same as those in Appendix 2, except that the SAS file to be read is d and the output SAS dataset saved is e.

Appendix 6: A SAS Macro for Finding the Deviance and the Final Optimal Weight Sets for Smoothing the Two Rates

These codes are not shown here as they are identical to those in Appendix 3, except that the SAS file to be read is e.

Note: After running this macro, the optimal weights for the first rate are found as .09, .09, .01, .14, .35, .13, .01, .09, .09 and for the second rate as .05, .05, .01, .34, .35, .09, .05, .05 and .01.

Appendix 7: SAS Codes for Estimating the Smoothed Values of the Rates

data final;
/* We define an array for entering the data for two conditional probabilities. The examples given here are for Pr(CABG|CHD history, males) and Pr(CHD death|CHD history, females), both for years 1989 through 2001 and age groups 30-34 through 80-84. */
array p[2,9,11] p1-p198;
array w{9} (.09 .09 .01 .14 .35 .13 .01 .09 .09);
/* Calculate the nearest neighbour weighted moving averages for the two rates, respectively, for age group 35-39 and year 1990, age group 35-39 and year 1991, and so on until age group 75-79 and year 2000. */
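For readers without SAS, the array logic of Appendix 7 can be sketched in Python (a hypothetical translation, not the author's code; the 3 x 3 ordering of the nine-vector of final weights within the neighbourhood is an assumption for illustration):

```python
import numpy as np

# Final optimal weights for the first rate (Appendix 6), laid out as a
# 3 x 3 neighbourhood: rows = age lag/centre/lead, cols = year lag/centre/lead.
W = np.array([[0.09, 0.09, 0.01],
              [0.14, 0.35, 0.13],
              [0.01, 0.09, 0.09]])

def smooth_rate_grid(p, w=W):
    """Replace each interior cell of the observed-rate grid p with its
    nine-point weighted moving average, leaving the border cells at their
    observed values (the SAS codes likewise smooth only the interior
    age groups and years, e.g. ages 35-39 through 75-79 and years
    1990 through 2000)."""
    out = p.copy()
    for a in range(1, p.shape[0] - 1):
        for t in range(1, p.shape[1] - 1):
            out[a, t] = np.sum(w * p[a - 1:a + 2, t - 1:t + 2])
    return out
```

The same function applied with the second rate's final weight set would smooth the second transition probability.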