American Journal of Mathematics and Statistics

p-ISSN: 2162-948X    e-ISSN: 2162-8475

2013;  3(5): 281-287

doi:10.5923/j.ajms.20130305.05

Bayesian Hierarchical Modeling with 3PNO Item Response Models

Yanyan Sheng, Todd C. Headrick

Section on Statistics and Measurement, Southern Illinois University, Carbondale, IL, 62901, USA

Correspondence to: Yanyan Sheng, Section on Statistics and Measurement, Southern Illinois University, Carbondale, IL, 62901, USA.

Email:

Copyright © 2012 Scientific & Academic Publishing. All Rights Reserved.

Abstract

Fully Bayesian estimation has been developed for unidimensional IRT models. In this context, prior distributions can be specified in a hierarchical manner so that item hyperparameters are unknown and have their own priors. This type of hierarchical modeling is useful for the three-parameter IRT model because it reduces the difficulty of specifying hyperparameters that lead to adequate prior distributions. Further, hierarchical modeling ameliorates the nonconvergence problem associated with nonhierarchical models when appropriate prior information is not available. As such, a Fortran subroutine is provided to implement a hierarchical modeling procedure for the three-parameter normal ogive model for binary item response data using Gibbs sampling. Model parameters can be estimated with a choice of noninformative or conjugate prior distributions for the hyperparameters.

Keywords: IRT, Three-parameter Normal Ogive Model, MCMC, Gibbs Sampling, Hyperparameter, Fortran

Cite this paper: Yanyan Sheng, Todd C. Headrick, Bayesian Hierarchical Modeling with 3PNO Item Response Models, American Journal of Mathematics and Statistics, Vol. 3 No. 5, 2013, pp. 281-287. doi: 10.5923/j.ajms.20130305.05.

1. Introduction

The unidimensional item response theory (IRT) model provides a fundamental framework for modeling person-item interaction under the usual assumption of one latent dimension. The popular two-parameter normal ogive (2PNO; e.g.,[1,2]) IRT model specifies the probability of the i-th person obtaining a correct response on the j-th item as
P(yij = 1 | θi, αj, βj) = Φ(αjθi – βj), (1)
where i = 1,…,n, j = 1,…,k, and Φ(·) denotes the standard normal cumulative distribution function. The notations αj, βj, and θi in (1) are scalar parameters describing the item slope, the item intercept, and the person trait, respectively. Further, the model in (1) assumes that no guessing is involved with respect to the test item responses.
In terms of objective tests that involve multiple-choice or true-or-false items, where an item may be too difficult for some examinees, the three-parameter normal ogive (3PNO; e.g.,[3]) model should be considered. Specifically, the 3PNO model assumes that the probability of a correct response is greater than zero even for examinees with very low trait levels, and it is defined as
P(yij = 1 | θi, αj, βj, γj) = γj + (1 – γj)Φ(αjθi – βj). (2)
Inspection of (2) indicates that the 3PNO model accommodates guessing by adding the pseudo-chance-level parameter γj. As such, the model in (2) is more appealing because it applies to a wider variety of testing situations where the 2PNO model may not be appropriate.
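As a simple numerical illustration of the 3PNO response probability in (2), consider the following Python sketch. It is for exposition only (the paper's implementation is a Fortran subroutine), and the function names are ours:

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF, written via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def p_correct_3pno(theta, alpha, beta, gamma):
    # 3PNO: P(y = 1) = gamma + (1 - gamma) * Phi(alpha*theta - beta)
    return gamma + (1.0 - gamma) * phi(alpha * theta - beta)
```

For an average examinee (θ = 0) on an item with αj = 1, βj = 0, and γj = .2, the model gives .2 + .8Φ(0) = .6; as θ decreases, the probability approaches the floor γj rather than zero.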
In the context of Bayesian estimation of IRT models, simultaneous estimation of item and person parameters relies on the use of Markov chain Monte Carlo (MCMC; e.g.,[4,5]) techniques to summarize the posterior distributions. For example, Albert[10] applied an MCMC algorithm (the Gibbs sampler[11]) to the 2PNO model using data augmentation[12], and this approach has been implemented in Fortran[13]. Further, Sahu[14] (see also[15]) generalized the approach to the 3PNO model. However, this generalization, where the model hyperparameters take on specific values (as in the applications of[16] and[17]), suffers from nonconvergence unless strongly informative priors are specified for the item slope and intercept parameters[18].
In the context of the 3PNO model, it has been demonstrated that improper noninformative prior densities for the item slope and intercept parameters result in an undefined posterior distribution, which presents the problem of unstable parameter estimates[18, 19]. Further, even with proper informative prior densities, the Gibbs sampling procedure noted above either fails to converge or requires a large number of iterations for the Markov chain to reach convergence[19]. Sheng[20] indicated that this problem can be resolved by specifying the prior distribution in a hierarchical manner, so that the item hyperparameters are unknown and have their own prior distributions. These second-order priors are called hyperpriors and are useful for incorporating uncertainty in the hyperparameters of a prior distribution[6]. Further, it has been demonstrated that a vague hyperprior does not affect posterior precision the way a vague prior does[20]: because the uncertainty lies at the second level of the prior, its effect on the resulting posterior distribution is so small that the posterior is dominated by the data[21]. This hierarchical modeling approach also allows for more objective inference by estimating the parameters of prior distributions from the data rather than specifying them from subjective information.
Given that MCMC is computationally expensive and that Fortran is fast for numerical computing[22], the purpose of this paper is to provide a subroutine that determines the posterior estimates (and their standard errors) of the 3PNO model parameters using Gibbs sampling. The Fortran subroutine offers the choice between noninformative and conjugate hyperpriors for the item slope and intercept parameters.

2. Methodology

2.1. The Gibbs Sampling Procedure

To implement a Gibbs sampling procedure for the 3PNO model defined in (2), a Bernoulli variable W is first introduced such that Wij = 1 (or Wij = 0) if the i-th person knows (or does not know) the correct answer to the j-th item. The probability function associated with Wij is defined as[14]
P(Wij = 1 | θi, αj, βj) = Φ(ηij), (3)
where ηij = αjθi – βj. As such, if Wij = 0, then the i-th person guesses the j-th item correctly (or incorrectly) with probability γj (or 1 – γj). Further, a latent random variable Z is introduced such that Zij ~ N(ηij, 1)[10, 12], where Zij > 0 if Wij = 1 and Zij ≤ 0 if Wij = 0. The prior distributions associated with the item and person parameters are assumed to be as follows
θi ~ N(μ, σ²), αj ~ N(μα, σ²α)I(αj > 0), βj ~ N(μβ, σ²β),
γj ~ Beta(s, t).
Note that we consider models where prior distributions are assumed for the hyperparameters μα, μβ, σ²α and σ²β, instead of specifying fixed values for them.
The joint posterior distribution of (θ, ξ, γ, W, Z, µξ, ∑ξ), where ξj = (αj, βj)′, µξ =(µα, µβ)′, ∑ξ = diag(σ2α, σ2β), is
p(θ, ξ, γ, W, Z, μξ, ∑ξ | y) ∝ f(y | W, γ) p(W, Z | θ, ξ) p(θ) p(γ) p(ξ | μξ, ∑ξ) p(μξ) p(∑ξ), (4)
where f(y|W, γ) is the likelihood function.
The full conditional distribution of Wij, Zij, θi, ξj, and γj can be derived in closed form as follows:
P(Wij = 1 | yij, θi, ξj, γj) = yijΦ(ηij) / [Φ(ηij) + γj(1 – Φ(ηij))]; (5)
Zij | Wij, θi, ξj ~ N(ηij, 1), truncated at the left by 0 if Wij = 1 and at the right by 0 if Wij = 0; (6)
θi | Z, ξ ~ N((∑j αj(Zij + βj) + μ/σ²) / (∑j α²j + 1/σ²), (∑j α²j + 1/σ²)^(–1)); (7)
ξj | Z, θ, μξ, ∑ξ ~ N((x′x + ∑ξ^(–1))^(–1)(x′Zj + ∑ξ^(–1)μξ), (x′x + ∑ξ^(–1))^(–1))I(αj > 0), (8)
where x = [θ, –1] is the n × 2 matrix whose columns are θ and –1, and Zj is the j-th column of Z; and
γj | y, W ~ Beta(aj + s, bj – aj + t), (9)
where aj is the number of correct responses obtained by guessing, and bj is the number of persons who do not know the correct answer to the j-th item.
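To make the conditional draws in (5), (6), and (9) concrete, here is an illustrative Python sketch. The paper's code is Fortran; the helper names and the inverse-CDF method for the truncated normal draw are our choices, not the subroutine's:

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal; provides cdf and inv_cdf

def draw_w(y, eta, gamma, rng=random):
    # Eq. (5): W can be 1 only for a correct response (y = 1)
    if y == 0:
        return 0
    p = STD.cdf(eta)
    return 1 if rng.random() < p / (p + gamma * (1.0 - p)) else 0

def draw_z(w, eta, rng=random):
    # Eq. (6): Z ~ N(eta, 1) truncated to Z > 0 if w = 1, Z <= 0 if w = 0
    c = STD.cdf(-eta)                      # P(Z <= 0)
    u = rng.random()
    p = c + u * (1.0 - c) if w == 1 else u * c
    p = min(max(p, 1e-12), 1.0 - 1e-12)    # guard inv_cdf(0) / inv_cdf(1)
    return eta + STD.inv_cdf(p)

def draw_gamma(y_col, w_col, s=5, t=17, rng=random):
    # Eq. (9): a_j counts correct responses obtained by guessing,
    # b_j counts persons who do not know the answer (W = 0)
    a = sum(1 for y, w in zip(y_col, w_col) if y == 1 and w == 0)
    b = sum(1 for w in w_col if w == 0)
    return rng.betavariate(a + s, b - a + t)
```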
In terms of the hyperparameters μα, μβ, σ²α and σ²β, their full conditional distributions are
μα | α, σ²α ~ N(ᾱ, σ²α/k), (10)
μβ | β, σ²β ~ N(β̄, σ²β/k), (11)
σ²α | α, μα ~ Inv-Gamma((k – 2)/2, ∑j(αj – μα)²/2), (12)
σ²β | β, μβ ~ Inv-Gamma((k – 2)/2, ∑j(βj – μβ)²/2), (13)
respectively, where ᾱ and β̄ denote the means of the k item slopes and intercepts, with uniform noninformative hyperpriors p(μα, σ²α) ∝ 1 and p(μβ, σ²β) ∝ 1, or
μα | α, σ²α ~ N(τα∑jαj / (kτα + σ²α), τασ²α / (kτα + σ²α)), (14)
μβ | β, σ²β ~ N(τβ∑jβj / (kτβ + σ²β), τβσ²β / (kτβ + σ²β)), (15)
σ²α | α, μα ~ Inv-Gamma(ε1 + k/2, ε2 + ∑j(αj – μα)²/2), (16)
σ²β | β, μβ ~ Inv-Gamma(ς1 + k/2, ς2 + ∑j(βj – μβ)²/2), (17)
with conjugate hyperprior distributions μα ~ N(0, τα), μβ ~ N(0, τβ), σ²α ~ Inv-Gamma(ε1, ε2), and σ²β ~ Inv-Gamma(ς1, ς2).
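Assuming a normal hyperprior centered at zero for each mean and an inverse-gamma hyperprior for each variance (our reading of the conjugate specification), the standard conjugate updates can be sketched in Python; the names here are illustrative, not the paper's Fortran:

```python
import random

def draw_mu(values, sigma2, tau=100.0, rng=random):
    # Normal-normal update: prior mu ~ N(0, tau), values | mu ~ N(mu, sigma2)
    k = len(values)
    prec = k / sigma2 + 1.0 / tau          # posterior precision
    mean = (sum(values) / sigma2) / prec   # posterior mean
    return rng.gauss(mean, prec ** -0.5)

def draw_sigma2(values, mu, shape0=2.0, scale0=0.001, rng=random):
    # Normal-inverse-gamma update: prior sigma2 ~ Inv-Gamma(shape0, scale0)
    k = len(values)
    shape = shape0 + k / 2.0
    scale = scale0 + sum((v - mu) ** 2 for v in values) / 2.0
    # If X ~ Gamma(shape, rate=scale), then 1/X ~ Inv-Gamma(shape, scale)
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)
```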
As such, using starting values θ(0), ξ(0), γ(0), μξ(0) and ∑ξ(0), observations (W(l), Z(l), θ(l), ξ(l), γ(l), μξ(l), ∑ξ(l)) can be simulated with the Gibbs sampler by iteratively drawing from the full conditional distributions specified in (5) through (9), together with either (10) through (13) or (14) through (17). The transition from (W(l–1), Z(l–1), θ(l–1), ξ(l–1), γ(l–1), μξ(l–1), ∑ξ(l–1)) to (W(l), Z(l), θ(l), ξ(l), γ(l), μξ(l), ∑ξ(l)) is based on the following seven steps:
1. Draw W(l) ~ p(W|y, θ(l –1), ξ(l–1), γ(l–1));
2. Draw Z(l) ~ p(Z|W(l), θ(l –1), ξ(l–1));
3. Draw θ(l) ~ p(θ|Z(l), ξ(l–1));
4. Draw ξ(l) ~ p(ξ|Z(l), θ(l), µξ(l–1), ∑ξ(l–1));
5. Draw γ(l) ~ p(γ|y, W(l));
6. Draw µξ(l) ~ p(µξ|ξ(l),∑ξ(l–1));
7. Draw ∑ξ(l) ~ p(∑ξ|ξ(l), µξ(l)).
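The seven steps above can be wired together as follows. This Python sketch is for exposition only: it fixes every slope at αj = 1 and uses fixed hyperparameters (θi ~ N(0, 1), βj ~ N(0, 1), γj ~ Beta(s, t)), so it omits the hierarchical steps 6 and 7 and the bivariate ξ-draw of the actual Fortran subroutine:

```python
import random
from statistics import NormalDist

STD = NormalDist()

def gibbs_3pno_sketch(y, iters=100, s=5, t=17, seed=0):
    """Toy Gibbs sampler for a simplified 3PNO model (all slopes fixed at 1)."""
    rng = random.Random(seed)
    n, k = len(y), len(y[0])
    theta, beta, gamma = [0.0] * n, [0.0] * k, [0.2] * k
    draws = []
    for _ in range(iters):
        # Steps 1-2: draw the augmented W and Z
        w = [[0] * k for _ in range(n)]
        z = [[0.0] * k for _ in range(n)]
        for i in range(n):
            for j in range(k):
                eta = theta[i] - beta[j]
                p = STD.cdf(eta)
                if y[i][j] == 1 and rng.random() < p / (p + gamma[j] * (1.0 - p)):
                    w[i][j] = 1
                c = STD.cdf(-eta)          # P(Z <= 0)
                u = rng.random()
                q = c + u * (1.0 - c) if w[i][j] == 1 else u * c
                q = min(max(q, 1e-12), 1.0 - 1e-12)
                z[i][j] = eta + STD.inv_cdf(q)
        # Step 3: theta_i | Z, beta with prior N(0, 1) and unit slopes
        for i in range(n):
            prec = k + 1.0
            theta[i] = rng.gauss(sum(z[i][j] + beta[j] for j in range(k)) / prec,
                                 prec ** -0.5)
        # Step 4 (simplified): beta_j | Z, theta with prior N(0, 1)
        for j in range(k):
            prec = n + 1.0
            beta[j] = rng.gauss(sum(theta[i] - z[i][j] for i in range(n)) / prec,
                                prec ** -0.5)
        # Step 5: gamma_j | y, W ~ Beta(a_j + s, b_j - a_j + t)
        for j in range(k):
            a = sum(1 for i in range(n) if y[i][j] == 1 and w[i][j] == 0)
            b = sum(1 for i in range(n) if w[i][j] == 0)
            gamma[j] = rng.betavariate(a + s, b - a + t)
        draws.append((theta[:], beta[:], gamma[:]))
    return draws
```

Posterior estimates would then be averages of the retained draws after burn-in, exactly as described next.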
This iterative procedure produces a sequence of (θ(l), ξ(l), γ(l)), l = 1,…, L. To reduce the effect of the starting values, early iterations in the Markov chain are treated as burn-in and discarded. Samples from the remaining iterations are then used to summarize the posterior density of the item parameters ξ, γ and person parameters θ. As with standard Monte Carlo, with large enough samples, the posterior means of ξ, γ and θ serve as estimates of the true parameters. However, their Monte Carlo standard errors cannot be calculated using the sample standard deviations because successive samples in each Markov chain are autocorrelated (e.g.,[10, 23]). One approach to calculating the standard errors is batching[24]. Specifically, a long chain of samples is separated into contiguous batches of equal length, and the Monte Carlo standard deviation for each parameter is estimated by the standard deviation of the batch means. The Monte Carlo standard error of the estimate is then the Monte Carlo standard deviation divided by the square root of the number of batches.
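The batching calculation amounts to the following small Python sketch (`batch_mcse` is our name, not part of the Fortran subroutine):

```python
from statistics import mean, stdev

def batch_mcse(chain, n_batches=5):
    # Split the post-burn-in chain into contiguous, equal-length batches,
    # estimate the Monte Carlo SD by the SD of the batch means, and divide
    # by sqrt(number of batches) for the Monte Carlo standard error
    size = len(chain) // n_batches
    batch_means = [mean(chain[b * size:(b + 1) * size]) for b in range(n_batches)]
    return mean(batch_means), stdev(batch_means) / n_batches ** 0.5
```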

2.2. The Fortran Subroutine

The subroutine (see the Appendix) initially sets the starting values of the parameters to θi(0) = 0, αj(0) = 1, βj(0) = 1, γj(0) = .2[16], and μα(0) = μβ(0) = 0, σ²α(0) = σ²β(0) = 1. The subroutine then iteratively draws random samples for W, Z, θ, ξ, and γ from their respective full conditional distributions specified in (5) through (9), with μ = 0, σ² = 1, s = 5, and t = 17. Samples of the hyperparameters for ξ are simulated either from (10) through (13), where uniform noninformative priors are assumed for μξ and ∑ξ, or from (14) through (17), where conjugate priors are adopted with τα = τβ = 100, ε1 = ς1 = 2, and ε2 = ς2 = .001. We note that the conjugate priors specified in this manner are weakly informative. The algorithm continues until all L samples are simulated. It then discards the early burn-in samples and computes the posterior estimates and standard errors for the model parameters θ, α, β, and γ using the batching scheme described above.
Table 1. Posterior estimates and Monte Carlo standard errors (MCSEs) for α with noninformative and conjugate priors assumed for μξ and ∑ξ

Table 2. Posterior estimates and Monte Carlo standard errors (MCSEs) for β with noninformative and conjugate priors assumed for μξ and ∑ξ

Table 3. Posterior estimates and Monte Carlo standard errors (MCSEs) for γ with noninformative and conjugate priors assumed for μξ and ∑ξ
For example, for a 4,000-by-10 (i.e., n = 4,000 and k = 10) dichotomous (0-1) data matrix simulated using the item parameters shown in the first column of Tables 1 to 3, the Gibbs sampler was implemented so that 10,000 samples were simulated, with the first 5,000 taken as burn-in. The remaining 5,000 samples were separated into 5 batches, each with 1,000 samples.
Two sets of posterior means for α, β, and γ, along with their Monte Carlo standard errors, were obtained assuming the noninformative or the weakly informative hyperpriors described previously, and are displayed in the remaining columns of Tables 1 to 3. We note that the item parameters were estimated accurately and that the two sets of posterior estimates differ only slightly from each other, signifying that the results are not sensitive to the choice of prior distributions for the hyperparameters μξ and ∑ξ. In addition, the small Monte Carlo standard errors suggest that the Markov chains, with a run length of 10,000 and a burn-in period of 5,000, reached the stationary distribution.

3. Conclusions

The Fortran subroutine leaves it to the user to choose between uniform and conjugate priors for μξ and ∑ξ, the hyperparameters of the item slope and intercept parameters. Further, the user can change the source code so that the prior distribution for θi assumes a different location μ or scale σ². Similarly, the values of s and t can be modified to reflect different prior beliefs about the distribution of the pseudo-chance parameter. One can also change the values of τα, τβ, ε1, ε2, ς1 or ς2 to specify different prior densities for the hyperparameters μξ and ∑ξ. It is noted that convergence can be assessed by comparing the marginal posterior mean and standard deviation of each parameter computed for every 1,000 samples after the burn-in. Similar values provide a rough indication of similar marginal posterior densities, which in turn indicates possible convergence of the Gibbs sampler[25, 26].
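The rough convergence check suggested here (comparing the marginal posterior mean and standard deviation over successive blocks of 1,000 post-burn-in samples) can be sketched as follows; this is illustrative Python, not the paper's Fortran:

```python
from statistics import mean, stdev

def segment_summaries(chain, seg_len=1000):
    # Mean and SD of each consecutive block of seg_len post-burn-in draws;
    # similar values across blocks give a rough indication of convergence
    return [(mean(chain[i:i + seg_len]), stdev(chain[i:i + seg_len]))
            for i in range(0, len(chain) - seg_len + 1, seg_len)]
```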

Appendix

References

[1]  Lord, F. M., 1952, A theory of test scores, Psychometric Monograph No. 7.
[2]  Lord, F. M., and Novick, M. R., 1968, Statistical theories of mental test scores, Addison-Wesley, Reading, MA.
[3]  Lord, F. M., 1980, Applications of item response theory to practical testing problems, Lawrence Erlbaum Associates, Hillsdale, New Jersey.
[4]  Smith, A. F. M., and Roberts, G. O., 1993, Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B, 55, 3–23.
[5]  Tierney, L., 1994, Markov chains for exploring posterior distributions, The Annals of Statistics, 22, 1701–1762.
[6]  Carlin, B. P., and Louis, T. A., 2000, Bayes and empirical Bayes methods for data analysis (2nd ed.), Chapman & Hall, London.
[7]  Chib, S., and Greenberg, E., 1995, Understanding the Metropolis-Hastings algorithm, The American Statistician, 49(4), 327–335.
[8]  Gelfand, A. E., and Smith, A. F. M., 1990, Sampling-based approaches to calculating marginal densities, Journal of the American Statistical Association, 85, 398–409.
[9]  Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B., 2003, Bayesian data analysis, Chapman & Hall/CRC, Boca Raton.
[10]  Albert, J. H., 1992, Bayesian estimation of normal ogive item response curves using Gibbs sampling, Journal of Educational Statistics, 17, 251–269.
[11]  Geman, S., and Geman, D., 1984, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Analysis and Machine Intelligence, 6, 721–741.
[12]  Tanner, M. A., and Wong, W. H., 1987, The calculation of posterior distribution by data augmentation (with discussion), Journal of the American Statistical Association, 82, 528–550.
[13]  Sheng, Y., and Headrick, T. C., 2007, An algorithm for implementing Gibbs sampling for 2PNO IRT models, Journal of Modern Applied Statistical Methods, 6, 341–349.
[14]  Sahu, S. K., 2002, Bayesian estimation and model choice in item response models, Journal of Statistical Computation and Simulation, 72, 217–232.
[15]  Johnson, V. E., and Albert, J. H., 1999, Ordinal data modeling, Springer-Verlag, New York.
[16]  Béguin, A. A., and Glas, C. A. W., 2001, MCMC estimation and some model-fit analysis of multidimensional IRT models, Psychometrika, 66, 541–562.
[17]  Glas, C. W., and Meijer, R. R., 2003, A Bayesian approach to person fit analysis in item response theory models, Applied Psychological Measurement, 27, 217–233.
[18]  Sheng, Y., 2008, Markov chain Monte Carlo estimation of normal ogive IRT models in MATLAB, Journal of Statistical Software, 25(8), 1–15.
[19]  Sheng, Y., 2010, A sensitivity analysis of Gibbs sampling for 3PNO IRT models: Effects of prior specifications on parameter estimates, Behaviormetrika, 37(2), 87–110.
[20]  Sheng, Y., 2013, An empirical investigation of Bayesian hierarchical modeling with unidimensional IRT models, Behaviormetrika, 40(1), 19–40.
[21]  Williams, C. L., and Locke, A., 2003, Hyperprior imprecision in hierarchical Bayesian modeling of cluster Bernoulli observations, InterStat: Statistics on the Internet. URL: http://interstat.statjournals.net/YEAR/2003/abstracts/0310001.php.
[22]  Brainerd, W., 2003, The importance of Fortran in the 21st century, Journal of Modern Applied Statistical Methods, 2, 14–15.
[23]  Patz, R. J., and Junker, B. W., 1999, A straightforward approach to Markov chain Monte Carlo methods for item response models, Journal of Educational and Behavioral Statistics, 24, 146–178.
[24]  Ripley, B. D., 1987, Stochastic simulation, Wiley, New York.
[25]  Gelfand, A. E., Hills, S. E., Racine-Poon, A., and Smith, A. F. M., 1990, Illustration of Bayesian inference in normal data models using Gibbs sampling, Journal of the American Statistical Association, 85, 315–331.
[26]  Hoijtink, H., and Molenaar, I. W., 1997, A multidimensional item response model: Constrained latent class analysis using posterior predictive checks, Psychometrika, 62, 171–189.