A polytomous nonlinear mixed model for item analysis






The present study sought to situate item response theory (IRT) within the more comprehensive and widely known statistical frameworks of the generalized linear mixed model (GLMM) and the nonlinear mixed model (NLMM). This line of research extends previous framework approaches for IRT (Mellenbergh, 1994; Adams & Wilson, 1996; Kamata, 2001; Rijmen et al., 2002). A detailed derivation of a polytomous nonlinear mixed model (PNLMM) was presented to illustrate how a polytomous IRT model can be formulated within the NLMM framework and to demonstrate that the derived IRT model is formally equivalent to Muraki’s (1990) rating scale model. Different parameterizations and different link functions were introduced. The PNLMM was then extended to a regression-type model and to a model for differential item functioning (DIF) analyses. Because SAS PROC NLMIXED includes estimation algorithms for both the GLMM and NLMM frameworks, it provides a readily available environment in which current IRT models can be extended. The estimation performance of PROC NLMIXED for the PNLMM was evaluated in a 27-condition simulation study that manipulated three factors (number of items = 5, 10, 20; number of categories = 3, 5, 7; N = 250, 500, 1000). Eight common indicators were calculated across five replications for the 5- and 10-item conditions, whereas for the 20-item conditions the results of single runs were discussed because of lengthy computation time. As an additional evaluation, PROC NLMIXED was compared with PARSCALE. The simulation study indicated that PROC NLMIXED recovered the IRT parameters well in most conditions. Large estimation error was detected for the second item’s discrimination parameter in the 3-category, 250-subject conditions under both PROC NLMIXED and PARSCALE. This indicates that a sample size of 250 is insufficient for stable estimation of the item discrimination parameters of the PNLMM (or Muraki’s rating scale model), especially when there are only three categories. Limitations and directions for future research are discussed.
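
The fully crossed simulation design described above can be enumerated in a few lines. The following is only an illustrative Python sketch, not the SAS code used in the study: the names `build_design` and `replications_for` are hypothetical, and the only facts taken from the abstract are the three factor levels and the replication rule (five replications for the 5- and 10-item conditions, a single run for the 20-item conditions).

```python
from itertools import product

# Factor levels as stated in the abstract.
N_ITEMS = (5, 10, 20)
N_CATEGORIES = (3, 5, 7)
SAMPLE_SIZES = (250, 500, 1000)


def replications_for(n_items):
    """Hypothetical helper: single run for 20-item conditions, five otherwise."""
    return 1 if n_items == 20 else 5


def build_design():
    """Hypothetical helper: list every condition of the fully crossed design."""
    design = []
    for n_items, n_categories, sample_size in product(
        N_ITEMS, N_CATEGORIES, SAMPLE_SIZES
    ):
        design.append(
            {
                "n_items": n_items,
                "n_categories": n_categories,
                "sample_size": sample_size,
                "replications": replications_for(n_items),
            }
        )
    return design


design = build_design()
# Total number of model fits implied by the replication rule.
total_runs = sum(condition["replications"] for condition in design)
```

Crossing three levels of each factor yields the 27 conditions, and the replication rule implies 18 conditions fitted five times plus 9 conditions fitted once.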