An investigation of the optimal test design for multi-stage test using the generalized partial credit model






Although the design of multistage testing (MST) has received increasing attention, previous studies have mostly focused on comparing the psychometric properties of MST with those of computerized adaptive testing (CAT) and paper-and-pencil (P&P) tests. Few studies have systematically examined the number of items in the routing test, the number of subtests in a stage, or the number of stages in a test design needed to achieve accurate measurement in MST. Given that no study has identified an ideal MST design using polytomously scored items, the current study conducted a simulation to investigate the optimal design for MST using the generalized partial credit model (GPCM). Eight test designs were examined on ability estimation across two routing test lengths (short and long) and two total test lengths (short and long). The item pool and generated item responses were based on items calibrated from a national test consisting of 273 partial credit items. Across all test designs, the maximum information routing method was employed, and maximum likelihood estimation was used for ability estimation. Ten samples of 1,000 simulees were used to assess each test design. The performance of each test design was evaluated in terms of the precision of ability estimates, item exposure rate, item pool utilization, and item overlap. The study found that all test designs produced very similar results. Although the ability estimates varied somewhat among the eight test structures, the results indicate that their overall measurement precision did not differ substantially with regard to total test length or routing test length. However, the results suggest that routing test length does have a significant effect on the number of non-convergent cases in MST. Short routing tests tended to result in more non-convergent cases, and structures with fewer stages yielded more such cases than structures with more stages. Overall, unlike previous findings, the results of the present study indicate that MST test structure is unlikely to be a factor affecting ability estimation when polytomously scored items are used under the GPCM.
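For readers unfamiliar with the two building blocks named above, the following is a minimal sketch of GPCM category probabilities and of maximum information routing. It is not the study's implementation: the function names, the item parameters, and the two-module layout ("easy", "hard") are hypothetical, and the sketch assumes the standard GPCM parameterization with a discrimination `a`, step difficulties `b_1..b_m`, scores `0..m`, and no scaling constant. Under that parameterization, item information at a given ability equals `a**2` times the variance of the item score.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities for one GPCM item scored 0..m.

    theta: ability; a: discrimination; steps: step difficulties b_1..b_m.
    """
    # Cumulative sums of a * (theta - b_v); the empty sum for category 0 is 0.
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    z_max = max(z)
    exp_z = [math.exp(v - z_max) for v in z]
    total = sum(exp_z)
    return [e / total for e in exp_z]


def gpcm_information(theta, a, steps):
    """Item information: a^2 times the variance of the item score at theta."""
    probs = gpcm_probs(theta, a, steps)
    mean = sum(k * p for k, p in enumerate(probs))
    second_moment = sum(k * k * p for k, p in enumerate(probs))
    return a * a * (second_moment - mean * mean)


def route(theta, modules):
    """Return the name of the module with the most information at theta.

    modules: dict mapping a module name to a list of (a, steps) items.
    """
    def module_information(items):
        return sum(gpcm_information(theta, a, steps) for a, steps in items)

    return max(modules, key=lambda name: module_information(modules[name]))


# Hypothetical second-stage modules; parameters are illustrative only.
modules = {
    "easy": [(1.0, [-2.0, -1.0]), (1.2, [-1.5, -0.5])],
    "hard": [(1.0, [1.0, 2.0]), (1.2, [0.5, 1.5])],
}
low_ability_module = route(-1.5, modules)
high_ability_module = route(1.5, modules)
```

In this sketch a provisional ability estimate from the routing test is passed to `route`, which sends a low-ability simulee to the module whose items are most informative near that estimate. A full simulation would also need response generation and maximum likelihood scoring, which are omitted here.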