A comparison of the performance of testlet-based computer adaptive tests and multistage tests






Computer adaptive testing (CAT) has grown in both research and implementation. Test construction and security issues, however, have led many to reconsider the merits of CAT. Multistage testing (MST) is an alternative adaptive test design that purportedly addresses CAT's shortcomings, yet considerably less research has been conducted on MST. In addition, most research in adaptive testing has been based on item response theory (IRT). Many tests now make use of testlets: bundles of items administered together, often based on a common stimulus. The use of testlets violates local independence, a fundamental assumption of IRT. Testlet response theory (TRT) is a relatively new measurement model designed for testlet-based tests. Few studies, however, have examined its use in testlet-based CAT and MST designs. This dissertation investigated the performance of testlet-based CATs and MSTs scored using the TRT model. The test designs compared included a CAT that is adaptive at the testlet level only (testlet-level CAT), a CAT that is adaptive at both the testlet and item levels (item-level CAT), and an MST design (MST). Test conditions manipulated included test length, item pool size, and examinee ability distribution. Examinee data were generated using TRT-calibrated item parameters based on data from a large-scale reading assessment. The three test designs were evaluated on measurement effectiveness and exposure control properties. The study found that all three adaptive test designs yielded similarly good measurement accuracy. Overall, the item-level CAT produced the best measurement precision, followed by the MST design. However, the MST and CAT designs yielded better measurement precision in different regions of the ability scale. All three test designs yielded acceptable exposure control properties at the testlet level. At the item level, the testlet-level CAT produced the best overall result.
The item-level CAT had less-than-ideal pool utilization but was able to meet its pre-specified maximum exposure rate and maintain low item exposure rates. The MST had excellent pool utilization but a higher percentage of items with high exposure rates. Skewing the underlying ability distribution also had a particularly notable negative effect on the exposure control properties of the MST.