Browsing by Subject "IRT"
Now showing 1 - 3 of 3
Item
Effects of sample size, ability distribution, and the length of Markov Chain Monte Carlo burn-in chains on the estimation of item and testlet parameters (2011-05)
Orr, Aline Pinto; Dodd, Barbara Glenzing; Suh, Youngsuk

Item Response Theory (IRT) models are the basis of modern educational measurement. To increase testing efficiency, modern tests make ample use of groups of questions associated with a single stimulus (testlets). This violates the IRT assumption of local independence; however, a family of measurement models, testlet response theory (TRT), has been developed to address such dependency issues. This study investigates the effects of varying sample sizes and Markov Chain Monte Carlo burn-in chain lengths on the accuracy of estimation of a TRT model's item and testlet parameters. The following outcome measures are examined: descriptive statistics, Pearson product-moment correlations between known and estimated parameters, and indices of measurement effectiveness for final parameter estimates.

Item
An evaluation of item difficulty and person ability estimation using the multilevel measurement model with short tests and small sample sizes (2011-05)
Brune, Kelly Diane; Beretvas, Susan Natasha; Dodd, Barbara G.; Pituch, Keenan A.; Powers, Daniel A.; Zimmaro, Dawn M.

Recently, researchers have reformulated Item Response Theory (IRT) models into multilevel models to evaluate clustered data appropriately. Using a multilevel model to obtain item difficulty and person ability parameter estimates that correspond directly with IRT models' parameters is often referred to as multilevel measurement modeling. Unlike conventional IRT models, multilevel measurement models (MMMs) can handle the addition of predictor variables, model clustered data appropriately, and can be estimated using non-specialized computer software, including SAS.
For example, a three-level model can model the repeated measures (level one) of individuals (level two) who are clustered within schools (level three). Limitations on the minimum sample size and number of test items that permit reasonable recovery of one-parameter logistic (1-PL) IRT model parameters have not been examined for either the two- or three-level MMM. Researchers (Wright and Stone, 1979; Lord, 1983; Hambleton and Cook, 1983) have found that sample sizes under 200 and fewer than 20 items per test result in poor model fit and poor parameter recovery for dichotomous 1-PL IRT models with data that meet model assumptions. This simulation study tested the performance of the two-level and three-level MMMs under conditions that crossed three sample sizes (100, 200, and 400), three test lengths (5, 10, and 20), three level-3 cluster sizes (10, 20, and 50), and two generated intraclass correlations (.05 and .15). The study demonstrated that use of the two- and three-level MMMs leads to somewhat divergent results for item difficulty and person-level ability estimates. The mean relative item difficulty bias was lower for the three-level model than for the two-level model. The opposite was true for the person-level ability estimates, with a smaller mean relative parameter bias for the two-level model than for the three-level model. There was no difference between the two- and three-level MMMs in the school-level ability estimates.
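As an aside for readers unfamiliar with the model family discussed above: the dichotomous 1-PL (Rasch) model gives the probability of a correct response as P(X = 1 | θ, b) = 1 / (1 + exp(-(θ - b))). The following is a minimal, purely illustrative sketch of generating response data for one recovery condition resembling the design above (400 persons, 20 items); the seed and the standard-normal distributions are assumptions, not taken from the dissertation:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

def rasch_prob(theta, b):
    """1-PL (Rasch) probability of a correct response:
    P(X = 1 | theta, b) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# One illustrative condition: N = 400 persons, 20 items.
n_persons, n_items = 400, 20
theta = rng.normal(0.0, 1.0, size=n_persons)  # person abilities (assumed N(0, 1))
b = rng.normal(0.0, 1.0, size=n_items)        # item difficulties (assumed N(0, 1))

# Dichotomous response matrix (persons x items), drawn by comparing
# uniform random numbers against the model probabilities.
p = rasch_prob(theta[:, None], b[None, :])
responses = (rng.random((n_persons, n_items)) < p).astype(int)
```

In a recovery study, these known generating values of theta and b would then be compared against estimates from a fitted model to compute bias.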
Modeling clustered data appropriately, having a minimum total sample size of 100 to accurately estimate level-2 residuals and of 400 to accurately estimate level-3 residuals, and having at least 20 items will help ensure valid statistical test results.

Item
The Impact of Misspecifying a Higher Level Nesting Structure in Item Response Theory Models: A Monte Carlo Study (2013-08-02)
Zhou, Qiong

The advantages of the Multilevel Item Response Theory (MLIRT) model have been studied by several researchers, and even the impact of ignoring a higher level of the data structure in multilevel analysis has been studied and discussed. However, because of the technical complexity of the modeling and the limited support for multilevel data in traditional IRT packages (e.g., BILOG and PARSCALE), researchers may not be able to analyze multilevel IRT data accurately. The impact of this type of misspecification, especially for MLIRT models, has not yet been thoroughly examined. This dissertation consists of two studies: a Monte Carlo study that investigates the impact of this type of misspecification, and a study with real-world data that validates the results obtained from the simulation study. In Study One (the simulation study), we investigate the potential impact of several factors, including intraclass correlation (ICC), sample size, cluster size, and test length, on the parameter estimates and corresponding tests of significance under two situations: when the higher level nesting structure is appropriately modeled (i.e., the true-model condition) versus inappropriately modeled (i.e., the misspecified-model condition). Three-level strictly hierarchical data (i.e., items nested within students who are further nested within schools) were generated. Two covariates, one person-related and one school-related, were added at the second level (i.e., the person level) and the third level (i.e., the school level), respectively.
The results of the simulation study showed that both the parameter estimates and their corresponding standard errors would be biased if the higher level nesting structure was ignored. In Study Two, real data from the Programme for International Student Assessment, which have a purely hierarchical structure, were analyzed by comparing parameter estimates when inappropriate versus appropriate IRT models were specified. The findings mirrored the results obtained from the first study. The implication of this dissertation for researchers is that it is important to model the multilevel data structure even in item response theory models. Researchers should interpret their results with caution when ignoring a higher level nesting structure in MLIRT models. Moreover, the findings may help researchers determine when MLIRT should be used to obtain unbiased results. Some of the constraints of the simulation study could be relaxed in future work. For instance, although this study used only dichotomous items, MLIRT could also be used with polytomous items; the test length could be longer; and more variability could be introduced into the item parameters' values.
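As a hypothetical illustration of how intraclass-correlation conditions like those manipulated in the studies above can be induced when generating nested person abilities (the function name, sizes, seed, and the choice of a unit total variance are all assumptions, not taken from the dissertations):

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

def clustered_abilities(n_schools, cluster_size, icc, rng):
    """Draw person abilities nested within schools so that the intraclass
    correlation equals `icc`: the total variance is fixed at 1, split into
    a school-level variance of `icc` and a person-level variance of 1 - icc."""
    school_effects = rng.normal(0.0, np.sqrt(icc), size=n_schools)
    person_effects = rng.normal(0.0, np.sqrt(1.0 - icc),
                                size=(n_schools, cluster_size))
    # Each person's ability is their school's effect plus their own deviation.
    return school_effects[:, None] + person_effects

# One illustrative condition: 20 schools of 20 students each, ICC = .15.
theta = clustered_abilities(n_schools=20, cluster_size=20, icc=0.15, rng=rng)
```

Ignoring the school level when fitting a model to data generated this way is exactly the kind of misspecification the Monte Carlo study above examines.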