A feasibility study of a computerized adaptive test of the International Personality Item Pool NEO





The Big Five/Five Factor Model of personality is the most widely accepted model in the field of social and personality psychology. However, the most comprehensive measurement instrument currently available takes 45 minutes to complete, which frequently makes it impractical to administer in research settings. Although shorter instruments have been created, they tend to be less reliable, less internally consistent, and less valid. Computerized adaptive testing could resolve this trade-off between test length and measurement precision. This dissertation investigated the usefulness of developing a computerized adaptive test (CAT) of the Big Five. Because each factor was unidimensional, the factors were analyzed separately. First, differential item functioning (DIF) by gender was analyzed so that items showing large amounts of DIF could be removed to reduce measurement bias. A total of 33 items were removed from the item pool. The majority of these items seemed to relate to stereotypes, gender roles, and the differing socialization of men and women. The remaining item pool was then calibrated using Andrich’s rating scale model. Results showed that the scale information functions peaked around the center of the trait distribution, indicating that the items in the pool provided the most information, and consequently the best measurement precision, for examinees with trait estimates near the middle. The utility of creating a CAT version of the IPIP-NEO was then evaluated through realistic CAT simulations using data from real and simulated participants. The simulations indicated that the CAT performed best when test length was fixed and content was balanced by facet. The variable-length scales tended to reduce accuracy and measurement precision and were therefore not recommended. The CAT produced correlations with the full version that were similar to those of an existing shortened version of the IPIP-NEO. Although the standard error of measurement was smaller for the CAT versions, the CAT did not provide enough benefits to warrant recommendation for live testing at this time. Further research on construct definitions and item pool development is recommended before a live CAT is developed.
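
For readers unfamiliar with the calibration model, the following minimal Python sketch illustrates how category probabilities, item information, and maximum-information item selection could be computed under a rating scale model. It is an illustration under stated assumptions, not the dissertation's implementation: the function names, the five-category response format, and all parameter values (`deltas`, `taus`) are hypothetical, and the sketch omits trait estimation, facet-based content balancing, and stopping rules.

```python
import math

def rsm_probs(theta, delta, taus):
    """Category probabilities for scores 0..len(taus) under a rating scale model.

    theta: trait level; delta: item location; taus: thresholds shared by all items.
    """
    # Cumulative sums of (theta - delta - tau_j); the score-0 term is 0 by convention.
    cum = [0.0]
    for tau in taus:
        cum.append(cum[-1] + (theta - delta - tau))
    peak = max(cum)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(c - peak) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]

def item_information(theta, delta, taus):
    """Fisher information of one item, equal to the variance of its category score."""
    probs = rsm_probs(theta, delta, taus)
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))

def scale_information(theta, deltas, taus):
    """Scale information: the sum of item information over the pool."""
    return sum(item_information(theta, d, taus) for d in deltas)

def next_item(theta, deltas, taus, administered):
    """Pick the not-yet-administered item with the most information at theta."""
    candidates = [i for i in range(len(deltas)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, deltas[i], taus))

# Hypothetical parameters (not taken from the dissertation):
# four shared thresholds give five response categories.
taus = [-1.5, -0.5, 0.5, 1.5]
deltas = [-1.0, -0.3, 0.0, 0.4, 1.2]

if __name__ == "__main__":
    for theta in (-3.0, 0.0, 3.0):
        info = scale_information(theta, deltas, taus)
        # The standard error of measurement is 1 / sqrt(information).
        print(theta, round(info, 3), round(1 / math.sqrt(info), 3))
    print(next_item(0.0, deltas, taus, administered=set()))
```

With item locations clustered near zero, as in this made-up pool, scale information is highest near the middle of the trait range and falls off toward the extremes, which mirrors the pattern the abstract reports for the calibrated IPIP-NEO pool.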