Evidence of Construct-Related Validity for Assessment Centers: More Pieces of the Inferential Pie

Archuleta, Kathryn

Date

2010-07-14

Abstract

Much research has been conducted on the construct-related validity of assessment centers; however, a definitive conclusion has yet to be drawn. The central question of this debate is whether assessment centers measure the dimensions they are designed to measure. The present study attempted to provide more evidence toward the improvement of construct-related validity. The first hypothesis involved determining whether opportunity to observe and opportunity to behave influenced discriminant and convergent validity. The second hypothesis addressed the debate over evaluation method and examined which method, within-exercise or within-dimension, yielded more favorable internal construct-related validity evidence. The third hypothesis explored the call for exercise scoring in assessment centers and compared the criterion-related validity of exercise versus dimension scores within the same assessment center. Finally, the fourth objective examined the relationship between the stability of the dimensions and internal construct-related validity, specifically convergent validity evidence. A developmental assessment center used in two applied settings supplied the data. Two administrations of the assessment center were conducted for low- to mid-level managers in a state agency (N = 31). Five administrations were conducted in a professional graduate school of public administration that prepares students for leadership and managerial positions in government and public service (N = 108). The seven administrations yielded a total sample size of 139 participants. Analysis of multitrait-multimethod (MTMM) matrices revealed that, as hypothesized, a lack of opportunity to behave within exercises, operationalized using behavior counts, yielded poor discriminant validity. Assessor ratings of opportunity to observe and behave did not produce the hypothesized results.
Consistent with the second hypothesis, secondary assessors, who represented the within-dimension evaluation method, provided ratings that demonstrated better construct-related validity evidence than the ratings provided by primary assessors, who represented the within-exercise method. Correlation and regression analyses of the dimension/performance relationships and the exercise/performance relationships revealed neither dimensions nor exercises to be the better predictor of supervisor ratings of performance. MTMM analysis provided partial support for the fourth objective: dimensions that were more stable across exercises yielded better convergent validity evidence than dimensions that were more situationally specific. However, the differences were neither statistically significant nor large. Overall, the results of this study suggest that some areas of design and implementation can affect the construct-related validity of assessment centers. Researchers should continue to search for ways to improve assessment center construct-related validity, but should also look for ways other than MTMM to assess validity.