Comparing Model-based and Design-based Structural Equation Modeling Approaches in Analyzing Complex Survey Data

Wu, Jiun-Yu

Comparing Model-based and Design-based Structural Equation Modeling Approaches in Analyzing Complex Survey Data

Date

2011-10-21

Authors

Wu, Jiun-Yu

Abstract

Conventional statistical methods assuming data sampled under simple random sampling are inadequate for use on complex survey data with a multilevel structure and non-independent observations. In structural equation modeling (SEM) framework, a researcher can either use the ad-hoc robust sandwich standard error estimators to correct the standard error estimates (Design-based approach) or perform multilevel analysis to model the multilevel data structure (Model-based approach) to analyze dependent data. In a cross-sectional setting, the first study aims to examine the differences between the design-based single-level confirmatory factor analysis (CFA) and the model-based multilevel CFA for model fit test statistics/fit indices, and estimates of the fixed and random effects with corresponding statistical inference when analyzing multilevel data. Several design factors were considered, including: cluster number, cluster size, intra-class correlation, and the structure equality of the between-/within-level models. The performance of a maximum modeling strategy with the saturated higher-level and true lower-level model was also examined. Simulation study showed that the design-based approach provided adequate results only under equal between/within structures. However, in the unequal between/within structure scenarios, the design-based approach produced biased fixed and random effect estimates. Maximum modeling generated consistent and unbiased within-level model parameter estimates across three different scenarios. Multilevel latent growth curve modeling (MLGCM) is a versatile tool to analyze the repeated measure sampled from a multi-stage sampling. However, researchers often adopt latent growth curve models (LGCM) without considering the multilevel structure. This second study examined the influences of different model specifications on the model fit test statistics/fit indices, between/within-level regression coefficient and random effect estimates and mean structures. Simulation suggested that design-based MLGCM incorporating the higher-level covariates produces consistent parameter estimates and statistical inferences comparable to those from the model-based MLGCM and maintain adequate statistical power even with small cluster number.