Rank regression in longitudinal data analysis

Barefield, Eric W.

Date

2001-05

Authors

Barefield, Eric W.

Publisher

Texas Tech University

Abstract

An important problem that arises frequently in medical research is the analysis of data from a repeated measures design. One of the distinctions of this design is dependence among the data. Generally, a number of subjects are assumed to be independent, and several measurements are taken on each subject, usually corresponding to different time points. These repeated measurements are generally correlated. Longitudinal data models arise when a vector of observations is taken for each subject at each time point. These situations have been studied extensively in the literature. The parametric approach is described in several papers and summarized nicely by Diggle, Liang, and Zeger (1994). One of the unique features of repeated measures data is the correlation structure. This is generally not the focus of the study, but the more accurately the correlation structure is specified, the better the conclusions will be. In this setting, we typically test linear hypotheses on the slope parameters or on the correlation parameters. Diggle, Liang, and Zeger describe several common assumptions for the covariance structure. The simplest is called the uniform correlation model. This model corresponds to the assumption that all observations have equal variances and that all pairs of observations within a subject have equal correlations. When the observations within a subject are assumed to be exchangeable, the uniform correlation model is implied. Another common assumption is the exponential correlation model. This allows for the more reasonable assumption that correlation decreases as the time between observations increases. These first two assumptions specify the covariance structure with only a few parameters to estimate. Another assumption is the unstructured covariance model. This model allows a separate parameter for each element of the covariance matrix.
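The three covariance assumptions described above can be made concrete with a minimal sketch. The function names, the parameter names (`sigma2`, `rho`, `phi`), and the parameterization of the exponential model as `exp(-phi * |t_j - t_k|)` are illustrative assumptions, not notation taken from the thesis.

```python
import numpy as np

def uniform_cov(n_times, sigma2, rho):
    """Uniform correlation model: common variance sigma2, common correlation rho."""
    corr = np.full((n_times, n_times), rho, dtype=float)
    np.fill_diagonal(corr, 1.0)
    return sigma2 * corr

def exponential_cov(times, sigma2, phi):
    """Exponential correlation model (assumed form): corr = exp(-phi * |t_j - t_k|)."""
    times = np.asarray(times, dtype=float)
    lags = np.abs(times[:, None] - times[None, :])  # pairwise time separations
    return sigma2 * np.exp(-phi * lags)

def n_unstructured_params(n_times):
    """Free parameters in an unstructured symmetric n x n covariance matrix."""
    return n_times * (n_times + 1) // 2
```

Under these assumptions the uniform and exponential models each need two parameters regardless of the number of time points, while the unstructured model needs n(n + 1)/2, which is the parameter growth that the abstract goes on to discuss.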
While the unstructured model gives more confidence that the covariance structure is properly modelled, the additional parameters have several drawbacks, including increased standard errors. With repeated measures data, ordinary least squares is not preferred. To take the correlation structure into account, weighted least squares is used. Weighted least squares is similar to least squares except that the quantity it minimizes involves a weight matrix. If the weight matrix is taken to be the identity matrix, then weighted least squares reduces to ordinary least squares. The most efficient choice of weight matrix is the inverse of the covariance matrix. A problem with weighted least squares is that when interval estimates are desired, the variance parameter needs to be estimated. The method of least squares does not do an adequate job of this. Maximum likelihood estimation (MLE) for the slope parameter remains the same as least squares under Gaussian assumptions, and an estimate of the variance can be obtained, though it is biased. The method of restricted MLE solves the problem of biased estimates by first linearly transforming the data to remove the slope parameters. These methods generally work well when the data are "nice." However, when outlying and influential observations are present, these methods are no longer valid, since they put too much weight on those observations. Thus, other methods are desired that can resist the effects of these outlying observations.
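As a rough illustration of the weighted least squares idea described above, the sketch below minimizes the sum over subjects of (y_i - X_i b)' W (y_i - X_i b). The function name, the simulated data, and the simplification that every subject shares one weight matrix `W` are assumptions made for this example only.

```python
import numpy as np

def wls_repeated_measures(X_list, y_list, W):
    """Weighted least squares over independent subjects sharing weight matrix W.

    Solves the normal equations (sum_i X_i' W X_i) b = sum_i X_i' W y_i.
    """
    p = X_list[0].shape[1]
    lhs = np.zeros((p, p))
    rhs = np.zeros(p)
    for X, y in zip(X_list, y_list):
        lhs += X.T @ W @ X
        rhs += X.T @ W @ y
    return np.linalg.solve(lhs, rhs)

# Hypothetical data: 30 subjects, 4 time points, intercept plus slope.
rng = np.random.default_rng(0)
times = np.arange(4.0)
X = np.column_stack([np.ones_like(times), times])
lags = np.abs(times[:, None] - times[None, :])
V = np.exp(-0.5 * lags)               # assumed exponential correlation, unit variance
chol = np.linalg.cholesky(V)          # used to draw correlated errors
beta_true = np.array([1.0, 2.0])
X_list = [X] * 30
y_list = [X @ beta_true + chol @ rng.standard_normal(4) for _ in range(30)]

# Identity weights give ordinary least squares; inverse covariance is the efficient choice.
beta_ols = wls_repeated_measures(X_list, y_list, np.eye(4))
beta_wls = wls_repeated_measures(X_list, y_list, np.linalg.inv(V))
```

Here `V` is treated as known. In practice it has to be estimated, which is where the maximum likelihood and restricted maximum likelihood methods mentioned above come in.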