Coefficient of intrinsic dependence: a new measure of association

Date

2005-08-29

Journal Title

Journal ISSN

Volume Title

Publisher

Texas A&M University

Abstract

To detect dependence among variables is an essential task in many scientific investigations. In this study we propose a new measure of association, the coefficient of intrinsic dependence (CID), which takes value in [0,1] and faithfully reflects the full range of dependence for two random variables. The CID is free of distributional and functional assumptions. It can be easily implemented and extended to multivariate situations. Traditionally, the correlation coefficient is the preferred measure of association. However, it's effectiveness is considerably compromised when the random variables are not normally distributed. Besides, the interpretation of the correlation coefficient is difficult when the data are categorical. By contrast, the CID is free of these problems. In our simulation studies, we find that the ability of the CID in differentiating different levels of dependence remains robust across different data types (categorical or continuous) and model features (linear or curvilinear). Also, the CID is particularly effective when the dependence is strong, making it a powerful tool for variable selection. As an illustration, the CID is applied to variable selection in two aspects: classification and prediction. The analysis of actual data from a study of breast cancer gene expression is included. For the classification problem, we identify a pair of genes that best classify a patient's prognosis signature, and for the prediction problem, we identify a pair of genes that best relates to the expression of a specific gene.

Description

Citation