Privacy-aware publication and utilization of healthcare data
MetadataShow full item record
Open access to health data can bring enormous social and economical benefits. However, such access can also lead to privacy breaches, which may result in discrimination in insurance and employment markets. Privacy is a subjective and contextual concept, thus it should be interpreted from both systemic and information perspectives to clearly understand potential breaches and consequences. This dissertation investigates three popular use cases of healthcare data: specifically, 1) synthetic data publication, 2) aggregate data utilization, and 3) privacy-aware API implementation. For each case, we develop statistical models that improve the privacy-utility Pareto frontier by leveraging a variety of machine learning techniques such as information theoretic privacy measures, Bayesian graphical models, non-parametric modeling, and low-rank factorization techniques. It shows that much utility can be extracted from health records while maintaining strong privacy guarantees and protection of sensitive health information.