Analyzing and Synthesizing Healthcare Time Series Data for Decision-Support
Abstract
Even though the healthcare industry is one of the largest data-generating industries today, researchers often lack access to large volumes of good-quality data. Yet state-of-the-art techniques for computational analysis depend on exactly such data. Moreover, healthcare time series datasets come with their own challenges: they can be unannotated and unlabeled, with substantial inter- and intra-patient variability. On top of that, clinicians lack an objective protocol to determine whether particular patterns are present in the data. The Respiration Induced Tumor Motion (RITM) dataset is one such dataset. Using this dataset, we present three analytical studies in which we apply unsupervised machine learning techniques to provide: i) patient similarity, as a way to handle the lack of a control dataset and the variability present; ii) a summary of the dataset as low-dimensional profiles of the patients; and iii) annotation of the dataset via computational characterization of medically relevant patterns. The latter two studies help oncologists decide objectively whether medically relevant patterns are present in the data and map those patterns to treatment planning strategies. To push the analysis of the RITM dataset into the realm of supervised learning, we also use a genetic-algorithm-based implementation of healthcare time series data synthesis. This in turn led us to investigate data synthesis methods for general healthcare time series data.