PerCon: A Personal Digital Library for Heterogeneous Data Management and Analysis






Systems are needed to support access to and analysis of increasingly large and heterogeneous scientific datasets. Users need appropriate services and tools for locating, organizing, analyzing, and interpreting data in support of their current activities. We developed PerCon, a data management and analysis environment, to support such use.

PerCon processes and integrates data gathered via queries to existing data providers to create a personal or small-group digital library of data. Users may then search, browse, visualize, annotate, and organize the data as they proceed with analysis and interpretation. Analysis and interpretation in PerCon take place in a visual workspace in which multiple data visualizations and annotations are placed into spatial arrangements based on the current task. The system watches for patterns in the user's data selection, exploration, and organization, then through mixed-initiative interaction assists users by suggesting potentially relevant data from unexplored data sources. To identify relevant data, PerCon builds precomputed feature tables of data objects, including their metadata (e.g., similarities, distances), and a user interest model that infers the user's interest or specific information need. In particular, probabilistic networks in PerCon model user interactions (i.e., event features) and, through network training, predict the data type of greatest interest. The most relevant data objects of the inferred data type are then identified through a weighted feature computation and recommended to the user.
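The two-stage recommendation described above (infer the data type of greatest interest, then rank objects of that type by a weighted combination of precomputed features) can be illustrated with a minimal sketch. It is not PerCon's implementation. The class `DataObject`, the functions `infer_data_type` and `recommend`, the feature names, the weights, and the sample values are all hypothetical. The type probabilities stand in for the output of a trained probabilistic network, which is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DataObject:
    """Hypothetical data object with precomputed feature values."""
    name: str
    data_type: str
    features: Dict[str, float]  # e.g. similarity or proximity, assumed in [0, 1]


def infer_data_type(type_probs: Dict[str, float]) -> str:
    """Pick the data type with the highest predicted interest.

    The probabilities are assumed to come from a trained probabilistic
    network over user interaction events; here they are supplied directly.
    """
    return max(type_probs, key=type_probs.get)


def recommend(
    objects: List[DataObject],
    type_probs: Dict[str, float],
    weights: Dict[str, float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank objects of the inferred type by a weighted sum of their features."""
    target = infer_data_type(type_probs)
    scored = [
        (obj.name, sum(weights.get(f, 0.0) * v for f, v in obj.features.items()))
        for obj in objects
        if obj.data_type == target
    ]
    # Highest score first; only the score is used as the sort key.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Illustrative values only, not taken from the study.
objects = [
    DataObject("rain_gauge_A", "weather", {"similarity": 0.9, "proximity": 0.2}),
    DataObject("rain_gauge_B", "weather", {"similarity": 0.4, "proximity": 0.9}),
    DataObject("river_stage_C", "river", {"similarity": 0.8, "proximity": 0.8}),
]
type_probs = {"weather": 0.7, "river": 0.3}
weights = {"similarity": 0.75, "proximity": 0.25}

recommendations = recommend(objects, type_probs, weights, top_k=2)
for name, score in recommendations:
    print(f"{name}: {score:.3f}")
```

In this sketch the river object is excluded even though its raw feature values are high, because filtering by the inferred data type happens before scoring. A real system could instead blend the type probability into the score.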

PerCon's data location and analysis capabilities were evaluated in a controlled study with 24 users. The study participants were asked to locate and analyze heterogeneous weather and river data with and without the visual workspace and mixed-initiative interaction, respectively. Results indicate that the visual workspace facilitated information representation and aided in the identification of relationships between datasets. The system's suggestions encouraged data exploration, leading participants to identify more evidence of correlation among data streams and more potential interactions among weather and river data.