Comparison of edit history clustering techniques for spatial hypertext

Mandal, Bikash

Date

2006-04-12

Authors

Mandal, Bikash

Publisher

Texas A&M University

Abstract

History mechanisms available in hypertext systems allow access to past user interactions with the system. This helps users evaluate past work and learn from past activity. It also allows systems to identify usage patterns and potentially predict user behavior with the system. Thus, recording history is useful to both the system and the user. Various tools and techniques have been developed to group and annotate history in the Visual Knowledge Builder (VKB), but these tools require the operations to be performed manually. For a large VKB history that grows over a long period of time, grouping with such tools is difficult and time consuming. This thesis examines methods for analyzing VKB history in order to automatically group (cluster) the user events it contains. Three approaches are compared. The first is a pattern-matching approach that identifies repeated patterns of edit events in the history. The second is a rule-based approach that uses simple rules, such as grouping all consecutive events on a single object. The third uses hierarchical agglomerative clustering (HAC), in which edits are grouped based on a function of edit time and edit location. The contributions of this thesis are: (a) developing tools to automatically cluster large VKB histories using these approaches, (b) analyzing the performance of each approach to determine their relative strengths and weaknesses, and (c) answering the question of how well the automatic clustering approaches perform, by comparing the results of the automatic tools with the manual grouping performed by actual users on the same set of VKB histories. The results show that the rule-based approach performs best: it most closely matches the human-defined groups and generates the fewest groups. The hierarchical agglomerative clustering approach falls between the other two approaches with regard to identifying human-defined groups. The pattern-matching approach generates many potential groups, but only a few match those generated by actual VKB users.
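
To illustrate the kind of rule named in the abstract ("group all consecutive events on a single object"), the following is a minimal sketch in Python. It is not the thesis implementation: the `EditEvent` record, its field names, and the sample history are hypothetical and do not reflect VKB's actual history format.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class EditEvent:
    # Hypothetical event record; these fields are assumptions, not VKB's schema.
    timestamp: float
    object_id: str
    action: str


def group_consecutive_by_object(events):
    """Return runs of consecutive events (in time order) that touch the same object."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    # groupby starts a new run each time the object changes, so an object
    # that is revisited later ends up in a separate group.
    return [list(run) for _, run in groupby(ordered, key=lambda e: e.object_id)]


# Made-up history used only to exercise the rule.
history = [
    EditEvent(1.0, "note-A", "create"),
    EditEvent(2.0, "note-A", "move"),
    EditEvent(3.0, "note-B", "create"),
    EditEvent(4.0, "note-A", "resize"),
]
groups = group_consecutive_by_object(history)
```

In this sketch the later edit to "note-A" forms its own group because an edit to "note-B" intervenes. Whether the thesis tools treat revisits this way is not stated in the abstract.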