Collation is an important step in textual criticism, and it is often an arduous task for scholars preparing scholarly editions. Finding variations between copies is also important for researchers in bibliography and book history. In the late 1940s Charlton Hinman invented a machine that became known as the Hinman Collator. Using optical means, it allowed manual comparison of separate copies of a text in order to detect any differences that had been introduced. Although such mechanical collation systems are helpful, they still require considerable manual labor, and some scholars find them hard to use. Another approach sometimes used is to perform collation on the OCR output of a text. However, state-of-the-art OCR for 15th- and 16th-century books is not yet accurate enough (70-80% accuracy). In addition, scholars doing textual criticism generally prefer to work with original copies or facsimiles rather than OCR versions, because the accuracy and some of the nuanced details of the original copy are important to them. There is thus a need for a tool that reduces the effort required in the collation process while maintaining (and sometimes improving) its usefulness and allowing scholars to work with original documents in the form of high-quality facsimiles. This research addresses that need and explores various approaches to making digital collation easy to perform. A prototype of the virtual Hinman (vHinman) collator was created, and a user evaluation was conducted with scholars experienced in collation work. The tool uses image-matching algorithms together with context information to match words, and it was integrated into the creativity support environment CritSpace. It was tested on books from the early modern and late modern periods for which multiple copies with slight variations were available.

The tool showed a high accuracy rate on the books tested, and most of the scholars who evaluated it found it very promising. A tool of this kind can save scholars a substantial amount of time and establish a paradigm of digital collation that encourages more scholars to find new uses for collation in their work.