From books to bytes: Accelerating digitization at TTU Libraries with Kirtas BookScan APT 2400

Date

2007-05-30

Authors

Lu, Jessica
Callender, Donell

Abstract

In 2006, the Texas Tech University Libraries purchased a high-speed book scanner, capable of scanning over 2,000 pages per hour, from Kirtas Technologies Inc. Funded in part by a $130,000 grant from the Lubbock-based Helen Jones Foundation, the purchase of the BookScan APT 2400 is the first of its kind by a university in the United States. The Kirtas scanner turns book pages with a vacuum head, delivering puffs of air that lift and separate the pages. Books are secured on a cradle that uses laser technology to maintain focus for dual 16-megapixel cameras, which capture high-resolution page images in color. Because it photographs pages with cameras rather than using conventional scanning technology, it operates faster than its scanner-based counterparts. Because the book cradle and automatic page-turning device form a mechanical setup that requires a clamp to hold down the pages, only books within a certain size range can take full advantage of the machine's speed. However, the TTU Libraries have successfully digitized tiny books with manual page turning, which still saves significant time and simplifies post-processing because of the dual-camera setup. The companion software (APT Manager) enables automated processing: the user sets up templates that crop pages, remove clamps, de-skew, center, and sharpen images, among other operations, for an entire book. The default output of the book scanner is JPEG, and the template can specify other file formats for the software to convert to. Once templates are set for each book, the processing instructions are saved to an XML file and all the files can be run through super batch without human intervention, which significantly increases production. Technical metadata is generated automatically during operation and saved to an XML file. Descriptive metadata can be retrieved through a catalog search or entered manually, and then output to the designated content management system.
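The idea of saving one template per book to XML and then running every book through a batch can be illustrated with a minimal Python sketch. This is not the APT Manager file format or API, neither of which is described here. The element names (`template`, `step`), the step names, the book identifiers, the TIFF output choice, and the functions `write_template`, `read_template`, and `run_batch` are all hypothetical, chosen only to show the pattern.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_template(path, book_id, steps, output_format):
    """Save one book's processing instructions to an XML file (hypothetical schema)."""
    root = ET.Element("template", {"book": book_id, "output": output_format})
    for name in steps:
        ET.SubElement(root, "step", {"name": name})
    ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)


def read_template(path):
    """Load a saved template back into a plain dictionary."""
    root = ET.parse(str(path)).getroot()
    return {
        "book": root.get("book"),
        "output": root.get("output"),
        "steps": [step.get("name") for step in root.findall("step")],
    }


def run_batch(template_dir):
    """Collect the job for every saved template, in filename order, with no per-book prompts."""
    return [read_template(p) for p in sorted(Path(template_dir).glob("*.xml"))]


def demo():
    """Write two illustrative templates to a temporary folder and batch them."""
    with tempfile.TemporaryDirectory() as tmp:
        write_template(Path(tmp) / "book_001.xml", "book_001",
                       ["crop", "remove_clamps", "deskew"], "TIFF")
        write_template(Path(tmp) / "book_002.xml", "book_002",
                       ["crop", "center", "sharpen"], "JPEG")
        return run_batch(tmp)


if __name__ == "__main__":
    for job in demo():
        print(job["book"], job["output"], job["steps"])
```

In this sketch the operator's decisions are made once, when the template is written, so the batch step only has to read files and can run unattended.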
The software package also includes an OCR (Optical Character Recognition) Manager that outputs to a variety of file types: PDF, Word, plain text, and XML. To preserve the look of the original, the “image over text” option displays the PDF document as the original photographed image while enabling full-text searching through the underlying OCR text. Super batch can also be applied to the OCR process to speed up production. All the scanned files and associated metadata files are saved directly to SAN storage over a fiber connection. This acquisition has significantly accelerated TTU Libraries’ digitization efforts. The Libraries already have a variety of book scanning projects in the queue, ranging from rare books to theses and dissertations. Currently the digital lab is testing the new workflow and setup with a pilot project featuring donor materials. We look forward to sharing lessons learned with anyone interested.

Description

Presentation slides for the 2007 Texas Conference on Digital Libraries (TCDL).

Citation